Query Optimizer

K2 has a flexible, extensible query optimizer which uses both rewrite rules and a cost model.

Rewrite Rules

After a query has been translated into an abstract syntax tree, which K2 uses to represent queries internally, it is manipulated by applying a series of rewrite rules. This is where the bulk of K2's optimization work is done.

First, a collection of rules is applied which simplifies the query, by taking pieces expressed using certain kinds of tree nodes and replacing them with others. This reduces the number of types of nodes we need to deal with, thus reducing the number and complexity of the rewrite rules that follow.

Next the query is normalized. Normalization rules include things like taking a function that is applied to a conditional structure, and rewriting it so the function is applied to each of the expressions in the condition. Another normalization rule will remove loops that range over collections that are known to be empty. The repeated application of the normalization rules, which currently number more than twenty, will reduce a query to a minimal, or "normal" form.

The final set of rewrite rules are then applied to the normalized query. These rules do things like parallelizing loops over external data sources, and pushing selections, projections, and joins down to the drivers where possible.

Cost Model

Once all of the rewrite rules have been applied, there may still be room for further optimization. In particular, a query may not have a single "minimal" form, but a whole family of them. The way we decide between them is by using a cost model. We quickly get an estimate of the execution time of each version of the query, and choose the one that we expect to execute fastest.

The current cost model is still in the development stage. It is functional, but does not always choose the optimal form of the query. For more information on our plans for the optimizer, see our Ongoing Work page.


[K2 Front Page]