| Memory | Cardinality, existing indices, cardinality of groups | Arrangements and size estimates | `(Vec<Arrangement>, Formula<V>)` |
| Hydration time | Cardinality | Time estimate | `Formula<V>` |
| Latency per update | Cardinality, expected change rate / per-update delta | Time estimate (per source) | `Map<Source, Formula<V>>` |
| Scalability | Distribution of keys (e.g., cardinality of distinct keys) | Data parallelism estimate | `Map<Source, Vec<Column>>` |
In the output type, `Arrangement` refers to summary information about the arrangements necessary; `Formula<V>` refers to an algebraic expression with variables drawn from the type `V`.
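A minimal Rust sketch may help fix intuitions for these two types. Nothing below exists in the codebase; every name, field, and variant is hypothetical and only meant to illustrate the shape of the output.

```rust
/// Hypothetical summary of a single arrangement a plan would need.
pub struct Arrangement {
    /// Which collection the arrangement is built over.
    pub collection: String,
    /// The columns the arrangement is keyed by.
    pub key_columns: Vec<usize>,
}

/// An algebraic expression with variables drawn from `V`
/// (e.g., symbolic source cardinalities or selectivity factors).
pub enum Formula<V> {
    Constant(f64),
    Variable(V),
    Sum(Vec<Formula<V>>),
    Product(Vec<Formula<V>>),
}

impl<V> Formula<V> {
    /// Evaluate the formula once concrete values for the variables are known.
    pub fn evaluate(&self, lookup: &impl Fn(&V) -> f64) -> f64 {
        match self {
            Formula::Constant(c) => *c,
            Formula::Variable(v) => lookup(v),
            Formula::Sum(fs) => fs.iter().map(|f| f.evaluate(lookup)).sum(),
            Formula::Product(fs) => fs.iter().map(|f| f.evaluate(lookup)).product(),
        }
    }
}
```

Under this sketch, the memory output of a plan would pair the arrangements it needs (`Vec<Arrangement>`) with symbolic size estimates (`Formula<V>`), where the variables in `V` name quantities like the number of distinct grouping keys.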
The memory dimension is meant to identify particularly expensive new arrangements: for example, a count query grouping by US state code (just 50, no big deal) vs. grouping by user ID (as big as your user table).
The hydration time dimension is effectively a classical batch database cost model: given cardinalities on the input relations, predict how much work a given plan has to do.
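As a purely illustrative example (textbook arithmetic, not a formula from this design), a batch-style estimate for hydrating a join of inputs $R$ and $S$ under a predicate with selectivity $f$ might charge one pass over each input plus the cost of emitting the estimated output:

$$
\mathrm{work}(R \bowtie S) \;\approx\; |R| + |S| + f \cdot |R| \cdot |S|.
$$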
The latency per update dimension is unique to our streaming setting. Different query plans will lead to different costs when different sources produce new data. If a calculus metaphor is helpful, you can think of the cost model here producing a partial derivative with respect to each input; alternatively, the cost model’s aggregate latency per update cost is the Jacobian of that query.
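To spell the metaphor out (illustration only): if $W(s_1, \dots, s_n)$ is the work a dataflow performs as a function of the updates arriving from each of its sources $s_1, \dots, s_n$, then the latency-per-update output is roughly the vector of partial derivatives

$$
\left( \frac{\partial W}{\partial s_1}, \;\dots,\; \frac{\partial W}{\partial s_n} \right),
$$

one entry per source; stacking these rows across the query's outputs gives the Jacobian mentioned above.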
The scalability dimension is meant to characterize how much the work for the given query can be spread across multiple nodes; the output should identify which columns (per source) will allow for distribution.
Reference explanation
This hasn't been implemented yet.
Rollout
Implementation plan
The current plan is to work towards a minimally viable cost model, with the following priority ordering: memory > hydration > latency > scalability.
- Start with a source cardinality analysis that maps (symbolic) input cardinalities to (symbolic) output cardinalities (see the sketch after this list).
  - Cardinality analysis can work on MIR or LIR, since cardinality ought to be invariant under optimization.
  - Selectivity factors will need to be made up for now; it may pay to set all factors to 1, i.e., to get a worst-case estimate.
- Use the cardinality analysis to build the memory cost model.
  - The memory cost model will work on LIR, using the "lower and check cost" approach when planning joins in MIR.
- Try to validate the memory cost model on a variety of queries.
- Experiment with join ordering to see whether the memory cost model improves on our existing heuristics.
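As a concrete (but entirely hypothetical) illustration of the first step, the sketch below maps a drastically simplified relation expression to a symbolic output cardinality, reusing the `Formula` sketch from above with source names as variables. The `Expr` type is a stand-in, not `MirRelationExpression` or LIR's `Plan`, and all selectivity factors are pinned to 1 for a worst-case estimate.

```rust
/// A drastically simplified stand-in for a MIR/LIR relation expression.
enum Expr {
    Get(String),        // a source, identified by name
    Filter(Box<Expr>),  // filter by some predicate
    Join(Vec<Expr>),    // join of several inputs
    Reduce(Box<Expr>),  // group-by / aggregation
    Union(Vec<Expr>),   // union of several inputs
}

/// Map an expression to a symbolic output cardinality. All selectivity
/// factors are set to 1.0, i.e., this is a worst-case estimate.
fn cardinality(expr: &Expr) -> Formula<String> {
    match expr {
        Expr::Get(name) => Formula::Variable(name.clone()),
        // Worst case: a filter keeps every row (selectivity 1).
        Expr::Filter(input) => cardinality(input),
        // Worst case: a join behaves like a cross product.
        Expr::Join(inputs) => Formula::Product(inputs.iter().map(cardinality).collect()),
        // Worst case: every input row is a distinct group.
        Expr::Reduce(input) => cardinality(input),
        // A union produces at most the sum of its inputs.
        Expr::Union(inputs) => Formula::Sum(inputs.iter().map(cardinality).collect()),
    }
}
```

Here a `Reduce` is bounded by its input cardinality; a sharper estimate would use the cardinality of the grouping keys, which is exactly the input feature the memory dimension asks for.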
Testing and observability
Testing can be split into two categories:
- High-level validation of the cost model in terms of a corpus of interesting/worthwhile queries.
- Low-level correctness of the cost model in terms of the costs it assigns to particular small queries.
High-level validation requires running queries on meaningful amounts of data; low-level correctness checking might be able to run via `datadriven::walk` without using any data at all, just checking that certain queries are always given better costs than other queries.
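A low-level test of this sort might look roughly like the following sketch. The `cost_of` helper, the test-data path, and the test-file layout are all hypothetical placeholders; only `datadriven::walk` and `TestFile::run` come from the `datadriven` crate.

```rust
use datadriven::walk;

/// Hypothetical placeholder: plan `sql` and evaluate its cost with some
/// fixed, symbolic cardinalities. Nothing like this exists yet.
fn cost_of(sql: &str) -> f64 {
    unimplemented!("plan {sql} and evaluate its cost")
}

#[test]
fn cost_orderings() {
    // Each test case lists two queries; we check only the *relative*
    // ordering of their costs, never absolute numbers.
    walk("tests/testdata/cost", |f| {
        f.run(|test_case| -> String {
            let mut lines = test_case.input.lines();
            let cheap = lines.next().expect("first query");
            let expensive = lines.next().expect("second query");
            format!("{}\n", cost_of(cheap) <= cost_of(expensive))
        })
    });
}
```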
Low-level correctness can live in CI, but high-level validation may be costly enough to need to live elsewhere. We should ensure that high-level validation is run with some frequency to avoid drift.
Lifecycle
This feature will only be visible as `EXPLAIN WITH(cost)`, with no real contract with users about the meaning of costs (at least for now).
Drawbacks
A bad cost model may be worse than no cost model.
Conclusion and alternatives
Alternatives
Cost models all look more or less the same, though the precise input features and outputs vary. These four dimensions and their input features are what we came up with as a sensible place to start.
An alternative implementation approach would be to build a completely empirical cost model. That is, rather than writing a symbolic cost model a priori, we could curate a corpus of candidate queries, observe their performance on a variety of inputs, and then try to fit a cost model based on those results. Such an empirical cost model will almost surely be overfit and not particularly robust; it would also be hard to maintain.
Should the multi-dimensional cost model be implemented as separate analyses (each running its own linear pass over the plan) or as a single analysis (running one pass)? The shared dependency on source cardinality information suggests that a multi-pass approach will be logistically easier. We can try to combine passes later, though the overall time spent calculating cost should be quite low either way.
Unresolved questions
Which workloads should we test on? No cost model is perfect, but more testing with better, more realistic workloads will give us some confidence that our model will help us make good decisions.
Should the cost model work on `MirRelationExpression` or LIR's `Plan`? Join planning happens at the MIR level, and join planning is a natural first client of the cost model. Even so, join planning is late enough that we could have the cost model work on LIR and plan joins by lowering the joined relations to LIR, considering their cost, and then throwing away that lowering. It's more expensive, but working on LIR means the cost model never has to guess how the query will lower.
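The "lower and check cost" option amounts to something like the sketch below, in which every type and function is a hypothetical placeholder rather than an existing optimizer API.

```rust
// Every type and function here is a hypothetical placeholder.
struct MirJoin;  // a candidate MIR join ordering
struct LirPlan;  // the LIR plan a candidate lowers to
struct Cost;     // whatever the cost model produces

impl Cost {
    fn memory(&self) -> f64 {
        0.0 // placeholder
    }
}

fn lower(_candidate: &MirJoin) -> LirPlan {
    LirPlan // hypothetical: lower the candidate to LIR
}

fn cost(_plan: &LirPlan) -> Cost {
    Cost // hypothetical: run the LIR cost model
}

/// Pick the cheapest candidate by speculatively lowering each one to LIR,
/// costing that lowering, and then throwing the lowering away.
fn pick_join_order(candidates: Vec<MirJoin>) -> MirJoin {
    candidates
        .into_iter()
        .map(|candidate| {
            let speculative = lower(&candidate); // discarded after costing
            (cost(&speculative).memory(), candidate)
        })
        .min_by(|(a, _), (b, _)| a.partial_cmp(b).expect("costs are comparable"))
        .map(|(_, candidate)| candidate)
        .expect("at least one candidate")
}
```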
How should multi-dimensional costs be rendered in `EXPLAIN WITH(cost)`?
How should we communicate source features to the cost model?
Future work
When we have a candidate cost model, we will want to identify opportunities to use the cost model in the optimizer and pick a good first place to use it—most likely join planning. Actually using the cost model in the optimizer will require more validation and testing—at a minimum on LDBC, but ideally on client queries—to ensure that we don’t regress performance.
We've identified the following early candidate clients for the cost model:
- Join ordering. There are some trade-offs here between set enumeration and more heuristic approaches; in an ideal world, we'd have an explicit "optimization budget" to determine how much of the join-ordering space to explore. (Uses memory at joins; cardinality and predicate selectivity across MIR.)
- CTE filter pushdown. Pushing filters all the way down can be advantageous, depending on selectivity; LDBC BI Q16 is an example that would benefit. (Uses cardinality and predicate selectivity across MIR.)
- Reduction pushdown. Pushing reductions down is not always advantageous. (Uses cardinality and predicate selectivity across MIR.)
- Late materialization. Given a join on a number of tables, it may be advantageous to break the join in two: a small join on a projection to a set of common keys, followed by a larger join to collect the other fields. Whether or not late materialization is worthwhile depends on which arrangements already exist. (Uses memory and predicate selectivity over MIR, or maybe LIR.)
- Broadcast selection. Using cardinality information, choose cheap broadcasts. See https://materializeinc.slack.com/archives/C02PPB50ZHS/p1685614805876319?thread_ts=1685552977.381309&cid=C02PPB50ZHS
- `OR` <-> `UNION` decomposition. It is possible to decompose a disjunctive `WHERE` clause into a `UNION`, and this is sometimes worthwhile. See https://github.com/MaterializeInc/database-issues/issues/1312#issuecomment-1571767817