It is difficult to understand query performance: which part of a query is responsible for poor performance? Three stakeholders want to answer that question: the cluster team, field engineering, and customers.
Existing introspection views already make it possible to understand dataflow performance: writing WMR (`WITH MUTUALLY RECURSIVE`) queries over these views suffices to get useful information.
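As a small example (not the full WMR analysis; the relation and column names below are from recent Materialize versions and may differ in yours), one can surface the largest arrangements and the dataflows they belong to:

```sql
-- Find the ten largest arrangements and their containing dataflows.
SELECT
    mdod.dataflow_name,
    mdod.name AS operator,
    mas.size AS bytes
FROM mz_internal.mz_arrangement_sizes AS mas
JOIN mz_internal.mz_dataflow_operator_dataflows AS mdod
  ON mas.operator_id = mdod.id
ORDER BY mas.size DESC
LIMIT 10;
```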
With some amount of work (how much, and how tolerable, remains an open question), the cluster team and field engineering can attribute this profile information to parts of the MIR plan or even the SQL. Improvements to rendering have made the names of dataflows more meaningful, but there is still a substantial gap between dataflows and MIR.
It is not clear to what extent customers can understand the introspection views or the WMR queries aggregating their information.
A stakeholder can run a command like:

```sql
EXPLAIN PLAN PROFILE FOR INDEX my_crazy_view_idx;
```

and receive output that explains how `my_crazy_view_idx` is currently running, i.e., what operations it performs and how well those operations are performing (in terms of, e.g., memory or CPU usage, hydration status, number of input rows per operator, worker skew).
In an ideal solution, all three stakeholders will receive output that is legible at their level of sophistication.
Three criteria determine how legible the output is:

- **sophistication**: optimizer engineers are prepared to deal with low-level details of the process, but others may not be.
- **education**: documentation and other materials can clarify meanings and offer examples and explanation.
- **design**: our choices in how we represent the output, and the quality of our implementation.
What structures do we attribute profiling information to? We could annotate dataflows; we could annotate the original SQL query. Our current architecture admits five possibilities:
- **(dfl)** `EXPLAIN PLAN PROFILE` is a macro around a WMR analysis query; it describes dataflows. Attribution is direct.
- **(lir)** `EXPLAIN PLAN PROFILE` annotates an LIR plan with information mapped back from the dataflows. Attribution is more or less direct.
- **(mir)** `EXPLAIN PLAN PROFILE` annotates an MIR plan with information mapped back from the dataflows. Attribution requires some mapping between MIR and LIR; the proposed 'address predicting' forward mapping should support this.
- **(hir)** `EXPLAIN PLAN PROFILE` annotates an HIR plan with information mapped back from the dataflows. Attribution requires the MIR->LIR mapping as well as some mapping from HIR->MIR. The current lowering is not careful about provenance; we would need to add source location information at the HIR level and track provenance at the MIR level, which would be a large change to the transform crate.
- **(sql)** `EXPLAIN PLAN PROFILE` annotates the original SQL query with information mapped back from the dataflows. Attribution requires the HIR->MIR->LIR mapping; we would additionally need source location information from the SQL parser, another large change.
Each of these approaches is differently legible; as we move closer to the SQL query itself, implementation complexity increases.
| Approach | Cluster? | FieldEng? | Customer? | Impl. Complexity |
|---|---|---|---|---|
| dfl | ~ | ~ | X | low |
| lir | + | + | X | low |
| mir | + | + | ~ | medium |
| hir | ~ | ~ | + | high |
| sql | ~ | ~ | + | very high |

(Here + means the output works well for that audience, ~ means it partially works, and X means it is not legible to them.)
It seems unrealistic to expect customers to understand dataflows or LIR as they exist now. MIR is a plausible thing to show customers (per field engineering), though we lack documentation.
HIR and SQL will be more legible to customers, but the current approach to lowering will make attribution at these levels less useful for the cluster team and field engineering.
Designing a new level---DDIR, an MIR replacement---is too much for a first cut. Refining an existing level is a good idea---but not the first step.
We should attribute at the MIR level: it offers the best mix of abstraction and legibility at only a moderate level of complexity.
There is good precedent for choosing MIR, as MIR is cognate to `EXPLAIN (FORMAT TEXT)` from Postgres, `EXPLAIN FORMAT=TREE` from MySQL, and `EXPLAIN USING TEXT` from Snowflake.
We should identify a partner customer who can help us refine the output of `EXPLAIN` in general and `EXPLAIN PLAN PROFILE` in particular, rather than attempting to guess at the right information to show.
We will use introspection sources to do the source mapping. At present, we have introspection sources that let us map dataflow addresses to their statistics and operators. Each LIR operator has a distinct (within its query) `LirId`, a `u64` value generated when lowering MIR to LIR. Each MIR operator has a distinct index in the post-order traversal of the plan; call this index its `MirId`.
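For illustration, consider a hypothetical three-operator plan: post-order visits children before parents, so leaves get the smallest `MirId`s and the root gets the largest, and ordering by `MirId` descending renders the plan root-first:

```
Project (#0)       -- MirId 2 (root, visited last)
  Filter (#1 = 1)  -- MirId 1
    Get t          -- MirId 0 (leaf, visited first)
```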
We will add introspection sources mapping:

- `GlobalId`s and `LirId`s to dataflow addresses;
- `GlobalId`s and `LirId`s to the LIR operator (think: a single line of an `EXPLAIN PHYSICAL PLAN`);
- `GlobalId`s and `MirId`s to an `LirId`;
- `GlobalId`s and `MirId`s to the MIR operator (think: a single line of an `EXPLAIN PLAN`).

With these introspection sources, we can implement `EXPLAIN PLAN PROFILE FOR (INDEX|MATERIALIZED VIEW) [name]` as a query not unlike the existing ad-hoc WMR used to analyze dataflows. That query will:

- resolve `name` to a `GlobalId`;
- join (keyed on that `GlobalId`) the `MirId`, the MIR operator, the `LirId`, the dataflow address, and the dataflow statistics;
- order the results (sorting by `MirId` descending will give you post-order).
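A sketch of such a query, assuming hypothetical names and shapes for the proposed sources (none of them exist yet; `mz_internal.mz_dataflow_addresses` and `mz_internal.mz_arrangement_sizes` do exist today, though their schemas may differ across versions):

```sql
-- Hypothetical: attribute arrangement sizes to MIR operators for one object.
-- Assumed shapes for the proposed sources:
--   mz_mir_operators(global_id, mir_id, operator)  -- one row per MIR operator
--   mz_mir_to_lir(global_id, mir_id, lir_id)       -- MIR -> LIR mapping
--   mz_lir_to_address(global_id, lir_id, address)  -- LIR -> dataflow address
SELECT mir.operator, sum(mas.size) AS memory
FROM mz_mir_operators AS mir
JOIN mz_mir_to_lir AS m2l USING (global_id, mir_id)
JOIN mz_lir_to_address AS l2a USING (global_id, lir_id)
JOIN mz_internal.mz_dataflow_addresses AS mda ON l2a.address = mda.address
JOIN mz_internal.mz_arrangement_sizes AS mas ON mda.id = mas.operator_id
WHERE mir.global_id = 'u42'  -- the GlobalId resolved from [name]
GROUP BY mir.mir_id, mir.operator
ORDER BY mir.mir_id DESC;    -- descending MirId: see above
```

(The real query will likely also need the WMR address rollup to charge an operator for its children; that is omitted here.)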
Running `EXPLAIN PLAN PROFILE` might then produce output like the following (totally made-up numbers; the query is drawn from `joins.slt`; the plan seems weird to me, but that's not relevant):
```
> CREATE MATERIALIZED VIEW v AS SELECT name, id FROM v4362 WHERE name = (SELECT name FROM v4362 WHERE id = 1);
> EXPLAIN PLAN PROFILE FOR MATERIALIZED VIEW v;
|operator                                                        |memory|
|----------------------------------------------------------------|------|
|Project (#0, #1)                                                |      |
|  Join on=(#0 = #2) type=differential                           |      |
|    ArrangeBy keys=[[#0]]                                       |  12GB|
|      ReadStorage materialize.public.t4362                      |      |
|    ArrangeBy keys=[[#0]]                                       |  10GB|
|      Union                                                     |      |
|        Project (#0)                                            |      |
|          Get l0                                                |      |
|        Map (error("more than one record produced in subquery"))|      |
|          Project ()                                            |      |
|            Filter (#0 > 1)                                     |      |
|              Reduce aggregates=[count(*)]                      |   3GB|
|                Project ()                                      |      |
|                  Get l0                                        |      |
|cte l0 =                                                        |      |
|  Filter (#1 = 1)                                               |      |
|    ReadStorage materialize.public.t4362                        |      |
```
There are pros and cons to this approach.

Pros:

- Easy to implement (the `EXPLAIN PLAN PROFILE` macro is just a query over introspection sources).
- Adaptable: we can run custom queries or edit the macro easily. We can therefore experiment a little more freely with how we render different MIR operators.

Cons:

- To run `EXPLAIN PLAN PROFILE` inside environmentd, we'd need a way to read from introspection sources while running an `EXPLAIN`.
- Doesn't hook into existing `EXPLAIN` infrastructure.
If we find we miss features from the `EXPLAIN` infrastructure, we can enrich what we put in the `MirId` operator mapping.
As an extension, we can support `EXPLAIN PLAN PROFILE ... AS SQL`, which would generate and print the attribution query without actually running it; this allows us to edit that query as we please.
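For example (hypothetical syntax, matching the extension just described):

```sql
-- Print the attribution query for v without running it; the printed SQL
-- can then be edited and run by hand.
EXPLAIN PLAN PROFILE FOR MATERIALIZED VIEW v AS SQL;
```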
We must be careful to show profiles and plans for the existing, running dataflows. For example, suppose the user creates `my_crazy_view` on top of `my_perfectly_sane_view` and then creates `my_crazy_view_idx`. Later on, they create an index for `my_perfectly_sane_view` called `my_perfectly_sane_view_idx`. The running dataflow for `my_crazy_view_idx` will not use `my_perfectly_sane_view_idx`, but if we were to recreate the index, it might!
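Concretely (a sketch; view bodies and index keys elided):

```sql
CREATE VIEW my_perfectly_sane_view AS ...;
CREATE VIEW my_crazy_view AS SELECT ... FROM my_perfectly_sane_view ...;
CREATE INDEX my_crazy_view_idx ON my_crazy_view (...);

-- Later:
CREATE INDEX my_perfectly_sane_view_idx ON my_perfectly_sane_view (...);

-- The dataflow already running for my_crazy_view_idx does not read from
-- my_perfectly_sane_view_idx; the profile must describe that running
-- dataflow, not the plan we would choose if the index were built today.
```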
There are several possible profiling metrics to track---current memory usage, peak memory usage, current number of rows, row throughput, worker skew, latency (average, P50, P99, max), and timestamps. Which should we do first? (I would propose some form of memory usage and some form of latency.)
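Purely as a sketch of shape (every name here is hypothetical; nothing below exists today), a per-operator statistics relation starting from that proposal might look like:

```sql
-- Hypothetical per-operator profile relation, starting with memory and latency.
CREATE TABLE mz_operator_profile (
    global_id    text,      -- the indexed/materialized object being profiled
    lir_id       uint8,     -- which LIR operator the row describes
    memory_bytes uint8,     -- current memory usage
    latency_p50  interval,  -- median per-update processing latency
    latency_p99  interval   -- tail per-update processing latency
);
```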
If we commit to implementing `EXPLAIN PLAN PROFILE` at the MIR level, how hard is it to also implement it for LIR along the way?