Min/max reductions and top-k operations are computed in Materialize by creating a stack of arrangements and performing tournament-tree-style hierarchical processing of updates. By default, the height of the hierarchies created is sized to accommodate the existence of potentially very large query groups in the data. This implies that Materialize provides an incremental update experience from the start, even in the presence of large input query groups, but this comes at the cost of extra memory usage.
Memory usage can be reduced substantially for queries with min/max or top-k by tuning the
EXPECTED GROUP SIZE
query hint. The query hint informs Materialize about the maximum size of
the query groups that are expected in the input to the min/max reduction or top-k operation. Based
on this information, Materialize can size down the height of the hierarchy and stack fewer
arrangements.
Currently, the EXPECTED GROUP SIZE
controls the tuning of both min/max reductions and top-k computations.
However, when the two patterns co-occur in the same query block, it is currently impossible to tune them
independently. This problem is reported in issue MaterializeInc/database-issues#5578. We expand the first
example query provided in the issue to illustrate the problem in more detail:
SELECT
sum(max_revenue) AS sum_max_revenue
FROM (
SELECT l_orderkey,
max(l_extendedprice * (1 - l_discount)) AS max_revenue
FROM lineitem
GROUP BY l_orderkey
OPTIONS (EXPECTED GROUP SIZE = 7)
ORDER BY max_revenue DESC
LIMIT 10
);
The EXPLAIN
output for the above query is:
Optimized Plan
-------------------------------------------------------------------------------------
Explained Query: +
Return +
Union +
Get l0 +
Map (null) +
Union +
Negate +
Project () +
Get l0 +
Constant +
- () +
With +
cte l0 = +
Reduce aggregates=[sum(#0)] +
TopK order_by=[#0 desc nulls_first] limit=10 exp_group_size=7 +
Project (#1) +
Reduce group_by=[#0] aggregates=[max((#1 * (1 - #2)))] exp_group_size=7+
Project (#0, #5, #6) +
Get materialize.public.lineitem +
(1 row)
Above, we see that the exp_group_size
for both the max reduction and the top-k operation
are set to the same value. This may not be desirable as the size of the groups going into the
reduction above correspond to the number of line items per order (maximum 7), but the size of
the single group going into the top-k operation corresponds to the number of orders (1.5 million
times the scale factor).
To fix the above, the user presently has two options:
Rewrite the SQL to separate the operations into different query blocks. In the example:
SELECT
sum(max_revenue) AS sum_max_revenue
FROM (
SELECT max_revenue
FROM (
SELECT l_orderkey,
max(l_extendedprice * (1 - l_discount)) AS max_revenue
FROM lineitem
GROUP BY l_orderkey
OPTIONS (EXPECTED GROUP SIZE = 7)
)
OPTIONS (EXPECTED GROUP SIZE = 1500000)
ORDER BY max_revenue DESC
LIMIT 10
);
Optimized Plan
-------------------------------------------------------------------------------------
Explained Query: +
Return +
Union +
Get l0 +
Map (null) +
Union +
Negate +
Project () +
Get l0 +
Constant +
- () +
With +
cte l0 = +
Reduce aggregates=[sum(#0)] +
TopK order_by=[#0 desc nulls_first] limit=10 exp_group_size=1500000 +
Project (#1) +
Reduce group_by=[#0] aggregates=[max((#1 * (1 - #2)))] exp_group_size=7+
Project (#0, #5, #6) +
Get materialize.public.lineitem +
(1 row)
To avoid using the workaround above and directly hint the original SQL statement, we would need to break the ambiguity regarding to which construct the hint applies.
EXPECTED GROUP SIZE
query hint should not have
to be rewritten to use a different hint. At the same time, a user could themself choose to exploit
higher potential for memory savings with minimal changes to their SQL (i.e., by changing the
hints in the OPTIONS
clause).Fixing any understandability issues with the current EXPECTED GROUP SIZE
hint is considered
out of scope to solve the problem above. In particular, sometimes the EXPECTED GROUP SIZE
is
useful even if a GROUP BY
clause is not present in the query, e.g.:
SELECT max(l_orderkey) FROM lineitem
OPTIONS (EXPECTED GROUP SIZE = 6000000)
SELECT DISTINCT ON(l_orderkey) l_orderkey, l_linenumber
FROM lineitem
OPTIONS (EXPECTED GROUP SIZE = 7)
ORDER BY l_orderkey, l_extendedprice DESC;
In the queries above, users may be confused about what GROUP
is referred to.
We do not aim to solve this understandability issue with the existing hint as part of this design.
Presently, if a query hint is provided but not used, no notice nor error is produced. For example:
SELECT * FROM lineitem
OPTIONS (EXPECTED GROUP SIZE = 6000000)
Above, the query will be executed by simply ignoring the EXPECTED GROUP SIZE
hint.
We deem out-of-scope to change this behavior.
An EXPECTED GROUP SIZE
hint specification will be provided to reductions that do not include
min/max aggregates in MIR, but will be subsequently be attached only to hierarchical reductions that
employ bucketed plans in lowering to LIR and ignored otherwise. For example:
SELECT l_linenumber, sum(l_extendedprice)
FROM lineitem
GROUP BY l_linenumber
OPTIONS (EXPECTED GROUP SIZE = 860000);
Optimized Plan
-------------------------------------------------------------------
Explained Query: +
Reduce group_by=[#0] aggregates=[sum(#1)] exp_group_size=860000+
Project (#3, #5) +
Get materialize.public.lineitem +
(1 row)
We do not aim to change this current characteristic of the code.
In a single query block, we can add different kinds of SQL constructs that include min/max aggregates and different flavors of top-k operations. Consider the following example:
CREATE TABLE teachers (id INT, name TEXT);
CREATE TABLE sections (id INT, teacher_id INT, course_id INT, schedule TEXT);
CREATE MATERIALIZED VIEW nested_distinct_on_group_by_limit AS
SELECT SUM(id) AS sum_id, SUM(teacher_id) AS sum_teacher_id, SUM(max_course_id) AS sum_max_course_id
FROM (
SELECT DISTINCT ON(teacher_id) id, teacher_id, MAX(course_id) AS max_course_id
FROM sections
GROUP BY id, teacher_id
OPTIONS (EXPECTED GROUP SIZE = 1000)
ORDER BY teacher_id, id
LIMIT 2
);
Optimized Plan
------------------------------------------------------------------------------------------
materialize.public.nested_distinct_on_group_by_limit: +
Return +
Union +
Get l0 +
Map (null, null, null) +
Union +
Negate +
Project () +
Get l0 +
Constant +
- () +
With +
cte l0 = +
Reduce aggregates=[sum(#0), sum(#1), sum(#2)] +
TopK order_by=[#1 asc nulls_last, #0 asc nulls_last] limit=2 exp_group_size=1000+
TopK group_by=[#1] order_by=[#0 asc nulls_last] limit=1 exp_group_size=1000 +
Reduce group_by=[#0, #1] aggregates=[max(#2)] exp_group_size=1000 +
Project (#0..=#2) +
Get materialize.public.sections +
(1 row)
To disambiguate the query hints when necessary, we argue for an approach with the following characteristics:
EXPECTED GROUP SIZE
by allowing users to use this
hint with the exact same semantics it has today, i.e., if the EXPECTED GROUP SIZE
is specified,
it is attached to all instances of reductions and top-k operators originating from the query block.EXPECTED GROUP SIZE
, the statement will error out.The error behavior advocated for in 2. above ensures that either the user will employ the new,
more ergonomic hints, or alternatively rely on the backwards compatible EXPECTED GROUP SIZE
.
It eliminates any concerns regarding interactions between the new hints and the old one.
To operationalize the above, the following new hints are proposed:
AGGREGATE INPUT GROUP SIZE
: This hint attaches to the Reduce
operator implementing the aggregation
in the query block.DISTINCT ON INPUT GROUP SIZE
: This hint attaches to the TopK
operator implementing the
DISTINCT ON
clause.LIMIT INPUT GROUP SIZE
: This hint attaches to the TopK
operator implementing the LIMIT
clause.The following queries illustrate how the proposed query hints can be used in a manner that is backwards compatible with the current syntax, but allows for breaking the ambiguity when desired.
CREATE MATERIALIZED VIEW nested_distinct_on_group_by_limit AS
SELECT SUM(id) AS sum_id, SUM(teacher_id) AS sum_teacher_id, SUM(max_course_id) AS sum_max_course_id
FROM (
SELECT DISTINCT ON(teacher_id) id, teacher_id, MAX(course_id) AS max_course_id
FROM sections
GROUP BY id, teacher_id
OPTIONS (AGGREGATE INPUT GROUP SIZE = 1000, LIMIT INPUT GROUP SIZE = 50)
ORDER BY teacher_id, id
LIMIT 2
);
Expected Plan:
Optimized Plan
------------------------------------------------------------------------------------------
materialize.public.nested_distinct_on_group_by_limit: +
Return +
Union +
Get l0 +
Map (null, null, null) +
Union +
Negate +
Project () +
Get l0 +
Constant +
- () +
With +
cte l0 = +
Reduce aggregates=[sum(#0), sum(#1), sum(#2)] +
TopK order_by=[#1 asc nulls_last, #0 asc nulls_last] limit=2 exp_group_size=50 +
TopK group_by=[#1] order_by=[#0 asc nulls_last] limit=1 +
Reduce group_by=[#0, #1] aggregates=[max(#2)] exp_group_size=1000 +
Project (#0..=#2) +
Get materialize.public.sections +
(1 row)
CREATE MATERIALIZED VIEW nested_distinct_on_group_by_limit AS
SELECT SUM(id) AS sum_id, SUM(teacher_id) AS sum_teacher_id, SUM(max_course_id) AS sum_max_course_id
FROM (
SELECT DISTINCT ON(teacher_id) id, teacher_id, MAX(course_id) AS max_course_id
FROM sections
GROUP BY id, teacher_id
OPTIONS (LIMIT INPUT GROUP SIZE = 50, DISTINCT ON INPUT GROUP SIZE = 60)
ORDER BY teacher_id, id
LIMIT 2
);
Expected Plan:
Optimized Plan
------------------------------------------------------------------------------------------
materialize.public.nested_distinct_on_group_by_limit: +
Return +
Union +
Get l0 +
Map (null, null, null) +
Union +
Negate +
Project () +
Get l0 +
Constant +
- () +
With +
cte l0 = +
Reduce aggregates=[sum(#0), sum(#1), sum(#2)] +
TopK order_by=[#1 asc nulls_last, #0 asc nulls_last] limit=2 exp_group_size=50 +
TopK group_by=[#1] order_by=[#0 asc nulls_last] limit=1 exp_group_size=60 +
Reduce group_by=[#0, #1] aggregates=[max(#2)] +
Project (#0..=#2) +
Get materialize.public.sections +
(1 row)
CREATE MATERIALIZED VIEW nested_distinct_on_group_by_limit AS
SELECT SUM(id) AS sum_id, SUM(teacher_id) AS sum_teacher_id, SUM(max_course_id) AS sum_max_course_id
FROM (
SELECT DISTINCT ON(teacher_id) id, teacher_id, MAX(course_id) AS max_course_id
FROM sections
GROUP BY id, teacher_id
OPTIONS (AGGREGATE INPUT GROUP SIZE = 1000, DISTINCT ON INPUT GROUP SIZE = 60, LIMIT INPUT GROUP SIZE = 50)
ORDER BY teacher_id, id
LIMIT 2
);
Expected Plan:
Optimized Plan
------------------------------------------------------------------------------------------
materialize.public.nested_distinct_on_group_by_limit: +
Return +
Union +
Get l0 +
Map (null, null, null) +
Union +
Negate +
Project () +
Get l0 +
Constant +
- () +
With +
cte l0 = +
Reduce aggregates=[sum(#0), sum(#1), sum(#2)] +
TopK order_by=[#1 asc nulls_last, #0 asc nulls_last] limit=2 exp_group_size=50 +
TopK group_by=[#1] order_by=[#0 asc nulls_last] limit=1 exp_group_size=60 +
Reduce group_by=[#0, #1] aggregates=[max(#2)] exp_group_size=1000 +
Project (#0..=#2) +
Get materialize.public.sections +
(1 row)
CREATE MATERIALIZED VIEW nested_distinct_on_group_by_limit AS
SELECT SUM(id) AS sum_id, SUM(teacher_id) AS sum_teacher_id, SUM(max_course_id) AS sum_max_course_id
FROM (
SELECT DISTINCT ON(teacher_id) id, teacher_id, MAX(course_id) AS max_course_id
FROM sections
GROUP BY id, teacher_id
OPTIONS (LIMIT INPUT GROUP SIZE = 50, EXPECTED GROUP SIZE = 1000)
ORDER BY teacher_id, id
LIMIT 2
);
Expected Plan:
ERROR: EXPECTED GROUP SIZE cannot be used in combination with LIMIT INPUT GROUP SIZE.
CREATE MATERIALIZED VIEW nested_distinct_on_group_by_limit AS
SELECT SUM(id) AS sum_id, SUM(teacher_id) AS sum_teacher_id, SUM(max_course_id) AS sum_max_course_id
FROM (
SELECT DISTINCT ON(teacher_id) id, teacher_id, MAX(course_id) AS max_course_id
FROM sections
GROUP BY id, teacher_id
OPTIONS (EXPECTED GROUP SIZE = 1000)
ORDER BY teacher_id, id
LIMIT 2
);
Expected Plan:
Optimized Plan
------------------------------------------------------------------------------------------
materialize.public.nested_distinct_on_group_by_limit: +
Return +
Union +
Get l0 +
Map (null, null, null) +
Union +
Negate +
Project () +
Get l0 +
Constant +
- () +
With +
cte l0 = +
Reduce aggregates=[sum(#0), sum(#1), sum(#2)] +
TopK order_by=[#1 asc nulls_last, #0 asc nulls_last] limit=2 exp_group_size=1000+
TopK group_by=[#1] order_by=[#0 asc nulls_last] limit=1 exp_group_size=1000 +
Reduce group_by=[#0, #1] aggregates=[max(#2)] exp_group_size=1000 +
Project (#0..=#2) +
Get materialize.public.sections +
(1 row)
We illustrate a few more queries with the usage of the new hints and variations of top-k patterns:
CREATE MATERIALIZED VIEW sections_of_top_3_courses_per_teacher AS
SELECT id AS teacher_id, section_id
FROM teachers grp,
LATERAL (SELECT id AS section_id
FROM sections
WHERE teacher_id = grp.id
OPTIONS (LIMIT INPUT GROUP SIZE = 1000)
ORDER BY course_id DESC
LIMIT 3);
In the query above, the hint EXPECTED GROUP SIZE
could have been used interchangeably, since there
is no ambiguity and the hint applies to the LIMIT
clause.
CREATE MATERIALIZED VIEW max_sections_of_top_3_courses_per_teacher AS
SELECT id AS teacher_id, max_section_id
FROM teachers grp,
LATERAL (SELECT course_id, MAX(id) AS max_section_id
FROM sections
WHERE teacher_id = grp.id
GROUP BY course_id
OPTIONS (AGGREGATE INPUT GROUP SIZE = 1000, LIMIT INPUT GROUP SIZE = 20)
ORDER BY course_id DESC
LIMIT 3);
The above query specifies both the AGGREGATE INPUT GROUP SIZE
and the LIMIT INPUT GROUP SIZE
wherein the
AGGREGATE INPUT GROUP SIZE
will thus only apply to the min/max reduction while the LIMIT INPUT GROUP SIZE
will apply to the top-k operation.
This section captures a few more suggestions that came up during discussion of the proposal in this design doc.
The three new hints are arguably more ergonomic and should be the preferred way to tune the exp_group_size
of various operators. So we should update our documentation to primarily refer to the three new hints. Perhaps
not even mentioning the old EXPECTED GROUP SIZE
at all would allow us to create a path for syntax deprecation
in the future (even though that process is not in scope for this design).
Two follow-up items that are related improvements to usability of the hints include: (a) Producing errors
when the hints do not apply (MaterializeInc/database-issues#6472); (b) Moving the OPTIONS
clause to a different
position in the SELECT
syntax and potentially renaming it (MaterializeInc/database-issues#6473).
These two additional improvements are considered out-of-scope for this design, but during implementation
we will evaluate if opportunity arises to address them expediently (leaving them to future work otherwise).
Two main categories of approches were discussed to solve the problem:
Hints are about semantic properties: If the user informs us of an EXPECTED GROUP COUNT
,
this hint could transcend the configuration of top-k operators. It could be used for
cardinality estimation, for example. However, it's not very clear what we mean if we use this
hint in the context of a LATERAL
join. Probably a clearer hint there would be something
describing the expected number of join pairs per row of the outer relation in the join.
Hints are about configuring low-level operators: In this line of thinking, we accept that
hints should apply to particular low-level operators and users need to know what operator they are
targeting with a particular hint. Instead of EXPECTED GROUP COUNT
, we could directly specify
something like EXPECTED TOPK GROUP SIZE
.
The proposal in this design document is a hybrid of the two categories above in that we attach semantics to specific SQL clauses, which are then attached to particular HIR and MIR operators. We find this approach more workable that hints about low-level operators, since the same SQL query block can include different syntactic constructs that refer to different instances of the same type of low-level operator.
The suggestion in MaterializeInc/database-issues#5578 to add an EXPECTED GROUP COUNT
hint is a
proposal that follows a semantic hinting philosophy, but is higher-level in than the proposal
in this design in that it does not refer to a specific SQL clause. Given the many variations
that SQL syntax includes, we found it tricky to define an extensive hint set about semantic
properties that would match the many syntactic variations, especially for top-k. This is why
this proposal focuses on attaching hints to the SQL syntax that encodes the reduction and
top-k variants.
We considered changing the current behavior of EXPECTED GROUP SIZE
. For example, we could make the
EXPECTED GROUP SIZE
apply only to the reduction in a single query block and force users to
specify the other proposed hints for top-k constructs. Additionally, AGGREGATE INPUT GROUP SIZE
could
be introdced as a synonym for EXPECTED GROUP SIZE
, which would in turn be deprecated.
However, such a change would introduce operational complexity. Migration procedures would need to be
devised and implemented where we rewrite production queries to introduce the additional
hints instead of only the EXPECTED GROUP SIZE
in single block queries with multiple constructs.
Additional migration procedures would need to rewrite indexed / materialized view definitions
in the catalog to change EXPECTED GROUP SIZE
to AGGREGATE INPUT GROUP SIZE
. Finally, a syntax
deprecation process would need to be followed.
Given that the issue is one of better UX on the specific cases where hints need to be provided and only in the cases where there is ambiguity, the trade-off between operational complexity and risk with the gain from the change leans on the direction of maintaining backwards compatibility, as advocated by the proposal in this design document.