123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445 |
- operators:
- - operator: Constant
- plan_types: "optimized,raw"
- description: |
- Always produces the same collection of rows.
- uses_memory: False
- memory_details: ""
- expansive: False
- example: |
- ```mzsql
- Constant
- - ((1, 2) x 2)
- - (3, 4)
- ```
- - operator: Constant
- plan_types: "LIR"
- description: |
- Always produces the same collection of rows.
- uses_memory: False
- memory_details: ""
- expansive: False
- example: "`→Constant (2 rows)`"
- - operator: Get
- plan_types: "optimized,raw"
- description: |
- Produces rows from either an existing relation (source/view/materialized view/table) or from a previous
- CTE in the same plan.
- uses_memory: False
- memory_details: ""
- expansive: False
- example: "`Get materialize.public.ordered`"
- - operator: Stream, Arranged, Index Lookup, Read
- plan_types: "LIR"
- description: |
- Produces rows from either an existing relation (source/view/materialized view/table) or from a previous
- CTE in the same plan.
- A parent `Fused Map/Filter/Project` operator can combine with this operator.
- There are four types of `Get`.
- 1. `Stream` indicates that the results are not [arranged](/get-started/arrangements/#arrangements) in memory
- and will be streamed directly.
- 2. `Arranged` indicates that the results are [arranged](/get-started/arrangements/#arrangements) in memory.
- 2. `Index Lookup` indicates the results will be
- _looked up_ in an existing [arrangement]((/get-started/arrangements/#arrangements).
- 3. `Read` indicates that the results are unarranged,
- and will be processed as they arrive.
- uses_memory: False
- memory_details: ""
- expansive: False
- example: "`Arranged materialize.public.t`"
- - operator: Project
- plan_types: "optimized,raw"
- description: |
- Produces a subset of the [columns](#explain-plan-columns) in the input
- rows. See also [column numbering](#explain-plan-columns).
- uses_memory: False
- memory_details: ""
- expansive: False
- expansive_details: |
- Each row has _less_ data (i.e., shorter rows, but same number of rows).
- example: "`Project (#2, #3)`"
- - operator: Map
- plan_types: "optimized,raw"
- description: |
- Appends the results of some scalar expressions to each row in the input.
- uses_memory: False
- memory_details: ""
- expansive: True
- expansive_details: |
- Each row has more data (i.e., longer rows but same number of rows).
- example: "`Map (((#1 * 10000000dec) / #2) * 1000dec)`"
- - operator: Map/Filter/Project
- plan_types: "LIR"
- description: |
- Computes new columns (maps), filters columns, and projects away columns. Works row-by-row.
- Maps and filters will be printed, but projects will not.
- These may be marked as **`Fused`** `Map/Filter/Project`, which means they will combine with the operator beneath them to run more efficiently.
- uses_memory: False
- memory_details: ""
- expansive: True
- expansive_details: |
- Each row may have more data, from the `Map`.
- Each row may also have less data, from the `Project`.
- There may be fewer rows, from the `Filter`.
- example: |
- ```mzsql
- →Map/Filter/Project
- Filter: (#0{a} < 7)
- Map: (#0{a} + #1{b})
- ```
- - operator: FlatMap
- plan_types: "optimized"
- description: |
- Appends the result of some (one-to-many) [table function](/sql/functions/#table-functions) to each row in the input.
- uses_memory: False
- memory_details: ""
- expansive: True
- expansive_details: |
- Depends on the [table function](/sql/functions/#table-functions) used.
- example: "`FlatMap jsonb_foreach(#3)`"
- - operator: Table Function
- plan_types: "LIR"
- description: |
- Appends the result of some (one-to-many) [table function](/sql/functions/#table-functions) to each row in the input.
- A parent `Fused Table Function unnest_list` operator will fuse with its child `GroupAggregate` operator. Fusing these operator is part of how we efficiently compile window functions from SQL to dataflows.
- A parent `Fused Map/Filter/Project` can combine with this operator.
- uses_memory: False
- memory_details: ""
- expansive: True
- expansive_details: |
- Depends on the [table function](/sql/functions/#table-functions) used.
- example: |
- ```mzsql
- →Table Function generate_series(#0{a}, #1{b}, 1)
- Input key: (#0{a})
- ```
- - operator: CallTable
- plan_types: "raw"
- description: |
- Appends the result of some (one-to-many) [table function](/sql/functions/#table-functions) to each row in the input.
- uses_memory: False
- memory_details: ""
- expansive: True
- expansive_details: |
- Depends on the [table function](/sql/functions/#table-functions) used.
- example: "`CallTable generate_series(1, 7, 1)`"
- - operator: Filter
- plan_types: "optimized,raw"
- description: |
- Removes rows of the input for which some scalar predicates return `false`.
- uses_memory: False
- memory_details: ""
- expansive: False
- example: "`Filter (#20 < #21)`"
- expansive_details: |
- May reduce the number of rows.
- - operator: ~Join
- plan_types: "raw"
- description: |
- Performs one of `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS` join on the two inputs, using the given predicate.
- uses_memory: True
- memory_details: |
- Uses memory proportional to the input sizes, unless [the inputs have appropriate indexes](/transform-data/optimization/#join). Certain joins with more than 2 inputs use additional memory, see details in the optimized plan.
- expansive: True
- expansive_details: |
- For `CrossJoin`s, Cartesian product of the inputs (|N| x |M|). Note that, in many cases, a join that shows up as a cross join in the RAW PLAN will actually be turned into an inner join in the OPTIMIZED PLAN, by making use of an equality WHERE condition.
- For other join types, depends on the join order and facts about the joined collections.
- example: "`InnerJoin (#0 = #2)`"
- - operator: Join
- plan_types: "optimized"
- description: |
- Returns combinations of rows from each input whenever some equality predicates are `true`.
- uses_memory: True
- memory_details: |
- The `Join` operator itself uses memory only for `type=differential` with more than 2 inputs.
- However, `Join` operators need [arrangements](/get-started/arrangements/#arrangements) on their inputs (shown by the `ArrangeBy` operator).
- These arrangements use memory proportional to the input sizes. If an input has an [appropriate index](/transform-data/optimization/#join), then the arrangement of the index will be reused.
- expansive: True
- expansive_details: |
- Depends on the join order and facts about the joined collections.
- example: "`Join on=(#1 = #2) type=delta`"
- - operator: Differential Join, Delta Join
- plan_types: "LIR"
- description: |
- Both join operators indicate the join ordering selected.
- Returns combinations of rows from each input whenever some equality predicates are `true`.
- Joins will indicate the join order of their children, starting from 0.
- For example, `Differential Join %1 » %0` will join its second child into its first.
- The [two joins differ in performance characteristics](/transform-data/optimization/#join).
- uses_memory: True
- memory_details: |
- Uses memory for 3-way or more differential joins.
- expansive: True
- expansive_details: |
- Depends on the join order and facts about the joined collections.
- example: |
- ```mzsql
- →Differential Join %1 » %0
- Join stage %0: Lookup key #0{a} in %0
- ```
- - operator: CrossJoin
- plan_types: "optimized"
- description: |
- An alias for a `Join` with an empty predicate (emits all combinations). Note that not all cross joins are marked
- as `CrossJoin`: In a join with more than 2 inputs, it can happen that there is a cross join between some of the inputs.
- You can recognize this case by `ArrangeBy` operators having empty keys, i.e., `ArrangeBy keys=[[]]`.
- uses_memory: True
- memory_details: |
- Uses memory for 3-way or more differential joins.
- expansive: True
- expansive_details: |
- Cartesian product of the inputs (|N| x |M|).
- example: "`CrossJoin type=differential`"
- - operator: Reduce
- plan_types: "optimized"
- description: |
- Groups the input rows by some scalar expressions, reduces each group using some aggregate functions, and produces rows containing the group key and aggregate outputs.
- uses_memory: True
- memory_details: |
- `SUM`, `COUNT`, and most other aggregations use a moderate amount of memory (proportional either to twice the output size or to input size + output size).
- `MIN` and `MAX` aggregates can use significantly more memory. This can be improved by including group size hints in the query, see
- [`mz_introspection.mz_expected_group_size_advice`](/sql/system-catalog/mz_introspection/#mz_expected_group_size_advice).
- expansive: False
- example: "`Reduce group_by=[#0] aggregates=[max((#0 * #1))]`"
- - operator: GroupAggregate
- plan_types: "LIR"
- description: |
- Groups the input rows by some scalar expressions, reduces each group using some aggregate functions, and produces rows containing the group key and aggregate outputs.
- There are five types of `GroupAggregate`, ordered by increasing complexity:
- 1. `Distinct GroupAggregate` corresponds to the SQL `DISTINCT` operator.
- 2. `Accumulable GroupAggregate` (e.g., `SUM`, `COUNT`) corresponds to several easy to implement aggregations that can be executed simultaneously and efficiently.
- 3. `Hierarchical GroupAggregate` (e.g., `MIN`, `MAX`) corresponds to an aggregation requiring a tower of arrangements. These can be either monotonic (more efficient) or bucketed. These may benefit from a hint; [see `mz_introspection.mz_expected_group_size_advice`](/sql/system-catalog/mz_introspection/#mz_expected_group_size_advice).
- These may either be bucketed or monotonic (more efficient).
- These may consolidate their output, which will increase memory usage.
- 4. `Collated Multi-GroupAggregate` corresponds to an arbitrary mix of reductions of different types, which will be performed separately and then joined together.
- 5. `Non-incremental GroupAggregate` (e.g., window functions, `list_agg`) corresponds to a single non-incremental aggregation.
- These are the most computationally intensive reductions.
- A parent `Fused Map/Filter/Project` can combine with this operator.
- uses_memory: True
- memory_details: |
- `Distinct` and `Accumulable` aggregates use a moderate amount of memory (proportional to twice the output size).
- `MIN` and `MAX` aggregates can use significantly more memory. This can be improved by including group size hints in the query, see
- [`mz_introspection.mz_expected_group_size_advice`](/sql/system-catalog/mz_introspection/#mz_expected_group_size_advice).
- `Non-incremental` aggregates use memory proportional to the input + output size.
- `Collated` aggregates use memory that is the sum of their constituents, plus some memory for the join at the end.
- expansive: False
- example: |
- ```mzsql
- →Accumulable GroupAggregate
- Simple aggregates: count(*)
- Post-process Map/Filter/Project
- Filter: (#0 > 1)
- ```
- - operator: Reduce
- plan_types: "raw"
- description: |
- Groups the input rows by some scalar expressions, reduces each group using
- some aggregate functions, and produces rows containing the group key and
- aggregate outputs. In the case where the group key is empty and the input
- is empty, returns a single row with the aggregate functions applied to the
- empty input collection.
- uses_memory: True
- memory_details: |
- `SUM`, `COUNT`, and most other aggregations use a moderate amount of memory (proportional either to twice the output size or to input size + output size).
- `MIN` and `MAX` aggregates can use significantly more memory. This can be improved by including group size hints in the query, see
- [`mz_introspection.mz_expected_group_size_advice`](/sql/system-catalog/mz_introspection/#mz_expected_group_size_advice).
- expansive: False
- example: "`Reduce group_by=[#0] aggregates=[max((#0 * #1))]`"
- - operator: Distinct
- plan_types: "optimized"
- description: |
- Alias for a `Reduce` with an empty aggregate list.
- uses_memory: True
- memory_details: |
- Uses memory proportional to twice the output size.
- expansive: False
- example: "`Distinct`"
- - operator: Distinct
- plan_types: "raw"
- description: |
- Removes duplicate copies of input rows.
- uses_memory: True
- memory_details: |
- Uses memory proportional to twice the output size.
- expansive: False
- example: "`Distinct`"
- - operator: TopK
- plan_types: "optimized,raw"
- description: |
- Groups the input rows by some scalar expressions, sorts each group using the group key, removes the top `offset` rows in each group, and returns the next `limit` rows.
- uses_memory: True
- memory_details: |
- Can use significant amount as the operator can significantly overestimate
- the group sizes. Consult
- [`mz_introspection.mz_expected_group_size_advice`](/sql/system-catalog/mz_introspection/#mz_expected_group_size_advice).
- expansive: False
- example: "`TopK order_by=[#1 asc nulls_last, #0 desc nulls_first] limit=5`"
- - operator: TopK
- plan_types: "LIR"
- description: |
- Groups the input rows, sorts them according to some ordering, and returns at most `K` rows at some offset from the top of the list, where `K` is some (possibly computed) limit.
- There are three types of `TopK`. Two are special cased for monotonic inputs (i.e., inputs which never retract data).
- 1. `Monotonic Top1`.
- 2. `Monotonic TopK`, which may give an expression indicating the limit.
- 3. `Non-monotonic TopK`, a generic `TopK` plan.
- Each version of the `TopK` operator may include grouping, ordering, and limit directives.
- uses_memory: True
- memory_details: |
- `Monotonic Top1` and `Monotonic TopK` use a moderate amount of memory. `Non-monotonic TopK` uses significantly more memory as the operator can significantly overestimate
- the group sizes. Consult
- [`mz_introspection.mz_expected_group_size_advice`](/sql/system-catalog/mz_introspection/#mz_expected_group_size_advice).
- expansive: False
- example: |
- ```mzsql
- →Consolidating Monotonic TopK
- Order By #1 asc nulls_last, #0 desc nulls_first
- Limit 5
- ```
- - operator: Negate
- plan_types: "optimized,raw"
- description: |
- Negates the row counts of the input. This is usually used in combination with union to remove rows from the other union input.
- uses_memory: False
- memory_details: ""
- expansive: False
- example: "`Negate`"
- - operator: Negate Diffs
- plan_types: "LIR"
- description: |
- Negates the row counts of the input. This is usually used in combination with union to remove rows from the other union input.
- uses_memory: False
- memory_details: ""
- expansive: False
- example: "`→Negate Diffs`"
- - operator: Threshold
- plan_types: "optimized,raw"
- description: |
- Removes any rows with negative counts.
- uses_memory: True
- memory_details: |
- Uses memory proportional to the input and output size, twice.
- expansive: False
- example: "`Threshold`"
- - operator: Threshold Diffs
- plan_types: "LIR"
- description: |
- Removes any rows with negative counts.
- uses_memory: True
- memory_details: |
- Uses memory proportional to the input and output size, twice.
- expansive: False
- example: "`→Threshold Diffs`"
- - operator: Union
- plan_types: "optimized,raw"
- description: |
- Sums the counts of each row of all inputs. (Corresponds to `UNION ALL` rather than `UNION`/`UNION DISTINCT`.)
- uses_memory: True
- memory_details: |
- Moderate use of memory. Some union operators force consolidation, which results in a memory spike, largely at hydration time.
- expansive: False
- example: "`Union`"
- - operator: Union
- plan_types: "LIR"
- description: |
- Combines its inputs into a unified output, emitting one row for each row on any input. (Corresponds to `UNION ALL` rather than `UNION`/`UNION DISTINCT`.)
- uses_memory: True
- memory_details: |
- A `Consolidating Union` will make moderate use of memory, particularly at hydration time. A `Union` that is not `Consolidating` will not consume memory.
- expansive: False
- example: "`→Consolidating Union`"
- - operator: ArrangeBy
- plan_types: "optimized"
- description: |
- Indicates a point that will become an [arrangement](/get-started/arrangements/#arrangements) in the dataflow engine (each `keys` element will be a different arrangement). Note that if an appropriate index already exists on the input or the output of the previous operator is already arranged with a key that is also requested here, then this operator will just pass on that existing arrangement instead of creating a new one.
- uses_memory: True
- memory_details: |
- Depends. If arrangements need to be created, they use memory proportional to the input size.
- expansive: False
- example: "`ArrangeBy keys=[[#0]]`"
- - operator: Arrange
- plan_types: "LIR"
- description: |
- Indicates a point that will become an [arrangement](/get-started/arrangements/#arrangements) in the dataflow engine, i.e., it will consume memory to cache results.
- uses_memory: True
- memory_details: |
- Uses memory proportional to the input size. Note that in the LIR / physical plan, `Arrange`/`ArrangeBy` almost always means that an arrangement will actually be created. (This is in contrast to the "optimized" plan, where an `ArrangeBy` being present in the plan often does not mean that an arrangement will actually be created.)
- expansive: False
- example: |
- ```mzsql
- →Arrange
- Keys: 1 arrangement available, plus raw stream
- Arrangement 0: #0
- ```
- - operator: Unarranged Raw Stream
- plan_types: "LIR"
- description: |
- Indicates a point where data will be streamed (even if it is somehow already arranged).
- uses_memory: False
- expansive: False
- example: "`→Unarranged Raw Stream`"
- - operator: With ... Return ...
- plan_types: "optimized,raw,LIR"
- description: |
- Introduces CTEs, i.e., makes it possible for sub-plans to be consumed multiple times by downstream operators.
- uses_memory: False
- memory_details: ""
- expansive: False
- example: "[See above](#reading-plans)"
|