explain_plan_operators.yml 19 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445
  1. operators:
  2. - operator: Constant
  3. plan_types: "optimized,raw"
  4. description: |
  5. Always produces the same collection of rows.
  6. uses_memory: False
  7. memory_details: ""
  8. expansive: False
  9. example: |
  10. ```mzsql
  11. Constant
  12. - ((1, 2) x 2)
  13. - (3, 4)
  14. ```
  15. - operator: Constant
  16. plan_types: "LIR"
  17. description: |
  18. Always produces the same collection of rows.
  19. uses_memory: False
  20. memory_details: ""
  21. expansive: False
  22. example: "`→Constant (2 rows)`"
  23. - operator: Get
  24. plan_types: "optimized,raw"
  25. description: |
  26. Produces rows from either an existing relation (source/view/materialized view/table) or from a previous
  27. CTE in the same plan.
  28. uses_memory: False
  29. memory_details: ""
  30. expansive: False
  31. example: "`Get materialize.public.ordered`"
  32. - operator: Stream, Arranged, Index Lookup, Read
  33. plan_types: "LIR"
  34. description: |
  35. Produces rows from either an existing relation (source/view/materialized view/table) or from a previous
  36. CTE in the same plan.
  37. A parent `Fused Map/Filter/Project` operator can combine with this operator.
  38. There are four types of `Get`.
  39. 1. `Stream` indicates that the results are not [arranged](/get-started/arrangements/#arrangements) in memory
  40. and will be streamed directly.
  41. 2. `Arranged` indicates that the results are [arranged](/get-started/arrangements/#arrangements) in memory.
  42. 2. `Index Lookup` indicates the results will be
  43. _looked up_ in an existing [arrangement]((/get-started/arrangements/#arrangements).
  44. 3. `Read` indicates that the results are unarranged,
  45. and will be processed as they arrive.
  46. uses_memory: False
  47. memory_details: ""
  48. expansive: False
  49. example: "`Arranged materialize.public.t`"
  50. - operator: Project
  51. plan_types: "optimized,raw"
  52. description: |
  53. Produces a subset of the [columns](#explain-plan-columns) in the input
  54. rows. See also [column numbering](#explain-plan-columns).
  55. uses_memory: False
  56. memory_details: ""
  57. expansive: False
  58. expansive_details: |
  59. Each row has _less_ data (i.e., shorter rows, but same number of rows).
  60. example: "`Project (#2, #3)`"
  61. - operator: Map
  62. plan_types: "optimized,raw"
  63. description: |
  64. Appends the results of some scalar expressions to each row in the input.
  65. uses_memory: False
  66. memory_details: ""
  67. expansive: True
  68. expansive_details: |
  69. Each row has more data (i.e., longer rows but same number of rows).
  70. example: "`Map (((#1 * 10000000dec) / #2) * 1000dec)`"
  71. - operator: Map/Filter/Project
  72. plan_types: "LIR"
  73. description: |
  74. Computes new columns (maps), filters columns, and projects away columns. Works row-by-row.
  75. Maps and filters will be printed, but projects will not.
  76. These may be marked as **`Fused`** `Map/Filter/Project`, which means they will combine with the operator beneath them to run more efficiently.
  77. uses_memory: False
  78. memory_details: ""
  79. expansive: True
  80. expansive_details: |
  81. Each row may have more data, from the `Map`.
  82. Each row may also have less data, from the `Project`.
  83. There may be fewer rows, from the `Filter`.
  84. example: |
  85. ```mzsql
  86. →Map/Filter/Project
  87. Filter: (#0{a} < 7)
  88. Map: (#0{a} + #1{b})
  89. ```
  90. - operator: FlatMap
  91. plan_types: "optimized"
  92. description: |
  93. Appends the result of some (one-to-many) [table function](/sql/functions/#table-functions) to each row in the input.
  94. uses_memory: False
  95. memory_details: ""
  96. expansive: True
  97. expansive_details: |
  98. Depends on the [table function](/sql/functions/#table-functions) used.
  99. example: "`FlatMap jsonb_foreach(#3)`"
  100. - operator: Table Function
  101. plan_types: "LIR"
  102. description: |
  103. Appends the result of some (one-to-many) [table function](/sql/functions/#table-functions) to each row in the input.
  104. A parent `Fused Table Function unnest_list` operator will fuse with its child `GroupAggregate` operator. Fusing these operator is part of how we efficiently compile window functions from SQL to dataflows.
  105. A parent `Fused Map/Filter/Project` can combine with this operator.
  106. uses_memory: False
  107. memory_details: ""
  108. expansive: True
  109. expansive_details: |
  110. Depends on the [table function](/sql/functions/#table-functions) used.
  111. example: |
  112. ```mzsql
  113. →Table Function generate_series(#0{a}, #1{b}, 1)
  114. Input key: (#0{a})
  115. ```
  116. - operator: CallTable
  117. plan_types: "raw"
  118. description: |
  119. Appends the result of some (one-to-many) [table function](/sql/functions/#table-functions) to each row in the input.
  120. uses_memory: False
  121. memory_details: ""
  122. expansive: True
  123. expansive_details: |
  124. Depends on the [table function](/sql/functions/#table-functions) used.
  125. example: "`CallTable generate_series(1, 7, 1)`"
  126. - operator: Filter
  127. plan_types: "optimized,raw"
  128. description: |
  129. Removes rows of the input for which some scalar predicates return `false`.
  130. uses_memory: False
  131. memory_details: ""
  132. expansive: False
  133. example: "`Filter (#20 < #21)`"
  134. expansive_details: |
  135. May reduce the number of rows.
  136. - operator: ~Join
  137. plan_types: "raw"
  138. description: |
  139. Performs one of `INNER` / `LEFT` / `RIGHT` / `FULL OUTER` / `CROSS` join on the two inputs, using the given predicate.
  140. uses_memory: True
  141. memory_details: |
  142. Uses memory proportional to the input sizes, unless [the inputs have appropriate indexes](/transform-data/optimization/#join). Certain joins with more than 2 inputs use additional memory, see details in the optimized plan.
  143. expansive: True
  144. expansive_details: |
  145. For `CrossJoin`s, Cartesian product of the inputs (|N| x |M|). Note that, in many cases, a join that shows up as a cross join in the RAW PLAN will actually be turned into an inner join in the OPTIMIZED PLAN, by making use of an equality WHERE condition.
  146. For other join types, depends on the join order and facts about the joined collections.
  147. example: "`InnerJoin (#0 = #2)`"
  148. - operator: Join
  149. plan_types: "optimized"
  150. description: |
  151. Returns combinations of rows from each input whenever some equality predicates are `true`.
  152. uses_memory: True
  153. memory_details: |
  154. The `Join` operator itself uses memory only for `type=differential` with more than 2 inputs.
  155. However, `Join` operators need [arrangements](/get-started/arrangements/#arrangements) on their inputs (shown by the `ArrangeBy` operator).
  156. These arrangements use memory proportional to the input sizes. If an input has an [appropriate index](/transform-data/optimization/#join), then the arrangement of the index will be reused.
  157. expansive: True
  158. expansive_details: |
  159. Depends on the join order and facts about the joined collections.
  160. example: "`Join on=(#1 = #2) type=delta`"
  161. - operator: Differential Join, Delta Join
  162. plan_types: "LIR"
  163. description: |
  164. Both join operators indicate the join ordering selected.
  165. Returns combinations of rows from each input whenever some equality predicates are `true`.
  166. Joins will indicate the join order of their children, starting from 0.
  167. For example, `Differential Join %1 » %0` will join its second child into its first.
  168. The [two joins differ in performance characteristics](/transform-data/optimization/#join).
  169. uses_memory: True
  170. memory_details: |
  171. Uses memory for 3-way or more differential joins.
  172. expansive: True
  173. expansive_details: |
  174. Depends on the join order and facts about the joined collections.
  175. example: |
  176. ```mzsql
  177. →Differential Join %1 » %0
  178. Join stage %0: Lookup key #0{a} in %0
  179. ```
  180. - operator: CrossJoin
  181. plan_types: "optimized"
  182. description: |
  183. An alias for a `Join` with an empty predicate (emits all combinations). Note that not all cross joins are marked
  184. as `CrossJoin`: In a join with more than 2 inputs, it can happen that there is a cross join between some of the inputs.
  185. You can recognize this case by `ArrangeBy` operators having empty keys, i.e., `ArrangeBy keys=[[]]`.
  186. uses_memory: True
  187. memory_details: |
  188. Uses memory for 3-way or more differential joins.
  189. expansive: True
  190. expansive_details: |
  191. Cartesian product of the inputs (|N| x |M|).
  192. example: "`CrossJoin type=differential`"
  193. - operator: Reduce
  194. plan_types: "optimized"
  195. description: |
  196. Groups the input rows by some scalar expressions, reduces each group using some aggregate functions, and produces rows containing the group key and aggregate outputs.
  197. uses_memory: True
  198. memory_details: |
  199. `SUM`, `COUNT`, and most other aggregations use a moderate amount of memory (proportional either to twice the output size or to input size + output size).
  200. `MIN` and `MAX` aggregates can use significantly more memory. This can be improved by including group size hints in the query, see
  201. [`mz_introspection.mz_expected_group_size_advice`](/sql/system-catalog/mz_introspection/#mz_expected_group_size_advice).
  202. expansive: False
  203. example: "`Reduce group_by=[#0] aggregates=[max((#0 * #1))]`"
  204. - operator: GroupAggregate
  205. plan_types: "LIR"
  206. description: |
  207. Groups the input rows by some scalar expressions, reduces each group using some aggregate functions, and produces rows containing the group key and aggregate outputs.
  208. There are five types of `GroupAggregate`, ordered by increasing complexity:
  209. 1. `Distinct GroupAggregate` corresponds to the SQL `DISTINCT` operator.
  210. 2. `Accumulable GroupAggregate` (e.g., `SUM`, `COUNT`) corresponds to several easy to implement aggregations that can be executed simultaneously and efficiently.
  211. 3. `Hierarchical GroupAggregate` (e.g., `MIN`, `MAX`) corresponds to an aggregation requiring a tower of arrangements. These can be either monotonic (more efficient) or bucketed. These may benefit from a hint; [see `mz_introspection.mz_expected_group_size_advice`](/sql/system-catalog/mz_introspection/#mz_expected_group_size_advice).
  212. These may either be bucketed or monotonic (more efficient).
  213. These may consolidate their output, which will increase memory usage.
  214. 4. `Collated Multi-GroupAggregate` corresponds to an arbitrary mix of reductions of different types, which will be performed separately and then joined together.
  215. 5. `Non-incremental GroupAggregate` (e.g., window functions, `list_agg`) corresponds to a single non-incremental aggregation.
  216. These are the most computationally intensive reductions.
  217. A parent `Fused Map/Filter/Project` can combine with this operator.
  218. uses_memory: True
  219. memory_details: |
  220. `Distinct` and `Accumulable` aggregates use a moderate amount of memory (proportional to twice the output size).
  221. `MIN` and `MAX` aggregates can use significantly more memory. This can be improved by including group size hints in the query, see
  222. [`mz_introspection.mz_expected_group_size_advice`](/sql/system-catalog/mz_introspection/#mz_expected_group_size_advice).
  223. `Non-incremental` aggregates use memory proportional to the input + output size.
  224. `Collated` aggregates use memory that is the sum of their constituents, plus some memory for the join at the end.
  225. expansive: False
  226. example: |
  227. ```mzsql
  228. →Accumulable GroupAggregate
  229. Simple aggregates: count(*)
  230. Post-process Map/Filter/Project
  231. Filter: (#0 > 1)
  232. ```
  233. - operator: Reduce
  234. plan_types: "raw"
  235. description: |
  236. Groups the input rows by some scalar expressions, reduces each group using
  237. some aggregate functions, and produces rows containing the group key and
  238. aggregate outputs. In the case where the group key is empty and the input
  239. is empty, returns a single row with the aggregate functions applied to the
  240. empty input collection.
  241. uses_memory: True
  242. memory_details: |
  243. `SUM`, `COUNT`, and most other aggregations use a moderate amount of memory (proportional either to twice the output size or to input size + output size).
  244. `MIN` and `MAX` aggregates can use significantly more memory. This can be improved by including group size hints in the query, see
  245. [`mz_introspection.mz_expected_group_size_advice`](/sql/system-catalog/mz_introspection/#mz_expected_group_size_advice).
  246. expansive: False
  247. example: "`Reduce group_by=[#0] aggregates=[max((#0 * #1))]`"
  248. - operator: Distinct
  249. plan_types: "optimized"
  250. description: |
  251. Alias for a `Reduce` with an empty aggregate list.
  252. uses_memory: True
  253. memory_details: |
  254. Uses memory proportional to twice the output size.
  255. expansive: False
  256. example: "`Distinct`"
  257. - operator: Distinct
  258. plan_types: "raw"
  259. description: |
  260. Removes duplicate copies of input rows.
  261. uses_memory: True
  262. memory_details: |
  263. Uses memory proportional to twice the output size.
  264. expansive: False
  265. example: "`Distinct`"
  266. - operator: TopK
  267. plan_types: "optimized,raw"
  268. description: |
  269. Groups the input rows by some scalar expressions, sorts each group using the group key, removes the top `offset` rows in each group, and returns the next `limit` rows.
  270. uses_memory: True
  271. memory_details: |
  272. Can use significant amount as the operator can significantly overestimate
  273. the group sizes. Consult
  274. [`mz_introspection.mz_expected_group_size_advice`](/sql/system-catalog/mz_introspection/#mz_expected_group_size_advice).
  275. expansive: False
  276. example: "`TopK order_by=[#1 asc nulls_last, #0 desc nulls_first] limit=5`"
  277. - operator: TopK
  278. plan_types: "LIR"
  279. description: |
  280. Groups the input rows, sorts them according to some ordering, and returns at most `K` rows at some offset from the top of the list, where `K` is some (possibly computed) limit.
  281. There are three types of `TopK`. Two are special cased for monotonic inputs (i.e., inputs which never retract data).
  282. 1. `Monotonic Top1`.
  283. 2. `Monotonic TopK`, which may give an expression indicating the limit.
  284. 3. `Non-monotonic TopK`, a generic `TopK` plan.
  285. Each version of the `TopK` operator may include grouping, ordering, and limit directives.
  286. uses_memory: True
  287. memory_details: |
  288. `Monotonic Top1` and `Monotonic TopK` use a moderate amount of memory. `Non-monotonic TopK` uses significantly more memory as the operator can significantly overestimate
  289. the group sizes. Consult
  290. [`mz_introspection.mz_expected_group_size_advice`](/sql/system-catalog/mz_introspection/#mz_expected_group_size_advice).
  291. expansive: False
  292. example: |
  293. ```mzsql
  294. →Consolidating Monotonic TopK
  295. Order By #1 asc nulls_last, #0 desc nulls_first
  296. Limit 5
  297. ```
  298. - operator: Negate
  299. plan_types: "optimized,raw"
  300. description: |
  301. Negates the row counts of the input. This is usually used in combination with union to remove rows from the other union input.
  302. uses_memory: False
  303. memory_details: ""
  304. expansive: False
  305. example: "`Negate`"
  306. - operator: Negate Diffs
  307. plan_types: "LIR"
  308. description: |
  309. Negates the row counts of the input. This is usually used in combination with union to remove rows from the other union input.
  310. uses_memory: False
  311. memory_details: ""
  312. expansive: False
  313. example: "`→Negate Diffs`"
  314. - operator: Threshold
  315. plan_types: "optimized,raw"
  316. description: |
  317. Removes any rows with negative counts.
  318. uses_memory: True
  319. memory_details: |
  320. Uses memory proportional to the input and output size, twice.
  321. expansive: False
  322. example: "`Threshold`"
  323. - operator: Threshold Diffs
  324. plan_types: "LIR"
  325. description: |
  326. Removes any rows with negative counts.
  327. uses_memory: True
  328. memory_details: |
  329. Uses memory proportional to the input and output size, twice.
  330. expansive: False
  331. example: "`→Threshold Diffs`"
  332. - operator: Union
  333. plan_types: "optimized,raw"
  334. description: |
  335. Sums the counts of each row of all inputs. (Corresponds to `UNION ALL` rather than `UNION`/`UNION DISTINCT`.)
  336. uses_memory: True
  337. memory_details: |
  338. Moderate use of memory. Some union operators force consolidation, which results in a memory spike, largely at hydration time.
  339. expansive: False
  340. example: "`Union`"
  341. - operator: Union
  342. plan_types: "LIR"
  343. description: |
  344. Combines its inputs into a unified output, emitting one row for each row on any input. (Corresponds to `UNION ALL` rather than `UNION`/`UNION DISTINCT`.)
  345. uses_memory: True
  346. memory_details: |
  347. A `Consolidating Union` will make moderate use of memory, particularly at hydration time. A `Union` that is not `Consolidating` will not consume memory.
  348. expansive: False
  349. example: "`→Consolidating Union`"
  350. - operator: ArrangeBy
  351. plan_types: "optimized"
  352. description: |
  353. Indicates a point that will become an [arrangement](/get-started/arrangements/#arrangements) in the dataflow engine (each `keys` element will be a different arrangement). Note that if an appropriate index already exists on the input or the output of the previous operator is already arranged with a key that is also requested here, then this operator will just pass on that existing arrangement instead of creating a new one.
  354. uses_memory: True
  355. memory_details: |
  356. Depends. If arrangements need to be created, they use memory proportional to the input size.
  357. expansive: False
  358. example: "`ArrangeBy keys=[[#0]]`"
  359. - operator: Arrange
  360. plan_types: "LIR"
  361. description: |
  362. Indicates a point that will become an [arrangement](/get-started/arrangements/#arrangements) in the dataflow engine, i.e., it will consume memory to cache results.
  363. uses_memory: True
  364. memory_details: |
  365. Uses memory proportional to the input size. Note that in the LIR / physical plan, `Arrange`/`ArrangeBy` almost always means that an arrangement will actually be created. (This is in contrast to the "optimized" plan, where an `ArrangeBy` being present in the plan often does not mean that an arrangement will actually be created.)
  366. expansive: False
  367. example: |
  368. ```mzsql
  369. →Arrange
  370. Keys: 1 arrangement available, plus raw stream
  371. Arrangement 0: #0
  372. ```
  373. - operator: Unarranged Raw Stream
  374. plan_types: "LIR"
  375. description: |
  376. Indicates a point where data will be streamed (even if it is somehow already arranged).
  377. uses_memory: False
  378. expansive: False
  379. example: "`→Unarranged Raw Stream`"
  380. - operator: With ... Return ...
  381. plan_types: "optimized,raw,LIR"
  382. description: |
  383. Introduces CTEs, i.e., makes it possible for sub-plans to be consumed multiple times by downstream operators.
  384. uses_memory: False
  385. memory_details: ""
  386. expansive: False
  387. example: "[See above](#reading-plans)"