Summary

WITH MUTUALLY RECURSIVE (WMR) currently runs until fixpoint. It would be useful to let the user limit the number of iterations, e.g., to make it easier to debug WMR queries.

I propose implementing only a soft limit for now. By soft, I mean that upon reaching the limit, we would stop iterating and simply consider the current state to be the final result (rather than throwing an error). The default would be no limit. The user can specify a limit using a SQL syntax extension at the level of WMR blocks. Therefore, each WMR block can have its own limit.

Motivation

I can imagine 3 use cases, of which the second seems urgent to me:

  1. Stopping divergent queries (#4869): It is very easy to accidentally write a WMR query that runs forever (actually, until OOM in most cases, but that would often take a very long time). We used to have the additional problem that Ctrl+C (or dropping a materialized view or index) didn't cancel a dataflow (#835), which made for quite an unpleasant user experience, as the user had to manually DROP the entire replica. However, proper dataflow cancellation for WMR queries has been recently implemented, and therefore this use case of recursion limits is not so important anymore.
  2. Debugging state between iterations (#5409): WMR queries are not so obvious to write, so users often need to debug their queries during development. In such cases, it can be enlightening to inspect the intermediate states between iterations. If we had an option to limit the number of iterations, the user could just run the query repeatedly with successively larger limits. Multiple people expressed their wish for this feature, e.g., here.
  3. Algorithms requiring a fixed number of iterations: Some algorithms are unintuitive to express as a loop running until a fixpoint (or reaching a fixpoint can't be ensured), and instead need a loop that runs a specific number of times. Note that in the above two use cases, we don't expect to run into the limit in production under normal operation, but in this use case a limit will be part of the logic of production queries.

Explanation

What to do when we reach the limit? -- hard limit and soft limit

Edit: In the meantime, we are referring to these as "erroring limit" and "non-erroring limit" (internally), because "hard limit" suggests something that can't be changed.

When a computation reaches the limit, we could either gracefully stop and simply output the current state of the WMR variables as the final result of the WMR clause, or we could error out. Interestingly, we have conflicting requirements for the different use cases mentioned above:

  • For use cases 2 and 3, we obviously need graceful stopping.
  • For use case 1, I'd argue that erroring out is important. This is because in use case 1, the query hits the limit unexpectedly. This means that if we didn't error out, the user might not notice that the limit was hit, and just keep working with the result under the incorrect assumption that it's a fixpoint.

If we wanted to support all three use cases, then we would need two different kinds of limits to accommodate the conflicting requirements above: a hard limit and a soft limit with different effects upon reaching the limit:

  • For the hard limit, the query would error out, thus making a noise to the user.
  • For the soft limit, we just gracefully stop and output the current state as the final result.

Edit: We have implemented both the soft and the hard limit. The hard limit will be the default when the user doesn't specify the kind of limit, because other systems also have hard limits.

Syntax

An example of the proposed syntax:

WITH MUTUALLY RECURSIVE (ITERATION LIMIT 100)
  cnt (i int) AS (
    SELECT 1 AS i
    UNION
    SELECT i+1 FROM cnt)
SELECT * FROM cnt;

Our WMR implementation supports having multiple WMR blocks inside a single query. An example is computing strongly connected components, which can be implemented with nested WMRs. In the case of multiple WMR blocks, the above syntax allows setting different limits for each WMR block.

Having separate limits for separate WMR blocks in one query is especially important for the compositionality of views. Let's say that Alice writes a view that has just a single WMR. Then Bob writes another query that uses Alice's view, and has a WMR itself. The view is inlined, so Bob's query now has two WMRs. But Bob shouldn't need to care about the internals of Alice's view, and might not even know that the view has a WMR, so he thinks his WMR is the only one. And then he sets a limit for the entire query, which will unexpectedly mess up Alice's WMR that is inlined from inside the view.
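To make this concrete, here is a sketch under assumed names (edges and alice_reach are hypothetical): Alice's view carries its own per-block limit, and Bob's WMR block carries a different one; after Alice's view is inlined, the query contains two WMR blocks, each keeping its own limit.

-- Alice's view (hypothetical table edges(src int, dst int)):
CREATE VIEW alice_reach AS
  WITH MUTUALLY RECURSIVE (ITERATION LIMIT 100)
    reach (src int, dst int) AS (
      SELECT src, dst FROM edges
      UNION
      SELECT r.src, e.dst FROM reach r JOIN edges e ON r.dst = e.src)
  SELECT * FROM reach;

-- Bob's query; its own limit doesn't affect the WMR inlined from alice_reach:
WITH MUTUALLY RECURSIVE (ITERATION LIMIT 5)
  hops (dst int, n int) AS (
    SELECT dst, 1 FROM alice_reach WHERE src = 1
    UNION
    SELECT a.dst, h.n + 1 FROM hops h JOIN alice_reach a ON h.dst = a.src)
SELECT * FROM hops;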

The syntax follows our SQL Design Principles:

  • The options block is parenthesized.
  • There is a space between ITERATION and LIMIT.
  • We accept an optional = before the number.
  • We use the generate_extracted_config! macro to process the options.

Edit: We finalized the syntax to [STOP AT | ERROR AT] RECURSION LIMIT, see the (very long) discussion on Slack. An observation from Nikhil was that if the main WMR keywords have the word RECURSIVE, then we might not want to say ITERATION here, because these are often opposing terms, so having both present might confuse users. (An observation from Jan was that the word LIMIT might not be needed, because we can explain this to users as doing exactly this number of iterations, which is the same thing as stopping at either a fixpoint or this number of iterations. Edit: I was afraid that users might not make this extra mental leap, and might get concerned that superfluous iterations are happening.)
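To illustrate the finalized keywords, a sketch reusing the counting example from above (not authoritative; the exact behaviors follow the descriptions earlier in this document):

-- Non-erroring limit: stop after 100 iterations and return the current state.
WITH MUTUALLY RECURSIVE (STOP AT RECURSION LIMIT 100)
  cnt (i int) AS (SELECT 1 AS i UNION SELECT i+1 FROM cnt)
SELECT * FROM cnt;

-- Erroring limit: error out if no fixpoint is reached within 100 iterations.
WITH MUTUALLY RECURSIVE (ERROR AT RECURSION LIMIT 100)
  cnt (i int) AS (SELECT 1 AS i UNION SELECT i+1 FROM cnt)
SELECT * FROM cnt;

-- Per the edit above, omitting STOP AT / ERROR AT defaults to the erroring behavior.
WITH MUTUALLY RECURSIVE (RECURSION LIMIT 100)
  cnt (i int) AS (SELECT 1 AS i UNION SELECT i+1 FROM cnt)
SELECT * FROM cnt;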

Rendering

The rendering part of the design is simple and uncontroversial. We will use branch_when (from Timely) to access the pointstamps containing the iteration numbers (after the consolidation of the result of an iteration). This gives us two streams: rows that are within the limit, and rows that are outside it. The latter we can either throw away or route into the error stream, depending on whether we want to gracefully stop at the limit with the current state as the final result (soft limit), or error out when reaching the limit (hard limit).

IRs

At the AST and HIR levels we can simply add a field to our WMR block representation. However, in MIR we have the problem that if we simply put one limit into each WMR block, then limit settings would not survive NormalizeLets. This is because NormalizeLets sometimes merges WMR blocks (and moves bindings around in some other ways). For example, if there are two independent WMR clauses (whose results eventually flow together into, e.g., a union or join), then NormalizeLets merges these two WMR blocks. In this case, setting just one limit of this merged WMR clause would be wrong.
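As a (hypothetical) sketch of the problem, consider two independent WMR blocks with different limits whose results flow together into a UNION ALL; if NormalizeLets merges them into one LetRec, no single block-level limit can represent both settings, whereas per-binding limits can:

SELECT * FROM (
  WITH MUTUALLY RECURSIVE (RECURSION LIMIT 10)
    a (i int) AS (SELECT 1 AS i UNION SELECT i + 1 FROM a)
  SELECT * FROM a)
UNION ALL
SELECT * FROM (
  WITH MUTUALLY RECURSIVE (RECURSION LIMIT 1000)
    b (i int) AS (SELECT 1 AS i UNION SELECT i + 1 FROM b)
  SELECT * FROM b);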

To solve the above problem, I propose to have a per-let-binding limit in MIR and LIR (and in rendering), so that different let bindings that came from different WMR clauses with differing limits could retain their differing limits. This needs some improvements in NormalizeLets:

  • We need to attend to the LetRec construction in NormalizeLets at the end of digest_lets. We need to come up with limits for each element of bindings. What goes into bindings is what digest_lets_helper puts into bindings and worklist. Smartening the insertion into worklist is easy, because that is a non-recursive Let, so the limit can be infinite. Smartening the insertion into bindings is also easy, because the binding directly comes from a LetRec binding, from which we could copy the limit.
  • In some situations, we modify the list of bindings of an existing LetRec in-place:
    • The inlining doesn't need any changes, because the limit is only relevant when a back-edge goes out from a binding (the rendering puts the limit handling only on such edges), but in this case the binding can't be inlined anyway.
    • At the end of action, we add bindings from the result of a harvest_non_recursive. Again, non-recursive bindings can have an infinite limit.
    • post_order_harvest_lets also adds bindings from harvest_non_recursive.
    • harvest_non_recursive itself just removes bindings.

An additional difficulty is that various optimizer transforms have fixpoint loops that simulate some aspects of a LetRec execution. We will need to be careful in each of these. The current (or upcoming) ones that I'm aware of are:

  • ColumnKnowledge: We can just lower max_iterations to the smallest of the limits of all the bindings, because ColumnKnowledge is already prepared to handle an early exit that doesn't exactly simulate the fixpoint semantics of LetRec.
  • FoldConstants: This one will have to precisely handle the limits of each binding.

We should make MIR EXPLAIN specially handle the case when all bindings of a LetRec have the same limit, and print the limit just once on the LetRec, to avoid confusing users in simple cases.

LIR EXPLAIN should print the limit of a binding only when it is finite.

Future Work: Hard limit

As mentioned before, we could also implement a hard limit, building on the implementation of the soft limit. The main difference would be in the rendering, as already mentioned above.

Before we had proper dataflow cancellation for WMR queries, a default hard limit would have been important, but now it's not so urgent anymore. It is not even clear whether we need it at all. A default hard limit could prevent the dataflow from OOMing in some cases, but the set of scenarios where this actually happens is very narrow:

  • If the dataflow operates on a small dataset (e.g., a toy dataset during the development of a query), then it will take a long time to OOM, because each iteration has a non-trivial time overhead.
  • On a large dataset, the dataflow will probably OOM before reaching the limit. E.g., let's say the user expects tens of iterations, but the query diverges, and we have a default limit of 1000. In this case, the memory utilisation will be more than an order of magnitude larger than what the user expected.

If we implement a hard limit, we should probably set a finite default value: users hit the hard limit unexpectedly in most cases, so they wouldn't have set a limit explicitly themselves. Unfortunately, determining the default value seems to be not obvious at all.

First of all, note that a default limit over something like 100000 is not very useful, because then even a very simple WMR query takes tens of seconds to reach the limit, even with a single worker thread, due to each iteration having a fixed overhead. (And the overhead would probably be larger for realistic WMR queries, and when running on multiple threads/machines.)

One might think that having a default limit of 10000 would be totally uncontroversial, because nobody would ever hit it with a correct query on correct input data. However, Frank says he likes his iterative dataflows untamed, and makes a good point with an example use case: "Let's say you want to do reachability / distances on a road network (which doesn't have exponential growth), where edges are roads sliced up to 100 feet." In graph queries, the number of iterations often depends on the diameter of the input graph, which will be quite high in the above example.

Several other systems are surprisingly strict about their default limits: In Snowflake and Google BigQuery, the limit is only a few hundred, and the limit can only be modified by support. I'm not sure why these systems are so strict about this, but it doesn't feel right to me. I think we should make the default easily overridable by users. (And Frank agrees.) We could simply have the same syntax for the hard limit as for the soft limit. Also note that our recursive query support is much more general than other systems, so we shouldn't necessarily follow their path.

We might also add a feature flag for a default hard limit, so that we can change it without a release.

Alternatives

Where to set the option?

System / session variable

Alternatively, we could set the limits in a system / session variable. Note that neither of these would be suitable for use case 3.

A system variable applies to all queries in the entire system. (Even if we make a dataflow pick up the current value only at dataflow creation, this can affect all queries if a restart suddenly happens.) Therefore, a system variable is not suitable for use cases 2 and 3. However, it is suitable for use case 1 (the hard limit).

Most system variables can't be modified by a user on her own, but only by support. However, there is a precedent for a system variable that is modifiable by users. As mentioned above, we would like the hard limit to be easily modifiable by users, and therefore if we decide to put the hard limit in a system variable, we should definitely make it modifiable by users.

Session variables have the problem that a custom value will be lost across environmentd restarts, meaning that it is impossible to install a view where the limit is different from the default, and have the custom limit stick between restarts. (We could conceivably add the value to the catalog along with the query's SQL definition, but this is probably not worth the trouble.) Therefore, a session variable wouldn't be suitable for use cases 1 (the hard limit) and 3, because for these one has to raise the limit for a production view in a manner that survives restarts. It might be kind of ok for use case 2 (debugging): one can temporarily change the limit, and run the query that is being debugged. However, there is still the risk that somebody forgets to change the limit back. Also note that this wouldn't allow for different limits in different WMR blocks inside the same query.

In general, it's inadvisable for system / session variables to affect query semantics in a major way, so we rejected this option.

Query-level option

As discussed above, we would like to make it possible to set different limits for different WMR blocks inside the same query, which wouldn't be possible with a query-level option.

Also note that we still wouldn't avoid adding a new options clause, because the current OPTIONS clause is attached to a Select block as opposed to a Query block.

WHERE clause with a special, unmaterializable function

Theoretically, we could put the limit in WHERE clauses with a special, unmaterializable function, e.g., WHERE iteration_number() < 100. However, this would have the difficulty that PredicatePushdown would happily push iteration_number() < 100 all the way to the sources, and therefore out of the recursion. We could of course change PredicatePushdown to have a special case for expressions involving iteration_number(), but the problem is that it's hard to know what other transforms might have similar problems.
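For illustration, such a query might look as follows (iteration_number() is purely hypothetical and does not exist). Note that nothing in the predicate refers to the recursive binding, which is exactly why PredicatePushdown would be tempted to push it below, and thus out of, the recursion:

WITH MUTUALLY RECURSIVE
  cnt (i int) AS (
    SELECT 1 AS i
    UNION
    SELECT i + 1 FROM cnt
    WHERE iteration_number() < 100)
SELECT * FROM cnt;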

Our current UnmaterializableFunc enum uses the term "unmaterializable" as a substitute for "depends on the read-only environmentd context" (that is, such functions can be substituted at query optimization / compilation time). However, the proposed iteration_number() function goes even further, as it depends on the execution context of the rendered dataflow (so its value can only be substituted at runtime). This entails that our function evaluation logic would need to be augmented with some notion of a runtime context.

One trick would be to implement an HIR rewrite to an extra binding that counts iterations (see the Workaround section). However, currently, this would have performance problems.

Workaround: Manually setting a limit using existing SQL building blocks

Users can manually write SQL that counts the number of iterations, and converts reaching the iteration count limit into a fixpoint (sketched below). However, it would be cumbersome for users to do this every time they want to debug a query. Also note that, currently, this has a problem for use case 3: it involves a cross join with a singleton collection, which would need a broadcast join to avoid grabbing everything onto one worker.
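A minimal sketch of such a manual limit, based on the counting example from the Syntax section: an extra binding iter counts iterations, and cnt adds new rows only while the counter is below the limit, so the block reaches a fixpoint instead of diverging. The subquery over iter is the singleton that gets cross joined, as mentioned above.

WITH MUTUALLY RECURSIVE
  -- iter accumulates 1..k after k iterations and stops growing at 100.
  iter (i int) AS (
    SELECT 1 AS i
    UNION
    SELECT i + 1 FROM iter WHERE i < 100),
  cnt (i int) AS (
    SELECT 1 AS i
    UNION
    SELECT i FROM cnt -- retain the rows accumulated so far
    UNION
    -- Add new rows only while the iteration counter is below the limit.
    SELECT cnt.i + 1
    FROM cnt, (SELECT max(i) AS cur FROM iter) AS it
    WHERE it.cur < 100)
SELECT * FROM cnt;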

Testing and observability

We will add the limit setting to EXPLAIN, so that tests can verify that the limit settings are correctly picked up.

We'll need several tests that involve multiple WMR clauses in various relative positions (independent, sequenced, nested) to verify that the limits are not messed up by optimizer transforms.

We'll need to test that the soft limit does the correct thing when reaching the limit, i.e., not erroring out but emitting the current state.

We should keep in mind that some optimizer transforms perform loops when analyzing and transforming a LetRec, and in these cases they should respect the limits.

Drawbacks

The bookkeeping for different limits for different WMR blocks inside the same query introduces a slight maintenance burden in some optimizer transforms (mainly in NormalizeLets).

Unresolved questions

What should be the keyword? This also depends on what the final keywords will be for WITH MUTUALLY RECURSIVE itself. (The latest suggestion was WITH REPEATEDLY.)

Do we also want a hard limit later? If yes, what should be the default value?