Catalog tables for Optimizer Notices

Associated:
- MaterializeInc/database-issues#6468 (part of MaterializeInc/console#549)
- MaterializeInc/database-issues#5916 (add redacted_..._sql columns to system catalog)

The Problem

Currently, the optimizer collects and emits notices as a by-product of sequencing or bootstrapping CREATE MATERIALIZED VIEW and CREATE INDEX statements. In order to see a notice, the user has to either:

use psql (or a compatible client) when executing a CREATE statement, or
look at the EXPLAIN PLAN output for the created catalog items.

Applications that want to expose optimizer notices are also stuck with these two choices, making it hard to integrate them in the console UX. In order to lay the foundations for both future UX work and for adding more optimizer notice types (including types not related to a single catalog item) in this document we propose:

A catalog schema for keeping track of optimizer notices.
A vision of how the schema can be integrated into a console-based UX.
A vision of how we will add support for more notice types in the backend.

Some critical (but not obvious at first glance) associated problems that are addressed by necessity in this design doc are stated below.

SOC2 Compliance

Some of the notices print literal constraints appearing in the

CREATE MATERIALIZED VIEW AS <query>
CREATE VIEW AS <query>

statements. Since literal constrains appearing in these statements are classified as "Customer Data", we should obfuscate those in redacted_~ columns similarly to the redacted_create_sql columns proposed in MaterializeInc/database-issues#5916.

Notice state transitions

Note: Issues related to this section are out of scope for the MVP.

Most of the notices will suggest a concrete corrective action that resolves them. However, teaching the system to automatically recognize when a corrective action was taken has some challenges. For example, in a blue-green setup, notices based on production objects should not be handled directly and instead should be resolved in the current development environment.

The proposed design therefore assumes that the console UX will provide workflows where users will explicitly transition the state of a specific notice (similar to workflows on issue tracking websites such as Jira or GitHub).

Lifespan issues

Note: Issues related to this section are out of scope for the MVP.

Even though not all optimizer notices will be scoped to a catalog entry (a materialized view or an index), for those that are we run into the following lifespan issue.

On the one hand, the lifespan of the notice corresponds to the lifespan of the dataflow backing the catalog entry that produced that notice. This is the case because (a) we re-optimize dataflow-backed catalog entries on every environmentd restart, and (b) the notice might disappear or change when we re-optimize the entry due to changes in the optimizer code or in the catalog state.

On the other hand, for a frictionless UX we should be able to remember a user's decision to ignore noisy notices across environmentd restarts.

Consequently, the lifespan of an "ignored notice" should outlive multiple incarnations of the same notice caused by environmentd restart or user-initiated re-creation of the associated catalog entry.

Success Criteria

Stakeholders have agreed on:

The primary use cases and UX workflows that the schema will facilitate.
The design of a catalog schema that contains the optimizer notices corresponding to the current catalog state.
The design of code that will maintain the contents of that schema at runtime, including the mechanisms used to align the externally set notice status with the internal system state.

Out of Scope

Design details about the actual console UX beyond basic wireframe diagrams.
Implementation details about the integration path for optimizer notices associated with SELECT queries (although the design should have at least some idea how these will work out).
Implementation details about the mechanisms that we are going to adopt in order to implement notice lifecycle management in a way that will be able to survive environmentd restarts and DDL re-runs. This will be covered by a future design revision if product decides that state management is a feature that our customers want to see.

Solution Proposal

Console UX and Workflows sketches the main UX workflows that the current design aims to support.
Catalog Schema section describes the design of a catalog schema that contains the optimizer notices corresponding to the current catalog state.
Notice Generation and Management elaborates how notices are generated and managed by the optimizer and adapter code in order to keep the contents of the proposed catalog schema maintained at runtime, including the mechanisms used to align the (externally visible) status attached to a notice with the internal system state.

Console UX and Workflows

This section contains wireframe diagrams that illustrate the console UX and workflows that will be supported by the proposed notices schema. Each wireframe links to an Excalidraw scene which contains the wireframe sources and supports leaving comments directly on the diagram.

Parts highlighted in blue color are not going to be part of the MVP.

Listing all Notices

For the MVP, all notices will be listed in a single page. In the future, we can have scoped versions of this view as a tab on the clusters page.
The customer will be able to change the status of a notice directly from the list view:
- A notice marked as "resolved" indicates that the customer has performed the suggested corrective correction. This does not imply that the notice is already resolved, but that we expect this to happen once the related catalog objects have been (re-)created.
- A notice marked as "ignored" indicates that the user considers it irrelevant or low priority and they don't want to see it in the foreseeable future.
The page will contain an extra filter dropdown that will allow customers to filter out notices with a certain status:
- hide ignored and resolved - probably this should be the default,
- hide ignored - probably most useful for field engineers using the impersonated console,
- show all.

Inspecting a Notice

The Notice and the Hint sections correspond to what is currently rendered in the EXPLAIN output. In addition, the detailed view contains a Corrective Action statement which contains concrete instructions that will resolve the issue. If possible, the corrective action is given as a concrete SQL snippet, otherwise it is a valid markdown string.
Literals referenced in the notice, hint, and action fields are redacted (replaced by █) unless the feature is used by a Materialize superuser.
Every detailed view has a "Learn more" link below the notice section that references the public docs for this notice type. The notice docs are structured similar to the Rust error codes page and will provide more context, a minimal reproducing example, and a corrective action for the example for each notice type for customers that want to understand where and why a specific notice is emitted.

Catalog Schema

As part of the feature we will add the following builtin objects to mz_internal.

`mz_optimizer_notices`

Optimizer notices are exposed through a new BuiltinTable that contains an unfiltered list of all optimizer notices emitted by the optimizer when planning materialized views and indexes. The table contents are maintained directly by the adapter. The table has the following schema.

Field	Type	Meaning
`notice_type`	`text`	The notice type. Each type will correspond to a Materialize docs sub-page.
`message`	`text`	A brief description of the issue highlighted by this notice.
`hint`	`text`	A high-level hint that tells the user what can be improved.
`action`	`text`	A concrete action that will resolve the notice.
`redacted_message`	`text`	A redacted version of the `message` column. `NULL` if no redaction is needed.
`redacted_hint`	`text`	A redacted version of the `hint` column. `NULL` if no redaction is needed.
`redacted_action`	`text`	A redacted version of the `action` column. `NULL` if no redaction is needed.
`action_type`	`text`	The type of the `action` string (`sql_statements` for a valid SQL string or `plain_text` for plain text).
`object_id`	`text`	The ID of the materialized view or index. Corresponds to `mz_objects.id`. For global notices, this column is `NULL`.
`dependency_ids`	`text list`	A list of dependency IDs that need to exist for this notice to be still valid. Corresponds to `mz_objects.id`.
`created_at`	`timestamp with time zone`	The time at which the notice was created.
`updated_at`	`timestamp with time zone`	The time at which the notice was last updated.
`fingerprint`	`uuid`	A fingerprint for this notice computed in a way that survives re-creation of all associated objects using identical DDL statements.

`mz_notices`

The data from mz_optimizer_notices table is integrated into a mz_notices view which has identical schema and in the long run is meant to be defined as follows:

notices_part_1 UNION ALL
notices_part_2 UNION ALL
...
notices_part_n

where each of the n inputs corresponds to a catalog entry that provides notices managed by a different subsystem (optimizer, sources and sinks, adapter). At the moment, however, the view is defined with n = 1 and only selects from mz_optimizer_notices.

`mz_ignored_optimizer_notices`

Note: this section is out of scope for the MVP.

A new BuiltinSource used to keep track of ignored notices. Because of the fingerprint characteristics ignored notices will continue to be ignored even if we re-create the objects associated via object_id and dependency_ids as long as we use identical DDL statements.

Notice Generation and Management

Notice Generation

An optimizer pass produces a vector of notices as a side effect of an optimization run. In order to persist those notices in mz_optimizer_notices we need to be able to emit retractions for a previously inserted notice (for example when we delete the associated catalog item). However, the notice structure of the notices generated by the optimizer is not suitable for emitting retractions - for example:

It doesn't have an optional created_at field.
Some of the text fields need to be resolved against catalog items that might be renamed or removed in the meantime.

In order to protect ourselves against invalid retraction issues arising from these problems we make a distinction between the notice type produced by the optimizer (RawOptimizerNotice) and the notice type stored in the CatalogPlans struct (OptimizerNotice).

Further, the type of the notices stored in DataflowMetainfo::optimizer_notices depends on the execution stage - within the optimizer we still need to use RawOptimizerNotice, but before storing the final result in the CatalogPlans struct we need to move to a OptimizerNotice vector (done with a CatalogState::render_notices call). We use the CatalogPlans entries to generate insertions and deletions in the mz_optimizer_notices table.

Also, since we expect to have more than two notice types in the long term, we reorganize the code a bit so adding a new notice is easier in the future. The required steps are as follows:

Create a mz_transform::notice::<notice_type> submodule.
Define a struct for the new notice in that module.
Implement OptimizerNoticeApi for that struct.
Re-export the notice type in mz_transform::notice.
Add the notice type to the raw_optimizer_notices macro which generates the RawOptimizerNotice enum and other boilerplate code.

`mz_optimizer_notices` maintenance

Notices associated with an object_id or dependency_ids are immediately removed from the BuiltinTable when the associated catalog entry is deleted. We install a CatalogState::pack_optimizer_notices helper that is

Called with diff = 1:
- In Coordinator::bootstrap for each re-hydrated materialized view and index.
- In Coordinator::sequence_create_materialized_view for new entries.
- In Coordinator::sequence_create_index for new entries.
Called with diff = -1:
- For each id in drop_ids towards the end of Catalog::transact.

`mz_ignored_optimizer_notices` maintenance

Note: this section is out of scope for the MVP.

The design here follows the pattern adopted by MaterializeInc/materialize#21379.

The BuiltinSource is maintained by API extensions in the StorageController in the sequence_~ methods handling the SQL syntax extensions proposed below:

A SQL statement that marks a notice as ignored results in an insertion added to the source.
A SQL statement that marks a notice as no longer ignored results in a retraction added to the source.

SQL syntax extensions

Note: this section is out of scope for the MVP.

In order to support the ability to ignore notices as advertised in the Listing all Notices diagram we need a SQL extensions to be wired with the console dropdown. To facilitate this we propose the following syntax:

ALTER NOTICE <fingerprint> SET STATUS = '<status>';
ALTER NOTICE <fingerprint> RESET STATUS;

The new SQL command will translate into changes into the base tables contributing to mz_notices. In order to do that, we will extend the Coordinator with sequence_ method that handle the corresponding plan type as follows:

If the status is new or we have a RESET plan:
- Update the notice in the CatalogPlans in-memory state.
- Change the mz_optimizer_notices entry.
- Remove the <fingerprint> entry from mz_ignored_optimizer_notices.
If the status is resolved:
- Update the notice in the CatalogPlans in-memory state.
- Change the mz_optimizer_notices entry.
- Remove the <fingerprint> entry from mz_ignored_optimizer_notices.
If the status is ignored:
- Add a <fingerprint> entry to mz_ignored_optimizer_notices.

Minimal Viable Prototype

A prototype of the backend part of the design proposed above can be found in #23360.

The current prototype does not implement the following parts of the proposed design:

The mz_ignored_optimizer_notices builtin source.
Filtering based on fingerprint in the mz_notices.
The proposed SQL syntax extensions.

RBAC handling

Currently, Builtin objects are created with a sensitivity field that is assigned a DataSensitivity value.

/// The extent to which data in a builtin object
/// should be considered "sensitive" and therefore
/// access to it restricted.
#[derive(Clone, Debug, Hash, Serialize)]
pub enum DataSensitivity {
    /// Any user may query the object.
    Public,
    /// Superusers or Materialize staff may query the object.
    SuperuserAndSupport,
    /// Only superusers may query the object.
    Superuser,
}

Some of the builtin objects proposed in the Catalog Schema (such as mz_optimizer_notices) are only meant to store raw data and support the construction of user-facing builtin views. Those can be all defined with sensitivity DataSensitivity::Superuser.

For the MVP, we decided to go with a combination of a Superuser and SuperuserAndSupport policies where:

mz_optimizer_notices uses Superuser.
mz_optimizer_notices_redacted uses SuperuserAndSupport. The view is redacted by re-binding ~ to redacted_~.

This means that.

☑ All notices will be only to superusers.
☑ All notices will be only to Materialize support engineers.
☑ The design is SOC2 compliant.
☒ Regular users and will see redacted hints by default and might require assistance from a superuser if there are ambiguities.

Discussed Alternatives can be found in the alternatives section.

In the long term, we will most probably adopt the dedicated roles mechanism proposed by MaterializeInc/database-issues#7261 in order to allow admins to grant read access to other users.

Alternatives

There are no fundamentally different architectures, but several places of the outlined proposal can be implemented in a different way. Those are listed here along with reasons why we opted for the current design.

Eager cleanup of notices when a dependency is deleted

There is a design choice in how to ensure that notices that are no longer valid don't show up in mz_optimizer_notices. A notice is no longer valid when either its dependant object_id or one of its dependency_ids disappears.

We can maintain this in two ways:

In the adapter by emitting retractions for notices whenever the associated catalog object is deleted. At the moment we do this for object_id.
In the view maintenance DDL by filtering out mz_optimizer_notices entries that are not present in mz_objects. At the moment we do this for dependency_ids.

This asymmetry is a bit odd. In the long term we should handle both cases in the same way unless there are some very good reasons against that.

More efficient `notice_type` and `action_type` fields

We can change the type of these fields to be int instead of string and define auxiliary BuiltinTable sources that are populated with the currently available sources on startup.

The current design is simpler and optimizes for speed of delivering the MVP. This alternative is a performance optimization that can be done as a follow-up if needed.

RBAC handling alternatives

Besides the accepted RBAC handling alternative, we have considered the following two options.

Use DataSensitivity::Public everywhere.
- ☑ All notices will be visible to all Materialize users.
- ☒ The mz_notices view might leak information about DDL statements in the text fields of notices scoped to these statements (for example the hint text might leak a literal constraint from a WHERE clause for a CREATE MATERIALIZED VIEW statement).
Use DataSensitivity::SuperuserAndSupport everywhere.
- ☑ All notices will be visible to all Materialize users.
- ☒ Notices will be visible to non-superuser customers only from the UX which will run SQL queries under the mz_support user for everybody.
- ☒ The UX might leak information by listing notices for catalog items that are not owned by the current user with the same implications as above.

Open questions

What to use to compute the fingerprint?

GlobalID values (survive restarts but not DDL evolution).
Names of objects (ultimately they are also subject to change).
Alpha-equivalent versions of the create SQL.
Re-associating once an object has been re-created (REPLAN) would render the first two of the above alternatives useless.

If we use an extra BuiltinSource for the fingerprints of ignored statements, how are we ensure that we periodically clean up the ignored table?

Why not save the notices in a BuiltinTable?

Changes required for Platform V2

The current design is in general compatible with the "Platform V2" effort, but the implementation will need to be adapted.

More specifically, in Platform V2 we expect to have multiple coordd instances that will hydrate identical in-memory version of the persisted catalog. DDL statements will be routed to one of the available coordd instances which will act as a leader and perform the catalog transaction. Follower instances will get notified about the catalog change and will have to react to an Op corresponding to a new materialized view or index entry by populating their internal memory structures with identical copies of the catalog plans and associated notices. In terms of BuiltinTable maintenance, however, follower instances will have to be careful not to emit insertions or retractions against mz_optimizer_notices.

20231113_optimizer_notice_catalog.md 27 KB History Raw

Catalog tables for Optimizer Notices

The Problem

SOC2 Compliance

Notice state transitions

Lifespan issues

Success Criteria

Out of Scope

Solution Proposal

Console UX and Workflows

Listing all Notices

Inspecting a Notice

Catalog Schema

mz_optimizer_notices

mz_notices

mz_ignored_optimizer_notices

Notice Generation and Management

Notice Generation

mz_optimizer_notices maintenance

mz_ignored_optimizer_notices maintenance

SQL syntax extensions

Minimal Viable Prototype

RBAC handling

Alternatives

Eager cleanup of notices when a dependency is deleted

More efficient notice_type and action_type fields

RBAC handling alternatives

Open questions

Changes required for Platform V2

20231113_optimizer_notice_catalog.md 27 KB

History Raw

`mz_optimizer_notices`

`mz_notices`

`mz_ignored_optimizer_notices`

`mz_optimizer_notices` maintenance

`mz_ignored_optimizer_notices` maintenance

More efficient `notice_type` and `action_type` fields