redacted_..._sql
columns to system catalog)Currently, the optimizer collects and emits notices as a by-product of
sequencing or bootstrapping CREATE MATERIALIZED VIEW
and CREATE INDEX
statements. In order to see a notice, the user has to either:
psql
(or a compatible client) when executing a CREATE
statement, orEXPLAIN PLAN
output for the created catalog items.Applications that want to expose optimizer notices are also stuck with these two choices, making it hard to integrate them in the console UX. In order to lay the foundations for both future UX work and for adding more optimizer notice types (including types not related to a single catalog item) in this document we propose:
Some critical (but not obvious at first glance) associated problems that are addressed by necessity in this design doc are stated below.
Some of the notices print literal constraints appearing in the
CREATE MATERIALIZED VIEW AS <query>
CREATE VIEW AS <query>
statements. Since literal constrains appearing in these statements are
classified as "Customer Data", we should obfuscate those in
redacted_~
columns similarly to the redacted_create_sql
columns proposed in
MaterializeInc/database-issues#5916.
Note: Issues related to this section are out of scope for the MVP.
Most of the notices will suggest a concrete corrective action that resolves them. However, teaching the system to automatically recognize when a corrective action was taken has some challenges. For example, in a blue-green setup, notices based on production objects should not be handled directly and instead should be resolved in the current development environment.
The proposed design therefore assumes that the console UX will provide workflows where users will explicitly transition the state of a specific notice (similar to workflows on issue tracking websites such as Jira or GitHub).
Note: Issues related to this section are out of scope for the MVP.
Even though not all optimizer notices will be scoped to a catalog entry (a materialized view or an index), for those that are we run into the following lifespan issue.
On the one hand, the lifespan of the notice corresponds to the lifespan of the
dataflow backing the catalog entry that produced that notice. This is the case
because (a) we re-optimize dataflow-backed catalog entries on every
environmentd
restart, and (b) the notice might disappear or change when we
re-optimize the entry due to changes in the optimizer code or in the catalog
state.
On the other hand, for a frictionless UX we should be able to remember a user's
decision to ignore noisy notices across environmentd
restarts.
Consequently, the lifespan of an "ignored notice" should outlive multiple
incarnations of the same notice caused by environmentd
restart or
user-initiated re-creation of the associated catalog entry.
Stakeholders have agreed on:
SELECT
queries (although the design should have at least
some idea how these will work out).This section contains wireframe diagrams that illustrate the console UX and workflows that will be supported by the proposed notices schema. Each wireframe links to an Excalidraw scene which contains the wireframe sources and supports leaving comments directly on the diagram.
Parts highlighted in blue color are not going to be part of the MVP.
EXPLAIN
output. In addition, the detailed view contains a Corrective
Action statement which contains concrete instructions that will resolve the
issue. If possible, the corrective action is given as a concrete SQL snippet,
otherwise it is a valid markdown string.notice
, hint
, and action
fields are redacted
(replaced by █) unless the feature is used by a Materialize superuser.As part of the feature we will add the following builtin objects to
mz_internal
.
mz_optimizer_notices
Optimizer notices are exposed through a new BuiltinTable
that contains an
unfiltered list of all optimizer notices emitted by the optimizer when planning
materialized views and indexes. The table contents are maintained directly by
the adapter. The table has the following
schema.
Field | Type | Meaning |
---|---|---|
notice_type |
text |
The notice type. Each type will correspond to a Materialize docs sub-page. |
message |
text |
A brief description of the issue highlighted by this notice. |
hint |
text |
A high-level hint that tells the user what can be improved. |
action |
text |
A concrete action that will resolve the notice. |
redacted_message |
text |
A redacted version of the message column. NULL if no redaction is needed. |
redacted_hint |
text |
A redacted version of the hint column. NULL if no redaction is needed. |
redacted_action |
text |
A redacted version of the action column. NULL if no redaction is needed. |
action_type |
text |
The type of the action string (sql_statements for a valid SQL string or plain_text for plain text). |
object_id |
text |
The ID of the materialized view or index. Corresponds to mz_objects.id . For global notices, this column is NULL . |
dependency_ids |
text list |
A list of dependency IDs that need to exist for this notice to be still valid. Corresponds to mz_objects.id . |
created_at |
timestamp with time zone |
The time at which the notice was created. |
updated_at |
timestamp with time zone |
The time at which the notice was last updated. |
fingerprint |
uuid |
A fingerprint for this notice computed in a way that survives re-creation of all associated objects using identical DDL statements. |
mz_notices
The data from mz_optimizer_notices
table is integrated into a mz_notices
view
which has identical schema and in the long run is meant to be defined as
follows:
notices_part_1 UNION ALL
notices_part_2 UNION ALL
...
notices_part_n
where each of the n
inputs corresponds to a catalog entry that provides
notices managed by a different subsystem (optimizer, sources and sinks,
adapter). At the moment, however, the view is defined with n = 1
and only
selects from mz_optimizer_notices
.
mz_ignored_optimizer_notices
Note: this section is out of scope for the MVP.
A new BuiltinSource
used to keep track of ignored notices. Because of the
fingerprint
characteristics ignored notices will continue to be ignored even
if we re-create the objects associated via object_id
and dependency_ids
as
long as we use identical DDL statements.
An optimizer pass produces a vector of notices as a side effect of an
optimization run. In order to persist those notices in
mz_optimizer_notices
we need to be able to emit retractions for a
previously inserted notice (for example when we delete the associated catalog
item). However, the notice structure of the notices generated by the optimizer
is not suitable for emitting retractions - for example:
created_at
field.In order to protect ourselves against invalid retraction issues arising from
these problems we make a distinction between the notice type produced by the
optimizer (RawOptimizerNotice
) and the notice type stored in the
CatalogPlans
struct (OptimizerNotice
).
Further, the type of the notices stored in DataflowMetainfo::optimizer_notices
depends on the execution stage - within the optimizer we still need to use
RawOptimizerNotice
, but before storing the final result in the CatalogPlans
struct we need to move to a OptimizerNotice
vector (done with a
CatalogState::render_notices
call). We use the CatalogPlans
entries to
generate insertions and deletions in the mz_optimizer_notices
table.
Also, since we expect to have more than two notice types in the long term, we reorganize the code a bit so adding a new notice is easier in the future. The required steps are as follows:
mz_transform::notice::<notice_type>
submodule.OptimizerNoticeApi
for that struct.mz_transform::notice
.raw_optimizer_notices
macro which generates the
RawOptimizerNotice
enum and other boilerplate code.mz_optimizer_notices
maintenanceNotices associated with an object_id
or dependency_ids
are immediately
removed from the BuiltinTable
when the associated catalog entry is deleted. We
install a CatalogState::pack_optimizer_notices
helper that is
diff = 1
:
Coordinator::bootstrap
for each re-hydrated materialized view and index.Coordinator::sequence_create_materialized_view
for new entries.Coordinator::sequence_create_index
for new entries.diff = -1
:
id
in drop_ids
towards the end of Catalog::transact
.mz_ignored_optimizer_notices
maintenanceNote: this section is out of scope for the MVP.
The design here follows the pattern adopted by MaterializeInc/materialize#21379.
The BuiltinSource
is maintained by API extensions in the StorageController
in the sequence_~
methods handling the SQL syntax
extensions proposed below:
Note: this section is out of scope for the MVP.
In order to support the ability to ignore notices as advertised in the Listing all Notices diagram we need a SQL extensions to be wired with the console dropdown. To facilitate this we propose the following syntax:
ALTER NOTICE <fingerprint> SET STATUS = '<status>';
ALTER NOTICE <fingerprint> RESET STATUS;
The new SQL command will translate into changes into the base tables
contributing to mz_notices
. In order to do
that, we will extend the Coordinator
with sequence_
method that handle the
corresponding plan type as follows:
new
or we have a RESET
plan:
CatalogPlans
in-memory state.mz_optimizer_notices
entry.<fingerprint>
entry from mz_ignored_optimizer_notices
.resolved
:
CatalogPlans
in-memory state.mz_optimizer_notices
entry.<fingerprint>
entry from mz_ignored_optimizer_notices
.ignored
:
<fingerprint>
entry to mz_ignored_optimizer_notices
.A prototype of the backend part of the design proposed above can be found in #23360.
The current prototype does not implement the following parts of the proposed design:
mz_ignored_optimizer_notices
builtin
source.fingerprint
in the mz_notices
.Currently, Builtin
objects are created with a sensitivity
field that
is assigned a DataSensitivity
value.
/// The extent to which data in a builtin object
/// should be considered "sensitive" and therefore
/// access to it restricted.
#[derive(Clone, Debug, Hash, Serialize)]
pub enum DataSensitivity {
/// Any user may query the object.
Public,
/// Superusers or Materialize staff may query the object.
SuperuserAndSupport,
/// Only superusers may query the object.
Superuser,
}
Some of the builtin objects proposed in the Catalog Schema
(such as mz_optimizer_notices
) are only meant to store raw data and
support the construction of user-facing builtin views. Those can be all defined
with sensitivity DataSensitivity::Superuser
.
For the MVP, we decided to go with a combination of a Superuser
and
SuperuserAndSupport
policies where:
mz_optimizer_notices
uses Superuser
.mz_optimizer_notices_redacted
uses SuperuserAndSupport
. The view is redacted by
re-binding ~
to redacted_~
.This means that.
Discussed Alternatives can be found in the alternatives section.
In the long term, we will most probably adopt the dedicated roles mechanism proposed by MaterializeInc/database-issues#7261 in order to allow admins to grant read access to other users.
There are no fundamentally different architectures, but several places of the outlined proposal can be implemented in a different way. Those are listed here along with reasons why we opted for the current design.
There is a design choice in how to ensure that notices that are no longer valid
don't show up in mz_optimizer_notices
. A notice is no longer valid when
either its dependant object_id
or one of its dependency_ids
disappears.
We can maintain this in two ways:
object_id
.mz_optimizer_notices
entries that are not present in mz_objects
. At the moment we do this for
dependency_ids
.This asymmetry is a bit odd. In the long term we should handle both cases in the same way unless there are some very good reasons against that.
notice_type
and action_type
fieldsWe can change the type of these fields to be int
instead of string
and
define auxiliary BuiltinTable
sources that are populated with the currently
available sources on startup.
The current design is simpler and optimizes for speed of delivering the MVP. This alternative is a performance optimization that can be done as a follow-up if needed.
Besides the accepted RBAC handling alternative, we have considered the following two options.
DataSensitivity::Public
everywhere.
mz_notices
view might leak information about DDL
statements in the text fields of notices scoped to these statements (for
example the hint
text might leak a literal constraint from a WHERE
clause for a CREATE MATERIALIZED VIEW
statement).DataSensitivity::SuperuserAndSupport
everywhere.
mz_support
user for everybody.What to use to compute the fingerprint
?
GlobalID
values (survive restarts but not DDL evolution).If we use an extra BuiltinSource
for the fingerprints of ignored statements,
how are we ensure that we periodically clean up the ignored table?
Why not save the notices in a BuiltinTable
?
The current design is in general compatible with the "Platform V2" effort, but the implementation will need to be adapted.
More specifically, in Platform V2 we expect to have multiple coordd
instances
that will hydrate identical in-memory version of the persisted catalog. DDL
statements will be routed to one of the available coordd
instances which will
act as a leader and perform the catalog transaction. Follower instances will get
notified about the catalog change and will have to react to an Op
corresponding to a new materialized view or index entry by populating their
internal memory structures with identical copies of the catalog plans and
associated notices. In terms of BuiltinTable
maintenance, however, follower
instances will have to be careful not to emit insertions or retractions against
mz_optimizer_notices
.