20230717_alter_set_cluster.md 4.1 KB

Moving source and sinks

Issue: #17417

Context

Materialize's API today allows users to specify a target cluster for sources, sinks and materialized views on creation, but does not allow for changing the cluster at a later time. To change the cluster, users have to drop and recreate the object.

This is problematic:

  • Sources almost always have downstream dependencies that need to be recreated, too.
  • Sinks require users to recreate the topic in the downstream Kafka broker.
  • Materialized views can have downstream dependencies.

Goals

  • Users can change the cluster on which a source, sink, or materialized view runs.

Non-Goals

  • Mixing compute and storage workload on the same cluster.

Overview

We'll add support for the following declarative management commands to Materialize:

-- Move a source to cluster `clstr`:
ALTER SOURCE src SET CLUSTER clstr;

-- Move a sink to cluster `c2`:
ALTER SINK snk SET CLUSTER c2;

-- Move a materialized view to cluster `prod`:
ALTER MATERIALIZED VIEW my_view SET CLUSTER prod;

This corresponds to the CREATE syntax where the user specifies the cluster with an CLUSTER parameter.

Note that this is intentionally lacking parens (ALTER SOURCE src SET (CLUSTER = clstr);). This is for consistency with the analogous PostgreSQL command ALTER TABLE ... SET {SCHEMA|TABLESPACE} {schema|tblspc}. The rule is that only parameters configured in the WITH block go inside the parens; parameters that have dedicated syntax, like schemas, tablespaces, and clusters, get dedicated ALTER ... SET syntax too.

Detailed description

The ALTER SOURCE, ALTER SINK and ALTER MATERIALIZED VIEW statements will learn a new option SET CLUSTER clstr:

  • The cluster name clstr must match an existing cluster.
  • The usual constraints apply. For sources and sinks, the cluster must either be empty or only host sources and sinks. The replication factor must be 0 or 1. For materialized views, the cluster must not host sources or sinks.
  • Clusters can be managed or unmanaged.

When the command updates the cluster, Materialize will immediately drop the existing source, sink, or materialized view and recreate it in the target cluster.

Warning

In the initial implementation, changing the cluster associated will result in downtime while the object recreates. During this time, the source, sink, or materialized view will not produce new data.

We hope to change the behavior in a future release so that no downtime is incurred, but this will require smarter scheduling.

Users may not schedule compute objects together with sources and sinks. Attempting to schedule incompatible objects will result in an error like the following:

ALTER SOURCE src SET CLUSTER compute_cluster;
ERROR: cannot alter source cluster to cluster containing indexes or materialized views

Users may not schedule sources or sinks on clusters with an incompatible replication factor. Attempting to do so will result in an error like the following:

ALTER SOURCE src SET CLUSTER cluster;
ERROR: cannot alter source cluster to cluster with more than one replica

Users cannot move system objects to different clusters, and users cannot move user objects to system clusters.

Linked clusters

Sources and sinks can have linked clusters, which are clusters owned by the source or sink. Altering clusters of sources and sinks will cause any previously linked cluster to be dropped.

We do not support moving a source or sink back to a linked cluster. If there is the need to move sources or sinks back to a linked cluster, we could use the following syntax:

ALTER SOURCE src RESET CLUSTER

Alternatives

  • We could wait to improve the API until we have unified clusters.

  • We could support moving a source or sink to a linked cluster when either a size value was previously specified, or explicitly set on the ALTER command.

SET IN CLUSTER

We rejected the following alternative syntax using IN:

ALTER SOURCE src IN CLUSTER clstr;

This is to avoid potential future conflict with setting parameters that explicitly require IN.