Epic: #19547
Materialize's API today requires that users control a cluster's size and replication factor by manually managing its replicas. The following is a quick tour of the SQL commands used for these operations.
To create a cluster, users must explicitly specify the replica set:
-- Create a cluster with two replicas of different sizes.
CREATE CLUSTER c REPLICAS (
    r1 (SIZE 'small'),
    r2 (SIZE 'medium')
);
To decrease the replication factor of a cluster, they must choose which replica to discard:
-- Decrease the replication factor of `c` to 1.
DROP CLUSTER REPLICA c.r2;
To increase the replication factor, they must create the appropriate number of new replicas:
-- Increase the replication factor of `c` to 3.
CREATE CLUSTER REPLICA c.r2 SIZE 'small';
CREATE CLUSTER REPLICA c.r3 SIZE 'small';
To resize a cluster, they must create new replicas at the new size with the desired replication factor:
-- Change the size of `c` from `small` to `medium`.
-- Create new bigger replicas.
CREATE CLUSTER REPLICA c.r1_big SIZE 'medium';
CREATE CLUSTER REPLICA c.r2_big SIZE 'medium';
CREATE CLUSTER REPLICA c.r3_big SIZE 'medium';
-- Drop old smaller replicas.
DROP CLUSTER REPLICA c.r1;
DROP CLUSTER REPLICA c.r2;
DROP CLUSTER REPLICA c.r3;
We've received overwhelming feedback from education, field engineering, and end users that manual management of replicas is confusing and/or frustrating. Questions have included:
- r1, r2, ..., why do I need to specify their names?
- ALTER CLUSTER ... SET (SIZE)?
The key source of friction is that users think in terms of the fundamental properties of a cluster (its size and its replication factor), but today's API requires that users imperatively manage replicas rather than declaratively specify the cluster's size and replication factor.
We've always intended to build a declarative API on top of the existing API. The original cluster replica design from April 2022 sketched this declarative API that would allow users to create a cluster with three replicas that automatically scaled up and down in response to load:
CREATE CLUSTER foo REPLICATION FACTOR 3, MIN SIZE 'small', MAX SIZE 'xlarge'
As described, this API requires "dynamic cluster scheduling", which is difficult and has many unknowns, so we intentionally chose to not work on this in the past year.
The need for improvement has recently become more urgent, as the designers of both our Terraform provider and our web console have considered layering on a better cluster management experience in their respective tools. But the right place to solve this is in the database itself, so that we don't need to write bespoke logic in each tool that is downstream of the SQL API, and so that users who use the SQL API directly get the improved experience too.
So, this design document proposes a new course: a new cluster management API that solves most of the ergonomic issues with the existing API but does not require the full generality of dynamic cluster scheduling.
We'll add support for the following declarative cluster management commands to Materialize:
-- Create a managed cluster with one small replica.
CREATE CLUSTER foo SIZE 'small';
-- Size up the cluster. IN THE FUTURE this will gracefully
-- change the size without downtime. AT PRESENT, this will cause
-- downtime for as long as it takes the new replica(s) to rehydrate.
--
-- If you NEED graceful resizing, convert this cluster to an unmanaged
-- cluster and handle dropping/creating replicas yourself.
ALTER CLUSTER foo SET (SIZE 'medium');
-- Add two new replicas automatically.
ALTER CLUSTER foo SET (REPLICATION FACTOR 3);
-- Turn off the cluster for the night.
ALTER CLUSTER foo SET (REPLICATION FACTOR 0);
-- You can also create a cluster with multiple replicas from the get go.
CREATE CLUSTER foo2 SIZE 'small', REPLICATION FACTOR 2;
We'll introduce the concept of a managed cluster. A managed cluster is one with a declared size and replication factor, where Materialize is responsible for ensuring the replica set matches the declared size and replication factor. The replicas of a managed cluster are visible in the system catalog, but cannot be directly modified by users.
Existing clusters, where users control the replica set manually via {CREATE|DROP} CLUSTER REPLICA, will be deemed unmanaged clusters.
CREATE CLUSTER
The CREATE CLUSTER statement will learn three new options:
- SIZE, a string, which specifies the desired size of the replicas in the cluster.
- MANAGED, a boolean, which specifies whether the cluster is managed or unmanaged.
- REPLICATION FACTOR, an integer, which specifies the desired number of replicas.
In addition to this, the CREATE CLUSTER statement will learn specialized configuration parameters for replicas to control introspection, the arrangement idle merge effort, and availability zone.
Users must specify either the SIZE or REPLICAS option when creating a cluster. The two options may not be specified simultaneously.
The MANAGED option must be true if SIZE is specified or false if REPLICAS is specified. It takes on the appropriate default if unspecified. We expect that this option will never be explicitly specified in practice, but we accept it in order to be consistent with ALTER CLUSTER.
The REPLICATION FACTOR option may only be specified if MANAGED is true. If unspecified, it defaults to 1.
For managed clusters, Materialize will automatically create replicas named r1, r2, ..., rN, where N is the desired replication factor, all of the size specified by the SIZE option.
Note: The documentation should call out that the replica naming scheme may change in future versions of Materialize.
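The option rules above can be modeled in a few lines. The following is a minimal Python sketch, not Materialize's implementation: the names plan_create_cluster and ClusterPlan, the error strings, and the rejection of negative replication factors are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ClusterPlan:
    """Hypothetical result of resolving CREATE CLUSTER options."""
    managed: bool
    size: Optional[str]
    replication_factor: Optional[int]
    replica_names: List[str]


def plan_create_cluster(
    size: Optional[str] = None,
    replicas: Optional[Dict[str, str]] = None,  # replica name -> size
    managed: Optional[bool] = None,
    replication_factor: Optional[int] = None,
) -> ClusterPlan:
    # Exactly one of SIZE and REPLICAS must be specified.
    if (size is None) == (replicas is None):
        raise ValueError("specify exactly one of SIZE or REPLICAS")

    # MANAGED defaults to true with SIZE and to false with REPLICAS;
    # an explicit value must agree with that default.
    implied_managed = size is not None
    if managed is None:
        managed = implied_managed
    elif managed != implied_managed:
        raise ValueError("MANAGED conflicts with the SIZE/REPLICAS option")

    if not managed:
        # REPLICATION FACTOR is only valid for managed clusters.
        if replication_factor is not None:
            raise ValueError("REPLICATION FACTOR requires a managed cluster")
        return ClusterPlan(False, None, None, list(replicas))

    # REPLICATION FACTOR defaults to 1.
    n = 1 if replication_factor is None else replication_factor
    if n < 0:
        # Assumption: the design does not discuss negative values.
        raise ValueError("REPLICATION FACTOR must be non-negative")

    # Managed replicas are named r1, r2, ..., rN.
    names = [f"r{i}" for i in range(1, n + 1)]
    return ClusterPlan(True, size, n, names)
```

For example, plan_create_cluster(size="small", replication_factor=2) yields a managed plan with replicas r1 and r2, while passing both size and replicas raises an error.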
For unmanaged clusters, CREATE CLUSTER's behavior remains unchanged.
Note: Managed clusters cannot be used to host sources and sinks, nor can they be used as linked clusters.
ALTER CLUSTER
We'll extend the ALTER CLUSTER statement to support altering the SIZE and REPLICATION FACTOR of a cluster using the usual syntax for ALTER ... SET and ALTER ... RESET commands:
-- Change the size of a cluster to `medium`.
ALTER CLUSTER c SET (SIZE = 'medium');
-- Change the replication factor of a cluster to 2.
ALTER CLUSTER c SET (REPLICATION FACTOR = 2);
-- Simultaneously change the size and replication factor of a cluster.
ALTER CLUSTER c SET (SIZE = 'medium', REPLICATION FACTOR = 2);
-- Change the replication factor of a cluster to the default value of 1.
ALTER CLUSTER c RESET (REPLICATION FACTOR);
When the command updates the cluster's size, Materialize will immediately drop all existing replicas of the cluster and recreate them with the same names with the new size.
Warning
In the initial implementation, changing a cluster's size will result in downtime while the new replicas rehydrate. This may be surprising to users.
We hope to change the behavior in a future release so that no downtime is incurred, but this will require dynamic cluster scheduling.
When the command increases the cluster's replication factor, Materialize will immediately create as many new replicas as necessary to meet the desired replication factor. It will name the replicas such that the resulting replica set adheres to the r1, r2, ..., rN naming scheme.
When the command decreases the cluster's replication factor, Materialize will immediately drop as many replicas as necessary to meet the desired replication factor. It will choose replicas to drop so that the resulting replica set adheres to the r1, r2, ..., rN naming scheme.
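Both directions reduce to the same computation: compare the existing replica names against the target set r1, ..., rN. A minimal Python sketch follows; the function name reconcile_replication_factor and the rejection of negative values are assumptions for illustration, not part of the proposal.

```python
from typing import List, Tuple


def reconcile_replication_factor(
    existing: List[str], desired: int
) -> Tuple[List[str], List[str]]:
    """Return (to_create, to_drop) so that the result is exactly r1..r<desired>."""
    if desired < 0:
        # Assumption: the design does not discuss negative values.
        raise ValueError("replication factor must be non-negative")

    target = [f"r{i}" for i in range(1, desired + 1)]
    target_set = set(target)
    existing_set = set(existing)

    # Create every target name that does not exist yet.
    to_create = [name for name in target if name not in existing_set]
    # Drop every existing replica that falls outside the target set.
    to_drop = [name for name in existing if name not in target_set]
    return to_create, to_drop


# Scaling r1 up to a replication factor of 3 creates r2 and r3.
print(reconcile_replication_factor(["r1"], 3))              # (['r2', 'r3'], [])
# Scaling r1..r3 down to 1 drops r2 and r3, keeping r1.
print(reconcile_replication_factor(["r1", "r2", "r3"], 1))  # ([], ['r2', 'r3'])
```

Setting the replication factor to 0 drops every replica, which matches the "turn off the cluster for the night" use case.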
Users may only modify these options for managed clusters. Attempting to set the size or replication factor for an unmanaged cluster will result in an error like the following:
> CREATE CLUSTER c REPLICAS ();
> ALTER CLUSTER c SET (SIZE = 'medium');
ERROR: cannot set SIZE option for unmanaged cluster "c"
HINT: Manually manage replicas using CREATE CLUSTER REPLICA and DROP CLUSTER REPLICA,
or convert the cluster to a managed cluster by running ALTER CLUSTER c SET (MANAGED).
ALTER CLUSTER will support enabling the MANAGED option for a cluster to convert an unmanaged cluster to a managed cluster:
ALTER CLUSTER c SET (MANAGED = true);
ALTER CLUSTER c SET (MANAGED); -- Shorthand for the above
Conversion is possible if and only if all of the following conditions are met:
- All replicas have the same size S.
- The replicas are named r1, r2, ..., rN, with no gaps in the numbering. (Zero replicas trivially match this pattern with N=0.)
The SIZE option of the cluster will be set to S and the REPLICATION FACTOR will be set to N.
Because the conditions for conversion are strict, the error message identifies which condition failed, as in the following examples:
> CREATE CLUSTER c REPLICAS (
    r1 (SIZE 'small'),
    r2 (SIZE 'medium')
);
> ALTER CLUSTER c SET (MANAGED);
ERROR: cannot convert cluster "c" to a managed cluster
DETAIL: Replicas do not all have the same size.
> CREATE CLUSTER c REPLICAS (
    foo (SIZE 'small'),
    bar (SIZE 'small'),
    r3 (SIZE 'small')
);
> ALTER CLUSTER c SET (MANAGED);
ERROR: cannot convert cluster "c" to a managed cluster
DETAIL: The following replicas do not match required naming pattern `r1`, `r2`, ...
    foo
    bar
> CREATE CLUSTER c REPLICAS (
    r1 (SIZE 'small'),
    r3 (SIZE 'medium')
);
> ALTER CLUSTER c SET (MANAGED);
ERROR: cannot convert cluster "c" to a managed cluster
DETAIL: The replicas do not match required naming pattern `r1`, `r2`, .... The
first missing replica is "r2".
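The conversion conditions and the three failure cases above can be expressed as a single check. The following Python sketch is illustrative only: check_convertible is a hypothetical name, the error strings paraphrase the messages shown above, and checking names before sizes is an assumption chosen to agree with the examples. The design does not say what SIZE becomes for a cluster with zero replicas, so the sketch returns None in that case.

```python
import re
from typing import Dict, Optional, Tuple

# Managed replica names look like r1, r2, ... (no leading zeros).
REPLICA_NAME = re.compile(r"r[1-9][0-9]*")


def check_convertible(replicas: Dict[str, str]) -> Tuple[Optional[str], int]:
    """Map replica name -> size to the (S, N) pair a managed cluster would get.

    Raises ValueError describing the first violated condition.
    """
    # Every replica must follow the rN naming pattern.
    bad = [name for name in replicas if not REPLICA_NAME.fullmatch(name)]
    if bad:
        raise ValueError(
            "replicas do not match required naming pattern: " + ", ".join(bad)
        )

    # The numbering must run from 1 to N with no gaps.
    n = len(replicas)
    for i in range(1, n + 1):
        if f"r{i}" not in replicas:
            raise ValueError(f'the first missing replica is "r{i}"')

    # All replicas must share one size S.
    sizes = set(replicas.values())
    if len(sizes) > 1:
        raise ValueError("replicas do not all have the same size")

    # Zero replicas match trivially with N=0; S is left undefined (None).
    size = next(iter(sizes), None)
    return size, n
```

Calling check_convertible({"r1": "small", "r2": "small"}) returns ("small", 2), while {"r1": "small", "r3": "medium"} is rejected because r2 is missing.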
ALTER CLUSTER will support disabling the MANAGED option for a cluster to convert a managed cluster to an unmanaged cluster:
ALTER CLUSTER c SET (MANAGED = false);
There are no constraints on the conversion. All managed clusters can be converted to unmanaged clusters.
The conversion preserves the existing replicas of the cluster. Users can then create or drop replicas of the cluster using {CREATE|DROP} CLUSTER REPLICA.
{CREATE|DROP} CLUSTER REPLICA
The CREATE CLUSTER REPLICA and DROP CLUSTER REPLICA statements will be restricted to operations on replicas of an unmanaged cluster.
Materialize will produce an error message like the following when attempting to manually manipulate replicas of a managed cluster:
> CREATE CLUSTER c SIZE 'small';
> DROP CLUSTER REPLICA c.r1;
ERROR: cannot drop replica of managed cluster "c"
HINT: Use ALTER CLUSTER to change the cluster's size and replication factor, or
convert the cluster to an unmanaged cluster by running ALTER CLUSTER c SET (MANAGED = false).
ALTER CLUSTER [REPLICA] ... RENAME TO
ALTER CLUSTER and ALTER CLUSTER REPLICA will learn to support renaming clusters and cluster replicas, respectively:
-- Rename cluster c1 to c2.
ALTER CLUSTER c1 RENAME TO c2;
-- Rename replica r1 of cluster c1 to r2.
ALTER CLUSTER REPLICA c1.r1 RENAME TO r2;
ALTER CLUSTER REPLICA will not permit renaming a replica of a managed cluster.
mz_clusters
mz_clusters will grow three new columns:
Field | Type | Meaning |
---|---|---|
managed | bool | Whether the cluster has automatically managed replicas. |
size | text | If the cluster is managed, the desired size of the cluster's replicas. If the cluster is unmanaged, NULL. |
replication_factor | bigint | If the cluster is managed, the desired number of replicas of the cluster. If the cluster is unmanaged, NULL. |
We could wait to improve the API until we've built dynamic cluster scheduling.
Is it acceptable that ALTER CLUSTER ... SET (SIZE = ...) causes downtime?
Should we infer the MANAGED option based on whether SIZE is set? If so, users would convert to a managed cluster by running ALTER CLUSTER unmanaged SET (SIZE = ...), and convert to an unmanaged cluster by running ALTER CLUSTER managed RESET (SIZE).
I personally prefer the explicitness of ALTER CLUSTER ... SET (MANAGED = ...).