This document fleshes out the design for secrets, which securely store sensitive values.
Four new SQL statements will be introduced with the following grammar to create, alter, drop, and list secrets, respectively:
create_secret_stmt ::=
CREATE SECRET [IF NOT EXISTS] <name> AS <value>
alter_secret_stmt ::=
ALTER SECRET [IF EXISTS] <name> AS <value>
alter_secret_rename_stmt ::=
ALTER SECRET [IF EXISTS] <name> RENAME TO <name>
drop_secret_stmt ::=
DROP SECRET [IF EXISTS] <name> [{ RESTRICT | CASCADE }]
show_secrets_stmt: SHOW SECRETS [FROM <schema>] [{ LIKE 'pattern' | WHERE <expr> }]
<name> ::= <qualified-identifier>
<value> ::= <scalar-expression>
Initially, all of these statements will be available in experimental mode only.
The name of a secret is a qualified identifier. Secrets live in the standard database and schema hierarchy alongside tables, views, sources, etc.
The CREATE SECRET
statement creates a new secret with the specified contents.
It returns an error if a secret with the specified name already exists unless
the IF NOT EXISTS
clause is present. The <value>
expression may not
reference any relations and must be implicitly castable to bytea
.
The first form of the ALTER SECRET
statement changes the contents of an
existing secret. The second form changes the name of an existing secret.
The DROP SECRET
statement drops an existing secret. It returns an error
if a secret with the specified name does not exist unless the IF EXIST
clause is present. Dropping a secret that is in use by a CONNECTOR
is not
permitted unless CASCADE
is present.
A SQL transaction can usually contain only one secret DDL statement. As a
special case, if the secrets controller in use supports
multi-operation atomicity, a SQL transaction can contain any number of
non-renaming ALTER SECRET
statements, but no other statements, in which case
the secrets will be updated atomically.
Note that while the catalog updates to a secret apply immediately, changes to the contents of the secret do not propagate to downstream consumers of the secret on any particular timeline.
The SHOW SECRETS
statement lists the names of the secrets in the specified
schema, or the first schema in the search path if no schema is explicitly
specified, optionally filtered by the provided LIKE
pattern or WHERE
expression. The filter clauses which work analogously to the same clauses in the
SHOW DATABASES
statement.
Note that there is intentionally no way to view the contents of secrets.
A new mz_secrets
table with the following structure will describe the
available secrets in the system:
Field | Type | Meaning |
---|---|---|
id |
text |
The ID of the secret. |
schema_id |
bigint |
The ID of the schema to which the secret belongs. |
name |
text |
The name of the secret. |
Note that, again, the contents of the secrets are not exposed.
There is presently no way to use a secret. The existing CREATE SOURCE
and
CREATE SINK
commands will be extended to support referencing secrets, but
that work is left to a future design document.
The CatalogItem
enum will be extended with an additional Secret
variant:
pub enum CatalogItem {
// ...
Secret,
}
Unlike the other catalog items, there is no data associated with a secret. The
name and ID of the secret are stored in the CatalogEntry
struct, and the
actual data for the secret will be stored in a secret backend (see next
section). Therefore there is presently no additional metadata that needs to be
recorded about a secret.
When serializing a secret, its create_sql
will be synthesized as:
CREATE SECRET <name> AS '******'
When deserializing a secret on bootstrap, the '******'
will be ignored.
The Coordinator
will host a SecretsController
that securely handles the
storage of secrets. It will interact with the controller via the following
interface.
/// Securely stores secrets.
pub trait SecretsController {
/// Applies the specified secret operations in bulk.
///
/// Implementations must apply the operations atomically. If the method
/// returns `Ok(())`, then all operations have been applied successfully;
/// if the method returns `Err(())`, then none of the operations have been
/// applied.
///
/// Implementations are permitted to reject combinations of operations which
/// they cannot apply atomically.
fn apply(&mut self, ops: Vec<SecretOp>) -> Result<(), anyhow::Error>;
/// Returns the IDs of all known secrets.
fn list(&self) -> Result<Vec<GlobalId>, anyhow::Error>;
}
/// An operation on a [`SecretsController`].
pub enum SecretOp {
/// Create or update the contents of a secret.
Ensure {
/// The ID of the secret to create or update.
id: GlobalId,
/// The binary contents of the secret.
contents: Vec<u8>,
},
/// Delete a secret.
Delete {
/// The id of the secret to delete.
id: GlobalId
}
}
There will be two concrete implementations of the SecretsController
:
The FilesystemSecretsController
will store secrets on the local filesystem.
It will not support multi-operation atomicity, but will support atomically
updating or removing one secret at a time.
The KubernetesSecretsController
will store secrets in Kubernetes. It
will support multi-operation atomicity.
All secrets controllers expose the contents of secrets to the STORAGE
and COMPUTE
layer by way of a directory that contains one file per secret,
where the name of the file is the secret's ID and the contents of the file are
the secret's contents. For example, if IDs u12
and u36
are secrets, the
directory would look like this:
secrets/
u12
u36
This bears repeating: even the KubernetesSecretsController
will expose the
contents of the secrets via the filesystem. The filesystem is the lingua
franca for exposing secrets to applications. Kubernetes supports exposing the
contents of secrets as files on the filesystem, as does HashiCorp Vault.
The STORAGE
and COMPUTE
layers will need to be taught to re-read the
contents of secrets that they consume from disk periodically, in case the
contents have bene updated by an ALTER SECRET
. We may wish to build a helper
"secret consumer" library that uses filesystem events to notice when the secrets
change, rather than polling periodically.
Providing proper SQL transaction semantics requires a complicated integration between catalog transactions and secret controller transactions.
Let's consider each of the cases in turn:
The SQL transaction contains exactly one CREATE SECRET
statement.
The coordinator calls SecretsController::apply
with a single
SecretOp::Ensure
. If the apply fails, the coordinator reports the SQL
transaction as failed. If the apply succeeds, the coordinator calls
Catalog::transact
to create the secret object. If the catalog transaction
succeeds, the coordinator reports the SQL transaction as successful. If
the catalog transaction fails, the coordinator makes a best-effort attempt
to remove the secret by calling SecretsController::apply
with a single
SecretOp::Delete
, then reports the SQL transaction as failed.
The SQL transaction contains exactly one DROP SECRET
statement.
The coordinator calls Catalog::transact
to remove the secret from the
catalog. If the catalog transaction fails, the coordinator reports the SQL
transaction as failed. If the catalog transaction succeeds, the coordinator
calls SecretsController::apply
with a single SecretOp::Delete
to remove
the secret from disk. If the apply succeeds, the coordinator reports the SQL
transaction as successful. If the apply fails, the coordinator logs an
error, but still reports the SQL transaction as successful.
The SQL transaction contains one or more ALTER SECRET ... AS
statements.
The coordinator calls SecretsController:apply
with multiple
SecretOp::Ensure
s. If the apply fails, the coordinator reports the
SQL transaction as failed; if the apply succeeds, the coordinator reports
the SQL transaction as successful.
SQL transactions that contain any other combination of secret DDL statements are prohibited.
The above integration is designed so that failures result in orphaned secrets in the secrets controller; i.e., secrets that exist in the secrets controller but have no corresponding entry in the catalog. (The inverse would not be permissible: a secret that exists in the catalog must be known to the secrets controller.) The coordinator will periodically attempt to garbage collect orphaned secrets, either at startup or in a background task.
The ADAPTER
layer must be configured with a SecretsController
implementation. The STORAGE
and COMPUTE
layers must be configured
with the path on the local filesystem in which the secrets controller has made
the contents of the secrets available.
That means that presently materialized
will create a
FilesystemSecretsController
that stores data in mzdata/secrets
. For
simplicity, to start, this directory will not be configurable. This controller
will be provided to the Coordinator
, while the path to mzdata/secrets
will
be provided to the dataflow layer.
In the future, coordd
will learn to instantiate a
KubernetesSecretsController
, and dataflowd
will learn to expect the secrets
to be mounted at a known location.
Consider the addition of a --secrets-dir
command-line flag to control where
secrets are stored.
Consider the addition of a VaultSecretsController
, if we need to use
HashiCorp Vault in Materialize Platform.