20220303_secrets.md 10 KB

Secrets

Summary

This document fleshes out the design for secrets, which securely store sensitive values.

Design

Secret SQL syntax and semantics

Four new SQL statements will be introduced with the following grammar to create, alter, drop, and list secrets, respectively:

create_secret_stmt ::=
  CREATE SECRET [IF NOT EXISTS] <name> AS <value>

alter_secret_stmt ::=
  ALTER SECRET [IF EXISTS] <name> AS <value>

alter_secret_rename_stmt ::=
  ALTER SECRET [IF EXISTS] <name> RENAME TO <name>

drop_secret_stmt ::=
  DROP SECRET [IF EXISTS] <name> [{ RESTRICT | CASCADE }]

show_secrets_stmt: SHOW SECRETS [FROM <schema>] [{ LIKE 'pattern' | WHERE <expr> }]

<name> ::= <qualified-identifier>
<value> ::= <scalar-expression>

Initially, all of these statements will be available in experimental mode only.

The name of a secret is a qualified identifier. Secrets live in the standard database and schema hierarchy alongside tables, views, sources, etc.

The CREATE SECRET statement creates a new secret with the specified contents. It returns an error if a secret with the specified name already exists unless the IF NOT EXISTS clause is present. The <value> expression may not reference any relations and must be implicitly castable to bytea.

The first form of the ALTER SECRET statement changes the contents of an existing secret. The second form changes the name of an existing secret.

The DROP SECRET statement drops an existing secret. It returns an error if a secret with the specified name does not exist unless the IF EXIST clause is present. Dropping a secret that is in use by a CONNECTOR is not permitted unless CASCADE is present.

A SQL transaction can usually contain only one secret DDL statement. As a special case, if the secrets controller in use supports multi-operation atomicity, a SQL transaction can contain any number of non-renaming ALTER SECRET statements, but no other statements, in which case the secrets will be updated atomically.

Note that while the catalog updates to a secret apply immediately, changes to the contents of the secret do not propagate to downstream consumers of the secret on any particular timeline.

The SHOW SECRETS statement lists the names of the secrets in the specified schema, or the first schema in the search path if no schema is explicitly specified, optionally filtered by the provided LIKE pattern or WHERE expression. The filter clauses which work analogously to the same clauses in the SHOW DATABASES statement.

Note that there is intentionally no way to view the contents of secrets.

System catalog changes

A new mz_secrets table with the following structure will describe the available secrets in the system:

Field Type Meaning
id text The ID of the secret.
schema_id bigint The ID of the schema to which the secret belongs.
name text The name of the secret.

Note that, again, the contents of the secrets are not exposed.

Referencing secrets

There is presently no way to use a secret. The existing CREATE SOURCE and CREATE SINK commands will be extended to support referencing secrets, but that work is left to a future design document.

Implementation

Catalog changes

The CatalogItem enum will be extended with an additional Secret variant:

pub enum CatalogItem {
    // ...
    Secret,
}

Unlike the other catalog items, there is no data associated with a secret. The name and ID of the secret are stored in the CatalogEntry struct, and the actual data for the secret will be stored in a secret backend (see next section). Therefore there is presently no additional metadata that needs to be recorded about a secret.

When serializing a secret, its create_sql will be synthesized as:

CREATE SECRET <name> AS '******'

When deserializing a secret on bootstrap, the '******' will be ignored.

Secrets controller

The Coordinator will host a SecretsController that securely handles the storage of secrets. It will interact with the controller via the following interface.

/// Securely stores secrets.
pub trait SecretsController {
    /// Applies the specified secret operations in bulk.
    ///
    /// Implementations must apply the operations atomically. If the method
    /// returns `Ok(())`, then all operations have been applied successfully;
    /// if the method returns `Err(())`, then none of the operations have been
    /// applied.
    ///
    /// Implementations are permitted to reject combinations of operations which
    /// they cannot apply atomically.
    fn apply(&mut self, ops: Vec<SecretOp>) -> Result<(), anyhow::Error>;

    /// Returns the IDs of all known secrets.
    fn list(&self) -> Result<Vec<GlobalId>, anyhow::Error>;
}

/// An operation on a [`SecretsController`].
pub enum SecretOp {
    /// Create or update the contents of a secret.
    Ensure {
        /// The ID of the secret to create or update.
        id: GlobalId,
        /// The binary contents of the secret.
        contents: Vec<u8>,
    },
    /// Delete a secret.
    Delete {
        /// The id of the secret to delete.
        id: GlobalId
    }
}

There will be two concrete implementations of the SecretsController:

  • The FilesystemSecretsController will store secrets on the local filesystem. It will not support multi-operation atomicity, but will support atomically updating or removing one secret at a time.

  • The KubernetesSecretsController will store secrets in Kubernetes. It will support multi-operation atomicity.

All secrets controllers expose the contents of secrets to the STORAGE and COMPUTE layer by way of a directory that contains one file per secret, where the name of the file is the secret's ID and the contents of the file are the secret's contents. For example, if IDs u12 and u36 are secrets, the directory would look like this:

secrets/
  u12
  u36

This bears repeating: even the KubernetesSecretsController will expose the contents of the secrets via the filesystem. The filesystem is the lingua franca for exposing secrets to applications. Kubernetes supports exposing the contents of secrets as files on the filesystem, as does HashiCorp Vault.

The STORAGE and COMPUTE layers will need to be taught to re-read the contents of secrets that they consume from disk periodically, in case the contents have bene updated by an ALTER SECRET. We may wish to build a helper "secret consumer" library that uses filesystem events to notice when the secrets change, rather than polling periodically.

Transaction integration

Providing proper SQL transaction semantics requires a complicated integration between catalog transactions and secret controller transactions.

Let's consider each of the cases in turn:

  • The SQL transaction contains exactly one CREATE SECRET statement.

    The coordinator calls SecretsController::apply with a single SecretOp::Ensure. If the apply fails, the coordinator reports the SQL transaction as failed. If the apply succeeds, the coordinator calls Catalog::transact to create the secret object. If the catalog transaction succeeds, the coordinator reports the SQL transaction as successful. If the catalog transaction fails, the coordinator makes a best-effort attempt to remove the secret by calling SecretsController::apply with a single SecretOp::Delete, then reports the SQL transaction as failed.

  • The SQL transaction contains exactly one DROP SECRET statement.

    The coordinator calls Catalog::transact to remove the secret from the catalog. If the catalog transaction fails, the coordinator reports the SQL transaction as failed. If the catalog transaction succeeds, the coordinator calls SecretsController::apply with a single SecretOp::Delete to remove the secret from disk. If the apply succeeds, the coordinator reports the SQL transaction as successful. If the apply fails, the coordinator logs an error, but still reports the SQL transaction as successful.

  • The SQL transaction contains one or more ALTER SECRET ... AS statements.

    The coordinator calls SecretsController:apply with multiple SecretOp::Ensures. If the apply fails, the coordinator reports the SQL transaction as failed; if the apply succeeds, the coordinator reports the SQL transaction as successful.

SQL transactions that contain any other combination of secret DDL statements are prohibited.

The above integration is designed so that failures result in orphaned secrets in the secrets controller; i.e., secrets that exist in the secrets controller but have no corresponding entry in the catalog. (The inverse would not be permissible: a secret that exists in the catalog must be known to the secrets controller.) The coordinator will periodically attempt to garbage collect orphaned secrets, either at startup or in a background task.

Configuration

The ADAPTER layer must be configured with a SecretsController implementation. The STORAGE and COMPUTE layers must be configured with the path on the local filesystem in which the secrets controller has made the contents of the secrets available.

That means that presently materialized will create a FilesystemSecretsController that stores data in mzdata/secrets. For simplicity, to start, this directory will not be configurable. This controller will be provided to the Coordinator, while the path to mzdata/secrets will be provided to the dataflow layer.

In the future, coordd will learn to instantiate a KubernetesSecretsController, and dataflowd will learn to expect the secrets to be mounted at a known location.

Future work

  • Consider the addition of a --secrets-dir command-line flag to control where secrets are stored.

  • Consider the addition of a VaultSecretsController, if we need to use HashiCorp Vault in Materialize Platform.