TODO: This document will describe the architecture of the CLOUD OPERATOR component that spins up the STORAGE, COMPUTE, and CONTROL layers described in the Database Architecture.
⚠️ WARNING! ⚠️ This document is a work in progress! At this stage it is a loose collection of raw thoughts.
What's called a "region" in user-facing code will be internally called an "environment", to avoid confusion with a cloud-provider region.
Each environment will correspond to a namespace in Kubernetes. Each environment is issued a TLS certificate by Let's Encrypt. We've requested an increased rate limit from Let's Encrypt, since by default Let's Encrypt issues only 50 certificates per registered domain per week.
Environments are managed by the environment controller, which is a Kubernetes operator that lives in a private repository.
The STORAGE and COMPUTE layers each turn on instances by creating services in an `Orchestrator`. An `Orchestrator` is backed by Kubernetes in the cloud, and a service corresponds directly to a `StatefulSet`. This may evolve soon.
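To make the shape of this interface concrete, here is a minimal Rust sketch. Only the names `Orchestrator` and `NamespacedOrchestrator` come from this document; the methods, the `ServiceConfig` fields, and the error type are illustrative assumptions, not the actual definitions in the private repository.

```rust
use std::collections::BTreeMap;

/// Hypothetical service configuration: an image, named ports, and extra
/// labels. The real type may differ.
pub struct ServiceConfig {
    pub image: String,
    /// Every port is named; the Kubernetes backend derives a
    /// `port-NAME=true` label from each entry.
    pub ports: BTreeMap<String, u16>,
    pub labels: BTreeMap<String, String>,
}

/// One "subnamespace" (`storage` or `compute`) of the orchestrator. In the
/// cloud, an ensured service materializes as a Kubernetes StatefulSet.
pub trait NamespacedOrchestrator {
    fn ensure_service(&self, id: &str, config: ServiceConfig) -> Result<(), String>;
    fn drop_service(&self, id: &str) -> Result<(), String>;
}

/// The top-level orchestrator hands out per-subnamespace handles.
pub trait Orchestrator {
    fn namespace(&self, namespace: &str) -> Box<dyn NamespacedOrchestrator>;
}
```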
All ports created by the orchestrator have names.
The Kubernetes orchestrator installs the following labels on all services and pods that it creates:
- `environmentd.materialize.cloud/namespace`: the orchestrator "subnamespace" in which the service is created. Either `storage` or `compute` at present.
- `environmentd.materialize.cloud/service-id`: the namespace-provided identifier for the service, like `cluster-0` or `runtime`.
- `environmentd.materialize.cloud/port-PORT=true`: a label for each named port exposed by the service.
- `NAMESPACE.environmentd.materialize.cloud/KEY`: arbitrary labels assigned at service creation time, prefixed with the `NamespacedOrchestrator` that created the label.
Additionally, the service that invokes `environmentd` can pass `--orchestrator-service-label` to specify additional labels that should be attached to all pods and services created by the Kubernetes orchestrator.
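A hypothetical helper illustrating how these labels compose for a single service. The label keys follow the list above; the function and its parameters are invented for illustration and are not the actual implementation.

```rust
use std::collections::BTreeMap;

// Assembles the orchestrator labels described above for one service.
fn service_labels(
    subnamespace: &str,               // "storage" or "compute"
    service_id: &str,                 // e.g. "cluster-0" or "runtime"
    port_names: &[&str],              // every orchestrator port is named
    extra: &BTreeMap<String, String>, // labels given at service creation
) -> BTreeMap<String, String> {
    let mut labels = BTreeMap::new();
    labels.insert(
        "environmentd.materialize.cloud/namespace".to_string(),
        subnamespace.to_string(),
    );
    labels.insert(
        "environmentd.materialize.cloud/service-id".to_string(),
        service_id.to_string(),
    );
    for port in port_names {
        labels.insert(
            format!("environmentd.materialize.cloud/port-{port}"),
            "true".to_string(),
        );
    }
    // Arbitrary creation-time labels are prefixed with the subnamespace:
    // NAMESPACE.environmentd.materialize.cloud/KEY.
    for (key, value) in extra {
        labels.insert(
            format!("{subnamespace}.environmentd.materialize.cloud/{key}"),
            value.clone(),
        );
    }
    labels
}
```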
See the secret design doc.
There are presently two binaries involved in Materialize Platform:

- `environmentd`, which hosts the storage controller, compute controller, and adapter layers.
- `clusterd`, which hosts a storage and compute runtime.

There are three separate network protocols:
- The storagectl protocol, which is how `environmentd` communicates with the storage runtime in `clusterd`. This protocol is conventionally hosted on port 2100.
- The computectl protocol, which is how `environmentd` communicates with the compute runtime in `clusterd`. This protocol is conventionally hosted on port 2101.
- The timely protocol, which is how cluster processes communicate with one another. This protocol is conventionally hosted on port 2102.
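For reference, the conventional port assignments sketched as a Rust map of named ports (the numbers and protocol names come from this document; the function shape is illustrative):

```rust
use std::collections::BTreeMap;

// Conventional named ports for the clusterd protocols described above.
fn clusterd_ports() -> BTreeMap<&'static str, u16> {
    BTreeMap::from([
        ("storagectl", 2100), // environmentd -> storage runtime
        ("computectl", 2101), // environmentd -> compute runtime
        ("timely", 2102),     // cluster process <-> cluster process
    ])
}
```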
Building a Terraform provider that manages clusters, roles, databases, schemas, and secrets should be straightforward. Because all of these objects are managed via SQL, our Terraform provider can be a fork of [terraform-provider-postgresql] with only minor modifications.
Supporting sources and sinks in the Terraform provider will be more challenging, because it amounts to remapping the `CREATE SOURCE` and `CREATE SINK` syntax to Terraform resources, and that syntax is extremely complex.