This document describes the characteristics of the network boundaries between the components of the Materialize platform. In its current form it raises more questions than it provides answers, which is as intended.
Materialize's storage layer is responsible for ingesting data from external sources, providing some form of durability, and publishing data to external subscribers. The compute layer is responsible for transforming data based on queries installed by the user. Together they provide the core computational resource for Materialize. In the following, we describe the requirements of the network boundary between the two components.
STORAGE hosts a server socket that accepts connections from COMPUTE clients.
DATAFLOW hosts a client connected to a STORAGE server.
The storage service accepts the following commands over the network:
Subscribe(dataflow, source, worker)
: The identified dataflow subscribes to updates from the named source worker.

Unsubscribe(dataflow, source, worker)
: The identified dataflow unsubscribes from updates from the named source worker.
This causes resources on STORAGE to be freed.

The storage service updates the compute service using the following responses:
Data(dataflow, source, worker, events)
: The captured data as events for the named dataflow from the identified source, from a specific source worker.
We support two variants of this response to cover both the data and error collections.
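
The commands and responses above could be sketched as Rust types along the following lines. This is only an illustration under stated assumptions, not the actual protocol definition: the identifier types, the `SubscriptionKey` and `Event` structs, the `Errors` variant name, and the `Storage` bookkeeping are all hypothetical, and no wire format is implied.

```rust
use std::collections::HashSet;

// Hypothetical identifier types; the document does not specify them.
pub type DataflowId = u64;
pub type SourceId = u64;
pub type WorkerId = usize;

/// The (dataflow, source, worker) triple that every message carries.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct SubscriptionKey {
    pub dataflow: DataflowId,
    pub source: SourceId,
    pub worker: WorkerId,
}

/// Commands sent from COMPUTE to STORAGE.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Command {
    Subscribe(SubscriptionKey),
    Unsubscribe(SubscriptionKey),
}

/// Placeholder for a captured event; the real payload is not specified here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event(pub String);

/// Responses sent from STORAGE to COMPUTE. The two variants stand in for
/// the data and error collections; the name `Errors` is an assumption.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Response {
    Data { key: SubscriptionKey, events: Vec<Event> },
    Errors { key: SubscriptionKey, events: Vec<Event> },
}

/// Hypothetical storage-side bookkeeping of active subscriptions.
#[derive(Debug, Default)]
pub struct Storage {
    subscriptions: HashSet<SubscriptionKey>,
}

impl Storage {
    /// Applies a command. Returns true if it changed the subscription set.
    pub fn handle(&mut self, command: Command) -> bool {
        match command {
            Command::Subscribe(key) => self.subscriptions.insert(key),
            // Removing the entry models freeing the resources held on STORAGE.
            Command::Unsubscribe(key) => self.subscriptions.remove(&key),
        }
    }

    /// Wraps events in a `Data` response if the subscription is active.
    pub fn publish(&self, key: SubscriptionKey, events: Vec<Event>) -> Option<Response> {
        if self.subscriptions.contains(&key) {
            Some(Response::Data { key, events })
        } else {
            None
        }
    }
}
```

Keying every message by the full triple keeps the sketch stateless on the wire: a response can be routed to its dataflow without any per-connection context. Whether the real protocol should work this way is one of the open questions.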