This design doc is a cleaned-up version of a subset of the "Protobuf Support for ComputeCommand" Google Doc, which was the primary design doc while working on #11735.
As part of the Platform initiative, `materialized` will be broken down into different processes that communicate through a set of well-defined APIs composed of commands[^cc][^sc] and responses[^cr][^sr].
API messages therefore need to be serialized into a backwards-compatible binary format in order to facilitate inter-process communication and replay-based failure recovery.
This document proposes an encoding to be adopted for these purposes based on Google's Protocol Buffers[^pb].
First, other parts of the codebase (for example, `persist`) already use Protobuf, and there is a large overlap between the types used by `persist` and by the current API (for example, everything under `Row`).
Second, Protobuf integrates nicely with adjacent technologies (such as gRPC) that might be of interest in the near future.

We delegate the heavy lifting of serialization and deserialization, along with large parts of the backwards-compatibility story, to Protobuf.
Following the rest of the codebase, the Protobuf integration is handled by the `prost`[^prost] library.
Given that we want to retroactively add Protobuf support for all Rust types used in the current API, we have the following strategies:

1. Generate the Rust types from `*.proto` files.
2. Annotate the existing Rust types with `prost` attributes.
3. For each Rust type `$T`, create a mirroring type `Proto$T` and define a pair of conversion functions that mediate between the two types.

The design proposed here is based on (3) because this strategy offers the highest degree of flexibility with respect to backwards compatibility. The pros and cons of (1) and (2) are discussed in the Alternatives section.
Pros

The selected strategy offers flexibility due to the separation of the serializable type (`Proto$T`) from the client-facing type (`$T`). In particular:
- Client code can keep using `$T` as usual and stay simple. The technical complexity caused by backwards-compatibility guarantees is accumulated in `Proto$T` and the associated `Proto$T ⇒ $T` conversion function.
- We are not bound by the constraints on the Rust types generated from `*.proto` defs by `prost-build` (see the rejected alternatives for details). The existing Rust type `$T` can remain unchanged, while `Proto$T` can deviate in a predictable and consistent way based on the limitations of `prost-build`.

Moreover, we can offer library functions that mediate between `$T` and `Proto$T` in a consistent way across the codebase. For example, we can enforce that the Rust type `usize` is always represented by the Protobuf type `uint64`.
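One way such a library function could look is sketched below. The trait name `ProtoRepr` and its methods are assumptions made for this example, not an existing API. The idea is that the `usize` to `uint64` rule is written once and reused by every conversion that touches a `usize` field.

```rust
use std::convert::TryFrom;

/// Hypothetical helper trait: a single place that fixes how a Rust type is
/// represented on the Protobuf side.
pub trait ProtoRepr: Sized {
    /// The Rust type that `prost` uses for the chosen Protobuf type.
    type Repr;

    fn into_proto(self) -> Self::Repr;
    fn from_proto(repr: Self::Repr) -> Result<Self, String>;
}

/// `usize` is always carried as Protobuf `uint64`, i.e. Rust `u64`.
impl ProtoRepr for usize {
    type Repr = u64;

    fn into_proto(self) -> u64 {
        // Lossless on targets where `usize` is at most 64 bits wide.
        u64::try_from(self).expect("usize wider than 64 bits")
    }

    fn from_proto(repr: u64) -> Result<usize, String> {
        // Can fail on 32-bit targets if the sender used a larger value.
        usize::try_from(repr).map_err(|_| format!("{} does not fit in usize", repr))
    }
}
```

A conversion for a struct with a `usize` field would then call `field.into_proto()` and `usize::from_proto(..)` rather than casting ad hoc, which keeps the representation uniform.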
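Cons

- We need to define `Proto$T` in a `*.proto` file, use `prost-build` to generate the `Proto$T` Rust type, and implement `$T ⇔ Proto$T` for each `$T`.
- This has to be done for every type `$T` used by the API.
- The `$T ⇔ Proto$T` conversion for complex types is going to be recursive and therefore susceptible to stack overflow issues (see #9000).
- Converting between `$T` and `Proto$T` adds runtime overhead, possibly in the hot paths of some processes.

Alternatives

Generate Rust types from `*.proto` files

With this strategy, we will: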
1. map each existing Rust type `$T` to a corresponding Protobuf message type,
2. generate `$T` from this message using `prost-build`, and
3. replace the existing `$T` with its derived version in client code.

Pros
- We can use the `*.proto` files to derive messages in other languages.

Cons
- There is a mismatch between the Rust types generated from `*.proto` message definitions by `prost` and the existing Rust types, so we will need to touch client code. For example, some Rust types have no direct Protobuf counterpart (`usize`, `chrono` types, tuples).

Annotate existing Rust types with `prost` attributes

With this strategy, we will add Protobuf serialization and deserialization support directly to the existing types by annotating them with `prost-derive` macros (e.g. `::prost::Message`).
Pros
Cons
Same as for the other rejected alternative.
- Are we concerned about the overhead of the `$T ⇔ Proto$T` step at the moment? If yes, we need to run some benchmarks to quantify this.

- `#eng-storage` regarding Protobuf representation for `Row`
- `#eng-persist` regarding adopting the `Codec` trait
- `#team-status` thread prior to the meeting on 2022/03/29
- `#help-rust` thread regarding `usize` handling
- `#prost` Discord channel Q1: cross-crate imports
- `#prost` Discord channel Q2: blanket implementations for tuples
- `#prost` Discord channel Q3: `FileDescriptorSet` handling
- `#prost` Discord channel Q4: (`Optional<T>` handling in `proto3`)

[^pb]: Protocol Buffers
[^prost]: `prost` GitHub page