Bazel is an open source distributed build system maintained by Google, a fork of their internal build system known as Blaze.
tl;dr: These tips should get you started with Bazel:

- To generate a `BUILD.bazel` file, run `bin/bazel gen`.
- When running Bazel, a target is defined like `//src/catalog:mz_catalog`, where `//` signifies the root of our repository, `src/catalog` is the path to a directory containing a `BUILD.bazel` file, and `:mz_catalog` is a named target within that `BUILD.bazel` file.
- To see what targets are available you can use the `query` subcommand, e.g. `bin/bazel query //src/catalog/...`.
Bazel's main components for building code are "rules", which are provided by open source rule sets, e.g. `rules_rust`. When using a rule, e.g. `rust_library`, you define all of the inputs (e.g. source files) and extra parameters (e.g. compiler flags) required to build your target. Bazel then computes a build graph, which is used to order operations and determine when something needs to be rebuilt.
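As a sketch, a hand-written rule invocation in a `BUILD.bazel` file might look like the following (the crate name, flags, and dependency here are hypothetical; in practice our files are generated, as described below):

```starlark
load("@rules_rust//rust:defs.bzl", "rust_library")

rust_library(
    name = "my_crate",
    # Inputs: the source files for the target.
    srcs = glob(["src/**/*.rs"]),
    # Extra parameters: flags passed to the compiler.
    rustc_flags = ["--cfg=tokio_unstable"],
    # Dependencies become edges in Bazel's build graph.
    deps = [":some_other_crate"],
)
```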
A common annoyance with Bazel is that it operates in a sandbox. A build that otherwise succeeds on your machine might fail when run with Bazel because it has a different version of a compiler, or can't find some necessary file. This is a key feature though because it makes builds hermetic and allows Bazel to aggressively cache artifacts, which reduces build times.
## `bazelisk`

To use `bazel` you first need to install `bazelisk`, which is a launcher that automatically makes sure you have the correct version of Bazel installed.

Note: We have a `.bazelversion` file in our repository that ensures everyone is using the same version.
On macOS you can do this with Homebrew:
brew install bazelisk
For Linux distributions you'll need to grab a binary from their releases page and put it into your `PATH` as `bazel`:
chmod +x bazelisk-linux-amd64
sudo mv bazelisk-linux-amd64 /usr/local/bin/bazel
## `.bazelrc` file

Bazel has numerous command line options, which can be defined in a `.bazelrc` file to create different configurations that you run Bazel with. We have a `.bazelrc` in the root of our repository that defines several common build configurations, but it's also recommended that you create a `.bazelrc` in your home directory (i.e. `~/.bazelrc`) to customize how you run Bazel locally. Options specified in your home RC file will override those of the workspace RC file.
A good default to start with is:
# Bazel will use all but one CPU core, so your machine is still responsive.
common --local_resources=cpu="HOST_CPUS-1"
# Define a shared disk cache so builds from different Materialize repos can share artifacts.
build --disk_cache=~/.cache/bazel
# Optional. The workspace RC already sets a max disk cache size and artifact age,
# but you can override them if you have more limited disk space.
common --experimental_disk_cache_gc_max_size=40G
common --experimental_disk_cache_gc_max_age=7d
Bazel supports reading and writing artifacts to a remote cache. We currently have two set up in `i2` that are backed by S3 and running `bazel-remote`. One is accessible by developers and used by PR builds in CI; we treat this as semi-poisoned. The other is only accessible by CI and used for builds from `main` and tagged builds.
To enable remote caching as a developer you must do the following:

1. Make sure you have `tsh` installed.
2. Create `~/.config/materialize/build.toml` and add the following:

```toml
[bazel]
remote_cache = "teleport:bazel-remote-cache"
```
When running Bazel via `bin/bazel` we will read the build config and spawn a Teleport proxy via `tsh` if one isn't already running, then specify `--remote_cache` to `bazel` with the correct URL.
In some cases you might see a warning printed when calling `bin/bazel` indicating the Teleport proxy failed to start, e.g.
Teleport proxy failed to start, 'tsh' process already running!
existing 'tsh' processes: [10001]
exit code: 1
Generally this means there is a Teleport proxy already running that we've lost track of. You can fix this issue by terminating the existing `tsh` process with the PID specified in the warning message.
We maintain two remote caches in the "Materialize Core" AWS account stored under S3 buckets:

- `materialize-bazel-remote`: Used for PR builds and accessible by developers
- `materialize-bazel-remote-pa`: Used for main branch and tagged builds (CI only)

Each bucket contains two main folders, `cas.v2` and `ac`. To force Bazel to rebuild each cache from scratch, you can delete these folders. Note that you'll need the appropriate AWS permissions to perform these operations.
Bazel has been integrated into `mzbuild`, which means you can use it for other tools as well, like `mzimage` and `mzcompose`! To enable Bazel, specify the `--bazel` flag like you would specify the `--dev` flag, e.g. `bin/mzcompose --bazel ...`.
Otherwise Bazel can be used just like `cargo`, to build individual targets and run tests. We provide a thin wrapper around the `bazel` command in the form of `bin/bazel`. This sets up remote caching and provides the `fmt` and `gen` subcommands; otherwise it forwards all commands onto `bazel` itself.
All Rust crates in our Cargo Workspace have a `BUILD.bazel` file that defines the different build targets for the crate. You don't have to write these files; they are automatically generated from the crate's `Cargo.toml`. For more details see the Generating `BUILD.bazel` files section.
tl;dr: to build a crate, run `bin/bazel build //src/<crate-name>` from the root of the repo.
To determine what targets are available for a crate you can use the `query` subcommand, e.g.
$ bin/bazel query //src/adapter/...
//src/adapter:adapter
//src/adapter:mz_adapter
//src/adapter:mz_adapter_doc_test
//src/adapter:mz_adapter_lib_tests
//src/adapter:mz_adapter_parameters_tests
//src/adapter:mz_adapter_sql_tests
//src/adapter:mz_adapter_timestamp_selection_tests
Every Rust crate has at least one Bazel target, which is the name of the crate. In the example above the "adapter" crate has the target `mz_adapter`, so you can build the `mz_adapter` crate by running the following:
$ bin/bazel build //src/adapter:mz_adapter
For convenience we also alias the primary target to have the same name as the folder; in the example above we alias `mz_adapter` to `adapter`. This allows a shorthand syntax for building a crate:
# Builds the same target as the example above!
$ bin/bazel build //src/adapter
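Under the hood this shorthand uses Bazel's built-in `alias` rule; a sketch of what the generated `BUILD.bazel` entry might look like (the exact generated form may differ):

```starlark
# Alias the primary crate target to the directory name, so that
# `//src/adapter` resolves to `//src/adapter:mz_adapter`.
alias(
    name = "adapter",
    actual = ":mz_adapter",
)
```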
When adding a new crate to our workspace, follow the normal flow that you would with Cargo, e.g. run `cargo new --lib my_crate`. Once it's created you'll need to add an entry to the Bazel `WORKSPACE` in the root of our repository. In that file search for "crates_repository" and then find the manifests section; it should look something like this:
crates_repository(
name = "crates_io",
    # ...
manifests = [
"//:Cargo.toml",
"//:src/adapter-types/Cargo.toml",
<add your new crate to this list>
],
)
The `crates_repository` Bazel rule aggregates all of the third-party crates that we use and automatically generates `BUILD.bazel` files for them.
Once your new crate is added to `crates_repository`, run `bin/bazel gen` to generate a new `BUILD.bazel` file, and you should be all set!
Note: Support for running Rust tests with Bazel is still experimental. We're waiting on #29266.
Defined in a crate's `BUILD.bazel` are test targets. The following targets are automatically generated:

- `<crate_name>_lib_tests`
- `<crate_name>_doc_tests`
- `<crate_name>_<integration_test_file_name>_tests`
For example, at the time of writing the `ore` crate has three files underneath `ore/tests`: `future.rs`, `panic.rs`, and `task.rs`. As such the `BUILD.bazel` file for the `ore` crate has the following test targets:

- `mz_ore_lib_tests`
- `mz_ore_doc_tests`
- `mz_ore_future_tests`
- `mz_ore_panic_tests`
- `mz_ore_task_tests`
You can run the tests in `future.rs` by running the following command:
bin/bazel test //src/ore:mz_ore_future_tests
You can provide arguments to the underlying test binary with the `--test_arg` command line option. This allows you to provide a filter to Rust's test framework, e.g.
bin/bazel test //src/ore:mz_ore_future_tests --test_arg=catch_panic_async
This would run only the tests in `future.rs` matching the filter "catch_panic_async".
## `WORKSPACE`, `BUILD.bazel`, and `*.bzl` files

There are three kinds of files in our Bazel setup:

- `WORKSPACE`: Defines the root of our workspace; we only have one of these. This is where we load all of our rule sets, download remote repositories, and register toolchains.
- `BUILD.bazel`: Defines how a library/crate is built, where you use "rules". This is generally equivalent to a `Cargo.toml`, one per crate.
- `*.bzl`: Used to define new functions or macros that can be used in `BUILD.bazel` files, written in Starlark. As a general developer you should rarely if ever need to interact with these files.

## Generating `BUILD.bazel` files

tl;dr: run `bin/bazel gen` from the root of the repository.
Just like `Cargo.toml`, associated with every crate is a `BUILD.bazel` file that provides targets that Bazel can build. We auto-generate these files with `cargo-gazelle`, which developers can easily run via `bin/bazel gen`.
There are times though when `Cargo.toml` doesn't provide all of the information required to build a crate; for example, the `std::include_str!` macro adds an implicit dependency on the file being included. Bazel operates in a sandbox and thus will fail unless you tell it about the file! For these cases you can add the dependency via a `[package.metadata.cargo-gazelle.<target>]` section in the `Cargo.toml`. For example:
[package.metadata.cargo-gazelle.lib]
compile_data = ["path/to/my/file.txt"]
This will add `"path/to/my/file.txt"` to the `compile_data` attribute on the resulting `rust_library` Bazel target.
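In other words, the generated target would end up with the file listed in its `compile_data`, conceptually something like this (the crate name and glob are illustrative):

```starlark
load("@rules_rust//rust:defs.bzl", "rust_library")

rust_library(
    name = "my_crate",
    srcs = glob(["src/**/*.rs"]),
    # Added by the [package.metadata.cargo-gazelle.lib] section above;
    # makes the file visible inside Bazel's sandbox for include_str!.
    compile_data = ["path/to/my/file.txt"],
)
```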
## `cargo-gazelle`

`gazelle` is a semi-official `BUILD.bazel` file generator that supports Golang and protobuf. There exists a `gazelle_rust` plugin, but it's not yet mature enough to fit our needs. Still, it's important for productivity that developers who don't want to interact with Bazel shouldn't have to, so generating a `BUILD.bazel` file from a `Cargo.toml` is quite important.
Thus we decided to write our own generator, `cargo-gazelle`! It's not a plugin for the existing `gazelle` tool but theoretically could be. It's designed to be fully generic, with very few (if any) Materialize-specific configurations built in.
`cargo-gazelle` supports the following configuration in a `Cargo.toml` file.
# Configuration for the crate as a whole.
[package.metadata.cargo-gazelle]
# Will skip generating a BUILD.bazel entirely.
#
# If you specify this setting please include a reason at the top of the
# BUILD.bazel file explaining why we skip generating.
skip_generating = (true | false)
# Concatenate the specified string at the end of the generated BUILD.bazel file.
#
# This is largely an escape hatch and should be avoided if possible.
additive_content = "String"
# Configuration for the library target of the crate.
[package.metadata.cargo-gazelle.lib]
# Skip generating the library target.
skip = (true | false)
# Extra data that will be provided to the Bazel target at compile time.
compile_data = ["String Array"]
# Extra data that will be provided to the Bazel target at compile and run time.
data = ["String Array"]
# Extra flags for rustc.
rustc_flags = ["String Array"]
# Environment variables to set for rustc.
[package.metadata.cargo-gazelle.lib.rustc_env]
var1 = "my_value"
# By default Bazel enables all features of a crate, if provided we will
# _override_ that set with this list.
features_override = ["String Array"]
# Extra dependencies to include for the target.
extra_deps = ["String Array"]
# Extra proc-macro dependencies to include for the target.
extra_proc_macro_deps = ["String Array"]
# Configuration for the crate's build script.
[package.metadata.cargo-gazelle.build]
# Skip generating the build script target.
skip = (true | false)
# Extra data that will be provided to the Bazel target at compile time.
compile_data = ["String Array"]
# Extra data that will be provided to the Bazel target at compile and run time.
data = ["String Array"]
# Extra flags for rustc.
rustc_flags = ["String Array"]
# Environment variables to set for rustc.
[package.metadata.cargo-gazelle.build.rustc_env]
var1 = "my_value"
# Environment variables to set for the build script.
build_script_env = ["String Array"]
# Skip the automatic search for protobuf dependencies.
skip_proto_search = (true | false)
# Configuration for test targets in the crate.
#
# * Library tests are named "lib"
# * Doc tests are named "doc"
#
[package.metadata.cargo-gazelle.test.<name>]
# Skip generating the test target.
skip = (true | false)
# Extra data that will be provided to the Bazel target at compile time.
compile_data = ["String Array"]
# Extra data that will be provided to the Bazel target at compile and run time.
data = ["String Array"]
# Extra flags for rustc.
rustc_flags = ["String Array"]
# Environment variables to set for rustc.
[package.metadata.cargo-gazelle.test.<name>.rustc_env]
var1 = "my_value"
# Bazel test size.
#
# See <https://bazel.build/reference/be/common-definitions#common-attributes-tests>.
size = "String"
# Environment variables to set for test execution.
[package.metadata.cargo-gazelle.test.<name>.env]
var1 = "my_value"
# Configuration for binary targets of the crate.
[package.metadata.cargo-gazelle.binary.<name>]
# Skip generating the binary target.
skip = (true | false)
# Extra data that will be provided to the Bazel target at compile time.
compile_data = ["String Array"]
# Extra data that will be provided to the Bazel target at compile and run time.
data = ["String Array"]
# Extra flags for rustc.
rustc_flags = ["String Array"]
# Environment variables to set for rustc.
[package.metadata.cargo-gazelle.binary.<name>.rustc_env]
var1 = "my_value"
# Environment variables to set when running the binary.
[package.metadata.cargo-gazelle.binary.<name>.env]
var1 = "my_value"
If all else fails, the code that handles this configuration lives in `misc/bazel/cargo-gazelle`!
Bazel is designed to be run on a variety of hardware, operating systems, and system configurations. To manage this complexity Bazel has a concept of "constraints" to allow conditional configuration of rules, and "platforms" to manage hardware differences. There are three roles that a platform can serve: the host platform (the machine Bazel itself runs on), the execution platform (the machines that execute build actions), and the target platform (the machine the built output will run on).
The platforms that we build for are defined in `/platforms/BUILD.bazel`.
A common way to configure a build based on platform is to use the `select` function. This allows you to return different values depending on the platform we're targeting.
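For example, a `select` might pick platform-specific compiler flags like this (the constraint labels are standard `@platforms` values; the crate name and flags are illustrative):

```starlark
load("@rules_rust//rust:defs.bzl", "rust_library")

rust_library(
    name = "my_crate",
    srcs = glob(["src/**/*.rs"]),
    rustc_flags = select({
        # Keyed on the target platform's operating system.
        "@platforms//os:linux": ["--cfg=use_epoll"],
        "@platforms//os:macos": ["--cfg=use_kqueue"],
        "//conditions:default": [],
    }),
)
```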
Not necessarily related to platforms, but still defined in `/platforms/BUILD.bazel` are our custom build flags. Currently we have custom build settings for the following features:
While most build settings can get defined in the `.bazelrc`, these features require slightly more complex configuration. For example, if we're building with a sanitizer we need to disable `jemalloc`, because sanitizers commonly have their own allocator. To do this we create a new build flag with the `string_flag` rule from the Bazel Skylib rule set, and match on this using the `config_setting` rule that is built in to Bazel. The `config_setting` is then what we can match on in our `BUILD.bazel` files with a `select({ ... })` function.
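A minimal sketch of that pattern (the flag name and values here are hypothetical, not our actual settings):

```starlark
load("@bazel_skylib//rules:common_settings.bzl", "string_flag")

# Custom build flag, settable on the command line, e.g.
# `--//platforms:sanitizer=address`.
string_flag(
    name = "sanitizer",
    build_setting_default = "none",
)

# Matchable condition derived from the flag's value.
config_setting(
    name = "sanitizer_address",
    flag_values = {":sanitizer": "address"},
)
```

A `BUILD.bazel` file could then `select` on `:sanitizer_address` to, for example, drop the `jemalloc` dependency.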
Bazel has a specific framework to manage compiler toolchains. For example, instead of having to specify a Rust toolchain every time you use the `rust_library` rule, you instead register a global Rust toolchain that rules resolve during analysis.
Toolchains are defined and registered in the `WORKSPACE` file. We currently use Clang/LLVM to build C/C++ code (via the `toolchains_llvm` rule set), where the version is defined by the `LLVM_VERSION` constant. For Rust we support both stable and nightly, where the versions are defined by the `RUST_VERSION` and `RUST_NIGHTLY_VERSION` constants, respectively.
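Roughly, registering the C toolchain in a `WORKSPACE` looks something like this (the version number is a placeholder; our actual setup uses the `LLVM_VERSION` constant and custom toolchain archives):

```starlark
load("@toolchains_llvm//toolchain:rules.bzl", "llvm_toolchain")

# Downloads an LLVM distribution and generates a Bazel C/C++ toolchain.
llvm_toolchain(
    name = "llvm_toolchain",
    llvm_version = "17.0.6",
)

load("@llvm_toolchain//:toolchains.bzl", "llvm_register_toolchains")

# Makes the toolchain available for rules to resolve during analysis.
llvm_register_toolchains()
```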
Both `toolchains_llvm` and `rules_rust` have "process wrappers". These are small wrappers around `clang` and `rustc` that are able to inspect the absolute path they are being invoked from. Bazel does not expose absolute paths at all, so these wrappers are how arguments like `--remap-path-prefix` get set. These wrappers are helpful but can also cause issues like toolchains_llvm#421.
The upstream LLVM toolchains are very large and built for bespoke CPU architectures. While maybe not ideal, we build our own LLVM toolchains, which live in the MaterializeInc/toolchains repo. This ensures we're using the same version of `clang` across all architectures we support and greatly improves the speed of cold builds.

Note: The upstream LLVM toolchains are ~1 GiB and compressed with gzip; end-to-end they take about 3 minutes to download and set up. Our toolchains are ~80 MiB and compressed with zstd, and end-to-end take less than 30 seconds to download and set up.
Along with a C toolchain we also provide a system root for our builds. A system root contains things like `libc`, `libm`, and `libpthread`, as well as their associated header files. Our system roots also live in the MaterializeInc/toolchains repo.
For building Rust code we use `rules_rust`. Its primary component is the `crates_repository` rule.
## `crates_repository`

Normally when building a Rust library you define external dependencies in a `Cargo.toml`, and `cargo` handles fetching the relevant crates, generally from crates.io. The `crates_repository` rule does the same thing: we define a set of manifests (`Cargo.toml` files), and it will analyze them and create a Bazel repository containing all of the necessary external dependencies.
Then to build our crates, e.g. `mz-adapter`, we use the handy `all_crate_deps` macro. When using this macro in a `BUILD.bazel` file, it determines which package we're in (e.g. `mz-adapter`) and expands to all of the necessary external dependencies. Unfortunately it does not include dependencies from within our own workspace, so we still need to do a bit of manual work of specifying dependencies when writing our `BUILD.bazel` files.
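A sketch of how that looks in a `BUILD.bazel` file (the workspace-internal dependency shown is illustrative):

```starlark
load("@crates_io//:defs.bzl", "all_crate_deps")
load("@rules_rust//rust:defs.bzl", "rust_library")

rust_library(
    name = "mz_adapter",
    srcs = glob(["src/**/*.rs"]),
    deps = [
        # Workspace-internal dependencies must be listed manually.
        "//src/catalog:mz_catalog",
    ] + all_crate_deps(normal = True),  # external deps from Cargo.toml
)
```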
In the `WORKSPACE` file we define a "root" `crates_repository` named `crates_io`.
## `-sys` crates

There are some Rust crates that are wrappers around C libraries; for example, `decnumber-sys` is a wrapper around `libdecnumber`. `cargo-gazelle` will generate a Bazel target for the crate's build script, but it's likely this build script will fail because it can't find tools like `cmake` or our system root, or implicitly depends on some other C library.
The general approach we've used to get these crates to build is to duplicate the logic from the `-sys` crate's `build.rs` script into a Bazel target. See bazel/c_deps/rust-sys for some examples. Once you write a `BUILD.bazel` file for the C dependency, we add a `crate.annotation` in our `WORKSPACE` file that appends your newly written `BUILD.bazel` file to the one generated for the Rust crate.
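A sketch of such an annotation in the `WORKSPACE` file (the crate name and file path are illustrative):

```starlark
load("@rules_rust//crate_universe:defs.bzl", "crate", "crates_repository")

crates_repository(
    name = "crates_io",
    # ...
    annotations = {
        "decnumber-sys": [crate.annotation(
            # Appended to the BUILD.bazel generated for the crate.
            additive_build_file = "//bazel/c_deps/rust-sys:BUILD.decnumber.bazel",
        )],
    },
)
```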
Duplicating logic is never great, but having Bazel explicitly build these C dependencies provides better caching and more control over the process which unlocks features like cross language LTO.
There are a few C dependencies which are used both by a Rust `-sys` crate and another C dependency. For example, `zstd` is used by both the `zstd-sys` Rust crate and the `rocksdb` C library. For these cases, instead of depending on the version included via the Rust `-sys` crate, we "manually" include them by downloading the source files as an `http_archive`.
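A sketch of such a manual inclusion (the URL, version, checksum, and build file label are placeholders; the real definitions live in `bazel/c_deps/repositories.bzl`):

```starlark
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "zstd",
    # Placeholder values for illustration only.
    urls = ["https://example.com/zstd-1.5.0.tar.gz"],
    sha256 = "0000000000000000000000000000000000000000000000000000000000000000",
    strip_prefix = "zstd-1.5.0",
    # A hand-written BUILD file that tells Bazel how to build the C library.
    build_file = "//bazel/c_deps:BUILD.zstd.bazel",
)
```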
All cases of external C dependencies live in `bazel/c_deps/repositories.bzl`.
Nearly all of our Rust build scripts do a single thing, and that's generate Rust bindings to protobuf definitions. `rules_rust` includes rules for generating protobuf bindings when using Prost and Tonic, but they don't interact with Cargo build scripts very well. Instead we added a new crate called `build-tools` whose purpose is to abstract over whatever build system you're using and provide the tools a build script might need, like `protoc`.
For Bazel we provide the necessary tools via "runfiles", which are defined in the `data` field of the `rust_library` target. Bazel "runfiles" are a set of files that are provided at runtime execution. So in your build script, to get the current path of the `protoc` executable you would call `mz_build_tools::protoc` (example), which returns a different path depending on the build system currently being used.
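Conceptually, the generated target wires the tool in via `data`, something like this (the target name and protoc label are illustrative):

```starlark
load("@rules_rust//cargo:defs.bzl", "cargo_build_script")

cargo_build_script(
    name = "my_crate_build_script",
    srcs = ["build.rs"],
    # Runfiles: makes the protoc binary available to the build
    # script when it executes inside the sandbox.
    data = ["@protobuf//:protoc"],
)
```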
Development builds of Materialize include the current git hash in their version number. The sandbox that Bazel creates when building a Rust library does not include any git info, so attempts to get the current hash will fail.
But! Bazel has a concept of "stamping" builds, which allows you to provide local system information as part of the build process; this information is known as the workspace status. Generating the workspace status and providing it to Rust libraries requires a few steps, all of which are described in the `bazel/build-info/BUILD.bazel` file.
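For reference, stamping is typically driven by a workspace status command configured in a `.bazelrc` (the script path below is hypothetical; our actual wiring lives in `bazel/build-info`):

```
# Hypothetical example: the script's stdout becomes the workspace
# status, a set of key-value pairs (e.g. a git hash).
build --workspace_status_command=misc/bazel/workspace_status.sh
build --stamp
```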
Unfortunately this isn't the whole story though. It turns out workspace status and stamping builds cause poor remote cache performance. On a new build Bazel will regenerate the `volatile-status.txt` file used in workspace stamping, which causes any stamped libraries to not be fetched from the remote cache; see bazelbuild#10075. For us this caused a pretty serious regression in build times, so we came up with a workaround:

1. `mzbuild.py` will write out the current git hash to a temporary file.
2. The `build-info` Rust crate knows to read from this temporary file in a non-hermetic/side-channel way to get the git hash into the current build without invalidating the remote cache.

While definitely hacky, our side-channel for the git hash does provide a substantial improvement in build times, while providing similar guarantees to the Cargo build with respect to when the hash gets re-computed.