The scalability test framework attempts to determine how the product scales in terms of concurrent SQL connections.
The goals of this framework are to:

- Identify bottlenecks in the product, that is, workloads that do not scale well for some reason.
- Evaluate the performance impact of changes to the code and detect regressions.
1. Collect the data:

   ```
   cd test/scalability
   ./mzcompose run default
   ```

2. Start Jupyter lab:

   ```
   ./mzcompose run lab
   ```

   This is going to display a URL that you can open in your browser.

3. Open `scalability.ipynb`.

4. From the Run menu, select Run All Cells.

5. Select the workload you want charted from the drop-down.

6. Use drag+drop to preserve any charts.
The framework can be directed to execute its queries against various targets. Multiple targets can be specified within a single command line (see the combined example at the end of this section).

```
./mzcompose run default --target HEAD ...
./mzcompose run default --target v1.2.4 ...
```

In both of these cases, Materialize, CRDB, and Python will run on the same machine, possibly interfering with each other.

```
./mzcompose run default --target postgres ...
```

```
./mzcompose run default --target=remote --materialize-url="postgres://user:password@host:6875/materialize?sslmode=require" --cluster-name= ...
```
```
./mzcompose run default --target common-ancestor ...
```

This resolves to the commit at the merge base of the current branch, of which the latest available build will be used. When running locally, this is the last shared commit of the current branch and the main branch.
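Since multiple targets can be specified in one invocation, a single run can, for example, benchmark the current state of the code against its merge base:

```
./mzcompose run default --target=HEAD --target=common-ancestor ...
```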
A regression is defined as a deterioration in performance (transactions per second) of more than a configurable threshold (default: 20%) for a given workload and count factor, compared to the baseline. To detect a regression, add the `--regression-against` parameter and specify a target. The specified target will be added to the `--target`s if it is not already present.
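For example, with the default 20% threshold, a workload that achieves 1000 transactions per second on the baseline but only 780 on the target (a 22% drop) would be flagged as a regression. A typical invocation, combining the targets described above, might look like this:

```
./mzcompose run default --target=HEAD --regression-against=common-ancestor ...
```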
The framework uses an exponential function to determine which concurrencies to test. By default, an exponent base of 2 is used, with a default minimum concurrency of 1 and a maximum of 256. To get more data points, you can set the exponent base to a value less than 2:

```
./mzcompose run default --exponent-base=1.5 --min-concurrency=N --max-concurrency=M
```
The framework will run `--count=256` operations at concurrency 1 and then multiply the count by `sqrt(concurrency)` for higher concurrencies. This way, a larger number of operations is performed at the higher concurrencies, leading to more stable results. If a flat `--count` operations were used when benchmarking concurrency 256, the test would complete in a second, leading to unstable results.
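As an illustration of the two rules above, the tested concurrencies and per-concurrency operation counts could be derived roughly as follows (a sketch, not the framework's actual code; the function names are made up):

```python
import math

def concurrencies(exponent_base: float = 2.0, min_c: int = 1, max_c: int = 256) -> list[int]:
    """Exponentially spaced concurrency levels, deduplicated after rounding."""
    if exponent_base <= 1.0:
        raise ValueError("exponent_base must be greater than 1")
    levels: list[int] = []
    exponent = 0
    while (c := round(exponent_base**exponent)) <= max_c:
        if c >= min_c and (not levels or c != levels[-1]):
            levels.append(c)
        exponent += 1
    return levels

def operation_count(concurrency: int, base_count: int = 256) -> int:
    """base_count operations at concurrency 1, scaled by sqrt(concurrency) above that."""
    return round(base_count * math.sqrt(concurrency))

print(concurrencies())                   # [1, 2, 4, 8, 16, 32, 64, 128, 256]
print(concurrencies(exponent_base=1.5))  # denser spacing, i.e. more data points
print(operation_count(256))              # 4096
```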
This diagram shows the transactions per second at each concurrency. Higher values are better.
These plots show the duration of the individual statements per concurrency. They provide information about the mean duration of an operation and the reliability of its timing. Lower values are better. Violin plots are used by default; boxplots are available as an alternative.
The violin plots show the distribution of the data. The dark blue bar shows the interquartile range, which contains the middle 50% of the measurements. The horizontal dark blue line shows the median.
See also: https://en.wikipedia.org/wiki/Violin_plot
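As a rough illustration of how such a plot can be produced from raw statement durations (a sketch with made-up data, not the notebook's actual code):

```python
import matplotlib.pyplot as plt

# Hypothetical per-statement durations in seconds, keyed by concurrency.
durations = {
    1: [0.010, 0.011, 0.012, 0.011],
    2: [0.012, 0.015, 0.013, 0.014],
    4: [0.018, 0.024, 0.021, 0.022],
}

fig, ax = plt.subplots()
ax.violinplot(list(durations.values()), showmedians=True)
ax.set_xticks(range(1, len(durations) + 1), labels=[str(c) for c in durations])
ax.set_xlabel("concurrency")
ax.set_ylabel("statement duration [s]")
plt.show()
```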
For the most important properties of box plots in a nutshell, see also: https://en.wikipedia.org/wiki/Box_plot
The following considerations can potentially impact the accuracy/realism of the reported results:
- The framework is an open-loop benchmark, so it just pumps SQL statements to the database as fast as it can execute them.
- The transactions-per-second is calculated on the client side, which means Python overhead is included in the measurement.
- The framework uses `concurrent.futures.ThreadPoolExecutor`. We measured the time it takes to run a no-op in Python, as well as the time it takes to run a `time.sleep()`, and, while both produced unwelcome outliers, it does not seem that Python would be obscuring any major trends in the charts (a sketch of such a measurement follows this list).
- By default, all participants run on a single machine, which may influence the results. In particular, CRDB exhibits high CPU usage, which may be crowding out the remaining components.
- Consider using a Materialize instance that is not colocated with the rest of the components.
- To reduce end-to-end latency, consider using a bin/scratch instance that is in the same AWS region as the Materialize instance you are benchmarking.
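To get a feeling for that client-side overhead, a minimal measurement along the following lines can be run (a sketch under the assumptions above, not the framework's code):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def noop() -> None:
    pass

def timed(executor: ThreadPoolExecutor, task, n: int = 1000) -> float:
    """Submit n tasks and return the wall-clock seconds until all of them complete."""
    start = time.monotonic()
    futures = [executor.submit(task) for _ in range(n)]
    for future in futures:
        future.result()
    return time.monotonic() - start

with ThreadPoolExecutor(max_workers=32) as executor:
    print(f"no-op:      {timed(executor, noop):.4f}s")
    print(f"sleep(1ms): {timed(executor, lambda: time.sleep(0.001)):.4f}s")
```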