# [LDBC SNB Business Intelligence benchmark](https://github.com/ldbc/ldbc_snb_bi/tree/main)

To run this benchmark, you will need:

- inputs at an appropriate scale factor
- parameters at that same scale factor

You can download pre-generated datasets and parameters from . (Note that validation parameters exist only for scale factor 10.)

# What does the benchmark run?

LDBC BI runs a variety of analysis queries on a (generated) social-network-like graph. After a bulk load, the queries are rerun after several batches of updates. The benchmark specifies a protocol for running "enough" queries on each batch of updates.

Each benchmark query has parameters that make it specific to the data. For example, query 1 has a parameter `datetime`; it computes statistics on the number of messages in each 'message length category', restricted to messages from the same year as `datetime` but created before `datetime` itself. The generated parameters offer a number of interesting `datetime`s to select from.

The [Umbra implementation](https://github.com/ldbc/ldbc_snb_bi/tree/main/umbra) does some pre-computation in manually materialized views. We translate those into true materialized views (a sketch of this translation pattern appears in the appendix below).

# How do I run it?

The script `init.sql` does the initial bulk load. To get it to work:

1. [Download scale factor 1](https://pub-383410a98aef4cb686f0c7601eddd25f.r2.dev/bi-pre-audit/bi-sf1-composite-merged-fk.tar.zst) and unpack it into `test/ldbc-bi/bi-sf1-composite-merged`.
2. Run `find ${UMBRA_CSV_DIR} -name "*.csv.gz" -print0 | parallel -q0 gunzip` to decompress each individual CSV file.
3. Run `\i init.sql` from a psql session connected to Materialize. It will take a few minutes to load the data and define the materialized views.

Run `\i qXX.sql` to run query `XX`. Each query has appropriate `\set` commands at the beginning to fill in a parameter value (arbitrarily, the first one in the parameter set); the appendix below sketches the shape of such a file.

# Ideas, questions, and tasks

During the conversion process, each benchmark query is run as a one-shot `SELECT`. It may be more interesting to treat the queries as materialized views; we would then want to track not just total query time in the benchmark, but also some measure of latency. The appendix below sketches both styles.

# Local changes

We've manually reordered `weights` in the `PathQ19` view of query 19 to accommodate the way delta joins hydrate (they follow the join plan of the first syntactic table, which happened to be a poor choice for this query).

## TODO

- [ ] apply updates
  + do we want to measure "liveness" of the views as we run?
- [ ] fully automate locally/in staging
- [ ] load generator
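
# Appendix: sketches

The snippets below are illustrative sketches, not files from the repository; table names, column names, thresholds, and parameter values are assumptions.

First, the translation mentioned under "What does the benchmark run?": where the Umbra implementation precomputes a helper relation up front, the same query can instead be declared as a true materialized view that Materialize keeps up to date.

```sql
-- Hypothetical example of the translation pattern; `posts`, `comments`, and
-- their columns are assumed names, not necessarily those used by init.sql.

-- Umbra-style manual materialization: precompute a combined relation once,
-- before running the queries.
--   CREATE TABLE message AS
--       SELECT id, creationdate, length FROM posts
--       UNION ALL
--       SELECT id, creationdate, length FROM comments;

-- Materialize-style translation: the same query as a true materialized view,
-- incrementally maintained as `posts` and `comments` change.
CREATE MATERIALIZED VIEW message AS
    SELECT id, creationdate, length FROM posts
    UNION ALL
    SELECT id, creationdate, length FROM comments;
```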
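
Next, the shape of a parameterized `qXX.sql` file: a `\set` at the top binds the parameter, and the query references it via psql's `:'variable'` substitution. The body below only mimics query 1's "same year as `datetime`, but before `datetime`" filter; the length-category thresholds and output columns are made up for illustration.

```sql
-- Hypothetical stand-in for a qXX.sql file; not the repository's q01.sql.
\set datetime '2012-06-01 00:00:00+00'

SELECT
    CASE
        WHEN length <  40 THEN 'short'
        WHEN length <  80 THEN 'one-liner'
        WHEN length < 160 THEN 'tweet'
        ELSE 'long'
    END      AS length_category,
    count(*) AS message_count
FROM message
WHERE creationdate < :'datetime'::timestamptz
  AND EXTRACT(YEAR FROM creationdate)
    = EXTRACT(YEAR FROM :'datetime'::timestamptz)
GROUP BY 1
ORDER BY 1;
```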
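
Finally, the materialized-view treatment floated under "Ideas, questions, and tasks": the same (assumed) query body is wrapped in `CREATE MATERIALIZED VIEW`, with the chosen parameter value baked in at view creation time, since psql expands `:'datetime'` client-side. Reads then become cheap lookups, and the interesting measurement shifts from total query time to how quickly the view reflects each batch of updates.

```sql
-- Hypothetical materialized-view form of the sketch above; `q01_live` is an
-- invented name.
\set datetime '2012-06-01 00:00:00+00'

CREATE MATERIALIZED VIEW q01_live AS
SELECT
    CASE
        WHEN length <  40 THEN 'short'
        WHEN length <  80 THEN 'one-liner'
        WHEN length < 160 THEN 'tweet'
        ELSE 'long'
    END      AS length_category,
    count(*) AS message_count
FROM message
WHERE creationdate < :'datetime'::timestamptz
  AND EXTRACT(YEAR FROM creationdate)
    = EXTRACT(YEAR FROM :'datetime'::timestamptz)
GROUP BY 1;

-- After each batch of updates, reading the view is a point-in-time lookup;
-- one crude latency signal is how long this read lags behind the batch.
SELECT * FROM q01_live ORDER BY length_category;
```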