123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475 |
- columns:
- - column: "Failure Type"
- - column: "RPO"
- - column: "RTO (RF1 - single AZ)"
- - column: "RTO (RF2 - multiple AZs)"
- rows:
- - Failure Type: "**Machine failure**"
- RPO: 0
- RTO (RF1 - single AZ): |
- Time to spin up new machine + possible rehydration time, depending on the
- objects on the machine:
- - If non-upsert sources, no rehydration time(i.e., does not require
- rehydration).
- - If upsert sources, rehydration time.
- - If sinks, no rehydration time (i.e., does not require rehydration).
- - If compute, rehydration time.
- - If serving, rehydration time.
- Additionally, there may be some time to catch up with changes that may
- have occurred during the downtime.
- To reduce rehydration time, scale up the cluster.
- RTO (RF2 - multiple AZs): |
- Can be:
- - 0 if only compute and serving objects are on the machine.
- - Time to spin up new machine if sources or sinks are on the machine.
- In addition, cluster RTO is affected if the [`environmentd` is
- down](#environmentd) (seconds to minutes).
- - Failure Type: "**Single AZ failure**"
- RPO: 0
- RTO (RF1 - single AZ): |
- *For managed clusters*
- Time to spin up new machine + possible rehydration time, depending on the
- objects on the machine:
- - If non-upsert sources, no rehydration time(i.e., does not require
- rehydration).
- - If upsert sources, rehydration time.
- - If sinks, no rehydration time (i.e., does not require rehydration).
- - If compute, rehydration time.
- - If serving, rehydration time.
- Additionally, there may be some time to catch up with changes that may
- have occurred during the downtime.
- To reduce rehydration time, you can scale up the cluster.
- During downtime, single AZ PrivateLinks are impacted.
- RTO (RF2 - multiple AZs): |
- Can be:
- - 0 if only compute and serving objects are on the machine.
- - Time to spin up new machine if sources or sinks are on the machine.
- In addition, cluster RTO is affected if the [`environmentd` is
- down](#environmentd) (seconds to minutes).
- - Failure Type: "**Regional failure (or 2 AZs failures)**"
- RPO: |
- At most, 1 hour (time since last backup, based on hourly backups).<br>
- RTO (RF1 - single AZ): |
- ~1 hour (time to check pointers).
- RTO (RF2 - multiple AZs): |
- High/Significant. Consider using a [regional failover strategy](/manage/disaster-recovery/#level-3-a-duplicate-materialize-environment-inter-region-resilience).
|