---
title: "Disaster recovery"
description: "Learn about various disaster recovery (DR) strategies for Materialize."
disable_list: true
menu:
  main:
    parent: "manage"
    weight: 45
    identifier: "disaster-recovery"
---

The following outlines various disaster recovery (DR) strategies for Materialize.

## Level 1: Basic configuration (Intra-Region Recovery)

Because Materialize is deterministic and its infrastructure runs on a container scheduler (AWS EKS), the basic Materialize configuration provides intra-region disaster recovery **as long as**:

- Materialize can spin up a new pod somewhere in the region, and
- S3 is available.

In such cases, your mean time to recovery is the **same as your compute cluster's rehydration time**.

{{< annotation type="💡 Recommendation" >}}
When running with the basic configuration, we recommend that you track your rehydration time to ensure that it is within an acceptable range for your business's risk tolerance.
{{</ annotation >}}

## Level 2: Multi-replica clusters (High availability across AZs)

{{< note >}}
The hybrid strategy is available if your deployment uses a [three-tier or a two-tier architecture](/manage/operational-guidelines/).
{{</ note >}}

Materialize supports multi-replica clusters, allowing for distribution across Availability Zones (AZs):

{{< include-md file="shared-content/multi-replica-az.md" >}}

Multi-replica **compute clusters** and multi-replica **serving clusters** (excluding sink clusters) with replicas distributed across AZs provide DR resilience against machine-level failures, rack- and building-level outages, and AZ-level failures for those clusters:

- With multi-replica **compute clusters**, each replica performs the same work.
- With multi-replica **serving clusters** (excluding sink clusters), each replica processes the same queries.

As such, your compute and serving clusters will continue to serve up-to-date data uninterrupted in the case of a replica failure.
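
As a rough illustration of why distributing replicas across AZs matters, the following Python sketch checks whether a replica placement leaves at least one replica running after the loss of any single AZ. It is a toy model, not part of Materialize: the function names, replica names, and AZ IDs are all hypothetical.

```python
def surviving_replicas(replica_azs: dict[str, str], failed_az: str) -> list[str]:
    """Return the replicas that remain after every replica in failed_az is lost."""
    return sorted(name for name, az in replica_azs.items() if az != failed_az)


def tolerates_single_az_failure(replica_azs: dict[str, str]) -> bool:
    """True if at least one replica survives the loss of any one AZ."""
    if not replica_azs:
        # A cluster with no replicas cannot serve anything.
        return False
    azs = set(replica_azs.values())
    return all(surviving_replicas(replica_azs, az) for az in azs)


# Hypothetical placements: replica names and AZ IDs are illustrative only.
spread = {"r1": "use1-az1", "r2": "use1-az2"}
colocated = {"r1": "use1-az1", "r2": "use1-az1"}

print(tolerates_single_az_failure(spread))     # replicas in different AZs
print(tolerates_single_az_failure(colocated))  # both replicas in one AZ
```

In this model, two replicas placed in the same AZ protect against a machine-level failure but not against an AZ-level failure, which is why the placement across AZs is the part that provides the resilience described above.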
{{< annotation type="💡 Cost and work capacity" >}}
{{< include-md file="shared-content/cluster-replica-cost-capacity-notes.md" >}}
{{</ annotation >}}

If you require resilience beyond a single region, consider the Level 3 strategy.

## Level 3: A duplicate Materialize environment (Inter-region resilience)

{{< note >}}
{{< include-md file="/shared-content/regional-dr-infrastructure-as-code.md" >}}
{{</ note >}}

For region-level fault tolerance, you can choose to have a second Materialize environment in another region. With this strategy:

- You avoid complicated cross-regional communication.
- You avoid state dependency checks and verifications.
- Because Materialize is deterministic, the two Materialize environments can guarantee the same results, as long as your upstream sources can also be accessed from the second region.

{{< annotation type="💡 No strict transactional consistency between environments" >}}
This approach does **not** offer strict transactional consistency across regions. However, as long as both regions are caught up, the results should be within about a second of each other.
{{</ annotation >}}

The duplicate Materialize environment setup can be adapted into a more cost-effective setup if your deployment uses a [three-tier or a two-tier architecture](/manage/operational-guidelines/). For details, see the [hybrid variation](#hybrid-variation).

### Hybrid variation

{{< note >}}
- The hybrid strategy is available if your deployment uses a [three-tier or a two-tier architecture](/manage/operational-guidelines/).
- {{< include-md file="/shared-content/regional-dr-infrastructure-as-code.md" >}}
{{</ note >}}

For a more cost-effective variation of the duplicate Materialize environment in another region, you can choose a hybrid strategy where:

- Only the source clusters are running in the second Materialize environment.
- The compute clusters are provisioned **only** in the event of an incident.
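
To make the trade-offs between the strategies on this page easier to compare, here is a minimal Python sketch that estimates downtime for an AZ or region failure. It is a simplified model under stated assumptions, not a Materialize API: the function name, the `"none"`/`"full"`/`"hybrid"` labels, and the choice to treat a fully duplicated environment as zero downtime (ignoring the time to redirect clients) are all assumptions made for illustration.

```python
from typing import Optional


def expected_downtime_s(
    failure: str,
    hydration_s: float,
    multi_replica_across_azs: bool = False,
    second_region: str = "none",
) -> Optional[float]:
    """Rough downtime estimate in seconds; None means no recovery path is modeled."""
    if failure == "az":
        # Replicas in other AZs keep serving, so failover is immediate.
        # Otherwise the cluster must rehydrate on a new pod in the region.
        return 0.0 if multi_replica_across_azs else hydration_s
    if failure == "region":
        if second_region == "full":
            # Assumption: the duplicate environment is already hydrated,
            # and the time to redirect clients is ignored.
            return 0.0
        if second_region == "hybrid":
            # Compute clusters are provisioned only during the incident,
            # so they must hydrate before they can serve results.
            return hydration_s
        if second_region == "none":
            # No second environment: this model has no recovery path.
            return None
        raise ValueError(f"unknown second_region: {second_region!r}")
    raise ValueError(f"unknown failure type: {failure!r}")


# Hypothetical measured hydration time of 10 minutes.
print(expected_downtime_s("az", 600.0))
print(expected_downtime_s("az", 600.0, multi_replica_across_azs=True))
print(expected_downtime_s("region", 600.0, second_region="hybrid"))
```

The `hydration_s` input is the value you would obtain by tracking your own cluster's rehydration time; the model only shows which failures that time applies to under each strategy.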

When combined with a [multi-replica approach](#level-2--multi-replica-clusters-high-availability-across-azs), you have:

- Immediate failover during an AZ failure.
- Downtime equal to hydration time during inter-region failover.

## See also

- [Materialize DR characteristics](/manage/disaster-recovery/recovery-characteristics)