HCP Vault Dedicated opens cluster DR drills, by support ticket only

For the first time, HCP Vault Dedicated customers can run cluster-level disaster recovery drills against their managed secrets infrastructure, but only by opening a support ticket. HashiCorp announced cluster disaster recovery for HCP Vault Dedicated in public preview, letting teams simulate cluster failures and validate failover readiness. The catch is in the mechanics: HashiCorp support performs the operation, and the customer's runbook cannot trigger it.

What public preview actually opens up

For HCP Vault Dedicated customers on production-tier clusters with DR already enabled, the new feature lets you exercise the cluster-level failure path on demand. The blog post calls out two operations: isolating an affected cluster, and promoting a DR secondary so the workload keeps reading secrets while the primary is out of commission. Before this, the cluster-level path existed inside HashiCorp, but a customer had no sanctioned way to drill it.

For CI/CD owners, the practical read is simple: every pipeline that pulls secrets from Vault had a failure mode the team could not rehearse, and that mode just acquired a rehearsal hook.

Why this matters in a pipeline context

CI rarely tests the assumption that the secrets backend will answer. Most pipelines have one Vault address baked into a CI variable; if it stops resolving, every job that needs a token either sits and waits or fails fast, depending on the workflow's retry posture. Region-level outages are the case teams rehearse. Cluster-level ones (the noisy neighbour, the upgrade gone sideways, the corrupted Raft state) tend to live in vendor postmortems, never in a customer's game day.
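One way to make that assumption testable is a pre-flight probe that runs before any job asks for a token. The sketch below is illustrative only, not something from the announcement. It assumes the cluster answers Vault's standard `/v1/sys/health` endpoint with its default status codes and that the CI runner can reach it without authentication. The names `classify_health` and `probe_vault` are hypothetical.

```python
import urllib.error
import urllib.request

# Default status codes of Vault's /v1/sys/health endpoint.
HEALTH_STATES = {
    200: "active",
    429: "standby",
    472: "dr-secondary",
    473: "performance-standby",
    501: "uninitialized",
    503: "sealed",
}


def classify_health(status_code):
    """Map a sys/health status code to a short state name."""
    return HEALTH_STATES.get(status_code, "unknown")


def probe_vault(addr, timeout=3.0):
    """Return the health state for a Vault address, or 'unreachable'.

    `addr` is whatever the pipeline keeps in its CI variable,
    for example the value of VAULT_ADDR.
    """
    url = addr.rstrip("/") + "/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_health(resp.status)
    except urllib.error.HTTPError as err:
        # Non-2xx answers (429, 472, 503, ...) arrive as HTTPError.
        return classify_health(err.code)
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed address.
        return "unreachable"
```

A job that gets anything other than "active" back can fail with a clear message instead of hanging on a token request, which is the behaviour a drill is meant to expose.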

How a drill actually runs

Per the announcement, the workflow is manual. The customer opens a support ticket with the cluster details and the desired failover window. HashiCorp's on-call Vault team then performs the operation and validates recovery. The post commits to routing requests submitted within a 16-hour lead time to that on-call rotation.

Two operational consequences fall out of that.

First, a drill is a calendar event, not a button. You scope a window, file the ticket, wait for a confirmation, and run your scenario inside the window. That is fine for the quarterly DR exercise on a sprint board. It is awkward for the kind of unscheduled chaos test a platform team likes to slip in.

Second, the 16-hour lead time is also the rehearsal SLA. You cannot meaningfully drill an incident-response scenario faster than that, because the operator on the other side of the ticket is part of the path you are rehearsing.

Where this still does not bite

A few things the public preview does not change, all from the same announcement:

The cluster DR action runs at HashiCorp's hand. The customer's Terraform, the customer's CI, the customer's chaos tool: none of them invoke the failover.
The feature is scoped to production-tier clusters with DR enabled. Other tiers are out of scope for the drill.
Regional DR remains a separate capability. Cluster DR is additive, not a replacement for the existing region-level path.

The honest summary is that this is a rehearsal mechanism, not an automation primitive. Pipelines that want to gate on "the Vault region is reachable" still have to test that themselves, and runbooks that assume a one-command failover should keep their wording careful.
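Since the gate stays on the customer side, a pipeline can carry its own bounded wait instead of an open-ended retry. The following is a minimal sketch under assumptions: `probe` is a hypothetical callable supplied by the pipeline that returns a state string such as "active", and the attempt count and delay are placeholders rather than recommended values.

```python
import time


def wait_until_active(probe, attempts=5, delay_s=2.0, sleep=time.sleep):
    """Call probe() up to `attempts` times.

    Returns True as soon as probe() reports "active", and False once
    the attempts are used up. The pause grows linearly between tries.
    `sleep` is injectable so the function can be exercised without
    real waiting.
    """
    for attempt in range(1, attempts + 1):
        if probe() == "active":
            return True
        if attempt < attempts:
            sleep(delay_s * attempt)
    return False
```

Used as the first step of a job, a False result becomes a fast, explicit failure, and the total wait is bounded by the arguments rather than by the CI system's job timeout.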

How the rest of the managed-secrets field handles it

Cluster-level DR drills sit awkwardly across managed secrets and KMS because the cluster is the vendor's unit of operation, not the customer's. A few comparison points:

AWS KMS does not expose cluster failover. The failure model is "the region is up or it is not", and DR drills happen on the consumer side, by failing over the application's endpoint.
Google Cloud KMS and Azure Key Vault both surface regional or paired-region replication for higher tiers, but cluster mechanics stay hidden and the customer never holds the failover switch.
Self-hosted Vault leaves you the chaos engineering work entirely: you operate the cluster, so you can break it on a Tuesday afternoon if you want to.

HCP's cluster DR drill is the first of these to give a managed-service customer a sanctioned rehearsal for the cluster boundary. It does that slowly, through a ticket queue, which is a reasonable starting point and a real constraint.

Before the first drill

A short list for platform teams considering a first run: confirm the cluster is on a production tier with DR enabled, write the rehearsal scenario before opening the ticket, schedule the window with the 16-hour lead time on top, and decide ahead of time which CI workflows will be running during the cutover. If the pipelines are quiet, what gets validated is HashiCorp's runbook, not yours.
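The checklist above can be sketched as a small pre-ticket check. This is an illustration, not a HashiCorp tool. It reads the 16-hour figure as a minimum notice period, which is an assumption of this sketch, and the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# 16 hours is the figure quoted from the announcement; treating it as a
# minimum notice period is this sketch's assumption.
LEAD_TIME = timedelta(hours=16)


def earliest_window_start(ticket_filed_at):
    """Earliest drill start once the lead time is added to the filing time."""
    return ticket_filed_at + LEAD_TIME


def drill_blockers(production_tier, dr_enabled, scenario_written,
                   ticket_filed_at, window_start, ci_workflows):
    """Return a list of reasons the drill is not ready (empty means ready)."""
    blockers = []
    if not production_tier:
        blockers.append("cluster is not on a production tier")
    if not dr_enabled:
        blockers.append("DR is not enabled on the cluster")
    if not scenario_written:
        blockers.append("rehearsal scenario is not written yet")
    if window_start < earliest_window_start(ticket_filed_at):
        blockers.append("window starts inside the 16-hour lead time")
    if not ci_workflows:
        blockers.append("no CI workflows scheduled during the cutover")
    return blockers


filed = datetime(2025, 1, 6, 9, 0, tzinfo=timezone.utc)
print(earliest_window_start(filed).isoformat())  # 2025-01-07T01:00:00+00:00
```

The last check encodes the point about quiet pipelines: a window with no CI workflows running exercises the vendor's side of the failover and little else.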

The feature is in public preview. The next thing to watch is whether HashiCorp lets customers initiate the failover directly, or keeps the ticket-mediated workflow as the long-term shape.
