Test Failover

Note

Feature unavailable in Free and Shared-Tier Clusters

This feature is not available for M0 free clusters or for M2 and M5 clusters. To learn more about which features are unavailable, see Atlas M0 (Free Cluster), M2, and M5 Limitations.

Replica set elections occur every time Atlas makes configuration changes, as well as during failure scenarios. Configuration changes may result from patch updates or scaling events. For this reason, write your applications so that they can handle elections without any downtime.

Note

Retryable Writes with MongoDB 3.6 and later

MongoDB drivers can automatically retry certain write operations a single time. Retryable writes provide built-in handling of automatic failovers and elections. To learn more, see retryable writes.

To enable this feature, add retryWrites=true to your Atlas URI connection string. To learn more, see Connect via Driver.
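As a minimal sketch of the step above, the following Python helper adds retryWrites=true to a connection string using only the standard library. The function name with_retry_writes and the example host are hypothetical; copy the actual URI for your cluster from Atlas.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_retry_writes(uri: str) -> str:
    """Return the connection string with retryWrites=true in its options."""
    parts = urlsplit(uri)
    # Keep the existing options, dropping any earlier retryWrites value.
    options = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != "retrywrites"
    ]
    options.append(("retryWrites", "true"))
    # A "/" must separate the host list from the options.
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, urlencode(options), parts.fragment)
    )


if __name__ == "__main__":
    # Hypothetical placeholder URI, not a real cluster.
    print(with_retry_writes(
        "mongodb+srv://user:pass@cluster0.example.mongodb.net/test?w=majority"
    ))
```

The helper replaces an existing retryWrites option rather than appending a duplicate, so the resulting string carries the option exactly once.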

You can use the Atlas UI or the API to test the failure of the replica set primary in your Atlas cluster and observe how your application handles a replica set failover. You must have the Project Cluster Manager role or higher to test failover.

Test Failover Process

When you submit a request to test failover using the Atlas UI or API, Atlas simulates a failover event. During this process:

Atlas shuts down the current primary.
The members of the replica set hold an election to choose which of the secondaries will become the new primary.
Atlas brings the original primary back to the replica set as a secondary. When the old primary rejoins the replica set, it syncs with the new primary to catch up on any writes that occurred during its downtime.
Note
If the original primary accepted write operations that had not been successfully replicated to the secondaries when the primary stepped down, the primary rolls back those write operations when it rejoins the replica set and begins synchronizing. For more information on rollbacks, see Rollbacks During Replica Set Failover.
Contact MongoDB support for assistance with resolving rollbacks.

Note

If you are testing failover on a sharded cluster, Atlas triggers an election on all the replica sets in the sharded cluster.

Only the mongos processes that are on the same instances as the primaries of the replica sets in the sharded cluster are restarted.
The primaries of the replica sets in the sharded cluster are restarted in parallel.

Test Failover Using the Atlas UI

Click Database.
For the cluster on which you want to test failover, click the ... button.
Click Test Failover. Atlas displays a Test Failover modal with the steps Atlas will take to simulate a failover event.
Click Restart Primary to begin the test. See Test Failover Process for information on the failover process. Atlas reports the results of the failover in the Test Failover modal.

Test Failover Using the API

You can use the Test Failover API endpoint to simulate a failover event. To learn more about the failover process, see Test Failover Process.
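The request might be sketched in Python as follows. This is an illustration under stated assumptions, not the definitive call: it assumes the v1.0 Atlas Administration API base URL, a restartPrimaries resource path, and HTTP digest authentication with a programmatic API key pair. Confirm all three against the Test Failover endpoint reference before use. The function names and the identifiers in the usage note are hypothetical.

```python
import urllib.request
from urllib.parse import quote

# Assumption: base URL of the v1.0 Atlas Administration API.
ATLAS_BASE_URL = "https://cloud.mongodb.com/api/atlas/v1.0"


def build_failover_url(group_id: str, cluster_name: str) -> str:
    """Build the assumed Test Failover URL for a project and cluster."""
    return (
        f"{ATLAS_BASE_URL}/groups/{quote(group_id, safe='')}"
        f"/clusters/{quote(cluster_name, safe='')}/restartPrimaries"
    )


def request_test_failover(group_id, cluster_name, public_key, private_key):
    """Send the POST request and return the HTTP status code.

    Assumption: the endpoint accepts digest authentication with an
    API public/private key pair.
    """
    url = build_failover_url(group_id, cluster_name)
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, public_key, private_key)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(password_mgr)
    )
    # An empty body with method="POST" sends a POST without a payload.
    request = urllib.request.Request(url, data=b"", method="POST")
    with opener.open(request, timeout=30) as response:
        return response.status
```

A call such as request_test_failover("<GROUP-ID>", "<CLUSTER-NAME>", public_key, private_key) would restart the primaries of the named cluster, so run it only against a cluster where a failover is acceptable.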

You can verify that the failover was successful by doing the following:

Log in to the Atlas UI and click Database.
Click the name of the cluster for which you performed the failover test.
Observe the following changes in the list of nodes in the Overview tab:
- The original PRIMARY node is now a SECONDARY node.
- A former SECONDARY node is now the PRIMARY node.

Troubleshoot Failover Issues

If your application does not handle the failover gracefully, ensure the following:

The connection string includes all members of the replica set.
You are using the latest version of the driver.
You have implemented appropriate retry logic in your application.
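What "appropriate retry logic" looks like depends on the application. One common pattern is to retry an operation a bounded number of times with exponential backoff, which gives the replica set time to elect a new primary. The sketch below shows that pattern with the standard library only. TransientError and call_with_retry are hypothetical names, and the attempt count and delays are illustrative values, not Atlas recommendations.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a driver's retryable failover error."""


def call_with_retry(operation, attempts=5, base_delay=0.5, max_delay=8.0,
                    sleep=time.sleep):
    """Call operation(), retrying on TransientError with backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            # Give up after the final attempt and surface the error.
            if attempt == attempts - 1:
                raise
            # Double the delay on each retry, up to max_delay, and add
            # jitter so that many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

In a real application, the except clause would catch the driver's own error types instead of TransientError (for example, PyMongo raises pymongo.errors.AutoReconnect when it loses its connection to the primary). Retry only operations that are safe to repeat, because a write that failed with a network error may still have been applied.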
