Data Synchronization Modes In Always On In SQL Server

In this article we will learn about the data synchronization modes in Always On high availability. There are two modes of data synchornization - Synchronous commit and Asynchronous commit.

We've all heard the saying, "You can't have it all." For instance, you can't enjoy all the perks of single life and be married, or have a busy weekend and still get plenty of rest. Similarly, when choosing the right data synchronization mode for Always On Availability Groups, you must consider trade-offs and opportunity costs.

In synchronous-commit mode, the focus is on high availability rather than performance, resulting in increased transaction latency. The primary replica waits to confirm the transaction to the client until the secondary replica has written the log to disk.

In asynchronous-commit mode, the priority is minimizing transaction latency at the expense of high availability. The primary replica confirms the transaction to the client right after logging the record locally, without waiting for any secondary replicas.

As a wise person once said, "The bad news is you can't have it all. The good news is that when you know what's truly important, you don't want it all anyway."

Synchronous-Commit Mode


Synchronous Commit Mode

Data synchronization in synchronous-commit mode works as follows:

1. A client initiates a transaction against the database in the availability group on the primary replica.

2. The primary replica generates transaction log blocks. Concurrently, the secondary replica requests the log blocks from the primary, negotiating the appropriate log sequence number (LSN) starting point and other necessary information. The primary replica’s log cache fills with these log blocks.

3. When the log block is full or the primary replica commits the operation, the log blocks from the log cache are flushed to the log file for persistence. In an Always On Availability Group setup, these log blocks are also copied to the log pool as they are flushed to the disk on the primary replica.

4. A thread called log capture reads the log blocks from the log pool and sends them to the secondary replica. For multiple secondary replicas, there is one log capture thread per replica, ensuring parallel log block transmission across replicas. The log content is compressed and encrypted before being sent to the secondary replicas.

5. On the secondary replica, a thread called log receive obtains the log blocks from the network.

6. These log blocks are written to the log cache on the secondary replica.

7. A redo thread, one per database, constantly runs on the secondary replica. As the log blocks are written to the log cache, the redo thread reads these blocks and applies the changes to the data and index pages in the secondary database to synchronize it with the primary replica. When the log block on the secondary replica is full, or upon receiving a commit log record, the content of the log cache is hardened onto the log disk on the secondary replica.

8. If the secondary replica is configured for synchronous mode, it acknowledges the commit to the primary node, indicating that the transaction is safely hardened, and the user can be informed that the transaction is committed. This ensures no data loss in case of a failover since the log is hardened on the secondary.

9. The redo thread operates independently of the log block generation on the secondary and their copying and persistence. If the redo thread is a few minutes behind, it may not find the log blocks in the log cache and will instead retrieve them from the log disk, as illustrated by the dotted line in the diagram above.

Asynchronous-Commit Mode


Asynchronous Commit Mode

Data synchronization in Asynchronous-commit mode works as follows:

1. A client initiates a transaction against the database in the availability group on the primary replica.

2. The primary replica generates transaction log blocks. Concurrently, the secondary replica requests the log blocks from the primary, negotiating the appropriate log sequence number (LSN) starting point and other necessary information.

3. When the log block is full or the primary replica commits the operation, the log blocks from the log cache are flushed to the log file for persistence. In an Always On Availability Group setup, these log blocks are also copied to the log pool as they are flushed to the disk on the primary replica.

4. If all secondary replicas are in asynchronous availability mode, a successful I/O to the local transaction log is sufficient to send an acknowledgment of a successful commit back to the application. Simultaneously, a thread called log capture reads the log blocks from the log pool and sends them to the secondary replica. For multiple secondary replicas, there is one log capture thread per replica to ensure parallel transmission. The log content is compressed and encrypted before being sent to the secondary replicas.

5. A thread called log receive runs on the secondary replica, receiving the log blocks from the network and writing them to the log cache.

6. The log blocks are written to the log cache on the secondary replica.

7. A redo thread constantly runs on the secondary replica. As the log blocks are written to the log cache, the redo thread reads these blocks and applies the changes to the data and index pages in the secondary database, synchronizing it with the primary replica. When the log cache on the secondary replica is full or receives a commit log record, it hardens the content onto the log disk on the secondary replica.

8. The redo thread operates independently of the log block generation and copying on the secondary replica. If the redo thread lags by a few minutes, it may not find the log blocks in the log cache and will instead retrieve them from the log disk, as indicated by the dotted line in the diagram.

Note: 
Starting with SQL Server 2016, the synchronization throughput of Availability Groups has increased approximately tenfold due to enhancements in the data synchronization process. These improvements include parallel and faster compression of log blocks on the primary replica, an optimized synchronization protocol, and parallel decompression and redo of log records on the secondary replica. In SQL Server 2012 and 2014, the redo process was executed serially by a single thread, limiting it to a single CPU core. This caused secondary replicas to struggle with high-write workloads, leading to long downtimes during a failover. In SQL Server 2016, the redo process is executed in parallel, utilizing all available CPU cores. These enhancements have improved database recovery times on failover and increased the data freshness on secondary replicas.

0 comments:

Post a Comment