We've all
heard the saying, "You can't have it all." For instance, you can't
enjoy all the perks of single life and be married, or have a busy weekend and
still get plenty of rest. Similarly, when choosing the right data
synchronization mode for Always On Availability Groups, you must consider
trade-offs and opportunity costs.
In synchronous-commit mode, the focus is on high availability rather than performance, resulting in increased transaction latency. The primary replica waits to confirm the transaction to the client until the secondary replica has written the log to disk.
In asynchronous-commit mode, the priority is minimizing transaction latency at the expense of high availability. The primary replica confirms the transaction to the client right after logging the record locally, without waiting for any secondary replicas.
As a wise person once said, "The bad news is you can't have it all. The good news is that when you know what's truly important, you don't want it all anyway."
Synchronous-Commit Mode
Data
synchronization in synchronous-commit mode works as follows:
1. A client initiates a transaction against the database in the availability group on the primary replica.
2. The primary replica generates transaction log blocks. Concurrently, the secondary replica requests the log blocks from the primary, negotiating the appropriate log sequence number (LSN) starting point and other necessary information. The primary replica’s log cache fills with these log blocks.
3. When the log block is full or the primary replica commits the operation, the log blocks from the log cache are flushed to the log file for persistence. In an Always On Availability Group setup, these log blocks are also copied to the log pool as they are flushed to the disk on the primary replica.
4. A thread called log capture reads the log blocks from the log pool and sends them to the secondary replica. For multiple secondary replicas, there is one log capture thread per replica, ensuring parallel log block transmission across replicas. The log content is compressed and encrypted before being sent to the secondary replicas.
5. On the secondary replica, a thread called log receive obtains the log blocks from the network.
6. These log blocks are written to the log cache on the secondary replica.
7. A redo thread, one per database, constantly runs on the secondary replica. As the log blocks are written to the log cache, the redo thread reads these blocks and applies the changes to the data and index pages in the secondary database to synchronize it with the primary replica. When the log block on the secondary replica is full, or upon receiving a commit log record, the content of the log cache is hardened onto the log disk on the secondary replica.
8. If the secondary replica is configured for synchronous mode, it acknowledges the commit to the primary node, indicating that the transaction is safely hardened, and the user can be informed that the transaction is committed. This ensures no data loss in case of a failover since the log is hardened on the secondary.
9. The redo thread operates independently of the log block generation on the secondary and their copying and persistence. If the redo thread is a few minutes behind, it may not find the log blocks in the log cache and will instead retrieve them from the log disk, as illustrated by the dotted line in the diagram above.
Asynchronous-Commit Mode
Data synchronization in Asynchronous-commit mode works as follows:
1. A
client initiates a transaction against the database in the availability group
on the primary replica.
2. The
primary replica generates transaction log blocks. Concurrently, the secondary
replica requests the log blocks from the primary, negotiating the appropriate
log sequence number (LSN) starting point and other necessary information.
3. When
the log block is full or the primary replica commits the operation, the log
blocks from the log cache are flushed to the log file for persistence. In an
Always On Availability Group setup, these log blocks are also copied to the log
pool as they are flushed to the disk on the primary replica.
4. If all
secondary replicas are in asynchronous availability mode, a successful I/O to
the local transaction log is sufficient to send an acknowledgment of a
successful commit back to the application. Simultaneously, a thread called log
capture reads the log blocks from the log pool and sends them to the secondary
replica. For multiple secondary replicas, there is one log capture thread per
replica to ensure parallel transmission. The log content is compressed and
encrypted before being sent to the secondary replicas.
5. A
thread called log receive runs on the secondary replica, receiving the log
blocks from the network and writing them to the log cache.
6. The log
blocks are written to the log cache on the secondary replica.
7. A redo
thread constantly runs on the secondary replica. As the log blocks are written
to the log cache, the redo thread reads these blocks and applies the changes to
the data and index pages in the secondary database, synchronizing it with the
primary replica. When the log cache on the secondary replica is full or
receives a commit log record, it hardens the content onto the log disk on the
secondary replica.
8. The redo thread operates independently of the log block generation and copying on the secondary replica. If the redo thread lags by a few minutes, it may not find the log blocks in the log cache and will instead retrieve them from the log disk, as indicated by the dotted line in the diagram.
Note:
Starting with SQL Server 2016, the synchronization throughput of Availability Groups has increased approximately tenfold due to enhancements in the data synchronization process. These improvements include parallel and faster compression of log blocks on the primary replica, an optimized synchronization protocol, and parallel decompression and redo of log records on the secondary replica. In SQL Server 2012 and 2014, the redo process was executed serially by a single thread, limiting it to a single CPU core. This caused secondary replicas to struggle with high-write workloads, leading to long downtimes during a failover. In SQL Server 2016, the redo process is executed in parallel, utilizing all available CPU cores. These enhancements have improved database recovery times on failover and increased the data freshness on secondary replicas.
0 comments:
Post a Comment