Exclusive reveal: SQL Sever AlwaysOn breakthrough in Ali Cloud


Wang Fangming
DBAplus community

DBAplus community

WeChat ID dbaplus

Intro is an enterprise-level professional community centered on Data, Blockchain and AiOps. Top celebrities, technical dry goods, high-quality original articles pushed every day, weekly online technology sharing, monthly offline technology salon, audience 20W+.


today


The authors introduce

Wang Fangming is a technical expert of Alibaba. From DBA to product research and development, he has grown up with Ali Cloud database products, and has a deep understanding of database technology and back-end technology platform construction. Currently, he is mainly responsible for RDS SQLServer product research and development.


As early as 2015, with the rapid development of Ali Cloud business, SQL Server business has also accumulated a large number of loyal customers, some of which are large. When the business peak is similar to the rush, the single machine specifications of RDS (specifications are allocated according to a certain proportion of “memory *CPU*IOPS”, According to the different underlying resources, there will be their own upper limit), which can not meet the business needs of users, so in our opinion, Scale Out is also needed.



However, SQL Server does not have a complete middleware product, so both logical Sharding and read-only separation need users’ cooperation in application transformation. From the perspective of users, Sharding is changing rapidly, which can not be completed in a short time. Therefore, more hope is placed on us to provide read and write separation solutions to meet business needs.


So read-write separation, the first thing that comes to mind is AlwaysOn technology. But because AlwaysOn was heavily dependent on domain control and Windows clustering, both of which were challenging to the infrastructure we relied on, it required a lot of non-standardization beyond product limits, and there were security risks. So we ended up abandoning the AlwaysOn solution and redesigning the solution to help users survive.


In the face of such customer needs, it is worth thinking about how to productize our solutions.


1. Rapid development of products


In addition to read-write separation, there were more important issues that we needed to address, so from 2015 to 2017, we went through a period of rapid growth, and we did a lot of things around product stability, diversity and user experience, just to name a few:



  • In order to improve stability and user experience, we first replaced the underlying architecture, which also laid a foundation for the diversified development of subsequent products.

  • In order to meet different user requirements, the combined versions of SQL Server 2008R2, 2012/2014/2016 Web, Standard, Enterprise and Edition are released.

  • The cloud assessment tool was launched to address the difficulty of cloud access, For different versions and scenarios, cloud SQL Server 2008 R2 on full backup data, cloud SQL Server 2012 and later on full backup data, and SQL on incremental backup data Cloud on Server instance level database;

  • In order to improve user experience and support more features, we provide a lot of encapsulated stored procedures in the SQL layer. Some seemingly simple functions in the face of external security, internal SQL mirroring and other factors, the implementation of the challenge is still very big.

  • In order to make expert services more intelligent and close to each user, we developed SQL Server CloudDBA to integrate a large number of performance and space problems on the cloud solutions.


There are still user requirements for read/write separation, and each time we encounter it, we first bootstrap to the IaaS layer and implement it with ECS. Because the time is not ripe for PaaS. The specific reason is closely related to the combination of SQL Server’s current technology stack and cloud products. Here we can also share some thoughts behind it.


Two, read and write separation


First of all, we need to clarify what read/write separation is. Most of the read/write separation of MySQL uses the middle layer to do route resolution, which can basically achieve transparency to the application side, and only a few scenarios need user adaptation.


SQL Server does not have a mature middleware product. In essence, this is the reason why TDS (Tabular Data Stream) is not fully open. There are ways to do it, but the costs far outweigh the benefits. Based on this, no matter what technology is used to achieve read and write separation in SQL Server, some adaptation is needed for applications. Even with AlwaysOn’s technology, the configuration of link-driven parameters is different, so the read/write separation we discuss later is based on this premise.


Three, technology selection


We compared all the related technology stacks of SQL Server:



Data security, High Availability (HA), Disaster Recovery (DR), and readability of the standby database are the most important issues for us.



HA here refers to whether the native technology itself supports automatic HA, and when combined with some cloud products, we also have the ability to turn non-support into support. Data security and disaster recovery time are largely determined by native technologies. Readable standby databases are a statement of a single technology, but there are combinations of technologies that can make unreadable readable ones readable (Database Mirroring + Database Snapshots).



Finally Transactional Replication and AlwaysOn together are technologies that we feel have an opportunity to productize for read/write separation.


Let’s look at the two technologies separately:



In principle, Replication is logical Replication. There are differences in performance, latency, and reliability between Replication and AlwaysOn’s physical Replication. In terms of product complexity read, controllability, and ease of use, Replication was too flexible to control down to the table and column levels, and the overall complexity was too high for users to use or for us to productize, so we chose AlwaysOn.


Iv. AlwaysOn technology


AlwaysOn isa technology that supports both High Availability and Disaster Recovery. Failover Cluster Instances (hereafter referred to as FCI) and Availability Groups (hereafter referred to as AG) are divided into Failover Cluster Instances (hereafter referred to as FCI) and Availability Groups (hereafter referred to as AG).



Both FCI and conventional AG rely on Windows Server Failover Clustering (WSFC). FCI is a Share Storage, AG is a Share Nothing. FCI is instance level synchronization, while AG is DB level synchronization.



It is easy to imagine that Share Nothing would have a distinction between synchronous and asynchronous (similar to mirroring), where the distinction requires that we know AlwaysOn’s basic synchronization process:





First, logs (Commit/Log Block Write) on the Primary node are flushed from the Log Cache to disks. Meanwhile, Log Capture on the Primary node sends logs to all other Replica nodes. The Log Receive Thread of the node flusher the received logs from the Log Cache to disk. Finally, the Redo Thread applies the received logs to the data file.


Unacknowlege unacknowlege Unacknowlege Unacknowlege UnACKNOWLEge UnACKNOWLEge unACKNOWLEge unACKNOWLEge unACKNOWLEge On the other hand, if the Primary does not return Secondary, then it is asynchronous.


This is the basic synchronization process, but both AlwaysOn and Database Mirroring have a condition that if the Secondary end is abnormal in synchronization mode, the Primary end does not receive its heartbeat or Acknowlege Commit. Then it doesn’t count as a write failure.



Because once it identifies the Secondary exception, it will not wait for this ACK, but degenerate into a similar asynchronous mode, but will record the abnormal status of the Secondary end in the base table, through the relevant view: Sys. dm_hadr_database_replica _States and sys.database_mirroring are exposed as our common NOT SYNCHRONIZING/Disconnect state.



At this time, the automatic operation and maintenance system or the DBA needs to make a judgment. When the Secondary is restored and brought back online, the system reports the End of Log (EOL) LSN to the Primary, and the Primary sends all logs of the HARDENED EOL LSN to the Primary.



Once the Secondary receives these logs and gradually brushes them into the log file, the AG or Mirroring related views mark their state as Synchronizing, indicating that they are Synchronizing until Last Hardened (LH) LSN reaches the master/slave state. You’re back in sync mode.


This has always been the case. SQLServer 2017 CU 1 introduced the REQUIRED_SYNCHRONIZ ED_SECONDARIES_TO_COMMIT parameter, which has a long name and basically covers its purpose. It is possible to have the Primary server wait until the Secondary node is back online and synchronized.


Having learned about AG synchronization, asynchrony and FCI, let’s summarize our concerns:



In the actual scheme, these can also be combined, and finally integrated with Ali Cloud products to make an overall scheme. As mentioned before, Ali Cloud started to make similar solutions to solve users’ problems in 15 years, until the final PaaS, which is also over three versions.


Evolution on the cloud


In the first version, we used ECS, SSD cloud disk, OSS, VPC and SLB as the foundation. In SQL technology, we use SQL+WSFC+AD. At present, there are many versions supported by this way, ranging from 12 to 17. The authentication mode can be domain control or certificate.


However, there are two disadvantages:


  • The high cost. In addition to Primary and two Secondary nodes, there should also be two AD nodes. After all, we need to ensure high availability in every link.

  • Not enough stability. Network jitter makes WSFC easily find exceptions, and THE SQL DB is unavailable at the same time.




This is the second version of the architecture, compared with the first version, we use HAVIP to solve the listener problem, remove the AD can only use certificates to do authentication, but also reduce the minimum resource cost to 3.



Before the plan is also used more on ali cloud, but the same as the first scheme, there will be many challenges on stability in the network, that is because our future scenarios with not only the city across the available area, there will be more across the Region, and through overseas, so this scheme can only Cover part of the user’s needs, But it’s not a final solution for us.




Finally, we came up with plan three, which eliminated WSFC and AD and focused only on the underlying cloud products and SQL itself.



The most important thing is that compared with scheme 2, the network jitter sensitivity will be lower and more controllable. At most, Send Queue accumulation will occur at the Primary end, which can be monitored and repaired through Performance Counter related to SQLServer.


However, no solution is perfect, and the price of strong controllability is that the native architecture without cluster and domain control does not have HADR capability, as students familiar with WSFC can know. Previous architectures relied on WSFC for HA, including health checks, resource management, distributed metadata notification maintenance, and failover, so we had to solve this problem ourselves.



We have also made a lot of efforts to achieve the final implementation of AlwaysOn HA system without domain control and without Cluster, independent of Cluster completely independent and controllable HA.



Six, productization


The final product architecture is as follows: Firstly, two synchronous nodes are guaranteed as master and standby nodes, and they are allocated to different available areas as far as possible. Other read-only nodes are asynchronous by default, and there can be up to seven read-only nodes. There are three types of user access links:


  • Read/write link: it points to two synchronous nodes, which is guaranteed by our HA.

  • Unified read-only link: set according to user requirements, bind the specified Replica nodes together to allocate links according to a certain weight ratio;

  • Single read-only link: Each read-only node provides A separate link for users to flexibly configure. For example, if the APP Server is in availability zone A, users can directly access the read-only address of availability zone A to avoid being routed to other areas.




So far, SQLServer AlwaysOn has been PaaS in ali cloud, of course, currently only support the most important functions, there are a lot of follow-up can be improved and rich. If you have any good suggestions or questions, please leave a message and communicate with me.


Wang Fangming

Reward

Long press the QR code to transfer money to me

Due to apple’s new regulations, the appreciation function of the iOS version of wechat has been disabled, and qr code transfer can be used to support public accounts.

    Views

    Scan QR Code via WeChat

    to follow Official Account

    Wechat scan


    Use applets

    About to open.”
    “Small programs
    cancel
    Open the