The ACM SIGMOD International Conference on Management of Data is an international conference sponsored by the ACM Special Interest Group on Management of Data (SIGMOD). It is widely regarded as one of the most prestigious academic venues in the database field.

SIGMOD and two other database conferences, VLDB and ICDE, are the three top conferences in the database world. SIGMOD is generally considered the most prestigious of the three, which makes acceptance harder. Its paper acceptance rate is very low, averaging about 15%-17%.

The paper "TcpRT: Instrument and Diagnostic Analysis System for Service Quality of Cloud Databases at Massive Scale in Real-time" (a real-time collection and diagnosis system for the service quality of large-scale cloud databases) was accepted at SIGMOD 2018.

The TcpRT paper presents the innovations of the RDS Sky Vision system in collecting SLA data for cloud databases, calculating service quality metrics, detecting anomalies, and analyzing the root causes of faults. It also shares practical experience from large-scale deployment and automated operation on cloud platforms.

Reviewer comments

I have plenty of experience with manual anomaly detection. That has wasted much time for me at work, so I liked what you described.

The links below point to the Alibaba Cloud paper accepted at SIGMOD 2018.

Download the Chinese version: click.aliyun.com/m/100000035…
Download the English version: click.aliyun.com/m/100000035…

Introduction

As more and more enterprises move to the cloud, databases, a core component of enterprise systems, have become the fastest-growing online service business of cloud computing companies. As the largest cloud database vendor in China, our RDS team is committed to providing users with stable cloud database services. In essence, RDS is a multi-tenant DBaaS platform. It uses resource isolation technologies such as lightweight KVM and Docker images to deploy the database instances that users purchase on physical machines, allocates resources on demand, and scales instances up and down automatically, giving a fully automated, intelligent approach to operations and maintenance.

Cloud databases are critical to the stability of customer services. It is therefore a challenge for cloud database vendors to detect performance anomalies quickly and to locate their causes in a timely manner. TcpRT is the infrastructure that Alibaba Cloud database uses to monitor and diagnose database service quality. TcpRT collects trace data from the congestion control module of the host TCP/IP protocol stack, calculates database latency and network anomalies, analyzes and aggregates the data at large scale in real time on a backend stream computing platform, and detects anomalies by fitting a Cauchy distribution to the historical data of each statistical metric. The anomaly probability of each component is then calculated from the proportion of instances on the same host, switch, or proxy that show a consistent trend.
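The detection and localization idea described above can be sketched roughly as follows. This is a minimal illustration, not the algorithm from the paper: the function names, the median and quartile-based Cauchy fit, and the `alpha` threshold are all assumptions made for the example.

```python
import math
from statistics import median


def fit_cauchy(history):
    """Robust Cauchy fit (assumed method): location = median,
    scale = half the interquartile range, since the quartiles of
    Cauchy(x0, gamma) lie at x0 - gamma and x0 + gamma."""
    xs = sorted(history)
    n = len(xs)
    x0 = median(xs)
    q1 = median(xs[: n // 2])          # median of the lower half
    q3 = median(xs[(n + 1) // 2:])     # median of the upper half
    gamma = max((q3 - q1) / 2.0, 1e-9)  # avoid a zero scale
    return x0, gamma


def cauchy_tail_prob(x, x0, gamma):
    """P(X >= x) for X ~ Cauchy(x0, gamma)."""
    return 0.5 - math.atan((x - x0) / gamma) / math.pi


def is_anomalous(x, history, alpha=0.01):
    """Flag x when its upper-tail probability under the fitted
    distribution falls below alpha (hypothetical threshold)."""
    x0, gamma = fit_cauchy(history)
    return cauchy_tail_prob(x, x0, gamma) < alpha


def component_anomaly_ratio(instance_flags):
    """Share of the instances behind one host, switch, or proxy
    that are flagged anomalous at the same time."""
    if not instance_flags:
        return 0.0
    return sum(instance_flags) / len(instance_flags)
```

For example, if three of the four instances on one host are flagged in the same window, `component_anomaly_ratio([True, True, True, False])` gives 0.75, which points at the shared host rather than at any single instance.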

To date, TcpRT has been running stably at Alibaba Cloud for three years. It collects 20 million raw trace records per second, processes 10 billion transaction records in the backend every day, and detects anomalies within seconds.

Contributions of this paper

  • We present a new method for collecting database quality of service (QoS) data based on the kernel congestion control module. For relational databases that use a stop-and-wait protocol, it collects per-connection latency and bandwidth in a non-invasive, low-cost way, and it analyzes how users access the database (short-lived versus long-lived connections). It can also record and quantify, end to end, the impact of the underlying network service quality on database service quality, including the packet loss rate and retransmission rate.
  • We have developed a stream computing system that cleans, filters, aggregates, and analyzes the collected raw data. The system scales horizontally, is fault tolerant, runs in real time, provides exactly-once semantics, and can exchange data with other big data platforms such as EMR and MaxCompute.
  • We propose a new algorithm that analyzes TcpRT data to determine whether database service quality is abnormal and to locate the root cause of abnormal events.
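To make the first contribution more concrete: in a stop-and-wait protocol, a client sends one request on a connection and waits for the response before sending the next, so the server-side response time can be read off the packet timeline. The sketch below is only an illustration under that assumption. The event format (`(timestamp_us, direction)` tuples) and the function name are hypothetical, and this is not the kernel module described in the paper.

```python
def response_times(events):
    """Compute per-request response times for one connection.

    events: time-ordered (timestamp_us, direction) tuples, where
    direction is "in" (request packet received by the server) or
    "out" (response packet sent by the server). Hypothetical format.

    The response time of a request is taken as the gap between its
    last inbound packet and the first outbound packet that follows.
    """
    times = []
    last_in = None
    for ts, direction in events:
        if direction == "in":
            last_in = ts              # a request may span several packets
        elif direction == "out" and last_in is not None:
            times.append(ts - last_in)
            last_in = None            # skip the rest of this response
    return times


# Two requests on one connection, timestamps in microseconds.
trace = [
    (0, "in"), (1000, "in"),          # request 1 arrives in two packets
    (11000, "out"), (12000, "out"),   # response 1 leaves in two packets
    (50000, "in"),                    # request 2
    (53000, "out"),                   # response 2
]
print(response_times(trace))  # [10000, 3000]
```

Because only packet timestamps and directions are needed, this kind of measurement requires no change to the database or the client, which is what makes the approach non-invasive.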

The conference will be held from June 10 in Houston, USA, where the paper will also be presented publicly.

Title: SIGMOD/PODS '18 International Conference on Management of Data, Houston, TX, USA, June 10-15, 2018
Pages: 1846
Sponsor: SIGMOD, ACM Special Interest Group on Management of Data
Publisher: ACM, New York, NY, USA
ISBN: 978-1-4503-4703-7
Conference: MOD International Conference on Management of Data
