TiDB SQL Infra Team: Working together to build the perfect bridge from computing to storage | PingCAP Recruitment Season

What is TiDB SQL Infra? Isn't TiDB itself an infrastructure project? In short, the TiDB SQL Infra Team is part of the TiDB R&D team. Let's talk about what we do now and what we want to do in the future.

TiDB is a database that separates computing from storage, so there is a lot to do at the SQL layer, such as schema control for the queries above it and mapping construction for the KV data below it. The TiDB SQL Infra Team acts as the bridge between the KV store and the SQL layer. That is where the name SQL Infra comes from: Infra means infrastructure. Thanks to online consistent schema change, each TiDB server can build a mature structured query system on top of raw KV, from the bottom up.

As the TiDB R&D group has grown, the TiDB SQL Infra Team has gradually separated from the original R&D team, and its work has become more concrete, specific, and modular. The team now focuses on TiDB schema information processing and maintenance, DDL syntax and control flow, SQL diagnostic data integration, and data interaction with the Placement Driver (PD) and KV storage. As TiDB pays more attention to the affinity between computation and data sources, the Coprocessor logic that comes from operator pushdown also falls within the team's scope.

What are we doing

TiDB metadata management

Metadata management is critical for a database, so the bulk of the work within the SQL Infra team is TiDB DDL. As you may know, TiDB is a distributed database that supports online DDL changes based on Google F1's online schema change. The theory itself is relatively simple, but applying it in practice raises many engineering questions:

How do online DDL changes combine with failover in a distributed system? Which states need to be persisted and which do not?
Which stages of a DDL can be optimized (skipped)? Which DDL statements can run in parallel, and how?
Which DDL statements need to modify user data and which do not? When modifying data, how do we avoid large-scale conflicts with user transactions, and how do we manage contention with user transactions for cluster resources?
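
As a minimal sketch of the idea behind F1-style online schema change, the Python below models the intermediate states a new index passes through before it becomes visible to queries. The state names loosely follow the F1 paper; the explicit backfill state and the function names (`next_state`, `can_coexist`, `allowed_ops`, `walk`) are illustrative assumptions, not TiDB's actual implementation.

```python
from enum import IntEnum


class SchemaState(IntEnum):
    """Hypothetical states for adding an index online."""
    ABSENT = 0
    DELETE_ONLY = 1
    WRITE_ONLY = 2
    WRITE_REORG = 3  # assumed separate state in which existing rows are backfilled
    PUBLIC = 4


def next_state(state: SchemaState) -> SchemaState:
    """Advance exactly one step; skipping a state would break the invariant."""
    if state is SchemaState.PUBLIC:
        raise ValueError("index is already public")
    return SchemaState(state + 1)


def can_coexist(a: SchemaState, b: SchemaState) -> bool:
    """Servers in the cluster may be at most one state apart at any time."""
    return abs(a - b) <= 1


def allowed_ops(state: SchemaState) -> set:
    """Operations a server applies to the index while in the given state."""
    if state is SchemaState.ABSENT:
        return set()
    if state is SchemaState.DELETE_ONLY:
        return {"delete"}
    if state in (SchemaState.WRITE_ONLY, SchemaState.WRITE_REORG):
        return {"delete", "insert", "update"}
    return {"delete", "insert", "update", "read"}


def walk() -> list:
    """Return the full path from ABSENT to PUBLIC, one step at a time."""
    state = SchemaState.ABSENT
    path = [state]
    while state is not SchemaState.PUBLIC:
        state = next_state(state)
        path.append(state)
    return path
```

The point of the intermediate states is that a server that already writes to the index and a server that does not yet know about it never run at the same time, which is what `can_coexist` expresses. Which of these states must be persisted to survive failover is exactly the kind of question listed above.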

With these considerations in mind, we have continuously improved TiDB's DDL. FLASHBACK provides table-level flashback that can save the day at critical moments and spare you the proverbial "careless operation, tearful family". Table-level locking lets TiDB better support bulk data import, backup, and recovery. In extreme cases where TiDB metadata is corrupted or lost, ADMIN REPAIR can repair the metadata and prevent the entire cluster from becoming unavailable. SEQUENCE, currently under development, gives applications a more flexible persistent auto-increment solution. Beyond these, we continue to do further optimization and exploration.

TiDB cluster data collection and diagnosis

TiDB is a distributed system with separated storage and computing and multiple components. As you can imagine, operating and maintaining it raises many questions:

How do I determine whether a database cluster is healthy?
If the cluster has a problem or fault, how do I resolve it?
How do I quickly locate the cluster's current bottlenecks, and how do I deal with them?
What type of SQL executes slowly? Why and when did it slow down?

To answer these questions, we developed TiDB STATEMENT SUMMARY, drawing on the SQL audit features of MySQL, Oracle, DB2, and other existing systems. It aggregates statistics about SQL execution by statement and execution plan fingerprint, and updates and archives them on a rolling basis. This helps users and DBAs troubleshoot performance that falls short of expectations. The TiDB performance diagnostic framework, currently under development, has built-in collection and display of TiDB metrics across the whole cluster, every module, and the full request path. Nearly every metric you can think of, and some you would not expect, can be obtained through this framework. For example, a single SQL statement can fetch the CPU and memory flame graphs of running TiDB and TiKV instances.
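
To make the statement-summary idea concrete, here is a toy Python sketch that groups executions by a statement fingerprint and a plan fingerprint, and keeps a bounded history of time windows. The normalizer, the `StatementSummary` class, and its parameters (`window_seconds`, `max_windows`) are assumptions made for illustration; TiDB's real normalization and storage are more involved.

```python
import hashlib
import re
from collections import OrderedDict, defaultdict


def normalize(sql: str) -> str:
    """Toy normalizer: replace literals with '?' and collapse whitespace."""
    sql = re.sub(r"'[^']*'", "?", sql)          # string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)  # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()


def digest(sql: str) -> str:
    """Fingerprint of the normalized statement text."""
    return hashlib.sha256(normalize(sql).encode()).hexdigest()[:16]


class StatementSummary:
    """Aggregates per (statement digest, plan digest) in rolling windows."""

    def __init__(self, window_seconds: int = 1800, max_windows: int = 24):
        self.window_seconds = window_seconds
        self.max_windows = max_windows
        self.windows = OrderedDict()  # window start time -> {key: stats}

    def record(self, sql: str, plan_digest: str, latency_ms: float, now: float) -> None:
        # Assumes timestamps arrive in non-decreasing order.
        start = int(now) - int(now) % self.window_seconds
        if start not in self.windows:
            self.windows[start] = defaultdict(
                lambda: {"count": 0, "sum_ms": 0.0, "max_ms": 0.0}
            )
            while len(self.windows) > self.max_windows:
                self.windows.popitem(last=False)  # archive limit: drop the oldest window
        stats = self.windows[start][(digest(sql), plan_digest)]
        stats["count"] += 1
        stats["sum_ms"] += latency_ms
        stats["max_ms"] = max(stats["max_ms"], latency_ms)
```

Because literals are stripped before hashing, `SELECT * FROM t WHERE id = 1` and `select * from t where id = 42` land in the same bucket, so a DBA sees one row per kind of statement and plan rather than one row per execution.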

Our team

About location

TiDB SQL Infra Team members are spread evenly across our offices (Beijing, Hangzhou, Chengdu, and Guangzhou). TiDB was born from open source and is developed together with the open source community, so we see remote communication and remote work as the norm. PingCAP does not assign work modules by location, and you are free to choose where you work when you join the TiDB team.

About the work

In fact, TiDB is a very open project, and there is no limit to what we can do. Whether it is improving TiDB performance, improving usability, or even just building a cool feature, you can propose anything that you think would make TiDB better and that you are interested in starting. On the TiDB SQL Infra Team, we do not want people to simply follow an established TODO list; we want everyone to keep an open mind and keep thinking as they work.

About growth

Developing low-level system software is very challenging. Early database systems were built in the 80s and 90s in C, and those huge systems are hard to understand and debug today. A modern database is itself a challenging and interesting project: implementing traditional AST parsing, plan optimization, and physical storage in a modern language, and making Raft-based high availability a reality, was a huge challenge for us. It is especially challenging for TiDB, a database that targets HTAP workloads and aims to be deployed in a wide variety of user scenarios.

Frankly, this places very high demands on engineers, and we attach great importance to personal growth at work. The company regularly holds Infra Meetups, Paper Reading sessions, and a variety of internal and external technical exchanges. The team has a study group, and you can add any direction you are interested in to its topic pool. Every week we set aside time for everyone to learn and exchange ideas. We believe these mechanisms help you keep up with what is going on in the industry.

A few more words

TiDB is a database written from scratch. Compared with improving a mature database or developing middleware, writing code from scratch takes a lot more work. The advantage is that you are free to design and implement according to your own ideas, and even to add new syntax for the features you envision, while thinking about the scenarios in which database users will use that syntax, what problems they want to solve, and how to maintain it afterwards. All of this can be implemented in the TiDB team and delivered to TiDB users at large.

If you like pursuing extreme performance, thinking about database design, and understanding database internals, you are welcome to join us!

Join us!

We think good engineers have more or less the following traits in common:

· A Quick Learner
· An Earnest Curiosity
· Faith in Open Source
· Self-driven
· Get Things Done

If you have any of the qualities listed above, visit our jobs page to see the positions currently open:

www.pingcap.com/recruit-cn/…

Please send your resume to [email protected]

Interns: The company's benefits and learning resources are fully open to interns. More importantly, interns get to work on industry-grade projects before graduation, and those who perform well during the internship may qualify for fast-track ("green channel") hiring. If you do not have enough time for an internship, you can still get more practice opportunities through the Talent Plan knowledge base (university.pingcap.com/talent-plan…) and the TiDB open source community!

Referrals: If you know someone who meets the requirements above, you are welcome to get in touch with us. A successful referral gives you a chance to receive a referral reward. Name of candidate - job title - name of referrer - mobile phone number of referrer.

Further reading

Yes, we are hiring! PingCAP 2020 recruitment season is officially open

TiDB Architecture Team: Challenges the nature of databases

PingCAP’s young frontier team: The user ecosystem