Since March, I have interviewed with Alibaba, Toutiao, Meituan, and Kuaishou for big data development positions. Nearly twenty interviews took quite a mental toll, but the results were fine: except for Toutiao, I received offers from the other three companies. Since I can't remember exactly which questions came up in which interview, I've grouped them by topic below.

First, let me share my overall impressions of these interviews. Toutiao and Kuaishou have a similar style: every round includes algorithm or implementation questions. The algorithms are mostly LeetCode easy and medium problems, depending on how the interview is going. I haven't done that many problems myself, just over a hundred, and fortunately I didn't run into anything too hard; still, steady practice beats cramming before every interview, and working through problems also broadens your thinking. The implementation questions mainly ask you to write a HashMap, an LRU cache, a producer-consumer model, a singleton pattern, and so on; from these the interviewer can gauge your grasp of data structures and your coding ability. Alibaba and Meituan ask fewer algorithm questions and pay more attention to the highlights of your projects. By highlights I mean valuable features you developed or optimized, complex or difficult problems you solved, and so on; you need to distill these from the projects you have done. They also asked a lot about data warehouse SQL and modeling theory.

My advice is not to send your resume to your target companies right away: find some companies to practice with first, fill in the gaps you uncover during the interview process, and keep chipping away at your blind spots. At these companies, as long as your performance is not too bad, failing one department's interview still leaves you free to interview with other departments. You can submit resumes through Maimai, Lagou, or Boss Zhipin, get referrals from HR or internal employees, or go through headhunters. Headhunter quality varies a lot: a good one will give you a detailed introduction to the company and the position, which is especially convenient later when you hold several offers and need to choose, and they can also walk you through the pros and cons of each offer. But good headhunters are hard to come by.

OK, here is a list of the interview questions I met during these interviews. I won't give answers here, but I will recommend some books and courses I have read, and follow-up articles will cover these knowledge points, so stay tuned.

1 Java Basics

  1. What does polymorphism mean in Java
  2. Have you used the final keyword in Java
  3. Talk about the role of the volatile keyword and how it differs from synchronized
  4. Do you understand the internal structure of HashMap? Implement a HashMap yourself
  5. Principles and differences of HashMap, Hashtable, and ConcurrentHashMap
  6. Implement a producer-consumer model in Java using a BlockingQueue (see the sketch after this list)
  7. Which design patterns do you know? Implement a singleton pattern
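
For item 6, here is a minimal sketch of a BlockingQueue-based producer-consumer model. The class and variable names are mine, and a real implementation would also want a shutdown signal such as a poison-pill value:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerDemo {
    public static void main(String[] args) {
        // Bounded queue: put() blocks when full, take() blocks when empty,
        // so the queue itself provides the coordination and backpressure.
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(10);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) {
                    queue.put(i); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Integer item = queue.take(); // blocks while the queue is empty
                    System.out.println("consumed " + item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
    }
}
```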

Remarks: For big data development, a solid Java foundation is a must. These questions usually come up in the first and second rounds, and if you answer them poorly you generally won't pass.

Recommended reading:

Java interview questions with answers

2 Data Structures and Algorithms

Algorithm:

  1. Search in rotated sorted array, LeetCode 33, medium difficulty
  2. Implement an LRU cache, LeetCode 146, medium difficulty (see the sketch after this list)
  3. Implement a queue with two stacks, LeetCode 232, easy
  4. Given a non-empty array of integers, return the k most frequent elements, LeetCode 347, medium difficulty
  5. Lowest common ancestor of a binary tree, LeetCode 236, medium difficulty
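
For item 2, Java's LinkedHashMap with access ordering gives you a compact LRU sketch almost for free; in an interview you would typically be asked to re-implement the same idea with a hash map plus a doubly linked list (LeetCode 146 expects O(1) get and put):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        // accessOrder = true makes get() move an entry to the tail,
        // so the head is always the least recently used entry.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once we exceed capacity.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<Integer, String> cache = new LruCache<>(2);
        cache.put(1, "a");
        cache.put(2, "b");
        cache.get(1);        // touch key 1
        cache.put(3, "c");   // evicts key 2, the least recently used
        System.out.println(cache.keySet()); // [1, 3]
    }
}
```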

Remarks: These are the ones I remember. As you can see, the questions are basically from LeetCode, so practicing problems really is necessary. I recommend practicing by category, since topics like binary search and dynamic programming follow some fixed patterns.

Data structure:

  1. Bloom filter (see the sketch after this list)
  2. Bitmap
  3. B+ tree
  4. LSM tree
  5. Skip list
  6. HyperLogLog
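
As a refresher on the first item, here is a minimal Bloom filter sketch; the bit-array size and the double-hashing scheme are illustrative choices of mine, not anything asked in the interviews:

```java
import java.util.BitSet;

public class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive k positions from two base hashes (double hashing); forcing the
    // second hash to be odd keeps the k positions from collapsing together.
    private int position(String item, int i) {
        int h1 = item.hashCode();
        int h2 = (h1 >>> 16) | 1;
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String item) {
        for (int i = 0; i < hashCount; i++) bits.set(position(item, i));
    }

    // False positives are possible; false negatives are not.
    boolean mightContain(String item) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(item, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 20, 5);
        filter.add("user_42");
        System.out.println(filter.mightContain("user_42")); // true
        System.out.println(filter.mightContain("user_43")); // almost certainly false
    }
}
```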

For studying these, the course “The Beauty of Data Structures and Algorithms” by Professor Wang Zheng has been bought by more than 80,000 people so far, which should make it the best-selling course on Geek Time. The quality is absolutely high; I have gone through a lot of it myself. For example, when explaining why Redis's sorted set uses a skip list as its underlying data structure, the teacher starts from the binary search tree and the B+ tree, so you understand the similarities, differences, and application scenarios of all three data structures at once.

3 Hive

Hive SQL:

  1. The difference between Hive row_number and rank (see the sketch after this list)
  2. How to set the window size in a Hive window function
  3. The differences among order by, sort by, distribute by, and cluster by in Hive
  4. How to set the number of Hive map and reduce tasks
  5. What causes data skew in Hive SQL, and how do you optimize it
  6. Do you understand the internal structure of the Parquet data format
  7. What compression format does your Hive data use
  8. How Hive SQL is converted into MapReduce jobs
  9. Hive bucketing and partitioning
  10. Are you familiar with Hive UDFs, UDAFs, and UDTFs? Have you ever written a UDF
  11. How do you verify that your Hive SQL is correct
  12. Using lateral view explode to split an array
  13. How a join is executed in the underlying MapReduce
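
For item 1, the difference between row_number, rank, and dense_rank is easiest to see on tied values. A minimal sketch, run through Spark SQL purely for convenience (the window-function semantics are the same in Hive):

```java
import org.apache.spark.sql.SparkSession;

public class RankingDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ranking-demo")
                .master("local[*]")
                .getOrCreate();

        // On the tied 90s: row_number gives 1,2; rank gives 1,1 then jumps to 3;
        // dense_rank gives 1,1 then continues with 2.
        spark.sql(
            "SELECT score, " +
            "       row_number() OVER (ORDER BY score DESC) AS rn, " +
            "       rank()       OVER (ORDER BY score DESC) AS rk, " +
            "       dense_rank() OVER (ORDER BY score DESC) AS drk " +
            "FROM VALUES (90), (90), (80) AS t(score)"
        ).show();

        spark.stop();
    }
}
```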

SQL written test questions:

  1. Given a login_in table with fields userid, login_time, and ip, containing a large amount of data; one person may log in multiple times.
  2. Using the login_in table, count the daily total logins (PV) and unique login users (UV) (see the sketch after this list).
  3. Given a table with fields userid and follow_list, e.g. A [B, C, D], B [A, C], C [D], count how many friend pairs (pairs of users who follow each other) are in the table.
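
For question 2, a minimal sketch of the PV/UV query, wrapped in a Spark session for illustration; the table and column names follow the question, and I'm assuming login_time can be truncated to a calendar date with to_date():

```java
import org.apache.spark.sql.SparkSession;

public class PvUvDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("pv-uv-demo")
                .enableHiveSupport()
                .getOrCreate();

        // PV counts every login event; UV counts each userid once per day.
        spark.sql(
            "SELECT to_date(login_time)    AS dt, " +
            "       COUNT(1)               AS pv, " +
            "       COUNT(DISTINCT userid) AS uv " +
            "FROM login_in " +
            "GROUP BY to_date(login_time)"
        ).show();

        spark.stop();
    }
}
```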

Remarks: This mainly tests how much SQL you write day to day; data warehouse development positions care a great deal about SQL ability.

Recommended reading:

Why did we choose Parquet

4 MapReduce&Spark

  1. How many sorts are involved in the MapReduce execution process
  2. The Spark task execution process
  3. The difference between MapReduce Shuffle and Spark Shuffle
  4. Spark's memory management model
  5. Tell me about Spark Shuffle (see the sketch after this list)
  6. Do you know Spark Shuffle's bypass mode
  7. What problems have you encountered using Spark, and how did you solve them
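
As a concrete anchor for the shuffle questions, here is a minimal Spark word count in Java. The reduceByKey step is where the shuffle happens, because records with the same key must be brought together on one partition; the input and output paths are placeholders:

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class WordCountShuffle {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("word-count").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            sc.textFile("input.txt")                                  // stage 1: read + map side
              .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
              .mapToPair(word -> new Tuple2<>(word, 1))
              .reduceByKey(Integer::sum)                              // shuffle boundary: stage 2
              .saveAsTextFile("output");
        }
    }
}
```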

Remarks: MapReduce and Spark are the main offline computing engines, so you need to be familiar with their task scheduling processes and potential performance bottlenecks, and understand component internals and tuning. If you have run into and solved performance problems on big data jobs at work, that earns you extra points.

Recommended reading:

Overview of MapReduce Shuffle and Spark Shuffle

Basic principles of Spark memory management

Spark performance tuning: Shuffle-related parameters

5 Spark Streaming&Flink

  1. Spark Streaming versus Flink
  2. How Flink achieves Exactly Once semantics
  3. Can Spark Streaming achieve Exactly Once semantics
  4. What state backends Flink has, and which you have used at work
  5. Have you done Flink memory tuning
  6. Have you ever run into an OOM? How did you deal with it
  7. Talk about the backpressure mechanisms of Spark Streaming and Flink
  8. Flink window functions, time semantics, the checkpoint mechanism, and two-phase commit
  9. Flink dual-stream join
  10. How to set Flink state TTL
  11. What methods exist for associating dimension tables in Flink, and how do you handle large dimension tables

Some other written test questions:

  1. Real-time PV and UV statistics (see the sketch after this list)
  2. Real-time Top N statistics
  3. Real-time join of an ad exposure stream and a click stream
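
For the first question, a minimal Flink DataStream sketch of per-minute PV counting might look like the following (assuming a recent DataStream API; the socket source, host, and port are placeholders, and in practice the source would usually be Kafka with event time and watermarks instead of processing time):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class RealtimePvDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source: one text line per login event.
        env.socketTextStream("localhost", 9999)
           // Every event contributes ("pv", 1); returns() supplies the tuple
           // type information that Java lambdas erase.
           .map(line -> Tuple2.of("pv", 1L))
           .returns(Types.TUPLE(Types.STRING, Types.LONG))
           .keyBy(t -> t.f0)
           // Tumbling one-minute windows; sum the counts within each window.
           .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
           .sum(1)
           .print();

        env.execute("realtime-pv");
    }
}
```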

Remarks: Not only should you have a clear understanding of how the components work, you should also have actually done real-time business development. The interviewer may also describe some of their own scenarios and ask how you would design for them, so pay extra attention to how real-time business scenarios are implemented.

Recommended reading:

Associating dimension tables with Flink DataStream

Real-time big-screen computation for e-commerce based on Kafka + Flink + Redis

Alibaba's Jiang Xiaowei on the similarities, differences, and advantages of Flink and Spark

Flink basics | A deep understanding of Apache Flink's core technology

6 Data Warehouse

  1. How the data warehouse is architected and layered at your company
  2. Talk about the difference between normal-form modeling and dimensional modeling
  3. Tell me the difference between the star schema and the snowflake schema
  4. Design a retention model for each channel (see the sketch after this list)
  5. How do you handle slowly changing dimensions
  6. How do you synchronize data into the data warehouse, and how do you ensure no data is lost
  7. How is data quality controlled
  8. How are data specifications defined
  9. How is metadata managed
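
For question 4, next-day retention is the usual starting point. A minimal sketch, where dws_new_users(dt, channel, userid) and dwd_active_users(dt, userid) are hypothetical table layouts I made up for illustration:

```java
import org.apache.spark.sql.SparkSession;

public class ChannelRetention {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("channel-retention")
                .enableHiveSupport()
                .getOrCreate();

        // Next-day retention per channel: of the users newly acquired on day dt,
        // what fraction is active again on dt + 1 day.
        spark.sql(
            "SELECT n.dt, n.channel, " +
            "       COUNT(DISTINCT a.userid) / COUNT(DISTINCT n.userid) AS d1_retention " +
            "FROM dws_new_users n " +
            "LEFT JOIN dwd_active_users a " +
            "  ON a.userid = n.userid " +
            " AND a.dt = date_add(n.dt, 1) " +
            "GROUP BY n.dt, n.channel"
        ).show();

        spark.stop();
    }
}
```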

Remarks: You need to understand data warehouse methodology, and your overall approach to warehouse construction should be sound, so that given a business requirement you can propose a reasonable warehouse model.

Recommended reading:

Distinguishing the meanings of and differences among BI, data warehouse, data lake, and data middle platform (worth bookmarking)

The layered theory of data warehouse

7 Kafka

  1. Tell me about Kafka
  2. Kafka fundamentals, and its advantages over other MQs
  3. Talk about the difference between Kafka's high-level and low-level consumer APIs
  4. The differences among Kafka's acks settings (see the sketch after this list)
  5. How do Kafka consumers pull data from Kafka
  6. How Kafka achieves Exactly Once in production and consumption
  7. How Kafka guarantees ordering
  8. What the Kafka Controller is for
  9. How the leader is elected among Kafka's multiple replicas
  10. The Kafka consumer group rebalancing process
  11. The difference between the different kinds of Kafka offsets
  12. How to check a consumer's consumption progress
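
For question 4, acks is a producer-side setting. A minimal producer sketch, with the broker address and topic name as placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // acks=0: fire and forget; acks=1: leader ack only; acks=all: all in-sync replicas.
        props.put("acks", "all");
        // Idempotence de-duplicates producer retries, part of exactly-once production.
        props.put("enable.idempotence", "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
        }
    }
}
```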

Recommended reading:

Kafka common interview points summary

Kafka high availability implementation principles

8 HBase

  1. How do you design an HBase RowKey
  2. Talk about hot-spotting and how to solve it (see the sketch after this list)
  3. Describe the HBase read and write paths
  4. What optimizations have you made when using HBase
  5. Describe the HBase Compaction mechanism
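
For question 2, one common remedy for hot-spotting is salting the RowKey so that monotonically increasing keys spread across regions. A minimal sketch; the bucket count and key format are illustrative:

```java
public class SaltedRowKey {
    private static final int BUCKETS = 16;

    // Prefix the natural key with a stable hash bucket so that sequential keys
    // (e.g. timestamp-based ones) spread across regions instead of all landing
    // on one RegionServer. Reads must then fan out across the bucket prefixes.
    static String saltedKey(String naturalKey) {
        int bucket = (naturalKey.hashCode() & Integer.MAX_VALUE) % BUCKETS;
        return String.format("%02d", bucket) + "_" + naturalKey;
    }

    public static void main(String[] args) {
        System.out.println(saltedKey("user123_20210301"));
    }
}
```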

Recommended reading:

HBase: Designed for efficient and scalable distributed systems

B+ tree and LSM tree in data storage and retrieval

Details about the HBase Compaction process

Understand the HBase architecture in one diagram

9 Redis

  1. What data structures does Redis contain
  2. The underlying implementation of Redis sorted sets
  3. Redis data persistence methods and their pros and cons
  4. Do you understand consistent hashing (see the sketch after this list)
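
For question 4, a minimal consistent-hashing ring sketch; the hash function and virtual-node count are illustrative choices of mine, not Redis internals:

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    // Each physical node is placed on the ring many times (virtual nodes)
    // so that keys redistribute evenly when nodes join or leave.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // A key maps to the first node clockwise from its hash position.
    String nodeFor(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private int hash(String s) {
        // FNV-1a style hash for illustration; production code would use a
        // better-distributed hash such as MurmurHash.
        int h = 0x811c9dc5;
        for (char c : s.toCharArray()) { h ^= c; h *= 16777619; }
        return h & Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        ConsistentHashRing r = new ConsistentHashRing(100);
        r.addNode("redis-a"); r.addNode("redis-b"); r.addNode("redis-c");
        System.out.println(r.nodeFor("user:42"));
    }
}
```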

Recommended reading:

Why do we use Redis

How to use Redis to count unique user visits?

Redis architecture evolution

Conclusion

In daily work, make a point of summarizing and accumulating what you learn, filling in the gaps as you find them, and continually improving your knowledge system.