Since March, I have interviewed with Alibaba, Toutiao, Meituan, and Kuaishou for big data development positions. Nearly twenty interviews took quite a mental toll, but the results were fine: except for Toutiao, I received offers from the other three companies. Since I can't remember exactly which questions came up in which interview, I've grouped them by topic below.

First, let me share my overall impressions of these interviews. Toutiao and Kuaishou have a similar style: every round includes algorithm or implementation questions. The algorithms are mostly LeetCode easy and medium problems, depending on how the interview is going. I haven't done that many problems myself, just over a hundred, and fortunately I didn't run into anything too hard; still, steady practice beats cramming before every interview, and working through problems also broadens your thinking. The implementation questions mainly ask you to write a HashMap, an LRU cache, a producer-consumer model, a singleton pattern, and so on; from these the interviewer can gauge your grasp of data structures and your coding ability. Alibaba and Meituan ask fewer algorithm questions and pay more attention to the highlights of your projects. By highlights I mean valuable features you developed or optimized, complex or difficult problems you solved, and so on; you need to distill these from the projects you have done. They also asked a lot about data warehouse SQL and modeling theory.

My advice is not to send your resume to your target companies right away: find some companies to practice with first, fill in the gaps you uncover during the interview process, and keep chipping away at your blind spots. At these companies, as long as your performance is not too bad, failing one department's interview still leaves you free to interview with other departments. You can submit resumes through Maimai, Lagou, or Boss Zhipin, get referrals from HR or internal employees, or go through headhunters. Headhunter quality varies a lot: a good one will give you a detailed introduction to the company and the position, which is especially convenient later when you hold several offers and need to choose, and they can also walk you through the pros and cons of each offer. But good headhunters are hard to come by.

OK, here is a list of the interview questions I met during these interviews. I won't give answers here, but I will recommend some books and courses I have read, and follow-up articles will cover these knowledge points, so stay tuned.

1 Java Basics

  1. What does polymorphism mean in Java
  2. Have you used the final keyword in Java
  3. Talk about the role of the volatile keyword and how it differs from synchronized
  4. Do you understand the internal structure of HashMap? Implement a HashMap yourself
  5. Principles and differences of HashMap, Hashtable, and ConcurrentHashMap
  6. Implement a producer-consumer model in Java using a BlockingQueue (see the sketch after this list)
  7. Which design patterns do you know? Implement a singleton pattern
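
For item 6, here is a minimal sketch of a BlockingQueue-based producer-consumer model. The class and variable names are mine, and a real implementation would also want a shutdown signal such as a poison-pill value:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ProducerConsumerDemo {
    public static void main(String[] args) {
        // Bounded queue: put() blocks when full, take() blocks when empty,
        // so the queue itself provides the coordination and backpressure.
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(10);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 100; i++) {
                    queue.put(i); // blocks while the queue is full
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Integer item = queue.take(); // blocks while the queue is empty
                    System.out.println("consumed " + item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
    }
}
```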

Remarks: For big data development, a solid Java foundation is a must. These questions usually come up in the first and second rounds, and if you answer them poorly you generally won't pass.

Recommended reading:

Java interview questions with answers

2 Data Structures and Algorithms

Algorithm:

  1. Search in rotated sorted array, LeetCode 33, medium difficulty
  2. Implement an LRU cache, LeetCode 146, medium difficulty (see the sketch after this list)
  3. Implement a queue with two stacks, LeetCode 232, easy
  4. Given a non-empty array of integers, return the k most frequent elements, LeetCode 347, medium difficulty
  5. Lowest common ancestor of a binary tree, LeetCode 236, medium difficulty
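
For item 2, Java's LinkedHashMap with access ordering gives you a compact LRU sketch almost for free; in an interview you would typically be asked to re-implement the same idea with a hash map plus a doubly linked list (LeetCode 146 expects O(1) get and put):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        // accessOrder = true makes get() move an entry to the tail,
        // so the head is always the least recently used entry.
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once we exceed capacity.
        return size() > capacity;
    }

    public static void main(String[] args) {
        LruCache<Integer, String> cache = new LruCache<>(2);
        cache.put(1, "a");
        cache.put(2, "b");
        cache.get(1);        // touch key 1
        cache.put(3, "c");   // evicts key 2, the least recently used
        System.out.println(cache.keySet()); // [1, 3]
    }
}
```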

Remarks: These are the ones I remember. As you can see, the questions are basically from LeetCode, so practicing problems really is necessary. I recommend practicing by category, since topics like binary search and dynamic programming follow some fixed patterns.

Data structure:

  1. Bloom filter (see the sketch after this list)
  2. Bitmap
  3. B+ tree
  4. LSM tree
  5. Skip list
  6. HyperLogLog
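
As a refresher on the first item, here is a minimal Bloom filter sketch; the bit-array size and the double-hashing scheme are illustrative choices of mine, not anything asked in the interviews:

```java
import java.util.BitSet;

public class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // Derive k positions from two base hashes (double hashing); forcing the
    // second hash to be odd keeps the k positions from collapsing together.
    private int position(String item, int i) {
        int h1 = item.hashCode();
        int h2 = (h1 >>> 16) | 1;
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String item) {
        for (int i = 0; i < hashCount; i++) bits.set(position(item, i));
    }

    // False positives are possible; false negatives are not.
    boolean mightContain(String item) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(item, i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 20, 5);
        filter.add("user_42");
        System.out.println(filter.mightContain("user_42")); // true
        System.out.println(filter.mightContain("user_43")); // almost certainly false
    }
}
```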

For studying these, the course “The Beauty of Data Structures and Algorithms” by Professor Wang Zheng has been bought by more than 80,000 people so far, which should make it the best-selling course on Geek Time. The quality is absolutely high; I have gone through a lot of it myself. For example, when explaining why Redis's sorted set uses a skip list as its underlying data structure, the teacher starts from the binary search tree and the B+ tree, so you understand the similarities, differences, and application scenarios of all three data structures at once.

3 Hive

Hive SQL:

  1. The difference between Hive row_number and rank (see the sketch after this list)
  2. How to set the window size in a Hive window function
  3. The differences among order by, sort by, distribute by, and cluster by in Hive
  4. How to set the number of Hive map and reduce tasks
  5. What causes data skew in Hive SQL, and how do you optimize it
  6. Do you understand the internal structure of the Parquet data format
  7. What compression format does your Hive data use
  8. How Hive SQL is converted into MapReduce jobs
  9. Hive bucketing and partitioning
  10. Are you familiar with Hive UDFs, UDAFs, and UDTFs? Have you ever written a UDF
  11. How do you verify that your Hive SQL is correct
  12. Using lateral view explode to split an array
  13. How a join is executed in the underlying MapReduce
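
For item 1, the difference between row_number, rank, and dense_rank is easiest to see on tied values. A minimal sketch, run through Spark SQL purely for convenience (the window-function semantics are the same in Hive):

```java
import org.apache.spark.sql.SparkSession;

public class RankingDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ranking-demo")
                .master("local[*]")
                .getOrCreate();

        // On the tied 90s: row_number gives 1,2; rank gives 1,1 then jumps to 3;
        // dense_rank gives 1,1 then continues with 2.
        spark.sql(
            "SELECT score, " +
            "       row_number() OVER (ORDER BY score DESC) AS rn, " +
            "       rank()       OVER (ORDER BY score DESC) AS rk, " +
            "       dense_rank() OVER (ORDER BY score DESC) AS drk " +
            "FROM VALUES (90), (90), (80) AS t(score)"
        ).show();

        spark.stop();
    }
}
```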

SQL written test questions:

  1. Given a login_in table with fields userid, login_time, and ip, containing a large amount of data; one person may log in multiple times.
  2. Using the login_in table, count the daily total logins (PV) and unique login users (UV) (see the sketch after this list).
  3. Given a table with fields userid and follow_list, e.g. A [B, C, D], B [A, C], C [D], count how many friend pairs (pairs of users who follow each other) are in the table.
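
For question 2, a minimal sketch of the PV/UV query, wrapped in a Spark session for illustration; the table and column names follow the question, and I'm assuming login_time can be truncated to a calendar date with to_date():

```java
import org.apache.spark.sql.SparkSession;

public class PvUvDemo {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("pv-uv-demo")
                .enableHiveSupport()
                .getOrCreate();

        // PV counts every login event; UV counts each userid once per day.
        spark.sql(
            "SELECT to_date(login_time)    AS dt, " +
            "       COUNT(1)               AS pv, " +
            "       COUNT(DISTINCT userid) AS uv " +
            "FROM login_in " +
            "GROUP BY to_date(login_time)"
        ).show();

        spark.stop();
    }
}
```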

Remarks: This mainly tests how much SQL you write day to day; data warehouse development positions care a great deal about SQL ability.

Recommended reading:

Why did we choose Parquet

4 MapReduce&Spark

  1. How many sorts are involved in the MapReduce execution process
  2. The Spark task execution process
  3. The difference between MapReduce Shuffle and Spark Shuffle
  4. Spark's memory management model
  5. Tell me about Spark Shuffle (see the sketch after this list)
  6. Do you know Spark Shuffle's bypass mode
  7. What problems have you encountered using Spark, and how did you solve them
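
As a concrete anchor for the shuffle questions, here is a minimal Spark word count in Java. The reduceByKey step is where the shuffle happens, because records with the same key must be brought together on one partition; the input and output paths are placeholders:

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class WordCountShuffle {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("word-count").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            sc.textFile("input.txt")                                  // stage 1: read + map side
              .flatMap(line -> Arrays.asList(line.split(" ")).iterator())
              .mapToPair(word -> new Tuple2<>(word, 1))
              .reduceByKey(Integer::sum)                              // shuffle boundary: stage 2
              .saveAsTextFile("output");
        }
    }
}
```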

Remarks: MapReduce and Spark are the main offline computing engines, so you need to be familiar with their task scheduling processes and potential performance bottlenecks, and understand component internals and tuning. If you have run into and solved performance problems on big data jobs at work, that earns you extra points.

Recommended reading:

Overview of MapReduce Shuffle and Spark Shuffle

Basic principles of Spark memory management

Spark performance tuning: Shuffle-related parameters

5 Spark Streaming&Flink

  1. Spark Streaming versus Flink
  2. How Flink achieves Exactly Once semantics
  3. Can Spark Streaming achieve Exactly Once semantics
  4. What state backends Flink has, and which you have used at work
  5. Have you done Flink memory tuning
  6. Have you ever run into an OOM? How did you deal with it
  7. Talk about the backpressure mechanisms of Spark Streaming and Flink
  8. Flink window functions, time semantics, the checkpoint mechanism, and two-phase commit
  9. Flink dual-stream join
  10. How to set Flink state TTL
  11. What methods exist for associating dimension tables in Flink, and how do you handle large dimension tables

Some other written test questions:

  1. Real-time PV and UV statistics (see the sketch after this list)
  2. Real-time Top N statistics
  3. Real-time join of an ad exposure stream and a click stream
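
For the first question, a minimal Flink DataStream sketch of per-minute PV counting might look like the following (assuming a recent DataStream API; the socket source, host, and port are placeholders, and in practice the source would usually be Kafka with event time and watermarks instead of processing time):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class RealtimePvDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder source: one text line per login event.
        env.socketTextStream("localhost", 9999)
           // Every event contributes ("pv", 1); returns() supplies the tuple
           // type information that Java lambdas erase.
           .map(line -> Tuple2.of("pv", 1L))
           .returns(Types.TUPLE(Types.STRING, Types.LONG))
           .keyBy(t -> t.f0)
           // Tumbling one-minute windows; sum the counts within each window.
           .window(TumblingProcessingTimeWindows.of(Time.minutes(1)))
           .sum(1)
           .print();

        env.execute("realtime-pv");
    }
}
```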

Remarks: Not only should you have a clear understanding of how the components work, you should also have actually done real-time business development. The interviewer may also describe some of their own scenarios and ask how you would design for them, so pay extra attention to how real-time business scenarios are implemented.

Recommended reading:

Associating dimension tables with Flink DataStream

Real-time big-screen computation for e-commerce based on Kafka + Flink + Redis

Alibaba's Jiang Xiaowei on the similarities, differences, and advantages of Flink and Spark

Flink basics | A deep understanding of Apache Flink's core technology

6 Data Warehouse

  1. How the data warehouse is architected and layered at your company
  2. Talk about the difference between normal-form modeling and dimensional modeling
  3. Tell me the difference between the star schema and the snowflake schema
  4. Design a retention model for each channel (see the sketch after this list)
  5. How do you handle slowly changing dimensions
  6. How do you synchronize data into the data warehouse, and how do you ensure no data is lost
  7. How is data quality controlled
  8. How are data specifications defined
  9. How is metadata managed
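
For question 4, next-day retention is the usual starting point. A minimal sketch, where dws_new_users(dt, channel, userid) and dwd_active_users(dt, userid) are hypothetical table layouts I made up for illustration:

```java
import org.apache.spark.sql.SparkSession;

public class ChannelRetention {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("channel-retention")
                .enableHiveSupport()
                .getOrCreate();

        // Next-day retention per channel: of the users newly acquired on day dt,
        // what fraction is active again on dt + 1 day.
        spark.sql(
            "SELECT n.dt, n.channel, " +
            "       COUNT(DISTINCT a.userid) / COUNT(DISTINCT n.userid) AS d1_retention " +
            "FROM dws_new_users n " +
            "LEFT JOIN dwd_active_users a " +
            "  ON a.userid = n.userid " +
            " AND a.dt = date_add(n.dt, 1) " +
            "GROUP BY n.dt, n.channel"
        ).show();

        spark.stop();
    }
}
```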

Remarks: You need to understand data warehouse methodology, and your overall approach to warehouse construction should be sound, so that given a business requirement you can propose a reasonable warehouse model.

Recommended reading:

Distinguishing the meanings of and differences among BI, data warehouse, data lake, and data middle platform (worth bookmarking)

The layered theory of data warehouse

7 Kafka

  1. Tell me about Kafka
  2. Kafka fundamentals, and its advantages over other MQs
  3. Talk about the difference between Kafka's high-level and low-level consumer APIs
  4. The differences among Kafka's acks settings (see the sketch after this list)
  5. How do Kafka consumers pull data from Kafka
  6. How Kafka achieves Exactly Once in production and consumption
  7. How Kafka guarantees ordering
  8. What the Kafka Controller is for
  9. How the leader is elected among Kafka's multiple replicas
  10. The Kafka consumer group rebalancing process
  11. The difference between the different kinds of Kafka offsets
  12. How to check a consumer's consumption progress
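
For question 4, acks is a producer-side setting. A minimal producer sketch, with the broker address and topic name as placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class AcksDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // acks=0: fire and forget; acks=1: leader ack only; acks=all: all in-sync replicas.
        props.put("acks", "all");
        // Idempotence de-duplicates producer retries, part of exactly-once production.
        props.put("enable.idempotence", "true");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
        }
    }
}
```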

Recommended reading:

Kafka common interview points summary

Kafka high availability implementation principles

8 HBase

  1. How do you design an HBase RowKey
  2. Talk about hot-spotting and how to solve it (see the sketch after this list)
  3. Describe the HBase read and write paths
  4. What optimizations have you made when using HBase
  5. Describe the HBase Compaction mechanism
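
For question 2, one common remedy for hot-spotting is salting the RowKey so that monotonically increasing keys spread across regions. A minimal sketch; the bucket count and key format are illustrative:

```java
public class SaltedRowKey {
    private static final int BUCKETS = 16;

    // Prefix the natural key with a stable hash bucket so that sequential keys
    // (e.g. timestamp-based ones) spread across regions instead of all landing
    // on one RegionServer. Reads must then fan out across the bucket prefixes.
    static String saltedKey(String naturalKey) {
        int bucket = (naturalKey.hashCode() & Integer.MAX_VALUE) % BUCKETS;
        return String.format("%02d", bucket) + "_" + naturalKey;
    }

    public static void main(String[] args) {
        System.out.println(saltedKey("user123_20210301"));
    }
}
```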

Recommended reading:

HBase: Designed for efficient and scalable distributed systems

B+ tree and LSM tree in data storage and retrieval

Details about the HBase Compaction process

Understand the HBase architecture in one diagram

9 Redis

  1. What data structures does Redis contain
  2. The underlying implementation of Redis sorted sets
  3. Redis data persistence methods and their pros and cons
  4. Do you understand consistent hashing (see the sketch after this list)
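
For question 4, a minimal consistent-hashing ring sketch; the hash function and virtual-node count are illustrative choices of mine, not Redis internals:

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(int virtualNodes) { this.virtualNodes = virtualNodes; }

    // Each physical node is placed on the ring many times (virtual nodes)
    // so that keys redistribute evenly when nodes join or leave.
    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    // A key maps to the first node clockwise from its hash position.
    String nodeFor(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(hash(key));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    private int hash(String s) {
        // FNV-1a style hash for illustration; production code would use a
        // better-distributed hash such as MurmurHash.
        int h = 0x811c9dc5;
        for (char c : s.toCharArray()) { h ^= c; h *= 16777619; }
        return h & Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        ConsistentHashRing r = new ConsistentHashRing(100);
        r.addNode("redis-a"); r.addNode("redis-b"); r.addNode("redis-c");
        System.out.println(r.nodeFor("user:42"));
    }
}
```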

Recommended reading:

Why do we use Redis

How to use Redis to count unique user visits?

Redis architecture evolution

Conclusion

In daily work, make a point of summarizing and accumulating what you learn, filling in the gaps as you find them, and continually improving your knowledge system.