“Keep pace with the times, never stop creating. This article is participating in the 2021 year-end summary essay competition.”

Preface

I'm a bit ashamed to say that it has been nearly half a year since I last updated this blog. The reason for the delay is funny: I had been preparing for interviews, so I kept telling myself I would go through my notes once more before updating the blog. One delay led to another, from mid-year all the way to the end of the year, until I finally completed the giant leap from a small company to a big Internet company.

How it started: last year

My job-hopping experience was actually quite tortuous. As you can see from last year's year-end summary, I was doing some data mining work at the time, so I wanted to see whether I could transfer to an algorithm position, such as recommendation algorithms. I spent half a year on mathematics, machine learning, and deep learning. Yet when I started interviewing later in the year, I was almost completely wiped out; it could not have gone worse. First, big companies won't consider career switchers without project experience. Second, small companies rarely have dedicated algorithm posts, so you may end up doing a bit of everything.

Sharpening my skills

So after failing to land an algorithm role, I went back to my old trade: big data development. But by then it was already April, and since everything I had reviewed was algorithm knowledge and its tech stack, interviewing for big data posts with what I knew at the time would have been hopeless. So after drawing up a series of review and study lists for myself, I decided to give up on applying during the spring hiring season ("gold March, silver April") and first make up for what I lacked.

Study in 2021

  • Spark source code
    • Previously I had only read the source of the RPC, storage, and compute modules
    • This time I focused on cluster startup: how the Master and Worker start
    • Plus the whole job flow, from spark-submit to the end of job execution
  • Flink source code
    • The whole flow from cluster startup to job execution
    • Plus the source-level details of two-phase commit, checkpoints, broadcast, and backpressure
  • HDFS and YARN source code
    • The whole flow of cluster startup and job execution. These are rarely touched at work nowadays, but interviews still ask about them, for example how the RPC mechanism, heartbeat mechanism, and shuffle process differ from Spark's and Flink's.
  • Hive source code
    • Basically the path from logical plan to physical plan to execution
  • Shangguigu teaching videos on Bilibili. I have to say Shangguigu's videos serve beginners, and anyone needing a quick refresher, very well.
    • Offline data warehouse
    • Real-time data warehouse
    • Kylin
    • Atlas
    • Presto
    • ClickHouse
  • Geek Time: Spark Performance Tuning in Action
  • Bilibili livestreaming

Spark/Flink/Hadoop/Hive source code

Here I mainly want to talk about studying the source code of big data components. What I followed was a big-data architect course ("Nai"-something), which covers the source code of almost all the frameworks, including ones I haven't gotten to yet such as HBase, Kafka, and ZooKeeper. There are also some videos on Bilibili; I won't post links, but you can find them with a careful search, and supporting a legitimate paid course is also fine.

Shangguigu and Bilibili

For big data components I haven't used, I generally search Shangguigu's site or Bilibili to quickly understand what the framework mainly does, what its characteristics are, and why it is needed. It usually takes about ten hours to get up to speed.

Geek Time: Spark Performance Tuning in Action

This course goes very deep into Spark. After each lesson, many students asked in-depth questions in the comments, and the teacher answered them all. In particular, he gave a detailed answer to a question I asked more than half a year after the course had ended. Props to this teacher!

Bilibili livestreaming

Since late last year I've been a small streamer on Bilibili, doing study livestreams; it has been more than a year now. I'm glad a few friends often drop in to watch. We went from complete strangers to chatting in the live room about study and career planning, and they have become regulars. We don't chat every day, but it's nice to see familiar names in the room every time I start streaming.

Interviews in the second half of the year

After half a year of review and preparation, I started interviewing at the end of October. My original plan was to use these interviews for practice and then aim for next spring's "gold March, silver April" hiring season. As it turned out, perhaps through a mix of luck and solid preparation, I received offers from two second-tier Internet companies in December and completed the leap from a small company to a big Internet company.

Interview preparation

  • Java collection classes, concurrency, the JVM, and GC tuning
    • For concurrency I recommend the "Java Concurrency Learning" series
    • For the JVM I recommend the "JVM Learning (1): Memory Structure" series
    • If you have time, reading "Java Fundamentals Learning (Directory)" is also worthwhile
  • Proficiency in the Spark/Flink/Hadoop/Hive source code, shuffle tuning, and secondary development of the source code
  • Kafka: I recommend the "Kafka Core Technology and Practice" series of articles
  • Data warehouse theory, modeling, business analysis, etc.
  • Basic machine learning algorithms
  • Grinding LeetCode problems
  • Project details, overall framework, and architecture design

Interview summary

After the interviews, I found the following points worth writing down:

  1. Start with small companies for practice. If you go straight to your dream company, you might blow your chance.
  2. Spark seems to be in low demand in the current job market, at least among companies of a decent size. My feeling is that big data posts currently split into two broad directions. One is the offline Hive data warehouse; almost every business department has warehouse posts. The other is real-time Flink (plus ClickHouse for real-time warehouses), and the better Internet companies are steadily adding demand for Flink real-time warehousing. That leaves a small number of component posts (secondary development on the source code) and data platform posts, both of which ask a little about Spark. Of course, this is just my impression from the past month or two of interviewing. In any case, Flink is already a stack you must know and master to get ahead.
  3. On interview content:
    • The Java part is basically the JVM, concurrency, and collections. Digging deeper, you may be asked about Spring's IoC and AOP. These all feel like standard "eight-legged essay" questions, so memorize them, and the more the better.
    • Spark, Flink, Kafka, Hive, HBase, ES, and Hadoop have all come up, depending on your business stack. So it's not enough to list a stack on your resume and merely have used it. For the stacks you claim, you should at least have prepared the architecture, principles, tuning, the odd corners, and even the source code. I didn't prepare much for HBase or ES, so I failed those rounds; that position leaned toward big data storage, though, which was a bit different from what I wanted, so failing it was acceptable.
    • Other topics may come up, such as CAP theory and MySQL indexes. These also feel like eight-legged essay material and can simply be memorized.
    • The business side is also very important. Interviews will almost always ask you to walk through the offline, real-time, and warehouse projects you have done, what the difficulties were, and how you solved them.
    • There will also be some LeetCode algorithm questions, plus basic handwritten code such as quicksort, the singleton pattern, the producer-consumer model, and an LRU cache. You can find all of these online; typing them out yourself a few times helps them stick.
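To illustrate the kind of handwritten code interviewers expect, here is a minimal LRU cache sketch in Python. It leans on `collections.OrderedDict` for brevity; interviewers may also ask for the hash-map-plus-doubly-linked-list version, but the eviction logic is the same.

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used cache: evicts the oldest entry once over capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()  # iteration order tracks recency, oldest first

    def get(self, key):
        if key not in self.data:
            return -1
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

# quick check
cache = LRUCache(2)
cache.put(1, 1)
cache.put(2, 2)
cache.get(1)     # touching key 1 makes key 2 the LRU entry
cache.put(3, 3)  # capacity exceeded: key 2 is evicted
```

Typing out this get/put flow from memory a couple of times is usually enough to handle it under interview pressure.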

Interview results

The interviews lasted about a month. One nice thing nowadays is that most rounds can be done by phone or video in the evening, so I could keep interviewing and filling in my gaps without taking leave. Four or five interviews a week was honestly exhausting, and the pressure was real, especially after a string of failed interviews. It comes down to psychological resilience: don't lose confidence, because every failure exposes a shortcoming you can fix so that next time you perform better.

In the end, excluding the offers from small companies, the finalists were offers from two second- and third-tier big companies, both with a salary of N*16. In annual terms, that is roughly double my current company's pay. Huge! Sweet!

Compared with the goals set for 2020

  • BI data warehouse learning
  • ZooKeeper and Redis source code study
  • An Ali Tianchi competition every 2-3 months
  • Reproduce machine learning algorithms in Python
  • Finish LeetCode's Easy and Medium problems!
  • Land a satisfactory offer

All I can say is that plans can't keep up with change: at the time I still wanted to switch to algorithms, but in the end I jumped back to a big data post.

Goals for 2022

In the new year I'll keep studying at the new company, shifting from a 1065 schedule to 11105. I hope I can maintain my current learning momentum and keep improving. I also want to start exercising again: this year I wanted to get everything done before job-hopping, so I barely worked out and gained several pounds. Starting this month I'll get back to the gym.

Here are my goals for 2022:

  • Organize my Spark/Flink/Hadoop/Hive source code study notes into blog posts
  • Continue studying the ZooKeeper, HBase, and Kafka source code
  • Having joined a big company, pay more attention to new technology stacks and cutting-edge tech news
  • Attend an offline technology salon
  • Give a technology sharing session within my department
  • Keep working out, hitting the gym at least twice a week
  • Stop being single