The internet companies’ early-batch recruiting season is basically over, and I picked up a few offers, big and small. In the spirit of giving back to society (and upholding socialism with Chinese characteristics), and for the benefit of the new generation of code farmers, I’ve sorted out each company’s interview experience and the points they tested. I hope it inspires my fellow brick movers to go all out and land a good role.

Showing off

A young lady told me to lead with the offers to grab people’s attention, otherwise nobody would read this… I’m not that vulgar, but I’m not that refined either…

I started job hunting in mid-July, aiming at big data R&D or big data algorithm roles; anyway, anything that revolves around big data ☯️. As of September 12, I had received offer intentions from Tencent, Meituan, 360 Security Research Institute, Sogou, and Beike.

Of course, I also failed plenty of interviews 😭

  • Alibaba: their recruiting system had a bug, so I never got a chance to interview, and then I took a written test that made me want to give up. Lately I’ve been looking for a kind soul to refer me for an interview
  • Baidu: they called three times, I didn’t pick up, and then there was no “then”…
  • NetEase: I went through a round of luck-of-the-draw interviews ~
  • Toutiao: I interviewed there too, but I really hadn’t reviewed thoroughly enough. Afterwards I went back and crammed like crazy, so it was still worth it
  • Didi: they gave me a “true referral” (they asked outright whether I could come intern first; apparently the offer only comes if you do…). Lately, though, Didi has been in the spotlight, so I’m still considering…

I won’t describe the actual work behind each offer, for fear that the HR ladies will find me and invite me “to tea”…

Small talk

I’m a bottom-tier master’s student at a 985 university in Beijing, genuinely bottom-tier. My roommates went to SenseTime, Megvii, or abroad; the school is full of big shots of every kind, and I survive in the cracks, just scraping by

Main research direction: none. My advisor does administration; every day I make tea, pour water, and pick up deliveries, on top of writing endless grant applications. My papers rely entirely on personal insight + divine mercy

Now for something serious. For my career path, I finally chose to focus on big data R&D. The main reason is that I took some distributed systems courses in my first year of grad school, and I can still bluff people by talking about them. Personally, I feel I could also do algorithms; R&D versus algorithms is six of one, half a dozen of the other

On school background: an excellent university as an endorsement naturally helps, nothing more to say. If you don’t have one, never mind: the really strong don’t need it, and if you’re scum like me, there’s no point forcing it

On projects: just as with studying from books, understanding your projects matters; otherwise the interviewer will have nothing to ask, the awkwardness sets in, and you’re done. The project doesn’t have to truly be yours. When job hunting, all the projects of your seniors and friends are yours. With some packaging, whatever you say is yours is yours, and nobody will check your background. I was lucky enough to do some side work on a data-flow project, but on my resume I became the project lead… Experience the art of packaging for yourself…

On resumes: I used to think we all “have at least seen pigs run, even if we haven’t eaten pork”, so why does everyone online stress resumes so much? Then I saw my junior labmate’s resume and realized some people really haven’t even seen the pig run (I’m in an academic master’s program, he’s in a professional one; he has a Baidu internship as an endorsement, yet his job hunt, alas…). The simplest way to revise yours: show it to classmates nearby. If their first impression is OK, it’s OK; if not, it’s gg, and revise it according to their feedback

Timing: prepare early and apply early. Early-batch offers start trickling out around mid-July; miss them and they’re gone >…<

The real stuff

It’s been a while, so I’ll write down as much as I can remember; there are also plenty of websites that cover the test points

Technical interviews

Let’s start with the technical process:

  1. Introduce yourself (prepare this in advance)
  2. They look at your resume and ask “you’re familiar with this?” balabala… If the question is simple, you’re familiar; if it’s hard, sorry, you don’t know it that well
  3. They look at your internship experience and ask what you did there and whether there was a big project you can show off; expect 10-20 minutes of pure talking
  4. They ask about career planning. Show a devout attitude here: no random job-hopping, steady work; make them feel you really want to join this company, and it helps if your eyes light up… Big shots can improvise freely
  5. Do you have any questions for me? I usually prepare two: what exactly does this team do? When will I know whether I’ve passed? Try not to ask the second one; plenty of big shots have said so in their posts

Keep the initiative in your own hands. The interviewer will ask plenty of things you don’t know or only half know. Then you can say you rarely use it in practice (understandable: how would a fresh graduate know so many tricks? big shots, please ignore me), and then mention a related area XXX that you do know, leading them to ask you about that

Being questioned really tests your defensive skills. On things you’re familiar with, fine, balabala away. On things you’re not familiar with, where you only know the surface (but still wrote them on your resume), you can say “I ran into this while doing a project and looked into it; I have some impression of it, feel free to ask”. This lowers the interviewer’s expectations and the difficulty of the questions, so a good answer earns bonus points and a weak one isn’t a problem

HR interviews

  1. Self-introduction (prepare a non-technical version!): undergrad through grad school, research direction, why you chose this company, internship experience. You can usually fill enough time to make them feel you’re a good talker
  2. Hobbies (pick healthy ones; at game companies you can consider the unhealthy ones…)
  3. Your internship experience, and how you got along with your managers and colleagues
  4. Any questions for me? I don’t know much about this department; could you give me a systematic introduction? What does onboarding look like?

Below are my interviews at each company. I only list the more distinctive questions; the general knowledge points are collected in the knowledge section at the end, since writing them out twice is too much trouble…

Tencent

Maybe I was just lucky?

First interview

By phone, basic Java questions. The one I remember:

  • The difference between final on a field and final on a method. Supposedly final on a method can speed things up (the classic argument is inlining, which modern JITs manage without final); look it up yourselves.

Second interview

In person, basically centered on your resume

  • How XGBoost improves on GBRT
  • The Top-K problem
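For the Top-K question, the usual answer is a min-heap that holds the K largest values seen so far, costing O(n log K). A minimal sketch, assuming the data fits in an int array; the class and method names are made up for illustration:

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class TopK {
    // Returns the k largest values in descending order, keeping a min-heap of size k.
    static int[] topK(int[] nums, int k) {
        if (k <= 0) return new int[0];
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // natural order: smallest on top
        for (int x : nums) {
            if (heap.size() < k) {
                heap.offer(x);
            } else if (x > heap.peek()) {
                heap.poll();          // evict the smallest of the current top k
                heap.offer(x);
            }
        }
        int[] result = new int[heap.size()];
        for (int i = result.length - 1; i >= 0; i--) {
            result[i] = heap.poll();  // smallest comes out first, so fill from the back
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(topK(new int[]{5, 1, 9, 3, 7, 9, 2}, 3))); // [9, 9, 7]
    }
}
```

For data too large for one machine, the same heap is typically run per partition and the partial results merged.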

Third interview

A director; this round was basically about internship projects, the technologies used, and so on. I didn’t know whether to ask about an SP (special offer)… I’d advise those of you with zero offers (big shots drowning in offers, please ignore me) not to ask about SP in this round. In the end, the SP apparently isn’t decided by this person; it depends on how the chat goes. If the conversation clicks, you can talk about anything; if it doesn’t, 88 (bye-bye)

Meituan

Difficulty: normal

First and second interviews

  • A bunch of Java questions
  • A bunch of Spark questions
  • Then he introduced the department, and I basically just listened…

The knowledge points are covered in the section at the end, so I won’t expand on them here o_o….

Third interview

Asked whether I grind practice problems (what answer do you want me to give?), a cup-pouring puzzle, internship projects

360

Difficulty: normal+

First interview

I got grilled on Python. I said I only use Python for scripting and asked whether we could talk about something else, and then I got grilled on Python some more

  • Expected value
  • Precision and recall
  • AUC and ROC
  • A chat about decision trees
  • Explain how you would design the model
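A small sketch of the metrics above, assuming binary labels encoded as 1 (positive) and 0 (negative); the class name is hypothetical. The AUC uses the pairwise definition (the chance that a random positive scores above a random negative, ties counting half), which is easy to explain in an interview though slower than the usual sort-based computation:

```java
class Metrics {
    // precision = TP / (TP + FP): of everything predicted positive, how much really is.
    static double precision(int[] labels, int[] preds) {
        int tp = 0, fp = 0;
        for (int i = 0; i < labels.length; i++) {
            if (preds[i] == 1 && labels[i] == 1) tp++;
            if (preds[i] == 1 && labels[i] == 0) fp++;
        }
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    // recall = TP / (TP + FN): of all real positives, how many were found.
    static double recall(int[] labels, int[] preds) {
        int tp = 0, fn = 0;
        for (int i = 0; i < labels.length; i++) {
            if (preds[i] == 1 && labels[i] == 1) tp++;
            if (preds[i] == 0 && labels[i] == 1) fn++;
        }
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    // AUC as P(score of a random positive > score of a random negative); O(P * N).
    static double auc(int[] labels, double[] scores) {
        long pairs = 0;
        double wins = 0;
        for (int i = 0; i < labels.length; i++) {
            if (labels[i] != 1) continue;
            for (int j = 0; j < labels.length; j++) {
                if (labels[j] != 0) continue;
                pairs++;
                if (scores[i] > scores[j]) wins += 1.0;
                else if (scores[i] == scores[j]) wins += 0.5;
            }
        }
        return pairs == 0 ? Double.NaN : wins / pairs;
    }
}
```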

The questions started out algorithm-heavy, and then I said I was actually better at architecture… The interviewer said, “Oh really? But your answers seem fine, so I won’t ask you about architecture…”

Second interview

He asked about data warehouses, snowflake vs. star schemas, etc. I said I was lost, then steered the conversation to Spark Streaming, which he was quite interested in, and we went back and forth a bit. Then he talked about how overtime works there and started introducing his awesome team, and I started admiring and worshipping… Finally I asked whether I could come intern, and the answer was basically yes. As for when results come out, they’d rather wait until all the offers are in; I estimate mid-October

Toutiao

Difficulty: boss level

First interview

  • Got grilled on Spark Streaming
  • How to guarantee exactly-once semantics
  • TCP full-connection and half-connection queues
  • Zigzag (“snake”) traversal of a binary tree??
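Assuming “snake” means the usual zigzag level-order traversal, a minimal sketch is a breadth-first traversal that reverses every other level; the Node class is a stand-in defined here:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

class ZigzagDemo {
    static final class Node {
        final int val;
        Node left, right;
        Node(int val) { this.val = val; }
    }

    // Level-order traversal that flips direction on every other level.
    static List<List<Integer>> zigzag(Node root) {
        List<List<Integer>> levels = new ArrayList<>();
        if (root == null) return levels;
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(root);
        boolean leftToRight = true;
        while (!queue.isEmpty()) {
            int n = queue.size();                  // number of nodes on the current level
            List<Integer> level = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                Node node = queue.poll();
                level.add(node.val);
                if (node.left != null) queue.add(node.left);
                if (node.right != null) queue.add(node.right);
            }
            if (!leftToRight) Collections.reverse(level);
            levels.add(level);
            leftToRight = !leftToRight;
        }
        return levels;
    }
}
```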

Second interview

  • YARN scheduling algorithms
  • More Spark Streaming
  • An algorithm question: find the nearest common ancestor of two nodes in a tree…
  • A chat about the internship
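For the nearest-common-ancestor question, one common recursive answer for a binary tree without parent pointers is sketched below. It assumes both nodes are in the tree, and the Node class is defined just for this example:

```java
class LcaDemo {
    static final class Node {
        final int val;
        Node left, right;
        Node(int val) { this.val = val; }
    }

    // Returns the lowest node that has both p and q in its subtree (a node counts as its own
    // descendant). Assumes p and q both exist in the tree rooted at root.
    static Node lca(Node root, Node p, Node q) {
        if (root == null || root == p || root == q) return root;
        Node left = lca(root.left, p, q);
        Node right = lca(root.right, p, q);
        if (left != null && right != null) return root; // p and q sit on different sides
        return left != null ? left : right;              // both on one side
    }
}
```

It visits each node at most once, so it runs in O(n) time.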

The interviewer was lukewarm and asked whether I had anything to add, and I said no… Collapse. A bloody lesson

Sogou

Difficulty: hard

First interview

Two interviewers handed me a sheet of paper with everything on it:

  1. Find the index of a number in a sorted array with duplicates; be careful not to degenerate into an O(n) algorithm
  2. Find the nearest common ancestor of two nodes in a tree
  3. Linux: the meaning of $$, $# and $0
  4. Puzzle: split 50 red and 50 black balls between two bags, then draw a ball
  5. How many trailing zeros does n! have?
  6. Basic HDFS operations
  7. Using Spark to solve big data processing problems
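Questions 1 and 5 have short standard answers. Below is a sketch with illustrative names: the first and last index of a value in a sorted array with duplicates via lower-bound and upper-bound binary search (O(log n) no matter how many duplicates there are), and the trailing zeros of n! by counting factors of 5:

```java
class SortedSearch {
    // First index i with a[i] >= target (lower bound).
    static int lowerBound(int[] a, int target) {
        int lo = 0, hi = a.length;               // search window is [lo, hi)
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;        // avoids overflow of lo + hi
            if (a[mid] < target) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // First index i with a[i] > target (upper bound).
    static int upperBound(int[] a, int target) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (a[mid] <= target) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // First occurrence of target, or -1 if it is absent.
    static int firstIndex(int[] a, int target) {
        int i = lowerBound(a, target);
        return (i < a.length && a[i] == target) ? i : -1;
    }

    // Last occurrence of target, or -1 if it is absent.
    static int lastIndex(int[] a, int target) {
        int i = upperBound(a, target) - 1;
        return (i >= 0 && a[i] == target) ? i : -1;
    }

    // Each trailing zero needs a factor 10 = 2 * 5, and factors of 2 are always more
    // plentiful, so count the 5s: n/5 + n/25 + n/125 + ...
    static long trailingZerosOfFactorial(long n) {
        long count = 0;
        while (n > 0) {
            n /= 5;
            count += n;
        }
        return count;
    }
}
```

The number of occurrences also falls out for free: lastIndex - firstIndex + 1 when the value is present.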

Second interview

  • Why am I not considering a PhD? (My studies are too poor, and I don’t feel it’s necessary…)
  • Asked what I thought of the previous interviewers (are you going to fire them…)
  • A chat about the internship

Beike

Difficulty: normal+

Honestly, I went to Beike with a “grind some small monsters for XP” mindset and didn’t intend to join, but the interview experience was very good, and in the end they offered a very high salary with excellent benefits. The downsides you all know: it leans toward a comfortable, semi-retirement pace, so it depends on what you want. Consider this a free ad for the big brother who interviewed me

First interview

How do you deal with data skew? I feel I answered poorly; I hadn’t thought this problem through before, and when I looked it up later I found there’s a lot to it.

However, the interviewer was very nice and asked me about other areas, which went fine. Afterwards we discussed career development for 30 minutes, which was really worth it. (Can I add you on WeChat?)

Second interview

A younger, cooler guy interviewed me

  • Spark architecture
  • The shuffle process
  • How Spark Streaming works
  • The chessboard problem: getting from the top-left corner to the bottom-right corner; we covered everything from search to DP to a combinatorial formula
  • Some machine learning along the way
  • A chat about the internship
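A sketch of the chessboard question, assuming an m x n grid where each move goes one cell right or down (the interview may have used other rules). The DP and the closed form C(m+n-2, m-1) should agree; class and method names are illustrative:

```java
class GridPaths {
    // DP with one rolling row: dp[j] = number of ways to reach column j of the current row.
    static long countDp(int m, int n) {
        long[] dp = new long[n];
        java.util.Arrays.fill(dp, 1);          // the first row: exactly one path to each cell
        for (int i = 1; i < m; i++) {
            for (int j = 1; j < n; j++) {
                dp[j] += dp[j - 1];            // paths from above + paths from the left
            }
        }
        return dp[n - 1];
    }

    // Closed form: choose which k of the N = m + n - 2 moves go in the rarer direction.
    static long countFormula(int m, int n) {
        int total = m + n - 2;
        int k = Math.min(m - 1, n - 1);
        long result = 1;
        for (int i = 1; i <= k; i++) {
            // after step i, result == C(total - k + i, i), so the division is always exact
            result = result * (total - k + i) / i;
        }
        return result;
    }
}
```

Both versions overflow long for very large boards; a real answer would switch to BigInteger or a modulus.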

Digression

For those who need it and don’t want to grind themselves to death. ◕ ‿ ◕.) ノ

  1. A PhD is a genuinely good option worth considering; it really is a way forward, especially if you’re young
  2. A hukou (household registration) and an internet job rarely come together; lately I’ve been weighing the hukou, alas
  3. State-owned enterprises, banks, and the civil service are worth considering; that kind of job leaves time for your family

Knowledge points

Finally, here is the knowledge I collected during my interviews (it may contain mistakes -_-||), so everyone can fill in gaps. I hope everyone lands a good offer (@^0^@)

Java

Threads and concurrency

This is a bottomless pit: the questions never end, and the difficulty goes all the way up…

Q: What is the difference between a process and a thread?

  1. A process is the basic unit of resource allocation; a thread is the smallest unit of execution (CPU scheduling)
  2. A process has its own independent address space; threads live inside a process and share its address space, so switching between threads is cheaper
  3. Multi-process services are more stable: when one process dies, the others are unaffected. By contrast, a thread that crashes its process (e.g., with a segmentation fault) takes down every thread in that process

Q: Inter-process communication (IPC)?

  • Pipes
  • Semaphores
  • Message queues
  • Shared memory
  • Sockets

The core purpose is to exchange data

Beyond listing them, you should also understand what each one actually is

Q: Communication between threads?

  • Locks
  • Semaphores

The core purpose is synchronization

Q: What is the difference between Callable and Runnable?

  1. The core difference: Callable returns a value, Runnable does not
  2. Callable’s method is call(); Runnable’s method is run()
  3. call() can throw checked exceptions, while run() cannot

Q: Future and Callable?

  • A Callable returns a result when it finishes, and you retrieve that result through a Future.
  • You should also know FutureTask, which implements both Runnable and Future and has a constructor that accepts a Callable
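A minimal sketch of both routes, with illustrative names: submitting a Callable to an ExecutorService and reading the Future, and wrapping a Callable in a FutureTask that a plain Thread can run because FutureTask is also a Runnable:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;

class CallableDemo {
    static int sumViaExecutor(int n) throws InterruptedException, ExecutionException {
        Callable<Integer> task = () -> {         // call() returns a value and may throw
            int sum = 0;
            for (int i = 1; i <= n; i++) sum += i;
            return sum;
        };
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<Integer> future = pool.submit(task);
            return future.get();                 // blocks until the result is ready
        } finally {
            pool.shutdown();
        }
    }

    static int sumViaFutureTask(int n) throws InterruptedException, ExecutionException {
        // FutureTask wraps a Callable and is itself a Runnable, so a plain Thread can run it.
        FutureTask<Integer> futureTask = new FutureTask<>(() -> n * (n + 1) / 2);
        new Thread(futureTask).start();
        return futureTask.get();                 // the Future side of FutureTask
    }
}
```

If call() throws, future.get() rethrows the exception wrapped in an ExecutionException.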

Q: How to create a thread?

  1. Subclass Thread, override run(), and call start()
  2. Implement Runnable, pass it to a Thread, and call start()
  3. Submit the task to an ExecutorService
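The three approaches side by side, as a sketch with made-up names; the shared counter only shows that each task really ran:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadCreationDemo {
    static final AtomicInteger counter = new AtomicInteger();

    // 1. Subclass Thread and override run().
    static class Worker extends Thread {
        @Override
        public void run() { counter.incrementAndGet(); }
    }

    static int runAll() throws InterruptedException {
        counter.set(0);

        Thread t1 = new Worker();
        // 2. Implement Runnable (here a lambda) and pass it to a Thread.
        Thread t2 = new Thread(() -> { counter.incrementAndGet(); });
        t1.start();                    // start() spawns a new thread; calling run() would not
        t2.start();
        t1.join();
        t2.join();

        // 3. Submit the task to an ExecutorService (a thread pool).
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> { counter.incrementAndGet(); });
        pool.shutdown();               // no new tasks accepted; queued ones still run
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return counter.get();          // 3 once all three tasks have run
    }
}
```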

Q: What does the volatile keyword do?

  1. Prevents instruction reordering (this is what the double-checked singleton relies on)
  2. Guarantees memory visibility across threads

Q: synchronized?

  1. On an instance method: locks the current object (this), so calls on two different objects don’t conflict
  2. On a static method: locks the class object, so calls conflict even across different instances
  3. On a code block: locks the object you specify
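A sketch of the three forms (class and field names are made up). Each counter is guarded by one of the locks described above, so the totals come out exact; without synchronization the ++ operations would race and usually lose updates:

```java
class SyncDemo {
    private int instanceCount;
    private static int staticCount;
    private final Object lock = new Object();
    private int blockCount;

    // Locks `this`: threads working on different SyncDemo objects would not block each other.
    synchronized void incInstance() { instanceCount++; }

    // Locks SyncDemo.class: every caller contends, whichever instance it holds.
    static synchronized void incStatic() { staticCount++; }

    // Locks only the chosen object, and only around the critical section.
    void incBlock() {
        synchronized (lock) { blockCount++; }
    }

    static int[] run(int threads, int perThread) throws InterruptedException {
        SyncDemo shared = new SyncDemo();
        staticCount = 0;
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.incInstance();
                    incStatic();
                    shared.incBlock();
                }
            });
            ts[t].start();
        }
        for (Thread t : ts) t.join();   // join() also makes the final counts visible here
        return new int[]{shared.instanceCount, staticCount, shared.blockCount};
    }
}
```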

Q: What about the Java memory model?

There’s plenty of material online; predecessors have drawn very detailed diagrams of it

Q: What about CountDownLatch and CyclicBarrier?

  • CountDownLatch: one thread waits for several other threads to finish.
  • CyclicBarrier: several threads wait for each other to reach a common point.
  • A CyclicBarrier can be reused; a CountDownLatch cannot.
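A minimal sketch contrasting the two, with illustrative names: the latch is counted down once and then stays open, while the barrier trips once per round and is reused. The barrier action passed to the constructor runs once each time all parties arrive:

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

class LatchBarrierDemo {
    // CountDownLatch: the caller waits until n workers have each counted down once.
    static int latch(int n) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(n);
        AtomicInteger finished = new AtomicInteger();
        for (int i = 0; i < n; i++) {
            new Thread(() -> {
                finished.incrementAndGet();
                done.countDown();
            }).start();
        }
        done.await();              // returns once the count hits zero; the latch never resets
        return finished.get();
    }

    // CyclicBarrier: n workers wait for each other, and the same barrier serves every round.
    static int barrier(int n, int rounds) throws InterruptedException {
        AtomicInteger trips = new AtomicInteger();
        CyclicBarrier gate = new CyclicBarrier(n, trips::incrementAndGet); // runs once per trip
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Thread(() -> {
                try {
                    for (int r = 0; r < rounds; r++) gate.await();
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new RuntimeException(e);
                }
            });
            workers[i].start();
        }
        for (Thread w : workers) w.join();
        return trips.get();        // equals `rounds`: the barrier was reused each time
    }
}
```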


Q: Semaphore?

Controls access to a set of resources: acquire() takes a permit and release() returns it
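A sketch with made-up names: several threads share a fixed number of permits, and the code records the peak number of threads inside the guarded section, which can never exceed the permit count:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

class SemaphoreDemo {
    // Runs `tasks` threads over `permits` shared resources and returns the peak number of
    // threads that were inside the guarded section at the same moment.
    static int peakConcurrency(int permits, int tasks) throws InterruptedException {
        Semaphore semaphore = new Semaphore(permits);
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        Thread[] threads = new Thread[tasks];
        for (int i = 0; i < tasks; i++) {
            threads[i] = new Thread(() -> {
                try {
                    semaphore.acquire();                // blocks until a permit is free
                    try {
                        peak.accumulateAndGet(inside.incrementAndGet(), Math::max);
                        Thread.sleep(10);               // pretend to use the resource
                    } finally {
                        inside.decrementAndGet();
                        semaphore.release();            // always hand the permit back
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        return peak.get();
    }
}
```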

Q: What does ThreadLocal do?

It gives each thread its own copy of a variable, so the value is shared among the functions running on that thread but isolated from other threads.
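A sketch with illustrative names: each thread increments its own copy, so the value each thread sees reflects only its own calls. Calling remove() matters when threads are reused by a pool, since a pooled thread would otherwise carry the old value into its next task:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ThreadLocalDemo {
    // Each thread lazily gets its own Integer, starting at 0.
    private static final ThreadLocal<Integer> depth = ThreadLocal.withInitial(() -> 0);

    static void enter() { depth.set(depth.get() + 1); }   // visible to any method on this thread
    static int current() { return depth.get(); }

    // Thread i calls enter() i + 1 times; returns the sorted values the threads observed.
    static List<Integer> run(int threads) throws InterruptedException {
        List<Integer> seen = Collections.synchronizedList(new ArrayList<>());
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int calls = i + 1;
            ts[i] = new Thread(() -> {
                for (int c = 0; c < calls; c++) enter();
                seen.add(current());      // only this thread's increments show up here
                depth.remove();           // clean up the per-thread value
            });
            ts[i].start();
        }
        for (Thread t : ts) t.join();
        Collections.sort(seen);
        return seen;
    }
}
```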


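Q: What is the difference between singleton and multi-instance objects (in terms of thread safety)?

  1. With a singleton, both non-static and static member variables are thread-unsafe, since every thread shares the one instance
  2. With multiple instances, non-static member variables are thread-safe (each caller gets its own instance), but static variables are still thread-unsafe
  3. Static variables can be made thread-safe with synchronized or ThreadLocal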


Q: When is a lock released?

  1. When the synchronized block finishes executing
  2. When an exception thrown inside the synchronized block propagates out of it
  3. When wait() is called inside the synchronized block: the thread releases the object’s lock, enters the object’s wait set, and waits to be woken up

Q: When does notify wake up a thread?

notify() does not release the lock on the spot. It picks a thread from the wait set, but that thread can only reacquire the lock, and continue, after the notifying thread has finished its synchronized block.

Q: What is the difference between notify and notifyAll?

notify() wakes one waiting thread; notifyAll() wakes all waiting threads, which then compete for the lock
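A sketch of wait/notifyAll as a one-slot mailbox (a hypothetical class, not a JDK type). wait() sits in a while loop to guard against spurious wakeups, and woken threads only proceed after the notifier leaves its synchronized method:

```java
class MailboxDemo {
    private Integer item;                        // null means "empty"

    synchronized void put(int value) throws InterruptedException {
        while (item != null) wait();             // releases this object's lock while waiting
        item = value;
        notifyAll();                             // waiters wake, but run only after we return
    }

    synchronized int take() throws InterruptedException {
        while (item == null) wait();             // re-check the condition after every wakeup
        int value = item;
        item = null;
        notifyAll();
        return value;
    }

    // Producer (calling thread) hands 1..n to a consumer thread; returns the consumer's sum.
    static long transfer(int n) throws InterruptedException {
        MailboxDemo box = new MailboxDemo();
        long[] sum = new long[1];
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) sum[0] += box.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        for (int i = 1; i <= n; i++) box.put(i);
        consumer.join();                         // makes sum[0] visible to this thread
        return sum[0];
    }
}
```

notifyAll is used because producer and consumer wait on the same object; with plain notify, a producer could wake another producer instead of the consumer in a bigger version of this example.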

Q: What about Lock?

Lock was introduced to make up for the shortcomings of synchronized, mainly in two scenarios:

  1. Read operations shouldn’t exclude each other (only writes need exclusive access)
  2. A thread shouldn’t have to wait forever for a lock

Lock is an interface, not a Java keyword, and unlike synchronized it must be released manually (usually in a finally block).

Q: What kinds of locks are there?

  1. Reentrant locks, e.g., ReentrantLock (synchronized is reentrant too)
  2. Interruptible locks: lockInterruptibly() is what makes Lock interruptible
  3. Fair locks: synchronized is unfair; ReentrantLock is unfair by default but can be made fair
  4. Read/write locks, e.g., ReadWriteLock
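A sketch of two of these, with made-up class names: a fair ReentrantLock using tryLock with a timeout instead of waiting forever, and a ReentrantReadWriteLock guarding a map so that readers do not block each other. Note that every unlock sits in a finally block:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockDemo {
    private final Lock lock = new ReentrantLock(true);   // true = fair: longest waiter goes first
    private int value;

    // Gives up after the timeout instead of blocking forever, which synchronized cannot do.
    boolean tryIncrement(long timeoutMs) throws InterruptedException {
        if (!lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) return false;
        try {
            value++;
            return true;
        } finally {
            lock.unlock();                               // must be released manually
        }
    }

    int value() {
        lock.lock();
        try { return value; } finally { lock.unlock(); }
    }
}

class RwCache {
    private final ReadWriteLock rw = new ReentrantReadWriteLock();
    private final Map<String, String> map = new HashMap<>();

    String get(String key) {                 // many readers can hold the read lock at once
        rw.readLock().lock();
        try { return map.get(key); } finally { rw.readLock().unlock(); }
    }

    void put(String key, String val) {       // the write lock excludes readers and other writers
        rw.writeLock().lock();
        try { map.put(key, val); } finally { rw.writeLock().unlock(); }
    }
}
```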


Collections

Collections are relatively easy; the standard questions basically all end up at HashMap

Q: TreeSet characteristics?

Elements are kept sorted, by their natural ordering (compareTo) or by a supplied Comparator.

Q: LinkedHashMap characteristics?

An internal doubly linked list records the order in which keys were inserted, so iterating over the map follows insertion order.
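A sketch with illustrative names. It also shows a related trick: constructed with accessOrder = true and with removeEldestEntry overridden, LinkedHashMap works as a small LRU cache:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LinkedHashMapDemo {
    // Iteration follows insertion order, unlike HashMap.
    static List<String> insertionOrder() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("banana", 1);
        m.put("apple", 2);
        m.put("cherry", 3);
        m.put("banana", 4);            // re-inserting an existing key keeps its original position
        return new ArrayList<>(m.keySet());   // [banana, apple, cherry]
    }

    // accessOrder = true moves each accessed entry to the end; the eldest entry is the LRU one.
    static <K, V> Map<K, V> lru(int capacity) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;     // evict once we exceed the capacity
            }
        };
    }
}
```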

Q: What is the difference between ArrayList and Vector?

ArrayList is not thread-safe; Vector is thread-safe (its methods are synchronized).

Q: How is LinkedList different from ArrayList?

  1. LinkedList is based on a linked list; ArrayList is based on an array
  2. LinkedList has no efficient random access (get(i) has to walk the list)
  3. ArrayList inserts and removes elements in the middle less efficiently than LinkedList, since it has to shift the elements after them

Q: What is the difference between HashMap and Hashtable?

  1. Hashtable is thread-safe; HashMap is not
  2. HashMap allows a null key and null values; Hashtable allows neither

Q: What is the difference between a Set and a List? What are the implementations?

A Set does not allow duplicate elements; a List does, and is indexed

  • Set: HashSet, LinkedHashSet, TreeSet
  • List: Vector, ArrayList, LinkedList

Q: What is the difference between hashCode(), equals() and ==?

  1. equals() checks whether two objects are logically equal; if they are, their hashCode() values must be equal
  2. If two objects have different hash codes, they are not equal (equal hash codes, however, do not imply equality)
  3. == compares references, i.e., whether both sides are the same object
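A sketch of a value class (a hypothetical Point) that keeps the equals/hashCode contract, so two equal points are different objects under == but collapse to a single entry in a HashSet:

```java
import java.util.Objects;

final class Point {
    private final int x, y;

    Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;                 // same reference
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;                // logical equality on the fields
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y);                  // equal points produce equal hash codes
    }
}
```

Overriding equals without hashCode would break HashMap and HashSet: two equal points could land in different buckets and both be stored.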

Q: Are objects added to a Java container references or values?

References

Q: What is the difference between Iterator and ListIterator?

  1. ListIterator can traverse both forward and backward
  2. It can add (and set) elements during iteration
  3. It can report the current position (nextIndex() / previousIndex())
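A sketch of those three abilities, with illustrative names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class ListIteratorDemo {
    // Walks forward, replacing some elements and inserting after others.
    static List<String> editForward() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            String s = it.next();
            if (s.equals("b")) it.add("b2");   // inserts right after "b"; a plain Iterator cannot
            else it.set(s.toUpperCase());      // replaces the element next() just returned
        }
        return list;                           // [A, b, b2, C]
    }

    // Walks backward from the end, reporting each element's index.
    static List<String> backward(List<String> list) {
        List<String> out = new ArrayList<>();
        ListIterator<String> it = list.listIterator(list.size());
        while (it.hasPrevious()) {
            int index = it.previousIndex();    // index of the element previous() will return
            out.add(index + ":" + it.previous());
        }
        return out;
    }
}
```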

Q: HashMap implementation?

The topic is huge; a big shot’s interview notes cover it well and are worth a read. Their outline, for reference:

  • The concept of hashing
  • Collision resolution in HashMap (separate chaining)
  • equals() and hashCode()
  • Benefits of immutable objects
  • Race conditions when HashMap is used from multiple threads
  • Resizing the HashMap

PS: HashSet is implemented on top of HashMap

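Q: ConcurrentHashMap vs. Hashtable?

  1. Hashtable achieves thread safety by making its methods synchronized, which locks the whole table
  2. ConcurrentHashMap uses segment locks and locks only part of the map (since Java 8 it uses CAS plus per-bucket locking instead of segments)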

GC

This section covers JVM memory areas and GC algorithms

Q: What are memory leaks and memory overflows?

  • Memory leak: memory that has been allocated can never be released. A single leak may be negligible, but leaks accumulate, and however much memory you have, it will sooner or later be exhausted.
  • Memory overflow: there isn’t enough memory left to satisfy an allocation.

Memory leaks can eventually leave too little space to allocate objects, which causes a memory overflow; an overflow can of course also come from allocating objects that are simply too large

Q: What causes memory overflow?

  1. Too much data loaded into memory, e.g., fetching too many rows from the database at once.
  2. Collections keep references to objects that are no longer needed, so the JVM can’t reclaim them.
  3. Infinite loops, or loops that create too many duplicate objects.
  4. The memory limits in the startup parameters are set too small.


Q: JVM memory areas?

  • Heap: objects
  • Method area: classes, static variables, and constants
  • Stack: local variable tables

Naming these three is basically enough

Q: Tell me about garbage collection?

It’s not easy…

How garbage is identified:

  • Reference counting: fails on circular references
  • Reachability analysis: objects reachable from GC Roots are alive, everything else is garbage. GC Roots include objects referenced from the stack, from static fields and constants in the method area, and from native method stacks

Heap layout:

  • Young generation (about 1/3 of the heap): Eden : From Survivor : To Survivor = 8:1:1
  • Old generation (about 2/3): large objects and long-lived objects
  • Permanent generation (outside the heap proper): commonly used to implement the method area (replaced by Metaspace in Java 8)

Garbage collection algorithms:

  • Mark-sweep
  • Copying (young generation)
  • Mark-compact (old generation)


Q: What is the difference between Minor, Major and Full GC?

  • Minor GC collects the young generation
  • Major GC collects the old generation
  • Full GC collects the entire heap

Q: When is a Full GC triggered?

  1. System.gc(): only a suggestion, not guaranteed to trigger one
  2. The old generation runs out of space (the core trigger; the other cases derive from it)
  3. The permanent generation runs out of space (when the method area lives in the permanent generation)
  4. The amount promoted to the old generation after a Minor GC exceeds the old generation’s free space (really a special case of point 2)
  5. Allocating large objects (they can go straight into the old generation and leave it short of space)

Q: What is a constant pool?

Constant pools come in two kinds: static constant pools and runtime constant pools.

  • Static constant pool: the constant pool inside *.class files
  • Runtime constant pool: where the constants from *.class files end up in the method area after loading (i.e., in the permanent generation, when the method area is implemented there)

It contains:

  • String literals
  • Class and method information

This question usually leads to comparing string constants:

    public class StringPoolDemo {
        public static void main(String[] args) {
            String s1 = "Hello";
            String s2 = "Hello";
            String s3 = "Hel" + "lo";             // compile-time constant, folded to "Hello"
            String s4 = "Hel";
            String s5 = "lo";
            String s6 = s4 + s5;                  // concatenated at run time: a new object
            String s7 = "Hel" + new String("lo");
            String s8 = new String("Hello");      // always a new heap object
            String s9 = s8.intern();              // returns the pooled "Hello"

            System.out.println(s1 == s2);  // true: the same literal in the pool
            System.out.println(s1 == s3);  // true
            System.out.println(s1 == s6);  // false
            System.out.println(s1 == s7);  // false, contains the object new String("lo")
            System.out.println(s1 == s8);  // false
            System.out.println(s1 == s9);  // true, literal comparison
        }
    }

Class loading

Someone asked me about this in an interview, so I went back and looked it up

Q: What is the class loading process?

  1. Loading: class loaders read *.class files into memory

  2. Linking, in three steps:

    1. Verification: ensures the loaded byte stream conforms to the JVM specification; I think of it as syntax checking (not a rigorous description)
    2. Preparation: allocates memory for class variables (static, not instance, variables) and sets them to default initial values (the JVM’s own defaults such as 0 or null, not the values in your code, unless the field is a constant declared static final)
    3. Resolution: replaces symbolic references with direct ones, e.g., A.a() => some memory address
  3. Initialization: initializes class variables by running the class initializer (static field assignments and static blocks)


Q: Java initialization order?

This one came up in my iQiyi interview and nearly blinded me; it was the first time I realized how many things get initialized

  1. Initialization order within one class (class before instance)

Class-level parts (static variables, static initializer blocks) => instance-level parts (instance variables, instance initializer blocks, constructor)

  2. Initialization order across a parent and a child class (class before instance, parent before child)

Parent (static variables, static initializer blocks) => child (static variables, static initializer blocks) => parent (instance variables, instance initializer blocks, constructor) => child (instance variables, instance initializer blocks, constructor)
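A sketch that logs each step so the order above can be checked; the class names are made up, and each field initializer calls a small logging helper:

```java
import java.util.ArrayList;
import java.util.List;

class InitOrder {
    static final List<String> steps = new ArrayList<>();

    static String note(String step) {
        steps.add(step);
        return step;
    }

    static class Parent {
        static String parentStatic = note("parent static field");
        static { note("parent static block"); }
        String parentField = note("parent instance field");
        { note("parent instance block"); }
        Parent() { note("parent constructor"); }
    }

    static class Child extends Parent {
        static String childStatic = note("child static field");
        static { note("child static block"); }
        String childField = note("child instance field");
        { note("child instance block"); }
        Child() { note("child constructor"); }
    }

    public static void main(String[] args) {
        new Child();                            // first use triggers class initialization too
        steps.forEach(System.out::println);
    }
}
```

Within one level, static fields and static blocks run in the order they appear in the source, and the same holds for instance fields and instance blocks.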


Q: What class loaders does Java have?

  1. Bootstrap class loader: loads from /lib
  2. Extension class loader: loads from /lib/ext
  3. System (application) class loader: loads from the -classpath

Q: What is the parent delegation model?

Delegation means that when a class loader is asked to load a class, it first checks whether that class has already been loaded and, if so, reuses it. Otherwise it hands the request to its parent first and only loads the class itself if the parent cannot.

The request therefore travels System -> Extension -> Bootstrap

Advantages:

  • Classes are not loaded twice
  • Core APIs cannot be tampered with

Object-oriented programming

These questions are pretty basic, but they come up a lot

Q: The three features of object orientation?

Recite: encapsulation, inheritance, polymorphism

Then they’ll ask how each of the three shows up in practice; think it through yourself, it makes sense, right?

Q: What are the differences between interfaces and abstract classes in Java?

  1. A class can implement multiple interfaces but extend only one (abstract) class
  2. Interface methods cannot have implementations, while an abstract class can implement some of its methods (since Java 8, interfaces can also have default and static methods)
  3. Fields in an interface are all public static final, and its methods are public abstract
  4. In essence, an interface says what an object can do, and an abstract class says what an object is

Q: Overloading vs. overriding?

My pig brain mixes these up all the time

  • Overloading: in the same class, same method name, different parameter lists; the return types may differ
  • Overriding: a subclass method with the same name and parameters as the parent method, and the same (or a covariant) return type
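A sketch with made-up class names showing the practical difference: the overload is picked from the compile-time type of the argument, while the override is picked from the run-time type of the object:

```java
class Animal {
    String sound() { return "..."; }
}

class Dog extends Animal {
    @Override
    String sound() { return "woof"; }   // overriding: same signature, resolved at run time
}

class Printer {
    // Overloading: same name, different parameter lists, resolved at compile time.
    static String describe(Animal a) { return "animal says " + a.sound(); }
    static String describe(Dog d) { return "dog says " + d.sound(); }
    static String describe(Animal a, int times) { return describe(a) + " x" + times; }
}
```

With `Animal a = new Dog();`, `Printer.describe(a)` picks the Animal overload (static type) yet prints "woof" (dynamic type): "animal says woof".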

Design patterns

Q: Give some examples of design patterns you know?

Five or six, each with an example, is usually enough

  • Composite: addAll on collections
  • Decorator: the nesting of I/O streams
  • Abstract factory: the Driver creating new connections in JDBC
  • Builder: StringBuilder, or PreparedStatement in SQL
  • Chain of responsibility: the filters that process requests in Struts2
  • Interpreter: regular expressions
  • Observer: event listeners in Swing

Q: Write a singleton by hand?

Afterwards they’ll ask about the internals: volatile, or singletons vs. multiple instances

    public class Singleton {
        private volatile static Singleton singleton; // volatile forbids publishing a half-built object
        private Singleton() {}
        public static Singleton getSingleton() {
            if (singleton == null) {                 // first check, without the lock
                synchronized (Singleton.class) {
                    if (singleton == null) { singleton = new Singleton(); } // second check, with the lock
                }
            }
            return singleton;
        }
    }

Network protocols

Q: TCP three-way handshake and four-way teardown?

Drawing the diagram basically K.O.s this one; the detailed follow-up questions are also worth a look

Q: Why does TCP need three handshakes, not two or more?

Why not two

Suppose the client’s first connection request is delayed in the network for a long time, so the client sends a second request and a connection is established with it. When the stale first request finally reaches the server, a two-way handshake would make the server open another connection; that connection is invalid and just wastes server resources.

Why not more than three

In theory you could use more than three, but no number of rounds makes communication perfectly reliable: each reply only acknowledges the previous message, and the reply itself can still be lost on an unreliable channel. Considering practical efficiency, three is enough.

Q: Why does TCP need four waves instead of three?

  • During the handshake, the second packet carries both the ACK (response) and the SYN (request)
  • During the teardown, the server can’t send its ACK and FIN together, because it may not have finished sending its own data yet.

Q: TCP half-connection and full-connection queues?

  • Half-connection queue: when the server receives the first handshake packet (SYN), it adds the connection to the half-connection queue; this queue is the main target of SYN flood attacks
  • Full-connection queue: when the server receives the client’s ACK (the third step of the handshake), it moves the connection from the half-connection queue into the full-connection queue.

Q: The difference between TCP and UDP?

  1. TCP is connection-oriented; UDP is connectionless
  2. TCP consumes more resources because of the handshake and teardown
  3. TCP is stream-oriented; UDP is datagram-oriented
  4. TCP guarantees data correctness and order, while UDP may lose packets and does not guarantee order

Q: TCP and UDP applications?

  • TCP: FTP, HTTP, POP, IMAP, SMTP, TELNET, and SSH
  • UDP: video streaming and VoIP

Q: TCP/IP and OSI model?

TCP/IP model, bottom up:

  1. Link layer
  2. Network layer (IP, ICMP, IGMP)
  3. Transport layer (TCP, UDP)
  4. Application layer (Telnet, FTP)

OSI model, bottom up:

  1. Physical layer
  2. Data link layer
  3. Network layer
  4. Transport layer
  5. Session layer
  6. Presentation layer
  7. Application layer

Q: Which protocol is the ping command based on?

ICMP

Q: What is the difference between blocking and non-blocking IO?

Blocking I/O

  • One thread per connection: 10 requests need 10 threads
  • Threads spend most of their time waiting for data to arrive, wasting resources
  • Suitable for applications with few concurrent connections and a lot of data per connection

Non-blocking I/O

  • The basic idea is to register all connections in one table and poll them
  • It can be built on an event notification mechanism, so 10 threads can handle 100 requests
  • Suitable for applications with high concurrency

Databases

I’ve done development with databases, but I don’t know them well. When asked whether I could write SQL, I answered “simple queries yes, complex ones I’ll try”. Complex SQL really isn’t meant to be written by humans…

Q: What is the difference between a clustered index and a non-clustered index?

  • Clustered index: Leaf nodes are actual data, and there can only be one clustered index in a table
  • Non-clustered index: leaf nodes hold pointers that need another lookup to reach the data; a table can have multiple non-clustered indexes

Q: Execution order of WHERE, GROUP BY, and HAVING?

  1. WHERE filters rows
  2. GROUP BY groups them
  3. HAVING filters groups

Q: Star vs. snowflake schema?

  • Star: dimension tables are denormalized, so some redundancy exists
  • Snowflake: dimension tables are split into finer normalized tables, with no redundancy


Q: SQL rows to columns and columns to rows?

Apart from GROUP BY + aggregate functions, this is basically the hardest SQL question

  • Rows to columns

SUM(CASE WHEN A = 'a' THEN B ELSE 0 END) AS D

You need SUM (or another aggregate function) here because you are operating within a group

  • Columns to rows

The core is UNION

Remember these two patterns and practice them

Q: Dirty read, unrepeatable read, phantom read?

  • Dirty read: transaction A reads a value that transaction B has modified but not yet committed
  • Non-repeatable read: transaction A reads the same value twice; in between, transaction B modifies and commits it, so A’s two reads disagree
  • Phantom read: transaction A changes every row with value a to b, while transaction B inserts a new row with value a; afterwards A finds a row that was not changed to b, as if it were a phantom

This leads to the transaction isolation levels

Isolation level | Dirty read | Non-repeatable read | Phantom read
Read uncommitted | yes | yes | yes
Read committed | no | yes | yes
Repeatable read | no | no | yes
Serializable | no | no | no

Q: Three ways to implement join?

  • Nested loop join: works better when the inner table has an index
  • Merge join: sort both tables (if they aren’t sorted already), then merge them
  • Hash join: build a hash table on one table, then scan the other table and probe it
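A minimal in-memory sketch of the hash join idea, assuming both inputs fit in memory and an inner join on an integer key; the Row type is a stand-in defined here, not any database’s API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HashJoinDemo {
    static final class Row {
        final int key;
        final String value;
        Row(int key, String value) { this.key = key; this.value = value; }
    }

    // Build phase: hash the (ideally smaller) build side by key.
    // Probe phase: scan the other side once and look each key up.
    static List<String> hashJoin(List<Row> build, List<Row> probe) {
        Map<Integer, List<String>> table = new HashMap<>();
        for (Row r : build) {
            table.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r.value);
        }
        List<String> out = new ArrayList<>();
        for (Row r : probe) {
            List<String> matches = table.get(r.key);
            if (matches == null) continue;            // inner join: drop unmatched rows
            for (String m : matches) out.add(m + "|" + r.value);
        }
        return out;
    }
}
```

Each side is read once, so the cost is roughly linear in the two table sizes, versus the product of the sizes for a nested loop without an index.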

Linux

Q: Check whether port xxx is in use?

  • netstat -tunlp |grep xxx
  • lsof -i:xxx

Q: Check whether process xxx is running?

  • ps -ef |grep xxx

Q: Check the CPU usage?

  • top

Q: Check the memory usage?

  • free
  • top

Q: Check the disk usage?

  • df -l

Q: Meaning of $0, $n, $#, $*, $@, $?, $$?

Variable | Meaning
$0 | File name of the current script
$n | The nth argument passed to the script
$# | Number of arguments passed to the script
$* | All arguments passed to the script
$@ | All arguments passed to the script; differs from $* when quoted: "$*" expands to a single word, while "$@" keeps each argument as a separate word
$? | Exit status of the previous command
$$ | Process ID of the current shell

Q: Difference between > and >>?

  • > : redirect to a file (overwriting it)
  • >> : append to a file

Q: >, 1>, 2>, 2>&1, 2>1?

  • > : by default redirects standard output to a file; error output still goes to the console
  • 1> : standard output
  • 2> : standard error
  • 2>&1 : redirects standard error to wherever standard output currently goes; usually written after 1> a.txt, so errors also end up in a.txt via standard output
  • 2>1 : redirects standard error to a file literally named 1, a common mistake; note the difference from &1

Q: Which command manages scheduled tasks?

  • crontab

Algorithms

The ocean of algorithms is endless, but for interview algorithm questions I personally think “Sword Finger Offer” is enough…

I worked through “Sword Finger Offer” about four times, to the point where I knew the solution to basically any question on sight. Most interview questions also came from it.

The questions I met outside “Sword Finger Offer” were generally brute-force search problems rather than DP…

Classic problems

  • Substring matching
  • Subsequence matching
  • Merging sorted linked lists
  • Lowest common ancestor of two nodes in a tree
  • Quicksort, heapsort
  • Binary search and its variants
  • Swapping two numbers without a third variable
  • Reservoir sampling (there is a good write-up by an expert)
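
Below are sketches of three items from this list:

  • Reservoir sampling: item i is kept with probability k/(i+1).
  • Lower-bound and upper-bound binary search on a sorted list.
  • The XOR swap.

The function names are illustrative. In everyday Python you would use the bisect module and a, b = b, a instead.

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Uniformly sample k items from a stream of unknown length in one pass."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample

def lower_bound(nums, target):
    """First index i with nums[i] >= target (len(nums) if none); nums is sorted."""
    lo, hi = 0, len(nums)
    while lo < hi:
        mid = (lo + hi) // 2
        if nums[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return lo

def upper_bound(nums, target):
    """First index i with nums[i] > target (len(nums) if none); nums is sorted."""
    lo, hi = 0, len(nums)
    while lo < hi:
        mid = (lo + hi) // 2
        if nums[mid] <= target:
            lo = mid + 1
        else:
            hi = mid
    return lo

def swap_without_temp(a, b):
    """XOR swap for integers, using no third variable."""
    a ^= b
    b ^= a  # b now holds the original a
    a ^= b  # a now holds the original b
    return a, b

print(lower_bound([1, 2, 2, 2, 5], 2), upper_bound([1, 2, 2, 2, 5], 2))  # 1 4
print(swap_without_temp(3, 7))  # (7, 3)
```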

Brainteasers

  • The probability that a stick broken at random into three pieces can form a triangle
  • Water-pouring (jug) problems
  • Flour-weighing problems
  • Rope-burning problems
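
For the stick problem, assume both break points are chosen independently and uniformly along the stick. The three pieces then form a triangle exactly when every piece is shorter than half the stick, and the exact answer is 1/4. Here is a quick Monte Carlo check in Python; the trial count and seed are arbitrary choices.

```python
import random

def stick_triangle_probability(trials=200_000, seed=42):
    """Monte Carlo estimate: break a unit stick at two uniform random points."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x, y = sorted((rng.random(), rng.random()))
        pieces = (x, y - x, 1 - y)
        # Triangle inequality: every piece must be shorter than half the stick.
        if max(pieces) < 0.5:
            hits += 1
    return hits / trials

print(stick_triangle_probability())  # close to the exact answer 0.25
```

A different breaking procedure changes the answer. For example, breaking once and then breaking the longer piece gives a different probability, so clarify the setup with the interviewer first.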

Big data

Questions generally lean toward the various frameworks:

  • Hadoop
  • Yarn
  • Spark
  • Hive
  • HBase
  • Zookeeper

Take what you need from the frameworks above. You should always have a few you can talk about at length; I mainly talked up Spark.

Hive and HBase are mostly used as tools. If you say you have used them, expect more questions; I usually answer that I have set them up.

ZooKeeper maintains distributed consistency underneath, so knowing a distributed consensus protocol such as Raft is a plus.

Hadoop

Q: How do you join two tables?

  1. Reduce-side join: the basic approach. Both tables are shuffled by join key and joined in the reducers
  2. Map-side join: distribute the small table to every mapper, so the join runs with maps only and no reduce phase
  3. Semi join + reduce-side join: extract the join keys of one table and distribute them. Use them to filter the other table, then do a reduce-side join; this reduces the amount of data in the join
  4. Semi join + Bloom filter + reduce-side join: an improvement on scheme 3 for when the key set is too large to fit in memory. A Bloom filter of the keys is distributed instead

Schemes 3 and 4 are worth understanding, but they feel a bit unreliable to me and rarely come up in interviews. Interviewers generally expect you to get to scheme 2, and then there is data skew to discuss.
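
Here is a single-process sketch of scheme 4 in Python. The Bloom filter (MD5-based positions, 1024 slots, 3 hashes) is a toy chosen for readability. The "map", "shuffle" and "reduce" steps are plain loops, not Hadoop APIs. The sketch shows one property: the filter may let a few non-matching keys through (false positives), but it never drops a matching one. The final reduce-side join therefore still returns exactly the right rows.

```python
import hashlib
from collections import defaultdict

class BloomFilter:
    """Toy Bloom filter: k bit positions per key in an m-slot array."""

    def __init__(self, m_bits=1024, k_hashes=3):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits)  # one byte per "bit", for readability

    def _positions(self, key):
        for seed in range(self.k):
            digest = hashlib.md5(f"{seed}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.m

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # Can be True for an absent key (false positive), never False for a present one.
        return all(self.bits[pos] for pos in self._positions(key))

def bloom_semi_join(small, big):
    """Filter `big` with a Bloom filter of `small`'s keys, then reduce-side join."""
    bf = BloomFilter()
    for k, _ in small:
        bf.add(k)
    # "Map" side: drop big-table rows whose key is definitely not in `small`.
    kept = [(k, v) for k, v in big if bf.might_contain(k)]
    # "Shuffle": group rows from both tables by key, remembering their origin.
    groups = defaultdict(lambda: ([], []))
    for k, v in small:
        groups[k][0].append(v)
    for k, v in kept:
        groups[k][1].append(v)
    # "Reduce": pair rows per key; false positives have no partner and vanish.
    joined = [(k, s, b) for k, (ss, bs) in groups.items() for s in ss for b in bs]
    return joined, len(kept)

small = [(i, f"s{i}") for i in range(10)]         # small table
big = [(i % 1000, f"b{i}") for i in range(5000)]  # big table
joined, kept = bloom_semi_join(small, big)
print(len(joined), "joined rows;", kept, "big rows passed the filter")
```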

Q: What is the MapReduce process?

A must-ask question for big data positions.

There is too much to cover here, so I recommend a write-up by an expert online.

After reading it, you should be able to answer:

  • What happens when the map-side memory buffer fills up
  • What the combiner does and where it runs
  • How many times sorting happens, where, and which sort is used
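
As a rough mental model of the stages these questions refer to, here is a single-process word-count simulation in Python. It runs map, then combine (a local reduce on each mapper's output), then partition by hash(key) % R, then sort and reduce. It deliberately omits the in-memory buffer, spilling to disk and merge passes that real Hadoop performs. None of the function names are Hadoop APIs.

```python
from collections import defaultdict
from itertools import groupby

def map_phase(split):
    # Mapper: emit (word, 1) for every word in the input split.
    return [(word, 1) for line in split for word in line.split()]

def combine(pairs):
    # Combiner: a local reduce over one mapper's output, before the shuffle.
    totals = defaultdict(int)
    for k, v in pairs:
        totals[k] += v
    return list(totals.items())

def partition(pairs, num_reducers):
    # Partitioner: hash(key) % R picks the reducer for each key
    # (Python salts str hashes per process, which is fine within one run).
    parts = [[] for _ in range(num_reducers)]
    for k, v in pairs:
        parts[hash(k) % num_reducers].append((k, v))
    return parts

def reduce_phase(pairs):
    # Reducer input is sorted by key so that equal keys are adjacent.
    pairs.sort(key=lambda kv: kv[0])
    return {k: sum(v for _, v in grp) for k, grp in groupby(pairs, key=lambda kv: kv[0])}

def word_count(splits, num_reducers=2):
    reducer_inputs = [[] for _ in range(num_reducers)]
    shuffled = 0  # records that cross the network in the shuffle
    for split in splits:
        combined = combine(map_phase(split))
        shuffled += len(combined)
        for r, part in enumerate(partition(combined, num_reducers)):
            reducer_inputs[r].extend(part)
    result = {}
    for r_input in reducer_inputs:
        result.update(reduce_phase(r_input))
    return result, shuffled

counts, shuffled = word_count([["a b a", "c"], ["b a", "d d"]])
print(sorted(counts.items()), shuffled)  # [('a', 3), ('b', 2), ('c', 1), ('d', 2)] 6
```

Without the combiner, all 8 words would cross the shuffle instead of 6 pre-aggregated records.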

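Q: What is the Secondary NameNode used for in Hadoop?

It periodically merges the fsimage with the edit log (a checkpoint). This keeps the edit log small and speeds up NameNode restarts. It is not a hot standby for the NameNode.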

Yarn

Q: Yarn architecture?

[Figure: Yarn architecture]

Q: What are Yarn's advantages over Hadoop 1.x MapReduce, or why Yarn?

  1. It slims down the JobTracker by splitting its duties between the ResourceManager (resource management) and a per-application ApplicationMaster (job scheduling and monitoring)
  2. Resources are measured in memory (and CPU), which is more reasonable than fixed map/reduce slots
  3. The container abstraction lets one cluster run multiple frameworks, such as Spark

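Q: What are Yarn's three schedulers?

FIFO Scheduler, Capacity Scheduler, and Fair Scheduler. The three diagrams usually used to compare them are good, but I don't think they go deep enough… look further if you're interested.

[Figure: the three Yarn schedulers]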

Q: What is delay scheduling in Yarn?

Sometimes a task cannot be placed on a node that holds its data. In that case the scheduler waits a while (skipping some scheduling opportunities) and tries again to place it locally. If that keeps failing, it schedules the task on another machine. The reason is that data-local scheduling is the most efficient.
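
Here is a simplified model of the idea in Python; it is not Yarn's actual scheduler code. Each time a node with free resources is offered, the task accepts it if the node holds its data. Otherwise it declines and counts a skipped opportunity. After max_skips declines it accepts any node. The node names and the max_skips parameter are hypothetical.

```python
def delay_schedule(preferred_nodes, offers, max_skips):
    """Return (node, locality) for one task, given a sequence of node offers.

    preferred_nodes: nodes holding the task's input data (hypothetical input).
    offers: nodes with free resources, in the order the scheduler sees them.
    max_skips: non-local offers to decline before giving up on locality.
    """
    skips = 0
    for node in offers:
        if node in preferred_nodes:
            return node, "local"
        if skips >= max_skips:
            # Waited long enough: accept a non-local node rather than starve.
            return node, "non-local"
        skips += 1  # decline this offer and wait for the next one
    return None, "unscheduled"

print(delay_schedule({"n3"}, ["n1", "n2", "n3"], max_skips=2))  # ('n3', 'local')
print(delay_schedule({"n9"}, ["n1", "n2", "n3"], max_skips=2))  # ('n3', 'non-local')
```

A larger max_skips trades scheduling latency for a better chance of locality.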

Spark

Q: How many deployment modes does Spark have?

  1. local
  2. standalone
  3. yarn
  4. mesos

Q: What is the basic architecture of Standalone mode?

  • Client: submits jobs
  • Master: receives jobs submitted by clients and manages the workers
  • Worker: manages the resources on its node and periodically reports usage to the master
  • Driver: contains the DAGScheduler and TaskScheduler. Whether it runs on the client or on a worker depends on the deploy mode (client or cluster)
  • Executor: runs on a worker and actually executes the tasks

Q: Which is more efficient, groupByKey or reduceByKey?

  • reduceByKey is more efficient. It applies the merge function within each partition on the map side before the shuffle, so the data (keys and values) is more compact and less of it is shuffled
  • groupByKey shuffles every record and keeps all values for each key, which costs more network traffic and memory
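
Here is a plain-Python model of the difference in shuffle volume. In PySpark the two forms would be rdd.reduceByKey(operator.add) versus rdd.groupByKey().mapValues(sum). The partitions below are made-up data. The functions only count how many records would cross the shuffle; they are not Spark's implementation.

```python
import operator
from collections import defaultdict

# Two hypothetical map-side partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]

def group_by_key_then_sum(parts):
    """groupByKey-style: every record crosses the shuffle, summing happens after."""
    shuffled = [kv for part in parts for kv in part]
    groups = defaultdict(list)
    for k, v in shuffled:
        groups[k].append(v)
    return {k: sum(vs) for k, vs in groups.items()}, len(shuffled)

def reduce_by_key(parts, func):
    """reduceByKey-style: merge inside each partition first (map-side combine),
    so at most one record per key per partition crosses the shuffle."""
    shuffled = []
    for part in parts:
        local = {}
        for k, v in part:
            local[k] = func(local[k], v) if k in local else v
        shuffled.extend(local.items())
    result = {}
    for k, v in shuffled:
        result[k] = func(result[k], v) if k in result else v
    return result, len(shuffled)

print(group_by_key_then_sum(partitions))        # ({'a': 4, 'b': 3}, 7)
print(reduce_by_key(partitions, operator.add))  # ({'a': 4, 'b': 3}, 4)
```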

Q: What is data skew? How do you deal with it?

A must-ask question, and it can go deep…

Definition: during a shuffle, a few keys have so many values that they pile up in a single reduce task. That task becomes slow or crashes with an out-of-memory error.

Solutions:

  • Use a more powerful machine with more memory: this avoids running out of memory, but it treats the symptom rather than the cause and will not satisfy your interviewer
  • Change the parallelism: this may spread several heavy keys across different tasks, but it does not work when the data is concentrated on a few keys, or a single key
  • Add random numbers and aggregate twice: in the first aggregation, use rand_key (a random number, an underscore, then the original key) as the new key. In the second, strip the random prefix. The values of each original key are thus first partially aggregated across several partitions and then combined. Interviewers generally expect you to get this far
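
Here is a single-process sketch of the two-stage salted aggregation in Python. The record counts, the number of salts and the "salt_key" string format are illustrative assumptions. The per-salted-key record counts stand in for the load each stage-1 task would receive.

```python
import random
from collections import Counter, defaultdict

def salted_sum(records, num_salts, rng):
    """Sum values per key in two stages so that a hot key is split num_salts ways."""
    stage1_sums = defaultdict(int)
    stage1_load = Counter()  # records per salted key ~ work for one stage-1 task
    for key, value in records:
        # Stage 1: prefix a random salt, e.g. "3_hot", and aggregate partially.
        salted = f"{rng.randrange(num_salts)}_{key}"
        stage1_sums[salted] += value
        stage1_load[salted] += 1
    # Stage 2: strip the salt and combine the partial sums per original key.
    totals = defaultdict(int)
    for salted, partial in stage1_sums.items():
        totals[salted.split("_", 1)[1]] += partial
    return dict(totals), stage1_load

records = [("hot", 1)] * 10_000 + [("cold", 1)] * 10  # one heavily skewed key
totals, load = salted_sum(records, num_salts=8, rng=random.Random(0))
print(totals)              # {'hot': 10000, 'cold': 10}
print(max(load.values()))  # roughly 10000 / 8 instead of 10000
```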

I'd welcome pointers from experts here. I think the random-number approach can solve some data skew, but:

  1. Isn't the idea the same as using a combiner? If so, the rand_key trick doesn't seem to add any value
  2. This approach only works where a combiner can be used. How do you handle cases where a combiner cannot be used?

Q: How do you handle a skewed join?

This is related to the data skew above, but not identical.

  • Map-side join: the join scheme mentioned in the Hadoop section
  • Salt and expand the table: for the skewed keys, map each record of the smaller table to range_key, where range takes every value in [0, n-1], so each small-table record becomes n records with different keys. Map each record of the larger table to single_key, where single = random(n), so each large-table record becomes one record with a random key. Then join on the new keys
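
Here is a sketch of the salt-and-expand join in Python, assuming n salts and string keys. The function name and data are hypothetical. For simplicity every key is salted here; in practice you would salt only the skewed keys and join the rest normally. Each large-table record carries one salt and the small table holds a copy for every salt, so each pair still matches exactly once.

```python
import random

def salted_join(big, small, n, rng):
    """Join on key after spreading every key over n salted sub-keys."""
    # Small table: copy each record n times, once per salt 0..n-1 (range_key).
    expanded_small = [(f"{i}_{k}", v) for k, v in small for i in range(n)]
    # Large table: give each record one random salt in [0, n) (single_key).
    salted_big = [(f"{rng.randrange(n)}_{k}", v) for k, v in big]
    # Ordinary hash join on the salted keys, then strip the salt again.
    index = {}
    for sk, v in expanded_small:
        index.setdefault(sk, []).append(v)
    return [
        (sk.split("_", 1)[1], bv, sv)
        for sk, bv in salted_big
        for sv in index.get(sk, [])
    ]

big = [("hot", i) for i in range(1000)] + [("cold", -1)]  # "hot" is skewed
small = [("hot", "H"), ("cold", "C")]
rows = salted_join(big, small, n=4, rng=random.Random(7))
print(len(rows))  # 1001: every big-table row matches exactly once
```

The price is that the small table grows n times, so n should be just large enough to break up the hot key.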

Q: Basic concepts?

These test how well you know Spark:

  • RDD
  • DAG
  • Stage
  • Wide and narrow dependencies
  • Parallelism

Q: List some transformations and actions.

  • Transformations: filter, map, flatMap, reduceByKey, groupByKey
  • Actions: take, collect, count, foreach

Spark Streaming

I usually talk Spark Streaming with interviewers; this part is for those who need it.

Q: How does Spark Streaming work?

The input stream is divided into mini-batches, so it is essentially continuous processing of small batches of data. The core is submitting a job on schedule at every batch interval. Window operations, a concept specific to stream processing, get some special handling on top of this, which I won't expand on here.

Q: How is data received?

I usually discuss this together with Kafka. There are two ways to receive data from Kafka:

  • Receiver-based: the data is pulled into Spark and must be backed up (for example with a write-ahead log) to guarantee data integrity
  • Direct (based on Kafka's lower-level API): Kafka itself guarantees data integrity, and Spark Streaming only calculates the offset ranges to pull in each batch

Q: How is receiver-based data receiving implemented?

The interviewer wants to know how familiar you are with the Spark Streaming source code and whether you have really studied the receiving process in depth.

  1. Data received by the receiver is first put into a buffer
  2. Two timers then process it:
    • one periodically packages the data in the buffer into blocks
    • the other periodically pushes the blocks to the BlockManager for storage, ensuring data integrity

Q: How do you achieve exactly-once?

I think this is a big question. It is not about using a few components but about coordinating and organizing the whole system. I mainly discuss it from three angles:

  1. Data source: the source must be replayable, so that lost data can always be fetched again. This requires a reliable message queue such as Kafka
  2. Processing framework: the framework must track offsets itself, so that after a failure it knows where to resume. Since the upstream guarantees data integrity, it can use the Direct approach to pull data by offset
  3. Output: the output operation must be idempotent

That is as far as my understanding goes, and it feels shallow; experts are welcome to add more…
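
Here is a toy illustration of points 2 and 3 in Python. The job reads by offset from a replayable log, which is a plain list standing in for one Kafka partition. It writes results to a sink keyed by source offset with INSERT OR REPLACE, and it commits its offset only after the write. If it crashes between the write and the offset commit, the replay rewrites the same rows instead of duplicating them. sqlite3 is only a stand-in for a real sink, and all names are hypothetical.

```python
import sqlite3

LOG = ["a", "b", "c", "d", "e"]  # hypothetical contents of one Kafka partition

def run(conn, committed_offset, batch_size, crash_before_commit=False):
    """Process LOG from committed_offset and return the new committed offset."""
    while committed_offset < len(LOG):
        end = min(committed_offset + batch_size, len(LOG))
        batch = [(off, LOG[off].upper()) for off in range(committed_offset, end)]
        with conn:  # idempotent write: the primary key is the source offset
            conn.executemany("INSERT OR REPLACE INTO out VALUES (?, ?)", batch)
        if crash_before_commit:
            return committed_offset  # output written, but the offset is never saved
        committed_offset = end       # "commit" the offset only after the write
    return committed_offset

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE out (src_offset INTEGER PRIMARY KEY, value TEXT)")
saved = run(conn, 0, batch_size=2, crash_before_commit=True)  # crash: saved == 0
saved = run(conn, saved, batch_size=2)                        # replay from offset 0
rows = conn.execute("SELECT * FROM out ORDER BY src_offset").fetchall()
print(saved, rows)  # 5 [(0, 'A'), (1, 'B'), (2, 'C'), (3, 'D'), (4, 'E')]
```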

Kafka

Q: Basic architecture?

  1. Producer
  2. Consumer
  3. Broker
  4. Topic
  5. Partition
  6. Leader
  7. Follower
  8. Consumer Group
  9. Offset

Being able to tie these concepts together is basically enough.

Q: How would you explain Kafka's ISR (in-sync replica) mechanism?

Followers fetch messages from the leader. Once every replica in the ISR has a message, the leader considers it committed. The producer's acks=all request is then acknowledged, and consumers can read the message.

The leader plus the followers that keep up with it form the in-sync replica set (ISR). This set changes dynamically: a follower that falls too far behind is removed from it, so one slow replica cannot hold up responses to clients.

To prevent data loss, you can set the minimum number of replicas the set must contain (min.insync.replicas). If the ISR shrinks below this number, the partition rejects writes that require acks=all.
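
Here is a toy Python model of the rules above, assuming acks=all. It shows three behaviors:

  • The high watermark (the committed point) is the smallest log end offset among ISR members.
  • Removing a lagging follower from the ISR lets the high watermark advance.
  • Writes are rejected once the ISR is smaller than min_insync.

The class and method names are made up; this is not Kafka's implementation.

```python
class ToyPartition:
    """Toy model of one Kafka partition's ISR bookkeeping (acks=all assumed)."""

    def __init__(self, replicas, min_insync):
        self.leader = replicas[0]
        self.leo = {r: 0 for r in replicas}  # log end offset per replica
        self.isr = set(replicas)             # in-sync replicas, leader included
        self.min_insync = min_insync
        self.high_watermark = 0              # offsets below this are committed

    def produce(self):
        # With acks=all, a too-small ISR means the write is refused up front.
        if len(self.isr) < self.min_insync:
            raise RuntimeError("not enough in-sync replicas")
        self.leo[self.leader] += 1
        self._advance_hw()
        return self.leo[self.leader] - 1     # offset assigned to the message

    def fetch(self, follower):
        # The follower pulls from the leader and catches up to its log end.
        self.leo[follower] = self.leo[self.leader]
        self._advance_hw()

    def drop_from_isr(self, follower):
        # A follower that lags too long is removed from the ISR.
        self.isr.discard(follower)
        self._advance_hw()

    def _advance_hw(self):
        # Committed means present on every replica currently in the ISR.
        self.high_watermark = min(self.leo[r] for r in self.isr)

p = ToyPartition(["b0", "b1", "b2"], min_insync=2)
p.produce()              # offset 0 is on the leader only
p.fetch("b1")            # b2 has not fetched yet, so nothing is committed
print(p.high_watermark)  # 0
p.drop_from_isr("b2")    # b2 lagged too long; ISR = {b0, b1}
print(p.high_watermark)  # 1
```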


HBase

Q: Describe HBase's principles and design.

There is a great article on this.

After reading it, review these basic concepts:

  • Master
  • RegionServer
  • Region
  • MemStore
  • HFile
  • HLog (write-ahead log)

There is actually also an ML chapter I didn't write, since the questions I got there weren't deep. Fill it in yourself if you need it (ฅ´ω`ฅ)

The little sister says: if this helped, please give it a like ╰(~ ▽)╮