Translation | Wang Ke coagulation


Product | AI technology base (public ID: rgznai100)

Since the beginning of this year, the author has been working in India in the fields of data science, machine learning, and deep learning. During a 34-day job search, he interviewed with 8 to 10 companies, including startups, service-based companies, and product-based companies. He wrote this article in the hope that his interview experience will give job seekers some useful information. We hope you gain something from reading it!

First of all, let me introduce myself:

I have more than 4 years of experience in machine learning (speech analysis, text analysis and image analysis). In general, I think most job positions in this field mainly involve text analysis (natural language processing) and image analysis (computer vision). Very few companies are hiring for voice or audio analytics. My goal now is to apply for a mid-to-senior position where I can lead a deep learning or machine learning team on some interesting projects.

Here are some of the questions I was asked during the interview process. I hope you find them helpful.

▌ Company 1: A company with global services (interview duration: 20-25 minutes)

  1. You mentioned in your resume that you built a document mining system. What did you do? Can document clustering be implemented with LDA from topic modeling?

  2. Suppose you have hundreds of megabytes of data files, including PDFs, text files, images, scanned PDFs, and so on.

  3. How do you read the contents of scanned PDF files or written documents in image format?

  4. Why is Naive Bayes called naive?

  5. Please tell us more about the naive Bayes classifier.

  6. What is deep learning? What is the difference between deep learning and machine learning?

☞ The interviewer also asked me some other questions, but I was confused and had no idea what answers he wanted to hear. I kept trying to discuss technical details, such as training Tesseract or a language model, but he didn’t seem interested. Maybe he just wanted to hear about finished work, a good explanation, or a better solution. I felt they made no distinction between interviewing a newcomer and interviewing an experienced professional.
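
On questions 4 and 5 above, the usual textbook answer can be written out as a short formula. This is only a sketch: the symbols (class label y, features x_1 ... x_n) are illustrative assumptions and do not come from the interview. The classifier is called "naive" because it assumes the features are conditionally independent given the class, which lets the joint likelihood factor into a product.

```latex
% Naive assumption: each feature depends only on the class y
P(x_i \mid y, x_1, \dots, x_{i-1}) = P(x_i \mid y)

% Resulting posterior (up to a normalizing constant) and decision rule
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} = \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```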

▌ Company 2: A company with global services (interview duration: 40-45 minutes)

  1. How do you cluster documents with unsupervised learning?

  2. How do you find documents related to a given query or search?

  3. Explain TF-IDF.

  4. In my experience, TF-IDF does not work well for document classification or clustering. How would you improve it?

  5. What is a long short-term memory (LSTM) neural network? Explain how it works.

  6. What is the word2vec model?

  7. Explain mutable and immutable objects in Python.

  8. What data structures have you used in Python?

☞ The whole interview revolved around text similarity questions, and I answered them all. But again, there was no deeper technical discussion. Maybe the company had a few small projects in text analysis. In the end, I got an offer from the company.
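
Questions 2 and 3 above (finding related documents, TF-IDF) can be illustrated with a small standard-library sketch. It assumes whitespace tokenization, term frequency normalized by document length, and an unsmoothed idf of log(N / df). The function names and toy documents are made up for illustration; real libraries use slightly different weighting variants.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            # tf = count / document length, idf = log(N / df)
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (hypothetical): two similar sentences and one unrelated one.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "stock prices fell sharply today".split(),
]
vecs = tf_idf_vectors(docs)
sim_01 = cosine(vecs[0], vecs[1])  # documents sharing several terms
sim_02 = cosine(vecs[0], vecs[2])  # documents sharing no terms
```

Ranking documents by cosine similarity to a query vector built the same way is one simple way to answer question 2. Terms that appear in every document get an idf of zero under this weighting, which is one of the weaknesses question 4 hints at.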

▌ Company 3: A global product- and service-based company (interview duration: 40 minutes)

  1. How do you handle multi-class classification on an imbalanced dataset?

  2. How do you identify the language of a text sentence?

  3. How do you represent pictographic characters, as in Chinese or Japanese?

  4. How would you design a chatbot? (I had no idea, but I tried to answer with intents and responses based on TF-IDF similarity.)

  5. Is it possible to design a chatbot using a recurrent neural network (RNN) to respond to input questions with an intent and an answer?

  6. Suppose you built a chatbot on a Reddit dataset using an RNN or a long short-term memory (LSTM) network, and it returns 10 possible responses. How do you choose the best response or discard the others?

  7. Explain how support vector machines (SVM) learn nonlinear boundaries.

☞ There were a few other questions I don’t remember, but this was the first interview where I got into technical detail, and it led to an offer from the company.

▌ Company 4: A one-year-old medical startup (interview duration: 50 minutes)

  1. What are precision and recall? Which do you think is more important in medical diagnosis?

  2. Define precision and recall.

  3. How do you draw a receiver operating characteristic (ROC) curve? What is the area under the ROC curve?

  4. How do you draw ROC curves for a multi-class classification task?

  5. List other metrics for multi-class classification tasks.

  6. What are sensitivity and specificity?

  7. What does “random” mean in a random forest?

  8. How do you classify text?

  9. How can you be sure you have learned the features of a text? Is that impossible without TF-IDF? (I replied that I would use n-grams (n = 1, 2, 3, 4) and apply TF-IDF to create a long count vector.)

  10. What else can you do with machine learning? (I suggested combining an LSTM network with word2vec, or a one-dimensional convolutional neural network with word2vec, for classification. But the interviewer wanted improvements to classical machine learning algorithms.)

  11. How does a neural network learn nonlinear shapes when it consists of linear nodes? Why does it learn nonlinear boundaries?

☞ There were a few more good questions that I can’t remember. The interview went well, but we didn’t see eye to eye on some issues. During the interview I also learned that, being a startup, the company had only 2-3 people working on ML, DL, and DS. I didn’t get through this interview.
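
Questions 1, 2, and 6 above all come down to the four cells of a binary confusion matrix. The sketch below assumes labels encoded as 1 (positive, for example "disease present") and 0 (negative). The function name and sample labels are invented for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Compute common binary classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    return {
        # Of everything predicted positive, how much was truly positive?
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Of everything truly positive, how much did we find?
        # Recall is the same quantity as sensitivity.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Of everything truly negative, how much did we correctly reject?
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


# Hypothetical example: 4 sick patients, 6 healthy ones.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```

In a diagnostic setting a false negative (a missed patient) is often costlier than a false positive, which is why recall is frequently given more weight there. That is a common argument rather than a universal rule.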

▌ Company 5: Amazon (Interview duration: 50-55min)

  1. When you train a decision tree, what are its parameters?

  2. What are the criteria for splitting at a node of a decision tree?

  3. What is the formula for the Gini index?

  4. What’s the formula for entropy?

  5. How does a decision tree decide at which feature it must split?

  6. How do you calculate information gain mathematically?

  7. Describe the advantages of random forests.

  8. Briefly describe boosting algorithms.

  9. How does gradient boosting work?

  10. Briefly describe how the AdaBoost algorithm works.

  11. What kernels are used in SVM? What are the optimization techniques of SVM?

  12. How does an SVM learn the hyperplane? Discuss the mathematical details.

  13. What about unsupervised learning? What are the algorithms?

  14. How do you choose the value of K in the k-means clustering algorithm?

  15. List at least 3 methods for choosing K in the k-means clustering algorithm.

  16. What other clustering algorithms do you know?

  17. Describe the DBSCAN algorithm.

  18. Briefly describe how hierarchical agglomerative clustering works.

  19. Explain the principal component analysis algorithm (PCA) and briefly describe the mathematical steps of using PCA.

  20. What are the disadvantages of using PCA?

  21. How do convolutional neural networks work? Give the implementation details.

  22. Explain backpropagation in convolutional neural networks.

  23. How do you deploy machine learning models?

  24. Most of the time we have to build a machine learning model from scratch in C++. Can you do that?

☞ I was interviewing for a Level 6 position at Amazon. Their main focus was on algorithms and the mathematics behind them. However, I had not prepared the mathematics. I just talked about what I knew and did not go into the mathematical details, so the interviewer decided I was not suitable for a Level 6 role. I believe that if you can remember the general mathematical formulation of machine learning algorithms, you can easily pass the Amazon technical interview.

▌ Company 6: A global service giant (interview duration: 50-55min)
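
Questions 3 to 6 above ask for the split criteria in formula form. A minimal sketch using the standard textbook definitions: Gini impurity is 1 minus the sum of squared class proportions, entropy is the negative sum of p * log2(p), and information gain is the parent entropy minus the size-weighted entropy of the children. The function names and toy labels are hypothetical.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Toy node with a 50/50 class mix, and one candidate split of it.
parent = ["yes"] * 5 + ["no"] * 5
children = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

parent_gini = gini(parent)
parent_entropy = entropy(parent)
gain = information_gain(parent, children)
```

A tree builder evaluates candidate splits like this one for each feature and keeps the split with the highest gain (or the lowest weighted impurity), which is the usual answer to question 5.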

  1. What is the range of the sigmoid function?

  2. Name the scikit-learn package that implements logistic regression.

  3. What are the mean and variance of the standard normal distribution?

  4. What data structures do you use in Python?

  5. What are the methods of text classification? How would you classify?

  6. Explain TF-IDF and its disadvantages. How would you overcome them?

  7. What are bigrams and trigrams? Explain TF-IDF for bigrams and trigrams using a text sentence.

  8. Give examples of applications of word2vec.

  9. How do you design a neural network? How do you achieve “depth”? This is a fundamental neural network question.

  10. Describe the working principle of LSTM. How does it remember text?

  11. What is a Naive Bayes classifier?

  12. What’s the probability of getting 4 heads out of 10 flips?

  13. How do I get the index of an element in a Python list?

  14. How do you merge two pandas datasets?

  15. From the perspective of user behavior, you need to model fraudulent activity. How would you solve this problem? It could be framed as an anomaly detection problem or a classification problem!

  16. Which do you prefer, decision trees or random forests?

  17. What’s the difference between logistic regression and random forest?

  18. Would you use a decision tree or a random forest to solve a classification problem? What are the advantages of random forests?

☞ I received an offer from this company too. I actually enjoyed the technical exchange. You might think these are the most basic questions in machine learning and data science, but I got the feeling that the interviewer was either not from this field or didn’t know much about recent developments in it.

▌ Company 7: A global business management company (interview duration: 25-30min)
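
Question 12 above is a direct application of the binomial distribution. The sketch below assumes a fair coin and independent flips, which the question does not state explicitly. The function name is made up for illustration, and math.comb needs Python 3.8 or later.

```python
from math import comb


def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exactly 4 heads in 10 flips of a fair coin: C(10, 4) / 2^10 = 210 / 1024.
prob_4_heads = binomial_pmf(4, 10)
print(prob_4_heads)  # 0.205078125
```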

  1. On an imbalanced dataset, which model would you choose: random forest or boosting? Why?

  2. What do you know about boosting?

  3. Which model would you choose for a supervised classification problem? Let’s say there are 40-50 classes!

  4. How do you use ensemble techniques?

  5. Briefly describe how a support vector machine (SVM) works.

  6. What is a kernel? Explain briefly.

  7. How do you perform nonlinear regression?

  8. What are Lasso regression and Ridge regression?

☞ To tell you the truth, the interview was a bit dull, so I didn’t take it too seriously. But the questions were good. The position I was interviewing for was to lead a team of 15 or 16 people on a project, and the technical round was followed by manager and HR interviews. In the end they offered me a consultant position with a good salary.

▌ Company 8: A four-year-old product- and service-based company (interview duration: 60 minutes)
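
For question 8 above, the two methods are easiest to tell apart by their penalty terms. The notation below (design matrix X, targets y, coefficients β, regularization strength λ) is an assumed textbook convention and does not come from the interview.

```latex
% Ridge regression: squared L2 penalty, shrinks coefficients toward zero
\hat{\beta}_{\text{ridge}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2

% Lasso regression: L1 penalty, can set some coefficients exactly to zero
\hat{\beta}_{\text{lasso}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1
```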

  1. You mentioned in your resume that you have worked on speaker recognition. What was your specific approach?

  2. What are Mel-frequency cepstral coefficients (MFCCs)?

  3. What is the Gaussian mixture model and how does it accomplish clustering?

  4. How does expectation-maximization (EM) work? Describe the implementation steps.

  5. How are probabilities calculated in a GMM?

  6. How do you perform MAP adaptation for GMM-UBM when doing speaker recognition?

  7. Talk about the i-vector technique you use.

  8. What is factor analysis in this context?

  9. What is the difference between JFA and i-vector? Why choose i-vector over JFA?

  10. Have you ever used PLDA with i-vectors?

  11. Have you read Baidu’s Deep Speaker paper?

  12. If you had two models to choose from, what would you base your choice on? (Examine techniques for model selection)

  13. Briefly describe the mathematical principles of the Bayesian information criterion (BIC) and the Akaike information criterion (AIC).

  14. How do BIC and AIC work?

  15. What happens if data is missing from the MFCC feature vector matrix?

  16. How do you do speech recognition? What features do you use?

  17. Is your classifier for speech and music, or for speech and non-speech?

  18. How are deep neural networks applied in speech analysis?

☞ Yes, you might wonder what kind of questions these are. Coincidentally, the interviewer and I both worked in speech analysis (specifically speaker recognition), so the whole interview was about speech analysis. The interviewer was clearly an expert and gave me positive feedback. Later, the company offered me a job as an AI solution architect.
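
Questions 12 to 14 above concern model selection with information criteria. A minimal sketch using the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of free parameters, n the number of samples, and L the maximized likelihood. The two candidate models and all their numbers are invented purely for illustration.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion: k ln(n) - 2 ln(L). Lower is better."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


# Hypothetical candidates fitted on the same 500 samples:
# a small model, and a larger one with a slightly better likelihood.
n_samples = 500
small = {"log_likelihood": -1200.0, "n_params": 10}
large = {"log_likelihood": -1190.0, "n_params": 30}

aic_small = aic(small["log_likelihood"], small["n_params"])
aic_large = aic(large["log_likelihood"], large["n_params"])
bic_small = bic(small["log_likelihood"], small["n_params"], n_samples)
bic_large = bic(large["log_likelihood"], large["n_params"], n_samples)
```

With these made-up numbers both criteria prefer the smaller model, because the gain in likelihood does not pay for 20 extra parameters. BIC penalizes parameters more heavily than AIC whenever ln(n) exceeds 2, that is, for n of 8 or more.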


Some advice

I’ve spoken to probably 25-30 professionals over the course of this job search. Here’s my advice for both readers and job seekers:

  • Resumes are important. Be sure to include projects you’ve worked on, Kaggle competitions, MOOC certifications, or papers. I got an interview call from Amazon without any referral. Your resume is what impresses HR and the interviewer.

  • Confidence and enthusiasm are half the battle. Go into an interview confident and show your enthusiasm (this is especially important when interviewing startups and service-based companies).

  • Don’t rush to answer the interviewer’s questions. Take some time to organize your answer before you give it, and be sure to ask the interviewer if you don’t understand a question. Also, stay calm during the interview!

  • Be sure to present yourself appropriately when explaining concepts. Mention a few projects you have completed, and make sure you are familiar with the skills and projects listed in your resume.

  • In most cases, interviewers are looking for technical people with experience in the field. If you’re new to the field, start with projects you’ve worked on when creating your resume. Your GitHub account is also very convincing. In addition, take part in Kaggle competitions and MOOC courses.

  • Be humble and pay attention to what the interviewer has to say, or you will be rejected. Sometimes R users and Python users look down on each other; you’d better stay out of that argument, or you’ll be rejected. I personally believe that both R and Python are just tools for implementing logic and concepts.

Finally, I wish you a successful interview!

Original link:

https://appliedmachinelearning.wordpress.com/2018/04/13/my-data-science-machine-learning-job-interview-experience-list-of-ds-ml-dl-questions/