Red Stone’s personal website: Redstonewill.com

Previous posts in this series:

Machine Learning Written Test Questions (1)

Machine Learning Written Test Questions (2)

Machine learning is a discipline that is both theoretical and practical. When applying for machine-learning-related jobs, we often run into all kinds of machine learning questions and knowledge points. To help everyone organize and understand these knowledge points, and to better prepare for machine learning written tests and interviews, Red Stone plans to serialize a series of written-test question articles on this public account. I hope they are helpful to everyone!

Q1. Regarding "regression" and "correlation", which of the following statements is true? Note: x is the independent variable and y is the dependent variable.

A. Both regression and correlation are symmetric between x and y

B. Both regression and correlation are asymmetric between x and y

C. Regression is asymmetric between x and y, while correlation is symmetric between x and y

D. Regression is symmetric between x and y, while correlation is asymmetric between x and y

Answer: C

Analysis: Correlation measures the degree of linear correlation between two variables. In other words, the correlation between x and y is the same as the correlation between y and x; there is no difference between the two directions.

Regression, in contrast, uses the feature x to predict the output y, which is one-way and therefore asymmetric.

Q2. If we only know the mean and the median of a variable, can we calculate its skewness?

A. Yes

B. No

Answer: B

Analysis: Skewness measures the direction and degree of asymmetry of a statistical distribution. It is defined using the third-order standardized moment:

$$\text{Skewness} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s}\right)^{3}$$

where $n$ is the number of samples, $\bar{x}$ is the arithmetic mean, and $s$ is the standard deviation. The frequency distribution of statistical data is either symmetric or asymmetric, i.e., skewed. When the skewness is positive, the distribution is positively (right) skewed, and the mode lies to the left of the arithmetic mean; when the skewness is negative, the distribution is negatively (left) skewed, and the mode lies to the right of the arithmetic mean.

We can use the relative positions of the mode, median, and arithmetic mean to tell whether a distribution is left-skewed or right-skewed, but to measure how skewed it is we need the third-order moment above; the mean and median alone are not enough, so the answer is B.
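
As an illustration, here is a minimal sketch that computes sample skewness directly from the third-moment definition above (scipy.stats.skew gives the same result with its default settings):

import numpy as np

def skewness(x):
    # Third standardized moment, matching the formula above (ddof=0)
    x = np.asarray(x, dtype=float)
    return np.mean(((x - x.mean()) / x.std()) ** 3)

data = [1, 2, 2, 3, 3, 3, 4, 10]   # a small right-skewed sample
print(skewness(data))              # positive -> right-skewed (mode left of mean)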

Q3. Suppose there are n data sets. In each data set, the mean of x is 9, the variance of x is 11, the mean of y is 7.50, the correlation coefficient between x and y is 0.816, and the fitted linear regression equation is y = 3.00 + 0.500*x. Are these n data sets the same?

A. The same

B. Different

C. Uncertain

Answer: C

Analysis: Not necessarily the same; this is the famous Anscombe's Quartet. In 1973, the statistician F.J. Anscombe constructed four curious data sets. In all four, the mean of x is 9.0 and the mean of y is 7.5; the variance of x is 10.0 and the variance of y is 3.75; each has a correlation coefficient of 0.816 and a fitted regression line of y = 3 + 0.5x. Judging by these statistics alone, the four data sets look very similar. In fact, they are very different, as the chart below shows:

The corresponding Python code is:

import seaborn as sns

sns.set(style="ticks")

# Load the example dataset for Anscombe's quartet
df = sns.load_dataset("anscombe")

# Show the results of a linear regression within each dataset
# (note: 'size' was renamed to 'height' in newer seaborn versions)
sns.lmplot(x="x", y="y", col="dataset", hue="dataset", data=df,
           col_wrap=2, ci=None, palette="muted", height=4,
           scatter_kws={"s": 50, "alpha": 1})

Q4. How does the number of observed samples affect overfitting (multiple choice)? Note: all other parameters remain the same.

A. With few observations, overfitting is likely to occur

B. With few observations, overfitting is unlikely to occur

C. With many observations, overfitting is likely to occur

D. With many observations, overfitting is unlikely to occur

Answer: AD

Analysis: If the number of observations is small, it is easy to fit all the sample points very well by increasing the model complexity (for example, the polynomial order), which results in overfitting. If the number of observations is large, the samples are more representative; overfitting is then unlikely even if the model is complex, and the resulting model reflects the true data distribution more faithfully.
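
A minimal sketch of this effect on hypothetical synthetic data: fit the same fixed-capacity degree-9 polynomial to a small and a large sample drawn from the same noisy line, and compare training error with error on fresh test data.

import numpy as np

rng = np.random.default_rng(0)

def fit_and_eval(n_train, degree=9):
    # Ground truth is a noisy line; the model is a fixed-capacity polynomial
    x_train = rng.uniform(0, 1, n_train)
    y_train = 2 * x_train + rng.normal(0, 0.3, n_train)
    coefs = np.polyfit(x_train, y_train, degree)
    x_test = rng.uniform(0, 1, 1000)
    y_test = 2 * x_test + rng.normal(0, 0.3, 1000)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_mse, test_mse

print(fit_and_eval(n_train=12))     # tiny train error, much larger test error: overfitting
print(fit_and_eval(n_train=1000))   # train and test errors are close: little overfitting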

Q5. Suppose a fairly complex regression model is fit to the sample data, and Ridge regression is used, tuning the regularization parameter λ to reduce the model complexity. If λ is large, which of the following statements about bias and variance is true?

A. If λ is large, bias and variance both decrease

B. If λ is large, bias decreases and variance increases

C. If λ is large, bias increases and variance decreases

D. If λ is large, bias and variance both increase

Answer: C

Analysis: A large λ means low model complexity, which tends to underfit; correspondingly, bias increases and variance decreases. A quick summary (a small sketch follows the list):

  • Small λ: bias decreases, variance increases; prone to overfitting

  • Large λ: bias increases, variance decreases; prone to underfitting
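
A minimal sketch of this trade-off, using scikit-learn's Ridge on hypothetical synthetic data (alpha plays the role of λ here):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (60, 1))
y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.2, 60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for alpha in [1e-6, 1e-2, 1e2]:    # small, moderate, and large lambda
    model = make_pipeline(PolynomialFeatures(degree=10), Ridge(alpha=alpha))
    model.fit(X_tr, y_tr)
    print(alpha,
          mean_squared_error(y_tr, model.predict(X_tr)),   # train error
          mean_squared_error(y_te, model.predict(X_te)))   # test error

With a tiny alpha the training error is far below the test error (high variance, overfitting); with a very large alpha both errors are high (high bias, underfitting).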

Q6. Suppose a fairly complex regression model is fit to the sample data, and Ridge regression is used, tuning the regularization parameter λ to reduce the model complexity. If λ is small, which of the following statements about bias and variance is true?

A. If λ is small, bias and variance both decrease

B. If λ is small, bias decreases and variance increases

C. If λ is small, bias increases and variance decreases

D. If λ is small, bias and variance both increase

Answer: B

Analysis: see the summary in Q5.

Q7. Which of the following statements about Ridge regression is true (multiple choice)?

A. If λ=0, it is equivalent to ordinary linear regression

B. If λ=0, it is not equivalent to ordinary linear regression

C. If λ=+∞, the weight coefficients are very small, close to zero

D. If λ=+∞, the weight coefficients are very large, close to infinity

Answer: AC

Analysis: If λ=0, there is no regularization term, which is equivalent to ordinary linear regression, and the coefficients can be solved with the least squares method. If λ=+∞, the regularization term's "penalty" on the weight coefficients is very large, so the corresponding weight coefficients become very small, close to zero.
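
A quick sketch checking both claims with scikit-learn on hypothetical random data:

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.7]) + rng.normal(0, 0.1, 100)

print(LinearRegression().fit(X, y).coef_)   # ordinary least squares
print(Ridge(alpha=0).fit(X, y).coef_)       # lambda = 0: identical to OLS
print(Ridge(alpha=1e8).fit(X, y).coef_)     # lambda -> +inf: weights shrink toward 0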

For a graphical explanation of regularization, please refer to my article:

Intuitive explanation of L1 and L2 regularization in machine learning

Q8. Of the three residual plots shown below, which one indicates a worse model compared with the others?

Note:

1. All residuals have been standardized

2. In each plot, the horizontal axis is the predicted value and the vertical axis is the residual

A. 1

B. 2

C. 3

D. None of the above

Answer: C

Analysis: There should not be any functional relationship between the predicted values and the residuals; if there is, the model does not fit well. Accordingly, when the horizontal axis is the predicted value and the vertical axis is the residual, the residuals should appear randomly distributed, independent of the predicted value. In plot 3, however, the relationship between the residuals and the predicted values is quadratic, indicating that the model is not ideal.
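
A minimal sketch with hypothetical data showing the kind of quadratic residual pattern described above: fitting a straight line to truly quadratic data leaves a clear U-shape in the residuals.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, 200)
y = x ** 2 + rng.normal(0, 0.5, 200)       # the true relationship is quadratic

slope, intercept = np.polyfit(x, y, 1)     # misspecified model: a straight line
y_pred = intercept + slope * x
resid = y - y_pred
resid = resid / resid.std()                # standardized residuals

plt.scatter(y_pred, resid, s=10)
plt.axhline(0, color="gray")
plt.xlabel("predicted value")
plt.ylabel("standardized residual")        # the U-shape signals a poor model
plt.show()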

Q9. Which of the following methods does not have a closed-form solution for the coefficients?

A. Ridge regression

B. Lasso

C. Ridge regression and Lasso

D. None of the above

Answer: B

Analysis: Ridge regression is ordinary linear regression plus an L2 regularization term. It has a closed-form solution derived from least squares:

$$\hat{w} = (X^{T}X + \lambda I)^{-1}X^{T}y$$

Lasso regression is ordinary linear regression plus an L1 regularization term:

$$\min_{w}\ \|y - Xw\|_{2}^{2} + \lambda\|w\|_{1}$$

The L1 term is not differentiable at zero, so the solution cannot be written in closed form; it is typically found iteratively, for example by coordinate descent.


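A minimal sketch on hypothetical random data: the ridge coefficients computed from the closed form above match scikit-learn's Ridge, while Lasso is solved iteratively.

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.normal(0, 0.1, 50)
lam = 1.0

# Closed-form ridge solution: w = (X^T X + lambda*I)^{-1} X^T y
w_closed = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
w_ridge = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_
print(np.allclose(w_closed, w_ridge))      # True: the two solutions agree

# Lasso has no closed form; scikit-learn solves it by coordinate descent
w_lasso = Lasso(alpha=0.1, fit_intercept=False).fit(X, y).coef_
print(w_lasso)                             # some coefficients may be exactly zero
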
Q10. Observe the following data set:

If we delete one of the points a, b, c, and d, which point's deletion has the greatest influence on the fitted regression line?

A. a

B. b

C. c

D. d

Answer: D

Analysis: Linear regression is sensitive to outliers in the data. Although point c is also an outlier, it lies close to the regression line and has a small residual, so removing it changes little. Point d is an outlier that lies far from the regression line, so deleting it has the greatest influence on the fitted line.
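
Since the original scatter plot is not reproduced here, a small sketch with hypothetical data illustrates the point: an outlier that is consistent with the trend barely moves the fit, while one far from the line drags the slope strongly.

import numpy as np

rng = np.random.default_rng(0)
x = np.arange(10, dtype=float)
y = 2 * x + rng.normal(0, 0.5, 10)

def slope(xs, ys):
    return np.polyfit(xs, ys, 1)[0]

print(slope(x, y))                                        # baseline slope, about 2
# An outlier close to the line (like point c) barely moves the fit:
print(slope(np.append(x, 20.0), np.append(y, 40.0)))
# An outlier far from the line (like point d) drags the slope strongly:
print(slope(np.append(x, 20.0), np.append(y, 5.0)))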

Q11. In a simple linear regression model (with a single input variable), if you change the input variable by one unit (increase or decrease), how much does the output change?

A. By one unit

B. It stays the same

C. By the intercept

D. By the slope (scale factor) of the regression model

Answer: D

Analysis: For the model y = a + b*x, if x changes by one unit (for example, from x to x+1), then y changes by b units; b is the slope, i.e., the scale factor of the regression model.

Q12. Logistic regression limits the output probability to the range [0, 1]. Which of the following functions accomplishes this?

A. Sigmoid function

B. tanh function

C. ReLU function

D. Leaky ReLU function

Answer: A

Analysis: The Sigmoid function is

$$\sigma(z) = \frac{1}{1+e^{-z}}$$

Its output is limited to values between 0 and 1.

The tanh function,

$$\tanh(z) = \frac{e^{z}-e^{-z}}{e^{z}+e^{-z}}$$

outputs values in (-1, 1).

The ReLU function,

$$f(z) = \max(0, z)$$

outputs values in [0, +∞).

The Leaky ReLU function,

$$f(z) = \begin{cases} z, & z > 0 \\ \lambda z, & z \le 0 \end{cases}$$

where λ is a tunable parameter (for example, λ = 0.01), outputs values in (-∞, +∞). Only the Sigmoid confines the output to [0, 1].
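
A minimal numpy sketch of the four functions, checking their output ranges:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))            # output in (0, 1)

def tanh(z):
    return np.tanh(z)                      # output in (-1, 1)

def relu(z):
    return np.maximum(0, z)                # output in [0, +inf)

def leaky_relu(z, lam=0.01):
    return np.where(z > 0, z, lam * z)     # output in (-inf, +inf)

z = np.linspace(-10, 10, 5)
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(z))                # only sigmoid stays inside [0, 1]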

Q13. In linear regression and logistic regression, which of the following statements about the partial derivative of the loss function with respect to the weight coefficients is correct?

A. They are different

B. They are the same

C. Uncertain

Answer: B

Analysis: The loss function of linear regression (for a single sample) is the squared error:

$$L = \frac{1}{2}(\hat{y} - y)^{2}, \qquad \hat{y} = z = w^{T}x$$

The loss function of logistic regression is the cross-entropy:

$$L = -\big[\,y\log\hat{y} + (1-y)\log(1-\hat{y})\,\big], \qquad \hat{y} = \sigma(z),\ z = w^{T}x$$

The output layer of logistic regression contains the Sigmoid nonlinearity, yet the partial derivative of its loss with respect to the linear output z (the value before the Sigmoid) is the same as that of the linear regression loss with respect to z, namely:

$$\frac{\partial L}{\partial z} = \hat{y} - y$$
The specific derivation process is relatively simple and omitted here.

Since dZ (the derivative ∂L/∂z) is the same for both models, backpropagating one step further gives the same partial derivative with respect to the weight coefficients: ∂L/∂w = (ŷ - y)x.
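
A minimal sketch verifying this numerically on a hypothetical scalar example: both analytic gradients match finite differences and share the form (ŷ - y)·x.

import numpy as np

x, y, w = 1.5, 1.0, 0.3
eps = 1e-6

def linear_loss(w):
    return 0.5 * (w * x - y) ** 2

def logistic_loss(w):
    y_hat = 1 / (1 + np.exp(-w * x))
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

for loss, y_hat in [(linear_loss, w * x),
                    (logistic_loss, 1 / (1 + np.exp(-w * x)))]:
    numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)
    analytic = (y_hat - y) * x             # the shared form (y_hat - y) * x
    print(loss.__name__, numeric, analytic)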

Q14. Assume logistic regression is used for n-class classification with the One-vs-Rest (OvR) approach. Which of the following statements is true?

A. For n categories, n models need to be trained

B. For n categories, n-1 models need to be trained

C. For n categories, only 1 model needs to be trained

D. None of the above is true

Answer: A

Analysis: In the One-vs-Rest approach, with n categories, n binary classifiers are built, each separating one category from all the remaining ones. At prediction time, all n binary classifiers are run to obtain the probability that the sample belongs to each class, and the category with the highest probability is chosen as the final prediction.

For a simple example, suppose the 3 categories are {-1, 0, 1}. Construct three binary classifiers:

  • {-1} vs {0, 1}

  • {0} vs {-1, 1}

  • {1} vs {-1, 0}

If the first classifier gives probability 0.7 for class -1, the second gives probability 0.2 for class 0, and the third gives probability 0.4 for class 1, then the final predicted category is -1.
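
A minimal sketch using scikit-learn's OneVsRestClassifier on hypothetical data: with 3 classes, exactly 3 binary logistic regression models are trained.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=1, n_classes=3, random_state=0)

ovr = OneVsRestClassifier(LogisticRegression()).fit(X, y)
print(len(ovr.estimators_))        # 3: one binary classifier per class
print(ovr.predict_proba(X[:1]))    # per-class probabilities; argmax is the prediction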

Q15. The figure below shows two logistic regression models (green and black) corresponding to different β0 and β1:

Which of the following statements about the values of β0 and β1 in the two models is true?

Note: y = β0 + β1*x, where β0 is the intercept and β1 is the weight coefficient.

A. The green model's β1 is larger than the black model's β1

B. The green model's β1 is smaller than the black model's β1

C. β1 is identical in both models

D. None of the above is true

Answer: B

Analysis: A logistic regression model passes its linear output through the Sigmoid nonlinearity. Sigmoid is an increasing function, and its graph resembles the black curve in the figure above. The black model is increasing, indicating β1 > 0; the green model is decreasing, indicating β1 < 0. Therefore the green model's β1 is smaller than the black model's β1.
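
A small sketch with hypothetical β values showing why the sign of β1 controls whether the curve increases or decreases:

import numpy as np
import matplotlib.pyplot as plt

def logistic(x, b0, b1):
    return 1 / (1 + np.exp(-(b0 + b1 * x)))

x = np.linspace(-6, 6, 200)
plt.plot(x, logistic(x, 0.0, 1.0), "k", label="beta1 = +1: increasing (black model)")
plt.plot(x, logistic(x, 0.0, -1.0), "g", label="beta1 = -1: decreasing (green model)")
plt.legend()
plt.show()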

References:

https://www.analyticsvidhya.com/blog/2016/12/45-questions-to-test-a-data-scientist-on-regression-skill-test-regression-solution/