Pearson correlation coefficient

Common misconceptions and prerequisites

Pearson's correlation coefficient only measures linear association, so confirm that the relationship between the variables is roughly linear before calculating it. Draw a scatter plot first to see the trend!

Interpreting the size of the correlation coefficient

Pearson correlation coefficient VS Spearman correlation coefficient

Blog.csdn.net/lambsnow/ar…

Compared with the Pearson correlation coefficient, the Spearman correlation coefficient is less sensitive to data errors and extreme values, because it is computed from ranks rather than from the raw values.
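To make the difference concrete, here is a minimal sketch in plain Python (no external libraries). The helper names `pearson`, `average_ranks` and `spearman` and the toy data are assumptions made for illustration, not part of the original notes. Spearman is computed here as the Pearson coefficient of the ranks, with tied values given their average rank.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data (hypothetical): a perfect line, then the same line with one extreme value.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y_clean = [2 * v for v in x]
y_outlier = y_clean[:-1] + [200]

print(pearson(x, y_clean), spearman(x, y_clean))
print(pearson(x, y_outlier), spearman(x, y_outlier))
```

On this toy data a single extreme value pulls the Pearson coefficient down to about 0.59, while the Spearman coefficient stays at 1 because the ordering of the points is unchanged.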

The Pearson correlation coefficient (and its significance test) requires the following conditions:

  • The data should be normally distributed
  • The Pearson correlation coefficient is strongly affected by outliers, so the experimental data should not contain extreme values
  • The samples in each group should be drawn independently of each other; this is needed to construct the t-statistic used in the hypothesis test

★★★★ Steps for using the Pearson correlation coefficient:

  1. Draw a scatter plot to see if there is a linear relationship
  2. Test for normal distribution: Jarque-Bera (JB) test (n > 30), Shapiro-Wilk test (3 ≤ n ≤ 50), Q-Q plot
  3. Calculate the Pearson correlation coefficient: corrcoef(Test)
  4. Run a hypothesis test on the Pearson correlation coefficient and mark the significance level of each coefficient with asterisks (*)
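Steps 2 and 4 above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `jarque_bera` and `pearson_t_statistic` and the sample data are hypothetical, the JB p-value uses the asymptotic chi-square distribution with 2 degrees of freedom (whose upper tail is exp(-JB/2)), and the t-statistic is the standard t = r·sqrt((n − 2) / (1 − r²)) with n − 2 degrees of freedom. Turning t into a p-value needs a t-distribution table or a statistics library, which is left out here.

```python
import math

def jarque_bera(data):
    """Return the JB statistic and its asymptotic p-value (chi-square, 2 df)."""
    n = len(data)
    mean = sum(data) / n
    # Central moments with divisor n, as used in the JB statistic.
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # For a chi-square variable with 2 df, P(X > x) = exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value

def pearson_t_statistic(r, n):
    """t-statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical, strongly right-skewed sample with n = 40 (> 30, so JB applies).
skewed = [math.exp(i / 5) for i in range(40)]
jb, p = jarque_bera(skewed)
print("JB statistic:", jb, "p-value:", p)

# Hypothetical example: r = 0.5 observed on n = 27 pairs.
print("t statistic:", pearson_t_statistic(0.5, 27))
```

A small JB p-value (for example below 0.05) rejects normality, which is the situation where the notes recommend switching to Spearman.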

★★★★ The following is important

Note: in about 90% of cases Spearman is the one to choose, because the normal-distribution requirement of Pearson is too hard to satisfy.

Hypothesis testing of Spearman correlation coefficient

See https://juejin.cn/post/6844903823681536013

Small sample (n ≤ 30):

Look up the critical value in a table. For a two-sided test at the 0.05 significance level, the absolute value of r must be greater than or equal to the critical value in the table before you can draw a significant conclusion, namely that the correlation is significantly different from 0.
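For very small samples, the same decision can be reproduced without a table by enumerating every permutation of the ranks, since under the null hypothesis all rank orderings are equally likely. Note that this is a swapped-in technique, not the table lookup itself. The sketch below assumes there are no ties (so the ranks are simply 1..n) and that n is small enough for n! permutations to be enumerated; the function names are hypothetical.

```python
from itertools import permutations

def spearman_from_ranks(rx, ry):
    """Spearman coefficient from two tie-free rank vectors: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

def exact_two_sided_p(rx, ry):
    """Exact two-sided p-value: share of permutations with |r_s| >= the observed |r_s|."""
    observed = abs(spearman_from_ranks(rx, ry))
    count = 0
    total = 0
    for perm in permutations(ry):
        total += 1
        # Small tolerance guards against floating-point rounding at equality.
        if abs(spearman_from_ranks(rx, perm)) >= observed - 1e-12:
            count += 1
    return count / total

# Hypothetical ranks for n = 5 observations.
rx = [1, 2, 3, 4, 5]
p_perfect = exact_two_sided_p(rx, [1, 2, 3, 4, 5])   # perfectly monotonic
p_one_swap = exact_two_sided_p(rx, [2, 1, 3, 4, 5])  # one adjacent pair swapped

print(p_perfect, p_one_swap)
```

With n = 5, a perfectly monotonic relationship gives p = 2/120, which is significant at the 0.05 level, while a single adjacent swap (r = 0.9) gives p = 10/120 and is not.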

Large sample: