• Data exploration

    • Inspect each attribute's value range with a box plot to find outliers
    • Handle outliers (for example, by replacing them with imputed values)
    • Compute pairwise correlation coefficients and drop attributes whose correlation exceeds a chosen threshold
    • Chi-square test
  • Feature engineering

    • Data normalization
    • Dimensionality reduction
    • Split the data into training and validation sets
  • Model selection

  • Model evaluation

    • Overfitting (model too complex; apply regularization)

    • Underfitting (model too simple)

    • Cross-validation

    • Grid search

    • Model ensembling
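
The outline above can be sketched end to end as follows. This is a minimal illustration, assuming Python with NumPy and scikit-learn, a synthetic dataset from `make_classification`, and hypothetical helper names (`iqr_bounds`, `drop_correlated`). The thresholds, models, and parameter grid are illustrative choices, not values from the outline. The split is done first so that outlier bounds and correlations are estimated on the training set only, and the chi-square step is left out.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def iqr_bounds(X, k=1.5):
    """Box-plot whiskers per column: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_correlated(X, threshold=0.9):
    """Indices of columns kept after dropping any column whose absolute
    correlation with an already kept column exceeds the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, i] <= threshold for i in keep):
            keep.append(j)
    return keep


# Synthetic data (assumption: stands in for the real data set).
X, y = make_classification(n_samples=400, n_features=12, n_informative=5,
                           n_redundant=4, random_state=0)

# Split into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Outlier handling: clip to whiskers estimated on the training set.
lo, hi = iqr_bounds(X_train)
X_train = np.clip(X_train, lo, hi)
X_val = np.clip(X_val, lo, hi)

# Correlation filter: drop one attribute of each highly correlated pair.
keep = drop_correlated(X_train, threshold=0.9)
X_train, X_val = X_train[:, keep], X_val[:, keep]

# Normalization -> dimensionality reduction -> regularized model.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=0.95, svd_solver="full")),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Grid search with 5-fold cross-validation over the regularization strength.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

# Ensembling: soft-vote the tuned pipeline with a random forest.
ensemble = VotingClassifier(
    estimators=[
        ("lr", search.best_estimator_),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)
val_acc = ensemble.score(X_val, y_val)
print("best C:", search.best_params_["clf__C"], "validation accuracy:", val_acc)
```

In this sketch the regularization strength `C` is the knob between underfitting (small `C`, strong regularization) and overfitting (large `C`, weak regularization), and the cross-validated grid search chooses it.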