CART – Classification And Regression Tree

CART only builds binary trees, which can be used as either classification trees or regression trees.

What is a classification tree and what is a regression tree?

Classification tree – selects among several classes – predicts discrete values
Regression tree – given the data, predicts a result – predicts continuous values

CART classification tree workflow

The core of a decision tree is the search for pure partitions (purity). The CART classification tree is similar to the C4.5 algorithm, but its attribute-selection metric is the Gini coefficient.

Gini coefficient: Gini(t) = 1 − Σ p_k², where p_k is the proportion of samples of class k at node t.

The smaller a node's Gini coefficient, the smaller the differences among its samples and the lower the uncertainty.

CART classification process

  1. Compute the Gini coefficient of each child node
  2. Gini coefficient of the split = weighted sum of the child nodes' Gini coefficients, each child weighted by its share of the samples
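The two steps above can be sketched in plain Python (helper names `gini` and `split_gini` are illustrative, not from any library):

```python
# Step 1: Gini of each child node; Step 2: weighted Gini of the split.

def gini(labels):
    """Gini coefficient: 1 - sum of squared class proportions."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_gini(children):
    """Weighted sum of child-node Gini coefficients for one split."""
    total = sum(len(c) for c in children)
    return sum(len(c) / total * gini(c) for c in children)

left = ["yes", "yes", "yes"]       # pure child -> Gini 0
right = ["yes", "no", "no", "no"]  # mixed child
print(gini(left))                          # 0.0
print(round(split_gini([left, right]), 4)) # 0.2143
```

Among all candidate splits, CART picks the one with the smallest weighted Gini.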

How to create a classification tree with the CART algorithm

Using Iris as the data set:

  1. Load the data set
  2. Get the feature set and the class labels
  3. Split the feature set into a training set and a test set
  4. Create the CART classification tree
  5. Fit the training data to construct the classification tree
  6. Use the CART classification tree for prediction
  7. Compare the predicted results with the test-set labels

CART regression tree workflow

The process is the same as for the classification tree, but the Gini coefficient no longer serves as the standard: a regression tree predicts continuous values, so "purity" is judged by how dispersed the samples at a node are. Two common criteria for splitting a node are the least absolute deviation (LAD), Σ|x − μ|, and the least squared deviation (LSD), (1/n)·Σ(x − μ)², where μ is the mean of the samples at the node.

Pruning the CART decision tree

CCP (cost-complexity pruning) is a post-pruning method. For each internal node t it computes a surface-error-rate gain value:

α = (C(t) − C(T_t)) / (|T_t| − 1)

where C(t) is the error if the subtree rooted at t is pruned down to a single leaf, C(T_t) is the error of the subtree before pruning, and |T_t| is the number of leaves in the subtree (pruning removes |T_t| − 1 leaves). In other words, α is the change in error per pruned leaf. To minimize the error cost of pruning, repeatedly find the node with the minimum α and prune it.
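scikit-learn exposes this procedure directly: `cost_complexity_pruning_path` returns the sequence of effective α values, and `ccp_alpha` prunes every subtree whose α falls below the chosen threshold. A small sketch (the threshold 0.02 is an arbitrary illustration):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# The alphas at which successive subtrees would be cut away.
path = full.cost_complexity_pruning_path(X, y)
print("candidate alphas:", path.ccp_alphas)

# Refit with a nonzero alpha: low-alpha subtrees are pruned first,
# yielding a smaller tree.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)
print("leaves before:", full.get_n_leaves(), "after:", pruned.get_n_leaves())
```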

Differences in node splitting among ID3, C4.5, and CART classification trees

ID3 – splits based on information gain – the attribute with the maximum information gain is chosen as the root node
C4.5 – splits based on the information gain ratio – the attribute with the maximum gain ratio is chosen as the root node
CART – splits based on the Gini coefficient – the attribute with the minimum Gini coefficient is chosen as the splitting attribute
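The three criteria can be compared on one toy split (the helper functions `entropy` and `gini` are hypothetical, written here for illustration):

```python
import math

def entropy(labels):
    """Shannon entropy of the class labels (ID3/C4.5)."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def gini(labels):
    """Gini coefficient of the class labels (CART)."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

parent = ["yes"] * 5 + ["no"] * 5
left, right = ["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4
w = [len(left) / len(parent), len(right) / len(parent)]

info_gain = entropy(parent) - (w[0] * entropy(left) + w[1] * entropy(right))
split_info = -sum(p * math.log2(p) for p in w)   # C4.5's normalizer
gain_ratio = info_gain / split_info
split_gini = w[0] * gini(left) + w[1] * gini(right)

print(f"ID3  information gain : {info_gain:.4f}")   # higher is better
print(f"C4.5 gain ratio       : {gain_ratio:.4f}")  # higher is better
print(f"CART split Gini       : {split_gini:.4f}")  # lower is better
```

ID3 and C4.5 maximize their criterion, while CART minimizes the weighted Gini; C4.5's gain ratio divides the gain by the split information to avoid favoring attributes with many values.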