
1. Introduction

A decision stump, also called a single-layer decision tree, is the simplest possible decision tree. In the second installment of this series, we covered the principle of decision trees. Now we will build a single-layer decision tree, which makes its decision based on a single feature. Since the tree splits only once, it is essentially a stump.
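Concretely, a stump is nothing more than a single threshold rule on one feature. A minimal sketch of the idea (the feature index, threshold, and label assignment here are hypothetical illustrations, not part of the code we build below):

```python
# Hypothetical one-split rule: everything at or below the threshold
# on one feature goes to class -1, everything above it to class +1
def stump_rule(sample, feature=0, threshold=1.5):
    return -1 if sample[feature] <= threshold else 1
```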

2. Build a simple data set

Let’s start by building a simple data set to make sure our function works.

```python
import numpy as np
import pandas as pd

# Read the data set and split it into a feature matrix and a label matrix
def get_Mat(path):
    dataSet = pd.read_table(path, header=None)
    xMat = np.mat(dataSet.iloc[:, :-1].values)
    yMat = np.mat(dataSet.iloc[:, -1].values).T
    return xMat, yMat
```
```python
xMat, yMat = get_Mat('simpdata.txt')
xMat
```
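To follow along you need a small tab-separated file with numeric features in the leading columns and a ±1 label in the last column. The classic five-point sample from Machine Learning in Action fits this code; an assumed simpdata.txt might look like:

```
1.0	2.1	1.0
2.0	1.1	1.0
1.3	1.0	-1.0
1.0	1.0	-1.0
2.0	1.0	1.0
```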

```python
yMat
```

```python
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels correctly
%matplotlib inline

# Visualize the data set, coloring each point by its class label
def showPlot(xMat, yMat):
    x = np.array(xMat[:, 0])
    y = np.array(xMat[:, 1])
    label = np.array(yMat)
    plt.scatter(x, y, c=label)
    plt.title('Single-layer decision tree test data')
    plt.show()
```
```python
showPlot(xMat, yMat)
```

3. Construct a single-layer decision tree

We will build two functions to implement our single-layer decision tree. The first tests whether a value is below or above the threshold we are testing. The second is slightly more complex: it loops over a weighted data set and finds the stump with the lowest weighted error rate. The pseudocode is as follows:

  • Set the minimum error rate minE to +∞
  • For each feature in the data set (first loop):
    • For each step size (second loop):
      • For each inequality sign (third loop):
        • Build a single-layer decision tree and make predictions on the weighted data set
        • If the error rate is lower than minE, set the current tree as the best single-layer decision tree
  • Return the best single-layer decision tree
```python
# Classify the samples by comparing feature column i against threshold Q,
# using inequality sign S ('lt' or 'gt')
def Classify0(xMat, i, Q, S):
    re = np.ones((xMat.shape[0], 1))  # initialize all predictions to 1
    if S == 'lt':
        re[xMat[:, i] <= Q] = -1      # at or below the threshold: assign -1
    else:
        re[xMat[:, i] > Q] = -1       # above the threshold: assign -1
    return re
```
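A quick sanity check (the feature index and threshold here are arbitrary picks for illustration):

```python
# Label every sample by thresholding feature 0 at 1.5:
# values <= 1.5 are assigned -1, the rest stay +1
Classify0(xMat, 0, 1.5, 'lt')
```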
```python
# Find the single-layer decision tree with the lowest weighted error rate
def get_Stump(xMat, yMat, D):
    m, n = xMat.shape                           # m: number of samples, n: number of features
    Steps = 10                                  # number of steps across each feature's range
    bestStump = {}                              # store the best stump's information as a dictionary
    bestClas = np.mat(np.zeros((m, 1)))         # initialize the best classification result
    minE = np.inf                               # initialize the minimum error rate to +inf
    for i in range(n):                          # first loop: over every feature
        Min = xMat[:, i].min()                  # minimum value of this feature
        Max = xMat[:, i].max()                  # maximum value of this feature
        stepSize = (Max - Min) / Steps          # step size for this feature
        for j in range(-1, int(Steps) + 1):     # second loop: over every step
            for S in ['lt', 'gt']:              # third loop: lt (less than), gt (greater than)
                Q = Min + j * stepSize          # compute the threshold
                re = Classify0(xMat, i, Q, S)   # predict with this candidate stump
                err = np.mat(np.ones((m, 1)))   # initialize the error vector to 1
                err[re == yMat] = 0             # correct predictions get error 0
                eca = D.T * err                 # weighted error rate
                print(f'split: feature {i}, threshold: {np.round(Q, 2)}, sign: {S}, weighted error: {np.round(eca, 3)}')
                if eca < minE:                  # keep the stump with the lowest weighted error
                    minE = eca
                    bestClas = re.copy()
                    bestStump['feature'] = i
                    bestStump['threshold'] = Q
                    bestStump['sign'] = S
    return bestStump, minE, bestClas
```
```python
m = xMat.shape[0]
D = np.mat(np.ones((m, 1)) / m)  # initialize the sample weights: each sample gets weight 1/m
```
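With the uniform weight vector in place, we can run the search (a usage sketch; the result variable names are just illustrative):

```python
bestStump, minE, bestClas = get_Stump(xMat, yMat, D)
bestStump  # the chosen feature column, threshold, and inequality sign
```

Since the function prints every candidate split, running it on a small data set lets you trace exactly how the weighted error changes as the threshold sweeps across each feature.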