“This is the 10th day of my participation in the Gwen Challenge in November. Check out the details: The Last Gwen Challenge in 2021.”

Content

1. Analyze the test data sets that ship with WEKA;

2. Use WEKA to mine data stored in a database;

3. Preprocess data with WEKA's preprocessing filters, including adding attributes, deleting attributes/instances, and discretizing data.

Steps and Results

Analyze WEKA's own test data sets

First, install WEKA

After installation, unpack weka.jar

Open the data folder, which contains the data sets that come with WEKA
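The bundled data sets are stored as ARFF text files (glass.arff and iris.arff are used later). To get a feel for the layout before opening one in WEKA, here is a minimal Python sketch that reads a small ARFF-style snippet. The sample relation and its values are invented for illustration, and the parser is deliberately simplified (no quoted names, no sparse rows); it is not WEKA's own reader.

```python
# Minimal ARFF reader for illustration only; not WEKA's parser.
# The sample content below is made up for this sketch.
SAMPLE_ARFF = """\
% a comment line
@relation toy_weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute play {yes, no}
@data
sunny,85,no
overcast,83,yes
rainy,70,yes
"""


def parse_arff(text):
    """Return (relation, attributes, rows) from simple comma-separated ARFF text."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            rows.append([value.strip() for value in line.split(",")])
            continue
        lower = line.lower()
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)  # keyword, name, type or label set
            attributes.append((name, spec))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


relation, attributes, rows = parse_arff(SAMPLE_ARFF)
print(relation, len(attributes), "attributes,", len(rows), "instances")
```

The header declares one attribute per line, and every line after @data is one instance, which is the same structure the Explorer shows after loading a file.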

Use WEKA to mine data in a database

To mine data stored in a database, we first need to connect WEKA to MySQL

 

First, configure the environment variables by adding the MySQL connector JAR:

%WEKA_HOME%\lib\mysql-connector-java-5.1.47.jar;

Second, start the database, create a database named WEKA, and create the following table

Third, modify the following configuration files

Modify the following two lines

After finishing the setup, open WEKA and go to the Explorer page

 

Click the following button

 

The database connection success message is displayed

Query the data in the weka1 table

The results are as follows
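For readers without a MySQL server at hand, the create-table-then-query flow can be tried with a stand-in. The sketch below uses Python's built-in sqlite3 module instead of MySQL, and the column names are hypothetical because the real table definition is not reproduced here. It only illustrates the kind of SELECT statement that is typed into WEKA's query box.

```python
import sqlite3

# Stand-in for the MySQL database: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")

# Hypothetical schema; the actual weka1 table layout is not shown in this article.
conn.execute("CREATE TABLE weka1 (id INTEGER PRIMARY KEY, name TEXT, score REAL)")
conn.executemany(
    "INSERT INTO weka1 (name, score) VALUES (?, ?)",
    [("a", 1.5), ("b", 2.5), ("c", 3.0)],
)

# The same style of query that would be entered in the Explorer's query field.
cursor = conn.execute("SELECT * FROM weka1")
columns = [description[0] for description in cursor.description]
table_rows = cursor.fetchall()
conn.close()

print(columns)
for row in table_rows:
    print(row)
```

Each column of the result becomes an attribute in WEKA and each row becomes an instance.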

 

Use WEKA's preprocessing filters to preprocess the data, including adding attributes, deleting attributes/instances, and discretizing data.

 

First, load the data

The following page is displayed after the data is loaded

Second, delete an attribute

Click Choose

 

The appropriate filter for removing attributes is Remove, which is found under weka > filters > unsupervised > attribute

 

Then click Apply

The attribute is deleted successfully
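Conceptually, removing an attribute just drops one column from every instance. A minimal Python sketch of that idea follows; the function name and the toy data are made up for illustration, and the indices are 1-based to mirror how attribute indices are entered in the Remove filter.

```python
def remove_attributes(header, rows, indices_1_based):
    """Drop the listed columns (1-based indices) from the header and from every row."""
    drop = {index - 1 for index in indices_1_based}
    keep = [i for i in range(len(header)) if i not in drop]
    new_header = [header[i] for i in keep]
    new_rows = [[row[i] for i in keep] for row in rows]
    return new_header, new_rows


# Toy data invented for this sketch.
header = ["outlook", "temperature", "humidity", "play"]
rows = [
    ["sunny", 85, 85, "no"],
    ["rainy", 70, 96, "yes"],
]

new_header, new_rows = remove_attributes(header, rows, [2])  # remove "temperature"
print(new_header)
print(new_rows)
```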

 

Third, add attributes

Click the Choose button again and select weka > filters > unsupervised > attribute > AddUserFields

Set its properties

 

New attributes are generated after clicking Apply

 

Next, apply the AddValues filter

 

Click Edit to view
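As a rough mental model of the two filters used in this step: AddUserFields appends a new attribute to every instance, and AddValues extends the set of labels that a nominal attribute may take. The Python sketch below imitates both under simplifying assumptions. The function names and the data are hypothetical, and the new attribute is filled with a single constant value, which is an assumption about how the filter was configured here.

```python
def add_constant_attribute(header, rows, name, value):
    """Append a new column called `name`, filled with the same value in every row."""
    return header + [name], [row + [value] for row in rows]


def add_nominal_labels(labels, new_labels):
    """Extend a nominal attribute's label list, skipping labels already declared."""
    result = list(labels)
    for label in new_labels:
        if label not in result:
            result.append(label)
    return result


# Toy data invented for this sketch.
header = ["outlook", "play"]
rows = [["sunny", "no"], ["rainy", "yes"]]

header2, rows2 = add_constant_attribute(header, rows, "source", "survey")
outlook_labels = add_nominal_labels(["sunny", "rainy"], ["overcast", "sunny"])

print(header2, rows2)
print(outlook_labels)
```

Adding a label does not change any existing instance; it only widens what the attribute is allowed to contain, which is why the effect is easiest to see in the Edit view.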

Fourth, delete instances

<1> Select Choose > weka > filters > unsupervised > instance > RemoveFolds. This filter splits the data set into a given number of cross-validation folds and outputs the specified fold. Click the text box next to Choose to open the following dialog box

 

After clicking Apply, only two instances remain
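To see why so few instances survive, it helps to sketch what keeping one fold means. The Python function below splits the rows into contiguous folds of nearly equal size and returns the chosen one. It is a simplified sketch: the function name is made up, no shuffling is done, and WEKA's exact partitioning may differ in detail.

```python
def keep_fold(rows, num_folds, fold):
    """Return the rows that belong to the chosen fold (1-based), using contiguous folds."""
    base, extra = divmod(len(rows), num_folds)
    index = fold - 1
    # The first `extra` folds each receive one additional row.
    start = index * base + min(index, extra)
    size = base + (1 if index < extra else 0)
    return rows[start:start + size]


# 14 toy instances, 10 folds: the first fold holds 2 instances.
instances = list(range(1, 15))
first_fold = keep_fold(instances, num_folds=10, fold=1)
print(first_fold)
```

With 14 instances and 10 folds, four folds hold two instances and six hold one, so keeping a single fold discards most of the data.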

 

<2> Select Choose > weka > filters > unsupervised > instance > RemovePercentage.

This filter removes a given percentage of the instances in the data set. Click the text box next to Choose,

and the following dialog box is displayed,

 

Only one instance is left after clicking Apply
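The arithmetic behind this result can be sketched in a few lines of Python. The function name and parameters are hypothetical, and the choice to drop instances from the start of the data set is an assumption about the filter's behavior rather than something confirmed here.

```python
def remove_percentage(rows, percentage, invert=False):
    """Drop the first `percentage` percent of rows; with invert=True, keep only those rows."""
    # Round half up explicitly (Python's round() rounds halves to the nearest even number).
    cutoff = int(len(rows) * percentage / 100 + 0.5)
    return rows[:cutoff] if invert else rows[cutoff:]


remaining = remove_percentage(["x1", "x2"], 50)
print(remaining)  # one of the two instances is left
```

Starting from the two instances left by the previous step, removing 50% leaves exactly one.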

 

<3> Select Choose > weka > filters > unsupervised > instance > RemoveRange.

This filter removes the instances in a given range from the data set. Click the text box next to Choose,

and the following dialog box pops up:

After clicking Apply, the instances in the specified range are removed
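A range is typically written as a comma-separated list of 1-based positions and spans, such as first-3,5. The Python sketch below shows one way such a specification could be interpreted; the helper names are made up and the parsing is simplified compared with WEKA's own range handling.

```python
def parse_range(spec, count):
    """Turn a 1-based range string such as 'first-3,5,last' into a set of 0-based indices."""
    def position(token):
        token = token.strip().lower()
        if token == "first":
            return 1
        if token == "last":
            return count
        return int(token)

    selected = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-", 1)
            selected.update(range(position(low), position(high) + 1))
        else:
            selected.add(position(part))
    # Ignore positions that fall outside the data set.
    return {p - 1 for p in selected if 1 <= p <= count}


def remove_range(rows, spec):
    """Return the rows whose positions are not covered by the range specification."""
    drop = parse_range(spec, len(rows))
    return [row for i, row in enumerate(rows) if i not in drop]


kept = remove_range(list("abcdef"), "first-2,5")
print(kept)
```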

 

Fifth, use WEKA to discretize the data

Open the glass data set, glass.arff, from the data directory

Histogram of the RI attribute

Equal-width discretization: select Choose > weka > filters > unsupervised > attribute > Discretize and leave the default parameters unchanged
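Equal-width discretization divides the range between the smallest and largest value into intervals of identical width and replaces each value by the index of its interval. The Python sketch below illustrates the idea with ten bins, which I take to be the filter's default. The function name and the sample values are made up, and the handling of values that fall exactly on a boundary may differ from WEKA's.

```python
def equal_width_bins(values, bins=10):
    """Assign each value the index (0-based) of its equal-width interval."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    if width == 0:  # all values identical: everything falls into one bin
        return [0] * len(values)
    # The maximum value would land in bin `bins`; clamp it into the last bin.
    return [min(int((v - low) / width), bins - 1) for v in values]


sample = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
bin_index = equal_width_bins(sample, bins=5)
print(bin_index)
```

Because the intervals have equal width rather than equal counts, a skewed attribute ends up with most instances in a few bins.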

Click Apply and the following histogram appears:

 

Equal-frequency discretization: set the useEqualFrequency option of the Discretize filter to true. The RI attribute after equal-frequency discretization is shown in the figure below:

 

Check the Ba and Fe attributes in the same way
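In contrast to equal-width binning, equal-frequency discretization chooses the intervals so that each one holds roughly the same number of instances. A minimal Python sketch of the idea is below. The function name and sample values are hypothetical, and the sketch assigns bins purely by rank, so tied values can end up in different bins, which a real implementation would avoid by placing cut points between distinct values.

```python
def equal_frequency_bins(values, bins=10):
    """Assign bin indices (0-based) so that each bin receives about the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        # Map the rank of each value proportionally onto the available bins.
        labels[i] = min(rank * bins // len(values), bins - 1)
    return labels


freq_index = equal_frequency_bins([5, 1, 9, 3, 7, 2], bins=3)
print(freq_index)
```

Attributes such as Ba and Fe, where many instances share the same value, are the cases in which the difference between the two strategies is most visible.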

Sixth, supervised discretization

First, open the iris data set, iris.arff, from the data directory. Its attributes are as follows

 

Open the iris data set in WEKA, as shown in the figure below

 

Then select Choose > weka > filters > supervised > attribute > Discretize and click Apply. Open the visualization window to inspect the value ranges found for each attribute.
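Supervised discretization uses the class labels to decide where the cut points go. WEKA's supervised Discretize filter is based by default on the entropy and MDL method of Fayyad and Irani, which splits recursively and stops according to an MDL criterion. The Python sketch below shows only the core step, finding the single cut point with the highest information gain. The function names and the sample values are made up for illustration and are not taken from the iris data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def best_cut_point(values, labels):
    """Return (cut, gain) for the boundary with the highest information gain, or (None, 0.0)."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_cut, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a cut is only possible between two distinct values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbors
            best_gain = gain
    return best_cut, best_gain


# Hypothetical measurements for two classes.
lengths = [1.0, 1.2, 1.4, 4.5, 4.7, 5.0]
classes = ["a", "a", "a", "b", "b", "b"]
cut, gain = best_cut_point(lengths, classes)
print(cut, gain)
```

Because the cut points follow the class boundaries, the resulting intervals usually have unequal widths and unequal counts, unlike the two unsupervised strategies above.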