Honor of Kings has been out for so long, and you still haven't made it to King rank? Ha ha, look over here. Maybe you don't understand the heroes thoroughly enough, or haven't classified them well. Today, let's classify the heroes ourselves.

Technology stack

1. A brief introduction to EM clustering
2. Crawling the heroes' initial attribute values online
3. Making a pie chart

Introduction to EM clustering

EM is short for Expectation Maximization, also called the maximum expectation algorithm.

In statistics, the expectation-maximization (EM) algorithm finds maximum likelihood or maximum a posteriori estimates of the parameters of a probabilistic model that depends on unobservable latent variables.

The EM algorithm alternates between two steps. The first, the expectation step (E-step), uses the current parameter estimates to compute the expected assignment of the latent variables. The second, the maximization step (M-step), maximizes the expected likelihood obtained in the E-step to produce new parameter estimates. The parameters found in the M-step feed the next E-step, and the two steps alternate until convergence.
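To make the two steps concrete, here is a minimal hand-rolled sketch of EM for a two-component 1-D Gaussian mixture. It is illustrative only: the synthetic data and all variable names are my own, and for the actual hero clustering we simply use scikit-learn below.

import numpy as np

# Synthetic 1-D data drawn from two Gaussians (illustrative only)
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 300), rng.normal(5, 1, 200)])

# Initial guesses: mixing weights, means, variances
w = np.array([0.5, 0.5])
mu = np.array([0.0, 1.0])
var = np.array([1.0, 1.0])

def gauss(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

for _ in range(50):
    # E-step: responsibilities, i.e. how strongly each point belongs to each component
    r = w * gauss(x[:, None], mu, var)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and variances from the responsibilities
    n_k = r.sum(axis=0)
    w = n_k / len(x)
    mu = (r * x[:, None]).sum(axis=0) / n_k
    var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n_k

print(w, mu, var)  # should approach the true mixture parameters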

Perform hero clustering

We use the EM clustering framework from the scikit-learn library, specifically its Gaussian mixture model:

from sklearn.mixture import GaussianMixture

The key parameters mean the following; for the rest, see the official documentation:

  1. n_components: the number of Gaussian mixture components, i.e., the desired number of clusters. Default is 1.

  2. covariance_type: the covariance type, one of {'full', 'tied', 'diag', 'spherical'}. 'full' gives each component its own full covariance matrix (no elements constrained to zero); 'tied' makes all components share a single full covariance matrix (as used in HMMs); 'diag' gives each component a diagonal covariance matrix (off-diagonal elements zero, diagonal nonzero); 'spherical' gives each component a single variance (off-diagonal zero, diagonal identical, spherical shape). Default is 'full'.

  3. max_iter: the maximum number of EM iterations. Default is 100.

Therefore, GMM clustering can be constructed as follows:

# Construct GMM clustering
gmm = GaussianMixture(n_components=20, covariance_type='full')

The data we'll work with has the following structure:

As you can see, many attributes are involved. The initial feature set is as follows:

feature = ['Physical attack level 1', 'Physical attack level 15', 'Physical attack growth per level',
           'Health level 1', 'Health level 15', 'Health growth per level',
           'Physical defense level 1', 'Physical defense level 15', 'Physical defense growth per level',
           'Attack speed growth', 'Health regen per 5s level 1', 'Health regen per 5s level 15',
           'Maximum mana level 1', 'Maximum mana level 15', 'Maximum mana growth',
           'Mana regen per 5s level 1', 'Mana regen per 5s level 15',
           'Melee/ranged?', 'Movement speed', 'Role', 'Suggested lane']
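The snippets below assume the attribute table above has already been loaded into a pandas DataFrame named data. A minimal sketch, with the file name herodata.csv being my own placeholder:

import pandas as pd

# Load the hero attribute table (the file name is a placeholder)
data = pd.read_csv('herodata.csv', encoding='gb18030')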

Attribute dimension reduction

We can first use a heat map to see which attributes are strongly correlated, and keep only one attribute from each strongly correlated group:

import seaborn as sns
import matplotlib.pyplot as plt

# Correlation matrix of the attributes
corr = data[feature].corr()
plt.figure(figsize=(14, 14))
sns.heatmap(corr, annot=True)
plt.show()

It can be seen that 'Maximum mana level 1', 'Maximum mana level 15', and 'Maximum mana growth' are strongly correlated, so we can screen the attributes accordingly. The final retained attributes are:

features_remain = ['Health level 15', 'Physical attack level 15', 'Physical defense level 15',
                   'Maximum mana level 15', 'Health regen per 5s level 15',
                   'Mana regen per 5s level 15', 'Movement speed', 'Attack speed growth', 'Melee/ranged?']
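If you prefer to screen correlations programmatically rather than reading the heat map by eye, here is a small sketch (the 0.8 threshold is my own choice), ending with the reduced table data_new that the following steps operate on:

# Print attribute pairs whose absolute correlation exceeds 0.8 (threshold assumed)
cols = list(corr.columns)
for i, a in enumerate(cols):
    for b in cols[i + 1:]:
        if abs(corr.loc[a, b]) > 0.8:
            print(a, '~', b, round(corr.loc[a, b], 2))

# Keep only the retained attributes for clustering
data_new = data[features_remain]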

Data normalization

Convert the attack range field ('Melee/ranged?') to 0 and 1:

data_new['Melee/ranged?'] = data_new['Melee/ranged?'].map({'ranged': 1, 'melee': 0})
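GMM clustering is sensitive to feature scale (health values are in the thousands while regen values are small), so it is worth standardizing the columns before fitting. A minimal sketch using scikit-learn's StandardScaler; the steps above don't show this, so treat it as an optional addition:

from sklearn.preprocessing import StandardScaler

# Rescale every column to zero mean and unit variance
scaler = StandardScaler()
data_new = scaler.fit_transform(data_new)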

EM clustering calculation

We fit the Gaussian mixture model and write the resulting cluster labels to a CSV file:

# Construct GMM clustering
gmm = GaussianMixture(n_components=20, covariance_type='full')
gmm.fit(data_new)

# Predict the cluster for each hero
prediction = gmm.predict(data_new)
# print(prediction)

# hero_data is the original hero table (including the name column);
# prepend the cluster label and save
hero_data.insert(0, 'group', prediction)
hero_data.to_csv('hero_out.csv', index=False, sep=',', encoding='gb18030')
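The choice of 20 clusters is essentially a guess. One way to sanity-check it, sketched here as my own addition rather than part of the original pipeline, is the silhouette coefficient from scikit-learn (values closer to 1 mean tighter, better-separated clusters):

from sklearn.metrics import silhouette_score

# Evaluate the clustering; try a few values of n_components and compare
print(silhouette_score(data_new, prediction, metric='euclidean'))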

The pie chart output

To see each hero's grouping more intuitively, we visualize it with a pie chart. First, take the 'group' and 'name' fields from the data and group by 'group':

df = hero_data[['group', 'name']]
grouped = df.groupby(['group'])

Then take the values out of each group and draw the pie chart with pyecharts:

from pyecharts import Pie

# Collect the list of hero names in each cluster
groups = []
for name, group in grouped:
    groups.append(list(group['name'].values))

# Label each slice with the hero names it contains; size it by the cluster size
labels = [str(g) for g in groups]
sizes = [len(g) for g in groups]

pie = Pie('Hero complete attribute classification chart', title_pos='center')
pie.add("", labels, sizes,
        is_label_show=True, legend_pos="bottom", legend_orient="vertical")
pie.render()
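Note that this uses the pyecharts 0.x API. In pyecharts 1.x and later, the import becomes from pyecharts.charts import Pie and chart options are configured through pyecharts.options, so pin an older pyecharts version if you want to run this snippet as-is.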

Crawl the heroes' initial attribute values

To get more complete, up-to-date hero data, we still need to crawl it from the web. Here I use data from db.18183.com/. The page looks like this:

Get the hero page URLs

We use BeautifulSoup to locate the ul element whose class is mod-iconlist, which holds the link to each individual hero's page:

import requests
from bs4 import BeautifulSoup

url = 'http://db.18183.com/'
url_list = []
res = requests.get(url + 'wzry').text
content = BeautifulSoup(res, "html.parser")
ul = content.find('ul', attrs={'class': "mod-iconlist"})
hero_url = ul.find_all('a')
for i in hero_url:
    url_list.append(i['href'])

Fetching details

Loop over the URL list and fetch each hero's detail page:

base_url = 'http://db.18183.com/'
detail_list = []
for i in url_list:
    # print(i)
    res = requests.get(base_url + i).text
    content = BeautifulSoup(res, "html.parser")
    name_box = content.find('div', attrs={'class': 'name-box'})
    name = name_box.h1.text
    hero_attr = content.find('div', attrs={'class': 'attr-list'})
    attr_star = hero_attr.find_all('span')
    # The star rating is encoded in the span's second class name (split on '-')
    survivability = attr_star[0]['class'][1].split('-')[1]
    attack_damage = attr_star[1]['class'][1].split('-')[1]
    skill_effect = attr_star[2]['class'][1].split('-')[1]
    getting_started = attr_star[3]['class'][1].split('-')[1]
    details = content.find('div', attrs={'class': 'otherinfo-datapanel'})
    # print(details)
    attrs = details.find_all('p')
    attr_list = []
    for attr in attrs:
        attr_list.append(attr.text.split(':')[1].strip())
    detail_list.append([name, survivability, attack_damage,
                        skill_effect, getting_started, attr_list])

Save to a CSV file

Open a file and write out the corresponding fields from each list:

with open('all_hero_init_attr.csv', 'w', encoding='gb18030') as f:
    f.write('Hero name,Survivability,Attack damage,Skill effect,Difficulty,Maximum health,Maximum mana,Physical attack,'
            'Spell attack,Physical defense,Physical damage rate,Spell defense,Spell damage rate,Movement speed,'
            'Physical armor penetration,Spell armor penetration,Attack speed bonus,Crit chance,'
            'Crit effect,Physical lifesteal,Spell lifesteal,Cooldown reduction,Attack range,Resilience,Health regen,Mana regen\n')
    for i in detail_list:
        try:
            rowcsv = '{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{}'.format(
                i[0], i[1], i[2], i[3], i[4], i[5][0], i[5][1], i[5][2], i[5][3], i[5][4], i[5][5],
                i[5][6], i[5][7], i[5][8], i[5][9], i[5][10], i[5][11], i[5][12], i[5][13], i[5][14], i[5][15],
                i[5][16], i[5][17], i[5][18], i[5][19], i[5][20]
            )
            f.write(rowcsv)
            f.write('\n')
        except IndexError:
            # skip heroes whose detail page is missing attributes
            continue

Data cleaning

Because the site's data entry is not always careful, some attributes have doubled percent signs or are left empty, as shown in the figure:

So we have to deal with that.

For the doubled percent signs, simply use Notepad++ to replace every %% with a single %.
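If you'd rather do the replacement in code than in an editor, here is a pandas sketch, under the assumption that the CSV produced in the previous step is the input:

import pandas as pd

# Read everything as strings, then collapse doubled percent signs
data_init = pd.read_csv('all_hero_init_attr.csv', encoding='gb18030', dtype=str)
data_init = data_init.replace('%%', '%', regex=True)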

For fields that are empty, fill them with 0 using the following code:

# Set null values to 0
data_init = data_init.fillna(0)

Complete

The data normalization, GMM clustering, and pie chart steps are the same as before, so I won't repeat them. Let's look at the resulting pie charts:

These two pie charts won't fix your mechanics, but with the heroes clearly classified, aren't you one step closer to King rank?

The full code is here: github.com/zhouwei713/…