
1. Index classification

The index is what we use to look up data. Setting a suitable index makes the data we need easy to find.

1.1 Normal Index

A DataFrame generally has two types of index: the normal index and the location index.

Normal index: this is simply a name (a label). Every column of data has a column name, and every row of data can be given a row name as well.

As shown in the figure, for rows the normal index is the first column (1, 2, 3, 4, 5); for columns it is the first row (region, province, city, time, index, weight).

1.2 Location Index

As the name implies, a location index identifies data by the position of its row and column. We usually count rows and columns from 1, but the location index starts at 0, so subtract 1 from the row or column number to get its location index. Otherwise you will select the wrong data.

Note: The location index is built in and always available, whereas a normal index may not exist yet. Sometimes you need to set it yourself.
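A minimal sketch of the difference between the two kinds of index, assuming pandas is installed. The small in-memory table stands in for the article's spreadsheet and its values are only illustrative; `.loc` selects by normal index (labels) and `.iloc` selects by location index (positions).

```python
import pandas as pd

# Small in-memory stand-in for the article's table (illustrative values).
df = pd.DataFrame(
    {"city": ["Xi'an", "Shenzhen", "Beijing"], "index": [87, 87, 45]},
    index=["1", "2", "3"],  # normal (label) index for the rows
)

# Normal index: select by row label and column name.
by_label = df.loc["2", "city"]

# Location index: the 2nd row and 1st column are positions 1 and 0.
by_position = df.iloc[1, 0]

print(by_label, by_position)
```

Both lookups address the same cell: row label "2" is the second row, which sits at position 1.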

2. Set the index

The contents of a non-indexed table are as follows:

2.1 Adding an Index to a Non-indexed Table

    import pandas as pd

    df = pd.read_excel(r'C:\Users\admin\Desktop\data_test.xlsx')
    print("Before adding index:")
    print(df)

    # Assign row labels and column names directly
    df.index = ['1', '2', '3', '4']
    df.columns = ['region', 'province', 'city', 'time', 'index', 'weight']
    print("After adding index:")
    print(df)

The result:

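    Before adding index:
                 ...           ...       ...  2019-09-06 00:00:00  12  0.78
    0      Northwest     Guangdong     Xi'an           2019-09-07  87  0.65
    1    South China       Beijing  Shenzhen           2019-09-08  87  0.34
    2    North China         Hubei   Beijing           2019-09-09  45  1.23
    3  Central China  Heilongjiang     Wuhan           2019-09-10  21  8.90
    After adding index:
              region      province      city        time  index  weight
    1      Northwest     Guangdong     Xi'an  2019-09-07     87    0.65
    2    South China       Beijing  Shenzhen  2019-09-08     87    0.34
    3    North China         Hubei   Beijing  2019-09-09     45    1.23
    4  Central China  Heilongjiang     Wuhan  2019-09-10     21    8.90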
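For readers without the Excel file, here is a small self-contained sketch of the same idea. The in-memory table and its values are assumptions made for illustration. It also shows that the number of new labels has to match the shape of the table.

```python
import pandas as pd

# A table with only default integer labels for rows and columns.
df = pd.DataFrame([["Northwest", "Xi'an", 87], ["South China", "Shenzhen", 87]])

# Assign row labels and column names directly.
df.index = ["1", "2"]
df.columns = ["region", "city", "index"]
print(df)

# The new labels must match the table's shape; a wrong length is rejected.
try:
    df.index = ["1", "2", "3"]
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```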

After the index is added, the table contents are as follows:

2.2 Resetting the index

Resetting the index here means replacing the row index with a different one (not to be confused with reset_index() in section 2.4). Some tables already have an index, but not the one we want, so we need to set it again.

2.2.1 Setting a Single-column Index

    import pandas as pd

    df = pd.read_excel(r'C:\Users\admin\Desktop\data_test.xlsx')
    print("Before resetting index:")
    print(df)
    print("After resetting index:")
    # set_index() returns a new table, so print its return value
    print(df.set_index("city"))

The result:

    Before resetting index:
              region      province      city        time  index  weight
    0      Northwest     Guangdong     Xi'an  2019-09-07     87    0.65
    1    South China       Beijing  Shenzhen  2019-09-08     87    0.34
    2    North China         Hubei   Beijing  2019-09-09     45    1.23
    3  Central China  Heilongjiang     Wuhan  2019-09-10     21    8.90
    After resetting index:
                     region      province        time  index  weight
    city
    Xi'an         Northwest     Guangdong  2019-09-07     87    0.65
    Shenzhen    South China       Beijing  2019-09-08     87    0.34
    Beijing     North China         Hubei  2019-09-09     45    1.23
    Wuhan     Central China  Heilongjiang  2019-09-10     21    8.90

set_index() takes the name of the column you want to use as the index and returns a new table with that column set as the index.
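One detail worth seeing in code: `set_index()` does not change the table it is called on. The sketch below uses a small in-memory table with illustrative values in place of the Excel file, and also shows the `drop` parameter of `set_index()`, which controls whether the column is removed from the columns once it becomes the index.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "region": ["Northwest", "South China"],
        "city": ["Xi'an", "Shenzhen"],
        "weight": [0.65, 0.34],
    }
)

indexed = df.set_index("city")  # returns a new DataFrame
print(df.index)                 # the original keeps its default index
print(indexed.index)            # the new table is indexed by city

# drop=False keeps "city" as a regular column as well as the index.
kept = df.set_index("city", drop=False)
print(list(kept.columns))
```

To keep the result, assign it to a variable (for example `df = df.set_index("city")`).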

2.2.2 Setting multi-column Indexes

The use of multiple columns to index a table is called hierarchical indexing.

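    import pandas as pd

    df = pd.read_excel(r'C:\Users\admin\Desktop\data_test.xlsx')
    print("Before resetting index:")
    print(df)
    print("After resetting index:")
    # Pass a list of column names to build a hierarchical index
    print(df.set_index(["region", "province"]))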

result:

    Before resetting index:
              region      province      city        time  index  weight
    0      Northwest     Guangdong     Xi'an  2019-09-07     87    0.65
    1    South China       Beijing  Shenzhen  2019-09-08     87    0.34
    2    North China         Hubei   Beijing  2019-09-09     45    1.23
    3  Central China  Heilongjiang     Wuhan  2019-09-10     21    8.90
    After resetting index:
                                    city        time  index  weight
    region        province
    Northwest     Guangdong        Xi'an  2019-09-07     87    0.65
    South China   Beijing       Shenzhen  2019-09-08     87    0.34
    North China   Hubei          Beijing  2019-09-09     45    1.23
    Central China Heilongjiang     Wuhan  2019-09-10     21    8.90
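As a rough illustration of what a hierarchical index allows, the sketch below rebuilds the table in memory (the rows follow the printed result, with the time and index columns left out for brevity) and selects rows by one or both index levels with `.loc`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "region": ["Northwest", "South China", "North China", "Central China"],
        "province": ["Guangdong", "Beijing", "Hubei", "Heilongjiang"],
        "city": ["Xi'an", "Shenzhen", "Beijing", "Wuhan"],
        "weight": [0.65, 0.34, 1.23, 8.90],
    }
)
multi = df.set_index(["region", "province"])

# A tuple with one label per level addresses a single row.
city = multi.loc[("South China", "Beijing"), "city"]

# A label for the outer level alone selects every row in that region.
north = multi.loc["North China"]

print(city)
print(north)
```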

2.3 Index Renaming

When renaming an index, the parameters should be passed in the dictionary form of {original index name: new index name}.

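    import pandas as pd

    df = pd.read_excel(r'C:\Users\admin\Desktop\data_test.xlsx')
    print("Before renaming index:")
    print(df)
    print("After renaming index:")
    # The new names here are illustrative
    print(df.rename(columns={"region": "Region", "province": "Province"}))

result:

    Before renaming index:
              region      province      city        time  index  weight
    0      Northwest     Guangdong     Xi'an  2019-09-07     87    0.65
    1    South China       Beijing  Shenzhen  2019-09-08     87    0.34
    2    North China         Hubei   Beijing  2019-09-09     45    1.23
    3  Central China  Heilongjiang     Wuhan  2019-09-10     21    8.90
    After renaming index:
              Region      Province      city        time  index  weight
    0      Northwest     Guangdong     Xi'an  2019-09-07     87    0.65
    1    South China       Beijing  Shenzhen  2019-09-08     87    0.34
    2    North China         Hubei   Beijing  2019-09-09     45    1.23
    3  Central China  Heilongjiang     Wuhan  2019-09-10     21    8.90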

Note: Column names are renamed most often, so the example above renames columns. Renaming row labels works the same way: just pass index instead of columns.
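A short sketch of the row-label case, using an in-memory table whose labels and new names are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"weight": [0.65, 0.34]}, index=["1", "2"])

# Same dictionary form, passed through the index parameter instead of columns.
renamed = df.rename(index={"1": "first", "2": "second"})

print(renamed)
print(list(df.index))  # the original table keeps its old labels
```

Like `set_index()`, `rename()` returns a new table by default and leaves the original unchanged.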

2.4 Index Reset

Index reset is mainly used with hierarchically indexed tables. It turns the index levels back into ordinary columns.

Let’s look at the situation before index reset:

    import pandas as pd

    df = pd.read_excel(r'C:\Users\admin\Desktop\data_test.xlsx')
    df2 = df.set_index(["region", "city"])
    print(df2)

result:

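                                province        time  index  weight
    region        city
    Northwest     Xi'an        Guangdong  2019-09-07     87    0.65
    South China   Shenzhen       Beijing  2019-09-08     87    0.34
    North China   Beijing          Hubei  2019-09-09     45    1.23
    Central China Wuhan     Heilongjiang  2019-09-10     21    8.90

Printing the index itself:

    print(df2.index)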

result:

    MultiIndex([('Northwest', "Xi'an"),
                ('South China', 'Shenzhen'),
                ('North China', 'Beijing'),
                ('Central China', 'Wuhan')],
               names=['region', 'city'])

As you can see, this is a hierarchical index consisting of two columns: region and city.

The reset_index() method is used to reset the index. The common parameters are described as follows:

level: specifies which index level(s) to convert back to columns. In a hierarchical index (that is, a table with several index levels), the first level is level 0, the second is level 1, and so on. By default, all levels are converted to columns.

drop: specifies whether to discard the index instead of inserting it as new columns. The default is False, meaning the index is kept as columns.

inplace: specifies whether to modify the original table in place. The default is False, meaning a new table is returned and the original is left unchanged.
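A brief sketch of the inplace parameter, using a small in-memory table with illustrative values in place of the Excel file:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "region": ["Northwest", "South China"],
        "city": ["Xi'an", "Shenzhen"],
        "weight": [0.65, 0.34],
    }
)
df2 = df.set_index(["region", "city"])

# Default (inplace=False): df2 is untouched and a new table is returned.
flat = df2.reset_index()
print(df2.index.nlevels)  # df2 still has its two index levels

# inplace=True: df2 itself is modified and the call returns None.
result = df2.reset_index(inplace=True)
print(result)
print(list(df2.columns))
```

Assigning the return value (`df2 = df2.reset_index()`) gives the same end state and is often easier to read than inplace=True.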

The effects of these parameters are illustrated one by one:

2.4.1 The level parameter

(1) Default case

    import pandas as pd

    df = pd.read_excel(r'C:\Users\admin\Desktop\data_test.xlsx')
    df2 = df.set_index(["region", "city"])
    print(df2.reset_index())

result:

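              region      city      province        time  index  weight
    0      Northwest     Xi'an     Guangdong  2019-09-07     87    0.65
    1    South China  Shenzhen       Beijing  2019-09-08     87    0.34
    2    North China   Beijing         Hubei  2019-09-09     45    1.23
    3  Central China     Wuhan  Heilongjiang  2019-09-10     21    8.90

Printing the index after the reset:

    print(df2.reset_index().index)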

result:

    RangeIndex(start=0, stop=4, step=1)

Comparing this with the table before the reset shows that, by default, reset_index() restores the initial state: the manually set index levels become ordinary columns again and the default integer index comes back.

(2) Specifying a value

    import pandas as pd

    df = pd.read_excel(r'C:\Users\admin\Desktop\data_test.xlsx')
    df2 = df.set_index(["region", "city"])
    # Only level 0 (region) is moved back to the columns
    print(df2.reset_index(level=0))

result:

                     region      province        time  index  weight
    city
    Xi'an         Northwest     Guangdong  2019-09-07     87    0.65
    Shenzhen    South China       Beijing  2019-09-08     87    0.34
    Beijing     North China         Hubei  2019-09-09     45    1.23
    Wuhan     Central China  Heilongjiang  2019-09-10     21    8.90

Printing the remaining index:

    print(df2.reset_index(level=0).index)

result:

    Index(["Xi'an", 'Shenzhen', 'Beijing', 'Wuhan'], dtype='object', name='city')

Compared with the default case, only the region level has been turned back into a column; the city level is still the index. In other words, when there are several index levels, level selects the one to move back to the columns, and its value is the position of that level minus 1. For example, to remove the third index level, pass level=2.
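Besides a number, `level` also accepts a level name or a list of levels. The sketch below uses an in-memory table with illustrative values to compare these forms.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "region": ["Northwest", "South China"],
        "city": ["Xi'an", "Shenzhen"],
        "weight": [0.65, 0.34],
    }
)
df2 = df.set_index(["region", "city"])

by_number = df2.reset_index(level=1)      # second level, counted from 0
by_name = df2.reset_index(level="city")   # the same level, given by name
both = df2.reset_index(level=["region", "city"])  # several levels at once

print(by_name)
print(both.index)
```

Using the level name avoids the off-by-one mistake described above, at the cost of depending on the index names being set.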

2.4.2 The drop parameter

    import pandas as pd

    df = pd.read_excel(r'C:\Users\admin\Desktop\data_test.xlsx')
    df2 = df.set_index(["region", "city"])
    # drop=True discards the index levels instead of restoring them as columns
    print(df2.reset_index(drop=True))

result:

           province        time  index  weight
    0     Guangdong  2019-09-07     87    0.65
    1       Beijing  2019-09-08     87    0.34
    2         Hubei  2019-09-09     45    1.23
    3  Heilongjiang  2019-09-10     21    8.90

With drop=True, the region and city index levels are discarded rather than restored as columns, so they are missing from the printed result.