This is the fourth day of my participation in the Gwen Challenge.

Author: Cola

Source: Cola's Path to Data Analysis

Please contact the author for permission to reprint (WeChat ID: data_COLA).

The previous article covered the Series, which can be thought of as a single column of data in Excel without a column name. So what in Python corresponds to an Excel table made of rows and columns? That is today's topic: the DataFrame.

A DataFrame is a two-dimensional data structure consisting of a set of values and a pair of indexes (a row index and a column index). It can be viewed as a table in Excel. Unlike a Series, which is a single column, a DataFrame can have multiple columns.

1. Create

The first step is to import the pandas module, conventionally under the alias pd.

Created from a list

import pandas as pd
# list1 is a plain Python list; its contents are not shown here
df_l = pd.DataFrame(list1)
df_l


The first column, [0, 1, 2, 3], is the row index, which plays the role of the row numbers in Excel. The first row is the column index, which plays the role of the column headers in Excel. By default, both indexes count up from 0. The area in the middle, excluding the row and column indexes, holds the values.
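As a quick check of the default indexes described above, here is a minimal sketch. Because the contents of list1 are not shown, the four values below are hypothetical; any flat list of four items should behave the same way.

```python
import pandas as pd

# hypothetical stand-in for list1: any flat list of four values
list1 = [10, 20, 30, 40]
df_l = pd.DataFrame(list1)

# the row index counts up from 0, one label per list item
print(df_l.index.tolist())    # [0, 1, 2, 3]
# a flat list becomes a single column whose default label is 0
print(df_l.columns.tolist())  # [0]
# the values area: 4 rows, 1 column
print(df_l.shape)             # (4, 1)
```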

Created from a dictionary

# create from a dictionary
dict1 = {"name": ["Tony", "Nancy", "Judy", "Cindy"],
         "age": [16, 17, 18, 15],
         "sex": ["male", "female", "female", "female"]}
df_d = pd.DataFrame(dict1)
df_d


When a DataFrame is created from a dictionary, each key becomes a column name by default, and the list stored under that key becomes the column's values.
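The sketch below, using the same dict1, checks that the keys become column names in their original order. It also shows one consequence of building the frame column by column: all the lists must have the same length, otherwise pandas raises a ValueError. The two-item dictionary used for that check is hypothetical.

```python
import pandas as pd

dict1 = {"name": ["Tony", "Nancy", "Judy", "Cindy"],
         "age": [16, 17, 18, 15],
         "sex": ["male", "female", "female", "female"]}
df_d = pd.DataFrame(dict1)

print(df_d.columns.tolist())  # ['name', 'age', 'sex']
print(df_d.shape)             # (4, 3)

# lists of different lengths cannot form aligned columns
try:
    pd.DataFrame({"name": ["Tony", "Nancy"], "age": [16]})
except ValueError as err:
    print("ValueError:", err)
```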

Created from a nested list

A nested list, as the name suggests, is a list of lists, and it can also be used to create a DataFrame. A dictionary builds the DataFrame column by column from its key-value pairs, whereas a nested list builds it row by row.

# create from a nested list: each inner list becomes one row
list2 = [["Jane", 15, 101], ["David", 18, 103], ["Peter", 16, 102]]
df1 = pd.DataFrame(list2)
df1


The columns parameter specifies the column index, and the index parameter specifies the row index.

# the same nested list as above
list2 = [["Jane", 15, 101], ["David", 18, 103], ["Peter", 16, 102]]
df1 = pd.DataFrame(list2, index=[1, 2, 3], columns=["name", "age", "num"])
df1

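Here is a runnable version of both nested-list examples, assuming list2 holds the three rows shown in the df1 example. It shows that each inner list becomes a row, and that index and columns replace the default 0-based labels.

```python
import pandas as pd

list2 = [["Jane", 15, 101], ["David", 18, 103], ["Peter", 16, 102]]

# no labels given: each inner list becomes one row, labels default to 0, 1, 2
df_plain = pd.DataFrame(list2)
print(df_plain.shape)             # (3, 3)
print(df_plain.columns.tolist())  # [0, 1, 2]

# explicit labels for the rows (index) and the columns (columns)
df1 = pd.DataFrame(list2, index=[1, 2, 3], columns=["name", "age", "num"])
print(df1.index.tolist())         # [1, 2, 3]
print(df1.columns.tolist())       # ['name', 'age', 'num']
print(df1.loc[2, "name"])         # David
```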

2. Query

Querying means accessing the row and column data of a DataFrame.

2.1 Select rows

Excel has no dedicated syntax for selecting rows; you simply select them with the mouse. In pandas, you can select one or more rows with either loc or iloc. loc takes the label (name) of the row in the index, while iloc takes the row's absolute position.

Select a row

To select the second row of df1, use df1.loc[2], where 2 is the row-index label of the second row.

# access row 2 of df1 by its label
df1.loc[2]


With iloc, write df1.iloc[1] instead. Remember that positions always start at 0, so the absolute position of the second row is 1.

df1.iloc[1]

The result is the same as with loc.
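To confirm that the two calls return the same row, this sketch rebuilds df1 from the nested-list example. Because its row labels are 1, 2, 3 rather than 0, 1, 2, label 2 and position 1 point to the same row.

```python
import pandas as pd

df1 = pd.DataFrame([["Jane", 15, 101], ["David", 18, 103], ["Peter", 16, 102]],
                   index=[1, 2, 3], columns=["name", "age", "num"])

by_label = df1.loc[2]      # the row whose index label is 2
by_position = df1.iloc[1]  # the second row, counting positions from 0

print(by_label.equals(by_position))  # True
print(by_label["name"])              # David
# a single row comes back as a Series indexed by the column names
print(type(by_label).__name__)       # Series
```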

Select several rows

To select several consecutive rows, slice with iloc by absolute position.

# select the first 2 rows
df1.iloc[:2]


To pick rows that are not consecutive, pass a list instead of a slice: with iloc the list holds the absolute positions of the rows, and with loc it holds their index labels.

df1.iloc[[0, 2]]  # first and third rows, by position
df1.loc[[1, 3]]   # the same rows, by label

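A minimal sketch of the three multi-row selections, again rebuilding df1 from the nested-list example. The positional list [0, 2] and the label list [1, 3] should pick out the same two rows.

```python
import pandas as pd

df1 = pd.DataFrame([["Jane", 15, 101], ["David", 18, 103], ["Peter", 16, 102]],
                   index=[1, 2, 3], columns=["name", "age", "num"])

first_two = df1.iloc[:2]          # positions 0 and 1; the stop position 2 is excluded
picked_by_pos = df1.iloc[[0, 2]]  # first and third rows, by position
picked_by_lab = df1.loc[[1, 3]]   # the same two rows, by label

print(first_two["name"].tolist())           # ['Jane', 'David']
print(picked_by_lab["name"].tolist())       # ['Jane', 'Peter']
print(picked_by_pos.equals(picked_by_lab))  # True
```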

2.2 Select columns

In Excel, apart from selecting columns by a condition, you usually select them with the mouse. In pandas there are two ways to select a column, and either one works: data_frame["column_name"] or data_frame.column_name.

Select a column

df.column_name is equivalent to df["column_name"].

# select the name column (two equivalent ways)
df1["name"]
df1.name


Note that selecting a single column this way returns a Series, not a DataFrame. To get a DataFrame instead, wrap the column name in an extra pair of brackets, as in df1[["name"]].
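A short sketch of the difference, using the same df1: both access styles return the same Series, while double brackets return a one-column DataFrame. One caveat worth knowing: attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method (a column called count, for example, would need brackets).

```python
import pandas as pd

df1 = pd.DataFrame([["Jane", 15, 101], ["David", 18, 103], ["Peter", 16, 102]],
                   index=[1, 2, 3], columns=["name", "age", "num"])

col_bracket = df1["name"]   # bracket access
col_attr = df1.name         # attribute access to the same column
one_col_df = df1[["name"]]  # extra brackets: a one-column DataFrame

print(type(col_bracket).__name__, col_bracket.equals(col_attr))  # Series True
print(type(one_col_df).__name__, one_col_df.shape)               # DataFrame (3, 1)
```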

Select several columns

As with rows, to select several columns, put their names in a list inside the brackets.

df1[["name", "num"]]

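The list inside the brackets also fixes the order of the returned columns, and every name in it must exist. In this sketch, score is a column that df1 does not have yet, so the lookup raises a KeyError.

```python
import pandas as pd

df1 = pd.DataFrame([["Jane", 15, 101], ["David", 18, 103], ["Peter", 16, 102]],
                   index=[1, 2, 3], columns=["name", "age", "num"])

subset = df1[["num", "name"]]   # columns come back in the order listed
print(subset.columns.tolist())  # ['num', 'name']

try:
    df1[["name", "score"]]      # "score" is not a column of df1 (yet)
except KeyError as err:
    print("KeyError:", err)
```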

2.3 Locating rows and columns at the same time

Locating with loc

df.loc[row_index, column_index] locates data by row label and column label.

df1.loc[[1, 3], ["name", "age"]]


[1, 3] is the row index, given as a list, and selects the rows labeled 1 and 3; ["name", "age"] is the column index and selects the columns named name and age.
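When both the row part and the column part are single labels, loc returns a single value rather than a DataFrame. A sketch with the same df1:

```python
import pandas as pd

df1 = pd.DataFrame([["Jane", 15, 101], ["David", 18, 103], ["Peter", 16, 102]],
                   index=[1, 2, 3], columns=["name", "age", "num"])

cell = df1.loc[2, "age"]                  # single row label, single column label
block = df1.loc[[1, 3], ["name", "age"]]  # lists of labels on both axes

print(cell)                    # 18
print(block.shape)             # (2, 2)
print(block["name"].tolist())  # ['Jane', 'Peter']
```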

You can also use a colon (an empty slice) to get all rows:

df1.loc[:, ["name", "num"]]


The colon on the left means all rows, and the list on the right selects the name and num columns.

All columns can be fetched in the same way:

df1.loc[[2, 3], :]


The : symbol can mean not only all rows or columns, but also a slice of rows or columns.

df1.loc[1:3, :]

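One detail of the last example deserves emphasis: a loc slice works on labels and includes both endpoints, so df1.loc[1:3, :] returns the rows labeled 1, 2, and 3. The sketch below checks this along with the two colon forms above.

```python
import pandas as pd

df1 = pd.DataFrame([["Jane", 15, 101], ["David", 18, 103], ["Peter", 16, 102]],
                   index=[1, 2, 3], columns=["name", "age", "num"])

all_rows = df1.loc[:, ["name", "num"]]  # ":" on the row side: every row
all_cols = df1.loc[[2, 3], :]           # ":" on the column side: every column
label_slice = df1.loc[1:3, :]           # label slice: labels 1 and 3 are both included

print(all_rows.shape)              # (3, 2)
print(all_cols.shape)              # (2, 3)
print(label_slice.index.tolist())  # [1, 2, 3]
```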

Locating with iloc

iloc locates elements by absolute position, and both row and column positions start at 0. Compared with loc: 1 and 3 in loc are row-index labels, while 0 and 2 in iloc are the positions of the rows labeled 1 and 3. Similarly, the name and age columns are at positions 0 and 1.

# locate with loc (labels)
df1.loc[[1, 3], ["name", "age"]]
# locate with iloc (positions)
df1.iloc[[0, 2], [0, 1]]

Like loc, iloc also accepts the colon, both for all rows or columns and for slices.

# all rows of the name and num columns
df1.loc[:, ["name", "num"]]
df1.iloc[:, [0, 2]]  # iloc method


# all columns of the rows labeled 2 and 3
df1.loc[[2, 3], :]
df1.iloc[[1, 2], :]  # iloc method


# slice the rows: labels 1 to 3 with loc, positions 0 to 2 with iloc
df1.loc[1:3, :]
df1.iloc[0:3, :]  # iloc method


An iloc slice is closed on the left and open on the right, so the right endpoint is excluded. 0:3 starts at position 0 (the first row) and stops before position 3 (the fourth row), so the first three rows are returned. A loc slice, by contrast, includes both endpoint labels, which is why df1.loc[1:3, :] also returns three rows. The slicing rules for iloc are the same as for positional slicing of a Series.
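The contrast between the two slicing rules can be checked directly. In this sketch, the positional slice 0:3 and the label slice 1:3 select the same three rows of df1, and 0:2 and 1:2 also agree even though their stop values mean different things.

```python
import pandas as pd

df1 = pd.DataFrame([["Jane", 15, 101], ["David", 18, 103], ["Peter", 16, 102]],
                   index=[1, 2, 3], columns=["name", "age", "num"])

by_pos = df1.iloc[0:3, :]  # positions 0, 1, 2; position 3 is excluded
by_lab = df1.loc[1:3, :]   # labels 1, 2, 3; label 3 is included
print(by_pos.equals(by_lab))  # True

print(df1.iloc[0:2].index.tolist())  # [1, 2]  (stop position 2 excluded)
print(df1.loc[1:2].index.tolist())   # [1, 2]  (stop label 2 included)

# positional equivalent of df1.loc[[1, 3], ["name", "age"]]
same = df1.iloc[[0, 2], [0, 1]].equals(df1.loc[[1, 3], ["name", "age"]])
print(same)  # True
```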

3. Add

3.1 Insert rows

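As with a Series, to add rows to a DataFrame you create a new DataFrame and then stack the two vertically, again using the append method. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions use pd.concat instead.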

# append a row (DataFrame.append exists only in pandas < 2.0)
df2 = pd.DataFrame({"name": ["Jane"], "age": [16], "sex": ["female"]})
df_d.append(df2, ignore_index=True)


Besides append, which stacks tables vertically to insert rows, there is the concat function. pd.concat is a top-level pandas function, and here it stacks the two DataFrames vertically. By default the result keeps each DataFrame's original index, so the new row would keep its own label 0. Setting ignore_index = True discards the original indexes and creates a new one.

pd.concat([df_d,df2],ignore_index = True)

The result is the same as with append.
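Since DataFrame.append no longer exists in pandas 2.0 and later, pd.concat is the portable way to add rows. This sketch rebuilds df_d and df2 from the examples above and shows what ignore_index changes.

```python
import pandas as pd

df_d = pd.DataFrame({"name": ["Tony", "Nancy", "Judy", "Cindy"],
                     "age": [16, 17, 18, 15],
                     "sex": ["male", "female", "female", "female"]})
df2 = pd.DataFrame({"name": ["Jane"], "age": [16], "sex": ["female"]})

kept = pd.concat([df_d, df2])                      # keeps each frame's own index
fresh = pd.concat([df_d, df2], ignore_index=True)  # builds a new 0..n-1 index

print(kept.index.tolist())   # [0, 1, 2, 3, 0]
print(fresh.index.tolist())  # [0, 1, 2, 3, 4]
print(fresh.loc[4, "name"])  # Jane
print(len(df_d))             # 4  (concat returns a new object; df_d is unchanged)
```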

3.2 Insert columns

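Assigning to a new column name adds the column at the end of the DataFrame. Note that df1["score"] cannot be replaced with df1.score here: attribute-style access works for reading an existing column, but not for creating a new one.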

df1["score"] = [85, 58, 99]
df1

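A sketch of both behaviours on a freshly built df1. The attribute name score_attr is hypothetical and used only to show that attribute assignment does not create a column; pandas emits a UserWarning in that case, which the sketch silences.

```python
import warnings
import pandas as pd

df1 = pd.DataFrame([["Jane", 15, 101], ["David", 18, 103], ["Peter", 16, 102]],
                   index=[1, 2, 3], columns=["name", "age", "num"])

# bracket assignment creates a new column at the end
df1["score"] = [85, 58, 99]
print(df1.columns.tolist())  # ['name', 'age', 'num', 'score']

# attribute assignment with a new name only sets a Python attribute on the object
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence pandas' UserWarning about this
    df1.score_attr = [1, 2, 3]
print("score_attr" in df1.columns)  # False
```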

The insert method lets you specify the position of the new column.

df1.insert(1, "score2", [77, 78, 79])
df1


The first argument to insert is the position of the new column (1 means it becomes the second column), the second is the column name (here score2), and the third is the values. insert modifies df1 in place and returns None, which is why df1 is displayed on a separate line.
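A minimal sketch confirming the column order after insert, and that insert works in place. It also shows that, by default, inserting a column name that already exists raises a ValueError.

```python
import pandas as pd

df1 = pd.DataFrame([["Jane", 15, 101], ["David", 18, 103], ["Peter", 16, 102]],
                   index=[1, 2, 3], columns=["name", "age", "num"])
df1["score"] = [85, 58, 99]

# insert changes df1 directly and returns None
returned = df1.insert(1, "score2", [77, 78, 79])
print(returned)              # None
print(df1.columns.tolist())  # ['name', 'score2', 'age', 'num', 'score']

# a duplicate column name is rejected unless allow_duplicates=True is passed
try:
    df1.insert(0, "score2", [0, 0, 0])
except ValueError as err:
    print("ValueError:", err)
```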

4. Delete

4.1 Delete rows

Use the drop method with the index parameter to specify the row to delete. index = 1 means drop the row whose index label is 1.

df1.drop(index=1)


Instead of the index parameter, you can also pass the label directly together with axis = 0, which means delete by row.

df1.drop(1,axis = 0)

The result is the same.
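Both forms return a new DataFrame and leave df1 itself unchanged unless inplace=True is passed; a list of labels drops several rows at once. A sketch on a freshly built df1:

```python
import pandas as pd

df1 = pd.DataFrame([["Jane", 15, 101], ["David", 18, 103], ["Peter", 16, 102]],
                   index=[1, 2, 3], columns=["name", "age", "num"])

by_keyword = df1.drop(index=1)  # drop the row labeled 1
by_axis = df1.drop(1, axis=0)   # same thing: a label plus the row axis

print(by_keyword.equals(by_axis))             # True
print(by_keyword.index.tolist())              # [2, 3]
print(df1.index.tolist())                     # [1, 2, 3]  (df1 is unchanged)
print(df1.drop(index=[1, 3]).index.tolist())  # [2]
```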

4.2 Delete columns

Pass the columns parameter to specify the columns to delete.

df1.drop(columns="num")


Alternatively, instead of the columns parameter, pass the column name directly with the axis = 1 argument.

df1.drop("num",axis = 1)
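As with rows, the two forms are equivalent and return a new DataFrame; a list removes several columns at once. A sketch:

```python
import pandas as pd

df1 = pd.DataFrame([["Jane", 15, 101], ["David", 18, 103], ["Peter", 16, 102]],
                   index=[1, 2, 3], columns=["name", "age", "num"])

by_keyword = df1.drop(columns="num")  # drop the num column
by_axis = df1.drop("num", axis=1)     # same thing: a label plus the column axis

print(by_keyword.equals(by_axis))                         # True
print(by_keyword.columns.tolist())                        # ['name', 'age']
print(df1.drop(columns=["age", "num"]).columns.tolist())  # ['name']
```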

5. Modify

To modify values, use replace(A, B), which replaces the value A with B. For example, select the age column, replace the value 15 with 25, and then display df1. The inplace = True parameter applies the change directly instead of returning a modified copy.

# one-to-one replacement
df1["age"].replace(15, 25, inplace=True)
df1

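A caveat on the inplace=True calls in this section: df1["age"] first selects a column, and replace then modifies that intermediate object, which is chained assignment. This works in older pandas, but recent 2.x releases warn about it, and under Copy-on-Write (which pandas 3.0 makes the default) it no longer changes df1. A version-independent sketch assigns the returned Series back to the column instead:

```python
import pandas as pd

df1 = pd.DataFrame([["Jane", 15, 101], ["David", 18, 103], ["Peter", 16, 102]],
                   index=[1, 2, 3], columns=["name", "age", "num"])

# replace returns a new Series; assigning it back updates df1 in any pandas version
df1["age"] = df1["age"].replace(15, 25)
print(df1["age"].tolist())  # [25, 18, 16]
```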

The previous example replaced one value with another. What if you want to replace both 18 and 16 with 26? Put 18 and 16 in a list and replace them with 26.

# many-to-one replacement
df1["age"].replace([18, 16], 26, inplace=True)
df1

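Continuing the assign-back style, this sketch applies both replacements in the order used in this section, starting from the original ages 15, 18, and 16. Every value in the list maps to the single replacement value.

```python
import pandas as pd

df1 = pd.DataFrame([["Jane", 15, 101], ["David", 18, 103], ["Peter", 16, 102]],
                   index=[1, 2, 3], columns=["name", "age", "num"])

df1["age"] = df1["age"].replace(15, 25)        # one-to-one, as before
df1["age"] = df1["age"].replace([18, 16], 26)  # many-to-one: each listed value becomes 26
print(df1["age"].tolist())  # [25, 26, 26]
```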

Finally, suppose the values 101, 102, and 103 in the num column should be replaced with 1001, 1002, and 1003 respectively. This is where a dictionary comes in handy: each key is an old value and each value is its replacement.

# many-to-many replacement
df1["num"].replace({101: 1001, 102: 1002, 103: 1003}, inplace=True)
df1

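A sketch of the dictionary form using assignment rather than inplace. It also shows DataFrame.replace with a nested dictionary {column: {old: new}}, which restricts the replacement to the named column without selecting it first; the name mapping is just a local variable for illustration.

```python
import pandas as pd

df1 = pd.DataFrame([["Jane", 15, 101], ["David", 18, 103], ["Peter", 16, 102]],
                   index=[1, 2, 3], columns=["name", "age", "num"])
mapping = {101: 1001, 102: 1002, 103: 1003}  # old value -> new value

# Series.replace with a dict: each key is swapped for its value
df1_col = df1.copy()
df1_col["num"] = df1_col["num"].replace(mapping)
print(df1_col["num"].tolist())  # [1001, 1003, 1002]

# DataFrame.replace with a nested dict only touches the num column
df1_nested = df1.replace({"num": mapping})
print(df1_nested["num"].tolist())  # [1001, 1003, 1002]
```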