Notes on the pandas DataFrame data structure

1. Normalize the data (min-max scaling to [0, 1]):

import numpy as np
import pandas as pd

data = pd.read_excel('data_shiyan.xlsx', header=0)

# Rescale every column to [0, 1]: (x - min) / (max - min)
data = data.apply(lambda x: (x - np.min(x)) / (np.max(x) - np.min(x)))
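A small self-contained check of the scaling above. It uses a made-up DataFrame in place of data_shiyan.xlsx, which is an assumption since the file isn't available here. It also shows a vectorized form that skips apply; both should give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for data_shiyan.xlsx
data = pd.DataFrame({"x": [1.0, 2.0, 3.0, 5.0], "y": [10.0, 20.0, 20.0, 40.0]})

# Min-max scaling applied column by column
normalized = data.apply(lambda col: (col - col.min()) / (col.max() - col.min()))

# Vectorized equivalent: data.min() / data.max() are per-column Series
# that broadcast across all rows
normalized_vec = (data - data.min()) / (data.max() - data.min())

print(normalized)
```

A column whose max equals its min divides 0 by 0 and comes out as NaN, so constant columns may need special handling.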

2. Get the number of rows and columns:

row, col = data.shape  # shape is (number of rows, number of columns)
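A minimal sketch with a hypothetical 3 x 2 frame, showing that shape lists rows first, plus the equivalent len() calls:

```python
import pandas as pd

# Hypothetical frame: 3 rows, 2 columns
data = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

row, col = data.shape       # (rows, columns)
print(row, col)             # 3 2

# Each length on its own
n_rows = len(data)          # same as len(data.index)
n_cols = len(data.columns)
```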

3. Indexing with loc and iloc (adapted from www.cnblogs.com/nxf-rabbit7…)

# loc selects by index and column labels; it cannot use integer positions
df.loc['one', 'a']                # row 'one', column 'a'
df.loc['one':'two', 'a']          # rows 'one' to 'two', column 'a'
df.loc['one':'two', 'a':'c']      # rows 'one' to 'two', columns 'a' to 'c'
df.loc['one':'two', ['a', 'c']]   # rows 'one' to 'two', columns 'a' and 'c'
iloc accepts only integer positions, not index labels:
df.iloc[0:2]                  # first two rows (positions 0 and 1)
df.iloc[0]                    # row at position 0
df.iloc[0:2, 0:2]             # rows 0 and 1, columns 0 and 1
df.iloc[[0, 2], [1, 2, 3]]    # rows 0 and 2, columns 1, 2 and 3
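The sketch below builds a hypothetical frame with row labels 'one', 'two', 'three' and columns 'a' to 'd' so that selections like the ones above can actually run. One difference is worth noting: label slices in loc include the end label, while position slices in iloc exclude the stop position, as Python lists do.

```python
import pandas as pd

# Hypothetical frame matching the labels used above
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    index=["one", "two", "three"],
    columns=["a", "b", "c", "d"],
)

# loc: label slices include BOTH endpoints -> 2 rows x 3 columns
by_label = df.loc["one":"two", "a":"c"]

# iloc: position slices EXCLUDE the stop position -> 2 rows x 2 columns
by_position = df.iloc[0:2, 0:2]

# The same single cell addressed both ways
single_label = df.loc["one", "a"]
single_position = df.iloc[0, 0]

# Lists of positions pick specific rows/columns
picked = df.iloc[[0, 2], [1, 2, 3]]
```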

4. Create arrays with NumPy (similar to lists)

import numpy as np

n, m = 3, 4                # example sizes
arr1 = np.zeros(n)         # 1-D array holding n values, shape (n,)
arr2 = np.zeros((n, m))    # matrix with n rows and m columns

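To illustrate the "like a list" comparison, here is a short sketch with arbitrary values that converts a list to an array and reshapes it. Note that the same operator behaves differently on the two types:

```python
import numpy as np

values = [1, 2, 3, 4, 5, 6]     # a plain Python list
arr = np.array(values)          # 1-D array, shape (6,)
matrix = arr.reshape(2, 3)      # same data viewed as 2 rows x 3 columns

# Unlike a list, arithmetic on an array is element-wise
repeated = values * 2           # list repetition: 12 items
doubled = arr * 2               # element-wise: 2, 4, ..., 12
```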

5. Working with time series (see the blog post www.programiz.com/python-prog…)
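Since the linked post is truncated, here is a hedged pandas-side sketch of a few common time-series steps: parsing dates, building a DatetimeIndex, resampling and slicing by date strings. The data is made up, and this is not taken from the linked article.

```python
import pandas as pd

# Hypothetical daily readings for six days
idx = pd.date_range("2021-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Parse a date string into a Timestamp
day = pd.to_datetime("2021-01-15")

# Aggregate into two-day buckets (resample needs a datetime-like index)
two_day = ts.resample("2D").sum()

# Slice by date strings; both endpoints are included
middle = ts.loc["2021-01-02":"2021-01-04"]
```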

6. Changing the legend labels of seaborn's lineplot() when hue is used (see Stackoverflow.com/questions/5…)

The thread suggests there is no clean built-in option. A workaround from the question is to get the legend object and rename its entries with set_text():

legend = ax.legend()                        # ax: the Axes returned by sns.lineplot(...)
legend.texts[0].set_text("Whatever else")   # rename the first legend entry
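A runnable sketch of the same workaround using plain matplotlib. seaborn is not needed for the legend part, because sns.lineplot returns a matplotlib Axes. The two lines and their labels are hypothetical stand-ins for what hue would produce.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Two labelled lines, standing in for the groups hue would create
ax.plot([0, 1, 2], [0, 1, 4], label="group_a")
ax.plot([0, 1, 2], [0, 2, 3], label="group_b")

legend = ax.legend()
legend.texts[0].set_text("Whatever else")   # rename only the first entry

labels = [t.get_text() for t in legend.get_texts()]
print(labels)   # ['Whatever else', 'group_b']
plt.close(fig)
```

If every label needs changing, another common approach is `handles, _ = ax.get_legend_handles_labels()` followed by `ax.legend(handles, new_labels)`.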

7. Merging tables with pd.merge()

The join type is chosen with how: 'inner', 'left' or 'right' (pandas also supports 'outer'). The on argument names the column to join on:

pd.merge(left_df, right_df, how='inner', on='key')  # left_df, right_df and 'key' are placeholders
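A minimal sketch of the join types, using two hypothetical tables that share a key column:

```python
import pandas as pd

# Hypothetical tables sharing a 'key' column
left_df = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right_df = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

merged_inner = pd.merge(left_df, right_df, how="inner", on="key")  # keys in both: b, c
merged_left = pd.merge(left_df, right_df, how="left", on="key")    # all left keys; y is NaN for a
merged_right = pd.merge(left_df, right_df, how="right", on="key")  # all right keys; x is NaN for d
merged_outer = pd.merge(left_df, right_df, how="outer", on="key")  # union of keys: a, b, c, d
```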

8. To compute each row's share of a group total (for example, each line item's share of its order), use groupby(...).transform('sum') to put the group total on every row, then divide (see Pbpython.com/pandas_tran…)

df["Order_Total"] = df.groupby('order')["ext price"].transform('sum')  # total of each order, repeated on every row of that order
df["Percent_of_Order"] = df["ext price"] / df["Order_Total"]
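A self-contained version with made-up order lines (the column names follow the snippet above). The point of transform, compared with a plain groupby().sum(), is that its result has the same length as df, so it can be assigned directly as a new column:

```python
import pandas as pd

# Hypothetical order lines
df = pd.DataFrame({
    "order": [1, 1, 2, 2, 2],
    "ext price": [30.0, 10.0, 5.0, 5.0, 10.0],
})

# groupby().sum() collapses to one row per order
per_order = df.groupby("order")["ext price"].sum()

# transform('sum') keeps one value per original row, aligned to df's index
df["Order_Total"] = df.groupby("order")["ext price"].transform("sum")
df["Percent_of_Order"] = df["ext price"] / df["Order_Total"]
```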

9. If a calculation needs values from several columns, use data.apply(func, axis=1). With axis=1 the function is applied to each row (it receives one row at a time), not to each column.
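A small sketch with hypothetical columns, contrasting axis=1 (row by row) with the default axis=0 (column by column):

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"price": [2.0, 3.0], "qty": [5, 4], "discount": [0.0, 1.0]})

# axis=1: the function receives one row at a time, as a Series indexed by column name
df["total"] = df.apply(lambda r: r["price"] * r["qty"] - r["discount"], axis=1)

# axis=0 (the default): the function receives one column at a time
col_max = df.apply(lambda c: c.max())
```

For simple arithmetic like this, the vectorized form `df["price"] * df["qty"] - df["discount"]` is usually much faster than apply.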