Import the XLSX file using the read_excel() method

Import pandas as pd df = pd.read_excel(r 'c :\Users\admin\Desktop\ XLSX ') print(df)Copy the code

A few points to note:

1. Install dependent libraries

Run the above code directly, error:

Since Anaconda is not used locally and the PIP command is used directly, many other modules need to be installed.

Install OpenPyXL

Run again:

Date Classification Title Author 0 Historical records Sima Qian 1 Novel DREAM of the Red Chamber Cao Xueqin 2 Prose Culture Journey after the Autumn Rain 3 History of the Ming Dynasty those things the Moon 4 Comics half an hour comics mix Confucius 5 Essays Bacon essay Weight (kg)Copy the code

2. File path

The default file path on the PC is \. In this case, you need to add r before the path to avoid escaping \. Or you can replace all \ in the path with /. Plus r is recommended.

This rule also applies when importing files in other formats. I will not repeat this later.

3. Sheet selection

Data analysis test table. XLSX table contents are as follows:

3.1 Import the first sheet by default

Df = pd.read_excel(r 'c :\Users\admin\Desktop\ XLSX ') print(df)Copy the code

The result:

Classification title Author 0 Historical records Sima Qian 1 novel a Dream of red Mansions Cao Xueqin 2 Prose Culture Bitter Journey after autumn Rain 3 History of the Ming Dynasty those things the Moon 4 Comics half an hour comics mix Confucius Yue 5 Essays Bacon essays BaconCopy the code

3.2 Set the sheet_name parameter to specify the sheet content to be imported

3.2.1 Specify the imported Sheet by its name

Df = pd.read_excel(r 'c :\Users\admin\Desktop\ XLSX ', sheet_name=" locale ") print(df)Copy the code

result:

Region province city 0 Northeast Liaoning Dalian 1 Northwest Shaanxi Xi 'an 2 South China Guangdong Shenzhen 3 North China Beijing Beijing 4 Central China Hubei WuhanCopy the code

3.2.1 Specify the imported sheets by the order of the sheets

Df = pd.read_excel(r 'c :\Users\admin\Desktop\ XLSX ', sheet_name=1) print(df)Copy the code

result:

Region province city 0 Northeast Liaoning Dalian 1 Northwest Shaanxi Xi 'an 2 South China Guangdong Shenzhen 3 North China Beijing Beijing 4 Central China Hubei WuhanCopy the code

4. Index specification

4.1 Specifying a row index

When importing a local file into a DataFrame, the row index uses the default index starting from 0, which can be set by setting the index_col parameter. This parameter indicates the number of lines in the.xlsx file to index, counting from 0

4.1.1 Use the first column data in the table as the row index

Df = pd.read_excel(r 'c: Users\admin\Desktop\ XLSX ', sheet_name=1, index_col=0) print(df)Copy the code

result:

Province City region Northeast Liaoning Dalian Northwest Shaanxi Xi 'an South China Guangdong Shenzhen North China Beijing Beijing Central China Hubei WuhanCopy the code

4.1.2 Use the data in the second column of the table as the row index

Df = pd.read_excel(r 'c: Users\admin\Desktop\ XLSX ', sheet_name=1, index_col=1) print(df)Copy the code

result:

Regional city Province Northeast Liaoning Dalian Northwest Shaanxi Xi 'an Guangdong South China Shenzhen Beijing North China Beijing Hubei Central China WuhanCopy the code

4.1.3 Use the number starting from 0 by default as the row index

Df = pd.read_excel(r 'c: Users\admin\Desktop\ XLSX ', sheet_name=1, index_col=None) print(df)Copy the code

result:

Region province city 0 Northeast Liaoning Dalian 1 Northwest Shaanxi Xi 'an 2 South China Guangdong Shenzhen 3 North China Beijing Beijing 4 Central China Hubei WuhanCopy the code

4.2 Column index specification

When importing a local file into a DataFrame, the column index defaults to the first row (or header) of the metadata table as the column index, which can be set by setting the set header parameter. The header parameter defaults to 0, which means the first row is used as the column index. It could be any other line, as long as you pass in the number corresponding to the specific line. You can also use a number starting from 0 by default as an index

4.2.1 Adopt the first column data in the table as the column index

Df = pd.read_excel(r 'c :\Users\admin\Desktop\ XLSX ', sheet_name=1, header=0) print(df)Copy the code

result:

Region province city 0 Northeast Liaoning Dalian 1 Northwest Shaanxi Xi 'an 2 South China Guangdong Shenzhen 3 North China Beijing Beijing 4 Central China Hubei WuhanCopy the code

4.2.2 Use the data in the second column of the table as the column index

Df = pd.read_excel(r 'c :\Users\admin\Desktop\ XLSX ', sheet_name=1, header=1) print(df)Copy the code

result:

Northeast Liaoning Dalian 0 Northwest Shaanxi Xi 'an 1 South China Guangdong Shenzhen 2 North China Beijing Beijing 3 Central China Hubei WuhanCopy the code

4.2.3 Use numbers starting from 0 by default as column indexes

Df = pd.read_excel(r 'c :\Users\admin\Desktop\ XLSX ', sheet_name=1, header=None) print(df)Copy the code

result:

1 Northeast Liaoning Dalian 2 Northwest Shaanxi Xi 'an 3 South China Guangdong Shenzhen 4 North China Beijing Beijing 5 Central China Hubei WuhanCopy the code

5. Read data

5.1 Reading Row Data

5.1.1 Reading Data of a Row

Df = pd.read_excel(r 'c: Users\admin\Desktop\ XLSX ', sheet_name=1) data = df.loc[0]. Values print(data)Copy the code

This is reading the first row

The result:

[' Northeast ', 'Liaoning', 'Dalian ']Copy the code

5.1.2 Reading Multiple rows of Data

Note that the arguments in loC must be lists.

Df = pd.read_excel(r 'c: Users\admin\Desktop\ XLSX ', sheet_name=1) data = df.loc[[1, 2, 3]]. Values print(data)Copy the code

This is reading the second, third, and fourth rows of data

The result:

[[' northwest ' 'shaanxi' 'xi 'an] [' south' ' 'in guangdong' shenzhen '] [' north ' 'Beijing' 'Beijing']]Copy the code

5.1.3 Randomly obtain the contents of any several lines

Df = pd.read_excel(r 'c :\Users\admin\Desktop\ XLSX ', sheet_name=1) data = df.sample(2). Values print(data)Copy the code

result:

[[' North China ' 'Beijing '] [' Northwest' 'Shaanxi' 'Xi 'an ']]Copy the code

5.1.4 Read the first few rows of data

Df = pd.read_excel(r 'c :\Users\admin\Desktop\ XLSX ', sheet_name=1) print(df.head())Copy the code

result:

Region province city 0 Northeast Liaoning Dalian 1 Northwest Shaanxi Xi 'an 2 South China Guangdong Shenzhen 3 North China Beijing Beijing 4 Central China Hubei WuhanCopy the code

Head () reads the first five rows by default

Df = pd.read_excel(r 'c :\Users\admin\Desktop\ XLSX ', sheet_name=1) print(df.head(2))Copy the code

result:

Region province city 0 Northeast Liaoning Dalian 1 Northwest Shaanxi Xi 'anCopy the code

Head (2) reads the first two rows

5.2 Reading Column Data

5.2.1 Reading Single Column Data

Df = pd.read_excel(r'C: Users\admin\Desktop\ XLSX ', sheet_name=1) data = df[' province ']. Values print(data)Copy the code

result:

[' Liaoning ', 'Shaanxi', 'Guangdong', 'Beijing', 'Hubei ']Copy the code

Tips: This method cannot read the contents of multiple columns at the same time, but only one row at a time

5.2.2 Reading the specified row and column

Df = pd.read_excel(r 'c :\Users\admin\Desktop\ XLSX ', sheet_name=1) data = df.loc[[1, 3, 4],{" province "}. Values print(data)Copy the code

This reads the province data in the second row, the province data in the fourth row, and the province data in the fifth row

result:

[' Shaanxi '] [' Beijing '] [' Hubei ']Copy the code

5.2.3 Reading multi-row and multi-column data

Df = pd.read_excel(r 'c: Users\admin\Desktop\ XLSX ', sheet_name=1) data = df.loc[[1, 3, 4],{" province ", }]. Values print(data)Copy the code

This reads the province and city data in the second row, the province and city data in the fourth row, and the province and city data in the fifth row

result:

[[' Xi 'an '] [' Beijing '] [' Hubei ']]Copy the code

5.2.4 Get specified column data for all rows

Df = pd.read_excel(r 'c :\Users\admin\Desktop\ XLSX ', sheet_name=1) data = df.loc[:, {" province ", }]. Values print(data)Copy the code

This is reading the province and city data for each row in the table

result:

[[' liaoning ' 'dalian] [' shaanxi' 'xi 'an] [' guangdong' 'shenzhen'] [' Beijing ' 'Beijing'] [' hubei ' 'wuhan]]Copy the code

5.3 Obtaining line name output

Df = pd.read_excel(r 'c :\Users\admin\Desktop\ XLSX ', sheet_name=1) print(df.index. Values)Copy the code

result:

[0 1 2 3 4]
Copy the code

5.4 Obtaining column name output

Df = pd.read_excel(r 'c :\Users\admin\Desktop\ XLSX ', sheet_name=1) print(df.columns. Values)Copy the code

result:

[' region ', 'province', 'city ']Copy the code