This is the sixth day of my participation in the First Challenge 2022. For details: First Challenge 2022.

preface

A few days ago, a friend of mine asked me about learning Python and VBS. He wanted to automate his complicated Excel operation. He realized that both Python and VBS could achieve his goal, so he was wondering which one to learn? Since I’m a Python guy, I suggested Python on the spot.

In order to experience the simplicity and efficiency of Python in practice, we will enter the Excel world of Pandas.

The installation

pip install pandas
Copy the code

When you encounter installation errors, you can usually resolve them by upgrading PIP and setupTools.

practice

About the DataFrame

A DataFrame is an abstract data object (table type) in Pandas. Data in Excel can be converted to a DataFrame object.

DataFrame and Excel attribute comparison table

pandas Excel
DataFrame Sheet page
Series column
Index The line Numbers
row line
NaN An empty cell

Data input output

In Jupyter, we used Pandas to print a table with two rows and two columns.

Then we write the above data into excel file, and finally read the data of excel file and print it out.

Data limit

First we increase the data in the test table to 10 rows and 10 columns.

We can re-read the table to show only the first five rows.

If you want to specify an action column name (the starting line), you can use the header argument, which defaults to 0.

As you can see above, the data starts at row 5 and only shows the “first five rows”.

We can also show only data with columns 10 greater than 10.

You’ll see how simple and elegant it all is.

The data processing

If we want to manipulate the data in the table, we can do so. Let’s say I divide column 6 by 2.

The other operations are similar and will not be shown.

All of the above are column-based operations. How do you implement row operations? Simple, you can see that all operations on columns use column names, so that operations on rows use row indexes.

The operation

Show the third row of data.

Add 10 to all the data in row 4.

I’m sure you’ve got the rest of the operation figured out.

Operations on cells

In general, we may need to operate on a cell in addition to rows and columns.

DataFrame is used to locate cells. Pandas, however, provides more than one method. See the following figure for the differences.

methods explain
at Uniquely locate cells by row/column labels
loc Access cells through labels or arrays
iat Row/column cells are accessed through integer positions

Locate the cell in row 2, column 2.

Suppose we want to square data 4 on column 3, row 3.

The last

In addition, there are many other features that are more flexible than the original Excel: formulas, data formats, sorting, look-and-replace, and perspectives. I strongly recommend that any student who is interested in Pandas try it. It will surprise you.