preface

Pandas is often used recently to manipulate Excel files.

So, here is a brief note of the process you usually use.

There is no description for Pandas. It will continue to be updated. It is only a personal record

Build Data 1

There are only two Excel files for simplicity, and the content is super simple.

I’m not going to give you the data, but you can just create it.

1. Merge all Excel files in the specified directory

import pandas as pd 
import os

path = r'.\ Student scores'
Get all excel files in the specified folder
excels = [file for file in os.listdir(path) if ('.xls' in file) or ('.xlsx' in file)]
{sheet_name: sheet_data,... } the dictionary
all_dfs = []
print('---------------\t start reading data \t---------------')
for excel in excels:
    print(f'####\t{excel}')
    # converters={' student number ': STR} prevents the first 0 of the student number from being lost
    dfs = pd.read_excel('\ \'.join([path, excel]), sheet_name=None,
        converters={'student id': str})
    
    all_dfs.append(dfs)
print('---------------\t end reading data \t---------------')

print('---------------\t Start merging and writing data \t---------------')
writer = pd.ExcelWriter('\ \'.join([path, 'Student scores (Summary).xlsx']), mode='w')
for key in all_dfs[0].keys():
    print(f'####\t{key}')
    # merge excel data tables where sheet_name = key
    concat_df = pd.concat([df[key] for df in all_dfs])
    concat_df.to_excel(writer, sheet_name=key, index=False)
writer.save()
print('---------------\t End merge and write data \t---------------')
Copy the code

Effect:

Note: 1) Before writing 3-4 functions, read Excel files in the specified directory, read Excel data, merge data, etc., step by step, it is better to save time like this; 2) There are few key codes, you can remove all print(), this is just a personal habit, like to see the progress of running;

This code, which is for multiple sheets, is actually simpler if you want to get the merged data of only one sheet. For example: just want to merge Chinese result

import pandas as pd 
import os

path = r'.\ Student scores'
Get all excel files in the specified folder
excels = [file for file in os.listdir(path) if ('.xls' in file) or ('.xlsx' in file)]

all_dfs = []
for excel in excels:
    df = pd.read_excel('\ \'.join([path, excel]), sheet_name='Chinese',
        converters={'student id': str})
    all_dfs.append(df)

writer = pd.ExcelWriter('\ \'.join([path, 'Student result (Chinese).xlsx']), mode='w')
concat_df = pd.concat(all_dfs)
concat_df.to_excel(writer, sheet_name='Chinese', index=False)
writer.save()
Copy the code