This article is participating in “Python Theme Month”, check out: Python Writing season, and show off your Python article – 2000 yuan for a limited prize

A lifelong learner, practitioner, and sharer committed to the path of technology, an original blogger who is busy and sometimes lazy, and a teenager who is occasionally boring and sometimes humorous.

Welcome to search “Jge’s IT Journey” on wechat!

Merge multiple sheets, merge multiple workbooks, and split a table by column in Python

1. Explanation of relevant knowledge points

1.1 Libraries to be used

import numpy as np
import pandas as pd
import os
Copy the code

1.2 OS. Walk (PWD)

For example, the following

Let’s see what’s under “G:\a” first.

1.2.2 Code operations are as follows:
pwd = "G:\\a"
print(os.walk(pwd))
for i in os.walk(pwd):
    print(i)
for path,dirs,files in os.walk(pwd):
    print(files)
Copy the code
1.2.3 The results are as follows:
<generator object walk at 0x0000029BB5AEAB88>
('G:\\a', [], ['aa.txt', 'bb.xlsx', 'cc.txt', 'dd.docx'])
['aa.txt', 'bb.xlsx', 'cc.txt', 'dd.docx']

Copy the code

1.3 OS. Path. Join (path1, path2…).

Function: Multiple paths are combined and returned

For example, the following

path1 = 'G:\\a'
path2 = 'aa.txt'
print(os.path.join(path1,path2))
Copy the code

Here are the results:

G:\a\aa.txt
Copy the code

1.4 Case Analysis

Here are some examples:

pwd = "G:\\a"
file_path_list = []
for path,dirs,files in os.walk(pwd):
    for file in files:
        file_path_list.append(os.path.join(pwd,file))
print(file_path_list)
Copy the code

Here are the results:

['G:\\a\\aa.txt','G:\\a\\bb.xlsx','G:\\a\\cc.txt','G:\\a\\dd.docx']
Copy the code

1.5 How do I store multiple DataFrame data in a list

# Create two DataFrame data sources using the following code. Import numpy as np xx = np.arange(15) 0 (5,3) yy = np.arange(1,16). Shape (5,3) xx = Pd. DataFrame (xx, the columns = [" language ", "mathematics", "foreign language"]) yy = pd. The DataFrame (yy, columns = [" language ", "mathematics", "foreign language"]) print (xx) print (yy)Copy the code

Here are the results:

How do I put these two DataFrame together?

Concat_list = [] concat_list.append(xx) concat_list.append(yy) # pd.concat(list) # pd.concat(list) Is passed in parentheses with a list of DataFrame. # ignore_list=True # ignore_list=True Z = pd.concat(concat_list,ignore_list=True) print(z)Copy the code

Here are the results:

2. Merging multiple workbooks (1)

2.1 Merging Multiple Excel into one Excel (each Excel has only one sheet)

The operation is as follows:

import pandas as pd
import os
pwd = "G:\\b"
df_list = []
for path,dirs,files in os.walk(pwd):
    for file in files:
        file_path = os.path.join(path,file)                        
        df = pd.read_excel(file_path) 
        df_list.append(df)
result = pd.concat(df_list)
print(result)
result.to_excel('G:\\b\\result.xlsx',index=False)
Copy the code

Here are the results:

3. Merging multiple workbooks (2)

3.1 Explanation of relevant knowledge

3.1.1 Usage of xlsxwrite
1) Create a "workbook", which by default generates a Sheet called "Sheet1". Import xlsxwriter # This step creates a new "workbook"; # workbook "workbook" does not exist; # "demo. XLSX "file exists, new" workbook "overwrites original" workbook "; Workbook = xlsxwriter. workbook ("demo. XLSX ") Otherwise, the created file cannot be displayed. Workbook.close () 2) Create a "workbook" and add a "worksheet" named "2018 Sales ". Import xlsXWriter workbook = xlsxwriter.workbook ("cc.xlsx") worksheet = workbook.add_worksheet("2018 sales ") workbook.close()Copy the code

Here are the results:

3) Create a header for the "sales in 2018" worksheet and insert a piece of data into it. Import xlsxwriter # create a workbook named [demo.xlsx]; Workbook = xlsxwriter.workbook ("demo.xlsx") # Create a worksheet named 2018 Sales; Worksheet = workbook.add_worksheet("2018 Sales ") # Add a header to the worksheet with the write_row method; A large sections start with a keywords following a keywords: [' product ',' sales ',' unit price '] Data = [" apple ",500,8.9] for I in range(len(keywords)): worksheet.write(1,i,data[i]) workbook.close()Copy the code

Here are the results:

Other USES can refer to: www.cnblogs.com/brightbroth…

3.1.2 Usage of XLRD

1) Open an existing Excel file and return us the "xlrd.book. book "workbook object; Open the table by loading it into memory. Let's open the "test.xlsx" file we created above; Import XLRD file = r"G:\Jupyter\test.xlsx" XLRD. Open_workbook (file) <xlrd.book. book at 0x29BB8E4EDA0 > 2) Sheet_Names () : Obtains all sheet table names, if there are multiple sheet tables, returns one list; Import XLRD file = r"G:\Jupyter\test.xlsx" fh = xlrd.open_workbook(file) fH.sheet_names () # ['2018 sales ', '2019 sales '] 3) Sheets () method: returns a list of objects in the sheet table. # return the list of objects in the sheet table fh.sheets() # [<xlrd.sheet.Sheet at 0x29bb8f07a90>, <xlrd.sheet.Sheet at 0x29BB8ef1390 >] <xlrd.sheet.Sheet at 0x29BB8F07A90 > fh.sheets()[1] <xlrd.sheet.Sheet at 0x29BB8EF1390 > 4) Return the number of rows (nrows) and columns (ncols) for each sheet table; We can use the sheet object created above to operate on each sheet table; Fh.sheets ()[0].nrows # = 4 fh.sheets()[0].ncols # = 3 sheet1 = fh.sheets()[0] for row in range(fh.sheets()[0].nrows): value = sheet1.row_values(row) print(value)Copy the code

Here are the results:

6) col_values(number of columns) : obtain the data of each column in each sheet table; sheet1 = fh.sheets()[0] for col in range(fh.sheets()[0].ncols): value = sheet1.col_values(col) print(value)Copy the code

Here are the results:

3.2 Merging Multiple Excel files into one Excel (more than one Sheet in each Excel file)

Open an Excel file and create a workbook object def open_xlsx(file): Fh =xlrd.open_workbook(file) def get_sheet_num(fh): Def get_file_content(file,shnum) = len(fh.sheets()) return x # fh=open_xlsx(file) table=fh.sheets()[shnum] num=table.nrows for row in range(num): rdata=table.row_values(row) datavalue.append(rdata) return datavalue def get_allxls(pwd): allxls = [] for path,dirs,files in os.walk(pwd): for file in files: Datavalue = [] PWD = "G:\ d" for fl in get_allxls(PWD): Fh = open_xlsx(fl) x = get_sheet_num(fh) for shnum in range(x): print(" sheet_num: "+ STR (fl)+" + STR (shnum)+" sheet..." Endfile = "G:\\d\\concat.xlsx" wb1= xlsxwriter.workbook (endfile) # Ws =wb1.add_worksheet() for a in range(len(rvalue)): for b in range(len(rvalue[a])): C =rvalue[a][b] w.write (a,b,c) wb1.close() print(" file merge complete ")Copy the code

The above code is encapsulated as follows

import xlrd import xlsxwriter import os class Xlrd(): def __init__(self,pwd): Self. datavalue = [] self. PWD = PWD # Open an Excel file and create a workbook object; Def open_xlsx(self,fl): fh=xlrd.open_workbook(fl) return fh # Def get_sheet_num(self,fh): x = len(fh.sheets()) return x # def get_file_content(self,file,shnum): fh = self.open_xlsx(file) table=fh.sheets()[shnum] num=table.nrows for row in range(num): Rdata =table.row_values(row) # because every sheet has a header; # remove this header from the list; # add a table header to the table where the data is written; If rdata == [' name ',' gender ',' age ',' home address ']: pass else: self.datavalue. Append (rdata) return self.datavalue; def get_allxls(self): allxls = [] for path,dirs,files in os.walk(self.pwd): for file in files: Allxls.append (os.path.join(path,file)) return allxls # def return_rvalue(self): for fl in self.get_allxls(): fh = self.open_xlsx(fl) x = self.get_sheet_num(fh) for shnum in range(x): Print (" is read the file: "+ STR (fl) +" first "+ STR (shnum) +" the content of the sheet table..." ) rvalue = self.get_file_content(fl,shnum) return rvalue class Xlsxwriter(): def __init__(self,endfile,rvalue): Rvalue = rvalue def save_data(self): wb1 = xlsXwriter.workbook (endfile) # create a sheet; Ws = wb1.add_worksheet(" wb1.add_worksheet ") A large heading begins with a keyword (len(self.rvalue)): For b in range(len(self.rvalue[a])): c = self.rvalue[a][b] # for b in range(len(self.rvalue[a])): c = self.rvalue[a][b] # Ws.write (a+1,b,c) wb1.close() print(" file merge complete ") PWD = "G:\\d" xl = Xlrd(PWD) rvalue = xl.return_rvalue() endfile = "G:\\d\\concat.xlsx" write = Xlsxwriter(endfile,rvalue) write-save_data ();Copy the code

Here are the results:

4. A workbook with multiple sheets

4.1 Merge multiple Sheets in an Excel table and save them in the same Excel.

import xlrd import pandas as pd from pandas import DataFrame from openpyxl import load_workbook excel_name = r"D:\pp.xlsx" wb = xlrd.open_workbook(excel_name) sheets = wb.sheet_names() alldata = DataFrame() for i in range(len(sheets)): df = pd.read_excel(excel_name, sheet_name=i, index=False, encoding='utf8') alldata = alldata.append(df) writer = pd.ExcelWriter(r"C:\Users\Administrator\Desktop\score.xlsx",engine='openpyxl') book = load_workbook(writer.path) Writer. book = book # Alldata.to_excel (excel_writer=writer,sheet_name=" alldata ") writer.save() writer.close()Copy the code

Here are the results:

Five, a table split (according to a table column split)

5.1 Divide an Excel table into multiple tables according to a certain column.

import pandas as pd import xlsxwriter data=pd.read_excel(r"C:\Users\Administrator\Desktop\chaifen.xlsx",encoding='gbk') Area_list =list(set(data[' store '])) writer=pd.ExcelWriter(r"C:\Users\Administrator\Desktop\ \ table 1. XLSX ",engine='xlsxwriter') Data.to_excel (writer,sheet_name=" sheet_name ",index=False) for j in area_list: Df =data[data[' sheet_name ']==j] df.to_excel(writer,sheet_name=j,index=False) writer.save(Copy the code

Here are the results:

In this paper, to the end.


Original is not easy, if you think this article is a little useful to you, please give me a like, comment or forward for this article, because this will be my power to output more quality articles, thanks!

By the way, dig friends remember to give me a free attention yo! In case you get lost and you can’t find me next time.

See you next time!