Use Python to classify and summarize massive data: one-click office magic!

Project introduction

Yesterday I suddenly ran into a real headache. I had a set of school dormitory data, about 4,000 records, with several different dimensions. The records needed to be classified and then split by those dimensions into separate tables, producing eight folders with 24 tables in each. That is exactly what this program does. Doing it in Excel would take a lot of repeated filtering and several people working together, which is quite laborious. Can Python, the go-to tool for data analysis, solve this problem? The answer is yes!

Project approach

1. First, import this large amount of data with the csv library, parse each row into a Python object, and keep the result in memory for the next steps.
2. After importing, we need to classify the data. For this I wrote an algorithm I call the "dictionary iteration algorithm" (the name is my own). There were quite a few pitfalls here, and in the end the logic is wrapped in a function.
3. Saving the data means writing it into CSV files, creating the folders with Python's built-in os module, and saving each file there. Along the way we have to solve the problem of garbled Chinese characters in the CSV files.

The difficulties

1. How do we split and save the parsed data?
2. How do we avoid garbled characters when writing the files?
3. How should the code be structured?

Code walkthrough

That is the general idea. Now let's look at how each step is implemented.

Parsing the data

```python
# 1. Parse the data
import csv


def csv_data():
    global dormitory_data                 # shared with csv_sort() and keep_data()
    dormitory_data = []
    # Put your CSV file in the same folder as this script; the file name is a placeholder
    with open(r"dormitory_data.csv", encoding='utf-8-sig', newline='') as file:
        f_csv = csv.reader(file)
        header = next(f_csv)              # first row: the column names
        for row in f_csv:                 # every following row is one record
            data = {}
            for index in range(7):        # the table has 7 columns
                data[header[index]] = row[index]
            dormitory_data.append(data)
```

Here we take a copy of the Excel data, save it as a CSV file (use Save As and pick CSV; renaming the .xlsx extension alone does not convert the format), and then import and parse it.

This parsing process is similar to the one in our previous article, "Writing a Grade Calculation System in Python". The key point is that the first row of the table is extracted as the header, and each remaining row is turned into a dictionary keyed by those column names and stored in a list. Note that the list is declared as a global variable so that the other functions can use it.
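For comparison, the standard library's csv.DictReader performs the same "first row becomes the keys" step on its own. The sketch below is an alternative to the manual header/row loop, not the author's code. It assumes a file-like object and uses a tiny hypothetical sample with made-up column names. Unlike the original, which only reads the first 7 columns, it keeps every column.

```python
import csv
import io


def parse_rows(file_obj):
    """Return one dict per data row, keyed by the header (first) row."""
    # DictReader consumes the first row as field names, like header = next(reader)
    return [dict(row) for row in csv.DictReader(file_obj)]


# Hypothetical sample data; for a real file use
# open("your_file.csv", encoding="utf-8-sig", newline="")
sample = io.StringIO("dormitory id,name\n101-1,Zhang\n101-2,Li\n")
rows = parse_rows(sample)
print(rows)
```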

Result

![](https://p1-tt-ipv6.byteimg.com/large/pgc-image/4821a8a16aca44f2921507000f579f58)

Splitting the data

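```python
# 2. Split the data: each call moves the next floor's rows into dicts
def csv_sort():
    global dicts
    dicts = []
    i = 0
    dormitory_datas = dormitory_data.copy()       # snapshot to iterate over
    dormitory_datass = dormitory_datas.copy()     # rows still waiting to be grouped
    for x in dormitory_datas:
        b = []
        for sort in dormitory_datass:
            a_1 = sort["dormitory id"]            # dorm-number column
            b.append(a_1)
        dicts.append(x)
        dormitory_data.remove(x)                  # consume the row
        dormitory_datass = dormitory_data.copy()
        # Stop when the next row is on a different floor (first three characters
        # differ) or when no rows are left (the guard avoids an IndexError)
        if len(b) < i + 2 or b[i][:3] != b[i + 1][:3]:
            break
```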

Don't underestimate these few lines of code. The algorithm took repeated testing to get right, and there were several pitfalls along the way, which was a real headache, but they were all solved in the end.

1. First, we need a rule for splitting the data. After browsing it, we found that within each group the data for dorms 1-4 are related, and that the first three characters of the dorm number identify the floor. So we iterate over the records and compare the first three characters of each dorm number with those of the next one. As soon as they differ, we know we have reached a different floor, and the data must be split at that point.

2. This "dictionary iteration algorithm" decides when to split the data, and finally it is wrapped in a function.
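The same "split when the first three characters change" rule can also be written with itertools.groupby, which groups consecutive items that share a key. This is a swapped-in standard-library technique, not the author's dictionary iteration algorithm. The column name and the dorm-number format in the sample are hypothetical. Like the original loop, it only compares neighbouring rows, so the data must already be ordered by dorm number.

```python
from itertools import groupby


def split_by_floor(rows, key_column="dormitory id", prefix_len=3):
    """Split rows into consecutive runs whose dorm numbers share the same prefix.

    Only neighbouring rows are compared, so the input must be sorted by dorm number.
    """
    return [
        list(group)
        for _, group in groupby(rows, key=lambda row: row[key_column][:prefix_len])
    ]


# Hypothetical dorm numbers: the first three characters identify the floor
rows = [
    {"dormitory id": "A11-01"},
    {"dormitory id": "A11-02"},
    {"dormitory id": "A12-01"},
]
groups = split_by_floor(rows)
print([len(g) for g in groups])  # [2, 1]
```

Returning all groups at once avoids mutating a global list on every call. The trade-off is that the whole result is held in memory, which is fine for about 4,000 records.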

Save the data

```python
# 3. Save the data
import csv
import os


def keep_data():
    header = list(dormitory_data[0].keys())       # column names, read before rows are consumed
    for w in range(65, 73):                       # chr(65)..chr(72) -> groups A..H, 8 folders
        W = chr(w)
        path = "group %s" % W                     # folder name; the original path is not shown
        os.makedirs(path, exist_ok=True)          # create the folder if it does not exist yet
        for K in range(1, 5):
            for P in range(1, 7):                 # 4 x 6 = 24 files per folder
                csv_sort()                        # fills the global dicts with the next floor's rows
                # os.path.join instead of os.chdir(), so the folders do not end up nested
                filename = os.path.join(path, 'group %s %d %d floor.csv' % (W, K, P))
                # utf-8-sig avoids garbled Chinese when the file is opened in Excel
                with open(filename, 'w', newline='', encoding='utf-8-sig') as f:
                    writer = csv.DictWriter(f, fieldnames=header)
                    writer.writeheader()          # write the column names first
                    writer.writerows(dicts)       # then every row of this floor
                print("group {} {} {} floor saved!!!".format(W, K, P))
```

Saving the data also has several tricky parts. First, we need nested for loops to save the data, and we use the os module to create the folders automatically. Then we give each file a name that makes it easy to find later. Done naively, however, this leaves the Chinese text in the tables garbled.
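Setting the encoding question aside for a moment, the folder-and-file part of saving can be isolated in a small helper. This is a sketch, not the author's function. It uses os.makedirs(..., exist_ok=True) in place of the exists()/mkdir() check and builds paths with os.path.join instead of changing the working directory. The folder and file names and the sample row are hypothetical, and a temporary directory is used so that the sketch does not touch your files.

```python
import csv
import os
import tempfile


def save_group(rows, folder, filename, fieldnames):
    """Write one group of rows to folder/filename, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)        # no error if the folder already exists
    path = os.path.join(folder, filename)     # no os.chdir(), so folders never nest
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path


# Folder letters A-H come from chr(65) .. chr(72), i.e. range(65, 73)
letters = [chr(w) for w in range(65, 73)]
print("".join(letters))  # ABCDEFGH

base = tempfile.mkdtemp()
p = save_group([{"dormitory id": "A11-01"}], os.path.join(base, "group A"),
               "group A 1 1 floor.csv", ["dormitory id"])
```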

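The fix is to open each output file with encoding='utf-8-sig'. This writes a UTF-8 byte-order mark (BOM) at the start of the file, which Excel uses to recognize the encoding: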

encoding='utf-8-sig'
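To see what 'utf-8-sig' actually changes, the sketch below encodes a short Chinese string both ways and compares the bytes. It uses only the standard library. The sample text is arbitrary. It also shows why the parsing step reads with 'utf-8-sig': decoding with plain 'utf-8' leaves the BOM as an invisible '\ufeff' character glued to the first column name.

```python
import codecs

text = "宿舍"                            # any Chinese text works
plain = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")

# utf-8-sig is plain UTF-8 with the 3-byte BOM in front
print(with_bom == codecs.BOM_UTF8 + plain)        # True

# Reading back with utf-8-sig strips the BOM again
print(with_bom.decode("utf-8-sig") == text)       # True

# Reading with plain utf-8 keeps it as '\ufeff' at the start of the text
print(with_bom.decode("utf-8")[0] == "\ufeff")    # True
```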

Let's take a look at a demonstration of the whole program running:

![](https://p26-tt.byteimg.com/large/pgc-image/46fe8f86ac2a4301801d503ac800ff9f)

![](https://p26-tt.byteimg.com/large/pgc-image/1ba403d704a547d08f036af72dd0c38f)

![](https://p1-tt-ipv6.byteimg.com/large/pgc-image/ce2f1656364c4fa6b335d98f8aecfc48)

![](https://p1-tt-ipv6.byteimg.com/large/pgc-image/9c7f08fcf778448ba08646fa6da50c87)

![](https://p6-tt-ipv6.byteimg.com/large/pgc-image/cf1bc09b34a74390a7796e46d54fd316)

Possible upgrades

1. There are other ways to automate formatting the table data and adding header information. I won't demo them here, so feel free to find your own solutions.
2. We can also draw grid lines on the tables and center the text to make them look better.
3. We could write a program that prints automatically by linking to the printer, so the data can be printed with one click. That would improve efficiency considerably.

Readers can implement these features themselves. I won't cover them here, since the code involved is not trivial, ha ha ha!

Office automation and one-click processing are Python's strengths, and we can use them to solve problems in study and daily life. Finally, I want to pay tribute to everyone who clicks through Excel every day to organize data. The work is heavy, difficult and tedious.

One last point: designing a program like this is a headache, but the program is reusable and can keep being upgraded. In the end, a job that takes others an hour takes you 3 seconds to run and check!

Program source code

```python
# -*- coding: UTF-8 -*-
# @time: 2020/9/15 13:26
# @author: Wang Xing-wang
# @software: PyCharm
# @file: Dormitory data classification.py - version 1.0
# @CSDN: https://blog.csdn.net/weixin_47723732
import csv
import os


# 1. Parse the data
def csv_data():
    global dormitory_data
    dormitory_data = []
    # Put your CSV file in the same folder as this script; the file name is a placeholder
    with open(r"dormitory_data.csv", encoding='utf-8-sig', newline='') as file:
        f_csv = csv.reader(file)
        header = next(f_csv)              # first row: the column names
        for row in f_csv:
            data = {}
            for index in range(7):        # the table has 7 columns
                data[header[index]] = row[index]
            dormitory_data.append(data)


# 2. Split the data: each call moves the next floor's rows into dicts
def csv_sort():
    global dicts
    dicts = []
    i = 0
    dormitory_datas = dormitory_data.copy()
    dormitory_datass = dormitory_datas.copy()
    for x in dormitory_datas:
        b = []
        for sort in dormitory_datass:
            a_1 = sort["dormitory id"]    # dorm-number column
            b.append(a_1)
        dicts.append(x)
        dormitory_data.remove(x)
        dormitory_datass = dormitory_data.copy()
        # Stop at a new floor, or when no rows are left (avoids an IndexError)
        if len(b) < i + 2 or b[i][:3] != b[i + 1][:3]:
            break


# 3. Save the data
def keep_data():
    header = list(dormitory_data[0].keys())
    for w in range(65, 73):               # groups A..H, 8 folders
        W = chr(w)
        path = "group %s" % W             # folder name; the original path is not shown
        os.makedirs(path, exist_ok=True)
        for K in range(1, 5):
            for P in range(1, 7):         # 4 x 6 = 24 files per folder
                csv_sort()
                filename = os.path.join(path, 'group %s %d %d floor.csv' % (W, K, P))
                with open(filename, 'w', newline='', encoding='utf-8-sig') as f:
                    writer = csv.DictWriter(f, fieldnames=header)
                    writer.writeheader()
                    writer.writerows(dicts)
                print("group {} {} {} floor saved!!!".format(W, K, P))


def main():
    csv_data()
    keep_data()


if __name__ == '__main__':
    main()
```

Did you learn something?

This article is reprinted, and the copyright belongs to the original author. In case of infringement, please contact the editor for removal.

Original address: blog.csdn.net/weixin_4772…


