This is the fourth day of my participation in the First Challenge 2022

Python, CSV, and MySQL working together

1. Make sure MySQL is available

After installing MySQL (the easiest way is to download the installer from the official website and follow the steps), we can get to work.

```bash
% mysql --version
zsh: command not found: mysql
```

Error: `command not found: mysql`. Solution:

```bash
# 1. open the environment-variable file in vim (don't type the leading %)
% vim ~/.bash_profile
# 2. press a to enter insert mode and paste in the following line
export PATH=$PATH:/usr/local/mysql/bin
# 3. press ESC, then Shift + :, and type wq to save and quit
# 4. reload the configuration file
% source ~/.bash_profile
```

To check the MySQL version, run the following command:

```bash
% mysql --version
mysql  Ver 8.0.22 for macos10.15 on x86_64 (MySQL Community Server - GPL)
```

2. Simply grab some data into a CSV

2.1 Page inspection and analysis

Here we crawl the Learning Competition page data from Aliyun Tianchi. The fields we want:

| Field | Key | Type | Note |
| --- | --- | --- | --- |
| Event name | raceName | string | a string |
| Event ID | raceId | string | e.g. 531842 |
| Brief introduction | brief | string | a long string |
| Start time | currentSeasonStart | date | e.g. 2020-11-19 17:00:00 |
| End time | currentSeasonEnd | date | e.g. 2021-01-14 23:55:00 |
| Competition status | raceState | int | 1 = in progress, 0 = finished |
| Number of teams | teamNum | int | an integer |
| Event type | type | int | an integer |
| Tournament bonus | currency | int | an integer |
| Prize currency symbol | currencySymbol | string | a string |

To analyze the page, press F12 to open the developer tools, then refresh the page and watch the requests load under the Network tab. A request whose URL contains pageNum=1&pageSize… is the one we want: the data lives here. Click it and check the Request URL under Headers, and you will see that this is a data interface.

With the interface found, data acquisition becomes much simpler!
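Before writing the full crawler, it's worth a quick sanity check that the interface really returns JSON. A minimal sketch (the URL is the Request URL copied from the Network tab; the exact payload structure is assumed to match what the crawler below relies on):

```python
import requests

# probe the first page of the data interface found in the Network tab
api = ('https://tianchi.aliyun.com/competition/proxy/api/'
       'competition/api/race/listBrief?pageNum=1&pageSize=10&type=2')
resp = requests.get(api, headers={'User-Agent': 'Mozilla/5.0'})
print(resp.status_code)            # expect 200
print(resp.json()['data'].keys())  # inspect the payload structure
```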

2.2 Write a few lines of code

First, import the packages we'll need:

"' crawl ali cloud, big data platform tianchi, learn what, and related data address: https://tianchi.aliyun.com/competition/gameList/coupleList ' ' '

import warnings
warnings.filterwarnings("ignore")
# ignore warnings
import requests
# import page request module
from fake_useragent import UserAgent
Import a module that randomly generates request headers
import json
Import json data processing package
import pandas as pd
# Import data processing module
import pymysql
Import database processing module
Copy the code

If running the code raises `No module named xxx`, just `pip install xxx` the missing module (e.g. `pip install fake-useragent`).

```python
# Crawl the data
def get_data():
    # request headers
    headers = {
        "User-Agent": UserAgent(verify_ssl=False, use_cache_server=False).random
    }
    # list for storing the data
    csv_data = []

    # table header: the data items we need
    head_csv = ['raceName', 'raceId', 'brief', 'currentSeasonStart', 'currentSeasonEnd',
                'raceState', 'teamNum', 'type', 'currency', 'currencySymbol']
    # page analysis showed 4 pages, 32 records in total
    for page in range(1, 5):
        # the data API and its paging rule, found through page analysis
        competition_api = 'https://tianchi.aliyun.com/competition/proxy/api/competition/api/race/listBrief?pageNum=%d&pageSize=10&type=2' % page
        # print(competition_api)
        # send a GET request
        response_data = requests.get(competition_api, headers=headers)
        # convert the response to JSON so it can be handled like a dictionary
        json_data = json.loads(response_data.text)
        # print to inspect the data
        # print(type(json_data['data']['list']))

        # loop through each record and pick out the fields we need
        for i in json_data['data']['list']:
            one_csv_data = []
            one_csv_data.append(i['raceName'])
            one_csv_data.append(i['raceId'])
            one_csv_data.append(i['brief'])
            one_csv_data.append(i['currentSeasonStart'])
            one_csv_data.append(i['currentSeasonEnd'])
            one_csv_data.append(1 if i['raceState'] == 'ONGOING' else 0)
            one_csv_data.append(i['teamNum'])
            one_csv_data.append(i['type'])
            one_csv_data.append(i['currency'])
            # normalize the currency label: 'RMB' for the yen/yuan sign, otherwise 'USD'
            one_csv_data.append('RMB' if i['currencySymbol'] == '¥' else 'USD')
            csv_data.append(one_csv_data)
    return head_csv, csv_data

data = get_data()
```

The logic is clear: having found the data interface, we request the data, parse it as JSON, then iterate over it and collect the fields we need into a list.

Then we can store the data into a CSV file while it’s hot.

```python
# Store the list data as a CSV file
def save_to_csv(data):
    # unpack the data
    head_csv = data[0]   # the header
    csv_data = data[1]   # the table rows
    # convert the data to a DataFrame
    csv_data = pd.DataFrame(columns=head_csv, data=csv_data)
    # print(csv_data)
    # call to_csv to write the data to a CSV file
    csv_data.to_csv('tianchi_competition.csv', index=False, encoding='gbk')

save_to_csv(data)
```

The data are as follows:

3. Python, CSV, MySQL: the three-army linkage

3.1 About pymysql

pymysql is a Python package (module) for connecting to MySQL; it requires Python 3 or above. It is simple to use, and I have introduced it in earlier crawler articles: in XXX, for example, pymysql is used to write medical-device data into MySQL.
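As a quick refresher, a minimal pymysql round trip looks like this (a sketch; fill in your own credentials):

```python
import pymysql

# connect to the local MySQL server
conn = pymysql.connect(host="localhost", port=3306,
                       user="root", password="your MySQL password", charset="utf8")
cursor = conn.cursor()
cursor.execute("select version();")   # run a trivial query
print(cursor.fetchone())              # e.g. ('8.0.22',)
cursor.close()
conn.close()
```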

3.2 Basic MySQL operation code

```python
# Create a new database
def create_database(database, cursor):
    # drop the database if it already exists, then create it afresh
    cursor.execute("drop database if exists %s;" % database)
    cursor.execute("create database %s;" % database)
    print("Database created successfully: %s" % database)

# Create a new table in the specified database
def create_table(table_name, table_sql, database, cursor):
    # select the database
    cursor.execute("use %s;" % database)
    # drop the table if it already exists
    cursor.execute("drop table if exists %s;" % table_name)
    # create the new table
    cursor.execute(table_sql)
    print("Successfully created table {1} in database {0}".format(database, table_name))

# Run insert / delete / select / update statements against a given database
def sql_basic_operation(table_sql, database, cursor):
    # select the database
    cursor.execute("use %s" % database)
    # execute the SQL statement
    cursor.execute(table_sql)
    print("SQL statement executed successfully")
```

We will keep enriching these helper functions later and hook them into the Flask web framework for further linkage.

3.3 Connect to MySQL and store the data

  • Connect to MySQL and create a new database
```python
# connect to the database and get a cursor
conn = pymysql.connect(host="localhost", port=3306, user="your username",
                       password="your MySQL password", charset="utf8")
cursor = conn.cursor()
# name of the new database
database = "tianchi"
# create the database
create_database(database, cursor)
```
  • Create a data table
```python
table_name = "competition"
# table definition: raceId is the auto-increment primary key
table_sql = """create table %s(
    raceId int not null auto_increment,
    raceName varchar(200),
    brief varchar(1000),
    currentSeasonStart date,
    currentSeasonEnd date,
    raceState int,
    teamNum int,
    type int,
    currency int,
    currencySymbol char(3),
    primary key (raceId)
);""" % table_name
create_table(table_name, table_sql, database, cursor)
```
  • Read data from CSV
```python
# Read the data back from the CSV file
def read_csv(filepath):
    # read the CSV file with pandas' read_csv (same gbk encoding we wrote with)
    data = pd.read_csv(filepath, encoding='gbk')
    # print(type(data))  # <class 'pandas.core.frame.DataFrame'>
    sql_data = []
    # iterate over the DataFrame row by row
    for row in data.itertuples():
        # print(getattr(row, 'raceName'))
        sql_data.append([getattr(row, 'raceId'), getattr(row, 'raceName'), getattr(row, 'brief'),
                         getattr(row, 'currentSeasonStart'), getattr(row, 'currentSeasonEnd'),
                         getattr(row, 'raceState'), getattr(row, 'teamNum'), getattr(row, 'type'),
                         getattr(row, 'currency'), getattr(row, 'currencySymbol')])
    return sql_data

filepath = 'tianchi_competition.csv'
sql_data = read_csv(filepath)
```
  • Iterate over the data and store it in MySQL
```python
# loop over the rows and insert them one by one
for row in sql_data:
    # build the insert statement for this row
    sql_insert = '''insert into {0} values({1},'{2}','{3}','{4}','{5}',{6},{7},{8},{9},'{10}');'''.format(
        table_name, row[0], row[1], row[2], row[3], row[4],
        row[5], row[6], row[7], row[8], row[9])
    # print(sql_insert)
    sql_basic_operation(sql_insert, database, cursor)

conn.commit()

# close the cursor, then the connection
cursor.close()
conn.close()
```
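One caveat worth noting: building SQL by string formatting works here because we control the data, but a `brief` containing a single quote would break the statement, and it is unsafe with untrusted input. A safer sketch of the same insert, using pymysql's parameter binding in place of the loop above:

```python
# parameterized insert: pymysql escapes every value for us
placeholders = ", ".join(["%s"] * 10)   # one %s per column
sql_insert = "insert into {0} values ({1});".format(table_name, placeholders)
cursor.executemany(sql_insert, sql_data)  # run it once per row
conn.commit()
```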

3.4 Checking the stored data in MySQL

  • Log in to MySQL from the terminal
```bash
mysql -hlocalhost -uroot -p
```

  • Select the database and view the table data
```sql
-- select the database
use tianchi;
-- view the table data
select * from competition;
```

This completes the linkage between Python, CSV, and MySQL.
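We can also verify from the Python side: pandas can read the table straight back into a DataFrame. A minimal sketch with the same connection settings as above (newer pandas versions prefer a SQLAlchemy engine and may warn when given a raw pymysql connection):

```python
import pandas as pd
import pymysql

# reconnect, this time selecting the tianchi database directly
conn = pymysql.connect(host="localhost", port=3306, user="your username",
                       password="your MySQL password", database="tianchi", charset="utf8")
# pull the whole table back into a DataFrame
df = pd.read_sql("select * from competition;", conn)
print(df.head())
conn.close()
```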

4. A small analysis, for the sake of learning

A small requirement: among the study competitions currently in progress, list the top five by number of participating teams, showing their IDs, names, team counts, and brief introductions.

  • Python
```python
filepath = 'tianchi_competition.csv'
# read the CSV file with pandas' read_csv
data = pd.read_csv(filepath, encoding='gbk')
# top five ongoing study competitions: id, name, team count, brief introduction
# 1. drop the finished competitions
data = data[data['raceState'] == 1]
# 2. sort by teamNum in descending order
data = data.sort_values(by='teamNum', axis=0, ascending=False)
# 3. take the first 5 rows and show the required columns
data.head(5).loc[:, ['raceId', 'raceName', 'teamNum', 'brief']]
```
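The three steps can also be chained into a single expression; a sketch using `query` and `nlargest`, starting from a fresh read of the CSV:

```python
# equivalent one-liner: filter ongoing races, take the 5 largest by teamNum, select columns
data = pd.read_csv('tianchi_competition.csv', encoding='gbk')
top5 = (data.query('raceState == 1')
            .nlargest(5, 'teamNum')[['raceId', 'raceName', 'teamNum', 'brief']])
print(top5)
```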


  • SQL
```sql
select raceId, raceName, teamNum, brief
from competition
where raceState = 1
order by teamNum desc
limit 5;
```


  • Excel


  • Step 1: select the raceState column, click Sort & Filter, and choose Filter.
  • Step 2: filter raceState to the value 1, i.e. competitions in progress.
  • Step 3: hold down the Control key, select the unwanted columns, right-click, and hide them.
  • Step 4: select the teamNum column, click Sort & Filter, and sort in descending order.
  • Step 5: copy the first 5 rows into a new sheet. Done.

In terms of ease of use, SQL is probably the most convenient: a single statement does the job. In terms of accessibility, Excel has the lowest threshold, and it is what-you-see-is-what-you-get.

Python is also very convenient, especially for a Python learner like me, and it is clearly the most flexible of the three.

5. A brief summary

Simple and a bit long-winded: 2.5K words and 100 lines of code. I hope you've picked up the basics of crawling and of linking Python with CSV and MySQL. If you learned something, remember to like, comment, and share!

If this post gets more than 30 likes, I'll add the Flask framework and build a simple data-visualization system. You might wonder what that has to do with CSV: wouldn't MySQL plus Flask be enough? Think of the CSV side as a data-import support feature. It looks humble now, but we're shaping the skeleton first and will optimize the core later.

Likes, views, comments, shares: all four kinds of support are appreciated, and original writing is not easy. OK, see you next time. I love cats and technology, and most of all Sisi's old cousin Da Mian ଘ(˙꒳˙)ଓ