This is the 21st day of my participation in the November Writing Challenge. Event details: The Last Writing Challenge of 2021

Experiment 3

3.1 Topic

Become proficient with Selenium: locating HTML elements, crawling Ajax-loaded page data, waiting for HTML elements, and so on.

Use the Selenium framework together with MySQL storage to crawl stock data for the "Shanghai and Shenzhen A-shares", "Shanghai A-shares", and "Shenzhen A-shares" boards.

Candidate site: Eastmoney: quote.eastmoney.com/center/grid…

3.2 Approach

3.2.1 Sending a Request

  • Set up the driver
from selenium import webdriver

chrome_path = r"D:\Download\Dirver\chromedriver_win32\chromedriver_win32\chromedriver.exe"  # path to the ChromeDriver executable
browser = webdriver.Chrome(executable_path=chrome_path)
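Note that newer Selenium 4 releases deprecate the executable_path argument; if your environment runs Selenium 4+, the driver path goes through a Service object instead. A minimal sketch of the equivalent setup:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 style: wrap the driver path in a Service object
service = Service(r"D:\Download\Dirver\chromedriver_win32\chromedriver_win32\chromedriver.exe")
browser = webdriver.Chrome(service=service)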
  • Save the boards to crawl
    target = ["hs_a_board", "sh_a_board", "sz_a_board"]
    target_name = {"hs_a_board": "Shanghai and Shenzhen A-shares", "sh_a_board": "Shanghai A-shares", "sz_a_board": "Shenzhen A-shares"}

The plan is to crawl two pages of data from each of the three boards.

  • Send the request
    import time

    for k in target:
        browser.get('http://quote.eastmoney.com/center/gridlist.html#{}'.format(k))
        for i in range(1, 3):
            print("------------- page {} ---------".format(i))
            if i <= 1:
                get_data(browser, target_name[k])
                browser.find_element_by_xpath('//*[@id="main-table_paginate"]/a[2]').click()  # flip to the next page
                time.sleep(2)
            else:
                get_data(browser, target_name[k])

The time.sleep(2) after clicking matters. Without it, the script moves on so quickly that even though the click has flipped to the second page, you still scrape the first page's data!
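A fixed sleep works, but an explicit wait is more reliable: wait until the old table has actually been replaced before scraping again. A minimal sketch of this alternative, using the same XPaths as above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

first_row = browser.find_element(By.XPATH, '//*[@id="table_wrapper-table"]/tbody/tr[1]')
browser.find_element(By.XPATH, '//*[@id="main-table_paginate"]/a[2]').click()
# Wait up to 10 seconds for the old first row to go stale, i.e. for the table to re-render
WebDriverWait(browser, 10).until(EC.staleness_of(first_row))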

3.2.2 Obtaining a Node

  • When parsing the page, use implicitly_wait so element lookups wait for the page to load (an implicit wait is set once and then applies to every find_element call in the session)
  browser.implicitly_wait(10)
  items = browser.find_elements_by_xpath('//*[@id="table_wrapper-table"]/tbody/tr')
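These two lines presumably sit at the top of the get_data helper called from the request loop; a sketch of the assumed structure, where part is the board name passed in:

def get_data(browser, part):
    # Wait up to 10 seconds for elements to appear, then grab every row of the quote table
    browser.implicitly_wait(10)
    items = browser.find_elements_by_xpath('//*[@id="table_wrapper-table"]/tbody/tr')
    # ... parse each row and insert it into the database, as shown below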

Each item then holds the full text of one table row; split it into fields and insert them into the database:

    for item in items:
        try:
            info = item.text
            infos = info.split(" ")  # the cell texts of a row are joined by spaces
            db.insertData([infos[0], part, infos[1], infos[2],
                  infos[4], infos[5],
                  infos[6], infos[7],
                  infos[8], infos[9],
                  infos[10], infos[11],
                  infos[12], infos[13]])
        except Exception as e:
            print(e)
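Splitting on spaces is fragile if any field itself contains a space. A more robust variant (a sketch, not the original code) reads each td cell of the row directly:

    cells = [td.text for td in item.find_elements_by_xpath('./td')]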

3.2.3 Saving Data

  • Database class that encapsulates the connection setup and the insert operation
import pymysql

class database():
    def __init__(self):
        self.HOSTNAME = '127.0.0.1'
        self.PORT = '3306'
        self.DATABASE = 'scrapy_homeword'
        self.USERNAME = 'root'
        self.PASSWORD = 'root'
        # Open a database connection
        self.conn = pymysql.connect(host=self.HOSTNAME, user=self.USERNAME, password=self.PASSWORD,
                                    database=self.DATABASE, charset='utf8')
        # Create a cursor object using the cursor() method
        self.cursor = self.conn.cursor()

    def insertData(self, lt):
        # Column names are the original Chinese headers rendered as valid identifiers
        sql = "INSERT INTO spider_gp(serial_number, board, stock_code, stock_name, latest_price," \
              " change_percent, change_amount, volume, turnover, amplitude, high, low, today_open, prev_close)" \
              " VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
        try:
            self.cursor.execute(sql, lt)
            self.conn.commit()  # commit after executing the statement, not before
            print("Insert successful")
        except Exception as err:
            print("Insert failed", err)
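The INSERT assumes a spider_gp table already exists. A minimal one-off setup sketch, with column names and types that are my assumptions matching the insert above:

import pymysql

conn = pymysql.connect(host='127.0.0.1', user='root', password='root',
                       database='scrapy_homeword', charset='utf8')
with conn.cursor() as cursor:
    # Hypothetical schema for the 14 inserted fields; adjust types as needed
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS spider_gp (
            serial_number VARCHAR(16), board VARCHAR(32),
            stock_code VARCHAR(16), stock_name VARCHAR(64),
            latest_price VARCHAR(16), change_percent VARCHAR(16),
            change_amount VARCHAR(16), volume VARCHAR(32),
            turnover VARCHAR(32), amplitude VARCHAR(16),
            high VARCHAR(16), low VARCHAR(16),
            today_open VARCHAR(16), prev_close VARCHAR(16)
        ) DEFAULT CHARSET=utf8
    """)
conn.commit()
conn.close()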