In the era of e-commerce, product data from Taobao, JD.com, and Tmall is of great help to store operations, so obtaining the data of a store's products is very valuable. How, then, do we obtain that data?

In this post we will walk through the code of a Taobao crawler. The code is as follows:

```python
from selenium import webdriver
import time
import csv
import re
```

We import Selenium for browser automation, the time module for pauses, the csv module for saving the results to a file, and the re module for regular-expression matching.

```python
if __name__ == '__main__':
    keyword = input('Enter the product keyword to search for: ')
    path = r'l:\webDriver\chromedriver.exe'  # local storage path of chromedriver.exe
    driver = webdriver.Chrome(path)
    driver.get('https://www.taobao.com/')
    main()
```

First we input the search keyword, for example "INS trend T-shirt". path is the local storage path of the chromedriver.exe driver. We instantiate a driver object, use the get method to visit the Taobao homepage, and then call the main() method.
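Note: the positional-path constructor above matches older Selenium 3 releases. If you are on Selenium 4, the driver path is passed through a Service object instead; a minimal sketch, assuming the same chromedriver location:

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 style: the executable path goes through a Service object
service = Service(r'l:\webDriver\chromedriver.exe')  # your local chromedriver path
driver = webdriver.Chrome(service=service)
driver.get('https://www.taobao.com/')
```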

Please remember: be sure to log in by scanning the QR code! Otherwise you will be caught by Taobao's anti-crawling measures! As shown below:

[Image: QR-code login page]
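One practical tweak (a variation of mine, not part of the original code): instead of relying on the fixed 15-second pause you will see later in search_product, block until you confirm the login yourself:

```python
# wait for the manual QR-code login to finish before crawling continues;
# this replaces the fixed time.sleep(15) used in the original code
input('Scan the QR code to log in, then press Enter to continue...')
```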

  • The results are as follows:

[Image: search results page]

```python
def main():
    print('Crawling the data of page 1')
    page = search_product(keyword)
    get_product()
    page_num = 1
    while page_num != page:
        print('-' * 50)
        print('Crawling the data of page {}'.format(page_num + 1))
        print('*' * 50)
        # each result page holds 44 items, so s advances by 44 per page
        driver.get('https://s.taobao.com/search?q={}&s={}'.format(keyword, page_num * 44))
        driver.implicitly_wait(2)
        driver.maximize_window()
        get_product()
        page_num += 1
```

In the main() method, the search_product and get_product functions are first used to crawl the first page of data, and then a while loop crawls the remaining pages. Let's first explain how a single page is crawled.
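To make the pagination concrete: the s parameter in the search URL is an item offset. The loop below is a standalone illustration (the 44-items-per-page stride is an assumption carried over from the crawler above, and the keyword is hypothetical); it prints the URLs that main() would visit:

```python
keyword = 'INS trend T-shirt'  # hypothetical search keyword

# pages 2 and 3 correspond to offsets 44 and 88 if each page holds 44 items
for page_num in range(1, 3):
    url = 'https://s.taobao.com/search?q={}&s={}'.format(keyword, page_num * 44)
    print(url)
```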

```python
def search_product(key):
    # type the keyword into the search box and click the search button
    driver.find_element_by_id('q').send_keys(key)
    driver.find_element_by_class_name('btn-search').click()
    # maximize the browser window
    driver.maximize_window()
    time.sleep(15)
    # locate the pager label and pull the total page count out of its text
    page = driver.find_element_by_xpath('//*[@id="mainsrp-pager"]/div/div/div/div[1]').text
    page = re.findall(r'(\d+)', page)[0]
    return int(page)
```

First, driver.find_element_by_id is used to find the input box and send the key variable into it. Then driver.find_element_by_class_name is used to find the search button, and click() clicks it. We maximize the window and pause for 15 seconds: automated login to Taobao is recognized by Alibaba, so the pause gives you time to log in manually by scanning the QR code. Then XPath is used to find the label holding the page count, the digits are matched, and the first value is taken, so for a 5-page result the function returns 5. That number is returned as page, and get_product() is called to fetch the product details on the page: the product name, the price, the number of buyers, the location, the store name, and so on.
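To see what the regular-expression step does in isolation, here is a standalone sketch; the pager text is a made-up example of what the label might contain:

```python
import re

page_text = 'Total of 100 pages'  # hypothetical pager label text
page = re.findall(r'(\d+)', page_text)[0]  # take the first run of digits
print(int(page))  # 100
```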

[Image: product details on the search results page]

```python
def get_product():
    # each product on the results page lives in one of these divs
    divs = driver.find_elements_by_xpath('//div[@class="items"]/div[@class="item J_MouserOnverReq "]')
    print(divs)
    for div in divs:
        # product title
        info = div.find_element_by_xpath('.//div[@class="row row-2 title"]/a').text
        # price, with the yuan unit appended
        price = div.find_element_by_xpath('.//strong').text + '元'
        # number of buyers
        deal = div.find_element_by_xpath('.//div[@class="deal-cnt"]').text
        # store name
        name = div.find_element_by_xpath('.//div[@class="shop"]/a').text
        # store location
        place = div.find_element_by_xpath('.//div[@class="location"]').text
        print(info, price, deal, name, place, sep='|')
        with open('ins short sleeve.csv', 'a', newline='') as fp:
            csvwriter = csv.writer(fp, delimiter=',')
            csvwriter.writerow([info, price, deal, name, place])
```

First we find the div tags of the product list, then use a for loop to take each product's div, use XPath syntax to extract the info, price, deal, name, and place fields, and append them to a CSV file.
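One small refactor worth mentioning (my sketch, same fields and output): the original reopens the CSV file once per product, so opening it a single time before the loop avoids the repeated file handles:

```python
def get_product():
    divs = driver.find_elements_by_xpath('//div[@class="items"]/div[@class="item J_MouserOnverReq "]')
    # open the output file once and append one row per product
    with open('ins short sleeve.csv', 'a', newline='') as fp:
        csvwriter = csv.writer(fp, delimiter=',')
        for div in divs:
            info = div.find_element_by_xpath('.//div[@class="row row-2 title"]/a').text
            price = div.find_element_by_xpath('.//strong').text + '元'
            deal = div.find_element_by_xpath('.//div[@class="deal-cnt"]').text
            name = div.find_element_by_xpath('.//div[@class="shop"]/a').text
            place = div.find_element_by_xpath('.//div[@class="location"]').text
            csvwriter.writerow([info, price, deal, name, place])
```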

  • The crawled data is finally imported into Excel, as shown in the figure below:

    [Image: crawled data in Excel]
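As an alternative to Excel, you can also load the CSV with pandas to check the results quickly (assuming pandas is installed; the filename matches the one written in get_product):

```python
import pandas as pd

# the CSV has no header row, so name the columns ourselves
df = pd.read_csv('ins short sleeve.csv', header=None,
                 names=['info', 'price', 'deal', 'name', 'place'])
print(df.head())
```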
