Environment configuration

Install Python 3.7.3, beautifulsoup4, requests, and lxml (the parser used in the code below).
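
Before going further, it can help to confirm that the packages are importable. The following is only a minimal sketch using the standard library; the helper name `missing_packages` is made up for illustration. Note that beautifulsoup4 is imported under the module name `bs4`.

```python
import importlib.util

def missing_packages(modules):
    """Return the top-level module names from `modules` that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    # beautifulsoup4 installs the module `bs4`; lxml is the parser backend.
    print(missing_packages(["requests", "bs4", "lxml"]))
```

An empty list means everything is installed; any name that is printed still needs to be installed with pip.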

Request the HTML page content

import requests
url = 'http://www.eastmoney.com/'
req = requests.get(url)
req.encoding = req.apparent_encoding
html = req.text

Call print(html) to print the content of the web page.
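
The request above assumes the server answers quickly and successfully. A slightly more defensive version might look like the sketch below; the function name `fetch_html` and the 10-second timeout are assumptions made for illustration and are not part of the original script.

```python
import requests

def fetch_html(url, timeout=10):
    """Download a page and return its text, decoded with the detected encoding."""
    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()                  # raise an exception on 4xx/5xx responses
    resp.encoding = resp.apparent_encoding   # guess the encoding from the body
    return resp.text
```

Calling `fetch_html('http://www.eastmoney.com/')` should give the same string as `html` in the snippet above, but it fails loudly instead of hanging or silently returning an error page.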

Web page parsing

The BeautifulSoup module is used to extract the news items from the home page of Eastmoney (Oriental Fortune Network). In the browser, right-click the element you are interested in and choose Inspect to see the corresponding source code of the web page.

We find that the corresponding elements sit inside <div class="nlist">, so we can filter out the relevant code by that class name.

from bs4 import BeautifulSoup
bf = BeautifulSoup(html, 'lxml')
nlist = bf.find_all(class_='nlist')[0]  # find_all returns a list; take the first matching block
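
To see what the `class_` filter returns without hitting the live site, the sketch below parses a small hand-written HTML fragment. The fragment is invented for illustration and is not the real Eastmoney markup. It also uses Python's built-in `html.parser` instead of `lxml` so that it runs without the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the real page source.
SAMPLE_HTML = """
<div class="nlist"><ul><li><a href="http://example.com/1">First headline</a></li></ul></div>
<div class="other"><a href="http://example.com/x">Unrelated link</a></div>
<div class="nlist"><ul><li><a href="http://example.com/2">Second headline</a></li></ul></div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns a list-like ResultSet with one Tag per matching element.
blocks = soup.find_all(class_="nlist")
print(len(blocks))

# Index into the result to work with a single block.
first_block = blocks[0]
print(first_block.a.string)
```

Because `find_all` returns a list-like object rather than a single tag, calling `find_all('a')` on it directly raises an error. That is why a single element has to be picked out first.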

We find that each news title and its link are held in an <a> tag, so we can collect them with find_all.

a = nlist.find_all('a')
for each in a:
    print(each.string, each.get('href'))
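
One detail worth knowing: `each.string` is `None` when an `<a>` tag contains nested tags rather than a single piece of text. The sketch below, again run on an invented fragment, uses `get_text` instead; the helper name `extract_links` is hypothetical.

```python
from bs4 import BeautifulSoup

def extract_links(block):
    """Return (title, href) pairs for every <a> tag inside `block`."""
    pairs = []
    for a in block.find_all("a"):
        title = a.get_text(" ", strip=True)  # joins text from nested tags too
        href = a.get("href")                 # None if the attribute is missing
        pairs.append((title, href))
    return pairs

# Hypothetical fragment; the second link wraps part of its text in <b>.
SAMPLE = """
<div class="nlist">
  <a href="http://example.com/1">Plain title</a>
  <a href="http://example.com/2"><b>Bold</b> title</a>
</div>
"""
block = BeautifulSoup(SAMPLE, "html.parser").find(class_="nlist")
for title, href in extract_links(block):
    print(title, href)
```

Whether this matters depends on how the real page marks up its headlines, so treat it as an optional safeguard.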

Store the results as CSV

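import csv
date = open('test.csv', 'w', newline='')  # newline='' avoids blank lines between rows on Windows
writer = csv.writer(date)
for each in a:
    writer.writerow([each.string, each.get('href')])  # one row per link: title, URL
date.close()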
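
A `with` block closes the file automatically, and the `csv` module documentation recommends opening the file with `newline=''` so that the writer controls the line endings. Here is a minimal sketch, assuming the rows are (title, link) pairs; the function name `save_links`, the header row, and the UTF-8 encoding are choices made here for illustration.

```python
import csv

def save_links(rows, path):
    """Write (title, link) pairs to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "link"])  # header row (an addition, not in the original)
        writer.writerows(rows)              # one CSV row per (title, link) pair
```

For example, `save_links([('Example headline', 'http://example.com/1')], 'test.csv')` would produce a two-line file: the header and one data row.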

The complete code is as follows

# -*- coding: utf-8 -*-
# @Time    : 2019/4/8 17:40
# @Author  : linjingtu
# @Email   : [email protected]
# @File    : test.py
# @Software: PyCharm

import requests
import lxml
from bs4 import BeautifulSoup
import csv

date = open('F:\\test.csv', 'w', newline='')  # newline='' avoids blank lines between rows on Windows
writer = csv.writer(date)

url = 'http://www.eastmoney.com/'
req = requests.get(url)
req.encoding = req.apparent_encoding
html = req.text

bf = BeautifulSoup(html, 'lxml')
nlist = bf.find_all(class_ = 'nlist')[0]

a = nlist.find_all('a')
for each in a:
    a_list = []
    a_list.append(each.string)
    a_list.append(each.get('href'))
    writer.writerow(a_list)

date.close()
#print(nlist)