Python Small Project - Crawling Articles from Eastmoney

Environment configuration

Install Python 3.7.3, then the beautifulsoup4, requests, and lxml packages (lxml is the parser passed to BeautifulSoup below).

Request the page and print its HTML content

import requests
url = 'http://www.eastmoney.com/'
req = requests.get(url)
req.encoding = req.apparent_encoding
html = req.text

Call print(html) to print the content of the web page.
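The request step above assumes the site always answers. As a minimal sketch only, the same request can be wrapped so that a timeout or an HTTP error does not crash the script. The function name fetch_html and the 10-second timeout are assumptions chosen for illustration, not part of the original code.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its text, or None if the request fails.

    fetch_html and the default timeout are illustrative choices.
    """
    try:
        resp = requests.get(url, timeout=timeout)
        # Turn 4xx/5xx responses into exceptions instead of parsing an error page
        resp.raise_for_status()
    except requests.RequestException as exc:
        print(f"request failed: {exc}")
        return None
    # Let requests guess the encoding from the body, as in the original snippet
    resp.encoding = resp.apparent_encoding
    return resp.text
```

Calling html = fetch_html('http://www.eastmoney.com/') would then give either the page text or None, so the parsing step can check for None before continuing.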

Web page parsing

The BeautifulSoup module is used to extract the news items from the Eastmoney home page. Right-click the corresponding element in the browser and choose Inspect to see the source code of the web page.

We find that the corresponding elements are contained in <div class="nlist">, so we can filter the page down to that block.

from bs4 import BeautifulSoup
bf = BeautifulSoup(html, 'lxml')
# find_all returns a list of matches; take the first <div class="nlist">
nlist = bf.find_all(class_='nlist')[0]

The title and link of each news item sit in an <a> tag inside that block, so we collect them with find_all.

a = nlist.find_all('a')
for each in a:
    print(each.string, each.get('href'))
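The extraction step can be tried offline before hitting the real site. The sketch below runs the same find-the-container, then find-the-links logic on a small hand-written HTML string. SAMPLE_HTML, the example.com links, and the helper name extract_links are all made up for illustration, and the built-in 'html.parser' is swapped in for 'lxml' so the example has no extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the structure described above
SAMPLE_HTML = """
<div class="nlist">
  <ul>
    <li><a href="http://example.com/a1">First headline</a></li>
    <li><a href="http://example.com/a2">Second headline</a></li>
  </ul>
</div>
"""


def extract_links(html, parser="html.parser"):
    """Return (title, href) pairs for every <a> inside the first .nlist block."""
    soup = BeautifulSoup(html, parser)
    container = soup.find(class_="nlist")
    if container is None:
        # The page layout changed or the block is missing
        return []
    return [(a.get_text(strip=True), a.get("href"))
            for a in container.find_all("a")]


links = extract_links(SAMPLE_HTML)
for title, href in links:
    print(title, href)
```

get_text() is used here instead of .string because .string returns None when an <a> tag contains more than one child node, for example text plus a nested <span>.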

Store the results as CSV

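import csv
# newline='' stops the csv module from writing blank rows on Windows;
# utf-8 lets the Chinese titles be written regardless of the system default
csv_file = open('test.csv', 'w', newline='', encoding='utf-8')
writer = csv.writer(csv_file)
csv_file.close()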
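The snippet above only opens and closes the file. As a rough sketch of the writing step itself, the example below sends a header row and a few rows through csv.writer into an in-memory buffer, so the output can be inspected without touching the disk. The sample rows, the header names, and the helper write_rows are assumptions for illustration.

```python
import csv
import io

# Hypothetical (title, url) pairs standing in for the scraped links
rows = [
    ("First headline", "http://example.com/a1"),
    ("Second, with comma", "http://example.com/a2"),
]


def write_rows(stream, rows):
    """Write a header line followed by one CSV row per (title, url) pair."""
    writer = csv.writer(stream)
    writer.writerow(["title", "url"])
    writer.writerows(rows)


buffer = io.StringIO()
write_rows(buffer, rows)
print(buffer.getvalue())
```

The same helper works on a real file: open it with open('test.csv', 'w', newline='', encoding='utf-8') inside a with block and pass the file object as stream. Titles that contain commas are quoted automatically by csv.writer.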

The complete code is as follows

# -*- coding: utf-8 -*-
# @Time    : 2019/4/8 17:40
# @Author  : linjingtu
# @Email   : [email protected]
# @File    : test.py
# @Software: PyCharm

import csv

import requests
from bs4 import BeautifulSoup

# Download the home page and decode it with the detected encoding
url = 'http://www.eastmoney.com/'
req = requests.get(url)
req.encoding = req.apparent_encoding
html = req.text

# Keep only the first <div class="nlist"> block
bf = BeautifulSoup(html, 'lxml')
nlist = bf.find_all(class_='nlist')[0]

# newline='' avoids blank rows on Windows; the with block closes the file
with open('F:\\test.csv', 'w', newline='', encoding='utf-8') as csv_file:
    writer = csv.writer(csv_file)
    for each in nlist.find_all('a'):
        # One row per link: title text, then URL
        writer.writerow([each.string, each.get('href')])

