I don’t know what to say

As a newcomer to web crawlers, my first small task was, of course, to scrape our lovely new friend, New Biquge (xbiquge).

It's not that difficult, but let me share a little of my coding journey. I hope it gives you some ideas or help.

Of course, if an expert can point out mistakes or things that could be improved, that would be even better. I'll be waiting for you~

Project demo

Before I talk about the project, let me show you how it runs. Otherwise you might read for a long time only to find the result is not what you wanted, and that would be frustrating.

< — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — line — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — — >

Honestly, I think I was silly to write the code this way, because it does not download a whole novel in one go. The result is that I foolishly download chapter after chapter, and even felt pleased with myself. Sigh.

Explanation of code ideas

Modules to be used by the project

    import os
    from time import sleep
    import requests
    from lxml import etree

Install the third-party modules with pip or pip3 (os and time are part of the standard library):

    pip3 install requests lxml

The page that lists all novels on New Biquge; the link to every novel is on this page:

    url = 'www.xbiquge.la/xiaoshuodaq…'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}
    # request the page
    all_book_r = requests.get(url, headers=headers)
    # parse the response into an element tree
    all_book_html = etree.HTML(all_book_r.content.decode('utf-8'))

Press F12 (at the top right of the keyboard), or right-click and choose Inspect, to open the developer tools (the white panel on the right). Then click the element picker button in the top-left corner of the developer tools, pick any novel on the page and click it. The HTML view in the developer tools jumps to the element we selected. We can see that it is an a tag inside a ul list, and this ul list holds the links to all the novels, so the rest is easy~

We use Chrome's XPath plugin to work out the XPath for the links to all the novels on New Biquge. If you don't have the plugin, you can download and install it yourself (strongly recommended, it is especially useful). Link: pan.baidu.com/s/1_HzBzOp-… Extraction code: sb7p. For XPath usage, here is an expert's blog post you can read if anything is unclear.

We pass the XPath expression that correctly selects the information we need to the xpath method. The xpath method then collects every match from the HTML page we fetched into a list, which we store under a variable name of our choosing.
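To make the "query in, list out" idea concrete, here is a minimal sketch on a hand-written fragment. The sample_html string and its links are invented for illustration. It uses the standard library's xml.etree.ElementTree as a stand-in so it runs without installing anything; ElementTree supports only a subset of XPath, so attributes are read with .get('href') rather than the /@href step that lxml allows.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment that mimics the novel list page.
sample_html = """
<html><body>
  <div class="novellist">
    <ul>
      <li><a href="http://example.com/book/1/">Book One</a></li>
      <li><a href="http://example.com/book/2/">Book Two</a></li>
    </ul>
  </div>
</body></html>
"""

root = ET.fromstring(sample_html.strip())

# findall() returns a list of every <a> element matching the path.
links = root.findall(".//div[@class='novellist']/ul/li/a")

# Two parallel lists, similar in shape to all_book_url and all_book_title.
sample_urls = [a.get("href") for a in links]
sample_titles = [a.text for a in links]

print(sample_urls)
print(sample_titles)
```

Because both lists come from the same elements in the same order, the title at a given index belongs to the link at that index, which is what the lookup by index in the article relies on.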

    # list of links to all novels
    all_book_url = all_book_html.xpath('//div[@class="novellist"]/ul/li/a/@href')
    # list of titles of all novels
    all_book_title = all_book_html.xpath('//div[@class="novellist"]/ul/li/a/text()')
    print(all_book_url)

Set a counter num and check once per loop iteration until the novel is found. num is then used as the index into the list above to get the link of the novel the user wants to download, and from that link the novel's data.

    # ask for the title of the novel to look for
    find_book = input('Enter the name of the book you want to download: ')
    num = 0
    for book_title in all_book_title:
        if find_book == book_title:
            print('Found:', book_title)
            book_url = all_book_url[num]
            # request the novel's page
            book_r = requests.get(book_url, headers=headers)
            # parse it
            book_html = etree.HTML(book_r.content.decode('utf-8'))
            # list of chapter links for this novel
            book_url = book_html.xpath('//div[@id="list"]/dl/dd/a/@href')
            # list of chapter titles for this novel
            chapter_title = book_html.xpath('//div[@id="list"]/dl/dd/a/text()')
        # incremented once per loop iteration
        num += 1

Without a folder per novel, the downloaded chapters were not only messy but also hard to find when I wanted to read one. (It wasn't really that hard as long as one novel's chapters were kept together, but still.)
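As an alternative to the manual num counter, the lookup can be sketched with zip, which walks the title list and the link list in step. The function name find_book_url and the sample data are made up for illustration; this is one possible way to write it, not the article's method.

```python
def find_book_url(titles, urls, wanted):
    """Return the link whose title equals `wanted`, or None if absent."""
    # zip() pairs each title with the link at the same position,
    # so no separate counter has to be kept in sync.
    for title, url in zip(titles, urls):
        if title == wanted:
            return url
    return None


# Made-up sample data standing in for all_book_title / all_book_url.
sample_titles = ["Book One", "Book Two", "Book Three"]
sample_urls = [
    "http://example.com/1/",
    "http://example.com/2/",
    "http://example.com/3/",
]

print(find_book_url(sample_titles, sample_urls, "Book Two"))
print(find_book_url(sample_titles, sample_urls, "Missing Book"))
```

Returning None for a missing title lets the caller decide what to print, instead of detecting the last loop iteration by hand.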

Check whether the path exists: os.path.exists returns True if it does and False if it does not.

    judge = os.path.exists('../novel/%s' % str(book_title))
    # create the folder if it does not exist yet
    if not judge:
        os.makedirs('../novel/%s' % str(book_title))

Next, we loop over the chapter links selected by the numbers the user enters, and extract each chapter's text via xpath.
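The exists-then-create check for the novel's folder can also be collapsed into one call. The sketch below assumes a hypothetical helper named ensure_book_dir and runs inside a temporary directory so nothing is left on disk; os.makedirs with exist_ok=True is a standard-library feature.

```python
import os
import tempfile


def ensure_book_dir(base_dir, book_title):
    """Create base_dir/book_title if needed and return its path."""
    book_dir = os.path.join(base_dir, book_title)
    # exist_ok=True makes the call a no-op when the folder already exists.
    os.makedirs(book_dir, exist_ok=True)
    return book_dir


# Try it inside a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    first = ensure_book_dir(tmp, "Sample Book")
    second = ensure_book_dir(tmp, "Sample Book")  # second call does not raise
    created = os.path.isdir(first)

print(created, first == second)
```

os.path.join also avoids building the path by string formatting, which keeps the separator correct on both Windows and Linux.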

Tell the user how many chapters the novel has:

    print('<------ Please enter chapter numbers (this novel has %s chapters in total) ------>' % len(chapter_title))

    # first chapter to download
    download_book_start = int(input('Start from chapter: '))
    # last chapter to download
    download_book_end = int(input('End at chapter: '))
    chapter_num = 0
    # download_book_start - 1 because list indexes start at 0
    for book_content_url in book_url[download_book_start - 1:download_book_end]:
        # pause 2 seconds before each request
        sleep(2)
        new_book_content_url = 'http://www.xbiquge.la' + book_content_url
        book_content_r = requests.get(new_book_content_url, headers=headers)
        book_content_html = etree.HTML(book_content_r.content.decode('utf-8'))
        # list of text fragments that make up the chapter body
        book_content = book_content_html.xpath('//div[@class="box_con"]/div[@id="content"]/text()')
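The chapter selection step can be isolated and checked without touching the network. The sketch below assumes a hypothetical helper select_chapter_urls and invented relative links; it shows the 1-based-to-0-based slice and uses urljoin from the standard library in place of string concatenation.

```python
from urllib.parse import urljoin

BASE_URL = "http://www.xbiquge.la"


def select_chapter_urls(chapter_hrefs, start, end):
    """Return absolute URLs for chapters start..end (1-based, inclusive)."""
    if not 1 <= start <= end <= len(chapter_hrefs):
        raise ValueError("chapter range out of bounds")
    # Lists are 0-based, so chapter `start` lives at index start - 1;
    # the slice end is exclusive, so `end` itself is the right bound.
    chosen = chapter_hrefs[start - 1:end]
    return [urljoin(BASE_URL, href) for href in chosen]


# Invented relative links in the shape a chapter list might return.
sample_hrefs = ["/1/1/100.html", "/1/1/101.html", "/1/1/102.html", "/1/1/103.html"]
print(select_chapter_urls(sample_hrefs, 2, 3))
```

Validating the range up front means a typo such as entering chapter 0 raises a clear error instead of silently returning an empty or unexpected slice.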

Still inside the loop, concatenate the list of text fragments into the all_content string and write it to a text file:

        with open('../novel/%s/%s.txt' % (str(book_title), chapter_title[download_book_start + chapter_num - 1]), 'w', encoding='utf-8') as write_content:
            # concatenate the fragments into one string
            all_content = ''
            for content in book_content:
                all_content += content
            write_content.write(all_content)
        print(chapter_title[download_book_start + chapter_num - 1], 'downloaded')
        chapter_num += 1
    print('All downloads complete')
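The concatenate-and-write step can be sketched as a small function. The name save_chapter and the sample fragments are assumptions for illustration; str.join does the same job as the += loop in a single call.

```python
import os
import tempfile


def save_chapter(folder, title, text_nodes):
    """Join the text fragments and write them to folder/title.txt."""
    # str.join builds the final string in one pass instead of repeated +=.
    all_content = "".join(text_nodes)
    path = os.path.join(folder, "%s.txt" % title)
    with open(path, "w", encoding="utf-8") as write_content:
        write_content.write(all_content)
    return path


# Made-up fragments standing in for the book_content list.
sample_nodes = ["First paragraph.\n", "Second paragraph.\n"]
with tempfile.TemporaryDirectory() as tmp:
    saved = save_chapter(tmp, "Chapter 1", sample_nodes)
    with open(saved, encoding="utf-8") as f:
        read_back = f.read()

print(read_back)
```

Real chapter titles may contain characters that are not allowed in file names on some systems, so a production version would probably want to clean the title first.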

Emmm, that is probably about it. I think I covered it in fair detail. This is my first write-up as a beginner, so if anything is lacking, feel free to point it out (please be gentle) and I will improve it over time.

The complete code

    import os
    from time import sleep

    import requests
    from lxml import etree

    url = 'www.xbiquge.la/xiaoshuodaq…'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'}

    # fetch and parse the page that lists all novels
    all_book_r = requests.get(url, headers=headers)
    all_book_html = etree.HTML(all_book_r.content.decode('utf-8'))
    # links and titles of all novels, in the same order
    all_book_url = all_book_html.xpath('//div[@class="novellist"]/ul/li/a/@href')
    all_book_title = all_book_html.xpath('//div[@class="novellist"]/ul/li/a/text()')
    print(all_book_url)

    find_book = input('Enter the name of the book you want to download: ')
    num = 0
    for book_title in all_book_title:
        if find_book == book_title:
            print('Found:', book_title)
            book_url = all_book_url[num]
            # fetch and parse the novel's chapter list
            book_r = requests.get(book_url, headers=headers)
            book_html = etree.HTML(book_r.content.decode('utf-8'))
            book_url = book_html.xpath('//div[@id="list"]/dl/dd/a/@href')
            chapter_title = book_html.xpath('//div[@id="list"]/dl/dd/a/text()')
            # create a folder for the novel if it does not exist yet
            judge = os.path.exists('../novel/%s' % str(book_title))
            if not judge:
                os.makedirs('../novel/%s' % str(book_title))
            print('<------ Please enter chapter numbers (this novel has %s chapters in total) ------>' % len(chapter_title))
            download_book_start = int(input('Start from chapter: '))
            download_book_end = int(input('End at chapter: '))
            chapter_num = 0
            # download_book_start - 1 because list indexes start at 0
            for book_content_url in book_url[download_book_start - 1:download_book_end]:
                sleep(2)
                new_book_content_url = 'http://www.xbiquge.la' + book_content_url
                book_content_r = requests.get(new_book_content_url, headers=headers)
                book_content_html = etree.HTML(book_content_r.content.decode('utf-8'))
                book_content = book_content_html.xpath('//div[@class="box_con"]/div[@id="content"]/text()')
                with open('../novel/%s/%s.txt' % (str(book_title), chapter_title[download_book_start + chapter_num - 1]), 'w', encoding='utf-8') as write_content:
                    all_content = ''
                    for content in book_content:
                        all_content += content
                    write_content.write(all_content)
                print(chapter_title[download_book_start + chapter_num - 1], 'downloaded')
                chapter_num += 1
            print('All downloads complete')
            break
        elif num + 1 == len(all_book_title):
            # reached the last title without a match
            print('Novel not found')
        num += 1

This article is reprinted from SCDN