

# Today's goal

**Crawl every article of a WeChat official account in 50 lines of code**

Today we are going to crawl a WeChat official account. There are two common ways to do this. One is to go through Sogou search, whose drawback is that it only returns the ten most recently pushed articles. This post introduces the other way: capturing the requests made by the PC WeChat client, which is more convenient than the alternatives.

Analysis: every time the article list is pulled down to refresh, a request is sent to the `mp.weixin.qq.com/mp/xxx` interface (the official account homepage link cannot be included here, so `xxx` stands for `profile_ext`). After repeated testing and analysis, the following parameters turn out to be used:

- `__biz`: the unique ID linking the user and the official account
- `uin`: the user's private ID
- `key`: the secret key of the request, which expires after a period of time
- `offset`: the offset
- `count`: the number of items per request
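As a minimal sketch of how these parameters can be read out of a captured request, the snippet below parses a URL with Python's standard `urllib.parse`. The URL and every value in it are placeholders invented for illustration, and `extract_params` is a hypothetical helper, not part of the original code.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical captured URL; every value is a placeholder, not a real credential.
CAPTURED_URL = (
    "http://mp.weixin.qq.com/mp/xxx"
    "?action=getmsg&__biz=BIZ_PLACEHOLDER&uin=UIN_PLACEHOLDER"
    "&key=KEY_PLACEHOLDER&offset=10&count=10&f=json"
)


def extract_params(url):
    """Return the query parameters of a captured request as a flat dict."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each name to a list of values; keep the first one.
    return {name: values[0] for name, values in query.items()}


params = extract_params(CAPTURED_URL)
print(params["__biz"], params["uin"], params["key"])
```

The values come back as strings, so `offset` and `count` would need an `int()` conversion before any arithmetic.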

**Code implementation**

```
import json
import time

import requests
from pymongo import MongoClient

# The official account homepage link cannot be included here; xxx stands for profile_ext
url = 'http://mp.weixin.qq.com/mp/xxx'

# Mongo configuration
conn = MongoClient('127.0.0.1', 27017)
db = conn.wx  # Connect to the wx database; it is created automatically if missing
mongo_wx = db.article  # Use the article collection; it is created automatically if missing


def get_wx_article(biz, uin, key, index=0, count=10):
    # Note: with index=0 this yields offset == count,
    # so the first `count` records are not requested here
    offset = (index + 1) * count
    params = {
        '__biz': biz,
        'uin': uin,
        'key': key,
        'offset': offset,
        'count': count,
        'action': 'getmsg',
        'f': 'json'
    }
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'
    }

    response = requests.get(url=url, params=params, headers=headers)
    resp_json = response.json()
    if resp_json.get('errmsg') == 'ok':
        # Whether more pages exist; this determines the return value
        can_msg_continue = resp_json['can_msg_continue']
        # Number of articles on the current page
        msg_count = resp_json['msg_count']
        # general_msg_list is itself a JSON string, so decode it again
        general_msg_list = json.loads(resp_json['general_msg_list'])
        msg_list = general_msg_list.get('list')
        print(msg_list, '**************')
        for i in msg_list:
            app_msg_ext_info = i['app_msg_ext_info']
            # Title
            title = app_msg_ext_info['title']
            # Article URL
            content_url = app_msg_ext_info['content_url']
            # Cover image
            cover = app_msg_ext_info['cover']
            # Publication time
            datetime = i['comm_msg_info']['datetime']
            datetime = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(datetime))

            # insert_one replaces Collection.insert, which PyMongo 4 removed
            mongo_wx.insert_one({
                'title': title,
                'content_url': content_url,
                'cover': cover,
                'datetime': datetime
            })
        if can_msg_continue == 1:
            return True
        return False
    else:
        print('Failed to fetch articles.')
        return False


if __name__ == '__main__':
    biz = 'Mzg4MTA2Nzg0NA=='
    uin = 'NDIyMTI5NDM1'
    key = '20a680e825f03f1e7f38f326772e54e7dc0fd02ffba17e92730ba3f0a0329c5ed310b0bd55b3c0b1f122e5896c6261df2eaea4036ab5a5d32dbdbcb0a638f5f3605cf1821decf486bb6eb4d92d36c620'
    index = 0
    while True:
        print(f'Start crawling page {index + 1} of the official account.')
        flag = get_wx_article(biz, uin, key, index=index)
        # Pause for 8 seconds to avoid being blocked
        time.sleep(8)
        index += 1
        if not flag:
            print('All articles of the official account have been crawled; exiting.')
            break

        print(f'Ready to crawl page {index + 1} of the official account.')
```
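The parsing step can be tried without network access or MongoDB. The sketch below runs the same field extraction against a hand-written sample: the field names follow the code above, but `SAMPLE_RESPONSE`, its values, and the helper `parse_articles` are assumptions made up for illustration, not captured data.

```python
import json
import time

# Hypothetical response body; field names follow the crawler code, values are made up.
SAMPLE_RESPONSE = {
    "errmsg": "ok",
    "can_msg_continue": 1,
    "msg_count": 1,
    "general_msg_list": json.dumps({
        "list": [{
            "comm_msg_info": {"datetime": 1557000000},
            "app_msg_ext_info": {
                "title": "Sample title",
                "content_url": "http://example.com/article",
                "cover": "http://example.com/cover.jpg",
            },
        }]
    }),
}


def parse_articles(resp_json):
    """Turn one response page into a list of flat article dicts."""
    # general_msg_list is a JSON string nested inside the JSON response,
    # so it needs a second decode.
    msg_list = json.loads(resp_json["general_msg_list"]).get("list", [])
    articles = []
    for msg in msg_list:
        info = msg["app_msg_ext_info"]
        published = time.localtime(msg["comm_msg_info"]["datetime"])
        articles.append({
            "title": info["title"],
            "content_url": info["content_url"],
            "cover": info["cover"],
            "datetime": time.strftime("%Y-%m-%d %H:%M:%S", published),
        })
    return articles


articles = parse_articles(SAMPLE_RESPONSE)
print(articles)
```

Keeping the parsing separate from the HTTP request and the database insert makes it easier to check the field handling before spending a `key` that will soon expire.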


More technical information can be obtained from itheimaGZ