


The author | apes tx program

https://juejin.cn/post/6844903605242183687


I was still young and funny…


Anyway, here I'm using Selenium to simulate the login, BeautifulSoup4 to crawl the data, and wordcloud to generate the word cloud.


BeautifulSoup installation


pip install beautifulsoup4


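Beautiful Soup supports several parsers. The major ones, with their pros and cons:

html.parser (built into Python): no extra install and decent speed, but not as fast as lxml and less lenient than html5lib.

lxml: very fast and lenient, but it is an external C dependency.

html5lib: extremely lenient and parses pages the way a browser does, but it is very slow and is an external Python dependency.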
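
As a rough illustration of choosing a parser: Beautiful Soup takes the parser name as its second argument. The sketch below prefers lxml when it is installed and otherwise falls back to the built-in html.parser. The helper names pick_parser and parse are made up for this example and are not part of the original post.

```python
import importlib.util


def pick_parser():
    """Return "lxml" if the lxml package is installed, else "html.parser"."""
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"  # ships with Python, no extra install needed


def parse(html):
    """Parse an HTML string with Beautiful Soup using the chosen parser."""
    # Imported here so pick_parser() works even before bs4 is installed.
    from bs4 import BeautifulSoup

    return BeautifulSoup(html, pick_parser())
```

For example, parse("<p>hi</p>").p.get_text() should give back the paragraph text with either parser.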



Selenium Simulated Login


Install Selenium:

pip install selenium

Selenium is used here to simulate logging in to QQ Space.


I'm using the Chrome browser, so webdriver.Chrome() is used to get the Chrome driver.


If you get the error "chromedriver executable needs to be in PATH", see https://blog.csdn.net/zxy987872674/article/details/53082896


The fix is to copy chromedriver.exe to the Python installation directory, for example D:\Python. You also need to add the Python installation directory to the PATH system environment variable.
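
Before starting the browser, it can help to check whether the driver is actually visible on PATH. The following is a small sketch using only the standard library. The function name find_chromedriver is made up for this example.

```python
import shutil


def find_chromedriver(name="chromedriver"):
    """Return the full path of the driver found on PATH, or None."""
    # shutil.which searches PATH the way the shell would
    # (on Windows it also tries extensions such as .exe).
    return shutil.which(name)


if __name__ == "__main__":
    path = find_chromedriver()
    if path is None:
        print("chromedriver is not on PATH")
    else:
        print("chromedriver found at", path)
```

If this reports that the driver is missing, the PATH change has not taken effect yet. Opening a new terminal usually picks it up.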


The QQ login page is http://i.qq.com. Use webdriver to open the QQ Space login page:


from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://i.qq.com")



Open the page, right-click and inspect the page elements, and you will find that the account and password fields are inside login_frame. Switch to that frame with driver.switch_to.frame("login_frame"), click the account/password login button, and enter the account and password automatically to log in. Then open the QQ talk (shuoshuo) page.
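
One possible sketch of these steps, using the Selenium 4 locator style. Only login_frame and http://i.qq.com come from the text above. The element IDs (switcher_plogin, u, p, login_button), the /311 talk page path and the function names are assumptions for illustration, so check them against the live page with the inspector.

```python
import time


def talk_page_url(qq):
    """Build the talk page URL (the /311 path is an assumption)."""
    return "https://user.qzone.qq.com/{}/311".format(qq)


def login_and_open_talk(qq, password, wait=3):
    """Log in through login_frame, then open the talk page."""
    # Imported here so the URL helper works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get("http://i.qq.com")
    # The account/password form lives inside the login_frame iframe.
    driver.switch_to.frame("login_frame")
    # The element IDs below are assumptions; verify them in the inspector.
    driver.find_element(By.ID, "switcher_plogin").click()
    driver.find_element(By.ID, "u").send_keys(qq)
    driver.find_element(By.ID, "p").send_keys(password)
    driver.find_element(By.ID, "login_button").click()
    time.sleep(wait)  # crude wait for the login to complete
    driver.switch_to.default_content()  # leave the iframe again
    driver.get(talk_page_url(qq))
    return driver
```

A fixed sleep is the simplest way to wait for the login. An explicit wait on a known element would be more robust.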



At this point the QQ talk page is open. Note that some spaces show a prompt box when they are opened, so you need to simulate a click to close it.
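
Since the prompt box only appears for some spaces, the click should not fail when the box is absent. A minimal sketch follows. The default button id dialog_button_1 is a placeholder, not taken from the original post, and has to be replaced with the id found in the inspector.

```python
def close_prompt(driver, button_id="dialog_button_1"):
    """Click the prompt's close button if present; return True if clicked."""
    # find_elements returns an empty list instead of raising when nothing
    # matches, so spaces without a prompt box are handled quietly.
    # "id" is the locator strategy string behind Selenium's By.ID.
    buttons = driver.find_elements("id", button_id)
    if not buttons:
        return False
    buttons[0].click()
    return True
```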



Also, because the content is loaded dynamically, you need to scroll down automatically until all the content on the page has loaded, and then simulate a click on the next page to load more.
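
The scroll-then-next-page loop could look roughly like the sketch below. It scrolls until the page height stops growing, then clicks a next-page link if one exists. The link text "Next page", the pause length and the function names are assumptions. On the real page the link text is most likely Chinese.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll down until the page height stops growing; return rounds used."""
    get_height = "return document.body.scrollHeight"
    last = driver.execute_script(get_height)
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the dynamic content time to load
        new = driver.execute_script(get_height)
        if new == last:  # nothing new was loaded
            return rounds
        last = new
    return max_rounds


def click_next_page(driver, link_text="Next page"):
    """Click the next-page link if present; return False on the last page."""
    # "link text" is the locator strategy string behind By.LINK_TEXT.
    links = driver.find_elements("link text", link_text)
    if not links:
        return False
    links[0].click()
    return True
```

A crawl would then repeat scroll_to_bottom(driver), collect the page, and stop once click_next_page(driver) returns False.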


BeautifulSoup crawl


Press F12 to inspect the content: the posts are in the feed_wrap list, as an array of list item (li) tags. The text of each post is in the tag with class="bd".



At this point the QQ talk posts have been crawled and saved in the QQ_word file.
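
The parsing and saving steps might be sketched as follows. The names feed_wrap, class="bd" and QQ_word come from the text. Treating feed_wrap as an element id, the li nesting, the one-post-per-line file layout and the function names are assumptions.

```python
def extract_posts(html, parser="html.parser"):
    """Return the text of every class="bd" element under feed_wrap."""
    # Imported here so save_posts() works without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, parser)
    # Assumes feed_wrap is an element id; use ".feed_wrap" if it is a class.
    return [bd.get_text(strip=True) for bd in soup.select("#feed_wrap li .bd")]


def save_posts(posts, path="QQ_word"):
    """Append posts to a UTF-8 text file, one per line; return count written."""
    kept = [p for p in posts if p]  # skip posts with no text
    with open(path, "a", encoding="utf-8") as f:
        for post in kept:
            f.write(post + "\n")
    return len(kept)
```

With a logged-in driver, each loaded page can be stored with save_posts(extract_posts(driver.page_source)).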


Word cloud


Use the wordcloud package to generate the word cloud:


pip install wordcloud


Here you can use the jieba word segmenter. I did not, because I think whole sentences from QQ talk read with more feeling. That is a personal preference. Use jieba if you want to see which words appear most frequently.
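
For readers who do want word frequencies, here is a small sketch of the jieba route. jieba.lcut is jieba's list-returning segmenter. The function names and the rule of dropping single-character tokens are choices made for this example.

```python
from collections import Counter


def segment(text):
    """Split Chinese text into a list of words with jieba."""
    import jieba  # imported here so top_words() works without jieba

    return jieba.lcut(text)


def top_words(tokens, n=10, min_len=2):
    """Return the n most common tokens with at least min_len characters."""
    # Dropping single characters filters out most particles and punctuation.
    counts = Counter(t for t in tokens if len(t.strip()) >= min_len)
    return counts.most_common(n)
```

Joining the segmented words with spaces, " ".join(segment(text)), gives text that wordcloud can split on whitespace.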


Set some properties of WordCloud. Note that the font_path property must be set here, otherwise Chinese characters will appear garbled.
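
A minimal sketch of such a configuration is shown below. font_path, background_color, width, height and max_words are WordCloud parameters. The candidate font paths, the output file name and the function names are assumptions. Any font file that contains Chinese glyphs should work.

```python
import os


def first_existing(paths):
    """Return the first path that is an existing file, or None."""
    for p in paths:
        if os.path.isfile(p):
            return p
    return None


def make_cloud(text_path="QQ_word", out_path="cloud.png", fonts=None):
    """Render the saved text as a word cloud image."""
    from wordcloud import WordCloud  # needs: pip install wordcloud

    # Candidate fonts are assumptions; pass your own list via fonts=[...].
    fonts = fonts or ["C:/Windows/Fonts/simhei.ttf",
                      "/System/Library/Fonts/PingFang.ttc"]
    font = first_existing(fonts)
    if font is None:
        raise FileNotFoundError("no Chinese-capable font found")
    with open(text_path, encoding="utf-8") as f:
        text = f.read()
    cloud = WordCloud(font_path=font, background_color="white",
                      width=800, height=600, max_words=200)
    cloud.generate(text).to_file(out_path)
    return out_path
```

Failing early when no font is found avoids silently producing an image full of empty boxes.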


As a reminder, if you are using a virtual environment, do not run the script inside it, or you may get an error.


Run deactivate to exit the virtual environment, then run the script again.
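
If it is unclear whether a virtual environment is still active, the interpreter can report it. This is a small standard-library sketch. The function name in_virtualenv is made up for this example.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv/virtualenv."""
    # In a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```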



That's it: the QQ talk content has been crawled and the word cloud image generated.


— the End —