
Foreword: a few days ago, my younger sister, who is in junior high school, suddenly messaged me on WeChat (VX). She wanted to copy a few captions she had found online to post on her WeChat Moments, but the page would not let her copy anything!

When I heard this, I chuckled (thinking: is there any data on the web that my crawler can't scrape? Hasn't she heard the saying that circulates everywhere: "if you can see it, you can crawl it"?). I jumped out of bed, sat down at my computer, opened Google Chrome, and typed in the address my sister had sent. Sure enough:

It was the same familiar pop-up and the same annoying VIP-only privileges. But these are minor problems for us crawler developers. I opened PyCharm, typed away, and a few minutes later I had a crawler for my sister's website. Enter the URL, download, done:

After that, I tidied up the downloaded text into a TXT file and sent it straight to my sister. She showered me with "you're the best, big brother!" praise, and I melted. But then it occurred to me: my sister is a complete programming beginner, so the next time she runs into a similar problem she will come to me again, and I will have to solve it for her again! "No, no, no," I told myself, "this is the great taboo of IT!" Teach a person to fish instead of handing over a fish: that is the right way! But what exactly is the "fishing" here?


I won't keep you in suspense! Here is a simple trick to share with you. All you need is Google Chrome (whether you are a child, an uncle, or an aunt). Follow the simple steps below and the page will be unlocked, so you can copy whatever you want!

Step 1: Right-click a blank area of the page, then click "Inspect".

Step 2: Click the gear icon in the upper right corner of the panel that opens.

Step 3: Scroll down to "Disable JavaScript" and tick its checkbox.

Done! The page is now unlocked and you can copy whatever you want.
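
For readers who write crawlers, the reason this works can be sketched in Python. Copy protection of this kind is usually enforced by JavaScript (or CSS) in the page, while the text itself sits in the HTML, so anything that reads the raw HTML never triggers it. The snippet below is a minimal sketch using only the standard library. The `SAMPLE_HTML` page and the `extract_visible_text` helper are made up for illustration; this is not the crawler mentioned earlier.

```python
from html.parser import HTMLParser

# Hypothetical page: the captions are in the HTML, the copy block lives in script/style.
SAMPLE_HTML = """
<html><body oncopy="return false">
  <script>document.oncontextmenu = function () { return false; };</script>
  <style>body { user-select: none; }</style>
  <p class="caption">Stay hungry, stay foolish.</p>
  <p class="caption">A little progress every day.</p>
</body></html>
"""


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_visible_text(html):
    """Return the non-empty text nodes of an HTML string, in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


print(extract_visible_text(SAMPLE_HTML))
# ['Stay hungry, stay foolish.', 'A little progress every day.']
```

The script and the style rule are simply ignored, which is the same effect that disabling JavaScript has in the browser.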


However, if you are a programming geek, or want to become one, knowing just this one browser trick is not enough! If you master everything below, congratulations: you are already a very capable programmer!


1.Chrome Debug panel

(1) Common panels (essential for locating elements when crawling!)

  1. Element picker arrow button (first from the left):

Select the Elements panel and activate this button. You can then click an element on the page to jump to its location in the source, or select a node in the source to locate the corresponding element on the page.

  2. Mobile/PC view switch button (second from the left):

When this button is active, the page switches between the desktop site and the mobile site. Mobile pages are usually easier to crawl, so switching to the mobile version can make crawling faster.

  3. Elements panel:

This panel displays all of the rendered HTML source code. Use it to find the location, attributes, and other characteristics of each tag when you crawl a page with Selenium. More importantly, you can double-click the HTML source or the CSS on the right to edit it and change how the page looks, which lets you debug static pages.

  4. Console panel:

Shortcut key: Ctrl+` (the backtick/tilde key). This panel displays the log output produced while the page loads, including printed messages, warnings, and errors. It is also an interactive JS console.

  5. Sources panel:

This panel groups all requested resources (HTML, CSS, JPG, GIF, JS, etc.) by site. Because it holds every resource, this is where you look for the target code when debugging JS. The panel also provides the debugging buttons.

  6. Network panel:

This panel records the details of every network request, including request headers, response headers, form data, and parameters.

  7. Handy shortcuts (open the Inspect panel first!):

Press Ctrl+Shift+P, then:
  • Type "javascript" and select the Disable JavaScript option: this blocks the site's JS code, and after a refresh the site will no longer execute JavaScript!
  • Type "full" and select the full-size screenshot option: this captures a screenshot of the entire page.
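
The mobile view switch described above has a rough counterpart in a crawler: many sites decide which version to serve based on the User-Agent header, so a request that identifies itself as a phone often receives the mobile page. The sketch below only builds the request and does not send it. The User-Agent string, the `build_mobile_request` helper, and the URL are illustrative assumptions, and whether a given site actually honors the header has to be checked case by case.

```python
import urllib.request

# Illustrative mobile User-Agent string (an assumption; any current phone UA would do).
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1"
)


def build_mobile_request(url):
    """Build (but do not send) a request that identifies itself as a phone."""
    return urllib.request.Request(url, headers={"User-Agent": MOBILE_UA})


req = build_mobile_request("https://example.com/")  # placeholder URL
# urllib stores header names in capitalized form, hence "User-agent" here.
print(req.get_header("User-agent"))
```

To actually fetch the page, pass the request to `urllib.request.urlopen(req)`.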

(2) Network panel (essential for filtering requests and data types when crawling, for example to isolate asynchronously loaded requests!)

  1. All: all requests

  2. XHR: requests that JS sends dynamically through the XMLHttpRequest object (asynchronously loaded data)

  3. JS: JavaScript files

  4. CSS: stylesheets

  5. Image: images

  6. Media: audio and video

  7. Font: fonts

  8. Doc: HTML documents, such as the main page

  9. WS: WebSocket connections

  10. Hide data URLs: hides requests whose URL is a data: URL (inline embedded resources)

  11. Notes:

    (1) The Preserve log option in the upper left corner: when checked, the requests recorded for the previous page are not cleared when the page navigates. Take a web login as an example: everything before you click Login belongs to one page load, and everything after the click belongs to another. If this option is not checked, the login request is wiped after the click and you cannot see the login information!

    (2) The Disable cache option in the upper left corner: when checked, the browser does not use its local cache while DevTools is open. This option is usually selected to prevent unexpected errors caused by stale local cache during web operations.

    (3) The Filter box in the upper left corner. ① domain: filters by domain name, for example domain:baidu.com shows only requests to that domain, which makes cookies easier to find. ② set-cookie-name: the key of a cookie set by a response. It filters responses that set a cookie with this key. ③ set-cookie-value: the value of a cookie set by a response. It filters responses that set a cookie with this value. ④ cookie-name: the key of a cookie sent with a request. It filters requests that carry a cookie with this key.
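
Once the filters above have led you to the response that sets a cookie, a crawler typically needs to send that cookie back with later requests. The following is a minimal sketch with the standard library. The header values, the helper names `parse_set_cookie` and `build_request_with_cookies`, and the URL are all hypothetical; in practice you would copy the real Set-Cookie values from the Network panel.

```python
from http.cookies import SimpleCookie
import urllib.request

# Hypothetical Set-Cookie response header values, as copied from the Network panel.
SET_COOKIE_HEADERS = [
    "sessionid=abc123; Path=/; HttpOnly",
    "theme=dark; Path=/",
]


def parse_set_cookie(headers):
    """Return {cookie name: cookie value} from a list of Set-Cookie header values."""
    cookies = {}
    for header in headers:
        jar = SimpleCookie()
        jar.load(header)  # attributes such as Path and HttpOnly are parsed, not returned
        for name, morsel in jar.items():
            cookies[name] = morsel.value
    return cookies


def build_request_with_cookies(url, cookies):
    """Build (but do not send) a request that carries the given cookies."""
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return urllib.request.Request(url, headers={"Cookie": cookie_header})


cookies = parse_set_cookie(SET_COOKIE_HEADERS)
req = build_request_with_cookies("https://example.com/api/list", cookies)  # placeholder URL
print(cookies)                   # {'sessionid': 'abc123', 'theme': 'dark'}
print(req.get_header("Cookie"))  # sessionid=abc123; theme=dark
```

The cookie names here correspond to what set-cookie-name matches in the Filter box, and the values to what set-cookie-value matches.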

(3) Setting breakpoints (an essential operation for advanced crawling and JS reverse engineering!)

Part one: How to Use it!

Purpose: use debugging to find where the target data is generated (a must for JS reverse engineering!).

Breakpoints pause JavaScript code so that you can examine the values of variables and the call stack at a particular moment. The most basic way to set one is to add it manually on a particular line of code. To the left of the source code you can see the line numbers; this area is called the line number gutter, and clicking a line number there adds a breakpoint on that line. Breakpoints can also be configured to fire only when certain conditions are met, for example on events or DOM changes.

Part two: Step by step debugging!

Part three: Scope!

When the script pauses at a breakpoint, the Scope pane displays all variables and properties that are defined at that moment.

Part four: Call stack!

  • Near the top of the sidebar is the Call Stack pane. When the code pauses at a breakpoint, the Call Stack pane shows the execution path that led to that breakpoint, in reverse chronological order. This helps you understand where execution is now and how it got there, which is an important part of debugging.
  • The pane reads as a chain of function calls: each function is called by the function listed below it.
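
A call stack is ordered the same way in any language, so the idea can be illustrated outside the browser. The sketch below uses Python's standard `traceback` module in place of DevTools; the function names `on_click`, `build_request`, and `generate_token` are invented for the example. It prints the chain with the most recent call first, which matches the order of the Call Stack pane.

```python
import traceback


def generate_token():
    # Innermost function: pretend a breakpoint paused execution here.
    frames = traceback.extract_stack()
    # extract_stack() lists the oldest call first; reverse it so the most
    # recent call comes first, like the Call Stack pane.
    return [frame.name for frame in reversed(frames)]


def build_request():
    return generate_token()


def on_click():
    return build_request()


stack = on_click()
print(stack[:3])
# ['generate_token', 'build_request', 'on_click']
```

Reading from the top, `generate_token` was called by `build_request`, which was called by `on_click`: every entry is called by the one below it.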

2.Chrome Shortcuts

(1) Tab and window shortcuts (important: commonly used!)

Operation | Shortcut
Open a new window | Ctrl + n
Open a new window in Incognito mode | Ctrl + Shift + n
Open a new tab and jump to it | Ctrl + t
Reopen the last closed tab and jump to it | Ctrl + Shift + t
Jump to the next open tab | Ctrl + Tab or Ctrl + PgDn
Jump to the previous open tab | Ctrl + Shift + Tab or Ctrl + PgUp
Jump to a specific tab | Ctrl + 1 to Ctrl + 8
Jump to the rightmost tab | Ctrl + 9
Open the home page in the current tab | Alt + Home
Open the previous page in the current tab's browsing history | Alt + left arrow key
Open the next page in the current tab's browsing history | Alt + right arrow key
Close the current tab | Ctrl + w or Ctrl + F4
Close all open tabs and the current window | Ctrl + Shift + w
Minimize the current window | Alt + Space, then n
Maximize the current window | Alt + Space, then x
Close the current window | Alt + F4
Quit Google Chrome | Alt + f, then x (Ctrl + Shift + q in older versions)

(2) Google Chrome function shortcuts

(3) Web shortcut keys

3.In The End

Most of the points above are simple commands, the typical kind of content you forget as soon as you have read it. So take some advice from the big names in the programming world: if you forget what you read, or cannot recall it when you need it, keep reading it again!

Start now and stick to it. With a little progress every day, in the near future you will thank yourself for your efforts!

I will keep updating my crawler basics column and my crawler in practice column. If you have read this article carefully, feel free to like it, bookmark it, and leave a comment with your thoughts. You can also follow me to read more about crawlers in the days ahead!

If you find mistakes or inappropriate wording, please point them out in the comment area. Thank you! If you want to repost this article, please contact me first to explain your purpose, and credit the source and the name of the blogger. Thank you!