The usual way to write a crawler is to press F12 in the browser to open its built-in network capture tool, analyze the web requests, extract the request headers and parameters, and then write the requests by hand in a language such as Python or Java. This is routine, but too slow. Humans are at the top of the food chain because they use tools. Here is a faster and more efficient way to capture packets and write a crawler. First we capture packets with Fiddler; you can look up the details of how Fiddler captures packets yourself. We log in to Jianshu, open its home page, and capture the request for that page. In Fiddler's Raw view you can see the raw request sent to Jianshu, shown below.

GET https://www.jianshu.com/ HTTP/1.1
Host: www.jianshu.com
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8
Cookie: __yadk_uid=AFXc3m0QhoWYKzHTlaOb5lpJUvepcIMq; read_mode=day; default_font=font2; locale=zh-CN; Hm_lvt_0c0e9d9b1e7d617b3e6842e85b9fb068 = 1564068289156155, 331156155, 349156155, 353; remember_user_token=W1s4NzQ2OTA3XSwiJDJhJDExJHhrakNvbjdKLmhRanZPVmt0c0Y0WXUiLCIxNTY0ODQyMDYwLjgxMDM4MjQiXQ%3D%3D--deff2e1e9ec55f5f3c64bd00bd11a1d987de45ee; _m7e_session_core=094e6f9ef939baa86ed84097f2315b3e; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%2216b27cf05cfab1-049233ccf59a98-37c153e-1327104-16b27cf05d0939%22%2C%22%24device_id%22%3A%2216b27cf05cfab1-049233ccf59a98-37c153e-1327104-16b27cf05d0939%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_referrer%22%3A%22%22%2C%22%24latest_referrer_host%22%3A%22%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_utm_source%22%3A%22desktop%22%2C%22%24latest_utm_medium%22%3A%22notes-included-collection%22%2C%22%24latest_utm_campaign%22%3A%22maleskine%22%2C%22%24latest_utm_content%22%3A%22note%22%7D%2C%22first_id%22%3A%22%22%7D; Hm_lpvt_0c0e9d9b1e7d617b3e6842e85b9fb068=1564842175
If-None-Match: W/"59ba10ba20969a6bd6d15c129cdd78f0"

The first line is the request line: a GET request whose address is https://www.jianshu.com/. Enter that address in Postman, then copy the request headers (everything except the request line) into Postman's Headers tab.
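
The split between the request line and the headers described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `parse_raw_request` is a hypothetical helper name, not part of Fiddler or Postman, and the sample text is a shortened copy of the captured request.

```python
def parse_raw_request(raw: str):
    """Split a raw HTTP request into (method, url, headers)."""
    # The header section ends at the first blank line; anything after it is the body.
    head = raw.replace("\r\n", "\n").strip().split("\n\n", 1)[0]
    lines = head.split("\n")
    # Request line: method, target address, protocol version.
    method, url, _version = lines[0].split(maxsplit=2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, because values may contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, url, headers


# Shortened sample of the request captured above.
raw = """GET https://www.jianshu.com/ HTTP/1.1
Host: www.jianshu.com
Connection: keep-alive
Cache-Control: max-age=0
"""

method, url, headers = parse_raw_request(raw)
print(method, url)
print(headers)
```

The method and address go into Postman's request line, and the resulting dictionary corresponds to the rows of the Headers tab.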

If the address uses HTTPS, clicking Send may report an error: Could not get any response. In that case one more setting is needed: go to File => Settings => General and turn off SSL certificate verification.
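
The same switch exists on the code side. A likely cause of the error, though the article does not confirm it, is that a capturing proxy such as Fiddler presents its own certificate, which the client does not trust. The sketch below uses only the Python standard library; `build_insecure_opener` is a hypothetical helper name, and turning verification off is meant for local debugging only.

```python
import ssl
import urllib.request


def build_insecure_opener():
    """Return (opener, context) with certificate verification turned off."""
    context = ssl.create_default_context()
    # check_hostname must be disabled first; otherwise CERT_NONE raises ValueError.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    opener = urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=context)
    )
    return opener, context


opener, context = build_insecure_opener()
# opener.open("https://www.jianshu.com/") would now skip certificate checks.
print(context.verify_mode)
```

With the requests library, passing `verify=False` to `requests.request` has the same effect.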


Once the request is successful, you can see the data.

At this point we have confirmed that, with the parameters above, the target address can be called successfully. What remains is to convert the request into Python or Java code. To do this, click Code and select the language you want. The generated code can be run directly.

import requests

url = "https://www.jianshu.com/"

headers = {
    # Headers copied from the captured request
    'Host': "www.jianshu.com",
    'Connection': "keep-alive",
    'Cache-Control': "max-age=0",
    'Upgrade-Insecure-Requests': "1",
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36",
    'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
    'Accept-Encoding': "gzip, deflate, br",
    'Accept-Language': "zh-CN,zh;q=0.9,en;q=0.8",
    'Cookie': "__yadk_uid=AFXc3m0QhoWYKzHTlaOb5lpJUvepcIMq; read_mode=day; default_font=font2; locale=zh-CN; Hm_lvt_0c0e9d9b1e7d617b3e6842e85b9fb068 = 1564068289156155, 331156155, 349156155, 353; remember_user_token=W1s4NzQ2OTA3XSwiJDJhJDExJHhrakNvbjdKLmhRanZPVmt0c0Y0WXUiLCIxNTY0ODQyMDYwLjgxMDM4MjQiXQ%3D%3D--deff2e1e9ec55f5f3c64bd00bd11a1d987de45ee; _m7e_session_core=094e6f9ef939baa86ed84097f2315b3e; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%2216b27cf05cfab1-049233ccf59a98-37c153e-1327104-16b27cf05d0939%22%2C%22%24device_id%22%3A%2216b27cf05cfab1-049233ccf59a98-37c153e-1327104-16b27cf05d0939%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_referrer%22%3A%22%22%2C%22%24latest_referrer_host%22%3A%22%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_utm_source%22%3A%22desktop%22%2C%22%24latest_utm_medium%22%3A%22notes-included-collection%22%2C%22%24latest_utm_campaign%22%3A%22maleskine%22%2C%22%24latest_utm_content%22%3A%22note%22%7D%2C%22first_id%22%3A%22%22%7D; Hm_lpvt_0c0e9d9b1e7d617b3e6842e85b9fb068=1564842175",
    # Conditional request: the server may answer 304 Not Modified with an empty body
    'If-None-Match': 'W/"59ba10ba20969a6bd6d15c129cdd78f0"',
    # Headers added by Postman
    'Postman-Token': "f2d54dc2-67f6-451d-88a6-c5495bb8d2de,bd6acef5-9ef9-405f-888e-6d782198d06d",
    'cache-control': "no-cache"
    }

response = requests.request("GET", url, headers=headers)

print(response.text)
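
The generated header set is usually larger than a crawler needs. As a hedged sketch, the helper below drops a few headers before the request is replayed. The name `strip_headers` and the default drop list are assumptions made for illustration, not something the article or Postman prescribes: `Postman-Token` is added by Postman itself, `Host` is derived from the URL by the requests library, and `If-None-Match` makes the request conditional, so the server may reply 304 with an empty body.

```python
def strip_headers(headers, drop=("postman-token", "if-none-match", "host")):
    """Return a copy of headers without the names in drop (case-insensitive)."""
    drop_lower = {name.lower() for name in drop}
    return {k: v for k, v in headers.items() if k.lower() not in drop_lower}


# Shortened, illustrative version of the generated headers.
generated = {
    "Host": "www.jianshu.com",
    "User-Agent": "Mozilla/5.0",
    "If-None-Match": 'W/"59ba10ba20969a6bd6d15c129cdd78f0"',
    "Postman-Token": "f2d54dc2-67f6-451d-88a6-c5495bb8d2de",
}

slim = strip_headers(generated)
print(slim)
```

The result can be passed as `headers=slim` in the `requests.request` call above. Whether the site still answers as expected with fewer headers has to be checked case by case.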

The Java code is as follows:

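// Assumes OkHttp 3 or later (package okhttp3); adjust the imports for other versions.
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

OkHttpClient client = new OkHttpClient();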

Request request = new Request.Builder()
  .url("https://www.jianshu.com/")
  .get()
  .addHeader("Host", "www.jianshu.com")
  .addHeader("Connection", "keep-alive")
  .addHeader("Cache-Control", "max-age=0")
  .addHeader("Upgrade-Insecure-Requests", "1")
  .addHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.80 Safari/537.36")
  .addHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3")
  .addHeader("Accept-Encoding", "gzip, deflate, br")
  .addHeader("Accept-Language", "zh-CN,zh;q=0.9,en;q=0.8")
  .addHeader("Cookie", "__yadk_uid=AFXc3m0QhoWYKzHTlaOb5lpJUvepcIMq; read_mode=day; default_font=font2; locale=zh-CN; Hm_lvt_0c0e9d9b1e7d617b3e6842e85b9fb068 = 1564068289156155, 331156155, 349156155, 353; remember_user_token=W1s4NzQ2OTA3XSwiJDJhJDExJHhrakNvbjdKLmhRanZPVmt0c0Y0WXUiLCIxNTY0ODQyMDYwLjgxMDM4MjQiXQ%3D%3D--deff2e1e9ec55f5f3c64bd00bd11a1d987de45ee; _m7e_session_core=094e6f9ef939baa86ed84097f2315b3e; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%2216b27cf05cfab1-049233ccf59a98-37c153e-1327104-16b27cf05d0939%22%2C%22%24device_id%22%3A%2216b27cf05cfab1-049233ccf59a98-37c153e-1327104-16b27cf05d0939%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_referrer%22%3A%22%22%2C%22%24latest_referrer_host%22%3A%22%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_utm_source%22%3A%22desktop%22%2C%22%24latest_utm_medium%22%3A%22notes-included-collection%22%2C%22%24latest_utm_campaign%22%3A%22maleskine%22%2C%22%24latest_utm_content%22%3A%22note%22%7D%2C%22first_id%22%3A%22%22%7D; Hm_lpvt_0c0e9d9b1e7d617b3e6842e85b9fb068=1564842175")
  .addHeader("If-None-Match", "W/\"59ba10ba20969a6bd6d15c129cdd78f0\"")
  .addHeader("Postman-Token", "f2d54dc2-67f6-451d-88a6-c5495bb8d2de,c5e2f8e3-61ba-4846-816d-1ab1118f3c2a")
  .addHeader("cache-control", "no-cache")
  .build();

Response response = client.newCall(request).execute();

That's all.