I wrote an article about crawling the INS data before, but recently the INS interface has been updated and made easier.

The purpose of this article is to share the technology, and you are at your own risk for using the technology in this article. Technology is not guilty. I have nothing to do with it.

Brief analysis of INS interface

Recently, I have done several projects using INS data, to summarize the basic interface types of INS.

The website of INS is a React BASED SSR, so in order to render the interface faster, they request data from the server in advance and load it into the page. The variable name is window._sharedata, which is the first type of interface. Most of these interfaces only need to directly get the corresponding page and find the variable.

The second type of interface is JSON API, which directly returns JSON. These interfaces are less open to the Web, and most of them are in their official APIS, which require keys to be used. There are only two known web applications.

Crawls the first type interface

First, open someone’s Instagram home page. F12 opens the console and looks at the elements. It’s obvious to see a line of code

window._sharedData

So, I’m going to offer you a method that I use now

export function getShareData(str){
    return JSON.parse(str.match(/(window._sharedData\s?) (=\s?) (. *?) ; <\/script>/) [3]),}Copy the code

Crawls the interface of the second type

  • Paging interface

If you look at shareData on the home page, you will find that there are only 12 images in it. Of course, this is not enough. Let’s scroll down the page and open the console to Network. You will see a loading network with a query request

https://www.instagram.com/static/bundles/es6/ProfilePageContainer.js/229d3e1a78c4.js

// STR is the HTML string returned when the first page is requested
export function getProfilePageContainerURL(str,type='profile'){
  let reg=' '
    if(type==='profile'){
       reg=/<script.*src="(.*ProfilePageContainer.*)".*<\/script>/
    }else{
      reg=/<script.*src="(.*TagPageContainer.*)".*<\/script>/
    }
    return 'https://www.instagram.com'+str.match(reg)[1].split('"') [0]}Copy the code

So I’ve got this file address, I’m just going to get it as usual and then I’m going to go through a recursion here to get all the Queryids

GetQueryHashByScript (res,'queryId')
//res is what gets the script back
export function getQueryHashByScript(str,word,arr=[]){
    let i=0,str2=' ',key=' '
    i=str.indexOf(word)  
    if(i>- 1){
        key=str.substr(i+9.32)
        arr.push(key)
        str2=str.substring(i+42)  
        return getQueryHashByScript(str2,word,arr)
    }else{
        return arr
    }
}
Copy the code

Once you find the queryId, you can request the corresponding Query interface according to the format

All that’s left is a direct request.

  • Post Details Interface

This interface is by post shortcode retrieve posts details, specific address https://www.instagram.com/$/ p / ${username} {shortcode} /? __a = 1 get request directly.

For now, there are only so many interfaces that the Web can use, and the rest of the interfaces that are part of the Instagram API require keys. People can also communicate with each other when they discover new interfaces.

In order to cultivate your hands-on ability, I specially leave a foible, is ins hashtag interface, you should be able to guess it, hands-on practice it

2019-11 updates, now found that users can directly get the home page, the interface is as follows: https://www.instagram.com/${username} /? __a = 1