Python's Super Literature Search Tool: What You Need to Know

For most students, literature search is a real hassle. If your school has not bought enough download permissions for papers, or if you are off campus, it becomes a real headache. Fortunately, there is a paper search tool written in Python that makes this much simpler.

Sci-Hub

Sci-Hub is an online database that provides access to 81.6 million scientific papers and articles. It was started by a graduate student, Alexandra Elbakyan, who came up with the idea of making knowledge accessible to more people after she found it too expensive to pay for the hundreds of papers she needed for her research.

Later, the website grew more and more popular, spreading to countries such as India, Indonesia, China and Russia, and it successfully cooperated with several organizations to maintain and operate the site. By 2017, there were 81.6 million academic papers on the site, accounting for 69 percent of all academic papers. That basically meets the demand for most papers, while the remaining 31 percent are mostly papers that researchers rarely want to access.

Why we need a Python tool to download papers

In the beginning, the site was accessible to all, but as its popularity grew, more and more publishers took notice of it. In 2015, a US court ordered it blocked, and its servers in the US became inaccessible. Since then, the site has waged a guerrilla war with publishers.

The downside of guerrilla warfare is that Sci-Hub's address has to change frequently, so there is no single address that always reaches the database. There are other ways to keep access over a longer period, such as modifying DNS settings or the hosts file, but these methods are cumbersome, are not permanent, and can stop working.

A new approach: a Python API tool that makes downloading papers super easy

The tool is an open-source, unofficial Sci-Hub API hosted on GitHub:

Github.com/zaytoun/sci…

First, download the tool by cloning the project from GitHub:

git clone Github.com/zaytoun/sci…

Or download the ZIP from the Clone or Download button and unzip it.

The unzipped folder may be named scihub.py, so rename it to scihub. Then open CMD, enter the folder, and run the following command to install the dependencies (this assumes Python is already installed):

pip install -r requirements.txt

Then we’ll be ready to use it!

This tool is very simple to use. First find the paper you want on Google Scholar or IEEE, then copy the URL of the paper's page.

Then create a new file called download.py in the scihub folder and type:

from scihub import SciHub

sh = SciHub()

# The first parameter is the web address of the paper
# path: the path where the file will be saved
result = sh.download('http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1648853', path='paper.pdf')


Enter this folder and run the following in CMD or a terminal:

python download.py

You will find the file successfully downloaded to your current directory under the name paper.pdf. If it doesn’t work, just try it a few more times. If it still doesn’t work, you can ask in the comments below.
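A failed attempt can leave behind a file that is not actually a PDF, so it can help to check the result before opening it. The sketch below is not part of the tool: looks_like_pdf is a hypothetical helper that relies only on the fact that PDF files start with the bytes %PDF-.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file at `path` starts with the PDF magic bytes.

    Hypothetical helper for illustration; it is not part of the scihub tool.
    """
    file = Path(path)
    if not file.is_file():
        return False
    with file.open("rb") as handle:
        # Every PDF begins with "%PDF-" followed by a version number.
        return handle.read(5) == b"%PDF-"
```

Calling looks_like_pdf('paper.pdf') after running download.py gives a quick yes/no answer on whether another attempt is needed.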

Of course, this API tool has other features as well, such as searching Google Scholar by keyword and downloading the resulting papers in batch. These are left for you to explore.

from scihub import SciHub

sh = SciHub()

# Get 5 articles for the keyword 'bittorrent' on Google Scholar
results = sh.search('bittorrent', 5)

# Download the papers; Sci-Hub is used if necessary
for paper in results['papers']:
    sh.download(paper['url'])

How it works

The source code for this API is actually pretty easy to read.

One, find a currently available Sci-Hub domain name

First, it finds a Sci-Hub domain name that is currently available for downloading papers:

whereisscihub.now.sh/
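As a rough illustration of this step, the sketch below pulls candidate addresses out of a page's HTML. It assumes the page lists the mirrors as ordinary links whose address contains "sci-hub."; that marker, LinkCollector and find_mirrors are all assumptions made for this example, not the tool's own code.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag found in the HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def find_mirrors(html, marker="sci-hub."):
    """Return the links in `html` whose address contains `marker`, in page order.

    The default marker is an assumption about how mirror addresses look.
    """
    collector = LinkCollector()
    collector.feed(html)
    return [link for link in collector.links if marker in link]
```

Only the standard library is used here, so the example runs without installing anything.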

Two, analyze the paper address entered by the user and find the corresponding paper

  1. If the link the user enters cannot be downloaded directly, use Sci-Hub to download it

  2. If the current Sci-Hub address is unavailable, switch to another address, and give up only when every address is unavailable.
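The switching logic in step 2 can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the tool's source: fetch_with_fallback is a hypothetical name, and fetch stands for any caller-supplied function that returns the paper's bytes or raises an exception.

```python
def fetch_with_fallback(identifier, mirrors, fetch):
    """Try each mirror in order and return the first successful result.

    `fetch(mirror, identifier)` is a caller-supplied function (an assumption
    of this sketch) that returns the paper's bytes or raises on failure.
    """
    errors = []
    for mirror in mirrors:
        try:
            return fetch(mirror, identifier)
        except Exception as exc:  # broad on purpose: any failure means "try the next one"
            errors.append(f"{mirror}: {exc}")
    # Reached only when every mirror has failed.
    raise RuntimeError("all mirrors failed: " + "; ".join(errors))
```

Passing fetch in as a parameter keeps the loop independent of any particular HTTP library, which also makes it easy to try out with a fake function.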

Three, download

  1. Once you get the paper, it’s stored in the data variable

  2. Then store the data variable as a file
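Storing the data variable as a file comes down to writing bytes in binary mode, since a PDF is not text. A minimal sketch, with save_bytes as a hypothetical helper name rather than a function from the tool:

```python
from pathlib import Path


def save_bytes(data, path):
    """Write `data` (bytes) to `path`, creating parent folders if needed.

    Hypothetical helper for illustration; returns the path that was written.
    """
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)  # binary write: no newline or encoding translation
    return target
```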

It is worth noting that the code uses a retry decorator to retry automatically when an error occurs. The authors set the number of retries to 10, with a maximum wait time of 1 second per retry.
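To show what such a decorator does, here is a hand-written stand-in, not the decorator the tool itself imports. The parameter names max_attempts and max_wait are assumptions of this sketch; the defaults mirror the 10 attempts and 1 second maximum wait described above.

```python
import functools
import random
import time


def retry(max_attempts=10, max_wait=1.0):
    """Retry the wrapped function, sleeping a random 0..max_wait seconds between tries."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    time.sleep(random.uniform(0, max_wait))

        return wrapper

    return decorator
```

Decorating a download function with @retry() would then repeat a failed call up to 10 times before giving up and raising the final error.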

That is the end of this article. If you enjoyed it, please keep following us, and if it was helpful, please click the like/read button at the bottom.

