Hi, I’m Jiejie.

Today I'd like to introduce a Python library called filestools, developed by someone you may already be familiar with.

The Filestools library currently includes four toolkits that I really like:

  • ⅰ. Tree directory display;
  • ⅱ. Text file comparison;
  • ⅲ. Image watermarking;
  • ⅳ. Converting curl network request commands into requests library code.

You need to install the library before you can use it. With a single command:

pip install filestools -i https://pypi.org/simple/ -U

1. Tree directory display

This feature allows us to recursively display all files and folders in a directory and show the size of each file and folder at a glance.

Let’s take Windows as an example.

The whole operation takes place in the CMD window (the command prompt). First, you need to know how to switch to a specific drive and directory.

C:\Users\Administrator> D:
C:\Users\Administrator> cd C:\Users\Administrator\Desktop\Python 3 Swordsters\python crawler

There are two commands: tree and tree2.

  • If your local Python environment takes priority over the system environment in PATH, run the tree command.
  • If the system environment takes priority over your local Python, then besides reordering the environment variables, you can use the tree2 command, which behaves the same as tree. In that case, running tree only shows the system's built-in output.

Let me demonstrate on my own computer:

As you can see, when I execute the tree command, the system display is what it was before the library was installed.

This is due to the precedence of the system environment over native Python.

In this case, run the tree2 command.

Of course, not everyone likes running commands in the CMD window. In a Jupyter notebook, you can do the following:

from treedir.tree import tree_dir
tree_dir(r"C:\Users\Administrator\Desktop\python crawler", m_level=7, no_calc=False)

The results are as follows:

The tree_dir() function takes the following three parameters:

  • path: the directory to display recursively; defaults to the current directory.
  • m_level: the maximum recursion depth to display; defaults to 7.
  • no_calc: when set, folders beyond the maximum display depth are not recursed into to calculate their sizes.
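
For instance, here is a minimal sketch of a couple of calls with non-default arguments (the desktop path below is just a placeholder):

from treedir.tree import tree_dir

# Show the current directory only 3 levels deep and skip
# recursive folder-size calculation beyond that depth
tree_dir(m_level=3, no_calc=True)

# Show a specific directory with the default settings
tree_dir(r"C:\Users\Administrator\Desktop")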

2. Comparison of text files

This feature lets us compare two files and output their differences to an HTML page. Say we write a piece of code and modify it later; with a lot of code, it is hard to remember exactly what changed. In that situation, this feature makes the comparison easy.

Let's take an example. I started with a file a.txt and, over time, modified its contents to end up with b.txt.

Requirement: we want to see where the changes were made (even when there is a lot of content).

from filediff.diff import file_diff_compare
file_diff_compare("a.txt", "b.txt")

This will generate an HTML web page file in the current working directory.

Double-click to open and view the contents:

Yellow indicates the changed content, green indicates the newly added content, and red indicates the deleted content.

The file_diff_compare() function takes the following seven arguments:

from filediff.diff import file_diff_compare
file_diff_compare(file1, file2, diff_out='diff_result.html', max_width=70, numlines=0, show_all=False, no_browser=False)

The seven parameters are described as follows:

  • file1 / file2: the two files to compare; they must be text files.
  • diff_out: the file name (HTML format) for saving the diff result; defaults to diff_result.html.
  • max_width: wrap lines longer than this many characters; defaults to 70.
  • numlines: how many lines of context to show before and after each differing line; defaults to 0.
  • show_all: when set, all of the original content is displayed and numlines is ignored; by default the full content is not shown.
  • no_browser: when set, the browser does not open automatically after the result is generated; when False (the default), the browser opens automatically.
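
As a quick sketch based on the signature above (a.txt, b.txt, and my_diff.html are just placeholder names), a comparison with a wider line width, two lines of context, and no auto-opened browser would look like this:

from filediff.diff import file_diff_compare

# Write the diff report to my_diff.html, wrap lines at 100 characters,
# show 2 lines of context around each change, and don't open the browser
file_diff_compare("a.txt", "b.txt",
                  diff_out="my_diff.html",
                  max_width=100,
                  numlines=2,
                  no_browser=True)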

3. Image watermarking

This is probably the best image watermarking code I’ve ever seen, using the add_mark() function.

from watermarker.marker import add_mark

# Note: these parameters show the defaults; change them as needed
add_mark(file, mark, out='output', color='#8B8B1B', size=30, opacity=0.15, space=75, angle=30)

The add_mark() function takes the following eight arguments:

  • file: the photo to watermark;
  • mark: the text to use as the watermark;
  • out: where to save the watermarked image;
  • color: the watermark font color, default #8B8B1B;
  • size: the watermark font size, default 30;
  • opacity: the watermark opacity, default 0.15;
  • space: the spacing between watermark texts, default 75;
  • angle: the rotation angle of the watermark text, default 30 degrees.

For example, we execute the following command:

from watermarker.marker import add_mark

add_mark(file=r"C:\Users\Administrator\Desktop\university.jpg",
         out=r"C:\Users\Administrator\Desktop\python crawler",
         mark="yellow student", opacity=0.2, angle=30, space=30)

We want to add a "yellow student" text watermark to university.jpg, save the result in the python crawler folder on the desktop, with an opacity of 0.2, a rotation angle of 30°, and a spacing of 30 between the watermark texts.

The original picture is as follows:

The final effect is as follows:

4. Converting curl network request commands into requests library code

When we write crawlers, we often use some parameter information, such as:

Copying all of this by hand would be rather tedious, wouldn't it?

This feature solves that problem by converting the cURL command into Python code that we can copy directly.

The general steps are as follows:

  • ⅰ. In Chrome's developer tools, copy the captured network request as cURL (bash);
  • ⅱ. Run the curl2py command to convert it into Python code.

Let's take the Python positions on the internship site shixi.com as an example.

www.shixi.com/search/inde…

We have copied the curl for a single request.

As you can see, there is the request URL, and each -H option that follows carries one of the headers for that request. We copy the cURL for every link we need to request.

Take a closer look at the image below:

Once you’ve copied curl, paste it and see what you have.

curl 'http://www.shixi.com/search/index?key=python' \
  -H 'Connection: keep-alive' \
  -H 'Cache-Control: max-age=0' \
  -H 'Upgrade-Insecure-Requests: 1' \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
  -H 'Referer: http://www.shixi.com/' \
  -H 'Accept-Language: zh-CN,zh;q=0.9' \
  -H 'Cookie: UM_distinctid=17a50a2c8ea537-046c01e944e72f-6373267-100200-17a50a2c8eb4ff; PHPSESSID=rpprvtdrcrvt54fkr7msgcde17; CNZZDATA1261027457=1711789791-1624850487-https%253A%252F%252Fwww.baidu.com%252F%7C1627741311; Hm_lvt_536f42de0bcce9241264ac5d50172db7=1627741268; Hm_lpvt_536f42de0bcce9241264ac5d50172db7=1627741334' \
  --compressed \
  --insecure

With the curl above, you can convert it to Python code using the curl2py command.

from curl2py.curlParseTool import curlCmdGenPyScript

curl_cmd = """curl 'http://www.shixi.com/search/index?key=python' \
  -H 'Connection: keep-alive' \
  -H 'Cache-Control: max-age=0' \
  -H 'Upgrade-Insecure-Requests: 1' \
  -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36' \
  -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9' \
  -H 'Referer: http://www.shixi.com/' \
  -H 'Accept-Language: zh-CN,zh;q=0.9' \
  -H 'Cookie: UM_distinctid=17a50a2c8ea537-046c01e944e72f-6373267-100200-17a50a2c8eb4ff; PHPSESSID=rpprvtdrcrvt54fkr7msgcde17; CNZZDATA1261027457=1711789791-1624850487-https%253A%252F%252Fwww.baidu.com%252F%7C1627741311; Hm_lvt_536f42de0bcce9241264ac5d50172db7=1627741268; Hm_lpvt_536f42de0bcce9241264ac5d50172db7=1627741334' \
  --compressed \
  --insecure"""

output = curlCmdGenPyScript(curl_cmd)
print(output)

The final result is as follows:

As you can see, many of the arguments are converted to regular Python code, which we can use directly.
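
The exact output depends on your curl2py version, but the generated script generally follows a pattern like the hand-written sketch below (an illustration, not the literal tool output):

import requests

# Headers reconstructed from the -H options of the curl command
headers = {
    "Connection": "keep-alive",
    "Cache-Control": "max-age=0",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...",
    "Referer": "http://www.shixi.com/",
    "Accept-Language": "zh-CN,zh;q=0.9",
}

# Query parameters split out of the request URL
params = {"key": "python"}

response = requests.get("http://www.shixi.com/search/index",
                        headers=headers, params=params)
print(response.status_code)
print(response.text[:500])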

If you found this article helpful, don't forget to like, bookmark, and share; your support is the strongest motivation for me to keep producing high-quality articles!