Still reading files with open? You're out of date: this module is 100 times better than open

Using the open function to read files seems to be the default choice for nearly every Python engineer.

Today Mingo is going to recommend a better and more elegant way to read files than open: fileinput.

fileinput is a built-in Python module, but I'm sure many people are unfamiliar with it. Today I will explain its usage and features in detail and walk through some very practical cases, so that by the end you should have no trouble understanding and using it.

1. Read from standard input

When fileinput.input() is given no file names and your script receives no command-line arguments, fileinput defaults to stdin as the input source

# demo.py
import fileinput

for line in fileinput.input():
    print(line)

The effect is that whatever you type, the program reads it and prints it back, like an echo.

$ python demo.py 
hello
hello

python
python
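To try the stdin behaviour without typing anything, the sketch below swaps sys.stdin for an in-memory stream. It passes the file name '-' explicitly, which fileinput also treats as stdin, so the result does not depend on the script's command-line arguments. The helper name echo_stdin and the sample input are assumptions made for the illustration.

```python
import fileinput
import io
import sys


def echo_stdin():
    """Collect every line that fileinput reads from standard input."""
    # files=('-',) names stdin explicitly, so sys.argv is never consulted.
    with fileinput.input(files=('-',)) as f:
        return [line for line in f]


# Simulate typed input by temporarily swapping sys.stdin for a StringIO.
original_stdin = sys.stdin
sys.stdin = io.StringIO("hello\npython\n")
try:
    echoed = echo_stdin()
finally:
    sys.stdin = original_stdin

print(echoed)
```

Each returned line keeps its trailing newline, which is why print(line) in demo.py leaves a blank line after every echo.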

2. Open a single file

The contents of the script are as follows

import fileinput

with fileinput.input(files=('a.txt',)) as file:
    for line in file:
        print(f'{fileinput.filename()} line {fileinput.lineno()}: {line}', end='')

The content of a.txt is as follows

hello
world

When executed, the output is as follows

$ python demo.py
a.txt line 1: hello
a.txt line 2: world

One thing to note is that fileinput.input() reads files with mode='r' by default; use mode='rb' if your files are binary. These two read modes are the only ones fileinput supports.
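As a quick illustration of the two modes, the sketch below reads the same file once as text and once as bytes. The file is a throwaway created in a temporary directory, so the path and contents are assumptions made for the demo.

```python
import fileinput
import os
import tempfile

# Hypothetical sample file, created in a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'a.txt')
    with open(path, 'wb') as fh:
        fh.write(b'hello\nworld\n')

    # Default mode='r': every line is a str.
    with fileinput.input(files=(path,)) as f:
        text_lines = list(f)

    # mode='rb': every line is a bytes object.
    with fileinput.input(files=(path,), mode='rb') as f:
        byte_lines = list(f)

print(text_lines)
print(byte_lines)
```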

3. Open multiple files in batches

As the example above shows, I pass a files parameter to the fileinput.input function. It accepts a list or tuple of file names: pass one name to read one file, or several names to read several files.

import fileinput

with fileinput.input(files=('a.txt', 'b.txt')) as file:
    for line in file:
        print(f'{fileinput.filename()} line {fileinput.lineno()}: {line}', end='')

The contents of a.txt and b.txt are, respectively

$ cat a.txt
hello
world
$ cat b.txt
hello
python

fileinput.lineno() matches the real line number in the original file only when a single file is read, because the contents of a.txt and b.txt are merged into a single file object, file, and the count keeps accumulating across files.

$ python demo.py
a.txt line 1: hello
a.txt line 2: world
b.txt line 3: hello
b.txt line 4: python

If you want to read multiple files and still get the real line number within each original file, use the fileinput.filelineno() method

import fileinput

with fileinput.input(files=('a.txt', 'b.txt')) as file:
    for line in file:
        print(f'{fileinput.filename()} line {fileinput.filelineno()}: {line}', end='')

After running, the output is as follows

$ python demo.py
a.txt line 1: hello
a.txt line 2: world
b.txt line 1: hello
b.txt line 2: python
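To see both counters side by side, here is a small self-contained sketch. It recreates a.txt and b.txt with the contents shown earlier inside a temporary directory (an assumption, so that nothing in your working directory is touched) and records lineno() and filelineno() for every line.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, 'a.txt')
    b = os.path.join(tmp, 'b.txt')
    for path, text in ((a, 'hello\nworld\n'), (b, 'hello\npython\n')):
        with open(path, 'w') as fh:
            fh.write(text)

    rows = []
    with fileinput.input(files=(a, b)) as f:
        for line in f:
            rows.append((
                os.path.basename(fileinput.filename()),
                fileinput.lineno(),      # cumulative across all files
                fileinput.filelineno(),  # restarts at 1 for each file
                line.rstrip('\n'),
            ))

for row in rows:
    print(row)
```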

This usage pairs perfectly with the glob module

import fileinput
import glob
 
for line in fileinput.input(glob.glob("*.txt")):
    if fileinput.isfirstline():
        print('-'*20, f'Reading {fileinput.filename()}...', '-'*20)
    print(str(fileinput.lineno()) + ': ' + line.upper(), end="")

Run as follows

$ python demo.py
-------------------- Reading b.txt... --------------------
1: HELLO
2: PYTHON
-------------------- Reading a.txt... --------------------
3: HELLO
4: WORLD

4. Back up files while reading

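fileinput.input has a backup argument that lets you specify a backup suffix, such as .bak. According to the module documentation, the backup file is produced by in-place filtering (inplace=True, covered in the next section), so pair the two arguments if no backup shows up.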

import fileinput


with fileinput.input(files=("a.txt",), backup=".bak") as file:
    for line in file:
        print(f'{fileinput.filename()} line {fileinput.lineno()}: {line}', end='')

This will result in an additional a.txt.bak file

$ ls -l a.txt*
-rw-r--r--  1 MING  staff  12  2 27 10:43 a.txt

$ python demo.py
a.txt line 1: hello
a.txt line 2: world
$ ls -l a.txt*
-rw-r--r--  1 MING  staff  12  2 27 10:43 a.txt
-rw-r--r--  1 MING  staff  42  2 27 10:39 a.txt.bak
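For a reproducible check of the backup behaviour, the sketch below combines backup='.bak' with inplace=True, which is the combination the standard library documentation describes. It works on a throwaway copy of a.txt in a temporary directory; the upper-casing step is only an assumption chosen to make the difference between the two files visible.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'a.txt')
    with open(path, 'w') as fh:
        fh.write('hello\nworld\n')

    # inplace=True sends print() into a.txt; backup='.bak' keeps the original.
    with fileinput.input(files=(path,), inplace=True, backup='.bak') as f:
        for line in f:
            print(line.upper(), end='')

    with open(path) as fh:
        rewritten = fh.read()
    with open(path + '.bak') as fh:
        backup_copy = fh.read()
    files_left = sorted(os.listdir(tmp))

print(repr(rewritten))
print(repr(backup_copy))
print(files_left)
```

The rewritten file holds the new content, while a.txt.bak keeps the original text.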

5. Redirect standard output back into the file

fileinput.input takes an inplace argument that indicates whether standard output is written back to the file. It is off by default, so the file is left untouched.

Take a look at the following test code

import fileinput

with fileinput.input(files=("a.txt",), inplace=True) as file:
    print("[INFO] task is started...") 
    for line in file:
        print(f'{fileinput.filename()} line {fileinput.lineno()}: {line}', end='')
    print("[INFO] task is closed...")

Anything printed inside the body of the for loop is written back to the original file. Anything printed outside the for loop still goes to the terminal as usual.

$ cat a.txt
hello
world

$ python demo.py
[INFO] task is started...
[INFO] task is closed...

$ cat a.txt
a.txt line 1: hello
a.txt line 2: world

With this mechanism, text replacement can be easily implemented.

import sys
import fileinput

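for line in fileinput.input(files=('a.txt', ), inplace=True):
    # Convert Windows/DOS text files to Linux format
    if line[-2:] == "\r\n":
        # Drop the CRLF before appending LF; Python 3 text mode usually
        # delivers "\n" already, so this branch is only a safeguard
        line = line[:-2] + "\n"
    sys.stdout.write(line)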
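The same mechanism covers plain search-and-replace. The sketch below is a minimal illustration, not a hardened tool: replace_in_file is a hypothetical helper name, and the sample file lives in a temporary directory.

```python
import fileinput
import os
import tempfile


def replace_in_file(path, old, new):
    """Rewrite `path` in place, replacing every occurrence of `old` with `new`."""
    with fileinput.input(files=(path,), inplace=True) as f:
        for line in f:
            # Inside the loop, print() writes to the file, not the terminal.
            print(line.replace(old, new), end='')


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'a.txt')
    with open(sample, 'w') as fh:
        fh.write('hello\nworld\n')
    replace_in_file(sample, 'world', 'python')
    with open(sample) as fh:
        result = fh.read()

print(repr(result))
```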

Appendix: to switch a file between DOS and UNIX formats when testing the program, enter the following commands in Vim

DOS to UNIX: :set fileformat=unix
UNIX to DOS: :set fileformat=dos

6. Methods you need to know

If you just want to use fileinput as an alternative to open for reading files, the following methods are all you need.

fileinput.filename() returns the name of the file currently being read. Returns None before the first line has been read.
fileinput.fileno() returns the "file descriptor" of the current file as an integer. When no file is open (before the first line is read and between files), it returns -1.
fileinput.lineno() returns the cumulative line number of the line just read. Returns 0 before the first line is read. After the last line of the last file has been read, returns the line number of that line.
fileinput.filelineno() returns the line number within the current file. Returns 0 before the first line is read. After the last line of the last file has been read, returns the line number of that line within its file.

However, if you want to build more complex logic on top of fileinput, you may also need these methods

fileinput.isfirstline() returns True if the line just read is the first line of its file, otherwise returns False.
fileinput.isstdin() returns True if the last line read came from sys.stdin, otherwise returns False.
fileinput.nextfile() closes the current file so that the next iteration reads the first line of the next file (if there is one); lines not read from the current file are not counted in the cumulative line count. The file name does not change until the first line of the next file has been read. The function has no effect before the first line has been read, so it cannot be used to skip the first file, and it has no effect after the last line of the last file has been read.
fileinput.close() closes the sequence.
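The methods above can be combined as in the following sketch, which keeps only the first line of every file and skips the rest with nextfile(). The helper name first_lines and the temporary sample files are assumptions made for the illustration.

```python
import fileinput
import os
import tempfile


def first_lines(paths):
    """Return ({base name: first line}, number of lines actually read)."""
    heads = {}
    with fileinput.input(files=paths) as f:
        for line in f:
            if fileinput.isfirstline():
                name = os.path.basename(fileinput.filename())
                heads[name] = line.rstrip('\n')
                fileinput.nextfile()  # skip the remaining lines of this file
        counted = fileinput.lineno()  # skipped lines are not counted
    return heads, counted


with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, 'a.txt')
    b = os.path.join(tmp, 'b.txt')
    for path, text in ((a, 'hello\nworld\n'), (b, 'hello\npython\n')):
        with open(path, 'w') as fh:
            fh.write(text)
    heads, counted = first_lines([a, b])

print(heads)
print(counted)
```

Note that lineno() is queried inside the with block: once the sequence is closed, the module-level functions no longer have an active input to report on.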

7. More advanced usage

fileinput.input() has an openhook argument, which lets you pass in a custom function for opening files.

If you don't pass in any hook, fileinput uses the open function by default.

fileinput has two hooks built in for you to use

fileinput.hook_compressed(filename, mode)

Transparently opens gzip- and bzip2-compressed files (identified by the extensions '.gz' and '.bz2') using the gzip and bz2 modules. If the file extension is not '.gz' or '.bz2', the file is opened in the normal way (that is, with open() and without any decompression). Example: fi = fileinput.FileInput(openhook=fileinput.hook_compressed)

fileinput.hook_encoded(encoding, errors=None)

Returns a hook that opens each file with open(), reading the file with the given encoding and errors. Example: fi = fileinput.FileInput(openhook=fileinput.hook_encoded("utf-8", "surrogateescape"))
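Here is a hedged sketch of both built-in hooks in action. The file names, the choice of UTF-16, and the sample contents are assumptions made for the demo. The hook_compressed part uses mode='rb' and decodes explicitly, because whether that hook returns text or bytes in text mode has differed between Python versions.

```python
import fileinput
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # hook_encoded: read a file stored in a specific encoding (UTF-16 here).
    utf16_path = os.path.join(tmp, 'utf16.txt')
    with open(utf16_path, 'w', encoding='utf-16') as fh:
        fh.write('hello\n')
    with fileinput.input(files=(utf16_path,),
                         openhook=fileinput.hook_encoded('utf-16')) as f:
        decoded = list(f)

    # hook_compressed: read a plain file and a gzip file in one pass.
    plain_path = os.path.join(tmp, 'plain.txt')
    gz_path = os.path.join(tmp, 'packed.txt.gz')
    with open(plain_path, 'wb') as fh:
        fh.write(b'hello\n')
    with gzip.open(gz_path, 'wb') as fh:
        fh.write(b'world\n')
    # mode='rb' yields bytes from both files, so decoding stays explicit.
    with fileinput.input(files=(plain_path, gz_path), mode='rb',
                         openhook=fileinput.hook_compressed) as f:
        mixed = [line.decode('utf-8') for line in f]

print(decoded)
print(mixed)
```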

If your use case is special and the two hooks above do not meet your requirements, you can also write your own.

Let me give you an example here

If I want to use fileinput to read files over the network, I can define a hook like this.

First download the file locally using requests
Then read it with open

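def online_open(url, mode):
    import requests
    # Download the file and save it locally under its base name
    r = requests.get(url)
    filename = url.split("/")[-1]
    with open(filename, 'w') as f1:
        f1.write(r.content.decode("utf-8"))
    # Reopen the local copy for reading and hand it to fileinput
    f2 = open(filename, 'r')
    return f2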

Just pass this function to openhook

import fileinput

file_url = 'https://www.csdn.net/robots.txt'
with fileinput.input(files=(file_url,), openhook=online_open) as file:
    for line in file:
        print(line, end="")

After running, the contents of CSDN's robots.txt are printed as expected

User-agent: *
Disallow: /scripts
Disallow: /public
Disallow: /css/
Disallow: /images/
Disallow: /content/
Disallow: /ui/
Disallow: /js/
Disallow: /scripts/
Disallow: /article_preview.html*
Disallow: /tag/
Disallow: /*?*
Disallow: /link/
Sitemap: https://www.csdn.net/sitemap-aggpage-index.xml
Sitemap: https://www.csdn.net/article/sitemap.txt
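If you want to experiment with custom hooks without network access, an offline variant of the same idea might look like the sketch below. The hook open_without_comments is hypothetical: it reads a local text file, drops lines starting with '#', and returns the rest as an in-memory file-like object. It assumes text mode only.

```python
import fileinput
import io
import os
import tempfile


def open_without_comments(filename, mode):
    """Hypothetical openhook: hide lines starting with '#' from the reader."""
    with open(filename, mode) as fh:
        kept = [line for line in fh if not line.startswith('#')]
    # Return a file-like object for fileinput to read lines from.
    return io.StringIO(''.join(kept))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'settings.txt')
    with open(path, 'w') as fh:
        fh.write('# comment\nhello\n# another\nworld\n')
    with fileinput.input(files=(path,), openhook=open_without_comments) as f:
        visible = [(fileinput.filelineno(), line) for line in f]

print(visible)
```

The line numbers refer to what the hook exposes, not to the lines of the file on disk.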

8. Some practical examples

Case 1: Read all lines of a file

import fileinput
for line in fileinput.input('data.txt'):
    print(line, end="")

Case 2: Read all lines of multiple files

import fileinput
import glob
 
for line in fileinput.input(glob.glob("*.txt")):
    if fileinput.isfirstline():
        print('-'*20, f'Reading {fileinput.filename()}...', '-'*20)
    print(str(fileinput.lineno()) + ': ' + line.upper(), end="")

Case 3: Use fileinput to convert CRLF files to LF

import sys
import fileinput

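for line in fileinput.input(files=('a.txt', ), inplace=True):
    # Convert Windows/DOS text files to Linux format
    if line[-2:] == "\r\n":
        # Drop the CRLF before appending LF; Python 3 text mode usually
        # delivers "\n" already, so this branch is only a safeguard
        line = line[:-2] + "\n"
    sys.stdout.write(line)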

Case 4: Log analysis with re: extract all lines containing dates


#-- sample file -- : error.log
aaa
1970-01-01 13:45:30  Error: **** Due to System Disk spacke not enough...
bbb
1970-01-02 10:20:30  Error: **** Due to System Out of Memory...
ccc
 
#-- Test script --
import re
import fileinput
import sys
 
pattern = r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'
 
for line in fileinput.input('error.log', backup='.bak', inplace=1):
    # Only matching lines are written back; the original is kept as error.log.bak
    if re.search(pattern, line):
        sys.stdout.write("=> ")
        sys.stdout.write(line)
 
#-- Test results --
=> 1970-01-01 13:45:30  Error: **** Due to System Disk spacke not enough...
=> 1970-01-02 10:20:30  Error: **** Due to System Out of Memory...

Case 5: Use fileinput to implement grep-like functionality

import sys
import re
import fileinput
 
pattern = re.compile(sys.argv[1])
# sys.argv[2:] covers every file the shell expands from *.py
for line in fileinput.input(sys.argv[2:]):
    if pattern.match(line):
        print(fileinput.filename(), fileinput.filelineno(), line, end='')

$ ./test.py import.*re *.py
# find all py files with import re in them
addressBook.py  2   import re
addressBook1.py 10  import re
addressBook2.py 18  import re
test.py         238 import re

9. Final words

fileinput is a higher-level wrapper around the open function. In scenarios that only read data, fileinput is more specialized and more user-friendly than open. In more complex scenarios that involve general-purpose writing, beyond the in-place rewriting shown above, fileinput is not the right tool. As the name fileinput itself suggests, the module focuses on input (reading) rather than output (writing).

At the end of this article, I will introduce two online documents written by myself:

The first document: PyCharm Chinese Guide 1.0 document

I collected 100 tips on using PyCharm and spent a lot of time recording hundreds of GIFs so that beginners can follow along directly. If you are interested, you can read the document online.

Second document: Python Dark Magic Guide 1.0 document

It systematically collects obscure Python facts, varied ways to use the Python shell, wild Python tricks, in-depth explanations of advanced Python topics, and very practical Python development tips.