The material is too extensive to cover in a single article, so it is split into several parts, each of which introduces one module in detail.

This article covers the basics of files and streams in the standard library. It is a fairly dry, theory-heavy read. However, this knowledge is important and is the basis for understanding how file operations in the C standard library work. If you read it patiently and digest it, you will gain a lot.

stdio.h

stdio.h is the header file of the C standard I/O library. It declares many types, macros, and related functions. To use standard input/output (such as the printf function), you must include this header file. With GCC, the standard library that implements these functions is linked into every program by default; manual linking is not required.

Streams

In the C standard I/O library, I/O is abstracted as a “stream”. This hides the differences between files and devices and gives a cleaner, more efficient I/O interface.

Just like a real stream, a stream abstracts input or output into a sequence of bytes of arbitrary length that flows in one direction. When we put data into the stream, it automatically flows to its destination and we can put in more data; when we take data from the stream, the next data moves to the front, ready for the next read.

Streams are usually implemented with buffers. If the file is the whole stream, the buffer is the small section of it that we can actually see. When reading a file, the stream loads one block of data from the file into the buffer at a time, and we read from the buffer. When we reach the end of the buffer, the next block is loaded automatically. When writing to a file, we first write into the buffer. When certain conditions are met (for example, the buffer is full), the contents of the buffer are automatically flushed to the file. The buffer is then reused, and each flush is written to the file after the data from the previous one.
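The write path described above can be observed with a small experiment. The following is a minimal sketch, not a definitive implementation: the function names demo_write_buffer and bytes_on_disk, the 4096-byte buffer size, and the file path are assumptions made for illustration. It writes through a fully buffered stream and counts how many bytes have actually reached the file before and after an explicit fflush. The exact moment at which data reaches the file is up to the implementation.

```c
#include <stdio.h>

/* Count the bytes currently stored in the file at `path` by opening a
 * second, independent read stream. Returns -1 if the file cannot be opened. */
static long bytes_on_disk(const char *path)
{
    FILE *in = fopen(path, "rb");
    long count = 0;

    if (in == NULL)
        return -1;
    while (fgetc(in) != EOF)
        count++;
    fclose(in);
    return count;
}

/* Write a short string through a fully buffered stream and record how many
 * bytes have reached the file before and after an explicit flush.
 * Returns 0 on success, -1 on error. */
int demo_write_buffer(const char *path, long *before_flush, long *after_flush)
{
    char buffer[4096];                 /* hypothetical buffer size */
    FILE *out = fopen(path, "wb");

    if (out == NULL)
        return -1;
    /* Request full buffering; must happen before any other stream operation. */
    if (setvbuf(out, buffer, _IOFBF, sizeof buffer) != 0) {
        fclose(out);
        return -1;
    }

    fputs("hello", out);               /* goes into the buffer first */
    *before_flush = bytes_on_disk(path);

    fflush(out);                       /* push the buffer contents to the file */
    *after_flush = bytes_on_disk(path);

    fclose(out);                       /* closed before `buffer` goes out of scope */
    return 0;
}
```

On a typical implementation the count before the flush is smaller than the count after it, because the five bytes are still sitting in the buffer.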

A stream typically records the associated file, the stream type, the buffer, the file position indicator, the stream state, and so on. It is represented by a pointer to a structure (FILE *), whose contents depend on the implementation.

File position indicator

Streams that support position requests (such as streams associated with regular files) include a file position indicator. It represents the position of the current operation within the whole stream. Since a stream abstracts data into a sequence of bytes, the file position indicator is the offset from the beginning of the file. All input and output takes place at the position given by the file position indicator, and the indicator is updated automatically when the operation completes.
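As a rough illustration of how the indicator moves, the sketch below writes six bytes to a temporary file and queries the position with ftell at three points. The function name demo_position and the sample contents are assumptions made for this example.

```c
#include <stdio.h>

/* Write "abcdef" to a temporary file, then record the position at the start,
 * after two single-byte reads, and after an explicit seek.
 * Returns 0 on success, -1 on error. */
int demo_position(long *start, long *after_two_reads, long *after_seek)
{
    FILE *fp = tmpfile();              /* binary update stream, removed on close */

    if (fp == NULL)
        return -1;
    fputs("abcdef", fp);
    rewind(fp);                        /* move the indicator back to offset 0 */

    *start = ftell(fp);
    fgetc(fp);
    fgetc(fp);
    *after_two_reads = ftell(fp);      /* each read advanced the indicator by 1 */

    fseek(fp, 5L, SEEK_SET);           /* jump directly to offset 5 */
    *after_seek = ftell(fp);

    fclose(fp);
    return 0;
}
```

The three recorded offsets are 0, 2, and 5: reads advance the indicator implicitly, while fseek sets it explicitly.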

Text streams versus binary streams

First of all, all data in a computer is binary. Text is simply binary data encoded in a particular way.

Text streams and binary streams correspond to text files and binary files. The C standard distinguishes between the two, and they behave differently on different operating systems.

On Unix/Linux and on Mac OS X and later, text streams are identical to binary streams.

On DOS and Windows, as well as on classic Mac OS, text streams perform some additional conversions.

On DOS and Windows, the sequence \r\n represents a newline, while the C language follows Unix and uses \n. So that text-processing programs stay portable, on DOS and Windows a text stream automatically converts \r\n to \n when reading and converts \n to \r\n when writing.

Classic Mac OS used \r as the newline character, so text streams there converted between \r and \n.

Apart from this, text streams are no different from binary streams, and the various I/O functions work on both.
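One way to see the newline conversion is to write the same text through a text stream and a binary stream and compare the number of bytes stored. This is only a sketch under stated assumptions: the function name stored_size and the file path passed to it are hypothetical.

```c
#include <stdio.h>

/* Write `text` to `path` using the given fopen mode ("w" for a text stream,
 * "wb" for a binary stream), then count the bytes stored in the file by
 * reading it back in binary mode. Returns -1 on error. */
long stored_size(const char *path, const char *mode, const char *text)
{
    FILE *fp = fopen(path, mode);
    long count = 0;

    if (fp == NULL)
        return -1;
    fputs(text, fp);
    fclose(fp);

    fp = fopen(path, "rb");            /* binary mode: no newline conversion */
    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)
        count++;
    fclose(fp);
    return count;
}
```

For the text "a\nb\n", the binary stream stores 4 bytes everywhere. The text stream should also store 4 bytes on Unix-like systems, and would be expected to store 6 on Windows, where each \n becomes \r\n.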

Buffering

Buffering is important when there is a gap in I/O speed between two devices or processes. Without buffering, a fast process must wait for a slow process to complete its I/O; with buffering in place, the fast process can put its data into the buffer and move on to other tasks. Buffering also avoids frequent I/O calls, which makes the program faster overall.

There are generally two types of buffering: line buffering and full buffering. (A stream can also be unbuffered.)

  • Line buffering: the buffer is flushed when it is full or when a newline character (\n) is encountered.
  • Full buffering: the buffer is flushed automatically only when it is full.

Streams that a program opens itself (for example with fopen) are generally fully buffered.
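The difference between the modes can be made visible by writing a short string and checking how much of it has reached the file without any explicit flush. The sketch below is an illustration under assumptions: the names bytes_visible_after_write and bytes_on_disk and the file path are hypothetical, and the C standard leaves the exact flushing behavior to the implementation.

```c
#include <stdio.h>

/* Number of bytes currently stored in the file at `path`, or -1 on error. */
static long bytes_on_disk(const char *path)
{
    FILE *in = fopen(path, "rb");
    long count = 0;

    if (in == NULL)
        return -1;
    while (fgetc(in) != EOF)
        count++;
    fclose(in);
    return count;
}

/* Open `path` for writing with the given buffering mode (_IOFBF, _IOLBF or
 * _IONBF), write `text`, and report how many bytes reached the file before
 * the stream is flushed or closed. Returns -1 on error. */
long bytes_visible_after_write(const char *path, int mode, const char *text)
{
    FILE *out = fopen(path, "w");
    long visible;

    if (out == NULL)
        return -1;
    /* setvbuf must be called before any other operation on the stream. */
    if (setvbuf(out, NULL, mode, BUFSIZ) != 0) {
        fclose(out);
        return -1;
    }
    fputs(text, out);
    visible = bytes_on_disk(path);     /* measured while `out` is still open */
    fclose(out);
    return visible;
}
```

With _IONBF the text is typically visible in the file immediately, with _IOLBF it typically becomes visible once a newline has been written, and with _IOFBF a short string usually stays in the buffer until the stream is flushed or closed.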

Predefined streams

There are three predefined streams in stdio.h: stdin, stdout, and stderr. They correspond to standard input, standard output, and standard error, and to file descriptors 0, 1, and 2 respectively. All three streams are associated with the terminal by default.

Note that these three streams are not always associated with the terminal directly. They inherit the corresponding files of the parent process, so if the parent's predefined streams are redirected, they are redirected as well.

The C standard states that stderr is not fully buffered by default, and that stdin and stdout are fully buffered if and only if the stream can be determined not to refer to an interactive device. POSIX systems typically go further: stderr is unbuffered, and stdout and stdin are line buffered when associated with a terminal. Terminals also have their own buffering.

Combining the two standards, the following conclusions can be drawn:

  • stderr is unbuffered in every case, so output appears at its destination immediately. For this reason, writing logs to stderr is recommended.
  • stdout is line buffered by default (when associated with the terminal). If no newline character is written (and the buffer is not full), nothing is displayed, which can cause trouble for debugging output. When stdout is redirected to a regular file, it is fully buffered, which improves the program's efficiency.
  • For stdin, the default buffering mode (when associated with the terminal) does not matter, because input is handled by the terminal, which has its own buffer and decides when to flush it. The terminal generally uses line buffering, which makes stdin look line buffered. Changing the terminal settings changes the buffering behavior of input; changing the buffering of stdin has no effect. When stdin is redirected to a regular file, it is fully buffered, which improves the program's efficiency.
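The two practical consequences for output can be sketched as follows. The helper names log_error and show_progress and the message formats are assumptions made for this example; both functions return the number of characters written, as reported by fprintf and printf.

```c
#include <stdio.h>

/* Write a log line to stderr, which is not fully buffered by default,
 * so the message is not held back behind pending stdout output. */
int log_error(const char *message)
{
    return fprintf(stderr, "error: %s\n", message);
}

/* Print progress text that does not end in a newline. Because stdout may be
 * line buffered or fully buffered, flush explicitly so the text is shown. */
int show_progress(const char *step)
{
    int written = printf("%s...", step);

    fflush(stdout);
    return written;
}
```

Calling show_progress("load") prints load... right away even on a line-buffered terminal, because the explicit fflush does not wait for a newline.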

The buffering modes described above are the defaults and can be changed with the setbuf or setvbuf functions. Windows is not fully POSIX compliant, so its defaults may differ.

Streams associated with terminals do not support position requests, so functions such as rewind, ftell, fseek, fgetpos, and fsetpos cannot be used on them.
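A program can check at run time whether a stream supports position requests by asking for the current position and looking at the result. The helper below is a minimal sketch; the name supports_positioning is an assumption made for this example.

```c
#include <stdio.h>

/* Report whether `stream` supports position requests.
 * ftell returns -1L when the position cannot be obtained. */
int supports_positioning(FILE *stream)
{
    return ftell(stream) != -1L;
}
```

For a stream opened on a regular file this returns 1. For stdin attached to a terminal it would typically return 0.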

EOF

EOF is short for End of File and indicates that the end of a file has been reached.

On most modern operating systems, EOF is not a character that actually exists at the end of a file, but a value that some functions return when they reach the end of a file. The end of a file is detected not by looking for an EOF character, but by comparing the current position with the file size.

EOF is defined as a macro that expands to a negative integer constant, typically as follows:

#define EOF (-1)

The stream contains an EOF flag (the end-of-file indicator), which is set when a read reaches the end of the file. While the EOF flag is set, all read operations fail immediately and have no effect on the stream.

For functions or operations that read a single byte, such as fgetc and fscanf(stream, "%c", ptr), the EOF flag is not set after the last byte in the stream is read. It is set only when another read is attempted at the end of the file.

For example, a file contains the following:

123

Reading one byte at a time with fgetc, after three calls the file position is just after 3, which is the end of the file. At this point, the EOF flag is not yet set. If fgetc is called again, the EOF flag is set and EOF is returned.

For functions that read more than one byte, such as fgets, fscanf, and fread, the EOF flag is set when the function has read the last byte in the stream and tries to keep reading.
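The single-byte case can be checked with feof, which reports whether the EOF flag is set. The sketch below recreates the three-byte file as a temporary file; the function name eof_after_fgetc is an assumption made for this example.

```c
#include <stdio.h>

/* Create a temporary file containing "123", call fgetc `nreads` times, and
 * report whether the EOF flag is set afterwards (1) or not (0).
 * Returns -1 on error. */
int eof_after_fgetc(int nreads)
{
    FILE *fp = tmpfile();
    int i, at_eof;

    if (fp == NULL)
        return -1;
    fputs("123", fp);
    rewind(fp);                        /* back to the start; clears the EOF flag */

    for (i = 0; i < nreads; i++)
        fgetc(fp);                     /* a 4th call has nothing left to read */

    at_eof = feof(fp) != 0;
    fclose(fp);
    return at_eof;
}
```

After three reads the flag is still clear, and after the fourth it is set. This is why the usual loop tests the return value of fgetc against EOF rather than calling feof before each read.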

For example, a file contains the following:

123

If you call fscanf(stream, "%d", ptr), then after reading the character 3 the function cannot be sure that the number is complete, since more digits might follow. It therefore tries to read past the end of the file, and the EOF flag is set.
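The fscanf case can be reproduced with the same three-byte contents. This is a minimal sketch; the function name eof_after_fscanf is an assumption made for this example.

```c
#include <stdio.h>

/* Create a temporary file containing "123", read one integer with fscanf,
 * store it in *value, and report whether the EOF flag is set afterwards.
 * Returns 1 or 0 for the flag, -1 on error. */
int eof_after_fscanf(int *value)
{
    FILE *fp = tmpfile();
    int at_eof;

    if (fp == NULL)
        return -1;
    fputs("123", fp);
    rewind(fp);

    /* %d keeps reading while digits follow, so it runs into the end of file. */
    if (fscanf(fp, "%d", value) != 1) {
        fclose(fp);
        return -1;
    }

    at_eof = feof(fp) != 0;
    fclose(fp);
    return at_eof;
}
```

The conversion succeeds and yields 123, yet the EOF flag is already set when fscanf returns.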

If you call fgets(ptr, 10, stream), then after reading the character 3 no newline '\n' has been seen and the limit of 10 - 1 bytes has not been reached, so the function tries to read past the end of the file, and the EOF flag is set. If you call fgets(ptr, 4, stream), the limit of 4 - 1 bytes is reached after reading the character 3, so the function returns immediately without setting the EOF flag. If the file ends with a newline character, fgets does not set the EOF flag when it reads the last line, because it returns as soon as it has read the newline.
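The three fgets cases can be compared side by side. The following is a sketch under assumptions: the function name eof_after_fgets, its parameters, and the 64-byte line buffer are chosen for illustration only.

```c
#include <stdio.h>

/* Create a temporary file holding `contents`, call fgets once with the given
 * buffer size, and report whether the EOF flag is set afterwards.
 * Returns 1 or 0 for the flag, -1 on error. */
int eof_after_fgets(const char *contents, int size)
{
    char line[64];                     /* hypothetical upper bound for the demo */
    FILE *fp;
    int at_eof;

    if (size < 2 || size > (int)sizeof line)
        return -1;
    fp = tmpfile();
    if (fp == NULL)
        return -1;
    fputs(contents, fp);
    rewind(fp);

    /* fgets stops after size - 1 bytes, after a newline, or at end of file. */
    if (fgets(line, size, fp) == NULL) {
        fclose(fp);
        return -1;
    }

    at_eof = feof(fp) != 0;
    fclose(fp);
    return at_eof;
}
```

With contents "123", a size of 10 leaves the flag set and a size of 4 leaves it clear. With contents "123\n" and a size of 10 the flag is also clear, because fgets stops at the newline.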

If you call fread(ptr, 2, 2, stream), then after reading the character 3 the requested 2 * 2 bytes have not been reached, so the function tries to read past the end of the file, and the EOF flag is set. If you call fread(ptr, 1, 3, stream), the requested 1 * 3 bytes are reached after reading the character 3, so the function returns immediately without setting the EOF flag.
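The two fread calls can be checked in the same way. This is a minimal sketch; the function name eof_after_fread and the 16-byte buffer are assumptions made for this example.

```c
#include <stdio.h>

/* Create a temporary file containing "123", call fread once for `count`
 * items of `size` bytes, store the number of complete items read in
 * *items_read, and report whether the EOF flag is set afterwards.
 * Returns 1 or 0 for the flag, -1 on error. */
int eof_after_fread(size_t size, size_t count, size_t *items_read)
{
    char buf[16];                      /* hypothetical upper bound for the demo */
    FILE *fp;
    int at_eof;

    /* Refuse requests that would not fit in buf. */
    if (size != 0 && count > sizeof buf / size)
        return -1;
    fp = tmpfile();
    if (fp == NULL)
        return -1;
    fputs("123", fp);
    rewind(fp);

    *items_read = fread(buf, size, count, fp);

    at_eof = feof(fp) != 0;
    fclose(fp);
    return at_eof;
}
```

Requesting 2 items of 2 bytes returns 1 complete item and sets the flag, while requesting 3 items of 1 byte returns 3 items and leaves the flag clear. A short item count from fread should therefore be followed by a check of feof or ferror to find out why the read stopped.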

The reason is that functions that read more than one byte behave as if they called fgetc repeatedly. This design is easy to trip over, so use these functions with care to avoid traps.