This article takes a deeper look at Python logging. More importantly, it discusses how to write logs from a multi-process program and how to rotate (slice) log files safely.

1. Introduction to the logging module

Python's logging module is a flexible part of the standard library that any Python program can use for logging. See the Python logging official documentation.

The logging framework consists of four parts:

  • Loggers: the interfaces that application code calls directly
  • Handlers: send log records created by loggers to the appropriate destination
  • Filters: provide finer-grained control over which log records are output
  • Formatters: specify the layout of log records in the final output

2. Components of logging

Loggers

Loggers are the logging interfaces that program code calls directly: an application writes its log messages to a Logger. Loggers are not instantiated directly; you get one via logging.getLogger(name). In effect, Logger objects are singletons: getLogger returns the same object for the same name wherever it is called, and logging is thread-safe, so any part of a program can fetch and use the same logger. Unfortunately, logger does not support multiple processes; that is explained, along with some solutions, in a later section.

Loggers have parent-child relationships. A logger with no parent logger has root as its parent; otherwise the dotted name determines the hierarchy. For example, logging.getLogger("abc.xyz") creates two Logger objects: a parent abc and a child abc.xyz, and since abc itself has no parent, its parent is root. In reality, though, abc is a placeholder (virtual) logger with no handlers attached to process logs, whereas root is not a placeholder. When a logger logs a record, its ancestors receive the record as well. This is why some users see every message twice after creating a logger: the message is handled once by their own logger and once more by root.
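
A minimal sketch of this behavior (the logger names are arbitrary):

import logging

parent = logging.getLogger("abc")
child = logging.getLogger("abc.xyz")
assert logging.getLogger("abc.xyz") is child  # same name, same (singleton) object

logging.basicConfig(level=logging.INFO)  # gives root a handler
child.addHandler(logging.StreamHandler())

child.info("printed twice: once by the child's handler, once by root's")

child.propagate = False  # stop records from bubbling up to root
child.info("printed once")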

Each logger has a log level. The following levels are defined in logging:

Level      Numeric value
NOTSET     0
DEBUG      10
INFO       20
WARNING    30
ERROR      40
CRITICAL   50

When a logger receives a log message, it checks the message's level against its own: if the message is to be handled, the logger passes it on to its Handlers; otherwise it is discarded.
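
A quick sketch of that level check (the logger name is arbitrary):

import logging

logger = logging.getLogger("levels")
logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.WARNING)

logger.info("dropped: INFO (20) is below the logger's WARNING (30) threshold")
logger.error("handled: ERROR (40) clears the threshold")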

Handlers

Handlers distribute the messages sent by a Logger to the correct destination: to the console, to a file, to both, or somewhere else entirely (a process pipe, a socket, and so on). A handler determines where each log message ends up, and it is the part you will spend most of your configuration effort on.

Each Handler also has a log level. A Logger can have multiple handlers, which means a logger can pass messages of different levels to different handlers. You can also attach multiple handlers at the same level, so you have the flexibility to set things up according to your needs.
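
For example, a sketch that sends everything to a file but only errors to the console (the file name is illustrative):

import logging

logger = logging.getLogger("app")
logger.setLevel(logging.DEBUG)

file_handler = logging.FileHandler("app.log")  # receives everything, DEBUG and up
file_handler.setLevel(logging.DEBUG)

console = logging.StreamHandler()              # only sees ERROR and up
console.setLevel(logging.ERROR)

logger.addHandler(file_handler)
logger.addHandler(console)

logger.debug("file only")
logger.error("file and console")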

Filters

Filters provide finer-grained control over whether a log record is output. In principle, a handler that receives a record must process it if the level matches; but if the handler has a Filter attached, the record can be subjected to extra processing and checks first. For example, a filter can intercept records from a specific source, or modify a record in place, even changing its log level (the level check is then made against the modified record).

Both loggers and handlers can have filters installed, and even several filters chained in series.
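
As a sketch, a filter that drops records mentioning a particular substring (the class name and messages are made up for illustration):

import logging

class DropHealthChecks(logging.Filter):
    def filter(self, record):
        # Returning False (or 0) drops the record.
        return "healthcheck" not in record.getMessage()

logger = logging.getLogger("web")
logger.addHandler(logging.StreamHandler())
logger.addFilter(DropHealthChecks())

logger.warning("GET /healthcheck 200")  # filtered out
logger.warning("GET /login 500")        # printed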

Formatters

Formatters specify the layout of a record in the final output. A Formatter concatenates the record's fields into a concrete string; the default format is simply %(message)s. There are some built-in LogRecord attributes available in a format string, as shown in the following table:

Attribute   Format          Description
asctime     %(asctime)s     Time of the log call in readable form, by default '2016-02-08 12:00:00,123' (millisecond precision)
filename    %(filename)s    File name portion of the path
funcName    %(funcName)s    Name of the function that issued the log call
levelname   %(levelname)s   Final log level of the record (possibly modified by a filter)
message     %(message)s     The logged message
lineno      %(lineno)d      Line number of the log call
pathname    %(pathname)s    Full path of the source file
process     %(process)s     Current process ID
thread      %(thread)s      Current thread ID

A Handler can have only one Formatter, so if you want output in multiple formats you have to use multiple handlers.
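
A sketch of attaching a Formatter that uses several of the attributes above:

import logging

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter(
    fmt='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
    datefmt='[%Y-%m-%d %H:%M:%S]'))

logger = logging.getLogger("fmt")
logger.addHandler(handler)
logger.warning("hello")
# e.g. [2016-06-01 12:00:00] demo.py[line:10] WARNING hello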

3. Configuring logging

Simple configuration

As explained in the Loggers section, there is a default log object, root. The advantage of this root logger is that we can configure it and log through the logging module's top-level functions directly. For example:

logging.basicConfig(level=logging.INFO,filename='logger.log')
logging.info("info message")

So "simple configuration" here means configuring the root logger, which is available on demand. Every named logger is a singleton, so once configured it can be used from anywhere in the program. We configure the root logger simply by calling basicConfig, which is actually quite efficient and convenient: it guarantees that at least one handler is available whenever any logger is used. A simple configuration can look like this:

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                    datefmt='[%Y-%m-%d %H:%M:%S]',
                    filename='../log/my.log',
                    filemode='a')

Code configuration

A more fine-grained approach is to configure logging in code. This is the least used method in practice, because nobody wants to bury settings in code, but it is worth a brief introduction; you can reach for it when you have to. A minimal sketch is shown below.
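
The sketch builds the formatter, handler, and logger by hand and wires them together (the logger name and file path are illustrative):

import logging

formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')

handler = logging.FileHandler('../log/my.log')
handler.setLevel(logging.INFO)
handler.setFormatter(formatter)

logger = logging.getLogger('myapp')
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("configured entirely in code")

Everything the configuration file below expresses can be written this way; the file is just less invasive to change.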

File configuration

Python's logging configuration file is based on ConfigParser; that is, the file follows ConfigParser's format. I'll go through the details with a typical configuration file:

##############################################
[loggers]
keys=root,log02

[logger_root]
level=INFO
handlers=handler01

[logger_log02]
level=DEBUG
handlers=handler02
qualname=log02
##############################################
[handlers]
keys=handler01,handler02

[handler_handler01]
class=FileHandler
level=INFO
formatter=form01
args=('../log/cv_parser_gm_server.log', 'a')

[handler_handler02]
class=StreamHandler
level=NOTSET
formatter=form01
args=(sys.stdout,)
##############################################
[formatters]
keys=form01,form02

[formatter_form01]
format=%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(process)d %(message)s
datefmt=[%Y-%m-%d %H:%M:%S]

[formatter_form02]
format=%(message)s
##############################################

After a look you can see the pattern: I use lines of # to separate the blocks. Every logger, handler, and formatter has a key name. Taking loggers as an example: first add the key name under [loggers] to declare the logger, then configure it in a [logger_xxxx] section, where xxxx is the key name. For log02 I configured a level and one handler name; you can of course attach multiple handlers. Then look under [handlers] for the configuration of that particular handler, and so on. In code, the configuration file is then loaded like this:

import logging.config

logging.config.fileConfig(log_conf_file)

Some readers may not understand the class entry in a handler section. It names one of the handler classes that ship with the logging package, and this is where you select them; the class is, in effect, the executor that actually processes the log. As the logging documentation notes, all the built-in Handler classes are thread-safe, so you can use them without worry. The question is: what if there are multiple processes? In the next section I will focus on rewriting a Handler class so that logging works in a multi-process environment. We can override an existing Handler or create an entirely new Handler class, then point the class configuration at our own class so that our own Handler gets loaded.
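
For instance, assuming we have written a custom handler (like the SafeFileHandler developed in section 4) in a module of our own, one way to let a fileConfig-style file load it is to inject the class into the logging namespace, since fileConfig resolves the class= entry there. A sketch under that assumption (the module and file names are hypothetical):

import logging
import logging.config

from my_handlers import SafeFileHandler  # hypothetical module holding our class

# fileConfig evaluates `class=` in the logging module's namespace,
# so make our class visible there before loading the config:
logging.SafeFileHandler = SafeFileHandler

# The handler section in the config file can now say:
#   [handler_handler01]
#   class=SafeFileHandler
#   ...
logging.config.fileConfig('logging.conf')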

4. Logging with multiple processes (important)

This part is actually why I wrote this article. For historical reasons (the GIL), multithreading in Python rarely yields real parallelism, so Python programs typically use multiple processes for parallel operations or parallel computation. However, logging in Python does not support multiple processes writing to the same file, and that causes problems.

Take the TimedRotatingFileHandler class as an example. This handler slices log files by day: today's file is xxxx.log, yesterday's is xxxx.log.2016-06-01. The advantage is that you can locate logs by day, keep each log file small, and delete expired logs day by day. But here comes the problem: if multiple processes write logs through it, only one process performs the switch while the others keep writing to the original file; worse, while one process is switching, another process may already have written something into the new log file, which the first process will then mercilessly delete before creating its own new file. It gets messy, very messy, and no fun at all. So: there are several ways to solve the multi-process logging problem.

Why

Before solving it, let's look at why this happens. Here is the source of TimedRotatingFileHandler's doRollover, which is what runs when the handler switches files:

def doRollover(self):
    """
    do a rollover; in this case, a date/time stamp is appended to the filename
    when the rollover happens.  However, you want the file to be named for the
    start of the interval, not the current time.  If there is a backup count,
    then we have to get a list of matching filenames, sort them and remove
    the one with the oldest suffix.
    """
    if self.stream:
        self.stream.close()
        self.stream = None
    # get the time that this sequence started at and make it a TimeTuple
    currentTime = int(time.time())
    dstNow = time.localtime(currentTime)[-1]
    t = self.rolloverAt - self.interval
    if self.utc:
        timeTuple = time.gmtime(t)
    else:
        timeTuple = time.localtime(t)
        dstThen = timeTuple[-1]
        if dstNow != dstThen:
            if dstNow:
                addend = 3600
            else:
                addend = -3600
            timeTuple = time.localtime(t + addend)
    dfn = self.baseFilename + "." + time.strftime(self.suffix, timeTuple)
    if os.path.exists(dfn):
        os.remove(dfn)
    # Issue 18940: A file may not have been created if delay is True.
    if os.path.exists(self.baseFilename):
        os.rename(self.baseFilename, dfn)
    if self.backupCount > 0:
        for s in self.getFilesToDelete():
            os.remove(s)
    if not self.delay:
        self.stream = self._open()
    newRolloverAt = self.computeRollover(currentTime)
    while newRolloverAt <= currentTime:
        newRolloverAt = newRolloverAt + self.interval
    # If DST changes and midnight or weekly rollover, adjust for this.
    if (self.when == 'MIDNIGHT' or self.when.startswith('W')) and not self.utc:
        dstAtRollover = time.localtime(newRolloverAt)[-1]
        if dstNow != dstAtRollover:
            if not dstNow:  # DST kicks in before next rollover, so we need to deduct an hour
                addend = -3600
            else:           # DST bows out before next rollover, so we need to add an hour
                addend = 3600
            newRolloverAt += addend
    self.rolloverAt = newRolloverAt

Start with the line if os.path.exists(dfn): the logic is to remove the dfn file if it already exists, then rename the baseFilename file to dfn, and then reopen baseFilename and keep writing into it. So the logic here is clear:

  1. In this example, suppose the current log file is current.log and the rollover target is current.log.2016-06-01
  2. Check whether current.log.2016-06-01 exists; if it does, delete it
  3. Rename the current log file current.log to current.log.2016-06-01
  4. Reopen a new current.log (the source now opens it in "a" mode by default; earlier versions used "w")

So in a multi-process scenario: when one process switches, the handles of the other processes still point at what is now current.log.2016-06-01, and they keep writing into it. Or, a process performing the switch directly deletes the current.log.2016-06-01 that another process has just produced by renaming. Or, in yet another case, one process is writing while another is already switching, and unexpected things happen. Suppose process 1 is at step 3 while process 2 has just finished step 2 (deleting the freshly renamed file): process 1 completes its rename but has not yet created the new current.log, so process 2, starting its own rename, fails with an error because current.log cannot be found. If instead process 1 has already created the new current.log, process 2 renames that nearly empty file to current.log.2016-06-01: a whole day's log file is wiped out, and on top of that process 1 believes it has already rolled over and will not roll over again, even though its handle actually points at current.log.2016-06-01.

This all looks very convoluted, but it comes down to one thing: nothing constrains the file operations across processes. So how do we solve the problem gracefully? I came up with two schemes; of course, I will also suggest a few more below for you to try.
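
If you want to see the race for yourself without waiting until midnight, a sketch like the following compresses it by rotating every second (the file name and counts are arbitrary); with several processes you will typically find missing or misplaced lines afterwards:

import logging
import multiprocessing
import time
from logging.handlers import TimedRotatingFileHandler

def writer(n):
    logger = logging.getLogger("race")
    logger.setLevel(logging.INFO)
    # every process opens its own handler on the same file, rotating each second
    logger.addHandler(TimedRotatingFileHandler("race.log", when="S", interval=1))
    for i in range(300):
        logger.info("process %s line %s", n, i)
        time.sleep(0.01)

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=writer, args=(n,)) for n in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # 4 * 300 = 1200 lines were written; count what survived across race.log*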

Solution 1

Earlier we found the flaw in TimedRotatingFileHandler's logic. We only need to tweak it slightly:

  1. Check whether the rollover target current.log.2016-06-01 already exists; only rename if it does not. (If another process has already done the switch, we don't need to rename, just reopen our handle.)
  2. Open current.log in "a" mode. See how simple that is:
import os
import time
from logging.handlers import TimedRotatingFileHandler

class SafeRotatingFileHandler(TimedRotatingFileHandler):
    def __init__(self, filename, when='h', interval=1, backupCount=0,
                 encoding=None, delay=False, utc=False):
        TimedRotatingFileHandler.__init__(self, filename, when, interval,
                                          backupCount, encoding, delay, utc)

    """
    Override doRollover
    lines commented by "##" are changed by cc
    """
    def doRollover(self):
        """
        do a rollover; in this case, a date/time stamp is appended to the filename
        when the rollover happens.  However, you want the file to be named for the
        start of the interval, not the current time.  If there is a backup count,
        then we have to get a list of matching filenames, sort them and remove
        the one with the oldest suffix.

        Override,
        1. if dfn not exist then do rename
        2. _open with "a" mode
        """
        if self.stream:
            self.stream.close()
            self.stream = None
        # get the time that this sequence started at and make it a TimeTuple
        currentTime = int(time.time())
        dstNow = time.localtime(currentTime)[-1]
        t = self.rolloverAt - self.interval
        if self.utc:
            timeTuple = time.gmtime(t)
        else:
            timeTuple = time.localtime(t)
            dstThen = timeTuple[-1]
            if dstNow != dstThen:
                if dstNow:
                    addend = 3600
                else:
                    addend = -3600
                timeTuple = time.localtime(t + addend)
        dfn = self.baseFilename + "." + time.strftime(self.suffix, timeTuple)
        ## if os.path.exists(dfn):
        ##     os.remove(dfn)
        # Issue 18940: A file may not have been created if delay is True.
        ## if os.path.exists(self.baseFilename):
        if not os.path.exists(dfn) and os.path.exists(self.baseFilename):
            os.rename(self.baseFilename, dfn)
        if self.backupCount > 0:
            for s in self.getFilesToDelete():
                os.remove(s)
        if not self.delay:
            self.mode = "a"
            self.stream = self._open()
        newRolloverAt = self.computeRollover(currentTime)
        while newRolloverAt <= currentTime:
            newRolloverAt = newRolloverAt + self.interval
        # If DST changes and midnight or weekly rollover, adjust for this.
        if (self.when == 'MIDNIGHT' or self.when.startswith('W')) and not self.utc:
            dstAtRollover = time.localtime(newRolloverAt)[-1]
            if dstNow != dstAtRollover:
                if not dstNow:  # DST kicks in before next rollover, so we need to deduct an hour
                    addend = -3600
                else:           # DST bows out before next rollover, so we need to add an hour
                    addend = 3600
                newRolloverAt += addend
        self.rolloverAt = newRolloverAt

Don't be put off by the length of the code: the actual changes are only around the "##" comments, the rest is copied from the source. The class inherits TimedRotatingFileHandler and overrides the rollover procedure. This solution is quite elegant: very few changes, and very effective. Strictly speaking, if two or more processes enter the if statement at the same time, the rename can still clobber a log file. Can that really happen? The file is split only once a day, so two handlers would have to be rolling over at the very same instant and both reach this branch, which takes remarkable luck. Still, to make it airtight, you can add a file lock: acquire the lock, check the condition once more, and only then rename. A sketch of the locking idea follows; the effect is clean.
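
As a sketch of that locking idea (using POSIX fcntl, so Unix-only; the lock-file name is an arbitrary choice), the rename inside doRollover can be guarded like this:

import fcntl
import os

# inside doRollover, replacing the bare rename:
lock_path = self.baseFilename + ".lock"
with open(lock_path, "a") as lock_file:
    fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until we own the lock
    try:
        # re-check after acquiring the lock: another process may have
        # renamed the file while we were waiting
        if not os.path.exists(dfn) and os.path.exists(self.baseFilename):
            os.rename(self.baseFilename, dfn)
    finally:
        fcntl.flock(lock_file, fcntl.LOCK_UN)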

Solution 2

I think this is the simplest and most effective solution. Override FileHandler, the class that every file-writing handler (including TimedRotatingFileHandler) ultimately inherits from, and add a few simple checks and operations. Our logic goes like this:

  1. Check whether the current timestamp matches the date suffix of the file the handler points to
  2. If not, switch the handle to the new file; and that's the whole logic. Talk is cheap, show me the code:
import os
import time
from logging import FileHandler

try:
    import codecs
except ImportError:
    codecs = None

class SafeFileHandler(FileHandler):
    def __init__(self, filename, mode, encoding=None, delay=0):
        """
        Use the specified filename for streamed logging
        """
        if codecs is None:
            encoding = None
        FileHandler.__init__(self, filename, mode, encoding, delay)
        self.mode = mode
        self.encoding = encoding
        self.suffix = "%Y-%m-%d"
        self.suffix_time = ""

    def emit(self, record):
        """
        Emit a record.

        Always check time
        """
        try:
            if self.check_baseFilename(record):
                self.build_baseFilename()
            FileHandler.emit(self, record)
        except (KeyboardInterrupt, SystemExit):
            raise
        except:
            self.handleError(record)

    def check_baseFilename(self, record):
        """
        Determine if builder should occur.

        record is not used, as we are just comparing times, but it is needed so
        the method signatures are the same
        """
        timeTuple = time.localtime()
        # rebuild if the date suffix is stale, or if the current file
        # (baseFilename already carries the suffix) has been removed
        if self.suffix_time != time.strftime(self.suffix, timeTuple) or not os.path.exists(self.baseFilename):
            return 1
        else:
            return 0

    def build_baseFilename(self):
        """
        do builder; in this case, old time stamp is removed from filename and
        a new time stamp is appended to the filename
        """
        if self.stream:
            self.stream.close()
            self.stream = None
        # remove old suffix
        if self.suffix_time != "":
            index = self.baseFilename.find("." + self.suffix_time)
            if index == -1:
                index = self.baseFilename.rfind(".")
            self.baseFilename = self.baseFilename[:index]
        # add new suffix
        currentTimeTuple = time.localtime()
        self.suffix_time = time.strftime(self.suffix, currentTimeTuple)
        self.baseFilename = self.baseFilename + "." + self.suffix_time
        self.mode = 'a'
        if not self.delay:
            self.stream = self._open()

check_baseFilename implements step 1 of that logic; build_baseFilename carries out step 2. That's it. The difference in this scheme is that today the current file is current.log.2016-06-01, and tomorrow the current file will be current.log.2016-06-02: no rename, no deletion, so there is nothing for the processes to fight over. Very simple and elegant, and it solves the multi-process logging problem too.
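
Using it is the same as any other handler; a quick sketch (the file name is an example):

import logging

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(SafeFileHandler("current.log", mode="a"))

logger.info("lands in current.log.<today>, no rename ever happens")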

Other solutions

Of course, there are other solutions, for example dedicating one process to logging: every other process funnels its log content through a pipe (or queue) to the logging process and lets it manage the files. Logging to a network socket works on the same principle.
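
Since Python 3.2 the standard library even ships the building blocks for this: QueueHandler in the workers and a QueueListener as the single writer. A minimal sketch (file and logger names are arbitrary):

import logging
import logging.handlers
import multiprocessing
import os

def worker(queue):
    # workers never touch the file; they push records into the shared queue
    logger = logging.getLogger("mp")
    logger.setLevel(logging.INFO)
    logger.addHandler(logging.handlers.QueueHandler(queue))
    logger.info("hello from pid %s", os.getpid())

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    # the listener in the main process is the only one writing to the file
    listener = logging.handlers.QueueListener(queue, logging.FileHandler("mp.log"))
    listener.start()
    procs = [multiprocessing.Process(target=worker, args=(queue,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    listener.stop()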

5. Reference materials

This is where the first solution comes from.