I previously wrote about how to log elegantly in Django. I assumed that once the code went live I could happily read the logs and troubleshoot problems through them, but reality had other plans: two exceptions hit me right in the face.

The two exceptions are as follows:

  1. Errors when writing to the log file;
  2. Logs not being split by day, and some log entries being lost.

After some searching online, I found the cause:

Django's logging is built on Python's logging module, which is thread-safe but makes no guarantees about multi-process safety. My Django project is started by uWSGI in multi-process mode, so both problems showed up.
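For context, my handler was wired up through Django's LOGGING setting in essentially the stock way. A minimal sketch of that original, problematic setup, with illustrative paths and names rather than my exact config:

# settings.py -- a minimal sketch of the original, problematic setup
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'error_file': {
            'level': 'ERROR',
            # stock handler: thread-safe, but not multi-process safe
            'class': 'logging.handlers.TimedRotatingFileHandler',
            'filename': '/var/log/app/error.log',
            'when': 'MIDNIGHT',   # rotate once a day at midnight
            'backupCount': 30,    # keep 30 days of archives
        },
    },
    'loggers': {
        'django': {'handlers': ['error_file'], 'level': 'ERROR'},
    },
}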

The following is a detailed walkthrough of how things go wrong. Assume we write to a log file named error.log and rotate it at midnight every day. In a single-process environment, it looks like this:

  1. Create the error.log file;
  2. Write a day’s log;
  3. At midnight, check whether error.log-2020-05-15 exists; if it does, delete it. Then rename error.log to error.log-2020-05-15;
  4. Recreate the error.log file and point the logger's write handle at the new error.log.

Let’s look at the multi-process case:

  1. Create the error.log file;
  2. Write a day’s log;
  3. At midnight, process 1 checks whether error.log-2020-05-15 exists; if it does, it deletes it. Then it renames error.log to error.log-2020-05-15 (this delete-then-rename behavior comes straight from the stock handler; see the excerpt after this list);
  4. At this point, process 2 may still be writing to error.log. Because its write handle was opened earlier, it now keeps writing into error.log-2020-05-15;
  5. When process 2 reaches its own rollover, error.log-2020-05-15 already exists, so process 2 deletes it and renames error.log, which loses the log entries written so far;
  6. And because process 2 has renamed error.log to error.log-2020-05-15, process 1's continued writes go to error.log-2020-05-15, which causes the write errors.

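For reference, this is the fragment of the stock TimedRotatingFileHandler.doRollover that steps 3 to 6 trace back to (condensed from CPython's logging/handlers.py as of the Python 3 versions current when this was written; the comments are mine):

# Condensed from CPython's logging/handlers.py,
# TimedRotatingFileHandler.doRollover:
dfn = self.rotation_filename(self.baseFilename + "." +
                             time.strftime(self.suffix, timeTuple))
if os.path.exists(dfn):
    os.remove(dfn)                     # an existing archive is deleted -> log loss
self.rotate(self.baseFilename, dfn)    # rename error.log -> error.log-2020-05-15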
The reason is clear. So, what’s the solution? Two options:

Use the concurrent-log-handler package

This package achieves multi-process safety through locking, and it can roll log files over when they reach a certain size, but it does not support splitting by time.
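For completeness, wiring it up looks roughly like this. This is a sketch based on the package's documented usage; the file name and size limits are placeholders:

import logging

from concurrent_log_handler import ConcurrentRotatingFileHandler

logger = logging.getLogger("app")
# The handler takes an exclusive lock around each write and rollover,
# which is what makes it safe across processes; rotation is by size
# (maxBytes), not by time.
handler = ConcurrentRotatingFileHandler(
    "error.log", "a",
    maxBytes=10 * 1024 * 1024,  # roll over at ~10 MB
    backupCount=5,              # keep 5 archives
)
logger.addHandler(handler)
logger.error("something went wrong")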

I haven't studied the package's source code closely, so I can't elaborate on it here; if you're interested, take a look for yourself.

Since my requirement is to split logs by time, this package obviously doesn't fit, so I wrote my own handler.

Rewrite TimedRotatingFileHandler

From the analysis above, we know the problems occur during rotation: one is deleting existing files, and the other is not updating the write handle in time.

So my countermeasures target exactly these two points: first, remove the file-deletion logic; second, point the write handle at the latest file when rotating.

The code is as follows:

# Fix multi-process log write chaos


import os
import time
from logging.handlers import TimedRotatingFileHandler


class CommonTimedRotatingFileHandler(TimedRotatingFileHandler):

    @property
    def dfn(self):
        currentTime = int(time.time())
        # get the time that this sequence started at and make it a TimeTuple
        dstNow = time.localtime(currentTime)[-1]
        t = self.rolloverAt - self.interval
        if self.utc:
            timeTuple = time.gmtime(t)
        else:
            timeTuple = time.localtime(t)
            dstThen = timeTuple[-1]
            if dstNow != dstThen:
                if dstNow:
                    addend = 3600
                else:
                    addend = -3600
                timeTuple = time.localtime(t + addend)
        dfn = self.rotation_filename(self.baseFilename + "." +
                                     time.strftime(self.suffix, timeTuple))

        return dfn

    def shouldRollover(self, record):
        """
        Whether to roll over:
        1. Roll over if the archive file already exists;
        2. Roll over if the current time >= the next rollover time.
        """
        dfn = self.dfn
        t = int(time.time())
        if t >= self.rolloverAt or os.path.exists(dfn):
            return 1
        return 0

    def doRollover(self):
        """
        Perform the rollover:
        1. Refresh the file handle;
        2. Handle an already existing archive file;
        3. Prune old backups;
        4. Update the next rollover time.
        """
        if self.stream:
            self.stream.close()
            self.stream = None

        dfn = self.dfn

        # If the archive log already exists, another process has rotated;
        # only rename when it does not exist yet (no deletion at all)
        if not os.path.exists(dfn):
            self.rotate(self.baseFilename, dfn)

        # Backup count control
        if self.backupCount > 0:
            for s in self.getFilesToDelete():
                os.remove(s)

        # Delay handling: reopen the stream, pointing the handle at the new file
        if not self.delay:
            self.stream = self._open()

        # Update the next rollover time
        currentTime = int(time.time())
        newRolloverAt = self.computeRollover(currentTime)
        while newRolloverAt <= currentTime:
            newRolloverAt = newRolloverAt + self.interval

        # If DST changes and midnight or weekly rollover, adjust for this.
        if (self.when == 'MIDNIGHT' or self.when.startswith('W')) and not self.utc:
            dstAtRollover = time.localtime(newRolloverAt)[-1]
            dstNow = time.localtime(currentTime)[-1]
            if dstNow != dstAtRollover:
                if not dstNow:  # DST kicks in before next rollover, so we need to deduct an hour
                    addend = -3600
                else:           # DST bows out before next rollover, so we need to add an hour
                    addend = 3600
                newRolloverAt += addend
        self.rolloverAt = newRolloverAt

Then just reference this class in the handlers section of the LOGGING configuration in settings.py, as sketched below.
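A minimal sketch, where the only change from the stock setup shown earlier is the handler's class path. The module path myapp.log, file path, and levels are illustrative assumptions, not prescriptive:

# settings.py -- only the handler 'class' differs from the stock setup
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'error_file': {
            'level': 'ERROR',
            # dotted path to wherever you placed CommonTimedRotatingFileHandler
            'class': 'myapp.log.CommonTimedRotatingFileHandler',
            'filename': '/var/log/app/error.log',
            'when': 'MIDNIGHT',
            'backupCount': 30,
        },
    },
    'loggers': {
        'django': {'handlers': ['error_file'], 'level': 'ERROR'},
    },
}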

This has been running in production for two weeks now, stably and without any exceptions.

That's all.

References:

www.jianshu.com/p/665694966…

juejin.cn/post/684490…

Tech Blog:

github.com/yongxinz/te…

Also, you're welcome to follow my WeChat official account, AlwaysBeta, where more great content awaits you.