What is a memory manager (What)

Unlike most programming languages, variables in Python do not need to be declared in advance, variables do not need to be typed, programmers do not need to worry about memory management, and the Python interpreter automatically retrievals you. Developers don’t have to worry too much about memory management mechanisms, it’s all up to the Python memory manager to take on the complexity of memory management.

Memory is created and destroyed, and this article will focus on Python’s memory pool and garbage collection.

Python memory pool

Why introduce memory pools

When creating a large number of objects that consume a small amount of memory, frequent calls to new/malloc can result in a large amount of memory fragmentation, resulting in reduced efficiency. The function of the memory pool is to apply for a certain number of memory blocks of the same size in the memory in advance to be reserved for standby. When there is a new memory demand, memory is allocated from the memory pool to meet the demand, and then apply for new memory. The most significant advantage of this is that it reduces memory fragmentation and improves efficiency.

The memory management mechanism in Python is Pymalloc

How memory pools work

First, let’s look at a CPython memory architecture diagram:

  • Python’s object management is primarily at the Level+1 to Level+3 levels
  • Level+3: Python’s built-in objects (e.g. ints,dict, etc.) have separate private memory pools. Memory pools are not shared between objects, i.e. memory freed by ints is not allocated to floats
  • Level+2: if the requested memory size is smaller than 256KB, memory allocation is mainly implemented by Python’s object allocator
  • Level+1: When the requested memory size is larger than 256KB, it is allocated by Python’s native memory allocator, essentially calling functions such as malloc/realloc in the C standard library

In terms of freeing memory, Python calls an object’s destructor when its reference count goes to 0. Calling the destructor does not necessarily mean that free will eventually be called to free up memory. If it is, then frequently applying and freeing memory can make Python’s execution less efficient. Therefore, the memory pool mechanism is also used during the destruction. memory obtained from the memory pool is returned to the memory pool to avoid frequent request and release actions.

Garbage collection mechanism

The garbage collection mechanism in Python is based on reference counting, supplemented by mark-cleanup and generational collection. Among them, the mark-clean mechanism is used to solve the problem that memory cannot be freed due to circular references caused by counting references, and the generational collection mechanism is used to improve the efficiency of garbage collection.

Reference counting

Python keeps track of variables in memory by reference counting, which records the number of times an object has been referenced by other objects in use.

Python has an internal tracking variable called the reference counter, how many references there are per variable, or reference count for short. When an object has a reference count of zero, it is placed in the garbage collection queue.

>>> a=[1,2] >>> import sys >>> sys.getrefcount(a) ## number of times to get a reference to object a 2 >>> b=a >>> sys.getrefcount(a) 3 >>> del b ## delete a reference to object b >>> sys.getrefcount(a) 2 >>> c=list() >>> c.append(a) ## Add to the container >>> sys.getrefcount(a) 3 >>> del c ## delete the container, Reference -1 >>> sys.getrefCount (a) 2 >>> b=a >>> sys.getrefcount(a) 3 >>> a=[3,4] ## Reassignment >>> sys.getrefcount(a) 2Copy the code

Note: When passing a as an argument to getrefcount, a temporary reference is generated, so the result is +1 more than the real one

  • The reference count increases:
  1. An object is assigned a new name (for example: a=[1,2])
  2. Put it in a container (such as a list, tuple, or dictionary) (e.g. C.append (a))
  • When the reference count is reduced:
  1. Explicit destruction of an object alias using the del statement (e.g., del b)
  2. The container in which the object resides is destroyed or the object is removed from the container (for example, del c).
  3. References are out of scope or re-assigned (e.g., a=[3,4])

Reference counting solves most garbage collection problems, but in cases where two objects reference each other, the DEL statement reduces the number of references, but the reference count does not go to zero and the object is not destroyed, causing a memory leak. For this, Python introduced a mark-clear mechanism.

Mark-clear

Mark-clear is used to resolve the problem of circular references generated by the reference counting mechanism, resulting in memory leaks. Circular references are created only for container objects such as dictionaries, tuples, lists, etc.

As the name suggests, the mechanism is divided into two steps during garbage collection, which are:

  • The tag phase, in which all objects are traversed, marks the object as reachable if it is reachable, that is, if there are objects that reference it
  • In the clean phase, objects are traversed again, and if an object is not marked as reachable (that is, Unreachable), it is reclaimed
> > > a = [1, 2] > > > b = [3, 4] > > > sys. Getrefcount 2 (a) > > > sys. Getrefcount (b) 2 > > > a.a ppend (b) > > > sys. Getrefcount (b) 3 > > > b.append(a) >>> sys.getrefcount(a) 3 >>> del a >>> del bCopy the code
  • A refers to B,b refers to a, and the two objects are each referenced twice (removing temporary references to getrefcout()).

  • After del is executed, objects A and B are referenced by -1, and their reference counters are 1, thus falling into a circular reference

  • Tag: Find one of the ends, a, because it has a reference to b, and count the reference to B by -1

  • If the object is not referred to as a, the reference to a is 0. If the object is not referred to as a, the reference to b is 0. If the object is not referred to as a, the reference to b is 0.

  • Clear: Objects marked unreachable are the ones that really need to be released

The phase of garbage collection described above suspends the entire application until the flag is cleared before resuming the application. To reduce the amount of time an application is suspended, Python improves garbage Collection efficiency with a space-for-time approach called Generational Collection.

Generational recycling

Generational recycling is based on the statistical fact that, for programs, a certain percentage of memory blocks have a short lifetime; And the rest of the memory block, the lifetime will be relatively long, even from the beginning of the program to the end of the program. The proportion of short-lived objects is usually between 80% and 90%. So, to put it simply: the longer an object lives, the more likely it is not garbage, and the less it should be collected. In this way, the number of objects traversed by the mark-clean algorithm can be effectively reduced, thus improving the speed of garbage collection, which is a method strategy of swapping space for time.

Python divides all objects into three generations: young (generation 0), middle (generation 1), and old (generation 2). All new objects are generation 0 objects by default. Objects that survive a GC scan in generation 0 will be moved to generation 1, and objects that survive a GC scan in generation 1 will be moved to generation 2.

Gc scanning times (generation 0 > Generation 1 > Generation 2)

When the difference between allocated objects and freed objects in a generation reaches a certain threshold, gc scans in the current generation are triggered. When a generation is scanned, the younger generation is also scanned. Therefore, when the GC scan of the second generation occurs, the GC scan of the first generation will also occur, which is the whole generation scan.

>>> import gc >>> gc.get_threshold() ## Generational collection mechanism parameter threshold setting (700, 10, 10)Copy the code
  • 700= number of newly allocated objects – Number of released objects, generation 0 GC scan triggered
  • The first 10: generation 0 GC scan occurs 10 times, then generation 1 GC scan is triggered
  • Second 10: The gc scan of the first generation occurs 10 times, then the GC scan of the second generation is triggered

thinking

In mark-clear, what happens after the del operation if the object C also references a?

References to objects A, B, and C are shown in the figure below:

> > > a = [1, 2] > > > b = [3, 4] > > > c = a > > > a.a ppend (b) > > > b.a ppend (a)Copy the code

  • Ref_count Indicates the reference count
  • Objects A, B, and C are all reachable

After executing del, the reference relationship looks like the figure below:

>>> del a
>>> del b
Copy the code

  • A,b,c ref_count = 1

Perform A GC scan

  • Mark: a references b, subtract 1 from b’s ref_count by 0, b references a, subtract 1 from a’s ref_count by 1, put b under unreachable

  • Recycling: Since a is reachable, all nodes reachable from a will be recursively marked as reachable, i.e.

  • Clear: There is no object that can be cleaned under unreachable, so a, B, and C objects will not be cleaned

conclusion

In general, Python uses memory pools to reduce memory fragmentation and improve execution efficiency. The garbage collection is mainly completed by reference counting, the problems caused by circular reference of container objects are solved by mark-clean, and the efficiency of garbage collection is improved by generational collection.

Welcome to [pay attention] to me, let’s study and make progress together.

In addition, if you have any questions, please discuss in the comment section and communicate actively.

If this post helped you, don’t forget to “like” and “collect”, and refuse to reach out!