这是一篇协同翻译的文章，目前翻译进度为 20％，你可以点击『我来翻译』按钮来 参与翻译 。
B都应该被垃圾回收处理。但是，这两个对象的引用计数值不为零， 所以内存会一直被占用。为了解决这个问题，CPython 通过使用算法来检测是否存在循环引用并释放循环引用中的对象。
- 通过启发式算法提升性能。 越晚创建的对象更可能需要被回收。 CPython引入了一个
分代回收的概念来判断一个对象使用的相对年龄。年轻一代是指最新被创建出来的对象，而老一代则代表早前创建的对象。每个对象都确定的属于某一代。 当垃圾回收机制执行时， CPython 会优先尝试回收年轻一代的对象。CPython会定期回收老一代的对象(由启发式算法确定该回收执行的效率).
The Garbage Collection Lifecycle
It's probably most instructive to go through the lifecycle of how the CPython garbage collector would typically get run. Let's create a new object and see what happens with the garbage collector:
Python wants to allocate a new object. To do this, it makes a call to
_PyObject_GC_Malloc. This method assigns the object some memory locations and adds the object to the garbage collector's first generation (we'll call it generation 0). The method then checks to see if the number of objects in generation 0 is greater than some threshold. If it is and the garbage collector is not currently running, then a call to
collect_generationsis made to begin garbage collection. Otherwise, the object is just allocated normally.
- Python starts to do garbage collection when
collect_generationsgets called. This method figures out what generation to do garbage collection on (by default CPython has 3 generations but this can be modified with the GC module. In addition, younger generations have lower indices, so generation 0 is the youngest generation). Python will loop over all generations (from oldest to youngest) and detect whether a particular generation's object count is greater than some threshold. If it is, then it will merge all younger generations with the current generation and perform garbage collection on that generation by calling
collect. NOTE: Python wants to do garbage collection on generation 0 for better performance, because this has the newest objects and also will have the fewest objects to iterate over. Doing garbage collection on the oldest generation is equivalent to collecting over all objects because doing garbage collection on generation i will use all objects in generations 0 through i.
collectmethod will run garbage collection on a specified generation. What this amounts to is running the reference cycle detection algorithm (explained later) and finding a set of reachable and unreachable objects in a particular generation. The reachable objects will be merged into the next higher generation (i.e. if
collectwas run on generation
i, then the reachable objects from generation
iwould be merged into generation
i+1). For the unreachable objects, CPython will make all necessary finalizer callbacks, make weak ref callbacks, and finally deallocate the objects.
- Finally, the internal state of the garbage collection module will be updated as
collectfinishes performing its duties.
CPython's Algorithm for Detecting Reference Cycles
Python attempts to find reference cycles within a generation. Confining the search for reference cycles to a single generation decreases the amount of work that has to be done in a single collection (if it is one of the generations that holds younger objects).
To find reference cycles, Python uses
young, the pointer to the head of the list of objects for the generation that's being garbage collected, and runs the following:
update_refs(young) subtract_refs(young) gc_init_list(&unreachable) move_unreachable(young, &unreachable)
update_refs method makes a copy of the reference count for every object in the generation so that the garbage collector can mutate its own version of the reference count without messing with the real reference count.
subtract_refs goes through each of the objects
i in the generation being garbage collected, and decrements the reference counts on any objects
j in the generation list that are referenced by object
i. After this method has run, the reference count on an object in the generation should equal the number of references to that object from objects which do not belong to that generation (since all references from objects within the same generation have been removed).
Now comes the fun part. The
move_unreachable method scans through the
young list and moves objects with a reference count of 0 into the
unreachable list and changes their reference count to
GC_TENTATIVELY_UNREACHABLE. Objects with a non-zero reference count are marked as
GC_REACHABLE and the objects they reference are traversed and marked as
GC_REACHABLE then moved to the end of the
young list so they too can be traversed later.
The reason that objects with a reference count of 0 are tentatively unreachable is as follows. Suppose object A has been marked as tentatively unreachable and is referenced by some object B. Suppose that B is in the same generation as A and is actually reachable from outside the generation, but that B comes later in the
young list than A. Then A would be sent to the
unreachable list when
move_unreachable scans over it. However, when
move_unreachable scans over B, it will notice that it's reference count is non-zero, mark it as
GC_REACHABLE, and traverse B's references and mark them as reachable. Now, A has become
GC_REACHABLE as well and has been moved to the end of the
young list so that it's references can also be marked as
GC_REACHABLE. Thus, we only know that an object is unreachable after
move_unreachable has scanned over the entire
Once the entire
young list has been traversed, then all the items left in the
unreachable list are definitely unreachable and so they can be deallocated. The items in the
young list are then merged into older generations.
- CPython's garbage collector still stops the world, but does so more infrequently than other implementations. Reference count deallocation increases the time between collections because the number of objects in a generation decreases whenever an object's reference count falls to 0 and gets deallocated. Since collections are triggered when the number of objects in a generation are above a threshold, reference count deallocation decreases the number of collections as long as there aren't too many reference cycles.
- If the hypothesis that younger objects are the objects more likely to need to be garbage collected is true, then running the garbage collector on younger generations can significantly reduce total runtime.
- Memory fragmentation occurs. Reference counting will naturally cause memory to get fragmented since the deallocation of an object means there will be small chunks of memory that get added back onto the heap. Some garbage collectors deal with fragmentation by copying all live objects into a different section of memory and freeing up an entire section of memory, but CPython doesn't.
- Normal execution is slower. There is an extra check whenever an object gets allocated, referenced, or dereferenced. This means that instead of running normally, then performing a garbage collection all at once like some tracing garbage collectors do, CPython's reference count garbage collector will spread the work across normal activities and perform collection less frequently. Unfortunately, this means the total amount of work that goes into garbage collection is higher (because of the added reference count checks).
我们的翻译工作遵照 CC 协议，如果我们的工作有侵犯到您的权益，请及时联系我们。