Garbage collector

A garbage collector (GC), also called a memory reclaimer, is a computing subsystem for automatic memory management. It is responsible for recycling memory that was previously allocated and is no longer in use.

When a system has a garbage collector, the collector is generally part of the runtime environment associated with a particular programming language. Garbage collection was invented by John McCarthy as part of the first Lisp system.

Definition and operation

The basic principle of automatic memory reclamation is simple:
  • determine which objects in the program can no longer be used,
  • reclaim the storage used by those objects.

Although it is in general impossible to determine in advance when an object will stop being used, this can be discovered at run time: an object to which the program no longer holds any reference has become unreachable and will never be used again.

Principle of object reachability

Garbage collectors use a reachability criterion to determine whether an object can potentially be used.

The principles are:

  • a distinguished set of objects is assumed to be reachable: these are the roots. In a typical system the roots are the machine registers, the stack, the instruction pointer and the global variables; in other words, everything the program can reach directly.
  • any object referenced from a reachable object is itself reachable.

Put differently: a reachable object is one that can be obtained by following a chain of pointers or references starting from the roots.
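The reachability rule can be illustrated with a short C++ sketch. The Object type and the reachable function are hypothetical names introduced only for this illustration; a real collector finds its roots in registers, stacks and globals rather than in an explicit list.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical heap object: it only records which other objects it references.
struct Object {
    std::vector<Object*> refs;
};

// Returns every object that can be reached from the roots by following
// a chain of references.
std::unordered_set<const Object*> reachable(const std::vector<Object*>& roots) {
    std::unordered_set<const Object*> seen;
    std::vector<const Object*> pending(roots.begin(), roots.end());
    while (!pending.empty()) {
        const Object* obj = pending.back();
        pending.pop_back();
        assert(obj != nullptr);  // this sketch assumes there are no null references
        if (!seen.insert(obj).second) continue;  // already visited
        for (const Object* next : obj->refs) pending.push_back(next);
    }
    return seen;
}
```

Any object that is absent from the returned set cannot be reached by the program, even if other unreachable objects still point to it.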

Obviously, such an algorithm is a conservative approximation of the ideal goal, which is to destroy every value that is no longer useful: some values may well be reachable from the roots and yet never be used again. That ideal goal, however, cannot be achieved algorithmically: determining which values will be useful in the future is equivalent to the halting problem.

This conservative approximation is what makes memory leaks possible, i.e. the accumulation of memory blocks that will never be reused but are never released either. For example, a program may keep a pointer to a data structure that it will never use again. For this reason it is recommended to overwrite pointers to structures that are no longer used (for instance by setting them to null), so as not to keep useless references alive.
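The effect of overwriting a stale reference can be sketched in C++. This is only an analogy: std::shared_ptr reclaims memory by reference counting rather than by tracing from roots, but the consequence of keeping or clearing a reference is the same. The Index and Program types are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical large structure that is only needed during start-up.
struct Index {
    std::vector<int> entries = std::vector<int>(1000);
};

struct Program {
    std::shared_ptr<Index> index;  // reference kept by the program

    void start() { index = std::make_shared<Index>(); }

    // Overwrites the reference once the structure is no longer needed,
    // which makes the structure unreachable and lets it be reclaimed.
    void finish_start() {
        assert(index != nullptr);
        index.reset();
    }
};
```

If finish_start is never called, the Index stays reachable through the program for as long as the program object lives, even though nothing will read it again.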

Basic algorithm

The basic marking algorithm is attributed to Schorr and Waite. Garbage collectors work in collection cycles. A cycle starts when the collector decides (or is notified) that it must reclaim storage space. A cycle consists of the following steps:
  • Create three sets, called black, gray and white. Initially the black set is empty, the gray set contains the root objects (and possibly some additional objects chosen by the particular algorithm), and the white set contains everything else. At any point in the algorithm, an object belongs to exactly one of the three sets. The white set can be seen as the set of objects whose memory we are trying to reclaim; during the cycle the algorithm removes objects from the white set, leaving in it only the objects whose memory it can reclaim.
  • Pick an object from the gray set, move it to the black set, and move every white object it references directly to the gray set. Repeat this step until the gray set is empty.
  • Once the gray set is empty, all objects still in the white set are unreachable, and the memory they occupy can be reclaimed.

The tri-color invariant can be stated as follows: no black object points directly to a white object.

Observe that the algorithm above preserves the tri-color invariant. The initial partition contains no black object, so the invariant holds trivially. Thereafter, whenever an object becomes black, all its direct children (the objects it references) that were white become gray, which preserves the invariant. When the last step of the algorithm is carried out, the invariant guarantees that no object in the black set points to an object in the white set, and there are no gray objects left; the remaining white objects are therefore unreachable from the roots. The system can then call their destructors and release their memory.
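The cycle and its invariant can be sketched in C++ as follows. The Node type and the function names are assumptions made for this illustration, and the explicit heap list stands in for whatever bookkeeping a real collector uses to enumerate allocated objects. The invariant is rechecked after every step purely for demonstration, which a real collector would not do.

```cpp
#include <cassert>
#include <vector>

enum class Color { White, Gray, Black };

// Hypothetical heap object carrying its color and its outgoing references.
struct Node {
    Color color = Color::White;
    std::vector<Node*> refs;
};

// True if no black object points directly to a white object.
bool invariant_holds(const std::vector<Node*>& heap) {
    for (const Node* n : heap)
        if (n->color == Color::Black)
            for (const Node* r : n->refs)
                if (r->color == Color::White) return false;
    return true;
}

// Runs one marking cycle and returns the objects left white (unreachable).
std::vector<Node*> collect_white(const std::vector<Node*>& heap,
                                 const std::vector<Node*>& roots) {
    for (Node* n : heap) n->color = Color::White;
    std::vector<Node*> gray;
    for (Node* r : roots) {
        if (r->color == Color::White) {
            r->color = Color::Gray;
            gray.push_back(r);
        }
    }
    while (!gray.empty()) {
        Node* n = gray.back();
        gray.pop_back();
        // Move every white child to the gray set, then blacken the object.
        for (Node* child : n->refs) {
            if (child->color == Color::White) {
                child->color = Color::Gray;
                gray.push_back(child);
            }
        }
        n->color = Color::Black;
        assert(invariant_holds(heap));
    }
    std::vector<Node*> white;
    for (Node* n : heap)
        if (n->color == Color::White) white.push_back(n);
    return white;
}
```

The objects returned by collect_white are exactly those whose memory the collector may reclaim, including objects that only reference each other in a cycle.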

Some variants of the algorithm do not maintain the tri-color invariant; they rely on a different principle under which all the important properties still hold.

Example in C++

    #include <iostream>

    class A {
        int x;
    public:
        A() { x = 0; ++x; }
    };

    int main() {
        // Manual memory management: every object allocated with new
        // is released explicitly with delete.
        for (int i = 0; i < 1000000000; ++i) {
            A *a = new A();
            delete a;
        }
        std::cout << "DING!" << std::endl;
    }

Variants

The basic algorithm has several variants.
  • a collector that moves objects in memory (and therefore changes their addresses), known as stop and copy.

  • Some collectors can correctly identify every reference to an object: they are called “exact” collectors, as opposed to “conservative” or “partially conservative” collectors. Conservative collectors must assume that any sequence of bits in memory is a pointer if, when interpreted as a pointer, it points to an allocated object. Conservative collectors can therefore produce false negatives, where memory is not reclaimed because of accidental false pointers. In practice this is seldom a major drawback.

  • the collector can run alternately or in parallel with the rest of the system. The simplest collectors suspend the execution of the system while they carry out a cycle; they are non-incremental. Incremental collectors interleave their work with the rest of the system, so that it is carried out during the system's idle periods. Some incremental collectors can run fully in parallel in a separate thread; in theory they can run on a different processor, but the cost of keeping the caches coherent makes this approach less practical than it seems.

Classification of garbage collectors

Collectors can be classified by the way they implement the three sets of white, gray and black objects.

Mark and sweep

A mark-and-sweep collector maintains one bit (or two) associated with each object to indicate whether it is white or black; the gray set is maintained either as a separate list or by using another bit.

A copying collector distinguishes gray and black objects by copying them to another memory area (the copy space), and often separates black objects from gray ones by partitioning the copy space in two (in the simplest case by maintaining a single pointer that marks the boundary between black and gray objects). An advantage of copying collectors is that the white (dead) objects are released in bulk, by freeing the old area in a single operation, and that the cost of a collection is proportional to the number of live objects. This is particularly useful when many objects are allocated and most of them are temporary and die quickly.
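The copying scheme with a single boundary between black and gray objects can be sketched in C++, in the style of Cheney's algorithm. This is a simplified model under stated assumptions: the Cell type is hypothetical, references are indices into a vector rather than machine addresses, and the "old area" is simply the vector that the caller discards afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kNone = static_cast<std::size_t>(-1);

// Hypothetical cell: references are indices into the space that holds the cell.
struct Cell {
    int value = 0;
    std::vector<std::size_t> refs;
    std::size_t forward = kNone;  // index of the copy in the new space, once copied
};

// Copies cell `i` into the new space unless already copied; returns its new index.
std::size_t evacuate(std::vector<Cell>& from, std::vector<Cell>& to, std::size_t i) {
    assert(i < from.size());
    if (from[i].forward == kNone) {
        from[i].forward = to.size();
        to.push_back(Cell{from[i].value, from[i].refs, kNone});
    }
    return from[i].forward;
}

// Copies everything reachable from `roots` into a fresh space and returns it.
// `roots` is rewritten to the new indices; `from` can be discarded afterwards.
std::vector<Cell> copy_collect(std::vector<Cell>& from, std::vector<std::size_t>& roots) {
    std::vector<Cell> to;
    for (std::size_t& r : roots) r = evacuate(from, to, r);
    // Cells before `scan` are black; cells from `scan` to the end are gray.
    for (std::size_t scan = 0; scan < to.size(); ++scan) {
        for (std::size_t k = 0; k < to[scan].refs.size(); ++k) {
            std::size_t moved = evacuate(from, to, to[scan].refs[k]);
            to[scan].refs[k] = moved;  // the reference now points into the new space
        }
    }
    return to;
}
```

Dead cells are never visited: the work done is proportional to the number of live cells, and the whole old space is dropped at once.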

Conservative vs. precise

A garbage collector is conservative when it may fail to release some memory areas that have become useless. For example, the Boehm garbage collector treats any memory word as a potential pointer to follow, including on the call stack, and is easy to use from C. Precise collectors, by contrast, distinguish pointers from other data everywhere (including on the call stack), and to do so they require the cooperation of the compiler (which generates call-frame descriptors) or of the programmer. Conservative collectors are generally marking collectors and do not change the addresses of the areas in use.
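The conservative rule, "any word that looks like a pointer is treated as one", can be sketched in C++. This is not the interface of the Boehm collector; AddressSet and conservative_scan are hypothetical names, and the sketch only recognizes words equal to the start address of an allocation.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical allocator bookkeeping: start addresses of current allocations.
using AddressSet = std::set<std::uintptr_t>;

// Scans raw memory words (for example a copy of the call stack) and returns
// the allocations that some word appears to point to.
AddressSet conservative_scan(const std::vector<std::uintptr_t>& words,
                             const AddressSet& allocated) {
    AddressSet marked;
    for (std::uintptr_t w : words) {
        // The scanner cannot tell a pointer from an integer with the same bits.
        if (allocated.count(w) != 0) marked.insert(w);
    }
    assert(marked.size() <= allocated.size());
    return marked;
}
```

If an integer variable happens to hold the value 0x3000 while an allocation starts at that address, the allocation is retained even though no pointer refers to it; this is the false retention described above.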

Generational collectors

Not all the data of a program have the same lifetime. Some can be discarded very shortly after their creation (for example, a data structure created only to return a value from a function, and dismantled as soon as the data have been extracted from it). Others persist for the whole execution of the program (for example, global tables created during initialization). A garbage collector that treats all these data in the same way is not necessarily the most efficient.

One solution would be to ask the programmer to label the data according to their expected lifetime. This would be cumbersome, however: data are commonly created inside library functions (for example, a function that creates a hash table), which would then have to be given the lifetimes as parameters.

A less invasive method is the generational scheme. The collector then operates on a hierarchy of two or more generations, ordered from the youngest to the oldest. Newly created data are (in general) placed in the youngest generation. Garbage is collected fairly frequently in this young generation; the data that are still present once the unreachable data of this generation have been destroyed are moved to the next older generation, and so on. The idea is that most short-lived data never reach the older generation (they can reach it if they have only just been allocated when a collection finds them in the young generation, but this is a rare case).
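The promotion mechanism can be sketched in C++ with two generations. Everything here is hypothetical: the live flag stands in for a real reachability trace, promote_after is an assumed tuning parameter (a value of 1 matches the description above), and the sketch does not track references from old objects to young ones.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Obj {
    bool live = true;  // stand-in for "reachable from the roots"
    int age = 0;       // number of young-generation collections survived
};

class GenerationalHeap {
public:
    explicit GenerationalHeap(int promote_after) : promote_after_(promote_after) {
        assert(promote_after >= 1);
    }

    // New objects always start in the young generation.
    Obj* allocate() {
        young_.push_back(std::make_unique<Obj>());
        return young_.back().get();
    }

    // Collects the young generation only; the old generation is not examined.
    void collect_young() {
        std::vector<std::unique_ptr<Obj>> survivors;
        for (auto& o : young_) {
            if (!o->live) continue;  // dead: released when young_ is replaced
            if (++o->age >= promote_after_) old_.push_back(std::move(o));
            else survivors.push_back(std::move(o));
        }
        young_ = std::move(survivors);
    }

    std::size_t young_size() const { return young_.size(); }
    std::size_t old_size() const { return old_.size(); }

private:
    int promote_after_;
    std::vector<std::unique_ptr<Obj>> young_;
    std::vector<std::unique_ptr<Obj>> old_;
};
```

Because collect_young never touches the old generation, its cost depends only on the size of the young generation, which is what makes frequent young collections affordable.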

Two or three generations of increasing size are generally used. The same collection algorithm is usually not used for every generation. It is thus common to use a non-incremental algorithm for the youngest generation: because this generation is small, a collection is short and the temporary interruption of the application is not a nuisance, even for an interactive application. The older generations tend to be collected with incremental algorithms.

Tuning the parameters of a generational collector can be delicate. The size of the youngest generation, for instance, can significantly affect computing time (an overhead of 25%, for example, for a poorly chosen value) through collection time, the impact on cache locality, and so on. Moreover, the best choice depends on the application, the type of processor and the memory architecture.

Reference counting

A solution that quickly comes to mind for releasing memory areas automatically is to associate with each area a counter giving the number of references that point to it. These counters must be updated every time a reference is created, modified or destroyed. When the counter associated with a memory area reaches zero, the area can be released.
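The bookkeeping can be sketched in C++ with a hand-written counted handle. The Block and Ref types and the live_blocks helper are hypothetical and exist only to make the three update points (creation, modification, destruction) visible; the sketch is single-threaded.

```cpp
#include <cassert>
#include <utility>

// Hypothetical counted block: a payload plus the number of handles to it.
struct Block {
    int value;
    int count;
};

// Number of blocks currently allocated, so that releases can be observed.
inline int& live_blocks() {
    static int n = 0;
    return n;
}

class Ref {
public:
    explicit Ref(int value) : block_(new Block{value, 1}) { ++live_blocks(); }

    // A reference is created: increment the counter.
    Ref(const Ref& other) : block_(other.block_) { ++block_->count; }

    // A reference is modified: the copy in `other` increments the new block,
    // and the destruction of `other` decrements the old one.
    Ref& operator=(Ref other) {
        std::swap(block_, other.block_);
        return *this;
    }

    // A reference is destroyed: decrement, and release the block at zero.
    ~Ref() {
        assert(block_->count > 0);
        if (--block_->count == 0) {
            delete block_;
            --live_blocks();
        }
    }

    int count() const { return block_->count; }

private:
    Block* block_;
};
```

Each block is released at the exact moment its last handle disappears, without any separate collection cycle.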

This technique has a definite drawback when cyclic structures are used: if a structure A points to a structure B which points back to A (or, more generally, if there is a cycle in the reference graph), and no external pointer points to either A or B, then A and B are never released. Their reference counters are strictly greater than zero, and since the program can no longer reach A or B, the counters can never drop back to zero.
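The cycle problem can be reproduced in C++ with std::shared_ptr, which is reference counted. The Node type and both functions are hypothetical. The second function shows one common workaround that is not discussed in the text above, an uncounted std::weak_ptr for the back edge; it is included only for contrast.

```cpp
#include <cassert>
#include <memory>

struct Node {
    std::shared_ptr<Node> next;  // counted reference
    std::weak_ptr<Node> back;    // uncounted (weak) reference
};

// Builds A -> B -> A with counted references and drops the external pointers.
// Returns true if A is still allocated afterwards.
bool strong_cycle_survives() {
    std::weak_ptr<Node> observer;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->next = a;
        assert(a.use_count() == 2);  // held by `a` and by b->next
        observer = a;
    }
    bool survived = !observer.expired();
    // Break the cycle by hand so that the sketch itself does not leak.
    if (auto a = observer.lock()) a->next.reset();
    return survived;
}

// Same shape, but the back edge B -> A is weak and therefore not counted.
bool weak_cycle_is_released() {
    std::weak_ptr<Node> observer;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->next = b;
        b->back = a;
        observer = a;
    }
    return observer.expired();
}
```

In the first function both counters stay at one after the external pointers are gone, so neither node would ever be released without the manual reset.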

Because of these limitations, some consider that reference counting is not, strictly speaking, a garbage collection technique; they restrict the term garbage collection to techniques based on reachability.

Reference counting suffers from some serious problems, such as its high cost in computing time and in memory and, as seen above, its inability to handle circular references. On the other hand, it reclaims garbage quite quickly, which is an advantage when destructors must run to release scarce resources other than heap memory (sockets, for example).

Hybrid systems have been proposed, and sometimes implemented, that use reference counting to release resources almost immediately and occasionally call a mark-and-sweep collector to release objects involved in reference cycles. This gives the best of both worlds, but still at a high cost in terms of performance.

Languages with a garbage collector

Advantages and disadvantages of garbage collectors

Languages that use a garbage collector make it possible to write simpler and safer programs. Since memory is managed automatically by the runtime environment, the programmer is relieved of this task, which is a source of many errors that are hard to track down. Manual memory management is one of the most common sources of errors.

Three main types of errors can occur:

  • access to an area that has not been allocated or that has already been released,
  • release of an area that has already been released,
  • failure to release memory that is no longer used (memory leaks).

Suitable tools and methodology can reduce the impact of these errors, whereas a garbage collector eliminates them almost completely; memory leaks remain possible, although rarer. This simplification of the programmer's work can come with some disadvantages, mainly regarding the performance of the programs concerned.

Measurements show that in some cases a garbage collector improves the performance of a program, and in other cases the opposite occurs. The choice of the collector's parameters can also significantly degrade or improve the performance of a program. When the collector performs many copy operations in the background (as with the stop-and-copy algorithm), it tends to defragment memory. The collector can then turn out to be faster than ad hoc allocation and deallocation code. The best implementations can also optimize the use of the memory caches, thereby speeding up access to the data. Conversely, the collection operation itself is often expensive.

It is difficult to bound the execution time of the phase that collects unreachable objects. Using a standard garbage collector can therefore make it hard to write real-time programs; a specialized (real-time) collector must be used for that purpose.

Without intervention from the programmer, a program that uses a garbage collector tends to use more memory than a program that manages memory manually (assuming that, in the manual case, there are no leaks, access errors or release errors). However, nothing prevents the use of strategies that preallocate the objects in pools when the rate of allocation and deallocation must be minimized. In that case the collector still provides the benefit of programming without serious memory management errors, as a form of insurance.
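A preallocated pool can be sketched in C++ as follows. The Pool class is a hypothetical, single-threaded illustration: all objects are created once, and acquire and release only move pointers between the program and a free list, so no allocation happens after construction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
class Pool {
public:
    // Allocates every object up front; T must be default-constructible.
    explicit Pool(std::size_t capacity) : storage_(capacity) {
        for (T& slot : storage_) free_.push_back(&slot);
    }

    Pool(const Pool&) = delete;             // copying would invalidate
    Pool& operator=(const Pool&) = delete;  // the pointers handed out

    // Hands out a preallocated object, or nullptr when the pool is exhausted.
    T* acquire() {
        if (free_.empty()) return nullptr;
        T* obj = free_.back();
        free_.pop_back();
        return obj;
    }

    // Returns an object to the pool for later reuse.
    void release(T* obj) {
        assert(obj >= storage_.data() && obj < storage_.data() + storage_.size());
        free_.push_back(obj);
    }

    std::size_t available() const { return free_.size(); }

private:
    std::vector<T> storage_;  // allocated once, never resized
    std::vector<T*> free_;    // objects currently available
};
```

The trade-off is that the pool's capacity has to be chosen in advance, and objects handed back to the pool keep whatever state they had unless the program resets them.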

Although this is not the purpose of a garbage collector, its implementation can also make it easier to implement object persistence (some of the algorithms are shared).

Quotations

“It is said that Lisp programmers know that memory management is so important that it cannot be left to the programmers, and that C programmers know that memory management is so important that it cannot be left to the system” -- Bjarne Stroustrup, possibly drawn from an earlier source.

See also

External links

  • Boehm-Demers-Weiser garbage collector
  • Automatic garbage collection in .NET and Java
  • Memory reclamation
  • memorymanagement.org
  • Publications of the OOPS group at the University of Texas
  • Resources on garbage collection

References

H. Schorr, W.M. Waite. An Efficient Machine-Independent Procedure for Garbage Collection in Various List Structures. CACM, August 1967.

R.E. Jones and R. Lins. Garbage Collection: Algorithms for Automatic Dynamic Memory Management. Wiley, Chichester, July 1996.
