Serialization
In computing, serialization (also called marshalling) is the process of encoding the state of an object held in memory as a sequence of bytes. This byte sequence can then be used, for example, to save the object to disk (persistence) or to transport it over a network (proxy, RPC…). The reverse operation, which decodes the byte sequence to create an identical copy of the original objects, is called deserialization (or unmarshalling).
Although they look simple, these operations raise a number of problems, such as handling references between objects or making encodings portable. In addition, the choice of serialization technique affects performance criteria such as the size of the serialized byte sequence or the speed at which it is processed.
Context
As with many algorithmic choices, the more a serialization mechanism is specialized for a specific type of data, the more efficient it will be. For example, to transmit exactly ten numbers whose values lie between 0 and 255, 10 bytes are enough. If, on the other hand, the number of objects to transmit is not known in advance, at least one additional byte is needed to transmit that count. If, moreover, the objects to transmit are not just integers but arbitrary objects, each one must be accompanied by information that encodes its precise type.
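The three cases in this paragraph can be sketched in Python as follows. The function names and the one-byte tag values are hypothetical, chosen only for illustration, and the sketch assumes counts, integer values and string lengths all fit in a single byte.

```python
def encode_fixed(values):
    """Exactly ten values in 0..255: one byte each, no header needed."""
    if len(values) != 10:
        raise ValueError("exactly ten values expected")
    return bytes(values)

def encode_counted(values):
    """Unknown count (assumed < 256): one leading byte carries the count."""
    return bytes([len(values)]) + bytes(values)

def decode_counted(data):
    count = data[0]
    return list(data[1:1 + count])

# Hypothetical one-byte type tags for heterogeneous items.
TAG_INT, TAG_STR = 0, 1

def encode_tagged(items):
    """Count byte, then for each item a type tag followed by its data."""
    out = bytearray([len(items)])
    for item in items:
        if isinstance(item, int):
            out += bytes([TAG_INT, item])             # value assumed in 0..255
        else:
            raw = item.encode("utf-8")
            out += bytes([TAG_STR, len(raw)]) + raw   # length assumed < 256
    return bytes(out)
```

Each step toward generality adds overhead: the fixed case costs 10 bytes, the counted case one byte more, and the tagged case at least one extra byte per item.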
More generally, one must make an assumption about the resources available at deserialization time in order to decide which information can be rebuilt from a simple reference and which must be encoded. Fonts in a PDF file are an example: depending on whether one favors exact rendering on every machine or a small file, one can either transmit the complete definition of the character outlines, or transmit only the font name and a few other basic characteristics and leave it to the target machine to find the closest match among the fonts it has.
Finally, some information cannot be serialized by its very nature and must in any case be rebuilt. A file descriptor is one example. From one machine to another, and even from one run to another of the same program on the same machine, descriptors are assigned arbitrarily by the operating system. There is therefore no point in serializing their value; one must instead encode the information that will allow the descriptor to be rebuilt at deserialization time (for example, the full name of the file accessed through it). Another typical case is the serialization of pointers, which is handled by a specific technique: pointer swizzling.
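As a minimal sketch of the file descriptor case, the hypothetical `LogFile` class below serializes only the file path and reopens the file on deserialization. The class, its method names and the use of JSON are assumptions made for the example.

```python
import json
import os
import tempfile

class LogFile:
    """Hypothetical wrapper around an open file."""

    def __init__(self, path):
        self.path = path
        self.handle = open(path, "a+")   # descriptor assigned by the OS

    def serialize(self):
        # Only the path is encoded; the descriptor value is meaningless elsewhere.
        return json.dumps({"path": self.path})

    @classmethod
    def deserialize(cls, text):
        # Rebuild the descriptor by reopening the file.
        return cls(json.loads(text)["path"])

path = os.path.join(tempfile.mkdtemp(), "demo.log")
original = LogFile(path)
copy = LogFile.deserialize(original.serialize())
```

The copy refers to the same file through a different descriptor, which is all that can be guaranteed across runs or machines.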
Encoding
The basic choice is between a binary format and a text format:
- binary files are generally more compact, the code that parses them is simpler to develop, and reading and writing them takes less processor time;
- text files are simpler to inspect or edit by hand, raise fewer portability problems, and are simpler to maintain and to evolve as needs change.
Binary encodings
One constraint on binary encodings is portability. For example, a machine with a different processor model from the originating computer must be able to deserialize a block of memory, which means dealing with data alignment and endianness. This is why, even when an object contains no pointers, simply copying its memory image is generally not an acceptable solution.
Here too, conventional encodings must be used. The following conventions are fairly common: no alignment; standard C integer types encoded according to their memory layout, all in big-endian format; floating-point numbers in IEEE 754 format.
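These conventions can be illustrated with Python's standard `struct` module, where the `>` prefix selects big-endian byte order, standard sizes and no padding. The record layout is an arbitrary example.

```python
import struct

# ">" = big-endian, standard sizes, no alignment padding.
# b: 8-bit int, i: 32-bit int, h: 16-bit int, d: IEEE 754 double.
RECORD = struct.Struct(">bihd")

packed = RECORD.pack(1, 258, 3, 1.5)
unpacked = RECORD.unpack(packed)
```

Without alignment the record takes 1 + 4 + 2 + 8 = 15 bytes, and the 32-bit value 258 is written most significant byte first as `00 00 01 02`, whatever the processor of the machine running the code.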
Protocols such as CORBA's GIOP and Java's RMI both use binary encodings.
Text encodings
Defining a text encoding requires choosing a convention for separating fields and for encoding binary data (for example uuencode, Base64, or escaping of non-ASCII characters)…
It is fairly common to use a format derived from XML.
The SOAP and XML-RPC protocols both use text encodings.
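The two choices a text encoding has to make, a field separator and a representation for binary data, can be sketched as follows. The `|` separator and the function names are hypothetical. Base64 is used for the binary field because its alphabet never contains the separator.

```python
import base64

FIELD_SEP = "|"   # hypothetical field separator

def encode_record(name, payload):
    """One text line: name, separator, Base64-encoded binary payload."""
    return name + FIELD_SEP + base64.b64encode(payload).decode("ascii")

def decode_record(line):
    # Split on the last separator: Base64 text cannot contain "|".
    name, b64 = line.rsplit(FIELD_SEP, 1)
    return name, base64.b64decode(b64)

line = encode_record("blob", b"\x00\xff|")
```

The resulting line is plain ASCII and can be checked or edited by hand, at the cost of roughly a third more bytes for the binary field.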
Serialization of an atomic object
An atomic object is an object that contains no references to other objects.
Type of the object
Depending on what the language offers, deserialization can rely on a metaclass mechanism provided by the language or on a dedicated factory. In every case, information must be stored that allows the type of object to create to be selected.
If the set of types to serialize is known in advance, the type information can be encoded very compactly (for example, in a single byte if there are no more than 256 types).
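A minimal sketch of a one-byte type tag combined with a factory table follows. The `register` decorator, the `REGISTRY` table and the two example classes are hypothetical names introduced for the illustration.

```python
import struct

REGISTRY = {}   # tag byte -> class, filled in by @register

def register(tag):
    """Associate a one-byte tag with a class (at most 256 types)."""
    def wrap(cls):
        cls.TAG = tag
        REGISTRY[tag] = cls
        return cls
    return wrap

@register(1)
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y
    def to_bytes(self):
        return struct.pack(">ii", self.x, self.y)
    @classmethod
    def from_bytes(cls, data):
        return cls(*struct.unpack(">ii", data))

@register(2)
class Label:
    def __init__(self, text):
        self.text = text
    def to_bytes(self):
        return self.text.encode("utf-8")
    @classmethod
    def from_bytes(cls, data):
        return cls(data.decode("utf-8"))

def dump(obj):
    return bytes([obj.TAG]) + obj.to_bytes()   # type tag first, then the data

def load(data):
    cls = REGISTRY[data[0]]                    # the tag selects the factory
    return cls.from_bytes(data[1:])
```

The first byte is all the reader needs to pick the right class before decoding the rest.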
Otherwise, naming conventions are needed, such as the package names of the Java language. Since these conventional names can be long, it can be useful to provide an alias mechanism that avoids repeating them when several objects of the same type are processed.
It is also possible to transmit directly the code that implements the encoded type. This is what the marshal module of the Python standard library does, for example, and the mechanism is more generally supported by interpreted languages that cache their bytecode.
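As a small illustration of transmitting code, Python's `marshal` module can serialize a compiled code object to bytes and load it back. The source string and the `double` function are made up for the example. The marshal format is specific to a Python version and should not be loaded from untrusted sources.

```python
import marshal

source = "def double(x):\n    return 2 * x\n"
code = compile(source, "<transmitted>", "exec")

payload = marshal.dumps(code)           # bytecode as a byte sequence

namespace = {}
exec(marshal.loads(payload), namespace)  # receiver side: load and run the code
result = namespace["double"](21)
```

Here the receiver obtains a working `double` function without having its source.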
Data
Each data type is responsible for saving and restoring its own data members. For composite types, this means serializing each field in a predefined order.
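The sketch below shows this delegation with two hypothetical classes: `Segment` is a composite type that writes its two `Point` fields in a fixed order and reads them back in the same order.

```python
import struct

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def write(self, out):
        out.extend(struct.pack(">ii", self.x, self.y))

    @classmethod
    def read(cls, data, offset):
        x, y = struct.unpack_from(">ii", data, offset)
        return cls(x, y), offset + 8

class Segment:
    def __init__(self, start, end):
        self.start, self.end = start, end

    def write(self, out):
        # Fixed field order: start, then end. Each field saves itself.
        self.start.write(out)
        self.end.write(out)

    @classmethod
    def read(cls, data, offset=0):
        # Fields are restored in exactly the order they were written.
        start, offset = Point.read(data, offset)
        end, offset = Point.read(data, offset)
        return cls(start, end), offset
```

Because the order is fixed by convention, no field names or separators need to be stored.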
Hierarchical types
In object-oriented programming, the data managed by the base type must be processed before the data of the derived type is saved.
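A minimal sketch of this ordering, with a hypothetical `Shape` base class and `Circle` derived class: each `write` and `read` first calls the base class version, then handles the fields the derived class adds.

```python
import struct

class Shape:
    def __init__(self, color=0):
        self.color = color

    def write(self, out):
        out.extend(struct.pack(">B", self.color))

    def read(self, data, offset=0):
        (self.color,) = struct.unpack_from(">B", data, offset)
        return offset + 1

class Circle(Shape):
    def __init__(self, color=0, radius=0):
        super().__init__(color)
        self.radius = radius

    def write(self, out):
        super().write(out)                           # base type data first
        out.extend(struct.pack(">H", self.radius))   # then derived type data

    def read(self, data, offset=0):
        offset = super().read(data, offset)          # same order when reading
        (self.radius,) = struct.unpack_from(">H", data, offset)
        return offset + 2
```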
Traversing an object graph
This is a fairly common problem, which also arises when implementing cloning or a garbage collector, for example.
More or less efficient algorithms can be chosen depending on the assumptions that can be made about the topology of the graph:
- tree
- graph connected only at the leaves
- arbitrary graph
Manual traversal
In the general case, the objects already traversed must be recorded in order to detect cycles. It is not a good idea to use the objects themselves to record their visited status:
- the serialization method then modifies the objects it traverses
- this raises reentrancy problems
Pointer swizzling is important to guarantee that multiple references to the same object are deserialized correctly.
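The following sketch combines both points for an arbitrary graph, under the assumption of a hypothetical `Node` class with a `links` list. Visited objects are recorded in a table kept outside the objects, and each reference is swizzled into the index of its target, so that shared and cyclic references are restored as references to a single object.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.links = []

def serialize_graph(root):
    index_of = {}    # id(node) -> index; visited table kept outside the objects
    order = []       # nodes in discovery order (also keeps them alive)
    stack = [root]
    while stack:
        node = stack.pop()
        if id(node) in index_of:
            continue                 # already visited: shared node or cycle
        index_of[id(node)] = len(order)
        order.append(node)
        stack.extend(node.links)
    # Swizzling: each reference becomes the index of its target.
    return [(n.name, [index_of[id(t)] for t in n.links]) for n in order]

def deserialize_graph(records):
    nodes = [Node(name) for name, _ in records]       # first create every node
    for node, (_, targets) in zip(nodes, records):
        node.links = [nodes[i] for i in targets]      # then turn indices into references
    return nodes[0]

a, b = Node("a"), Node("b")
a.links = [b, b]      # two references to the same object
b.links = [a]         # cycle back to the root
records = serialize_graph(a)
root = deserialize_graph(records)
```

The original objects are left untouched by the traversal, so serialization can be called again or reentered without cleanup.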
Traversal by introspection
Languages that support introspection can provide a default serialization mechanism.
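In Python, for instance, such a default mechanism can be sketched with the built-in `vars()` function, which lists the attributes of an object at run time. The `to_dict` helper and the `Person` class are hypothetical, and the sketch assumes an acyclic structure made of plain objects, lists and basic values.

```python
def to_dict(obj):
    """Default serializer: discovers the fields of any object by introspection."""
    if isinstance(obj, (int, float, str, bool, type(None))):
        return obj
    if isinstance(obj, (list, tuple)):
        return [to_dict(item) for item in obj]
    return {
        "type": type(obj).__name__,
        "fields": {name: to_dict(value) for name, value in vars(obj).items()},
    }

class Person:
    def __init__(self, name, scores):
        self.name = name
        self.scores = scores

encoded = to_dict(Person("Ada", [1, 2]))
```

`Person` contains no serialization code of its own: the fields are found by the generic helper.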
Deserialization
Deserialization also raises a number of problems, such as rebuilding immutable objects. Objects cannot be used while they are being deserialized.
Deserialization also raises type-safety problems.
Security
Deserialization requires interpreting data that may come from an untrusted source. Serialization can also lead to the exposure of private data.
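As one illustration of the first risk, Python's `pickle` format can reference arbitrary importable objects, so loading untrusted data may run code. The sketch below follows the pattern described in the Python documentation of overriding `Unpickler.find_class`. The class and function names are hypothetical, and this limits the risk rather than making `pickle` safe in general.

```python
import io
import os
import pickle

class NoGlobalsUnpickler(pickle.Unpickler):
    """Refuses every reference to a class or function in the input."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")

def safe_loads(data):
    return NoGlobalsUnpickler(io.BytesIO(data)).load()

plain = safe_loads(pickle.dumps([1, "a", {"k": 2.5}]))

try:
    safe_loads(pickle.dumps(os.system))   # payload that references a function
    rejected = False
except pickle.UnpicklingError:
    rejected = True
```

Plain containers and basic values still load, while any payload that names a function or class is refused before it can be used.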
Version management
It is often necessary to guarantee backward or forward compatibility, that is, the ability to read existing data with a new version of the software, or to let an old version of the software read data created by a more recent version. This requires, on the one hand, a versioning scheme that identifies which versions are compatible and, on the other, a way for older versions to ignore data they cannot interpret.
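One common way to meet both requirements, shown here as a sketch and not as the only option, is a version byte followed by fields encoded as tag, length and payload. The tag values, field names and function names below are hypothetical.

```python
# Record layout: version byte, then fields as (tag byte, length byte, payload).
TAG_NAME, TAG_AGE = 1, 2
TAG_EMAIL = 3            # assumed to be introduced in version 2 of the format

def encode(version, fields):
    out = bytearray([version])
    for tag, payload in fields:
        out += bytes([tag, len(payload)]) + payload   # payload assumed < 256 bytes
    return bytes(out)

def decode_v1(data):
    """Old reader: understands only TAG_NAME and TAG_AGE."""
    version, pos, result = data[0], 1, {}
    while pos < len(data):
        tag, length = data[pos], data[pos + 1]
        payload = data[pos + 2:pos + 2 + length]
        pos += 2 + length
        if tag == TAG_NAME:
            result["name"] = payload.decode("utf-8")
        elif tag == TAG_AGE:
            result["age"] = payload[0]
        # Unknown tags are skipped: the length byte says how far to jump.
    return version, result

newer = encode(2, [(TAG_NAME, b"Ada"), (TAG_EMAIL, b"a@b.c"), (TAG_AGE, bytes([36]))])
seen_by_old_reader = decode_v1(newer)
```

The old reader learns from the version byte that the data is newer than it is, and still recovers every field it knows.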
Alternatives
Serialization is an all-or-nothing encoding mechanism: it is not designed to give access to a fragment of the data without decoding everything.
Mechanisms exist, such as NSKeyedArchiver in the Cocoa library, that allow objects to be partially restored. This approach is close to that of a database system.
External links
- Serialization in .NET