Endianness

In computing, certain data such as integers can be represented on several bytes. The order in which these bytes are organized in memory or in a communication is called endianness.
In the same way that certain human languages are written from left to right and others from right to left, there are two main alternatives for organizing the bytes that represent a datum: the big-endian orientation and the little-endian orientation. These expressions are sometimes rendered as large-boutiste and small-boutiste. The expressions byte order and byte sex are also used.

Endianness concerns only data stored on several bytes, such as integers, or Unicode characters encoded in UTF-16 or UTF-32. Data encoded on a single byte, such as ASCII characters, are not affected.
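As a minimal sketch of this distinction, assuming Python's built-in codecs (the character chosen and the variable names are illustrative only), the same character can be encoded in UTF-16 with both byte orders and in ASCII:

```python
# Illustrative sketch: the sample character and names are assumptions.
ch = "A"  # code point U+0041

utf16_be = ch.encode("utf-16-be")  # most significant byte first
utf16_le = ch.encode("utf-16-le")  # least significant byte first
ascii_1b = ch.encode("ascii")      # a single byte, so no byte order applies

print(utf16_be.hex())  # 0041
print(utf16_le.hex())  # 4100
print(ascii_1b.hex())  # 41
```

The two UTF-16 encodings differ only in the order of their two bytes, while the one-byte ASCII encoding has nothing to reorder.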

In computers

When certain computers store a 32-bit integer in memory, for example 0xA0B70708 in hexadecimal notation, they store it byte by byte in the following order: A0 B7 07 08, for a memory structure based on an atomic unit of 1 byte and an address increment of 1 byte. Thus the most significant byte (here A0) is stored at the lowest memory address, the next most significant byte (here B7) is stored at the following address, and so on.

For a memory structure or a communication protocol based on an atomic unit of 2 bytes, with an address increment of 1 byte, the stored sequence is: A0B7 0708. The most significant atomic unit (here A0B7) is stored at the lowest memory address.

Architectures that follow this rule are known as big-endian, large-boutiste, or most-significant-word-first. Examples include the Motorola 68000 processors, SPARC (Sun Microsystems) and the System/370 (IBM).

Other computers store 0xA0B70708 in the following order: 08 07 B7 A0 (for a memory structure based on an atomic unit of 1 byte and an address increment of 1 byte), i.e. with the least significant byte first. Such architectures are known as little-endian, small-boutiste, or least-significant-word-first. For example, the x86 processors found in PCs have a little-endian architecture.

For a memory structure or a communication protocol based on an atomic unit of 2 bytes, with an address increment of 1 byte, the stored sequence is: 0708 A0B7. The least significant atomic unit (here 0708) is stored at the lowest memory address.
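The byte and 2-byte-unit layouts described above can be reproduced with a short sketch. It assumes Python 3.8 or later; the helper name units16 is hypothetical and introduced only for illustration.

```python
value = 0xA0B70708

# Atomic unit of 1 byte: list the bytes from the lowest address upward.
big = value.to_bytes(4, "big")
little = value.to_bytes(4, "little")
print(big.hex(" "))     # a0 b7 07 08
print(little.hex(" "))  # 08 07 b7 a0


def units16(value, order):
    """Split a 32-bit value into two 16-bit units, lowest address first.

    Hypothetical helper: 'order' is "big" or "little".
    """
    high, low = value >> 16, value & 0xFFFF
    units = [high, low] if order == "big" else [low, high]
    return " ".join(f"{u:04X}" for u in units)


print(units16(value, "big"))     # A0B7 0708
print(units16(value, "little"))  # 0708 A0B7
```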

Some architectures support both rules, for example PowerPC (IBM), ARM, DEC Alpha, MIPS, PA-RISC (HP) and IA-64 (Intel). They are called bytesexual (jargon), bi-endian or, more rarely, biboutists. The mode can be selected at the software level, the hardware level, or both.

Some other rare architectures, called middle-endian, have a more complex ordering: the bytes within each atomic unit are swapped. For example, 0xA0B70708 is stored in a middle-endian memory whose atomic units are 2 bytes, with an address increment of 1 byte, in the order: 0807 B7A0 or, alternatively, B7A0 0807.

Note that this representation is ambiguous, because the order of the atomic units themselves still has to be specified. Instead of middle-endian, it is therefore preferable to use the terms big-endian or little-endian combined with a byte-swap attribute. The example then becomes unambiguous:

1. in a little-endian memory with byte-swap, with a 2-byte atomic unit and a 1-byte address increment, 0xA0B70708 is represented by 0807B7A0, with 08 at address 0.

2. in a big-endian memory with byte-swap, with a 2-byte atomic unit and a 1-byte address increment, 0xA0B70708 is represented by B7A00807, with B7 at address 0.
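The two cases above can be sketched as a single function. This is an illustration under the article's own notation (a unit written "A0B7" is taken to hold A0 at its lower address before any swap); the function name layout and its parameters are hypothetical.

```python
def layout(value, unit_order, byte_swap, width=4, unit=2):
    """Return the bytes of 'value' from the lowest address upward.

    unit_order: "big" or "little", the order of the atomic units.
    byte_swap:  if True, reverse the bytes inside each atomic unit.
    """
    raw = value.to_bytes(width, "big")
    units = [raw[i:i + unit] for i in range(0, width, unit)]
    if unit_order == "little":
        units.reverse()                    # least significant unit first
    if byte_swap:
        units = [u[::-1] for u in units]   # swap bytes within each unit
    return b"".join(units)


value = 0xA0B70708
print(layout(value, "little", True).hex().upper())  # 0807B7A0
print(layout(value, "big", True).hex().upper())     # B7A00807
```

Naming the unit order and the swap separately is what removes the ambiguity: each combination of the two parameters yields exactly one layout.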

Such processors, the PDP-11 for example, are more difficult to work with.

Bit numbering in a big-endian architecture is as follows: the bits are numbered from the left, so bit 0 is the most significant bit and bit 7 the least significant bit of a byte. Numbering the bits in the little-endian manner seems more intuitive when a byte represents an integer, because the bit number then corresponds to the exponent of its power of two. However, if the byte represents a binary fraction, the big-endian convention is more appropriate.
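A small sketch of the weight carried by each bit under the two numbering schemes may help. The function names are hypothetical, and the fraction case assumes the byte is read as 0.b0b1...b7, which is one common reading rather than something the text specifies.

```python
BITS = 8  # bits per byte


def weight_lsb0(n):
    """Little-endian numbering: bit n of an integer has weight 2**n."""
    return 2 ** n


def weight_msb0_int(n):
    """Big-endian numbering of an integer byte: bit 0 is most significant."""
    return 2 ** (BITS - 1 - n)


def weight_msb0_frac(n):
    """Big-endian numbering of a fraction 0.b0b1...b7 (assumed reading)."""
    return 2.0 ** -(n + 1)


for n in range(BITS):
    print(n, weight_lsb0(n), weight_msb0_int(n), weight_msb0_frac(n))
```

With little-endian numbering the exponent is the bit number itself; with big-endian numbering the fraction weights follow the bit number directly, while the integer weights depend on the width of the byte.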

A mnemonic to avoid confusion is to replace "endian" with "head". This gives "big head" for the most significant bits at the head, and "small head" for the least significant bits at the head.

In communications

Exchanging data between machines with different conventions raises what is called the NUXI problem: if one sends the string "UNIX" by packing two bytes into each 16-bit word to a machine with a different convention, one obtains "NUXI". This problem was discovered when porting one of the first versions of Unix from a middle-endian PDP-11 to a big-endian IBM architecture.
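The NUXI effect can be reproduced with Python's struct module. This is a sketch only: which side is big-endian and which is little-endian is an assumption made for the illustration, as are the variable names.

```python
import struct

text = b"UNIX"

# Sender: interpret the four characters as two big-endian 16-bit words.
words = struct.unpack(">2H", text)

# Receiver: store the same two word values with the little-endian convention.
received = struct.pack("<2H", *words)

print(received.decode("ascii"))  # NUXI
```

The word values are transferred correctly; only the two bytes inside each word end up exchanged, which is why pairs of characters appear swapped.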

The IP protocol defines a standard, the network byte order. In this protocol, binary data are generally encoded in packets and sent over the network most significant byte first, i.e. in big-endian mode, regardless of the native endianness of the host processor.
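As a hedged illustration, Python's struct module offers the "!" prefix for network byte order; the value and variable names below are only examples.

```python
import struct

value = 0xA0B70708

# "!" selects network (big-endian) byte order on any host.
wire = struct.pack("!I", value)
print(wire.hex(" "))  # a0 b7 07 08

# Decoding with the same format recovers the value on any host.
(decoded,) = struct.unpack("!I", wire)
print(hex(decoded))   # 0xa0b70708
```

Because the format string names the byte order explicitly, the bytes placed on the wire do not depend on the processor that runs the code.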

Peripherals must also follow a convention in order to keep the system coherent. All of this is fixed by the link protocol of the data link layer of the OSI model.

Practical differences

Although the difference between the big-endian and little-endian modes seems tiny today and comes down to a matter of convention, each has advantages worth noting:

Big-endian numbers are easier to read when debugging a program, because their content is directly readable without reordering the bytes that make up the number. This is because the order of the digits is the same as in normal writing.

The little-endian mode had advantages when processors used variable register sizes, i.e. 8, 16 or 32 bits. Starting from a given memory address, one could read the same number by reading 8, 16 or 32 bits. Example: the number 33 (0x21 in hexadecimal) is written 21 00 00 00 in 32-bit little-endian, and is always read as 0x21 however many bytes are read. This does not hold in big-endian, because the starting address changes with the number of bytes to read.
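The example of the number 33 can be checked with a short sketch that reads 8, 16 and 32 bits from the same starting address (offset 0) under each convention. The variable names are illustrative.

```python
import struct

little = (33).to_bytes(4, "little")  # 21 00 00 00
big = (33).to_bytes(4, "big")        # 00 00 00 21

# Read 1, 2 and 4 bytes starting at offset 0 in each buffer.
le_reads = [struct.unpack_from(f, little, 0)[0] for f in ("<B", "<H", "<I")]
be_reads = [struct.unpack_from(f, big, 0)[0] for f in (">B", ">H", ">I")]

print(le_reads)  # [33, 33, 33]
print(be_reads)  # [0, 0, 33]
```

In the big-endian buffer the narrower reads would have to start at offsets 3 and 2 respectively to find the value, which is the change of starting address mentioned above.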

In general, it is said that one prefers whichever representation one studied first.

Software and portability

It is well understood that these conventions cause problems when porting software. For example, when reading binary data, different architectures will not yield the same values unless the convention is taken into account.
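A minimal sketch of the difference, assuming a four-byte field that was written most significant byte first (the byte string and names are examples): decoding with the host's native order gives a machine-dependent result, while naming the order explicitly does not.

```python
import sys

raw = bytes.fromhex("a0b70708")  # assumed to have been written big-endian

# Non-portable: the result depends on the machine running the code.
native = int.from_bytes(raw, sys.byteorder)

# Portable: the byte order of the data is stated explicitly.
portable = int.from_bytes(raw, "big")

print(sys.byteorder, hex(native))
print(hex(portable))  # 0xa0b70708
```

On a little-endian host the first line reports 0x807b7a0, a different number obtained from the same four bytes.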

Of course, the choice between big-endian and little-endian is always arbitrary, which provokes intense debate, because there are many arguments in favor of each. Human languages, for example, depending on whether they belong to the Germanic, English or another linguistic group, do not share the same perception.

Writing numbers in human languages

The choice of writing direction also arises when writing numbers in positional notation in human languages.

In languages using the Latin alphabet, which are read from left to right, numbers are written starting with the most significant digits. This is a big-endian convention. In Arabic, it is the reverse: one writes, this time from right to left, first the units, then the tens, etc. This is a little-endian convention, relative to the ordinary writing direction of that language. (It is amusing to note that these two opposite conventions give the same result when viewed from left to right: units on the right.)

Writing dates

Certain countries have standards for writing dates. The concept of endianness appears there, as the following examples show:
  • Europe: DD/MM/YYYY (little endian)
  • Japan: YYYY/MM/DD (big endian)
  • the United States: MM/DD/YYYY (middle endian)

Etymology

The terms big-endian and little-endian were borrowed from Jonathan Swift's Gulliver's Travels, in which two clans of Lilliputians wage war over the different ways they break boiled eggs: at the big end or at the little end.

See also

External links

  • White Paper: Endianness or Where is Byte 0?
