See also: Byte (disambiguation)
The octet is a unit of measurement in computing that quantifies an amount of data. An octet consists of 8 bits, that is, 8 binary digits. The byte, a group of adjacent bits, almost always has a size of one octet, and the two words are generally, though incorrectly, regarded as synonyms.
Symbols
The symbol of the octet is the lowercase letter “o”, and that of the byte is “B”.
The letter “o” is not accepted in the International System of Units (SI) because of the risk of confusion with the digit 0. The question remains open, since units of information are not part of the SI.
More rarely, the byte is written with the lowercase letter “b” (only by the general public or certain English-language media, or in other languages that have the word byte in their vocabulary, sometimes with an adapted spelling). This is incorrect: the symbol “b” denotes the bit.
Traditionally, when applied to bytes, the prefixes “kilo”, “mega”, “giga”, etc. do not denote a multiple of 1,000 but a multiple of 2^10 = 1,024. This tradition, however, violates the standards in force for other units, including the bit, and is not even applied uniformly to bytes, notably for hard drive capacities. A new standard was therefore created to denote multiples of 2^10 = 1,024: “kibi”, “mebi”, “gibi”, etc.
The traditional usage remains widespread among professionals and the general public alike, even though it contradicts the SI recommendations, which clearly define the prefixes otherwise. The binary prefixes remain little known and have hardly entered everyday language, whereas the power-of-2 values they represent are very widely used in software, in particular in operating systems.
Hard drive manufacturers, for their part, have long used the decimal prefixes. Because prefixes in powers of 10 yield larger figures than prefixes in powers of 2, this commercially convenient choice can mislead uninformed users. Thus a 100-gigabyte hard drive (100×10^9 bytes) holds the same number of bytes, after rounding, as a 93.13-gibibyte disk (93.13×2^30 bytes).
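The hard drive figure above can be checked with a few lines of Python. This is only an illustrative sketch; the function name `gigabytes_to_gibibytes` is a label chosen here, not something taken from a standard.

```python
GIGA = 10**9   # SI decimal prefix "giga"
GIBI = 2**30   # IEC binary prefix "gibi"

def gigabytes_to_gibibytes(gigabytes: float) -> float:
    """Convert a capacity in gigabytes (10^9 bytes) to gibibytes (2^30 bytes)."""
    return gigabytes * GIGA / GIBI

print(round(gigabytes_to_gibibytes(100), 2))  # 93.13
```

The gap between the two conventions grows with the prefix: about 2.4% at kilo/kibi, about 7.4% at giga/gibi.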
Since the great majority of hard drives are divided into, and addressed by, sectors of 512 bytes, counting in units of 1,024 bytes (this time with the binary prefixes) would be more natural; storage devices based on non-volatile memory (including USB flash drives, portable MP3 players, and so on) generally use the unit with the binary prefix. Note, however, that this capacity is that of the unformatted volume: formatting the disk with a file system takes up part of it, and a small portion of the non-volatile memory is sometimes used by the storage device's internal software.
Other common but incorrect usages drop the name or symbol of the unit altogether and keep only the name or symbol of the multiplying prefix, such as “K”. This creates many ambiguities about the nature of the unit, in particular when it expresses a data transfer rate or the capacity of a memory chip: in both cases it is common to measure in bits rather than in bytes.
Standardized
The 1998 standardization of binary prefixes by the International Electrotechnical Commission (IEC) specifies the following prefixes to represent powers of 2:
- kibi for “kilo binary”;
- mebi for “mega binary”;
- gibi for “giga binary”;
- tebi for “tera binary”.
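Applied to the multiples of the byte, this gives:
- 1 kibibyte (KiB) = 2^10 bytes = 1,024 bytes;
- 1 mebibyte (MiB) = 2^20 bytes = 1,048,576 bytes;
- 1 gibibyte (GiB) = 2^30 bytes = 1,073,741,824 bytes;
- 1 tebibyte (TiB) = 2^40 bytes = 1,099,511,627,776 bytes.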
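The prefixes kilo, mega, giga, tera, etc. correspond to the same multipliers as in every other field: powers of 10. Applied to computing, this gives:
- 1 kilobyte (kB) = 10^3 bytes = 1,000 bytes;
- 1 megabyte (MB) = 10^6 bytes = 1,000,000 bytes;
- 1 gigabyte (GB) = 10^9 bytes = 1,000,000,000 bytes;
- 1 terabyte (TB) = 10^12 bytes = 1,000,000,000,000 bytes.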
Traditional
By convention, erroneously according to the SI, and before the 1998 standardization, the derived units kilobyte, megabyte and gigabyte were used to represent the following powers of 2:
- 1 kilobyte = 2^10 bytes = 1,024 bytes;
- 1 megabyte = 2^20 bytes = 1,048,576 bytes;
- 1 gigabyte = 2^30 bytes = 1,073,741,824 bytes.
Spelling variants
French also poses a spelling problem, tied to the pronunciation of the initial vowel of the word octet after a prefix; the following forms are found in the literature: “kilo-octet” (with hyphen), “kilooctet” (without hyphen) and “kiloctet” (with the vowels merged). In SI units, the prefix name is always written without a hyphen before the name of the base unit.
Properties of binary representation
A byte can take 2^8 = 256 different values. The value of any byte can be written as a natural number between 0 and 255 inclusive (in base 10). It can also be written with eight binary digits, between 00000000₂ and 11111111₂ inclusive, or with two hexadecimal digits, between 00₁₆ and FF₁₆ inclusive. Hexadecimal notation is available in many programming languages because it is a practical and compact way to write the value of a byte.
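As a small illustration, the three notations of one byte value can be produced with Python's built-in `format`. The helper name `byte_notations` is hypothetical and introduced only for this sketch.

```python
def byte_notations(value: int) -> tuple[str, str, str]:
    """Return the decimal, 8-digit binary and 2-digit hexadecimal forms of a byte."""
    if not 0 <= value <= 255:
        raise ValueError("a byte holds a value between 0 and 255")
    return str(value), format(value, "08b"), format(value, "02X")

print(byte_notations(200))  # ('200', '11001000', 'C8')
print(byte_notations(0))    # ('0', '00000000', '00')
```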
A byte can be used to store a natural number, called “unsigned” in computing, between 0 and 255 (in base 10). Another common convention, two's complement, stores a “signed” integer between -128 and +127 inclusive (in base 10). See also the article Binary system.
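The two readings of the same byte can be sketched as follows. The function name `as_signed` is an assumption made for the example; the last line uses the standard `int.from_bytes` call to perform the same interpretation.

```python
def as_signed(byte: int) -> int:
    """Interpret an unsigned byte value (0..255) as a two's-complement integer."""
    if not 0 <= byte <= 255:
        raise ValueError("a byte holds a value between 0 and 255")
    # Values with the high-order bit set (128..255) map to -128..-1.
    return byte - 256 if byte >= 128 else byte

print(as_signed(127))  # 127
print(as_signed(128))  # -128
print(as_signed(255))  # -1
print(int.from_bytes(bytes([255]), "big", signed=True))  # -1
```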
Many conventions exist for representing a character with one or more bytes. One notable example is the ISO 8859-1 encoding, widely used to represent in a single byte the 26 lowercase letters, the 26 uppercase letters, the 10 digits, the accented letters and the punctuation of the languages of Western Europe, including French. The more recent UTF-8 encoding can represent any character with one to four bytes, depending on the character. The article on character encoding covers this topic in more detail.
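The difference between the two encodings can be observed with Python's `str.encode`. The sample characters below are arbitrary choices for illustration.

```python
def utf8_length(char: str) -> int:
    """Number of bytes UTF-8 uses for a single character."""
    return len(char.encode("utf-8"))

for char in ("A", "é", "€", "😀"):
    print(char, utf8_length(char))  # 1, 2, 3 and 4 bytes respectively

# ISO 8859-1 (Latin-1) always uses exactly one byte per supported character.
print(len("é".encode("iso-8859-1")))  # 1
```

Characters outside the ISO 8859-1 repertoire cannot be encoded with it at all; in Python, attempting to do so raises `UnicodeEncodeError`.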
Properties of decimal representation
In certain applications that require an exact encoding of decimal values (financial applications), powers of 2 can prove impractical. A byte is therefore sometimes used to store exactly 2 decimal digits (between 00₁₀ and 99₁₀), each encoded in a separate nibble (4 bits) ranging from 0000₂ = 0₁₀ to 1001₂ = 9₁₀. The remaining nibble values can encode the position of a decimal point, a sign, the absence of a significant digit at the given position, or some other special function (infinite value, non-numeric error value, etc.). Certain calculators (as well as software libraries for fixed-point arithmetic, very large integers or high-precision values) use this format, known as “BCD”, short for binary-coded decimal.
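A minimal sketch of packing two decimal digits into one byte, one digit per nibble, might look like this. The names `pack_bcd` and `unpack_bcd` are hypothetical, and the special nibble values (sign, decimal point, and so on) are deliberately left out.

```python
def pack_bcd(number: int) -> int:
    """Pack a value from 0 to 99 into one byte, one decimal digit per nibble."""
    if not 0 <= number <= 99:
        raise ValueError("one BCD byte holds exactly two decimal digits")
    tens, units = divmod(number, 10)
    return (tens << 4) | units   # high-order nibble = tens, low-order nibble = units

def unpack_bcd(byte: int) -> int:
    """Recover the two-digit decimal value from a packed BCD byte."""
    return (byte >> 4) * 10 + (byte & 0x0F)

print(format(pack_bcd(42), "02X"))  # 42: the hexadecimal digits mirror the decimal ones
print(unpack_bcd(0x42))             # 42
```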
BCD encoding was popular on older systems (in particular those using EBCDIC) because it avoided a costly final conversion when displaying floating-point decimal numbers. It was also more practical in the days when data were entered manually on punched cards: to convert a BCD number into characters, it sufficed to split the BCD number in two, using one nibble per byte for the decimal digit, with the high-order nibble taking a fixed value that simply marked the byte as a decimal digit. Other values of the high-order nibble indicated an uppercase letter, a lowercase letter, or another symbol or punctuation mark. Today, on most current systems, EBCDIC character encoding and BCD for numeric values are used more rarely, because most calculations run faster in hardware on a binary representation with a preset overall precision (encoded on a fixed number of bytes).
Variants of BCD exist that preserve an exact representation of fixed-point or floating-point numbers in base 10 while storing them more compactly and speeding up calculations. The simple approach is to group the decimal digits and represent each group in binary over several bytes. For example:
- first, the nibble representation is costly to process, so a BCD value is generally converted first by removing the split into nibbles and representing the two decimal digits in binary within the same byte; calculation is then simpler because it proceeds 2 digits at a time instead of one; this representation leaves 1 high-order bit unused (which can serve as a marker for special values);
- 4 decimal digits can be represented exactly in a group of 2 bytes (i.e. 16 bits, which can hold 2^16 = 65,536 different values); BCD would also store exactly 4 decimal digits, but calculations would proceed digit by digit (i.e. 4 times more slowly); this representation leaves 2 high-order bits unused (which can serve as markers for special values);
- 7 decimal digits can be represented exactly in a group of 3 bytes (i.e. 24 bits, which can hold 2^24 = 16,777,216 different values); BCD would store exactly only 6 decimal digits; this representation leaves no bit unused;
- 9 decimal digits can be represented exactly in a group of 4 bytes (i.e. 32 bits, which can hold 2^32 = 4,294,967,296 different values); BCD would store exactly only 8 decimal digits; this representation leaves 2 high-order bits unused (which can serve as markers for special values). This representation is often used in mathematical libraries for very high-precision arithmetic.
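The digit counts and spare bits in the list above can be recomputed with a short sketch. The function names are chosen here for illustration only.

```python
def decimal_digits_in_bytes(n_bytes: int) -> int:
    """Largest d such that every d-digit decimal number fits in n_bytes bytes."""
    capacity = 256 ** n_bytes          # number of distinct values
    digits = 0
    while 10 ** (digits + 1) <= capacity:
        digits += 1
    return digits

def spare_bits(n_bytes: int) -> int:
    """High-order bits left unused once those digits are stored in binary."""
    largest = 10 ** decimal_digits_in_bytes(n_bytes) - 1
    return 8 * n_bytes - largest.bit_length()

for n in (1, 2, 3, 4):
    # columns: bytes, digits in binary grouping, digits in BCD, spare bits
    print(n, decimal_digits_in_bytes(n), 2 * n, spare_bits(n))
# 1 2 2 1
# 2 4 4 2
# 3 7 6 0
# 4 9 8 2
```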
Uses
Processors generally do not operate on individual bits but on bytes. The practice of designing hardware to handle bits in groups of eight, or multiples of eight, has spread since the 1970s, so that today the byte and its multiples are the usual measure of the storage capacity of computer memories: random-access memory, floppy disk, hard drive, CD-ROM, etc. File sizes are also measured in bytes (generally with the conventional multiples, except for hard drives; see below).
The transfer rate of computer buses between applications and local peripherals is generally given in bytes per second (with the standardized multiples; see below). Data rates on networks and transmission media, however, are more often expressed:
- either in bauds (with the standardized multiples), i.e. the number of symbols encoded per second, for very low-level hardware modulation technologies such as those in modems; these technologies separate the sampling rate in bauds (closely tied to the physical bandwidth, expressed in hertz) from the sampling precision, expressed in bits per symbol (closely tied to the signal-to-noise ratio of the transmission medium, expressed in decibels or bits);
- or in bits per second (with the standardized multiples), for the final usable bit rate, which is the product of the baud rate and the number of usable bits encoded per symbol, possibly reduced by the bits used for error detection or correction or for synchronization.
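The relation between baud rate and bit rate described above can be sketched as a one-line product. The function name, its parameters and the figures (2,400 baud, 4 bits per symbol, 600 bit/s of overhead) are hypothetical values chosen for illustration, not the characteristics of any particular modem.

```python
def usable_bit_rate(baud_rate: int, bits_per_symbol: int,
                    overhead_bits_per_second: int = 0) -> int:
    """Usable bit rate in bit/s: symbols per second times bits per symbol,
    minus the bits spent on error handling or synchronization."""
    return baud_rate * bits_per_symbol - overhead_bits_per_second

print(usable_bit_rate(2400, 4))       # 9600
print(usable_bit_rate(2400, 4, 600))  # 9000
```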
Bits and bytes
Note that the transfer rates of computer networks are often given in bits per second (b/s, Mb/s, Gb/s).
The memory capacity of RAM chips and of video game cartridges is often given in bits (b, Mb, Gb).
The media very regularly confuse bits and bytes, which produces errors by a factor of 8.
The capacity of hard disks built after the 1998 standardization is given in gigabytes (symbol “Go” in French), whereas certain operating systems that do not support the new standard still report it in gigabytes rather than gibibytes, even when they count in powers of 2.
Example of converting bits to bytes:
8 bits = 1 byte, according to the formula: number of bytes = number of bits / 8
- 512 kilobits = 512,000 bits = 64,000 bytes = 62.5 kibibytes (64,000 / 1,024).
- 1 megabit = 1,000,000 bits = 125,000 bytes ≈ 122.07 kibibytes (125,000 / 1,024).
- 10 megabits = 10,000,000 bits = 1,250,000 bytes ≈ 1.19 mebibytes ((1,250,000 / 1,024) / 1,024).
- 100 megabits = 100,000,000 bits = 12,500,000 bytes ≈ 11.9 mebibytes.
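The conversions in this list can be reproduced with the sketch below. The function names are illustrative; the prefixes follow the conventions used in the list (decimal for bits, binary for the resulting byte counts).

```python
KIBI = 2**10
MEBI = 2**20

def bits_to_bytes(bits: int) -> float:
    """number of bytes = number of bits / 8"""
    return bits / 8

def kilobits_to_kibibytes(kilobits: float) -> float:
    return bits_to_bytes(kilobits * 1000) / KIBI

def megabits_to_mebibytes(megabits: float) -> float:
    return bits_to_bytes(megabits * 1_000_000) / MEBI

print(kilobits_to_kibibytes(512))            # 62.5
print(round(kilobits_to_kibibytes(1000), 2)) # 122.07
print(round(megabits_to_mebibytes(10), 2))   # 1.19
print(round(megabits_to_mebibytes(100), 2))  # 11.92
```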
Words
When processing operates on several bytes at once, in particular 2 bytes (16 bits) or 4 bytes (32 bits), the terms word and double word are sometimes used. Their meaning tends to vary with context, so their use is not recommended.
Bytes, bits and decibels
The bit (or its multiple, the byte) is a derived unit that is sometimes more practical in certain signal calculations than the bel, written “B”, or the decibel, written “dB”: a signal quality of 1 bit is defined as twice the binary logarithm of a signal-to-noise ratio exactly equal to 2, and the bel is defined as the decimal logarithm of a ratio of quantities exactly equal to 10.
By extension, one speaks of signals of 1-byte quality:
- a signal quality of 1 bit is worth exactly 2·log₁₀(2) bels, or 20·log₁₀(2) decibels (approximately 0.6 B or 6 dB);
- a signal quality of 1 byte is worth exactly 8 times more, that is 16·log₁₀(2) bels, or 160·log₁₀(2) decibels (approximately 4.8 B or 48 dB).
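The two approximations above follow directly from the stated factor of 20·log₁₀(2) decibels per bit, as this short sketch shows. The function name `bits_to_decibels` is an assumption made for the example.

```python
import math

DB_PER_BIT = 20 * math.log10(2)   # about 6.02 dB, as stated in the list above

def bits_to_decibels(bits: float) -> float:
    """Signal quality in decibels for a given quality in bits."""
    return bits * DB_PER_BIT

print(round(bits_to_decibels(1), 2))  # 6.02
print(round(bits_to_decibels(8), 2))  # 48.16
```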
Note that below a signal-to-noise ratio of 6 dB, it is in theory no longer possible to detect even one whole bit of information from a single sample. Detecting information (for transmission or storage) in such a signal is nevertheless not impossible, thanks to oversampling. It suffices to take several samples: the signal-to-noise ratios of the samples add up until they exceed the 6 dB threshold, beyond which 1 bit of information can be detected, transmitted or stored by calculation. The same applies to signals below 48 dB for detecting, transmitting or storing 1 byte of information. Even the smallest “fragments” of bits or bytes are thus perfectly usable!
On data transmission media
Transmission rate
The maximum capacity of a data transmission medium (expressed in decibels per second, or respectively in bits per second or bytes per second) is the sum of the products of the sampling rate of each transmitted signal (expressed in hertz) and the quality of that signal (expressed in decibels, or respectively in bits or bytes). Depending on the technology used, there is always a trade-off between the quality and the sampling rate of each signal; the best trade-off (the one yielding the maximum capacity) is reached when the quality-frequency product is at its maximum.
The effective throughput (or temporal information capacity) of a data transmission medium (expressed in decibels per second, or respectively in bits per second or bytes per second) is always strictly lower than this theoretical maximum capacity, which cannot be exceeded on the same medium (provided that all detectable signals on the medium have been taken into account).
The unit most often used to measure the throughput of a data transmission medium is the bit per second (and its standardized multiples in powers of 10, such as the kilobit per second, symbolized kb/s, or kbps in English) for physical media, but the byte per second (and its multiples, standardized in powers of 10 or traditional in powers of 2) is very common for applications and file transfer protocols.
Quantity of information transmitted
The maximum quantity of information transmitted over a time interval on a data transmission medium is the integral of its throughput over every instant in that interval at which samples are transmitted and detected.
The quantity of information actually transmitted is always strictly lower than this theoretical maximum. It is measured in decibels, or respectively in bits or bytes (or their multiples, standardized in powers of 10 or conventional in powers of 2). The most widely used unit is the traditional kilobyte, symbolized “Ko” in French.
On information storage media
Capacity per unit of space
The same reasoning applies to the capacity of static storage media: each position along the length (or respectively over the surface or within the volume) of the medium defines a certain number of signals, each with a quality expressed in bits, bytes or decibels. The nature of these detectable signals depends on the sensors used (conversion of electrical signals into magnetic or optical signals, etc.), on their quality (i.e. their intrinsic precision), and on the ambient noise level (which also depends on how the medium is built, in particular its insulation).
The spatial dimension most used for storage today is the surface (floppy disks, hard drives, optical and magneto-optical disks, electronic memories, etc.), but length is still used (magnetic tapes). Volume is still at an experimental stage (optical storage in a crystal, holographic storage, etc.), although it is starting to appear on multi-layer optical disks (where the corresponding units remain surface-based).
One can then speak of a linear (or respectively areal or volumetric) information capacity expressed in decibels per mm (or respectively per mm² or mm³), or likewise in bits per mm (or respectively per mm² or mm³) or in bytes per mm (or respectively per mm² or mm³), with the unit of length (or respectively area or volume) taking the place of the second in the preceding section, and following the same Nyquist-Shannon formulas for the quality of the signals sampled on the medium. For hard drives and optical or magneto-optical disks, the most widely used areal unit of capacity is the bit per mm², or its standardized multiples in powers of 10 such as the kilobit per mm², symbolized kb/mm².
The standardized multiple kilobyte per mm² (ko/mm² in French) is also found. The traditional units are never used at this level, unlike among memory manufacturers, who prefer the traditional units in powers of 2 such as the kilobyte per mm² (and who sometimes count instead the number of transistors per unit area, this time with the conventional multiples in powers of 10, bearing in mind that one stored bit of information often requires two transistors!).
Full capacity of the medium
The maximum total information capacity of the medium, expressed in decibels, bits or bytes, is the integral of this linear (or respectively areal or volumetric) capacity over every position along the length (or respectively over the surface or within the volume) of the medium.
The effective total information capacity of a storage medium is generally measured in bytes (or their standardized multiples in powers of 10, such as the kilobyte, conventionally symbolized KB on hard drives). However, the medium is generally accessed in sectors with a conventional size of 512 bytes, so operating systems more often, and more conveniently, measure its usable capacity with the traditional multiples.