A data format is the way data are represented in computing in the form of binary numbers. It is a convention (possibly standardized) used to represent data, that is to say information representing a text, a page, an image, a sound, an executable file, etc. When these data are stored in a file, one speaks of a file format. Such a convention makes it possible to exchange data between various computer programs or software, either by a direct connection or via a file. This possibility of exchanging data between different software is called interoperability.

Typology

One distinguishes a format whose specification is publicly accessible, an open format, from a closed format, whose specification is secret. A closed format generally corresponds to a single piece of software, which alone is able to exploit it fully.

Another distinction is made between a standardized format, which has been standardized by a public or international institution (ISO, W3C), and an unspecified format, which can become a de facto standard if it is popular. Such a format is sometimes standardized afterwards, like OpenDocument.

A format is called proprietary if it was developed by a company with a primarily commercial aim. A proprietary format can be open if it is published (Adobe's PDF format, for example), or closed (Microsoft's ".doc" format, for example). But even when the specifications are made public, the companies behind proprietary formats try to keep control of them, either by regularly proposing new, more elaborate versions (control by maintaining a technological lead) or by using legal means such as patents. This type of anti-competitive practice via legal tools is allowed in the United States. It is subject to controversy in Europe (see Software patentability).

Number formats

Integers

A natural number is in general simply represented in binary (base 2), with the usual conversion rule. Of course, unlike the natural numbers of mathematics, computer integers are finite: one can represent only the numbers that fit in the interval defined by the number of bits available. To represent a signed integer, one reserves a bit to indicate the sign (in general the leftmost bit); one then speaks of a "signed integer". Most of the time, negative numbers are coded according to the two's complement rule.

For example, with one byte (octet) one can represent:

  • natural numbers from 0 (00000000 in binary) to 255 (11111111 in binary)
  • signed integers from -128 (10000000 in binary) to -1 (11111111 in binary) and from 0 (00000000) to 127 (01111111); one byte thus codes the integers from -128 to +127.
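The byte ranges listed above can be checked with a short sketch. The function names `to_byte_bits` and `from_signed_byte` are hypothetical helpers introduced only for this illustration, and the sketch assumes an 8-bit two's complement encoding.

```python
def to_byte_bits(value: int) -> str:
    """Return the 8-bit pattern of an integer that fits in one byte.

    Negative values (-128..-1) are encoded in two's complement.
    """
    if not -128 <= value <= 255:
        raise ValueError("value does not fit in one byte")
    # Masking with 0xFF keeps the low 8 bits, which is exactly
    # the two's complement pattern for negative numbers.
    return format(value & 0xFF, "08b")


def from_signed_byte(bits: str) -> int:
    """Interpret an 8-bit pattern as a signed (two's complement) integer."""
    if len(bits) != 8:
        raise ValueError("expected exactly 8 bits")
    raw = int(bits, 2)
    # Patterns whose leftmost bit is 1 stand for negative numbers.
    return raw - 256 if raw >= 128 else raw


if __name__ == "__main__":
    for n in (0, 127, 255, -1, -128):
        print(n, to_byte_bits(n))
```

Note that the same pattern 11111111 reads as 255 when the byte is treated as unsigned and as -1 when it is treated as signed: the format, not the bits, decides the meaning.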

See the detailed article: Binary system

Fractional numbers

For fractional numbers, positional notation means that in base N, "0.a" denotes $a \cdot \frac{1}{N}$ ($= a \cdot N^{-1}$), "0.0a" denotes $a \cdot \frac{1}{N^2}$ ($= a \cdot N^{-2}$), and so on. For example, in base 10 (N = 10), "0.005" denotes $5 \cdot 10^{-3}$.

Thus, the number 0.001 in binary (N = 2) denotes $1 \cdot 2^{-3} = 0.125$.

In computing, a first solution therefore consists in allotting a certain number of bits on the right to the negative powers of 2.
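As a minimal sketch of this idea, the following converts a binary literal with a fractional part into an exact value by giving each digit after the separator the weight of a negative power of 2. The helper name `parse_binary_fraction` and the use of a point as separator are assumptions made for the example.

```python
from fractions import Fraction


def parse_binary_fraction(text: str) -> Fraction:
    """Convert a binary literal such as '0.001' or '101.11' to an exact fraction."""
    integer_part, _, fractional_part = text.partition(".")
    value = Fraction(int(integer_part, 2)) if integer_part else Fraction(0)
    for position, digit in enumerate(fractional_part, start=1):
        if digit not in "01":
            raise ValueError(f"not a binary digit: {digit!r}")
        # The digit at position k after the point weighs 2**-k.
        value += Fraction(int(digit), 2 ** position)
    return value


if __name__ == "__main__":
    print(float(parse_binary_fraction("0.001")))
    print(float(parse_binary_fraction("101.11")))
```

`Fraction` is used here so that the arithmetic stays exact; a real fixed-point format would instead store the scaled integer in a fixed number of bits.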

Another solution consists in treating numbers as characters (text), using for example the ASCII format. However, this can apply only to storage: since the computer can perform only binary arithmetic, the numbers must necessarily be converted to binary for calculation.

See the detailed article: Floating point.

Text formats

Texts are made up of characters, which are finite in number (letters, diacritics, punctuation marks…). It is thus simple to assign a number to each character. This character → number conversion is defined by convention in the form of a table, or code page. The most widely used are ASCII and Unicode.
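The character → number correspondence can be observed directly in Python, whose strings are sequences of Unicode code points. The helper names below are hypothetical and serve only as an illustration.

```python
def code_points(text: str) -> list[int]:
    """Return the Unicode number (code point) assigned to each character."""
    return [ord(character) for character in text]


def encoded_bytes(text: str, encoding: str) -> list[int]:
    """Return the byte values used to store the text in a given encoding."""
    return list(text.encode(encoding))


if __name__ == "__main__":
    print(code_points("Az"))
    # A character outside ASCII needs more than one byte in UTF-8.
    print(code_points("\u00e9"), encoded_bytes("\u00e9", "utf-8"))
```

For characters in the ASCII range the code point and the stored byte coincide; outside that range, the table (Unicode) and the storage encoding (here UTF-8) are two separate conventions.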

Texts also include page layout (paragraph alignment) and formatting (typeface, size…). The solution generally adopted consists in defining control words, or instructions, separated from the text by a special character. Thus, in HTML, the instructions are called "tags" and are placed between angle brackets < … >; in LaTeX, the instructions are introduced by a backslash \ . Certain characters are therefore reserved for instructions and can no longer belong to the text; there are then "escape codes", or special instructions, that make it possible to represent them.
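To illustrate escape codes in the HTML case, here is a small sketch using the `escape` and `unescape` functions of Python's standard `html` module. The wrapper names `to_html_text` and `from_html_text` are hypothetical.

```python
import html


def to_html_text(text: str) -> str:
    """Replace characters reserved for HTML instructions by escape codes."""
    return html.escape(text)


def from_html_text(escaped: str) -> str:
    """Turn HTML escape codes back into the characters they stand for."""
    return html.unescape(escaped)


if __name__ == "__main__":
    original = "a < b & c"
    escaped = to_html_text(original)
    print(escaped)
    print(from_html_text(escaped) == original)
```

The reserved characters `<` and `&` are stored as `&lt;` and `&amp;`, so that a reader of the file never mistakes them for the start of a tag or of another escape code.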

Until 2006, Microsoft Word used another way of storing formatting: the data (text and images) are placed raw (without formatting) in the document, and the formatting is defined in a part of the document called the "section break". The section break, besides marking a change of page layout (column break, page break), is an invisible zone containing pointers that assign formatting to part of the section. This raw storage of the data was adopted historically at a time when few other solutions existed (the early 1980s). It was an approach commonly adopted by many applications and linked, among other things, to the low level of storage standards. However, experience showed that this approach is very heavy and a source of problems (document corruption) for very large documents (close to or over 100 pages). Although Microsoft tried to preserve its model and make it evolve gradually to avoid starting again from scratch, the model has now reached its limits. For its 2007 edition, Microsoft adopted a new proprietary format, Open XML.

Image formats

The basis of image representation is analytic geometry.

Bitmap format

An image can be divided into elementary points, or "pixels", and a color assigned to each pixel. The color is represented by a number, the color → number correspondence being given by a "palette".

It is unnecessary to give the coordinates of the points: if the width of the image is given as a number N of points, then the first N points represent the first line, points N+1 to 2N represent the second line, and so on. It is then enough to fix the scanning order by convention, in practice the Western reading order (from left to right and from top to bottom).

This gives an image in map-of-points format, often called a bitmap image. It is thus a grid of points, each of which is assigned a color value. The main differences between the existing formats are the color depth (1 bit: black or white, 8 bits: 256 colors, 24 bits: 16 million colors…) and the type of compression (no compression, or raw; pattern-based compression; lossy compression…).

For example, take a black-and-white bitmap (1 for white, 0 for black) defining an image five points wide by the following sequence of digits: 1000101010001000101010001. This sequence must be cut into groups of 5 bits: 10001 01010 00100 01010 10001, which gives the drawing of a white "X" on a black background.
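The cutting of the sequence into lines can be sketched as follows. The function names and the characters chosen to display white and black points are assumptions made for the illustration.

```python
def bitmap_rows(bits: str, width: int) -> list[str]:
    """Cut a flat sequence of pixel values into lines of `width` points."""
    if width <= 0 or len(bits) % width != 0:
        raise ValueError("the number of points must be a multiple of the width")
    return [bits[start:start + width] for start in range(0, len(bits), width)]


def render(rows: list[str], white: str = "#", black: str = ".") -> str:
    """Draw each 1 as a white point and each 0 as a black point."""
    return "\n".join(
        "".join(white if bit == "1" else black for bit in row) for row in rows
    )


if __name__ == "__main__":
    rows = bitmap_rows("1000101010001000101010001", 5)
    print(rows)
    print(render(rows))
```

Only the width has to be known to rebuild the lines, which is why the coordinates of the individual points never need to be stored.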

The data format must therefore include, in addition to the list of points, the width of the image and the description of the palette; this is generally done at the beginning of the file (one speaks of the file "header").

See also: Raster image

Vector format

An image in vector format is an image described by a set of mathematical coordinates rather than by a grid of points. For example,

  • to describe a line, it is enough to know its start and end coordinates;
  • for a rectangle (with its sides aligned with the axes of the current frame of reference), two points are also enough (two opposite corners);
  • for a circle, a single point and a radius are needed.

In addition, information about the stroke is needed: the graphic attributes are the thickness, the style (solid or dotted line), the color of the stroke, its transparency, etc. A vector image is thus a set of coordinates, attributes and commands that the display program (on screen or on paper) is responsible for interpreting.

For images that can easily be reduced to geometric shapes (typography, cartography…), the vector format is extremely economical.

The distinctive feature of vector formats is that their final rendering depends only on the resolution of the output device. This type of image can also be enlarged without unwanted effects; there is no "pixelation" effect (diagonal or curved lines do not appear as staircases).

Some well-known vector formats: SVG, Adobe PDF (Acrobat), Adobe Illustrator, Encapsulated PostScript (EPS), Quark QXD, Macromedia Flash (vector animation format), AutoCAD DXF.

Color representation

See also: Computer coding of colors

Video formats

See also: Video format

3D scene formats

Representing the virtual objects created by 3D modeling software requires a specific data format, because the preceding formats are unsuitable. Indeed, representing a 3D object requires at least a description:
  • of the topology of the object: its shape, its size and its complexity
  • of its rendering attributes: colors, textures (nature and position), photometric quality of its surface, transparency
  • of its dynamic attributes if it is animated: ability to collide with other objects, articulations and constraints, etc.

Representing a scene also requires specifying the lighting used, the relative positions of the objects, any environmental effects, and above all its hierarchical structure (the links between the elements).

The first de facto standard formats were formats adapted to CAD: the object is defined using facets or analytic surfaces. It is enough to define its origin and then the characteristic coordinates of the elements in three-dimensional space. For example, in AutoCAD's DXF format, an object is a succession of named entities, each composed of a list of X, Y, Z points. By indexing, one builds triangular facets or lines that rest on these points.

While this format was sufficient for technical design, it was completely unsuited to virtual reality. In the 1990s, the company Silicon Graphics (a manufacturer of 3D graphics workstations) published the Inventor format, which included most of the necessary elements. This format evolved into the VRML format, which was standardized.

In addition, the ASCII format of 3D Studio was also published, but the explosion of the 3D market gave rise to a large number of proprietary formats. For the user, the problem was frequently to convert a model from one format to another without losing too much information. Some companies even specialized in this type of conversion.

Currently, in the professional world, there is no single format but rather formats used more or less widely depending on the type of application. For example:

  • Blender format for multimedia creation
  • Pro/Engineer format for industrial CAD
  • OpenFlight format for flight control and/or simulation.

Nevertheless, most 3D modelers can, more or less well, read (import) and write (export) several formats: this is an important selection criterion. Among the most widespread formats are:

  • BLEND (Blender)
  • 3DS
  • DXF (AutoCAD)
  • IGES (standardized)
  • X (Direct3D)
  • OBJ (Wavefront)
  • LWO (LightWave)
  • VRML with its versions (1, 2 and X3D)
  • COB (trueSpace)

The current trend is to favor a descriptive XML-type format. The 3D data format is then called a descriptive language, like X3D (an evolution of VRML with XML formatting).

The free format COLLADA also makes it possible to exchange data between different software. In particular, there is an importer/exporter for Blender. There is currently a proposal to adopt it as the 3D format on Wikipedia.

Sound formats

Sound formats fall into three categories:

  • raw formats: the sound is not compressed; the values resulting from the conversion of the analog signal into digital values (sampling) are recorded in chronological order and by channel;
  • compressed formats: the sound is compressed, with or without loss, using an algorithm adapted to the way the human ear perceives sound and/or a conventional compression;
  • streaming formats: these allow listening part by part without having the whole file.

(See the section Traditional formats.)

Data compression

Data compression is the technique of transforming data so that they take up less space. Since the data must be decompressed before being processed, this comes at the expense of speed, and with a greater risk of data loss.

The basic idea is that, in general, elements repeat in files. It is therefore advantageous to represent frequently repeated elements by smaller numbers (i.e. ones taking fewer bits).

Two types of compression can be distinguished:

  • compressions with no prior assumption about the data: these algorithms work only on the numbers, whatever information the numbers carry; they are thus general, not specific to the data; one can distinguish:
    • algorithms with a stored table: the algorithm makes a first pass to locate the repeated elements and builds a correspondence table with a shortened code for each repeated element; because of the size occupied by the stored table, this process is better suited to large files;
    • algorithms with a table built on the fly: the correspondence table is built in a systematic way, without preliminary analysis of the file; it can be rebuilt on the fly from the compressed file; this is for example the case of the Lempel-Ziv-Welch (LZW) algorithm
  • compressions specific to the data: if one knows the data, one can optimize the algorithm; for example, if the data are known to be text, one can rely on the frequency of use of words in the language; two subcategories are distinguished:
    • compressions without loss of quality;
    • compressions with loss of quality:
      • the first idea is "sub-sampling", i.e. simply degrading the quality of the data;
      • by studying the senses and the way the brain interprets information, one can degrade certain characteristics of the data to which we are not very sensitive, and thus without damaging the overall quality too much; for instance, if the human ear is not very sensitive to certain frequency bands, one can degrade (or even remove) certain parts of the spectrum and not others;
      • the image and film compression algorithms JPEG and MPEG use a loss of quality.
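The idea of a table built on the fly can be sketched with a simplified LZW coder. This is an illustrative sketch under stated assumptions, not the variant used by any particular file format: the table is unbounded, the codes are kept as a list of Python integers rather than packed into bits, and the function names are hypothetical.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of table codes, building the table on the fly."""
    table = {bytes([value]): value for value in range(256)}
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate  # keep extending a sequence already seen
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # new entry gets the next free code
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the original bytes, reconstructing the same table while reading."""
    if not codes:
        return b""
    table = {value: bytes([value]) for value in range(256)}
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Special case: the code refers to the entry being created right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(output)


if __name__ == "__main__":
    packed = lzw_compress(b"ABABABA")
    print(packed)
    print(lzw_decompress(packed))
```

The table is never stored in the output: the decoder recreates each entry at the same moment the encoder did, which is what allows it to be rebuilt from the compressed data alone.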

See the detailed article Data compression.

Traditional formats

  • Images: PNG, MNG, TIFF, JPEG, GIF, TGA, BMP
  • Vector drawing: SVG, Flash, AI, EPS, DXF
  • 3D: XCF, BLEND, DXF, 3DS Max, VRML, X3D
  • Sound: OGG, FLAC, MP3, WAV, WMA, AAC
  • Video: MPEG, OGM (DVD, DivX, XviD), AVI, Theora, FLV
  • Page: PDF, PostScript, HTML, XHTML
  • Document: ODT, TXT, DOC, RTF
  • Executable: BIN, ELF, EXE
  • Archives: 7Z, TAR, GZIP, ZIP, LZW, ARJ, RAR

© 2007-2008 speedlook.com; article text available under the terms of GFDL, from fr.wikipedia.org