Bioinformatics
Bioinformatics is a multidisciplinary research field in which biologists, computer scientists, mathematicians and physicists work together to solve scientific problems posed by biology. By extension, the term can also describe all the computer applications resulting from this research. These range from genome analysis to modeling the evolution of an animal population in a given environment, and include molecular modeling, image analysis, genome sequencing and the reconstruction of phylogenetic trees (phylogeny). The discipline constitutes biology “in silico”, by analogy with in vitro or in vivo.
According to Claverie, bioinformatics consists of all the concepts and techniques necessary for the interpretation of genetic information (sequences) and structural information (3D folding). It is the decoding of “bio-information”. Bioinformatics is thus a theoretical branch of biology.
In an even wider sense, the concept of bioinformatics can also include the development of information-processing tools based on biological systems, for example the use of the combinatorial properties of the genetic code to design DNA computers able to solve complex algorithmic problems.
Sequence analysis
While more and more genome, transcriptome and proteome sequences are available, the meaning of most of these sequences remains to be understood. The first difficulty was to organize this enormous mass of information and make it available to the whole research community. This was made possible by various online databases such as GenBank, UniProt and PDB. It is then necessary to develop sequence analysis tools in order to determine the properties of the sequences:
- Search for proteins from the translation of known nucleic acid sequences. This involves determining the open reading frames of a nucleic acid sequence and its probable translation or translations.
- Search for sequences in a database from another sequence or a sequence fragment. The most common technique is BLAST.
- Sequence alignment: finding the similarities between two sequences and determining their possible homologies. Alignments are the basis for constructing relationships according to molecular criteria, and for recognizing particular motifs in a protein from its sequence.
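To make the idea of alignment concrete, here is a minimal sketch of a global pairwise alignment by dynamic programming (the Needleman-Wunsch approach). The function name and the scoring values (match, mismatch, gap) are illustrative assumptions, not parameters taken from any particular tool; real programs use substitution matrices and more elaborate gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b.

    Returns (score, aligned_a, aligned_b). The scoring values are
    arbitrary illustrative defaults.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # First row and column: aligning a prefix against gaps only.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        # Score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table: best of diagonal (pair), up (gap in b), left (gap in a).
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    i, j = len(a), len(b)
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Aligned positions that carry the same letter suggest resemblance; whether that resemblance reflects homology is a biological judgment that the score alone does not settle.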
Bioinformatics also plays a role in sequencing, for example through the use of DNA chips (biochips). The principle of such a chip rests on the property of deoxyribonucleic acid to spontaneously re-form the double helix with a complementary strand. The four bases of DNA pair off two by two (A with T, C with G). If a patient carries a disease, the strands extracted from the patient's DNA hybridize with the synthetic DNA strands representative of the disease.
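The base-pairing principle behind a DNA chip can be sketched in a few lines. This is a deliberately simplified model: the function names are hypothetical, and hybridization is reduced to an exact match with the reverse complement, whereas real hybridization tolerates some mismatches and depends on experimental conditions.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the strand that pairs with seq, read in the 5'->3' direction."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridizes(probe, sample):
    """True if the sample strand contains the exact reverse complement
    of the probe (simplifying assumption: perfect pairing only)."""
    return reverse_complement(probe) in sample.upper()


if __name__ == "__main__":
    probe = "ATGC"                          # synthetic strand on the chip
    print(reverse_complement(probe))
    print(hybridizes(probe, "TTGCATAA"))    # sample contains GCAT
    print(hybridizes(probe, "TTTTTTTT"))
```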
Molecular modeling
Because of their dimensions, molecules cannot be seen by direct means of investigation such as microscopy. It is by analyzing indirect data that researchers can reconstruct a molecular model, i.e. an intellectual construction that best fits the experimental results. These data come mainly from crystallographic analyses (the study of X-ray diffraction patterns produced by a crystal) or from nuclear magnetic resonance. They represent the experimental constraints imposed on the model. The resulting molecular model is a set of atomic coordinates in space. Computing is involved in every stage leading from the experiment to the model, and then in the analysis of the model through molecular visualization (see proteins in 3D). It is used, for example, to study the active site of an enzyme and to design by computer a series of possible inhibitors for that enzyme, so that only those which seem suitable are synthesized and tested. This reduces research costs and speeds up the research.
The visualization of the three-dimensional structure of nucleic acids (RNA and DNA) is also among the widely used tools of bioinformatics.
A last aspect is the prediction of the 3D structure of a protein from its primary structure (the list of amino acids that compose it), by modeling the various characteristics of the amino acids. This is of great interest because the function and activity of a protein depend largely on its shape. Likewise, modeling the 3D structures of nucleic acids (from their nucleotide sequence) is as important as it is for proteins.
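Since a molecular model is a set of atomic coordinates, many analyses reduce to simple geometry. The sketch below, with a hypothetical function name and made-up coordinates, lists the atoms lying within a given radius of a point of interest, as one might do when inspecting the surroundings of an active site. It assumes coordinates in angstroms and Python 3.8 or later (for `math.dist`).

```python
import math


def atoms_within(atoms, center, radius):
    """Return the names of atoms whose distance to center is <= radius.

    atoms  : list of (name, x, y, z) tuples
    center : (x, y, z) tuple
    radius : cutoff distance, in the same unit as the coordinates
    """
    return [
        name
        for name, x, y, z in atoms
        if math.dist((x, y, z), center) <= radius
    ]


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    model = [("N", 0.0, 0.0, 0.0), ("CA", 1.5, 0.0, 0.0), ("O", 5.0, 5.0, 5.0)]
    print(atoms_within(model, (0.0, 0.0, 0.0), 2.0))
```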
Construction of phylogenetic trees
Homologous genes are genes descended from the same ancestral gene. More specifically, these genes are called orthologs if they are found in different species (speciation without duplication), and paralogs if they are found in the same species (duplication within the genome). It is then possible to quantify the genetic distance between two species by comparing their orthologous genes. This genetic distance is represented by the number and type of mutations that separate the two genes.
Applied to a larger number of living organisms, this method makes it possible to establish a matrix of genetic distances between several species. Phylogenetic trees group together the species that are closest to one another. Several different algorithms are used to draw trees from distance matrices. Each rests on a different model of evolutionary mechanisms. The two best-known methods are UPGMA and neighbor joining.
The construction of phylogenetic trees is used by multiple sequence alignment programs in order to eliminate most of the possible alignments and thus limit computing time.
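As an illustration of counting the number and type of changes, the following sketch compares two already aligned DNA sequences and separates transitions (purine to purine, or pyrimidine to pyrimidine) from transversions. The function name is hypothetical, and the proportion of differing sites it returns (often called the p-distance) is only one simple choice among several distance measures.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_changes(seq1, seq2):
    """Compare two aligned DNA sequences of equal length.

    Returns (transitions, transversions, p_distance), where p_distance
    is the fraction of compared sites that differ. Sites with a gap
    ('-') in either sequence are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    differences = transitions + transversions
    p_distance = differences / compared if compared else 0.0
    return transitions, transversions, p_distance


if __name__ == "__main__":
    print(count_changes("ACGTACGT", "ACATACCT"))
```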
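The passage from a distance matrix to a tree can be sketched with a compact version of UPGMA: repeatedly merge the two closest clusters, and define the distance from the merged cluster to any other as the size-weighted average of the two previous distances. The function name and the example matrix are assumptions for illustration; this sketch returns only the tree topology as a nested, Newick-like string and does not compute branch lengths.

```python
def upgma(names, dist):
    """Build a tree topology from a symmetric distance matrix with UPGMA.

    names : list of species labels
    dist  : square matrix (list of lists), dist[i][j] = distance i-j
    Returns a nested parenthesized string, e.g. "(C,(A,B))".
    """
    n = len(names)
    if n == 0:
        raise ValueError("at least one species is required")

    # Each cluster id maps to (label, number of species it contains).
    clusters = {i: (names[i], 1) for i in range(n)}
    # Pairwise distances, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    next_id = n

    while len(clusters) > 1:
        a, b = min(d, key=d.get)              # closest pair of clusters
        label_a, size_a = clusters.pop(a)
        label_b, size_b = clusters.pop(b)

        # Size-weighted average distance from the new cluster to the others.
        merged = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            merged[c] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)

        # Drop every distance involving a or b, then add the new cluster.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        for c, value in merged.items():
            d[(c, next_id)] = value           # next_id exceeds all older ids
        clusters[next_id] = (f"({label_a},{label_b})", size_a + size_b)
        next_id += 1

    return next(iter(clusters.values()))[0]


if __name__ == "__main__":
    species = ["A", "B", "C", "D"]
    matrix = [
        [0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0],
    ]
    print(upgma(species, matrix))
```

UPGMA implicitly assumes that all lineages evolve at the same rate; neighbor joining does not make that assumption, which is one reason the two methods can produce different trees from the same matrix.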
Population modeling