Needleman-Wunsch algorithm
The Needleman-Wunsch algorithm performs a global alignment of two sequences (called A and B here). It is commonly used in bioinformatics to align protein or nucleotide sequences. The algorithm was presented in 1970 by Saul Needleman and Christian Wunsch in their paper "A general method applicable to the search for similarities in the amino acid sequence of two proteins", J Mol Biol. 48 (3): 443-53.
The Needleman-Wunsch algorithm is an example of dynamic programming, as is the Levenshtein distance algorithm, to which it is related. It is guaranteed to find the alignment with the maximum score. It was the first application of dynamic programming to the comparison of biological sequences.
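The relationship with the Levenshtein distance can be made concrete with a minimal sketch, assuming unit costs: similarity 0 for a match, -1 for a mismatch, and a gap penalty of -1. Under these assumptions the best Needleman-Wunsch score is the negative of the Levenshtein distance, because every alignment corresponds to an edit script of the same cost. The function names `nw_score` and `levenshtein` are illustrative.

```python
def nw_score(a: str, b: str, match: int = 0, mismatch: int = -1, d: int = -1) -> int:
    """Maximum global alignment score (Needleman-Wunsch) with simple scores."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        F[i][0] = d * i
    for j in range(m + 1):
        F[0][j] = d * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + d, F[i][j - 1] + d)
    return F[n][m]


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit insertion, deletion and substitution costs."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j - 1] + cost, D[i - 1][j] + 1, D[i][j - 1] + 1)
    return D[n][m]


# Same table shape, max instead of min, scores instead of costs.
print(nw_score("kitten", "sitting"), levenshtein("kitten", "sitting"))  # -3 3
```

The only differences between the two functions are the sign convention and `max` versus `min`, which is why the two algorithms are usually presented side by side.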
The scores for aligned characters are specified by a similarity matrix. Here, S(a, b) is the similarity of characters a and b. The algorithm also uses a gap penalty, called d here, which is added for every character aligned with a gap.
For example, if the similarity matrix were

then the alignment

AGACTAGTTAC
CGA---GACGT

with a gap penalty d = -5 would have the following score:

$$S(A,C) + S(G,G) + S(A,A) + 3d + S(G,G) + S(T,A) + S(T,C) + S(A,G) + S(C,T)$$
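The score of a given alignment is the column-by-column sum of S for aligned pairs plus d for every gap. The sketch below evaluates it for the example alignment; the values in `SIMILARITY` are an assumption chosen for illustration (any symmetric matrix would do), and `alignment_score` is a hypothetical helper name.

```python
# Illustrative DNA similarity matrix (assumed values, symmetric).
SIMILARITY = {
    "A": {"A": 10, "G": -1, "C": -3, "T": -4},
    "G": {"A": -1, "G": 7, "C": -5, "T": -3},
    "C": {"A": -3, "G": -5, "C": 9, "T": 0},
    "T": {"A": -4, "G": -3, "C": 0, "T": 8},
}
GAP = "-"


def alignment_score(row_a: str, row_b: str, d: int = -5) -> int:
    """Sum S over aligned character pairs and add d for each gap column."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == GAP or y == GAP:
            total += d  # a character aligned with a gap
        else:
            total += SIMILARITY[x][y]
    return total


print(alignment_score("AGACTAGTTAC", "CGA---GACGT"))  # 1 with these assumed values
```

With these assumed values the sum is -3 + 7 + 10 - 15 + 7 - 4 + 0 - 1 + 0 = 1.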
To find the alignment with the maximum score, a two-dimensional table, or matrix, is used. This matrix is sometimes called the matrix F, and its element at position (i, j) is denoted $F_{ij}$. There is a column for each character of sequence A and a row for each character of sequence B. Thus, when aligning sequences of sizes n and m, the running time of the algorithm is O(nm), and the memory used is O(nm). (There is, however, a modified version of the algorithm that uses only O(m + n) memory at the cost of a longer running time. This modification is in fact a general technique in dynamic programming; it was introduced in Hirschberg's algorithm.)
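When only the optimal score is needed, and not the alignment itself, memory can be reduced to O(m) by keeping just two rows of F, since each row depends only on the previous one. This is a minimal sketch of that idea, not Hirschberg's algorithm, which adds a divide-and-conquer step to also recover the alignment in linear space. Passing the similarity as a callable `s` is an assumption of this sketch.

```python
from typing import Callable


def nw_score_linear(a: str, b: str, s: Callable[[str, str], int], d: int) -> int:
    """Optimal global alignment score using two rows of length len(b) + 1."""
    prev = [d * j for j in range(len(b) + 1)]  # row 0: F(0, j) = d * j
    for i in range(1, len(a) + 1):
        curr = [d * i] + [0] * len(b)  # F(i, 0) = d * i
        for j in range(1, len(b) + 1):
            curr[j] = max(
                prev[j - 1] + s(a[i - 1], b[j - 1]),  # align A(i) with B(j)
                prev[j] + d,  # A(i) aligned with a gap
                curr[j - 1] + d,  # B(j) aligned with a gap
            )
        prev = curr  # row i becomes the "previous" row for row i + 1
    return prev[-1]


def unit(x: str, y: str) -> int:
    """Illustrative similarity: +1 for a match, -1 for a mismatch."""
    return 1 if x == y else -1


print(nw_score_linear("AB", "B", unit, -1))  # 0
```

Swapping the arguments so that `b` is the shorter sequence gives O(min(n, m)) memory for the score.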
As the algorithm progresses, $F_{ij}$ is assigned the optimal score for the alignment of the first i characters of A with the first j characters of B. The principle of optimality is applied as follows.

Base:

$$F_{0j} = d \cdot j, \qquad F_{i0} = d \cdot i$$

Recursion, based on the principle of optimality:

$$F_{ij} = \max\left(F_{i-1,j-1} + S(A_i, B_j),\; F_{i-1,j} + d,\; F_{i,j-1} + d\right)$$
The pseudocode for computing the matrix F is given here:

for i = 0 to length(A)
    F(i, 0) <- d * i
for j = 0 to length(B)
    F(0, j) <- d * j
for i = 1 to length(A)
    for j = 1 to length(B)
    {
        Choice1 <- F(i - 1, j - 1) + S(A(i), B(j))
        Choice2 <- F(i - 1, j) + d
        Choice3 <- F(i, j - 1) + d
        F(i, j) <- max(Choice1, Choice2, Choice3)
    }
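A direct Python translation of the matrix pseudocode, offered as a sketch. Python strings are 0-indexed, so the i-th character A(i) is `a[i - 1]`. The similarity is passed as a function `s` and the gap penalty as `d`; `fill_matrix` and `unit` are illustrative names, not an established API.

```python
from typing import Callable, List


def fill_matrix(a: str, b: str, s: Callable[[str, str], int], d: int) -> List[List[int]]:
    """Build the (len(a) + 1) x (len(b) + 1) matrix F of optimal prefix scores."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        F[i][0] = d * i  # first i characters of A all aligned with gaps
    for j in range(m + 1):
        F[0][j] = d * j  # first j characters of B all aligned with gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            choice1 = F[i - 1][j - 1] + s(a[i - 1], b[j - 1])  # A(i) with B(j)
            choice2 = F[i - 1][j] + d  # A(i) with a gap
            choice3 = F[i][j - 1] + d  # B(j) with a gap
            F[i][j] = max(choice1, choice2, choice3)
    return F


def unit(x: str, y: str) -> int:
    """Illustrative similarity: +1 for a match, -1 for a mismatch."""
    return 1 if x == y else -1


print(fill_matrix("GA", "GCA", unit, -1))
# [[0, -1, -2, -3], [-1, 1, 0, -1], [-2, 0, 0, 1]]
```

The bottom-right entry, 1, is the score of the alignment G-A / GCA (two matches and one gap).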
Once the matrix F has been computed, its bottom-right element F(length(A), length(B)) holds the maximum score over all alignments. To find which alignment achieves this score, start from this element and trace a path back toward element (0, 0), checking at each step which neighbor the current value came from. If it was the diagonal element, then A(i) and B(j) are aligned. If it was element (i - 1, j), then A(i) is aligned with a gap, and if it was element (i, j - 1), then B(j) is aligned with a gap.

AlignmentA <- ""
AlignmentB <- ""
i <- length(A)
j <- length(B)
while (i > 0 AND j > 0)
{
    Score <- F(i, j)
    ScoreDiag <- F(i - 1, j - 1)
    ScoreUp <- F(i, j - 1)
    ScoreLeft <- F(i - 1, j)
    if (Score == ScoreDiag + S(A(i), B(j)))
    {
        AlignmentA <- A(i) + AlignmentA
        AlignmentB <- B(j) + AlignmentB
        i <- i - 1
        j <- j - 1
    }
    else if (Score == ScoreLeft + d)
    {
        AlignmentA <- A(i) + AlignmentA
        AlignmentB <- "-" + AlignmentB
        i <- i - 1
    }
    else    // here Score == ScoreUp + d
    {
        AlignmentA <- "-" + AlignmentA
        AlignmentB <- B(j) + AlignmentB
        j <- j - 1
    }
}
while (i > 0)
{
    AlignmentA <- A(i) + AlignmentA
    AlignmentB <- "-" + AlignmentB
    i <- i - 1
}
while (j > 0)
{
    AlignmentA <- "-" + AlignmentA
    AlignmentB <- B(j) + AlignmentB
    j <- j - 1
}
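Putting the matrix computation and the traceback together gives a complete sketch in Python. It follows the pseudocode above, but collects characters in lists and reverses them at the end rather than prepending to strings, which avoids repeated string copies. The names `needleman_wunsch` and `unit` are illustrative; when several alignments share the maximum score, this version prefers the diagonal, then a gap in B, then a gap in A, just like the pseudocode.

```python
from typing import Callable, List, Tuple


def needleman_wunsch(a: str, b: str, s: Callable[[str, str], int], d: int) -> Tuple[str, str, int]:
    """Return (AlignmentA, AlignmentB, score) for one optimal global alignment."""
    n, m = len(a), len(b)
    # Fill F as in the matrix pseudocode.
    F: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        F[i][0] = d * i
    for j in range(m + 1):
        F[0][j] = d * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          F[i - 1][j] + d,
                          F[i][j - 1] + d)

    # Trace back from F(n, m) toward F(0, 0), collecting columns right to left.
    align_a: List[str] = []
    align_b: List[str] = []
    i, j = n, m
    while i > 0 and j > 0:
        score = F[i][j]
        if score == F[i - 1][j - 1] + s(a[i - 1], b[j - 1]):  # diagonal
            align_a.append(a[i - 1])
            align_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score == F[i - 1][j] + d:  # A(i) aligned with a gap
            align_a.append(a[i - 1])
            align_b.append("-")
            i -= 1
        else:  # score == F[i][j - 1] + d: B(j) aligned with a gap
            align_a.append("-")
            align_b.append(b[j - 1])
            j -= 1
    while i > 0:  # remaining characters of A face gaps
        align_a.append(a[i - 1])
        align_b.append("-")
        i -= 1
    while j > 0:  # remaining characters of B face gaps
        align_a.append("-")
        align_b.append(b[j - 1])
        j -= 1
    return "".join(reversed(align_a)), "".join(reversed(align_b)), F[n][m]


def unit(x: str, y: str) -> int:
    """Illustrative similarity: +1 for a match, -1 for a mismatch."""
    return 1 if x == y else -1


print(needleman_wunsch("GA", "GCA", unit, -1))  # ('G-A', 'GCA', 1)
```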
See also