MD5

The MD5 algorithm (Message Digest 5) is a very popular cryptographic hash function, but it is no longer considered secure. Algorithms such as SHA-256, RIPEMD-160 or Whirlpool are now recommended instead. MD5 was designed by Ronald Rivest in response to the vulnerabilities found in MD4.

Origin

MD5 (Message Digest 5) is a cryptographic hash function that computes, for each message, a digital fingerprint (a sequence of 128 bits, or 32 characters in hexadecimal notation) such that two different messages have, with very high probability, different fingerprints.

In 1991, Ronald Rivest improved the architecture of MD4 to counter potential attacks, which were later confirmed by the work of Hans Dobbertin. In 1996, a serious flaw (the possibility of creating collisions on demand) was discovered, indicating that MD5 should be set aside in favor of more robust functions such as SHA-1. In 2004, a Chinese team discovered full collisions. MD5 is therefore no longer considered secure in the cryptographic sense.

MD5 is still used as a verification tool for downloads (for example, over FTP). Sites still often publish the MD5 signature (128 bits) of their files, although SHA-1 (160 bits) is increasingly replacing it.

The user can thus validate the integrity of the downloaded version using the fingerprint. This can be done with a program such as md5sum for MD5 and sha1sum for SHA-1. This check helps avoid downloading a version containing a computer virus or any other suspicious code from an unofficial site.
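As a rough illustration of the check that a tool like md5sum performs, the following Python sketch computes a file's MD5 with the standard hashlib module and compares it with a published value. The names file_md5 and matches_published, the chunk size and the throwaway demo file are assumptions made for this example.

```python
import hashlib
import os
import tempfile

def file_md5(path, chunk_size=65536):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that large downloads need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_hex):
    """Compare the computed digest with the one posted by the site."""
    return file_md5(path) == published_hex.strip().lower()

# Demonstration on a temporary file whose content is the string "abc"
# (hypothetical stand-in for a downloaded file).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    demo_path = tmp.name
demo_ok = matches_published(demo_path, "900150983CD24FB0D6963F7D28E17F72")
os.remove(demo_path)
```

Such a check is only meaningful if the published fingerprint itself comes from a trusted source.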

MD5 can also be used to store a fingerprint of a password; this is the system used in GNU/Linux, with a salt that complicates cracking. It is indeed safer to store MD5 fingerprints rather than the passwords themselves, so that someone who obtains the list cannot recover the passwords, at least those that are not trivial.
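The idea of a salted fingerprint can be sketched in Python as follows. This is a deliberately simplified illustration and not the scheme actually used by GNU/Linux, which is more elaborate; the function names, the 8-byte salt and the sample password are assumptions.

```python
import hashlib
import os

def salted_md5(password, salt=None):
    """Return (salt, hex digest) for MD5(salt || password)."""
    if salt is None:
        salt = os.urandom(8)  # random salt, stored alongside the digest
    digest = hashlib.md5(salt + password.encode("utf-8")).hexdigest()
    return salt, digest

def check_password(candidate, salt, stored_digest):
    """Recompute the salted digest for a candidate and compare."""
    return salted_md5(candidate, salt)[1] == stored_digest

# Store only the salt and the digest, never the password itself.
salt, stored = salted_md5("correct horse")
```

Because the salt differs per entry, the same password yields different stored fingerprints, which defeats a single precomputed table of unsalted hashes.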

The program John the Ripper can break trivial MD5 fingerprints by brute force. Servers hosting "reverse tables" (directly accessible, and sometimes several gigabytes in size) often make it possible to crack them in less than one second. See Rainbow attacks.

Example

Here is the signature obtained for a sentence:
MD5 ("Wikipedia, l'encyclopedie libre et gratuite") = d6aa97d33d459ea3670056e737c99a3d
Modifying a single character changes the signature radically:
MD5 ("Wikipedia, l'encyclopedie libre et gratuitE") = 5da8aa7126701c9840f99f8e9fa54976

Cryptanalysis

MD5 was initially considered secure. Confidence in it eroded little by little with the discovery of potential flaws in its operation. MD5 was broken during the summer of 2004 by the Chinese researchers Xiaoyun Wang, Dengguo Feng, Xuejia Lai (co-inventor of the well-known encryption algorithm IDEA) and Hongbo Yu. Their attack uncovered a full collision (two different messages that produce the same fingerprint) without resorting to an exhaustive search.

On a parallel system, the computations took only a few hours. MD5 is therefore no longer considered secure, but the algorithm developed by the Chinese team concerns arbitrary collisions and does not make it possible to produce a collision on a specific fingerprint, that is, starting from the fingerprint of one message, to construct another message that produces the same fingerprint. A distributed computing project launched in March 2004, MD5CRK, aimed to find a full collision but was abruptly stopped after the Chinese team's discovery. Since the security of MD5 is no longer guaranteed according to its cryptographic definition, specialists recommend using more recent hash functions such as SHA-256.

MD5 collisions can now be generated in less than one minute when the two colliding blocks are "free".

From two colliding messages M1 and M2 of the same length, one can now generate an unlimited number of collisions with a text T. It suffices to concatenate M1 and M2 with T, so that T1 = M1 + T and T2 = M2 + T, to obtain a full collision between T1 and T2. However, one cannot generate a particular signature, and forging documents remains a difficult exercise.

Today (2006), it is possible, for example, to create HTML pages with very different content that nevertheless have the same MD5. However, the presence of "padding" metacode placed in comments, visible only in the source of the web page, gives away pages modified to usurp the MD5 of another. The deception can thus be uncovered, provided one thinks to examine the source of the page in question.

Algorithm

Notation

  • $\lll_s$ is a rotation of s bits to the left; s varies for each operation.
  • $\boxplus$ symbolizes addition modulo $2^{32}$.
  • $\oplus$, $\wedge$, $\vee$, $\neg$ symbolize the Boolean operations XOR, AND, OR and NOT respectively.
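These primitives can be sketched in Python, whose integers are unbounded, by masking every result to 32 bits. The names MASK32, rotl32, add32 and not32 are assumptions introduced for this illustration.

```python
MASK32 = 0xFFFFFFFF  # keeps a value within 32 bits

def rotl32(x, s):
    """Rotate the 32-bit word x to the left by s bits (0 <= s < 32)."""
    x &= MASK32
    return ((x << s) | (x >> (32 - s))) & MASK32

def add32(*terms):
    """Addition modulo 2^32 of any number of words."""
    return sum(terms) & MASK32

def not32(x):
    """Bitwise NOT restricted to 32 bits."""
    return ~x & MASK32

# XOR, AND and OR map directly onto Python's ^, & and | operators.
print(hex(rotl32(0x12345678, 8)))
print(hex(add32(0xFFFFFFFF, 1)))
```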

Preparation of the message

MD5 works on a message of variable size and produces a 128-bit fingerprint. The message is divided into blocks of 512 bits; padding is applied so that the length of the message is a multiple of 512. The padding is done as follows:
  • a "1" bit is appended to the end of the message
  • a sequence of "0" bits is appended (the number of zeros depends on the amount of padding needed)
  • the size of the message is written as an integer encoded on 64 bits

This padding is always applied, even if the length of the message is already a multiple of 512. This padding method is similar to the one used in most message digest algorithms of the MD family (such as MD5 or RIPEMD) or the SHA family (SHA-1 or SHA-512), but differs from that of the Tiger algorithm, which uses a so-called little-endian convention for ordering the bits within each byte.

The size of the message is encoded in little-endian. The message now has a size in bits that is a multiple of 512, i.e. it contains a multiple of 16 words of 32 bits.
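The padding rule can be sketched in Python as follows, assuming the message consists of whole bytes, so that the "1" bit followed by seven "0" bits becomes the byte 0x80. The function name md5_pad is an assumption of this example.

```python
import struct

def md5_pad(message):
    """Return `message` padded to a multiple of 64 bytes (512 bits)."""
    bit_length = (8 * len(message)) % 2**64   # original size in bits
    padded = message + b"\x80"                # the "1" bit, then zeros
    # Zero bytes until the length is congruent to 56 modulo 64 (448 mod 512 bits).
    padded += b"\x00" * ((56 - len(padded)) % 64)
    # Original size as a 64-bit little-endian integer.
    padded += struct.pack("<Q", bit_length)
    return padded

for size in (0, 55, 56, 64):
    print(size, "->", len(md5_pad(b"a" * size)))
```

A 64-byte message already fills one block, yet it still receives a full extra block of padding, as the text above requires.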

Main loop

The main algorithm operates on a 128-bit state, itself divided into four 32-bit words: A, B, C and D. They are initialized at the start with constants. The algorithm then uses the blocks of the message to be hashed, and these blocks modify the internal state. The operations on a block are broken down into four rounds, each subdivided into 16 similar operations based on a non-linear function F that varies according to the round, an addition and a left rotation. The four non-linear functions available are:

$F(X, Y, Z) = (X \wedge Y) \vee (\neg X \wedge Z)$
$G(X, Y, Z) = (X \wedge Z) \vee (Y \wedge \neg Z)$
$H(X, Y, Z) = X \oplus Y \oplus Z$
$I(X, Y, Z) = Y \oplus (X \vee \neg Z)$

Pseudocode

MD5 can be written in the following form in pseudocode:

// Note: all variables are unsigned 32-bit integers and wrap modulo 2^32

// Define r as follows:
var int[64] r, k
r[ 0..15] := {7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22,  7, 12, 17, 22}
r[16..31] := {5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20,  5,  9, 14, 20}
r[32..47] := {4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23,  4, 11, 16, 23}
r[48..63] := {6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21,  6, 10, 15, 21}

// MD5 uses the sines of integers for its constants:
for i from 0 to 63 do
    k[i] := floor(abs(sin(i + 1)) × 2^32)
end for

// Initialize the variables:
var int h0 := 0x67452301
var int h1 := 0xEFCDAB89
var int h2 := 0x98BADCFE
var int h3 := 0x10325476

// Prepare the message (padding):
append a "1" bit to the message
append "0" bits until the size of the message in bits is congruent to 448 (mod 512)
append the size of the original message, encoded as a 64-bit little-endian integer

// Process the message in blocks of 512 bits:
for each 512-bit block of the message
    split the block into 16 little-endian 32-bit words w[i], 0 ≤ i ≤ 15

    // Initialize the hash values for this block:
    var int a := h0
    var int b := h1
    var int c := h2
    var int d := h3

    // Main loop:
    for i from 0 to 63 do
        if 0 ≤ i ≤ 15 then
            f := (b and c) or ((not b) and d)
            g := i
        else if 16 ≤ i ≤ 31 then
            f := (d and b) or ((not d) and c)
            g := (5×i + 1) mod 16
        else if 32 ≤ i ≤ 47 then
            f := b xor c xor d
            g := (3×i + 5) mod 16
        else if 48 ≤ i ≤ 63 then
            f := c xor (b or (not d))
            g := (7×i) mod 16
        end if
        var int temp := d
        d := c
        c := b
        b := ((a + f + k[i] + w[g]) leftrotate r[i]) + b
        a := temp
    end for

    // Add the result of this block to the running hash:
    h0 := h0 + a
    h1 := h1 + b
    h2 := h2 + c
    h3 := h3 + d
end for

var int digest := h0 append h1 append h2 append h3 // (in little-endian)
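The pseudocode above translates fairly directly into Python. The following is an educational sketch only, assuming a message made of whole bytes; the names md5_sketch, leftrotate and agrees_with_hashlib are assumptions of this example, and real programs should use a maintained library such as hashlib, which the sketch is compared against.

```python
import hashlib
import math
import struct

MASK = 0xFFFFFFFF
# Per-round left-rotation amounts r[0..63].
R = ([7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 +
     [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4)
# Constants k[i] = floor(abs(sin(i + 1)) * 2^32).
K = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]

def leftrotate(x, s):
    x &= MASK
    return ((x << s) | (x >> (32 - s))) & MASK

def md5_sketch(message):
    """Return the hexadecimal MD5 fingerprint of a bytes object."""
    h0, h1, h2, h3 = 0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476
    # Padding: 0x80, zero bytes up to 56 mod 64, then the bit length.
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack("<Q", (8 * len(message)) % 2**64)
    for offset in range(0, len(padded), 64):
        w = struct.unpack("<16I", padded[offset:offset + 64])
        a, b, c, d = h0, h1, h2, h3
        for i in range(64):
            if i < 16:
                f, g = (b & c) | (~b & d), i
            elif i < 32:
                f, g = (d & b) | (~d & c), (5 * i + 1) % 16
            elif i < 48:
                f, g = b ^ c ^ d, (3 * i + 5) % 16
            else:
                f, g = c ^ (b | ~d), (7 * i) % 16
            f &= MASK  # Python's ~ yields negative numbers; clamp to 32 bits
            rotated = leftrotate(a + f + K[i] + w[g], R[i])
            # Simultaneous update: the right-hand side uses the old values.
            a, d, c, b = d, c, b, (rotated + b) & MASK
        h0, h1 = (h0 + a) & MASK, (h1 + b) & MASK
        h2, h3 = (h2 + c) & MASK, (h3 + d) & MASK
    # Concatenate the four words in little-endian byte order.
    return struct.pack("<4I", h0, h1, h2, h3).hex()

def agrees_with_hashlib(data):
    """Cross-check the sketch against the standard library."""
    return md5_sketch(data) == hashlib.md5(data).hexdigest()

print(md5_sketch(b"abc"))
```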

Computing the fingerprint

Initialization

Consider the following registers A, B, C and D:

  • A = 0x67452301
  • B = 0xefcdab89
  • C = 0x98badcfe
  • D = 0x10325476

As well as the following functions:

  • F (X, Y, Z) = (X and Y) or (not (X) and Z) => (X & Y) | (~X & Z)
  • G (X, Y, Z) = (X and Z) or (not (Z) and Y) => (X & Z) | (~Z & Y)
  • H (X, Y, Z) = X xor Y xor Z => X ^ Y ^ Z
  • I (X, Y, Z) = Y xor (X or not (Z)) => Y ^ (X | ~Z)

(the second notation corresponds to the C language)

  • FF (a, b, c, d, k, s, i) => a = b + ((a + F (b, c, d) + X[k] + T[i]) <<< s)
  • GG (a, b, c, d, k, s, i) => a = b + ((a + G (b, c, d) + X[k] + T[i]) <<< s)
  • HH (a, b, c, d, k, s, i) => a = b + ((a + H (b, c, d) + X[k] + T[i]) <<< s)
  • II (a, b, c, d, k, s, i) => a = b + ((a + I (b, c, d) + X[k] + T[i]) <<< s)

(<<< s represents a circular left rotation by s bits)
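A single operation of this kind can be sketched in Python as one generic helper that takes the round function as a parameter. The names step, rotl32 and MASK32 are assumptions of this illustration; x_k and t_i stand for the message word and the table constant selected by the indices k and i.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, s):
    """Circular left rotation of a 32-bit word by s bits."""
    x &= MASK32
    return ((x << s) | (x >> (32 - s))) & MASK32

def F(x, y, z): return ((x & y) | (~x & z)) & MASK32
def G(x, y, z): return ((x & z) | (~z & y)) & MASK32
def H(x, y, z): return (x ^ y ^ z) & MASK32
def I(x, y, z): return (y ^ (x | ~z)) & MASK32

def step(func, a, b, c, d, x_k, s, t_i):
    """Return the new value of register a after one FF/GG/HH/II operation."""
    return (b + rotl32(a + func(b, c, d) + x_k + t_i, s)) & MASK32

# FF corresponds to step(F, ...), GG to step(G, ...), and so on.
print(step(H, 0, 1, 2, 4, 0, 4, 0))
```

Passing the round function as an argument avoids writing four nearly identical helpers.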

Note that it is often more efficient to use the equivalent functions:

  • F (X, Y, Z) = ((Z xor Y) and X) xor Z => ((Z ^ Y) & X) ^ Z
  • G (X, Y, Z) = ((Y xor X) and Z) xor Y => ((Y ^ X) & Z) ^ Y
  • I (X, Y, Z) = ((Z xor -1) or X) xor Y => ((Z ^ -1) | X) ^ Y
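The equivalence of the two formulations can be checked mechanically. Since every function acts independently on each bit position, it is enough to test the eight combinations of all-zero and all-one words; the sketch below does this in Python, with helper names that are assumptions of this example.

```python
import itertools

MASK32 = 0xFFFFFFFF

# Reference definitions.
def F(x, y, z): return ((x & y) | (~x & z)) & MASK32
def G(x, y, z): return ((x & z) | (~z & y)) & MASK32
def I(x, y, z): return (y ^ (x | ~z)) & MASK32

# Equivalent forms using fewer operations.
def F_fast(x, y, z): return (((z ^ y) & x) ^ z) & MASK32
def G_fast(x, y, z): return (((y ^ x) & z) ^ y) & MASK32
def I_fast(x, y, z): return (((z ^ -1) | x) ^ y) & MASK32

def agree_on_all_bits(slow, fast):
    """Compare two bitwise functions on every 0/1 input combination."""
    return all(slow(x, y, z) == fast(x, y, z)
               for x, y, z in itertools.product((0, MASK32), repeat=3))

for slow, fast in ((F, F_fast), (G, G_fast), (I, I_fast)):
    print(slow.__name__, agree_on_all_bits(slow, fast))
```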

And the following table T:

  • T[0] = 0xd76aa478
  • T[1] = 0xe8c7b756
  • T[2] = 0x242070db
  • T[3] = 0xc1bdceee
  • T[4] = 0xf57c0faf
  • T[5] = 0x4787c62a
  • T[6] = 0xa8304613
  • T[7] = 0xfd469501
  • T[8] = 0x698098d8
  • T[9] = 0x8b44f7af
  • T[10] = 0xffff5bb1
  • T[11] = 0x895cd7be
  • T[12] = 0x6b901122
  • T[13] = 0xfd987193
  • T[14] = 0xa679438e
  • T[15] = 0x49b40821
  • T[16] = 0xf61e2562
  • T[17] = 0xc040b340
  • T[18] = 0x265e5a51
  • T[19] = 0xe9b6c7aa
  • T[20] = 0xd62f105d
  • T[21] = 0x02441453
  • T[22] = 0xd8a1e681
  • T[23] = 0xe7d3fbc8
  • T[24] = 0x21e1cde6
  • T[25] = 0xc33707d6
  • T[26] = 0xf4d50d87
  • T[27] = 0x455a14ed
  • T[28] = 0xa9e3e905
  • T[29] = 0xfcefa3f8
  • T[30] = 0x676f02d9
  • T[31] = 0x8d2a4c8a
  • T[32] = 0xfffa3942
  • T[33] = 0x8771f681
  • T[34] = 0x6d9d6122
  • T[35] = 0xfde5380c
  • T[36] = 0xa4beea44
  • T[37] = 0x4bdecfa9
  • T[38] = 0xf6bb4b60
  • T[39] = 0xbebfbc70
  • T[40] = 0x289b7ec6
  • T[41] = 0xeaa127fa
  • T[42] = 0xd4ef3085
  • T[43] = 0x04881d05
  • T[44] = 0xd9d4d039
  • T[45] = 0xe6db99e5
  • T[46] = 0x1fa27cf8
  • T[47] = 0xc4ac5665
  • T[48] = 0xf4292244
  • T[49] = 0x432aff97
  • T[50] = 0xab9423a7
  • T[51] = 0xfc93a039
  • T[52] = 0x655b59c3
  • T[53] = 0x8f0ccc92
  • T[54] = 0xffeff47d
  • T[55] = 0x85845dd1
  • T[56] = 0x6fa87e4f
  • T[57] = 0xfe2ce6e0
  • T[58] = 0xa3014314
  • T[59] = 0x4e0811a1
  • T[60] = 0xf7537e82
  • T[61] = 0xbd3af235
  • T[62] = 0x2ad7d2bb
  • T[63] = 0xeb86d391
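The table does not need to be typed in by hand: as the pseudocode earlier in the article notes, each constant is the integer part of abs(sin(i + 1)) × 2^32. A minimal Python sketch, assuming 0-based indexing of the 64 entries, regenerates the table and can be compared with the first and last values listed above.

```python
import math

# Entry i is floor(abs(sin(i + 1)) * 2^32), with i + 1 taken in radians.
T = [int(abs(math.sin(i + 1)) * 2**32) for i in range(64)]

print(hex(T[0]))
print(hex(T[63]))
```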

M represents the message (original message + padding + length of the original message). M[n] represents the nth 32-bit word of the message.
N is the number of 32-bit words.
X is an array of 16 words of 32 bits.
AA, BB, CC and DD are buffer registers.

Calculation

Here is the pseudocode corresponding to the computations to carry out:

For i from 0 to N/16 - 1 do
    For j from 0 to 15 do
        X[j] = M[i*16 + j]
    End for j
    AA = A
    BB = B
    CC = C
    DD = D

    /* Round 1 */
    FF (A, B, C, D,  0,  7,  0)  FF (D, A, B, C,  1, 12,  1)  FF (C, D, A, B,  2, 17,  2)  FF (B, C, D, A,  3, 22,  3)
    FF (A, B, C, D,  4,  7,  4)  FF (D, A, B, C,  5, 12,  5)  FF (C, D, A, B,  6, 17,  6)  FF (B, C, D, A,  7, 22,  7)
    FF (A, B, C, D,  8,  7,  8)  FF (D, A, B, C,  9, 12,  9)  FF (C, D, A, B, 10, 17, 10)  FF (B, C, D, A, 11, 22, 11)
    FF (A, B, C, D, 12,  7, 12)  FF (D, A, B, C, 13, 12, 13)  FF (C, D, A, B, 14, 17, 14)  FF (B, C, D, A, 15, 22, 15)

    /* Round 2 */
    GG (A, B, C, D,  1,  5, 16)  GG (D, A, B, C,  6,  9, 17)  GG (C, D, A, B, 11, 14, 18)  GG (B, C, D, A,  0, 20, 19)
    GG (A, B, C, D,  5,  5, 20)  GG (D, A, B, C, 10,  9, 21)  GG (C, D, A, B, 15, 14, 22)  GG (B, C, D, A,  4, 20, 23)
    GG (A, B, C, D,  9,  5, 24)  GG (D, A, B, C, 14,  9, 25)  GG (C, D, A, B,  3, 14, 26)  GG (B, C, D, A,  8, 20, 27)
    GG (A, B, C, D, 13,  5, 28)  GG (D, A, B, C,  2,  9, 29)  GG (C, D, A, B,  7, 14, 30)  GG (B, C, D, A, 12, 20, 31)

    /* Round 3 */
    HH (A, B, C, D,  5,  4, 32)  HH (D, A, B, C,  8, 11, 33)  HH (C, D, A, B, 11, 16, 34)  HH (B, C, D, A, 14, 23, 35)
    HH (A, B, C, D,  1,  4, 36)  HH (D, A, B, C,  4, 11, 37)  HH (C, D, A, B,  7, 16, 38)  HH (B, C, D, A, 10, 23, 39)
    HH (A, B, C, D, 13,  4, 40)  HH (D, A, B, C,  0, 11, 41)  HH (C, D, A, B,  3, 16, 42)  HH (B, C, D, A,  6, 23, 43)
    HH (A, B, C, D,  9,  4, 44)  HH (D, A, B, C, 12, 11, 45)  HH (C, D, A, B, 15, 16, 46)  HH (B, C, D, A,  2, 23, 47)

    /* Round 4 */
    II (A, B, C, D,  0,  6, 48)  II (D, A, B, C,  7, 10, 49)  II (C, D, A, B, 14, 15, 50)  II (B, C, D, A,  5, 21, 51)
    II (A, B, C, D, 12,  6, 52)  II (D, A, B, C,  3, 10, 53)  II (C, D, A, B, 10, 15, 54)  II (B, C, D, A,  1, 21, 55)
    II (A, B, C, D,  8,  6, 56)  II (D, A, B, C, 15, 10, 57)  II (C, D, A, B,  6, 15, 58)  II (B, C, D, A, 13, 21, 59)
    II (A, B, C, D,  4,  6, 60)  II (D, A, B, C, 11, 10, 61)  II (C, D, A, B,  2, 15, 62)  II (B, C, D, A,  9, 21, 63)

    A = AA + A
    B = BB + B
    C = CC + C
    D = DD + D
End for i

Results

The result is the concatenation of the registers A, B, C and D, from the low-order byte of A to the high-order byte of D.

That is:
(Digest is a string of 16 characters, Digest[n] is the nth character; the 1st byte is the low-order byte, …, the 4th is the high-order byte)

Digest[0] = 1st byte of A
Digest[1] = 2nd byte of A
Digest[2] = 3rd byte of A
Digest[3] = 4th byte of A
Digest[4] = 1st byte of B
Digest[5] = 2nd byte of B
Digest[6] = 3rd byte of B
Digest[7] = 4th byte of B
Digest[8] = 1st byte of C
Digest[9] = 2nd byte of C
Digest[10] = 3rd byte of C
Digest[11] = 4th byte of C
Digest[12] = 1st byte of D
Digest[13] = 2nd byte of D
Digest[14] = 3rd byte of D
Digest[15] = 4th byte of D

The fingerprint of M is Digest.
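In Python, this byte ordering corresponds to packing the four registers as little-endian 32-bit words with the standard struct module. The sketch below applies it to the initial register values purely as sample data, which makes the ordering easy to see; the function name registers_to_digest is an assumption of this example.

```python
import struct

def registers_to_digest(a, b, c, d):
    """Concatenate four 32-bit registers, low-order byte first."""
    return struct.pack("<4I", a, b, c, d)

# Sample input: the initialization constants, not a real message digest.
digest = registers_to_digest(0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)
print(digest.hex())  # prints 0123456789abcdeffedcba9876543210
```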


External links

Implementations

RFC 1321, which details the algorithm:
  • RFC 1321 (in English)

A great number of implementations exist for various architectures and languages, open source or not. MD5 is now built into the standard API of several languages such as Python or Java.

  • LibTomCrypt (C/C++)
  • Gnu Crypto (C/C++)
  • A simple implementation in C
  • Fast MD5 implementation (Java)
  • MessageDigest class (Java)
  • MD5 (Python library)

Cryptanalysis

  • "Cryptanalysis of MD5 Compress" by Hans Dobbertin
  • "New way of cryptanalyzing MD5" by Buddhi Madhav
  • "Practical Attacks on Digital Signatures Using MD5 Message Digest" by Ondrej Mikle
  • "Finding MD5 Collisions - a Toy For a Notebook" by Vlastimil Klíma
  • "Musings on the Wang et al. MD5 Collision" by P. Hawkes et al. (precise and formal description of the search for collisions)

Inversion

  • Online decoder, based on a dictionary system
  • Another site that can reverse MD5 sums from a table. Approximately 400,000 entries.
  • Another site that can reverse MD5 sums from a table. Approximately 30,000,000 entries.
  • An MD5 collision generator
  • Rainbow tables for reversing various types of hashes
