In building a stateless web services API (e.g. a RESTful one), developers often must consider the chance that an attacker might attempt to spoof a call on behalf of a legitimate client. For example, if the attacker—Evil Enterprises—were to spoof a request to an event-logging API used legitimately by Alpha Incorporated with garbage or inaccurate values, all sorts of data integrity issues might ensue. To guard against this type of spoofing, many APIs require a “Message Authentication Code” (MAC) to be provided along with the API call, allowing the server to identify that a) the message originated from whom it is claimed to, b) that the message has not been modified in transit, and optionally c) that the message has not been previously submitted (i.e. “replayed”). However, many implementers—including some of the most famous sites—have made the naive assumption that an MD5 signature of the parameters of the request—i.e. signature=md5(key + param1 + param2 … paramn)—can provide these guarantees.
The trouble with the above formula, however, is that most hash functions, MD5 very much included, all have “second preimage vulnerabilities,” meaning that, given Hash1 = hash(M1) it is computationally possible to find a second message, M2, that yields the same hash value. In this case, given the signature, S, from a captured, legitimate request, an attacker can come up with extra text (a.k.a. “chaff”) C such that the hash function’s output even without the secret key is a known valid hash of the original request S = MD5(key + param1 + param2) = MD5(param100 + param101 + C).
Moreover, switching the sequence of the MAC generator function to signature=md5(param1 + param2 … paramn + key) (i.e. a “secret suffix”) still leaves the vulnerability that if the attacker can find a collision in the hash function, he knows that he’ll still have a collision when he appends the secret. Thus the security of the MAC is illusory.
- H(·) be a cryptographic hash function
- K be a secret key padded to the right with extra zeros to the block size of the hash function
- m be the message to be authenticated
- ∥ denote concatenation
- ⊕ denote exclusive or (XOR)
- opad be the outer padding (0x5c5c5c…5c5c, one-block-long hexadecimal constant)
- ipad be the inner padding (0×363636…3636, one-block-long hexadecimal constant)
Then HMAC(K,m) is mathematically defined by HMAC(K,m) = H((K ⊕ opad) ∥ H((K ⊕ ipad) ∥ m)).
In this way, the signature of the message can be made robust against spoofing, and can even remain strong despite the known collision vulnerabilities against MD5 and SHA1. That said, anybody implementing a new system should skip over MD5 and SHA1 and go straight to HMAC-SHA256.