5.1 Security Considerations for Implementers

RDC relies on a relatively weak hash function, MD4, for its chunks. MD4 was chosen because it requires few cycles per byte, has a relatively low startup cost (which matters because several hashes are computed per file, one per chunk), and is still a reasonably collision-resistant hash function. This choice is to be understood in context: the main purpose of RDC is to speed up many (but not necessarily all) file transfers. RDC itself does not guarantee that the transfer is accurate; integrity is instead to be checked with a strong hash function. A strong hash can be computed while traversing the source file and sent at the beginning or end of a transfer.
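The last point, computing a strong hash in the same pass that traverses the source file, can be sketched as follows. This is a minimal illustration and not part of RDC: the function name send_with_whole_file_digest, the send_chunk callback, and the choice of SHA-256 are all assumptions, and the chunks are taken as given rather than produced by the RDC chunking algorithm.

```python
import hashlib
from typing import Callable, Iterable


def send_with_whole_file_digest(
    chunks: Iterable[bytes],
    send_chunk: Callable[[bytes], None],
) -> str:
    """Hand each chunk to send_chunk and return a SHA-256 digest of all bytes seen.

    Hypothetical helper: `chunks` stands in for whatever chunk sequence the
    application already walks; `send_chunk` stands in for its transport.
    """
    whole_file = hashlib.sha256()
    for chunk in chunks:
        # Update the running whole-file digest in the same pass as the transfer,
        # so the source file is read only once.
        whole_file.update(chunk)
        send_chunk(chunk)
    return whole_file.hexdigest()
```

The returned digest would then be sent alongside the transfer so that the receiver can compare it against a digest of the reconstructed target file.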

Thus, it is possible (and, rarely, even to be expected) that two distinct chunks of data have the same length and the same MD4 hash value. In this case, the use of RDC can result in the construction of a target file that is, in fact, distinct from the source file.

It is recommended that an application using RDC use an alternate means outside of RDC to determine whether the target file has the same content as the original source file.<5>

One way to achieve this goal is to compute a hash (using some application-selected hash algorithm) on the entire contents of the source file and the target file. If the hashes do not match, the application could invoke some sort of recovery operation (such as transferring the source file without the use of RDC). If the input data to RDC might be generated by a source that is not trustworthy, ensure that this secondary hash is cryptographically secure, so that an attacker cannot intentionally generate a file update that results in a signature collision and a consequent errant transfer.
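The check-and-recover approach described above might look like the following sketch. It assumes SHA-256 as the application-selected algorithm and that the source digest has already been received by some means; file_digest, verify_or_recover, and the full_transfer callback are hypothetical names introduced only for illustration.

```python
import hashlib
from typing import Callable


def file_digest(path: str, algorithm: str = "sha256", block_size: int = 1 << 20) -> str:
    """Return the hex digest of the whole file, read in fixed-size blocks."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            hasher.update(block)
    return hasher.hexdigest()


def verify_or_recover(
    source_digest: str,
    target_path: str,
    full_transfer: Callable[[], None],
    algorithm: str = "sha256",
) -> bool:
    """Compare the target file against the source digest; fall back on mismatch.

    `full_transfer` is a hypothetical callback that re-fetches the source file
    without RDC and overwrites `target_path`.
    """
    if file_digest(target_path, algorithm) == source_digest:
        return True
    # Mismatch: the RDC-built target differs from the source, so recover.
    full_transfer()
    # Re-check, because the recovery transfer can also fail.
    return file_digest(target_path, algorithm) == source_digest
```

Whether to retry, report an error, or discard the target when the second check also fails is left to the application.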