5.1 Security Considerations for Implementers
RDC relies on a relatively weak hash function (MD4) to identify its chunks. MD4 was chosen for three reasons. It requires few cycles per byte. It has a low startup cost, which matters because a hash is computed for every chunk, so several hashes are computed per file. It is also still a reasonably collision-resistant hash function. This trade-off reflects the fact that the main purpose of RDC is to speed up most (but not necessarily all) file transfers. RDC itself does not guarantee that a transfer is accurate; integrity should instead be checked with a strong hash function. Such a hash can be computed while traversing the source file and sent at the beginning or end of the transfer.
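The point about computing a strong hash while traversing the source file can be illustrated with a minimal sketch. It makes several assumptions. Chunks are fixed-size (`CHUNK_SIZE` is a hypothetical parameter; RDC itself uses content-defined chunk boundaries). MD5 stands in for MD4, because MD4 is often unavailable in modern OpenSSL-backed Python builds. SHA-256 serves as an example strong hash. The function name is hypothetical.

```python
import hashlib
from typing import BinaryIO, List, Tuple

CHUNK_SIZE = 4096  # hypothetical fixed chunk size; real RDC chunk boundaries are content-defined


def chunk_signatures_and_file_hash(stream: BinaryIO) -> Tuple[List[Tuple[int, bytes]], bytes]:
    """Read the source once, producing per-chunk signatures and a whole-file strong hash."""
    signatures: List[Tuple[int, bytes]] = []
    strong = hashlib.sha256()  # strong whole-file hash (example choice)
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        # Per-chunk signature: length plus a fast hash (MD5 stands in for MD4 here).
        signatures.append((len(chunk), hashlib.md5(chunk).digest()))
        # Update the strong hash from the same bytes, so no second pass over the file is needed.
        strong.update(chunk)
    return signatures, strong.digest()
```

Because both hashes are fed from the same read, the strong digest costs one extra hash update per chunk rather than a second traversal. It can then be sent before or after the chunk signatures.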
Thus, it is possible (and, in rare cases, to be expected) that two distinct chunks of data have the same length and the same MD4 hash value. In that case, using RDC can produce a target file that differs from the source file.
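To make this failure mode concrete, the following toy sketch deliberately replaces MD4 with a very weak checksum: the byte sum modulo 256, a hypothetical stand-in chosen only so that a collision is easy to construct. The target reuses a local chunk whenever its (length, hash) pair matches a source chunk's signature. This is a simplification of RDC's matching, assumed for illustration. Two distinct chunks with equal length and equal hash therefore yield a reconstructed file that differs from the source.

```python
from typing import Dict, List, Tuple

Signature = Tuple[int, int]  # (chunk length, weak hash)


def toy_weak_hash(chunk: bytes) -> int:
    # Deliberately weak stand-in for MD4: byte sum modulo 256, which collides trivially.
    return sum(chunk) % 256


def signature(chunk: bytes) -> Signature:
    return (len(chunk), toy_weak_hash(chunk))


def reconstruct(source_chunks: List[bytes], target_chunks: List[bytes]) -> bytes:
    """Rebuild the source on the target side, reusing any local chunk whose signature matches."""
    local: Dict[Signature, bytes] = {signature(c): c for c in target_chunks}
    out = bytearray()
    for chunk in source_chunks:
        sig = signature(chunk)
        if sig in local:
            out += local[sig]  # reused locally: trusted on the signature alone
        else:
            out += chunk  # in a real transfer, these bytes would be sent over the network
    return bytes(out)


source = [b"ab", b"cd"]
target = [b"ba", b"cd"]  # b"ba" has the same length and byte sum as b"ab"
rebuilt = reconstruct(source, target)
print(rebuilt == b"".join(source))  # prints False: the result is b"bacd", not b"abcd"
```

MD4 collisions are far harder to hit by accident than collisions of this toy checksum. The structure of the error is the same, however: nothing in the chunk-matching step itself can detect it.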
An application using RDC should have an alternative means, independent of RDC, to determine whether the target file has the same content as the original source file.<5>
One way to achieve this goal is to compute a hash (using an application-selected hash algorithm) over the entire contents of both the source file and the target file. If the hashes do not match, the application can invoke a recovery operation (such as transferring the source file without RDC). If the input data to RDC could be generated by an untrustworthy source, this secondary hash should be cryptographically secure, so that an attacker cannot deliberately craft a file update that produces a signature collision and a consequent erroneous transfer.
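The check and recovery path described above can be sketched as follows. The sketch assumes the sender transmits the SHA-256 digest of the whole source file; SHA-256 is one example of a cryptographically secure choice. `verify_or_recover` and `fetch_full_file` are hypothetical names. `fetch_full_file` is a callback standing in for whatever non-RDC transfer the application uses for recovery.

```python
import hashlib
from typing import Callable


def verify_or_recover(
    reconstructed: bytes,
    expected_sha256: bytes,
    fetch_full_file: Callable[[], bytes],
) -> bytes:
    """Return the RDC-reconstructed file if its whole-file hash matches; otherwise recover."""
    if hashlib.sha256(reconstructed).digest() == expected_sha256:
        return reconstructed
    # Mismatch: fall back to a full transfer that does not use RDC.
    full = fetch_full_file()
    # Verify the fallback too, so a corrupted recovery is not silently accepted.
    if hashlib.sha256(full).digest() != expected_sha256:
        raise ValueError("full transfer also failed the integrity check")
    return full
```

For example, if RDC produced `b"bacd"` for a source of `b"abcd"`, the hash comparison fails and the function returns the fully transferred `b"abcd"` instead. Using a cryptographic hash here, rather than another fast checksum, is what prevents an attacker from crafting content that passes this final check.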