Export (0) Print
Expand All Finding Chunk Boundaries

Given a file F of length n consisting of bytes b0 .. bn–1 and parameters windowSize and horizon, RDC finds the chunk boundaries. windowSize MUST be an integer that is at least 2 and no more than 96. horizon MUST be an integer that is at least 128 and less than 16,384. Recall that windowSize and horizon are inputs to RDC, and MUST be negotiated between the two parties participating in a transfer using RDC.

A byte bi is a local maximum if i > horizon and for all j ≠ i where i - horizon ≤ j ≤ i + horizon, the H3 hash value after adding bi is greater than that after adding any bj.

To determine the chunks of F, define the first chunk to start with b0. Work through each consecutive byte from b1 to bn–1. A chunk starts at a byte that is a local maximum or that is 65,535 bytes beyond the last byte that started a chunk.

Thus, the following is a recursive definition of chunk boundaries.

Note  Subscripts are expressed with parentheses (for example, "hash_valuei" would be expressed as "hash_value(i)").

File position 0 is a chunk boundary
File position n > 0 is a chunk boundary if:
      FORALL k in { n-horizon .. n+horizon } \ { n }
          hash_value(n) > hash_value(k)
     or     n - 65,535 is greater than or equal to 0 and 
            n - 65,535 is a chunk boundary and
            FORALL k in { n - 65,534 .. n - 1 } 
                  k is not a chunk boundary.

Note  The "\" operator means exclusion; so in this construct, it means all values in the range except n.

Define boundaryi to be the ith chunk boundary.

© 2014 Microsoft