3.1.5.4.2 Finding Seed Files

Using the similarity data that has been calculated for files on the target location, it might be possible to find RDC seed files for files that have been added to the source location but not to the target location. The similarity data for a file consists of 16 bytes; each byte is called a similarity trait.

When the source location has a new file that requires replication on the target location, the application on the source location can send the similarity data of the new file to the target location. The application on the target location compares the similarity traits of the new file to the similarity traits of each of the target location's existing files, counting the similarity traits that are the same (that is, that match). The application can then determine which of the existing files to use as seed files, based on the number of similarity trait matches; for example, based on a threshold or a majority scheme. The choice of scheme is implementation-specific.<4>

Note that similarity traits are compared on a one-to-one basis so that there are at most 16 comparisons for each target location file. That is, the first similarity trait of the new file is compared to the first similarity trait of a target location file, the second similarity trait is compared to the second similarity trait, and so on.
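
The one-to-one comparison described above can be sketched as follows. This is an illustrative sketch only, not part of the protocol: the function name count_matching_traits and the constant TRAIT_COUNT are hypothetical, and the sketch assumes that the similarity data of each file is available as a 16-byte value in which each byte is one similarity trait.

```python
TRAIT_COUNT = 16  # assumed: one byte per similarity trait, 16 traits per file


def count_matching_traits(new_file_traits: bytes, target_traits: bytes) -> int:
    """Count the positions at which both files hold the same trait byte.

    Trait i of the new file is compared only with trait i of the target
    file, so exactly TRAIT_COUNT comparisons are made per target file.
    """
    if len(new_file_traits) != TRAIT_COUNT or len(target_traits) != TRAIT_COUNT:
        raise ValueError("similarity data must be exactly 16 bytes")
    return sum(a == b for a, b in zip(new_file_traits, target_traits))
```

Because the comparison is positional, a trait value that appears in both files but at different positions does not count as a match.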

Any number of target location files can be used as seed files for an RDC operation. Target location files with the greatest number of similarity traits that match the similarity traits of the new source location file are generally the best candidates.
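
One possible selection scheme, combining a match threshold with a ranking by match count, is sketched below. The scheme itself is implementation-specific, so everything here is an assumption made for illustration: the function name select_seed_files, the parameters min_matches and max_seeds, and their default values are hypothetical and are not defined by this protocol.

```python
from typing import Dict, List, Tuple


def select_seed_files(
    new_traits: bytes,
    target_files: Dict[str, bytes],
    min_matches: int = 4,   # hypothetical threshold, not defined by the protocol
    max_seeds: int = 3,     # hypothetical cap on the number of seed files
) -> List[Tuple[str, int]]:
    """Return up to max_seeds (file name, match count) pairs, best first."""
    scored = []
    for name, traits in target_files.items():
        # Positional comparison: trait i is compared only with trait i.
        matches = sum(a == b for a, b in zip(new_traits, traits))
        if matches >= min_matches:
            scored.append((name, matches))
    # Highest match count first; ties are broken by file name for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:max_seeds]
```

For example, given a new file whose traits fully match one target file, half match a second, and do not match a third, this sketch returns the first two files in that order when min_matches is 4 and max_seeds is 2.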