I want tools that will do several things for us:
Version 3.0.8 of rsync supports incremental scans, so it no longer has to build one giant list of all the files to be copied before it starts. It also calculates an MD5 checksum for each file. I modified rsync to write out the checksum and the file name, but because several writers run concurrently, the output is sometimes stepped on: lines from one writer overwrite or interleave with lines from another.
The dar program can create an archive as a series of slices instead of one giant tarball-like file. The catalog can be extracted separately, and the slices can be combined with Parchive, which adds redundancy so that the data can be reconstructed despite some corruption.
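One common way to keep concurrent writers from stepping on each other is to build each record as a complete line and write it while holding a lock. The sketch below shows the idea in Python rather than in rsync's own C code; the names `LineWriter` and `checksum_all` are made up for illustration, and the file contents are passed in as bytes only to keep the example self-contained.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor


class LineWriter:
    """Serialize whole-line writes from several threads to one stream."""

    def __init__(self, stream):
        self._stream = stream
        self._lock = threading.Lock()

    def write_record(self, checksum, name):
        # Build the full line first, then write it under the lock so that
        # records from different threads can never interleave.
        line = f"{checksum}  {name}\n"
        with self._lock:
            self._stream.write(line)


def checksum_all(files, stream, workers=4):
    """Write one 'md5  name' line per entry of files (a name -> bytes dict)."""
    writer = LineWriter(stream)

    def work(name, data):
        writer.write_record(hashlib.md5(data).hexdigest(), name)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(work, name, data) for name, data in files.items()]
        for future in futures:
            future.result()  # re-raise any error from a worker thread
```

The order of the lines still depends on thread scheduling, so the output would need sorting by file name before two runs are compared.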
Tape drives waste tape when many small files are written, and disk drives also prefer larger chunks aligned to their block boundaries.
Partitioning the copy into multiple sections that can be farmed out to a processor farm is therefore an important technical problem to solve.
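As a rough sketch of one way to do the partitioning, the function below spreads files over a fixed number of sections so that the total size per section comes out roughly even. It uses a simple greedy rule (largest file first, always into the currently smallest section). This is only one possible approach, not a settled design, and the function name and the (name, size) input format are assumptions.

```python
import heapq


def partition_by_size(files, sections):
    """Split (name, size) pairs into `sections` lists of roughly equal total size.

    Greedy heuristic: take files from largest to smallest and put each one
    into the section with the smallest running total.
    """
    if sections < 1:
        raise ValueError("sections must be at least 1")

    # Heap of (running_total, section_index); the smallest total is popped first.
    heap = [(0, index) for index in range(sections)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(sections)]

    for name, size in sorted(files, key=lambda item: item[1], reverse=True):
        total, index = heapq.heappop(heap)
        buckets[index].append(name)
        heapq.heappush(heap, (total + size, index))

    return buckets
```

Each resulting list could then be handed to one farm job as its file list. The heuristic ignores directory locality, so a real tool might prefer to keep files from the same directory together.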
Writing to a single archive disk array will be limited by the write speed of that array. This will limit the number of farm jobs that can be active.
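The limit can be estimated with simple arithmetic: divide the sustained write rate of the array by the rate a single farm job produces. The helper below is only a back-of-the-envelope sketch; the function name is hypothetical, and both rates would have to be measured on the real hardware.

```python
def max_active_jobs(array_write_mb_s, per_job_mb_s):
    """Estimate how many farm jobs one array can absorb at full speed.

    Both arguments are sustained rates in MB/s. At least one job is always
    allowed, even if it would then be throttled by the array.
    """
    if array_write_mb_s <= 0 or per_job_mb_s <= 0:
        raise ValueError("rates must be positive")
    return max(1, int(array_write_mb_s // per_job_mb_s))
```

For example, with an assumed array rate of 400 MB/s and jobs that each produce 50 MB/s, the estimate is 8 concurrent jobs; anything beyond that would just queue up behind the array.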
I don't know of any direct way to have multiple computers write to a single tape drive; that requires an intermediary program. The farm computers don't have direct tape access in any event. Writing to tape therefore means that the data has to be staged to disk first.