Step 1: Copy files from lustre to the array.
time ~jbellinger/rsync/rsync-3.0.8/rsync -rLptgoDvWP --inplace /data/exp/IceCube/2005/FAT /data/F00/ >& /mnt/space/testcopy/step1.log
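For reference (my reading of the rsync man page): -r recurse, -L copy the files that symlinks point to, -p/-t/-g/-o preserve permissions/times/group/owner, -D preserve devices and specials, -v verbose, -W copy whole files (skip the delta-transfer algorithm), -P show progress and keep partial files, --inplace write directly into the destination files.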
Step 2: Do it again, and compare times
Trial | Real(min) | User(min) | Sys(min) |
1 rsync lustre to disk | 130.5 | 7.5 | 10 |
2 rsync lustre to disk | 128.5 | 7.5 | 10 |
Step 3: Use dar to create stripes from the array to the array
export PATH=$PATH:/mnt/space/dar/dar-2.4.7/src/dar_suite
dar -s 1024M -c /data/F02/slices/FAT_Trial -R /data/F01/FAT
This refused to work as a background process! I had to reconnect it to the terminal.
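A possible workaround I have not tested here: if I read the dar documentation right, -Q is the option intended for batch/background runs, so something along these lines might survive being detached from the terminal (the log path is just an example):
nohup dar -Q -s 1024M -c /data/F02/slices/FAT_Trial -R /data/F01/FAT > /mnt/space/testcopy/dar_step3.log 2>&1 &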
Trial | Real(min) |
dar disk to disk | 37 |
Step 4: Copy stripes from the array to the array, using rsync
time ~jbellinger/rsync/rsync-3.0.8/rsync -rLptgoDvWP --inplace /data/F02/slices /data/F03/ > /mnt/space/testcopy/step3.log 2>/mnt/space/testcopy/step3.err
Trial | Real(min) |
rsync stripes disk to disk | 21.5 |
Step 5: Copy files from the array to the array, using rsync
time ~jbellinger/rsync/rsync-3.0.8/rsync -rLptgoDvWP --inplace /data/F01/FAT /data/F03/ > /mnt/space/testcopy/step4.log 2>/mnt/space/testcopy/step4.err
Trial | Real(min) |
rsync files disk to disk | 53.5 |
First conclusion: The combination of using dar to create slices and rsync to copy the slices (37 + 21.5 = 58.5 minutes) is about 10% slower than using rsync to copy the individual files (53.5 minutes). In other words, the times are comparable. The overhead of writing many small files is tangible.
Step 6: Copy files from lustre to stripes in the array, using dar
export PATH=$PATH:/mnt/space/dar/dar-2.4.7/src/dar_suite
dar -s 1024M -c /data/F03/slices2/slices2 -R /data/exp/IceCube/2005/FAT
Trial | Real(min) |
dar lustre to disk | 110.5 |
There are several unknowns. Since rsync writes so much to the log file, I'm assuming that this dominates its processing time. It also calculates a checksum, and that may not actually be trivial.
Step 7: Check dar copy NFS stripe to NFS stripe with different block size
dar -s 2048M -c /data/F03/slices3/FAT_Trial -R /data/F02/slices
Trial | Real(min) |
dar NFS to NFS disk new stripe size | 26 |
Step 8: Check rsync NFS file copy without checksum or terminal logging. Note the absence of the P and v options.
time ~jbellinger/rsync/rsync-3.0.8/rsync -rLptgoDW --inplace /data/F01/FAT /data/F02/FATX/ > /mnt/space/testcopy/step8.log 2>/mnt/space/testcopy/step8.err
Trial | Real(min) |
rsync NFS to NFS disk non-verbose | 51.5 |
Name | Description |
RL | Time required to read the files from lustre |
RD | Time required to read the files from NFS array |
RS | Time required to read the stripes from NFS array |
WD | Time required to write the files to NFS array |
WS | Time required to write the stripes to NFS array |
P | Dar Processing time |
R | Rsync processing time (dominated by log output?) for files |
0 | rsync processing time for the stripes is presumed negligible |
From the above we have several equations:
RL+(R+WD)=130
RD+(R+WD)=54
RD+(P+WS)=37
RS+WS=22
RL+(P+WS)=110
RS+(P'+WS)=26
From this we quickly see that reading the files from the NFS disk is 76 minutes less than reading them from lustre. The processing time P' for dar to process large stripes as input only adds 4 minutes over rsync copying them. RD+(P+WS) is in the range (34,37) minutes. The difference between (R+WD) and (P+WS) is about 17 minutes. Since P is presumably not smaller than P', P ≥ 4 minutes and R ≥ 2 minutes (the verbose vs. non-verbose difference in Step 8), and we're getting that the contribution due to writing to disk is of order (15-19) minutes larger when writing 118,000 small files than when writing 46 large files. Writing to a RAID array like this takes longer than reading, by a factor of about 2 or so. From RS+WS=22 that implies 14 minutes due to write and 7 to read (assuming the program isn't clever), which is not badly inconsistent with the previous estimate.
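For reference, the subtractions behind those numbers:
RL-RD = 130-54 = 76 (or 110-37 = 73 from the dar runs)
P' = 26-22 = 4
(R+WD)-(P+WS) = 54-37 = 17 (or 130-110 = 20)
RD+(P+WS) = 110-76 = 34, measured directly as 37, hence the (34,37) range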
So writing stripes takes O(46GB/15 minutes = 3GB/min) and writing files takes O(46GB/32 minutes = 1.4GB/min). This is all over NFS.
rsync 3.0.8 starts copying quickly and does not maintain a monster memory footprint. dar starts copying quickly, but maintains O(650 bytes/file) of state, which for (e.g.) /net/user/aura demands 50GB of main memory. Not happening. Therefore dar can only be used on an already-partitioned set of files. For example, a job archiving /net/user/aura copied 27GB in 100 minutes while eating (by the end) 21% of the 8GB of memory on sam; it wasn't anywhere near done.
I can try rsync | tar | split | parchive
That doesn't work because par2 wants the filenames specified. Also, rsync doesn't like to write to a pipe: it can't read the destination first to check whether the file already exists (the "sync" in rsync is for synchronization).
BFI possible: write all the files to a file and process that. Lots of latency, lots of overhead, and do you scramble the files to try to spread the lustre server overhead around? Or not?
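A hedged sketch of a middle ground between those two ideas: let tar (which is happy writing to a pipe) feed split, and defer the par2 step to a later pass over the finished chunks. The paths and chunk size are placeholders:
mkdir -p /data/F03/chunks
cd /data/exp/IceCube/2005
tar -cf - FAT | split -b 1024m -d - /data/F03/chunks/FAT_chunk.
tar never lands the intermediate monster file on disk, and split writes ~1GB chunk files that look much like dar slices.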
Or do I need a custom program that uses an incremental scan for files, checksums them on the fly, and creates archive files of the desired size? And writes an index stream and adds Reed-Solomon redundancy to the degree specified?
How long would it take to create such a program? Maybe a week to steal the incremental scan from rsync, a week to steal the checksum from rsync, a week to devise a blocking scheme, a week to steal code from cp (mostly error handling), a week to steal the Reed-Solomon redundancy code? 5 weeks. Double that and use the next higher unit: 10 months. Not affordable. And I didn't specify the unpacking program... Staging can simplify the coding, since the output blocks can be par2'd in a script.
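Along those lines, a minimal sketch of the staging script, assuming par2 is installed and 10% redundancy is wanted (both assumptions, as are the paths and chunk names):
cd /data/F03/chunks
for f in FAT_chunk.[0-9][0-9]; do
    sha1sum "$f" >> chunks.sha1       # checksum for later verification
    par2 create -r10 "$f.par2" "$f"   # ~10% recovery data per chunk
done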
cpio has an 8GB file size limit.
pax? It has a useless checksum (of the header block).
Incremental search: has state; given a location to search, it discovers files until a file count is reached, fills a buffer with the file names and paths, hands this to the copier, and waits until it is restarted. Possible errors: failure to read. Possible states: more files, no more files.
Copier: reads files individually (default block size?), checksums them on the fly, and writes the result into a chunk buffer. File names, sizes, checksums, and chunk file names go into a list which, when it reaches a given size, is written out to an index file. When the chunk buffer fills, the chunk is written out (a file may span chunks). Add redundancy here at the specified level. Possible errors: failure to read a file, failure to write a chunk file, failure to write the index, out of memory, overflow of the index buffer, user interrupt. A chunk has file indexing information, header info, and the file contents.
Restoration tool: uses the index to find the chunks needed for a file, reads each chunk and verifies it (or restores the original chunk from the redundancy), then extracts the relevant portion of the file and writes it, using other chunks if required.