Slides and Notes 30-August-2011

I got a new ls from a system that runs coreutils-5.97-34.el5, and it doesn't have the problem the old one does. (With the old one, if you pipe the output of ls -UR to a file, it doesn't recurse.) It took 7 hours to process 1227242 files in /data/exp/IceCube/2008/filtered on lustre on the otherwise unloaded sam.

If the rsync numbers can be trusted, and I don't think they can, then the minimum file transfer rate is roughly proportional to the file size. Since transfer time is size divided by rate, that implies a nearly constant transfer time for files, which I know is incorrect from watching the log files grow.

If the file transfers quickly, rsync will sometimes repeat the information for the first chunk as though it were a second chunk. As can be seen below, the penultimate chunk transfer sometimes reflects the average transfer rate better than the last one. I calculate the average by reading each block, taking the difference between the previous total bytes and the current total, and dividing that difference by the reported speed to get the transfer time for that chunk. Then I add up all these times and divide the total bytes by that sum to get an average speed. If something goes wrong (a divide by 0 is attempted), I flag the result as bad.
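A minimal sketch of the averaging described above, written in Python rather than awk. It assumes the rsync progress lines for one file have already been parsed into (cumulative bytes, reported rate in bytes/sec) pairs; the function name and the input layout are hypothetical, not taken from the actual script.

```python
def average_rate(chunks):
    """Estimate the average transfer rate for one file.

    chunks: list of (cumulative_bytes, rate_bytes_per_sec) tuples in the
    order they were reported (hypothetical, pre-parsed input).
    Returns bytes/sec, or None when the file is flagged as bad.
    """
    total_time = 0.0
    prev_bytes = 0
    try:
        for cum_bytes, rate in chunks:
            # Bytes moved in this chunk, divided by the reported speed,
            # gives the estimated time spent on this chunk.
            delta = cum_bytes - prev_bytes
            total_time += delta / rate
            prev_bytes = cum_bytes
        # Total bytes over the summed chunk times.
        return prev_bytes / total_time
    except ZeroDivisionError:
        # Zero reported rate, or no usable chunks at all: flag as bad.
        return None
```

A duplicated first chunk contributes a byte difference of zero, so in this sketch it adds no time and does not change the result.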

nagios finds data transfer rates of O(39MB/sec), and I, adding up the files transferred in 3372 seconds, find 39.3MB/sec. So I agree with nagios, but not with the average transfer rates that I am calculating from the rsync information. Therefore either I do not understand what rsync is doing with the chunk reporting, or I have a bug in my awk script.

The rest of these use larger statistics.


Returning to the disk-based transfer (so I can get rid of pesky details about server variation): there are almost always 2 chunks reported, and for the 56320 byte files the first transfer seems to be not quite as fast as expected compared to the other, or compared to the second chunk. Given that the second chunk is often reported as a duplicate of the first chunk when the file size is small, this suggests that the default initial chunk size is going to be at most 40K. Checking shows 32768 as the first chunk size whenever the file is bigger than this. The second chunk is also 32768 bytes almost all the time, but larger at the 1% level. The third chunk (with our particular file mix so far) has peaks at 50 and 75MB, quite a bit larger. Chunks 4, 5, 6, 7 and even 8 peak at about 80MB, though the 8th chunk has a large broad peak at 50. Mileage will vary.
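One way to tabulate chunk sizes by position, as a rough sketch only. It assumes each file's progress reports have been reduced to a list of cumulative byte counts; the function name and the input layout are hypothetical.

```python
from collections import Counter, defaultdict


def chunk_size_histograms(files):
    """Count chunk sizes per chunk position (1 = first chunk).

    files: iterable of lists of cumulative byte counts, one list per
    transferred file (hypothetical, pre-parsed input).
    Returns a dict mapping chunk position -> Counter of chunk sizes.
    """
    hist = defaultdict(Counter)
    for cumulative in files:
        prev = 0
        for index, total in enumerate(cumulative, start=1):
            # Chunk size is the growth in cumulative bytes since the
            # previous report for the same file.
            hist[index][total - prev] += 1
            prev = total
    return hist
```

For a 56320 byte file reported as 32768 and then 56320 cumulative bytes, this counts a first chunk of 32768 and a second chunk of 23552. A duplicated report shows up as a chunk of size 0, which makes the duplicates easy to count separately.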


Summary:


Modified 30-August-2011 at 08:22

http://icecube.wisc.edu/~jbellinger/StorHouse/30Aug2011

Please contact jnbt@hep.physics.wisc.edu if you have trouble accessing the information on this page.