Maximum acceleration v17 (with default options):

flasher:
Device time: 27211.5 [ms]
real 0m28.483s
user 0m1.100s
sys  0m0.188s

f2k muon:
Device time: 10263.1 [ms]
real 0m13.492s
user 0m2.548s
sys  0m0.676s


Maximum acceleration v16 (with default options), not directly comparable to other versions (as some of the options used could make those versions faster as well):

flasher:
Device time: 33737.6 [ms]
451295 2256475 13624272
real 0m35.101s
user 0m1.196s
sys  0m0.192s

f2k muon:
Device time: 11365.1 [ms]
real 0m14.217s
user 0m2.220s
sys  0m0.616s


New GPU results: v14 on a faster-clocked GTX 295 GPU. In the new versions the options ASENS, ROMB, and ACCL6 are disabled, and cz is changed to 20 (while lowering MAXGEO from 5160 to 3540), to be most similar to the algorithm of the Assembly version.

flasher:
Device time: 72213.5 [ms]
real 1m22.858s
user 0m10.377s
sys  0m0.348s

f2k muon:
Device time: 18694.5 [ms]
real 0m22.302s
user 0m2.808s
sys  0m0.744s


New Assembly results:

flasher:
real 82m11.985s
user 82m4.792s
sys  0m2.916s

f2k muon:
real 28m31.088s
user 28m25.971s
sys  0m1.040s


New GPU results (GPU version v12 on a faster-clocked GTX 295 GPU):

flasher:
Device time: 72488.9 [ms]
real 1m16.484s
user 0m3.772s
sys  0m0.264s

f2k muon:
Device time: 20247.7 [ms]
real 0m23.970s
user 0m2.972s
sys  0m0.740s


(GPU version v12):

flasher:
Device time: 75634.4 [ms]
real 1m19.896s
user 0m4.320s
sys  0m0.264s

f2k muon:
Device time: 21144.4 [ms]
real 0m25.073s
user 0m3.148s
sys  0m0.776s


(GPU version v11):

flasher:
Device time: 84608.4 [ms]
real 1m27.663s
user 0m3.132s
sys  0m0.220s

f2k muon:
Device time: 21896.2 [ms]
real 0m24.952s
user 0m2.632s
sys  0m0.408s


(GPU version v10):

flasher:
Device time: 86867.7 [ms]
real 1m30.719s
user 0m4.076s
sys  0m0.256s

f2k muon:
Device time: 22775.0 [ms]
real 0m26.642s
user 0m3.148s
sys  0m0.692s


(GPU version v9):

flasher:
Device time: 114169.9 [ms]
real 1m58.586s
user 0m4.116s
sys  0m0.248s

f2k muon:
Device time: 29926.6 [ms]
real 0m33.852s
user 0m3.100s
sys  0m0.768s


Full (longer) test (GPU version v7):

flasher:

real 105m34.074s
user 104m42.725s    Assembly
sys  0m4.328s

real 196m10.007s
user 194m26.765s    c++ 32-bit
sys  0m15.485s

real 196m39.933s
user 194m26.353s    c++ 64-bit
sys  0m15.337s

real 184m2.920s
user 182m8.027s     c++ 32-bit fast
sys  0m31.782s

real 147m42.861s
user 146m6.344s     c++ 64-bit fast
sys  0m29.354s

Device time: 116501.7 [ms]
real 2m0.575s
user 0m4.080s       GTX 295 GPU
sys  0m0.288s

f2k muon:

real 34m16.698s
user 33m59.179s     Assembly
sys  0m1.436s

real 101m30.158s
user 100m37.637s    c++ 32-bit
sys  0m5.216s

real 97m44.541s
user 96m56.428s     c++ 64-bit
sys  0m4.788s

real 64m55.442s
user 64m22.713s     c++ 32-bit fast
sys  0m3.508s

real 52m14.626s
user 51m47.714s     c++ 64-bit fast
sys  0m3.372s

Device time: 29143.4 [ms]
real 0m41.434s
user 0m11.209s      GTX 295 GPU
sys  0m0.744s


Short (older) test results:

flasher:

real 0m58.441s
user 0m58.416s      Assembly
sys  0m0.008s

real 1m48.110s
user 1m48.015s      c++ 32-bit
sys  0m0.096s

real 1m48.086s
user 1m48.023s      c++ 64-bit
sys  0m0.060s

real 2m18.296s
user 2m18.237s      c++ 64-bit old *)
sys  0m0.064s

real 1m41.724s
user 1m41.538s      c++ 32-bit fast
sys  0m0.188s

real 1m21.242s
user 1m21.097s      c++ 64-bit fast
sys  0m0.144s

Device time: 1158.0 [ms]
real 0m1.442s
user 0m0.168s       GTX 295 GPU
sys  0m0.140s

f2k muon:

real 1m39.787s
user 1m39.770s      Assembly
sys  0m0.016s

real 5m9.184s
user 5m8.119s       c++ 32-bit
sys  0m0.172s

real 4m43.612s
user 4m43.558s      c++ 64-bit
sys  0m0.080s

real 6m15.054s
user 6m14.851s      c++ 64-bit old *)
sys  0m0.104s

real 3m6.262s
user 3m6.160s       c++ 32-bit fast
sys  0m0.096s

real 2m28.235s
user 2m28.177s      c++ 64-bit fast
sys  0m0.072s

Device time: 1613.2 [ms]
real 0m2.324s
user 0m0.444s       GTX 295 GPU
sys  0m0.284s

*) compiled on an older Linux (cobalt64: 2.6.9, gcc 3.4.6), as opposed to the newer Linux (cudatest: 2.6.28, gcc 4.3.3)


Tareq's test run:

real 31m45.307s
user 31m45.167s     Assembly
sys  0m0.212s

Device time: 28948.8 [ms]
real 0m36.240s
user 0m6.540s       GTX 295 GPU
sys  0m0.812s
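
The wall-clock ("real") figures above are easier to compare as seconds and as ratios. The following is a minimal sketch of that arithmetic, not part of the benchmark code. The helper names real_seconds and speedup are hypothetical. The example pairs the "New Assembly results" flasher run with the GPU v14 flasher run, on the assumption that both were made on the same test input.

```python
import re


def real_seconds(stamp: str) -> float:
    """Convert a `time`-style stamp such as '82m11.985s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def speedup(baseline: str, candidate: str) -> float:
    """Ratio of baseline wall-clock time to candidate wall-clock time."""
    return real_seconds(baseline) / real_seconds(candidate)


# "real" values copied from the listings above (flasher test).
assembly_flasher = "82m11.985s"  # New Assembly results
gpu_v14_flasher = "1m22.858s"    # GPU version v14

ratio = speedup(assembly_flasher, gpu_v14_flasher)
print(f"flasher, Assembly vs GPU v14: {ratio:.1f}x")
```

The same two helpers apply to any other pair of "real" lines in the listings. The "Device time" values are already in milliseconds and need no conversion.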