flasher:

real    0m58.441s
user    0m58.416s       Assembly
sys     0m0.008s

real    1m48.110s
user    1m48.015s       c++ 32-bit
sys     0m0.096s

real    1m48.086s
user    1m48.023s       c++ 64-bit
sys     0m0.060s

real    2m18.296s
user    2m18.237s       c++ 64-bit old *)
sys     0m0.064s

real    1m41.724s
user    1m41.538s       c++ 32-bit fast
sys     0m0.188s

real    1m21.242s
user    1m21.097s       c++ 64-bit fast
sys     0m0.144s

Device time: 1522.8 [ms]
real    0m1.818s
user    0m0.176s        GTX 295 GPU
sys     0m0.152s


f2k muon:

real    1m39.787s
user    1m39.770s       Assembly
sys     0m0.016s

real    5m9.184s
user    5m8.119s        c++ 32-bit
sys     0m0.172s

real    4m43.612s
user    4m43.558s       c++ 64-bit
sys     0m0.080s

real    6m15.054s
user    6m14.851s       c++ 64-bit old *)
sys     0m0.104s

real    3m6.262s
user    3m6.160s        c++ 32-bit fast
sys     0m0.096s

real    2m28.235s
user    2m28.177s       c++ 64-bit fast
sys     0m0.072s

Device time: 5655.5 [ms]
real    0m6.404s
user    0m0.544s        GTX 295 GPU
sys     0m0.200s

*) compiled on older Linux (cobalt64: 2.6.9, gcc 3.4.6)
   as opposed to newer Linux (cudatest: 2.6.28, gcc 4.3.3)


Tareq's test run:

real    31m45.307s
user    31m45.167s      Assembly
sys     0m0.212s

Device time: 105548.4 [ms]
real    1m56.073s
user    0m9.757s        GTX 295 GPU (per-event mode)
sys     0m0.492s

Device time: 53094.9 [ms]
real    0m58.905s
user    0m6.328s        GTX 295 GPU (congregated-event mode)
sys     0m0.408s
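
To compare the runs above, the wall-clock ("real") times can be converted to seconds and divided. The following is a minimal sketch in Python. The helper names `parse_time` and `speedup` are hypothetical and not part of the original benchmark code; the timing strings are copied from the flasher runs listed above.

```python
import re


def parse_time(text):
    """Convert a `time`-style duration such as '1m48.110s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def speedup(baseline, candidate):
    """Return how many times faster `candidate` is than `baseline`."""
    return parse_time(baseline) / parse_time(candidate)


# "real" times for the flasher test, copied from the listing above.
flasher_real = {
    "Assembly": "0m58.441s",
    "c++ 64-bit fast": "1m21.242s",
    "GTX 295 GPU": "0m1.818s",
}

for label, value in flasher_real.items():
    ratio = speedup(flasher_real["Assembly"], value)
    print(f"{label:16s} {ratio:6.2f}x relative to Assembly")
```

The same dictionary can be filled with the f2k muon or Tareq's test run figures. Note that this compares total wall-clock time, which for the GPU runs includes host-side overhead beyond the reported device time.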