Photon Propagation Code in Assembly
The largest improvement is observed on the Pentium M (my laptop).
speedup relative to c++: | flasher | f2k muon |
c++: | 1.00 | 1.00 |
asm: | |
Core i7 2.67 GHz: | 1.38 | 2.92 |
Intel Xeon 2.4 GHz: | 1.49 | 2.30 |
Intel Xeon 3.2 GHz: | 1.64 | 2.51 |
Pentium M 2.0 GHz: | 2.16 | 3.45 |
AMD: | |
Opteron 2.0-2.4 GHz: | 1.60 | 2.33 |
Tareq's test run took 31.9 minutes on a Core i7 (32-bit asm). Since 4 threads can run simultaneously on a single CPU, the wall-clock time drops to about 8 min. Compared with 1.22 min on a 9800 GT GPU, the CPU is only a factor of ~6.5 slower. More recent GPUs could increase this factor by ~2.5x, to ~16x.
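The arithmetic behind these estimates can be reproduced with a short sketch. The function names `parallel_minutes` and `slowdown` are hypothetical helpers introduced here for illustration, and the code assumes ideal scaling across threads, as the estimate above does.

```cpp
#include <cassert>

// Wall-clock time when a job that takes `serial_minutes` on one thread is
// split evenly over `threads` threads (idealized: perfect scaling assumed).
double parallel_minutes(double serial_minutes, int threads) {
    assert(threads > 0);
    return serial_minutes / threads;
}

// How many times slower the CPU run is than the GPU run.
double slowdown(double cpu_minutes, double gpu_minutes) {
    assert(gpu_minutes > 0.0);
    return cpu_minutes / gpu_minutes;
}
```

With the numbers quoted above, `parallel_minutes(31.9, 4)` gives 7.975 min, `slowdown(7.975, 1.22)` gives roughly 6.5, and multiplying that by the assumed GPU generation gain of 2.5 gives roughly 16.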
There are a number of differences between the Assembly and c++ implementations of ppc:
| c++ | Assembly |
calculation precision: | double-precision everywhere | limited precision: mostly single precision or in some cases even lower (in the direction-vector normalization calculation) |
wavelength dependence: | full 6-parameter ice model | tabulated in 10 nm bins |
random number generator: | rand() of stdlib | 32-bit base multiply-with-carry, 2^23 different (normalized!) numbers |
Several minor differences in conditional statements (start or end on an OM, do not exit OMs: only enter, etc.) |
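To illustrate the random number generator row, here is a minimal C++ sketch of a multiply-with-carry generator with base 2^32. It is not the Assembly code itself: the struct name `Mwc32`, the multiplier passed in by the caller, and the bit trick used to produce floats are all assumptions made for illustration. The float conversion fills the 23 mantissa bits of a normalized single-precision number in [1, 2) and subtracts 1, which is one common way to obtain uniformly spaced values in [0, 1) without producing denormals.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Multiply-with-carry generator, base 2^32 (illustrative sketch only).
// The 64-bit state holds the current value x in its low 32 bits and the
// carry c in its high 32 bits; one step computes a*x + c.
struct Mwc32 {
    std::uint64_t state;
    std::uint32_t multiplier;  // hypothetical choice; quality depends on it

    Mwc32(std::uint32_t seed, std::uint32_t a) : state(seed), multiplier(a) {
        assert(seed != 0);  // x = 0 with c = 0 would stay at zero forever
    }

    std::uint32_t next_u32() {
        state = static_cast<std::uint64_t>(multiplier) * (state & 0xffffffffULL)
              + (state >> 32);
        return static_cast<std::uint32_t>(state);
    }

    // Uniform float in [0, 1): put the top 23 random bits into the mantissa
    // of a float with exponent 0 (a normalized value in [1, 2)), subtract 1.
    float next_float() {
        static_assert(sizeof(float) == sizeof(std::uint32_t), "need 32-bit float");
        std::uint32_t bits = 0x3f800000u | (next_u32() >> 9);
        float f;
        std::memcpy(&f, &bits, sizeof f);
        return f - 1.0f;
    }
};
```

A generator could be created as `Mwc32 rng(12345u, 4294883355u);`, where both the seed and the multiplier are placeholder values. Every result of `next_float()` is a multiple of 2^-23, so at most 2^23 distinct values can occur.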
Despite these differences, the agreement between both versions is very strong.