Photon Propagation Code for the GPU
Sources

  • c++: ini.cxx, Makefile, f2k.cxx
  • cu: ppc.cu, src, pro.cu
  • data files (SPICE MIE): rnd.txt, cfg.txt, icemodel.dat, geo-f2k, wv.dat, icemodel.par, tilt.dat, tilt.par, as.dat
  • fast version of ppc: simply compile with "make cpu"
The previous version of ppc is also available.
Performance comparison on a Core i7 at 2.67 GHz (speeds relative to the "fast c++" version, which is taken as 1.00):

            flasher    f2k muon
Original    1/1.85     1/2.47
fast c++    1.00       1.00
Assembly    1.24       1.34
GTX 295     140.       123.

Assembly numbers improved compared to the previous version used in this study. The GPU code compiled for the CPU (the new "fast c++") is taken as the new 1.0 reference. These tests were run on the cudatest computer. On a 1.296 GHz GeForce GTX 295 GPU, Tareq's test run takes 18.22 seconds, 91.9 times faster than the Assembly code on one CPU node (consistent with the f2k muon column above: 123/1.34 ≈ 92). i3mcml achieves a comparable level of performance on this GPU.
The GPU version of ppc is very similar in implementation to the ppc in Assembly and to the "fast c++" version listed in the table above.
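Since the "fast c++" build is the GPU code compiled for the CPU, a single source has to build both ways. The sketch below shows one common way to achieve this by defining the CUDA qualifiers away when the file is compiled by a plain C++ compiler; the macro block and the sample function are an illustration of the technique, not necessarily the exact mechanism used in ppc.

#include <math.h>

// When compiled by nvcc, __CUDACC__ is defined and the CUDA keywords keep
// their meaning; when compiled by a plain C++ compiler ("make cpu" style),
// they are defined away so the same code builds as ordinary host functions.
#ifndef __CUDACC__
#define __device__
#define __host__
#define __global__
#define __constant__
#endif

__constant__ float c_vacuum = 0.299792458f;   // speed of light, m/ns

// Example of code shared verbatim between the GPU and CPU builds:
// sample an exponentially distributed propagation step.
__device__ float propagation_step(float rnd, float scattering_length)
{
    return -scattering_length * logf(rnd);
}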

The agreement between the two versions is very strong.

The following is a GPU resource usage summary (v27):
  • 62460 bytes of the 64k GPU constant (cached) memory are used to hold the geometry of up to 5200 in-ice sensors and several constants.
  • 9592 bytes of the 16k per-multiprocessor GPU shared memory are used to hold the geometry cell-association look-up tables (21x19 cells), absorption and scattering coefficients in up to 180 ice layers (33 ice tables are calculated for different wavelengths and are loaded in different execution blocks, possibly simultaneously on different multiprocessors), ice tilt data (from 6 dust logs), and some constants and pointers to input/output structures (a CUDA sketch of this memory layout follows the list).
  • The program uses 37 registers per thread and supports running up to 384 threads on a single multiprocessor (37 × 384 = 14208, which fits within the 16384 registers available per multiprocessor on GT200-class GPUs such as the GTX 295).
  • The program uses 0 bytes of the slower local memory.
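To make the memory budget above concrete, here is a minimal CUDA sketch of how detector geometry can live in constant memory while each execution block stages the ice table for its own wavelength into shared memory. The structure names, array sizes, and kernel are assumptions made for this sketch, not code taken from ppc.

#include <cuda_runtime.h>

struct sensor_t { float x, y, z; };         // one in-ice sensor (illustrative layout)
struct ice_t    { float abs, sca; };        // absorption/scattering per ice layer

#define NSENSORS 5200                       // up to 5200 in-ice sensors
#define NLAYERS   180                       // up to 180 ice layers
#define NWAVES     33                       // ice tables for 33 wavelengths

// Geometry and a few constants fit in the 64k cached constant memory:
// 5200 sensors * 12 bytes = 62400 bytes, close to the 62460 bytes quoted above.
__constant__ sensor_t d_geo[NSENSORS];

__global__ void propagate(const ice_t* ice_all, int nlayers)
{
    // Each block copies the table for its own wavelength bin (blockIdx.x)
    // from global memory into the 16k per-multiprocessor shared memory.
    __shared__ ice_t ice[NLAYERS];
    const ice_t* src = ice_all + blockIdx.x * nlayers;
    for (int i = threadIdx.x; i < nlayers; i += blockDim.x) ice[i] = src[i];
    __syncthreads();

    // ... photon propagation would go here, reading d_geo[] and ice[] ...
}

int main()
{
    static sensor_t geo[NSENSORS];          // would be filled from geo-f2k in practice
    cudaMemcpyToSymbol(d_geo, geo, sizeof(geo));   // upload geometry once

    ice_t* d_ice;
    cudaMalloc(&d_ice, NWAVES * NLAYERS * sizeof(ice_t));
    propagate<<<NWAVES, 384>>>(d_ice, NLAYERS);    // 384 threads per block
    cudaDeviceSynchronize();
    cudaFree(d_ice);
    return 0;
}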