Photon Propagation Code for IceSim

ppc (Photon Propagation Code) is an icetray module that creates Cherenkov photons along muon tracks, secondary cascades, and other interactions, and propagates them through ice with layered scattering and absorption (with layer tilt as surveyed by the dust loggers) until they are absorbed or hit an OM. Thus, ppc replaces photonics, photonics-interface, and hit-constructor.

The icetray version of ppc shares code with a stand-alone version, which is used for ice-properties studies (fits to flasher data) and quick checks on f2k muon data.

ppc has several substantial advantages over the established photonics-based simulation:

The basic version is enabled by default when compiling the icetray version of ppc. It is also possible to compile the GPU-accelerated version, which runs on CUDA-capable GPUs (recent NVidia video cards, series 8000 and up), or the CPU-only build of the GPU-accelerated code; both live in the ppc-gpu directory of the module.

The following table compares run times of the 3 ppc variants available within the icetray module (processing of 1 file of set 2972):

basic (default)           12h  1m 37s
CPU-only of ppc-gpu        3h  8m 38s
GPU version of ppc-gpu     0h  3m 45s

Here is a detailed summary:
basic:
real    721m37.433s
user    718m51.836s
sys     0m32.642s
     
CPU-only:
real    188m37.917s
user    187m21.671s
sys     0m18.441s
     
GPU:
Device time: 78644.6 [ms]
real    3m44.556s
user    2m20.385s
sys     0m2.896s

As indicated above, only 78.6 seconds are spent on the actual photon propagation; most of the rest of the CPU time (~140 seconds of user time) is spent in the other simulation modules (I3PMTSimulator, I3DOMsimulator, I3SMTrigger, etc.). Only a small portion of the CPU time (~10 seconds) is spent in the ppc module itself.
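The speedups implied by the timings above can be verified with a short calculation; all numbers below are taken directly from the quoted "real"/"user" lines, nothing else is assumed:

```python
# Sanity-check the per-file speedups implied by the quoted timings.

def seconds(minutes, secs):
    """Convert an m:s timing (as printed by `time`) to seconds."""
    return minutes * 60 + secs

basic_real = seconds(721, 37.433)     # basic version, 12h 1m 37s
cpu_only_real = seconds(188, 37.917)  # CPU-only build of ppc-gpu
gpu_real = seconds(3, 44.556)         # GPU version of ppc-gpu
gpu_user = seconds(2, 20.385)         # CPU time of the GPU run
device_time = 78.6446                 # GPU photon propagation, in seconds

# Per-file speedups of the two accelerated variants over the basic version:
print(round(basic_real / cpu_only_real, 1))  # 3.8  (CPU-only port)
print(round(basic_real / gpu_real, 1))       # 192.8  (GPU version)

# Of the ~140 s of CPU (user) time in the GPU run, only ~10 s belong to
# ppc itself; the rest goes to the other simulation modules.
print(round(gpu_user, 1))  # 140.4
```

This also shows why the GPU run's wall-clock time (~224.6 s) exceeds the device time (78.6 s): the other modules dominate once photon propagation is accelerated.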

The ppc-gpu was tested on the cudatest computer, which has 6 GPUs and 4 CPU cores (capable of running 8 threads). Details of the execution times on this computer are given in the Appendix. In summary, this computer can process a neutrino-generator file 127 times faster than an average CPU node used for the equivalent photonics-based production.

Given that neutrino-generator alone, which is excluded from this simulation chain (it is pre-calculated on a cluster of CPUs), takes ~7.5 times longer per file than processing of the file by ppc-gpu and the rest of the simulation chain, the cudatest computer can be well matched with over 45 CPU nodes.
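The node-matching figure follows from simple arithmetic, assuming one simulation chain is run per GPU (an assumption; the text does not spell this out):

```python
# If neutrino-generator takes ~7.5x longer per file (on a CPU node) than the
# downstream chain takes on one GPU slot of cudatest, then each of the 6 GPUs
# can consume files as fast as ~7.5 CPU nodes produce them.
nugen_to_chain_ratio = 7.5
gpus = 6
matched_cpu_nodes = nugen_to_chain_ratio * gpus
print(matched_cpu_nodes)  # 45.0
```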

The 6 GPUs of the cudatest computer were used at only ~35% capacity, as ppc-gpu has to wait for the other modules to finish before processing more photons. Thus, the acceleration factor could be improved even further by one of the following techniques:
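The ~35% figure is consistent with the single-file run quoted earlier, where the device is busy for only about a third of the wall-clock time (this back-of-the-envelope check assumes the single-file run is representative of production):

```python
# GPU utilization in the single-file run: device time over wall-clock time.
device_time = 78.6446         # seconds of actual GPU photon propagation
wall_clock = 3 * 60 + 44.556  # total "real" time of the GPU run, in seconds
utilization = device_time / wall_clock
print(round(utilization, 2))  # 0.35
```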

For the first quarter of 2010, NVidia has promised to release CUDA-capable video cards that will be more than 2 times faster than the existing hardware (as installed in the cudatest computer). From the information currently available, servers built around both current- and next-generation GPUs should have a similar performance/price ratio. Either way, custom-built computers appear to be much more cost-efficient than the pre-configured servers (which also need a host CPU system) sold by NVidia.

Appendix
run times:

Files of sets 1540 and 2972 processed with ppc-gpu are available in /data/ana/IC40/ppc/.