ppc (Photon Propagation Code) is an icetray module that creates Cherenkov photons along muon tracks and secondary cascades (and other interactions) and propagates them through ice with layered scattering and absorption (with tilting as surveyed by the dust loggers) until they are absorbed or hit an OM. Thus, ppc replaces photonics, photonics-interface, and hit-constructor.
The icetray version of ppc shares code with a stand-alone version that is used for ice properties studies (fits to flasher data) and quick checks on f2k muon data.
ppc has several substantial advantages over the established photonics-based simulation:
The basic version is enabled by default when compiling the icetray version of ppc. It is also possible to compile the GPU-accelerated version, which runs on CUDA GPUs (recent NVidia video cards, series 8000 and up), or the CPU-only version of the GPU-accelerated code. Both are in the ppc-gpu directory of the module.
The following table compares run times of the 3 ppc variants available within the icetray module (processing of 1 file of set 2972):
variant | run time
basic (default) | 12h 1m 37s
CPU-only version of ppc-gpu | 3h 8m 38s
GPU version of ppc-gpu | 0h 3m 45s
Here is a detailed summary:
basic: real 721m37.433s, user 718m51.836s, sys 0m32.642s
CPU-only: real 188m37.917s, user 187m21.671s, sys 0m18.441s
GPU: Device time 78644.6 [ms], real 3m44.556s, user 2m20.385s, sys 0m2.896s
As indicated above, in the GPU run only 78.6 seconds are spent on the actual photon propagation. Most of the rest (140 seconds) is spent in other simulation modules (I3PMTSimulator, I3DOMsimulator, I3SMTrigger, etc.). A small portion of the CPU time (~10 seconds) is spent by the ppc module itself.
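The comparison above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of ppc: the helper `to_seconds` and the variable names are hypothetical, and the only inputs are the `time` values and the device time quoted in the summary.

```python
import re


def to_seconds(text):
    """Convert a shell `time` value such as '721m37.433s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Wall-clock ("real") times quoted in the summary for one file of set 2972.
real = {
    "basic": to_seconds("721m37.433s"),
    "cpu_only": to_seconds("188m37.917s"),
    "gpu": to_seconds("3m44.556s"),
}
device_s = 78644.6 / 1000.0  # GPU device time, reported in milliseconds

# How much faster the GPU run finishes than each of the other variants.
speedup_vs_basic = real["basic"] / real["gpu"]
speedup_vs_cpu_only = real["cpu_only"] / real["gpu"]

# Share of the GPU run's wall-clock time spent propagating photons.
device_fraction = device_s / real["gpu"]

print(f"GPU vs basic:    {speedup_vs_basic:.1f}x")
print(f"GPU vs CPU-only: {speedup_vs_cpu_only:.1f}x")
print(f"device time / real time: {device_fraction:.2f}")
```

With these inputs the GPU run is roughly 190 times faster than the basic version and roughly 50 times faster than the CPU-only build, and photon propagation accounts for about a third of its wall-clock time.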
ppc-gpu was tested on the cudatest computer, which has 6 GPUs and 4 CPU cores (capable of running 8 threads). Details on execution times on this computer are given in the Appendix. In summary, this computer can process a neutrino-generator file 127 times faster than an average CPU node used for the equivalent photonics-based production.
neutrino-generator is excluded from this simulation chain (its output is pre-calculated on a cluster of CPUs). Since neutrino-generator alone takes ~7.5 times longer per file than the processing of that file by ppc-gpu and the rest of the simulation chain, the cudatest computer can be well matched with over 45 CPU nodes.
The 6 GPUs of the cudatest computer were used at ~35% capacity, because ppc-gpu has to wait for other modules to finish before processing more photons. Thus, the acceleration factor can be improved even further by one of the following techniques that might be considered:
NVidia promises to release, in the first quarter of 2010, CUDA-capable video cards that will be more than 2 times faster than the existing hardware (as installed in the cudatest computer). From the information currently available, servers built around either current or next-generation GPUs might have a similar performance/price ratio. Either way, custom-built computers appear to be much more cost-efficient than the pre-configured servers (which also need a host CPU system) sold by NVidia.
simulation chain | time per file
photonics-based (including corsika), per CPU | 3h 19m 7s
ppc-based (excluding corsika), on a 6-GPU computer | 0h 1m 23s
Here is a detailed summary of ppc-gpu:
693 corsika files:
Device time: 144401066.7 [ms]
real 956m40.579s
user 3118m37.174s
sys 42m15.550s
quantity | total [s] | per file [s]
Device time | 144401.0667 | 208.371
real time | 57400.579 | 82.829
real time x6 | 344403.474 | 496.975
user time | 187117.174 | 270.010
sys time | 2535.55 | 3.659
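One way to read the "real time x6" row is as the total GPU time available during the run (wall-clock time multiplied by the 6 GPUs). Under that assumption, and assuming further that the reported device time is summed over all GPUs, the share of time the GPUs were busy for this sample can be sketched as follows. Neither assumption is stated in the summary, and the variable names are illustrative.

```python
# Totals quoted in the summary for the 693 corsika files.
device_time_s = 144401066.7 / 1000.0  # "Device time" is reported in ms
real_time_s = 956 * 60 + 40.579       # "real 956m40.579s"
n_gpus = 6                            # GPUs in the cudatest computer

# ASSUMPTION: "real time x6" is wall-clock time times the number of GPUs,
# i.e. the GPU time that was available in total.
available_gpu_time_s = real_time_s * n_gpus

# ASSUMPTION: the device time is the sum over all GPUs.
busy_fraction = device_time_s / available_gpu_time_s

print(f"available GPU time: {available_gpu_time_s:.3f} s")
print(f"GPU busy fraction:  {busy_fraction:.2f}")
```

For this sample the estimate comes out at about 0.42, the same order as the ~35% capacity quoted for the cudatest computer.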
simulation chain | time per file
neutrino-generator only | 0h 26m 31s
photonics-based (including neutrino-generator) | 1h 55m 13s
ppc-based (excluding neutrino-generator), on a 6-GPU computer | 0h 0m 42s
Here is a detailed summary of ppc-gpu:
500 neutrino-generator files:
Device time: 31790709.4 [ms]
real 352m19.215s
user 1096m8.342s
sys 22m49.046s
quantity | total [s] | per file [s]
Device time | 31790.7094 | 63.5814
real time | 21139.215 | 42.278
real time x5 | 105696.075 | 211.392
user time | 65768.342 | 131.537
sys time | 1369.046 | 2.738
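The per-file figures and the ratios quoted earlier on this page can be cross-checked from these totals. The sketch below is one reading that approximately reproduces the factors of 127 and ~7.5. How those factors were originally derived is an assumption here, and the helper `hms` and the variable names are illustrative.

```python
def hms(hours, minutes, seconds):
    """Convert an 'Hh Mm Ss' table entry to seconds."""
    return hours * 3600 + minutes * 60 + seconds


n_files = 500
real_total_s = 352 * 60 + 19.215          # "real 352m19.215s"
real_per_file_s = real_total_s / n_files  # "real time" row, per file
real_x5_per_file_s = 5 * real_per_file_s  # "real time x5" row, per file

nugen_s = hms(0, 26, 31)           # neutrino-generator only
photonics_incl_s = hms(1, 55, 13)  # photonics-based, incl. neutrino-generator

# ASSUMPTION: the fair comparison removes neutrino-generator from the
# photonics-based time, since the ppc-based chain excludes it.
photonics_only_s = photonics_incl_s - nugen_s
speedup = photonics_only_s / real_per_file_s

# ASSUMPTION: the "~7.5 times longer" figure compares neutrino-generator
# with the "real time x5" per-file value.
nugen_ratio = nugen_s / real_x5_per_file_s

print(f"real time per file: {real_per_file_s:.3f} s")
print(f"photonics-only / ppc-based: {speedup:.0f}x")
print(f"neutrino-generator / (real time x5): {nugen_ratio:.1f}x")
```

The speedup comes out near 126 with the unrounded per-file time and near 127 when the rounded value of 42 s from the table is used.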
Files of sets 1540 and 2972 processed with ppc-gpu are available in /data/ana/IC40/ppc/.