Optimized Corsika - tests and run time estimates

I Standard Corsika (100k files), set 1541 (weighted Hoerandel spectrum, dslope=-1): log(Energy) distributions
after Level6 and Level7 cuts can be found here

II Optimized Corsika - tests

Two of Oksana's Corsika settings have been tested (Amanda cascade analysis) with an extra cut on the most energetic cascades (Sebastian's IcePick filter).
Tested: Oxana setting 1 with LogLeadingCscd_Energy ge 2.0 (set1a), Oxana setting 2 with LogLeadingCscd_Energy ge 2.0 (set2a), and Setting 3 with LogLeadingCscd_Energy ge 2.0 (set3a); see the table below for the explicit cut values.
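
A minimal sketch of the extra cut, for illustration only. It is not the IcePick filter itself: the event records are hypothetical dictionaries, and only the variable name LogLeadingCscd_Energy and the threshold 2.0 are taken from the text above.

```python
# Threshold from the text: LogLeadingCscd_Energy ge 2.0
LOG_LEAD_CSCD_ENERGY_MIN = 2.0

def passes_lead_cascade_cut(event, threshold=LOG_LEAD_CSCD_ENERGY_MIN):
    """Return True if log10(leading cascade energy) >= threshold ("ge" cut)."""
    value = event.get("LogLeadingCscd_Energy")
    # Events without a leading cascade (key missing) are rejected.
    return value is not None and value >= threshold

# Hypothetical events, made up for this example.
events = [
    {"id": 1, "LogLeadingCscd_Energy": 1.7},
    {"id": 2, "LogLeadingCscd_Energy": 2.0},
    {"id": 3, "LogLeadingCscd_Energy": 3.4},
    {"id": 4},  # no leading cascade stored
]

kept = [e["id"] for e in events if passes_lead_cascade_cut(e)]
print(kept)
```

Note that "ge" is inclusive, so an event exactly at 2.0 is kept.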

Test other possibilities:
- smaller generation volume? (level4 cuts reduce the number of Corsika events at the online cascade filter level by a factor of 200 - 400, depending on the Corsika settings)

1.	WeightedCorsika	Old MC standard	New MC standard (test)	New MC optimized set1a	New MC optimized set2a	New MC optimized set3a	New MC optimized set4a	New MC optimized set5a
2a.	Energy_Primary Cut	600 GeV	600 GeV	3 TeV	20 TeV	40 TeV	40 TeV	80 TeV
2b.	Energy_Muons Cut (ecuts2)	273 GeV	273 GeV	1.2 TeV	3.0 TeV	3.0 TeV	5.0 TeV	5.0 TeV
2c.	Energy_electrons Cut (ecuts3)	0.003 GeV	0.003 GeV	500 GeV	800 GeV	800 GeV	800 GeV	800 GeV
2d.	Log10(LeadCscdEnergy Cut)	-	-	2.0	2.0	2.0	2.0	2.0
3.	DatasetId	1625 ( 1541 )	2301	2471	2479 2487 ( 2480 2600 )	2493	2519	2512
4.	Number of files	100 000 files	90 files	100 files	200 files (934, 9841 files)	100 files	991 files	200 files
4.a	Number of gen corsika showers per file	400 000	400 000	400 000	400 000	400 000	400 000	400 000
4.b	dslope	-1	-1	-1	-1	-1	-1	-1
5.	NumberOfEvents per file (IC22 triggered)	2800	2900	4000 (xxx)	7400	10900	6500	3000
6.	NumberOfEvents per file (Level2)	900	935	1600 (xxx)	3000	4500	2600	1100
7.	NumberOfEvents per file (Cscd filter only at level2)	xx	216	450	912 (44222/50=884)	1370		416
8.	NumberOfEvents per file (level2 CscdFilter + log(RecoEn) gt 4)	xx	21	49	79 (3959/50=79.18)	126		50
9.	NumberOfEvents per file (level4)	55000/100000=0.55	48/90 = 0.53 (8)	101/100 = 1.0	578/200 = 2.89 (2617/934=2.80)	412/100 = 4.12	2084/872 = 2.4
10.	NumberOfEvents per file (level6)	3240/100000=0.03	4/90 = 0.04	9/100 = 0.09	88/200 = 0.44 (415/934=0.44, 4181/9841=0.42)	53/100 = 0.53	350/872 = 0.4	14/150= 0.09
11.	NumberOfEvents per file (level6 log(RecoEn) gt 3.0)	1153/100000=0.01	xx	4/100 = 0.04	43/200 = 0.21 (164/934=0.18, 1807/9841=0.18)	26/100 = 0.26	167/872 = 0.19	8/150 = 0.05
12.	NumberOfEvents per file (level4 + log(RecoEn) gt 2.0 )	xxx	39/90 = 0.43 (7)	93/100 =0.93	549 /200 = 2.75 (2461/934=2.63)	391 /100 = 3.91
13.	NumberOfEvents per file (level4 + log(RecoEn) gt 2.6 )	xxx	11/90 = 0.12 (4)	38/100=0.38	279 /200 = 1.40 (1247/934=1.34)	194 /100 = 1.94
PDSF (e.g. 1 job only)	RunTime (wallclock)	xx	pc1016 26700 s (7.4h)		pc1016 a) job=2479.26 50573 s (14 h) b) job=2479.97 27167 s (7.5h)
PDSF (e.g. 1 job only)	RunTime (user CPU) (*)	xx	pc1016 16000 s (4.4h)		pc1016 a) job=2479.26 47554 s (13.2 h) b) job=2479.97 26380 s (7.3h)

Running time on PDSF machines: 1 node = 2 x 4 = 8 cores, CPU speed = 2 GHz, total memory 16 GB
1 core = 1 job slot
(*) User CPU at PDSF may not be correct; to be checked with the PDSF experts what "user CPU" actually measures (it does not look like CPU time)
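
The per-file numbers in rows 9 to 13 of the table are plain ratios of surviving events to files. As a rough sketch, the level6 row (row 10) can be recomputed as follows, together with a simple sqrt(N) uncertainty and the gain relative to the old standard set. The counts are copied from the table; for set2a the largest sample (4181 events in 9841 files) is used. Raw event counts ignore the event weights, so this is only a proxy for the gain in statistics.

```python
import math

# (events passing level6, number of files), copied from row 10 of the table.
level6_counts = {
    "old standard": (3240, 100000),
    "set1a": (9, 100),
    "set2a": (4181, 9841),
    "set3a": (53, 100),
    "set4a": (350, 872),
    "set5a": (14, 150),
}

def per_file(n_events, n_files):
    """Events per file and its Poisson (sqrt(N)) uncertainty."""
    return n_events / n_files, math.sqrt(n_events) / n_files

ref_rate, _ = per_file(*level6_counts["old standard"])

enhancement = {}
for name, (n_events, n_files) in level6_counts.items():
    rate, err = per_file(n_events, n_files)
    # Gain in events per file relative to the old standard set.
    enhancement[name] = rate / ref_rate
    print(f"{name:14s} {rate:6.3f} +- {err:5.3f} events/file, gain x{enhancement[name]:5.1f}")
```

The small samples (e.g. 9 events for set1a) carry uncertainties of 30% or more, so the gains for those sets are indicative only.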

7) At level2 (cascade filter only)

Rate versus log(Energy):

Test of standard Corsika: (a) large-statistics set 1541 produced with older software (black histogram) compared with (b) set 2301 produced with newer software, as used for the optimized Corsika sets (green histogram). Shown are Rate [Hz] vs log(MCPrimary_Energy) and the ratio vs log(MCPrimary_Energy).
Conclusion: the standard Corsika sets (old and new software, same settings) are consistent

10) After level6 (low statistics) (no energy cut)
10a) Rate versus log(Energy): black=standard corsika

10b) Effective Livetime ( icecube/200902001-v2 ) vs : MCLeadCascade log(Energy) , log(RecoEnergy) , MCPrimary log(Energy)
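
The exact definition used in icecube/200902001-v2 is not reproduced on this page. As an assumption, the sketch below uses a definition that is common for weighted MC: with per-event weights w_i in Hz, T_eff = sum(w_i) / sum(w_i^2), evaluated per histogram bin. The function name and the example weights are made up for illustration.

```python
def effective_livetime(weights):
    """Effective livetime in seconds for per-event weights given in Hz.

    Assumed definition: T_eff = sum(w) / sum(w^2).
    Returns 0.0 for an empty bin.
    """
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    if sum_w2 == 0:
        return 0.0
    return sum_w / sum_w2

# Sanity check: 50 unweighted events from 100 s of livetime each carry
# a weight of 1/100 Hz, and the effective livetime comes back as 100 s.
unweighted = [1.0 / 100.0] * 50
print(effective_livetime(unweighted))

# A bin dominated by one large weight has a much shorter effective livetime.
mixed = [1.0, 1.0, 2.0]
print(effective_livetime(mixed))
```

With this definition, a bin dominated by a few events with large weights gets a short effective livetime even if the summed rate is the same.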

10c) Rate versus log(Energy): (same as 10a but only 3 high statistics histograms)

(left) Ratio=Rate(set 2a)/Rate(standard) vs log10(Primary_Energy) and (right) Ratio=Rate(set 4a)/Rate(standard) vs log10(Primary_Energy)

11) After level6 (low statistics) (logRecoEn gt 3.0 )
Note: In the analysis, a log(RecoEnergy) gt 4 cut is used at the final cut level

11a) Rate versus log(Energy):

11b) Effective Livetime ( icecube/200902001-v2 ) vs : MCLeadCascade log(Energy) , log(RecoEnergy) , MCPrimary log(Energy)

11c) Rate versus log(Energy): (same as 11a but only 3 high statistics histograms)

(left) Ratio=Rate(set 2a)/Rate(standard) vs log10(Primary_Energy) and (right) Ratio=Rate(set 4a)/Rate(standard) vs log10(Primary_Energy)

Run time (rough!) estimate:
The optimized Corsika test samples have limited statistics, but the rates for optimized and standard Corsika are roughly consistent, and similar energy ranges are covered by both. The MC statistics enhancement factor is about 20 or more.
To get the same statistics as the standard MC (100 000 jobs), we would need to run about 4400 jobs (400 000 Corsika showers per job).
At PDSF this would take a month or longer. We need more statistics than we have in the standard IC22 MC.
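
The arithmetic behind these numbers is not spelled out here. One reading that reproduces them, stated as an assumption, uses table row 11 (level6, log(RecoEn) gt 3.0): 1153 events in 100 000 standard files versus 26 events in 100 set3a files. The wallclock times are the two PDSF jobs listed for set 2479, used here as a stand-in for any optimized set. The number of job slots is a hypothetical placeholder, since the available PDSF share is not given on this page.

```python
# Table row 11: level6 with log(RecoEn) gt 3.0
standard_events = 1153            # events in 100 000 standard files
standard_files = 100000
optimized_per_file = 26 / 100     # set3a (assumption: the set behind "4400 jobs")

# Jobs needed to match the standard sample, and the gain per file.
jobs_needed = standard_events / optimized_per_file
enhancement = optimized_per_file / (standard_events / standard_files)

# Wallclock per job on PDSF, from the table (jobs 2479.97 and 2479.26).
wallclock_hours = (27167 / 3600, 50573 / 3600)
core_hours = tuple(jobs_needed * h for h in wallclock_hours)

def days_needed(total_core_hours, job_slots):
    """Calendar days if job_slots jobs run concurrently (1 core = 1 job slot)."""
    return total_core_hours / job_slots / 24

HYPOTHETICAL_SLOTS = 50  # placeholder, not a number from this page

print(f"jobs needed: {jobs_needed:.0f}, enhancement: {enhancement:.1f}")
for hours, total in zip(wallclock_hours, core_hours):
    days = days_needed(total, HYPOTHETICAL_SLOTS)
    print(f"{hours:4.1f} h/job -> {total:7.0f} core-hours, {days:4.1f} days on {HYPOTHETICAL_SLOTS} slots")
```

Under these assumptions the job count comes out at about 4400 and the enhancement factor slightly above 20; the total is of order 33 000 to 62 000 core-hours, so the calendar time depends mainly on how many job slots are available.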

12) After level7 (no energy cut)
Note: In the analysis, a log(RecoEnergy) gt 4 cut is used at the final cut level

12a) Rate versus log(Energy):

12b) Effective Livetime ( icecube/200902001-v2 ) vs : MCLeadCascade log(Energy) , log(RecoEnergy) , MCPrimary log(Energy)