Technical Documentation
This section includes instructions to reproduce the results of the analysis.
Load software
Log in to the IceCube cluster. You can log in to the cobalt machines to create job files. Single jobs can be run on the cobalts, but if you want to run multiple jobs, you should log in to NPX (condor). The following commands load the software needed to run my analysis. If you would like to install the software yourself, there are instructions below.
# load cvmfs python
eval $(/cvmfs/icecube.opensciencegrid.org/py3-v4.3.0/setup.sh)
# load icetray
/cvmfs/icecube.opensciencegrid.org/py3-v4.3.0/RHEL_7_x86_64/metaprojects/icetray/v1.10.0/env-shell.sh
# load python environment
source /data/user/pgutjahr/software/virtual_envs/tensorflow_gpu_py3-v4.3.0/bin/activate
Simulation
The simulation is officially done via the IceProd framework. If you need further information, you can find the source code for IceProd here. The datasets used in this analysis are 22774, 22775, 22776, 22777, and 22778.
Reconstruction
Detailed information about the reconstructions is provided here. That section describes the fundamental ideas of the CNN-based reconstruction, the features, the training data, and the evaluation. The networks used to reconstruct energies, directions, and geometries are stored here:
/data/user/pgutjahr/exported_models/atmospheric_muon_leading/Old_CORSIKA/loose_precuts
and the network used to apply the pre-cut, which removes low-energy events at an early stage, is stored here:
/data/user/pgutjahr/exported_models/atmospheric_muon_leading/Old_CORSIKA/
To apply a network using the ic3-processing framework, you can add the following dicts to your tray_segments list. First, you have to apply the time window cleaning via:
{
# add cleaned muon pulses
ModuleClass: 'ic3_processing.modules.pulses.cleaning.apply_time_window_cleaning',
ModuleKwargs: {},
}
Then, you get the cleaned pulses on which the networks are trained, named SplitInIceDSTPulsesTWCleaning6000ns. To apply the pre-cut network, you can use the following code:
{
# run dnn-reco for pre cut
ModuleClass: 'ic3_processing.modules.reco.reco.apply_dnn_reco',
ModuleTimer: True,
ModuleKwargs: {
cfg: {
# global settings shared for all DNN-reco modules
add_dnn_reco: True,
DNN_batch_size: 1,
DNN_excluded_doms: [
'SaturationWindows', 'BadDomsList', 'CalibrationErrata',
],
DNN_partial_exclusion: True,
DNN_reco_configs: [
{
pulse_map_string: SplitInIceDSTPulsesTWCleaning6000ns,
DNN_model_names: [
'precut_surface_bundle_energy_3inputs_6ms_01',
],
DNN_ignore_misconfigured_settings_list: [],
DNN_models_dir: '/data/user/pgutjahr/exported_models/atmospheric_muon_leading/Old_CORSIKA/',
},
],
},
},
}
The other three networks are applied in the same way, but they are stored at a different location, as mentioned above.
{
# run dnn-reco for model evaluation
ModuleClass: 'ic3_processing.modules.reco.reco.apply_dnn_reco',
ModuleTimer: True,
ModuleKwargs: {
cfg: {
# global settings shared for all DNN-reco modules
add_dnn_reco: True,
DNN_batch_size: 1,
DNN_excluded_doms: [
'SaturationWindows', 'BadDomsList', 'CalibrationErrata',
],
DNN_partial_exclusion: True,
DNN_reco_configs: [
{
# models trained on cleaned pulses, 9 inputs
pulse_map_string: SplitInIceDSTPulsesTWCleaning6000ns,
num_data_bins: 9,
DNN_model_names: [
'direction_9inputs_6ms_medium_02_03',
'leading_bundle_surface_leading_bundle_energy_OC_inputs9_6ms_large_log_02',
'track_geometry_9inputs_6ms_medium_01',
],
DNN_ignore_misconfigured_settings_list: [],
DNN_models_dir: '/data/user/pgutjahr/exported_models/atmospheric_muon_leading/Old_CORSIKA/loose_precuts',
},
],
},
},
}
To reproduce the analysis, the networks do not need to be trained again; they are already trained and stored in the locations mentioned above. However, if you want to train the networks again, you have to install the dnn_reco framework on a local machine that provides a GPU. Then, you have to create the training data on the IceCube cluster. For this, we use ic3-processing. The config files to create the training data can be found here:
The pre-cut network is trained on IceCube Level 2 data after applying the muon filter. The other three networks are trained on our Level 3 data, after applying a pre-cut of 200 TeV on the estimated bundle energy at the surface. These data need to be transferred to your local machine, where you can train the networks.
The architectures of the networks are defined in these configs:
Process data to Level 5
Config locations
The processing is done with ic3-processing, which I highly recommend using. You can define all your steps, functions, and modules in a single config file. This makes it easy to reproduce the analysis. The config files used to process the CORSIKA datasets, NuGen datasets, and the burnsample to Level 4 can be found here:
The final cuts to process the data to Level 5 are then performed in the analysis notebooks by applying the function apply_quality_cuts, which can be found here. Additionally, the configs mentioned above apply only a pre-cut of 200 TeV on the bundle energy at the surface. Thus, when you load the data in a notebook, you also have to apply the cut DeepLearningReco_precut_surface_bundle_energy_3inputs_6ms_01_bundle_energy_in_mctree > 5e5 to reach Level 4 as described in this analysis.
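As an illustration, here is a minimal sketch (not taken from the analysis notebooks) of how this cut could be applied after loading one of the processed hdf5 files with pandas. The file name is a placeholder; the table and column names follow from the key quoted above, and energies are assumed to be in GeV, so 5e5 corresponds to 500 TeV:
# hedged sketch: apply the additional bundle-energy cut to a processed hdf5 file
import pandas as pd

infile = "CORSIKA_level4.022774.0000XX.hdf5"  # placeholder file name

with pd.HDFStore(infile, "r") as f:
    # table written by the pre-cut DNN-reco module
    reco = f["DeepLearningReco_precut_surface_bundle_energy_3inputs_6ms_01"]

# keep only events with an estimated bundle energy at the surface above 5e5 GeV
mask = reco["bundle_energy_in_mctree"] > 5e5
print(f"{mask.sum()} of {len(mask)} events pass the Level 4 cut")
The same boolean mask can then typically be applied row-wise to the other per-event tables (labels, weights, etc.) before running apply_quality_cuts.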
Config explanation
In the following, I will explain the structure of the config files using the Level 4 config file for the CORSIKA datasets as an example. The config files are structured in a way that you can easily add or remove modules.
First, you define the resources needed for the jobs and some dagman settings, which do not need to be adapted. For the resources, you can decide whether the job should use a GPU or not. When you do not run the job on a GPU, you should set gpus: 0, has_avx2: True, and has_ssse3: True. The networks used in this analysis are also fast on a CPU, as stated in Table 6 of the selection section. What often needs to be adjusted is the memory.
Then, you define your input and output patterns. The input pattern is the path to the input files, which are stored on the IceCube cluster. The output pattern is the path where the output files are stored. The output files are named according to the dataset number and run number.
Here, the Level 2 production files in i3-format are loaded.
#------------------------------
# General job submission config
#------------------------------
keep_crashed_files: False
resources:
# If gpus == 1, this will be run on a GPU
gpus: 1
cpus: 1
memory: 6gb # often this needs to be adjusted
# has_avx2: True # set to true, if gpu is not used
# has_ssse3: True # set to true, if gpu is not used
dagman_max_jobs: 5000
dagman_submits_interval: 500
dagman_scan_interval: 1
dagman_submit_delay: 0
# If true, the input files will first be checked for corruption.
# Note that this will take a while, since each input file has to be
# iterated through. You generally only want to set this to true if you
# are merging a number of input files of which some are known to be corrupt.
exclude_corrupted_input_files: False
#------------------------------
# Define Datasets to process
#------------------------------
#------
# common settings shared by all datasets
#------
i3_ending: 'i3.zst'
n_runs_per_merge: 100
in_file_pattern: '/data/sim/IceCube/2023/generated/CORSIKA_EHISTORY/filtered/level2/CORSIKA-in-ice/{dataset_number}/{folder_num_pre_offset_n_merged:04d}000-{folder_num_pre_offset_n_merged:04d}999/Level2_IC86.2024_corsika.{dataset_number:06d}.{run_number:04d}*.{i3_ending}'
out_file_pattern: '{data_type}_{step}.{dataset_number:06d}.{run_number:04d}XX'
out_dir_pattern: '{data_type}/{dataset_number}/{step}'
folder_pattern: '{folder_num_pre_offset:04d}000-{folder_num_pre_offset:04d}999'
folder_offset: 0
n_jobs_per_folder: 1000
# do not count weight frames first, assume every input file is 1 run
n_files_is_n_runs: False
gcd: '/cvmfs/icecube.opensciencegrid.org/data/GCD/GeoCalibDetectorStatus_2020.Run134142.Pass2_V0.i3.gz'
#------
datasets:
datasets_100per_merge__10000_runs :
cycler:
dataset_number: [22774]
runs_range: [0, 100] # set the number of runs to process, here 100 files are merged to one, using the key n_runs_per_merge in the global settings above
data_type: CORSIKA
step: 'level4'
datasets_10per_merge__10000_runs :
n_runs_per_merge: 10
in_file_pattern: '/data/sim/IceCube/2023/generated/CORSIKA_EHISTORY/filtered/level2/CORSIKA-in-ice/{dataset_number}/{folder_num_pre_offset_n_merged:04d}000-{folder_num_pre_offset_n_merged:04d}999/Level2_IC86.2024_corsika.{dataset_number:06d}.{run_number:05d}*.{i3_ending}'
out_file_pattern: '{data_type}_{step}.{dataset_number:06d}.{run_number:05d}X'
cycler:
dataset_number: [22776, 22777, 22778]
runs_range: [0, 1000] # set the number of runs to process, here 10 files are merged to one, using the key n_runs_per_merge
data_type: CORSIKA
step: 'level4'
datasets_10per_merge__20000_runs :
n_runs_per_merge: 10
in_file_pattern: '/data/sim/IceCube/2023/generated/CORSIKA_EHISTORY/filtered/level2/CORSIKA-in-ice/{dataset_number}/{folder_num_pre_offset_n_merged:04d}000-{folder_num_pre_offset_n_merged:04d}999/Level2_IC86.2024_corsika.{dataset_number:06d}.{run_number:05d}*.{i3_ending}'
out_file_pattern: '{data_type}_{step}.{dataset_number:06d}.{run_number:05d}X'
cycler:
dataset_number: [22775]
runs_range: [0, 2000] # set the number of runs to process, here 10 files are merged to one, using the key n_runs_per_merge
data_type: CORSIKA
step: 'level4'
Then, general job templates and paths are defined. These do not need to be adjusted.
# -------------------------------------------------------------
# Define environment information shared across processing steps
# -------------------------------------------------------------
job_template: job_templates/cvmfs_python.sh
script_name: general_i3_processing.py
cuda_home: /data/user/mhuennefeld/software/cuda/cuda-11.8
# add optional additions to the LD_LIBRARY_PATH
# Note: '{ld_library_path_prepends}' is the default which does not add anything
ld_library_path_prepends: '{ld_library_path_prepends}'
# Defines environment variables that are set from python
set_env_vars_from_python: {
# 'TF_DETERMINISTIC_OPS': '1',
}
In the next step, you can define several processing steps to run. Here, only one step is needed. For each step, you can load different python versions and icetray metaprojects. For example, this is necessary if a CORSIKA file was produced with old software and the I3MCTree was not stored, but you would like to re-create the tree. For this, you need to run the exact same software that was used to create the CORSIKA file. In that case, step 1 could re-create the I3MCTree, while step 2 can use the new software to process the data. In the tray_segments list, you can define the actual processing steps.
#-----------------------------------------------
# Define I3Traysegments for each processing step
#-----------------------------------------------
# a list of processing steps. Each processing step contains
# information on the python and cvmfs environment as well as
# a list of I3TraySegments/Modules that will be added to the I3Tray.
# Any options defined in these nested dictionaries will supersede the
# ones defined globally in this config.
# Tray context can be accessed via "context-->key".
# For nested dictionaries it's possible to do: "context-->key.key2.key3"
# The configuration dictionary of the job can be passed via "<config>"
# Special keys for the tray_segments:
# ModuleClass: str
# The module/segment to run.
# ModuleKwargs: dict
# The parameters for the specified module.
# ModuleTimer: str
# If provided, a timer for this module will be added.
# Results of all timers are saved in the frame key "Duration".
processing_steps: [
# -----------------
# Processing step 1
# -----------------
{
# Define environment for this processing step
cvmfs_python: py3-v4.3.0,
icetray_metaproject: icetray/v1.10.0,
python_user_base_cpu: /data/user/pgutjahr/software/virtual_envs/tensorflow_gpu_py3-v4.3.0,
python_user_base_gpu: /data/user/pgutjahr/software/virtual_envs/tensorflow_gpu_py3-v4.3.0,
write_i3: False, # if True, i3 files are written
write_hdf5: True, # if True, hdf5 files are written
# define a list of tray segments to run
tray_segments: [
{
# load modules and do the actual stuff here...
# modules used for Level 4 are explained in detail below
}
],
},
]
The modules used in the tray segments are explained in detail below. The modules are loaded in the order they are defined in the tray_segments list.
The first module is ic3_processing.modules.processing.filter_events.filter_streams. This module is used to filter the streams of the input files. In this case, we only want to keep the InIceSplit stream.
{
# Only keep 'InIceSplit' stream
ModuleClass: 'ic3_processing.modules.processing.filter_events.filter_streams',
ModuleKwargs: {streams_to_keep: ['InIceSplit']},
}
This module is followed by ic3_processing.modules.processing.filter_events.apply_l2_filter_mask, which is used to apply the muon filter.
{
# Only keep events of the MuonFilter
ModuleClass: 'ic3_processing.modules.processing.filter_events.apply_l2_filter_mask',
ModuleKwargs: {filter_base_name: 'MuonFilter'},
}
Then, ic3_processing.modules.processing.filter_events.I3OrphanFrameDropper is used to discard orphan Q-frames, i.e. Q-frames that are not followed by any P-frame.
{
# Discard orphan Q-frames
ModuleClass: 'ic3_processing.modules.processing.filter_events.I3OrphanFrameDropper',
ModuleKwargs: {OrphanFrameStream: 'Q'},
}
Then, ic3_processing.modules.pulses.cleaning.apply_time_window_cleaning is used to apply the time window cleaning of 6 µs.
{
# add cleaned muon pulses
ModuleClass: 'ic3_processing.modules.pulses.cleaning.apply_time_window_cleaning',
ModuleKwargs: {},
}
Then, ic3_processing.modules.reco.reco.apply_dnn_reco applies the DNN reconstruction as explained before.
{
# run dnn-reco for pre cut
ModuleClass: 'ic3_processing.modules.reco.reco.apply_dnn_reco',
ModuleTimer: True,
ModuleKwargs: {
cfg: {
# global settings shared for all DNN-reco modules
add_dnn_reco: True,
DNN_batch_size: 1,
DNN_excluded_doms: [
'SaturationWindows', 'BadDomsList', 'CalibrationErrata',
],
DNN_partial_exclusion: True,
DNN_reco_configs: [
{
pulse_map_string: SplitInIceDSTPulsesTWCleaning6000ns,
DNN_model_names: [
'precut_surface_bundle_energy_3inputs_6ms_01',
],
DNN_ignore_misconfigured_settings_list: [],
DNN_models_dir: '/data/user/pgutjahr/exported_models/atmospheric_muon_leading/Old_CORSIKA/',
},
],
},
},
}
The module ic3_processing.modules.processing.filter_events.filter_events applies cuts. Here, a pre-cut of 200 TeV on the bundle energy at the surface is applied. The key is named bundle_energy_in_mctree.
{
# apply loose pre-cut
ModuleClass: 'ic3_processing.modules.processing.filter_events.filter_events',
ModuleKwargs: {
# Define a config where each entry must contain the fields
# key, column, value, option, combination
# filter options are greater_than, less_than, equal_to, unequal_to
# combination options are: 'and', 'or'
# Two masks: one for each combination type will be created
# An event passes the filter if any of these two masks is True!
filter_list: [
{
'key': 'DeepLearningReco_precut_surface_bundle_energy_3inputs_6ms_01',
'column': 'bundle_energy_in_mctree',
'value': !!float 2e5,
'option': 'greater_than',
'combination': 'and',
},
],
},
}
After applying the pre-cut, orphan Q-frames are discarded again.
{
# Discard orphan Q-frames
ModuleClass: 'ic3_processing.modules.processing.filter_events.I3OrphanFrameDropper',
ModuleKwargs: {OrphanFrameStream: 'Q'},
}
The module ic3_labels.weights.segments.WeightEvents calculates the weights for the events. The weights are used to re-weight the events to the correct energy spectrum and are calculated using the corsika and nugen files. The weights are stored in the output file. However, for the final weighting of the events we use simweights (see the sketch after the config block below).
{
# compute weights
ModuleClass: 'ic3_labels.weights.segments.WeightEvents',
ModuleKwargs: {
'infiles': context-->ic3_processing.infiles,
'dataset_type': '{data_type}',
'dataset_n_files': context-->ic3_processing.n_files,
'dataset_n_events_per_run': -1,
'dataset_number': '{dataset_number}',
'check_n_files': ['corsika', 'nugen'],
'add_mceq_weights': False,
'mceq_kwargs': {
'cache_file': '/data/ana/PointSource/DNNCascade/utils/cache/mceq.cache',
},
},
ModuleTimer: True,
}
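For the final weighting, a hedged sketch using simweights is shown here; it would be run later in a notebook on the produced hdf5 files, not inside this config. The file name, the number of generated files, and the flux model are placeholders; the CorsikaWeightMap and PolyplopiaPrimary tables that simweights reads are written to the hdf5 output (see the file output options below):
# hedged sketch, not taken from the analysis notebooks
import pandas as pd
import simweights

simfile = pd.HDFStore("CORSIKA_level4.022774.0000XX.hdf5", "r")  # placeholder

# number of generated CORSIKA files behind this hdf5 file (placeholder);
# depending on how the dataset was produced, nfiles may not be required
weighter = simweights.CorsikaWeighter(simfile, nfiles=100)

# choose a primary cosmic-ray flux model, e.g. GaisserH4a
flux = simweights.GaisserH4a()
weights = weighter.get_weights(flux)  # per-event weights in Hz

print("total rate:", weights.sum(), "Hz")
simfile.close()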
The next module, ic3_processing.modules.labels.primary.add_weighted_primary, extracts the cosmic-ray primary from the I3MCTree before the muon propagation and saves it as MCPrimary in the frame. This key is needed for the weighting.
{
# add weighted primary
ModuleClass: 'ic3_processing.modules.labels.primary.add_weighted_primary',
ModuleKwargs: {},
}
The module ic3_processing.modules.labels.recreate_and_add_mmc_tracklist.RerunProposal runs PROPOSAL again to re-create the I3MCTree. In the simulation, the trees are not saved in order to reduce disk space.
# -----------------
# Recreate I3MCTree
# -----------------
{
# Re-create I3MCTree
ModuleClass: 'ic3_processing.modules.labels.recreate_and_add_mmc_tracklist.RerunProposal',
ModuleKwargs: {
'random_service_class': 'I3GSLRandomService',
},
ModuleTimer: True,
}
The module dnn_selections.selections.atmospheric_muon_leading.ic3.labels.MCMostEnergeticMuonInside selects the most energetic muon of the muon bundle in the convex hull extended by 200 m and saves it as an I3Particle in the frame.
# ----------
# Add labels
# ----------
{
# add labels: Most energetic muon in convex hull
ModuleClass: 'dnn_selections.selections.atmospheric_muon_leading.ic3.labels.MCMostEnergeticMuonInside',
ModuleKwargs: {
AddPseudoMuon: True,
ConvexHull: icecube_hull_ext_200,
RunOnDAQFrames: True,
OutputKey: 'MostEnergeticMuonInside',
},
ModuleTimer: True,
}
The module dnn_selections.selections.atmospheric_muon_leading.ic3.labels.MCLabelsLeadingMuons adds many labels used in the analysis: for example, all information about the primary particle; the energy of the muon bundle at the detector entry and at the surface (and the same for the leading muon); the entry position; the closest approach point to the center of the detector; the number of muons entering the detector; the number of coincident primaries; the stochasticity of the muon bundle; and the bundle radius.
{
# add labels: MCLabelsLeadingMuons
ModuleClass: 'dnn_selections.selections.atmospheric_muon_leading.ic3.labels.MCLabelsLeadingMuons',
ModuleKwargs: {
'OutputKey': 'MCLabelsLeadingMuons',
'MostEnergeticMuonInsideKey': 'MostEnergeticMuonInside',
'ConvexHull': icecube_hull_ext_200,
'ComputeBundleRadius': False,
'ComputeStochasticity': False,
'FixMuonPairProductionBug': True,
},
ModuleTimer: True,
}
The module dnn_selections.selections.atmospheric_muon_leading.ic3.labels.MCLabelsMostEnergeticMuonParentInfo adds the parent information of the most energetic muon. This is needed to tag a muon as prompt or conventional.
{
# add labels: MCLabelsMostEnergeticMuonParentInfo
ModuleClass: 'dnn_selections.selections.atmospheric_muon_leading.ic3.labels.MCLabelsMostEnergeticMuonParentInfo',
ModuleKwargs: {
'OutputKey': 'MCLabelsMostEnergeticMuonParentInfo',
'ConvexHull': icecube_hull_ext_200,
'MostEnergeticMuonInsideKey': 'MostEnergeticMuonInside',
},
ModuleTimer: True,
}
The module ic3_data.segments.CreateDNNData creates the training data for the DNN. For the pre-cut network, only 3 input features are generated; for the other networks, 9 input features are calculated.
########################
### Check dnn input data
########################
{
# write DNN-reco training data to file [3 inputs, cleaned]
ModuleClass: 'ic3_data.segments.CreateDNNData',
ModuleKwargs: {
NumDataBins: 3,
RelativeTimeMethod: ,
DataFormat: reduced_summary_statistics_data,
PulseKey: SplitInIceDSTPulsesTWCleaning6000ns,
DOMExclusions: ['SaturationWindows','BadDomsList','CalibrationErrata'],
PartialExclusion: True,
OutputKey: 'dnn_data_SplitInIceDSTPulsesTWCleaning6000ns',
},
ModuleTimer: True,
},
{
# write DNN-reco training data to file [9 inputs, cleaned]
ModuleClass: 'ic3_data.segments.CreateDNNData',
ModuleKwargs: {
NumDataBins: 9,
RelativeTimeMethod: time_range,
DataFormat: pulse_summmary_clipped,
PulseKey: SplitInIceDSTPulsesTWCleaning6000ns,
DOMExclusions: ['SaturationWindows','BadDomsList','CalibrationErrata'],
PartialExclusion: True,
OutputKey: 'dnn_data_inputs9_SplitInIceDSTPulsesTWCleaning6000ns',
},
ModuleTimer: True,
}
The module icecube.common_variables.hit_statistics.I3HitStatisticsCalculator calculates some basic statistics based on the input pulses, such as the collected charge. These can be used for low-level checks; however, they are not used in the analysis.
########################
### Add hit statistics
########################
{
# Add hit statistics
ModuleClass: 'icecube.common_variables.hit_statistics.I3HitStatisticsCalculator',
ModuleKwargs: {
'PulseSeriesMapName': 'SplitInIceDSTPulses',
'OutputI3HitStatisticsValuesName': HitStatistics_SplitInIceDSTPulses,
},
},
{
# Add hit statistics
ModuleClass: 'icecube.common_variables.hit_statistics.I3HitStatisticsCalculator',
ModuleKwargs: {
'PulseSeriesMapName': 'SplitInIceDSTPulsesTWCleaning6000ns',
'OutputI3HitStatisticsValuesName': HitStatistics_SplitInIceDSTPulsesTWCleaning6000ns,
},
}
The module Delete deletes some keys to reduce the size of the output files. This is not strictly necessary, but it is recommended.
{
# Delete keys to reduce space
ModuleClass: 'Delete',
ModuleKwargs: {Keys: [
I3MCTree, MMCTrackList,
I3MCTree_recreation_meta_info,
InIceRawData, I3MCPulseSeriesMap,
I3MCPulseSeriesMapParticleIDMap,
I3MCPulseSeriesMapPrimaryIDMap,
]},
}
After the processing is done, the output files are written. Here, you can define the file format and the streams that should be written. You also have to add the keys that should be written to the hdf5 files. Note that some modules are written such that they automatically append the keys they produce to this list. For example, the DNN reconstruction modules append their keys to the list, which is why you don't find them in the list below.
#--------------------
# File output options
#--------------------
# write output as i3 files via the I3Writer
write_i3: False
write_i3_kwargs: {
# only write these stream types,
# i.e ['Q', 'P', 'I', 'S', 'M', 'm', 'W', 'X']
'i3_streams': ['Q', 'P', 'I', 'S', 'M', 'm', 'W', 'X'],
}
# write output as hdf5 files via the I3HDFWriter
write_hdf5: True
write_hdf5_kwargs: {
# sub event streams to write
'SubEventStreams': ['InIceSplit'],
# HDF keys to write (in addition to the ones in
# tray.context['ic3_processing']['HDF_keys'])
# Note: added tray segments should add outputs that should be written
# to hdf5 to the tray context.
'Keys': [
# general
'I3EventHeader',
'DurationQ',
'DurationP',
# dnn reco input data
'dnn_data_SplitInIceDSTPulsesTWCleaning6000ns_bin_values',
'dnn_data_SplitInIceDSTPulsesTWCleaning6000ns_bin_indices',
'dnn_data_SplitInIceDSTPulsesTWCleaning6000ns_global_time_offset',
# labels
'MCLabelsLeadingMuons',
'MCLabelsMostEnergeticMuonParentInfo',
# weighting
'weights',
'weights_meta_info',
'MCPrimary',
'I3MCWeightDict',
'CorsikaWeightMap',
# Filters
'FilterMask',
'QFilterMask',
# Other
'MostEnergeticMuonInside',
'PolyplopiaPrimary',
# Hit Statistics
'HitStatistics_SplitInIceDSTPulses',
'HitStatistics_SplitInIceDSTPulsesTWCleaning6000ns',
# Snowstorm
# 'SnowstormParameterRanges',
'SnowstormParameters',
# 'SnowstormParametrizations',
# 'SnowstormProposalDistribution',
'SnowstormParameterDict',
],
}
Job submission
Log in to NPX (condor). Then, load the software environment as described above.
Then, you have to create the job files. This is done via ic3-processing:
ic3_create_job_files <path-to-config> -d <path-to-output-directory> -p <path-to-scratch-directory> --dagman
The flag -h can be used to get a list of all options. The output directory is the directory where the output files are stored. The scratch directory is the directory where the job files are stored. After logging in to the submitter, the command to create the job files for processing the CORSIKA datasets to Level 4 could look like this:
ic3_create_job_files /home/pgutjahr/dnn_selections/resources/configs/atmospheric_muon_leading/processing/Evaluation/CORSIKA_v1100/level4.yaml -d /data/user/pgutjahr/CORSIKA -p /scratch/pgutjahr/CORSIKA --dagman
To submit the jobs, you have to execute the bash script start_dagman.sh, like this:
/scratch/pgutjahr/CORSIKA/level4_0000/start_dagman.sh
As usual, condor_q can be used to check the status of the jobs.
However, jobs often crash due to memory issues. Some jobs require more memory than others; sometimes a single job needs up to 20 GB of memory. Setting the default to 20 GB is not a good idea, because then you get fewer jobs on the cluster and the processing will take much longer, and other users will also get fewer jobs because memory is limited. Instead, when all your jobs are either done or crashed, you can check the log files of the jobs that crashed. For this, I run:
grep -r Error /scratch/pgutjahr/CORSIKA/level4_0000/logs/
For example, this might indicate that jobs have requested 6 GB of memory, but actually need 10 GB. Then, you can adjust this in the OneJob.submit file and submit the jobs again. To edit the file, you can use any text editor, like vim or nano.
vim /scratch/pgutjahr/CORSIKA/level4_0000/OneJob.submit
# change request_memory = 6gb to request_memory = 10gb
# save and exit
# submit the jobs again, like you did before
/scratch/pgutjahr/CORSIKA/level4_0000/start_dagman.sh
This will resubmit only the crashed jobs with the new memory request. If some jobs fail again because they need even more memory, you can repeat this process. Sometimes the jobs don't crash, but end up being held. In this case, just kill the jobs as usual with condor_rm <job_id>, or condor_rm <user> if there was only one submission.
Now we are done with the processing. In the output directory you will find the processed hdf5 files, which can now be loaded in the analysis notebooks.
Technical review
Processing all files needs a lot of resources and time. Thus, I created some configs for the technical review to process just a subset of the entire dataset.
I prepared these configs to process CORSIKA, NuGen, and exp data to Level 5. In this case, I just added the quality cuts to the configs. These are the exact same cuts that are applied in the analysis notebooks, but instead of the function apply_quality_cuts, I added the cuts directly to the config using the module ic3_processing.modules.processing.filter_events.filter_events, which is also used to apply the pre-cut. The configs are stored here:
CORSIKA:
/home/pgutjahr/dnn_selections/resources/configs/atmospheric_muon_leading/tech_review/tech_review_level5_CORSIKA.yaml
NuGen:
/home/pgutjahr/dnn_selections/resources/configs/atmospheric_muon_leading/tech_review/tech_review_level5_NuGen.yaml
exp data:
/home/pgutjahr/dnn_selections/resources/configs/atmospheric_muon_leading/tech_review/tech_review_level5_exp.yaml
Now you can create the job files and submit the jobs as described above. One CORSIKA job will require about 13 GB of memory. You can follow the steps mentioned above to adjust the memory in the OneJob.submit file and submit the jobs again.
Data-MC
Unfolding
To run the unfolding, there is a script run_funfolding.py. This script runs the unfolding on the processed hdf5 files. This can be done on a cobalt machine in a tmux session. Check the available memory before running the script, because it can take up to 70 GB of memory. This can be done with free -h or htop.
Then, you can run the script like this:
# log in to a cobalt machine and check the available memory
htop
# start a tmux session
tmux new -s unfolding
# load the software environment
source /home/pgutjahr/py3421_env/bin/activate
python run_funfolding.py <path-to-config>
Unfolding script:
/home/pgutjahr/dnn_selections/notebooks/atmospheric_muon_leading/selection_performance/unfolding/scratch/run_funfolding.py
Unfolding config:
/home/pgutjahr/dnn_selections/resources/configs/atmospheric_muon_leading/unfolding/test_primary_models/mcmc_train_H4a_test_GSF_01_tau001.yaml
To run the script with this config, you have to set the path to the final output directory. This is done in the config file with the key plot_dir.
Install software
Note
These instructions install TensorFlow 2.14 for version 2 of the dnn_reco framework. Models trained with dnn_reco version 1 are not compatible with dnn_reco version 2, as mentioned in the release notes here. Thus, to reproduce the analysis with the exact same networks, dnn_reco v1 must be installed. However, you can also load the environment I installed, which is located in /data/user/pgutjahr/software/virtual_envs/tensorflow_gpu_py3-v4.3.0, as mentioned above.
The following instructions to install the software are for the IceCube cluster and are extracted from the installation instructions of the dnn_reco framework.
# load cvmfs python environment
eval $(/cvmfs/icecube.opensciencegrid.org/py3-v4.3.0/setup.sh)
mkdir /data/user/{USER}/technical_review/virtual_envs
cd /data/user/{USER}/technical_review/virtual_envs
python -m virtualenv py3-v4.3.0_tensorflow2.14
source py3-v4.3.0_tensorflow2.14/bin/activate
# change activation script such that it prepends the path
# to the virtual environment to the PYTHONPATH environment variable
perl -i -0pe 's/_OLD_VIRTUAL_PATH\="\$PATH"\nPATH\="\$VIRTUAL_ENV\/bin:\$PATH"\nexport PATH/_OLD_VIRTUAL_PATH\="\$PATH"\nPATH\="\$VIRTUAL_ENV\/bin:\$PATH"\nexport PATH\n\n# prepend virtual env path to PYTHONPATH if set\nif ! \[ -z "\$\{PYTHONPATH+_\}" \] ; then\n _OLD_VIRTUAL_PYTHONPATH\="\$PYTHONPATH"\n export PYTHONPATH\=\$VIRTUAL_ENV\/lib\/python3.7\/site-packages:\$PYTHONPATH\nfi/' py3-v4.3.0_tensorflow2.14/bin/activate
perl -i -0pe 's/ export PYTHONHOME\n unset _OLD_VIRTUAL_PYTHONHOME\n fi/ export PYTHONHOME\n unset _OLD_VIRTUAL_PYTHONHOME\n fi\n\n if ! \[ -z "\$\{_OLD_VIRTUAL_PYTHONPATH+_\}" \] ; then\n PYTHONPATH\="\$_OLD_VIRTUAL_PYTHONPATH"\n export PYTHONPATH\n unset _OLD_VIRTUAL_PYTHONPATH\n fi/' py3-v4.3.0_tensorflow2.14/bin/activate
pip install tensorflow==2.14 tensorflow_probability==0.22.1
cat << 'EOF' >> py3-v4.3.0_tensorflow2.14/bin/activate
export CUDA_HOME=/data/user/mhuennefeld/software/cuda/cuda-11.8
export PATH=$PATH:${CUDA_HOME}/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${CUDA_HOME}/lib64
EOF
pip install pybind11
export I3_BUILD=/cvmfs/icecube.opensciencegrid.org/py3-v4.3.0/RHEL_7_x86_64/metaprojects/icetray/v1.12.0/
export I3_SRC=/cvmfs/icecube.opensciencegrid.org/py3-v4.3.0/metaprojects/icetray/v1.12.0/
pip install git+ssh://git@github.com/icecube/TFScripts
pip install git+ssh://git@github.com/icecube/ic3-labels
pip install git+ssh://git@github.com/icecube/ic3-data
########################################################################
# Typically, if there is an HDF5 version mismatch we must install h5py from source
# Use: 'h5cc -showconfig' to obtain hdf5 configuration and library version
# use: 'ls -lah $(which h5cc)' to obtain path to hdf5 directory
pip uninstall h5py
HDF5_VERSION=1.14.0 HDF5_DIR=/cvmfs/icecube.opensciencegrid.org/py3-v4.3.0/RHEL_7_x86_64/spack/opt/spack/linux-centos7-x86_64_v2/gcc-13.1.0/hdf5-1.14.0-4p2djysy6f7vful3egmycsguijjddkah pip install --no-binary=h5py h5py==3.11.0
########################################################################
mkdir /data/user/{USER}/technical_review/repositories
cd /data/user/{USER}/technical_review/repositories
git clone https://github.com/mhuen/dnn_reco.git
# install package
pip install -e dnn_reco
git clone https://github.com/mhuen/ic3-processing.git
pip install -e ic3-processing
# Now you can verify your installation. Log out from cobalt and log in again.
# load icecube environment
eval $(/cvmfs/icecube.opensciencegrid.org/py3-v4.3.0/setup.sh)
/cvmfs/icecube.opensciencegrid.org/py3-v4.3.0/RHEL_7_x86_64/metaprojects/icetray/v1.12.0/env-shell.sh
source /data/user/{USER}/technical_review/virtual_envs/py3-v4.3.0_tensorflow2.14/bin/activate
echo $CUDA_HOME
# this should print: /data/user/mhuennefeld/software/cuda/cuda-11.8
python -c 'import tensorflow as tf; print(tf.__version__)'
# this should print some information about tensorflow, which can take up to a minute; the last line should be: 2.14.0
python -c 'import dnn_reco; import tfscripts; import ic3_labels; import ic3_data'
# this should not print any error messages (a 'Could not find TensorRT' warning is expected)
# install analysis repo
cd /data/user/{USER}/technical_review/repositories
git clone git@github.com:icecube/dnn_selections.git
pip install -e dnn_selections
cd dnn_selections
git checkout -b AnalysisPipeline origin/AnalysisPipeline
Note
The prebuilt tensorflow binary is built to use avx2 and ssse3 instructions, among others. These are not available on cobalts 1 through 4; attempting to import tensorflow there will lead to an "illegal instruction" error. Therefore, if running on the cobalts, simply choose one of the newer machines (cobalt >= 5).
On NPX, if running CPU jobs, you can request nodes with avx2 and ssse3 support by adding requirements = (TARGET.has_avx2) && (TARGET.has_ssse3). This is only necessary for CPU jobs; for GPU jobs, these requirements should not be set.