Technical Documentation¶

The analysis scripts can be found in the StoppingMuonAnalysis repository located in /home/lwitthaus/stoppingmuons/. The data processing makes use of additional methods defined in the ic3-labels and ic3-data repos.

The scripts for training the network are not included in the analysis repo. They can be found in the dnn_reco repo, along with detailed documentation on their usage.

All scripts can be run using the virtual environment located in /home/lwitthaus/venvs/stmuons_py3v10/.

The general analysis workflow consists of the following steps:

1. Create labels for untriggered level 0 MC data (needed for effective area estimation)
2. Create the level 2 DNN training data
3. Train and export the networks
4. Apply the networks to MC/measured data
5. Create plots to check reconstruction quality (for MC only)
6. Calculate effective areas
7. Unfolding

Processing the data¶

All processing-related scripts are located in the ic3-processing repository. The custom modules for this analysis are defined in /home/lwitthaus/stoppingmuons/stoppingmuons/processing/, and the configs to run the processing are stored in /home/lwitthaus/stoppingmuons/configs/processing. Information about their usage is provided in the README.md of the ic3-processing repo. The scripts can either be run on the cluster by creating job files for DAGMan (recommended) or be run locally (for testing).

Data processing is required for the following analysis steps:

Creating labels for untriggered MC data
Creating training data for the DNN
Applying the DNNs to MC/measured data (main analysis processing)

The processing is divided into sub-steps, each of which is configured via its own config file.

The steps are executed in the following order:

Creating labels for untriggered data:
1. L0Processing.yaml
Creating training data:
1. TrainingData.yaml
DNN application to data:
1. L2Processing.yaml

The resulting files are stored in hdf5 format. These files serve as input for the analysis.

Training the Network¶

As mentioned above, the scripts for training the networks are located in the dnn_reco repo. The configs to train the networks are provided in /home/lwitthaus/stoppingmuons/configs/training/. The training data is located in /data/user/lwitthaus/TrainingData/datasets/20904/TrainingData/.

Once the configs are available, the training follows the steps described in the corresponding section of the dnn_reco documentation. First, create a data transformation model by running python create_trafo_model.py /PATH/TO/CONFIG. Then start the training by running python train_model.py /PATH/TO/CONFIG.
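
The two training steps can be chained in a small wrapper. The following is only a sketch: the script names are taken from the text above, while the function names, the `dnn_reco_dir` argument (the directory that contains the two scripts) and the dry-run switch are assumptions made for illustration.

```python
import subprocess
import sys


def training_commands(config_path):
    """Return the two dnn_reco commands in the order they have to be run."""
    config = str(config_path)
    # sys.executable is used instead of a bare "python" so that the
    # interpreter of the currently active virtual environment is picked up.
    return [
        [sys.executable, "create_trafo_model.py", config],  # 1. transformation model
        [sys.executable, "train_model.py", config],         # 2. training
    ]


def run_training(config_path, dnn_reco_dir, dry_run=True):
    """Print both commands and, unless dry_run is set, execute them.

    dnn_reco_dir is a hypothetical argument: the directory from which the
    two scripts are started.
    """
    for cmd in training_commands(config_path):
        print(" ".join(cmd))
        if not dry_run:
            # check=True aborts before training if the first step fails.
            subprocess.run(cmd, cwd=dnn_reco_dir, check=True)


if __name__ == "__main__":
    run_training("/PATH/TO/CONFIG", dnn_reco_dir=".", dry_run=True)
```

With dry_run=True the commands are only printed, which is a cheap way to check the config path before starting a long training run.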

Data¶

Paths to finalized data sets will be linked here: …

Running the analysis¶

All subsequent steps for running the analysis are handled by the stoppingmuons/stoppingmuons/analysis/run_analysis.py script. It takes a config file that contains all necessary parameters and is invoked via python run_analysis.py /PATH/TO/CONFIG. A standard example is provided by stoppingmuons/configs/analysis/standard_analysis.yaml.

Find further information about the settings below:

Main¶

unique_name: str
Unique name for the analysis. This will be used as a folder name to save the analysis results.
output_path: str
Output path for the analysis results. A folder unique_name will be created in this directory.
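
How the two settings combine into a results folder can be sketched as follows. The helper name is hypothetical, and reusing an already existing folder (exist_ok=True) is an assumption; the actual script may behave differently when the folder already exists.

```python
from pathlib import Path


def prepare_output_dir(output_path, unique_name):
    """Create <output_path>/<unique_name> and return it as a Path."""
    out_dir = Path(output_path) / unique_name
    # Assumption: an existing folder is reused rather than treated as an error.
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

For example, output_path set to /PATH/TO/RESULTS and unique_name set to standard_analysis would yield /PATH/TO/RESULTS/standard_analysis.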

Loading data¶

inputs: dict
Dictionary containing all information about the data.

Each key specifies a distinct set of data. This analysis uses level 2 MC data, measured level 2 IceCube data and untriggered MC data. The following information has to be given:
- Input path
- Saved as hdf5 true/false (alternative: simple .pickle file)
- Data level
- Data type
- Read livetime true/false (only necessary for measured data)
cut_exceptions: list
List of data set keys that are exempt from the analysis cuts.

This usually refers to the untriggered MC data.
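
The structure of these two settings can be pictured as a Python dictionary and a list. This is a sketch only: the data set keys and all sub-key names (path, is_hdf5, level, data_type, read_livetime) are hypothetical placeholders for the items listed above, and the real names are defined by the example config.

```python
# Hypothetical layout; the real key names are defined in the analysis config.
inputs = {
    "mc_level2": {
        "path": "/PATH/TO/MC_LEVEL2",
        "is_hdf5": True,
        "level": 2,
        "data_type": "mc",
        "read_livetime": False,
    },
    "data_level2": {
        "path": "/PATH/TO/DATA_LEVEL2",
        "is_hdf5": True,
        "level": 2,
        "data_type": "data",
        "read_livetime": True,  # livetime is only needed for measured data
    },
    "mc_untriggered": {
        "path": "/PATH/TO/MC_UNTRIGGERED",
        "is_hdf5": True,
        "level": 0,
        "data_type": "mc",
        "read_livetime": False,
    },
}
cut_exceptions = ["mc_untriggered"]


def datasets_to_cut(inputs, cut_exceptions):
    """Return the data set keys that the analysis cuts are applied to."""
    unknown = set(cut_exceptions) - set(inputs)
    if unknown:
        raise KeyError(f"cut_exceptions names unknown data sets: {sorted(unknown)}")
    return [key for key in inputs if key not in cut_exceptions]


print(datasets_to_cut(inputs, cut_exceptions))
```

Checking cut_exceptions against the keys of inputs catches misspelled data set names before any data is loaded.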

Analysis cuts¶

filtermask: str
Filter mask to apply. Corresponds to the key under which the mask is saved.
cuts: dict
Define analysis cuts.

Each key names the parameter the cut is applied to. Each value is the corresponding cut value.
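
A minimal sketch of how such a cuts dictionary could be applied is shown below. It assumes that every cut value is a lower bound (an event passes if its value is greater than or equal to the cut value); the source does not state the cut direction, and the parameter names in the example are made up.

```python
def apply_cuts(events, cuts):
    """Keep only the events that pass every cut.

    events: dict mapping parameter name -> list of per-event values
    cuts:   dict mapping parameter name -> cut value
    Assumption: an event passes a cut if its value is >= the cut value.
    """
    missing = set(cuts) - set(events)
    if missing:
        raise KeyError(f"cut parameters not found in events: {sorted(missing)}")
    n_events = len(next(iter(events.values())))
    keep = [
        all(events[param][i] >= value for param, value in cuts.items())
        for i in range(n_events)
    ]
    # Apply the same mask to every column so the events stay aligned.
    return {
        name: [v for v, k in zip(values, keep) if k]
        for name, values in events.items()
    }


# Hypothetical parameter names and values.
events = {"score": [0.2, 0.9, 0.95], "zenith": [1.0, 0.5, 1.2]}
selected = apply_cuts(events, {"score": 0.8})
```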

Reconstructions¶

evaluate_score_distribution: bool
Evaluate the distribution of stopping muon score values as predicted by the DNN.
plot_multiplicity: bool
Create stopping muon multiplicity plot based on MC data.
plot_correlations: bool
Create correlation plots between MC truths and reconstructions based on MC data.
plot_data_mc_agreement: bool
Create data-MC agreement plots for all DNN reconstructions.
analyse_mi_scores: bool
Calculate and plot mutual information scores between defined proxies and the two target quantities of the analysis (propagation length and surface energy).
observables: list of str
Proxies for mutual information scores.

Effective area¶

calculate_effective_area_range: bool
Calculate effective area for propagation length unfolding.
calculate_effective_area_energy: bool
Calculate effective area for surface energy unfolding.
load_effective_area_range: bool
Load previously calculated effective area for propagation length unfolding.
load_effective_area_energy: bool
Load previously calculated effective area for surface energy unfolding.
effective_area_iteration_number: int
Number of bootstrapping iterations for calculating the effective area.
include_effective_area_syst_unc: bool
Include systematic uncertainties in the calculation of the effective area.
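
To illustrate what the number of bootstrapping iterations controls, the sketch below resamples a set of per-event weights with replacement and reports the mean and spread of their sum. The sum of weights is only a placeholder statistic, not the effective area formula used in the analysis, and the function name is hypothetical.

```python
import random
import statistics


def bootstrap_uncertainty(weights, n_iterations, seed=None):
    """Return mean and standard deviation of sum(weights) under resampling.

    Placeholder statistic: the analysis computes an effective area here,
    whose exact formula is not part of this sketch.
    """
    if n_iterations < 2:
        raise ValueError("at least two iterations are needed for a spread")
    rng = random.Random(seed)
    n = len(weights)
    totals = []
    for _ in range(n_iterations):
        # Draw n events with replacement and recompute the statistic.
        resampled = rng.choices(weights, k=n)
        totals.append(sum(resampled))
    return statistics.mean(totals), statistics.stdev(totals)
```

More iterations give a more stable estimate of the spread at a proportionally higher runtime.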

Unfolding¶

is_sim: bool
If true, unfold simulation data. If false, unfold measured data.
unfold_range: bool
Unfold propagation length distribution.
optimize_unfolding_range: bool
Optimize regularization strength for propagation length unfolding.
load_unfolding_range_results: bool
Load previous unfolding results.
plot_results_range: bool
Plot unfolded propagation length flux.
evaluate_unfolding_range: bool
Create evaluation plots for propagation length unfolding.
unfold_energy: bool
Unfold surface energy distribution.
optimize_unfolding_energy: bool
Optimize regularization strength for surface energy unfolding.
mceq_path (optional): str
MCEq theory results input path.
load_unfolding_energy_results: bool
Load previous unfolding results.
plot_results_energy: bool
Plot unfolded surface energy flux.
evaluate_unfolding_energy: bool
Create evaluation plots for surface energy unfolding.
train_size: int
Number of events sampled for training.
test_size: int
Number of events sampled for testing.
reg_range: float or list of floats
Regularization strength for propagation length unfolding. Either a single global value or one value per bin.
reg_energy: float or list of floats
Regularization strength for surface energy unfolding. Either a single global value or one value per bin.
epsilon: float
Generic epsilon for systematics in funfolding.
plot_systematics: bool
Create per-bin systematics plots.
bin_index: int or None
Define bin to plot systematics for. If None, the systematics are plotted for all bins.
reg_min: float
Minimum regularization strength for optimization.
reg_max: float
Maximum regularization strength for optimization.
n_steps: int
Number of grid points for optimization.
include_systematics: bool
Include systematic uncertainties in the unfolding.
n_used_steps: int
Number of steps in MCMC unfolding.
n_burn_steps: int
Number of burn-in steps in MCMC unfolding.
n_walkers: int
Number of MCMC walkers.
range_fit: bool
Fit effective ice parameters a and b using propagation length unfolding results.
energy_fit: bool
Fit normalization and spectral index using surface energy unfolding results.
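
The settings reg_min, reg_max and n_steps describe a grid of regularization strengths, and reg_range / reg_energy accept either a float or a list. The sketch below shows one way these could be interpreted. Both the logarithmic spacing of the grid and the expansion of a single float to one value per bin are assumptions, and the function names are hypothetical.

```python
import math


def regularization_grid(reg_min, reg_max, n_steps):
    """Return n_steps regularization strengths from reg_min to reg_max.

    Assumption: the grid points are spaced logarithmically.
    """
    if n_steps < 2:
        return [reg_min]
    log_min, log_max = math.log10(reg_min), math.log10(reg_max)
    step = (log_max - log_min) / (n_steps - 1)
    return [10 ** (log_min + i * step) for i in range(n_steps)]


def per_bin_regularization(reg, n_bins):
    """Turn a float or a list of floats into one value per bin."""
    if isinstance(reg, (int, float)):
        # A single value is used for every bin.
        return [float(reg)] * n_bins
    values = [float(r) for r in reg]
    if len(values) != n_bins:
        raise ValueError(f"expected {n_bins} values, got {len(values)}")
    return values
```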

Systematics¶

data_based_systematics_range: bool
Calculate data-based systematics for propagation length unfolding.
calculate_effective_area_systematics_range: bool
Calculate effective areas for all defined detector sectors in propagation length unfolding.
data_based_systematics_energy: bool
Calculate data-based systematics for surface energy unfolding.
calculate_effective_area_systematics_energy: bool
Calculate effective areas for all defined detector sectors in surface energy unfolding.
sectors: dict
Define cuts on reconstructions to divide detector into different sectors.

Binning¶

range_min: float
Lower limit for propagation length bins.
range_max: float
Upper limit for propagation length bins.
n_range_bins: int
Number of propagation length bins.
n_proxy_bins: int
Number of proxy bins.
range_log: bool
Logarithmic propagation length bins.
energy_min: float
Lower limit for surface energy bins.
energy_max: float
Upper limit for surface energy bins.
n_energy_bins: int
Number of surface energy bins.
energy_log: bool
Logarithmic surface energy bins.
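
The binning settings translate into bin edges roughly as in the sketch below, shown here for a generic lower limit, upper limit, number of bins and log flag. The helper is hypothetical, and the analysis code may construct its edges differently (for example with NumPy).

```python
import math


def bin_edges(lower, upper, n_bins, log=False):
    """Return the n_bins + 1 edges of equally wide bins from lower to upper.

    With log=True the bins are equally wide in log10 space.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if log:
        if lower <= 0:
            raise ValueError("logarithmic bins need a positive lower limit")
        log_lo, log_hi = math.log10(lower), math.log10(upper)
        return [10 ** (log_lo + i * (log_hi - log_lo) / n_bins)
                for i in range(n_bins + 1)]
    return [lower + i * (upper - lower) / n_bins for i in range(n_bins + 1)]
```

The propagation length edges would then follow from bin_edges(range_min, range_max, n_range_bins, log=range_log), and the surface energy edges from the corresponding energy settings.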

General¶

model: str
Flux model (key in files).
seed: int
Random seed.

Data¶

keys: dict
Keys that are read from the hdf5 files. Each parent key corresponds to a data type. The sub-keys define which key to read from the individual files. The value names correspond to ["name_to_save_in_dictionary", "name_in_hdf5_file"].
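
One possible reading of this structure is sketched below: for a given data type, each sub-key names a table in the file, and each value pair renames one entry of that table. This interpretation, the helper name and all key names in the example are assumptions. A plain nested dictionary stands in for an opened hdf5 file.

```python
def read_keys(file_like, type_keys):
    """Collect the requested entries of one file under their new names.

    file_like: mapping of file key -> mapping of entry name -> values
               (stand-in for an opened hdf5 file)
    type_keys: mapping of file key -> list of
               [name_to_save_in_dictionary, name_in_hdf5_file] pairs
    """
    data = {}
    for file_key, pairs in type_keys.items():
        table = file_like[file_key]
        for save_name, name_in_file in pairs:
            data[save_name] = table[name_in_file]
    return data


# Hypothetical config fragment: parent key = data type.
keys = {
    "mc": {"TableA": [["energy", "col1"], ["length", "col2"]]},
    "data": {"TableA": [["energy", "col1"]]},
}
fake_file = {"TableA": {"col1": [1.0, 2.0], "col2": [3.0, 4.0]}}
mc_data = read_keys(fake_file, keys["mc"])
```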

Notebooks for Testing¶

Some analysis tests are not included in the pipeline. These tests can be run from Jupyter notebooks located in stoppingmuons/notebooks/. They are not required to run the analysis but serve as important cross-checks for certain steps. All test results are included and evaluated in this Wiki.