Technical Documentation

The analysis scripts can be found in the StoppingMuonAnalysis repository located in /home/lwitthaus/stoppingmuons/. The data processing makes use of additional methods defined in the ic3-labels and ic3-data repos.

The scripts for training the network are not included in the analysis repo. They can be found in the dnn_reco repo, together with detailed documentation on their usage.

All scripts can be run using the virtual environment located in /home/lwitthaus/venvs/stmuons_py3v10/.

The general analysis workflow consists of the following steps:

  1. Create labels for untriggered level 0 MC data (needed for effective area estimation)

  2. Create the level 2 DNN training data

  3. Train and export the networks

  4. Apply the networks to MC/measured data

  5. Create plots to check reconstruction quality (for MC only)

  6. Calculate effective areas

  7. Unfolding

Processing the data

All processing-related scripts are located in the ic3-processing repository. The custom modules for this analysis are defined in /home/lwitthaus/stoppingmuons/stoppingmuons/processing/, and the configs to run the processing are stored in /home/lwitthaus/stoppingmuons/configs/processing. Information about their usage is provided in the README.md of the ic3-processing repo. Jobfiles can be created to run the scripts on the cluster with dagman (recommended), or the scripts can be run locally (for testing).

Data processing is required for the following analysis steps:

  • Creating labels for untriggered MC data

  • Creating training data for the DNN

  • Applying the DNNs to MC/measured data (main analysis processing)

The processing is divided into sub-steps, each of which is configured via its own config file.

The steps are executed in the following order:

  • Creating labels for untriggered data:

    1. L0Processing.yaml

  • Creating training data:

    1. TrainingData.yaml

  • DNN application to data:

    1. L2Processing.yaml

The resulting files are stored in hdf5 format. These files serve as input for the analysis.
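
The mapping from processing step to config file listed above can be written down compactly. The following is a minimal sketch, assuming the config files sit directly in the configs/processing directory named in this section. The step labels and the helper config_path are hypothetical and only serve as illustration.

```python
import posixpath

# Directory named in this documentation; the flat layout below it is an assumption.
CONFIG_DIR = "/home/lwitthaus/stoppingmuons/configs/processing"

# Processing step -> config file, as listed above.
# The step labels (dictionary keys) are hypothetical.
PROCESSING_CONFIGS = {
    "labels_untriggered": "L0Processing.yaml",
    "training_data": "TrainingData.yaml",
    "dnn_application": "L2Processing.yaml",
}


def config_path(step):
    """Return the full path of the config file for one processing step."""
    return posixpath.join(CONFIG_DIR, PROCESSING_CONFIGS[step])


for step in PROCESSING_CONFIGS:
    print(step, "->", config_path(step))
```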

Training the Network

As mentioned above, the scripts for training the networks are located in the dnn_reco repo. The configs to train the networks are provided in /home/lwitthaus/stoppingmuons/configs/training/. The training data is located in /data/user/lwitthaus/TrainingData/datasets/20904/TrainingData/.

Once the configs are available, the training follows the steps described in this section of the documentation. First, a data transformation model is created by running python create_trafo_model.py /PATH/TO/CONFIG. The training is then started by running python train_model.py /PATH/TO/CONFIG.
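
The order of the two training commands can be captured in a small wrapper. This is only a sketch: the helper names training_commands and run_training are hypothetical, and it assumes that both scripts are in the current working directory and that the interpreter of the active virtual environment should be used.

```python
import subprocess
import sys


def training_commands(config_path):
    """Build the two commands in the order in which they have to be run."""
    return [
        # Step 1: create the data transformation model.
        [sys.executable, "create_trafo_model.py", config_path],
        # Step 2: start the actual training.
        [sys.executable, "train_model.py", config_path],
    ]


def run_training(config_path, dry_run=True):
    """Run both steps in order; with dry_run=True the commands are only printed."""
    for cmd in training_commands(config_path):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True aborts before the training if the first step fails.
            subprocess.run(cmd, check=True)
```

Calling run_training("/PATH/TO/CONFIG") prints both commands without executing them, which is a cheap way to check the order before starting a long training.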

Data

Paths to finalized data sets will be linked here: …

Running the analysis

All subsequent steps for running the analysis are handled by the stoppingmuons/stoppingmuons/analysis/run_analysis.py script. It is run with a designated config file, which contains all necessary parameters, via python analysis.py /PATH/TO/CONFIG. A standard example is provided by stoppingmuons/configs/analysis/standard_analysis.yaml.

Find further information about the settings below:

Main

  • unique_name: str

    Unique name for the analysis. This will be used as a folder name to save the analysis results.

  • output_path: str

    Output path for the analysis results. A folder unique_name will be created in this directory.
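
How the two settings combine into the results folder can be sketched as follows. The helper make_result_dir is hypothetical, and reusing an already existing folder (exist_ok=True) is an assumption about the behaviour, not something this documentation states.

```python
import os
import tempfile


def make_result_dir(output_path, unique_name):
    """Create <output_path>/<unique_name> if needed and return its path."""
    result_dir = os.path.join(output_path, unique_name)
    # Assumption: an existing folder of the same name is reused.
    os.makedirs(result_dir, exist_ok=True)
    return result_dir


# Toy usage in a temporary directory so that nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    print(make_result_dir(tmp, "standard_analysis"))
```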

Loading data

  • inputs: dict

    Dictionary containing all information about the data.

    Each key specifies a distinct set of data. This analysis uses level 2 MC data, measured level 2 IceCube data and untriggered MC data. The following information has to be given:

    • Input path

    • Saved as hdf5 true/false (alternative: simple .pickle file)

    • Data level

    • Data type

    • Read livetime true/false (only necessary for measured data)

  • cut_exceptions: list

    List of data set keys to which the analysis cuts are not applied.

    This usually refers to the untriggered MC data.
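
The structure described above could look like the following sketch, written as a Python dictionary. All data set keys, field names and paths are hypothetical placeholders; the actual names are defined in standard_analysis.yaml.

```python
# Hypothetical data set keys and field names, for illustration only.
inputs = {
    "mc_level2": {
        "path": "/PATH/TO/MC_LEVEL2",
        "hdf5": True,
        "level": 2,
        "type": "mc",
        "read_livetime": False,
    },
    "data_level2": {
        "path": "/PATH/TO/DATA_LEVEL2",
        "hdf5": True,
        "level": 2,
        "type": "data",
        "read_livetime": True,  # livetime is only needed for measured data
    },
    "mc_untriggered": {
        "path": "/PATH/TO/MC_LEVEL0",
        "hdf5": True,
        "level": 0,
        "type": "mc",
        "read_livetime": False,
    },
}

# The untriggered MC data is usually excluded from the analysis cuts.
cut_exceptions = ["mc_untriggered"]


def datasets_to_cut(inputs, cut_exceptions):
    """Return the data set keys to which the analysis cuts are applied."""
    return [key for key in inputs if key not in cut_exceptions]


print(datasets_to_cut(inputs, cut_exceptions))
```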

Analysis cuts

  • filtermask: str

    Applied filter mask. Corresponds to the key under which the mask is saved.

  • cuts: dict

    Define analysis cuts.

    The keys correspond to the parameter the cut should be applied to. The values are the cut values.
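
A cut dictionary of this kind can be applied event by event. The sketch below assumes that each cut value is a (low, high) pair in which None leaves one side open; the actual format of the cut values is defined by the analysis code and may differ. The parameter names and toy events are hypothetical.

```python
def passes_cuts(event, cuts):
    """Return True if every cut parameter lies inside its (low, high) range."""
    for name, (low, high) in cuts.items():
        value = event[name]
        if low is not None and value < low:
            return False
        if high is not None and value > high:
            return False
    return True


# Hypothetical cut parameters and toy events.
cuts = {"stopping_score": (0.9, None), "zenith": (None, 1.2)}
events = [
    {"stopping_score": 0.95, "zenith": 0.4},  # passes both cuts
    {"stopping_score": 0.50, "zenith": 0.4},  # fails the score cut
    {"stopping_score": 0.99, "zenith": 1.5},  # fails the zenith cut
]

selected = [event for event in events if passes_cuts(event, cuts)]
print(len(selected), "of", len(events), "events pass")
```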

Reconstructions

  • evaluate_score_distribution: bool

    Evaluate the distribution of stopping muon score values as predicted by the DNN.

  • plot_multiplicity: bool

    Create stopping muon multiplicity plot based on MC data.

  • plot_correlations: bool

    Create correlation plots between MC truths and reconstructions based on MC data.

  • plot_data_mc_agreement: bool

    Create data-MC agreement plots for all DNN reconstructions.

  • analyse_mi_scores: bool

    Calculate and plot mutual information scores between defined proxies and the two target quantities of the analysis (propagation length and surface energy).

  • observables: list of str

    Proxies for mutual information scores.

Effective area

  • calculate_effective_area_range: bool

    Calculate effective area for propagation length unfolding.

  • calculate_effective_area_energy: bool

    Calculate effective area for surface energy unfolding.

  • load_effective_area_range: bool

    Load previously calculated effective area for propagation length unfolding.

  • load_effective_area_energy: bool

    Load previously calculated effective area for surface energy unfolding.

  • effective_area_iteration_number: int

    Number of bootstrapping iterations for calculating the effective area.

  • include_effective_area_syst_unc: bool

    Include systematic uncertainties in the calculation of the effective area.
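
To illustrate what the number of bootstrapping iterations controls, the sketch below bootstraps a plain selection efficiency on toy data. It is not the effective area calculation of the analysis, which is not described here; the function name and the toy sample are assumptions.

```python
import random
import statistics


def bootstrap_efficiency(selected_flags, n_iterations, seed=0):
    """Estimate mean and spread of a selection efficiency by bootstrapping.

    selected_flags holds one 0/1 value per generated event.
    n_iterations must be at least 2 for the spread to be defined.
    """
    rng = random.Random(seed)
    n_events = len(selected_flags)
    estimates = []
    for _ in range(n_iterations):
        # Draw n_events events with replacement and recompute the efficiency.
        sample = rng.choices(selected_flags, k=n_events)
        estimates.append(sum(sample) / n_events)
    return statistics.mean(estimates), statistics.stdev(estimates)


# Toy sample: 300 of 1000 generated events pass the selection.
flags = [1] * 300 + [0] * 700
mean_eff, std_eff = bootstrap_efficiency(flags, n_iterations=200, seed=42)
print(mean_eff, std_eff)
```

More iterations give a more stable estimate of the spread at the cost of run time.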

Unfolding

  • is_sim: bool

    If true, unfold simulation data; otherwise, unfold measured data.

  • unfold_range: bool

    Unfold propagation length distribution.

  • optimize_unfolding_range: bool

    Optimize regularization strength for propagation length unfolding.

  • load_unfolding_range_results: bool

    Load previous unfolding results.

  • plot_results_range: bool

    Plot unfolded propagation length flux.

  • evaluate_unfolding_range: bool

    Create evaluation plots for propagation length unfolding.

  • unfold_energy: bool

    Unfold surface energy distribution.

  • optimize_unfolding_energy: bool

    Optimize regularization strength for surface energy unfolding.

  • mceq_path (optional): str

    MCEq theory results input path.

  • load_unfolding_energy_results: bool

    Load previous unfolding results.

  • plot_results_energy: bool

    Plot unfolded surface energy flux.

  • evaluate_unfolding_energy: bool

    Create evaluation plots for surface energy unfolding.

  • train_size: int

    Number of events sampled for training.

  • test_size: int

    Number of events sampled for testing.

  • reg_range: float or list of floats

    Regularization strength for propagation length unfolding. Either total or binwise.

  • reg_energy: float or list of floats

    Regularization strength for surface energy unfolding. Either total or binwise.

  • epsilon: float

    Generic epsilon for systematics in funfolding.

  • plot_systematics: bool

    Create per-bin systematics plots.

  • bin_index: int

    Define bin to plot systematics for. If None, the systematics are plotted for all bins.

  • reg_min: float

    Minimum regularization strength for optimization.

  • reg_max: float

    Maximum regularization strength for optimization.

  • n_steps: int

    Number of grid points for optimization.

  • include_systematics: bool

    Include systematic uncertainties in the unfolding.

  • n_used_steps: int

    Number of steps in MCMC unfolding.

  • n_burn_steps: int

    Number of burn in steps in MCMC unfolding.

  • n_walkers: int

    Number of MCMC walkers.

  • range_fit: bool

    Fit effective ice parameters a and b using propagation length unfolding results.

  • energy_fit: bool

    Fit normalization and spectral index using surface energy unfolding results.
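
The settings reg_min, reg_max and n_steps together define a grid of regularization strengths for the optimization. A minimal sketch of such a grid is shown below, assuming logarithmic spacing; whether the analysis uses logarithmic or linear spacing is not stated here, and the function name is hypothetical.

```python
import math


def regularization_grid(reg_min, reg_max, n_steps):
    """Return n_steps log-spaced values from reg_min to reg_max (inclusive)."""
    if n_steps < 2:
        return [reg_min]
    log_min, log_max = math.log10(reg_min), math.log10(reg_max)
    step = (log_max - log_min) / (n_steps - 1)
    return [10 ** (log_min + i * step) for i in range(n_steps)]


# Toy values, not the ones used in the analysis.
grid = regularization_grid(1e-3, 1e1, 5)
print(grid)
```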

Systematics

  • data_based_systematics_range: bool

    Calculate data-based systematics for propagation length unfolding.

  • calculate_effective_area_systematics_range: bool

    Calculate effective areas for all defined detector sectors in propagation length unfolding.

  • data_based_systematics_energy: bool

    Calculate data-based systematics for surface energy unfolding.

  • calculate_effective_area_systematics_energy: bool

    Calculate effective areas for all defined detector sectors in surface energy unfolding.

  • sectors: dict

    Define cuts on reconstructions to divide detector into different sectors.

Binning

  • range_min: float

    Lower limit for propagation length bins.

  • range_max: float

    Upper limit for propagation length bins.

  • n_range_bins: int

    Number of propagation length bins.

  • n_proxy_bins: int

    Number of proxy bins.

  • range_log: bool

    Logarithmic propagation length bins.

  • energy_min: float

    Lower limit for surface energy bins.

  • energy_max: float

    Upper limit for surface energy bins.

  • n_energy_bins: int

    Number of surface energy bins.

  • energy_log: bool

    Logarithmic surface energy bins.
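
The binning settings translate into bin edges as in the following sketch. The helper bin_edges is hypothetical, the numbers are toy values rather than the ones used in the analysis, and how the proxy bins are constructed is not covered.

```python
import math


def bin_edges(lower, upper, n_bins, log=False):
    """Return the n_bins + 1 bin edges between lower and upper."""
    if log:
        # Equal bin width in log10 space; requires lower > 0.
        lo, hi = math.log10(lower), math.log10(upper)
        return [10 ** (lo + i * (hi - lo) / n_bins) for i in range(n_bins + 1)]
    return [lower + i * (upper - lower) / n_bins for i in range(n_bins + 1)]


# Linear bins, as for range_log = false.
range_edges = bin_edges(0.0, 10.0, 5)
# Logarithmic bins, as for energy_log = true.
energy_edges = bin_edges(1e2, 1e5, 3, log=True)
print(range_edges)
print(energy_edges)
```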

General

  • model: str

    Flux model (key in files).

  • seed: int

    Random seed.

Data

  • keys: dict

    Keys that are read from the hdf5 files. Each parent key corresponds to a data type. The sub-keys define which key to read from the individual files. The value names correspond to ["name_to_save_in_dictionary", "name_in_hdf5_file"].
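
One possible reading of this structure is sketched below, with a nested dictionary standing in for the content of an hdf5 file. It assumes that each sub-key names a table in the file and that its value is a single [name_to_save_in_dictionary, name_in_hdf5_file] pair; all table and column names are hypothetical.

```python
# Hypothetical keys entry for one data type.
keys = {
    "mc": {
        "TableA": ["reco_zenith", "zenith"],
        "TableB": ["stopping_score", "score"],
    },
}

# Stand-in for one hdf5 file: table name -> column name -> values.
fake_file = {
    "TableA": {"zenith": [0.1, 0.2], "azimuth": [1.0, 2.0]},
    "TableB": {"score": [0.9, 0.8]},
}


def read_keys(file_content, key_map):
    """Read the requested columns and store them under their new names."""
    return {
        save_name: file_content[table][column]
        for table, (save_name, column) in key_map.items()
    }


data = read_keys(fake_file, keys["mc"])
print(data)
```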

Notebooks for Testing

Some analysis tests are not included in the pipeline. These tests can be run from within Jupyter notebooks located in stoppingmuons/notebooks/. They have no further use in the analysis but serve as important cross-checks for certain steps. All test results are included and evaluated in this Wiki.