Example: Training of a Cascade Hypothesis

This example demonstrates how the Event-Generator framework can be used to define and train a model for a simple cascade hypothesis with 7 parameters (x, y, z, zenith, azimuth, energy, time).

The framework is steered by a general YAML config file. This config file defines all necessary settings, from training data specifications to event hypothesis, likelihood function, and neural network model architecture. For this example, we will use the config file provided here.

Configuration Settings

The configuration file includes many settings. Here, some of the most important ones are highlighted.

Model Settings

The config file contains a section titled “Model Settings” with a dictionary named model_settings. This is arguably the heart of the configuration file, as it defines the event hypothesis. This dictionary is defined in a nested way, where each element must define at least the model_class and the keyword arguments to that Python class via the config key. Instances of Multi-Source classes must also provide the nested models via the key multi_source_bases.

The general structure of the defined event hypothesis in the config file looks like the following:

# Settings for the neural network model class
model_settings: {

    # The source class to use
    'model_class': 'egenerator.model.multi_source.independent.IndependentMultiSource',

    # configuration settings for IndependentMultiSource object
    config: {
        'sources': {
            'noise': 'noise',
            'cascade': 'cascade',
        },
    },

    # Base sources used by the multi-source class
    multi_source_bases: {

        noise: {
            # The noise source model class to use
            'model_class': 'egenerator.model.source.noise.default.DefaultNoiseModel',
            config: {},
        },

        cascade: {
            # The cascade source model class to use
            'model_class': 'egenerator.model.source.cascade.default.DefaultCascadeModel',

            config: {
                [...]
            },
        },
    },
}

At the top-most level, this model is defined via the IndependentMultiSource class, which combines multiple sources under the assumption that they are independent. In this case, the model is set up with the defined sources named “noise” and “cascade”. The configuration for these sub-sources is provided in the multi_source_bases key. Essentially, this means that a 2-component event hypothesis is defined, consisting of a noise model via the class DefaultNoiseModel and a cascade via the class DefaultCascadeModel. Training via this configuration file will therefore train these two individual source definitions at the same time. Note that these are saved and exported in the same nested structure. Therefore, the trained cascade component can be used individually later on, if desired.
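The nested layout described above can be illustrated with a short Python sketch. It mirrors the example configuration as a plain dictionary and walks it recursively to list every model class. The helper list_model_classes is hypothetical and not part of the Event-Generator API, and the cascade config is left empty here because its contents are elided in the example.

```python
# Hypothetical helper for illustration only; not part of the Event-Generator API.
model_settings = {
    "model_class": "egenerator.model.multi_source.independent.IndependentMultiSource",
    "config": {"sources": {"noise": "noise", "cascade": "cascade"}},
    "multi_source_bases": {
        "noise": {
            "model_class": "egenerator.model.source.noise.default.DefaultNoiseModel",
            "config": {},
        },
        "cascade": {
            "model_class": "egenerator.model.source.cascade.default.DefaultCascadeModel",
            "config": {},  # placeholder: contents are elided in the example config
        },
    },
}


def list_model_classes(settings, name="root"):
    """Return (path, class name) pairs for every model in the nested settings."""
    # Every level must define at least 'model_class' and 'config'.
    for key in ("model_class", "config"):
        if key not in settings:
            raise KeyError(f"'{name}' is missing the required key '{key}'")

    entries = [(name, settings["model_class"].rsplit(".", 1)[-1])]
    # Multi-source classes additionally list their nested base models.
    for base_name, base in settings.get("multi_source_bases", {}).items():
        entries.extend(list_model_classes(base, f"{name}/{base_name}"))
    return entries


for path, class_name in list_model_classes(model_settings):
    print(f"{path}: {class_name}")
```

Because the walk is recursive, the same sketch would also handle deeper nesting, for example a multi-source whose bases are themselves multi-sources.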

This example demonstrates a simple nested structure of 2 base sources. In principle, one can add an arbitrary number of layers to this nested structure. A more complex example that defines a 2-cascade hypothesis with a previously trained cascade model is given here. By providing the keys load_dir and data_trafo_settings, a previously trained model is loaded instead of starting from scratch.

An additional example for a “track” consisting of 10 cascades is provided here. As of now, the Event-Generator framework does not scale well to Multi-Sources with many individual models. A track definition with 100 to 1000 independent cascade models is currently not feasible and might require optimizations in the software framework. A workaround may be to directly define more complex sources for such cases, instead of stacking 100 individual models.

Loss Module Settings

This section defines the loss function (likelihood) to use. For this example config, the likelihood consists of two components. The first component is an unbinned likelihood for the pulse arrival times, defined in the function unbinned_pulse_time_llh of the module DefaultLossModule. The second component is a negative binomial distribution for the total measured charge, defined in the function negative_binomial_charge_pdf. The negative binomial is chosen instead of a simple Poisson distribution so that over-dispersion from marginalization over systematic parameters can be accounted for.
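To make the over-dispersion argument concrete, the following sketch compares a Poisson log-probability with a negative binomial one. It assumes a parametrization by the mean mu and an over-dispersion parameter alpha, with variance mu + alpha * mu**2. This parametrization and the function names are assumptions for illustration; the actual implementation in negative_binomial_charge_pdf may differ.

```python
import math


def poisson_logpmf(k, mu):
    """Log-probability of observing k counts for a Poisson with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def negative_binomial_logpmf(k, mu, alpha):
    """Log-probability of k counts for a negative binomial distribution.

    Assumed parametrization: mean mu, variance mu + alpha * mu**2.
    For alpha -> 0 the distribution approaches a Poisson with mean mu.
    """
    if mu <= 0 or alpha <= 0:
        raise ValueError("mu and alpha must be positive")
    r = 1.0 / alpha                     # number-of-failures parameter
    log_p = -math.log1p(alpha * mu)     # log of p = 1 / (1 + alpha * mu)
    log_1mp = math.log(alpha * mu) + log_p  # log of 1 - p
    return (
        math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
        + r * log_p + k * log_1mp
    )


# Same expected charge, but the negative binomial has a wider tail.
mu = 10.0
for k in (10, 40):
    print(k, poisson_logpmf(k, mu), negative_binomial_logpmf(k, mu, alpha=0.5))
```

For a charge far from the expectation, the negative binomial assigns a noticeably higher probability than the Poisson, which is the additional width that the marginalization over systematic parameters calls for.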

Training Settings

The key training_settings defines the training and learning rate decay strategy. In this example, the Adam optimizer is used with 3 training stages. The first stage consists of 10000 optimization steps with a learning rate that starts at 0.01 and decays linearly to 0.001. Afterwards, 490000 optimization steps are performed with a fixed learning rate of 0.001. Finally, 500000 steps are performed with a learning rate that decays from 0.001 down to 0.000001 following a polynomial of second degree. In total, the model is trained for 1 million optimization steps, as defined in the num_training_iterations key. One optimization step is the forward and backward propagation of one batch of training data.
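The three-part learning rate schedule described above can be sketched as a plain Python function. The function name is hypothetical, and the exact form of the second-degree polynomial decay is an assumption: the sketch uses the form of TensorFlow's PolynomialDecay schedule with power 2, which may not match the schedule class actually configured.

```python
def learning_rate(step):
    """Learning rate at a given optimization step (0-based), assumed schedule."""
    if step < 10_000:
        # Part 1: linear decay from 0.01 to 0.001 over 10000 steps.
        frac = step / 10_000
        return 0.01 + frac * (0.001 - 0.01)
    if step < 500_000:
        # Part 2: constant learning rate for 490000 steps.
        return 0.001
    if step < 1_000_000:
        # Part 3: second-degree polynomial decay from 0.001 to 0.000001
        # over 500000 steps (assumed form, see note above).
        frac = (step - 500_000) / 500_000
        return (0.001 - 1e-6) * (1.0 - frac) ** 2 + 1e-6
    # After num_training_iterations the final value is held.
    return 1e-6


for step in (0, 5_000, 10_000, 250_000, 750_000, 1_000_000):
    print(step, learning_rate(step))
```

The part lengths add up to 10000 + 490000 + 500000 = 1000000 steps, matching num_training_iterations.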

The key save_frequency defines the interval at which the model is saved to disk. The Event-Generator framework keeps track of how many training steps are performed with which configuration file. You can stop and restart the training procedure at any time. Subsequent training will pick up from the last saved checkpoint, unless the setting model_manager_settings['restore_model'] is set to False. Note, however, that the optimizer settings are not saved to disk. Thus, interrupting and restarting means that the learning rate strategy starts from the beginning. In any case, all configuration and training settings are saved together with the model checkpoint, such that the training remains reproducible.

Training and Exporting a Model

Once the configuration file is created, the model can be trained. This is done in two steps. First, the data transformation model must be created with the Python script create_trafo_model.py. This model performs basic transformations of the event hypothesis input parameters such that they are normalized and easier to use within the network architecture. Create the transformation model by running the following command:

python create_trafo_model.py /PATH/TO/MY/CONFIG/FILE

This step should be fairly quick to run. A number of batches will be read in from the training data to obtain summary statistics on the event hypothesis parameters.
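As a rough illustration of what collecting summary statistics over batches can look like, the sketch below accumulates a per-parameter mean and standard deviation with Welford's algorithm and uses them to normalize a parameter vector. The class RunningStats is hypothetical, and the actual transformation model created by create_trafo_model.py may apply additional or different transformations.

```python
import math


class RunningStats:
    """Per-parameter running mean and standard deviation over batches."""

    def __init__(self, num_params):
        self.count = 0
        self.mean = [0.0] * num_params
        self.m2 = [0.0] * num_params  # running sum of squared deviations

    def update(self, batch):
        """Update the statistics with a batch (list of parameter rows)."""
        for row in batch:
            self.count += 1
            for i, value in enumerate(row):
                # Welford's update: numerically stable single-pass variance.
                delta = value - self.mean[i]
                self.mean[i] += delta / self.count
                self.m2[i] += delta * (value - self.mean[i])

    def std(self):
        """Population standard deviation of each parameter."""
        return [math.sqrt(m2 / self.count) for m2 in self.m2]

    def normalize(self, row, eps=1e-6):
        """Shift and scale a parameter row to zero mean and unit width."""
        return [(v - m) / (s + eps) for v, m, s in zip(row, self.mean, self.std())]


# Toy data with two parameters per event, read in two batches.
stats = RunningStats(num_params=2)
stats.update([[1.0, 10.0], [3.0, 30.0]])
stats.update([[5.0, 50.0]])
print(stats.mean, stats.std())
print(stats.normalize([5.0, 50.0]))
```

Only the running count, mean, and squared deviations are kept between batches, so the statistics can be gathered without holding the training data in memory.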

Once this step completes, the training of the neural networks can begin by executing the following:

python train_model.py /PATH/TO/MY/CONFIG/FILE

As noted above in the Training Settings, this step may be run as many times and with as many different training settings as desired.

When training is complete, the model can be exported by running the following command:

python export_manager.py /PATH/TO/MY/CONFIG/FILE -o /PATH/TO/OUTPUT/DIR
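The three commands above can also be chained from a small Python driver that stops at the first failing step. This is a convenience sketch only: the functions build_commands and run_pipeline are hypothetical, and the sketch assumes that the three scripts are located in the current working directory.

```python
import subprocess
import sys


def build_commands(config_path, output_dir):
    """Return the three command lines in the order they must be run."""
    return [
        [sys.executable, "create_trafo_model.py", config_path],
        [sys.executable, "train_model.py", config_path],
        [sys.executable, "export_manager.py", config_path, "-o", output_dir],
    ]


def run_pipeline(config_path, output_dir):
    """Run all steps in order; abort on the first failure."""
    for command in build_commands(config_path, output_dir):
        print("Running:", " ".join(command))
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run on top of a failed one.
        subprocess.run(command, check=True)


# Example call (placeholders as in the commands above):
# run_pipeline("/PATH/TO/MY/CONFIG/FILE", "/PATH/TO/OUTPUT/DIR")
```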

The exported model can then be distributed and used within the provided I3TraySegments as described in the section Apply Exported Model.