Example: Training of a Cascade Hypothesis
Here, an example is shown that demonstrates how the Event-Generator
framework
can be used to define and train a model for a simple cascade hypothesis
consisting of 7 parameters (x, y, z, zenith, azimuth, energy, time).
The framework is steered by a general yaml-config file. In this config file, all necessary settings are defined from training data specifications, to event hypothesis, likelihood function, and neural network model architecture. For this example, we will use the config file provided here.
Configuration Settings
The configuration file includes many settings. Here, some of the most important ones are highlighted.
Model Settings
The config file contains a section title “Model Settings” with a dictionary
of the name model_settings
. This is arguably the heart of the configuration
file, as it defines the event hypothesis.
This dictionary is defined in a nested way, where each element must at least
define the model_class
and keyword arguments to that python class via the
config
parameter. Instances of Multi-Source
classes will also need
to provide the nested models via the key multi_source_bases
.
The general structure of the defined event hypothesis in the config file looks like the following:
# Settings for the neural network model class
model_settings: {
# The source class to use
'model_class': 'egenerator.model.multi_source.independent.IndependentMultiSource',
# configuration settings for DefaultMultiCascadeModel object
config: {
'sources': {
'noise': 'noise',
'cascade': 'cascade',
},
},
# Base sources used by the multi-source class
multi_source_bases: {
noise: {
# The noise source model class to use
'model_class': 'egenerator.model.source.noise.default.DefaultNoiseModel',
config: {},
},
cascade: {
# The cascade source model class to use
'model_class': 'egenerator.model.source.cascade.default.DefaultCascadeModel',
config: {
[...]
},
},
},
}
At the top-most level, this model is defined via the IndependentMultiSource
class,
which is a class that combines multiple sources assuming that they are independent.
In this case, the model is set up with the defined sources
named “noise” and “cascade”.
The configuration for these sub-sources are provided in the multi_source_bases
key.
Essentially this means that a 2-component event hypothesis is defined consisting of
a noise model via the class DefaultNoiseModel
and a cascade via the class DefaultCascadeModel
.
Training via this configuration file will therefore train these two individual source definitions at the same time.
Note that these are saved and exported in the same nested structure. Therefore, the trained
cascade component can be used individually later on, if desired.
This example demonstrates a simple nested structure of 2 base sources.
In principle, one can add an arbitrary number of layers to this nested structure.
A more complex example that defines a 2-cascade hypothesis with a previously
trained cascade model is given here.
By providing the keys load_dir
and data_trafo_settings
, a previously trained
model is loaded instead of starting from scratch.
An additional example for a “track” consisting of 10 cascades is provided here.
As of now, the Event-Generator
framework does not scale too well to Multi-Sources with many
individual models. A track definition with 100-1000 independent cascade models is currently
not feasible. This might require some optimizations in the software framework.
A workaround may be to directly define more complex sources for cases such as these, instead
of stacking 100 individual models.
Loss module settings
This defines the loss-function (likelihood) to use. For this example config,
the likelihood is defined via two components.
The first component is an unbinned likelihood for the pulse arrival times,
defined in the function unbinned_pulse_time_llh
of the module DefaultLossModule
.
The second component is a negative binomial distribution for the total measured charge
via the function negative_binomial_charge_pdf
.
The negative binomial is chosen instead of a simple Poisson distribution, such
that over-dispersion from marginalization over systematic parameters can be accounted for.
Training Settings
The key training_settings
defines the training and learning rate decay
strategy.
In this example, the ADAM
optimizer is used with 3 training steps.
The first training step consisting of 10000 optimization steps uses a learning rate
that starts at 0.01 and then decays linearly to 0.001.
Afterwards, 490000 optimization steps are performed with a fixed learning rate
of 0.001.
Finally, 500000 steps are performed with a learning rate that decays from 0.001
down to 0.000001 with a polynomial of second degree.
In total, the model is trained for 1 million optimization steps as defined
in the num_training_iterations
key.
One optimization step is the forward and backward propagation one batch of
training data.
The key save_frequency
defines at which intervals the model is saved to disk.
The Event-Generator
framework keeps track of how many training steps are performed
with which configuration file. You can stop and restart the training procedure
at any time. Concurrent training will pick-up from the last saved checkpoint,
unless the setting model_manager_settings['restore_model']
is set to False.
Note however, that the optimizer settings are not saved to disk. Thus, interrupting
and restarting will mean that the learning rate strategy starts from the beginning.
In any case, all configuration and training settings are saved to together with
the model checkpoint, such that it remains reproducible.
Training and Exporting an Model
Once the configuration file is created, the model can be trained. This is done in two steps. First, the data transformation model must be created with the python script create_trafo_model.py . This model performs basic transformations to the event hypothesis input parameters such that these are normalized and easier to use within the network architecture. Create the transformation model by running the following command:
python create_trafo_model.py /PATH/TO/MY/CONFIG/FILE
This step should be fairly quick to run. A number of batches will be read in from the training data to obtain summary statistics on the event hypothesis parameters.
Once this steps completes, the training of the neural networks can begin by executing the following:
python train_model.py /PATH/TO/MY/CONFIG/FILE
As noted above in the Training Settings, this step may be run as many times and with as many different training settings as desired.
When training is complete, the model can be exported by running the following command:
python export_manager.py /PATH/TO/MY/CONFIG/FILE -o /PATH/TO/OUTPUT/DIR
The exported model can then be distributed and used within the provided I3TraySegments as described in the section Apply Exported Model.