.. _example_cascade: Example: Training of a Cascade Hypothesis ***************************************** Here, an example is shown that demonstrates how the |egenerator| framework can be used to define and train a model for a simple cascade hypothesis consisting of 7 parameters (x, y, z, zenith, azimuth, energy, time). The framework is steered by a general yaml-config file. In this config file, all necessary settings are defined from training data specifications, to event hypothesis, likelihood function, and neural network model architecture. For this example, we will use the config file provided `here `_. Configuration Settings ---------------------- The configuration file includes many settings. Here, some of the most important ones are highlighted. Model Settings ============== The config file contains a section title "Model Settings" with a dictionary of the name ``model_settings``. This is arguably the heart of the configuration file, as it defines the event hypothesis. This dictionary is defined in a nested way, where each element must at least define the ``model_class`` and keyword arguments to that python class via the ``config`` parameter. Instances of ``Multi-Source`` classes will also need to provide the nested models via the key ``multi_source_bases``. The general structure of the defined event hypothesis in the config file looks like the following: .. code-block:: yaml # Settings for the neural network model class model_settings: { # The source class to use 'model_class': 'egenerator.model.multi_source.independent.IndependentMultiSource', # configuration settings for DefaultMultiCascadeModel object config: { 'sources': { 'noise': 'noise', 'cascade': 'cascade', }, }, # Base sources used by the multi-source class multi_source_bases: { noise: { # The noise source model class to use 'model_class': 'egenerator.model.source.noise.default.DefaultNoiseModel', config: {}, }, cascade: { # The cascade source model class to use 'model_class': 'egenerator.model.source.cascade.default.DefaultCascadeModel', config: { [...] }, }, }, } At the top-most level, this model is defined via the ``IndependentMultiSource`` class, which is a class that combines multiple sources assuming that they are independent. In this case, the model is set up with the defined ``sources`` named "noise" and "cascade". The configuration for these sub-sources are provided in the ``multi_source_bases`` key. Essentially this means that a 2-component event hypothesis is defined consisting of a noise model via the class ``DefaultNoiseModel`` and a cascade via the class ``DefaultCascadeModel``. Training via this configuration file will therefore train these two individual source definitions at the same time. Note that these are saved and exported in the same nested structure. Therefore, the trained cascade component can be used individually later on, if desired. This example demonstrates a simple nested structure of 2 base sources. In principle, one can add an arbitrary number of layers to this nested structure. A more complex example that defines a 2-cascade hypothesis with a previously trained cascade model is given `here `_. By providing the keys ``load_dir`` and ``data_trafo_settings``, a previously trained model is loaded instead of starting from scratch. An additional example for a "track" consisting of 10 cascades is provided `here `_. As of now, the |egenerator| framework does not scale too well to Multi-Sources with many individual models. A track definition with 100-1000 independent cascade models is currently not feasible. This might require some optimizations in the software framework. A workaround may be to directly define more complex sources for cases such as these, instead of stacking 100 individual models. Loss module settings ==================== This defines the loss-function (likelihood) to use. For this example config, the likelihood is defined via two components. The first component is an unbinned likelihood for the pulse arrival times, defined in the function ``unbinned_pulse_time_llh`` of the module ``DefaultLossModule``. The second component is a negative binomial distribution for the total measured charge via the function ``negative_binomial_charge_pdf``. The negative binomial is chosen instead of a simple Poisson distribution, such that over-dispersion from marginalization over systematic parameters can be accounted for. .. _example_cascade__training_settings: Training Settings ================= The key ``training_settings`` defines the training and learning rate decay strategy. In this example, the ``ADAM`` optimizer is used with 3 training steps. The first training step consisting of 10000 optimization steps uses a learning rate that starts at 0.01 and then decays linearly to 0.001. Afterwards, 490000 optimization steps are performed with a fixed learning rate of 0.001. Finally, 500000 steps are performed with a learning rate that decays from 0.001 down to 0.000001 with a polynomial of second degree. In total, the model is trained for 1 million optimization steps as defined in the ``num_training_iterations`` key. One optimization step is the forward and backward propagation one batch of training data. The key ``save_frequency`` defines at which intervals the model is saved to disk. The |egenerator| framework keeps track of how many training steps are performed with which configuration file. You can stop and restart the training procedure at any time. Concurrent training will pick-up from the last saved checkpoint, unless the setting ``model_manager_settings['restore_model']`` is set to False. Note however, that the optimizer settings are not saved to disk. Thus, interrupting and restarting will mean that the learning rate strategy starts from the beginning. In any case, all configuration and training settings are saved to together with the model checkpoint, such that it remains reproducible. Training and Exporting an Model ------------------------------- Once the configuration file is created, the model can be trained. This is done in two steps. First, the data transformation model must be created with the python script `create_trafo_model.py `_ . This model performs basic transformations to the event hypothesis input parameters such that these are normalized and easier to use within the network architecture. Create the transformation model by running the following command: .. code-block:: bash python create_trafo_model.py /PATH/TO/MY/CONFIG/FILE This step should be fairly quick to run. A number of batches will be read in from the training data to obtain summary statistics on the event hypothesis parameters. Once this steps completes, the training of the neural networks can begin by executing the following: .. code-block:: bash python train_model.py /PATH/TO/MY/CONFIG/FILE As noted above in the :ref:`Training Settings`, this step may be run as many times and with as many different training settings as desired. When training is complete, the model can be exported by running the following command: .. code-block:: bash python export_manager.py /PATH/TO/MY/CONFIG/FILE -o /PATH/TO/OUTPUT/DIR The exported model can then be distributed and used within the provided I3TraySegments as described in the section :ref:`Apply Exported Model`.