Q & A

Questions by Collaboration Reviewer Christian Haack

Event selection

Q: How do you (precisely) define stopping event?

An event is labeled as a stopping event if all muons of the event that enter the pre-defined detector volume stop inside this volume. An event with multiple muons that never enter the detector can therefore still be a stopping event, as long as all muons that do enter stop within the volume. The volume is defined here as a cylinder with a radius of 500 m and a height of 1 km around the coordinate system origin. It could in principle be chosen differently; it should cover most of the detector and is important for the effective area later on. We chose it to be slightly smaller than the instrumented volume.
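A minimal sketch of this labeling logic (not the actual selection code; `stop_positions` and `entered_volume` are hypothetical per-muon inputs, and the cylinder dimensions are the ones quoted above):

```python
import numpy as np

# Cylinder of radius 500 m and height 1 km around the coordinate system origin.
R, HALF_HEIGHT = 500.0, 500.0  # meters

def inside_cylinder(pos):
    """True if the point (x, y, z) lies inside the analysis cylinder."""
    x, y, z = pos
    return np.hypot(x, y) <= R and abs(z) <= HALF_HEIGHT

def is_stopping_event(stop_positions, entered_volume):
    """stop_positions: one (x, y, z) stopping point per muon.
    entered_volume: one bool per muon, True if the muon enters the cylinder.
    Muons that never enter the volume are ignored; the event is a stopping
    event only if every entering muon stops inside the volume."""
    entering = [p for p, e in zip(stop_positions, entered_volume) if e]
    return len(entering) > 0 and all(inside_cylinder(p) for p in entering)
```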


Q: Is there a reason why you are not using per-DOM timing information (such as the time of the first hit per DOM)?

I actually do. The timing information is used per DOM. I also mention this explicitly in the Wiki now.


Q: For your studies on the data/mc agreement you show the impact on the reconstructed depth and zenith, but not on the classification score. Could you add that plot as well?

The plot was added to the Wiki.


Q: Depending on your definition of “stopping event”: Wouldn’t an additional classification (or regression) target for the muon number help you in selecting stopping single muons? Also explicitly trying to cut away bundles should remove systematic uncertainties (for example: how well we can model the muon multiplicity distribution in our simulations)

Depending on the score cut, more than 90% of the stopping muon events are single muons (the plots can be found in the event selection section). This is simply a property of the event type itself: most of the lower-energy muons stop before reaching the detector. As far as I know, it is not really possible to resolve individual muons of a bundle with the IceCube detector, which is why I did not train an additional network. I would expect it to have a very hard time distinguishing between single muons and bundles, simply leaving us with less statistics. In any case, the visible “track” in the detector should correspond to that of the highest-energy muon, because it propagates the longest distance through the ice while producing light.


Q: Could you show a few plots of the data/mc agreement of your reconstructed observable as function of the score cut (also one for score > 0.995 or 0.999?)?

The plots were added to the Wiki (Fig. 32 - Fig. 37).


Q: How do systematic uncertainties (such as ice model, dom efficiency) impact the resolutions of the reconstructed observables?

The effects of the different systematics on the reconstructed observables are now displayed at the end of the Event Selection section. The plots show the observable distributions with varying systematic parameters.


Effective area

Q: I’m still having some trouble with the effective area section. The unfolding gives you the true surface energy (E) and propagation length / depth (L). You then construct the effective area A(L) (so a 1D histogram with units length^2 ?) to convert to Phi(L). What happens with the energy dimension? Wouldn’t you actually want to construct an effective area A(L, E) and then integrate the resulting flux over energy? Also in this approach, systematic uncertainties from e.g. hadronic interaction models would result in changes of the effective area, right? Are you planning to take these uncertainties into account? (e.g. by producing effective areas using two different hadronic interaction models?)

Yes, the effective area is calculated as a 1D histogram: A(L) in the case of propagation length unfolding and A(E) in the case of surface energy unfolding. The precise description in the Wiki refers to the calculation of A(L); the second effective area A(E) is calculated analogously, with the propagation length L substituted by the surface energy E. This means that for A(L) there is never an energy dimension to integrate over (and vice versa). Note that this is somewhat different from the “standard” way of calculating, for example, neutrino effective areas, because here we are not just interested in an energy spectrum.

Systematic uncertainties also have an impact on the effective area calculation. They are not yet taken into account, but we should include them (this becomes possible once more snowstorm data is available). Model dependencies should cancel out in the ratio of triggered to untriggered data; however, everything that changes the trigger rate has an impact.


Q: Regarding the effective area calculation itself: So for calculating A(L) you are effectively averaging over the energy (since you have that information for every event) before converting from event rate to flux. Aren’t you throwing away some information there? The surface energy of each event will be correlated to the propagation depth, so wouldn’t you get a better estimate of Phi(L) if you would calculate a 2D effective area A(L, E), multiply that to your 2D event count histogram N(L,E) and then integrate out the energy dimension (and vice-versa for the surface flux)?

I agree that, in theory, averaging before converting from an event rate to a flux throws away a certain amount of information, especially if our simulations do not agree with data. With perfect data-MC agreement, it should not make a difference at which level the averaging is carried out. The main reason why it is done this way is that the software we are using does not currently support multidimensional unfolding in target space. Implementing this functionality would also require a careful treatment of e.g. regularization and would thus be a task for entirely new analyses in the future. This is why the effective area needs to be averaged before unfolding in my analysis. The main argument from a physical standpoint is that we believe our simulations describe the data sufficiently well that it should not really make a difference.


Q: In the results section you show that both approaches yield nearly identical results - which approach will you use for your analysis and why?

I will be using the more general approach (described first in the effective area section) for reasons of simplicity. The second, more specific approach was an additional test triggered by a discussion in the CR-Call. I think it was difficult to understand which muons I want to correct for with the effective area. Since I am performing the analysis on stopping muons, it was suggested that I should only correct for muons that actually stop within the detector volume (in short: how many muons stop within IceCube that we did not measure). However, each muon has its own propagation length independent of its exact stopping point (which may lie inside the detector or not). Hence, one can also correct for all muons (not just those stopping inside IceCube) as long as the injection surface etc. is handled properly. This is the only difference between the two approaches. It was simply a check to confirm that both ways of thinking indeed give almost identical results.

Unfolding

Q: Regarding the unfolding package: I assume that this is the same software that Leonora uses for the atmospheric flux unfolding?

Yes, the package is funfolding, exactly the same one that Leonora uses.


Q: In the unfolding section, you show the weighting function obtained from snowstorm simulation - how do you obtain these exactly? Are you using a small gaussian kernel to reweight the snowstorm set to the respective value shown on the x-axis in the weighting function plots?

The systematic parameters are sampled uniformly in the snowstorm simulation, so the average value of each parameter corresponds to the center of the interval it is sampled from. By applying cuts on the systematic parameters, we can shift this average value and thereby create multiple subsets with different average parameters. So I am not using a Gaussian kernel, I just “shift” the uniform distribution around. For each of the subsets I count the events in my observable bins (with the weights adjusted to the size of the subset) and calculate the ratio to the baseline. This ratio is then fitted with a simple polynomial for each bin and each systematic, which is what you see in the plots in the unfolding section.
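Schematically, the construction of these weighting functions could look like the following sketch (placeholder names such as `param`, `obs`, `weights`, and `windows` standing for numpy arrays; this is not the analysis code itself):

```python
import numpy as np

def subset_hist(obs, weights, obs_bins, mask):
    """Observable histogram of a subset, with the weights rescaled to the
    subset size so that differently sized subsets are comparable."""
    scale = len(weights) / mask.sum()
    return np.histogram(obs[mask], bins=obs_bins, weights=weights[mask] * scale)[0]

def weighting_functions(param, obs, weights, obs_bins, windows, deg=2):
    """param: uniformly sampled systematic parameter (one value per event).
    windows: list of (lo, hi) cuts that define the shifted subsets.
    Returns one polynomial (coefficient array) per observable bin."""
    baseline = subset_hist(obs, weights, obs_bins, np.ones(len(param), bool))
    centers = np.array([0.5 * (lo + hi) for lo, hi in windows])  # shifted average values
    ratios = np.array([
        subset_hist(obs, weights, obs_bins, (param >= lo) & (param < hi))
        / np.clip(baseline, 1e-12, None)
        for lo, hi in windows
    ])  # shape (n_windows, n_obs_bins)
    # fit the ratio vs. the shifted parameter value with a simple polynomial, bin by bin
    return [np.polyfit(centers, ratios[:, i], deg) for i in range(len(obs_bins) - 1)]
```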


Q: Can you quote the values for regularization strength and log-offset that you found? Can you show how your unfolding roundtrip (in the results section) changes as function of these parameters?

The roundtrip is included in the Systematic Checks section.


Q: There’s one additional test I’d be interested in: If you create a pseudo dataset where the systematic parameters are offset from the baseline, do you recover them in the unfolding - ie are the posteriors in the MCMC centered around the value you have injected?

The test is shown in the Systematic Checks section.


Q: Could you add your unblinding plan to your analysis documentation?

The unblinding plan was added to the Overview section.

Questions by CR WG Reviewer Dennis Soldin

Overview

Q: I think it is important to explain that this analysis selects events that do not traverse the entire detector, i.e. that all muons in the bundle stop in the detector volume, and that this analysis measures the highest energetic muon of the bundle, i.e. the muon with the largest track length. I think this can be made clearer in order to avoid questions. Related to that, do you have a plot of the true muon multiplicities for your event selection from MC? I think this is important information that should be added to the wiki.

The information was added to the Overview section and the muon multiplicities plot was added to the Event Selection paragraph.


Q: Generally: What is the main goal of this analysis? To derive the propagation length distribution or to derive the energy spectrum (or both)? I think this should be clearly stated in the introduction.

A clear statement was added to the Overview section.

Q: “…or even the primary cosmic ray energy…” According to the Heitler model, for example, the muon multiplicity depends on the primary energy, as well as the primary mass. Thus, it is not possible to disentangle energy and mass, and I do not think that it is a good proxy for the primary energy.

The statement has been removed.

Event selection

Q: Currently the training uses old CORSIKA MC, what is the time line for the updated MC distributions? I think this will be required before unblinding.

The networks were retrained on new MC data (20904) and the plots were updated. Information about the old CORSIKA MC sets was deleted from the wiki.


Q: What is the standard deviation of pulses at each DOM, how is this defined? Can you explain this in more detail on your wiki? Also, what processing has been done to get the “SplitInIceDSTPulses”? The word “split” indicates that some basic processing (cleaning/splitting?) has been applied. Please describe the basic processing steps to get the SplitInIceDSTPulses on your wiki, if there is some basic processing.

One can define a charge-weighted mean arrival time of the pulses for each event. Taking the difference between this mean arrival time and the individual arrival time of each pulse yields the relative time offset (also one of the inputs). From that, one can calculate the standard deviation of the pulse arrival times as the square root of the sample variance.
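A small sketch of these timing summaries (illustrative only; `times` and `charges` are the pulse arrival times and charges of one DOM, and whether the variance is itself charge-weighted is an assumption here):

```python
import numpy as np

def pulse_time_features(times, charges):
    """Charge-weighted mean arrival time, relative time offsets, and the
    standard deviation of the pulse arrival times."""
    times = np.asarray(times, float)
    charges = np.asarray(charges, float)
    mean_t = np.average(times, weights=charges)   # charge-weighted mean arrival time
    rel_offsets = times - mean_t                  # relative time offset of each pulse
    std_t = np.sqrt(np.average(rel_offsets**2, weights=charges))  # sqrt of the sample variance
    return mean_t, rel_offsets, std_t
```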

The SplitInIcePulses/SplitInIceDSTPulses already exist in level 2 data and no additional processing has to be applied. The CleanedMuonPulses are created using the I3TimeWindowCleaning<I3RecoPulse> module, which is now mentioned in the Event Selection section along with more information about the basic processing steps.


Q: I do not see any studies of the data/MC agreement of your input parameters for your DNN. When you use machine learning methods based on MC training, it is crucial to show the data/MC agreement of the inputs as your entire subsequent analysis depends on it and a data/MC mismatch may lead to wrong results. I.e. you need to show agreement of the inputs: total charge, time of first pulse, and standard deviation.

Plots of the data/MC agreement for the DNN inputs are now provided here.


Q: Which filter stream is used? If you do not select a filter, this may cause problems as certain events will have “pre-scales”, not all frame objects are always available, etc… the standard IceCube procedure is to select certain filter streams for further analysis. I do not find any information and it seems, you do not use any filter which may cause issues, and my question would be, why don’t you use any filter to avoid problems?

The FSSFilter is used as the filter stream. This information is now stated more clearly in the wiki, and additional plots (Fig. 3 and Fig. 4) explain why we chose this filter.


Q: You use H3a as primary model for your MC that is used for the training. How does using another primary flux assumption change the training? I believe this needs to be included as a systematic, however, I do not see any discussion on your wiki.

The impact of the primary model on the unfolding procedure is discussed in Fig. 49 and Fig. 50.


Q: I do not understand at all what are the inputs for your DNN and what the DNN does. In fact, besides a reference to a journal publication and the short sentence “The framework has been adjusted by Mirco Hünnefeld to the classification of stopping atmospheric muons”, I do not see any explanation of details about your DNN. Details of the DNN and in particular the adjustments should be added to the wiki.

There is now much more information about the DNN in the Event Selection section, and the inputs are properly explained. Links to the exact code are given in the text. The reference is provided because my network works in principle exactly the same way as the one described in the paper, although its architecture is a bit smaller.


Q: You list the final cuts on scores, depth uncertainties, and zenith uncertainties. What are the uncertainty variables and how were all the cut values chosen? The choice seems quite arbitrary to me. Was this optimized in any way?

The score cut is chosen based on the tradeoff between precision and recall: the cut should be as strict as possible while still providing enough statistics. The cuts on the uncertainties are motivated by the goal of removing events that are not well reconstructed, based on the uncertainty distributions in Fig. 21 and Fig. 23.


Q: Precision and recall should be explained.

An explanation is now provided in the Event Selection section.
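For reference, a minimal sketch of how precision and recall can be scanned as a function of the score cut (hypothetical arrays `y_true`, with 1 for true stopping events, and `score` for the classifier output; not the study shown in the wiki):

```python
import numpy as np

def precision_recall(y_true, score, cut):
    """Precision = purity of the selected sample, recall = fraction of true
    stopping events that survive the cut."""
    selected = score >= cut
    tp = np.sum(selected & (y_true == 1))   # correctly selected stopping events
    precision = tp / max(np.sum(selected), 1)
    recall = tp / max(np.sum(y_true == 1), 1)
    return precision, recall

# Scanning a few cuts shows the tradeoff: stricter cuts raise the purity
# of the sample but reduce the remaining statistics, e.g.
# for cut in (0.9, 0.95, 0.99, 0.995, 0.999):
#     print(cut, precision_recall(y_true, score, cut))
```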


Q: Data-MC agreement - “All three quantities show a general offset of about 20%.” I disagree. Only the Depth distribution shows a constant offset, the other distributions clearly show dependence on zenith and propagation length (not only very close to the horizon, as you describe), i.e. the ratio plots are not flat (at all). Is this a problem? How do you interpret these differences?

The disagreements in data/MC should not be a problem, at least not as they are now. An overall offset, as in the case of the depth, should by definition not have an impact on the unfolding. That is different for a transition such as the one we see in the zenith angle (and consequently in the propagation length); however, the unfolding should also be very robust against such differences in the spectrum (as shown in the unfolding section later). In summary, we do not expect this to be a problem, although the agreement should of course be as good as possible. We cannot yet provide an explanation for the transition in the number of events as a function of zenith.


Q: While you added more information on the basic event selection throughout the documentation, i.e. filter stream, pulseseries, cleanings, etc., I think the structure can be improved. The data sets used in the analysis (including simulated data sets) should be introduced *before* you describe the DNN, as the selection provides the basic inputs for the DNN (right?). For example, you use some pulseseries (i.e. CleanedMuonPulses) to derive the total charge which is used as an input for the DNN. Thus, the filtering/cleaning (and data/MC agreement) should be described before. However, this is generally not a major issue, just a matter of clarity.

The structure was updated.

Effective area

Q: I think it is important to describe in the first paragraph that the effective area accounts for all detector/analysis efficiencies in order to derive corresponding distributions at the surface, which can be (better) compared to other experiments. You just say that event rates between different detectors differ but not why this is (it’s due to different efficiencies as the muon flux should be the same at first order approximation, i.e. neglecting different altitude, etc.).

Q: “This is done via the normalization…”: This is not a normalization, what is shown is a flux! The normalization arises from deriving an efficiency and applying this to the measured flux. This needs clarification.

Q: “The stopping muon depth intensity is different to i.e. an energy spectrum at the surface in the sense that it can not be defined at a certain point in space. The stopping point of a muon depends on its propagation length (and therefore its energy). Hence, if one thinks of a number of muons being generated on a 2D plane (defined at a certain depth in the ice), the depth intensity spans from muons stopping far in front of the plane up to muons stopping far behind the plane.” This is generally not different from the determination of the energy spectrum. I.e. if you want to measure the energy spectrum of muons above 500 GeV in IceCube, for example, not every muon with this energy makes it to the detector. So, in this case the effective area accounts for these “missing” muons and the situation is (to my view) equivalent to your discussion here. In fact, that’s the point of using an effective area, otherwise one would not need it to derive muon spectra from IceCube data.

Q: “Although it might seem to be a very strong assumption to include these muons in the calculations, …” Why? See above, we would do the same to derive an energy spectrum.

Q: “This is not entirely feasible according to our current understanding.” Why?

The phrasing was updated for all these points and some additional information is provided.


Q: “Note that N_sel counts selected events and N_gen counts individual muons.” I do not understand this. If N_gen would count events, what is the meaning of its dependence on L? L is a property of a single muon, not the CR event, and thus N_gen MUST count muons based on dimensional analysis. In fact, in the next section you disagree with your own statement, saying “… is desired to take the number of generated muons N_gen.”

The statement about N_sel and N_gen means that we compensate for the muon multiplicity in a statistical manner. N_sel counts events; an event might consist of multiple muons entering the detector, which we cannot resolve, and the reconstructed propagation length corresponds to that of the highest-energy muon. N_gen counts all muons (as written in the text), not, as one might think, only the highest-energy muon of each event. This means the multiplicity is also accounted for in the effective area, since we calculate a correction factor that traces selected events (mostly single muons, but sometimes also small bundles) back to the distribution of individual muons (every single muon from the generated CORSIKA data).
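A schematic sketch of this correction factor, read as a 1D histogram ratio (placeholder names; `A_gen` only stands in for the generation-level normalization and is not the value used in the analysis):

```python
import numpy as np

def effective_area_1d(L_sel, w_sel, L_gen, w_gen, bins, A_gen=1.0):
    """L_sel: propagation length of the highest-energy muon of each selected event.
    L_gen: propagation length of every generated muon (all muons, not only the
    highest-energy one per event). The per-bin ratio traces selected events back
    to the distribution of individual muons and thereby absorbs the multiplicity
    correction; A_gen stands for the generation normalization."""
    n_sel = np.histogram(L_sel, bins=bins, weights=w_sel)[0]
    n_gen = np.histogram(L_gen, bins=bins, weights=w_gen)[0]
    return A_gen * n_sel / np.clip(n_gen, 1e-12, None)
```

A(E) would follow the same pattern, with the surface energy substituted for the propagation length.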


Q: “The Problem with Existing IceCube Simulations” To my understanding you reproduce all plots with new simulations, right? If this is true, I would actually recommend to remove the parts about the „old simulations” to avoid confusion, if possible (i.e. if everything is derived with new simulations in the updated analysis). Generally, try to avoid to describe aspects that are not used in the final analysis. Also, what is the time line of the reprocessing using new simulations (see also my previous comment on this)? Finally, when the reprocessing is done, the wiki pages should be generally updated.

This part will be removed as soon as the new simulations are finished. All plots will be updated.


Q: “The deviating shape is expected, as it results from the propagation through a different medium.” I think this is actually interesting. As a general question, is it somehow possible to correct for these nuclear effects in order to have an apples-to-apples comparison with other experiments? I’m just curious whether we can use this aspect to learn about nuclear effects or so. It’s interesting.

We have not yet found a way to correct for the nuclear effects, but this might be a good idea for a follow-up analysis. For now, we can instead compare to other in-water/in-ice measurements. So far this has simply been a cross-check for myself that we end up in the correct order of magnitude. Prof. Rhode was also very interested and has already thought about measuring this effect; however, we have not spent much time on it yet.

Unfolding

Q: How did you derive the in-ice systematic uncertainties and what exactly does “ratio to baseline” in your plots mean? Is this in terms of event yield, or what observable? Also, it is not clear to me which values are used for the systematics as you show distributions. Can you elaborate a bit more on the estimation of systematic uncertainties?

We use additional self-simulated data (Table 2) to estimate the systematic uncertainties with snowstorm simulations, because there are no official systematic data sets available for 20904. With snowstorm, the events are simulated with different values of the systematic parameters, drawn from a pre-defined parameter range. Because the values are sampled from a uniform distribution over this range, the average value of each parameter corresponds to the center of its interval. By applying cuts on the systematic parameters, we can shift the average value and thereby create multiple subsets with different systematic parameters. We then calculate the distribution of our observable using the same binning as for the unfolding; the bin counts change depending on the systematic parameter, and the “ratio to baseline” is the ratio of these bin counts in a shifted subset to those of the baseline set. We quantify and fit this change for each bin, which results in multiple weighting functions that can be used to fit the systematic uncertainties by multiplying them onto the detector response matrix, as described in the unfolding section. The parameters vary over different ranges, which can be seen on the x-axes of the provided plots.
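To illustrate the last step, a rough sketch of how the fitted weighting functions might be multiplied onto the detector response matrix during the fit (placeholder names; the actual implementation lives in the unfolding code):

```python
import numpy as np

def apply_systematics(response, coeffs_per_systematic, values):
    """response: (n_obs_bins, n_target_bins) detector response matrix.
    coeffs_per_systematic: for each systematic parameter, one polynomial
    coefficient array per observable bin (the fitted weighting functions).
    values: the systematic parameter values proposed at the current fit step."""
    weighted = response.copy()
    for coeffs, value in zip(coeffs_per_systematic, values):
        w = np.array([np.polyval(c, value) for c in coeffs])  # one factor per observable bin
        weighted *= w[:, None]  # rescale the observable (row) dimension
    return weighted
```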

Results

Q: For the comparison with MCEq, how do you choose the nominal values at which to calculate the MCEq fluxes? I.e. for the data you show bins, while the MCEq prediction is a continuous line, how does this compare? Where would the nominal values be for your binned distribution, and can this account for the differences seen?

The nominal value for each bin (taking, for example, the average energy per bin) would be slightly shifted to the left compared to the bin center. This shifts the points in the wrong direction, so it cannot be the reason for the visible difference.
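As a quick illustration of the direction of this shift (a toy check, not the analysis spectrum): for a steeply falling spectrum, the weighted average energy in each bin lies below the logarithmic bin center, i.e. the representative points would move to lower energies.

```python
import numpy as np

E = np.logspace(2, 5, 100000)              # toy event energies (GeV), log-uniform
w = E**-3.7                                 # toy weights giving a steeply falling spectrum
bins = np.logspace(2, 5, 11)
centers = np.sqrt(bins[:-1] * bins[1:])     # logarithmic bin centers
idx = np.digitize(E, bins) - 1
mean_E = np.array([np.average(E[idx == i], weights=w[idx == i])
                   for i in range(len(bins) - 1)])
print(np.all(mean_E < centers))             # True: the average per bin sits left of the center
```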


Q: I think you should say what the “mini burnsample” is, i.e. 1% of experimental data I assume?

The mini burnsample is less than 1% of the data, simply because there is so much of it; we tested our scripts on an even smaller sample. Details are given in Table 3.

General

Q: Unblinding plans - I do not find any information about the details of the actual unblinding request. E.g. which years are you planning to unblind, will you first look into 10% of the data or directly use the full dataset, etc.? What is actually requested in terms of unblinding?

We plan to unblind one year of data (2020). This is mentioned on the main page.


Q: Related to technical review: Where do we find your scripts and data files?

Technical details to reproduce the analysis are provided in the Technical Documentation now.