Unfolding of atmospheric neutrino spectrum with IC9 data
Unblinding proposal
 


 Comments to:
   zornoza@icecube.wisc.edu


Introduction

We present in these pages the proposal for the unblinding of the IC9 data for the unfolding of the atmospheric neutrino spectrum. The structure of this site is as follows:

  • Motivation
  • Singular Value Decomposition algorithm
  • Selection of the unfolding variable
  • Monte Carlo samples
  • Quality cuts
  • Processing
  • Comparison between Monte Carlo and data
  • Robustness checks
  • Results
  • Unblinding proposal


Motivation

The atmospheric neutrino spectrum is one of the basic measurements of neutrino telescopes. Apart from its intrinsic interest for the study of atmospheric neutrinos, it is also fundamental for understanding the background for other analyses, in particular point-like source searches. The first analysis of the atmospheric neutrino spectrum with IC9 data was done by J. Pretz, who showed that the behaviour of the detector is understood. In J. Pretz's analysis, event rates and zenith distributions were studied, but not the energy spectrum of the atmospheric neutrinos.

The aim of this analysis is to reconstruct the energy spectrum of atmospheric neutrinos using unfolding techniques. The need for unfolding algorithms comes from the fact that reconstructing the spectrum from the individual energies of the events is impaired by the combination of two features of this problem. First, the energy resolution is limited, since the energy loss of the muon is highly stochastic (and moreover only part of the muon energy is observable, which in turn is only part of the neutrino energy). Second, the spectrum falls very fast. Therefore, the low-energy events whose energy is overestimated "bury" the regions at higher energies. The way to avoid this problem is to use unfolding algorithms, as has been done previously in AMANDA (cf. the works of H. Geneen or K. Muenich).


Singular Value Decomposition method

There are several unfolding methods commonly used in high-energy physics. Among them we can mention RUN (Blobel), Bayesian (D'Agostini) and SVD (Hoecker). For this analysis, we have used the latter, which is easy to implement and quite efficient.

The problem of unfolding consists in solving the Fredholm integral equation of the first kind which relates some measured quantity y with the neutrino energy E:

    g(y) = \int A(y, E) \, f(E) \, dE

where f(E) is the spectrum to be reconstructed and g(y) the measured distribution.
In matrix notation, this reads as:

    \vec{g} = A \vec{f}

where A is the so-called "smearing matrix", which has to be generated by Monte Carlo. At first sight, the solution of the system only requires the inversion of the smearing matrix, but this leads to useless solutions, since the effect of statistical fluctuations completely spoils the result. Therefore, an alternative approach has to be followed, which basically consists in imposing on the system the reasonable assumption that the shape of the spectrum is smooth.
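
As a quick illustration of this instability, the following toy sketch (numpy only; all numbers are made up for the example) smears a steeply falling spectrum, adds Poisson fluctuations, and inverts the matrix directly:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy setup: a steeply falling "true" spectrum in 10 energy bins
    n = 10
    true_spec = 1e4 * np.power(10.0, -0.7 * np.arange(n))

    # Simple smearing matrix: 40% of the events stay in their bin,
    # 30% migrate to each neighbouring bin
    A = 0.4 * np.eye(n) + 0.3 * (np.eye(n, k=1) + np.eye(n, k=-1))

    # Measured distribution: smeared spectrum with Poisson fluctuations
    measured = rng.poisson(A @ true_spec).astype(float)

    # Direct inversion: the fluctuations are strongly amplified and the
    # "solution" oscillates, and can even go negative in some bins
    naive = np.linalg.solve(A, measured)
    print(np.round(naive, 1))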

The SVD method (Singular Value Decomposition) is based on the idea of decomposing the smearing matrix as

    A = U S V^T

where U and V are orthogonal matrices and S is a non-negative diagonal matrix whose diagonal elements are called "singular values". It can be shown that this decomposition makes it easy to identify the elements of the system that contribute to the statistical fluctuations but do not provide useful information. Therefore, we can cut off these elements in order to reduce the effect of such fluctuations. Another interesting point of this method is that in practice we do not solve directly for the spectrum, but for its deviation from a reasonable initial assumption. This also helps to reduce the effect of the statistical fluctuations, since the solution is expected to be smooth (ideally flat and close to unity) if the initial assumption is close enough to the real solution.
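
A minimal sketch of the singular-value cutoff in numpy (this shows only the bare truncation idea, not the full Hoecker prescription with the initial-guess rescaling and curvature regularization; the function name is ours):

    import numpy as np

    def svd_unfold(A, b, k):
        """Unfold b = A f, keeping only the k largest singular values.

        Components with small singular values are dominated by statistical
        fluctuations; dropping them regularizes the solution.
        """
        U, s, Vt = np.linalg.svd(A)   # singular values come sorted, largest first
        d = U.T @ b                   # measured vector in the rotated basis
        f = np.zeros(A.shape[1])
        for i in range(k):            # keep only the k best-determined terms
            f += (d[i] / s[i]) * Vt[i]
        return f

With k equal to the full rank this reproduces the direct inversion; reducing k progressively suppresses the fluctuating components at the price of some smoothing bias.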


Unfolding variables

The first issue for this analysis is to determine the best observable for the unfolding. Obviously, it has to be a variable with a good correlation with the neutrino energy, i.e. a dependence as linear as possible and a small spread. We have considered several possibilities: total charge in the detector, number of hit DOMs, number of DOMs with one hit, and reconstructed energy. In the following we show how we compare the quality of different variables for unfolding. In particular, we show the results for the two best cases: the total charge measured in the detector and the muon energy at the point of closest approach as reconstructed by RIME (developed by Dima Chirkin).






The information in the plots above is the following:

1) upper left: "colz" plot of log10 of the charge versus log10 of the primary neutrino energy.

2) upper right: we construct the profile plot of the previous one (i.e. we represent the mean in each vertical slice) and fit it to a straight line. We are interested in the deviations with respect to this parametrization, but they have to be scaled in order to be able to compare different magnitudes. Therefore, we calculate a variable called "delta" as the difference between the values of the fitted function at 10^5.5 and 10^1.5 GeV.

3) We show the scatter plot of the difference between the value of log10(charge) and the fitted straight line, divided by the value of delta calculated previously.

4) Finally, we project onto the vertical axis in order to get the histogram of these scaled deviations. The RMS of this histogram should be as low as possible, and it allows the comparison among different magnitudes.
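
The whole procedure can be condensed into a short sketch (hypothetical array names; for simplicity we fit all the points directly instead of the profile means):

    import numpy as np

    def unfolding_variable_rms(log_e, log_x):
        """RMS of the scaled deviations of an observable from its linear fit.

        log_e: log10 of the true neutrino energy (from Monte Carlo)
        log_x: log10 of the candidate unfolding observable
        """
        p = np.polyfit(log_e, log_x, 1)            # straight-line fit
        fit = np.polyval(p, log_e)
        # "delta": span of the fitted function between 10^1.5 and 10^5.5 GeV
        delta = np.polyval(p, 5.5) - np.polyval(p, 1.5)
        # scaled deviations; their RMS measures the quality of the variable
        return np.std((log_x - fit) / delta)

The smaller this RMS, the better the variable for unfolding.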



We repeat this process for the energy estimated by RIME:



The behaviour of the energy is clearly better than that of the total charge, as shown in the table below. This is actually what we could expect, since the reconstructed energy, even if it is only part of the energy of the muon (which in turn is only part of the primary energy of the neutrino, as for any other possible variable), also uses the information on the position of the track, which allows it to distinguish between close low-energy tracks and far high-energy tracks.

  Variable              RMS
  log10(charge)         0.42
  log10(RIME energy)    0.28

A more detailed description of the RIME algorithm and its estimation of the energy can be found in Dima's talks.


Monte Carlo sample

The information about the Monte Carlo samples that have been used for this analysis is summarized here:


  Sample              Dataset (generated events)
  corsika             296 (1000x10^6)
  coincident muons    394 (3000x10^6)
  nugen_numu          437 (50x10^6)

Note that run 437 was generated with an E^-2 spectrum instead of E^-1, so that we do not generate many events at high energies that we are not going to use. The number of events above 10^5.5 GeV for IC9 during 6 months is negligible, so it is more efficient to generate with E^-2. In the two plots below we compare the number of events generated and the number expected (in 6 months) for E^-1 (left) and E^-2 (right) (note that the energy limits are also different!):


gamma = -1 (left plot)
gamma = -2 (right plot)
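
For such comparisons the generated events have to be reweighted to the desired spectral shape; a minimal sketch of this standard per-event reweighting (hypothetical names, overall normalization omitted):

    import numpy as np

    def spectral_weight(e_nu, gamma_gen, gamma_target):
        """Per-event weight turning an E^gamma_gen generation spectrum
        into an E^gamma_target one (normalization not included)."""
        return np.power(e_nu, gamma_target - gamma_gen)

    # Example: events generated with E^-2, weighted to a softer spectrum
    e_nu = np.array([1e2, 1e3, 1e4, 1e5])   # hypothetical energies in GeV
    w = spectral_weight(e_nu, -2.0, -3.7)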



Quality Cuts

The quality cuts used in this analysis are based on the cuts chosen by J. Pretz in his analysis of neutrino rates. The variables used are the same, but the actual cut values have been relaxed to increase the statistics:

  Variable    Cut
  Ndir        >= 8
  Ldir        > 200 m
  Theta       > 92 deg

A cut of Nchan < 46 is also applied in order to keep the high-energy events blinded.
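
Applied to per-event arrays, this selection amounts to a simple boolean mask; a toy sketch with hypothetical arrays standing in for the real reconstruction output:

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1000
    ndir  = rng.integers(0, 20, n)        # number of direct hits (toy values)
    ldir  = rng.uniform(0.0, 600.0, n)    # direct length in metres (toy values)
    theta = rng.uniform(80.0, 180.0, n)   # reconstructed zenith in deg (toy values)
    nchan = rng.integers(1, 100, n)       # number of hit channels (toy values)

    # Quality cuts from the table above, plus the Nchan blindness cut
    mask = (ndir >= 8) & (ldir > 200.0) & (theta > 92.0) & (nchan < 46)
    print(mask.sum(), "of", n, "toy events pass")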

In order to check the effect of these cuts on the background, we compare in the following plot the expected numbers of signal events, corsika events and coincident muons, versus the energy estimated by RIME:



At this level of cuts the purity is 98% (671 / (671 + 13 + 2.6) ≈ 0.98), with 671 atmospheric neutrino events, 13 corsika events and 2.6 coincident muons expected. In each bin the background contamination is of the order of (or much lower than) the expected number of atmospheric neutrinos, so the shape of the spectrum is not modified.


Processing

The script used for the processing of the events can be seen here. The main steps of this process are:
  • Cleaning of the bad DOMs
  • Feature extractor processing
  • RIME reconstruction to estimate the energy at point of closest approach
  • Linefit reconstruction
  • LLH reconstruction fed by Linefit solution



Comparison of Monte Carlo vs Data

The following plots show a comparison between data and Monte Carlo at the level of the cuts used in this analysis. It can be seen that the agreement is good.









Robustness checks

There are two elements whose impact on the robustness has to be checked: the choice of the spectral index for generating the smearing matrix and the initial guess for the shape of the spectrum. As mentioned before, the smearing matrix is generated with a spectral index gamma = -2. This is already sufficiently different from the actual shape of the atmospheric spectrum. Therefore, the main issue to check is the stability against uncertainties in the choice of the initial guess. As mentioned before, this guess is used in order to make the method more effective, since the spectrum to reconstruct is the deviation from this assumption.


We show two different cases. The first uses a function similar to what we expect from our best knowledge of the atmospheric neutrino spectrum. The second is a more difficult case, in which we have constructed a function that deviates more from the expected solution (both at low and high energies), to show that the result converges towards the real input.

Case 1

Case 2




The input for the algorithm is the distribution of the energy estimator. We use the Monte Carlo generation to estimate the average number of events in each bin, and then we randomize each bin according to a Poisson distribution in order to obtain a sample corresponding to six months.
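
In numpy this randomization is a one-liner (the expected counts below are placeholders, not the analysis values):

    import numpy as np

    rng = np.random.default_rng(7)
    # Placeholder MC expectation per energy-estimator bin for six months
    expected = np.array([210.0, 150.0, 95.0, 52.0, 24.0, 9.0, 3.1, 0.8])
    # Pseudo-data sample: independent Poisson fluctuation in each bin
    pseudo_data = rng.poisson(expected)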







Results

The results for both cases are shown below:

Case 1
Case 2


It can be seen that in both cases the true distribution is well reconstructed. We can also show the differences between the true and unfolded distributions, compared with the 1-sigma and 2-sigma regions.



Case 1
Case 2
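
This comparison amounts to computing, bin by bin, the pull of the unfolded spectrum with respect to the true one; a minimal sketch with hypothetical arrays:

    import numpy as np

    def pulls(unfolded, true_spec, sigma):
        """Bin-by-bin deviation in units of the unfolding error: values
        within +-1 (+-2) lie inside the 1-sigma (2-sigma) band."""
        return (np.asarray(unfolded) - np.asarray(true_spec)) / np.asarray(sigma)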





Unblinding proposal

In the previous sections we have shown that the proposed cuts offer a good compromise between the number of events available for the unfolding and minimum contamination from background. It has also been shown that there is good agreement between data and MC at this level of quality cuts. We have also checked that the algorithm proposed for the unfolding is robust and efficient in the reconstruction of the spectrum. We therefore propose to unblind the IC9 data sample with these cuts and to apply the unfolding procedure described above.
