Preparation

In order to perform the signal-background separation, the user must first merge all the results into single files

Results needed for the signal-background separation cut, for roughly 100 million events per station: run number, event number, trigger type, unix time, live time, quality cuts (33 cuts), interferometry (2 polarizations), and matched filter (2 polarizations)

A NumPy array with 100 million 64-bit float elements takes roughly 1 GB (0.8 GB) of memory, so all of them can easily be handled at once
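As a quick illustration (not part of the analysis scripts), the footprint can be checked directly with NumPy:

import numpy as np

# 100 million 64-bit floats: nbytes reports the raw memory footprint
arr = np.zeros(100_000_000, dtype=np.float64)
print(arr.nbytes / 1e9)   # -> 0.8 (GB)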

This is done by the commands below

For the data side:

source ../setup.sh
python3 script_executor.py -k sub_info_burn -s <station ID> -r <run number>          # collect secondary information in the burn sample
python3 info_summary.py <station ID>                                                 # collect run and event number, trigger type, and unix time for all events
python3 dat_summary_live.py <station ID> <entry> <entry width>                       # calculates live time based on 1st data cleaning
python3 dat_summary_live_sum.py <station ID>                                         # merges the results
python3 dat_summary_qual.py <station ID> <entry> <entry width>                       # collect the data cleaning results of all events
python3 dat_summary_qual_sum.py <station ID>                                         # merges the results
python3 dat_summary.py <station ID> <entry> <entry width>                            # collect the vertex reconstruction and matched filter results of all events
python3 dat_summary_sum.py <station ID>                                              # merges the results

For the simulation side:

source ../setup.sh
python3 sim_summary_qual.py <station ID> <signal or noise>                            # both signal and noise. collect the data cleaning results of all events
python3 sim_summary.py <station ID> <signal or noise>                                 # both signal and noise. collect the vertex reconstruction and matched filter results of all events
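As an illustration of what the *_sum.py merging steps above do conceptually, below is a minimal sketch that concatenates per-chunk outputs into one file; the .npy file pattern and names are hypothetical, not the ones used by the actual scripts:

from glob import glob
import numpy as np

# hypothetical per-chunk files produced by the <entry>/<entry width> jobs
chunk_files = sorted(glob("dat_summary_A2_entry_*.npy"))
merged = np.concatenate([np.load(f) for f in chunk_files])
np.save("dat_summary_A2_merged.npy", merged)   # single file used for the separation cut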

Results of all 3 data cleanings in 2 parameter space

This is done by Check_Sim_v30_mf_corr_2d_map_final_surface_cut_total

Results of the evolution of cuts in the space of the two event selection methods

  • Interferometric correlation on the x-axis and matched filter on the y-axis

  • All configurations are merged into one dataset

The remaining data most likely consists of thermal noise

The signal/background cut is defined in this space
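A minimal sketch of how such a 2D map can be built after each cleaning stage (array names, masks, and binning are illustrative, not the actual analysis code):

import numpy as np

# corr, mf: per-event interferometric and matched filter values
# masks: boolean arrays marking the events surviving each successive cleaning stage
def cut_evolution_maps(corr, mf, masks, xedges, yedges):
    """2D histograms of matched filter vs. correlation after each cleaning stage."""
    return [np.histogram2d(corr[m], mf[m], bins=[xedges, yedges])[0] for m in masks]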

../../_images/sb_hist_0.png

Fig. 96 Evolution of data cuts. A2 VPol

../../_images/sb_hist_1.png

Fig. 97 Evolution of data cuts. A2 HPol

../../_images/sb_hist_2.png

Fig. 98 Evolution of data cuts. A3 VPol

../../_images/sb_hist_3.png

Fig. 99 Evolution of data cuts. A3 HPol

Diagonal Cut in 2 parameter space

Making/Optimizing the diagonal cut in Cmax vs Mmax space for signal/background separation

The diagonal cut y = a * x + b is made by optimizing a (the slope) and b (the intercept)

It was decided to make the signal/background cut by adding up all configurations (a global cut)

  • A more conservative approach

  • Various slope and intercept values are tested: a 2D grid search

The cut is then optimized on a broader distribution
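A minimal sketch of the 2D grid search, assuming per-event Cmax/Mmax arrays for data and simulated signal (names, ranges, and the sign convention of the slope are illustrative):

import numpy as np

def scan_diagonal_cut(c_data, m_data, c_sig, m_sig, slopes, intercepts):
    """For every line y = a*x + b in Cmax-Mmax space, count the data events above
    the line and the fraction of simulated signal that is kept."""
    n_data = np.zeros((len(slopes), len(intercepts)), dtype=int)
    sig_eff = np.zeros((len(slopes), len(intercepts)))
    for i, a in enumerate(slopes):
        proj_data = m_data - a * c_data   # per-event projection onto the intercept axis
        proj_sig = m_sig - a * c_sig
        for j, b in enumerate(intercepts):
            n_data[i, j] = np.count_nonzero(proj_data > b)
            sig_eff[i, j] = np.mean(proj_sig > b)
    return n_data, sig_eff

For a fixed slope, the per-event projection m - a*c is the 1D quantity that can then be histogrammed and fitted, which is presumably what the projected distributions in the figures below show.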

../../_images/sb_hist1_0.png

Fig. 100 Projected data and simulation based on the slope and intercept parameters

../../_images/sb_hist1_1.png

Fig. 101 A2 VPol. Projected data and simulation based on all slope and intercept parameters

../../_images/sb_hist1_2.png

Fig. 102 A2 HPol. Projected data and simulation based on all slope and intercept parameters

../../_images/sb_hist1_3.png

Fig. 103 A3 VPol. Projected data and simulation based on all slope and intercept parameters

../../_images/sb_hist1_4.png

Fig. 104 A3 HPol. Projected data and simulation based on all slope and intercept parameters

Goodness of fit

This is done by back_est_gof_ell.py and back_est_gof_ell_sum.py

For each slope, an exponential fit is performed.

  • y = p0 * exp(-p1 * (x - xmin)), where xmin is the fit starting point

The -2log(L) values of 10k pseudo-experiments generated from the fit are calculated

  • Check where the -2log(L) of the actual data lands in the pseudo-experiment distribution

  • If the p-value is larger than 0.05, the fit is considered acceptable to describe the data

The procedure is:

  • Integrate the fit over the fit region to get the expected number of background events

  • Apply a Poisson distribution using the expected number of background events as λ

  • Randomly generate K events following the fit line -> one pseudo-experiment

  • Compute the log-likelihood between the pseudo data and the fit

  • Repeat this many times

To confirm which data region is appropriate for fitting, the above procedure is also repeated for 20 different data regions (dividing the range between the maximum peak and the last data point into 20)
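Below is a minimal sketch of this pseudo-experiment test (back_est_gof_ell.py is the actual implementation); the binned 1D projection, variable names, and the truncation of the exponential to the fit range are assumptions for illustration:

import numpy as np
from scipy.stats import poisson

def gof_pvalue(counts, edges, p0, p1, xmin, n_pseudo=10_000, seed=1):
    """p-value of the fit y = p0 * exp(-p1 * (x - xmin)) against binned counts."""
    rng = np.random.default_rng(seed)
    lo, hi = edges[:-1], edges[1:]
    # expected counts per bin: analytic integral of the exponential over each bin
    mu = p0 / p1 * (np.exp(-p1 * (lo - xmin)) - np.exp(-p1 * (hi - xmin)))
    lam = mu.sum()                                   # expected background in the fit region

    def neg2logl(n):
        return -2.0 * poisson.logpmf(n, mu).sum()

    t_data = neg2logl(counts)
    ea, eb = np.exp(-p1 * (edges[0] - xmin)), np.exp(-p1 * (edges[-1] - xmin))
    t_pseudo = np.empty(n_pseudo)
    for i in range(n_pseudo):
        k = rng.poisson(lam)                         # K events for this pseudo-experiment
        u = rng.random(k)
        x = xmin - np.log(ea - u * (ea - eb)) / p1   # inverse-CDF sample of the fitted exponential
        t_pseudo[i] = neg2logl(np.histogram(x, bins=edges)[0])
    return np.mean(t_pseudo >= t_data)               # fraction of pseudo -2log(L) above the data value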

../../_images/sb_hist2_0.png

Fig. 105 Left: data distribution and fit, including the fit parameters. Middle: result of the pseudo-experiments, including the p-value. Right: result of the pseudo-experiments as a 2D histogram. The p-value closest to 0.5 is selected for estimating the background

../../_images/sb_hist2_1.png

Fig. 106 Left: fit results with 20 different data ranges. Middle: result of the pseudo-experiments. Right: result of the pseudo-experiments as a 2D histogram. The p-value closest to 0.5 is selected for estimating the background

../../_images/sb_hist2_2.png

Fig. 107 A3 VPol

../../_images/sb_hist2_3.png

Fig. 108 A3 VPol with all data ranges

Background Estimation

This is done by Check_Sim_v32.2_back_est_pseudo_total_w_edge_ellipse

Pseudo-experiments are performed to estimate the background

100k different fit lines are created using a Gaussian distribution

  • p0 and p1 are the means, and the uncertainty of each parameter is the sigma

  • The sigma is calculated from the parameter uncertainties and their correlation coefficient

100k background counts are calculated for each intercept cut value

For each intercept cut value, the median is taken as the background estimate and the 1 sigma spread around the median as the error
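A minimal sketch of the fluctuation step, assuming the exponential fit above with parameter uncertainties sig0, sig1 and correlation coefficient corr (the analytic tail integral and names are illustrative, not the actual script):

import numpy as np

def background_estimate(p0, p1, sig0, sig1, corr, xmin, x_cut, n_trials=100_000, seed=0):
    """Median background beyond the intercept cut x_cut and its 1 sigma band,
    from fit lines fluctuated with the parameter covariance."""
    rng = np.random.default_rng(seed)
    cov = [[sig0**2, corr * sig0 * sig1],
           [corr * sig0 * sig1, sig1**2]]
    pars = rng.multivariate_normal([p0, p1], cov, size=n_trials)
    a = pars[:, 0]
    b = np.clip(pars[:, 1], 1e-12, None)             # keep the decay constant positive
    # tail integral of a * exp(-b * (x - xmin)) from x_cut to infinity
    n_bkg = a / b * np.exp(-b * (x_cut - xmin))
    med = np.median(n_bkg)
    lo, hi = np.percentile(n_bkg, [15.87, 84.13])    # central 68% interval
    return med, med - lo, hi - med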

../../_images/sb_hist3_0.png

Fig. 109 Illustration of fluctuation of fit

../../_images/sb_hist3_1.png

Fig. 110 Background estimation for data and noise simulation from the pseudo-experiments

Upper Limit

This is done by upper_limit_summary_total.py

The background estimation results are used to run another 10k pseudo-experiments with a Poisson distribution

  • Assuming zero signal is detected

The K drawn from the Poisson distribution is used to run the Feldman-Cousins method

The mean of the upper-limit distribution from the 10k pseudo-experiments is used as the final value

The cut value with the maximum S / Sup ratio is taken as the optimal cut position
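A minimal sketch of this step, assuming a 90% CL and the standard Feldman-Cousins likelihood-ratio construction for a Poisson count with known background (upper_limit_summary_total.py is the actual implementation; grid ranges and names are illustrative):

import numpy as np
from scipy.stats import poisson

def fc_upper_limit(n_obs, b, cl=0.90, mu_max=20.0, mu_step=0.01):
    """Feldman-Cousins upper limit on the signal mean for n_obs observed events
    and known background b, using likelihood-ratio ordering."""
    n = np.arange(int(b + n_obs + 10 * np.sqrt(b + n_obs + 1) + 20))
    best = np.maximum(n - b, 0.0) + b                # best-fit Poisson mean for each n
    upper = 0.0
    for mu in np.arange(0.0, mu_max, mu_step):
        p = poisson.pmf(n, mu + b)
        order = np.argsort(p / poisson.pmf(n, best))[::-1]    # rank n by likelihood ratio
        n_accept = np.searchsorted(np.cumsum(p[order]), cl) + 1
        if n_obs in n[order[:n_accept]]:
            upper = mu                               # mu is still inside the acceptance band
    return upper

def expected_upper_limit(b_est, n_pseudo=10_000, seed=2):
    """Mean FC upper limit over Poisson pseudo-experiments with zero true signal."""
    rng = np.random.default_rng(seed)
    ks = rng.poisson(b_est, size=n_pseudo)
    limits = {k: fc_upper_limit(k, b_est) for k in np.unique(ks)}    # limit depends only on k
    return np.mean([limits[k] for k in ks])

# the optimal cut is the intercept value that maximizes signal efficiency / expected upper limit (S / Sup)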

../../_images/sb_hist4_1.png

Fig. 111 A2 VPol. Results of the upper limit and S / Sup. The maximum S / Sup ratio gives the final cut value

../../_images/sb_hist4_0.png

Fig. 112 A2 VPol. S / Sup ratio for all slopes

../../_images/sb_hist4_3.png

Fig. 113 A2 HPol. Results of the upper limit and S / Sup. The maximum S / Sup ratio gives the final cut value

../../_images/sb_hist4_2.png

Fig. 114 A2 HPol. S / Sup ratio for all slopes

../../_images/sb_hist4_5.png

Fig. 115 A3 VPol. Results of the upper limit and S / Sup. The maximum S / Sup ratio gives the final cut value

../../_images/sb_hist4_4.png

Fig. 116 A3 VPol. S / Sup ratio for all slopes

../../_images/sb_hist4_7.png

Fig. 117 A3 HPol. Results of the upper limit and S / Sup. The maximum S / Sup ratio gives the final cut value

../../_images/sb_hist4_6.png

Fig. 118 A3 HPol. S / Sup ratio for all slopes

Results of signal / background cut in 2 parameter space

This is done by Check_Sim_v34.3_mf_corr_ver_2d_map_w_cut_combine_w_noise_total_w_edge_ellipse

../../_images/sb_hist4_8.png

Fig. 119 A2 VPol. Left: sim signal. Middle: sim noise. Right: data

../../_images/sb_hist4_9.png

Fig. 120 A2 HPol. Left: sim signal. Middle: sim noise. Right: data

../../_images/sb_hist4_10.png

Fig. 121 A3 VPol. Left: sim signal. Middle: sim noise. Right: data

../../_images/sb_hist4_11.png

Fig. 122 A3 HPol. Left: sim signal. Middle: sim noise. Right: data

Passed simulation events

This is done by Check_Sim_v34.3.2_mf_corr_ver_pos_total

../../_images/sb_hist555_0.png

Fig. 123 A2. Passed simulated events at 3 different steps

../../_images/sb_hist555_1.png

Fig. 124 A3. Passed simulated events at 3 different steps

Left column: radius vs. depth map. The station is located at the zero position

Middle column: energy vs. radius map

Right column: vertex theta/phi differences between the reconstructed (interferometric) and true positions

Most of the events we will see correspond to depths above -2000 m and energies below 10^11 eV

The upward tail in the elevation angle differences comes from mis-reconstruction of surface-reflected events

Sanity check of AraCorrelator and AraVertex

The plots below are a sanity check of our vertex reconstruction methods. The 2D maps show the reconstructed elevation angle against the true elevation angle. The 1D maps show the differences between reconstructed and true.

I did not strictly separate the events by polarization. So, for example, the half-circle shaped distribution in AraCorr V is not caused by mis-reconstruction; those events have a strong HPol signal and a weak VPol signal.

Apart from the polarization issue, the typical three-branch distribution in each plot (stretching from the center to the bottom left, top right, and top left) is similar to the testbed publication. The top-left branch is caused by general mis-reconstruction of surface-reflected events.

In A3, the mis-reconstruction in configs 6 to 9, especially in AraVertex V, is caused by the channel experiencing an amplifier failure

If the plots below are too hard to see, high-resolution versions can be found here: A2 and A3

../../_images/sb_hist5_2.png

Fig. 125 A2. True vs Reco elevation angle. 1st: VPol results from AraCorrelator, 2nd: HPol results from AraCorrelator, 3rd: VPol results from AraVertex, 4th: HPol results from AraVertex, 5th: V+HPol results from AraVertex

../../_images/sb_hist5_3.png

Fig. 126 A2. True - Reco elevation angle. 1st: VPol results from AraCorrelator, 2nd: HPol results from AraCorrelator, 3rd: VPol results from AraVertex, 4th: HPol results from AraVertex, 5th: V+HPol results from AraVertex

../../_images/sb_hist5_4.png

Fig. 127 A3. True vs Reco elevation angle.

../../_images/sb_hist5_5.png

Fig. 128 A3. True - Reco elevation angle.

The plots below compare the reconstructed vertex positions between AraCorrelator and AraVertex

The two methods have different search conditions

  • AraCorrelator searches over all theta and phi with 1 degree resolution, but I limited the radius to 41, 170, 300, 450, and 600 m

  • The AraVertex search conditions are: 1) the ice model parameters are changed to match the AraSim default model, iceProp(1.78, -0.43, 0.0132); 2) the radius search range is 170 to 5000 m, where the 170 m minimum is set to prevent mis-reconstruction (with the original minimum value, surface events are often reconstructed too close to the antennas with an elevation angle close to 0 degrees); 3) the RPR threshold is set to 4 and the minimum number of antennas required to run AraVertex is set to 3, in order to catch surface events with low SNR.

Due to the radius conditions of the two methods, the depth results, which are derived from the theta and radius results, show a discrepancy, but most of the discrepancy is caused by very low SNR events.

../../_images/sb_hist5_6.png

Fig. 129 A2. 1st: VPol results of theta comparison. 2nd: HPol results of theta comparison. 3rd: VPol results of depth comparison. 4th: HPol results of depth comparison.

../../_images/sb_hist5_7.png

Fig. 130 A2. 1st: VPol results of theta differences. 2nd: HPol results of theta differences. 3rd: VPol results of depth differences. 4th: HPol results of depth differences.

../../_images/sb_hist5_8.png

Fig. 131 A3

../../_images/sb_hist5_9.png

Fig. 132 A3

Summary of background estimation and signal efficiency

A2

../../_images/sb_hist6_0.png

Fig. 133 A2 background estimation

../../_images/sb_hist6_1.png

Fig. 134 A2 signal efficiency

A3

../../_images/sb_hist6_2.png

Fig. 135 A3 background estimation

../../_images/sb_hist6_3.png

Fig. 136 A3 signal efficiency