Overview

The 1st data cleaning is designed to ‘remove’ 1) unphysical waveforms/events caused by DAQ errors, and 2) bad time periods, which reduces the livetime

  • Time periods contaminated by weather balloons are also removed from the live time by cross-checking with the balloon’s GPS log (CW GPS cut)

Most of the cuts are designed to remove events by checking only the DAQ readings

  • IRS block index, SensorHk reading, Event rate

All the cuts are independent.

The 1st data cleaning was applied to the blinded (100%) data to produce accurate pedestals

The 1st data cleaning is run with the commands below:

source ../setup.sh
python3 script_executor.py -k sub_info -s <station ID> -r <run number> -b 1          # collect secondary information
python3 script_executor.py -k qual_cut_1st -s <station ID> -r <run number> -b 1 -q 1 # performs 1st data cleaning

It uses the pre_qual_cut_loader and ped_qual_cut_loader classes

DAQ index cut

Related doc. DB2530, DB2554

This cut is done by the get_daq_structure_errors function

It is designed to remove unphysical events by checking secondary information (DAQ index)

Corrupted IRS block / DDA board / channel masking indices and unexpected WF lengths have a strong correlation with unphysical WF shapes

The number of bad events identified by the DAQ index cut is negligible in terms of live time (rare events)

../../_images/daq_err_index.png

Fig. 14 secondary index of DAQ error event

../../_images/daq_err_wf.png

Fig. 15 DAQ error wf with block and sample index

../../_images/daq_err_wf16.png

Fig. 16 DAQ error 16 wfs with block and sample index

The DAQ index cut tests each event’s DAQ readings against the categories below

  • The number of IRS blocks is always a multiple of 4

  • The IRS number is always the same for all DDAs

  • The DDA index always cycles as 0,1,2,3,0,1~

  • The channel masking is always ‘1’ for all 8 binary digits

  • Whether there is a block gap or not

  • The WF length is always equal to or greater than the run configuration, e.g. soft: 8, RF/Cal: 28 (depending on the DAQ configuration)

Cuts 5 & 6 are imported from the previous A2/3 diffuse analysis
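The checks in the list above can be sketched per event as below. This is only a minimal illustration, not the actual get_daq_structure_errors code: the function name daq_index_flags, its arguments, the default of 512 IRS blocks in the buffer, and the WF length being counted in blocks per DDA are all assumptions.

```python
import numpy as np

def daq_index_flags(irs_block, dda, ch_mask, min_len, n_dda=4, n_block=512):
    """Return a dict of True/False flags (True = check failed) for one event.

    irs_block, dda, ch_mask: one entry per read-out block, in read-out order.
    min_len: minimum WF length in blocks per DDA (hypothetical unit).
    """
    irs_block = np.asarray(irs_block)
    dda = np.asarray(dda)
    ch_mask = np.asarray(ch_mask)
    n = len(irs_block)
    flags = {}
    # 1) number of IRS blocks must be a multiple of 4
    flags["not_multiple_of_4"] = n % n_dda != 0
    # 2) IRS number must be the same for all DDAs in each read-out group
    groups = irs_block[: n - n % n_dda].reshape(-1, n_dda)
    flags["irs_mismatch"] = bool(np.any(groups != groups[:, :1]))
    # 3) DDA index must cycle as 0,1,2,3,0,1,...
    flags["dda_order"] = not np.array_equal(dda, np.arange(len(dda)) % n_dda)
    # 4) channel mask must have all 8 binary digits set to 1
    flags["ch_mask"] = bool(np.any(ch_mask != 0b11111111))
    # 5) consecutive blocks must advance by one (wrapping at n_block)
    first = groups[:, 0]
    flags["block_gap"] = bool(np.any(np.diff(first) % n_block != 1))
    # 6) WF length must not be shorter than the configured length
    flags["short_wf"] = len(first) < min_len
    return flags
```

An event would be flagged as a DAQ error if any of the returned values is True, e.g. `any(daq_index_flags(...).values())`.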

First minute cut

Related doc. DB2554

This cut is done by the get_first_minute_events function

Events at the beginning of every run have unphysical WF shapes, such as spiky WFs or a large offset from the pedestal

The cause is unknown. It might be due to a digitizer reset issue

This cut excludes the first minute of every run
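A minimal sketch of such a first-minute mask is shown below. It is not the get_first_minute_events implementation: the name first_minute_mask is hypothetical, and taking the earliest event time as the start of the run is an assumption.

```python
import numpy as np

def first_minute_mask(unix_time, cut_sec=60):
    """True for events within cut_sec of the run start.

    The run start is taken as the earliest event time (an assumption).
    """
    unix_time = np.asarray(unix_time)
    return unix_time < unix_time.min() + cut_sec
```

Events where the mask is True would be dropped, so each run loses at most 60 seconds of live time.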

../../_images/cur2_wf2.png

Fig. 17 bad wf in first minute

../../_images/cur2_wf_hist.png

Fig. 18 distribution of first minute wfs in median

../../_images/cut2_table.png

Fig. 19 livetime losses of first minute cut

DDA voltage cut

Related doc. DB2530

This cut is done by the get_bias_voltage_events and get_bad_evt_rate_events functions

Unstable bias voltage feeds on the DDA boards cause unphysical WFs.

The symptoms are a shift of the noise level, 0 ADC events, time drift, and a 0 Calpulser rate

If the DDA voltage readings (in sensorHk) are outside 3 ~ 3.5 or the Calpulser rate is lower than 0.85 Hz, the events are removed
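The two conditions can be sketched as below. This is an illustration only, not the code in get_bias_voltage_events or get_bad_evt_rate_events: the function names, the matching of each event to the most recent sensorHk reading, and the 60 second window used for the Calpulser rate are all assumptions.

```python
import numpy as np

def bias_voltage_mask(evt_unix, hk_unix, hk_volt, v_lo=3.0, v_hi=3.5):
    """True for events whose latest sensorHk reading is out of range.

    hk_unix: sorted reading times; hk_volt: shape (n_readings, n_dda).
    """
    hk_volt = np.asarray(hk_volt, dtype=float)
    bad_hk = np.any((hk_volt < v_lo) | (hk_volt > v_hi), axis=1)
    # index of the most recent reading at or before each event (assumed matching)
    idx = np.searchsorted(hk_unix, evt_unix, side="right") - 1
    return bad_hk[np.clip(idx, 0, len(bad_hk) - 1)]

def low_cal_rate_mask(evt_unix, is_cal, bin_sec=60, min_rate=0.85):
    """True for events in time bins where the Calpulser rate is below min_rate.

    bin_sec is a hypothetical choice; a partial last bin will read low.
    """
    evt_unix = np.asarray(evt_unix)
    is_cal = np.asarray(is_cal, dtype=float)
    bin_idx = ((evt_unix - evt_unix.min()) // bin_sec).astype(int)
    rate = np.bincount(bin_idx, weights=is_cal) / bin_sec
    return rate[bin_idx] < min_rate
```

The final cut would be the logical OR of the two masks.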

../../_images/cut3_hist.png

Fig. 20 data status with sensorHk data

../../_images/cut3_hist2.png

Fig. 21 result of the cut

High software rate cut

Related doc. DB2554

This cut is done by the get_bad_evt_rate_events function with the use_sec = True option

High software rates appear mostly at the beginning of the run

Presumably the DAQ / internal clock is not performing well at that time

If there are more than 2 software events per unix time, all the events in that unix time are removed
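The per-second counting can be sketched as below. The name high_soft_rate_mask and its arguments are hypothetical, and this is not the actual get_bad_evt_rate_events code.

```python
import numpy as np

def high_soft_rate_mask(unix_time, is_soft, max_per_sec=2):
    """True for every event in a unix second holding more than max_per_sec software events."""
    unix_time = np.asarray(unix_time, dtype=np.int64)
    is_soft = np.asarray(is_soft, dtype=float)
    # map each event to its unix second, then count software events per second
    secs, inv = np.unique(unix_time, return_inverse=True)
    n_soft = np.bincount(inv, weights=is_soft, minlength=len(secs))
    return n_soft[inv] > max_per_sec
```

Note that the mask also covers the RF and Calpulser events in the flagged second, matching the statement that all events in that unix time are removed.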

../../_images/cut4_hist.png

Fig. 22 distribution of software rate. top: based on operation time, bottom: based on rate

../../_images/cut34_table.png

Fig. 23 livetime losses of DDA voltage and high software cut

../../_images/cut4_hist0_0.png

Fig. 24 example run when software rate is 2

../../_images/cut4_hist0_1.png

Fig. 25 example run when software rate is 2. Maroon circle indicates software triggered event.

Short run cut

Related doc. DB2560

This cut is done by the get_short_run_events function

A run is removed if its Ops. time is shorter than 30 min. or its number of RF/Soft events is smaller than 10000

This focuses the analysis on runs that have a reasonably normal Ops. time
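As a rough sketch of this condition (not the get_short_run_events code): the name is_short_run is hypothetical, the Ops. time is approximated by the span between the first and last event, and the RF and Soft events are counted together, which is an assumption.

```python
import numpy as np

def is_short_run(unix_time, is_rf_soft, min_ops_sec=30 * 60, min_evts=10000):
    """True if the run should be removed as a short run."""
    unix_time = np.asarray(unix_time)
    ops_time = unix_time.max() - unix_time.min()  # approximate Ops. time in seconds
    n_evts = np.count_nonzero(is_rf_soft)         # combined RF + Soft count (assumed)
    return bool(ops_time < min_ops_sec or n_evts < min_evts)
```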

../../_images/cut5_hist.png

Fig. 26 distribution of runs based on operation time and number of triggered events

Known bad runs / unix time cut

Related doc. ARA wiki, GitHub

This cut is done by the get_known_bad_unix_time_events and get_known_bad_run_events functions

Most of the known bad runs / unix times correspond to calibration runs or surface activity during the pole season

../../_images/cut56_table.png

Fig. 27 livetime losses of short run, known bad run and bad unix time

CW GPS cut

Related doc. DB2664

This cut is done by the get_cw_log_events function

Events that have a weather balloon signature are removed by using MET’s balloon flight log

If the distance between the station and the balloon is smaller than ~17 km, the events in that period are removed
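A minimal sketch of the distance condition is given below. It assumes the station-to-balloon distance has already been computed from the GPS log as a time series. The name balloon_near_mask, the linear interpolation to event times, and the exact value 17.0 for the "~17 km" threshold are assumptions, not the get_cw_log_events implementation.

```python
import numpy as np

def balloon_near_mask(evt_unix, balloon_unix, balloon_dist_km, max_dist_km=17.0):
    """True for events recorded while the balloon is closer than max_dist_km.

    balloon_unix must be increasing; outside the logged flight the
    distance is treated as infinite, so no event is flagged there.
    """
    dist = np.interp(evt_unix, balloon_unix, balloon_dist_km,
                     left=np.inf, right=np.inf)
    return dist < max_dist_km
```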

../../_images/cut7_hist.png

Fig. 28 example GPS path and overlap with Sinesubtract data

../../_images/cut7_hist2.png

Fig. 29 total comparison 2013 to 2020

../../_images/cut7_hist3.png

Fig. 30 Sinesubtract data vs distance between station and balloon

Pedestal cut

Related doc. DB2560

This cut is done by the run_ped_qual_cut function

If a run has a short operation time or a small number of triggered events, it does not have enough good events to create a pedestal for all buffer regions

After all the cuts, if the usage of any IRS block is below 20 (the current Config. file limit for pedestal production), the run is removed
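The block usage check can be sketched as below, assuming the IRS block numbers of all events that survive the other cuts are collected in one array. The name has_enough_block_usage and the default of 512 IRS blocks are assumptions; this is not the run_ped_qual_cut code.

```python
import numpy as np

def has_enough_block_usage(irs_blocks, n_block=512, min_usage=20):
    """True if every IRS block was read out at least min_usage times."""
    usage = np.bincount(np.asarray(irs_blocks), minlength=n_block)
    return bool(np.all(usage >= min_usage))
```

A run for which this returns False would be removed, since at least one buffer region lacks the events needed for its pedestal.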

../../_images/cut8_log.png

Fig. 31 pedestal configuration in the log file

../../_images/cut8_hist.png

Fig. 32 example of bad pedestal file

../../_images/cut78_table.png

Fig. 33 livetime losses of CW GPS cut and pedestal cut

Livetime

Total livetime after 8 cuts (Good livetime)

  • A2: 5.45 -> 4.85 years (88.95 %)

  • A3: 5.27 -> 4.79 years (90.91 %)

The losses are ~5% from the CW GPS cut, ~4% from known bad runs, and ~1% from DAQ errors.

Simulation results are weighted based on the ‘Good’ live time

These plots are made by the dat_summary_live.py, dat_summary_live_sum.py and Check_Sim_v34.0.5_live_time_plot scripts

../../_images/live_hist.png

Fig. 34 live time after 8 cuts

../../_images/live_hist2.png

Fig. 35 live time after 8 cuts per configuration

Pedestal production

Pedestal production is done by repeder

  • The default pedestal measured by the DAQ has an offset and causes miscalibration

The new method extracts the pedestal from the ADC data itself

  • It aligns the raw data (ADC) with their analog buffer index and calculates the median value of each analog buffer cell
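The idea of a per-cell median can be sketched as below. This is not the repeder code in AraRoot: the name median_pedestal and the flat arrays of ADC values and buffer cell indices are assumptions made for illustration.

```python
import numpy as np

def median_pedestal(adc, cell_idx, n_cell):
    """Median ADC value of each analog buffer cell; NaN for unused cells."""
    adc = np.asarray(adc, dtype=float)
    cell_idx = np.asarray(cell_idx)
    ped = np.full(n_cell, np.nan)
    # simple loop over the cells that were actually read out
    for c in np.unique(cell_idx):
        ped[c] = np.median(adc[cell_idx == c])
    return ped
```

The median is robust against a few outlier samples, but not against a cell where unphysical WFs form the majority.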

Pedestal production is run with the command below:

source ../setup.sh
python3 script_executor.py -k ped -s <station ID> -r <run number> -b 1               # produces pedestal

It will launch repeder in AraRoot

../../_images/ped_hist.png

Fig. 36 comparison between default and repeder

../../_images/ped_hist2.png

Fig. 37 In this run, more unphysical WFs than normal WFs are clustered in the X: 0 ~ 5000, Y: 2000 region, so the median calculation in repeder chose the unphysical WF cluster as the expected baseline of the run

L1 cut (potential cut)

This cut exists in the package, but I decided to exclude it from the 1st data cleaning

It is designed to remove the period in which the trigger is stabilizing toward the servo goal value

The reasons I decided to exclude it are 1) the events removed by this cut are not much different from thermal noise events and 2) it takes out 15 % of the live time

But we must explore this period further in the future

This is done by the get_bad_l1_rate_events function

Low RF rate cut

At the beginning of every run, low RF rates are observed while the threshold is stabilizing

I decided to remove all events when the RF event rate is low

But this can be redefined using the threshold value (the hardware value) rather than an empirical cut!

../../_images/l1_hist0.png

Fig. 38 top: RF event rate of all runs, middle: event rate of A2 Run12001, bottom: L1 threshold of A2 Run12001

L1 rate

The number of triggered events is regulated by the so-called dynamic threshold system

The DAC value of the comparator is constantly changing to keep the L1 rate at its goal

  • The DAC is updated every 2 seconds based on data

  • The data is how many times the ADC value of an event is bigger than the DAC

  • The target rate is called the servo goal

../../_images/l1_hist1.png

Fig. 39 Event rate, L1 rate and L1 threshold of A3 Run11649

../../_images/l1_hist2.png

Fig. 40 L1 rate of all data (2013 ~ 2022). Alignment of 1d histograms of L1 rate. The majority of runs/channels follow the servo goal, but the fluctuation of some channels is too big

Attempt to isolate the stabilization period

I tried to isolate the period in two ways

  • Calculate the mean of the L1 rate in each minute and compare it with the L1 goal

  • Calculate the standard deviation of the L1 rate in each minute from the L1 goal
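The two per-minute quantities can be sketched as below. The name l1_minute_stats is hypothetical, and the "standard deviation from the L1 goal" is interpreted here as the RMS deviation about the goal rather than about the mean, which is an assumption.

```python
import numpy as np

def l1_minute_stats(unix_time, l1_rate, goal, bin_sec=60):
    """Per-bin mean of the L1 rate and RMS deviation from the goal.

    Assumes every bin holds at least one reading (no gaps in the run).
    """
    unix_time = np.asarray(unix_time)
    l1_rate = np.asarray(l1_rate, dtype=float)
    bin_idx = ((unix_time - unix_time.min()) // bin_sec).astype(int)
    n = np.bincount(bin_idx)
    mean = np.bincount(bin_idx, weights=l1_rate) / n
    dev = np.sqrt(np.bincount(bin_idx, weights=(l1_rate - goal) ** 2) / n)
    return mean, dev
```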

Different runs or time periods show completely different rate behavior

  • I couldn’t set a global cut value for isolating the beginning period…

../../_images/l1_hist3.png

Fig. 41 L1 rate with servo goal

../../_images/l1_hist4.png

Fig. 42 mean and std of L1 rate of all runs

Simple approach for the L1 cut

Locate the ‘stabilized’ period in each run by using the shape of the L1 rate

  • Smooth out the L1 rate fluctuation by averaging over a certain time period (1 min or 10 sec)

  • Set the cut where the smoothed value passes the L1 goal value

I tried several different methods

  • Ultimately I decided to use the 10 second mean without the error value

  • After taking the 10 second mean, find the point where the smoothed line crosses the servo goal value (1st Min from 1st Max point)
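The smoothing and crossing search can be sketched as below. This is a simplified illustration and not the get_bad_l1_rate_events code: the name stabilization_start is hypothetical, the crossing is taken as the first 10 second bin whose mean lies on the other side of the servo goal, and the "1st Min from 1st Max" refinement is not reproduced.

```python
import numpy as np

def stabilization_start(unix_time, l1_rate, goal, win_sec=10):
    """Start time of the first win_sec bin whose mean L1 rate has passed the goal.

    Assumes regularly sampled readings with no empty bins.
    Returns None if the smoothed rate never passes the goal.
    """
    unix_time = np.asarray(unix_time)
    l1_rate = np.asarray(l1_rate, dtype=float)
    start = unix_time.min()
    bin_idx = ((unix_time - start) // win_sec).astype(int)
    n = np.bincount(bin_idx)
    smooth = np.bincount(bin_idx, weights=l1_rate) / np.maximum(n, 1)
    side = np.sign(smooth - goal)  # -1 below the goal, +1 above, 0 on it
    if side[0] == 0:
        return start               # already at the goal in the first bin
    passed = np.nonzero(side != side[0])[0]
    if len(passed) == 0:
        return None
    return start + passed[0] * win_sec
```

Events before the returned time would be the ones flagged by the L1 cut.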

../../_images/l1_hist5.png

Fig. 43 Several different methods to find the flow of L1 rate

L1 cut results in event rate

2d histograms (time vs rate) of the stacked RF event rate at different stages

‘Quality cut’ refers to all the hardware cuts I developed

It looks like all the early values are nicely cut out

But it removes 15 % of the live time

../../_images/l1_hist6.png

Fig. 44 2d histogram (time vs rate) of stacked RF event rate at different stages

../../_images/l1_hist7.png

Fig. 45 2d histogram (run vs rate) of stacked RF event rate at different stages

L1 cut results in Cmax and Mmax

The distributions show that the events flagged by the L1 cut are located in the center of the thermal noise distribution

Since I’m not really seeing bad events in the L1 cut results, I decided to exclude the L1 cut from the 1st data cleaning

../../_images/l1_hist8.png

Fig. 46 A2 VPol Left: distribution in interferometry parameter. Right: distribution in matched filter parameter

../../_images/l1_hist9.png

Fig. 47 A2 HPol

../../_images/l1_hist10.png

Fig. 48 A3 VPol

../../_images/l1_hist11.png

Fig. 49 A3 HPol