Overview

The 1st data cleaning is designed to ‘remove’ 1) unphysical waveforms/events caused by DAQ errors, and 2) the periods excluded by cuts that reduce livetime

Time periods contaminated by weather balloons are also removed from the livetime by cross-checking with the balloon’s GPS log (CW GPS cut)

Most of the cuts remove events by checking only the DAQ readings:

IRS block index, SensorHk reading, Event rate

All the cuts are independent.

These cuts were applied to the blinded (100%) data to produce accurate pedestals

The 1st data cleaning is run with the commands below:

source ../setup.sh
python3 script_executor.py -k sub_info -s <station ID> -r <run number> -b 1          # collects secondary information
python3 script_executor.py -k qual_cut_1st -s <station ID> -r <run number> -b 1 -q 1 # performs 1st data cleaning

It uses the pre_qual_cut_loader and ped_qual_cut_loader classes

DAQ index cut

Related doc. DB2530, DB2554

This cut is done by the get_daq_structure_errors function

It is designed to remove unphysical events by checking secondary information (DAQ index)

A corrupted IRS block / DDA board / channel masking index or an unexpected WF length has a strong correlation with an unphysical WF shape

The number of bad events identified by the DAQ index cut is negligible in terms of livetime (rare events)

../../_images/daq_err_index.png — Fig. 14 secondary index of DAQ error event

../../_images/daq_err_wf.png — Fig. 15 DAQ error wf with block and sample index

../../_images/daq_err_wf16.png — Fig. 16 DAQ error 16 wfs with block and sample index

The DAQ index cut tests each event’s DAQ readings against the expectations below

1. The number of IRS blocks is always a multiple of 4
2. The IRS number is always the same for all DDAs
3. The DDA index always cycles as 0, 1, 2, 3, 0, 1, ...
4. The channel mask is always ‘1’ in all 8 binary digits
5. There is no block gap
6. The WF length is always equal to or greater than the run configuration, e.g. soft: 8, RF/Cal: 28 (depending on the DAQ configuration)

Checks 5 and 6 are imported from the previous A2/3 diffuse analysis
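
The six checks listed above can be sketched for a single event as follows. This is only an illustration, not the code of get_daq_structure_errors: the function name, the argument layout, the 4 DDA boards and the 512-block buffer size are all assumptions made for the example.

```python
def check_daq_structure(block_idx, dda_idx, channel_mask, min_blocks,
                        n_ddas=4, n_blocks=512):
    """Return {check name: True if the check FAILED} for one event.

    Hypothetical inputs, all in readout order:
      block_idx    : IRS block number of every readout block
      dda_idx      : DDA index of every readout block
      channel_mask : 8-bit channel mask of every readout block
      min_blocks   : minimum blocks per DDA expected from the run configuration
    n_ddas and n_blocks are assumptions (4 DDA boards, 512 buffer blocks).
    """
    errors = {}
    # 1. the number of readout blocks must be a multiple of 4
    errors["block_multiple"] = len(block_idx) % n_ddas != 0
    # 2. each group of 4 consecutive blocks must share one IRS block number
    groups = [block_idx[i:i + n_ddas] for i in range(0, len(block_idx), n_ddas)]
    errors["block_mismatch"] = any(len(set(g)) != 1 for g in groups)
    # 3. the DDA index must cycle 0, 1, 2, 3, 0, 1, ...
    errors["dda_order"] = any(d != i % n_ddas for i, d in enumerate(dda_idx))
    # 4. all 8 bits of the channel mask must be set
    errors["channel_mask"] = any(m != 0b11111111 for m in channel_mask)
    # 5. block numbers must advance by exactly 1 (wrapping around the buffer)
    per_dda = block_idx[::n_ddas]
    errors["block_gap"] = any((b - a) % n_blocks != 1
                              for a, b in zip(per_dda, per_dda[1:]))
    # 6. the waveform must not be shorter than the configured length
    errors["short_wf"] = len(per_dda) < min_blocks
    return errors
```

An event would be removed if any of the returned values is True.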

First minute cut

Related doc. DB2554

This cut is done by the get_first_minute_events function

Events at the beginning of every run have unphysical WF shapes, such as spiky WFs or a large offset from the pedestal

The cause is unknown. It might be due to a digitizer reset issue

This cut excludes the first minute of every run
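
A minimal sketch of such a time-window cut is shown below. It is not the code of get_first_minute_events; the function name is hypothetical, and taking the earliest event time as the start of the run is an assumption.

```python
def first_minute_flags(unix_times, window_sec=60):
    """Flag events recorded within the first `window_sec` seconds of a run.

    Assumption: the run start is the unix time of the earliest event.
    """
    if not unix_times:
        return []
    run_start = min(unix_times)
    return [t - run_start < window_sec for t in unix_times]
```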

../../_images/cur2_wf2.png — Fig. 17 bad wf in first minute

../../_images/cur2_wf_hist.png — Fig. 18 distribution of first minute wfs in median

../../_images/cut2_table.png — Fig. 19 livetime losses of first minute cut

DDA voltage cut

Related doc. DB2530

This cut is done by the get_bias_voltage_events and get_bad_evt_rate_events functions

Unstable bias voltage feeds on the DDA boards cause unphysical WFs.

Symptoms: shift of the noise level, 0 ADC events, time drift, and 0 Calpulser rate

If the DDA voltage readings (in sensorHk) are outside 3 ~ 3.5 or the Calpulser rate is lower than 0.85 Hz, the events are removed
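
The two conditions above could be applied per event roughly as follows. This is a sketch under stated assumptions, not the package code: both function names are hypothetical, each event is matched to the most recent sensorHk reading, the voltage limits are treated as inclusive, and the Calpulser rate is computed in 1-minute bins.

```python
import bisect
from collections import Counter


def bias_voltage_flags(event_times, hk_times, hk_voltages, v_min=3.0, v_max=3.5):
    """Flag events whose latest sensorHk reading has a DDA voltage out of range.

    hk_times must be sorted; hk_voltages[i] holds the DDA voltages read at
    hk_times[i]. Events before the first reading use the first reading.
    """
    flags = []
    for t in event_times:
        idx = max(bisect.bisect_right(hk_times, t) - 1, 0)
        flags.append(any(not (v_min <= v <= v_max) for v in hk_voltages[idx]))
    return flags


def low_cal_rate_flags(event_times, is_cal, min_rate_hz=0.85, bin_sec=60):
    """Flag every event in a time bin whose Calpulser rate is below the limit.

    Note: an incomplete last bin is normalized by the full bin width here.
    """
    t0 = min(event_times)
    cal_counts = Counter((t - t0) // bin_sec
                         for t, c in zip(event_times, is_cal) if c)
    return [cal_counts[(t - t0) // bin_sec] / bin_sec < min_rate_hz
            for t in event_times]
```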

../../_images/cut3_hist.png — Fig. 20 data status with sensorHk data

../../_images/cut3_hist2.png — Fig. 21 result of the cut

High software rate cut

Related doc. DB2554

This cut is done by the get_bad_evt_rate_events function with the use_sec = True option

High software rates appear mostly at the beginning of the run

Presumably the DAQ / internal clock is not performing well in these periods

If there are more than 2 software events per unix time, all the events in that unix time are removed
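
The per-second counting behind this cut can be sketched as below. The function name and arguments are hypothetical and the sketch does not reproduce get_bad_evt_rate_events; it only assumes integer unix times and a per-event software-trigger flag.

```python
from collections import Counter


def high_soft_rate_flags(unix_times, is_soft, max_per_sec=2):
    """Flag all events in any unix second with more than `max_per_sec` software events."""
    soft_counts = Counter(t for t, s in zip(unix_times, is_soft) if s)
    bad_secs = {t for t, n in soft_counts.items() if n > max_per_sec}
    # every event in a bad second is removed, not only the software triggers
    return [t in bad_secs for t in unix_times]
```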

../../_images/cut4_hist.png — Fig. 22 distribution of software rate. top: based on operation time, bottom: based on rate

../../_images/cut34_table.png — Fig. 23 livetime losses of DDA voltage and high software cut

../../_images/cut4_hist0_0.png — Fig. 24 example run when software rate is 2

../../_images/cut4_hist0_1.png — Fig. 25 example run when software rate is 2. Maroon circle indicates software triggered event.

Short run cut

Related doc. DB2560

This cut is done by the get_short_run_events function

A run is removed if its Ops. time is shorter than 30 min. or its number of RF/Soft events is smaller than 10000

This focuses the analysis on runs that have a typical Ops. time
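
As a sketch, the run-level decision reduces to two threshold comparisons. The function name is hypothetical, and n_events stands for the RF/Soft event count however the package counts it.

```python
def is_short_run(run_start, run_stop, n_events,
                 min_ops_sec=30 * 60, min_events=10000):
    """Return True if the run should be removed as a short run.

    run_start / run_stop are unix times in seconds; n_events is the
    RF/Soft event count of the run (the counting convention is an assumption).
    """
    ops_time = run_stop - run_start
    return ops_time < min_ops_sec or n_events < min_events
```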

../../_images/cut5_hist.png — Fig. 26 distribution of runs based on operation time and number of triggered events

Known bad runs / unix time cut

Related doc. ARA wiki, GitHub

This cut is done by the get_known_bad_unix_time_events and get_known_bad_run_events functions

Most of the known bad runs / unix times correspond to calibration runs or surface activity during the pole season

../../_images/cut56_table.png — Fig. 27 livetime losses of short run, known bad run and bad unix time

CW GPS cut

Related doc. DB2664

This cut is done by the get_cw_log_events function

Events with a weather balloon signature are removed using MET’s balloon flight log

If the distance between the station and the balloon is smaller than ~17 km, events in that period are removed
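
The distance condition can be sketched as follows. This is not the code of get_cw_log_events: the function name is hypothetical, and the sketch assumes the GPS log has already been converted to Cartesian coordinates in km relative to the station, with strictly increasing log times. The balloon position is linearly interpolated to each event time.

```python
import bisect
import math


def balloon_flags(event_times, log_times, log_xyz_km, max_dist_km=17.0):
    """Flag events recorded while the balloon is closer than `max_dist_km`.

    log_xyz_km[i] is the (x, y, z) balloon position in km relative to the
    station at log_times[i]. Events outside the logged flight are not flagged.
    """
    flags = []
    for t in event_times:
        if not log_times or t < log_times[0] or t > log_times[-1]:
            flags.append(False)
            continue
        hi = bisect.bisect_left(log_times, t)
        if log_times[hi] == t:
            pos = log_xyz_km[hi]
        else:
            lo = hi - 1
            frac = (t - log_times[lo]) / (log_times[hi] - log_times[lo])
            pos = tuple(a + frac * (b - a)
                        for a, b in zip(log_xyz_km[lo], log_xyz_km[hi]))
        flags.append(math.sqrt(sum(c * c for c in pos)) < max_dist_km)
    return flags
```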

../../_images/cut7_hist.png — Fig. 28 example GPS path and overlap with Sinesubtract data

../../_images/cut7_hist2.png — Fig. 29 total comparison 2013 to 2020

../../_images/cut7_hist3.png — Fig. 30 Sinesubtract data vs distance between station and balloon

Pedestal cut

Related doc. DB2560

This cut is done by the run_ped_qual_cut function

If a run has a short operation time or a small number of triggered events, it does not have enough events to create a pedestal for all buffer regions

After all the cuts, if the usage of any IRS block is below 20 (the current Config. file limit for pedestal production), the run is removed
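
The block-usage condition can be sketched as a simple counting check. The function name is hypothetical, the 512-block buffer size is an assumption, and the input is assumed to contain only the events that survive the other cuts.

```python
from collections import Counter


def has_enough_block_usage(block_idx_per_event, n_blocks=512, min_usage=20):
    """Return True if every IRS block is used at least `min_usage` times.

    block_idx_per_event: one list of IRS block numbers per surviving event.
    n_blocks = 512 is an assumption about the buffer size.
    """
    usage = Counter(b for evt in block_idx_per_event for b in evt)
    return all(usage[b] >= min_usage for b in range(n_blocks))
```

A run for which this returns False would be removed.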

../../_images/cut8_log.png — Fig. 31 pedestal configuration in the log file

../../_images/cut8_hist.png — Fig. 32 example of bad pedestal file

../../_images/cut78_table.png — Fig. 33 livetime losses of CW GPS cut and pedestal cut

Livetime

Total livetime after 8 cuts (Good livetime)

A2: 5.45 -> 4.85 years (88.95 %)
A3: 5.27 -> 4.79 years (90.91 %)

Livetime losses: ~5% from the CW GPS cut, ~4% from known bad runs, and ~1% from DAQ errors.

Simulation results are weighted based on ‘Good’ live time

These plots are made by the dat_summary_live.py, dat_summary_live_sum.py and Check_Sim_v34.0.5_live_time_plot scripts

../../_images/live_hist.png — Fig. 34 live time after 8 cuts

../../_images/live_hist2.png — Fig. 35 live time after 8 cuts per configuration

Pedestal production

Pedestal production is done by repeder

The default pedestal measured by the DAQ has an offset and causes miscalibration

The new method extracts the pedestal from the ADC data itself

It aligns the raw data (ADC) with their analog buffer index and calculates the median value of each analog buffer cell
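
The idea of a median pedestal per buffer cell can be illustrated as below. This is not the repeder implementation; the function name and the input layout (one channel, each event given as cell indices plus ADC values) are assumptions for the example.

```python
from collections import defaultdict
from statistics import median


def median_pedestal(events):
    """Return {buffer cell index: median ADC value} for one channel.

    events: iterable of (cell_indices, adc_values) pairs, where
    adc_values[i] is the raw ADC sample stored in cell_indices[i].
    """
    by_cell = defaultdict(list)
    for cells, adcs in events:
        for cell, adc in zip(cells, adcs):
            by_cell[cell].append(adc)
    return {cell: median(values) for cell, values in by_cell.items()}
```

The median is robust against a few outlying samples, but it follows the majority: if unphysical WFs dominate a cell, the median takes their value as the baseline.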

Pedestal production is run with the command below:

source ../setup.sh
python3 script_executor.py -k ped -s <station ID> -r <run number> -b 1               # produces pedestal

It will launch repeder in AraRoot

../../_images/ped_hist.png — Fig. 36 comparison between default and repeder

../../_images/ped_hist2.png — Fig. 37 In this run, more unphysical WFs than normal WFs are clustered in the X: 0 ~ 5000, Y: 2000 region, so the median calculation in repeder chose the unphysical WF cluster as the expected baseline of the run

L1 cut (potential cut)

This cut exists in the package, but I decided to exclude it from the 1st data cleaning

It is designed to remove the period in which the trigger is stabilizing toward the servo goal value

The reasons I decided to exclude it are 1) the events removed by this cut are not much different from thermal noise events, and 2) it takes out 15 % of the livetime

But we must explore this period further in the future

This is done by the get_bad_l1_rate_events function

Low RF rate cut

At the beginning of every run, low RF rates are observed while the threshold is stabilizing

Decided to remove all events when the RF event rate is low

However, this could be redefined using the threshold value (the hardware value) rather than an empirical cut

../../_images/l1_hist0.png — Fig. 38 top: RF event rate of all runs, middle: event rate of A2 Run12001, bottom: L1 threshold of A2 Run12001

L1 rate

The number of triggered events is regulated by the so-called dynamic threshold system

The DAC value of the comparator changes constantly to maintain the L1 rate goal

The DAC is updated every 2 seconds based on the data:
how many times the ADC value of an event is bigger than the DAC
The target L1 rate is called the servo goal

../../_images/l1_hist1.png — Fig. 39 Event rate, L1 rate and L1 threshold of A3 Run11649

../../_images/l1_hist2.png — Fig. 40 L1 rate of all data (2013 ~ 2022). Alignment of 1d histograms of L1 rate. The majority of runs/channels follow the servo goal, but the fluctuation of some channels is too big

Attempt to isolate the stabilization period

Tried to isolate the period in two ways

Calculate mean of L1 rate in each minute and compare with L1 goal
Calculate standard deviation of L1 rate in each minute from L1 goal

Different runs or time periods show completely different rate behavior

Could not set a global cut value for isolating the beginning period…
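
The two per-minute quantities described above can be sketched as follows. The function name and inputs are hypothetical, and the "standard deviation from the L1 goal" is interpreted here as the RMS deviation of the rate about the goal value, which is an assumption.

```python
import math
from collections import defaultdict


def l1_minute_stats(times, l1_rates, goal, bin_sec=60):
    """Return {minute index: (mean L1 rate, RMS deviation from the goal)}.

    times are unix times in seconds and l1_rates the L1 rate samples of
    one channel; the deviation is taken about `goal`, not about the mean.
    """
    t0 = min(times)
    bins = defaultdict(list)
    for t, rate in zip(times, l1_rates):
        bins[(t - t0) // bin_sec].append(rate)
    stats = {}
    for minute, rates in sorted(bins.items()):
        mean = sum(rates) / len(rates)
        dev = math.sqrt(sum((r - goal) ** 2 for r in rates) / len(rates))
        stats[minute] = (mean, dev)
    return stats
```

In this example a minute can have a mean equal to the goal and still show a large deviation, which is why the two quantities are looked at separately.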

../../_images/l1_hist3.png — Fig. 41 L1 rate with servo goal

../../_images/l1_hist4.png — Fig. 42 mean and std of L1 rate of all runs

Simple approach for the L1 cut

Locate the ‘stabilized’ period of each run by using the shape of the L1 rate

Smooth out the L1 rate fluctuation by averaging over a certain time period (1 min or 10 sec)
Set the cut where the smoothed value passes the L1 goal value

Tried several different methods

Ultimately decided to use the 10 second mean without an error value
After taking the 10 second mean, find the point where the smoothed line crosses the servo goal value (1st minimum after the 1st maximum point)
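
A simplified sketch of the smoothing and crossing search is given below. It is not the code of get_bad_l1_rate_events: the function name is hypothetical, and it returns the first 10-second bin in which the smoothed rate reaches or crosses the goal, without implementing the minimum/maximum selection mentioned above.

```python
from collections import defaultdict


def stabilization_time(times, l1_rates, goal, bin_sec=10):
    """Return the start time of the first bin whose mean L1 rate reaches or
    crosses `goal`, or None if the smoothed rate never crosses it.

    Simplification: the crossing is detected in either direction, relative
    to the side of the goal on which the run starts.
    """
    t0 = min(times)
    bins = defaultdict(list)
    for t, rate in zip(times, l1_rates):
        bins[(t - t0) // bin_sec].append(rate)
    smoothed = [(t0 + b * bin_sec, sum(r) / len(r))
                for b, r in sorted(bins.items())]
    starts_above = smoothed[0][1] > goal
    for t_bin, mean in smoothed:
        if mean == goal or (mean > goal) != starts_above:
            return t_bin
    return None
```

Events before the returned time would be treated as belonging to the stabilization period.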