from __future__ import division, print_function
import numpy, pylab, math
%matplotlib inline
Now that you have learned how to use dataio-shovel and steamshovel, you are ready to learn to manipulate information within an i3 file. You will use the python skills you learned to write simple scripts to access information, select events, add objects to the frame and to make simple plots.
The first step is to get the IceCube software
import icecube
We now have access to all the projects that we built before. The first project we will learn about is dataio.
from icecube import dataio
The dataio project allows us to access i3 files. We will learn more about the dataio.I3File method using the ipython help() function
help(dataio.I3File)
We see that there are five ways to initialize an I3File. In order to access the files that we looked at earlier, we will use the third initializer. Since we have two files, a GCD file and a data file, we will open them both separately:
geofile = dataio.I3File('/home/mamday/IceCube/data/IC86/SantaIgel/GeoCalibDetectorStatus_IC86.55697_corrected_V2.i3')
infile = dataio.I3File('/home/mamday/IceCube/data/IC86/SantaIgel/JP/SebCutL8-JPVuvNuMu-JPVuvNuMu-SebCut-genie_ic.1450.000499.L7.i3')
If we want to be able to write frames into a new i3 file, we need to use either the fourth or fifth initializer. We can create a file using the fourth initializer called 'testfile.i3':
testfile = dataio.I3File('testfile.i3',dataio.I3File.Writing)
But this can be acheived with a little less typing using the fifth initializer:
outfile = dataio.I3File('newfile.i3','w')
So now we have our two input i3 files and our output i3 file. In order to access the information in the files, we have to access the frames inside. There are several methods for accessing the frame objects. First we can just pop the first frame in the file using pop_frame()
frame = infile.pop_frame()
Great. Now we have accessed our first frame! If we want to know what is contained in our frame, we can use keys():
frame.keys()
This is the first frame in the file. Since it is a data file, the first frame is a DAQ frame.
What happens if we use pop_frame again? Let's try:
frame = infile.pop_frame()
frame.keys()
Now we have a new frame. Since the i3 files use a stack data structure, we can no longer access the previous frame of the file. However, to return to the first frame, we can use the rewind() method:
infile.rewind()
Now, if we pop the frame again, we will get the first frame in the file:
frame = infile.pop_frame()
frame.keys()
There are a few other ways of getting frames, depending on what types of frame you are interested in. If you only want Physics frames, you can use pop_physics():
frame = infile.pop_physics()
Or if you only want DAQ frames you can use pop_daq():
frame = infile.pop_daq()
For now we want to look at the first frame in the file, so we will rewind again:
infile.rewind()
Generally, in an analysis, you will want to select some frames in a file and reject others. Predominately you will do this for a large number of events and you will want to use the Icetray module framework. However, if you have a small number of events, you can keep or reject frames using the dataio methods. Previously we learned how to create an outputfile. Now we will add a frame to the output file using the push method:
outfile.push(frame)
Our output file now contains a single frame. A simple use for this method would be to write out the first 100 frames of a file into a new i3 file:
for i in xrange(100):
frame = infile.pop_frame()
outfile.push(frame)
outfile.close()
Now if we look at our file in dataio-shovel, we see that it is the same as the previous file, but now it contains only the first 100 frames. Before we continue, we will rewind the input file and get the first frame again
infile.rewind()
frame = infile.pop_frame()
frame.keys()
Now, if we want to actually be able to look at anything in the frame, we need to get an important new project, dataclasses:
from icecube import dataclasses
The dataclasses project contains the majority of the objects that you will see in most i3 files. Most analyses also depend on other projects in the IceCube software package as well, which will be covered further in the section on icetray and i3 modules later in the week. For now we are going to use our new project to get some information about our frame. The first frame object we will look at is the 'I3EventHeader'. Using the Get method:
evt_head = frame.Get('I3EventHeader')
We can also get access to this object by using frame["ObjectName"]
evt_head = frame["I3EventHeader"]
Now we have access to the I3EventHeader object. Generally the most useful information in the I3EventHeader are the Run ID and the Event ID
evt_id = evt_head.event_id
run_id = evt_head.run_id
print(run_id, evt_id)
Now we want to select events based on their Event ID. First we will create a new output file, eventfile.i3
outfile1 = dataio.I3File('eventfile.i3','w')
We would like to loop over all the frames, but we don't necessarily know how many frames are in the file. In order to do that, we can use the I3File.more method, which is True if there are more frames in the file and False otherwise:
while(infile.more()):
frame = infile.pop_frame()
evt_head = frame["I3EventHeader"]
evt_id = evt_head.event_id
if(evt_id<100):
print(evt_id)
Exercise: Fill eventfile.i3 with events whose event_id ends with the number 5
Example Solution:
infile.rewind()
while(infile.more()):
frame = infile.pop_frame()
evt_head = frame["I3EventHeader"]
evt_id = evt_head.event_id
if(not(evt_id%5) and evt_id%10):
outfile1.push(frame)
outfile1.close()
infile.rewind()
There are relatively few cases where you will want to cut events based on the Event ID. A more common object used to cut events is the FilterMask. The second level of processing for analysis files after triggering is to determine if the event passes certain predetermined cuts, or filters, for a specific type of analysis. The pass or fail result for the event is then stored in a dict in the FilterMask.
filt_mask = frame["FilterMask"]
You can determine what filters and included in the FilterMask simply by printing it:
print(filt_mask)
Each filter name is a key in the filter mask. You can then access the pass or fail result of the event as a boolean using the method condition_passed:
filt_mask["CascadeFilter_11"].condition_passed
In this case the result of the CascadeFilter_11 is False.
Exercise: Write a function that outputs frames to a file if they pass EHEFilter_11
Most of the information you want to use in your analysis is contained in the reconstructed pulses. Reconstructed pulses contain information about what was measured by each DOM in an event.
off_pulses = frame['OfflinePulses']
The pulses we extracted from the frame are indexed by the DOM where the interaction took place. We can look at all the DOMs in the event using a simple for loop:
for i,j in off_pulses:
print(i,j)
From this you can see that each entry has an OMKey, which identifies the DOM, and a pulse series. From the OMKey you can get the OM number and the string number for the DOM:
omstringTup = [(i.om,i.string) for i in off_pulses.keys()]
print(omstringTup[:10])
off_pulses.keys()
The pulses for each DOM contain information about the time, charge and the electronic read out of the pulse (we'll ignore width for now). We can use this information to learn about the physics of the event. One criterion that is often used is the number of DOMs in a pulse series, referred to as the "number of channels" in the event. We can easily get this information, which is simply the length of our pulse variable:
print(len(off_pulses))
You might also want to know the total number of pulses in the event. Each DOM may contain multiple pulses. In order to count the total number of pulses on each DOM, you can just get the length of the list containing all of the elements in reconstructed pulse series:
all_pulses = [p for i,j in off_pulses for p in j]
print(len(all_pulses))
For technical reasons a single interaction can sometimes be split into multiple pulses. Therefore it is often better to use the total charge in an event instead of the total number of pulses. Since we created a list of all the pulses in the event already, we can calculate the total charge as follows:
tot_charge = sum([p.charge for p in all_pulses])
print(tot_charge)
We can now use what we learned about plotting to plot the number of channels for each event in the file:
n_chan = []
while(infile.more()):
frame = infile.pop_frame()
off_pulses = frame['OfflinePulses']
n_chan.append(len(off_pulses))
bin_it = numpy.linspace(0,max(n_chan),max(n_chan)+1)
counts, bins, patches = pylab.hist(n_chan,bin_it,color='r',histtype='step')
pylab.ylim(0,max(counts)+1)
pylab.xlabel('NChan')
Exercise: Plot number of pulses and total charge for all events in the file
Some pulses do not contain both fADC and ATWD information. If you want to treat pulses differently depending on whether or not they have ATWD information, you can do so by using flags as follows:
tot_atwd_charge = sum([p.charge for p in all_pulses if (p.flags & dataclasses.I3RecoPulse.PulseFlags.ATWD)])
print(tot_atwd_charge)
We can then look at the distribution of charge in the event like so:
charges = numpy.array([p.charge for p in all_pulses])
bin_it = numpy.linspace(0,math.ceil(max(charges)),math.ceil(max(charges))+1)
counts, bins, patches = pylab.hist(charges,bin_it,color='r',histtype='step')
pylab.ylim(0,max(counts)+1)
pylab.xlabel('Pulse Charge')
And compare this to the plot of the distribution of charge for pulses that have ATWD information:
charges = numpy.array([p.charge for p in all_pulses])
atwd_charges = numpy.array([p.charge for p in all_pulses if (p.flags & dataclasses.I3RecoPulse.PulseFlags.ATWD)])
bin_it = numpy.linspace(0,math.ceil(max(charges)),math.ceil(max(charges))+1)
counts, bins, patches = pylab.hist(charges,bin_it,color='r',histtype='step',label='All')
counts1, bins1, patches1 = pylab.hist(atwd_charges,bin_it,color='b',histtype='step',label="Has ATWD")
pylab.ylim(0,max(counts+counts1)+1)
pylab.xlabel('Pulse Charge')
pylab.legend(loc=1)
For various reasons we may not want to use all of the pulses in an event. We may then want to be able to pass a pulse series that meets our criterion to various algorithms. In this case, we can create a new pulse series that contains only the pulses we want by using an I3RecoPulseSeriesMapMask. Let's make a pulse series containing only pulses with ATWD information. First we have to create a function that returns True or False depending on if the pulse meets our requirements:
def myPulses(omkey, index, pulse):
if(pulse.flags & dataclasses.I3RecoPulse.PulseFlags.ATWD):
return True
else:
return False
Our function takes three arguments: omkey, index and pulse. The omkey is the key for the DOM that contains the pulse, the index is the position of the pulse in the I3RecoPulse map for that DOM, and the pulse is the I3RecoPulse, which contains information about the time, charge and flags of the pulse, as we learned before. Now, to create our new pulse series, we define the mask:
my_mask = dataclasses.I3RecoPulseSeriesMapMask(frame,'OfflinePulses',myPulses)
Now that we created our I3RecoPulseSeriesMapMask, we can access information about the pulses in the same way as before:
mask_pulses = my_mask.apply(frame)
tot_mask_charge = sum([p.charge for i,j in mask_pulses for p in j])
print(tot_mask_charge)
Calculating the charge of our new pulse series, we see that we get the same result as when we required ATWD information in our full pulse series.
Exercise: Create a mask that contains only DOMs where the total charge is > 10
Exercise: Create a mask that contains only DOMs on the inner DeepCore strings (inner DC strings are strings [26,27,35,36,37,45,46,79,80,81,82,83,84,85,86])
Now that you have created a new pulse object, you will probably want to add this object to the frame. This is acheived with the frame 'Put' method, which requires the name of your object and the object variable as inputs:
frame.Put('ATWDPulseMask',my_mask)
or
frame['OtherATWDPulseMask']=my_mask
We can now add our modified frame to the output file:
testfile.push(frame)
testfile.close()
Other useful information that you will often want is the position of the DOM that contains the pulses you are interested in. I3RecoPulses do not explicitly contain information about the position of the DOM. However, the OMKey of the DOM can be used to determine the position from the I3Geometry. We will now use the second file we loaded before:
g_frame = geofile.pop_frame()
g_frame.keys()
We saw before that the GCD file contains three main objects: I3Geometry, I3Calibration and I3DetectorStatus. To get the position information for the DOMs, we want to access the I3Geometry object:
geometry = g_frame["I3Geometry"]
The I3Geometry contains information about the DOM positions in something called an OMGeo. We can see the mapping of OMKeys to DOM information by looping over the OMGeos, similar to how we looped over the I3RecoPulseSeries before:
#for i, j in geometry.omgeo:
# print i,j
help(icecube.dataclasses.I3OMGeo)
From this we see that the OMGeo has many properties. The property we want is 'position':
dom_pos = [j.position for i,j in geometry.omgeo]
The entries of domPos are all objects of type 'I3Position', which contains the x, y and z component of the position. You can access the individual components as follows:
sep_pos = [(p.x,p.y,p.z) for p in dom_pos]
print(sep_pos[:10])
Exercise: In the IceCube detector there is a region where the ice is much less clear due to the accumulation of particulates called the 'dust layer'. This dust layer is approximately in the region from 0 to -150 m in the IceCube coordinate system. Create two pulse series, one with pulses above the dust layer and one with pulses below the dust layer. Plot the total charge in each pulse series for all events.
Exercise: a) Previously you created a pulse series that contained only DOMs from the inner DeepCore strings. Create a new pulse discriminator that requires that these pulses be below the dust layer.
You can find the distance between two DOMs very simply. I3Positions act like vectors, and can be added and subtracted. Then, to find the distance, you can very simply find the magnitude of the difference:
first_dist = dom_pos[0]-dom_pos[1]
print(dom_pos[0],dom_pos[1])
first_dist_mag = first_dist.magnitude
print (first_dist_mag)
Exercise: Create a pulse series that contains all pulses that were not selected by your inner DeepCore pulse selector, call it 'NonDCPulses'.
Muons travel through the ice at approximately the speed of light. A muon travelling through the detector and into DeepCore will have pulses inside the DeepCore volume that are causally related to pulses outside of it.
For each pulse in NonDCPulses calculate the distance and the time between the pulses and the COG of the DeepCore pulses you selected before. From this, calculate the velocity of a particle travelling between those pulses, distance/time. Plot the distribution of velocities. Count the number of pulses in NonDCPulses with a velocity in the range 0.25-0.4 m/ns.
Create an output file that contains events that have fewer than 3 pulses in NonDCPulses.
Exercise: This is a simple method for vetoing through going muons that pass through DeepCore. Think of a way to improve this algorithm using information about the pulses that we learned in this tutorial.
Exercise: Create a new method of vetoing events using information we learned in this tutorial.