LightMap Hit Pattern Examples

In this notebook, we go over a method for modeling a hit pattern — the photon collection efficiency as a function of SiPM, given a photon generation position inside a detector (in this case, a TPC).

All caveats regarding the total light treatment in this project apply equally here.

Imports

We want numpy+matplotlib for the basics; histlite for histograms; and the new nEXO-offline package LightMap.

  • Histlite: pip install --user histlite
  • LightMap: in your nEXO-offline sources, git checkout u/mrichman/issue_0050 (see issue 50)
In [1]:
import numpy as np
import matplotlib.pyplot as plt

import histlite as hl
import LightMap

%matplotlib inline
Welcome to JupyROOT 6.16/00
In [2]:
# plotting aesthetics
plt.rc('figure', dpi=150, figsize=(4,3), facecolor='w')
plt.rc('savefig', dpi=150, facecolor='w')
plt.rc('lines', linewidth=1.5)

Loading an existing LightMap

We'll start by loading up a LightMap previously computed in the total-light tutorial. This is useful for two reasons:

  1. We can ensure that our hitpattern is defined for the same geometry.
  2. We can optionally use the total light map in conjunction with the hitpattern.

The models are stored under:

In [3]:
import os
home_dir = os.getenv('HOME')
model_dir = '{}/lightmap/total/tutorial'.format(home_dir)

And we recover the LightMap itself by, e.g.:

In [4]:
lm = LightMap.load_model(model_dir, 'LightMapSplineXYZ')
lm
<- /g/g20/richman3/lightmap/total/tutorial/LightMapSplineXYZ/model.npy ...
Out[4]:
LightMapSplineXYZ(
  tpc=TPC(r=567.65, zmin=-967.37, zmax=217.63),
  h_count=Hist(15 bins in [-605.493299575714,605.493299575714], 15 bins in [-605.493299575714,605.493299575714], 15 bins in [-1006.8730845119838,257.1300590359741], with sum 5000000.0, 600 empty bins, and 0 non-finite values)
  spline=<histlite.SplineFit object at 0x2aab7bedbe10>
)

The geometry of that lightmap is:

In [5]:
tpc = lm.tpc
tpc
Out[5]:
TPC(r=567.65, zmin=-967.37, zmax=217.63)

HitPattern — low statistics

We'll work with the same MC dataset as in the total light tutorial:

In [6]:
mc_dir = '/p/lustre2/nexouser/optical_sample/g4'
filenames = [
    '{}/PhotonBomb_gps_ActiveOptical_seed{}.nEXOevents.root'.format(mc_dir, i)
    for i in range(1, 1001)]

High-statistics MC is required to parameterize the hit pattern with reasonable accuracy. This time we need not only the per-"event" summary information, but also the details of where each collected photon landed. For this reason, the fitting is done in terms of filenames rather than data pre-loaded into RAM.
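The file-by-file fitting strategy can be sketched with plain numpy. Here `read_photon_hits` is a hypothetical stand-in for the ROOT I/O; the point is that the histogram is accumulated incrementally, so memory stays bounded by one file's worth of photons:

```python
import numpy as np

# Hypothetical stand-in for the ROOT I/O: returns one file's worth of
# collected-photon (x, z) positions; nothing else is held in RAM.
def read_photon_hits(seed):
    rng = np.random.RandomState(seed)
    return rng.uniform(0.0, 1.0, size=(1000, 2))

# Accumulate the histogram file by file.
edges = [np.linspace(0, 1, 11), np.linspace(0, 1, 11)]
H = np.zeros((10, 10))
for seed in range(1, 4):
    hits = read_photon_hits(seed)
    h, _, _ = np.histogram2d(hits[:, 0], hits[:, 1], bins=edges)
    H += h

print(H.sum())  # 3000.0 = 3 files x 1000 photons
```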

The HitPattern is initially constructed like so:

In [7]:
# assumes 384x120 SiPM array
hp1 = LightMap.hitpattern.HitPatternHist(tpc)

We'll start by fitting with a single file:

In [8]:
hp1.fit(filenames[:1])
Processing 1 files...
 <- /p/lustre2/nexouser/optical_sample/g4/PhotonBomb_gps_ActiveOptical_seed1.nEXOevents.root ...
Done: 0:00:16.426142 elapsed.
Warning in <TClass::Init>: no dictionary for class nEXO::EvtNavigator is available
Warning in <TClass::Init>: no dictionary for class nEXO::SimEvent is available
Warning in <TClass::Init>: no dictionary for class nEXO::EventObject is available
Warning in <TClass::Init>: no dictionary for class nEXO::SimHeader is available
Warning in <TClass::Init>: no dictionary for class nEXO::HeaderObject is available
Warning in <TClass::Init>: no dictionary for class nEXO::SmartRef is available
Warning in <TClass::Init>: no dictionary for class JobInfo is available
Warning in <TClass::Init>: no dictionary for class nEXO::FileMetaData is available
Warning in <TClass::Init>: no dictionary for class nEXO::TreeMetaData is available
Warning in <TClass::Init>: no dictionary for class nEXO::UniqueIDTable is available
Warning in <TClass::Init>: no dictionary for class nEXO::TablePerTree is available

Over 15 seconds per file is not the best, so for this tutorial we'll save immediately after every fit:

In [9]:
LightMap.save_model(model_dir, 'HitPatternHist__0001_files', hp1)
-> /g/g20/richman3/lightmap/total/tutorial/HitPatternHist__0001_files/model.npy ...

We reload the hit pattern to make sure everything is working:

In [10]:
hp1a = LightMap.load_model(model_dir, 'HitPatternHist__0001_files')
<- /g/g20/richman3/lightmap/total/tutorial/HitPatternHist__0001_files/model.npy ...

A quick check that the underlying histograms match:

In [11]:
np.all(hp1.H.values == hp1a.H.values)
Out[11]:
True

HitPattern — decent statistics

We won't be able to do much with one file. How long are we willing to wait? Let's say 60 mins. How many files will that get us?

In [12]:
60 * 60 / 16. # mins * sec-per-min / sec-per-file
Out[12]:
225.0

Off we go, then, processing 200 files:

In [13]:
hp = LightMap.hitpattern.HitPatternHist(tpc)
In [14]:
hp.fit(filenames[:200])
Processing 200 files...
 <- /p/lustre2/nexouser/optical_sample/g4/PhotonBomb_gps_ActiveOptical_seed200.nEXOevents.root ...
Done: 1:01:37.934875 elapsed.

Again, we save right away:

In [15]:
LightMap.save_model(model_dir, 'HitPatternHist__0200_files', hp)
-> /g/g20/richman3/lightmap/total/tutorial/HitPatternHist__0200_files/model.npy ...
In [16]:
#hp = LightMap.load_model(model_dir, 'HitPatternHist__0200_files')

HitPattern Visualization

Before we try anything else, we should make sure the hitpattern makes sense visually.

We'll define some helper variables and a function:

In [17]:
figsize = (1.5*3.9, 1.5*1.24)
figsize6 = (2*figsize[0], 3 * figsize[1])
In [18]:
def hp_plot(ax, h):
    out = hl.plot2d(ax, h / h.values.max(), cmap='Greys_r', vmin=0)
    ax.set_xlabel(r'SiPM $\theta$ bin')
    ax.set_ylabel(r'SiPM $z$ bin')
    return out

And now, here is an example of continuous evaluation in $r_\text{gen}$:

In [19]:
q = hp.centers_theta[17]
z = hp.centers_z[4]
rs = np.linspace(hp.centers_r[6], hp.centers_r[7], 6)
fig, axs = plt.subplots(3, 2, figsize=figsize6)
axs = np.ravel(axs)
title_fmt = (r'$(r_\mathsf{{gen}},\theta_\mathsf{{gen}},z_\mathsf{{gen}})'
             r'\ =\ (\mathbf{{{:.0f}}},\ {:.0f}^\circ,\ {:.0f})$')
for (i,r) in enumerate(rs):
    ax = axs[i]
    h = hp(r, q, z, cyl=True)
    hp_plot(ax, h)
    ax.set_title(title_fmt.format(r, np.degrees(q), z))
plt.tight_layout()

and in $\theta_\text{gen}$:

In [20]:
r = hp.centers_r[6]
z = hp.centers_z[4]
qs = np.linspace(hp.centers_theta[16], hp.centers_theta[17], 6)
fig, axs = plt.subplots(3, 2, figsize=figsize6)
axs = np.ravel(axs)
title_fmt = (r'$(r_\mathsf{{gen}},\theta_\mathsf{{gen}},z_\mathsf{{gen}})'
             r'\ =\ ({:.0f},\ \mathbf{{{:.0f}^\circ}},\ {:.0f})$')
for (i,q) in enumerate(qs):
    ax = axs[i]
    h = hp(r, q, z, cyl=True)
    hp_plot(ax, h)
    ax.set_title(title_fmt.format(r, np.degrees(q), z))
plt.tight_layout()

and finally, in $z_\text{gen}$:

In [21]:
r = hp.centers_r[6]
q = hp.centers_theta[17]
zs = np.linspace(hp.centers_z[4], hp.centers_z[5], 6)
fig, axs = plt.subplots(3, 2, figsize=figsize6)
axs = np.ravel(axs)
title_fmt = (r'$(r_\mathsf{{gen}},\theta_\mathsf{{gen}},z_\mathsf{{gen}})'
             r'\ =\ ({:.0f},\ {:.0f}^\circ,\ \mathbf{{{:.0f}}})$')
for (i,z) in enumerate(zs):
    ax = axs[i]
    h = hp(r, q, z, cyl=True)
    hp_plot(ax, h)
    ax.set_title(title_fmt.format(r, np.degrees(q), z))
plt.tight_layout()

HitPattern value access

Ok, so what are these HitPatternHist objects? When called like a function, they behave similarly to LightMaps, except that instead of accepting arrays of generation positions and returning corresponding floats, they accept one generation position at a time and return the entire hl.Hist of detection probabilities.
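In spirit (this is a toy sketch, not the actual implementation), a histogram-backed hit pattern is a lookup table: collected photons are binned by generation position and by detecting channel, and evaluation normalizes a slice by the number of photons generated in that position bin. A 1D-position, few-channel version:

```python
import numpy as np

rng = np.random.RandomState(0)
n_pos_bins, n_channels = 4, 8

# counts[i, j]: photons generated in position bin i, detected on channel j
counts = rng.poisson(50.0, size=(n_pos_bins, n_channels)).astype(float)
# generated[i]: photons generated in bin i (detected or not); here we just
# assume a ~30% overall collection efficiency to back out the total
generated = counts.sum(axis=1) / 0.3

def hit_pattern(pos_bin):
    """Per-channel detection probability for one generation position bin."""
    return counts[pos_bin] / generated[pos_bin]

p = hit_pattern(2)
print(round(p.sum(), 6))  # 0.3 -- the total collection probability
```

Summing the returned slice over channels recovers the total collection probability, which is why the `total=` normalization hook seen below fits naturally into this picture.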

Note: the below plot revealed that there is either a new, or a pre-existing but previously undetected, bug in the SiPMID $\leftrightarrow$ (col,row) conversion; see the overly bright column on the right side. We're going to ignore it for now, but obviously this needs to be fixed — and in general, it would be good to get some clarity on how exactly LightMap.hitpattern.sipm_id_to_col_row() and its TODO sibling, LightMap.hitpattern.col_row_to_sipm_id() should be expressed.

In [22]:
fig, ax = plt.subplots(figsize=figsize)
hp_plot(ax, hp(142, -142, -400))
Out[22]:
<matplotlib.collections.QuadMesh at 0x2aab911cb940>

To be clear,

In [23]:
hp(142, -142, -400)
Out[23]:
Hist(383 bins in [0,383], 119 bins in [0,119], with sum 0.3241305488888902, 485 empty bins, and 0 non-finite values)

Either Cartesian (default) or cylindrical coordinates can be used:

In [24]:
hp(400, np.radians(45), -400)  # oops: Cartesian by default, so this angle is treated as y
Out[24]:
Hist(383 bins in [0,383], 119 bins in [0,119], with sum 0.32999776639086975, 707 empty bins, and 0 non-finite values)
In [25]:
hp(400, np.radians(45), -400, cyl=True)  # ok
Out[25]:
Hist(383 bins in [0,383], 119 bins in [0,119], with sum 0.33035493267465127, 715 empty bins, and 0 non-finite values)
In [26]:
hp(400/np.sqrt(2), 400/np.sqrt(2), -400)  # ok
Out[26]:
Hist(383 bins in [0,383], 119 bins in [0,119], with sum 0.3303549326746512, 715 empty bins, and 0 non-finite values)

We can capture the total collection probability (summing over SiPMs):

In [27]:
hp(400, np.radians(45), -400, cyl=True).values.sum()
Out[27]:
0.33035493267465127

We can also specify a total lightmap from which to obtain the overall normalization:

In [28]:
hp(400, np.radians(45), -400, cyl=True, total=lm).values.sum()
Out[28]:
0.34086768752329394

Of course, this will match the collection efficiency reported by lm:

In [29]:
lm(400, np.radians(45), -400, cyl=True)
Out[29]:
array([0.34086769])

Random sampling

We can sample from the hitpattern:

In [30]:
col, row = hp.sample(30, 400, np.radians(-45), -400, cyl=True, total=lm, seed=1)
col, row
Out[30]:
(array([2.24419195e+02, 2.70685220e+02, 2.04452250e-01, 1.92878117e+02,
        1.07027388e+02, 6.86704675e+01, 1.34417305e+02, 2.06558690e+02,
        2.19140387e+02, 2.43198101e+02]),
 array([ 54.80074457,  49.96826158,  18.31342418,  82.69232262,
         98.87638915,  14.89460666,  55.08504421, 116.03905478,
         73.16983042, 116.8781425 ]))

And we can convert the sampled values to a histogram:

In [31]:
hp.sample_to_image((col, row))
Out[31]:
Hist(383 bins in [0,383], 119 bins in [0,119], with sum 10.0, 45567 empty bins, and 0 non-finite values)

We can also obtain a randomly sampled histogram directly:

In [32]:
hp.sample_image(30, 400, np.radians(-45), -400, cyl=True, total=lm, seed=1)
Out[32]:
Hist(383 bins in [0,383], 119 bins in [0,119], with sum 10.0, 45567 empty bins, and 0 non-finite values)

The latter is useful if, for example, you want to go straight to plotting:

In [33]:
fig, ax = plt.subplots(figsize=figsize)
h = hp.sample_image(10000, 400, np.radians(-45), -400, cyl=True, total=lm, seed=2)
hp_plot(ax, h)
Out[33]:
<matplotlib.collections.QuadMesh at 0x2aab7bd1a5c0>

Finally, note that you can either specify a quantum efficiency, reducing the number of observed photons:

In [34]:
hp.sample(30, 400, np.radians(-45), -400, cyl=True, total=lm, seed=1, qe=.15)
Out[34]:
[array([224.72032449]), array([54.00011437])]

Or you can sample a specific number of observed rather than initial photons:

In [35]:
hp.sample(30, 400, np.radians(-45), -400, cyl=True, total=lm, seed=1, all=True)
Out[35]:
[array([224.09834683, 270.42110763,   0.95788953, 192.53316528,
        107.69187711,  68.31551563, 134.68650093, 206.83462567,
        219.01828828, 243.75014431, 224.98886109, 264.74816565,
        145.28044399, 315.78927933,  19.10322601, 262.44789353,
        224.9085955 , 246.29361415, 103.28777534, 141.13002857,
        288.01936696, 363.67883553, 196.21162812, 265.26554666,
        314.49157316, 322.05336255,  62.57411761,  27.14672857,
        123.58930554, 315.69975836]),
 array([ 54.10233443,  49.41405599,  18.69440016,  82.41417927,
         98.04995346,  14.53589641,  55.66379465, 116.51488911,
         73.94459476, 116.58655504,  80.90340192,  63.1374747 ,
         92.13927635,  70.80739129,  54.39767684,  46.1653542 ,
         57.92750858,  64.34776586,  50.7508121 , 105.72599799,
         54.88330609,   9.62367221,  66.75094243,  70.34889834,
        106.26992789,  76.89588622,  52.42809119,  90.96484005,
         69.6634415 ,  71.62169572])]
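The qe option presumably amounts to binomial thinning: each otherwise-detected photon independently survives with probability qe. A minimal numpy sketch of that idea (toy data, not the LightMap API):

```python
import numpy as np

rng = np.random.RandomState(1)

# Stand-in for the column positions of photons detected at qe = 1
detected_cols = rng.uniform(0, 384, size=100)

# Each photon independently survives with probability qe
qe = 0.15
keep = rng.uniform(size=detected_cols.size) < qe
thinned = detected_cols[keep]

print(len(thinned) <= len(detected_cols))  # True
```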

Towards maximum likelihood

This material probably deserves to be formalized as its own tool, as part of LightMap or elsewhere.

Let's arbitrarily suppose that we have an event that initially generates 60k photons, fairly close to the skin:

In [36]:
x, y, z = 500, 100, -400
im = hp.sample_image(60000, x, y, z, total=lm, qe=.15, seed=1)
im.values.sum()
Out[36]:
2929.0

The event data look like so:

In [37]:
fig, ax = plt.subplots(figsize=figsize)
hp_plot(ax, im)
Out[37]:
<matplotlib.collections.QuadMesh at 0x2aab99a44588>

Compare with the expectation for an event at this position:

In [38]:
fig, ax = plt.subplots(figsize=figsize)
hp_plot(ax, hp(x, y, z))
Out[38]:
<matplotlib.collections.QuadMesh at 0x2aab88a07dd8>

Let's write down a simple binned-Poissonian log likelihood for the position:

In [48]:
def minus_llh(im, x, y, z):
    # positions outside the TPC get a large fixed penalty
    if not hp.tpc.mask(x, y, z):
        return 1e5
    # still low stats: could smooth the expectation, e.g. with .gaussian_filter(2)
    expected = hp(x, y, z)
    obs = im.values.ravel()
    pred = im.values.sum() * expected.values.ravel()
    # Poisson logL per bin: -pred + obs*log(pred), defined only where pred > 0
    mask = (pred > 0)
    return -np.sum(-pred[mask] + obs[mask] * np.log(pred[mask]))

vminus_llh = np.vectorize(minus_llh)

This looks like it should be "enough information" to enable a position reconstruction; let's try. First, we evaluate the test statistic (TS) at the true generated position, and compare with spots in the middle and on the other side of the TPC:

In [49]:
minus_llh(im, x, y, z), minus_llh(im, 0, 0, z), minus_llh(im, -x, -y, z)
Out[49]:
(10180.486235213546, 11878.096129304298, 13841.455244019573)

Next, let's scan the TS on a grid:

In [50]:
X, Y, Z, V = tpc.get_grid_cartesian()
In [51]:
X.size
Out[51]:
759
In [52]:
%time grid_ts = vminus_llh(im, X, Y, Z)
CPU times: user 4.85 s, sys: 17 ms, total: 4.86 s
Wall time: 4.85 s

The best fit position so far is:

In [53]:
ibest = np.argmin(grid_ts)
ibest
Out[53]:
742
In [54]:
X[ibest], Y[ibest], Z[ibest]
Out[54]:
(454.1199746817857, 113.52999367044652, -399.72301792387816)

We take this as a seed for a likelihood fit:

In [55]:
from scipy import optimize
In [56]:
%time optimize.fmin(lambda xyz: minus_llh(im, *xyz), x0=(X[ibest], Y[ibest], Z[ibest]))
Optimization terminated successfully.
         Current function value: 10268.911665
         Iterations: 64
         Function evaluations: 125
CPU times: user 791 ms, sys: 4.82 ms, total: 796 ms
Wall time: 789 ms
Out[56]:
array([ 465.28386084,  114.95225852, -404.49391104])

Sooo... this basically works, but it's pretty slow.
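One likely speedup (a sketch with toy arrays, not the LightMap API) is to precompute the expected pattern at every grid point once, then evaluate all Poisson log-likelihoods in a single vectorized pass instead of one `np.vectorize` call per point:

```python
import numpy as np

rng = np.random.RandomState(0)
n_grid, n_bins = 759, 500

# Hypothetical stack of expected patterns, one per grid point, each
# normalized to unit total (the real ones would come from hp at grid positions)
templates = rng.uniform(0.1, 1.0, size=(n_grid, n_bins))
templates /= templates.sum(axis=1, keepdims=True)

# Toy observed image drawn from one known grid point
true_idx = 123
obs = rng.poisson(3000 * templates[true_idx]).astype(float)

# All grid points at once: pred[g, b] = N_obs * templates[g, b]
pred = obs.sum() * templates
ts_grid = np.sum(pred - obs * np.log(pred), axis=1)

print(np.argmin(ts_grid))
```

With all templates in one array, the grid scan is a couple of numpy reductions rather than hundreds of Python-level calls.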

What's next?

Resolving the hit pattern as a function of event generation position is a relatively high-dimensional (5D) problem. Here we have only a preliminary solution. Things that can be done with this code:

  • Experiment with light-only reconstructions
  • Experiment with light-charge matching
  • If we can sort out the sipm_id_to_col_row() and col_row_to_sipm_id() conversions, then the above .sample can be used to do per-SiPM "fastlight" MC.

Here are some open issues:

  • Histlite was written more for convenience than for speed; many of its sanity checks are unnecessary overhead in performance-sensitive code. Once the parameterization is built, evaluating it at given generation positions would be faster with dedicated code (unlike for total light, I think C++ might help a lot here).
  • This feels like an excellent use case for deep learning methods. In IceCube we have a "cascade generator" that can substitute for light propagation spline tables by effectively simulating (coarsely) each event hypothesis, rather than looking up light expectations. Maybe we can do the same for nEXO.
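On the dedicated-evaluation point above: once the parameterization exists, evaluation could be as simple as a raw-array nearest-bin lookup with no per-call validation. A sketch with hypothetical shapes (the real object holds a 5D histogram; geometry numbers taken from the TPC above):

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical precomputed table: (r, theta, z) generation bins x (col, row)
table = rng.uniform(size=(15, 30, 15, 20, 10))
r_edges = np.linspace(0, 567.65, 16)
t_edges = np.linspace(-np.pi, np.pi, 31)
z_edges = np.linspace(-967.37, 217.63, 16)

def fast_pattern(r, theta, z):
    """Nearest-bin lookup: three searchsorted calls and one array index."""
    i = np.searchsorted(r_edges, r) - 1
    j = np.searchsorted(t_edges, theta) - 1
    k = np.searchsorted(z_edges, z) - 1
    return table[i, j, k]

pat = fast_pattern(400.0, np.radians(45), -400.0)
print(pat.shape)  # (20, 10)
```

This skips interpolation entirely; whether that loss of smoothness is acceptable depends on the binning, which is exactly the trade-off dedicated code would let us tune.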