In this notebook, we go over some ways of modeling a lightmap — the photon collection efficiency as a function of position inside a detector (in this case, a TPC).
While example scripts may be added, this is arguably the clearest way of documenting both the how and the why.
In "the future", we foresee an incremental port of this machinery that allows parts, and eventually the whole, to be run within nEXO-offline using the SNiPER framework.
import numpy as np
import matplotlib.pyplot as plt
import histlite as hl
import LightMap
%matplotlib inline
# plotting aesthetics
plt.rc('figure', dpi=150, figsize=(4,3), facecolor='w')
plt.rc('savefig', dpi=150, facecolor='w')
plt.rc('lines', linewidth=1.5)
We need some data to get started. For generating actual lightmap models, it's unclear whether standalone scripts or SNiPER modules will ultimately be preferred. In any case, for now we use a dedicated utility function to get a Pandas DataFrame.
!pip install --user pandas root_numpy
mc_dir = '/p/lustre2/nexouser/optical_sample/g4'
filenames = [
    '{}/PhotonBomb_gps_ActiveOptical_seed{}.nEXOevents.root'.format(mc_dir, i)
    for i in range(1, 1001)]
# first 25 files
data = LightMap.read_files(filenames[:25])
LightMap.read_files() extracts fGenX, fGenY, fGenZ as x, y, z; from these, it computes r and theta. It also loads fNOP.
data.head()
Data is loaded from the Event/Sim/SimEvent tree by default, though a different tree can be specified with the tree= argument. Extra branches can be added with the branches= argument.
LightMap.read_files(filenames[0], tree='Event/Sim/SimEvent', branches=['fSiPMID']).head()
A raw numpy array can also be obtained, rather than a DataFrame:
LightMap.read_files(filenames[0], df=False)[:5]
Now, before we press on, note that for this dataset we know a priori that the true number of generated photons per event was 100. For convenience, we record the per-event collection efficiency in the DataFrame:
data['eff'] = data['fNOP'] / 100.0
The actual geometry of the TPC is complicated; here, we try not to stress too much about it. For our purposes, the "TPC" is just a cylindrical volume in which the lightmap is defined. This is modeled with the TPC class. Since we've already committed to an example dataset, we'll simply infer the geometry from it.
# small buffer in each coordinate
tpc = LightMap.TPC(data.r.max() + 1, data.z.min() - 1, data.z.max() + 1)
tpc
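The resulting object exposes the geometry as attributes, which later cells rely on:

# geometry attributes referenced throughout the rest of the notebook
print(tpc.r, tpc.zmin, tpc.zmax, tpc.length)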
Several possible models are available, all of which are derived from LightMap.LightMap (technically, LightMap.total.LightMap). The simplest of these is "Hist XYZ", so we start there.
train = data.x.values, data.y.values, data.z.values, data.eff.values
lm_hist = LightMap.LightMapHistXYZ(tpc)
lm_hist
%time lm_hist.fit(*train)
lm_hist
Histlite provides some slicing and plotting routines, so we can visualize this model easily enough by inspecting the h_mean property:
# plotting style
pkw = dict(vmin=0.2, vmax=.34, cmap='Greys_r')
# x=0 slice
fig, axs = plt.subplots(1, 3, figsize=(9, 3))
ax = axs[0]
hl.plot2d(ax, lm_hist.h_mean[0], **pkw)
ax.set_xlabel(r'$y$')
ax.set_ylabel(r'$z$')
ax.set_title(r'$x=0$')
# y=0 slice
ax = axs[1]
hl.plot2d(ax, lm_hist.h_mean[:,0], **pkw)
ax.set_xlabel(r'$x$')
ax.set_ylabel(r'$z$')
ax.set_title(r'$y=0$')
# z=-200 slice
ax = axs[2]
hl.plot2d(ax, lm_hist.h_mean[:,:,-200], **pkw)
ax.set_xlabel(r'$x$')
ax.set_ylabel(r'$y$')
ax.set_title(r'$z=-200$')
plt.tight_layout()
To get the efficiency value at a particular position, we can just "call" the lightmap:
lm_hist(0, 0, -200)
We can also give sequences of per-event coordinates:
lm_hist([-100, 100], [-100, 100], [-300, -100])
Or a single mapping argument with (at least) x, y, z keys:
lm_hist(data.head())
In combination with some histlite tricks, this gives a more general way of making plots, both in terms of supporting arbitrary lightmap models and in terms of the choice of slices to visualize. Here's an example of a slice at a given $\theta$:
fig, ax = plt.subplots()
# get a function of (r,z)
f = lambda r, z: lm_hist(r, np.repeat(0, r.size), z, cyl=True)
# set the evaluation and plotting range
rang = (0, tpc.r), (tpc.zmin, tpc.zmax)
# evaluate on a grid
# vectorize=False: f already accepts arrays, so hist_from_eval need not wrap it with np.vectorize
h = hl.hist_from_eval(f, vectorize=False, bins=200, range=rang)
# plot with colorbar
d = hl.plot2d(ax, h, cbar=True, **pkw)
# labels etc
d['colorbar'].set_label(r'collection efficiency')
ax.set_xlabel(r'$r$')
ax.set_ylabel(r'$z$')
ax.set_aspect('equal')
plt.tight_layout()
In the limit of infinite statistics, we can choose an appropriately fine binning (see the LightMap.total.LightMapHistXYZ docstring) and be done.
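For instance, a finer binning could be requested at construction time. This is only a sketch: the bins argument here is an assumption, made by analogy with LightMapSplineXYZ below; check the LightMapHistXYZ docstring for the actual signature.

# assumption: the second argument is bins per axis, as for LightMapSplineXYZ
lm_hist_fine = LightMap.LightMapHistXYZ(tpc, 100)
lm_hist_fine.fit(*train)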
Unfortunately, that's hardly the situation even with MC, let alone with plausible future calibration data, so let's look at some other models.
I don't actually remember coding a working spline model, but apparently there is one of some sort.
Beware: the RAM requirements of this thing grow out of control very quickly for a large number of bins; even the default 50x50x50 binning in (x, y, z) qualifies.
lm_spline = LightMap.LightMapSplineXYZ(tpc, 15) # 15x15x15 bins
lm_spline
%time lm_spline.fit(*train)
lm_spline
To reduce copypasta, here's a helper function for plotting:
def plot_lm_rz(ax, lm, theta=0, vectorize=False, cbar=True):
    f = lambda r, z: lm(r, (theta if vectorize else np.repeat(theta, z.size)), z, cyl=True)
    rang = (0, tpc.r), (tpc.zmin, tpc.zmax)
    h = hl.hist_from_eval(f, vectorize=vectorize, bins=200, range=rang)
    d = hl.plot2d(ax, h, cbar=cbar, **pkw)
    if cbar:
        d['colorbar'].set_label(r'collection efficiency')
    ax.set_xlabel(r'$r$')
    ax.set_ylabel(r'$z$')
    ax.set_aspect('equal')
    return d
fig, ax = plt.subplots()
plot_lm_rz(ax, lm_spline, vectorize=True)
plt.tight_layout()
We can use a scikit-learn random forest to model the lightmap. The pros and cons are more obvious given a plot, so let's get one.
!pip install --user scikit-learn
sklearn_kw = dict(max_depth=8, n_estimators=4, n_jobs=10, random_state=1)
lm_rf = LightMap.LightMapRF(tpc, **sklearn_kw)
lm_rf
%time lm_rf.fit(*train)
lm_rf
fig, ax = plt.subplots()
plot_lm_rz(ax, lm_rf)
plt.tight_layout()
If we take a different slice, things might look a little better:
fig, ax = plt.subplots()
# get a function of (y,z)
f = lambda y, z: lm_rf(np.repeat(0, z.size), y, z)
# set the evaluation and plotting range
rang = (-tpc.r, tpc.r), (tpc.zmin, tpc.zmax)
# evaluate on a grid
# vectorize=False: f already accepts arrays, so hist_from_eval need not wrap it with np.vectorize
h = hl.hist_from_eval(f, vectorize=False, bins=200, range=rang)
# plot with colorbar
d = hl.plot2d(ax, h, cbar=True, **pkw)
# labels etc
d['colorbar'].set_label(r'collection efficiency')
ax.set_xlabel(r'$y$')
ax.set_ylabel(r'$z$')
ax.set_aspect('equal')
plt.tight_layout()
But in any case, the pros and cons are now on display.
Finally, we'll model the lightmap using a neural network, implemented internally with TensorFlow and Keras.
!pip install --user tensorflow==2.0.0-rc0 h5py
layers = [512, 256, 128, 64, 32]
lm_nn = LightMap.total.LightMapNN(tpc, epochs=1, batch_size=1024, hidden_layers=layers)
lm_nn
# ensemble method seems to work best here -> take mean of 3 NN's
for i in range(3):
    %time lm_nn.fit(*train)
lm_nn
fig, ax = plt.subplots()
plot_lm_rz(ax, lm_nn)
plt.tight_layout()
Now let's compare all four models side by side. First off, the plot:
lms = lm_hist, lm_spline, lm_rf, lm_nn
fig, axs = plt.subplots(1, 5, figsize=(9,3), dpi=400, gridspec_kw=dict(width_ratios=[1,1,1,1,.05]))
for (ax, lm) in zip(axs, lms):
    vectorize = ('Spline' in lm.kind)
    d = plot_lm_rz(ax, lm, vectorize=vectorize, cbar=False)
    ax.set_title(lm.kind)
# use the mappable returned by hl.plot2d for the shared colorbar
cb = fig.colorbar(d['colormesh'], cax=axs[-1])
cb.set_label(r'collection efficiency')
for ax in axs[1:-1]:
    ax.set_yticks([])
    ax.set_ylabel('')
What can we say about these?
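One small quantitative comparison (purely illustrative) is to evaluate each model at a common point:

# rough numeric comparison of the four models at one position
for lm in lms:
    print(lm.kind, lm(0, 0, -200))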
For most applications, we'll want to build a lightmap once and use it lots of times. Let's save the above four models:
# decide on a directory (will be created on the fly if necessary)
import os
home_dir = os.getenv('HOME')
model_dir = '{}/lightmap/total/tutorial'.format(home_dir)
for lm in lms:
    LightMap.save_model(model_dir, lm.kind, lm)
!find {model_dir} | sort
Saving our models wouldn't be very useful if we couldn't load them back up. Here's how this is done:
kinds = 'LightMapHistXYZ LightMapSplineXYZ LightMapRF LightMapNN'.split()
restored_lms = []
for kind in kinds:
    lm = LightMap.load_model(model_dir, kind)
    restored_lms.append(lm)
    print('\n', lm, '\n')
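As a quick consistency check (illustrative), each restored model should agree with its in-memory original:

# original vs. restored models, evaluated at one point
for lm_orig, lm_restored in zip(lms, restored_lms):
    print(lm_orig.kind, lm_orig(0, 0, -200), lm_restored(0, 0, -200))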
LightMap.LightMap provides a method for randomly sampling the number of detected photons, given a total number of generated photons and some quantum efficiency for detection. Let's suppose we expect 90e3/6 generated photons (this number came out of discussions I only partially remember from the Summer 2019 meeting, but it had something to do with α from Rn vs 0νββ and γ near the Q-value). Then for an event at (x,y,z)=(0,0,0) and 10% QE, we may get, for example, this many detected photons:
# number generated
n_gen_expected = 90e3 / 6
# optional seed for reproducibility
lm_nn.sample_n_collected(0, 0, 0, n_gen_expected, qe=0.10, seed=1)
We can apply this for a bunch of events at once. Let's make some up:
N = 30000
# stay away from the edges a bit
r2 = np.random.uniform(0, (.9*tpc.r)**2, N)
r = np.sqrt(r2)
theta = np.random.uniform(0, 2*np.pi, N)
z = np.random.uniform(tpc.zmin + .05*tpc.length, tpc.zmax - .05*tpc.length, N)
x, y = r*np.cos(theta), r*np.sin(theta)
ngen = np.random.poisson(int(n_gen_expected), N)
Then the per-event number of detected photons is, e.g.,
ndet = lm_nn.sample_n_collected(x, y, z, ngen, qe=0.10, seed=2)
eff = ndet / n_gen_expected
A quick sanity check:
fig, ax = plt.subplots()
hl.plot1d(ax, hl.hist(eff, bins=50).normalize(), crosses=True)
hl.plot1d(ax, hl.kde(eff), color='C0', lw=1, alpha=.33)
ax.set_xlabel(r'collection efficiency')
ax.set_ylabel(r'PDF');
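A cheaper numeric check (also illustrative): the mean sampled efficiency should be close to the QE times the mean lightmap value at the sampled positions:

# mean sampled efficiency vs. QE times the mean model prediction
print(eff.mean(), 0.10 * lm_nn(x, y, z).mean())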
A further sanity check would be to see if we can re-calibrate based on this fake data and thus reproduce the original lightmap; this is a full round-trip test.
train_again = x, y, z, 10*eff # undo QE effect
layers = [512, 256, 128, 64, 32]
lm_nn_again = LightMap.total.LightMapNN(tpc, epochs=10, batch_size=64, hidden_layers=layers)
for i in range(3):
    %time lm_nn_again.fit(*train_again)
fig, axs = plt.subplots(1, 3, figsize=(4.5,3), gridspec_kw=dict(width_ratios=[1,1,.05]))
these_lms = lm_nn, lm_nn_again
titles = 'Original NN', 'Round-trip NN'
for (ax, lm, title) in zip(axs, these_lms, titles):
    d = plot_lm_rz(ax, lm, cbar=False)
    ax.set_title(title)
# use the mappable returned by hl.plot2d for the shared colorbar
cb = fig.colorbar(d['colormesh'], cax=axs[-1])
cb.set_label(r'collection efficiency')
for ax in axs[1:-1]:
    ax.set_yticks([])
    ax.set_ylabel('')
A percent difference plot might be helpful here:
fig, ax = plt.subplots()
rang = (0, tpc.r), (tpc.zmin, tpc.zmax)
f = lambda r, z: lm_nn(r, np.repeat(0, r.size), z, cyl=True)
h_nn = hl.hist_from_eval(f, vectorize=False, bins=200, range=rang)
f = lambda r, z: lm_nn_again(r, np.repeat(0, r.size), z, cyl=True)
h_nn_again = hl.hist_from_eval(f, vectorize=False, bins=200, range=rang)
d = hl.plot2d(ax, 100*(h_nn_again - h_nn) / h_nn, cbar=True, vmin=-5, vmax=5, cmap='RdBu_r')
d['colorbar'].set_label(r'percent difference (new vs orig NN)')
ax.set_xlabel(r'$r$')
ax.set_ylabel(r'$z$')
ax.set_aspect('equal')
For comparison (between comparisons...), consider the percent difference between the original spline and the original NN:
fig, ax = plt.subplots()
rang = (0, tpc.r), (tpc.zmin, tpc.zmax)
f = lambda r, z: lm_spline(r, np.repeat(0, r.size), z, cyl=True)
h_spline = hl.hist_from_eval(f, vectorize=False, bins=200, range=rang)
d = hl.plot2d(ax, 100*(h_spline - h_nn) / h_nn, cbar=True, vmin=-5, vmax=5, cmap='RdBu_r')
d['colorbar'].set_label(r'percent difference (Spline vs NN)')
ax.set_xlabel(r'$r$')
ax.set_ylabel(r'$z$')
ax.set_aspect('equal')
There is plenty more that can be done already with this code, for example with these model classes (a sketch of one follows the list):
- LightMapHistXYZ or LightMapSplineXYZ
- LightMapAnalytic
- LightMapHistCyl
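For instance, trying the cylindrical histogram might look like the following. This is only a sketch: the LightMapHistCyl constructor arguments are an assumption here, mirroring LightMapHistXYZ; check its docstring for the actual signature.

# hypothetical: assumes LightMapHistCyl follows the same construct/fit/call
# pattern as the models above; its actual signature may differ
lm_cyl = LightMap.LightMapHistCyl(tpc)
lm_cyl.fit(*train)
lm_cyl(0, 0, -200)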
Here we demonstrated how these tools can be used. Lots of useful work can be done in an interactive notebook. For serious processing, the first step would be to solidify the exact desired steps in scripts, which could be run sequentially after the main processing and which could use ROOT, root_numpy, or whatever else to save the results. However, as soon as possible, we want to integrate at least the use of, if not the fitting of, lightmap models into SNiPER: first as Python modules, and later likely as a C++ service plus some C++ modules.
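As a sketch of the script route (the file names and the np.load/np.save I/O here are hypothetical; only LightMap.load_model and the lightmap call syntax are as demonstrated above):

# standalone sketch: load a saved lightmap and evaluate it at given positions
import numpy as np
import LightMap

lm = LightMap.load_model('lightmap/total/tutorial', 'LightMapNN')
x, y, z = np.load('positions.npy')  # hypothetical: reconstructed positions, shape (3, N)
np.save('efficiencies.npy', lm(x, y, z))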
Going forward, a (relatively) simple per-SiPM hit-pattern treatment will be incorporated in similar fashion to what's shown here.