LightMap Total Light Examples

In this notebook, we go over some ways of modeling a lightmap — the photon collection efficiency as a function of position inside a detector (in this case, a TPC).

While example scripts may be added later, this notebook is arguably the clearest way to document both the how and the why.

In "the future", we foresee an incremental port of this machinery that allows parts, and eventually the whole, to be run within nEXO-offline using the SNiPER framework.

Imports

We want numpy+matplotlib for the basics; histlite for histograms; and the new nEXO-offline package LightMap.

  • Histlite: pip install --user histlite
  • LightMap: in your nEXO-offline sources, git checkout u/mrichman/issue_0050 (see issue 50)
In [1]:
import numpy as np
import matplotlib.pyplot as plt

import histlite as hl
import LightMap

%matplotlib inline
In [2]:
# plotting aesthetics
plt.rc('figure', dpi=150, figsize=(4,3), facecolor='w')
plt.rc('savefig', dpi=150, facecolor='w')
plt.rc('lines', linewidth=1.5)

Loading some data

We need some data to get started. For generating actual lightmap models, it's unclear whether standalone scripts or SNiPER modules will ultimately be preferred. In any case, for now we use a dedicated utility function to get a Pandas DataFrame.

  • Dependencies: pip install --user pandas root_numpy
In [3]:
mc_dir = '/p/lustre2/nexouser/optical_sample/g4'
filenames = [
    '{}/PhotonBomb_gps_ActiveOptical_seed{}.nEXOevents.root'.format(mc_dir, i)
    for i in range(1, 1001)]
In [4]:
# first 25 files
data = LightMap.read_files(filenames[:25])
Welcome to JupyROOT 6.16/00
Reading 25 files...
 <- /p/lustre2/nexouser/optical_sample/g4/PhotonBomb_gps_ActiveOptical_seed25.nEXOevents.root ...
Done: 0:00:07.920123 elapsed.

LightMap.read_files() extracts fGenX, fGenY, fGenZ as x, y, z; computes r and theta from them; and also loads fNOP.
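For reference, the derived cylindrical coordinates appear to follow the convention below. This is just a sketch for orientation (the exact computation lives in LightMap.read_files), checked against the first row of the table that follows:

    import numpy as np

    # first row of the DataFrame below, as a worked example
    x, y = np.array([-194.365575]), np.array([251.323492])
    r = np.hypot(x, y)                      # -> 317.712880
    theta = np.arctan2(y, x) % (2 * np.pi)  # -> 2.229086, i.e. theta in [0, 2*pi)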

In [5]:
data.head()
Out[5]:
   fNOP           x           y           z           r     theta
0    30 -194.365575  251.323492 -517.713527  317.712880  2.229086
1    34  -22.179348 -285.365986 -313.788512  286.226605  4.634822
2    36  358.826140 -127.085345 -345.179910  380.666368  5.942801
3    35  245.737944  434.649320   36.410796  499.306687  1.056229
4    33  189.608664 -184.041113  -92.808621  264.239620  5.512686

Data is loaded from the Event/Sim/SimEvent tree by default, though the tree can also be named explicitly with the tree= argument. Extra branches can be added with the branches= argument.

In [6]:
LightMap.read_files(filenames[0], tree='Event/Sim/SimEvent', branches=['fSiPMID']).head()
Reading 1 files...
 <- /p/lustre2/nexouser/optical_sample/g4/PhotonBomb_gps_ActiveOptical_seed1.nEXOevents.root ...
Done: 0:00:04.465417 elapsed.
Out[6]:
   fNOP                                             fSiPMID           x           y           z           r     theta
0    30  [52342, 454, 22025, 10750, 40914, 92349, 2716,...  -194.365575  251.323492 -517.713527  317.712880  2.229086
1    34  [161759, 102437, 170312, 152738, 112229, 17060...  -22.179348  -285.365986 -313.788512  286.226605  4.634822
2    36  [140762, 131302, 161822, 161939, 62728, 110340...  358.826140  -127.085345 -345.179910  380.666368  5.942801
3    35  [212936, 140843, 221251, 71319, 212707, 132618...  245.737944  434.649320    36.410796  499.306687  1.056229
4    33  [20812, 50859, 161617, 90707, 120600, 112820, ...  189.608664  -184.041113  -92.808621  264.239620  5.512686

A raw numpy array can also be obtained, rather than a DataFrame:

In [7]:
LightMap.read_files(filenames[0], df=False)[:5]
Reading 1 files...
 <- /p/lustre2/nexouser/optical_sample/g4/PhotonBomb_gps_ActiveOptical_seed1.nEXOevents.root ...
Done: 0:00:00.204536 elapsed.
Out[7]:
array([(-194.36557513,  251.32349175, -517.71352743, 30),
       ( -22.17934751, -285.36598641, -313.78851224, 34),
       ( 358.82614042, -127.0853445 , -345.17990957, 36),
       ( 245.73794367,  434.64931974,   36.41079584, 35),
       ( 189.60866434, -184.04111347,  -92.80862108, 33)],
      dtype=[('fGenX', '<f8'), ('fGenY', '<f8'), ('fGenZ', '<f8'), ('fNOP', '<i4')])

Before we press on, note that we know a priori that for this dataset the true number of generated photons per event was 100. For convenience, we record the per-event collection efficiency in the DataFrame:

In [8]:
data['eff'] = data['fNOP'] / 100.0

TPC Geometry

The actual geometry of the TPC is complicated; here, we basically try not to stress too much about it. For our purposes, the "TPC" is just a cylindrical volume in which the lightmap is defined. This is modeled with the TPC class. Since we've already committed to an example dataset, we'll simply infer the geometry from it.

In [9]:
# small buffer in each coordinate
tpc = LightMap.TPC(data.r.max() + 1, data.z.min() - 1, data.z.max() + 1)
In [10]:
tpc
Out[10]:
TPC(r=567.65, zmin=-967.37, zmax=217.63)
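One more attribute used in later cells is tpc.length; presumably this is just the cylinder height, zmax - zmin (an assumption, but easy to verify):

    # if tpc.length is zmax - zmin, both of these should print ~1185.0
    print(tpc.length, tpc.zmax - tpc.zmin)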

LightMap fitting

Several possible models are available, all of which are derived from LightMap.LightMap (technically, LightMap.total.LightMap). The simplest of these is "Hist XYZ", so we start there.

In [11]:
train = data.x.values, data.y.values, data.z.values, data.eff.values
In [12]:
lm_hist = LightMap.LightMapHistXYZ(tpc)
lm_hist
Out[12]:
LightMapHistXYZ(
  tpc=TPC(r=567.65, zmin=-967.37, zmax=217.63),
  h_count=None
)
In [13]:
%time lm_hist.fit(*train)
lm_hist
CPU times: user 7 s, sys: 2.06 s, total: 9.07 s
Wall time: 9.04 s
Out[13]:
LightMapHistXYZ(
  tpc=TPC(r=567.65, zmin=-967.37, zmax=217.63),
  h_count=Hist(50 bins in [-566.6147511325235,566.6487692671878], 50 bins in [-566.6205786069158,566.6376119572055], 50 bins in [-966.3729862761102,216.62996080010043]) with sum 5000000.0, 23126 empty bins, and 0 non-finite values
)

Histlite provides some slicing and plotting routines, so we can visualize this model easily enough by inspecting the h_mean property:

In [14]:
# plotting style
pkw = dict(vmin=0.2, vmax=.34, cmap='Greys_r')
In [15]:
# x=0 slice
fig, axs = plt.subplots(1, 3, figsize=(9, 3))
ax = axs[0]
hl.plot2d(ax, lm_hist.h_mean[0], **pkw)
ax.set_xlabel(r'$y$')
ax.set_ylabel(r'$z$')
ax.set_title(r'$x=0$')

# y=0 slice
ax = axs[1]
hl.plot2d(ax, lm_hist.h_mean[:,0], **pkw)
ax.set_xlabel(r'$x$')
ax.set_ylabel(r'$z$')
ax.set_title(r'$y=0$')

# z=-200 slice
ax = axs[2]
hl.plot2d(ax, lm_hist.h_mean[:,:,-200], **pkw)
ax.set_xlabel(r'$x$')
ax.set_ylabel(r'$y$')
ax.set_title(r'$z=-200$')

plt.tight_layout()

To get the efficiency value at a particular position, we can just "call" the lightmap:

In [16]:
lm_hist(0, 0, -200)
Out[16]:
array([0.32017241])

We can also give sequences of per-event coordinates:

In [17]:
lm_hist([-100, 100], [-100, 100], [-300, -100])
Out[17]:
array([0.32431818, 0.29730769])

Or a single mapping argument with (at least) x,y,z keys:

In [18]:
lm_hist(data.head())
Out[18]:
array([0.33261538, 0.31860465, 0.34016129, 0.27808511, 0.29895833])

In combination with some histlite tricks, this gives a more general way of making plots, both in terms of supporting arbitrary lightmap models and in terms of the choice of slices to visualize. Here's an example of a slice at a given $\theta$:

In [19]:
fig, ax = plt.subplots()
# get a function of (r,z)
f = lambda r, z: lm_hist(r, np.repeat(0, r.size), z, cyl=True)
# set the evaluation and plotting range
rang = (0, tpc.r), (tpc.zmin, tpc.zmax)
# evaluate on a grid
# vectorize=False means f already handles arrays, so no np.vectorize wrapping is needed
h = hl.hist_from_eval(f, vectorize=False, bins=200, range=rang)
# plot with colorbar
d = hl.plot2d(ax, h, cbar=True, **pkw)
# labels etc
d['colorbar'].set_label(r'collection efficiency')
ax.set_xlabel(r'$r$')
ax.set_ylabel(r'$z$')
ax.set_aspect('equal')
plt.tight_layout()

In the limit of infinite statistics, we could choose an appropriately fine binning (see the LightMap.total.LightMapHistXYZ docstring) and be done. Unfortunately, we're hardly in that situation even with MC, let alone with plausible future calibration data, so let's look at some other models.
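To make "infinite statistics" concrete, here's a rough estimate of the per-bin statistics for the fit above (5M events, the default 50x50x50 binning, and only the bins inside the cylinder populated):

    import numpy as np

    n_events = 5_000_000
    n_bins = 50**3                    # default (x, y, z) binning
    frac_inside = np.pi / 4           # cylinder inscribed in the square (x, y) footprint
    n_filled = frac_inside * n_bins   # ~98e3, roughly consistent with the ~23e3 empty bins above
    print(n_events / n_filled)        # ~51 events per bin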

Spline

I don't actually remember coding a working spline model, but apparently there is one of some sort.

Beware: the RAM requirements on this thing grow out of control very quickly as the number of bins increases, including at the default 50x50x50 binning in (x,y,z).

In [20]:
lm_spline = LightMap.LightMapSplineXYZ(tpc, 15) # 15x15x15 bins
lm_spline
Out[20]:
LightMapSplineXYZ(
  tpc=TPC(r=567.65, zmin=-967.37, zmax=217.63),
  h_count=None
  spline=None
)
In [21]:
%time lm_spline.fit(*train)
lm_spline
CPU times: user 20.4 s, sys: 41 s, total: 1min 1s
Wall time: 10.5 s
Out[21]:
LightMapSplineXYZ(
  tpc=TPC(r=567.65, zmin=-967.37, zmax=217.63),
  h_count=Hist(15 bins in [-605.493299575714,605.493299575714], 15 bins in [-605.493299575714,605.493299575714], 15 bins in [-1006.8730845119838,257.1300590359741]) with sum 5000000.0, 600 empty bins, and 0 non-finite values
  spline=<histlite.SplineFit object at 0x2aab802e1e10>
)

To reduce copypasta, here's a helper function for plotting:

In [22]:
def plot_lm_rz(ax, lm, theta=0, vectorize=False, cbar=True):
    f = lambda r, z: lm(r, (theta if vectorize else np.repeat(theta, z.size)), z, cyl=True)
    rang = (0, tpc.r), (tpc.zmin, tpc.zmax)
    h = hl.hist_from_eval(f, vectorize=vectorize, bins=200, range=rang)
    d = hl.plot2d(ax, h, cbar=cbar, **pkw)
    if cbar:
        d['colorbar'].set_label(r'collection efficiency')
    ax.set_xlabel(r'$r$')
    ax.set_ylabel(r'$z$')
    ax.set_aspect('equal')
    return d
In [23]:
fig, ax = plt.subplots()
plot_lm_rz(ax, lm_spline, vectorize=True)
plt.tight_layout()

Random Forest

We can use a scikit-learn random forest to model the lightmap. The pros and cons are more obvious given a plot, so let's get one.

  • Dependencies: pip install --user scikit-learn
In [24]:
sklearn_kw = dict(max_depth=8, n_estimators=4, n_jobs=10, random_state=1)
lm_rf = LightMap.LightMapRF(tpc, **sklearn_kw)
lm_rf
Out[24]:
LightMapRF(
  tpc=TPC(r=567.65, zmin=-967.37, zmax=217.63),
  downsample=1, n_iter=1,
  rf=RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=8,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=4, n_jobs=10,
           oob_score=False, random_state=1, verbose=0, warm_start=False)
)
In [25]:
%time lm_rf.fit(*train)
lm_rf
CPU times: user 2min 29s, sys: 2.06 s, total: 2min 31s
Wall time: 40.1 s
Out[25]:
LightMapRF(
  tpc=TPC(r=567.65, zmin=-967.37, zmax=217.63),
  downsample=1, n_iter=1,
  rf=RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=8,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=4, n_jobs=10,
           oob_score=False, random_state=1, verbose=0, warm_start=False)
)
In [26]:
fig, ax = plt.subplots()
plot_lm_rz(ax, lm_rf)
plt.tight_layout()

If we take a different slice, things might look a little better:

In [27]:
fig, ax = plt.subplots()
# get a function of (y,z)
f = lambda y, z: lm_rf(np.repeat(0, z.size), y, z)
# set the evaluation and plotting range
rang = (-tpc.r, tpc.r), (tpc.zmin, tpc.zmax)
# evaluate on a grid
# vectorize=False means f already handles arrays, so no np.vectorize wrapping is needed
h = hl.hist_from_eval(f, vectorize=False, bins=200, range=rang)
# plot with colorbar
d = hl.plot2d(ax, h, cbar=True, **pkw)
# labels etc
d['colorbar'].set_label(r'collection efficiency')
ax.set_xlabel(r'$y$')
ax.set_ylabel(r'$z$')
ax.set_aspect('equal')
plt.tight_layout()

But in any case, the pros and cons are now on display.

  • Pro: A random forest is naturally good at interpolating.
  • Con: We've trained on $(x,y,z,r,\theta)$, and unfortunately this isn't an ideal basis for slicing up the volume: we wind up with stripe-like artifacts at constant $z$, and potentially others (see the illustration below).
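The stripes are the usual axis-aligned step artifacts of tree ensembles. A tiny standalone illustration, generic scikit-learn rather than anything LightMap-specific:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    # fit a shallow tree to a smooth 1-D function
    z = np.linspace(-1, 1, 1000)[:, None]
    smooth = np.cos(z.ravel())
    tree = DecisionTreeRegressor(max_depth=3).fit(z, smooth)
    steps = tree.predict(z)  # piecewise-constant: steps at constant z, like the stripes above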

Neural Network

Finally, we'll model the lightmap using a neural network, implemented internally with TensorFlow and Keras.

  • Dependencies: pip install --user tensorflow==2.0.0-rc0 h5py
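The internals of LightMapNN aren't spelled out here, but judging from its repr below (Keras Sequential models, a hidden_layers argument, and an output range of (0, 1)), it presumably builds something along these lines. A hypothetical sketch, not the actual implementation:

    import tensorflow as tf

    def build_lightmap_net(hidden_layers=(512, 256, 128, 64, 32)):
        """Hypothetical sketch: dense regressor from position features to efficiency."""
        model = tf.keras.Sequential()
        for n in hidden_layers:
            model.add(tf.keras.layers.Dense(n, activation='relu'))
        # a sigmoid output keeps the predicted efficiency inside range=(0, 1)
        model.add(tf.keras.layers.Dense(1, activation='sigmoid'))
        model.compile(optimizer='adam', loss='mse')
        return model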
In [28]:
layers = [512, 256, 128, 64, 32]
lm_nn = LightMap.total.LightMapNN(tpc, epochs=1, batch_size=1024, hidden_layers=layers)
lm_nn
Out[28]:
LightMapNN(
  tpc=TPC(r=567.65, zmin=-967.37, zmax=217.63),
  hidden_layers=[512, 256, 128, 64, 32]
  epochs=1, batch_size=1024, range=(0, 1)
  models=[])
In [29]:
# ensemble method seems to work best here -> take mean of 3 NN's
for i in range(3):
    %time lm_nn.fit(*train)
lm_nn
Train on 5000000 samples
5000000/5000000 [==============================] - 26s 5us/sample - loss: 0.0011
CPU times: user 5min 9s, sys: 30.9 s, total: 5min 40s
Wall time: 27.7 s
Train on 5000000 samples
5000000/5000000 [==============================] - 26s 5us/sample - loss: 0.0011
CPU times: user 5min 10s, sys: 31.4 s, total: 5min 41s
Wall time: 27.8 s
Train on 5000000 samples
5000000/5000000 [==============================] - 26s 5us/sample - loss: 0.0011
CPU times: user 5min 12s, sys: 28.8 s, total: 5min 40s
Wall time: 27.4 s
Out[29]:
LightMapNN(
  tpc=TPC(r=567.65, zmin=-967.37, zmax=217.63),
  hidden_layers=[512, 256, 128, 64, 32]
  epochs=1, batch_size=1024, range=(0, 1)
  models=[<tensorflow.python.keras.engine.sequential.Sequential object at 0x2aab8ab80b70>, <tensorflow.python.keras.engine.sequential.Sequential object at 0x2aab8ab80d30>, <tensorflow.python.keras.engine.sequential.Sequential object at 0x2aab8686b710>])
2019-08-27 09:19:46.750365: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-08-27 09:19:46.759160: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2095150000 Hz
2019-08-27 09:19:46.761330: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0xbed9a60 executing computations on platform Host. Devices:
2019-08-27 09:19:46.761386: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
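At evaluation time, calling lm_nn presumably averages the three fitted networks. In plain numpy, such an ensemble mean would look something like this sketch, with models standing in for lm_nn's fitted Keras models:

    import numpy as np

    def ensemble_mean(models, X):
        """Hypothetical sketch: average per-model predictions over an ensemble."""
        # each model maps an (N, n_features) array to an (N, 1) prediction
        return np.mean([m.predict(X).ravel() for m in models], axis=0)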
In [30]:
fig, ax = plt.subplots()
plot_lm_rz(ax, lm_nn)
plt.tight_layout()

Model Summary

First off, let's make the plot:

In [31]:
lms = lm_hist, lm_spline, lm_rf, lm_nn
In [32]:
fig, axs = plt.subplots(1, 5, figsize=(9,3), dpi=400, gridspec_kw=dict(width_ratios=[1,1,1,1,.05]))
for (ax,lm) in zip(axs, lms):
    vectorize = ('Spline' in lm.kind)
    d = plot_lm_rz(ax, lm, vectorize=vectorize, cbar=False)
    ax.set_title(lm.kind)
cb = fig.colorbar(d, cax=axs[-1])
cb.set_label(r'collection efficiency')
for ax in axs[1:-1]:
    ax.set_yticks([])
    ax.set_ylabel('')

What can we say about these?

  • HistXYZ: needs crazy-high stats to be effective, but very easy to reason about.
  • SplineXYZ: didn't even expect this to work; RAM limitations restrict the achievable precision.
  • RF: robust way to get a decent fit, but still inherently unsmooth in the details.
  • NN: works very well in at least this case of good, relatively uniformly-distributed stats.

Saving Models

For most applications, we'll want to build a lightmap once and use it lots of times. Let's save the above four models:

In [33]:
# decide on a directory (will be created on the fly if necessary)
import os
home_dir = os.getenv('HOME')
model_dir = '{}/lightmap/total/tutorial'.format(home_dir)
In [34]:
for lm in lms:
    LightMap.save_model(model_dir, lm.kind, lm)
-> /g/g20/richman3/lightmap/total/tutorial/LightMapHistXYZ/model.npy ...
-> /g/g20/richman3/lightmap/total/tutorial/LightMapSplineXYZ/model.npy ...
-> /g/g20/richman3/lightmap/total/tutorial/LightMapRF/model.npy ...
-> /g/g20/richman3/lightmap/total/tutorial/LightMapNN/model.npy ...
-> /g/g20/richman3/lightmap/total/tutorial/LightMapNN/model_nn__000.h5 ...
-> /g/g20/richman3/lightmap/total/tutorial/LightMapNN/model_nn__001.h5 ...
-> /g/g20/richman3/lightmap/total/tutorial/LightMapNN/model_nn__002.h5 ...
In [35]:
!find /g/g20/richman3/lightmap/total/tutorial/ | sort
/g/g20/richman3/lightmap/total/tutorial/
/g/g20/richman3/lightmap/total/tutorial/LightMapHistXYZ
/g/g20/richman3/lightmap/total/tutorial/LightMapHistXYZ/model.npy
/g/g20/richman3/lightmap/total/tutorial/LightMapNN
/g/g20/richman3/lightmap/total/tutorial/LightMapNN/model.npy
/g/g20/richman3/lightmap/total/tutorial/LightMapNN/model_nn__000.h5
/g/g20/richman3/lightmap/total/tutorial/LightMapNN/model_nn__001.h5
/g/g20/richman3/lightmap/total/tutorial/LightMapNN/model_nn__002.h5
/g/g20/richman3/lightmap/total/tutorial/LightMapRF
/g/g20/richman3/lightmap/total/tutorial/LightMapRF/model.npy
/g/g20/richman3/lightmap/total/tutorial/LightMapSplineXYZ
/g/g20/richman3/lightmap/total/tutorial/LightMapSplineXYZ/model.npy

Loading Models

Saving our models wouldn't be very useful if we couldn't load them back up. Here's how this is done:

In [36]:
kinds = 'LightMapHistXYZ LightMapSplineXYZ LightMapRF LightMapNN'.split()
restored_lms = []
for kind in kinds:
    lm = LightMap.load_model(model_dir, kind)
    restored_lms.append(lm)
    print('\n', lm, '\n')
<- /g/g20/richman3/lightmap/total/tutorial/LightMapHistXYZ/model.npy ...

 LightMapHistXYZ(
  tpc=TPC(r=567.65, zmin=-967.37, zmax=217.63),
  h_count=Hist(50 bins in [-566.6147511325235,566.6487692671878], 50 bins in [-566.6205786069158,566.6376119572055], 50 bins in [-966.3729862761102,216.62996080010043]) with sum 5000000.0, 23126 empty bins, and 0 non-finite values
) 

<- /g/g20/richman3/lightmap/total/tutorial/LightMapSplineXYZ/model.npy ...

 LightMapSplineXYZ(
  tpc=TPC(r=567.65, zmin=-967.37, zmax=217.63),
  h_count=Hist(15 bins in [-605.493299575714,605.493299575714], 15 bins in [-605.493299575714,605.493299575714], 15 bins in [-1006.8730845119838,257.1300590359741]) with sum 5000000.0, 600 empty bins, and 0 non-finite values
  spline=<histlite.SplineFit object at 0x2aae69894b38>
) 

<- /g/g20/richman3/lightmap/total/tutorial/LightMapRF/model.npy ...

 LightMapRF(
  tpc=TPC(r=567.65, zmin=-967.37, zmax=217.63),
  downsample=1, n_iter=1,
  rf=RandomForestRegressor(bootstrap=True, criterion='mse', max_depth=8,
           max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=4, n_jobs=10,
           oob_score=False, random_state=1, verbose=0, warm_start=False)
) 

<- /g/g20/richman3/lightmap/total/tutorial/LightMapNN/model.npy ...
<- /g/g20/richman3/lightmap/total/tutorial/LightMapNN/model_nn__000.h5 ...
<- /g/g20/richman3/lightmap/total/tutorial/LightMapNN/model_nn__001.h5 ...
<- /g/g20/richman3/lightmap/total/tutorial/LightMapNN/model_nn__002.h5 ...

 LightMapNN(
  tpc=TPC(r=567.65, zmin=-967.37, zmax=217.63),
  hidden_layers=[512, 256, 128, 64, 32]
  epochs=1, batch_size=1024, range=(0, 1)
  models=[<tensorflow.python.keras.engine.sequential.Sequential object at 0x2aae69881b00>, <tensorflow.python.keras.engine.sequential.Sequential object at 0x2aae69881438>, <tensorflow.python.keras.engine.sequential.Sequential object at 0x2aab6047e048>]) 
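As a quick hedge against serialization bugs, we can spot-check that each restored model reproduces its original's prediction at a point, using only calls demonstrated above:

    # spot-check: original vs restored predictions at a single position
    for lm, lm2 in zip(lms, restored_lms):
        print(lm.kind, np.allclose(lm(0, 0, -200), lm2(0, 0, -200)))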

Towards "Fastlight"

LightMap.LightMap provides a method for randomly sampling the number of detected photons, given a total number of generated photons and some quantum efficiency (QE) for detection. Let's suppose we expect 90e3/6 generated photons per event (this number came out of discussions I only partially remember from the Summer 2019 meeting, but it had something to do with α's from Rn vs 0νββ and γ's near the Q-value). Then for an event at (x,y,z)=(0,0,0) and 10% QE, we might get, for example, this many detected photons:
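Under the hood this is presumably a binomial draw with per-photon success probability efficiency × QE. A sketch under that assumption (the actual sampling lives in LightMap.LightMap.sample_n_collected):

    import numpy as np

    def sample_n_collected_sketch(eff, n_gen, qe, seed=None):
        """Hypothetical sketch: each generated photon is detected with p = eff * qe."""
        rng = np.random.RandomState(seed)
        return rng.binomial(np.asarray(n_gen, dtype=int), eff * qe)

As a rough consistency check on the output below: 413 / (0.10 × 15000) ≈ 0.28, in line with the efficiencies plotted earlier.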

In [37]:
# number generated
n_gen_expected = 90e3 / 6
# optional seed for reproducibility
lm_nn.sample_n_collected(0, 0, 0, n_gen_expected, qe=0.10, seed=1)
Out[37]:
array([413])

We can apply this for a bunch of events at once. Let's make some up:

In [38]:
N = 30000
# stay away from the edges a bit; sampling r^2 uniformly makes points
# uniform in area (dA ∝ r dr)
r2 = np.random.uniform(0, (.9*tpc.r)**2, N)
r = np.sqrt(r2)
theta = np.random.uniform(0, 2*np.pi, N)
z = np.random.uniform(tpc.zmin + .05*tpc.length, tpc.zmax - .05*tpc.length, N)
x, y = r*np.cos(theta), r*np.sin(theta)
ngen = np.random.poisson(int(n_gen_expected), N)

Then the per-event number of detected photons is, e.g.,

In [39]:
ndet = lm_nn.sample_n_collected(x, y, z, ngen, qe=0.10, seed=2)
eff = ndet / n_gen_expected  # observed efficiency (still includes the factor of qe)

A quick sanity check:

In [40]:
fig, ax = plt.subplots()
hl.plot1d(ax, hl.hist(eff, bins=50).normalize(), crosses=True)
hl.plot1d(ax, hl.kde(eff), color='C0', lw=1, alpha=.33)
ax.set_xlabel(r'collection efficiency')
ax.set_ylabel(r'PDF');

A further sanity check is to see whether we can re-calibrate on this fake data and thus reproduce the original lightmap: a full-circle test.

In [41]:
train_again = x, y, z, 10*eff  # divide out qe=0.10 to recover the true collection efficiency
In [42]:
layers = [512, 256, 128, 64, 32]
lm_nn_again = LightMap.total.LightMapNN(tpc, epochs=10, batch_size=64, hidden_layers=layers)
In [43]:
for i in range(3):
    %time lm_nn_again.fit(*train_again)
Train on 30000 samples
Epoch 1/10
30000/30000 [==============================] - 2s 67us/sample - loss: 3.6081e-04
Epoch 2/10
30000/30000 [==============================] - 1s 46us/sample - loss: 1.2049e-04
Epoch 3/10
30000/30000 [==============================] - 1s 47us/sample - loss: 1.1844e-04
Epoch 4/10
30000/30000 [==============================] - 1s 48us/sample - loss: 1.1746e-04
Epoch 5/10
30000/30000 [==============================] - 1s 47us/sample - loss: 1.1559e-04
Epoch 6/10
30000/30000 [==============================] - 1s 47us/sample - loss: 1.1615e-04
Epoch 7/10
30000/30000 [==============================] - 1s 47us/sample - loss: 1.1553e-04
Epoch 8/10
30000/30000 [==============================] - 1s 48us/sample - loss: 1.1420e-04
Epoch 9/10
30000/30000 [==============================] - 1s 47us/sample - loss: 1.1474e-04
Epoch 10/10
30000/30000 [==============================] - 1s 47us/sample - loss: 1.1228e-04
CPU times: user 57 s, sys: 17.5 s, total: 1min 14s
Wall time: 14.9 s
Train on 30000 samples
Epoch 1/10
30000/30000 [==============================] - 2s 65us/sample - loss: 2.9073e-04
Epoch 2/10
30000/30000 [==============================] - 1s 45us/sample - loss: 1.2198e-04
Epoch 3/10
30000/30000 [==============================] - 1s 45us/sample - loss: 1.1957e-04
Epoch 4/10
30000/30000 [==============================] - 1s 45us/sample - loss: 1.1719e-04
Epoch 5/10
30000/30000 [==============================] - 1s 45us/sample - loss: 1.1798e-04
Epoch 6/10
30000/30000 [==============================] - 1s 45us/sample - loss: 1.1557e-04
Epoch 7/10
30000/30000 [==============================] - 1s 45us/sample - loss: 1.1331e-04
Epoch 8/10
30000/30000 [==============================] - 1s 46us/sample - loss: 1.1369e-04
Epoch 9/10
30000/30000 [==============================] - 1s 45us/sample - loss: 1.1260e-04
Epoch 10/10
30000/30000 [==============================] - 1s 46us/sample - loss: 1.1129e-04
CPU times: user 56.7 s, sys: 16.1 s, total: 1min 12s
Wall time: 14.3 s
Train on 30000 samples
Epoch 1/10
30000/30000 [==============================] - 2s 65us/sample - loss: 3.4873e-04
Epoch 2/10
30000/30000 [==============================] - 1s 45us/sample - loss: 1.2157e-04
Epoch 3/10
30000/30000 [==============================] - 1s 45us/sample - loss: 1.1910e-04
Epoch 4/10
30000/30000 [==============================] - 1s 44us/sample - loss: 1.1889e-04
Epoch 5/10
30000/30000 [==============================] - 1s 44us/sample - loss: 1.1626e-04
Epoch 6/10
30000/30000 [==============================] - 1s 45us/sample - loss: 1.1652e-04
Epoch 7/10
30000/30000 [==============================] - 1s 45us/sample - loss: 1.1606e-04
Epoch 8/10
30000/30000 [==============================] - 1s 45us/sample - loss: 1.1363e-04
Epoch 9/10
30000/30000 [==============================] - 1s 45us/sample - loss: 1.1238e-04
Epoch 10/10
30000/30000 [==============================] - 1s 45us/sample - loss: 1.1281e-04
CPU times: user 56.8 s, sys: 15.8 s, total: 1min 12s
Wall time: 14.5 s
In [44]:
fig, axs = plt.subplots(1, 3, figsize=(4.5,3), gridspec_kw=dict(width_ratios=[1,1,.05]))
these_lms = lm_nn, lm_nn_again
titles = 'Original NN', 'Round-trip NN'
for (ax,lm, title) in zip(axs, these_lms, titles):
    d = plot_lm_rz(ax, lm, cbar=False)
    ax.set_title(title)
cb = fig.colorbar(d, cax=axs[-1])
cb.set_label(r'collection efficiency')
for ax in axs[1:-1]:
    ax.set_yticks([])
    ax.set_ylabel('')

A percent difference plot might be helpful here:

In [45]:
fig, ax = plt.subplots()
rang = (0, tpc.r), (tpc.zmin, tpc.zmax)
f = lambda r, z: lm_nn(r, np.repeat(0, r.size), z, cyl=True)
h_nn = hl.hist_from_eval(f, vectorize=False, bins=200, range=rang)
f = lambda r, z: lm_nn_again(r, np.repeat(0, r.size), z, cyl=True)
h_nn_again = hl.hist_from_eval(f, vectorize=False, bins=200, range=rang)
d = hl.plot2d(ax, 100*(h_nn_again - h_nn) / h_nn, cbar=True, vmin=-5, vmax=5, cmap='RdBu_r')
d['colorbar'].set_label(r'percent difference (new vs orig NN)')
ax.set_xlabel(r'$r$')
ax.set_ylabel(r'$z$')
ax.set_aspect('equal')

For comparison (between comparisons...) consider the percent difference between the original spline and original NN:

In [46]:
fig, ax = plt.subplots()
rang = (0, tpc.r), (tpc.zmin, tpc.zmax)
f = lambda r, z: lm_spline(r, np.repeat(0, r.size), z, cyl=True)
h_spline = hl.hist_from_eval(f, vectorize=False, bins=200, range=rang)
d = hl.plot2d(ax, 100*(h_spline - h_nn) / h_nn, cbar=True, vmin=-5, vmax=5, cmap='RdBu_r')
d['colorbar'].set_label(r'percent difference (Spline vs NN)')
ax.set_xlabel(r'$r$')
ax.set_ylabel(r'$z$')
ax.set_aspect('equal')

What's next?

There is plenty more that can be done already with this code:

  • How can we quantify the accuracy of the baseline "truth", which strictly speaking is only implicit in the MC?
  • Are we capturing the relevant dependence in full 3D?
  • How well do the various methods work for realistic calibration datasets?
  • Is there anything to gain from methods not shown here?
    • Gaussian smoothing and/or sliding binning for LightMapHistXYZ or LightMapSplineXYZ
    • analytic lightmaps with LightMapAnalytic
    • partially-implemented cylindrically-binned LightMapHistCyl
  • What constraints can this give us on the detector and calibration designs?
  • N.B.: everything introduced here should, in practice, be tested with data not used for the initial lightmap fitting. We've only used the first 25 of 1k files in the chosen production, so there's plenty more to choose from (see the snippet below).
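On that last point, carving out a held-out sample is trivial with the machinery above; for instance, the next 25 files of the production could serve as a test set:

    # hypothetical held-out sample: files disjoint from the 25 used for fitting
    test_data = LightMap.read_files(filenames[25:50])
    test_data['eff'] = test_data['fNOP'] / 100.0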

Here we demonstrated how these tools can be used. Lots of useful work can be done in an interactive notebook, but for serious processing, the first step would be to solidify the exact desired steps in scripts, which could run sequentially after the main processing and save their results using ROOT, root_numpy, or whatever else. As soon as possible, though, we want to integrate at least the use, if not the fitting, of lightmap models into SNiPER: first as Python modules, and later likely as a C++ service plus some C++ modules.

Going forward, a (relatively) simple per-SiPM hit-pattern treatment will be incorporated in similar fashion to what's shown here.