In this notebook, we go over some ways of modeling a lightmap — the photon collection efficiency as a function of position inside a detector (in this case, a TPC).
While example scripts may be added, this is arguably the clearest way of documenting both the how and the why.
In "the future", we foresee an incremental port of this machinery that allows parts, and eventually the whole, to be run within nEXO-offline using the SNiPER framework.
import numpy as np
import matplotlib.pyplot as plt
import histlite as hl
import LightMap
%matplotlib inline
# plotting aesthetics
plt.rc('figure', dpi=150, figsize=(4,3), facecolor='w')
plt.rc('savefig', dpi=150, facecolor='w')
plt.rc('lines', linewidth=1.5)
We need some data to get started. For generating actual lightmap models, it's unclear whether standalone scripts or SNiPER modules will ultimately be preferred. In any case, for now we use a dedicated utility function to get a Pandas DataFrame.
!pip install --user pandas root_numpy
mc_dir = '/p/lustre2/nexouser/optical_sample/g4'
filenames = [
    '{}/PhotonBomb_gps_ActiveOptical_seed{}.nEXOevents.root'.format(mc_dir, i)
    for i in range(1, 1001)]
# first 25 files
data = LightMap.read_files(filenames[:25])
LightMap.read_files() extracts fGenX, fGenY, fGenZ as x, y, z; from these, it computes r and theta. It also loads fNOP.
data.head()
Data is loaded from the Event/Sim/SimEvent tree by default, though a different tree can be specified with the tree= argument. Extra branches can be added with the branches= argument.
LightMap.read_files(filenames[0], tree='Event/Sim/SimEvent', branches=['fSiPMID']).head()
A raw numpy array can also be obtained, rather than a DataFrame:
LightMap.read_files(filenames[0], df=False)[:5]
Now, before we press on, note that for this dataset we know a priori that the true number of generated photons per event was 100. For convenience, we record the per-event collection efficiency in the DataFrame:
data['eff'] = data['fNOP'] / 100.0
The actual geometry of the TPC is complicated; here, we try not to stress too much about it. For our purposes, the "TPC" is just a cylindrical volume in which the lightmap is defined. This is modeled with the TPC class. Since we've already committed to an example dataset, we'll simply infer the geometry from it.
# small buffer in each coordinate
tpc = LightMap.TPC(data.r.max() + 1, data.z.min() - 1, data.z.max() + 1)
tpc
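The resulting object exposes the geometry as attributes, which later cells rely on:

# geometry attributes referenced throughout the rest of the notebook
print(tpc.r, tpc.zmin, tpc.zmax, tpc.length)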
Several possible models are available, all of which are derived from LightMap.LightMap (technically, LightMap.total.LightMap). The simplest of these is "Hist XYZ", so we start there.
train = data.x.values, data.y.values, data.z.values, data.eff.values
lm_hist = LightMap.LightMapHistXYZ(tpc)
lm_hist
%time lm_hist.fit(*train)
lm_hist
Histlite provides some slicing and plotting routines, so we can visualize this model easily enough by inspecting the h_mean property:
# plotting style
pkw = dict(vmin=0.2, vmax=.34, cmap='Greys_r')
# x=0 slice
fig, axs = plt.subplots(1, 3, figsize=(9, 3))
ax = axs[0]
hl.plot2d(ax, lm_hist.h_mean[0], **pkw)
ax.set_xlabel(r'$y$')
ax.set_ylabel(r'$z$')
ax.set_title(r'$x=0$')
# y=0 slice
ax = axs[1]
hl.plot2d(ax, lm_hist.h_mean[:,0], **pkw)
ax.set_xlabel(r'$x$')
ax.set_ylabel(r'$z$')
ax.set_title(r'$y=0$')
# z=-200 slice
ax = axs[2]
hl.plot2d(ax, lm_hist.h_mean[:,:,-200], **pkw)
ax.set_xlabel(r'$x$')
ax.set_ylabel(r'$y$')
ax.set_title(r'$z=-200$')
plt.tight_layout()
To get the efficiency value at a particular position, we can just "call" the lightmap:
lm_hist(0, 0, -200)
We can also give sequences of per-event coordinates:
lm_hist([-100, 100], [-100, 100], [-300, -100])
Or a single mapping argument with (at least) x, y, z keys:
lm_hist(data.head())
In combination with some histlite tricks, this gives a more general way of making plots, both in terms of supporting arbitrary lightmap models and in terms of the choice of slices to visualize. Here's an example of a slice at a given $\theta$:
fig, ax = plt.subplots()
# get a function of (r,z)
f = lambda r, z: lm_hist(r, np.repeat(0, r.size), z, cyl=True)
# set the evaluation and plotting range
rang = (0, tpc.r), (tpc.zmin, tpc.zmax)
# evaluate on a grid
# vectorize=False: f already accepts arrays, so hist_from_eval need not wrap it with np.vectorize
h = hl.hist_from_eval(f, vectorize=False, bins=200, range=rang)
# plot with colorbar
d = hl.plot2d(ax, h, cbar=True, **pkw)
# labels etc
d['colorbar'].set_label(r'collection efficiency')
ax.set_xlabel(r'$r$')
ax.set_ylabel(r'$z$')
ax.set_aspect('equal')
plt.tight_layout()
In the limit of infinite statistics, we can choose an appropriately fine binning (see the LightMap.total.LightMapHistXYZ docstring) and be done.
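For instance, a finer binning could be requested at construction time. This is only a sketch: the bins argument here is an assumption, made by analogy with LightMapSplineXYZ below; check the LightMapHistXYZ docstring for the actual signature.

# assumption: the second argument is bins per axis, as for LightMapSplineXYZ
lm_hist_fine = LightMap.LightMapHistXYZ(tpc, 100)
lm_hist_fine.fit(*train)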
Unfortunately, that's hardly the situation even with MC, let alone with plausible future calibration data, so let's look at some other models.
I don't actually remember coding a working spline model, but apparently there is one of some sort.
Beware: the RAM requirements of this thing grow out of control very quickly for a large number of bins; even the default 50x50x50 binning in (x, y, z) qualifies.
lm_spline = LightMap.LightMapSplineXYZ(tpc, 15) # 15x15x15 bins
lm_spline
%time lm_spline.fit(*train)
lm_spline
To reduce copypasta, here's a helper function for plotting:
def plot_lm_rz(ax, lm, theta=0, vectorize=False, cbar=True):
    f = lambda r, z: lm(r, (theta if vectorize else np.repeat(theta, z.size)), z, cyl=True)
    rang = (0, tpc.r), (tpc.zmin, tpc.zmax)
    h = hl.hist_from_eval(f, vectorize=vectorize, bins=200, range=rang)
    d = hl.plot2d(ax, h, cbar=cbar, **pkw)
    if cbar:
        d['colorbar'].set_label(r'collection efficiency')
    ax.set_xlabel(r'$r$')
    ax.set_ylabel(r'$z$')
    ax.set_aspect('equal')
    return d
fig, ax = plt.subplots()
plot_lm_rz(ax, lm_spline, vectorize=True)
plt.tight_layout()
We can use a scikit-learn random forest to model the lightmap. The pros and cons are more obvious given a plot, so let's get one.
!pip install --user scikit-learn
sklearn_kw = dict(max_depth=8, n_estimators=4, n_jobs=10, random_state=1)
lm_rf = LightMap.LightMapRF(tpc, **sklearn_kw)
lm_rf
%time lm_rf.fit(*train)
lm_rf
fig, ax = plt.subplots()
plot_lm_rz(ax, lm_rf)
plt.tight_layout()
If we take a different slice, things might look a little better:
fig, ax = plt.subplots()
# get a function of (y,z)
f = lambda y, z: lm_rf(np.repeat(0, z.size), y, z)
# set the evaluation and plotting range
rang = (-tpc.r, tpc.r), (tpc.zmin, tpc.zmax)
# evaluate on a grid
# vectorize=False: f already accepts arrays, so hist_from_eval need not wrap it with np.vectorize
h = hl.hist_from_eval(f, vectorize=False, bins=200, range=rang)
# plot with colorbar
d = hl.plot2d(ax, h, cbar=True, **pkw)
# labels etc
d['colorbar'].set_label(r'collection efficiency')
ax.set_xlabel(r'$y$')
ax.set_ylabel(r'$z$')
ax.set_aspect('equal')
plt.tight_layout()
But in any case, the pros and cons are now on display.
Finally, we'll model the lightmap using a neural network, implemented internally with TensorFlow and Keras.
!pip install --user tensorflow==2.0.0-rc0 h5py
layers = [512, 256, 128, 64, 32]
lm_nn = LightMap.total.LightMapNN(tpc, epochs=1, batch_size=1024, hidden_layers=layers)
lm_nn
# ensemble method seems to work best here -> take mean of 3 NN's
for i in range(3):
    %time lm_nn.fit(*train)
lm_nn
fig, ax = plt.subplots()
plot_lm_rz(ax, lm_nn)
plt.tight_layout()
Now let's compare all four models side by side. First off, the plot:
lms = lm_hist, lm_spline, lm_rf, lm_nn
fig, axs = plt.subplots(1, 5, figsize=(9,3), dpi=400, gridspec_kw=dict(width_ratios=[1,1,1,1,.05]))
for (ax, lm) in zip(axs, lms):
    vectorize = ('Spline' in lm.kind)
    d = plot_lm_rz(ax, lm, vectorize=vectorize, cbar=False)
    ax.set_title(lm.kind)
# use the mappable returned by hl.plot2d for the shared colorbar
cb = fig.colorbar(d['colormesh'], cax=axs[-1])
cb.set_label(r'collection efficiency')
for ax in axs[1:-1]:
    ax.set_yticks([])
    ax.set_ylabel('')
What can we say about these?
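One small quantitative comparison (purely illustrative) is to evaluate each model at a common point:

# rough numeric comparison of the four models at one position
for lm in lms:
    print(lm.kind, lm(0, 0, -200))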
For most applications, we'll want to build a lightmap once and use it lots of times. Let's save the above four models:
# decide on a directory (will be created on the fly if necessary)
import os
home_dir = os.getenv('HOME')
model_dir = '{}/lightmap/total/tutorial'.format(home_dir)
for lm in lms:
    LightMap.save_model(model_dir, lm.kind, lm)
!find {model_dir} | sort
Saving our models wouldn't be very useful if we couldn't load them back up. Here's how this is done:
kinds = 'LightMapHistXYZ LightMapSplineXYZ LightMapRF LightMapNN'.split()
restored_lms = []
for kind in kinds:
    lm = LightMap.load_model(model_dir, kind)
    restored_lms.append(lm)
    print('\n', lm, '\n')
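As a quick consistency check (illustrative), each restored model should agree with its in-memory original:

# original vs. restored models, evaluated at one point
for lm_orig, lm_restored in zip(lms, restored_lms):
    print(lm_orig.kind, lm_orig(0, 0, -200), lm_restored(0, 0, -200))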
LightMap.LightMap provides a method for randomly sampling the number of detected photons, given a total number of generated photons and some quantum efficiency for detection. Let's suppose we expect 90e3/6 generated photons (this number came out of discussions I only partially remember from the Summer 2019 meeting, but it had something to do with α from Rn vs 0νββ and γ near the Q-value). Then for an event at (x,y,z)=(0,0,0) and 10% QE, we may get, for example, this many detected photons:
# number generated
n_gen_expected = 90e3 / 6
# optional seed for reproducibility
lm_nn.sample_n_collected(0, 0, 0, n_gen_expected, qe=0.10, seed=1)
We can apply this for a bunch of events at once. Let's make some up:
N = 30000
# stay away from the edges a bit
r2 = np.random.uniform(0, (.9*tpc.r)**2, N)
r = np.sqrt(r2)
theta = np.random.uniform(0, 2*np.pi, N)
z = np.random.uniform(tpc.zmin + .05*tpc.length, tpc.zmax - .05*tpc.length, N)
x, y = r*np.cos(theta), r*np.sin(theta)
ngen = np.random.poisson(int(n_gen_expected), N)
Then the per-event number of detected photons is, e.g.,
ndet = lm_nn.sample_n_collected(x, y, z, ngen, qe=0.10, seed=2)
eff = ndet / n_gen_expected
A quick sanity check:
fig, ax = plt.subplots()
hl.plot1d(ax, hl.hist(eff, bins=50).normalize(), crosses=True)
hl.plot1d(ax, hl.kde(eff), color='C0', lw=1, alpha=.33)
ax.set_xlabel(r'collection efficiency')
ax.set_ylabel(r'PDF');
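A cheaper numeric check (also illustrative): the mean sampled efficiency should be close to the QE times the mean lightmap value at the sampled positions:

# mean sampled efficiency vs. QE times the mean model prediction
print(eff.mean(), 0.10 * lm_nn(x, y, z).mean())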
A further sanity check would be to see if we can re-calibrate based on this fake data and thus reproduce the original lightmap; this is a full round-trip test.
train_again = x, y, z, 10*eff # undo QE effect
layers = [512, 256, 128, 64, 32]
lm_nn_again = LightMap.total.LightMapNN(tpc, epochs=10, batch_size=64, hidden_layers=layers)
for i in range(3):
    %time lm_nn_again.fit(*train_again)
fig, axs = plt.subplots(1, 3, figsize=(4.5,3), gridspec_kw=dict(width_ratios=[1,1,.05]))
these_lms = lm_nn, lm_nn_again
titles = 'Original NN', 'Round-trip NN'
for (ax, lm, title) in zip(axs, these_lms, titles):
    d = plot_lm_rz(ax, lm, cbar=False)
    ax.set_title(title)
# use the mappable returned by hl.plot2d for the shared colorbar
cb = fig.colorbar(d['colormesh'], cax=axs[-1])
cb.set_label(r'collection efficiency')
for ax in axs[1:-1]:
    ax.set_yticks([])
    ax.set_ylabel('')
A percent difference plot might be helpful here:
fig, ax = plt.subplots()
rang = (0, tpc.r), (tpc.zmin, tpc.zmax)
f = lambda r, z: lm_nn(r, np.repeat(0, r.size), z, cyl=True)
h_nn = hl.hist_from_eval(f, vectorize=False, bins=200, range=rang)
f = lambda r, z: lm_nn_again(r, np.repeat(0, r.size), z, cyl=True)
h_nn_again = hl.hist_from_eval(f, vectorize=False, bins=200, range=rang)
d = hl.plot2d(ax, 100*(h_nn_again - h_nn) / h_nn, cbar=True, vmin=-5, vmax=5, cmap='RdBu_r')
d['colorbar'].set_label(r'percent difference (new vs orig NN)')
ax.set_xlabel(r'$r$')
ax.set_ylabel(r'$z$')
ax.set_aspect('equal')
For comparison (between comparisons...), consider the percent difference between the original spline and the original NN:
fig, ax = plt.subplots()
rang = (0, tpc.r), (tpc.zmin, tpc.zmax)
f = lambda r, z: lm_spline(r, np.repeat(0, r.size), z, cyl=True)
h_spline = hl.hist_from_eval(f, vectorize=False, bins=200, range=rang)
d = hl.plot2d(ax, 100*(h_spline - h_nn) / h_nn, cbar=True, vmin=-5, vmax=5, cmap='RdBu_r')
d['colorbar'].set_label(r'percent difference (Spline vs NN)')
ax.set_xlabel(r'$r$')
ax.set_ylabel(r'$z$')
ax.set_aspect('equal')
There is plenty more that can be done already with this code, for example with these model classes (a sketch of one follows the list):
- LightMapHistXYZ or LightMapSplineXYZ
- LightMapAnalytic
- LightMapHistCyl
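For instance, trying the cylindrical histogram might look like the following. This is only a sketch: the LightMapHistCyl constructor arguments are an assumption here, mirroring LightMapHistXYZ; check its docstring for the actual signature.

# hypothetical: assumes LightMapHistCyl follows the same construct/fit/call
# pattern as the models above; its actual signature may differ
lm_cyl = LightMap.LightMapHistCyl(tpc)
lm_cyl.fit(*train)
lm_cyl(0, 0, -200)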
Here we demonstrated how these tools can be used. Lots of useful work can be done in an interactive notebook. For serious processing, the first step would be to solidify the exact desired steps in scripts, which could be run sequentially after the main processing and which could use ROOT, root_numpy, or whatever else to save the results. However, as soon as possible, we want to integrate at least the use of, if not the fitting of, lightmap models into SNiPER: first as Python modules, and later likely as a C++ service plus some C++ modules.
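As a sketch of the script route (the file names and the np.load/np.save I/O here are hypothetical; only LightMap.load_model and the lightmap call syntax are as demonstrated above):

# standalone sketch: load a saved lightmap and evaluate it at given positions
import numpy as np
import LightMap

lm = LightMap.load_model('lightmap/total/tutorial', 'LightMapNN')
x, y, z = np.load('positions.npy')  # hypothetical: reconstructed positions, shape (3, N)
np.save('efficiencies.npy', lm(x, y, z))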
Going forward, a (relatively) simple per-SiPM hit-pattern treatment will be incorporated in similar fashion to what's shown here.