IceHive Code Review

General

Project: HiveSplitter, IceHive branch
Author: Marcel Zoll
Sponser: David Boersma
Consumers: everyone
Version Reviewed: projects/HiveSplitter/branches/IceHive r121895

This code review was motivated by the request to run this as our main splitting algorithm at the pole. Thus why everyone is listed as a consumer. Because of the large change to all processing this could have, the review will be detailed and nitpicky at times.

Note

This review only covers the code itself, not the physics.

Documentation

Doxygen

There is decent documentation of the two main modules (I3IceHive and I3HiveCleaning), so from a user perspective this is probably sufficient.

rst

There is no rst documentation at all, which needs to be remedied. Some general user documentation is strongly recomended, and detailed documentation of the algorithm would be welcome. I ask for this because this is the standard documentation linked to from http://software.icecube.wisc.edu/icerec_trunk/.

Examples

There are simple examples for both splitting and cleaning.

A more complex example may prove useful, such as a real-life example when running on a data or simulation file. What modules should go before or after these modules?

Tests

Tests are still a work in progress and need more effort.

General Comments

It is easy to see that some of the code has been reviewed before (in previous incarnations), as it is in decent shape regarding our coding standards.

Here are some general complaints:

OMKeyHash

Is there a reason we need another hash function for OMKeys? Was the one in icetray not good enough? If you just needed OM and String numbers and didn't want the PMT number, there is a ModuleKey class in dataclasses for this.

Fixed detector size

Fixed sizes are a bad idea in general and especially now since upgrade simulations have already begun. Unless you have an extremely good reason for it, this needs to be set by the constructor at runtime. (#17 [sutter])

Details in public headers

Templated code of course needs to be in header files, but over 90% of the code for the project is in the public headers. Is there a way to hide some of the implementation details in private files? If things are forced to remain fully templated, possibly a break up into public and private classes so the private class headers can actually be private.

DANGER HARDCODING

Everytime I see DANGER HARDCODING I cringe a little more. Either a value is a true constant which will never change, or it's an option that may change. Remove the ambiguity. (#16 [sutter])

uint vs unsigned int

There are many instances of each of these. You should pick one or the other notation and stick with it. Also, please directly include sys/types.h instead of relying on icetray headers to include it for you.

Some words on templates

There are a lot of templates used here, probably more than is wise. One thing I would recommend is that instead of Response being a templated object, it could be an internal interface to access any type of splittable object, with derived classes for each specific object. This greatly reduces the burden of outside classes to support exactly the syntax you want (meaning no modifying of dataclasses). It should also allow you to make much of the code private, which would provide a nicer interface for users.

Specific Comments

HiveSplitter.h

lines 17,21,23,24,29: Several headers seem to be included but not used. Cleanup is in order.
lines 41-61: Since these variables are public, do they need a trailing underscore?
lines 82-123: The data structures and helper functions don't seem to depend on anything within the HiveSplitter class, so why are they inside the class? Putting this in a private namespace would be better.
line 138: Saying that the variable is static in a comment isn't a good idea. Either use the keyword static in the code, or it's not static. Perhaps you should call it fixed size? Though I'm not sure the comment is even necessary.
lines 140,142: These two lookup tables are presented as c-style arrays using string and om indices. Would a hash_map be more appropriate for a lookup-based structure?
lines 289-290,685-686: Don't write namespace usings in a header file. (#59 [sutter])
line 303: Prefer prefix increment. (#28 [sutter])
line 332: Do you want to create a new Hit or just a reference to the first hit in the list? Because this creates a new Hit object.
line 336: Use the key to erase directly instead of finding the iterator first.
line 340: How much does the hinting matter for speed? And would it be better to do something like this:
```
if (complete.empty())
  complete.insert(h)
else
  complete.insert(--complete.end(),h)
```
line 361: What happens when c.complete is empty? I'm pretty sure that's undefined behavior, and should be avoided.
line 384: Can this be simplified to return(it1==end1)? The way it is currently written, both it1 and it2 can have not reached then ends and it returns true, which is not desired (not that this case should occur, but just in case).
lines 395-397: First you call GetDistance(h1.domIndex, h2.domIndex), then just below that you call GetDistance(SimpleIndex2OMKey(h1.domIndex), SimpleIndex2OMKey(h2.domIndex)). Is there a reason for the different calls?
line 405: It seems you could return a lot sooner if you tested that dist was NAN here, saving some computation.
line 437: Fix the FIXME.
line 447: If you need pairs of numbers, why don't you just ask for input as a std::vector< std::pair< double, double > >?
lines 515-517: Why are the hits extracted in one ordering, only to be reordered immediately?
line 548: A comment would be helpful to explain what is happening here.
lines 549,573,622,642,661: erase() returns an iterator to the next element in the list. Please use it instead of getting the next iterator before erasing.
lines 690-693: Do not use $I3_SRC. Prefer $I3_BUILD.
lines 882-883: The comparison domTopo_A >= domTopo_B is done twice. It is better to use a proper if/else construct and do the comparison once.

HiveSplitter.cxx

lines 28-40: The vectors can be filled in the constructor, so you can do this:

static const double ic[] = {300., 300., 272.7, 272.7, 165.8, 165.8};
static const double dc[] = {150., 150., 131.5, 131.5, 40.8, 40.8};
static const double pingu[] = {150., 150., 144.1, 144.1, 124.7, 124.7, 82.8, 82.8};
static const double v_ic[] = {100.,100.,100.,100.};
static const double v_dc[] = {100.,100.,100.,100.,100.,100.};
static const double v_p[] = {100,100.,100.,100.,100.,100.,100.,100.};
HiveSplitter_ParameterSet::HiveSplitter_ParameterSet():
  multiplicity_(4),
  timeWindow_(2000.*I3Units::ns),
  timeStatic_(200.*I3Units::ns),
  timeCVMinus_(200.*I3Units::ns),
  timeCVPlus_(200.*I3Units::ns),
  timeCNMinus_(200.*I3Units::ns),
  timeCNPlus_(200.*I3Units::ns),
  selfconnect_(true),
  domSpacingsOpt_(false),
  SingleDenseRingLimits_(ic, ic + sizeof(ic) / sizeof(ic[0]) ),
  DoubleDenseRingLimits_(dc, dc + sizeof(dc) / sizeof(dc[0]) ),
  TripleDenseRingLimits_(pingu, pingu + sizeof(pingu) / sizeof(pingu[0]) ),
  SingleDenseRingVicinity_(v_ic, v_ic + sizeof(v_ic) / sizeof(v_ic[0]) ),
  DoubleDenseRingVicinity_(v_dc, v_dc + sizeof(v_dc) / sizeof(v_dc[0]) ),
  TripleDenseRingVicinity_(v_p, v_p + sizeof(v_p) / sizeof(v_p[0]) )
{ }

Note that I used sizeof instead of a fixed length, which is preferred (magic numbers are disliked, and the compiler is smart).

I3IceHive.h

lines 90,92: These pointers never get cleaned up, resulting in a memory leak. Perhaps you need a destructor? You should probably also initialize them to NULL. Or you can use a shared_ptr.
line 156: Are these internal or external helpers? If internal, they should be private.
line 289: How do you know that the user chose not to apply TriggerSplitting?
line 361: Why use trigHierName_ instead of triggerSplitterOpt_? The way it is written, triggerSplitter_ may not be a valid pointer.
lines 380,401,408: Remove lines that are commented out and no longer valid.

Hive-lib.h

line 15: Include the stdlib as a c++ header:
```
#include <cstdlib>
```
line 22: Check to make sure this is not already declared somewhere else (like sys/types.h). Also, uint is a really common name to be adding to the global namespace.

Hive-lib.cxx

There are a number of //std::cout lines, which should either be converted to log_trace or removed.
line 12: Include the stdlib as a c++ header.
lines 31-32: Consider using honey_[0] where appropriate instead of an iterator.

lines 52-56: I again question the use of an iterator, instead of:

if (honey_.size() <= ringnbr)
    return std::set<uint>();
else
    return honey_[ringnbr];

lines 93-94,97,102: Better to get the honey_.size() directly, so we do one less function call and less addition operations.
line 189: It is not guaranteed that combs->begin() returns a valid iterator. Check that it does not equal combs->end() before using it.
line 192: You say in the documentation "-1 if not possible" but return 0.
lines 199-214: Instead of adding all strings then removing those already in the current comb, consider first getting the set of strings in the comb and only adding strings that do not match that. This prevents erasing after the fact, which is not very clean.
line 228: You say in the documentation "-1 if not possible" but return 0.
line 237: In all other loops scale starts at 1 instead of 0.
lines 220-282: Unless you have a very good speed reason, it is better to call ExpandToNextRing() scale_factor number of times. It is significantly simpler to do this, thus easier to test for correctness.
line 294: Did you mean to compare combs1 and combs2? Right now this will always return false.
line 299: Prefer the copy constructor over assignment.
lines 445,456,467,472,483,488: Prefer && instead of nested conditionals. (#20 [sutter])

HitSorting.h

line 97: Allowing the InputIterators to be different classes means we might be comparing two completely different classes. This is fairly dangerous and probably not needed.
line 98: Why do you take iterators here, yet in SetsIdentical you take ordered_set objects?
line 115: std::equal can replace SetsIdentical(). (#84 [sutter])
line 249: Where do you create the vector that you're pushing back into?
lines 244-258: Should the if statement go inside the loop for less duplicated code?
line 341: Incorrect function name in debug statement.
line 362: Would it be better to iterate instead of using an index?
lines 370-373: Suggest using std::accumulate().
line 390: Incorrect function name in debug statement.
line 391: What is GetRetrievalOrdered()? Did you mean GetRetrievalOrderedHits()?
line 392: What is sort()? Is this std::sort()?
line 401: Fix the FIXME.
line 409: Why use the iterator when you can use array notation?
lines 405-421: Should the if statement go inside the loop for less duplicated code?
line 428: Fix the FIXME.

HitSorting.cxx

lines 50-53,58-61: Couldn't you just use the set copy constructor?

HiveCleaning.h

There appears to be a lot of duplicate code (and comments) from HiveSplitter.h. Perhaps this should be factored out? I'm not going to bother reviewing the duplicate code, so assume it has the same problems as in the other file.
lines 138-139,205,277-278: Don't write namespace usings in a header file. (#59 [sutter])

I3HiveCleaning.h

line 37: Does this need to inherit from I3Splitter? It doesn't look like it splits anything.
line 77: Cannot find implementation for ConfigureSplitters().

TriggerSplitter.h

line 306: Don't write namespace usings in a header file. (#59 [sutter])

TriggerSplitter.cxx

line 43: The vector can be filled in the constructor (as seen above).
lines 89-90: Prefer && instead of nested conditionals. (#20 [sutter])

pybindings/module.cxx

line 25: Don't write namespace usings before an #include. (#59 [sutter])

References

[sutter]

(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) C++ Coding Standards (http://www.gotw.ca/publications/c++cs.htm), by Herb Sutter and Andrei Alexandrescu