This document briefly describes standard operating procedures for running the IceCube Testing DAQ, which was integrated at the South Pole in January/February of 2005. Common problems and their solutions are outlined.
Both the DOMhub and the DOMs are expected to be pre-installed with the necessary software.
The DOMhub installation procedure (see http://docushare.icecube.wisc.edu/docushare/dsweb/View/Collection-979 ) consists of an automated "minimal-install" process, followed by a secondary post-installation process. The initial process uses a specialized kickstart file to install the operating system (Red Hat 9), at the end of which a "minimal install" script runs. The secondary process installs additional files from a large zip file.
The DOM software installation is also a two-step process. It consists of uploading the chosen release to the DOMs, followed by uploading the latest version of DomCal to the DOMs.
To upload the chosen release: (takes about 15 minutes)
Issue command 'ps -ef' to verify that no programs are running which might
hold on to the DOM device ports (look for and terminate programs like
domterm and dtsx and domhub-app).
Issue command 'ldall release.hex'. At the time of this writing the chosen
releases could be found in /home/testdaq/releases/release308 and
/home/testdaq/releases/release-pole-fb-01. The latter release has all the
functionality of release308, and additionally includes flasherboard support.
Issue command 'echo "0" > /proc/driver/domhub/blocking'
Issue command 'off all'
To upload DomCal: (takes about 3 minutes)
Issue command 'on all'
Issue command 'gotoiceboot'
Issue command 'upload_domcal'. This will upload the file
/home/testdaq/domcal5.bin.gz to all accessible DOMs.
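Every procedure in this document starts by checking "ps -ef" for programs that might hold on to the DOM device ports. The Python sketch below automates only the looking part; it does not terminate anything. It assumes the standard eight-column "ps -ef" layout (UID PID PPID C STIME TTY TIME CMD), and the program names come from the procedures in this document, so extend the list if other tools turn out to hold the ports.

```python
import subprocess

# Programs the procedures in this document say to look for and terminate.
PORT_HOLDERS = ("domterm", "domserv", "dtsx", "domhub-app")


def find_port_holders(ps_output: str) -> list[tuple[str, str]]:
    """Return (pid, command) pairs from `ps -ef` text that mention a port holder.

    Matching is a plain substring test on the command column, so it errs on
    the side of reporting too much (e.g. a "grep dtsx" line); review the list
    before killing anything.
    """
    hits = []
    for line in ps_output.splitlines()[1:]:  # first line is the header row
        fields = line.split(None, 7)  # CMD is the 8th column and may contain spaces
        if len(fields) < 8:
            continue
        pid, cmd = fields[1], fields[7]
        if any(name in cmd for name in PORT_HOLDERS):
            hits.append((pid, cmd))
    return hits


def current_ps_output() -> str:
    """Run `ps -ef` on this machine and return its text output."""
    return subprocess.run(["ps", "-ef"], capture_output=True, text=True, check=True).stdout
```

On a DOMhub, `for pid, cmd in find_port_holders(current_ps_output()): print(pid, cmd)` prints the candidates; killing them is still left to the operator.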
To run lcchain.py:
Issue command 'ps -ef' to verify that no programs are running which might
hold on to the DOM device ports (look for and terminate programs like
domterm and domserv and dtsx and domhub-app).
Issue command 'off all'
Issue command 'on all'
Issue command 'gotoiceboot'
Issue command 'dtsxall'
lcchain.py arguments indicate the top and/or bottom DOM in the chain you wish to test.
For example:
lcchain.py -h localhost -s 001 -e 710
The above command will test the local coincidence signals for a
complete DOMhub. "001" represents card 0, wire pair 0, DOM B (the topmost
DOM on the DOMhub), and
"710" represents card 7, wire pair 1, DOM A (the bottommost DOM on the
DOMhub).
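The three-character identifiers passed to lcchain.py encode DOR card, wire pair and DOM. The sketch below decodes and re-encodes them. It infers from the two examples above that the last digit is 1 for DOM B and 0 for DOM A; the 0-7 card range and the helper names are assumptions for illustration, not part of lcchain.py.

```python
from typing import NamedTuple


class DomPosition(NamedTuple):
    card: int  # DOR card; assumed 0-7, since card 7 holds the bottommost DOM
    pair: int  # wire pair on the card, 0 or 1
    dom: str   # "A" or "B"


def parse_dom_id(dom_id: str) -> DomPosition:
    """Decode an identifier such as "001" (card 0, pair 0, DOM B)."""
    if len(dom_id) != 3 or not all(c in "0123456789" for c in dom_id):
        raise ValueError(f"expected three digits, got {dom_id!r}")
    card, pair, ab = (int(c) for c in dom_id)
    if card > 7 or pair > 1 or ab > 1:
        raise ValueError(f"out-of-range DOM identifier {dom_id!r}")
    return DomPosition(card, pair, "B" if ab == 1 else "A")


def format_dom_id(card: int, pair: int, dom: str) -> str:
    """Inverse of parse_dom_id: (7, 1, "A") -> "710"."""
    if dom not in ("A", "B"):
        raise ValueError(f"DOM must be 'A' or 'B', got {dom!r}")
    return f"{card}{pair}{1 if dom == 'B' else 0}"
```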
IceTop has a special set of "circular" LC hookups. An example of how to use lcchain.py on an IceTop station (if you are logged into sps-ithub-cont01) is:
lcchain.py -i -h localhost -s 000
The above command will test all DOMs within the station, which spans DOR cards 0 and 1. To test the next station you would run lcchain.py again with the command:
lcchain.py -i -h localhost -s 200
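If you test several IceTop stations in turn, the start identifier can be computed rather than typed. This sketch generalizes the two examples above under the assumption that every station occupies two consecutive DOR cards (station 0 at "000", station 1 at "200", and so on); the function name is hypothetical.

```python
def icetop_lcchain_command(station_index: int, host: str = "localhost") -> list[str]:
    """Build the lcchain.py argument list for one IceTop station.

    Assumption: every station spans two consecutive DOR cards, so station 0
    starts at "000" (cards 0 and 1) and station 1 at "200" (cards 2 and 3).
    """
    start_card = 2 * station_index
    if not 0 <= start_card <= 6:
        raise ValueError(f"no IceTop station {station_index} on a 0-7 card DOMhub")
    return ["lcchain.py", "-i", "-h", host, "-s", f"{start_card}00"]
```

On sps-ithub-cont01, `subprocess.run(icetop_lcchain_command(1), check=True)` would run the second command shown above.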
Another set of higher level tests which you need to do both as
part of DOM commissioning and then again at regular intervals thereafter is to
run the Simple Test Framework (STF). To run STF on a DOMhub: (takes about 30
minutes)
Issue command 'ps -ef' to verify that no programs are running which might
hold on to the DOM device ports (look for and terminate programs like
domterm and dtsx and domhub-app).
Issue command 'off all'
Issue command 'source setclasspath /home/testdaq/work-stf/'
Issue command 'java icecube.daq.stf.STF'
A window will appear (if it does not, then you probably cannot connect to the database on sps-dbs).
Use pull-down menu 'Connect --> Open direct DOR session'
This will turn on all DOMs and put them into the "STF" state.
Use pull-down menu 'File --> Load tests'
Click once on the directory "All-tests" (do not double click)
Use pull-down menu 'Start --> Select all DOMs'
Use pull-down menu 'Start --> Select all Tests'
Use pull-down menu 'Start --> Run'
You now have to answer three questions:
How many iterations? Answer between 1 and 5
Enter DOM temperature. Put in your best guess
Test integrated DOMs. Answer "Y".
STF will now open up a much larger window where you can see a grid of green (for passing) and red (for failing) boxes as the tests are run on each DOM.
You can double click on the boxes to get information about individual tests. Additionally, there is a file on sps-stringproc01 called /mnt/data/testdaq/useful/mysql.txt which shows you some example queries you could use to extract STF information from the database at a later date.
If a particular test fails for a DOM then that test should be repeated several times to verify whether or not the DOM has a problem.
If you have trouble running STF because of a badly communicating DOM,
you can try the following (no guarantees here). You may first want to verify
that the flash download was successful, for example by putting the DOM into
iceboot.
Issue command "on all".
Turn off the wire pair giving you problems. If card 4, wire pair 0 is the
culprit, then the command would be "off 4 0".
Then try to run STF again. If that still doesn't work, repeat the above steps,
followed by "gotoiceboot", and then run STF one more time.
Prior to running DomCal, DOMs should already be at their nominal operating temperature. If they are not, it is necessary to warm them up. Do this by turning them on and putting them in iceboot mode (they draw more current in iceboot mode than in configboot mode, and hence they warm up faster). A cold DOM probably takes about two hours to warm up properly.
To run DomCal: (takes about 30 minutes)
On each DOMhub for which you wish to calibrate DOMs:
Issue command "ps -ef" to verify that no programs are running which might
hold on to the DOM device ports (look for and terminate programs like
domterm and dtsx and domhub-app).
Issue command "off all"
Issue command "on all"
Issue command "gotoiceboot"
Issue command "dtsxall"
Then, from the appropriate string processor:
Issue command "cd /mnt/data/testdaq/domcal"
Issue command "nohup java icecube.daq.domcal.DOMCal DOMHUB-MACHINE-NAME 5000 64 /mnt/data/testdaq/domcal/ calibrate dom calibrate hv &".
At the pole, DOMHUB-MACHINE-NAME is one of sps-ichub-cont01,
sps-ichub-cont02 or sps-ithub-cont01. Feel free to run DomCal on multiple
DOMhubs in parallel.
At the end (use "ps -ef" to see whether DomCal is still running), make sure that
all DomCal files have been successfully updated in the /mnt/data/testdaq/domcal output
directory. Do "ls -altr" and "ls | grep domcal -c" to verify this.
Problems? Check /mnt/data/testdaq/domcal/nohup.out. Database access errors can cause DomCal to hang.
Problem PMTs can cause individual HV calibrations to fail. If you suspect an
HV calibration problem, try running DomCal on one DOM without the argument
"calibrate hv" - and make sure you get the port numbers right!
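When scripting DomCal runs for several DOMhubs, it can help to build the command line in one place. The sketch below reproduces the command given above, with the HV calibration optional as suggested for suspect PMTs. The values 5000 and 64 are copied verbatim (their meaning is not explained here), so a single-DOM run, which needs different port numbers, is not covered. Launching through Python's subprocess module with output appended to nohup.out is an assumed stand-in for the shell's "nohup ... &"; the function names are hypothetical.

```python
import subprocess

POLE_HUBS = ("sps-ichub-cont01", "sps-ichub-cont02", "sps-ithub-cont01")
DOMCAL_OUTPUT_DIR = "/mnt/data/testdaq/domcal/"


def domcal_command(hub: str, calibrate_hv: bool = True) -> list[str]:
    """Argument list for DOMCal on one whole DOMhub, mirroring the documented command."""
    if hub not in POLE_HUBS:
        raise ValueError(f"unknown DOMhub {hub!r}")
    # "5000" and "64" are copied verbatim from the documented command line.
    cmd = ["java", "icecube.daq.domcal.DOMCal", hub, "5000", "64",
           DOMCAL_OUTPUT_DIR, "calibrate", "dom"]
    if calibrate_hv:
        cmd += ["calibrate", "hv"]
    return cmd


def launch_domcal(hub: str, calibrate_hv: bool = True) -> subprocess.Popen:
    """Start DOMCal detached from the terminal, appending output to nohup.out."""
    with open(DOMCAL_OUTPUT_DIR + "nohup.out", "a") as log:
        # The child keeps its own copy of the log descriptor after we close ours.
        return subprocess.Popen(domcal_command(hub, calibrate_hv), cwd=DOMCAL_OUTPUT_DIR,
                                stdout=log, stderr=subprocess.STDOUT,
                                start_new_session=True)
```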
It is important that there be one DomCal file for each DOM that the
string processor is going to take data with. Hence, my advice is to have
up-to-date DomCal files for all DOMs on sps-stringproc01 (the InIce string
processor), and DomCal files for only IceTop DOMs on sps-icetop01 (the IceTop
string processor). The up-to-date DomCal files must reside in the directory
/mnt/data/testdaq/domcal on each machine.
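The "ls -altr" and "ls | grep domcal -c" checks can be combined into one report. This sketch counts visible files whose names contain "domcal" (as the grep does) and flags any that were not rewritten since a given start time. Recording time.time() just before launching DomCal, and the report layout, are assumptions; the expected DOM count is whatever you know the string processor will take data with.

```python
from pathlib import Path


def domcal_files(directory: str) -> list[Path]:
    """Visible entries whose names contain "domcal", like `ls | grep domcal`."""
    return sorted(p for p in Path(directory).iterdir()
                  if not p.name.startswith(".") and "domcal" in p.name)


def domcal_report(directory: str, expected_doms: int, started_at: float) -> dict:
    """Summarize whether a DomCal pass left one fresh file per DOM.

    started_at is a Unix timestamp (e.g. time.time() recorded just before
    launching DomCal); files older than that were not rewritten by this pass.
    """
    files = domcal_files(directory)
    stale = [p.name for p in files if p.stat().st_mtime < started_at]
    return {"count": len(files), "expected": expected_doms,
            "count_ok": len(files) == expected_doms, "stale": stale}
```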
Steering files for 60 DOMs are typically about 200 kBytes, so it is impractical to write them by hand. There is a program called autogen-wrapper which will generate several different types of steering files in the directory from which you run it (so be careful where you run it from). Then you can select the steering files you want to use, and make whatever small modifications you deem necessary.
To run autogen-wrapper:
On each DOMhub for which you wish to include DOMs in your runs:
Issue command "ps -ef" to verify that no programs are running which might
hold on to the DOM device ports (look for and terminate programs like
domterm and dtsx and domhub-app).
Issue command "off all"
Issue command "on all"
Issue command "gotoiceboot"
Issue command "dtsxall"
Then, from the appropriate string processor:
Edit the file /mnt/data/testdaq/bin/autogen-wrapper to include the DOMhubs
for which you wish to generate steering files.
Make any modifications you need to make at the bottom of
/mnt/data/testdaq/bin/autogen-steering-LC in order to make a flasherboard
steering file for the DOM of your choice.
Issue command "autogen-wrapper" (remember to be careful about which
directory you are in when you run it!)
Now you are ready to modify the steering file parameters. "executionTime" is a typical parameter you may wish to modify - I advise you not to set the executionTime to be less than 45 seconds, because of configuration timing issues. Also, a 10 minute (600 second) run can produce a very large .hit file (400MBytes plus). If the .hit file is going to be copied over the satellite, I advise you not to use executionTimes above around 600 seconds.
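The executionTime advice above can be encoded as a quick check before starting a run. The 45 s floor and 600 s satellite ceiling come from the paragraph above; the size estimate assumes the .hit file grows linearly from the single data point given (about 400 MB for 600 s), which is only a rough guide since real sizes depend on the number of DOMs and their hit rates. Names are hypothetical.

```python
MIN_EXECUTION_TIME_S = 45             # shorter runs hit configuration timing issues
SATELLITE_MAX_EXECUTION_TIME_S = 600  # suggested ceiling if the .hit file goes over the satellite
# One observed data point: a 600 s run gave a .hit file of roughly 400 MB or more.
# Assuming the file grows linearly with run length gives this rate.
HIT_MB_PER_SECOND = 400 / 600


def estimated_hit_file_mb(seconds: float) -> float:
    """Rough .hit size under the linear-growth assumption above."""
    return seconds * HIT_MB_PER_SECOND


def check_execution_time(seconds: float, copy_over_satellite: bool) -> list[str]:
    """Return warnings for an executionTime value, following the advice above."""
    warnings = []
    if seconds < MIN_EXECUTION_TIME_S:
        warnings.append(f"executionTime {seconds} s is below {MIN_EXECUTION_TIME_S} s")
    if copy_over_satellite and seconds > SATELLITE_MAX_EXECUTION_TIME_S:
        warnings.append(
            f"executionTime {seconds} s may give a .hit file of about "
            f"{estimated_hit_file_mb(seconds):.0f} MB or more, too large for the satellite")
    return warnings
```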
On each DOMhub for which you wish to include DOMs in your runs:
Issue command "ps -ef" to verify that no programs are running which might
hold on to the DOM device ports (look for and terminate programs like
domterm and dtsx and domhub-app).
Issue command "off all"
Issue command "ready". You should see the following line at the bottom of
domhubapp.log, which the "ready" command tails on each hub (Ctrl-C
stops only the tail; it will not stop domhub-app):
[main] DEBUG (DOMHub.java:123) - Waiting for RMI method calls
Then, from the appropriate string processor:
Issue command "ps -ef" to verify that no programs are running which might
be in conflict. You do not want multiple copies of "automate" or "testdaq"
running.
Issue command "go"
InIce runs should be taken from SPS-STRINGPROC01
IceTop runs should be taken from SPS-ICETOP01
Combined InIce-IceTop runs should be taken from SPS-STRINGPROC01.
Steering files for combined runs should have the phrase "IniceIcetop" in their names.
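Starting a run from the wrong string processor is an easy mistake. This sketch checks the host and, for combined runs, the "IniceIcetop" naming rule listed above. The RunType names and the example steering file names are hypothetical; the host is compared case-insensitively with any domain suffix stripped, on the assumption that the machines may report fully qualified names.

```python
from enum import Enum


class RunType(Enum):
    INICE = "InIce"
    ICETOP = "IceTop"
    COMBINED = "InIce-IceTop"


# String processor each run type should be started from, per the list above.
RUN_HOST = {
    RunType.INICE: "sps-stringproc01",
    RunType.ICETOP: "sps-icetop01",
    RunType.COMBINED: "sps-stringproc01",
}


def check_run_setup(run_type: RunType, steering_file: str, hostname: str) -> list[str]:
    """List reasons why a run should not be started as configured."""
    problems = []
    short_host = hostname.split(".")[0].lower()  # tolerate FQDNs and upper case
    if short_host != RUN_HOST[run_type]:
        problems.append(f"{run_type.value} runs belong on {RUN_HOST[run_type]}, not {hostname}")
    if run_type is RunType.COMBINED and "IniceIcetop" not in steering_file:
        problems.append(f"combined steering file {steering_file!r} lacks 'IniceIcetop'")
    return problems
```

Passing `socket.gethostname()` as the hostname lets the check run on the machine you are logged into.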
To stop a run cleanly from the string processor, issue the command "stoptestdaqclean". This will kill the program "automate". This means that no new runs will be started. The current run will go to completion (unless either domhubapp or the DOMs are shut down prematurely on any of the DOMhubs). Stopping a run this way also makes it possible to later start up runs again without having to issue any commands on any of the DOMhubs. It is necessary to ensure that data-accumulation for the current run has finished before you try to start up any new runs (use "ps -ef").
To stop all activity on the string processor immediately (including ongoing post-run processing by the program "background_it.pl"), issue the command "stoptestdaq". In this case, you must also issue the command "stoptestdaq" on each of the DOMhubs, since the DOMs will likely be left in a strange state as a result of the abrupt end to data-taking.
It is usually desirable to leave the DOMs powered up with their high voltage on (for temperature and high voltage stability). If not taking TestDAQ data, then it is suggested you take multimon data instead.
TestDAQ runs need to be monitored regularly, especially since there is not yet much experience in running TestDAQ at the pole. Initially you should watch (via "tail -f") the domhubapp.log file on each DOMhub, as well as the testdaq.log file on the string processor and the nohup.out file in /mnt/data/testdaq, and keep an eye on the size of the files in the output directory on the local disk. You should see the size of the files increasing (especially the .hit file), and you should not see any obvious errors in either the domhubapp.log file or the testdaq.log file.
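Watching file sizes by eye is tedious, so here is a sketch that takes two size snapshots of the output directory and reports .hit files that did not grow in between. The output directory path is not given in this document, so it is a parameter; the five-minute default interval is an arbitrary assumption. A file that stops growing is only suspicious while data is still being accumulated.

```python
import time
from pathlib import Path


def snapshot_sizes(directory: str, suffix: str = ".hit") -> dict[str, int]:
    """Map file name -> size in bytes for files in directory ending in suffix."""
    return {p.name: p.stat().st_size for p in Path(directory).iterdir()
            if p.is_file() and p.name.endswith(suffix)}


def stalled_files(before: dict[str, int], after: dict[str, int]) -> list[str]:
    """Names present in both snapshots whose size did not grow between them."""
    return sorted(name for name, size in after.items()
                  if name in before and size <= before[name])


def watch_for_stall(directory: str, interval_s: float = 300) -> list[str]:
    """Take two snapshots interval_s apart and report files that stopped growing."""
    first = snapshot_sizes(directory)
    time.sleep(interval_s)
    return stalled_files(first, snapshot_sizes(directory))
```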
When the run is over, you should look at the .dataqualitylog when it becomes available. Look at the hit rates for each DOM to see if they make sense. Look for the message "Requested DOM not present in hit stream". If you haven't explicitly turned this DOM off in the exclude DOM section of dh.properties, then something is likely wrong.
If you want to scan through a bunch of runs then cd to /mnt/data/testdaq/latest_data and issue the following commands:
grep -i error */*.testdaqcontrollog
grep -i exception */*.testdaqcontrollog
grep -i requested */*.dataqualitylog
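The three grep commands above can also be run as one pass that returns structured results, which is handy if you want to sort or count the hits. This sketch mirrors them: the same glob patterns relative to the run directory, and the same case-insensitive words. As with separate greps, a line matching two words is reported twice. The function name is hypothetical.

```python
import re
from pathlib import Path

# (glob pattern, case-insensitive word) pairs mirroring the three grep commands above.
CHECKS = (
    ("*/*.testdaqcontrollog", "error"),
    ("*/*.testdaqcontrollog", "exception"),
    ("*/*.dataqualitylog", "requested"),
)


def scan_runs(base: str = "/mnt/data/testdaq/latest_data") -> list[tuple[str, int, str]]:
    """Return (file, line number, line) for every line the grep checks would print."""
    root = Path(base)
    hits = []
    for pattern, word in CHECKS:
        rx = re.compile(re.escape(word), re.IGNORECASE)
        for path in sorted(root.glob(pattern)):
            with path.open(errors="replace") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if rx.search(line):
                        hits.append((path.relative_to(root).as_posix(), lineno,
                                     line.rstrip("\n")))
    return hits
```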
The major problem to look out for is the case where a run hangs - either while data is being accumulated or else when the run is stopping. In either of these cases TestDAQ will need to be shut down and restarted. The sooner you catch a problem like this the better.
On each DOMhub for which you wish to include DOMs in your multimon runs:
Issue command "ps -ef" to verify that no programs are running which might
hold on to the DOM device ports (look for and terminate programs like
domterm and dtsx and domhub-app).
Issue command "off all"
Issue command "on all"
Issue command "gotoiceboot"
Issue command "dtsxall"
On the appropriate string processor issue the command:
multimon-wrapper-icecube (for both inice hubs)
or
multimon-wrapper-icetop
(multimon-wrapper-icecube1 and multimon-wrapper-icecube2
will only attempt to connect to one of the inice hubs, which may be the
appropriate command depending on what is going on at the time).
Note that it is necessary to shut down multimon (use the "ps -ef" and "kill" commands) and also to power off the DOMs before they can be utilized for some other purpose.
The output of multimon is directed towards /mnt/data/testdaq/monitoring/icecube or icetop.
Multimon for the inice hubs should be run on SPS-STRINGPROC01, and multimon for the icetop hub should be run on SPS-ICETOP01.
The command "java -jar mmdislay.jar" (invoked from /mnt/data/testdaq on the string processors) starts a useful tool for looking at the output of multimon.
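The multimon rules above (which wrapper, which string processor, which output directory) can be collected into one lookup. The hub names are the ones listed in the DomCal section. Which inice hub multimon-wrapper-icecube1 and multimon-wrapper-icecube2 each connect to is not stated here, so the pairing below, and the assumption that they also write to the icecube directory, are guesses to verify before use.

```python
INICE_HUBS = {"sps-ichub-cont01", "sps-ichub-cont02"}
ICETOP_HUBS = {"sps-ithub-cont01"}
MONITORING_DIR = "/mnt/data/testdaq/monitoring"


def multimon_plan(hubs: set[str]) -> dict[str, str]:
    """Pick the multimon wrapper, string processor and output directory for a set of hubs."""
    if hubs == ICETOP_HUBS:
        return {"wrapper": "multimon-wrapper-icetop", "host": "sps-icetop01",
                "output": f"{MONITORING_DIR}/icetop"}
    if hubs == INICE_HUBS:
        wrapper = "multimon-wrapper-icecube"
    elif hubs == {"sps-ichub-cont01"}:
        wrapper = "multimon-wrapper-icecube1"  # assumed pairing; verify
    elif hubs == {"sps-ichub-cont02"}:
        wrapper = "multimon-wrapper-icecube2"  # assumed pairing; verify
    else:
        raise ValueError("run inice and icetop multimon separately, each on its own string processor")
    return {"wrapper": wrapper, "host": "sps-stringproc01",
            "output": f"{MONITORING_DIR}/icecube"}
```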