Cluster Computing with HTCondor

submitter.icecube.wisc.edu

IceCube has an HTCondor compute cluster at UW-Madison, colloquially called NPX:

dschultz@pub1 ~ $ ssh submitter

================ This is the submit node for the NPX cluster ===============

Policies, Condor documentation, howtos, best practices, etc. can be found
at https://wiki.icecube.wisc.edu/index.php/Condor. Here are highlights:

    * By default, maximum job runtime is 12 hours
    * By default, jobs are allocated 1 CPU core, 1GB of memory, 1GB of disk

dschultz@submitter ~ $

Use this machine for submitting cluster jobs.

Basics of a Cluster Job

You should already have a script or program that you run to create/analyze data and simulation. To run this on a cluster, you need to eliminate:

  • user input
  • screen graphics
  • any other interactive features

The basic job framework is strictly file-based input/output. This maps well onto IceTray processing, since a job can simply be an I3Reader at the front, an I3Writer at the end, and your modules in between.
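
As a minimal sketch (assuming a standard IceTray environment is loaded; file names come from the command line), such a job is just an I3Reader, your modules, and an I3Writer:

#!/usr/bin/env python
# minimal sketch of a file-based IceTray job: one input file in, one output file out
import sys
from I3Tray import I3Tray
from icecube import dataio

infile, outfile = sys.argv[1], sys.argv[2]

tray = I3Tray()
tray.AddModule('I3Reader', 'reader', Filename=infile)
# ... your processing modules would go here ...
tray.AddModule('I3Writer', 'writer', Filename=outfile)
tray.Execute()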

HTCondor Hello World

HTCondor has many options, but we'll focus on just a few. Let's start with the most basic job possible: hello world.

First, we need a script to run:

#!/bin/sh
echo Hello World!

Always verify that the script does what you want before running it on the cluster:

dschultz@submitter ~/test/helloworld $ chmod +x hello.sh
dschultz@submitter ~/test/helloworld $ ./hello.sh
Hello World!
dschultz@submitter ~/test/helloworld $

Good, that does what we want.

Submit File

Now, let's make the HTCondor submit file:

# this is the script we want to run
executable =  hello.sh

# some logging in case we have to debug things
log = hello.log
output = hello.out
error = hello.err

# don't send me any emails about job status
notification = never

# add the job to the queue
queue

Submit the Job

And finally we tell HTCondor to run our job:

dschultz@submitter ~/test/helloworld $ condor_submit hello.submit
Submitting job(s).
1 job(s) submitted to cluster 20556929.
dschultz@submitter ~/test/helloworld $

If the cluster is not very busy, this will run almost immediately and produce the output:

dschultz@submitter ~/test/helloworld $ cat hello.out
Hello World!
dschultz@submitter ~/test/helloworld $

Checking Job Status

For longer jobs, which may take several hours, we can view the status of the job via condor_q:

dschultz@submitter ~/test/helloworld $ condor_q dschultz


-- Submitter: submit.icecube.wisc.edu : <10.128.12.110:34298> : submit.icecube.wisc.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
20556929.0 dschultz          5/21 10:30   0+00:00:00 I  0   0.0  hello.sh

1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
dschultz@submitter ~/test/helloworld $

Look, we have an idle job; it's waiting for a free slot on the cluster.

Job States

The most common HTCondor job states are:

  • I: idle
  • R: running
  • H: held (hit time limit or other administrative error)
  • C: complete (will disappear from queue in next update cycle)
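
Once the underlying problem is fixed, a held job can be put back in the queue with condor_release (the job ID here is just the one from the hello world example above):

condor_release 20556929.0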

Selecting Resources

Say we have a high-memory job that requires 8 GB of memory to run. While it may still run under the default 1 GB setting, it is polite (and safer) to request what you actually use, so the scheduler can place the job on a machine with enough free memory. Modify the submit file to:

executable =  hello.sh
log = hello.log
output = hello.out
error = hello.err
notification = never

# request more memory (request_memory is in MB, so 8000 = 8 GB)
request_memory = 8000

queue

Other things that can be requested in this manner are disk space (request_disk), CPUs (request_cpus), and GPUs (request_gpus).
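
For example, a hypothetical job needing 4 CPU cores and 2 GB of scratch disk could add the following lines to its submit file (units can be written out explicitly to avoid ambiguity):

request_cpus = 4
request_disk = 2 GB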

GPU Resources

GPU jobs are similarly defined with a GPU request in the submit file:

executable =  hello.sh
log = hello.log
output = hello.out
error = hello.err
notification = never

# gpu job
request_gpus = 1
# uncomment to require a CUDA-capable (Nvidia) GPU
#requirements = CUDACapability

queue

Note the CUDACapability requirement if your program needs an Nvidia/CUDA GPU.

Groups

We also have accounting groups to denote special conditions. The main one is the long group.

The default group sets a job time limit of 12 hours, after which the job is put on hold. The long group raises this limit to 48 hours, at the expense of fewer jobs running at one time. Add a job to the long group with the following submit file:

executable =  hello.sh
log = hello.log
output = hello.out
error = hello.err
notification = never

# long job
+AccountingGroup="long.$ENV(USER)"

queue

Advanced Submission

HTCondor submission has recently gained new features that make it more flexible:

executable =  hello.sh
log = hello.log
output = hello.out
error = hello.err
notification = never

# set arguments to executable
arguments = $(Item)

queue 1 in (1,2,3,4,5,6,7,8,9,10)

This will make 10 jobs, one for each argument.

Advanced Submission (2)

If you want to run over a set of input files, you can do:

queue 1 Item matching (*.i3.gz)

This will start one job per input file, passing the filename as an argument to the executable.

You can also set up a separate file to hold the arguments:

queue 1 Item from arguments.txt

Each line in arguments.txt will create a separate job with that line passed as an argument to the executable.
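
For example, a hypothetical arguments.txt with one input file per line:

input.i3.bz2
input2.i3.bz2
input3.i3.bz2

Combined with arguments = $(Item), this would create three jobs, one per line.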

DAGMan

DAGMan is a tool that comes bundled with HTCondor. It can do two useful things:

  • Control number of running jobs
  • Handle inter-job dependencies

Also, DAGMan = Directed Acyclic Graph Manager.

DAGMan Basics

Let's make a basic DAG submit file:

# file name: dagman.submit
JOB job1 job.condor
VARS job1 Filenum="001"
JOB job2 job.condor
VARS job2 Filenum="002"
JOB job3 job.condor
VARS job3 Filenum="003"
JOB job4 job.condor
VARS job4 Filenum="004"

DAGMan Basics (2)

And a regular condor submit file to run:

# file name: job.condor
# special variables:
#   Filenum = Filenum var defined in dagman.submit
Executable = job.sh
Arguments = $(Filenum)

output = job.$(Cluster).out
error = job.$(Cluster).err
log = job.log

notification = never

queue

DAGMan Basics (3)

And a script to run:

#!/bin/sh
# file name: job.sh
echo $@

DAGMan Basics (4)

And submit it, limiting to 2 active jobs:

dschultz@submitter ~/test/dagman $ chmod +x job.sh
dschultz@submitter ~/test/dagman $ condor_submit_dag -maxjobs 2 dagman.submit

-----------------------------------------------------------------------
File for submitting this DAG to Condor           : dagman.submit.condor.sub
Log of DAGMan debugging messages                 : dagman.submit.dagman.out
Log of Condor library output                     : dagman.submit.lib.out
Log of Condor library error messages             : dagman.submit.lib.err
Log of the life of condor_dagman itself          : dagman.submit.dagman.log

Submitting job(s).
1 job(s) submitted to cluster 21135967.
-----------------------------------------------------------------------
dschultz@submitter ~/test/dagman $

DAGMan Basics (5)

We can see the DAGMan job running:

dschultz@submitter ~/test/dagman $ condor_q dschultz


-- Submitter: submit.icecube.wisc.edu : <10.128.12.110:34298> : submit.icecube.wisc.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
21135967.0   dschultz        6/2  14:31   0+00:00:38 R  0   0.3  condor_dagman -f -

1 jobs; 0 completed, 0 removed, 0 idle, 1 running, 0 held, 0 suspended
dschultz@submitter ~/test/dagman $

DAGMan Basics (6)

Looking at the DAGMan output messages (dagman.submit.dagman.out), we can see the job limit in effect:

...
06/02/14 14:40:49 Of 4 nodes total:
06/02/14 14:40:49  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
06/02/14 14:40:49   ===     ===      ===     ===     ===        ===      ===
06/02/14 14:40:49     0       0        2       0       2          0        0
06/02/14 14:40:49 0 job proc(s) currently held
06/02/14 14:40:49 Note: 2 total job deferrals because of -MaxJobs limit (2)
...

Job Dependencies

Now let's look at an example with dependencies, where some jobs must finish before another one can start.

Let's make a DAG with three parents and one child (think of three processing jobs followed by a cleanup job):

# file name: dagman.submit
JOB job1 job.condor
VARS job1 Filenum="001"
JOB job2 job.condor
VARS job2 Filenum="002"
JOB job3 job.condor
VARS job3 Filenum="003"
JOB job4 job.condor
VARS job4 Filenum="004"
# define the DAG relationship
Parent job1 job2 job3 Child job4

Job Dependencies (2)

And submit it, limiting to 2 active jobs:

dschultz@submitter ~/test/dagman $ condor_submit_dag -maxjobs 2 dagman.submit

-----------------------------------------------------------------------
File for submitting this DAG to Condor           : dagman.submit.condor.sub
Log of DAGMan debugging messages                 : dagman.submit.dagman.out
Log of Condor library output                     : dagman.submit.lib.out
Log of Condor library error messages             : dagman.submit.lib.err
Log of the life of condor_dagman itself          : dagman.submit.dagman.log

Submitting job(s).
1 job(s) submitted to cluster 21139114.
-----------------------------------------------------------------------
dschultz@submitter ~/test/dagman $

Job Dependencies (3)

Looking at the DAGMan output messages, we can see the child job stays un-ready until the three parents have finished:

...
06/02/14 15:45:31 Of 4 nodes total:
06/02/14 15:45:31  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
06/02/14 15:45:31   ===     ===      ===     ===     ===        ===      ===
06/02/14 15:45:31     0       0        2       0       1          1        0
06/02/14 15:45:31 0 job proc(s) currently held
06/02/14 15:45:31 Note: 1 total job deferrals because of -MaxJobs limit (2)
...
06/02/14 15:45:56 Of 4 nodes total:
06/02/14 15:45:56  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
06/02/14 15:45:56   ===     ===      ===     ===     ===        ===      ===
06/02/14 15:45:56     3       0        0       0       1          0        0
06/02/14 15:45:56 0 job proc(s) currently held
...

IceTray Job

Let's first get a basic IceTray script:

dschultz@submitter ~/test/icetray $ wget http://code.icecube.wisc.edu/svn/sandbox/bootcamp_madison_2014/my_first_icetray_script.py
dschultz@submitter ~/test/icetray $ chmod +x my_first_icetray_script.py
dschultz@submitter ~/test/icetray $

Now for some examples of how to submit IceTray jobs.

Basic Submission

When you only have a few jobs to run, basic HTCondor submission is fine:

# file name: job.condor
Executable = my_first_icetray_script.py
output = job.$(Cluster).out
error = job.$(Cluster).err
log = job.log
notification = never
# use the current metaproject environment
getenv = True

# run on input
Arguments = input.i3.bz2 output.i3.bz2
queue

# run on input2
Arguments = input2.i3.bz2 output2.i3.bz2
queue

DAGMan Submission

For larger numbers of jobs, DAGMan should be used. Here is the dagman submit file:

# file name: dagman.submit
JOB job1 job.condor
VARS job1 gcd="gcd.i3.gz"
VARS job1 infilename="input.i3.bz2"
VARS job1 outfilename="output.i3.bz2"
JOB job2 job.condor
VARS job2 gcd="gcd.i3.gz"
VARS job2 infilename="input2.i3.bz2"
VARS job2 outfilename="output2.i3.bz2"

DAGMan Submission (2)

And the condor submit file:

# file name: job.condor
Executable = my_first_icetray_script.py
output = job.$(Cluster).out
error = job.$(Cluster).err
log = job.log
notification = never

# use the current metaproject environment
getenv = True

Arguments = $(gcd) $(infilename) $(outfilename)
queue

SubmitMyJobs

Many wrapper scripts exist to do some of the tedious work of submission, such as modifying the basic template to match valid filenames for the current run(s). Here are some examples.

This shell script will build the DAG submit file for you, based on the input directory you give it. Some customization of the script may be necessary each time it is used.

Shell Script:

#!/bin/sh
# build a DAG submit file: one node per numbered .i3.bz2 file in the given directory
for i in $1/*[0123456789].i3.bz2; do
    # name the DAG node after the input file
    JOBID=job.`basename $i`
    echo JOB $JOBID job.condor
    # derive the matching GCD file name from the data file name
    gcdfile=`echo $i | sed s/Part.*.i3.bz2/GCD.i3.gz/g`
    echo VARS $JOBID gcd=\"$gcdfile\"
    echo VARS $JOBID infilename=\"$i\"
    echo VARS $JOBID outfilename=\"data/`basename $i`\"
done
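
For example (saving the script above as builddag.sh; the input directory here is hypothetical), redirect its output into a DAG file and submit it:

./builddag.sh /data/exp/my_run_directory > dagman.submit
condor_submit_dag dagman.submit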

This series of scripts can be used to submit one or multiple jobs. It consists of a single job submit file, the DAG builder, and a shell script to submit and begin monitoring in one step. There is some documentation both in a README and in comments in each file.

condorDAGManExamples is located in SVN:

dschultz@cobalt01 ~ $ svn ls http://code.icecube.wisc.edu/svn/sandbox/gladstone/condorDAGManExamples/
OneJob.submit
README
SubmitMyJobs.sh
builddag.sh
dagman.config

Troubleshooting

#1 advice: get on Slack and ask questions. We promise to be nice.

My job hasn't started running yet. Why not?

If the queue is full, you may need to wait up to an hour for your job to start. Also, if you have been running lots of other jobs, your priority may be lower than that of other users.

You can check your priority with condor_userprio.

If you think your job should be running and it isn't, then debugging can start. First, find the ID of the job. Then run condor_q -better-analyze on that ID.
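
For example (the job ID is just the one from the hello world example earlier):

# show user priorities and find your own entry
condor_userprio | grep $USER
# ask the scheduler why this job is not matching any machines
condor_q -better-analyze 20556929.0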

My jobs were running, but aren't anymore. What happened?

Try running condor_q -hold $USER. This will tell you if your jobs have been stopped, and hopefully why. If nothing appears, either your jobs failed or they will restart automatically.

A common error message is:

-- Submitter: submitter.icecube.wisc.edu : <10.128.12.110:46424> : submitter.icecube.wisc.edu
 ID          OWNER          HELD_SINCE  HOLD_REASON
21482446.0   briedel         6/9  21:55 Maximum excution time exceeded: 12:01:24 > 12:00:00

The problem here is that this job ran in the standard group and exceeded the 12-hour time limit. It should be resubmitted in the long group (see the Groups section above).

condor_q fails

Sometimes condor_q or other HTCondor commands fail with an error like:

-- Failed to fetch ads from: <10.128.12.110:40381> : submitter.icecube.wisc.edu
CEDAR:6001:Failed to connect to <10.128.12.110:40381>

HTCondor is likely overloaded, so stop trying to ask it things. Wait 5 minutes and try again.

If it fails to work for 30 minutes, then there might be a real problem. Email help@icecube.wisc.edu with the error message.

Exercise

To practice submitting to HTCondor and doing some real work, let's find all events that pass the minimum bias filter in the first 100 Level2 files in this directory:

/data/sim/IceCube/2011/filtered/level2/CORSIKA-in-ice/10668/01000-01999/

Some hints:

  • A magic shebang is:

    #!/bin/sh /cvmfs/icecube.opensciencegrid.org/standard/icetray-start
    #METAPROJECT: offline-software/trunk
    
  • Be sure to check that the prescale passed too:

    frame['FilterMask'][filter].condition_passed and
    frame['FilterMask'][filter].prescale_passed
    

Exercise Answers

A script to process each file:

#!/bin/sh /cvmfs/icecube.opensciencegrid.org/standard/icetray-start
#METAPROJECT: offline-software/trunk
import sys, os

input = sys.argv[1]
output = os.path.join(sys.argv[2], os.path.basename(input))

from icecube import dataclasses, dataio

outfile = dataio.I3File(output, 'w')
try:
    for frame in dataio.I3File(input):
        if 'FilterMask' in frame:
            minbias = frame['FilterMask']['FilterMinBias_11']
            # keep the event only if both the filter condition and the prescale passed
            if minbias.condition_passed and minbias.prescale_passed:
                outfile.push(frame)
finally:
    outfile.close()

Exercise Answers (2)

The condor submit file:

executable = script.py
output = out.$(Process)
error = err.$(Process)
log = log
notification = never

arguments = $(Item) /data/user/dschultz/bootcamp_output

queue 1 Item matching (/data/sim/IceCube/2011/filtered/level2/CORSIKA-in-ice/10668/01000-01999/Level2_IC86.2011_corsika.010668.0010*.i3.bz2)

Links