Package iceprod :: Package server :: Module db :: Class MonitorDB

Class MonitorDB


IceProdDB --+
            |
           MonitorDB

Instance Methods
 
__init__(self)
Constructor

new(self)
Create a copy of this instance

reset_old_jobs(self, grid_id, maxidletime, maxruntime, maxsubmittime, maxcopytime, maxfailures=10, maxevicttime=10, keepalive=14400)
Reset status of jobs that were queued but whose status has not changed in more than maxtime minutes

download_tasks(self, dataset_id, steering)
Get job parts from database

getNewSets(self, grid=None, dataset=0)
Get a list of new datasets.

getNewFinishedSets(self, dataset=0)
Get a list of finished datasets for which no histos have been created.

getFinishedSets(self, datasetlist=[])
Get a list of finished datasets for which no histos have been created.
 
getSetsInfo(self, dataset_id)
Get info for the given dataset.
 
AddedHisto(self, dataset)
 
update_monitoring(self, grid_id=None, dataset_id=0)
Update statistics for datasets and return all datasets which have completed

GetGridId(self, grid_name, institution=None, batchsys=None, url=None)
Retrieve the key for grid_name
 
RegisterServer(self, grid_id, server_name, server_status, server_pid)
Register a server daemon for the given grid
 
GetGridStatusChanges(self, grid_id)
Get status changes for daemons

GridRequestSuspend(self, grid, daemon)
Change status of daemons

GridRequestResume(self, grid, daemon)
Change status of daemons

GetDatasetParams(self, dataset)
Get parameters for given dataset

GetFinishedJobs(self, grid_id, max_copy=20, delay=5)
Fetch list of jobs that have completed for given grid_id
 
GetResetJobs(self, grid_id, max_reset=50)
Fetch list of jobs that have been reset for given grid_id
 
reset_old_tasks(self, grid_id, maxidletime, maxruntime, maxsubmittime, maxcopytime, maxfailures=10, maxevicttime=10, keepalive=14400)
Reset status of tasks that were queued but whose status has not changed in more than maxtime minutes
 
GetResetTasks(self, grid_id, max_reset=50)
Fetch list of tasks that have been reset for given grid_id
 
GetActiveJobs(self, grid_id)
Get list of jobs currently in any active state

GetQueuedJobs(self, grid_id, delay=5)
Get list of jobs currently in queued status
 
GetProcessingJobs(self, grid_id, delay=5)
Get list of jobs currently in processing status

GetFinishedTasks(self, grid_id, max_copy=20, delay=5)
Fetch list of tasks that have completed for given grid_id
 
GetTasks(self, grid_id, status=('QUEUED', 'QUEUEING', 'PROCESSING', 'RESET', 'ERROR'), delay=0)
Get list of tasks currently in any given status

GetActiveTasks(self, grid_id, delay=0)
Get list of tasks currently active

GetQueuedTasks(self, grid_id, delay=5)
Get list of tasks currently in queue
 
GetProcessingTasks(self, grid_id, delay=5)
Get list of tasks currently being processed
 
CheckJobDependencies(self, grid_id)
 
QueueJobs(self, maxjobs, grid_id, jobs_at_once=20, fifo=True, debug=0, maxq=1000)
Reserve at most 'maxjobs' from a given dataset.

SetTasks(self, maxjobs, grid_id, jobs_at_once=20, fifo=True, debug=0)
DAG mode: reserve at most 'maxjobs' from a given dataset.

QueueTasks(self, maxjobs, grid_id, jobs_at_once=20, fifo=True, debug=0)
DAG mode: reserve at most 'maxjobs' from a given dataset.
 
SuspendGridDataset(self, grid, dataset, suspend=1)
Update grid participation in the given dataset.
 
InitializeGridStats(self, grids, dataset_id)
Insert grid_statistics entries for grids which should run this dataset.

InitializeGridStatsDAG(self, grids, steering, dataset_id)
Insert grid_statistics entries for grids which should run this dataset.

InitializeJobTable(self, maxjobs, dataset_id, priority=0, stepsize=1000, start_qid=0, status='WAITING')
Create job monitoring entries in database

GetJob(self, dataset_id=0, queue_id=0)
Get Job info

jobstart(self, hostname, grid_id, dataset_id=0, queue_id=0, key=None)
Change the status of a job to indicate it is currently running
 
jobreset(self, dataset_id, job_id, reason=None, passkey=None)
Update status for job

jobcopying(self, dataset_id, job_id, passkey=None)
Update status for job

jobfinalize(self, dataset_id, job_id, job, status='OK', clear_errors=True)
Update status for job

get_stats(self, dataset_id)
Get collected dataset statistics

jobfinish(self, dataset_id, job_id, stats, key=None, mode=0)
Update monitoring for job and write statistics

update_node_statistics(self, gridinfo, stats, retval=0)
 
jobsubmitted(self, dataset_id, job_id, submitdir, grid_queue_id=None)
Set the submission path of job so that it can be post-processed on termination.

jobping(self, dataset_id, job_id, host, key=None, tray=0, iter=0)
Update status_changed time for job

jobabort(self, job_id, dataset_id, error, errormessage='', key=None, stats={})
Reset any pending jobs so they get reprocessed.

jobclean(self, dataset_id, archive=True)
Remove jobs from queue

jobsuspend(self, job_id, dataset_id, suspend=True)
Reset any pending jobs so they get reprocessed.
 
jobsetstatus(self, dataset_id, job_id=-1, status='RESET', reason=None, passkey=None)
 
ToggleDatasetDebug(self, dataset)
Toggle debug flag of dataset
 
getDatasetStatus(self, dataset)
Get status of dataset

setDatasetStatus(self, dataset, status)
Update status of dataset

set_metadata_subcat(self, dataset_id, sub_cat)
Change Plus:subcategory in DIFPlus metadata

validate(self, dataset_id, status='TRUE')
Mark dataset as visible and valid.

multipart_job_start(self, dataset_id, queue_id, key='')
Change the status of a job to indicate it is currently running
 
multipart_job_finish(self, dataset_id, queue_id, key='')

get_iterations(self, dataset_id)

get_task_id(self, task_def_id, job_id, tray, iter)

task_start(self, dataset_id, queue_id, taskname, tray, iter, hostname, key='')

task_init(self, dataset_id, job_id, tray=None, iter=None)
Initialize task entry and set status to IDLE

task_status(self, task_id)
Get status for task

task_update_status(self, task_id, status, key='', cursor=None, grid_id=0)

task_is_finished(self, td_id, job_id, tray=None, iter=None)

task_abort(self, task_id, key='', stats={})

task_finish(self, task_id, stats, key='')
 
get_finished_jobs(self, grid_id, delay=1)
Reset any pending jobs so they get reprocessed.
 
SetFileURL(self, queue_id, dataset_id, location, filename, md5sum, filesize, transfertime, key)
Add or change the global location of a file

GetStorageURL(self, dataset_id, queue_id, passkey, storage_type='INPUT')
Get storage URL for dataset

clearStorageURL(self, dataset_id)
Clear storage URLs for dataset
 
getsummary(self, days, groupby=None)

getsummary_stats(self, days, groupby=None)

getgrid_ids(self)
 
getstatus(self, dataset, job=-1)
Returns: formatted string with dataset/job summary

printsummary(self, days)
Returns: formatted string with production summary

add_history(self, user, command)
Add a history item

Inherited from IceProdDB: SetAuthFunc, authenticate, authenticate2, commit, connect, defang, disconnect, execute, fetch_metaproject_id, get, getcursor, insert_id, intcast, isconnected, mkkey, nonify, nullify, ping, rollback, set_auto

Class Variables
  logger = logging.getLogger('MonitorDB')
Method Details

__init__(self)
(Constructor)

Constructor

Overrides: IceProdDB.__init__

new(self)

Create a copy of this instance

Overrides: IceProdDB.new

reset_old_jobs(self, grid_id, maxidletime, maxruntime, maxsubmittime, maxcopytime, maxfailures=10, maxevicttime=10, keepalive=14400)

Reset status of jobs that were queued but whose status has not changed in more than maxtime minutes

Parameters:
  • grid_id - id of current cluster
  • maxruntime - maximum run time for jobs
  • maxsubmittime - maximum submit time for jobs
  • maxcopytime - maximum time for jobs to be in 'copying' state
  • maxfailures - maximum number of times a job is allowed to fail
  • keepalive - how often the server should expect to hear from jobs
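A minimal usage sketch, assuming an already-connected MonitorDB instance; the grid name and all time limits below are illustrative values, not documented defaults:

    from iceprod.server.db import MonitorDB

    db = MonitorDB()               # connection/auth setup (inherited from IceProdDB) omitted
    grid_id = db.GetGridId('npx')  # 'npx' is a hypothetical grid name

    # Reset jobs whose status has been stuck too long; limits are in
    # minutes per the docstring above, and the values are illustrative.
    db.reset_old_jobs(grid_id,
                      maxidletime=60,
                      maxruntime=480,
                      maxsubmittime=30,
                      maxcopytime=30)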

download_tasks(self, dataset_id, steering)

Get job parts from database

Parameters:
  • dataset_id - ID of the run whose configuration we wish to download

reset_old_tasks(self, grid_id, maxidletime, maxruntime, maxsubmittime, maxcopytime, maxfailures=10, maxevicttime=10, keepalive=14400)

Reset status of tasks that were queued but whose status has not changed in more than maxtime minutes

Parameters:
  • grid_id - id of current cluster
  • maxruntime - maximum run time for jobs
  • maxsubmittime - maximum submit time for jobs
  • maxcopytime - maximum time for jobs to be in 'copying' state
  • maxfailures - maximum number of times a job is allowed to fail
  • keepalive - how often the server should expect to hear from jobs

QueueJobs(self, maxjobs, grid_id, jobs_at_once=20, fifo=True, debug=0, maxq=1000)

Reserve at most 'maxjobs' from a given dataset. Get proc ids and set their status to 'QUEUEING'.
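A hedged sketch of a single queueing pass; the grid name and limits are illustrative, and the return value is not documented on this page, so the result is not used:

    from iceprod.server.db import MonitorDB

    db = MonitorDB()               # connection setup omitted
    grid_id = db.GetGridId('npx')  # 'npx' is a hypothetical grid name

    # Reserve up to 100 jobs for this grid, 20 per query, oldest first.
    db.QueueJobs(maxjobs=100, grid_id=grid_id, jobs_at_once=20, fifo=True)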

SetTasks(self, maxjobs, grid_id, jobs_at_once=20, fifo=True, debug=0)

DAG mode: reserve at most 'maxjobs' from a given dataset. Get proc ids and set their status to 'QUEUEING'.

QueueTasks(self, maxjobs, grid_id, jobs_at_once=20, fifo=True, debug=0)

DAG mode: reserve at most 'maxjobs' from a given dataset. Get proc ids and set their status to 'QUEUEING'.

SuspendGridDataset(self, grid, dataset, suspend=1)

Update grid participation in the given dataset.

Parameters:
  • grid - grid or cluster
  • dataset - dataset id

InitializeGridStats(self, grids, dataset_id)

Insert grid_statistics entries for grids which should run this dataset.

Parameters:
  • grids - list of grids or clusters
  • dataset_id - dataset id

InitializeGridStatsDAG(self, grids, steering, dataset_id)

Insert grid_statistics entries for grids which should run this dataset.

Parameters:
  • grids - list of grids or clusters
  • dataset_id - dataset id

GetJob(self, dataset_id=0, queue_id=0)

Get Job info

Parameters:
  • queue_id - queue_id
  • dataset_id - dataset ID
Returns:
i3Job object
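A minimal lookup sketch; the ids are illustrative:

    from iceprod.server.db import MonitorDB

    db = MonitorDB()    # connection setup omitted
    job = db.GetJob(dataset_id=1234, queue_id=0)
    # 'job' is an i3Job object per the Returns note above.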

jobstart(self, hostname, grid_id, dataset_id=0, queue_id=0, key=None)

Change the status of a job to indicate it is currently running

Parameters:
  • hostname - host where job was queued from
  • grid_id - ID of iceprod queue
  • dataset_id - Optional dataset ID
  • queue_id - Optional job ID (within dataset)
  • key - temporary passkey to avoid job spoofs
Returns:
dataset_id,nproc,procnum
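A hedged sketch of how a running job might report itself. The hostname, ids, and key are illustrative; the three-way unpacking follows the Returns note above:

    from iceprod.server.db import MonitorDB

    db = MonitorDB()    # connection setup omitted
    dataset_id, nproc, procnum = db.jobstart(
        hostname='node42.example.edu',  # host where the job was queued from
        grid_id=3,                      # hypothetical iceprod queue id
        dataset_id=1234,
        queue_id=7,
        key='temporary-passkey')        # passkey issued to the job at submission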

jobreset(self, dataset_id, job_id, reason=None, passkey=None)

Update status for job

Parameters:
  • dataset_id - dataset index
  • job_id - process number within dataset

jobcopying(self, dataset_id, job_id, passkey=None)

Update status for job

Parameters:
  • dataset_id - dataset index
  • job_id - process number within dataset

jobfinalize(self, dataset_id, job_id, job, status='OK', clear_errors=True)

Update status for job

Parameters:
  • dataset_id - dataset index
  • job_id - process number within dataset

get_stats(self, dataset_id)

Get collected dataset statistics

Parameters:
  • dataset_id - dataset index

jobfinish(self, dataset_id, job_id, stats, key=None, mode=0)

Update monitoring for job and write statistics

Parameters:
  • dataset_id - dataset index
  • job_id - process number within dataset
  • stats - dictionary of stat entries
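A minimal sketch of finishing a job with statistics; the stat keys below are illustrative, since the expected schema of the dictionary is not documented here:

    from iceprod.server.db import MonitorDB

    db = MonitorDB()    # connection setup omitted
    stats = {'user_time': 3600.0, 'mem_heap_peak': 512.0}  # illustrative keys
    db.jobfinish(dataset_id=1234, job_id=7, stats=stats, key='temporary-passkey')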

jobabort(self, job_id, dataset_id, error, errormessage='', key=None, stats={})

Reset any pending jobs so they get reprocessed. This would typically be run at startup in case the daemon crashed previously.

To Do: update node statistics

jobsuspend(self, job_id, dataset_id, suspend=True)

Reset any pending jobs so they get reprocessed. This would typically be run at startup in case the daemon crashed previously.

multipart_job_start(self, dataset_id, queue_id, key='')

Change the status of a job to indicate it is currently running

Parameters:
  • dataset_id - Dataset ID
  • queue_id - Queue ID (within dataset)
  • key - temporary passkey to avoid job spoofs
Returns:
dataset_id,nproc,procnum

get_finished_jobs(self, grid_id, delay=1)

Reset any pending jobs so they get reprocessed. This would typically be run at startup in case the daemon crashed previously.

getsummary(self, days, groupby=None)

Parameters:
  • days - number of days to get summary from
  • groupby - how to group statistics
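An illustrative call; the 'grid' groupby value is an assumption, since valid grouping keys are not documented on this page:

    from iceprod.server.db import MonitorDB

    db = MonitorDB()    # connection setup omitted
    summary = db.getsummary(days=7, groupby='grid')  # last week, grouped by grid (assumed key)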

getsummary_stats(self, days, groupby=None)

Parameters:
  • days - number of days to get summary from
  • groupby - how to group statistics

getstatus(self, dataset, job=-1)

Parameters:
  • dataset - dataset id
  • job - optional job id
Returns:
formatted string with dataset/job summary

printsummary(self, days)

Parameters:
  • days - number of days to gather information for, starting from today
Returns:
formatted string with production summary
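An illustrative call printing the last week's production summary:

    from iceprod.server.db import MonitorDB

    db = MonitorDB()    # connection setup omitted
    print(db.printsummary(days=7))  # returns a formatted string per the note above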