From cindy.mackenzie@icecube.wisc.edu Mon Jan 31 22:28:37 2005
Date: Sun, 23 Jan 2005 14:15:44 -0600
From: Cindy Mackenzie
To: Darryn Schneider
Cc: wo@amanda.spole.gov
Subject: Re: Daily SPADE checks

I've added some more info below....

~ cindy

Darryn Schneider wrote:

>Get JMX console
>[darryn@localhost darryn]$ ssh -L 8082:sps-sattx:8080 darryn@sps-access
>
>In a local browser, open http://localhost:8082/
>
>Go to "JMX Console"
>
>Under "IceCube" go to "component=SPADE,host=localhost"
>
>** 1
>
>Go to "showSubsystemRunState"
>
>(note: all are in alphabetical order, with Operations in the bottom half)
>
>* All should be running
>SPADE is RUNNING.
>icecube.datahandling.spade.Archiver is RUNNING.
>icecube.datahandling.spade.Fetcher is RUNNING.
>icecube.datahandling.spade.Processor is RUNNING.
>icecube.datahandling.spade.Verifier is RUNNING.
>icecube.datahandling.spade.Monitor is RUNNING.
>
>** 2
>
>Go to "showActiveAlerts"
>
>* Look for "ERROR"
>
>Try and fix problems; this needs a look at the logs. On sps-sattx, sudo
>to user jboss and check the log:
>
>[jboss@sps-sattx icecube]$ more /mnt/local/icecube/jboss/server/iceboss0/log/server.log
>
>Example
>Active alert:
>2005-01-21 06:23:00 PAIR WARN Fetch Remote files did not appear to be a well-defined pair. Could not fetch.
>
>Looks like drill data. Go to "showFileRegistry" to find out where to
>check this: amanda2.spole.gov /mnt/disk2/drill/hole21
>
>Maybe this WARN is thrown when the sem file is not there yet
>

This one indicates that a semaphore file is sitting in the directory
without a matching binary file. So in theory it's a bit more serious
than the reverse situation (a binary without a semaphore), because
*something* has happened to the binary file, or it was never produced.
If data producers are dropping files in the directory correctly and
SPADE can delete them after fetching, then this shouldn't occur. One
possibility is that the pair was processed correctly, but the .sem file
couldn't be deleted from the source and so was left behind.
If that was the case, you would also see an earlier ERROR on the "Active
Alerts" page indicating that SPADE could not delete the semaphore file.

The best way to find out what happened to the pair with this name is to
check the database for its processing history, but that's probably more
detail than I should get into right now. Another option is to check
SPADE's "sent" directory (/mnt/data/spade/sent/scp) for a file with the
same name as the stray semaphore, but with a .tar.gz extension (this
would be the binary/metadata pair if it was successfully sent). Note,
though, that the file only stays in the sent directory until it is
verified by Ingest in the north, at which point it is deleted, so this
check is only good for the short term. Worst case, you could also grep
the logs for the filename to trace what happened.

>** 3
>
>While on sps-sattx, check directory status
>
>[jboss@sps-sattx spade]$ du -ks /mnt/data/spade/*
>88 /mnt/data/spade/inbox
>3540 /mnt/data/spade/logs
>29044 /mnt/data/spade/outbox
>19944 /mnt/data/spade/problem_files
>4 /mnt/data/spade/raw_file_cache
>104 /mnt/data/spade/registry_files
>16 /mnt/data/spade/resend
>36 /mnt/data/spade/resources
>40 /mnt/data/spade/schemas
>2452 /mnt/data/spade/scripts
>2143012 /mnt/data/spade/sent
>788 /mnt/data/spade/tape_queue
>0 /mnt/data/spade/test.txt
>2832 /mnt/data/spade/verification_files
>
>Note that /mnt/data/spade/sent stores files waiting for an
>acknowledgment from "ingest" in Madison before deleting (or
>resending) them.
>
>Since SPADE does not yet have access to the White Sands server, these
>files are backing up a bit. Might motivate them to finish.
>
>** 4
>
>Check tape status
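The log check from step 2, the stray-semaphore checks Cindy describes, and the directory check from step 3 could be wrapped as a small file of shell functions to source on sps-sattx as user jboss. This is only a sketch, not part of SPADE: the function names and the SPADE_LOG, SPADE_DATA and SPADE_SENT override variables are hypothetical, the default paths are the ones quoted in this thread, and it assumes a stray semaphore named <name>.sem corresponds to <name>.tar.gz in the sent directory.

```shell
# Daily SPADE check helpers (hypothetical); source this file on sps-sattx as user jboss.
# SPADE_LOG, SPADE_DATA and SPADE_SENT are hypothetical override variables;
# the default paths are the ones quoted in the "Daily SPADE checks" thread.
SPADE_LOG=${SPADE_LOG:-/mnt/local/icecube/jboss/server/iceboss0/log/server.log}
SPADE_DATA=${SPADE_DATA:-/mnt/data/spade}
SPADE_SENT=${SPADE_SENT:-$SPADE_DATA/sent/scp}

# Step 2 follow-up: list server.log lines (with line numbers) that mention ERROR.
log_errors() {
    grep -n 'ERROR' "$SPADE_LOG"
}

# Stray semaphore: return 0 if <name>.tar.gz is still in the sent directory
# (sent, awaiting verification by Ingest), 1 otherwise. Either way, print
# every server.log line mentioning <name> so its history can be traced.
# Assumes the semaphore file is named <name>.sem.
check_stray_sem() {
    sem_base=$(basename "$1" .sem)
    sem_rc=1
    if [ -e "$SPADE_SENT/$sem_base.tar.gz" ]; then
        echo "in sent dir: $SPADE_SENT/$sem_base.tar.gz (awaiting verification)"
        sem_rc=0
    else
        echo "not in sent dir: $sem_base.tar.gz (already verified and deleted, or never sent)"
    fi
    # grep exits non-zero when nothing matches, so report that explicitly.
    grep -n -F "$sem_base" "$SPADE_LOG" || echo "no log lines mention $sem_base"
    return "$sem_rc"
}

# Step 3: size in KB of each entry under the SPADE data directory;
# a growing "sent" total means files are waiting on acknowledgments from Ingest.
dir_status() {
    du -ks "$SPADE_DATA"/*
}
```

For example, after `. ./spade_checks.sh` (hypothetical file name), `check_stray_sem <name>.sem` returns 0 only while the .tar.gz is still waiting in the sent directory. A non-zero return means it was either verified and deleted or never sent, so the log lines printed with it are the next thing to read.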