Prasanna

Re-queuing jobs based on exit codes specified in queues


This is a proposed RFE.

I would like PBS Professional to be able to re-queue jobs based on exit codes that are listed on the queue and returned by the job's spool script. The idea is to keep jobs from failing outright when they cannot obtain a license or when they hit host-related problems. If the application fails because it cannot obtain a license, my script can run a post-failure check (grepping the output for the failure string) and exit with a specific exit code, which would cause the job to be re-queued at the top of the queue. For host-specific failure codes, could the nodes used for the first run of the job also be excluded when the job is rerun?

An example to illustrate:

Queue A

rerun_exit_codes = "62,58"

 

A sample script fragment that performs the check and sets the exit code. The job should then land at the top of the queue:

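# Look for the license error in the last 100 lines of the solver output.
if tail -n 100 "${INPUT_NAME}_dyna.stdout" | grep -q "Error 70011"; then
      printf 'ERROR: Indicates License Problem.. Rerunning the job ....\n'
      # 62 is one of the codes listed in rerun_exit_codes for Queue A.
      exit 62
fi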
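
To make the requested behaviour concrete, here is a minimal sketch of the matching step only: given a job's exit code and the queue's list of codes, decide whether the job qualifies for a rerun. Both should_rerun and the rerun_exit_codes attribute are hypothetical names used for illustration. This is not an existing PBS Professional feature, and the actual re-queue action is left as a placeholder.

```shell
#!/bin/sh
# Hypothetical helper (not part of PBS): return 0 when exit code $1 appears
# in the comma-separated list $2, as the proposed rerun_exit_codes would hold.
should_rerun() {
    code=$1
    # Drop any spaces so that "62, 58" and "62,58" behave the same.
    list=$(printf '%s' "$2" | tr -d ' ')
    # An empty exit code never matches.
    [ -n "$code" ] || return 1
    # Wrap both sides in commas so 6 does not match 62.
    case ",${list}," in
        *",${code},"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: the list mirrors the Queue A example above.
rerun_exit_codes="62,58"
job_exit_status=62
if should_rerun "$job_exit_status" "$rerun_exit_codes"; then
    echo "exit code $job_exit_status qualifies for a rerun"
fi
```

Wrapping the list and the code in commas keeps the comparison exact, so an exit code of 6 is not mistaken for 62.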
