Announcement (admin), 06/12/17: PBS Forum Has Closed

The PBS Works Support Forum is no longer active. For PBS community-oriented questions and support, please join the discussion at http://community.pbspro.org. Any new security advisories related to commercially-licensed products will be posted in the PBS User Area (https://secure.altair.com/UserArea/).
Prasanna

Re-queuing jobs based on exit codes specified in queues


This is a proposed RFE (request for enhancement).

I would like PBS Professional to be able to re-queue jobs based on exit codes that are configured on the queue and returned by the job (spool) scripts. The idea is to keep jobs from failing outright when they cannot obtain a license or hit a host-related problem. If the application cannot obtain licenses and fails, my script can run post-failure checks (for example, grep the output for the failure string) and exit with a specific exit code; the job would then be re-queued at the top of the queue. For host-specific failure codes, could PBS also exclude the nodes that were used for the job's first run?

Example to illustrate:

Queue A

rerun_exit_codes = "62,58"
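To make the proposed semantics concrete, here is a minimal sketch of the matching rule a server could apply when a job on Queue A exits. Note that `rerun_exit_codes` is the attribute proposed in this RFE, not an existing PBS Professional queue attribute, and `should_requeue` is a hypothetical helper written only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of PBS): decide whether a job should be
# re-queued, given the proposed rerun_exit_codes value (e.g. "62,58")
# and the job's exit status. Returns 0 (true) if the status is listed.
should_requeue() {
    # Drop any spaces so "62, 58" behaves like "62,58"
    codes=$(printf '%s' "$1" | tr -d ' ')
    status=$2
    # Wrap both sides in commas so "62" matches only a whole entry, not "162"
    case ",$codes," in
        *",$status,"*) return 0 ;;
        *)             return 1 ;;
    esac
}

# Example: a job that exited with 62 on a queue configured with "62,58"
if should_requeue "62,58" 62; then
    echo "requeue"
else
    echo "finish"
fi
```

Any exit status not in the list (including 0) would let the job finish normally, so existing jobs on the queue are unaffected.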


Sample script that performs the check and sets the exit code. The job should then land at the top of the queue:

# Check the last 100 lines of the job's output file for the license error string
if tail -n 100 "${INPUT_NAME}_dyna.stdout" | grep -q "Error 70011"; then
    echo "ERROR: Indicates license problem. Rerunning the job..."
    exit 62
fi
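Extending the check above to the host-related case (exit code 58 in the example queue), a minimal sketch of how the job script could classify failures and record the nodes it ran on. `classify_failure` and the `failed_nodes.*` file are hypothetical, and "Host failure" is a placeholder for whatever message the application actually prints; `PBS_NODEFILE` and `PBS_JOBID` are the environment variables PBS sets for a running job. Actually excluding those nodes on the re-run would still need the scheduler-side support this RFE asks for.

```shell
#!/bin/sh
# Sketch: map the tail of the output log to the exit code the job should end with.
#   62 = license problem ("Error 70011", the string from the original check)
#   58 = host problem; "Host failure" is a hypothetical placeholder pattern,
#        replace it with the message your application really prints.
classify_failure() {
    log=$1
    # Look only at the last 100 lines, as the original check does;
    # a missing log is treated as "no known failure".
    recent=$(tail -n 100 "$log" 2>/dev/null)
    if printf '%s\n' "$recent" | grep -q "Error 70011"; then
        echo 62
    elif printf '%s\n' "$recent" | grep -q "Host failure"; then
        echo 58
    else
        echo 0
    fi
}

code=$(classify_failure "${INPUT_NAME}_dyna.stdout")

# On a host failure, record the hosts this run used (PBS lists them in the
# file named by $PBS_NODEFILE) so a re-run could be steered away from them.
# The failed_nodes.* file name is hypothetical.
if [ "$code" -eq 58 ] && [ -n "${PBS_NODEFILE:-}" ]; then
    sort -u "$PBS_NODEFILE" > "failed_nodes.${PBS_JOBID:-unknown}"
fi

if [ "$code" -ne 0 ]; then
    echo "ERROR: known failure detected, exiting with code $code so the job can be re-queued"
    exit "$code"
fi
```

Keeping the pattern-to-code mapping in one function means the queue's `rerun_exit_codes` list and the script only have to agree on the numbers, not on the log strings.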
