jmesseng

How-to for clearing a job in R state that is not running, no SessID or Elapsed time


Clear a hung (zombie) vnode


22-April-2014:  PBS Pro version 12.0 installed on SLES 11 SP2 running on an SGI UV1000


              


Symptom:


PBS Pro qstat reports a job or jobs in R state without a SessID or Elapsed Time


  • The job is not running on the OS, and no vnode cpuset exists for it: /dev/cpuset/PBSPro contains no cpuset directory for the affected job
  • qdel has no effect on the job
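
The cpuset check above can be scripted. This is only a sketch: job_has_cpuset is a made-up helper name, and it assumes the cpuset directory under /dev/cpuset/PBSPro is named after the full job ID, which the post does not state, so verify the layout on your own system first.

```shell
# Hypothetical helper: succeed if a cpuset directory exists for the job.
# Assumption: the directory is named after the full job ID.
# The optional second argument overrides the cpuset root from the post.
job_has_cpuset() {
    jobid=$1
    root=${2:-/dev/cpuset/PBSPro}
    [ -d "$root/$jobid" ]
}
```

For example, "job_has_cpuset PBS_JobID || echo no cpuset" prints a message when the directory is missing.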

 


How-to resolve:


  1. Use "qstat -r" to list the affected jobs.
  2. Use "qalter -r y PBS_JobID" to mark the job rerunnable.
  3. Check that it is now marked rerunnable: "qstat -f PBS_JobID | grep Rerunable" should return "Rerunable = True".
  4. Use "qrerun -W force PBS_JobID" to requeue the job.
  5. Check that it is requeued: "qstat -a" will list the job in Q state.
  6. Use "qdel PBS_JobID" to delete the job.
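
Steps 2 to 6 can be wrapped in a small shell function so the sequence is not retyped for every job. This is a sketch, not a tested procedure: clear_stuck_job is a hypothetical name, the commands are the ones from the list, and the function stops if the Rerunable check does not come back True.

```shell
# Hypothetical wrapper around steps 2-6 for a single job.
# Returns non-zero as soon as one step fails.
clear_stuck_job() {
    jobid=$1
    # Step 2: mark the job rerunnable.
    qalter -r y "$jobid" || return 1
    # Step 3: confirm the attribute before going further.
    qstat -f "$jobid" | grep -q 'Rerunable = True' || return 1
    # Step 4: force the job back into the queue.
    qrerun -W force "$jobid" || return 1
    # Step 5: show the job state for a visual check (expected: Q).
    qstat -f "$jobid" | grep 'job_state'
    # Step 6: delete the requeued job.
    qdel "$jobid"
}
```

Call it as "clear_stuck_job PBS_JobID" after confirming with "qstat -r" that the job really is one of the affected ones.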

 


Tips:


               Fixing several jobs at the same time, targeting a particular user:


 


                qstat -ru a_users_id

                qalter -r y PBS_JobID_1  PBS_JobID_2  PBS_JobID_3  PBS_JobID_4

                qstat -f PBS_JobID_1  PBS_JobID_2  PBS_JobID_3  PBS_JobID_4 | grep Rerunable

                qrerun -W force PBS_JobID_1  PBS_JobID_2  PBS_JobID_3  PBS_JobID_4

                qstat -au a_users_id

                qdel PBS_JobID_1  PBS_JobID_2  PBS_JobID_3  PBS_JobID_4
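
When a user has many jobs, picking out the affected ones by eye gets tedious. The sketch below lists a user's R-state jobs whose full status shows no session ID. The function name list_stuck_jobs is made up, and it assumes that qselect accepts -u and -s here and that "qstat -f" prints a session_id line for healthy running jobs. A job that has only just started might also lack a session ID, so treat the output as a list of candidates to review, not as a list to delete blindly.

```shell
# Hypothetical helper: print a user's R-state jobs that show no session_id.
# Assumption: "qstat -f" includes a session_id line for healthy running jobs.
list_stuck_jobs() {
    user=$1
    for jobid in $(qselect -u "$user" -s R); do
        if ! qstat -f "$jobid" | grep -q 'session_id'; then
            echo "$jobid"
        fi
    done
}
```

Run "list_stuck_jobs a_users_id", check each job it prints, then pass the confirmed job IDs to the qalter, qrerun and qdel commands above.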

