  • Announcements

    • admin: PBS Forum Has Closed (06/12/17)

      The PBS Works Support Forum is no longer active. For PBS community-oriented questions and support, please join the discussion at http://community.pbspro.org. Any new security advisories related to commercially licensed products will be posted in the PBS User Area (https://secure.altair.com/UserArea/).

jmesseng

  1. PBS Pro license issue: "Not Running: PBS Error: Floating License unavailable"

     09-May-2014

     System:
       • PBS Pro version 12.0 installed on SLES 11 SP2
       • LMX version 12.0 license manager
       • Network license schema using a three-server HAL setup

     Symptom:
       • qstat -sw reports "Not Running: PBS Error: Floating License unavailable" for one or more jobs in the Q state.
       • Your LMX license server may be down and need to be restarted.
       • Alternatively, an update may have been made to LMX licensing, such as a new license file. PBS Pro can report this problem even though everything else appears normal and other Altair LMX-based applications are still able to check out licenses (look at the LMX logs on the license server and run lmx license stat).
       • Existing jobs in the PBS queue may keep running unaffected.
       • Some new jobs submitted to the PBS queue may still transition to the R state.

     Notes:
       • Attempting to qrun an affected job fails because PBS cannot allocate the needed licenses.
       • Verify that the PBS license location string is correct. If using a HAL schema, the host order in the string must match the HAL server order in the LMX server configuration file, altair-serv.cfg.
       • Sending the PBS scheduler a HUP will not work.
       • Do not attempt a qterm and warm start of the PBS server in this licensing situation.

     How to resolve:
       Re-insert the pbs_license_info string using qmgr, which causes PBS to reinitialize its licensing internally.
       • Display the license string with the following command; this is also a good time to confirm it is correct (see Notes above):
           qmgr -c "p s" | grep pbs_license_info
       • Using the license string captured above, or a corrected one, reinitialize the PBS license connection:
           qmgr -c "set server pbs_license_info = 6200@huey:6200@louie:6200@dewey"

     Example of condition:
           qstat -ws

       vulcan:
                                                                                                          Req'd  Req'd    Elap
       Job ID                         Username        Queue           Jobname         SessID    NDS   TSK Memory  Time S  Time
       ------------------------------ --------------- --------------- --------------- -------- ---- ----- ------ ----- - -----
       106976.vulcan                  krockon         gp              MRB_FC              3221    2     4  126gb    -- R 76:20:11
          Job run at Tue May 06 at 05:56 on (vulcan[22]:ncpus=2:mem=66060288kb)+(vulcan[23]:ncpus=2:mem=66060288kb)
       107077.vulcan                  choosew         cfd_gp          Cfg6_CP_2         279272   10    60  300gb    -- R 65:52:12
          Job run at Tue May 06 at 16:24 on (vulcan[62]:ncpus=6:mem=31457280kb)+(vulcan[63]:ncpus=6:mem=31457280kb)+(...
       107282.vulcan                  hammero         gp              HX22E_full_f      379874    1     4  254gb    -- R 02:59:10
          Job run at Fri May 09 at 07:17 on (vulcan[30]:mem=67108864kb:ncpus=4+vulcan[31]:mem=67092480kb+vulcan[28]...
       107293.vulcan                  nuktynm         shortgp         nxn_43n_bm            --    1     4   63gb 01:00 Q    --
          Not Running: PBS Error: Floating License unavailable
       107294.vulcan                  nuktynm         shortgp         nxn_43m_bm            --    1     6   63gb 01:00 Q    --
          Not Running: PBS Error: Floating License unavailable
       107299.vulcan                  aaelidk         gp              SOF5_Heb_SYS12        --    1     4   63gb    -- Q    --
          Not Running: PBS Error: Floating License unavailable
       107301.vulcan                  saracki         gp              MCDV-Down-Af      158128    1     1  254gb    -- R 00:04:14
          Job run at Fri May 09 at 10:12 on (vulcan[26]:mem=67108864kb:ncpus=1+vulcan[27]:mem=67092480kb+vulcan[24]...
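The checks and the fix in this post can be collected into a few shell functions. This is a minimal POSIX sh sketch, not an Altair or PBS tool: the function names (affected_jobs, license_string, license_hosts, check_hal_order, reapply_license) are hypothetical, and it assumes that "qstat -ws" prints each job's comment on the line after the job, as in the example, and that qmgr -c "p s" prints the attribute as a line of the form "set server pbs_license_info = <string>". The expected HAL host order is passed in by hand, because the sketch does not parse altair-serv.cfg.

```shell
#!/bin/sh
# Sketch only: helpers for the "Floating License unavailable" fix.
# Assumptions (not taken from PBS documentation): qstat -ws prints each
# job's comment on the line after the job, and qmgr -c "p s" prints the
# license attribute as "set server pbs_license_info = <string>".

LICENSE_ERROR='Not Running: PBS Error: Floating License unavailable'

# Read qstat -ws output on stdin; print the ID of every job whose comment
# line carries the floating-license error.
affected_jobs() {
  awk -v err="$LICENSE_ERROR" '
    $1 ~ /^[0-9]+\./ { job = $1; next }   # job summary line: remember its ID
    index($0, err) && job != "" { print job; job = "" }
  '
}

# Read qmgr -c "p s" output on stdin; print the current license string.
license_string() {
  sed -n 's/^set server pbs_license_info = //p'
}

# Print the host of each port@host entry in a license string, in order.
license_hosts() {
  printf '%s\n' "$1" | tr ':' '\n' | sed 's/^[0-9]*@//'
}

# Succeed only if the license string lists its hosts in exactly the order
# given by the remaining arguments (the HAL server order from
# altair-serv.cfg, which this sketch expects you to supply by hand).
check_hal_order() {
  _lic=$1
  shift
  [ "$(license_hosts "$_lic" | paste -sd ' ' -)" = "$*" ]
}

# Re-insert the license string so that PBS reinitializes its licensing.
reapply_license() {
  qmgr -c "set server pbs_license_info = $1"
}
```

For example, assuming the HAL server order in altair-serv.cfg is huey, louie, dewey: lic=$(qmgr -c "p s" | license_string) captures the current string, check_hal_order "$lic" huey louie dewey && reapply_license "$lic" re-inserts it only when the order matches, and qstat -ws | affected_jobs lists the jobs that still carry the error. Keeping the order check separate from the qmgr call means a wrong string is caught before PBS is asked to reinitialize with it.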
  3. Clear a hung (zombie) vnode

     22-April-2014

     System: PBS Pro version 12.0 installed on SLES 11 SP2 running on an SGI UV1000

     Symptom:
       • PBS Pro qstat reports one or more jobs in the R state without a SessID or elapsed time.
       • The job is not running on the OS, and there is no vnode cpuset for it: look in /dev/cpuset/PBSPro, where there will be no cpuset directory for the affected job.
       • qdel is ineffective.

     How to resolve:
       • Use "qstat -r" to list the affected jobs.
       • Use "qalter -r y PBS_JobID" to mark the job rerunnable. To check that it is now marked rerunnable, "qstat -f PBS_JobID | grep Rerunable" should return "Rerunable = True".
       • Use "qrerun -W force PBS_JobID" to requeue the job. To check that it is requeued, "qstat -a" will list the job in the Q state.
       • Use "qdel PBS_JobID" to delete the job.

     Tips: to fix several affected jobs at once for a particular user:
           qstat -ru a_users_id
           qalter -r y PBS_JobID_1 PBS_JobID_2 PBS_JobID_3 PBS_JobID_4
           qstat -f PBS_JobID_1 PBS_JobID_2 PBS_JobID_3 PBS_JobID_4 | grep Rerunable
           qrerun -W force PBS_JobID_1 PBS_JobID_2 PBS_JobID_3 PBS_JobID_4
           qstat -au a_users_id
           qdel PBS_JobID_1 PBS_JobID_2 PBS_JobID_3 PBS_JobID_4
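The cleanup sequence above can also be sketched as shell functions. This is a minimal POSIX sh sketch under stated assumptions: zombie_jobs, is_rerunable and clear_zombies are hypothetical names, and zombie_jobs assumes that "qstat -r" and "qstat -a" rows use the 11-column layout of the qstat example in the license post (SessID in column 5, state in column 10, elapsed time in column 11). clear_zombies stops before qrerun if any job fails the Rerunable check; it does not re-check the Q state after the requeue.

```shell
#!/bin/sh
# Sketch only: helpers for clearing zombie jobs with the sequence above.
# Assumption: qstat -r / qstat -a rows use the 11-column layout of the
# qstat example in this thread (SessID = column 5, S = column 10,
# elapsed time = column 11).

# Read qstat -r (or qstat -a) output on stdin; print the IDs of jobs that
# are in the R state but show "--" for both SessID and elapsed time.
zombie_jobs() {
  awk '$1 ~ /^[0-9]+\./ && NF >= 11 && $10 == "R" && $5 == "--" && $11 == "--" { print $1 }'
}

# Read qstat -f output on stdin; succeed if the job is marked rerunnable.
is_rerunable() {
  grep -q 'Rerunable = True'
}

# Mark the given jobs rerunnable, confirm it took effect, force a requeue,
# then delete them. Stops at the first failing step.
clear_zombies() {
  [ "$#" -gt 0 ] || return 0        # no job IDs given: nothing to do
  qalter -r y "$@" || return 1
  for j in "$@"; do
    if ! qstat -f "$j" | is_rerunable; then
      echo "not rerunable: $j" >&2
      return 1
    fi
  done
  qrerun -W force "$@" || return 1
  qdel "$@"
}
```

For example, clear_zombies $(qstat -ru a_users_id | zombie_jobs) would apply the Tips sequence to one user's stuck jobs. Reviewing the output of qstat -ru a_users_id | zombie_jobs by hand first mirrors the post's own "qstat -r" step.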