  • Announcements

    • admin

      PBS Forum Has Closed   06/12/17

      The PBS Works Support Forum is no longer active.  For PBS community-oriented questions and support, please join the discussion at http://community.pbspro.org.  Any new security advisories related to commercially-licensed products will be posted in the PBS User Area (https://secure.altair.com/UserArea/). 
jringoot

resource balance information/manipulation

Circumstances:
During the holiday season, while the primary HPC admins are away, I got a question from an HPC user: "Why are my jobs queued longer than others, even though they are moderate in resources?"
I checked with SGI and it doesn't appear to be a fault condition: jobs run and finish correctly, it is just a matter of scheduling.
Apparently some users' jobs remain in the entry queue and are overtaken by jobs from other users that require more CPU, memory, or walltime.
The user asked me: "Is there some kind of record of used resources per user, and did I somehow end up in a 'low resource restriction scheme' because I have already used a lot of resources?" "Can I see a balance of my resource use?"

I found the command "mybalance" on this site
https://hpcc.usc.edu/support/documentation/useful-pbs-commands/

But it doesn't seem to be a standard command, since I can't find it on our cluster and it is not described in the PBS user documentation from Altair.

Are there alternatives, or is this a valid command after all, and do I need to install an additional package?
How and where is this accounting information configured and stored? (I see there is a command pbs-report that can give some info. I am looking into it as well, but it appears to take a long time to produce a report.)
Is there a way to give a user a grace allowance for a certain job, so that he can run it despite going over his balance?
Is there a way to reset the counters of a user or a group of users?

Thanks,

Joost


The URL you referenced is based on TORQUE - not PBS Professional. 

Can you confirm that your site is using PBS Professional?

There can be several reasons why jobs are pending (e.g., scheduling policies, resource requests). 

As a user, you have some commands you can use to figure out why the job is not running. The most common command is

qstat -f $PBS_JOBID

where $PBS_JOBID is the job in question. This will show the full details of the job and there is a comment attribute that contains a string. Usually, this is the last statement from the PBS Scheduler. However, admins have the ability to overwrite this comment field. 
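If you only want the comment and not the full attribute dump, something like the following may help. It is a rough sketch, not an official tool: job_comment is a hypothetical helper name, and it assumes the usual qstat -f text layout, where an attribute appears as "comment = ..." and a long value is wrapped onto continuation lines that begin with a tab.

```shell
# job_comment: hypothetical helper, not part of PBS.
# Reads `qstat -f` output on stdin and prints only the comment attribute.
# Assumption: wrapped values continue on lines that start with a tab,
# and the pieces are joined back together without a separator.
job_comment() {
    awk '
        /^[ \t]*comment = / { sub(/^[ \t]*comment = /, ""); val = $0; found = 1; next }
        found && /^\t/      { sub(/^\t/, ""); val = val $0; next }
        found               { print val; found = 0; exit }
        END                 { if (found) print val }
    '
}

# Usage (the job ID is a placeholder):
#   qstat -f 1234.pbsserver | job_comment
```

If the function prints nothing, the job has no comment attribute set yet.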

Another command, which needs to be executed on the PBS Server host, is

tracejob -z -n $DAYS $PBS_JOBID

where $DAYS is how many days in the past you want to parse the daemon logs on the local machine. Because you are logged in to the PBS Server host, you will see the Server and Scheduler logs.

If the output of tracejob is NOT clear, then the admin could have filtered out DEBUG and other verbose records. You may want to look through the Scheduler logs ($PBS_HOME/sched_logs/) yourself. 
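For looking through the Scheduler logs by hand, a plain grep over the log directory is usually enough. The sketch below wraps that in a small function. The name sched_lines_for_job is made up for illustration, and it assumes the directory holds one plain-text log file per day and that you have read permission on it (typically root or the PBS admin account).

```shell
# sched_lines_for_job: hypothetical helper, not part of PBS.
# Prints every line that mentions a job ID in the log files of a directory.
#   $1: log directory, e.g. "$PBS_HOME/sched_logs"
#   $2: job ID, e.g. 1234.pbsserver (placeholder)
sched_lines_for_job() {
    logdir=$1
    jobid=$2
    # -F: match the job ID as a fixed string, so its dots are literal
    # -h: omit file names; the log lines carry their own timestamps
    grep -Fh -- "$jobid" "$logdir"/* 2>/dev/null
}

# Usage (assumes PBS_HOME is set in your shell, e.g. after `. /etc/pbs.conf`):
#   sched_lines_for_job "$PBS_HOME/sched_logs" 1234.pbsserver | tail -n 20
```

The last few lines for a job are usually the interesting ones, since they show the Scheduler's most recent reason for not running it.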

I don't know if your site is using any third-party integrations (e.g., allocation managers, cluster managers) to influence scheduling. So, you will need to provide more details about your site's configuration in order for someone on the list to give you better guidance.
