During the holiday season, while the primary HPC admins were away, I got a question from an HPC user: "Why are my jobs queued longer than others, even though they request only moderate resources?"
I checked with SGI, and it doesn't appear to be a fault: jobs run and finish correctly, so it is just a matter of scheduling.
Apparently some users' jobs remain in the entry queue and are overtaken by jobs from other users that require more CPU, memory, or wall time.
The user asked me: "Is there some kind of record of used resources per user, and did I somehow end up in a 'low resource restriction scheme' because I have already used a lot of resources?" "Can I see a balance of my resource use?"
I found the command "mybalance" on this site: https://hpcc.usc.edu/support/documentation/useful-pbs-commands/
But it doesn't seem to be a standard command, since I can't find it on our cluster and it is not described in the PBS user documentation from Altair.
Are there alternatives, or is this a valid command after all, one that requires installing an additional package?
How and where is this accounting information configured and stored? (I see there is a command, pbs-report, that can give some info. I am looking into it as well, but generating such a report appears to take a long time.)
Is there a way to grant a user an exception for a certain job, so that it can run even though the user has gone over their balance?
Is there a way to reset the counters of a user/group of users?