tryer


Hello,

I'm currently learning to use PBS Pro. I have read the user and administrator manuals, but I still have some questions about the possible configurations. (I'm not very used to job administration, and I didn't find a French manual, so sorry if my questions look easy to answer, or stupid ;) )

First of all, I understand that "CPUs" in the manual mainly means "cores"? If I am wrong, please tell me :)

I wonder how PBS Pro uses the cores resource:

Let's say I have 1 machine with 4 cores. I have two kinds of processes on the machine: H (heavy processes) and L (light processes, which do some work, wait a long time for an answer, then stop). I understand that I can define 2 different queues, one for H jobs and one for L jobs.

- Is it possible to specify that the L jobs must use at most 1 core?

- And if yes, when the L processes are waiting (so very little CPU usage), can the H jobs use this core too?

Indeed, I wouldn't want 1 core to stay reserved for L jobs while they are inactive when my other jobs could use it.

Thanks a lot,

and sorry for my English.


In general, the PBS resource ncpus does correspond to CPU cores. The pbs_mom detects the number of cores on a system and sets a default resources_available.ncpus value. This is almost always the best value to use, but it can be overridden if desired. (That does not apply on certain systems where PBS has a deeper integration with the system, such as Cray systems or SGI systems using cpusets.)

- Is it possible to specify that the L jobs must use at most 1 core?

Yes, you can set resources_max.ncpus=1 on that specific queue.
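As a rough sketch, the limit above could be applied through qmgr. The helper below only builds the command string and never contacts a PBS server; the function name and the queue name "lightq" are made up for illustration. As far as I know, resources_max.ncpus caps what each individual job in the queue may request, not the total used by all jobs in that queue.

```python
def queue_limit_command(queue: str, max_ncpus: int) -> str:
    """Build the qmgr command that caps the ncpus one job in `queue` may request."""
    if max_ncpus < 1:
        raise ValueError("max_ncpus must be at least 1")
    return f'qmgr -c "set queue {queue} resources_max.ncpus = {max_ncpus}"'


# "lightq" is a hypothetical queue name used only for illustration.
print(queue_limit_command("lightq", 1))
# prints: qmgr -c "set queue lightq resources_max.ncpus = 1"
```

The printed command would be run by a PBS manager on the server host.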

- And if yes, when the L processes are waiting (so very little CPU usage), can the H jobs use this core too?

Indeed, I wouldn't want 1 core to stay reserved for L jobs while they are inactive when my other jobs could use it.

It is important to remember that PBS does not actually pin individual job processes to specific CPUs on the system; that is left to the OS process scheduler. PBS controls how many jobs are sent to the system, but it does not decide which job processes actually run on which cores. (Again, SGI systems running with cpuset support are a slightly different case, though even that does not operate at the level of placing a specific process on a specific core.)

When PBS runs a job that requests ncpus=1, PBS considers 1 of the ncpus on the node where the job runs to be consumed, regardless of how much CPU time the job actually uses. Jobs which request 0 ncpus cannot run. So the question for you is: do you want a node that is already "full" with H jobs to be able to additionally take on 1 or more L jobs?
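To make the bookkeeping concrete, here is a toy model of the accounting described above. It is not PBS code: the Node class and its method names are invented for illustration, and it only mirrors the two rules stated in this post (requested ncpus are counted as consumed, and a request of 0 ncpus is refused).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a PBS node; not the real scheduler logic."""
    ncpus_available: int                          # plays the role of resources_available.ncpus
    running: list = field(default_factory=list)   # ncpus requested by each running job

    def ncpus_assigned(self) -> int:
        # Counted by request, not by measured CPU usage.
        return sum(self.running)

    def try_run(self, ncpus_requested: int) -> bool:
        if ncpus_requested < 1:
            return False  # per the post: jobs requesting 0 ncpus cannot run
        if self.ncpus_assigned() + ncpus_requested > self.ncpus_available:
            return False  # node is "full" in the eyes of the model
        self.running.append(ncpus_requested)
        return True


node = Node(ncpus_available=4)
print(node.try_run(4))  # H job takes all 4 ncpus: True
print(node.try_run(1))  # L job no longer fits, even if the H job is idle: False
```

In this model an idle L job still holds its 1 ncpus, which is exactly the situation the question is about.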

Yes, you can set resources_max.ncpus=1 on that specific queue.

Sounds great :)

Jobs which request 0 ncpus cannot run

Just a silly question: does requesting 0 ncpus mean no ncpus restriction, or are you saying the ncpus request can never be 0?

So the question for you is: do you want a node that is already "full" with H jobs to be able to additionally take on 1 or more L jobs?

Yes, you're right: if I can limit CPU usage to 1 core for L jobs, and if it is possible to add L jobs when a node is "full" of H jobs, the result could be the same. Are you thinking of a priority mechanism?

Just a silly question: does requesting 0 ncpus mean no ncpus restriction, or are you saying the ncpus request can never be 0?

I was just saying that you can't run a job that requests 0 ncpus in total, so you can't have a job that only consumes memory on a node while the H jobs consume both ncpus and mem resources.

My real idea is to tell PBS that the nodes have more ncpus resources than they really do. This would work best if all of your H jobs are the same size, let's say 4 ncpus, and your nodes each have 4 ncpus. We could manually change resources_available.ncpus to 5 on each node, so that each node could run 1 H job and 1 L job at a time (assuming the mem requested by each job was under the resources_available.mem for the node). If you had nodes with resources_available.ncpus=8, you could set them to 10 to allow up to 2 of each type of job.
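The arithmetic behind those numbers can be sketched as follows. This is only an illustration under the assumptions in this post (uniform H job size, L jobs requesting 1 ncpus); the function names and the node name "node01" are hypothetical, and the helper just builds a qmgr command string without running it.

```python
def oversubscribed_ncpus(real_cores: int, h_job_ncpus: int, l_job_ncpus: int = 1) -> int:
    """Return the inflated ncpus value: one extra L slot per H job that fits."""
    if h_job_ncpus < 1 or l_job_ncpus < 1:
        raise ValueError("job sizes must be at least 1 ncpus")
    h_slots = real_cores // h_job_ncpus   # how many H jobs fit on the real cores
    return real_cores + h_slots * l_job_ncpus


def node_ncpus_command(node: str, ncpus: int) -> str:
    """Build the qmgr command that overrides resources_available.ncpus on a node."""
    return f'qmgr -c "set node {node} resources_available.ncpus = {ncpus}"'


print(oversubscribed_ncpus(4, 4))   # 5, as in the 4-core example
print(oversubscribed_ncpus(8, 4))   # 10, as in the 8-core example
# "node01" is a hypothetical node name.
print(node_ncpus_command("node01", oversubscribed_ncpus(4, 4)))
```

Note that the inflated value is only a count: nothing in it reserves the extra slot for L jobs specifically.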

Does something like that sound like it might work for you? If not, we could get a bit more complicated and create multiple vnodes for each host, some of which accept L jobs and some of which accept H jobs. This might be the way to go if the H jobs vary wildly in size. We can discuss this more if necessary.

Both ideas are essentially the same though, in that they both involve putting "too much" work on the nodes and letting the processes fight over CPU cycles at the OS level.

