CSCFCEM

PBS queue for limited walltime AND exclusive nodes


I have a question about setting up a queue on exclusive nodes that accepts only jobs with limited walltime.  I am mostly interested in "what happens when..." for this scenario.  I saw this example (http://forum.pbsworks.com/index.php?/topic/251-assign-exclusive-nodes-to-a-queue/) for assigning exclusive nodes to a queue, and it is very clear to me.  I also read through the documentation on setting a max walltime for a queue using a route queue.  This also makes sense to me.  It's combining the two that has me scratching my head.
 
So let's say I want a queue with 4 exclusive nodes (my system has plenty of other nodes) that only accepts jobs with a walltime of less than 6 hours.  If I set up the queue with the exclusive nodes I want, and set up a route queue to send all jobs requesting less than 6 hours of walltime to that same queue, what will happen if I submit a job requesting more than 4 nodes but less than 6 hours of walltime?  Will the job get rejected for requesting too many resources?  Will PBS be smart enough to just send it to a queue with non-exclusive nodes (which would be most desirable)?  Or will I need to create a hook to handle this?
 
In other words, I want a queue with N exclusive nodes that only accepts jobs with a requested walltime less than M, but if a job requests a walltime less than M and more than N nodes, I don't want it to get rejected for lack of resources.  Is this a legitimate concern, or does PBS handle this appropriately without hooks (or any other intervention)?

Hello CSCFCEM,

 

Are you planning on using both resources_max.walltime and resources_max.ncpus?  Let's look at the following configuration.

 

#
# Create and define queue q1
#
create queue q1
set queue q1 queue_type = Route
set queue q1 route_destinations = q2
set queue q1 route_destinations += q3
set queue q1 enabled = True
set queue q1 started = True
#
# Create and define queue q2
#
create queue q2
set queue q2 queue_type = Execution
set queue q2 resources_max.ncpus = 16
set queue q2 resources_max.walltime = 06:00:00
set queue q2 enabled = True
set queue q2 started = True
#
# Create and define queue q3
#
create queue q3
set queue q3 queue_type = Execution
set queue q3 resources_max.ncpus = 128
set queue q3 resources_max.walltime = 24:00:00
set queue q3 enabled = True
set queue q3 started = True
#

 

We will use the Associating Vnodes With One Queue approach (See Admin Guide for details).  If you are feeling adventurous, compare this to the more robust Associating Vnodes With Multiple Queues approach.  

  • set node n1 queue = q2
  • set node n2 queue = q2
  • set node n3 queue = q2
  • set node n4 queue = q2
 

Now let's see where the jobs end up.

  • -lwalltime=02:00:00  ncpus=2 <--Route to q2.
  • -lwalltime=02:00:00  ncpus=4 <--Route to q2.
  • -lwalltime=02:00:00  ncpus=16 <--Route to q2.
  • -lwalltime=02:00:00  ncpus=32 <--Route to q3.  (Exceeded set queue q2 resources_max.ncpus = 16)

  • -lwalltime=02:00:00  ncpus=8 <--Route to q2.
  • -lwalltime=04:00:00  ncpus=8 <--Route to q2.
  • -lwalltime=08:00:00  ncpus=8 <--Route to q3.  (Exceeded set queue q2 resources_max.walltime = 06:00:00)
  • -lwalltime=16:00:00  ncpus=8 <--Route to q3.  (Exceeded set queue q2 resources_max.walltime = 06:00:00)

  • -lwalltime=99:99:99  ncpus=8 <--"qsub: Job rejected by all possible destinations" error
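
The routing outcomes listed above can be reproduced with a small model. This is only a sketch of the decision logic, not PBS code: it assumes the route queue tries its destinations in the listed order (q2, then q3) and that a destination accepts a job only if the request is within both resources_max limits. The names ExecQueue, hms_to_seconds and route are made up for illustration.

```python
from dataclasses import dataclass


def hms_to_seconds(text):
    """Convert an HH:MM:SS walltime string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass
class ExecQueue:
    """Hypothetical stand-in for an execution queue and its resources_max limits."""
    name: str
    max_ncpus: int
    max_walltime: str

    def accepts(self, ncpus, walltime):
        # A job is accepted only if it is within BOTH limits.
        within_cpus = ncpus <= self.max_ncpus
        within_time = hms_to_seconds(walltime) <= hms_to_seconds(self.max_walltime)
        return within_cpus and within_time


# Same order as route_destinations on q1: q2 first, then q3.
ROUTE_DESTINATIONS = [
    ExecQueue("q2", 16, "06:00:00"),
    ExecQueue("q3", 128, "24:00:00"),
]


def route(ncpus, walltime, destinations=ROUTE_DESTINATIONS):
    """Return the first destination that accepts the job, or None if all reject it."""
    for queue in destinations:
        if queue.accepts(ncpus, walltime):
            return queue.name
    return None


if __name__ == "__main__":
    for ncpus, walltime in [(2, "02:00:00"), (32, "02:00:00"),
                            (8, "08:00:00"), (8, "99:99:99")]:
        print(ncpus, walltime, "->", route(ncpus, walltime))
```

A result of None corresponds to the "Job rejected by all possible destinations" case in the list.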
 

I hope this helps,

BrianL


Looks great @BrianL, you addressed all the concerns I had.  I think the part I was missing was assigning multiple destination queues to the route queue (sounds like a strange thing to worry about in hindsight).

I do however have another question now.  Let's say I submit multiple jobs that would get routed to q2 in your example.  As a result, q2 is now full.  If another job that would be routed to q2 is submitted, and say q3 is empty, would that job sit and wait for q2, or would it go where it can run (in q3)?  I would like to implement that second scenario.  The example we were using here in the office: imagine you own a grocery store.  You want an express lane for shoppers with a few items, but if that line is too long and a regular lane is open, a shopper with few items should be able to go into the regular lane.  I don't want to FORCE all the small jobs into q2, just like express lanes shouldn't force all shoppers with a few items to use them.

Hope that makes sense.  Is that possible?  Will PBS route small jobs to another queue that has room if their first destination queue is full?

Thanks again.


CSCFCEM,

 

The current queue attributes act as gating limits and help route jobs to the proper queue.  So let's say you submit 100 small jobs and open the floodgates.  Since the route queue tries q2 before q3 (destinations are tried in the order listed), all 100 jobs will be placed in q2.  I understand this is not your desired behavior.

 

set queue q2 resources_max.ncpus = 16
set queue q2 resources_max.walltime = 06:00:00

 

If you want queues to fill up (by number of jobs or by resources used), you should consider defining any of the following queue attributes:

set queue q2 max_queued
set queue q2 max_queued_res.<resource>
set queue q2 queued_jobs_threshold
set queue q2 queued_jobs_threshold_res.<resource>

max_queued counts the jobs (and max_queued_res.<resource> the resources) for jobs in state=R plus state=Q, i.e. both running and queued.

queued_jobs_threshold counts the jobs (and queued_jobs_threshold_res.<resource> the resources) for jobs in state=Q only.
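
To see how a job cap changes the "100 small jobs" scenario, here is a rough model. It is a sketch under stated assumptions, not PBS behavior confirmed in this thread: it assumes the route queue moves on to the next destination once q2's max_queued-style cap is reached, and the cap of 20 jobs is a made-up value. QUEUES, pick_queue and submit_many are illustrative names only.

```python
# (name, max ncpus, max walltime in hours, cap on queued + running jobs or None)
# The cap of 20 on q2 is a hypothetical value chosen for this example.
QUEUES = [
    ("q2", 16, 6, 20),
    ("q3", 128, 24, None),
]


def pick_queue(ncpus, walltime_hours, job_counts):
    """Return the first queue that fits the request and still has room, else None."""
    for name, max_ncpus, max_hours, job_cap in QUEUES:
        if ncpus > max_ncpus or walltime_hours > max_hours:
            continue  # request exceeds this queue's resource limits
        if job_cap is not None and job_counts.get(name, 0) >= job_cap:
            continue  # queue is "full" (assumed to behave like max_queued)
        return name
    return None


def submit_many(count, ncpus, walltime_hours):
    """Submit identical jobs one by one and return how many landed in each queue."""
    job_counts = {}
    for _ in range(count):
        name = pick_queue(ncpus, walltime_hours, job_counts)
        if name is not None:
            job_counts[name] = job_counts.get(name, 0) + 1
    return job_counts


if __name__ == "__main__":
    # 100 small jobs: 2 cpus, 2 hours each.
    print(submit_many(100, 2, 2))
```

In this model the first 20 jobs fill q2 and the remaining 80 overflow into q3, which is the "express lane is full, use a regular lane" behavior asked about. Whether the cap should count queued plus running jobs (max_queued) or only queued jobs (queued_jobs_threshold) depends on which attribute is set.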

 

 

Thank You,

BrianL


Thanks again BrianL,

As usual, new solutions present new special cases.  Now we are starting to wonder what happens when q2 fills up and q2-"qualified" jobs get put into q3 (let's say behind some really big job).  Eventually q2 clears up while there is a job in q3 that is qualified to run in q2.  Is there a mechanism for PBS to automatically bring it over to q2 once q2 has room?  In general, is there a way for PBS to re-sort jobs between queues (not just within a single queue)?  I am aware of qalter, but was thinking of something that PBS would do automatically (perhaps based on an exit event from q2, although I don't think an epilogue can change other jobs in queues).  I am familiar with job_sort_formula, but I'm pretty sure that only sorts jobs within a queue, not between queues (correct me if I'm wrong).  I'm guessing this will have to involve "peer scheduling".

Also, it appears that queued_jobs_threshold is only available in PBSPro 13.0 and later, so I think it's a good time to mention I'm using 11.3 (better late than never I suppose).  However, I think max_queued should suffice.

Thank you.

