  • Announcements

    • admin

      PBS Forum Has Closed   06/12/17

      The PBS Works Support Forum is no longer active.  For PBS community-oriented questions and support, please join the discussion at http://community.pbspro.org.  Any new security advisories related to commercially-licensed products will be posted in the PBS User Area (https://secure.altair.com/UserArea/). 

meuser

  1. Hi everyone,

     I have been experiencing ongoing challenges getting preemption to work the way our site needs. I have set up soft limits to restrict a particular user group to a maximum of 50% of the system resources at any given time.

     ----
     set server scheduling = True
     set server max_run_res_soft.mem = [g:PBS_GENERIC=15tb]
     set server max_run_res_soft.ncpus += [g:PBS_GENERIC=992]
     set server max_run_res_soft.mem += [g:group1=8tb]
     set server max_run_res_soft.ncpus += [g:group1=491]
     set server max_run_res_soft.mem += [g:group2=8tb]
     set server max_run_res_soft.ncpus += [g:group=491]
     ---

     A job belonging to a user in group1 is preempting other jobs and exceeding these limits regardless. When I put that user's job on 'suspend', other users' jobs begin to execute. If I release it, it grabs resources and somehow forces other users' jobs to suspend.

     So, I took a look at pbsf, and this user ------

     I then ran a trace job on one of the individual array jobs that keeps getting preempted, and I saw the output below. Notice the odd date, Feb 22 2027? Where is this date coming from? Thanks!

     10/04/2015 11:10:26 L Failed to update estimated attrs.
     10/04/2015 11:10:26 L Fairshare usage of entity dbodi increased due to job becoming a top job.
     10/04/2015 11:10:26 L Job is a top job and will run at Mon Feb 22 04:32:18 2027
     10/04/2015 11:10:26 L Host set host=itmiuv2 has too few free resources or is too small

     The system date is correct: Sun Oct 4 11:16:17 EDT 2015
  2. 'pbsnodes -a' indicates that nodes are available, but jobs are being queued due to insufficient CPUs (992 avail, 207 assigned).

     resv_enable = True
     sharing = <various>
     license = l
     resources_available.arch = linux_cpu
     resources_available.mem = 16121888768kb
     resources_available.ncpus = 992
     resources_available.vmem = 0kb
     resources_available.vnode = <various>
     resources_available.accelerator_memory = 0kb
     resources_available.naccelerators = 0
     resources_available.netwins = 0
     resources_available.board = <various>
     resources_available.boardpair = <various>
     resources_available.halfrack = <various>
     resources_available.iru = <various>
     resources_available.iruhalf = <various>
     resources_available.rack = r001
     resources_available.socket = <various>
     resources_assigned.mem = 3394240512kb
     resources_assigned.ncpus = 207
     resources_assigned.vmem = 0kb
     resources_assigned.accelerator_memory = 0kb
     resources_assigned.naccelerators = 0
     resources_assigned.netwins = 0

     --- qstat -f
     comment = Not Running: Insufficient amount of resource ncpus (R: 16 A: 9 T: 992)
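
For anyone wanting to sanity-check the numbers in the two posts above, here is a minimal Python sketch of the arithmetic. It is not a PBS tool and does not call any PBS command; the values are copied by hand from the posts. It rests on two assumptions: that the PBS size suffixes (kb, tb) are powers of 1024, and that R, A and T in the qstat comment stand for requested, available and total. The helper names `size_to_bytes` and `parse_comment` are made up for this illustration.

```python
import re

# Assumption: PBS size suffixes are binary multiples (1kb = 1024 bytes).
UNIT_BYTES = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}


def size_to_bytes(text):
    """Convert a size string such as '8tb' or '16121888768kb' to bytes."""
    match = re.fullmatch(r"(\d+)([a-z]+)", text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(number) * UNIT_BYTES[unit]


def parse_comment(comment):
    """Pull the three numbers out of a '(R: x A: y T: z)' job comment.

    Assumption: R = requested, A = available, T = total.
    """
    match = re.search(r"\(R: (\d+) A: (\d+) T: (\d+)\)", comment)
    if match is None:
        raise ValueError("no (R: A: T:) triple found in comment")
    requested, available, total = (int(g) for g in match.groups())
    return {"requested": requested, "available": available, "total": total}


# Values copied from the pbsnodes output and the qmgr settings in the posts.
avail_ncpus = 992
assigned_ncpus = 207
avail_mem = size_to_bytes("16121888768kb")
assigned_mem = size_to_bytes("3394240512kb")
group1_ncpus_limit = 491
group1_mem_limit = size_to_bytes("8tb")

# Free resources according to pbsnodes: available minus assigned.
free_ncpus = avail_ncpus - assigned_ncpus
free_mem_kb = (avail_mem - assigned_mem) // 1024

# Fraction of the whole system that each group1 soft limit represents.
ncpus_share = group1_ncpus_limit / avail_ncpus
mem_share = group1_mem_limit / avail_mem

counts = parse_comment(
    "Not Running: Insufficient amount of resource ncpus (R: 16 A: 9 T: 992)"
)

print(f"free ncpus per pbsnodes: {free_ncpus}")
print(f"free mem per pbsnodes:   {free_mem_kb}kb")
print(f"group1 ncpus soft limit: {ncpus_share:.1%} of system")
print(f"group1 mem soft limit:   {mem_share:.1%} of system")
print(f"scheduler comment:       {counts}")
```

Under those assumptions, the ncpus soft limit for group1 comes out just under half of the 992 CPUs, while the 8tb memory limit is somewhat more than half of the reported memory. The free CPU count derived from pbsnodes is also far larger than the "A" figure in the job comment, which is the discrepancy the second post describes.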