  • Announcements

    • admin

      PBS Forum Has Closed   06/12/17

      The PBS Works Support Forum is no longer active.  For PBS community-oriented questions and support, please join the discussion at http://community.pbspro.org.  Any new security advisories related to commercially-licensed products will be posted in the PBS User Area (https://secure.altair.com/UserArea/). 

ingrid

Members
  • Content count

    12

  1. Hi, I have a few quick questions: Will PBS Pro 12.0 work on Windows Server 2012 R2? If not, is there any version of PBS Pro that will? Will we have to purchase that version in order to upgrade from 12.0? Sorry if any of the questions are silly. Any answers or opinions on this will be very much appreciated.
  2. removing an execution host

    Hi, I have a simple question. We are removing some computers from our cluster that were PBS execution hosts (not the server host). Do I need to do anything (like uninstalling PBS from those computers) before removing them? Thanks!
  3. net stop/start pbs_server

    Thanks for replying! Everything seems to work properly now that I've freed up space on those 2 hosts. Yes, currently the ER/OU files of all jobs overwrite the same file on a network drive. I tested it just now, and nothing went to the undelivered folder. Perhaps when many jobs finished at the same time, they couldn't all overwrite the file at once and so went undelivered? While the C drives were full, pbsnodes -a still showed state=free for those hosts.
  4. net stop/start pbs_server

    Found out that those 2 hosts had a full C drive from all the undelivered output files. Now I have to figure out why they were undelivered...
  5. net stop/start pbs_server

    My hosts are not DHCP. I found out that only 2 out of the 8 hosts are not able to receive jobs (did this by sending job requests to specific hosts). I checked their IP addresses and they have not been changed. They are also the same as the ones I get using pbs_hostn. Any ideas as to what I should do next? Thanks so much, Scott. There isn't really anyone to help me with this around here.
  6. net stop/start pbs_server

    It was working fine before the attempted restart. I'll look into what you suggested soon.
  7. Hi, I stopped the pbs_server (didn't really need to, and probably shouldn't have), and when I tried to restart it, the restart was unsuccessful. When I tried to start it again a few days later, it still said "The service is starting or stopping. Please try again later". Everything seemed to work fine at first, but now when I try to run jobs, only the first of many gets sent while the others wait in the queue, even though I'm sure there are many hosts that are able to run them.

     In sched_logs it says:

       "Failed to run: Execution server rejected request (15041)"

     In server_logs it says:

       11/24/2014 18:15:00;0008;Server@mpr42;Job;194[2].mpr42;Job Run at request of Scheduler@mpr42.medphys.net on exec_vnode (mpr45:ncpus=1)
       11/24/2014 18:15:00;0002;PBS_send_job;Svr;Log;Log opened
       11/24/2014 18:15:00;0002;PBS_send_job;Svr;PBS_send_job;pbs_version=PBSPro_12.0.1.130184
       11/24/2014 18:15:00;0002;PBS_send_job;Svr;PBS_send_job;pbs_build=mach=WIN32:security=:configure_args=
       11/24/2014 18:15:03;0002;PBS_send_job;Svr;Log;Log closed
       11/24/2014 18:15:06;0008;Server@mpr42;Job;194[2].mpr42;Unable to Run Job, MOM rejected
       11/24/2014 18:15:06;0080;Server@mpr42;Req;req_reject;Reject reply code=15041, aux=0, type=15, from Scheduler@mpr42.medphys.net

     Looking at the mom_logs, nothing has been logged about the jobs since the attempted server restart, although it used to be. How do I fix this problem? Any help will be much appreciated. Thanks in advance.
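When a log like the server_logs excerpt above grows long, it can help to pull out just the rejected requests. The following is a minimal sketch, not an official PBS tool: it assumes each record is six semicolon-separated fields, as the quoted lines appear to be, and the field names (`timestamp`, `event_code`, `source`, `obj_type`, `obj_name`, `message`) are labels chosen here for illustration.

```python
import re
from typing import NamedTuple, Optional


class LogRecord(NamedTuple):
    # Field names are illustrative labels, not official PBS terminology.
    timestamp: str
    event_code: str
    source: str
    obj_type: str
    obj_name: str
    message: str


def parse_record(line: str) -> LogRecord:
    """Split one log line into six fields.

    Only the first five semicolons are treated as separators, so a
    message that itself contains ';' stays in one piece.
    """
    parts = line.strip().split(";", 5)
    if len(parts) != 6:
        raise ValueError(f"unexpected log line: {line!r}")
    return LogRecord(*parts)


def reject_code(record: LogRecord) -> Optional[int]:
    """Return the numeric reject code if the message reports one."""
    match = re.search(r"Reject reply code=(\d+)", record.message)
    return int(match.group(1)) if match else None


# Two lines copied from the server_logs excerpt in the post.
SAMPLE = [
    "11/24/2014 18:15:06;0008;Server@mpr42;Job;194[2].mpr42;"
    "Unable to Run Job, MOM rejected",
    "11/24/2014 18:15:06;0080;Server@mpr42;Req;req_reject;"
    "Reject reply code=15041, aux=0, type=15, "
    "from Scheduler@mpr42.medphys.net",
]

if __name__ == "__main__":
    for raw in SAMPLE:
        rec = parse_record(raw)
        code = reject_code(rec)
        if code is not None:
            print(rec.timestamp, rec.obj_name, code)
```

Running the same two functions over the whole log file, line by line, gives a quick list of when each rejection happened, which can then be compared with the mom_logs on the execution host named in the matching "Job Run" line.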
  8. scheduling & restarting problem

    I'm just going to answer my own question in case this helps anyone.
    1. Yes, I had to restart the scheduler.
    2. To restart anything, I just had to run Command Prompt as administrator. (Haha.)
  9. Hi, I'm quite new to PBS Pro and there are some simple problems that I can't figure out how to solve:

     1. I'm trying to have PBS schedule the jobs so that it doesn't use all the CPUs in one host if it doesn't need to. So I edited node_sort_key in the sched_config file from "sort_priority HIGH" to "ncpus HIGH unused". But the change hasn't taken effect yet. Do I need to restart the scheduler?

     2. It sounds silly, but I haven't figured out how to restart the scheduler or MOM or anything else. When I try "net stop pbs_sched", it says "Access is denied". I should have administrative privileges, and if I don't, how do I get them?

     I am the only one using PBS Pro at the moment, and the person who installed PBS Pro on our cluster has left. Please help! Thanks in advance.
  10. Insufficient amount of resource ncpus

    I know now that 'qsub -l select=1:...' is the correct one, since that applies to each sub-job and not the whole job array. I realised that all the jobs that are ending are the ones on the other computers, which means there may be an issue with my mapped drive.
  11. Insufficient amount of resource ncpus

    Thanks for your quick reply, Scott. Using '-l select=' has helped, but I have another related problem. I'm trying to use PBS Pro to run multiple simulations at the same time. Currently, each simulation depends on 'PBS_ARRAY_INDEX'. I'm sure I can change that, but I'd prefer not to, since it might complicate things. How do I get PBS to run all the sub-jobs of the job array at the same time?

    For example, when I use:

      qsub -l select=10:ncpus=7 -J 1-10 ...

    it will often run the first sub-job on all chunks, and will not run the next sub-job until the previous one has completed.

    When I use:

      qsub -l select=1:ncpus=7 -J 1-10 ...

    it 'runs' the sub-jobs one after the other, on different chunks, but they end too quickly and there is no output (so they don't run properly at all, although there are no errors that I can see).

    I would really appreciate any help. Thanks in advance!
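For readers in the same situation, here is a minimal sketch of how a simulation launcher might pick its input from 'PBS_ARRAY_INDEX', so that each sub-job of a `-J 1-10` array works on its own case. The helper names and the `simulation_NN.in` naming scheme are hypothetical and only for illustration; the only thing taken from the post is that PBS sets the PBS_ARRAY_INDEX environment variable for each sub-job.

```python
import os
from typing import Mapping


def array_index(env: Mapping[str, str] = os.environ) -> int:
    """Read the sub-job index that PBS exports as PBS_ARRAY_INDEX."""
    raw = env.get("PBS_ARRAY_INDEX")
    if raw is None:
        # Not running as a sub-job of a job array (no -J on qsub).
        raise RuntimeError("PBS_ARRAY_INDEX is not set")
    return int(raw)


def input_file_for(index: int) -> str:
    """Map an index to an input file name (hypothetical naming scheme)."""
    return f"simulation_{index:02d}.in"


if __name__ == "__main__":
    idx = array_index()
    print(f"sub-job {idx} would read {input_file_for(idx)}")
```

Failing loudly when the variable is missing makes it easier to tell a sub-job that never started properly from one that ran against the wrong input, which is useful when sub-jobs end quickly with no output.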
  12. Hello, I am quite new to this and I am trying to get PBS Pro to work again after it has gone unused for some time. My problem is that every time I try to run a job, it doesn't run because of 'insufficient amount of resource ncpus'. I don't understand this, because the combined resources_available.ncpus of our computers is much higher than what I'm requesting. The job runs when the ncpus I request is less than or equal to the number of cores on my one computer (the 'front-end machine'), i.e. it works when the job only needs to run on my computer and none of the others. When I type in 'pbsnodes -a', though, it shows that there are many computers that are 'free'. So my question is: why is PBS not recognizing that there are many CPUs to use on the other machines? Any help would be much appreciated. Thanks!
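One way to see why a large combined ncpus total may still be "insufficient" is to model placement per chunk rather than per cluster. The sketch below is a simplified illustration, not how the PBS scheduler is implemented: it assumes that every chunk of a request must fit entirely on one host, and that a request for a single block of ncpus behaves like one chunk, which is consistent with the follow-up in this thread that '-l select=' helped. The per-host CPU counts and the host name mpr46 are made up for the example.

```python
from typing import Dict


def total_free_cpus(free_cpus: Dict[str, int]) -> int:
    """Combined free CPUs across all hosts."""
    return sum(free_cpus.values())


def placeable_chunks(free_cpus: Dict[str, int], ncpus_per_chunk: int) -> int:
    """How many chunks of the given size fit, one host at a time."""
    if ncpus_per_chunk <= 0:
        raise ValueError("ncpus_per_chunk must be positive")
    # A chunk cannot span hosts, so leftover CPUs on one host are not
    # pooled with leftover CPUs on another.
    return sum(n // ncpus_per_chunk for n in free_cpus.values())


def can_run(free_cpus: Dict[str, int], chunks: int, ncpus_per_chunk: int) -> bool:
    """True if the requested number of chunks can all be placed."""
    return placeable_chunks(free_cpus, ncpus_per_chunk) >= chunks


# Hypothetical free CPU counts per execution host.
HOSTS = {"mpr42": 8, "mpr45": 8, "mpr46": 8}

if __name__ == "__main__":
    print("combined free CPUs:", total_free_cpus(HOSTS))
    print("one 16-CPU chunk fits:", can_run(HOSTS, 1, 16))
    print("two 8-CPU chunks fit:", can_run(HOSTS, 2, 8))
```

Under these assumptions, a request for 16 CPUs as one chunk cannot be placed even though more than 16 CPUs are free in total, while the same 16 CPUs requested as two 8-CPU chunks can be.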