  • Announcements

    • admin

      PBS Forum Has Closed   06/12/17

      The PBS Works Support Forum is no longer active.  For PBS community-oriented questions and support, please join the discussion at http://community.pbspro.org.  Any new security advisories related to commercially-licensed products will be posted in the PBS User Area (https://secure.altair.com/UserArea/). 

adiaz

Members
  • Content count: 113
  • Joined
  • Last visited

About adiaz
  • Rank: Advanced Member

Profile Information
  • Gender: Not Telling
  1. There is some configuration required for using GPUs. It's not too difficult. We recently produced a document that describes how you might go about doing this. You can obtain that document here: http://www.pbsworks.com/ResLibSearchResult.aspx?keywords=Scheduling%20Jobs%20onto%20GPUs You can also obtain all the most recent documentation here: http://www.pbsworks.com/PageSupport.aspx?id=19
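    For what it's worth, here is a rough sketch of the kind of setup that document walks through, assuming you define a custom "ngpus" resource; the resource name, node name and counts below are only placeholders:
        # 1. define the custom resource in PBS_HOME/server_priv/resourcedef:
        #      ngpus type=long flag=nh
        # 2. add "ngpus" to the "resources:" line in the scheduler's sched_config
        # 3. tell PBS how many GPUs each node has:
        qmgr -c "set node node01 resources_available.ngpus = 2"
        # 4. request GPUs in the job:
        qsub -l select=1:ncpus=1:ngpus=1 job.sh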
  2. There really isn't any additional setup work required. You will, however, need to think about your license. You probably want a license that covers however many threads you want to run on each execution node; this is true whether or not you have hyperthreading. PBS Professional licensing works basically by the thread (this is sometimes referred to as "by the job"). You can use fewer than the total number of CPU cores (undersubscription), more than the total number of CPU cores (oversubscription), or only allow jobs up to the total number of CPU cores on each execution node (equal subscription). This last case is how most people use PBS. If you have a license for 32 cores (8 cores per node times 4 nodes) and you then enable hyperthreading, you will need double the licenses, or 64, to completely use all the cores on each node.
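    As a sanity check (just a suggestion; the node name is a placeholder), you can see how many threads each MOM is reporting, and therefore how many licenses a fully packed node would consume:
        pbsnodes node01 | grep resources_available.ncpus
        # with 8 physical cores and hyperthreading on, expect ncpus = 16 per node,
        # i.e. 4 nodes x 16 = 64 licenses to fully subscribe the cluster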
  3. Depending upon the exact version you are using, PBS Professional in most modern versions (roughly v8 or higher) can define defaults at several levels in qmgr, such as the server and queue level. This includes what is called the "default_chunk" at the server and queue level, as well as the "default_qsub_arguments" server-level qmgr setting. I think what you are asking for is a user-defined default, and I can think of two other options to make that happen. One would be to simply place a line at the top of your job script to source a settings file, say in your home directory. I haven't tried this myself and I have to say I am not sure it would work. The second would be to create what is called a "hook", which is a Python script that would, upon job submission, check the user's home directory for, say, a file named ".pbs_defaults" and apply those settings. You might have to make some decisions on the logic of how that would work, but that is definitely an approach that would work. The short answer is: if you are OK with a global setting of defaults you need to do this in qmgr, but if you are looking for user-changeable defaults via a config file you should probably write a hook. Hooks are covered in the Admin Guide.
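    For the global (qmgr) route, something along these lines is what I mean; the queue name "workq" and the option values are only examples:
        # server-wide default qsub arguments:
        qmgr -c 'set server default_qsub_arguments = "-m n"'
        # default chunk resources at the server and queue level:
        qmgr -c "set server default_chunk.ncpus = 1"
        qmgr -c "set queue workq default_chunk.mem = 1gb"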
  4. why log files go to "undelivered" dir ?

    Well, again, I think the shortest path to getting this working is using $usecp, if you have common mount points across your execution nodes. To add multiple directories you add multiple lines in the MOM config:
        $usecp *:/home/user /home/user
        $usecp *:/data /data
        $usecp *:/blah /blah
    This is documented in the PBS Pro 9.1 Admin Guide on page 268. I would also encourage you to use the latest version of PBS if possible. Version 10.4 was just released this week.
  5. why log files go to "undelivered" dir ?

    The return of output files is a common initial setup issue. I suspect you simply don't have the nodes that aren't returning the files configured correctly. How are you trying to return the files? There are several ways to do this, but my suggested route would be to use the MOM config option called "$usecp". You can add a line to the MOM config like:
        $usecp *:/path /path
    Don't forget to restart the MOM process. This tells the MOM to copy output files destined below /path with a local cp instead of trying to use rcp or scp (which is what PBS does by default). You can read more about "$usecp" in the Admin Guide.
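    Concretely, it looks something like this on each execution node (the paths assume a default PBS_HOME of /var/spool/PBS; adjust for your install):
        # add the mapping to the MOM config
        echo '$usecp *:/path /path' >> /var/spool/PBS/mom_priv/config
        # make the MOM reread its config (a HUP should be enough; a full restart also works)
        kill -HUP `cat /var/spool/PBS/mom_priv/mom.lock`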
  6. Queue job limit reached ?

    What I am suggesting is that you need your basic user authentication to work. So you are running a Linux-based PBS cluster with LDAP authentication? You need to be able to log in to an execution node (outside of PBS) using the user's login and password. If you cannot do this, nothing inside PBS will work either. You should test:
        1. Log in (by hand) as the user on an execution node. If this works, go to step 2.
        2. Run "qsub -I" as that user.
    These two things will tell you whether the problem is outside or inside PBS. My suspicion is that somehow LDAP isn't authenticating the user.
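    In other words, something like this (the user and node names are made up):
        # step 1: plain login on an execution node, outside of PBS
        ssh testuser@node01
        # step 2: if that works, submit an interactive job as the same user
        qsub -I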
  7. Queue job limit reached ?

    The other thing you can do is use qsub with the "-I" option as that user: qsub -I -u 127238. This will open an interactive session as that user. If it really is a password issue, this will also fail. Also, have you checked to make sure the password is in fact set for that user?
  8. Queue job limit reached ?

    From your tracejob output I see this: What version of PBS are you using? You might need to look into the server setting "flatuid" and possibly set it to true. Is your submitting user's ID really "127238", or is that a mistake?
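    If it does turn out to be a flatuid issue, the qmgr setting is just:
        qmgr -c "set server flatuid = True"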
  9. Queue status alerts to System Admin

    There are two things you could do:
        1. Create a wrapper (shell script) around qsub that detects whether people submitted their job with -M and, if not, adds the needed option and the specified users to mail to. You would need to set both the "-m" and "-M" options to qsub; see the qsub man page for the difference.
        2. Use a hook. This would essentially function the same as the wrapper, but would be a Python script inserted into PBS. You can read more about hooks in the PBS Pro Admin Guide, chapter 9. There are several examples included, one of which I am sure can be adapted to do what you are trying to do.
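    As a very rough sketch of option 1 (the real qsub path, the admin address and the naive "-M" test are all placeholders you would adapt):
        #!/bin/sh
        # hypothetical wrapper around the real qsub
        REAL_QSUB=/opt/pbs/default/bin/qsub
        # note: this simple pattern match misses forms like "-Muser@host"
        case " $* " in
          *" -M "*) exec "$REAL_QSUB" "$@" ;;                              # user already gave -M, pass through
          *)        exec "$REAL_QSUB" -m abe -M admin@example.com "$@" ;;  # otherwise add mail options
        esac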
  10. Queue status alerts to System Admin

    I guess I am a little confused by your question. Do you mean when a *job* (not a queue) is queued, running, ended, etc.? A job owner will get email along the path of job submission; this can be controlled with the "-m" option to qsub. You can check this out in the qsub man page. Also, "-M" controls who gets the mail, so you can include other people besides the job owner.
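    For example (the addresses and script name are placeholders):
        # mail on abort, begin and end, to the job owner plus the admin
        qsub -m abe -M owner@example.com,admin@example.com job.sh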
  11. Distributing executables to chunks

    I think this has to do with your MPI. If you check out the Admin Guide (section 11.10 in the 9.1 AG), there is a section on integration with MPI. PBS needs to be able to use what is called "pbs_attach" to allow it to monitor, clean up and control all the processes in a PBS job. Some MPI vendors do this for you and provide a configure option when you build it (like Open MPI). For others we can use a simple scripted integration that redirects the mpirun command to a PBS version of that command; it essentially takes over the -np and machine-file arguments (passing all other options through) and makes use of the pbs_attach command. You basically have two options: 1. take an existing pbs_mpirun and edit it for your use (see 11.10.1.3), or 2. make use of pbsrun_wrap, which, while ultimately more flexible, requires more work. You simply need to set up the MPI integration correctly and the accounting should work correctly.
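    If you go the pbsrun_wrap route, the wrapping command itself looks roughly like this; the mpirun path below is just an example, and the pbsrun.<name> keyword has to match one of the MPIs listed in the guide:
        pbsrun_wrap /opt/mpich2/bin/mpirun pbsrun.mpich2
        # to undo it later:
        pbsrun_unwrap pbsrun.mpich2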
  12. Distributing executables to chunks

    Exactly! One entry is written to the PBS_NODEFILE per chunk (or, if mpiprocs is > 1, one entry per MPI process in the chunk). They are written in the order that the chunks are declared. Adam
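    So, for example (hostnames made up), a request like -l select=2:ncpus=4:mpiprocs=2 would give a PBS_NODEFILE along the lines of:
        node01
        node01
        node02
        node02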
  13. Distributing executables to chunks

    Can you tell me which MPI you are using? It should be possible, using your MPI's options (depending upon what you are using), to pass an executable to each rank, either via an "appfile" or some other option. Knowing which MPI you are using would allow for more specific advice. I think basically the answer is that once PBS assigns you the resources, it is really up to you to use those resources the way you need to in the job script. There may be some other ways, but the appfile seems easiest.
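    With Open MPI, for instance, it would look roughly like this inside the job script (the executable names and rank counts are placeholders):
        # build an appfile: one line per executable, with its rank count
        echo "-np 2 ./solverA"  > appfile
        echo "-np 2 ./solverB" >> appfile
        mpirun --app appfile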
  14. This is probably something you want to look up in the Materials Studio documentation to make sure you have followed all their instructions. We definitely partner with them, but technically they did the actual integration work; a couple of years ago I tested this for them. If you cannot find the instructions in the Accelrys documentation, you probably want to contact their support team. In short, what I remember doing was setting up the cluster, installing PBS and Materials Studio on Linux, and accessing this from Windows using their Windows GUI. There is then a Windows-based server console where you have to set up a connection to the Linux server as a "gateway", and then a web site for the gateway where you can see the running jobs and manipulate them. This was previously done mostly with Perl, if I remember correctly. This was all a couple of years ago, though. They used to package PBS with their products, so you may already have it. If you have trouble finding this information from them, please let me know and I will see if I can help you find it. Again, I think emailing their support group would be the right move. They can point you to the correct docs and tell you about the latest and greatest things about the integration. http://accelrys.com/customer-support/contact.html
  15. The version is totally relevant to this issue, and always relevant when you are asking for help with something that looks like a bug. This isn't a Torque support forum. It sounds like Torque has (another) bug. Torque is based on OpenPBS, but they parted ways at version 2. PBS Pro is now on version 10.2. You might want to try their forum or mailing list; I am sure they have one. Also, I wouldn't be doing my job if I didn't suggest you simply dump Torque altogether and use something supported like PBS Pro so you won't have these issues.