Scott Suchyta

Scott Suchyta last won the day on June 21 2016

  1. Referencing the latest PBS Professional Installation and Upgrade Guide (14.2.1). A few more points: which ports should be opened for interactive qsub jobs? The port is not known in advance, and it cannot be, since each session is different, there can be a possibly unlimited number of them, and they don't use privileged ports because they are run as the user rather than as root (each session simply communicates its port to the execution nodes by setting an attribute). So allow traffic _from_ the execution hosts.
  2. You could try the following as root: qhold `qselect`. The qselect command lists the jobs in the queue(s); since no arguments are supplied to qselect, it will list all jobs. You may want to review the qselect documentation to understand how you can filter the job list. You reference Compute Manager, which suggests to me that you (your company) should have a support contract. I wouldn't know whether your company is current on the support contract, but it wouldn't hurt to send email to the PBS Support team to get real-time help. See http://www.pbsworks.com/ContactSupport.aspx
  3. The head node and the compute nodes are not required to run the same OS. So, your configuration "server parts on Redhat 6.7 and install pbs_mom on CentOS 7.3" is a valid configuration.
  4. Job(s) running on the "offline" node will continue to execute. Setting a node to "offline" will NOT stop the executing job(s) on the node. Referring to the reference guide: pbsnodes -o <nodename> Marks listed hosts as OFFLINE even if currently in use. This is different from being marked DOWN. A host that is marked OFFLINE will continue to execute the jobs already on it, but will be removed from the scheduling pool (no more jobs will be scheduled on it.) For hosts with multiple vnodes, pbsnodes operates on a host and all of its vnodes, where the hostname is resources_available.host, which is the name of the natural vnode. To offline a single vnode in a multi-vnoded system, use: qmgr -c "set node <nodename> state=offline" If you want to set the node comment at the same time as offlining the node, you can use pbsnodes -C "Note: not accepting new jobs" -o <nodename> [<nodename2> ...] With qmgr, you would use qmgr -c "set node <nodename> comment = 'Note: not accepting new jobs'"
  5. Have you looked at setting the node offline? pbsnodes -o <nodename> or you can use qmgr -c "set node <nodename> state = offline"
  6. What do you get if you execute rpm -qi <package_name>? Otherwise, you may want to review the site of the company that maintains Torque: http://goo.gl/j04vz
  7. Sorry, garygo, this forum is for PBS Professional; I cannot comment on Torque. Have you verified that the firewall is not blocking communication ports? I am asking because we receive several PBS Professional support calls related to pbs_server and pbs_mom communication issues that turn out to be caused by the firewall rules.
  8. The URL you referenced is based on TORQUE, not PBS Professional. Can you confirm that your site is using PBS Professional? There can be several reasons why jobs are pending (e.g., scheduling policies, resource requests). As a user, you have some commands you can use to figure out why the job is not running. The most common command is qstat -f $PBS_JOBID where $PBS_JOBID is the job in question. This will show the full details of the job, including a comment attribute that contains a string. Usually, this is the last statement from the PBS Scheduler. However, admins have the ability to overwrite this comment field. Another command, which needs to be executed on the PBS Server, is tracejob -z -n $DAYS $PBS_JOBID where $DAYS is how many days in the past you want to parse the daemon logs on the local machine. Since I mentioned logging into the PBS Server, you will see Server and Scheduler logs. If the output of tracejob is NOT clear, then the admin could have filtered out DEBUG and other verbose records. You may want to look through the Scheduler logs ($PBS_HOME/sched_logs/) yourself. I don't know if your site is using any other 3rd party integrations (e.g., allocation managers, cluster managers) to influence scheduling. So, you will need to provide more details about your site's configuration in order for someone on the list to provide you better guidance.
  9. We are excited to inform you that the PBS Professional team has successfully completed its goal of releasing the open source licensing option of PBS Professional by mid-2016. Now Available for Download: Visit the brand new website www.pbspro.org to learn more about the initiative and download the software packages. Join the Community: We want you to be a part of the open source project community! Join our forum to continue to receive announcements and interact with one another to discuss topics and help answer questions. Everyone is welcome to contribute to the code in a variety of ways including developing new capabilities, testing, etc. Visit www.pbspro.org to learn about our different mailing lists and the numerous ways to participate. Thank you, The PBS Professional Open Source Team
  11. What do the server_logs and mom_logs say about not being able to qdel the job? Who is trying to qdel the job? Is it root or the user that submitted the job? Can you share the logs? The JOBID has "admin", but your /etc/pbs.conf says PBS_SERVER=sansao. It looks like your cluster has multiple interfaces and points of name resolution, so you will need to describe what your cluster looks like from a network and naming point of view. I noticed in your other post about the Mom attribute that you are running 11.3.2. This is a pretty old version of PBS. I am not saying 11.3.2 has any issues, but I wanted to make sure you were aware that v13.0 has much better support for multi-interface clusters.
  12. That is very interesting. The MOM attribute represents the hostname of the host on which the MoM daemon runs. The server can set this to the FQDN of the host on which MoM runs, if the vnode name is the same as the hostname. By chance, does n016 resolve to both n016.default.domain and n013.default.domain? On the PBS Server and on the PBS MOM (n016), please execute the following to see the possible hostname resolution: pbs_hostn -v n016 If you have a name resolution issue where n016 thinks it is n016.default.domain and n013.default.domain, make sure you fix that before proceeding with the following suggestion. Your node configuration looks very basic, meaning there are no custom resources. Your fastest method of cleaning this up would be to delete the node from qmgr and re-create it: qmgr -c "d n n016" qmgr -c "c n n016"
  13. Based on what I can find, it looks like this was resolved in v12.1.
  14. Referencing the PBS Professional 12.2.6 Release Notes, it does indicate the following:
  15. This is somewhat _old news_, but still worth making sure people know: qmgr history (and editing) is available as of PBS Professional v12.2. See a snippet from the manual below:
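
Going back to item 8 (this is separate from the manual excerpt mentioned in item 15): the comment attribute is usually the quickest thing to check for a pending job, so here is a minimal shell sketch that pulls just that line out of qstat -f output. The helper name job_comment is made up for illustration and is not part of PBS Professional. It assumes the attribute is printed as a "comment = ..." line, and a long comment that wraps onto continuation lines would be cut at the first line.

```shell
# Hypothetical helper (not part of PBS Professional): reads `qstat -f` output
# on stdin and prints the first line of the job's comment attribute.
# Assumption: the attribute is printed as "    comment = <text>".
job_comment() {
    sed -n 's/^[[:space:]]*comment = //p'
}

# Intended use on a host with the PBS commands installed:
#   qstat -f "$PBS_JOBID" | job_comment
```

If nothing is printed, the job has no comment set yet, and tracejob on the PBS Server is the next place to look.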