garygo

  1. single-node host name woes

    Actually, I solved my problem. It seems that with Torque 4.2.10, any setup that uses the same host name for pbs_server and pbs_mom is treated as a NUMA system. The key to getting the MOM node to state=free was to edit /var/lib/torque/mom_priv/mom.layout and enter the single line "nodes=0" in it. After restarting the PBS daemons (e.g., service pbs_server restart), "qnodes" shows the single 16-core compute node in state=free, and jobs now run. I still need to investigate further to confirm that all the CPUs and all 16 processors actually get used, but I think I'm on the right track now. Hopefully this will help other Torque users who want to set up a single-node queueing system. - Gary
  2. single-node host name woes

    Thanks for your comments, Scott. Yes, I have verified that the firewall allows traffic on the communication ports. I also tried turning off the firewall altogether. I realize that this forum is not exactly for Torque; I was just hoping that some knowledge here of host name issues would translate to my situation. I also have been unable to find any forum that directly addresses Torque as installed via yum on CentOS. Any suggestions on appropriate forums? Thanks, Gary
  3. single-node host name woes

    I am attempting to set up Torque to run on a single node with 20 logical cores, configured as np=16. Both the server and the single MOM node are meant to use the short host name (hostname -s), dev1-linux. The setup is mostly working: a queue exists and I can submit jobs to it. But qnodes shows the node with state=down, and the jobs do not run. I am running CentOS 6.8 with Torque 4.2.10. From having tried this in the past, I suspect a communication problem between pbs_server and pbs_mom, with some components seeing the host name as the full name (hostname -f) and some as the short name. The log files don't reveal any obvious errors, except that the server_logs file shows the server as 'dev1-linux.attlocal.net' even though I have used 'dev1-linux' everywhere I can think of that a host name is specified. Any suggestions about the state=down problem in general, or about other places (besides server_name and mom_priv/config) that control the host name?
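
For anyone retracing this thread, here is a rough Python sketch of the two things discussed in the posts above: comparing the short and full forms of a host name, and writing the one-line mom.layout file. It is only an illustration, not part of Torque. The helper names (short_name, same_host, write_mom_layout) are made up for this example; the file path and the "nodes=0" line are the ones quoted in the thread.

```python
import socket
from pathlib import Path


def short_name(host: str) -> str:
    """Return the first label of a host name, lowercased (similar to `hostname -s`)."""
    return host.split(".", 1)[0].lower()


def same_host(a: str, b: str) -> bool:
    """True when both names share the same short (first-label) name, ignoring case."""
    return short_name(a) == short_name(b)


def write_mom_layout(path, line: str = "nodes=0") -> None:
    """Write a single-line layout file; the default line is the one from the thread."""
    Path(path).write_text(line + "\n")


if __name__ == "__main__":
    # Names as this machine reports them; the daemons may see either form.
    local = socket.gethostname()
    fqdn = socket.getfqdn()
    print("short:", short_name(local))
    print("fqdn: ", fqdn)
    print("same short name:", same_host(local, fqdn))
    # Writing the real file needs root, so it is left commented out here.
    # write_mom_layout("/var/lib/torque/mom_priv/mom.layout")
```

If the short and full names disagree with what was put in server_name or mom_priv/config, that mismatch is worth ruling out before touching mom.layout. The daemons still have to be restarted after writing the file, as described in the solution post.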