  • Announcements

    • admin

      PBS Forum Has Closed   06/12/17

      The PBS Works Support Forum is no longer active.  For PBS community-oriented questions and support, please join the discussion at http://community.pbspro.org.  Any new security advisories related to commercially-licensed products will be posted in the PBS User Area (https://secure.altair.com/UserArea/). 


  1. pbs_comm on wrong eth

    I've finished setting up ohpc 1.2 with PBS Pro. The setup is as follows: the master is connected to the LAN on eth0 and to the compute nodes (via a switch) on eth1. pbs_comm, however, defaults to the IP address of eth0, which the compute nodes of course cannot reach.

    Output from /var/spool/pbs/comm_logs:

        01/20/2017 14:54:41;0002;Comm@ricr-cluster;Svr;Comm@ricr-cluster;Exiting
        01/20/2017 14:54:41;0002;Comm@ricr-cluster;Svr;Log;Log closed
        01/20/2017 14:55:08;0002;Comm@ricr-cluster;Svr;Log;Log opened
        01/20/2017 14:55:08;0002;Comm@ricr-cluster;Svr;Comm@ricr-cluster;pbs_version=14.1.0
        01/20/2017 14:55:08;0002;Comm@ricr-cluster;Svr;Comm@ricr-cluster;pbs_build=mach=N/A:security=N/A:configure_args=N/A
        01/20/2017 14:55:08;0002;Comm@ricr-cluster;Svr;Comm@ricr-cluster;/opt/pbs/sbin/pbs_comm ready (pid=16276), Proxy Name:ricr-cluster:17001, Threads:4
        01/20/2017 14:55:08;0c06;Comm@ricr-cluster;TPP;Comm@ricr-cluster(Thread 1);tfd=18, Leaf registered address 10.155.198.146:15004
        01/20/2017 14:55:14;0c06;Comm@ricr-cluster;TPP;Comm@ricr-cluster(Thread 2);tfd=19, Leaf registered address 10.155.198.146:15001
        01/20/2017 14:55:41;0c06;Comm@ricr-cluster;TPP;Comm@ricr-cluster(Thread 3);tfd=20, Leaf registered address 192.168.1.4:15003

    I don't really understand what is going on here, since 192.168.1.4 is the IP of compute node1.

    Output from /var/spool/pbs/mom_logs/:

        01/20/2017 14:55:41;0d80;pbs_mom;TPP;pbs_mom(Thread 0);sd 0, Received noroute to dest 192.168.1.5:15001, msg="tfd=20, pbs_comm:10.155.198.146:17001: Dest not found"
        01/20/2017 14:55:46;0001;pbs_mom;Svr;pbs_mom;Access from host not allowed, or unknown host (15008) in is_request, bad connect from 10.155.198.146:15001

    This is the pbs.conf:

        PBS_SERVER=ricr-cluster
        PBS_START_SERVER=1
        PBS_START_SCHED=1
        PBS_START_COMM=1
        PBS_START_MOM=0
        PBS_EXEC=/opt/pbs
        PBS_HOME=/var/spool/pbs
        PBS_CORE_LIMIT=unlimited
        PBS_SCP=/bin/scp

    For the compute nodes, the 0/1 flags are interchanged, of course.

    All services are running:

        [root@c1 ~]# ps -ef | grep pbs
        root 3952 1 0 14:55 ? 00:00:00 /opt/pbs/sbin/pbs_mom
        root 4099 3812 0 15:23 pts/0 00:00:00 grep --color=auto pbs

        [root@ricr-cluster ~]# ps -ef | grep pbs
        root 16276 1 0 14:55 ? 00:00:00 /opt/pbs/sbin/pbs_comm
        root 16293 1 0 14:55 ? 00:00:00 /opt/pbs/sbin/pbs_sched
        root 16753 1 0 14:55 ? 00:00:00 /opt/pbs/sbin/pbs_ds_monitor monitor
        postgres 16853 1 0 14:55 ? 00:00:00 /usr/bin/postgres -D /var/spool/pbs/datastore -p 15007
        postgres 16861 16853 0 14:55 ? 00:00:00 postgres: postgres pbs_datastore 10.155.198.146(40520) idle
        root 16870 1 0 14:55 ? 00:00:00 /opt/pbs/sbin/pbs_server.bin
        root 17700 14501 0 15:21 pts/0 00:00:00 grep --color=auto pbs

    The IP listed after pbs_datastore is the unwanted IP of eth0. Pinging and sshing work in both directions.

    The nodes are all listed as down. I'm guessing this is due to them not communicating with pbs_comm:

        [root@ricr-cluster ~]# pbsnodes -av
        c1
            Mom = c1.localdomain
            Port = 15002
            pbs_version = unavailable
            ntype = PBS
            state = state-unknown,down
            pcpus = 1
            resources_available.host = c1
            resources_available.ncpus = 1
            resources_available.vnode = c1
            resources_assigned.accelerator_memory = 0kb
            resources_assigned.mem = 0kb
            resources_assigned.naccelerators = 0
            resources_assigned.ncpus = 0
            resources_assigned.netwins = 0
            resources_assigned.vmem = 0kb
            comment = node down: communication closed
            resv_enable = True
            sharing = default_shared

    How do I reconfigure pbs_comm?
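
    To see at a glance which address each daemon registered with pbs_comm, the "Leaf registered address" entries can be pulled out of the comm log. The following is only a minimal Python sketch, assuming the semicolon-delimited record layout seen in the excerpt above; the function name leaf_registrations and the regular expression are illustrative and not part of PBS.

```python
import re
from collections import defaultdict

# Matches the TPP registration message seen in the comm log excerpt.
LEAF_RE = re.compile(r"Leaf registered address (\d+\.\d+\.\d+\.\d+):(\d+)")


def leaf_registrations(log_lines):
    """Map each registered IP address to the sorted list of ports seen for it."""
    ports_by_ip = defaultdict(set)
    for line in log_lines:
        # Assumed layout: timestamp;code;daemon;type;object;message
        fields = line.rstrip("\n").split(";", 5)
        if len(fields) < 6:
            continue  # not a complete log record, skip it
        match = LEAF_RE.search(fields[5])
        if match:
            ports_by_ip[match.group(1)].add(int(match.group(2)))
    return {ip: sorted(ports) for ip, ports in ports_by_ip.items()}


# Three records copied from the comm log in this post.
sample = [
    "01/20/2017 14:55:08;0c06;Comm@ricr-cluster;TPP;Comm@ricr-cluster(Thread 1);"
    "tfd=18, Leaf registered address 10.155.198.146:15004",
    "01/20/2017 14:55:14;0c06;Comm@ricr-cluster;TPP;Comm@ricr-cluster(Thread 2);"
    "tfd=19, Leaf registered address 10.155.198.146:15001",
    "01/20/2017 14:55:41;0c06;Comm@ricr-cluster;TPP;Comm@ricr-cluster(Thread 3);"
    "tfd=20, Leaf registered address 192.168.1.4:15003",
]

for ip, ports in leaf_registrations(sample).items():
    print(ip, ports)
```

    Run against those three records, this groups ports 15001 and 15004 under 10.155.198.146 (the eth0 address) and port 15003 under 192.168.1.4. To scan a real log, pass an open file object instead of the sample list.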