  • Announcements

    • admin

      PBS Forum Has Closed   06/12/17

      The PBS Works Support Forum is no longer active.  For PBS community-oriented questions and support, please join the discussion at http://community.pbspro.org.  Any new security advisories related to commercially-licensed products will be posted in the PBS User Area (https://secure.altair.com/UserArea/). 

Marcelo

Members
  • Content count

    3

  1. OpenMPI 1.6.5 and PBSPro

    Dear users, I am trying to use OpenMPI 1.6.5 with PBSPro_13.1.501.161802, but I am running into problems. I can run the case on one node, but when I try to split the job across several nodes the application does not start, although qstat shows the job as running. I found some references saying that OpenMPI must be compiled with a special flag to integrate it with PBSPro. Has anyone successfully integrated OpenMPI 1.6.5 with PBSPro? Was it necessary to compile OpenMPI with a special flag? If so, which flags? Thanks for your help. Regards, Marcelo
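
    One way to check whether an existing OpenMPI build already includes PBS (TM) support is to look for the `tm` components in the output of `ompi_info`. The sketch below is only an illustration: the helper name `has_tm_support` is hypothetical, and it assumes that a TM-enabled build lists both a `ras: tm` and a `plm: tm` component. If they are missing, the usual remedy is to rebuild OpenMPI with `./configure --with-tm=<PBS install dir>`, where the install directory depends on the site.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if the ompi_info-style text on
# stdin lists both the "ras: tm" and "plm: tm" components.
has_tm_support() {
    awk '
        /MCA ras: tm/ { ras = 1 }
        /MCA plm: tm/ { plm = 1 }
        END { exit !(ras && plm) }
    '
}

# Only query the local installation if ompi_info is actually on PATH.
if command -v ompi_info >/dev/null 2>&1; then
    if ompi_info | has_tm_support; then
        echo "tm components found: this build can launch through PBS"
    else
        echo "tm components missing: consider rebuilding with --with-tm"
    fi
fi
```

    With TM support in place, `mpirun` inside a PBS job should pick up the allocated nodes from PBS itself rather than needing a hostfile.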
  2. Cannot delete hung job

    I have a problem with one job whose status is held, but I cannot delete it.

        # qstat
        Job id           Name             User             Time Use S Queue
        ---------------- ---------------- ---------------- -------- - -----
        1535.admin       impi             vbrito84         00:00:00 H workq

        # qstat -a

        sansao:
                                                                    Req'd  Req'd   Elap
        Job ID          Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
        --------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
        1535.admin      vbrito84 workq    impi        55072   4  96    --  200:0 H 00:00

        # qdel 1535.admin
        qdel: Unauthorized Request 1535.admin
        # qdel -W force 1535.admin
        qdel: Unauthorized Request 1535.admin

    Here is some configuration information from my system:

        # /opt/pbs/11.3.2.130131/bin/pbs_hostn -v sansao
        primary name: sansao.peno.coppe.ufrj.br (from gethostbyname())
        aliases: sansao
        address length: 4 bytes
        address: 146.164.57.11 (188327058 dec)
        name: sansao.peno.coppe.ufrj.br

        # /opt/pbs/11.3.2.130131/bin/pbs_hostn -v admin
        primary name: admin.default.domain (from gethostbyname())
        aliases: admin
        aliases: loghost
        address length: 4 bytes
        address: 10.0.10.1 (17432586 dec)
        name: admin.default.domain

        # qstat -Qf
        Queue: workq
            queue_type = Execution
            total_jobs = 1
            state_count = Transit:0 Queued:0 Held:1 Waiting:0 Running:0 Exiting:0 Begun:0
            enabled = True
            started = True

        # qmgr -c 'print server'
        #
        # Create queues and set their attributes.
        #
        #
        # Create and define queue workq
        #
        create queue workq
        set queue workq queue_type = Execution
        set queue workq enabled = True
        set queue workq started = True
        #
        # Set server attributes.
        #
        set server scheduling = True
        set server default_queue = workq
        set server log_events = 511
        set server mail_from = adm
        set server query_other_jobs = True
        set server resources_default.ncpus = 1
        set server default_chunk.ncpus = 1
        set server scheduler_iteration = 600
        set server resv_enable = True
        set server node_fail_requeue = 310
        set server max_array_size = 10000
        set server pbs_license_info = /opt/altair/licensing11.0/altair_lic.dat
        set server pbs_license_min = 1
        set server pbs_license_max = 2147483647
        set server pbs_license_linger_time = 31536000
        set server license_count = "Avail_Global:0 Avail_Local:0 Used:0 High_Use:0 Avail_Sockets:26 Unused_Sockets:0"
        set server eligible_time_enable = False
        set server max_concurrent_provision = 5

        # cat /etc/pbs.conf
        PBS_EXEC=/opt/pbs/default
        PBS_HOME=/var/spool/PBS
        PBS_START_SERVER=1
        PBS_START_MOM=0
        PBS_START_SCHED=1
        PBS_SERVER=sansao
        PBS_SCP=/usr/bin/scp
        PBS_RSHCOMMAND=ssh

    Does anyone have an idea how I can delete this held job? Thanks.
  3. 'Mom' attribute

    Hi, I have several nodes that show more than one host in the Mom attribute. How can I remove the second one? Here is an example:

        n016
            Mom = n016.default.domain,n013.default.domain
            Port = 15002
            pbs_version = PBSPro_11.3.2.130131
            ntype = PBS
            state = free
            pcpus = 24
            resources_available.arch = linux
            resources_available.host = n016
            resources_available.mem = 32829916kb
            resources_available.ncpus = 24
            resources_available.vnode = n016
            resources_assigned.accelerator_memory = 0kb
            resources_assigned.mem = 0kb
            resources_assigned.naccelerators = 0
            resources_assigned.ncpus = 0
            resources_assigned.netwins = 0
            resources_assigned.vmem = 0kb
            resv_enable = True
            sharing = default_shared
            license = l

    I want to remove n013 from the n016 Mom attribute. Thanks for your help, Marcelo