  • Announcements

    • admin

      PBS Forum Has Closed   06/12/17

      The PBS Works Support Forum is no longer active.  For PBS community-oriented questions and support, please join the discussion at http://community.pbspro.org.  Any new security advisories related to commercially-licensed products will be posted in the PBS User Area (https://secure.altair.com/UserArea/). 

chiyen

  1. I have installed PBS Pro 14.2 on a failover cluster. When I run a job and check it with tracejob, the output shows an error:

     06/06/2017 11:34:01 L Failed to run: (15016)

     How can I solve this problem?

     [root@HN01 network-scripts]# tracejob 342

     Job: 342.HN01

     06/06/2017 11:34:01 S enqueuing into workq, state 1 hop 1
     06/06/2017 11:34:01 S Job Run at request of Scheduler@hn01 on exec_vnode (cn01:ncpus=32)+(cn02:ncpus=32)+(cn03:ncpus=32)+(cn04:ncpus=32)+(cn05:ncpus=32)+(cn06:ncpus=32)
     06/06/2017 11:34:01 S (req_movejob) Request invalid for state of job, state=4
     06/06/2017 11:34:01 L Considering job to run
     06/06/2017 11:34:01 L Job run
     06/06/2017 11:34:01 L Considering job to run
     06/06/2017 11:34:01 L Failed to run: (15016)
     06/06/2017 11:34:01 S Job Queued at request of danish@mgnt02, owner = danish@mgnt02, job name = pbs_run2.job, queue = workq
     06/06/2017 11:34:01 A queue=workq
     06/06/2017 11:34:01 A user=danish group=navy project=_pbs_project_default jobname=pbs_run2.job queue=workq ctime=1496720041 qtime=1496720041 etime=1496720041 start=1496720041 exec_host=cn01/0*32+cn02/0*32+cn03/0*32+cn04/0*32+cn05/0*32+cn06/0*32 exec_vnode=(cn01:ncpus=32)+(cn02:ncpus=32)+(cn03:ncpus=32)+(cn04:ncpus=32)+(cn05:ncpus=32)+(cn06:ncpus=32) Resource_List.mpiprocs=192 Resource_List.ncpus=192 Resource_List.nodect=6 Resource_List.nodes=6:ppn=32 Resource_List.place=scatter Resource_List.select=6:ncpus=32:mpiprocs=32 Resource_List.walltime=72:00:00 resource_assigned.ncpus=192

     cn01 mom_logs:

     06/06/2017 11:32:33;0008;pbs_mom;Job;341.HN01;nprocs: 630, cantstat: 0, nomem: 0, skipped: 0, cached: 0, max excluded PID: 0
     06/06/2017 11:32:33;0008;pbs_mom;Job;341.HN01;Started, pid = 19512
     06/06/2017 11:32:33;0080;pbs_mom;Job;341.HN01;task 00000001 terminated
     06/06/2017 11:32:33;0008;pbs_mom;Job;341.HN01;Terminated
     06/06/2017 11:32:33;0100;pbs_mom;Job;341.HN01;task 00000001 cput= 0:00:00
     06/06/2017 11:32:33;0008;pbs_mom;Job;341.HN01;kill_job
     06/06/2017 11:32:33;0100;pbs_mom;Job;341.HN01;CN01 cput= 0:00:00 mem=432kb
     06/06/2017 11:32:33;0100;pbs_mom;Job;341.HN01;cn02 cput= 0:00:00 mem=0kb
     06/06/2017 11:32:33;0100;pbs_mom;Job;341.HN01;cn03 cput= 0:00:00 mem=0kb
     06/06/2017 11:32:33;0100;pbs_mom;Job;341.HN01;cn04 cput= 0:00:00 mem=0kb
     06/06/2017 11:32:33;0100;pbs_mom;Job;341.HN01;cn05 cput= 0:00:00 mem=0kb
     06/06/2017 11:32:33;0100;pbs_mom;Job;341.HN01;cn06 cput= 0:00:00 mem=0kb
     06/06/2017 11:32:33;0008;pbs_mom;Job;341.HN01;no active tasks
     06/06/2017 11:32:33;0100;pbs_mom;Job;341.HN01;Obit sent
     06/06/2017 11:32:33;0100;pbs_mom;Req;;Type 54 request received from root@192.168.2.61:15001, sock=1
     06/06/2017 11:32:33;0080;pbs_mom;Job;341.HN01;copy file request received
     06/06/2017 11:32:34;0100;pbs_mom;Job;341.HN01;staged 2 items out over 0:00:01
     06/06/2017 11:32:34;0008;pbs_mom;Job;341.HN01;no active tasks
     06/06/2017 11:32:34;0100;pbs_mom;Req;;Type 6 request received from root@192.168.2.61:15001, sock=1
     06/06/2017 11:32:34;0080;pbs_mom;Job;341.HN01;delete job request received
     06/06/2017 11:32:34;0008;pbs_mom;Job;341.HN01;kill_job
     06/06/2017 11:34:01;0100;pbs_mom;Req;;Type 1 request received from root@192.168.2.61:15001, sock=1
     06/06/2017 11:34:01;0100;pbs_mom;Req;;Type 3 request received from root@192.168.2.61:15001, sock=1
     06/06/2017 11:34:01;0100;pbs_mom;Req;;Type 5 request received from root@192.168.2.61:15001, sock=1
     06/06/2017 11:34:01;0008;pbs_mom;Job;342.HN01;Type 5 request received from root@192.168.2.61:15001, sock=1
     06/06/2017 11:34:01;0008;pbs_mom;Job;342.HN01;Started, pid = 20071

     Attachments: mom_log_341.txt, tracejob_341.txt
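Most of the MoM log entries posted above belong to job 341, while the failing job is 342, so it may help to filter the log by job ID before comparing it with the tracejob output. Below is a minimal Python sketch of that filtering. It assumes each MoM log record has six semicolon-separated fields, as the lines above appear to. The field names (`timestamp`, `event_code`, `daemon`, `object_class`, `object_name`, `message`) and the helper names `parse_mom_line` and `records_for_job` are made up for illustration and are not part of PBS Pro.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class MomRecord(NamedTuple):
    # Field names are illustrative guesses based on the posted log lines.
    timestamp: str
    event_code: str
    daemon: str
    object_class: str
    object_name: str
    message: str


def parse_mom_line(line: str) -> Optional[MomRecord]:
    """Split one log line into six fields; return None if it does not fit."""
    # maxsplit=5 keeps any semicolons inside the message text intact.
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return MomRecord(*parts)


def records_for_job(lines: Iterable[str], job_id: str) -> Iterator[MomRecord]:
    """Yield only the records whose object is the given job."""
    for line in lines:
        rec = parse_mom_line(line)
        if rec is not None and rec.object_class == "Job" and rec.object_name == job_id:
            yield rec


# Three lines copied from the log in the post.
sample = [
    "06/06/2017 11:32:33;0008;pbs_mom;Job;341.HN01;Terminated",
    "06/06/2017 11:34:01;0100;pbs_mom;Req;;Type 5 request received from root@192.168.2.61:15001, sock=1",
    "06/06/2017 11:34:01;0008;pbs_mom;Job;342.HN01;Started, pid = 20071",
]

job_342 = list(records_for_job(sample, "342.HN01"))
for rec in job_342:
    print(rec.timestamp, rec.message)
```

To run it against a real log, pass an open file object instead of `sample`. The log path depends on the installation, so it is left out here.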