Announcement (admin, 06/12/17): PBS Forum Has Closed

The PBS Works Support Forum is no longer active. For PBS community-oriented questions and support, please join the discussion at http://community.pbspro.org. Any new security advisories related to commercially-licensed products will be posted in the PBS User Area (https://secure.altair.com/UserArea/).

chiyen

pbspro Failed to run: (15016)


I have installed PBS Pro 14.2 on a failover cluster.

When I run a job and then check it with tracejob, the output shows this error:

06/06/2017 11:34:01  L    Failed to run:  (15016)

How can I solve this problem?

[root@HN01 network-scripts]# tracejob 342

Job: 342.HN01

06/06/2017 11:34:01  S    enqueuing into workq, state 1 hop 1
06/06/2017 11:34:01  S    Job Run at request of Scheduler@hn01 on exec_vnode
                          (cn01:ncpus=32)+(cn02:ncpus=32)+(cn03:ncpus=32)+(cn04:ncpus=32)+(cn05:ncpus=32)+(cn06:ncpus=32)
06/06/2017 11:34:01  S    (req_movejob) Request invalid for state of job, state=4
06/06/2017 11:34:01  L    Considering job to run
06/06/2017 11:34:01  L    Job run
06/06/2017 11:34:01  L    Considering job to run
06/06/2017 11:34:01  L    Failed to run:  (15016)
06/06/2017 11:34:01  S    Job Queued at request of danish@mgnt02, owner = danish@mgnt02, job
                          name = pbs_run2.job, queue = workq
06/06/2017 11:34:01  A    queue=workq
06/06/2017 11:34:01  A    user=danish group=navy project=_pbs_project_default
                          jobname=pbs_run2.job queue=workq ctime=1496720041 qtime=1496720041
                          etime=1496720041 start=1496720041
                          exec_host=cn01/0*32+cn02/0*32+cn03/0*32+cn04/0*32+cn05/0*32+cn06/0*32
                          exec_vnode=(cn01:ncpus=32)+(cn02:ncpus=32)+(cn03:ncpus=32)+(cn04:ncpus=32)+(cn05:ncpus=32)+(cn06:ncpus=32)
                          Resource_List.mpiprocs=192 Resource_List.ncpus=192
                          Resource_List.nodect=6 Resource_List.nodes=6:ppn=32
                          Resource_List.place=scatter
                          Resource_List.select=6:ncpus=32:mpiprocs=32
                          Resource_List.walltime=72:00:00 resource_assigned.ncpus=192
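
As an aid to reading a trace like the one above, here is a minimal Python sketch that splits saved tracejob output into entries and lists every "Failed to run" line with its numeric code. It assumes the layout shown in this post (date, time, a one-letter source column, then the message, with indented continuation lines). The function names `parse_tracejob` and `failed_runs` are hypothetical helpers, not part of PBS Pro, and the sketch does not interpret what a code means.

```python
import re

# First line of an entry: date, time, one-letter source column, message.
# This layout is assumed from the tracejob output shown in this post.
ENTRY = re.compile(r"^(\d{2}/\d{2}/\d{4}) (\d{2}:\d{2}:\d{2})\s+([A-Za-z])\s+(.*)$")
# Numeric code in parentheses after "Failed to run:".
FAILED = re.compile(r"Failed to run:\s*\((\d+)\)")


def parse_tracejob(text):
    """Split tracejob output into entries, joining indented continuation lines."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            date, time, source, msg = m.groups()
            entries.append({"date": date, "time": time, "source": source, "message": msg})
        elif entries and line.strip():
            # Continuation of the previous entry (wrapped exec_vnode lists, etc.).
            entries[-1]["message"] += " " + line.strip()
    return entries


def failed_runs(entries):
    """Return (date, time, source, code) for each 'Failed to run' entry."""
    out = []
    for e in entries:
        m = FAILED.search(e["message"])
        if m:
            out.append((e["date"], e["time"], e["source"], int(m.group(1))))
    return out


# Shortened excerpt of the trace from this post, used only as sample input.
SAMPLE = """\
Job: 342.HN01

06/06/2017 11:34:01  S    Job Run at request of Scheduler@hn01 on exec_vnode
                          (cn01:ncpus=32)+(cn02:ncpus=32)
06/06/2017 11:34:01  S    (req_movejob) Request invalid for state of job, state=4
06/06/2017 11:34:01  L    Failed to run:  (15016)
"""

if __name__ == "__main__":
    for date, time, source, code in failed_runs(parse_tracejob(SAMPLE)):
        print(date, time, source, code)
```

Because all entries here share the same timestamp, listing the parsed entries side by side makes it easier to see which messages accompany the failed run attempt.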

 

cn01 mom_logs:

06/06/2017 11:32:33;0008;pbs_mom;Job;341.HN01;nprocs:  630, cantstat:  0, nomem:  0, skipped:  0, cached:  0, max excluded PID:  0
06/06/2017 11:32:33;0008;pbs_mom;Job;341.HN01;Started, pid = 19512
06/06/2017 11:32:33;0080;pbs_mom;Job;341.HN01;task 00000001 terminated
06/06/2017 11:32:33;0008;pbs_mom;Job;341.HN01;Terminated
06/06/2017 11:32:33;0100;pbs_mom;Job;341.HN01;task 00000001 cput= 0:00:00
06/06/2017 11:32:33;0008;pbs_mom;Job;341.HN01;kill_job
06/06/2017 11:32:33;0100;pbs_mom;Job;341.HN01;CN01 cput= 0:00:00 mem=432kb
06/06/2017 11:32:33;0100;pbs_mom;Job;341.HN01;cn02 cput= 0:00:00 mem=0kb
06/06/2017 11:32:33;0100;pbs_mom;Job;341.HN01;cn03 cput= 0:00:00 mem=0kb
06/06/2017 11:32:33;0100;pbs_mom;Job;341.HN01;cn04 cput= 0:00:00 mem=0kb
06/06/2017 11:32:33;0100;pbs_mom;Job;341.HN01;cn05 cput= 0:00:00 mem=0kb
06/06/2017 11:32:33;0100;pbs_mom;Job;341.HN01;cn06 cput= 0:00:00 mem=0kb
06/06/2017 11:32:33;0008;pbs_mom;Job;341.HN01;no active tasks
06/06/2017 11:32:33;0100;pbs_mom;Job;341.HN01;Obit sent
06/06/2017 11:32:33;0100;pbs_mom;Req;;Type 54 request received from root@192.168.2.61:15001, sock=1
06/06/2017 11:32:33;0080;pbs_mom;Job;341.HN01;copy file request received
06/06/2017 11:32:34;0100;pbs_mom;Job;341.HN01;staged 2 items out over 0:00:01
06/06/2017 11:32:34;0008;pbs_mom;Job;341.HN01;no active tasks
06/06/2017 11:32:34;0100;pbs_mom;Req;;Type 6 request received from root@192.168.2.61:15001, sock=1
06/06/2017 11:32:34;0080;pbs_mom;Job;341.HN01;delete job request received
06/06/2017 11:32:34;0008;pbs_mom;Job;341.HN01;kill_job
06/06/2017 11:34:01;0100;pbs_mom;Req;;Type 1 request received from root@192.168.2.61:15001, sock=1
06/06/2017 11:34:01;0100;pbs_mom;Req;;Type 3 request received from root@192.168.2.61:15001, sock=1
06/06/2017 11:34:01;0100;pbs_mom;Req;;Type 5 request received from root@192.168.2.61:15001, sock=1
06/06/2017 11:34:01;0008;pbs_mom;Job;342.HN01;Type 5 request received from root@192.168.2.61:15001, sock=1
06/06/2017 11:34:01;0008;pbs_mom;Job;342.HN01;Started, pid = 20071

mom_log_341.txt

tracejob_341.txt
