  • Announcements

    • admin

      PBS Forum Has Closed   06/12/17

      The PBS Works Support Forum is no longer active.  For PBS community-oriented questions and support, please join the discussion at http://community.pbspro.org.  Any new security advisories related to commercially-licensed products will be posted in the PBS User Area (https://secure.altair.com/UserArea/). 
begou

Why do log files go to the "undelivered" dir?

Hi,

I tried searching the forum for the keyword "undelivered" but didn't find anything. On Google, my searches only give me information on how to remove files from the PBS "undelivered" directory.

But my problem is: why do the log files go into this "undelivered" directory?

I didn't find any help in the PBS Professional 9.1 Admin Guide either.

My PBS Pro is running on a mixed cluster (Itanium and Xeon), with queues for the Itanium part and queues for the Xeon part.

If I submit a job to an Itanium queue from the Itanium host, I get the job's log file in my directory.

If I submit the same job to the same Itanium queue from the Xeon host, the log file goes into the "undelivered" dir on the Itanium host!

The paths of all my disk areas are the same on the two hosts, and the disk is NFS-mounted on the Xeon host from the Itanium host.

Any idea? I think it is related to the definition of the output path with this syntax:

Output_Path = submission-host:/path/to/the/file

but even if I use

#PBS -o /absolute/path/to/the/out

#PBS -e /absolute/path/to/the/err

the submission hostname is still prepended (as shown by "qstat -f jobid") and the log files go into the "undelivered" dir!
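As a rough illustration of why that hostname matters: the job's Output_Path is a `host:path` pair, and the MOM on the execution host has to work out how to reach `host` before it can deliver the file. The sketch below is a simplified Python model, not PBS code; `split_output_path` and the sample values are hypothetical. It splits such a value at the first colon, which also keeps a Windows drive letter inside the path part.

```python
from __future__ import annotations


def split_output_path(output_path: str) -> tuple[str, str]:
    """Split a "host:path" value at the FIRST colon (simplified model).

    Assumes a host prefix is always present, as in the qstat -f output
    described above; a bare Windows path such as "Z:/x" would be misread.
    """
    host, sep, path = output_path.partition(":")
    if not sep or not host:
        raise ValueError(f"expected host:path, got {output_path!r}")
    return host, path


if __name__ == "__main__":
    # Hypothetical values modeled on the Output_Path syntax in this thread.
    print(split_output_path("submission-host:/absolute/path/to/the/out"))
    # Splitting at the first colon keeps the drive letter with the path.
    print(split_output_path("workstation.domain.local:Z:/path/to/output.eXXX"))
```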

Thanks for your help

Patrick

The return of output files is a common initial setup issue. I suspect the nodes that aren't returning the files simply aren't configured correctly. How are you trying to return the files? There are several ways to do this, but my suggested route would be to use the MOM config option called "$usecp". You can add a line to the MOM config like:

$usecp *:/path /path

Don't forget to restart the MOM process. With this in place, output files destined for anything below /path are copied locally with cp instead of with rcp or scp (which is what PBS does by default). You can read more about "$usecp" in the admin guide.
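For intuition about what `$usecp` does, here is a simplified Python model of the behavior described above. It is an illustration, not PBS source code: when the destination host matches a rule's host pattern and the output path starts with the rule's source prefix, the file is copied locally to the mapped path; otherwise delivery falls back to rcp/scp. `UsecpRule` and `local_copy_target` are hypothetical names, and the first-match-wins ordering is an assumption of this sketch.

```python
from __future__ import annotations

from fnmatch import fnmatch
from typing import NamedTuple, Optional


class UsecpRule(NamedTuple):
    host_pattern: str  # e.g. "*" or "*.example.com"
    src_prefix: str    # path prefix as it appears in Output_Path
    dst_prefix: str    # where that prefix is mounted on the execution host


def local_copy_target(rules: list[UsecpRule], host: str, path: str) -> Optional[str]:
    """Return the local path to copy to, or None to fall back to rcp/scp.

    Assumption of this model: the first rule whose host pattern matches and
    whose source prefix starts the path wins, and that prefix is swapped
    for the destination prefix. Plain string prefixes, so "/path" would
    also match "/pathology"; real MOM matching may differ.
    """
    for rule in rules:
        if fnmatch(host, rule.host_pattern) and path.startswith(rule.src_prefix):
            return rule.dst_prefix + path[len(rule.src_prefix):]
    return None


if __name__ == "__main__":
    rules = [UsecpRule("*", "/path", "/path")]
    # Covered by the rule: the file would be copied locally under /path.
    print(local_copy_target(rules, "cl1n003", "/path/job.o13810"))
    # No rule covers /other, so delivery would fall back to rcp/scp.
    print(local_copy_target(rules, "cl1n003", "/other/job.o13810"))
```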

Hi Adam,

I'm back after many tests! Your suggestion gave me a starting point for browsing the admin guide efficiently. However, as I have several independent paths (/home/user/* and /data/*), I couldn't find in the doc how to set $usecp for all of them.

But I understood that setting PBS_SCP=/usr/bin/scp in /etc/pbs.conf tells PBS to use scp to copy these files. So I spent a lot of time setting up host-based authentication between the nodes, the Xeon front-end and the Itanium front-end.
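A quick way to double-check that the PBS_SCP setting is really present is to read it back from the file. This is a minimal sketch, assuming /etc/pbs.conf holds plain KEY=VALUE lines (with `#` comments), as in the PBS_SCP line above; `parse_pbs_conf` is a hypothetical helper, not part of PBS.

```python
from __future__ import annotations


def parse_pbs_conf(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


if __name__ == "__main__":
    import sys

    # Defaults to /etc/pbs.conf, the location mentioned in this thread.
    conf_path = sys.argv[1] if len(sys.argv) > 1 else "/etc/pbs.conf"
    try:
        with open(conf_path) as handle:
            conf = parse_pbs_conf(handle.read())
    except OSError as err:
        print(f"cannot read {conf_path}: {err}")
    else:
        print("PBS_SCP =", conf.get("PBS_SCP", "<not set>"))
```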

Now a user can ssh from any node to any node with this host-based authentication.

But I still have jobs stuck in the E state. For a job running on node cl1n004, the hanging processes are:

decaixj 4461 4353 0 15:25 ? 00:00:00 /usr/bin/scp -Brvp /var/spool/PBS/spool/13810.calcul9sv4.OU decaixj@cl1n003 /home/users/decaixj/calcul/Venturi_4/KLSST_c02_m042/KLSST_c02_m042.o13810

decaixj 4462 4461 0 15:25 ? 00:00:00 /usr/bin/ssh -x -oForwardAgent no -oClearAllForwardings yes -oBatchmode yes -v -ldecaixj cl1n003 scp -v -r -p -t /home/users/decaixj/calcul/Venturi_4/KLSST_c02_m042/KLSST_c02_m042.o13810

I think this job was submitted by a previous job running on node cl1n003 and was allocated to node cl1n004.

And the files still go into the undelivered directory.
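One way to narrow down a hang like the one above is to reproduce the connection that the scp/ssh processes are attempting: a non-interactive ssh as the job owner from the execution node (cl1n004) to the destination (cl1n003). The sketch below assumes OpenSSH's `ssh` is on the PATH; `BatchMode=yes` makes any step that would need interaction fail instead of waiting, and the timeouts turn a hang into a reported failure. The helper names and script invocation are hypothetical.

```python
from __future__ import annotations

import subprocess


def batch_ssh_command(user: str, host: str, timeout_s: int = 10) -> list[str]:
    """Build an ssh command that fails instead of prompting (BatchMode)."""
    return [
        "ssh",
        "-o", "BatchMode=yes",              # never prompt for passwords/passphrases
        "-o", f"ConnectTimeout={timeout_s}",
        "-l", user,
        host,
        "true",                              # trivial remote command
    ]


def can_ssh_non_interactively(user: str, host: str) -> bool:
    """Return True if ssh to host succeeds without any interaction."""
    try:
        result = subprocess.run(
            batch_ssh_command(user, host),
            stdin=subprocess.DEVNULL,  # make sure nothing can wait on a terminal
            capture_output=True,
            text=True,
            timeout=30,                # treat a hang as a failure
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    if result.returncode != 0:
        print(result.stderr.strip())
    return result.returncode == 0


if __name__ == "__main__":
    import sys

    # Run as the job owner on the execution node, e.g.:
    #   python3 check_ssh.py decaixj cl1n003
    if len(sys.argv) == 3:
        print(can_ssh_non_interactively(sys.argv[1], sys.argv[2]))
```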

Patrick

Well, again, I think the shortest path to getting this working is using $usecp, if you have common mount points across your execution nodes. To add multiple directories, add multiple lines to the MOM config:

$usecp *:/home/user /home/user

$usecp *:/data /data

$usecp *:/blah /blah

This is documented in the PBS Pro 9.1 Admin Guide on page 268. I would also encourage you to use the latest version of PBS if possible. Version 10.4 was just released this week.
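To sanity-check a mom_priv/config that contains several $usecp lines, a small hypothetical checker can list what each line maps. This sketch assumes every entry has exactly the form `$usecp host:src dst`, whitespace-separated with no spaces inside paths; it only reads the text and does not reproduce MOM's own parser.

```python
from __future__ import annotations


def parse_usecp_lines(config_text: str) -> list[tuple[str, str, str]]:
    """Collect (host_pattern, src_prefix, dst_prefix) from $usecp lines."""
    rules: list[tuple[str, str, str]] = []
    for raw in config_text.splitlines():
        fields = raw.split()
        if len(fields) != 3 or fields[0] != "$usecp":
            continue  # not a well-formed $usecp line in this simple model
        # Split at the first colon, so "host:Z:/x" keeps "Z:/x" as the path.
        host, sep, src = fields[1].partition(":")
        if not sep:
            continue
        rules.append((host, src, fields[2]))
    return rules


if __name__ == "__main__":
    mom_config = """\
$usecp *:/home/user /home/user
$usecp *:/data /data
$usecp *:/blah /blah
"""
    for host, src, dst in parse_usecp_lines(mom_config):
        print(f"{host}: {src} -> {dst}")
```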

OK Adam, I will try the $usecp config. What I hadn't understood from page 268 was that I could use several instances of $usecp. It isn't explained in the doc I read, but maybe I missed a paragraph where it is detailed.

Updating PBS Pro could be a solution, maybe this summer when fewer people are working. These weeks we have many students and researchers running long CFD simulations, and I can't stop the servers.

Thanks for your detailed answer.

Patrick

Although the above information was very helpful for Linux/Unix systems, I'm currently having the same kind of issue on a Windows Server 2008 R2 box.

Currently the MOM is trying to copy over the output files to this sort of path:

workstation.domain.local:Z:/path/to/output.eXXX

As you can see, the actual Windows network drive appears in the path and differs from one user to the next, so I'm not surprised that pbs_rcp.exe spits out an error.

So would using the following make any sense?

$usecp = *.domain.local:*/path D:/path

YLB, I would make a slight modification to your mom_priv/config entry. You need to include the originating drive letter to map back to D:/path. Otherwise, this will fail because PBS will not know which drive to map to D:/path.

You might want to check section 3.5.10, "Network Drives and File Delivery", in the PBS Professional 11.3 Install Guide.
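Following that advice, each originating drive letter would get its own mapping line. The sketch below just generates one $usecp line per drive letter; `*.domain.local`, `/path` and `D:/path` are the placeholder values from this thread, the drive list is hypothetical, and the line format (no `=`, matching the other $usecp lines in this thread) is an assumption that should be confirmed against section 3.5.10 of the Install Guide.

```python
from __future__ import annotations


def usecp_lines_for_drives(host_pattern: str, drives: list[str],
                           share_path: str, local_path: str) -> list[str]:
    """One $usecp line per originating drive letter (hypothetical helper)."""
    lines: list[str] = []
    for drive in drives:
        letter = drive.rstrip(":").upper()  # accept "z", "Z" or "Z:"
        lines.append(f"$usecp {host_pattern}:{letter}:{share_path} {local_path}")
    return lines


if __name__ == "__main__":
    # Assumed example: users have mapped the same share as Z: or Y:.
    for line in usecp_lines_for_drives("*.domain.local", ["Z", "y:"], "/path", "D:/path"):
        print(line)
```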
