Announcement (admin), 06/12/17: PBS Forum Has Closed

The PBS Works Support Forum is no longer active. For PBS community-oriented questions and support, please join the discussion at http://community.pbspro.org. Any new security advisories related to commercially-licensed products will be posted in the PBS User Area (https://secure.altair.com/UserArea/).
david882

PBS in the Cloud


Hello


I am designing a PBS solution in the cloud and have a question:


 


Can you automatically provision and then destroy nodes as and when the workload arrives and departs? I was thinking of building an image with the client being pre-configured pointing at the IP of the PBS server and then you could just increase the number of nodes in the cluster and then decrease when the work drops off. This will bring real cost savings to the solution as I'm sure you can appreciate.
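The image idea above amounts to baking a MoM-only PBS configuration into the node image. Here is a minimal sketch, assuming PBS Pro's usual default paths (/opt/pbs and /var/spool/pbs) and a hypothetical server hostname. It renders such a pbs.conf. It uses a hostname rather than an IP because PBS Pro expects node and server names to be resolvable. Depending on the PBS Pro version and how the image was installed, the MoM may also need the server named via $clienthost in mom_priv/config, so treat this as a starting point rather than a complete node setup.

```python
"""Render a MoM-only /etc/pbs.conf for a cloud compute-node image (sketch)."""


def render_mom_pbs_conf(server_host: str,
                        pbs_exec: str = "/opt/pbs",
                        pbs_home: str = "/var/spool/pbs") -> str:
    """Return pbs.conf text that starts only the MoM daemon.

    The default paths are assumptions (PBS Pro's usual install locations).
    """
    if not server_host or any(c.isspace() for c in server_host):
        raise ValueError("server_host must be a single hostname")
    settings = {
        "PBS_SERVER": server_host,   # resolvable name of the PBS server
        "PBS_START_SERVER": "0",     # no server on a compute node
        "PBS_START_SCHED": "0",      # no scheduler on a compute node
        "PBS_START_COMM": "0",       # no comm daemon on a compute node
        "PBS_START_MOM": "1",        # run the execution daemon (MoM)
        "PBS_EXEC": pbs_exec,
        "PBS_HOME": pbs_home,
    }
    return "".join(f"{key}={value}\n" for key, value in settings.items())


if __name__ == "__main__":
    # "pbs-server.example.internal" is a hypothetical hostname.
    print(render_mom_pbs_conf("pbs-server.example.internal"), end="")
```

Writing the output to /etc/pbs.conf when the image is built means every instance launched from it will try to join the same server at boot.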


 


I see the PBS Professional webpage says you can monitor, shut down and restart machines as and when required, but I'm not sure whether what I'm describing is slightly more complex than that.


Hi there,

 

Setting up a cloud-provisioned environment is definitely achievable with PBS Pro; I have done it with AWS instances.

The approach is configuration-heavy rather than turnkey, primarily because PBS Pro won't allow adding nodes that aren't resolvable, and it has no way to reflect the state of a node that may be available but that we only want to offload work to under specific conditions. There are also some security considerations that aren't addressed in PBS Pro's most common environment, traditional HPC clusters.
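Because PBS Pro refuses nodes whose names don't resolve, a provisioning script might check name resolution before registering hosts with qmgr. This is a minimal sketch. It assumes the new hostnames come from your cloud provider's API, which is not shown. `split_by_resolvability` and `create_node_commands` are hypothetical helper names. The resolver is injectable so the logic can be exercised without DNS.

```python
"""Register cloud hosts with PBS Pro only once their names resolve (sketch)."""
import socket
from typing import Callable, Iterable, List, Tuple


def split_by_resolvability(
    hosts: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Tuple[List[str], List[str]]:
    """Partition hostnames into (resolvable, unresolvable)."""
    ok: List[str] = []
    bad: List[str] = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            bad.append(host)
        else:
            ok.append(host)
    return ok, bad


def create_node_commands(hosts: Iterable[str]) -> List[List[str]]:
    """Build qmgr argv lists that add each host as a PBS node."""
    return [["qmgr", "-c", f"create node {host}"] for host in hosts]
```

Each command list can be passed to `subprocess.run(cmd, check=True)` on the server host. Unresolvable hosts are best retried on the next pass rather than registered.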

 

In a nutshell, the steps call for:

 

1) writing scripts that will bring remote nodes online (e.g. AWS)

2) configuring end-to-end open networking, i.e. resolving any firewall traversal issues that would block communication on the PBS ports (by default 15001-15010) and scp traffic for file staging

3) configuring a flat UID namespace (the same user names mapped to the same UIDs) across local and remote servers
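Steps 2 and 3 are easy to get subtly wrong, so a small pre-flight check can help before any jobs are routed to the cloud. This is a minimal sketch, not a PBS tool. Which of the ports in the 15001-15010 range actually need to be open depends on which daemons run on each side. The passwd text is assumed to be fetched from each host separately (not shown). Feeding in `getent passwd` output instead of /etc/passwd also covers directory-service users.

```python
"""Pre-flight checks for steps 2 and 3 (a sketch, not a PBS tool)."""
import socket
from typing import Dict, Iterable, List, Tuple


def unreachable_ports(host: str, ports: Iterable[int],
                      timeout: float = 2.0) -> List[int]:
    """Return the ports on `host` that do not accept a TCP connection.

    A refusal can simply mean no daemon listens on that port; a timeout is
    the more typical sign of a firewall silently dropping traffic.
    """
    failed: List[int] = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # connected; close immediately
        except OSError:  # refused, timed out, unreachable, ...
            failed.append(port)
    return failed


def parse_passwd(text: str) -> Dict[str, int]:
    """Map user name -> UID from /etc/passwd-formatted text."""
    users: Dict[str, int] = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 3 and fields[2].isdigit():
            users[fields[0]] = int(fields[2])
    return users


def uid_mismatches(local: Dict[str, int],
                   remote: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Users present on both sides whose UIDs differ: name -> (local, remote)."""
    return {name: (uid, remote[name])
            for name, uid in local.items()
            if name in remote and remote[name] != uid}


def missing_users(local: Dict[str, int], remote: Dict[str, int]) -> List[str]:
    """Local users that do not exist on the remote host at all."""
    return sorted(set(local) - set(remote))
```

Run the port check from both ends of the link. A port that is open from the server to the cloud node is not necessarily open in the other direction. An empty result from both UID functions indicates the namespace is flat for the users checked.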

 

Then, based on a policy dictating when a workload should trigger cloud provisioning (which could be monitored through a cron script), the scripts can:

 

4) invoke the provisioning mechanism

5a) associate provisioned nodes to a queue (e.g., cloudq)

      -OR-

5b) create a reservation on the provisioned nodes to control when the provisioning lease expires

6) move workload into that queue
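The policy loop in steps 4-6 might look roughly like the following when run from cron on the PBS server. This is a minimal sketch with several assumptions. The PBS Pro commands are on the PATH. Queued work sits in the default `workq`. The threshold policy is made up for illustration. The cloud provisioning call itself (step 4) is provider-specific and left out. Only option 5a (the node `queue` attribute) is shown, so check your PBS Pro version's reference guide for that attribute. Option 5b, a reservation, is not covered.

```python
"""Cron-driven cloud-burst sketch for steps 4-6, using option 5a."""
import subprocess
from typing import List, Sequence

CLOUD_QUEUE = "cloudq"  # the reply's example queue name


def queued_job_ids(queue: str = "workq") -> List[str]:
    """List job IDs in state Q on `queue` (qselect prints one per line)."""
    out = subprocess.run(["qselect", "-s", "Q", "-q", queue],
                         capture_output=True, text=True, check=True).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def nodes_to_provision(queued_jobs: int, jobs_per_node: int,
                       threshold: int, max_nodes: int) -> int:
    """Hypothetical policy: burst only when the backlog exceeds `threshold`."""
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    excess = queued_jobs - threshold
    if excess <= 0:
        return 0
    return min(-(-excess // jobs_per_node), max_nodes)  # ceiling division


def queue_setup_commands(queue: str = CLOUD_QUEUE) -> List[List[str]]:
    """qmgr commands that create an enabled, started execution queue."""
    return [
        ["qmgr", "-c", f"create queue {queue} queue_type=execution"],
        ["qmgr", "-c", f"set queue {queue} enabled=true"],
        ["qmgr", "-c", f"set queue {queue} started=true"],
    ]


def attach_node_commands(hosts: Sequence[str],
                         queue: str = CLOUD_QUEUE) -> List[List[str]]:
    """Step 5a: associate each provisioned node with the cloud queue."""
    return [["qmgr", "-c", f"set node {host} queue={queue}"] for host in hosts]


def move_job_commands(job_ids: Sequence[str],
                      queue: str = CLOUD_QUEUE) -> List[List[str]]:
    """Step 6: move selected jobs into the cloud queue with qmove."""
    return [["qmove", queue, job_id] for job_id in job_ids]


def run_all(commands: Sequence[List[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Commands are built as argv lists so they can be printed in a dry run and passed to `subprocess.run` without a shell. Between deciding how many nodes to provision and attaching them, the script would launch the instances through the provider's API and register them with `qmgr -c "create node <host>"`.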

 

 

Teardown would consist of removing the nodes and, optionally, deleting the queue.
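A teardown pass could be scripted along the same lines. This is a minimal sketch that assumes the `cloudq` queue name from the example above. Destroying the cloud instances themselves is provider-specific and not shown. Nodes are marked offline first so the scheduler stops placing new jobs on them. In practice you would wait until they are idle before the delete phase, and that wait is not shown here.

```python
"""Teardown sketch: take cloud nodes out of PBS, optionally drop the queue."""
from typing import List, Sequence


def teardown_commands(hosts: Sequence[str], queue: str = "cloudq",
                      delete_queue: bool = False) -> List[List[str]]:
    """Offline each node, delete it from PBS, and optionally delete the queue.

    Jobs already running on a node must finish (or be requeued) before the
    node is deleted and its instance destroyed; this function only builds
    the command lists and does not wait.
    """
    cmds: List[List[str]] = []
    for host in hosts:
        cmds.append(["pbsnodes", "-o", host])                # stop new work
    for host in hosts:
        cmds.append(["qmgr", "-c", f"delete node {host}"])   # remove from server
    if delete_queue:
        cmds.append(["qmgr", "-c", f"delete queue {queue}"])  # queue should be empty
    return cmds
```

Keeping the queue (the default here) avoids recreating it on the next burst. Deleting it keeps the server configuration tidy when bursting is rare.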

 

The traffic between the local server and the cloud MoM would only be as secure as the link between these two endpoints, so unless an encrypted channel (e.g. a VPN) is configured, the data sent may be compromised.

 

Cheers

 

Vincent
