ParaQ:Client-Server Connections
Overview
The precise mechanism for connecting a client to a server should be designed into the ParaQ client. The current mechanism, launching the server and the client with a script on a Unix client and having users do it manually on Windows, is not an acceptable design for ParaQ.
The computational and visualization clusters at Sandia do not currently support the client connecting directly to a running server. Because the job submission, network, and name resolution setups at Sandia are very similar to those at other laboratories and universities, a general ‘reverse connect’ capability would probably be useful to the larger ParaQ community.
This document will try to address the desired functionality for ‘reverse’ connection and forward connection scenarios. This document will also try to address the issue of multiple clients connected to a server, and multiple servers connected to a single client.
Users would probably expect:
- ParaQ to come up by default in ‘stand alone’ or ‘localhost’ mode. They can then use ParaQ on data from their local workstation.
- ParaQ to have the ability to connect to a specific server and read data off of that server.
Use Cases
- UC1 The user’s environment is set up similarly to Sandia’s and a reverse connection is necessary.
- UC2 The user’s environment allows direct connections to a running server but doing a reverse connection is also totally fine.
- UC3 The user’s environment allows direct connections to a running server but for some architectural reason will not allow a reverse connection.
- UC4 Same as UC1/UC2/UC3 but user would like to connect multiple servers to a client.
- UC5 Same as UC1/UC2/UC3 but user would like to connect multiple clients to a server.
- UC6 Same as UC1/UC2/UC3 but the user would like to disconnect from the server and reconnect at a later time.
Notes
Even though reverse connection is the method Sandia prefers, there are several use cases where the client host/IP would not be available for the server to connect back to. For example, consider a student/employee connecting from home to a ParaView server running on a cluster at school/work. Many homes connect to the internet with a dynamic IP address and no hostname that resolves to it. Furthermore, many home computers sit behind a firewall that blocks all incoming connections.
Proposed Approach
The proposed approach tries to address the fact that every installation will have different server names and different ways to connect to that server and then different ways of launching the paraview server.
Specifically the use of a server.xml file in a specific location of the installation is proposed. The XML file will contain all the information relevant to the available servers.
- The server name (perhaps fully qualified “foo.sandia.gov”)
- The displayed name (“foo”)
- Whether the queue is interactive or not (to be displayed in menu)*
- The connection protocol (rsh, ssh, other?)
- The script to be run once connected (“paraview_server_go”)
- The arguments to that script (nodes, times, case_number)
- The 'direction' that this server expects connections (forward/reverse)
The menu items and the corresponding connect dialog box will be constructed from this XML. So for instance, if I say connect to ‘redrage’, a dialog box may pop up asking for a Kerberos password, the number of nodes, a time, and a case number. If I connect to ‘testcluster’, the dialog may just ask for the number of nodes and a time. We will want the UI construction to be extensible, so that if some cluster needs another variable, such as which queue to submit to, the XML can define a new variable that the UI will ask the user to type in.
The approach described above should address UC1 and UC2. UC3 can also be addressed by having the client 'poll' the server until a connection is established. From a user perspective, the client is in a 'wait' mode until something good happens.
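As a rough illustration of how dialog fields could be derived from such a file, the sketch below parses a made-up server.xml. The element and attribute names (`server`, `prompt`, `secret`, and so on), the `example.org` host names, and the helper functions are all assumptions for illustration; no schema has been defined for ParaQ.

```python
import xml.etree.ElementTree as ET

# Hypothetical server.xml content. The schema is an assumption made for
# this sketch; only the kinds of fields come from the list above.
SERVER_XML = """
<servers>
  <server name="redrage.example.org" display="redrage" interactive="true"
          protocol="ssh" direction="reverse">
    <script>paraview_server_go</script>
    <prompt name="password" label="Kerberos password" secret="true"/>
    <prompt name="nodes" label="Number of nodes"/>
    <prompt name="time" label="Time"/>
    <prompt name="case_number" label="Case number"/>
  </server>
  <server name="testcluster.example.org" display="testcluster" interactive="false"
          protocol="ssh" direction="forward">
    <script>paraview_server_go</script>
    <prompt name="nodes" label="Number of nodes"/>
    <prompt name="time" label="Time"/>
  </server>
</servers>
"""


def load_servers(xml_text):
    """Return a dict keyed by display name describing each server entry."""
    root = ET.fromstring(xml_text)
    servers = {}
    for node in root.findall("server"):
        prompts = [
            {
                "name": p.get("name"),
                "label": p.get("label"),
                "secret": p.get("secret") == "true",
            }
            for p in node.findall("prompt")
        ]
        servers[node.get("display")] = {
            "host": node.get("name"),
            "interactive": node.get("interactive") == "true",
            "protocol": node.get("protocol"),
            "direction": node.get("direction"),
            "script": node.findtext("script"),
            "prompts": prompts,  # one dialog field per prompt
        }
    return servers


def build_command(server, answers):
    """Build the remote launch command from the user's dialog answers.

    Secret prompts (e.g. passwords) are never placed on the command line.
    """
    args = [str(answers[p["name"]]) for p in server["prompts"] if not p["secret"]]
    return [server["protocol"], server["host"], server["script"], *args]
```

Adding a new variable for a cluster, such as a queue name, would then only require another `prompt` element in that cluster's entry, with no change to the client code.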
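The client-side 'poll' for UC3 could look roughly like the following sketch. The function name, retry count, and delay are assumptions; the `connect` and `sleep` parameters exist only so the loop can be exercised without a real server.

```python
import socket
import time


def poll_for_server(host, port, attempts=10, delay=2.0,
                    connect=socket.create_connection, sleep=time.sleep):
    """Try a forward connection repeatedly until the server is listening.

    Returns the connected socket, or None if every attempt failed.
    The retry policy is an illustrative assumption.
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect((host, port), timeout=delay)
        except OSError:
            # Server not up yet (job may still be queued); wait and retry.
            if attempt < attempts:
                sleep(delay)
    return None
```

While this loop runs, the UI would show the 'wait' state; a None result is the point at which the client would report that the server never became reachable.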
Multiple Servers
UC4: Not sure exactly what the usage of multiple servers might look like in the end product, but it appears that the proposed approach actually lends itself quite nicely to multiple servers. The user might do something like specify a window before connecting to the server and then each window would have a different server (really not sure).
Multiple Clients
UC5: With the proposed functionality of having paraview server state 'synced' on the client side, this should be fairly straightforward. Any 'new' clients that connect will simply gather up the current state and be 'synced' up. Any new clients will have to be authenticated by the server before being allowed to connect.
Note:
ParaQ should have authentication functionality designed in from the beginning. Need-to-know issues have to be addressed for both forward and reverse connection scenarios. This authentication functionality should follow accepted standards. The ParaQ client will also include an ssh executable that can be used if the user's environment does not provide an existing ssh. See Secure Connections.
Disconnect/Reconnect
UC6: Although not all the details have been worked out, we believe that in general a disconnect/reconnect functionality can be addressed with an authentication mechanism. One solution is to write out a small "cookie" file that contains enough information to reestablish the connection. This has the potential to solve many of the problems with connecting to servers. It solves the problem of specifying the server on a cluster running multiple servers. It can also solve the security problem of forward connections given above. The "secret key" could be randomly generated when the server starts, and that key can be placed in the cookie. Thus, access to the server is basically limited to people with access to the cookie file.
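A minimal sketch of the cookie idea follows. The JSON layout, field names, and function names are assumptions; the document does not specify a cookie format. Restricting the file to its owner is what ties server access to file access.

```python
import hmac
import json
import os
import secrets


def write_cookie(path, host, port):
    """Write a cookie with a fresh random key (hypothetical JSON layout)."""
    cookie = {"host": host, "port": port, "key": secrets.token_hex(32)}
    # Create the file with owner-only permissions. On Windows the mode
    # bits are largely ignored, so other protection would be needed there.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(cookie, f)
    return cookie


def read_cookie(path):
    """Load the cookie a client would use to find and reconnect to a server."""
    with open(path) as f:
        return json.load(f)


def key_matches(server_key, presented_key):
    """Compare keys in constant time to avoid leaking information via timing."""
    return hmac.compare_digest(server_key, presented_key)
```

In this sketch the server would call `write_cookie` at startup and keep the key in memory; a reconnecting client reads the cookie to learn which host and port to contact and presents the key, which the server checks with `key_matches`.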
Challenges
Whenever you ‘submit’ a job to a computation cluster you are in tricky waters. The challenges to this type of design include at least the following items.
- Which ssh do I use? Are there issues about using a particular (site specific) ssh? Can I use some general OpenSSH library?
- What are the issues with taking in a user’s password?
- What are the issues with grabbing the client host name? Is there a nice cross platform way of doing this? Will it be fully qualified? Can the server use it to connect back? Lots of issues. For starters, the client may not have a host name at all (or it's not registered in any DNS). If the client and server are not on the same LAN, the client may not even have a valid IP address.
- What if you have two clients on the same host?
- What if you have two servers on the same cluster?
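For the host name questions in the list above, Python's standard library gives one cross-platform way to ask what the client believes its name and address are. The sketch below is exploratory only: the function names are assumptions, and a name that resolves locally is no guarantee that the server can resolve or reach it.

```python
import ipaddress
import socket


def describe_client_host():
    """Collect what the local machine reports about its own identity."""
    name = socket.gethostname()
    fqdn = socket.getfqdn()  # may simply return the short name
    try:
        address = socket.gethostbyname(name)
    except OSError:
        address = None  # name not resolvable, e.g. no DNS entry
    return {
        "hostname": name,
        "fqdn": fqdn,
        "address": address,
        "looks_qualified": "." in fqdn,
    }


def is_publicly_routable(address):
    """Return False for missing, private, loopback, or link-local addresses.

    A private address can still work when client and server share a LAN;
    this check only flags addresses a remote server could not reach.
    """
    if address is None:
        return False
    ip = ipaddress.ip_address(address)
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```

A client could run these checks before offering a reverse connection and fall back to a forward connection when the reported address is not usable.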
* Indicating whether a queue is interactive would be very helpful at a place like Sandia, where several clusters have interactive queues for vis but many others do not. It is a useful signal to users that they may have to wait a long time on non-interactive compute platforms. It may also lead users to demand an interactive vis queue on platforms that currently lack one.
Appendix
Notes on conversations about disconnect/reconnect
NasaDave's Peanut Gallery: There is still the problem where you have a client on your desktop and an existing server running on a machine sitting behind a firewall/NAT (as is the case for our clusters). The client can't reconnect to the server directly since the server only has a local IP address (normally 192.168.X.X or 10.0.X.X). The cookie would somehow need to know the head node for the cluster and try to bonk the server through the head node and tell it to reconnect to the client.
Tim's Peanut Gallery: Since dropping the original connection is the problem, don't drop the original connection - create a proxy server that can "forward" PVS connections:
                                      Firewall
                                        | |
ParaQ Client <-------> PVS Proxy <----> | | <----> Actual PVS Server
                                        | |
In the reverse-connection case, it is the PVS Proxy that accepts the reverse connection, rather than the client. That simplifies the client since it never accepts a connection. If the client disconnects, the PVS Proxy continues running. That gives subsequent clients a server they can connect to when they want.
Dave's Walnut Gallery: The PVS Proxy needs to run on a machine other than the user's local machine. The biggest use case for this disconnect-reconnect functionality will likely be on the secure network. In this case, the user needs to be able to turn off their machine and still be able to reconnect to a running server.
Brian's Phone Home/Phone a Friend suggestion: Let's skip the whole proxy thing (too complicated). Let's have a phone home or phone a friend option on the server. The client disconnects and, as part of the disconnect, asks the server to poll either the same client or another client. The server (while idle) tries to connect back to that client every 30 seconds or so. That way I can disconnect from one client and reconnect from another client (tricky). This also gives the client the 'control' of saying that only the current client, or one that I specify, can connect to the server. I know this does not address the use case where the client IP/host cannot be explicitly given, but it certainly will work in the 'reverse connect' case and should actually be quite straightforward.
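The server-side loop for the phone home/phone a friend idea might be sketched as follows. The function name, the candidate list, and the 5-second connect timeout are assumptions; only the roughly 30-second interval comes from the suggestion above. The `connect` and `sleep` parameters are there so the loop can be tried without real clients.

```python
import socket
import time


def phone_home(candidates, interval=30.0, max_rounds=None,
               connect=socket.create_connection, sleep=time.sleep):
    """While idle, try each permitted client address until one answers.

    candidates: list of (host, port) pairs named by the departing client.
    Returns ((host, port), connection), or (None, None) if max_rounds
    passes with no client reachable. max_rounds=None retries forever.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for host, port in candidates:
            try:
                return (host, port), connect((host, port), timeout=5.0)
            except OSError:
                continue  # this client is not listening yet
        rounds += 1
        sleep(interval)
    return None, None
```

Because the server only ever dials addresses in `candidates`, the departing client keeps control over who may pick the session up again.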