Client-Server Connections in ParaQ
Background:
The precise mechanism for connecting a client to a server in ParaQ should be designed into the ParaQ client. The current mechanism of launching the server and the client with a script (on a Unix client) and having users do it manually on the Windows side is not an acceptable design for ParaQ.
The computational and visualization clusters at Sandia do not currently support the client connecting directly to a running server. Because the job submission, network, and name-resolution setups at Sandia are very similar to those at other laboratories and universities, a general 'reverse connect' capability would probably be useful to the larger ParaQ community.
This document tries to address the desired functionality for 'reverse' and forward connection scenarios. It also tries to address the issue of multiple clients connected to a single server, and of multiple servers connected to a single client.
General:
Users would probably expect ParaQ to come up by default in ‘stand alone’ or ‘localhost’ mode. They can then use ParaQ on data from their local workstation.
Users would also probably expect ParaQ to have the ability to connect to a specific server and read data off of that server.
Use Cases:
UC1) The user’s environment is setup similar to Sandia’s and a reverse connection is necessary.
UC2) The user's environment allows direct connections to a running server, but a reverse connection is also acceptable.
UC3) The user’s environment allows direct connections to a running server but for some architectural reason will not allow a reverse connection.
UC4) Same as UC1/UC2/UC3, but the user would like to connect multiple servers to a client.
UC5) Same as UC1/UC2/UC3, but the user would like to connect multiple clients to a server.
UC6) Same as UC1/UC2/UC3, but the user would like to disconnect from the server and reconnect at a later time.
Use Case Elaboration:
Even though reverse connection is the method preferred at Sandia, there are several use cases where the client host/IP would not be available for the server to connect back to. For example, consider a student or employee connecting from home to a ParaView server running on a cluster at school or work. Many homes connect to the internet with a dynamic IP address and no hostname that resolves to it. Furthermore, many home computers sit behind a firewall that blocks all incoming connections.
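To make the distinction concrete, here is a minimal sketch (Python, with an illustrative port number) of the two connection directions: in a forward connection the client dials a listening server; in a reverse connection the client listens and the server dials back, which only works if the server can resolve and reach the client's address.

    import socket

    PORT = 11111  # illustrative port number

    def forward_connect(server_host):
        """Client dials a server that is already listening (forward connection)."""
        return socket.create_connection((server_host, PORT), timeout=30)

    def reverse_connect(listen_host=""):
        """Client listens and the server dials back (reverse connection).
        This fails if the client sits behind NAT or a firewall, or has no
        address the server can resolve."""
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.bind((listen_host, PORT))
        listener.listen(1)
        server_sock, _server_addr = listener.accept()  # blocks until the server connects
        listener.close()
        return server_sock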
Proposed Approach:
The proposed approach tries to accommodate the fact that every installation will have different server names, different ways to connect to those servers, and different ways of launching the ParaView server.
Specifically, a server.xml file in a well-known location of the installation is proposed. The XML file will contain all the information relevant to the available servers (a hypothetical example follows the list):
1) The server name (perhaps fully qualified “foo.sandia.gov”)
2) The displayed name (“foo”)
3) Whether the queue is interactive or not (to be displayed in menu)*
4) The connection protocol (rsh, ssh, other?)
5) The script to be run once connected (“paraview_server_go”)
6) The arguments to that script (nodes, times, case_number)
7) The 'direction' in which this server expects connections (forward/reverse)
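A hypothetical servers.xml entry covering the seven items above might look like the following; the element and attribute names are illustrative only, not a committed schema. The short Python snippet just shows that such a file is trivially machine-readable.

    import xml.etree.ElementTree as ET

    # Illustrative only -- element and attribute names are not a committed schema.
    SERVERS_XML = """
    <servers>
      <server name="foo.sandia.gov" display="foo"
              interactive="true" protocol="ssh" direction="reverse">
        <script path="paraview_server_go">
          <arg name="nodes"/>
          <arg name="time"/>
          <arg name="case_number"/>
        </script>
      </server>
    </servers>
    """

    root = ET.fromstring(SERVERS_XML)
    for server in root.findall("server"):
        print(server.get("display"), server.get("direction"),
              [a.get("name") for a in server.find("script").findall("arg")])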
The menu items and the corresponding connect dialog box will be constructed from this XML. So, for instance, if I say connect to 'redrage', a dialog box may pop up asking for a Kerberos password along with the number of nodes, the time, and a case number. If I connect to 'testcluster', the dialog may just ask for the number of nodes and time. We will want the UI construction to be 'extensible', so that if some cluster needs another variable, such as which queue to submit to, the XML could define a new variable that the UI would ask the user to fill in (a sketch of this follows).
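As a sketch of the 'extensible' dialog idea, the client could simply iterate over whatever variables a server entry declares and prompt for each one. The element names below are hypothetical, and a real ParaQ client would build widgets rather than call input(); console prompts stand in for the dialog.

    import xml.etree.ElementTree as ET

    # Hypothetical entry: 'queue' is a site-specific variable the base UI knows nothing about.
    ENTRY = """
    <server name="redrage.sandia.gov" display="redrage" protocol="ssh" direction="reverse">
      <variable name="nodes"    label="Number of nodes"    default="4"/>
      <variable name="time"     label="Wall time (min)"    default="60"/>
      <variable name="queue"    label="Queue to submit to" default="viz"/>
      <variable name="kerberos" label="Kerberos password"  secret="true"/>
    </server>
    """

    def collect_connection_options(entry_xml):
        """Build the connect 'dialog' from the XML; prompts stand in for widgets."""
        server = ET.fromstring(entry_xml)
        values = {}
        for var in server.findall("variable"):
            prompt = "%s [%s]: " % (var.get("label"), var.get("default", ""))
            answer = input(prompt) or var.get("default", "")
            values[var.get("name")] = answer
        return server.get("name"), values

Because the prompts are driven entirely by the XML entry, adding a site-specific option such as the queue above would require no client code changes.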
The approach described above should address UC1 and UC2. UC3 can also be addressed by having the client 'poll' the server until a connection is established. From a user's perspective, the client is in a 'wait' mode until something good happens.
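For UC3, the 'wait' mode could be as simple as retrying the forward connection until the queued job actually starts and the port opens. A minimal sketch, with made-up retry parameters; a real client would also let the user cancel:

    import socket
    import time

    def wait_for_server(host, port=11111, retry_seconds=5, timeout_minutes=30):
        """Poll the server until the job starts and the port opens (UC3)."""
        deadline = time.time() + timeout_minutes * 60
        while time.time() < deadline:
            try:
                return socket.create_connection((host, port), timeout=retry_seconds)
            except OSError:
                time.sleep(retry_seconds)  # server not up yet; stay in 'wait' mode
        raise TimeoutError("server %s:%d never came up" % (host, port))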
Multiple Servers:
UC4: It is not clear exactly what the use of multiple servers might look like in the end product, but the proposed approach appears to lend itself quite nicely to multiple servers. The user might, for example, open a window before connecting to a server, so that each window is tied to a different server (this is still undecided).
Multiple Clients:
UC5: This is the tricky one. If the server always makes a connection back to the client, there seems to be no way to have multiple clients connect to one server. But what on the surface seems like a bad thing is actually a good thing.
Security Sidebar:
Let's talk about the other case, where the server is 'open' and clients can simply connect in. The server is running with the permissions of the user that started it, and you surely don't want anyone with a client to be able to connect arbitrarily. Okay, so you put something in place like a secret key; well, in practice everyone will choose '123' and you haven't really protected 'need to know' issues in any significant way (by the way, this is one of the reasons I propose deprecating UC3). In general I like reverse connect because the person starting the server specifies exactly which client to connect back to, and because that client is 'waiting' the connection is established immediately. In the case where the server is 'open' and accepting connections, you don't have any real way of saying "accept connections from this client but not that client". Also, inevitably, the server is simply hanging out for a while before you get around to starting your client, which is also a bit of a security hole.
So, in order to connect another client, the user of the 'initiating' client has to explicitly say "connect to client 'blah'"; again, in my opinion this is a good thing. The user can at that point also specify whether that client is 'read/view only' or can control/write to the server state, along with issues like writing to the server's disk (remember, the new client may not be the same user as the person who started the server). The server then connects to the specified client (who, because UC3 is deprecated, is waiting in reverse-connect mode).
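One way to picture that 'explicitly attach another client' step is as a small request the initiating client sends to the server, naming the new client and its permissions. The structure below is purely speculative, not an existing ParaQ message; it is only meant to make the permission choices concrete.

    from dataclasses import dataclass

    @dataclass
    class AttachClientRequest:
        """Speculative message from the initiating client to the server."""
        client_host: str        # the new client, waiting in reverse-connect mode
        client_port: int
        can_modify_state: bool  # False means 'read/view only'
        can_write_disk: bool    # the new client may not be the server's owner

    request = AttachClientRequest("blah.sandia.gov", 11111,
                                  can_modify_state=False, can_write_disk=False)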
UC6: When connecting to the server after all clients have disconnected, the above approach will not work. A solution to this problem is to write out a small "cookie" file that contains enough information to reestablish the connection. This has the potential to solve many of the problems with connecting to servers. It solves the problem of specifying the server on a cluster running multiple servers. It also can solve the security problem of forward connections given above. The "secret key" could be randomly generated when the server starts, and that key can be placed in the cookie. Thus, access to the server is basically limited to people with access to the cookie file.
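A minimal sketch of the cookie idea, assuming a simple JSON file and a randomly generated key; the file format, field names, and location are all hypothetical.

    import json
    import os
    import secrets

    def write_connection_cookie(path, server_host, server_port):
        """Server side: record enough to reconnect later, plus a random secret key."""
        cookie = {
            "host": server_host,
            "port": server_port,
            "key": secrets.token_hex(16),  # random per-run key, not user-chosen
        }
        with open(path, "w") as f:
            json.dump(cookie, f)
        os.chmod(path, 0o600)              # only the owner can read the key
        return cookie["key"]

    def read_connection_cookie(path):
        """Client side: anyone who can read this file can reach the server."""
        with open(path) as f:
            return json.load(f)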
NasaDave's Peanut Gallery: There is still the problem where you have a client on your desktop and an existing server running on a machine sitting behind a firewall/NAT (as is the case for our clusters). The client can't reconnect to the server directly, since the server only has a local IP address (normally 192.168.X.X or 10.0.X.X). The cookie would somehow need to know the head node for the cluster, and try to bonk the server through the head node and tell it to reconnect to the client.
Tim's Peanut Gallery: Since dropping the original connection is the problem, don't drop the original connection - create a proxy server that can "forward" PVS connections:
                                            Firewall
                                              |  |
    ParaQ Client <-------> PVS Proxy <---->   |  |   <----> Actual PVS Server
                                              |  |
In the reverse-connection case, it is the PVS Proxy that accepts the reverse connection, rather than the client. That simplifies the client, since it never accepts a connection. If the client disconnects, the PVS Proxy continues running. That gives subsequent clients a server they can connect to when they want.
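A bare-bones sketch of the proxy idea, assuming raw TCP forwarding and made-up port numbers: the proxy accepts the single reverse connection from the actual server once, then accepts clients one at a time and shuttles bytes in both directions. A real PVS proxy would also have to respect the ParaView connection handshake and buffer server output while no client is attached; this only illustrates the topology.

    import socket
    import threading

    SERVER_PORT = 11111  # where the actual PVS server reverse-connects to the proxy (hypothetical)
    CLIENT_PORT = 22222  # where ParaQ clients connect to the proxy (hypothetical)

    def pump(src, dst):
        """Copy bytes one way until either side closes or errors."""
        try:
            while True:
                data = src.recv(4096)
                if not data:
                    break
                dst.sendall(data)
        except OSError:
            pass

    def run_proxy():
        # Accept the single reverse connection from the actual PVS server.
        server_listener = socket.socket()
        server_listener.bind(("", SERVER_PORT))
        server_listener.listen(1)
        server_sock, _ = server_listener.accept()

        # Accept clients one at a time; the server connection outlives each client.
        client_listener = socket.socket()
        client_listener.bind(("", CLIENT_PORT))
        client_listener.listen(1)
        while True:
            client_sock, _ = client_listener.accept()
            threading.Thread(target=pump, args=(server_sock, client_sock),
                             daemon=True).start()
            pump(client_sock, server_sock)  # returns when this client disconnects
            client_sock.close()             # the server connection stays up for the next client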
Dave's Walnut Gallery: The PVS Proxy needs to run on a machine other than the user's local machine. The biggest use case for this disconnect-reconnect functionality will likely be on the secure network. In this case, the user needs to be able to turn off their machine and still be able to reconnect to a running server.
Challenges:
Whenever you ‘submit’ a job to a computation cluster you are in tricky waters. The challenges to this type of design include at least the following items.
1) Which ssh do I use? Are there issues with using a particular (site-specific) ssh? Can I use some general OpenSSH library?
2) What are the issues with taking in a user’s password?
3) What are the issues with grabbing the client host name? Is there a nice cross-platform way of doing this? Will it be fully qualified? Can the server use it to connect back? There are lots of issues. For starters, the client may not have a host name at all (or it is not registered in any DNS). If the client and server are not on the same LAN, the client may not even have a valid IP address. (A small sketch follows this list.)
4) What if you have two clients on the same host?
5) What if you have two servers on the same cluster?
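Regarding item 3: Python's standard library offers a cross-platform way to ask for the local host name and address, but as noted above the answers may be unqualified, unresolvable from the server's side, or a private NAT address. This sketch only shows what is easily available, not a solution.

    import socket

    def describe_local_host():
        """Best-effort, cross-platform look at what the client could advertise.
        None of these values is guaranteed to be reachable from the server."""
        name = socket.gethostname()  # may be a bare, unqualified name
        fqdn = socket.getfqdn()      # may still not exist in any DNS the server sees
        try:
            addr = socket.gethostbyname(name)  # may be 127.0.0.1 or a private 192.168.x.x
        except socket.gaierror:
            addr = None
        return name, fqdn, addr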
* This would be very helpful at a place like Sandia, where several clusters have interactive queues for vis but many others do not. It is a nice indication to the user that they may have to wait a long time on non-interactive compute platforms. It will also mean that users start to demand an interactive queue for those platforms that do not currently have an interactive vis queue.