VTK/Parallel
This is a spot to start documenting the ins and outs of making VTK work in parallel at the various levels. The page has been started by someone who is new to VTK, so there are probably errors in here. Other people will hopefully assist with fleshing out this page. Formatting will be sparse for the moment, I'll just put down my thoughts to begin with.
There doesn't seem to be much solid documentation for VTK in parallel. The mailing list gets regular questions, and I personally haven't come away from reading replies knowing everything I wanted to know.
There are several levels and meanings of "Parallel":
Loop-Level (compiler extensions, very fine)
This can be achieved via the OpenMP support in GCC, Microsoft's Visual C++ (Professional, not Express) and Intel's icc. Intel also offers Threading Building Blocks (TBB), which is a library rather than a compiler extension, so it can be used with GCC as well.
A loop can be run in parallel via compiler extensions. The only concern is whether the data it is accessing is thread-safe.
http://en.wikipedia.org/wiki/OpenMP
Algorithm-Level (threads, MPI, fine)
The algorithm has a parallel mode of operation built in. A well-written quote from the mailing list:
Many of VTK's imaging filters support thread level parallelism. When VTK is compiled with threading support, those filters in the Imaging kit will automatically spawn threads to take advantage of the available parallelism near to the innermost loop. When processing this type of data, threading is fairly easy to do and scales, in terms of execution time, very well.
I assume that it's up to the filter-writer how this is implemented, and it could also be done with MPI, sockets and processes, SOAP over web services, phone-a-friend and ask-the-audience. VTK does not provide any support for these alternate methods, but it does provide support for threading image filters via vtkThreadedImageAlgorithm.
Note that this SEEMS to be where vtkMultiThreader slots in. It provides support for an algorithm to spawn threads and distribute its workload. It's a low-level class.
This is also where vtkMultiProcessController and vtkMPIController slot in. These classes help to distribute and control the processes that an algorithm wants to farm work out to. The algorithm registers callbacks, remote methods, etc., and triggers remote executions.
Note that all this parallelism is still in the branch-execute-join pattern: each filter in the pipeline executes sequentially, but that execution can itself be done in parallel.
The Streaming pipeline system complicates this a little bit, because one filter may request more information (pieces) from a source filter. Note that during the request the calling filter waits for the response, so again it's just one filter executing at a time.
Note that if you look through the code, you'll see "asynchronous" mentioned from time to time. This seems to be for dealing with remote executions, but once the remote calls have been made, the filter will still wait for the responses before moving on.
Filter/Task-Level (processes, MPI, coarse)
If you want to execute more than one filter at a time, you basically have to create two independent pipelines and merge the results together at the end via vtkParallelRenderManager (eg vtkCompositeRenderManager) or similar.
Each pipeline has to operate in a separate process (a thread is not enough), communicating via sockets or MPI.
The data must be partitioned between the two pipelines (doing this efficiently is complicated).
Render-compositing is not straightforward either: you have to consider object ordering, or where the data will be displayed, and volume rendering in parallel may not be possible at all.
Note that the rendering is no longer done directly to the display; it is often rendered to memory (i.e. images), then composited, and only then displayed. How much hardware acceleration is possible, I don't know. I see a lot of mentions of using the Mesa library (software OpenGL rendering).
Why can't it be threaded?
First I (edice, the original author of this page) should explain why I am even asking this question, and what previous approaches I have taken. It will help explain why serious VTK users wouldn't want threading anyway (more on that later).
I am not working on anything bigger than a PC (quad-core is the current top-of-the-line that could be considered). I cannot farm out processing to a network of PCs (the LAN is relatively slow and difficult to deploy onto). The data sizes I deal with are not large, but they are large enough to be annoying if you had to wait for them.
The key feature I am looking for is the ability to "background" the processing of filters. If I do a Gaussian splatter, I would rather keep looking at a partial dataset, make changes, and see the results of the splatter only once it has finished and is up to date with the rest of the pipeline.
So far I have achieved this with a different 3D rendering system by implementing a data-push pipeline. Once my "filters" have done their work, the results are pushed into the 3D framework and the results become visible - it can happen at any time, even while the user is rotating the model.
The data that moves between my "filters" is largely self-contained and can be processed through a const interface. If the data needs to be manipulated, a deep copy can be made. Shallow copies are generally not needed, although if the data is large then a special adaptor can be used to support sorting, adding, removing, and changing the core const dataset. Because of the simple nature of the data, threading is trivial: each job can be considered a unit and will not impact anything else in the system.
VTK's data is not simple. There is a lot of extra stuff added (reference counting, vtkInformation support) that could talk to a lot of other objects. Reference counting alone looks very thread-unsafe... and to make things more complicated, there seems to be extra code to side-step garbage collection when the objects are created outside of the "main thread" - possibly to support threaded algorithms that need to create their own working space?
So instead, VTK's data can be serialized and sent to filters that live in a different process or on a different machine (via MPI, sockets, etc). There is also apparently support for shared-memory architectures, although I do not know anything about it - it may be a new or coming-soon feature. Serialization introduces overheads which I normally avoid by simply having dumb data objects.
I would consider making VTK's current implementation thread-safe to be non-trivial, and probably pointless given its current design.
If you wanted to visualize a large dataset, you wouldn't want thread safety. You wouldn't want or need multiple pipelines executing in the same process. Instead you would be distributing the workload across a large cluster or an N-way machine, and the main thread is no longer the limiting factor. And more importantly, you probably wouldn't want to look at an incomplete visualization. You would rather wait for the full thing to finish, or let the streaming show you what you want to see.