D4 Design
This document captures the current design plans for D4, the next generation of D3. The document is split into two major parts: problems with the current D3 that should be fixed before moving on to D4 and new feature requirements for D4.
D3 Issues
Here are the current issues that we have with D3. Any feature that is not being inherited by D4 can be skipped over.
vtkMergeCells
I don't understand why this class exists. It provides the same basic functionality as vtkAppendFilter
except that it does not have the added convenience of actually being a filter. The added functionality of vtkMergeCells
should be incorporated with vtkAppendFilter
(the most important of which being the handling of global cell ids) so that vtkMergeCells
can be deprecated.
vtkSubGroup
I hate everything about this class. First and foremost, the name is completely wrong. The main operation of the class is to perform collective communications, not store a sub group of something. Second, the majority of the methods are implemented within C macros. That makes it impossible to debug effectively. This sometimes affects troubleshooting bugs not contained within vtkSubGroup
but rather something that uses it. Third, the implementation is amazingly clunky. There is this idea of maintaining an array of ids which gets reordered in weird ways depending on the "root" of the collective operation, and then restored later.
No matter what, I want to eventually get rid of this class. At the very least, replace it with a vtkCollectiveOperations
class that provides the same functionality, but with code that is easier to debug (perhaps with methods calling templated functions).
Ideally, however, I would like to see the functionality split into two parts. The actual implementation of the collective operations should be placed directly in vtkMultiProcessController
. That greatly simplifies doing the collective operations and also allows subclasses such as vtkMPIController
to override the methods with implementations supported in the native API. The subgrouping could be implemented with a special subclass of vtkCommunicator
that reordered indices (and again MPI could provide its own native implementation). That way the sub group feature could be used with any communication, not just collective communication.
vtkBSPCuts
This class just needs to go away. First of all, this class has nothing to do with BSP trees. It just holds a kd-tree in a compact array format. A kd-tree and a BSP tree are very different and should not be confused.
vtkBSPCuts
has very little functionality. Apart from simply storing a kd-tree in a different format than vtkKdTree
, vtkBSPCuts
only does
- Build a
vtkKdTree
structure from compact arrays and vice versa. - Test to see if two kd-trees are equal.
This functionality should just be wrapped up into vtkKdTree
.
vtkKdTree
vtkKdTree
does not deal well with regions without cells. In fact it was designed in such a way that you always specify the minimum amount of cells per region. This is a bad idea when combined with specifying the number of regions desired. The result is that if you do not have enough cells, you end up with a tree with less regions than expected. That leads to some interesting crashes.
The issue is that when vtkKdTree
is running select, it expects to get a median index rather than a median point in space. If it was the latter, we could be more flexible in how we divided our space. With the current implementation, when we run out of cells, we no longer have a cell index to split on and we have to fudge it.
vtkKdTree
should not go through the trouble of computing cell centroids. Instead, just build the vtkKdTree
based on points (which already have a location). Cells that straddle regions could be placed by any criteria: lowest region index, first point, region containing the most points, random, etc. It really does not matter. The division may become slightly unbalanced, but not enough to matter.
vtkKdTree
should be able to build a tree with a specific number of regions. That includes non-powers-of-two. This can be done by making splits that have a different number of leaves on each side. The number of cells in each side of the split would then not be even, but rather weighted based on the number of regions on each side.
vtkPKdTree
I am not at all convinced that the select operation is the correct one to use for determining kd-tree splits. I think an algorithm that minimizes data copies rather than comparisons would be more optimal.
vtkDistributedDataFilter
vtkDistributedDataFilter
will only use global ids if they are of type vtkIdType (see bug #3437). Sometimes they are stored as integers or long longs. That will happen if you save the data out to a VTK XML format and then read it back in.
General Performance Metrics
There is a lot of code in D3, and although there has been some ad-hoc performance tuning, we need some more thorough analysis. We need to know what needs to be improved for D4.
There is also a lot of code of dubious value. For example, many of the sub-algorithms in D3 have two forms: a "quick" version and a "memory conservative" version. By default, the quick version is used, and I doubt the memory conservative version is used at all. Furthermore, the code between these two versions are the same. As we add functionality to D3, it would be helpful if we could dump some of the less useful features.
D4 Requirements
Here are the new features we need with D4.
Multiple Data Types
Right now, D3 converts everything into unstructured grids when it distributes cells. D4 should be more type aware and allow native support for multiple data types. Certainly, a poly data input should result in a poly data output. Even more important, D4 should handle trees and graphs natively. Our new parallel infovis architecture will definitely need to distribute this type of data. Converting these structures to unstructured grids looses information.
The implementation may mean that there is a separate filter for each data type (vtkDistributeUnstructuredGrid
, vtkDistributePolyData
, vtkDistributeGraph
, etc.). Then again, it may be easier to roll it all into one filter.
Multiple Partition Strategies
Right now, D3 exclusively uses k-d trees to distribute the data. However, D4 needs to support this and other partitioning schemes. First of all, the new infovis data types may not have geometric information to place in a k-d tree. Second, the user may want to use a different algorithm, such as one provided by Zoltan. Third, the user may already have a partitioning scheme in mind. For example, if we could simply tell D4 where to move data, that would be a simple solution for many of our MxN issues in ParaView.
D4 should introduce a new class called something like vtkD4Apportion
. This class will be held by the distribution filter. The distribution filter will give vtkD4Apportion
the input data and then tell it to run the apportion. Once it is complete, the distribution filter will then query vtkD4Apportion for things like which cells it should send or receive.
The obvious first implementation of vtkD4Apportion
will internally use a vtkPKdTree
to apportion the data.
Multiblock Input
D4 should handle multiblock inputs well. It should make its division based on all the blocks but maintain the multiblock structure in its output.
Partial Updates
It is fairly common for the input to only change partially. For example, a time change may result in only the field data changing, but not the geometry or position information. In situations like this, it would be faster if D4 only did a "partial" update in that it skipped steps that did not change, such as building the k-d tree.
Work Update
From a telecon on 12/16/2008 (Berk, Andy, Ken) here is some thoughts from that.
- Current work in addition to the stuff mentioned above:
- The ability to assign the ownership of a node to a process which is useful for doing things like computing nodal values as each process that shares a node will know whether they need to compute information for that node based on ownership.
- Current work does not need to include Zoltan.
- Idea is to start with D4 as an exact copy of D3 and clean out unneeded/bad stuff and add new functionality deliverables.
- Scalability.
- Better temporal support with an emphasis on performance
- Long term may want to add in a higher level control for distributed meshes with functionality like:
- Given a point location return a list of processes that may contain a grid that contains the point
- Given a mesh entity, get the owning process (e.g. ghost cells, shared nodes, etc.) or a list of process that use that mesh entity
- Easy and efficient point-to-point communication operations
A side note on using a node of the cell as the "geometric" description of the cell rather than the cell centroid. This will probably cause problems/confusions such as:
- Multiple cells will have the same geometric description
- The ordering of the objects may be unexpected (e.g. cell B may be considered left of cell A in the image below if node 2 is used as the geometric representation of B and node 1 is used as the geometric representation of A)
- The cell centroid gives a more consistent representation of the cell
Questions on D4 Details
Currently (12/29/2008), most of the vtkDynamicDistributedDataFilter class (D4 filter) has the unstructured grids replaced with data sets. It passes the DynamicDistributedData.cxx test (same as DistributedData.cxx test except using the D4 filter instead of the D3 filter). The 2 sole unstructured grid relics laying around are:
- vtkExtractUserDefinedPiece is an unstructured grid algorithm
- vtkAppendFilter which stated above should be used to replace vtkMergeCells is an unstructured grid algorithm