Talk:Graph and Tree Structures

From ParaQ Wiki
Jump to navigationJump to search

vtkAbstractGraph IsA vtkPointSet

Let us not forget that vtkAbstractGraph inherits from vtkPointSet. As such, you must implement the convention of vtkDataSet and vtkPointSet regardless of what new conventions you create. Thus, even though we may want to implement more abstract conventions for accessing graph features, we still must index nodes as points and we must index arcs as cells. If we fail to do this, then it will be impossible to reference the vtkAbstractGraph as anything but that, in which case it makes no sense to inherit from vtkDataSet at all.

Given that we have to implement index access to nodes and arcs anyway, I'm very sceptical on implementing any secondary set of access methods. Even if the new method is significantly better, that is still two interfaces we have to implement and support. It also means that there two interfaces most users will have to learn and that the interface methods are a confusing mix of two completely different sets of access methods.

--Ken 17:05, 6 Sep 2006 (EDT)

I agree that if vtkAbstractGraph still has vtkPointSet as a superclass, then we still need the index-based approach, and adding node and arc types is likely pretty pointless. What we are suggesting here is to change the superclass of vtkAbstractGraph to vtkDataObject, which is a major shift, especially since we already agreed to keep vtkPointSet as the superclass. Item 5 in the feature list describes the reasoning for this change.
Jeff 18:04, 6 Sep 2006 (EDT)
I am convinced that it is better to inherit directly from vtkDataObject instead of vtkPointSet. Looking at vtkDataSet and vtkPointSet API, I see a whole set of functions that I don't think belong to a tree. Furthermore, an ID based API takes us far from conventional tree implementations making vtkTree non-intuitive to use. For the longest time, I wished we abstracted some of the slow-path access to our datasets using iterators. Specially in implementing things like blanking. Having algorithms that purely access cells and points using indices makes implementing smart iterators that can skip over certain cells useless. Now, I find us trying to fit an index based API to a data structure that is not even normally randomly accessed. The final draw was that I didn't even consider using vtkTree when considering how to implement a small application to generate an xml tree of VTK class hierarchy. I thought I would write my own in python. So, I am in favor of an iterator access, tree-like tree that inherits from vtkDataObject.
Berk 06:37, 7 Sep 2006 (EDT)
If vtkAbstractGraph inherits from vtkDataObject then I can no longer run it through filters -- I use vtkThreshold a lot. I suppose I could write a filter to convert a graph to an unstructured grid or a polydata. Also, if we inherit from vtkDataObject, how will we manage node and arc data arrays?
Awilson 10:36, 7 Sep 2006 (EDT)

vtkNode and vtkArc footprints too big

I think the footprints for vtkNode and vtkArc (as well as the superclass vtkIndexedObject) is way too big. They all inherit from vtkObject. vtkObject contains overhead for reference counting, garbage collection, observers, and many virtual functions. There is also much code run during the creation and deletion of every vtkObject.

A much better solution is a small structure that is passed around that minimally describes the node or arc. Since we probably have to index nodes and arcs anyway because #vtkAbstractGraph IsA vtkPointSet, the structure would minimally be the index:

typedef vtkIdType vtkIndexedObject;
typedef vtkIdType vtkNode;
typedef vtkIdType vtkArc;

Of course, that gives us little or no value over just using vtkIdType to index nodes and arcs.

--Ken 17:05, 6 Sep 2006 (EDT)

There is probably no reason that vtkIndexedObject must inherit from vtkObject; it could have no superclass. The total amount of memory storage needed should then be more similar to the previous structures. The memory would, however, still be much more fragmented than the prior implementation. If we went with this implementation and wanted to be more memory-savvy, we could in the future manage memory ourselves (perhaps using an enhanced version of vtkHeap).
--Jeff 18:14, 6 Sep 2006 (EDT)

Linked Lists Vs. Arrays

In the document, it is suggested that we use linked lists rather than array. I personally would rather continue to use arrays. First of all, arrays are used in every other VTK data structure, so using arrays will mesh better with vtkAbstractGraph's superclasses and siblings. Second, field and coordinate data is still held in arrays. What is the point of having node and arc information in linked lists if you still have to maintain indices into the field data and perform array-type manipulations on them. Third, most operations are at least as fast or faster with the data arrays. I outline the operations in the table below

Operation Faster Data Structure
Array Linked List
Insertion Tie Tie
Deletion Linked List (maybe)
Random Access Array
Non-ordered Traversal Tie Tie
Bredth/Depth Traversal Linked List (sort of)

Inserting an element at the end of an array is not much different then inserting an element at the end of a linked list. Traditionally, linked lists are considered to be easier to delete from, however, I am highly sceptical that in this case it will actually be faster. First of all, deleting a single value from an array is not hard. Just move the entry in the last element to the slot being deleted. That is an O(1) operation. To delete from a linked list, you first have to find the element. Even assuming that you already have a reference to the entry to be deleted (so you don't have to traverse to get there), you still have to remove the entry in the field arrays corresponding to the node or arc. So now you are back to deleting from a linked list.

Random access is obviously better in an array. For traversing through the data, both structures are about the same. The linked list may have a slight edge in bredth or depth first traversal. This is only contingent on keeping links not only in a list of nodes and a list of arcs, but in links from nodes to arcs and vice versa. In this case, you have a very very slight edge in that you can just dereference the link rather than getting an index from a vtkCellLinks type data structure. However, I expect that maintaining these links will prove to be difficult.

--Ken 17:43, 6 Sep 2006 (EDT)

I think this is a good point. The order that the nodes and arcs are stored in the graph is totally irrelevant, so we could do deletions by moving objects from the end position to the deleted position. I agree that arrays are also cleaner and easier to manage.
--Jeff 18:28, 6 Sep 2006 (EDT)
Maybe I'm missing something here but if I have to move the first element (cause I'm deleting it) in an array to the end then wouldn't I have to shift all other elements by one in order to fill the space used by the element that was "deleted" ? (which would not be order 1) Also - if the node and arc data structures also provided the structure to form the linked list (or doubly linked list) then you never need to search the list to find the node in order to deleted it.
I also disagree on the "tie" listed for insertion - unless you are going to allocate an array so large you will never need to expand it you will have to copy the array from time to time to make room for the new insertion. Also in the case of a linked list you never have to allocate more memory than you are using.
--Bob 23:36, 6 Sep 2006 (EDT)
Since the suggestion is to remove random access and provide iterators instead, the vector looses it's advantage there. It sounds like when designing addition and removal for a vector, you either optimize for memory or performance. When adding, the vector has to grow somehow, when removing, it has to shrink. This almost always requires re-allocation and copying. It is possible to put this off by having the vector grow by a factor everytime an element is added beyond the allocated space (VTK double vector size everytime this happens) or marking deleted elements as invalid when they are removed. However, eventually, one will want to squeeze the array. Squeeze will be quite costly.
As long as we allow only iterator access, we can go either way, I think. If we use linked lists, we will still have to provide some smart heap mechanism to prevent memory segmentation. We may be able to use vectors instead and build this heap mechanism into them. When deleting an element, mark it's memory space as unused, when inserting an element, re-use that space. Iterators could easily skip over unused elements. Or something of the sort. We still have to provide some other sort of heap support to handle the fromEdges and toEdges dynamic arrays in each node etc. Otherwise, these little memory blocks would be distributed all over the place in memory.
Berk 08:23, 7 Sep 2006 (EDT)