Talk:Graph and Tree Analysis

From ParaQ Wiki
Jump to navigationJump to search

Pointer Accessor Functions

The currently proposed pointer accessor functions are of the form

vtkIdType GetInDegree(vtkIdType node);
vtkIdType* GetInEdges(vtkIdType node);

Is there a reason they are not of the form

void GetInEdges(vtkIdType node, vtkIdType &nedges, vtkIdType &*edges);

This latter form matches the cell accessors for poly data and unstructured grids. It also lets you get all the elements desired in one function call rather than two.

--Ken 12:46, 12 Sep 2006 (EDT)

Break up AdjacencyListToEdges array win

I am assuming that the big win for the Graph and Tree Analysis#Break up AdjacencyListToEdges array work is that inserting and deleting nodes and edges from the graph will no longer invalidate the adjacency list. Instead, each operation will simply update the adjacency list, and the broken up adjacency lists will make it possible to do this in O(1) time (amortized).

If this is the case, it would be nice if this was explicitly mentioned on the Wiki page.

--Ken 12:45, 12 Sep 2006 (EDT)

out_degree cost

Why is the cost of out_degree(v) O(out_degree(v)) rather than O(1)? The adjacency lists, which will be in arrays, will necessarily have a count associated with them. As such, you should be able to just return the count, right?

--Ken 12:51, 12 Sep 2006 (EDT)

That's true. I'll update the runtime.
Jeff 08:34, 13 Sep 2006 (EDT)

Separate directed and undirected graph classes.

It is really unclear to me why we need separate classes for directed and undirected graph classes. The two share so much functionality that it does not seem to be worth it. In my mind, the only difference between the directed and undirected cases are how the adjacency list is updated.

Is there something else that needs to be handled by the BOOST adapter that matters whether the class is directed or not? If so, can't this be handled by a little code in the adapter?

--Ken 13:05, 12 Sep 2006 (EDT)

It's true that we could have boost's functions return the appropriate result for specific functions (like out_edges()) that would give different results depending on whether the vtkGraph is directed or undirected. The issue is that, for boost, each graph type has a corresponding tag directed_category which should be set to directed or undirected. So, if the tag for vtkGraph is set to directed, then undirected algorithms (e.g. connected_components) will not work. It seems that there is no other way around this than to have a separate type for directed and undirected graphs.
--Jeff 12:43, 13 Sep 2006 (EDT)
Ken and I discussed this more and it seems that we could have directed and undirected class types for Boost without changing vtkGraph's class hierarchy. This would involve the classes vtkBoostUndirectedGraph and vtkBoostDirectedGraph which simply wrap a pointer to vtkGraph. The only downside is that each boost function call would involve an additional level of indirection. In order to use a Boost algorithm, the classes would need to be wrapped in the appropriate type as follows
vtkGraph* g;
// ...
if (g->GetDirected())
  {
  vtkBoostDirectedGraph graph(g);
  // Perform directed graph algorithm
  }
else
  {
  vtkBoostUndirectedGraph graph(g);
  // Perform undirected graph algorithm
  }
--Jeff 09:29, 14 Sep 2006 (EDT)

Node and Arc structures moved out of vtkAbstractGraph

This is probably a bit off topic for this web page, but shouldn't the structures for holding node and arc connectivity information (currently EdgesToNodes, NodesToAdjacencyList and AdjacencyListToEdges) be moved out of vtkAbstractGraph into the subclasses. It seems to me that a tree should hold connectivity information very differently than a general graph.

Also, as the adjacency list gets more complicated (stored in an efficient, monitored heap) that functionality should probably be encapsulated in its own class so that it can be duplicated easily.

--Ken 08:49, 13 Sep 2006 (EDT)

Should vtkAdjacencyGraph exist?

The benefit of taking the node and arc structures of vtkAbstractGraph is that it makes it possible for subclasses to have more efficient structures. However, I do not think that vtkAdjacencyGraph (or vtkAbstractAdjacencyGraph) should be one of them. The basic idea of an adjacency graph is to have a graph without indexed arcs. There are some advantages to having a graph of that type: the memory footprint is smaller and finding node neighbors is faster. However, I think these advantages are limited.

  1. The memory used by storing arc information is pretty small: two identifiers per arc. I think the memory of storing this in an array and the connectivity information in a heap is still about the same as storing the graph as a bunch of linked lists.
  2. The overhead of looking up an adjacent node using connectivity arrays point to arcs is small. For a directed graph, it takes just two pointer lookups. For an undirected graph, it takes three pointer lookups and a condition. I think we would need some empirical evidence to suggest that this is slowing algorithms down before trying to speed this up.

I can also think of many disadvantages to implementing an adjacency graph that, in my mind, far outweigh the advantages.

  1. The use case is really unclear (and trees don't count, see below). Clearly there will exist graphs that do not have any arc data, but how often will an application be in a situation where it can guarantee that condition? And what if the user wants to later attach data or run a filter that attaches data to arcs? What if the user wants to select arcs. Oops, sorry, they don't have identifiers.
  2. It invalidates the methods in the vtkDataSet superclass. If arcs have no identifiers, how will GetCell behave? It is not out of the question to move vtkAbstractGraph up to inherit directly from vtkDataObject, but I don't think this is a sufficient reason.
Not really. If you don't have edges, numCells == 0. Then, there would be no need to call GetCell(). Berk
  1. It complicates using graphs. Filters and users now have to make decisions on what to do depending on whether a graph supports arc ids and data. Who handles the case where the user wants to run a filter that adds arc data on an adjacency graph? Does the user have to make a conversion? Does the filter detect this and change the graph type? Does the executive have a special case for this?
  2. It's yet two more classes that have to be understood by users and maintained by everybody.
  3. It would be better to support node-to-node lookups in vtkGraph. Assuming that there is a clear advantage to having arrays that point directly from node to node (which is as of yet unproven), then why can't you just add that array to vtkGraph. It's construction and maintenance is almost identical to the node-to-edge lookup arrays. Assuming the implementation only built these arrays on demand, it would be memory efficient since they would only exist if used. Also, you would get the performance boost of direct node-to-node lookups on graphs even if it had arc data.
I don't like this because: 1. because it doubles memory 2. it forces the graph to maintain 2 data structures that have the same information. They would have be sync'ed. It sounds like a bug waiting to happen. Berk

Also, regardless of whether adjacency graphs exist, vtkTree should not be one of them. Since each node has at most one arc to one parent, the id of the arcs can be implicitly defined by the id of the "child" side of the arc. For example, let us assume that the root node always has id zero (we can argue later whether that is a good idea). If we have <math>n</math> nodes, then we have <math>n-1</math> arcs. Arc <math>i</math> can be implicitly defined as the arc pointing from node <math>i+1></math> to its parent. The child-to-parent connections can be represented as a simple id array with <math>n-1</math> entries where each entry <math>i</math> contains the identifier to node <math>i+1</math>'s parent. The parent-to-child connections can be represented in the same way as the arc connections of a regular, directed graph (a heap of arrays).

I think this is a very slick implementation. However, what is the use case? When will we need edge data associated with a tree? And if we do, why wouldn't we use a graph? Berk

--Ken 13:52, 15 Sep 2006 (EDT)

My suggestion is to implement the following:

+ vtkAbstractGraph
|
+--- vtkTree
|
|
+--- vtkGraph

and have vtkTree return 0 for number of edges/cells. Also, to avoid implementing GetXXXEdges() methods in vtkAbstractGraph. In the future, if we have a use case for a graph without explicit edges, we might consider inserting a class between vtkTree and vtkAbstractGraph and making this graph a subclass of that. This would not break backwards compatibility. The same way, if we have a use case for edges in trees, we can go with Ken's idea. I'd rather leave both out until we have good use cases. As Bob said, "it is easier to add to the API than to remove from it". Berk 15:20, 15 Sep 2006 (EDT)

Taking another look at the strategies for vtkGraphLayout, which is currently the only subclass of vtkAbstractGraphAlgorithm which works on both trees and graphs, there are calls to GetNumberOfEdges() and GetSource(edgeId), GetTarget(edgeId). This code could be rewritten to handle the case where the edges cannot be traversed by index, but this would make the code more complex. This would suggest that these methods should be present vtkAbstractGraph, and that the vtkTree should use the edge-numbering convention that Ken suggested.
I agree that we should stick with just vtkGraph and vtkTree.
--Jeff 09:14, 18 Sep 2006 (EDT)
Jeff has already decided to implement arcs in graphs, so this point is probably moot. But nevertheless I would like to capture these potential use cases for arc data in trees.
  1. An algorithm takes a graph as input and generates a tree (e.g. minimum spanning tree). The input graph happens to have data on its arcs.
  2. The vtkTree represents some sort of sorting tree. Each arc represents a relationship between the data in the child subtree to the data in the parent (e.g. less than).
  3. The vtkTree represents some sort of ontology. On each arc you want to put the proportion of entities classified as the parent that are also classified as the child. (This proportion is unrelated to the size of the subtrees.)
--Ken 12:39, 18 Sep 2006 (EDT)

RemoveSubtree vs. RemoveLeaf

Why are you providing a separate method for removing leaves? Aren't leaves just single node subtrees?

--Pat 12:41, 18 Sep 2006 (EDT)