Data Array Strides and Iteration: Difference between revisions

From ParaQ Wiki
Jump to navigationJump to search
Line 4: Line 4:


This document explores ways to handle different striding in data arrays while allowing algorithms efficient access to the data.
This document explores ways to handle different striding in data arrays while allowing algorithms efficient access to the data.
== Data Layouts to Support ==
Let us briefly consider the different ways we expect do find data to be lain out.  We can then postulate on a general storage mechanism that handles all of them.
=== VTK Layout ===
Everyone knows VTK's Layout, right.  Components for each tuple are in contiguous areas of memory.  Tuples are packed tightly in order.
{| border=1
|
{| border=1
| T0C0 || T0C1 || T0C2
|}
|
{| border=1
| T1C0 || T1C1 || T1C2
|}
| ...
|
{| border=1
| T''n''C0 || T''n''C1 || T''n''C2
|}
|}
=== Component Major Order ===
=== Contiguous Chunks ===
=== Layouts Not Considered ===
=== General Storage ===


== Access Timing ==
== Access Timing ==

Revision as of 17:39, 6 March 2009

One of the most important issues we have run into with our in-situ work is the ability to shallow copy arrays with strides that are different than those native to the vtkDataArrays. That is, the simulation stores its data in a different way than VTK does.

We want vtkDataArray to conform to data layouts that are provided by simulations. Our current solution is to subclass the appropriate data array and handle the different layout in the subclass. That works fine so long as the calling codes call the virtual accessor methods (such as GetTuple). However, using these methods is slow (see timings below), so many filters opt to grab a properly cast C pointer and access the array directly. If an application does this, then there is no choice but to deep copy the array, which is back to the bad situation we had to begin with. Another problem which may come up is that there are several accessor methods in vtkDataArrayTemplate and its subclasses (such as GetValue) that are not virtual. If something calls one of these methods on a modified data array subclass, something wrong will happen.

This document explores ways to handle different striding in data arrays while allowing algorithms efficient access to the data.

Data Layouts to Support

Let us briefly consider the different ways we expect do find data to be lain out. We can then postulate on a general storage mechanism that handles all of them.

VTK Layout

Everyone knows VTK's Layout, right. Components for each tuple are in contiguous areas of memory. Tuples are packed tightly in order.

T0C0 T0C1 T0C2
T1C0 T1C1 T1C2
...
TnC0 TnC1 TnC2

Component Major Order

Contiguous Chunks

Layouts Not Considered

General Storage

Access Timing

Before we can reasonably talk about the efficiency of data array iteration, we should understand how long it takes to use the different accessors. To do this, I a created a simple program that does a simple computation on a set of double arrays. It times how long it takes to access the data through Get/SetTuple1, Get/SetValue, or direct array access (from the array returned with GetPointer). Here is the program:

#include "vtkDoubleArray.h"
#include "vtkMath.h"
#include "vtkTimerLog.h"

#define NUM_ARRAY_ENTRIES	10000000
#define NUM_REPETITIONS		100

#define TUPLE_ACCESS  0
#define VALUE_ACCESS  1
#define DIRECT_ACCESS 0

int main(int, char **)
{
  vtkTimerLog *timer = vtkTimerLog::New();
  vtkDoubleArray *a = vtkDoubleArray::New();
  vtkDoubleArray *b = vtkDoubleArray::New();
  vtkDoubleArray *c = vtkDoubleArray::New();
  a->SetNumberOfTuples(NUM_ARRAY_ENTRIES);
  b->SetNumberOfTuples(NUM_ARRAY_ENTRIES);
  c->SetNumberOfTuples(NUM_ARRAY_ENTRIES);

  for (vtkIdType i = 0; i < NUM_ARRAY_ENTRIES; i++)
    {
    a->SetValue(i, vtkMath::Random());
    b->SetValue(i, vtkMath::Random());
    }

#if DIRECT_ACCESS
  double *aarray = a->GetPointer(0);
  double *barray = b->GetPointer(0);
  double *carray = c->WritePointer(0, NUM_ARRAY_ENTRIES);
#endif

  timer->StartTimer();
  for (int j = 0; j < NUM_REPETITIONS; j++)
    {
    for (vtkIdType i = 0; i < NUM_ARRAY_ENTRIES; i++)
      {
      double aval, bval, cval;

#if TUPLE_ACCESS
      aval = a->GetTuple1(i);
      bval = b->GetTuple1(i);
#endif
#if VALUE_ACCESS
      aval = a->GetValue(i);
      bval = b->GetValue(i);
#endif
#if DIRECT_ACCESS
      aval = aarray[i];
      bval = barray[i];
#endif

      cval = aval + bval;

#if TUPLE_ACCESS
      c->SetTuple1(i, cval);
#endif
#if VALUE_ACCESS
      c->SetValue(i, cval);
#endif
#if DIRECT_ACCESS
      carray[i] = cval;
#endif
      }
    }
  timer->StopTimer();

  cout << "Time to compute: " << timer->GetElapsedTime() << endl;

  timer->Delete();
  a->Delete();
  b->Delete();
  c->Delete();
  return 0;
}

I compiled this in release mode (with compiler optimizations on) and linked it against a similarly compiled VTK. I ran the test for the Get/SetValue access twice: once for the current implementation and once for if those methods are declared virtual. The latter is more representative of the speed for accessing a subclasses array since we would have to declare them virtual for them to work anyway. I ran the program for arrays containing 10,000,000 and iterating over them 100 times. Here are the results.

Access Type Time (sec)
GetTuple1/SetTuple1 36.5643
GetValue/SetValue - inline 7.16258
GetValue/SetValue - virtual 10.904
Direct array access 7.29754

In summary, the inlined Get/SetValue is just as fast as direct memory access. The compiler seems to be doing its job in inline these methods. Making the methods virtual adds about a 50% overhead to the memory access. The Get/SetTuple methods are crap in terms of speed. They take over 5 times as long as direct memory access.

Minimal Changes

One of the problems with the current implementation is that a coder, when faced with a vtkDataArray of unknown type, is faced with two options. The first is to use the generic Get/SetTuple methods, which, as demonstrated above, are slow. The second option is to use the vtkTemplateMacro to cast a pointer and call a templated method. That results in code like this.

switch (dataArray->GetDataType())
  {
  vtkTemplateMacro(myTemplatedFunction(static_cast<VTK_TT>(dataArray->GetVoidPointer(0)),
                                       dataArray->GetNumberOfTuples(),
                                       dataArray->GetNumberOfComponents()));
  }

So a coder either has to accept inefficiency or do raw pointer access. Thus, we are actually encouraging coders to call GetVoidPointer, which forces us to do deep memory copies in in-situ.

A better approach is to encourage users to down cast the array to the appropriate subclass and then use the more efficient typed accessors to the data. This has the double advantage of making code management easier for coders and giving us more control over how the arrays are accessed. The simplest way to implement this is to add another typedef to vtkTemplateMacro that defines the array type.

switch (dataArray->GetDataType())
  {
  vtkTemplateMacro(myTemplatedFunction(static_cast<VTK_ARRAYT>(dataArray)));
  }
Will this work right for smart pointers? --Ken 17:24, 6 March 2009 (EST)

The only disadvantage to this approach that I can see is that the templated function is no longer templated on type for which the computation is happening. This can also be easily circumvented by adding something like typedef T ValueType to vtkDataArrayTemplate. Then the templated function can get properly cast values as follows.

template<class T> void myTemplatedFunction(T *dataArray)
{
  T::DataType value = dataArray->GetValue(0);

Possible Workaround

Acknowledgments

This work was done in part at Sandia National Laboratories. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.