Plotting Parallel Data (PGraph)

From ParaQ Wiki

Latest revision as of 13:34, 18 January 2006

Overview

This document outlines the mechanism for plotting data that comes from distributed parallel sources within ParaQ. We dub the collection of code for parallel plotting PGraph, although the implementation will actually be a set of smaller, distinct pieces. Because two of the main design criteria for ParaQ are the ability to scale well on parallel machines and the ability to provide quantitative information, the design of PGraph is critical.

This is an active design document. If you have any comments or suggestions, please add them to the discussion page. This will allow us to track and resolve issues.

Use Cases

The overall goal of PGraph is to provide plot-style visualizations of distributed VTK objects. In this section we list the types of plots users may wish to create from their data. I encourage other ParaQ developers to contribute to this list as they see fit.

Immediately below is a table listing the types of abstract visualizations with plots users may wish to create. The "Plot Type" column refers to one of the types of plots, which are listed further below in the Plot Types table.

Abstract Visualizations with Plots

Name | Description | Plot Type
Histogram | Given a data array (usually point scalars or cell scalars), returns a discrete function with <math>N</math> items or bins. Each bin has an assigned range and contains the count of entries in the data array that fall within that range. | Bar Chart or X-Y Plot
Time Plot | Given a point or cell, show the values of some scalar over time. | X-Y Plot
Curve Probe | Given a curve in space, plot the value of a scalar field defined by a data set along the curve. In general the curve may be a line, but circles have also been requested. | X-Y Plot
Plane Probe | Given a plane in space, plot the value of a scalar field defined by a data set on the plane. | X-Y-Z Plot
Scatter Plot | Like a histogram, except that two scalars are selected and the bins are set into a 2D grid with each bin representing a range in each scalar. | Density Plot
Surface Cut | Plot along the curve resulting from the intersection of a surface and a cutting plane. | X-Y Plot
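
The Histogram and Scatter Plot rows above describe a binning step. As a minimal sketch of that step in plain Python (the function names, the equal-width bins, and the choice to ignore out-of-range values are assumptions made for illustration, not part of the PGraph design):

```python
def bin_index(v, lo, hi, n_bins):
    """Return the index of the equal-width bin on [lo, hi] containing v, or None."""
    if v < lo or v > hi:
        return None  # assumption: out-of-range values are simply ignored
    width = (hi - lo) / n_bins
    # The upper edge of the range is assigned to the last bin.
    return min(int((v - lo) / width), n_bins - 1)


def histogram(values, lo, hi, n_bins):
    """1D case: count the entries of `values` that fall within each bin."""
    counts = [0] * n_bins
    for v in values:
        b = bin_index(v, lo, hi, n_bins)
        if b is not None:
            counts[b] += 1
    return counts


def scatter_bins(xs, ys, x_range, y_range, n_x, n_y):
    """2D case: grid[j][i] counts the pairs whose x is in bin i and y in bin j."""
    grid = [[0] * n_x for _ in range(n_y)]
    for x, y in zip(xs, ys):
        i = bin_index(x, *x_range, n_x)
        j = bin_index(y, *y_range, n_y)
        if i is not None and j is not None:
            grid[j][i] += 1
    return grid
```

The 1D result is what a Bar Chart or X-Y Plot would consume, and the 2D grid is what a Density Plot would shade.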

Plot Types

Name | Input Dimensions | Description
Bar Chart | 1 | Shows discrete values as bars with heights proportional to the values.
X-Y Plot | 1 | A traditional graph of a function with the variable on the X axis and the value on the Y axis.
X-Y-Z Plot | 2 | This plot shows a surface in 3-space. The X and Y axes represent two input variables. The position of the surface in Z corresponds to the value for the variables at the given X and Y position.
Density Plot | 2 | A 2D plot with both the X and Y axes representing input variables. Every point in the image is shaded (or colored) to represent the value at that point.

Approach

Our approach hinges on the distinction between abstract visualizations with plots and plot types. An implementation of a component that draws a particular plot type can, and should, be independent of the type of visualization being performed. A visualization involving plots then becomes a two-step process: first compute the data for the plot, then draw the plot. This allows us to break the process into components.

The image below shows the basic units of PGraph. The Extraction unit takes VTK objects, which may be distributed, and performs some sort of operation that generates data that can be directly plotted. The Extraction unit also performs a type of data reduction. Since this data will ultimately be displayed on a medium with limited resolution, this can be done safely without losing information in the final result. The Parallel Combination unit combines data distributed across processes onto a single process. The Server to Client unit transfers data from the root of the data server to the client. Finally, Plotting Components are components that draw plots of the data. They plug into the GUI.

PGraph.png

Obviously, some of these units may not be necessary. For example, Server to Client is only necessary in client/server mode. Parallel Combination is only necessary in a parallel job.

Extraction

The extraction unit takes a vtkDataSet and generates data that can be given to a plotting component. The generated data is also stored in a vtkDataSet. The data to be plotted is stored in the point or cell arrays of the generated data set, while the geometry of the vtkDataSet provides cues on how the data should be plotted. For example, a regularly sampled function may be stored in a vtkImageData, whereas an irregularly sampled function may be stored in a vtkRectilinearGrid, with the spacing of the grid determining the spacing of the samples in the function.
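
To illustrate how geometry can carry the sample positions, here is a small sketch in plain Python. The two classes are hypothetical stand-ins for a 1D vtkImageData and a 1D vtkRectilinearGrid; they do not use or mirror the VTK API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RegularSamples:
    """Stand-in for a 1D vtkImageData: positions are implied by origin and spacing."""
    origin: float
    spacing: float
    values: List[float]

    def positions(self) -> List[float]:
        return [self.origin + i * self.spacing for i in range(len(self.values))]


@dataclass
class IrregularSamples:
    """Stand-in for a 1D vtkRectilinearGrid: positions are stored explicitly."""
    coordinates: List[float]
    values: List[float]

    def positions(self) -> List[float]:
        return list(self.coordinates)


def xy_pairs(samples) -> List[Tuple[float, float]]:
    """What an X-Y plot needs: (position, value) pairs, whatever the storage."""
    return list(zip(samples.positions(), samples.values))
```

A plotting component written against `xy_pairs` does not need to know which of the two storage forms the extraction produced.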

The implementation of the extraction unit comprises a collection of vtkAlgorithm classes. There is a separate vtkAlgorithm class for each type of extraction performed. In general, we can expect to need a separate algorithm for each type of plot visualization, although similar ones may be combined.

Using vtkAlgorithm objects to perform the extraction provides many advantages. A vtkAlgorithm can interface directly with a vtkDataSet or within any VTK pipeline. The output of the vtkAlgorithm can also be used with other VTK components or passed downstream to another pipeline. Also, using a vtkAlgorithm makes controlling the system with ParaView's server manager objects easy. Finally, using a vtkAlgorithm does no harm as it is not difficult to get the raw array data from a vtkDataSet object.

The algorithms used for extraction are not parallel algorithms, even though they run in a parallel job. Instead, the algorithm on each process runs independently on each piece of data. It is up to the Parallel Combination unit to combine the results.

Parallel Combination

If the Extraction occurs in parallel, the resulting data objects on each process need to be combined into a single data object. The amount of data produced by the Extraction unit should be small, so the Parallel Combination unit should have no trouble storing the resulting data on a single process.

The Parallel Combination is implemented in one or more parallel vtkAlgorithm classes. We do this for all the advantages listed in Extraction.

The data coming from the Extraction unit, if parallel, will be in one of two forms. The first form has the geometry replicated on all processes with different data in the point and cell arrays. In this case, the data in the arrays must be reduced. The second form has the geometry distributed across multiple processes. Apart from ghost cells, the data is not replicated amongst processes. In this case, the data must be collected.

Reduction

In the case where the geometry is replicated, the point or cell arrays on each process need to be combined into a single array. The combination uses the following algorithm:

for each array index <math>i</math>
  <math>A_1[i] \leftarrow A_1[i] \oplus A_2[i] \oplus A_3[i] \oplus \ldots \oplus A_N[i]</math>

where <math>A_j[i]</math> is the <math>i</math>th element of the array on process <math>j</math> and <math>\oplus</math> is some mathematical binary operation. In words, it is an elementwise "sum" across processes, except that the operation used to combine numbers may be something other than plus. This operation can change based on the plot visualization type. For histograms, the operation would be an actual sum. For curve probes, the operation would be to select a single value that is not undefined.
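
The algorithm can be sketched in a few lines of plain Python, with the per-process arrays held in one list rather than spread across processes. The names `reduce_arrays`, `add`, and `first_defined`, and the use of `None` as the "undefined" marker, are assumptions made for this illustration.

```python
from functools import reduce

UNDEFINED = None  # assumption: a process that did not sample a point stores None


def reduce_arrays(arrays, op):
    """Elementwise reduction: result[i] = A_1[i] op A_2[i] op ... op A_N[i]."""
    return [reduce(op, column) for column in zip(*arrays)]


def add(a, b):
    """Operator for histograms: an actual sum of bin counts."""
    return a + b


def first_defined(a, b):
    """Operator for curve probes: keep a value that is not undefined."""
    return a if a is not UNDEFINED else b


# One array per process; all arrays have the same length.
histogram_pieces = [[1, 0, 2], [0, 3, 1], [4, 0, 0]]
probe_pieces = [[None, 2.5, None], [1.0, None, None], [None, None, 7.0]]

total_counts = reduce_arrays(histogram_pieces, add)
probe_values = reduce_arrays(probe_pieces, first_defined)
```

Both operators are associative, so the order in which processes are combined does not change the result as long as at most one process defines each probe value.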

In parallel computing, this is called a reduction, and MPI has a function specifically designed for it (MPI_Reduce). For small data, simply using this MPI function may be sufficient. It may even be sufficient for larger 2D arrays, since the size of the data being reduced is ultimately limited and the time to generate the data is not critical.

Collection

In the case where the geometry is distributed, Parallel Combination just involves bringing all the pieces to a single process. This is the basic functionality of vtkMPIMoveData. Currently, vtkMPIMoveData only works on vtkPolyData. It may be necessary to modify vtkMPIMoveData to handle other types of data.
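
As a rough illustration of collection, the sketch below merges per-process pieces of a sampled curve in plain Python. The tuple layout `(position, value, is_ghost)` and the function name `collect` are hypothetical, and the sketch makes no claim about how vtkMPIMoveData itself treats ghost cells.

```python
def collect(pieces):
    """Merge per-process pieces into one list of (position, value) samples.

    `pieces` holds one list per process; each entry is (position, value, is_ghost).
    Ghost entries duplicate data owned by another process, so they are dropped.
    """
    merged = [(x, v) for piece in pieces for (x, v, ghost) in piece if not ghost]
    merged.sort(key=lambda sample: sample[0])  # order by position along the curve
    return merged


pieces = [
    [(0.0, 10.0, False), (1.0, 11.0, False), (2.0, 12.0, True)],
    [(1.0, 11.0, True), (2.0, 12.0, False), (3.0, 13.0, False)],
]
curve = collect(pieces)
```

Unlike reduction, no operator is applied: every non-ghost sample appears exactly once in the result.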

Server to Client

If the original data resides on a server, and the plotting component resides on the client, at some point the data must be moved. The actual data transfer is fairly trivial. The difficult part is coordinating the transfers with the creation and drawing of the plot.

Plot visualizations are to be managed by a special subclass of vtkSMConsumerDisplayProxy called vtkSMPlotDisplayProxy. This display proxy takes as input any source from the pipeline. To this input vtkSMPlotDisplayProxy adds the necessary vtkAlgorithm objects to perform the Extraction and Parallel Combination. It also adds a mechanism for transferring the data from server to client.

vtkSMPlotDisplayProxy controls the logic of when it is necessary to update and transfer the data. This makes vtkSMPlotDisplayProxy very similar to other display proxies. It may also supersede (and therefore deprecate) the vtkSMXYPlotDisplayProxy. Different types of plot visualizations will probably require different specialized subclasses of vtkSMPlotDisplayProxy.
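
One plausible form of this update logic is to transfer only when the input has changed since the last transfer. The sketch below shows that idea in plain Python; the class `PlotDataCache`, its `fetch` callback, and the use of a modification counter are assumptions for illustration and are not the vtkSMPlotDisplayProxy API.

```python
class PlotDataCache:
    """Hypothetical stand-in for the decision of when to update and transfer."""

    def __init__(self, fetch):
        self._fetch = fetch        # callable doing extraction, combination, transfer
        self._data = None
        self._mtime = None         # modification counter seen at the last transfer
        self.transfers = 0         # bookkeeping so the behavior can be observed

    def get(self, source_mtime):
        """Return plot data, re-fetching only if the source changed."""
        if self._mtime is None or source_mtime > self._mtime:
            self._data = self._fetch()
            self._mtime = source_mtime
            self.transfers += 1
        return self._data


cache = PlotDataCache(fetch=lambda: [1, 2, 3])
cache.get(source_mtime=1)   # first request: data is fetched
cache.get(source_mtime=1)   # unchanged source: cached data is reused
cache.get(source_mtime=2)   # source modified: data is fetched again
```

Redrawing the plot without any change to the source therefore costs no additional server-to-client traffic in this sketch.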

Plotting Components

At the very end of the chain of components we have a set of widgets that draw plots. These widgets are stand-alone in the sense that they can interface with any application that wishes to plot some array of data. They should also share similar interfaces wherever possible so that they can be swapped for one another.
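
To make the idea of swappable components concrete, the sketch below defines two text-only "plotters" in plain Python that accept the same arguments. The function names and the text rendering are purely illustrative assumptions (and assume non-negative values); the actual components are GUI widgets.

```python
def _scale(values, width):
    """Map each value to a column count, with the largest value spanning `width`."""
    peak = max(values, default=0) or 1  # avoid dividing by zero
    return [int(width * v / peak) for v in values]


def bar_chart(values, width=20):
    """Render one row per value as a bar of '#' characters."""
    return ["#" * n for n in _scale(values, width)]


def xy_plot(values, width=20):
    """Render one row per value as a single '*' marker at the scaled column."""
    return [" " * n + "*" for n in _scale(values, width)]


def show(plotter, values):
    """The caller depends only on the shared interface, not on the plot type."""
    return "\n".join(plotter(values, width=8))


counts = [1, 2, 4]
as_bars = show(bar_chart, counts)
as_curve = show(xy_plot, counts)
```

Because both functions take an array and a width and return rows of text, the same histogram counts can be drawn either way without touching the code that produced them.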

An open question is how to properly interface the vtkSMPlotDisplayProxy described in Server to Client with these plotting components. The answer may be to have a special subclass of vtkSMRenderModuleProxy that interfaces with one of these plotting components rather than a vtkRenderWindow.