Client Side Delivery: Difference between revisions

From ParaQ Wiki
Jump to navigationJump to search
m (change fetch to Fetch)
mNo edit summary
 
(One intermediate revision by the same user not shown)
Line 1: Line 1:
The goal of client side delivery is to provide functions that can be called from a python script to perform a parallel aggregation of data from the server to the client. The bulk of the implementation occurs inside the '''vtkSMGenericViewDisplayProxy''' (GVD) class. This class is used throughout paraview3 to obtain data to display within a view. We have made modifications to the internals of the GVD and created a new class and a new python function function that automates the use of the combined whole.  
The goal of client side delivery is to provide functions that can be called from a python script to perform a parallel aggregation of data from the server to the client. The bulk of the implementation occurs inside the '''vtkSMGenericViewDisplayProxy''' (GVD) class. This class is used throughout paraview3 to obtain data to display within a view. We have made modifications to the internals of the GVD and created a new class and a new python function that automates the use of the combined whole.  


The GVD is a proxy for the pipeline shown in the figure below.  
The GVD is a proxy for the pipeline shown in the figure below.  
Line 7: Line 7:
|-
|-
|  
|  
[[Image:P3_csd.png]]
[[Image:client_side_delivery.png]]
||  
||  
The '''vtkReductionFilter''' class within the GVD does the communication needed to perform data aggregation on the server. This filter defers all data manipulation to two user provided algorithms. The '''PreGatherHelper''' is a user specified vtkAlgorithm that each node in the server executes in parallel _before_ sending their data to the root node. The PreGather algorithm is essential for scalability. The reduction filter then issues MPI calls to gather the output of each node to the root node in the server. Next it runs the '''PostGatherHelper''' algorithm on the intermediate results and leaves the result on the root node. In much of paraview3 the PreGatherHelper is null and the PostGatherHelper is a vtkPolyDataAppend filter which creates a single vtkPolyData with primitives and attributes from every node.
The '''vtkReductionFilter''' class within the GVD does the communication needed to perform data aggregation on the server. This filter defers all data manipulation to two user provided algorithms. The '''PreGatherHelper''' is a user specified vtkAlgorithm that each node in the server executes in parallel _before_ sending their data to the root node. The PreGather algorithm is essential for scalability. The reduction filter then issues MPI calls to gather the output of each node to the root node in the server. Next it runs the '''PostGatherHelper''' algorithm on the intermediate results and leaves the result on the root node. In much of paraview3 the PreGatherHelper is null and the PostGatherHelper is a vtkPolyDataAppend filter which creates a single vtkPolyData with primitives and attributes from every node.

Latest revision as of 15:42, 7 February 2007

The goal of client side delivery is to provide functions that can be called from a python script to perform a parallel aggregation of data from the server to the client. The bulk of the implementation occurs inside the vtkSMGenericViewDisplayProxy (GVD) class. This class is used throughout paraview3 to obtain data to display within a view. We have made modifications to the internals of the GVD and created a new class and a new python function that automates the use of the combined whole.

The GVD is a proxy for the pipeline shown in the figure below.

Client side delivery.png

The vtkReductionFilter class within the GVD does the communication needed to perform data aggregation on the server. This filter defers all data manipulation to two user provided algorithms. The PreGatherHelper is a user specified vtkAlgorithm that each node in the server executes in parallel _before_ sending their data to the root node. The PreGather algorithm is essential for scalability. The reduction filter then issues MPI calls to gather the output of each node to the root node in the server. Next it runs the PostGatherHelper algorithm on the intermediate results and leaves the result on the root node. In much of paraview3 the PreGatherHelper is null and the PostGatherHelper is a vtkPolyDataAppend filter which creates a single vtkPolyData with primitives and attributes from every node.

The vtkClientServerMoveData portion of the GVD pipeline exists primarily on the root node of the server and the client. Its function is to serialize the data on the server and transmit it to the client.

The vtkUpdateSupressor is used to prevent redundant pipeline updates and to communicate piece and extent information to each node in the server.



New work

vtkReductionFilter
This filter has been modified to allow a passthrough of any individual nodes data. This is useful when the data of interest has already been collected to the root node for instance. The PreGatherHelper has been added for scalability during parallel reduction operations.
vtkMinMax
This is a new filter that performs an operation on the attribute data of its input. As of this writing the operation is one of MIN, MAX or SUM. The filter iterates through whatever datasets it is given and calls a templated operate function on each value. The templated function is necessary to avoid double conversion and loss of precision on non double valued arrays. This filter can take one or more input datasets on its first input port and always produces a single vtkPolyData output that contains exactly one point and one cell. This filter is designed to be used as both the PreGatherHelper and PostGatherHelper algorithms above. In this case each node does a local reduction and then the root node does a global reduction on the intermediate results.
paraview.Fetch
This is a new function that has been added to the paraview.py module. It automates the use of the above components and encapsulates many operations within a single function call. The code for the Fetch function is included below:
def Fetch(input, arg=None):
""" 
A convenience method that moves data from the server to the client, 
optionally performing some operation on the data as it moves.
The input argument is the name of the (proxy for a) source or filter
whose output is needed on the client.

You can use fetch to do three things.
If arg is None (the default) then the output is brought to the client 
unmodified. In parallel runs the Append or AppenPolyData filters merges the
data on each processor into one dataset.
 
If arg is an integer then one particular processor's output is brought to
the client. In serial runs the arg is ignored. If you have a filter that
computes results in parallel and brings them to the root node, then set 
arg to be 0.

If arg is an algorithm, for example vtkMinMax, the algorithm will be applied
to the data to obtain some result. In parallel runs the algorithm will be
run on each processor to make intermediate results and then again on the 
root processor over all of the intermediate results to create a 
global result.
"""
 
import types
 
#create the pipeline that reduces and transmits the data
gvd = CreateProxy("displays", "GenericViewDisplay")
gvd.SetInput(input) 

if arg == None:
  print "getting appended"
  if input.GetDataInformation().GetDataClassName() == "vtkPolyData":
    print "use append poly data filter"
    gvd.SetReductionType(1)
  else:
    print "use append filter"
    gvd.SetReductionType(2)
 
elif type(arg) is types.IntType:          
 print "getting node %d" % arg
 gvd.SetReductionType(3)   
 gvd.SetPreGatherHelper(None)
 gvd.SetPostGatherHelper(None)
 gvd.SetPassThrough(arg)

else:
 print "applying operation"
 gvd.SetReductionType(3)   
 gvd.SetPreGatherHelper(arg)
 gvd.SetPostGatherHelper(arg)
 gvd.SetPassThrough(-1)

#go!
gvd.UpdateVTKObjects()
gvd.Update()   
return gvd.GetOutput()

Example use

With the above infrastucture data can be obtained easily from a client side pvpython session as in the following example.

import paraview
#connect to a running pvserver with one or more nodes
paraview.ActiveConnection = paraview.Connect("localhost", 11111)

#create a sample data set
sphere = paraview.CreateProxy("sources", "SphereSource")
elev = paraview.CreateProxy("filters", "ElevationFilter")
elev.SetInput(sphere)
elev.SetLowPoint(0,-.5,0)
elev.SetHighPoint(0,.5,0)

#choose the operation to perform in parallel
max = paraview.CreateProxy("filters", "MinMax")
max.SetOperation("MAX")
max.UpdateVTKObjects()

#fetch the entire data set
out = paraview.Fetch(elev)
print out.GetBounds() 
 
#fetch one node's portion of the data set
out = paraview.Fetch(elev, 0)
print out.GetBounds() 
 
#calculate the maximum attribute values and show the results 
out = paraview.Fetch(elev, max)
arr = out.GetPointData().GetArray("Elevation")
arr.GetValue(0)

Status

Currently there is a bug that happens when the GVD attempts to change the vtkDataSet type within its Pre and PostGather helpers. Using a vtkAppendFilter on image data will produce an unstructured grid dataset which results in warning messages and incomplete results. The same effect can be observed when computing the Max of any non polygonal dataset. We hope to address this bug shortly.

Beyond this issue, to progress further we need to better define the set of aggregation operations that we need. The min/max/sum operation is admittedly very simple and what makes it so is that it is geometry independent and simply throws away all of the input geometry it is given. Specific use cases that involve geometry aggregation are needed.