Client Side Delivery

From ParaQ Wiki
The goal of client side delivery is to provide functions that can be called from a python script to perform a parallel aggregation of data from the server to the client. The bulk of the implementation occurs inside the '''vtkSMGenericViewDisplayProxy''' (GVD) class. This class is used throughout paraview3 to obtain data to display within a view. We have made modifications to the internals of the GVD and created a new class and a new python function that automate the use of the combined whole.


== Design ==
The GVD is a proxy for the pipeline shown in the figure below.  
{| border="0"
|- bgcolor="#abcdef"
|-
|
[[Image:client_side_delivery.png]]
||
The '''vtkReductionFilter''' class within the GVD does the communication needed to perform data aggregation on the server. This filter defers all data manipulation to two user-provided algorithms. The '''PreGatherHelper''' is a user-specified vtkAlgorithm that each node in the server executes in parallel ''before'' sending its data to the root node. The PreGather algorithm is essential for scalability. The reduction filter then issues MPI calls to gather the output of each node to the root node of the server. Next it runs the '''PostGatherHelper''' algorithm on the intermediate results and leaves the result on the root node. In much of paraview3 the PreGatherHelper is null and the PostGatherHelper is a vtkAppendPolyData filter, which creates a single vtkPolyData with the primitives and attributes from every node.


The '''vtkClientServerMoveData''' portion of the GVD pipeline exists primarily on the root node of the server and the client. Its function is to serialize the data on the server and transmit it to the client.


The '''vtkUpdateSuppressor''' is used to prevent redundant pipeline updates and to communicate piece and extent information to each node in the server.
|}
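The gather flow described above can be sketched in plain Python, simulating the server's nodes with a list of per-node datasets instead of MPI ranks. This is an illustration only; the function and argument names below are hypothetical and not part of the ParaView API.

```python
# Sketch of the reduction pattern described above. The server's nodes
# are simulated by a list of per-node datasets; a real run would use
# MPI to gather the intermediates to the root node.

def reduce_across_nodes(per_node_data, pre_gather, post_gather):
    # 1. Every node runs the PreGatherHelper locally, in parallel,
    #    shrinking its data before any communication happens.
    intermediates = [pre_gather(data) for data in per_node_data]
    # 2. An MPI gather moves all intermediates to the root node; here
    #    the list itself stands in for the gathered buffer.
    # 3. The root node runs the PostGatherHelper over the gathered
    #    intermediates and keeps the final result.
    return post_gather(intermediates)

# A global maximum: the local and global reductions are the same
# operation, exactly as when vtkMinMax serves as both helpers.
nodes = [[3, 1, 4], [1, 5, 9], [2, 6, 5]]
print(reduce_across_nodes(nodes, max, max))  # 9
```

Because the pre-gather step shrinks each node's data before communication, the amount of data moved over MPI stays small regardless of how large each node's input is.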
 
 
 
 
== New work ==
 
;vtkReductionFilter: This filter has been modified to allow passthrough of any individual node's data. This is useful, for instance, when the data of interest has already been collected to the root node. The PreGatherHelper has been added for scalability during parallel reduction operations.
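The passthrough behavior can be sketched in plain Python as follows (names are hypothetical; the real filter selects a node through its PassThrough setting, with -1 meaning normal reduction):

```python
# Sketch of the passthrough modification: for pass_through >= 0 the
# filter forwards that one node's data unmodified instead of reducing.
# Names are hypothetical; the real class exposes a PassThrough setting.

def gather_with_pass_through(per_node_data, pass_through=-1):
    if pass_through >= 0:
        # forward the chosen node's data without any reduction
        return per_node_data[pass_through]
    # default behavior: gather and append every node's data
    combined = []
    for data in per_node_data:
        combined.extend(data)
    return combined

nodes = [[1, 2], [3, 4], [5, 6]]
print(gather_with_pass_through(nodes, pass_through=0))  # [1, 2]
print(gather_with_pass_through(nodes))  # [1, 2, 3, 4, 5, 6]
```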
 
;vtkMinMax: This is a new filter that performs an operation on the attribute data of its input. As of this writing the operation is one of MIN, MAX or SUM. The filter iterates through whatever datasets it is given and calls a templated operate function on each value. The templated function is necessary to avoid double conversion and loss of precision on non-double-valued arrays. This filter can take one or more input datasets on its first input port and always produces a single vtkPolyData output that contains exactly one point and one cell. This filter is designed to be used as both the PreGatherHelper and the PostGatherHelper above. In this case each node does a local reduction and then the root node does a global reduction on the intermediate results.
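The behavior can be illustrated with a small pure-Python analogue that collapses each named attribute array to a single tuple. This is a sketch only, with hypothetical names; the real vtkMinMax is a C++ filter operating on vtkDataSet attribute arrays.

```python
# Pure-Python analogue of the vtkMinMax behavior: collapse each named
# attribute array of every input dataset into a single value using
# MIN, MAX or SUM. A sketch only; the real filter is C++ and its
# templated operate function preserves precision per array type.

OPERATIONS = {"MIN": min, "MAX": max, "SUM": sum}

def min_max(datasets, operation="MAX"):
    op = OPERATIONS[operation]
    merged = {}
    # pool the values of each named array across all input datasets
    for dataset in datasets:
        for name, values in dataset.items():
            merged.setdefault(name, []).extend(values)
    # one tuple per array, like the filter's one-point, one-cell output
    return dict((name, [op(values)]) for name, values in merged.items())

# Used as both the pre- and post-gather step: each node reduces its
# own arrays first, then the root reduces the per-node results.
node0 = {"Elevation": [0.1, 0.4]}
node1 = {"Elevation": [0.3, 0.9]}
local = [min_max([node0]), min_max([node1])]
print(min_max(local))  # {'Elevation': [0.9]}
```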
 
;paraview.Fetch: This is a new function that has been added to the paraview.py module. It automates the use of the above components and encapsulates many operations within a single function call. The code for the Fetch function is included below:
 
 def Fetch(input, arg=None):
   """
   A convenience method that moves data from the server to the client,
   optionally performing some operation on the data as it moves.
   The input argument is the name of the (proxy for a) source or filter
   whose output is needed on the client.
   
   You can use fetch to do three things.
   
   If arg is None (the default) then the output is brought to the client
   unmodified. In parallel runs the Append or AppendPolyData filter merges
   the data on each processor into one dataset.
   
   If arg is an integer then one particular processor's output is brought
   to the client. In serial runs the arg is ignored. If you have a filter
   that computes results in parallel and brings them to the root node,
   then set arg to be 0.
   
   If arg is an algorithm, for example vtkMinMax, the algorithm will be
   applied to the data to obtain some result. In parallel runs the
   algorithm will be run on each processor to make intermediate results
   and then again on the root processor over all of the intermediate
   results to create a global result.
   """
   import types
   
   #create the pipeline that reduces and transmits the data
   gvd = CreateProxy("displays", "GenericViewDisplay")
   gvd.SetInput(input)
   
   if arg is None:
     print "getting appended"
     if input.GetDataInformation().GetDataClassName() == "vtkPolyData":
       print "use append poly data filter"
       gvd.SetReductionType(1)
     else:
       print "use append filter"
       gvd.SetReductionType(2)
   elif type(arg) is types.IntType:
     print "getting node %d" % arg
     gvd.SetReductionType(3)
     gvd.SetPreGatherHelper(None)
     gvd.SetPostGatherHelper(None)
     gvd.SetPassThrough(arg)
   else:
     print "applying operation"
     gvd.SetReductionType(3)
     gvd.SetPreGatherHelper(arg)
     gvd.SetPostGatherHelper(arg)
     gvd.SetPassThrough(-1)
   
   #go!
   gvd.UpdateVTKObjects()
   gvd.Update()
   return gvd.GetOutput()


== Example use ==
With the above infrastructure, data can be obtained easily from a client-side pvpython session, as in the following example.


  import paraview
  #connect to a running pvserver with one or more nodes
  paraview.ActiveConnection = paraview.Connect("localhost", 11111)
  
  #create a sample data set
  sphere = paraview.CreateProxy("sources", "SphereSource")
  elev = paraview.CreateProxy("filters", "ElevationFilter")
  elev.SetInput(sphere)
  elev.SetLowPoint(0,-.5,0)
  elev.SetHighPoint(0,.5,0)
  
  #choose the operation to perform in parallel
  max = paraview.CreateProxy("filters", "MinMax")
  max.SetOperation("MAX")
  max.UpdateVTKObjects()
  
  #fetch the entire data set
  out = paraview.Fetch(elev)
  print out.GetBounds()
  
  #fetch one node's portion of the data set
  out = paraview.Fetch(elev, 0)
  print out.GetBounds()
  
  #calculate the maximum attribute values and show the results
  out = paraview.Fetch(elev, max)
  arr = out.GetPointData().GetArray("Elevation")
  arr.GetValue(0)

== Status ==
Currently there is a bug when the GVD attempts to change the vtkDataSet type within its Pre and PostGather helpers. Using a vtkAppendFilter on image data produces an unstructured grid dataset, which results in warning messages and incomplete results. The same effect can be observed when computing the Max of any non-polygonal dataset. We hope to address this bug shortly.
Beyond this issue, to progress further we need to better define the set of aggregation operations required. The min/max/sum operation is admittedly very simple; what makes it so is that it is geometry independent and simply throws away all of the input geometry it is given. Specific use cases that involve geometry aggregation are needed.

Latest revision as of 15:42, 7 February 2007
