Multi Core Processing

This page captures initial thoughts on multi-core processing. Currently, ParaView (basically) only supports MPI-based parallel processing. This page concerns shared-memory parallelism on the latest generation of multi-core processors, which are now commonplace.

== Motivation ==

In some sense, the parallelization of ParaView is done. ParaView already supports efficient parallel processing with MPI, which is the "hard" version of distributed-memory parallel processing. So far, we have made use of multi-core processors by running multiple processes on each node, treating the cores as if they had separate memory.

This works OK, but it is not the most efficient use of these shared-memory (SMP) processors. An even bigger problem is that this method only really works when running in client/server mode. A very valid mode of operation that is becoming commonplace is using a high-end workstation with a fair number of cores (4-8) to visualize moderately sized data sets (moderate in comparison to some of the large simulations done at Sandia). Right now, the only way to take advantage of the multiple cores is to launch a server locally, and that is a waste of resources on many levels.

Thus, it is becoming vital that we support multi-core processors natively.

== Approaches ==

There are several general approaches we could take.

=== Multiblock parallelism ===

Multiblock support within ParaView is getting better and better, and we are seeing more data sets that are multiblock. Since all of these blocks are in core anyway, one easy way to take advantage of this existing split of the data is to run the processing of each block in a different thread (since many algorithms run independently on each block anyway).
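
As an illustration only, here is a minimal sketch of the idea using C++11 std::thread and the modern VTK API; the function name and the choice of vtkContourFilter are placeholders, and it assumes that concurrent, read-only access to the shared input is safe and that each thread uses its own filter instance.

<pre>
#include <thread>
#include <vector>

#include <vtkContourFilter.h>
#include <vtkDataSet.h>
#include <vtkMultiBlockDataSet.h>
#include <vtkPolyData.h>
#include <vtkSmartPointer.h>

// Contour every block of a multiblock data set, one thread per block.
void ContourBlocksThreaded(vtkMultiBlockDataSet* input,
                           vtkMultiBlockDataSet* output,
                           double isoValue)
{
  unsigned int numBlocks = input->GetNumberOfBlocks();
  std::vector<vtkSmartPointer<vtkPolyData>> results(numBlocks);

  std::vector<std::thread> workers;
  for (unsigned int i = 0; i < numBlocks; ++i)
  {
    workers.emplace_back([=, &results]() {
      vtkDataSet* block = vtkDataSet::SafeDownCast(input->GetBlock(i));
      if (!block)
      {
        return;
      }
      // One filter instance per thread, so no filter state is shared.
      vtkSmartPointer<vtkContourFilter> contour =
        vtkSmartPointer<vtkContourFilter>::New();
      contour->SetInputData(block);
      contour->SetValue(0, isoValue);
      contour->Update();
      results[i] = contour->GetOutput();
    });
  }
  for (std::thread& worker : workers)
  {
    worker.join();
  }

  // Assemble the output serially; SetBlock is not assumed to be thread safe.
  output->SetNumberOfBlocks(numBlocks);
  for (unsigned int i = 0; i < numBlocks; ++i)
  {
    output->SetBlock(i, results[i]);
  }
}
</pre>

Whether concurrent read access to the shared input (even with a separate filter instance per thread) really is safe is exactly the thread-safety concern raised in the cons below.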

'''Pros:'''

* Lots of parallelism for "free." We only implement the parallelism once in the pipeline, and simple filters are automatically parallelized.
* Encourages readers and filters to keep pieces separated, even when they belong to the same logical block (using vtkMultiPieceDataSet). For example, the Exodus reader spends a lot of time appending pieces from separate files together.

'''Cons:'''

* Only works for multiblock data and at whatever granularity the blocks happen to be stored. That does not exactly guarantee efficiency.
* The way the pipeline iterates over multiblocks is proving not to be the fastest way to handle multiblock data sets, especially when there are many blocks. Efficiency may dictate writing special filters to handle the multiblock structure anyway.
* An issue with vtkMultiPieceDataSet (especially with unstructured data) is that proper handling will require the processing of ghost cells. We will need to develop algorithms to do this, they may have to run automatically, and they may incur a large computational overhead that may or may not parallelize well.
* I do not expect very many filters to actually be thread safe. At the least, the initialization phase may be accessing and writing to member variables. In the end, we may need to do a lot of editing on filters anyway.

=== Customizing Filters ===

Although there are an awful lot of filters in VTK, only a small subset is exposed in ParaView, and an even smaller subset is commonly used. We could parallelize ParaView functionality simply by optimizing filters for threading independently, starting with the most common and compute-intensive filters and working our way down.
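
As an illustration of what this per-filter work might look like, here is a minimal, hypothetical sketch that splits the inner loop of a single operation (computing vector magnitudes for every tuple of an array) across a fixed number of std::thread workers; the function name and the specific computation are placeholders, not an existing VTK filter.

<pre>
#include <cmath>
#include <thread>
#include <vector>

#include <vtkFloatArray.h>
#include <vtkSmartPointer.h>

// Compute per-tuple magnitudes of a 3-component float array, splitting the
// tuple range into one contiguous, disjoint chunk per thread.
vtkSmartPointer<vtkFloatArray> ComputeMagnitudeThreaded(vtkFloatArray* vectors,
                                                        int numThreads)
{
  vtkIdType numTuples = vectors->GetNumberOfTuples();

  vtkSmartPointer<vtkFloatArray> magnitude =
    vtkSmartPointer<vtkFloatArray>::New();
  magnitude->SetName("Magnitude");
  magnitude->SetNumberOfComponents(1);
  magnitude->SetNumberOfTuples(numTuples);

  const float* in = vectors->GetPointer(0); // read-only input
  float* out = magnitude->GetPointer(0);    // pre-allocated output

  std::vector<std::thread> workers;
  for (int t = 0; t < numThreads; ++t)
  {
    vtkIdType begin = numTuples * t / numThreads;
    vtkIdType end = numTuples * (t + 1) / numThreads;
    workers.emplace_back([=]() {
      for (vtkIdType i = begin; i < end; ++i)
      {
        const float* v = in + 3 * i;
        out[i] = std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
      }
    });
  }
  for (std::thread& worker : workers)
  {
    worker.join();
  }
  return magnitude;
}
</pre>

Because each thread writes only to its own disjoint range of a pre-allocated output array, no locking is needed; the per-filter effort lies in finding loops like this and making them safe to split.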

'''Pros:'''

* As each filter is independently optimized, we are likely to get the most efficient form of parallelism this way.
* The granularity of the parallelism does not depend on how the data happens to be partitioned into blocks.
* Takes true advantage of the shared-memory nature of the processors.

'''Cons:'''

* More work (but probably easier work) to parallelize each filter one at a time.
* Inevitably, some filters will remain non-parallel.

== Threading Methods ==

A related but separate decision is which threading mechanism to use to implement any of the approaches above. The candidates under consideration are listed below.

=== VTK threads ===
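
VTK ships its own portable threading utility, vtkMultiThreader, which is what existing threaded VTK code (such as the imaging filters) is built on. Below is a minimal, hypothetical sketch of how a worker callback is registered and run; the WorkPayload struct and the body of WorkerMethod are placeholders.

<pre>
#include <vtkMultiThreader.h>

// Hypothetical payload shared by all workers.
struct WorkPayload
{
  // ... pointers to input/output arrays, ranges, etc. ...
};

// Each spawned thread calls this function with its own ThreadInfo.
static VTK_THREAD_RETURN_TYPE WorkerMethod(void* arg)
{
  vtkMultiThreader::ThreadInfo* info =
    static_cast<vtkMultiThreader::ThreadInfo*>(arg);
  WorkPayload* payload = static_cast<WorkPayload*>(info->UserData);
  int threadId = info->ThreadID;
  int threadCount = info->NumberOfThreads;
  // ... process the threadId-th of threadCount shares of the work in payload ...
  (void)payload;
  (void)threadId;
  (void)threadCount;
  return VTK_THREAD_RETURN_VALUE;
}

void RunWithVTKThreads(WorkPayload* payload, int numThreads)
{
  vtkMultiThreader* threader = vtkMultiThreader::New();
  threader->SetNumberOfThreads(numThreads);
  threader->SetSingleMethod(WorkerMethod, payload);
  threader->SingleMethodExecute(); // blocks until all worker threads finish
  threader->Delete();
}
</pre>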

=== OpenMP ===
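
With OpenMP, loop-level parallelism can often be expressed by adding a single pragma to an existing loop and building with an OpenMP-enabled compiler (e.g., -fopenmp for GCC). A hypothetical sketch of the same magnitude loop as above:

<pre>
#include <cmath>

// Hypothetical loop: compute magnitudes of a 3-component vector array.
// The pragma asks the OpenMP runtime to split the iterations across threads.
void ComputeMagnitudeOpenMP(const float* in, float* out, long numTuples)
{
#pragma omp parallel for
  for (long i = 0; i < numTuples; ++i)
  {
    const float* v = in + 3 * i;
    out[i] = std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
  }
}
</pre>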

=== Threading Building Blocks ===

* http://threadingbuildingblocks.org/
* http://www.amazon.com/Intel-Threading-Building-Blocks-Parallelism/dp/0596514808/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1204749345&sr=8-1
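
For comparison, a hypothetical sketch of the same loop written with TBB's parallel_for over a blocked_range (the lambda form assumes a TBB version and compiler recent enough to support it):

<pre>
#include <cmath>

#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>

// Hypothetical loop: compute magnitudes of a 3-component vector array.
// TBB decides how to split the range and schedules the chunks on its workers.
void ComputeMagnitudeTBB(const float* in, float* out, long numTuples)
{
  tbb::parallel_for(tbb::blocked_range<long>(0, numTuples),
    [=](const tbb::blocked_range<long>& range) {
      for (long i = range.begin(); i != range.end(); ++i)
      {
        const float* v = in + 3 * i;
        out[i] = std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
      }
    });
}
</pre>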