By the way, did you make sure to apply D3? disk_out_ref.ex2 is not partitioned so by default it would be loaded entirely onto MPI rank 0.<br><br><div class="gmail_quote">On Mon, Jun 4, 2012 at 5:21 AM, Stephan Rogge <span dir="ltr"><<a href="mailto:Stephan.Rogge@tu-cottbus.de" target="_blank">Stephan.Rogge@tu-cottbus.de</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hello Leo,<br>
<br>
ok, I took the "disk_out_ref.ex2" example data set and did some time<br>
measurements. Remember, my machine has 4 cores + Hyper-Threading.<br>
<br>
My first observation is that PV seems to have a problem with distributing<br>
the data when the Multi-Core option (GUI) is enabled. When PV is started<br>
with the builtin Multi-Core option, I was not able to apply a stream tracer with more<br>
than 1000 seed points (PV freezes and never comes back). However, when<br>
the pvserver processes have been started manually, I was able to set up to 100,000<br>
seed points. Is this a bug?<br>
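For reference, the manual launch used here (instead of the GUI's Multi-Core option) can be sketched as follows; the rank count and port are examples and depend on your installation:

```shell
# Start 4 pvserver ranks by hand; 11111 is pvserver's default port.
mpiexec -np 4 pvserver --server-port=11111
# Then connect the ParaView client to localhost:11111 via File > Connect.
```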
<br>
Now let's have a look at the scaling performance. As you suggested, I've<br>
used the D3 filter to distribute the data across the processes. The stream<br>
tracer execution time for 10,000 seed points:<br>
<br>
## Builtin: 10.063 seconds<br>
## 1 MPI process (no D3): 10.162 seconds<br>
## 4 MPI processes: 15.615 seconds<br>
## 8 MPI processes: 14.103 seconds<br>
<br>
and 100,000 seed points:<br>
<br>
## Builtin: 100.603 seconds<br>
## 1 MPI process (no D3): 100.967 seconds<br>
## 4 MPI processes: 168.1 seconds<br>
## 8 MPI processes: 171.325 seconds<br>
<br>
I cannot see any positive scaling behavior here. Maybe this example is not<br>
appropriate for scaling measurements?<br>
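To put numbers on "no positive scaling": speedup and parallel efficiency computed from the timings above (a small arithmetic helper, not ParaView code):

```python
# Parallel speedup/efficiency from the measured stream-tracer timings.
# speedup = t_serial / t_parallel; efficiency = speedup / n_processes.
def speedup(t_serial, t_parallel):
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_procs):
    return speedup(t_serial, t_parallel) / n_procs

# 10,000 seed points: builtin (10.063 s) vs. 4 MPI processes (15.615 s)
s4 = speedup(10.063, 15.615)        # ~0.64: a slowdown, not a speedup
e4 = efficiency(10.063, 15.615, 4)  # ~0.16: far below the ideal 1.0
print(f"speedup={s4:.2f}, efficiency={e4:.2f}")
```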
<br>
One more thing: I've visualized the vtkProcessId and saw that the whole<br>
vector field is partitioned. I thought that each streamline is integrated<br>
in its own process, but it seems that this is not the case. This could<br>
explain my scaling issues: for small vector fields, the synchronization<br>
overhead becomes too large and decreases the overall performance.<br>
<br>
My suggestion is to have a parallel StreamTracer which is built for a single<br>
machine with several threads. Could it be worth randomly distributing the<br>
seeds over all available (local) processes? Of course, each process would have<br>
access to the whole vector field.<br>
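The suggested seed-level parallelism could be sketched roughly like this (a toy illustration, not ParaView/VTK code: a pure-Python Euler integrator in an analytic 2D field, with the seeds farmed out to a thread pool that shares the whole field; note that pure-Python threads gain nothing from the GIL's perspective, so a real implementation would run the integration kernel in native code):

```python
from concurrent.futures import ThreadPoolExecutor
import random

# Toy analytic vector field (a rigid rotation); every worker can
# evaluate it anywhere, i.e. the whole field is shared as suggested.
def field(x, y):
    return -y, x

def trace(seed, dt=0.01, steps=500):
    """Integrate one streamline with explicit Euler from a single seed."""
    x, y = seed
    line = [(x, y)]
    for _ in range(steps):
        vx, vy = field(x, y)
        x, y = x + dt * vx, y + dt * vy
        line.append((x, y))
    return line

seeds = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(64)]

# Each streamline is independent, so seeds can simply be distributed
# over local threads; no inter-process handoff is needed.
with ThreadPoolExecutor(max_workers=8) as pool:
    lines = list(pool.map(trace, seeds))

print(len(lines), len(lines[0]))  # 64 streamlines, 501 points each
```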
<br>
Cheers,<br>
<div class="im">Stephan<br>
<br>
<br>
<br>
From: Yuanxin Liu [mailto:<a href="mailto:leo.liu@kitware.com">leo.liu@kitware.com</a>]<br>
</div>Sent: Friday, June 1, 2012 16:13<br>
<div class="HOEnZb"><div class="h5">To: Stephan Rogge<br>
Cc: Andy Bauer; <a href="mailto:paraview@paraview.org">paraview@paraview.org</a><br>
Subject: Re: [Paraview] Parallel Streamtracer<br>
<br>
Hi, Stephan,<br>
I did measure the performance at some point and was able to get fairly<br>
decent speedup with more processors. So I am surprised you are seeing huge<br>
latency.<br>
<br>
Of course, the performance is sensitive to the input. It is also<br>
sensitive to how readers distribute data. So, one thing you might want to<br>
try is to attach the "D3" filter to the reader.<br>
<br>
If that doesn't help, I will be happy to get your data and take a look.<br>
<br>
Leo<br>
<br>
On Fri, Jun 1, 2012 at 1:54 AM, Stephan Rogge <<a href="mailto:Stephan.Rogge@tu-cottbus.de">Stephan.Rogge@tu-cottbus.de</a>><br>
wrote:<br>
Leo,<br>
<br>
As I mentioned in my initial post of this thread: I used the up-to-date<br>
master branch of ParaView, which means I have already used your<br>
implementation.<br>
<br>
I can imagine that parallelizing this algorithm can be very tough, and I can<br>
see that distributing the calculation over 8 processes does not lead to nice<br>
scaling.<br>
<br>
But I don't understand the huge amount of latency when using the<br>
StreamTracer in Cave mode with two viewports and two pvserver processes<br>
on the same machine (with an extra machine for the client). I guess the tracer<br>
filter is applied to each viewport separately? This would be ok as long as<br>
both filter executions run in parallel, and I doubt that this is the case.<br>
<br>
Can you help to clarify my problem?<br>
<br>
Regards,<br>
Stephan<br>
<br>
<br>
From: Yuanxin Liu [mailto:<a href="mailto:leo.liu@kitware.com">leo.liu@kitware.com</a>]<br>
Sent: Thursday, May 31, 2012 21:33<br>
To: Stephan Rogge<br>
Cc: Andy Bauer; <a href="mailto:paraview@paraview.org">paraview@paraview.org</a><br>
Subject: Re: [Paraview] Parallel Streamtracer<br>
<br>
It is in the current VTK and ParaView master. The class is<br>
vtkPStreamTracer. <br>
<br>
Leo<br>
On Thu, May 31, 2012 at 3:31 PM, Stephan Rogge <<a href="mailto:stephan.rogge@tu-cottbus.de">stephan.rogge@tu-cottbus.de</a>><br>
wrote:<br>
Hi, Andy and Leo,<br>
<br>
thanks for your replies.<br>
<br>
Is it possible to get this new implementation? I would like to give it a try.<br>
<br>
Regards,<br>
Stephan<br>
<br>
On May 31, 2012, at 17:48, Yuanxin Liu <<a href="mailto:leo.liu@kitware.com">leo.liu@kitware.com</a>> wrote:<br>
Hi, Stephan,<br>
The previous implementation only has serial performance: It traces the<br>
streamlines one at a time and never starts a new streamline until the<br>
previous one finishes. With communication overhead, it is not surprising it<br>
got slower.<br>
<br>
My new implementation lets the processes work on different<br>
streamlines simultaneously and should scale much better.<br>
<br>
Leo<br>
<br>
On Thu, May 31, 2012 at 11:27 AM, Andy Bauer <<a href="mailto:andy.bauer@kitware.com">andy.bauer@kitware.com</a>> wrote:<br>
Hi Stephan,<br>
<br>
The parallel stream tracer uses the partitioning of the grid to determine<br>
which process does the integration. When a streamline exits the subdomain<br>
of a process, there is a search to see if it enters a subdomain assigned to<br>
any other process before figuring out whether it has left the entire<br>
domain.<br>
<br>
Leo, copied here, has been improving the streamline implementation inside<br>
VTK, so you may want to get his newer version. It is a pretty tough algorithm<br>
to parallelize efficiently without making any assumptions on the flow or<br>
partitioning.<br>
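The partition-driven handoff described above can be illustrated with a toy model (not VTK code): a 1D domain split into equal subdomains, one per hypothetical process, where each traced position is looked up to find its owning rank or declared outside the whole domain.

```python
# Toy sketch of partition-driven streamline ownership: the domain
# [0, 1) is split into equal subdomains, one per "process".
def owner(x, n_procs):
    """Return the rank whose subdomain contains x, or None if x has
    left the entire domain."""
    if not (0.0 <= x < 1.0):
        return None               # left the whole domain
    return int(x * n_procs)       # search: which subdomain contains x?

n = 4
# A streamline starting in rank 0's subdomain and drifting right:
positions = [0.1, 0.2, 0.3, 0.6, 0.9, 1.1]
ranks = [owner(x, n) for x in positions]
print(ranks)  # [0, 0, 1, 2, 3, None]: handed off rank to rank, then exits
```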
<br>
Andy<br>
<br>
On Thu, May 31, 2012 at 4:16 AM, Stephan Rogge <<a href="mailto:Stephan.Rogge@tu-cottbus.de">Stephan.Rogge@tu-cottbus.de</a>><br>
wrote:<br>
Hello,<br>
<br>
I have a question related to the parallelism of the stream tracer: as I<br>
understand the code, each line integration (trace) is processed in its<br>
own MPI process. Right?<br>
<br>
To test the scalability of the stream tracer I've loaded a structured<br>
(curvilinear) grid and applied the filter with a seed resolution of 1500, then<br>
checked the timings in single-thread and multi-thread (Multi Core enabled in the PV<br>
GUI) situations.<br>
<br>
I was really surprised that multi-core slows down the execution time to 4<br>
seconds, while single-core takes only 1.2 seconds. Data migration cannot be<br>
the explanation for that behavior (0.5 seconds). What is the problem here?<br>
<br>
Please see attached some statistics...<br>
<br>
Data:<br>
* Structured (Curvilinear) Grid<br>
* 244030 Cells<br>
* 37 MB Memory<br>
<br>
System:<br>
* Intel i7-2600K (4 Cores + HT = 8 Threads)<br>
* 16 GB Ram<br>
* Windows 7 64 Bit<br>
* ParaView (master-branch, 64 bit compilation)<br>
<br>
#################################<br>
Single Thread (Seed resolution 1500):<br>
#################################<br>
<br>
Local Process<br>
Still Render, 0.014 seconds<br>
RenderView::Update, 1.222 seconds<br>
vtkPVView::Update, 1.222 seconds<br>
Execute vtkStreamTracer id: 2184, 1.214 seconds<br>
Still Render, 0.015 seconds<br>
<br>
#################################<br>
Eight Threads (Seed resolution 1500):<br>
#################################<br>
<br>
Local Process<br>
Still Render, 0.029 seconds<br>
RenderView::Update, 4.134 seconds<br>
vtkSMDataDeliveryManager: Deliver Geome, 0.619 seconds<br>
FullRes Data Migration, 0.619 seconds<br>
Still Render, 0.042 seconds<br>
OpenGL Dev Render, 0.01 seconds<br>
<br>
<br>
Render Server, Process 0<br>
RenderView::Update, 4.134 seconds<br>
vtkPVView::Update, 4.132 seconds<br>
Execute vtkStreamTracer id: 2193, 3.941 seconds<br>
FullRes Data Migration, 0.567 seconds<br>
Dataserver gathering to 0, 0.318 seconds<br>
Dataserver sending to client, 0.243 seconds<br>
<br>
Render Server, Process 1<br>
Execute vtkStreamTracer id: 2193, 3.939 seconds<br>
<br>
Render Server, Process 2<br>
Execute vtkStreamTracer id: 2193, 3.938 seconds<br>
<br>
Render Server, Process 3<br>
Execute vtkStreamTracer id: 2193, 4.12 seconds<br>
<br>
Render Server, Process 4<br>
Execute vtkStreamTracer id: 2193, 3.938 seconds<br>
<br>
Render Server, Process 5<br>
Execute vtkStreamTracer id: 2193, 3.939 seconds<br>
<br>
Render Server, Process 6<br>
Execute vtkStreamTracer id: 2193, 3.938 seconds<br>
<br>
Render Server, Process 7<br>
Execute vtkStreamTracer id: 2193, 3.939 seconds<br>
<br>
Cheers,<br>
Stephan<br>
<br>
<br>
_______________________________________________<br>
Powered by <a href="http://www.kitware.com" target="_blank">www.kitware.com</a><br>
<br>
Visit other Kitware open-source projects at<br>
<a href="http://www.kitware.com/opensource/opensource.html" target="_blank">http://www.kitware.com/opensource/opensource.html</a><br>
<br>
Please keep messages on-topic and check the ParaView Wiki at:<br>
<a href="http://paraview.org/Wiki/ParaView" target="_blank">http://paraview.org/Wiki/ParaView</a><br>
<br>
Follow this link to subscribe/unsubscribe:<br>
<a href="http://www.paraview.org/mailman/listinfo/paraview" target="_blank">http://www.paraview.org/mailman/listinfo/paraview</a><br>
<br>
</div></div></blockquote></div><br>