ITK/Release 4/Refactoring Level Set Framework/LevelSetGPUBostonMeeting 2010-10-14
From KitwarePublic
Revision as of 16:00, 9 December 2011 by Daviddoria (talk | contribs) (moved ITK Release 4/Refactoring Level Set Framework/LevelSetGPUBostonMeeting 2010-10-14 to ITK/Release 4/Refactoring Level Set Framework/LevelSetGPUBostonMeeting 2010-10-14)
Level Set And GPU
Meeting on Oct 14 2010, 3-4 pm Cambridge, MA
Attendees: Arnaud Gelas, Kishore Mosaliganti, Won-Ki Jeong
A level set implementation can involve the GPU at different levels. At the solver level, the implementation consists of a while loop that iterates over the domain. In each iteration, the speed terms are computed at each pixel of the domain, and the resulting update is added to the level set function to obtain a new function.
(a) while( Stopping Criterion Is Not Satisfied )
(b) {
(c)   for all level set ls_i in the level set container
(d)   {
(e)     for all pixels in the domain of ls_i
(f)     {
(g)       Compute all speed terms // iterate on the Term Container
(h)       Evaluate the updated levelset function
(i)       Compute time step from CFL Condition
(j)     }
(k)     Reinitialize to signed distance function (if requested by user)
(l)   }
(m) }
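The loop above can be sketched in plain C++ on a 1D domain. This is only an illustrative sketch, not ITK code: the names `LevelSet`, `Term`, and `Evolve` are hypothetical, the grid spacing is assumed to be 1, the stopping criterion is assumed to be "maximum iterations reached or largest change below a tolerance", and reinitialization (k) is left out. Unlike the pseudocode, the sketch computes the time step once per level set after all speeds are known, and applies the update in a second pass.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical types for illustration; these are not ITK classes.
using LevelSet = std::vector<double>;
// A term maps (level set, pixel index) to a speed contribution.
using Term = std::function<double(const LevelSet&, std::size_t)>;

// Runs the (a)-(m) loop and returns the number of iterations performed.
int Evolve(std::vector<LevelSet>& container, const std::vector<Term>& terms,
           int maxIterations, double tolerance)
{
  int iteration = 0;
  double maxChange = tolerance + 1.0;
  while (iteration < maxIterations && maxChange > tolerance)  // (a)
  {
    maxChange = 0.0;
    for (LevelSet& ls : container)                             // (c)
    {
      std::vector<double> update(ls.size(), 0.0);
      double maxSpeed = 0.0;
      for (std::size_t p = 0; p < ls.size(); ++p)              // (e)
      {
        for (const Term& term : terms)                         // (g)
        {
          update[p] += term(ls, p);
        }
        maxSpeed = std::max(maxSpeed, std::abs(update[p]));
      }
      // (i) CFL-style bound: no pixel changes by more than half a grid
      // spacing per iteration (spacing assumed to be 1).
      const double dt = (maxSpeed > 0.0) ? 0.5 / maxSpeed : 0.0;
      for (std::size_t p = 0; p < ls.size(); ++p)              // (h)
      {
        ls[p] += dt * update[p];
        maxChange = std::max(maxChange, std::abs(dt * update[p]));
      }
      // (k) Reinitialization to a signed distance function is omitted.
    }
    ++iteration;
  }
  return iteration;
}
```

With a single constant term of speed -1, each iteration lowers every pixel by 0.5, so four iterations lower the level set by 2.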
We list three scenarios in which the GPU can be used:
- GPU implementation during pixel updates at (g): The pixel neighborhood in the image and the level set information are copied to GPU memory, and the terms are evaluated in a GPU function. This requires minimal changes to the currently proposed design. Each term will have a CPU and a GPU implementation, and a term factory will call the GPU implementation. However, this option performs very poorly.
- The entire code (while loop iteration) runs on the GPU, and everything is copied into GPU memory.
  - Downsides:
    - Memory limitation of the GPU (<2 GB)
    - Code duplication: CPU and GPU
    - Note: copying between CPU and GPU memory runs at about 4 GB/s
  - Fastest solution in terms of performance!
- In the last scenario, the code nesting is different:
for all level set ls_i in the level set container
{
  for all terms in the term container
  {
    for all pixels in the domain of ls_i
    {
      Evaluate the update
      Compute time step from CFL Condition for this term
    }
  }
  Evaluate the updated levelset function
  Reinitialize to signed distance function (if requested by user)
}
In this scenario, the GPU implementation covers only the innermost for loop.
- The level set and image must be copied to the GPU in each iteration
- Second-best option in terms of GPU performance
- The good point: there is no code duplication!
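The reordered nesting of the third scenario can be sketched in plain C++ as follows. This is a CPU-only illustration under stated assumptions, not ITK or GPU code: `LevelSet`, `Term`, `ApplyTermOverDomain`, and `IterateTermMajor` are hypothetical names, the domain is 1D with grid spacing 1, and reinitialization is omitted. The function `ApplyTermOverDomain` is the innermost loop, which is the part a GPU kernel would take over. The notes only say that a time step is computed per term; combining the per-term speed bounds by summing them is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical types for illustration; these are not ITK classes.
using LevelSet = std::vector<double>;
using Term = std::function<double(const LevelSet&, std::size_t)>;

// Innermost loop: one pass over the whole domain for a single term.
// Accumulates the term's contribution into `update` and returns the
// largest absolute speed seen for this term (used for the CFL bound).
double ApplyTermOverDomain(const Term& term, const LevelSet& ls,
                           std::vector<double>& update)
{
  double maxSpeed = 0.0;
  for (std::size_t p = 0; p < ls.size(); ++p)
  {
    const double speed = term(ls, p);
    update[p] += speed;
    maxSpeed = std::max(maxSpeed, std::abs(speed));
  }
  return maxSpeed;
}

// One solver iteration with terms in the outer loop and pixels inside.
void IterateTermMajor(std::vector<LevelSet>& container,
                      const std::vector<Term>& terms)
{
  for (LevelSet& ls : container)
  {
    std::vector<double> update(ls.size(), 0.0);
    double speedBound = 0.0;
    for (const Term& term : terms)
    {
      // Assumption: per-term bounds are summed, which bounds the speed
      // of the combined update by the triangle inequality.
      speedBound += ApplyTermOverDomain(term, ls, update);
    }
    if (speedBound <= 0.0)
    {
      continue;  // All terms are zero; nothing to update.
    }
    // CFL-style time step: at most half a grid spacing per iteration.
    const double dt = 0.5 / speedBound;
    for (std::size_t p = 0; p < ls.size(); ++p)
    {
      ls[p] += dt * update[p];
    }
    // Reinitialization to a signed distance function is omitted.
  }
}
```

Because every term goes through the same whole-domain function, only that one loop needs a GPU counterpart, which is consistent with the point about avoiding code duplication.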