
Performance of Parallel Pipeline - How to Improve?


Hello there!

I've parallelized a list-chasing toy benchmark using TBB's parallel_pipeline and got a surprising result. The program is parallelized as a two-stage pipeline in which both stages do the same computation. The only data communicated between the stages is a pointer, and the second stage is parallel.

What surprised me was that, with this partitioning, the best execution time I got with TBB was ~5 s (with only one token), whereas the same partitioning with a "hand-written" cache-friendly queue [1,2] runs in only ~1.5 s. The serial version of the program runs in ~3 s.

I'm not very familiar with TBB (I started using it a few days ago), so I may be doing something wrong. So my question is: am I doing something wrong here? How can I improve the performance of this example?

It seems that the "buffer" used to communicate between stages is not well optimized: the best execution time I got was with only one token. It is possible that false sharing is causing the slowdown. Is there a way to make the pipeline use a more cache-friendly "buffer"/queue, or even a custom queue implementation?
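
To make "cache-friendly queue" concrete, here is a minimal sketch of a single-producer/single-consumer ring buffer whose two indices sit on separate cache lines, so the producer and the consumer do not invalidate each other's line on every update. It is only an illustration, not the code from [1], [2], or the pastebin. The names SpscQueue, Node, and sum_through_queue are hypothetical, and the 64-byte cache line is an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical names throughout; not taken from [1], [2], or the pastebin.
constexpr std::size_t kCacheLine = 64;  // assumed cache-line size in bytes

template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert((Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
public:
    // Producer thread only. Returns false when the queue is full.
    bool try_push(T value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;
        slots_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer thread only. Returns false when the queue is empty.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;
        out = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return true;
    }

private:
    // Each index gets its own cache line to avoid false sharing.
    alignas(kCacheLine) std::atomic<std::size_t> head_{0};  // written by producer
    alignas(kCacheLine) std::atomic<std::size_t> tail_{0};  // written by consumer
    alignas(kCacheLine) T slots_[Capacity]{};
};

struct Node {
    long value;
    Node* next;
};

// Stage 1 chases the list and passes each node pointer to stage 2,
// which sums the values. nullptr marks the end of the stream.
long sum_through_queue(Node* list) {
    SpscQueue<Node*, 1024> q;
    long total = 0;

    std::thread consumer([&] {
        for (;;) {
            Node* n = nullptr;
            while (!q.try_pop(n)) std::this_thread::yield();
            if (n == nullptr) break;
            total += n->value;
        }
    });

    for (Node* n = list; n != nullptr; n = n->next)
        while (!q.try_push(n)) std::this_thread::yield();
    while (!q.try_push(nullptr)) std::this_thread::yield();

    consumer.join();

    Node* leftover = nullptr;
    assert(!q.try_pop(leftover));  // the consumer drained everything
    return total;
}
```

Unlike this sketch, the queues in [1,2] go further in reducing shared-index traffic, and this version has only one consumer, whereas the second stage in my TBB version is parallel.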

The source code is here: http://pastebin.com/bwB9Yfzq

[1] http://www.cse.cuhk.edu.hk/~pclee/www/pubs/ancs09poster.pdf
[2] http://ce.colorado.edu/Publications/pact07_ff.pdf

 
