Parallelizing Borůvka's Algorithm

(Checkpoint)

By Kathleen Fuh (kfuh) and Shreya Vemuri (shreyav)

Work Completed

So far we have:

Implemented a sequential version of Borůvka’s algorithm for finding an MST. In this process, we were able to determine the data structure for our graph that would best lend itself to parallelizing
Used OpenMP primitives in order to parallelize our CPU version of Borůvka’s algorithm (we’re working on making it a better parallel version)
Written a lock-free union-find structure and find operation such that we do not need to use critical regions in our parallel version

Challenges

We had some trouble trying to figure out how to parallelize finding the minimum edges incident on each vertex, specifically after graph contraction. In our implementation we do not update the graph, so finding the minimum edge for a connected component would require critical regions as we combine the results from multiple vertices in the same component. We plan to try using star contraction or parallelize over groups such that we can do multiple minimum searches at once.

We also had trouble figuring out how to create larger graphs from provided text files as was done in Assignment 3. Some of that code was not immediately obvious to us and also we would like to use an undirected graph, not just a directed graph. We decided that for now we will not be using those graphs and will be working on generating our own that can better fit our graph representation.

Creating a lock-free union find structure took us some time to research, so that is why we are behind schedule a bit. This lock-free union find implementation does not use union-by-rank, so we may try to add that optimization if time permits. We feel that we may try image segmentation in the last week if time permits because it would be cool to apply our algorithm to that.

One concern we currently have is that Borůvka’s algorithm doesn’t seem to lend itself well to a CUDA version, and we’re not sure how we will divide the work into blocks. We’re still in the process of researching ideas for a CUDA implementation.

Parallelism Competition

At the competition we still plan to display graphs depicting the performance of our sequential algorithm versus the performance of our implementations on a CPU versus on a GPU. We also would like to show graphs comparing performance on a varied number of cores for a CPU in order to determine what the optimal number of cores is on the CPU for some given graph. If time permits in our work schedule, it would also be nice to demo our image segmentation on some chosen pictures

Revised Schedule

Week	Plan
April 4 - April 10	Implement sequential version of Borůvka - DONE (K and S)
April 11 - April 17	Parallel implementation of Borůvka on CPU using OpenMP - DONE (K and S)
April 18 - April 21	Create larger random graphs to test implementations (K and S)
April 22 - April 26	Make improvements to OpenMP Parallel Implementation (K and S)
April 27 - May 2	Parallel implementation of Borůvka on GPU using CUDA (K and S)
May 2 - May 8	Try image segmentation + analysis of results/buffer window/presentation preparation (K and S)