Parallel SCC Detection, Jason Xu, Brendan Ney

Summary: We will implement parallel algorithms for strongly connected component (SCC) detection in graphs on multi-core processors. The implementations will be compared on performance characteristics (scalability, speedup) across different graph workloads (sparsity, structure).

Background: Tarjan's classical sequential algorithm detects SCCs with a single depth-first search (DFS). DFS is inherently recursive and sequential, and good ways to parallelize it are hard to find. Many parallel approaches instead use divide and conquer, repeatedly splitting the graph into smaller parts until each part is strongly connected. The challenge is then to perform this division efficiently and to recombine the results.
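The divide-and-conquer idea (the forward-backward, or FW-BW, scheme described in the Sandia paper) can be sketched as follows. This is a sequential sketch with names of our own choosing, not the paper's exact code; the key point is that the three recursive subproblems are disjoint, so they are the natural parallel tasks:

```cpp
#include <vector>

// Sketch of the FW-BW divide-and-conquer SCC scheme. Pick a pivot, compute
// its forward-reachable set F and backward-reachable set B; F ∩ B is one SCC,
// and the three leftover partitions (F\B, B\F, the rest) can be solved
// independently -- in a parallel version, each becomes a separate task.

using Graph = std::vector<std::vector<int>>;

// Reachability from pivot via an explicit stack, restricted to vertices
// where active[v] is true.
static std::vector<char> reach(const Graph& g, int pivot,
                               const std::vector<char>& active) {
    std::vector<char> seen(g.size(), 0);
    std::vector<int> stack{pivot};
    seen[pivot] = 1;
    while (!stack.empty()) {
        int u = stack.back(); stack.pop_back();
        for (int w : g[u])
            if (active[w] && !seen[w]) { seen[w] = 1; stack.push_back(w); }
    }
    return seen;
}

// g is the graph, gr its reverse (edges flipped); active marks the current
// subproblem's vertices; each discovered SCC is appended to sccs.
void fwbw(const Graph& g, const Graph& gr, std::vector<char> active,
          std::vector<std::vector<int>>& sccs) {
    int pivot = -1;
    for (int v = 0; v < (int)g.size(); ++v)
        if (active[v]) { pivot = v; break; }
    if (pivot < 0) return;                  // subproblem is empty
    auto F = reach(g, pivot, active);       // forward-reachable set
    auto B = reach(gr, pivot, active);      // backward-reachable set
    std::vector<int> scc;
    std::vector<char> inF(g.size(), 0), inB(g.size(), 0), rest(g.size(), 0);
    for (int v = 0; v < (int)g.size(); ++v) {
        if (!active[v]) continue;
        if (F[v] && B[v]) scc.push_back(v); // F ∩ B is the pivot's SCC
        else if (F[v]) inF[v] = 1;
        else if (B[v]) inB[v] = 1;
        else rest[v] = 1;
    }
    sccs.push_back(scc);
    // No SCC spans two of these partitions, so the recursive calls are
    // independent -- this is where parallel tasks would be spawned.
    fwbw(g, gr, inF, sccs);
    fwbw(g, gr, inB, sccs);
    fwbw(g, gr, rest, sccs);
}
```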

Challenge: The classical sequential approach to SCC detection uses DFS, which is inherently difficult to parallelize. We hope to learn different approaches to transforming a recursive sequential algorithm into an iterative parallel one, and to implement them efficiently for graphs. The workload consists of a directed graph with arbitrary degrees. The edge relationships of neighboring vertices give some locality, but computation is not very predictable because the topology of the graph is unknown in advance.
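As a small baseline example of the recursive-to-iterative transformation mentioned above, recursive DFS can be rewritten with an explicit stack (a sketch, independent of any particular SCC algorithm):

```cpp
#include <vector>

// Recursive DFS rewritten with an explicit stack. The visitation order can
// differ from the recursive version, but the set of reached vertices is
// identical -- which is all that reachability-based SCC schemes need.
std::vector<int> dfs_order(const std::vector<std::vector<int>>& g, int src) {
    std::vector<char> visited(g.size(), 0);
    std::vector<int> order, stack{src};
    while (!stack.empty()) {
        int u = stack.back(); stack.pop_back();
        if (visited[u]) continue;   // may be pushed more than once
        visited[u] = 1;
        order.push_back(u);
        for (int w : g[u])
            if (!visited[w]) stack.push_back(w);
    }
    return order;
}
```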

Resources: We will draw on published papers on parallel algorithms for SCC detection. A thesis surveying several such algorithms is available here: https://www.cs.vu.nl/~wanf/theses/matei.pdf. Divide-and-conquer approaches are described in this paper: https://cfwebprod.sandia.gov/cfdocs/CompResearch/docs/irreg00.pdf

Goals: We will implement multiple parallel SCC detection algorithms and compare their performance. We plan to implement 2-3 high-performing algorithms that provide a reasonable speedup over sequential implementations; as a stretch goal, we hope to implement 4-5. We will present our results in the form of speedup graphs and algorithm diagrams.
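The speedup numbers behind those graphs could be gathered with a minimal timing harness along these lines (a sketch; the measured function is a placeholder for whichever implementation is under test):

```cpp
#include <chrono>
#include <functional>

// Minimal timing harness sketch for producing speedup measurements.
// run_scc stands in for any sequential or parallel implementation.
double time_ms(const std::function<void()>& run_scc) {
    auto t0 = std::chrono::steady_clock::now();
    run_scc();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

// Speedup of a parallel run relative to the sequential baseline.
double speedup(double sequential_ms, double parallel_ms) {
    return sequential_ms / parallel_ms;
}
```

In practice each configuration would be run several times and the best (or median) time kept, to smooth out scheduling noise.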

Platform: We will program on CPUs in C++. We choose CPUs over GPUs because these algorithms' divergent execution does not match a GPU's highly data-parallel execution model. We choose C++ for its good balance between low- and high-level programming capabilities.

Schedule: Week 1: Study the problem and existing literature, choose an algorithm, and work out implementation details. Week 2: Implement the first algorithm and optimize it for speedup. Week 3: Implement and optimize a second algorithm. Week 4: Implement and optimize a third algorithm. Week 5: Benchmark the implementations against different parameters and workloads.