A work-efficient algorithm for parallel unordered depth-first search

This project concerns high-performance parallel graph traversal (or, graph search) over directed, in-memory graphs using shared-memory, multiprocessor machines (aka multicore). The main research question addressed by this project is the following:

In our Supercomputing’15 paper¹, we answer the main question from above in the affirmative: we present a new algorithm that performs a DFS-like traversal over a given graph. We prove that the algorithm is strongly work efficient, meaning that, in addition to having the linear upper bound on the total work performed, the algorithm effectively amortizes overheads, such as the cycles spent balancing traversal workload between cores. Put a different way, we prove that, for any input, the algorithm confines the amount of load-balancing work to a small fraction of the total (linear) running time. Then, we prove that the algorithm achieves a high degree of parallelism, that is, near optimal under the constraints of the graph. Finally, we present an experimental evaluation showing that, thanks to reducing overheads and increasing parallelism, our implementation of the algorithm outperforms two other state-of-the-art algorithms in all but a few cases, and in some cases, far outperforms the competition.

We made available a long version of our Supercomputing’15 article which includes appendices. In the appendix, there is a longer version of one of the proofs

Run our experimental evaluation

The source code we used for our experimental evaluation is hosted by a Github repository. The source code for our PDFS and our implementation of the DFS of Cong et al is stored in the file named dfs.hpp.

Prerequisites

To have enough room to run the experiments, your filesystem should have about 300GB of free hard-drive space and your machine at least 128GB or RAM. These space requirements are so large because some of the input graphs we use are huge.

We use the nix package manager to handle the details of our experimental setup. The first step is to install nix on your machine.

Obtaining experimental data

If there is sufficient free space on the current file system, then simply run the following command.

To streamline this process, we are going to add the benchmarking folder to our path.

We can now start generating the graph data. This process requires access to a fast internet connection and likely a few hours to complete. If there are any issues with the download, please contact the authors for help.

Running the experiment

The first series of benchmark runs gather data to serve as baseline measurements.

After the command completes, the results of the experiment are going to be stored in the _results folder.

We are now ready to run the main body of experiments. The following command starts the experiments running.

To achieve clean results, we recommend performing thirty runs. However, it may be faster to perform just a few at first, and then collect more data later, next time passing to graph.pbench the flag -mode append.

Analyzing the results

After running the experiments, the raw result data should be stored in new files named results_accessible_large.txt, results_baselines_large.txt, and results_overview_large.txt. If the experiments completed successfully, then the kind of plots that appear in our SC’15 paper can be generated by the following command.

After completion, the file named table_graphs.pdf should contain a table that bears resemblance to Table 1 in our SC’15 paper. The file named _results/table_graphs.tex contains the source code. A speedup plot, like the one in Figure 6, should appear in a new file named plot_main.pdf (with souce code in _results/plot_main.pdf-all.r). The plot corresponding to Figure 8 in the paper, namely, the locality study, should appear in the file named plots_overview_locality_large.pdf (with souce code in plots_overview_locality_large.pdf-all.r).

References

Acar, Umut A, Arthur Charguéraud, and Mike Rainey. 2015. “A Work-Efficient Algorithm for Parallel Unordered Depth-First Search.” In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 67:1–12. ACM. http://chargueraud.org/research/2015/pdfs/pdfs_sc15.pdf.