P2PDiBS: Peer-to-Peer Distributed Backup System

Distributed Systems (18-842) Spring '07 - Instructor: Greg Ganger, Mentor: Gaurav Shah


Project Members

Aditya Bhave <ayb@andrew.cmu.edu>
Bobby Asher <basher@andrew.cmu.edu>
Nitasha Walia <nwalia@andrew.cmu.edu>
Alhad Palkar <apalkar@andrew.cmu.edu>


Abstract

Backup is essential to prevent data loss due to hardware, software, or user error. Organizations today mostly implement centralized backup systems that are costly in both time and money: dedicated backup servers must be installed for the Local Area Network (LAN), and administrators are needed to maintain the backup system. It would be much better to implement a reliable backup system using the existing infrastructure of unused hard disk space on client machines, with only a small monetary overhead.

A survey at the Information Networking Institute (INI) reinforced the fact that a majority of users have a lot of unused space on their hard drives. According to the survey, more than 50% of users do not use more than half of their hard disk space. The results of the survey are shown in the graph below. Our project is based on this fact. Every node on the LAN acts as a client whenever it needs to perform a backup, and takes on the role of a server whenever other nodes need to back up their files. Although our system beats a centralized, dedicated backup system on cost, we also need to take into account issues like fault tolerance, reliability, and availability. Further, we need to study how the extra disk and network activity caused by our system interferes with the normal functioning of the nodes.

The P2PDiBS system is based on the following basic primitives:

    Discover - To find nodes on the network that are ready to act as backup servers.
    Backup   - To back up a file specified by the client on k servers.
    Retrieve - To retrieve a file from one of the k servers on which it was earlier backed up.
    Remove   - To free hard disk space on servers by notifying them that they no longer need to hold the file.
    Rename   - To notify the servers that a file they hold has been renamed at the client.
    Sync     - To keep the data structures at each node on the network consistent.
    Restore  - To let a crashed system that has lost all its data retrieve all the files that it had previously.
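
To make the primitives concrete, here is a minimal in-memory sketch in Python. It is not the P2PDiBS implementation: the Node class, the function names, the (owner, filename) keys, and the first-fit choice of servers are all assumptions made for illustration, and the network, encryption, and the Sync primitive are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical peer that stores other nodes' files within a donated quota."""
    name: str
    quota: int                                   # bytes the user agreed to donate
    stored: dict = field(default_factory=dict)   # (owner, filename) -> bytes

    def free_space(self) -> int:
        return self.quota - sum(len(d) for d in self.stored.values())


def discover(peers, owner, size):
    """Discover: peers other than the owner with room for `size` bytes."""
    return [p for p in peers if p.name != owner and p.free_space() >= size]


def backup(peers, owner, filename, data, k):
    """Backup: place a copy of the file on k servers and return their names."""
    servers = discover(peers, owner, len(data))[:k]   # first-fit (assumption)
    if len(servers) < k:
        raise RuntimeError("fewer than k servers have enough free space")
    for s in servers:
        s.stored[(owner, filename)] = data
    return [s.name for s in servers]


def retrieve(peers, owner, filename):
    """Retrieve: fetch the file from any one server that holds it."""
    for p in peers:
        if (owner, filename) in p.stored:
            return p.stored[(owner, filename)]
    raise FileNotFoundError(filename)


def remove(peers, owner, filename):
    """Remove: tell every holder that it may free the space."""
    for p in peers:
        p.stored.pop((owner, filename), None)


def rename(peers, owner, old, new):
    """Rename: tell every holder that the file has a new name at the client."""
    for p in peers:
        if (owner, old) in p.stored:
            p.stored[(owner, new)] = p.stored.pop((owner, old))


def restore(peers, owner):
    """Restore: after a crash, collect every file backed up by `owner`."""
    files = {}
    for p in peers:
        for (o, fname), data in p.stored.items():
            if o == owner:
                files.setdefault(fname, data)
    return files
```

With four peers, one of which donates too little space, a call such as backup(peers, "a", "notes.txt", data, k=2) places the file on the first two peers that have room, and restore(peers, "a") later returns everything node "a" had backed up.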

This system has been deployed in a semi-trusted environment, and hence is equipped only with password-based encryption, which encrypts a file before it is backed up on a server node, and MD5 to ensure the integrity of the backed-up file.
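
The MD5 integrity check can be sketched as follows, using Python's standard hashlib module. The function names are hypothetical, and the sketch assumes the client records a digest of the exact bytes it sends to a server and compares it with a digest of the bytes it gets back. The encryption step is omitted because the cipher used by the system is not stated here.

```python
import hashlib
import hmac


def md5_digest(data: bytes) -> str:
    """Return the MD5 digest of `data` as a hex string."""
    return hashlib.md5(data).hexdigest()


def is_intact(retrieved: bytes, recorded_digest: str) -> bool:
    """Check retrieved bytes against the digest recorded at backup time."""
    return hmac.compare_digest(md5_digest(retrieved), recorded_digest)


# At backup time the client keeps the digest alongside its file metadata.
payload = b"bytes sent to the server node"   # illustrative data
recorded = md5_digest(payload)

# At retrieve time the client recomputes the digest and compares.
print(is_intact(payload, recorded))                # unchanged copy
print(is_intact(payload + b"\x00", recorded))      # modified copy
```

A digest mismatch tells the client that the copy held by that server has changed, so it can try another of the k servers.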


The system has a user-friendly, intuitive interface that helps the user perform functions like backup and retrieve. Users can set the amount of space that they are willing to set aside for backing up other nodes' files. A snapshot of the user interface can be seen below.

For further information on this project, you can download a PDF copy of our project report. If you do not have Adobe Reader installed on your system, you can download it here. For any further questions, you can contact the Basherz group at basherz-dsproj@lists.andrew.cmu.edu.