15-440* Git Quick Start Guide


To make developing your projects easier, we've written this quick-start guide to a modern and popular source control system: the Git version control system. This document will serve first as a user's reference, second as an explanation of concepts (although you need not understand all of the concepts to use Git), and third as evangelism for Git and other distributed version control systems (although you need not drink my kool-aid to use Git). In theory, each part stands alone; you need not know the concepts to use the reference, and you need not know the reference to be evangelized to. In practice, you may find it useful to read all three parts to get a deeper understanding of what Git is doing while you aren't looking.

Should you use Git, or something simpler? On the one hand, other tools might be simpler and faster to learn right now. On the other hand, time spent learning Git will pay off if you join a project that already uses Git. Because there are so many revision-control systems currently in use, there is no guarantee you won't have to learn something else, but Git is among the more popular systems, so it's a plausible investment.

Quick-start

Obtaining/installing Git

Telling Git about you
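
A minimal sketch of introducing yourself to Git, assuming you want the same identity for every repository on this machine. The name and address below are placeholders, not values from the course handout; replace them with your own.

```shell
# Git stamps this name and address on every commit you create.
# "Your Name" and ANDREWID are placeholders to replace with your own details.
git config --global user.name "Your Name"
git config --global user.email "ANDREWID@andrew.cmu.edu"

# Read the values back to confirm that they were stored.
git config --global --get user.name
git config --global --get user.email
```

The --global flag writes to your per-user configuration file, so you only need to do this once per account.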

Getting your project set up

  # Create a directory within AFS to house the Git repository
  # ONE partner should execute this command within her/his AFS space for the
  # team. The person doing this should have enough quota to hold the repository.
  $ mkdir -p ~/440/REPOSITORY

  # Give the "Other partner" AFS access to this space, while excluding others
  $ find ~/440 -type d -exec fs sa -dir '{}' -acl YOUR_OWN_ANDREWID all -clear \;
  $ find ~/440 -type d -exec fs sa -dir '{}' -acl OTHER_PARTNER_ANDREWID all \;

  # Execute these commands only once on AFS to actually create the repository
  $ cd ~/440/REPOSITORY
  $ git init --bare project2

  # Both you and your partner do this on AFS as well, to give each of you a 
  # place to actually do your individual work
  $ cd ~/440
  $ git clone --no-hardlinks ~ANDREWID_OF_HOSTING_PARTNER/440/REPOSITORY/project2

  # One of you adds the project handout via your personal repository
  $ cd ~/440/project2
  $ wget http://www.andrew.cmu.edu/course/15-440-s12/applications/labs/lab2/project2.zip
  $ unzip project2.zip
  $ git add *
  $ git commit -a -m "Initial commit"
  $ git push origin master -u

Working with Git on a day-to-day basis

These operations will become your new best friends. You will use them many times per day; it will pay off to become familiar with their operation and their quirks.
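
As a rough rehearsal of the daily cycle, the sketch below builds a throwaway shared repository and two clones in a temporary directory, so you can try status, diff, add, commit, push, and pull without touching your real project. The directory names (central.git, alice, bob), the file main.c, and the identities are all hypothetical.

```shell
set -e
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/central.git"
git -C "$sandbox/central.git" symbolic-ref HEAD refs/heads/master

# First partner: clone, record a change, and publish it.
git clone -q "$sandbox/central.git" "$sandbox/alice" 2>/dev/null
cd "$sandbox/alice"
git symbolic-ref HEAD refs/heads/master
git config user.name "Alice Demo" && git config user.email "alice@example.com"
echo "int main(void) { return 0; }" > main.c
git status --short              # "?? main.c": the file is untracked so far
git add main.c
git commit -q -m "Add a stub main() that exits successfully"
git push -q -u origin master    # publish to the shared repository

# Second partner: clone, change the file, review, record, and publish.
git clone -q "$sandbox/central.git" "$sandbox/bob"
cd "$sandbox/bob"
git config user.name "Bob Demo" && git config user.email "bob@example.com"
echo "/* TODO: parse arguments */" >> main.c
git diff                        # review the change before recording it
git commit -q -a -m "Note that argument parsing is still missing"
git push -q origin master

# First partner again: pick up the partner's commit.
cd "$sandbox/alice"
git pull -q origin master
git log --oneline               # both commits, newest first
```

In real use you would skip the sandbox setup and run the same status, diff, add, commit, pull, and push commands inside your own clone.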

It behooves you to try to make "good" commits, both for yourself and for your partner. To see what a "good" commit looks like, we should probably first look at what a "bad" commit is. Here's a transcript of a former OS TA making quite a few mistakes, all in one short command:

joshua@escape:~/school/15-410-ta/p3-s09p4$ git commit -a -m "wee"
[master 6946f54] wee
 2 files changed, 2 insertions(+), 2 deletions(-)
joshua@escape:~/school/15-410-ta/p3-s09p4$

What's so wrong about this? Well, the most obvious is the message; the message "wee" conveys absolutely no information to your TA's partner (well, maybe it tells my partner that I was excited about this, but not much more than that). But there are more substantial issues here. Let's go back and do this again and see what your TA missed.

joshua@escape:~/school/15-410-ta/p3-s09p4$ git status
# On branch master
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   kern/mutex.c
#       modified:   user/progs/vm_explode.c
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       user/progs/mytest.c
no changes added to commit (use "git add" and/or "git commit -a")
joshua@escape:~/school/15-410-ta/p3-s09p4$ git diff
diff --git a/kern/mutex.c b/kern/mutex.c
index 4a13af1..55f4569 100644
--- a/kern/mutex.c
+++ b/kern/mutex.c
@@ -39,7 +39,7 @@ void mutex_init(mutex_t *mp)
 void mutex_lock(mutex_t *mp)
 {
-       make_mutexes_work();
+       make_mutexes_not_work();  // XXX changed briefly to test my demo program
        mutex_level++;
diff --git a/user/progs/vm_explode.c b/user/progs/vm_explode.c
index ee8c2b9..94c8c94 100644
--- a/user/progs/vm_explode.c
+++ b/user/progs/vm_explode.c
@@ -72,7 +72,7 @@ int main() {
         vanish();
       }
     }
-    printf("parent: all balls accounted for!\n");
+    printf("parent: all children accounted for!\n");
     set_status(0);
     vanish();
   }
joshua@escape:~/school/15-410-ta/p3-s09p4$ git add user/progs/vm_explode.c user/progs/mytest.c
joshua@escape:~/school/15-410-ta/p3-s09p4$ git commit
in your TA's editor...
Modify vm_explode to more accurately describe what it's doing instead of punning
on the P2 test 'juggle', and create a spinoff, mytest.

mytest makes sure that the frubulator is frobbed in the mutexes; you can
make it fail by commenting out the call to make_mutexes_work() in kern/mutex.c.
But make sure not to commit that change!  Otherwise we'll both be sad.

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# Committer: Joshua Wise <joshua@escape.joshuawise.com>
#
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       new file:   user/progs/mytest.c
#       modified:   user/progs/vm_explode.c
#
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   kern/mutex.c
and back at the shell...
".git/COMMIT_EDITMSG" 30L, 1112C written
[master 6ef906e] Modify vm_explode to more accurately describe what it's doing instead
of punning on the P2 test 'juggle', and create a spinoff, mytest.
 1 files changed, 1 insertions(+), 1 deletions(-)
 create mode 100644 user/progs/mytest.c
joshua@escape:~/school/15-410-ta/p3-s09p4$

Much better! This time, your TA checked to see what he was changing before he committed it, added only the files he wanted to commit, and then wrote a descriptive commit message so that his partner could test this for himself. Importantly, your TA did not commit the change that would break his kernel's mutexes while doing this, and hence did not get strangled in his sleep by his partner.

Strive to emulate this workflow. You may find that you don't need quite such verbose messages, and git commit -m will work fine for you. That's OK; but try to make your commit messages at least somewhat useful.

Time travel with Git

In an ideal world, we would make no errors while writing code. Sadly, we do, and sometimes we wish to travel back to the past and determine what broke. It is generally considered inadvisable to modify history; if you do, you run the risk of killing one or more of your parents, and being in a paradoxical state of existence. If you wish to modify history, you might wish to create an alternate universe; in Git, we call these alternate universes "branches". Luckily, branches aren't needed to just go back and look. You may use these commands somewhat less frequently, but they are no less important.
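
Here is a small sketch of going back to look without modifying anything, run in a scratch repository so that it is safe to try. The file notes.txt and the commit messages are made up for illustration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User" && git config user.email "demo@example.com"

echo "version 1" > notes.txt
git add notes.txt && git commit -q -m "Record version 1 of the notes"
echo "version 2" > notes.txt
git commit -q -a -m "Update the notes to version 2"

git log --oneline                 # list the commits, newest first
old=$(git rev-parse HEAD~1)       # SHA1 of the commit before the current one
git checkout -q "$old"            # visit the past (a "detached HEAD")
cat notes.txt                     # prints "version 1"
git checkout -q master            # return to the present
cat notes.txt                     # prints "version 2"
```

In your own tree you would normally copy the SHA1 from the git log output rather than computing it with git rev-parse.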

Splitting reality with Git

At some point, you may wish that you could make a change on a previous version of your tree without affecting the current version (yet); or you may wish to split reality in half, and work on an experimental side-project without disrupting main development of your project. Branches in Git are designed to allow you to do just those things; split away from the main view of reality from some point in time (be that time now or the past).
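
The following sketch splits reality once and then joins it again, again in a scratch repository. The branch name experiment and the file kernel.txt are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User" && git config user.email "demo@example.com"

echo "stable" > kernel.txt
git add kernel.txt && git commit -q -m "Add the stable kernel notes"

git checkout -q -b experiment     # create the alternate universe and enter it
echo "risky idea" >> kernel.txt
git commit -q -a -m "Try a risky idea on the experiment branch"

git checkout -q master            # main development is untouched
cat kernel.txt                    # prints only "stable"
git merge -q experiment           # fold the experiment back in when ready
git branch                        # lists both branches, marking the current one
```

To branch from a point in the past instead of from now, you can give git checkout -b a starting commit as an extra argument.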

Don't lose your data!

Git is meant to track versions of files, but that doesn't mean that you can't lose data when working with Git. There are multiple kinds of data that might get lost if something goes wrong with Git. For more information about what data you might lose, see How to use git to lose data.

You can protect yourself against some Git-related data loss by adding these settings to the config file of your shared central repository (e.g., ~/440/REPOSITORY/project2/config).

[receive]
  fsckObjects = true
  denyDeletes = true
  denyNonFastForwards = true
[gc]
  reflogExpire = never
  reflogExpireUnreachable = never
  pruneExpire = never
  rerereresolved = never
  rerereunresolved = never
[core]
  logAllRefUpdates = true

You can do this by editing the config file directly, or by using these commands:

# One person does these, once.
$ cd ~/440/REPOSITORY/project2

$ git config receive.fsckObjects true
$ git config receive.denyDeletes true
$ git config receive.denyNonFastForwards true

$ git config gc.reflogExpire never
$ git config gc.reflogExpireUnreachable never
$ git config gc.pruneExpire never
$ git config gc.rerereresolved never
$ git config gc.rerereunresolved never

$ git config core.logAllRefUpdates true

You may also wish to apply some or all of these settings to your personal repository, though this is less important because in theory you are frequently pushing your work to a well-configured central repository.
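
To see one of these settings at work, the sketch below builds a throwaway central repository with receive.denyNonFastForwards enabled and then tries to force-push rewritten history at it. The sandbox paths and file names are hypothetical, and only two of the settings are exercised here.

```shell
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/central.git"
git -C "$sandbox/central.git" symbolic-ref HEAD refs/heads/master
git -C "$sandbox/central.git" config receive.denyNonFastForwards true
git -C "$sandbox/central.git" config receive.denyDeletes true

git clone -q "$sandbox/central.git" "$sandbox/work" 2>/dev/null
cd "$sandbox/work"
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User" && git config user.email "demo@example.com"
echo "one" > file.txt
git add file.txt && git commit -q -m "Add file.txt"
git push -q -u origin master

# Rewrite the commit locally, then try to overwrite the published history.
git commit -q --amend -m "Add file.txt (rewritten)"
if git push -q --force origin master 2>/dev/null; then
    echo "forced push accepted"
else
    echo "forced push rejected"   # expected, because of denyNonFastForwards
fi
```

The central repository keeps the originally published commit, so a partner who already pulled it is not left stranded.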

Explanation of Concepts

The quick-start above simplified some of Git's underlying concepts to keep the introduction readable. The simplifications will not badly mislead you about what Git is doing behind your back, but knowing how Git stores data may help you work with it. Tommi Virtanen's excellent page Git for Computer Scientists may provide some insight as well, for those who like to talk about DAGs and are big fans of arrows pointing every which way.

Commits

The basic unit of a point in time stored in Git is a commit. Each time we spoke of recording changes earlier, it would have been more correct to say "creating a commit"; I used the words "recording changes" to distinguish the operation from pushing and publishing your changes to your partner. A commit, by its nature, comprises a few pieces of information: a pointer to a tree (the snapshot of your files at that moment), pointers to its parent commits, the names of the author and committer with timestamps, and the commit message.

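A commit is identified by the SHA1 hash of all of the information that it contains. This hash is one common form of a refspec -- that is to say, it is one common way to specify a single commit. (Git's own documentation calls such a name a revision, and reserves "refspec" for the source-to-destination mappings used by fetch and push; this guide uses "refspec" in the looser sense.) Recall that when you did a checkout to go back in time, you specified a SHA1 hash; in that case, you were using the SHA1 hash as a refspec.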
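
If you would like to look inside a commit yourself, this sketch creates one in a scratch repository and prints it in raw form. It assumes Git's default SHA1 object format; the README file and the identity are placeholders.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User" && git config user.email "demo@example.com"

echo "hello" > README
git add README && git commit -q -m "Add a README"

id=$(git rev-parse HEAD)    # the full 40-character SHA1 that names this commit
echo "$id"
git cat-file -t "$id"       # prints "commit"
git cat-file -p "$id"       # raw contents: tree, author, committer, and message
                            # (plus "parent" lines when the commit has parents)
```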

You may have inferred by now that commits exist in a sort of a tree. Every commit except the very first has one or more parent commits (a commit with more than one parent is called a merge commit), and each commit may have zero or more child commits. You can view the commit tree using gitk, as we saw above; each commit was identified by a dot, and gitk drew lines for us between each commit to explicitly show the branches of the tree.
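
For a text-only view of the same structure, the sketch below builds a tiny history containing one merge commit and then asks Git to draw it. The branch name side and the file names are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User" && git config user.email "demo@example.com"

echo "base" > a.txt && git add a.txt && git commit -q -m "Base commit"
git checkout -q -b side
echo "side" > b.txt && git add b.txt && git commit -q -m "Work on the side branch"
git checkout -q master
echo "main" > c.txt && git add c.txt && git commit -q -m "Work on master"
git merge -q --no-edit side         # creates a merge commit with two parents

git log --graph --oneline --all     # draws the commits and the lines between them
git rev-list --parents -n 1 HEAD    # the merge's hash, then both parent hashes
```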

This tree of cryptographic hashes gives Git a few very useful properties. Git can assure you that nobody has changed the tree that you have based your work on, because every element in the tree, down to the blobs, is identified by its cryptographic hash (its SHA1). If a parent object has changed, either by malicious intent or by disk corruption, Git simply will not be able to find the parent object, instead of giving you the incorrect data. This makes Git relatively immune to AFS corrupting its metadata.
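
A short sketch of the property described above: an object's name is computed from its content, so the same content always gets the same name and any change produces a different one. The file mutex.txt is made up, and git fsck is shown as one way to ask Git to re-check the objects it has stored.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User" && git config user.email "demo@example.com"

printf 'frobbed\n' > mutex.txt
git add mutex.txt && git commit -q -m "Record the frobbed state"

stored=$(git rev-parse HEAD:mutex.txt)                    # name of the stored blob
again=$(git hash-object mutex.txt)                        # recomputed from the file
changed=$(printf 'frobbee\n' | git hash-object --stdin)   # one byte different
[ "$stored" = "$again" ] && echo "same content, same name"
[ "$stored" != "$changed" ] && echo "changed content, different name"

git fsck --full    # re-verify every stored object against its name
```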

Further, it makes it hard to throw away history by accident. Some version control systems that we discussed in lecture have versions per file, so deleting a file may delete its version history, or otherwise create a discontinuity in how the file is linked in terms of time. In Git, deleting a file simply produces a new commit whose tree lacks that file; every earlier commit still contains it. Similarly, renaming a file is not disastrous (although somewhat quirky); the only changes happen locally in the new commit object. If a delete required a change of history, then the cryptographic hashes would change, and every descendant commit's hash would have to change with them. The cryptographic hash system, then, makes Git resistant to inadvertent deletion of history.

Branches, tags, and refspecs -- oh my!

In this section, until now, you've seen only one kind of refspec -- a SHA1 hash of a commit. But in the quick-start above, you've worked with more types of refspecs; when you checked out a branch, you used the refspec that refers to the branch.
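
To see that these different names all lead to the same place, the sketch below resolves HEAD, a branch name, and a tag to the SHA1 they stand for. The tag name v1 and the file are hypothetical.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q && git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User" && git config user.email "demo@example.com"

echo "hello" > README
git add README && git commit -q -m "Add a README"
git tag v1                  # a lightweight tag naming the current commit

git rev-parse HEAD          # the commit you currently have checked out
git rev-parse master        # the commit at the tip of the branch
git rev-parse v1            # the commit that the tag names
# All three lines print the same SHA1.
```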

Further Reading

Here are some sources you might consult.


* We warmly thank the kind 15-410, Operating Systems, course staff for this tutorial, which we stole and very slightly personalized. Any errors are almost certainly part of our adaptation and should be reported to staff-440@cs.cmu.edu.