15-440^* Git Quick Start Guide

To make developing your projects easier, we've written this quick-start guide to a modern and popular source control system: the Git version control system. This document serves first as a user's reference, second as an explanation of concepts (although you need not understand all of the concepts to use Git), and third as evangelism for Git and other distributed version control systems (although you need not drink my kool-aid to use Git). In theory, each part stands alone: you need not know the concepts to use the reference, and you need not know the reference to be evangelized to. In practice, you may find it useful to read all three parts to get a deeper understanding of what Git is doing while you aren't looking.

Should you use Git, or something simpler? On the one hand, other systems might be simpler and faster to learn right now. On the other hand, time spent learning Git will pay off if you join a project that already uses Git. Because so many revision-control systems are currently in use, there is no guarantee you won't have to learn something else, but Git is among the most popular, so it's a plausible investment.

Quick-start

Obtaining/installing Git

On Andrew UNIX, Git is available for you in /usr/bin.
On non-Andrew Linux systems, Git is typically installed through the distribution's package manager. On Ubuntu, run sudo apt-get install git (very old releases called the package git-core); on Fedora, run sudo dnf install git (older releases used su -c "yum install git"). For other distributions, refer to the system documentation.
On other systems, or if no Git package is available, the latest version of Git can be obtained from the official download site. It is buildable with the traditional ./configure && make && sudo make install procedure.

Telling Git about you

Tell Git who you are. Your home directory can contain a file named .gitconfig whose settings apply to every Git repository you use. Rather than editing it by hand, you can populate it with the following commands:
```
git config --global user.email hqbovik@andrew.cmu.edu
git config --global user.name 'Harry Q. Bovik'
```
This way Git will know who you are whenever you make a commit, and each of your commits will carry your name so that your partner can tell who made it. Many other options can go into your ~/.gitconfig; run man git-config to find out about colors, aliases, and other global options.
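If you'd like to see what those two commands do without touching your real settings, here is a small sketch that runs them against a throwaway home directory (created with mktemp, an assumption of this example rather than part of the setup) and then reads the values back. In your own account you would run only the two git config lines.

```shell
#!/bin/sh
# Sketch: run the identity commands against a scratch HOME so that your
# real ~/.gitconfig is not modified, then read the settings back.
set -e
export HOME="$(mktemp -d)"   # hypothetical scratch home directory
cd "$HOME"

git config --global user.email hqbovik@andrew.cmu.edu
git config --global user.name 'Harry Q. Bovik'

# Ask Git what it will stamp on your commits.
git config --global --get user.name    # prints: Harry Q. Bovik
git config --global --get user.email   # prints: hqbovik@andrew.cmu.edu

# The same settings, as they appear in the file itself.
cat "$HOME/.gitconfig"
```

git config --global --list prints every global setting at once.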

Getting your project set up

  # Create a directory within AFS to house the Git repository.
  # ONE partner should execute this command within her/his AFS space for the
  # team. The person doing this should have enough quota to hold the repository.
  $ mkdir -p ~/440/REPOSITORY

  # Give the other partner AFS access to this space, while excluding everyone else
  $ find ~/440 -type d -exec fs sa -dir '{}' -acl YOUR_OWN_ANDREWID all -clear \;
  $ find ~/440 -type d -exec fs sa -dir '{}' -acl OTHER_PARTNER_ANDREWID all \;

  # Execute these commands only once on AFS to actually create the repository
  $ cd ~/440/REPOSITORY
  $ git init --bare project2

  # Both you and your partner do this on AFS as well, to give each of you a 
  # place to actually do your individual work
  $ cd ~/440
  $ git clone --no-hardlinks ~ANDREWID_OF_HOSTING_PARTNER/440/REPOSITORY/project2

  # One of you adds the project handout via your personal repository
  $ cd ~/440/project2
  $ wget http://www.andrew.cmu.edu/course/15-440-s12/applications/labs/lab2/project2.zip
  $ unzip project2.zip
  $ git add *
  $ git commit -a -m "Initial commit"
  $ git push origin master -u

Creating the repository - The first part of this creates the repository in one of your home directories, names it project2, and ensures that both of you have access to it. One assumption is that your partner can look up (list) the top level of your home directory. If not, you'll need to fix this with fs, e.g. "fs sa -dir ~/ -acl PARTNERANDREWID l". The command git init --bare project2 creates the project2 directory and initializes it as a bare repository, one with no working copy of the files, ready to act as the shared remote. You and your partner will push to and pull from this repository.

Cloning the repository - Now that the remote is set up, each of you can clone it into a personal work directory with the git clone command. You can ignore the warning about cloning an empty repository; it appears only because nobody has committed anything yet. The --no-hardlinks flag tells Git to copy all the files instead of wasting time trying to hard-link them across AFS volumes, which does not work. Even outside AFS, unless your project is large, a full copy gives you added protection against disk errors or accidental repository corruption, so it is a good habit to pick up.
Making the first commit - The warning about the repository being empty goes away after the first commit. Make one by adding some files (the commands above add the project handout; a README file would also do), committing them, and pushing them to the remote. Normally you can type just git push, but the first time you push you must name the remote and branch: git push origin master -u. The -u flag records origin's master as the upstream of your local master, so later a plain git push or git pull knows where to go.
Cloning when not on AFS - If you or your partner is not working on AFS, the clone command above won't work, because the path it names doesn't exist on your machine. Instead, use an SSH URL of the form YOUR_ANDREWID@unix.andrew.cmu.edu:~ANDREWID_OF_HOSTING_PARTNER/440/REPOSITORY/project2, and Git will push and pull files over SSH.

It is highly recommended that you figure out the real path where your repository is stored (without any symbolic links) and clone from that path instead.
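Before trying this on AFS, you can rehearse the whole sequence in a scratch directory. In the sketch below, the scratch HOME and the me, partner, and README names are hypothetical stand-ins for the two partners' ~/440/project2 work areas and the real handout, and a local path replaces the AFS path. It also assumes a Git recent enough to understand init.defaultBranch (older versions ignore it and call the first branch master anyway); the setting pins the branch name to master to match this guide.

```shell
#!/bin/sh
# Sketch: replay the AFS setup in a scratch directory. "me" and "partner"
# stand in for the two partners' ~/440/project2 work areas.
set -e
export HOME="$(mktemp -d)"                      # hypothetical scratch HOME
git config --global user.name 'Harry Q. Bovik'
git config --global user.email hqbovik@andrew.cmu.edu
git config --global init.defaultBranch master   # match this guide's branch name

# The shared, bare repository (no working files of its own).
mkdir -p "$HOME/440/REPOSITORY"
git init --bare "$HOME/440/REPOSITORY/project2"

# Each partner clones it; Git warns that the repository is empty.
git clone --no-hardlinks "$HOME/440/REPOSITORY/project2" "$HOME/me"
git clone --no-hardlinks "$HOME/440/REPOSITORY/project2" "$HOME/partner"

# One partner makes the first commit and publishes it; -u records
# origin/master as the upstream, so later a bare "git push" suffices.
cd "$HOME/me"
echo "Project 2 handout goes here" > README
git add README
git commit -m "Initial commit"
git push origin master -u

# The other partner picks it up.
cd "$HOME/partner"
git pull origin master
cat README    # prints: Project 2 handout goes here
```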

Working with Git on a day-to-day basis

These operations will become your new best friends. You will use them many times per day; it will pay off to become familiar with their operation and their quirks.

To record changes to every file that Git is tracking, run git commit -a. Git will bring up your editor to prompt you for a commit message (if it brings up the wrong editor, set your EDITOR environment variable by adding a line like "export EDITOR=vim" to your .bashrc). To make the best use of some of Git's other features, try to commit only at points where your project still compiles and runs. A commit records the change only in your local repository and does not yet make it visible to your partner; see the items on pushing and pulling below.

This command will not add new files to Git! If you create a new file and use git commit -a without first running git add on it, your partner will be very sad when you go to sleep, they wake up to work, and the file is not there (and they will probably call you and wake you up).
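Here is a sketch of that trap in a scratch repository (the file names are hypothetical). git ls-files lists the files Git is tracking, and git status --short marks untracked files with ??.

```shell
#!/bin/sh
# Sketch: show that "git commit -a" ignores files Git is not yet tracking.
set -e
export HOME="$(mktemp -d)"               # scratch HOME and repository
git config --global user.name 'Harry Q. Bovik'
git config --global user.email hqbovik@andrew.cmu.edu
git init -q "$HOME/demo"
cd "$HOME/demo"

echo "int main(void) { return 0; }" > main.c
git add main.c
git commit -q -m "Add main.c"

# Edit a tracked file and create a brand-new one.
echo "/* tweak */" >> main.c
echo "int helper(void) { return 1; }" > new.c

git commit -q -a -m "Tweak main.c"      # records main.c only
git ls-files                            # lists only main.c
git status --short                      # "?? new.c": still untracked

# The fix: tell Git about the file first, then commit.
git add new.c
git commit -q -m "Add new.c"
git ls-files                            # now lists main.c and new.c
```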

If you wish to add a short message to your commit on the command line, you can do so with the -m "message" option. Before you begin committing, you might wish to read our guidance below about what makes a good commit!
To record changes to just a few files that Git is tracking, run git commit file1 file2 ..., listing the files whose changes you'd like to record (a single file is fine).
To record changes to a few parts of some of your changed files (i.e., patch chunks), you have a few options.
- git commit --interactive. This will bring up an interactive prompt that will allow you to choose what you'd like to commit; to get started, try typing "status" at the "What now?>" prompt to see what's changed, and then "patch" to interactively make choices. (Resist the urge to shout "whatnow?!" whenever you see the prompt.)
- git add -i. This works like commit --interactive, but it doesn't create a commit; it only stages the changes you pick, so that your next commit includes them.
- git add -p. This command lets you stage only some hunks (patch chunks) of your changes. Use it if you have edited many files but don't want all of the modifications to go into one large commit. Git shows you each hunk of the diff in every changed file and asks whether to stage it; useful answers include y (stage this hunk), n (skip it), and s (split it into smaller hunks). Afterwards, run git commit without -a, since -a would also commit everything you just skipped.
To add a new file to Git, run git add file1 file2 ..., listing the files you'd like to add. Then, to record the newly added files, run git commit -m "message". If you run git add on a file that Git already tracks, Git stages the file's state at the moment you ran the add; later edits need another git add (or git commit -a).
To delete a file from Git, run git rm file1 file2 ..., listing the files you'd like to delete; this also removes them from your working directory. Then, to record the deletion, run git commit -m "message". The delete is not permanent; you can still check out older revisions with the file intact.
To rename or move a file in Git, run git mv oldname newname. (git mv otherwise behaves much like the UNIX mv command; this is just the most common usage.) To record the move, run git commit.
To see what changes you've made to files that Git is tracking, run git status; it lists changed files, along with new (untracked) files and deleted files. To look at the specific changes, run git diff, which prints diff-formatted output for every change you have not yet staged; changes you have already staged with git add show up under git diff --cached instead.
To make your changes visible to your partner, run git push. This "pushes" your commits into the bare repository. If your repository is not up to date with the bare repository, git push will fail with a message saying the update was rejected, typically mentioning "fetch first" or "non-fast-forward". Resist the urge to use the -f option! A forced push overwrites your partner's commits in the shared repository, and you'll both be sad. Instead, get your partner's changes first, as described next, and then push again.
To get your partner's changes, run git pull. This "pulls" changes from the bare repository and merges them into your local repository. If you and your partner have changed the same sections of the same file in ways Git cannot reconcile automatically, the pull will leave your repository in a "conflicted" state and print messages beginning with CONFLICT. Edit the conflicted files to resolve the conflicts (Git marks each one with <<<<<<<, =======, and >>>>>>> lines), make sure your project builds, run git add on each resolved file, and then run something like git commit -m "Fix merge conflict" to record the fix. Your fix will not be visible to your partner until you push. (Recent versions of Git may refuse to pull when both of you have new commits until you choose a strategy; running git config pull.rebase false once selects the merging behavior described here.)
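The rejected push, pull, resolve, commit, push cycle can be rehearsed in a scratch directory before it happens for real. In this sketch, shared.git stands in for your bare repository, me and partner for the two work areas, and settings.txt and its contents are hypothetical. It sets pull.rebase false because recent Git versions may otherwise refuse to pull diverged branches.

```shell
#!/bin/sh
# Sketch: a rejected push, a conflicting pull, and a hand-made resolution.
# shared.git, me, partner, and settings.txt are hypothetical scratch names.
set -e
export HOME="$(mktemp -d)"
git config --global user.name 'Harry Q. Bovik'
git config --global user.email hqbovik@andrew.cmu.edu
git config --global init.defaultBranch master
git config --global pull.rebase false       # "git pull" = fetch + merge

# Shared repository with one published commit, and two work areas.
git init -q --bare "$HOME/shared.git"
git clone -q "$HOME/shared.git" "$HOME/me"
cd "$HOME/me"
echo "greeting = hello" > settings.txt
git add settings.txt
git commit -q -m "Add settings"
git push -q origin master -u
git clone -q "$HOME/shared.git" "$HOME/partner"

# The partner changes the line and pushes first.
cd "$HOME/partner"
echo "greeting = howdy" > settings.txt
git commit -q -a -m "Friendlier greeting"
git push -q

# We change the same line; our push is rejected (not a fast-forward).
cd "$HOME/me"
echo "greeting = hi" > settings.txt
git commit -q -a -m "Shorter greeting"
if git push; then echo "unexpected: push accepted" >&2; exit 1; fi

# Pulling merges, stops with "CONFLICT (content)", and exits non-zero.
if git pull; then echo "unexpected: clean merge" >&2; exit 1; fi

# Resolve by editing the file, mark it resolved, commit, then publish.
echo "greeting = hi there" > settings.txt
git add settings.txt
git commit -q -m "Fix merge conflict"
git push -q
```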

It behooves you to try to make "good" commits, both for yourself and for your partner. To see what a "good" commit looks like, we should probably first look at what a "bad" commit is. Here's a transcript of a former OS TA making quite a few mistakes, all in one short command:

joshua@escape:~/school/15-410-ta/p3-s09p4$ git commit -a -m "wee"
[master 6946f54] wee
 2 files changed, 2 insertions(+), 2 deletions(-)
joshua@escape:~/school/15-410-ta/p3-s09p4$

What's so wrong about this? Well, the most obvious is the message; the message "wee" conveys absolutely no information to your TA's partner (well, maybe it tells my partner that I was excited about this, but not much more than that). But there are more substantial issues here. Let's go back and do this again and see what your TA missed.

joshua@escape:~/school/15-410-ta/p3-s09p4$ git status
# On branch master
# Changed but not updated:
#   (use "git add ..." to update what will be committed)
#   (use "git checkout -- ..." to discard changes in working
#   directory)
#
#       modified:   kern/mutex.c
#       modified:   user/progs/vm_explode.c
# Untracked files:
#   (use "git add ..." to include in what will be committed)
#
#       user/progs/mytest.c
no changes added to commit (use "git add" and/or "git commit -a")
joshua@escape:~/school/15-410-ta/p3-s09p4$ git diff
diff --git a/kern/mutex.c b/kern/mutex.c
index 4a13af1..55f4569 100644
--- a/kern/mutex.c
+++ b/kern/mutex.c
@@ -39,7 +39,7 @@ void mutex_init(mutex_t *mp)
 void mutex_lock(mutex_t *mp)
 {
-       make_mutexes_work();
+       make_mutexes_not_work();  // XXX changed briefly to test my demo program
        mutex_level++;
diff --git a/user/progs/vm_explode.c b/user/progs/vm_explode.c
index ee8c2b9..94c8c94 100644
--- a/user/progs/vm_explode.c
+++ b/user/progs/vm_explode.c
@@ -72,7 +72,7 @@ int main() {
         vanish();
       }
     }
-    printf("parent: all balls accounted for!\n");
+    printf("parent: all children accounted for!\n");
     set_status(0);
     vanish();
   }
joshua@escape:~/school/15-410-ta/p3-s09p4$ git add user/progs/vm_explode.c user/progs/mytest.c
joshua@escape:~/school/15-410-ta/p3-s09p4$ git commit
in your TA's editor...
Modify vm_explode to more accurately describe what it's doing instead of punning
on the P2 test 'juggle', and create a spinoff, mytest.

mytest makes sure that the frubulator is frobbed in the mutexes; you can
make it fail by commenting out the call to make_mutexes_work() in kern/mutex.c.
But make sure not to commit that change!  Otherwise we'll both be sad.

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# Committer: Joshua Wise <joshua@escape.joshuawise.com>
#
# On branch master
# Changes to be committed:
#   (use "git reset HEAD <file>..." to unstage)
#
#       new file:   user/progs/mytest.c
#       modified:   user/progs/vm_explode.c
#
# Changed but not updated:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#       modified:   kern/mutex.c
and back at the shell...
".git/COMMIT_EDITMSG" 30L, 1112C written
[master 6ef906e] Modify vm_explode to more accurately describe what it's doing instead
of punning on the P2 test 'juggle', and create a spinoff, mytest.
 1 files changed, 1 insertions(+), 1 deletions(-)
 create mode 100644 user/progs/mytest.c
joshua@escape:~/school/15-410-ta/p3-s09p4$

Much better! This time, your TA checked to see what he was changing before he committed it, added only the files he wanted to commit, and then wrote a descriptive commit message so that his partner could test this for himself. Importantly, your TA did not commit the change that would break his kernel's mutexes while doing this, and hence did not get strangled in his sleep by his partner.

Strive to emulate this workflow. You may find that you don't need quite such verbose messages, and git commit -m will work fine for you. That's OK; but try to make your commit messages at least somewhat useful.
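The corrected workflow from the transcript can be replayed in miniature. The file names below mirror the transcript, but their contents are placeholders, and the repository lives in a scratch directory. The key points are reviewing with git status and git diff, staging exactly the files you want, and committing without -a.

```shell
#!/bin/sh
# Sketch: commit only the files you mean to, leaving a debugging hack
# (here, a change to kern/mutex.c) uncommitted. Contents are placeholders.
set -e
export HOME="$(mktemp -d)"
git config --global user.name 'Harry Q. Bovik'
git config --global user.email hqbovik@andrew.cmu.edu
git init -q "$HOME/p3"
cd "$HOME/p3"
mkdir -p kern user/progs
echo "make_mutexes_work();" > kern/mutex.c
echo 'puts("parent: all balls accounted for!");' > user/progs/vm_explode.c
git add kern user
git commit -q -m "Import kernel"

# Two edits (one worth keeping, one temporary hack) plus a new test.
echo "make_mutexes_not_work();" > kern/mutex.c
echo 'puts("parent: all children accounted for!");' > user/progs/vm_explode.c
echo "int main(void) { return 0; }" > user/progs/mytest.c

git status --short        # review first: what changed, what is new?
git diff                  # ...and exactly how the tracked files changed

# Stage only what belongs in this commit, then commit WITHOUT -a.
git add user/progs/vm_explode.c user/progs/mytest.c
git commit -q -m "Reword vm_explode message; add mytest"

git status --short        # " M kern/mutex.c": the hack stays uncommitted
```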

Time travel with Git

In an ideal world, we would make no errors while writing code. Sadly, we do, and sometimes we wish to travel back to the past and determine what broke. It is generally considered inadvisable to modify history; if you do, you run the risk of killing one or more of your parents, and being in a paradoxical state of existence. If you wish to modify history, you might wish to create an alternate universe; in Git, we call these alternate universes "branches". Luckily, branches aren't needed to just go back and look. You may use these commands somewhat less frequently, but they are no less important.

To get a graphical view of your repository's history, run gitk. Ogle at all the pretty colors. Each time in the past that you or your partner recorded changes using the commit command will be represented by a line in the pane at the top. Select a line, and more details about the change will show up in the bottom panes, including the change's SHA1 ID.
To get a non-graphical list of all changes in a file's history, run git log file, where file is the file whose history you want. Each entry begins with a line consisting of the word commit followed by the change's SHA1 ID, and then gives some short information about the change, including its author, its date, and the message that you specified with the -m option to git commit.
To go back and view the repository's state as it was after a change in the past, run git checkout sha1, where sha1 consists of enough characters from the change's SHA1 ID to disambiguate it from all other changes. For instance, if the change you're looking for has ID 311b98a0a1c40ad176103ee8026131fcd0fcc919, then you may only need to run git checkout 311b98 to get the change you're looking for.

Do not make any changes while you are viewing the past like this; if you wish to build on an old version, use a branch. If you have uncommitted changes that switching versions would overwrite, Git will refuse to switch and print an error message like error: Your local changes to the following files would be overwritten by checkout. (Uncommitted changes to files that are identical in both versions are simply carried along.)
To return to the most recent change in the repository (i.e., to recover from viewing a change in the past), run git checkout master. Any commits that you made while viewing the past will be lost into the abyss (they are recoverable, but how to recover them is beyond the scope of this document).
To revert one or more files to the state they were in when you last committed (or last staged them with git add), run git checkout -- file1 file2 .... The -- tells Git that what follows are file names, not branch names. Be careful: the uncommitted changes you discard this way are gone for good.
To temporarily set aside what you're working on to do something else, use the git stash command. It saves your uncommitted changes to tracked files as a diff onto a "stash stack", and returns your working directory to the state of the last commit. (Untracked files are left in place unless you run git stash -u.)
To restore something that you have stashed, run git stash pop. This will apply the diff on the top of the stack to whatever the current state of the repository may be. For more info on git stash, run git stash --help; you may find it to be an extremely useful tool!
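The commands above can be tried together in a scratch repository. In this sketch, notes.txt and its contents are hypothetical, git rev-parse --short HEAD~1 obtains the abbreviated SHA1 of the previous commit so that you don't have to copy it from git log by hand, and the branch name is pinned to master to match this guide.

```shell
#!/bin/sh
# Sketch: view an old version, come back, discard an edit, and stash.
# The repository and notes.txt are hypothetical scratch examples.
set -e
export HOME="$(mktemp -d)"
git config --global user.name 'Harry Q. Bovik'
git config --global user.email hqbovik@andrew.cmu.edu
git config --global init.defaultBranch master
git init -q "$HOME/timetravel"
cd "$HOME/timetravel"
echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "v1"
echo "version 2" > notes.txt
git commit -q -a -m "v2"

git log --oneline notes.txt          # newest first, one line per commit
old=$(git rev-parse --short HEAD~1)  # abbreviated SHA1 of the "v1" commit

git checkout -q "$old"               # view the past
cat notes.txt                        # prints: version 1
git checkout -q master               # back to the present
cat notes.txt                        # prints: version 2

# Discard an uncommitted edit to one file.
echo "oops" > notes.txt
git checkout -- notes.txt
cat notes.txt                        # prints: version 2

# Set unfinished work aside, then bring it back.
echo "half-done idea" >> notes.txt
git stash                            # notes.txt matches the last commit again
git stash pop                        # the half-done idea is back
```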

Splitting reality with Git

At some point, you may wish that you could make a change on a previous version of your tree without affecting the current version (yet); or you may wish to split reality in half, and work on an experimental side-project without disrupting main development of your project. Branches in Git are designed to allow you to do just those things: to split away from the main view of reality at some point in time (be that now or in the past).

To create a new branch from some point in the past, run git checkout -b branchname sha1, where branchname is what you want your new branch to be called (pick something descriptive without spaces; Experimental_COW might be a good name if you're experimenting with copy-on-write), and sha1 is the SHA1 ID of the change that you wish to branch from (see the item on going back to view the past, above). Git will switch you to that branch, and you can begin recording changes on it immediately.
To switch to a different branch, run git checkout branchname. The branch you started on is called master, so to return to your main line of development, run git checkout master. (The astute reader will note that this is the same command used to recover from viewing the past.)
To merge one branch into another, first switch to the branch that you want to merge into, then run git pull . branchname, where branchname is the branch that you want to merge from (git merge branchname does the same thing). You can do this as many times as you like; merging repeatedly does no harm. (Git treats your other branch as a 'virtual partner' to pull from, and the . names your own repository.) To publish the merged changes to your partner, run git push while on the master branch, as usual.
To create a branch based on the current state of your repository, run git checkout -b branchname. The semantics are similar to creating a branch from some point in the past.
To create a tag, run git tag tagname. By convention, tag names are capitalized, but Git does not enforce this. A tag name can be used anywhere a SHA1 ID would otherwise be used; to go back to the point at which you first got your shell running in Project 3, you might run git checkout SHELL_RUNNING. The usual rules apply: recording changes there is a bad idea unless you first create a branch. Note that git push does not send tags by default; run git push origin tagname to share one with your partner.
To rewrite history to clean it up, stop! You might not want to do this. You may have heard someone talk about using git rebase to "clean up" the history of branches, or say that "all git gurus know about rebase!" rebase has its uses, to be sure, but it's worth doing a lot of research before using it: the official Git documentation, articles arguing both that rebase should never be used and for a more balanced view, and the explanation in the Pro Git book (which one of your TAs thinks is a pretty good usage overview) are all worth reading. The choice is yours; rebase is a very powerful tool, but it is also capable of making a pretty substantial mess.
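The branch, merge, and tag operations above fit together as in the following sketch, run in a scratch repository. The file names and contents are hypothetical; Experimental_COW and SHELL_RUNNING are the example names from the text. The --no-edit flag accepts Git's default merge message instead of opening your editor, and pull.rebase false is set because recent Git versions may otherwise ask you to choose how to reconcile the diverged branches.

```shell
#!/bin/sh
# Sketch: branch, keep working on master, merge with "git pull .", and tag.
# File names and contents are hypothetical scratch examples.
set -e
export HOME="$(mktemp -d)"
git config --global user.name 'Harry Q. Bovik'
git config --global user.email hqbovik@andrew.cmu.edu
git config --global init.defaultBranch master
git config --global pull.rebase false
git init -q "$HOME/branches"
cd "$HOME/branches"
echo "fork();" > kern.c
git add kern.c
git commit -q -m "Plain fork"

# Split reality: experiment on a side branch.
git checkout -q -b Experimental_COW
echo "cow_fork();" >> kern.c
git commit -q -a -m "Try copy-on-write fork"

# Meanwhile, main development continues on master.
git checkout -q master
echo "exec();" > exec.c
git add exec.c
git commit -q -m "Add exec"

# Merge the experiment into master; --no-edit keeps Git's default message.
git pull --no-edit . Experimental_COW

git tag SHELL_RUNNING                # name this point in history
git log --oneline --graph --all      # a text-mode cousin of gitk
```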

Don't lose your data!

Git is meant to track versions of files, but that doesn't mean that you can't lose data when working with Git. Several kinds of data might get lost if something goes wrong; for more information about what you might lose, see the article "How to use git to lose data."

You can protect yourself against some git-related data loss by adding these settings to the config file of your shared central repository (e.g., ~/440/REPOSITORY/project2/config).

[receive]
  fsckObjects = true
  denyDeletes = true
  denyNonFastForwards = true
[gc]
  reflogExpire = never
  reflogExpireUnreachable = never
  pruneExpire = never
  rerereresolved = never
  rerereunresolved = never
[core]
  logAllRefUpdates = true

You can do this by editing the config file directly, or by using these commands:

# One person does these, once.
$ cd ~/440/REPOSITORY/project2

$ git config receive.fsckObjects true
$ git config receive.denyDeletes true
$ git config receive.denyNonFastForwards true

$ git config gc.reflogExpire never
$ git config gc.reflogExpireUnreachable never
$ git config gc.pruneExpire never
$ git config gc.rerereresolved never
$ git config gc.rerereunresolved never

$ git config core.logAllRefUpdates true

You may also wish to apply some or all of these settings to your personal repository, though this is less important because in theory you are frequently pushing your work to a well-configured central repository.
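One way to convince yourself that these settings do something is to try to get around them. The sketch below builds a throwaway bare repository (project2.git, a hypothetical stand-in for ~/440/REPOSITORY/project2) with receive.denyNonFastForwards and receive.denyDeletes set, then shows that both a force push that would discard a published commit and the deletion of a published branch (topic, also hypothetical) are refused.

```shell
#!/bin/sh
# Sketch: receive.denyNonFastForwards and receive.denyDeletes in action.
# project2.git and the branch "topic" are hypothetical scratch names.
set -e
export HOME="$(mktemp -d)"                      # scratch HOME
git config --global user.name 'Harry Q. Bovik'
git config --global user.email hqbovik@andrew.cmu.edu
git config --global init.defaultBranch master

# The protected shared repository.
git init -q --bare "$HOME/project2.git"
git --git-dir="$HOME/project2.git" config receive.denyNonFastForwards true
git --git-dir="$HOME/project2.git" config receive.denyDeletes true

# Publish two commits on master, plus a second branch named topic.
git clone -q "$HOME/project2.git" "$HOME/work"
cd "$HOME/work"
echo "first" > README
git add README
git commit -q -m "First"
echo "second" >> README
git commit -q -a -m "Second"
git push -q origin master -u
git push -q origin master:topic

# Throw away "Second" locally and try to force that over the shared copy.
git reset -q --hard HEAD~1
if git push -f origin master; then
  echo "unexpected: force push accepted" >&2; exit 1
fi

# Try to delete the published topic branch.
if git push origin --delete topic; then
  echo "unexpected: delete accepted" >&2; exit 1
fi
echo "both attempts were refused"
```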

Explanation of Concepts

The sections above simplified the underlying concepts of Git to keep the introduction readable and understandable. The simplifications won't seriously mislead you about what Git is doing behind your back, but you may find it helpful to know how Git stores data in order to work with it better. Tommi Virtanen's excellent page Git for Computer Scientists may provide some insight as well, for those who like to talk about DAGs and are big fans of arrows pointing every which way.

Commits

The basic unit of a point in time stored in Git is a commit. Each time we spoke of recording changes earlier, it would have been more correct to say "creating a commit"; I used the words "recording changes" to distinguish the operation from pushing and publishing your changes to your partner. A commit consists of a few pieces of information:

References to parent commits: Each commit records the commits it was derived from; you can think of the parent commits as previous steps in time from this commit. An ordinary commit has one parent and a merge commit has two or more. The very first commit you make (the "Initial commit" from the setup section) has no parent at all, which is how Git recognizes it as a root commit.
A description: This is the text that you enter with the -m option to git commit (or in your editor), stored along with the author's and committer's names, email addresses, and timestamps.
A snapshot of the files: Each file's contents are stored as a blob, and the commit refers to a tree object that lists the blobs (and subdirectory trees) making up the project at that moment. Git records full contents rather than per-commit patches; to save space it may later delta-compress objects when packing them, but that is invisible to you. An unchanged file is not copied: many commits can refer to the same blob.

A commit is identified by the SHA1 hash of all of the information that it contains. This hash is the most common way to name a single commit; this guide calls such a name a refspec (Git's own documentation calls it a revision, and reserves "refspec" for the source:destination mappings used by fetch and push). Recall that when you did a checkout to go back in time, you specified a SHA1 hash; in that case, you were using the SHA1 hash as a refspec.
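You can look at the raw information a commit holds with git cat-file -p. This sketch creates two commits in a scratch repository (the file and messages are hypothetical) and prints both; the first commit's printout has a tree line but no parent line, and the second has exactly one parent line.

```shell
#!/bin/sh
# Sketch: print the raw contents of two commits in a scratch repository.
set -e
export HOME="$(mktemp -d)"
git config --global user.name 'Harry Q. Bovik'
git config --global user.email hqbovik@andrew.cmu.edu
git init -q "$HOME/commits"
cd "$HOME/commits"
echo "hello" > README
git add README
git commit -q -m "First commit"
echo "world" >> README
git commit -q -a -m "Second commit"

# Each printout: a "tree" line, zero or more "parent" lines, "author" and
# "committer" lines (name, email, timestamp), a blank line, the message.
git cat-file -p HEAD~1     # the first commit: no parent line
git cat-file -p HEAD       # the second commit: one parent line

# The full 40-character ID, and a shorter prefix that names the same commit.
git rev-parse HEAD
git rev-parse --short HEAD
```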

You may have inferred by now that commits form a sort of tree (strictly, a directed acyclic graph, because a merge commit has more than one parent). Each commit except the first has one or more parent commits, and each commit may have zero or more child commits. You can view this graph using gitk, as we saw above; each commit is drawn as a dot, and gitk draws lines between commits to show the branches explicitly.

This structure of cryptographic hashes gives Git a few very useful properties. Git can assure you that nobody has changed the history that you have based your work on, because every object, from commits down to trees and blobs, is identified by its cryptographic hash (its SHA1), and each commit names its parents by hash. If an object is altered, whether by malicious intent or by disk corruption, its contents no longer match its name, so Git reports an error (git fsck checks every object this way) instead of silently giving you the incorrect data. This makes corruption of Git's data, for example by AFS, detectable rather than silent.
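Content addressing is easy to see for yourself: git hash-object prints the ID Git would give some contents, without storing anything. In the sketch below, changing a single byte produces an unrelated ID. The optional last step assumes the GNU sha1sum utility is installed, and shows that a blob's ID is simply the SHA1 of a short header (the word blob, the size, and a NUL byte) followed by the contents.

```shell
#!/bin/sh
# Sketch: Git names content by its SHA1, so equal bytes get equal IDs and a
# one-byte change gets an unrelated ID. Nothing is written to any repository.
set -e
cd "$(mktemp -d)"      # an empty scratch directory, outside any repository

printf 'hello\n' | git hash-object --stdin   # prints: ce013625030ba8dba906f756967f9e9ca394464a
printf 'hellp\n' | git hash-object --stdin   # a completely different ID

# Optional, assuming GNU sha1sum is installed: a blob's ID is the SHA1 of
# "blob <size>" plus a NUL byte, followed by the contents themselves.
if command -v sha1sum >/dev/null 2>&1; then
  printf 'blob 6\000hello\n' | sha1sum       # same hex digits as above
fi
```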

Further, it makes history hard to lose or alter by accident. Some version control systems that we discussed in lecture keep versions per file, so deleting a file may delete its version history, or otherwise create a discontinuity in how the file is linked through time. In Git, deleting or renaming a file only changes the tree recorded in the new commit; older commits, and the blobs they refer to, are untouched. (Renames are somewhat quirky, because Git does not record them but detects them afterwards by comparing file contents.) Changing anything in an existing commit would change its hash, and therefore the hash of every commit built on top of it, so such a change cannot happen silently. The cryptographic hash system, then, makes Git resistant to inadvertent deletion of history.
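The rename behavior can be checked directly. In the sketch below (scratch repository, hypothetical file names), git mv produces a new commit, but the blob for the file's contents is shared between the old and new names, the earlier commit keeps its ID, and git log --follow traces the history across the rename.

```shell
#!/bin/sh
# Sketch: a rename makes a new commit and tree, but reuses the blob and
# leaves older commits untouched. File names are hypothetical.
set -e
export HOME="$(mktemp -d)"
git config --global user.name 'Harry Q. Bovik'
git config --global user.email hqbovik@andrew.cmu.edu
git init -q "$HOME/rename"
cd "$HOME/rename"
echo "int x;" > old.c
git add old.c
git commit -q -m "Add old.c"
before=$(git rev-parse HEAD)          # ID of the commit before the rename

git mv old.c new.c
git commit -q -m "Rename old.c to new.c"

git rev-parse HEAD~1:old.c HEAD:new.c # prints the same blob ID twice
if [ "$(git rev-parse HEAD~1)" = "$before" ]; then
  echo "the earlier commit still has its original ID"
fi
git log --oneline --follow -- new.c   # both commits, across the rename
```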

Branches, tags, and refspecs -- oh my!

In this section, until now, you've seen only one kind of refspec -- a SHA1 hash of a commit. But in the quick-start above, you've worked with more types of refspecs; when you checked out a branch, you used the refspec that refers to the branch.
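Here is a sketch showing several different names for the same commit: a full SHA1, an abbreviated one, a branch name, a tag name, and HEAD. git rev-parse translates each into a full SHA1. The repository, branch, and tag are hypothetical scratch examples.

```shell
#!/bin/sh
# Sketch: several names that all resolve to one commit. The repository,
# branch, and tag names are hypothetical scratch examples.
set -e
export HOME="$(mktemp -d)"
git config --global user.name 'Harry Q. Bovik'
git config --global user.email hqbovik@andrew.cmu.edu
git config --global init.defaultBranch master
git init -q "$HOME/names"
cd "$HOME/names"
echo "shell works" > README
git add README
git commit -q -m "Shell runs"
git tag SHELL_RUNNING                 # a tag name
git checkout -q -b Experimental_COW   # a second branch at the same commit

full=$(git rev-parse HEAD)            # the full SHA1
short=$(git rev-parse --short HEAD)   # an abbreviation of it

# Every line printed here is the same 40-character SHA1.
for name in "$full" "$short" master Experimental_COW SHELL_RUNNING HEAD; do
  git rev-parse "$name"
done
```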
