A tutorial introduction to git

Importing a new project

Assume you have a tarball project.tar.gz with your initial work. You can place it under git revision control as follows.

$ tar xzf project.tar.gz
$ cd project
$ git init-db

Git will reply

defaulting to local storage area

You've now initialized the working directory—you may notice a new directory created, named ".git". Tell git that you want it to track every file under the current directory with

$ git add .

Finally,

$ git commit -a

will prompt you for a commit message, then record the current state of all the files to the repository.
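The import steps above can be tried end to end with a small script. This is only a sketch: the temporary directory and the file names are made up to stand in for the unpacked tarball, the author identity is a placeholder, "git init" is used as the current spelling of "git init-db", and the message is passed with -m so that no editor opens.

```shell
set -eu
# Placeholder identity so the commit works on a machine with no git configuration.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

# Hypothetical project directory standing in for the unpacked tarball.
project=$(mktemp -d)
cd "$project"
echo "hello" > README
mkdir src
echo "int main(void) { return 0; }" > src/main.c

git init -q                        # creates the .git directory
git add .                          # track every file under the current directory
git commit -q -m "Initial import"  # -m supplies the message non-interactively

tracked=$(git ls-files | wc -l | tr -d ' ')
commits=$(git rev-list --count HEAD)
echo "tracked files: $tracked, commits: $commits"
```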

Try modifying some files, then run

$ git diff

to review your changes. When you're done,

$ git commit -a

will again prompt you for a message describing the change, and then record the new versions of the modified files.

A note on commit messages: Though not required, it's a good idea to begin the commit message with a single short (less than 50 characters) line summarizing the change, followed by a blank line and then a more thorough description. Tools that turn commits into email, for example, use the first line on the Subject line and the rest of the commit in the body.
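As an illustration of that layout, the following sketch feeds a message to "git commit -F -" on standard input and then reads the summary line and the body back separately. The file name, the message text, and the identity are invented for the example.

```shell
set -eu
# Placeholder identity so the commit works on a machine with no git configuration.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "v1" > notes.txt
git add notes.txt

# Summary line, blank line, then the longer description.
git commit -q -F - <<'EOF'
Add initial notes file

This is the more thorough description. Tools that turn commits
into email can use the first line as the Subject.
EOF

subject=$(git log -1 --format=%s)  # the first line only
body=$(git log -1 --format=%b)     # everything after the blank line
echo "subject: $subject"
```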

To add a new file, first create the file, then

$ git add path/to/new/file

then commit as usual. No special command is required when removing a file; just remove it, then commit.
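A short sketch of both cases, adding one file and removing another. The file names are hypothetical and the messages are given with -m to keep the script non-interactive.

```shell
set -eu
# Placeholder identity so the commits work on a machine with no git configuration.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "old" > obsolete.txt
echo "keep" > kept.txt
git add .
git commit -q -m "Initial files"

# Add a new file: create it, tell git about it, then commit.
echo "new" > added.txt
git add added.txt
git commit -q -a -m "Add added.txt"

# Remove a file: delete it from the working directory, then commit with -a.
rm obsolete.txt
git commit -q -a -m "Remove obsolete.txt"

echo "files now tracked:"
git ls-files
```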

At any point you can view the history of your changes using

$ git log

If you also want to see complete diffs at each step, use

$ git log -p

Managing branches

A single git repository can maintain multiple branches of development. To create a new branch named "experimental", use

$ git branch experimental

If you now run

$ git branch

you'll get a list of all existing branches:

  experimental
* master

The "experimental" branch is the one you just created, and the "master" branch is a default branch that was created for you automatically. The asterisk marks the branch you are currently on; type

$ git checkout experimental

to switch to the experimental branch. Now edit a file, commit the change, and switch back to the master branch:

(edit file)
$ git commit -a
$ git checkout master

Check that the change you made is no longer visible, since it was made on the experimental branch and you're back on the master branch.

You can make a different change on the master branch:

(edit file)
$ git commit -a

At this point the two branches have diverged, with different changes made in each. To merge the changes made in the two branches, run

$ git pull . experimental

If the changes don't conflict, you're done. If there are conflicts, markers will be left in the problematic files showing the conflict;

$ git diff

will show this. Once you've edited the files to resolve the conflicts,

$ git commit -a

will commit the result of the merge. Finally,

$ gitk

will show a nice graphical representation of the resulting history.
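The conflicting case can be reproduced with the sketch below. It makes two assumptions beyond the text. First, it runs "git merge experimental" directly rather than "git pull . experimental", because some recent versions of git want to be told how to reconcile diverged branches when pulling. Second, it reads the name of the default branch from the repository instead of assuming "master". The file contents and identity are made up.

```shell
set -eu
# Placeholder identity so the commits work on a machine with no git configuration.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "original line" > file.txt
git add file.txt
git commit -q -m "Initial commit"
main_branch=$(git symbolic-ref --short HEAD)  # "master" in the text above

git branch experimental
git checkout -q experimental
echo "experimental line" > file.txt
git commit -q -a -m "Change on experimental"

git checkout -q "$main_branch"
echo "mainline line" > file.txt
git commit -q -a -m "Change on $main_branch"

# Both branches changed the same line, so the merge stops with a conflict.
git merge experimental >/dev/null 2>&1 || echo "merge reported a conflict"
markers=$(grep -c '^<<<<<<<' file.txt)

# Resolve by editing the file, then commit the result of the merge.
echo "combined line" > file.txt
git commit -q -a -m "Merge experimental, resolving the conflict"

parents=$(git log -1 --format=%P | wc -w | tr -d ' ')
echo "conflict markers seen: $markers, parents of the merge commit: $parents"
```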

If you develop on a branch crazy-idea, then regret it, you can always delete the branch with

$ git branch -D crazy-idea

Branches are cheap and easy, so this is a good way to try something out.
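A minimal sketch of that try-and-discard workflow follows. The file contents are invented, and the default branch name is read from the repository rather than assumed to be "master".

```shell
set -eu
# Placeholder identity so the commits work on a machine with no git configuration.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "stable" > file.txt
git add file.txt
git commit -q -m "Initial commit"
main_branch=$(git symbolic-ref --short HEAD)

# Try something out on its own branch.
git branch crazy-idea
git checkout -q crazy-idea
echo "wild experiment" >> file.txt
git commit -q -a -m "Try a crazy idea"

# Regret it: switch back and delete the branch, unmerged commits included.
git checkout -q "$main_branch"
git branch -q -D crazy-idea

remaining=$(git branch --list crazy-idea)
content=$(cat file.txt)
echo "file.txt now contains: $content"
```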

Using git for collaboration

Suppose that Alice has started a new project with a git repository in /home/alice/project, and that Bob, who has a home directory on the same machine, wants to contribute.

Bob begins with:

$ git clone /home/alice/project myrepo

This creates a new directory "myrepo" containing a clone of Alice's repository. The clone is on an equal footing with the original project, possessing its own copy of the original project's history.

Bob then makes some changes and commits them:

(edit files)
$ git commit -a
(repeat as necessary)

When he's ready, he tells Alice to pull changes from the repository at /home/bob/myrepo. She does this with:

$ cd /home/alice/project
$ git pull /home/bob/myrepo

This actually pulls changes from the branch in Bob's repository named "master". Alice could request a different branch by adding the name of the branch to the end of the git pull command line.

This merges Bob's changes into her repository; "git log" will now show the new commits. If Alice has made her own changes in the meantime, then Bob's changes will be merged in, and she will need to manually fix any conflicts.

A more cautious Alice might wish to examine Bob's changes before pulling them. She can do this by creating a temporary branch just for the purpose of studying Bob's changes:

$ git fetch /home/bob/myrepo master:bob-incoming

which fetches the changes from Bob's master branch into a new branch named bob-incoming. (Unlike git pull, git fetch just fetches a copy of Bob's line of development without doing any merging.) Then

$ git log -p master..bob-incoming

shows a list of all the changes that Bob made since he branched from Alice's master branch.

After examining those changes, and possibly fixing things, Alice can pull the changes into her master branch:

$ git checkout master
$ git pull . bob-incoming

The last command is a pull from the "bob-incoming" branch in Alice's own repository.
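The cautious workflow can be sketched on one machine as follows. The directories under a temporary "home" stand in for /home/alice/project and /home/bob/myrepo, the identity is a placeholder shared by both users, and the branch name is read from Alice's repository instead of assuming "master". The final step uses "git merge bob-incoming", which is assumed here to be equivalent to the "git pull . bob-incoming" shown above.

```shell
set -eu
# Placeholder identity so the commits work on a machine with no git configuration.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

home=$(mktemp -d)  # hypothetical stand-in for /home
mkdir -p "$home/alice/project" "$home/bob"

# Alice starts the project.
cd "$home/alice/project"
git init -q
echo "alice's work" > file.txt
git add file.txt
git commit -q -m "Alice: initial commit"
branch=$(git symbolic-ref --short HEAD)

# Bob clones Alice's repository and commits a change.
git clone -q "$home/alice/project" "$home/bob/myrepo"
cd "$home/bob/myrepo"
echo "bob's work" >> file.txt
git commit -q -a -m "Bob: add a line"

# Alice fetches Bob's branch into a temporary branch and reviews it.
cd "$home/alice/project"
git fetch -q "$home/bob/myrepo" "$branch:bob-incoming"
incoming=$(git log --format=%s "$branch..bob-incoming")
echo "incoming: $incoming"

# Satisfied, she merges the temporary branch into her own branch.
git merge -q bob-incoming
merged=$(git log -1 --format=%s)
```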

Later, Bob can update his repo with Alice's latest changes using

$ git pull

Note that he doesn't need to give the path to Alice's repository; when Bob cloned Alice's repository, git stored the location of her repository in the file .git/remotes/origin, and that location is used as the default for pulls.

Bob may also notice a branch in his repository that he didn't create:

$ git branch
* master
  origin

The "origin" branch, which was created automatically by "git clone", is a pristine copy of Alice's master branch; Bob should never commit to it.

If Bob later decides to work from a different host, he can still perform clones and pulls using the ssh protocol:

$ git clone alice.org:/home/alice/project myrepo

Alternatively, git has a native protocol, or can use rsync or http; see git-pull(1) for details.

Git can also be used in a CVS-like mode, with a central repository that various users push changes to; see git-push(1) and git for CVS users.

Exploring history

Git history is represented as a series of interrelated commits. We have already seen that the git log command can list those commits. Note that the first line of each git log entry also gives a name for the commit:

$ git log
commit c82a22c39cbc32576f64f5c6b3f24b99ea8149c7
Author: Junio C Hamano <junkio@cox.net>
Date:   Tue May 16 17:18:22 2006 -0700

    merge-base: Clarify the comments on post processing.

We can give this name to git show to see the details about this commit.

$ git show c82a22c39cbc32576f64f5c6b3f24b99ea8149c7

But there are other ways to refer to commits. You can use any initial part of the name that is long enough to uniquely identify the commit:

$ git show c82a22c39c   # the first few characters of the name are
                        # usually enough
$ git show HEAD         # the tip of the current branch
$ git show experimental # the tip of the "experimental" branch

Every commit except the first has at least one "parent" commit, which points to the previous state of the project:

$ git show HEAD^  # to see the parent of HEAD
$ git show HEAD^^ # to see the grandparent of HEAD
$ git show HEAD~4 # to see the great-great grandparent of HEAD

Note that merge commits may have more than one parent:

$ git show HEAD^1 # show the first parent of HEAD (same as HEAD^)
$ git show HEAD^2 # show the second parent of HEAD
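These ways of naming commits can be checked against a small linear history. In the sketch below the three commits and their messages are made up, and "git rev-parse" is used only to turn each name into the full commit name so that the names can be compared.

```shell
set -eu
# Placeholder identity so the commits work on a machine with no git configuration.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
for n in 1 2 3; do
    echo "revision $n" > file.txt
    git add file.txt
    git commit -q -m "Commit $n"
done

parent=$(git log -1 --format=%s HEAD^)        # one step back
grandparent=$(git log -1 --format=%s HEAD^^)  # two steps back

# HEAD^^ and HEAD~2 are two spellings of the same commit.
by_carets=$(git rev-parse HEAD^^)
by_tilde=$(git rev-parse HEAD~2)

# An abbreviated name resolves to the same commit as the full name.
short=$(git rev-parse --short HEAD)
full_from_short=$(git rev-parse "$short")

echo "parent: $parent, grandparent: $grandparent"
```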

You can also give commits names of your own; after running

$ git tag v2.5 1b2e1d63ff

you can refer to 1b2e1d63ff by the name "v2.5". If you intend to share this name with other people (for example, to identify a release version), you should create a "tag" object, and perhaps sign it; see git-tag(1) for details.
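The difference between a plain name and a "tag" object can be seen with "git cat-file -t", which prints the type of the object a name refers to. The sketch below assumes "git tag -a" as the way to create the tag object (signing is left out), and the tag names and message are illustrative.

```shell
set -eu
# Placeholder identity so the commit and tag work on a machine with no git configuration.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "release" > file.txt
git add file.txt
git commit -q -m "Release candidate"

git tag v2.5 HEAD                               # plain name for the commit
git tag -a v2.5-shared -m "Version 2.5" HEAD    # creates a separate tag object

plain=$(git cat-file -t v2.5)
shared=$(git cat-file -t v2.5-shared)
echo "v2.5 names a $plain, v2.5-shared names a $shared"
```

Both names can still be used wherever a commit is expected; the tag object simply carries its own message and tagger in addition to pointing at the commit.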

Any git command that needs to know a commit can take any of these names. For example:

$ git diff v2.5 HEAD     # compare the current HEAD to v2.5
$ git branch stable v2.5 # start a new branch named "stable" based
                         # at v2.5
$ git reset --hard HEAD^ # reset your current branch and working
                         # directory to its state at HEAD^

Be careful with that last command: in addition to losing any changes in the working directory, it will also remove all later commits from this branch. If this branch is the only branch containing those commits, they will be lost. (Also, don't use "git reset" on a publicly-visible branch that other developers pull from, as git will be confused by history that disappears in this way.)

The git grep command can search for strings in any version of your project, so

$ git grep "hello" v2.5

searches for all occurrences of "hello" in v2.5.

If you leave out the commit name, git grep will search any of the files it manages in your current directory. So

$ git grep "hello"

is a quick way to search just the files that are tracked by git.
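Both forms of the search can be compared in a throwaway repository. In this sketch the file names and contents are invented; one file is deliberately never added, to show that it is not searched.

```shell
set -eu
# Placeholder identity so the commits work on a machine with no git configuration.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "hello world" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
git tag v2.5

echo "hello there" > greeting.txt
git commit -q -a -m "Reword greeting"
echo "hello again" > untracked.txt   # never added, so git does not manage it

in_tag=$(git grep "hello" v2.5)  # searches the files as they were at v2.5
in_tree=$(git grep "hello")      # searches tracked files in the working directory
echo "$in_tag"
echo "$in_tree"
```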

Many git commands also take sets of commits, which can be specified in a number of ways. Here are some examples with git log:

$ git log v2.5..v2.6            # commits between v2.5 and v2.6
$ git log v2.5..                # commits since v2.5
$ git log --since="2 weeks ago" # commits from the last 2 weeks
$ git log v2.5.. Makefile       # commits since v2.5 which modify
                                # Makefile

You can also give git log a "range" of commits where the first is not necessarily an ancestor of the second; for example, if the tips of the branches "stable" and "experimental" diverged from a common commit some time ago, then

$ git log stable..experimental

will list commits made in the experimental branch but not in the stable branch, while

$ git log experimental..stable

will show the list of commits made on the stable branch but not the experimental branch.
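The asymmetry of the two ranges is easy to confirm on a small diverged history. The following is a sketch with invented commit messages; both branches are created from the same starting commit and then receive one commit each.

```shell
set -eu
# Placeholder identity so the commits work on a machine with no git configuration.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "base" > file.txt
git add file.txt
git commit -q -m "Common ancestor"

# Two branches that start at the same commit and then diverge.
git branch stable
git branch experimental

git checkout -q stable
echo "fix" >> file.txt
git commit -q -a -m "Fix on stable"

git checkout -q experimental
echo "feature" >> file.txt
git commit -q -a -m "Feature on experimental"

only_experimental=$(git log --format=%s stable..experimental)
only_stable=$(git log --format=%s experimental..stable)
echo "stable..experimental: $only_experimental"
echo "experimental..stable: $only_stable"
```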

The "git log" command has a weakness: it must present commits in a list. When the history has lines of development that diverged and then merged back together, the order in which "git log" presents those commits is meaningless.

Most projects with multiple contributors (such as the Linux kernel, or git itself) have frequent merges, and gitk does a better job of visualizing their history. For example,

$ gitk --since="2 weeks ago" drivers/

allows you to browse any commits from the last 2 weeks that modified files under the "drivers" directory. (Note: you can adjust gitk's fonts by holding down the control key while pressing "-" or "+".)

Finally, most commands that take filenames will optionally allow you to precede any filename by a commit, to specify a particular version of the file:

$ git diff v2.5:Makefile HEAD:Makefile.in

You can also use "git cat-file -p" to see any such file:

$ git cat-file -p v2.5:Makefile
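A sketch of the commit:filename syntax in a throwaway repository follows. The Makefile contents are made up, and for simplicity the same file name is compared at two commits.

```shell
set -eu
# Placeholder identity so the commits work on a machine with no git configuration.
export GIT_AUTHOR_NAME="A U Thor" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="A U Thor" GIT_COMMITTER_EMAIL="author@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "CC = cc" > Makefile
git add Makefile
git commit -q -m "Add Makefile"
git tag v2.5

echo "CC = gcc" > Makefile
git commit -q -a -m "Switch compiler"

old=$(git cat-file -p v2.5:Makefile)             # the file as of v2.5
new=$(git show HEAD:Makefile)                    # the file as of the current commit
changes=$(git diff v2.5:Makefile HEAD:Makefile)  # diff between the two versions

echo "old: $old"
echo "new: $new"
```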

Next steps

This tutorial should be enough to perform basic distributed revision control for your projects. However, to fully understand the depth and power of git you need to understand two simple ideas on which it is based:

The object database is the rather elegant system used to store the history of your project—files, directories, and commits.
The index file is a cache of the state of a directory tree, used to create commits, check out working directories, and hold the various trees involved in a merge.

Part two of this tutorial explains the object database, the index file, and a few other odds and ends that you'll need to make the most of git.

If you don't want to continue with that right away, a few other digressions that may be interesting at this point are:

git-format-patch(1), git-am(1): These convert series of git commits into emailed patches, and vice versa, which is useful for projects such as the Linux kernel that rely heavily on emailed patches.
git-bisect(1): When there is a regression in your project, one way to track down the bug is by searching through the history to find the exact commit that's to blame. Git bisect can help you perform a binary search for that commit. It is smart enough to perform a close-to-optimal search even in the case of complex non-linear history with lots of merged branches.
Everyday GIT with 20 Commands Or So
git for CVS users.