- BLOB: A "blob" object is nothing but a binary blob of data, and
- doesn't refer to anything else. There is no signature or any
- other verification of the data, so while the object is
- consistent (it _is_ indexed by its sha1 hash, so the data itself
- is certainly correct), it has absolutely no other attributes.
- No name associations, no permissions. It is purely a blob of
- data (i.e. normally "file contents").
-
- In particular, since the blob is entirely defined by its data,
- if two files in a directory tree (or in multiple different
- versions of the repository) have the same contents, they will
- share the same blob object. The object is totally independent
- of it's location in the directory tree, and renaming a file does
- not change the object that file is associated with in any way.
-
- TREE: The next hierarchical object type is the "tree" object. A tree
- object is a list of mode/name/blob data, sorted by name.
- Alternatively, the mode data may specify a directory mode, in
- which case instead of naming a blob, that name is associated
- with another TREE object.
-
- Like the "blob" object, a tree object is uniquely determined by
- the set contents, and so two separate but identical trees will
- always share the exact same object. This is true at all levels,
- i.e. it's true for a "leaf" tree (which does not refer to any
- other trees, only blobs) as well as for a whole subdirectory.
-
- For that reason a "tree" object is just a pure data abstraction:
- it has no history, no signatures, no verification of validity,
- except that since the contents are again protected by the hash
- itself, we can trust that the tree is immutable and its contents
- never change.
-
- So you can trust the contents of a tree to be valid, the same
- way you can trust the contents of a blob, but you don't know
- where those contents _came_ from.
-
- Side note on trees: since a "tree" object is a sorted list of
- "filename+content", you can create a diff between two trees
- without actually having to unpack two trees. Just ignore all
- common parts, and your diff will look right. In other words,
- you can effectively (and efficiently) tell the difference
- between any two random trees by O(n) where "n" is the size of
- the difference, rather than the size of the tree.
-
- Side note 2 on trees: since the name of a "blob" depends
- entirely and exclusively on its contents (i.e. there are no names
- or permissions involved), you can see trivial renames or
- permission changes by noticing that the blob stayed the same.
- However, renames with data changes need a smarter "diff" implementation.
-
-CHANGESET: The "changeset" object is an object that introduces the
- notion of history into the picture. In contrast to the other
- objects, it doesn't just describe the physical state of a tree,
- it describes how we got there, and why.
-
- A "changeset" is defined by the tree-object that it results in,
- the parent changesets (zero, one or more) that led up to that
- point, and a comment on what happened. Again, a changeset is
- not trusted per se: the contents are well-defined and "safe" due
- to the cryptographically strong signatures at all levels, but
- there is no reason to believe that the tree is "good" or that
- the merge information makes sense. The parents do not have to
- actually have any relationship with the result, for example.
-
- Note on changesets: unlike real SCM's, changesets do not contain
- rename information or file mode change information. All of that
- is implicit in the trees involved (the result tree, and the
- result trees of the parents), and describing that makes no sense
- in this idiotic file manager.
-
-TRUST: The notion of "trust" is really outside the scope of "git", but
- it's worth noting a few things. First off, since everything is
- hashed with SHA1, you _can_ trust that an object is intact and
- has not been messed with by external sources. So the name of an
- object uniquely identifies a known state - just not a state that
- you may want to trust.
-
- Furthermore, since the SHA1 signature of a changeset refers to
- the SHA1 signatures of the tree it is associated with and the
- signatures of the parent, a single named changeset specifies
- uniquely a whole set of history, with full contents. You can't
- later fake any step of the way once you have the name of a
- changeset.
-
- So to introduce some real trust in the system, the only thing
- you need to do is to digitally sign just _one_ special note,
- which includes the name of a top-level changeset. Your digital
- signature shows others that you trust that changeset, and the
- immutability of the history of changesets tells others that they
- can trust the whole history.
-
- In other words, you can easily validate a whole archive by just
- sending out a single email that tells the people the name (SHA1
- hash) of the top changeset, and digitally sign that email using
- something like GPG/PGP.
-
- In particular, you can also have a separate archive of "trust
- points" or tags, which document your (and other peoples) trust.
- You may, of course, archive these "certificates of trust" using
- "git" itself, but it's not something "git" does for you.
-
-Another way of saying the last point: "git" itself only handles content
-integrity, the trust has to come from outside.
-
-
-
- The "index" aka "Current Directory Cache" (".git/index")
-
-
+Blob Object
+~~~~~~~~~~~
+A "blob" object is nothing but a binary blob of data, and doesn't
+refer to anything else. There is no signature or any other
+verification of the data, so while the object is consistent (it _is_
+indexed by its SHA1 hash, so the data itself is certainly correct), it
+has absolutely no other attributes. No name associations, no
+permissions. It is purely a blob of data (i.e. normally "file
+contents").
+
+In particular, since the blob is entirely defined by its data, if two
+files in a directory tree (or in multiple different versions of the
+repository) have the same contents, they will share the same blob
+object. The object is totally independent of its location in the
+directory tree, and renaming a file does not change the object that
+file is associated with in any way.
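Here is a minimal Python sketch of that content addressing. It assumes the object format used by present-day git: the header `blob <size>`, a NUL byte, then the raw bytes. The early format described in this text may differ in detail, and the helper name `blob_name` is hypothetical. Because the path never enters the hash, identical contents always get the same name:

```python
import hashlib

def blob_name(data: bytes) -> str:
    """SHA-1 name of a blob, assuming modern git's "blob <size>\\0<data>" format."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()

# The path is only a dictionary key here; it is never hashed.
files = {
    "src/a.txt": b"hello\n",
    "docs/renamed.txt": b"hello\n",
    "src/b.txt": b"world\n",
}
names = {path: blob_name(data) for path, data in files.items()}
print(names["src/a.txt"] == names["docs/renamed.txt"])  # True
print(names["src/a.txt"] == names["src/b.txt"])         # False
```

With this format the empty blob hashes to `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, which is the name current git reports for an empty file.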
+
+Tree Object
+~~~~~~~~~~~
+The next hierarchical object type is the "tree" object. A tree object
+is a list of mode/name/blob data, sorted by name. Alternatively, the
+mode data may specify a directory mode, in which case instead of
+naming a blob, that name is associated with another TREE object.
+
+Like the "blob" object, a tree object is uniquely determined by its
+contents, and so two separate but identical trees will always
+share the exact same object. This is true at all levels, i.e. it's
+true for a "leaf" tree (which does not refer to any other trees, only
+blobs) as well as for a whole subdirectory.
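The following simplified sketch shows why identical trees share one object. A tree's name is the SHA-1 of its entries, serialized in name order, and a subdirectory entry carries the name of its own tree. The text-line serialization and the helpers `hash_object` and `tree_name` are assumptions for illustration. They are not byte-compatible with git's real tree format:

```python
import hashlib

def hash_object(kind: str, payload: bytes) -> str:
    """SHA-1 of a "<kind> <size>\\0" header followed by the payload."""
    header = b"%s %d\x00" % (kind.encode(), len(payload))
    return hashlib.sha1(header + payload).hexdigest()

def tree_name(entries) -> str:
    """Name a tree from (mode, name, object name) entries.
    Entries are sorted by name first, so input order does not matter.
    Simplified text serialization, not git's binary tree format."""
    body = "".join(f"{mode} {name} {obj}\n"
                   for mode, name, obj in sorted(entries, key=lambda e: e[1]))
    return hash_object("tree", body.encode())

readme = hash_object("blob", b"Some docs\n")
lib_a = tree_name([("100644", "README", readme)])
lib_b = tree_name([("100644", "README", readme)])  # built independently
root1 = tree_name([("040000", "lib", lib_a), ("100644", "README", readme)])
root2 = tree_name([("100644", "README", readme), ("040000", "lib", lib_b)])
print(lib_a == lib_b, root1 == root2)  # True True
```

Since the subtree's name is part of its parent's contents, equality propagates upward. This is why the same holds for leaf trees and for whole subdirectories.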
+
+For that reason a "tree" object is just a pure data abstraction: it
+has no history, no signatures, no verification of validity, except
+that since the contents are again protected by the hash itself, we can
+trust that the tree is immutable and its contents never change.
+
+So you can trust the contents of a tree to be valid, the same way you
+can trust the contents of a blob, but you don't know where those
+contents _came_ from.
+
+Side note on trees: since a "tree" object is a sorted list of
+"filename+content", you can create a diff between two trees without
+actually having to unpack two trees. Just ignore all common parts,
+and your diff will look right. In other words, you can effectively
+(and efficiently) tell the difference between any two random trees in
+O(n) time, where "n" is the size of the difference rather than the
+size of the tree.
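The side note can be sketched as a merge walk over two name-sorted entry lists. The walk descends only into subdirectories whose object names differ, so an unchanged subdirectory is skipped after a single comparison, whatever its size. The store layout, the placeholder object names, and the `DIR_MODE` marker are hypothetical:

```python
DIR_MODE = "040000"  # hypothetical mode marking a subtree entry

def diff_trees(store, old, new, prefix=""):
    """Yield (path, old_obj, new_obj) for each differing entry.
    store maps a tree name to its (mode, name, obj) entries, sorted by name.
    None marks an added or removed entry."""
    a, b = store[old], store[new]
    i = j = 0
    while i < len(a) or j < len(b):
        ea = a[i] if i < len(a) else None
        eb = b[j] if j < len(b) else None
        if eb is None or (ea is not None and ea[1] < eb[1]):
            yield prefix + ea[1], ea[2], None      # only in old: removed
            i += 1
        elif ea is None or eb[1] < ea[1]:
            yield prefix + eb[1], None, eb[2]      # only in new: added
            j += 1
        else:                                      # same name on both sides
            if ea != eb:                           # identical entries are skipped
                if ea[0] == DIR_MODE and eb[0] == DIR_MODE:
                    # Subtree names differ: descend into this directory only.
                    yield from diff_trees(store, ea[2], eb[2], prefix + ea[1] + "/")
                else:
                    yield prefix + ea[1], ea[2], eb[2]
            i += 1
            j += 1

store = {
    "root_old": [("100644", "Makefile", "b1"), ("040000", "docs", "docs1"), ("040000", "src", "src_old")],
    "root_new": [("100644", "Makefile", "b1"), ("040000", "docs", "docs1"), ("040000", "src", "src_new")],
    "src_old": [("100644", "main.c", "b2")],
    "src_new": [("100644", "main.c", "b3"), ("100644", "util.c", "b4")],
    "docs1": [("100644", "README", "b5")],
}
changes = list(diff_trees(store, "root_old", "root_new"))
print(changes)  # [('src/main.c', 'b2', 'b3'), ('src/util.c', None, 'b4')]
```

The walk never expands `docs`, because both trees name the same `docs1` object. In this sketch, each directory the walk does visit is still scanned in full. The work therefore follows the changed paths and the directories along them, not the whole tree.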
+
+Side note 2 on trees: since the name of a "blob" depends entirely and
+exclusively on its contents (i.e. there are no names or permissions
+involved), you can see trivial renames or permission changes by
+noticing that the blob stayed the same. However, renames with data
+changes need a smarter "diff" implementation.
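Side note 2 amounts to comparing blob names. The minimal sketch below assumes both trees have already been flattened into hypothetical `path -> (mode, blob name)` maps, with placeholder blob names. As the text says, it finds only the trivial cases and leaves renames with edits to a smarter diff:

```python
def trivial_changes(old, new):
    """old, new: dicts mapping path -> (mode, blob name).
    Returns (renames, mode_changes) detectable purely by comparing
    blob names; content edits are not examined."""
    renames, mode_changes = [], []
    removed = {}  # blob name -> paths that disappeared
    for path, (mode, blob) in sorted(old.items()):
        if path in new:
            new_mode, new_blob = new[path]
            if new_blob == blob and new_mode != mode:
                mode_changes.append((path, mode, new_mode))
        else:
            removed.setdefault(blob, []).append(path)
    for path, (_mode, blob) in sorted(new.items()):
        if path not in old and removed.get(blob):
            # Same blob under a new path: a trivial rename.
            renames.append((removed[blob].pop(0), path))
    return renames, mode_changes

old = {"hello.c": ("100644", "b1"), "run.sh": ("100644", "b2"), "notes": ("100644", "b3")}
new = {"greet.c": ("100644", "b1"), "run.sh": ("100755", "b2"), "notes": ("100644", "b4")}
renames, mode_changes = trivial_changes(old, new)
print(renames)       # [('hello.c', 'greet.c')]
print(mode_changes)  # [('run.sh', '100644', '100755')]
```

The edited `notes` file shows up in neither list, because its blob name changed.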
+
+
+Changeset Object
+~~~~~~~~~~~~~~~~
+The "changeset" object is an object that introduces the notion of
+history into the picture. In contrast to the other objects, it
+doesn't just describe the physical state of a tree; it describes how
+we got there, and why.
+
+A "changeset" is defined by the tree-object that it results in, the
+parent changesets (zero, one or more) that led up to that point, and a
+comment on what happened. Again, a changeset is not trusted per se:
+the contents are well-defined and "safe" due to the cryptographically
+strong hashes at all levels, but there is no reason to believe
+that the tree is "good" or that the merge information makes sense.
+The parents do not have to actually have any relationship with the
+result, for example.
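The sketch below derives a changeset's name from exactly the three things listed above. The layout is modeled loosely on git's commit format: a `tree` line, one `parent` line per parent, a blank line, then the comment. It omits the author and committer lines that real commits carry, so treat it as an assumption. Note that nothing here checks whether the parents are related to the tree:

```python
import hashlib

def changeset_name(tree, parents, comment):
    """Hypothetical helper: SHA-1 over the tree name, the parent names
    (in order) and the comment, behind a "commit <size>\\0" header."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    data = ("\n".join(lines) + "\n\n" + comment + "\n").encode()
    return hashlib.sha1(b"commit %d\x00" % len(data) + data).hexdigest()

root = changeset_name("tree-1", [], "Initial import")              # zero parents
fix = changeset_name("tree-2", [root], "Fix the build")            # one parent
docs = changeset_name("tree-3", [root], "Add docs")
merge = changeset_name("tree-4", [fix, docs], "Merge the docs")    # two parents

# The same tree and comment on top of a different parent is a different changeset.
print(changeset_name("tree-2", [docs], "Fix the build") != fix)  # True
```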
+
+Note on changesets: unlike real SCMs, changesets do not contain
+rename information or file mode change information. All of that is
+implicit in the trees involved (the result tree, and the result trees
+of the parents), and describing that makes no sense in this idiotic
+file manager.
+
+Trust
+~~~~~
+The notion of "trust" is really outside the scope of "git", but it's
+worth noting a few things. First off, since everything is hashed with
+SHA1, you _can_ trust that an object is intact and has not been messed
+with by external sources. So the name of an object uniquely
+identifies a known state - just not a state that you may want to
+trust.
+
+Furthermore, since the SHA1 hash of a changeset covers the SHA1 hashes
+of the tree it is associated with and of its parents, a single named
+changeset uniquely specifies a whole history, with full contents. You
+can't later fake any step of the way once you have the name of a
+changeset.
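Because each name covers the names of its parents, one trusted name is enough to verify everything reachable from it. The minimal sketch below uses a simplified, hypothetical changeset layout: a `tree` line, `parent` lines, a blank line, and the comment. This is not git's exact format. It also uses a hypothetical in-memory `store` mapping names to `(tree, parents, comment)`. A complete check would also recompute the tree and blob names:

```python
import hashlib

def changeset_name(tree, parents, comment):
    # Simplified, hypothetical layout: tree line, parent lines, blank line, comment.
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    data = ("\n".join(lines) + "\n\n" + comment + "\n").encode()
    return hashlib.sha1(b"commit %d\x00" % len(data) + data).hexdigest()

def verify_history(store, top):
    """Recompute the name of every changeset reachable from `top` and
    check that each one matches the name it is stored under."""
    seen, stack = set(), [top]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        tree, parents, comment = store[name]
        if changeset_name(tree, parents, comment) != name:
            return False  # contents no longer match the name: tampered
        stack.extend(parents)
    return True

store = {}

def add(tree, parents, comment):
    name = changeset_name(tree, parents, comment)
    store[name] = (tree, parents, comment)
    return name

root = add("tree-1", [], "Initial import")
top = add("tree-2", [root], "Fix the build")
before = verify_history(store, top)
print(before)  # True

# Rewrite an old comment but keep it filed under its original name.
store[root] = ("tree-1", [], "Rewritten history")
after = verify_history(store, top)
print(after)   # False
```

Refiling the rewritten changeset under its new name does not help either. The top changeset still lists the old parent name, and changing that list would change the top name itself.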
+
+So to introduce some real trust in the system, the only thing you need
+to do is to digitally sign just _one_ special note, which includes the
+name of a top-level changeset. Your digital signature shows others
+that you trust that changeset, and the immutability of the history of
+changesets tells others that they can trust the whole history.
+
+In other words, you can easily validate a whole archive by just
+sending out a single email that tells the people the name (SHA1 hash)
+of the top changeset, and digitally sign that email using something
+like GPG/PGP.
+
+In particular, you can also have a separate archive of "trust points"
+or tags, which document your (and other people's) trust. You may, of
+course, archive these "certificates of trust" using "git" itself, but
+it's not something "git" does for you.
+
+Another way of saying the last point: "git" itself only handles
+content integrity; the trust has to come from outside.
+
+
+
+
+The "index" aka "Current Directory Cache"
+-----------------------------------------