Git Data Model

This is already rather terse, so this is going to be even terserer!

Git has 4 things:

- objects: commits, trees, blobs, tags
- references: branches, tags, remote-tracking branches &c.
- the index: aka the staging area
- reflogs: logs of changes to references

objects

Git objects never ever change, and each has an ID which is the hash of its contents. Each of the 4 object types has its own required fields. A commit, for example, has a tree containing the file structure of everything in it, the parent commit id(s), an author, a committer and a commit message.
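To make "the ID is the hash of its contents" concrete, here is a minimal sketch assuming a classic SHA-1 repository. The hashed bytes are a small header (type and size) followed by the contents. The function name `git_object_id` is made up for these notes, it is not a git API.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way SHA-1 git repos do: "<type> <size>", a NUL byte, then the body."""
    header = f"{obj_type} {len(body)}".encode("ascii")
    return hashlib.sha1(header + b"\0" + body).hexdigest()


# Same bytes in, same id out, every time.
print(git_object_id("blob", b"hello world\n"))
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

That id should match what `echo 'hello world' | git hash-object --stdin` prints.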

Git objects can never be changed, so git commit --amend actually just creates a new commit with the same parent. The old one is still there until nothing refers to it any more and git cleans it up.
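A rough sketch of why amending has to produce a new commit: the message is part of the hashed contents, so changing it changes the id. The commit body here is simplified (real commits also have author and committer lines), and `commit_id` and the parent id are made up for illustration.

```python
import hashlib


def commit_id(tree: str, parent: str, message: str) -> str:
    # Simplified commit body: real commits also carry author and committer lines.
    body = f"tree {tree}\nparent {parent}\n\n{message}\n".encode()
    header = f"commit {len(body)}".encode()
    return hashlib.sha1(header + b"\0" + body).hexdigest()


tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of the empty tree
parent = "0" * 40                                   # made-up parent id

old = commit_id(tree, parent, "fix tpyo")
amended = commit_id(tree, parent, "fix typo")

# Same tree, same parent, different message: a different object entirely.
print(old != amended)  # True
```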

Trees are how git represents directories. They can contain files (blobs) or other trees. Each entry stores the filename, file type and object id.
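Here is a small sketch of how those entries could be packed into a tree body, assuming the SHA-1 format: mode, space, name, NUL byte, then the 20 raw bytes of the object id. Sorting by plain name is a simplification of git's real ordering, and the function names are hypothetical.

```python
import hashlib


def tree_body(entries):
    """entries are (mode, name, hex object id) tuples, e.g. ("100644", "README", "3b18...")."""
    out = b""
    # Simplification: git's real sort treats directory names as if they ended in "/".
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(hex_id)
    return out


def tree_id(entries):
    body = tree_body(entries)
    return hashlib.sha1(f"tree {len(body)}".encode() + b"\0" + body).hexdigest()


readme = ("100644", "README", "3b18e512dba79e4c8300dd08aeb37f8e728b8dad")
print(len(tree_body([readme])))  # 34 bytes: 13 for "100644 README", 1 NUL, 20 for the id
print(tree_id([]))               # the empty tree
```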

A blob object contains a file’s contents, whatever that is. Whenever you make a commit, git stores the entire contents of each file you changed as a new blob, and unchanged files just reuse the blobs they already have. So git is very space efficient when you have lots of small files and you don’t change most of them.
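The space saving falls out of the ids being content hashes. A toy model, assuming nothing about git's real storage beyond "objects are keyed by id" (the `ObjectStore` class is invented for these notes):

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: same contents, same id, stored once."""

    def __init__(self):
        self.objects = {}

    def put_blob(self, data: bytes) -> str:
        header = f"blob {len(data)}".encode()
        oid = hashlib.sha1(header + b"\0" + data).hexdigest()
        self.objects.setdefault(oid, data)  # no-op if we already have it
        return oid


store = ObjectStore()
# first commit: two files
v1 = {"a.txt": store.put_blob(b"alpha\n"), "b.txt": store.put_blob(b"beta\n")}
# second commit: only b.txt changed
v2 = {"a.txt": store.put_blob(b"alpha\n"), "b.txt": store.put_blob(b"beta v2\n")}

print(len(store.objects))  # 3 blobs, not 4: a.txt is shared by both commits
```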

Tag objects store the id of the thing they refer to, the type of that object, the tagger and a message. (Tags are what people use to tag release versions and so on.)

references

References are names for commits (easier than remembering a hash). They either refer to an object, or to another reference. A branch refers to the latest commit on that branch, and is automatically updated when you commit while that branch is checked out.

HEAD is the current branch: either a symbolic reference to that branch, or a direct reference to a commit ID if there is no current branch. that’s what DETACHED HEAD means: HEAD is not pointing at a branch, just a lone commit id.
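A little sketch of symbolic vs direct references, with refs modelled as a plain dict. The "ref: " prefix mirrors what a symbolic HEAD looks like on disk. The commit id, the branch name and the function names are all made up.

```python
def resolve(refs: dict, name: str, max_depth: int = 5) -> str:
    """Follow symbolic refs ("ref: <other name>") until we hit an object id."""
    for _ in range(max_depth):
        value = refs[name]
        if not value.startswith("ref: "):
            return value
        name = value[len("ref: "):]
    raise ValueError("too many levels of symbolic refs")


def is_detached(refs: dict) -> bool:
    # HEAD is detached when it holds a commit id directly instead of a branch name.
    return not refs["HEAD"].startswith("ref: ")


COMMIT = "1234567890abcdef1234567890abcdef12345678"  # made-up id

on_branch = {"HEAD": "ref: refs/heads/main", "refs/heads/main": COMMIT}
detached = {"HEAD": COMMIT, "refs/heads/main": COMMIT}

print(resolve(on_branch, "HEAD") == resolve(detached, "HEAD"))  # True: same commit
print(is_detached(on_branch), is_detached(detached))            # False True
```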

Remote-tracking branches also just store commit ids, which is how git knows if you are up to date with the remote or not (as of your last fetch). refs/remotes//HEAD is a symbolic reference to the remote’s default branch.

stash, bisect and third party tools can also create their own references. lots of fun.

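git will eventually delete unreachable objects. reachable objects will never be deleted. an object is reachable if you can get to it by starting from a reference and following ids (commit to parent, commit to tree, tree to blob). like garbage collection y’know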
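Reachability is just a graph walk starting from the references. A minimal sketch with a made-up object graph (the ids are readable names instead of hashes, and this ignores anything else that might keep objects alive):

```python
def reachable(objects: dict, roots: list) -> set:
    """objects maps id -> list of ids it refers to; roots are what the refs point at."""
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(objects[oid])
    return seen


objects = {
    "commit2": ["tree2", "commit1"],
    "commit1": ["tree1"],
    "tree2": ["blob_a", "blob_b2"],
    "tree1": ["blob_a", "blob_b"],
    "blob_a": [], "blob_b": [], "blob_b2": [],
    "orphan_commit": ["tree1"],  # nothing refers to this one
}
refs = {"refs/heads/main": "commit2"}

live = reachable(objects, list(refs.values()))
garbage = set(objects) - live
print(garbage)  # {'orphan_commit'}
```

Note that tree1 survives even though the orphan points at it, because commit1 still needs it.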

the index

aka the staging area. this is a list of files and contents thereof (blobs). git add adds things to the staging area. unlike a tree, the index is flat. when you commit, git converts the index into a tree and uses that tree in the new commit. nifty.
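Flat index in, nested trees out. A sketch of that conversion, assuming the index is just a mapping from path to blob id (the real index holds more than that, and the blob ids here are placeholders):

```python
def index_to_tree(index: dict) -> dict:
    """Turn a flat {path: blob id} mapping into nested dicts, one per directory."""
    root = {}
    for path, blob_id in sorted(index.items()):
        *dirs, filename = path.split("/")
        node = root
        for d in dirs:
            node = node.setdefault(d, {})  # create the subtree on first use
        node[filename] = blob_id
    return root


index = {
    "README.md": "blob1",
    "src/main.py": "blob2",
    "src/util/io.py": "blob3",
}
tree = index_to_tree(index)
print(tree)
# {'README.md': 'blob1', 'src': {'main.py': 'blob2', 'util': {'io.py': 'blob3'}}}
```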

git status shows you a diff of the index and HEAD (staged changes), and of your working directory and the index (unstaged changes).

reflogs

each time you update a reference (a branch, a remote-tracking branch, or HEAD), git updates a log for that reference. this means if you lose a commit you can usually get it back via its id in the reflog as long as the objects still exist, which they probably will.
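A toy version of the idea: every ref update appends (old id, new id, message) to that ref's log, so an id you moved away from can still be looked up. The `Repo` class, the ids and the messages are invented for illustration.

```python
class Repo:
    """Toy model: refs plus a per-ref log of (old id, new id, message)."""

    def __init__(self):
        self.refs = {}
        self.reflogs = {}

    def update_ref(self, name, new_id, message):
        old_id = self.refs.get(name)  # None the first time a ref is created
        self.reflogs.setdefault(name, []).append((old_id, new_id, message))
        self.refs[name] = new_id


repo = Repo()
repo.update_ref("refs/heads/main", "aaa111", "commit: first")
repo.update_ref("refs/heads/main", "bbb222", "commit: second")
repo.update_ref("refs/heads/main", "aaa111", "reset: moving to aaa111")

# bbb222 is no longer what the branch points at, but the log remembers it
lost = repo.reflogs["refs/heads/main"][-1][0]
print(lost)  # bbb222
```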

reflogs do not go to the remote!

woah.

This is really well explained! i’m starting to understand what git is really. it’s very clever, i have to admit. maybe this linus guy knows his stuff.