We developers use git all the time. Git internals might feel like magic, but what git actually does is surprisingly simple. Let's first see how git works and then build a simple clone to really understand it.
Let's create an empty folder and initialize a git repo in it by running
$ git init
This creates a .git folder inside your empty folder, with the following structure:
.git/
├── branches
├── config
├── description
├── HEAD
├── hooks
├── info
│   └── exclude
├── objects
│   ├── info
│   └── pack
└── refs
    ├── heads
    └── tags
If you open the HEAD file in a text editor, you will see the following text:
ref: refs/heads/master
which means that your current branch is master.
Now add a file and make the first commit:
$ echo "Hello" >> README.md
$ git add .
$ git commit -m "Initial commit"
Then run
$ git log
and you will get
commit acde617e8ab39bb157821d3bf84d04e157bff52c (HEAD -> master)
Author: username <email@example.com>
Date:   Wed Aug 05 18:43:48 2020 +0330

    Initial commit
- The exact commit hash you get will differ from the one shown here, because it depends on your username, email, and the time at which you make the commit.
If you open .git/refs/heads/master, you will find
acde617e8ab39bb157821d3bf84d04e157bff52c
inside it.
In git, each commit is identified by a hash. The content of refs/heads/master means that the master branch points to that commit.
// TODO:
1. Make a second commit
2. Check `$ git log` and the content of **refs/heads/master**
After you have made your first commit, inspect the .git folder again and you will see some new entries:
.git/
├── branches
├── COMMIT_EDITMSG
├── config
├── description
├── HEAD
├── hooks
├── index
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           └── master
├── objects
│   ├── ac
│   │   └── de617e8ab39bb157821d3bf84d04e157bff52c
│   ├── dc
│   │   └── 0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f
│   ├── e9
│   │   └── 65047ad7c57865823c7d992b1d046ea66edf78
│   ├── info
│   └── pack
└── refs
    ├── heads
    │   └── master
    └── tags
There is a new file called index, and there are some strange-looking folders and files inside the objects folder (we will ignore the other new entries for now).
When you run
$ git add .
git takes the changes you have made and creates objects for them. Each object is named by the SHA-1 hash of its content, together with a short header giving the object type and size. SHA-1 takes any input and produces a 160-bit digest, written as a 40-character hexadecimal string.
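To make the naming rule concrete, here is a minimal Python sketch (`hash_blob` is a hypothetical name, not part of git). It assumes the loose-object format described above: the header `blob <size>` and a NUL byte are prepended to the file bytes before hashing.

```python
import hashlib


def hash_blob(data: bytes) -> str:
    """Return the git object name (SHA-1 in hex) for a blob with this content."""
    # git prepends "<type> <size in bytes>" and a NUL byte before hashing
    header = f"blob {len(data)}".encode() + b"\x00"
    return hashlib.sha1(header + data).hexdigest()
```

For example, `hash_blob(b"Hello\n")` (the bytes that `echo "Hello"` wrote to README.md) should give the same value as `git hash-object README.md`. This is also why a plain `sha1sum README.md` prints a different hash: the header is part of what git hashes.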
Let's compute the object hash of README.md the way git does, by running
$ git hash-object README.md
which will give you the output
e965047ad7c57865823c7d992b1d046ea66edf78
This matches the object stored under objects/e9, so that is where the content of README.md is stored: the first two characters of the hash are used as the folder name and the remaining 38 as the file name. The file 65047ad7c57865823c7d992b1d046ea66edf78 is zlib-compressed binary data, so to see its content we run
$ git cat-file -p e965047ad7c57865823c7d992b1d046ea66edf78
which prints
Hello
the content of your README.md!
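Under the hood, `git cat-file` has to undo two steps: decompress the file with zlib and strip the `<type> <size>` header. A minimal Python sketch of that, assuming the object is still stored loose in the objects folder (objects moved into packfiles are not handled); `parse_object` and `read_object` are hypothetical helper names:

```python
from __future__ import annotations

import zlib
from pathlib import Path


def parse_object(raw: bytes) -> tuple[str, bytes]:
    """Split a decompressed loose object into (type, content)."""
    # The header is "<type> <size>" terminated by a NUL byte
    header, _, content = raw.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(content):
        raise ValueError("object size does not match its header")
    return obj_type, content


def read_object(git_dir: Path, sha: str) -> tuple[str, bytes]:
    """Read a loose object: first two hex chars name the folder, the rest the file."""
    path = git_dir / "objects" / sha[:2] / sha[2:]
    return parse_object(zlib.decompress(path.read_bytes()))
```

In the example repository, `read_object(Path(".git"), "e965047ad7c57865823c7d992b1d046ea66edf78")` should return the type `"blob"` together with the bytes of README.md. The type in the header is the same information `git cat-file -t` reports.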
But what are the other two objects in the objects directory? Git has four types of objects: blob, tree, commit, and tag. A blob stores the content of a file; the object we just looked at is a blob. You can see the type of an object by running
$ git cat-file -t e965047ad7c57865823c7d992b1d046ea66edf78
which will print
blob
Now inspect the commit object by running
$ git cat-file -p acde617e8ab39bb157821d3bf84d04e157bff52c
and you will get
tree dc0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f
author username <email@example.com> 1597845162 +0530
committer username <email@example.com> 1597845162 +0530

Initial commit
That is our actual commit. It records the author, the committer, the commit message, and something called a tree, which is another git object.
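The commit content is plain text: header lines of the form `<key> <value>`, a blank line, then the message. A rough Python sketch of splitting it apart (`parse_commit` is a hypothetical name; it does not handle multi-line headers such as GPG signatures):

```python
from __future__ import annotations


def parse_commit(content: str) -> tuple[dict[str, list[str]], str]:
    """Split a commit object's content into header fields and the message."""
    # Headers and message are separated by the first blank line
    header_text, _, message = content.partition("\n\n")
    headers: dict[str, list[str]] = {}
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        # A key such as "parent" can appear more than once (merge commits)
        headers.setdefault(key, []).append(value)
    return headers, message
```

Our first commit has no `parent` line; every later commit records a `parent <hash>` line pointing at the commit before it, which is how git links history together.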
Let's see what that tree object contains by running
$ git cat-file -p dc0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f
100644 blob e965047ad7c57865823c7d992b1d046ea66edf78 README.md
Each line lists a file's mode, the object type, the hash of the blob holding the file's content, and the file name. In other words, the tree is a snapshot of the files in your project at that commit.
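`git cat-file -p` pretty-prints the tree. The stored tree content is binary: for each entry, the mode, a space, the file name, a NUL byte, and the 20 raw bytes of the object hash (not the 40-character hex form). A Python sketch of writing and reading that layout, with hypothetical helper names and the simplifying assumption that every entry is a plain file (git orders subdirectory entries slightly differently):

```python
from __future__ import annotations


def serialize_tree(entries: list[tuple[str, str, str]]) -> bytes:
    """Build raw tree content from (mode, name, hex sha) entries of plain files."""
    out = b""
    for mode, name, sha in sorted(entries, key=lambda e: e[1]):
        # Each entry: "<mode> <name>", a NUL byte, then 20 raw bytes of the hash
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(sha)
    return out


def parse_tree(data: bytes) -> list[tuple[str, str, str]]:
    """Inverse of serialize_tree: return (mode, name, hex sha) entries."""
    entries = []
    i = 0
    while i < len(data):
        nul = data.index(b"\x00", i)
        mode, name = data[i:nul].decode().split(" ", 1)
        sha = data[nul + 1 : nul + 21].hex()
        entries.append((mode, name, sha))
        i = nul + 21  # skip the NUL and the 20-byte hash
    return entries
```

Note that the raw entry contains no `blob` word: the type shown by `git cat-file -p` is not stored in the entry itself.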
Let's sum up what we have learned so far. When you add and commit a file, git stores its content as a blob object named by the SHA-1 hash of that content (plus a short header), a 40-character string. Then it creates a tree object, a snapshot of your project's files at that point in time, which records which blob belongs to which file name. Finally, it creates a commit object that points to this tree and also records the commit message and the author and committer, with their names, emails, and timestamps.
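The whole pipeline from that summary fits in a short Python sketch. It writes real loose objects (zlib-compressed, named by the SHA-1 of header plus content) into a directory you pass in. The helper names, the default identity string, and the `+0000` timezone are assumptions made for illustration; the sketch also skips the index and does not update any branch.

```python
from __future__ import annotations

import hashlib
import time
import zlib
from pathlib import Path


def write_object(git_dir: Path, obj_type: str, content: bytes) -> str:
    """Store content as a zlib-compressed loose object and return its SHA-1 name."""
    raw = f"{obj_type} {len(content)}".encode() + b"\x00" + content
    sha = hashlib.sha1(raw).hexdigest()
    path = git_dir / "objects" / sha[:2] / sha[2:]
    if not path.exists():  # same name means same content, so no need to rewrite
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(zlib.compress(raw))
    return sha


def commit_single_file(git_dir: Path, name: str, data: bytes, message: str,
                       ident: str = "username <email@example.com>") -> str:
    """Blob, then tree, then commit, for a one-file project with no parent commit."""
    blob = write_object(git_dir, "blob", data)
    tree_content = f"100644 {name}".encode() + b"\x00" + bytes.fromhex(blob)
    tree = write_object(git_dir, "tree", tree_content)
    stamp = f"{int(time.time())} +0000"  # assumption: always record UTC
    commit_text = (f"tree {tree}\n"
                   f"author {ident} {stamp}\n"
                   f"committer {ident} {stamp}\n"
                   f"\n{message}\n")
    return write_object(git_dir, "commit", commit_text.encode())
```

For identical content, the blob and tree hashes should match the ones git produces, but the commit hash will differ from the one in this article, because it includes the identity and the timestamp.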
// TODO
1. Make another commit
2. Inspect the contents of the .git folder
3. See which objects are now in the .git/objects folder
4. Look at the content of those objects
Now let's create a branch by running
$ git branch feature-1
Now take another look at the contents of the .git folder:
.git/
├── branches
├── COMMIT_EDITMSG
├── config
├── description
├── HEAD
├── hooks
├── index
├── info
│   └── exclude
├── logs
│   ├── HEAD
│   └── refs
│       └── heads
│           ├── feature-1
│           └── master
├── objects
│   ├── ac
│   │   └── de617e8ab39bb157821d3bf84d04e157bff52c
│   ├── dc
│   │   └── 0a29da1d9b3f68dcd56af0e34f8df4fbf8b24f
│   ├── e9
│   │   └── 65047ad7c57865823c7d992b1d046ea66edf78
│   ├── info
│   └── pack
└── refs
    ├── heads
    │   ├── feature-1
    │   └── master
    └── tags
There is now a new file called refs/heads/feature-1, and if we peek at its content, it holds the hash of the commit from which you created the branch.
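Creating a branch really is just writing a file. A minimal Python sketch with hypothetical helper names, assuming loose refs only (git can also store refs in a `packed-refs` file, which this ignores) and skipping the checks git performs on branch names and hashes:

```python
from __future__ import annotations

from pathlib import Path


def create_branch(git_dir: Path, name: str, commit_sha: str) -> None:
    """A branch is just a file under refs/heads containing a commit hash."""
    ref = git_dir / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_text(commit_sha + "\n")


def list_branches(git_dir: Path) -> dict[str, str]:
    """Map branch names to the commit hashes they point to (loose refs only)."""
    heads = git_dir / "refs" / "heads"
    return {p.relative_to(heads).as_posix(): p.read_text().strip()
            for p in heads.rglob("*") if p.is_file()}
```

Calling `create_branch` with the hash stored in refs/heads/master does roughly what `git branch feature-1` did above, minus the reflog entry git also writes under logs/.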
Now check out the feature-1 branch by running
$ git checkout feature-1
The content of our HEAD file changes to
ref: refs/heads/feature-1
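HEAD is usually a symbolic reference: it names a branch file rather than a commit. A Python sketch of following it, with hypothetical helper names. Unlike the real `git checkout`, `checkout_branch` here only rewrites HEAD and leaves the index and working directory alone.

```python
from __future__ import annotations

from pathlib import Path


def checkout_branch(git_dir: Path, name: str) -> None:
    """Point HEAD at a branch (only HEAD changes, not your files)."""
    (git_dir / "HEAD").write_text(f"ref: refs/heads/{name}\n")


def resolve_head(git_dir: Path) -> tuple[str | None, str]:
    """Return (branch ref or None, commit hash) that HEAD currently points to."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        ref = head[len("ref: "):]
        return ref, (git_dir / ref).read_text().strip()
    # Detached HEAD: the file holds a commit hash directly
    return None, head
```

If HEAD holds a raw hash instead of `ref: ...`, you are in the "detached HEAD" state, which `resolve_head` reports with `None` as the branch. In a brand-new repository the branch file does not exist until the first commit, so this sketch would raise `FileNotFoundError` there.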
// TODO
1. Try creating a file refs/heads/feature-2
2. Run git log
3. Put a commit hash from the log inside that file
4. Try running git branch
When you create a file, it exists only in your working directory on your local file system. When you are done with it, you add it to git by running
$ git add .
This adds the file to the staging area. When you then commit, the commit is created from the files in the staging area.
So where is this staging area? It is the index file, .git/index.
We can see the contents of the staging area by running
$ git ls-files --stage
which gives us
100644 e965047ad7c57865823c7d992b1d046ea66edf78 0 README.md
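The index itself is a binary file. The following Python sketch reads the fields that `git ls-files --stage` prints (mode, hash, stage number, path). It assumes index format version 2 and paths shorter than 4095 bytes, and it ignores the extensions and checksum at the end of the file; `parse_index` is a hypothetical name.

```python
from __future__ import annotations

import struct

ENTRY_FIXED = 62  # ten 32-bit stat fields (40 bytes) + 20-byte hash + 16-bit flags


def parse_index(data: bytes) -> list[tuple[str, str, int, str]]:
    """Return (mode, sha, stage, path) for each entry of a version 2 index."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC" or version != 2:
        raise ValueError("this sketch only handles version 2 index files")
    entries = []
    pos = 12
    for _ in range(count):
        fields = struct.unpack(">10I20sH", data[pos:pos + ENTRY_FIXED])
        mode, sha, flags = fields[6], fields[10], fields[11]
        name_len = flags & 0xFFF  # low 12 bits hold the path length
        stage = (flags >> 12) & 0x3  # bits 12-13 hold the merge stage
        start = pos + ENTRY_FIXED
        path = data[start:start + name_len].decode()
        entries.append((f"{mode:o}", sha.hex(), stage, path))
        # Each entry is NUL-padded so its total length is a multiple of 8
        pos += (ENTRY_FIXED + name_len + 8) & ~7
    return entries
```

In the example repository, `parse_index` applied to the bytes of .git/index should list the same README.md entry shown above, provided git wrote a version 2 index.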
// TODO
1. Create a README2.md file
2. Run git ls-files --stage and look at its output
3. Run git add .
4. Run git ls-files --stage again
- We have skipped some details, like tags, packing and so on, which are not necessary for building a simple git.
- The number 100644 is the file mode: 100 marks a regular file and 644 are its Unix permission bits (the owner can read and write, everyone else can read).
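To see how the two parts of a mode like 100644 split apart, here is a small Python sketch using the standard `stat` module (`describe_mode` is a hypothetical name). Besides 100644, git trees also use 100755 for executable files, 120000 for symbolic links, 040000 for subdirectories, and 160000 for submodules.

```python
from __future__ import annotations

import stat


def describe_mode(mode: str) -> tuple[str, str]:
    """Split a git mode such as '100644' into object kind and permission bits."""
    value = int(mode, 8)  # modes are written in octal
    if stat.S_ISREG(value):
        kind = "regular file"
    elif stat.S_ISDIR(value):
        kind = "directory (tree)"
    elif stat.S_ISLNK(value):
        kind = "symbolic link"
    else:
        kind = "other (for example a submodule)"
    return kind, format(stat.S_IMODE(value), "o")
```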
Now that we have some understanding of how git works, let's start building a simple version in the next section.