Having fun with Git

I recently read The Git Book. As I went through the Git Internals parts, it struck me how simple and elegant the structure of Git really is. I decided that I just had to create my own little library to work with Git repositories (as you do). I call the result Silly Jgit. In this article, I will be walking through the code.

This article is for you if you want to understand Git a bit deeper or perhaps even want to work directly with a Git repository in your favorite programming language. I will be walking through four topics: 1) reading a raw commit from a repository, 2) reading the tree hash of the root of a commit, 3) parsing the file list of a directory tree, and 4) reading the file contents from a subdirectory of a commit root.

Reading the head commit from a repository

The first thing we need to do in order to read the head commit is to find out which commit is the head of the repository. The .git/HEAD file is a plain text file that normally contains a reference of the form "ref: refs/heads/master", which names a file in the .git/refs/heads directory. If you’ve checked out master, this will be .git/refs/heads/master. That file is also plain text and contains a hash, that is: a 40-digit hexadecimal number. The hash can be converted to the filename of a Git object under .git/objects: the first two digits name a subdirectory and the remaining 38 digits name the file. This file is zlib-compressed and contains the commit information. Here’s the code to read it:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.zip.InflaterInputStream;

// HEAD contains "ref: refs/heads/master"; the part after the space names the ref file.
File repository = new File(".git");
File headFile = new File(repository,
         Util.asString(new File(repository, "HEAD")).split(" ")[1].trim());

// The ref file holds the 40-digit hash of the head commit.
String commitHash = Util.asString(headFile).trim();
// Objects live under objects/<first two digits>/<remaining 38 digits>.
File commitFile = new File(repository,
         "objects/" + commitHash.substring(0,2) + "/" + commitHash.substring(2));
// Object files are zlib-compressed, so inflate them while reading.
try(final InputStream inputStream = new InflaterInputStream(new FileInputStream(commitFile))) {
    System.out.println(Util.asString(inputStream));
}

Running this code produces the following output (notice that the space between the size, 237, and the word tree is actually a null byte in the file):

commit 237 tree c03265971361724e18e31cc83e5c60cd0e0f5754
parent 141f5d5a2cc0c268e7b05be17a49c1c0dc61efad
author Johannes Brodwall <jbr@exilesoft.com> 1379445359 +0200
committer Johannes Brodwall <jbr@exilesoft.com> 1379445359 +0200

This is the commit comment
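
The code above leans on a Util helper that is not shown in this article. As a rough sketch, the same steps might look like this using only the JDK. The class name LooseObjectReader and its method names are made up for illustration and are not part of Silly Jgit. The sketch assumes the branch ref exists as a plain file under .git/refs and that the object is stored as a loose file under .git/objects; Git can also pack both, and that case is not handled here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class for illustration; not part of Silly Jgit.
class LooseObjectReader {

    // HEAD holds either "ref: refs/heads/<branch>" or, when detached, a bare commit hash.
    static String resolveHead(Path gitDir) throws IOException {
        String head = readText(gitDir.resolve("HEAD"));
        if (head.startsWith("ref: ")) {
            // Assumption: the ref is a loose file, not an entry in packed-refs.
            return readText(gitDir.resolve(head.substring("ref: ".length())));
        }
        return head;
    }

    // The first two hex digits name the directory, the remaining 38 name the file.
    static Path objectPath(Path gitDir, String hash) {
        return gitDir.resolve("objects")
                .resolve(hash.substring(0, 2))
                .resolve(hash.substring(2));
    }

    // Loose objects are zlib-compressed; read the whole inflated content into memory.
    static byte[] inflate(InputStream compressed) throws IOException {
        try (InputStream in = new InflaterInputStream(compressed)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    private static String readText(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
    }

    public static void main(String[] args) throws IOException {
        Path gitDir = Paths.get(".git");
        String commitHash = resolveHead(gitDir);
        try (InputStream file = Files.newInputStream(objectPath(gitDir, commitHash))) {
            System.out.println(new String(inflate(file), StandardCharsets.UTF_8));
        }
    }
}
```

Keeping the inflated content as a byte array rather than a String makes it easier to deal with the null byte in the object header later on.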

Finding the directory tree of a commit

When we have the commit information, we can parse it to find the tree hash. The tree hash references another file under .git/objects which contains the index of the root directory of the files in the commit. In the example above, the tree hash is “c03265971361724e18e31cc83e5c60cd0e0f5754”. But before we read the tree hash, we have to read the object type (in this case a “commit”) and size (in this case 237).
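
To make the parsing order concrete, here is a minimal sketch of how the inflated bytes could be split into type, size and body before looking for the tree line. It assumes the whole object has already been inflated into a byte array. The class RawGitObject and its members are hypothetical names chosen for this example, not the actual Silly Jgit code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical value class for illustration; not part of Silly Jgit.
class RawGitObject {
    final String type;  // e.g. "commit"
    final int size;     // content length as declared in the header
    final String body;  // everything after the null byte

    private RawGitObject(String type, int size, String body) {
        this.type = type;
        this.size = size;
        this.body = body;
    }

    // An inflated object looks like: <type> SPACE <size> NUL <content>
    static RawGitObject parse(byte[] inflated) {
        int space = indexOf(inflated, (byte) ' ', 0);
        int nul = indexOf(inflated, (byte) 0, space + 1);
        String type = new String(inflated, 0, space, StandardCharsets.US_ASCII);
        String size = new String(inflated, space + 1, nul - space - 1, StandardCharsets.US_ASCII);
        String body = new String(inflated, nul + 1, inflated.length - nul - 1, StandardCharsets.UTF_8);
        return new RawGitObject(type, Integer.parseInt(size), body);
    }

    // Commit headers end at the first blank line; the tree line is one of them.
    String treeHash() {
        for (String line : body.split("\n")) {
            if (line.isEmpty()) {
                break;
            }
            if (line.startsWith("tree ")) {
                return line.substring("tree ".length()).trim();
            }
        }
        throw new IllegalStateException("No tree line found in object of type " + type);
    }

    private static int indexOf(byte[] data, byte value, int from) {
        for (int i = from; i < data.length; i++) {
            if (data[i] == value) {
                return i;
            }
        }
        throw new IllegalArgumentException("Malformed object header");
    }

    public static void main(String[] args) {
        String content = "tree c03265971361724e18e31cc83e5c60cd0e0f5754\n\nThis is the commit comment\n";
        byte[] raw = ("commit " + content.length() + "\0" + content).getBytes(StandardCharsets.UTF_8);
        RawGitObject commit = parse(raw);
        System.out.println(commit.type + " " + commit.size + " " + commit.treeHash());
    }
}
```

The sketch does not check the declared size against the actual content length; a stricter reader might want to do that before trusting the body.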
