Open source Java projects: GitHub

A guide to social coding with Git and GitHub

If you've been curious about GitHub then this short tutorial in the Open source Java projects series is for you. Get an overview of the source code repository that has changed the way that many developers work, both individually and collaboratively. Then try GitHub for yourself, using common Git commands to branch and commit your own open source project.

GitHub is a social coding website and source-code hosting service that uses Git as its version control system. Launched in 2008, GitHub already boasts nearly 1.7 million people hosting nearly 3 million repositories. Like most social networks, GitHub allows users to create and follow feeds associated with each other's projects. It also extends the social paradigm to include network graphs that show repository usage. You can think about GitHub as a social network, a la Facebook, but just for software developers.

Bringing together social elements with a free repository to host open source projects, GitHub aims to cultivate a supportive and active community for the betterment of the software industry. The more active a project is, the more people will find it, and hopefully contribute to it. GitHub also offers commercial project support at a nominal cost.

In addition to following projects, GitHub allows users to follow individual software developers. This makes it easy to keep up with what friends and colleagues are doing and review their code, as well as seek out well-known programmers and follow their work. A regularly updated feed presents an opportunity to watch someone practice their craft. For developers, there's a lot to learn from studying each other's code and methodology; for instance, being able to see what code other developers push to their projects, and when, is a great way to learn at a high level about the release development cycle.

Social coding with GitHub enables developers to learn from each other in a new way while storing and updating code using a popular, well-featured version control system. In this edition of Open source Java projects I will help you get started with GitHub. First I'll provide an overview of the platform, then introduce some Git basics, including command-line options that you'll use frequently in GitHub. Finally, I'll walk through a simple diff-to-commit example that demonstrates the everyday power of this distributed code repository.

Get started with GitHub

GitHub accounts come in several flavors, grouped by individual or commercial account and by public or private repository. Open source developers are allowed unlimited public repositories, or for a small fee can choose to host between five and 20 private repositories. Commercial developers pay more (about twice as much as open source developers as of this writing) and can scale up to 125 private repositories. See the GitHub homepage for a complete listing of plans and pricing.

You will need a GitHub account in order to follow along with this article. Go to the GitHub website and click on the Signup and Pricing link at the top of the page. Click "Create a free account" and complete the account-creation process.

If you want setup instructions for your operating system, see the GitHub tutorial. Note that the installation process automatically installs a GUI client and prompts you to manually install GitHub's command-line tools. I recommend that you take this option in case you ever want to do something quickly on the command line.

Git: A primer

You will need to be at least somewhat familiar with Git in order to use GitHub effectively. A point of interest to most geeks is that Git was designed and developed by Linus Torvalds, the creator of Linux. In this section I provide an overview of Git and describe how it works. Toward the end of the article I present a review of a few of the more popular commands to help you become productive quickly. This is by no means an exhaustive tutorial but it should help you get started.

When software developers think about a version control system (VCS), we tend to think of a central repository that we'll use to download source code, make changes locally, and then submit those changes back to the central repository. Git is a little different. It is a distributed version control system, meaning that there isn't a single central repository but rather multiple full clones of a repository. So the "master repository" exists somewhere (like in GitHub) but we work locally on clone repositories.

Git's distributed architecture provides a significant benefit over non-distributed version control systems in that developers can check code in and out, create branches, and more, all locally. For a major change in a traditional VCS you would create a personal branch and check code into that branch. When you were done with your changes, you would merge that branch into the main branch.

Distributed version control

Git changes the VC paradigm because you can work locally and merge all of your changes in a single commit (you can keep your local history when you merge or you can combine all changes into one check-in). So the central repository is not littered with branches and dozens of historical notes, but only information about feature changes that have been made. In essence, Git uses branches as they were intended: to develop a new feature set, to maintain a release, or to fix bugs associated with a release.
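As a rough illustration of combining local work into one check-in, the sketch below builds a throwaway repository and uses git merge --squash. The branch name feature, the file notes.txt, and the user identity are hypothetical, and squashing is only one way to get this result, not necessarily the workflow the article has in mind.

```shell
set -e
work=$(mktemp -d)                     # throwaway directory for the demo
cd "$work"
git init -q squash-demo
cd squash-demo
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo User"          # hypothetical identity, local to this repository
git config user.email "demo@example.com"

echo "base" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# Do the work on a local branch, in several small commits
git checkout -q -b feature
echo "step 1" >> notes.txt
git commit -q -a -m "Work in progress 1"
echo "step 2" >> notes.txt
git commit -q -a -m "Work in progress 2"

# Back on master, fold the whole branch into a single commit
git checkout -q master
git merge -q --squash feature         # stages the combined changes without committing
git commit -q -m "Add feature"

git log --oneline master              # master now has two commits; feature keeps all three
```

Leaving out --squash would keep the individual work-in-progress commits in master's history instead.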

When you install Git on your local machine and "clone" a repository, you receive the entire repository, including historical information about all of the source code in the project. You then work against your local repository, adding new files, removing files, and changing files in a staging area until you actually commit them to the local repository. Git maintains versioning information about all of your changes and you can easily roll back to any point in your history. Finally, when you are ready, you can synchronize your local repository with a remote one.
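Here is a minimal sketch of that local cycle: stage a file, commit it, change it, and then restore an earlier version. The repository name, the file app.txt, and the user identity are assumptions made up for the example.

```shell
set -e
work=$(mktemp -d)                     # throwaway directory for the demo
cd "$work"
git init -q history-demo
cd history-demo
git config user.name "Demo User"      # hypothetical identity, local to this repository
git config user.email "demo@example.com"

echo "version 1" > app.txt
git add app.txt                       # stage the new file
git commit -q -m "First version"

echo "version 2" > app.txt
git status --short                    # reports app.txt as modified but not yet staged
git add app.txt
git commit -q -m "Second version"

# Roll the file back to its state one commit ago
git checkout HEAD~1 -- app.txt
cat app.txt                           # prints "version 1"
```

The rollback here touches a single file; the restored content is staged and would still need a commit to become part of the history.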

Changes are synchronized to a remote repository via a push while changes in a remote repository are synchronized with your local repository via a pull. Because you have a full clone of the repository locally, you are not limited to simply working against that repository's main branch. You can create branches to contain your changes and then either push or pull them as appropriate.
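The push and pull flow can be tried without a network connection. In this sketch a local bare repository stands in for the remote one (GitHub, in this article), and two clones named first and second play the role of two developers. All names, including the greeting branch, are hypothetical.

```shell
set -e
work=$(mktemp -d)                     # throwaway directory for the demo
cd "$work"

# A bare repository plays the role of the remote
git init -q --bare remote.git
git --git-dir=remote.git symbolic-ref HEAD refs/heads/master

# First clone: create a commit and push it to the remote
git clone -q remote.git first
cd first
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > README
git add README
git commit -q -m "Add README"
git push -q -u origin master
cd ..

# Second clone: make a change on a branch and push that branch
git clone -q remote.git second
cd second
git config user.name "Other User"
git config user.email "other@example.com"
git checkout -q -b greeting
echo "hello, world" > README
git commit -q -a -m "Expand greeting"
git push -q -u origin greeting
cd ../first

# First clone pulls the branch from the remote into its master
git pull -q --ff-only origin greeting
cat README                            # prints "hello, world"
```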

See Resources if you need a more complete tutorial introduction to Git. I'll focus on GitHub for the remainder of this article.

Social coding with GitHub

After you have created a GitHub account you can start following the work of other software developers or watching projects that interest you. You can find people or projects to follow by searching for them directly; or, if you're looking for ideas you can use GitHub's "Explore" function to find projects based on your interest. Explore GitHub displays trending repositories as well as featured ones. In addition to these, you can explore all repositories by clicking on the "Repositories" button on the toolbar. If you want to search for projects coded in a specific programming language you can click "Languages" on the toolbar, then choose the language that you want to explore. Figure 1 shows the most watched (i.e., trending) Java repositories at the time of this writing.

Figure 1. GitHub's most watched Java repositories

Storm was the most watched Java repository on GitHub at the time that I checked. Once you find a project that you're interested in, click on it and you'll see a "Watch" option, as shown in Figure 2.

Figure 2. Following a repository

If you click "Watch" then you'll be subscribed to follow the project and will be able to see changes made to it on your GitHub homepage. Figure 3 shows my GitHub homepage, which contains a listing of updates to various Spring projects.

Figure 3. Watched projects and developers on GitHub

Following individual developers works the same way as following projects. For example, I recently decided to follow my friend Tom Akehurst, as shown in Figure 4.

Figure 4. Following a developer

From a social perspective, GitHub empowers you to easily find developers and projects that you might be interested in and receive updates about them. Of course, the flip side is also true: GitHub is an excellent place to show off your work and get feedback and recognition from your peers.

Using GitHub for project updates and maintenance

Remember that the main driver behind GitHub is to promote the development of open source software projects. So if you've built something good, why not contribute your code to GitHub and make it freely available to the world?

You can create a new repository from the GitHub website or from the client application on your desktop. In this section we'll first walk through creating a repository from the website, then I'll show you how to update and commit a file via the command line.

Setting up a GitHub repository

Assuming that you have an account set up, log in and you'll see something similar to what's shown in Figure 5. Click "Create a Repository" and you'll be guided through the process. First, you'll click on the first item in your top-right toolbar, "Create a New Repo."

Figure 5. Creating a new repository

Click that link and give your repository a name and description. I created a new repository to host my GeekCap utilities, which is a set of helper classes that include sorting algorithms and a re-sortable list, a class that easily extracts icons from the Java Look-and-Feel Graphics Repository, ZIP utilities, and more. While not the coolest project, I included it because most of my other projects use one or more of these utilities, so it's good to have them stored in an accessible place. I named my project geek-util and gave it a description: "Geekcap Utilities: helpful classes that are used by other projects."

Once your project is set up you should see a screen like the one shown in Figure 6.

Figure 6. Repository created

Adding a project using Git commands

The screenshot in Figure 6 shows a listing of what you can do with your new repository, as well as an example of creating a README file and pushing it into your repository. I have an existing Maven project that I need to add for the first time, so I start by adding my pom.xml file and my src directory. Below are the Git commands that I entered for the initial push of the project into the repository:

Listing 1. Git commands for creating a repository

git init
git add src
git add pom.xml
git commit -m 'Initial commit'
git remote add origin
git push -u origin master 

Here's where familiarity with Git is important if you want to use GitHub. Fortunately, the main Git commands are relatively intuitive:

  • git init creates an empty Git repository. Specifically, this creates the .git directory, which the git command will recognize as a repository.
  • git add adds files to the staging area so that they are included in the next commit; in this case I added my pom.xml and my src directory.
  • git commit commits changes to the repository. All I did was to add the pom.xml file and src directory. You would also use this command after modifying the contents of a file or deleting files via the git rm command.
  • git remote add origin adds the specified URL as the origin server for the Git repository. As you saw in Figure 6, the origin server is created on GitHub for you and the URL is provided in the setup documentation.
  • git push uploads all committed changes to the specified server. In this case I've pushed the initial commit that contains the pom.xml and src directory to the origin server, which I previously set.
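To try the same sequence end to end without a GitHub account, a local bare repository can stand in for the URL that GitHub provides. Everything below other than the Git commands themselves is an assumption for illustration: the origin.git path, the placeholder pom.xml and source file, and the user identity.

```shell
set -e
work=$(mktemp -d)                     # throwaway directory for the demo
cd "$work"
git init -q --bare origin.git         # hypothetical stand-in for the GitHub repository

# A tiny placeholder project with the same layout as in the article
mkdir project
cd project
mkdir src
echo "<project/>" > pom.xml
echo "// placeholder" > src/Placeholder.java

git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.name "Demo User"
git config user.email "demo@example.com"
git add src
git add pom.xml
git commit -q -m 'Initial commit'
git remote add origin "$work/origin.git"  # on GitHub this would be the URL from the setup page
git push -q -u origin master

git remote -v                         # lists origin for both fetch and push
git ls-remote --heads origin          # shows that master now exists on the remote
```

The -u option records origin/master as the upstream of the local master branch, so later pushes and pulls from that branch can omit the remote and branch names.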

You can use Git from your IDE or from the command line; I just happen to be a command-line junkie. Executing git help shows the most common commands, which are summarized in Listing 2.

Listing 2. More common Git commands

usage: git [--version] [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
           [-p|--paginate|--no-pager] [--no-replace-objects] [--bare]
           [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
           [-c name=value] [--help]
           <command> [<args>]

The most commonly used git commands are:
   add        Add file contents to the index
   bisect     Find by binary search the change that introduced a bug
   branch     List, create, or delete branches
   checkout   Checkout a branch or paths to the working tree
   clone      Clone a repository into a new directory
   commit     Record changes to the repository
   diff       Show changes between commits, commit and working tree, etc
   fetch      Download objects and refs from another repository
   grep       Print lines matching a pattern
   init       Create an empty git repository or reinitialize an existing one
   log        Show commit logs
   merge      Join two or more development histories together
   mv         Move or rename a file, a directory, or a symlink
   pull       Fetch from and merge with another repository or a local branch
   push       Update remote refs along with associated objects
   rebase     Forward-port local commits to the updated upstream head
   reset      Reset current HEAD to the specified state
   rm         Remove files from the working tree and from the index
   show       Show various types of objects
   status     Show the working tree status
   tag        Create, list, delete or verify a tag object signed with GPG     
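A few of the commands in this list that the article does not otherwise exercise (mv, rm, status, log, and tag) can be tried in a scratch repository. This is only a sketch; the file names, the v1.0 tag, and the user identity are made up for the example.

```shell
set -e
work=$(mktemp -d)                     # throwaway directory for the demo
cd "$work"
git init -q commands-demo
cd commands-demo
git config user.name "Demo User"      # hypothetical identity, local to this repository
git config user.email "demo@example.com"

echo "draft" > old.txt
echo "scratch" > extra.txt
git add old.txt extra.txt
git commit -q -m "Initial commit"

git mv old.txt new.txt                # rename a tracked file
git rm -q extra.txt                   # remove a file from the working tree and the index
git status --short                    # summarizes the staged rename and deletion
git commit -q -m "Rename old.txt and remove extra.txt"

git tag v1.0                          # lightweight tag on the current commit
git log --oneline                     # one line per commit, newest first
git tag                               # lists the tags in the repository
```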

Update and commit

The ability to make changes locally and then push them into the repository is a powerful feature of GitHub. For instance, I recently noticed that my GitHub project geek-util uses JUnit 4.6 and decided to upgrade to v4.10. I started by updating my POM file locally to include JUnit 4.10. In order to review the differences between my local repository and the code I'm working on I executed the git diff command:

Listing 3. Diff review

$ git diff

index d1c3066..5fe2f4f 100644
--- a/pom.xml
+++ b/pom.xml
@@ -63,7 +63,7 @@
-            <version>4.6</version>
+            <version>4.10</version>

A line that starts with a minus sign (-) shows the value in the repository and a line that starts with a plus sign (+) shows the value in my working copy (the change has not been staged yet), so it's obvious that I updated JUnit from 4.6 to 4.10. My next step is to commit the change to my local repository, which I can do with the git add and git commit commands.
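Run with no arguments, git diff compares the working copy with the staging area, while git diff --cached compares the staging area with the last commit. The sketch below shows the difference on a one-line stand-in for pom.xml; the repository name, file contents, and user identity are assumptions for illustration.

```shell
set -e
work=$(mktemp -d)                     # throwaway directory for the demo
cd "$work"
git init -q diff-demo
cd diff-demo
git config user.name "Demo User"      # hypothetical identity, local to this repository
git config user.email "demo@example.com"

echo "<version>4.6</version>" > pom.xml
git add pom.xml
git commit -q -m "Initial commit"

echo "<version>4.10</version>" > pom.xml

git diff                              # working copy vs. staging area: shows 4.6 -> 4.10
git diff --cached                     # staging area vs. last commit: nothing staged yet

git add pom.xml
git diff                              # nothing left unstaged
git diff --cached                     # the staged change now appears here
```

Checking git diff --cached just before git commit is a quick way to confirm exactly what the commit will contain.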

Listing 4. Add and commit

$ git add pom.xml
$ git commit -m "Updated JUnit from version 4.6 to 4.10"
[master 413e4ec] Updated JUnit from version 4.6 to 4.10
 1 files changed, 1 insertions(+), 1 deletions(-)

With the commit stored locally my final step is to push this file up to GitHub using git push:

Listing 5. Push to GitHub

$ git push -u origin master
Counting objects: 5, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 332 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
   ed80e8e..413e4ec  master -> master
Branch master set up to track remote branch master from origin.

On the GitHub website I can now see the updated file, as shown in Figure 7.

Figure 7. Reviewing committed changes

In conclusion

GitHub is a social networking site and source code hosting repository for software developers dedicated to promoting and supporting open source projects. It allows you to freely host your own open source projects, as well as hosting commercial private projects for a nominal fee. In GitHub you can easily find existing projects based on popularity, search criteria, or programming language preference. You can also follow the progress of software projects and individual developers.

This edition of Open source Java projects has reviewed what GitHub is and how it works, provided a brief overview of the Git distributed version control system, and demonstrated how to create and set up a GitHub repository. I concluded with an example demonstrating the process of modifying, committing, and pushing changes to GitHub.

Hopefully this overview has given you a taste for what Git and GitHub have to offer. There is a lot more to learn, so see the Resources section for additional reading.

Steven Haines is a technical architect at Kit Digital, currently working onsite at Disney in Orlando. He is the founder of, an online education website, and has written hundreds of Java-related articles as well as three books: Java 2 From Scratch, Java 2 Primer Plus, and Pro Java EE Performance Management and Optimization. He lives with his wife and two children in Apopka, Florida.

Learn more about this topic
