If you've been curious about GitHub, this short tutorial in the Open source Java projects series is for you. Get an overview of the source code hosting service that has changed the way many developers work, both individually and collaboratively. Then try GitHub for yourself, using common Git commands to branch and commit your own open source project.
GitHub is a social coding website and source code hosting service that uses Git as its version control system. Launched in 2008, GitHub had nearly 1.7 million users hosting nearly 3 million repositories as of this writing. Like most social networks, GitHub allows users to create and follow feeds associated with each other's projects. It also extends the social paradigm with network graphs that show how a repository is being used. You can think of GitHub as a social network, à la Facebook, but just for software developers.
By combining social features with free hosting for open source projects, GitHub aims to cultivate a supportive and active community for the betterment of the software industry. The more active a project is, the more people will find it and, hopefully, contribute to it. GitHub also hosts private commercial projects for a nominal fee.
In addition to following projects, GitHub allows users to follow individual software developers. This makes it easy to keep up with what friends and colleagues are doing and review their code, as well as to seek out well-known programmers and follow their work. A regularly updated feed lets you watch someone practice their craft. Developers have a lot to learn from studying each other's code and methodology; for instance, seeing what code other developers push to their projects, and when, is a great way to learn about the release cycle at a high level.
Social coding with GitHub enables developers to learn from each other in a new way while storing and updating code in a popular, full-featured version control system. In this edition of Open source Java projects I will help you get started with GitHub. First I'll provide an overview of the platform, then introduce some Git basics, including the commands that you'll use most frequently with GitHub. Finally, I'll walk through a simple commit example that demonstrates the everyday power of this distributed code repository.
Get started with GitHub
GitHub accounts come in several flavors, grouped by individual or commercial account and by public or private repository. Open source developers are allowed unlimited public repositories, or for a small fee can choose to host between five and 20 private repositories. Commercial developers pay more (about twice as much as open source developers as of this writing) and can scale to up to 125 private repositories. See the GitHub homepage for a complete listing of plans and pricing.
You will need a GitHub account in order to follow along with this article. Go to the GitHub website and click on the Signup and Pricing link at the top of the page. Click "Create a free account" and complete the account-creation process.
For setup instructions specific to your operating system, see the GitHub tutorial. Note that the installation process automatically installs a GUI client and gives you the option of also installing GitHub's command-line tools. I recommend that you take this option in case you ever want to do something quickly on the command line.
Git: A primer
You will need to be at least somewhat familiar with Git in order to use GitHub effectively. A point of interest to most geeks is that Git was designed and developed by Linus Torvalds, the creator of Linux. In this section I provide an overview of Git and describe how it works. Later in the article I review a few of the more popular commands to help you become productive quickly. This is by no means an exhaustive tutorial, but it should help you get started.
When software developers think about a version control system (VCS), we tend to think of a central repository from which we download source code, make changes locally, and then submit those changes back. Git is a little different. It is a distributed version control system, meaning that instead of one central repository there are multiple full clones of a repository. A designated "master repository" still exists somewhere (on GitHub, for example), but we work locally on our own clones.
Git's distributed architecture provides a significant benefit over non-distributed version control systems: developers can check code in and out, create branches, and more, entirely on their local machine. In a traditional VCS, a major change meant creating a personal branch on the central server, checking code into that branch, and, when you were done with your changes, merging that branch into the main branch.
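The local workflow described above can be tried in a few lines of shell. This is a minimal sketch that assumes Git 2.28 or later (for git init -b) and uses a throwaway directory, a hypothetical file readme.txt, a hypothetical branch named feature, and a placeholder committer identity. None of the commands contacts a server.

```shell
#!/bin/sh
# Sketch: commits and branches are local operations; no server is contacted.
# Assumes Git 2.28+ (for "git init -b"). File, branch, and identity are hypothetical.
set -e

demo=$(mktemp -d)                          # throwaway working directory
cd "$demo"
git init -q -b master
git config user.name "Demo User"           # placeholder identity
git config user.email "demo@example.com"

# Check in a first version locally.
echo "version 1" > readme.txt
git add readme.txt
git commit -q -m "Add readme"

# Create a personal branch and check in a change on it, still locally.
git checkout -q -b feature
echo "version 2" > readme.txt
git commit -q -a -m "Update readme"

# Merge the branch back into master.
git checkout -q master
git merge -q feature

git remote              # prints nothing: no remote repository is configured
git log --oneline       # both commits are in the local history
```

Because master had no new commits of its own, git merge simply fast-forwards it to the tip of the feature branch.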
Distributed version control
Git changes the version control paradigm because you can work locally and then merge your changes back, either keeping your full local history or combining all of the changes into a single commit. As a result, the central repository need not be littered with personal branches and dozens of intermediate commit messages; it can record only the feature changes that were actually made. In essence, Git uses branches as they were intended: to develop a new feature set, to maintain a release, or to fix bugs associated with a release.
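The two ways of bringing local work back, keeping every intermediate commit or combining them into one, correspond to a normal merge and a squash merge. Here is a minimal sketch, again assuming Git 2.28+ and using hypothetical branch names, file contents, and identity:

```shell
#!/bin/sh
# Sketch: merging a branch with its full history versus squashing it
# into a single commit. Assumes Git 2.28+; all names are hypothetical.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q -b master
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Three small local commits on a feature branch.
git checkout -q -b feature
for step in 1 2 3; do
  echo "step $step" >> app.txt
  git commit -q -a -m "Work in progress $step"
done

# Option 1: a normal merge keeps all three local commits in the history.
git checkout -q -b keep-history master
git merge -q --no-ff -m "Merge feature" feature

# Option 2: a squash merge only stages the combined changes,
# which are then recorded as one new commit.
git checkout -q -b squashed master
git merge -q --squash feature
git commit -q -m "Add feature"

git log --oneline keep-history   # merge commit, 3 work-in-progress commits, initial commit
git log --oneline squashed       # one feature commit, initial commit
```

Both branches end up with identical files (git diff keep-history squashed prints nothing); they differ only in how much history they carry.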
When you install Git on your local machine and "clone" a repository, you receive the entire repository, including the history of all of the source code in the project. You then work against your local repository: you add, remove, and change files, stage those changes in an intermediate staging area (also called the index), and finally commit them to the local repository. Git keeps versioning information about all of your commits, so you can easily roll back to any point in your history. Finally, when you are ready, you can synchronize your local repository with a remote one.
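The cycle of working copy, staging area, and local commit, plus rolling a file back to an earlier point in history, looks like this in a minimal sketch (Git 2.28+ assumed; notes.txt, the identity, and the commit messages are hypothetical). The git status --short codes show the same file first modified in the working copy, then staged.

```shell
#!/bin/sh
# Sketch: working copy -> staging area (index) -> local commit, then
# restoring an earlier version from history. Assumes Git 2.28+;
# file names and messages are hypothetical.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q -b master
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "first draft" > notes.txt
git add notes.txt                 # stage the new file
git commit -q -m "First draft"

echo "second draft" > notes.txt   # modify the working copy
git status --short                # " M notes.txt": modified, not yet staged
git add notes.txt
git status --short                # "M  notes.txt": staged, ready to commit
git commit -q -m "Second draft"

# Roll the file back to its state one commit ago (HEAD~1) and record that.
git checkout HEAD~1 -- notes.txt
git commit -q -m "Restore first draft"

cat notes.txt                     # prints: first draft
git log --oneline                 # all three commits remain in the history
```

Restoring a file this way and committing the result adds a new commit rather than erasing history, so the second draft can still be recovered later.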
Changes are synchronized to a remote repository via a push while changes in a remote repository are synchronized with your local repository via a pull. Because you have a full clone of the repository locally, you are not limited to simply working against that repository's main branch. You can create branches to contain your changes and then either push or pull them as appropriate.
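Push and pull between full clones can be simulated on one machine. In this sketch, which assumes Git 2.28+, a local bare repository (one with history but no working files) stands in for a hosted repository such as one on GitHub, and alice and bob are hypothetical developers, each with a complete clone.

```shell
#!/bin/sh
# Sketch: two clones synchronizing through a shared repository.
# Assumptions: Git 2.28+; the local bare repository hub.git stands in for
# a hosted repository; "alice" and "bob" are hypothetical developers.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q --bare -b master hub.git

# Alice starts the project and pushes it to the shared repository.
git init -q -b master alice
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "hello" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
git remote add origin ../hub.git
git push -q -u origin master
cd ..

# Bob clones and receives the complete history, not just the latest files.
git clone -q hub.git bob
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
echo "hello, world" > greeting.txt
git commit -q -a -m "Expand greeting"
git push -q                      # the clone already tracks origin/master
cd ..

# Alice synchronizes her clone with a pull.
cd alice
git pull -q --ff-only
cat greeting.txt                 # prints: hello, world
```

A local branch is shared the same way, by naming it in the push (for example, git push origin some-branch, where some-branch is a hypothetical branch you created).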
See Resources if you need a more complete tutorial introduction to Git. I'll focus on GitHub for the remainder of this article.
Social coding with GitHub
After you have created a GitHub account you can start following the work of other software developers or watching projects that interest you. You can find people or projects to follow by searching for them directly or, if you're looking for ideas, by using GitHub's "Explore" function to find projects based on your interests. Explore GitHub displays trending repositories as well as featured ones. You can also browse all repositories by clicking the "Repositories" button on the toolbar. To find projects written in a specific programming language, click "Languages" on the toolbar, then choose the language that you want to explore. Figure 1 shows the most watched (i.e., trending) Java repositories at the time of this writing.
Storm was the most watched Java repository on GitHub at the time that I checked. Once you find a project that you're interested in, click on it and you'll see a "Watch" option, as shown in Figure 2.
If you click "Watch", you'll be subscribed to the project and will see changes made to it on your GitHub homepage. Figure 3 shows my GitHub homepage, which lists updates to various Spring projects.
Following individual developers works the same way as following projects. For example, I recently decided to follow my friend Tom Akehurst, as shown in Figure 4.
From a social perspective, GitHub empowers you to easily find developers and projects that you might be interested in and receive updates about them. Of course, the flip side is also true: GitHub is an excellent place to show off your work and get feedback and recognition from your peers.
Using GitHub for project updates and maintenance
Remember that the main driver behind GitHub is to promote the development of open source software projects. So if you've built something good, why not contribute your code to GitHub and make it freely available to the world?
You can create a new repository from the GitHub website or from the client application on your desktop. In this section we'll first walk through creating a repository from the website, then I'll show you how to update and commit a file from the command line.
Setting up a GitHub repository
Assuming that you have an account set up, log in and you'll see something similar to what's shown in Figure 5. Click "Create a Repository" and you'll be guided through the process. To start, click the first item in the toolbar at the top right, "Create a New Repo."
Click that link and give your repository a name and description. I created a new repository to host my GeekCap utilities, a set of helper classes that includes sorting algorithms and a re-sortable list, a class that easily extracts icons from the Java Look-and-Feel Graphics Repository, ZIP utilities, and more. It's not the coolest project, but I included it because most of my other projects use one or more of these utilities, so it's good to have them stored in an accessible place. I named my project geek-util and gave it a description: "Geekcap Utilities: helpful classes that are used by other Geekcap.com projects."
Once your project is set up you should see a screen like the one shown in Figure 6.
Figure 6. Repository created
The screenshot in Figure 6 shows a listing of what you can do with your new repository, as well as an example of creating a README file and pushing it into your repository. I have an existing Maven project that I need to add for the first time, so I start by adding my pom.xml file and my src directory. Below are the Git commands that I entered for the initial push of the project into the repository:
Listing 1. Git commands for creating a repository
    git init
    git add src
    git add pom.xml
    git commit -m 'Initial commit'
    git remote add origin https://github.com/geekcap/geek-util.git
    git push -u origin master
Here's where familiarity with Git is important if you want to use GitHub. Fortunately, the main Git commands are relatively intuitive:
- git init creates an empty Git repository. Specifically, this creates the .git directory, which the git command will recognize as a repository.
- git add adds files to the repository; in this case I added my src directory and pom.xml file. More precisely, it stages them in the index so that the next commit includes them.
- git commit commits changes to the repository. All I did was add the src directory and pom.xml file. You would also use this command after modifying the contents of a file or deleting files.
- git remote add origin adds the specified URL as the origin server for the Git repository. As you saw in Figure 6, the origin repository is created on GitHub for you, and its URL is provided in the setup instructions.
- git push uploads all committed changes to the specified server. In this case I pushed the initial commit, which contains the src directory and pom.xml file, to the origin server that I set previously. The -u option also makes origin's master branch the upstream of my local master branch, so later pushes and pulls from that branch can omit the remote and branch names.
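Listing 1 can be replayed without a GitHub account by pointing origin at a local bare repository instead of https://github.com/geekcap/geek-util.git. This sketch assumes Git 2.28+ (git init -b master is used so the branch name does not depend on local configuration); the one-line pom.xml and src/Main.java are hypothetical placeholders for a real Maven project. The last two commands show what git remote add origin and the -u flag leave behind.

```shell
#!/bin/sh
# Sketch: Listing 1 replayed offline. A local bare repository plays the
# role of the GitHub repository. Assumes Git 2.28+; project contents
# and identity are hypothetical placeholders.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q --bare -b master geek-util.git    # stands in for the GitHub repository

mkdir -p project/src
cd project
echo "<project/>" > pom.xml                   # placeholder POM
echo "class Main {}" > src/Main.java          # placeholder source file

# The commands from Listing 1, with a local path in place of the GitHub URL.
git init -q -b master
git config user.name "Demo User"              # placeholder identity
git config user.email "demo@example.com"
git add src
git add pom.xml
git commit -q -m 'Initial commit'
git remote add origin ../geek-util.git
git push -q -u origin master

git remote -v                                    # origin's fetch and push URLs
git rev-parse --abbrev-ref 'master@{upstream}'   # prints: origin/master
```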
You can use Git from your IDE or from the command line; I just happen to be a command-line junkie. Executing git help shows the most common commands, which are summarized in Listing 2.
Listing 2. More common Git commands
    usage: git [--version] [--exec-path[=<path>]] [--html-path] [--man-path] [--info-path]
               [-p|--paginate|--no-pager] [--no-replace-objects] [--bare]
               [--git-dir=<path>] [--work-tree=<path>] [--namespace=<name>]
               [-c name=value] [--help]
               <command> [<args>]

    The most commonly used git commands are:
       add        Add file contents to the index
       bisect     Find by binary search the change that introduced a bug
       branch     List, create, or delete branches
       checkout   Checkout a branch or paths to the working tree
       clone      Clone a repository into a new directory
       commit     Record changes to the repository
       diff       Show changes between commits, commit and working tree, etc
       fetch      Download objects and refs from another repository
       grep       Print lines matching a pattern
       init       Create an empty git repository or reinitialize an existing one
       log        Show commit logs
       merge      Join two or more development histories together
       mv         Move or rename a file, a directory, or a symlink
       pull       Fetch from and merge with another repository or a local branch
       push       Update remote refs along with associated objects
       rebase     Forward-port local commits to the updated upstream head
       reset      Reset current HEAD to the specified state
       rm         Remove files from the working tree and from the index
       show       Show various types of objects
       status     Show the working tree status
       tag        Create, list, delete or verify a tag object signed with GPG
Update and commit
The ability to make changes locally and then push them to the shared repository is a powerful feature of Git and GitHub. For instance, I recently noticed that my GitHub project geek-util uses JUnit 4.6 and decided to upgrade to version 4.10. I started by updating my POM file locally to use JUnit 4.10. To review the differences between what is recorded in my local repository and the code I'm working on, I executed the git diff command:
Listing 3. Diff review
    $ git diff
    index d1c3066..5fe2f4f 100644
    --- a/pom.xml
    +++ b/pom.xml
    @@ -63,7 +63,7 @@
             <dependency>
                 <groupId>junit</groupId>
                 <artifactId>junit</artifactId>
    -            <version>4.6</version>
    +            <version>4.10</version>
                 <scope>test</scope>
             </dependency>
             <dependency>
The minus sign (-) shows the line as Git has it recorded (strictly speaking, in the staging area, which here still matches the last commit), and the plus sign (+) shows the line in my working copy, so it's obvious that I updated JUnit from 4.6 to 4.10. My next step is to commit the change to my local repository, which I can do with the git add and git commit commands.
Listing 4. Add and commit
    $ git add pom.xml
    $ git commit -m "Updated JUnit from version 4.6 to 4.10"
    [master 413e4ec] Updated JUnit from version 4.6 to 4.10
     1 files changed, 1 insertions(+), 1 deletions(-)
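The relationship among git diff, git add, and git commit in the JUnit example can be reproduced on a small scale. This sketch uses a hypothetical POM fragment (not the real geek-util POM) and assumes Git 2.28+ and a POSIX sed. Plain git diff compares the working copy with the staging area, so once the file has been added the change appears only with git diff --staged.

```shell
#!/bin/sh
# Sketch: unstaged diff -> git add -> staged diff -> commit.
# Assumes Git 2.28+; the POM fragment and identity are hypothetical.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q -b master
git config user.name "Demo User"
git config user.email "demo@example.com"

cat > pom.xml <<'EOF'
<dependency>
  <groupId>junit</groupId>
  <artifactId>junit</artifactId>
  <version>4.6</version>
  <scope>test</scope>
</dependency>
EOF
git add pom.xml
git commit -q -m "Initial commit"

# Edit the working copy: bump JUnit from 4.6 to 4.10 (portable, no sed -i).
sed 's/<version>4\.6</<version>4.10</' pom.xml > pom.tmp
mv pom.tmp pom.xml

git diff              # shows -4.6 / +4.10: working copy vs. staging area
git add pom.xml
git diff              # prints nothing: the change is now staged
git diff --staged     # shows the same change: staging area vs. last commit
git commit -q -m "Updated JUnit from version 4.6 to 4.10"
git log --oneline
```

Running git diff HEAD instead compares the working copy directly with the last commit, whether or not the change has been staged.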
With the commit stored locally, my final step is to push it up to GitHub using git push:
Listing 5. Push to GitHub
    $ git push -u origin master
    Username:
    Password:
    Counting objects: 5, done.
    Delta compression using up to 8 threads.
    Compressing objects: 100% (3/3), done.
    Writing objects: 100% (3/3), 332 bytes, done.
    Total 3 (delta 1), reused 0 (delta 0)
    To https://github.com/geekcap/geek-util.git
       ed80e8e..413e4ec  master -> master
    Branch master set up to track remote branch master from origin.
On the GitHub website I can now see the updated file, as shown in Figure 7.
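Until the final push, the commit from Listing 4 existed only in the local repository. Here is a minimal sketch of checking for unpushed commits, assuming Git 2.28+ and a local bare repository standing in for GitHub (file and directory names are hypothetical):

```shell
#!/bin/sh
# Sketch: a commit exists only in the local repository until it is pushed.
# Assumes Git 2.28+; remote.git stands in for the GitHub repository.
set -e

demo=$(mktemp -d)
cd "$demo"
git init -q --bare -b master remote.git
git init -q -b master work
cd work
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "Initial commit"
git remote add origin ../remote.git
git push -q -u origin master

echo "v2" > file.txt
git commit -q -a -m "Second commit"
git status -sb | head -n 1                  # "## master...origin/master [ahead 1]"
git rev-list --count origin/master..master  # prints: 1 (one unpushed commit)

git push -q                                 # upstream was set by -u earlier
git rev-list --count origin/master..master  # prints: 0
```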
GitHub is a social networking site and source code hosting service for software developers, dedicated to promoting and supporting open source projects. It allows you to host your own open source projects for free, as well as private commercial projects for a nominal fee. On GitHub you can easily find existing projects based on popularity, search criteria, or programming language. You can also follow the progress of software projects and individual developers.
This edition of Open source Java projects has reviewed what GitHub is and how it works, provided a brief overview of the Git distributed version control system, and demonstrated how to create and set up a GitHub repository. I concluded with an example demonstrating the process of modifying, committing, and pushing changes to GitHub.
Hopefully this overview has given you a taste for what Git and GitHub have to offer. There is a lot more to learn, so see the Resources section for additional reading.
Resources
- "Git tutorial" (Lars Vogel, Vogella.com): maybe not everything you ever wanted to know about Git, but surely most of it.
- "Getting the hang of GitHub" (Andrew Burgess, NetTuts, January 2011): learn more about navigating the GitHub interface.
- The distinction between centralized and distributed version control is key to the success of Git and GitHub (John Ferguson Smart, JavaWorld, September 2007).
- In July 2012 GitHub received a $100 million investment from venture capital firm Andreessen Horowitz (Paul Krill, InfoWorld).
- Visit the GitHub homepage to explore interesting projects and create an account.