Introducing continuous integration

How continuous is your integration and what could your team be doing to improve it? A JavaWorld excerpt from Continuous Integration: Improving Software Quality and Reducing Risk (Addison-Wesley Professional, June 2007)

Excerpt from Continuous Integration: Improving Software Quality and Reducing Risk.

By Paul Duvall, Steve Matyas, and Andrew Glover

Published by Addison Wesley Professional.

ISBN-10: 0-321-33638-0

ISBN-13: 978-0-321-33638-5

Assumption is the mother of all screw-ups.

- Wethern's Law of Suspended Judgment

Early in my career, I learned that developing good software comes down to consistently carrying out fundamental practices regardless of the particular technology. In my experience, one of the most significant problems in software development is assuming. If you assume a method will be passed the right parameter value, the method will fail. Assume that developers are following coding and design standards, and the software will be difficult to maintain. Assume configuration files haven't changed, and you'll spend precious development hours needlessly hunting down problems that don't exist. When we make assumptions in software development, we waste time and increase risks.

Reducing assumptions: Continuous integration (CI) can help reduce assumptions on a project by rebuilding software whenever a change occurs in a version-control system.

We may think that the latest, greatest technology will be the "silver bullet" to solve all of our problems, but it will not. At one company, one of my initial responsibilities was to incorporate good software-development practices into the company by example. Over time, we were able to introduce many widely accepted practices for developing good software into the projects. Having worked on many projects that used different methodologies, I have found that, in general, iterative projects, which in my case used the Rational Unified Process (RUP) and eXtreme Programming (XP), work best, because risks are mitigated all along the way. Developing software requires planning for change, continually observing the results and incrementally course-correcting based on the results. This is how CI operates. CI is the embodiment of tactics that gives us, as software developers, the ability to make changes in our code, knowing that if we break software, we'll receive immediate feedback. This immediate feedback gives us time to course-correct and adjust to change more rapidly.

CI is about the fundamentals. It may not be the most glamorous activity in software development, but integrating software is vitally important in today's complex projects. Seldom do the users of the software say to me, "Wow, I really like the way you integrated the software in the last release." And because that doesn't happen, it may seem like it isn't worthwhile to make these efforts behind the scenes. However, anyone who has developed software using a practice such as CI is empowered by a consistent and repeatable build process kicked off when a change occurs to the version-control repository.

CI as a centerpiece for quality: Some see CI as a process of simply putting software components together. We see CI as the centerpiece of software development, as it ensures the health of software through running a build with every change. Determining the quality of software can be as easy as checking the latest integration build.

Spending some time on the nonglamorous fundamental activities in software development means there is more time to spend on the challenging, thought-provoking activities that make our jobs interesting and fun. If we don't focus on the fundamentals, such as defining the development environment and building the software, we'll be forced to perform low-level tasks later, usually at the most inconvenient times (immediately before software goes to production, for example). This is when mistakes happen as well. The discipline involved in keeping the build "in the green" frees you from worrying about whether everything is still working. It's like exercising -- yes, it takes self-discipline; yes, it can be painful work -- but it keeps you in shape to play in the big game, when it counts.

This chapter attempts to answer the questions that you may have when making the decision to implement the practices of CI on a project. It provides an overview of the advantages and disadvantages of CI, and covers how CI complements other software-development practices. CI is not a practice that can be handed off to a project's "build master" and forgotten about. It affects every person on the software-development team, so we discuss CI in terms of what all team members must practice to implement it.

What's a day of work like using CI? Let's examine Tim's experiences.

A day in the life of CI

As Tim opens the door to his company's suite, he views the wide-screen monitor displaying real-time information for his project. The monitor shows him that the last integration build ran successfully a few minutes ago on the CI server. It shows a list of the latest quality metrics, including coding/design standard adherence, code duplication and so on. Tim is one of 15 developers on a Java project creating management software for an online brewery. See Figure 2-1 for a visualization of some of the activities in Tim's day.

Starting his day, Tim refactors a subsystem that was reported to have too much duplicate code based on the latest reports from the CI server. Before committing his changes to Subversion, he runs a private build, which compiles and runs the unit tests against the newest source code. After running this build on his machine, he commits his changes to Subversion. All the while, the CruiseControl CI server is polling the Subversion repository. A few minutes later, the CI server discovers the changes that Tim committed and runs an integration build. This integration build runs automated inspection tools to verify that all code adheres to the coding standard. Tim receives an e-mail about a coding-standard violation, quickly makes the changes and checks the source code back into Subversion. The CI server runs another build, and it is successful. By reviewing the Web reports generated by the CI server, Tim finds that his recent code refactoring successfully reduced the amount of duplicate code in his subsystem.
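The polling step in Tim's morning can be sketched in a few lines of Java. This is only an illustration of the idea, not how CruiseControl is implemented: the class `RepositoryPoller`, its method `pollOnce` and the revision-id supplier are hypothetical names, and a real server would query Subversion and launch a build script rather than call two in-memory callbacks.

```java
import java.util.Objects;
import java.util.function.Supplier;

/** Hypothetical sketch of a CI server's polling step (not CruiseControl's actual code). */
final class RepositoryPoller {
    private final Supplier<String> latestRevision; // asks the repository for its newest revision id
    private final Runnable buildScript;            // whatever runs the integration build
    private String lastBuiltRevision;              // null until the first build has run

    RepositoryPoller(Supplier<String> latestRevision, Runnable buildScript) {
        this.latestRevision = Objects.requireNonNull(latestRevision);
        this.buildScript = Objects.requireNonNull(buildScript);
    }

    /** Checks the repository once and runs the build only if the revision has changed. */
    boolean pollOnce() {
        String current = latestRevision.get();
        if (current.equals(lastBuiltRevision)) {
            return false; // nothing new was committed, so no build is needed
        }
        buildScript.run();
        lastBuiltRevision = current;
        return true;
    }
}
```

A server built along these lines would call `pollOnce` on a timer, for example through `ScheduledExecutorService.scheduleWithFixedDelay`, so that a commit is picked up within one polling interval.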

Figure 2-1. A day in the life

Later in the day, another developer on the project, Lisa, runs into Tim's office.

Lisa: I think the changes you made earlier today broke the last build!

Tim: Hmm, but, I ran the tests.

Lisa: Oh, I didn't have time to write tests.

Tim: Are you following the code-coverage metric we have established for the project?

Because of this discussion, they decided to fail the integration build if their code coverage was below 85%. Furthermore, because of her conversation with Tim, Lisa wrote a test for the defect and fixed the problem she discovered. The integration build continued to stay "in the green."
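A coverage rule like the one Tim and Lisa agree on could be enforced by a small check that the build script runs after the tests. The sketch below is an assumption-laden illustration: the class `CoverageGate`, its method `passes` and the line counts are hypothetical, and only the 85% threshold comes from the text. In practice the counts would come from whatever coverage tool the project uses.

```java
/** Hypothetical build gate: reports failure when line coverage falls below a threshold. */
final class CoverageGate {
    /** Minimum coverage in percent; 85 is the figure from the text. */
    static final int MIN_COVERAGE_PERCENT = 85;

    /** Returns true when coveredLines / totalLines is at least the threshold. */
    static boolean passes(long coveredLines, long totalLines) {
        if (coveredLines < 0 || totalLines < 0 || coveredLines > totalLines) {
            throw new IllegalArgumentException("invalid line counts");
        }
        // Integer arithmetic avoids floating-point rounding exactly at the boundary.
        return coveredLines * 100 >= totalLines * MIN_COVERAGE_PERCENT;
    }

    public static void main(String[] args) {
        long covered = 900, total = 1000; // made-up numbers for illustration
        if (!passes(covered, total)) {
            System.err.println("Build failed: coverage below " + MIN_COVERAGE_PERCENT + "%");
            System.exit(1); // a non-zero exit code conventionally signals a failed step
        }
        System.out.println("Coverage gate passed");
    }
}
```

Treating the threshold as a pass/fail condition, rather than a number on a report, is what turns the metric into something the whole team has to respond to.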

Terms of the trade

automated: A "hands-off" process. Once a fully automated process begins, no user intervention is required. Systems administrators call this a "headless" process.

build: A set of activities performed to generate, test, inspect, and deploy software.

continuous: Technically, continuous means something that once started never stops. This would mean the build runs all the time; however, this isn't the case. Continuous, in the context of CI, is more like continual, and in the case of CI servers, a process continually runs, polling for changes to the version control repository. If the CI server discovers changes, it executes a build script.

continuous integration: "A software-development practice where members of a team integrate their work frequently; usually each person integrates at least daily - leading to multiple integrations per day. Each integration is verified by an automated build (including test) to detect integration errors as quickly as possible. Many teams find that this approach leads to significantly reduced integration problems and allows a team to develop cohesive software more rapidly."1

development environment: The environment in which software is written. This can include the IDE, build scripts, tools, third-party libraries, servers and configuration files.

inspection: Analysis of source code/bytecode for the internal quality attributes. In the context of this book, we refer to the automated aspects (static and run-time analysis) as software inspection.

integration: The act of combining separate source-code artifacts to determine how they work as a whole.

integration build: An integration build is the act of combining software components (programs and files) into a software system. This build includes multiple components on bigger projects or only low-level compiled source files on smaller projects. In our everyday life, we tend to use the terms build and integration build interchangeably, but for the purposes of this book we make the distinction that an integration build is performed by a separate integration build machine.

private (system) build: Running a build locally on your workstation before committing your changes to the version-control repository, to lessen the chances that your recent changes break the integration build.2

quality: The Free On-Line Dictionary of Computing3 defines quality as "an essential and distinguishing attribute of something" and "superior grade." The term quality is often overused, and some seem to think it is based on perception. In this book, we take the stance that quality is a measurable specification, just like any other. This means you can identify specific metrics of quality, such as maintainability, extensibility, security, performance and readability.

release build: Readies the software for release to users. It may occur at the end of an iteration or some other milestone, and it must include any acceptance tests and may include more extensive performance and load tests.

risk: The potential for a problem to occur. A risk that has been realized is known as a problem. We focus on the higher-priority risks (damage to our interests and goals) that have the highest likelihood of occurring.

testing: The general process of verifying that software works as designed. Furthermore, we divide developer tests into multiple categories, such as unit tests, component tests and system tests, all of which verify that objects, packages, modules and the software system work as designed. There are many other types of tests, such as functional and load tests, but from a CI perspective, all unit tests written by developers, at a minimum, are executed as a part of a build (though builds may be staged to run fast tests first followed by slower tests).

What is the value of CI?

At a high level, the value of CI is to:

  • Reduce risks

  • Reduce repetitive manual processes

  • Generate deployable software at any time and at any place

  • Enable better project visibility

  • Establish greater confidence in the software product from the development team

Let's review what these principles mean and what value they offer.

Reduce risks

By integrating many times a day, you can reduce risks on your project. Doing so facilitates the detection of defects, the measurement of software health and a reduction of assumptions.

  • Defects are detected and fixed sooner: Because CI integrates and runs tests and inspections several times a day, there is a greater chance that defects are discovered when they are introduced (i.e., when the code is checked into the version-control repository) instead of during late-cycle testing.

  • Health of software is measurable: By incorporating continuous testing and inspection into the automated integration process, the software product's health attributes, such as complexity, can be tracked over time.

  • Reduce assumptions: By rebuilding and testing software in a clean environment using the same process and scripts on a continual basis, you can reduce assumptions (e.g., whether you are accounting for third-party libraries or environment variables).

CI provides a safety net to reduce the risk that defects will be introduced into the code base. The following are some of the risks that CI helps to mitigate. We discuss these and other risks in the next chapter.

  • Lack of cohesive, deployable software

  • Late defect discovery

  • Low-quality software

  • Lack of project visibility

Reduce repetitive processes

Reducing repetitive processes saves time, costs and effort. This sounds straightforward, doesn't it? These repetitive processes can occur across all project activities, including code compilation, database integration, testing, inspection, deployment and feedback. By automating CI, you have a greater ability to ensure all of the following.

  • The process runs the same way every time.

  • An ordered process is followed. For example, you may run inspections (static analysis) before you run tests in your build scripts.

  • The processes will run every time a commit occurs in the version control repository.
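The guarantees listed above (same steps, same order, every time) can be sketched as a tiny pipeline that runs named steps in sequence and stops at the first failure. This is a minimal illustration under stated assumptions, not the way any particular build tool works: `BuildPipeline`, `step` and `run` are hypothetical names, and each step is reduced to a callback that returns true on success.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: runs named build steps in a fixed order, stopping at the first failure. */
final class BuildPipeline {
    private static final class Step {
        final String name;
        final BooleanSupplier action; // returns true when the step succeeds

        Step(String name, BooleanSupplier action) {
            this.name = name;
            this.action = action;
        }
    }

    private final List<Step> steps = new ArrayList<>();

    /** Appends a step; steps later run in exactly the order they were added. */
    BuildPipeline step(String name, BooleanSupplier action) {
        steps.add(new Step(name, action));
        return this;
    }

    /** Runs the steps in order, records each attempted step, and returns false on the first failure. */
    boolean run(List<String> executed) {
        for (Step s : steps) {
            executed.add(s.name);
            if (!s.action.getAsBoolean()) {
                return false; // later steps are skipped once one fails
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> executed = new ArrayList<>();
        boolean ok = new BuildPipeline()
                .step("compile", () -> true)
                .step("inspect", () -> true) // inspections run before tests, as in the example above
                .step("test", () -> true)
                .run(executed);
        System.out.println(executed + " -> " + (ok ? "success" : "failure"));
    }
}
```

Because the order is fixed in one place, nobody has to remember it, and a CI server can trigger the same pipeline on every commit.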

This facilitates

  • The reduction of labor on repetitive processes, freeing people to do more thought-provoking, higher-value work

  • The capability to overcome resistance (from other team members) to implement improvements by using automated mechanisms for important processes such as testing and database integration

Generate deployable software

CI can enable you to release deployable software at any point in time. From an outside perspective, this is the most obvious benefit of CI. We could talk endlessly about improved software quality and reduced risks, but deployable software is the most tangible asset to "outsiders," such as clients or users. The importance of this point cannot be overstated. With CI, you make small changes to the source code and integrate these changes with the rest of the code base on a regular basis. If there are problems, the project members are informed and the fixes are applied to the software immediately. Projects that do not embrace this practice may wait until immediately before delivery to integrate and test the software. This can delay a release, delay or prevent fixing certain defects, cause new defects as you rush to finish and ultimately spell the end of the project.

Enable better project visibility

CI provides the ability to notice trends and make effective decisions, and it helps provide the courage to innovate new improvements. Projects suffer when there is no real or recent data to support decisions, so everyone offers their best guesses. Typically, project members collect this information manually, making the effort burdensome and untimely. The result is that often the information is never gathered. CI has the following positive effects.

  • Effective decisions: A CI system can provide just-in-time information on the recent build status and quality metrics. Some CI systems can also show defect rates and feature completion statuses.

  • Noticing trends: Because integrations occur frequently with a CI system, the ability to notice trends in build success or failure, overall quality and other pertinent project information becomes possible.
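One simple way to picture trend-spotting is to compare the build success rate over the most recent builds with the rate over the whole history. The sketch below assumes the CI system can supply a list of past build outcomes, oldest first; the class `BuildTrend` and its methods are hypothetical and not part of any CI product.

```java
import java.util.List;

/** Hypothetical sketch: derives a simple trend from a history of build outcomes (true = success). */
final class BuildTrend {
    /** Fraction of successful builds among the last `window` entries of the history. */
    static double successRate(List<Boolean> history, int window) {
        if (history.isEmpty() || window <= 0) {
            throw new IllegalArgumentException("need at least one build and a positive window");
        }
        int from = Math.max(0, history.size() - window); // a window larger than the history uses all of it
        int successes = 0;
        for (boolean ok : history.subList(from, history.size())) {
            if (ok) {
                successes++;
            }
        }
        return (double) successes / (history.size() - from);
    }

    /** True when the recent window has a lower success rate than the history as a whole. */
    static boolean isDeclining(List<Boolean> history, int window) {
        return successRate(history, window) < successRate(history, history.size());
    }
}
```

The same pattern applies to any metric the CI system records per build, such as duplicate-code counts or coverage, as long as the builds are frequent enough to give the window meaningful data.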

Establish greater product confidence

Overall, effective application of CI practices can provide greater confidence in producing a software product. With every build, your team knows that tests are run against the software to verify behavior, that project coding and design standards are met, and that the result is a functionally testable product.

Without frequent integrations, some teams may feel stifled, because they don't know the impacts of their code changes. Since a CI system can inform you when something goes wrong, developers and other team members have more confidence in making changes. Because CI encourages a single-source point from which all software assets are built, there is greater confidence in its accuracy.

What prevents teams from using CI?

If CI has so many benefits, then what would prevent a development team from continuously integrating software on its projects? Often, it is a combination of concerns.

  • Increased overhead in maintaining the CI system: This is usually a misguided perception, because the need to integrate, test, inspect and deploy exists regardless of whether you are using CI. Managing a robust CI system is better than managing manual processes. Manage the CI system or be controlled by the manual processes. Ironically, complicated multiplatform projects are the ones that need CI the most, yet these projects often resist the practice as being "too much extra work."

  • Too much change: Some may feel there are too many processes that need to change to achieve CI for their legacy project. An incremental approach to CI is most effective; first add builds and tests with a lower occurrence (for example, a daily build), then increase the frequency as everyone gets comfortable with the results.

  • Too many failed builds: Typically, this occurs when developers are not performing a private build before committing their code to the version-control repository. It could be that a developer forgot to check in a file or had some failed tests. Rapid response is imperative when using CI because of the frequency of changes.

  • Additional hardware/software costs: To effectively use CI, a separate integration machine should be acquired, which is a nominal expense compared to the higher cost of finding problems later in the development life cycle.

  • Developers should be performing these activities: Sometimes management feels like CI is just duplicating the activities that developers should be performing anyway. Yes, developers should be performing some of these activities, but they need to perform them more effectively and reliably in a separate environment. Leveraging automated tools can improve the efficiency and frequency of these activities. Additionally, it ensures that these activities are performed in a clean environment, which will reduce assumptions and lead to better decision making.
