
Java Developer : Article

Managing Java Source Code Dependencies for SCM


There are many facets to consider when implementing even the most basic software configuration management (SCM). For Java, with its import mechanism, even simple SCM goals often become unmanageable once the source code tree grows beyond a certain level of complexity.

This is mainly due to the reticulate interdependencies that arise within the source code tree as it evolves. Also, because code is seldom (if ever) retired, the code base continues to grow, causing this network to become increasingly complicated over time.

In this article I explore the evolution of the typical Java source code tree and the underlying relationships that make even basic Java SCM problematic. I also suggest a simple way to manage source code relationships to meet basic SCM goals.

Understanding these topics will enable Java development shops to begin implementing simple yet effective SCM systems that balance the requisite process with unencumbered development, testing, and operational deployment. By requisite process I mean staying a couple of steps ahead of SCM-related firefighting while remaining free from laborious and/or unnecessary processes.

Some Simple Goals of Java SCM

  • Maintaining source code under revision control
  • Managing source code dependencies
  • Managing builds and build dependencies
  • Managing dependencies on third-party JARs
Beyond a certain level of complexity (usually a few hundred source files in total, depending on the skill of your developers and how quickly they're being asked to churn out code), the reticulate interdependencies within the code can no longer be untangled. That is, the large number of interdependencies introduced by import statements creates artificial dependencies when you try to add features and to build, branch, release, and test your code.

More specifically:

  • Building a subtree causes the compilation of every source code file in your source tree due to circular dependencies. This results in extremely lengthy build times for some projects I've seen.
  • No source code is free to move along under its own development cycle: you might need to build one subbranch n times per day and another only m times per month, but because they have import interdependencies, both are built at the maximum (required) rate.
  • Branching and merging are extremely time-consuming and complex and can introduce significant developer downtime, mostly due to the large number of source files that must be considered.
  • Releasing code to operations is very difficult, as you have to push every Java class file upon release.
  • Testing is more difficult, if not impossible, since it's harder to isolate subbranches of code to understand their functionality. It's also more difficult to write a testing harness for a subbranch (e.g., using JUnit).

    Most current source code management tools deal with navigating source hierarchies and finding objects and methods. These are great problems to solve, but not ones that we're primarily interested in (JavaDeps comes relatively close in that it helps to discover some compilation dependencies that go unnoticed by some compilers).

    Similarly, many revision control systems (RCS) provide check-in, checkout, branch, and merge capabilities, but none address source code tree structure and how to manage the requisite dependencies involved.

    Target Audience
    The target audience is developers, testers, and operational support staff who are interested in taking the necessary steps to actively manage their Java-based projects in terms of building source code for test and operational deployment; developing multiple versions of a product or service in parallel; and replicating operational, test, and development environments to reproduce unexpected behavior and fix bugs.

    Here, a large number of Java source files means roughly 500 to 10K, alongside roughly 50 to 1K dependent third-party JARs. All told, we're talking about a set of development projects that have 0(50L) total document and code artifacts...not very big, but large enough that it's worth examining how the code base evolves and how to keep it from turning into a liability instead of the asset it's intended to be.

    Due to its complex nature, this topic is too large to be covered in a single article. I'll start by covering the basics of source code management and builds, and finish by touching on the topics of managing deployments and documentation. Future areas for discussion include managing properties files and build tools and building WAR files.

    The Evolution of Java Source Code Hierarchies (aka Back to Basics)
    Every Java shop I've ever worked in has followed an eerily similar evolutionary path as far as its Java source code is concerned:
    1.   Starts the root branch off by creating Java package com.mycompany
    2.   Begins to populate the source tree with a layer of utility and/or base classes, many of them the usual suspects like com.mycompany.db, com.mycompany.utils, com.mycompany.regexp, and com.mycompany.xml
    3.   Continues to populate this source tree with a set of servlets, beans, data access, and JSPs that depend on the set of common classes (the aforementioned usual suspects)

    This approach is extremely intuitive and works for a while: for about as long as the codebase remains simple enough that dependencies between distinct packages are well understood.

    The first dependencies introduced are usually servlets, beans, and JSPs importing common/shared utility classes. These dependencies are distinct, simple, and well understood. However, soon thereafter, more complex references are introduced as developers try to reuse as much code as possible while minimizing the time they spend repackaging code. A servlet from one package begins to look like a utility to another package and is subsequently imported. This type of import can create a circular reference (see Figure 1) between source code files and sets the stage to make even simple SCM prohibitively complex.

    Introducing circular references in Java is surprisingly easy and extremely common, though, interestingly enough, I've never actually heard of a developer admitting to such a practice. Understanding these relationships, plus your source code's dependencies on third-party JAR files, is key to having a modular, branchable, buildable, testable, and deployable codebase.
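    To make the idea of a circular reference concrete, here is a minimal Java sketch that searches a package-level import graph for a cycle using depth-first search. The import map, and the com.mycompany.orders and com.mycompany.billing packages in it, are hypothetical; in practice the graph would be built by scanning the import statements in your source tree or by a dependency-analysis tool.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: find a circular import dependency between packages using a
// depth-first search. The import map is hypothetical sample data; a real tool
// would build it by scanning the import statements in the source tree.
final class PackageCycles {

    /** Returns one cycle (first element == last element), or an empty list if there is none. */
    static List<String> findCycle(Map<String, List<String>> imports) {
        Set<String> done = new HashSet<>();    // packages whose imports are fully explored
        Set<String> onPath = new HashSet<>();  // packages on the current DFS path
        List<String> path = new ArrayList<>(); // the current DFS path, in visit order
        for (String pkg : imports.keySet()) {
            List<String> cycle = visit(pkg, imports, done, onPath, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return List.of();
    }

    private static List<String> visit(String pkg, Map<String, List<String>> imports,
                                      Set<String> done, Set<String> onPath, List<String> path) {
        if (onPath.contains(pkg)) {
            // Back edge: the cycle is the tail of the current path starting at pkg.
            List<String> cycle = new ArrayList<>(path.subList(path.indexOf(pkg), path.size()));
            cycle.add(pkg);
            return cycle;
        }
        if (done.contains(pkg)) {
            return List.of();
        }
        onPath.add(pkg);
        path.add(pkg);
        for (String imported : imports.getOrDefault(pkg, List.of())) {
            List<String> cycle = visit(imported, imports, done, onPath, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        onPath.remove(pkg);
        done.add(pkg);
        return List.of();
    }

    public static void main(String[] args) {
        Map<String, List<String>> imports = new LinkedHashMap<>();
        imports.put("com.mycompany.utils", List.of());
        // orders imports a billing servlet as if it were a utility...
        imports.put("com.mycompany.orders", List.of("com.mycompany.utils", "com.mycompany.billing"));
        // ...while billing already imports orders, closing the loop.
        imports.put("com.mycompany.billing", List.of("com.mycompany.utils", "com.mycompany.orders"));
        // Prints [com.mycompany.orders, com.mycompany.billing, com.mycompany.orders]
        System.out.println(findCycle(imports));
    }
}
```

    Note that the cycle involves only two packages, yet any build that touches either one now has to compile both.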

    Partitioning Source Code: The Introduction of Components
    The first step in decoupling direct source code dependencies is to partition your source code into components.

    A component is a set of Java packages that provides a specific set of functionality and has its own development cycle. It doesn't matter whether it's one package or 20, one source file or 200 source files (though putting more than a few hundred source files in one component will bring you right back where you started, in terms of problematic source code management). Having its own development cycle means that a component's source files might need to be built, tested, and deployed n times a day while another component's source code goes through the same cycle m times a day.

    Partitioning source code into components will become fairly intuitive after a few examples:

  • Example 1
    Utils make great components because so many other source files depend on them. This also gives them a quicker development cycle (and therefore a quicker build turnaround) than most other source code. Create one component for all your utils, or partition them further into multiple components (see Tables 1 and 2).

  • Example 2
    Database access classes can be grouped into separate components. For multiple database servers, use multiple data access components, tying each schema to a component 1:1. This handles schema changes nicely and helps manage a component's dependencies on multiple database servers (see Table 3).

  • Example 3
    A set of JSPs or servlets that provides a specific set of functionality should be a separate component. This could be a data-entry application, a data-feed reader, or an administrative UI for one of your internal systems. Because these types of components have their own requirements and delivery dates, and those requirements change, they end up on their own development schedule, so it makes sense to create a component here (see Table 4).

    You could end up with as many as several hundred components, each with anywhere between 10 and perhaps 350 source files. Although partitioning your source code looks complicated, it's actually easy (the difficult part is getting your builds started).

    All this source code needs to be checked into a revision control system (RCS). All RCS syntax in this article refers to Perforce (www.perforce.com), as it has many features that make SCM very simple.

    In Perforce parlance, the component source code is checked into location:

    //depot/components/<component_name>/src

    For example:

    //depot/components/FileUtils/src/com/mycompany/...
    //depot/components/DataParser/src
    //depot/components/UserData/src

    Builds, branches, and documentation are also partitioned under each component for RCS:

    //depot/components/<component_name>/src
    //depot/components/<component_name>/branch
    //depot/components/<component_name>/builds
    //depot/components/<component_name>/docs
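    Because every component follows the same layout, build scripts can derive these locations from the component name alone. A minimal sketch, assuming a hypothetical ComponentLayout helper (the class and its name check are illustrative, not part of Perforce):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: derive a component's depot locations from its name, following the
// //depot/components/<component_name>/{src,branch,builds,docs} convention.
// The class name and the name check are hypothetical.
final class ComponentLayout {
    static final String DEPOT_ROOT = "//depot/components/";
    static final String[] AREAS = {"src", "branch", "builds", "docs"};

    /** Maps each area (src, branch, builds, docs) to its depot path. */
    static Map<String, String> pathsFor(String component) {
        if (component.isEmpty() || component.contains("/")) {
            throw new IllegalArgumentException("invalid component name: " + component);
        }
        Map<String, String> paths = new LinkedHashMap<>(); // keeps src, branch, builds, docs order
        for (String area : AREAS) {
            paths.put(area, DEPOT_ROOT + component + "/" + area);
        }
        return paths;
    }

    public static void main(String[] args) {
        pathsFor("DataParser").forEach((area, path) -> System.out.println(area + " -> " + path));
    }
}
```

    Appending the Perforce wildcard /... to any of these paths addresses the whole subtree beneath it, as in the FileUtils path above.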

    Third-Party JAR and ZIP Files
    The other source of build dependencies is the relationship between your source code and JARs provided by third parties.

    This necessitates actively managing these files to keep on top of their multiple versions and frequent name collisions. It's very easy to slow down debugging and building by mismanaging third-party JARs and ZIPs (e.g., having to open up JARs manually to look for a version number to find out what you built against, or what version you have in production), and yet remarkably simple to organize them intuitively and efficiently.

    Because successive versions of third-party JARs sometimes result in name collisions, it's necessary to use the version numbers to maintain them under RCS. In Perforce, the JARs might look like the following (using the JDK and JSDK as examples):

    //depot/jars/jdk/1.2.2/rt.jar
    //depot/jars/jdk/1.3.0/rt.jar
    //depot/jars/jdk/1.3.1/rt.jar
    //depot/jars/jsdk/2.0/jsdk.jar
    //depot/jars/jsdk/2.1/server.jar
    //depot/jars/jsdk/2.1/servlet.jar
    //depot/jars/jsdk/2.2/servlet.jar

    This versioning scheme allows components that might depend on the 2.1 version of servlet.jar to reside next to components that might depend on the 2.2 version. Both components can be built and deployed in parallel and their dependencies tracked accordingly.

    This approach has the added bonus of allowing any client with access to your RCS server to run builds, since every such machine can obtain the requisite JARs via RCS.
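    The versioned layout can be captured in a small value type so that a build never refers to a third-party JAR without its version. This is only a sketch; JarDependency and depotPath() are hypothetical names, and the library/version/file triple simply mirrors the //depot/jars paths listed above.

```java
import java.util.List;

// Sketch: a third-party JAR dependency that always carries its version and
// maps onto the //depot/jars/<library>/<version>/<jar> layout.
// JarDependency and depotPath() are hypothetical names.
final class JarDependency {
    final String library;  // e.g. "jsdk"
    final String version;  // e.g. "2.2"
    final String jarName;  // e.g. "servlet.jar"

    JarDependency(String library, String version, String jarName) {
        this.library = library;
        this.version = version;
        this.jarName = jarName;
    }

    String depotPath() {
        return "//depot/jars/" + library + "/" + version + "/" + jarName;
    }

    public static void main(String[] args) {
        // Two components pin different versions of the same file name; the
        // version directory keeps the two servlet.jar files from colliding.
        List<JarDependency> pinned = List.of(
                new JarDependency("jsdk", "2.1", "servlet.jar"),
                new JarDependency("jsdk", "2.2", "servlet.jar"));
        for (JarDependency dep : pinned) {
            System.out.println(dep.depotPath());
        }
    }
}
```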

    Building Components
    Now that your source code is partitioned and third-party JARs are under RCS, it's time to start building. Build requirements are very simple:

  • A build for one component may only execute against that component's source code. All other build dependencies must be linked through other components' builds or third-party JAR/ZIP files. In short, a component build may not execute against any source code other than its own.
  • Results of builds (JARs) must be under RCS.
  • Source code needs to be labeled with the build number, so there is a link between a build JAR and the source code that produced that JAR/WAR. This implies that given any JAR for any component, the original set of source code can be located.
  • The dependencies for a deployment (a set of JARs that are deployed together into QA/dev for testing/production) must be under revision control; i.e., the list of dependent JARs for a build of a component must be under revision control.
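    The bookkeeping these requirements imply (a label per build, a known location for the build JAR, and a checked-in list of dependent JARs) can be sketched as a small class. The label format, the builds/<number>/<component>.jar location, and the key=value manifest format are hypothetical conventions chosen for illustration; actually stamping the label and checking files in would be done with your RCS tooling.

```java
import java.util.List;

// Sketch of per-build bookkeeping: a label that links source to build, the
// depot location of the build JAR, and a plain-text list of dependent JARs to
// check in beside it. All naming conventions here are hypothetical.
final class BuildRecord {
    final String component;
    final int buildNumber;
    final List<String> dependencyJars; // depot paths of other components' JARs and third-party JARs

    BuildRecord(String component, int buildNumber, List<String> dependencyJars) {
        this.component = component;
        this.buildNumber = buildNumber;
        this.dependencyJars = List.copyOf(dependencyJars);
    }

    /** Label stamped on the component's source files for this build. */
    String label() {
        return component + "-build-" + buildNumber;
    }

    /** Where the build JAR is checked in, under the component's builds area. */
    String jarPath() {
        return "//depot/components/" + component + "/builds/" + buildNumber + "/" + component + ".jar";
    }

    /** Dependency list in key=value form, meant to live under revision control with the JAR. */
    String dependencyManifest() {
        StringBuilder sb = new StringBuilder();
        sb.append("component=").append(component).append('\n');
        sb.append("build=").append(buildNumber).append('\n');
        sb.append("label=").append(label()).append('\n');
        for (String jar : dependencyJars) {
            sb.append("depends=").append(jar).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BuildRecord record = new BuildRecord("DataParser", 7, List.of(
                "//depot/components/Utils/builds/12/Utils.jar",
                "//depot/jars/xerces/1.4.1/xerces.jar"));
        System.out.print(record.dependencyManifest());
    }
}
```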

    The first components you build must be entirely self-contained: each can be built using only its own source code and (optionally) third-party JAR files. Components built this way are seed builds and start your build process. Build each of these components one at a time by compiling its Java source, JARing up the resultant class files, and checking the JAR into your RCS (build scripts should do all this for you).

    If you can't isolate a component so that it's entirely self-contained, either repackage your source code (not often done due to time constraints) or generate an invalid build so you can begin to generate seed builds. (An invalid build is one in which a component is built against its own source code plus the source code of another component. Sometimes it's impossible to isolate even one component so it's self-contained, so you'll need to build it against multiple components' source code to get started. After this initial build, you'll be able to build it against its own source code and the JARs created from this first build.)

    A high-level overview of a build involves the following steps:
    1.   Sync up source code and third-party JARs from your RCS to your local machine.
    2.   Make sure your target build number hasn't been built already.
    3.   Set up your CLASSPATH, which contains three sets of entries:
           • The path to the root of the component's source
           • The paths to other JAR files from other components
           • The paths to requisite versions of third-party JAR files
    4.   Execute make to build your source.
    5.   JAR up the resultant class files and check this JAR into RCS.
    6.   Generate a build summary file (containing the environment, date, etc.) and check this into RCS.
    7.   Generate a label and stamp the source code for the build with it.
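    Step 3 is where the rule that a component builds only against its own source is enforced in practice. A minimal sketch of assembling that CLASSPATH, assuming hypothetical local paths after a sync; ClasspathBuilder is an illustrative name, and rejecting any non-.jar entry from another component is an assumption about how strictly you want to police the rule.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Sketch of build step 3: the CLASSPATH holds the component's own source root,
// JARs from other components' builds, and pinned third-party JARs.
// ClasspathBuilder is a hypothetical name; paths are hypothetical local paths.
final class ClasspathBuilder {

    static String buildClasspath(String sourceRoot, List<String> componentJars,
                                 List<String> thirdPartyJars) {
        List<String> entries = new ArrayList<>();
        entries.add(sourceRoot); // the only source code this build may see
        for (String jar : componentJars) {
            // Other components must come in as built JARs, never as source trees.
            if (!jar.endsWith(".jar")) {
                throw new IllegalArgumentException("component dependency is not a build JAR: " + jar);
            }
            entries.add(jar);
        }
        entries.addAll(thirdPartyJars); // specific, versioned third-party JARs
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        String classpath = buildClasspath(
                "components/DataParser/src",
                List.of("components/Utils/builds/12/Utils.jar"),
                List.of("jars/xerces/1.4.1/xerces.jar"));
        System.out.println(classpath);
    }
}
```

    The resulting string could be exported as CLASSPATH for the make step or passed to javac through its -classpath option.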

    Once you have all of your seed builds, begin to build those components that have only one level of dependence on other source code within your repository, i.e., they can be built using only their own source code, the seed JARs, and, optionally, third-party JARs. Build each of these components individually, JAR up their resultant class files, and check these JARs into RCS.

    Once all your one-level dependence builds are complete, it's open season to build the rest of your components, usually done in the order of increasing number of dependencies. The goal is to make sure no component is compiled against any source code except its own.
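    The ordering described above (seed builds first, then components with one level of dependence, and so on) is a level-by-level topological sort. The sketch below uses Kahn's algorithm to group components into build waves; it is one way to compute the order, not necessarily how any particular build tool does it, and BuildWaves is a hypothetical name. Components left over at the end signal a cycle, which is exactly the case where an invalid build is needed to get started.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Sketch: group components into build waves with Kahn's algorithm. Wave 0 is
// the seed builds (no component dependencies); each later wave depends only on
// components in earlier waves. BuildWaves is a hypothetical name.
final class BuildWaves {

    static List<List<String>> waves(Map<String, List<String>> dependsOn) {
        Map<String, Integer> remaining = new HashMap<>();       // unbuilt dependencies per component
        Map<String, List<String>> dependents = new HashMap<>(); // reverse edges: dependency -> users
        for (Map.Entry<String, List<String>> e : dependsOn.entrySet()) {
            remaining.put(e.getKey(), e.getValue().size());
            for (String dep : e.getValue()) {
                remaining.putIfAbsent(dep, 0); // listed only as a dependency: treat as a seed
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        TreeSet<String> ready = new TreeSet<>(); // sorted so each wave is deterministic
        remaining.forEach((component, count) -> {
            if (count == 0) {
                ready.add(component);
            }
        });
        List<List<String>> result = new ArrayList<>();
        int scheduled = 0;
        while (!ready.isEmpty()) {
            List<String> wave = new ArrayList<>(ready);
            ready.clear();
            result.add(wave);
            scheduled += wave.size();
            for (String built : wave) {
                for (String user : dependents.getOrDefault(built, List.of())) {
                    if (remaining.merge(user, -1, Integer::sum) == 0) {
                        ready.add(user);
                    }
                }
            }
        }
        if (scheduled != remaining.size()) {
            throw new IllegalStateException("circular dependency: some components have no valid build order");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dependsOn = Map.of(
                "Utils", List.of(),
                "DataParser", List.of("Utils"),
                "DataCaptureUI", List.of("DataParser", "Utils"));
        // Prints [[Utils], [DataParser], [DataCaptureUI]]
        System.out.println(waves(dependsOn));
    }
}
```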

    What you're effectively doing here is isolating like branches of Java code in sets of Java packages against changes in other branches of Java code (also grouped in Java packages). This is probably the most important aspect of the build strategy. This allows stakeholders of your SCM system to isolate, and therefore understand, the dependencies between source code and JARs, and trace any build to the source code that was used to generate that build as well as replicate an environment by easily reproducing the JARs used to construct the environment.

    Equally important is that the source code for a component is associated with its build JAR via a label, so it's easy to trace any class file you have in production back to its source files, and then from there to trace other dependent components' class files back to their corresponding source code.

    This method of organizing your builds also frees up any components that share a dependency on a common component (e.g., utils). The common component is now on its own development cycle, so it can iterate through many build cycles while allowing dependent components to migrate to newer builds when it makes the most sense. Said another way, it allows for independent/parallel development of components that have a dependency on a single, shared component.

    Build Example
    At a high level, consider the following scenario for building your first three components:
    1.   Component 1 (Utils): No dependencies. Build it against its own Java source code to generate a JAR file that contains the resultant .class files.
    2.   Component 2 (DataParser): Depends on a previous build of the Utils component, as well as a third-party JAR called xerces.jar (v1.4.1). Build it against its own source code, a previous Utils component build, and the xerces.jar file from xerces v1.4.1.
    3.   Component 3 (DataCaptureUI): Depends on a previous build of DataParser, a previous build of Utils, and a third-party JAR called servlet.jar (v2.0). Build it against its own source code, a previous DataParser component build, a previous Utils component build, and the servlet.jar file from jsdk 2.0.

    Note that because Component 3 depends on DataParser, and DataParser depends on xerces.jar, you'll need to add xerces.jar as a dependent JAR for the DataCaptureUI build.
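    Tracking an indirect dependency like this xerces.jar by hand gets error-prone as components multiply, so a build script might compute it by walking the component graph. A minimal sketch with hypothetical maps modeled on the example above (the short library/version/jar strings stand in for full depot paths, and TransitiveJars is an illustrative name):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: collect every third-party JAR a component build needs, including
// JARs required by the components it depends on (directly or indirectly).
// TransitiveJars and both input maps are hypothetical.
final class TransitiveJars {

    static Set<String> jarsFor(String component,
                               Map<String, List<String>> componentDeps,
                               Map<String, List<String>> thirdPartyJars) {
        Set<String> jars = new LinkedHashSet<>(); // result, without duplicates
        Set<String> visited = new HashSet<>();    // components already examined
        Deque<String> toVisit = new ArrayDeque<>();
        toVisit.push(component);
        while (!toVisit.isEmpty()) {
            String current = toVisit.pop();
            if (!visited.add(current)) {
                continue; // shared dependencies such as Utils are examined once
            }
            jars.addAll(thirdPartyJars.getOrDefault(current, List.of()));
            for (String dep : componentDeps.getOrDefault(current, List.of())) {
                toVisit.push(dep);
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        Map<String, List<String>> componentDeps = Map.of(
                "DataParser", List.of("Utils"),
                "DataCaptureUI", List.of("DataParser", "Utils"));
        Map<String, List<String>> thirdPartyJars = Map.of(
                "DataParser", List.of("xerces/1.4.1/xerces.jar"),
                "DataCaptureUI", List.of("jsdk/2.0/servlet.jar"));
        // DataCaptureUI needs servlet.jar directly and xerces.jar through DataParser.
        System.out.println(jarsFor("DataCaptureUI", componentDeps, thirdPartyJars));
    }
}
```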

    The above set of builds and dependencies is shown in Figure 2.

    Conclusion
    Managing source code dependencies is only the tip of the iceberg for comprehensive SCM. Other facets of SCM that fit into the component model include:

  • Managing deployments: A deployment is the set of JAR, ZIP, WAR, and properties files that allow the component to operate in its designated environment (usually dev, test, or operations). Property and config files can be partitioned similarly to source code, after which all component artifacts can be synced directly from your RCS server to their deployment server, with deployment dependencies tested and well understood.
  • Managing documentation: Component documentation can be bundled with its corresponding component under RCS and mapped to a mount-point on your intranet server for automated publishing. Documentation management has a large number of implicit requirements involving availability, content, and versioning from release to release.

    Partitioning Java source code into components and formalizing dependencies will provide several key benefits for your Java-based projects, some of which are implicit thus far:

  • Ability to provide parallel development of projects that share a common codebase
  • Ability to easily deploy to development, test, and operational environments
  • Ability to minimize the amount of code associated with a build/deployment
  • Elimination of confusion and name collisions due to third-party JAR dependencies
  • Reproduction of deployment environments to help reproduce problems (and then eliminate them)
  • Ability to retire code and branches of code when a component is retired

    Tom Laramee is a software developer currently working with the Blindsight Corporation writing computer vision software for embedded systems and handheld devices. He has spent the last five years designing and building Web applications as both a development lead and system architect. He has a master's degree in electrical and computer engineering from the University of Massachusetts, Amherst, MA.



    Most Recent Comments
    Edwin Hautus 12/13/02 09:44:00 AM EST

    Hi Tom,

    I am working on a tool that I think can help in the process described in your article. The tool analyzes and visualizes dependencies between packages and assists in refactoring to break dependencies.

    You are very right that dependencies are often a mess in Java programs, just look at the Java class library for a good example ;-).

    Of course, there are a lot of issues besides the package dependency structure to deal with in SCM. Nevertheless, I think a good component structure starts with a proper Java package structure.

    Edwin Hautus

    Jens Schumann 11/13/02 02:03:00 PM EST

    Hi Tom,

    Thanks for your reply.

    You are right, there are details missing to understand the problem. I will try to explain as much as possible online to keep other people informed - if you have further questions please send me an e-mail.

    Some time ago we introduced something we called "Service" to implement fine-grained business logic. Services differ from each other through the business methods they offer (the business interface), and need to extend a common service interface for the service life cycle. Our overall goal was to offer a design guide to our developers to implement fine-grained business logic while hiding implementation details. Within a couple of projects we realized that there was always the need for basic services such as Identity Management, Authorization Service, Authentication Service, Content Integration Service and such. So we decided to extract the services with stable business interfaces and used them as components (since our contract status made this possible ;). The components themselves may offer different implementations for the same business interface. If a customer decides they don't like our default implementation we are still able to change the implementation of that service without changing the client code.

    An example.

    Whenever you deal with identities for a Portal and such you need to find, create, update, remove them and their relations to other identities. Therefore a service offers methods such as createIdentity, updateIdentity, removeIdentity and the same for their relations. A client of that service (which means coarse-grained business logic) doesn't care what kind of implementation it is using - you just want to call methods, which handle everything. So our component IdentityService ships with Util classes (Value objects and such), Exceptions, the Business Interface and classes for default implementation for Relational DB or LDAP based storage based on EJBs or not. And yes - we ship the service as compiled classes.

    This leads to your questions regarding repositories. In our case we work for different customers who usually keep the repository on their side. The components we offer are in our repository, so we end up having a copy of the versioned library in debug and optimized format in every project repository.

    How stable are these components?
    Apart from bug fixes and queries, which get moved from a project back to the component, our components are pretty stable. New services in projects aren't. We have chosen the same approach to implement project specific fine-grained business logic for the reasons you mentioned in your article.

    The result is that a typical (server side) project repository contains the build management, framework libraries, coarse-grained business logic source code and integrates component libraries from other cvs modules/repositories. This enables us to work with distributed development teams (even from other IT companies) on the same project; and we enforce separation of concerns and high code reuse. It also proves your concept - yes, it really works. But we need to reproduce the same build management and environment for many projects and just need something that is really easy to set up, extend and maintain. So far we have been able to create a build management, which is easy to extend but hard to set up and maintain. Again, think about 20, 30 or 50 components (and their dependencies) you need to integrate. I think this week we finally understood what our solution will be, but this enforces a strict directory layout for libraries and components, way more than we ever wanted to.

    Regarding EJBs - yes - I can produce different versions - but this is again an ongoing manual effort, which just does not work over time. We have successfully tested more than 10 different J2EE servers that we can use for deployment - most of them require specific changes before we can deploy a component. For instance server A ships with a regular expressions lib, server B doesn't. Also you may need to tune your EJB settings for a specific project, specific target (local test system, integration system, pre production, production system) and such. The project build management has to handle this automatically and needs to be IDE independent.

    In the end your customer will always ask you why they need to pay that much for build management ;)

    Jens

    Tom Laramee 11/12/02 05:41:00 PM EST

    hi Jens:

    I've been thinking about your post for a couple of days now. I have a couple of thoughts about the idea of "using the same component across multiple projects for different customers".

    The first is that just because there are multiple projects all sharing a component, that doesn't mean that the source code for the component needs to be checked into multiple CVS repositories. The original idea behind the scheme I presented was that the source code was only in 1 place (possibly branched within that 1 CVS, but never in multiple places).

    So I'm not sure if you were referring to checking a component's build artifacts into multiple CVSs, or the source code, but things become quite problematic when you define a component such that its source code needs to be spread across more than 1 CVS (as it appears you're experiencing right now).

    I suspect the trouble here is the definition of what exactly a component is. Specifically, for your scenario of components for multiple customers I'm wondering: how many changes does a single component need to undergo for a different customer before it's a different component altogether?

    In other words, it would be convenient if your services (your components) could all have the same set of generic functionality implemented, and then differences for customers be captured in property files or some other non-code mechanism (like a database table). I have no idea what you're trying to do, so this is just an idea.

    As soon as you start making code changes that are specific to each customer, I'm wondering if it's still a single component, particularly if the code changes are extensive. I'm also wondering if you can abstract the generic functionality of a component (the functionality that is common to all customers), call that a component, and then create specific components for each customer. These latter components could have both compile and run-time dependencies on the generic components, but could be developed in such a way that one customer's code doesn't affect another customer's code (a good idea anyways). The problem then becomes changing the generic components, which affects everyone's codebase.

    And on EJB JARs and how you can't add libraries to them: that's fine, just have your build produce more than one JAR ... one EJB JAR (or multiple EJBs), and a separate JAR for other component builds. If I'm understanding you correctly, there's no reason to force a dependency between JARs, it's only sometimes convenient to do this but is never required.

    I hope this helps.

    --toml

    Jens Schumann 11/06/02 06:07:00 PM EST

    (oops - something missing)

    (you may or may not add libraries to an ejb-jar) , component dependencies and so on. So if someone can offer a maintainable solution for our problem I would love to hear it ;).

    Jens

    Jens Schumann 11/06/02 06:04:00 PM EST

    Tom,

    We came to the same conclusions as you did and ran into some major issues satisfying our additional requirements.

    Apart from using versioned libraries our overall goal was reusing fine-grained business logic as much as possible across projects. For our so-called services we introduced a service business interface to hide even implementation details per service. Examples are Identity Management, Authentication Service, Authorization Service and more, which might use Database or LDAP storage or whatsoever. To use one of those components you need to adjust the J2EE resource mappings for accessing the physical available resources and you are done. Sounds nice, doesn't it? Well, almost.

    First you will run into a major nightmare called the build process of .war, xxx-ejb.jar or .ear files while adjusting deployment descriptors and adding required libraries. We are using ant and had a hard time creating the needed ant utils for those tasks.

    Second, since we use the same component across multiple projects for different customers you will end up having copies of the same service library version in multiple (CVS) repositories including the libraries they depend on. This might work for 2 or 3 components without any issues - but just try it with 10, 15 or more than 25 components and their required libraries.

    And third - usually you will have multiple "releases" of the same version of your components: non-optimized with debug code, optimized without debug code, obfuscated versions and more.

    To make things worse now think about bug fixes, new features, application server dependencies (you may or may not add libraries to an ejb-jar