OSGi at the UK's biggest science lab

Developers at Diamond Light Source set out to migrate a mission-critical, Java-based acquisition system to dynamic class loading. Here’s what they learned.


Devices need many types of hooks, and three are shown in Listing 1. Note that the object called manager invokes the annotated methods on the devices taking part in the scan. Annotations give a simple, low-dependency way of creating a device that can respond to the many different parts of a scan. We previously used inheritance for this, but that became less manageable as the inheritance tree grew huge (over 10 levels deep, depending on the device).
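To make the pattern concrete, here is a minimal sketch. It is not the code from Listing 1: the three annotations (`@ScanStart`, `@PointStart`, `@ScanEnd`), the device and the manager are hypothetical names invented for this example. The manager uses reflection to call every public method that carries a given annotation.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

class ScanHookDemo {
    public static void main(String[] args) {
        ExampleDevice device = new ExampleDevice();
        AnnotationManager manager = new AnnotationManager(List.of(device));
        manager.invoke(ScanStart.class);
        for (int point = 0; point < 3; point++) {
            manager.invoke(PointStart.class, point);
        }
        manager.invoke(ScanEnd.class);
        System.out.println(device.calls);
    }
}

// Hypothetical hook annotations. RUNTIME retention lets reflection see them.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface ScanStart {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface PointStart {}
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD) @interface ScanEnd {}

// A device extends nothing: it simply annotates the hooks it cares about.
class ExampleDevice {
    final List<String> calls = new ArrayList<>();

    @ScanStart  public void prepare()         { calls.add("prepare"); }
    @PointStart public void moveTo(int point) { calls.add("point " + point); }
    @ScanEnd    public void cleanUp()         { calls.add("cleanUp"); }
}

// Minimal manager: calls every public method carrying the given annotation on every device.
class AnnotationManager {
    private final List<?> devices;

    AnnotationManager(List<?> devices) {
        this.devices = devices;
    }

    void invoke(Class<? extends Annotation> hook, Object... args) {
        for (Object device : devices) {
            for (Method method : device.getClass().getMethods()) {
                if (method.isAnnotationPresent(hook)) {
                    try {
                        method.invoke(device, args);
                    } catch (IllegalAccessException | InvocationTargetException e) {
                        throw new IllegalStateException("Hook " + method.getName() + " failed", e);
                    }
                }
            }
        }
    }
}
```

Running `main` prints `[prepare, point 0, point 1, point 2, cleanUp]`. The device needs no base class, which is the point of the design: a new kind of hook means a new annotation rather than another level in the inheritance tree. A production manager would presumably cache the method lookups instead of reflecting on every point.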

After upgrading the scanning with our new fast file writing, adopting annotations, and using fork/join pools, we discovered an unexpected outcome: our scanning had become about 10 times slower in the benchmark test. In this test we ran old and new scans from the Jython layer and timed the result. On the upside, the new system scaled to millions of points, whereas the original system started to slow down above tens of thousands of points and ground to a halt at hundreds of thousands.

After a bit of head scratching and timing various parts of the system (using nothing more complex than test classes), we discovered that start-up was taking much longer. All the dynamic loading that OSGi performs at the start of the scan was being included in the timing, and it had a large impact on the benchmark. After changing the test to set up the bundle separately (in a @BeforeClass method), we were left with a system that was about one millisecond slower per point on scans without hardware acceleration, and much more scalable.
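The fix amounts to keeping one-off start-up cost out of the per-point measurement. In JUnit 4, a static `@BeforeClass` method runs once before any test in the class. The framework-free sketch below shows the same split; `expensiveSetUp` and `runPoint` are hypothetical stand-ins for starting the bundles and executing one scan point, and the 200 ms sleep is an invented start-up cost.

```java
class PerPointBenchmark {
    static int setUpCalls = 0;
    static double sink = 0; // accumulates results so the JIT cannot discard the per-point work

    // Hypothetical stand-in for one-off work such as starting OSGi bundles.
    // In a JUnit 4 test this would live in a static @BeforeClass method.
    static void expensiveSetUp() throws InterruptedException {
        setUpCalls++;
        Thread.sleep(200); // simulated start-up cost
    }

    // Hypothetical stand-in for executing one scan point.
    static void runPoint(int point) {
        sink += Math.sqrt(point);
    }

    /** Mean wall-clock time per point in nanoseconds, with set-up excluded from the timing. */
    static double meanNanosPerPoint(int points) throws InterruptedException {
        if (setUpCalls == 0) {
            expensiveSetUp(); // done once, before the clock starts
        }
        long start = System.nanoTime();
        for (int point = 0; point < points; point++) {
            runPoint(point);
        }
        return (System.nanoTime() - start) / (double) points;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.printf("%.1f ns per point, set-up excluded%n", meanNanosPerPoint(100_000));
    }
}
```

Had the sleep been inside the timed loop's measurement, it would have dominated the per-point figure for short scans, which is the distortion described above.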

Lesson learned.

Real world problem #5: Cardinality

When we decide to consume or provide a service, we declare it in a small XML file and edit the manifest so that the file is read. Lars Vogel’s excellent blog post on the subject from eight years ago gives the gory details. You have to do some important things to get this to work properly:

  1. MANIFEST.MF
    1. Bundle-ActivationPolicy: lazy
    2. Service-Component: OSGI-INF/*.xml  (or wherever your XML files are)
  2. Do little or, ideally, no work in your service’s constructor. Then it will work nicely with the other services it consumes.
  3. Make sure the new XML files are included in the build by listing them in build.properties.
  4. Use the correct cardinality in your service XML files.
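The manifest part of this checklist can be checked mechanically. The sketch below is a hypothetical helper, not something from Lars Vogel’s post or shipped with Eclipse: it parses a manifest with the JDK’s `java.util.jar.Manifest` and reports if `Bundle-ActivationPolicy` is not `lazy` or `Service-Component` is missing. The sample manifest in `main` is invented.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

class ManifestChecker {

    /** Returns a list of problems; an empty list means the DS-related headers look right. */
    static List<String> check(InputStream manifestStream) throws IOException {
        Attributes main = new Manifest(manifestStream).getMainAttributes();
        List<String> problems = new ArrayList<>();

        // The policy may carry directives (e.g. lazy;include:=...), so only check the prefix.
        String policy = main.getValue("Bundle-ActivationPolicy");
        if (policy == null || !policy.trim().startsWith("lazy")) {
            problems.add("Bundle-ActivationPolicy should be 'lazy' but was " + policy);
        }
        String components = main.getValue("Service-Component");
        if (components == null || components.isBlank()) {
            problems.add("Service-Component header is missing");
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        // Invented example manifest; note the trailing newline the manifest format requires.
        String manifest = "Manifest-Version: 1.0\n"
                + "Bundle-SymbolicName: org.example.scanning.device\n"
                + "Bundle-ActivationPolicy: lazy\n"
                + "Service-Component: OSGI-INF/*.xml\n";
        List<String> problems = check(new ByteArrayInputStream(manifest.getBytes(StandardCharsets.UTF_8)));
        System.out.println(problems.isEmpty() ? "Manifest OK" : String.join("\n", problems));
    }
}
```

In a real build you would pass a `FileInputStream` for the bundle’s `META-INF/MANIFEST.MF` instead of the in-memory string. Items 2 to 4 still need human judgement.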

In mathematics, cardinality refers to the number of elements in a set or other grouping, as a property of that grouping. In OSGi you have the following options for the cardinality of a service reference:

  • 0..1 (zero or one service instance)
  • 0..n (zero or more)
  • 1..1 (exactly one; if it is not available, the component is not activated and things start to fail)
  • 1..n (one or more)
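The options differ in two respects: whether at least one matching service must be present before the component can be activated, and how many services can be bound. The enum below is an illustrative model only, since Declarative Services implements this logic itself. The constant names mirror the `ReferenceCardinality` enum from the OSGi DS annotations, while `isSatisfied` and `boundCount` are hypothetical methods written for this example.

```java
/** Illustrative model of the four DS reference cardinalities (a simplification). */
enum Cardinality {
    OPTIONAL("0..1", 0, 1),
    MULTIPLE("0..n", 0, Integer.MAX_VALUE),
    MANDATORY("1..1", 1, 1),
    AT_LEAST_ONE("1..n", 1, Integer.MAX_VALUE);

    final String notation;
    final int min;
    final int max;

    Cardinality(String notation, int min, int max) {
        this.notation = notation;
        this.min = min;
        this.max = max;
    }

    /** True once enough services exist that this reference no longer blocks activation. */
    boolean isSatisfied(int availableServices) {
        return availableServices >= min;
    }

    /** Number of the available services that would be bound through this reference. */
    int boundCount(int availableServices) {
        return isSatisfied(availableServices) ? Math.min(availableServices, max) : 0;
    }

    public static void main(String[] args) {
        for (Cardinality c : values()) {
            System.out.printf("%s with 0 services: satisfied=%b; with 3 services: binds %d%n",
                    c.notation, c.isSatisfied(0), c.boundCount(3));
        }
    }
}
```

The output shows why 1..1 is strict: with no service available, the reference is unsatisfied and the whole component stays inactive.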

Here’s an example of injecting some services into a class and setting their cardinality:


Figure 3. Injecting services and setting cardinality. Image credit: Matthew Gerring.

We started off giving most of the service references in our XML files a 1..1 cardinality, and this worked well for a while. It made sense: we had one instance of each service and required it.

In practice, however, a service sometimes could not resolve; perhaps it had a missing dependency, or something it relied on bombed out. If that happens when all the references in one class are 1..1, you will find that none of the services in that class resolve. This leads to errors later on that are unrelated to the service that actually had problems. Therefore, we switched to mostly using 0..1 cardinality in classes with many injected services. We rely on an NPE, thrown when a missing service is used, to tell developers working with the class which specific service has errors. Another approach we use is to inject services into the class that uses them. In this case there must be a no-argument constructor that does very little work, because at OSGi injection time the whole system might not be up and running.
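A class with several optional references might look like the sketch below. The bind-method names follow the `setEventService` style visible in the warnings in Listing 2, but `ServiceHolder` and the two interfaces are hypothetical local stand-ins, not the real DAWN or Scanning types. The accessors still fail with a `NullPointerException`, but `Objects.requireNonNull` puts the name of the unbound service in the message.

```java
import java.util.Objects;

/**
 * Hypothetical holder whose component XML declares each reference with
 * cardinality 0..1. Declarative Services creates it with the no-argument
 * constructor and then calls the bind methods for whatever services exist.
 */
class ServiceHolder {
    // Static so non-OSGi code can fetch services; volatile because a dynamic
    // policy allows binding while the component is active, possibly on another thread.
    private static volatile IEventService eventService;
    private static volatile IImageService imageService;

    public ServiceHolder() {
        // Deliberately empty: the rest of the system may not be up at injection time.
    }

    // Bind methods, named in the component XML (bind = setEventService, ...).
    public void setEventService(IEventService service) { eventService = service; }
    public void setImageService(IImageService service) { imageService = service; }

    // Accessors throw an NPE that names the missing service.
    public static IEventService getEventService() {
        return Objects.requireNonNull(eventService, "IEventService is not bound: check the bundle that provides it");
    }

    public static IImageService getImageService() {
        return Objects.requireNonNull(imageService, "IImageService is not bound: check the bundle that provides it");
    }

    public static void main(String[] args) {
        new ServiceHolder().setEventService(() -> "tcp://localhost:61616");
        System.out.println(getEventService().uri());
        try {
            getImageService();
        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        }
    }
}

// Local stand-ins for the real service interfaces (hypothetical).
interface IEventService { String uri(); }
interface IImageService { int[] histogram(double[] data); }
```

The static fields assume the common holder pattern, in which non-OSGi code fetches services through static getters; that is a design assumption for this sketch, not a requirement of Declarative Services.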

Real world problem #6: Declarative services

I wrote in Real world problem #4 that our product today has lots of services. It has been surprising how quickly the idea of a no-dependency interface, implemented by bundles elsewhere, has caught on for us. Not every developer in the group is familiar with declarative services yet, however; instead, several pioneers have taken up the cause in different teams.

Going back a few years, before OSGi, we were prospecting: looking for a better way of doing things. Now different members of the group have taken the idea forward. They have enthusiastically hidden dependencies and created new services, and this has presented its own problem. Although we have one server, different developers provide bundles without knowing how those bundles will be run on other experiments. We also have test products and share bundles with other products; for example, we have a product we use for analysis called DAWN. In addition, we have open source projects that we have donated, or will donate, to the Eclipse Foundation, such as Scanning, January and Richbeans. As a result, OSGi XML that works fine in one context can cause warnings in another. Some example warnings can be seen in the log below:

Listing 2. Logged warnings


!ENTRY org.eclipse.equinox.ds 1 0 2016-11-04 15:55:27.344
!MESSAGE Could not bind a reference of component Data Slice Service Holder. The reference is: Reference[name = IImageService, interface = org.eclipse.dawnsci.plotting.api.histogram.IImageService, policy = dynamic, cardinality = 0..1, target = null, bind = setImageService, unbind = null]

!ENTRY org.eclipse.equinox.ds 1 0 2016-11-04 15:55:27.345
!MESSAGE Could not bind a reference of component Data Slice Service Holder. The reference is: Reference[name = IPlotImageService, interface = org.eclipse.dawnsci.plotting.api.image.IPlotImageService, policy = dynamic, cardinality = 0..1, target = null, bind = setPlotImageService, unbind = null]
Starting VMXi SampleHandling Service

!ENTRY org.eclipse.scanning.connector.epicsv3 4 0 2016-11-04 15:55:27.385
!MESSAGE [SCR] Error occurred while processing end tag of XML 'bundleentry://627.fwk1711105800/OSGI-INF/epicsv3DynamicDataset.xml' in bundle org.eclipse.scanning.connector.epicsv3_1.0.0.qualifier [627]!  The 'service' tag must have one 'provide' tag set at line 4 

!ENTRY org.eclipse.equinox.ds 1 0 2016-11-08 10:25:48.544
!MESSAGE Could not bind a reference of component Scanning Servlet Services. The reference is: Reference[name = IEventService, interface = org.eclipse.scanning.api.event.IEventService, policy = dynamic, cardinality = 0..1, target = null, bind = setEventService, unbind = null]

The way around this problem, it seems, is to understand each message. Adding -Dequinox.ds.print=true to the JVM arguments, for example, prints the actual errors, and these must be resolved. However, some messages are accurate but don’t actually matter. One of those above, for instance, concerns a 0..1 reference to a service that is not yet available. Later in the class loading, this service resolves and works when used. So my advice is to understand all of your OSGi error messages, and be aware that some really matter while others will self-correct at runtime.
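One way to act on this advice is to triage the `Could not bind a reference` entries by cardinality. An optional reference (0..1 or 0..n) may simply bind later, whereas a mandatory one (1..1 or 1..n), if it appears in such a message, keeps its component from activating and needs attention. The sketch below is a hypothetical helper, not part of Equinox, and its regular expression assumes exactly the message format shown in Listing 2.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BindWarningTriage {
    // Assumes the Equinox DS message format shown in Listing 2 (one message per line).
    private static final Pattern UNBOUND = Pattern.compile(
            "Could not bind a reference of component (.+?)\\. The reference is: Reference\\[.*?"
                    + "interface = ([\\w.$]+), .*?cardinality = ([01])\\.\\.[1n]");

    static final class Warning {
        final String component;
        final String service;
        final boolean mandatory; // 1..1 or 1..n: the component cannot activate without it

        Warning(String component, String service, boolean mandatory) {
            this.component = component;
            this.service = service;
            this.mandatory = mandatory;
        }

        @Override
        public String toString() {
            return (mandatory ? "MUST FIX        " : "may bind later  ") + component + " -> " + service;
        }
    }

    /** Extracts every unbound-reference warning, classified by its cardinality's lower bound. */
    static List<Warning> triage(String log) {
        List<Warning> warnings = new ArrayList<>();
        Matcher m = UNBOUND.matcher(log);
        while (m.find()) {
            warnings.add(new Warning(m.group(1), m.group(2), m.group(3).equals("1")));
        }
        return warnings;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BindWarningTriage <path-to-log>");
            return;
        }
        for (Warning w : triage(Files.readString(Path.of(args[0])))) {
            System.out.println(w);
        }
    }
}
```

Entries that do not match the pattern, such as the malformed-XML error from the epicsv3 bundle in Listing 2, are ignored by this helper and still need reading by hand.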

Real world problem #7: The hidden cost of TDD

A significant fraction of our code base predates our decision to use test-driven development. Then there’s the fact that we sometimes have to add certain features in a hurry for a given experiment, and these can, not surprisingly, be areas with limited tests. So for everything new in the OSGi server, we tried to follow a TDD methodology, bolting things down as we went. The modularity provided by moving to services allowed the new services to be mocked out and a huge number of tests to be written, which led to an interesting problem: all the new tests added significantly to our Jenkins build time, affecting developers across the whole group.

The tests were checking bundles that the rest of the group were unlikely to change on the beamline, for instance generic scanning or file writing. So Travis CI came to the rescue. A GitHub webhook triggers Travis, which runs the build (Maven) and tests (JUnit) for us whenever a pull request is submitted to one of the GitHub repositories. This means the main product can have a faster build (in-house and on Jenkins), because specific API bundles live in separate GitHub repositories with their own build and test.

Increased modularity let us break up the build, which reduced the time developers spent waiting for builds and tests (for instance when doing Gerrit reviews) and allowed us to develop faster.

Have a go! Get the OSGi scanning server code

Diamond Light Source are committed to open source data and open source code. With the help of the Eclipse Foundation, we are planning to have most parts of our data acquisition system IP-checked. Follow these instructions to get a toy OSGi scanning server and run it with a user interface and a mocked-out hardware connection. (You should be familiar with targets and products in Eclipse, and with Git.)

  1. Get the code from GitHub:
    
    git clone --depth=50 --branch=master https://github.com/DiamondLightSource/daq-eclipse ./eclipse/org.eclipse.scanning
    
    git clone --depth=50 --branch=master https://github.com/eclipse/richbeans.git ./eclipse/org.eclipse.richbeans
    
    git clone --depth=50 --branch=master https://github.com/eclipse/dawnsci.git ./eclipse/org.eclipse.dawnsci
    
    git clone --depth=50 --branch=master https://github.com/DawnScience/dawn-hdf.git ./dawn-hdf
    
  2. Import all the projects from the repositories you checked out into your Eclipse workspace. You will need Eclipse with the RCP development tools.
  3. Open the file org.eclipse.scanning.target.platform.fat.target. Eclipse needs to download the components it lists into your target, which happens when you open the file. Then click the set as target platform link in the top-right corner:

    Figure 4. Set the target platform. Image credit: Matthew Gerring.

  4. At this point all the projects should compile. You should start the server using the product org.eclipse.scanning.example.server.product and then start the client using the product org.eclipse.scanning.example.client.fat.product. If the server starts correctly you will see the message:
    
    11:36:15.434 INFO  o.e.scanning.event.ConsumerImpl - X-Ray Centering Consumer Submission ActiveMQ connection to failover:(tcp://localhost:61616)?startupMaxReconnectAttempts=3 made. [Consumer Thread X-Ray Centering Consumer]
    
    The server starts a local ActiveMQ broker on port 61616, which you can configure using command-line options.
  5. Try running a scan by going to the Scanning perspective and drawing a grid scan using the Scan Editor. It looks something like this:

    Figure 5. Using the Scan Editor. Image credit: Matthew Gerring.
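If the client started in step 4 cannot connect, a quick first check is whether anything is listening on the broker port. The sketch below uses only `java.net.Socket`; the host and port come from the server log line in step 4 and are assumptions if you have configured ActiveMQ differently. It only confirms that the TCP port accepts connections, not that the broker is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class BrokerCheck {
    /** True if something accepts TCP connections at host:port within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable or timed out
        }
    }

    public static void main(String[] args) {
        boolean up = isListening("localhost", 61616, 2000);
        System.out.println(up
                ? "Port 61616 is accepting connections"
                : "Nothing listening on port 61616; is the scanning server running?");
    }
}
```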

Conclusions

Moving mature, complex, and mission-critical software products to dynamic class loading is actually fairly easy, and we certainly should have done it sooner! We had some details to overcome, but we were able to find solutions without spending a huge amount of time. The move suited the way we work and improved it. We found mature tools to pick up and use, a clear migration path, and blogs to follow that helped with the process. Moving to OSGi is straightforward enough that I would recommend anyone considering a dynamic class loading solution invest some time in it.

Thanks to staff at Diamond Light Source for working on our OSGi upgrade and editing this article for JavaWorld.
