Integration entrepreneur: Now's the time for semantic technology

Sanjiva Nath, CEO of ALM innovator zAgile, explains why the Semantic Web is finally getting traction in the cloud era -- and the role he hopes his company will play

In an interview with the Wall Street Journal earlier this month, Google Fellow Amit Singhal revealed that Google was infusing its search engine with semantic search technology to dramatically improve the contextual accuracy of Google search results. That same essential technology, based on the Semantic Web model Tim Berners-Lee pioneered over a decade ago, was the inspiration for zAgile, an ALM (application lifecycle management) software company that entrepreneur Sanjiva Nath founded in 2006.

Before launching zAgile, Nath headed software delivery at Trigo Technologies, a product information management software vendor acquired by IBM in 2004. The company's core offering, Trigo Product Center, was developed to enable large retailers and manufacturers to establish a "golden source" of product information to feed all their applications. Along the way, Nath encountered the severe limitations of using conventional relational database technology to reconcile multiple versions of the truth -- and decided that the Semantic Web held the answer.

Among the first enterprise software companies to use Semantic Web technology, zAgile just updated Wikidsmart, an open source, integrated ALM platform that enables app dev managers to monitor and control projects with multiple development teams. The core technology is the Wikidsmart Context Server, which uses Semantic Web technology to reconcile the varying nomenclature, methodologies, and artifacts development teams employ.

I interviewed Nath on the heels of the new release and shortly after Singhal offered his de facto endorsement of semantic technology. The conversation ranged from the problems with conventional integration technologies to what Nath believes is the "ultimate destiny" for zAgile, which is to be the integration platform of choice for deploying and managing enterprise cloud applications and services.

Eric Knorr: How do you define zAgile's value proposition, and why did you turn to Semantic Web technology for your company's core technology?

Sanjiva Nath: I wanted to offer solutions to problems I first encountered at Trigo, which had very large manufacturers and retailers as customers, including HP, Sony, Philips, and Wal-Mart. The product data integration challenges at those companies were pretty significant. At the time, for example, a single HP printer had over 7,000 attributes. And there are a lot of different elements to this data; integration involves more than simply mapping elements among tables. You must have a mechanism for capturing and representing complex taxonomies, representing attribute-level inheritance, and so on.

People normally don't get very far doing that with relational models, because it's very, very difficult. A lot of people have lost their hair trying to represent that.

Knorr: That kind of data reconciliation is the dirtiest job in IT.

Nath: It gets dirtier particularly in home-grown applications, because the semantics of any particular data element are unclear. For example, if you put a flag in a particular field, it may mean one thing; if you put a date in it, it means something else. So that level of interpretation is sometimes hidden within the code so it's not obvious in any form, which mostly means that the metadata associated with this information is very, very lightweight. It exists either in an implicit form within the code or in some user's head or in some specification. It's not available for anybody to interpret.

Knorr: So in the process of building Trigo, these problems came into sharp relief. And that was the genesis of zAgile?

Nath: Yes, but the challenge of integrating product information was only half of the inspiration. I had another problem: managing 160 developers in four countries. It was always impossible to keep teams in sync with respect to methodology, processes, timelines, status, and so on.

It was frustrating because I knew that each tool and application had the needed information in its own repository, but there was no efficient way to aggregate and reconcile all that data into a real-time or even a weekly report. As it turned out, the problems of integrating multiple development teams were similar to the ones we were encountering with integrating product information.

But information management highlights only one dimension of the problem: Taking data from multiple sources and creating a consistent and centralized repository out of it. There's a bigger problem, and that's getting even bigger now, which has to do with integrating social collaboration and processes in the enterprise. So how do you pull all of that together? The problems of integration were compounding before we even tackled one dimension of it.

Knorr: So how did you determine that the Semantic Web held the solution?

Nath: I wasn't happy with my other options. When I looked around at existing solutions, they were always custom. Whether it was ETL, application integration with a service bus, or even data federation, it was the same thing time and again. We were required to map every single artifact, and then still there wasn't a holistic, contextual relevance to the content.

Whenever you integrate content, you have the same problem. The integration entails mapping data to data and does not capture any intrinsic understanding of the process or context of how the data is related. It's essentially dumb mapping. We needed to represent taxonomies and manage attribute-level inheritance, which aren't very natural at all using conventional database technology. And I saw that semantic technologies could do not only that but a heck of a lot more -- and with ease.

The fundamental premise of our platform and architecture is integration based on semantic reconciliation. That gives you the ability to define, regardless of what the tool is or what's coming from the tool, a single, agreed-upon classification scheme. Not only that, you can capture a lot more information. The richness of metadata that you can capture simply isn't there with relational models.
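To make "attribute-level inheritance" concrete, here is a minimal sketch, not zAgile's or Trigo's implementation. The category names, attribute names, and the `effective_attributes` helper are all hypothetical. It only illustrates the idea that a narrower category inherits attributes from broader ones and may override them.

```python
# Hypothetical taxonomy: each category names its parent and its own attributes.
TAXONOMY = {
    "Product": {"parent": None, "attrs": {"warranty_months": 12}},
    "Printer": {"parent": "Product", "attrs": {"paper_size": "A4"}},
    "LaserPrinter": {"parent": "Printer", "attrs": {"warranty_months": 24}},
}


def effective_attributes(category):
    """Collect attributes from the root down so narrower categories override."""
    chain = []
    while category is not None:
        chain.append(category)
        category = TAXONOMY[category]["parent"]

    merged = {}
    for name in reversed(chain):  # root first, most specific category last
        merged.update(TAXONOMY[name]["attrs"])
    return merged


print(effective_attributes("LaserPrinter"))
```

In this toy data, `LaserPrinter` picks up `paper_size` from `Printer` and replaces the `warranty_months` value it would otherwise inherit from `Product`.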

Knorr: Can you give me an example?

Nath: Sure. Here's a simple one. We just implemented a solution for a customer yesterday that would be impossible using conventional integration schemes. On this customer's internal website, there were pages in a wiki that represented software requirements, which were in turn implemented using an issue tracking system. There's a very simple point-to-point integration scheme: A page implicitly represents a requirement definition, and on this page they would add links to implementation tasks for easy reference.

But there's a heck of a lot that this simple point-to-point integration doesn't really give you.

You haven't really captured the semantics of anything -- any formal relationships or any kind of metadata around what that relationship is about. The page doesn't know that what's referenced on it is actually a task, and the task doesn't know that it's represented on the page. When you go to the task in an issue tracking app, for example, it doesn't tell you that it's been referenced on a page. It couldn't tell you what that reference is or what that relationship is. Is that page representing a design document, a requirement, or is this just a comment? It can't tell you.

Then there's another dimension of this. What if you want to know this relationship -- not just between the wiki page and the task, where you have referenced it -- what if you want to know it somewhere else? Maybe there's a customer case associated with either the task or requirement in Salesforce. You have no idea whether that case has any relation to the task, nor do you know if there are any documents associated with it in the wiki. You cannot access these relationships in other applications.
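The wiki-page-and-task example can be sketched as a small store of typed statements. This is an illustrative toy under stated assumptions, not the Wikidsmart Context Server: the `ContextStore` class, the identifiers, and the predicate names (`type`, `implements`, `reportedAgainst`) are all invented for the example.

```python
from collections import namedtuple

# One statement of the form (subject, predicate, object).
Statement = namedtuple("Statement", ["subject", "predicate", "obj"])


class ContextStore:
    """A tiny in-memory stand-in for a shared store of typed relationships."""

    def __init__(self):
        self.statements = set()

    def add(self, subject, predicate, obj):
        self.statements.add(Statement(subject, predicate, obj))

    def about(self, resource):
        """Return every statement that mentions the resource, in either role."""
        return sorted(s for s in self.statements if resource in (s.subject, s.obj))

    def subjects(self, predicate, obj):
        """Return the subjects that point at obj through the given predicate."""
        return sorted(
            s.subject
            for s in self.statements
            if s.predicate == predicate and s.obj == obj
        )


store = ContextStore()
store.add("wiki:page-42", "type", "Requirement")
store.add("tracker:TASK-7", "type", "Task")
store.add("tracker:TASK-7", "implements", "wiki:page-42")
store.add("crm:case-311", "reportedAgainst", "tracker:TASK-7")

# The requirement page can now be asked which tasks implement it...
tasks = store.subjects("implements", "wiki:page-42")
# ...and a third application can walk from the requirement to customer cases.
cases = [case for task in tasks for case in store.subjects("reportedAgainst", task)]
print(tasks, cases)
```

Because the relationship lives outside both tools and has a name, either end can discover it, and so can an application that took no part in creating it.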

Knorr: You're talking about capturing context.

Nath: That's right. The simple example that I described illustrates that in typical point-to-point integration schemes, you don't capture the context between the two. Not only do you not capture the formal context, it's not available anywhere else. Nobody else can interpret that or know anything about it.

Maybe the requirement has multiple tasks associated with it, but each task is associated with a different version of that requirement. So even though it's a single page and five tasks referenced in it, the relationship between each task and the requirement is actually a different relationship, meaning that it has some different attribute values. Without semantic context you can't capture that.
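One way to picture a relationship that carries its own attribute values is to treat each link as a record rather than a bare hyperlink. The sketch below is hypothetical throughout (the `Implements` class, the `requirement_version` field, and the task and requirement IDs are made up) and only mirrors the one-page, five-task scenario described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implements:
    """A task-to-requirement relationship that carries its own attributes."""

    task: str
    requirement: str
    requirement_version: int


# One requirement, five tasks, each tied to a specific version of it.
links = [
    Implements("TASK-1", "REQ-9", 1),
    Implements("TASK-2", "REQ-9", 1),
    Implements("TASK-3", "REQ-9", 2),
    Implements("TASK-4", "REQ-9", 2),
    Implements("TASK-5", "REQ-9", 3),
]


def tasks_for_version(links, requirement, version):
    """Return the tasks linked to one particular version of a requirement."""
    return [
        link.task
        for link in links
        if link.requirement == requirement and link.requirement_version == version
    ]


print(tasks_for_version(links, "REQ-9", 2))
```

A plain link on a wiki page has nowhere to store the version, so a question such as "which tasks addressed version 2?" cannot be answered from the page alone.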

Finally there's the problem of synonyms, both broad terms and narrow terms. Many enterprises are investing in creating their own internal vocabularies and taxonomies. They're doing that because they need to get on the same page when they're talking across geographies and across departments.

Semantics gives you support for all of those things. It's a very simple capability.
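As a rough illustration of synonyms plus broader and narrower terms, the sketch below keeps one preferred term per concept and a pointer to each term's broader term. The vocabulary and the helper names are assumptions made up for this example, not an actual enterprise taxonomy.

```python
# Hypothetical vocabulary: synonyms point to one preferred term,
# and each preferred term may name a broader term.
PREFERRED = {
    "bug": "defect",
    "issue": "defect",
    "story": "requirement",
    "feature request": "requirement",
}
BROADER = {
    "defect": "work item",
    "requirement": "work item",
    "work item": "artifact",
}


def preferred(term):
    """Normalize a team's local word to the agreed preferred term."""
    key = term.strip().lower()
    return PREFERRED.get(key, key)


def broader_terms(term):
    """Walk up the hierarchy from a term to its most general ancestor."""
    result = []
    current = BROADER.get(preferred(term))
    while current is not None:
        result.append(current)
        current = BROADER.get(current)
    return result


def same_concept(a, b):
    """Two words denote the same concept if they share a preferred term."""
    return preferred(a) == preferred(b)


print(same_concept("Bug", "issue"), broader_terms("story"))
```

With a shared vocabulary like this, one team's "bug" and another team's "issue" resolve to the same concept without either team changing its wording.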

Knorr: To what degree do applications need to be compatible with semantic technology?

Nath: That's the magic of it -- applications don't have to care. The bottom line is that you have a mechanism for mapping whatever applications represent. The application does what it does. Our platform and our framework are different from Trigo's, because we're not creating a single source of data that everyone then goes after. What we're doing is capturing the semantics in a central location and interrelating concepts with each other, based on pre-defined domain-specific metamodels, so the semantics can both be consistent and available to all applications whenever the context is needed.

If you go into an issue tracking application just to update the status of an issue, you don't really need any context for it. But on the other hand, if you want to know who worked on it, what else they are assigned, any related design documents, and why the issue existed in the first place, then that context can be available appropriately within that app. So the applications actually don't care at all. It's really the mapping between whatever the application represents and whatever your cohesive metamodel is, if you will.
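The idea that applications "don't care" can be sketched as a mapping layer that sits outside each tool. Everything here is hypothetical: the tool names, the native field names, the shared concept names, and the `to_common` helper. The tools keep their own records, and only the mapping knows how those records line up with the common model.

```python
# Hypothetical per-tool mappings from native field names to shared concepts.
MAPPINGS = {
    "tracker": {"key": "id", "assignee": "owner", "state": "status"},
    "wiki": {"page_id": "id", "author": "owner", "workflow": "status"},
}


def to_common(tool, record):
    """Translate a tool's native record into the shared concepts.

    Fields with no mapping are left out; the tool's own record is untouched.
    """
    mapping = MAPPINGS[tool]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


issue = {"key": "TASK-7", "assignee": "ana", "state": "open", "priority": "P2"}
page = {"page_id": "page-42", "author": "raj", "workflow": "approved"}

print(to_common("tracker", issue))
print(to_common("wiki", page))
```

Adding another tool in this sketch means adding one more mapping entry rather than changing any of the tools already connected.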

Knorr: So the semantics become the middleware?

Nath: Exactly.

Knorr: It seems to me you had three choices when you were trying to solve this problem. The first one, the relational model, just wasn't going to work. And I suppose you would have had the possibility of just going off and developing something on your own. But specifically you turned to Semantic Web technology. Did you feel that was your only option? Did you think it was hand in glove? Was it something you knew about before Trigo and considered seriously? Or was there an "aha" moment?

Nath: I think it was an "aha" moment. The biggest reason for going to the Semantic Web -- even though it began around 2000 and has taken a while to mature -- was that we see a broad impact. The applicability of the platform and technology crosses a lot of boundaries and domains and industries. And so it was really important to use open standards that would not only make people comfortable, but more importantly address the larger communities around it. Semantic technology obviously has quite a lot of support.

Knorr: Yet in the past there has been a lot of eye-rolling and resistance to the Semantic Web. Why is that and why is it wrong?

Nath: One reason is that the technology is nascent, so the main focus has been in academia and at the level of the Web -- in the latter case, such use cases as figuring out how to make Wikipedia smart. Few have really focused on what to do with it in the enterprise. My big objection to the semantic community has been that they needed to move from basic science to an applied science paradigm.

Knorr: Why did that take so long?

Nath: It's kind of chicken and egg. Until we came along, no one had mapped the application of technology to a critical business problem. That always affects adoption.

You have a lot of new technologies that are coming out right now -- various flavors of big data and NoSQL stuff -- all of it is very new to people. But they're converging on it, because they realize it can help them with something that's been hurting the longest. Our application of semantic technologies for information integration or application integration is a solution that people are just starting to jump on. No new technology goes anywhere until someone comes up with a solution that gives people the "aha" moment.

Knorr: So why did you decide on application lifecycle management as the place to apply semantic technology? Because you knew that area best?

Nath: Well, we really didn't start with the technology and try to figure out what to do with it. I actually had a problem I wanted to solve -- one that I had faced during my entire career. So I really spent a lot of time trying to figure out how to solve it, because no conventional solutions existed for it. So through that research and trying out different methods, I arrived at our semantic solution.

The framework that we use -- it's an open source framework -- was developed at HP Labs in Bristol. And I believe HP spent millions developing it. So this is a fairly tried-and-proven solution.

Knorr: Why is your solution suitable for this point in time in particular?
