Now that Web services have become a viable technology, enterprises are beginning to see real return on investment from them. Web services represent a less invasive, less costly, and loosely coupled approach to integrating heterogeneous, distributed applications and processes. Because Web services and service-oriented architectures (SOAs) address the integration of business processes and applications, it becomes necessary to understand how Web services interact with the data layer.
At the heart of any software architecture resides information, or data as we know it, that can span multiple locations, applications, and usage scenarios. If this data can be represented in a pervasive and standards-based manner, it becomes that much easier to enable the flow of this information throughout an enterprise and also across trading partners.
We believe XML databases provide a schema-agnostic and ubiquitous representation of information and can enable enterprise information integration (EII). By the end of this article, you will have enough information to deduce that XML databases combined with Web services enable the flow of information across loosely-coupled applications, resulting in a more responsive architecture and compelling return on investment.
Before we move on to specific examples, you will need to understand what a native XML database (NXD) is all about.
What is a native XML database?
Native XML databases are designed especially for storing XML documents. Like other databases, their basic supported features include transactions, security, multiuser access, programmatic APIs, query languages, and so on. NXDs differ from other databases in that their internal data models are based on XML only.
NXDs are most useful for storing document-centric data because they preserve document order, processing instructions, comments, CDATA sections, and entity usage, while XML-enabled databases do not. Furthermore, NXDs support XML query languages, allowing you to pose requests like, "Get me all documents in which the third paragraph after the start of the section contains a bold word." Such queries are clearly difficult to ask in a language like SQL.
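To make the quoted request concrete, here is a minimal sketch that expresses it in XPath, the path language that XQuery builds on, using only the XML APIs that ship with the JDK. The element names (section, para, b) and the sample documents are assumptions made for illustration, and "third paragraph after the start of the section" is read as the third para child of a section. A real NXD would run an equivalent query across its stored collection rather than one in-memory document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ThirdParagraphQuery {
    // Hypothetical vocabulary: <section>, <para>, and <b> are assumed element names.
    // para[3] picks the third <para> child of each <section>; [.//b] keeps it
    // only if a <b> element appears somewhere inside it.
    static final String QUERY = "boolean(//section/para[3][.//b])";

    /** Returns true if the third <para> of any <section> contains a <b> element. */
    static boolean matches(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Boolean) xpath.evaluate(QUERY, doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse or query document", e);
        }
    }

    public static void main(String[] args) {
        String hit = "<doc><section><para>one</para><para>two</para>"
                + "<para>three has a <b>bold</b> word</para></section></doc>";
        String miss = "<doc><section><para>a <b>bold</b> start</para>"
                + "<para>two</para><para>three</para></section></doc>";
        System.out.println(matches(hit));  // prints true
        System.out.println(matches(miss)); // prints false
    }
}
```

The positional predicate relies on document order being preserved, which is the property the paragraph above attributes to NXDs.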
In addition, NXDs prove useful for storing documents with a natural format of XML, regardless of what those documents contain. For example, consider XML messages in an SOA environment. Although these documents are probably data-centric, their natural format as messages is XML. Thus, when they are stored in a message queue, it makes more sense to use a message queue built on a native XML database than a non-XML database. The NXD offers XML-specific capabilities, such as XML query languages, and usually retrieves whole messages faster. Another example of this usage is a Web or enterprise data cache.
Uses for a native XML database include:
- Enabling EII
- Providing a unified master data-access layer across the enterprise
- Validating, persisting, querying, and repurposing XML
- Becoming XML-standards compliant
- Aggregating content from a variety of systems (Java Database Connectivity (JDBC), HTTP, filesystem, Web services)
- Serving as an enterprise data cache and operational datastore to improve data-access response times and relieve burden on backend systems
- Supporting an enterprise data bus solution
Native XML database features may include:
- Internal identity management systems and integration with external identity management systems like LDAP (lightweight directory access protocol)
- Seamless, schema-independent persistence/caching of Java Web service messages, XML, SOAP, and WSDL (Web Services Description Language)
- Built-in support for security standards such as Security Assertion Markup Language (SAML), Web Services Security (WSS), and XML Encryption
- Built-in support for workflow management based on Business Process Execution Language (BPEL)
- Built-in support for ebXML Registry functionality
- Seamless persistence of unstructured content
- Robust and lightning-fast querying by an engine with "pure-play" XQuery implementation, where, instead of mapping XQuery to another query language, it is used directly against a database designed from the ground up for XQuery
- Interactive and intuitive graphical environment
- Intelligent tools to repurpose the data (using XSLT (Extensible Stylesheet Language Transformations) and XQuery)
- Seamless integration with external JDBC sources with ability to read, query, insert, update, and delete all within an XA-compliant transaction
- Seamless integration with HTTP and SOAP sources
- Transactions with all available datasources using XQuery
- Standards compliance by enforcing schema validation and data aggregation mapped to a required schema
- A single source of identity management and authorization
- Backup, restore, and replication capabilities
Figure 1 illustrates an NXD server's components.
Two primary uses of an XML database are to enable EII or provide a midtier operational datastore (ODS) platform.
Enterprise information integration
XML databases enable EII by providing a platform for querying across heterogeneous datasources, resulting in one 360-degree view of all common entities spread across enterprise systems or services. EII provides huge benefits to business users. For example, imagine a doctor-patient encounter and a system where a doctor can enter a patient chart number, name, or other form of identification and obtain information on that patient's history of illnesses, allergies, medications (current and past), X-rays, past surgeries, and doctor summary reports, all in one screen, irrespective of the originating datasources.
A midtier ODS can provide the necessary infrastructure for managing enterprise data and bringing it closer to the consuming business application, while simultaneously reducing the burden on backend systems of record. XML databases are an ideal technology to serve as an ODS because of their ability to maintain schemas and to bind heterogeneous datasources. Furthermore, XML databases' support for XA-compliant transactions makes them an ideal ODS and EII technology that enables both read and write capabilities across heterogeneous systems.
With this information in mind, we move on to a specific use-case scenario and look at how the ODS ties these concepts together.
Consider the high-level architecture of a hospital information system (HIS). Like any scalable, connected, and secure information system, an HIS brings information and functionality from distributed systems to diverse users in real time. Typically, the actors in this use case are doctors, nurses, lab technicians, IT resources, third-party vendors, and probably patients.
An HIS architecture contains the following components:
- A federation of datasources
- An integration of systems
- Packaged applications
- Custom applications
- The delivery of functionality
- Medical digital assistants (MDAs)
In Figure 2, the federated datasource could be an XML database that queries backend systems via their Web service interfaces. To better understand the overall system architecture, consider a typical hospital infrastructure model as depicted in Figure 3:
The above diagram depicts an overall set of systems a hospital may be using. These systems could potentially be provided by a single vendor, but in most realistic cases, they are provided by disparate vendors, each with a totally different set of APIs and user interaction interfaces. Thus, for these systems to live together and exchange information, the hospital IT department should create a set of Web services for each system, exposing the important data and functionality of the respective system.
Java Web services: What problems can they solve?
Presuming the hospital IT department leverages Web services, many traditional problems can be addressed:
- Connecting traditionally separate and autonomous software systems
- Enabling the construction of distributed systems
- Creating dynamic, collaborative applications
- Allowing diverse and redundant systems to be addressed through a common, coherent set of interfaces
- Protecting existing IT investments without inhibiting the deployment of new capabilities
- Bringing information technology investments more in line with business strategies
Java Web services need persistence and query
As mentioned earlier, Java Web services create huge amounts of new data, specifically the exchange of data-rich XML messages. These messages contain important information that many organizations will want and need to store, access, query, audit, analyze, and repurpose.
It is nearly impossible to persist all of these messages in a relational database because of the inflexible data model it imposes. You must know what type of data each message will contain and set up relational tables to store it. Additionally, you will have to write code that knows, for every message type, how to take the incoming message, shred it, and populate the tables.
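The per-message-type code described above might look something like the following sketch. It shreds one hypothetical message type, labResult, into a flat row of column values using the JDK's XPath support. The message layout, column names, and class name are all assumptions for illustration; a real system would also need the matching table definition and insert logic, repeated for every message type.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LabResultShredder {
    // Hypothetical mapping from table columns to XPath expressions.
    // It is valid for ONE message type only; each new type needs its own mapping.
    private static final Map<String, String> COLUMN_PATHS = new LinkedHashMap<>();
    static {
        COLUMN_PATHS.put("patient_id", "/labResult/patient/@id");
        COLUMN_PATHS.put("test_code", "/labResult/test/code");
        COLUMN_PATHS.put("test_value", "/labResult/test/value");
    }

    /** Extracts one row of column values from a labResult message. */
    static Map<String, String> shred(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Map<String, String> row = new LinkedHashMap<>();
            for (Map.Entry<String, String> column : COLUMN_PATHS.entrySet()) {
                // A path that matches nothing evaluates to the empty string.
                row.put(column.getKey(), xpath.evaluate(column.getValue(), doc));
            }
            return row;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not shred message", e);
        }
    }

    public static void main(String[] args) {
        String message = "<labResult><patient id='p1'/>"
                + "<test><code>GLU</code><value>5.4</value></test></labResult>";
        System.out.println(shred(message)); // prints {patient_id=p1, test_code=GLU, test_value=5.4}
    }
}
```

Note that a message of any other shape passes through this code without error and produces a row of empty strings, which is one way evolving message structures can silently lose data in a shredding approach.</labResult>");
if (!"p1".equals(row.get("patient_id"))) throw new AssertionError("patient_id");
if (!"GLU".equals(row.get("test_code"))) throw new AssertionError("test_code");
if (!"5.4".equals(row.get("test_value"))) throw new AssertionError("test_value");
java.util.Map<String, String> other = LabResultShredder.shred("<prescription><drug>ibuprofen</drug></prescription>");
if (!"".equals(other.get("patient_id")) || !"".equals(other.get("test_code"))) throw new AssertionError("unknown type should yield empty columns");
</test>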
XML databases are particularly useful for handling new message types or evolving message structures. Storing message content in a native XML database reduces development time and cost by at least 50 percent by eliminating the need to define object-to-relational mapping. Extracting, transforming, and working with XML content stored in a native XML database is also relatively simple. Every aspect of data management is done using XQuery, a powerful language specified by the W3C (World Wide Web Consortium) and designed specifically for working with XML. Furthermore, being able to seamlessly communicate with internally managed data, HTTP, filesystem (URI), and Web services (based on WSDL references), all from within an XQuery statement, makes for a powerful and easy-to-use solution. Such functionality enables an enterprise design to have a single point of access (the NXD) and provides the ability to aggregate, transform, and repurpose the data via the same API and query language (XQuery).
An NXD can also serve as an operational data cache. Using this approach, specific content that most likely will not change often, or once created, never changes, can be cached in the NXD as either XML or other formats required by the client systems. In addition, each datasource can be configured with a time-to-live setting; when a request is made, that configuration is evaluated by the NXD engine and results are either returned directly from the internal cache or fetched from the originating source (if the cache is deemed as expired).
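The time-to-live decision described above can be sketched in a few lines of Java. This is not any particular NXD's API: the class name, the origin function standing in for the backend fetch, and the injectable clock are all assumptions for illustration. One cache instance represents one datasource with its own time-to-live setting.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

class TtlXmlCache {
    /** A cached payload plus the moment it was fetched. */
    private static final class Entry {
        final String xml;
        final Instant fetchedAt;
        Entry(String xml, Instant fetchedAt) {
            this.xml = xml;
            this.fetchedAt = fetchedAt;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Duration timeToLive;             // per-datasource setting
    private final Function<String, String> origin; // hypothetical fetch from the originating source
    private final Supplier<Instant> clock;         // injectable so expiry can be tested

    TtlXmlCache(Duration timeToLive, Function<String, String> origin, Supplier<Instant> clock) {
        this.timeToLive = timeToLive;
        this.origin = origin;
        this.clock = clock;
    }

    /** Returns the cached value if it is still fresh, otherwise refetches and re-caches it. */
    String get(String key) {
        Instant now = clock.get();
        Entry cached = entries.get(key);
        if (cached != null && now.isBefore(cached.fetchedAt.plus(timeToLive))) {
            return cached.xml;
        }
        // Missing or expired. Concurrent callers may both fetch; acceptable for a sketch.
        String fresh = origin.apply(key);
        entries.put(key, new Entry(fresh, now));
        return fresh;
    }

    public static void main(String[] args) {
        AtomicInteger backendCalls = new AtomicInteger();
        TtlXmlCache cache = new TtlXmlCache(
                Duration.ofMinutes(5),
                key -> "<allergies patient='" + key + "' fetch='"
                        + backendCalls.incrementAndGet() + "'/>",
                Instant::now);
        cache.get("p1");
        cache.get("p1"); // served from the cache, so the backend is not called again
        System.out.println(backendCalls.get()); // prints 1
    }
}
```

Content that never changes once created could use a very long time-to-live, while volatile sources would use a short one.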
As we dive a little deeper and discuss specific use-case scenarios as depicted in Figure 4, we will build a stronger case for how XML databases can augment Web services.
Figure 4 depicts four specific example use cases and also introduces three external sources a hospital must interact with on a day-to-day basis: Food and Drug Administration (FDA), Centers for Disease Control and Prevention (CDC), and insurance providers. In all three cases, the hospital must communicate using Web services and XML-based standards as follows:
- CDC and FDA: The Clinical Document Architecture (CDA) standard from Health Level Seven (HL7), an ANSI-accredited standards-developing organization.
- Insurance providers: An XML-based standard as defined by the ACORD (Association for Cooperative Operations Research and Development) standards body. The ACORD XML for P&C (property and casualty) standard addresses the industry's real-time requirements. It defines P&C transactions that include both request and response messages for accounting, claims, personal lines, commercial lines, specialty lines, and surety transactions.
The data is complex, and a typical WSDL describing each datasource contains numerous specific methods, each returning only a part of the overall content. To make all content available in a single view, the client application must make multiple Web service calls to each datasource. The returned data must then be aggregated in such a way that it can be properly interpreted and used by the client.
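The aggregation step can be sketched with the JDK's DOM API. The strings below stand in for the responses of separate Web service calls, and the wrapper element name patientRecord, like the fragment contents, is an assumption for illustration rather than part of any standard named above.

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class PatientRecordAggregator {
    /** Wraps each fragment's root element in a single <patientRecord> document. */
    static Document aggregate(List<String> fragments) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document combined = builder.newDocument();
            Element root = combined.createElement("patientRecord"); // hypothetical wrapper name
            combined.appendChild(root);
            for (String fragment : fragments) {
                Document part = builder.parse(new InputSource(new StringReader(fragment)));
                // importNode copies the subtree so it belongs to the combined document.
                root.appendChild(combined.importNode(part.getDocumentElement(), true));
            }
            return combined;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not aggregate fragments", e);
        }
    }

    public static void main(String[] args) {
        // Stand-ins for the responses of three separate Web service calls.
        Document record = aggregate(List.of(
                "<details><name>Jane Doe</name></details>",
                "<allergies><allergy>penicillin</allergy></allergies>",
                "<medications><medication>ibuprofen</medication></medications>"));
        System.out.println(record.getDocumentElement().getChildNodes().getLength()); // prints 3
    }
}
```

When every client has to repeat this kind of stitching, the work is duplicated across applications; performing it once in a shared data layer is the alternative this article argues for.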
A doctor has a computer in the patient's visiting room. During a patient encounter, the doctor launches a browser and logs on to the hospital's internal portal. Based on the user's role in the system (in this case, a doctor), the portal presents a list of available functions, such as the Patient Record Viewer. The Patient Record Viewer screen consists of the following tabs:
- Patient Details: Primarily patient information such as name, address, and birthday
- Patient Medical History: A list of all known illnesses or patient encounters, with the ability to pull up details on each illness (and order new lab work), including:
  - Doctor reports and notes
  - X-ray and other types of imaging
- Patient Medication: A list of all known medications this patient has been prescribed over time, with the ability to prescribe new medication if needed
- Patient Allergies: A list of all known patient allergies, with the ability to make a new entry
- Medical Report: A list of available doctor observations and visit reports, with the ability to create a new report following a given visit