The basic tenet of a service-oriented architecture (SOA) is to provide loose coupling between different applications. It is thus imperative that data is produced by and for these applications, and that this data is stored and handled optimally. Because the applications in an SOA are so widely distributed, the way this data is stored is typically location-dependent and specific to each application.
An SOA repository is a mechanism that handles the persistence of distributed SOA data. It is a complex and sophisticated enterprise-grade technology that not only handles persistence and caching, but also enables lifecycle management, security, discovery, and transformation of distributed data from diverse service-oriented applications such as silo applications, Web portals, business processes, and mobile applications.
SOA data is basically transient and streaming in nature. It therefore calls for native XML storage that aggregates the data relevant to a specific service, regardless of the applications used, rather than assigning the data to the individual applications that make up that service. Otherwise, data becomes difficult to access and cost-prohibitive to store and replicate.
SOA data is typically stored in relational databases and filesystems, but neither is well suited to handling it. Elliotte Harold, in his article "Managing XML Data: Native XML Databases," (IBM developerWorks, June 2005) clearly addresses the need for and benefits of a native XML database. In his words, "When your only tool is a hammer, everything looks like a nail. When your only tool is a relational database, everything looks like a table. Reality, however, is more complicated than that. Data often isn't tabular and can benefit from a tool that more closely fits its natural structure. When that data is XML, the appropriate tool for managing it might well be a native XML database."
Being fundamentally XML, SOA data cannot easily be modeled in relational databases. Rigid relational schemas do not lend themselves well to the constantly evolving schemas of an SOA, especially when trading partners collaborate across enterprises. Filesystems, for their part, lack the advanced querying and management capabilities that an SOA typically needs. For these compelling reasons, we strongly believe that data created as XML should be persisted, managed, and treated as XML.
Consider the complex and ever-evolving list of Web services standards. They include a number of OASIS initiatives, such as Web Services Business Process Execution Language (WSBPEL), Web Services Security, Web Services Distributed Management (WSDM), and ebXML Collaboration Protocol Profile and Agreement (CPPA), as well as the Web Services Policy Framework (WS-Policy), numerous World Wide Web Consortium initiatives, and REST-based XML artifacts. Wading through this alphabet soup of standards, one realizes that at their core these standards are represented by XML Schemas, such as the WS-Policy XML Schema, the Collaboration Protocol Profile (CPP) XML Schema, and the Collaboration Protocol Agreement (CPA) XML Schema. This further strengthens the case that if SOA data is created in XML, it should be persisted, managed, and treated as XML.
Consider the WS-Policy Schema file, ws-policy.xsd, shown in Figure 1. It is associated with the WS-Policy Framework initiative, which standardizes how policies are communicated between service consumers and providers.
As shown in Figure 1, the data and metadata associated with WS-Policy can be represented in XML and, hence, stored and managed with ease in an XML persistence mechanism. CPP, CPA, and Web Services Security details can likewise be stored and managed natively in an XML persistence mechanism.
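To make this concrete, the following minimal Python sketch treats a WS-Policy document as plain XML and queries it directly, with no relational mapping in between. Python's standard `xml.etree.ElementTree` stands in for an XML persistence mechanism, and its limited XPath support stands in for XQuery. `wsp:Policy`, `wsp:ExactlyOne`, and `wsp:All` are the WS-Policy framework's own operators; the `ex:` assertions, their attributes, and the `urn:example:policy-assertions` namespace are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C WS-Policy 1.5 framework.
WSP = "http://www.w3.org/ns/ws-policy"
# Hypothetical namespace for the illustrative assertions below.
EX = "urn:example:policy-assertions"

POLICY_XML = f"""
<wsp:Policy xmlns:wsp="{WSP}" xmlns:ex="{EX}">
  <wsp:ExactlyOne>
    <wsp:All>
      <ex:TransportSecurity protocol="TLS"/>
      <ex:MaxResponseTime ms="500"/>
    </wsp:All>
    <wsp:All>
      <ex:MessageSecurity/>
    </wsp:All>
  </wsp:ExactlyOne>
</wsp:Policy>
""".strip()

# Prefix-to-namespace map used by ElementTree's XPath subset.
NS = {"wsp": WSP, "ex": EX}


def alternatives(policy: ET.Element) -> list[list[str]]:
    """Return each policy alternative as a list of assertion local names."""
    result = []
    for alt in policy.findall("wsp:ExactlyOne/wsp:All", NS):
        # ElementTree tags look like '{namespace}LocalName'; keep the local part.
        result.append([child.tag.split("}", 1)[1] for child in alt])
    return result


policy = ET.fromstring(POLICY_XML)
print(alternatives(policy))

# Query the stored XML directly: find assertions that require TLS.
tls = policy.findall(".//ex:TransportSecurity[@protocol='TLS']", NS)
print(len(tls))
```

Because the policy is kept in its native form, new assertion types can appear in later documents without any schema migration; the query simply finds them or does not.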
Mid-tier caching in an SOA
SOAs need persistence mechanisms for information such as the state of a business step in an application, the state of a long-running business process in execution, Web services management and monitoring information, lists of available Web services, and more. Much of this information is requested and accessed frequently, which makes a case for caching it in the middle tier. A mid-tier cache also relieves the performance bottleneck that multiple requests to the same information store can cause.
With SOA data and metadata being XML, we propose a simple, yet effective, mid-tier caching architecture that includes an XML database as a mid-tier cache along with a number of XQuery-powered services. An SOA repository can enable increased performance, reliability, functionality, and usability of SOA artifacts through an effective mid-tier caching architecture powered by a number of important services as follows:
Policy-based caching service: For increased performance and quality of service (QoS)
A policy-based caching service can enable the setup of XQuery-based policies that cache the result sets of poorly performing services. These policies can also specify a time-to-live (TTL) after which the cache is refreshed. Policies based on the time of day of a request can determine whether the cached data is valid for that request or whether the originating source must be used. Policies based on service availability ensure that if a service is unavailable, results are obtained from the cache. A cache can be refreshed based on time and other configurable parameters by letting policies trigger the XML persistence mechanism. The design can also include dynamic, just-in-time trace logging for service calls made by the XML persistence mechanism.
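The three policy rules described above (TTL, time-of-day validity, and fallback on unavailability) can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference design: the policy is a Python dataclass rather than an XQuery expression, the class and field names (`CachePolicy`, `PolicyCache`, `valid_hours`, and so on) are hypothetical, an unavailable service is modeled as raising `ConnectionError`, and the hour of the request is passed in explicitly to keep the example deterministic.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CachePolicy:
    """Hypothetical per-service caching policy (a stand-in for an XQuery-based policy)."""
    ttl_seconds: float                      # time-to-live before the entry must be refreshed
    valid_hours: range = range(0, 24)       # hours of day in which cached data may be served
    fallback_when_unavailable: bool = True  # serve the last cached result if the service is down


@dataclass
class _Entry:
    value: str
    stored_at: float


class PolicyCache:
    """Mid-tier cache that consults a CachePolicy before calling the originating service."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock  # injectable clock so expiry can be tested deterministically
        self._entries: dict[str, _Entry] = {}

    def get(self, key: str, fetch: Callable[[], str], policy: CachePolicy, hour: int) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        fresh = entry is not None and now - entry.stored_at < policy.ttl_seconds
        if fresh and hour in policy.valid_hours:
            return entry.value  # cache hit allowed by TTL and time-of-day rules
        try:
            value = fetch()  # go to the originating service
        except ConnectionError:
            if entry is not None and policy.fallback_when_unavailable:
                return entry.value  # service unavailable: fall back to the cache
            raise
        self._entries[key] = _Entry(value, now)  # refresh the cache
        return value


# Usage with a fake clock and hypothetical services.
calls = []

def slow_service() -> str:
    calls.append(1)
    return "<quote>42</quote>"

def unavailable_service() -> str:
    raise ConnectionError("service unavailable")

fake_now = [1000.0]
cache = PolicyCache(clock=lambda: fake_now[0])
policy = CachePolicy(ttl_seconds=60, valid_hours=range(8, 18))

cache.get("quote", slow_service, policy, hour=9)  # miss: calls the service
cache.get("quote", slow_service, policy, hour=9)  # hit: served from cache
fake_now[0] += 120                                # TTL has now expired
stale = cache.get("quote", unavailable_service, policy, hour=9)
print(len(calls), stale)
```

Keeping the policy separate from the cache means the rules can be changed per service without touching the caching code, which is the main point of a policy-based design.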
Data repurposing service: For richer functionality and improved performance
A data repurposing service can enable additional filtering and search criteria on content returned from a given service. Additionally, XQuery can be used to drive transformations for repurposing the content and provide analytics and reporting on returned content. XQuery can also deliver portions of result sets and create a final result set based on aggregation of content from multiple services.
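As a rough illustration of repurposing, the sketch below filters product listings returned by two services, aggregates the survivors into one result set, and attaches simple analytics (a count and a total). In the repository this would typically be an XQuery FLWOR expression; here Python's `xml.etree.ElementTree` is used as a stand-in, and the service responses, element names, and the `min_price` criterion are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical responses from two services returning product listings.
SERVICE_A = """<products>
  <product sku="A1"><name>Router</name><price>120.00</price></product>
  <product sku="A2"><name>Switch</name><price>80.00</price></product>
</products>"""

SERVICE_B = """<products>
  <product sku="B1"><name>Firewall</name><price>450.00</price></product>
  <product sku="B2"><name>Cable</name><price>5.00</price></product>
</products>"""


def repurpose(responses: list[str], min_price: float) -> ET.Element:
    """Filter products from several services and aggregate them into one result set."""
    result = ET.Element("catalog")
    total = 0.0
    for xml_text in responses:
        for product in ET.fromstring(xml_text).iter("product"):
            price = float(product.findtext("price"))
            if price >= min_price:  # extra filter the source services do not offer
                item = ET.SubElement(result, "item", sku=product.get("sku"))
                item.text = product.findtext("name")
                total += price
    # Simple analytics over the aggregated content.
    result.set("count", str(len(result)))
    result.set("total", f"{total:.2f}")
    return result


catalog = repurpose([SERVICE_A, SERVICE_B], min_price=50.0)
print(ET.tostring(catalog, encoding="unicode"))
```

Neither source service had to change to support the new filter or the aggregated view; both are added on top of the content they already return.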
Data abstraction service: For easier deployment and maintenance
A data abstraction service can eliminate the need for Web services to be aware of individual datasources. Figure 2 shows how this makes better use of Web services by eliminating the need to develop separate clients and Web services for each operation. The service can also manage disparate datasources such as JDBC, HTTP, WSDL, and filesystems.
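One way to picture a data abstraction service is as a registry that maps logical datasource names to adapters that all return XML. The Python sketch below is hypothetical throughout (`DataAbstractionService`, `file_adapter`, and `in_memory_adapter` are illustrative names); it shows only a filesystem adapter and an in-memory dictionary standing in for a relational source, since real JDBC, HTTP, or WSDL adapters would need external libraries.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Callable

# An adapter turns a logical request into XML, hiding the physical datasource.
Adapter = Callable[[str], ET.Element]


class DataAbstractionService:
    """Hypothetical registry that maps logical datasource names to adapters."""

    def __init__(self) -> None:
        self._adapters: dict[str, Adapter] = {}

    def register(self, name: str, adapter: Adapter) -> None:
        self._adapters[name] = adapter

    def query(self, name: str, request: str) -> ET.Element:
        # Callers name a logical source only; file paths, JDBC URLs, etc. stay hidden.
        try:
            adapter = self._adapters[name]
        except KeyError:
            raise LookupError(f"unknown datasource: {name}") from None
        return adapter(request)


def file_adapter(directory: Path) -> Adapter:
    """Serve XML documents stored as files, one file per request key."""
    def load(request: str) -> ET.Element:
        return ET.parse(directory / f"{request}.xml").getroot()
    return load


def in_memory_adapter(rows: dict[str, str]) -> Adapter:
    """Stand-in for a relational (e.g. JDBC) source: wraps a row value in XML."""
    def load(request: str) -> ET.Element:
        elem = ET.Element("row", key=request)
        elem.text = rows[request]
        return elem
    return load


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "order-17.xml").write_text("<order id='17'><status>shipped</status></order>")
    das = DataAbstractionService()
    das.register("orders", file_adapter(Path(tmp)))
    das.register("customers", in_memory_adapter({"c1": "Acme Corp"}))
    # The same query call works regardless of where the data actually lives.
    order = das.query("orders", "order-17")
    customer = das.query("customers", "c1")
    print(order.findtext("status"), customer.text)
```

Adding a new kind of datasource then means registering one more adapter, rather than writing a new client and Web service for each operation.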
In addition, because services can run on any system, an SOA repository can enable the federation of services in an SOA. It can also relieve performance problems for central-tier services that abstract remote processes, by collocating the data as close as possible to where it is processed. As a persistence layer in the central tier, an SOA repository can store transactional information for many purposes, including analysis and integrity management tasks such as logging. By handling abstracted and composite data elements in the central tier, it can serve as a centralized repository for SOA data.
Exciting new SOA technologies such as enterprise service buses and orchestration engines can employ an SOA repository for state management, workflow persistence, and message persistence. An SOA repository can also provide the persistence backbone for SOA registries, whether they are UDDI (Universal Description, Discovery, and Integration) or ebXML registries, to enable the discovery, publishing, and subscription of services.
The need for complex and sophisticated XML data management for an SOA
As already discussed, Web services and SOAs create huge amounts of complex and sophisticated new data in the form of data-rich XML messages exchanged between applications, which must be stored so they can be effectively audited and analyzed. When we look at the various technologies that enable and empower an SOA, it is apparent that an SOA's key characteristics and benefits form the basis for many vendor offerings in this space. As shown in Figure 3, the following functionalities form the core SOA and Web services infrastructure:
- Web services management
- Web services monitoring
- SOA governance
- Web services security
- SOA persistence and caching
- SOA discovery, publishing, and subscription
We can map the use of XML data management as an enabling and/or empowering technology at various points in this infrastructure in the following ways:
- SOA metadata persistence
- SOA discovery, publishing, and subscription
- Persistence of Web services management data
- Acceleration caching
- Service aggregation
- Web services policy caching and management
XML data management in an SOA can also enable and/or empower:
- Persistence of monitoring, logging, and auditing data
- Persistence of security capabilities
- SOA governance
- SOA OLAP data and metadata transformations and persistence
- Trading partner profile and agreement persistence
- Message persistence
- State management
- Schema versioning
Native XML data management server overview
An XML data management server (XDMS) is much more than a data store for XML data. An XDMS is a sophisticated system that must be designed with flexibility, scalability, and performance in mind. The reality is that most XML data management servers do not measure up to these exacting demands. Typically, an XDMS requires no prior knowledge of an XML document's structure. Any well-formed XML document, such as a Web Services Description Language (WSDL) file, a CPPA document, an XML Schema, or an Extensible Stylesheet Language Transformations (XSLT) stylesheet, can be inserted at will, and the native XML data management server will automatically create the internal structures required to store it.
In addition, XML data management servers support transactions, indexing, schema or DTD validation (some also support schema versioning), extended connectivity, user- and group-based security, backup and restore, and server mirroring. An XDMS solution must also be able to store non-XML data, such as binary data, so that it can hold any other content you may require.
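The schema-free insertion and automatic indexing described above can be sketched with a toy in-memory store. This is a minimal illustration, not how any particular XDMS works: `SchemaFreeStore` and its methods are hypothetical, the only "internal structure" built is a path index from element paths (such as `/order/item`) to document IDs, namespaced tags appear in ElementTree's `{uri}local` form, and updates, deletion, and transactions are left out.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


class SchemaFreeStore:
    """Hypothetical native XML store: accepts any well-formed document, no schema needed."""

    def __init__(self) -> None:
        self._docs: dict[str, ET.Element] = {}
        # Path index built automatically on insert: '/a/b' -> {doc_id, ...}
        self._path_index: dict[str, set[str]] = defaultdict(set)

    def insert(self, doc_id: str, xml_text: str) -> None:
        root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
        self._docs[doc_id] = root
        self._index(doc_id, root, "")

    def _index(self, doc_id: str, elem: ET.Element, prefix: str) -> None:
        # Record every element path; namespaced tags are indexed as '{uri}local'.
        path = f"{prefix}/{elem.tag}"
        self._path_index[path].add(doc_id)
        for child in elem:
            self._index(doc_id, child, path)

    def docs_with_path(self, path: str) -> set[str]:
        """Return IDs of documents containing the given element path."""
        return set(self._path_index.get(path, set()))


store = SchemaFreeStore()
# Two documents with unrelated structures; neither needs a schema up front.
store.insert("po-1", "<order><item sku='A1'/><total>10</total></order>")
store.insert("profile-1", "<partner><name>Acme</name></partner>")
print(store.docs_with_path("/order/item"))
```

The store learns each document's structure as it arrives, which is what lets heterogeneous SOA artifacts share one repository.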
The native XML interface for SOA repository operations is XQuery. XQuery is the way to create, manipulate, examine, and manage XML data, and so to tap into the full potential of XML databases. XQuery also provides a standard way to unify disparate datasources and make them all appear to be a single server.
XQuery is a functional language, so expressions can be composed and combined to create arbitrarily complex queries over one or more sets of XML data. XQuery offers both strongly typed mechanisms, using XML Schema and DTD, and weakly typed mechanisms for handling raw XML data.
The XQuery data model
The XQuery data model is more extensive than the standard XML data models of the XML Infoset and the Post-Schema-Validation Infoset (PSVI). XQuery is defined in terms of operations on the data model, but it does not restrict how documents and instances in the data model are constructed. The data model covers the XML data being queried, any intermediate values, and the final query results. It supports intermediate expressions whose values are not XML (for example, a sequence of integers or strings), XML fragments, and both typed and untyped data.
XQuery and XML Schema share the same type concepts for XML data. XQuery provides built-in types based on XML Schema, plus support for user-defined schema types. It also supports data types beyond the existing XML Schema data types, such as xs:untypedAtomic.
The components of the XQuery data model and type system are as follows: