The Morgan Kaufmann Series in Data Management Systems
Series Editor: Jim Gray, Microsoft Research

Component Database Systems
Klaus R. Dittrich and Andreas Geppert

Managing Reference Data in Enterprise Databases: Binding Corporate Data to the Wider World
Malcolm Chisholm

Understanding SQL's Stored Procedures: A Complete Guide to SQL/PSM
Jim Melton

Principles of Multimedia Database Systems
V. S. Subrahmanian

Information Visualization in Data Mining and Knowledge Discovery
Edited by Usama Fayyad, Georges Grinstein, and Andreas Wierse

Principles of Database Query Processing for Advanced Applications
Clement T. Yu and Weiyi Meng

Data Mining: Concepts and Techniques
Jiawei Han and Micheline Kamber

Advanced Database Systems
Carlo Zaniolo, Stefano Ceri, Christos Faloutsos, Richard T. Snodgrass, V. S. Subrahmanian, and Roberto Zicari

Understanding SQL and Java Together: A Guide to SQLJ, JDBC, and Related Technologies
Jim Melton and Andrew Eisenberg

Database: Principles, Programming, and Performance, Second Edition
Patrick and Elizabeth O'Neil

The Object Data Standard: ODMG 3.0
Edited by R. G. G. Cattell and Douglas K. Barry

Principles of Transaction Processing
Philip A. Bernstein and Eric Newcomer

Using the New DB2: IBM's Object-Relational Database System
Don Chamberlin

Distributed Algorithms
Nancy A. Lynch

Data on the Web: From Relations to Semistructured Data and XML
Serge Abiteboul, Peter Buneman, and Dan Suciu

Active Database Systems: Triggers and Rules For Advanced Database Processing
Edited by Jennifer Widom and Stefano Ceri

Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations
Ian Witten and Eibe Frank

Migrating Legacy Systems: Gateways, Interfaces, & the Incremental Approach
Michael L. Brodie and Michael Stonebraker

Joe Celko's SQL for Smarties: Advanced SQL Programming, Second Edition
Joe Celko

Atomic Transactions
Nancy Lynch, Michael Merritt, William Weihl, and Alan Fekete

Joe Celko's Data and Databases: Concepts in Practice
Joe Celko

Developing Time-Oriented Database Applications in SQL
Richard T. Snodgrass

Web Farming for the Data Warehouse
Richard D. Hackathorn

Database Modeling & Design, Third Edition
Toby J. Teorey

Management of Heterogeneous and Autonomous Database Systems
Edited by Ahmed Elmagarmid, Marek Rusinkiewicz, and Amit Sheth

Object-Relational DBMSs: Tracking the Next Great Wave, Second Edition
Michael Stonebraker and Paul Brown with Dorothy Moore

A Complete Guide to DB2 Universal Database
Don Chamberlin

Universal Database Management: A Guide to Object/Relational Technology
Cynthia Maro Saracco

Readings in Database Systems, Third Edition
Edited by Michael Stonebraker and Joseph M. Hellerstein

Query Processing for Advanced Database Systems
Edited by Johann Christoph Freytag, David Maier, and Gottfried Vossen

Transaction Processing: Concepts and Techniques
Jim Gray and Andreas Reuter

Understanding the New SQL: A Complete Guide
Jim Melton and Alan R. Simon

Building an Object-Oriented Database System: The Story of O2
Edited by François Bancilhon, Claude Delobel, and Paris Kanellakis

Database Transaction Models for Advanced Applications
Edited by Ahmed K. Elmagarmid

A Guide to Developing Client/Server SQL Applications
Setrag Khoshafian, Arvola Chan, Anna Wong, and Harry K. T. Wong

The Benchmark Handbook for Database and Transaction Processing Systems, Second Edition
Edited by Jim Gray

Camelot and Avalon: A Distributed Transaction Facility
Edited by Jeffrey L. Eppinger, Lily B. Mummert, and Alfred Z. Spector

Readings in Object-Oriented Database Systems
Edited by Stanley B. Zdonik and David Maier
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means—electronic, mechanical, photocopying, recording, or otherwise—without the prior written permission of the publisher.

Library of Congress Cataloging-in-Publication Data
Component database systems / edited by Klaus R. Dittrich, Andreas Geppert.
  p. cm.
  Includes bibliographical references and index.
  ISBN 1-55860-642-4
  1. Distributed databases. 2. Database management. I. Dittrich, Klaus R. II. Geppert, Andreas.
  QA76.9.D3 C65 2001
  005.75'8--dc21    00-063407

MKP ISBN: 1-55860-642-4
dpunkt ISBN: 3-932588-75-4

This book is printed on acid-free paper.
Foreword

Peter C. Lockemann
Professor of Informatics
Universität Karlsruhe

I recently came across a fascinating book, Regional Advantage: Culture and Competition in Silicon Valley and Route 128.¹ The author compares the rise and dominance of Silicon Valley during the 1970s and 1980s with the slow decline of Route 128 during the 1980s and hypothesizes that the contrasts in development were mainly due to the large differences in industrial culture. More specifically, she claims that industrial culture in Silicon Valley is one of dense networks, close collaboration between small and highly innovative companies, low vertical integration where companies instead depend for much of their own products on the skills and expertise of neighboring companies, and companies' close relationship with their suppliers and industrial customers that ties their own success to the success of the others, and vice versa. The author makes a convincing point that such a culture is particularly capable of coping with global competition fueled by ever-shorter technology cycles.

What bearing can a study of industrial sociology possibly have on technical systems such as database management systems (DBMS)? It seems to me that indeed there are important parallels. Database systems tend to be huge monolithic systems with an internal structure that is reminiscent of tight vertical integration. One suspects that this requires the vendors themselves to follow a strategy of vertical integration. So if the study teaches us any lesson, it is that database systems should be broken up into smaller pieces that can each be procured from the best and most innovative suppliers. Many will benefit: the DBMS vendors who will have a technological edge over their slower competitors, the customers who will get the best technology for their money, and the suppliers who have a dependable partner that allows them to stay ahead of the crowd.

There remains just a small problem: What are the pieces—the components—of a DBMS, and what is the glue that holds it together? It seems that even after 30 years of DBMS development, and 20 years after the question was first posed, nobody in industry or in academia has a ready answer. To be sure, the 1980s and early 1990s saw a wealth of research flowing into extensible database system prototypes, but only one found its way into a commercial product. Other products soon followed suit. A number of chapters in this book describe the efforts of
the vendors of these systems, with almost all based on an object-relational approach, where the objects are supposed to provide the components and the relational base system the glue.

One of the major upcoming influences comes from an entirely different direction: distributed systems and the techniques that cope with the heterogeneity and interoperability of sources. On a systems level the issues are reflected in the Intelligent Information Infrastructure (I3) architecture, with wrappers, mediators, and facilitators. On a more technical level, middleware is the buzzword. In the context of component databases, these techniques are candidates to provide the glue. Three of the contributions in this volume reflect efforts in this direction.

At the outset the editors set a very ambitious agenda. Some readers may feel—as I did—that the remaining chapters contribute only very modestly to the lofty goals. But we should always keep in mind that commercial products look for what is feasible at the time and solve the most pressing problems at hand. Other—seemingly esoteric—problems will have to wait until practical needs arise. For researchers, however, there is no reason to wait. It should have become clear to all of us that, with all the new technology in processor speeds, storage capacities, transmission bandwidths, and mobility, the number of database applications and the degree of database system specialization will have to grow dramatically. Economics in database engineering has become the order of the day. And so we come full circle back to our initial, almost cultural argument in favor of componentizing database systems.

What we have in hand, then, is a wonderful book to show us how far database technology has been able to move in the direction of component databases, how limited the approaches still are when measured against the agenda, and what enormous challenges—and ensuing rewards—still lie ahead. If the book stimulates renewed efforts in research and development, then indeed it has served its purpose as a trailblazer marking the way to an information technology infrastructure that is receptive to the needs of the information society.

¹ Saxenian, AnnaLee. 1996. Regional Advantage: Culture and Competition in Silicon Valley and Route 128. Harvard University Press, Cambridge, MA.
Contents

Foreword / v
Preface / xv

Chapter 1  Component Database Systems: Introduction, Foundations, and Overview / 1
Klaus R. Dittrich, Andreas Geppert
  1.1  Introduction / 2
  1.2  The Need for Componentized DBMSs / 4
       1.2.1  Handling Nonstandard Data Types / 4
       1.2.2  Data Integration / 5
       1.2.3  Downsized Database Systems / 6
       1.2.4  Discussion / 7
  1.3  Prerequisites and Foundations of CDBMSs / 8
       1.3.1  DBMS Architecture / 8
       1.3.2  Components and Database Management System Architecture / 10
  1.4  Related Work: The Roots of CDBMSs / 12
       1.4.1  Kernel Systems / 12
       1.4.2  Pure Extensible Database Systems / 13
       1.4.3  Customizable Systems / 13
       1.4.4  Toolkit Systems / 14
       1.4.5  Transformational Systems / 15
       1.4.6  Generators / 16
       1.4.7  Frameworks / 17
       1.4.8  Discussion / 17

  The Oracle8i interMedia Text Data Cartridge / 101
  The Oracle8i Spatial Data Cartridge / 102
  The Oracle8i Visual Information Retrieval Data Cartridge / 103
  Conclusion / 104

Chapter 4  Extensible Indexing Support in DB2 Universal Database / 105
Stefan Deßloch, Weidong Chen, Jyh-Herng Chow, You-Chin (Gene) Fuh, Jean Grandbois, Michelle Jou, Nelson Mattos, Raiko Nitzsche, Brian Tran, Yun Wang
  4.1  Introduction / 106
  4.2  Hard-Wired Indexing / 108
  4.3  High-Level Indexing of User-Defined Types
       4.3.1  Index Maintenance / 110
       4.3.2  User-Defined Predicates and Search-Key Generation / 112
       4.3.3  Index Exploitation / 114
       4.3.4  Implementation and Predicate Filtering / 114
  Indexing for Geographical Information System Applications / 118
  Indexing on XML Documents / 121
  4.5  Loose Integration of External Search Engines / 124
       4.5.1  DB2 Text Extender Overview / 124
       4.5.2  Exploiting High-Level User-Defined Indexing / 127
       4.5.3  Generalization for Arbitrary Predicates / 129
  4.6  Performance / 132
  4.7  Related Work and Conclusion / 134

Chapter 5  Enabling Component Databases with OLE DB / 139
José A. Blakeley and Michael J. Pizzo
  5.1  Introduction / 140
  5.2  Universal Data Access / 141
  5.3  OLE DB: A Component Data Model / 143
       5.3.1  The Microsoft Component Object Model / 143
       5.3.2  Introspection via Properties / 146
       5.3.3  Common Abstractions / 147
       5.3.4  Common Extensions / 149
  5.4  Services / 154
       5.4.1  Adding Generic Connection Facilities through Data Link / 155
       5.4.2  Adding Common Resource-Pooling Facilities / 155
       5.4.3  Providing a Rich Client-Cursor Service / 156
  Database Server / 159
  Distributed and Heterogeneous Query Processing / 160
  Full-Text Queries on Relational Data / 162
  Distributed Transformation Services / 164
  Online Analytical Processing Services / 166
  5.7  Microsoft Data Access Software Developer's Kit / 167
       5.7.1  Documentation / 167
       5.7.2  ActiveX Data Objects (ADO) / 168
       5.7.3  OLE DB Data Providers / 170
       5.7.4  Simple Provider Toolkit / 170
       5.7.5  Test Tools / 171
       5.7.6  Samples / 171
  5.8  Summary / 171

Chapter 6  An Architecture for Transparent Access to Diverse Data Sources / 175
Mary Tork Roth, Peter Schwarz, and Laura Haas
  6.1  Introduction / 176
  6.2  System Overview / 178
  6.3  Goals for the Wrapper Component Architecture / 179
  6.4  Building a Wrapper / 180
  6.5  Wrapper Implementations / 200
  6.6  Related Work / 202
  6.7  Conclusion / 205

Chapter 7  Building Component Database Systems Using CORBA / 207
M. Tamer Özsu and Bin Yao
  7.1  Introduction / 208
  7.2  Object Management Architecture / 209
  7.3  Common Object Request Broker Architecture / 211
       7.3.1  Interface Definition Language / 212
       7.3.2  Client-CORBA Object Communication / 217
       7.3.3  Object Adapters / 221
       7.3.4  ORB Interoperability / 224
  7.4  CORBAservices / 224
       Naming Service / 225
       Event Service / 225
       Life-Cycle Service / 226
       Persistent Object Service / 226
       Transaction Service / 227
       Concurrency Control Service / 227
       Query Service / 227
       Collections Service / 228
       Other Object Services / 228
  Common Facilities / 229
  Building Componentized Applications / 229
  7.7  CORBA and Database Interoperability / 233
       7.7.1  Object Granularity / 234
       7.7.2  Object Interfaces / 234
       7.7.3  Association Mode / 234
       7.7.4  Call Mode / 235
       7.7.5  Concurrently Active Objects / 235
  7.8  Conclusion / 235

Chapter 8  The Architecture of a Database System for Mobile and Embedded Devices / 237
Heiko Bobzin
  8.1  Introduction / 238
  8.2  The Idea of a Pure Java Database Management System / 238
  8.3  User API and Object Model / 239
       8.3.1  Implementation / 240
       8.3.2  Database / 240
       8.3.3  Transaction / 241
       8.3.4  Collections / 241
  8.4  Object Management / 242
  8.5  Transaction Management / 243
  8.6  Concurrency Control / 244
  8.7  Backend / 245
  8.8  Distributed Applications / 246
  8.9  Event Management / 247
  8.10  Log Service / 248
  8.11  Postprocessing / 249
  8.12  On-Demand Assembly of Components / 250
  8.13  Outlook / 251

Chapter 9  Conclusions and Perspectives / 253
Andreas Geppert and Klaus R. Dittrich
  9.1  Achievements / 254
  9.2  Open Issues / 255
       9.2.1  Adequate Support for Components / 255
       9.2.2  Application Development Using CDBMSs / 256
       9.2.3  Performance Issues / 258
       9.2.4  Development of CDBMS Components / 259
  9.3  The End (Is Not Yet Near) / 260

Bibliography / 263
Index / 281
About the Authors / 290
Preface
The recent past has witnessed an overwhelming proliferation of database technology. Without query languages, transaction management, logical data independence, and many other database functions, it is virtually impossible to run an enterprise or organization. However, now that the advantages of using database management systems are apparent, users require database technology to reach far beyond its traditional power. Nowadays database management systems must be able to handle a much broader diversity of data, such as multimedia and unstructured data. Likewise, database technology must be exported to other existing software systems, and operational existing data stores must be integrated into a coherent entirety.

Any vendor attempting to address all these new requirements in a single system would very likely end up with a dinosaur. In consequence, most vendors try to tackle this problem by making their systems extensible and by offering specialized functionality in terms of extensions. Such a goal is easier stated than achieved—prerequisites to be met are the redesign of existing database management systems to open them up for extensions, and exact and complete specification of how to add extensions in a sound way. Of course, many different ways can be devised to accommodate extensibility. The common characteristics of the various approaches elaborated in this collection are that database functions are componentized at least to some extent and that new functions can be incorporated by adding new components or replacing existing ones.

In the first chapter, we give an in-depth overview of and introduction to the subject matter. Chapters 2 through 8 address specific aspects of component database systems, as outlined next.

Chapter 2 by Paul Brown presents the Informix perspective on distributed component database management systems based on the object-relational model. He discusses how distributed object-relational database management systems can be extended. He also discusses implications of componentization on query optimization in a distributed database management system.

Chapter 3 by Sandeepan Banerjee, Vishu Krishnamurthy, and Ravi Murthy introduces Oracle's approach to component database systems. They present a complete set of possible extensions to Oracle8i, including user-defined types, functions, operators, and aggregates. They also discuss extensible indexing and query optimization.
Chapter 4 by Stefan Deßloch, Weidong Chen, Jyh-Herng Chow, You-Chin (Gene) Fuh, Jean Grandbois, Michelle Jou, Nelson Mattos, Raiko Nitzsche, Brian Tran, and Yun Wang presents the IBM DB2 approach to componentization. They focus on extensible indexing and present applications of their approach to selected areas. They also discuss the integration of DB2 UDB (DB2 Universal Database) with external search engines.

Chapter 5 by José A. Blakeley and Michael J. Pizzo introduces OLE DB. They present Microsoft's component object model and show how it is used in OLE DB to accommodate componentization. They also discuss several scenarios using OLE DB, and describe how components in OLE DB can be built and managed.

Chapter 6 by Mary Tork Roth, Peter Schwarz, and Laura Haas presents the Garlic system for the integration of diverse data sources. They introduce Garlic's wrapper architecture and describe the steps necessary to build a new wrapper. They also discuss query execution and optimization in Garlic.

Chapter 7 by M. Tamer Özsu and Bin Yao discusses the Object Management Group (OMG) perspective on the subject. They introduce CORBA (Common Object Request Broker Architecture) and CORBAservices, show how componentized applications can be built using CORBA, and show how CORBA and CORBAservices can be used to develop interoperable database management systems.

Chapter 8 by Heiko Bobzin considers database support for embedded and mobile devices. He argues that for such systems componentization is a prerequisite. He then describes POET's Navajo system and discusses the various Navajo parts that can be plugged together to form a customized database system for mobile devices.

Finally, Chapter 9 concludes the book and discusses achievements and open questions.

We are very pleased that this book, with its broad spectrum of covered systems, became possible. The chapter authors are leading experts in their field and key people in their companies. We are therefore very grateful that they found the time to contribute their chapters, despite the pressure due to ongoing development, product releases, and so forth. We also appreciate the very professional support by Morgan Kaufmann. It was a pleasure to cooperate with Marilyn Alan and Diane Cerra; without their patience despite the many missed deadlines, this book would never have materialized.
Acknowledgments

Chapter 1
We gratefully acknowledge the comments by Mike Carey, Tamer Özsu, and Reinhard Riedl on previous versions of this chapter.
Chapter 3
We would like to recognize the contributions of several individuals who made the Oracle Extensibility Architecture a reality. Anil Nori, Chin Hong, Kenneth Ng, and Andy Mendelsohn helped set the direction for this technology in Oracle. There were several project leaders and developers, including the authors, who helped deliver the architecture; prominent among them are Jay Banerjee, Jags Srinivasan, Nipun Agarwal, Seema Sundara, Dinesh Das, S. Muralidhar, and Vikas Arora.
Chapter 6
This work is partially supported by DARPA Contract F33615-93-11339. We would like to thank the Garlic team members, both past and present, whose hard work and technical contributions made the Garlic project possible.
Chapter 1
Component Database Systems: Introduction, Foundations, and Overview
Klaus R. Dittrich, Andreas Geppert

1.1 Introduction

Database management systems (DBMSs) support individual applications and comprehensive information systems with modeling and long-term reliable data storage capabilities as well as with retrieval and manipulation facilities for persistent data by multiple concurrent users or transactions. The concept of the data model (most notably the relational model, Codd 1970, and the object-oriented data models, Atkinson et al. 1989; Cattell & Barry 1997), the Structured Query Language (SQL, Melton & Simon 1996), and the concept of transaction (Gray & Reuter 1992) are crucial ingredients of successful data management in current enterprises. Nowadays DBMSs are well established and are, indeed, the cornerstones of virtually every enterprise.

Traditionally, data elements stored in databases have been simply structured (e.g., employee records, and product and stock information). Transactions have been of short duration and often needed to access only a few data items. Most traditional queries have been simple, and the techniques used to answer them efficiently are well understood. Taking a broader view, DBMS-based information systems have been built in a rather database-centric way; that is, environment decisions such as the use of mainframe-based or client/server architectures have been typically based on what the DBMS itself requires or supports in this respect.

In the recent past, however, more and more new and demanding application domains have emerged that could also benefit from database technology, and new requirements have been posed to DBMSs. Many applications require the management of data types that are not handled well by conventional DBMSs. Examples of such new data types include multimedia data, documents, engineering artifacts, and temporal data, to name just a few. Likewise, DBMSs are more and more often required to integrate with other infrastructure parts of their environment. For instance, instead of letting the DBMS manage already existing data, it is often necessary to leave data where it is (possibly because there are applications that users would not want to migrate as well) and instead enhance the external data management system with some sort of database functionality (Vaskevitch 1994). It might also be necessary to integrate existing data stores with database systems in such a way that each of them is still independently operational, while users are provided with an integrated and uniform view of the entire system. In other words, often applications need support as offered by multidatabase systems or federated database systems (Elmagarmid et al. 1999; Sheth & Larson 1990), in which the federated parties might be any kind of data store. Some applications even require support for queries combining data from databases and data extracted from sources such as the World Wide Web (WWW).
In order to meet all these new requirements, DBMSs (or whatever the resulting kind of system will ultimately be called) apparently have to be extended to include new functionality. However, enhancing a single system with modules implementing all the new functions is not a viable approach for several reasons:
• DBMSs would become so large and, in consequence, complex that they could no longer be maintained at reasonable cost.
• Users would have to pay a high price for the added functionality, even if they do not need every part of it.
• Users and applications would also have to pay a performance penalty for added functionality that they actually do not need.
• A DBMS vendor might not have the expertise to perform such extensions and might not have the resources to undertake all extensions within a reasonable period of time.
Thus, beefing up a monolithic DBMS by adding more and more functions does not work. Instead, it seems attractive to consider the alternative of extending DBMSs in a piecemeal way, that is, striving to allow functionality to be added or replaced in a modular manner, as needed.

Modular extensions to a DBMS require that a well-defined software architecture is imposed on the DBMS. This architecture clearly defines the places in the system where extensions are possible. In general, extensions should be possible at a few well-defined, distinct places in the system only, and the effects of modifications on other parts should be avoided or at least minimized. To that end, DBMS architectures need to be componentized in such a way that new components can be added to it or existing components can be exchanged in a flexible yet well-defined way. Thus, the componentized architecture specifies and restricts the ways in which a DBMS can be customized. Ultimately, the componentized architecture also defines the notion of component, itself, and hence the realm of valid extensions to a DBMS.

We refer to DBMSs that have a componentized architecture and allow users to add components as component DBMSs (CDBMSs). Due to the componentized architecture, these extensions are possible without requiring other system parts to be rewritten. Components can be provided by third parties and possibly even users, thus increasing the developer base of a DBMS. Ultimately, unnecessary components do not have to be added, and applications therefore do not pay (in terms of system cost and performance) for functionality they do not use.

In the remainder of this chapter, we introduce principles and different forms of CDBMSs. In Section 1.2, we present a more detailed motivation of componentization of DBMSs. We then discuss the foundations of CDBMSs: DBMS architecture and componentware. In Section 1.4, we review past efforts regarding the extensibility of DBMSs. Section 1.5 presents a more detailed taxonomy of CDBMSs, and Section 1.6 concludes the chapter.
1.2 The Need for Componentized DBMSs

In this section, we will motivate CDBMSs in more detail (see also Vaskevitch 1994, 1995). To this end, we consider DBMS support for advanced information systems (IS) with respect to the following tasks: management of new data types, adapted and new database functionality, and better integration with other IS parts and the IS environment.
1.2.1 Handling Nonstandard Data Types

In view of the success of relational database systems in supporting IS, new applications pose new requirements for database systems, and existing application areas define extended requirements. For example, many organizations offering services or products in the WWW want to manage their data in a database system. In particular, e-commerce applications need database support to manage their catalogues, product descriptions, orders, and so forth. Other areas, such as engineering (e.g., bio-engineering, mechanical engineering, or software engineering) and scientific applications (e.g., in biochemistry, geography, or meteorology), also extend the requirements of database systems. In those cases where data do not already live in existing systems, it will often be desirable to exploit database technology in order to benefit from data independence, query capabilities, transaction management, and consistency enforcement.

These applications often need support for modeling and storing data that, so far, are not supported by database systems. Examples of such nonstandard data types are images, text, spatial data, temporal data, videos, Virtual Reality Modeling Language (VRML) objects, and structured documents. Moreover, applications often need to aggregate related instances of these data types into complex objects. While some advanced data types might be supported by specialized DBMSs (e.g., temporal, spatial, or multimedia database systems), in many cases a system that offers exactly the desired set of data types does not exist. Moreover, many of these specialized DBMSs are not available commercially.

In addition to storing nonstandard data, the data also need to be retrieved. Thus, declarative queries over these nonstandard data types must be supported. However, many of them have type-specific query modes, such as content-based search in the case of images or information-retrieval-style queries in the case of text documents. Even existing applications (such as financial applications) might need more powerful query support (e.g., new forms of aggregates, such as a moving average). Query processing, of course, must not only be possible, but it also needs to be efficient. To that end, query optimizers must be able to generate efficient plans for operations on any kind of data supported by the DBMS, including nonstandard data. Optimizers thus must know
(or be taught) how to generate efficient execution plans for queries over predefined as well as nonstandard data types. Furthermore, nonstandard data types often require or can at least benefit from specialized index structures. Without exploiting these index structures, the performance of retrieval is, in many cases, suboptimal. DBMSs therefore should be able to incorporate new index structures whenever these are needed to execute queries against nonstandard data types efficiently.

Currently, no single, off-the-shelf DBMS exists that meets all these requirements for a complete range of data types. Apparently, vendors are not interested in or able to build such a system for several reasons. Such a one-size-fits-all solution might meet the functional requirements of most applications, but for each application it also would provide functionality that this application does not need. Users would still have to pay for all the functions, regardless of what they effectively need. Moreover, adding functions usually leads to performance degradation, even if the new functions are not used—thus, customers have to pay a performance penalty for features they do not use. A further reason for vendors' reluctance to build such a system is that adding too many features in parallel is neither advisable nor manageable. In special cases, a vendor also might not have the expertise to add the required new features, for example, if the implementation of new query modes or optimizations requires highly specialized, type-specific knowledge.

Thus, a single DBMS supporting all the emerging and ever more demanding applications is not conceivable. A possible solution is to extend a DBMS on demand—that is, to provide standard, commonly needed functionality right away and to add other, more advanced features in a piecemeal way later. Customers can then purchase the core DBMS plus the extensions they require. In other words, instead of a monolithic DBMS, customers use an a la carte DBMS configuration.
1.2.2 Data Integration

The analysis just described considers cases where all the data are kept, or can be brought, under the control of a single DBMS. This, however, is not always possible because in many cases the data have existed in specialized data stores, such as image libraries or document collections, for a long time. The migration of data and application programs without loss is often difficult or impossible. Hence, existing and new data are to be integrated under the common roof of an integration layer. The reasons for integration typically include the facts that data in different data stores are (semantically) related in some way and that the task of the integration layer is to store and maintain these relationships. Likewise, integration is often desired to achieve a uniform and homogeneous view of the diverse data stores and to receive database support
for the integrated system (e.g., queries against the integrated system). Thus, we face the challenge of integrating a broad diversity of data stores into a database environment.

In order to integrate disparate data stores, some kind of middleware system is needed that understands (or can be taught) the data models of the data stores. Even in the case of standardized data models (such as the relational one), each system has intricacies that require a system-specific handling and mapping. Even worse, nondatabase systems handle data in an application-specific format. Consequently, integration cannot be done just once per system, but has to be done separately for each data source.

As in the previous case, there are two options: Either a DBMS is built that knows everything (i.e., how to integrate all the possible data stores) or the integration platform is kept extensible and built in a piecemeal way. The first option is, again, not feasible because the DBMS has to know about all the interfaces and intricacies of the involved data stores. A feasible solution to this problem is, once more, to use the principles of componentization and extensibility. Assume the middleware layer defines a (minimal) set of functions it expects from the data stores. Each extension then implements these functions using the interface of the corresponding data store to be integrated. In other words, these extensions serve as gateways or wrappers: They abstract from the intricacies of the data store, and they (instead of the data stores proper) are hooked into the middleware. The apparent advantages of this approach are that the knowledge and customization required of the middleware layer are minimized (because it only needs to know the general abstract notion of wrappers) and that ideally wrappers or gateways are written once per type of data store.
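To make the wrapper idea more concrete, the following is a minimal sketch in Java, with invented interface and class names rather than those of any system described later in this book: the middleware fixes a small contract, and each kind of data store is integrated by one class implementing it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical minimal contract the middleware expects from every data store.
interface Wrapper {
    String sourceName();
    List<Map<String, Object>> scan(String collection);
    List<Map<String, Object>> filter(String collection, String attribute, Object value);
}

// One wrapper per type of data store; this one wraps an in-memory map as a
// stand-in for, say, a document collection or an image library.
class InMemoryStoreWrapper implements Wrapper {
    private final String name;
    private final Map<String, List<Map<String, Object>>> collections;

    InMemoryStoreWrapper(String name, Map<String, List<Map<String, Object>>> collections) {
        this.name = name;
        this.collections = collections;
    }

    public String sourceName() { return name; }

    public List<Map<String, Object>> scan(String collection) {
        return collections.getOrDefault(collection, List.of());
    }

    public List<Map<String, Object>> filter(String collection, String attribute, Object value) {
        List<Map<String, Object>> hits = new ArrayList<>();
        for (Map<String, Object> record : scan(collection)) {
            if (value.equals(record.get(attribute))) hits.add(record);
        }
        return hits;
    }
}

// The integration layer knows only the Wrapper interface, never a store's native API.
class IntegrationLayer {
    private final List<Wrapper> sources = new ArrayList<>();

    void register(Wrapper wrapper) { sources.add(wrapper); }

    // A trivial "global query": ask every registered source for matching records.
    List<Map<String, Object>> queryAll(String collection, String attribute, Object value) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Wrapper w : sources) result.addAll(w.filter(collection, attribute, value));
        return result;
    }
}
```

In this sketch each new kind of store costs exactly one wrapper implementation, while the integration layer itself stays unchanged.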
1.2.3 Downsized Database Systems

So far, we have implicitly assumed that required database functionality is provided in the form of database systems, and that nonstandard requirements are met by adding extensions. In order to use database functions, applications are implemented on top of a database system or database middleware. This, however, is not the only conceivable form of database support to meet novel requirements.

Sometimes, applications need only parts of the entire spectrum of database functions or need these parts in a very specialized form. Forcing applications to always use full-fledged, entire DBMSs can turn out to be an overkill and/or to not lead to the right kind of support. As an example, suppose a document management system or a groupware system already has its own persistence mechanism to store documents. Assume further that queries are desired, but not yet supported. Query support could be achieved by moving the document management system on top of a
database system. But if, instead, query processing were available as a stand-alone service, the document management system could exploit this service.

Of course, splitting a DBMS into single, stand-alone parts is a difficult problem because the right balance between the granularity of services and the (number of) interservice relationships has to be found. Likewise, determining the right interfaces for the services is a complex task. However, feasible solutions to these questions and, in consequence, the availability of a broad range of services open up completely new ways of implementing database applications and application servers. These systems can then pick the service provider that is optimal for their problem and do not need to rely on large full-fledged DBMSs. Moreover, in cases in which services are standardized, we can envision that in such a scenario it will also be possible to mix and match services from different vendors. The objective of turning DBMSs into lightweight collections of freely combinable services can be met only if these services and their implementations are described and built as components.
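As an illustration of database functionality offered as a stand-alone service, here is a small hypothetical sketch (the interfaces are invented for this example): a query service that evaluates predicates over documents kept in the document management system's own store, without that system adopting a full DBMS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// What the query service requires from the client's existing persistence mechanism.
interface DocumentStore {
    List<String> documentIds();
    String content(String id);
}

// A stand-alone query service: no storage, no transactions, just query evaluation.
class QueryService {
    private final DocumentStore store;

    QueryService(DocumentStore store) { this.store = store; }

    // Return the ids of all documents whose content satisfies the predicate.
    List<String> select(Predicate<String> predicate) {
        List<String> hits = new ArrayList<>();
        for (String id : store.documentIds()) {
            if (predicate.test(store.content(id))) hits.add(id);
        }
        return hits;
    }
}

// Usage: the document management system keeps its own storage and merely exposes
// it to the service, e.g.
//   QueryService qs = new QueryService(existingStore);
//   List<String> hits = qs.select(text -> text.contains("component"));
```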
1.2.4 Discussion

The general problem in the scenarios we considered is the monolithic structure of a traditional DBMS. By monolithic structure or monolithic architecture, we mean that the DBMS is a single unit whose parts are connected to one another and dependent on one another to such a degree that modifications and extensions are not easily possible. In particular, each DBMS part might make assumptions about the requirements and operations of other parts, which leads to domino effects whenever changes are made in one place. Extensions can only be made if all the interconnections and dependencies are known. The evolution of and extensions to a DBMS are only possible by the vendor, who, for each extension, also needs to make sure that other affected parts are adequately adapted.

To prevent misinterpretations, it should be stressed that monolithic architectures in their current common form have not emerged simply because developers and vendors did not know better. In the past, they have been sufficient because applications posed rather restricted requirements for DBMSs. Moreover, a monolithic DBMS can be implemented in a way that optimizes runtime performance and throughput for all applications that need just the functionality offered by the DBMS. Nevertheless, whenever extensions and customizations are considered, problems with monolithic architectures occur with respect to system complexity, system performance, system cost (production and maintenance), and complexity of system evolution.

The general idea for overcoming these problems and still providing for the needed functionality as sketched in the previous scenarios is to
offer a core system and to extend it as needed. CDBMSs are meant to allow such extensions in a controlled and safe way. Although different forms of CDBMSs exist, their common basis is a componentized architecture and the support of components implementing some kind of database function. These components can be added to a DBMS or used in some other way to obtain database support.

In order to effectively allow extensions, the DBMS architecture must be made explicit and be well defined. While typically some parts of the CDBMS will need to be fixed without the possibility of altering them, others can be extended (we call this the variable part). Explicit means that the system structure is defined, preferably in a (formal) architecture model, and that the system structure is visible to those actors modifying it. Second, for a CDBMS the meaning of the notion of component is defined, with varieties ranging from abstract data types to implementations of internal tasks. However, the common characteristics of components are that they implement a coherent set of functions, make all restrictions concerning their use explicit in their interface, and are generally applicable across a variety of applications. Ultimately, a CDBMS architecture also defines places in the system (the variable part) where components can be added. These places can be thought of as hooks used to plug components into the enclosing system. Technically, these hooks are defined in terms of the interfaces the component can use and/or should implement.
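A hook of this kind can be pictured roughly as follows; the names are hypothetical and the example is deliberately simple. The hook consists of an interface the component must implement (here, an access method) and an interface of the fixed system part that the component may use (here, a storage service).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Interface of the fixed system part that a plugged-in component may use.
interface StorageService {
    byte[] readRecord(long recordId);
}

// Interface every plugged-in access method must implement: this is the hook.
interface AccessMethod {
    String name();
    void initialize(StorageService storage);   // the component receives the interfaces it may call
    void insert(Object key, long recordId);
    List<Long> lookup(Object key);
}

// The variable part of the system: a registry of access methods keyed by name.
class AccessMethodRegistry {
    private final StorageService storage;
    private final Map<String, AccessMethod> methods = new HashMap<>();

    AccessMethodRegistry(StorageService storage) { this.storage = storage; }

    // Plugging in a component never requires changes to the enclosing system.
    void plugIn(AccessMethod method) {
        method.initialize(storage);
        methods.put(method.name(), method);
    }

    AccessMethod get(String name) { return methods.get(name); }
}
```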
1.3 Prerequisites and Foundations of CDBMSs

In this section, we elaborate on the principles of CDBMSs in more detail. As mentioned before, extensions to a DBMS affect its architecture and also require certain prerequisites to be met. We therefore first briefly address the issue of DBMS architecture. The subsequent section relates CDBMSs to componentware and then gives a classification of CDBMSs.
1.3.1 DBMS Architecture

Different kinds of architecture serve different purposes. For instance, the three-level-schema architecture (which distinguishes the external schemas that users work with, the internal integrated schema of the entire database, and the physical schema determining the storage and organization of databases on secondary storage) reflects the different levels of abstraction of data in a database system. The layered architecture in Figure 1.1 illustrates a hierarchy of mappings, where the topmost layer deals with data model entities and the bottommost layer deals with blocks and files. Finally, a task-oriented architecture identifies
Figure 1.1  Layered DBMS architecture.

Set-oriented interface (L4): query optimization, access control, integrity enforcement. Typical entities: relations, tuples, views.
Record-oriented interface (L3): data dictionary, sorting component, transaction management, cursor management. Typical entities: records, sets, access paths.
Internal record interface (L2): record manager, access path management, lock management, log manager and recovery. Typical entities: records, B-trees.
Buffer interface (L1): buffer manager. Typical entities: pages, segments.
Secondary storage manager interface (L0): secondary storage manager. Typical entities: blocks, files.
the relevant modules (i.e., their purpose and interface) and relationships to other modules (e.g., in the form of exported and imported interfaces). Examples of such tasks include query optimization, concurrency control, and recovery. The last two also are examples of tasks that are hard to assign to a specific layer in the layered architecture or that might even be addressed by multiple layers. Although a task-oriented architecture is much more suitable for reasoning about extensibility and DBMS construction, reference architectures rarely exist (with the strawman architecture developed by the Computer Corporation of America, CCA 1982, as a notable exception), and concrete architectures are described at a granularity too coarse to be helpful for our purposes.

For educational purposes, it is convenient to consider a DBMS architecture as consisting of a number of layers (Härder & Rahm 1999; Härder & Reuter 1983; Ramakrishnan 1997). Each layer supports a set of data types and operations at its interface and typically consists of several components (modules or managers of some concrete or abstract
resource). The data types and operations defined for the modules of one layer are implemented using the concepts (data types and operations) of the next-lower level. Therefore, the layered architecture can also be considered as a stack of abstract machines. Concretely, the layered architecture model as introduced by Härder and Reuter (1983) is composed of five layers (see Figure 1.1):

1. The uppermost layer (L4) supports logical data structures such as relations, tuples, and views. Typical tasks of this layer include query processing and optimization, access control, and integrity enforcement.
2. The next layer (L3) implements a record-oriented interface. Typical entities are records and sets (e.g., as found in the Committee on Data Systems Languages, CODASYL, data model) as well as logical access paths. Typical components are the data dictionary, transaction management, and cursor management.
3. The middle layer (L2) manages storage structures (internal records), physical access paths, locking, logging, and recovery. Therefore, relevant modules include the record manager, physical access path managers (e.g., a hash table manager), and modules for lock management, logging, and recovery.
4. The next layer (L1) implements (page) buffer management and implements the page replacement strategy. Typical entities are pages and segments.
5. The lowest layer (L0) implements the management of secondary storage (i.e., maps segments and pages to blocks and files).

In general, due to performance considerations, no concrete DBMS has fully obeyed the layered architecture. Note further that different layered architectures and different numbers of layers are proposed, depending on the desired interfaces at the top layer. If, for instance, only a set-oriented interface is needed, it is useful to merge the upper two layers.

From a more practical point of view, most DBMS architectures have been influenced by System R (Astrahan et al. 1976), which consists of two layers: the relational data system (RDS), providing for the relational data interface (RDI); and the relational storage system (RSS), supporting the relational storage interface (RSI). While RDS implements SQL (including query optimization, access control, triggers, etc.), RSS supports access to single tuples of base relations at its interface.
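The stack-of-abstract-machines view can be illustrated with a toy sketch (hypothetical interfaces, far coarser than any real system): each layer exposes its own entities, and a layer implementation depends only on the interface of the next-lower layer.

```java
import java.util.List;

// L0: secondary storage manager - blocks and files.
interface SecondaryStorage {
    byte[] readBlock(long blockNo);
    void writeBlock(long blockNo, byte[] data);
}

// L1: buffer manager - pages and segments, implemented on top of L0.
interface BufferManager {
    byte[] pinPage(long pageNo);
    void unpinPage(long pageNo, boolean dirty);
}

// L2: record manager and physical access paths, implemented on top of L1.
interface RecordManager {
    long insertRecord(byte[] record);
    byte[] readRecord(long recordId);
}

// L3: record-oriented interface - logical records, sets, cursors, on top of L2.
interface RecordInterface {
    long openScan(String setName);
    byte[] next(long cursorId);
}

// L4: set-oriented interface - relations, tuples, views, queries, on top of L3.
interface QueryInterface {
    List<Object[]> execute(String query);
}

// A layer implementation sees only the layer directly below it.
class SimpleBufferManager implements BufferManager {
    private final SecondaryStorage storage;   // the only dependency allowed

    SimpleBufferManager(SecondaryStorage storage) { this.storage = storage; }

    public byte[] pinPage(long pageNo) {
        return storage.readBlock(pageNo);      // no caching in this toy version
    }

    public void unpinPage(long pageNo, boolean dirty) {
        // a real buffer manager would apply a replacement policy and write dirty pages back
    }
}
```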
1.3.2 Components and Database Management System Architecture

When we strive for reusability, extensibility, openness, and interoperability of database systems, looking at software engineering research and practice yields helpful insights. In particular, componentware (Allen &
Frost 1998; D'Souza & Wills 1999; Griffel 1998; Hamilton 1997; Krieger & Adler 1998; Nierstrasz & Dami 1995; Nierstrasz & Meijler 1998; Orfali et al. 1996; Szyperski 1997) is a recently proposed paradigm to address these issues. This is the notion that software systems are built in a disciplined manner out of building blocks with specific properties, called components.

There is currently no widely agreed-on definition of the term component; however, the following characteristics of components can be found in most definitions in the literature. A (software) component, then, is a software artifact modeling and implementing a coherent and well-defined set of functions. It consists of a component interface and a component implementation. Components are black boxes, which means that clients can use them properly without knowing their implementation. Component interface and implementation should be separated such that multiple implementations can exist for one interface and implementations can be exchanged. Defining components as black boxes also means that each component sees only the interfaces of other components; that is, access to the internal operations and structures of other components is not permitted. A component should not have an overly high number of relationships to other components because this might restrict its reuse potential.

Systems are built by putting components together to form new software systems (this principle has been referred to as reuse by composition). Systems constructed by composition can be modified or extended by replacing or adding new components. Component-based systems are expected to facilitate the addition or replacement of components without recompiling (or even shutting down) the entire system.

In order to put the right components together to obtain complete and adequate systems, a frame (into which components are plugged) and rules governing the composition process are needed. The frame is given by the software architecture (Perry & Wolf 1992; Shaw & Garlan 1996) of the system under construction. Similar software systems are then described by architecture skeletons or generic architectures (Nierstrasz & Meijler 1998) that are successively enhanced and completed by components. Thus, as a prerequisite, the underlying generic architecture needs to be defined in terms of components (acting as placeholders) and connections in such a way that components can later be added in a meaningful and consistent way.

Components usually possess a coarser granularity than objects in object-oriented systems and models. A well-designed component supports a coherent set of tasks (e.g., in one of our scenarios, storing and retrieving textual documents), while objects and classes typically address only a part thereof. Components and objects are, however, not mutually exclusive alternatives; rather, components leverage object orientation to a higher level of abstraction and granularity. In fact, "under the hood" components are often assemblies of objects.
We use the principles of componentware to better understand, abstract, and classify the various approaches to extending and customizing DBMSs. Moreover, the characteristics of componentware (components and architecture) are crucial requirements for systematic and well-defined extensibility and integration. Extensions to a DBMS in this context are represented as components (i.e., they should meet the aforementioned properties of components). Further, DBMSs should exhibit a componentized architecture, at least for those parts that are intended to be customizable.
1.4 Related Work: The Roots of CDBMSs

In this section, we review the roots of CDBMSs. In a nutshell, these roots are relational database systems, object-orientation in general and object-oriented DBMSs in particular, and extensible database systems. Extensible database systems (Carey & Haas 1990) all attempt to ease the construction of DBMSs by exploiting some kind of software reusability (Geppert & Dittrich 1994). The proposal is for a general core that can be customized or extended in some way by users, or even used to generate some DBMS parts. Here, we survey these approaches and classify them by their dominant way of extending or constructing DBMSs.
1.4.1 Kernel Systems

Kernel systems offer the common functionality required by all or most DBMSs (e.g., physical object management), but typically are not fully functional DBMSs. The upper layers of a DBMS have to be implemented (i.e., programmed) by the DBMS implementor (DBI). The major question in this approach is how powerful the kernel interface is. A low-level interface (e.g., page management or unstructured records) leaves the DBI enough freedom to implement the desired concepts. On the other hand, much implementation work is necessary due to the low level of the kernel. Alternatively, if a kernel supports more powerful concepts, less implementation is required from the DBI, while the kernel will be geared toward specific kinds of systems or constructions.

The Wisconsin Storage System (WISS) (Chou et al. 1985) offers basic DBMS functionality. At its top layer interface, WISS provides for (unstructured) records and scans of files of records, where scans can contain search predicates. All necessary additional functionality has to be implemented on top of this layer.

DASDBS (Darmstadt Database System) is another kernel system (Paul et al. 1987; Schek et al. 1990) that offers a general fixed interface. The reusable part of DASDBS is the complex record manager, which
handles record structures comparable to nested relations (Schek & Scholl 1986). DASDBS also supports multilevel transactions (Weikum 1991) and, therefore, provides support for implementing transaction management on upper levels.

Object managers such as Bess (Biliris & Panagos 1995), EOS (Biliris 1992), the EXODUS storage manager (Carey et al. 1986), Kiosk (Nittel & Dittrich 1996), ObServer (Hornick & Zdonik 1987; Skarra et al. 1991), and Texas (Singhal et al. 1992) also fall into the class of kernel systems.
1.4.2 Pure Extensible Database Systems

Pure extensible database systems allow new parts such as abstract data types or index structures to be added to the system (note that the term "extensible database system" in the broader sense often refers to systems that support any kind of enhancing, extending, or customizing DBMSs; Carey & Haas 1990). Enhancing DBMSs with new abstract data types (ADTs) or index structures has been pioneered in the Ingres/Postgres systems (Stonebraker et al. 1983; Stonebraker 1986; Lynch & Stonebraker 1988). Ingres supports the definition of new ADTs, including operators. References to other tuples can be expressed through queries (i.e., the data type postquel), but otherwise ADTs and their associated relations still must be in first normal form. This restriction has been relaxed in systems that have a more powerful type system (e.g., an object-oriented data model) (Bancilhon et al. 1992; Dadam et al. 1986; Dittrich et al. 1986; Linnemann et al. 1988; Schek et al. 1990).

Another area in which extensions have been extensively considered is index structures. In Ingres/Postgres, existing indexes (such as B-trees) can be extended to also support new types (or support existing types in a better way). To extend an index mechanism, new implementations of type-specific operators of indexes have to be provided by the user. In this way, existing index structures are tailored to fit new purposes and thus have been called extended secondary indexes. Since most of the implementation of an index does not need to be changed, extensions are easier to perform than implementing a completely new index structure. The principle of extended secondary indexes has recently been applied in the DB2 UDB object-relational DBMS (Chen et al. 1999).
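The flavor of extended secondary indexes can be conveyed by a small sketch, which is not the Postgres or DB2 mechanism but follows the same division of labor: the index implementation is generic and reused unchanged, while the user supplies only the type-specific operator (here, an ordering) needed to index a new type.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// A new, nonstandard data type the DBMS knows nothing about.
class PointOnLine {
    final double x;
    PointOnLine(double x) { this.x = x; }
}

// Generic index machinery, reused unchanged for every type.
class ExtendedIndex<K, V> {
    private final TreeMap<K, List<V>> tree;

    // The user contributes only the type-specific ordering operator.
    ExtendedIndex(Comparator<K> userDefinedOrder) {
        this.tree = new TreeMap<>(userDefinedOrder);
    }

    void insert(K key, V rowId) {
        tree.computeIfAbsent(key, k -> new ArrayList<>()).add(rowId);
    }

    // Range queries come "for free" once the ordering is supplied.
    List<V> range(K low, K high) {
        List<V> result = new ArrayList<>();
        tree.subMap(low, true, high, true).values().forEach(result::addAll);
        return result;
    }
}

// Usage: index the new type by supplying its operator, nothing else.
//   ExtendedIndex<PointOnLine, Long> index =
//       new ExtendedIndex<>(Comparator.comparingDouble((PointOnLine p) -> p.x));
//   index.insert(new PointOnLine(3.5), 42L);
//   List<Long> hits = index.range(new PointOnLine(1.0), new PointOnLine(5.0));
```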
1.4.3 Customizable Systems

This kind of system is based on a complete DBMS that is modified or extended so that it satisfies new requirements. The basic DBMS is customized to a concrete, new DBMS. In principle, (internal) DBMS components are exchanged in order to achieve specific functionality in a
different way than in the original system. Therefore, a crucial issue is the underlying architecture and the proper definition of places where exchanges can be performed.

Starburst (Haas et al. 1990; Lindsay et al. 1987) is an example of a customizable DBMS. Its query language can be extended by new operators on relations (Haas et al. 1989). Various phases of query processing in Starburst are also customizable. Queries are internally represented as query graphs, and query rewrite transforms these graphs into equivalent, better ones. The rewrite part of Starburst's query processor can be customized (Pirahesh et al. 1992) by adding new rewrite rules (where each rule is defined in the form of two C procedures).

The Starburst query optimizer maps a query graph into an executable plan in a bottom-up manner. For each element of the query graph, it creates one or more alternative plans and selects the cheapest plan that meets the required properties. The mapping of query graph elements into plan operators is defined by Strategy Alternative Rules (STARs) (Lohman 1988). The optimizer can be customized by defining new STARs.

Storage methods can also be customized in Starburst in that (new) relation storage methods are plugged into the system. Relation storage methods implement the storage of relation instances and operations on them. They determine how a specific relation is represented physically (e.g., as a temporary relation or in the leaves of a B-tree). Furthermore, attachment types can be associated with relation instances. The meaning of attachments is that their operations are invoked as a side effect of relation-modification operations (operations that update a relation). Attachment types include access structures, integrity constraints, and triggers. Both relation storage methods and attachment types have to implement a collection of generic operations.
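The rewrite-rule mechanism can be sketched roughly as follows, in Java rather than the C procedures Starburst actually uses and with invented types: each rule pairs a condition over the query graph with an action that produces the rewritten graph, and new rules are added without touching the rewrite engine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// A highly simplified stand-in for a query-graph node.
record QueryNode(String operator, List<QueryNode> inputs) {}

// A rewrite rule: a condition ("does the rule apply here?") plus an action
// ("produce the rewritten node").
record RewriteRule(String name,
                   Predicate<QueryNode> condition,
                   Function<QueryNode, QueryNode> action) {}

class RewriteEngine {
    private final List<RewriteRule> rules = new ArrayList<>();

    // Customization point: new rules are added without touching the engine.
    void addRule(RewriteRule rule) { rules.add(rule); }

    QueryNode rewrite(QueryNode node) {
        QueryNode current = node;
        for (RewriteRule rule : rules) {
            if (rule.condition().test(current)) {
                current = rule.action().apply(current);
            }
        }
        // A real engine would also recurse into the inputs and iterate to a fixpoint.
        return current;
    }
}

// Example rule: merge a selection over a selection into a single selection.
//   engine.addRule(new RewriteRule("merge-selects",
//       n -> n.operator().equals("SELECT")
//               && n.inputs().size() == 1
//               && n.inputs().get(0).operator().equals("SELECT"),
//       n -> new QueryNode("SELECT", n.inputs().get(0).inputs())));
```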
1.4.4 Toolkit Systems

Toolkit systems offer libraries of modules that implement alternative techniques for given tasks (e.g., physical access paths). The variable part of a DBMS is then built by selecting one technique from each library, and plugging together the chosen techniques.

The EXODUS (Carey et al. 1990, 1991) approach (see also Section 1.4.6) applies the idea of a toolkit for specific parts of the DBMS. A library is provided for access methods. While the library initially contains type-independent access methods such as B-trees, grid files, and linear hashing, it can also be extended by a DBI with new methods. For this, the DBI can use the DBMS implementation language E (Richardson & Carey 1986), a derivative of C++, for the implementation of extensions as well as for other parts of a DBMS. Furthermore, another library exists for operator methods, each of which implements an operation on a single type of storage object (e.g., selection). These operator
methods are used later to realize operators of a query language (see later sections).

The Open OODB (Open Object-Oriented Database) approach (Blakeley 1994; Wells et al. 1992) supports the construction of object-oriented DBMSs. Open OODB distinguishes a meta-architecture and an extensible collection of modules implementing specific functionality (policies). The meta-architecture defines a set of kernel modules, mechanisms to define the system architecture (boundaries between components), and so forth. For some specific functional tasks, various policies can be applied (and can even be exchanged dynamically). Each domain for which multiple policies can be used is controlled by a policy manager, and all the policies of a specific domain are required to guarantee the same invariants (which ensures that they are interchangeable). In a nutshell, the construction process with Open OODB consists of two major steps: defining the architecture with the means provided by the meta-architecture, and selecting policies for those required domains that allow multiple policies.

Further examples of toolkits are Trent (Unland & Schlageter 1992) for the construction of transaction managers (mainly, transaction structures and concurrency control) and A la carte (Drew et al. 1992) for the construction of heterogeneous DBMSs.

One problem in any toolkit approach is the consistency (or compatibility) of reused components. Lienert (1987) investigates conditions under which DBMSs can be configured. He first identifies the major tasks of a DBMS and generally distinguishes access (storing and retrieving data) from management (concurrency control, recovery, integrity enforcement, etc.). Furthermore, he elaborates the definition of standard techniques for these tasks and interrelationships between these techniques. Then, attributes representing certain properties of techniques are derived for the various tasks, rules are specified for deriving (deducing) some of them, and conditions are specified that have to hold for a configuration (e.g., a configuration is not correct if specific techniques have mutually contradictory values for some attributes).
1.4.5
Transformational Systems In a transformational approach to DBMS construction, the DBI uses languages for the specification of the functions to be implemented. These functions are implemented using the interfaces of a lower layer, and DBI must also specify the mapping to that implementation base. GENESIS (Batory et al. 1988a, 1988b) is a transformational approach that supports the implementation of data models as a sequence of layers. The interface of each layer defines its notions of files, records, and links between files. The implementation of a layer is described by the transformation of its concepts to those of the next-lower layer. Transformations themselves are collected in libraries, so that they can
be reused for future layer implementations. The basic (fixed) component is the JUPITER system for file management. The sequence of transformations maps the data-model concepts to the JUPITER interface. The approach of GENESIS has been generalized to a construction method for hierarchical software systems (Batory & O’Malley 1992). The underlying assumption is that DBMSs can be constructed as layered systems. The central notion of this approach is the component, where each component is an element of a realm. All the elements of a realm have the same interface, but possibly different implementations. Then, a software system is described as a kind of algebraic expression. Component reuse refers to referencing the same component in multiple expressions. Another transformational approach that uses specification constructs similar to those of Acta (Chrysanthis & Ramamritham 1994) has been described by Georgakopoulos et al. (1994). The transaction specification and management environment (TSME) consists of two building blocks: a transaction specification facility (TSF) and a programmable transaction management mechanism (TMM). TSF also uses the notion of dependency between transactions. Specific dependencies are implemented through Event-Condition-Action (ECA) rules in DOMS.
1.4.6
Generators Generator systems support the specification of (parts of) a DBMS functionality and the generation of DBMS components based on those specifications. The DBI specifies a model (e.g., an optimizer, a data model, or a transaction model), which is input to a generator. The generator then automatically creates a software component that implements the specified model based on some implementation base (e.g., a storage manager or kernel in the case of data-model software generation). The knowledge for mapping the concepts specified in the model to the implementation base is in the generator. An example of a generator system is the EXODUS query-optimizer generator (Graefe & DeWitt 1987). Input to the generator is a model description file, which contains a set of operators, a set of methods to be considered for constructing query execution plans, transformation rules, and implementation rules. Transformation rules specify legal transformations of query trees into new ones, and implementation rules define correspondences between operators and methods (i.e., which method can be used to implement an operator); for example the join operator could have as a corresponding method a hash join. In addition to the model description, a set of C procedures has to be supplied, which, for example, determine the cost functions of the various methods. Given this information, the generator can create a query optimizer for a DBMS under construction.
Volcano (Graefe & McKenna 1993), the successor of the EXODUS optimizer generator, also falls into the group of generator systems. Volcano has been used to build the optimizer for Open OODB (Blakeley et al. 1993).
1.4.7
Frameworks Frameworks model families of software systems with a common prescribed architecture and behavior. They model the variable parts of the considered systems as abstract classes, and a concrete system can be built by deriving new subclasses from the abstract ones (called reuse by inheritance). Opt++ (Kabra & DeWitt 1999) is a query-optimizer framework. Abstract classes model operator trees and access plans. Further classes model the transformations of operator trees into other trees or into access plans, or of access plans into other plans. Finally, search strategies are also represented as classes. The Opt++ framework defines a general architecture of a query optimizer in terms of instances of these classes. A concrete optimizer is built by deriving concrete classes from the abstract classes prescribed by the framework. Other frameworks for building query optimizers are the ones described in Özsu et al. (1995), Cascades (Graefe 1995), and EROC (Extensible Reusable Optimization Components) (McKenna et al. 1996). Framboise (Fritschi et al. 1998) is a framework for layering active database functionality on top of passive DBMSs. Generalized search trees (GiST) (Hellerstein et al. 1995) allow the incorporation of new index structures into a DBMS. GiST supports tree-based indexes; their general behavior (e.g., insertion and deletion) is predefined and thus does not need to be rewritten for new instances of GiST. New index structures can be built by implementing or overriding six methods specific for concrete search trees. The GiST approach has been extended to cover a broader spectrum of search trees and search techniques (e.g., nearest neighbor) (Aoki 1998). Kornacker et al. (1997) shows how concurrency control and recovery can be implemented for instances of GiST. GiST has recently been incorporated into Informix Dynamic Server with Universal Data Option (Kornacker 1999).
1.4.8
Discussion Many of the techniques surveyed in this section have made their way into products (user-definable types, extensible index structures, etc.). While there are some success stories (i.e., techniques having found their way into products), there are also lessons to be learned and problems to be addressed.
An interesting observation is that recent efforts have assumed a fixed data model, such as the object-relational or object-oriented model. This is in contrast to older work, in which the implementation of new data models was also investigated. The reasons for this trend are, in our opinion, that there is no significant market for new specialized data models and that the invention of new data models requires that most of the other DBMS parts (such as the query processor) as well as tools (e.g., for database design) be adapted accordingly. Most of the recent efforts also assumed a fixed transaction model (e.g., ACID-transactions or closed nested transactions), while significantly less work has been done on extensible transaction management. Exceptions to this observation are concurrency control and recovery for extensible index structures (e.g., Kornacker et al. 1997). The probable reasons for this trend are that many of the proposed nonstandard transaction models have never been implemented and questions remain concerning their semantics and range of applicability. Furthermore, it can be concluded from the work done in the past and the functionality available today that some techniques are more feasible than others or that they are well suited for specific cases. For instance, techniques requiring a large implementation effort (e.g., implementing a DBMS on top of object managers) are only practical for a vendor who wants to offer a suite of products and wants to share part of the codebase among products. Customization techniques require a sound understanding of the systems and of the effects that customizations have. They are therefore also only applicable for vendors, unless their specification is possible in a very high-level way and implementors are not required to know the internals of the core system. A further noteworthy observation is that approaches in extensible database systems seem to follow trends in software engineering, but delayed a few years. For instance, while older approaches have used the principles of generators, more recent efforts have devised frameworkbased solutions. Finally, another lesson is that feasible approaches to extensibility should consider the entire DBMS. Approaches concentrating on a single aspect (e.g., transaction management) tend to make assumptions that restrict other parts of the DBMS. These restrictions then raise questions of consistency and compatibility when other parts of the system should be kept extensible or customizable as well. In consequence, in order to be feasible, extension mechanisms in one area should not preclude extensions in other parts of the system. Finally, as more and more extensions become available, we expect that problems known from software-reuse research and practice (Krueger 1992) need to be solved in our context as well. For instance, whenever there is a significant number of components to choose from, users need support for selection (i.e., finding the adequate reusable software
artifact). In cases in which extensions cannot be reused as is, users need help to adapt reused extensions. We conclude from these lessons that feasible approaches to extensibility and customizability need to rely on the componentization of DBMSs. In this way, variable parts are clearly identified, as are the component plugs used for extension and composition. This in turn significantly reduces the amount of knowledge about internals required for extensions and customizations. The notion of component used to describe possible extensions is suitable for addressing reuse problems, due mainly to their higher level of abstraction, but also due to the requirement that connections (to other system parts) be well defined.
1.5
Component Database Models In this section, we present the various types of CDBMSs. We consider two dimensions:
• Components: What kinds of components are considered? Which kind of database functionality or DBMS task can be represented as a component? How are components defined?
• Architecture: What is the generic DBMS architecture that allows plug-in components? What are the respective fixed and variable parts? How are components and connections described?
The classification given in this section does not imply that all the categories are completely disjoint. In fact, a concrete system can belong to multiple categories, for instance, if it allows the addition of different kinds of components. We return to this issue in Section 1.5.5.
1.5.1
Plug-in Components The first category of CDBMSs comprises universal servers. The core system in this group is formed by a fully functional DBMS that implements all the standard functionality expected from a DBMS. Nonstandard features or functionality not yet supported can be plugged into this DBMS (see Figure 1.2). Thus, such a system is functionally complete and meets basic requirements, while extensions add further functionality for specialized needs. The components in this kind of CDBMS are typically families of base and abstract data types or implementations of some DBMS function, such as new index structures. The DBMS architecture, among others, defines a number of plugs that components can use, for example, interfaces of functions that the DBMS will invoke and that the component thus must implement. In other words, the architecture formulates
Figure 1.2 Principle of plug-in style CDBMSs.
expectations concerning interfaces that the component must meet in order to be integrated successfully. To date, all systems in this category are based on the relational data model and existing relational DBMSs, and all of them offer some object-oriented extensions. We thus discuss this type of CDBMS in an object-relational (Stonebraker et al. 1999) context, although componentization is also possible for object-oriented database systems. Example systems include IBM’s DB2 UDB (IBM 1995), Informix Universal Server (Informix 1998), Oracle8 (Oracle 1999a), and Predator (Seshadri 1998). Descriptions of sample component developments can be found in (Bliujute et al. 1999; Deßloch & Mattos 1997). The approaches we consider here aim at providing data management facilities for new nonstandard data types and for nonstandard or specialized database functionality within the DBMS. Instances of these new data types are thus stored in the database, and their manipulation and retrieval is implemented by the DBMS. Assume an application needs support for data types not yet supported by the DBMS in use (e.g., spatial data or social security numbers). The ultimate task is to teach the DBMS how to store, manipulate, and retrieve instances of these data types. In the first step, the designer has to model the structure of the desired new data types as well as their type-specific behavior. From a user’s point of view, new data types are either complex or atomic. Complex data types possess structure and their values can thus be represented as specialized records or tuples or collections using the data definition language. Atomic data types do not have an internal structure and consist of literal values (such as numbers or characters). For atomic types, the DBMS needs basic information, such as their length in bytes, in order to store their instances. Thus, for spatial data, we might specify points as a complex data type modeling locations in three-dimensional space. 3DPoint could be specified as a tuple type with attributes x, y, and z, each of which is of type decimal. Another example would be Region, whose structure
could be defined as a pair of points LowerLeft and UpperRight. Social security numbers would be defined as an atomic type whose instances are 9 bytes long. In addition to the structure of data types, the type-specific behavior of the new sorts of data needs to be specified. For each complex data type, its specific functions, operators, and predicates must be made known to the DBMS. In our example, a possible function would be the move function for points, and a typical predicate would be overlaps, which tests whether two regions intersect in some subregion. Atomic types normally do not exhibit type-specific behavior. They, however, often require special treatment with respect to ordering and representation. Indeed, one reason to introduce a new type for nonstandard atomic type is that existing DBMSs do not know how to sort them correctly. Thus, for new atomic types it is necessary to define operators such as <. Furthermore, the internal representation usually is not very telling for end users; thus, functions converting elements of atomic types from an internal (stored) representation to and from an external one are needed. For each function, operator, and predicate, a signature (i.e., its name, arguments, and result) must be defined and an implementation must be provided. The implementation language in turn depends on the DBMS, possibilities ranging from DBMS-specific languages such as Oracle’s PL/SQL to general-purpose programming languages such as C or C++ or Java. The collection of data types (their definition and implementation) forms a significant part of a component, which then needs to be plugged into the DBMS. To this end, DBMSs in this category offer a facility to register new components. Component registration introduces new definitions (for types, functions, etc.), and also informs the DBMS where (i.e., in which files) implementations can be found. After a data type has been registered, applications can, in principle, start to use them (i.e., to create and retrieve instances of them). However, efficient retrieval and processing might require further enhancements to the DBMS, particularly to the access path manager and the query optimizer. Thus, we observe that extending a DBMS by plugging in new components often has a sort of domino effect because other parts must be adapted accordingly. Current DBMSs typically contain B-tree access paths and possibly also hash-based indexes. B-trees can handle one-dimensional keys very well and rely on the orderability of keys. New types such as spatial data types can, however, be multidimensional; thus they would not be adequately served by B-trees, and, consequently, query processing might easily become inefficient. Therefore, in some situations it will be desirable to add new access methods to the DBMS, such as one that supports multidimensional indexing (Gaede & Günther 1998). In order to integrate well with other parts of the DBMS, such a new index has to
implement exactly those functions the DBMS calls for its indexes (i.e., insertion and deletion of entries, index scanning, etc.). Furthermore, the addition of new data types also affects query processing, in particular query optimization. Cost-based optimization techniques, for instance, need information about the cost of each operator (in terms of CPU consumption and I/O-operations) to find an optimal plan. Thus, to ensure efficient query processing in this case, it is necessary to provide the optimizer with the adequate information, such as the knowledge of how it can estimate the cost of evaluating newly defined predicates. As a typical example of the aforementioned domino effect, consider concurrency control for access paths. Many DBMSs use specialized concurrency control protocols on indexes to prevent unnecessary locking conflicts (Bayer & Schkolnick 1997; Kornacker et al. 1997), which otherwise would increase lock contention and decrease transaction throughput. Therefore, whenever a new index is introduced, concurrency control (for this new index type) should also be adapted, which is, however, not possible in current systems.
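To make the registration step concrete, the following sketch shows how the spatial types and the overlaps predicate from the example might be declared using SQL:99-style DDL. The attribute precisions, library path, and routine name are illustrative assumptions, and concrete products use their own dialects; the 3DPoint type of the running text is written here as Point3D because an SQL identifier may not begin with a digit.

CREATE TYPE Point3D AS (      -- "3DPoint" in the running text
    x  DECIMAL(10,3),
    y  DECIMAL(10,3),
    z  DECIMAL(10,3)
);

CREATE TYPE Region AS (
    LowerLeft   Point3D,
    UpperRight  Point3D
);

-- Type-specific behavior is registered as functions and predicates whose
-- implementation lives in an external library.
CREATE FUNCTION Overlaps ( Region, Region )
RETURNS BOOLEAN
EXTERNAL NAME '/usr/lib/spatial/region.so(region_overlaps)'   -- path assumed
LANGUAGE C;

Once such definitions are registered, the predicate can appear in ordinary queries, which is precisely what triggers the domino effect described above: the optimizer and the access path manager must also learn how to handle it.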
1.5.2
Database Middleware The typical aim of systems falling into this category is to integrate existing data stores, that is, to leave data items under the control of their original (external) management systems while integrating them into a common DBMS-style framework. For instance, existing data stores should be integrated into query processing or transaction management of the entire system. External systems will in many cases exhibit different capabilities, such as query languages with varying power or no querying facilities at all. The different data stores might also have different data models (i.e., different data definition and structuring means), or no explicit data model at all. Users and applications should, nevertheless, be shielded from this kind of heterogeneity and should be provided with a uniform and integrated view of the entire system. This task is accomplished by the CDBMS acting as middleware (Orfali et al. 1996; Ferreira Rezende & Hergula 1998) between these data stores and the applications of the integration. The overall aim of systems in this group is similar to that of multidatabase systems (Sheth & Larson 1990; Elmagarmid et al. 1999), although the latter typically consider only the integration of database systems, instead of any kind of data store. The goal of graceful integration is achieved through componentization in the following way (Figure 1.3). The architecture introduces a common (intermediate) format into which the local data formats can be translated. Components are introduced that are able to perform this kind of translation. Second, common interfaces and protocols define how the database middleware system and the components interact (e.g., in order to retrieve data from a data store). These components
Figure 1.3 Middleware-style CDBMSs.
(called wrappers) are also able to transform requests issued via these interfaces (e.g., queries) into requests understandable by the external system. In other words, these components implement the functionality needed to access from, within the database middleware system, data managed by the external data store. The underlying problem in this respect is that the database middleware needs to understand the data formats and the functions of each data source. Two extreme alternatives exist to tackle this problem. In the first, information about the data sources’ interfaces are hard-wired into the (integrating) DBMS. The realm of integrable external data stores is thus restricted, and the DBMS needs to be extended for each specific type of data store. In the other alternative, a common data model, query language, or interface to external data stores is set as a prerequisite. For instance, we might require that all data stores understand SQL and be able to return the results of SQL queries in the form of relations. Each data store that does not implement SQL right away would thus have to be extended to do so. Moreover, all the data sources and the middleware would have to agree on one specific SQL dialect. The solution that helps to overcome the intrinsic problems of both approaches lies in the introduction of additional components. In a nutshell, a component is pushed between the DBMS and each data source. These components serve to homogenize differences in formats and functionality from the DBMS’s point of view. From the data sources’ perspective, they level the different data source capabilities to a common basis. Thus, each component mediates between the data sources and the DBMS, or—from the DBMS’s point of view—wraps the data source. The first prerequisite of this approach is a common data abstraction (e.g., objects). Second, the mediation components must offer a common interface. This interface is used by the DBMS to request data from
the data sources. Each component should at least support the minimal interface, such as scanning a collection of data entities. Depending on the data source capabilities, its mediation component can, however, contribute more features or specialized functions, such as predicate evaluation. Whenever the DBMS executes a query and determines that it needs results from the data source, it sends a request to the corresponding component, which in turn translates the request into a form the data source can handle. Eventually, the component receives the results from the data source and converts them into the common format expected by the database middleware. This approach requires an appropriately defined notion of components for wrapping the external data stores because it must match the requirements and characteristics of the DBMS while also using the capabilities of the data stores. Moreover, using the full potential of this approach means that a component is written once for each kind of data store (e.g., for a specific image management system) and used for all subsequent instances of the data store. Ultimately, users should be allowed to introduce new components to integrate data stores not yet covered. To that end, the implementation of a component must be possible without the component implementor knowing the internal structure and operations of the database middleware; the requirements to be met by a component must be entirely expressed in its interface. Examples of this approach include Disco (Tomasic et al. 1998), Garlic (Tork Roth & Schwartz 1997), OLE DB (Blakeley 1996a, 1996b; Microsoft 1996), Tsimmis (Papakonstantinou et al. 1995a), Harmony (Röhm & Böhm 1999) (which implements the CORBA query service), and Sybase Adaptive Server Enterprise (Olson et al. 1998) (which allows access to external data stores, in Sybase called specialty data stores, and other types of database systems).
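As a rough sketch of how such wrapper components can be made known to the database middleware, the statements below follow the SQL/MED style of foreign data wrappers; the wrapper name, server options, and table layout are invented for illustration, and the registration interfaces of the systems named above differ in their details.

CREATE FOREIGN DATA WRAPPER image_store_wrapper
    LIBRARY '/usr/lib/wrappers/image_store.so';   -- wrapper implementation, path assumed

CREATE SERVER corporate_images
    FOREIGN DATA WRAPPER image_store_wrapper
    OPTIONS ( host 'img.example.com', port '4100' );

CREATE FOREIGN TABLE Image_Catalog (
    Image_Id  INTEGER,
    Taken_On  DATE,
    Picture   BLOB
) SERVER corporate_images;

-- After registration the external store can take part in ordinary queries;
-- the wrapper translates the request and converts the results.
SELECT Image_Id
FROM   Image_Catalog
WHERE  Taken_On > DATE '1999-01-01';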
1.5.3
DBMS Services The third type of componentized DBMS is characterized by a serviceoriented view of database functionality. All DBMS and related tasks are unbundled (Geppert & Dittrich 1998) into services. As a result, a monolithic DBMS is transformed into a set of stand-alone services. For instance, the unbundling process can result in persistence, query, and transaction services. Applications then no longer operate on fullfledged DBMSs, but instead use those services as needed (Figure 1.4). Each service is defined by one or more interfaces and implemented by some software systems. Services (i.e., their interfaces) are defined in a common model or language. Services are implemented using a common platform in order to render the service implementations exchangeable and freely combinable. Services should be as indepen-
Figure 1.4 Principle of service-oriented architectures.
dent as possible from one another; that is, they should not rely on the availability of a specific other service in the environment and they should not assume that other services are implemented in a particular way. In this scenario, (database) services and their implementations are viewed as components. Given that both the platform and the service interfaces are standardized, exchangeability and compatibility are achieved. Different implementations of a service can be exchanged, and implementations of different services—possibly from different vendors—can be plugged together. An example of this approach are CORBAservices (OMG 1995a), which leverage several DBMS tasks to general object systems. These services are standardized by the Object Management Group (OMG). Service interfaces are defined using the Interface Definition Language. Services related to database functionality include persistency, concurrency, and queries. Such services are implemented on top of Object Request Brokers (ORB) (OMG 1997a). In contrast to the other approaches discussed here, the components (i.e., services) are not meant to extend or customize a DBMS. Rather, the systems constructible by using services are distributed applications located above the DBMS level. In fact, services such as persistence could be implemented by a DBMS, and the transaction service might be implemented by transaction processing monitors (Bernstein & Newcomer 1996). The underlying semantics and models of services (such as the transaction model or query language) are fixed. Thus, for a transaction service, there will be distinct implementations all implementing transactions such as flat ACID transactions, but transactions such as cooperative or other forms of nonstandard transactions (Elmagarmid 1992) will not be supported. A second example of this approach is the strawman architecture developed at Computer Corporation of America (CCA 1982). The aim of this study was to propose standard interfaces between users or applications and DBMSs as well as standards for internal interfaces (such that different DBMS subsystems can be combined more easily). This
study identified 79 subcomponents, which are grouped together into 38 components, some of which denote internal functions, while others refer to external (i.e., visible at the DBMS interface) ones. The subcomponents are partitioned into six groups of related tasks. For each subcomponent, procedures and interfaces are proposed. Therefore, the view of DBMS architecture is more service-oriented, and concrete components are proposed to implement such services (to the best of our knowledge, however, a DBMS implementing this architecture has never been built).
1.5.4
Configurable DBMS In the previous form of CDBMSs, the set of services have been standardized and fixed. One step further are configurable DBMSs that allow new DBMS parts to be developed and integrated into a DBMS (Figure 1.5). Thus, configurable DBMSs are similar to DBMS services in that they also rely on unbundled DBMS tasks that can be mixed and matched to obtain database support. The difference lies in the possibility of adapting service implementations to new requirements or of defining new services whenever needed. The components are DBMS subsystems, which are defined in an underlying architecture model. In this approach, the architecture model is also used to model the DBMS architecture, which is no longer fixed. Configurable DBMSs also consider services as unbundled representations of DBMS tasks. However, the models underlying the various services and defining the semantics of the corresponding DBMS parts can now in addition be customized. As a consequence, components for the same DBMS task can vary not only in their implementations for the same standardized interface, but also in their interfaces for the same task. DBMS implementors select (or construct new) components implementing the desired functionality and obtain a DBMS by assembling the selected components (Figure 1.5). The DBMS is thus the result of a configuration process. An example of a configurable DBMS is the KIDS project (Geppert & Dittrich 1995; Geppert et al. 1997), which aims at constructing a DBMS by developing subsystems that implement various aspects of a DBMS (such as transaction management or constraint enforcement) and by then configuring these subsystems together into a coherent and complete DBMS. The underlying architecture model provides for constructs that are adequate for defining the architecture of DBMSs. The tasks and functionality of a DBMS and its components are modeled by means of services. Services are provided by reactive components called brokers (i.e., brokers are responsible for services). In the case of service requests, the responsible brokers react by providing the service. The functionality of
Figure 1.5 Principle of configurable DBMSs.
each subsystem under construction is represented as a set of services, and one or more brokers are designated as components implementing the subsystem. The construction process defines how to proceed in order to obtain a DBMS with the desired functionality. This process consists of several phases including requirements analysis, design, implementation, and integration of multiple DBMS subsystems. Some phases of the process are common to all constructible DBMSs and are independent of subsystems (e.g., requirements analysis and architecture design). For each type of subsystem, a dedicated construction subprocess is defined and integrated into the enclosing DBMS-construction process. For each subsystem, a dedicated specification language is used to define its functionality (such as Acta in the case of transaction models, Chrysanthis & Ramamritham 1994; or Second-order Signature (SOS) in the case of data models, Güting 1993). These specifications serve as the input to subsystem-specific implementation phases, which in turn use techniques such as the generation of subsystems or the configuration of subsystems out of reusable, already existing components (Geppert & Dittrich 1995).
1.5.5
Discussion We now summarize and discuss the elaboration of CDBMS models. Table 1.1 summarizes the characteristics of the four categories. These categories are not necessarily disjoint. For instance, it is conceivable that both plug-in components for nonstandard data and wrappers for accessing external data stores can be added to a single system. Such a system would therefore belong to the first two categories (e.g., as outlined in Stonebraker et al. 1998). Likewise, OLE DB could also be classified as a
configurable DBMS, since it in principle allows the exchange and addition of internal components, for example, to add specialized query processors. In the remainder of this book, representatives of these groups are discussed in more detail.

Table 1.1 Classification of component DBMS models

Plug-in DBMS
    Purpose: Extend existing DBMS with nonstandard functionality
    Architecture, plugs: Interfaces expected or provided by kernel
    Typical components: ADT definition and implementation, new indexes

Database middleware
    Purpose: Integrate existing data stores into database system
    Architecture, plugs: Common format and interfaces between DBMS and wrappers
    Typical components: Wrappers for external data sources

Service-oriented architectures
    Purpose: Provide database functionality in standardized, unbundled form
    Architecture, plugs: Service definitions
    Typical components: Service implementations

Configurable DBMS
    Purpose: Compose nonstandard DBMS out of reusable components
    Architecture, plugs: Service definitions
    Typical components: DBMS subsystems
1.6
Summary and Conclusion Today’s users of database technology require extensibility in all conceivable forms. In order to maintain the software quality and robustness that current (monolithic) DBMS engines exhibit, yet also meet the extensibility requirements, database technology needs to adopt the principles of component technology. This chapter has also classified approaches to componentizing DBMS and database functionality in general. The following chapters describe prominent representatives of these classes in more detail.
Introduction A component database management system (DBMS) is a systems software framework that application developers can extend by embedding modules of programming logic in it. These extensions implement database objects such as new data types, application-level object classes, analytic algorithms, and external data-access facilities. An objectrelational DBMS (ORDBMS) is one kind of component DBMS. While early ORDBMSs were obliged to create their own component models and proprietary interfaces, in recent times the trend has been to adopt open standards such as Microsoft’s Component Object Model (COM), SunSoft’s Enterprise JavaBeans (EJB), and the Object Management Group’s Common Object Request Broker Architecture (CORBA). What distinguishes an ORDBMS from other component frameworks, such as object-oriented DBMSs, middleware (e.g., transaction-processing monitors), and general-purpose application servers, is the way that, once integrated within the ORDBMS, components are organized and manipulated using a declarative database language, instead of a procedural programming language. A distributed ORDBMS differs from a single-site ORDBMS in that the objects within it—both the persistent object data and logic implementing the extensions—are physically distributed across a set of network-connected computer systems (called nodes of the overall system). As part of its runtime operations, the distributed system moves data and logic between nodes to answer user queries as efficiently as possible. We illustrate this idea in Figure 2.1. Ideally, the fact of this physical dispersal is not apparent to users and developers, a property known as location transparency. Queries in a distributed ORDBMS are submitted at a single node (called the local node or local site). Within this local node the query expression is decomposed into a set of lower-level operations. Some of these operations might be forwarded from the local site to remote nodes. Processing a query in a distributed ORDBMS may involve moving data between the nodes of the system, performing computational tasks on several nodes, and possibly even moving the entire implementation of some component logic between nodes. Figure 2.1 illustrates this idea by showing a single external user exchanging SQL queries for data with a local node. The arrows between the nodes of the distributed system illustrate the flow of data, code, and messages coordinating the execution of the query. Note that this diagram is misleading in one important way. From Figure 2.1 you might conclude that the data-processing code is distinct from the database’s tables. As we shall see, this is not the case. The component logic embedded in the ORDBMS is invoked using query expressions. A better way to represent this is to show the data-processing code in the columns and rows of the tables. The implications of this are far-
Figure 2.1 Topology of distributed ORDBMS.
reaching. It means that the ORDBMS is more than a reliable but relatively stupid repository for data. ORDBMSs are software frameworks within which developers can store, organize, and reason about a set of software components corresponding to complex real-world entities.
2.1.1
Why Distributed ORDBMSs? Distributed component DBMSs such as ORDBMSs are useful because they address requirements not well served by traditional distributed systems technologies. First, a distributed ORDBMS provides the means to integrate the data and logic of multiple information systems into a coherent data management infrastructure. Today, new information systems are frequently constructed by combining several preexisting systems. For example, organizations that have grown through mergers and acquisitions might need to maintain their preexisting accounting, order processing, and supply-chain software while at the same time providing global access to the information in these systems to facilitate management decision making. The kind of query-centric interface, provided by a distributed ORDBMS, is more powerful than procedural interfaces supported by other middleware technologies. Second, in a similar fashion, certain information systems can be improved by allowing them access to remote data or logic. Transient data that reflect the current state of some functioning system—such as electronics that monitor environmental conditions or reflect the state of
a security system—can be accessed through the ORDBMS and combined with persistent data in queries. E-commerce web sites interact with remote systems to process credit card transactions. The most efficient way to develop these systems is to incorporate code that understands and accesses such remote data structures and services directly into the data management framework. Third, new systems with extreme sizing requirements—either in terms of the volume of data, number of concurrently connected users, or volume of data throughput—can be constructed using distributed ORDBMSs. Single-site ORDBMSs are limited in their capacity to handle simultaneous connections and data volumes, which forces software engineers to take a divide-and-conquer approach. Distributed ORDBMS architectures minimize internode connections while maximizing concurrent users. Partitioning data across multiple nodes facilitates parallelism. Also, these schemes can provide the basis for high availability. Such systems share many characteristics with shared-nothing DBMS architectures. One can make a stronger claim. Distributed ORDBMSs constitute a powerful, innovative model for generalized distributed computing. ORDBMS technology is unique in that it provides a framework for managing both logic and data in an abstract data model. Traditionally, SQL queries (and in prerelational DBMSs, lower-level data access operations) were embedded in the procedural logic. Data returned by an embedded data access operation was typically subjected to further processing by the logic. And all of this was hand-coded, compiled, and deployed, so that each problem or feature of the information system had its corresponding, self-contained software module. In contrast, in an ORDBMS the logic that is integrated into the query-processing framework typically contains no data-access operations at all. Component routines can be thought of as subroutines extracted from traditional programs. Logic is bound to data using queries, dynamicprogramming expressions that are processed by the ORDBMS. What this yields is tremendous flexibility at acceptable levels of performance. Distributed ORDBMSs provide additional advantages.
• ORDBMSs include elaborate metadata management facilities. These provide a global catalogue or directory describing the contents of the entire system; its data and logic. Unlike other distributed directory services, this can be accessed by using reporting tools and graphical query-creation software, rather than by using static applications or procedural code.
• The logical data model and declarative query language mean that many high-level, conceptual operations can be expressed more or less directly in the system. Further, these expressions can be optimized at runtime to achieve goals such as computational load balancing. Other approaches rely on design time partitioning of logic based on assumptions that are often invalid at runtime.
• Distributed ORDBMSs can co-opt standard component development models to simplify software engineering. Engineers creating the system's atomic components need not know anything about the application context within which they are used. It becomes the job of the SQL-level developer to deploy them to solve a business problem.
• Component standards also make it possible to automatically move extension logic between nodes of the distributed system when necessary. This provides a mechanism for upgrading the logic in a distributed information system.
• All of the considerable virtues of the relational model apply in ORDBMS databases. Developers can use normal form theory to ensure that the databases they design are free from error and ambiguity. Three-value logic can be applied to the problem of missing information. The logical and physical abstraction inherent in the ORDBMS means developers are free to develop a variety of client applications, using a variety of programming environments, against a single schema.
However, despite the attractions of distributed computing, distributed ORDBMSs, multidatabases, and initiatives such as CORBA all have modest commercial track records. In some situations, developers have been successful building information systems based on these technologies. But more general acceptance has been slow in coming, primarily for nontechnical reasons (Gordon & Gordon 1993). It is usually considerable pressure from non-information-system (non-IS) employees and support from management that leads to their adoption. The exception to this rule has been transaction-processing monitors (or TP monitors). TP monitors are distinguished from other distributed frameworks in that they provide a single point of administration. Sharing distributed facilities requires programming against the TP monitor; with other distributed technologies, each node can evolve autonomously. In a nutshell, the administrative burden imposed by many distributed technologies is thought to outweigh their advantages. Copies of an application’s logic and some of its data must be kept local for a potentially very large community of distributed users to achieve reasonable response times. But coordinating change to this logic and data becomes exponentially more difficult as more sites are added. As a result, the human effort necessary to unravel conflicts becomes burdensome.
2.1.2
Description of the Example Application We use a consistent application example throughout this chapter to give substance to the ideas we introduce. Figure 2.2 presents the
Figure 2.2 Extended entity-relationship (ER) model for example application.

    Call_History:  Subscriber::Phone_Number, Transceivers::Transceivers_Id, Duration::PERIOD
    Transceivers:  Id::Transceivers_Id, Location::ST_POINT, Service_Range::ST_POLYGON
    Subscriber:    Id::Phone_Number, Name::Person_Name, LivesAt::ST_POINT, Security_Check::Audio_Key
entity-relationship (ER) diagram of our example’s conceptual data model. This ER model is unusual in some respects. Note that each entity attribute is specified by both a name (used to refer to the column in queries) and a data type or component name (which specifies the domain of objects the column contains). Such strongly typed schemata are characteristic of object-relational databases. It is useful to think of these domains as indivisible (or encapsulated) objects combining structure and behavior. This makes it a natural to use component approaches to software engineering to implement them. Once they are implemented and added to the ORDBMS framework, you can organize your application’s domains into a logical schema corresponding to the network of facts that describe the problem domain. In the schema in Figure 2.2, the types and domains in CAPS are standardized in SQL:99. Types with mixed upper- and lowercase are specific to our application. The schema is a highly simplified version of a database that supports the operational aspects of a cellular telephone network. Cellular networks are geographically distributed arrays of wireless transceiver stations connected by broadband wire networks. The transceivers provide voice and data connectivity for mobile-phone users, and roosts for resting birds. Our system also manages the account information about subscribers and their call history, which is used for billing. Finally, our system supports an innovative voiceactivated authentication function. Subscribers record a secret password when they sign up for their service. When they power up a handheld device, they speak this word—rather than punching in a secret number—to engage the device to the wireless network. The idea is to allow the subscriber to switch freely between different devices. The audio key provides both identification and authentication for the system security.
2.1.3
Chapter Overview In this chapter we describe how distributed ORDBMSs are constructed and how they can be used. We begin by reviewing the features, architecture, and implementation of a single-site ORDBMS. Such background is useful because the commercial versions of distributed ORDBMSs take such a system as a starting point and because detailed descriptions of ORDBMS internals are not widely available. Also, instances of these single-site ORDBMSs run on each node of the overall distributed system. In Section 2.3, we outline the architecture of a distributed ORDBMS, describing the modifications that are necessary for a single-site ORDBMS to operate in a distributed environment. We extend our example application by partitioning it across a set of computers and explain how the distributed system operates. Finally, we wrap up the chapter with a summary.
2.2
Single-Site Component DBMSs In this section we give an overview of how a single-site ORDBMS works. In most cases, vendors implement their distributed ORDBMS products by augmenting the functionality of a single-site system. This strategy is motivated more by commercial than technical considerations. The market for distributed ORDBMSs is a subset of the broader market for single-site DBMS software. And as we shall see, much of the code in a single-site system is useful in the distributed system.
2.2.1
ORDBMS Abstract Data Model As mentioned in the introduction, the most important difference between traditional DBMSs and ORDBMSs is that ORDBMSs manage both persistent data and logic implementing operations over to this data within an abstract data model. This permits developers to use declarative expressions to organize and manipulate the objects for which their information system is responsible. In other words, instead of creating programs wherein every user-level operation is hand-coded and compiled into an executable form, the ORDBMS encourages more flexible system architectures. With an ORDBMS, complex business questions and changes to the information system can be handled at runtime by building appropriate query expressions.
EXAMPLE 1
Figure 2.3 presents two SQL fragments illustrating this. Figure 2.3(a) creates a table to store facts about subscribers in our cellular network system, and the query in Figure 2.3(b) records the fact that a call is beginning.
Figure 2.3 Query data definition language (DDL) and data manipulation language (DML) examples for cellular Operational Support Systems (OSS) application.

a.

CREATE TABLE Subscribers (
    Id              Phone_Number   NOT NULL
                    PRIMARY KEY CONSTRAINT Subscriber_Primary_Key,
    Name            Person_Name    NOT NULL,
    LivesAt         ST_POINT       NOT NULL,
    Security_Check  Audio_Key      NOT NULL
);

b.

INSERT INTO Call_Record ( Customer, Transceivers, Duration )
SELECT S.Id, :pTrCall_Transceivers, Period( CURRENT, FOREVER )
FROM   Subscribers S
WHERE  S.Security_Check MATCHES :pAudCallers_Spoken_Key;
Readers acquainted with relational DBMSs will find these code fragments familiar. The ORDBMS data model is loosely based on the relational data model in Codd (1970). Data are organized into relations, which correspond to classes of facts describing the state of a problem domain. All of the relational model’s data integrity and consistency features, such as primary and foreign keys, can be enforced in an ORDBMS schema. The manipulation of data, their retrieval and modification, is handled through a declarative programming language. The most popular of these is SQL, which has recently appeared in its third major standard revision (ANSI 1999b). What distinguishes this revision of SQL from previous incarnations is the way that the basic language may be extended with new TYPES and FUNCTIONS. In fact, Figure 2.3 does not contain a single data type to be found in earlier versions of SQL; and it includes a predicate function, MATCHES, which, while syntactically familiar, is semantically alien. These unfamiliar types and expressions are references to modules of logic running within the ORDBMS. They represent objects of interest within the application domain: names, telephone numbers, geographic points, and audio keys. In the next section, we show how they are created.
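For instance, before the query in Figure 2.3(b) can be processed, a declaration along the following lines has to make the MATCHES predicate over Audio_Key values known to the DBMS. The routine name and library path are purely illustrative; the mechanisms behind such declarations are the subject of the next section.

CREATE FUNCTION Matches ( Audio_Key, Audio_Key )
RETURNS BOOLEAN
EXTERNAL NAME '/home/external_routines/Audio.c(audio_key_matches)'   -- assumed
LANGUAGE C;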
2.2.2
Component Standards and Integration Early ORDBMSs provided proprietary mechanisms to implement these extensions. Most relational DBMS products possessed a database procedural language that could be adapted to the new purpose. And for performance-sensitive tasks, research prototypes, such as Postgres (Stonebraker & Kemnitz 1991) and Starburst (Haas et al. 1990), and commercial systems, such as Illustra (Illustra Information Technologies 1995), INFORMIX (Informix 2000), and IBM’s DB2 (Chamberlin 1998), all implemented mechanisms allowing developers to integrate logic programmed using C. C extensibility relies on standard operating system dynamic linking to handle the mechanics of loading and unloading compiled libraries and invoking the functions compiled within them. In recent times we have seen a rush of component-development standards. Most of these are designed with frameworks other than an ORDBMS in mind. They describe how to implement logic so that it can be linked into other programs in a way that overcomes some of the limitations of the minimal operating system support. For example, Microsoft COM (Box 1998) specifies how to build C++ class libraries in such a way that they can be reused consistently across all Windows platforms. The standard explains how to interrogate a COM class to find out what interface methods the class implements and how they are called. Another popular component standards is the Sun Soft EJB standard. JavaBeans are components written in the Java programming language. The standard describes how to implement a Java class and how to use certain facilities in the Java language environment to get information about what the class consists of. In this section we provide several examples showing how an ORDBMS can be extended. In order to keep this presentation general we adhere as closely as possible to standard languages and syntax. Stored Procedure Languages. As we mention earlier, all commercial DBMSs support a proprietary procedural language. These were developed for database procedures, which are a primitive kind of database extensibility. Database procedure languages are attractive to DBMS vendors because their implementation is entirely under the vendor’s control. Such languages tend to be simple and robust: They support scoped variables, looping and branching, subprocedure calls, and exception handling. Initially designed as procedural extensions to support complex multistatement transactions, stored procedure languages provide an excellent mechanism for rapidly prototyping component logic. But stored procedure languages have their drawbacks. They do not execute as quickly as equivalent extensions created using compiled languages. This is because procedural languages are compiled into an intermediate form that is interpreted by a module in the ORDBMS. For
CPU-intensive tasks, interpretation is a slower execution strategy than running machine code generated by native compilers. Also, stored procedures were not designed to be invoked hundreds or thousands of times during query execution. Whereas properly implemented compiled language extensibility allows the engine to invoke an extension through a local procedure call within the engine address space (roughly 100 machine code instructions or 1.0 × 10–8 seconds) invoking stored procedure language interpreters is much more expensive (perhaps 10,000 instructions or 1.0 × 10–5 seconds). This makes stored procedures a poor choice for lightweight extensions that are invoked many times during query processing.
EXAMPLE 2
Figure 2.4 illustrates how a data type such as Person_Name from Figure 2.3 might be defined. In it we illustrate how the SQL:99 language standard's CREATE TYPE DDL statement is used to define a data structure and an interface method. Note that the method in this example is called "Equal" and it returns a Boolean result. The ORDBMS can infer from this that the method corresponds to the SQL language operator symbol =. The biggest drawback to implementing components with stored procedure languages is that you can only run them in a single kind of ORDBMS. Being able to deploy your components elsewhere—in a middleware server, in another DBMS, or in a client user-interface program—is highly desirable. Compiled Languages (C and C++). Another approach to writing extensions is to use general-purpose programming languages such as C or C++. These languages compile down into object code or assembler,
Figure 2.4 Example of component implemented using a stored procedure language.

CREATE TYPE Person_Name AS (
    ...
    ( Other_Name.Middle_Name = this.Middle_Name ));
END METHOD;
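A fuller version of such a definition might read as follows. This is only a sketch: the attribute names are taken from the fragment above and from the Java implementation of Person_Name later in this chapter, the attribute lengths are assumptions, and the method syntax follows the fragment's own dialect rather than strict SQL:99.

CREATE TYPE Person_Name AS (
    First_Name    VARCHAR(48),
    Middle_Name   VARCHAR(48),
    Surname       VARCHAR(48)
)
METHOD Equal ( Other_Name Person_Name ) RETURNS BOOLEAN;

CREATE METHOD Equal ( Other_Name Person_Name ) RETURNS BOOLEAN
FOR Person_Name
BEGIN
    RETURN (( Other_Name.Surname     = this.Surname     ) AND
            ( Other_Name.First_Name  = this.First_Name  ) AND
            ( Other_Name.Middle_Name = this.Middle_Name ));
END METHOD;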
low-level instructions executed in computer hardware. Compiled languages provide the fastest possible execution, and developers using languages such as C or C++ can make use of the functions included with standard language libraries. The SQL:99 language standard refers to this kind of extension as an EXTERNAL FUNCTION.
EXAMPLE 3
Figure 2.5 presents code samples illustrating how implementing such an extension might look. However, compiled languages have drawbacks of their own. Implementing C or C++ extensions that can be moved easily between ORDBMS instances or used in non-DBMS software frameworks is difficult. Compiled libraries are portable only if you assume a homogenous hardware and operating system environment. For example, we have already mentioned that it is possible to develop components on Microsoft Windows platforms using the COM standard. Other developers
Figure 2.5 Example of component implemented using a C compiled language.

#include <math.h>

BOOLEAN geo_within
(
    geo_point   * pGeoPnt,
    geo_circle  * pGeoCirc
    /* ORDBMS specific context information */
)
{
    DOUBLE  dx, dy;

    dx = pGeoPnt->X - pGeoCirc->Center.X;
    dy = pGeoPnt->Y - pGeoCirc->Center.Y;

    /* TRUE if the point lies within the circle's radius */
    if (sqrt((dx*dx)+(dy*dy)) <= pGeoCirc->radius)
        return TRUE;
    return FALSE;
}

CREATE FUNCTION Within ( Geo_Point, Geo_Circle )
RETURNS boolean
EXTERNAL NAME '/home/external_routines/Geo.c(geo_within)'
LANGUAGE C;
using the same hardware and operating system can reuse the library, but developers using a flavor of UNIX or non-Intel hardware cannot. Second, illegal operations in compiled programs are difficult for the ORDBMS to recover from gracefully. Such bugs can halt the running server, which is a highly undesirable behavior in systems required to provide high availability. This problem is made worse by the fact that compiled languages such as C and C++ are notoriously hard to work with. The pragmatic solutions to this problem are to isolate the binary object in a separate address space or to test your code exhaustively. The first of these dilutes the performance advantages of using compiled code in the first place, and the second is extremely laborious. Another alternative is to compile safe languages, such as FORTRAN, Perl or LISP, into executable binaries. Java. Semicompiled languages such as Java have all of the advantages of general-purpose programming languages without some of the drawbacks that accrue from the immobility of compiled code. Java is compiled into a terse, portable format. Java’s runtime environment, which is implemented on almost all varieties of hardware and operating systems, is responsible for actually executing this byte-code. More importantly, Java’s runtime environment has been designed to work in a broad variety of computer system contexts. In fact, the entire Java execution engine or virtual machine can be linked into the address space of a correctly implemented ORDBMS just like any other compiled library. Then the virtual machine running inside the ORDBMS can load byte-code files and execute them.
EXAMPLE 4
Figure 2.6
Figure 2.6 illustrates how a Java class is implemented and integrated into the ORDBMS. In this example we follow the SQL-J Part 2 standard, which has recently been combined with the Java DataBase Connectivity (JDBC) standard.

Example of component implemented using Java.

//
// File:        PersonName.java
//
// About:
//
// Compile:     javac [ -g ] PersonName.java
//
// Create Jar:  jar -cf PersonName.jar PersonName.class
//
import java.sql.*;

public class PersonName implements SQLData {

    private String  Surname;
    private String  FirstName;
    private String  TypeName;

    public String getSQLTypeName() { return TypeName; }

    public void readSQL ...

create function personname ( lvarchar, lvarchar )
returns personname
external name 'personname_jar:PersonName.PersonName(java.lang.String, java.lang.String )'
language java;
Java’s greatest virtue is its runtime mobility. Logic implemented in Java can be moved from site to site within a distributed system built using heterogeneous hardware and operating systems. Achieving this in a consistent way is the goal of the JavaBeans standard. The JavaBeans standard is a collection of APIs describing the means by which a JavaBeans instance interacts with its environment and the means by which the environment can find out about the properties of a JavaBeans component. Single-site ORDBMSs can support all of these component extensibility mechanisms simultaneously. End users submitting a query to the system have no idea how the extension logic they invoke is actually implemented. This abstraction turns out to be doubly useful in the distributed-systems case. As we see later, it means that in a distributed ORDBMS the query-processing module can pick the site location on which the logic is executed based on some optimization goal. Key to this is the system’s ability to move both data and logic between nodes.
2.2.3
Query-Processing Overview

All ORDBMS data access is done through queries. There is no low-level record-at-a-time or navigational interface to the objects the ORDBMS stores. Queries can come from external programs or from logic running inside the ORDBMS that implements an application-level object. (The implementation of a JavaBeans class that corresponds to an application-level object such as Work_Order can run entirely within the ORDBMS. This same class logic can be moved to a client or middleware program. Java code implementing this JavaBean might include SQL queries over a database schema.) So far as the query-processing aspects of the ORDBMS are concerned, there is no difference between queries with external and internal sources. When it receives a SQL expression, the ORDBMS decomposes it into an ordered sequence of low-level operations, called a query plan. Then it executes this plan, returning its results to the software module that submitted the query in the first place.
EXAMPLE 5
Figure 2.7
Consider the relatively simple query: Show me the ids and size of coverage areas of all transceivers within 13 miles of the geographic point located at longitude -127.513, latitude 35.362, ordered by the size of the coverage area. This is illustrated in Figure 2.7 and might be submitted by an external report writer program. Alternatively, it might be part of some logic implementing an application-level object’s constructor. For example, our information system might include an object that corresponds to a service request. Such an object is probably transient. It only exists until the work order is complete. This object’s state—which might include the list of transceivers returned by this query—is derived from a series of SQL queries that run against the persistent database. The ORDBMS framework can accommodate this class of object, which is traditionally associated with midtier application servers. And like other kinds of components, application objects can be made mobile within the distributed ORDBMS. This query will probably be decomposed in the ORDBMS engine into a series of simple operations, shown in Figure 2.8. An approximate translation of this plan is as follows. First, examine the Transceivers table and extract those rows that are located in the circle supplied to the query as an argument. Then project the matching rows down to their Id attribute, adding a second, computed attribute by executing some logic over the ST_POLYGON data in the Service_Range attribute. Finally, order the resulting rows using the values in that second, computed attribute.

Simple ORDBMS query.

SELECT    T.Id, Area ( T.Service_Range ) AS Area
FROM      Transceivers T
WHERE     Within ( T.Location, Circle ( -127.513, 35.362, '13 Miles' ) )
ORDER BY  2;
Figure 2.8
Plan illustrating decomposition of query in Figure 2.7 into discrete operations.

1. RESTRICT [ Transceivers, Within ( Location, Circle ( -127.513, 35.362, '13 Miles' ) ) ]
2. PROJECT  [ 1, < Id, Area ( Service_Range ) > ]
3. SORT     [ 2, 2 ]
The automatic decomposition of declarative queries into this kind of plan is the key to understanding how relational and object-relational engines work. In Table 2.1, we present an incomplete list of the low-level data management operations implemented in DBMS engines.
Table 2.1
Several data management operations implemented in RDBMSs and ORDBMSs.
They all take as input at least one set of row data, and all of them return a set of row data as a result. The results of one operation can become the input to another. An important goal for vendors and academic researchers working on DBMSs is to create efficient algorithms to implement these operations. What distinguishes relational DBMS (RDBMS) from ORDBMS engines is the way that these low-level operations are generalized in the ORDBMSs. For example, the query in Example 5 and Figure 2.7 requires that the ORDBMS sort its results according to coverage area (step 3 in Figure 2.8). Strictly speaking, measurements of area combine quantities (number values) and units (square miles or square kilometers or square degrees). Such objects cannot be represented directly with a single built-in SQL-92 data type. Information systems that use SQL-92-style RDBMSs rely on application logic in an external program to convert each user’s data entry into a consistent form for storage. The problem is that each new application using this data must re-implement the conversion logic, and bug fixes, upgrades, and evolutionary changes to the system’s functionality are thereby made more difficult. A better solution is to create a component to encapsulate area data and the operations over it (for example, a JavaBean or COM object) and to embed this in the ORDBMS. To clarify what we mean by generalized, let us look more closely at the SORT operation. Algorithms used to sort data are almost all looping and branching constructs around two operations: swap and compare (Knuth 1998). The ORDBMS knows the size of each object, so swapping them is straightforward. To sort a set of component instances, the ORDBMS requires that the component’s developer supply the compare logic to tell it if one instance of the object is less than, equal to, or greater than another. Based on what this logic reports, the ORDBMS’s sort algorithms can take the appropriate action (swapping or not swapping them).
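Such compare logic is usually a very small routine. As a minimal C sketch (the type layout, the names, and the assumption that values arrive normalized to a common unit are ours, not any product’s interface), a compare support function for an area component might look like this; the ORDBMS’s generalized sort calls it whenever it must decide whether to swap two instances.

    typedef struct {
        double quantity;    /* e.g., 13.0                                 */
        char   units[16];   /* e.g., "SqMiles"; assumed already canonical */
    } Area;

    /* Return -1, 0, or 1, in the style of strcmp(). */
    int area_compare(const Area *a, const Area *b)
    {
        if (a->quantity < b->quantity) return -1;
        if (a->quantity > b->quantity) return  1;
        return 0;
    }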
All of the data management operations supported by the ORDBMS—access methods such as B-trees, client-server communication, and page management—can be generalized in this way. Metadata is the key to supporting this kind of functionality. Traditional RDBMSs stored information about database schema objects such as tables, columns, and indices in a special set of tables called system catalogues. This information was used by SQL-92 RDBMSs to identify what data (table and column) a query was addressing. An ORDBMS’s system catalogues include additional tables storing information about whatever components it manages: the size of their persistent form in bytes (or whether they are of variable length), the logic that can be applied to them, and information about how this logic is best handled in a query. Parsing in an ORDBMS is therefore considerably more sophisticated than in an RDBMS. Data structures used to represent operations in query plans include substructures containing references to the component’s C, C++, or Java code to be invoked in the generalized RESTRICT, JOIN, or SORT operations. As it parses a query, the ORDBMS populates these structures with appropriate values. System catalogues document all of a database’s contents. They can be used to reconstruct entire schema definitions and to answer questions such as
• What interface methods can be applied to ST_POINT object instances?
• What other entities in the schema can be combined with Transceivers (what other tables have columns of component types also present in the Transceivers table)?
This reliance on query-able system catalogues is another way in which an ORDBMS can be distinguished from other component frameworks. And it provides a valuable resource in the distributed ORDBMS. Modules of programming logic can be updated or upgraded, and new modules added to a running system without bringing the whole system down.
2.2.4
ORDBMS Query Optimization

ORDBMS query processing is based on classic RDBMS techniques. Query expressions are parsed into a set of data structures, each of which corresponds to a data management operation of the kind in Table 2.1. An important property of these operations is that they may be re-arranged, subject to certain rules, without affecting their result. Although each arrangement is equivalent with respect to the result, they are not necessarily equal with respect to the computational resources they consume. This means that the ORDBMS is free to examine every alternative arrangement and to pick the one it estimates is best according to some optimization goal—usually whichever plan
consumes the least resources or returns a result the most quickly. In this section we explore how this is done in more detail. It is useful to introduce some semiformal notation when reasoning about query plans. One of the better ways to visualize physical query-execution plans is using trees. The lowest-level nodes of the tree represent the ultimate data sources (tables). Within each node of the tree, input data is subjected to some processing to produce some output. Note that these nodes correspond to the operations introduced in Table 2.1. Nodes in the higher levels of the tree represent operations that consume data produced by lower nodes. Edges connecting nodes in the tree represent data flowing from the ultimate data sources upward to the single result node.
EXAMPLE 6
Figure 2.9
For example, consider the query: Show me the name and all calls longer than an hour for subscribers living in California. This is illustrated in Figure 2.9. Figure 2.10 presents three alternative physical plans to compute the results of the query. Plans A and B swap the inner and outer data source as they feed into the join. Depending on the circumstances—the number of rows passing through the filters from S and C—either plan might be preferable. If the number of rows passed through the “Contains (California, S.LivesAt )” filter is large, and the number of rows passed through the “Length ( C.Duration ) > 1 HOUR” filter is small, then plan B will probably outperform plan A because the ORDBMS can cache the smaller amount of data in memory as it reads in all the rows stored on disk from the larger.

Query.

SELECT  S.Name, C.Duration
FROM    Subscribers S, Call_History C
WHERE   C.Subscriber = S.Id
AND     Length( C.Duration ) > 1 HOUR
AND     Contains ('California'::ST_POLYGON, S.LivesAt);
To pick the best plan among the alternatives, RDBMSs calculate a cost for each of them according to a cost function. For single-site RDBMSs, the most common approach is to calculate the total computational resources consumed by the query as follows.

Total_Cost = CPU_Cost + I/O_Cost
I/O_Cost   = Data_Page_Disk_I/Os * I/O_Weighting
CPU_Cost   = Tuples_Processed * CPU_Weighting
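As a purely illustrative worked example (the weightings and counts here are assumptions, not measurements from any system): with I/O_Weighting = 1.0 and CPU_Weighting = 0.01, a plan that reads 10,000 data pages and processes 500,000 tuples costs 10,000 * 1.0 + 500,000 * 0.01 = 15,000 units, while a plan that reads 2,000 pages but processes 2,000,000 tuples costs 2,000 + 20,000 = 22,000 units, so the first plan would be chosen.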
Figure 2.10  Alternative query plans for query in Figure 2.9.

Plan A
    S.Name, C.Duration
    C.Subscriber = S.Id
    Length (C.Duration) > 1 hour
    Contains (California, S.LivesAt)
    C
    S

Plan B
    S.Name, C.Duration
    S.Id = C.Subscriber
    Contains (California, S.LivesAt)
    Length (C.Duration) > 1 hour
    S
    C

Plan C
    S.Name, C.Duration
    Length (C.Duration) > 1 hour
    S.Id = C.Subscriber
    Contains (California, S.LivesAt)
    C.Subscriber, C.Duration
    S.Name, S.LivesAt
    C
    S
For each plan, the total cost can be computed by aggregating the costs for each suboperation. This cost is at best an accurate estimate of the real cost of the query. Factors such as the amount of memory the
DBMS can use and the weightings assigned to I/O and CPU costs affect the outcome of the query-planning phase. ORDBMS optimizers modify and extend these techniques in several ways. These modifications amount to a generalization of RDBMS techniques. The basic problem is that RDBMS optimizers make a set of assumptions that are highly unrealistic in ORDBMS databases. For instance, in plan C (Figure 2.10), we defer executing the Length() predicate to the very end. In this particular case, such a plan makes little sense. Length(PERIOD) is a relatively inexpensive operation (i.e., it consumes few CPU resources) and in this query it is likely to dramatically reduce the number of records read from the Call_History table. RDBMS optimizers assume that the CPU and I/O cost of processing each tuple is equivalent. In Example 6, an unmodified RDBMS optimizer would assume that Length(PERIOD) and Contains (ST_POINT, ST_POLYGON) use an equal amount of computer resources. But in an extensible DBMS, the user-defined logic of the kind implementing Length() might consume considerable CPU and memory resources. In addition, the data object to which the logic is applied might be very large, which means it will incur additional I/O costs. Clearly, ORDBMS optimizers must handle extension logic with care. Suppose the Duration attribute did not store 16-byte Period objects, but was in fact a digital recording of the entire call. Length() might compute the time during which there was an active conversation going on in a call by examining the recording and counting up the seconds of nonwhite noise. Intuitively, an ORDBMS performing this query should try to minimize the number of times the Length() logic is invoked because this is computationally expensive and each time it is invoked a large data object must be read from disk. This insight has led to techniques referred to as expensive predicate migration (Hellerstein & Stonebraker 1993). An ORDBMS optimizer needs to be informed about the relative cost of each extension. Knowing that Length() is expensive will weigh against plans A and B because they invoke the expensive Length() logic for every row in the Call_History table. Under these circumstances, plan C might make the most sense. This leads to a modification of the previous formulas. If each query includes n expressions, then calculating the total cost for a query requires something similar to the following.

I/O_Cost = I/O_Weight * ( Data_Page_Disk_I/Os + ∑ i=1..n ( Expression_I/O_Cost_i * Number_of_Invocations_i ) )
CPU_Cost = CPU_Weight * ∑ i=1..n ( Expression_CPU_Cost_i * Number_of_Invocations_i )

That is, rather than simply assuming that each expression has an invariant CPU cost and no additional disk I/O cost, an ORDBMS optimizer must cater to the diverse possibilities implied by running user-defined extension logic in the engine.
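The extended formula amounts to simple bookkeeping. The following C sketch (structure layout, parameter names, and weights are illustrative assumptions, not a vendor interface) makes that explicit:

    typedef struct {
        double cpu_cost_per_call;   /* registered by the component developer     */
        double io_cost_per_call;    /* e.g., large-object pages read per call    */
        double invocations;         /* estimated from row counts and selectivity */
    } ExpressionCost;

    double ordbms_plan_cost(double data_page_disk_ios,
                            const ExpressionCost expr[], int n,
                            double io_weight, double cpu_weight)
    {
        double io  = data_page_disk_ios;
        double cpu = 0.0;
        for (int i = 0; i < n; i++) {
            io  += expr[i].io_cost_per_call  * expr[i].invocations;
            cpu += expr[i].cpu_cost_per_call * expr[i].invocations;
        }
        return io_weight * io + cpu_weight * cpu;
    }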
In certain ORDBMS applications, I/O and CPU within extensions dwarf the costs of traditional tuple-oriented query processing. Second, the ORDBMS cannot determine the cost of alternative query plans accurately without help. Consider the join operation that appears in each plan in Figure 2.10. As we have mentioned earlier, one efficient way to compute the answer to this query is to read all of the rows from the smaller data set into memory and scan these cached rows for matches as each row is read once from the larger one. (Note that in ORDBMSs, where join predicates are frequently expressions other than the ordinal operators <, <=, =, and so on, this nested-loop join algorithm is quite popular.) But which data set is the smaller one? As another example, consider the problem of deciding whether or not to use an index. If a predicate does not identify only a few rows in a table, then the ORDBMS will read the majority of the index data and the majority of the table data from disk. It might be better simply to scan all of the table data in this case. RDBMSs store information about data sources (such as row counts and row sizes) in system catalogues. Sophisticated RDBMSs also store a statistical characterization of the data in each table’s columns, usually a range of values (Selinger et al. 1979) or a frequency histogram. An RDBMS optimizer can use this information to estimate what proportion of rows a query predicate will filter out. We call this selectivity estimation. Enormous efforts have been made to improve the performance and accuracy of these techniques for SQL’s built-in data types. But in an ORDBMS, the optimizer needs a mechanism to estimate selectivity for the user-defined logic in predicates, so the ORDBMS allows developers implementing the extensions to integrate logic that creates a statistical characterization of the data in a column and code that lets the optimizer estimate an expression’s selectivity based on this characterization. Developers can register support functions, such as per-call cost estimates and selectivity estimators, when they integrate their components into the ORDBMS. In terms more familiar to object-oriented software engineering, support functions are interface templates that generally remain private for the use of the ORDBMS. Other examples of support functions include mechanisms to copy data to a neutral format for backup and to send and receive data value instances across a network.
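To make the idea of registered support functions concrete, here is a minimal C sketch of a selectivity estimator and a per-call cost function for the Within() extension; the statistics structure, the function names, and the crude area-ratio model are all our assumptions rather than a particular product’s support-function API.

    typedef struct {
        double min_x, min_y, max_x, max_y;  /* bounding box computed from the column's data */
        double row_count;
    } ColumnStats;

    /* Estimated fraction of rows satisfying Within(point_column, circle). */
    double within_selectivity(const ColumnStats *stats,
                              double center_x, double center_y, double radius)
    {
        double width  = stats->max_x - stats->min_x;
        double height = stats->max_y - stats->min_y;
        if (width <= 0.0 || height <= 0.0)
            return 1.0;                 /* no useful statistics: assume everything matches */
        /* a finer model would intersect the circle at (center_x, center_y) with the
           bounding box; this crude sketch uses only the ratio of the two areas */
        double circle_area = 3.14159265358979 * radius * radius;
        double fraction    = circle_area / (width * height);
        return (fraction < 1.0) ? fraction : 1.0;
    }

    /* Relative CPU cost of one invocation, in the optimizer's cost units. */
    double within_per_call_cost(void)
    {
        return 5.0;
    }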
2.2.5
Internal Architecture of ORDBMS

Once it has decided on a query plan, the ORDBMS’s final task is to execute each of the plan’s operations in the appropriate order and return the results to the user who submitted the query. Sometimes the interval of time between when a query is submitted to the ORDBMS and
when it is executed can be considerable. Queries can be prepared long before the query’s results are requested. This fact encourages us to divide the ORDBMS into two functional halves. The first, top half of the ORDBMS engine is responsible for the query-processing tasks described in the previous few sections: parsing the query string into an internal form, optimizing this internal form into a plan, and scheduling the execution of the optimized plan. The second, lower half of the ORDBMS is responsible for each of the low-level data management operations such as RESTRICT, JOIN, and SORT. As it executes each of the operations in a query plan, the ORDBMS’s top half calls upon facilities implemented in the lower half, such as buffer management (I/O and caching); transaction support (locking, logging, etc.); and scheduling, memory management, and other low-level operations. Looking at the ORDBMS in this way highlights the extent of the changes needed to transform an RDBMS into an ORDBMS. All of the functional components labeled in Figure 2.11 must be generalized in the same way that the ORDBMS’s sorting algorithms are generalized. Over and above these modifications, managing certain kinds of components efficiently makes it necessary to add to the list of data management algorithms and techniques in the engine. For example, indexing spatial data requires something like an R-tree (Guttman 1984), and recently several researchers have developed algorithms to compute joins over spatial data that are more efficient than the simple nested loop (Patel & DeWitt 1996). Further, joins involving expensive predicates make the case for techniques such as join indices (Valduriez 1987) more compelling than it was for SQL-92 DBMSs.
Figure 2.11
Internal architecture of ORDBMS. The top half of the engine comprises parsing, optimization, and plan execution; the lower half comprises buffer management, low-level operations, and locking and logging.
2.2.6
Function Manager

An important objective of ORDBMS engineering is to provide equivalent support for all component extensibility mechanisms. In other words, we wish to create a system where end users are unaware of how the extensions specified in their queries are actually implemented. For example, the query in Example 1 and Figure 2.3 includes SQL expressions that invoke the Period constructor and logic that compares audio keys. These might be implemented using any of the mechanisms introduced in Section 2.2.2: stored procedure language, C, Java, or a COM library. Such agnosticism is highly desirable from a commercial point of view. It gives customers and developers freedom of choice. In certain homogeneous hardware environments, COM or C might be preferred, but when the code must be mobile—for example, when it might make sense to push the audio-key comparison down into the handheld device—languages such as Java have their advantages. Achieving this agnosticism requires that the ORDBMS include an internal facility for managing the dispatch (invocation) of extension logic and an associated mechanism to add different language environments to the ORDBMS. Such facilities need to be quite sophisticated. For example, programming logic routinely requires blocks of memory for its private use, but letting an integrated component acquire memory directly from the operating system creates problems for the ORDBMS. When an exception causes a transaction to roll back, the ORDBMS should clean up any memory allocated for the temporary use of any logic invoked in the query. But this is impossible if the ORDBMS has no record of what was allocated in the first place. Inevitably, this leads to memory leaks and system instability. Other resources traditionally provided to programs by the operating system—file management, interprocess communication, scheduling, and exception handling—face similar problems. Operating system techniques for memory management, scheduling, and exception handling can be adapted for use in the ORDBMS, and the functionality of common system calls can be implemented by the ORDBMS itself. Dynamic linking resolves any references made to these services from within compiled libraries. Such low-level plumbing can in turn be used to provide support for environments such as Java’s virtual machine because these language environments are designed to be fairly neutral with respect to the operating system context in which they are deployed. An ORDBMS language manager needs to map any operating system calls that such environments make to their ORDBMS equivalents. Thus, whenever the on-board Java virtual machine asks for memory, it receives it from the ORDBMS (rather than from the operating system). In addition, the ORDBMS needs to know what low-level facilities in the new environment it must invoke to perform tasks such as data value conversion, logic invocation, and locating return results.
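A minimal C sketch of the memory-tracking idea follows (all names and the list layout are our own assumptions): extension logic gets its memory from a per-transaction list owned by the ORDBMS, so everything handed out can be reclaimed when the transaction commits or rolls back, whether or not the extension freed it.

    #include <stdlib.h>

    typedef struct Allocation {
        struct Allocation *next;
        /* caller's payload follows this header */
    } Allocation;

    typedef struct {
        Allocation *head;   /* every block handed to extension logic in this transaction */
    } XactMemory;

    void *ordbms_alloc(XactMemory *xm, size_t n)
    {
        Allocation *a = malloc(sizeof(Allocation) + n);
        if (a == NULL)
            return NULL;
        a->next  = xm->head;
        xm->head = a;
        return (void *)(a + 1);     /* hand back the payload area */
    }

    /* Called by the ORDBMS at commit or rollback so that extension code
       cannot leak memory past the end of the transaction. */
    void ordbms_release_all(XactMemory *xm)
    {
        while (xm->head != NULL) {
            Allocation *next = xm->head->next;
            free(xm->head);
            xm->head = next;
        }
    }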
Figure 2.12
Illustration of function manager and language manager. The figure shows the function manager and its language manager interface, the binary objects Funcs.so and Java.lib linked into the server, and the Funcs.jar archive loaded into the Java environment, together with the following catalogue entries and the descriptor built for an invocation of Foo:

    Name    Return_Type    Params           Language    Runnable
    Foo     int            int, int         C           Funcs.so
    Bar     varchar        varchar, int     Java        Funcs.jar

    { name = "Foo", arg_cnt = 2, ret_type = "int", language = "C",
      runnable = "Funcs.so", args[] = { 0x104, 0x110 } }
We illustrate these ideas in Figure 2.12. Dark gray represents binary objects. These can be linked to the ORDBMS’s function manager using the operating system dynamic-linking mechanisms. If the binary object itself implements a language environment—as the Java.lib file does in this figure—then appropriate entry points within it must be identified for the language manager interface. The lighter gray object is a Java archive or jar file. In the diagram, we show it being loaded into the Java language environment using standard Java techniques. When the function manager seeks to invoke some logic implemented in Java, it does so through the language manager. When the Java environment requires resources, these are made available from the ORDBMS. The ability to link compiled binary files into the ORDBMS, to allow the code in those files to access the kinds of resources they need, and to invoke the logic within these files is the most fundamental aspect of ORDBMS engineering. Generalized algorithms for performing RDBMS operations can use logic written to address a specific problem domain. Once achieved, this becomes the foundation on which the other features of the engine can be built.
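To make the dispatch path concrete, the following C sketch (structure layouts and signatures are our own, loosely modeled on the catalogue entries shown in Figure 2.12, and not any vendor’s interface) shows how a function manager might route an invocation either directly into a dynamically linked library or through a language manager such as an embedded Java virtual machine.

    #include <stddef.h>

    typedef enum { LANG_C, LANG_JAVA } Language;

    typedef struct {
        const char *name;          /* e.g., "Foo"                          */
        int         arg_cnt;
        Language    language;
        const char *runnable;      /* e.g., "Funcs.so" or "Funcs.jar"      */
        void       *entry_point;   /* resolved by dynamic linking (C only) */
    } FunctionDesc;

    typedef struct {
        /* entry points the ORDBMS resolved inside the language runtime (Java.lib) */
        int   (*invoke)(const FunctionDesc *f, void **args, void *result);
        void *(*alloc)(size_t n);  /* the runtime's memory requests come back to the ORDBMS */
    } LanguageManager;

    int fmgr_invoke(const FunctionDesc *f, const LanguageManager *java_lm,
                    void **args, void *result)
    {
        if (f->language == LANG_C) {
            int (*fn)(void **, void *) = (int (*)(void **, void *))f->entry_point;
            return fn(args, result);              /* direct call into the linked library */
        }
        return java_lm->invoke(f, args, result);  /* delegate to the language manager */
    }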
2.3
Distributed Component DBMSs

In this section we describe how to modify a single-site ORDBMS to make it work across a network of computers. With distributed
ORDBMSs we face many of the same challenges as with distributed RDBMSs: How do we ensure transactional guarantees for distributed operations? How do we present users with a global schema name-space while providing for local autonomy? And how do we optimize queries that require data movement over the network? In addition, a distributed ORDBMS must address a set of challenges that are the consequence of introducing programmatic, user-defined extensions into the data model. Depending on how it is implemented, extension logic may be mobile (which means it can be copied across the network to wherever it makes the most sense to invoke it) or fixed (which means it can be run only on a particular node of the distributed system). Distributed RDBMS products can assume that every node in the system has identical functionality or, in the case of a federation of different RDBMS products, very similar (i.e., mappable) functionality. But with a distributed component DBMS, each node might contain disjoint sets of component extensions. Fortunately, many of the mechanisms developed for distributed data management in RDBMSs can also be applied in distributed ORDBMSs. For example, the theoretical models and engineering techniques used to implement distributed transactions can be reused. Therefore, we focus our attention here on topics related to query processing rather than lower-level data management. Also, we assume that within our distributed ORDBMS the same server software runs at each site; that is, we ignore the multidatabase problem. It is certainly feasible to use gateways and a variety of user-defined access methods (see later) to build a federated or multidatabase system using an ORDBMS. But such systems can be thought of as constrained cases of the more ideal designs we describe in this section. In addition, many multidatabase vendors require that developers run an instance of the vendor’s software as a co-server adjacent to the system being federated (Cohera 1999), converting the overall system into a homogeneous distributed ORDBMS. To investigate these ideas, we modify our cellular network example slightly by running it over three nodes arranged in a fashion similar to that shown in Figure 2.1. We use the site names Billing, Foundation, and History when referring to these nodes. Each entity in Figure 2.2 is stored on a separate node: Transceivers (T) is stored on Foundation, Call_History (C) is stored on History, and Subscribers (S) is stored on Billing. Certain operations are limited to single nodes in the system. For example, adding new subscribers inserts a row in the Subscribers table on Billing. Other operations, such as the query in Example 1 and Figure 2.3, mix data and operations from different nodes. We illustrate this configuration in Figure 2.13.
Figure 2.13
Distributed query processing flows. The figure shows query and result data flowing between the coordinating node for the query and the Billing, Foundation, and History nodes.
2.3.1
Overview of Query Processing in Distributed ORDBMS

Distributed ORDBMSs adopt the same query-processing strategy as distributed RDBMSs (Özsu & Valduriez 1999). Each node in the system connects to the other nodes with which it interacts. These connections are used to exchange messages and data between nodes. The client-server infrastructure developed to transfer SQL queries and data between the DBMS and external programs can be reused for this purpose. Creating these connections can take several seconds, and each connection requires a significant amount of main memory for data buffering and state management. To limit the resources they are consuming, ORDBMS nodes typically maintain a pool of several open connections to other sites that are shared among locally connected users. For example, we might have several hundred users connected to the Billing node multiplexed over 10 or 20 connections to the History and Foundation nodes. Users submit their distributed queries to a local node, which becomes the coordinating node for all the subsequent activity. The coordinating node parses the query and prepares an execution plan that may involve several subordinate (remote) nodes. At parsing time, the local node resolves names in the global schema name-space into references to shared data or logic.
Producing the physical plan requires that the coordinating node access metadata describing things such as the amount of data in tables, the selectivity and computational cost of user-defined logic, and whether a module of logic can be moved between sites. Based on this information, the ORDBMS can evaluate a list of possible query plans and pick the one that consumes the least resources. When the query is executed, the nodes involved in the query plan each perform their localized tasks, passing their results back to the coordinating node. As all of the distributed query’s subtasks are completed, the data they produce are combined on the local node, and the query’s results are passed back to the user. This description should not surprise readers familiar with distributed RDBMS technology. However, because a distributed ORDBMS is a framework for both logic and data, it can be deployed in configurations that are quite different from what was typical for distributed RDBMSs. Most distributed RDBMS applications typically involve a few—almost always fewer than 10—large DBMS instances. The application logic applied to this data is embedded in client-side programs or midtier application servers. Because the ORDBMS can host embedded logic, it is possible to replace many of these midtier programs with ORDBMS instances. These probably store very little data, although they can cache intermediate results and the state of transient objects. We illustrate this kind of distributed architecture in Figure 2.14. With this approach the overall system can scale extremely well in terms of the number of simultaneously connected users and the total amount of data that a user’s query may address. Application servers and TP monitors use a similar strategy to achieve the same goal. In fact, there is little difference between the functionality of a single-site node in an ORDBMS that stores no data but embeds component logic and a server in one of the more conventional middleware systems. Both are scalable data-processing frameworks with external APIs, security services, and support for distributed transactions. They provide connectivity management to a variety of distributed data management services. And both permit the integration of software modules that do work involving multiple sites. Further, the techniques used in these systems (e.g., queues and transactional messaging; Gray & Reuter 1992) can be incorporated in the ORDBMS and used to sustain performance and reliability behind the data model and query language abstraction. The chief difference is that using distributed ORDBMS in this way provides a declarative query language interface in the midtier in addition to these other facilities. Unlike conventional application servers or client programs, these midtier ORDBMS nodes can take advantage of SQL query-processing techniques to improve developer productivity and overall performance. It is impossible, for example, to use a graphical report writer with a TP monitor or application server middleware. However, it is perfectly possible with a distributed ORDBMS. Developers building applications that are amenable to being handled in
Figure 2.14 Architectural alternative for distributed ORDBMS.
parallel—e-commerce web sites selling goods and services from a catalogue or making markets—could make most immediate use of such architectures. Each web-server machine could store a replicated copy of shared data and could record offers, bids, and sales independently. A single point of failure is thereby avoided. Global queries can be handled using distributed query-processing techniques.
2.3.2
Distributed Extensions to SQL

In distributed RDBMSs, the SQL language is modified by adding site names to schema object names. Schema objects in a distributed system—tables, views, and expressions referencing component logic—are fully named by a combination of their site-local name and the name of the site from which they originate. If an object name is not fully qualified, then the local site name is substituted by default. At first glance this would seem to be contrary to the objective of location transparency because developers will need to know the site on which an object resides when they write queries. But this strategy is a necessary compromise between location transparency and the need for local site autonomy. Creating a global name-space might imply unacceptable limitations on local site policies. In distributed RDBMS products, this technique was applied to naming tables and views. In distributed ORDBMSs it is also applied to
component logic. This allows an extension integrated into one site to be referenced from all other connected sites.
EXAMPLE 7
The queries we introduced in Example 1 (Figure 2.3b) and Example 5 (Figure 2.7) are represented in the distributed system as shown in Figure 2.15. If location transparency is desired, it can be achieved with synonym or alias techniques. These associate each fully named schema object with a unique, site-local label. If a single global name-space is desired— for example, if the distributed system allows for the mobility of application-level objects between nodes—these alias mappings must be identical on every site. Thus every node in the example system in Example 7 and Figure 2.15 would use an alias to associate a local label Subscribers with the global table name Subscribers@Billing. Properly administered, this would permit us to replace the queries in Example 7 and Figure 2.15 with their original versions. Regardless of which node in our distributed system these queries were submitted to, consistent synonym mapping would ensure that the same data and logic were always invoked. Parsers on each node can detect errors and ambiguities in queries, and distributed ORDBMS vendors provide utilities to detect inconsistencies between sites.
Figure 2.15
Using fully qualified object names in distributed ORDBMSs.

a. INSERT INTO Call_History@History ( Customer, Transceivers, Duration )
   SELECT ...
   FROM   Subscribers@Billing S
   WHERE  Matches@Billing ( S.Security_Check, :pAudCallers_Spoken_Key );

b. SELECT   T.Id, Area@Foundation ( T.Service_Range ) AS Area
   FROM     Transceivers@Foundation T
   WHERE    Within@Foundation ( T.Location, Circle@Foundation ( -127.513, 35.362, '13 Miles' ) )
   ORDER BY 2;
As we show earlier in this chapter, the key to ORDBMS query processing is metadata. When it receives a query, the local ORDBMS needs to create an efficient schedule of operations to compute an answer. In distributed ORDBMSs, a local site might need to access information about data and logic residing on multiple remote sites. At the very least, the coordinating node needs to verify that certain objects actually exist, and that it has permission to access them. In more advanced systems, it might also be able to discover a locally available version of some component logic or a copy of some data replicated from its original site. Proposals for managing distributed catalogues include centralizing all catalogue information in a canonical store, replicating it among all participating sites, or performing remote lookups at the time that queries are received at the coordinating node. All of these approaches are deficient in some way. The first two techniques require that changes to local sites be propagated to the central store or broadcast to the entire set of inter-connected sites. This turns every local DDL statement into a distributed transaction and implies an explosion in the number of messages as the number of sites increases. However, reading from remote catalogues each time a query is parsed and optimized makes query compilation more expensive. In practice, different distributed relational systems have adopted different strategies. Many research systems and early commercial systems opted for globally replicated schema because, it was reasoned, schema changes were relatively rare events. However, in distributed ORDBMSs there is a greater variety and number of schema objects, and application development calls for fairly regular upgrades and additions of new components. Both of these indicate that the frequency of schema changes in distributed ORDBMSs will be higher and argue for the parse-time remote lookup strategy (Hong 1993). It should be emphasized that so far we have only considered the problem of object naming. The fact that a site name is included in the query specification does not imply that that is the site on which the named logic is actually run. In the first place, an equivalent implementation of the function might be available on another node (this equivalency needs to be noted in the system catalogues). Moving data to that node and invoking the specified logic there might be a better way to process the query. And second, as we mention earlier in this chapter, distributed ORDBMSs can co-opt component technologies to move the implementation of component logic among nodes of the system. Decisions about where the named logic is actually invoked for some data can be deferred to later in the query processing. The principle of location transparency applies not only to where the distributed system’s data are located, but also to where the logic specified in a query statement is actually run.
2.3.3
Query Processing in Distributed ORDBMSs

Although it adopts the same basic strategy, distributed query planning is considerably more complex than the same task in a single site. Distributed queries are compiled into schedules of lower-level operations, just like single-site queries. And these schedules can be re-arranged, subject to certain rules, into more or less computationally efficient plans. The extra complexity of distributed query planning is due to the physical dispersal of data and logic. When a query addresses data and logic on two machines, then the only way to answer the query is to bring the data and logic together on one of them. But which one? To answer this question, a distributed optimizer must consider the overhead of moving data across a network when it determines the costs for each alternative query plan. Networking overhead can be modeled by assigning a cost to each edge of the query-execution plan trees we introduced in Section 2.2.4. This cost factor is estimated by calculating the volume of data moving along the edge. As in the single-site case, the total cost of a distributed query plan can be found by aggregating the costs of each suboperation in the plan and including the cost of the edges. Distributed RDBMS optimizers use a similar approach. An important difference between distributed RDBMS optimizers and distributed ORDBMSs is that the data management costs of user-defined objects must also be considered, for much the same reasons as in single-site ORDBMSs (see Section 2.2.4). If a distributed query includes n instances of user-defined logic, the cost of a distributed ORDBMS query can be calculated as follows.

Total_Cost   = CPU_Cost + I/O_Cost + Network_Cost
I/O_Cost     = I/O_Weight * ( Table_Data_Disk_I/Os + ∑ i=1..n ( Expression_I/O_Cost_i * Number_of_Invocations_i ) )
CPU_Cost     = CPU_Weight * ∑ i=1..n ( Expression_CPU_Cost_i * Number_of_Invocations_i )
Network_Cost = Network_Weight * ( Message_Transfers + Data_Transfers + ∑ i=1..n ( Expression_Data_Size_i * Number_of_Invocations_i ) )

In simple distributed ORDBMSs, the coordinating node’s optimizer prepares an initial query plan using single-site techniques; then a special module within the coordinating node determines the most efficient data movement strategy given this plan. This first phase of query processing is sometimes referred to as global optimization. At runtime, the query plan’s distributed operations are forwarded to the appropriate remote sites. Remote operations are often entire SQL queries that each site is free to optimize according to local priorities. To understand why this works, recall that the result of a SQL query is a set of rows and columns that looks much like a table. Internal to the coordinating node,
the entire remote SELECT query can be treated as though it were a scan of a local table. This second phase of a distributed query is sometimes called local optimization, because its scope is typically contained within a single node.
EXAMPLE 8
Figure 2.16
To explain these ideas in more detail, consider the query: What are the billing line-items for each subscriber in the 555 area code for calls over the last month? It joins data from two sites and invokes component logic on both of them. Let us assume that this query is submitted at the Billing node. This is illustrated in Figure 2.16. As it parses this query, the coordinating node at Billing identifies what data are local, what data are remote, and what logic can run on which sites. But it ignores this information until it considers what work to pass to the remote History node. In this case, there are (at least) two possibilities: The coordinating node can retrieve all of the matching Call_History records from the History node or it can request each subscriber’s matching records as it needs them. We illustrate these alternatives using the trees in Figure 2.17. When it arrives on the History node, either of the queries we show in bubbles is parsed, optimized, and executed at that local site. This allows the distributed ORDBMS to take advantage of each local node’s physical configuration. Then the result of the query is returned to the coordinating node, where it can be treated as if it came from a local table. Such a strategy is extremely flexible. If the ORDBMS optimizer is clever enough, it can even make use of SQL features such as ORDER BY and SELECT DISTINCT to employ more sophisticated join algorithms on the coordinating node. Queries passed to remote nodes can be relatively simple, single-table lookups or elaborate multitable joins.

Join query in distributed ORDBMS.

SELECT  S.Name, Billing( C.Duration ) AS Billing
FROM    Subscribers S, Call_History C
WHERE   S.Id = C.Subscriber
AND     AreaCode( S.Id ) = '555'
AND     Contains( Start( C.Duration ), CAST('Last Month' AS Period) );
Other factors complicate distributed query planning. Distributed ORDBMSs need to cope with the fact that, in contrast to distributed RDBMSs, some logical query plans are physically impossible because not all logic can run at all sites. If the temporal Contains() restriction cannot be run on the History node, then neither of these plans is possible.
Figure 2.17  Query plans for distributed join query in Figure 2.16.

Plan A  (data copied from History to Billing)
    S.Name, C.Billing
    S.Id = C.Subscriber
    S.Id, S.Name
    AreaCode(S.Id) = '555'
    S
    C
    Runs on History:
        SELECT C.Subscriber, Billing (C.Duration) AS Billing
        FROM   Call_History C
        WHERE  Contains (Start(C.Duration), CAST('Last Month' AS Period));

Plan B  (for each subscriber row, the S.Id value is passed to History, and query results are passed back to Billing)
    S.Name, C.Billing
    S.Id = C.Subscriber
    S.Id, S.Name
    AreaCode(S.Id) = '555'
    S
    C
    Repeatedly executed on History:
        SELECT C.Subscriber, Billing (C.Duration) AS Billing
        FROM   Call_History C
        WHERE  Contains (Start(C.Duration), CAST('Last Month' AS Period))
        AND    C.Subscriber = :S_Id;
We say that, as the optimizer evaluates the query-plan alternatives in its plan space, it must be able to detect such impossible plans and ignore them. Later in this section we see how this affects the information that the ORDBMS must keep about extension logic.
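A C sketch of how impossible plans can be filtered out of the plan space follows (the catalogue-lookup helpers are assumed, not a real API): before costing a candidate plan, the optimizer checks that every operation’s logic either is installed at the site the plan assigns to it or is mobile enough to be shipped there.

    typedef struct {
        const char *function;   /* e.g., "Contains"                     */
        int         site;       /* node the plan runs this operation on */
    } PlanOp;

    /* assumed catalogue lookups */
    extern int function_installed_at(const char *function, int site);
    extern int function_is_mobile(const char *function);

    int plan_is_feasible(const PlanOp ops[], int n_ops)
    {
        for (int i = 0; i < n_ops; i++) {
            if (!function_installed_at(ops[i].function, ops[i].site) &&
                !function_is_mobile(ops[i].function))
                return 0;       /* physically impossible plan: prune it */
        }
        return 1;
    }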
2.3.4
External Storage Interfaces

Unlike the situation with RDBMSs, data accessed through an ORDBMS need not be stored in a database. It has been observed that less than
20% of the data on which modern organizations depend is stored in structured repositories. The rest is found in personal computer file formats or in legacy applications where the data management and programming logic are tightly intertwined. In addition, organizations are increasingly making use of network and wireless connectivity to deploy command and control systems to monitor real-world environmental conditions. Distributed ORDBMSs provide the means whereby external data and processes may be integrated within the database and accessed using its query language. We illustrate this later in Figure 2.20 by showing access to the actual transceiver devices integrated into the Foundation schema. Because the query language represents the only means by which data in these systems is manipulated, these external data appear in the schema as a table. There are obvious business advantages to integrating the spreadsheets on which sales forecasting is done into a business’s financial management systems. But other possibilities present themselves. In our cellular network application, it would be useful to create an interface to data that is essentially transient—current calls. Current call data—literally the calls in progress at the moment that this information is wanted—lists the calling subscriber, transceiver, and current signal strength.
EXAMPLE 9
Figure 2.18
Current call data might be used in the query: What is the average signal strength of current calls by each transceiver? This is illustrated in Figure 2.18.

Query using external data source for analysis.

SELECT    C.Transceivers, AVG( C.Signal_Strength )
FROM      Current_Calls C
GROUP BY  C.Transceivers;
In Figure 2.11, we present the internal architecture of a DBMS. This allows us to explain how extension components interact with the various modules making up the engine. But a more radical possibility presents itself. In an extensible DBMS, it is possible to replace one of these modules entirely.
EXAMPLE 10
Consider the query plan introduced in Example 5 and Figure 2.8. In the simplest case, the first of these operations (the restriction) needs to examine each of the rows in the Transceivers table. Such a facility might be implemented as shown in Figure 2.19.
By implementing the routines shown in bold, a developer can create an interface that makes an external data source available in the database. That is, instead of calling the engine’s built-in implementation of OpenTable, BeginScan, and so on (which use the buffering and storage management facilities built into the ORDBMS), the ORDBMS instead invokes logic implemented by a developer. This logic can access external data using file management or interprocess communication facilities. So far as the rest of the engine is concerned, it has no idea how the data it is processing is stored. Such an interface provides a very flexible, lowest-common-denominator way to access external data. It provides an ideal mechanism for unstructured data and data stored in nonrelational data stores because it allows a developer to use procedural code to unravel non-normalized data structures.

Figure 2.19
Pseudocode for restriction operation.

Function Restriction (
    STRING      TableName,
    PREDICATE   Predicate
)
{
    TABLE_DESCRIPTION   Table;
    SCAN_DESCRIPTION    Scan;
    ROW                 Record;

    Table := OpenTable(TableName);
    Scan  := BeginScan(Table);

    while (( Record := GetNext(Scan)) != END_OF_SCAN)
    {
        if Check_Predicate ( Record, Predicate ) == TRUE
        {
            Add_Row_to_Result_List ( Record );
        }
    }
    EndScan(Scan);
    CloseTable(Table);
}
It is useful to think of the implementation of these external data interfaces as creating new kinds of access methods (in addition to the ORDBMS’s built-in heap, B-tree, hash, and R-tree access methods).
EXAMPLE 11
Figure 2.20
Figure 2.20 illustrates how to use a SQL data definition statement to create a new access method, and then use it in the declaration of a table. Within the body of this declaration, various aspects of the access method interface (appearing on the left) are associated with their respective user-defined functions (named on the right). Figure 2.20 also illustrates how write operations—UPDATEs, INSERTs, and DELETEs—can be incorporated in this abstraction. In practical terms, you might use the DELETE operation to terminate a cellular phone call in progress. This mapping is stored in the ORDBMS’s system catalogues. The second statement in Figure 2.20 illustrates how the new access method may be employed. Current_Calls is the table presenting the external data within the schema. Queries with Current_Calls invoke the user-defined routines specified in the CREATE ACCESS METHOD statement, instead of the ORDBMS’s built-in methods. Access methods may even be parameterized. Parameter values such as the machine-port combination used here are made available to the extension logic. Therefore, the same access method logic may be used to implement interfaces to multiple, different external data instances.

SQL data definition statements for user-defined access method.

CREATE PRIMARY ACCESS_METHOD Transceivers_Access (
    am_create        =  Transceivers_create_access,
    am_open          =  Transceivers_open_access,
    am_close         =  Transceivers_close_access,
    am_beginscan     =  Transceivers_beginscan,
    am_endscan       =  Transceivers_endscan,
    am_getnext       =  Transceivers_getnext,
    am_delete        =  call_hangup,
    am_perscan_cost  =  Transceivers_latency
);

CREATE TABLE Current_Calls (
    Transceivers       Transceivers_Id    NOT NULL,
    Subscriber         Phone_Number       NOT NULL,
    Signal_Strength    FLOAT              NOT NULL
) USING ACCESS METHOD Transceivers_Access( Addr = 'machine:port' );
This interface can be improved on in several ways. When the external data source can do more than simply store data, which is the case when it is another DBMS, an obvious optimization would be to push any predicates included in the query to the external source. For example, were we to query for current calls on a particular transceiver, it makes sense to avoid bringing all current calls back to the ORDBMS and performing the check there. To do this, we would augment the BeginScan function in Figure 2.19 with an additional parameter and pass any predicates from the query. Checking predicates locally to the remote data minimizes the volume of data brought back to the local node and helps to balance the CPU load. This kind of interface can provide the foundation for more sophisticated interfaces directed at external data sources capable of accepting declarative queries. However, extracting predicates from queries significantly complicates the interface. Another improvement involves providing statistical information about these tables. It is possible to incorporate routines into the new access method that return information such as total row counts and row widths. This is especially helpful to the ORDBMS query processor because it estimates the resources needed for alternative query plans. The interface also needs to deal with an important difference between external data sources and internally stored data. Single-site optimizers can assume that the cost of scanning an unchanging table does not vary over time. This might not be the case for external data. For instance, the time taken to perform a read from a tertiary or hierarchical storage device varies tremendously, depending on where the data resides (e.g., racked tape, mounted tape, disk cache, or memory). To overcome this, the access method interface includes a function that computes a multiplier value. If the data is on disk or in memory, this value might be 1. If the data is on mounted tape it might be 100. If the tape must be fetched from racks by a robot, the multiplier value might be as high as 10,000.
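The two improvements just described can be sketched in C as follows. The enum, the helper, and the signatures are our assumptions; the function names simply echo those registered in Figure 2.20, and the scan routine is shown taking an extra predicate argument so the check can be pushed to the external source.

    typedef enum { IN_MEMORY, ON_DISK, MOUNTED_TAPE, RACKED_TAPE } StorageLocation;

    /* assumed helper that asks the external source where the data currently resides */
    extern StorageLocation current_storage_location(const char *table_name);

    /* a scan interface extended with a predicate to be evaluated at the external source */
    extern void *Transceivers_beginscan(const char *table_name,
                                        const char *pushed_predicate);

    /* am_perscan_cost: multiplier applied to the optimizer's estimate of the scan cost */
    double Transceivers_latency(const char *table_name)
    {
        switch (current_storage_location(table_name)) {
        case IN_MEMORY:
        case ON_DISK:       return 1.0;
        case MOUNTED_TAPE:  return 100.0;
        case RACKED_TAPE:   return 10000.0;   /* a robot must fetch the tape from the racks */
        }
        return 1.0;
    }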
2.3.5
Distributed Processing

In this subsection, we describe changes that can be made to the ORDBMS function manager to allow the ORDBMS to invoke logic on remote computer systems. The first of these extensions simply invokes logic integrated into another node of the distributed ORDBMS. This is necessary because, as we mention in the introduction, some component logic might be immobile. Also, in most organizations you will find a diverse ecology of information systems already developed and not amenable to being run inside an ORDBMS. Interoperability standards already exist to permit these systems to exchange messages. Therefore, it makes sense to add support for these standards within the ORDBMS so that logic in these systems can be invoked from within the ORDBMS (and vice versa).
When it is integrated into the ORDBMS, most component logic can be marked as execute anywhere. Either an equivalent version of the logic is installed on all sites of the distributed ORDBMS or the runnable logic is copied between compatible nodes by the distributed system at runtime. Immobile logic, which is taken to be the default, only runs at a single node. This might be because a particular computer uses a different hardware or operating system than all of the others or because of software licensing restrictions. But the fact that the logic is immobile does not mean it cannot be addressed from other nodes. Integrated component logic can be invoked directly, without the use of queries.
EXAMPLE 12
Developers can use the syntax in Figure 2.21 to invoke integrated component logic directly. Just like a query, this expression might be submitted from an external program or from some application-level logic embedded in the ORDBMS. It specifies that a particular module of logic should be invoked and what argument values should be used. The coordinating node receiving this expression might discover that the runnable logic implementing this extension is not available at its site (i.e., it exchanges messages with the remote site to ascertain the status of the integrated component and is informed that the logic is not marked execute anywhere). Under such circumstances, the coordinating node can send a message to the remote node and have it invoke the logic locally. This requires an extension to the language manager introduced in Section 2.2.6. Data structures describing an expression must be modified to include a notion of the site on which the logic must be invoked. Then the ORDBMS function manager can be modified to employ the same SQL-centric mechanism used for distributed queries to send messages to function managers on remote sites. The need for this kind of remote procedure call (RPC) functionality is well understood in other software engineering contexts. In fact, there is a set of system interoperability standards that rely on it. Object Management Group’s (OMG’s) CORBA, Microsoft’s Distributed COM (DCOM), and SunSoft’s Remote Method Invocation (RMI) and EJB standards are the best known of these. Although these models relate to different language and operating system environments, which somewhat dilutes their promises of heterogeneous interoperability, they all share certain properties. All of them define how messages can be exchanged between logic embedded in separated programs. Allowing
Figure 2.21
Direct invocation of component logic.

EXECUTE FUNCTION Matches@Billing ( :First_Audio_Key, :Second_Audio_Key );
ORDBMS developers access to remote procedure call facilities and interoperability frameworks is a good idea. Achieving this requires that the ORDBMS integrate code managing the dispatch of RPCs into its function manager. Several vendors ship libraries implementing either simple RPC functionality or a fully functional Object Request Broker (ORB). These libraries can be integrated into the ORDBMS using the same techniques that were used to integrate the Java environment. We illustrate this idea in Figure 2.22. Allowing external systems to invoke logic within the ORDBMS using an RPC mechanism requires more substantial changes. To become an RPC server, which means that a program can become the recipient of RPC calls in addition to being the sender, requires that the ORDBMS register the facility with the operating system. When an external program directs an RPC call to the ORDBMS, the operating system interrupts the ORDBMS, which must handle the interrupt before continuing normal processing. It should be pointed out that the interoperability standards introduced earlier in this subsection also specify object models—the means by which an object’s interfaces are described. SQL defines its own interface specification standard, and the ORDBMS’s system catalogues constitute a repository or registry. Therefore, the naming services provided by other interoperability frameworks are of little value. That said, it is highly desirable to allow different systems to exchange this kind of information. Using the external access methods interface to provide query-centric interrogation of an ORB’s naming services or the Windows NT registry is fairly straightforward. The most important difference between these standards and the ORDBMS approach is philosophical. ORDBMSs integrate data instances and logic into an abstract framework that is profoundly dynamic in its
Figure 2.22
Remote logic invocation from RPC handler within language manager. Remote invocation of logic via RPC/RMI
Function manager
Remote procedure call handler
68 |
Distributed Component Database Management Systems
character. Logic is applied to data with queries in an ORDBMS, whereas all of the standards mentioned here handle this within the programming logic.
2.3.6 Distributed Data Movement

Distributed ORDBMSs frequently copy data between nodes. This is fairly straightforward for distributed RDBMSs, which include a fixed set of predefined data types. They can implement logic to deal with issues such as variable binary formats for data types. In contrast, distributed ORDBMSs need mechanisms to move copies of arbitrary, user-defined data structures between different hardware and operating systems. In the simplest case, our cellular support system might employ the same hardware and operating system at all sites. Under these circumstances, the distributed ORDBMS can distribute binary versions of data directly. But when the distributed ORDBMS is built on heterogeneous local systems, the problem becomes considerably more difficult. Sometimes, in fact, data instances cannot be distributed at all. One solution is to require the component developer to provide logic to facilitate data movement. For example, the Java language includes mechanisms to serialize instances of Java components into an efficient, hardware-neutral format. Data might be stored everywhere in this canonical format, or the mechanism can be used to move data between dissimilar computer systems. The drawback of storing data in such a format is that it must be converted to a format suitable for each local computer at runtime. If each system stores data in a different binary format, then as a last resort a developer might choose to implement send and symmetrical receive logic. These support functions convert the local binary data into a neutral form, such as an ASCII string, and convert the neutral form back into binary. Of course, this scheme assumes equivalent versions of the C or C++ component exist on both sites.

Component data can be very large. For example, our audio key might be several kilobytes in size. A digitized audio recording might be several megabytes. Most ORDBMSs use a system whereby the large object data is stored separately from a table’s row data and a large object handle is placed in the row. Queries that do not access large object data thereby avoid reading it from disk as they scan the table. When a module of logic needs to access the large object data, the handle is passed into it as the internal reference. Server interfaces that open large objects and access their contents take large object handles as arguments. Corresponding facilities in operating systems use file names instead of handles. Figure 2.23 illustrates the basic idea.

Figure 2.23  Large object data management in an ORDBMS (a large object handle value stored with the row data on a single page refers to large object data held in separate pages of the table space).

The computational cost of reading large objects from disk, copying them across the network, and applying logic to them is often a significant factor in query planning. It would be very inefficient to move large object instances unnecessarily. But the size of a large object handle is quite small relative to the size of large object data, so moving handles is generally not a problem. Therefore, efficiently supporting large object data in a distributed ORDBMS environment requires that the following modifications be made.

1. Augment the structure of the large object handle to include the identity of the site on which it is stored. For example, the Open Software Foundation’s (OSF’s) Distributed Computing Environment (DCE) specification describes an opaque Universal Unique Identifier (UUID) value that can be used for this purpose.

2. Augment the ORDBMS’s internal facilities so that access to large object data operates across a network.

Using this strategy, the distributed ORDBMS avoids unnecessarily copying large objects across the network. In fact, because logic that accesses large object data frequently uses only a fraction of the entire object, buffering and caching techniques from network file systems can be used. Rather than bring the entire object across the wire when it is to be accessed, only those portions of the large object that are actually read by the remote logic need to be moved.
2.4 Summary

In this chapter we show how a distributed ORDBMS, a kind of component DBMS, works. We begin by describing how a single-site ORDBMS functions and how it differs from RDBMSs. Then we explain how such a single-site ORDBMS system can be modified to allow it to participate in a distributed multisite system. The chief technical benefit of this
approach is that the distributed ORDBMS provides a framework for the mobility of both data and programming logic. In economic terms, such a system provides a foundation for code reuse, scalability, and the flexibility that comes from combining these features in a dynamic SQL language environment.
All Your Data: The Oracle Extensibility Architecture

3.1 Overview

The new Internet computing environment has brought new kinds of data to a huge number of users across the globe. Multimedia data types such as images, maps, video clips, and audio clips were once rarely seen outside of specialty software. Today, many web-based applications require their database servers to manage such data. Other software solutions need to store data dealing with financial instruments, engineering diagrams, or molecular structures. An increasingly large number of database applications demand content-rich data types and associated content-specific business logic.

With the addition of object-relational extensions, the Oracle8i Server (Oracle 1999b, 1999d) can be enhanced by developers to create their own application-domain-specific data types. For example, one can create new data types representing customers, financial portfolios, photographs, or telephone networks—and thus ensure that database programs deal with the same level of abstraction as their corresponding application domain. In many cases, it is desirable to integrate these new domain types as closely as possible with the server so they are treated on a par with built-in types such as NUMBER or VARCHAR. With such integration, the database server can be readily extended for new domains. Oracle8i gives application developers greater control over user-defined data types, not only by enabling the capture of domain logic and processes associated with the data, but also by providing control over the manner in which the server stores, retrieves, or interprets this data. The Oracle8i database contains a series of database extensibility services, which enable the packaging and integration of content-rich domain types and behavior into server-based managed components. Such components are called data cartridges (Oracle 1999d). They are developed and managed by means of a set of database interfaces called the Oracle Data Cartridge Interfaces (ODCI). Let us look at the issues involved in creating data cartridges in more detail.

Relational databases, so far, are widely known for efficiently managing and manipulating simple, structured business data. The business data of the early 1990s mostly consisted of flat, row-oriented data (i.e., tabular data with no nested or structured columns). Also, databases did not maintain unstructured data such as text associated with records, voice clips, or spatial data. With advances in both computer hardware and software technologies, applications are becoming more sophisticated, and they require the efficient integration of heterogeneous, multimedia data sources. For example, it is quite reasonable to store and manipulate data about an employee’s salary (structured relational data) with the employee’s resume (textual nonrelational data), and correlate the results with the location of employees on a map: How many employees who know database management systems (DBMSs) and
make less than $50 thousand live within 50 miles of San Francisco? Surely they deserve raises.

Until recently, the burden of integrating heterogeneous data types and data sources fell on the applications rather than the underlying DBMS. This happened for three reasons:

1. Databases did not have the capability to store the unstructured or semistructured data, such as free-form text resumes.

2. Databases could not perform specialized querying on high-dimensional data, such as spatial queries on geographic locations.

3. Databases did not provide adequate performance for the efficient manipulation of a large amount of content-rich data, so that queries such as the employee query finished in reasonable time.

Specialized applications became available from various vendors to provide middle tiers that perform spatial searches, free text searches, and so forth. However, such loosely integrated specialty middle tiers have several disadvantages.
• Many functions have to be built repeatedly.

• Applications become too large, too complex, and far too customized.

• Even though these midtier products can exploit special access and storage methods to manipulate multimedia data, they run outside the DBMS server, causing performance to degrade as interactions with the database server increase.

• Optimizations across data sources cannot be performed efficiently. For instance, a spatial data server knows nothing about text searches, and vice versa.

• Each specialty server comes with its own utilities and practices for administering data, causing severe complexity in the backup, restore, and monitor functions necessary to guarantee high availability.
Because processing for content-rich data is beset with problems when done outside the database, the next question that arises is whether databases can support specific rich types inside them. Because it is not clear what constitutes a full set of such types, it seems inefficient to provide, on an ad hoc basis, support for each new type that comes along. The DBMS would have to be re-architected each time a new type is encountered. In other words, unless all the content-rich types belong to some comprehensive architecture, they will continue to be bedeviled by issues of re-architecture, cross-type query optimization, uniform programmatic access, and so on. Oracle approached the content-rich data problem from the standpoint of creating such an architecture. Databases must be dynamically extended to be able to efficiently handle various rich, application-domain-specific data types. Extensibility is the component architecture
that enables vendors to integrate components dealing, say, with rich-content data efficiently into the DBMS, thereby dynamically extending its capabilities. Such components, called data cartridges, typically include a definition of a structured or unstructured data type, user-defined operations (operators and functions) on the data type, user-defined storage and access structures for efficient storage and retrieval of the data type instances, and query-ability over the data type instances.
EXAMPLE 1
Consider the following example.

CREATE TABLE patients (
  patient_id  PersonID,
  age         INTEGER,
  medhistory  Text,
  catscan     Image,
  loc         Location
);

CREATE TABLE cities (
  name        CHAR(20),
  loc         Location,
  population  INTEGER
);
Given these definitions, it should be possible to formulate queries on the different data types in the tables. For instance, find the number of patients older than 50, who live within 50 miles of San Francisco, who have a family medical history of cancer, and for whom there is a probability greater than 0.6 of finding a tumor in their CAT scan.

SELECT   count(p), p.age
FROM     patients p, cities c
WHERE    p.age > 50
  AND    c.name = 'San Francisco'
  AND    Distance(p.loc, c.loc, 50)
  AND    TumorProb(p.catscan) >= 0.6
  AND    Contains(p.medhistory, 'cancer')
GROUP BY p.age;
Example 1 illustrates the query-ability we desire of an assortment of multimedia data types. In order to support such a query, it is clear that significant extensions to the services normally provided by the DBMS are required. The steps involved in making these extensions are as follows.
• Creation of user-defined types: the ability to define text, image, and location data types.

• Storage of user-defined type instances: the ability to store and manipulate multimedia-type instances.

• Creation of domain-specific operations: support for user-defined functions and operators such as Contains(), Distance(), and TumorProb().

• Implementation of domain-specific indexes: support for indexes specific to text data (to evaluate Contains()), spatial data (Distance()), and so on, which can be used to speed the query.

• Providing optimizer extensibility: support for intelligent ordering of query predicates during evaluation. In the query in Example 1, it is critical to decide the order in which to evaluate the predicates in the WHERE clause, so that the most restrictive one can be applied first. The Contains operator evaluation involves a text index search; the Distance evaluation involves a spatial index lookup. The most efficient order of evaluation of these various operators and functions depends on the CPU and I/O costs of the operations. The TumorProb() function call should be evaluated last if there is no index on it. Because all these operators and functions are user-defined, the optimizer has to be extended to allow type designers to specify the costs of various operations on the types.
In each case where a service is an extensible one, an interface or API provides access to the service. Figure 3.1 shows this extensibility architecture. Next, we take a look at the functionality of each extensible service in more detail.
Figure 3.1  The Oracle Extensibility Architecture: a data cartridge plugs into the Oracle8 universal data server through extensibility interfaces covering the type system, server execution, query processing, data indexing, and other database and extensibility services.
3.2 Extensible Type System

The Oracle Type System (OTS) (Oracle 1997, 1999c), based on the SQL:99 type system, provides a declarative interface for defining types. The behavior for these types can be implemented in Java, C, C++, or PL/SQL. The DBMS automatically provides the low-level infrastructure services needed for I/O, heterogeneous client-side access for new data types, optimizations for data transfers between the application and the database, and so on. Let us examine the constituents of OTS.
3.2.1 Object Types

An object type, distinct from native SQL data types such as NUMBER, VARCHAR, or DATE, is user-defined. It specifies both the underlying persistent data (called attributes of the object type) and the related behavior (called methods of the object type). Object types are used to extend the modeling capabilities provided by the native data types. They can be used to make better models of complex entities in the real world by binding data attributes to semantic behavior. There can be one or more attributes in an object type. The attributes of an object type can be the native data types, large objects (LOBs), collections, other object types, or reference (REF) types (see the following). A method is a procedure or a function that is part of an object-type definition. Methods can access and manipulate attributes of the related object type. Methods can be run within the execution environment of the Oracle8i Server. In addition, methods can be dispatched to run outside the database as part of the Extensible Server Execution service.
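To make this concrete, the declaration of a simple object type might look like the following sketch; the type name, attributes, and the naive planar distance method are illustrative assumptions only, not taken from any shipped cartridge.

CREATE TYPE Location AS OBJECT (
  latitude   NUMBER,
  longitude  NUMBER,
  MEMBER FUNCTION DistanceFrom(other IN Location) RETURN NUMBER
);

CREATE TYPE BODY Location AS
  MEMBER FUNCTION DistanceFrom(other IN Location) RETURN NUMBER IS
  BEGIN
    -- Naive planar distance; a real spatial cartridge would use geodetic logic
    RETURN SQRT(POWER(latitude - other.latitude, 2) +
                POWER(longitude - other.longitude, 2));
  END DistanceFrom;
END;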
3.2.2 Collection Types

Collections are SQL data types that contain multiple elements. Each element or value for a collection is an instance of the same data type. In Oracle8i there are two collection types—variable arrays (VARRAYs) and nested tables. A VARRAY contains a variable number of ordered elements. VARRAY data types can be used as a column of a table or as an attribute of an object type. Also, the element type of a VARRAY may be either a native data type such as NUMBER or an object type. Using Oracle8i SQL, a named table type can be created. These can be used as nested tables to provide the semantics of an unordered collection. As with VARRAY, a nested table type can be used as a column of a table or as an attribute of an object type.
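For example, the following sketch (all names are hypothetical) declares a VARRAY type and a nested table type and uses both as column types.

CREATE TYPE PhoneList AS VARRAY(10) OF VARCHAR2(20);

CREATE TYPE LineItem AS OBJECT (
  item_name  VARCHAR2(30),
  quantity   NUMBER,
  price      NUMBER
);

CREATE TYPE LineItemTable AS TABLE OF LineItem;

CREATE TABLE purchase_orders (
  po_number  NUMBER,
  phones     PhoneList,
  items      LineItemTable
) NESTED TABLE items STORE AS po_items_tab;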
3.2.3 Relationship Types

It is possible to obtain a reference (or the database pointer) to a standalone instance of an object type. References are important for navigating among object instances, particularly in client-side applications. A special REF operator is used to obtain a reference to a row object.
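A minimal sketch, reusing the hypothetical Location type above, of storing and following references:

CREATE TABLE locations_tab OF Location;

CREATE TABLE offices (
  office_id  NUMBER,
  site       REF Location SCOPE IS locations_tab
);

INSERT INTO locations_tab VALUES (Location(37.77, -122.42));

INSERT INTO offices
  SELECT 10, REF(l) FROM locations_tab l WHERE l.latitude = 37.77;

-- DEREF follows the reference back to the row object
SELECT DEREF(o.site) FROM offices o;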
3.2.4 Large Objects

Oracle8i provides LOB types to handle the storage demands of images, video clips, documents, and other such forms of data. LOBs are stored in a manner that optimizes space use and provides efficient access. More specifically, LOBs are composed of locators and the related binary or character data. A locator is an opaque handle or proxy to a LOB that is retrieved by the application and is then used to read and manipulate the LOB data. The LOB locators are stored in-line with other table record columns, and, for internal LOBs (binary large objects, BLOBs; character large objects, CLOBs; and national character large objects, NCLOBs), the data can reside in a separate storage area and even on a different secondary storage device. For external LOBs (binary files, BFILEs), the data are stored outside the database table spaces in operating system files. There are SQL data definition language (DDL) extensions to create and delete tables and object types that contain LOB types. The Oracle8i Server provides SQL data manipulation language (DML) commands to INSERT and DELETE complete LOBs. In addition, there is an extensive set of commands for piecewise reading, writing, and manipulating LOBs via Java, PL/SQL, OLE DB, or the Oracle Call Interface (OCI). For internal LOB types, the locators and related data participate fully in the transactional model of the Oracle8i Server. The data for BFILEs does not participate in transactions. However, the BFILE locators themselves are fully supported by server transactions. With respect to SQL, the data residing within Oracle8i LOBs is opaque and not query-able. One can write functions (including methods of object types) to access and manipulate parts of LOBs. In this way the structure and semantics of data residing in large objects can be supplied by application developers.
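The following sketch (the table, the directory object, and the values are hypothetical) declares LOB columns and reads a CLOB piecewise through its locator.

CREATE TABLE resumes (
  emp_id    NUMBER,
  doc       CLOB,
  photo     BLOB,
  original  BFILE
);

-- Assumes a directory object RESUME_DIR has been created for the BFILE
INSERT INTO resumes VALUES (1, 'Experienced Oracle and Unix developer.',
                            EMPTY_BLOB(), BFILENAME('RESUME_DIR', 'emp1.doc'));

DECLARE
  loc  CLOB;
  buf  VARCHAR2(100);
  amt  INTEGER := 100;
BEGIN
  SELECT doc INTO loc FROM resumes WHERE emp_id = 1;
  DBMS_LOB.READ(loc, amt, 1, buf);  -- piecewise read of up to 100 characters via the locator
END;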
3.2.5 Opaque Types

The opaque type mechanism provides a way to create new basic types in the database whose internal structure is not known to the DBMS. The internal structure is modeled in some 3GL language (such as C). The database provides storage for the type instances, which can be bounded by a certain size, with either a varying or a fixed length. The storage requirement is specified in the type definition. The type methods or functions that access the internal structure are external methods or external procedures in the same 3GL language used to model the structure.
3.3 Server Execution Environments

The OTS decouples the choice of implementation language for the member method of an object type from its specification. Thus, components of an Oracle8i data cartridge can be developed using any of the popular programming languages. In Oracle8i, methods, functions, and procedures can be developed using Java, PL/SQL, or external C language routines. Indeed, a type developer can mix and match multiple languages. Thus, the database server runtime environment can be extended by user-defined methods, functions, and procedures.
3.3.1 Java

Java is one of the available choices for server-based execution. Oracle8i provides a high-performance Java Virtual Machine (JVM) to enable the use of Java in developing stored procedures, object-type methods, stand-alone functions, and constraints (Oracle 1999g). This scalable, multiuser JVM runs in the address space of the database server and can be used to run standard Java behavior inside the database. There are multiple programming models available with the JVM. Java Database Connectivity (JDBC) allows object-relational statement-wise access to data (Oracle 1999h). SQLJ, a standard precompiler technology, allows SQL to be embedded directly in Java code (Oracle 1999j). Associated with the JVM is a server-based Object Request Broker (ORB), which enables an Enterprise JavaBeans (EJB) programming model (Oracle 1999e). The server ORB is fully compliant with the Common Object Request Broker Architecture (CORBA) specification. Finally, it is possible to perform ahead-of-time compilation on server-based Java code, so that the costs of interpreting this code are not taken at each invocation. Such native compilation capabilities make it possible to write computationally intensive data-cartridge behavior in Java. In addition to server-based Java execution, Oracle8i also provides client-side Java access to the JVM, as well as the SQL and PL/SQL engines, by means of JDBC. Figure 3.2 shows how these fit schematically into the overall database architecture.
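For instance, a static method of a Java class loaded into the server can be published to SQL through a call specification and then used in queries; the TextUtil class, its keywordCount method, and the documents table in this sketch are hypothetical.

CREATE OR REPLACE FUNCTION keyword_count(doc IN VARCHAR2, kw IN VARCHAR2)
RETURN NUMBER
AS LANGUAGE JAVA
NAME 'TextUtil.keywordCount(java.lang.String, java.lang.String) return int';

SELECT id FROM documents WHERE keyword_count(body, 'Oracle') > 0;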
Figure 3.2  Java and Oracle8i: client applications reach the server-side JVM, SQL, and PL/SQL engines through the thin JDBC driver over Java sockets.
3.3.2 PL/SQL

In Oracle8i, PL/SQL offers a data-cartridge developer a powerful procedural language that supports all the object extensions for SQL (Oracle 1999l). With PL/SQL, program logic can execute on the server performing traditional procedural language operations such as loops, if-then-else clauses, and array access. All of this processing occurs in a transactional SQL environment where DML statements can be executed to retrieve and modify object data.
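A small sketch of server-side PL/SQL mixing loops, conditionals, and SQL; the state_sales table and its columns are hypothetical.

CREATE OR REPLACE FUNCTION TotalSales(p_state IN VARCHAR2) RETURN NUMBER IS
  v_total NUMBER := 0;
BEGIN
  FOR rec IN (SELECT amount FROM state_sales WHERE state = p_state) LOOP
    IF rec.amount IS NOT NULL THEN
      v_total := v_total + rec.amount;
    END IF;
  END LOOP;
  RETURN v_total;
END TotalSales;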
3.3.3 C and C++

While PL/SQL and Java are comprehensive languages, certain computations such as a Fast Fourier Transform or an image format conversion are handled more efficiently by C programs. With the Oracle8i Server, C language programs can be called from PL/SQL. As shown in Figure 3.3, external programs are executed in an address space separate from the server. This ensures that the database server is insulated from any program failures that might occur in external procedures and that under no circumstances can an Oracle database be corrupted by such failures. In general, the Extensible Server Execution Environment enables an external C routine to be used wherever a PL/SQL subprogram could be called, such as the body of a PL/SQL method for an object type, a database trigger, or a PL/SQL function embedded in an SQL statement. Figure 3.3 shows the process of dispatching an external routine. External routines need not be confined to C; in fact, any external language that is capable of generating a dynamically linked library or shared object file can be used to implement behavior in the Extensible Server Execution service. C++, Pascal, and other languages are all available as implementation choices. With certain reasonable restrictions, external routines can call back to the Oracle Server using the Oracle Call Interface (OCI). Callbacks are particularly useful for processing LOBs. For example, by using callbacks an external routine can perform piecewise reads or writes of LOBs stored in the database. External routines can also use callbacks to manipulate domain indexes stored as index-organized tables (see Section 3.4) in the database. External routines can be procedures or functions (the difference is that functions have a return value and procedures do not).

Figure 3.3  Dispatching external routines: PL/SQL running in the Oracle address space calls out through the listener to an extproc process, which loads the shared library (for example, /data_cartridge_dir/libdatastream.so) in an external address space.
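A hedged sketch of registering such an external routine; the library name, file path, function name, and C entry point are all hypothetical.

CREATE OR REPLACE LIBRARY datastream_lib AS '/data_cartridge_dir/libdatastream.so';

CREATE OR REPLACE FUNCTION FastTransform(x IN BINARY_INTEGER)
RETURN BINARY_INTEGER
AS LANGUAGE C
  LIBRARY datastream_lib
  NAME "fast_transform";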
3.3.4 Safe Execution

Opening up the server-execution environment creates a new problem for component databases. As long as all the operating parts of a database come from one vendor, safety and operational reliability of the data can be achieved relatively easily. However, as database systems evolve into platforms for hosting specialized data and behavior, the end user could see reliability reduced because the least reliable vendor component becomes the weakest link in the chain. One key characteristic of the Oracle Extensibility architecture is that it is always safe for an end user to run cartridges created by third parties on the Oracle platform. PL/SQL and Java are interpreted and therefore safe to run in the server’s address space. Even in the case where Java code is compiled, the native compiler performs various bounds-checking operations to ensure that the generated code is safe to run. Unsafe cartridge code (written in C or C++) is run outside the address space of the database using the extproc mechanism.
3.4 Extensible Indexing and Operators

Typical database management systems support a few types of access methods (e.g., B+-trees and hash indexes) on some set of data types (e.g., numbers and strings). In recent years, databases have been used to store many different types of data such as text, spatial, image, video, and audio. In these complex domains, there is a need for the indexing of complex data types and for specialized indexing techniques. For simple data types such as integers and small strings, all aspects of indexing can be easily handled by the database system. This is not the case for documents, images, video clips, and other complex data types that require content-based retrieval. Complex data types have application-specific formats, indexing requirements, and selection predicates. For example, there are many different document encodings (e.g., Open Document Architecture, ODA; Standard Generalized Markup Language, SGML; and plain text) and information retrieval techniques (e.g., keyword, full-text Boolean, similarity, and probabilistic). Similarly, R-trees are an efficient method of indexing spatial data. No database server can be built with support for all the possible kinds of complex data and indexing. Oracle’s solution is to build an extensible server that provides the application developer with the ability to define new index types (Oracle 1999d). The framework to develop new index types is based on the concept of cooperative indexing, where a data cartridge and the Oracle Server cooperate to build and maintain indexes for data types including text, spatial, and Online Analytical Processing (OLAP). The cartridge is responsible for defining the index structure, maintaining the index content during load and update operations, and searching the index during query processing. The index structure itself can either be stored in the Oracle database (e.g., in heap or index-organized tables) or externally (e.g., in operating system files). However, it is highly desirable for reasons of concurrency control and recovery to have the physical storage of the domain indexes be in the Oracle database. To this end, Oracle8i introduces the concept of an Indextype. The purpose of an Indextype is to enable efficient search and retrieval functions for complex domains such as text, spatial, image, and OLAP. An Indextype is analogous to the sorted or bit-mapped index types that are built into the Oracle Server. The difference is that the routines implementing an Indextype are provided by the cartridge developer, whereas the Oracle Server kernel implements the built-in indexes. Once a new Indextype has been implemented by a data-cartridge developer, end users of the data cartridge can use it just as they would the built-in index types. With extensible indexing, the application defines the structure of the domain index as a new Indextype; stores the index data either inside
the Oracle database (in the form of tables) or outside the Oracle database; and manages, retrieves, and uses the index data to evaluate user queries. When the database server handles the physical storage of domain indexes, cartridges must be able to do the following.
• Define the format and content of an index. This enables cartridges to define an index structure that can accommodate a complex data object.

• Build, delete, and update a domain index. The cartridge handles the building and maintenance of the index structures. Note that this is a significant departure from the automatic indexing features provided for simple SQL data types. Also, because an index is modeled as a collection of tuples, in-place updating is directly supported.

• Access and interpret the content of an index. This capability enables the data cartridge to become an integral component of query processing. That is, the content-related clauses for database queries are handled by the data cartridge.
Traditional database systems do not support extensible indexing. Many applications maintain file-based indexes for complex data residing in relational database tables. A considerable amount of code and effort is required to maintain consistency between external indexes and the related relational data, to support compound queries (involving tabular values and external indexes), and to manage a system (backup, recovery, storage allocation, etc.) with multiple forms of persistent storage (files and databases). By supporting extensible indexes, the Oracle8i Server significantly reduces the level of effort needed to develop solutions involving high-performance access to complex data types. Table 3.1 lists the functionality of the ODCIIndex interface. Before we provide an example of extensible indexing, we introduce a few additional constructs, all necessary to add power and flexibility to the database indexing service. We discuss index-organized tables and function-based indexes, followed by a discussion of user-defined operators, before presenting the example of extensible indexing.
Table 3.1  ODCIIndex interface.

  Index create
    ODCIIndexCreate: Creates the domain index according to user-specified parameters
  Index drop
    ODCIIndexDrop: Drops the domain index
  Index scan
    ODCIIndexStart: Initializes the scan of the domain index
    ODCIIndexFetch: Fetches from the domain index; returns the rowid of each successive row satisfying the operator predicate
    ODCIIndexClose: Ends the current use of the index
  Insert
    ODCIIndexInsert: Maintains the domain index structure when a row is inserted in the indexed table
  Delete
    ODCIIndexDelete: Maintains the domain index structure when a row is deleted
  Update
    ODCIIndexUpdate: Maintains the domain index structure when a row is updated
  Truncate
    ODCIIndexTruncate: Deletes the domain index data, preserving the structure
  Alter
    ODCIIndexAlter: Alters the domain index
  Metadata
    ODCIIndexGetMetaData: Allows import and export of domain-index-specific metadata

3.4.1 Index-Organized Tables

An index-organized table (IOT) is a useful tool in the armory of a cartridge developer using extensible indexing. An IOT differs from an ordinary table in that the data for the table are held in their associated index. In fact, the table and the index are one and the same, so changing the table data, such as adding, updating, or deleting rows, is tantamount to updating the index.

The IOT is like an ordinary table with an index on one or more of its columns, but instead of maintaining two separate storages for the table and the B*-tree index, the database system maintains only a single B*-tree index, which contains both the encoded key value and the associated column values for the corresponding row. Rather than having a row's rowid as the second element of the index entry, the actual data row is stored in the B*-tree index. The data rows are built on the primary key for the table, and each B*-tree index entry contains the pair <primary_key_value, non_primary_key_column_values>. IOTs are suitable for accessing data by the primary key or any key that is a valid prefix of the primary key. There is no duplication of key values because only nonkey column values are stored with the key. One can build secondary indexes to provide efficient access to other columns. Applications manipulate the IOT just like an ordinary table, using SQL statements. However, the database system performs all operations by manipulating the corresponding B*-tree index. Table 3.2 summarizes the main distinctions between access of ordinary tables and IOTs. IOTs can be very useful to developers of domain indexes in that they provide a canned B*-tree index in which to store their data.
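For example, a text cartridge might keep its inverted-index entries in an IOT keyed on the token and the document identifier; the table and column names in this sketch are hypothetical.

CREATE TABLE text_index_data (
  token        VARCHAR2(64),
  docid        NUMBER,
  occurrences  NUMBER,
  CONSTRAINT text_index_pk PRIMARY KEY (token, docid)
) ORGANIZATION INDEX;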
3.4.2 Function-Based Indexing

So far, we have discussed indexing data in various ways. Another intriguing possibility open to data-cartridge developers is the ability to index on behavior.
Table 3.2  Ordinary tables versus index-organized tables.

  Ordinary table:        Rowid-based access
  Index-organized table: Primary key–based access

  Ordinary table:        Physical rowid in ROWID pseudocolumn allows building secondary indexes
  Index-organized table: Logical rowid in ROWID pseudocolumn allows building secondary indexes

  Ordinary table:        Rowid uniquely identifies a row; primary key can be optionally specified
  Index-organized table: Primary key uniquely identifies a row; primary key must be specified

  Ordinary table:        Sequential scan returns all rows
  Index-organized table: Full-index scan returns all rows in primary-key order

  Ordinary table:        UNIQUE constraint and triggers allowed
  Index-organized table: UNIQUE constraint not allowed; triggers allowed

  Ordinary table:        Can be stored in a cluster with other tables
  Index-organized table: Cannot be stored in a cluster
To address the efficient evaluation of a query when the predicate is based on an object method, Oracle8i supports function-based indexes. Users can create indexes on functions (object methods) and expressions that involve one or more columns in the table being indexed. A function-based index precomputes the value of the function or expression and stores it in the index. A function-based index is created as either a B*-tree or bit-map index. The function used for building the index can be an arithmetic expression or an expression that contains an object type method or a stand-alone SQL function. In order for the function to be indexed, it must be declared deterministic; that is, it must return the same result when invoked with the same set of parameter values. Function-based indexes provide an efficient mechanism for evaluating SQL statements that contain functions in their WHERE clauses. One can create a function-based index to materialize computation-intensive expressions in the index, so that Oracle does not need to compute the value of the expression when processing SELECT or DELETE statements. However, when processing INSERT and UPDATE statements, Oracle must still evaluate the function to process the statement. Suppose a table contains all purchase-order objects, and suppose TotalValue is a method defined for the purchase_order type that returns the total value of a purchase-order object by summing up the values of the individual line items of the purchase order. Then the index

CREATE INDEX TotalValueIndx ON purchase_order_table p (p.TotalValue());
can be used instead of evaluating the TotalValue method, for example, when processing queries such as

SELECT  p.order_id
FROM    purchase_order_table p
WHERE   p.TotalValue() > 10000;
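The deterministic requirement also applies to stand-alone functions. The following sketch (the products table, its columns, and the NetPrice function are hypothetical) shows a deterministic function and a function-based index built on it.

CREATE OR REPLACE FUNCTION NetPrice(list_price NUMBER, discount NUMBER)
RETURN NUMBER DETERMINISTIC IS
BEGIN
  RETURN list_price * (1 - discount);
END NetPrice;

CREATE INDEX net_price_idx ON products (NetPrice(list_price, discount));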
The ability to build functional indexes thus extends the database indexing service in a fundamental way.
3.4.3 User-Defined Operators

Data-cartridge developers find it useful to define domain-specific operators and integrate them into the Oracle8i Server along with extensible indexing schemes that such operators take advantage of while accessing data. The ability to increase the semantics of the query language by adding such domain-specific operators is akin to extending the query service of the database. Oracle8i provides a set of predefined operators that include arithmetic operators (+, -, *, /), comparison operators (=, >, <), and logical operators (NOT, AND, OR). These operators take as input one or more arguments (or operands) and return a result. Oracle8i allows users to extend the set of operators by defining new ones with user-specified behavior. Like built-in operators, they take a set of operands as input and return a result. The implementation of the operator is provided by the user. After a user has defined a new operator, it can be used in SQL statements like any other built-in operator. For example, if the user defines a new operator Contains, which takes as input a text document and a query string and returns TRUE if the document satisfies the specified query string, we can write an SQL query as

SELECT  *
FROM    Employees
WHERE   Contains(resume, 'Oracle AND Unix');
Oracle8i uses indexes to efficiently evaluate some built-in operators; for example, a B-tree index can be used to evaluate the comparison operators =, >, and <. Similarly, in Oracle8i, user-defined domain indexes can be used to efficiently evaluate user-defined operators. In general, user-defined operators are bound to functions. However, operators can also be evaluated using indexes. For instance, the equality operator can be evaluated using a hash index. An Indextype provides an index-based implementation for the operators listed in the Indextype definition. An operator binding identifies the operator with a unique signature (via argument data types) and allows associating with it a function that provides an implementation for the operator. The Oracle8i Server executes the function when the operator is invoked. Multiple operator bindings can be defined as long as they differ in their signatures. Thus,
any operator can have an associated set of zero or more bindings. Each of these bindings can be evaluated using a user-defined function, which can be a stand-alone function, a package function, or an object member method. User-defined operators can be invoked anywhere built-in operators can be used, that is, wherever expressions can occur. For example, user-defined operators can be used in the SELECT list of a select command, the condition of a WHERE clause, or ORDER BY and GROUP BY clauses. When an operator is invoked, its evaluation is transformed into the execution of one of the functions bound to it. This transformation is based on the data types of the arguments to the operator. If none of the functions bound to the operator satisfies the signature with which the operator is invoked, an error occurs. Implicit type conversions may be applied during the transformation process. Example 2 illustrates the extensible indexing and user-defined operator framework.
EXAMPLE 2
Extensible Indexing and Operators

Consider a text-retrieval application. For such applications, indexing involves parsing the text and inserting the words or tokens into an inverted index. Such index entries typically have the following logical form:

(token, <docid, data>)

where token is a word or stem that is a term in searches, docid is a unique identifier for a document this word occurs in, and data is a segment containing information on how many times or where in the document the word occurs. A sample index entry for such an application would look like

(Ulysses, <5, 3, [7, 62, 225]>, <26, 2, [33, 49]>, ...)
In this sample index entry, the token Ulysses appears in document 5 at 3 locations (7, 62, and 225) and in document 26 at 2 locations (33 and 49). Note that the index would contain one entry for every document with the word Ulysses.
3.5 Defining a Text-Indexing Scheme

A sequence of four steps is required to define a text-indexing scheme using a text Indextype. First, define and code functions to support the functional implementation of operators that eventually would be supported by the text Indextype. Suppose our text-indexing scheme is in the context of a text data cartridge that intends to support an operator Contains. The operator Contains takes as parameters a text value and a key, and returns a
Boolean value indicating whether the text contains the key. The functional implementation of this operator is a SQL function defined as

CREATE FUNCTION TextContains(Text IN VARCHAR2, Key IN VARCHAR2)
RETURN BOOLEAN AS
BEGIN
  .......
END TextContains;
Second, create a new operator and define its specification, namely the argument and return data types, and the functional implementation:

CREATE OPERATOR Contains
  BINDING (VARCHAR2, VARCHAR2) RETURN BOOLEAN
  USING TextContains;
Third, define a type or package that implements ODCIIndex. This involves implementing routines for index definition, index maintenance, and index scan operations. The index definition routines (ODCIIndexCreate, ODCIIndexAlter, ODCIIndexDrop, and ODCIIndexTruncate) build the text index when the index is created, alter the index information when the index is altered, remove the index information when the index is dropped, and truncate the text index when the base table is truncated. The index maintenance routines (ODCIIndexInsert, ODCIIndexDelete, and ODCIIndexUpdate) maintain the text index when the table rows are inserted, deleted, or updated. The index scan routines (ODCIIndexStart, ODCIIndexFetch, ODCIIndexClose) implement access to the text index to retrieve rows of the base table that satisfy the operator predicate. In this case, Contains(...) forms a Boolean predicate whose arguments are passed into the index scan routines. The index scan routines scan the text index and return the qualifying rows to the system.

CREATE TYPE TextIndexMethods (
  FUNCTION ODCIIndexCreate(...)
  ...
);
CREATE TYPE BODY TextIndexMethods (
  ...
);
Fourth, create the text Indextype schema object. The Indextype definition also specifies all the operators supported by the new Indextype and the type that implements the index interface.
CREATE INDEXTYPE TextIndexType
  FOR Contains(VARCHAR2, VARCHAR2)
  USING TextIndexMethods;
3.5.1 Using the Text-Indexing Scheme

Suppose that the text Indextype presented in the previous section has been defined in the system. The user can define text indexes on text columns and use the associated Contains operator to query the text data. Further, suppose an Employees table is defined as follows:

CREATE TABLE Employees (
  name    VARCHAR2(64),
  id      INTEGER,
  resume  VARCHAR2(2000)
);
A text-domain index can be built on the resume column as follows:

CREATE INDEX ResumeIndex ON Employees(resume)
  INDEXTYPE IS TextIndexType;
The text data in the resume column can be queried as

SELECT * FROM Employees WHERE Contains(resume, 'Oracle');
The query execution will use the text index on resume to efficiently evaluate the Contains() predicate.
3.6 Extensible Optimizer

The extensible optimizer functionality enables the authors of user-defined functions and indexes to create statistics collection, selectivity, and cost functions (Oracle 1999d). This information is used by the optimizer in choosing a query plan. The cost-based optimizer is thus extended to use the user-supplied information. The optimizer generates an execution plan for a SQL statement (for simplicity, consider a SELECT statement—this also applies to other statements). An execution plan includes an access method for each table in the FROM clause, and an ordering (called the join order) of the tables in the FROM clause. System-defined access methods include indexes, hash clusters, and table scans. The optimizer chooses a plan by generating a set of join orders or permutations, computing the cost of each, and selecting the one with the lowest cost. For each table in the join order, the optimizer computes the cost of each possible access and join method, choosing the one with the lowest cost. The cost of the join
order is the sum of the access method and join method costs. The costs are calculated using algorithms that together compose the cost model. A cost model can include varying levels of detail about the physical environment in which the query is executed. Oracle’s present cost model includes only the number of disk accesses with minor adjustments to compensate for the lack of detail. The optimizer uses statistics about the objects referenced in the query to compute the costs. The statistics are gathered using the ANALYZE command. The optimizer uses these statistics to calculate cost and selectivity. The selectivity of a predicate is the fraction of rows in a table that will be chosen by the predicate. Extensible indexing functionality enables users to define new operators, index types, and domain indexes. For such user-defined operators and domain indexes, the extensible optimizer gives data-cartridge developers control over the three main components used by the optimizer to select an execution plan: statistics, selectivity, and cost. We look at each of these components in more detail.
3.6.1 Statistics

The ANALYZE command is extended so that whenever a domain index is to be analyzed, a call is made to the cartridge-specified statistics collection function. The representation and meaning of these user-collected statistics are not known to the database. In addition to domain indexes, cartridge-defined statistics collection functions are also supported for individual columns of a table and data types (whether built-in types or object types). In the former case, whenever a column is analyzed, in addition to the standard statistics collected by the database, the user-defined statistics collection function is called to collect additional statistics. If a statistics collection function exists for a data type, it is called for each column of the table being analyzed of the specified type. For example, the following statement associates a statistics type ImgStats_t, which implements the statistics collection function ODCIStatsCollect, with the image column imgcol of the table tab.

ASSOCIATE STATISTICS WITH COLUMNS tab.imgcol USING ImgStats_t;
The following ANALYZE command collects statistics for the tab table. In addition to the usual statistics collected by Oracle, the ODCIStatsCollect function is invoked to collect extra statistics for the imgcol column.

ANALYZE TABLE tab COMPUTE STATISTICS;
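Statistics can also be associated with a data type rather than a single column, so that every analyzed column of that type invokes the same collection function; assuming the Image type of Example 1 and the ImgStats_t statistics type mentioned above, the association might look like this.

ASSOCIATE STATISTICS WITH TYPES Image USING ImgStats_t;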
EXAMPLE 3
Statistics

Consider images stored in a BLOB column in a table created as

CREATE TABLE ImgTab (
  name    VARCHAR2(100),
  imgcol  BLOB
);
The following statement associates a statistics type ImgStats_t, which implements the statistics collection function ODCIStatsCollect, with the image column imgcol of the table ImgTab.

ASSOCIATE STATISTICS WITH COLUMNS ImgTab.imgcol USING ImgStats_t;
The type ImgStats_t implements the ODCIStats interface.

CREATE TYPE ImgStats_t AS OBJECT (
  STATIC FUNCTION ODCIStatsCollect(...)
  ...
);
The following ANALYZE command collects statistics for the ImgTab table. In addition to the usual statistics collected by Oracle, the ODCIStatsCollect function is invoked to collect extra statistics for the imgcol column. It could collect any relevant information regarding the images. For example, the average size of the images is a useful indicator of the time required to process the images.

ANALYZE TABLE ImgTab COMPUTE STATISTICS;
3.6.2 Selectivity

The optimizer uses statistics to calculate the selectivity of predicates. The selectivity is the fraction of rows in a table that will be chosen by the predicate and is a number between 0 and 1. The selectivity of a predicate is used to estimate the cost of a particular access method. It is also used to determine the optimal join order. A poor choice of join order by the optimizer could result in a very expensive execution plan. By default, the optimizer uses a standard algorithm to estimate the selectivity of selection and join predicates. However, the algorithm does not work very well when predicates contain functions or type methods. In addition, in Oracle8i predicates can contain user-defined operators about which the optimizer does not have any information and so cannot compute an accurate selectivity.
For greater control over the optimizer’s selectivity estimation, the extensible optimizer enables data-cartridge developers to specify userdefined selectivity functions for predicates containing user-defined operators, stand-alone functions, package functions, or type methods. The user-defined selectivity function will be called by the optimizer whenever it encounters a predicate with one of the following forms: op(...) relop relop op(...) op(...) LIKE
where op(...) is a user-defined operator, stand-alone function, package function, or type method; relop is any of the standard comparison operators (<, <=, =, >=, >); and is a constant value expression. For such cases, data cartridges can define selectivity functions that will be associated with op(...). The arguments to op can be columns, constants, bind variables, or attribute references. When such a predicate is encountered, the optimizer will call the user-defined selectivity function and pass the entire predicate as an argument including the operator, function, or type method and its arguments; the relational operator relop; and the constant expression or bind variable. The return value of the user-defined selectivity function must be between 0 and 1, inclusive; values outside this range are ignored by the optimizer. Typically, the arguments and the statistics collected are used to estimate the selectivity of an operation.
EXAMPLE 4
Selectivity

Consider a function Brightness() defined on images that returns a value between 0 and 100 to indicate the level of brightness. A selectivity function can be associated by implementing the ODCIStatsSelectivity function within a type, say ImgStats_t, and executing the following statement:

ASSOCIATE STATISTICS WITH FUNCTIONS Brightness USING ImgStats_t;
Now, if a user executes a query of the form

SELECT * FROM ImgTab WHERE Brightness(imgcol) BETWEEN 50 AND 60;
the selectivity of the predicate is computed by invoking the user-supplied implementation of the ODCIStatsSelectivity function.
3.6.3 Cost

The optimizer estimates the cost of various access paths to choose an optimal plan. For example, it may compute the cost of using an index and a full table scan in order to choose between the two. However, for data-cartridge-defined domain indexes, the optimizer does not know the internal storage structure of the index. Thus, the optimizer cannot make a good estimate of the cost of using such an index. Similarly, the default optimizer model assumes that the cost of I/O dominates and that other activities such as function evaluations have zero cost. This is true only for relatively inexpensive built-in functions. User-defined functions can be very expensive because they can be CPU-intensive. User-defined functions can invoke recursive SQL. When the function argument is a file LOB, there may be substantial I/O cost. For superior optimization, the cost model is extended to enable users to define costs for domain indexes, user-defined functions, standard stand-alone functions, package functions, and type methods. The user-defined costs can be in the form of default costs that the optimizer simply looks up, or can be full-blown cost functions that the optimizer calls to compute the cost. User-defined cost, like user-defined selectivity, is optional on the part of a data cartridge. If no user-defined cost is available, the optimizer uses its internal heuristics to compute an estimate. However, in the absence of useful information about the storage structures in user-defined domain indexes and functions, such estimates can be very inaccurate and may result in the choice of a suboptimal execution plan. User-defined cost functions for domain indexes are called by the optimizer only if a domain index is a valid access path for a user-defined operator. User-defined cost functions can return three parameters. Each parameter represents the cost of a single execution of a function or domain index implementation:

1. cpu: the number of machine instructions executed by the function or domain index implementation.

2. I/O: the number of data blocks read by the function or domain index implementation.

3. network: the number of data blocks transmitted. This is valid for distributed queries as well as functions and domain index implementations.
EXAMPLE 5
Cost

Consider again the query involving the Brightness() function introduced in Example 4.

SELECT * FROM ImgTab WHERE Brightness(imgcol) BETWEEN 50 AND 60;
The optimizer will invoke the ODCIStatsFunctionCost() function implemented within ImgStats_t to estimate the cost of executing the Brightness() function. Typically, the cost function retrieves the user-collected statistics (in this case, the average length of the image column) and computes the cost of one invocation of the function. Table 3.3 describes the ODCIStats interface.

Table 3.3  ODCIStats interface.

  Statistics
    ODCIStatsCollect: Collects statistics for column and index data
    ODCIStatsDelete: Drops statistics
  Selectivity
    ODCIStatsSelectivity: Estimates the selectivity of a predicate involving user-defined functions or operators
  Cost
    ODCIStatsFunctionCost: Accepts information about the function parameters and computes the cost of a single execution
    ODCIStatsIndexCost: Accepts information about the operator predicate and computes the cost of the domain index scan
3.7 User-Defined Aggregates

Typical database systems support a small number of aggregate operators (e.g., MAX and MIN) over scalar data types such as number and character strings. However, it is desirable to provide the capability of adding new aggregate operators to the DBMS. For instance, a new aggregate operator SecondMax(), which ignores the highest value and returns the second highest, might be necessary in some application. Further, in complex domains, the semantics of aggregation is not known to the DBMS and has to be provided by the domain code. For instance, MAX() in the spatial domain may refer to the geometry with the largest enclosed area. User-defined aggregate operators (UDAGs) are the mechanisms that incorporate new aggregate operators with user-specified aggregation semantics. Users can specify a set of routines that implements the aggregation logic. Once the aggregate operator is registered with Oracle, it can be used wherever built-in aggregates can be used. An aggregate operator operates over a set of rows and returns a single value. The sets of rows for aggregation are typically identified using a GROUP BY clause. For example,

SELECT AVG(T.Sales) FROM AnnualSales T GROUP BY T.State
Conceptually, an aggregate value is computed in three steps. Using the example AVG(), the steps are as follows.

1. Initialize: Initialize the computation. Action: Assign 0 to the runningSum and runningCount variables.

2. Iterate: Iteratively examine each of the tuples and perform necessary computations. Action: Add the input number value to the runningSum variable, and increase the runningCount variable by 1.

3. Terminate: Compute the resulting value. Action: Compute the average as (runningSum/runningCount).

New aggregate operators can be registered by providing the implementations for these aggregation steps. The variables runningSum and runningCount in the steps determine the state of the aggregation. We can think of the state as a state variable, an object that contains runningSum and runningCount as elements. Thus, the Initialize function initializes the state variable, Iterate updates it, and Terminate uses the state variable to return the resultant value. Note that the state variable completely determines the state of the aggregation. In the presence of a GROUP BY clause in the query, this sequence of steps is performed for every group. Thus, with user-defined aggregate operators, the application defines the set of implementations of the UDAG and specifies each of the implementations in terms of the implementation routines and the data types of the argument values and return value. The application then creates the implementation routines in C++ or Java. Finally, the application uses the UDAG in SQL query statements.
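The following anonymous PL/SQL block is only an illustration of those three steps for AVG(), written outside the aggregation framework to make the state variable explicit; AnnualSales and its Sales column are the hypothetical table used in the query above.

DECLARE
  runningSum    NUMBER := 0;   -- Initialize: set up the aggregation state
  runningCount  NUMBER := 0;
  avgSales      NUMBER;
BEGIN
  FOR rec IN (SELECT Sales FROM AnnualSales) LOOP
    runningSum   := runningSum + rec.Sales;    -- Iterate: fold each row into the state
    runningCount := runningCount + 1;
  END LOOP;
  avgSales := runningSum / runningCount;       -- Terminate: produce the result
END;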
EXAMPLE 6
User-Defined Aggregates

An application requires the use of a TopTen aggregate operator, which has the behavior: the aggregate operator, computed over a set of values, determines the 10 largest values and returns a single object having the 10 values as its components. The end user would like to use the TopTen operator in queries such as the following.

SELECT TopTen(A1.DollarSales)
FROM AnnualSales A1
GROUP BY A1.State
The TopTen aggregate operator is implemented in the following steps.

1. Specify the aggregate operator in terms of its implementations, and specify the data types of the argument values and return value for each implementation as follows.

CREATE OPERATOR TopTen AS AGGREGATE BINDING (NUMBER)
  RETURN TenNUMBER
  USING TopTenNumber;
where TenNUMBER is a collection type capable of holding 10 values and is the return type of the aggregate operator implementation.

2. Implement the aggregate operator implementation routines in any language supported by Oracle for type methods (e.g., PL/SQL, C, C++, or Java). The implementation routines for aggregation are member methods of the TopTenNumber object type.

CREATE TYPE TopTenNumber (
  FUNCTION ODCIAggregateInitialize(...)
  ...
);
CREATE TYPE BODY TopTenNumber (
  FUNCTION ODCIAggregateInitialize() AS
  BEGIN
    ...
  END ODCIAggregateInitialize;
  ...
);
Table 3.4 describes the ODCIAggregate interface.
Table 3.4  ODCIAggregate interface.

Action                  Interface Routines        Description
Serial aggregation      ODCIAggregateInitialize   Initializes aggregation context
                        ODCIAggregateIterate      Accepts next batch of rows and updates aggregation context
                        ODCIAggregateTerminate    Returns aggregate value
Parallel aggregation    ODCIAggregateParInit      Initializes parallel aggregation context
                        ODCIAggregateParIter      Accepts next batch of rows and updates the parallel aggregation context
                        ODCIAggregateParTerm      Returns the final parallel aggregation context
                        ODCIAggregateSuperAggr    Accepts set of aggregation contexts and returns aggregate value
3.7.1 Using User-Defined Aggregates

User-defined aggregate operators can be used in DML statements much like built-in aggregates. The evaluation of the UDAG triggers the invocation of the underlying user-supplied routines to compute the aggregate value. For example,
SELECT TopTen(A1.DollarSales)
FROM AnnualSales A1
GROUP BY A1.State
triggers the invocation of the initialization routine followed by one or more invocations of the iteration routine followed by an invocation of the termination routine.
3.8 Abstract Tables

A user might occasionally need to access data that is outside any database. Such data may have a certain structure, albeit different from the structure of relational or object-relational databases. For example, some data may be stored as Extensible Markup Language (XML) files on a file system. As part of data-cartridge operations, it is sometimes important to perform composite operations that span such external data as well as internal, tabular data. The best way to combine such different structures is by providing data-cartridge developers the ability to create a uniform table metaphor over all data by constructing abstract tables corresponding to the external sources.

An abstract table is a virtual table whose data are retrieved by invoking user-registered functions. In its simplest form, the user implements iteration routines to scan the rows of the table. However, abstract tables provide a firm framework for supporting user-defined data manipulation as well. The user can implement routines that will be invoked when the abstract table is created or dropped and when its rows are inserted, deleted, or updated. Abstract tables also support other table features such as building secondary indexes and defining constraints and triggers.

An abstract table is created in a manner similar to regular tables, the difference being that the name of the underlying implementation type is specified. The implementation type contains the implementations of the ODCITable interface, consisting of the table scan and manipulation routines. The ODCITable interface is listed in Table 3.5.
EXAMPLE 7
Abstract Tables

Suppose we want to read employee data that originates from a set of operating system files.

CREATE ABSTRACT TABLE emps_filetab (
  name VARCHAR2(30),
  id   NUMBER,
  mgr  VARCHAR2(30)
)
USING LoaderFileReader PARAMETERS ('/tmp/emp.dat');
The abstract table definition of emps_filetab specifies the name of the object type LoaderFileReader that implements the ODCITable interface routines. The ODCITable interface includes the create, drop, DML, and scan routines. The PARAMETERS clause can be used to pass user-defined parameters to the ODCITableCreate routine; in this example, the path to the loader file is specified. Once the abstract table has been created, it can be used exactly like regular tables for creating indexes, performing DML operations and queries, and so forth.

The ODCITable interface contains the definitions of all the routines that need to be implemented by the abstract table implementor. Note that the definitions are static and are specified by Oracle, but their actual implementations must be provided by the user. The ODCITable interface consists of three classes of routines.

• Query routines. These are the set of routines that are invoked by Oracle while querying abstract table data. These are also the same set of routines that are implemented for table functions.
• DML routines. These are the set of routines that are invoked by Oracle to insert, delete, and update rows of the abstract table.
• DDL routines. These are the set of routines that are invoked by Oracle when an abstract table is created or dropped.

The implementations of all the ODCITable interface routines are provided by the user in the form of an object type. For example, the type LoaderFileReader contains the implementation of the DDL and query routines for the emps_filetab abstract table.

CREATE TYPE LoaderFileReader AS OBJECT (
  MEMBER FUNCTION ODCITableStart(...) RETURN NUMBER;
  MEMBER FUNCTION ODCITableFetch(...) RETURN NUMBER;
  MEMBER FUNCTION ODCITableClose(...) RETURN NUMBER;
);
CREATE TYPE BODY LoaderFileReader AS
  ...
END;
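To make the division of labor among the scan routines concrete, here is a minimal PL/SQL sketch of a file-reading scanner in the spirit of LoaderFileReader. The routine names, the OUT-parameter style, the 0/1 return convention, and the package-level scan context are all simplifying assumptions made for this sketch; they are not the actual ODCITable signatures, which are defined by Oracle.

-- Hypothetical scan context: one open file per session, for simplicity.
CREATE OR REPLACE PACKAGE emp_file_scan_ctx AS
  f UTL_FILE.FILE_TYPE;
END;
/
CREATE OR REPLACE TYPE EmpFileScanner AS OBJECT (
  dummy NUMBER,
  -- Start: open the loader file named by the creation parameters.
  MEMBER FUNCTION ScanStart(dir VARCHAR2, fname VARCHAR2) RETURN NUMBER,
  -- Fetch: return the next row; 0 = row produced, 1 = end of scan (assumed convention).
  MEMBER FUNCTION ScanFetch(name OUT VARCHAR2, id OUT NUMBER, mgr OUT VARCHAR2) RETURN NUMBER,
  -- Close: release the file handle at the end of the scan.
  MEMBER FUNCTION ScanClose RETURN NUMBER
);
/
CREATE OR REPLACE TYPE BODY EmpFileScanner AS
  MEMBER FUNCTION ScanStart(dir VARCHAR2, fname VARCHAR2) RETURN NUMBER IS
  BEGIN
    emp_file_scan_ctx.f := UTL_FILE.FOPEN(dir, fname, 'r');
    RETURN 0;
  END;

  MEMBER FUNCTION ScanFetch(name OUT VARCHAR2, id OUT NUMBER, mgr OUT VARCHAR2) RETURN NUMBER IS
    line VARCHAR2(4000);
    p1   PLS_INTEGER;
    p2   PLS_INTEGER;
  BEGIN
    UTL_FILE.GET_LINE(emp_file_scan_ctx.f, line);   -- read the next 'name,id,mgr' record
    p1   := INSTR(line, ',', 1, 1);
    p2   := INSTR(line, ',', 1, 2);
    name := SUBSTR(line, 1, p1 - 1);
    id   := TO_NUMBER(SUBSTR(line, p1 + 1, p2 - p1 - 1));
    mgr  := SUBSTR(line, p2 + 1);
    RETURN 0;
  EXCEPTION
    WHEN NO_DATA_FOUND THEN
      RETURN 1;                                     -- end of file ends the scan
  END;

  MEMBER FUNCTION ScanClose RETURN NUMBER IS
  BEGIN
    UTL_FILE.FCLOSE(emp_file_scan_ctx.f);
    RETURN 0;
  END;
END;
/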
When the user executes a query over the abstract table, a scan is set up to iterate over the rows of the table.

SELECT * FROM emps_filetab;
The integrity and consistency constraints across the external data sources can be enforced through additional sets of extensible APIs, which are not elaborated in this chapter.
Table 3.5  ODCITable interface.

Action               Interface Routines   Description
Table scan           ODCITableFetch       Accepts the scan context and returns the next set of rows
                     ODCITableClose       Performs cleanup at the end of scan
Rowid access         ODCITableLookup      Retrieves row corresponding to the specified row identifier
Describe row         ODCITableDescribe    Returns the metadata descriptor of a row
Query rewrite        ODCITableRewrite     Returns a SQL query string that can be plugged into the original query in place of the abstract table
Data manipulation    ODCITableInsert      Inserts a new row into the table
                     ODCITableDelete      Deletes a row from the table
                     ODCITableUpdate      Updates a row of the table
Data definition      ODCITableCreate      Processes parameters specified while creating the abstract table
                     ODCITableDrop        Performs cleanup when the abstract table is dropped
3.8.1 Table Functions

Data-cartridge developers may also need to generate table data dynamically. The extensibility architecture provides this capability through iterative table functions that are complementary to abstract tables. There are scenarios in which data are inherently dynamic and depend on user-supplied parameters. Such data cannot be modeled easily using abstract tables. For example, one might want to access data present in an external web site (identified by a uniform resource locator, URL). A function ReadURL() can be implemented to take the URL as input and return a collection representing the read data.

SELECT * FROM TABLE(ReadURL('www.oracle.com'));
This solution has some drawbacks. The entire result set is returned from ReadURL() as a single collection. This not only affects the response time of the query, but also consumes more resources. Table functions are the right answer to this problem. They are similar to regular functions returning collections, except that the result is returned iteratively (i.e., in subsets). This is accomplished by implementing the table scan routines in the ODCITable interface.
EXAMPLE 8
Table Functions

The following statement creates a table function GetURL(), which is implemented to return the result collection iteratively.

CREATE FUNCTION GetURL(url VARCHAR2) RETURN ROWSET
  ITERATE USING GetURLMethods;
The type GetURLMethods implements the table scan routines of the ODCITable interface.

CREATE TYPE GetURLMethods AS OBJECT (
  STATIC FUNCTION ODCITableStart(...) ...
  MEMBER FUNCTION ODCITableFetch(...) ...
  MEMBER FUNCTION ODCITableClose(...) ...
);
Now, when the user executes a query of the form

SELECT * FROM TABLE(GetURL('www.oracle.com'));
the table scan routines are appropriately invoked by Oracle to retrieve the rows representing the data contained in the web site. The differences between abstract tables and table functions lie in the dynamism of data and the set of allowed operations. Table 3.6 shows some of the key differences. In other words, abstract tables are a more general mechanism, but do not provide the user the ability to specify parameters at query time. Table functions are a simplified mechanism to provide user-defined iterators in such cases.
Table 3.6  Table functions versus abstract tables.

                              Table Functions    Abstract Tables
Query-time parameters         Yes                No
Create-time parameters        No                 Yes
Indexes                       No                 Yes
Constraints and triggers      No                 Yes
Inserts/deletes/updates       No                 Yes
Partitioning specification    No                 Yes
3.9 Cartridge Basic Services

In order to develop and deploy full-fledged cartridges, Oracle's Extensibility Architecture provides a set of commonly needed routines. These form a library that not only assists in the development of data cartridges but also facilitates intercartridge coordination. Typically, these basic services are exposed as OCI routines that can be invoked by a cartridge. The cartridge service interfaces include memory management, parameter management, internationalization, error reporting, context management, and file I/O.
3.9.1 Memory Management

The Memory Management Interfaces contain support for allocating permanent and freeable memory of several durations (session, statement, user call to server, and server call to cartridge), re-allocating memory, allocating subduration memory, allocating large contiguous memory, and so forth.
3.9.2 Parameter Management

The parameter manager provides a set of routines to process parameters from a file or a string. Routines are provided to process the input and to obtain key and value pairs. These key and value pairs are stored in memory and can be accessed via certain routines. The input-processing routines match the contents of the file or the string against an existing grammar and compare the key names found in the input against the list of known keys that the user has registered. The behavior of the input-processing routines can also be configured.
3.9.3 Internationalization

To support multilingual applications, national language support (NLS) functionality is required by the cartridges. The National Language Support Run-Time Library (NLSRTL) is a multiplatform and multilingual library that is currently used by Oracle object-relational database management systems (ORDBMSs) and provides consistent NLS behavior to all Oracle products. The basic NLS services are available to cartridge developers in the form of interfaces for the following functionalities: locale-information retrieval; string manipulation in the format of multibyte and wide-char; character-set conversion, including Unicode; and messaging.
3.9.4 Error Reporting

The cartridge code can return errors or raise exceptions that are handed back to the Oracle server. There are service routines to raise errors, register error messages, and manipulate the error stack.
3.9.5 Context Management

Context management allows clients to maintain context across calls to a cartridge. The context maintained by a cartridge could be based on a duration such as SESSION, STATEMENT, or CALL. The cartridge services provide a mechanism for saving and retrieving contexts.
3.9.6 File I/O

The OCI file I/O package is designed to make it easier to write portable code that interacts with the file system by providing a consistent view of file I/O across multiple platforms.
3.10 Case Studies

The extensibility services and interfaces available in Oracle8i have been used by Oracle to create some commonly useful data cartridges. This section discusses the implementations of these data cartridges with a view to shedding more light on the Extensibility Architecture and its benefits.
3.10.1 The Oracle8i interMedia Text Data Cartridge

The Oracle8i interMedia Text Cartridge supports full-text indexing of text documents (Oracle 1999f). The text index is an inverted index storing the occurrence list for each token in each of the text documents. The inverted index is stored in an index-organized table and is maintained by performing insert, update, and delete on the table whenever the table on which the text index is defined is modified. The text cartridge defines an operator Contains, which takes as input a text column and a keyword and returns true or false depending on whether the keyword is contained in the text column. The benefits of the extensible indexing framework can be seen by analyzing the execution of the same text query in releases prior to Oracle8i and in Oracle8i itself.
EXAMPLE 9
Consider the query:

SELECT * FROM docs WHERE Contains(resume, 'Oracle');
In releases prior to Oracle8i, the text-indexing code, although logically a part of the Oracle Server, was not known by the query optimizer to be a valid access path. As a result, text queries were evaluated as a two-step process:

1. The text predicate is evaluated. The text index is scanned and all the rows satisfying the predicate are identified. The row identifiers of all the relevant rows are written out into a temporary result table, say results.
2. The original query is rewritten as a join of the original query (minus the text operator) and the temporary result table containing row identifiers for rows that satisfy the text operator, as follows:

SELECT d.*
FROM   docs d, results r
WHERE  d.rowid = r.rid;
In Oracle8i, using the extensible indexing framework, the query is executed in a single step in a pipelined fashion. The text-indexing code is invoked at the appropriate times by the kernel. There is no need for a temporary result table because the relevant row identifiers are streamed back to the server via the ODCI interfaces. This also implies that there are no extra joins to be performed in this execution model. Further, not all rows that satisfy the text predicate have to be identified before the first result row can be returned to the user. The performance of text queries has improved for several reasons: reduced I/O, because there is no temporary result table; improved response time, because the rows satisfying the text predicate can be identified on demand; and better query plans, because there are no extra joins with temporary result tables (a decrease in the number of joins typically improves the effectiveness of the optimizer due to the reduced search space). We have observed as much as a 10-fold improvement in performance for certain search-intensive queries after the integration using the extensible indexing framework.
3.10.2 The Oracle8i Spatial Data Cartridge

The Oracle8i Spatial cartridge allows users to store, spatially index, and query spatial data (Oracle 1999i). The spatial data is modeled as an object type SDO_GEOMETRY. The coordinate values describing a geometry are stored as a collection attribute within the object type. A spatial index can be built on an SDO_GEOMETRY column. The spatial index consists of a collection of tiles corresponding to every spatial object and is stored in an Oracle table. The spatial index is an instance of a spatial Indextype that defines the routines for creating, maintaining, and querying the spatial index. The spatial Indextype supports an
operator called Overlaps, which determines which geometries in two given layers overlap with each other.
EXAMPLE 10
A spatial query is of the form

SELECT r.gid, p.gid
FROM   roads r, parks p
WHERE  Overlaps(r, p);
The extensible indexing framework has greatly improved the usability of the Oracle Spatial cartridge. Prior to Oracle8i, the user had to explicitly invoke PL/SQL package routines to create an index or to maintain the spatial index following a DML operation on the base spatial table. With this framework, the spatial index is maintained implicitly by the server just like a built-in index. Also, with the extensible indexing framework the logic of using the index to process queries is encapsulated in the Indextype routines, and the end user is not burdened with any details of the index implementation. Prior to Oracle8i the spatial query had to be formulated as follows:

SELECT DISTINCT r.gid, p.gid
FROM   roads_sdoindex r, parks_sdoindex p
WHERE  (r.grpcode = p.grpcode) AND
       (r.sdo_code BETWEEN p.sdo_code AND p.sdo_maxcode OR
        p.sdo_code BETWEEN r.sdo_code AND r.sdo_maxcode);
The drawbacks of this approach are that the querying algorithm, which may be proprietary, has to be exposed to the user; that the entire logic has to be expressed as a single SQL statement; and that the user is expected to learn the details of the storage structures for the index. In addition to vastly simplifying the queries, the Oracle8i framework allows changing the underlying spatial indexing algorithms without requiring the end users to change their queries. The performance of spatial queries using the extensible indexing framework has been as good as the performance of the prior implementation.
3.10.3 The Oracle8i Visual Information Retrieval Data Cartridge

The Oracle8i Visual Information Retrieval (VIR) cartridge supports content-based retrieval of images (Oracle 1999k). An image is modeled as the ORDImage object type. A BLOB attribute stores the raw bytes of the image. The image cartridge supports building image indexes. For the purposes of building an index, each image is transformed into a signature, which is an abstraction of the contents of the image in terms of its
visual attributes. A set of numbers that is a coarse representation of the signature is then stored in a table representing the index data. The cartridge supports an operator Similar that searches for images similar to a query image. The benefits of extensible indexing can be seen by analyzing the execution of the same image query in releases prior to Oracle8i and in Oracle8i itself.
EXAMPLE 11
Consider the query:

SELECT *
FROM   images T
WHERE  VIRSimilar(T.img.Signature, querySignature,
                  'globalcolor=0.5, localcolor=0.0, texture=0.5, structure=0.0', 10, 1);
In releases prior to Oracle8i, the image cartridge had no indexing support. Hence, the operator was evaluated as a table scan, and the image comparison had to be done for every row. In Oracle8i, using the extensible indexing framework, the VIRSimilar operator can be evaluated in three phases: the first phase is a filter that does a range query on the index data table, the second phase is another filter that computes the distance measure, and the third phase does the actual image comparison. Thus, the complex problem of high-dimensional indexing is broken down into several simpler components. Also, the first two passes of filtering are very selective and greatly reduce the data set on which the image comparisons need to be performed. The performance of image queries has improved because the multilevel filtering process replaces an image comparison for every row and because the range query on the index data table is itself optimized using indexes. Thus, in Oracle8i, it is now possible to do image comparisons on tables storing millions of rows, something that was not possible in prior releases.
3.11 Conclusion

Oracle8i provides a framework for database extensibility so that complex, content-rich types can be supported and managed natively in the database. This framework provides the infrastructure needed to allow extensions of the database server by creating domain-specific components called data cartridges. The major services provided by the database (type system, server execution environment, indexing, query optimization, aggregation, etc.) are all capable of being customized through a special set of interfaces. These DCIs enable the close integration of all kinds of data into the Oracle8i platform.
Extensible Indexing Support in DB2 Universal Database

4.1 Introduction

Emerging database (DB) applications require the scalable management of a large quantity of complex data together with traditional business data and flexible querying capabilities for business intelligence. Such applications are called universal applications (Stonebraker et al. 1999). Object-relational databases or universal servers are component database management systems (DBMSs) that have been developed to support such applications. They leverage the mature relational database technology, which provides scalability, reliability, and recovery. More important, they enable users to introduce application-specific types and methods into a database. These extensibility features can be seen as infrastructure for providing plug-in components that extend the built-in repertoire of database types and functions with support for additional, nonstandard data types. Tables in a database may now contain such user-defined objects as geographical shapes, images, and semistructured or unstructured text documents.

IBM's DB2 Universal Database product (Chamberlin 1998; Davis 2000) implements important object-relational concepts that have been standardized in SQL:99 (ANSI 1999a), such as structured types with subtyping, inheritance, and value substitutability (Fuh et al. 1999); typed tables, table hierarchies, and object views (Carey et al. 1999); and user-defined functions and methods. With the object-relational features provided by DB2, the content-based search capabilities of DB2 can be extended to new data types such as text, image, video, audio, and spatial. To make this not only possible but also easy, IBM (working with customers and software vendors) has created the Relational Extenders: prepackaged collections of user-defined types, user-defined functions, triggers, constraints, and stored procedures that can easily be plugged into DB2's SQL engine to support integrated content search. This has enabled DB2 customers to quickly develop customized applications by using the functionality provided by DB2 and its Relational Extenders as parts that can be easily assembled to create powerful applications that meet the search requirements of today.

The business value of complex data and nonstandard content cannot be fully realized unless efficient searching and querying can be provided on user-defined objects together with the traditional business data. In order to achieve this, component database management systems have to support a plug-in architecture that goes beyond the support of merely adding new data types. The concept of plug-in components also has to be supported at the index-management level to allow for the definition of new data-access methods for user-defined types. Unfortunately, existing commercial databases are rather primitive in their support of access and indexing of user-defined objects. B-trees (Bayer & McCreight 1972; Comer 1979) often serve as the sole
indexed-access method. Indexing is also limited in that an index can be created only on table columns whose data types are understood by the access methods, and an indexed scan of a table can exploit only those predicates that are understood by the access methods. For example, when a B-tree is the sole indexed access method, only columns of built-in types can be indexed and only relational operators can be exploited during an indexed scan of a table.

This chapter presents a high-level framework of indexing for user-defined types. Our framework allows users to concentrate on the semantics of applications and user-defined predicates without being concerned with low-level details of locking, recovery, buffer management, or balanced search-tree updates. It is tightly integrated with the database engine and enhances the value of the underlying access methods, built-in or user-defined, in a database system by supporting indexing on new data types and indexed scans using new predicates. In addition, the enhanced flexibility of providing user-defined methods for generating index entries and search ranges makes our high-level user-defined indexing support a very attractive solution for coupling object-relational databases with external, content-specific search engines that need to continue to store and maintain content-specific indexes outside the database. Using our approach, the content-specific indexing mechanisms of search engines, such as full-text retrieval engines, can be exploited without having to extend the database engine with new access methods or having to break up the search engine to map its indexing scheme to database index structures.

The proposed framework has been implemented in the IBM DB2 Universal Database (Chamberlin 1998). Its generality, ease of use, and performance advantage have been demonstrated in several different domains of applications including, among others, spatial databases (Wang et al. 1998), indexing (Chow et al. 1999) on structured XML (Extensible Markup Language) documents (XML 1997), and coupling DB2 with an external full-text search engine.

The rest of this chapter is organized as follows. Section 4.2 revisits the implicit assumptions that are hard-wired in existing database systems and that make them inadequate for indexing on user-defined objects. Section 4.3 presents our high-level framework for direct user control over the indexing of user-defined types. Section 4.4 demonstrates its application in two distinct domains. Section 4.5 focuses on using our high-level indexing framework for integrating external search engines. Section 4.6 reports on performance experiences, and Section 4.7 discusses related work and concludes the chapter.
4.2 Hard-Wired Indexing

B-trees are arguably the most popular indexed access method in relational databases (Bayer & McCreight 1972; Comer 1979). Existing database systems are heavily hard-wired to support only B-tree indexing on primitive data with relational operators. This section reviews the implicit assumptions that have been made and examines the resulting limitations for the indexing of user-defined types.

With B-trees as the built-in indexed access method, most database systems only support primitive indexing that is limited by the B-tree indexed access method. Indices are defined by specifying the table name, the set of index columns, the sort order, and the unique constraint of an index:

CREATE TABLE employee (empno  Char(6),
                       name   Char(20),
                       title  Char(20),
                       salary Integer);
CREATE INDEX salary_index on employee (salary ASC);
A B-tree index is a balanced tree where each node contains a set of sort range–pointer pairs. Sort ranges are identified by an index key value and are used to determine the path of the B-tree traversal. Pointers are used to go from one level of the B-tree to another. For intermediate nodes, they point to the child nodes where the corresponding sort range is further divided. For leaf nodes, the pointers point to tuples stored in the table. When a tuple is inserted into a table, the values of all index columns are concatenated to form the index key that is used to traverse the B-tree for the insertion of the new index entry. Similarly, when a tuple is deleted, the index key is used to identify the index entry for deletion. Once an index is created for a table, subsequent queries can take advantage of the indexed access path provided by the index:

SELECT name, salary FROM employee WHERE salary > 50000;
Existing database systems make several implicit assumptions in their indexing support due to the use of B-trees as the built-in, indexed access method. First, an index is created on the values of table columns directly. The index key is the concatenation of the values of the index columns. Clearly this is not acceptable for user-defined objects, which can be large binary objects or text documents that are not appropriate as index-key values. Even if all the index columns have built-in types, users may want to create an index on some values derived from the values of the index columns, for example, compensation level based on the salary or keywords in the title of a book (Lynch & Stonebraker 1988).
Second, a total order is assumed over the domain of index-key values. An indexing search is restricted by a single range of index-key values. For example, the predicate salary > 50000 maps trivially to the range (50,000, ∞). This is not sufficient for a user-defined predicate that may bound a search in more than one dimension, for example, within a certain distance from a specific location.

Third, for index exploitation, which exploits any available indexes for efficient query execution, only simple predicates of relational operators are considered by query optimizers. User-defined predicates, however, may contain external functions. In a spatial database, a predicate such as distance(location, point(10,10)) ≤ 5 may limit the search space of location to the circle centered at (10, 10) and with radius 5. Query compilers need to be able to recognize user-defined predicates and know how to derive the corresponding search space in order to exploit any index of user-defined types for efficient query execution.
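For instance, a query of the following hypothetical form (the table and column names are illustrative only) contains a predicate that bounds the search in two dimensions at once, so no single range of B-tree keys can capture its search space:

SELECT c.name
FROM   customer_sites c
WHERE  distance(c.location, point(10, 10)) <= 5;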
4.3 High-Level Indexing of User-Defined Types

As identified in Section 4.2, built-in indexed access methods such as B-trees show a number of restrictions, making them unsuitable for user-defined types and predicates. Our framework of high-level indexing of user-defined types sits on top of the underlying access methods in a database system and eliminates these restrictions by providing direct user control over index maintenance, search-key generation for user-defined predicates, and efficient predicate execution through filtering. This approach, which is illustrated in Figure 4.1, opens up the hard-wired index maintenance and exploitation logic used for traditional data types and provides the capability to plug in user-defined index components that precisely define the support for user-defined types and predicates, based on the underlying conventional access method(s), such as B-trees.
Figure 4.1
High-level indexing of user-defined types. (The figure contrasts the hard-wired index maintenance and search/lookup used for traditional data types with the high-level index maintenance layer used for user-defined data types, both built on top of the conventional access methods.)
This section describes the main components of the framework and its implementation in IBM DB2 (Chamberlin 1998). To simplify the discussion, we consider indexing on a single column of a possibly user-defined type. This is not a restriction because indexing on multiple columns can be carried out in the same way, as if a new type were defined that contains one attribute for each index column.
4.3.1 Index Maintenance

Index maintenance deals with the update of an index when tuples are inserted, deleted, or updated in a table. Existing database systems often treat the value of an index column directly as its index key (i.e., the mapping from values of an index column to index keys is the trivial identity mapping). To untie index keys from the values of an index column, we allow a user-defined key transform. Given the value of an index column, the key transform returns one or more index key values. Therefore a key transform is, in general, a function that returns a set as its result and can be implemented as a table function in an object-relational DBMS. Each row in the result table forms an index key.

The introduction of key transforms brings several fundamental benefits. First of all, the domain of index keys is logically separated from the domain of values for an index column. Because an index column can be of any user-defined type, its values may be large objects (LOBs) or structurally rich text documents, among other things. It is impossible to store them directly in an index. Nevertheless, an index can still be created on them using index keys derived by the key transform. Second, even if the values of an index column are all of built-in types, using index keys derived by a key transform can have some nice properties that are not satisfied by indexing on the index column values directly. For example, a high-dimensional space can be mapped to a linear-ordered space such that multidimensional clustering is preserved and reflected in the one-dimensional clustering (Bayer 1996; Jagadish 1990). Distance-preserving transformations have been successfully used to index high-dimensional data in many applications, such as time sequences (Faloutsos et al. 1994b) and images (Faloutsos et al. 1994a). In Berchthold et al. (1998), a new indexing method is proposed for high-dimensional data spaces that can be implemented through a mapping to a one-dimensional space. Key transforms allow the implementation of these new indexing methods on top of existing access methods such as B-trees. Third, from an abstract interpretation point of view, index keys can be viewed as abstractions of the corresponding values of index columns and are simpler and/or occupy less space (Chow et al. 1999). For spatial applications, index keys often represent approximations of spatial objects, such as z-values in Orenstein & Manola (1988) or minimum bounding rectangles (MBRs) in R-trees (Guttman 1984). Depending on
the abstraction defined by the key transform, the more information is stored in an index, the more filtering can be done by indexed search, thus offering a trade-off between indexing cost and search efficiency. Fourth, a single value of an index column can be mapped to a number of index keys using a table function as a key transform. The relationship between values of an index column and index keys is no longer one to one, but many to many (e.g., z-transform, Orenstein & Manola 1988; and keywords in a book title, Lynch & Stonebraker 1988). Different values of an index column can have the same index key, and one value of an index column can have multiple index keys associated with it. The idea of key transforms has been explored in Lynch & Stonebraker (1988) for keyword searching in textual databases. We are using the same idea as one of the building blocks for our framework of high-level indexing of user-defined types. It should be mentioned that the idea of key transforms is not new (see Orenstein & Manola 1988). However, our framework allows users to define their own key transforms that will be managed automatically for index maintenance by the database system.
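As a concrete illustration of a key transform implemented as a table function, the following hypothetical DB2-style declarations index the keywords of a book title. The function names, the external library name, and the exact option clauses are illustrative assumptions; the CREATE INDEX EXTENSION form follows the syntax described in Section 4.3.4.

-- Hypothetical external key transform: maps a title to its keywords, one row per keyword.
CREATE FUNCTION title_keywords(title VARCHAR(200))
  RETURNS TABLE (keyword VARCHAR(50))
  LANGUAGE C PARAMETER STYLE DB2SQL
  NO SQL DETERMINISTIC NOT FENCED
  EXTERNAL NAME 'keywordlib!title_keywords';

-- Hypothetical search-key producer: for an exact keyword predicate,
-- the search key is simply the keyword itself.
CREATE FUNCTION keyword_range(word VARCHAR(50))
  RETURNS TABLE (keyword VARCHAR(50))
  LANGUAGE C PARAMETER STYLE DB2SQL
  NO SQL DETERMINISTIC NOT FENCED
  EXTERNAL NAME 'keywordlib!keyword_range';

-- Index extension tying the key transform to indexes created on a title column.
CREATE INDEX EXTENSION keyword_index_ext
  FROM SOURCE KEY (title VARCHAR(200))
  GENERATE KEY USING title_keywords(title)
  WITH TARGET KEY (keyword VARCHAR(50))
  SEARCH METHODS
    WHEN contains_keyword (word VARCHAR(50))
      RANGE THROUGH keyword_range(word);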
DEFINITION 1
Let S be a concrete domain and let I be a set of index keys. A key transform for S into I is a function K of type S → 2^I. The key lookup for K is a function L from I to 2^S defined by L(i) = {s | i ∈ K(s)}. We call the quadruple (S, I, K, L) an index abstraction. Notice that both K and L can be extended naturally to take a set of values, and we will use the same symbol to denote such extensions.
EXAMPLE 1
The following are some examples of key transforms.

K1: The identity function.
K2: Maps a geographical object into a set of fixed-size grids.
K3: Maps an XML document into a set of paths.
K4: Maps a text document into a set of keywords.

The corresponding key lookup is the reverse mapping from index keys to user-defined objects. For example, the key lookup for K4 maps a keyword into all documents containing the keyword.

In general, an index maintains the association between the indexed objects and their index keys. As far as users are concerned, an index provides a search function that, given an index key, returns the corresponding objects in the index. Because the set of objects that are indexed changes dynamically as tables are updated, the search function has to work correctly all the time. Definition 2 captures this notion of correctness more formally for the previously introduced concept of index abstractions.
DEFINITION 2
Let A = (S, I, K, L) be an index abstraction. An index with respect to A is a (search) function f from 2^S × I to 2^S, where the first argument of f is a finite subset of S representing the set of objects that are currently in the index. An index f is sound (respectively, complete) if for every finite set O ⊆ S (of indexed objects) and for every index key i ∈ I, f(O, i) ⊆ O ∩ L(i) (respectively, f(O, i) ⊇ O ∩ L(i)).

The soundness of an index means that the search function will return only objects that have the given index key according to the key transform. The completeness of an index means that the search function will return all objects that have the given index key. In other words, given the set O of all objects that are currently indexed, the index-search function should compute exactly the restriction of the key lookup to O. For example, a textual index based on keywords should return exactly the set of indexed textual documents that contain a given keyword.
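Restated compactly (this is only a restatement of Definitions 1 and 2, together with the immediate consequence for an index that is both sound and complete):

\[
\begin{aligned}
&K : S \to 2^{I}, \qquad L(i) = \{\, s \in S \mid i \in K(s) \,\},\\
&f \ \text{is sound} \iff \forall\, \text{finite } O \subseteq S,\ \forall i \in I:\ f(O, i) \subseteq O \cap L(i),\\
&f \ \text{is complete} \iff \forall\, \text{finite } O \subseteq S,\ \forall i \in I:\ f(O, i) \supseteq O \cap L(i),\\
&f \ \text{sound and complete} \implies f(O, i) = O \cap L(i) \ \text{for all finite } O \subseteq S \text{ and } i \in I.
\end{aligned}
\]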
4.3.2 User-Defined Predicates and Search-Key Generation

Existing database systems support simple predicates of relational operators for which the corresponding search ranges can be easily determined based on the operators and the bound arguments. To provide extensible indexing over user-defined objects, two issues have to be tackled. First, a user-defined type may or may not support the standard relational comparisons. Even if it does, these relationships may not translate directly to search ranges over index keys. In addition, users may want to search based on application-specific predicates other than relational comparisons, such as overlap and within in spatial databases. Second, a predicate or condition defined by a user can be an arbitrary condition representing some complicated relationship among different objects. When such a user-defined predicate is used to exploit any existing index for efficient query execution, the rich semantics of user-defined predicates requires the sophisticated computation of the corresponding search ranges to be used for the indexed scan of a table. This is an efficiency issue because the complete range of index keys is always a logical candidate for search. Consider Example 2.
EXAMPLE 2
CREATE TABLE customer (name varchar(20),
                       id integer,
                       ...,
                       xyloc location);
...
CREATE INDEX locationIdx on customer(xyloc);
...
SELECT * FROM customer WHERE within(xyloc, circle(...));
A straightforward representation is to use the x and y coordinates as the index key. An index key value is the concatenation of the x and y coordinates. Assuming that the concatenated keys are sorted first in the x coordinate and then in the y coordinate, then a single range over the index keys corresponds to a vertical slice in Figure 4.2. Thus, an implementation using B-trees for the SELECT statement can restrict the search only within the vertical slice. But the actual search region is the MBR containing the circle in the WHERE clause. This shows that a single range of index keys is insufficient to represent the search space accurately.

Figure 4.2
Search region for locating customers within a region. (The figure contrasts the vertical slice covered by a single index-key range with the actual search region around the circle.)
For extensible indexing with user-defined predicates, we want to represent the corresponding search region as closely as possible, and we introduce the concept of search methods. Each search method is a user-defined function that, given a semantic relation over user-defined objects and one of its search patterns, returns a set of search keys.
DEFINITION 3
Let r(a1, ..., an) be an n-ary relationship, where n ≥ 1 and each ai (1 ≤ i ≤ n) is of some domain Di. For each i (1 ≤ i ≤ n), there is a corresponding search pattern of r for ai in which ai is the search target and every aj ( j ≠ i) is a search argument. Let A be an index abstraction of the form (Di, I, K, L). A search method for the search pattern of r for ai is a function of the search arguments, g(a1, ..., ai–1, ai+1, ..., an), that returns a subset of I. The search method g(a1, ..., ai–1, ai+1, ..., an) is sound with respect to relation r and index abstraction A if whenever r(a1, ..., an) holds, ai ∈ L[g(a1, ..., ai–1, ai+1, ..., an)]. A search method computes the set of search keys over which the possible search targets can be found. For the query in Example 2, a search method can be defined for the semantic relationship within, where the first operand is the search target and the second operand is the search argument. Assuming that an index key is a fixed-size grid intersecting with an object, the search method can return the minimal set of grids that covers the circle given in the search argument.
A search method in general is only an approximation for the semantic relation r in the sense that every search target participating in the relation r with search arguments must have an index key among those returned by the search method. For instance, every geometric object that is within the circle given in the search argument must have a grid that is in the set of search keys generated by the search method. However, a search method may not be exact, in the sense that some objects with an index key among those returned by the search method may not satisfy r with the search arguments. Therefore, it is necessary in general to evaluate r for every object that is found using the index keys from a search method.
4.3.3 Index Exploitation

Index exploitation is performed by query optimizers in order to use any index for efficient query execution. Traditionally, query optimizers have been able to exploit only simple relational operators for indexing because the corresponding search range can be easily determined. For index exploitation with user-defined predicates, the query compiler must be able to recognize them and find the relevant search methods to use. The definition of a user-defined function is extended to specify whether it can be used as a predicate and, if so, what search method to use when certain operands are search arguments. (See Section 4.3.4 for details of the syntax of predicate specifications.)

For the query in Example 2, suppose that within has been defined as a predicate that has an associated search method when the second operand is a search argument. The query compiler can choose an index scan over a table scan to retrieve records from the table customer for two reasons. One is that there is an index on the xyloc attribute. The other is that the query compiler recognizes that the second operand of within is bound and that within is a predicate with a search method when the second operand is a search argument. The index scan will use the corresponding search method to generate a set of search keys, which represents the minimal set of grids covering the circle in the second operand. The set of search keys will be used by the underlying access method to retrieve the relevant records from the table customer.
4.3.4 Implementation and Predicate Filtering

The high-level framework of indexing of user-defined types has been implemented in IBM DB2. In addition to index maintenance, user-defined predicates, and index exploitation, the implementation also provides user control over multistage evaluation of user-defined predicates through filtering. This avoids the potentially expensive evaluation of user-defined predicates and reduces both I/O and CPU costs.
Figure 4.3 shows the syntax for index extensions with the associated key transform and search methods. The semantic relation corresponding to a search method is not explicitly specified. The CREATE INDEX EXTENSION statement defines a parametric index extension. A parametric index extension is instantiated when an index is created on a table using CREATE INDEX statements. The parameters of an index extension can be used to specify, for example, the number of layers and the size of a grid in a multilayer grid index.
Figure 4.3
Syntax for index extensions, where N+ specifies one or more occurrences of N, with the separator ',' when appropriate.

<index extension definition> ::=
    CREATE INDEX EXTENSION <extensionName>
    [ '(' {<parmName> <parmType>}+ ')' ]
    <key transform> <search methods>

<key transform> ::=
    FROM SOURCE KEY '(' {<columnName> <columnType>}+ ')'    /* index columns */
    GENERATE KEY USING <key transform function>

<search methods> ::=
    WITH TARGET KEY '(' {<keyName> <keyType>}+ ')'
    SEARCH METHODS {<search method>}+

<search method> ::=
    WHEN <searchmethodName> '(' {<argName> <argType>}+ ')'  /* search arguments */
    RANGE THROUGH <search key producer>
    [ FILTER USING <filter function> ]

<index extension drop> ::= DROP INDEX EXTENSION <extensionName>

<index definition> ::=
    CREATE INDEX <indexName> ON <tableName> '(' <columnName> ')'
    USING <extensionName> '(' {<argument>}+ ')'
The key aspects of an index extension include the key transform function and the associated search methods. Each search method contains a search-key producer function (indicated by <search key producer>) that computes the set of search keys given the search arguments, and an index-filter function used inside the index component. The user control over the index filter is a powerful concept. It provides early filtering using the index keys. This avoids the I/O cost of retrieving data that obviously do not satisfy the search criteria, because data will not be retrieved from the disk using an index scan until the index keys are determined. This also makes it possible for users to combine multiple indexing mechanisms in a single search by plugging in an index filter that performs additional search (e.g., using an external search engine).

Figure 4.4 shows the syntax of user-defined functions that can serve as predicates. Each predicate specification indicates an optional filter function and the associated search methods for various search patterns. The data filter aims to reduce the potentially expensive evaluation of the predicate by filtering out records that do not satisfy the predicate
Figure 4.4
Syntax for user-defined predicates and their associated search methods.

<function definition> ::=
    CREATE FUNCTION <functionName> '(' {[<parmName>] <parmType>}+ ')'
    ...                                          /* additional options */
    [ PREDICATES '(' <predicate specification>+ ')' ]

<predicate specification> ::=
    WHEN ( '=' | '<>' | '<' | '>' | '<=' | '>=' )
         ( <constant> | EXPRESSION AS <expr-name> )
    [ FILTER USING <filter function> ]
    <index exploitation>

<index exploitation> ::=
    SEARCH BY [ EXACT ] INDEX EXTENSION <extensionName>
    <exploitation rule>+

<exploitation rule> ::=
    WHEN KEY '(' {<paramName>}+ ')'              /* search target */
    USE <searchmethodName> '(' {<paramName>}+ ')'
using simpler and cheaper operations. In <exploitation rule>, the parameters following WHEN KEY indicate the search target. The optional keyword EXACT following SEARCH BY requires a little explanation. When an index scan using a predicate is executed, the corresponding search method, which is a user-defined function, is invoked. It computes a set of search keys for the search target using the search arguments. The search keys are sent to the underlying access methods to retrieve the relevant records. The index filter associated with the search method is applied, if one exists, before the records are retrieved from the disk. The relational data manager then applies the data filter associated with the predicate specification. Finally, all records that pass through the data filter are evaluated using the predicate. When the index lookup and the filters defined by an index extension provide only an approximation for the predicate (e.g., in spatial applications), the final step of predicate evaluation is still necessary. However, in other applications such as document search, the index lookup together with the (optional) index filter may compute the exact set of all answers that satisfy the predicate. The final steps of data filtering and predicate evaluation should not be carried out in this case. The keyword EXACT indicates such a situation.

Figure 4.5 shows the architecture of the implementation in DB2. It can leverage any underlying access methods that are available. The rectangular boxes represent places where user-defined functions can be plugged in to support user-defined search. The key transform is invoked in the index manager for index maintenance when tuples are
Figure 4.5
Implementation of high-level indexing of user-defined types. (The figure shows queries and insert/delete/update requests entering the query compiler, which uses predicate specifications for index exploitation; the relational data manager, which invokes search methods, applies the data filter, and performs predicate evaluation and table updates; and the index manager, which applies the key transform for index updates, generates search keys, and applies the index filter during search, all on top of the underlying access method and the table.)
inserted, deleted, or updated in a table. The query compiler uses specifications of user-defined predicates for index exploitation. During a search based on a user-defined predicate, the corresponding search method is invoked by the relational data manager to generate a set of search keys. For retrieval based on a user-defined predicate, two filters are included in the architecture. The purpose is to avoid the potentially expensive evaluation of user-defined predicates. Users can specify simpler and cheaper functions to be applied as filters before predicate evaluation. The index filter filters out records before they are retrieved from the disk into buffers inside the relational data manager. The data filter in the relational data manager presents another chance for cost-efficient filtering before expensive predicates are evaluated.
4.4 Applications

In this section, we illustrate how the generic framework for high-level indexing of user-defined types can be applied effectively in two widely different content domains: geographic information systems and XML document search. Note that the applicability of our framework is by no means limited to these areas.
4.4.1 Indexing for Geographical Information System Applications

In traditional geographical information systems (GISs), indexing on spatial data is provided through a set of proprietary APIs. When a query involves searching on spatial data, the spatial predicates are transformed for index exploitation, and the resulting query is then sent to the database for optimization and evaluation. The lack of integration of spatial indexing with the database engine leads to integrity issues and performance hits. Our framework of high-level indexing makes it possible to have spatial indexing in a database and still take advantage of the special search methods that have been developed in GISs.
EXAMPLE 3
Suppose that the following user-defined types have been created.

CREATE TYPE envelop AS (xmin int,
                        ymin int,
                        xmax int,
                        ymax int);
CREATE TYPE shape AS (gtype    varchar(20),
                      mbr      envelop,
                      numpart  sint,
                      numpoint sint,
                      geometry BLOB(1M)) NOT INSTANTIABLE;
CREATE TYPE point UNDER shape;
CREATE TYPE line UNDER shape;
CREATE TYPE polygon UNDER shape;
where shape serves as a supertype for various subtypes such as lines and polygons. Two tables have been defined in the database, one storing information about schools and the other containing information on households and their annual income:

CREATE TABLE schools AS (name     varchar(20),
                         district varchar(20),
                         address  varchar(20),
                         area     shape,
                         PRIMARY KEY (name, district));
CREATE TABLE households AS (address      varchar(20),
                            annualincome int,
                            location     shape);
The following query tries to compute the average annual income of all households inside the attendance area of a specific school:

SELECT avg(h.annualincome)
FROM   households h, schools s
WHERE  s.name = 'Armstrong Elementary' AND
       s.district = 'Highland Park' AND
       within(h.location, s.area);
To allow the efficient execution of this query, we need to (a) create an index extension incorporating user-defined key transform and search methods for shape, (b) create an index on table households using the index extension, and (c) specify predicates for within and the associated search methods. The following statement defines an index extension over type shape. It uses a multilayer grid index for shapes.
CREATE INDEX EXTENSION gridshape(levels varchar(20) FOR BIT DATA)
  FROM SOURCE KEY (sh shape)
  GENERATE KEY USING gridkeys(levels, sh..mbr..xmin, sh..mbr..ymin,
                              sh..mbr..xmax, sh..mbr..ymax)
  WITH TARGET KEY (level int, gx int, gy int,
                   xmin int, ymin int, xmax int, ymax int)
  SEARCH METHODS
    WHEN search_within(area shape)
      RANGE THROUGH gridrange(levels, area..mbr..xmin, area..mbr..ymin,
                              area..mbr..xmax, area..mbr..ymax)
      FILTER USING checkduplicate(level, gx, gy, xmin, ymin, xmax, ymax,
                                  levels, area..mbr..xmin, area..mbr..ymin,
                                  area..mbr..xmax, area..mbr..ymax)
    WHEN search_contain(loc shape)
      RANGE THROUGH gridrange(levels, loc..mbr..xmin, loc..mbr..ymin,
                              loc..mbr..xmax, loc..mbr..ymax)
      FILTER USING mbroverlap(xmin, ymin, xmax, ymax,
                              loc..mbr..xmin, loc..mbr..ymin,
                              loc..mbr..xmax, loc..mbr..ymax);
(DB2 uses the double-dot notation for accessing attributes of objects of user-defined types.) The index extension definition specifies the key transform function, gridkeys, and two search methods: one for searching within a specific area and the other for finding shapes that contain a specific location. Both search methods use the same function, gridrange, to generate a set of index keys for potential search targets. Each search method has its own filtering function. All the functions that are mentioned may be user-defined functions, whose definitions are omitted here. We are now ready to create an index on the location column of table households:

CREATE INDEX houselocIDX ON households(location)
  USING gridshape('10 100 1000');
The parameter indicates three levels of grid sizes. For index exploitation, we need to define the predicates and their associated search methods. The following specification indicates that within should be viewed as a predicate.

CREATE FUNCTION within(s1 shape, s2 shape)
  RETURNS int
  LANGUAGE C
  ...
  EXTERNAL NAME '/lib/spatial/gislib!within'
  PREDICATES (
    WHEN = 1
    FILTER USING mbrwithin(s1..mbr..xmin, s1..mbr..ymin,
                           s1..mbr..xmax, s1..mbr..ymax,
                           s2..mbr..xmin, s2..mbr..ymin,
                           s2..mbr..xmax, s2..mbr..ymax)
    SEARCH BY INDEX EXTENSION gridshape
      WHEN KEY (s1) USE search_within(s2)
      WHEN KEY (s2) USE search_contain(s1));
The last three lines indicate that searching based on predicates of within will be done using the index extension gridshape. When the first argument s1 is the search target, use search method search_within with s2 as the search argument. When the second argument s2 is the search target, use search method search_contain, with s1 as the search argument. The query compiler is able to generate a plan that takes advantage of the access path provided by the index on location of table households. The key transform, search-key producer, and filtering functions will be called automatically at appropriate places.
4.4.2 Indexing on XML Documents

The high-level framework of indexing of user-defined types has also been applied to structured documents, such as XML documents (XML 1997). These documents are organized by nested tags with application-defined semantics. Searching for XML documents often requires a structured search, where queried words are matched only within particular structures. These structures are specified by a path expression consisting of a sequence of tags. For example, if a user wants to search for documents whose chapters contain the word "Data," the user may invoke a user-defined function structSearch(content, 'book.toc.chapter', 'Data'), where the argument book.toc.chapter specifies the path leading to chapters. If an XML document is stored as an attribute in a relational table, an immediate question is how to search these documents efficiently, or more specifically, how to create an index for a structured search that does not require scanning source documents at query-execution time.
Supporting a structured search requires a tight integration of searching on structures (matching path expressions) and searching on content (matching the keyword list) (Baeza-Yates & Navarro 1996; Consens & Milo 1994; McHugh et al. 1997). The challenge comes from the facts that indexing for a structured search is not readily available and that indexing for content search often relies on an external full-text index. With the high-level framework of indexing of user-defined types, we are able to achieve the necessary tight integration. The structure index is implemented inside DB2 and, at the same time, interacts with the external full-text index outside DB2.
EXAMPLE 4
The following shows the declaration of an index extension for a structured search.

CREATE INDEX EXTENSION xml_structure
  FROM SOURCE KEY (xml db2xml)
  GENERATE KEY USING pathGenerator(xml..doc)
  WITH TARGET KEY (path varchar(100), doc varchar(100), position int)
  SEARCH METHODS WHEN structure (path varchar(100), word varchar(20))
    RANGE THROUGH pathRange(path)
    FILTER USING checkFullText(path, doc, position, word);
where db2xml is a user-defined type for XML documents. The key transform is a user-defined function pathGenerator. It generates a set of paths from a document docid and results in a set of index keys of the form (path, docid, position). There is only one search method in this index extension, structure with two arguments. The user-defined function pathRange(path) produces a set of search keys given a path, which is filtered inside the index manager by the user-defined function checkFullText(path, docid, position, word). The filter actually invokes an external full-text index for content search, thus achieving an integration of a structured search with an external full-text content search. The following shows an example of a table using the index extension and a user-defined predicate for index exploitation.

CREATE TABLE designdb(docid integer, author char(20), notes db2xml);
...
CREATE INDEX docIdx on designdb(notes) using xml_structure;
...
CREATE FUNCTION searchStruct(xml db2xml, path varchar(100), word varchar(20))
  RETURNS integer LANGUAGE C ...
  EXTERNAL NAME '/home/chowjh/xml/xmllib!searchStructure'
  PREDICATES (
    WHEN = 1
    SEARCH BY EXACT INDEX EXTENSION xml_structure
      WHEN KEY (xml) USE structure(path, word));
...
SELECT *
FROM designdb
WHERE searchStruct(notes, 'book.toc.chapter', 'SQLJ') = 1
For this query, the optimizer may generate a query plan that uses the index scan. Because the search target is the first argument of the predicate searchStruct, the search method structure is used. During query execution, the structured index of path expressions is first searched using the search argument 'book.toc.chapter'. All index keys that are found are then filtered by calling checkFullText, which returns only those index keys for which the word 'SQLJ' occurs at the corresponding position in the corresponding document. Only those XML documents that satisfy both criteria are retrieved from the table designdb. It should be mentioned that the integration of a structured search using path expressions with a content search using keywords occurs inside the index manager, thus avoiding unnecessary data retrieval from the disk. Without our framework of high-level indexing, it is rather difficult for users to explore the integrated search of XML documents using both path expressions and keywords. For example, an alternative to Example 4 might be to have two separate user-defined predicates, one for exploring the structured index only and one for exploring the external full-text keyword index only:

SELECT *
FROM designdb
WHERE searchpath(notes, 'book.toc.chapter') AND searchword(notes, 'SQLJ')
However, this is not really a viable alternative, in terms of either semantics or performance. The new query does not support the intended semantics, where the keyword 'SQLJ' is searched for only
within the structure indicated by the path expression. Instead it will search the entire document for the keyword. In terms of efficiency, only one predicate will be used for index exploitation and the other predicate will serve as a filter after the corresponding documents are retrieved from the disk.
4.5
Loose Integration of External Search Engines The previous section showed the use of our high-level indexing approach for tightly integrating new search capabilities into the DB engine, storing the complete index data for content search inside the database itself. While this is, in general, the preferred approach, it may not always be feasible because it may require the mapping of highly tuned content-specific index data structures and the processing model of existing content-specific search and indexing engines to the indexing approach of the database system. This can very often be too costly or result in limitations. For example, consider the integration of full-text retrieval into the database. Following the standard approach of supporting full-text queries through inverted word lists, we can easily realize an index extension for text whose key-transform function produces a table of all the words that appear in an indexed document and stores each row of the table as a separate entry in the underlying B-tree. However, in order to process a proximity search on text, such as find all documents where the word “database” appears in the same sentence as “object-relational,” a sophisticated full-text retrieval engine needs to also store positional information in the text index and needs to efficiently aggregate and combine index lookups for individual words with one another to determine the result of the text query. If we wanted to store the text index information in DB2 indexes completely, this would require the means to combine the outcomes of the index lookup operations for each word (i.e., “database” and “object-relational”) using a join operation based on the document identifier (id) and sentence number. This goes beyond the capabilities offered by our extensible indexing approach. However, extensible indexing provides effective support for solving this problem through a loosely coupled approach for integrating external search engines, which we explore using the DB2 Text Extender.
4.5.1
DB2 Text Extender Overview The DB2 Text Extender, a plug-in component developed for IBM's DB2 Universal Database product (Chamberlin 1998), employs DB2's object-relational features to provide user-defined types and user-defined functions for integrating text search into DB2 by using an IBM stand-alone text-search engine called SearchManager.
EXAMPLE 5
Using Text Extender, a user can define a table with a full-text document column in the following way to create a table with information about projects.

CREATE TABLE projects (
  proj_no        integer,
  title          varchar(50),
  budget         integer,
  description    CHARACTER LARGE OBJECT,
  description_id db2texth)
Text content is stored in the table using the traditional data types available for character data, such as variable-length character data types or character LOBs. Each text column is accompanied by an additional column of type db2texth, which is a user-defined type introduced by Text Extender. The values of these columns (also called text handles) serve, among other things, to uniquely identify the text documents in the text column for the search engine. When issuing text-search queries, the accompanying handle column has to be used instead of the text column itself, as illustrated in the following query.

SELECT proj_no, title
FROM compschema.projects
WHERE contains(description_id, '"database" IN SAME SENTENCE AS "object-relational"') = 1
This query would return all projects with a textual description that contains the word "database" in the same sentence as "object-relational." The contains function involved in this query is used to perform the text search and has two arguments (a value of type db2texth and a search pattern string). Figure 4.6 illustrates the basic architecture of the Text Extender in terms of the interaction of the database engine with the text-search engine and helps to clarify how the query in Example 5 would be evaluated.

Figure 4.6 Interaction of database and text-search engine in Text Extender. (A client application submits the query to the DB2 server engine, which evaluates the contains UDF; the UDF calls the external text-search engine, which maintains its own text index outside the database.)

Please note that the architecture depicted in the figure is so far incomplete; it illustrates only aspects related to text-search user-defined functions (UDFs). Additional functionality and components of Text Extender, such as client components and administrative APIs, are not described. The query is submitted via a DB2 client to the DB2 server engine. For each row in the projects table, the engine calls the contains function with the contents of the description_id column and the search pattern as arguments. The contains function is a user-defined
function written in a 3GL (C). It again calls the text-search engine (realized as a set of C functions in a shared library), passing it the search pattern as well as the name of the index covering the text documents stored in the description column of the Projects table.1 The text-search engine returns the result of the text search to the contains UDF body in the form of a list of document identifiers (i.e., values of the type db2texth). The UDF checks whether the identifier that has been supplied by the database engine is actually contained in the result list returned by the text-search engine and returns the appropriate result (1 for true, or 0 for false) to the database engine. Based on the result of the contains function call, the database engine will construct the query result. It is possible for the contains UDF to keep the results returned by the text-search engine across invocations inside of a query by using a special scratchpad memory area supplied by the engine. Therefore, the actual text search using the external text-search engine has to be performed only once, during the first invocation of the contains UDF inside of a query. There are several issues related to this type of coupling that we address here very briefly.

1 This information is stored in Text Extender system catalogue tables in the database.
• Text-index creation is initiated using a separate Text Extender API implemented on top of DB2. It uses DB2 stored procedures to have the text engine build the index, which consists of a set of files stored outside DB2 in an index directory on the server system.

• Meta-information relevant for the text search, such as the names of database columns enabled for the text search and the names of the external text indexes covering a certain database column, is stored in additional catalogue tables. The catalogue table is a regular DB2 table whose contents are manipulated by the Text Extender administrative functions, but can be read by the end user using standard SQL.

• To construct the external text index, a separate process is initiated that accesses the text catalogue table and determines the text column containing the text data, as well as the associated handle column. It then reads the text data plus the handle on a per-document basis and analyzes the text for constructing the index entries, storing the text handle value as a unique identifier of the document inside the index.

• DB2 triggers, which may also call UDFs in the conditions and trigger bodies, are used to reflect updates performed on text columns correctly in the corresponding text indexes.

• The contains UDF for text search runs in the same process and address space as the database engine. This greatly reduces UDF overhead by eliminating expensive interprocess communication.2

2 In addition, DB2 supports a fenced execution mode for untrusted UDFs, in which the UDF runs in its own process, separate from the DB engine.
For a detailed description, the reader is referred to (IBM 1999).
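To make the per-row behavior of the contains UDF concrete, the following C++ sketch mimics the evaluation flow described above. It is a sketch under stated assumptions only: TextHandle, Scratchpad, and textEngineSearch are hypothetical stand-ins for the db2texth handle, DB2's scratchpad memory area, and the SearchManager call; the real UDF is written in C against DB2's UDF calling conventions.

#include <set>
#include <string>

struct TextHandle { std::string docId; };     // stand-in for a db2texth value

// Per-query state kept in the scratchpad across UDF invocations.
struct Scratchpad {
    bool searched = false;
    std::set<std::string> matches;            // document ids returned by the engine
};

// Stand-in for the call into the external text-search engine.
static std::set<std::string> textEngineSearch(const std::string& indexName,
                                              const std::string& pattern)
{
    return {"doc-17", "doc-42"};              // canned result, for illustration only
}

// contains(handle, pattern): the external search runs once per query; every
// subsequent row is answered by a cheap membership test on the cached result.
int containsUdf(Scratchpad& pad, const TextHandle& handle,
                const std::string& indexName, const std::string& pattern)
{
    if (!pad.searched) {                      // first invocation within this query
        pad.matches = textEngineSearch(indexName, pattern);
        pad.searched = true;
    }
    return pad.matches.count(handle.docId) ? 1 : 0;   // 1 = true, 0 = false
}

The important point is the asymmetry: one expensive external call, then one cheap lookup per row. Avoiding the per-row invocation altogether is exactly the remaining performance problem addressed in the next subsection.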
4.5.2
Exploiting High-Level User-Defined Indexing Although the described architecture permits the exploitation of an external text-search engine through its native APIs and usually limits the interaction to a single call (performed during the first call of the contains UDF), there is no way to avoid a full-table scan on the Projects table. In other words, although the text-search engine can provide the result in the form of a set of identifiers in one call, the database engine will call the contains function for each row in the table. The main problem is therefore: How can the set of identifiers returned by a text index lookup (involving the API of the external text-search engine) be fed back into the database query-evaluation process in a way that is comparable to a traditional database index lookup, so as to avoid the table scan? In order to solve this problem, we use our user-defined indexing framework in a way that fundamentally differs from the use demonstrated in the previous sections. Instead of indexing on the text content itself (which is left under the control of the external text-search engine), we index only on the identifiers of the text documents. In other words, the index is built on the text handle of type db2texth, and the key-transform function only extracts the text document identifier that is stored as a part of that handle to populate the index entries in DB2. The search method of the index then passes the text-search
argument to its range-producer function, which uses the functionality of the text-search engine to perform the external index lookup for the text search. The list of text document identifiers returned by the search engine is then returned as a set of document identifier ranges as the result of the range-producer function of our new index type. The following statement creates the index extension for our Text Extender.

CREATE INDEX EXTENSION TextIdx(protHd VARCHAR(60) FOR BIT DATA)
  FROM SOURCE KEY (handle-id db2texth)
  GENERATE KEY USING TXGetKey(handle-id)
  WITH TARGET KEY (x_key VARCHAR(18) FOR BIT DATA)
  SEARCH METHODS WHEN containSearch (s_arg LONG VARCHAR)
    RANGE THROUGH TXGetHandleSet(protHd, s_arg);
Additional index metadata can be provided at index creation time using the protHd (prototype handle) index parameter and is then available to the range-producer function. This includes information such as the name of the external text index, which has to be provided as an input parameter to the text search engine during text search.3

3 Note that this data definition language (DDL) statement would be issued by the Text Extender administration component as part of enabling a column for text search. An end user would not need to know about the detailed meta-information.
EXAMPLE 6
Let us illustrate index creation using the TextIdx index extension.

CREATE INDEX Desc_idx ON Projects(description_h)
  USING TextIdx('...IX1234...')
This statement, whose effects are illustrated in Figure 4.7, creates an index on the description_h column of the Projects table, providing the name of the external index IX1234 as a property of the newly created index. During index creation (and during further insert, update, or delete operations on the Projects table), the key-transform function TxGetKey is invoked and extracts the document identifiers from the text handles. These identifiers are stored in the DB2 index Desc_idx (together with the internal row identifiers used by DB2). In order to be able to use the TextIdx index extension during a text search, we have to define the contains UDF as a user-defined predicate.
CREATE FUNCTION contains(TextHd db2texth, s_arg LONG VARCHAR)
  RETURNS INTEGER ...
  PREDICATES (
    WHEN = 1
    SEARCH BY EXACT INDEX EXTENSION TextIDX
      WHEN KEY(TextHd) USE containSearch(s_arg));
Details of the index exploitation during text search are illustrated in Figure 4.8. When processing the example's SELECT statement, the optimizer realizes that contains is a user-defined predicate that is supported by an index extension, and determines that the existing index Desc_idx can be exploited to process the query. If an index access is indicated based on the cost information available to the optimizer, then the containSearch search method is used for the index lookup, based on the information provided in the CREATE FUNCTION statement. The text-search argument provided in the invocation of the contains function will be passed to the search method and, in turn, to the range-producer function TxGetHandleSet. Together with the index meta-information, the search argument is then used by the range-producer function to query the external index IX1234 using the text-search engine. The result of the text search, which is a list of identifiers, is then turned into a set of identifier ranges (in this case, with identical start-stop key values), which is then passed back to the index manager to do the lookup on the DB2 index Desc_Idx to locate the actual rows that match the text document identifiers. Because the text index itself gives back the exact results for the text search (instead of an approximate result that only provides a filtering capability), no further evaluation of the original contains UDF is required in the processing of the query.
4.5.3
Generalization for Arbitrary Predicates The approach in Example 6 supports the rewrite of user-defined functions that return a Boolean value. What if a function returns other (numeric or non-numeric) values and appears as an operand of an arbitrary SQL predicate? Assume the following definition for the UDF rank,

CREATE FUNCTION rank (id db2texth, arg LONG VARCHAR)
  RETURNS DOUBLE PRECISION;
Figure 4.7 Creating an index of type TextIdx. (The key transformer TxGetKey() extracts document identifiers from the description_h handles of the Projects table and stores them, together with the internal row identifiers, in the DB2 index Desc_Idx.)

Figure 4.8 Exploitation of the TextIdx index extension during search, for the query:
SELECT proj_no, title
FROM compschema.projects
WHERE contains(description_h, '"database" IN SAME SENTENCE AS "object"') = 1
This function behaves like the contains function, but returns a rank value instead of a Boolean value, describing how well a document meets the text-search criteria. For example, the query

SELECT proj_no, title
FROM projects
WHERE rank(description, '"database" IN SAME SENTENCE AS "object-relational"') > 0.5
would retrieve all text information for documents that match the given search argument with a rank value > 0.5 (all rank values range between 0 and 1). Instead of evaluating the rank UDF for each of the rows in the projects table, it is very desirable for performance reasons to be able to push down the rank value limit (0.5) into the index lookup, if the text-search engine supports the computation of the rank value as a part of the text index lookup or if its APIs allow the specification of a rank-value threshold for full-text retrieval. Our approach of using the extensible indexing framework can be easily applied to solve this problem as well. The first step is to provide another search method as part of the text index extension that accepts a rank-value threshold.

CREATE INDEX EXTENSION TextIdx(protHd VARCHAR(60) FOR BIT DATA)
  FROM SOURCE KEY (handle-id db2texth)
  GENERATE KEY USING TXGetKey(handle-id)
  WITH TARGET KEY (x_key VARCHAR(18) FOR BIT DATA)
  SEARCH METHODS
    WHEN containSearch (s_arg LONG VARCHAR)
      RANGE THROUGH TXGetHandleSet(protHd, s_arg)
    WHEN rankSearch (s_arg LONG VARCHAR, limit DOUBLE PRECISION)
      RANGE THROUGH TXGetRankHandleSet(protHd, s_arg, limit);
In addition, the definition of the rank UDF would have to reflect the potential exploitation of the text index extension in the following way.

CREATE FUNCTION rank(TextHd db2texth, s_arg LONG VARCHAR)
  RETURNS DOUBLE PRECISION ...
  PREDICATES (
    WHEN > EXPRESSION AS limit
    SEARCH BY EXACT INDEX EXTENSION TextIDX
      WHEN KEY(TextHd) USE rankSearch(s_arg, limit));
The clause EXPRESSION AS limit and the use of limit as an additional argument in the USE clause define that the limit used as the second operand of the > predicate will be provided as an additional argument to the rankSearch search method if the DB2 optimizer chooses to exploit the index extension. Given our example, this would finally result in the following invocation of the range-producer function for the rankSearch search method.

TXGetRankHandleSet('...IX1234...',
                   '"database" IN SAME SENTENCE AS "object-relational"',
                   0.5)
4.6
Performance Our framework of high-level indexing of user-defined types extends the expressive power and integrated optimization of SQL queries to user-defined types. This section presents some preliminary performance measurements for GIS applications using the existing GIS architecture and our integrated approach, introduced in Section 4.4.1 and also implemented in the DB2 Spatial Extender. The existing GIS architecture is represented by Spatial Database Engine, SDE 3.0.2 on DB2 UDB Version 5 from ESRI (ESRI 2000), which uses a spatial data engine external to the database for optimization of spatial queries. Given a table with business data and a column of spatial attributes, SDE introduces a new feature table to represent spatial data and a new index to process spatial queries. The feature table contains an id column as the primary key and all the spatial attributes and the geometric shapes. The spatial column in the original table (called business table) is replaced by an id column that is a foreign key for the feature table. In addition to the feature table, SDE maintains a spatial index table, which uses a three-level grid-based index method in our example. The spatial index table contains the feature id (which is a foreign key for the feature table) and the indexing information such as the location of the lower-left grid cell and the feature’s MBR. When processing a spatial search query, SDE uses the spatial index table and the feature table to compute a list of (ids of) candidate shapes that satisfy the spatial predicate. The computed list of candidate shapes is then used to retrieve data from the business table by applying the remaining predicates in the WHERE clause of the spatial search query.
Currently SDE handles the join between the business table and the feature table itself by executing different queries.
EXAMPLE 7
We use the census block data for the state of Kentucky, which has 137,173 polygons, with an average of 31 points per polygon. The table kentuckyBlocks has a column boundary of spatial type POLYGON, in addition to other attributes such as the name and the total population. Each polygon represents an area, containing as few as 4 points and as many as 3416 points.

CREATE TABLE kentuckyBlocks (name varchar(20), ..., boundary POLYGON)
The following queries represent some typical operations in GIS applications.
• Loading: Includes raw-data loading through a sequence of SQL insert statements and the maintenance of spatial indexes.

• Region queries: For three predefined regions in different locations, with the sizes of the answer sets being 3155, 2387, and 1457.

• Point queries: One hundred random point searches, simulating users pointing at a polygon during spatial browsing.

• Region queries with attributes: Same as region queries except that nonspatial attributes such as the name and the total population are also fetched in addition to the spatial data.

• Fetch all: Measures how fast data can be pumped out of a database.
All queries were run on the IBM RS6000/J40 server and during off-hours to minimize variations due to other users and processes. The GIS client programs are run on the same machine as the server. The measurements of query execution time (rounded to seconds) are shown in Table 4.1. Data loading was run once, while the rest of the queries were run three times (the averages are shown).
Table 4.1 Performance measurements of spatial queries (query execution time in seconds).

                          Region Queries          Region Queries with Attributes
             Loading   R1      R2      R3        R1      R2      R3        Point Queries   Fetch All
GIS          3012      19      13.66   8         20      14.66   9         8               731
Integrated   706       8       5       3.33      9.66    4.66    3         7.66            169.67
In both loading and fetch all, we process the entire table and the integrated approach is about four times faster. In the case of loading, an insert statement for a row in the integrated approach becomes three insert statements in the GIS approach, one for the business table, one for the feature table, and one for the spatial index table. In the case of fetch all, because the GIS approach handles the join between the business table and the feature table by itself, it executes a separate query against the feature table repeatedly, once for each set of data retrieved from the business table. For region queries without nonspatial attributes, the integrated approach is approximately two and one-half times faster than the GIS approach, but is about three times faster for region queries with nonspatial data. The difference is that the second case involves the access of the business table. The GIS approach performs very well for point queries. Overall, the results show that our integrated approach of high-level indexing of spatial data has a much better performance. This shows the value of enhancing the database engine for the extensible indexing of complex data.
4.7
Related Work and Conclusion Universal applications involving both complex queries and complex data demand strong and flexible support for nontraditional data such as geographical information and structured documents. Indexing of user-defined types with user-defined predicates is crucial to meeting the demands of component database applications. Therefore, component DBMSs have to support the concept of plug-in components, not only for data types and functions, but also at the index management level to allow for the definition of new data access methods for user-defined types. Several approaches have been investigated to provide extensibility in data access and indexing. One approach is to support extensible indexing through query rewriting. Normally a separate entity manages the special access methods and indexing on complex data of which the database engine is unaware. A user query involving predicates on complex data is transformed into a different query by taking into consideration the access methods and indexing specific to complex data. The indexes may be stored in a relational database or in external files. For example, in a typical GIS such as ESRI's SDE (ESRI 2000), a spatial data engine supplies a set of proprietary spatial functions and predicates and uses a relational database to store both user tables and side tables representing the spatial indexes. Spatial predicates in a user query are converted by the spatial data engine into joins with and predicates on the
side tables. The resulting query is then given to the relational database for optimization and execution. The query-rewriting approach does not require modifications to the database engine and is therefore the only option for an application developer if the underlying DBMS does not support indexing on complex data. The query-rewriting approach offers an excellent solution to the problem of building advanced spatial applications with decent performance. However, its lack of integration with the database engine raises integrity, usability, and performance issues. For data integrity, the indexes on complex data must be kept consistent with other business data, which is now the responsibility of the application developer or the database administrator (e.g., by using triggers to achieve consistency). In general, usability suffers from the fact that the application programmer more or less has to be aware of the index model and structure, which is now exposed as part of the overall data model unless there is an additional API layer that hides the database completely. From a performance perspective, because the database engine is unaware of special indexes for complex data, it may fail to generate the most efficient query-execution plan.

One can also implement application-specific access methods. There is no shortage of special access methods for spatial or multidimensional data (Gaede & Günther 1998). Unfortunately, only B-trees (Bayer & McCreight 1972; Comer 1979) and R-trees (Guttman 1984) have found their way into commercial database systems. One of the reasons is that an implementation of a new access method or a generic search tree is a huge undertaking because it interacts closely with low-level components of the database engine such as concurrency control, lock manager, and buffer manager. Reliability is of paramount importance for a mature database product, which often discourages extensive changes to low-level components of the database engine. In addition, applications now require new data types and more advanced searches, such as nearest neighbor for spatial data (Roussopoulos et al. 1995) or regular path expressions for semistructured data (Goldman & Widom 1997). It is expected that the access methods supported by a database system will not always match the growing needs of applications.

A better approach to extensible access and indexing is through user-defined access methods and user-defined indexing that are tightly integrated with the database engine. Stonebraker (1986) and Stonebraker et al. (1999) introduced user-defined access methods by describing an interface for implementing a new access method. Users specify functions for opening a scan of an index, getting the next record in a scan, inserting a record, deleting a record, replacing a record, and closing a scan. These functions are called by the database engine at appropriate places when executing a query plan. User-defined access methods have a performance advantage because they are tightly integrated with the database engine. However, experience has shown that it is extremely difficult for application developers
to define a new access method. The reason is that a new access method has to interface with several low-level components of a database engine, including the lock manager for locking of index objects, log manager for recovery, and buffer manager for page management. Few non-database-engine developers possess such an intimate knowledge of the internals of a database engine as to be able to write an access method effectively. Extensive changes to low-level components of a database engine are always a risky proposition that is not taken lightly in the context of real-world applications of a mature database product.

Researchers have also extended the concept of search in a user-defined access method in the form of generalized search trees (Hellerstein et al. 1995). The notion of a search key is generalized to a user-defined predicate that holds for every datum below the key in a search tree. Users define six key methods that are invoked during the top-down search and insertion or deletion of generalized search trees. This has been further extended in Aoki (1998) to allow more powerful searches such as nearest-neighbor and ranked search.

Indexing aims to provide the efficient search and querying of data using some underlying indexed access method. The purpose of user-defined indexing is to extend indexing capabilities to data types and predicates that may not be directly supported by the underlying indexed access methods. Stonebraker (1986) introduced a mechanism (called extended secondary indexes in Lynch & Stonebraker 1988) that allows users to apply existing access methods such as B-trees to new data types. In the case of B-trees, an operator class can be defined that provides a list of user-defined operators and specifies the correspondence between the user-defined operators and the standard relational operators for B-trees. Users can specify an operator class when creating an index on a table column of a new data type. The mechanism of extended secondary indexes in Stonebraker (1986) is generalized in Lynch & Stonebraker (1988) so that predicates that cannot be mapped neatly to comparison operators can also be used for an indexed scan of a table. For instance, one may want to index on keywords occurring in the titles of books. Their idea is to introduce another operator that is applied to values of a column to generate index-key values (e.g., an operator that returns a list of keywords occurring in a given string). The result of the operator can be a list of values that can be compared. The introduction of this operator essentially provides a logical separation between values of a table column to be indexed (e.g., the title of a book in a table books) and the corresponding index keys (e.g., keywords occurring in the title of a book).

Our framework of high-level indexing of user-defined types generalizes the extended secondary indexes in Stonebraker (1986) and Lynch & Stonebraker (1988) in two important ways. First, we provide user control over search in indexing that maps a predicate with a search argument into search ranges used by an underlying access method for an indexed scan of a table. Such mapping is no longer limited to a single
search range based on a relational operator. For a user-defined predicate, users can provide their own method of generating possibly multiple search ranges based on the search arguments of the predicate. Second, we provide user control over the execution of possibly expensive user-defined predicates using a multistage procedure for filtering with approximate predicates (Shivakumar et al. 1998). The framework is tightly integrated with the database engine, especially with the index manager and query optimizer. It is orthogonal to the underlying access methods and can take advantage of any special access methods whenever they are available.

Our main contribution is not in developing a new access method or a special search algorithm, but rather in providing a framework in which users have direct control over index maintenance, exploitation, index filtering, and predicate evaluation. More specifically, users can define their own key transforms. The idea of key transforms is not new; examples are transforming a geometric object into an MBR for R-trees (Guttman 1984) or into a set of z-values (Orenstein & Manola 1988). Following Stonebraker (1986) and Lynch & Stonebraker (1988), we give the power to users to decide what abstractions or approximations to use as index keys for a user-defined type. Users can define their own search-key producers for search patterns of arbitrary predicates. Although search key producers are not sufficient by themselves to support advanced searches such as ranked and nearest neighbor (which require direct support from the underlying access methods), they bridge the semantic gap between user-defined predicates and the limited access methods that are available. Users can define their own filters to avoid expensive predicate evaluation. Multistage predicate evaluation has been explored in Brinkhoff et al. (1993, 1994). Researchers have also investigated query optimization issues with expensive predicates (Chaudhuri & Shim 1996; Hellerstein & Stonebraker 1993) and with filtering using approximate predicates (Shivakumar et al. 1998). Our contribution is in integrating multistage evaluation of predicates with the database engine, especially the index manager, thus providing an implementation framework where approximate predicates can be used effectively for efficient query execution. As we have shown, the index filter is a powerful technique that makes it possible to avoid the I/O cost of retrieving useless data into the memory buffer. Furthermore, it offers an interesting mechanism to combine multiple indexing mechanisms in a single search (e.g., structured search with external full-text indexing).

As we have shown, our framework for high-level indexing can also be effectively used to integrate external search engines in situations where the content-specific index data need to reside outside the database system in a proprietary format. This is especially important for a vendor that is specialized in search technology for specific types of content and would like to implement a database plug-in component for an
existing object-relational component DBMS. Here, a tight integration that stores the index in the database system may not be feasible because it requires the mapping of highly tuned content-specific index data structures and the processing model of the indexing engine to the indexing approach of the database system, which can very often be too costly or result in limitations. Loosely integrated approaches, based on logical query rewrite, that solve this problem have been explored in Chaudhuri & Shim (1993) and Deßloch & Mattos (1997). However, because logical query rewrite usually precedes cost-based optimization in the overall process of query evaluation and may not be tightly integrated with the optimizer, decisions on whether to exploit an index may happen prematurely and have a negative effect on the quality of the query-execution plan. In comparison, our approach uses extensibility at the index level, which consequently gives our cost-based optimizer complete control over index exploitation. The tight integration with the database engine means that it is possible for the query compiler to exploit user-defined predicates in the standard framework of query optimization. This means that the full querying capabilities of SQL, including multiple predicates in a WHERE clause, aggregate functions, subqueries, and recursion, are now available for universal applications through DB2.
5.1
Introduction Today's data-intensive applications bring a challenging collection of requirements to data management. These requirements include
• Accessing data from heterogeneous data sources. Data are stored not only in traditional database management systems (DBMSs), but also in simple files, index-sequential files, desktop databases, spreadsheets, project management tools, electronic mail, directory services, and spatial and multimedia data stores. Recently, the explosive growth of the World Wide Web (WWW) has transformed this medium into a vast repository of information and knowledge. All these data are usually heterogeneous in their native forms.

• Specialized data management. Given that heterogeneous data sources are the norm, each data source tends to have different strengths and weaknesses in the areas of data model, query language, transaction, and indexing support. Applications want to access data directly from their source of origin and exploit their data stores' native capabilities. For example, messaging data stores have a tree-structured name space and manage collections of heterogeneous objects such as messages, appointments, and contacts. Spatial and image data stores require support for geometric data types, indexes, and query language.

• Different data navigation models. The environments where the application runs may have different requirements for the way data are navigated. For example, an application running in a middle tier may operate on a small subset of data cached entirely within the application space and provide bidirectional or positional access to data. An application running in the database usually works on large data sets and tends to navigate data by reading records sequentially in a forward-only direction.
It is possible to consider a solution to all these requirements using a single DBMS. Indeed, several database companies have pursued a traditional database-centric approach, which this chapter refers to as the universal database. In this approach, the database vendor extends the database engine and programming interface to support new data types, including text, files, spatial, video, and audio. The vendor requires customers to move all data needed by the application, which can be distributed in diverse sources across a corporation, into the vendor’s database system. An advantage of this approach is that many DBMS services such as queries, transactions, and utilities can be generalized to manage all these heterogeneous data in a uniform way. A disadvantage is that moving data from their source of origin into the database may be very laborious and expensive. An even greater problem arises when there are other tools that work with the data in their native format outside of the database, such as a map-drawing program or email application.
An alternative approach, which we call universal data access (UDA), is based on the premise that heterogeneity of data and sources is the norm and will stay so. UDA allows applications to efficiently access data where they reside without replication, transformation, or conversion. Common abstractions and interfaces allow connectivity among all data sources. Independent services provide for distributed queries, caching, updating, data marshaling, distributed transactions, and content indexing among sources. UDA can be implemented as a complementary approach and used in combination with the universal database approach. Microsoft developed OLE DB and ActiveX Data Objects (ADO) in support of the UDA approach. OLE DB is a fairly detailed, system-level interface used by system developers to encapsulate component database functionality. OLE DB interfaces are the key enabler of UDA's component architecture. Early in the design of OLE DB, it became clear that this was not an interface appropriate for business-application developers. Therefore, Microsoft developed ADO as a set of application-level interfaces designed to simplify access to data and provide a more natural data access experience for programmers from multiple languages. Together, OLE DB and ADO enable applications to have uniform access to data stored in DBMS and non-DBMS sources. Microsoft released OLE DB 1.0 in August 1996 as part of a data-access Systems Developer's Kit. To date, the OLE DB architecture has been used by applications that access data from multiple heterogeneous data sources, as a component interface to build services that augment the data store's native functionality, as a way to build custom solutions on top of commercial data stores, and as a component interface inside a DBMS. In the remainder of this chapter, we identify some of the aspects of the architecture that contribute to OLE DB's inherent component nature. Then, after looking at the base objects that comprise the abstractions for connecting and representing data, we show how those base abstractions have been extended to support particular types of data. Finally, we take a look at some of the generic services that have been built using OLE DB's component nature to augment the functionality of existing data providers, how custom solutions have been built on top of existing data stores, and how OLE DB has been used in several component database scenarios. But first, let us take a look at how OLE DB enables the componentization of Microsoft's UDA Architecture.
5.2
Universal Data Access UDA is a component database architecture built as a collection of data management components and services, whose behavior is defined by a set of interfaces. OLE DB, based on the Microsoft Component Object Model (COM) (Box 1998), defines the system-level programming
interfaces that encapsulate various DBMS components. ADO, based on Automation (Microsoft 1997a), defines an application-level data-access model that allows enterprise programmers to write applications over OLE DB data from any language including Visual Basic, Java, VBScript, JavaScript, C, and C++. Figure 5.1 illustrates the UDA architecture. There are three general kinds of database components: data providers, services, and consumers.
• Data providers are components that represent data sources such as SQL databases, index-sequential files, and spreadsheets.

• Services are components that add functionality to existing OLE DB data. Services can be combined with other components because they consume and produce data using the same set of interfaces. For example, a cursor engine is a service component that consumes data from a sequential, forward-only data source to produce scrollable data. A relational query engine is an example of a service over OLE DB data that produces rowsets satisfying a Boolean predicate.

• Consumers are components that consume OLE DB data. The best example of a consumer is ADO, which provides a high-level data-access programming model over OLE DB services and data providers to enable business applications to be written in languages such as Visual Basic, C++, or Java.
Figure 5.1 Universal Data Access (UDA) architecture. (Consumers such as applications, tools, and ADO sit on top; service components such as a cursor engine, a distributed query engine, a relational query engine, and Microsoft Transaction Server sit in the middle; data providers such as OLAP, RDBMS, ISAM, spatial, and e-mail stores sit at the bottom; the layers are connected through OLE DB and COM/DCOM.)

All interactions among components in Figure 5.1 (indicated by bidirectional arrows) may occur across process and machine boundaries
through network protocols such as Distributed COM (DCOM) (Eddon & Eddon 1998) or Hypertext Transfer Protocol (HTTP) (HTTP 1997). Transacted interactions among components are possible via a distributed transaction coordination service such as the one included in Microsoft Transaction Server (Homer & Sussmann 1998).
5.3
OLE DB: A Component Data Model This section describes how OLE DB provides a component architecture for accessing and managing data. This is not intended to be an exhaustive reference for the interface, but rather a high-level overview of the basic abstractions defined in OLE DB and some of the extensions built with, and on top of, those abstractions to extend the model with domain-specific functionality. In particular, we will emphasize those aspects of OLE DB that contribute to its inherent extensibility and compositional nature. A complete interface reference is documented in Microsoft (1998a).
5.3.1
The Microsoft Component Object Model OLE DB is designed as a Microsoft COM interface. COM provides the foundation for OLE DB's compositional nature. In order to understand how OLE DB is used to build component solutions, it is necessary to understand a bit about COM. The following section provides an overview of the COM concepts of interface factoring, interface inheritance, interface navigation and object reference counting, encapsulation, and aggregation. A complete reference to COM can be found in Box (1998).

Interface Factoring. In COM, a single object exposes one or more interfaces. Each interface exposed by the object represents a fixed set of functionality, such that the total functionality of the object is factored among the various interfaces. The full set of functionality defined in the OLE DB specification is factored into multiple discrete interfaces that group related levels of functionality for each object. This interface factoring is what makes OLE DB so accessible to simple data stores. Rather than forcing every data store to implement a full relational engine in order to be accessible through common tools, the data store implements an object comprising only those interfaces that represent native functionality for that data store. At a minimum, this includes core interfaces that provide a simple forward-only, read-only view of the data. In this chapter interfaces are denoted by the interface name in boldface, prefixed with the letter I, for example, IUnknown or IRowset. Methods specific to an interface are denoted by the interface name,
followed by the method name, separated by double-colons and followed by parentheses, for example, IUnknown::QueryInterface(). Where unambiguous, the interface name is omitted, as in QueryInterface().

Interface Inheritance. Interface inheritance refers to the ability to define a new interface that extends the functionality of an existing interface. Inheriting from an existing interface means that the new interface is designed to automatically support all of the methods defined in the inherited interface. When implementing the new interface, the object automatically supports the inherited interface and consumers can call inherited methods through either interface. Multiple interfaces may inherit from the same base interface, in which case all such interfaces support the same subset of methods. COM only allows an interface to inherit from a single parent, although the parent interface may itself inherit from another interface, and such nesting is allowed to go arbitrarily deep. In this chapter we indicate interface inheritance as the name of the inheriting interface followed by the name of the inherited interface, separated by a single colon, as in IRowset:IUnknown.

Interface Navigation and Object Reference Counting. Each interface instance in COM inherits directly or indirectly from a common interface implemented on each COM object called IUnknown, such that the IUnknown interface is the super-parent of every other interface supported by the object. IUnknown contains AddRef() and Release() methods, which control the lifetime of the object through reference counting, and a QueryInterface() method for navigating between the different interfaces on the object. Figure 5.2 shows a single COM object that implements two interfaces, IRowset and IRowsetFind, in addition to IUnknown.

Figure 5.2 Factoring of functionality between interfaces. (A single COM object exposing IUnknown, IRowset:IUnknown, and IRowsetFind:IUnknown.)

Encapsulation. Because the interface definition defines the contract with the consumer, the details of how that interface is implemented are hidden from the consumer. Oftentimes a COM object may implement some or all of its functionality by calling other COM objects internally. These internal COM objects are hidden from the consumer, and all calls
go through the outer object. Only those methods and interfaces implemented by the outer object are exposed to the consumer. This is called encapsulation. Figure 5.3 shows how one COM object (object A) may call another COM object (object B) internally through encapsulation. Note that the consumer calls both IRowset and IRowsetFind as implemented directly on object A. Object B exposes its own IUnknown interface (labeled IUnknown′), and object B's IRowset inherits from object B's IUnknown′. Object B's interfaces are not available to the consumer and are called only by object A in order to provide object A's exposed functionality. This looks like a single object (object A) to the consumer, which has no knowledge of object B.

Figure 5.3 Object A encapsulates object B. (Object A exposes IUnknown, IRowsetFind, and IRowset; the hidden inner object B exposes its own IUnknown′ and IRowset:IUnknown′.)

COM Aggregation. Another way a COM object can be written to take advantage of a separate object's existing functionality is through aggregation. Aggregation is a lot like encapsulation, except that in aggregation the consumer can call interfaces on the inner object directly, including interfaces that the outer object does not know about. Exposing functionality that is unknown to the outer object is called blind aggregation. In aggregation, when the inner COM object (object B) is created, it is provided with the IUnknown interface of the outer COM object (object A). In this case, each interface exposed by the newly created object B inherits from object A's IUnknown implementation. This controlling IUnknown then manages the navigation between sets of functionality supported by interfaces implemented on object A and sets of functionality supported by interfaces on object B. Object B's IUnknown interface is only exposed to object A and is used by object A to navigate between object B's interfaces. Figure 5.4 shows a COM object (object B) that supports the IRowset interface, and a controlling object (object A) that implements the IRowsetFind interface. When the consumer calls methods in IRowset, it is calling the method implemented on object B directly. When the consumer calls methods in IRowsetFind, it is calling the method implemented on object A, which may in turn use object B's IRowset interface under the covers.

Figure 5.4 Object A aggregates object B. (Object A exposes IUnknown and IRowsetFind; the aggregated object B exposes IRowset directly to the consumer, inheriting from object A's controlling IUnknown.)

The QueryInterface(),
AddRef(), and Release() methods of the IUnknown interface implemented by object A are used, whether the methods are called from the IUnknown interface directly or through either the IRowset or IRowsetFind interface. This controlling unknown navigates between the interfaces supported by the two objects. To the application, this appears as a single object that implements both IRowset and IRowsetFind (in addition to IUnknown).
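The navigation and reference-counting rules above lead to a very regular calling pattern on the consumer side. The following C++ sketch shows that pattern for the IRowset and IRowsetFind interfaces used in the figures; the helper function name and the assumption that pUnknown refers to a provider object exposing those interfaces are ours.

#include <oledb.h>

HRESULT UseRowsetAndFind(IUnknown* pUnknown)
{
    IRowset* pRowset = NULL;
    IRowsetFind* pFind = NULL;

    // Ask the object whether it exposes IRowset; a successful call adds a reference.
    HRESULT hr = pUnknown->QueryInterface(IID_IRowset, (void**)&pRowset);
    if (FAILED(hr))
        return hr;

    // Navigate from one interface to another on the same (possibly aggregated) object.
    hr = pRowset->QueryInterface(IID_IRowsetFind, (void**)&pFind);
    if (SUCCEEDED(hr))
    {
        // ... use pFind here ...
        pFind->Release();      // balance the reference acquired above
    }

    pRowset->Release();        // every successful QueryInterface is balanced by a Release
    return hr;
}

Whether IRowsetFind is implemented by the same object, by an encapsulated inner object, or by an aggregated one is invisible to this code, which is precisely the point of the COM composition mechanisms described above.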
5.3.2
Introspection via Properties For a consumer to use a generic data provider, it must be able to determine the characteristics, functionality, and behavior of that provider. The ability to determine these characteristics at runtime is referred to as introspection. OLE DB provides a rich property model for data stores to expose provider-specific options and behavior, and for requesting extended functionality. Using properties the consumer can
• Determine data provider information. Properties provide a way for the consumer to determine information about the data provider; for example, the data provider's name and version, level of SQL supported (if any), case sensitivity, and maximum row size. Data-source informational properties also allow the consumer to determine the data provider's behavior, for example, the effect of committing or aborting a transaction on any prepared statements or how null concatenation is treated.

• Specify connection information. Properties provide a way for the consumer to determine what connection information (server, username, password, etc.) is required by the data provider in order to connect to a data source and to specify the required information.

• Request extended functionality. Properties provide a way for the consumer to request that an object support certain functionality above and beyond the minimal base functionality. For example, the consumer may specify that a set of results should be updatable and support scrolling. In this way, the consumer can specify exactly the level of functionality it requires, and the data provider is free to make optimizations by not supporting any extended functionality not requested by the consumer.

Properties are divided into property sets, identified by a globally unique identifier (GUID). There are predefined GUIDs for each group of OLE DB–defined properties, and consumers can define their own GUIDs for provider-specific properties. Consumers can request specific properties or request all properties of a given type, for example, all initialization properties used to connect to a data source. The provider returns information about all of the OLE DB–defined properties it supports, as well as any provider-specific properties. This information includes a name for the property, its type, whether it is required or optional, and possible values for enumerated properties. Thus, a general consumer (one not tied to a particular data source) can present these properties to a user, along with the necessary information needed by the user to set them to appropriate values, without knowing the properties ahead of time.
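As an illustration of how a consumer supplies connection information through properties, the following C++ sketch sets two standard initialization properties on a data source object. The structures and property IDs shown (DBPROP, DBPROPSET, DBPROPSET_DBINIT, DBPROP_INIT_DATASOURCE, DBPROP_AUTH_USERID) are part of OLE DB, but the server and user names are placeholders and error handling is omitted.

#include <windows.h>
#include <oledb.h>

HRESULT SetConnectionProperties(IDBProperties* pIDBProperties)
{
    DBPROP prop[2];
    DBPROPSET propSet;

    // Name of the data source (server, file, or database) to connect to.
    VariantInit(&prop[0].vValue);
    prop[0].dwPropertyID   = DBPROP_INIT_DATASOURCE;
    prop[0].dwOptions      = DBPROPOPTIONS_REQUIRED;
    prop[0].colid          = DB_NULLID;
    prop[0].vValue.vt      = VT_BSTR;
    prop[0].vValue.bstrVal = SysAllocString(L"MyServer");      // placeholder

    // User ID to authenticate with.
    VariantInit(&prop[1].vValue);
    prop[1].dwPropertyID   = DBPROP_AUTH_USERID;
    prop[1].dwOptions      = DBPROPOPTIONS_REQUIRED;
    prop[1].colid          = DB_NULLID;
    prop[1].vValue.vt      = VT_BSTR;
    prop[1].vValue.bstrVal = SysAllocString(L"someuser");       // placeholder

    // Both properties belong to the initialization property set.
    propSet.rgProperties    = prop;
    propSet.cProperties     = 2;
    propSet.guidPropertySet = DBPROPSET_DBINIT;

    HRESULT hr = pIDBProperties->SetProperties(1, &propSet);

    VariantClear(&prop[0].vValue);
    VariantClear(&prop[1].vValue);
    return hr;
}

After SetProperties succeeds, the consumer calls IDBInitialize::Initialize() on the same data source object to establish the connection, as described in the next subsection.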
5.3.3
Common Abstractions OLE DB is built on a small set of common abstractions, along with a set of common extensions for exposing additional functionality of diverse types of data stores. By building on a set of common abstractions, OLE DB enables generic components to operate on those abstractions as individual entities, without needing to know how the entities were generated or the details of what they represent. OLE DB defines three core object classes common to any data provider; the data source object (DSO) provides a common abstraction for connecting to a data store, the session object provides a transactional scope for multiple concurrent units of work, and the rowset object provides a common abstraction for representing data in a tabular fashion, as shown in Figure 5.5. All of the extensions, built on OLE DB, extend this basic object hierarchy. Figure 5.5 shows the use of the CoCreateInstance() function to create an instance of a particular DSO. The DSO supports an IDBCreateSession interface, whose method CreateSession() returns an instance of a session object. The IOpenRowset interface on the session object, in turn, returns a rowset object that supports, at minimum, the IRowset interface. Connection Abstraction. The OLE DB connection model defines how data providers are located and activated. The DSO and the session object are the basis for the OLE DB connection model. To access a data provider, a consumer must first instantiate a DSO. Windows provides a registry for COM objects, and each data provider’s DSO is identified by a unique class identifier (CLSID) in this registry. A particular data
provider’s DSO is instantiated by calling the OLE CoCreateInstance function, which creates an instance of the object through the object’s class factory. The DSO exposes the IDBProperties interface, which consumers use to provide basic authentication information such as the user ID and password (for cases when the data source does not exist in an authenticated environment) and the name of the data source (computer, file, and database name) containing the data to be accessed. Once these properties have been specified, consumers use IDBInitialize to establish a connection with the data store. Once a DSO has been successfully initialized, it exposes data source information properties through which a consumer can query the capabilities of a provider. These capabilities include the interfaces, rowset properties (e.g., scrollability), transaction properties (e.g., isolation levels), SQL dialects, command operations (e.g., left outer joins, text-search operators), and security options a provider supports. From the initialized data source, the consumer calls CreateSession in order to create a session object. The session object exposes, at a minimum, IOpenRowset, a simple interface through which providers expose their data as rowsets.

The Rowset: A Common Data Representation. A rowset is a unifying abstraction that enables OLE DB data providers to expose data in tabular form. Conceptually, a rowset is a multiset of rows where each row has columns of data. Base table providers present their data in the form of rowsets. Query processors present the results of queries in the form of rowsets. This way it is possible to layer components that consume or produce data through the same abstraction. Rowsets are also used to return metadata, such as database schemata, supported datatype information, and extended column information. The most basic rowset object exposes three interfaces: IRowset, which contains methods for navigating through its rows sequentially;
IAccessor, which permits the definition of groups of column bindings describing the way in which data are bound to consuming components; and IColumnsInfo, which describes the columns of the rowset and their types. Using IRowset, data may be traversed sequentially or in a bidirectional manner, depending on the properties of the rowset. Figure 5.6 illustrates a typical rowset object representing a collection of people.

Figure 5.6  Rowset object: a rowset over a People table (Name, Address, Resume, Picture, and Voice columns) exposing the IRowset, IAccessor, and IColumnsInfo interfaces; an IStream is shown for the Resume, Picture, and Voice columns.

Other rowset interfaces capture rowset properties a provider may support. For example, rowsets that support updatability may expose interfaces to insert, delete, or update rows. In addition, there are rowset interfaces that expose richer row navigation models, such as direct row access and scrollability.
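As a rough illustration of how a consumer walks this object hierarchy, the following C++ sketch (ours, not from the chapter) connects to a provider, opens a rowset over a hypothetical People table, and fetches rows with IRowset::GetNextRows. COM initialization, error handling, property setting, and accessor bindings are omitted, and the provider CLSID is left as a parameter.

// A minimal sketch of the DSO -> session -> rowset hierarchy in Figure 5.5.
#include <windows.h>
#include <oledb.h>

HRESULT ReadSomeRows(REFCLSID clsidProvider)   // clsidProvider: any provider's CLSID
{
    IDBInitialize    *pInit    = NULL;
    IDBCreateSession *pCreate  = NULL;
    IOpenRowset      *pSession = NULL;
    IRowset          *pRowset  = NULL;

    // 1. Instantiate the provider's data source object via its class factory.
    HRESULT hr = CoCreateInstance(clsidProvider, NULL, CLSCTX_INPROC_SERVER,
                                  IID_IDBInitialize, (void **)&pInit);
    if (FAILED(hr)) return hr;

    // Initialization properties (data source name, user ID, and so on) would be
    // set here with IDBProperties::SetProperties before connecting.
    hr = pInit->Initialize();                          // 2. Connect to the store.

    // 3. Create a session: the scope for transactions and rowset creation.
    pInit->QueryInterface(IID_IDBCreateSession, (void **)&pCreate);
    hr = pCreate->CreateSession(NULL, IID_IOpenRowset, (IUnknown **)&pSession);

    // 4. Open a rowset over a hypothetical table named People.
    DBID tableID = DB_NULLID;
    tableID.eKind          = DBKIND_NAME;
    tableID.uName.pwszName = (LPOLESTR)L"People";
    hr = pSession->OpenRowset(NULL, &tableID, NULL, IID_IRowset,
                              0, NULL, (IUnknown **)&pRowset);

    // 5. Fetch rows one at a time with GetNextRows; column values would be
    //    copied out through an accessor (IAccessor) and IRowset::GetData.
    HROW hRow; HROW *phRow = &hRow; DBCOUNTITEM cObtained = 0;
    while (SUCCEEDED(pRowset->GetNextRows(DB_NULL_HCHAPTER, 0, 1,
                                          &cObtained, &phRow)) && cObtained == 1)
        pRowset->ReleaseRows(1, phRow, NULL, NULL, NULL);

    pRowset->Release(); pSession->Release(); pCreate->Release();
    pInit->Uninitialize(); pInit->Release();
    return hr;
}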
5.3.4  Common Extensions

In order to be suitable as a primary interface to a variety of data stores, OLE DB defines a set of common extensions on top of the base abstractions. Below we show how the base abstractions have been extended to provide support for issuing a query, navigating an index, navigating hierarchical data, representing heterogeneous data, and navigating multidimensional information in a natural manner. By building these extensions on the commonly defined abstractions, generic consumers can view all data through the common abstractions, while special-purpose consumers can access the domain-specific functionality through common extensions.
Query Syntax Support. The first extension to the basic hierarchy exposes the ability to issue a textual command to a data store through a command object. The command object encapsulates the functions that enable a consumer to invoke the execution of data definition or data manipulation commands such as queries over a relational database. The command object is obtained from the session object by calling the CreateCommand method on the IDBCreateCommand interface. Once a command object is created, the consumer sets the command text to be executed through the SetCommandText method on the ICommand interface, and then calls the Execute method on the same interface. When the command represents a query, the Execute method returns the results in a rowset object, as shown in Figure 5.7. The consumer can specify required or optional properties to be supported on the resulting rowset by setting those properties prior to executing the command.

Because OLE DB is designed to work with any query-capable provider, it does not mandate a specific syntax or semantics for queries. It is entirely up to the provider to define the language for formulating queries. For example, currently there are OLE DB providers for relational DBMSs, full-text-search engines, OLAP engines, directory services, email stores, and spatial stores. Each of these query-capable data stores exposes a different query language. Table 5.1 lists some of the query languages supported by various OLE DB providers. A command object may be built to consume one or more rowsets, such as a rowset exposed by a simple tabular data provider, and expose the results of the query evaluation as a rowset to the user.

ISAM Navigation. Many types of data stores do not support the ability to query data through textual commands, but do support basic indexing functionality. This functionality allows services such as query processors to efficiently access contiguous rows of data within a range of keys.
Table 5.1  Query languages supported by various OLE DB providers.

    Type of Data Source    Product                      Query Language
    Relational             Microsoft SQL Server 7.0     SQL
    Full-text indexing     Microsoft Index Server       Index Server Query Language (a)
    OLAP                   Microsoft OLAP Services      MDX (b)
    Email                  Microsoft Exchange           SQL with hierarchical query extensions
    Directory services     Microsoft Active Directory   LDAP

    (a) Swank and Kittel (1997).  (b) Thomsen et al. (1999).
Figure 5.7  Executing a command: CreateSession() returns a session, IDBCreateCommand::CreateCommand() returns a command object, and ICommand::Execute() returns a rowset exposing IRowset.
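A minimal C++ sketch of the flow in Figure 5.7 (ours, not from the chapter) follows; the SQL text is hypothetical, and we use the ICommandText interface, which carries both the SetCommandText and the inherited Execute methods.

// A minimal sketch: obtain a command from the session, set its text, execute it,
// and receive the results as a rowset.
#include <windows.h>
#include <oledb.h>

HRESULT RunQuery(IDBCreateCommand *pCreateCmd, IRowset **ppRowset)
{
    ICommandText *pCmdText = NULL;
    DBROWCOUNT    cRows    = 0;

    // Obtain a command object from the session.
    HRESULT hr = pCreateCmd->CreateCommand(NULL, IID_ICommandText,
                                           (IUnknown **)&pCmdText);
    if (FAILED(hr)) return hr;

    // Set the command text. DBGUID_DEFAULT leaves the dialect choice to the
    // provider; a SQL provider would typically also accept DBGUID_SQL here.
    hr = pCmdText->SetCommandText(DBGUID_DEFAULT,
             L"SELECT d.name, e.name, e.phone "
             L"FROM Department d LEFT OUTER JOIN Employee e ON d.dno = e.dno");

    // Execute; for a query, the results come back as a rowset.
    if (SUCCEEDED(hr))
        hr = pCmdText->Execute(NULL, IID_IRowset, NULL, &cRows,
                               (IUnknown **)ppRowset);

    pCmdText->Release();
    return hr;
}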
The index object abstracts the functionality of B-trees or index-sequential files by exposing a rowset with extended functionality (Figure 5.8). Indexes are traversed using the IRowset interface, information about the index entries is obtained via the IColumnsInfo interface, and insertions and deletions are performed through the IRowsetChange interface. By building on the common rowset abstraction, developers can apply much of the same logic, tools, and even code for manipulating an index as they would use for any other rowset.

Figure 5.8  Rowset object representing an index: the index object exposes IRowsetIndex, IRowset, IColumnsInfo, and IRowsetChange over the underlying index and data pages.

Hierarchical Navigation. In order to support a simple, efficient mechanism for navigating homogeneous one-to-many relationships, such as a master-detail relationship between departments and employees, OLE DB defines hierarchical rowsets. In a hierarchical rowset there is one rowset per level of hierarchy representing all data instances at a given level. In other words, each rowset in the hierarchy captures an entire
relationship between two entities. The rowsets at each level in the hierarchy support the basic rowset abstraction discussed earlier. The rowsets are linked together via chapters, which are a mechanism to identify groups of related child rows. Chapters behave as separate collections with a beginning and an end, but also share the facilities (such as accessors and notifications) of the rowset to which they belong.
EXAMPLE 1
To illustrate hierarchical result sets, consider first a SQL statement producing a flat collection of rows:

    SELECT   d.name, e.name, e.phone
    FROM     Department d LEFT OUTER JOIN Employee e
             ON d.dno = e.dno
    ORDER BY d.name

The execution of this statement produces a single rowset containing the employee names and their phone numbers per department in an organization. The department name is repeated for each employee working in that department. This statement can be reformulated to produce a two-level hierarchy where the department name occurs only once and each department is associated with its corresponding set of employees:

    SELECT   d.name, SET(e.name, e.phone) AS emps
    FROM     Department d LEFT OUTER JOIN Employee e
             ON d.dno = e.dno
    GROUP BY d.name
The result of executing the second statement is a hierarchy of two rowsets, one containing department and the other containing employee information. Each rowset represents a complete hierarchy level in the result. The root rowset—the rowset returned from ICommand::Execute—contains the department name (name) and a chapter column (emps). Each chapter value in emps references a (possibly empty) set of rows in the second rowset containing information about employees. Each grouping of a detail set of rows, such as the employees working for a particular department, is treated as a collection with a beginning and end. A chapter identifies this detail group. A chapter can be implemented as a partial key to the details or as a parameterized query. Because it is partial, it might lead to a set rather than a single row. The functions that take a chapter as a parameter use the beginning, membership, and end implied by the chapter to limit the rows that the request may return.
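As a sketch of how a consumer might navigate such a hierarchy (ours, not from the chapter), assume the chapter column emps has been bound as DBTYPE_HCHAPTER and that the provider exposes the detail rowset through IRowsetInfo::GetReferencedRowset; the chapter handle then scopes row fetching to one department's employees.

// A minimal sketch: fetch the detail rows identified by one chapter value.
#include <windows.h>
#include <oledb.h>

HRESULT ReadDetailRows(IRowset *pParentRowset, DBORDINAL iEmpsOrdinal,
                       HCHAPTER hEmpsChapter)
{
    IRowsetInfo *pInfo  = NULL;
    IRowset     *pChild = NULL;

    // The detail rows live in a separate rowset; a chapter value only scopes a
    // group of rows within it. Obtain that child rowset from the parent rowset.
    HRESULT hr = pParentRowset->QueryInterface(IID_IRowsetInfo, (void **)&pInfo);
    if (FAILED(hr)) return hr;
    hr = pInfo->GetReferencedRowset(iEmpsOrdinal, IID_IRowset,
                                    (IUnknown **)&pChild);
    pInfo->Release();
    if (FAILED(hr)) return hr;

    // Passing the chapter handle to GetNextRows restricts the traversal to the
    // detail rows belonging to this particular parent (e.g., one department).
    HROW hRow; HROW *phRow = &hRow; DBCOUNTITEM cObtained = 0;
    while (SUCCEEDED(pChild->GetNextRows(hEmpsChapter, 0, 1, &cObtained, &phRow))
           && cObtained == 1)
    {
        // Employee columns would be read here through an accessor and GetData.
        pChild->ReleaseRows(1, phRow, NULL, NULL, NULL);
    }

    pChild->Release();
    return hr;
}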
Although a chapter value may be held in a column of a table, that value is not the same thing as the physical instantiation of the chapter. The chapter may be instantiated when a command is executed or on demand when the chapter value is used. In any case, the actual chapter belongs not to the parent rowset that had the chapter value, but to the rowset containing the rows the chapter references. Hence, the lifetime of chapter instances is controlled by the rowset containing the detail rows.

Heterogeneous Data and Containment Relationships. Rowsets are well suited for representing homogeneous results—results that have the same set of columns from one row to the next. This works well for structured data, such as the results of a query, but becomes a limitation when dealing with collections of dissimilar rows. For example, an email inbox is a collection of items. Each item can be of a different type, such as an email message, a calendar entry, a journal entry, or a folder. They may share a common set of attributes, such as name, date, or size, but each also has its own specific attributes that are not relevant to the other items. Because rowsets expose a common set of columns for each row, representing all of these items in a single rowset would require exposing a column in each row of the rowset for each possible attribute of any item within the rowset.

In order to simplify support for heterogeneous collections, in which some columns may vary from one row to another, OLE DB defines a row object. Each row object represents an individual row instance, which can be a member of a rowset or a single result value such as the result of a singleton SELECT statement in SQL. Consumers can navigate through a set of rows viewing the common set of columns through the rowset abstraction and then obtain a row object for a particular row in order to view row-specific columns.

In addition to heterogeneous collections, hierarchies of row and rowset objects can be used to model containment relationships common in tree-structured data sources, such as all files within a directory in a file system or all messages within a folder in an email system. These hierarchies can be used not only to navigate through the data, but also to manipulate the underlying tree and its contents. Row objects have their own metadata and can contain rowset-valued columns. Row objects contain their own interfaces to add or remove columns and to create new row objects. In addition, row objects can be augmented to support specialized interfaces that implement operations whose semantics scope all dependent objects (rows and rowsets) in the containment hierarchy rooted at the row. Just as an OLE DB consumer can change the rows in a rowset and transmit those changes to the underlying data store, it can also create new objects in a tree-structured name space, such as directories and email folders. Or it can move, copy, or delete entire subtrees through various scoped operations on the appropriate row objects.
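A minimal sketch (ours, not from the chapter), assuming the provider supports OLE DB row objects through the IGetRow and IRow interfaces, of how a consumer might move from a rowset row to its row object:

// A minimal sketch: after positioning on a row in a heterogeneous rowset,
// obtain a row object to reach that row's item-specific columns.
#include <windows.h>
#include <oledb.h>

HRESULT GetRowObject(IRowset *pRowset, HROW hRow, IRow **ppRow)
{
    IGetRow *pGetRow = NULL;

    HRESULT hr = pRowset->QueryInterface(IID_IGetRow, (void **)&pGetRow);
    if (FAILED(hr)) return hr;                 // provider exposes no row objects

    // The row object represents this single row instance; columns that are not
    // part of the rowset's common schema can then be read with IRow::GetColumns.
    hr = pGetRow->GetRowFromHROW(NULL, hRow, IID_IRow, (IUnknown **)ppRow);
    pGetRow->Release();
    return hr;
}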
OLAP Data. Online Analytical Processing (OLAP) provides a way to consolidate raw data into a multidimensional view. For example, an OLAP report might show the number of computer sales by region, broken down by salesperson. In order to provide support for OLAP, OLE DB defines a multidimensional expression language (MDX) (Thomsen et al. 1999). Multidimensional queries written in MDX may be submitted to OLAP data providers through the standard OLE DB command object. A special GUID, MDGUID_MDX, is defined for the MDX dialect so that providers that can support both SQL and MDX queries can distinguish between the two. General consumers can view the results of an MDX query through the standard rowset abstraction. In this view, the results of the query are flattened into a tabular view. OLAP consumers, however, are often designed to navigate and manipulate the results of an MDX query along its multiple dimensions. For these types of consumers, it is convenient to expose a multidimensional data object. OLE DB defines such an object, called the dataset, which exposes interfaces for navigating and accessing the results of an MDX query in multiple dimensions. Other than the specification of the MDX query language and addition of the dataset object for manipulating data in multiple dimensions, OLE DB required little in the way of extensions to support OLAP data. The same connection model, commands, and rowsets continue to apply to OLAP data providers. Although OLE DB defines additional metadata schema for cubes, measures, dimensions, hierarchies, levels, properties, and members, these metadata are accessed through the IDBSchemaRowset interfaces on the session object, just as are any other metadata. By requesting that the results of an MDX query be returned as a flattened rowset, generic consumers can talk to OLAP data providers in the same manner as a relational or other command-based provider.
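The following sketch (ours, not from the chapter) shows how an MDX statement might be submitted through the standard command object; the cube, dimension, and measure names are hypothetical, and the dialect GUID is the MDGUID_MDX value mentioned above, defined by the OLE DB for OLAP specification.

// A minimal sketch: submit an MDX query through the same command object used
// for SQL, distinguishing the dialect with MDGUID_MDX.
#include <windows.h>
#include <oledb.h>

HRESULT RunMdxQuery(ICommandText *pCmdText, IRowset **ppRowset)
{
    DBROWCOUNT cRows = 0;

    HRESULT hr = pCmdText->SetCommandText(MDGUID_MDX,
        L"SELECT { [Salesperson].MEMBERS } ON COLUMNS, "
        L"       { [Region].MEMBERS }      ON ROWS "
        L"FROM   [Sales] "
        L"WHERE  ( [Measures].[Computer Sales] )");

    // Requesting IRowset flattens the multidimensional result into a tabular
    // view; an OLAP-aware consumer would request the dataset object instead.
    if (SUCCEEDED(hr))
        hr = pCmdText->Execute(NULL, IID_IRowset, NULL, &cRows,
                               (IUnknown **)ppRowset);
    return hr;
}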
5.4  Services

The previous section describes how OLE DB defines a small set of common abstractions and builds on that base level of functionality by defining common extensions that reuse those abstractions where appropriate. This section discusses how OLE DB facilitates building services that can be composed in order to provide a component-based solution for accessing and managing data. In each case, the service described illustrates how common components are implemented in order to extend existing data providers with discrete sets of functionality—namely, connection facilities, resource pooling, and rich client-cursor functionality.
5.4.1  Adding Generic Connection Facilities through Data Link
A good example of a component that makes use of OLE DB’s extensible property model for introspection is the Microsoft Data Link Component, shipped as part of the Microsoft Data Access Components (MDAC) version 2.0 (Microsoft 1999a). Prior to the Data Link Component, OLE DB applications had to implement their own logic for determining and specifying connection information. This information could vary from one provider to another. In addition, consumers wanted an easy way to persist, load, and exchange connection information, preferably in the form of a string.

The Data Link Component provides a common connection property editor. This component queries the individual data provider for a list of its initialization (connection) properties and exposes these to the consumer through a simple common interface. The property names and datatypes are determined from the provider, along with possible values for enumerated types. The Data Link Component uses the property’s option values to enable the consumer to distinguish between required and optional properties. Applications can call this component directly in order to provide a common interface for specifying connection properties. The component returns a DSO for the selected provider with the specified properties set.

Once the initialization properties have been set on a provider, the consumer can obtain a ConnectionString representation of those properties that can be used internally or written to a universal data link (.udl) file. The ConnectionString is built in the form of property name–value pairs, where the name of each property is again obtained from the property information supplied by the provider. In this way, the Microsoft Data Link Component can build and parse connection strings with no a priori knowledge of the provider’s set of valid properties. There is also a stand-alone utility for creating or modifying .udl files.
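As an illustration (ours, not from the chapter), the OLE DB service component IDataInitialize can turn such a connection string into an initialized DSO; the provider name and property values below are hypothetical.

// A minimal sketch: parse a connection string of property name-value pairs (the
// same form the Data Link component writes to a .udl file) and connect.
#include <windows.h>
#include <oledb.h>
#include <msdasc.h>   // IDataInitialize, CLSID_MSDAINITIALIZE

HRESULT ConnectFromString(IDBInitialize **ppInit)
{
    IDataInitialize *pDataInit = NULL;

    HRESULT hr = CoCreateInstance(CLSID_MSDAINITIALIZE, NULL,
                                  CLSCTX_INPROC_SERVER, IID_IDataInitialize,
                                  (void **)&pDataInit);
    if (FAILED(hr)) return hr;

    hr = pDataInit->GetDataSource(NULL, CLSCTX_INPROC_SERVER,
             L"Provider=SQLOLEDB;Data Source=MyServer;"
             L"Initial Catalog=Northwind;Integrated Security=SSPI;",
             IID_IDBInitialize, (IUnknown **)ppInit);

    if (SUCCEEDED(hr))
        hr = (*ppInit)->Initialize();   // connect using the parsed properties

    pDataInit->Release();
    return hr;
}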
5.4.2  Adding Common Resource-Pooling Facilities

The introduction of a common component for specifying connection information also gave Microsoft the ability to implement logic for pooling connections and providing automatic transaction enlistment in a Microsoft Transaction Server (MTS) environment. By connecting through this common component, with or without exposing the connection dialog to the user, OLE DB Service Components are able to aggregate (as described in Section 5.3.1) the data provider’s DSO in order to examine the connection information. If this is the first such connection, a pool is created to hold similar connections, and a connection is created from within that pool. OLE DB Service Components then check to see whether or not the calling thread is in a
transactional context and, if so, create an OLE DB session and enlist it in the global transaction. This session is held as long as the data source is connected, so that when the application calls CreateSession, the Service Component that aggregates the DSO intercepts the call and returns this held session. If the consumer requests a second session without releasing the first, a new session is created and enlisted in the global transaction, but is not held if the consumer releases it. When the consumer releases the DSO, OLE DB Services again intercept the release and return the connected data source (and its session object) to the pool. If another request with the same connection information (including security and transactional context) comes within a specified amount of time (default, 60 seconds), the existing connection is returned directly to the user, without an expensive reconnect to the server and possible reenlistment. If the data source is not used within the specified period of time, the object is released and the connection freed.

Because these resource-pooling facilities are implemented through COM aggregation, the functionality can be implemented on top of existing data providers, without requiring special code in the provider; data providers built according to the original OLE DB specification simply work without changes. This ability to pool database connections and transaction enlistments increased the scalability and performance of ADO and OLE DB components in web applications by several orders of magnitude.
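For illustration (ours, not from the chapter), a consumer that connects with a connection string can control which of these services are applied through the "OLE DB Services" keyword, which maps to the DBPROP_INIT_OLEDBSERVICES initialization property; the string below is hypothetical and requests all services, including pooling and automatic enlistment.

// A minimal sketch: a pooled connection string for use with IDataInitialize as
// shown earlier. The value -1 requests all OLE DB services; other values can
// selectively disable pooling, enlistment, or the client cursor engine.
static const wchar_t kPooledConnection[] =
    L"Provider=SQLOLEDB;Data Source=MyServer;Initial Catalog=Northwind;"
    L"Integrated Security=SSPI;OLE DB Services=-1;";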
5.4.3  Providing a Rich Client-Cursor Service

Relational databases were initially designed to provide a forward-only cursor over data. In fact, scrolled cursors were not introduced until SQL-92. Even then, scrolled cursors are not required for SQL-92’s minimum entry-level conformance and are not required to support updating until the highest full-conformance level. As a result, many database systems support cursor semantics no richer than a forward-only traversal of data. Even for those databases that do support richer scrolling models, keeping a cursor open on the server for each user and incurring a network round trip each time the user scrolls may not be practical for many types of multitier distributed applications.

Still, many types of applications, particularly those that interact with users, require the ability to scroll back and forth through results. Having to write code within the application to cache results, scroll through the cache, and synchronize the cache with the database is not only onerous for the typical client application, but means that the application cannot take advantage of native cursor functionality in the data source where it is appropriate and supported by the data provider. In order to allow applications to rely on rich cursor functionality over any data provider, Microsoft implemented a cursor service that
provides client-side cursoring functionality on behalf of any OLE DB provider. As with resource pooling, the cursor service is invoked when needed by OLE DB Service Components. A component manager aggregates the session and command objects, and intercepts the properties set by the consumer that specify the desired functionality of a rowset to be created. If the consumer requests properties that can be supported by the cursor service, including rich scrolling functionality, client-side filtering, sorting, or finding values within a result, the component manager passes those properties to the underlying provider as optional. If the data provider is able to satisfy all of the requested properties and the consumer has not explicitly requested client-side cursoring, the data provider’s rowset is returned to the user without invoking the cursor service. On the other hand, if the data provider cannot provide all of the requested functionality or the consumer specifically asks for client cursors, the component manager hands the underlying data provider’s rowset to the cursor service, and the cursor service returns a rich cursor on top of that underlying rowset by caching results (synchronously or asynchronously) and allowing the user to manipulate data within the cached results. Changes to the results are synchronized with the data source through update, insert, and delete SQL statements generated by the cursor service from the rowset’s metadata and the affected rows.
5.5  Custom Data Providers

In addition to building services that augment the functionality of data providers, OLE DB allows developers to build custom data providers that reuse existing data providers as the underlying component of a more-vertical solution. In many cases, developers want to expose a customized view of data in an existing data store. For example, they may want to expose custom methods that return fixed sets of data or custom queries over a subset of the tables in a data store. OLE DB allows these types of developers to expose custom semantics by building custom data providers on top of existing data providers. Custom data providers may be hard-coded to a specific underlying data provider or may work with a variety of data providers. Developers of custom data providers may expose as much or as little of the underlying data source’s functionality as they want. Custom data providers generally aggregate or encapsulate the underlying data provider. In this manner, they can intercept calls that they want to customize and let the others flow through unchanged.

For example, a developer may want to expose a custom data provider for a management system that connects to an Oracle database based on the user’s EmployeeID. Once the user specifies his or her
EmployeeID, the custom provider can connect to the database using a hidden username and password. Once connected, the custom provider may expose a natural-language query syntax that allows the user to ask for a list of employees, their projects, or their past reviews. A simple parser takes that request and translates it into an Oracle query, using parameters to limit the information based on the EmployeeID specified during connection. The custom provider can then return the data provider’s rowset directly to the consumer if the developer does not want to add any semantics on top of the data. Users can select their choice of ADO- or OLE DB–enabled products to access this custom provider; regardless of which they select, they will always see the same customized view of the data.

An example of an actual custom data provider is the MSShape Provider, shipped as part of the Microsoft Data Access Components 2.0. This custom provider appears as any other provider to the consumer. When the consumer connects to the provider, it specifies the name of the underlying data source it wants to connect to and valid connection information for that source. The Shape Provider implements its own command object that supports a custom syntax for building hierarchical results by relating multiple rowsets. The syntax for specifying the individual rowsets is particular to the underlying data provider; the Shape Provider simply passes that portion of the query through and obtains rowsets from the underlying data source. The data from those rowsets are then read into a local cache (the same local cache used for client cursors), and the Shape Provider relates rows from the rowsets according to relations specified in the shape command. The user can then navigate from rows in the parent rowset to rows in the child rowset through chapter columns built by the Shape Provider. The Shape Provider also supports custom aggregates and other calculated columns.
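As an illustration (ours, not from the chapter), a shape command of the kind the Shape Provider accepts might look like the following; the table and column names are hypothetical, and the shape grammar belongs to the provider rather than to OLE DB itself.

// A minimal sketch: the inner SELECT statements are passed through to the
// underlying data provider, while the RELATE clause tells the Shape Provider
// how to build the chapter column (emps) linking parent and child rowsets.
static const wchar_t kShapeCommand[] =
    L"SHAPE  {SELECT name, dno FROM Department} "
    L"APPEND ({SELECT name, phone, dno FROM Employee} "
    L"        RELATE dno TO dno) AS emps";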
5.6  Component Database Scenarios

We have seen three ways that OLE DB’s component nature lends itself to reuse and extension. Having a common way to represent data, be it raw data, metadata, or other types of information, and defining common extensions from that foundation builds on developer knowledge and encourages code and development tool reuse. Factoring the full set of functionality into discrete interfaces and building on COM’s component model allows common services to be introduced to augment the functionality of existing data providers. And through encapsulation, existing providers can be used as component building blocks, allowing developers to expose custom views of data and functionality without
rewriting the entire data store or access methods. In this section we see a fourth way that products can take advantage of OLE DB’s inherent component nature by describing five component database scenarios in the context of a commercial RDBMS.
5.6.1  Database Server

The database server of Microsoft SQL Server 7.0 has two main parts: the relational engine and the storage engine. The architecture of SQL Server 7.0 clearly separates the relational and storage engine components in the server and uses OLE DB interfaces to allow them to communicate with one another (Figure 5.9).

The processing for a SELECT statement that references only tables in local databases can be briefly described as follows. The relational engine compiles the SELECT statement into an optimized execution plan. The execution plan defines a series of operations against simple rowsets from the individual tables or indexes referenced in the SELECT statement. The rowsets requested by the relational engine return the amount of data needed from a table or index to perform the operations needed to build the SELECT result set.
EXAMPLE 2
The following query computes the result from two indexes:

    SELECT CompanyName, OrderID, ShippedDate
    FROM   Customers AS Cst JOIN Orders AS Ord
           ON (Cst.CustomerID = Ord.CustomerID)
The relational engine requests two rowsets, one for the clustered index on Customers and the other on a nonclustered index in Orders. The relational engine then uses the OLE DB interfaces to request that the storage engine open rowsets on these indexes. As the relational engine works through the steps of the execution plan and needs data, it uses OLE DB to fetch the individual rows from the rowsets. The storage engine transfers the data from its data buffers to the execution operators of the query, which is executing in the relational engine. The relational engine combines the data from the storage engine rowsets into the final result set transmitted to the user. This clear communication between the relational engine and the storage engine via OLE DB interfaces enables the relational engine to process queries against any data source that exposes such interfaces. Such data sources can be other SQL Server DBMSs (distributed queries), and other relational or nonrelational OLE DB data providers (heterogeneous queries).
Figure 5.9  OLE DB as an internal DBMS interface: the relational engine communicates with the storage engine through OLE DB, and the storage engine manages the local databases.

5.6.2  Distributed and Heterogeneous Query Processing
Microsoft SQL Server 7.0 represents an implementation of the UDA (see Section 5.2) approach to integrating heterogeneous data. Its distributed heterogeneous query capability allows transactional queries and updates against a variety of relational and nonrelational OLE DB data providers running in one or more computers. SQL Server 7.0 supports two methods for referencing heterogeneous OLE DB data sources in Transact-SQL statements. The linked-server-names method uses the system stored procedures sp_addlinkedserver and sp_addlinkedsrvlogin to associate a server name with an OLE DB data source. Objects in these linked servers can be referenced in Transact-SQL statements using the four-part name convention described here.
EXAMPLE 3
If a linked server name of DeptSQLSrvr is defined against another copy of SQL Server 7.0, the following statement references a table on that server:

    SELECT *
    FROM   DeptSQLSrvr.Northwind.dbo.Employees
An OLE DB data source is registered in SQL Server as a linked server. Once a linked server is defined, its data can be accessed using the four-part name <linked-server>.<catalog>.<schema>.<object>.