VIDEO DATABASE SYSTEMS Issues, Products and Applications
VIDEO DATABASE SYSTEMS Issues, Products and Applications
The Kluwer International Series on ADVANCES IN DATABASE SYSTEMS
Series Editor: Ahmed K. Elmagarmid, Purdue University, West Lafayette, IN 47907
Other books in the Series:
DATABASE CONCURRENCY CONTROL: Methods, Performance, and Analysis, by Alexander Thomasian, IBM T. J. Watson Research Center
TIME-CONSTRAINED TRANSACTION MANAGEMENT: Real-Time Constraints in Database Transaction Systems, by Nandit R. Soparkar, Henry F. Korth, Abraham Silberschatz
SEARCHING MULTIMEDIA DATABASES BY CONTENT, by Christos Faloutsos
REPLICATION TECHNIQUES IN DISTRIBUTED SYSTEMS, by Abdelsalam A. Helal, Abdelsalam A. Heddaya, Bharat B. Bhargava
The Kluwer International Series on Advances in Database Systems addresses the following goals:
• To publish thorough and cohesive overviews of advanced topics in database systems.
• To publish works which are larger in scope than survey articles, and which will contain more detailed background information.
• To provide a single point of coverage of advanced and timely topics.
• To provide a forum for a topic of study by many researchers that may not yet have reached a stage of maturity to warrant a comprehensive textbook.
VIDEO
DATABASE SYSTEMS
Issues, Products and Applications
Ahmed K. Elmagarmid, Purdue University
Haitao Jiang, Purdue University
Abdelsalam A. Helal, Microelectronics and Computer Technology Corporation (MCC)
Anupam Joshi, University of Missouri
Magdy Ahmed, Purdue University
KLUWER ACADEMIC PUBLISHERS Boston/London/Dordrecht
Distributors for North America: Kluwer Academic Publishers 101 Philip Drive Assinippi Park Norwell, Massachusetts 02061 USA Distributors for all other countries: Kluwer Academic Publishers Group Distribution Centre Post Office Box 322 3300 AH Dordrecht, THE NETHERLANDS
Library of Congress Cataloging-in-Publication Data A C.I.P. Catalogue record for this book is available from the Library of Congress.
The publisher offers discounts on this book when ordered in bulk quantities. For more information contact: Sales Department, Kluwer Academic Publishers, 101 Philip Drive, Assinippi Park, Norwell, MA 02061 Copyright © 1997 by Kluwer Academic Publishers All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher, Kluwer Academic Publishers, 101 Philip Drive, Assinippi Park, Norwell, Massachusetts 02061 Printed on acid-free paper. Printed in the United States of America
-- A. K. Elmagarmid

To my wife Qing Yan and my daughter Malina. -- Haitao Jiang

In memory of Professor Yehia El-Hakim, 1925-1996. -- A. Helal

To curiosity... -- A. Joshi

In memory of my father Abdelazim Ahmed, 1920-1993. -- M. Ahmed
CONTENTS
FOREWORD ix
PREFACE xi
1 INTRODUCTION 1
2 RESEARCH ISSUES 9
  2.1 Introduction 9
  2.2 Video Data Modeling 11
  2.3 Video Scene Analysis and Video Segmentation 23
  2.4 Video Data Indexing and Organization 41
  2.5 Video Data Query and Retrieval 47
3 OTHER RELATED RESEARCH ISSUES 57
  3.1 Video Data Compression 57
  3.2 Media Server Design and File System Support 66
  3.3 Network Support for the Video Applications 74
  3.4 Copyright Protection 78
4 PRODUCTS 83
  4.1 Introduction 83
  4.2 Video Board 83
  4.3 Video Storage Systems 91
  4.4 Video Server System 95
5 APPLICATIONS 103
  5.1 Introduction 103
  5.2 Education and Training 103
  5.3 Entertainment 106
  5.4 Commercial 110
  5.5 Industry and Manufacturing 111
  5.6 Digital Library 112
  5.7 Health and Medicine 114
  5.8 Communication 115
  5.9 Law Enforcement 117
  5.10 Conclusion 119
6 CONCLUSIONS 121
A USEFUL URLS 125
  A.1 Background and Overview 125
  A.2 Research Issues 126
  A.3 Other Research Issues 130
  A.4 Market and Commercial Products 137
  A.5 Video Database Applications 150
REFERENCES 163
INDEX 175
FOREWORD
Research and development in video database technology has gained enormous popularity along with the rapid advances in computer and communication technologies. This technology is expected to be a major focus of effort in the arena of information system development for the rest of the decade. The goal is to promote affordable and cost-effective video database products for numerous applications, including business, medicine, education, entertainment, and manufacturing. Along with the promising prospects and potential benefits of the continuous growth and development of video database technology, we are faced with complex engineering challenges that push the limits of available hardware and the ingenuity of human imagination. In order to realize actual working systems, extensive research and development efforts are needed. The goal of this book is to provide a state-of-the-art view of video database technology and to serve the needs of a broad range of readers, including students, programmers, developers, and researchers who wish to design and build video-based applications. The book goes beyond pontification of open issues. The authors cover many important topics and take a pragmatic approach in presenting techniques for processing, storing, accessing, and distributing video information. An in-depth survey of video processing and data modeling techniques, given in Chapter 2, provides a comprehensive treatment of the underlying issues in designing video database management systems and building meaningful applications. A unique feature of this book is that it is also a guide to a large number of state-of-the-art commercial products available for video data processing and management. These products can serve as the essential components for building large-scale video database systems. Such systems can provide cost-effective solutions for the management and dissemination of information, which is a primary tool for increasing economic efficiency.
Overall, the book--with its highly informative style--is a succinct source of information for video database technology.
Arif Ghafoor
Purdue University
November 1996
PREFACE
The fusion of video systems and databases has launched a technology that has been received with great anticipation, from the stuffy hallways of corporate organizations to the comfortable couch of the living room. Great advances have been made in the database field. Relational and object-oriented databases, distributed and client/server databases, and large-scale data warehousing are among the more notable. However, none of these advances promises to have as great and direct an effect on the daily lives of ordinary citizens as video databases. Video databases will provide a quantum jump in our ability to deal with visual data, and in allowing people to access and manipulate visual information in ways hitherto thought impossible. By storing and querying semantically rich video information, this revolutionary technology is transforming the notion of what a database is beyond imagination. Problems that used to be tackled with ad hoc, non-database solutions will be revisited as simple and systematic applications of video databases. In fact, this technology will enable a wide class of ambitious applications that are needed in many aspects of our daily life. Audio-visual communication, telematics, Internet-based remote education, multimedia publishing, teleconferencing, and highly programmable home entertainment are a few such application domains. In this book, we attempt to go beyond a mere discussion of the theory that underlies this technology. We give practical information on the research issues that academics are working on, the products that have already been developed, and the applications of the future that are driving this research and development. Producing this book was an exciting proposition. It involved looking at and evaluating hundreds of systems over an 18-month period. Documenting the state of the art in research is not difficult, given that research moves at a much slower pace than product introductions to the marketplace. The most difficult task, therefore, was doing the exhaustive surveys of the state of the market in products offered and their features. We strove to make our summaries as accurate as the available documentation allowed them to be. We hope readers will find them useful as guides in their own search for products. This book can also be considered as a reference text for those entering the field
of video or multimedia databases, as well as a reference for practitioners who want to identify the kinds of products needed in order to utilize video databases. The book is not limited to video databases; instead, it covers issues relating to the other layers and systems needed to realize and use video databases. The book covers concepts, products, and applications. It is written at a level that is less detailed than what is normally found in textbooks but more in-depth than what is normally written in the trade press or professional reference books. Thus it seeks to serve both an academic and an industrial audience by providing a single source of information about the research issues in the field and the state of the art of practice. Many people have contributed to this book. Our publisher, Scott Delman of Kluwer, handled the logistics of getting the book through the publication process and encouraged us to go through with this project. Ms. Lizhen Chen helped with collecting the information in Appendix A and with writing that section. Ms. Rosemay Winfield, Mrs. Karuna Joshi, Ms. Sheila Owens, and Ms. Barbara L'eplantier helped us in copy-editing the book by going through several versions of our manuscript. Our sincere thanks go to all of them. Many thanks are also due to Professor Arif Ghafoor, who kindly agreed to write the foreword to this book. Enfin, our kudos to all the dedicated researchers and vendors who gave us exciting materials to write about.
1 INTRODUCTION
Words and phrases like information age, information revolution, and infosphere have been thrown about so much over the past decade that any new phrase using the word information or its derivatives has begun to sound like a cliche. Yet the past decade has probably seen more information generated than in any similar period in human history. In today's age, information is often a commodity and an asset, much like more traditional physical assets. Having the right information at the right time is of critical importance to all kinds of organizations and individuals, from large corporations to the armed forces to the tourist looking for the nearest metro station. However, the increasing amount of information often leads to information overload, where one is expected to retrieve the right needle of information from the proverbial haystack. Thus, many like Denning have argued that as much attention needs to be paid to the receiving of information as to the generating of it. What complicates matters even further is that this information haystack is not just made up of plain text--it is media rich. To borrow terms from the HCI literature, humans have multiple, synergetic input modalities. We can digest all kinds of related aural and visual information. So the producers of information, not just the big infotainment giants but also Joe Q. Average who has a web page, put out multimedia information. Video is one of the most popular and pervasive forms that this information takes. With the proliferation of relatively small and inexpensive video cameras, people in more developed countries have been producing home movies for the past decade or so. Note, for instance, the steady supply of material for television shows such as ABC's "America's Funniest Home Videos" in the United States. With the steady drop in prices, video cameras are also spreading in the developing countries, a phenomenon that will further increase the amount of raw video data. Video data are also
generated by the countless TV stations and news crews spread all across the globe. Another important source of video data is the entertainment industry, with TV programs, films, advertisements, and music videos, inter alia, providing the bulk of the material. Clearly, given this plethora of data being generated every day, we are reaching a point where the organization, storage, and retrieval of these data are becoming very important. As a result, video database systems are being looked to to manage video data and help the field move away from the mostly ad hoc schemes currently used in this domain. At first glance, it might appear to the casual observer that this should not pose much of a challenge. After all, database systems have been well researched over the past several decades, and many commercially successful products exist. Surely one could use these systems for video data. Unfortunately, such is not the case, and this is the raison d'etre for this book. Video databases share several research issues, such as the distributed nature of the data and the local caching of data, with traditional database systems, and research being done in these areas can perhaps be used, mutatis mutandis, for video databases as well. Yet there are several issues that are unique to video databases, as the following chapter will describe. In addition, video data have unique characteristics that often make algorithms developed for traditional databases impractical or downright impossible to use. At the core of these differences lies the fact that in traditional systems, the data and semantics are clearly distinguishable and distinct. Video data in that regard are not really data alone but data and information fused into one. In other words, the data and semantics are intermingled. For example, raw numbers in a traditional database tell us very little. On the other hand, an image sequence that is part of some video data conveys the semantics inherent in it. This implies that techniques from traditional database systems, while important, are extremely unlikely to provide answers to the research issues raised in the context of video databases. The objective of this book is to serve as a resource for those interested in the research issues involved in video database systems and the field of video databases in general. To this end, we have collected information on the ongoing research in this field and on some of the products that are already on the market, as well as on fields where we foresee this technology finding substantial use in the near future. This book thus serves the purpose of a comprehensive reference to an emerging technology and provides a single starting point for those interested in video databases. Our extensive bibliography also provides references to many of the primary sources for this field, allowing the reader to explore issues of interest in detail.
Chapter 2 of the text addresses what we envision as some of the fundamental and core research issues in this area. Specifically, we provide information on research that is being done in the areas of modeling, insertion, indexing, and retrieval of video data. All of these are issues that arise in traditional database systems, but traditional solutions are often unworkable in the domain of video data.
Data modeling refers to the process whereby a representation is designed for the data based on their characteristics and information content, as well as the intended application. Video data have certain unique characteristics that distinguish them from not just traditional alphanumeric data but also image data. For one, they have both spatial and temporal aspects. This means that they are much more complex to organize in relation to unidimensional alphanumeric data. Further, unlike alphanumeric data, the relations between video data are not clear. For example, a similarity (or difference) metric between two video clips is extremely difficult to define. Finally, the sheer voluminousness of video data is a characteristic singularly missing from traditional alphanumeric data. A single second's worth of video can take up megabytes of storage! Video data are also both multimedia and multimodal. All these features demand new data models, several of which are described in Chapter 2. Insertion of new video clips into a database requires that we be able to partition a video data sequence into meaningful units. This requires scene analysis and segmentation. A simple example would be to take the tape of the Super Bowl and cut out all advertisements. This requires segmenting the video into game clips and advertisements. Clearly, to be able to do this, we must be able to analyze the video and detect when a scene has changed. Thus scene analysis and segmentation are important aspects of video database systems. We provide a comprehensive survey of various proposed algorithms to achieve this. Some of these algorithms can detect abrupt scene changes only, such as when a shot of Emmitt Smith running is replaced by that of Bugs Bunny dunking a basketball wearing Air Jordan shoes. The detection of gradual scene changes is a more complex problem that some recent algorithms have addressed with moderate success. Continuing in the same vein, an example of a gradual scene change would be Emmitt Smith being morphed into Bugs Bunny. An abrupt scene change is usually accompanied by changes in several measurable image properties such as chrominance, luminance, color distribution, scene depth, and so on. Thus one can attempt to detect such a change by measuring the interframe difference in these properties. In the gradual scene change case, artifacts of production (such as morphing, fading, mixing, and dissolving) cause the changes in frame properties to occur not across a single frame change, but spread across several frames. Clearly, distinguishing variations within a scene
from a scene change becomes a much more complex task. A related and important issue is whether scene changes can be detected in compressed data. Due to their extremely large volume, video data are often transmitted and stored in compressed format. If one could detect scene changes without having to uncompress the data, a significant burden in terms of computation time and storage would be eliminated. Some new algorithms attempt to detect changes in data that are compressed using the discrete cosine transform. However, the more general issue is whether one could come up with compression techniques that are amenable to scene change detection (and other frame property measurement operations). Chapter 2 also studies the indexing of video data. Clearly, this task is much more complex than in the case of traditional alphanumeric data. For alphanumeric data, the keys (or attributes) on which indexing can be done are fairly well defined. The same cannot be said of video data. For example, should video frames be indexed based on their overall color, their overall intensity, their intensity or color distribution, or the accompanying audio clips? Moreover, while the nature of alphanumeric data lends itself to automatic indexing, this is rarely possible for video data. Techniques for indexing video data depend largely on the model chosen for the video data. Existing work on indexing can be categorized as that based on annotation, that based on features, and that based on the application domain. Annotation of video data, given the current state of the art in computer vision, remains a task that requires constant human intervention. Clearly, this is a time-consuming and expensive enterprise, and much of the work in this area is devoted to developing systems that will aid humans in this task. Feature-based indexing, on the other hand, seeks to automate the indexing process. It uses as its keys measurable properties of image frames such as color, luminance, and texture. While such keys enable automation, they cannot capture the semantics associated with the video, and thus the semantic aspects of indexing are completely lost. Indexing can be made somewhat easier when domain-specific schemes are used instead of general techniques. For example, one could index footage of a newscast as anchorperson shots, reporter shots, news footage, sports/weather anchor shots, weather map shots, or scoreboard shots. The drawback of such schemes is inherent in their domain dependence. Finally in Chapter 2, we examine the issues of querying a video database and retrieving information. In order to design retrieval systems, one must first identify the nature of the queries possible. One aspect is the content of the query. Is it related to the semantics of the data? Is it related to some meta information about the data? Is it about the spatial or temporal aspects of the data? Another aspect is whether an exact match to the query
is sought, or whether partial matches will be acceptable as well. Similarly, the granularity and behavior of the query need to be considered. When one examines these factors, one finds that languages like SQL lack the expressiveness needed for video database systems. Thus, several researchers have examined the issue of developing new languages for these systems, especially ones that can capture the spatio-temporal relationships inherent in video data. Researchers have also been investigating the related issue of representing such data and their relationship to the users. In Chapter 3, we deal with research issues other than the core issues identified in the previous chapters. These research issues have a strong bearing on the development of video database systems. One such issue is video compression. As we have mentioned before, video data occupy a lot of bandwidth and space. For instance, the NTSC signal standard, used in the U.S., has an image frame of 640x480 with 3 bytes per pixel. The signal is usually interleaved and requires over a 27 MBps data rate for the uncompressed signal. The PAL format, used in several countries such as the United Kingdom, India, and China, requires an almost 33 MBps data rate. Thus compression is a natural accompaniment whenever the storage or transmission of video data is discussed. Video compression is almost always a lossy process, and a tradeoff usually exists between the quality of the recovered video and the compression ratio achieved. Also, the quality of the video can be measured along several dimensions, and tradeoffs can be made, such as sacrificing color information to preserve brightness information. We present several schemes that have been proposed for video compression, such as MPEG, Motion JPEG, CCITT H.261, DVI, and QuickTime, and discuss their relative merits. Besides compression, the actual medium and mechanism used for storage are also extremely important. Thus, in Chapter 3 we describe the various issues involved in the design of the media system and the required file system support. The objective of the media server is to use a combination of primary, secondary, and tertiary storage to provide a continuous media stream in response to user requests. This becomes extremely important in applications like video on demand. This requirement on the media server leads to several constraints on the kind of system that can be designed. We articulate these issues, point out why traditional file systems such as NFS are inadequate, and discuss work being done to devise new systems. The form and structure of the network are also important in the case of a video database. We discuss in Chapter 3 the work that is being done to devise new high-bandwidth wide-area networks. These include initiatives such as the Gigabit network testbeds supported by the NSF.
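The raw data rates quoted above are easy to verify with a little arithmetic. The following minimal sketch assumes the commonly cited frame geometries and rates for NTSC and PAL (640x480 at roughly 30 frames per second, and 768x576 at 25 frames per second, both at 3 bytes per pixel); these figures are illustrative assumptions, not specifications taken from this book:

```python
# Back-of-the-envelope data rates for uncompressed video.

def raw_rate_mbps(width: int, height: int, fps: float, bytes_per_pixel: int = 3) -> float:
    """Raw (uncompressed) data rate in megabytes per second."""
    return width * height * bytes_per_pixel * fps / 1e6

print(f"NTSC: {raw_rate_mbps(640, 480, 30):.1f} MB/s")  # ~27.6 MB/s ("over 27 MBps")
print(f"PAL:  {raw_rate_mbps(768, 576, 25):.1f} MB/s")  # ~33.2 MB/s ("almost 33 MBps")
```

The numbers make the case for compression concrete: even a modest video stream saturates the storage and network bandwidth of mid-1990s hardware by one to two orders of magnitude.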
We also discuss new protocols that provide QOS guarantees. Finally, we touch on the rather thorny issue of copyright and intellectual property protection. Research is being done on both the prevention of unauthorized duplication and the detection of it. We provide an overview of this work and summarize the issues involved. Several products that can form parts of a video database system are now commercially available. In Chapter 4, we present an overview of these products. The information provided for these products, such as capabilities, standards compliance, and process, is based largely on information publicly released by the manufacturers. We have not tested any of these products, and so must here stress caveat emptor. The specific categories we have examined are: Video boards, which plug into computers and support video capture and full-motion playback of digital video (examples include Sound and Motion 1900, FulVideo Supreme, Intel's DVI, and the TAGRA 2000); Video storage systems for the massive amounts of data that a video database will typically hold (these include RAID systems and CD/CD-i systems); and Video server systems, which provide continuous, real-time multimedia data streams to users (companies such as Oracle, HP, SGI, Digital, and several others have already announced products, which we describe). Clearly, no progress in technology occurs in vacuo. Research is often driven by the needs of society, whether actual or perceived. Video databases too are being spurred on by a huge perceived demand for such systems in several application areas. In Chapter 5 of this book, we present application scenarios in several areas that will require video database systems. Consider, for example, education and training. Studies have been conducted that show that training with multimedia components is more effective along several dimensions when compared to traditional training methods. Video databases form an important component of the technology needed for distance education and teleclassrooms. Systems that provide students with lectures on demand have been envisioned, and video databases are needed for such systems to become a reality. Universities, which see such systems as an opportunity to extend their reach globally, are important players in this field, and several projects using video databases are in various stages of development. For industries, too, such systems can be a boon in their retraining and continuing education efforts for employees. The ability of employees to learn off-line using video material is of significant importance to corporations.
Perhaps the most talked about application of video databases, at least in the popular media, is the video-on-demand system. Writers have painted pictures of huge video repositories from which one could download one's favorite movie or show using the information infrastructure being developed by phone and cable companies. Several companies are now investing in the infrastructure needed to support this application in specific target areas to judge cost and profitability. One of the things holding back this application is the relatively high cost of deployment, which leads to high costs for the subscribers. In fact, USWest recently cancelled one of its prototype setups for VOD due to a lack of subscriber interest. However, with the recent changes in the regulatory regime in the U.S., it is expected that cable and phone companies will merge and be able to use their combined infrastructure to support VOD services. We describe several ongoing projects by leading telecommunication and entertainment companies in this area. Electronic commerce, which is supposed to be one of the "killer apps" for the World Wide Web and Internet, also requires video database support. Companies are already talking about on-line catalogs with video clips that are generated on-demand in response to user queries. Consider, for example, a car buyer querying GM's web site about cars that have ABS, power steering, and cost less than $15,000 and the site providing video clips of advertisements of some specific models. Video databases will also play an important role in manufacturing, providing video support for training, and on-line fault diagnoses. Video servers will obviously also serve as an integral component of the various digital library systems that are being conceived. Another critical area that needs video database technology is telemedicine. It is well known that placing and retaining health care workers in rural or inner city areas is exceedingly difficult. Telemedicine seeks to alleviate the problem by allowing primary health care providers and general practitioners in such areas to consult with specialists in large urban and suburban centers. An important aspect of such consultations is the ability of the specialist to examine the patient in some virtual sense and the ability of the general practitioner to access case histories of patients with similar symptoms. All of this demands that video clips (of the patient being examined, of specific conditions and so on) be made available on demand, and such a scenario requires a video database for realization. Video databases are also being projected as important components of security and law enforcement systems. Finally, Chapter 6 provides an overview of the work described in earlier chapters and suggests directions in which further research and progress are needed to realize video database systems.
2 RESEARCH ISSUES

2.1 INTRODUCTION
With advances in computer technology, digital video is becoming more and more common in various aspects of life, including communication, education, training, entertainment, and publishing. The result is massive amounts of video data that either already exist in digital form, or that are soon to be digitized. According to an international survey [42], there are more than six million hours of feature films and video archived worldwide, with a yearly increase rate of about 10%. This would be equal to 1.8 million GB of M P E G encoded digital video data if they were digitalized. Another example is the NASA's Earth Observation System, which has the capability of generating about 1 terabyte of image data per day when fully operational [45]. With this huge, ever-increasing amount of video information, it would be impossible to cope with the growth without systematic management of the video data. A similar need in the past led to the creation of computerized textual and numeric database management systems (DBMSs), which are also called traditional DBMSs. Traditional DBMSs are mainly designed for managing simple structured data types, but not video data. The difficulty lies in the fact that the nature of video data is fundamentally different, and require new ways of modeling, inserting, indexing, and manipulating data. For example, unlike traditional DBMSs that rely on exact match for retrieval, queries on video data requires similarity-based retrieval algorithms. Also, graphical user interface design and video data browsing are much more important in video databases than in traditional database management systems.
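The contrast between exact-match and similarity-based retrieval can be made concrete with a small sketch. The feature vectors, distance measure, and threshold below are purely illustrative assumptions, not an algorithm proposed in this chapter:

```python
# Exact match (traditional DBMS) vs. similarity-based retrieval (video DBMS).

def exact_match(records, key, value):
    """Traditional retrieval: a record either satisfies the predicate or not."""
    return [r for r in records if r[key] == value]

def similar_clips(query_hist, clip_hists, threshold=0.2):
    """Video-style retrieval: return clips whose (normalized) color histogram
    lies within an L1 distance threshold of the query's histogram."""
    def l1(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))
    return [cid for cid, h in clip_hists.items() if l1(query_hist, h) <= threshold]
```

The essential difference is that the second function returns a ranked, inexact neighborhood of answers rather than a crisp set, which is why browsing and a good user interface matter so much in a VDBMS.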
[Figure 2.1: A generic video database system. Recoverable components: video authoring and video input feeding video segmentation and video annotation; stores for structured and indexed video data, video meta data, and raw video data; and a graphical user interface with query processing providing user access.]
A video database management system (VDBMS) can be defined as a software system that manages a collection of video data and provides content-based access to users [47]. A generic video database system is shown in Figure 2.1. Similar to the issues involved in a traditional DBMS [29], a video database management system needs to address the following important issues:
Video data modeling deals with the issue of representing the video data, that is, designing the high-level abstraction of the raw video to facilitate various operations. These operations include video data insertion, editing, indexing, browsing, and querying. Thus, modeling of the video data is usually the first thing done in the design process of a VDBMS, and it has great impact on the other components of the VDBMS. The video data model is, to a certain extent, user and application dependent. This is discussed in Section 2.2.
Video data insertion deals with the issue of introducing new video data into a video database. This usually includes the following steps:
- Extracting key information (or features) from the video data for instantiating a data model. Automatic feature extraction can usually be done by using image processing and computer vision techniques for video analysis.
- Breaking the given video stream into a set of basic units. This process is often called video scene analysis and segmentation, and is described in Section 2.3.
- Manually or semi-automatically annotating the video units. What needs to be annotated usually depends on the application domain.
- Indexing and storing the video data into the video database based on the extracted and annotated information. Video indexing and annotation are discussed in Section 2.4.
Video data indexing is the most important step in the video data insertion process. It deals with the organization of the video data in the video database to make user access such as querying or browsing more efficient. This process involves identifying the important features and computing search keys based on them for ordering the video data.
Video data query and retrieval deals with the extraction of video data from the database that satisfy certain user-specified query conditions. Due to the nature of video data, those query conditions are usually ambiguous, in that the video data satisfying a query condition are not unique. This difficulty can be partially overcome by providing a graphical user interface (GUI) and a video database browsing capability to the users. The GUI of a video database can help the user with query formulation, result viewing and manipulation, and navigation of the video database. Video data query and retrieval are described in Section 2.5.
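The insertion workflow just described can be summarized as a pipeline. The following sketch reduces the techniques of Sections 2.3 and 2.4 to hypothetical stubs (segment, extract_features, annotate are invented names standing in for whole bodies of work, not functions from any real system):

```python
# A minimal sketch of the video-insertion pipeline, under the assumption
# that each stage can be treated as a black box.

def segment(video):                  # scene analysis / segmentation (Section 2.3)
    return [video]                   # stub: one shot covering the whole stream

def extract_features(shot):          # automatic features: color, texture, motion
    return {"color_histogram": []}

def annotate(shot):                  # manual or semi-automatic annotation
    return {"description": ""}

def insert_video(index, video):
    """Insert a raw video: segment it into shots, then index each shot
    by its extracted features and annotations (Section 2.4)."""
    for shot in segment(video):
        index.append((shot, extract_features(shot), annotate(shot)))

catalog = []
insert_video(catalog, "news-1996-11-05.mpg")  # hypothetical file name
```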
2.2 VIDEO DATA MODELING
Video data modeling is the process of designing the representation for the video data based on its characteristics, the information content, and the applications it is intended for. Video data modeling plays a very important role in the VDBMS since other functions are more or less dependent on it. For example, video data modeling determines which features are to be used in the indexing process, and can also help in developing more suitable video analysis tools.
2.2.1 Characteristics of Video Data
Video is a medium with high resolution and very rich information content. In addition to meta information such as the title, author, or production date, video also provides other information, such as the tracks of motions, the occurrence of events, and the differences among object shapes. The nature of video information is different from textual data since video has both temporal and spatial dimensions. Moreover, the volume and unstructured format of digital video data make it difficult to manage, access, reuse, and compose video segments into video documents. By providing a new digital video data type with content-based access, video databases will solve these problems and motivate broader use of video resources.
Unique Characteristics of Video Data
Before video data modeling is discussed, it is necessary to compare video data to other kinds of data that are managed by traditional DBMSs. The unique characteristics of video data are discussed by Hampapur [47] and are summarized in Table 2.1.

Criteria      Textual Data             Image Data               Video Data
Information   Poor                     Rich                     Very rich
Dimension     Static and non-spatial   Static and spatial       Temporal and spatial
Organization  Organized                Unstructured             Unstructured
Volume        Low                      Median                   Massive
Relationship  Simple and well defined  Complex and ill defined  Complex and ill defined

Table 2.1 Comparison of video data with other types of data
Information: Because video and image data contain much more information than plain textual data, the interpretation of video and image data is usually ambiguous and dependent on both the viewer and the application. Textual data, on the other hand, usually has a limited and well defined meaning.
Dimension: Textual data are neither spatial nor temporal, and can be thought of as one-dimensional. Image data contain spatial but not temporal information, and can be regarded as two-dimensional. Video data have one additional dimension--time--and can be viewed as three-dimensional data objects.
Organization: Compared to traditional data types like textual data, video and image data do not have a clear underlying structure and are much more difficult to model and represent.
Volume: A single image is usually on the order of kilobytes of data, and one minute of video data may contain over 1,000 image frames. As pointed out in [47], the data volume of video data is about seven orders of magnitude larger than that of a structured data record.
Relationship: Relationship operators defined over textual data, such as equal and not equal, are simple and well-defined. However, the relationships between video (image) data segments are very complex and ill-defined. This causes many problems for video data indexing, querying, and retrieval. For example, there is no widely accepted definition of a simple similarity operator between two images.
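The volume gap is easy to quantify. A minimal sketch, under the illustrative assumptions of a 640x480 true-color frame and 30 frames per second (these figures are not from Table 2.1):

```python
# Why "over 1,000 frames per minute" translates into massive volume.
frame_bytes = 640 * 480 * 3          # one uncompressed true-color frame, ~0.9 MB
frames_per_minute = 30 * 60          # 1,800 frames
minute_bytes = frame_bytes * frames_per_minute
print(f"{minute_bytes / 1e9:.1f} GB per uncompressed minute")  # ~1.7 GB
```

A structured record of a few hundred bytes versus roughly a gigabyte per minute is indeed a difference of about seven orders of magnitude.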
Content of Video Data
'e said to have very rich information content or features. Identifyures will help us to better understand the video data, as well as to models to represent, indexing schema to organize, and query proques to access it. We can classify the video data content according ng criteria [47]:
S e m a n 1ic c o n t e n t : Semantic content of the video is the idea or knowledge It is usually ambiguous and context-dependant. For exa J lple, two people can watch the same T V program and yet have different opinions about it. By limiting the context or the application, the ambigui y of the video data can be reduced. it conv,~ "s to the user.
dsual content:
Audio
dio: Some video data contain an audio track, like a seminar video clil • The audio signal can help us understand the semantic informatic, carried in the video, and it is possible to extract it from the video USl ,g speech recognition Mgorithms.
--
A~
--
C~}
for: Color intensity and color distribution (histogram). : t u r e : Different texture patterns.
-
O~
.ject m o t i o n : Rotation, translation, and so on.
-
O~
.ject r e l a t i o n s h i p : Before, after, above, below, and so on.
-- Camera operation: Pan, fade in/out, and zoom.
-- Object: Shape, area, and so on.
Textual content: Some textual content information may be embedded in the video data. Examples are the caption of a news video clip, the title of the video clip, or the actors and actresses listed at the beginning of a feature film. This textual information provides important meta data about the video data and can be extracted from the video using existing optical character recognition (OCR) techniques.
The contents of video data are not equally important. The choice and importance of features depend on the purpose and use of the video data. In an application like an animal behavior VDBMS, the motion information of objects (in this case, animals) is the most important content of the video data. Also, there may be additional meta information, which is usually application specific and cannot be obtained directly from the video data itself. It is usually added during the annotation step of inserting video data into the VDB--for example, background information about a certain actor in a feature film video database.
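Of the audio-visual features listed above, the color histogram is the simplest to compute automatically. The following sketch works on gray levels rather than full color to stay short; the frame representation (a 2-D list of 0-255 intensities) is an assumption of this illustration:

```python
def gray_histogram(frame, bins=16):
    """Normalized gray-level histogram of a frame, where `frame` is a
    2-D list of pixel intensities in the range 0..255."""
    hist = [0] * bins
    n = 0
    for row in frame:
        for p in row:
            hist[p * bins // 256] += 1   # map intensity to one of `bins` buckets
            n += 1
    return [count / n for count in hist]
```

Histograms of this kind are reused below: they serve both as index keys for feature-based retrieval and as the inter-frame difference signal used in shot detection (Section 2.3).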
2.2.2 Requirements for a Video Data Model
To provide efficient management, a VDBMS should support video data as one of its data types, just like textual or numerical data. The supporting video data model should integrate both the content attributes of the video stream and its semantic structure. It should also describe the physical objects (persons, vehicles, buildings) present in a scene and the associated verbal communication for each video segment. Structural and temporal relationships between video segments should also be expressible. Image and video features like color, texture, shape, and motion can be extracted automatically from the video stream and used as attributes. Additionally, the user can incorporate these features in the description of visual data.
Multi-level Video Structure Abstraction Support In a video stream, there are two inherent levels of abstractions: the entire video stream and the individual frames. For most applications, the entire video stream is too coarse as a level of abstraction. A single frame, on the other hand, is rarely the unit of interest. This is because a single frame spans an extremely short interval of time and the number of individual frames in
Research Issues
15
even a short video is large (the European video standard, PAL, for instance results in 25 frames per second) [54]. Other intermediate abstractions, such as scenes, are often desired, and thus a hierarchy of video stream abstraction can be formed. At each level of hierarchy, additional information, like shot type, should be allowed to be added. A multi-level abstraction of the video stream can thus be built that [54]: •
Is easier to use to reference video information and easier to comprehend its contents,
•
Better supports video browsing, and
•
Simplifies video indexing and storage organization.
The video shot can be considered as the basic structural element for characterizing the video data [105]. As defined by Davenport et al. [30], a shot consists of one or more frames generated and recorded contiguously, representing a continuous action in time and space. Shots that are related in time and space can be assembled in an episode [105]. Figure 2.2 is an example given in [105] representing the structure of the CNN "Headline News" episode. Similarly, Hjelsvold
and Midtstraum [54] propose to abstract the video stream structure into a compound unit, sequence, scene, shot hierarchy. They also follow the definitions of Davenport et al. [30] and define a scene as a set of shots that are related in time and space. Scenes that together give a meaning are grouped into what is called a sequence. Related sequences are assembled into a compound unit, and compound units can be recursively grouped into compound units of arbitrary level.
[Figure 2.2: Frames and shots of a CNN "Headline News" episode. Recoverable labels include the "Headline News" flying-letters opening and closing graphics, anchorperson shots, news reels, and news/weather segments.]
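The compound unit / sequence / scene / shot hierarchy maps naturally onto nested container types. A minimal sketch of the idea follows; the class names echo the hierarchy's terms, but the fields are assumptions of this illustration, not the schema of [54]:

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Shot:                     # contiguous frames from one camera operation
    start_frame: int
    end_frame: int

@dataclass
class Scene:                    # shots related in time and space
    shots: List[Shot] = field(default_factory=list)

@dataclass
class Sequence:                 # scenes that together give a meaning
    scenes: List[Scene] = field(default_factory=list)

@dataclass
class CompoundUnit:             # sequences, recursively groupable
    parts: List[Union["CompoundUnit", Sequence]] = field(default_factory=list)
```

The recursive `parts` field is what permits compound units "of arbitrary level," as the text puts it.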
Spatial and Temporal Relationship Support A key characteristic of video data is the associated spatial and temporal semantics. This makes video data quite different from other types of data. Thus, it is important that the video model identifies physical objects and their relationship in time and space. A user of a video database can generate queries containing both temporal and spatial dimensions. Generally, most of the episodes and video sequences can be expressed in the form of worldly knowledge by describing the interplay among physical objects in the course of time and their relationship in space. The temporal relationships between different segments are very important from the perspective of a user navigating through a video. Given any two intervals, there are thirteen distinct ways in which they can be related [5]. The thirteen relations can be represented by seven cases [70], since six of them are inverses of each other, except equal relation. They are, namely, before, meets, overlaps, during, starts, finishes, and equals. Those temporal relations are used in formulating queries that contain temporal relationship constrain among the video frames [54]. For spatial relations, most of the techniques are based on projecting objects on a two- or three-dimensional coordinate system. Very few research attempts have been made to formally represent the spatiotemporal relationship of objects contained in the video data and of queries with those constraints.
Video Annotation Support A video data model should support easy and dynamic annotation of the video stream. Unlike textual data, digital video does not easily accommodate extraction of content features because fully automatic image and speech recognition is not yet feasible. Moreover, the structure of a video captures some aspects of the video material but is not suited for the representation of every characteristic of the material. For example, given two city scenes, it may be impossible to say which is in New York and which is in Indiana. As discussed in [52, 97], it should be possible to make detailed descriptions of the content of the video material that are linked not necessarily directly to structural components but more often to arbitrary frame sequences. Also, annotations of video data often change dynamically depending on the human interpretations and application contexts. Presently, the video annotation process is mostly an off-line, manual process even though graphical user interfaces are often built to help users input descriptions of video data. It will remain interactive until significant breakthroughs are made in the field of computer vision and artificial intelligence.
Video Data Independence Data independence is a fundamental transparency that should be provided by a DBMS. One of the advantages of data independence is sharing and reuse of video data which is critical in a VDBMS because of the sheer volume of data. As pointed out by Mackay and Davenport [76], the same basic video material may be used in several different video documents. Hjelsvold and Midtstraum [54] explore this idea by defining the video content of a video document as a logical concept called VideoS'lream, which can be mapped onto a set of physically stored video data called StoredVideoS'egment. However, the concept of video data independence has not been fully addressed in the current literature yet.
2.2.3
Video Data Models
Models Based on Video Segmentation Traditional data models like the relational data model have long been recognized as inadequate for representing the rich data structures required by image and video data. For example, most relational databases do not support an array data type needed for representing images. In the past few years, many video data models have been proposed. Some of them first segment the video stream into a set of temporally ordered basic units that are often called shols. Next, domain dependent models, which can be either hierarchy or finite automata, are built upon basic units. Swanberg et al. [104, 105] propose a hierarchical video stream model that is illustrated in Figure 2.3. In the first step of representing a given video stream, template matching or histogram matching techniques are used to segment the video stream into a set of temporally ordered shots by detecting scene changes. Once shots are obtained, their types are identified by domain-specific shot models. One example of such a shot model in the CNN news video is the anchor/person shot. The model for an anchor/person shot is based on locations of the set of features within frames. These features include the "Headline News" icon in the lower right corner, the title of the anchor/person name. Finally, episodes are identified by matching episode models to the typed shots. For example, CNN "Headline News" begins and ends with a computer graphics shot of flying letters reading "Headline News." After the initial graphics, an anchor/person shot leads the news broadcast. In this system, the episode model is represented by a finite automata.
18
CHAPTER 2
Episode
Shot
Shot
Scene
Figure
2.3
Shot
Scene
......
......
Shot
Scene
Swanberg et al.'s hierarchical video model
A four-layer model, called VIMSYS (Visual Information Management System), is proposed by Gupta et al. [46] to model data in visual information management systems. In this model, the user can view the information entities in four different planes. These planes correspond to domain objects and relations (DO), domain events and relations (DE), image objects and relations (IO), and image representations and relations (IR), respectively. All objects have a set of attributes and methods associated with them. The attributes have their own representations and are connected in a class-attribute hierarchy. The relations may be spatial, functional, or semantic. This hierarchy provides a mechanism for translating high-level semantic concepts into content-based queries using the corresponding image data. This allows queries based on object similarity to be generated without requiring the user to specify the low-level image structure and attributes of the objects. Such data modeling techniques are certainly needed. However, there is no experience available currently in using such a model for real applications. A video data model based on video production process is used by Hampapur et al. [50]. The video production process involves shooting, which generates a significant amount of raw footage and editing, which organizes the raw footage into final video. During the editing phase, video objects from the collection are retrieved based on the content, and these are organized into the final representation. As the shooting operation progresses, new clips are produced that are introduced into the video collection available to the editor. The video data modeling presented in this work is driven from the perspective of a set of different applications of video like feature films, news video, sporting event videos, biomechanical analysis of sports, and building security videos. This model cap-
Research Issues
19
tures the essential aspects of video editing. Based on this edit model, video feature extractors for measuring image sequence properties are designed. These extracted features are used in a production model that is based on classification formulation and that segments the video stream.
Models Based on Annotation Layering As pointed out by Smith and Princever [99], the main weakness of video segmentation-based models is lack of flexibility. Instead of partitioning the video stream, Smith and Davenport et al. [30, 98] propose a layered annotation representation model called the stratification model, which segments contextual information of the video. The stratification model approximates the movie editor's perspective on a movie. They suggest that to maximize browsing efficiency, the representation should be built up from the level of movie shots. The idea is that if the annotation is performed at the finest grain, any coarser grain of information may be reconstructed easily. The stratification system has structures to represent perspectives (Who is the subjective operator of the camera?), cinematographic (What are the cinematographic properties of the shot?), content (What is in the shot?), and context (How does the shot relate to others?). They do recognize the enormity of the post production annotation task. They suggest the idea of a data camera, which in addition to recording video will record some of the lower levels of annotation data required.
Video Object Models Two prevailing data models used in current DBMS are the relational and the object-oriented models. The object-oriented model is getting more and more popular. It has the following features that make it one of candidates for modeling video data [80]. It can •
Represent and manage complex objects,
•
Handle object identities,
•
Encapsulate data and associated methods into objects, and
•
Inherit attribute structures and methods based on a class hierarchy.
However, modeling the video data using the object-oriented data model has also been strongly criticized by Oomoto and Tanaka [80] and Banerjee et al. [7], mainly because of the following reasons:
20
CHAPTER 2
•
Video data are raw data created independently from its contents and its database structure. This is described later in the annotation process.
•
In traditional data models like object-oriented model, the data schema is static. T h a t is once defined, attributes of the object are more or less fixed, and adding or deleting attributes is impossible. However, attributes of the video data cannot be defined completely in advance because -
Descriptions of video data are user and application dependent, and
- The rich information contained in video data implies that semantic meaning should be added incrementally. Thus, a video data model should support an arbitrary attribute structure for the video data and support incremental and dynamic evaluation of the schemas and attributes. •
Many object-oriented data models only support class-based inheritance. However, for the video data objects, which usually overlap or include each other, support for inclusion inheritance [80] is desired. Inclusion inheritance will enable those video objects to share their descriptive data.
A video object model is used in a video database prototype system named OVID (Object-oriented Video Information Database) developed by Oomoto and Tanaka [80]. In this model, the notion of video object is introduced, it can be an arbitrary sequence of video frames. Each video object consists of a unique identifier, an interval presented by a pair of starting and ending frame numbers, and the contents of the video frame sequence described manually by a collection of attribute and value pairs. Oomoto and Tanaka's video data model is schemaless, that is, it does not use the class hierarchy as database schema like the OODB system. Arbitrary attributes can be attached to each video object if necessary. This enables the user to describe the content of the video object in a dynamic and incremental way. Also, interval inclusion inheritance is applied to ease the effort of providing description data when an existing video is composed into new video objects using the generalization hierarchy concept. This approach is very tedious since the description of video content is done manually by users and not through an automatic image processing mechanism. Another video object data model for the annotations of video database is described by Chang et al. [19]. They designed a video object description model (VODM) to record the detail attribute information about video objects, to store, and to retrieve annotations in many different video files. In VODM, the representation of a video object can be described by attributes with several
Research Issues
21
different data types. They can be a text type keyword, a paragraph of words, a related image, or even another recursive video object. The content representation of the VODM is based on the Entity-Relationship model [22] for the database conceptual-level organization. In this model, the entity is an "object of content" in the video data and can be distinctly identified. The relationship is an association among entities. Every entity and relationship have particular properties, called attributes. An entity or a relationship m a y have an attribute value for each of its attributes.
Algebraic Video Data Model Weiss et al. [113] propose an algebraic video data model that defines a video stream by recursively applying a set of algebraic operations on the raw video segment (see Figure 2.4). The fundamental entity of the algebraic video data [~deoexpr.sdo.]
Kid pre~sion
v" . . . . . .
ion
raw video stream
Figure 2.4
'
expression]
.~
Weiss et al.'s algebraic video model
model is a presentation. A presentation is a multi-window spatial, temporal, and content combination of video segments. Presentations are described by video expressions, which are constructed from raw video segments using video algebraic operations. Video algebraic operations included in the model are •
Creation: create, delay;
•
Composition: concatenation, union, intersection, and so on (video expressions can be combined both temporally and spatially);
•
Output: window, audio (which define the output characteristics of video segments);
22
•
CHAPTER 2
Description: description, hide-content (content attributes can be attached to video algebraic expressions);
Segments are specified using the name of the raw video and a range within the raw video. The recursive nature of the video structure is supported by creating compound video expressions from simpler ones using video algebra operations. The model also allows nested stratification, that is, overlapping logical video segments are used to provide multiple coexisting views and annotations for the same raw video data. Users can search video collections with queries specifying the desired attributes of video expressions. The result of a query is a set of video expressions that can be played back, reused, or manipulated by a user. In addition to the content-based access, algebraic video allows video browsing. The algebraic video data model is implemented in a prototype system called the algebraic video system.
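To make the recursive structure concrete, the following toy sketch models video expressions as a small expression tree. This is purely illustrative: the class and function names are our own, not Weiss et al.'s actual interface, and only two of the algebraic operations are shown.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VideoExpr:
    """A node in a video-expression tree built over raw video segments."""
    op: str                                     # 'segment', 'concat', ...
    children: List["VideoExpr"] = field(default_factory=list)
    source: str = ""                            # raw video name (leaves only)
    start: int = 0                              # first frame of the range
    end: int = 0                                # last frame of the range
    attrs: dict = field(default_factory=dict)   # attached content descriptions

def segment(source: str, start: int, end: int, **attrs) -> VideoExpr:
    """Creation: wrap a frame range of a raw video as a leaf expression."""
    return VideoExpr(op="segment", source=source, start=start, end=end, attrs=attrs)

def concat(*exprs: VideoExpr) -> VideoExpr:
    """Composition: temporal concatenation of simpler expressions."""
    return VideoExpr(op="concat", children=list(exprs))

# A compound expression built recursively from raw segments
# (the file name and frame ranges are hypothetical):
intro = segment("lecture.mpg", 0, 299, topic="title")
talk = segment("lecture.mpg", 300, 5399, topic="main talk")
course = concat(intro, talk)
```

Nested stratification falls out of the same structure: two leaf expressions with overlapping frame ranges simply provide two coexisting annotated views of the same raw data.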
Other Video Data Models

A data model that provides a framework for modeling both the structure description and the contents of a video is proposed by Hjelsvold and Midtstraum [54]. The authors suggest that this model is generic and can be tailored for different application domains to adopt domain-specific terminology or attributes, and they have shown how the model can be adapted to a television news domain. The proposed model supports:
- Structuring of video material into a compound unit, sequence, scene, shot hierarchy;

- Free annotation of video material (annotating any arbitrary video frame sequence is done by establishing an Annotates relationship between a FrameSequence and an Annotation); and

- Sharing and reuse of video material by separating the logical VideoStream from the physical VideoSegment.
The main weakness of the proposed model is its complexity [54].

Day et al. [34, 35] propose a graphical data model called the video semantic directed graph (VSDG) for specifying the spatio-temporal semantics of video data. The proposed model extracts spatial and temporal information associated with objects (persons, buildings, vehicles) in a video clip and represents it in the form of a directed graph. They also propose a framework based on spatio-temporal information and on a set of generalized n-ary operations defined to specify the spatial and temporal relationships of objects present in the data, and they describe a method of handling content-based spatial and spatio-temporal queries.
2.3 VIDEO SCENE ANALYSIS AND VIDEO SEGMENTATION

2.3.1 Introduction
A video database management system is software that manages a collection of video data and provides content-based access to users [47]. There are four basic problems that need to be addressed in a video database management system: video data modeling, video data insertion, video data storage organization and management, and video data retrieval. One fundamental aspect that has a great impact on all of these problems is the content-based temporal sampling of video data [93]. Its purpose is to identify significant video frames in order to achieve better representation, indexing, storage, and retrieval of the video data. Automatic content-based temporal sampling is very difficult because the sampling criteria are not well defined; whether a video frame is important or not is usually subjective. Moreover, it is usually highly application-dependent and requires high-level, semantic interpretation of the video content, which in turn requires the combination of very sophisticated techniques from computer vision and AI. The state of the art in those fields has not advanced to the point where semantic interpretation is possible. However, researchers can usually obtain satisfying results by analyzing the visual content of the video and partitioning it into a set of basic units called shots. This process is also referred to as video data segmentation. Content-based sampling can thus be approximated by selecting one representing frame from each shot, since a shot is defined as a continuous sequence of video frames that have no significant inter-frame difference in terms of their visual contents.1 A single shot usually results from a single continuous camera operation. This partitioning is usually achieved by sequentially measuring inter-frame differences and studying their variances, for example, detecting sharp peaks. This process is often called scene change detection (SCD).

1. There are many definitions in the literature, reflecting different points of view; this definition seems to be the most widely agreed upon.
Figure 2.5: An example of an abrupt scene change

Figure 2.6: An example of a gradual scene change
Scene change in a video sequence can be either abrupt or gradual. Abrupt scene changes result from editing "cuts," and detecting them is called cut detection [48]. Gradual scene changes result from chromatic edits, spatial edits, and combined edits [48]; they include special effects such as zoom, camera pan, dissolve, and fade in/out. Examples of an abrupt scene change and a gradual scene change are shown in Figure 2.5 and Figure 2.6, respectively.

Scene-change detection is usually based on some measurement of the image frame, which can be computed from the information contained in the images. This information can be color, spatial correlation, object shape, motion contained in the video image, or DC coefficients in the case of compressed video data. In general, gradual scene changes are more difficult to detect than abrupt scene changes and may cause the majority of scene-detection algorithms to fail under certain circumstances.

Existing scene-change detection (SCD) algorithms can be classified in many ways according to the video features they use and the video objects they can be applied to, among other factors. In this section, we discuss SCD algorithms in three main categories: (1) approaches that work on uncompressed full-image sequences, (2) algorithms that aim to work directly on the compressed video, and (3) approaches that are based on explicit models. The latter are also called top-down approaches [47], whereas the first two categories are called bottom-up approaches.
2.3.2 Background
We now introduce some basic notations used in this section, followed by the notions of DC (discrete cosine) images, DC sequences and how they can be extracted from compressed video. Several of the most often used image measurements are also briefly described in terms of their use in measuring the inter-frame difference. It should be noted that they may not work well for scene detection when used separately; thus, they are usually combined in the scene-change detection algorithms. For example, Swanberg et al. [105] use a combination of template and histogram matching to measure the video frames.
Basic Notations

A sequence of video images, whether fully uncompressed or spatially reduced, is denoted as I_i, 0 ≤ i < N, where N is the length (the number of frames) of the video data. I_i(x, y) denotes the value of the pixel at position (x, y) in the ith frame, and H_i refers to the histogram of the image I_i. The inter-frame difference between images I_i and I_j according to some measurement is denoted d(I_i, I_j).
MPEG Standard: Different Frame Types

According to the International Standard ISO/IEC 11172 [39], an MPEG-I compressed video stream can have one or more of the following types of frames:

- I (intra-coded) frames are coded without reference to other frames. They are coded using spatial redundancy reduction, which is a lossy block-based coding involving DCT, quantization, run-length encoding, and entropy coding.

- P (predictive-coded) frames are coded using motion-compensated prediction from the last I or P frame.

- B (bidirectionally predictive-coded) frames are coded using motion compensation with reference to both the previous and the next I or P frame.

- D (DC-coded) frames are coded using the DC coefficients of blocks; thus, they contain only low-frequency information. D frames are not allowed to coexist with I/P/B frames and are rarely used in practice.
Obviously, any MPEG compressed video stream must have at least I frames. The data-size ratios between frame types suggested by the standard are 3:1 for I:P and 5:2 to 2:1 for P:B. In other words, B frames have the highest degree of compression, and I frames the least. More details about MPEG video streams can be found in [39].
DC Images, DC Sequences, and Their Extraction

A DC (discrete cosine) image [117, 118, 119, 120] is a spatially reduced version of a given image. It can be obtained by first dividing the original image into blocks of n × n pixels each and then computing the average value of the pixels in each block, which corresponds to one pixel in the DC image. For compressed video data, such as MPEG video, a sequence of DC images can be constructed directly from the compressed video sequence; this is called a DC sequence. Figure 2.7 shows an example of a video frame image and its DC image.
Figure 2.7: An example of a full image and its DC image
There are several advantages to using DC images and DC sequences for SCD on compressed video:

- DC images retain most of the essential global information for image processing. Thus, much of the analysis done on a full image can be done on its DC image instead.

- DC images are considerably smaller than the full-image frames, which makes analysis on DC images much more efficient.

- Partial decoding of compressed video saves more computation time than full-frame decompression.
A method of extracting DC images from an MPEG video stream is described by Yeo and Liu [117, 118, 119, 120]. Extracting the DC image of an I frame is trivial since it is given by its DCT (discrete cosine transform) coefficients. Extracting DC images from P frames and B frames needs to use inter-frame motion information. This may result in many multiplication operations. To speed up the computation, two approximations are proposed: zero-order and first-order. The authors claim that the reduced images formed from DC coefficients, whether they are precisely or approximately computed, retain the global features that can be used for video data segmentation, SCD, matching, and other image analysis.
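For an uncompressed frame, the block-averaging definition of a DC image can be written down directly; the following minimal sketch (our illustration, using numpy and an 8 × 8 block size to mirror the DCT block size) conveys the idea. For MPEG I frames the same values come essentially for free from the DC coefficients, which is the point of Yeo and Liu's method.

```python
import numpy as np

def dc_image(frame: np.ndarray, block: int = 8) -> np.ndarray:
    """Spatially reduce a grey-level frame: each block x block tile of the
    full image is replaced by the average of its pixels, yielding one
    pixel of the DC image."""
    h, w = frame.shape
    h -= h % block                  # drop any ragged border pixels
    w -= w % block
    tiles = frame[:h, :w].reshape(h // block, block, w // block, block)
    return tiles.mean(axis=(1, 3))
```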
Basic Measurements of Inter-frame Difference

Template Matching

Template matching compares the pixels of two images at the same locations and can be formulated as

d(I_i, I_j) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} | I_i(x, y) - I_j(x, y) |        (2.1)
where the image size is M × N. Template matching is very sensitive to noise and to object movements, since it is strictly tied to pixel locations. This can cause false SCD, and it can be overcome to some degree by partitioning the image into several subregions. Figure 2.8 shows an example of an inter-frame difference sequence based on template matching; the input video contains the image sequence shown in Figure 2.5.
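A direct implementation of equation (2.1) takes a few lines of numpy; this sketch assumes 8-bit grey-level frames of equal size.

```python
import numpy as np

def template_diff(f1: np.ndarray, f2: np.ndarray) -> float:
    """Inter-frame difference of equation (2.1): the sum of absolute
    pixel-wise differences over the whole M x N frame."""
    return float(np.abs(f1.astype(np.int32) - f2.astype(np.int32)).sum())
```

Partitioning the frame into subregions and applying the same sum per region is the usual way to reduce the sensitivity to motion noted above.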
Color Histogram

The color histogram of an image can be computed by dividing a color space, such as RGB, into discrete image colors called bins and counting the number of pixels that fall into each bin [103]. The difference between two images I_i and I_j based on their color histograms H_i and H_j can be formulated as

d(I_i, I_j) = \sum_{k=1}^{n} | H_i(k) - H_j(k) |        (2.2)

This measures the difference in the number of pixels of the two images that fall into the same bin. In the RGB color space, the above formula can be written as

d_{RGB}(I_i, I_j) = \sum_{k=1}^{n} ( | H_i^R(k) - H_j^R(k) | + | H_i^G(k) - H_j^G(k) | + | H_i^B(k) - H_j^B(k) | )        (2.3)
Figure 2.8: Template matching
Using only a simple color histogram may not detect scene changes very well, since two images can be very different in structure and yet have similar pixel values. Figure 2.9 shows the inter-frame difference sequence of the same video data as in Figure 2.5, computed with the color histogram measurement.

Figure 2.9: Color histogram
χ² Histogram

The χ² histogram measure computes the distance between two image frames as

d(I_i, I_j) = \sum_{k=1}^{n} \frac{( H_i(k) - H_j(k) )^2}{H_j(k)}        (2.4)

Several researchers [79, 128, 129] have used the χ² histogram in their SCD algorithms and report that it generates better results than other intensity-based measurements, such as the color histogram and template matching. Figure 2.10 shows the inter-frame difference sequence of the same video data as in Figure 2.5, computed using the χ² histogram.

Figure 2.10: χ² histogram
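Both histogram measures are equally direct to implement. The sketch below assumes 8-bit images (an H × W × 3 RGB array for equation (2.3), a grey-level array for equation (2.4)); the bin count is a free parameter.

```python
import numpy as np

def rgb_hist_diff(f1: np.ndarray, f2: np.ndarray, bins: int = 64) -> float:
    """Equation (2.3): per-bin histogram differences summed over the
    R, G, and B channels of two frames."""
    d = 0.0
    for c in range(3):
        h1, _ = np.histogram(f1[..., c], bins=bins, range=(0, 256))
        h2, _ = np.histogram(f2[..., c], bins=bins, range=(0, 256))
        d += np.abs(h1 - h2).sum()
    return float(d)

def chi2_hist_diff(f1: np.ndarray, f2: np.ndarray, bins: int = 64) -> float:
    """Equation (2.4): chi-square distance between grey-level histograms.
    Bins that are empty in the reference frame are skipped to avoid a
    division by zero."""
    h1, _ = np.histogram(f1, bins=bins, range=(0, 256))
    h2, _ = np.histogram(f2, bins=bins, range=(0, 256))
    nz = h2 > 0
    return float((((h1 - h2) ** 2)[nz] / h2[nz]).sum())
```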
2.3.3 Full-Image Video Scene-Change Detection
Most of the existing work on SCD is based on full-image video analysis. The differences between the various SCD approaches lie in the measurement function used, the features chosen, and the subdivision of the frame images. Many use either intensity features [79, 81, 82, 110, 128, 129] or motion information [3, 57, 93] of the video data to compute the inter-frame difference sequence. The problem with the intensity-based approaches is that they may fail when a peak measurement value is introduced by object or camera motion. Motion-based algorithms have the drawback of being computationally expensive, since they usually need to match image blocks across frames. After the inter-frame differences are computed, some approaches use a global threshold to decide whether a scene change has occurred. This is clearly insufficient, since a large global difference does not necessarily imply a scene change, as reported, for example, by Yeo and Liu [119, 120]. In fact, a scene change with a globally low peak is one of the situations that often causes algorithms to fail. A scene change, either abrupt or gradual, is a localized process and should be checked accordingly.
Detecting Abrupt Scene Changes

Algorithms for detecting abrupt scene changes have been proposed by Nagasaka and Tanaka [79], Hsu and Harashima [57], Otsuji, Tonomura, and Ohba [81, 82], and Akutsu et al. [3], and can achieve over a 90% accuracy rate. However, these approaches do not work well for gradual scene changes like fading.

Nagasaka and Tanaka [79] present an approach that partitions the video frames into 4 × 4 equal-sized windows and compares the corresponding windows from the two frames. Every pair of windows is compared, and the largest difference is discarded; the remaining difference values are used to make the final decision. The purpose of the subdivision is to make the algorithm more tolerant of object movement, camera movement, and zooms. Six types of measurement functions, namely difference of grey-level sums, template matching, difference of grey-level histograms, color template matching, difference of color histograms, and a χ² comparison of the color histograms, have been tested with this method. The experimental results indicate that a combination of image subdivision and the χ² color histogram approach provides the best detection of abrupt scene changes.

Otsuji, Tonomura, and Ohba [81, 82] compute both the histogram-based and the pixel-based inter-frame difference from brightness information to detect scene changes; a projection detection filter is also proposed for more reliable detection.

Akutsu et al. [3] use both the average inter-frame correlation coefficient and the ratio of velocity to motion in each frame of the video to detect scene changes. Their assumptions are that (1) the inter-frame correlation between frames from the same scene should be high and (2) the ratio of velocity to motion across a cut should also be high. The approach is computationally expensive, since computing motion vectors requires matching image blocks across frames.

Hsu et al. [57] treat the scene changes and activities in the video stream as a set of motion discontinuities that change the shape of the spatio-temporal surfaces. The sign of the Gaussian and mean curvature of the spatio-temporal surfaces is used to characterize the activities. Scene changes are detected using an empirically chosen global threshold, and clustering and a split-and-merge approach are then used to segment the video. Unfortunately, the experimental results in the paper are not sufficient to judge the approach, and no comparisons with other existing algorithms are available.
Detecting Gradual Scene Changes

Increasingly, researchers are starting to study methods for detecting both abrupt and gradual scene changes [93, 110, 128, 129]. Robust gradual SCD is more challenging than its abrupt counterpart, especially when there is a lot of motion involved. Unlike an abrupt scene change, a gradual scene change does not usually manifest itself as a sharp peak in the inter-frame difference sequence and is very easily confused with object or camera motion. Gradual scene changes are usually determined by observing the behavior of the inter-frame differences over a certain period of time.

Tonomura et al. [110] compare an extended set of frames before and after the current frame to determine whether the current frame is a cut. They also propose to detect gradual scene changes by checking whether the inter-frame differences over extended periods of time exceed a threshold value. However, the lack of sufficient details and experimental results makes it very difficult to judge the proposed algorithm.

Zhang et al. [128, 129] evaluate four scene-change detection approaches: template matching, the likelihood ratio between two images, histogram comparison, and χ² histogram comparison. They conclude that histogram comparison performs better in terms of computation cost. In their approach, gradual transitions are detected using the so-called twin-comparison technique. Two thresholds T_b and T_s (T_s < T_b) are set for camera breaks and gradual transitions, respectively. If the histogram difference d(I_i, I_{i+1}) between consecutive frames satisfies T_s < d(I_i, I_{i+1}) < T_b, frame I_i is considered a potential start frame of a gradual transition. For every potential start frame detected, an accumulated comparison A_c(i) is computed over the following frames, and the end of the gradual transition is declared once A_c(i) > T_b while d(I_i, I_{i+1}) has fallen below T_s.
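The twin-comparison bookkeeping can be sketched as follows. This is our simplified reading of the technique, not Zhang et al.'s code: diffs[i] holds d(I_i, I_{i+1}), and differences at or above T_b are treated as ordinary cuts that simply reset the candidate.

```python
import numpy as np

def twin_comparison(diffs: np.ndarray, t_s: float, t_b: float):
    """Return (start, end) index pairs of detected gradual transitions."""
    transitions, start, acc = [], None, 0.0
    for i, d in enumerate(diffs):
        if d >= t_b:                       # sharp peak: an abrupt cut
            start, acc = None, 0.0
        elif start is None and t_s < d < t_b:
            start, acc = i, float(d)       # potential start of a transition
        elif start is not None:
            acc += float(d)
            if d < t_s:                    # consecutive difference died down
                if acc > t_b:              # but the accumulated change is large
                    transitions.append((start, i))
                start, acc = None, 0.0
    return transitions
```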
To distinguish gradual transitions from other camera operations like pans and zooms, the approach uses image-flow computations: the authors claim that a gradual transition results in a null optical flow, whereas other camera operations result in particular types of flow. Their approach achieves good results; failures are due either to similarity of the color histograms across shots when the color contents are very similar, or to sharp changes in lighting such as flashes and flickering objects.

Shahraray [93] detects abrupt and gradual scene changes based on motion-controlled temporal filtering of the disparity between consecutive frames. Each image frame is subdivided, and image block matching is done based on intensity values. A nonlinear order-statistic filter [74] is used to combine the matching values of the different image blocks; that is, the weight of a match value in the total sum depends on its rank in the list of image match values. The author claims that this match measure of two images is more consistent with human judgment. Abrupt scene changes are detected by a thresholding process like the one used by many of the existing algorithms discussed here, while gradual transitions are detected by identifying sustained low-level increases in the image matching values. False SCD due to camera and object motion is suppressed by both the image block matching and the temporal filtering of the image matching value sequence. Shahraray [93] also mentions a simple and interesting idea for verifying the scene-detection result, which he calls scene verification: measure the inter-frame differences of the representing frames produced by the SCD algorithm; high similarity between two representing frames is likely to indicate a false detection. It is reported that this algorithm is capable of processing 160 × 120 pixel video in real time on a Pentium PC and has been extensively tested on a variety of TV broadcasts for more than one year. However, no statistical data about the accuracy of the SCD are given in the paper.

To improve the detection of fades, dissolves, and wipes, which most existing algorithms have difficulty with, Zabih et al. [123] propose an algorithm based on the edge-changing fraction. They observe that during a scene change, new intensity edges appear (enter the scene) far from the locations of the old edges, and old edges disappear (exit the scene) far from the locations of the new edges. Abrupt scene changes, fades, and dissolves are detected by studying the peak values in a fixed window of frames, while wipes are identified by the distribution of the entering and exiting edge pixels. A global motion computation is used to guard the algorithm against camera or object motion. The algorithm has been tested on a data set from an Internet MPEG movie archive, and the experimental results indicate that it is robust against parameter variations, compression loss, and sub-sampling of the frame images. The algorithm performs well in detecting fades, dissolves, and wipes but might fail in cases of very rapid changes in lighting or fast-moving objects. It may also have difficulty with video that is so dim that no edges can be detected. An initial implementation of the algorithm runs at about two frames per second on a SUN workstation.
2.3.4 Scene-Change Detection on the Compressed Video Data
To efficiently transmit and store video data, several video compression schemes, such as MPEG, DVI, and motion JPEG, have been proposed and standardized. To detect scene changes in those video streams, two approaches can be taken:

- Fully decompress the video data into a sequence of image frames and then perform the video scene analysis on full images using the algorithms discussed in the last subsection. However, fully decompressing the compressed video data can be computationally intensive; for example, it involves Huffman decoding, inverse DPCM, inverse quantization, inverse DCT, and motion compensation steps in the case of MPEG compressed data.

- To speed up the scene analysis, some researchers have developed SCD algorithms that work on the compressed video data without the full decompression step.

The compressed-domain approaches are introduced in this subsection. They have been shown [117, 119, 120, 121] to be capable of producing results similar to the full-image-based approaches while being much more efficient. Most of the work has been done on DCT-based standard compressed video, such as MPEG [41]. Therefore, all SCD algorithms in this category are based on DCT-related information, which can be extracted from the compressed video. Some algorithms operate on the corresponding DC image sequences of the compressed video [117, 119, 120], whereas others use DC coefficients and motion vectors [36, 72, 77, 92, 130]. They all need only partial decompression of the video, as compared to the algorithms described in Section 2.3.3.
DC Image Sequence-Based Approach

Yeo and Liu [117, 119, 120] propose to detect scene changes on the DC image sequence of the compressed video data. They [119] discuss the following measurements: successive pixel difference (template matching) and global color statistic comparison (RGB color histogram). Template matching is sensitive to camera and object motion and may not produce good results in the full-frame image case. However, this measurement is more suitable for DC sequences, because DC sequences are smoothed versions of the corresponding full images and thus less sensitive to camera and object movements. Based on comparison experiments, global color statistic comparison is found to be less sensitive to motion but more expensive to compute; template matching is usually sufficient in most cases and is used in their algorithm.

Abrupt scene changes are detected by first computing the inter-frame difference sequence and then applying a sliding window of size m. A scene change is found if:

- The difference between two frames is the maximum within a symmetric window of size 2m - 1, and

- The difference is also n times the second-largest difference in the window.

This criterion guards against false SCD caused by fast panning, zooming, or camera flashes. The window size m is set to be smaller than the minimum number of frames between any two scene changes. The selection of the parameters n and m involves a trade-off between the missed detection rate and the false detection rate; typical values are n = 3 and m = 10. The sensitivity of these parameters was also experimentally measured and analyzed. This method may miss some gradual scene changes, but those can be captured by computing and studying the difference of every frame with the previous kth frame, that is, by checking whether a "plateau" appears in the difference sequence.

They also discuss the detection of flashing-light scenes, which might indicate the occurrence of an important event or the appearance of an important person. Flashing-light scenes can be located by noticing two consecutive sharp peaks in the difference sequence; that is, within a sliding window of the difference sequence:
- The maximum and the second-largest difference values are very close; and

- The two largest difference values are much larger than the average value of the rest.
The detection of scenes with captions was also studied. Their experimental results indicate that over 99% of abrupt changes and 89.5% of gradual changes were detected, and the algorithm was about 70 times faster than the corresponding algorithm on the full-image sequence. This conforms to the fact that the DC images of an MPEG video are only one sixty-fourth of the original size. Although there may be situations in which DC images are not sufficient to detect some features [117], this approach is nonetheless very promising and produces the best results in the literature.
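The sliding-window criterion for abrupt changes is easy to state in code. The sketch below is our rendering of the two conditions given above, applied to a precomputed difference sequence over the DC images, with the suggested defaults n = 3 and m = 10.

```python
import numpy as np

def sliding_window_cuts(diffs: np.ndarray, m: int = 10, n: float = 3.0):
    """Declare frame i a scene change if diffs[i] is the maximum inside a
    symmetric window of 2m - 1 values and at least n times the
    second-largest value in that window."""
    cuts = []
    for i in range(m, len(diffs) - m):
        window = diffs[i - m + 1:i + m]          # 2m - 1 values centered on i
        top, second = np.sort(window)[-2:][::-1]
        if diffs[i] == top and top >= n * max(second, 1e-9):
            cuts.append(i)
    return cuts
```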
DC Coefficients-Based Approach

Arman et al. [36] detect scene changes directly on Motion JPEG compressed video data using DCT coefficients. A frame in the compressed video sequence is represented by a subset of its blocks, and a subset of the AC coefficients of their 8 × 8 DCT blocks is chosen to form a vector. It is assumed that the inner product of the vectors from the same scene is small. A global threshold is used to detect scene changes; in case of uncertainty, a few neighboring frames are selected for further decompression, and color histograms are used on those decompressed frames to find the location of the scene change. This approach is computationally efficient; however, it does not address gradual transitions like fades and dissolves, and the experimental evaluation of the technique is limited.

Sethi and Patel [92] use only the DC coefficients of the I frames of an MPEG compressed video to detect scene changes, based on luminance histograms. The basic idea is that if two video frames belong to the same scene, their luminance distributions should derive from a single statistical distribution; if they do not, a scene change can be declared. Their algorithm works as follows:

1. I frames are extracted from the compressed video stream;

2. The luminance histograms of the I frames are generated using the first (DC) coefficient;

3. The luminance histograms of consecutive I frames are compared using one of three statistical tests (Yakimovsky's likelihood ratio test, the χ² histogram comparison test, or the Kolmogorov-Smirnov test, which compares the cumulative distributions of the two data sets).
Different types of video data have been used to test the algorithm, and the χ² histogram comparison seems to produce better results than the other two tests.
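Of the three tests, the Kolmogorov-Smirnov comparison is the simplest to sketch: it is just the largest gap between the two cumulative luminance distributions. The implementation below is a generic illustration, not Sethi and Patel's code.

```python
import numpy as np

def ks_statistic(h1: np.ndarray, h2: np.ndarray) -> float:
    """Kolmogorov-Smirnov statistic between two luminance histograms:
    the maximum absolute gap between their cumulative distributions.
    A large value suggests the two I frames come from different scenes."""
    c1 = np.cumsum(h1) / max(h1.sum(), 1)
    c2 = np.cumsum(h2) / max(h2.sum(), 1)
    return float(np.abs(c1 - c2).max())
```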
Zhang et al. [130] use DCT blocks and motion vector information of the MPEG compressed video data to detect scene changes based on a count of non-zero motion vectors. Their observation is that the number of valid motion vectors in P or B frames tends to be low when such frames lie between two different shots. Those frames are then decompressed, and full-image analysis is done to detect scene changes. The weakness of their approach is that motion compensation-related information tends to be unreliable and unpredictable in the case of gradual transitions, which might cause the approach to fail.

Meng et al. [77] use the variance of DC coefficients in I and P frames and motion vector information to characterize scene changes in MPEG-I and MPEG-II video streams. The basic idea of their approach is that frames tend to have very different motion vector ratios if they belong to different scenes, and very similar motion vector ratios if they are within the same scene. Their scene-detection algorithm works as follows. First, an MPEG video is decoded just enough to obtain the motion vectors and DC coefficients, and inverse motion compensation is applied only to the luminance macroblocks of P frames to construct their DC coefficients. Then suspected frames are marked in the following ways:
- An I frame is marked if there is a peak in the inter-frame histogram difference and the B frame immediately before it has a peak value of the ratio between forward and backward motion vectors;

- A P frame is marked if there is a peak in its ratio of intra-coded blocks to forward motion vectors; and

- A B frame is marked if its ratio of backward to forward motion vectors has a peak value.
Final decisions are made by going through the marked frames to check whether they satisfy a local window threshold. The threshold is set according to the estimated minimal scene-change distance. A dissolve effect is detected by noticing a parabolic variance curve.

As more and more video data are compressed and made available on the Internet and the World Wide Web, scene-change detection algorithms for compressed video prove useful in many cases. However, we should note their limitations. First, current video compression standards like MPEG are optimized for data compression rather than for the representation of visual content, and they are lossy; that is, they do not necessarily produce accurate motion vectors [123]. Second, motion vectors are not always readily obtainable from the compressed video data, since a large portion of the existing MPEG video has I frames only [123]. Moreover, some important image analyses, such as automatic caption extraction and recognition, may not be possible on the compressed data.
2.3.5 Model-Based Video Scene-Change Detection
All the research work introduced so far is based solely on image-processing techniques. It is, however, possible to build an explicit model of the video data to help the SCD process [2, 47, 49]. These algorithms are sometimes referred to as top-down approaches [47, 49], whereas the algorithms in Sections 2.3.3 and 2.3.4 are known as bottom-up approaches. The advantage of model-based SCD is that a systematic procedure based on mathematical models can be developed, and certain domain-specific constraints can be added to improve the effectiveness of the approach [49]. The performance of such algorithms depends on the models they are based on.

Hampapur et al. [47, 49] use a production-model-based classification for video segmentation. Based on a study of the video production process and different constraints abstracted from it, a video edit model is proposed that captures the process of video editing and assembly. The model includes three components: an edit decision model, an assembly model, and an edit effect model. The edit effect model covers both abrupt scene changes (cuts) and gradual scene changes (translate, fade, dissolve, and morphing). Template matching and χ² histogram measurements are used. Gradual scene changes such as fades and dissolves are modeled as chromatic scaling operations: a fade is modeled as a chromatic scaling operation with a positive or negative fade rate, and a dissolve is modeled as simultaneous chromatic scaling operations on two images. The first step of their algorithm is to identify the features that correspond to each of the edit classes to be detected and then classify the video frames based on these features. Feature vectors extracted from the video data are used together with the mathematical models to classify the video frames and to detect any edit boundaries. Their approach has been tested using cable TV program video with cuts, fades, dissolves, and spatial edits; an overall 88% accuracy rate is reported [47].

Aigrain and Joly [2] propose an algorithm based on a differential model of the distribution of pixel value differences in a motion picture. The model includes:
- A small-amplitude additive zero-centered Gaussian noise that models camera, film, and other noises;

- An intrashot change model for the pixel change probability distribution, constructed from object and camera motion, angle change, focus, or light change at a given time in a given shot, which can be expressed as

  P(s) = k [ (a - |s|) / a^2 ] + (1 - k) α e^{-α|s|}

  where a is the number of grey levels, k is the proportion of auto-correlated pixels, and α and s are variables; and

- A set of shot transition models for the different kinds of abrupt and gradual scene changes, which are assumed to be linear (they are summarized in Table 2.2).

Table 2.2: Pixel difference distribution models for scene changes

- Cut: Q(s) = 2(a - |s|) / (a(a - 1)), where a is the number of grey levels.

- Wipe: Q(s) = 2(a - |s|) / (d a(a - 1)), where d is the duration of the change.

- Fade (to/from white, to/from black): Q(s) = 2d(a - d|s|) / (a(a - 1)) for d|s| < a, and Q(s) = 0 for d|s| > a; s > 0 for a fade from black or to white, s < 0 for a fade from white or to black.

- Dissolve: Q(s) = d/a for d|s| < a, and Q(s) = 0 for d|s| > a.
The first step of their SCD algorithm is to reduce the resolution of the frame images by undersampling; the purpose is to overcome the effects of camera and object motion, as well as to make the computation in the following steps more efficient. The second step is to compute the histogram of pixel difference values and count the number of pixels whose change in value lies within a certain range determined by studying the above models. Different scene changes are then detected by checking the resulting integer sequence. Experiments show that their algorithm can achieve a 94% to 100% detection rate for abrupt scene changes and around 80% for gradual scene changes.
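The flavor of the differential approach can be conveyed with a short sketch: compute the empirical distribution of non-zero pixel differences between two undersampled frames and compare it to the cut model of Table 2.2. The normalization details are our assumptions, made so that both arrays sum to one.

```python
import numpy as np

def diff_histogram(f1: np.ndarray, f2: np.ndarray, a: int = 256) -> np.ndarray:
    """Normalized histogram of non-zero absolute pixel differences
    between two (undersampled) grey-level frames."""
    s = np.abs(f1.astype(np.int32) - f2.astype(np.int32)).ravel()
    s = s[s > 0]
    h = np.bincount(s, minlength=a)[1:a].astype(float)
    return h / max(h.sum(), 1.0)

def cut_model(a: int = 256) -> np.ndarray:
    """Q(s) for s = 1..a-1 from Table 2.2: the distribution of non-zero
    differences between two unrelated uniform images with a grey levels."""
    s = np.arange(1, a)
    return 2.0 * (a - s) / (a * (a - 1))

# A frame pair whose empirical histogram is close to cut_model(), e.g. in
# L1 distance, is a cut candidate under this differential model.
```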
2.3.6 Evaluation Criteria for the Performance of SCD Algorithms
It is difficult to evaluate and compare existing SCD algorithms due to the lack of objective performance measurements. This is mainly attributable to the diversity of factors involved in video data. However, various video resources can be used to test and compare algorithms against user- and application-independent evaluation criteria, giving an indication of their effectiveness. Unfortunately, no widely accepted test video data set is currently available, and many researchers use MPEG movies from a few WWW archive sites2 as inputs to their SCD algorithms. Such video data may not be a good benchmark for testing SCD algorithms, for the following reasons. First, these movies were not made for benchmarking SCD algorithms; although some of them may take half an hour to download and occupy a large amount of disk space (a one-minute MPEG video can easily take over 5 MB of storage, depending on the encoding method), they may not be a representative data set for all the possible scene-change types. Second, the quality of these movies varies greatly, since they come from different sources and are encoded using various coding algorithms; for example, many MPEG movies have only I frames, which may cause problems for some SCD algorithms for compressed video. Third, there are no widely accepted "correct" SCD results available for any of these MPEG data sets. Thus, an effort toward building a publicly accessible library of SCD test video data sets would be very useful. Such a test data set should include video data from various applications, covering the different types of scene change, along with analysis results made and agreed on by researchers.

2. For example, http://w3.eeb.ele.tue.nl/mpeg/index.html.

We argue that performance measurements of SCD algorithms should include one or more of the following:

- CPU time spent on a given video benchmark, such as the number of frames processed by the SCD algorithm per time unit;

- Average success rate or failure rate for SCD over various video benchmarks, including both false detections and missed detections (a 100% scene-change capture rate does not imply that the algorithm is good, since it may have a very high false alarm rate; the results of an SCD algorithm can be compared to human SCD results, which can be assumed to be correct);

- SCD granularity (can it decide between which frames a scene change occurs? can it also report the type of the scene change, such as whether it is a fade-in or a dissolve?);

- Stability (its sensitivity to noise in the video stream; scene flashes and background noise often trigger false detections);

- The types of scene changes and special effects it can handle;

- Generality (can it be applied to various applications? what different kinds of video data resources can it handle?); and

- The formats of video it can accept (full-image sequence, MPEG-I, MPEG-II, AVI video, and so on).
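The first two measurements are simple to make precise. A minimal sketch of the success/failure bookkeeping, with a frame tolerance for matching detected changes to human-labeled ones (the tolerance value is our assumption):

```python
def scd_rates(detected, truth, tolerance: int = 2):
    """Match detected scene-change frame indices against ground-truth
    indices; a detection within `tolerance` frames of an unused true
    change counts as a hit."""
    remaining = sorted(truth)
    hits = 0
    for d in sorted(detected):
        for t in remaining:
            if abs(d - t) <= tolerance:
                hits += 1
                remaining.remove(t)
                break
    return {"hits": hits,
            "misses": len(truth) - hits,
            "false_alarms": len(detected) - hits}
```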
2.3.7 Conclusion
In this section, a taxonomy of existing SCD techniques for video database systems was presented and discussed, and criteria for benchmarking SCD algorithms were proposed. Existing SCD algorithms have achieved above a 90% success rate for abrupt scene changes and above an 80% success rate for gradual scene changes. These numbers are, in general, fairly acceptable for certain applications. However, there is an obvious need for further improvement, and there are several possible ways to achieve it:

- Use additional visual as well as audio information, rather than relying only on the color or intensity information that most existing algorithms rely on. Other visual information includes captions, motion of the objects and camera, and object shapes. The problem of how to use audio signals and other information contained in the video data for scene change detection and video segmentation has not been carefully addressed in the literature so far, although some initial efforts [95, 96] were made for video skimming and browsing support.

- Develop adaptive SCD algorithms that can combine several SCD techniques and can self-adjust their various parameters. Such algorithms would choose the criteria best optimized for the given video data, for example, a video sequence with frequent object movements (action movies) versus one with very little motion (lecture video).

- Use a combination of various scene-change models. Developing scene-change models can be a difficult task due to the complicated nature of video production and editing. However, different aspects of the video editing and production process can be individually modeled and used in developing detectors for certain scene changes.

- Develop new video coding and decoding schemes that include more information about the scene content. As pointed out by Bove [13], current motion-compensated video codec standards like MPEG complicate the scene analysis task by partitioning the scene into arbitrary tiles, resulting in a compressed bitstream that is not physically or semantically related to the scene structure.

For a complete solution to the problem, however, a better understanding of the human capabilities and techniques for SCD is needed. This would involve using information available from psychophysics [21, 33, 65, 85] and also understanding the neural circuitry of the visual pathway [73]. Techniques developed in computer vision for detecting motion or objects [18, 21, 36, 106] can also be incorporated into SCD algorithms.
2.4 VIDEO DATA INDEXING AND ORGANIZATION
Due to the huge data volume of a video database, accessing and retrieving video data items becomes time consuming, and indexing of the video data is needed to facilitate the process. Compared to traditional text-based database systems, video indexing is far more difficult and complex. First, in a traditional DBMS, data are usually selected on one or more key fields (or attributes) that uniquely identify the data. In a VDBMS, however, what to index on is neither clear nor easy to determine: it can be audio-visual features, annotations, or other information contained in the video. Second, unlike textual data, content-based video data indexes are difficult to generate automatically. Video data indexing is closely related to how the video data is represented (video data modeling, discussed in Section 2.2) and to the possible queries that the user can ask (video data query and retrieval, discussed in Section 2.5). Existing work on video indexing can be classified into three categories, based on how the indexes are derived: annotation-based indexing, feature-based indexing, and domain-specific indexing. Most of the existing indexing approaches require that the video stream first be segmented into basic units called shots; video segmentation was discussed in Section 2.3.
2.4.1 Annotation-Based Indexing
Video annotation is very important for a number of reasons. First, it fully explores the richness of the information contained in the video data. Second, it provides access to video data based on its semantic content rather than just its visual content, such as color distribution. Unfortunately, due to the limitations of current machine vision and image-processing techniques, full automation of the video annotation process will remain impossible for a long time. Thus, video annotation is usually a manual process that requires human intervention. The annotation is usually done by an experienced user (a film producer or librarian, for example), either as part of the production process or as a post-production process. Manual annotation has several drawbacks:

- The cost is high, as it is time consuming. It may be suitable for inserting small quantities of video data into the database, but not for large collections of video data.

- Annotation is usually application-dependent; thus, the annotation of video data for a certain domain may not be applicable to other applications.

- Annotation is usually biased and limited by the user doing the work.

For these reasons, the design of existing annotation-based indexing techniques concentrates primarily on the selection of indexing terms or keywords, on data structures, and on user interfaces that reduce the user's effort.

One of the earliest ideas for recording descriptive information about film or video is the stratification model proposed by Davenport and Smith [30, 98]. The stratification model is a layered information model for annotating video shots; it approximates the way in which the editor builds an understanding of what happens in individual shots. To overcome the high cost of human annotation of video shots, they suggest that a data camera can be used during the video production process to record descriptive data about the video, including time code, camera position, and voice annotation of who-what-why information. This kind of annotation is also called source annotation by Hampapur [47]. However, they do not address the problem of converting this annotation information into textual descriptions to create indexes of the video data.
Very often, a set of keywords is selected to annotate the video data. However, this may not be a good approach, as strongly criticized by Davis [31]:

- It is not possible to use keywords alone to describe the spatial and temporal relationships, as well as other information, contained in the video data.

- Keywords cannot fully represent the semantic information in the video data and do not support inheritance, similarity, or inference between descriptors (looking for shots of dogs will not retrieve shots indexed as German shepherds, and vice versa).

- Keywords do not describe the relations between descriptions (a search using the keywords man, dog, and bite may retrieve dog bites man videos as well as man bites dog videos; the relations between the descriptions determine salience and are not represented by keyword descriptions alone).

- Keywords do not scale (the more keywords used to describe the video data, the smaller the chance that the video data will match the query condition).

To overcome these difficulties of keyword annotation, an annotation system called Media Streams has been developed at the MIT Media Laboratory [31, 32].3 Media Streams allows users to create multi-layered, iconic annotations of the video data. The system has three main user interfaces for annotating video: Director's Workshop, icon palettes, and media time lines. Director's Workshop allows users to browse predefined icon primitives and compound them into iconic descriptors through a cascading hierarchical structure. Iconic descriptors are then grouped into one or more icon palettes and can be dropped onto a media time line. The media time line represents the temporal nature of the video data, and the video is thus annotated by a media time line of icon descriptors. The Media Streams work does not discuss how to further create indexes from these video annotations.

One important idea argued in [30, 31, 32, 98] is that video should not be presegmented, since presegmentation limits its usability; rather, these authors propose to build a hierarchical annotation structure on top of the physical video streams (segmented annotation). One example is the stratification model [30, 98]. One apparent outcome is that several annotations can be linked to the same video clip. This idea has been further explored by Weiss et al. [113] in the Algebraic Video System, in which algebraic operations are used to define nested strata, or layers; using the nested strata, multiple coexisting views and annotations can be linked to the same raw video data. Similarly, Hjelsvold et al. [53, 55, 56] build a logical hierarchical video data structure on top of the physically stored raw video data, and free annotation is supported by establishing different mappings between annotations and video segments.

Bimbo et al. [10] propose spatial temporal logic (STL) for symbolically representing spatio-temporal relationships between objects or features in an image sequence. The spatial logical operators include before, after, overlaps, adjacent, contained, and partially intersects; the temporal logical operators include eventually and until. Standard boolean operators (and, or, and not) are also supported. A symbolic description, which is a set of STL assertions, describes the ordering relationships among the objects in an image sequence. The symbolic description is created for, and stored together with, each image sequence in the database and serves as an index; it is checked when a user query is processed to determine matches.

3. The paper "An Iconic Visual Language for Video Annotation" is also available in HTML at http://www.nta.no/telektronikk/4.93.dir/Davis-M.html.
2.4.2 Feature-Based Indexing
Unlike the annotation-based indexing approach, feature-based indexing techniques aim to fully automate the indexing process. These techniques depend mainly on image-processing algorithms to segment video, to identify representing frames, and to extract key features from the video data. Indexes can then be built on these key features, which can be color, texture, object motion, and so on. The advantage is that the indexing process can be done completely automatically and can be applied to various applications. The primary limitation is the lack of semantics attached to the features, which causes problems and inconvenience for users attempting to specify video database queries. Hence, these techniques are usually combined with a graphical query interface that lets the user define queries easily, and with domain-specific semantic annotations that enable content-based queries.

Lee and Kao [69] have developed a mechanism for indexing video data based on the concept of objects and object motion, with interactive annotation. They define a video record as a subsequence of a video sequence that starts at the frame where some object appears and ends at the frame where the object disappears. A video record is object-oriented and time-dependent, and is indexed as follows:
- Each video record has a unique identifier.

- The user views the video and interactively annotates the location of an object or masks the video subsequence. Annotation information is written into a labeling record.

- Tracks of objects are extracted by motion extraction algorithms and are described by a motion representation. Any motion can be described by combinations of 16 primitive motions (north, rotate-to-left, and so on), which are refined from the 4 basic motions that can be detected by optical flow methods.

- Track records are generated for moving objects, and indexes are thus established.
In their prototype system, users can query the database by combinations of video identifiers, objects, and tracks. The weakness of this approach is that video segmentation and annotation are all done manually, which can be very tedious.

Similarly, Ioka and Kurokawa [59] address the problem of indexing image sequences based on the motion properties of objects within the sequences. Motion vectors of objects in a video sequence are automatically extracted using a block motion estimation technique. The vectors are then mapped into a spatiotemporal (x-y-t) space and aggregated into several representative vectors using statistical analysis. The motion vector information is stored in a description file as an index to the video sequence, and a measure of the distance between motion trajectories is defined. In their scene-retrieval prototype, the user can interactively input a query trajectory; it is matched against the trajectories of the sequences in the database, and the sequences with the smallest distance are returned. Video annotation and video segmentation are not discussed in the paper.
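A toy version of trajectory matching illustrates the retrieval step. Here each trajectory is an (n, 2) array of (x, y) positions resampled to a common length; Ioka and Kurokawa's actual measure over the (x-y-t) space is more elaborate, so this only conveys the idea.

```python
import numpy as np

def trajectory_distance(q: np.ndarray, t: np.ndarray) -> float:
    """Mean Euclidean gap between corresponding points of two trajectories."""
    n = min(len(q), len(t))
    return float(np.linalg.norm(q[:n] - t[:n], axis=1).mean())

def retrieve(query: np.ndarray, database: dict, k: int = 5):
    """Return the k sequence ids whose stored trajectories lie closest
    to the query trajectory."""
    ranked = sorted(database, key=lambda vid: trajectory_distance(query, database[vid]))
    return ranked[:k]
```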
Arman et al. [37] implemented a video browsing and indexing system. In the system, video is segmented into shots, and each shot is visually presented to the user by a representative frame (RFrame) selected from the shot. The RFrames are used as indices into the shots. The similarity comparison of RFrames is based on the combination of two image features: the shape of the objects in the shot, measured by grey-level moments, and the color histograms of the RFrames. This can be a very efficient way of indexing video data; however, the types of user queries are limited, because video indexing and retrieval are based entirely on the calculation of image features.
The indexes of a video sequence are defined as a multi-dimensional vector by Tonomura, Akutsu, et al. [3, 4, 109, 110]. The vectors are computed using image-processing techniques on features of the video. The features used in their prototype system, MediaBENCH, include:

- Average intensity, which is the average pixel intensity value for each frame;

- Representative hue values, which are the top two hue histogram frequencies of each frame; and

- Camera work parameters, obtained by extracting camera motions from the video sequence [4].
The above indexes are computed every three frames in order to achieve real-time video indexing. Indexes are stored in video index files that contain pointers to the corresponding video contents. Video data can be segmented into shots using scene-change detection based on index filtering, going through indexes frame by frame and noticing the inter-frame differences. Thus, a structured video representation can be built to facilitate video browsing and retrieval operations.
2.4.3 Domain-Specific Indexing
Domain-specific indexing approaches use logical (high-level) video structure models, say, an anchorperson shot model or a CNN "Headline News" unit model, to further process the low-level video feature extraction and analysis results. An example of a logical video structure model is shown in Figure 2.2. After logical video data units have been identified, certain semantic information can be attached to each of them, and domain-specific indexes can be built. These techniques are effective in their intended domain of application. Their primary limitation is their narrow range of applicability and the limited semantic information obtained through parsing the video data. Most current research uses collections of well-structured logical video units, such as news broadcast videos, as input.

One of the early efforts in domain-specific video indexing was made by Swanberg et al. [104, 105], who discussed the insertion process in video databases. Several logical video data models specific to news broadcasting (including an anchorperson shot model, CNN news episode models, and so on) are proposed and used to identify logical video data units. These models contain both the spatial and temporal ordering of the key features, as well as the different types of shots. For example, the anchorperson shot model is based on the location of a set of features within frames belonging to an anchorperson shot; these features include the "Headline News" icons and the titling of the anchorperson. Image-processing routines, including image comparison, object detection, and tracking, are used to segment the video into shots and interactively extract the key elements of the video data model from the video data. The prototype system is implemented in the domain of CNN news broadcast video.

Hampapur et al. [48] define the problem of video indexing based on a video data model and propose a methodology for designing feature-based indexing schemes. The indexing schemes presented use low-level image-sequence features in a feature-based classification formalism to arrive at a machine-derived index. A mapping between the machine-derived index and the desired index is designed using domain constraints, and an efficacy measure is proposed to evaluate this mapping. The indexing scheme was implemented and tested on video data taken from a cable television feed.

Similarly, Smoliar, Zhang, et al. [100, 101, 127] use an a priori model of video structure based on domain knowledge to parse and index the video data. In their approach, video data is parsed by identifying the key features of the video shots, which are then compared with domain-specific models to classify them. Both textual and visual indexes are built. The textual index uses a category tree and assigns news items to topics in the tree. The visual index is built during the parsing process; each news item is represented as a visual icon inside a window, providing an unstructured index of the video database. A low-level index over the key frames of the video data is also built automatically. The features used for indexing include the color, size, location, and shape of segmented regions, and the color histograms of the entire image and of nine subregions, coded into numerical keys. Their approach has been tested using news program video data.
2.5 VIDEO DATA QUERY AND RETRIEVAL

The purpose of a VDBMS is to provide efficient and convenient user access to a video data collection. The video data query and retrieval process is complicated by the numerous demands placed on the system. The retrieval process typically involves the following steps. First, the user specifies a query using facilities provided by the user interface. The query is then processed and evaluated, and the value or feature obtained is used to match and retrieve the video data stored in the VDB. Finally, the resulting video data is displayed on the user interface in a suitable form. Video query is closely related to other aspects of the VDBMS, such as video data indexing, since the features used for indexing are usually used to evaluate the query, and the query is usually processed by searching the indexing structure.
2.5.1
Different Types of Queries
Identifying the different classes of user queries in a VDBMS is very important to the design of query processing in the VDBMS. The classification of queries in a video database system can be done in many ways depending on, among other factors, the intended applications and the data model they are based on. Hampapur [47] proposes a scheme to classify them from a video data model design point of view: queries are classified according to their content type, matching required, function, and temporal unit type. In this book, video queries are grouped using the following criteria:

•  Query content: Queries that are based on the video content can be divided as follows:

   -  Semantic information query: This kind of query requires an understanding of the semantic content of the video data; consider, for example, a query for a scene with a dog running after a man. It can be partially solved by semantic annotation of the video data. This is the most difficult type of query in a video database system, and it depends on the development of technologies such as computer vision, machine learning, and AI.

   -  Meta information query: Meta data is used to provide information about the video data, such as the producer, the date of production, and the length. This kind of query can be answered, in most cases, in a way similar to conventional database queries. Meta data is usually inserted into the VDB along with the corresponding video data by video annotation, which is currently done manually or semi-manually off-line. An example could be a query for a video directed by Alan Smithee and titled "2127: A Cenobite Space Odyssey." This class includes the so-called statistical query, which is used to gather information about video data without content analysis of the video. A typical example is to ask how many films in the database Tom Cruise has appeared in.

   -  Audiovisual query: This kind of query depends on the audio and visual features of the video and usually does not require an understanding of the video data. One example is to find a video clip with a dissolve scene change. In these queries, audio and visual feature analysis and computation, as well as the similarity measurement [60], are the key operations, as compared to the textual queries in a conventional DBMS.

   Queries that are based on the nature of the video content can be divided as follows:

   -  Spatial query: This relies on the spatial information in the video, for example, to retrieve clips with a sunset scene as background.

   -  Temporal query: This relies on the temporal information in the video, for example, to find video clips with the camera zooming in.

   -  Spatio-temporal query: This relies on the spatio-temporal aspects of the video, for example, to find all video clips with people running side by side.
•  Query matching type:

   -  Exact match-based query: These queries are used to obtain an exact match of the data, for example, to find a CNN "Dollars and Sense" news clip from the morning of March 18, 1996.

   -  Similarity match-based query: Because of the nature of video data, similarity-based queries dominate in a VDBMS. One example is to find a video clip that contains a scene similar to a given image.

•  Query granularity: The granularity of a query is the expected size of the query result.

   -  Frame-based query: This type of query is aimed at individual frames of video data, which are usually the atomic unit of the VDB.

   -  Clip-based query: The result of this query is expected to be one or more subsets of video data that are relatively independent in terms of their contents.

   -  Video stream-based query: These queries deal with complete video data. An example is to find a video produced in 1996 that has Kurt Russell as the leading actor. The result will be one or more complete video data items, as the user inserted them into the VDB before the latter performed any video segmentation operations on them.
•  Query behavior: Queries can be classified according to the way they proceed.

   -  Deterministic query: In this case, the user has a clear idea of what the expected result should be, and the query condition is usually very specific.

   -  Browsing query: A user may be vague about his retrieval needs or may be unfamiliar with the structures and types of information available in the video database. In such cases, the user may be interested in browsing the database rather than searching for a specific entity, so the system should allow the formulation of fuzzy queries for browsing through the database. In browsing mode, there is no specific entity that the user is looking for; the system should also provide data sets that are representative of all video data in the system.

•  Query specification:

   -  Direct query: Here, the user specifies values of features of certain frames, such as color, texture, and camera position.

   -  Query by example: This is also called query by pictorial example (QBPE) or iconic query (IQ). The user can supply a sample frame image, as well as other optional qualitative information, as query input, and the system returns a specified number of best-match frames. This kind of query methodology has been used in IBM's QBIC system.4 A minimal sketch of such matching appears after this list.

   -  Iterative query: The system provides a graphical user interface that allows users to incrementally refine their queries until a satisfying result is obtained. A practical example of this can be seen in the JACOB system.5

4 http://wwwqbic.almaden.ibm.com/~qbic/qbic.html/
5 http://wwwcsai.diepa.unipa.it/research/projects/jacob/
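To make the query-by-example mechanism above concrete, the following is a minimal sketch of matching a sample image against indexed frames on a global color feature. The 8-buckets-per-channel quantization and the histogram-intersection similarity are illustrative assumptions, not the method of QBIC or JACOB.

    # Sketch: query by pictorial example via color-histogram matching.
    # The quantization and the similarity measure are illustrative choices.
    import numpy as np

    def color_histogram(frame, buckets=8):
        """Normalized joint RGB histogram of a frame (H x W x 3, uint8)."""
        q = (frame // (256 // buckets)).reshape(-1, 3)
        hist, _ = np.histogramdd(q, bins=(buckets,) * 3,
                                 range=((0, buckets),) * 3)
        return hist / hist.sum()

    def similarity(h1, h2):
        """Histogram intersection: 1.0 means identical color distributions."""
        return float(np.minimum(h1, h2).sum())

    def query_by_example(example_frame, indexed_frames, k=5):
        """Return the k best-matching frames for the example image."""
        target = color_histogram(example_frame)
        scored = sorted(((similarity(target, color_histogram(f)), i)
                         for i, f in enumerate(indexed_frames)),
                        reverse=True)
        return scored[:k]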
2.5.2
Query Specification and Processing
Video Query Language

Most textual query languages, such as SQL, have limited expressive power when it comes to specifying VDB queries. The primary reason is that the visual, temporal, and spatial information of the video data cannot be readily structured into fields and often has a variable-depth, complex, nested character. Queries
about visual features of the video can be specified, for example, by using an iterative query-by-example mechanism. Spatial and temporal queries can be expressed, for example, in TSQL [12, 102] or spatial temporal logic (STL) [9].

Queries dealing with the relationships of video intervals can be specified using a temporal query language like TSQL (TSQL2, Applied TSQL2) [12, 102]. TSQL2 has been shown to be upward compatible with SQL-92 and can be viewed as an extension of SQL-92. However, not all SQL-92 relations can be generated by taking time slices of TSQL2 relations, and not all SQL-92 queries have a counterpart in TSQL2. The completeness and evaluation of TSQL2 are discussed by Bohlen et al. [12].

STL is proposed as a symbolic representation of image-sequence content by Bimbo and Vicario [9]. It also permits intentional ambiguity and detail refinement, which are especially needed in queries. In their prototype system, users can define a query through an iconic interface, which allows for the creation of sample dynamic scenes reproducing the contents of the image sequence to be retrieved. The sample scenes are then automatically translated into STL assertions. The retrieval is carried out by checking the query STL assertions against the descriptions of every image sequence stored in the database. The description of an image sequence defines the object-centered spatial relationship between any pair of objects in every image and is created manually when the sequence is stored in the database.

Hjelsvold et al. [54, 56] define a video query algebra based on their VideoSTAR video data model. The video query algebra allows the user to specify complex queries based on temporal relationships between video stream intervals. Basic algebra operations include
•  Normal set operations, which include AND, OR, and DIFFERENCE;

•  Temporal set operations;

•  Filter operations, which are used to determine the temporal relationships between two stream intervals (as pointed out by Allen [5], there are 13 different relationships between two temporal intervals, such as before and overlaps); a sketch of this classification appears after this list;

•  Annotation operations, which are used to retrieve all annotations that are of a given type and have nonempty intersections with a given input set;

•  Structure operations, which are similar to the above but operate on the structural components; and

•  Mapping operations, which map the elements in a given set onto different contexts that can be basic, primary, or video stream.
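As a concrete illustration of the filter operations, the sketch below classifies a pair of stream intervals into Allen's 13 relations [5]. Representing intervals as (start, end) pairs with start < end is an assumption made for the example; the relation names follow the usual convention.

    # Sketch: classifying two temporal intervals into Allen's 13 relations.
    # Intervals are (start, end) pairs with start < end; frame numbers or
    # timestamps both work. Naming follows Allen [5].
    def allen_relation(a, b):
        (s1, e1), (s2, e2) = a, b
        if e1 < s2:  return "before"
        if e2 < s1:  return "after"          # inverse of before
        if e1 == s2: return "meets"
        if e2 == s1: return "met-by"
        if s1 == s2 and e1 == e2: return "equals"
        if s1 == s2: return "starts" if e1 < e2 else "started-by"
        if e1 == e2: return "finishes" if s1 > s2 else "finished-by"
        if s2 < s1 and e1 < e2: return "during"
        if s1 < s2 and e2 < e1: return "contains"
        return "overlaps" if s1 < s2 else "overlapped-by"

    # Example: a news item at frames 100-200 vs. a shot at frames 150-300.
    print(allen_relation((100, 200), (150, 300)))   # -> "overlaps"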
A graphical user interface (GUI) was developed to assist users in interactively defining queries in terms of the aforementioned video query algebra.

VideoSQL is the video query language used in OVID [80], developed by Oomoto et al. The language gives the user the ability to retrieve video objects that satisfy certain conditions through SELECT-FROM-WHERE clauses. VideoSQL does not, however, contain language expressions for specifying temporal relations between video objects. Another SQL-like video query language is used in the Video Query System [58], which includes operations to

•  determine whether two intervals of video overlap each other;

•  compute the intersection, complement, and union of two video data items; and

•  perform set iteration operations, such as FORALL and FOREACH.
Other Video Query Specifications

Despite its expressive power and formalism, defining and using a video query language can often become very complex and computationally expensive. Some researchers instead combine important features of the video data to formulate and carry out queries. In these cases, the types of queries that can be defined and processed are usually limited.

The MovEase system developed by Ahanger et al. [1] includes motion information as one of the main features of the video data. Motion information, together with other video features (color, shape, object, position, and so on), is used to formulate a user query to the VDB. One or more objects, as well as their motion information (path, speed), can be described through a GUI. In the GUI, objects are represented as a set of thumbnail icon images. Object and camera motions can be specified by using either pre-defined generic terms like pan, zoom, up, and down, or user-defined motion descriptions, such as a zigzag path. The query is then processed and matched against the pre-annotated video data stored in the VDB. Results of the query are displayed as icons. Users can get the meta information, the video segment, or the whole video represented by each icon image by simply clicking on it.
Lee and Kao [69] propose a video indexing mechanism that is based on the concept of video objects and object motion with interactive annotation. The video record is first labeled using an interactive annotation interface. Motion information is extracted and combined with the labeling record to index the video data. Queries include query by video identifier (VID), query by object, and query by track. Different combinations of these primitive queries are also supported. It is not clear, however, how the objects and motion information are specified and input into the system.

Ioka and Kurokawa [59] also use motion information to retrieve sequences of images in a motion image database. The motion vectors of objects are extracted on the basis of block motion estimation, which tracks each block of a scene throughout a period. The motion of each block is represented by a vector in the feature space. Motion vectors in a scene are aggregated into several representative vectors by statistical analysis, which is used to index the image sequences in the database. The distance between two vectors (that is, their similarity) is defined as the mean distance between the two vectors during the overlapping period of their lifetimes. The user query is input using a stroke device. The user can also interactively change the query conditions, such as the number, period, and starting time of the motion, through a GUI. The system then matches the query condition against the motion database index. Query results are a set of motion image sequences in the order of their distances. Experiments show that this method is practical for a limited number of data sets.
Query Processing

After the query is defined, either in terms of some query language or in terms of some features of the video data, query processing takes place. Query processing usually involves query parsing, query evaluation, database index search, and the returning of results. In the query parsing step, the query condition or assertion is usually decomposed into basic units and then evaluated. Unlike in a traditional DBMS, which deals only with textual and numerical data, features of the video data (like color and motion) are usually computed using image-processing algorithms. Following the evaluation of the query, the index structure of the VDB is searched and checked. The video data is retrieved if the assertion is satisfied [9] or if the similarity measurement [59] is maximal. The resulting video data is usually displayed by a GUI in a way convenient to the user (such as iconic images [1]). Some examples of video query processing are as follows.
An on-line object-oriented query processing technique is proposed by Day et al. [34] for the user to query video data. Generalized n-ary operations are used for modeling both spatial and temporal contents of video frames. This enables a unified methodology for handling content-based spatial and spatio-temporal queries. In addition, the work devises a unified object-oriented interface for users with heterogeneous views to specify queries.

The VideoSTAR system developed by Hjelsvold et al. [56] uses a GUI to let users define a query in terms of the video query algebra they propose. After that, the query is parsed and broken down into basic algebra operations. Then, a query plan is determined and may be used to optimize the query before it is computed. Finally, the resulting video objects are retrieved.
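Across these systems, the processing pipeline is broadly the one described above: parse, evaluate features, search the index, and return ranked results. The skeleton below is a toy illustration; every name in it is a placeholder rather than any cited system's interface.

    # Toy skeleton of the common video query-processing pipeline:
    # parse -> evaluate features -> search index -> rank results.
    def process_query(query_conditions, index, evaluate, similar):
        # 1. Parse: the query arrives decomposed into basic conditions.
        # 2. Evaluate: compute the feature value each condition refers to
        #    (color, motion, etc.), typically via image processing.
        targets = [evaluate(cond) for cond in query_conditions]
        # 3. Search: score every indexed entry against the query features.
        scored = []
        for clip_id, features in index.items():
            score = min(similar(t, features) for t in targets)
            scored.append((score, clip_id))
        # 4. Return: results ordered by similarity, best match first.
        return sorted(scored, reverse=True)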
2.5.3
User Interface Design
The user interface in a video data management system plays a crucial role in the overall usability of the system. Video queries are usually very complex in nature and difficult to express in pure text. The ease of query specification depends solely on the design of the GUI; that is, the GUI should enable the user to visually pose and refine different kinds of queries or browse through the database. The GUI also determines how the retrieved video data can be presented to the user. Knowing the importance of the user interface, almost all research in video databases has implemented graphical user interfaces. For example, Ioka and Kurokawa [59] implemented a query interface based on "draw and playback": the user draws the desired motion on the screen by using the stroke device, and the system senses the position of the cursor at specified time intervals. The input motion is then used to retrieve a sequence of images.

Given the spatio-temporal nature of video data, there are a few research projects on the visual representation of such data [17, 78, 108]. Brondmo and Davenport [17] use a micon (moving icon) that displays moving images to represent the video content of a document. Mills et al. [78] propose a hierarchical video magnifier that offers different levels of temporal resolution to the user. Tonomura et al. [108] propose a content-oriented visual interface for a visual database system. The interface includes video icons, which are based on a structured icon model using information about the video; for example, the duration of the video is represented by the thickness of the icon. There are also a few research efforts into interacting with video via graphical user interfaces [78, 113]. However, these have been designed more from the perspective of presenting digital video to users; for example, Weiss et al. [113] assign spatial layout operations to
every video algebraic expression, which determines how its video components are displayed on the screen. However, the entire area of designing video user interfaces, which includes query specification mechanisms and presentation of video query results, has not been systematically studied.
3
OTHER RELATED RESEARCH ISSUES
The success of video database systems is not solely dependent on the digital video database techniques discussed in the last chapter; it also relies on many other issues, several of which are described in this chapter. Video compression, which is very important for video data storage and transmission, is introduced first. This is followed by a description of video server system design and file system support, an issue that is critical to the real-time, quality-of-service-guaranteed applications of video database systems, e.g., video-on-demand systems. Transmission of video data across computer networks poses new challenges to networking technologies; some of the efforts along this line are discussed. Another challenging issue is the protection of the exclusive rights of the owner of digital video data (as well as other digital media) while at the same time allowing legal digital media sharing. Both are crucial to the success and acceptance of digital media sharing systems, like a video database.
3.1
VIDEO DATA COMPRESSION
One of the problems faced in designing and using video databases is the huge data volume of video streams. Table 3.1 shows the data rates of some standard representations of uncompressed digital video, obtained by sampling the corresponding analog video stream at certain frequencies. Clearly, video streams must be compressed to achieve efficient transmission, storage, and manipulation. The bandwidth problem will become even more acute for HDTV, since uncompressed HDTV video can require a data rate of more than 100 MB/sec.
Video Standard                            Image Size   Bytes/Pixel   MB/sec
NTSC square pixel (USA, Japan, etc.) a    640 x 480    3             27.6
PAL square pixel (UK, China, etc.) b      768 x 576    3             33.2
SECAM (France, Russia, etc.) c            625 x 468    3             22.0
CCIR 601 (D2) d                           720 x 486    2             21.0

Table 3.1   Data rate of uncompressed digital video
a. NTSC stands for National Television System Committee; it has image format 4:3, 525 lines, 60 Hz, and 4 MHz video bandwidth with a total 6 MHz of video channel width. NTSC uses YIQ as its color coding system. NTSC-1 was set in 1948; it increased the number of scanning lines from 441 to 525 and replaced AM-modulated sound with FM. The frame rate is 30 frames/sec.

b. PAL stands for Phase Alternating Line and was adopted in 1967. It has image format 4:3, 625 lines, 50 Hz, and 4 MHz video bandwidth with a total 8 MHz of video channel width. PAL uses YUV for color coding. The frame rate is 25 frames/sec.

c. SECAM stands for Séquentiel Couleur à Mémoire; it has image format 4:3, 625 lines, 50 Hz, and 6 MHz video bandwidth with a total 8 MHz of video channel width. The frame rate is 25 frames/sec.

d. CCIR stands for Comité Consultatif International des Radiocommunications, which is part of the United Nations International Telecommunication Union (ITU) and is responsible for making technical recommendations about radio, television, and frequency assignments. The CCIR 601 digital television standard is the base for all the subsampled interchange formats such as SIF, CIF, and QCIF. For NTSC (PAL/SECAM), it is 720 (720) pixels by 243 (288) lines by 60 (50) fields per second, where the fields are interlaced when displayed. The chrominance channels are horizontally subsampled by a factor of two, yielding 360 (360) pixels by 243 (288) lines by 60 (50) fields per second.
3.1.1
Video Compression Standards
Most extant video compression standards use lossy video compression algorithms; in other words, the decompressed result is not totally identical to the original data. Some techniques, for example MPEG and motion JPEG, can reduce the video data rate to around 1 Mbps. Lossy compression algorithms are very suitable for video data since not all information contained in video data is equally important or perceivable to the human eye. For example, it is known that small changes in brightness are more perceivable than changes in color. Thus, compression algorithms can allocate more bits to the luminance information (brightness) than to the chrominance information (color). This leads to lossy algorithms, but human subjects may not be able to see the loss in data.

One important issue in a video compression scheme is the tradeoff between compression ratio and quality of the video. A higher-quality video image implies a smaller compression ratio and results in larger encoded video data. The speed of a compression algorithm is also an important issue: an algorithm may have a large compression ratio but still be unusable in practice due to its high computational complexity, since real-time video requires a decoding speed of about 25 frames/sec. There are many video compression standards (including CCITT/ISO standards, Internet standards, and proprietary standards) that are based on different tradeoff considerations; we outline some of them next.
MPEG

MPEG stands for Moving Pictures Expert Group, which meets under the International Standards Organization (ISO) to generate standards for digital video (sequences of images in time) and audio compression. MPEG video compression is a block-based coding scheme. In particular, the standard defines a compressed bit stream, which implicitly defines a decompressor. However, the choice of compression algorithms is up to the individual manufacturers as long as the bit streams they produce are compliant with the standard; this allows proprietary advantage to be obtained within the scope of a publicly available international standard.

MPEG encoding starts with a relatively low-resolution video sequence (possibly decimated from the original) of about 352 by 240 by 30 frames/sec (352 by 288 by 25 frames/sec in Europe), but with the original high-quality audio. The 352 by 240 by 30 frames/sec format is derived from the CCIR 601 digital television standard that is used by professional digital video equipment. The 30 frames/sec frame rate is derived from the 60 Hz display standards used in the United States. For the 50 Hz display standards (PAL and SECAM), the frame size and frame rate change accordingly.

The basic encoding schema of MPEG is to convert the color video images into YUV space. Y is the luminance channel, and U and V are the two chrominance channels. The UV channels are further decimated to 176 by 120 pixels, based on the fact that human perception is not affected by the loss of resolution in the U and V channels to a certain extent, at least not in "natural" (non-computer-generated) images. Motion is predicted from frame to frame in the temporal direction, and then discrete cosine transforms (DCTs) are used to organize the redundancy in the spatial directions. The current status of MPEG standards is listed in Table 3.2.

Name       Objective                                            Status
MPEG-I     Coding of moving pictures and associated audio       International Standard IS 11172,
           for digital storage media at up to about 1.5 Mbps    completed in 10.92
MPEG-II    Generic coding of moving pictures and associated     International Standard IS 13818,
           audio                                                completed in 11.94
MPEG-III   NA                                                   No longer exists (has been merged
                                                                into MPEG-II)
MPEG-IV    Very low bitrate audiovisual coding                  Verification model under development
                                                                and core experiments being performed

Table 3.2   MPEG family of video standards
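The color-space step at the start of that encoding path is a fixed linear transform. The sketch below uses the common BT.601 luminance weights and classical U/V scale factors; the exact constants are an assumption for illustration, since the text does not specify them.

    # Sketch: RGB -> YUV conversion (BT.601 luminance weights) followed
    # by chrominance decimation, the first steps of the MPEG encoding path.
    import numpy as np

    def rgb_to_yuv(rgb):
        """rgb: H x W x 3 float array in [0, 1]. Returns Y, U, V planes."""
        r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
        y = 0.299 * r + 0.587 * g + 0.114 * b      # luminance
        u = 0.492 * (b - y)                        # chrominance (blue)
        v = 0.877 * (r - y)                        # chrominance (red)
        return y, u, v

    def decimate(plane, factor=2):
        """Drop-sample a chrominance plane by `factor` in both directions."""
        return plane[::factor, ::factor]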
MPEG-I is the Committee Draft of MPEG phase I, completed in January 1992. It defines a bit stream for compressed video and audio optimized to fit into a bandwidth (data rate) of 1.5 Mbps, which is the data rate of (uncompressed) audio CDs and DATs. The video stream takes about 1.15 Mbps, and the remaining bandwidth is used by the audio and system data streams. The draft is in three parts: video, audio, and systems. The data in the system stream provides information for the integration of the audio and video streams with the proper time stamping to allow synchronization. The quality of MPEG-I encoded video is said to be about that of a VHS video recording. MPEG-I video is strictly progressive, i.e., non-interlaced.

MPEG-II's task was to define a bit stream for video and audio coded at around 3 to 10 Mbps. Based on subjective testing at the ISO MPEG Japan meeting
in November 1991, it seems that 4 Mbps can give very good quality compared to the original CCIR 601 material. The objective of phase II is to define a bit stream optimized for these resolutions and bit rates. MPEG-II will compress full-motion video (CCIR 601) in broadcast television and video-on-demand (VOD) applications. Its main advantages over MPEG-I are

•  A better compression ratio for interlaced images;

•  A transmission stream suitable for a computer network;

•  The ability to support a wide range of picture frames;

•  A wide bandwidth of data streams;

•  The freedom to select the aspect ratio; and

•  High Definition TV (HDTV) encoding at bitrates between 20 and 40 Mbps, which was the goal of MPEG-III.
MPEG-II also allows progressive video sequences, and its decoder can decode MPEG-I video streams as well.
MPEG-IV is associated with narrowband channels, like mobile networks and POTS (the plain old telephone system), which use small frame sizes and require slow refreshing. The common data rate that will be offered is 4 to 64 Kbps.

MPEG provides very good compression but requires expensive computations to decompress the video data before display. Currently, almost all systems on the Internet use software decompression, which greatly limits the frame rate that can be achieved. It is widely believed that MPEG will soon become the video standard for home and industry applications. Indeed, almost every major multimedia content developer is developing or porting existing titles to MPEG. For example, Philips, which is one of the big forces behind MPEG titles, produced hundreds of MPEG CDs using their CD-i technology in 1995. Computer companies, including Packard Bell, Dell, and IBM, have announced plans to support MPEG in 1996, and Apple has intimated that MPEG chips will play a role in the Macintosh's future. Microsoft also announced that the next release of Windows will include Mediamatics' software-only MPEG-1 playback engine. In addition, Sega's Saturn and Matsushita's upcoming console based on 3DO's M2 chipset will include hardware MPEG support in 1996, as, no doubt, will Sony's PlayStation and Atari's Jaguar [90].
JPEG and MJPEG

JPEG is a standardized image compression mechanism. JPEG stands for Joint Photographic Experts Group, the original name of the committee that wrote the standard. It is designed for compressing either full-color or gray-scale images of natural, real-world scenes. It works well on photographs, naturalistic artwork, and similar images; however, it does not work so well on lettering, simple cartoons, or line drawings. JPEG handles only still images, and it is lossy (there are lossless image compression algorithms, but JPEG achieves a much greater compression ratio than these lossless methods). It is designed to exploit known limitations of the human eye, notably the fact that small color changes are perceived less accurately than small changes in brightness, on which MPEG is also based. Thus, JPEG is intended for compressing images that will be viewed by human beings. If images need to be machine-analyzed, the small errors introduced by JPEG may pose a problem, even if they are invisible to the human eye.

A useful property of JPEG is that the degree of loss of information can be varied by adjusting compression parameters. This means that the image maker can trade off file size against output image quality. One can make extremely small files if one doesn't mind poor quality; this is useful for applications such as indexing image or video databases. Conversely, one can increase the quality with less compression if one is not satisfied with the output quality at the default compression setting. Another important aspect of JPEG is that decoders can trade off decoding speed against image quality by using fast, though inaccurate, approximations to the required calculations.

MJPEG stands for Motion JPEG. Contrary to popular perception, there is no MJPEG standard. Various vendors have applied the JPEG compression algorithm to the individual frames of a video sequence and have called the compressed video MJPEG, but the results are not compatible across vendors. Compared with the MPEG standard, the advantages of MJPEG are

•  Frame-based encoding, which is good for accurate video editing;

•  A fairly uniform bitrate; and

•  Simpler compression, e.g., no cross-frame encoding, which requires less computation and can be done in real time.
The disadvantage is that there is no inter-frame compression; thus its compression ratio is poorer than that of MPEG (about three times worse).
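The quality/size tradeoff described above for JPEG is easy to observe with any encoder. The sketch below uses the Pillow library and a hypothetical input file purely as an illustration.

    # Sketch: trading JPEG file size against quality, as described above.
    # Uses the Pillow library (PIL) as an illustrative encoder; the input
    # filename is hypothetical.
    from io import BytesIO
    from PIL import Image

    image = Image.open("frame.ppm")
    for quality in (10, 50, 90):
        buf = BytesIO()
        image.save(buf, format="JPEG", quality=quality)
        print(f"quality={quality}: {buf.tell()} bytes")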
H.261

H.261 is the most widely used international video compression standard for video conferencing. The standard describes the video coding and decoding methods for the moving picture component of an audiovisual service at rates up to 2 Mbps, which are multiples (1 to 30) of 64 Kbps. The standard is suitable for applications using circuit-switched (phone) networks as their transmission channels. This is understandable, as ISDN with both basic and primary rate access was the communication channel considered within the framework of the standard. H.261 is usually used in conjunction with other control and framing standards, such as H.221, H.230, H.242, and H.320, for communications and conference control. It is an ITU (formerly CCITT) standard for video telephony aimed at ISDN (which is also one of its limitations) and has many hardware and software implementations (like PC video cards). It is said that at 2 Mbps, it approximates entertainment-quality (VHS) video. The actual encoding algorithm of H.261 is similar to (but incompatible with) that of MPEG. Another difference is that H.261 needs substantially less CPU power for real-time encoding than MPEG. The algorithm includes a mechanism that optimizes bandwidth usage by trading picture quality against motion, so that a quickly changing picture will have a lower quality than a relatively static picture. H.261 used in this way provides constant-bit-rate encoding rather than constant-quality, variable-bit-rate encoding.
MHEG

MHEG (Multimedia and Hypermedia information coding Experts Group) is a draft ISO standard for representing hypermedia applications in a platform-independent form. It uses an object-oriented approach and is optimized for run-time efficiency. The standard has been published in two parts: object representations and hyperlinking. MHEG is suited to interactive hypermedia applications such as on-line textbooks and encyclopedias. It is also suited for many of the interactive multimedia applications currently available (in platform-specific form) on CD-ROMs. MHEG could, for instance, be used as the data-structuring standard for a future home entertainment interactive multimedia appliance. To address such markets, MHEG represents objects in a non-revisable form and is, therefore, more suitable as an output format than as an input format for hypermedia authoring applications. MHEG is not a multimedia document processing format; instead, it provides rules for the structure of multimedia objects that permit the objects to be represented in a convenient form; for example, video objects could be MPEG-encoded. It uses ASN.1 as a base syntax to represent object structure, but also allows the
use of other syntax notations, like the SGML syntax. Other parts of the standard are being developed. Among them, Part 3 will specify a set of extensions for script object interchange. Part 5 will specify the MHEG subset for base-level implementations such as those used for VOD and home shopping services. Part 6 will identify support levels required for interactive television and related applications.
DVI

DVI is Intel's Digital Video Interactive compression scheme, which is based on a region encoding technique. Each picture is divided into regions that, in turn, are split into subregions recursively, until the regions can be mapped onto basic shapes that fit the required bandwidth and quality. The chosen shapes can be reproduced accurately by the decoder. The data sent is a description of the region tree and of the shapes at its leaves. This is an asymmetric coding, which requires a large amount of processing for encoding and less for decoding. DVI, though not a standard, plays an important role in the market and on the Internet (one can find many movies and video files in DVI format on the WWW). However, given that Sun aborted its plan to use DVI compression in the new generation of Sun VideoPix cards and that Intel has also canceled the development of the V3 DVI chips, the future of DVI is now uncertain.
QuickTime

Like DVI, QuickTime is not one of the CCITT/ISO standards. It is Apple's cross-platform file format for the storage and interchange of sequenced data. A QuickTime movie contains time-based data that may represent sound, video, or other time-sequenced information. A QuickTime movie is constructed of one or more tracks, with each track being a single data stream. Movie resources are built up from basic units called atoms, which describe the format, size, and content of the movie storage element. It is possible to nest atoms recursively within "container" atoms. One type of container atom is the "movie" atom, which defines the timescale, duration, and display characteristics for the entire movie file. It also contains one or more track atoms for the movie. A track atom defines a single track of a movie and is independent of any other tracks in the movie, carrying its own temporal and spatial information. Track atoms contain status information relating to the creation or editing of the track, priority in relation to other tracks, and display and masking characteristics. They also contain media atoms that define the data for a track. Media atoms contain information relating to the type of data (such as sound, animation, and
text), and information relating to the QuickTime system component that is to handle the data. Component-specific information is contained in a media information atom, which is used to map media time and media data. There are many other atom types that define a wide variety of features and functions, including a TEXT media atom that allows displayed text to change with time, and user-defined data atoms called derived media types for other purposes. These allow for the custom handling of data by overriding the media handler with a user-supplied driver. The actual movie data referred to by the movie resources may reside in the same file as the movie resource (a "self-contained" movie) or, more commonly, in another file or on an external device. QuickTime has the potential of becoming a computer-industry standard for the interchange of video and audio sequences. CNN, for example, uses the QuickTime format to provide daily news clips on its World Wide Web site (http://www.cnn.com/video_vault/index.html).
3.1.2
Other Video Data Compression Algorithms
Besides the existing video compression standards, there are other video data compression algorithms that are used for specific purposes. An example is subband video coding [115]. Subband video coding algorithms use multistage quadrature mirror filtering (QMF) to split the input spectrum into low-band and high-band halves. In each stage, high- and low-pass filterings are performed, followed by downsamplings in the target dimension, to produce two output components [20]. The corresponding decompression algorithms use stages of upsampling and high- and low-pass filterings. These techniques can be used to achieve both spatial and temporal video compression by defining the target dimensions. The compression ratio can be increased by stage cascading [20], and data rates comparable to those of other algorithms, such as MPEG, can be achieved. An important feature of subband video coding algorithms is that they can be used to produce video streams with different spatial and temporal resolutions. Keeton and Katz [67] make use of this feature to produce multiresolution video data stored on high-bandwidth, high-capacity storage arrays, and use it to satisfy multiuser video requests with varying quality-of-service (QOS) demands.
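A single analysis/synthesis stage of such a two-band split can be sketched with Haar filters, a deliberately crude stand-in for the QMF banks used in the cited work. Applied along the temporal axis of a frame sequence, the low band is a half-rate approximation and the high band carries the detail.

    # Sketch: one stage of a two-band subband split using Haar filters,
    # a crude stand-in for QMF banks. Applied to a 1-D signal here; the
    # same step along rows/columns of frames gives spatial subbands.
    import numpy as np

    def analyze(signal):
        """Split into low and high bands, each downsampled by 2."""
        even, odd = signal[0::2], signal[1::2]
        low = (even + odd) / 2.0          # low-pass + downsample
        high = (even - odd) / 2.0         # high-pass + downsample
        return low, high

    def synthesize(low, high):
        """Perfectly reconstruct the original signal from the two bands."""
        out = np.empty(2 * len(low))
        out[0::2] = low + high            # recovers the even samples
        out[1::2] = low - high            # recovers the odd samples
        return out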
3.2
MEDIA SERVER DESIGN AND FILE SYSTEM SUPPORT
3.2.1
Introduction
A media server system is used to deliver reliable storage and real-time, continuous playback of video and audio streams from its storage. The structure of a generic media server system is illustrated in Figure 3.1.

[Figure 3.1: The structure of a media server system — tertiary storage and a disk array inside the media server system, delivering streams to the users' display equipment.]

Tertiary storage, such as tape systems, is used to store hundreds or thousands of gigabytes, or even terabytes, of video data. However, it usually does not provide sufficiently low latency or sufficiently high concurrency of access. Disk arrays are usually used to store the video data retrieved from tertiary storage and to deliver the video data on users' requests. It is not uncommon for a disk array to contain on the order of 1,000 disks. Assuming the capacity of one disk to be 1 gigabyte and the transfer bandwidth to be 4 MB/sec, a 1,000-disk system is large enough to store 300 MPEG-II movies of 90 minutes each and support 6,500 concurrent MPEG-II movie users [8]. In order to deliver smooth, continuous, and real-time media streams, a RAM buffer is often used to cache the popular portions of videos. The media streams are delivered on request through a high-speed network (ATM, etc.) to the corresponding users, and the contents are displayed on
the end users' video display equipment, which can range from HDTV to a PC screen. A continuous media server is needed to support real-time, continuous video and audio streams for video database systems and their applications. An important application is the video-on-demand (VOD) system, which has attracted a lot of attention from the entertainment, telecommunication, and computer industries, as well as from academic research groups. Designing a media (video, audio, etc.) file server system is challenging because:
•  Traditional file servers like NFS are based on a best-effort policy and thus are not suitable as video file servers, since they were designed for small data files whose usage, requirements, and semantics are fundamentally different from those of real-time, continuous media data.

•  Such a system must deal with multiple classes of tasks with diverse performance requirements. These tasks include:

   -  Continuous real-time media streams (video, audio) that are periodic and time sensitive. They require performance guarantees for high throughput, bounded delay, and low jitter, and usually have the highest priorities in the media server system;

   -  Other real-time tasks, such as device polling and communication stacks; and

   -  Non-real-time, unsolicited requests from the network (NFS reads/writes, background management commands, etc.), characterized by stochastic arrivals, tolerant of bandwidth and delay variations, but requiring an adequate response time.

•  Such a system must satisfy the real-time requirement of continuous media data delivery at a certain bandwidth.

•  Such a system must deal with multiple user requests simultaneously.

•  Such a system needs to support VCR-like operations, such as fast forward and rewind.

•  Such a system must be cost effective, since cost will be a determining factor in the success of video database technology and its various applications (VOD, for example).

•  Such a system needs to consider the scheduling of storage devices, along with the scheduling of other system resources, to achieve high CPU and storage utilization.
3.2.2
Real-Time Disk Scheduling and Admission Control
We now discuss some disk scheduling and admission control algorithms for guaranteed real-time storage access. A common approach to real-time disk scheduling is to retrieve disk blocks for each stream in a round robin and to keep the size of each block proportional to the stream's playback rate; this is known as quality proportional multisubscriber servicing (QPMS) [111], rate conversion [75], or the period transformation technique [26]. Other research work includes the following.

The elevator disk scheduling algorithm described by Silberschatz et al. [94] scans the disk cylinders from the innermost to the outermost and then scans backward. This algorithm is widely used because of its nearly minimal seek time and its fairness. Taking the priorities of requests into consideration, the elevator disk scheduling algorithm can easily be extended to real-time disk scheduling: priorities can be determined based on factors such as tasks' deadlines, and requests can be grouped into different priority classes. Each disk access request is assigned a priority, and the highest priority class with pending disk accesses is serviced using the elevator algorithm (a minimal sketch of this policy appears below).

Yu et al. [122] propose a disk scheduling algorithm called the group sweeping scheme (GSS) to minimize both the disk access time and the buffer space. This algorithm assigns each request to a group, and the set of groups is processed in round robin fashion. The elevator scheduling algorithm is used within each group. GSS is thus a mixture of the elevator and round robin algorithms, and its behavior can be adjusted by changing the group size and the number of groups: it approximates the elevator algorithm as the number of groups decreases, and the round robin algorithm as the number of groups increases.
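In the sketch below, requests are (priority, cylinder) pairs — an illustrative representation — and the highest pending priority class is served in one scan plus the sweep back:

    # Sketch: elevator (SCAN) scheduling with priority classes, as
    # described above. Each request is (priority, cylinder); the highest
    # pending priority class (0 = highest) is serviced in elevator order.
    def elevator_schedule(requests, head, direction=1):
        """Return the cylinder service order for one pass."""
        top = min(p for p, _ in requests)
        cyls = sorted(c for p, c in requests if p == top)
        if direction > 0:                  # sweep outward, then back
            ahead = [c for c in cyls if c >= head]
            behind = [c for c in cyls if c < head]
            return ahead + behind[::-1]
        behind = [c for c in cyls if c <= head]
        ahead = [c for c in cyls if c > head]
        return behind[::-1] + ahead

    # Example: head at cylinder 50, two priority classes pending.
    reqs = [(0, 70), (0, 30), (0, 55), (1, 10)]
    print(elevator_schedule(reqs, head=50))    # -> [55, 70, 30]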
A prefetching disk scheduling algorithm can also be extended for real-time disk scheduling to reduce the memory requirement of the media server. Freedman and DeWitt [40] describe love page prefetching and delayed prefetching algorithms and use them in their SPIFFI video-on-demand system. Love page prefetching is a buffer pool page replacement algorithm that extends the global LRU algorithm [94] by distinguishing prefetched pages from referenced pages. It makes use of the fact that video data is usually accessed strictly sequentially (for example, when watching a movie), so the probability of a data block in
the RAM buffer being referenced again is not high [83]. It uses two LRU chains: one for referenced pages and one for prefetched pages. When a new page is needed, the referenced page chain is searched first, and a page from the prefetched chain is taken if there are no available pages in the referenced page chain. Delayed prefetching delays the data prefetching until the last minute, thus reducing the size of the RAM buffer needed to store the prefetched video data.

Reddy and Wyllie [89] simulated a real-time I/O subsystem and compared the performance of three disk scheduling algorithms: elevator, earliest deadline first (EDF), and their combinations. Haritsa and Karthikeyan [51] present analytical and simulation studies of the memory requirements of the elevator and first-come-first-served (FCFS) disk scheduling algorithms.

Ramakrishnan et al. [87] describe how to implement a video-on-demand server. For each stream, the video file server maintains two buffers: a disk buffer and a network buffer. A network request empties the network buffer as a disk task fills up the disk buffer; when the network buffer is empty, the two buffers exchange their roles. The goal is to have the disk buffer fill-up rate and the network buffer empty rate equal the stream's playback rate. Their scheduling and admission control algorithm supports multiple classes of tasks with different performance requirements and can deal with the coexistence of guaranteed real-time requests, as well as sporadic and unsolicited requests. The algorithm is based on round robin. A new video stream request is granted if the following conditions can be satisfied:

-  The disk buffer fill-up rate must be no less than the network buffer empty rate;

-  Enough space must be provided for allocating network and disk buffers to the new stream; and

-  The disk service time must not exceed the minimum tolerable request latency.

A certain bandwidth is permanently reserved for non-real-time requests. The algorithm can be adapted to variations in frame sizes. A sketch of this admission test follows.
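In sketch form (the parameter names and the way aggregate disk bandwidth stands in for the buffer fill-up rate are simplifying assumptions, not the exact formulation in [87]):

    # Sketch: admission test in the spirit of the three conditions above.
    # All rates are in MB/sec; the aggregation is a simplifying assumption.
    def admit_stream(new_rate, active_rates, disk_bandwidth,
                     reserved_non_rt, free_buffer, buffer_per_stream,
                     disk_service_time, max_tolerable_latency):
        # 1. Disk buffer fill-up rate >= network buffer empty rate:
        #    the disk side must keep up with the aggregate playback rates.
        rate_ok = (sum(active_rates) + new_rate
                   <= disk_bandwidth - reserved_non_rt)
        # 2. Space for one disk buffer and one network buffer.
        space_ok = free_buffer >= 2 * buffer_per_stream
        # 3. Disk service time within the tolerable request latency.
        latency_ok = disk_service_time <= max_tolerable_latency
        return rate_ok and space_ok and latency_ok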
3.2.3
CPU Admission and Scheduling Algorithms
The purpose of CPU admission control and scheduling algorithms is to ensure that a feasible schedule exists for all the admitted tasks. In the design of a VOD file service system, Ramakrishnan et al. [87] use a time-based admission test for isochronous (periodic) tasks, and a time-based and tolerable-delay-constraints admission policy for other real-time tasks. Isochronous tasks are periodic network transmissions of video and audio data. These tasks need performance guarantees, that is, throughput, bounded latency, and low jitter. Their priorities are determined on a rate-monotonic basis [71], i.e., a task with a higher frequency has a higher priority. A preemptive fixed-priority scheduling algorithm is used for isochronous tasks. Other real-time and non-real-time tasks are scheduled using weighted round robin, which can be preempted by isochronous tasks. General-purpose tasks have the lowest priorities but are given a minimum CPU quantum to avoid starvation.
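The rate-monotonic rule, plus the classical Liu and Layland utilization bound [71] as a sufficient schedulability test, can be sketched as follows; the example task parameters are made up for illustration.

    # Sketch: rate-monotonic priority assignment with the Liu-Layland
    # schedulability bound [71]. Tasks are (compute_time, period) pairs;
    # a shorter period (higher frequency) means a higher priority.
    def rate_monotonic(tasks):
        by_priority = sorted(tasks, key=lambda t: t[1])  # shortest period first
        n = len(tasks)
        utilization = sum(c / p for c, p in tasks)
        bound = n * (2 ** (1 / n) - 1)    # sufficient (not necessary) test
        return by_priority, utilization <= bound

    # Example: two isochronous streams at roughly 30 Hz and 25 Hz.
    tasks = [(5.0, 33.3), (8.0, 40.0)]    # ms of CPU per period, period in ms
    print(rate_monotonic(tasks))          # utilization 0.35 <= bound 0.83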
3.2.4
Video Data Storage Strategies
Clearly, an important aspect of video database design is the video data placement policy, that is, how the video data is placed on the storage devices. Video data needs to be stored continuously and retrieved sequentially. Real-time video playback also imposes strict delay and delay-variance requirements on the retrieval of the data. Traditional file system data layout strategies take neither the characteristics of video data nor these requirements into account. What makes a video data layout policy more challenging are the following facts:
•  Real-time video playback imposes strict delay and delay-variance requirements on the retrieval of the data.

•  The video server should maximize the number of concurrent user requests without affecting the system performance.

•  The bandwidth and space parameters of storage devices in the system may vary.
•  The video server should be able to support clients with vastly different quality-of-service (QOS) parameters, such as display size, resolution, and frame rate.

Rangan and Vin [88] use a constrained allocation policy for continuous media data blocks to ensure the continuous retrieval of the data. Lougher and Shepherd [75] exploit the temporal locality of the video data by storing it in an append-only log. Both approaches develop admission algorithms that admit a new user request only if it satisfies the real-time constraints. They primarily focus on the issue of single-disk systems or very small disk arrays.

Ozden et al. [83] propose a hierarchical storage structure for a movie-on-demand (MOD) server. It uses disks to store the popular movies and a small amount of RAM cache to store portions of movies. The storage allocation scheme is called phase-constrained; it eliminates seeks to random locations and buffers only the movie portions currently being viewed in the RAM. In this way, the overall cost of the server is reduced (RAM is expensive), and a large number of different parts of movies can be viewed simultaneously. They also describe how to support VCR operations on the server storage structure.

Dan and Sitaram [28] propose a dynamic video data placement policy called the Bandwidth to Space Ratio (BSR) policy. It creates and/or deletes replicas of a video and mixes popular and unpopular videos so as to make the best use of the bandwidth and space of a storage device. The policy is evaluated using a simulation study. The authors address the following problems:
•  The access frequencies of video objects differ, and the bandwidth of a single storage device is limited. Popular video objects may need to be replicated and put on many different storage devices to satisfy many simultaneous user requests.

•  Video data vary in size, from long movies to short video clips, and the access frequency is independent of the data size.

•  Storage devices in the server system vary in terms of their bandwidth and capacity.
The question is how to place the video data so that the device load is balanced and thus maximum utilization of both bandwidth and space is achieved. The Bandwidth to Space Ratio (BSR) policy characterizes each storage device by its bandwidth to space ratio, and each video object by the ratio of its required bandwidth
to the space needed to store it. The policy then dynamically determines how many copies are needed and on which storage devices to put them according to changes in user demand. Another related work is the Dynamic Segment Replication (DSR) policy proposed in [27], which uses partial replication of video objects to balance the load of a multimedia server system. This policy is based on the observation that a group of consecutive requests for a popular video object can share the partial replica of the video object generated by the previous request on the same video.

Some researchers take the approach of integrating video image coding and video data layout in video data storage systems. One example of such an approach is the multi-resolution video coding scheme based on Gaussian and Laplacian pyramids proposed by Chieuh and Katz [23]. Keeton and Katz [67] propose another approach, which uses scalable compression algorithms to generate multi-resolution video data. These video data are stored on high-bandwidth, high-capacity disk arrays using one of the two video data layout strategies they propose. The advantages of their approach are that

•  it can satisfy multi-user requests with different quality-of-service (QOS) parameters without wasting too many server and client resources on dynamically adjusting the QOS parameters; and

•  the server can switch the representations of some streams to respond to an overload situation or to accommodate more requests if needed.

Their simulation results showed that a video server storing multi-resolution video data satisfies more user requests than one storing single-resolution video data.
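Returning to the BSR policy above, its core matching rule can be sketched as a greedy placement: characterize devices and videos by their bandwidth-to-space ratios and put each video where the ratios match best. This is an illustrative simplification of the policy in [28], which also creates and deletes replicas dynamically.

    # Sketch: greedy placement by bandwidth-to-space ratio (BSR), an
    # illustrative simplification of the policy in [28]. Devices and
    # videos are dicts with "bandwidth" (MB/sec) and "space" (MB) fields.
    def bsr(bandwidth, space):
        return bandwidth / space

    def place(video, devices):
        """Put `video` on the feasible device whose remaining BSR is
        closest to the video's own BSR, balancing both resources."""
        target = bsr(video["bandwidth"], video["space"])
        feasible = [d for d in devices
                    if d["bandwidth"] >= video["bandwidth"]
                    and d["space"] >= video["space"]]
        if not feasible:
            return None                   # would trigger replication logic
        best = min(feasible,
                   key=lambda d: abs(bsr(d["bandwidth"], d["space"]) - target))
        best["bandwidth"] -= video["bandwidth"]
        best["space"] -= video["space"]
        return best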
3.2.5
Disk Failure Tolerance
A VDB system needs to be able to tolerate failures of the disks in the media server, especially in applications like VOD or MOD, because real-time, continuous media streams require media with very high availability and reliability. Disk failures result in a disruption of the video streams' delivery: the failed disks have to be replaced, and the video data has to be copied from tertiary storage. One may point out that disk reliability is increasing with technological advances. However, although a single disk may be very reliable, a large disk array used in a media server system may have an unacceptably high failure
probability. As pointed out by Berson et al. [8], if the mean time to failure (MTTF) of a single disk is on the order of 300,000 hours, the MTTF of a disk array system that consists of 1,000 disks will be 300 hours. There are two kinds of serious disk failures [8]:
Catastrophic failure can happen, for example, when, in the parity schema [84], two disks in the same parity group fail. In this case, the video data cannot be reconstructed on the fly and needs to be copied from tertiary storage. Degradation of service occurs when the system cannot answer all active requests due to insufficient bandwidth caused by a disk failure. It is necessary to sacrifice some of the disk space and bandwidth to improve the reliability and availability of the media server system. Usually, several parity [8, 84] and mirroring [11] schemas can be used.
Streaming RAID Schema

In the streaming RAID schema [107], disks are grouped into disk clusters of fixed size C, of which C - 1 disks are used to store the actual data; one disk is used to store the parity of the actual data blocks. In case of disk failure, data can be reconstructed using the parity computation, up to one disk failure per cluster. For a 1,000-disk array with cluster size 10, assuming that the MTTF of each disk is 300,000 hours, the mean time until a catastrophic system failure can be computed as 1,100 years [8]. The disadvantages of the streaming RAID schema are that (1) the RAM size per disk is linear in the cluster size, and (2) a certain amount of the bandwidth is permanently reserved for parity data and is thus wasted when there is no disk failure.
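The parity mechanism underlying these schemas is a bytewise XOR across the C - 1 data blocks of a cluster, so any single lost block can be rebuilt from the survivors. A minimal sketch:

    # Sketch: parity computation and single-failure reconstruction for a
    # streaming-RAID cluster. Blocks are equal-length byte strings.
    def parity(blocks):
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, byte in enumerate(block):
                out[i] ^= byte
        return bytes(out)

    data = [b"frame-00", b"frame-01", b"frame-02"]   # the C - 1 data blocks
    p = parity(data)
    # Disk holding data[1] fails: rebuild it from the survivors + parity.
    rebuilt = parity([data[0], data[2], p])
    assert rebuilt == data[1]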
Staggered-Group Schema

In the staggered-group schema [107], the data layout and fault tolerance are exactly the same as in the streaming RAID schema. The main difference is that the data read for a stream in one cycle can be delivered over the next n cycles rather than in the next cycle, as in the streaming RAID schema. Each stream is assigned a different reading cycle, and memory usage is overlapped. In this way, the RAM buffer size can be reduced approximately by half. The tradeoff is that some of the bandwidth will be lost. To support fault tolerance in case of disk failure, an
entire parity group will be read during the reading cycle of a particular object. The reasons for the disk bandwidth utilization loss are shorter cycles and fewer requests per disk per cycle compared to those of the streaming RAID schema.
Non-Clustered Schema

Both the streaming RAID schema and the staggered-group schema read an entire parity group in each reading cycle for each active stream; this requires a large amount of RAM to store the data until it is transmitted. The non-clustered schema [8] improves the utilization of the bandwidth and the RAM buffer by reading only the actual data when the system is in normal operation mode; when there is a disk failure, the cluster switches to a degraded mode in which the actual data is read along with the parity data. The drawback of this schema is that during the short transition period, some data will not be delivered. However, such degradation is rare. The mean time to a catastrophic failure is the same as for the above two schemas.
3.3
NETWORK SUPPORT FOR THE VIDEO APPLICATIONS

3.3.1
Current Network Technology
Current network technologies typically support local area networks (LANs) that operate at 10 to 100 Mbps (e.g., Ethernet or token ring). Because 45 Mbps T3 or DS3 links are still expensive, most long-haul networks are constructed from up to six ISDN links, which together give 384 Kbps, or from T1 (DS1) links that run at 1.544 Mbps. A 100 Mbps LAN (fast Ethernet) can handle a small number of users sending and receiving 3 Mbps video streams, but a 1.5 Mbps WAN cannot support even a single user. The total demand of video application users can easily overwhelm the capacities of current networks. Obviously, a dramatic increase in network capacity is needed, especially if WAN-based VOD-like applications are to be supported.

The most widely used network protocols today are Novell IPX and TCP/IP. Both protocols support unreliable datagram and reliable virtual circuit services. There is no guarantee on how long end-to-end datagram delivery takes, since these networks were designed to share the network bandwidth between
the users; as the network load goes up, service for everyone degrades, with considerable transmission time delays, for example.
3.3.2
Requirements and Challenges of Networked Video Applications
Current LANs were designed to deal mainly with non-real-time text and image data objects. Video and audio data are fundamentally different from the kind of data that typically travels across a LAN, that is, word processing, database, or spreadsheet data. Video and audio datagrams must be delivered in a real-time, continuous, guaranteed way; any delay will make the service unacceptable to the users. Video and audio data objects are huge in size and require high network transmission speeds; even compressed video such as Intel's Digital Video Interactive (DVI) requires a 1 to 2 Mbps transmission rate. When the bandwidth is unavailable for a user request, there are only two choices: either reject the request and tell the user to wait and request again, or reduce the quality-of-service (QOS) (reduce the frame size, drop some frames, etc.) to accommodate the request. Compressed video streams are variable bit rate (VBR) and sometimes have a high degree of burstiness.
3.3.3 Network Technologies for Video Applications
Many researchers are working together with industry and the government on the design and development of high-speed, high-capacity network technologies and real-time network protocols that can be used for continuous media applications.
The Gigabit Testbed Initiative

The Gigabit Testbed Initiative [25] is a collaborative effort of industry, academia, and the federal government. Work on five US testbeds began in 1990 with funding from the National Science Foundation (NSF), the Advanced Research Projects Agency (ARPA), and industry. The testbeds are known as Aurora, Blanca, Casa, Nectar, and Vistanet. Since the establishment of the original testbeds, others, such as the Magic wide area testbed, have also been created. The gigabit testbed project has two major goals. The first goal is to develop architectural alternatives for the possible structure of a wide-area gigabit network serving the research and education communities. The second goal is to understand the utility of gigabit networks to the end user.

The Aurora testbed is exploring two different technological approaches to the development of gigabit switching systems. The first is based on Asynchronous Transfer Mode (ATM), which uses small, fixed-size data elements and is one of the current proposals within the carrier standards community for the next generation of network switching technology. The second approach is called Packet Transfer Mode (PTM), which is based on variable-sized packets and is a method being pursued within a segment of the data communications industry. Each approach has its advantages, and these and other options may coexist in the future national wide-area network. The project also addresses the various issues associated with interoperability between these two switching technologies, as well as the development of higher-layer protocols and application service models.

The Blanca testbed is developing and testing new switching machines that can carry a mixture of isochronous and asynchronous traffic in a manner consistent with the emerging architecture of Broadband ISDN. They can handle data communication at speeds of up to 500 Mbps, and several switches can be combined to achieve multigigabit-per-second aggregate throughput. The new network switch system will be applied to applications such as NCSA's multimedia digital library project.

The Nectar testbed is a system for interconnecting heterogeneous computing resources via fiber-optic links, large crossbar switches (called HUBs), and dedicated network coprocessors (called CABs). The current Nectar prototype uses 100 megabits per second links; the next version of Nectar will use 1 gigabit per second or higher speed links.

The Vistanet testbed includes network technology that provides a minimum of 622 Mbps, with 2.488 Gbps trunks used between switches. The project will study the issues of protocols, performance analysis, and switching technologies needed to support multiple-service-class gigabit networks.
Real-Time Network Protocol and Service Quality Guarantee

The Tenet Group at Berkeley has developed a real-time protocol suite that supports a datagram service with quality-of-service (QOS) guarantees [6]. Through a combination of admission control policies [38] and resource reservations, this protocol can guarantee a statistical level of service. This is achieved by combining several network techniques the group has developed. Some examples are:

• A new approach to end-to-end statistical guarantees in packet-switching networks, obtained by modeling a traffic source with a family of interval-dependent bounding random variables and by using a rate-controlled service discipline inside the network [124, 125, 126].

• A traffic model called Deterministic Bounding Interval Dependent (DBIND) [68], proposed for determining network service; when combined with tight analysis techniques for deriving delay bounds, it can potentially achieve high network utilization for bursty video traffic.

• Renegotiated Deterministic Variable Bit Rate (RED-VBR) [126], which allows more graceful, client-controlled QOS degradation during overload periods.
Based on this protocol suite, for example, a client can open a connection to a video file server (VFS) and be guaranteed that 99.99% of the packets will be received on time. This protocol, also called real-time IP (RTIP), has been implemented and tested on a variety of networks (the Blanca testbed project, Sequoia 2000, and BAGNet, the Bay Area Gigabit Network) and has been ported to a number of operating systems. The source code of protocol suite 1 has been available since 1995 at the Berkeley Computer Science Department FTP site. Protocol suite 2 is currently under implementation and testing.
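The flavor of such guarantees can be shown with the classical deterministic (sigma, rho) token-bucket bound rather than the interval-dependent DBIND model itself: if a source never sends more than sigma + rho*t bits in any interval of length t, and the network reserves a service rate R >= rho for it, queueing delay is bounded by sigma/R. A sketch with hypothetical numbers:

```python
# Classical (sigma, rho) delay bound, shown only to illustrate the style of
# guarantee; it is *not* the DBIND or RED-VBR machinery of the Tenet suite.

def delay_bound_ms(sigma_bits: float, rho_bps: float, reserved_bps: float) -> float:
    """Worst-case queueing delay for a (sigma, rho)-bounded source served
    at a reserved rate R >= rho: D <= sigma / R."""
    assert reserved_bps >= rho_bps, "reservation must cover the mean rate"
    return 1000.0 * sigma_bits / reserved_bps

# A bursty MPEG source: 2 Mbit burst allowance, 3 Mbps sustained rate.
# Reserving 6 Mbps bounds the queueing delay at about 333 ms.
print(f"{delay_bound_ms(2e6, 3e6, 6e6):.0f} ms")
```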
Other Efforts

Next-generation real-time network protocols are also under construction to support real-time multicast channels, advance reservation of real-time channels, virtual real-time networks, dynamic channel management, and fault recovery. The client-server model has been found useful in building networked media servers. The ISO/IEC JTC1/SC29/Work Group 11 is working on Digital Storage Media Command and Control (DSM-CC), an application protocol intended to provide the control functions and operations specific to managing MPEG-I and MPEG-II bitstreams. The draft provides the specification for the control of MPEG-II bitstreams in both stand-alone and distributed environments with the following characteristics:

• Multiserver: DSM-CC clients may request service from multiple servers. The environment also contains servers communicating with other servers.

• Multisession: a DSM-CC client has the ability to have multiple simultaneous calls in progress.

• Connectivity: connections can be broadcast, point-to-point, multicast, or multipoint-to-multipoint.

• Multiprotocol: a DSM-CC client may request service from multiple servers, where each communication path may use multiple diverse network protocols.
3.4 COPYRIGHT PROTECTION
Intellectual property is defined as information that either has commercial value or derives its intrinsic value from creative ideas. The owners of intellectual property are granted intellectual property rights (IPRs); that is, they have the right to exclude others from access to, or use of, their intellectual property, just as with other tangible property. The first international treaties covering IPRs were the Paris Convention for the Protection of Industrial Property and the Berne Convention for the Protection of Literary and Artistic Works, both created in the 1880s. Those treaties are administered by the World Intellectual Property Organization (WIPO), a United Nations agency established in 1967. In 1994, the Agreement on Trade-Related Aspects of Intellectual Property (TRIPS) was established as part of the Uruguay Round multilateral trade agreement. It provides a full range of intellectual property rights protection standards, as well as the enforcement of those standards both internally and at the border. The intellectual property rights covered by the TRIPS agreement include copyrights, patents, trademarks, industrial designs, trade secrets (undisclosed information), integrated circuits (semiconductors), and geographical indications. The agreement is administered by the World Trade Organization (WTO).
As part of the IPRs, copyright is the exclusive right that the owner has to his or her "original works of authorship fixed in any tangible medium of expression, now known or later developed, from which they can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device" [43]. Among the rights copyright imparts, the exclusive right to reproduce and distribute the work is probably the one with the most important digital media implications. Copyright protection is very important since it protects intellectual property, assures creators that they will receive the credit they deserve, and provides incentives to authors and publishers to invest in the creation and distribution of creative works. It also encourages valuable information sharing between owners and registered users (economic incentives). In the long run, without copyright protection, users, creators, and information distributors and carriers will all be hurt. Copyright protection is also crucial to the development of digital information systems (such as digital library systems) and may soon become one of the central research issues. Relatively little research has been done in these areas, especially on the copyright protection of multimedia data, including the video data managed by a VDBMS.
Copyright protection is becoming increasingly challenging for the following reasons. Information in digital form is very easy to copy and distribute (essentially by just a click of a mouse button!), and it is very hard to trace the original source of the information. This is especially true when the information is on-line, as on the World Wide Web. Distributed information systems make it very difficult for any central authority to monitor and manage possible violations. The ubiquitous nature of digital networks makes it all but impossible to control the distribution and redistribution of copyright-protected information, or even to determine the jurisdiction in which an alleged copyright infringement has taken place. Payment of tariffs and negotiation of rights become more difficult. New advances in technology, like Web technologies (home pages, autonomous agents, robots), make digital information more available than ever before. At the same time, these technologies pose even greater challenges to the protection of intellectual property. "How do I prevent people, robots, or agents from stealing the creations or designs in my World Wide Web pages?" is a perennially hot topic in Internet news groups. Unfortunately, "Don't put them on-line" seems to be the answer one often hears.
There are two different approaches to addressing this problem: copyright violation prevention and copyright violation detection.
3.4.1 Copyright Violation Prevention
The basic idea here is to make unauthorized copying of the information very difficult. Popek and Kline [86] use secure printers with cryptographically secure communication paths. Griswold [44] proposes an active document approach, in which users may interact with a document only through a special program. Choudhury et al. [24] explore the use of cryptographic protocols for the copyright protection problem in electronic publishing over computer networks. Their first protocol uses a public and private key pair that is known only to the hardware manufacturer; the key pair is embedded in the printing or display devices. The second scheme is to encrypt the document and decrypt it using software at the recipient's computer (a sketch of this second scheme follows the list below). There are some problems with preventive approaches:

• They may cause inconvenience to authorized users, and thus discourage them from purchasing and using the information.

• They may require special hardware or software.

• They restrict the means of access to the documents.

• Illegal users can obtain a copy of the document by getting access to the special hardware (such as a printer) or software.
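A minimal sketch of the second scheme, document encryption with software decryption at the recipient's computer, follows. Key distribution, the hardest part in practice, is omitted; the use of the third-party `cryptography` package and all names are assumptions of this sketch, not details from [24].

```python
# Sketch of encrypt-then-decrypt-in-viewer publishing. Key handling is
# drastically simplified: the key simply lives in the viewer software,
# which is exactly the weakness noted in the list above.
from cryptography.fernet import Fernet

viewer_key = Fernet.generate_key()   # imagine this embedded in the viewer
cipher = Fernet(viewer_key)

def publish(document: bytes) -> bytes:
    """Only ciphertext ever leaves the publisher."""
    return cipher.encrypt(document)

def display(blob: bytes) -> None:
    """Plaintext exists only inside the viewing program."""
    print(cipher.decrypt(blob).decode())

display(publish(b"Chapter 1: ..."))
```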
3.4.2 Copyright Violation Detection
The basic assumption of this approach is that all users are trustworthy and are allowed access to the information. At the same time, efforts are made to discover unauthorized copying. Examples of this approach include the following:
Signature (watermark) scheme. Each document comes with a unique signature for each registered user [14, 15, 114]. For example, Brassil et al. [14, 15] use both line and word shifting to mark and identify the document, which can be made unique to a certain registered user. The marks are indiscernible to users and can be used to identify the registered owner if a suspect document is found. The problem with this approach is that the user could forge or destroy the signature.

Copy detection server scheme [16, 66]. The author registers his or her new work with the server, and the document is broken into small pieces (sentences) and hashed. Any given document can then be compared with what is registered in the server to check whether there is any overlap; an overlap threshold may be set to determine whether there may be a violation of copyright. Human examination may still be needed to determine whether there actually is a violation. For example, Brin et al. [16] propose a system for registering documents and detecting copies, which can be either partial or complete. An algorithm for such detection, and metrics required for evaluating detection mechanisms (covering accuracy, efficiency, and security), are proposed. A prototype system called COPS has been implemented, and experimental results are presented that suggest proper settings for the copy detection parameters (a minimal sketch of the sentence-hashing idea follows the list of problems below). There are also some problems with this scheme:

- It requires a centralized registration system, which may not fit well into today's highly distributed information systems, such as the World Wide Web and the Internet.

- It applies only to text documents [16], and it is unclear how the scheme can be applied to other kinds of data (programs, sounds, videos, images).

- Convincing users to use the server system may be a problem.

- It is a passive detection scheme that depends on the submission of documents in "question" to detect unauthorized copies.
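The following sketch shows the registration-and-overlap idea behind such servers. The sentence splitter, the 30% threshold, and all names are illustrative choices, not the actual COPS design [16].

```python
# Sketch of copy detection in the style of COPS [16]: registered works are
# reduced to sets of sentence hashes; a suspect document is flagged when
# its overlap with a registered work crosses a threshold.
import hashlib
import re

def sentence_hashes(text: str) -> set:
    sentences = re.split(r"[.!?]+", text.lower())
    return {hashlib.sha1(s.strip().encode()).hexdigest()
            for s in sentences if s.strip()}

registry = {}  # title -> hash set; stands in for the central server

def register(title: str, text: str) -> None:
    registry[title] = sentence_hashes(text)

def check(text: str, threshold: float = 0.3) -> list:
    """Registered works whose sentence overlap exceeds the threshold;
    human examination still decides whether a real violation occurred."""
    probe = sentence_hashes(text)
    hits = []
    for title, hashes in registry.items():
        overlap = len(probe & hashes) / max(len(probe), 1)
        if overlap >= threshold:
            hits.append((title, overlap))
    return hits

register("thesis", "Video servers differ from file servers. They stream data.")
print(check("They stream data. They also bill subscribers."))  # flagged at 0.5
```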
3.4.3 Conclusions
As we can see, at the present time there is no completely secure system that also allows easy authorized access; there may never be such a system. Also, how to apply existing copyright protection techniques to video database management is not very clear. There are other research issues related to copyright protection, such as encryption and authorization, access-charging mechanisms, and preventing registered users from redistributing the information (through large mailing lists, newsgroups, and so on) to other unauthorized users. Many researchers are also aware that our traditional copyright law and patent system is not adequate for digital information, and that a new infrastructure is needed [91]. Emerging information technologies have posed serious challenges to the interpretation, application, and enforcement of current intellectual property law. In July 1994, the Clinton Administration's Working Group on Intellectual Property Rights issued a Preliminary Draft Report on Intellectual Property and the National Information Infrastructure [116]. Still, the fundamental definition of copyright violation and the copyright policy for the new digital information era are highly debatable and under discussion [91].
4 PRODUCTS
4.1 INTRODUCTION
The purpose of this chapter is to give readers a basic idea of the state of the market for hardware technologies related to video databases and their applications. We will briefly describe some of the products in three categories: video boards, storage systems, and video server systems. Although the authors have made a considerable effort to keep the descriptions up to date, this is by no means a complete listing of all the commercial products now available, due to the ever-evolving nature of the market and the diversity of video products. There are many other video products that could be described here; for example, in the video board section, MPEG encoding systems and animation controllers, removable storage systems, and so on. All information in this chapter was gathered from companies' World Wide Web sites and product catalogs, as well as newspapers and magazines. The authors have not checked the precision of the technical details that come from the above sources and thus do not hold any responsibility for their accuracy. All company names and product names are trademarks of their respective companies.
4.2 VIDEO BOARD
To support video database applications and other real-time video applications in interactive training, digital libraries, education, and entertainment, which are in high demand now, virtually every graphics chip vendor has announced a graphics architecture and video board to support full-motion capture and playback of digital video.
FullVideo Supreme (TM) is Digital's first PCI video board for its AlphaStations. It has the capacity for full-motion video capture and video output, which adds video capture and live video playback to various applications. FullVideo Supreme supports software compression for capturing and creating audio and video (AVI) files and incorporates Digital's patented AccuVideo rendering technology for video playback. It also takes advantage of the AlphaStation system's fast PCI bus, which can accommodate the large volume of data required by full-motion video. FullVideo Supreme supports the input video formats shown in Table 4.1.

    Format        Frame Size            Speed
    NTSC          640x480 (full size)   30 frames/sec
    NTSC          640x240               30 frames/sec
    PAL, SECAM    768x288               25 frames/sec

    Table 4.1  Video formats supported by FullVideo Supreme (TM)

The output can be either NTSC in full-frame output or PAL in field output. This video board can be added to any AlphaStation system and can be used with all supported AlphaStation graphics subsystems. FullVideo Supreme can be used for LAN-based video conferencing if combined with InSoft's Communique! and a desktop camera. Two AlphaStation users can video conference at a frame rate of up to 30 frames/sec (320x240) if Communique!'s CellB compression and decompression option is used.
Sound & Motion J300 (TM) is a low-cost TURBOchannel option card that provides professional-quality audio and full-motion video for Digital Alpha AXP systems. Its main features include:

• Support for any Alpha AXP 100 MByte/second TURBOchannel and delivery of full 30 frames/sec video,

• JPEG compression and decompression,

• Real-time video capture of NTSC, PAL, or SECAM in composite or S-video, and output of NTSC or PAL in composite or S-video, and

• Support for multiple audio sampling rates and the ability to record and play back audio from voice grade to stereo CD quality.
Intel's DVI Digital Video Interactive (TM) was introduced onto the market in 1990 (current version i760). It has an Intel 760 video processor and uses the DVI file format for video and audio data. It is said to be the first video board that allowed real-time playback of video and audio. It can achieve a frame rate of 30 frames/sec at a resolution of 320x200. The original application was presenting video on a personal computer, but it can be used with Microsoft Video for Windows, IBM Presentation Manager for OS/2 Multimedia, and Apple QuickTime for Windows.
TARGA 2000 (TM) for the EISA bus is Truevision's high-end desktop video production engine based on DVR technology. Main features of the TARGA 2000 include:

• Full-screen (640x480 NTSC, 768x576 PAL), full-motion (30 frames/sec NTSC, 25 frames/sec PAL) video capture, playback, and recording,

• 16-bit CCIR 601 resolution output (720x486 NTSC and 720x576 PAL),

• 16-bit CD/DAT-quality stereo audio (44.1 or 48 kHz) capture and synchronization to video,

• Real-time, variable motion JPEG compression with adjustable dynamic quantization factor (Q factor),

• Full QuickTime compatibility, and

• Compatibility with any standard Windows (TM) or Video for Windows application, with a focus on applications such as video and sound editing, effects, and animation.
Other video board products of Truevision Inc. include the TARGA 2000 Pro (Windows, Macintosh) and TARGA 1000 multimedia editing system (released January 9, 1996) for Windows NT. More detail can be found at the company's World Wide Web site: http://www.truevision.com/.
Bitfield H.261 Video Compression Board is a full-length PC/AT board that can be used in machines with an ISA or EISA bus to digitize and compress video using H.261. Some technical information about it is listed in Table 4.2.

    Coding algorithm   CCITT H.261
    Video format       NTSC or PAL
    Frame rate         NTSC: 7.5, 10, 15, 30 frames/s; PAL: 6.25, 8.33, 12.5, 25 frames/s
    Resolution         CIF (352x288), QCIF (176x144)
    Data rate          0 - 2048 kbps
    Video inputs       composite or Y/C; 2nd composite
    Video output       composite, Y/C, or RGB
    Audio              G.711 (PCM), 64 or 56 kbps, 3.5 kHz (7 kHz with optional daughter board)

    Table 4.2  Technical features of the Bitfield H.261 Video Compression Board (TM)

The card is controlled from software, and most settings can be adjusted at any time (PAL or NTSC is set at startup). The card can also display overlays, and it is possible to grab images directly from the board or to put images
onto the board; the display memory consists of one luminance bank (Y) and two color-difference banks (CB, CR). The price in April 1995 was about US $5000 for the basic card with G.711 audio. The optional motion estimation daughter board was around US $600. Bitfield Inc.'s World Wide Web URL is http://www.bitfield.fi/.

PC Motion and PC Motion Pro (TM) (see Figure 4.1) are the video board products of Optibase Inc. that provide full-motion playback of MPEG digital video and audio for video-on-demand, preview-channel, and movie-on-demand applications. Users may place up to four PC Motion Pros in a single computer system. Some features of the PC Motion video boards are:

• Compatibility with the ISO MPEG-I standard,

• Analog video input/output (NTSC: 30 fps; PAL: 25 fps) at composite CCIR 601,

• Support for two-channel MPEG audio, MPEG layer I (32 Kbps to 448 Kbps) and layer II (32 Kbps to 384 Kbps); audio quality can be CD-quality stereo or mono, and

• Software that includes Microsoft Video and a System Development Kit (SDK).

Figure 4.1  OPTIBASE's PC Motion Video Cards (TM). © Optibase Inc.

Optibase announced two new MPEG encoding products on April 15, 1996: MPEG Forge and MPEG Fusion. MPEG Forge is a two-board plug-and-play MPEG-1 and MPEG-2 encoding system that creates MPEG-1 SIF and MPEG-2 Half D-1 streams; it is priced around $18,500. MPEG Fusion supports SIF, Half D-1, and Full D-1 MPEG encoding; it is a three-board system that consists of two PCI video encoders and an ISA audio encoder and is priced around $35,000. More information can be found at http://www.optibase.com/.

Image Manipulation Systems Inc. (IMS, URL: http://www.imageman.com/) has a series of video cards for SUN Sparc workstations (SBus architecture) and for VME and PCI bus platforms. (VMEbus is an industry open-standard system for building computer systems; VME64 was approved by ANSI in 1995. More information about the VMEbus standard can be found at http://www.ee.ualberta.ca/archive/vmefaq.html.) Some of the cards are briefly described here.

• IMS SC3000 SUN SBus Card (TM) is a single-slot SBus card that provides the ability to compress and decompress video and audio in real time on any existing Sparc station. It supports the H.261 video and G.711, G.722, and G.728 audio protocols. It also allows the user to display a video image in real time on standard Sun display devices. The display window is 320x240x8 bpp (bits per pixel). The image is dithered and transferred in real time by the SC3000 acting as an SBus master device, which keeps the load added to the CPU minimal.

• IM1002 SBus NTSC Output Card (TM) is a single-slot SBus card that provides the ability to create NTSC video images from the SUN workstation. One can run the sample programs to copy a region of the X11 display
to video out, or one can write one's own programs to deal with the framebuffer directly. The list price for the IM1002 is around $2,995.
• VJ3000 VME High Resolution RGB/NTSC/PAL JPEG Card (TM) is a single 6U VME card that provides the ability to compress and decompress high-resolution RGB video, and to decompress it for display on an RGB/s monitor. The list price for the VJ3000 is around $7,995.

• IMS VC3000 VME H.261/H.320 Card (TM) is a single 6U VME card that provides the ability to compress and decompress video and audio in real time on any existing Sparc station. It supports the H.261 video and G.711, G.722, and G.728 audio protocols. The cost of the VC3000 video card is around $4,995.

• IMS PC3000 PCI H.261/H.320 Card (TM) is a single-slot PCI card that provides the ability to compress and decompress video and audio in real time on any existing Sparc station. It supports the H.261 video and G.711, G.722, and G.728 audio protocols. It also allows the display of a video image in real time on standard Sun display devices. The display window can be sized from full size down to a single pixel. The image is dithered and transferred in real time by the PC3000 acting as a PCI master device to minimize the CPU load. The list price for the PC3000 is around $1,995.

Talisman (TM) MPEG Playback Cards from OmniMedia Technology, Inc. are computer add-on cards for MPEG playback on a VGA monitor and/or TV. They provide full-screen, full-motion video with 16-bit stereo sound. Some important features are:

• Support for MPEG Video CD and CD-i movies and games,

• Ability to play full-screen video at 640x480 (8-, 16-, or 24-bit color), 800x600 (8- or 16-bit color), or 1024x768 (8-bit color),

• Simultaneous output to TV and/or VGA monitors,

• Ability to accept MPEG-I system streams or CD data with no external parsing requirements,

• Ability to decode and synchronize SIF-resolution MPEG video and two channels of MPEG layer 1 and layer 2 audio, and

• Ability to support YUV 4:2:2 true-color video playback formats.
Figure 4.2  Left: Parallax video cards; clockwise from the upper left are XVideo for Sun, XVideo for HP, XVideo for PCI/AIX, and PowerVideo for Sun. Right: XVideo supporting video tool for Sun. © Parallax.
More information can be found at http://www.omt.com/.

Parallax Graphics video boards (see Figure 4.2) from Parallax Graphics, Inc. are a series of video products that support high-performance video on Sun/Solaris Sparc stations (SBus), HP/HP-UX 9000 Series 700 workstations (EISA bus), and IBM/AIX PowerPC (PCI bus) workstations. Some important features of these products are:

• Full-motion (30 frames/sec), true-color (24-bit) real-time video capture and playback,

• Full-size, full-resolution video, such as NTSC video display at 640x482 pixels at 30 frames/sec and PAL and SECAM display at 768x576 pixels at 25 frames/sec,

• Hardware-based Motion JPEG compression and decompression,

• Cross-platform support which, for example, enables a real-time video conference between Sun/Solaris, HP/HP-UX, and IBM/AIX workstations, and

• Various supporting software and development tools.

There are three families of boards:
XVideo offers features including video compression, two simultaneous video displays, analog video output, and a complete software development environment. MultiVideo is targeted at networked video applications such as video conferencing and digital video capture to disk. PowerVideo supports image capture and video-in-a-window applications. The list prices of the above products as of November 1, 1995, are $9,485 for XVideo with compression, $5,990 for PowerVideo, and $3,490 for MultiVideo. More information about the products of Parallax Graphics Inc. can be found at their World Wide Web site (http://www.parallax.com/). From the above examples, one can identify some of the important features that all video boards provide or will provide in the future:
• Full-motion (30 frames/sec for NTSC and 25 frames/sec for PAL, SECAM), full-size (640x480 for NTSC and 768x576 for PAL, SECAM) real-time video capture and playback,

• Standards support (MPEG, AVI, QuickTime, H.261, etc.),

• 24-bit true-color image support,

• NTSC, PAL, and SECAM video input and output support,

• CD-quality sound, and

• Software development environment support.
Right now, most of the video boards are still in the thousand dollar range, which seems to be expensive for home video applications. However, with advances in technology and increased competition within the video board market, one could expect continuous price decreases for video boards and, at the same time, a constant increase in their functionalities.
4.3 VIDEO STORAGE SYSTEMS
4.3.1 RAID Technology and Systems
The huge data volumes and real-time requirements of video database applications demand a storage subsystem with large capacity, fast data access, and a low price. The digital magnetic disk drive is one of the choices for such a purpose, especially in the form of Redundant Array of Inexpensive Disks (RAID) systems. RAID systems increase data storage space, improve access (read and write) speed, and avoid data loss and downtime by distributing data among multiple disks, creating parity data, and mirroring data on separate disks. RAID systems typically include SCSI-2 Fast (10 MBps) or Fast and Wide SCSI (20 MBps at 16 bits and 40 MBps at 32 bits) interfaces for maximum throughput, which is usually above 10 MBps. The access time is usually below 10 ms. There are seven RAID levels at which a RAID system can be set up. Table 4.3 summarizes their meaning, advantages, and disadvantages; it is adapted from [112] with some modifications. All RAID controllers have level 0 as an option. Different RAID levels can also be implemented in software if one has SCSI-2 Fast and Wide high-capacity drives; for example, Adaptec's Remus supports RAID levels 0, 1, 4, and 5. Most RAID systems on the market default to level 5, since RAID 5 stripes data in terms of blocks and allows concurrent accesses, which makes its performance better for transaction processing applications such as databases. RAID systems currently on the market usually cost over $20,000 and have dozens of GB of total storage space.

Table 4.3  Different RAID levels

Level 0
  Technology: Disk striping: data is written across multiple drives.
  Pros: Best transfer rate and easiest implementation.
  Cons: No fault tolerance.

Level 1
  Technology: Disk mirroring: data is duplicated on separate drives.
  Pros: A complete copy of the data is always available in case of any drive failure.
  Cons: Expensive; reduces the available storage space by half. Read speed is similar to level 0, but write speed is slower since the data has to be written twice.

Level 0+1
  Technology: Combination of levels 0 and 1.
  Pros: Adds the speed of RAID 0 to the security of RAID 1.
  Cons: As expensive as level 1.

Level 2
  Technology: Reports drive failures to the disk controller.
  Pros: None; today's controllers track drive failures independently.
  Cons: Almost obsolete mainframe standard; not practical for PCs.

Level 3
  Technology: Disk striping by byte with error correction.
  Pros: Fault-tolerant, fast data transfer; uses only one drive for storing data parity; good for large multimedia files.
  Cons: All disks are involved in every read and write; can handle only one file at a time; not appropriate for transaction processing with many small files; no concurrent access.

Level 4
  Technology: Disk striping by sector with error correction.
  Pros: Fault-tolerant; uses only one drive for storing data parity; good for applications with a high read request rate.
  Cons: Poor performance for write operations and applications that require high transfer rates.

Level 5
  Technology: Stripes blocks of data and parity information across all drives.
  Pros: Fault-tolerant; good for transaction processing applications because each drive can read and write independently.
  Cons: Cannot match RAID 0 and RAID 1 performance due to the processing required to compute and write error correction data.
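The following sketch shows how RAID 5's rotating block/parity placement (Table 4.3) can be computed. A left-symmetric layout is assumed here purely for illustration; real controllers differ in detail.

```python
# Rotating data/parity placement for RAID 5 (cf. Table 4.3). The
# left-symmetric layout is an illustrative assumption.

def raid5_parity_disk(stripe: int, n_disks: int) -> int:
    """Parity rotates across the array so no single disk becomes a
    write bottleneck (the RAID 4 weakness noted in Table 4.3)."""
    return (n_disks - 1) - (stripe % n_disks)

def raid5_block_location(block: int, n_disks: int):
    """Map a logical block number to (stripe, disk), skipping the
    stripe's parity disk."""
    data_per_stripe = n_disks - 1
    stripe, offset = divmod(block, data_per_stripe)
    parity = raid5_parity_disk(stripe, n_disks)
    disk = offset if offset < parity else offset + 1  # hop over parity
    return stripe, disk

if __name__ == "__main__":
    for b in range(8):
        print(f"block {b} -> stripe/disk {raid5_block_location(b, 5)}")
```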
4.3.2 Magnetic Disk Storage Subsystems
CD storage systems have several advantages (huge storage capacity, for example), and the price drop in recent years makes them good choices as video backup storage systems. However, the majority of video storage systems are still based on magnetic disk arrays, since these provide much faster access and much higher throughput. A few examples are described here.

P. E. PHOTRON's DVDA-2 Digital Video Disk Array is a disk array used for real-time digital video storage and playback. It can store up to fifty-two minutes of D1 (ITU-R BT.601, also called CCIR 601) digital video data. Some of its features include:

• Real-time recording and playback of 10-bit D1 video,
• 48 kHz, 20-bit, 4-channel D1 serial audio recording and playback,

• A parity disk to secure the operation of the disk array in case of a single disk failure, and

• A SCSI workstation interface with a data rate of 10 MBps.
Micropolis' AV Server is a specialized storage subsystem for the rigorous demands of the video-on-demand marketplace. Its main features include:

• One to sixty-four channels per server and up to 15 MBps per channel,

• Stereo audio at 48 kHz per channel,

• Scalability: servers can be connected to meet one's needs,

• Support for MPEG-I, 1.5 (CL-950), and MPEG-II video streams,

• PAL- and NTSC-compatible output,

• Full chroma genlock with host and reference sync,

• Storage of up to 240 hours of video data per server at 1.5 Mb/sec,

• Fault tolerance provided by RAID, and

• A VCR/VTR-like control protocol.
The Micropolis AV Server Series has models 50, 100, and 200. The price for the AV Server 50 with VideoNet starts around $20,000 and includes four MPEG-II video channels with 6 GB of storage (approximately four and one-half hours of video at a 3 Mbits/sec data rate). The AV Server Series is fully modular, allowing additional video channels as well as larger storage capabilities.
4.3.3 CD and CD-i Technology
CD Drives

Optical disks have a large capacity and a long lifetime at a low cost (less than $6 each). All of this makes them a good choice for video backup storage media. International standards have been established for optical disks (the Philips - Sony and High Sierra standards), so they can be read by units from diverse manufacturers. The cost of a CD drive was between $200 and $400 for a 4X drive, and about $500 for 6X, and this cost has been decreasing continuously. In fact, Plextor's 8X CD drive is now priced at $375 with the fastest access time at 110 ms, which is the best on the current market. The throughput of a CD drive is standardized: regardless of the manufacturer, any 4X CD drive's throughput is 600 KBps and any 8X CD drive's throughput is 1.2 MBps. The access time of the CD drives on the market is usually below 200 ms and is quickly approaching 100 ms. To serve as video backup storage, CD drives can be combined into large-scale dedicated CD-ROM servers and high-capacity jukebox systems. Such systems can consist of hundreds of discs and usually have high-speed caches to improve performance.
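The standardized scaling can be expressed directly: an NX drive delivers N times the 150 KB/s single-speed rate, matching the figures quoted above. A trivial sketch:

```python
# Standardized CD throughput: an NX drive delivers N x 150 KB/s,
# matching the 600 KBps (4X) and 1.2 MBps (8X) figures quoted above.
def cd_throughput_kbps(speed_factor: int) -> int:
    return 150 * speed_factor

for n in (1, 4, 6, 8):
    print(f"{n}X drive: {cd_throughput_kbps(n)} KB/s")
```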
CD-i Technology

CD-i (Compact Disc Interactive) is an entertainment and information system that plays digital data stored on a CD. The CD-i technology was developed jointly by N. V. Philips of the Netherlands and Sony Corporation of Japan. It was aimed at developing a worldwide standard so that CD-i discs would run on all CD-i players anywhere in the world. The CD-i Full Functional Specification has been published and is commonly known as the Green Book. A CD-i player consists of at least a computer and an optical disc drive, with a video output for the TV or monitor, an audio output for the stereo (if desired), and an input device. CD-i combines video and computer animation with digital audio. It was designed to be an all-purpose multimedia device for the home when hooked up to the television. Many CD-i titles are published each year, including digital video, games, educational material for children, and music. On a CD-i player one can play video games, use multimedia and educational software, play music, watch movies, or view photographs. CD-i supports several CD-based standards: Audio CD, CD+G (which adds graphics to standard audio CDs), Photo CD, and Video CD (with an optional add-on Digital Video cartridge). Videos on CD-i video CDs are encoded in MPEG-I format. One example of a video database application using CD-i technology is the Interactive Digital Entertainment Terminal (also known as a digital set-top decoder), which was used in Bell Atlantic's first commercial-market digital television and information services, transported over Bell Atlantic's video dialtone network in New Jersey. Those terminals are provided by Philips and CLI and use the MPEG-II international standard for digital video compression, as well as the MPEG-specified Musicam audio system. They each have more than 5 MB of memory and a graphical user interface based on Philips' Compact Disc-
interactive (CD-i) system. A wireless remote control can be used to go through the onscreen menu of different television choices.
4.4 VIDEO SERVER SYSTEM
An interactive multimedia system usually has the following components:

• A multimedia content preparation and authoring system,

• A video server system that provides continuous, real-time video and audio data streams from its storage subsystem,

• A distribution network system: broadband networks to deliver multimedia data using ATM, SONET, or T1, and

• Operations and business support: billing, user management, access control, and so on.
The video server system is a key component of the whole multimedia service. Besides video data, it usually delivers other multimedia database services (such as text, audio, and image). It also needs to interact with the other components of the interactive system, such as the broadband network. A video server system differs from traditional file servers because it delivers continuous data streams to many subscribers simultaneously, and it usually requires hundreds of gigabytes of storage space. The design objective of a video server system is usually cost-effectiveness rather than computing power. In general, video servers are expected to provide the following features and functionalities:

• High performance, for example, allowing multiple users to access the server at the same time,

• TV-quality video streams,

• Scalability, for example, being easy to reconfigure to satisfy different application requirements,

• Portability, for example, being ready to be ported to different operating systems (such as Solaris, AIX, HP-UX),

• An open architecture, for example, allowing third-party hardware or software add-ons,

• Multimedia database services that include video, audio, images, and text,

• System management tools (user accounting, pricing, access control, etc.),

• Reliability, such as high data availability, and

• Cost effectiveness.
Oracle Media Server (TM) has been widely adopted in industry and used in today's multimedia systems, for example, in various VOD trials and enterprise information management systems (see the applications chapter). The key features of Oracle Media Server (TM) include:

• Modular design, including content management, customer tracking and billing services, and a highly scalable multimedia stream server; custom or third-party plug-in modules are allowed,

• Portability across different networks, set-top boxes, and operating system environments,

• A basis in standards where they exist, from the relevant interactive television standards organizations such as ANSI, ISO, MPEG, and DAVIC,

• Transparent scalability across different platforms, up to massively parallel supercomputers that can concurrently serve tens of thousands of video streams, store thousands of hours of video, and handle thousands of transactions per second,

• Reliability: operation without interruption; multimedia software RAID guarantees the continuous delivery of data streams in case of subsystem failure,

• A complete set of system management tools for various servers, and SNMP support with an open MIB (3), which allows users to integrate Oracle Media Server with existing support infrastructure and to use third-party or custom monitoring tools,

• Multimedia data management:
  - Video data: MPEG-I, MPEG-II, QuickTime, Motion JPEG, DigiCipher, etc.
  - Audio data.
  - Images and graphics.
  - Text data, including all ANSI-SQL data types, such as subscriber account information, preferences, billing information, movie categories, classifications, starring actors, pricing, newspaper articles, and movie descriptions.
  - External data sources, including CD-ROM titles and existing databases or applications.

• Multimedia data server:
  - Full VCR controls (play, forward, rewind, pause, etc.).
  - Continuous stream delivery guaranteed by software multimedia RAID.
  - Parallel stream attribute tagging.
  - Off-line tape storage loading facilities.
  - Contention-free data access.

More information about Oracle's Media Server can be found at its WWW site: http://www.oracle.com:80/info/products/newMedia/OMS/datasheet.html.

3. Management Information Base: the collection of managed objects that an SNMP-based network management system can query and monitor.
A MediaStream Application Server (MAS). It is responsible for transaction process, customer billing etc.. The main features of MediaStream Server are •
D a t a stream rate control,
• VCR functions for stream services,

• Support for SNMP agents and MIBs for network management functions,

• A set-top-independent and compression-independent architecture,

• High availability of data streams,

• Scalable configuration to beyond 10,000 streams, and

• Real-time video (MPEG I and II) and audio data streams.
More information about the HP MediaStream Server can be found at HP's WWW site: http://www.tmo.hp.com/tmo/tcnews/9508/TNCovSt.html.

Figure 4.3  SGI's Challenge (TM) media server. © SGI.

SGI's CHALLENGE (TM) (models S, DM, L, XL) is a family of media server products. Figure 4.3 shows the XL model. The main specifications of the high-end XL model include:
• From 2 up to 36 64-bit RISC MIPS R4400 processors (250 MHz),

• From 64 MB up to 6 GB of memory,

• Up to 6.39 TB of RAID-5 external storage space, and

• Up to 4 POWER Channel-2 I/O boards to utilize the system's 1.2 GB/s bandwidth.
Teamed with SGI's SCSI-2 MPEG-II decoders, the system has wide applications in film, video, and broadcast. It acted as the content server, for example, in the Time Warner interactive TV trial started in fall 1993 and in Nippon Telegraph & Telephone's interactive TV trial in Japan started in 1994. SGI also formed a joint company with AT&T to develop and sell video servers and software. The WWW URL of SGI, Inc. is http://www.sgi.com/.
Figure 4.4  TNCi's Cheetah Video Server. © TNCi.

Cheetah (see Figure 4.4) is a video server product of The Network Connection, Inc. (TNCi), which is aimed at the corporate network market. Cheetah is based on multiple Pentium processors and works under the Microsoft NT operating system, but it also interoperates with Microsoft WFW, Microsoft Windows NT, NetWare, and Lotus Notes servers. It supports Ethernet, Token Ring, FDDI, and ATM networks. The Cheetah video server has up to 112 GB of storage space with hardware RAID 5 protection, offering 3 or more Fast and Wide SCSI channels per TRAC processor. The video data formats supported include MPEG-I, MPEG-II, JPEG, MJPEG, AVI, DVI, and others. Up to 40 streams of digital video at a data rate of 1.2 Mbps/stream are allowed on Workgroup Servers and up to 120 streams of digital video at a data rate of 1.2 Mbps on Enterprise Servers.
Figure 4.5  AirView VOD System. © TNCi.

Figure 4.6  VideoServer 2000 Series, Model 2020. © VideoServer, Inc.
The Cheetah Video Server has been used in various applications, such as AirView (see Figure 4.5), an in-flight entertainment system; MedView, a VOD medical training library; and SportsView, a VOD system for sports teams.

VideoServer, Inc.'s Series 2000 Multimedia Conference Servers (MCS) (see Figure 4.6) are targeted at the networked videoconferencing market. They can support from 8 (model 2004) up to 48 (model 2020) video conference endpoints. Their main features include:

• Audio mixing and video switching,
• Support for standards, thus allowing endpoint equipment from different vendors,

• Ability to support Primary Rate ISDN, E1, T1, and switched 56 T1 network access,

• PC Windows-based application software, and

• Scheduling services and conference control capabilities.
The Series 2000 MCS has been used in applications such as the following. Washington State University is using VideoServer Series 2000 MCSs on its main campus as the heart of its Washington Higher Education Telecommunications System (WHETS). WHETS, started in 1985, now offers seventy-seven classes to over 2,300 students at branch campuses that may be a few hundred miles away, via real-time interactive videoconferencing. BellSouth has selected VideoServer equipment for its multipoint videoconferencing service.
StarWorks (TM) is video networking software from Starlight Networks, Inc. Running either on a PC platform or on a Sun workstation with the Solaris operating system, it acts as a video server that shares and stores video and audio data over local area networks (LANs), including Ethernet, Token Ring, and FDDI. The StarWorks video server can handle over 400 gigabytes of digital video and audio storage and thousands of video (MPEG, AVI, QuickTime, etc.) and audio files. Up to 40 simultaneous Windows, DOS, Macintosh, and Solaris users can view video applications while accessing other file servers on the LAN, without impacting other networked applications. Together with StarWorks-TV (TM), which is video multicasting software, it can provide one-to-many multicasting services over standard enterprise Ethernet local area networks. The multicast video can be a live feed or stored on a video server. Examples of applications are distance learning, remote manufacturing process management, live Wall Street TV news for financial analysts and traders, security and surveillance systems, video conference multicasts, corporate communications, and emergency broadcasts.

Some companies do not produce special video servers; rather, they combine their existing systems and technologies to answer the video server market's requirements. For example, Digital Equipment Corporation is one of the leading
computer companies in the video-on-demand market. It combines its strengths in Alpha AXP systems, client/server computing, storage, and systems integration to deliver video server platforms that can store and deliver large amounts of multimedia data on demand to thousands, and, in the future, to millions of subscribers. Digital's video server is a combination of:

• Ultra-high-performance Alpha AXP processors based on a 64-bit RISC architecture,

• StorageWorks disk storage arrays,

• Digital Linear Tape (DLT) library subsystems,

• An interactive gateway unit,

• A sophisticated server management unit, and

• GIGAswitch, a high-speed networking switch.
Digital's video server systems have been widely used in different VOD user trials and other applications, which are described in the next chapter.
5 APPLICATIONS
5.1 INTRODUCTION
Video databases, in combination with today's multimedia (pictures, audio, video, graphics, text) and other technologies (networks, operating systems, and so on), are widely used in many areas. Some examples are public information, advertising, entertainment, and education. In the near future, such applications will be integrated into distributed systems composed of computers, (digital) TVs, optical disks, and other electronic units appropriate for information retrieval, presentation, and interaction with users. In this chapter, we introduce some of the applications that we think represent the majority of industrial and academic efforts. As in the last chapter, all the information in this chapter is based on companies' product catalogs, WWW sites, newspapers, and magazines. The authors have not checked the accuracy of the information that comes from the above sources and thus disclaim responsibility for its accuracy. All company names and product names are trademarks of their respective companies.
5.2 EDUCATION AND TRAINING
Video database technology has many applications in education and training, for example, distance learning, telecollaboration, teleclassrooms, interactive training, self-education, K-12 school education, and employee reeducation. It creates a classroom without walls and puts information from around the world at users' fingertips. Several studies, including one conducted by the U.S. Department of Defense, have indicated that, compared with traditional training, multimedia
training is roughly 40 percent more effective, the retention rate is 30 percent greater, and the learning curve is 30 percent shorter. We can develop video database systems that contain various educational topics, such as history, art, computer science, and electrical engineering. Instructors can dynamically organize a course syllabus by querying and retrieving related video clips from these video databases (such as seminars or lectures given by famous professors or experts at different universities or companies), according to students' needs, and putting them together. Moreover, users or students can teach themselves or review certain topics by using the video database systems. An interesting example of this is self-taught VCR repair, which allows one to query the video database with a specific problem description, retrieve the corresponding video clips, and watch an expert give step-by-step instructions on finding the cause and performing the repair. In the following paragraphs, we describe several ongoing research projects on video databases for educational applications.

Electronic classrooms and the MUNIN/MultiTeam project is a joint project between the Center for Information Technology Services at the University of Oslo (USIT), the Norwegian Telecom Research Department (NTR), and the Center for Technology at Kjeller (UNIK). Its aim is to implement, try out, and evaluate a system for distance education between electronic classrooms based on TCP/IP network technology. Part of their research is on the storage and reuse of videos of lectures and seminars, as well as the possible editing of them, which will be a great aid for the student who cannot follow the course schedule or wants to review certain lectures. The video data of the system are coded in the CCITT H.261 standard, and the audio data are coded using PCM (Pulse Code Modulation, an audio coding method with an 8 kHz sampling rate, resulting in a transfer rate of 64 kbit/sec); all this is done by a dedicated ISA card from Bitfield.

In collaboration with the Institute for Learning Technologies, NYNEX, and NPAC, the Syracuse University School of Education is working on a project called Living Schoolbook. Based on leading database, digital video, World Wide Web, and networking technologies, the Living Schoolbook demonstrates Education Information Infrastructure (EII) services and interactive, information-on-demand systems for K-12 education. Educational information-on-demand servers are to be built to demonstrate how digital media, high-speed networks, and classroom interfaces provide options for supporting new models of learning in the classroom. So far, Kids Web, a digital library of Internet resources for school kids, has been built, as well as a New York State Image Database and
an African American photograph collection database. The centralized VOD server is at NPAC; content is distributed over NYNET on demand to schools. This server contains thousands of hours of VHS-quality MPEG-compressed video data that is indexed by a text database into video clips with an average duration of 15 seconds.

Training professionals to use new technologies is an extremely significant factor for a country's economy nowadays. It is estimated that in modern society, an average employee will be retrained four times during his or her working life. A country with an excessive work force in certain areas and a shortage of workers in other areas could direct the training of its work force according to its needs. In the modern manufacturing industry, for example, the emphasis is on equipment that is adaptable to the needs of production (flexible manufacturing). The work force of a company will need to be retrained and adapted to the needs of evolving manufacturing technologies. A video database system in this case could provide cheap, quick, and efficient training in technological knowledge for a new area with future evolution prospects (for example, Internet and WWW technologies), even if the instructors, the places, and the related equipment needed to teach this technology are not physically available. Also, changes in the strategies and objectives of large organizations can easily be transmitted down the hierarchy to remote branches of the organizations, which may be located in different parts of the world, through the use of distributed video database systems. With distributed video database technology, large training costs (installation, premises, equipment, instructors, transport of the trainees from all over the country, accommodation expenses, absence from positions where the presence of the personnel is crucial), accidents, or the loss of human lives and equipment can be avoided. For this reason, the U.S. Department of Defense has recently financed the area of training systems with an enormous amount of money. According to market research done by Training Magazine, the market for corporate training video systems is a potentially explosive one; $56.6 billion was spent on training in the United States last year. Some businesses are already taking advantage of video database technology; Focus:Hope Center for Advanced Technologies, a nonprofit job-training organization in Detroit, uses video servers to teach manufacturing-floor skills to auto-plant job candidates. Video-based training and education are totally new concepts in the way education is delivered and relationships are structured between industry and schools.
5.3 ENTERTAINMENT
Film clip databases, video-on-demand, pay-per-view, interactive TV, in-flight entertainment (IFE), video game databases, sports video databases, and video dating services are all interesting applications of video databases in the entertainment industry, which is probably the biggest video database market right now.
One of the fastest-growing markets for video database technology in the entertainment industry is video-on-demand (VOD) services. Analysts have estimated that more than $1 billion was spent in 1994 on video-on-demand infrastructure worldwide, with nearly $3 billion in 1995. Dataquest Corp. estimated that by the end of 1997, video server sales will reach $5.2 billion. Many big companies worldwide are gearing up to provide video-on-demand services to their customers.

Digital's video and interactive information technology incorporates Digital's Alpha AXP processors, StorageWorks disk storage arrays, Digital Linear Tape (DLT) library subsystems, an interactive gateway unit, a server management unit, and GIGAswitch (a high-speed networking switch linking the various elements together). Digital is one of the major suppliers of video server systems. It entered the VOD market by providing the Integrated Video Server Platform and other interactive services for US WEST Communications' proposed broadband communications trial in Omaha, Nebraska, in 1994. Since then, Digital's media server technology has been selected by many cable and telephone companies worldwide. A few examples follow.

In February 1994, Digital was selected by NYNEX as one of the suppliers for its initial broadband network deployment in Rhode Island.

In November 1994, Digital's video server technology was selected for one of Europe's first trials of video-on-demand services. The trial was to take place in a suburb south of Stockholm, Sweden, and was undertaken by Svenska Kabel-TV and Telia AB, the Swedish national phone service provider.

In February 1995, Digital signed an agreement with Ameritech for $40 million worth of its media server products and services over the next five years for Ameritech's interactive services deployment. Ameritech's interactive deployment will provide movies-on-demand, home shopping, and other interactive entertainment services. The Ameritech network is expected to provide local phone, data, and video services to more than 13 million customers in Illinois, Indiana, Michigan, Ohio, and Wisconsin.

TMN Networks, Inc., which is the major partner of Viewer's Choice Canada Pay Per View and the owner of TMN - The Movie Network and MOVIEPIX, signed a multimillion-dollar agreement with Digital on April 26, 1995. It will use Digital's second-generation media server technology for the network origination of pay-per-view and pay television programming. When the project is done, TMN will become the world's first network to deliver a complete broadcast schedule using MPEG-II digital compression. Combined with satellite technology, the company is expected to increase the number of pay-per-view channels and provide other services cost-effectively.

Westminster Cable, a London cable TV operator in the Borough of Westminster, is implementing Digital Equipment Corporation's Mediaplex media server technology for a video-on-demand trial, initially involving 100 customers and later expanding to 1,000. The beta trial will run from November 1995 to March 1996 and will ultimately enable customers to choose from a constantly updated video library of 200 video titles. British Telecom-owned Westminster Cable is the first to offer a truly interactive video-on-demand system in London.

Silicon Graphics, Inc. is another active player in the VOD market. On June 7, 1993, Silicon Graphics and Time Warner Cable announced they would work together to develop technology for a full-service interactive digital cable television network in Orlando, Florida, based on the MIPS microprocessor architecture. The network will provide access to services such as video-on-demand, educational resources, interactive video games, and home shopping. Time Warner Cable started its Full Service Network (TM) (FSN) on December 14, 1994. The FSN incorporates several new interactive technologies developed by Silicon Graphics, including the FSN system software and user interface, video-on-demand, and three interactive game applications. Silicon Graphics also supplied eight of its CHALLENGE (TM) symmetric multiprocessing (SMP) media servers for use in the network operating center, and powerful MIPS RISC-based digital media technology for Scientific-Atlanta's prototype FSN home communications terminal, or set-top device. Several companies are involved in the FSN user trial, including Time Warner Cable, Scientific-Atlanta, AT&T, SGI, and Toshiba. The trial is a full-scale user experiment with 4,000 subscribers by the end of 1995. The concept of the FSN is based on an ATM network connected to the home over a hybrid fiber-coax transmission medium. The bitrate at the customer site is 3.5 Mb/sec for video. AT&T's high-speed ATM switch
(the GlobeView-2000 Broadband Switching System) is fed at a 45 Mbits/sec rate (DS3 or T3). About 500 homes will be connected to each node over coaxial cable. At the service provider's site, there will be 8 SGI Indigo servers and 16 SGI Vaults providing a total of 1.5 terabytes of disk capacity. The multiplexing is performed at the ATM level using VPI/VCI and AAL5. The information is cell-interleaved, error-protected with a Reed-Solomon code, and then modulated using 64 QAM. In addition, the data is encrypted.

Further effort toward the VOD market led to a joint venture company called Interactive Digital Solutions, formed by Time Warner Entertainment, AT&T Network Systems, and Silicon Graphics on April 25, 1995. The company will provide fully integrated multimedia software environments to enable interactive services.

At the end of 1994, Sequent decided to participate in the Bell Atlantic Stargazer (TM) video-on-demand trials, providing high-performance symmetric multiprocessing (SMP) systems and related services to Bell Atlantic Video Services Company (BVS). BVS is a partner in Sequent's associate program, which enables the two companies to offer integrated hardware, software, and service solutions to the video-on-demand market. While waiting for regulatory approvals, BVS planned to provide Stargazer video-on-demand service to approximately 1,000 customers in the Washington metropolitan area, Northern New Jersey, Baltimore, Virginia Beach, Philadelphia, and Pittsburgh in 1995. By 1998, the service is planned to reach the twenty largest markets in the Bell Atlantic region, and by the end of the year 2000, BVS plans to provide services for up to 8.5 million homes throughout the region. In the Stargazer trials, Oracle Media Server (TM) software from Oracle Corp. manages the multimedia data and video delivery, while a Sequent Symmetry system integrated with Bell Atlantic application software monitors and qualifies customer-ordering activity.

A note on the ATM terms used above: ATM is a connection-oriented protocol, and as such there is a connection identifier in every cell header that explicitly associates a cell with a given virtual channel on a physical link. The connection identifier consists of two subfields, the Virtual Channel Identifier (VCI) and the Virtual Path Identifier (VPI). Together they are used in multiplexing, demultiplexing, and switching a cell through the network. VCIs and VPIs are not addresses; they are explicitly assigned at each segment (link between ATM nodes) of a connection when the connection is established, and they remain for the duration of the connection. Using the VCI/VPI, the ATM layer can asynchronously interleave (multiplex) cells from multiple connections. ATM adaptation layer (AAL) 5 supports connection-oriented variable bit-rate data services. It is a substantially lean AAL compared with AAL 3/4, at the expense of error recovery and built-in retransmission. This tradeoff provides a smaller bandwidth overhead, simpler processing requirements, and reduced implementation complexity; some organizations have proposed AAL5 for use with both connection-oriented and connectionless services.
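To make the VPI/VCI mechanics concrete, here is a small sketch that decodes the standard 5-byte ATM cell header used at the user-network interface (UNI). It is an illustration of the header layout described above, not code from the FSN or any other system mentioned in this chapter, and the sample cell bytes are invented:

```python
def parse_atm_uni_header(header: bytes):
    """Decode a 5-byte ATM UNI cell header.
    Bit layout: GFC(4) VPI(8) VCI(16) PT(3) CLP(1) HEC(8)."""
    assert len(header) == 5
    b0, b1, b2, b3, b4 = header
    return {
        "gfc": b0 >> 4,
        "vpi": ((b0 & 0x0F) << 4) | (b1 >> 4),
        "vci": ((b1 & 0x0F) << 12) | (b2 << 4) | (b3 >> 4),
        "pt":  (b3 >> 1) & 0x7,   # payload type
        "clp": b3 & 0x1,          # cell loss priority
        "hec": b4,                # header error control
    }

# Cells from different connections arrive interleaved on one link;
# the ATM layer demultiplexes them purely by (VPI, VCI).
cells = [bytes([0x00, 0x10, 0x00, 0x50, 0x00]),   # VPI=1, VCI=5
         bytes([0x00, 0x20, 0x00, 0xA0, 0x00])]   # VPI=2, VCI=10
for cell in cells:
    h = parse_atm_uni_header(cell)
    print("connection:", (h["vpi"], h["vci"]))
```

Because the identifiers are fixed for the life of a connection, a switch can forward each 53-byte cell with a simple table lookup, which is what makes cell interleaving of many video streams on one link practical.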
On December 7, 1994, HP's MediaStream Server (TM) was chosen by Southern New England Telephone (SNET) for an expanded trial of interactive-video services for up to 150,000 homes in Connecticut's Fairfield and Hartford counties. The services include video-on-demand, enhanced pay-per-view, banking, travel, home shopping, and interactive games. Other suppliers for this trial are Sybase (interactive-media server software), Scientific-Atlanta, Inc. (set-top decoders), AT&T (switching technology), and ALS (digital transport technology).

Oracle is another major player in the VOD market; it has been working with telecommunication companies (such as Bell Atlantic) and cable TV companies to bring VOD to the ordinary user. The strengths of Oracle are the wide acceptance of Oracle7 as a standard database environment and the availability of third-party software solutions developed for the cable industry and optimized for Oracle7. Its media servers have been used in Continental Cablevision, Inc.'s NVOD trial and in Bell Atlantic's STARGAZER (TM) system trial beginning in 1995. Oracle also teamed up with Pyramid Technology and Siemens Nixdorf on April 3, 1995, to provide highly scalable video servers for the emerging global market for interactive multimedia services. Oracle will port its Oracle Media Server software to Pyramid and Siemens Nixdorf's Reliant RM1000 Parallel Server platform. The Reliant RM1000 is a new system architecture that combines the strengths of both symmetric multiprocessing (SMP) and massively parallel processing (MPP). Oracle Media Server software running on the Reliant RM1000 will scale to hundreds of processing nodes and tens of terabytes of online storage, providing rapid access to thousands of hours of video-on-demand services.

Micropolis Corporation is one of the leading manufacturers of video-on-demand servers. On September 11, 1995, Micropolis and Matsushita Avionics Systems Corporation (MASC) announced that they will deploy AVOD (Airline Video-on-Demand) systems on commercial airlines. The AVOD is a part of MASC's 2000EV in-flight entertainment system. It will offer airline passengers the flexibility to choose a video or audio selection from the airline's library and play it back on demand, rather than at scheduled viewing times. Over forty hours of video data in MPEG-I or MPEG-II format, available on 32 to 128 channels, together with 96 hours of CD-quality audio on up to 12 stereo audio channels, will be stored on Micropolis' AV server, utilizing Micropolis mission-critical disk drives. The first installation on a 747 jumbo jet is scheduled for completion by the end of the first quarter of 1996, with additional installations to follow throughout the remainder of the year. In the near future, many of the world's leading airlines will install the 2000EV in their aircraft. The new in-flight entertainment system incorporates a 6.5-inch LCD display in the armrest of each first- and business-class seat, with future plans for installation in the economy sections as well. As an option, three stereo audio channels per video channel are available, allowing the airlines to offer movies with multilingual audio output. In addition to the video and audio applications supplied by Micropolis, Matsushita already includes video games, telephones, and other interactive services. Micropolis also works with Texscan MSI, a division of TSX Corporation and the leading supplier of cable television (CATV) advertisement insertion systems, to provide the first commercial insertion system. The system will use the Micropolis AV Server 50, which is MPEG-II and disk-drive based.

On October 24, 1995, The Network Connection Inc. (TNCi) signed an agreement with Kollsman to establish the TNCi AirView system as the in-flight entertainment standard for the airline marketplace. The system is the combination of TNCi's Cheetah Video Server (TM) and its LAN technology. They hope to pass the Federal Aviation Administration's certification process and bring the video technology to aircraft in the near future.
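A rough capacity estimate suggests why such airborne libraries need dedicated disk-array servers. The bitrates below are our own assumptions (typical figures for MPEG-I, MPEG-II, and CD-quality audio), not numbers published by Micropolis or Matsushita:

```python
# Back-of-envelope storage estimate for the AVOD library described
# above, under assumed bitrates (results in decimal gigabytes).
def gigabytes(hours, megabits_per_sec):
    return hours * 3600 * megabits_per_sec / 8 / 1000

print(gigabytes(40, 1.5))   # ~27 GB if all 40 hours are MPEG-I
print(gigabytes(40, 4.0))   # ~72 GB if all 40 hours are MPEG-II
print(gigabytes(96, 1.4))   # ~60 GB for 96 hours of CD-quality audio
```

Even under the lighter MPEG-I assumption, video and audio together approach 100 GB, which at mid-1990s disk capacities of a few gigabytes per drive meant dozens of drives per aircraft.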
5.4 COMMERCIAL
Commercial applications of video databases include on-line shopping, stock video clip databases, and on-line advertisements, which are briefly introduced in this section.

In the case of on-line advertisements, product information (such as video, text, audio, and images) of a certain company can be stored and used for communication between distant consumers and producers, so that consumers will be able to suggest modifications to the design of certain products. These changes will be grouped and then sent (together with orders) to the production centers, or can be stored in a multimedia information management system from which users can retrieve information through communication channels (yellow pages). This means that information on certain products and advertisements can reach consumers quickly and selectively. For example, in the apparel retail scenario, a clothing customer can sit in front of his or her PC at home, connect to such a database, browse and retrieve any product information he or she may have in mind, or retrieve a video clip and see how models wear those clothes at the fashion show, all without having to drive to a retail shop.

In stock market applications, a good example is Stock-Clips, the first full-motion video stock information service on the Internet, offered by The Network Connection (TNC) since the beginning of 1995. TNC's Video WEB server stores two- to five-minute video clips about publicly traded companies' backgrounds, products, and future developments and allows users to retrieve them through the Internet. This service allows organizations, investors, and end-users to view full-motion video clips from publicly traded companies before making stock-purchasing decisions. Users can dial into TNC's Video WEB 2000 to access the information for a fee of $10 per month with unlimited usage. A company that wants to be listed on the server is charged $1,500. The video clips stored on the server are in MPEG format in the 1.0 to 1.5 Mb/sec range; other video formats, such as DVI and AVI, are also supported.
5.5 INDUSTRY AND MANUFACTURING
Video database technology in industry and manufacturing has several potential applications, which include the following:

• Enterprise multimedia information management systems (for example, a manufacturing-process video database, which can be used for problem diagnosis),

• Company-wide video broadcasting (discussed in the communication application section), and

• Employee education and retraining (described in the educational application section).
The enterprise multimedia information management system is a fairly new application of video databases. A practical example is an automobile manufacturer that wants to reduce new-car development costs and to speed up model development time; it can do so by streamlining crash analysis through the use of digital video capture and playback. On September 26, 1995, Oracle Corporation introduced a new product called Oracle Media Server for Corporate Use. It is the first product to allow companies to manage, share, and access all types of multimedia information, including video and audio data. This media server is a combination of the Oracle7 parallel database server, Oracle TextServer, and Oracle VideoServer. Using Oracle's TextServer, which is based on the company's advanced parallel database software and supports mainstream symmetric multiprocessing and massively parallel processing computers, corporations can now use low-cost open computer systems to manage all their information. Oracle's VideoServer supports company-wide video broadcasting that can be used for employee training, video conferencing, and video mail.
5.6 DIGITAL LIBRARY
Do you want to quickly find a video clip of how to cook a certain Chinese dish? Or a video clip of a tour of Yellowstone National Park? Or a video clip of the first touchdown throw of Joe Montana's NFL career? Do you want to look at a video clip before selecting Purdue University for your Ph.D.? If your answer is yes, you need a digital video database, or digital video library. Video data is an inseparable part of digital library content, and video database management techniques are certainly crucial to the development of digital libraries. IBM announced on March 27, 1995, that it was launching the IBM Digital Library, an initiative aimed at helping owners of information content in all its forms (including films, music, text, art, and rare manuscripts) maximize their assets and make them available on networks around the world. The IBM Digital Library integrates a wide variety of information storage, management, search, retrieval, and distribution technologies. Its information management features include automated indexing, foldering, correlation, feature extraction, and translation functions. Two examples of existing applications of IBM Digital Library technology follow:
• The Vatican Library. IBM's digital cameras are used to capture the library's collection of original manuscripts, and electronic watermarking is used to protect the distribution of the images (see Figure 5.1).

• Indiana University's Variations project. This is one of the first large-scale multimedia projects attempting the distribution of digital information (digital audio, full-motion video, etc.) across a campus network. One of its early successes is the Indiana University School of Music Digital Library, which contains digitized tape recordings, musical scores, and so on that can be accessed through the campus-wide network and used to enhance teaching, research, and learning.
Figure 5.1  Example of ancient manuscripts digitized in the Vatican Library
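The Vatican Library example mentions electronic watermarking without saying how it works. The sketch below shows the simplest flavor of the idea, least-significant-bit embedding; it is a toy for intuition only (real systems, IBM's included, use far more robust embeddings that survive recompression), and the image and mark data are made up:

```python
import numpy as np

def embed_lsb(image, bits):
    """Hide a bit sequence in the least significant bit of the first
    len(bits) pixels.  Invisible to the eye, but easily destroyed by
    recompression or cropping."""
    marked = image.copy().ravel()
    marked[:len(bits)] = (marked[:len(bits)] & 0xFE) | bits
    return marked.reshape(image.shape)

def extract_lsb(image, n):
    """Read back the first n hidden bits."""
    return image.ravel()[:n] & 1

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # stand-in image
mark = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)   # owner's mark
assert (extract_lsb(embed_lsb(img, mark), len(mark)) == mark).all()
```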
IBM Digital Library technology has also been used at Case Western Reserve University, Marist College, the Archivo General de Indias (Spain), and Derwent (U.K.), to name a few.

In 1994, the National Science Foundation (NSF), the Department of Defense's Advanced Research Projects Agency (ARPA), and the National Aeronautics and Space Administration (NASA) sponsored a national Digital Library Initiative, which is viewed as the cornerstone of a national effort to develop a digital library. Six projects were supported:

• The Alexandria Project at the University of California at Santa Barbara (http://alexandria.sdc.ucsb.edu/),

• The Informedia (TM) Digital Video Library project at Carnegie Mellon University (http://fuzine.mt.cs.cmu.edu/im/),

• The Building the Interspace: Digital Library Infrastructure for a University Engineering Community project at the University of Illinois at Urbana-Champaign (http://www.grainger.uiuc.edu/dli/),

• The University of Michigan Digital Library Project (http://http2.sils.umich.edu/UMDL/HomePage.html),

• The University of California at Berkeley Digital Library Project (http://elib.cs.berkeley.edu/), and

• The Stanford University Digital Library Project (http://www-diglib.stanford.edu/diglib/pub/).

As an example of this national Digital Library Initiative, the Informedia (TM) Digital Video Library of Carnegie Mellon University is a direct effort to establish a large, on-line digital video library by developing intelligent, automatic
mechanisms to populate the library and allow for full-content and knowledge-based search and retrieval via desktop computers and metropolitan area networks. The library was populated with 1,000 hours of raw and edited video drawn from the video assets of WQED/Pittsburgh, Fairfax County (VA) Public Schools, and the Open University (U.K.). An early version used AVI as the video compression format, which required only 10 Mbytes per minute of source video to achieve VHS (256x240 pixels) quality; a later version uses the MPEG or MPEG-II digital video format. The video data is automatically segmented to detect cuts, scenes, and shots that are used as index entry points. The video library system will be deployed at Carnegie Mellon University and local-area K-12 schools and will be used by QED Enterprise and the Open University of the U.K. to pursue applications in educational and commercial training.

Another research effort in digital video libraries is the Digital Video Library System (DVLS) project at the University of Kansas, which aims to develop technologies for storing, indexing, searching, and retrieving video and audio information and for sharing data across the Internet or the evolving National Information Infrastructure (NII). A prototype system called VISION has been implemented, and an initial video database has been built from several hours of selected "Nature" videos provided by WNET, "Nova" videos provided by WGBH, and current-events videos provided by CNN.
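Projects such as Informedia and VISION segment video into shots so that each shot can serve as an index entry point. A common baseline technique, though not necessarily the one these systems use, is to flag a cut wherever the gray-level histogram changes sharply between consecutive frames. The sketch below assumes frames arrive as 2-D NumPy arrays:

```python
import numpy as np

def detect_cuts(frames, threshold=0.4, bins=16):
    """Return frame indices where the histogram difference from the
    previous frame exceeds the threshold (candidate shot cuts)."""
    cuts, prev = [], None
    for i, frame in enumerate(frames):        # frame: 2-D uint8 array
        hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
        hist = hist / hist.sum()              # normalize frame size away
        if prev is not None:
            diff = 0.5 * np.abs(hist - prev).sum()   # in [0, 1]
            if diff > threshold:
                cuts.append(i)                # shot boundary = index entry
        prev = hist
    return cuts
```

Choosing the threshold is the hard part; as the conclusions chapter notes, there is no agreed definition of a scene change, so any fixed threshold trades missed cuts against false alarms.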
5.7 HEALTH AND MEDICINE
With advances in video database technology, multimedia medical information systems can be used to manage huge data files, which include patient images (such as x-rays and 3-D scans) and videos (such as surgical procedures), as well as other text files (medical records). They can also support collaboration between hospitals and medical experts who may be far from each other during a critical situation or operation. Video database technology is helping to create a revolution in the current medical system; some very exciting examples are telemedicine, telesurgery, and telediagnosis.

An example of a practical application in this direction is TeleMed, a joint project of the National Jewish Center for Immunology and Respiratory Medicine and Los Alamos National Laboratory. Physicians at the National Jewish Center for Immunology and Respiratory Medicine in Denver, CO, are experts in pulmonary diseases and radiology. They are helping patients combat the effects of tuberculosis and other lung diseases throughout the nation. These individuals are an expensive and scarce resource who travel around the country to share their expertise with other physicians. To make their knowledge and experience available to a wider audience, Los Alamos National Laboratory has developed a telemedicine system called TeleMed that is based on a national radiographic repository located at Los Alamos National Laboratory. Without leaving their offices, participating doctors can view radiographic data via a sophisticated multimedia interface. With the new system, a doctor can match a patient's radiographic information with the data in the repository, review treatment history and success, and then determine the best treatment. The features of TeleMed that make it attractive to clinicians and diagnosticians also make it valuable for teaching and presentation. Thus, a resident can use TeleMed for self-training in diagnostic techniques, and a physician can use it to explain to a patient the course of an illness. TeleMed is deployed over a broadband T3 network between Los Alamos, the National Jewish Center, and the National Institutes of Health in Bethesda, MD. The Centers for Disease Control in Atlanta, GA, the Bureau of Tuberculosis Control in New York, NY, and the Department of Health Services in Los Angeles, CA, joined the network at a later time. In the future, TeleMed will run over the National Information Infrastructure (NII) and enable doctors throughout the country to make use of the expertise of diagnostic experts, improving the quality of health care while reducing its cost.
5.8 COMMUNICATION
Digital video databases can be used as data resources for the broadcasting of video data. Digital video broadcasting is very useful for companies whose users constantly need to monitor news or other live broadcasts for information that might affect their business decisions; examples include financial analysts and investors at Wall Street firms, government agency personnel, and researchers at the national laboratories. Other companies or communities need the capability to broadcast information on a company-wide basis for training or other corporate communication purposes.

On April 15, 1995, KOLD Channel 13 in Tucson, Arizona, became the world's first TV station to use a networked digital video server in its daily on-air operations. The broadcast video server, provided by HP, replaced the station's robotic tape carousel for broadcasting digital commercials. It automatically stores, plays back, and converts digitized programs, such as commercials, promotional spots, and public service announcements, into analog signals that can be received by standard television sets. In the future, it will be used for applications such as time-delayed network programming and news editing, making it the nerve center of KOLD's digital broadcast environment.

On March 21, 1995, Digital Equipment Corporation announced the introduction of a digital video broadcast system called the AlphaStudio Broadcast System. The new system is based on the company's 64-bit Alpha chips; its integrated high-performance computing and storage provide reliable, productive, and flexible full-motion imaging service. The system contains four major components:
• Automation Control: a communication system that controls the flow of video material;

• Network Backbone: the software architecture that efficiently distributes and manages the video data;

• Broadcast Record/Edit/View (REV) System: a system that provides for commercial playback, digital video disk recording, and non-linear editing; and

• Video Library Server: a server that allows the archival storage of thousands of hours of video data.
The price tag of this system is $150,000 and up. Digital also reached a nonexclusive cooperative agreement on interactive information and entertainment systems with Alcatel in 1995. The two companies integrated Digital's second-generation media server technology with Alcatel's network switching equipment to distribute interactive information and entertainment services over traditional phone networks as well as emerging broadband networks.

Sun Microsystems Computer Company (SMCC) announced ShowMe (TM) TV software in November 1994. ShowMe TV is a powerful new audio and video communications product that delivers broadcast TV to the desktop. It consists of two primary components: the ShowMe TV Transmitter and the ShowMe TV Receiver. The ShowMe TV Transmitter broadcasts video and audio from a standard tuner, camera, VCR, or file to any workstation or server on the network. It uses efficient compression techniques to broadcast over the existing network without disrupting normal data flow or use of the network. The ShowMe TV Transmitter requires a standard SunVideo (TM) board in the workstation or server.
The ShowMe TV Receiver allows the user to display, control, and record program material that is broadcast over the existing local area network. The VCR feature allows the user to record any broadcast onto a local or remote disk.

Sony Corporation and Oracle Corporation signed a letter of intent on February 16, 1995, to work together to develop video, audio, and text news database products (the digital newsroom) for the professional broadcast and production industry. The two companies plan to combine Oracle's expertise in multimedia database management with Sony's strength in television broadcast and production to create a new digital electronic newsgathering (ENG) video system. The system will form a key element of Sony's digital broadcast station system, incorporating acquisition, production, master control, archiving, and play-to-air, and will integrate Oracle's Media Server, ORACLE 7, and Cooperative Development Environment (CDE).

Another application of video database systems in communication is video conference recording and archiving. Video databases can be used to manage the session recordings of video conferences and provide services such as content-based video conference session searching and browsing.
5.9 LAW ENFORCEMENT
One important application area of video database management systems is law enforcement. Possible examples are face or fingerprint databases, crime scene video databases, and automatic access control. We next describe two efforts in this direction:
FACEit! is a facial image composition, recognition, and retrieval prototype system developed at the Institute of System Science (ISS) of the National University of Singapore. It integrates anatomical facial-feature-based matching, fuzzy inference, and image aging techniques to provide a powerful and user-friendly face image database retrieval engine. Its possible applications include mugshot photo-fitting and recognition for criminal investigation, automated human surveillance for access control, personal identification verification for security screening, forensic face reconstruction, and face plastic surgery. The system runs on an ordinary workstation with UNIX and the X Window System.

Another interesting prototype system (with an almost identical name) is FaceIt (current version 1.0), which runs on Silicon Graphics machines with video boards and is available free of charge to interested users (via anonymous ftp to venezia.rockefeller.edu under the /pub/faceit directory). FaceIt was developed by Joseph J. Atick, Paul A. Griffin, and A. Norman Redlich in the Laboratory of Computational Neuroscience at Rockefeller University and can recognize faces from live video input. Users must build a face image database when they first use the system, which can easily be done by running the system in the Gather mode. The camera will stop automatically whenever there is a new face it cannot recognize, and the user can choose to add or skip the face image. The Gather mode can be stopped at any time to enter the Recognize mode. In the Recognize mode, FaceIt will notify the user whenever it recognizes a face from the video input and will output the face image as well as the person's name retrieved from the database. The face recognized in the video input will also be highlighted. There are four built-in modes that run in the Recognize state:
• Stranger mode. The system automatically looks out for people not in the database. It can sound an alarm when it sees a stranger and store an image of the stranger, so that at any time the user can scan through all the stranger images to see who has been caught.

• Surveillance mode. FaceIt keeps track of complete time and date information, together with the cumulative number of times certain given people in its database have been seen.

• Screen lock mode. The system works like a standard screen lock, except that it automatically unlocks the screen when the user returns and looks into the camera. In the meantime, other people are denied access until the user looks into the camera and unlocks it.

• Search mode. In this mode, a large video database can be automatically scanned to look for all frames in which a particular individual (such as a famous person) appears.

The FaceIt system can be used in many interesting applications, such as restricted building entrance monitoring and access control, criminal recognition, etc.
5.10 CONCLUSION
Like other technologies, video database applications are market driven; video-on-demand is a good example. The full potential of the video database market depends on how well researchers and manufacturers work together to integrate video databases with other technologies (high-speed networks, fast data storage, transaction and security support, massively parallel computing and databases, and video and image recognition) to provide cost-effective solutions to users' needs. Video databases will be an integral part of the NII (National Information Infrastructure) or GII (Global Information Infrastructure), providing complete interactive multimedia services to customers, and will have a vast application market in the near future.

In the previous sections, we talked about applications of video databases and various efforts at implementation. This technology is progressing rapidly, and by the time this book is printed, we expect that several novel applications will have appeared. The application of video database technology is currently in its initial stage, and much of the marketing activity started only in the past two years. As a matter of fact, some video application services are still priced too high for customers, and only very limited user trials have been done. Also, many application areas have not been touched yet, and many problems are waiting to be solved.
6 CONCLUSIONS
In the previous chapters we have given a complete overview of what makes video database technology work and what products can be found in the market. Further, we have given a brief overview of sample applications and how they further the cycle of innovation that started with relational databases, was enhanced with BLOBs, and now extends to video storage and multimedia presentation tools. There is no question that adding video to the database's media increases its impact above and beyond traditional applications, into more fascinating areas. Databases now can store not only customer information but also customers' pictures; not only disconnected tabular data about the results of a chemistry experiment but also a video of the experiment itself. It is laboring the obvious to say that the potential of this technology is endless. The goal of this book is to shed some light on that potential.

There is also a lot that still remains to be discovered about this technology. Eventually, a total integration of television, PCs, and the network will unleash the ultimate demand for the integration of digital video from work, education, and entertainment, inter alia. Besides the technology, the only stumbling block in our opinion is the content. Technology alone is not sufficient to open up the commercial and educational aspects of this technology. For example, just setting up a digital video server in a university is not enough; professors have to spend time developing content materials that can be used with this technology. As more courses become available digitally, the idea of distance education will be a given and not a luxury.

The contents of this book are a necessary first step to understanding the technology on which this phenomenal new adventure is based and also to gaining a
realistic glimpse of the commercially available tools and products that will help make it happen. The first three chapters briefly overview some of the most essential issues that have to be dealt with in order to understand and promote this field of science and engineering. Chapter 2, specifically, summarizes in simple terms the conceptual issues involved with video databases. Such brief treatment of so many complex issues cannot be highly detailed; like the rest of the book, our intention is for the coverage to be comprehensive rather than deep. Issues central to video databases are dealt with there, starting with data models, moving through video scene analysis and segmentation and video indexing and organization, and ending with data query and retrieval. In Chapter 3, we go beyond database-related issues and discuss infrastructure questions such as compression, video servers, file systems, and network support; Chapter 3 deals with hardware and software as well as network issues. Again, the coverage is meant to educate, show the options, and give pointers.

In Chapters 2 and 3, we discuss and summarize important issues in video database research. Despite the fact that there are considerable ongoing efforts in various research directions, most of them sit either on the image processing and computer vision end of things or on the database side; work that combines the state of the art of the two fields is highly desirable for furthering video database research. Also, many research issues are not yet fully defined, which causes problems for researchers in those directions. One example of such ill-posed problems is the notion of similarity between images, which is one of the basic questions in video and image databases. How does one measure the similarity between two images, and when does one say that two images are similar? The absence of a precise metric causes problems in the query, retrieval, and indexing of video databases (a simple candidate metric is sketched below). Likewise, the lack of a clear definition of what a scene change is makes it difficult to evaluate the existing SCD algorithms for temporal sampling of video data. In the future, we also expect to see more work in areas like automatic video annotation and video-related network technologies.

Chapters 2 and 3 are intended for the practitioner who wants to understand the terminology, concepts, and issues at a level of detail that cannot be found in the trade press. The level of detail is also enough to initiate a computer science researcher who is not familiar with the topic; it is not intended for experts in the field of video databases. The survey nature of these two chapters suffices to identify, highlight, and explain the most important concepts as well as to provide a comprehensive set of references to additional and primary reading materials in the field.
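As an illustration of the similarity problem raised above, consider one popular candidate metric, normalized histogram intersection (after Swain and Ballard). It is cheap to compute, but two images with similar gray-level distributions score near 1 even when their contents differ, which is exactly the ambiguity just described. The function below is a generic sketch, not tied to any particular system:

```python
import numpy as np

def histogram_intersection(img_a, img_b, bins=16):
    """Similarity in [0, 1]: 1.0 means identical gray-level
    distributions -- not necessarily identical pictures."""
    ha, _ = np.histogram(img_a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(img_b, bins=bins, range=(0, 256))
    ha = ha / ha.sum()
    hb = hb / hb.sum()
    return float(np.minimum(ha, hb).sum())
```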
Chapters 4 and 5 contain something unique to this monograph. We studied both the state of the market (products) and the state of practice (case studies). These two chapters provide a single source of information about the current (at the time we go to press) nature of the commercial ventures, present and possible, in this field. This starting point is invaluable to anyone wanting to answer the questions What is "doable" with today's off-the-shelf technology? and Which company offers which products? In essence, we did the footwork for someone wanting to invest in this technology. We surveyed products not only in the area of video databases but in related areas as well; in effect, we investigated the products that are needed to effect computer-based video activity. Similarly, the applications we investigated range from very mature areas to innovative applications. Chapter 5 summarizes applications for the layman. Again, no great detail is given; rather, many pointers are given to sources where more information can be found. In addition to these two chapters, a list of useful URLs is provided. The purpose of the URL list is twofold: to help the reader obtain additional information and to provide sources from which a reader can obtain updated information. We expect continuous improvement of the products, with new releases out frequently and new product innovations appearing regularly, and we want the reader to be able to easily update the data they receive from reading the book.

In Chapter 4, we survey some of the video products that are needed to implement a VDB, and in Chapter 5 we provide case studies of the applications. Currently, these video products and applications share something in common: high cost. A high-end video board can easily cost several thousand dollars, which is almost equal to the cost of a personal computer and hardly acceptable to the ordinary user; on the other hand, low-end video boards seldom satisfy users' expectations. The applications discussed in Chapter 5 sound exciting, and much effort and money have been invested in them. However, wide acceptance of video databases and their applications has not yet been realized, mainly because of cost and performance. Take video-on-demand (VOD), for example. For the last few years, VOD has been the "killer application" that everyone was talking about, backed by major entertainment and communication companies. Many user trials have been carried out, as we described in Chapter 5; however, many of these trials are now on hold or have been stopped. One example is the Time Warner trial in Orlando, Florida, a collaborative effort with Silicon Graphics, AT&T, and Scientific-Atlanta; it was stopped recently after billions of dollars had been invested. One of the reasons is that the cost is so high that very few people are willing to sign up and pay for the services. Today's set-top boxes are being sold for around $2,000, and it is predicted that the set-top
box has to be in the $100 to $200 range to make VOD economically practical. Other reasons are performance and the quality of the video and service. Current VOD trials use MPEG-I video streams, which are similar in quality to S-VHS video. VOD users find MPEG-I video a little disappointing, since it has problems with fast-moving scenes, such as sports programs. MPEG-II has broadcast quality, but the system is too expensive to be used in practice; its encoder, for example, costs around $250,000. Another problem with current VOD is that the related techniques have not yet matured enough for it to bloom fully. Concurrent access to stored data is one example: distributing video in response to thousands of simultaneous user requests poses many new challenges to researchers. There is quite a lot of research work going on, some of which we introduced in Chapters 2 and 3. As the products on the market become more and more cost-effective, we believe that applications such as VOD will become acceptable to users in the near future in terms of both performance and cost.
A USEFUL URLS

A.1 BACKGROUND AND OVERVIEW
• Digital Video World Wide Web page, which contains much useful information about organizations, markets, researchers, etc. http://www.well.com/user/rld/vidpage.html

• A Survey of Distributed Multimedia Research, Standards and Products. http://cuiwww.unige.ch/OSG/MultimediaInfo/mmsurvey/

• Video Demystified: A Video Handbook for the Digital Engineer. http://www.netstorage.com/kjack/video_demystified.html

• Videonics Video Glossary, a comprehensive list of video terms. http://www.videonics.com/Video-Glossary.html

• Glossary of Database Terminology. http://www2.baldwinw.edu/~rmolmen/glossary.html

• Videoconferencing and Telecommunications Glossary. http://www.videoserver.com/htm/glossary.htm

• Telecom Glossary, a collection of telecommunication terms. http://www.wiltel.com:80/glossary/glosv.html
This chapter is in part based on contributions made by Lizhen Chen, who is a graduate student in the Computer Sciences Department, Purdue University.
• Multimedia Glossary. http://www.ima.org/tools/glossary.html

• HP Digital Video: Seeing Is Believing, a quick course on digital video. http://www.dmo.hp.com/wsg/ssa/digvideo.html

• Digital Video Primer. http://www.optibase.com:80/primer.html

• MPEG Primer. http://www.optibase.com:80/primer2.html

• Digital video technology overview. http://www.sc.ist.ucf.edu/~OTT/2A/2_4_3/index.htm

• Video Delivery Systems, written by David Sharpe et al., EET Encyclopedia of Educational Technology. http://edweb.sdsu.edu/edweb_folder/EET/EET.html

• Tutorial: Image Databases on the WWW, written by Andreas Bittorf et al. http://www.uni-erlangen.de/docs/derma/personen/asbittor/www-tut/tut_j.htm

• LOCAL foilset: First Part of Video Server Presentation for HPDC95 Tutorial. http://www.npac.syr.edu/users/gcf/hpdc95videoA/fullhtml.html

• Making Sense of Multimedia, presented by Intel. http://www.intel.com/procs/homepc/vollno2/septech.htm

• Collection of MPEG/QuickTime video sites on the Internet. http://www.sover.net/~ren/lan.html

• Multimedia and authoring resources provided by the Northwestern University library. http://www.library.nwu.edu/media/resources/multimedia.html
A.2 RESEARCH ISSUES

A.2.1 Image Processing Techniques
• Evaluation of a Video Image Detection System (VIDS): Final Report, written by B. H. Cottrell, Jr. http://www.bts.gov/smart/cat/cottre.html
• Hausdorff-Fraction Motion Tracking on a Clustered Workstation, written by Matt Welsh et al. http://www.cs.cornell.edu/Info/People/mdw/cs631/project.html

• A Feature-Based Algorithm for Detecting and Classifying Scene Breaks, written by Ramin Zabih et al., Cornell University. http://www.cs.cornell.edu/Info/People/rdz/MM95/mm95html.html

• A Unified Approach to Temporal Segmentation of Motion JPEG and MPEG Compressed Video, written by Boon-Lock Yeo et al. http://www.ee.princeton.edu:80/~yeo/anacomp/scenechange.html

• Knowledge-Based Video Motion Detection System, developed by Sandia National Laboratories. http://www.sandia.gov:80/ttrans/ipo/Abstracts/5613.html

• Realtime Video Contour Detection, presented by the CARDIT group. http://einstein.et.tudelft.nl:80/~stout/mccd/index.html

• Motion Detection and Tracking research in the Computation & Neural Systems Program, California Institute of Technology, Pasadena, California. http://www.cns.caltech.edu:80/cns248/motion.html
A.2.2 Video Indexing and Annotation
• Video/image analysis for annotation and indexing at IBM. http://www-i.almaden.ibm.com/cs/video/video_anno_ext.html

• Image indexing and retrieval: some problems and solutions, proposed by Graeme Baxter et al. http://www.mcb.co.uk/services/articles/documents/nlw/baxter.htm

• Similarity Indexing: Algorithms and Performance, written by David A. White et al. http://vision.ucsd.edu/papers/sindexalg/

• A Content-Based Video Indexing System. http://www.tisl.ukans.edu/~wlee/papers/d196/subsection3_4_a.html

• Conceptually Indexed Video: Enhanced Storage and Retrieval, at SunLabs. http://www.sun.com/960201/cover/video.html

• A Video Parsing, Indexing and Retrieval System. http://www.iss.nus.sg/RiD/MS/Projects/vc/videmol.html
• Research on the possibility of automatic indexing of lecture videos by Harvard's Ubiquitous Information Project. http://rioja.harvard.edu/~dweinst/UI.html

• Overview of a technical demonstration on video indexing. A user can submit a query (either by typing or by voice) to the indexing system and have the system return a set of references into a video tape library that are appropriate to the query. http://www.arpa.mil/sisto/symp/Demos/HLS/PR_VIDEO.html

• Research on Video and Image Libraries: Browsing, Retrieval, Annotation at MIT. http://www-white.media.mit.edu:80/~roz/res-diglib.html

• OUTLAND's Video Annotation Systems can display data on a standard CCTV monitor. If a video camera is used, that data can be overlaid on the live video picture. http://www.accesscom.net/outland/otivpm1.htm

• The Video Classification group at ISS has developed tools supporting video parsing, indexing, and retrieval, based on video content. http://www.iss.nus.sg/RND/cs/vc.html

• Indexing & Retrieval for Single Image and Video. http://aguirre.ing.unifi.it/~ist/idbms.html

• Demo: Image/Video Indexing & Retrieval. http://marge.genie.uottawa.ca/demo/demo5.html

• Demo: Representation, Similarity, Indexing of VDB. http://vision.ucsd.edu/~deborah/infra/jain/ret.html

• Media Streams: an iconic visual language for video annotation, written by Marc Davis. http://www.nta.no/telektronikk/4.93.dir/Davis_M.html

• Video Indexing via Closed Captioning, written by Michal Kurmanowicz. http://trurl.npac.syr.edu/EFP/michal.report.html

• Integrated Video Archive Tools, written by Rune Hjelsvold et al. It discusses what kinds of tools a digital video archive should offer its users and describes an experimental video archive system that consists of tools for playing, browsing, searching, and indexing video information. http://www.idt.unit.no:80/~videodb/artikler/ACM-MM95/paper.html
• Video Database Interface Project in the Vision Interfaces and Systems Laboratory, UIC. http://www.eecs.uic.edu/~rbryll/

• Video Database: Data Structure & Interface, prepared by Robert Bryll. http://www.eecs.uic.edu/~rbryll/videodb.html
A.2.3 Video Database Query
• JACOB: Content Based Query System for Video Databases, a prototype system allowing content-based browsing and querying in video databases. http://wwwcsai.diepa.unipa.it/research/projects/jacob/

• IBM QBIC System (Query by Image and Content). http://wwwqbic.almaden.ibm.com/~qbic/qbic.html

• Query Videoconferencing Database. http://www.ccm.ecn.purdue.edu/information/research/projects/videoconf/database/form.html

• MPI-Video Database. http://vision.ucsd.edu/papers/mpiv-arch/node12.html

• The VIQS Video Indexing and Querying System Project at the University of Maryland. The research develops a simple SQL-like video query language, develops polynomial-time algorithms to process such queries, and builds a prototype video retrieval system called VIQS. http://www.cs.umd.edu/users/vs/mnl/viqs/

• IBM Visualizer Ultimedia Query for OS/2. http://199.246.40.99:80/ap/visualizer/umq.html

• Fast Multiresolution Image Querying, a strategy for searching through an image database in which the query is expressed either as a low-resolution image from a scanner or video camera or as a rough sketch painted by the user. http://www.cs.washington.edu/research/projects/grail2/www/Projects/query.html

• ByteBuster Video tries to simulate a dynamic database browser using the World Wide Web. http://wolf.cs.yorku.ca:80/People/frank/4361/bl.html
• Audio/Video Databases: An Object-Oriented Approach. http://cuiwww.unige.ch/OSG/Abstracts/AudioVideoDatabases.html

• EVA, a multimedia database system capable of storage, retrieval, management, analysis, and delivery of objects of various media types, including text, audio, images, and moving pictures. Different ways of querying in EVA are discussed by Forouzan Golshani. http://enws396.eas.asu.edu:8080/mmis/activities/multimediadb.html

• VideoSTAR, A Database for Video Information Sharing, written by Rune Hjelsvold. http://www.idt.unit.no/IDT/grupper/DB-grp/tech_papers/hjelsvold95.html
A.3 OTHER RESEARCH ISSUES

A.3.1 Video Data Compression
• A three-dimensional segmentation-based approach for the compression of video sequences. http://teal.ece.ucdavis.edu/ispg/3D_seg_demo/demo.html
• Introduction to Digital Video Coding and Block Matching Algorithms. http://atlantis.ucc.ie/dvideo/contents.html

• Binary Tree Predictive Coding, a lossless and lossy compression method for photos, graphics, and multimedia images; evaluation code is available. http://monet.uwaterloo.ca/~john/btpc.html

• Sharable parallel image compression algorithms, developed for massively parallel architectures to meet NASA's needs for sharing real-time video and image compression. http://cesdis.gsfc.nasa.gov:80/hpccm/annual.reports/ess94contents/guest.ci/yun.html

• The art of video compression. Applied math: USC Algorithm Speeds Video Image Compression by Taking Its Time, written by Eric Mankin. http://128.125.1.11/dept/News_Service/chronicle_html/1995.10.16.html/The_art_of_video_compress.html
• Toward Digital Video Compression, Part II and Part III. http://www.tsc.hh.avnet.com/digvid2.html and http://www.tsc.hh.avnet.com/digvid3.html

• Fractal Video Compression. http://inls.ucsd.edu/y/Fractals/Video/fracvideo.html

• Fractal Image Compression. http://inls.ucsd.edu/y/Fractals/

• Fractal Image Compression Bibliography. http://dipl.ee.uct.ac.za/fractal.bib.html

• Teltec Ireland, a video coding software group involved in research and development in video compression techniques and algorithms. http://www.teltec.dcu.ie/video/video.html

• Video compression techniques used in the software industry. http://eval.mcit.med.umich.edu:80/projects/video/tech/compress.html

• Video compression for broadcasting, including direct broadcast satellite, presented by Halhed Enterprises Inc. http://www.hei.ca/hei/mpeg2.html

• H.261 Video Coding, a video coding standard published by the ITU (International Telecom Union) in 1990. The coding algorithm is a hybrid of interpicture prediction, transform coding, and motion compensation. http://rice.ecs.soton.ac.uk/peter/h261/h261.html

• Lightning Strike Wavelet Compression, an image compression technology based on wavelets. The decompressor is now available as a Netscape plug-in. http://www.infinop.com/html/comptable.html

• Aware, Inc. has a variety of software-based data compression products for both general-purpose image compression and specialized compression applications. http://www.aware.com/product_info/compression_overview.html

• Some leading codec products and their compression qualities, BYTE magazine. http://www.byte.com/art/9505/sec10/art5.htm

• This site shows how well each codec maintains frame rates in tests of 15 fps and 30 fps video at 320 by 240 pixel resolution on four computer configurations. http://www.hyperstand.com/SITE/Awesome/02.mpeg/chart1.html
• A sample video showing accelerated MPEG compression of dynamic polygonal scenes. http://www.cs.princeton.edu:80/~dwallach/sg94/video.html

• The Moving Picture Experts Group provides useful information about the MPEG compression standard. http://www.crs4.it/~luigi/MPEG/

• List of companies developing video compression equipment. http://www.indra.com/jewels/unicom/videocompression.html

• Papers and source code on data compression. http://www.cs.pdx.edu/~idr/compression/

• Importance of Data Compression in Branch Office Networks, written by Jim Mello et al. The paper outlines where the use of data compression makes most sense, discusses the more commonly implemented forms of data compression, and suggests the most efficient data compression option for environments with mixed public and private framed data networks. http://www.mot.com/MIMS/ISG/Papers/Data_Compress/

• Collection of image and video compression sites by Jo Yew. http://www.eadcam.nus.sg/~ele22091/Proj.html#Others
A.3.2 Computer Networks
• Five US Gigabit Testbeds, funded by NSF, ARPA, and industry, that began in 1990. The project has two major goals. The first goal is to develop architectural alternatives for consideration in determining the possible structure of a wide-area gigabit network serving the research and education communities. The second goal is to understand the utility of gigabit networks to the end user. The five US Gigabit Testbeds are

1. AURORA: MIT/VuNet .. Bellcore/Osiris. http://www.cnri.reston.va.us:4000/public/overview.html#aur

2. BLANCA: NCSA. http://www.ncsa.uiuc.edu/General/CC/CCBLANCA.html

3. CASA. http://www.nlanr.net/CASA/casa.html

4. NECTAR. http://www.cs.cmu.edu/afs/cs.cmu.edu/project/nectar/WWW/Nectar.html

5. VISTAnet. http://www.mcnc.org/HTML/ITD/ANT/VISTAnet.html
• Other gigabit testbeds. http://www.cnri.reston.va.us:4000/public/links.html

• Bay Area Gigabit Testbed (BAGNet), a large-scale ATM metropolitan-area network for fourteen organizations in the San Francisco Bay Area. This network investigates the computer multimedia network infrastructure needed to support a diverse set of distributed applications in such an environment. http://george.lbl.gov/BAGNet.html

• MIT/VuNet, a gigabit-per-second desk/local-area ATM network that interconnects general-purpose workstations, network-based multimedia devices, and bridges to other networks. http://www.tns.lcs.mit.edu/vs/vunet.html

• The Tenet Group at UC Berkeley. The group focuses on the design and development of real-time communication services and on network support for continuous media applications such as video conferencing. http://tenet.berkeley.edu/

• The Information Wide Area Year (I-WAY), an experimental high-performance network linking dozens of the country's fastest computers and advanced visualization environments. This network provides the wide-area high-performance backbone for various experimental networking activities, including the Video Server project at Supercomputing '95 (SC'95). http://www.iway.org/

• WEdNet, a statewide network that can connect all schools and educators in Washington. It is a private digital network capable of carrying data, voice, and teleconferencing video information simultaneously. http://www.wednet.edu/wednet/wednetinfo.html

• BATMAN, the Boulder ATM Area Network, an experimental ATM network based on distributed ATM switches. The network interconnects sixteen institutions in the Boulder area. http://www.cs.colorado.edu/~batman/Home.html

• Starlight Networks develops and markets digital video networking software products for use in video-on-demand and live video network multimedia applications. http://www.starlight.com/
• NORTEL Broadband Video Networks offers a comprehensive portfolio of products and services for broadband networks, including elements for the transport and access portions of the network and specialized systems for broadcast video communications. http://www.nortel.com/english/broadband/video.html

• SuperJANET ATM/Video Network. http://www.jnt.ac.uk:80/SuperJANET/video/serviceflyer.html

• Gigabit Networking, Yahoo search results. http://www.yahoo.com/Computers_and_Internet/Networking_and_Communications/Gigabit_Networking/

• Present and Future Telecommunication Networks. http://nic2.hawaii.net/usr-cgi/ssis/lramos/com633/com633.htm
A.3.3 Copyright Protection
• Watermarking & Digital Signature: Protect your work! The Signal Processing Lab and the Laboratoire de Reseaux de Communications have developed a technique for hiding invisible information in images in order to be able to identify their authors. http://ltswww.epfl.ch/~jordan/watermarking.html

• Digimarc Signatures can be used for a wide variety of applications, including verifying copyright ownership, detecting alterations, triggering digital-cash meters, or tracking black-market distribution. http://www.digimarc.com/~digimarc/

• SysCoP (System for Copyright Protection) is a tool that allows the information provider to secretly embed robust hidden copyright labels into still image, motion data, or text PostScript documents. A WWW interface to SysCoP and a demo are also provided. http://sagittarius.igd.fhg.de:64325/

• ACCOPI addresses the issues of access control and copyright protection for broadcast image services, including TV, HDTV, and new multimedia services. http://www.tele.ucl.ac.be/IMAGES/RACE/ACCOPI.html

• The COPICAT project addresses the area of electronic copyright protection. It aims to provide a basis for confidence in electronic copyright protection and to open up a "blocked" market in multimedia electronic publishing. http://albion.ncl.ac.uk/esp-syn/text/8195.html
• Guide to United States Copyright Law as Applied to Multimedia Productions. http://succeed.ee.vt.edu/copyinfo.html

• Multimedia Copyright Information. http://oacl.oac.tju.edu/~rod/copyright.html

• WWW Multimedia Law. http://www.batnet.com/oikoumene/

• Beyond the Future: Multimedia and the Law, written by P. G. Leonard. http://snazzy.anu.edu.au/CNASI/pubs/OnDisc95/docs/ONL10.html

• Ladas & Parry Guide to Statutory Protection for Computer Software in the United States. http://www.ladas.com/GUIDES/COMPUTER/Computer.USA.html

• Copyright Protection for Software: 1996 UPDATES. http://www.ita.doc.gov/industry/computers/cpychng.html

• Legal Care for Your SOFTWARE, a Step-by-Step Legal Guide for Computer Software Writers, Programmers and Publishers, written by Daniel Remer and Robert Dunaway. http://www.island.com/LegalCare/

• Computer Software Protection, written by the Copyright Law Review Committee. http://www.agps.gov.au/customer/agd/clrc/software/homepage.html

• Three Common Fallacies in the User Interface Copyright Debate. http://www.lpf.org/Copyright/laf-fallacies.html

• Software Protection: Patents, Copyrights and Trade Secrets, written by Joseph S. Iandiorio. http://www.nmq.com/EmgBizNC/CntProvs/SvcProvs/IT/Articles/IandArt5.htm

• Study on New Media and Copyright. http://www.nlc-bnc.ca/documents/infopol/copyright/nglfinal.txt

• Legal Protection of Computer Programs and Databases, written by Ilya Nikiforov. http://www.spb.su/rulesreg/legal.html
• How to Protect Intellectual Property Rights in Computer Software, written by R. Mark Halligan, Esq. http://execpc.com/~mhallign/computer.html
• COMPUTER LAW: European Database Protection. http://www.brmlaw.com/doclib/complaw21396.html
• Intellectual Property and the National Information Infrastructure, a preliminary draft of the report of the Working Group on Intellectual Property Rights. http://www.uspto.gov/text/pto/nii/ipwg.html
• "Intellectual Property Online: Patent, Trademark, Copyright", EFF Archive. http://www.eff.org/pub/Intellectual_property/
• UNDP's World Intellectual Property Organization. gopher://gopher.undp.org/11/unearth/organizations/wipo
• Protecting Intellectual Property, maintained by the United States Information Agency. http://www.usia.gov/topics/ip/
• Major Forms of Intellectual Property Protection. http://www.questel.orbit.com/patents/readings/ipr.html
• ARVIC's Guide to Intellectual Property. http://www.arvic.com/
• Copyright in the Public Domain. http://www.benedict.com/public.htm
• Intellectual Property in the Information Age, written by Christopher Chang et al. http://www.seas.upenn.edu/~cpage/cis590/
• Leveraging Intellectual Property Rights. http://www.mccarthy.ca/mt-lipr.html
• CREDO Section II: Summary of Copyright Basics. http://www.ilt.columbia.edu/projects/copyright/ILTcopy2.html
• Copyright Fundamentals. http://www.benedict.com/fund.htm
• Copyright in Australia. http://www.spirit.net.au/~dan/law/swguide/Copyright.html
• U.S. Copyright Law. http://www.prae.com/USCopyright.html
• A Guide to Copyrights. http://www.wsrgm.com/copy.html
• Overview of the U.S. Copyright System. http://utrc08.admin.utk.edu/overcopy.html
• General information about copyrights. http://www.patents.com/copyrigh.sht
• Council Directive 93/98/EEC of 29 October 1993, harmonizing the term of protection of copyright and certain related rights. http://www2.echo.lu/legal/en/ipr/termprot/termprot.html
• Structure of copyright protection. http://edie.cprost.sfu.ca/cjc/amb/structur.html
• A bibliography of copyright protection and intellectual property issues. http://robotics.stanford.edu/users/ketchpel/copyrightandip.html
• Copyright WWW page at the Pennsylvania State University. http://ets.cac.psu.edu/news/Copyright/copyright.html
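Several of the tools above (e.g., the EPFL watermarking work and SysCoP) rest on the same basic idea: imperceptibly altering pixel data to carry an ownership label. As a rough illustration of that idea only, and not of the algorithm used by any product listed here, the following Python sketch hides a byte string in the least significant bits of a grayscale image. The file names are invented, and the Pillow and NumPy packages are assumed.

    # A minimal least-significant-bit (LSB) watermark sketch (illustrative
    # only; production systems use secret-keyed, compression-robust embedding).
    import numpy as np
    from PIL import Image

    def embed(image_path, message, out_path):
        pixels = np.array(Image.open(image_path).convert("L"), dtype=np.uint8)
        bits = np.unpackbits(np.frombuffer(message, dtype=np.uint8))
        if bits.size > pixels.size:
            raise ValueError("message too long for this image")
        flat = pixels.flatten()                              # flatten() copies
        flat[:bits.size] = (flat[:bits.size] & 0xFE) | bits  # overwrite LSBs
        Image.fromarray(flat.reshape(pixels.shape)).save(out_path)

    def extract(image_path, n_bytes):
        flat = np.array(Image.open(image_path).convert("L"),
                        dtype=np.uint8).flatten()
        return np.packbits(flat[:n_bytes * 8] & 1).tobytes()  # read LSBs back

    embed("original.png", b"(c) 1997 Author", "marked.png")
    print(extract("marked.png", 15))                 # b'(c) 1997 Author'

Note that such a naive mark is destroyed by lossy compression or even slight re-quantization, which is precisely why the research projects above pursue robust embedding.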
A.4 MARKET AND PRODUCTS
A.4.1 Commercial Video Boards
• Parallax Graphics, Inc. uses its unique VideoStream technology to display and capture high-quality digital video images for performance-intensive applications like desktop videoconferencing, video editing, and video-on-demand. Products include:
1. XVideo, PowerVideo, and MultiVideo for Sun workstations.
2. XVideo 700, PowerVideo 700, and MultiVideo 700 for HP workstations.
3. XVideoPCI, PowerVideoPCI, and MultiVideoPCI for IBM PCI-based workstations running AIX.
http://www.parallax.com/home.html
• Vidboard is a video capture and processing board that connects directly to VuNet. It can generate video streams with a wide variety of characteristics governing both the presentation of video and its transport across a network. Devices within the system use a set of ATM protocols to request video from the Vidboard. http://www.tns.lcs.mit.edu/vs/vidboard.html
• SILICON VIDEO MUX is a single-slot imaging board that provides the PC/AT with an interface to high-bandwidth, high-resolution video sources. Video can be digitized from almost any video format: standard RS-170 cameras, CCIR cameras, VCRs, high-scan-rate graphic CRT displays, medical CRT displays, high-frame-rate cameras, high-resolution cameras, or VCRs in pause mode. http://www.epixinc.com/epix/svmtext.htm
• Sirius Board (SGI). http://viswiz.gmd.de/DML/hw/Video/sir.html
• Bitfield H.261 Video Compression Board. http://www.bitfield.fi/techspec.html#H261
• OmniMedia Technology, Inc.'s products include video capture/playback add-on cards and H.261/320 codec cards. http://www.omt.com/
• FullVideo Supreme, Digital's first PCI video board. http://www.digital.com/info/Customer-Update/950411011.txt.html
• The MiroVIDEO DC1 tv video capture/editing board captures video and sound from any video source via standard RCA connectors. http://tcp.ca/Nov94/Miro.html
• Motion-Video Board developed by Orchid Technologies. http://www.zdnet.com/cshopper/features/94best/best09.html
• VSC Video Board, HEBER's Pluto amusement machine control system. http://www.heber.co.uk/vsc.htm
• Picasso Video Capture Board for IBM compatibles, developed by Digital Media Labs. http://www.dmlweb.com/
• Video Board Products offered by PINE Group, a Hong Kong family company. http://www.pinegroup.com/
• CVM Video Board developed by Digital Designs and Systems, Inc. http://www.dideas.com/index.html
• VIDEO-IT! and VIDEO BASIC still-image and video capture boards and software, products of ATI Technologies. http://www.atitech.ca/
• MobilePlanet's NOGAVISION! is the first PCMCIA Type II video card to capture full-motion video and sound. http://www.mplanet.com/index.html
• Hauppauge delivers digital video boards for PCs, including Win/TV-Celebrity, Win/TV-HighQ, Win/TV-CinemaPro, Win/TV-Prism, and Win/Motion60. http://www.hauppauge.com/hew/index.htm
• Willow Peripherals' Video Capture Products. http://WWW.WILLOW.COM/PERIPHERALS/
• A list of Mac- and PC-based video capture products from the Educational Technology Team, San Diego City Schools. http://edtech.sdcs.k12.ca.us/josh/dvrg/
A.4.2 Video Processors
• Elite Video presents the BVP4 Broadcast Video Processor. http://www.elitevideo.com/
• Colorado Video, Inc.'s Video Signal Processors. http://www.optics.org/colorado-video/colorado-video.html
• Datum Inc.'s 6VTI Video Time Insertion Module. http://www.datum.com/
• AuraVision Video Processors (the VxP524 Video Stream Processor and the VxP505 Video Processor) are suited for applications such as full-motion video capture, multiformat video playback, video editing, videoconferencing, multimedia presentations, and multimedia authoring. http://www.liberty.com/home/auravision/
• HP's new PA-RISC processor decodes MPEG video. http://www.hp.com/wsg/strategies/parisc3.html
• Colonel Video & Audio presents a list of video editing equipment. http://www.phoenix.net/~colonel/default.html
• Tseng Labs, Inc.'s ET6000 graphics and video processor. http://204.140.244.22/wire/nov95/nov16-2.htm
• ComPu-2000 IBM Aptiva, a Pentium microprocessor plus IBM's powerful media processor. http://www.stargate.ca/Compu/html/cmpu1003.htm
• An explanation of poor video performance with PCI video cards under NEXTSTEP 3.3. http://www.next.com/NeXTanswers/HTMLFiles/1823.htmld/1823.html
• 8x8, Inc.'s Multimedia Processor Architecture (MPA) includes:
1. Video Communications Processor (VCP).
2. Low Bitrate Video Processor (LVP).
3. Multimedia Playback Processor (MPPex).
http://www.8x8.com/index.html
• Video Blaster RT300, a real-time video capture and compression solution, lets you capture and create high-quality, full-motion video on your PC and play it back on any 386. http://www.creaf.com/wwwnew/complex/products/video/vbrt300.html
• Desktop Video Editing Products developed by Electronic Mailbox. http://www.cris.com/~videoguy/dtv.htm
• Digital's 21130 chip, a PCI-based video and graphics accelerator for CISC and RISC PCs that delivers high-performance video and graphics from a single chip. http://www.digital.com/info/pr-news/95030602PR.txt.html
A.4.3 Video Storage Media and Systems
• Philips CD-i. http://www.acs.brockport.edu:80/cd-i/
• Colorado Video, Inc. presents the following Video Memories and Video Integration Memories:
1. 441 - Frame Store.
2. 441RGB - Frame Store.
3. 441Y/C - Frame Store.
4. 491 - Video Frame Store.
5. 442 - Video Subtractor.
6. 443 - Video Peak Store.
7. 499 and 599 - Video Multimemories.
8. 440 and 440A - Field Store and Frame Store.
9. 444 - Integration Memory.
10. 446RGB - Integration Memory.
11. 446Y/C - Integration Memory.
12. 503 - Exposure Control.
http://www.optics.org/colorado-video/colorado-video.html
• Silicon Graphics products:
1. PHOTRON DVDA-1 Digital Video Disk Array.
2. STONE 1086.
3. Quick-Frame.
http://www.sgi.com/
• CLARiiON: Advanced Storage Solutions. http://www.helpwanted.com/hwdocs/company/clariion/clahp.htm
• Data Storage Products from EMC:
1. Extended-Online Storage Solutions, EMC's new line of high-capacity, disk-based storage systems.
2. Storage Management Systems, EMC's centralized solution for the open-system enterprise, providing high-performance, easily managed, distributed database backup.
3. Mainframe Storage Solutions: EMC's Symmetrix 5000 Systems, Symmetrix Data Migration Services, and Symmetrix Remote Data Facility.
4. Storage Solutions for Open Systems: EMC's Symmetrix 3000, Centriplex 1000, and Open Harmonix.
5. Storage Solutions for AS/400: EMC's Symmetrix 3000 ICDA Systems for AS/400, Harmonix HX3SR AS/400 DASD Storage Systems, Harmonix HX-ACAB, and the Enhanced Remote Maintenance Processor.
6. EMC Media Servers, for high-capacity, high-availability, scalable, and reliable video service.
http://www.emc.com/
• Netvideo provides Internet Video Storage and Distribution Services. http://www.netvideo.com/netvideo/advantages.html
• High-performance, fault-tolerant, open, SCSI standards-based, enterprise-wide storage systems developed by Storage Computer. http://storage.com/bio.html
• NewTek's Video Toaster Flyer, a digital video storage and editing system. http://www.newtek.com/medium/index.html
A.4.4 Video Input/Output Devices
• IBM's Ultimedia Video I/O Adapter, a multipurpose video adapter for RS/6000 workstations, supports the output of YUV video data as NTSC or PAL. http://www.austin.ibm.com/hardware/Adapters/ultimedia.html#topic7
• Image Manipulation Systems, Inc. is a Minnesota-based corporation specializing in high-quality multimedia cards. Current products include:
1. IM1002 - SBUS NTSC video output card.
2. SC3000 - Sun SPARC SBUS H.320/H.261/G.728/JPEG/ISDN teleconferencing card.
3. PCI3000 - PCI bus H.320/H.261/G.728/JPEG/ISDN teleconferencing card.
4. SC3100 - SBUS NTSC/PAL video I/O card.
5. VJ3000 - VME NTSC/high-resolution JPEG CODEC output card.
http://www.imageman.com/
• Silicon Graphics products:
1. V-PORT, video output for the Silicon Graphics Indy, completes the Indy platform by providing broadcast-quality video output on a single-width GIO plug-in card.
2. Diskus Digital Disk Recorder, a real-time graphics disk recorder, provides an elegant interface to all SGI workstations for both video recording and remote control.
3. Avion, a new hardware and software "solution" providing broadcast-quality video in and out of the SGI Indigo2 platform.
http://www.sgi.com/
• EDT specializes in the design and manufacture of SBus interface cards, providing connectivity between Sun SBus computers and external devices. Its products include:
1. SDV - digital video camera interface for the Sun SPARCstation.
2. SIV - video capture and display for the Sun SPARCstation.
3. S53B-1 - SBus to MIL-STD 1553B serial interface for the Sun SPARCstation.
4. S11W - SBus to DR11W parallel interface for the Sun SPARCstation.
5. S16D - high-speed, 16-bit I/O interface for the Sun SPARCstation.
6. SCD-20 - 20 MB/second configurable DMA interface for the Sun SPARCstation.
7. SCD-40 - 40 MB/second configurable DMA interface for the Sun SPARCstation.
8. MARK-10/16/20 - 10, 16, and 20 MB/second disk arrays for the Sun SPARCstation.
9. KATO Driver Analyst - a real-time, X Window System graphical analysis tool for the S16D and S11W.
10. Cable assemblies for the S11W, S16D, SCD-20, and SCD-40 interface cards.
http://www.edt.com/Products.html
• Willow Peripherals' Video Output Products:
1. VGA-TV 4000 is a VGA-compatible graphics card that provides simultaneous NTSC composite video and S-Video outputs. Up to eight separate VGA-TV 4000s can coreside in a single PC, allowing system designers to control multiple independent video channels from one computer.
2. The LaptopTV is Willow's external VGA-to-NTSC scan converter that turns the output from PCs and Macs into standard television video.
http://WWW.WILLOW.COM/PERIPHERALS/
• VID series video transmitters developed by Micro-Video Transmitter World Inc. http://www.canadamalls.com/mvworld.html
A.4.5 Video Server Systems
• TNC's Cheetah video server has two new capabilities: the ability to deliver more streams than any other player, and the inclusion of an optical real-time encoder in the server architecture. (A back-of-the-envelope stream-capacity sketch follows at the end of this subsection.) http://tnc.www.com/index.html
• VideoServer is the world's leading supplier of standards-based Multimedia Conference Servers (MCSs), the communications devices that enable network conferencing over dissimilar networks. http://www.videoserver.com/htm/products.htm
• With its DirectStream Server Architecture, its innovative variable-bit-rate (VBR) video encoding, and its Concatenated Interleaving technique for formatting VBR video for storage on and playback from disk, the Imedia DS1000 Video Server provides an attractive alternative to the expensive deployment of full-service, true video-on-demand systems. http://powergrid.electriciti.com/~pshen/pd-ser.html
• Voyager, a Web-based video server developed at Argonne, allows both recording to and playback from the server via clients managed by a local Web browser. http://www.netlib.org/nhse/nhse96/demos/video.html
• VOSAIC (Video Mosaic), a video server in Taiwan: World Wide Web browsers and servers that support full file transfer for document retrieval. http://peacock.tnjc.edu.tw/hDD/vosaic/index.html
• Video Server developed by I-WAY provides interactive video services to both local and remote conference attendees. http://www.iway.org/video/index.html
• The LANL Video Server is part of the Sunrise Education NII project. http://bang.lanl.gov:80/video/
• Oracle Media Server, a component of the Oracle Media family of products, provides high-performance, scalable, and reliable multimedia library services on a wide variety of general-purpose computer platforms. http://www.oracle.com:80/
• Oracle Media Server, a portable, open multimedia system with massive scalability, reliability, and manageability. http://www.ksi.co.za:80/prodinfo/oms.html
• iPOINT Video Server of UIUC. http://www.ccsm.uiuc.edu/vserver.html
• Netrek Server, a multiplayer interactive X11-based video game. It is played over the Internet against real human opponents and requires a direct TCP/IP connection. http://astrowww.astro.indiana.edu:80/personnel/ahabig/netrek.html
• Starlight Networks, Inc. offers StarWorks-TV and the StarWorks video server, which provide integrated live and stored multicasts over standard Ethernet local area networks or Hughes DirectPC for many applications. http://www.starlight.com/
• HP MediaStream Server. http://www.tmo.hp.com/tmo/tcnews/9508/TNCovSt.html
• ELVIRA, an experimental video server for ATM. http://www.idt.unit.no/~videodb/diplomer/sflangorgen/summary.html
• The Cheetah High Bandwidth Server at Renaissance Digital. http://www.sover.net/~ren/index.html#rd
• Micropolis' AV Server. http://www.microp.com/AVS_features.html
• InTOUCH TV Interactive TV System developed by Interactive Systems, Inc., Beaverton, Oregon. http://www.teleport.com/~isi/system.shtml
• The VCR server, the current status of the Video Conference Recorder. http://www.cs.ucl.ac.uk:80/dragon/server.html
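A question every server above must answer is how many concurrent streams a given disk and network budget can sustain. The following Python sketch is our own back-of-the-envelope admission-control calculation, not the algorithm of any product listed here; all bandwidth figures, names, and the utilization cap are illustrative assumptions.

    # A minimal admission-control sketch for a video server (our illustration).
    # A new stream is admitted only if the disk and the network can both carry
    # it alongside the streams already being served.
    MBIT = 1_000_000

    def can_admit(active_rates_bps, new_rate_bps,
                  disk_bw_bps=40 * MBIT,   # assumed sustained disk throughput
                  net_bw_bps=100 * MBIT,   # assumed network link capacity
                  utilization=0.8):        # headroom for seeks and bursts
        load = sum(active_rates_bps) + new_rate_bps
        return load <= utilization * min(disk_bw_bps, net_bw_bps)

    # Example: MPEG-1 streams at 1.5 Mbit/s each.
    active = []
    while can_admit(active, 1.5 * MBIT):
        active.append(1.5 * MBIT)
    print(len(active))   # 21 streams under the assumptions above

Real servers refine this with per-disk striping, seek-time models, and statistical multiplexing of VBR streams, but the admission test has the same shape: admit only while the worst-case load stays under capacity.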
A.4.6 Video/Image Information Retrieval Systems
• IBM's products:
1. DB2 Extenders. http://www.software.ibm.com/db2/rextspec.html
2. Ultimedia Manager 1.1 and Client Search, which find images by color, shape, texture, and related business data (a toy color matcher in this spirit appears at the end of this subsection). http://www.software.ibm.com/data/umm/umm.html
http://www.software.ibm.com/
• Virage provides Visual Information Retrieval (VIR) technology. Its products:
1. The VIR Image Engine is an image-analysis and image-comparison tool that application developers can integrate into their own applications.
2. The Virage VIR Command Line Interface provides a text-based interface for the VIR Image Engine so its powerful visual search capabilities can be used over networks such as the World Wide Web.
3. The Virage Image Read/Write Toolkit provides a library of image file-format readers and writers that incorporate format conversions, colorspace conversions, and various compression options.
4. The Virage Image Processing Toolkit provides image-processing functions necessary for many image-management applications.
http://www.virage.com/
• JACOB: Content-Based Query System for Video Databases, a prototype system allowing content-based browsing and querying in video databases. http://wwwcsai.diepa.unipa.it/research/projects/jacob/
• MC&G Video Laser Disc (VLD) can quickly access and retrieve a wide variety of map and chart images stored on a standard analog video laser disc. The system is designed to operate with a PC-compatible computer, two monitors, a video laser disc player, a mouse, database software, and indexing system software. http://www.tmpo.dma.gov:8001/guides/dtf/vld.html
• Networked Information Discovery and Retrieval Tools for Macintosh. http://kudzu.cnidr.org/Software/tools.html
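Several of the systems above match images by their global color distribution. As a rough, hypothetical illustration of that idea, and not the algorithm of Ultimedia Manager, Virage, or JACOB, the following Python sketch ranks images by histogram intersection; the file names are invented, and the Pillow and NumPy packages are assumed.

    # A toy query-by-color matcher (illustrative only).
    import numpy as np
    from PIL import Image

    def color_histogram(path, bins=8):
        """Normalized joint RGB histogram with bins**3 cells."""
        rgb = np.array(Image.open(path).convert("RGB"), dtype=np.uint8)
        q = (rgb // (256 // bins)).reshape(-1, 3)     # quantize each channel
        idx = (q[:, 0] * bins + q[:, 1]) * bins + q[:, 2]
        hist = np.bincount(idx, minlength=bins ** 3).astype(float)
        return hist / hist.sum()

    def intersection(h1, h2):
        """Histogram intersection: 1.0 means identical color distributions."""
        return np.minimum(h1, h2).sum()

    query = color_histogram("query.jpg")
    ranked = sorted(["a.jpg", "b.jpg", "c.jpg"],
                    key=lambda p: intersection(query, color_histogram(p)),
                    reverse=True)
    print(ranked)   # best color match first

Shape and texture features are handled analogously: each image is reduced to a fixed-length feature vector, and retrieval becomes nearest-neighbor search in that feature space.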
A.4.7 Other Video Products
• VISIT Video is NORTEL's product for personal multimedia conferencing. http://www.nortel.com/english/nortel.html
• VDOLive, based on VDOW compression, enables real-time video applications on the Internet; provided by VDOnet. http://www.vdo.net/
• The Video Notepad, a personal video library system consisting of a camcorder and a PowerBook communicating via a Sony V-box. The system has three main functions: recording and indexing, searching for clips, and condensing highlights. http://www.sils.umich.edu/awoolf/VideoNotepad.html
• PowerVideo provides high-quality video display, capture, storage, and broadcast for Sun and HP workstations. http://www.frame.com/PARTNERS/third-137.html
• Internet-based video conferencing products:
1. CU-SeeMe is a free video conferencing program (under copyright of Cornell University and its collaborators) available to anyone with a Macintosh or Windows and a connection to the Internet. http://cu-seeme.cornell.edu/#CU-SeeMe
2. VidCall is a software product by MRA and Associates. It allows point-to-point communication over the Internet for video and voice. http://www.access.digex.net/~vidcall/vidcall.html
3. Internet Phone is a software product by VocalTec, the Internet Phone Company. It lets Mac and PC users talk over the Internet. http://www.vocaltec.com/
4. Video Engine 200, developed by Intergraph Corporation, provides an all-digital system that can record, play back, and output full-motion, full-screen video and true-color computer animation. http://www.ingr.com/ics/video/ve200.html
5. Commercial Video Service (CVS), developed by Luminet of Winona, Minnesota, is a fiber-optic-based service for the transmission of full-motion, broadcast-quality video signals. It can deliver video one-way, two-way interactive, point-to-point, multipoint, and broadcast. http://www.luminet.net/luminet/video/index.html
6. Virtual Video Browser (VVB) lets the user query and view a video database based on scene content. http://hulk.bu.edu/projects/vvb_demo.html
• KAPPA's video products:
1. Color Video Systems incorporate advanced video technology.
2. Black-and-White Video Systems are characterized by their superior resolution; the range reaches from compact, simple units to digital systems that hook up directly to computers.
3. Video Microscopy serves in situations where a regular microscope is not sufficient.
4. Video Endoscopy is an excellent tool for actual inspection and the documentation of findings.
5. Video Measuring can do sophisticated measuring with the help of video technology.
6. Medical Video Systems provide video systems for laparoscopy, arthroscopy, and other applications in the operating room.
7. Stereo Video is a compact and easy-to-use three-dimensional stereoscopic video system that delivers true live spatial video images.
8. Criminalistic Video for tactical and forensic applications.
http://www.techexpo.com/WWW/kappa/kappa.html
• Motion Media Technology products:
1. The Xyclops 200 Personal Videophone uses H.320 standards for video telephony.
2. The Xyclops 900 video-telephony ISA-bus PC add-in card plus controller software allows an IBM PC-compatible computer system to execute videophone applications.
http://www.mmtech.co.uk/~mmtech/products.htm
• outSPOKEN for Windows is a screen reader that converts the graphics and text of Microsoft Windows into a full speech and audio interface for blind, visually impaired, or learning-disabled computer users. http://www.inforamp.net/~access/osw.htm
• Omega Videoconference Systems, provided by VSI Enterprises, Inc., share a common software-based system architecture. They are also internally networked for sophisticated diagnostics and offer a unique mouse-controlled user interface. http://www.vsin.com/arch.html
• PDI's Precision MX Video Engine is PCI-bus based and provides integrated image capture, processing, and display with comprehensive software support. http://www.precisionimages.com/
• Data Translation designs, manufactures, and sells plug-in boards and software that turn your personal computer into a digital video system, a data acquisition system, or a machine vision or microscopy system. http://wheat.symgrp.com/symgrp/datx/home.html
• PCMotion delivers full-motion playback of MPEG digital video and audio for video-on-demand. http://www.optibase.com/pcmotio.html
• Reveal's Video Artist editing system can turn your PC into a digital video editing system, turn your home videos into Hollywood-style movies, or create professional business and training presentations. http://www.reveal.com/
• SONY's video products:
1. Professional VTR.
2. Switcher/Digital Effects.
3. Professional Cameras.
4. Videodisc Recorders/Player.
5. Image Capture Cameras.
6. MPEG-1 Real Time Encode Unit.
7. CD-ROM Juke Box.
8. Language Learning Systems.
9. Infrared Conference System.
http://www.sel.sony.com/
• MovieVideo, a Multi Media System's product, is a single-board video RAM-recorder with multistandard and multichannel capabilities, designed for the capture, storage, and display of full-motion video sequences in studio-level quality. http://www.hannover.de/mms/mms.html
• Metro-Xv (TM) Real-Time Video in a Window Package provides a device-independent method to control and manipulate real-time video windows in a manner consistent with the standard X programming interface. http://www.metrolink.com/products/Metro-Xv.html
• Scitex Digital Video. http://204.247.28.5/abekas/overview/Over.html
• MPEG Compression for Windows. http://www.byte.com/art/9407/sec14/art67.htm
• A survey of distributed multimedia products. http://cuiwww.unige.ch/OSG/MultimediaInfo/mmsurvey/products.html
• Silicon Graphics video products. http://www.sgi.com/Products/appsdirectory.dir/SolutionIXFilm_Video_Production.html
A.5 VIDEO DATABASE APPLICATIONS
A.5.1 Education
• The Apple Multimedia Learning System (AMLS) is a combination of Apple software, Apple hardware, and non-Apple hardware that delivers up to 125 channels of full-motion video to a network of Macintosh multimedia computers. It is designed to foster interactive learning activities, including multimedia authoring, software training, and network management. http://www.apple.ca/doc/ds/AMLS.html
• An On-Line Distance Learning System using digital video and multimedia networking technologies. The system is designed in the context of the Stanford Instructional Television Network (SITN), serving both the Stanford community at large and off-campus students at SITN's remote customer sites. http://minas.stanford.edu:80/project/project.html
• Distributed electronic classrooms and the MUNIN/MultiTeam project plan to implement, try out, and evaluate a system for distance education between electronic classrooms. The project is organized jointly by the Center for Information Technology Services at the University of Oslo (USIT), the Norwegian Telecom Research Department (NTR), and the Center for Technology at Kjeller (UNIK), and is carried out by the University of Oslo. http://www.uio.no/usit/Utviklingsseksjonen/mice/munin.html
• The Living Schoolbook Project of Syracuse University demonstrates the potential of the World Wide Web, the National Information Infrastructure, High Performance Computing, VOD, and communications technologies in the K-12 classroom. http://www.npac.syr.edu/projects/ltb/index.html
• Video on Demand Technologies and Demonstrations at Syracuse University. http://www.npac.syr.edu:80/NPAC1/PUB/marek/proj3.html
• Bell High School Video Portfolios CD-ROM is an interactive electronic portfolio containing QuickTime(tm) versions of 28 videos produced by students in the Bell High School Television Production Program. http://www.atg.apple.com/personal/Brian_Reilly/video_portfolios.html
• Curriculum & Content Development at RENAISSANCE Digital. http://www.sover.net/~ren/educat.html
• TNS Technology Demonstrations, demonstrations of VuSystem applications and VuNet hardware developed by members of the Telemedia, Networks, and Systems Group in the MIT Laboratory for Computer Science. http://www.tns.lcs.mit.edu/vs/demos.html
• Dynacom Applications: baseband video networking for true video-on-demand applications, in education and elsewhere, on a budget. There are five general applications for baseband video networking:
1. Networking screen output from computers/CD-ROM to reduce dependency on computer software and hardware.
2. Digital video serving for true video-on-demand.
3. Interbuilding broadcasting and receiving for both shared and distance learning over telephone lines.
4. Integration of bell/alarm/intercom systems with video.
5. Remote control of standard A/V devices.
http://www.indata.com/www5.html
• Interactive Distance Learning at Bell Atlantic. http://www.ba.com/dl/
• Distance Education is the first of six generic on-line Internet facilities relating to leadership and management development. http://www.oise.on.ca:80/~bwillard/disted.htm
• Ciné-Med, Inc. produces video telecourses, interactive computer programs, monographs, and other related materials to meet the instructional needs of physicians and other health care providers. http://www.cine-med.com/
• The CIPR video conferencing and interactive distance learning system. http://ipl.rpi.edu/conferencing.html
• Distance Learning Information at Stephen F. Austin State University. http://www.education.sfasu.edu/emc/dlearning.html
A.5.2 Entertainment
• TV Goes Digital: The Future of Video in Leisure and Entertainment, written by Steve Alcorn, Alcorn McBride Inc. http://www.alcorn.com/text/dtv.html
• IBM offers full-service interactive media production capabilities, from CD-ROM to interactive television and VOD. http://www.solutions.ibm.com/multimedia/media-home.html
• Nissim Corp. has recently demonstrated a digital video player that automatically customizes a motion picture according to a viewer's preference for the level of explicitness: none, implied, explicit, or graphic. http://www.nissim.com/
• Video services at Bell Atlantic. http://www.bell-atl.com/bvs/
• Digital provides media server technology for the Ameritech deployment. http://www.digital.com/info/pr-news/95022101PR.txt.html
• Digital Equipment's video server has been selected for a European cable TV VOD trial. http://www.digital.com/info/pr-news/95022101PR.txt.html
• VOD field trial of the Helsinki Telephone Company. http://www.kolumbus.fi:80/k-toimitus/hpy-vod.htm
• Westminster (UK) Cable signed a contract with Digital to provide technology for a VOD trial with cable TV customers. http://www.digital.com:80/info/pr-news/94112201PR.txt.html
• TMN Networks Inc. selected Digital's Media Server for the Canadian deployment of expanded programming services. http://www.digital.com:80/info/pr-news/95042601PR.txt.html
• The Network Connection and Kollsman signed an In-Flight Entertainment Certification Agreement. http://tnc.www.com/home/pr/102195.html
• News on Demand. http://www.cnri.reston.va.us:80/home/dlib/september95/nod/page3.html
• Bothwell Communication provides VOD movies. http://home.sprynet.com/sprynet/bothwell/
• Telephone Video of America, Inc. (TVA) provides 200,000 homes with VOD in 1996, with the help of the Lockheed Martin Media Systems Integration Group. http://www.intechnet.com/tva/
• NPAC VOD Video News Database Server Project. http://trurl.npac.syr.edu/vns/
• VOD research at the NDSU Computer Science Department. http://www.cs.ndsu.nodak.edu/~rvetter/ATM/html/video.html
• VOD research at TMresearch. http://www.innovplace.saskatoon.sk.ca/tmres/research.html
• Demonstration of VOD in Educational Fields: research and development on large-scale interactive CATV in Japan. http://www.mpt.go.jp/gTweb/Education/VOD.html
• VOD technologies and demonstrations at Syracuse University. http://www.npac.syr.edu/NPAC1/PUB/marek/proj3.html
• Data sharing schemes for multimedia database systems. http://www-ccs.cs.umass.edu/db/mmdb.html
• VOD research in SONAH. http://www.analysys.co.uk/acts/sonah/guide/vod.htm
• VOD research at Georgia Tech. http://www.cc.gatech.edu/fac/Mostafa.Ammar/VOD.html
• Implementation of VOD using an MPEG client-server model and ATM networks. http://lal.cs.byu.edu/ketav/issue_2.5/vod/vod.html
• Portable VOD in wireless communication. http://www-cis.stanford.edu/cis/research/LabProjects94/PortableVideo.html
• Development of Advanced Image/Video Servers in the VOD Testbed, written by Shih-Fu Chang et al. http://www.ctr.columbia.edu/~jrsmith/html/pubs/VSPC-94/VSPC-94_1.html
• The Design and Implementation of a Media on Demand System for WWW. http://www.it.kth.se/~klemets/www.html
• On-Demand Video Network Architectures and Applications (MCL project abstracts). http://hulk.bu.edu/projects/summary.html
• The DIAMOND VOD Consortium. http://www.octacon.co.uk/proj/diamond/diamond.htm
• Digital's Interactive Video Services: VOD into the office, classroom, and marketplace. http://www.digital.com.au/CSS/ivs.html
• VOD DEMO. http://www.cardinal.fi/campeius/video.html
• VOD DEMO at MIT. http://www.tns.lcs.mit.edu/~hhh/demos.html
• UCSD's virtual lab DEMO. http://vision.ucsd.edu/
• Berkeley VOD System. http://roger-rabbit.cs.berkeley.edu/vods/index.html
• Audio and VOD. http://www.cs.colorado.edu/home/homenii/demand.html
• Business gets access to VOD. http://www.smh.com.au/archive/news/951024/news1-951024.html
• VOD overview. http://www.cs.tut.fi:80/tlt/stuff/vod/VoDOverview/vod.html
• Problems in VOD and Video Dialtone. http://www.magic.ca/infohighway/vod.html
• VOD using CELL-MASTER. http://www.cellware.de/systems/vod.html
A.5.3 Law Enforcement
• Face Recognition from Live Video for Real-World Applications. http://venezia.rockefeller.edu/group/papers/full/AdvImaging/index.html
• Video Enforcement Systems (VES) are the components and processes of a toll collection system with which the toll equipment captures information on vehicles that have not paid the proper toll. http://village.ios.com/~mkolb/ves.html
• SpectraTek designs and manufactures state-of-the-art electronic equipment for law enforcement and government intelligence customers:
1. Video Surveillance Systems are designed to work together flexibly in any operation, from tactical use to covert stakeouts and even automated, unattended surveillance. http://www.interserve.com/~spectrat/video.html
2. State-of-the-art Night Vision Systems with integrated video capability. http://www.interserve.com/~spectrat/night.html
http://www.interserve.com/~spectrat/index.html#menu
• Forensic image restoration with Khoros. http://www.xs4all.nl/~forensic/deblur.html
• Harris MultiVIEW. http://www.harris.com/hcjp/multiview.html
• Video Surveillance Systems and Imagery Analysis. http://www.aero.org/nlectc-wr/ses_vsia.html
• Investigative Image Processing. http://www.spie.org/web/meetings/calls/le96_analysis.html#Rudin
• Digital Imaging for Law Enforcement. http://www.deltanet.com/PostOfficeWall/
• US WEST has a wide range of video applications: multipoint business conferences without the costs of travel; distance learning between remote campuses; inexpensive security systems linked to multiple locations; long-distance diagnosis in medicine; and remote arraignments and other procedures in law enforcement. http://www.uswest.com:80/advert/guide.html
A.5.4 Digital Library
• The six NSF/ARPA/NASA Digital Library Initiative grant projects:
1. Carnegie Mellon University - full-content search and retrieval of video. http://www.informedia.cs.cmu.edu/inforweb.html
2. Stanford University - interoperation mechanisms among heterogeneous services. http://Walrus.Stanford.EDU/diglib/
3. University of California at Berkeley - work-centered digital information services. http://elib.cs.berkeley.edu/
4. University of California at Santa Barbara - spatially-referenced map information. http://alexandria.sdc.ucsb.edu/
5. University of Illinois at Urbana-Champaign - federating repositories of scientific literature. http://surya.grainger.uiuc.edu/dli/
6. University of Michigan - intelligent agents for information location. http://http2.sils.umich.edu/UMDL/HomePage.html
• The Digital Library Technology (DLT) Project supports the development of new technologies to facilitate public access to NASA data via computer networks. http://dlt.gsfc.nasa.gov/
• Berkeley Digital Library SunSITE. http://sunsite.berkeley.edu/
• The British Library's Initiatives for Access Projects. gopher://portico.bl.uk/
• D-Lib: articles, news, and commentary on all aspects of digital library research. http://www.dlib.org/
• IBM Digital Library: The Vatican Library. http://www.software.ibm.com/is/dig-lib/vatican.html
• The Gibson Digital Library. http://kcmo.com/hggll.htm
• Digital Media Library. http://www.clark.net/pub/networx/fusion/overview.html
• Exploratorium's Digital Library provides stills, movies, sounds, and other experimental items. http://www.exploratorium.edu/imagery/imagery_home.html
• CEDAR's Digital Library, for the use of those interested in document analysis and recognition research. http://www.cedar.buffalo.edu:80/Taxila/
• A World Wide Web Digital Library for Schoolkids. http://www.npac.syr.edu/textbook/kidsweb/
• Wide Area Technical Report Services: Technical Report Online. http://www.cs.odu.edu/WATERS/WATERS-GS.html
• Dienst: An Architecture for Distributed Document Libraries. http://cs-tr.cs.purdue.edu
• Library of Congress Digital Library Effort. http://www.loc.gov
• Digital Library Related Information. http://interspace.grainger.uiuc.edu/~bgross/digital-libraries.html
• CNN Newsroom on the Internet: A Digital Video News Magazine and Library, written by Charles L. Compton and Paul D. Bosco. http://www.nmis.org:80/AboutNMIS/Papers/icmcs95.1/newsroom.html
• The Digital Video Library System, written by Dr. Susan Gauch, University of Kansas. http://www.tisl.ukans.edu/~sgauch/DVLS.html
• Prototyping the VISION Digital Video Library System, written by Kok Meng Pua. http://www.tisl.ukans.edu:80/~sgauch/vision_proto.html
• Research home page on digital library applications. http://www.ee.princeton.edu:80/~mingy/browser/browsing-research.html
• Adding Digital Video to an Object-Oriented User Interface Toolkit, written by S.M.G. Freeman and M.S. Manasse. http://www.research.digital.com:80/SRC/argo/ecoop/ecoop_ToC.html
• Key Concepts in the Architecture of the Digital Library, written by William Y. Arms, Reston, Virginia. http://www.cnri.reston.va.us/home/dlib/July95/07arms.html
• Creating Digital Libraries for the 21st Century. http://www.osc.edu:80/Casc/papers/paper5.html
• VISION: A Digital Video Library, written by Wei Li et al. http://www.tisl.ukans.edu/~wlee/papers/dl96/dl96.html
A.5.5 Commercial
• Digital Stock Photo Library provides online shopping. http://www.gil.com.au/comm/digit/digi.html
• The Network Connection introduces Stock-Clips, the first stock-information service offering full-motion video stock information over the Internet. http://tnc.www.com/home/pr/013195.html
• New Way Technologies provides World Wide Web services for WWW design and marketing. http://www.newway.com/
• Some interesting topics at TMresearch:
1. Home Shopping via Electronic Networks.
2. Profiling the Potential Users of On-Line Video Games.
3. On-Line Public Services.
http://www.innovplace.saskatoon.sk.ca/tmres/tmr2.html
• Video Publishing House provides video-based business training programs that help develop leadership, management, team, and communication skills. http://www.vphi.com/
A.5.6 Communication
• Digital Announces Industry-Leading Broadcast Video Solution. http://www.digital.com/info/pr-news/95032104PR.txt.html
• ShowMe TV: Sun's desktop audio/video broadcast solution. http://sunsite.nus.sg/flashback/november.1994/sunflash/71.23.showme-tv.html
• VDOLive and VDOPhone are two VDOnet products for Internet video broadcasting and desktop video conferencing. http://www.vdo.net/products/
• Alcatel and Digital sign an Interactive Information Services Technology Agreement. http://www.digital.com:80/info/pr-news/95012001PR.txt.html
• Hongkong Telecom's digital network is responsible for making the connection between the video server and the customer's set-top box when the customer requests access to the service. The network also carries commands input by the customer, e.g., pause. (A toy control-channel sketch follows at the end of this subsection.) http://www.ims.hkt.com.hk/IMS/Technology.html
• DT-5 desktop video conferencing. http://fiddle.ee.vt.edu/succeed/videoconf.html
• PictureWindow Video Conferencing Software, a software package that allows workstation users to hold video conferences over existing Internet Protocol (IP) networks. http://kopernik.npac.syr.edu:1200/collaboratory/PictureWindow.html
• Picturephone Direct provides solutions for desktop videoconferencing. http://picturephone.com
• Video Conferencing Tools. http://sapho.cs.pdx.edu/wh_www/OTHER_DIR/sd/tutorial.html
• The configuration management of access networks with V5 interfaces. http://www.labs.bt.com/bookshop/papers/4691223.htm
• Bitfield video communication products enable transferring full-motion video between standard PCs via ISDN, LANs, and other telecommunication networks. http://www.bitfield.fi:80/videonet.html
• A Full Multimedia Conferencing System. http://www.wmin.ac.uk:80/media/HRC/manifesto/hmm.19.html
• Video Conferencing Directory. http://www.dipoli.hut.fi:80/cet-bin/studios.pl
• The Media Gateway: Live Video on the World Wide Web, written by H. Houh et al. http://www.tns.lcs.mit.edu/publications/WWW94a.html
• Video Communications Bibliography. http://www.crew.umich.edu/~brinck/vbib.html
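Services like the Hongkong Telecom deployment above hinge on a simple upstream control channel carrying commands such as pause from the set-top box to the video server. The following Python sketch is our own toy illustration of such a control loop over TCP; the command names, responses, and port number are invented for the example and have no relation to any deployed set-top protocol.

    # A toy VCR-style control channel for a video service (our illustration).
    # The client sends one-word commands; the server acknowledges each one
    # and updates its playback state.
    import socket

    COMMANDS = {"PLAY", "PAUSE", "RESUME", "STOP"}

    def serve(port=7000):
        state = "STOPPED"
        with socket.create_server(("", port)) as srv:
            conn, _ = srv.accept()
            with conn:
                for line in conn.makefile("r"):
                    cmd = line.strip().upper()
                    if cmd not in COMMANDS:
                        conn.sendall(b"ERR unknown command\n")
                        continue
                    state = {"PLAY": "PLAYING", "RESUME": "PLAYING",
                             "PAUSE": "PAUSED", "STOP": "STOPPED"}[cmd]
                    conn.sendall(f"OK {state}\n".encode())

    # A client would connect and write e.g. "PLAY\n", then "PAUSE\n",
    # reading back "OK PLAYING" and "OK PAUSED".

In a real deployment the downstream video travels on a separate high-bandwidth path, and the control channel only steers it, which is why such commands can ride on a very modest upstream link.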
A.5.7 Healthcare
• Video in Medical Imaging. http://www.parallax.com/cbv/apps/medical.html
• Tele-Education and Medicine Project (TEAM). http://www.ihi.aber.ac.uk/IHI/teamdoc.html
• Telemedicine Projects in the United States. http://zax.radiology.arizona.edu:80/umc.html
• Telemedicine Program at the University of Arkansas for Medical Sciences. http://life.uams.edu:80/ahec/tele1.htm
• Telemedicine Information Exchange, a useful resource on the Internet. http://tie.telemed.org/scripts/getpage.pl?client=text&page=extlink
• The Virtual Hospital, presented by the University of Iowa College of Medicine, is a continuously updated digital health sciences library. It exists to provide rapid, convenient access to health care information for both health care providers and patients. http://vh.radiology.uiowa.edu/
A.5.8 Industry and Manufacturing
• Video in Manufacturing and Process Control. http://www.parallax.com/cbv/apps/manufacturing.html
• Video in Geographic Imaging Systems. http://www.parallax.com/cbv/apps/gis.html
• Manufacturing Experts/Instructional Video Modules. http://www.mame.syr.edu/MET/experts/bu.html
• Videomedia Features Latest in Video Technology at Montreux. http://www.videomedia.com:80/about/press-montreux.html
• ATP FOCUSED PROGRAM: Digital Video in Information Networks. http://atp.nist.gov:80/atp/dviin.htm
REFERENCES
[1] Gulrukh Ahanger, Dan Benson, and T. D. C. Little. Video query formulation. In Storage and Retrieval for Image and Video Database II, I S ~ T / S P I E Symposium on Electronic Image Science ~ Technology, San Jose, CA, February 1995. [2] P. Aigrain and P. Joly. Automatic real-time analysis of film editing and transition effects and its applications. Computer and Graphics, 18(1):93103, January 1994. [3] Hideo Hashimoto Akihito Akutsu, Yoshinobu Tonomura and Yuji Ohba. Video indexing using motion vectors. In Proceedings of SPIE: Visual Communications and Image Processing 92, November 1992. [4] Akihito Akutsu and Yoshinobu Tonomura. Video tomography: An efficient method for camerawork motion vectors. In Proceedings Second Annual ACM Multimedia Conference, October 1994. Association of Computing Machinery. [5] J. F. Allen. Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11):832-843, November 1983. [6] A. Banerjea, D. Ferrari, B. Mah, M. Moran, D. Verma, and H. Zhang. The tenet real-time protocol suite: Design, implementation, and experiences. Technical Report TR-94-059, International Computer Science Institute, Berkeley, November 1994. [7] J. Banerjee and W. Kim. Semantics and implementation of schema evolution in object-oriented database. ACM SIGMOD 87, pages 311-322, 1987. [8] Steven Berson, Leana Golubchik, and Richard R. Muntz. Fault tolerant design of multimedia servers. In SIGMOD'95, pages 364-375, San Jose, CA, USA, 1995. [9] Alberto Del Bimbo and Enrico Vicario. A logical framework for spatio temporal indexing of image sequence. In S. K. Chang, editor, Spatial Reasoning. Springer Verlag, 1993. 163
164
VIDEO DATABASE SYSTEMS
[10]
Alberto Del Bimbo, Enrico Vicario, and Daniele Zingoni. Sequence retrieval by contents through spatio temporal indexing. IEEE Symposium on Visual Languages, 1993. IEEE Computer Society.
[11] [12]
D. Bitton and J. Gray. Disk shadowing. VLDB, pages 331-338, 1988. Michael H. Bohlen, Christian S. Jensen, Richard T. Snodgrass, and Richard Schroeppel. Evaluating and enhancing the completeness of TSQL2. Technical Report TR95-05, Computer Science Department, University of Arizona, 1995.
[13]
V. M. Bove. What's wrong with today's video coding. TV Technology, February 1995.
[14]
J. Brassil, S. Low, N. Maxemchuk, and L. O'Gorman. Document marking and identification using both line and word shifting. Technical report, AT&T Bell labratories, 1994.
[15]
J. Brassil, S. Low, N. Maxemchuk, and L. O'Gorman. Electronic marking and identification technology to discourage document copying. Technical report, AT&:T Bell labratories, 1994.
[16]
Sergey Brin, James Davis, and Hector Garcia-Molina. Copy detection mechanisms for digital document. In SIGMOD'95, pages 398-409, San Jose, CA, USA, 1995.
[17]
H. P. Brondmo and Glorianna Davenport. Creating and viewing the elastic data - - a hypermedia journal. In R. McAlesse and C. Greene, editors, Hypertext State of the Art. Intellect, Ltd., Oxford, England, 1990.
[18]
Claudette Cedras and Mubarak Shah. Motion-based recognition: A survey. Image and Vision Computing, 13(2):129-155, March 1995.
[19]
C. W. Chang, K. F. Lin, and S. Y. Lee. The characteristics of digital video and considerations of designing video databases. In Proceedings of the 4th International Conference on Information and Knowledge Management, pages 370-377, Baltimore, Maryland, USA, November 1995.
[2o]
E. Chang and A. Zakhor. Scalable video coding using 3-d subband velocity coding and multirate quantization. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 574-577, Minneapolis, MN, 1993.
[21]
R. Chellappa, Charles L. Wilson, and Saad Sirohey. Human and machine recognition of faces: A survey. Proceedings of the IEEE, 83(5):705-740, May 1995.
References
165
[22]
P. S. Chen. The entity-relationship model-toward a unified view of data. ACM Transactions on Database Systems, 1(1), March 1976.
[23]
T. Chieuh and R. Katz. Multiresolution video representation for parallel disk arrays. Ill Proceedings of ACM Multimedia'93, pages 401-409, Anaheim, CA, 1993.
[24]
A. K. Choudhury, N. F. Maxemchuk, S. Paul, and H. G. Schulzrinne. Copyright protection for electronic publishing over networks. Submitted to IEEE Network Magazine, June 1994.
[25]
A brief description of the Gigabit Testbed Initiative. URL: http://www.cnri.reston.va.us:4000/public/overview.html. The Corporation for National Research Initiatives (CNRI), Last revision: January 1994.
[26]
S. J. Daigle. Disk scheduling for continuous media data streams. Master's thesis, Carnegie Mellon University, 1992.
[27]
A. Dan, M. Kienzle, and D. Sitaram. Dynamic segment replication policy for load-balancing in video-on-demand servers. Technical Report IBM Research Report RC 19589, IBM, Yorktown Heights, NY, 1994.
[28]
Asit Dan and Dinkar Sitaram. An online video placement policy based on bandwidth to space ration (BSR). In SIGMOD'95, pages 376-385, San Jose, CA, USA, 1995.
[29]
C. J. Date. An Introduction to Database Systems. The Systems Programruing Series. Addison-Wesley Publishing Company, 1975.
[3o]
Gloriana Davenport, Thomas G. Aguierre Smith, and Natalio Pincever. Cinematic primitives for multimedia. IEEE Computer Graphics ~ Applications, pages 67-74, July 1991.
[31] Marc Davis. Media Streams: An iconic visual language for video annotation. IEEE Symposium on Visual Languages, pages 196-202, 1993. IEEE Computer Society.
[32]
Marc Davis. Knowledge representation for video. In Working Notes: Workshop on Indexing and reuse in Multimedia Systems, pages 19-28. American Association of Artificial Intelligence, August 1994.
[33]
M.R.W. Dawson. The how and why of what went where in apparent motion. Psychological Review, 98:569-603, 1991.
166
VIDEO DATABASE SYSTEMS
[34] Young Francis Day, Serhan Dagtas, Mitsutoshi Iino, Ashfaq Khokhar, and Arif Ghafoor. Spatio-temporal modeling of video data for on-line object-oriented query processing. In IEEE ICDE, 1995.
[35]
Young Francis Day, Serhan Dagtas, Mitsutoshi Iino, Ashfaq Khokhar, and Arif Ghafoor. Object-oriented conceptual modeling of video data. In IEEE ICDE, 1995.
[36] Arding Hsu Farshid Arman and Ming-Yee Chiu. Image processing on compressed data for large video database. In Proceedings of the ACM Multimedia, pages 267-272, California, USA, June 1993. Association of Computing Machinery. [37] Arding Hsu Farshid Arman, R. Depommier and Ming-Yee Chiu. Contentbased browsing of video. In Proceedings Second Annual ACM Multimedia Conference, October 1994.
[38]
D. Ferrari. Advances in Real-Time Systems, chapter New Admission Control Method for Real-Time Communication in an Internetwork, pages 105-116. Prentice-Hall, Englewood Cliffs, NJ, 1995.
[39]
International Organization for Standardization. (MPEG).
[4o1
Craig S. Freedman and David J. DeWitt. The SPIFFI scalable videoon-demand system. In SIGMOD'95, pages 352-363, San Jose, CA, USA, 1995.
[41]
D. Le. Gall. MPEG: A video compression standard for multimedia applications. Communication of ACM, 34(4):46-58, April 1991.
[42]
D. E. Gibson. Report on an International Survey of 500 Audio, Motion Picture Films and Video Archives. Talk given in the annual FIAT/IASA Conference, September 1994. Bogensee, Germany.
[43]
U. S. Government. Copyright act, 1976.
ISO//IEC 11172
[44] G. N. Griswold. A method for protecting copyright on networks. In Joint Harvard MIT Workshop on Technology Strategies for Protecting Intellectual Property in the Networked Multimedia Enviornment, April 1993.
[45]
V. Gudivada and V. Raghavan. Special issues on content-based image retrieval systems. Computer, pages 18-22, September 1995.
References
167
[46]
Amarnath Gupta, Terry Weymouth, and Ramesh Jain. Semantic queries with pictures: the VIMSYS model. In Proceedings of the 171h International Conference on Very Large Data Bases, September 1991.
[4z]
Arun Hampapur. Design Video Data Management Systems. PhD thesis, The University of Michigan, 1995.
[48]
Arun Hampapur, Ramesh Jain, and Terry Weymouth. Digital video indexing in multimedia systems. In Proceedings of the Workshop on Indexing and Reuse in Multimedia Systems, August 1994.
[49]
Arun Hampapur, Ramesh Jain, and Terry Weymouth. Digital video segmentation. In Proceedings Second Annual ACM Multimedia Conference and Exposition, 1994.
[50]
Arun Hampapur, Ramesh Jain, and Terry Weymouth. Production model based digital video segmentation. Journal of Multimedia Tools and Applications, 1(1):9-46, March 1995.
[51]
J. R. Haritsa and M. B. Karthikeyan. Disk scheduling for multimedia database applications. To appear in COMAD'94.
[52]
Rune Hjelsvold. Video information content and architecture. In Proceedings of the 4th International Conference on Extending Database Technology, Cambridge, UK, March 1994.
[531
Rune Hjelsvold. VideoSTAR - A Database for Video Information Sharing. PhD thesis, Norwegian Institute of Technology, November 1995.
[54]
Rune Hjelsvold and Roger Midtstraum. Modeling and querying video data. In Proceedings of the 20th International Conference on Very Large Data Bases, September 1994.
[55]
Rune Hjelsvold and Roger Midtstraum. Databases for video information sharing. In Proceedings of the ISgJT/SPIE Symposium on Electronic Imaging Science and Technology, San Jose, CA, 1995. Conference on Storage and Retrieval for Image and Video Databases III.
[56]
Rune Hjelsvold, Roger Midtstraum, and Olav Sandst. Searching and Browsing a Shared Video Database, chapter Design and hnplementation of Multimedia Database Management Systems. Kluwer Academic Publishers, 1996.
[sr]
P. R. Hsu and H. Harashima. Detecting scene changes and activities in video databases. In ICASSP'94, volume 5, pages 33-36, April 1994.
168
VIDEO DATABASE SYSTEMS
[58] E. Hwang and V. S. Subrahmanian. Querying video libraries. Journal of Visual Communication and Image Representation. Accepted for publication. [59] Mikihiro Ioka and Masato Kurokawa. Estimation of notion vectors and their application to scene retrieval. Technical Report 1623-14, IBM Research, Tokyo Research Laboratory, Shimotsuruma, Yamato-shi, Kanagawa-ken 242, Japan, 1993. [60] Ramesh Jain, Jayaram Murthy, Peter L-J Chen, and Shankar Chaterjee. Similarity measures in image database. Technical report, University of California at San Diego, 1994. [61] Haitao Jiang and Jeffery W. Dailey. Video database system for study animal behavior. In C.-C. Jay Kuo, editor, Proceedings of SPIE Multimedia Storage and Archiving Systems, volume 2916, pages 162-173, Boston, MA, November 1996. [62] Haitao Jiang and Ahmed K. Elmagarmid. Video databases: State of the art, state of the market and state of practice. In Second International Workshop on Multimedia Information Systems, pages 87-91, West Point, NY, September 1996. [63] Haitao Jiang, Abdelsalam Helal, Ahmed K. Elmagarmid, and Anupam Joshi. Scene change detection techniques for video database systems. ACM Multimedia Systems, 1996. In print. [64] Haitao Jiang, Danilo Montesi, and Ahmed Elmagarmid. VideoText database systems. 1996. Submitted to IEEE Multimedia Systems'97. [65] A. Joshi. On Connectionism and the Problem of Correspondence in Computer Vision. PhD thesis, Department of Computer Science, Purdue University, 1993. [66] R. E. Kahn. Deposit, registeration and recordation in an electronic copyright management system. Technical report, Corporation for National Research Initiatives, Reston, Virginia, August 1989. [67] K. Keeton and R. H. Katz. Evaluating video layout strategies for a highperformance storage server. Multimedia Systems, 3:43-52, 1995. [68] E. Knightly and H. Zhang. Traffic characterization and switch utilization using a deterministic bounding interval dependent traffic model. In Proceedings of IEEE INFOCOM'95, pages 1137-1145, Boston, MA, April 1995.
References
169
[69]
Sun-Yin Lee and Huan-Ming Kao. Video indexing - an approach based on moving object and track. Technical report, Institute of Systems Science and Information Engineering, National Chiao Tung University, tlsinchu, Taiwan Republic of China.
[7o]
T. D. C. Little and A. Ghafoor. Interval-based conceptual model for time-dependent multimedia data. IEEE Transaction on Knowledge and Data Engineering, 5(4):551-563, August 1993.
[71]
C. L. Liu and J. W. Layland. Scheduling algorithms for multiprogramruing in hard-real-time environment. Journal of ACM, 11:46-61, 1973.
[72]
It. C. Liu and G. L. Zick. Scene decomposition of MPEG compressed video. In Digital Video Compression: Algorithms and Technologies, volume SPIE 2419. February 1995.
[73]
M. Livingstone and D. O. Hubel. Segregation of form, color, movement and depth: Anatomy, physiology and perception. Science, 240:740-749, 1988.
[74]
H. G. Longbotham and A. C. Bovic. Theory of order statistic filters and their relationship to linear FIR filters. IEEE Transactions on Acoustics, Speech, and Signal Processing, ASSP-37(2):275 287, February 1989.
[75]
P. Lougher and D. Shepherd. The design of a storage server for continuous media. Computer Journal, 36:32-42, 1993.
[76]
W. E. Mackay and G. Devenport. Virtual video editing in interactive multimedia applications. Communications of A CM, 32(7):802-810, July 1989.
[77]
J. Meng, Y. Juan, and S. F. Chang. Scene change detection in a mpeg compressed video sequence. In Storage and Retrieval for Image and Video Database III, volume SPIE 2420. February 1995.
ITS]
Michael Mills, Jonathan Cohen, and Yin Yin Wong. A magnifier tool for video data. In Proceedings of ACM Conference on Human Factors in Computing Systems - CHI'92, pages 93-98, New York, NY, USA, May 1992.
[79]
A. Nagasaka and Y. Tanaka. Automatic video indexing and full-video search for object appearances. In Second Working Conference on Visual Database Systems, pages 119-133, Budapest, Hungary, October 1991. IFIP WG 2.6.
170
VIDEO DATABASE SYSTEMS
[80] Eitetus Oomoto and Katsumi Tanaka. OVID: Design and implementation of a video-object database system. IEEE Transaction on Knowledge and Data Engineering, page 5(4), 1993. [81] K. Otsuji and Y. Tonomura. Projection detecting filter for video cut detection. In Proceedings of First ACM International Conference on Multimedia, August 1993. [82] K. Otsuji, Y. Tonomura, and Y. Ohba. Video browsing using brightness data. In Visual Communications and Image Processing, volume SPIE 1606, pages 980-989. 1991. [83] Banu Ozden, Alexandros Biliris, Rajeev Rastogi, and Avi Silberschatz. A low-cost storage server for movie on demand databases. In Proceedings of the 20th VLDB Conference, pages 594-605, Santiago, Chile, 1994. [84] David A. Patterson, Garth Gibson, and Randy H. Katz. A case for redundant arrays of inexpensive disks (RAID). In ACM SIGMOD Conference, pages 109-116, 1988. [85] Z. Pizlo, A. Rosenfeld, and J. Epelboim. An exponential pyramid model of the time course of size processing. Vision Research, 35:1089-1107, 1995. [86] G. J. Popek and C. S. Kline. Encryption and secure computer networks. ACM Computing Surveys, 11(3):331-356, December 1979. [87] K. K. Ramakrishnan, L. Vaitzblit, and et al. Operating system support for a video-on-demand file service. Multimedia Systems, 3:53-65, 1995. [88] P. V. Rangan and H. Vin. Designing file system for digital video and audio. In Proceedings of the 13th Symposium on Operating System Principles, pages 81-94, New York, NY, 1991. Operating System Review. [89] A. L. N. Reddy and J. C. Wyllie. I/O issues in a multimedia system. IEEE Computer, 27(3):69-74, March 1994. [90] H. M. Rose. MPEG titles: One more year. URL: http://www.hyperst and.com/cgi-bin/w3com/register?nm2. [91] Pamela Samuelson. Legally speaking: The Nil intellectual property report. Communications of the ACM, December 1994. [92] I. K. Sethi and N. Patel. A statistical approach to scene change detection. In Storage and Retrieval for Image and Video Database Ili, volume SPIE2420, pages 329-338, February 1995.
[93] B. Shahraray. Scene change detection and content-based sampling of video sequences. In Digital Video Compression: Algorithms and Technologies, volume SPIE 2419, pages 2-13, February 1995.
[94] A. Silberschatz and P. B. Galvin. Operating System Concepts. Addison-Wesley, Reading, MA, 4th edition, 1994.
[95] Michael A. Smith and Michael G. Christel. Automating the creation of a digital video library. ACM Multimedia, 1995.
[96] Michael A. Smith and Alexander Hauptmann. Text, speech, and vision for video segmentation: The Informedia project. In AAAI Fall 1995 Symposium on Computational Models for Integrating Language and Vision, 1995.
[97] T. G. A. Smith. If you could see what I mean...: Descriptions of video in an anthropologist's notebook. Master's thesis, MIT, 1992.
[98] Thomas G. Aguierre Smith and Glorianna Davenport. The stratification system: A design environment for random access video. In Workshop on Network and Operating System Support for Digital Audio and Video, 1992. Association for Computing Machinery.
[99] Thomas G. Aguierre Smith and N. C. Pincever. Parsing movies in context. In Proceedings of the 1991 Summer USENIX Conference, Nashville, USA, 1991.
[100] Stephen W. Smoliar and Hong Jiang Zhang. Content-based video indexing and retrieval. IEEE Multimedia, pages 62-72, Summer 1994.
[101] Stephen W. Smoliar, Hong Jiang Zhang, and Jian Hua Wu. Using frame technology to manage video. In Proceedings of the Workshop on Indexing and Reuse in Multimedia Systems, American Association for Artificial Intelligence, August 1994.
[102] Richard T. Snodgrass. The temporal query language TQuel. ACM Transactions on Database Systems, 12(2):247-298, June 1987.
[103] M. J. Swain and D. H. Ballard. Color indexing. International Journal of Computer Vision, 7(1):11-32, 1991.
[104] Deborah Swanberg, Chiao-Fe Shu, and Ramesh Jain. Architecture of a multimedia information system for content-based retrieval. In Audio Video Workshop, San Diego, CA, November 1992.
[105] Deborah Swanberg, Chiao-Fe Shu, and Ramesh Jain. Knowledge guided parsing in video databases. In Electronic Imaging: Science and Technology, San Jose, California, February 1993. IS&T/SPIE.
[106] Mruthyunjaya S. Telagi and Athamaram H. Soni. 3-D object recognition techniques: A survey. In Proceedings of the 1994 ASME Design Technical Conferences, volume 73, September 1994.
[107] F. Tobagi, J. Pang, R. Baird, and M. Gang. Streaming RAID - a disk array management system for video files. In ACM Multimedia '93, pages 393-400, 1993.
[108] Yoshinobu Tonomura and S. Abe. Content-oriented visual interface using video icons for visual database systems. Journal of Visual Languages and Computing, 1(2):183-198, 1990.
[109] Yoshinobu Tonomura and Akihito Akutsu. A structured video handling technique for multimedia systems. IEICE Transactions on Information and Systems, E78-D(6):764-777, June 1994.
[110] Yoshinobu Tonomura, Akihito Akutsu, Yukinobu Taniguchi, and Gen Suzuki. Structured video computing. IEEE Multimedia, 1(3):34-43, 1994.
[111] H. M. Vin and P. V. Rangan. Designing a multiuser HDTV storage server. IEEE Journal on Selected Areas in Communications, 11, 1993.
[112] Jiri Weiss. RAIDing storage for multimedia. NewMedia, pages 34-38, February 1996.
[113] Ron Weiss, Andrzej Duda, and David Gifford. Content-based access to algebraic video. In IEEE International Conference on Multimedia Computing and Systems, Los Alamitos, CA, 1994.
[114] D. Wheeler. Computer networks are said to offer new opportunities for plagiarists. The Chronicle of Higher Education, pages 17-19, June 1993.
[115] J. Woods. Subband Image Coding. Kluwer Academic, Boston, MA, 1991.
[116] Information Infrastructure Task Force Working Group on Intellectual Property Rights. Green paper: Intellectual property and the national information infrastructure, July 1994. Preliminary draft.
[117] Boon-Lock Yeo. Efficient Processing of Compressed Images and Video. PhD thesis, Princeton University, January 1996.
[118] Boon-Lock Yeo and Bede Liu. On the extraction of DC sequence from MPEG compressed video. In The International Conference on Image Processing, October 1995.
[119] Boon-Lock Yeo and Bede Liu. Rapid scene analysis on compressed video. IEEE Transactions on Circuits and Systems for Video Technology, 5(6):533-544, December 1995.
[120] Boon-Lock Yeo and Bede Liu. A unified approach to temporal segmentation of Motion JPEG and MPEG compressed video. In Second International Conference on Multimedia Computing and Systems, May 1995.
[121] Minerva M. Yeung and Bede Liu. Efficient matching and clustering of video shots. In International Conference on Image Processing, volume I, pages 338-343, October 1995.
[122] P. S. Yu et al. Design and analysis of a grouped sweeping scheme for multimedia storage management. In Proceedings of the 3rd International Workshop on Network and Operating System Support for Digital Audio and Video, pages 44-45, November 1992.
[123] Ramin Zabih, Justin Miller, and Kevin Mai. Feature-based algorithms for detecting and classifying scene breaks. In Fourth ACM Conference on Multimedia, San Francisco, California, November 1995.
[124] H. Zhang and E. Knightly. Providing end-to-end statistical guarantees using bounding interval dependent stochastic models. In Proceedings of ACM SIGMETRICS '94, Nashville, TN, May 1994.
[125] H. Zhang and E. Knightly. Comparison of rate-controlled static priority and stop-and-go. Multimedia Systems, September 1995.
[126] H. Zhang and E. Knightly. A new approach to support VBR video in packet-switching networks. In Proceedings of the IEEE Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV '95), pages 275-286, Durham, NH, April 1995.
[127] Hong Jiang Zhang, Yihong Gong, Stephen W. Smoliar, and Shuang Yeo Tan. Automatic parsing of news video. In Proceedings of the IEEE Conference on Multimedia Computing and Systems, May 1994.
[128] Hong Jiang Zhang, A. Kankanhalli, and Stephen W. Smoliar. Automatic partition of animate video. Technical report, Institute of Systems Science, National University of Singapore, 1992.
[129] Hong Jiang Zhang, A. Kankanhalli, and Stephen W. Smoliar. Automatic partitioning of full-motion video. Multimedia Systems, 1(1):10-28, 1993.
[130] Hong Jiang Zhang, C. Y. Low, Y. Gong, and Stephen W. Smoliar. Video parsing using compressed data. In Image and Video Processing II, volume SPIE 2182, pages 142-149, 1994.
INDEX
AC coefficient, 35
Active document, 80
Airline Video-on-Demand (AVOD), 109
Algebraic Video System, 22
Asynchronous Transfer Mode (ATM), 76, 99, 108
AVI, 84, 101, 111
Bandwidth to Space Ratio (BSR), 71
Block motion estimation, 45
Catastrophic failure, 73
CCIR 601, 59, 61, 85
CCITT, 63
CD-i, 61, 88, 94
CompoundUnit, 15
Constant-bit-rate encoding, 63
Content-based access, 10, 23
Content-based query, 18
Content-based temporal sampling, 23
Cut, 24, 37
  cut detection, 24
Data camera, 42
Data independence, 17
  video data independence, 17
Data modeling, 3
DC coefficient, 24, 35-36
DC image, 26
DC image sequence, 33
DC sequence, 26
DCT block, 36
DCT coefficient, 27
DCT vector, 36
Degradation of service, 73
Delayed prefetching, 68
Deterministic Bounding Interval Dependent (D-BIND), 77
Digital library, 83
Digital Storage Media Command and Control (DSM-CC), 77-78
Discrete Cosine Transform (DCT), 4
Distance education, 6
Distance learning, 101
DS3, 108
DVI, 5, 64, 85, 111
Dynamic Segment Replication (DSR), 72
EISA bus, 85, 89
Electronic classroom, 104
Electronic commerce, 7
Elevator disk scheduling algorithm, 68
Entity-Relationship model, 21
Episode, 15
Ethernet, 99, 101
FDDI, 99, 101
Full motion capture and playback, 83
Full motion video, 84
G.711, 86-87
G.722, 87-88
G.728, 87-88
Gigabit network testbed, 5
Gigabit test bed
  The Gigabit Test Bed Initiative, 75
Global color statistic comparison, 34
Global LRU algorithm, 68
Graphic user interface (GUI), 54
Group sweeping scheme (GSS), 68
H.221, 63
H.230, 63
H.242, 63
H.261, 5, 63, 85, 104
H.320, 63
HDTV, 57, 61
Hierarchical video magnifier, 54
Histogram
  χ² histogram, 28, 31, 35
  color histogram, 27, 35
  histogram matching, 17, 25
  luminance histogram, 35
Iconic annotation, 43
In-flight entertainment (IFE), 106
Information-on-demand, 104
Intellectual property, 78
Intellectual property rights (IPRs), 78
Interactive training, 83
Interval inclusion inheritance, 20
ISA, 87
ISDN, 63
Isochronous tasks, 70
ITU, 63
JACOB system, 50
JPEG, 62, 84
Keyword annotation, 43
Kolmogorov-Smirnov test, 35
Lectures on demand, 6
Logical video structure model, 46
Lossy compression algorithm, 59
Lossy video compression algorithm, 59
Love page prefetching, 68
Mean time to failure (MTTF), 73
Media server, 5
Media Stream system, 43
MediaBENCH system, 46
MHEG, 63
Micon (moving icon), 54
Motion JPEG (MJPEG), 5, 35, 59, 62, 89, 96
Motion vector, 36, 45, 53
MovEase system, 52
Movie-on-demand (MOD), 71, 86
MPEG, 5, 25, 35-36, 41, 59, 83, 101
  B frame, 27
  D frame, 25
  I frame, 25, 35-36
  P frame, 25, 27, 36
MPEG-I, 25, 36, 60, 78, 86, 88, 96, 109
MPEG-II, 36, 60, 66, 78, 93, 96, 109
MPEG-IV, 61
Nested stratification, 22
Nonlinear order statistical filter, 32
NTSC, 5, 84, 87, 93
Object-oriented Video Information Database (OVID), 20, 52
Optical character recognition (OCR), 14
Packet Transfer Mode (PTM), 76
PAL, 5, 60, 84, 93
Parity schema, 73
PCI, 84, 87
PCM, 104
Period transformation technique, 68
Periodic tasks, 70
Phase-constrained storage allocation scheme, 71
Prefetching disk scheduling algorithm, 68
Presentation, 21
QBIC system, 50
Quadrature mirror filtering (QMF), 65
Quality proportional multisubscriber servicing (QPMS), 68
Quality-of-service (QOS), 6, 65, 71-72, 75, 77
QuickTime, 5, 64, 85, 96, 101
RAID, 91, 93, 99
Rate conversion, 68
Rate-monotonic basis, 70
Real-time IP (RTIP), 77
Removable storage system, 83
Representative frame (RFrame), 45
S-video, 84
SBus, 87
Scene, 15
  scene analysis, 3
  scene change, 17
    abrupt scene change, 3, 24, 37
    gradual scene change, 3, 24, 31, 37
  scene change detection (SCD), 4, 23-25
SCSI, 99
SECAM, 60, 84
Secure printers, 80
Segmented annotation, 43
Sequence, 15
Shot, 15, 17, 23, 42
Signature (watermark) scheme, 80
Similarity measurement, 49
Source annotation, 42
Spatial Temporal Logic (STL), 44, 51
SQL-92, 51
Stock-Clips, 111
StoredVideoSegment, 17
Stratification model, 42-43
Subband video coding algorithm, 65
T3, 108
Teleclassroom, 6
Telemedicine, 7
Template matching, 17, 25, 27, 31, 33
Token ring, 99, 101
Trade-Related Aspects of Intellectual Property (TRIPS), 78
TSQL, 51
TSQL2, 51
Variable bit rate (VBR), 75
Variable-bit-rate encoding, 63
Video algebraic operations, 21
Video annotation, 16, 42, 48
Video board, 83
Video browsing, 45
Video compression, 5
Video conferencing, 117
Video data indexing, 11, 41
  annotation based indexing, 41
  domain specific indexing, 4, 41
  feature based indexing, 4, 41, 44
Video data insertion, 10
Video data model
  algebraic video data model, 21
  domain specific shot model, 17
  episode model, 17
  hierarchical video stream model, 17
  object-oriented data model, 19
  stratification model, 19
  Video Object Description Model (VODM), 20
  video production process model, 18
  video segmentation based model, 19
  Visual Information Management System (VIMSYS), 18
Video data modeling, 10-11
Video data placement policy, 70
Video data query
  audiovisual query, 49
  browsing query, 50
  clip based query, 49
  deterministic query, 50
  direct query, 50
  exact match-based query, 49
  frame based query, 49
  Iconic Query (IQ), 50
  iterative query, 50
  meta information query, 48
  query by example, 50
  Query by Pictorial Example (QBPE), 50
  semantic information query, 48
  similarity match-based query, 49
  spatial query, 49
  spatial-temporal query, 49
  statistical query, 48
  temporal query, 49
Video data segmentation, 23
Video database (VDB), 2, 14
Video database management system (VDBMS), 10, 23
Video database system, 2
Video expression, 21
Video file server, 77
Video identifier (VID), 53
Video object, 20
Video query, 11
Video Query System, 52
Video retrieval, 11
Video scene analysis, 11
Video segmentation, 11, 42
Video Semantic Directed Graph (VSDG), 22
Video server system, 83
Video storage system, 83
Video-on-demand (VOD), 5, 7, 57, 61, 64, 67-68, 86, 93, 96, 100, 102, 106, 108-109
VideoSQL, 52
VideoSTAR system, 51, 54
VideoStream, 17
VMEbus, 87
World Intellectual Property Organization (WIPO), 78
World Trade Organization (WTO), 78
Yakimovsky's likelihood ratio test, 35