Xiaofeng Meng Jidong Chen
Moving Objects Management Models, Techniques and Applications
With 80 figures
Authors
Prof. Xiaofeng Meng, Information School, Renmin University of China, Beijing 100872, P.R. China
Dr. Jidong Chen, EMC Research China, 8/F, Block D, SP Tower, Tsinghua Science Park, Zhongguancun Dong Road, Beijing 100084, P.R. China
ISBN 978-7-302-22378-8 Tsinghua University Press, Beijing ISBN 978-3-642-13198-1 e-ISBN 978-3-642-13199-8 Springer Heidelberg Dordrecht London New York Library of Congress Control Number: 2010927587 © Tsinghua University Press, Beijing and Springer-Verlag Berlin Heidelberg 2010 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Cover design: Frido Steinen-Broo, EStudio Calamar, Spain Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Foreword
We live in an age of rapid technological development. The Internet already affects our lives in many ways. Indeed, we continue to depend more, and more intrinsically, on the Internet, which is increasingly becoming a fundamental piece of societal infrastructure, just as water supply, electricity grids, and transportation networks have been for a long time. But while these other infrastructures are relatively static, the Internet is undergoing swift and fundamental change: Notably, the Internet is going mobile. The world has some 6.7 billion humans, 4 billion mobile phones, and 1.7 billion Internet users. The two most populous continents, Asia and Africa, have relatively low Internet penetration and hold the greatest potential for growth. Their mobile phone users by far outnumber their Internet users, and the numbers are growing rapidly. China and India are each gaining about half a dozen million new phone users per month. Users across the globe as a whole increasingly embrace mobile Internet devices, with smartphone sales starting to outnumber PC sales. Indeed, these and other facts suggest that the Internet stands to gain a substantial mobile component. This megatrend towards “mobile” is enabled by rapid and continuing advances in key technology areas such as mobile communication, consumer electronics, geopositioning, and computing. In short, this is the backdrop for this very timely book on moving objects by Xiaofeng Meng and Jidong Chen. The mobile Internet differs from the conventional Internet in key respects. Its users are faced with much more varied use situations: Rather than being in the office or at home, they are engaged in diverse activities such as driving or using public transport, walking, or attending a meeting. The mobile setting calls for a much more varied collection of services and applications, many of which will push content to their users when certain conditions are met.
It becomes increasingly important to be able to anticipate the user’s current needs. A key signal in this regard is the user’s geo-context, notably the user’s current location. Thus, user location is fundamentally important for the mobile Internet.
Meng and Chen’s book concerns data management for moving objects on the mobile Internet. This important area is subject to intense research by a large and global community of scientists. And due to its many and diverse contributions, this area is also often confusing: it is difficult to attain an overview of important topics and solutions. I am excited about the book because, by its very choice of topics and its coverage of them, it offers structure to this rapidly evolving area. It introduces the reader to key topics in data management for moving objects, offering both overviews and covering specific techniques in considerable detail. The book covers modeling, query processing techniques, and applications. The book considers the representation of the positions of moving objects and the modeling of the underlying space in which the objects move. Since object movement is frequently constrained to a transportation network, the book affords this setting special attention throughout. It addresses the problem of maintaining up-to-date representations of the objects’ positions. It also considers the important problem of indexing a database of frequently updated moving-object positions, including the current positions and the past, current, and anticipated future positions stored in an evolving database, as well as the past trajectories stored in a static database. On this foundation, the book delves into query processing, covering the fundamental kNN and range queries and also similar-trajectory retrieval and one-time and continuous density queries. It covers solutions to the problem of predicting the future trajectory of a moving object, and it addresses the topic of position uncertainty. Moving on to applications, the book puts focus on dynamic vehicle navigation, data management in dynamic transportation networks, and real-time moving-object clustering. The book ends with coverage of location privacy.
The book meets the need for a coherent account of the state-of-the-art on important topics in the area of moving-object data management, which is at the core of the evolving mobile Internet. It comes highly recommended to research students and researchers new to the topics covered, as well as to experienced researchers.
Preface
The continued advances in wireless communications and positioning technologies such as global positioning systems (GPS) enable new data management applications such as location-based services (LBS) that store and manage the continuously changing positions of moving objects. This book gives a comprehensive view of a moving object management system. It approaches moving objects management from the location management perspective, analyzing how continually changing locations affect traditional database and data mining technology. Specifically, the book covers moving objects modeling, location updating and indexing, querying and prediction for moving objects, uncertainty management, clustering analysis, and location privacy, as well as some applications in intelligent transportation management. Early studies focused on moving objects databases in free space. They assumed that the movement of the objects is unconstrained and based on Euclidean space. However, in the real world, objects move within spatially constrained networks, e.g., vehicles move on road networks. Overlooking this reality often leads to unrealistic data modeling and inaccurate query results. This book therefore focuses mainly on moving objects within spatial networks, which is the more practical setting. By exploiting the network features of spatial networks, it introduces models, techniques, and applications of moving objects management in a spatial network. The book is intended to help readers understand the main technologies in moving object management and apply them to LBS and transportation applications. With its accessible style and emphasis on practicality, the book presents new concepts and techniques for managing continuously moving objects.
Database management systems developers, mobile applications developers, and applied R&D researchers will find the study an essential companion for new concepts, development strategies, and application models associated with this kind of changing location data. The book:
• presents a comprehensive architecture of moving object management, which includes not only basic theories and new concepts but also practical technologies and applications.
• describes a set of new database techniques in modeling, indexing, querying and updating locations of moving objects, as well as data mining techniques in clustering analysis of moving objects.
• introduces some new research issues in location privacy and uncertainty management of moving objects, which are topics of major interest in this field.
• provides two typical applications of moving objects management in intelligent transportation systems.
Organization of the Book
The book contains three parts with a total of twelve chapters, which describe the models, techniques, and applications of moving objects management. It is organized as follows:
The first part describes the underlying data models of moving objects management, including location modeling, location updating, and moving object indexing. In Chapter 1, we introduce some background of moving objects management, including mobile computing and positioning techniques, and then describe some applications in location-based services and mobile data management. Finally, we present the main content: moving objects database technologies and the focus of this book. In Chapter 2, we introduce a few underlying location modeling methods and propose a new graph of cellular automata (GCA) model to integrate the traffic movement features into the model of moving objects and the underlying spatial network. In Chapter 3, we introduce a few underlying location update methods. Then, we describe in detail two location update strategies that can improve performance: the proactive location update strategy and the group location update strategy. In Chapter 4, we first introduce a few of the underlying spatial index structures including the R-tree, Grid File, and Quad-tree. Then, we propose the indexing methods for moving objects in Euclidean space and in spatial networks, respectively. Finally, we describe techniques that index the past, present, and anticipated future positions of moving objects.
The second part describes the key techniques of moving objects management, in particular query processing, location prediction, and uncertainty management. In Chapter 5, we classify the basic query types for moving objects according to spatial predicates, temporal predicates, and moving spaces. Then, we introduce how to process a range query and a kNN query in a spatial network, based on the Euclidean restriction and network expansion frameworks.
In Chapter 6, we introduce advanced querying for moving objects, including similar trajectory queries and density queries for moving objects in a spatial network. We first present how to process snapshot density queries. Then, we introduce some efficient methods based on the safe interval to continuously monitor dense regions for moving objects. In Chapter 7, we first review some linear prediction methods and analyze their limitations in handling moving objects in spatial networks, and then present the simulation-based prediction methods: Fast-Slow Bounds Prediction and Time-Segmented Prediction. In Chapter 8, we study the uncertainty management problem for moving objects databases with uncertainty models and indexing algorithms. We propose an uncertainty model and an index framework, the UTR-Tree, for indexing the fully uncertain trajectories of network-constrained moving objects.
The third part describes some typical applications of moving objects management, e.g., dynamic transportation navigation and dynamic transportation networks. Some advanced applications like location privacy and clustering analysis of moving objects are also introduced. In Chapter 9, we first discuss the kinds of applications that can be built on moving objects management technologies. Then, we describe in detail a typical application in an intelligent transportation system, dynamic transportation navigation, which continuously provides the user, in real time, with the optimal path to the destination given the current traffic conditions. In Chapter 10, we present another application, a new moving objects model and query system for moving objects on dynamic transportation networks (MODTN). In MODTN, moving objects are modeled as moving graph points that move only within predefined transportation networks, and the underlying transportation networks are modeled as dynamic graphs so that the state and the topology of the graph system at any point in time can be tracked and queried. In Chapter 11, we introduce an advanced application, clustering analysis of moving objects in spatial networks.
We first propose two new static clustering algorithms, which use the information of nodes and edges in the network to improve the clustering efficiency and accuracy. Then, we introduce the notion of a cluster block (CB) as the underlying clustering unit and propose a unified framework for clustering moving objects in spatial networks (CMON), which improves the dynamic clustering performance of moving objects and supports different clustering criteria. In Chapter 12, we introduce location privacy and analyze the challenges of preserving location privacy. Then, we provide an analysis of the current studies including the system architecture, location anonymity, and query processing. As shown in Fig. 0.1, each chapter of the three parts in this book can be treated as one component of a typical moving objects management system. Together, the contents of the book form a comprehensive moving object management and application system. Figure 0.1 also shows the relationships among the components in the system.
Fig. 0.1 Organization of the book
Acknowledgements
The work described in this book has been supported by a grant from the National Natural Science Foundation of China under grant number 60573091 (“Research on the Key Technologies of Network-constrained Moving Objects Databases”, Jan. 2006 - Dec. 2008). This book is based on the research work of the authors in the past years, including the Ph.D. thesis of Dr. Jidong Chen, completed in June 2007 at Renmin University of China. The book integrates the collective intelligence of the mobile group of the Lab of Web and Mobile Data Management (WAMDM) at Renmin University of China. The authors would like to thank all the people in the group: Zhiming Ding, Xiao Pan, Xing Hao, Rui Ding, Yun Bai, Yanyan Guo, Benzhao Li, Zhizhi Hu, Zhen Xiao, Caifeng Lai, Shaoyi Yin and Yulei Fan. In particular, the authors thank Xing Hao, Xiao Pan and Yulei Fan for helping with the collection and organization of the materials.
Beijing, China, December 2009
Xiaofeng Meng Jidong Chen
Contents
Part I Moving Objects Management Models
1 Introduction
  1.1 Background
    1.1.1 Mobile Computing
    1.1.2 Positioning Techniques
  1.2 Location-Based Services
  1.3 Mobile Data Management
  1.4 Moving Object Databases
  References
2 Moving Objects Modeling
  2.1 Introduction
  2.2 Underlying Models
  2.3 Graphs of Cellular Automata Model
    2.3.1 Cellular Automata (CA)
    2.3.2 Structure of GCA
    2.3.3 Trajectory of GCA
    2.3.4 Transition of GCA
    2.3.5 Two-Lane GCA
  2.4 Summary
  References
3 Moving Objects Updating
  3.1 Introduction
  3.2 Underlying Update Strategies
    3.2.1 Based on Threshold
    3.2.2 Based on Location Prediction
    3.2.3 Based on Object Grouping
  3.3 Proactive Location Update Strategy
  3.4 Group Location Update Strategy
  3.5 Summary
  References
4 Moving Objects Indexing
  4.1 Introduction
  4.2 Underlying Indexing Structures
    4.2.1 The R-Tree
    4.2.2 The Grid File
    4.2.3 The Quad-Tree
  4.3 Indexing Moving Objects in Euclidean Space
    4.3.1 The R-Tree-Based Index
    4.3.2 The Grid-Based Index
    4.3.3 The Quad-Tree-Based Index
  4.4 Indexing Moving Objects in Spatial Networks
    4.4.1 The Adaptive Unit
    4.4.2 The Adaptive Network R-Tree (ANR-Tree)
  4.5 Indexing Past, Present, and Future Trajectories
    4.5.1 Indexing Future Trajectory
    4.5.2 Indexing History Trajectories
  4.6 Update-Efficient Indexing Structures
  4.7 Summary
  References
Part II Moving Objects Management Techniques
5 Moving Objects Basic Querying
  5.1 Introduction
  5.2 Classifications of Moving Object Queries
    5.2.1 Based on Spatial Predicates
    5.2.2 Based on Temporal Predicates
    5.2.3 Based on Moving Spaces
  5.3 NN Queries
    5.3.1 Incremental Euclidean Restriction
    5.3.2 Incremental Network Expansion
  5.4 Range Queries
    5.4.1 Range Euclidean Restriction
    5.4.2 Range Network Expansion
  5.5 Summary
  References
6 Moving Objects Advanced Querying
  6.1 Introduction
  6.2 Similar Trajectory Queries for Moving Objects
    6.2.1 Problem Definition
    6.2.2 Trajectory Similarity
    6.2.3 Query Processing
  6.3 Density Queries for Moving Objects in Spatial Networks
    6.3.1 Problem Definition
    6.3.2 Cluster-Based Query Preprocessing
    6.3.3 Density Query Processing
  6.4 Continuous Density Queries for Moving Objects
    6.4.1 Problem Definition
    6.4.2 Building the Quad-Tree
    6.4.3 Safe Interval Computation
    6.4.4 Query Processing
  6.5 Summary
  References
7 Trajectory Prediction of Moving Objects
  7.1 Introduction
  7.2 Underlying Linear Prediction (LP) Methods
    7.2.1 General Linear Prediction
    7.2.2 Road Segment-Based Linear Prediction
    7.2.3 Route-Based Linear Prediction
  7.3 Simulation-Based Prediction (SP) Methods
    7.3.1 Fast-Slow Bounds Prediction
    7.3.2 Time-Segmented Prediction
  7.4 Other Non-Linear Prediction Methods
  7.5 Summary
  References
8 Uncertainty of Moving Objects
  8.1 Introduction
  8.2 Uncertain Trajectory Modeling
  8.3 Uncertain Trajectory Indexing
    8.3.1 Structure of the UTR-Tree
    8.3.2 Construction and Maintenance of UTR-Tree
  8.4 Uncertainty Trajectory Querying
  8.5 Summary
  References
Part III Moving Objects Management Applications
9 Dynamic Transportation Navigation
  9.1 Introduction
  9.2 Moving Objects Management Application Scenarios
  9.3 Dynamic Transportation Navigation
    9.3.1 Hierarchy Aggregation Tree
    9.3.2 Dynamic Navigation Query Processing
    9.3.3 Dynamic Navigation System Architecture
  9.4 Summary
  References
10 Dynamic Transportation Networks
  10.1 Introduction
  10.2 The System Architecture
  10.3 Data Model of Transportation Network and Moving Objects
  10.4 Querying Moving Objects in Transportation Networks
    10.4.1 Computing the Locations Through Interpolation
    10.4.2 Querying Moving Objects with Uncertainty
    10.4.3 Location Prediction in Transportation Networks
  10.5 Summary
  References
11 Clustering Analysis of Moving Objects
  11.1 Introduction
  11.2 Underlying Clustering Analysis Methods
  11.3 Clustering Static Objects in Spatial Networks
    11.3.1 Problem Definition
    11.3.2 Edge-Based Clustering Algorithm
    11.3.3 Node-Based Clustering Algorithm
  11.4 Clustering Moving Objects in Spatial Networks
    11.4.1 CMON Framework
    11.4.2 Construction and Maintenance of CBs
    11.4.3 CMON Construction with Different Criteria
  11.5 Summary
  References
12 Location Privacy
  12.1 Introduction
  12.2 Privacy Threats in LBS
  12.3 System Architecture
    12.3.1 Non-Cooperative Architecture
    12.3.2 Centralized Architecture
    12.3.3 Peer-to-Peer Architecture
  12.4 Location Anonymization Techniques
    12.4.1 Location K-Anonymity Model
    12.4.2 p-Sensitivity Model
    12.4.3 Anonymization Algorithms
  12.5 Evaluation Metrics
  12.6 Summary
  References
Index
Acronyms
ANN  Aggregate nearest neighbor
AU  Adaptive unit
CA  Cellular automaton
CN  Cluster node
CU  Cluster unit
DS  Dense segment
DSS  Dense segment set
DTTLU  Distance-threshold triggered location update
DyNSA  Dynamic navigation system based on moving objects stream aggregation
GCA  Graph of cellular automata
GPS  Global positioning system
HAT  Hierarchy aggregation tree
IER  Incremental Euclidean restriction
INE  Incremental network expansion
ITLU  ID-triggered location update
LBS  Location-based service
LP  Linear prediction
MBR  Minimum bounding rectangle
MO  Moving object
MOD  Moving objects databases
MODTN  Moving objects on dynamic transportation networks
MOST  Moving objects spatio-temporal
MRM  Mobile resource management
NN  Nearest neighbor
PDQ  Period density queries
PTSS  Prediction with time-segmented
QoS  Quality of service
RER  Range Euclidean restriction
RNE  Range network expansion
RNN  Reverse nearest neighbor
SDQ  Snap-shot density queries
SP  Simulation-based prediction
STTLU  Speed-threshold triggered location update
UT-Unit  Uncertain trajectory unit
UTR-Tree  Uncertain trajectory R-tree
Part I
Moving Objects Management Models
Mobile data management has attracted considerable attention. Moving objects databases that manage the locations and corresponding information of moving objects have therefore been developed and have become the technical foundation of many location-based services applications. This part describes the underlying data models of moving objects management, including location modeling, location updating, and moving object indexing. In the first chapter, we introduce some background of moving objects management, including mobile computing and positioning technology, and then describe some applications in location-based services and mobile data management. Finally, we present the main content of moving objects databases technologies and our focuses in this book. Location modeling is the foundation for moving objects databases. In Chapter 2, we introduce a few underlying location modeling methods and propose a new graph of cellular automata (GCA) model to integrate the traffic movement features into the model of moving objects and the underlying spatial network. The structure, trajectory, and transition of GCA as well as two-lane GCA are described in detail. In Chapter 3, a few underlying location update methods are introduced based on thresholds, location prediction, and object grouping. Then, we describe two location update strategies in detail, which can improve the performance. One is the proactive location update strategy, which predicts the movement of moving objects in order to lower the update frequency; the other is the group location update strategy, which groups the objects to minimize the total number of objects reporting their locations. In Chapter 4, we first introduce a few of the underlying spatial index structures including the R-tree, Grid File, and Quad-tree. Then, we propose the indexing methods for moving objects in Euclidean space and in spatial networks, respectively. 
Three index structures, the time-parameterized R-tree (TPR-tree), the Grid File-based moving objects index (GMOI), and the future trajectory Quad-tree (FT-Quadtree), are presented; they adapt the R-tree, Grid File, and Quad-tree, respectively, to moving objects in Euclidean space. For moving objects in spatial networks, we introduce a dynamic data structure called the adaptive unit, together with the adaptive network R-tree (ANR-tree), to solve the index update problem and to support predictive querying of moving objects. Naturally extended to index historical trajectories, the ANR-tree can be used to index the past, present, and future positions of moving objects in road networks. Finally, we discuss how to reduce index updates in existing moving objects indexing structures.
Chapter 1
Introduction
Xiaofeng Meng (1), Jidong Chen (2)
(1) Renmin University of China, (2) EMC Research China
[email protected], [email protected]

Abstract Advances in computer and telecommunication technologies have made mobile computing a reality. In a mobile computing environment, users can access information through wireless connections regardless of their physical location. Over the last decade, this new computing paradigm has developed rapidly and posed new challenges to databases. Mobile data management has attracted considerable attention. Moving objects databases, which include the management of location information, have become an enabling technology for many location-based services applications. In this chapter, we introduce some background of moving objects management, including mobile computing and positioning technology, and then describe some applications in location-based services and mobile data management. Finally, we present the main content of moving objects databases technologies.

Key words: mobile computing, positioning, location-based service, moving object, moving object databases, model, index, query, update, prediction, uncertainty management, clustering analysis, location privacy
1.1 Background

1.1.1 Mobile Computing

The combination of computing techniques and wireless networks makes mobile computing more and more pervasive. Compared with the traditional distributed computing environment based on stable networks, mobile computing has the following features: mobility, frequent disconnection, variable bandwidth, asymmetric network communication, scalability, limited power of mobile devices, low network reliability, and so on [13]. In such environments, some new technologies, in particular positioning technologies, have emerged and enabled a variety of new applications such as location-based services.
1.1.2 Positioning Techniques

Recently, many new positioning techniques have been developed. GPS, which stands for global positioning system, is the oldest computerized locating technique and is in widespread use, as it is the only technique with worldwide (outdoor) coverage. Invented by the American military, it has made the transition to the civilian marketplace and is today used in almost every navigation device for any means of transportation. Activities from flying and sailing to car racing and ski touring all commonly use GPS to determine location. As the prices and size of GPS chips go down, they are integrated into more devices. Other techniques are being developed to localize a user, mostly based on various wireless technologies such as WiFi, RFID, and Bluetooth. Each kind of positioning technology has its advantages and disadvantages. The chosen technology will depend on the characteristics required of the service. Variables pertaining to the different technologies include:
• accuracy
• price
• availability
• size of the area that needs to be covered
• coverage investment
• type of coverage (indoor, outdoor)
• power consumption
• physical size
Table 1.1 shows several positioning technologies: GPS, cellular network based, WiFi, tag based systems (RFID and Bluetooth), and each has its own characteristics pertaining to these variables.
Table 1.1 Overview of positioning technologies

| Variable | GPS | Cellular Network | Cellular Handset | WiFi | Tag-based (RFID/Bluetooth) |
|---|---|---|---|---|---|
| Accuracy | 5 - 10 m | 50 m - 30 km | 2.5 - 200 m | 13 - 40 m | Meters |
| Coverage | Worldwide | Most countries | Most countries | Urban environments | Very local |
| Indoor/outdoor coverage | Mainly outdoor | Both | Both | Both | Both |
| Hardware | GPS chip | Phone | Phone | WiFi reception | Tag reader |
| Coverage investment | None | Done by provider | Mapping coverage | Mapping coverage | Tagging places |
| Disadvantages | Coverage, extra hardware needed | Provider owned, expensive to access | Not possible in current phone software | WiFi coverage needed, extensive mapping | Work only in vicinity of tags, active user input needed |

1.2 Location-Based Services

A location-based service (LBS) is an information and entertainment service, accessible with mobile devices through the mobile network. It makes use of the geographical position of the mobile device. In a typical LBS application, moving objects use e-services that involve location information. The objects disclose their positional information (position, speed, velocity, etc.) to the services, which in turn use this and other information to provide specific functionality. The five categories described next characterize what may be thought of as standard location-based services; they do not attempt to describe the diversity of services possible [18].

1. Traffic coordination and management: Based on past and up-to-date positional data on the subscribers to a service, the service may identify traffic jams and determine the currently fastest route between two positions; it may give estimates and accurate error bounds for the total travel time, and it may suggest updated routes for the remaining travel. It also becomes possible to automatically charge fees for the use of infrastructure such as highways or bridges (termed road pricing and metered services).
2. Location-aware advertising and general content delivery: Users may receive sales information (or other content) based on their current locations when they indicate to the service that they are in "shopping-mode." Positional data is used together with an accumulated user profile to provide a better service, e.g., advertisements that are more relevant to the user.
3. Integrated tourist services: This covers the advertising of the available options for various tourist services, including all relevant information about these services and options. Services may include overnight accommodation at camp grounds, hostels, and hotels; transportation via train, bus, taxi, or ferry; and cultural events, including exhibitions, concerts, etc. For example, this latter kind of service may cover opening-hour information, availability information, travel directions, directions to empty parking, and ticketing. It is also possible to give guided tours to tourists, e.g., tourists that carry online cameras.
4. Safety-related services: It is possible to monitor tourists traveling in dangerous terrain, and then react to emergencies (e.g., skiing or sailing accidents); it is possible to offer senile senior citizens more freedom of movement. It is possible to offer a service that takes traffic conditions into account to guide users to desired destinations along safe paths.
5. Location-based games and entertainment: One example of this is treasure hunting, where the participants compete in recovering a treasure. The treasure is virtual, but is associated with a physical location. By monitoring the positions of the participants, the system is able to determine when the treasure is found and by whom. In a variation of this example, the treasure is replaced by a "monster" with "vision", "intelligence", and the ability to move. Another example in this category is a location-based ICQ service.
1.3 Mobile Data Management

In mobile computing environments, many new applications deal with a significant amount of data, which leads to the need for mobile data management techniques [11, 17]. Mobile data management mainly includes mobile database techniques, small-footprint database design and implementation, and moving objects data management. Mobile database techniques include mobile transaction management, data caching and replication, synchronization, and publication. Small-footprint database techniques include flash-based storage and index model design, query processing and optimization in limited memory, transaction management, recovery techniques, and synchronization. Moving objects data management includes modeling and tracking of dynamic location information, uncertainty management, indexing and location-dependent query processing, data mining (including traffic and location prediction), privacy and security, and location dissemination.

In addition, the strong growth in wireless communications and the ever-increasing availability of mobile multi-purpose devices have created a global computing environment. People communicate, work, and confer using a wide range of devices all connected via an array of communication networks that provide voice and data access regardless of geographic position. This infrastructure aggregation presents a number of challenges, especially when it comes to data-intensive applications such as LBS and PIM and those with sensor networks. Therefore, non-traditional issues including semantics of data, location-centric data services, broadcast and multicast delivery, data availability techniques, security of data, as well as privacy questions should be given considerable attention [11, 13].
1.4 Moving Object Databases

Existing database management systems (DBMSs) are not well equipped to handle the continuously changing location data of moving objects. Moving objects databases (MODs), which include the management of location information, have become an enabling technology for LBS applications.
Moving objects databases belong to the area of spatio-temporal databases, which derive from spatial databases and temporal databases. The difference is that moving objects databases focus on continuous change in geographic space, while spatio-temporal databases support only discrete changes of spatial information. There are two aspects of managing moving objects in databases: location management and spatio-temporal data. Location management focuses on how to represent, store, and query the continuously changing locations of moving objects in a database as well as how to predict the future positions of moving objects. For spatio-temporal data, it is important to store the whole history of the movement of moving objects so that queries about any time can be answered. Considerable research has been carried out regarding these two aspects of moving objects data management, including modeling and tracking of location information, uncertainty management, spatio-temporal indexing and querying issues, and data mining (including traffic and location prediction). The main contents of moving objects management include the following:

1. Moving Objects Modeling: A fundamental capability of location management is modeling and representation of location information. Wolfson et al. [30, 36] first proposed a moving objects spatio-temporal (MOST) model, which represents the location as a dynamic attribute. Later, models based on linear constraints [31], abstract data types [15], and space-time grid storage [2] for moving objects were proposed. Recently, more research on modeling [7, 10, 14, 15] has begun to consider the interaction between moving objects and the underlying transportation networks as well as the real movement features of objects in transportation. For example, [7] models the road network and moving objects in a graph of cellular automata, which integrates the traffic movement features into the model of moving objects and the underlying spatial network.
2. Moving Objects Updating and Prediction: Large numbers of locations can be sampled periodically by sensors or GPS, then sent from moving clients to the server and stored in a database. Therefore, continuously maintaining the current locations of moving objects in a database by using a tracking technique becomes very important. The key issue is minimizing the number of updates while providing precise locations for query results. To reduce location updates, most existing studies propose to lower the update frequency by a prediction method [3, 35, 37]. They usually use the linear prediction model, which represents object locations as linear functions of time. The objects do not report their locations to the server unless their actual positions deviate from the predicted positions by more than a certain threshold. The group update strategy [9] is also proposed to reduce the total number of objects reporting their locations. A simulation-based prediction model is proposed in [7]. This model provides a more accurate location prediction for object movements in a traffic road network while lowering the update frequency and assuring location precision.
3. Moving Objects Indexing: Indexing and querying issues in the MOD are closely related to the corresponding aspects of spatio-temporal databases. The access methods focus on two issues: (1) storage and retrieval of historical information, and (2) prediction of future trajectory. The amount of historical trajectories is constantly increasing over time, which makes it infeasible to keep track of all location updates. As a result, past positions of moving objects are often approximated by polylines (multiple line segments). Several indexing techniques [23, 28, 32], all based on three-dimensional (3-D) variations of R-trees and R*-trees, have been proposed, and their goal is to minimize the storage and query cost. Indexing future trajectories is different from indexing historical trajectories. The goal is to efficiently retrieve objects that will satisfy some spatial condition at a future time given their present motion vectors. Some of the early studies [20, 24, 29] employ dual transformation techniques that represent the predicted positions as points moving in a two-dimensional (2-D) space. The main focus of the most recent work is on practical implementation [19, 29, 34]. For instance, the TPR-tree [29] and its variations [34] are based on R-trees, and the Bx-tree [19] is based on the B+-tree. Development of efficient indexes with good update performance is a challenge due to frequent object movement. A novel spatio-temporal index based on the PMR Quad-tree is proposed in [12]. It adopts a shared trajectory-segment structure and provides an efficient update algorithm. A dynamic data structure, called Adaptive Unit, is introduced in [5]; it groups neighboring objects with similar movement patterns. To reduce updates, an adaptive unit captures the movement bounds of its objects; the bounds are based on different assumptions about traffic conditions and are obtained from a simulation that considers road-network constraints and stochastic traffic behavior. A spatial index for the road network is then built over the adaptive unit structures, which forms the ANR-tree [6]. The ANR-tree supports efficient predictive queries and is robust for frequent updates.
4. Moving Objects Querying: Corresponding to indexes, the queries for moving objects can also be divided into two categories: queries of historical locations of the moving objects, and queries of their anticipated future locations (also known as predictive queries). In addition, spatial query types such as the window query, K-nearest neighbor query, and spatial join query can naturally find their counterparts in MOD, except that they are augmented with additional temporal predicates [25, 26]. The dynamic nature of moving objects also leads to several novel query types: similarity trajectory queries, density queries [21], and continuous queries [22, 27, 33].
5. Moving Objects Uncertainty Management: The location of a moving object is inherently imprecise because, regardless of the policy used to update the database location of the object (i.e., the object location stored in the database), the database location cannot always be identical to the actual location of the object. This inherent uncertainty has various implications for database modeling, querying, and indexing [16]. Although uncertainty in databases has been studied extensively, the new modeling and spatio-temporal capabilities needed for moving objects introduce the need to revise existing solutions.
6. Moving Objects Clustering Analysis: For some new applications, real-time data analysis such as clustering analysis is becoming one of the most important requirements, especially clustering moving objects in spatial networks [4, 8]. One of the objectives of clustering objects is to identify traffic congestion. A unified framework for clustering moving objects in spatial networks (CMON) is proposed in [4]. The goals are to optimize the cost of clustering moving objects and to support multiple types of clusters in a single application. The framework is composed of two components: (1) the continuous maintenance of cluster blocks (CBs), and (2) the periodic construction of clusters with different criteria based on CBs. The network features are explored to reduce the search space and avoid unnecessary computation of network distance.
7. Location Privacy: Protection of users' privacy has been a central issue for location-based services. Privacy threats related to location-based services are classified into two categories: communication privacy threats and location privacy threats. Location privacy is a particular type of information privacy [1]. In [38], two kinds of privacy protection requirements in LBS are identified: location anonymity and identifier anonymity. To strike a balance between location privacy and quality of service (QoS), a quality-aware anonymity model for protecting location privacy while meeting user-specified QoS requirements is necessary.

Depending on the particular objects and applications under consideration, the movements of the objects may be subject to constraints. Specifically, we may distinguish among three movement scenarios, namely unconstrained movement (vessels at sea), constrained movement (pedestrians), and movement in spatial networks (trains and, typically, cars) [25]. The latter scenario occurs when the applications at hand are concerned with the positions of the objects with respect to the transportation network. For example, we may expect that many applications will be interested only in the positions of cars with respect to the road network, rather than in their absolute coordinates. Early studies focused on moving objects databases in free space. They assume that the movement of the objects is unconstrained and based on Euclidean spaces.
However, in the real world, objects move within spatially constrained networks, e.g., vehicles move on road networks. Overlooking this reality often leads to unrealistic data modeling and inaccurate query results. The content in this book focuses mainly on moving objects within spatial networks, which is more practical. By exploring the features of spatial networks, this book introduces models, techniques, and applications of moving objects management from the location management perspective to analyze how the continually changing locations affect the traditional database and data mining technology. Specifically, the book describes the topics of moving objects modeling, location updating and indexing, querying and prediction for moving objects, uncertainty management, clustering analysis and location privacy issues, as well as some applications in intelligent transportation management.
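The threshold-based update policy described under "Moving Objects Updating and Prediction" above can be illustrated with a minimal Python sketch. It assumes a two-dimensional Euclidean setting and a linear prediction shared by client and server; the names `LinearPrediction`, `needs_update`, and `track` are hypothetical and are not taken from the cited work.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class LinearPrediction:
    """Last reported state, assumed to be known to both client and server."""
    t0: float  # time of the last report
    x0: float  # reported position
    y0: float
    vx: float  # reported velocity components
    vy: float

    def position_at(self, t: float) -> Tuple[float, float]:
        # Location as a linear function of time since the last report.
        dt = t - self.t0
        return (self.x0 + self.vx * dt, self.y0 + self.vy * dt)


def needs_update(pred: LinearPrediction, t: float,
                 x: float, y: float, threshold: float) -> bool:
    """True if the actual position deviates from the prediction by more than threshold."""
    px, py = pred.position_at(t)
    return math.hypot(x - px, y - py) > threshold


def track(samples: Iterable[Tuple[float, float, float, float, float]],
          threshold: float) -> List[float]:
    """samples are (t, x, y, vx, vy); returns the times at which a report is sent."""
    reports: List[float] = []
    pred = None
    for t, x, y, vx, vy in samples:
        if pred is None or needs_update(pred, t, x, y, threshold):
            pred = LinearPrediction(t, x, y, vx, vy)  # the object reports its new state
            reports.append(t)
    return reports


# An object drives east for three time units, then turns north.
samples = [(0, 0, 0, 10, 0), (1, 10, 0, 10, 0), (2, 20, 0, 10, 0),
           (3, 30, 0, 10, 0), (4, 30, 10, 0, 10), (5, 30, 20, 0, 10)]
print(track(samples, threshold=5.0))  # [0, 4]
```

In this example only two of the six samples trigger a report, because the linear prediction stays exact until the object turns. A larger threshold trades location precision for fewer updates.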
References

1. Beresford AR, Stajano F (2003) Location Privacy in Pervasive Computing. IEEE Pervasive Computing 2(1):46-55
2. Chon HD, Agrawal D, Abbadi AE (2001) Using Space-Time Grid for Efficient Management of Moving Objects. In: Proceedings of the 2nd ACM International Workshop on Data Engineering for Wireless and Mobile Access (MobiDE 2001), Santa Barbara, California, USA, pp 59-65
3. Civilis A, Jensen CS, Nenortaite J, Pakalnis S (2004) Efficient Tracking of Moving Objects with Precision Guarantees. In: Proceedings of the 1st Annual International Conference on Mobile and Ubiquitous Systems, Networking and Services, Cambridge, Massachusetts, USA, pp 164-173
4. Chen J, Lai L, Meng X, Xu J, Hu H (2007) Clustering Moving Objects in Spatial Networks. In: Proceedings of the 12th International Conference on Database Systems for Advanced Applications (DASFAA 2007), Bangkok, Thailand, pp 611-623
5. Chen J, Meng X (2009) Update-efficient Indexing of Moving Objects in Road Networks. GeoInformatica 13(4):397-424
6. Chen J, Meng X, Guo Y, Grumbach S (2007) Indexing Future Trajectories of Moving Objects in a Constrained Network. Journal of Computer Science and Technology 22(2):245-251
7. Chen J, Meng X, Guo Y, Grumbach S, Wang H (2006) Modeling and Predicting Future Trajectories of Moving Objects in a Constrained Network. In: Proceedings of the 7th International Conference on Mobile Data Management (MDM 2006), Nara, Japan, pp 156
8. Chen J, Meng X, Lai C (2007) Clustering Objects in Road Networks (in Chinese). Journal of Software 18(2):332-344
9. Chen J, Meng X, Li B, Lai C (2006) Tracking Network-Constrained Moving Objects with Group Updates. In: Proceedings of the 7th International Conference on Web-Age Information Management (WAIM 2006), Hong Kong, China, pp 158-169
10. Ding Z, Güting RH (2004) Managing Moving Objects on Dynamic Transportation Networks. In: Proceedings of the 16th International Conference on Scientific and Statistical Database Management (SSDBM 2004), Santorini Island, Greece, pp 287-296
11. Dunham MH, Helal A (1995) Mobile Computing and Databases: Anything New? SIGMOD Record 24:5-9
12. Ding R, Meng X, Bai Y (2003) Efficient Index Update for Moving Objects with Future Trajectories. In: Proceedings of the 8th International Conference on Database Systems for Advanced Applications (DASFAA 2003), Kyoto, Japan, pp 183-194
13. Forman GH, Zahorjan J (1994) The Challenges of Mobile Computing. Computer 27:387-403
14. Güting RH, Almeida VT, Ding Z (2006) Modeling and Querying Moving Objects in Networks. VLDB Journal 15(2):165-190
15. Güting RH, Böhlen MH, Erwig M, Jensen CS, Lorentzos NA, Schneider M, Vazirgiannis M (2000) A Foundation for Representing and Querying Moving Objects. ACM Transactions on Database Systems 25(1):1-42
16. Gowrisankar H, Nittel S (2002) Reducing Uncertainty in Location Prediction of Moving Objects in Road Networks. In: Proceedings of the 2nd International Conference on Geographic Information Science (GIS 2002), Boulder, Colorado, USA, pp 228-242
17. Imielinski T, Badrinath BR (1993) Data Management for Mobile Computing. SIGMOD Record 22:349
18. Jensen CS, Friis-Christensen A, Pedersen TB, Pfoser D, Saltenis S, Tryfona N (2001) Location-Based Services: A Database Perspective. In: Proceedings of the 8th Scandinavian Research Conference on Geographical Information Science (ScanGIS 2001), As, Norway, pp 59-68
19. Jensen CS, Lin D, Ooi BC (2004) Query and Update Efficient B+-Tree Based Indexing of Moving Objects. In: Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 2004), Toronto, Canada, pp 768-779
20. Kollios G, Gunopulos D, Tsotras VJ (2000) On Indexing Mobile Objects. In: Proceedings of the 19th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2000), Dallas, Texas, USA, pp 175-186
21. Lai C, Wang L, Chen J, Meng X (2007) Effective Density Queries for Moving Objects in Road Networks. In: Proceedings of the Joint International Conferences on Asia-Pacific Web Conference (APWeb 2007) and Web-Age Information Management (WAIM 2007), Huang Shan, China
22. Mouratidis K, Yiu ML, Papadias D, Mamoulis N (2006) Continuous Nearest Neighbor Monitoring in Road Networks. In: Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB 2006), Seoul, Korea, pp 43-54
23. Nascimento MA, Silva JRO (1998) Towards Historical R-trees. In: ACM Symposium on Applied Computing (SAC 1998), Atlanta, Georgia, USA, pp 235-240
24. Patel JM, Chen Y, Chakka VP (2004) STRIPES: An Efficient Index for Predicted Trajectories. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2004), Paris, France, pp 637-646
25. Pfoser D, Jensen CS (2003) Indexing of Network Constrained Moving Objects. In: Proceedings of the 11th ACM International Symposium on Advances in Geographic Information Systems (GIS 2003), New Orleans, Louisiana, USA, pp 25-32
26. Pfoser D, Jensen CS, Theodoridis Y (2000) Novel Approaches in Query Processing for Moving Object Trajectories. In: Proceedings of the 26th International Conference on Very Large Data Bases (VLDB 2000), Cairo, Egypt, pp 395-406
27. Roussopoulos N, Kelley S, Vincent F (1995) Nearest Neighbor Queries. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 1995), San Jose, California, USA, pp 71-79
28. Saltenis S, Jensen CS (2002) Indexing of Moving Objects for Location-Based Service. In: Proceedings of the 18th International Conference on Data Engineering (ICDE 2002), San Jose, California, USA, pp 463-472
29. Saltenis S, Jensen CS, Leutenegger ST, Lopez MA (2000) Indexing the Positions of Continuously Moving Objects. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2000), Dallas, Texas, USA, pp 331-342
30. Sistla P, Wolfson O, Chamberlain S, Dao S (1997) Modeling and Querying Moving Objects. In: Proceedings of the 13th International Conference on Data Engineering (ICDE 1997), Birmingham, United Kingdom, pp 422-432
31. Su J, Xu H, Ibarra O (2001) Moving Objects: Logical Relationships and Queries. In: Proceedings of the 7th International Symposium on Spatial and Temporal Databases (SSTD 2001), Redondo Beach, CA, USA, pp 3-19
32. Tao Y, Papadias D (2001) The MV3R-Tree: A Spatio-Temporal Access Method for Timestamp and Interval Queries. In: Proceedings of the 27th International Conference on Very Large Data Bases (VLDB 2001), Roma, Italy, pp 431-440
33. Tao Y, Papadias D, Shen Q (2002) Continuous Nearest Neighbor Search. In: Proceedings of the 28th International Conference on Very Large Data Bases (VLDB 2002), Hong Kong, China, pp 287-298
34. Tao Y, Papadias D, Sun J (2003) The TPR*-Tree: An Optimized Spatio-Temporal Access Method for Predictive Queries. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 2003), Berlin, Germany, pp 790-801
35. Wolfson O, Sistla AP, Chamberlain S, Yesha Y (1999) Updating and Querying Databases that Track Mobile Units. Distributed and Parallel Databases 7(3):257-387
36. Wolfson O, Xu B, Chamberlain S, Jiang L (1998) Moving Object Databases: Issues and Solutions. In: Proceedings of the 10th International Conference on Scientific and Statistical Database Management (SSDBM 1998), Capri, Italy, pp 111-122
37. Wolfson O, Yin H (2003) Accuracy and Resource Consumption in Tracking and Location Prediction. In: Proceedings of the 7th International Symposium on Spatial and Temporal Databases (SSTD 2003), Santorini Island, Greece, pp 325-343
38. Xiao Z, Meng X, Xu J (2007) Quality Aware Privacy Protection for Location-based Services. In: Proceedings of the 12th International Conference on Database Systems for Advanced Applications (DASFAA 2007), Bangkok, Thailand, pp 434-446
Chapter 2
Moving Objects Modeling
Jidong Chen (1), Xiaofeng Meng (2)
(1) EMC Research China, (2) Renmin University of China
[email protected], [email protected]

Abstract Location modeling is the foundation for moving objects databases. Existing database management systems are not well equipped to handle continuously changing data, such as the position of moving objects. The reason for this is that in traditional databases, data is assumed to be constant unless it is explicitly modified. This is unsatisfactory for MOD since locations of moving objects are continuously changing. In this chapter, we introduce a few underlying location modeling methods and propose a new graph of cellular automata (GCA) model to integrate the traffic movement features into the model of moving objects and the underlying spatial network.

Key words: location modeling, cellular automata, graph of cellular automata, moving object trajectory, spatial network, moving object database
2.1 Introduction

The continued advances in wireless sensor networks and positioning technologies enable traffic management and location-based services that track the continuously changing positions of moving objects. Timely location information is becoming one of the key features of these applications. In existing DBMSs, data is assumed to be constant unless it is explicitly modified. Therefore, continuously changing data, such as the location of moving objects, is hard to handle. For example, if the salary field is 30K, then this salary is assumed to hold (i.e., 30K is returned in response to queries) until explicitly updated. Thus, in order to represent moving objects (e.g., cars) in a database and answer queries about their position, the car's position has to be continuously updated. This is unsatisfactory since either the position is updated very frequently (which would impose a serious performance and wireless-bandwidth overhead), or the answer to queries is outdated. Furthermore, it is possible that, due to disconnection, an object cannot continuously update its position. Therefore, new location modeling methods are needed to solve this problem.

Many models and algorithms have been proposed to handle the continuously changing positions of moving objects. Wolfson et al. [16, 20] first proposed a moving objects spatio-temporal (MOST) model, which represents the location as a dynamic attribute. Later, models based on linear constraints [17], abstract data types [7], and space-time grid storage [2] for moving objects were proposed. However, in most real-life applications, objects move within constrained networks, especially in the case of transportation networks (e.g., vehicles move on road networks). These models do not take into account the interaction between moving objects and the underlying transportation networks. The interaction is very important in the management of moving objects in spatial networks. For example, in location tracking, the road-network representation of moving objects can be exploited to reduce the number of updates from moving objects to the database server [3]. For indexing moving objects in road networks, the temporal aspect can be distinguished and related to the road network to save considerable index storage space [1, 5], since the spatial property of objects' movement is already captured by the network. In addition, by using the network constraints, the query processing can also be improved [9, 15]. More recently, models connecting moving objects with the road network representation have been proposed [4, 8, 13, 14, 19].
Most of them represent road networks as graphs and moving objects as moving graph points with their speed in order to capture the objects’ movement. However, the models assume linear movement and cannot reflect the real movement feature of moving objects in a road network where objects frequently change their velocity. This limits their applicability in a majority of real-life applications. In this chapter, we first give a brief introduction of the underlying models of moving objects. Then we propose a new graph of cellular automata (GCA) model to integrate the traffic movement features into the model of moving objects and the underlying road network. The GCA model utilizes the stochastic behavior of the real traffic by the cellular automaton that is used in the traffic simulation [11]. It also combines the road network model with the real movement of objects and therefore improves the efficiency of managing network-constrained moving objects.
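As a minimal sketch of the earlier network-based models described above (not of the GCA model), the following Python function treats a road network as a set of edges with lengths and a moving object as a point that advances along a route at constant speed. The names `edge_length`, `route`, and `position_on_route` are hypothetical, and the constant-speed assumption is exactly the linear-movement simplification criticized above.

```python
from typing import Dict, List, Tuple


def position_on_route(edge_length: Dict[str, float], route: List[str],
                      start_offset: float, speed: float,
                      elapsed: float) -> Tuple[str, float]:
    """Return (edge id, offset along that edge) after `elapsed` time units.

    The object starts `start_offset` along the first edge of `route`
    (a non-empty list of edge ids) and moves at constant `speed`.
    """
    remaining = start_offset + speed * elapsed
    for edge in route:
        length = edge_length[edge]
        if remaining <= length:
            return edge, remaining
        remaining -= length  # the object has passed this edge entirely
    # Beyond the known route: clamp to the end of the last edge.
    last = route[-1]
    return last, edge_length[last]


# Two consecutive road segments, 100 and 50 length units long.
edges = {"e1": 100.0, "e2": 50.0}
print(position_on_route(edges, ["e1", "e2"], start_offset=80.0, speed=10.0, elapsed=3.0))
# ('e2', 10.0)
```

Because the position is stored relative to an edge rather than as absolute coordinates, the spatial shape of the movement is carried by the network and only the offset changes over time.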
2.2 Underlying Models

Each attribute of an object is either static or dynamic. Intuitively, a static attribute of an object is an attribute in the traditional sense, i.e., it changes only when an explicit update of the database occurs; in contrast, a dynamic attribute changes over time
according to some given function, even if it is not explicitly updated. For example, consider a moving object whose position in two-dimensional space at any point in time is given by values of the (x, y) coordinates. Then, each of the object’s coordinates is a dynamic attribute. The main difference between the dynamic attribute of moving objects and attributes in traditional database systems is that the values of the location attribute are continuously changing.

Wolfson et al. [16, 20] first proposed a moving objects spatio-temporal (MOST) model for databases with dynamic attributes, i.e., attributes that change continuously as a function of time, without being explicitly updated. In the MOST model, the location as a dynamic attribute is represented as a function of time. For example, the position of a car is given as a function of its motion vector (e.g., north, at 60 miles/hour). In other words, the model considers a higher level of data abstraction, where an object’s motion vector (rather than its position) is represented as an attribute of the object. Obviously, the motion vector of an object can change (and thus it can be updated), but in most cases it does so less frequently than the position of the object.

When a dynamic attribute is queried, the answer returned by the MOD consists of the value of the attribute at the time the query is entered. In this sense, the MOST model is different from existing database systems, since, unless an attribute has been explicitly updated, a DBMS returns the same value for the attribute, independent of the time at which the query is posed. With the motion vector, the MOST model is capable of representing not only the current, but also the near-future position of moving objects. However, due to the limited expressiveness of the simple function in dynamic attributes, the MOST model only represents the future positions of moving objects over a short period.
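The behavior of a dynamic attribute can be illustrated with a minimal Python sketch. The class and field names below are assumptions made for illustration and are not taken from [16, 20]; the sketch only shows that the stored fields stay unchanged while the answer to a query depends on the time at which the query is posed.

```python
from dataclasses import dataclass


@dataclass
class DynamicAttribute:
    """One coordinate stored as a value, the time it was recorded, and a rate of change."""
    update_value: float  # value when the motion vector was last updated
    update_time: float   # time of that update
    speed: float         # change per time unit (one component of the motion vector)

    def value_at(self, t: float) -> float:
        # The stored fields are not modified; only the answer depends on the query time.
        return self.update_value + self.speed * (t - self.update_time)


# Hypothetical car heading north at 60 miles/hour, last updated at t = 0 h at y = 10 miles.
y = DynamicAttribute(update_value=10.0, update_time=0.0, speed=60.0)
y_after_half_hour = y.value_at(0.5)
y_after_one_hour = y.value_at(1.0)
```

An explicit update is needed only when the motion vector itself changes, which is the reason the model lowers the update load compared with storing raw positions.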
It is difficult to model moving objects for a longer future period. The study [6] solves this problem by presenting a discrete data model for moving objects in which the complicated trajectory of a moving object can be represented by a set of relatively simple discrete segments. In addition, Su et al. [17] presented a data model for moving objects based on linear constraint databases. Chon et al. [2] proposed a space-time grid storage model for moving objects. In [7], Güting et al. presented a data model and data structures for moving objects based on abstract data types. These studies focus on the modeling of objects moving in free space, not constrained by a spatial network.

For moving objects in a spatial network, when adding the network constraint, we need to consider not only the location representation but also the modeling of the spatial network as well as the spatial objects in the network. There are three kinds of methods to represent a road network: the kilometer-post representation (i.e., road, kilometer, offset), the two-dimensional geographical representation (i.e., segment, connection), and the graph representation (i.e., edge, node).

The kilometer-post representation (the most commonly used type of known-marker representation) is used for road administration. It is convenient for relating a physical location to a location stored in a database and vice versa. The location is expressed in terms of the road, the distance marker on the road (e.g., kilometer post), and the offset from the distance marker. Primitive technological means, such as a simple measuring device, a map, and a ruler, suffice for identifying a position on a road, rendering the use
of the representation cost-effective and thus practical.

The two-dimensional geographical representation captures the geographical coordinates of the road network. The coordinate representation enables users to directly reference the location rather than measure distances along roads from certain locations, such as kilometer posts. This representation consists of line segments that represent (parts of) roads, and it is similar to a digital map. Since the locations of moving objects are provided in (an equivalent of) Euclidean coordinates, a correspondence between the roads and their locations in Euclidean space is necessary to place the moving objects in the road network.

The graph representation is completely separated from the geographic space into which a road network is embedded. Rather, this representation captures the essence of a road network: edges represent parts of roads in between intersections, edge directions capture the allowed traffic directions, and edge weights capture the properties influencing movement. This is a much more abstract and thus simple and compact structure, which is also more computationally efficient than the geographical representation.

Different representations of a moving object’s movement result in different location models. We describe several representations. First, a naive approach that is used by existing industrial applications such as fleet management is to represent the movement of an object as a constant function (i.e., as a point). With this representation, the position-time point of each object is generated and stored periodically, thus incurring a large number of updates. In addition, it does not support interpolation or extrapolation. Therefore it may be useful only when the object is rarely moving or is moving erratically within an area that is small compared to the area given by the threshold used.
Second, we may model the object location as a linear function (i.e., as a vector), which is generally adopted by several existing indexes for moving objects. Since the current location, the current speed, and the direction of the object are used to predict its anticipated near-future position, this model enables us to make tentative future predictions. At the same time, it has been shown that the linear location model can reduce the number of updates to one-third of those required by the point representation. However, this model is only suitable for situations in which the object moves with constant speed.

Third, it is more practical to represent the movement of a moving object by using a non-linear function such as spline interpolation. The spline method generally gives more realistic results for representing the whole movement of a moving object with uniformly varying speed, but at the cost of more location information and more complex computation. As a result of this complexity, few research studies on moving objects consider the spline model.

Next, as most objects are moving in a constrained environment (such as transportation networks), we may utilize the infrastructure that consists of road segments in the representation of an object’s movement. In this way, the object is assumed to move at constant speed along each segment and to stop on reaching the end of the current segment. The performance of updates with this representation depends on the lengths of the segments. However, in the scenario of a realistic traffic system, which is stochastic, dynamic, and fuzzy in nature, the mathematical models mentioned above can hardly reflect the movement of moving objects restricted by road networks.
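The point, vector, and segment-based representations differ only in how a position is extrapolated from the last reported state. The following Python sketch makes the difference concrete; the function names and argument layout are assumptions chosen for illustration and do not come from the cited work.

```python
def predict_point(last_pos, t0, t):
    """Constant function: the object is assumed to stay where it last reported."""
    return last_pos


def predict_vector(last_pos, velocity, t0, t):
    """Linear function: extrapolate with the last reported velocity vector."""
    dt = t - t0
    return (last_pos[0] + velocity[0] * dt, last_pos[1] + velocity[1] * dt)


def predict_segment(offset, speed, segment_length, t0, t):
    """Segment-based: constant speed along the segment, stopping at its end."""
    return min(offset + speed * (t - t0), segment_length)


# Hypothetical values: positions in meters, time in seconds.
p_point = predict_point((1.0, 2.0), 0.0, 5.0)
p_vector = predict_vector((0.0, 0.0), (3.0, 4.0), 0.0, 2.0)
p_segment_mid = predict_segment(100.0, 20.0, 250.0, 0.0, 5.0)
p_segment_end = predict_segment(100.0, 20.0, 250.0, 0.0, 10.0)
```

None of these three predictors reacts to other vehicles or to random slowdowns, which is the limitation the GCA model in the next section is meant to address.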
In 2001, Vazirgiannis and Wolfson [19] first introduced a model for moving objects on road networks, which connects the moving object’s trajectory model with the road network representation. In the model, the road network is represented by an electronic map and the trajectory of a moving object is constructed from the map and the object’s destination. In [14], the authors presented a computational data model for network-constrained moving objects in which the road network has two representations, namely a two-dimensional representation and a graph representation, to obtain both expressiveness and efficient support for queries. In this model, the moving objects treated as query points are represented by graph points located on segments or edges. Ding et al. [4] proposed a MOD model based on dynamic transportation networks. They modeled transportation networks as dynamic graphs and moving objects as moving graph points. In addition, Papadias et al. [13] presented a framework to support spatial network databases.

However, these models capture movement information of objects only by their speed and assume linear movement, which limits their applicability in a majority of real-life applications. Despite the wide use of traffic simulation rules in the transportation GIS domain [11], their integration into a database model for objects in constrained networks has never been done before. In the next section, we propose a new graph of cellular automata (GCA) model to integrate the traffic movement features into the model of moving objects and the underlying road network.
2.3 Graphs of Cellular Automata Model

We model a road network with a graph of cellular automata (GCA), where the nodes of the graph represent road intersections and the edges represent road segments with no intersections. Different from the general graph model, each edge in the GCA consists of a cellular automaton (CA), which is represented, in a discrete mode, as a finite sequence of cells. Each cell corresponds in practice to some road segment of about 7.5 m. Figure 2.1 shows an example of a road network and its GCA model. Each node has a label that represents an intersection of the road network. The wide lines represent edges, and each edge, treated as one CA, connects many cells.
2.3.1 Cellular Automata (CA)

Cellular automata were originally introduced by von Neumann [12] and Ulam [18] in the 1960s with the particular purpose of modeling biological self-reproduction. Since then, they have been used broadly for physics applications such as particle transport simulations and thermodynamics studies. The CA model was applied to traffic simulation in [11]. Because in the CA model it is quite simple and easy to describe the
Fig. 2.1 An example of a road network and its GCA model
interaction of cells, it is suitable for computer simulations of discrete phenomena. We first recall the definition of a cellular automaton.

Definition 2.1. A cellular automaton consists of a finite oriented sequence of cells. In a configuration, each cell is either empty or contains a symbol. During a transition, symbols can move forward to subsequent cells, symbols can leave the CA, and new symbols can enter the CA.

An example of a cellular automaton corresponding to the edge $(N_1, N_2)$ in Fig. 2.1(b), with a transition between two configurations, is given in Fig. 2.2.
Fig. 2.2 Transition of the GCA
2.3.2 Structure of GCA

We now formally define a graph of cellular automata.

Definition 2.2. The structure of a GCA is a directed weighted graph $G = (V, E, l)$, where $V$ is a set of vertices (i.e., nodes), $E$ is a set of edges, and $l : E \to \mathbb{N}$ is a function that associates to each edge the number of cells of the corresponding cellular automaton.
We assume a countably infinite alphabet $\Omega = \{\alpha, \beta, \gamma, \ldots\}$, denoting moving objects’ names. Let $C$ be the set of cells of a GCA. A configuration or an instance of a GCA is a mapping from the cells of the GCA to constants in $\Omega$ together with a given velocity. Intuitively, the velocity is the number of cells an object can traverse during a time unit.

Definition 2.3. An instance $I$ of a GCA is defined by two functions:

$\mu : C \to \Omega \cup \{\emptyset\}$ (1-1 mapping)

$v : \Omega \to \mathbb{N}$

A moving object is represented as a symbol attached to a cell in the GCA, and it can move several cells ahead at each time unit. Figure 2.1(b) is an instance of the GCA corresponding to the road network of Fig. 2.1(a). In Fig. 2.1(b), moving objects are denoted by squares. A moving object lies on exactly one cell of the edge, and its location can be obtained by computing the number of cells relative to the start node. For instance, the object $\alpha$ lies on the edge $(N_1, N_2)$ and it is two cells away from $N_1$ along the edge. Therefore, its position can be expressed by $(N_1, N_2, 2)$.
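Definitions 2.2 and 2.3 map naturally onto a small data structure. The sketch below is one possible Python encoding, not the authors' implementation: the class name, the dictionary layout, and the choice to number cell offsets from 1 to $l(e)$ are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class GCA:
    # l : E -> N, the number of cells of the CA on each directed edge
    cells: dict
    # mu restricted to occupied cells: (edge, offset) -> object name
    occupant: dict = field(default_factory=dict)
    # v : object name -> velocity in cells per time unit
    velocity: dict = field(default_factory=dict)

    def place(self, name, edge, offset, v=0):
        """Put an object on a cell; a cell holds at most one object and vice versa."""
        if not 1 <= offset <= self.cells[edge]:
            raise ValueError("offset outside the cellular automaton")
        if (edge, offset) in self.occupant or name in self.velocity:
            raise ValueError("cell occupied or object already placed")
        self.occupant[(edge, offset)] = name
        self.velocity[name] = v

    def position(self, name):
        """Return (start node, end node, offset in cells from the start node)."""
        for (edge, offset), who in self.occupant.items():
            if who == name:
                return (edge[0], edge[1], offset)
        raise KeyError(name)


# Hypothetical two-edge network; edge lengths are illustrative only.
gca = GCA(cells={("N1", "N2"): 10, ("N2", "N3"): 8})
gca.place("alpha", ("N1", "N2"), 2, v=3)
```

Calling `gca.position("alpha")` then yields the triple form used in the text for the object two cells away from N1.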
2.3.3 Trajectory of GCA

The motion of an object is represented as information in the form (time, location). Representing such information of a moving object as a trajectory is a typical approach [19]. In the GCA model, the trajectory of a moving object can be divided into two types: the in-edge trajectory for the object’s movement in one edge (CA) and the global trajectory for an object that may move across several edges (CAs) during its movement. The in-edge trajectory of an object is a polyline in two-dimensional space (one-dimensional (1-D) relative distance, plus time), which can be defined as follows:

Definition 2.4. The in-edge trajectory of a moving object in a CA of length $L$ is a piecewise function $f : T \to \mathbb{N}$, represented as a sequence of points $(t_1, l_1), (t_2, l_2), \ldots, (t_n, l_n)$ with $t_1 < t_2 < \ldots < t_n$ and $l_1 < l_2 < \ldots < l_n \le L$.

When an object moves across multiple edges, its global trajectory is defined as a function mapping the time to the edge and relative distance.

Definition 2.5. The global trajectory of a moving object in different CAs is a piecewise function $f : T \to (E, \mathbb{N})$, represented as a sequence of points $(t_1, e_1, l_1), \ldots, (t_i, e_j, l_k), \ldots, (t_z, e_m, l_n)$ with $t_1 < t_2 < \ldots < t_z$.

In the sequel, we will consider the deterministic paths in the GCA, i.e., paths with source nodes of out-degree 1. The successive CAs in a deterministic path can then be seen as a unique CA.
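Because the in-edge trajectory is a polyline through the points $(t_i, l_i)$, the relative distance at an intermediate time can be obtained by linear interpolation between the two surrounding points. The following Python sketch shows this under the assumption that interpolated (non-integer) distances are acceptable to the caller; the function name is hypothetical.

```python
from bisect import bisect_right


def in_edge_position(trajectory, t):
    """Interpolate the relative distance at time t.

    trajectory is a list of (t_i, l_i) pairs sorted by strictly increasing time.
    """
    times = [point[0] for point in trajectory]
    if not times[0] <= t <= times[-1]:
        raise ValueError("time outside the recorded trajectory")
    i = bisect_right(times, t) - 1  # index of the last point with t_i <= t
    if i == len(trajectory) - 1:
        return float(trajectory[-1][1])
    (t0, l0), (t1, l1) = trajectory[i], trajectory[i + 1]
    return l0 + (l1 - l0) * (t - t0) / (t1 - t0)


# Hypothetical trajectory on one CA: (time unit, cells from the start node).
trajectory = [(0, 0), (2, 4), (5, 10)]
```

A global trajectory can be handled the same way by first selecting the points that share the edge occupied at time t.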
2.3.4 Transition of GCA

Let $i$ be an object moving along an edge. Let $v(i)$ be its velocity, $x(i)$ its position, $gap(i)$ the number of empty cells ahead (forward gap), and $P_d(i)$ a randomized slowdown rate that specifies the probability that it slows down. We assume that $V_{max}$ is the maximum velocity of the moving objects. The position and velocity of each object might change at each transition as shown in Definition 2.6, adapted from [11].

Definition 2.6. At each transition of the GCA, each object changes velocity and position in a CA of length $L$ according to the rules below:

1. if $v(i) < V_{max}$ and $v(i) < gap(i)$, then $v(i) \leftarrow v(i) + 1$
2. if $v(i) > gap(i)$, then $v(i) \leftarrow gap(i)$
3. if $v(i) > 0$ and $rand() < P_d(i)$, then $v(i) \leftarrow v(i) - 1$
4. if $x(i) + v(i) \le L$, then $x(i) \leftarrow x(i) + v(i)$

The first rule represents linear acceleration until the object reaches the maximum speed $V_{max}$. The second rule ensures that if there is another object in front of the current object, it will slow down in order to avoid collision. In the third rule, $P_d(i)$ models erratic movement behavior. Finally, the new position of object $i$ is given by the fourth rule as the sum of the previous position and the new velocity, provided that the result is still in the CA. Note that it is easy to extend the definition of transition to deterministic paths: on a deterministic path, the objects move to a new position in a subsequent CA.

Figure 2.2 shows the simulated movement of objects on a cellular automaton of the GCA in two consecutive timestamps. We can see that at time $t$, the speed of object a is smaller than the gap (i.e., the number of cells between objects a and b). On the other hand, object b will reduce its speed to the size of the gap. According to the fourth rule, the objects move to the corresponding positions based on their speeds at time $t + 1$.
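The four rules of Definition 2.6 can be sketched as one parallel update step in Python. Several choices below are assumptions rather than part of the definition: cells are numbered 1 to L, the leading object's gap is the number of cells up to the end of the CA (so it stops there instead of leaving), all objects share one slowdown probability `p_slow` in place of a per-object $P_d(i)$, and the random source is injectable so that the step can be checked deterministically.

```python
import random


def transition(xs, vs, length, v_max, p_slow, rand=random.random):
    """Apply rules 1-4 once to all objects on a single CA.

    xs and vs hold the positions and velocities of the objects; gaps are
    computed from the old configuration, so the update is parallel.
    """
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    new_xs, new_vs = list(xs), list(vs)
    for rank, i in enumerate(order):
        if rank + 1 < len(order):
            gap = xs[order[rank + 1]] - xs[i] - 1  # empty cells up to the next object
        else:
            gap = length - xs[i]  # assumption: the leader sees the end of the CA
        v = vs[i]
        if v < v_max and v < gap:      # rule 1: accelerate
            v += 1
        if v > gap:                    # rule 2: brake to avoid a collision
            v = gap
        if v > 0 and rand() < p_slow:  # rule 3: random slowdown
            v -= 1
        new_vs[i] = v
        if xs[i] + v <= length:        # rule 4: move forward inside the CA
            new_xs[i] = xs[i] + v
    return new_xs, new_vs


# Two hypothetical objects; rand is fixed so that rule 3 never (or always) fires.
no_slowdown = transition([1, 4], [2, 3], 20, 5, 0.3, rand=lambda: 1.0)
always_slowdown = transition([1, 4], [2, 3], 20, 5, 0.3, rand=lambda: 0.0)
```

Because every velocity is capped by the gap measured before anyone moves, two objects can never end up in the same cell after the step.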
2.3.5 Two-Lane GCA

However, objects in real traffic have different desired speeds. With the transitions of the one-lane GCA mentioned above, it can be found that slow objects are followed by faster ones, and the average speed is reduced to the free-flow speed of the slowest object [10]. In view of this, we extend the one-lane GCA to a two-lane GCA in which a CA consists of two parallel single lanes. Therefore, each cell in a two-lane GCA is composed of two parallel single lanes and each lane may contain one symbol, namely a moving object. The function $\mu$ in a GCA instance $I$ will change to a 1-2 mapping accordingly.

We extend the transition of the one-lane GCA to the two-lane GCA by attaching an additional rule that models the changing of lanes of the object. Assuming that objects change lanes by moving only sideways, the transition of GCA happens on both lanes
according to the previous four rules and then the exchange of objects between two lanes is checked according to the additional condition for changing lanes as follows:

5. Object $i$ changes lane with probability $P_c$ if $gap(i) < p$, $gap_o(i) > p_o$, and $gap_{o,b}(i) > p_{o,b}$

where $gap(i)$ is the number of empty cells ahead in the same lane, $gap_o(i)$ is the forward gap on the other lane, $gap_{o,b}(i)$ is the backward gap on the other lane, and $p$, $p_o$, and $p_{o,b}$ are the parameters that decide how far the object looks ahead on the current lane, ahead on the other lane, and back on the other lane, respectively.

In fact, the lane-changing rule is based on the following observation: the car “looks” ahead to see if some car is in its way; the car “looks” at the other lane to see if it is any better there; the car “looks” back on the other lane to see if it would get in other cars’ way. Generally, in the above rule, both $p$ and $p_o$ are essentially proportional to the velocity, whereas looking back depends mostly on the expected velocity of other objects, not on one’s own. An example of invoking the lane-changing rule with $p = v + 1$, $p_o = p$, $p_{o,b} = V_{max}$, $P_c = 1$ is given in Fig. 2.3. The object b with $p = 3$, $p_o = 3$, $p_{o,b} = 5$ changes to the other lane in the GCA, satisfying the fifth rule mentioned above.
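The fifth rule reduces to a predicate on three gaps followed by a random draw. The Python sketch below uses the parameter choice of the example above ($p = v + 1$, $p_o = p$, $p_{o,b} = V_{max}$); the function name, the argument names, and the concrete gap values are assumptions for illustration, since the gaps in Fig. 2.3 cannot be read from the text.

```python
import random


def changes_lane(gap, gap_other, gap_other_back, v, v_max, p_c=1.0, rand=random.random):
    """Decide whether an object switches lanes in the current transition."""
    p = v + 1      # look-ahead distance on the current lane
    p_o = p        # look-ahead distance on the other lane
    p_ob = v_max   # look-back distance on the other lane
    blocked_here = gap < p
    better_there = gap_other > p_o
    safe_behind = gap_other_back > p_ob
    return blocked_here and better_there and safe_behind and rand() < p_c


# An object with v = 2 and V_max = 5 has p = 3, p_o = 3, p_ob = 5, as object b in the text.
switch = changes_lane(gap=1, gap_other=4, gap_other_back=6, v=2, v_max=5, rand=lambda: 0.0)
stay = changes_lane(gap=1, gap_other=3, gap_other_back=6, v=2, v_max=5, rand=lambda: 0.0)
```

In a full two-lane step this check would run after rules 1 to 4 have been applied on both lanes, as described above.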
Fig. 2.3 An example of changing lane in transition of the two-lane GCA
2.4 Summary

For managing moving objects in a spatial network, the challenging first step is to precisely represent and model their locations. In this chapter, we first introduce a few underlying models for moving objects databases. Then, we combine the road network representation and the movement model of objects in a traffic network to introduce a new model, the GCA, for network-constrained moving objects. The GCA model exploits the constraints of the network and models the stochastic aspect of urban traffic. Thanks to this feature, the GCA model can be efficiently used to simulate future trajectories of moving objects, where objects’ movement follows traffic rules. In Chapter 7, a simulation-based prediction method based on the GCA will be proposed to predict the future trajectory of moving objects in a spatial network.
References

1. Almeida VT, Güting RH (2005) Indexing the Trajectories of Moving Objects in Networks. GeoInformatica 9(1):33-60
2. Chon HD, Agrawal D, Abbadi AE (2001) Using Space-Time Grid for Efficient Management of Moving Objects. In: Proceedings of the 2nd ACM International Workshop on Data Engineering for Wireless and Mobile Access (MobiDE 2001), Santa Barbara, California, USA, pp 59-65
3. Civilis A, Jensen CS, Pakalnis S (2005) Techniques for Efficient Road-Network-Based Tracking of Moving Objects. IEEE Transactions on Knowledge and Data Engineering 17(5):698-712
4. Ding Z, Güting RH (2004) Managing Moving Objects on Dynamic Transportation Networks. In: Proceedings of the 16th International Conference on Scientific and Statistical Database Management (SSDBM 2004), Santorini Island, Greece, pp 287-296
5. Frentzos E (2003) Indexing Objects Moving on Fixed Networks. In: Proceedings of the 8th International Symposium On Spatial and Temporal Databases (SSTD 2003), Santorini Island, Greece, pp 289-305
6. Forlizzi L, Güting RH, Nardelli E, Schneider M (2000) A Data Model and Data Structures for Moving Objects Databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2000), Dallas, Texas, USA, pp 319-330
7. Güting RH, Böhlen MH, Erwig M, Jensen CS, Lorentzos NA, Schneider M, Vazirgiannis M (2000) A Foundation for Representing and Querying Moving Objects. ACM Transactions on Database Systems 25(1):1-42
8. Güting RH, Almeida VT, Ding Z (2006) Modeling and Querying Moving Objects in Networks. VLDB Journal 15(2):165-190
9. Kolahdouzan M, Shahabi C (2004) Voronoi-Based K Nearest Neighbor Search for Spatial Network Databases. In: Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 2004), Toronto, Canada, pp 840-851
10. Nagatani T (1995) Bunching of Cars in Asymmetric Exclusion Models for Freeway Traffic. Physical Review E 51(2):922-928
11. Nagel K, Schreckenberg M (1992) A Cellular Automaton Model for Freeway Traffic. Journal de Physique I 2:2221-2229
12. Neumann JV (1966) Theory of Self-Reproducing Automata. University of Illinois Press, Champaign, Illinois, USA
13. Papadias D, Zhang J, Mamoulis N, Tao Y (2003) Query Processing in Spatial Network Databases. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 2003), Berlin, Germany, pp 790-801
14. Speicys L, Jensen CS, Kligys A (2003) Computational Data Modeling for Network-Constrained Moving Objects. In: Proceedings of the 7th ACM International Symposium on Advances in Geographic Information Systems (GIS 2003), New Orleans, Louisiana, USA, pp 118-125
15. Shahabi C, Kolahdouzan MR, Sharifzadeh M (2003) A Road Network Embedding Technique for K-Nearest Neighbor Search in Moving Objects Databases. GeoInformatica 7(3):255-273
16. Sistla P, Wolfson O, Chamberlain S, Dao S (1997) Modeling and Querying Moving Objects. In: Proceedings of the 13th International Conference on Data Engineering (ICDE 1997), Birmingham, United Kingdom, pp 422-432
17. Su J, Xu H, Ibarra O (2001) Moving Objects: Logical Relationships and Queries. In: Proceedings of the 7th International Symposium on Spatial and Temporal Databases (SSTD 2001), Redondo Beach, California, USA, pp 3-19
18. Ulam S (1972) Some Ideas and Prospects in Biomathematics. Annual Review of Biophysics and Bioengineering 1:272-292
19. Vazirgiannis M, Wolfson O (2001) A Spatio-Temporal Model and Language for Moving Objects on Road Networks. In: Proceedings of the 7th International Symposium on Spatial and Temporal Databases (SSTD 2001), Redondo Beach, California, USA, pp 20-35
20. Wolfson O, Xu B, Chamberlain S, Jiang L (1998) Moving Object Databases: Issues and Solutions. In: Proceedings of the 10th International Conference on Scientific and Statistical Database Management (SSDBM 1998), Capri, Italy, pp 111-122
Chapter 3
Moving Objects Updating
Jidong Chen (1), Xiaofeng Meng (2)
(1) EMC Research China, (2) Renmin University of China
[email protected], [email protected]

Abstract In moving objects applications, large numbers of locations can be sampled by sensors or GPS periodically, then sent from moving clients to the server and stored in a database. Therefore, continuously maintaining in a database the current locations of moving objects by using a tracking technique becomes very important. The key issue is minimizing the number of updates, while providing precise locations for query results. In this chapter, we will introduce some underlying location update methods. Then, we describe two location update strategies in detail, which can improve the performance. One is the proactive location update strategy, which predicts the movement of moving objects to lower the update frequency; the other is the group location update strategy, which groups the objects to minimize the total number of objects reporting their locations.

Key words: location updating, moving object tracking, group update, prediction, spatial network, moving object databases
3.1 Introduction

In many moving objects applications such as traffic management and location-based services, continuously maintaining the current locations of moving objects in a database is a fundamental issue. The challenge is minimizing the number of updates, while providing precise locations for query results. The number of updates from moving objects to the server database depends on both the update frequency and the number of objects to be updated. To reduce the
location updates, most existing studies propose to lower the update frequency by a prediction method [1, 9, 10]. They usually use linear prediction, which represents objects’ locations as linear functions of time. The objects do not report their locations to the server unless their actual positions deviate from the predicted positions by a certain threshold. This provides a general principle for the location update policies in a moving object database system. In this chapter, we will introduce some underlying update strategies that follow this principle and two improved ones that reduce the total update costs.
3.2 Underlying Update Strategies

So far, the research on tracking of moving objects has mainly focused on location update policies. Existing methods can be classified according to the threshold, the update mode (object grouping), or the representation and prediction of future positions of objects.
3.2.1 Based on Threshold

Wolfson et al. [9] first proposed the dead-reckoning update policies to reduce the update cost. According to the threshold, the policies fall into three types: Speed Dead Reckoning (SDR), which has a fixed threshold for all location updates; Adaptive Dead Reckoning (ADR), which has different thresholds for different location updates; and Disconnection Detection Reckoning (DDR), whose threshold decreases continuously after the last location update. The policies also assume that the destination and motion plan of the moving objects are known a priori. In other words, the route is fixed and known. In [4], Gowrisankar and Nittel propose a dead-reckoning policy that uses angular and linear deviations. They also assume that moving objects travel on predefined routes. Lam et al. propose two location update mechanisms that further consider the effect of continuous query results on the threshold [7]. The idea is that the moving objects covered by the answers of the queries have a lower threshold, leading to a higher location accuracy. Zhou et al. [11] also treat the precision of query results as the outcome of a negotiated threshold in the Aqua location updating scheme that they proposed.
3.2.2 Based on Location Prediction

Wolfson and Yin [10] consider tracking with accuracy guarantees. They introduced the deviation update policy for this purpose and compared it with the distance policy. The difference between the deviation policy and the distance policy lies in the
representation of future positions: a linear function in the former and a constant function in the latter. Based on experiments with artificial data generated to resemble real-life movement data, they conclude that the distance policy is outperformed by the deviation policy. Similarly, Civilis et al. [1, 2] propose three update policies: a point policy, a vector policy, and a segment-based policy, which differ in how they predict the future positions of a moving object. In fact, the first and third policies are good representations of the distance and deviation policies in [10]. They further improve the update policies in [2] by exploiting a better road-network representation and acceleration profiles with routes. It should also be noted that Ding et al. [3] have recently discussed the use of what is essentially segment-based tracking based on their proposed data model for the management of road-network constrained moving objects. In a study [8], non-linear models such as acceleration are used to represent a trajectory that is affected by abnormal traffic such as a traffic incident.
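The difference between the distance policy and the deviation policy can be illustrated by counting updates on a toy one-dimensional movement. The Python sketch below is not the experimental setup of [10]; the function name, the sample layout of (position, speed) pairs taken at t = 0, 1, 2, ..., and the numbers are assumptions chosen only to make the mechanism visible.

```python
def count_updates(samples, threshold, linear):
    """Count location updates, including the initial report at t = 0.

    linear=False models the distance policy (constant prediction);
    linear=True models the deviation policy (linear prediction).
    """
    updates = 1
    ref_t = 0
    ref_x, ref_v = samples[0]
    for t in range(1, len(samples)):
        x, v = samples[t]
        predicted = ref_x + (ref_v * (t - ref_t) if linear else 0.0)
        if abs(x - predicted) > threshold:
            # The object reports its current state; prediction restarts from here.
            updates += 1
            ref_t, ref_x, ref_v = t, x, v
    return updates


# Hypothetical object moving at a constant 10 units per time step for 10 steps.
samples = [(10.0 * t, 10.0) for t in range(11)]
distance_policy_updates = count_updates(samples, threshold=25.0, linear=False)
deviation_policy_updates = count_updates(samples, threshold=25.0, linear=True)
```

For an object that really moves at constant speed the linear prediction never deviates, so only the initial report is sent, whereas the constant prediction is exceeded every few steps.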
3.2.3 Based on Object Grouping

Most existing update techniques have been developed to process individual updates efficiently [1, 2, 9, 10]. To reduce the expensive uplink updates from the objects to the location server, Lam et al. [6] propose a group-based scheme in which moving objects are grouped so that the group leader sends the location update on behalf of the whole group. A group-based location update scheme for personal communication networks is also proposed in [5]. The aim is to reduce location registrations by grouping a set of mobile objects at their serving visitor location registers (VLRs).

In the following sections, we will introduce two update strategies that improve the tracking technique in terms of the prediction model and the update mode, and that focus on the accuracy of the predicted positions of the objects in urban road networks. Based on their predicted movement functions, we group objects to further reduce their location update costs.
3.3 Proactive Location Update Strategy

In a traditional update strategy, moving objects send their locations continuously. This is not always necessary: if the predicted location is the same as the real location, the moving object need not send the information. We predict the locations of moving objects in road networks by using a new prediction method and introduce the proactive location update strategy to improve the performance. Specifically, we first simulate the movement of the object and use a motion function to predict its location. Then, taking advantage of the difference between upward and downward telecommunication, only the parameters of the motion function are sent back to the moving objects. The moving objects can compute the
predicted location using these parameters. Thus, the cost of communication is minimized. By extending the CA model used in the traffic flow simulation, we propose a new simulation-based prediction (SP) method for predicting future trajectories of objects in the GCA model. We use GCAs not only to model road networks, but also to simulate the movements of moving objects by the transitions of the GCA. The SP model treats the objects’ simulated results as their predicted positions. Then, by linear regression, a compact and simple linear function that reflects future movement of a moving object can be obtained. To refine the accuracy, based on different assumptions on the traffic conditions, we simulate two future trajectories to obtain its predicted movement function. We describe the SP method in detail in Chapter 7. Through the SP method, we obtain a compact and simple linear prediction function for the moving object. However, this is different from the linear prediction in that the simulation-based prediction method not only considers the speed and direction of each moving object, but also takes correlation of objects as well as the stochastic behavior of the traffic into account. The architecture of the proactive location update strategy is shown in Fig. 3.1. The detailed process is as follows:
Fig. 3.1 Proactive location update strategy
1. When a moving object enters a new segment or produces an update, it sends the server a location update message including the current velocity, direction, and location.
2. The server simulates the movement of this object and uses the CA Simulator to predict its fastest and slowest movements in the form of moving points. The Linear Regression component then regresses these moving points to obtain a linear motion function, following the simulation-based prediction method.
3. The predicted location is derived from the function computed in the previous step and stored in the moving object database. Simultaneously, the parameters of the motion function are sent back to the client.
4. When a moving object obtains its real location from the GPS receiver, it computes the predicted location using the parameters of the motion function. It then compares the real location with the predicted location and sends a new update message if the difference is larger than a predefined threshold.

The proactive location update strategy has two advantages over traditional methods: first, the trajectory obtained by linear regression can be serialized compactly, so only the parameters of the function need to be transmitted over the wireless channel; second, it takes advantage of the difference between the upward and downward telecommunication channels and avoids redundant updates.
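The server-side regression and the client-side check in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the simulated points of the fastest and slowest trajectories are simply pooled before an ordinary least-squares fit (the actual SP method is described in Chapter 7), and the function names and sample values are hypothetical.

```python
def fit_line(points):
    """Ordinary least-squares fit of l = slope * t + intercept to (t, l) points."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_l = sum(l for _, l in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in points)
    slope = sxy / sxx
    return slope, mean_l - slope * mean_t


def needs_update(real_location, t, slope, intercept, threshold):
    """Client-side check: report only if the prediction is off by more than the threshold."""
    predicted = slope * t + intercept
    return abs(real_location - predicted) > threshold


# Hypothetical simulated points (time unit, cells travelled) from the CA simulator.
fastest = [(0, 0), (1, 3), (2, 6)]
slowest = [(0, 0), (1, 1), (2, 2)]
slope, intercept = fit_line(fastest + slowest)  # server side
quiet = needs_update(7.5, 3, slope, intercept, threshold=2.0)  # client side
report = needs_update(9.0, 3, slope, intercept, threshold=2.0)
```

Only `slope` and `intercept` have to travel over the downlink, which is what keeps the message to the client small.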
3.4 Group Location Update Strategy
The key issue in tracking is to minimize the number of updates while still providing precise locations for query results. The number of updates from moving objects to the server database depends on both the update frequency and the number of objects to be updated. To reduce location updates, most existing studies lower the update frequency by means of a prediction method. However, few studies try to improve update performance by reducing the number of objects to be updated. We observe that in many applications objects naturally move in clusters, for example vehicles in a congested road network, packed goods transported in a batch, and migrating animals and birds. Nearby objects can therefore be grouped so that only one object in the group reports its location to the server on behalf of all objects within it. Considering real-life applications, we focus on objects moving on a road network. Figure 3.2 shows an example of grouping vehicles on part of a road network. By grouping the vehicles in each road segment, the total number of location updates sent to the server is reduced from 9 to 5.
To improve tracking performance for network-constrained moving objects, we address two factors that affect location updates. One is a better prediction model that lowers the update frequency, and the other is a group update strategy that reduces the total number of objects reporting their locations. An accurate prediction model also reduces group maintenance and assures location precision for querying. Accordingly, we propose a group location update strategy based on the SP model (GSP) to minimize location updates.
In the GSP method, for each edge in a road network, the objects are grouped or clustered by the similarity of their predicted future movement function and their locations are represented and reported by the group (Fig. 3.2). This means that the nearby objects with similar movement during the future period on the same edge are
Fig. 3.2 Group location updates
grouped and only the object nearest to its group center needs to report the location of the whole group. Within a certain precision, the locations of other objects can be approximated to their group location. The idea of grouping objects for location updates is similar to the GBL method proposed in [6]. The main differences are that the GSP method groups the objects by their future movement function predicted from the SP model instead of their current locations or predicted locations after a time parameter τ obtained from the current velocity. Grouping by objects’ predicted movement function can ensure the validity of the groups. The accurate prediction from the SP model can also reduce the maintenance of the groups. Due to the constraint of the road network, each group in the GSP method has its lifetime in accordance to the network edge. A group only exists on one edge and will be dissolved when objects within it leave the edge. Furthermore, unlike the GBL method in which objects have to send several messages to one another and compute similarities for grouping and leader selection with high costs, the GSP method executes the grouping on the server after prediction. This alleviates the resource consumption of moving clients and overloads of wireless communication. The similarity of the simulated future trajectories of two objects in the SP model has to be computed by comparing a lot of feature points on the trajectories. A straightforward method is to select some of the simulated points to sum their distance differences. However, the computation cost for simulated trajectories is very high. For simplicity and low cost, we group objects by comparing their final predicted linear functions. Therefore, the movement similarity of two objects on the same edge can be determined by their predicted linear functions and the length of the edge. 
Specifically, if both the distance of their initial locations and their distance when one of the objects arrives at the end of the edge are less than the given threshold (corresponding to the update threshold ε ), we group the two objects together. These distances can be easily computed by their predicted functions. Figure 3.3 shows the predicted movement functions (represented as L1 , L2 , L3 , L4 , L5 ) of the objects o1 , o2 , o3 , o4 , o5 on the edge (n1 , n2 ) from Fig. 3.2. le is the length of the
edge and t1, t2, t3, t4 are respectively the times when the objects o1, o2, o3, o4 arrive at the end of the edge. Given a threshold of 7, the location differences between o1 and o2 at the initial time and at t1 are both not larger than 7; therefore, they are clustered in one group c1. We then compare the movement similarities of o3 and o1 as well as o3 and o2. The location differences are all not larger than 7, and so o3 can be inserted into c1. Although o3 and o4 are very close at the initial time, with a distance of less than 7, they move far away from each other in the future and their distance exceeds 7 when o3 arrives at the end of the edge. They therefore cannot be grouped in one cluster. In the same way, o4 and o5 form the group c2. Therefore, given a threshold, there are three cases of the predicted linear functions of objects that are grouped together on one edge. These cases can be seen in Fig. 3.3, labeled respectively (a) (L2 and L3, with objects moving closer), (b) (L1 and L3, with objects moving farther apart), and (c) (L1 and L2, with one object overtaking the other).
Fig. 3.3 Grouping objects by their predicted functions
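The pairwise test described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: each predicted function is modeled as a hypothetical pair `(speed, offset)` with position `speed * t + offset` measured from the start of the edge, speeds are assumed non-negative, and both functions share the same time origin. Because the gap between two linear functions is the absolute value of a linear function, checking it at the initial time and at the first arrival time bounds it over the whole interval in between.

```python
from typing import Tuple

# (speed, offset): predicted position on the edge is l(t) = speed * t + offset
Line = Tuple[float, float]


def arrival_time(line: Line, edge_len: float) -> float:
    """Time at which the object reaches the end of the edge."""
    speed, offset = line
    if speed < 0:
        raise ValueError("this sketch assumes non-negative speeds")
    if speed == 0:
        return float("inf")  # a stopped object never reaches the end
    return (edge_len - offset) / speed


def gap(l1: Line, l2: Line, t: float) -> float:
    """Distance between the two predicted positions at time t."""
    return abs((l1[0] * t + l1[1]) - (l2[0] * t + l2[1]))


def can_group(l1: Line, l2: Line, edge_len: float, eps: float) -> bool:
    """True if the objects stay within eps until the first one leaves the edge."""
    if gap(l1, l2, 0.0) > eps:
        return False
    te = min(arrival_time(l1, edge_len), arrival_time(l2, edge_len))
    if te == float("inf"):
        return True  # both objects are stopped, so the gap never changes
    return gap(l1, l2, te) <= eps


# Hypothetical functions on an edge of length 100 with threshold 7.
print(can_group((10.0, 5.0), (10.2, 0.0), 100.0, 7.0))  # similar movement
print(can_group((10.0, 5.0), (6.0, 3.0), 100.0, 7.0))   # close now, diverging later
```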
In a road network, we group objects on the same network edge. When objects move out of the edge, they may change direction independently. Hence, we dissolve this group and regroup the objects in adjacent edges. Each group has its lifetime from the group formation to all objects within it leaving the edge. For each edge, with the predicted functions of the objects, groups are formed by clustering together sets of objects not only close to each other at a current time, but also likely to move together for a while on one edge. We select the object closest to the center of its group at both the current time and some period in future on the edge to represent the group. The central object represents its group and is responsible for reporting the group location to the server. For reselecting the central object, according to the predicted future functions of the objects, we can choose the objects close to the center of the group during its lifetime as the candidates of the central object. We can also identify when the central object will move away from the group center and choose another candidate as a new central object. A join from a moving object to a group must be executed as follows. The system first finds the nearby groups
according to the edge in which the object lies and then compares the movement similarity of the object and the group by their predicted functions. If the object cannot join the nearby groups, a new group will be created with only one member. When a moving object leaves a group, the central object of the group needs to be reselected. However, for the object leaving an edge, to reduce the central object reselection of its group, we just delete it from its group and do not change the central object until the central object leaves the edge. In the GSP method, the grouping algorithm assures the compactness and movement similarity of the objects within a group. Given the precision threshold ε , the locations of the objects in a group may be approximated by the location of the group (i.e., location of its central object). Only the location update from the central object of the group to the location server is necessary. After the server makes predictions for objects in a road network and initiates their groups, the client of the central object measures and monitors the deviation between its current location and predicted location and reports its location to the server. Other objects do not report their locations unless they enter the new edge. The prediction and grouping of objects are executed in the server and the group information (including the edge id, the central object id, its predicted function, and a set of objects within the group) is also stored in the database of the server. The update algorithm in the server is described in Algorithm 1.
Algorithm 1: GroupUpdate(objID, pos, vel, edgeID, grpID)
input: objID, edgeID, and grpID are respectively the identifiers of the object to be updated, its edge, and its group; pos and vel are its position and velocity
Simulate two future trajectories of objID with different Pd by the CA;
Compute the future predicted function l(t) of objID;
if objID does not enter the new edge then
    if objID is the central object of grpID then
        Update the current position pos and predicted function l(t) of grpID;
        Send the predicted function l(t) of grpID to the client of objID;
    end
else
    if GetObjNum(grpID) > 1 then
        Delete objID from its original group grpID;
        if objID is the central object of grpID then
            Reselect the central object of grpID, update and send its group info;
        end
    else
        Dissolve the group grpID;
    end
    Find the nearest group grp1 for objID on edgeID;
    Compute the time te when objID leaves edgeID by l(t) and edgeID length;
    if both distances between objID and grp1 at the initial time and at te ≤ ε then
        Insert objID into grp1 and send grp1 identifier to the client of objID;
        Reselect the central object of grp1, update and send its group info;
    else
        Create a new group grp2 only having objID and send its group info;
    end
end
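The control flow of Algorithm 1 can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation. It assumes that the CA simulation and regression have already produced the predicted function, passed in as a hypothetical `(speed, offset)` pair, that all functions share one time origin, and that the central object is the member whose initial offset is closest to the group mean. The class and method names are invented for this sketch, and messages to clients are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Line = Tuple[float, float]  # (speed, offset): l(t) = speed * t + offset


@dataclass
class Group:
    edge_id: str
    center: str                                  # id of the reporting object
    members: Set[str] = field(default_factory=set)


class GroupServer:
    def __init__(self, eps: float, edge_len: Dict[str, float]):
        self.eps, self.edge_len = eps, edge_len
        self.groups: Dict[int, Group] = {}
        self.group_of: Dict[str, int] = {}       # object id -> group id
        self.line_of: Dict[str, Line] = {}       # object id -> predicted l(t)
        self._next = 0

    def _similar(self, a: Line, b: Line, edge_id: str) -> bool:
        # Compare at t = 0 and at the time te when object `a` leaves the edge.
        te = (self.edge_len[edge_id] - a[1]) / a[0] if a[0] > 0 else 0.0

        def gap(t: float) -> float:
            return abs((a[0] - b[0]) * t + (a[1] - b[1]))

        return gap(0.0) <= self.eps and gap(te) <= self.eps

    def _reselect(self, gid: int) -> None:
        g = self.groups[gid]
        mean = sum(self.line_of[m][1] for m in g.members) / len(g.members)
        g.center = min(sorted(g.members),
                       key=lambda m: abs(self.line_of[m][1] - mean))

    def _leave(self, obj: str) -> None:
        gid = self.group_of.pop(obj, None)
        if gid is None:
            return
        g = self.groups[gid]
        g.members.discard(obj)
        if not g.members:
            del self.groups[gid]                 # dissolve the empty group
        elif g.center == obj:
            self._reselect(gid)

    def update(self, obj: str, line: Line, edge_id: str) -> int:
        """Process one location update and return the object's group id."""
        gid = self.group_of.get(obj)
        same_edge = gid is not None and self.groups[gid].edge_id == edge_id
        self.line_of[obj] = line
        if same_edge:
            return gid                           # membership is unchanged
        self._leave(obj)
        near = [(abs(self.line_of[g.center][1] - line[1]), i)
                for i, g in self.groups.items() if g.edge_id == edge_id]
        if near:
            _, cand = min(near)
            center_line = self.line_of[self.groups[cand].center]
            if self._similar(line, center_line, edge_id):
                self.groups[cand].members.add(obj)
                self.group_of[obj] = cand
                self._reselect(cand)
                return cand
        self._next += 1
        self.groups[self._next] = Group(edge_id, obj, {obj})
        self.group_of[obj] = self._next
        return self._next


srv = GroupServer(eps=7.0, edge_len={"e1": 100.0, "e2": 80.0})
print(srv.update("o1", (10.0, 5.0), "e1"), srv.update("o2", (10.2, 0.0), "e1"))
```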
3.5 Summary
This chapter introduces several location update techniques for tracking network-constrained moving objects. On the one hand, the techniques lower the location update frequency and ensure accurate locations in tracking by means of a new prediction method. The proactive location update strategy also minimizes the cost of wireless communication during the updating process by coordinating location uploading and downloading. On the other hand, a group update strategy is presented that exploits the movement features of objects on a road network and further reduces the total number of location updates.
References
1. Civilis A, Jensen CS, Nenortaite J, Pakalnis S (2004) Efficient Tracking of Moving Objects with Precision Guarantees. In: Proceedings of the 1st Annual International Conference on Mobile and Ubiquitous Systems, Networking and Services, Cambridge, Massachusetts, USA, pp 164-173
2. Civilis A, Jensen CS, Pakalnis S (2005) Techniques for Efficient Road-Network-Based Tracking of Moving Objects. IEEE Transactions on Knowledge and Data Engineering 17(5):698-712
3. Ding Z, Güting RH (2004) Managing Moving Objects on Dynamic Transportation Networks. In: Proceedings of the 16th International Conference on Scientific and Statistical Database Management (SSDBM 2004), Santorini Island, Greece, pp 287-296
4. Gowrisankar H, Nittel S (2002) Reducing Uncertainty In Location Prediction of Moving Objects In Road Networks. In: Proceedings of the 2nd International Conference on Geographic Information Science (GIS 2002), Boulder, Colorado, USA, pp 228-242
5. Huh Y, Kim C (2002) Group-Based Location Management Scheme in Personal Communication Networks. In: Proceedings of the International Conference on Information Networking, Cheju Island, Korea, pp 81-90
6. Lam GHK, Leong HV, Chan SC (2004) GBL: Group-Based Location Updating in Mobile Environment. In: Proceedings of the 9th International Conference on Database Systems for Advanced Applications (DASFAA 2004), Jeju Island, Korea, pp 762-774
7. Lam KY, Ulusoy O, Lee TSH, Chan E, Li G (2001) An Efficient Method for Generating Location Updates for Processing of Location-Dependent Continuous Queries. In: Proceedings of the 6th International Conference on Database Systems for Advanced Applications (DASFAA 2001), Hong Kong, China, pp 218-225
8. Trajcevski G, Wolfson O, Xu B, Nelson P (2002) Real-Time Traffic Updates in Moving Objects Databases. In: Proceedings of the 13th International Conference on Database and Expert Systems Applications (DEXA 2002), Aix-en-Provence, France, pp 698-704
9. Wolfson O, Sistla AP, Chamberlain S, Yesha Y (1999) Updating and Querying Databases that Track Mobile Units. Distributed and Parallel Databases 7(3):257-387
10. Wolfson O, Yin H (2003) Accuracy and Resource Consumption in Tracking and Location Prediction. In: Proceedings of the 7th International Symposium on Spatial and Temporal Databases (SSTD 2003), Santorini Island, Greece, pp 325-343
11. Zhou J, Leong HV, Lu Q, Lee KC (2005) Aqua: An Adaptive Query-Aware Location Updating Scheme for Mobile Objects. In: Proceedings of the 11th International Conference on Database Systems for Advanced Applications (DASFAA 2005), Beijing, China, pp 612-624
Chapter 4
Moving Objects Indexing
Jidong Chen¹, Xiaofeng Meng²
¹ EMC Research China, ² Renmin University of China
[email protected], [email protected]
Abstract For querying large amounts of moving objects, a key problem is to create efficient indexing structures that make it possible to effectively answer various types of queries. Traditional spatial indexing approaches cannot be used because the locations of moving objects are highly dynamic, which leads to frequent updates of the index structures and, in turn, to huge overheads. In this chapter, we first introduce a few of the underlying spatial index structures including the R-tree, Grid File, and Quad-tree. Then, we propose the indexing methods for moving objects in Euclidean space and in spatial networks. Finally, we describe techniques that index the past, present, and anticipated future positions of moving objects.
Key words: spatial index, spatial-temporal index, index update, spatial network, moving object databases
4.1 Introduction
To answer the various types of queries on moving objects, spatio-temporal indexing techniques must efficiently locate the desired location data in very large moving object databases. Traditional database indexing techniques work well only for static data. Such indices have to be explicitly updated when data changes. Using such techniques in the moving objects scenario leads to frequent updates, which cause huge overheads. Developing efficient indexes that improve update performance is a considerable challenge.
Indexing methods for the moving objects focus on two issues: (1) storage and retrieval of historical information, and (2) prediction of future trajectory. The amount of historical trajectories is constantly increasing over time, which makes it infeasible to keep track of all location updates. As a result, past positions of moving objects are often approximated by polylines (multiple line segments). Several indexing techniques [20, 22, 27], all based on 3-dimensional variations of R-trees [12] and R*-trees [4], have been proposed, and their goal is to minimize storage and query cost. To manage moving objects in spatially constrained networks, Pfoser et al. [23] proposed to convert the three-dimension problem into two sub-problems of lower dimensions through certain transformation of the networks and the trajectories. Another approach, known as the FNR-tree [11], separates spatial and temporal components of the trajectories and indexes the time intervals during which any moving object was on a given network link. The MON-tree approach [3] further improves the performance of the FNR-tree by representing each edge by multiple line segments (i.e., polylines) instead of just one line segment. Indexing future trajectories raises different problems when compared to indexing historical trajectories. The goal is to efficiently retrieve objects that will satisfy some spatial condition at a future time given their present motion vectors. Some of the early works [2, 14] employ dual transformation techniques that represent predicted positions as points moving in a two-dimensional space. However, they are largely theoretical, and applicable either only in 1D space [14] or entirely inapplicable in practice due to some large hidden constant in complexity [2]. In addition, based on dual transformation, a recent approach called STRIPES [21] supports efficient queries and updates at the cost of increased space requirements. 
Most recent studies focus on practical implementations. For instance, the TPR-tree [25] and its variations [24, 28] are based on the R-tree [12], and the $B^x$-tree [13] and its variations [31] are based on the $B^+$-tree. These structures use the linear prediction model to support predictive queries and to reduce the number of index updates. However, the assumption of linear movement limits their applicability in a majority of real-life applications, especially in traffic networks where vehicles change their velocities frequently.
In this chapter, we first introduce a few of the underlying spatial index structures including the R-tree, Grid File, and Quad-tree. Based on these underlying structures, we propose the indexing methods for moving objects in Euclidean space and in spatial networks. Then, we describe an indexing technique capable of capturing the positions of moving objects at all points in time. Finally, we discuss how to reduce index updates in existing moving objects indexing structures.
4.2 Underlying Indexing Structures
Indexing structures for moving objects are mostly developed from spatio-temporal access methods. Over the last decade, several spatio-temporal access methods have been developed as auxiliary structures to support spatio-temporal queries.
Figure 4.1 depicts the evolution of spatio-temporal access methods with the underlying spatial and temporal structures [18]. Lines in the figure indicate the relation between a newly proposed spatio-temporal indexing structure and the original structure that it is based upon. The spatial indexing structures are represented by square dots in the figure. The gray ones represent the indexing structures for current positions and historical trajectories of moving objects, while the black ones are for the future trajectories of moving objects. The PCFI-index and $BB^x$-index index the positions of moving objects at all points in time: past, present, and future. There are three main indexing methods for spatial or spatio-temporal data: the R-tree [12] and its variation [4], the Quad-tree [29] and its variation, and the Grid File [19] and its variation.
Fig. 4.1 Evolution of spatio-temporal access methods
4.2.1 The R-Tree
The R-tree [12] is a height-balanced indexing structure. It is an extension of the B-tree to multi-dimensional space. Each node of the R-tree (internal as well as leaf node) represents a hyper-rectangle in d dimensions. The leaf-level rectangles contain objects, and the rectangles for internal nodes contain the rectangles one level below.
The boundaries of the rectangles are made as tight as possible. These rectangles are called minimum bounding rectangles or MBRs. An entry in a leaf node is of the form: (MBRo , po ), where MBRo is the MBR of the indexed spatial object, and po is a pointer to the actual object tuple in the database. An entry in an internal node is of the form: (MBRc , pc ), where MBRc is the MBR covering all MBRs in its child node, and pc is the pointer to its child node c. The number of entries in each R-tree node, except for the root node, is between two specified parameters m and M (m ≤ M/2). The parameter M is termed the fanout of the R-tree. Unlike B-tree, the MBRs of nodes at the same level in an R-tree are allowed to overlap. Hence, searching an object may involve traversing several paths in the R-tree. When a node becomes overfull, it undergoes a split. Efficient heuristics and pruning are used to reduce the expected number of paths visited by subsequent searches. Figure 4.2 represents the R-tree corresponding to the spatial distribution of objects (solid rectangles in this case) below it.
Fig. 4.2 R-tree
The R-tree has the following features:
• The R-tree is height balanced. Unless it is a leaf, the root node has at least two child nodes, and all leaf nodes are on the same level of the tree.
• If M is the maximum number of entries in an R-tree node, then m ≤ M/2, where m is the minimum number of entries.
• The height of an R-tree indexing N objects is at most $\lceil \log_m N \rceil - 1$.
• The maximum number of nodes is $\lceil N/m \rceil + \lceil N/m^2 \rceil + \cdots + 1$.
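The last two features can be checked with a short calculation. The sketch below is only an illustration; the function names are hypothetical, N is taken to be the number of indexed objects, and integer arithmetic is used to avoid floating-point error in the logarithm.

```python
def ceil_log(n: int, base: int) -> int:
    """Smallest k such that base**k >= n, computed without floating point."""
    k, power = 0, 1
    while power < n:
        power *= base
        k += 1
    return k


def height_bound(n: int, m: int) -> int:
    """Upper bound ceil(log_m n) - 1 on the height of an R-tree with n objects."""
    if n < 2 or m < 2:
        raise ValueError("expects n >= 2 objects and a minimum fill m >= 2")
    return ceil_log(n, m) - 1


def max_nodes(n: int, m: int) -> int:
    """ceil(n/m) + ceil(n/m^2) + ... + 1: every node holds only m entries."""
    if n < 1 or m < 2:
        raise ValueError("expects n >= 1 objects and a minimum fill m >= 2")
    total, level = 0, n
    while True:
        level = -(-level // m)  # integer ceiling of level / m
        total += level
        if level == 1:
            return total


print(height_bound(1000, 10), max_nodes(1000, 10))  # 2 111
print(height_bound(1000, 4), max_nodes(1000, 4))    # 4 334
```

With N = 1000 and m = 10, the worst case has 100 leaf nodes, 10 nodes above them, and one root, which gives 111 nodes on three levels.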
4.2.2 The Grid File
The Grid File is an effective method to index the objects in a space. Considering a 3-dimensional space $S = X \times Y \times Z$, we can divide S into several grids: $P = U \times V \times W$, where $U = (u_0, u_1, \cdots, u_l)$, $V = (v_0, v_1, \cdots, v_m)$, $W = (w_0, w_1, \cdots, w_n)$, and $u_i$ $(0 \le i \le l)$, $v_j$ $(0 \le j \le m)$, $w_k$ $(0 \le k \le n)$ are intervals in the X, Y, and Z dimensions. In this way, the space can be indexed by a Grid File, as shown in Fig. 4.3.
Fig. 4.3 Grid File
In a Grid File, storage is allocated in units of fixed size, called grid buckets. A bucket has a capacity c, which is the number of records that it can contain. We access the Grid File at the granularity of a grid bucket; thus, the structure used to organize records within a bucket is not important for the file system, as it does not influence the access time. In contrast, organizing the buckets so that as few buckets as possible are accessed during query processing is a critical issue in the Grid File method.
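A minimal sketch of the lookup path may help here. It assumes that each dimension is described by a sorted list of interval boundaries, that every grid cell owns its own bucket (a real Grid File lets several cells share one bucket and splits on overflow), and that a full bucket simply rejects the insert. The class `GridFile` and its methods are hypothetical names for this illustration.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]


class GridFile:
    def __init__(self, scales: Sequence[Sequence[float]], capacity: int):
        # scales[d] holds the sorted interval boundaries of dimension d
        self.scales = [list(s) for s in scales]
        self.capacity = capacity
        self.buckets: Dict[Cell, List[Point]] = {}

    def cell_of(self, point: Point) -> Cell:
        """Index of the interval that contains the point in every dimension."""
        cell = []
        for scale, value in zip(self.scales, point):
            if not scale[0] <= value <= scale[-1]:
                raise ValueError("point lies outside the indexed space")
            # the last boundary is assigned to the last interval
            cell.append(min(bisect_right(scale, value) - 1, len(scale) - 2))
        return tuple(cell)

    def insert(self, point: Point) -> bool:
        """Store the point; return False if its bucket is already full."""
        bucket = self.buckets.setdefault(self.cell_of(point), [])
        if len(bucket) >= self.capacity:
            return False  # a real Grid File would split the cell here
        bucket.append(tuple(point))
        return True

    def lookup(self, point: Point) -> List[Point]:
        """All records stored in the bucket of the cell containing the point."""
        return self.buckets.get(self.cell_of(point), [])


gf = GridFile([[0, 10, 20, 30], [0, 5, 10], [0, 1, 2]], capacity=2)
gf.insert((12, 7, 0.5))
print(gf.cell_of((12, 7, 0.5)), gf.lookup((19, 5.5, 0.1)))
```

A point query touches exactly one bucket, which is the property the bucket organization is meant to preserve.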
4.2.3 The Quad-Tree
There are many variants of the Quad-tree. The main idea is to divide the space into subspaces recursively. In this section we introduce the PMR Quad-tree, which divides the space into four sub-quadrants recursively. It is widely used to index moving objects. The PMR Quad-tree is an index structure built on line segments in a 2-dimensional plane, while the movement of moving objects is described by a series of line trajectories. For each quadrant, there is accordingly a leaf node in the Quad-tree. In addition, if a trajectory segment of an object intersects a quadrant, there needs to be an index entry in the corresponding Quad-tree node. For example, as shown in the right part of Fig. 4.4, a PMR Quad-tree is created for the indexed space shown in the left part of Fig. 4.4. The trajectory intersects three quadrants, so it is inserted into the corresponding leaf nodes of the PMR Quad-tree in Fig. 4.4 (the solid rectangle leaf nodes in the figure).
Fig. 4.4 The PMR Quad-tree
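The insertion rule can be sketched as follows. This is an illustrative implementation under assumptions, not the structure used in the cited systems: quadrants are closed boxes `(xmin, ymin, xmax, ymax)`, the segment-box test uses Liang-Barsky clipping, and a leaf that exceeds a hypothetical `threshold` is split once per insertion, which is the usual PMR splitting rule. The segment coordinates in the example are invented.

```python
from typing import Iterator, List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def segment_hits_box(seg: Segment, box: Box) -> bool:
    """Liang-Barsky test: does the closed segment intersect the closed box?"""
    (x0, y0), (x1, y1) = seg
    xmin, ymin, xmax, ymax = box
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0
    for p, q in ((-dx, x0 - xmin), (dx, xmax - x0),
                 (-dy, y0 - ymin), (dy, ymax - y0)):
        if p == 0:
            if q < 0:
                return False      # parallel to this side and outside it
        else:
            r = q / p
            if p < 0:
                if r > t1:
                    return False
                t0 = max(t0, r)   # segment enters through this side
            else:
                if r < t0:
                    return False
                t1 = min(t1, r)   # segment leaves through this side
    return True


class PMRNode:
    def __init__(self, box: Box, threshold: int):
        self.box, self.threshold = box, threshold
        self.children: Optional[List["PMRNode"]] = None
        self.segments: List[Segment] = []

    def insert(self, seg: Segment) -> None:
        if not segment_hits_box(seg, self.box):
            return
        if self.children is not None:
            for child in self.children:
                child.insert(seg)
            return
        self.segments.append(seg)
        if len(self.segments) > self.threshold:
            self._split_once()

    def _split_once(self) -> None:
        xmin, ymin, xmax, ymax = self.box
        xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
        quadrants = ((xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
                     (xmin, ym, xm, ymax), (xm, ym, xmax, ymax))
        self.children = [PMRNode(b, self.threshold) for b in quadrants]
        for seg in self.segments:
            for child in self.children:
                if segment_hits_box(seg, child.box):
                    child.segments.append(seg)  # no recursive split here
        self.segments = []

    def leaves(self) -> Iterator["PMRNode"]:
        if self.children is None:
            yield self
        else:
            for child in self.children:
                yield from child.leaves()


root = PMRNode((0, 0, 8, 8), threshold=2)
for s in (((1, 1), (3, 1)), ((5, 5), (7, 6)), ((1, 5), (6, 2))):
    root.insert(s)
print(sum(((1, 5), (6, 2)) in leaf.segments for leaf in root.leaves()))
```

The third segment crosses three of the four quadrants, so it is stored in three leaf nodes, mirroring the situation described for Fig. 4.4.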
4.3 Indexing Moving Objects in Euclidean Space
The indexing structure of moving object trajectories is essentially a spatial index in the 3D space X × Y × Z, and we can adopt conventional spatial indexing methods to index the moving objects. However, this is not the best solution. In moving objects databases, location updates can cause the index structure to be updated frequently. As a result, special care should be taken to reduce the updating cost. In this section, we introduce three indexing structures for moving objects in Euclidean space: the time-parameterized R-tree (TPR-tree) [25], the Grid File-based moving objects index (GMOI) [17], and the future trajectory Quad-tree (FT-Quad-tree) [9, 10], which improve the R-tree, Grid File, and PMR Quad-tree index structures, respectively.
4.3.1 The R-Tree-Based Index
We introduce an indexing technique, the time-parameterized R-tree (TPR-tree), which efficiently indexes the current and anticipated future positions of moving point objects (or “moving points” for short). The technique naturally extends the R*-tree [4]. The TPR-tree is a balanced, multi-way tree with the structure of an R-tree. Entries in leaf nodes are pairs of the position of a moving point and a pointer to the moving point, and entries in internal nodes are pairs of a pointer to a subtree and a rectangle that bounds the positions of all moving points or other bounding rectangles in that subtree.
In the TPR-tree, a moving object o is represented by (1) an MBR $o_R$ that denotes its extent at reference time 0, and (2) a velocity bounding rectangle (VBR) $o_V = \{o_{V1-}, o_{V1+}, o_{V2-}, o_{V2+}\}$, where $o_{Vi-}$ ($o_{Vi+}$) describes the velocity of the lower (upper) boundary of $o_R$ along the i-th dimension ($1 \le i \le 2$). Figure 4.5(a) shows the MBRs and VBRs of four objects a, b, c, and d. The arrows (numbers) denote the directions (values) of their velocities, where a negative value implies that the velocity is toward the negative direction of an axis. The VBR of a is $a_V = \{1, 1, 1, 1\}$ (the first two numbers are for the X dimension), while those of b, c, and d are $b_V = \{-2, -2, -2, -2\}$, $c_V = \{-2, 0, 0, 2\}$, and $d_V = \{-1, -1, 1, 1\}$, respectively. A nonleaf entry is also represented by an MBR and a VBR. Specifically, the MBR (VBR) tightly bounds the MBRs (VBRs) of the entries in its child node. In Fig. 4.5(b), the objects are clustered into two leaf nodes $N_1$, $N_2$, whose VBRs are $N_{1V} = \{-2, 1, -2, 1\}$ and $N_{2V} = \{-2, 0, -1, 2\}$ (their directions are indicated using white arrows).
Fig. 4.5 Entry representations in a TPR-tree
Figure 4.6 shows the MBRs at timestamp 1 (notice that each edge moves according to its velocity). The MBR of a non-leaf entry always encloses those of the objects in its subtree, but it is not necessarily tight. For example, N1 (N2 ) at timestamp 1 is much larger than the tightest bounding rectangle for a, b(c, d). A predictive window query is answered in the same way as in the R*-tree, except that it is compared with
the (dynamically computed) MBRs at the query time. For example, the query $q_R$ at timestamp 1 in the figure visits both $N_1$ and $N_2$ (although it does not intersect them at time 0). The TPR-tree is optimized for timestamp queries in the interval $[T_C, T_C + H]$, where $T_C$ is the current update time, and H is a tree parameter called the horizon (i.e., how far the tree should “see” in the future). The update algorithms are exactly the same as those for the R*-tree, and are obtained by simply replacing the four penalty metrics of the previous section with their integral counterparts. Specifically, the area (or perimeter) of an entry N equals $\int_{T_C}^{T_C+H} A(N,t)\,dt$ (or $\int_{T_C}^{T_C+H} P(N,t)\,dt$), where $A(N,t)$ (or $P(N,t)$) returns the area (perimeter) of N at time t. Similarly, the overlap (or the centroid distance) between two MBRs $N_1$ and $N_2$ is computed as $\int_{T_C}^{T_C+H} OVR(N_1,N_2,t)\,dt$ (or $\int_{T_C}^{T_C+H} CDist(N_1,N_2,t)\,dt$), where $OVR(N_1,N_2,t)$ (or $CDist(N_1,N_2,t)$) returns the overlapping area (centroid distance) between $N_1$ and $N_2$ at time t. These integrals are solved into closed formulae.
When an object is inserted or removed, the TPR-tree tightens the MBR of its parent node. Figure 4.6 shows the MBRs after inserting a new object e (into $N_1$) at time 1. $N_1$ is adjusted to the tightest MBR bounding a, b, e, by computing their respective extents at time 1. Note that this does not compromise the update cost because $N_1$ must be loaded (written back) from (to) the disk anyway to complete the insertion. On the other hand, the MBR of $N_2$ is not tightened because it is not affected by the insertion (attempting to adjust $N_2$ will increase the update cost).
Fig. 4.6 N1 is tightened during an insertion at time 1
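The time-parameterized rectangles and the integrated area metric can be illustrated with a short sketch. It assumes that MBRs and VBRs are stored as `(x_low, x_high, y_low, y_high)` tuples relative to reference time 0, and it uses made-up coordinates for objects a and b (the text gives only their velocities). The function names are hypothetical. The closed form follows from integrating the product of two linear extents.

```python
from typing import List, Tuple

# (x_low, x_high, y_low, y_high); a VBR lists the edge velocities in this order
Rect = Tuple[float, float, float, float]


def mbr_at(mbr: Rect, vbr: Rect, t: float) -> Rect:
    """Extent at time t: every edge moves with its own velocity."""
    return tuple(p + v * t for p, v in zip(mbr, vbr))


def bound(entries: List[Tuple[Rect, Rect]]) -> Tuple[Rect, Rect]:
    """Tight MBR and VBR enclosing the child entries at reference time 0."""
    def tight(rects: List[Rect]) -> Rect:
        return (min(r[0] for r in rects), max(r[1] for r in rects),
                min(r[2] for r in rects), max(r[3] for r in rects))
    return tight([m for m, _ in entries]), tight([v for _, v in entries])


def integrated_area(mbr: Rect, vbr: Rect, t_c: float, horizon: float) -> float:
    """Closed form of the integral of A(N, t) over [t_c, t_c + horizon]."""
    w0, dw = mbr[1] - mbr[0], vbr[1] - vbr[0]   # width and its growth rate
    h0, dh = mbr[3] - mbr[2], vbr[3] - vbr[2]   # height and its growth rate

    def antiderivative(t: float) -> float:
        return w0 * h0 * t + (w0 * dh + h0 * dw) * t * t / 2 + dw * dh * t ** 3 / 3

    return antiderivative(t_c + horizon) - antiderivative(t_c)


# Hypothetical positions for a and b; the velocities match a_V and b_V above.
a = ((2, 4, 2, 4), (1, 1, 1, 1))
b = ((6, 8, 5, 7), (-2, -2, -2, -2))
n1_mbr, n1_vbr = bound([a, b])
print(n1_vbr)                        # (-2, 1, -2, 1), as for N1 in the text
print(mbr_at(n1_mbr, n1_vbr, 1))     # (0, 9, 0, 8)
print(integrated_area(n1_mbr, n1_vbr, 0, 1))
```

At time 1 the objects occupy (3, 5, 3, 5) and (4, 6, 3, 5), so the node rectangle (0, 9, 0, 8) still encloses them but is far from tight, which is the effect that tightening on insertion counteracts.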
4.3.2 The Grid-Based Index
Next, we introduce a grid-based index structure, the Grid File-based moving objects index (GMOI). Suppose that the indexed space is $X \times Y \times T^*$, in which $X \times Y$ represents the geographic space of the application and $T^*$ represents a section of the T axis. Since time extends infinitely, we deal with the period around the current time instant when building the index and make periodic storage of the indexed space along with time.
Definition 4.1. A partition, denoted by P, in the space $X \times Y \times T^*$ is expressed as $P = P_x * P_y * P_t$. $P_x = (x_0, x_1, \cdots, x_l)$, $P_y = (y_0, y_1, \cdots, y_m)$, and $P_t = (t_0, t_1, \cdots, t_n)$ are sub-partitions in three directions, in which $x_i$ $(0 \le i \le l)$, $y_j$ $(0 \le j \le m)$, and $t_k$ $(0 \le k \le n)$ are successive end-to-end sections in the dimensions of X, Y, and T, respectively.
In GMOI, we make a partition in the 3-dimensional indexed space and derive a set of grid blocks, as shown in Fig. 4.7. Each of these grid blocks contains a pointer leading to a certain grid bucket in the storage, whose size equals that of the basic I/O units. The indexed records are stored in the grid buckets. In the process of operation on GMOI, insertion and deletion of records may cause dynamic change of the grid partition, i.e., splits and merges of grid blocks. For example, in Fig. 4.7, the broken line divides the section $x_2$ into two in the X dimension. Dynamic partition of the $X \times Y \times T^*$ space and the maintenance of the relation between grid blocks and buckets are accomplished by managing a grid directory. In the grid directory, every item contains the boundary of the block and a pointer leading to the corresponding bucket.
Fig. 4.7 The grid partition in three-dimensional spatio-temporal space
In GMOI, the trajectories of moving objects are stored in the forms of trajectory segments. Considering the trajectory segment SEG = [(xi , yi ,ti ), (xi+1 , yi+1 ,ti+1 )], if it goes through a grid block, with the pointer in the block, the entire information of SEG can be found in the corresponding grid bucket, including its identifier and the vertex information. Since SEG is essentially a line segment in the three dimensional space, it is quite easy to identify all grid blocks intersected by SEG. When a moving object enters the space X × Y × T ∗ , its moving plan will be sent to the server and the corresponding trajectory information will be inserted into the Grid File. When the trajectory information in the Grid File exceeds a specific threshold, the split will be triggered.
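Identifying the grid blocks that a trajectory segment intersects can be sketched with a slab test per candidate block. The code below is an illustration under assumptions: each sub-partition is given as a sorted list of boundaries, blocks are treated as closed boxes (so a segment that only touches a shared face is reported for both neighbors), and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import product
from typing import Iterator, List, Sequence, Tuple

Point3 = Tuple[float, float, float]  # (x, y, t)


def _hits_box(a: Point3, b: Point3,
              box: Sequence[Tuple[float, float]]) -> bool:
    """Slab test: does the closed segment a-b intersect the closed box?"""
    t0, t1 = 0.0, 1.0
    for d in range(3):
        lo, hi = box[d]
        delta = b[d] - a[d]
        if delta == 0:
            if not lo <= a[d] <= hi:
                return False
            continue
        ta, tb = (lo - a[d]) / delta, (hi - a[d]) / delta
        if ta > tb:
            ta, tb = tb, ta
        t0, t1 = max(t0, ta), min(t1, tb)
        if t0 > t1:
            return False
    return True


def blocks_hit(a: Point3, b: Point3, px: List[float], py: List[float],
               pt: List[float]) -> Iterator[Tuple[int, int, int]]:
    """Yield the (i, j, k) indexes of all grid blocks intersected by a-b."""
    scales = (px, py, pt)
    ranges = []
    for d, scale in enumerate(scales):
        lo, hi = min(a[d], b[d]), max(a[d], b[d])
        first = max(bisect_left(scale, lo) - 1, 0)
        last = min(bisect_right(scale, hi) - 1, len(scale) - 2)
        ranges.append(range(first, last + 1))   # candidates from the bounding box
    for idx in product(*ranges):
        box = [(scales[d][idx[d]], scales[d][idx[d] + 1]) for d in range(3)]
        if _hits_box(a, b, box):
            yield idx


# Hypothetical partition and trajectory segment SEG = [(2, 2, 1), (18, 8, 6)].
print(list(blocks_hit((2, 2, 1), (18, 8, 6),
                      [0, 10, 20], [0, 10, 20], [0, 5, 10])))
```

Restricting the candidates to the segment's bounding box keeps the number of slab tests small when segments are short relative to the indexed space.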
When a trajectory ξ of a moving object M enters the space, each segment of it will be processed. Considering the trajectory segment ξ · Seg(i), we need to find which bucket it will be inserted into. If the record number of the bucket is less than C, we insert the information of ξ · Seg(i) including the start and end point. Otherwise, the split is triggered. The GMOI creating algorithm is shown in Algorithm 2.
Algorithm 2: Creation of GMOI
input: ξ
for i = 1; i ≤ ξ·SumSeg; i++ do
    // process the i-th trajectory segment
    Grid* = {Grid | Grid ∩ ξ·Seg(i) ≠ ∅};  // Grid* contains all grids intersecting the i-th segment
    for ∀Grid ∈ Grid* do
        if |Grid| < C then
            Insert ξ·Seg(i) into Grid;
        else
            Split Grid;
            Insert ξ·Seg(i) into corresponding grid;
        end
    end
end
When moving objects update their locations, we need to adjust the GMOI by deleting the old trajectory and inserting the new one. This process may split and merge grids and thus incur a high cost. In fact, when the threshold is small, the old and new trajectories are likely to be close. We therefore insert the new trajectory first and then delete the old one, so that the old information can be reused and splits or merges can be avoided. Versions are employed to implement this reuse. Each trajectory ξ has a version (denoted ξ·Vers), as does each of its segments (denoted ξ·Seg(i)·Vers). The versions of all new trajectories and segments are 0, and a version is incremented by 1 after each update. The GMOI updating algorithm is shown in Algorithm 3.
Algorithm 3: Updating of GMOI
input: ξold; ξnew
for i = ξold·CurSeg; i ≤ ξnew·SumSeg; i++ do
    // process the i-th trajectory segment
    Grid* = {Grid | Grid ∩ ξnew·Seg(i) ≠ ∅};  // Grid* contains all grids intersecting the i-th segment
    for ∀Grid ∈ Grid* do
        if |Grid| < C then
            Insert ξnew·Seg(i) into Grid;
        else
            Split Grid;  // reuse the old segments if necessary
            Insert ξnew·Seg(i) into corresponding grid;
        end
    end
end
for i = ξold·CurSeg; i ≤ ξold·SumSeg; i++ do
    Grid* = {Grid | Grid ∩ ξold·Seg(i) ≠ ∅};
    for ∀Grid ∈ Grid* do
        Delete ξold·Seg(i) from Grid. Use version to divide the new and old records;
        Merge grids if necessary;
    end
end
4.3.3 The Quad-Tree-Based Index
Before introducing the future trajectory Quad-tree (FT-Quad-tree), we define and explain some notions that will be used throughout this section.
Definition 4.2. Moving Segment (MS) is an 8-tuple <MOID, Number, StartX, StartY, EndX, EndY, Time, Velocity> in 2-dimensional space, which depicts a moving object on a specific route segment. MOID is a unique object identifier; Number is the sequence number of the MS in the Moving Plan; StartX and StartY (EndX and EndY) are two values that denote the start (end) point of a specific route segment in 2D space; Time is the starting time of the moving object; Velocity is the speed of the moving object on the route segment.
Definition 4.3. Moving Plan (MP) is a series of MSs of an object. It depicts the movement of the object on all its route segments. MP is a semantically ordinal queue. Any MS in the MP has its specific position determined by the element Number in the 8-tuple defined above.
Figure 4.8 illustrates an MP instance. The six lines with arrows specify the MP of a moving object. Each line with an arrow represents an MS.
Fig. 4.8 An example of MP of a moving object
Definition 4.4. Trajectory (T) is a polyline in a higher-dimensional space that combines the spatial and temporal information of a moving object. If the object moves in a d-dimensional space, its future trajectory is a polyline in a (d+1)-dimensional space. A trajectory can be denoted by a series of 6-tuples < MOID, Number, Start, StartT, End, EndT >, each of which is a Trajectory Segment (TS). MOID is a number uniquely identifying the object. Start and End are two vectors specifying the start point and end point of the corresponding route segment. The vector Start (End) can be expressed as (StartX (EndX), StartY (EndY)) if the object moves in 2-dimensional space (X, Y), and as StartX (EndX) if it moves in 1-dimensional space X. StartT and EndT specify the starting and ending times. Number indicates the position of the TS within the whole T of MOID. The 6-tuple means that the object MOID moves from Start to End during the period (StartT, EndT). One example of an object moving in 2-dimensional space (d = 2) is shown in Fig. 4.8. The projection of the trajectory onto the (X, Y) plane is just the route of the object, shown as the left polyline in Fig. 4.9.
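As an illustration of Definitions 4.2 to 4.4, the following Python sketch derives trajectory segments from the moving segments of a plan. It is not taken from the original system. It assumes that Time is the moment at which the object enters the segment and that Velocity is constant and positive along it; the class and function names are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class MovingSegment:
    """8-tuple of Definition 4.2 (field names are illustrative)."""
    moid: int
    number: int
    start_x: float
    start_y: float
    end_x: float
    end_y: float
    time: float      # assumed: time at which the object enters this segment
    velocity: float  # assumed: constant, positive speed on the segment


@dataclass(frozen=True)
class TrajectorySegment:
    """6-tuple of Definition 4.4 for d = 2."""
    moid: int
    number: int
    start: tuple
    start_t: float
    end: tuple
    end_t: float


def to_trajectory_segment(ms):
    # Travel time = segment length / speed, under the constant-speed assumption.
    length = hypot(ms.end_x - ms.start_x, ms.end_y - ms.start_y)
    end_t = ms.time + length / ms.velocity
    return TrajectorySegment(ms.moid, ms.number,
                             (ms.start_x, ms.start_y), ms.time,
                             (ms.end_x, ms.end_y), end_t)


def plan_to_trajectory(plan):
    # A Moving Plan is ordered by the element Number.
    return [to_trajectory_segment(ms)
            for ms in sorted(plan, key=lambda m: m.number)]
```

Projecting each resulting TS onto (X, T) or (Y, T) yields the per-dimension segments that the shared trajectory segments of Definition 4.5 are built from.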
Fig. 4.9 A trajectory in (X,Y, T ) and its projection in (X, T )
Definition 4.5. Shared Trajectory Segment is a trajectory segment shared by different moving objects. It has the form < StartX, EndX, StartT, EndT >, in which StartX, EndX, StartT, and EndT have the same meaning as in the TS. A shared trajectory segment can be used by a single object or shared by a number of objects. Shared trajectory segments serve as our index entries: a shared trajectory segment followed by an object list is an index entry. If the shared trajectory segment represents only one object's movement, it is equivalent to that object's trajectory segment entry. If a number of objects share the same trajectory segment, however, the situation is different. Shared trajectory segments are derived from trajectories directly and from Moving Plans indirectly. If the movements described by the TSs of different objects are approximately the same, i.e., they differ by less than a threshold, we regard these TSs as having “common” information, including the starting and ending spatial-temporal points.
Then, a shared trajectory segment is formed from the extracted “common” information. Based on this information-extracting idea, we design a PMR Quad-tree-based dynamic attribute index structure. In the FT-Quad-tree, an index entry is denoted by < StartX, EndX, StartT, EndT, pmo >, where < StartX, EndX, StartT, EndT > is a shared trajectory segment and pmo is a pointer to an object list whose items have the form < MOID, Number >. MOID and Number are the same as in the TS. The objects in the list pointed to by pmo are those that share this index entry, i.e., each of these objects has a trajectory segment that is approximately, or exactly, equal to the shared trajectory segment < StartX, EndX, StartT, EndT >. In Fig. 4.11, the left list is an example of the MSs in Fig. 4.10. They have nearly the same movement in the x-direction but different movements in the y-direction. After projection onto the x- and y-directions, the corresponding index entries in two naive PMR Quad-trees are derived, as shown in the middle two parts. By applying the information-extracting idea to the trajectory segments in (X, T), the index entry I1 in the FT-Quad-tree is derived, as shown in the right part. With the pointer pmo, all the objects sharing the same index entry can be found.
Fig. 4.10 An example of trajectory
Compared with index methods that do not perform this preprocessing, the FT-Quad-tree has far fewer index entries. Figure 4.12 displays the difference. In the figure, the left part is the naive index tree, whose index entries are the trajectory segments of individual moving objects. The right part is the FT-Quad-tree structure, in which I1 is the TS shared by X1, X2, X3, X4, and X5. Because entries are shared, the insertion, deletion, and update operations of the FT-Quad-tree differ from those of the naive index.
Insertion
The FT-Quad-tree is constructed by inserting the TSs of each object one by one into an initially empty structure consisting of a single quadrant. When a new TS (SEG) of a moving object (MO) is to be inserted into the FT-Quad-tree, the insertion function (see Algorithm 4) is called for every leaf node. The basic idea is to check whether SEG has “common” movement information with an existing index entry.
Fig. 4.11 Index entries in naive PMR quadtree and FT-Quadtree
Fig. 4.12 Comparison of two kinds of Quad-tree indexes

Algorithm 4: Insertion
input: SEG, Q  // SEG is a TS, Q is an FT-Quad-tree node
if SEG does not intersect Q then
    RETURN;
else
    for each index entry I in Q do
        if SEG.moveinfor = I.information then
            add an item for SEG.MOID in I → objectlist;
            RETURN;
        end
    end
    add a new index entry NI in Q;
    add an item for SEG.MOID in NI → objectlist;
end
For a leaf node Q intersected by SEG, if the movement information in SEG is approximately equal to that of an index entry I in Q, we only need to add the object information to the object list of I, which records that SEG of MO shares the existing entry I. Otherwise, a new index entry NI, with an object list containing a single object record, is created and inserted into Q. The number of index entries in a node is denoted by CurrentSize, which is also used in the deletion and update algorithms below. If the insertion causes the CurrentSize of Q to exceed the splitting threshold, the splitting operation (see Algorithm 5) splits Q into four smaller nodes of equal size. After splitting, all the trajectory segments in Q, including SEG, are inserted into the four sub-nodes, i.e., the function Insertion is called once on each of the four sub-nodes for every segment.

Algorithm 5: Split
input: Q  // Q is an FT-Quad-tree node
split Q into Q1, Q2, Q3, Q4;
for each TS SEG in Q do
    for each Qi do
        insert(SEG, Qi);
    end
end
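The insertion and split steps can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation. It assumes that segments live in the (X, T) plane as tuples (StartX, EndX, StartT, EndT), that two segments carry "common" information when every component differs by at most a hypothetical tolerance eps, and that a node is split only once per insertion. The class FTNode and the helper segment_hits_rect are illustrative names.

```python
def segment_hits_rect(seg, rect):
    """Liang-Barsky test: does the (X, T) segment touch the closed rectangle?"""
    sx, ex, st, et = seg
    x0, t0, x1, t1 = rect
    dx, dt = ex - sx, et - st
    lo, hi = 0.0, 1.0
    for p, q in ((-dx, sx - x0), (dx, x1 - sx), (-dt, st - t0), (dt, t1 - st)):
        if p == 0:
            if q < 0:          # parallel to this border and outside it
                return False
        else:
            r = q / p
            if p < 0:
                lo = max(lo, r)
            else:
                hi = min(hi, r)
    return lo <= hi


class FTNode:
    def __init__(self, rect, threshold=4, eps=0.5):
        self.rect = rect            # (x0, t0, x1, t1)
        self.threshold = threshold  # splitting threshold on CurrentSize
        self.eps = eps              # assumed similarity tolerance
        self.entries = []           # [(shared_segment, object_list)]
        self.children = None        # four sub-nodes once split

    def insert(self, seg, moid, number):
        if not segment_hits_rect(seg, self.rect):
            return
        if self.children is not None:
            for child in self.children:
                child.insert(seg, moid, number)
            return
        for shared, objects in self.entries:
            if all(abs(a - b) <= self.eps for a, b in zip(seg, shared)):
                objects.append((moid, number))   # share the existing entry
                return
        self.entries.append((seg, [(moid, number)]))
        if len(self.entries) > self.threshold:
            self._split()

    def _split(self):
        x0, t0, x1, t1 = self.rect
        xm, tm = (x0 + x1) / 2, (t0 + t1) / 2
        quads = [(x0, t0, xm, tm), (xm, t0, x1, tm),
                 (x0, tm, xm, t1), (xm, tm, x1, t1)]
        self.children = [FTNode(q, self.threshold, self.eps) for q in quads]
        # Redistribute entries together with their object lists (one level only).
        for shared, objects in self.entries:
            for child in self.children:
                if segment_hits_rect(shared, child.rect):
                    child.entries.append((shared, list(objects)))
        self.entries = []
```

In this sketch the first segment inserted becomes the representative of the shared entry; a real implementation might instead store averaged or extracted values.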
Deletion
The deletion process is simply the inverse of insertion (see Algorithm 6). When a TS (SEG) of a moving object (MO) is deleted from the FT-Quad-tree, it must be deleted from all the nodes that it intersects. For a given node Q that SEG intersects, if the object list of the corresponding index entry I of SEG contains two or more items, only the item holding the information of MO is removed. If the list contains only one item, the index entry I itself is removed from Q. The deletion of I may make the sum of the CurrentSize of Q and those of Q's siblings less than the splitting threshold. When this occurs, Q and its siblings are merged, i.e., the mergence operation (see Algorithm 7) is executed on Q and its siblings.

Algorithm 6: Deletion
input: SEG, Q  // SEG is a TS, Q is an FT-Quad-tree node
if SEG does not intersect Q then
    RETURN;
else
    find the corresponding index entry I of SEG;
    n = the number of items in I → objectlist;
    if n > 1 then
        delete the item for SEG.MOID in I → objectlist;
    else
        delete I;
    end
end

Algorithm 7: Mergence
input: Q  // Q is an FT-Quad-tree node
find the father Q0 of Q;
find the siblings Q1, Q2, Q3 of Q;
for each SEG in Q do
    insert(SEG, Q0);
end
for i = 1 to 3 do
    for each SEG in Qi do
        insert(SEG, Q0);
    end
end
delete Q, Q1, Q2, Q3;

Update
To implement our “insert-delete” update method, the concept of “version” is introduced to distinguish the former TS from the new TS. The version of the former is considered older than that of the latter. Thus, although the former TS and the new TS coexist after the insertion operation, they will not be confused, because their versions differ. After the deletion operation, new TSs can reuse the space of the original ones. When a moving object stops because of some unexpected situation, its subsequent TSs need to be updated as soon as possible. For an object (MO) stopping at X at time T, the updating process has two phases: inserting the new TSs of MO (denoted as New version) and deleting the former TSs (denoted as Old version). In the first phase, after a TS of the New version is inserted into a node Q that it intersects, if no TS of the Old version is in Q, the subsequent deletion process will not affect Q. In that case, if the CurrentSize of Q exceeds the splitting threshold, the splitting process is executed. However, if some TSs of the Old version are in Q, the split is skipped at this point. In the second phase, after a TS of the Old version is deleted from a node Q that it intersects, if the CurrentSize of Q still exceeds the splitting threshold, the splitting operation is executed, and if the sum of the CurrentSize of Q and those of its siblings is less than the splitting threshold, the mergence operation is executed.
Algorithm 8: Update
input: X, T, MO  // the updating position X and time T of the object MO
keep the segments of MO after (X, T) as Old-version;
modify the segments in Old-version to obtain New-version;
for each SEG ∈ New-version do
    for each leaf node Q do
        if SEG intersects Q then
            insert(SEG, Q);
            if none of Old-version is in Q then
                if Q.CurrentSize > splittingthreshold then
                    split(Q);
                end
            end
        end
    end
end
for each SEG ∈ Old-version do
    for each leaf node Q do
        if SEG intersects Q then
            delete(SEG, Q);
            if Q.CurrentSize > splittingthreshold then
                split(Q);
            end
            s = Q.CurrentSize + Q's siblings.CurrentSize;
            if s < splittingthreshold then
                merge(Q);
            end
        end
    end
end
4.4 Indexing Moving Objects in Spatial Networks
In this section, we address the problem of efficiently indexing moving objects in spatial networks under heavy update loads. We exploit the constraints of the network and the stochastic behavior of real traffic to achieve both high update efficiency and high query efficiency. We introduce a dynamic data structure called the adaptive unit (AU) to group neighboring objects with similar movement patterns in the network [5]. A spatial index (e.g., an R-tree) for the road network is then built over the adaptive units to form the index scheme (e.g., the adaptive network R-tree) for moving objects in road networks.
4.4.1 The Adaptive Unit
Conceptually, an adaptive unit is similar to a 1-dimensional MBR in the TPR-tree, which expands with time according to the predicted movement of the objects it contains. In the TPR-tree, however, an MBR may contain objects moving in opposite directions or at different speeds. As a result, the MBR may expand rapidly and overlap heavily with other MBRs. The adaptive unit avoids this problem by grouping objects that have similar moving patterns. Specifically, for objects on the same edge of the spatial network, we use a distance threshold and a speed threshold to cluster adjacent objects that move in the same direction at similar speeds. Figure 4.13 illustrates the difference between an AU and an MBR in the TPR-tree. An arrow denotes the direction of an object, and the length of the arrow denotes its speed. In Fig. 4.13(b), objects e and f are not in the AU because they have different directions and belong to different CAs. As shown in Fig. 4.13(c), the MBR in the TPR-tree expands significantly as time goes by. In comparison, as Fig. 4.13(d) shows, the AU shows no obvious enlargement because the objects in the AU move as a cluster. Thus, by tightly bounding the enclosed moving objects for some time into the future, AUs alleviate the problem of rapid MBR expansion and overlap.
Fig. 4.13 MBR vs. AU
We now formally introduce the AU. An AU is a 7-tuple:
AU = (auID, objSet, upperBound, lowerBound, edgeID, enterTime, exitTime)
where auID is the identifier of the AU, objSet is a list that stores the objects belonging to the AU, and upperBound and lowerBound are the upper and lower bounds of the predicted future trajectory of the AU. The trajectory bounds are derived from the trajectory bounds of the objects in the AU. We assume the trajectory bounds to be the following functions of time:
upperBound: $D_u(t) = \alpha_u + \beta_u \cdot t$    (4.1)
lowerBound: $D_l(t) = \alpha_l + \beta_l \cdot t$    (4.2)
edgeID denotes the edge that the AU belongs to, and enterTime and exitTime record the times at which the AU enters and leaves the edge. In the spatial network, multiple AUs are associated with each network edge. Since AUs on the same edge are likely to be accessed together during query processing, we cluster AUs on their edgeID; in other words, the AUs on the same edge are stored in the same disk pages. To access AUs more efficiently, we create a compact summary structure for each edge, called the direct access table, which serves as a secondary index of AUs and can be held in an in-memory buffer. A direct access table stores the summary information of each AU on an edge (i.e., number of objects and trajectory bounds) and pointers to the AU disk pages. Each AU corresponds to one entry in the direct access table with the structure (auID, upperBound, lowerBound, auPtr, objNum), where auPtr points to a list of AUs in disk storage and objNum is the number of objects included in the AU. As with AUs, the entries of the same direct access table, and those of the direct access tables of adjacent edges, are grouped together so that they can be loaded into the buffer more efficiently. For a simple network with a small number of AUs, all direct access tables can be kept in main memory, since they hold only the summary information of AUs. When the updated locations are stored in a database on the server, the index structure of moving objects may have to be updated frequently as locations change. Our task is to reduce the cost of such index updates by means of a 1-dimensional dynamic AU structure and an accurate prediction method. An important feature of the AU is that it groups objects having similar moving patterns. The AU is capable of dynamically adapting itself to cover the movement of the objects it contains.
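A minimal Python sketch of the AU tuple, its linear trajectory bounds (4.1) and (4.2), and the corresponding direct access table entry might look as follows. The field names mirror the tuples above, but the class layout, the covers helper, and the use of an integer page number for auPtr are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LinearBound:
    alpha: float  # intercept: distance along the edge at t = 0
    beta: float   # slope: speed along the edge

    def at(self, t: float) -> float:
        # D(t) = alpha + beta * t, as in Eqs. (4.1) and (4.2)
        return self.alpha + self.beta * t


@dataclass
class DirectAccessEntry:
    au_id: int
    upper_bound: LinearBound
    lower_bound: LinearBound
    au_ptr: int   # assumed: a disk page number standing in for the pointer
    obj_num: int


@dataclass
class AdaptiveUnit:
    au_id: int
    edge_id: int
    upper_bound: LinearBound
    lower_bound: LinearBound
    enter_time: float
    exit_time: float
    obj_set: List[int] = field(default_factory=list)

    def covers(self, position: float, t: float) -> bool:
        # True if a position along the edge lies between the two bounds at time t.
        return self.lower_bound.at(t) <= position <= self.upper_bound.at(t)

    def summary(self, au_ptr: int) -> DirectAccessEntry:
        # Summary kept in the direct access table of the AU's edge.
        return DirectAccessEntry(self.au_id, self.upper_bound,
                                 self.lower_bound, au_ptr, len(self.obj_set))
```

Keeping only the bounds and the object count in the summary entry is what allows a query to discard an AU without reading its disk page.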
By tightly bounding the enclosed moving objects for some time into the future, the AU alleviates the update problem caused by rapid MBR expansion and overlap in TPR-tree-like methods. To reduce updates further, the AU captures the movement bounds of its objects using the simulation-based prediction (SP) method (described in detail in Chapter 7), which considers the network constraints and stochastic traffic behavior. Since the objects in an AU have similar movements, we predict the movement of the AU with the SP method as if it were a single moving object. The locations of the individual objects inside an AU are similar and can be obtained from the trajectory bounds of the AU. Through the SP method, we obtain two predicted future trajectory bounds of the objects. When an object's position exceeds the bounds of its AU, the index needs to be updated by deleting the object from the old AU and inserting it into another AU. Because the movement of an AU is predicted accurately and the AU expands with the movement of its objects, a location update of an individual object seldom changes the AU structure (e.g., deleting and inserting objects,
creating and dropping AUs). Therefore, the SP method helps to reduce the index updating costs. The future trajectory bounds are predicted at each node when an AU is created. The trajectory bounds are not changed along the edge on which the AU moves until the objects in the AU move to another edge in the network. The range between the predicted bounds of an AU becomes wider with time, which lowers the accuracy of the future trajectory prediction. However, issuing a new prediction whenever the predicted bounds are no longer accurate would incur high simulation and regression costs. Considering that the movement of objects along one edge is stable, we can assume that the trajectory bounds keep the same trend and adjust only their initial locations when the prediction is no longer accurate. Specifically, the AU takes its actual locations (the locations of its boundary objects) at that time as the initial locations of the two trajectory bounds and keeps the same movement vector (i.e., the slope of the bounds) as the previous bounds. In this way, the predicted trajectory bounds can be revised effectively at little cost, and the prediction accuracy of the objects' future trajectories can be kept high without executing further predictions.
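The revision step can be illustrated with a short Python sketch. It assumes that each bound is a pair (α, β) as in Eqs. (4.1) and (4.2) and that a bound is revised when the actual location of its boundary object deviates from the predicted one by at least a tolerance eps; the function names and the use of an absolute deviation are assumptions.

```python
def revise_bound(alpha, beta, t_now, actual_position):
    """Keep the slope beta and move the line so that it passes through
    (t_now, actual_position): alpha' = actual_position - beta * t_now."""
    return actual_position - beta * t_now, beta


def adapt_bounds(lower, upper, t_now, low_pos, high_pos, eps):
    """Revise each (alpha, beta) bound whose boundary object has drifted by >= eps.

    low_pos / high_pos are the actual locations of the AU's boundary objects.
    """
    l_alpha, l_beta = lower
    u_alpha, u_beta = upper
    if abs(low_pos - (l_alpha + l_beta * t_now)) >= eps:
        lower = revise_bound(l_alpha, l_beta, t_now, low_pos)
    if abs(high_pos - (u_alpha + u_beta * t_now)) >= eps:
        upper = revise_bound(u_alpha, u_beta, t_now, high_pos)
    return lower, upper
```

Because only the intercept changes, the revision costs a single subtraction per bound, in contrast to re-running the simulation and regression.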
4.4.2 The Adaptive Network R-Tree (ANR-Tree)
We build a spatial index (e.g., an R-tree) for the road network over the adaptive units to form the adaptive network R-tree (ANR-tree) for network-constrained moving objects [6]. The ANR-tree is a two-level index structure. The top level consists of a 2D R-tree that indexes the spatial information of the road network. At the bottom level, the leaves of the R-tree contain the edges, each representing multiple road segments (i.e., a polyline), that fall within the corresponding MBR, and these leaves point to the lists of adaptive units. Each entry in a leaf node consists of a road segment, i.e., a line segment of the polyline. The top-level R-tree remains fixed during the lifetime of the index scheme (unless the network itself changes). The index scheme is developed with the R-tree in this study, but any existing spatial index can be used in its place without change. Figure 4.14 shows the structure of the ANR-tree, which also includes a direct access table. The R-tree, the direct access table, and the adaptive units are stored on disk. The direct access table, however, stores only the summary information of the AUs on an edge and acts as a secondary index of adaptive units. In the index scheme, each leaf node of the R-tree is associated with its direct access table through its edgeID, and the direct access table connects to the corresponding adaptive units through auPtr in its entries. Therefore, we only need to update the direct access table when AUs change, which enhances the performance of the index scheme. Since the R-tree indexes the road network, it remains fixed, and updating the ANR-tree is restricted to updating adaptive units.
Fig. 4.14 Structure of the ANR-tree
Specifically, an AU is usually created at the start of an edge and dropped at the end of the edge. Since the AU is a 1-dimensional structure, it performs update operations much more efficiently than 2-dimensional indexes. Updating the ANR-tree involves four operations: creating an AU, dropping an AU, adding objects to an AU, and removing objects from an AU. We describe these operations in detail below.
Creating an AU
To create an AU, we first compose the objSet, a list of objects traveling in the same direction with similar velocities (the velocity difference is not larger than a speed threshold) and at nearby locations (the location difference is not larger than a distance threshold). We then predict the future trajectories of the AU by simulation and compute its trajectory bounds. In fact, we treat the AU as one moving object (the object closest to the center of the AU) and predict its future trajectory bounds by predicting the movement of this object. The prediction starts when the AU is created and ends at the end of the edge. Finally, we write the created AU to a disk page and insert the AU entry into its summary structure. An AU is created in two cases: (1) initially, when the index is bulk-loaded on each network edge, and (2) when an object leaves its original edge and a new AU containing that single object has to be created.
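One possible way to compose the objSet of each AU on a single edge is sketched below in Python. The greedy strategy is an assumption made for illustration, since the text fixes only the two thresholds: objects are sorted by position, direction is taken from the sign of the velocity, the distance threshold is applied between neighboring objects, and the speed threshold is applied to the spread of velocities within a group.

```python
def group_into_aus(objects, dist_threshold, speed_threshold):
    """Group (obj_id, position, velocity) triples on one edge into candidate AUs.

    position is the distance along the edge; the sign of velocity gives the
    direction of travel (an assumption of this sketch).
    """
    groups = []
    for forward in (True, False):
        same_dir = sorted((o for o in objects if (o[2] >= 0) == forward),
                          key=lambda o: o[1])
        current, v_min, v_max = [], 0.0, 0.0
        for obj in same_dir:
            _, pos, vel = obj
            if current:
                too_far = pos - current[-1][1] > dist_threshold
                too_fast = max(v_max, vel) - min(v_min, vel) > speed_threshold
                if too_far or too_fast:
                    groups.append(current)   # close the current AU
                    current = []
            if not current:
                v_min = v_max = vel
            else:
                v_min, v_max = min(v_min, vel), max(v_max, vel)
            current.append(obj)
        if current:
            groups.append(current)
    return groups
```

For example, with a distance threshold of 10 and a speed threshold of 2, three nearby objects moving forward at speeds 15, 16 and 15.5 form one group, while a distant object and an object moving in the opposite direction each start their own group.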
Dropping an AU
When the objects in an AU move out of the edge, they may change direction independently. Thus, we need to drop this AU and create new AUs on the adjacent edges to regroup the objects. When the front of an AU touches the end of the edge, some objects in the AU may start moving out of the edge. However, the AU cannot be dropped yet, because a query may occur at that time. The AU can be dropped only after its last object has entered another edge and joined another AU. Dropping an AU is simple: through its entry in the direct access table, we find the AU and delete it.
Adding and removing objects from an AU
When an object leaves an AU, we remove the object from the AU and look for another AU in the neighborhood that the object can fit into. If such an AU exists, the object is inserted into it; otherwise, a new AU is created for the object. Specifically, when adding an object to an AU, we first find the direct access table of the edge on which the object lies and, through the AU's entry in the table, access the AU in disk storage. We then insert the object into the object list of the AU and update the AU entry in the direct access table. Removing an object from an AU involves a similar process. Therefore, when updating an object in the ANR-tree, we first determine whether the object is leaving its edge and entering another one. If it is moving to another edge, we delete it from the old AU (if it is the last object in the old AU, the AU is also dropped) and either insert it into the AU nearest to the object in terms of network distance or create a new AU on the edge it is entering. Otherwise, we do not update the AU that the object belongs to unless its position exceeds the bounds of the AU, in which case we execute the same updates as when it moves to another edge. When the AU is not updated, we check whether the object is a boundary object of the AU and whether its actual position deviates from the predicted bounds of the AU by at least a precision threshold ε, in order to adapt the trajectory bounds of the AU. In fact, our experimental evaluation shows that the chance of objects moving beyond the trajectory bounds of their AU on an edge is very slim. Algorithm 9 shows the algorithm for updating an object in the AU. Like the node capacity parameter of an index tree, MAXOBJNUM in Algorithm 9 restricts the number of object entries in an AU. It is set according to the storage size of an object entry and the storage size of an AU. In summary, updating the AU-based index is easier than updating the TPR-tree. It never invokes complex node splitting and merging. Moreover, owing to the similar movement features of the objects in an AU and the accurate prediction of the SP method, objects are seldom removed from or added to their AU on an edge, which reduces the number of index updates.
Algorithm 9: Update(objID, position, velocity, edgeID)
input: objID is the object identifier, position and velocity are its position and velocity, edgeID is the edge identifier for the position of the object
Find AU where objID is included before update;
if AU.edgeID ≠ edgeID or (objID is not the boundary object of AU and (position < AU.lowerBound or position > AU.upperBound)) then
    // The object moves to a new edge or exceeds the bounds of its original AU
    Find the nearest AU AU1 for objID on edgeID;
    if GetNum(AU1.objSet) < MAXOBJNUM and ObjectFitAU(objID, position, velocity, AU1) then
        InsertObject(objID, AU1.auID, AU1.edgeID);
    else
        AU2 ← CreateAU(objID, edgeID);
    if GetNum(AU.objSet) > 1 then
        DeleteObject(objID, AU.auID, AU.edgeID);
    else
        DropAU(AU.edgeID, AU.auID);
else if (objID is the low boundary object of AU and position − AU.lowerBound ≥ ε) or (objID is the high boundary object of AU and AU.upperBound − position ≥ ε) then
    AdaptAUBounds(AU, position);
4.5 Indexing Past, Present, and Future Trajectories
Indexing the past, current, and future location information of moving objects poses two difficulties: storage and update. We need to support continuous updates while keeping queries efficient, and we also need to reduce the storage required for all the data. In this section, we introduce techniques for indexing past, present, and future trajectories using the ANR-tree.
4.5.1 Indexing Future Trajectory
Since the future movement of an adaptive unit is predicted by the simulation-based prediction method, the ANR-tree can be used to index the future trajectories of moving objects and to support efficient predictive queries. The predicted movement of an adaptive unit in the ANR-tree is given not by a single trajectory but by two trajectory bounds, which are obtained from the simulation under different assumptions about the traffic conditions. Since the objects in an AU have similar movements, we predict the movement of the AU as if it were a single moving object. In this part, we propose an algorithm for the predictive range query in the ANR-tree. It can also be extended to support the predictive kNN query and the continuous query. A predictive range query retrieves all objects moving in a road network whose locations are inside a specified region R during a future time interval [T1, T2]. Given a spatio-temporal window range (X1, Y1, X2, Y2, T1, T2), the query algorithm on the ANR-tree consists of the following steps:
(1) We first perform a spatial window range search (X1, Y1, X2, Y2) in the top-level R-tree to locate the edges (e.g., e1, e2, e3, ...) that intersect the spatial query range.
(2) For each selected edge ei, we transform the original 3D search (X1, Y1, X2, Y2, T1, T2) into a 2D search (S1, S2, T1, T2) (S1 ≤ S2, T1 ≤ T2), where S1 and S2 are relative distances from the start vertex along the edge ei. Figure 4.15(a) shows an example in which the query window range intersects only one edge e1. In the case of multiple intersecting edges, we can divide the query range into several sub-ranges by edge and apply the transformation method to each edge. The method is also applicable to the various ways in which the query and the edges may intersect. For reasons of space, we only illustrate the case in Fig. 4.15(a) and compute its relative distances S1 and S2; the computation can easily be extended to other cases. Suppose (Xstart, Ystart) and (Xend, Yend) are the coordinates of the start vertex and the end vertex of the edge e1. According to Thales' theorem on similar triangles, we obtain S1 and S2 as follows:
$r = \sqrt{(X_{start} - X_{end})^2 + (Y_{start} - Y_{end})^2}$    (4.3)
$S_1 = \dfrac{X_1 - X_{start}}{X_{end} - X_{start}} \, r$    (4.4)
$S_2 = \dfrac{Y_1 - Y_{start}}{Y_{end} - Y_{start}} \, r$    (4.5)
Fig. 4.15 Window range query in the ANR-tree
(3) We further find the adjacent edges of e1 from which objects could move into the window range during the future period [T1, T2]. To support future spatio-temporal range queries, the TPR-tree expands MBRs in every direction according to the maximum speed of the objects. Applied to a network, this results in a large candidate set that includes objects that cannot move into the query range because of network constraints. The possible movements of objects in a road network are limited. Therefore, we filter the candidate AUs on adjacent edges that may intersect the window range by expanding along the network according to the maximum speed allowed in the network, the adjacency table of edges, and the future query time. Figure 4.16 shows an example
of network expansion during query processing, where each arrow denotes the direction of an edge. Let Vmax be the maximum speed and T0 (T0 = 0 in our example) be the current time. The process first expands the network from the point at which edge e1 intersects the spatial window (e.g., the location of S1 in Fig. 4.16(a)) in the reverse direction of e1 and then continues to the adjacent edges obtained from the reverse adjacency table of e1 until an expanded distance of Vmax ∗ (T2 − T0) is reached. In this example, the traversed edges e2 and e3 are returned. The AUs on these edges (e.g., AU3 on e2, and AU4 and AU5 on e3 in Fig. 4.16(a)) are then checked to determine whether they may intersect the query range during [T1, T2]. In this way, we avoid false negatives for objects on other edges during query processing.
(4) The transformed query (S1, S2, T1, T2) is executed on each of the AUs in the direct access table of the corresponding edge e1. As illustrated in Fig. 4.15(b), an AU qualifies for the query only if the 2D window range intersects the area between the upper and lower trajectory bounds of the AU. Otherwise, when the query is below the lower bound (i.e., βl ∗ T1 + αl > S2) or above the upper bound (i.e., βu ∗ T2 + αu < S1) of the AU, the query cannot contain objects of this AU. The computations of the transformed queries on the adjacent edges e2 and e3 are shown together in Fig. 4.16(b). For the adjacent edge e2 with length l(e2), we revise the transformed query to (S1 + l(e2), S2 + l(e2), T1, T2) and filter the AUs that qualify for the query by linking e2 and e1, as shown in the t-d coordinate plane of Fig. 4.16(b). We use the same method to filter the AUs on the adjacent edge e3 by linking e3 and e1, which is also shown in the t-d coordinate plane of Fig. 4.16(b). It is reasonable to treat these AUs as candidates, since the objects in them may also move into the query range in the future.
In our example, the query returns AU2 on edge e1 and AU4 on the adjacent edge e3. Using the trajectory bounds of an AU, we can determine whether the transformed query intersects the AU and thus quickly filter out unnecessary AUs.
(5) Finally, we access the selected AUs in disk storage and return the objects satisfying the predictive query window.
Fig. 4.16 Network expanding in query processing
Future spatio-temporal queries in a road network are more difficult to compute when objects cross from one road segment edge to another, because the future movement of objects at a road intersection is complex, although its possibilities are limited by the network constraints. To make the prediction for an AU more efficient, the trajectory bounds of the AU are computed from the simulation not of all its objects but only of the object closest to the center of the AU. With this approach, query processing may in principle miss qualifying objects (false negatives), because the actual position of an object at query time may lie outside the bounds of the AU. However, this seldom happens, for the following three reasons. (1) An AU is constructed from a group of moving objects with similar moving patterns and is maintained by tightly bounding the enclosed objects for some time into the future. It is therefore reasonable to approximate the AU by its center object. (2) In the SP method, to refine the prediction accuracy, we simulate two trajectories based on different assumptions about the traffic conditions (e.g., laminar and congested traffic) and translate the regressed lines outward to contain all possible future positions of the object. (3) The adaptation of the trajectory bounds of the AU further improves the accuracy of the trajectory prediction over time. Therefore, approximating the simulation of an AU by its center object yields a large efficiency improvement with a very low probability of incorrect query results.
4.5.2 Indexing History Trajectories

Since the AUs can record the whole history of an object's movement, we extend the ANR-tree to support queries on the past, current, and future and build a hybrid index structure. As shown in Fig. 4.17, it consists of a 2D R-tree at the top level that indexes the spatial information of the road network, the Current-AU at the middle level, and the Past-AU at the bottom level. AUs are generated at the start point of an edge and inserted into Past-AU at the end of the edge. For updates, we reduce the cost by grouping objects and predicting their movements. For storage, we store only the moving function instead of the current location. We use Data Sketch to reduce the data size and ensure querying efficiency.

Fig. 4.17 The full-time indexing structure

The historical trajectory of a moving object is composed of segments. We use a storage model called Segment Container, denoted as <auSet, startTime, endTime, auIndex>. Here auSet is the list of AUs, the period between startTime and endTime is the time interval of the AUs in this Segment Container, and auIndex is an index for accessing an AU quickly. The Segment Container is shown in Fig. 4.18.

Fig. 4.18 Segments container

Past-AU can support queries for historical trajectories. A historical trajectory of a moving object involves many AUs, so two more structures need to be stored for Past-AU: the link information Link and the operation set opSet. Link records the segment that the objects come from, and opSet = <op, objId, time, auId>. Here op denotes the operation, either deletion or insertion; objId and auId denote the moving object and the AU involved in the deletion or insertion; and time records the time of the operation. Figure 4.19 shows a query example for historical trajectories. For a moving object o, we find the current AU au3. To find the past AUs of o, the link information Link is accessed and we get au2 and au1. If o was deleted from au2 and au1, the operation set opSet is used to find the information of o.
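The structures described in this section can be sketched as plain records. This is only an illustration under stated assumptions: the class names `OpRecord`, `PastAU`, and `SegmentContainer`, the representation of Link as a single predecessor identifier, and the function `trace_history` are all hypothetical, and the text does not specify how the structures are laid out on disk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class OpRecord:
    """One opSet entry <op, objId, time, auId>."""
    op: str        # "insert" or "delete"
    obj_id: str
    time: float
    au_id: str


@dataclass
class PastAU:
    au_id: str
    link: Optional[str] = None  # assumed: id of the AU on the segment the objects came from


@dataclass
class SegmentContainer:
    """<auSet, startTime, endTime, auIndex>; here auIndex is a dict and its values form auSet."""
    start_time: float
    end_time: float
    au_index: Dict[str, PastAU] = field(default_factory=dict)


def trace_history(container: SegmentContainer, op_set: List[OpRecord],
                  current_au_id: str, obj_id: str) -> List[Tuple[str, list]]:
    """Follow Link backwards from the current AU and collect the opSet records of obj_id."""
    history = []
    au_id: Optional[str] = current_au_id
    while au_id is not None:
        ops = [(r.op, r.time) for r in op_set
               if r.au_id == au_id and r.obj_id == obj_id]
        history.append((au_id, ops))
        au_id = container.au_index[au_id].link
    history.reverse()  # oldest AU first
    return history


container = SegmentContainer(0, 30, {
    "au1": PastAU("au1"),
    "au2": PastAU("au2", link="au1"),
    "au3": PastAU("au3", link="au2"),
})
op_set = [
    OpRecord("insert", "o", 0, "au1"), OpRecord("delete", "o", 10, "au1"),
    OpRecord("insert", "o", 10, "au2"), OpRecord("delete", "o", 25, "au2"),
    OpRecord("insert", "o", 25, "au3"), OpRecord("insert", "x", 3, "au1"),
]
```

Calling `trace_history(container, op_set, "au3", "o")` mirrors the query example above: it starts from au3, reaches au2 and au1 through the links, and uses the operation records to recover when o entered and left each AU.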
Fig. 4.19 Querying for historical trajectories

4.6 Update-Efficient Indexing Structures

Considerable effort has been devoted to reducing the need for index updates of moving objects. The existing approaches can be classified into three categories.

First, most studies focus on optimizing the updating of existing multi-dimensional index structures, especially through the adaptation and extension of the R-tree [12]. The top-down update of an R-tree is costly since it may need to search several paths to find the right data item because of MBR overlaps. In order to reduce the overhead, Kwon et al. [15] have developed the Lazy Update R-tree (LUR-tree), which is updated only when an object moves out of the corresponding MBR. By adding a secondary index on the R-tree, it can perform the update operation in a bottom-up way. Recently, by exploiting the change-tolerant property of the index structure, Cheng et al. [8] presented the CTR-tree to maximize the opportunity of applying lazy updates and reduce the number of updates that cross MBR boundaries. Lee et al. [16] extend the main idea of [15] and generalize the bottom-up update approach. However, these methods are not applicable to the case where consecutive changes of objects are large. Xiong and Aref [30] have presented the RUM-tree, which processes R-tree updates in a memo-based approach that eliminates the need to delete the old data item during an index update. Therefore, its update performance is stable with respect to the changes between consecutive updates. In the ANR-tree index structure, however, the R-tree remains fixed since it indexes the road network and only the adaptive units are updated.

The second type of methods is based on the dimension reduction technique [23] and a low-dimensional index [13, 31] (e.g., the B+-tree). The Bx-tree [13, 31] combines the linearization technique with a single B+-tree to efficiently update the index structure. It uses space-filling curves and a pre-defined time interval to partition the representation of the locations of moving objects. This makes the B+-tree capable of indexing the 2-dimensional spatial locations of moving objects. Therefore, the cost of individual updating of the index is reduced. However, the 2-dimensional locations of objects are linearized by a space-filling curve and the time is also partitioned by a pre-defined time interval.
Therefore, the Bx-tree imposes a discrete representation and may not maintain the precise values of location and time during the partitioning. For index structures in spatial networks, the 2-dimensional spatial locations of moving objects can be reduced to 1.5 dimensions [14] by the spatial network in which the objects move.

The techniques in the third category use a prediction method represented as a time-parameterized function to reduce the index updates [24, 25, 28]. They store the parameters of the function, e.g., the velocity and the starting position of an object, instead of the real positions. In this way, they update the index structure only when the parameters change (for example, when the speed or the direction of a moving object changes). The Time-Parameterized R-tree (TPR-tree) [25] and its variants (e.g., TPR*-tree) [24, 28] are examples of this type of index structure. They all use a linear prediction model, which expresses an object's position as a linear function of time. These methods also support predictive queries, which some index methods process with the dual transformation technique [2, 21]. However, it is hard for linear prediction to reflect the movement in many real-life applications, especially in traffic networks where vehicles change their velocities frequently. The frequent changes of an object's velocity lead to repeated updates of the index structure. Other prediction models use non-linear prediction to improve the precision in predicting the location of each object: Aggarwal and Agrawal [1] use quadratic predictive functions, and Tao et al. [26] use recursive motion functions for objects with unknown motion patterns. However, they ignore the correlation of adjacent objects and may not accurately reflect some complex and stochastic traffic movement scenarios. The ANR-tree also falls into this category and applies an accurate prediction method when updating the index structure by considering more transportation features.

The ANR-tree optimizes update performance in the following aspects: (1) An AU functions as a 1-dimensional MBR in the TPR-tree [25], while it minimizes expansion and overlaps by considering more movement features. (2) The AU captures the movement bounds of objects based on a simulation-based prediction method, which considers the network constraints and stochastic traffic behavior. (3) Since the movement of objects is reduced to one spatial dimension and is attached to the network, the update of the index scheme is restricted to the update of the AUs.
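The effect of storing a time-parameterized function instead of raw positions can be illustrated with a small sketch. It is not taken from any of the cited indexes: the function names and the tolerance-based trigger are assumptions made for illustration, with the stored linear function refreshed only when the observed position deviates from the prediction by more than a tolerance.

```python
def linear_position(x0, v, t0, t):
    """Position predicted by the stored function x(t) = x0 + v * (t - t0)."""
    return x0 + v * (t - t0)


def updates_needed(samples, tolerance):
    """Count index updates for samples given as (time, position, velocity) tuples.

    The first sample inserts the object. After that, the stored parameters are
    replaced only when the prediction error exceeds the tolerance.
    """
    t0, x0, v = samples[0]
    updates = 1
    for t, x, vel in samples[1:]:
        if abs(linear_position(x0, v, t0, t) - x) > tolerance:
            t0, x0, v = t, x, vel   # parameters changed: one index update
            updates += 1
    return updates


# A vehicle drives at speed 10 and then slows down to speed 2 after t = 3.
samples = [(0, 0, 10), (1, 10, 10), (2, 20, 10), (3, 30, 10),
           (4, 32, 2), (5, 34, 2), (6, 36, 2)]
```

For these seven samples, `updates_needed(samples, 1.0)` reports two updates (the insertion and the speed change), whereas reporting every position would require seven. Frequent velocity changes reduce this advantage, which is the weakness of linear prediction noted above.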
4.7 Summary

In this chapter, we discuss indexing for moving objects, especially how to handle updates. Three approaches are introduced for moving objects in Euclidean spaces: the TPR-tree, GMOI, and the FT-Quad-tree. We propose the ANR-tree to index moving objects in spatial networks. In addition, we analyze how to reduce the update cost of these and other index structures.
References

1. Aggarwal C, Agrawal D (2003) On Nearest Neighbor Indexing of Nonlinear Trajectories. In: Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2003), San Diego, California, USA, pp 252-259
2. Agarwal PK, Arge L, Erickson J (2000) Indexing Moving Points. In: Proceedings of the 19th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2000), Dallas, Texas, USA, pp 175-186
3. Almeida VT, Güting RH (2004) Indexing the Trajectories of Moving Objects in Networks. GeoInformatica 9(1):33-60
4. Beckmann N, Kriegel HP, Schneider R, Seeger B (1990) The R*-Tree: An Efficient and Robust Access Method for Points and Rectangles. In: Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD 1990), Atlantic City, New Jersey, USA, pp 322-331
5. Chen J, Meng X (2009) Update-efficient Indexing of Moving Objects in Road Networks. GeoInformatica 13(4):397-424
6. Chen J, Meng X, Guo Y, Grumbach S (2007) Indexing Future Trajectories of Moving Objects in a Constrained Network. Journal of Computer Science and Technology 22(2):245-251
7. Chen J, Meng X, Guo Y, Grumbach S, Wang H (2006) Modeling and Predicting Future Trajectories of Moving Objects in a Constrained Network. In: Proceedings of the 7th International Conference on Mobile Data Management (MDM 2006), Nara, Japan, pp 156
8. Cheng R, Xia Y, Prabhakar S, Shah R (2005) Change Tolerant Indexing for Constantly Evolving Data. In: Proceedings of the 21st International Conference on Data Engineering (ICDE 2005), Tokyo, Japan, pp 391-402
9. Ding R, Meng XF (2001) A Quadtree Based Dynamic Attribute Index Structure and Query Processing. In: Proceedings of the 3rd International Conference on Networking and Mobile Computing (ICCNMC 2001), Beijing, China, pp 446-451
10. Ding R, Meng XF, Bai Y (2003) Efficient Index Update for Moving Objects with Future Trajectories. In: Proceedings of the 8th International Conference on Database Systems for Advanced Applications (DASFAA 2003), Kyoto, Japan, pp 183-194
11. Frentzos E (2003) Indexing Objects Moving on Fixed Networks. In: Proceedings of the 8th International Symposium on Spatial and Temporal Databases (SSTD 2003), Santorini Island, Greece, pp 289-305
12. Guttman A (1984) A Dynamic Index Structure for Spatial Searching. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 1984), Boston, Massachusetts, USA, pp 47-57
13. Jensen CS, Lin D, Ooi BC (2004) Query and Update Efficient B+ Tree Based Indexing of Moving Objects. In: Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 2004), Toronto, Canada, pp 768-779
14. Kollios G, Gunopulos D, Tsotras VJ (1999) Effective Density Queries on Continuously Moving Objects. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE 1999), Atlanta, Georgia, USA, pp 71
15. Kwon D, Lee SL, Lee S (2002) Indexing the Current Positions of Moving Objects Using the Lazy Update R-tree. In: Proceedings of the 3rd International Conference on Mobile Data Management (MDM 2002), Singapore, pp 113-120
16. Lee ML, Hsu W, Jensen CS, Cui B, Teo KL (2003) Supporting Frequent Updates in R-Trees: A Bottom-Up Approach. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 2003), Berlin, Germany, pp 608-619
17. Meng XF, Ding ZM, Bai Y (2003) GRID-based Moving Object Indexing Methods (in Chinese). Journal of Computer Research and Development 40(8):280-285
18. Mohamed FM, Thanaa MG, Walid GA (2003) Spatio-temporal Access Methods. IEEE Data Engineering Bulletin 26:40-49
19. Nievergelt J, Hinterberger H (1984) The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Transactions on Database Systems 9(1):38-71
20. Nascimento MA, Silva JRO (1998) Towards Historical R-trees. In: ACM Symposium on Applied Computing (SAC 1998), Atlanta, Georgia, USA, pp 235-240
21. Patel JM, Chen Y, Chakka VP (2004) STRIPES: An Efficient Index for Predicted Trajectories. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2004), Paris, France, pp 637-646
22. Pfoser D, Jensen CS, Theodoridis Y (2000) Novel Approaches in Query Processing for Moving Object Trajectories. In: Proceedings of the 26th International Conference on Very Large Data Bases (VLDB 2000), Cairo, Egypt, pp 395-406
23. Pfoser D, Jensen CS (2003) Indexing of Network Constrained Moving Objects. In: Proceedings of the 11th ACM International Symposium on Advances in Geographic Information Systems (GIS 2003), New Orleans, Louisiana, USA, pp 25-32
24. Saltenis S, Jensen CS (2002) Indexing of Moving Objects for Location-Based Service. In: Proceedings of the 18th International Conference on Data Engineering (ICDE 2002), San Jose, California, USA, pp 463-472
25. Saltenis S, Jensen CS, Leutenegger ST, Lopez MA (2000) Indexing the Positions of Continuously Moving Objects. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2000), Dallas, Texas, USA, pp 331-342
26. Tao Y, Faloutsos C, Papadias D, Liu B (2004) Prediction and Indexing of Moving Objects with Unknown Motion Patterns. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2004), Paris, France, pp 611-622
27. Tao Y, Papadias D (2001) The MV3R-Tree: A Spatiotemporal Access Method for Timestamp and Interval Queries. In: Proceedings of the 27th International Conference on Very Large Data Bases (VLDB 2001), Roma, Italy, pp 431-440
28. Tao Y, Papadias D, Sun J (2003) The TPR*-Tree: An Optimized Spatiotemporal Access Method for Predictive Queries. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 2003), Berlin, Germany, pp 790-801
29. Tayeb J, Ulusoy O, Wolfson O (1998) A Quadtree Based Dynamic Attribute Indexing Method. The Computer Journal 41(3):185-200
30. Xiong X, Aref WG (2006) R-trees with Update Memos. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE 2006), Atlanta, Georgia, USA, pp 22
31. Yiu ML, Tao Y, Mamoulis N (2006) The Bdual-Tree: Indexing Moving Objects by Space-Filling Curves in the Dual Space. The International Journal on Very Large Data Bases 17(3):379-400
Part II
Moving Objects Management Techniques
This part describes the key techniques of moving objects management, in particular query processing, location prediction, and uncertainty management.

Unlike traditional data, moving object data contain two kinds of attributes: spatial and temporal. Therefore, to answer queries on moving object data, the spatial and temporal predicates must be specified. In Chapter 5, we classify the basic querying types for moving objects according to spatial predicates, temporal predicates and moving spaces. Then we introduce how to process NN queries and range queries in a spatial network, based on the Euclidean restriction and network expansion frameworks.

In Chapter 6, we introduce the advanced querying for moving objects, including similar trajectory queries and density queries. Besides the snapshot density queries, we introduce some efficient methods to continuously monitor dense regions for moving objects. The cluster-based pre-processing can efficiently support snapshot density queries in spatial networks. Based on the notion of safe interval and the Quad-tree index, we propose effective algorithms to evaluate and keep track of dense regions for continuous density queries for moving objects.

In the management of moving objects, the trajectory prediction method is usually used to improve the performance of the location update strategy and to support the predictive index and queries. In Chapter 7, we first review some linear prediction methods and analyze their problems in handling moving objects in spatial networks. We then present the simulation-based prediction methods, Fast-Slow Bounds Prediction and Time-Segment Prediction, which are more accurate than linear prediction methods in predicting future trajectories of moving objects in spatial networks.

In Chapter 8, we study uncertainty management for moving objects databases, covering uncertainty models, indexing, and querying algorithms. We propose an uncertainty model and an index framework, the UTR-Tree, for indexing the uncertain trajectories of network-constrained moving objects. Through a dynamic index maintenance technique associated with location updates, the UTR-Tree can handle uncertain trajectories, which include not only the historical locations of moving objects but also their current and near-future locations with uncertainty taken into account, so that queries over the whole life span of the moving objects can be efficiently supported.
Chapter 5
Moving Objects Basic Querying
Xiaofeng Meng1, Jidong Chen2
1 Renmin University of China, 2 EMC Research China

Abstract Once we build the model and index for moving objects, we can answer queries on them. There are many types of queries in moving objects databases, such as the nearest neighbor (NN) query, the range query, and the density query. In this chapter, we introduce the basic querying types for moving objects according to spatial predicates, temporal predicates, and moving spaces. Though there are many techniques to support moving objects queries, most of the existing studies consider Euclidean spaces, where the distance between two objects is determined solely by their relative position in space. However, in practice, objects can usually move only on a pre-defined set of trajectories as specified by the underlying network. Hence we introduce how to answer range queries and NN queries for moving objects in a spatial network, based on the work of Papadias et al. [11].

Key words: spatial-temporal query, nearest neighbor query, range query, spatial network, moving object databases
5.1 Introduction

Considerable research has been carried out on moving object databases, which has resulted in the development of numerous indexes and query processing techniques. Surprisingly, most of the existing studies consider Euclidean spaces, where the distance between two objects is determined solely by their relative position in space. However, in many applications that manage spatial data (e.g., location-based services), the position and accessibility of spatial objects are constrained by spatial networks such as roads, railways, and rivers. In such cases, the actual distance between two objects corresponds to the length of the shortest path connecting them in the network, i.e., the network distance. For instance, consider the spatial network of Fig. 5.1, where the rectangles correspond to hotels. If a user at location q poses the range query “find the hotels within a 15 km range”, the results will contain a, b, and c (the numbers in the figure correspond to network distance). Similarly, a nearest neighbor query will return hotel b. Note that the results of the corresponding conventional queries are different (e.g., the Euclidean nearest neighbor is d, which is actually the farthest hotel in the network). Furthermore, queries may combine both location and network aspects, such as “find the nearest hotel to the south” (e.g., hotel a).
Fig. 5.1 Road network query example
In this chapter, we introduce the basic querying types for moving objects according to spatial predicates, temporal predicates, and moving spaces. We then describe how to process range queries and NN queries for moving objects in a spatial network based on the Euclidean restriction and network expansion frameworks. The resulting algorithms expand conventional processing techniques by integrating connectivity and location information for efficient pruning of the search space [11].
5.2 Classifications of Moving Object Queries

A moving object has two kinds of attributes: spatial and temporal. Therefore, to answer queries on moving objects, the spatial and temporal predicates must be specified. The answers to these queries are the moving objects that satisfy the predicates. Hence there are many kinds of queries on moving object data, depending on the spatial and temporal predicates. We introduce them in this section.
5.2.1 Based on Spatial Predicates

The spatial predicate indicates a point or range. The queries for moving objects can be divided into four classes as follows:

1. Range Query: A range query finds the objects within some specific area, which corresponds to a rectangular window or a circular area around a query point. For example, “find all of the people who walked within one mile of the buildings at the time.” Range queries are the most basic queries and are widely used. Most moving objects indexing methods support range query processing.

2. Nearest Neighbor Query: A nearest neighbor (NN) query finds the object that is nearest to a query point. The most popular NN query is the kNN query, which finds the k nearest neighbors of a query point. Another kind of NN query is the reverse nearest neighbor (RNN) query, which finds the objects whose nearest neighbor is the query point. For example, consider some taxis and some passengers. A passenger wants to know which taxi is closest to him. A taxi driver wants to find the passengers who have this taxi as their nearest neighbor, since they are possible customers. So far, two kinds of approaches have been developed to process an NN query: index traversal and region pre-computation. Most research studies adopt the first approach and use the R-tree or Quad-tree to index the moving objects. A typical algorithm is the branch-and-bound algorithm proposed by Roussopoulos et al. in 1995 [12]. This approach traverses the R-tree in a depth-first manner to find the nearest neighbor of the query point. For region pre-computation, the Voronoi diagram is a typical method to find the result of an NN query [7].

3. Aggregate Nearest Neighbor Query: An aggregate nearest neighbor (ANN) query returns the object that minimizes an aggregate distance function with respect to a set of query points. Consider, for example, several users at specific locations (query points) who want to find the restaurant (data point) that leads to the minimum sum of distances that they have to travel in order to meet. ANN queries are a natural way to express requests by groups of mobile users who want to optimize their routes according to an aggregate function applied to the traveling distances. Apart from the meeting-restaurant example, other application instances include (1) establishing a meeting station for members of a new church based on its distances from their homes, and (2) selecting the location of a tourist office based on its distances to attractions in a city. Yiu et al. [17] solve the ANN queries for objects in spatial networks. They consider alternative aggregate functions and techniques that utilize Euclidean distance bounds, spatial access methods, and/or network distance materialization structures.

4. Density Query: A density query [5, 6, 9] finds dense areas with a high concentration of moving objects, where the density of moving objects is higher than a given threshold. Hadjieleftheriou et al. [5] first proposed the density query for moving objects. They define the density of a region as density(R, Δt) = min_Δt N / area(R), where min_Δt N is the minimum total number of objects in region R during time interval Δt and area(R) is the area of R. Based on the definition, they introduced two types of density queries: snapshot density queries (SDQ) and period density queries (PDQ). In the case of SDQ, users require information about the dense regions at a specific time, for example, “tell me the region where the total number of cars is more than 100 at 3 pm”. In the case of PDQ, users require information about the dense regions within a time period, for instance, “tell me the region where the total number of cars is always more than 100 in 10 minutes”. Jensen et al. [6] focus on how to find the dense regions at a specific time. Similar to the work of [5], they also assume there are a lot of moving objects in a Euclidean space and these objects move in a linear manner. The difference is that they can avoid answer loss. Both studies assume the objects to be moving freely and thereby define the density query in Euclidean space. However, efficient dynamic density queries in spatial networks are more crucial for many real-life applications.
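The density definition above can be made concrete with a short sketch. The function names and the sample counts are illustrative assumptions. The sketch assumes that the number of objects in region R has already been counted at each timestamp of the interval Δt, so that an SDQ corresponds to a single count and a PDQ to a list of counts.

```python
def region_density(counts, area):
    """density(R, dt) = (minimum object count in R during dt) / area(R)."""
    return min(counts) / area


def is_dense(counts, area, threshold):
    """A region is dense when its density is higher than the threshold."""
    return region_density(counts, area) > threshold


# Hypothetical counts of cars in a region of area 2.0 at three timestamps.
period_counts = [120, 105, 130]
snapshot_counts = [120]
```

Here `region_density(period_counts, 2.0)` is governed by the smallest count (105), so a region that is dense at one snapshot is not necessarily dense over the whole period.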
5.2.2 Based on Temporal Predicates

There are three kinds of temporal predicates in moving objects queries. Accordingly, the queries can be classified into three classes: historical, current, and future queries. Different indexes support the different queries. In the case of the range query, historical indexes such as the TB-tree [16] support range queries on historical data; current indexes such as the LUR-tree [8], which is based on the R-tree, support queries on current locations; and future indexes answer future range queries by predicting the locations of moving objects for a limited time period, so the precision of the query result is determined by the prediction model. For historical and current queries, the processing is relatively simple; for future queries, it is more complex because the future location needs to be predicted. There are two typical approaches for future queries: the space transformation approach in multi-dimensional space, such as STRIPES [15], and the expanding approach, such as the TPR-tree, TPR*-tree, and Bx-tree. The transformation approach divides space into non-overlapping parts and transforms the linear trajectory of an object moving in d-dimensional space into a point in a 2d-dimensional dual space. The expanding approach, which is more widely used than the transformation approach, takes two forms: query range expanding and MBR expanding.
5.2.3 Based on Moving Spaces

Moving objects queries can be divided into queries in Euclidean spaces and queries in spatial networks. Most of the existing studies focus on query processing in Euclidean spaces. For query processing in spatial networks, the distance metric is different from the Euclidean distance, so the methods used in Euclidean spaces cannot be applied directly. There are two main differences: “nearest” refers not to the nearest location in space but to the smallest network distance, and the distance between objects is determined not by the locations of the objects alone but by the connectivity of the network. There are three kinds of approaches for query processing in spatial networks: (1) combining tree traversal with route searching; (2) applying the multi-pass shortest path algorithm to the network distance computations that start from a single source to all destinations; (3) transforming the spatial network into a hyperspace and using the Euclidean measurement method. The main idea in these approaches is to filter out unnecessary objects using space partitioning methods in order to reduce shortest path computations, and then to refine the candidate set by network distance to get the final results. The disk-based network representation method [3] can support NN queries, range queries, and closest pair queries by combining spatial network connections and Euclidean location information. ANN queries in spatial networks can be processed by using Euclidean distance bounds, spatial access methods, and network distance materialization techniques [17]. In [18], the authors use graph theory and query result materialization to reduce the network expansion in the Dijkstra algorithm and improve the efficiency of processing RNN queries.
5.3 NN Queries

Given a source point q and an entity dataset S, a kNN query retrieves the k (≥1) objects of S closest to q according to the network distance (e.g., “find the hotel within the shortest driving distance”). This section presents two algorithms for nearest neighbor queries, based on the Euclidean restriction and network expansion frameworks. Euclidean restriction takes advantage of the Euclidean lower-bound property to prune the search space. On the other hand, the network expansion framework performs query processing directly on the network [11].
5.3.1 Incremental Euclidean Restriction

The Incremental Euclidean Restriction (IER) algorithm applies the multi-step kNN methodology [2, 14], traditionally used for high-dimensional similarity retrieval. Specifically, assuming that only one NN is required, IER first retrieves the Euclidean nearest neighbor pE1 of q, using an incremental kNN algorithm (e.g., [4]) on the entity R-tree of S. Then, the network distance dN(q, pE1) of pE1 is computed. Owing to the Euclidean lower-bound property, objects closer (to q) than pE1 in the network should be within Euclidean distance dEmax = dN(q, pE1) from q, i.e., they should lie in the shaded area of Fig. 5.2(a). In Fig. 5.2(b), the second Euclidean NN pE2 is then retrieved (within the dEmax range). Since dN(q, pE2) < dN(q, pE1), pE2 becomes the current NN and dEmax is updated to dN(q, pE2), after which the search region (for potential results) becomes smaller (the shaded area in Fig. 5.2(b)). Since the next Euclidean NN pE3 falls outside the search region, the algorithm terminates with pE2 as the final result.
Fig. 5.2 Finding the NN pE2
The extension to k nearest neighbors is straightforward. The k Euclidean NNs are first obtained using the entity R-tree and sorted in ascending order of their network distance to q, and dEmax is set to the network distance of the k-th point. Similar to the single NN case, the subsequent Euclidean neighbors are retrieved incrementally, while maintaining the k (network) NNs and dEmax (which now equals the network distance of the k-th neighbor), until the Euclidean distance of the next Euclidean NN is larger than dEmax. Algorithm 10 illustrates the pseudo-code of IER.

Algorithm 10: IER (q, k)
input : q is the query point, k is the number of query results
output: k nearest neighbors to q
{p1, ..., pk} = Euclidean_NN(q, k);
for each entity pi do
    dN(q, pi) = compute_ND(q, pi);
end
sort p1, ..., pk in ascending order of dN(q, pi);
dEmax = dN(q, pk);
while dE(q, p) ≤ dEmax do
    (p, dE(q, p)) = next_Euclidean_NN(q);
    if dN(q, p) < dN(q, pk) then
        insert p in {p1, ..., pk};
        dEmax = dN(q, pk);
    end
end
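A runnable sketch of the IER idea is given below. It is a simplification under several assumptions that are not part of Algorithm 10: entities and the query are placed on network nodes, the incremental R-tree search is replaced by sorting the entities by Euclidean distance, network distances are computed with a plain Dijkstra search, and the dataset holds at least k entities. The graph and coordinates are hypothetical, chosen so that every edge is at least as long as the straight line between its endpoints (the Euclidean lower-bound property).

```python
import heapq
import math


def network_distance(graph, source, target):
    """Shortest-path length (Dijkstra); graph maps node -> {neighbor: edge length}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, math.inf):
            continue  # stale queue entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf


def ier_knn(graph, coords, entities, q, k):
    """Return the k network-nearest entities of q as sorted (distance, entity) pairs."""
    # Stand-in for an incremental Euclidean NN search on the entity R-tree.
    ranked = sorted(entities, key=lambda p: math.dist(coords[q], coords[p]))
    best = sorted((network_distance(graph, q, p), p) for p in ranked[:k])
    d_emax = best[-1][0]
    for p in ranked[k:]:
        if math.dist(coords[q], coords[p]) > d_emax:
            break  # no remaining entity can be closer in the network
        d_n = network_distance(graph, q, p)
        if d_n < best[-1][0]:
            best[-1] = (d_n, p)   # replace the current k-th neighbor
            best.sort()
            d_emax = best[-1][0]
    return best


coords = {"q": (0, 0), "a": (1, 0), "b": (0, 2), "c": (3, 0)}
graph = {
    "q": {"b": 2.0, "c": 3.0},
    "a": {"b": 3.0, "c": 2.0},
    "b": {"q": 2.0, "a": 3.0},
    "c": {"q": 3.0, "a": 2.0},
}
```

In this toy network, a is the Euclidean nearest neighbor of q but has network distance 5, so `ier_knn(graph, coords, ["a", "b", "c"], "q", 1)` ends with b at network distance 2 and never computes the network distance of c.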
5.3.2 Incremental Network Expansion

IER (and the Euclidean restriction framework in general) is more effective if the ranking of the data points by their Euclidean distance is similar to that with respect to the network distance. Otherwise, a large number of Euclidean NNs may be inspected before the network NN is found. Figure 5.3 shows an example where the black points represent the nodes in the modeling graph and rectangles denote entities. The nearest entity to the query q (white point) is p5. The subscripts of the entities (p1, p2, ..., p5) are in ascending order of their Euclidean distance to q. Since p5 has the largest Euclidean distance, it will be examined after all other entities, i.e., p1 to p4 correspond to false hits, for which the network distance computations are redundant.
Fig. 5.3 Finding the NN p5
To remedy this problem, the Incremental Network Expansion (INE) algorithm performs network expansion (starting from q), and examines entities in the order they are encountered. Specifically, INE first locates the segment n1n2 that covers q, and retrieves all entities on n1n2. Since no point is covered by n1n2, the node (n1) closest to the query is expanded (while the second endpoint n2 of n1n2 is placed in a queue Q). No data point is found in n1n7 and n7 is inserted to Q = <(n2, 5), (n7, 12)>. The expansion of n2 reaches n4 and n3, after which Q = <(n4, 7), (n3, 9), (n7, 12)> and point p5 is discovered on n2n4 (while no point is found on n2n3). The distance dN(q, p5) = 6 provides a bound dNmax to restrict the search space. The algorithm terminates now since the next entry n4 in Q has a larger distance (i.e., 7) than dNmax. Algorithm 11 shows the pseudo-code of INE.
Algorithm 11: INE (q, k)
input : q is the query point, k is the number of query results
output: k nearest neighbors to q
ni nj = find_segment(q);
Scover = find_entities(ni nj);
{p1, ..., pk} = the k (network) nearest entities in Scover, sorted in ascending order of their network distance;
dNmax = dN(q, pk);
Q = <(ni, dN(q, ni)), (nj, dN(q, nj))>;
de-queue the node n in Q with the smallest dN(q, n);
while dN(q, n) < dNmax do
    for each non-visited adjacent node nx of n do
        Scover = find_entities(nx n);
        update {p1, ..., pk} from {p1, ..., pk} ∪ Scover;
        dNmax = dN(q, pk);
        en-queue (nx, dN(q, nx));
    end
    de-queue the next node n in Q;
end
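The expansion performed by INE can be illustrated with the following Python sketch. It is one possible reading of Algorithm 11 rather than the authors' code: the network is assumed to be an undirected weighted graph, entities are assumed to be stored per edge together with their offset from the first endpoint of the edge key, and the function name `ine` and its parameters are hypothetical.

```python
import heapq
import math


def ine(adj, entities_on, q_edge, q_off, k):
    """k network nearest entities of a query lying on edge q_edge.

    adj: node -> {neighbor: weight} (undirected).
    entities_on: (u, v) -> [(entity, offset from u)].
    q_off: offset of the query from the first node of q_edge.
    Returns sorted (distance, entity) pairs.
    """
    found = {}  # entity -> smallest network distance seen so far

    def offer(entity, d):
        if d < found.get(entity, math.inf):
            found[entity] = d

    def kth():
        # Current bound dNmax: distance of the k-th best entity, if any.
        ds = sorted(found.values())
        return ds[k - 1] if len(ds) >= k else math.inf

    def entities_from(n, nx):
        # Entities on edge (n, nx) with their offset measured from n.
        for e, off in entities_on.get((n, nx), []):
            yield e, off
        for e, off in entities_on.get((nx, n), []):
            yield e, adj[n][nx] - off

    u, v = q_edge
    for e, off in entities_from(u, v):
        offer(e, abs(off - q_off))  # entities on the segment covering q
    dist = {u: q_off, v: adj[u][v] - q_off}
    heap = [(dist[u], u), (dist[v], v)]
    heapq.heapify(heap)
    while heap:
        d, n = heapq.heappop(heap)
        if d >= kth():
            break  # no unexpanded node can lead to a closer entity
        if d > dist.get(n, math.inf):
            continue  # stale queue entry
        for nx, w in adj[n].items():
            for e, off in entities_from(n, nx):
                offer(e, d + off)
            if d + w < dist.get(nx, math.inf):
                dist[nx] = d + w
                heapq.heappush(heap, (d + w, nx))
    return sorted((d, e) for e, d in found.items())[:k]
```

With a network whose distances match the walkthrough above (q at distance 3 from n1 and 5 from n2, p5 at distance 1 from n2 on n2 n4), the search stops as soon as n4 is de-queued with distance 7, since p5 has already been found at distance 6.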
5.4 Range Queries
Given a source point q, a value e, and a spatial dataset S, a range query retrieves all objects of S that are within network distance e from q. This section applies the Euclidean restriction and network expansion paradigms for processing such queries [11].
5.4.1 Range Euclidean Restriction The Range Euclidean Restriction (RER) method first performs a range query at the entity dataset and returns the set of objects S within (Euclidean) distance e from q. Assuming the Euclidean lower bound property, S is guaranteed to avoid false misses (i.e., dN (q, p) ≤ e ⇒ dE (q, p) ≤ e), but it may contain a large number of false hits. In order to reduce the number of network distance computations, RER performs network expansion only once, examining all segments within network distance e from q. Points of S that fall on some segment are removed from S and returned to the user. The process terminates when all the segments in the range are exhausted, or when S becomes empty. Algorithm 12 illustrates the pseudo-code of the algorithm. S contains the results of the Euclidean range query sorted on some dimension. When a new segment is encountered, the sorted list is used to efficiently check if any point falls inside its MBR (filter step). Such points are then compared with the poly-line representation of the segment to determine whether they belong to the actual result (refinement step). Part of some segments at the boundary may
exceed the query threshold e, but these segments must be considered nonetheless since they may contain data points that satisfy the query.
Algorithm 12: RER (q, e)
input : q is the query point, e is the network distance threshold
output: objects of S that are within network distance e from q
result = ∅;
S = Euclidean-range(q, e);
ni nj = find_segment(q);
Q = <(ni, dN(q, ni)), (nj, dN(q, nj))>;
de-queue the node n in Q with the smallest dN(q, n);
while dN(q, n) ≤ e and S ≠ ∅ do
    for each non-visited adjacent node nx of n do
        for each point s of S do
            if check_entity(nx n, s) then
                result = result ∪ {s};
                S = S − {s};
            end
        end
        en-queue (nx, dN(q, nx));
    end
    de-queue the next node n in Q;
end
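The two phases of RER (a Euclidean range filter followed by a single bounded network expansion) can be sketched in Python as follows. The sketch simplifies Algorithm 12 by assuming that the query and the entities are located on graph nodes and that edge weights are never smaller than the Euclidean length of the edge; the function name `rer` and the data layout are hypothetical.

```python
import heapq
import math


def rer(graph, coords, entities, q, e):
    """Entities within network distance e of q, as {entity: network distance}.

    graph: node -> [(neighbor, weight)]; coords: node -> (x, y).
    """
    # Filter step: the Euclidean range query yields the candidate set S.
    s = {p for p in entities if math.dist(coords[q], coords[p]) <= e}
    result = {}
    dist = {q: 0.0}
    heap = [(0.0, q)]
    # Refinement step: one expansion, stopped at range e or when S is empty.
    while heap and s:
        d, n = heapq.heappop(heap)
        if d > e:
            break
        if d > dist[n]:
            continue  # stale queue entry
        if n in s:
            s.remove(n)
            result[n] = d
        for nx, w in graph[n]:
            nd = d + w
            if nd <= e and nd < dist.get(nx, math.inf):
                dist[nx] = nd
                heapq.heappush(heap, (nd, nx))
    return result
```

The early exit on an empty candidate set corresponds to the second termination condition described above: once every Euclidean candidate has been confirmed, no further segments need to be examined.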
5.4.2 Range Network Expansion
The Range Network Expansion (RNE) algorithm first computes the set QS of qualifying segments within network range e from q and then retrieves the data entities falling on these segments. The methodology is similar to INE, but now numerous queries, one for each qualifying segment, are performed simultaneously. To illustrate RNE, assume that QS contains the segments shown in Fig. 5.4(a). Starting from the root of the object R-tree, RNE visits nodes that intersect the MBR of at least one segment in QS. Fig. 5.4(b) illustrates the visited nodes and the qualifying objects in gray. In order to avoid joining the entire QS (which may be large) with every entry, we perform the following optimization. QS is divided into (possibly overlapping) sets QSi, one for each entry Ei in the current R-tree node. A segment is assigned to all entries that intersect its MBR. When the children of Ei are visited, they are only compared against QSi. Thus, as RNE descends the tree, the number of comparisons performed for each entry is reduced. In Fig. 5.4, the set of qualifying segments QS1 = ∅, while for E2, QS2 consists of all segments except n1 n4 and n5 n8. Similarly, QS5 = {nq n2, n2 n5, n2 n6} and QS6 = {nq n1, n2 n6, n4 n7}. When the node of E5 (E6) is visited, its points will only be checked against the segments of QS5 (QS6).
Fig. 5.4 An example of RNE
An object can be reported more than once if it lies at the intersections of the segments in QS. Such duplicates are easy to remove, by sorting the results at each leaf node before reporting them. RNE is I/O optimal (since it only accesses R-tree nodes that overlap some qualifying segment, and therefore, may contain results). The pseudo-code of RNE is presented in Algorithm 13. The initial parameters of the algorithm are (root of R-tree S, QS, ∅). To reduce the number of intersection tests, at lines 2 and 7 (the computation of the sets QSi and the join at the leaf level) we apply a plane sweep algorithm [1].
Algorithm 13: RNE (node_id, QS, result)
input : id of a node; segments within network range e from q in entry of node_id; result set
if node_id is an intermediate node then
    compute QSi for each entry Ei in node_id;
    for each entry Ei in node_id do
        if QSi ≠ ∅ then
            RNE(Ei.node_id, QSi, result);
        end
    end
else
    result_node_id = plane-sweep(node_id.entries, QS);
    sort result_node_id to remove duplicates;
    result = result ∪ result_node_id;
end
An alternative is to use the methodology suggested in [10]. In particular, the MBR of all segments in QS is applied as a range query to the object R-tree. When a leaf node is reached, its contents are joined with QS, using plane-sweep. This technique performs a simple intersection test at each visited R-tree node; however, if the network range is large and irregular, it may visit numerous tree nodes that do not overlap any qualifying segment (e.g., E1 in Fig. 5.4). Finally, if QS does not fit in memory, the join is performed in a block-nested loops fashion, i.e., RNE is repeatedly applied for subsets of QS that fit in memory and the partial results are materialized. Another approach is to compute all qualifying segments, materialize
them, and join them with the object R-tree using one of the spatial join algorithms that are applicable in the presence of a single tree [13].
5.5 Summary In this chapter, we introduce the query types for moving objects, such as the NN, range and density query. We also discuss how to process range and NN queries in a spatial network, based on the Euclidean restriction and network expansion frameworks, covering the most common processing tasks. This provides an introduction to several interesting and practical directions for moving objects querying.
References
1. Arge L, Procopiuc O, Ramaswamy S, Suel T, Vitter JS (1998) Scalable Sweeping-Based Spatial Join. In: Proceedings of the 24th International Conference on Very Large Data Bases (VLDB 1998), New York City, USA, pp 570-581
2. Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast Subsequence Matching in Time-Series Databases. In: Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data (SIGMOD 1994), Minneapolis, Minnesota, USA, pp 419-429
3. Güting R, Böhlen M, Erwig M, Jensen C, Lorentzos N, Schneider M, Vazirgiannis M (2000) A Foundation for Representing and Querying Moving Objects. ACM Transactions on Database Systems 25(1):1-24
4. Hjaltason G, Samet H (1999) Distance Browsing in Spatial Databases. ACM Transactions on Database Systems 24(2):265-318
5. Hadjieleftheriou M, Kollios G, Gunopulos D, Tsotras VJ (2003) On-Line Discovery of Dense Areas in Spatio-Temporal Databases. In: Proceedings of the 8th International Symposium on Advances in Spatial and Temporal Databases (SSTD 2003), Santorini Island, Greece, pp 306-324
6. Jensen CS, Lin D, Ooi BC, Zhang R (2006) Effective Density Queries on Continuously Moving Objects. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE 2006), Atlanta, Georgia, USA, pp 71
7. Kolahdouzan M, Shahabi C (2004) Voronoi-Based K Nearest Neighbor Search for Spatial Network Databases. In: Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 2004), Toronto, Canada, pp 840-851
8. Kwon D, Lee SL, Lee S (2002) Indexing the Current Positions of Moving Objects Using the Lazy Update R-tree. In: Proceedings of the 3rd International Conference on Mobile Data Management (MDM 2002), Singapore, pp 113-120
9. Ni J, Ravishankar CV (2007) Pointwise-Dense Region Queries in Spatio-Temporal Databases. In: Proceedings of the 23rd International Conference on Data Engineering (ICDE 2007), Istanbul, Turkey, pp 1066-1075
10. Papadopoulos A, Rigaux P, Scholl MA (1999) Performance Evaluation of Spatial Join Processing Strategies. In: Proceedings of the 6th International Symposium on Advances in Spatial Databases (SSD 1999), Hong Kong, China, pp 286-307
11. Papadias D, Zhang J, Mamoulis N, Tao Y (2003) Query Processing in Spatial Network Databases. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 2003), Berlin, Germany, pp 790-801
12. Roussopoulos N, Kelley S, Vincent F (1995) Nearest Neighbor Queries. In: Proceedings of the 1995 ACM SIGMOD International Conference on Management of Data (SIGMOD 1995), San Jose, California, USA, pp 71-79
13. Rigaux P, Scholl M, Voisard A (2002) Spatial Databases: with Application to GIS. Morgan Kaufmann Publishers Inc
14. Seidl T, Kriegel H (1998) Optimal Multi-Step K-Nearest Neighbor Search. In: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data (SIGMOD 1998), Seattle, Washington, USA, pp 154-165
15. Patel JM, Chen Y, Chakka VP (2004) STRIPES: An Efficient Index for Predicted Trajectories. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD 2004), Paris, France, pp 637-646
16. Pfoser D, Jensen CS, Theodoridis Y (2000) Novel Approaches in Query Processing for Moving Object Trajectories. In: Proceedings of the 26th International Conference on Very Large Data Bases (VLDB 2000), Cairo, Egypt, pp 395-406
17. Yiu ML, Mamoulis N, Papadias D (2005) Aggregate Nearest Neighbor Queries in Road Networks. IEEE Transactions on Knowledge and Data Engineering 17(6):820-833
18. Yiu ML, Papadias D, Mamoulis N, Tao Y (2006) Reverse Nearest Neighbors in Large Graphs. IEEE Transactions on Knowledge and Data Engineering 18(4):540-553
Chapter 6
Moving Objects Advanced Querying
Jidong Chen¹, Xiaofeng Meng²
¹ EMC Research China, ² Renmin University of China
[email protected],
[email protected]
Abstract So far, we have introduced basic querying for moving objects. Some advanced queries remain, and they are more difficult to process. In this chapter, we introduce a few of them, in particular similar trajectory queries and density queries for moving objects. The goal of similar trajectory queries is to find the moving patterns in the trajectories of moving objects, while density queries efficiently find dense areas with a high concentration of moving objects. We discuss how to process both snapshot and continuous density queries.
Key words: spatial-temporal query, density query, similar trajectory query, spatial network, moving object databases
6.1 Introduction
Recently, many location sensors such as GPS have been developed, and we can obtain the trajectories of users and moving objects using these sensors. Trajectory data are widely used in location-aware systems, transportation navigation systems, and other location-based information systems. These applications store many trajectories, which may contain useful individual patterns of each user. For example, by analyzing the trajectories of users who work in a building, we can find passages, rooms, stairs, and other facilities that are used frequently. The result of the analysis can be used for the management and maintenance of the building. In the case of a navigation system, a driver can check the route to a city by referring to the trajectories of other users who have driven to the city earlier. In another case, we can study movement characteristics to improve performance in
a sport by analyzing the motion data measured by sensors attached to the bodies of top sport players. Thus, similar trajectory queries are introduced to find the moving patterns embedded in the trajectories. Distance-based queries such as range queries or NN queries, which are defined using the distance between the trajectory of a moving object and an indicated point in space, are useful in location management of moving objects. However, these queries do not have enough power to analyze the pattern of the objects' motion. As mentioned above, because we are interested in extracting the individual moving patterns of each object from the trajectories, it is necessary to develop more powerful tools to analyze the trajectories. Similar sequence matching has been studied for many years [5, 23, 24, 25, 26], but the traditional techniques for data sequences, including distance functions and indexes, cannot be used for the trajectories of moving objects. In this chapter, we present a data model for trajectories of mobile data, and a similar trajectory query based on the distance between two trajectories, obtained by extending the similarity used in time series database systems.
Density queries are another important type of query for moving objects. The objective is to efficiently find dense areas with a high concentration of moving objects. Density queries can be used in traffic management systems to identify and predict congested areas or traffic jams. For example, the transportation bureau may monitor the dense regions periodically in order to identify traffic jams. An instance of a density query is shown in Fig. 6.1. The lines depict the road network, points indicate moving objects, and the dense regions are marked in different colors.
Fig. 6.1 Density query
Existing studies on density queries [19, 20] assume the objects to be moving in a free style and define the density query in the Euclidean space. In this setting,
it is difficult to efficiently answer general density-based queries. The focus has hence turned to simplified queries [19] or specialized density queries without answer loss [20]. These methods use a grid to partition the data space into disjoint cells and report dense regions of fixed size. However, the real dense areas may be larger or smaller than the fixed-size rectangle and appear in different shapes. Simplifying the density query to return areas of fixed size and shape cannot reflect the natural congested areas in real-life applications. We focus on density queries in the road network setting, where a dense area consists of road segments containing a large number of moving objects and may take any size and shape. The real congested areas can therefore be obtained by finding the dense segments. In this chapter, we introduce a cluster-based method for monitoring the snapshot of dense areas of moving objects in a road network. Then, we discuss how to continuously monitor dense regions for moving objects. Based on the notion of safe interval, we propose effective algorithms to evaluate and keep track of dense regions.
6.2 Similar Trajectory Queries for Moving Objects
Moving object trajectories can be considered as two-dimensional (X-Y plane) or three-dimensional (X-Y-Z space) time series data. In terms of similarity-based queries, we are concerned with the movement shape of the trajectories; the sequences of sampled vectors are important in measuring the similarity between two trajectories, whereas the time component is less important and can be ignored. This separates similarity-based retrieval from queries in spatio-temporal databases, where the time components of trajectories are important to answer time slice or time interval queries [4]. Considerable research has been conducted on similarity-based retrieval on one-dimensional time series data, such as stock or commodity prices, sales volume, weather data, and biomedical measurements. However, the distance functions and indexing methods proposed for one-dimensional time series data cannot be directly applied to moving object trajectories due to their unique characteristics.
• Trajectories are usually two or three dimensional data sequences and a trajectory data set often contains trajectories with different lengths. Most of the earlier proposals on similarity-based time series data retrieval focused on one-dimensional time series data [2, 5, 6, 7, 8].
• Trajectories usually have many outliers. Unlike stock, weather, or commodity price data, trajectories of moving objects are captured by recording the positions of the objects from time to time (or tracing moving objects from frame to frame in videos). Thus, due to sensor failures, disturbance signals, or errors in detection techniques, many outliers may appear. Longest Common Subsequences (LCSS) has been applied to address this problem [9]; however, it does not consider the various gaps between similar subsequences, which leads to inaccuracy. A gap refers to a sub-trajectory between two identified similar components of two trajectories.
• Similar movement patterns may appear in different regions of trajectories. Different sampling rates of tracking and recording devices combined with different speeds of the moving objects may introduce local shifts into trajectories (i.e., the trajectories follow similar paths, but certain sub-paths are shifted in time). Even though similarity measures, such as dynamic time warping (DTW) [10, 11, 12],
and edit distance with real penalty (ERP) [13], can be used to measure the similarity between trajectories with local shifts, they are sensitive to noise. In order to manage trajectories in database systems, we define a data model of trajectories as directed lines in a space, and the similarity between trajectories is defined as the Euclidean distance between directed discrete lines. Our proposed similarity queries can be used to find useful patterns embedded into the trajectories, for example, the trajectories of mobile cars in a city may include patterns for possible traffic jams.
6.2.1 Problem Definition
It is difficult to define the similarity between lines in a space. However, we find some useful clues through the study of time series databases [1, 2, 3]. Time series database systems can store time series data such as temperature, economic indicators, population, and wave signals, in addition to supporting queries for extracting patterns from the time series data. Most time series database systems adopt the Euclidean distance between two time data sequences [2] for analysis. Since a trajectory is a type of time series data, time series databases can deal with trajectories efficiently. However, a trajectory has not only a time series feature but also a spatial feature. For example, it is difficult for a time series database to find data for geographic and spatial queries. In order to define the similarity between trajectories, it is necessary first to define the trajectory. Hence, we define the data model for the trajectory of moving objects. A real-world trajectory is a directed continuous line with a start and an end point (Fig. 6.2(a)). Given a two-dimensional space $R^2$ and a closed time interval $I_\lambda = [t, t']$ with $t < t'$, a trajectory $\lambda$ is defined as follows.
Fig. 6.2 Trajectory of moving objects
Definition 6.1. A trajectory is the image of a continuous mapping: $\lambda : I_\lambda \to R^2$.
This definition is a temporal extension of the definition of a simple line described in [14]. Next, we denote the length of trajectories in $R^2$ as $L_S$ and the interval of trajectories in temporal space as $L_T$.
Definition 6.2. The length of trajectory $\lambda$ during a period $[t_0, t_1]$ is denoted as $L_S(\lambda, [t_0, t_1])$ and calculated as follows:

$$L_S(\lambda, [t_0, t_1]) = \int_{t_0}^{t_1} \sqrt{(dx/dt)^2 + (dy/dt)^2}\; dt, \quad \text{where } \lambda(t) = (x, y) \tag{6.1}$$

The length of the whole trajectory is denoted as $L_S(\lambda)$ $(= L_S(\lambda, [t, t']))$.
Definition 6.3. Given that $\mathbf{x} = (x, y)$ is a vector in space $R^2$, the temporal interval of trajectory $\lambda$ between $\mathbf{x}_i$ and $\mathbf{x}_j$ on $\lambda$ is defined as follows:

$$L_T(\lambda, [\mathbf{x}_i, \mathbf{x}_j]) = |t_j - t_i|, \quad \text{where } \lambda(t_i) = \mathbf{x}_i,\ \lambda(t_j) = \mathbf{x}_j, \text{ and } t_i, t_j \in I_\lambda = [t, t'] \tag{6.2}$$

$$L_T(\lambda) = |t' - t| \tag{6.3}$$
However, a positioning device such as GPS does not continuously measure the coordinates of a moving object, but samples such data. The measured data are thus a sequence of coordinates of positions, as shown in Fig. 6.2(b). Hence, we define the discrete trajectory $\dot{\lambda}$ as a discrete function. Each vector $\mathbf{x}_i$ represents the position of a moving object at a time in $T_{\dot{\lambda}} = \{t_0, t_1, \ldots, t_m\}$.
Definition 6.4. A discrete trajectory is the image of a discrete mapping: $\dot{\lambda} : T_{\dot{\lambda}} \to R^2$.
A discrete trajectory can be represented as a vector sequence $\langle \mathbf{x}_{t_1}, \ldots, \mathbf{x}_{t_m} \rangle$ as well. If $T_{\dot{\lambda}} = \{1, 2, \ldots, m\}$, we denote the discrete trajectory $\dot{\lambda}$ as just a simple vector sequence $\langle \mathbf{x}_1, \ldots, \mathbf{x}_m \rangle$. Additionally, where $\dot{\lambda}(t_i) = \mathbf{x}_i$, we introduce several notations: $T_{\dot{\lambda}}(i) = t_i$, $X_{\dot{\lambda}}(i) = \mathbf{x}_i$, and $|\dot{\lambda}|$ is the number of vectors included in $\dot{\lambda}$ ($|\dot{\lambda}| = |T_{\dot{\lambda}}|$). Next, we define the distance between two vectors $\mathbf{x} = (x, y)$ and $\mathbf{x}' = (x', y')$ in $R^2$.
Definition 6.5. The distance of vectors $\mathbf{x}, \mathbf{x}'$ is defined as:

$$D(\mathbf{x}, \mathbf{x}') = \sqrt{(x - x')^2 + (y - y')^2} \tag{6.4}$$
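For a discrete trajectory, the quantities above are easy to compute. The following Python sketch is an illustration only: it evaluates the vector distance of Definition 6.5, approximates the spatial length of Definition 6.2 by the length of the piecewise linear path through the sampled vectors, and computes the temporal interval of Definition 6.3 from the first and last timestamps. The function names are hypothetical.

```python
import math


def vector_distance(x, x_prime):
    """Distance D(x, x') between two vectors of R^2 (Definition 6.5)."""
    return math.hypot(x[0] - x_prime[0], x[1] - x_prime[1])


def spatial_length(points):
    """Length of the piecewise linear path through the sampled vectors.

    This approximates L_S for the underlying continuous trajectory.
    """
    return sum(vector_distance(a, b) for a, b in zip(points, points[1:]))


def temporal_interval(times):
    """Temporal interval L_T between the first and last sample times."""
    return abs(times[-1] - times[0])
```

For a path sampled at (0, 0), (3, 4), and (3, 10), the two pieces have lengths 5 and 6, so the approximated spatial length is 11.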
6.2.2 Trajectory Similarity In time series databases, the similarity between two sets of time series data is typically measured by the Euclidean distance [1, 2], which can be calculated efficiently. However, there have been few discussions on the similarity between two lines in space because the previous approaches for spatial queries have focused on the “distance” between a point and a line [15, 16, 18]. The aim of the previous approaches is mainly to find objects that pass a point near the indicated point, such as a car passing through a street. On the other hand, we are concerned with the “shape” of the trajectory. In order to calculate shape-based similarities among trajectories, it is necessary to define a new similarity for the trajectories, as shown in Fig. 6.3(b). In general, the similarity query is represented as a kNN query [15, 16]. There are two types of existing approaches: one is based on spatial similarities, and the other
Fig. 6.3 Distance between trajectories
Fig. 6.4 Existing kNN approaches
is based on similarity between two time series data. An example of the existing spatial kNN query is illustrated in Fig. 6.4(a). In this case, the answer is L1, L2 when k is 2. On the other hand, the similarity between two time series data is defined as the Euclidean distance between two time series, where the length of each is n. The distance is defined as the Euclidean distance between two n-dimensional vector data [2], as shown in Fig. 6.4(b). While this distance of the time series data is based on shape, the distance is defined only in the case of R1 × T (T = [0, ∞]), but not in the case of Rn × T, shown in Fig. 6.3(b). Since the trajectory has both spatial and temporal features, we consider three types of similarity queries for trajectories as follows:
• Spatio-Temporal Similarity: Based on a spatio-temporal feature in R2 × T.
• Spatial Similarity: Based on a spatial only feature in R2 without temporal features.
• Temporal Similarity: Based on a temporal only feature in R1 × T without spatial features.
As mentioned above, the trajectory has a time series data feature. We define the similarity between two trajectories in the same manner as for the similarity defined in the time series query [2]. For the time series database, the similarity of the two time series data, where each has n values, is given by the Euclidean distance between vectors in Rn. In [1] and [2], when there are two time series data,
$c = \langle w_1, w_2, \ldots, w_n \rangle$ and $c' = \langle w'_1, w'_2, \ldots, w'_n \rangle$, the distance $D(c, c')$ is defined as follows:

$$D(c, c') = \sqrt{(w_1 - w'_1)^2 + \cdots + (w_n - w'_n)^2} \tag{6.5}$$

This definition can be extended to the case where each element is a vector $\mathbf{x}$ in space $R^2$. When the time series vectors are $X = \langle \mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n \rangle$ and $X' = \langle \mathbf{x}'_1, \mathbf{x}'_2, \ldots, \mathbf{x}'_n \rangle$, we define the distance between two time series vectors $D(X, X')$ by extending the definition of $D(c, c')$, as follows:

$$D(X, X') = \sqrt{D(\mathbf{x}_1, \mathbf{x}'_1)^2 + \cdots + D(\mathbf{x}_n, \mathbf{x}'_n)^2} \tag{6.6}$$

The defined distance $D(X, X')$ can be used only in the case where the vectors $\mathbf{x} \in X$ are measured at the same interval, that is, $\Delta t = t_{i+1} - t_i$ $(i = 1, \ldots, n - 1)$, where $t_i$ is the time at which $\mathbf{x}_i$ is measured. However, the vectors in a trajectory are not always measured at the same interval $\Delta t$ because positioning devices often lose data. Therefore, to calculate the similarity using our definition, we define a temporal normalized discrete trajectory $\dot{\lambda}_{\Delta t}$ for trajectory $\lambda$, as follows:
Definition 6.6. Given a trajectory $\lambda$ defined for time interval $[t_S, t_E]$, and a natural number $m$, the temporal normalized discrete trajectory $\dot{\lambda}_{\Delta t}$ is defined as follows:

$$\dot{\lambda}_{\Delta t} = \langle \lambda(t_S), \lambda(t_S + \Delta t), \ldots, \lambda(t_S + m\Delta t) \rangle, \quad \text{where } t_S + m\Delta t = t_E \tag{6.7}$$

Intuitively, this discrete trajectory $\dot{\lambda}_{\Delta t}$ is the trajectory re-sampled from $\lambda$ at a fixed interval $\Delta t$. In other words, $\dot{\lambda}_{\Delta t}$ is generated by dividing $\lambda$ into equal intervals $\Delta t$. For a discrete trajectory $\dot{\lambda}$, we can use the piecewise linear approximation $\tilde{\lambda}$ instead of $\lambda$.
Definition 6.7. Given two trajectories $\lambda$ and $\lambda'$ with the same temporal length (i.e., $L_T(\lambda) = L_T(\lambda')$) and a natural number $m$, the spatio-temporal distance (similarity) $D_{TS}(\lambda, \lambda')$ between $\lambda$ and $\lambda'$ is defined as follows:

$$D_{TS}(\lambda, \lambda') = \sqrt{\frac{1}{m+1} \sum_{i=0}^{m} D\big(X_{\dot{\lambda}_{\Delta t}}(i), X_{\dot{\lambda}'_{\Delta t}}(i)\big)^2}, \quad \text{where } \Delta t = \frac{L_T(\lambda)}{m} = \frac{L_T(\lambda')}{m} \tag{6.8}$$

Note that $D_{TS}(\dot{\lambda}, \dot{\lambda}')$ can be defined as $D_{TS}(\tilde{\lambda}, \tilde{\lambda}')$. In this definition, the similarity is the Euclidean distance between trajectories represented as $m + 1$ dimensional vectors, and the interval of each trajectory is normalized. Using this definition, it is possible to find trajectories whose shape is more similar to the query trajectory than those found using previous methods.
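Temporal normalization and the spatio-temporal distance can be sketched in Python as follows. The sketch assumes that each trajectory is given as a pair (times, points) with strictly increasing times, that positions between samples follow the piecewise linear approximation, and that the distance is the root of the mean squared vector distance over the m + 1 re-sampled positions, in line with the description of the similarity as a Euclidean distance. The function names are hypothetical.

```python
import math
from bisect import bisect_right


def position_at(times, points, t):
    """Position at time t on the piecewise linear approximation."""
    if t <= times[0]:
        return points[0]
    if t >= times[-1]:
        return points[-1]
    j = bisect_right(times, t)  # times[j - 1] <= t < times[j]
    t0, t1 = times[j - 1], times[j]
    a = (t - t0) / (t1 - t0)
    (x0, y0), (x1, y1) = points[j - 1], points[j]
    return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))


def normalize(times, points, m):
    """Temporal normalized discrete trajectory with m + 1 samples."""
    t_s, t_e = times[0], times[-1]
    dt = (t_e - t_s) / m
    return [position_at(times, points, t_s + i * dt) for i in range(m + 1)]


def d_ts(traj_a, traj_b, m):
    """Spatio-temporal distance between two (times, points) trajectories."""
    (ta, _), (tb, _) = traj_a, traj_b
    if not math.isclose(ta[-1] - ta[0], tb[-1] - tb[0]):
        raise ValueError("trajectories must have the same temporal length")
    a = normalize(*traj_a, m)
    b = normalize(*traj_b, m)
    total = sum(math.dist(p, r) ** 2 for p, r in zip(a, b))
    return math.sqrt(total / (m + 1))
```

Two trajectories that run in parallel at a constant offset of 3 units, sampled at different rates, obtain a distance of 3 after normalization, regardless of how many samples each originally contained.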
6.2.3 Query Processing Based on these definitions, we consider the shape-based similarity query for trajectories. Here, Λ˙ is the set of discrete trajectories stored in the database, and each
λ̇i (λ̇i ∈ Λ̇) is a discrete trajectory, such as λ̇i = <x1, x2, ..., xm>. The query trajectory λ̇q is given as λ̇q = <x1, x2, ..., xn>. The shape-based range query can then be defined using Λ̇, λ̇q, and the previously defined distance between two time series vectors, as follows:
Definition 6.8. The process for calculating the shape-based range query Qrange(θ, λ̇q, Λ̇) is given in Algorithm 14.
The range query is defined as a subsequence match of trajectories, as shown in Fig. 6.5.
Fig. 6.5 Similarity query for trajectories
Algorithm 14: Qrange(θ: integer, Λ̇, λ̇q): Λ̇a
Input: Λ̇, λ̇q, θ (θ is a natural number)
Output: Λ̇a, {λ̇a1, ..., λ̇ak} ∈ Λ̇a
begin
    l = |λ̇q|; Λ̇a = φ
    foreach λ̇i in Λ̇ do
        for j = 1 to |λ̇i| − l + 1 do
            λ̇ij = subsequence(λ̇i, j, l)
            // This function returns a subsequence of the original sequence λ̇i, such as <xj, xj+1, ..., xj+l−1>, each x ∈ λ̇i
            if D(λ̇q, λ̇ij) < θ then
                Add λ̇ij to Λ̇a
            end
        end
    end
    return Λ̇a
end
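Algorithm 14 translates almost directly into Python. The sketch below is illustrative: it assumes that all trajectories have already been normalized to the same sampling interval, stores the database as a dictionary from a trajectory id to a list of (x, y) vectors, and reports each match with its start index. Indexes are zero-based here, whereas the pseudo-code counts from 1. The names `sequence_distance` and `q_range` are hypothetical.

```python
import math


def sequence_distance(xs, ys):
    """Distance D(X, X') between two equally long vector sequences."""
    return math.sqrt(sum(math.dist(x, y) ** 2 for x, y in zip(xs, ys)))


def q_range(theta, query, database):
    """Shape-based range query by exhaustive subsequence matching.

    Returns (trajectory id, start index, subsequence) for every
    subsequence whose distance to the query is below theta.
    """
    length = len(query)
    answers = []
    for tid, trajectory in database.items():
        # Slide a window of the query length over the stored trajectory.
        for j in range(len(trajectory) - length + 1):
            window = trajectory[j:j + length]
            if sequence_distance(query, window) < theta:
                answers.append((tid, j, window))
    return answers
```

The scan compares the query against every window of every trajectory, so its cost grows with the total number of stored vectors multiplied by the query length.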
In addition, the nearest neighbor query can be defined using the distance between trajectories. In our definition, the temporal features are not indicated in the query; however, we consider that the temporal features can be indicated independently from the range query. For example, a query “Qrange (θ , λ˙ q , Λ˙ )∧11 : 00 < Tλ˙ ai (1) < 12 : 00”
involves retrieving subsequences λ˙ ai where the distance between λ˙ q and λ˙ ai is less than θ . Moreover, the first vector in λ˙ ai is measured within the interval [11:00, 12:00].
6.3 Density Queries for Moving Objects in Spatial Networks
The issue of density queries for moving objects was first proposed in [19]. The objective is to find regions in space and time with a density higher than a given threshold. In [19], the authors find that general density-based queries are difficult to answer efficiently and hence turn to simplified queries. Specifically, they partition the data space into disjoint cells, and the simplified density query reports cells, instead of arbitrary regions, that satisfy the query conditions. This scheme may result in answer loss. To solve this problem, Jensen et al. [20] define an effective density query that guarantees no answer loss. Both studies assume that objects move freely and define the density query in Euclidean space. However, efficient dynamic density querying in spatial networks is more crucial for many applications. Consider a real-world example: for queries about the distribution of vehicles in a road network, users would like to know the real-time traffic density distribution. Clearly, in this case the Euclidean density query methods are inapplicable, since the path between two cars is restricted by the underlying road network. Additionally, these existing query methods cannot reflect the natural dense area in a road network since they simplify the density query to return areas of fixed size and shape. Grid-based algorithms also ignore the network constraint and produce inaccurate query results. It is natural to represent the dense area in a road network as road segments containing a large number of moving objects. Considering the features of road networks, we will introduce a cluster-based density querying algorithm.
6.3.1 Problem Definition
As the result of a density query in the road network is a set of dense segments, we first introduce the concepts of density and dense segment.
Definition 6.9. The density of a road segment s is represented as density(s) = N/len(s), where N is the number of objects on s and len(s) is the length of s.
Definition 6.10. The road segment s is a dense segment (DS) if and only if density(s) ≥ ρ, where ρ is a user-defined density parameter.
A straightforward method to process the query is to traverse all objects moving on the road network and compute the dense regions from the number of objects on each segment, the length of the segment, and a given density threshold. Figure 6.6 shows a density query in a road network. Obviously, the cost is very high and it is difficult to obtain effective results. Specifically, the following three issues are likely to be encountered in the query results.
1. Different DS may overlap, such as Case 1 in Fig. 6.6.
2. The distribution of moving objects may be very skewed in some DS, i.e., objects are dense in one part of a DS but sparse in another part, such as Case 2 in Fig. 6.6.
3. Some DS may contain very few objects, such as Case 3 in Fig. 6.6.
Fig. 6.6 An example of density query
Such query results are less useful. Thus, we define an effective density query in a road network to find the useful dense regions with a high concentration of objects and a symmetrical distribution of objects as well as no overlaps.
Definition 6.11. Given density parameter ρ, the effective road-network density query (e-RNDQ) aims to find all dense segments that satisfy the following conditions:
1. No two dense segment sets intersect (namely, there are no overlaps).
2. In each dense segment set, the distance between any two neighboring objects is not more than a given distance threshold δ.
3. The length of a dense segment is not less than a given length threshold L.
4. Any dense segment containing moving objects is in the query result set.
The first condition ensures that the result is not redundant. It avoids Case 1 in Fig. 6.6. The second condition guarantees that objects are symmetrically distributed in a dense segment set. The third condition ensures that the result contains no small segments with only a few objects. The fourth condition ensures that query results do not suffer from answer loss.
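To make the conditions concrete, the following Python sketch applies them to the objects of a single road segment. It is an illustration of Definitions 6.9 to 6.11 under stated assumptions rather than the query algorithm itself: object positions are given as offsets along the segment, a candidate region is a maximal run of objects whose neighboring gaps do not exceed δ, and the density of a run is taken as its object count divided by its length. The function name `dense_runs` is hypothetical.

```python
def dense_runs(positions, rho, delta, min_len):
    """Disjoint dense stretches of one segment as (head, tail, count).

    positions: offsets of the objects along the segment.
    rho: density threshold; delta: maximum gap between neighbors;
    min_len: minimum length L of a reported stretch.
    """
    pts = sorted(positions)
    runs = []
    start = 0
    for i in range(1, len(pts) + 1):
        # Close the current run at the end or when a gap exceeds delta.
        if i == len(pts) or pts[i] - pts[i - 1] > delta:
            head, tail = pts[start], pts[i - 1]
            length = tail - head
            count = i - start
            # Keep runs that are long enough and dense enough.
            if length > 0 and length >= min_len and count / length >= rho:
                runs.append((head, tail, count))
            start = i
    return runs
```

Because the runs are maximal and separated by gaps larger than δ, the reported stretches cannot overlap, which mirrors the first condition; the gap test and the length test correspond to the second and third conditions.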
6.3.2 Cluster-Based Query Preprocessing
To reduce the cost of clustering maintenance, we introduce the definition of Cluster Unit (CU). A cluster unit is a group of moving objects that are close to each other now and in the near future. The cluster unit is incrementally maintained according to the moving objects within it. Specifically, we constrain the objects in a cluster unit to move in the same direction and on the same segment. To keep the objects in a cluster unit dense enough, the network distance between each pair of neighboring objects in a cluster unit should not exceed a system threshold ε. As mentioned earlier, we assume that objects move in a piecewise linear manner and that the next segment to move along is known in advance. Formally, a cluster unit is defined as follows:
Definition 6.12. A Cluster Unit is represented by (O, n_a, n_b, head, tail, ObjNum), where O is a list of objects {o_1, o_2, ..., o_i, ..., o_n} and o_i = (oid_i, n_a, n_b, pos_i, speed_i, next_node_i). Here pos_i is the location relative to n_a, speed_i is the moving speed, and (n_b, next_node_i) is the next segment to move along. Without loss of generality, assuming pos_1 ≤ pos_2 ≤ ... ≤ pos_n, the objects must satisfy |pos_{i+1} − pos_i| ≤ ε (1 ≤ i ≤ n − 1). Since all objects are on the same segment (n_a, n_b), the position of the CU is determined by an interval (head, tail) in terms of the network distance from n_a. Thus, the length of the CU is |tail − head|. ObjNum is the number of objects in the CU.
Initially, based on the definition, a set of CUs is created by traversing all segments in the network and their associated objects. The CUs are incrementally maintained after their creation. As time elapses, the distance between adjacent objects in a CU may exceed ε, in which case the CU needs to be split. A CU may also merge with its adjacent CUs when they are within a distance of ε. Hence, for each CU, we predict the time when it may split or merge. The predicted split and merge events are then inserted into an event queue. Subsequently, when the first event in the queue takes place, we process it and update the affected CUs. This process is repeated continuously. The key challenges are: (1) how to predict the split/merge time of a CU, and (2) how to process a split/merge event of a CU.
The split of a CU may occur in two cases. The first is when the CU arrives at the end of the segment (i.e., an intersection node of the road network). When the moving objects in a CU reach an intersection node, the CU has to be split, since the objects may head in different directions. The split time is the time when the first object in the CU arrives at the node. In the second case, the split of a CU occurs when the distance between some neighboring objects moving on the segment exceeds ε.
However, it is not easy to predict the split time, since the neighborhood of objects changes over time. Therefore, the main task is to dynamically maintain the order of objects on the segment. We compute the earliest time instant t_m at which two adjacent objects in the CU meet. We then compare the maximum distance between each pair of adjacent objects with ε up to t_m. If this distance exceeds ε at some time, the process stops and the earliest time at which ε is exceeded is recorded as the split time of the CU. Otherwise, we update the order of objects starting from t_m and repeat the same process until some distance exceeds ε or one of the objects arrives at the end of the segment. When the velocity of an object changes on the segment, we need to re-predict the split and merge times of the CU.
To reduce the processing cost of splitting at the end of a segment, we propose the group split scheme. When the first object leaves the segment, we split the original CU into several new CUs according to the objects' directions (which are implied by next_node). On the one hand, we compute a to-be-expired time (i.e., the time until the departure from the segment) for each object in the original CU and retain the CU until the last object leaves the segment. On the other hand, we attach a to-be-valid time (with the same value as the to-be-expired time) to each object in the new CUs. Only valid objects are considered while constructing CUs.
The merge of CUs may occur when adjacent CUs on a segment move close together (i.e., their network distance is ≤ ε). To predict the initial merge time of CUs, we dynamically maintain the boundary objects of each CU and their validity time (the period during which they are treated as the boundary of the CU), and compare the minimum distance between the boundary objects of two CUs with the threshold ε during their validity time. The boundary objects of CUs can be obtained by maintaining the order of objects while computing the split time.
The processing of the merge event is similar to the split event on the segment. We get the merge event and time from the event queue to merge the CUs into one CU and compute the split time and merge time of the merged CU. Finally, the corresponding affected CUs in the event queue are updated. Besides the split and merge of CUs, new objects may come into the network or existing objects may leave. For a new object, we locate all CUs of the same segment that the object enters and check whether the new object can join any CU according to the CU definition. If the object can join some CU, its split and merge events are updated. If no such CUs are found, a new CU for the object is created and the merge event is computed. For a leaving object, we update the split and merge events of its original CU if necessary.
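Two building blocks of this section can be sketched in Python under simplifying assumptions: the initial grouping of objects on one segment into CUs, and the time at which the separation of a single pair of objects reaches ε under linear motion. The function names are hypothetical, the grouping ignores the requirement that objects share a direction, and the pair computation looks at two objects in isolation instead of maintaining the full object order as described above.

```python
import math


def build_cluster_units(positions, eps):
    """Group object offsets on one segment into (head, tail, obj_num) tuples.

    A new cluster unit starts whenever the gap to the previous object
    exceeds eps. Direction of travel is ignored in this sketch.
    """
    groups = []
    for p in sorted(positions):
        if groups and p - groups[-1][-1] <= eps:
            groups[-1].append(p)
        else:
            groups.append([p])
    return [(g[0], g[-1], len(g)) for g in groups]


def pair_split_time(p1, v1, p2, v2, eps):
    """Time at which |separation| of two objects reaches eps.

    Assumes 0 <= p2 - p1 <= eps at time 0 and constant speeds.
    If the rear object is faster, the pair first meets and then
    drifts apart in the opposite order.
    """
    d0 = p2 - p1
    dv = v2 - v1
    if dv > 0:
        return (eps - d0) / dv
    if dv < 0:
        return (eps + d0) / -dv
    return math.inf  # equal speeds: the gap never changes
```

A full split predictor would take the minimum of such times over the adjacent pairs, re-sorting the objects at each meeting time, and would also consider the arrival of the first object at the end of the segment.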
6.3.3 Density Query Processing
Based on the dynamic CUs, density queries at any time point can be processed efficiently to return dense areas in the road network. The dense segment defined in Section 6.3.1 is represented as (CU, n_a, n_b, start_pos, end_pos, len, N), where CU is the set of cluster units on segment (n_a, n_b), start_pos is the start position of the DS, end_pos is the end position of the DS, len is the length of the DS, and N is the number of objects. To obtain the effective dense areas required by the e-RNDQ, we introduce the parameter δ to the DS.
Definition 6.13. A DS is a δ-DS if and only if the distance between any adjacent CUs is not more than δ (this guarantees that the distance between any two adjacent objects satisfies Distance(o_i, o_{i+1}) ≤ δ) and its density is not less than ρ. (For convenience, we abbreviate the term δ-DS to DS in the rest of this chapter.)
In fact, δ is a user-defined parameter of the density query and ε is a system parameter used to maintain the CUs. Since the distance between adjacent objects in a CU is not more than ε, in order to retrieve dense areas based on CUs, we require ε ≤ max{δ, 1/ρ}. In the road network, a dense area is represented as a dense segment set, which may contain several DSs on different segments. Therefore, we leverage network nodes to optimize the combination of these DSs.
Definition 6.14. In each DS, n_a is a δ-ClusterNode (δ-CN) of the DS if and only if |start_pos − n_a| ≤ δ; n_b is a δ-CN of the DS if and only if |end_pos − n_b| ≤ δ.
Definition 6.15. A Dense Segment Set (DSS) consists of different DSs where the distance between adjacent DSs is not more than δ, the total length of the DSs in the DSS is not less than L, and the density of the DSS is not less than ρ.
A DSS may contain DSs located on different segments, in which case the DSs are joined by δ-CNs. The DSSs constitute the road-network density query results. Suppose the density query parameters are given as (ρ, δ, L, t_q), where t_q is the query time.
For query processing based on CUs, our algorithm includes two steps: 1. The filtering step: Merge CUs into DSs by checking the parameters ρ and δ , which can prune some unnecessary segments. In this step, we can obtain a series of dense segments, specifically, a list of DSs and δ -CNs.
2. The refinement step: Merge the adjacent DSs around δ -CNs to construct the DSS by checking the parameters ρ , δ , L and finally find out the effective density query result consisting of dense segment sets.
Algorithm 15: Filter(ρ, δ, tq)
Input: density threshold ρ, query time tq
begin
  foreach e(nx, ny) of edgeList do
    if e.cuList ≠ null then
      create a new DS: ds
      cu ← getFirstCU(e)
      ds.addCU(cu); ds.start_pos = cu.pos
      if ds.start_pos < δ then
        ds.putCN(nx); δ-CN[nx].putDS(ds)
      end
      while getNextCU(e) ≠ null do
        nextcu ← getNextCU(e)
        if Dd(ds, nextcu) > δ or Dens(ds, nextcu) < ρ then
          ds.end_pos = cu.pos + cu.len; e.addDS(ds)
          create a new DS: ds
          ds.start_pos = nextcu.pos
        end
        ds.addCU(nextcu); cu = nextcu
      end
      ds.end_pos = cu.pos + cu.len
      if 1 − ds.end_pos < δ then
        ds.putCN(ny); δ-CN[ny].putDS(ds)
      end
      e.addDS(ds)
    end
  end
end
We explain the two steps of density query processing in detail. First, according to the network expansion approach [21], we traverse each segment to retrieve CUs sequentially, and then compute the distance between adjacent CUs and their density. If the distance is not more than δ and the density is not less than ρ , the CUs are merged to form a DS. Figure 6.7 shows an example. Given ρ =1.5 and δ =2, we compute DS at query time tq . The road segment s1 (represented as < J1 , J2 >) includes two CUs named cu1 and cu2 . Assume that the distance between cu1 and cu2 is 1.2 at tq , which is less than δ , and the density is 1.8 after merging cu1 with cu2 , which is more than ρ , and therefore cu1 and cu2 can construct a DS (we call it DS1 ). The start position of DS1 is the head of cu1 and the end position of DS1 is the tail of cu2 . The number of objects in DS1 is the sum of the number of objects in cu1 and in cu2 . Assume that the distance between DS1 and node J2 is 1.0, which is less than δ , and J2 is the δ -CN of DS1 (we call it δ -CN1 ). We insert DS1 into the DS list of δ -CN1 . In this way, we can obtain DS2 on s3 including cu4 and DS3 on s4 including cu3 . The δ -CN of DS2 (δ -CN2 ) is J4 and that of DS3 is J2 . Thus the DS list of δ -CN1 includes DS1 and DS3 , while the DS list of δ -CN2 includes DS2 . Algorithm 15 shows the pseudo-code.
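The merging rule of the filtering step can be sketched for a single segment as follows. This is an illustrative Python sketch rather than the authors' code: the `CU` and `DS` classes and the function `filter_segment` are hypothetical names, the bookkeeping of δ-CNs is omitted, and the input values in the usage note are invented so that they reproduce the gap of 1.2 and the merged density of 1.8 from the example above.

```python
import math
from dataclasses import dataclass


@dataclass
class CU:
    head: float  # start offset from n_a
    tail: float  # end offset from n_a
    n: int       # number of objects


@dataclass
class DS:
    start: float
    end: float
    n: int


def filter_segment(cus, rho, delta):
    """Merge the CUs of one segment into dense segments (DSs).

    A CU joins the current DS if the gap to it is at most delta and
    the density after merging is at least rho; otherwise the current
    DS is closed and a new one is started, as in Algorithm 15.
    """
    result = []
    cur = None
    for cu in sorted(cus, key=lambda c: c.head):
        if cur is not None:
            gap = cu.head - cur.end
            merged_n = cur.n + cu.n
            merged_len = cu.tail - cur.start
            dens = math.inf if merged_len == 0 else merged_n / merged_len
            if gap <= delta and dens >= rho:
                cur = DS(cur.start, cu.tail, merged_n)
                continue
            result.append(cur)
        cur = DS(cu.head, cu.tail, cu.n)
    if cur is not None:
        result.append(cur)
    return result
```

With ρ = 1.5 and δ = 2, the units `CU(0.0, 2.0, 5)` and `CU(3.2, 5.0, 4)` merge into one DS with nine objects over a length of 5, while a third unit starting at offset 9 opens a new DS because its gap of 4 exceeds δ.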
Fig. 6.7 An example to construct DS and DSS
Algorithm 16: Refinement(ρ, δ, L, tq)
Input: density threshold ρ, length threshold of DSS L
Output: Result: The set of DSSs
begin
  foreach δ-CNi of δ-CNList do
    if (δ-CNi.dsList ≠ null) and (not δ-CNi.accessed) then
      /* Q is a priority queue to store all DSs around δ-CNi */
      /* δ-Q is a priority queue to store all unaccessed δ-CNs */
      Q ← null; δ-Q.put(δ-CNi)
      while δ-Q ≠ null do
        cn = δ-Q.pop(); cn.accessed = true
        Q.addDSs(cn); /* add all DSs around cn and sorted */
        create a new DSS: dss
        ds = Q.pop(); dss.addDS(ds)
        δ-Q.putdscn(ds); /* add all unaccessed δ-CN around ds */
        while Q ≠ null do
          nextDS = Q.pop()
          if Dist(dss, nextDS) ≤ δ and Dens(dss, nextDS) ≥ ρ then
            dss.addDS(nextDS)
            δ-Q.putdscn(nextDS)
          end
        end
      end
      if dss.len > L then Result.insert(dss)
    end
  end
  return Result
end
In the refinement step, we compute dense segment sets so that the effective dense areas can be obtained. We traverse the list of each δ -CN and evaluate whether those DSs around the δ -CN can construct DSS based on Definition 6.15. For example in Fig. 6.7, L=100. As the Distance(DS1 , δ -CN1 )=1.0 and Distance(DS3 ,δ -CN1 )=0.7, the distance between DS1 and DS3 is 1.7, which is less than δ . In addition, if DS1 is merged with DS3 , the density is more than ρ . Therefore, DS1 and DS3 can be merged to form a DSS named DSS1 . In the same way, we check if there are other dense segments that can be merged with DSS1 by utilizing its δ -CN and insert it into DSS1 . Finally, we check if the total length of DSS1 is more than L. If so, DSS1 is one of the answers of the density query. This process is repeated until all δ -CNs containing dense segments are accessed. Then, we can obtain all dense areas that are represented as dense segment sets at tq . Note that a DS may be involved in the lists of two δ -CNs. To avoid scanning the same nodes repeatedly, we mark the scanned δ -CN as accessed node. Algorithm 16 shows the pseudo-code of the refinement step.
6.4 Continuous Density Queries for Moving Objects
Although many studies have been done on density queries for moving objects, they all focus on how to answer snapshot density queries, where the results are found based on a snapshot of the location dataset. In this section, we focus on continuously monitoring dense regions for moving objects in a highly dynamic environment, where the dense regions may change with the location updates of the moving objects. The continuous density query is an important research problem, but it has received attention only recently [20]. We provide a definition of continuous density queries for moving objects, which returns useful answers and is amenable to efficient computation. Furthermore, we propose the notion of a safe interval for dense/sparse regions to support efficient processing of continuous density queries.
6.4.1 Problem Definition
We assume that a collection of objects is moving in the space under consideration, where each object is capable of transmitting its location and velocity to the central server. The central server can predict the object positions based on the location and velocity information, and continuously answer density queries. When an object changes its velocity, it reports the new velocity to the central server.
Definition 6.16. A continuous density query returns all the regions that satisfy the following three conditions:
1. The density of the region is not less than ρ;
2. The minimum area of interest is s, and any subarea of the region with an area larger than s must be dense;
3. No two regions in the result set overlap with each other.
Conditions 1 and 2 indicate that each dense region must have at least ρ · s objects. Condition 3 is provided to simplify the search for dense regions, as in the previous study. We use the TPR-tree to index the moving objects [22]. In the TPR-tree, the position of a moving object is represented by a vector consisting of the reference position and the velocity, (p(t_ref), v). We can predict the future location at time t using the following formula:
$$p(t) = p(t_{ref}) + v \cdot (t - t_{ref}) \tag{6.9}$$
In order to find local dense regions, we recursively partition the space with a Quad-tree. The Quad-tree is used to store the state (i.e., dense or sparse) of a subspace, as well as the period for which this state is valid, which we call the safe interval of the subspace. Thus, a node in the Quad-tree is represented as ((row, col), level, state, safe interval), where (row, col) is an index that identifies the node, and level denotes the level of the tree that the node belongs to. If the node is a leaf, the state can be 0 or 1, which indicates that the region represented by the node is sparse or dense, respectively. For a non-leaf node, the state can be 0, 1, or 2, where 0 indicates that all its child nodes are sparse, 1 indicates that all its child nodes are dense, and 2 covers all other cases. The safe interval is the valid time of the state, which is formally defined as follows.
Definition 6.17. The safe interval is the time period for which the region remains in its current state.
For example, if the region is dense, it will remain dense for at least a time period of safe interval. After that, the state of the region may or may not change. Next, we proceed to discuss how to build a Quad-tree and compute the safe intervals, followed by how to answer continuous density queries using the Quad-tree.
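The linear prediction of Equation 6.9 and the node layout described above can be written down in a few lines of Python. This is only an illustrative sketch; the names `QuadNode` and `predict` are hypothetical and do not come from the TPR-tree or from the authors' system.

```python
from dataclasses import dataclass


@dataclass
class QuadNode:
    row: int
    col: int
    level: int
    state: int            # leaf: 0 sparse, 1 dense; non-leaf may also be 2 (mixed)
    safe_interval: float  # period for which `state` is guaranteed to hold


def predict(p_ref, v, t_ref, t):
    """Equation 6.9: p(t) = p(t_ref) + v * (t - t_ref), per dimension."""
    return tuple(p + vi * (t - t_ref) for p, vi in zip(p_ref, v))
```

For instance, an object reported at position (1.0, 2.0) with velocity (0.5, -1.0) at time 10 is predicted to be at (3.0, -2.0) at time 14.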
6.4.2 Building the Quad-Tree
To facilitate searching dense regions, we partition the space into a grid by employing a Quad-tree. More specifically, the space is recursively divided into four quadrants until the area of the subspace is less than the threshold s given in the density query definition. We set s as the stop condition since it is the minimum area we should consider for a dense region according to the definition. Given a space with an area of S, the depth of the Quad-tree is:
$$L = \log_4 \frac{S}{s} + 1 \tag{6.10}$$
In the Quad-tree, each node corresponds to a cell in the grid. Recall that a node is represented by ((row, col), level, state, safe interval). The cell can be easily determined by some of these parameters. More specifically, the left-bottom point of the cell is given by:
$$\frac{\sqrt{S}}{2^{level}} \times [row - 1,\ col - 1] \tag{6.11}$$
The right-upper point of the cell is given by:
$$\frac{\sqrt{S}}{2^{level}} \times [row,\ col] \tag{6.12}$$
Figure 6.8 shows an example of the Quad-tree. Given S=32, s=2, and ρ =1.5, based on Equation 6.10, the level number of the Quad-tree is 3. The root of the Quad-tree corresponds to the largest cell c1 . Its level number is 0, the row value is 1, and the col value is also 1. Each internal node is one quadrant of the root, including c2 , c3 , c4 , and c5 . The leaf nodes correspond to the minimum cells (called leaf cells hereafter), such as c6 , c7 , c8 , and c9 .
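The numbers in this example can be checked with a short Python sketch of Equations 6.10 to 6.12. The function names are hypothetical, and the sketch assumes that S/s is a power of 4, since the text does not state how the depth is rounded otherwise.

```python
import math


def quadtree_depth(S, s):
    """Equation 6.10: number of levels of the Quad-tree."""
    return math.log(S / s, 4) + 1


def cell_bounds(S, row, col, level):
    """Equations 6.11 and 6.12: left-bottom and right-upper corner of a cell.

    The side length of a cell at the given level is sqrt(S) / 2**level;
    the corners are returned in the same [row, col] order as in the text.
    """
    side = math.sqrt(S) / 2 ** level
    return (side * (row - 1), side * (col - 1)), (side * row, side * col)
```

With S = 32 and s = 2, `quadtree_depth` evaluates to 3, and a leaf cell at level 2 has side length √32 / 4 ≈ 1.414, i.e., an area of 2, which equals s.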
Fig. 6.8 An example of the Quad-tree
Based on the Quad-tree, we initially count the number of moving objects in each leaf cell and determine whether the cell is dense or sparse. By definition, a high-level cell is dense if and only if all the leaf cells below it are dense. For example, in Fig. 6.8, if c6 through c9 are dense while some other leaf cell is sparse, then c2 is returned as a dense region but c1 is not.
6.4.3 Safe Interval Computation
The safe interval of a dense (sparse) cell is the minimum time period for which the cell remains dense (sparse). Due to the movement of objects, a dense cell may turn into a sparse one, and vice versa. Thus, to support continuous density queries, we maintain the safe intervals for leaf cells of both types, but maintain the safe intervals for high-level cells only if they are dense (i.e., only for dense regions). In the following, we discuss how to compute the safe intervals for dense and sparse leaf cells. The safe interval of a dense high-level cell can be set recursively as the smallest safe interval of its child nodes.
6.4.3.1 Safe Interval of Dense Leaf Cell
For a dense leaf cell, to simplify the computation, we only focus on the objects leaving it, without considering the entering objects. This is because an entering object will not change the state of a dense cell; it can only change the state of a sparse cell, that is, make the sparse cell dense. Thus, we compute the shortest time interval for which the cell remains dense. Figure 6.9 shows an example, where cell C is dense. There are five objects in C in total, i.e., o1, o2, o3, o4, and o5. Let the object number threshold for a dense cell be 3. We compute the time at which each object will leave the cell to obtain the safe interval of the dense cell. Suppose that object oi leaves at time ti and that the leaving times, sorted in ascending order, are t5, t3, t1, t4, and t2. Then t1 is the safe interval of the dense cell, since fewer than three objects remain once o1 leaves and the cell may therefore become sparse.
Fig. 6.9 An example of dense region
Algorithm 17 formally describes how to compute the safe interval for a dense leaf cell, where (xmin, ymin) and (xmax, ymax) are the bounding coordinates of cell, (x, y) is the coordinate of obj at time t, and (vx, vy) is the object's velocity in the x and y dimensions. We use a heap H to store the last several objects leaving the cell. Let S_cell be the area of the cell. The size of H is set to ρ · S_cell, which is the density threshold of the cell in terms of the number of objects. For every object in the cell, we compute its leaving time and push the time into H. After all the objects have been processed, H holds the ρ · S_cell latest leaving times. When the object with the minimum leaving time in H leaves the cell, the number of objects in the cell drops below the density threshold, provided that objects entering the cell from the outside are not considered. Hence, the minimum value in H is the earliest possible time at which the cell changes its state. This value is returned as the safe interval of cell. Note that the safe interval we compute for a dense leaf cell is the shortest time interval for which the dense state remains. Hence, when the safe interval expires, the state of the cell may not have changed if some other objects have entered the cell in the meantime. Thus, the state of this cell and the corresponding safe interval need to be re-calculated upon expiration.
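As a rough companion to Algorithm 17, the computation can be sketched in Python. The function names and the tuple layouts are assumptions of this sketch. It deviates from the pseudo-code in two small ways that are worth naming: a zero velocity component is treated as "never leaves along this axis" to avoid a division by zero, and the bounded heap is replaced by `heapq.nlargest`, which returns the same k latest leaving times.

```python
import heapq
import math


def leaving_time(x, y, vx, vy, xmin, ymin, xmax, ymax):
    """Time until a linearly moving object inside the cell crosses its border."""
    def axis(p, v, lo, hi):
        if v > 0:
            return (hi - p) / v
        if v < 0:
            return (lo - p) / v  # both numerator and v are negative
        return math.inf          # no movement along this axis
    return min(axis(x, vx, xmin, xmax), axis(y, vy, ymin, ymax))


def safe_interval_dense(cell, objects, threshold):
    """Safe interval of a dense leaf cell.

    cell:      (xmin, ymin, xmax, ymax)
    objects:   iterable of (x, y, vx, vy) for objects inside the cell
    threshold: minimum object count for the cell to be dense (rho * S_cell)
    Entering objects are ignored, as in the text.
    """
    times = [leaving_time(*obj, *cell) for obj in objects]
    if len(times) < threshold:
        return 0.0  # the cell is not dense to begin with
    # The cell stays dense until the threshold-th latest departure.
    return heapq.nlargest(threshold, times)[-1]
```

With a threshold of 3 and five objects whose leaving times are 5, 2, 1, 5, and 10, the three latest departures are 10, 5, and 5, so the sketch returns 5.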
Algorithm 17: SIofDense(cell)
input : The region that needs to be processed
output: Safe interval of a dense region cell
H is a min-heap, whose size is ρ · S_cell;
for every obj in cell do
  if (obj.vx > 0) then lx = cell.xmax − obj.x;
  else if (obj.vx < 0) then lx = cell.xmin − obj.x;
  else lx = cell.xmax − cell.xmin;
  end
  if (obj.vy > 0) then ly = cell.ymax − obj.y;
  else if (obj.vy < 0) then ly = cell.ymin − obj.y;
  else ly = cell.ymax − cell.ymin;
  end
  Push min(lx/vx, ly/vy) into H;
end
Return the minimum value in H;
6.4.3.2 Safe Interval of Sparse Leaf Cell
Similar to the dense leaf cell, we only focus on the objects entering the sparse cell, without considering the leaving objects. Suppose that N is the density threshold for the sparse cell, and that presently there are M objects in the cell. Then after (N − M) objects move into this cell, its state might change. To reduce the cost of scanning outside objects, we expand the cell level by level until the expanding region contains (N − M) objects. When all the objects in this expanding region enter the cell, the cell's state may change. On the other hand, a fast-moving object outside this expanding region may also enter the cell. The earliest time at which this can happen is given by
$$t_o = \frac{L}{V_{max}} \tag{6.13}$$
where V_max is the known maximum moving speed and L is the length of the expanding distance. Thus, within the interval t_o, we only need to scan the objects in the expanding region and estimate whether these objects can change the state of this sparse cell by computing their entering times.
Algorithm 18 describes how to compute the safe interval for a sparse leaf cell cell. Again, we use a heap H to store the first several objects that will enter cell. The size of H is (N − M). The cell is expanded to a larger region, denoted as Cell, that includes at least ρ · S_cell objects. We then compute the entering times of these additional objects in Cell. If object i's entering time, denoted by t_i, is longer than t_o, given in Equation (6.13), t_i is set to t_o. After processing all the additional objects in Cell, the maximum value in H is returned as the safe interval of cell.
Algorithm 18: SIofSparse(cell)
input : The region that needs to be processed
output: Safe interval of a sparse region cell
H is a max-heap, whose size is (ρ · S_cell) − (number of objects in cell);
Expand cell to Cell, which includes at least (ρ · S_cell) objects;
L is the expanded distance and Vmax is the maximum velocity of all the objects;
for every additional object obj in Cell do
  if (obj.vx > 0 and obj.x ≤ cell.xmin) then lx = cell.xmin − obj.x;
  else if (obj.vx < 0 and obj.x ≥ cell.xmax) then lx = cell.xmax − obj.x;
  else lx = L;
  end
  if (obj.vy > 0 and obj.y ≤ cell.ymin) then ly = cell.ymin − obj.y;
  else if (obj.vy < 0 and obj.y ≥ cell.ymax) then ly = cell.ymax − obj.y;
  else ly = L;
  end
  t = lx/vx;
  if (obj is not in cell at time t) then t = ly/vy;
  if (t > L/Vmax) then t = L/Vmax;
  Push t into H;
end
Return the maximum value in H;
Figure 6.10 shows an example, where C is a sparse region. In the expanding region, the objects o1, o2, o3, o4, and o5 are moving towards C. Suppose that object oi enters C at time ti, that the entering times, sorted in descending order, are t2, t1, t5, t3, and t4, and that they are all smaller than t_o. If the region would change to a dense one after three objects move into it, we use t5 as its safe interval, since o5 is the third object to enter. Similar to the case of a dense leaf cell, the state and the safe interval of a sparse cell have to be re-computed when the safe interval expires.
Fig. 6.10 An example of sparse region
There are two cases in which we need to update the safe interval of a dense/sparse leaf cell: (1) when the safe interval expires, we need to recompute the state and safe interval of the cell, as discussed in the last two subsections; (2) when the velocity of an object changes, we need to recompute the states and safe intervals of the cells affected by this update. Next we discuss how to deal with the second case.
When the updating object is in a sparse cell, we do not need to recompute the safe interval of this cell, since we consider only the objects entering from the outside. However, the object may affect the safe intervals of other sparse cells that its trajectory crosses. We only need to recompute the sparse cells that the object's new trajectory crosses. For the sparse cells intersected by the old trajectory, we do not need to recompute their safe intervals until they expire, because their states remain unchanged until the current safe intervals end. When the updating object is in a dense cell, the safe interval of this cell may change, because we compute the safe interval for a dense cell based on the objects inside the cell. The sparse cells that intersect with the object's new trajectory also need to be recomputed.
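Returning to the sparse-cell computation of Algorithm 18, the following Python sketch shows one way to obtain the safe interval. It is not a transcription of the pseudo-code: the entering time of an object is computed with the standard slab test for a moving point and an axis-aligned rectangle, which is a swapped-in technique, and all function and parameter names are hypothetical. The cap t_o = L / V_max from Equation 6.13 is kept.

```python
import heapq
import math


def entry_time(x, y, vx, vy, xmin, ymin, xmax, ymax):
    """Earliest t >= 0 at which a linearly moving point is inside the box.

    Returns math.inf if the point never enters (slab method).
    """
    t_in, t_out = 0.0, math.inf
    for p, v, lo, hi in ((x, vx, xmin, xmax), (y, vy, ymin, ymax)):
        if v == 0:
            if p < lo or p > hi:
                return math.inf  # parallel to this slab and outside it
            continue
        t1, t2 = (lo - p) / v, (hi - p) / v
        t_in = max(t_in, min(t1, t2))
        t_out = min(t_out, max(t1, t2))
    return t_in if t_in <= t_out else math.inf


def safe_interval_sparse(cell, outside_objects, needed, expand_dist, v_max):
    """Safe interval of a sparse leaf cell.

    cell:            (xmin, ymin, xmax, ymax)
    outside_objects: (x, y, vx, vy) of the objects in the expanded region
    needed:          N - M, the number of arrivals that could make the cell dense
    expand_dist:     L, the expanding distance
    v_max:           V_max, the maximum object speed
    """
    if needed <= 0:
        raise ValueError("cell already holds enough objects to be dense")
    cap = expand_dist / v_max  # Equation 6.13
    times = [min(entry_time(*o, *cell), cap) for o in outside_objects]
    if len(times) < needed:
        return cap  # too few candidates; nothing farther away arrives before cap
    # The cell may turn dense when the needed-th object has entered.
    return heapq.nsmallest(needed, times)[-1]
```

With three arrivals needed and entering times of 2, 4, 5, and 10 under a cap of 10, the sketch returns 5, the time of the third arrival.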
Fig. 6.11 An example of object updating
Figure 6.11 shows an example of finding the sparse cells whose safe intervals need to be recomputed, where S1, S2, S3, S4, and S5 are sparse cells, D1, D2, D3, and D4 are dense cells, and o1 is an updating object whose velocity has changed. We need not consider the sparse cells in its old moving direction, i.e., S3. In the new moving direction, we identify the sparse cells whose safe intervals o1 may affect. In order to reduce the computing cost, the formula
$$L_u = v_1 \cdot SI_{max} \tag{6.14}$$
can be used to determine the length of the trajectory, where v_1 is the new speed of o1 and SI_max is the maximum safe interval among all cells. We only update the safe intervals of the sparse cells that intersect with the segment L_u (e.g., S4 in Fig. 6.11).
6.4.4 Query Processing
Having computed the states and safe intervals for all leaf cells, we now have the data required to identify the dense regions. We search the Quad-tree in a bottom-up manner. By definition, an intermediate node is dense if all its child nodes are dense (i.e., have a state value of 1); otherwise it is not. The bottom-up search for a dense region continues until an ancestor is no longer dense; the dense child nodes of that ancestor are then returned as answers. The safe interval of a dense region is set to the smallest safe interval of the leaf cells contained in the region. When the safe interval of a dense region expires, the safe interval of one of its leaf cells has expired. The state and safe interval of that leaf cell are updated, and the dense region is reevaluated on that basis. The formal procedure is described in Algorithm 19.
Algorithm 19: Query(t)
input : Query time t
output: Dense region
for every leaf node n do
  if (n.safe_interval ≥ t) then
    if (n.state == 0) then break;
    else
      n′ = n;
      while (n′.parent.state == 1 and n′.parent.safe_interval ≥ query time) do
        n′ = n′.parent;
      end
      output n′;
    end
    ignore the children of n′ and get the next leaf node;
  else
    count number of moving objects in n;
    if (number ≥ ρ · S_cell) then SIofDense(n);
    else SIofSparse(n);
    end
    if (n.state changes) then
      adjust value of n.parent and take n as the next node;
    end
  end
end
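The core of the search, namely reporting the largest cells whose leaves are all dense, can also be sketched recursively in Python. This sketch is a simplified, top-down formulation and not Algorithm 19 itself: it ignores safe intervals and the refresh of expired cells, it recomputes states on demand instead of storing them, and the `Node` class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    dense: bool = False  # only meaningful for leaf nodes
    children: List["Node"] = field(default_factory=list)


def state(node):
    """0 = all leaves sparse, 1 = all leaves dense, 2 = mixed."""
    if not node.children:
        return 1 if node.dense else 0
    child_states = {state(c) for c in node.children}
    if child_states == {1}:
        return 1
    if child_states == {0}:
        return 0
    return 2


def dense_regions(node):
    """Names of the maximal dense cells below (and including) node."""
    s = state(node)
    if s == 1:
        return [node.name]
    if s == 0:
        return []
    regions = []
    for child in node.children:
        regions.extend(dense_regions(child))
    return regions
```

For a tree shaped like Fig. 6.8 in which c6 through c9 are dense and one further leaf under c3 is dense, the sketch reports c2 and that single leaf, but not the root c1.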
6.5 Summary In this chapter, we introduced the advanced querying for moving objects including similar trajectories queries and density queries. The cluster-based pre-processing can efficiently support density queries in road networks and the Quad-tree based scheme with the notion of safe interval can monitor continuous density queries for moving objects.
References 1. Chakrabarti K, Keogh E, Mehrotra S, Pazzani M (2002) Locally Adaptive Dimensionality Reduction for Indexing Large Time Series Databases. ACM Transactions on Database Systems 27(2):151-162 2. Keogh E, Chakrabarti K, Pazzani M, Mehrotra S (2001) Dimensionality Reduction for Fast Similarity Search in Large Time Series Databases. Journal of Knowledge and Information Systems 3(3):263-286 3. Priyantha N, Miu A, Balakrishnan H, Teller S (2001) The Cricket Compass for Context-Aware Mobile Applications. In: Proceedings of the 7th Annual International Conference on Mobile Computing and Networking (MOBICOM 2001), Rome, Italy, pp 1-14 4. Pfoser D, Jensen CS, Theodoridis Y (2000) Novel Approaches in Query Processing for Moving Object Trajectories. In: Proceedings of the 26th International Conference on Very Large Data Bases (VLDB 2000), Cairo, Egypt, pp 395-406
References
103
5. Agrawal R, Faloutsos C, Swami AN (1993) Efficient Similarity Search in Sequence Databases. In: Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms (FODO 1993), Chicago, Illinois, USA, pp 69-84
6. Chan KP, Fu AW (1999) Efficient Time Series Matching by Wavelets. In: Proceedings of the 15th International Conference on Data Engineering (ICDE 1999), Sydney, Australia, pp 126
7. Korn F, Jagadish H, Faloutsos C (1997) Efficiently Supporting Ad hoc Queries in Large Datasets of Time Sequences. In: Proceedings of the 1997 ACM SIGMOD International Conference on Management of Data (SIGMOD 1997), Tucson, Arizona, USA, pp 289-300
8. Yi B, Faloutsos C (2000) Fast Time Sequence Indexing for Arbitrary Lp Norms. In: Proceedings of the 26th International Conference on Very Large Data Bases (VLDB 2000), Cairo, Egypt, pp 385-394
9. Vlachos M, Kollios G, Gunopulos D (2002) Discovering Similar Multidimensional Trajectories. In: Proceedings of the 18th International Conference on Data Engineering (ICDE 2002), San Jose, California, USA, pp 673
10. Yi B, Jagadish H, Faloutsos C (1998) Efficient Retrieval of Similar Time Sequences under Time Warping. In: Proceedings of the 14th International Conference on Data Engineering (ICDE 1998), Orlando, Florida, USA, pp 201-208
11. Chen S, Kashyap RL (2001) A Spatio-Temporal Semantic Model for Multimedia Presentations and Multimedia Database Systems. IEEE Transactions on Knowledge and Data Engineering 13(4):607-622
12. Keogh E (2002) Exact Indexing of Dynamic Time Warping. In: Proceedings of the 28th International Conference on Very Large Data Bases (VLDB 2002), Hong Kong, China, pp 406-417
13. Chen L, Ng R (2004) On the Marriage of Edit Distance and Lp Norms. In: Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 2004), Toronto, Canada, pp 792-803
14. Andoni A, Deza M, Gupta A, Indyk P, Raskhodnikova S (2003) Lower Bounds for Embedding Edit Distance into Normed Spaces. In: Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2003), Baltimore, Maryland, USA, pp 523-526
15. Chon H, Agrawal D, Abbadi AE (2002) Query Processing for Moving Objects with Space-Time Grid Storage Model. In: Proceedings of the 3rd International Conference on Mobile Data Management (MDM 2002), Singapore, pp 121-129
16. Kollios G, Tsotras VJ, Gunopulos D, Delis A, Hadjieleftheriou M (2001) Indexing Animated Objects Using Spatiotemporal Access Methods. IEEE Transactions on Knowledge and Data Engineering 13(5):758-777
17. Cai Y, Ng R (2004) Indexing Spatio-Temporal Trajectories with Chebyshev Polynomials. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD 2004), Paris, France, pp 599-610
18. Cormode G, Muthukrishnan S (2002) The String Edit Distance Matching Problem with Moves. In: Proceedings of the 13th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2002), San Francisco, California, USA, pp 667-676
19. Hadjieleftheriou M, Kollios G, Gunopulos D, Tsotras VJ (2003) On-Line Discovery of Dense Areas in Spatio-Temporal Databases. In: Proceedings of the 8th International Symposium on Advances in Spatial and Temporal Databases (SSTD 2003), Santorini Island, Greece, pp 306-324
20. Jensen CS, Lin D, Ooi BC, Zhang R (2006) Effective Density Queries on Continuously Moving Objects. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE 2006), Atlanta, Georgia, USA, pp 71
21. Papadias D, Zhang J, Mamoulis N, Tao Y (2003) Query Processing in Spatial Network Databases. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 2003), Berlin, Germany, pp 802-813
22. Saltenis S, Jensen CS, Leutenegger ST, Lopez MA (2000) Indexing the Positions of Continuously Moving Objects. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD 2000), Dallas, Texas, USA, pp 331-342
23. Faloutsos C, Ranganathan M, Manolopoulos Y (1994) Fast Subsequence Matching in Time-Series Databases. In: Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data (SIGMOD 1994), Minneapolis, Minnesota, pp 419-429
24. Agrawal R, Lin KI, Sawhney HS, Shim K (1995) Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases. In: Proceedings of the 21st International Conference on Very Large Data Bases (VLDB 1995), Zurich, Switzerland, pp 490-501
25. Jagadish HV, Mendelzon AO, Milo T (1995) Similarity-Based Queries. In: Proceedings of the 14th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 1995), San Jose, California, USA, pp 36-45
26. Rafiei D (1999) On Similarity-Based Queries for Time Series Data. In: Proceedings of the 15th International Conference on Data Engineering (ICDE 1999), Sydney, Australia, pp 410-417
Chapter 7
Trajectory Prediction of Moving Objects
Jidong Chen¹, Xiaofeng Meng²
¹EMC Research China, ²Renmin University of China
[email protected], [email protected]

Abstract Trajectory prediction is an important part of the management of moving objects. For example, it can be used to improve the performance of the location update strategy and to support predictive indexes and queries. In this chapter, we first review some linear prediction methods and analyze their problems in handling moving objects in spatial networks, and then present our simulation-based prediction methods: Fast-Slow Bounds Prediction and Time-Segmented Prediction.

Key words: trajectory prediction, linear prediction, simulation-based prediction, spatial network, moving object databases
7.1 Introduction

A spatial network contains a large number of moving objects whose locations change continuously. To answer queries about the current or future location of a moving object, its location, typically obtained via GPS, must be reported to and stored in a central database. The research issue is how to accurately maintain the locations of a large number of moving objects while minimizing the number of updates. Trajectory prediction plays an important role in solving this problem. Most existing studies propose to lower the update frequency by means of a trajectory prediction method. They usually use linear prediction, which represents objects' locations as linear functions of time. However, the assumption of linear movement limits the applicability of traditional prediction methods in a majority of real-life applications, especially in traffic networks where vehicles change their velocities frequently. Moreover, other prediction models with non-linear prediction, proposed by Aggarwal et al. [1] using quadratic predictive functions and by Tao et al. [3] based on recursive motion functions for objects with
unknown motion patterns, improve the precision in predicting the location of each object, but they ignore the correlation of adjacent objects and may not accurately reflect the complex and stochastic traffic movement scenario. In the management of moving objects, the trajectory prediction method is usually used to improve the performance of the location update strategy and to support predictive indexes and queries. In this chapter, we first review some linear prediction methods and analyze their problems in handling moving objects in spatial networks, and then present our simulation-based prediction methods: Fast-Slow Bounds Prediction and Time-Segmented Prediction, which are more accurate than linear prediction methods in predicting the future trajectories of moving objects in spatial networks.
7.2 Underlying Linear Prediction (LP) Methods

Most current index and query processing approaches use the linear prediction method for its simplicity and its capability of approximating any curve of free movement by piecewise linear segments. Suppose the trajectory function for an object between time $t_0$ and $t_1$ is

$$X_t = X_{t_0} + V (t - t_0), \quad t_0 \le t \le t_1 \tag{7.1}$$

where $X_{t_0}$ denotes the position vector of the object at time $t_0$ and $V$ denotes the velocity vector of the object, which is assumed to remain fixed between $t_0$ and $t_1$.
7.2.1 General Linear Prediction

The general linear prediction method uses the object's current position $X_{t_0}$ and current velocity $V$ to predict its position in the near future. When the prediction is deemed inaccurate, that is, when its deviation from the actual position exceeds a predefined threshold, we revise the prediction by resetting $X_{t_0}$ and $V$. In situations where an object's velocity remains largely constant, this method predicts future positions with high precision. However, when objects move with changing velocity, their trajectory functions have to be revised frequently.
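The revise-on-deviation loop described above can be sketched as follows. This is only a minimal one-dimensional illustration, not the authors' implementation; the class name `LinearPredictor`, its field `threshold`, and the sample observations are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class LinearPredictor:
    """1-D linear predictor: x(t) = x0 + v * (t - t0), as in Eq. 7.1."""
    t0: float
    x0: float
    v: float
    threshold: float  # hypothetical name for the predefined deviation threshold

    def predict(self, t: float) -> float:
        return self.x0 + self.v * (t - self.t0)

    def observe(self, t: float, x: float, v: float) -> bool:
        """Compare an actual observation with the prediction.

        If the deviation exceeds the threshold, reset x0 and v (an update)
        and return True; otherwise keep the current function and return False.
        """
        if abs(x - self.predict(t)) > self.threshold:
            self.t0, self.x0, self.v = t, x, v
            return True
        return False


p = LinearPredictor(t0=0.0, x0=0.0, v=10.0, threshold=5.0)
observations = [(1.0, 10.0, 10.0), (2.0, 12.0, 2.0), (3.0, 14.0, 2.0)]
updates = [p.observe(t, x, v) for t, x, v in observations]
```

In this trace only the second observation triggers an update, because the object slowed down and the old function overshoots by more than the threshold; the third observation is again covered by the revised function.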
7.2.2 Road Segment-Based Linear Prediction

If objects move in a constrained environment such as a transportation network, we can use the road segments of the network to help model the object's movement. In other words, we assume that objects move at a constant speed along a road segment, that is, their trajectory functions will not change until they move out of a road segment. When an object enters a new road segment, we reset the velocity $V$ in its trajectory function. The frequency of revising the trajectory function depends on the average length of the road segments.
7.2.3 Route-Based Linear Prediction

If objects have regular and known routes in the transportation network (e.g., one takes the same route from home to work), we can use the routes instead of the road segments to reduce the number of updates needed to maintain the objects' positions. If the route is predicted incorrectly, we simply make an additional update.

However, any real traffic system has a stochastic, dynamic and fuzzy nature. The accuracy of the linear prediction methods mentioned above is inadequate because linear methods can hardly reflect the movement of objects constrained by road networks. For example, in urban road networks, because of traffic conditions, a vehicle may travel at a constant speed, decelerate to a stop, wait, accelerate, and travel again at a constant speed. Vehicles often repeat this pattern in modern urban road networks. We use Fig. 7.1 to demonstrate the inadequacy of the linear prediction method for real road networks. Figure 7.1(a) shows the predicted (linear) trajectory and the actual trajectory of an object. We can see that each time the change of the object's velocity exceeds a certain threshold, an update is triggered and the trajectory is revised with a new velocity vector. Frequent changes of the object's velocity therefore incur repeated updates and predictions.
Fig. 7.1 Linear prediction vs. simulation-based prediction
7.3 Simulation-Based Prediction (SP) Methods

Before presenting the simulation-based prediction methods, we first recall the GCA model introduced in Chapter 2, in particular the definition of a CA and the transition of the GCA model. A cellular automaton (CA) consists of a finite oriented sequence of cells. In a configuration, each cell is either empty or contains a symbol. During a transition, symbols can move forward to subsequent cells, symbols can leave the CA, and new symbols can enter the CA. Let i be an object moving along an edge. Let v(i) be its velocity, x(i) its position, gap(i) the number of empty cells ahead
(forward gap), and $P_d(i)$ a randomized slowdown rate that specifies the probability that it slows down. We assume that $V_{max}$ is the maximum velocity of the moving objects. At each transition of the GCA, each object changes velocity and position in a CA of length L according to the rules below:

1. if v(i) < $V_{max}$ and v(i) < gap(i), then v(i) ← v(i) + 1
2. if v(i) > gap(i), then v(i) ← gap(i)
3. if v(i) > 0 and rand() < $P_d(i)$, then v(i) ← v(i) − 1
4. if (x(i) + v(i)) ≤ L, then x(i) ← x(i) + v(i)
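The four rules can be sketched as a single transition function for one CA. This is a simplified illustration under several assumptions that are not stated in the text: cells are numbered 1 to L, all objects share one slowdown rate `p_d`, the leading object treats the end of the CA as its gap boundary, and transfers to a following edge are not modeled. The function name `gca_step` is hypothetical.

```python
import random


def gca_step(positions, velocities, length, v_max, p_d, rand=random.random):
    """Apply rules 1-4 once, in parallel, to all objects of one CA.

    positions: cell indices in [1, length], sorted in the direction of travel.
    velocities: cells per transition, one per object.
    Gaps are computed from the old positions, so updates do not interfere.
    """
    new_pos, new_vel = [], []
    n = len(positions)
    for i in range(n):
        x, v = positions[i], velocities[i]
        if i + 1 < n:
            gap = positions[i + 1] - x - 1   # empty cells up to the next object
        else:
            gap = length - x                 # assumption: leader is bounded by the CA end
        if v < v_max and v < gap:            # rule 1: accelerate
            v += 1
        if v > gap:                          # rule 2: brake to avoid collision
            v = gap
        if v > 0 and rand() < p_d:           # rule 3: randomized slowdown
            v -= 1
        if x + v <= length:                  # rule 4: move forward
            x += v
        new_pos.append(x)
        new_vel.append(v)
    return new_pos, new_vel


# Deterministic example: p_d = 0 disables the random slowdown.
state1 = gca_step([1, 4], [2, 0], length=10, v_max=3, p_d=0.0)
state2 = gca_step(*state1, length=10, v_max=3, p_d=0.0)
```

In the second transition the follower has only one empty cell ahead, so rule 2 reduces its velocity from 2 to 1 while the leader keeps accelerating.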
Considering the simulation feature of the GCA model, we use GCAs not only to model road networks, but also to simulate the future trajectories of moving objects through the transitions of GCAs, in which objects' movement follows traffic rules. Based on the GCA, we propose a Simulation-based Prediction (SP) method to anticipate the future trajectories of moving objects. The SP method treats an object's simulated results as its predicted positions to obtain its future in-edge trajectory. To refine the accuracy, we simulate, under different assumptions on the traffic conditions, two future trajectories as discrete points for each object on its edge. Then, by linear regression and translation, we obtain the trajectory bounds that contain all possible future positions of a moving object on that edge. When the object moves to another edge in the GCA, or when the predicted position deviates from its actual position by more than the predefined accuracy, another simulation and regression are executed to predict new future trajectory bounds. The process of the simulation-based prediction is shown in Fig. 7.2.
Fig. 7.2 Two Predicted Bounds of Future Trajectories
7.3.1 Fast-Slow Bounds Prediction

Most existing work uses the CA model for traffic flow simulation, in which the parameter $P_d(i)$ is treated as a random variable to reflect the stochastic, dynamic nature of the traffic system. We extend this model to predict the future trajectories of objects by setting $P_d(i)$ to values that model different traffic conditions. For
example, laminar traffic can be simulated with $P_d(i)$ set to 0 or a small value, and congestion can be simulated with a larger $P_d(i)$. By giving $P_d(i)$ two values, we can derive two future trajectories, which describe, respectively, the fastest and slowest movements of objects, as shown in Fig. 7.2(a). In other words, the object's future locations are most probably bounded by these two trajectories. The value of $P_d(i)$ can be obtained by sampling from the given dataset.

To get the future trajectory function of an object from the simulated discrete points, we need to regress the discrete positions. We find that in most cases linear regression (as shown in Fig. 7.2(a)) fits the prediction well and at low cost. The ordinary least squares estimation (OLSE) method, for example, can be calculated efficiently at low data storage cost. Let the discrete simulated points be $(t_1, d_1), \ldots, (t_i, d_i), \ldots, (t_n, d_n)$, where $d_i$ ($i \in [1, n]$) denotes the relative distance in a network edge, and let $\bar{t}$ and $\bar{d}$ denote the averages of the $t_i$ and the $d_i$, respectively. After regression, the trajectory function of the moving object is:

$$D(t) = \hat{\beta}_0 + \hat{\beta}_1 \cdot t \tag{7.2}$$

where $\hat{\beta}_0$ and $\hat{\beta}_1$ are given by:

$$\hat{\beta}_0 = \bar{d} - \hat{\beta}_1 \cdot \bar{t} \tag{7.3}$$

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} t_i d_i - n\,\bar{t} \cdot \bar{d}}{\sum_{i=1}^{n} t_i^2 - n\,(\bar{t})^2} \tag{7.4}$$
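As a worked illustration of the regression step, the following sketch evaluates Eqs. 7.3 and 7.4 directly on a list of simulated points. The function name `ols_fit` and the sample points are hypothetical and only serve the example.

```python
def ols_fit(points):
    """Fit D(t) = beta0 + beta1 * t to (t_i, d_i) pairs using Eqs. 7.3 and 7.4."""
    n = len(points)
    t_bar = sum(t for t, _ in points) / n
    d_bar = sum(d for _, d in points) / n
    numerator = sum(t * d for t, d in points) - n * t_bar * d_bar
    denominator = sum(t * t for t, _ in points) - n * t_bar ** 2
    if denominator == 0:
        raise ValueError("at least two distinct time stamps are required")
    beta1 = numerator / denominator      # Eq. 7.4
    beta0 = d_bar - beta1 * t_bar        # Eq. 7.3
    return beta0, beta1


# Hypothetical simulated positions (time stamp, relative distance on the edge).
simulated = [(1, 2.0), (2, 4.0), (3, 7.0), (4, 8.0)]
beta0, beta1 = ols_fit(simulated)
```

For these points the averages are 2.5 and 5.25, the numerator of Eq. 7.4 is 63 − 52.5 = 10.5, and the denominator is 30 − 25 = 5, which gives a slope of 2.1 and an intercept of 0. Only the running sums need to be kept, which is why the storage cost is low.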
In Fig. 7.2(a), the dashed curves show two future trajectories, which are the slowest and the fastest movements simulated by using different $P_d$. Applying the OLSE algorithm to the two trajectories generates two linear functions, which are shown as solid lines:

$$fastTrj: \; D(t) = \alpha_f \cdot t + \gamma_f \tag{7.5}$$

$$slowTrj: \; D(t) = \alpha_s \cdot t + \gamma_s \tag{7.6}$$
Finally, in order to find the bounds of the area that contains all estimated future positions, we translate the two regression lines until all estimated future positions fall between them. More specifically, we translate the upper line (fastest movement) upwards until it touches the point with the maximum residual (let $\varepsilon_1$ denote the distance translated upward), and similarly, we translate the lower line (slowest movement) downwards (let $\varepsilon_2$ denote the distance translated downward). This minimizes the loss of information and the errors introduced by the OLSE algorithm. We now define the two bound lines as the upper bound and lower bound of the object's future trajectories.

Definition 7.1. The upper bound of an object trajectory, upperBound, is the upper bound line of its fastest future trajectory, and the lower bound, lowerBound, is the lower bound line of its slowest future trajectory. They are linear functions of the following form:

$$upperBound: \; D(t) = \alpha_f \cdot t + \lambda_f \tag{7.7}$$

$$lowerBound: \; D(t) = \alpha_s \cdot t + \lambda_s \tag{7.8}$$

where $\lambda_f = \gamma_f + \varepsilon_1$ and $\lambda_s = \gamma_s - \varepsilon_2$.
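The translation step can be illustrated with a short sketch. It assumes that the upward shift is the largest residual of the fast simulated points above the fast regression line, and the downward shift is the largest residual of the slow simulated points below the slow regression line; the function name `trajectory_bounds` and all numeric values are hypothetical.

```python
def trajectory_bounds(fast_points, fast_line, slow_points, slow_line):
    """Translate two regression lines into upper and lower bound lines.

    fast_line and slow_line are (slope, intercept) pairs, i.e. (alpha_f, gamma_f)
    and (alpha_s, gamma_s). Returns ((alpha_f, lambda_f), (alpha_s, lambda_s)).
    """
    alpha_f, gamma_f = fast_line
    alpha_s, gamma_s = slow_line
    # eps1: how far the fast line must move up to cover every fast point.
    eps1 = max(0.0, max(d - (alpha_f * t + gamma_f) for t, d in fast_points))
    # eps2: how far the slow line must move down to cover every slow point.
    eps2 = max(0.0, max((alpha_s * t + gamma_s) - d for t, d in slow_points))
    return (alpha_f, gamma_f + eps1), (alpha_s, gamma_s - eps2)


# Hypothetical simulated points and regression lines.
fast = [(0, 0.0), (1, 3.0), (2, 3.5)]
slow = [(0, 0.0), (1, 0.5), (2, 2.5)]
upper, lower = trajectory_bounds(fast, (2.0, 0.0), slow, (1.0, 0.0))
```

Here the fast point (1, 3.0) lies 1.0 above the fast line and the slow point (1, 0.5) lies 0.5 below the slow line, so the bound intercepts become 1.0 and −0.5 while the slopes are unchanged.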
The two bound lines are shown in Fig. 7.2(b). We can treat the two predicted lines as the bounds of the possible future positions of an object. The predicted trajectory bounds can be used in predictive index structures and query processing in road networks to reduce index updates and to filter out unnecessary query results, which improves the performance of predictive queries. For example, given a predictive range query with a specified region R during a future time interval $[t_1, t_2]$, we can exclude an object from the result during the pre-processing phase if the area between its upper and lower trajectory bounds cannot intersect R during $[t_1, t_2]$.

However, for other applications such as the tracking of moving objects, a single predicted function is needed to obtain the specific future positions of the object. For example, to lower the frequency of updates from moving objects to the server database, a general principle for location update policies is as follows: moving objects equipped with GPS receivers do not report their locations to the server unless their actual positions deviate from the predicted positions by more than a certain threshold. Their predicted positions need to be computed by a single predicted function. In this case, we can also adapt the SP method to obtain a compact and simple linear prediction function. The process is shown in Fig. 7.3. After regressing the two simulated future trajectories to two linear functions, denoted $L_1$ and $L_2$, we compute the middle straight line $L_3$, the bisector of the angle a between $L_1$ and $L_2$, as the final predicted function L(t).
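Both uses of the regression lines can be sketched in a few lines. The sketch makes assumptions beyond the text: the query region R is reduced to a distance interval `[d_lo, d_hi]` on the object's edge, both bound lines have non-negative slopes with the lower line never above the upper line on the query interval, and the bisector angle is measured in the (t, D) plane with the units as given (so it depends on the scaling of the two axes). The function names are hypothetical.

```python
import math


def may_intersect(upper, lower, d_lo, d_hi, t1, t2):
    """Filter test for a predictive range query on one edge.

    upper and lower are (slope, intercept) bound lines. Under the stated
    assumptions, the band between them meets [d_lo, d_hi] during [t1, t2]
    exactly when the lower bound starts at or below d_hi and the upper
    bound ends at or above d_lo.
    """
    a_u, l_u = upper
    a_l, l_l = lower
    return (a_l * t1 + l_l) <= d_hi and (a_u * t2 + l_u) >= d_lo


def bisector(line1, line2):
    """Return (slope, intercept) of the line bisecting the angle between two lines."""
    a1, c1 = line1
    a2, c2 = line2
    if a1 == a2:                       # parallel lines: take the middle line
        return a1, (c1 + c2) / 2.0
    t_x = (c2 - c1) / (a1 - a2)        # intersection point of the two lines
    d_x = a1 * t_x + c1
    slope = math.tan((math.atan(a1) + math.atan(a2)) / 2.0)
    return slope, d_x - slope * t_x


upper, lower = (2.0, 1.0), (1.0, -0.5)
candidates = [may_intersect(upper, lower, 10.0, 12.0, t1, t2)
              for t1, t2 in [(3.0, 4.0), (5.0, 6.0), (13.0, 14.0)]]
```

An object whose test returns False can be dropped before any exact evaluation; objects that pass still need to be checked against their actual predicted positions.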
Fig. 7.3 Single predicted future trajectory
Although the predicted function obtained by the SP method is a simple linear function, it differs from linear prediction in that the SP method considers not only the speed and direction of each moving object, but also the correlation of objects and the stochastic behavior of the traffic.
7.3.2 Time-Segmented Prediction

As the prediction of the in-edge trajectory only uses the GCA to simulate the movement of objects in an edge, we have to consider the cases when objects move across nodes in order to make a global trajectory prediction. If the out-degree of a node in the GCA is one, the behavior of the object in the adjacent edge is the same. However,
if the out-degree of the node is bigger than one, we cannot trace the objects across different edges. In this case, we could use the probability of objects changing edges, derived from the historical data.

In the previous section, we only predict the in-edge trajectory of the object moving in one edge of the GCA. When the object moves to another edge or the prediction accuracy of its future positions cannot meet the given accuracy requirement, we issue another prediction based on the current traffic conditions. For the predicted fast and slow trajectory bounds, it is possible that the predicted positions at some time stamps deviate from the real positions by more than the precision range required by the query. In particular, as time goes on, the predicted trajectory bounds expand and lead to worse prediction accuracy. Therefore, the Time-Segmented Prediction method is used in this case. The simulation and prediction are issued at every fixed time interval, such as tLength. Even within tLength, when the predicted locations cannot meet the requirement of the query anymore, we issue another prediction. The Time-Segmented Prediction method can thus estimate the real trajectory of moving objects with better accuracy.
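The re-prediction rule of the Time-Segmented Prediction method can be sketched as a simple trigger. This is only an illustration of the two conditions named above; the function name, the parameter `tolerance` (standing for the precision required by the query), and the sample values are assumptions.

```python
def needs_new_prediction(now, predicted_at, t_length, actual, predicted, tolerance):
    """Decide whether a new simulation and regression should be issued.

    A new prediction is issued when the fixed interval t_length has elapsed
    since the last prediction, or earlier if the predicted position deviates
    from the actual position by more than the tolerance.
    """
    interval_elapsed = (now - predicted_at) >= t_length
    too_inaccurate = abs(actual - predicted) > tolerance
    return interval_elapsed or too_inaccurate


within_segment = needs_new_prediction(12.0, 10.0, 5.0, actual=3.0, predicted=3.2, tolerance=0.5)
segment_expired = needs_new_prediction(15.0, 10.0, 5.0, actual=3.0, predicted=3.2, tolerance=0.5)
drifted_early = needs_new_prediction(12.0, 10.0, 5.0, actual=4.0, predicted=3.0, tolerance=0.5)
```

Bounding the lifetime of each prediction by tLength keeps the fast and slow bounds from widening indefinitely, at the cost of more frequent simulations.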
7.4 Other Non-Linear Prediction Methods

The prediction model plays an important role in the tracking of moving objects. Most existing prediction methods assume linear movement, which limits their applicability in the majority of real applications. In [1], non-linear models such as acceleration are used to represent trajectories that are affected by abnormal traffic, such as a traffic incident. Xu and Wolfson [4] apply time-series prediction together with moving speed to traffic management and moving object databases. Karimi and Liu [2] describe a technique for trajectory prediction that assigns probabilities to the roads emanating from an intersection and uses the most probable route within an extracted sub-road network for prediction. More recently, Tao et al. [3] propose a prediction method based on recursive motion functions for objects with unknown motion patterns, which follows the trend of each object's own movement as reflected in its recent past locations. Although these prediction methods can improve the precision of the location prediction of each object, they ignore the correlation of the movements of adjacent objects in traffic networks, and thus may not reflect realistic traffic movements.
7.5 Summary

This chapter introduces several trajectory prediction methods, which are very important to the management of moving objects. Motivated by the features of vehicles' movements in traffic networks, we propose new simulation-based prediction methods, which are more precise than the linear prediction methods. In Chapters 3 and 4, we have used the simulation-based method to improve the performance of the location update strategy and to support predictive indexes and queries.
References

1. Aggarwal C, Agrawal D (2003) On Nearest Neighbor Indexing of Nonlinear Trajectories. In: Proceedings of the 22nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS 2003), San Diego, California, USA, pp 252-259
2. Karimi HA, Liu X (2003) A Predictive Location Model for Location-Based Services. In: Proceedings of the 11th ACM International Symposium on Advances in Geographic Information Systems (GIS 2003), New Orleans, Louisiana, USA, pp 126-133
3. Tao Y, Faloutsos C, Papadias D, Liu B (2004) Prediction and Indexing of Moving Objects with Unknown Motion Patterns. In: Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data (SIGMOD 2004), Paris, France, pp 611-622
4. Xu B, Wolfson O (2003) Time-Series Prediction with Applications to Traffic and Moving Objects Databases. In: Proceedings of the 3rd ACM International Workshop on Data Engineering for Wireless and Mobile Access (MobiDE 2003), San Diego, California, USA, pp 56-60
Chapter 8
Uncertainty of Moving Objects
Xiaofeng Meng¹, Jidong Chen²
¹Renmin University of China, ²EMC Research China
[email protected], [email protected]

Abstract One of the key research issues in moving objects databases is uncertainty management. Uncertainty management for moving objects has been well studied recently, with many models and algorithms proposed. In this chapter, we analyze the uncertainty of moving objects in spatial networks and introduce an uncertain trajectory model and an index framework, the uncertain trajectory based R-tree (UTR-tree), for indexing the fully uncertain trajectories of network-constrained moving objects. Then, we introduce how to process queries on this framework. The content of this chapter is mainly from the work of Ding in [14].

Key words: uncertainty management, uncertainty trajectory, spatial network, moving object databases
8.1 Introduction

Uncertainty management is one of the most important research issues in moving objects databases (MOD). In a MOD system, moving objects such as cars, flights, ships, and pedestrians are uniquely identified, and each of them is equipped with a portable computing platform and integrated location tracking equipment (such as GPS). Through location updates, a moving object reports its latest location information to the server, so that its location at any time instant can be retrieved from the server by other querying users. Since location update messages are sent intermittently, the server cannot know the exact location of the moving object between any two consecutive location updates, and it can only infer its possible locations. As a result, uncertainty becomes an inherent aspect of MOD [1, 2].
In recent years, a lot of research has focused on the uncertainty management problem, and many effective models and algorithms have been proposed. In [1], the authors analyzed the sources of uncertainty in moving objects databases and proposed a framework to deal with uncertain data. In [2, 3], the authors discussed the uncertainty management strategies in the DOMINO system. By applying an uncertainty threshold, the trajectory of a moving object is extended from a curve to a tube in the X × Y × T space, and operations such as inside are extended by introducing uncertainty semantics such as "sometimes", "always", "possibly", and "definitely". The work in [4] explored the uncertainty and fuzziness in managing moving objects and provided a framework to deal with spatio-temporal indeterminacies. In [5], a set of data types and operations was proposed for the uncertainty management of moving objects. However, all the above studies are based on the X × Y × T Euclidean space, and almost none of them considers the interaction between moving objects and the underlying transportation networks.

Recently, based on the fact that most moving objects only move in fixed transportation networks, researchers have realized the importance of modeling network-constrained moving objects, and the uncertainty of network-constrained moving objects has also been studied. In [6], the authors analyzed the uncertainty management problem for network-constrained moving objects. Through reasonable location modeling and location update methods, the possible location of a moving object at any time is reduced to a graph route section instead of a region, so that the indeterminacies can be greatly reduced. In [7], the authors propose to use transportation networks to reduce sampling noise from GPS or to predict the future positions of moving objects. In [8], the authors further analyzed the uncertainty of network-constrained moving objects based on the study in [6], and defined a rich set of data types and operations for representing historical uncertain trajectories.

Even though the modeling of uncertainty has been relatively well studied, research on indexing moving object trajectories with uncertainty taken into account is very limited. Previously, many index methods were proposed to deal with the trajectories of moving objects, both with and without network constraints. For instance, in [9, 10], the authors proposed an R-tree based index method for Euclidean space-based trajectory data. In [11], the author proposed a Fixed Network R-tree (FNR-Tree) to index moving objects on fixed networks. In [12], the authors proposed the MON-tree as an improvement on the FNR-Tree. The study in [13] dealt with the future trajectories of network-constrained moving objects.

In this chapter, we discuss the uncertainty issue of moving objects in spatial networks and introduce an uncertain trajectory model for moving objects in a spatial network. Then a two-layered index framework, the uncertain trajectory based R-tree (UTR-tree), is proposed to index the fully uncertain trajectories of network-constrained moving objects. Finally, we introduce how to process queries on this framework.
8.2 Uncertain Trajectory Modeling

In this section, we first define a road network framework, which is the basis for the UTR-tree hybrid index structure. Then we define network-constrained moving
objects and uncertain trajectories. Finally, we describe the sampling method of uncertain trajectories and provide an analysis of an uncertain trajectory. For simplicity, we model the whole transportation network as one single graph, and we will use "transportation network" and "transportation graph" interchangeably throughout this chapter.

Definition 8.1. A transportation graph G is defined as a pair:

$$G = (R, J) \tag{8.1}$$

where R is a set of routes and J is a set of junctions.

Definition 8.2. A route of graph G, denoted by r, is defined as follows:

$$r = (rid, route, len, fdr) \tag{8.2}$$

where rid is the identifier of r, route is a polyline that describes the geometry of r, len is the length of r, and $fdr \in \{0, 1, 2\}$ is the direction of the traffic flow allowed in r.

The polyline route in the above definition can be defined as a series of points in the Euclidean space. For simplicity, we suppose that the graph is spatially embedded in the X × Y plane, so that the polyline can be represented as a series of points of the form (x, y). The polyline is considered directed, so we can call the two end points of the route the beginning point (or 0-end) and the end point (or 1-end) of the route. The direction of traffic flow allowed in a route has three possibilities, which are specified by fdr. Its value can be 0, 1, or 2, corresponding to "from 0-end to 1-end", "from 1-end to 0-end", and "both directions allowed", respectively.

Definition 8.3. A junction of graph G, denoted by j, is defined as follows:

$$j = (jid, loc, ((rid_i, pos_i))_{i=1}^{n}, m) \tag{8.3}$$

where jid is the identifier of j; loc is the location of j, which can be represented by a point value in the X × Y plane; m is the connectivity matrix [3] of j; and $((rid_i, pos_i))_{i=1}^{n}$ describes the routes connected by j, where $rid_i$ is the identifier of the ith route, and $pos_i \in [0, 1]$ describes the position of the junction in the ith route. We suppose that the total length of any route is 1. Then every location in the route can be represented by a real number $p \in [0, 1]$. Each value in m is 0 or 1, indicating whether an object can move from one road to another, as illustrated in Fig. 8.1.

As shown in Fig. 8.1, route r1 has two traffic flow directions, "r1+" and "r1−". Route r2 has one traffic flow direction, "r2−". An object in "r1−" can move into "r2−" in Fig. 8.1(a), so the corresponding value in Fig. 8.1(b) is 1.

Definition 8.4. A position inside the graph G, denoted by gpos, is defined as a pair:

$$gpos = (rid, pos) \tag{8.4}$$

where rid is the identifier of the route, and $pos \in [0, 1]$ is the relative position inside the route. Since the geometry of each route is maintained in the database as a polyline, the above representation can be transformed to the (x, y) form easily.
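The transformation from a network position to the (x, y) form can be sketched as a walk along the polyline by arc length. This is only an illustrative sketch: the class `Route` mirrors the fields of Definition 8.2 (with the length computed from the geometry rather than stored), and the function name `gpos_to_xy` is hypothetical.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Route:
    rid: int
    route: List[Point]  # polyline from the 0-end to the 1-end
    fdr: int            # 0: 0-end to 1-end, 1: 1-end to 0-end, 2: both directions

    def length(self) -> float:
        """Euclidean length of the polyline."""
        return sum(hypot(x1 - x0, y1 - y0)
                   for (x0, y0), (x1, y1) in zip(self.route, self.route[1:]))


def gpos_to_xy(r: Route, pos: float) -> Point:
    """Map a relative position pos in [0, 1] on route r to a point (x, y)."""
    if not 0.0 <= pos <= 1.0:
        raise ValueError("pos must lie in [0, 1]")
    remaining = pos * r.length()
    for (x0, y0), (x1, y1) in zip(r.route, r.route[1:]):
        seg = hypot(x1 - x0, y1 - y0)
        if seg > 0 and remaining <= seg:
            f = remaining / seg  # fraction of this segment that is covered
            return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))
        remaining -= seg
    return r.route[-1]  # guards against rounding at the 1-end


r = Route(rid=1, route=[(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)], fdr=2)
midpoint = gpos_to_xy(r, 0.5)
```

For this L-shaped route of length 7, the relative position 0.5 corresponds to an arc length of 3.5, which lies 0.5 along the second segment.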
Fig. 8.1 Junction and corresponding connectivity matrix
Definition 8.5. Motion vectors are snapshots of a moving object's movements and are generated by location updates. A motion vector, mv, is defined as follows:

$$mv = (t, (rid, pos), v) \tag{8.5}$$

where t is a time instant, (rid, pos) is a network position describing the location of the moving object at time t, and v is the speed measure of the moving object at time t. The speed measure v is a real number that contains both speed and direction information. Its absolute value is equal to the speed of the moving object at time t, while its sign (either positive or negative) indicates the traffic flow direction the moving object belongs to. If the moving object is moving from the 0-end towards the 1-end, the sign is positive. Otherwise, if it is moving from the 1-end to the 0-end, the sign is negative.

In network-constrained moving objects databases, three kinds of location updates are defined in [14], that is, ID-Triggered Location Update (IDTLU), Distance-Threshold Triggered Location Update (DTTLU), and Speed-Threshold Triggered Location Update (STTLU). DTTLU and STTLU are triggered when the moving object exceeds the distance threshold $\xi$ and the speed threshold $\psi$, respectively, and each generates one motion vector $mv_a$. IDTLU is triggered when the moving object transfers from one route $r_s$ to another route $r_e$, and generates three motion vectors, $mv_{a1}$, $mv_{a2}$, $mv_{a3}$, where $mv_{a1}$ and $mv_{a2}$ correspond to the junction location and $mv_{a3}$ corresponds to the location when IDTLU is triggered. These three kinds of location updates work together to complete the trajectory data sampling process.

Definition 8.6. The trajectory of a moving object mo is a sequence of motion vectors sent by mo through location updates during its journey. A trajectory, denoted as Tr, is defined as follows:

$$Tr = (mv_i)_{i=1}^{n} = ((t_i, (rid_i, pos_i), v_i))_{i=1}^{n} \tag{8.6}$$

where $mv_n$ is the last motion vector submitted by the moving object. We call it the "active motion vector" of the moving object; it contains the key information for computing the current or near-future locations of the moving object and for triggering the next location update.
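How the active motion vector can drive the next location update may be sketched as follows. The exact trigger conditions are defined in [14]; this sketch assumes that DTTLU fires when the actual position deviates from the position extrapolated from the active motion vector by more than ξ, and that STTLU fires when the actual speed measure deviates from the reported one by more than ψ. The names `MotionVector` and `triggered_update` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MotionVector:
    t: float     # time instant of the location update
    rid: int     # route identifier
    pos: float   # relative position in [0, 1]
    v: float     # signed speed measure, relative to the route length


def triggered_update(active: MotionVector, t: float, rid: int, pos: float,
                     v: float, xi: float, psi: float) -> Optional[str]:
    """Return the kind of location update triggered by the current state, if any."""
    if rid != active.rid:
        return "IDTLU"                               # object moved to another route
    expected = active.pos + active.v * (t - active.t)
    if abs(pos - expected) > xi:
        return "DTTLU"                               # distance threshold exceeded
    if abs(v - active.v) > psi:
        return "STTLU"                               # speed threshold exceeded
    return None


active = MotionVector(t=0.0, rid=1, pos=0.25, v=0.125)
checks = [
    triggered_update(active, 2.0, 1, 0.5, 0.125, xi=0.125, psi=0.25),
    triggered_update(active, 2.0, 1, 0.75, 0.125, xi=0.125, psi=0.25),
    triggered_update(active, 2.0, 1, 0.5, 0.5, xi=0.125, psi=0.25),
    triggered_update(active, 2.0, 2, 0.1, 0.125, xi=0.125, psi=0.25),
]
```

The sketch returns only the kind of update; generating the three motion vectors of an IDTLU would additionally require the junction position on both routes.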
As stated in [6], through the trajectory we can only know the exact location of the moving object at the location update times. Between two location updates and after the last location update, the location of the moving object is uncertain, and we can only compute the possible locations according to the corresponding motion vectors. The trajectory of the moving object thus actually describes the "uncertain locations" of the moving object, and we therefore call it an "uncertain trajectory" in this chapter. For simplicity, trajectory and uncertain trajectory will be used interchangeably throughout this chapter.

In [6, 8], the authors have analyzed the possible locations that can be derived from the trajectories. Between any two consecutive motion vectors $mv_s$ and $mv_e$, the possible locations of the moving object mo form a hexagon in the POS × T plane. After the last motion vector $mv_a$, we can predict the possible locations of the moving object until the end (either the 0-end or the 1-end, depending on the direction of mo) of the route, and these possible locations form a pentagon, quadrangle, or triangle, depending on the distance threshold $\xi$, the speed threshold $\psi$, and the active motion vector $mv_a$. We call the above-mentioned polygons "Uncertain Trajectory Units", or UT-Units for short. The UT-Unit corresponding to two consecutive motion vectors $mv_s$ and $mv_e$ is called a non-active UT-Unit and is denoted as UT-Unit($mv_s$, $mv_e$), and the UT-Unit corresponding to the active motion vector $mv_a = (t_a, (rid_a, pos_a), v_a)$ is called an active UT-Unit and is denoted as UT-Unit($mv_a$). Figure 8.2 illustrates the geometry of the possible locations derived from UT-Units.
Fig. 8.2 Uncertain trajectory units of moving objects
The uncertain trajectory of a moving object is a set of consecutive uncertain trajectory units, as shown in Fig. 8.3.
Fig. 8.3 Uncertain trajectory of a moving object
8.3 Uncertain Trajectory Indexing

In this section, we propose a new indexing method, the uncertain trajectory R-tree (or UTR-tree for short), which can index the fully uncertain trajectories, including past, present, and near-future possible locations of moving objects. To simplify the discussion, we suppose that the functions route(rid) and junct(jid) return the route and the junction, respectively. The function $RTree_{low}(rid)$ returns the corresponding lower R-tree of route(rid).

8.3.1 Structure of the UTR-Tree

The structure of the UTR-tree is two-layered. The upper layer is a single R-tree that indexes the directed atomic route sections of the traffic network, and the lower layer consists of a forest of R-trees, with each R-tree corresponding to a certain route and indexing the uncertain trajectory units submitted in the route. Figure 8.4 illustrates the structure of the UTR-tree.
Fig. 8.4 Structure of the UTR-tree
As shown in Fig. 8.4, the upper part of the UTR-tree is a standard R-tree that takes directed atomic route sections as the basic unit of indexing. The records of the leaf nodes take the form $\langle MBR_{xy}, rid.aid, pt_{route}, pt_{tree} \rangle$, where $MBR_{xy}$ is the Minimum Bounding Rectangle (MBR) (in the X × Y plane) of the directed atomic route section, rid.aid is a route identifier, $pt_{route}$ is a pointer to the detailed route record, and $pt_{tree}$ is a pointer to the lower R-tree corresponding to route(rid). The root or internal nodes contain records of the form $\langle MBR_{xy}, pt_{node} \rangle$, where $MBR_{xy}$ is the MBR (in the X × Y plane) containing all MBRs of its child records, and $pt_{node}$ is a pointer to the child node.

The lower part of the UTR-tree is composed of a set of R-trees, with each R-tree corresponding to a certain route and indexing all the UT-Units that the moving objects submitted in the route. The records of the leaf nodes have the form $\langle MBR_{pt}, mid, mv_s, mv_e \rangle$, where $MBR_{pt}$ is the MBR (in the POS × T plane) of the UT-Unit, mid is the identifier of the moving object, and $mv_s = (t_s, rid, pos_s, v_s)$ and $mv_e = (t_e, rid, pos_e, v_e)$ are the two consecutive motion vectors that form the UT-Unit. If the UT-Unit is an active unit, then $mv_e$ is NULL and the MBR is $\langle t_a, pos_a, \min(t_\xi, t_\psi), 1 \rangle$. The root or internal nodes of the lower R-tree contain records of the form $\langle MBR_{pt}, pt_{node} \rangle$, where $MBR_{pt}$ is the MBR (in the POS × T plane) containing all MBRs of its child records, and $pt_{node}$ is the pointer to its child node.

For simplicity, in the following discussion we suppose that the moving object runs from the 0-end towards the 1-end during the time period corresponding to the UT-Unit. The methods proposed can be easily adapted to deal with the situation when moving objects run from the 1-end to the 0-end. Let us first analyze non-active uncertain trajectory units.
From the location update strategies for network-constrained moving objects [14], we know that for a non-active trajectory unit UT-Unit(mvs, mve) (where mvs = (ts, (rids, poss), vs), mve = (te, (ride, pose), ve), and rids = ride), the location of the moving object at any given time tq ∈ [ts, te], denoted as pos[tq], should meet the following condition:

\[ \mathit{pos}[t_q] \in [\mathit{pos}_q^{\min}, \mathit{pos}_q^{\max}] \tag{8.7} \]

where posqmin and posqmax can be computed in the following way [6] (since we use the relative position pos ∈ [0, 1], we suppose that the speed measures vs, ve, va, the distance threshold ξ, and the speed threshold ψ have already been divided by r.length, the length of the route, before computation):

\[ \mathit{pos}_q^{\min} = \max\bigl(\mathit{pos}_q - \xi,\; \mathit{pos}_s + (v_s - \psi)(t_q - t_s),\; \mathit{pos}_e - (v_s + \psi)(t_e - t_q)\bigr) \tag{8.8} \]

\[ \mathit{pos}_q^{\max} = \min\bigl(\mathit{pos}_q + \xi,\; \mathit{pos}_s + (v_s + \psi)(t_q - t_s),\; \mathit{pos}_e - (v_s - \psi)(t_e - t_q)\bigr) \tag{8.9} \]

where posq = poss + vs × (tq − ts). We can depict the geometry of UT-Unit(mvs, mve) as a hexagon (see the shadowed part of Fig. 8.5). As shown in Fig. 8.5, the hexagon of UT-Unit(mvs, mve) is bounded by the six lines corresponding to Eqs. 8.8 and 8.9.
Now let us analyze the MBR of this hexagon. Suppose that (t∗, pos∗) is an arbitrary point inside the hexagon, where ts ≤ t∗ ≤ te. Since the moving object runs along route(rids) monotonically from poss towards pose during the time period from ts to te, pos∗ must meet the following condition: poss ≤ pos∗ ≤ pose. Therefore, the MBR of UT-Unit(mvs, mve) is mbr(mvs, mve) = <ts, poss, te, pose>, as shown in Fig. 8.5.
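As a small numerical illustration of the bounds for a non-active unit, the following sketch evaluates the lower and upper position limits at a query time. It assumes the upper bound uses posq + ξ, symmetric to the posq − ξ term of the lower bound; the function names and the sample values are hypothetical.

```python
def nonactive_bounds(ts, pos_s, vs, te, pos_e, tq, xi, psi):
    """Return (pos_min, pos_max) of a non-active UT-Unit at time tq.

    Speeds and thresholds are assumed to be already divided by the route
    length, so all positions are relative values in [0, 1].
    """
    if not ts <= tq <= te:
        raise ValueError("tq must lie in [ts, te]")
    pos_q = pos_s + vs * (tq - ts)  # position expected from the start vector
    pos_min = max(pos_q - xi,
                  pos_s + (vs - psi) * (tq - ts),
                  pos_e - (vs + psi) * (te - tq))
    pos_max = min(pos_q + xi,  # assumption: +xi, mirroring the lower bound
                  pos_s + (vs + psi) * (tq - ts),
                  pos_e - (vs - psi) * (te - tq))
    return pos_min, pos_max


def nonactive_mbr(ts, pos_s, te, pos_e):
    """MBR of the unit, <ts, pos_s, te, pos_e>, as given in the text."""
    return (ts, pos_s, te, pos_e)


# Sample unit: starts at 0.2 with speed 0.01 per time unit, ends at 0.32.
lo, hi = nonactive_bounds(ts=0, pos_s=0.2, vs=0.01, te=10, pos_e=0.32,
                          tq=5, xi=0.05, psi=0.004)
```

At the two endpoints tq = ts and tq = te the interval collapses to the reported positions, which is why the six bounding lines close into a hexagon.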
Fig. 8.5 Geometry and MBR of UT-Unit(mvs , mve )
In the following, let us consider active trajectory units. For an active trajectory unit UT-Unit(mva), the location of the moving object at a given time tq (ta ≤ tq ≤ tnow), denoted as pos[tq], should meet the following condition:

\[ \mathit{pos}[t_q] \in [\mathit{pos}_q^{\min}, \mathit{pos}_q^{\max}] \tag{8.10} \]

where posqmin and posqmax can be computed in the following way [6]:

\[ \mathit{pos}_q^{\min} = \max\bigl(\mathit{pos}_q - \xi,\; \mathit{pos}_a + (v_a - \psi)(t_q - t_a)\bigr) \tag{8.11} \]

\[ \mathit{pos}_q^{\max} = \min\bigl(1,\; \mathit{pos}_q + \xi,\; \mathit{pos}_a + (v_a + \psi)(t_q - t_a)\bigr) \tag{8.12} \]

where posq = posa + va × (tq − ta). The geometry of UT-Unit(mva) is illustrated in Fig. 8.6 (see the shadowed parts).
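The bounds of an active unit (Eqs. 8.11 and 8.12) can be evaluated in the same way. The sketch below is illustrative only; the function name and sample values are hypothetical, and speeds and thresholds are again assumed to be normalized by the route length.

```python
def active_bounds(ta, pos_a, va, tq, xi, psi):
    """Return (pos_min, pos_max) of an active UT-Unit at time tq >= ta."""
    if tq < ta:
        raise ValueError("tq must not precede ta")
    pos_q = pos_a + va * (tq - ta)  # predicted position at tq
    pos_min = max(pos_q - xi, pos_a + (va - psi) * (tq - ta))
    # The route ends at relative position 1, hence the extra cap.
    pos_max = min(1.0, pos_q + xi, pos_a + (va + psi) * (tq - ta))
    return pos_min, pos_max


# An object near the end of its route: the upper bound is capped at 1.
lo, hi = active_bounds(ta=0, pos_a=0.9, va=0.02, tq=4, xi=0.05, psi=0.01)
```

For small tq − ta the speed terms are the tighter constraint, and once ψ × (tq − ta) exceeds ξ the distance threshold takes over.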
Fig. 8.6 Geometry and MBR of UT-Unit(mva )
8.3.2 Construction and Maintenance of UTR-Tree
When the UTR-tree is first constructed in the moving objects database, the system has to read the route records of the traffic network and build the upper R-tree. At this moment, all lower R-trees are empty. After the construction, whenever a new location update message is received from a moving object, the server generates the corresponding UT-Units and inserts them into the related lower R-tree(s). Since active UT-Units contain prediction information, when a new location update occurs, the current active UT-Unit of the moving object should be replaced by newly generated UT-Units.
Let us first consider the situation when DTTLU or STTLU occurs. From the location update strategies for network-constrained moving objects, we know that when DTTLU or STTLU happens, the moving object mo is still running in route(ridn), where ridn is the route identifier contained in the current active motion vector mvn. When the server receives a new location update message mva = (ta, rida, posa, va) (where rida = ridn), if mva is the first motion vector of mo in RTreelow(rida), then the system only needs to insert UT-Unit(mva) into RTreelow(rida) directly. Otherwise, the system has to do the following with RTreelow(rida): (1) delete UT-Unit(mvn), which was generated at the last location update; (2) insert UT-Unit(mvn, mva) into RTreelow(rida); (3) insert UT-Unit(mva) into RTreelow(rida), as shown in Fig. 8.7.
Fig. 8.7 Maintenance of Lower R-Tree Records
When IDTLU occurs, the process is more complicated. Suppose that the moving object mo transfers from route rs to route re (with route identifiers rids and ride, respectively). In this case three location update messages are generated: mva1, mva2, and mva3, where mva1 corresponds to the junction's position in rs, mva2 corresponds to the junction's position in re, and mva3 corresponds to the location update position in re. The UTR-tree needs to carry out the following operations: (1) delete UT-Unit(mvn) from RTreelow(rids), and insert UT-Unit(mvn, mva1) into RTreelow(rids); (2) insert UT-Unit(mva2, mva3) into RTreelow(ride); (3) insert UT-Unit(mva3) into RTreelow(ride).
The construction and dynamic maintenance algorithm for the UTR-tree is given in Algorithm 20. In Algorithm 20, the function mbr() returns the MBR of the given UT-Unit, and the functions Insert() and Delete() perform the insertion and deletion of UT-Units in the corresponding lower R-trees, respectively.
Algorithm 20: Constructing/Dyn-Maintenance of UTR-tree
  Read route records of N and build the upper R-tree;
  Set all lower R-trees to empty tree;
  while MOD is running do
    Receive location update package LUM from moving object mo;
    if LUM contains 1 motion vector mva = (ta, rida, posa, va) then
      Let mvn be mo's active motion vector in RTreelow(rida);
      if mvn = NULL then
        Insert(RTreelow(rida), (mid, UT-Unit(mva), mbr(mva)));
      else
        Delete(RTreelow(rida), (mid, UT-Unit(mvn), mbr(mvn)));
        Insert(RTreelow(rida), (mid, UT-Unit(mvn, mva), mbr(mvn, mva)));
        Insert(RTreelow(rida), (mid, UT-Unit(mva), mbr(mva)));
      end
    else if LUM contains 3 motion vectors mva1, mva2, mva3 then
      Let mvn be mo's active motion vector in RTreelow(rida1);
      Delete(RTreelow(rida1), (mid, UT-Unit(mvn), mbr(mvn)));
      Insert(RTreelow(rida1), (mid, UT-Unit(mvn, mva1), mbr(mvn, mva1)));
      Insert(RTreelow(rida2), (mid, UT-Unit(mva2, mva3), mbr(mva2, mva3)));
      Insert(RTreelow(rida2), (mid, UT-Unit(mva3), mbr(mva3)));
    end
  end
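The maintenance logic of Algorithm 20 can be sketched in a few lines of Python. This is a simplified illustration, not the book's implementation: each lower R-tree is replaced by a plain list, MBRs are omitted, the active motion vector is tracked per moving object in a dictionary, and the names LowerIndex and update are hypothetical. On a first update the sketch inserts the active unit of the newly received vector mva.

```python
from collections import defaultdict


class LowerIndex:
    """Stand-in for the forest of lower R-trees (one list per route)."""

    def __init__(self):
        self.trees = defaultdict(list)  # rid -> [(mid, mv_s, mv_e), ...]
        self.active = {}                # mid -> current active motion vector

    def _insert(self, rid, mid, mv_s, mv_e=None):
        self.trees[rid].append((mid, mv_s, mv_e))

    def _delete(self, rid, mid, mv_s, mv_e=None):
        self.trees[rid].remove((mid, mv_s, mv_e))

    def update(self, mid, vectors):
        """Apply a location update with 1 or 3 motion vectors (t, rid, pos, v)."""
        mvn = self.active.get(mid)
        if len(vectors) == 1:            # same route as the active vector
            (mva,) = vectors
            if mvn is not None:
                self._delete(mvn[1], mid, mvn)        # drop old active unit
                self._insert(mvn[1], mid, mvn, mva)   # close it as non-active
            self._insert(mva[1], mid, mva)            # new active unit
            self.active[mid] = mva
        elif len(vectors) == 3:          # transfer to another route
            mva1, mva2, mva3 = vectors
            if mvn is None:
                raise ValueError("route transfer without an active unit")
            self._delete(mva1[1], mid, mvn)
            self._insert(mva1[1], mid, mvn, mva1)
            self._insert(mva2[1], mid, mva2, mva3)
            self._insert(mva3[1], mid, mva3)
            self.active[mid] = mva3
        else:
            raise ValueError("expected 1 or 3 motion vectors")
```

Keeping exactly one active unit per moving object is the invariant that both branches preserve: every update closes the previous active unit and opens a new one.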
8.4 Uncertainty Trajectory Querying
In moving objects databases, the most common uncertain query operators, such as possibly-inside(trajectory, Ix × Iy × It) and possibly-intersect(trajectory, Ix × Iy × It) (where Ix, Iy, It are intervals in the X, Y, T domains), are range queries; that is, the input of the query is a range in the X × Y × T space. We therefore take the range query as an example to show how uncertain query processing is supported by the UTR-tree.
Fig. 8.8 Range query through UTR-tree
The querying of the UTR-tree is completed in two steps. When processing a range query (suppose the range is Ix × Iy × It), the system first queries the upper R-tree of the UTR-tree according to Ix × Iy, and receives a set of (rid × period) pairs as the result, where period ⊆ [0, 1] and can have multiple elements. Then, for each (rid × period) pair, the system searches the corresponding lower R-tree to find the UT-Units intersecting period × It, and outputs the corresponding moving object identifiers. Figure 8.8 illustrates the range query processing based on the UTR-tree. The query algorithm is given in Algorithm 21.
Algorithm 21: RNE (node id, QS, result)
  input : Ix × Iy × It
  output: Result
  Search the upper R-tree according to Ix × Iy, and receive a set of pairs (ridi, periodi), i = 1, ..., n;
  for i := 1 to n do
    for ∀ρ ∈ periodi × It do
      Let μ be the set of UT-Units in RTreelow(ridi) which intersect ρ;
      Result = Result ∪ the set of moving object IDs in the elements of μ;
    end
  end
  Return Result;
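The two-step range query can be sketched as follows. This is a hedged illustration under several assumptions: both layers are flat lists rather than R-trees, each upper entry carries the position interval of its atomic section (used here as a coarse stand-in for the period returned by the upper search), and only MBR overlap is tested, so a refinement step against the exact uncertainty geometry would still be needed. All names are hypothetical.

```python
def overlaps(a_lo, a_hi, b_lo, b_hi):
    """True if the closed intervals [a_lo, a_hi] and [b_lo, b_hi] intersect."""
    return a_lo <= b_hi and b_lo <= a_hi


def range_query(upper, lower, ix, iy, it):
    """Return the ids of moving objects whose UT-Unit MBRs may match.

    upper: list of ((x_lo, y_lo, x_hi, y_hi), rid, (pos_lo, pos_hi))
    lower: dict rid -> list of (mid, (t_lo, pos_lo, t_hi, pos_hi))
    ix, iy, it: query intervals as (lo, hi) pairs
    """
    result = set()
    # Step 1: find route sections whose X x Y MBR intersects Ix x Iy.
    for (x_lo, y_lo, x_hi, y_hi), rid, (p_lo, p_hi) in upper:
        if not (overlaps(x_lo, x_hi, *ix) and overlaps(y_lo, y_hi, *iy)):
            continue
        # Step 2: search the lower structure of that route with period x It.
        for mid, (t_lo, u_lo, t_hi, u_hi) in lower.get(rid, []):
            if overlaps(u_lo, u_hi, p_lo, p_hi) and overlaps(t_lo, t_hi, *it):
                result.add(mid)
    return result


upper = [((0, 0, 10, 0), 1, (0.0, 0.5)),
         ((10, 0, 20, 0), 1, (0.5, 1.0)),
         ((0, 5, 0, 15), 2, (0.0, 1.0))]
lower = {1: [(7, (0, 0.1, 10, 0.3)), (8, (0, 0.6, 10, 0.9))],
         2: [(9, (0, 0.0, 10, 0.2))]}
hits = range_query(upper, lower, ix=(2, 8), iy=(-1, 1), it=(5, 6))
```

The spatial filter runs once on the static network, and only the routes it selects are probed for trajectory units, which is the point of the two-layer design.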
8.5 Summary
Uncertainty management is a key research issue in moving objects databases, and a lot of research has focused on this problem recently. However, most studies focus on the modeling of uncertainty (data types, operations, and algorithms), leaving the indexing of uncertain trajectories of moving objects, especially network-constrained ones, as an unsolved problem. In this chapter, we introduce an index structure, the UTR-tree, which supports the most common uncertain query operators, such as possibly-inside and possibly-intersect.
References
1. Pfoser D, Jensen CS (1999) Capturing the Uncertainty of Moving Object Representations. In: Proceedings of the 6th International Symposium on Advances in Spatial Databases (SSD 1999), Hong Kong, China, pp 111-132
2. Trajcevski G, Wolfson O, Chamberlain S, Zhang F (2002) The Geometry of Uncertainty in Moving Objects Databases. In: Proceedings of the 8th International Conference on Extending Database Technology: Advances in Database Technology (EDBT 2002), Prague, Czech Republic, pp 233-250
3. Trajcevski G, Wolfson O, Cao H, Lin H, Zhang F, Rishe N (2002) Managing Uncertain Trajectories of Moving Objects with Domino. In: Proceedings of the 4th International Conference on Enterprise Information Systems (ICEIS 2002), Ciudad Real, Spain, pp 769-771
4. Pfoser D, Tryfona N (2001) Capturing Fuzziness and Uncertainty of Spatiotemporal Objects. In: Proceedings of the 5th East European Conference on Advances in Databases and Information Systems (ADBIS 2001), Vilnius, Lithuania, pp 112-126
5. Tøssebro E, Nygård M (2002) Uncertainty in Spatio-Temporal Databases. In: Proceedings of the 2nd International Conference on Advances in Information Systems (ADVIS 2002), Izmir, Turkey, pp 43-53
6. Ding Z, Güting RH (2004) Uncertainty Management for Network Constrained Moving Objects. In: Proceedings of the 2004 International Conference on Database and Expert Systems Applications (DEXA 2004), Zaragoza, Spain, pp 411-421
7. Gowrisankar H, Nitte S (2002) Reducing Uncertainty in Location Prediction of Moving Objects in Road Networks. In: Proceedings of the 2002 Conference on Geographic Information Science (GIScience 2002), Boulder, Colorado, USA, pp 228-242
8. Almeida VT, Güting RH (2005) Supporting Uncertainty in Moving Objects in Network Databases. In: Proceedings of the 13th Annual ACM International Workshop on Geographic Information Systems (GIS 2005), Bremen, Germany, pp 31-40
9. Pfoser D, Jensen CS, Theodoridis Y (2000) Novel Approaches in Query Processing for Moving Object Trajectories. In: Proceedings of the 26th International Conference on Very Large Data Bases (VLDB 2000), Cairo, Egypt, pp 395-406
10. Saltenis S, Jensen CS, Leutenegger ST, Lopez MA (2000) Indexing the Position of Continuously Moving Objects. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (SIGMOD 2000), Dallas, Texas, USA, pp 331-342
11. Frentzos E (2003) Indexing Objects Moving on Fixed Networks. In: Proceedings of the 8th International Symposium of Advances in Spatial and Temporal Databases (SSTD 2003), Santorini Island, Greece, pp 289-305
12. Almeida VT, Güting RH (2005) Indexing the Trajectories of Moving Objects in Networks. GeoInformatica 9(1):30-66
13. Chen J, Meng X (2007) Indexing Future Trajectories of Moving Objects in a Constrained Network. Journal of Computer Science and Technology 22(2):245-251
14. Ding Z, Güting RH (2004) Managing Moving Objects on Dynamic Transportation Networks. In: Proceedings of the 16th International Conference on Scientific and Statistical Database Management (SSDBM 2004), Santorini Island, Greece, pp 287
Part III
Moving Objects Management Applications
Rapid advances in wireless communications, positioning techniques, and mobile devices enable many new applications. Some important classes of new applications that will be enabled by this development include intelligent traffic management, location-based services, tourist services, mobile electronic commerce, and the digital battlefield. Some existing application classes that will benefit from the development include transportation and air traffic control, weather forecasting, emergency response, mobile resource management, and mobile workforce management. Location management is an enabling technology for all these applications. This part describes a few typical applications of moving objects management, e.g., Dynamic Transportation Navigation and Dynamic Transportation Networks. Some advanced applications, such as location privacy and clustering analysis of moving objects, are also introduced.
In Chapter 9, we first discuss different kinds of applications that can be built on top of moving objects management technology. Then a typical application in intelligent transportation systems, dynamic transportation navigation, is described in detail. Specifically, we propose a system architecture that provides the user, continuously and in real time, with the optimal path to the destination considering traffic conditions. This system is based on an extension of the Quad-tree access method that adapts better to road networks while maintaining aggregated information on traffic density according to hierarchy levels. Based on this index, a view-based hierarchy search method is proposed to reduce selection areas and accelerate the optimal path computation.
In Chapter 10, another application, a new moving objects model and query system for moving objects on dynamic transportation networks (MODTN), is proposed. In MODTN, moving objects are modeled as moving graph points which move only within predefined transportation networks. To express general events of the system, such as traffic jams, temporary constructions, and the insertion and deletion of junctions or routes, the underlying transportation networks are modeled as dynamic graphs so that the state and the topology of the graph system at any time instant can be tracked and queried.
In Chapter 11, we introduce an advanced application, clustering analysis of moving objects in spatial networks. Considering the unique features of spatial networks, we first propose two static clustering algorithms, which use the information of nodes and edges in the network to improve clustering efficiency and accuracy. Then we introduce the notion of a cluster block (CB) as the underlying clustering unit and propose a unified framework for clustering moving objects in spatial networks (CMON), which improves the dynamic clustering performance of moving objects and supports different clustering criteria. In CMON, the clustering process is divided into the continuous maintenance of CBs and the periodical construction of clusters with different criteria based on CBs.
As LBS become more valuable and important, the location privacy issues raised by such applications have attracted increasing attention. However, due to the specificity of location information, traditional privacy-preserving techniques in data publishing cannot be used. In Chapter 12, we introduce location privacy and analyze the challenges of location privacy preservation. Then we survey existing work, including the system architecture, the location anonymity model, and techniques for query processing on anonymized data.
Chapter 9
Dynamic Transportation Navigation
Xiaofeng Meng, Jidong Chen
Renmin University of China, EMC Research China

Abstract Miniaturization of computing devices, and advances in wireless communication and sensor technology are some of the forces that are propagating computing from the stationary desktop to the mobile outdoors. Some important classes of new applications that will be enabled by this revolutionary development include intelligent traffic management, location-based services, tourist services, mobile electronic commerce, and the digital battlefield. Some existing application classes that will benefit from the development include transportation and air traffic control, weather forecasting, emergency response, mobile resource management, and mobile workforce management. Location management, i.e., the management of transient location information, is an enabling technology for all these applications. In this chapter, we present the applications of moving objects management and their functionalities, in particular, the application of dynamic traffic navigation, which is a challenge due to the highly variable traffic state and the requirement of fast, on-line computations.

Key words: moving object databases, moving object application, location-based services, traffic management, dynamic traffic navigation
9.1 Introduction
In 1996, the Federal Communications Commission mandated that all wireless carriers offer a 911 service with the ability to pinpoint the location of callers making emergency requests. This requirement has been forcing wireless operators to roll out costly new infrastructure that provides location data about mobile devices. In part to facilitate the rollout of these services, in May 2000, the U.S. government stopped jamming the signals from global positioning system satellites for use in civilian applications, dramatically improving the accuracy of GPS-based location data to 5−50 meters [6]. As prices of basic enabling equipment like smart cell phones, handheld devices, wireless modems, and GPS devices continue to drop rapidly, the number of wireless subscribers worldwide will soar. Spurred by the combination of expensive new location-based infrastructure and an enormous market of mobile users, companies will roll out new wireless applications to recoup their technology investments and increase customer loyalty and switching costs. These applications are collectively called location-based services.
Emerging commercial location-based services include mobile resource management (MRM) applications such as systems for mobile workforce management, automatic vehicle location, fleet management, logistics, and transportation management and support (including air traffic control). These systems use location data combined with route schedules to track and manage service personnel or transportation systems. Call centers and dispatch operators can use these applications to notify customers of accurate arrival times, optimize personnel utilization, handle emergency requests, and adjust for external conditions like weather and traffic. Another example of location-based services is location-aware content delivery, which uses location data to tailor the information delivered to the mobile user in order to increase relevance; for instance, delivering accurate driving directions, instant coupons to customers nearing a store, or information on the nearest resources, such as local restaurants, hospitals, ATMs, or gas stations.
In addition to commercial systems, the need for management of moving objects in location-based systems arises in the military, in the context of the digital battlefield. In a military application one would like to ask queries such as "retrieve the helicopters that are scheduled to enter region R within the next 10 minutes".
Moving object management is an enabling technology for all the above-mentioned applications. Managing moving objects is also a fundamental component of other technologies such as fly-through visualization (the visualized terrain changes continuously with the location of the user), context-awareness (the location of the user determines the content, format, or timing of information delivered), augmented reality (the location of both the viewer and the viewed object determines the type of information delivered to the viewer), and cellular communication.
9.2 Moving Objects Management Application Scenarios
We discuss different types of applications that can be built based on moving objects management technology [6].
• Geographic resource discovery: A mobile user provides his/her (partial) future trajectory to a service provider, and expects the answer to queries/triggers such as: "notify me when I am two miles away from a motel (in my travel direction) which has rooms available for under $100 per night". The service provider uses an MOD to store the location information of its customers, and answers their queries/triggers.
• Digital battlefield: The dynamic location and status of the moving objects (tanks, helicopters, soldiers) in the battlefield are stored in an MOD that must answer queries and process triggers of various degrees of complexity (e.g., "How many friendly tanks are in region X?").
• Transportation (taxi, courier, emergency response, municipal transportation, traffic control, supply chain management, logistics): In these applications, the MOD stores the trajectories of the moving objects and answers queries such as: "Which taxi cab is expected to be closest to 320 State street half an hour from now?" (when, presumably, service is requested at that address); "When will the bus arrive at the State and Oak station?"; "How many times during last month was bus No.25 late at some station by more than 10 minutes?"
• Location (or Mobile) e-commerce and marketing: In these applications, coupon information and other location-sensitive marketing information are fed to a mobile device (that presumably screens it based on the user profile, and displays it selectively).
• Mobile workforce management: Utilities and commercial service providers track their service engineers and the MOD answers queries such as: "Which service crew is closest to the emergency at 232 Hill street?"
• Context-awareness, augmented reality, fly-through visualization: In these applications, the service provider feeds, in real time, the information relevant to the current location of a moving user. For example, a geologist driving through a terrain can use his/her handheld device to view the area he/she sees with the naked eye, but with additional information superimposed. The additional information is fed by the server and may include seismographic charts, images of the terrain taken during another season, and notes made by other geologists about each landmark in the viewable terrain.
• Air traffic control: Currently commercial flights take highways in the sky, but when free flight is instituted, a typical trigger to the air-traffic control MOD may be: "retrieve the pair of aircraft that are on a collision course, i.e., are expected to be less than a mile apart at some time point".
• Dynamic allocation of bandwidth in cellular networks: Cellular service providers may track their customers and may dynamically change the bandwidth allocation to various cells to satisfy changing customer density.
• Querying in mobile environments: A Mobile Ad hoc Network (MANET) is a system of mobile computers equipped with wireless broadcast transmitters and receivers that are used for communicating within the system. Such networks provide an attractive and inexpensive alternative to the cellular infrastructure when this infrastructure is unavailable (e.g., in remote and disaster areas), inefficient, or too expensive to use. Knowing the location of the destination computer enables better and more reliable routing of messages. Thus, maintaining the trajectories of mobile computers in an MOD is an attractive alternative. However, in this case, the MOD is distributed among the moving objects themselves, since a centralized solution defeats the MANET purpose.
Currently, commercial MOD products provide a very limited set of capabilities, and they focus on transportation, particularly fleet management systems.
9.3 Dynamic Transportation Navigation
Traffic navigation, one application of moving objects management, has gained considerable attention because it is so closely tied to modern life. In addition, traffic surveillance technologies (based on either GPS or cell phones) allow monitoring and broadcasting of traffic conditions in real time. This encourages innovative applications such as dynamic traffic navigation. It is believed that dynamic traffic navigation will become widely used because it can provide precise and useful information such as the driver's current position, the optimal path to the destination, and traffic congestion.
Due to the highly variable traffic state and the requirement of fast, on-line computations, dynamic traffic navigation has become a challenging issue. Some research studies use prediction techniques, which forecast potential congestion and thus calculate the optimal path for each moving object. This requires each moving object to provide its start time, start location, and destination. However, obtaining all this information for each object is unrealistic, and traffic conditions are difficult to forecast (e.g., traffic jams produced by an unpredictable sudden event such as an accident). Therefore, dynamic navigation techniques cannot be based entirely on prediction models. This motivates us to introduce an intelligent city traffic control system that continuously provides the user with the optimal path to the destination considering traffic conditions. The optimal path-finding problem is one of the most fundamental problems in transportation network analysis [2, 3]; to answer path-finding requests, we adapt the efficient Dijkstra algorithm [1] for calculating shortest paths.
In this section, we propose a novel indexing method, the Hierarchy Aggregation Tree (HAT), which combines spatial indexing technology with pre-aggregation of traffic measures. On the one hand, this spatial index is better suited to road networks and could support other kinds of location-based services, especially on road networks; on the other hand, the aggregated information stored at each hierarchy level allows congested zones to be filtered out in advance. Based on this index structure, a notion of personal "view" is then defined. First, it improves the performance of optimal path finding by limiting the search to regions that are likely to be crossed from the user's current location onward. Second, it optimizes local memory occupancy. Finally, we derive a navigation algorithm and a Dynamic Navigation System, named DyNSA.
9.3.1 Hierarchy Aggregation Tree
Navigation makes intensive use of the retrieval of the network sections in which objects move. Therefore, an efficient spatial index becomes a key issue. A road is usually represented as a line string in a 2-dimensional space. Spatial access methods, such as the R-tree, PMR Quad-tree [5], and Grid File [4], can consequently be adopted for indexing networks. However, the R-tree indexing structure produces large overlapping MBRs, which makes the search inefficient. Furthermore, since the R-tree was originally tailored for indexing rectangles, applying it to a network results in a large amount of dead space. Although the PMR Quad-tree and Grid File do not have the overlapping problem, their uniform geometric partition does not adapt to a non-uniform distribution of roads in space. Since these indices are not suitable for optimal path searching in traffic environments, to improve the efficiency of query processing, we put forward a new indexing method named Hierarchy Aggregation Tree (HAT).
HAT is based on two structures: road and region. The former is a segment of road that has no intersections with other roads except at its two extremities. The latter is similar to an MBR. In addition, it contains supplementary information that stores an aggregated value over the region. The principal function of this aggregated information is to filter out the regions having a high traffic density on the same hierarchy level. HAT is set up based on the spatial information of roads. Unlike the R-tree, HAT references edges and nodes so that it avoids dead space. Moreover, because node MBRs do not overlap, the search is more efficient. This method is inspired by the PMR Quad-tree's non-overlapping index, except that the space partitioning in HAT may be skewed and the resulting tree is balanced. Furthermore, for each region, HAT stores additional information on traffic density at different granularity levels. As shown later, this provides a filter capability for the path-finding process.
HAT is constructed by partitioning the index space recursively. The space is divided according to the distribution of network segments, by an adaptive and recursive split of space into four sub-regions. When the number of roads in a leaf node N, namely its capacity, exceeds B (B is the predefined threshold for a split), N is split and the corresponding region is partitioned into four sub-regions. The split method of HAT satisfies the following two rules:
1. The capacities of the four sub-regions should be almost the same. Suppose Max is the maximum capacity and Min is the minimum capacity of the four sub-regions. Then, the difference between Max and Min should not exceed a predefined proportion P1, which is expressed by the following inequality: (Max − Min) ≤ ⌈Max × P1⌉, where ⌈ ⌉ denotes the smallest integer not less than the value it encloses.
2. The sub-regions crossed by a road should be as few as possible, namely the copies of entries for the road should be as few as possible. Suppose S is the sum of the four capacities and C is the capacity of the original region. S should be as close to C as possible, i.e., the difference between S and C should not exceed the predefined proportion P2, which is expressed by the following inequality: (S − C) ≤ S × P2.
The main idea of the split algorithm is to first partition the region into four regions of equal size; no further modification is needed if the result satisfies rules 1 and 2. Otherwise, the partition point is adjusted. Let N be an overflowing node, and let R be its corresponding region of capacity C (C > B). To process the split operation, R is partitioned at a partition point (xs, ys) into four equal sub-regions R1, R2, R3, R4. (xs, ys) is initially set to (xmedian, ymedian), i.e., the median point among the road coordinates in R; then, it is adjusted to satisfy the above rules. To do so, the split axis is pushed so that it minimizes the number of split segments, as sketched in Fig. 9.1 and Fig. 9.2. With this method, although the partition of the index space is skewed, the resulting tree is balanced and the copies of roads are reduced, which reduces the size of the index and increases search efficiency.
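The two split rules can be checked with a few lines of code. The sketch below is illustrative only: the function name split_is_acceptable and the sample numbers are hypothetical, and rule 1 is taken with a ceiling on the right-hand side, as the accompanying description of the rounding symbol suggests.

```python
import math


def split_is_acceptable(capacities, original_capacity, p1, p2):
    """Check HAT split rules 1 and 2 for a candidate partition point.

    capacities: number of roads in each of the four sub-regions
    original_capacity: C, the number of roads in the region being split
    p1, p2: the predefined proportions P1 and P2
    """
    if len(capacities) != 4:
        raise ValueError("a split produces exactly four sub-regions")
    c_max, c_min = max(capacities), min(capacities)
    total = sum(capacities)  # S counts a road once per sub-region it crosses
    rule1 = (c_max - c_min) <= math.ceil(c_max * p1)    # balanced capacities
    rule2 = (total - original_capacity) <= total * p2   # few duplicated roads
    return rule1 and rule2


# 20 roads, one of which crosses two sub-regions (S = 21).
ok = split_is_acceptable([5, 5, 6, 5], original_capacity=20, p1=0.2, p2=0.1)
# A heavily skewed split violates rule 1.
bad = split_is_acceptable([10, 2, 3, 4], original_capacity=18, p1=0.2, p2=0.1)
```

A split procedure could call such a check after each adjustment of the partition point and stop as soon as both rules hold.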
Fig. 9.1 Adjusting partition for a region without crossings
Fig. 9.2 Adjusting partition for a region with crossings
9.3.2 Dynamic Navigation Query Processing
A dynamic navigation query finds the path along which a user takes the least time, distance, or cost to reach the destination. In this section, we introduce a new method for path searching that uses the available moving objects stream aggregation and the HAT index. We notice that in the real world, users do not prefer detours. Thus, their travel only involves parts of the whole map. Considering this, and borrowing the concept of a view from relational databases, we adopt a view-based searching method. When finding the optimal path, we only consider the area from the user's current position to his/her destination, which is referred to as the reference area (shown in Fig. 9.3). The resulting path is then found from the set of roads crossing the reference area, which are referred to as candidate roads. The reference area is a logical notion. To retrieve all the candidate roads in HAT, the algorithm first retrieves the underlying regions contained in or intersecting with the reference area. The spatial union of these underlying regions is referred to as the searching area, which is a partial view of the whole map, shown in the right half of the figure. Since HAT corresponds to the whole map, the view corresponds to some nodes of HAT. These nodes retain a tree-like structure, referred to as the view tree. Consequently, accessing only parts of HAT rather than the whole tree reduces the time needed for the dynamic navigation task. In addition, as the user gets closer and closer to the destination, the view tree shrinks and releases memory resources. Processing on such a smaller view tree enhances efficiency.
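The derivation of a view tree from the index can be sketched as a pruning traversal. The code below is a simplified illustration under stated assumptions: the reference area is taken to be the axis-aligned rectangle spanned by the current position and the destination (the text does not fix its shape), the tree is a generic region tree rather than a full HAT, and all names (Node, reference_area, view_tree, candidate_roads) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)


@dataclass
class Node:
    region: Rect
    children: List["Node"] = field(default_factory=list)
    roads: List[str] = field(default_factory=list)  # filled for leaf nodes


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def reference_area(s, d) -> Rect:
    """Assumed reference area: bounding rectangle of positions s and d."""
    return (min(s[0], d[0]), min(s[1], d[1]), max(s[0], d[0]), max(s[1], d[1]))


def view_tree(node: Node, ref: Rect) -> Optional[Node]:
    """Copy only the nodes whose region touches the reference area."""
    if not intersects(node.region, ref):
        return None
    kept = []
    for child in node.children:
        sub = view_tree(child, ref)
        if sub is not None:
            kept.append(sub)
    return Node(node.region, kept, list(node.roads))


def candidate_roads(view: Optional[Node]) -> set:
    """Roads stored in the searching area (a superset of those crossing ref)."""
    if view is None:
        return set()
    found = set(view.roads)
    for child in view.children:
        found |= candidate_roads(child)
    return found
```

Recomputing the view as the user advances yields progressively smaller trees, which matches the memory argument made above.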
Fig. 9.3 Reference area and searching area
Based on the structure of the view tree, we adopt a method similar to the drill-down process in OLAP, named hierarchy search. The hierarchy search is a top-down process that always chooses the region with better traffic conditions as the mid region to go through, until it finds the resulting path. Considering a movement from S to D, the user may go through different regions along different paths. Several cases can be distinguished. The first is when S and D belong to the same best region, which means they are in the same leaf node. In this case, a direct call to the Dijkstra search suffices. The second case is when S and D belong to the same region, but at a higher level of the tree. We then descend to child nodes until S and D appear in different sub-regions or in a leaf node. In the other cases, when S and D belong to two different regions (denoted as Rs and Rd), we should find mid regions from S to D by selecting a sub-region (denoted as Rm) providing the rough direction Rs → Rm → Rd. Since each region is partitioned into four, Rs and Rd are either adjacent to each other, or separated by one region of the same level, as shown in Fig. 9.4. If Rs and Rd are adjacent, the algorithm makes recursive calls to HierarchySearch, starting from their corresponding nodes. If they are not adjacent, the recursive calls include one of the two intermediate sub-regions, chosen according to each region's aggregate value. For instance, in Fig. 9.4, suppose the density in R3 is smaller; then the determined direction is Rs → R3 → Rd.
Fig. 9.4 Different distribution of S and D
Notice that a path from a region $R_s$ to an adjacent region $R_d$ necessarily passes through a road that crosses their common border. Suppose $r_1, \ldots, r_m$ are the $m$ roads spanning $R_s$ and $R_d$. Then $R_s \to R_d$ can be concretely transformed into $m$ selections: $S \to (r_i.x_1, r_i.y_1) \to (r_i.x_2, r_i.y_2) \to D$ $(1 \le i \le m)$, where the endpoint $(r_i.x_1, r_i.y_1)$ is in $R_s$ and the other endpoint $(r_i.x_2, r_i.y_2)$ is in $R_d$. Each selection results in a different path (denoted as $path_i$). For each selection, we issue the same hierarchy search under node $N$ to find the optimal path (denoted as $path_i.subpath_1$) from $S$ to $(r_i.x_1, r_i.y_1)$ and the optimal path (denoted as $path_i.subpath_2$) from $(r_i.x_2, r_i.y_2)$ to $D$. Then, we join $path_i.subpath_1$ and $path_i.subpath_2$ by $r_i$ to obtain the entire $path_i$. Finally, we select the $path_i$ of minimal cost as the final optimal path from $S$ to $D$.
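The selection among border-crossing roads can be sketched as follows. This is a simplified illustration rather than the full hierarchy search: the two sub-paths are found with a plain Dijkstra search inside each region instead of a recursive call, and the adjacency-list format and the names `dijkstra` and `best_path_via_border` are assumptions made for the example.

```python
import heapq


def dijkstra(adj, source, target):
    """Cheapest path in adj ({node: [(neighbor, cost), ...]}); (inf, []) if unreachable."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            path = [u]
            while u in prev:
                u = prev[u]
                path.append(u)
            return d, path[::-1]
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return float("inf"), []


def best_path_via_border(adj_s, adj_d, border_roads, s, d):
    """Try every border road (a in Rs, b in Rd, cost) and keep the cheapest joined path."""
    best = (float("inf"), [])
    for a, b, w in border_roads:
        c1, p1 = dijkstra(adj_s, s, a)   # sub-path 1: S to the endpoint in Rs
        c2, p2 = dijkstra(adj_d, b, d)   # sub-path 2: the endpoint in Rd to D
        total = c1 + w + c2
        if total < best[0]:
            best = (total, p1 + p2)      # joined by the border road (a, b)
    return best


adj_s = {"S": [("a1", 2.0), ("a2", 1.0)]}
adj_d = {"b1": [("D", 1.0)], "b2": [("D", 5.0)]}
roads = [("a1", "b1", 1.0), ("a2", "b2", 1.0)]
cost, path = best_path_via_border(adj_s, adj_d, roads, "S", "D")
```

Here the first border road wins with a total cost of 4.0, although the second one is closer to S.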
9.3.3 Dynamic Navigation System Architecture Adopting the aforementioned index structure and navigation method, we have designed and implemented an intelligent city traffic control system, named DyNSA (Dynamic Navigation System based on moving objects stream Aggregation), with the aim of providing high quality of dynamic navigation services. An overview of this system architecture is shown in Fig. 9.5.
Fig. 9.5 System architecture of DyNSA
This system consists of multiple managers: the Traffic Information Receiver (TIR), the Traffic Information Manager (TIM), and the Query Processor. TIR is an information receiver that continuously sends traffic information to TIM. In TIM, the aggregated information of each road segment is refreshed regularly according to the current traffic information, and the region aggregation on each hierarchy level of HAT is recalculated accordingly. The Query Processor is in charge of users' navigation requests. When a navigation request arrives, it is sent to a View Manager, which creates a corresponding view tree on HAT. The Service Agent then performs the view-based hierarchy search on it, and finally the optimal path is returned to the user. Since the View Manager keeps the view tree consistent with HAT, the path is recalculated when necessary and the updated result is sent to the user until he/she arrives at the destination. The underlying index structures of TIM and the Query Processor are both based on HAT.
9.4 Summary We believe that pervasive, wireless, mobile computing is a revolutionary development. Location-based services and mobile resource management are some of the initial developments that this revolution has resulted in. This chapter has presented the applications of moving objects management and their functionalities. In particular, we introduced the application of dynamic traffic navigation and proposed a system architecture, DyNSA, that enables the dynamic navigation service.
References
1. Dijkstra EW (1959) A Note on Two Problems in Connection with Graphs. Numerische Mathematik 1:269-271
2. Deo N, Pang CY (1984) Shortest Path Algorithms: Taxonomy and Annotation. Networks 14(2):275-323
3. Fu L, Rilett LR (1996) Expected Shortest Paths in Dynamic and Stochastic Traffic Networks. Transportation Research Part B: Methodological 32(7):499-516
4. Nievergelt J, Hinterberger H (1984) The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Transactions on Database Systems 9(1):38-71
5. Tayeb J, Ulusoy O, Wolfson O (1998) A Quadtree-Based Dynamic Attribute Indexing Method. The Computer Journal 41(3):185-200
6. Wolfson O, Mena E (2005) Applications of Moving Objects Databases. Spatial Databases: Technologies, Techniques and Trends, Idea Group, 2005, pp 186-203
Chapter 10
Dynamic Transportation Networks
Xiaofeng Meng, Jidong Chen
[email protected],
[email protected] Renmin University of China, EMC Research China Abstract In this chapter, another application, a new moving objects database system, moving objects on dynamic transportation networks (MODTN), is proposed. In the MODTN system, moving objects are modeled as moving graph points that move only within predefined transportation networks. To express general events of the system, such as traffic jams, temporary constructions, insertion and deletion of junctions or routes, the underlying transportation networks are modeled as dynamic graphs so that the state and the topology of the graph system at any time instant can be tracked and queried. Besides, to track the location of network constrained moving objects, a location update mechanism is provided, and the corresponding uncertainty management issues are analyzed. The content of this chapter is mainly from the work of Ding in [1]. Key words: moving object databases, moving object application, location-based services, traffic management, dynamic traffic network
10.1 Introduction The modeling of moving objects on dynamic transportation networks is composed of two relatively independent steps. The first step is the modeling of the underlying transportation networks. Since the transportation networks can be subject to discrete changes over time, they should be modeled as “dynamic” graphs that allow us to express state changes (such as traffic jams and blockages caused by temporary constructions) and topology changes (such as insertion and deletion of junctions or routes). For simplicity, “dynamic transportation networks” and “dynamic graphs” will be used interchangeably throughout this chapter.
In modeling dynamic graphs, we utilize a state-based method. The basic idea is to associate a temporal attribute to every route or junction of the graph system so that the state of the route or junction at any time instant can be retrieved [1]. Since the changes to the graph system are discrete, we can use a series of temporal units to represent a temporal attribute with each temporal unit describing one single state of the route or junction during a certain time period. As a result, the whole spatio-temporal history of the graph system can be stored and queried. The second step is the modeling of moving objects on the graph system, which has been handled in the first step. Since in most cases a moving object can be viewed as a point, moving objects are modeled as moving graph points. A moving graph point is a function from time to graph point, which can be represented as a group of moving units in the discrete model. The methodology proposed in this chapter can be easily extended to deal with more complicated situations where moving objects need to be modeled as moving graph lines or moving graph regions.
10.2 The System Architecture In the following discussion, we suppose that in the whole MOD system, there can be multiple graphs coexisting while each graph is composed of a set of routes and a set of junctions. For each route, its geometry is described by a polyline so that it can actually assume an arbitrary shape instead of just a straight line. A junction connects two or more routes of the graph system. The connected routes can come from one graph (in this case, the junction is called “in-graph junction”), or belong to different graphs (in this case, the junction is called “intergraph junction”). A junction can be located in the middle of a route, or at the beginning/end of the route. Figure 10.1 illustrates a graph system composed of two graphs.
Fig. 10.1 Graph system consisting of two graphs
Moving objects can move inside one graph and transfer from one route to another via in-graph junctions. They can also transfer from one graph to another via inter-graph junctions. In the system, both moving objects and the underlying transportation networks are dynamic: moving objects change their locations continuously, while transportation networks change their states and topologies discretely. To illustrate the above ideas, we give an example that shows how a modern logistic system works. We suppose that in such a system, transportation vehicles are uniquely identified and each of them is equipped with a portable computing platform and other integrated location tracking equipment so that its location at any time instant can be retrieved. We assume that such a logistic system, which is responsible for cargo delivery services, exists in the Verkehrsverbund Rhein-Ruhr (VRR) area of Germany. The whole highway network of the VRR area is expressed as a graph in the database. Besides, the street network of each city in this area is also stored as an independent graph. As a result, the whole system is composed of multiple graphs that can overlap each other, as shown in Fig. 10.2.
Fig. 10.2 Architecture of the MODTN system
A certain vehicle can move either by highway between two cities, or by street inside a city, during its whole journey. Therefore, it can pass through several different graphs during one trip. Since both historical and current location information are stored in the database, the system can support the following queries: “tell me the location of vehicle x310 at 2:00 PM of last Friday” and “find all vehicles that are currently in the Hagener street”. Besides, since the general events (such as traffic jams, car accidents, and insertion and deletion of routes or junctions) of the graph system are stored in dynamic graphs, the following queries: “tell me the topology of the Hagen street network at time t” and “find the current traffic jam in the Hagener street and the moving objects affected by it” can also be handled. To speed up query processing, both moving objects and dynamic graphs should be indexed. The database records and the index structures contain location information covering a time period from the past until the future. Therefore, when location updates occur, both database records and the index need to be modified.
10.3 Data Model of Transportation Network and Moving Objects
Let us first consider the transportation networks. In the MODTN model, the whole transportation network is modeled as a dynamic graph system.
Definition 10.1. A dynamic graph system, $GS$, is defined as a set of dynamic graphs and inter-graph junctions:
$$GS = \{G_1, G_2, \ldots, G_n, j^*_1, j^*_2, \ldots, j^*_m\} \tag{10.1}$$
where $n \ge 1$, $m \ge 0$, $G_i$ $(1 \le i \le n)$ is a dynamic graph, and $j^*_k$ $(1 \le k \le m)$ is an inter-graph junction (see Definition 10.5).
Definition 10.2. A dynamic graph $G$ is defined as a pair:
$$G = (R, J) \tag{10.2}$$
where $R$ is a set of dynamic routes and $J$ is a set of dynamic in-graph junctions.
Definition 10.3. A dynamic route of graph $G$, denoted by $r$, is defined as follows:
$$r = (gid, rid, route, len, fdr, tp) \tag{10.3}$$
where $gid$ and $rid$ are identifiers of $G$ and $r$, respectively, $route$ is a polyline that describes the geometry of $r$, $len$ is the length of the route, $fdr \in \{0, 1, 2\}$ specifies the traffic flow directions allowed in the route, and $tp$ is the temporal attribute (see Definition 10.6) associated with $r$.
The polyline $route$ in the above definition can be defined as a series of points in the Euclidean space. For simplicity, we suppose that the graph system is spatially embedded in the $X \times Y$ plane so that the polyline can be represented as a series of points in the $X \times Y$ plane. The polyline is considered directed, with its direction running from the first vertex to the last vertex, which are called the beginning point (or 0-end) and the end point (or 1-end) of the route, respectively. The traffic flow directions allowed in a route are specified by $fdr$, whose values 0, 1, and 2 correspond to "from 0-end to 1-end", "from 1-end to 0-end", and "both directions allowed", respectively.
Definition 10.4. A dynamic in-graph junction of graph $G$, denoted by $j$, is defined as follows:
$$j = (gid, jid, loc, ((rid_i, pos_i))_{i=1}^{n}, m, tp) \tag{10.4}$$
where $gid$ and $jid$ are identifiers of $G$ and $j$, respectively, $loc$ is the location of $j$, which can be represented as a point value in the $X \times Y$ plane, $((rid_i, pos_i))_{i=1}^{n}$ describes the routes connected by $j$, $m$ is the connectivity matrix of $j$, and $tp$ is the temporal attribute associated with $j$.
$(rid_i, pos_i)$ $(1 \le i \le n)$ in the above definition indicates the $i$th route connected by $j$, where $rid_i$ is the identifier of the route and $pos_i \in [0, 1]$ describes the position of the junction inside the route. We suppose that the total length of any route is 1, so that every location in the route can be represented by a real number $p \in [0, 1]$.
The matrix $m$ describes the connectivity of the junction. It contains the possible matches of traffic flows in the routes connected by the junction, and the element value associated with each match is either 0 or 1, which indicates whether moving objects can transfer from the "in" traffic flow to the "out" traffic flow through this junction, as shown in Fig. 8.1. As illustrated in Fig. 8.1, route $r_1$ allows moving objects to run in both directions so that it has two traffic flows, $r_1^+$ and $r_1^-$. Route $r_2$ is a one-way street so that it has only one traffic flow, $r_2^-$. These three traffic flows can have nine combinations, and the value corresponding to each combination describes the transferability between the two traffic flows.
Definition 10.5. A dynamic inter-graph junction, denoted by $j^*$, is defined as follows:
$$j^* = (jid, loc, ((gid_i, rid_i, pos_i))_{i=1}^{n}, m, tp) \tag{10.5}$$
The definition of the inter-graph junction is very similar to that of the in-graph junction. The 3-tuple $(gid_i, rid_i, pos_i)$ $(1 \le i \le n)$ describes the routes connected by $j^*$, which can come from different graphs.
Definition 10.6. The temporal attribute associated with a junction or a route, denoted by $tp$, describes the state history of the junction or route, and is defined as a sequence of the following form:
$$tp = ((I_i, s_i))_{i=1}^{n} \tag{10.6}$$
where $I_i$ is a time interval and $s_i$ is the state (see Definition 10.7) of the junction or route during $I_i$. $(I_i, s_i)$ $(1 \le i \le n)$ is called the $i$th temporal unit of $tp$. For a valid temporal attribute, the following conditions should be satisfied:
1. $\forall i, j \in \{1, \ldots, n\}, i \ne j : I_i \cap I_j = \emptyset$
2. $\forall i \in \{1, \ldots, n-1\} : I_i \prec I_{i+1}$ ($\prec$ means "before" in the time series)
3. $\bigcup_{i=1}^{n} I_i = [\min(I_1), \max(I_n)]$
For a certain temporal unit $(I_i, s_i)$ $(1 \le i \le n)$, $I_i$ is composed of two time instant values, $\min(I_i)$ and $\max(I_i)$, which indicate the starting point and the endpoint of $I_i$, respectively. $\min(I_i)$ must be a defined value while $\max(I_i)$ can be either defined or undefined. If $\max(I_i)$ is the "undefined" value $\perp$, then $(I_i, s_i)$ is called an open temporal unit. Otherwise it is called a closed temporal unit. Semantically, $\perp$ means "until now". Therefore, if a junction or route is still active in the transportation network, its temporal attribute will contain exactly one open temporal unit, which forms its last temporal unit. Otherwise, if it has already been deleted from the transportation network, its temporal attribute will only contain closed temporal units. The insertion and deletion time of a junction or a route can be determined from $\min(I_1)$ and $\max(I_n)$, respectively. Figure 10.3 illustrates an example of a temporal attribute value.
Definition 10.7. A state of a junction or a route, denoted by $s$, is defined as follows:
$$s = (\sigma, (br_i, BP_i)_{i=1}^{n}) \tag{10.7}$$
where $\sigma \in \{opened, closed, blocked\}$. If $\sigma = blocked$, then $s$ must be associated with a route, and $(br_i, BP_i)_{i=1}^{n}$ is needed in this situation to describe the blockages of the route, where $br_i$ describes the reason (traffic-jam, construction, traffic-control, etc.) and $BP_i \subseteq [0, 1]$ describes the location of the $i$th blockage of the route. In the above definition, we assume that the location of the blockage is static so that it can be expressed as a closed interval over $[0, 1]$, whose boundaries indicate the location of the borders of the blockage.
Fig. 10.3 The corresponding temporal units
In dynamic transportation networks, a junction can have two states: opened and closed, and a route can have three states: opened, closed, and blocked. If a junction or a route is opened, then it is entirely available to moving objects. If a junction or a route is closed, then it is entirely unavailable to moving objects, which means that no moving objects are allowed to stay or move on any part of it. A closed junction or route is not deleted from the system. Instead, it is only temporarily unavailable to moving objects and can be reopened afterwards. The blocked state describes a route that is partially available to moving objects, that is, the unblocked part of the route is still available to moving objects, but no moving objects can move through the blocked part. Figure 10.4 shows an example of a blocked route.
Fig. 10.4 A blocked route with moving objects
In the dynamic graph system, since every junction or route has a temporal attribute associated with it, we can know its state at any given time instant. This is very useful in moving objects databases since a lot of queries can only be processed efficiently by accessing the states of the transportation networks. For instance, “please tell me all the routes currently blocked by traffic jams and the moving objects affected by them”. Besides, through the temporal attribute, we can also know the life span of any junction or route of the graph system so that the topology changes of the transportation networks can also be expressed and queried. For instance, “find the shortest path from a to b at time instant t”.
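The state lookup enabled by the temporal attribute can be sketched in a few lines. The representation below is an assumption made for illustration: each temporal unit is a tuple `(start, end, state)` with numeric time instants, intervals are treated as half-open `[start, end)`, and `None` stands for the undefined value ⊥ of an open temporal unit. The function names `is_valid` and `state_at` are hypothetical.

```python
from bisect import bisect_right


def is_valid(tp):
    """Check the validity conditions: ordered, disjoint, gap-free units, open unit only last."""
    if not tp:
        return False
    for k, (start, end, _state) in enumerate(tp):
        is_last = k == len(tp) - 1
        if end is None:
            if not is_last:          # an open unit may only be the last one
                return False
        else:
            if end <= start:         # empty or reversed interval
                return False
            if not is_last and tp[k + 1][0] != end:
                return False         # gap or overlap with the next unit
    return True


def state_at(tp, t):
    """Return the state at time t, or None outside the life span of the junction or route."""
    starts = [unit[0] for unit in tp]
    k = bisect_right(starts, t) - 1
    if k < 0:
        return None                  # before insertion
    _start, end, state = tp[k]
    if end is not None and t >= end:
        return None                  # after deletion
    return state


route_tp = [(0, 10, "opened"), (10, 15, "blocked"), (15, None, "opened")]
deleted_tp = [(0, 10, "opened")]
```

With this representation, a query such as "which routes are blocked at time t" reduces to one binary search per route.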
Based on the above definitions for dynamic transportation networks, we can then define some useful data types, graph point, graph route section, graph line, and graph region, which form the basis for the modeling and querying of moving objects. Let $graph(gid)$, $junct(jid)$, $junct(gid, jid)$, and $route(gid, rid)$ be functions which return the graph, the junction, and the route corresponding to the specified identifiers, respectively.
Definition 10.8. A graph point is a point residing in the graph system. The set of graph points of graph system $GS$, denoted by $GP$, is defined as follows:
$$\begin{aligned} GP = {} & \{jid^* \mid junct(jid^*) \in interjuncts(GS)\} \\ & \cup \{(gid, jid) \mid junct(gid, jid) \in injuncts(GS)\} \\ & \cup \{(gid, rid, pos) \mid route(gid, rid) \in routes(GS), pos \in [0, 1]\} \end{aligned} \tag{10.8}$$
where $interjuncts(GS)$, $injuncts(GS)$, and $routes(GS)$ are the set of inter-graph junctions, the set of in-graph junctions, and the set of routes of $GS$, respectively.
Definition 10.9. A graph route section is a part of a route. The set of graph route sections of graph system $GS$, denoted by $GRS$, can be defined as follows:
$$GRS = \{(gid, rid, S) \mid route(gid, rid) \in routes(GS), S \subseteq [0, 1]\} \tag{10.9}$$
Definition 10.10. A graph line is a polyline inside the graph system, which can be defined by just specifying the vertices of the polyline. The set of graph lines of graph system $GS$, denoted by $GL$, can be defined in the following form:
$$\begin{aligned} GL = \{ & (vertex_i)_{i=1}^{n} \mid n \ge 2,\ vertex_i = (gid_i, rid_i, pos_i) \in GP, \text{ and} \\ & (1)\ \forall i \in \{1, \ldots, n-1\} : route(gid_i, rid_i) = route(gid_{i+1}, rid_{i+1}) \\ & \qquad \lor\ Eucl(vertex_i) = Eucl(vertex_{i+1}) = Eucl(getjunct(vertex_i, vertex_{i+1})) \\ & (2)\ \forall i \in \{1, \ldots, n-1\} : viable(vertex_i, vertex_{i+1}) \} \end{aligned} \tag{10.10}$$
In the above definition, the function $Eucl(gp)$ returns the Euclidean value of a graph point, and $getjunct(vertex_i, vertex_{i+1})$ returns the junction in which $vertex_i$ and $vertex_{i+1}$ are located. $viable(vertex_i, vertex_{i+1})$ means that through $route(gid_i, rid_i)$ or $getjunct(vertex_i, vertex_{i+1})$, moving objects can transfer from $vertex_i$ to $vertex_{i+1}$. Graph lines are considered as directed paths in the transportation network, whose directions are determined by the vertex series.
Definition 10.11. A graph region is defined as a set of junctions and a set of route sections. The set of graph regions of the graph system $GS$, denoted by $GR$, is defined as follows:
$$GR = \{(V, W) \mid V \subseteq GJ, W \subseteq GRS\} \tag{10.11}$$
where $GJ = \{jid^* \mid junct(jid^*) \in interjuncts(GS)\} \cup \{(gid, jid) \mid junct(gid, jid) \in injuncts(GS)\}$. In contrast to the graph line, a graph region can contain an arbitrary set of graph route sections.
Based on the above definitions of network-constrained data types, we can then model moving objects on the dynamic graph system. In MODTN, moving objects are modeled as moving graph points.
Definition 10.12. A moving graph point, $mgp$, is defined as a function from time to graph point, that is:
$$mgp = f : T \to GP \tag{10.12}$$
where $T$ is the domain of time, and $GP$ is the domain of graph points of the graph system.
In the above definitions, most data types are defined discretely so that they can be implemented directly. The only exception is the moving graph point data type. In implementation, Definition 10.12 should be translated into a discrete representation. That is, a moving graph point is expressed as a sequence of moving units, and each moving unit describes one single moving pattern of the moving object for a certain period of time.
Definition 10.13. A discrete representation of a moving graph point, $dmgp$, is defined as a sequence:
$$dmgp = ((t_i, (gid_i, rid_i, pos_i), vm_i))_{i=1}^{n} \tag{10.13}$$
where $t_i$ is a time instant, $(gid_i, rid_i, pos_i) = gp_i$ is a graph point describing the location of the moving object at time $t_i$, and $vm_i$ is the speed measure of the moving object at time $t_i$. $(t_i, (gid_i, rid_i, pos_i), vm_i) = \mu_i$ is called the $i$th moving unit of $dmgp$.
The speed measure $vm$ is a real number. Its absolute value is equal to the speed of the moving object, while its sign (either positive or negative) depends on the direction of the moving object. If the moving object is moving from the 0-end towards the 1-end, then the sign is positive. Otherwise, if it is moving from the 1-end to the 0-end, the sign is negative. For a valid discrete representation of a moving graph point, the following conditions should be met:
1. $\forall i \in \{1, \ldots, n-1\} : route(gid_i, rid_i) = route(gid_{i+1}, rid_{i+1}) \lor Eucl(gp_i) = Eucl(gp_{i+1}) = Eucl(getjunct(gp_i, gp_{i+1}))$;
2. $\forall i \in \{1, \ldots, n-1\} : viable(gp_i, gp_{i+1})$;
3. $\forall i \in \{1, \ldots, n-1\} : t_i < t_{i+1}$, and the moving object is assumed to move at even speed between $t_i$ and $t_{i+1}$.
Figure 10.5 gives an example of a discrete representation of a moving graph point. In Definition 10.13, we assume that between two consecutive moving units, moving objects move at even speed. As a result, the location of moving objects can be computed by interpolation. However, this is only an ideal situation. In real-world MOD applications, the moving units are generated by location updates, and the moving object is only moving roughly at even speed between two moving units. As a result, the location of moving objects is uncertain between two location updates, and we have to take the uncertainty problem into consideration. When uncertainty management is considered, the above definition of moving graph point is actually interpreted as a "moving route section". In this chapter, we still call this definition "moving graph point" to maintain consistency.
Fig. 10.5 An example moving graph point value
For a running moving object, its last moving unit contains predicted information. We call the last moving unit “active moving unit”, which contains key information for location update strategies.
10.4 Querying Moving Objects in Transportation Networks
In this section, we discuss how the location of a moving object can be computed from its moving units. We suppose that the corresponding moving graph point value of an active moving object, $mo$, is as follows:
$$mgpoint = ((t_i, (gid_i, rid_i, pos_i), vm_i))_{i=1}^{n} \tag{10.14}$$
Let $\mu_i = (t_i, (gid_i, rid_i, pos_i), vm_i)$ $(1 \le i \le n)$ be the $i$th moving unit of the moving object, and $gp_i = (gid_i, rid_i, pos_i)$ be the location of the moving object at time $t_i$. For simplicity, we assume that the speed measure $vm_i$ is positive, which means that the moving object is moving from the 0-end towards the 1-end along $route(gid_i, rid_i)$. The methodology can be easily adapted to the situation when the speed measure is negative. Let $v_i^{max} = vm_i + \psi$ and $v_i^{min} = vm_i - \psi$, where $\psi$ is the speed threshold.
10.4.1 Computing the Locations Through Interpolation The location of a moving object can be computed through interpolation. By interpolation, the move of the moving object between any two location updates is approximated to an even speed move. For the query: “where is moving object mo at time tq ?”, the answer, which is a graph point gpq = (gidq , ridq , posq ), can be computed in the following way. • Case 1. ∃i ∈ {1, ..., n} : tq = ti In this case, tq happens to be a location update time, and the location information contained in μi can be returned directly as the result. That is, gpq = (gidi , ridi , posi ).
• Case 2. $\exists i \in \{1, \ldots, n-1\} : t_i < t_q < t_{i+1}$ In this case, $t_q$ is between two consecutive location updates, and the corresponding moving units $\mu_i$, $\mu_{i+1}$ need to be further checked. If $route(gid_i, rid_i) = route(gid_{i+1}, rid_{i+1})$, then according to the location update policies, we can be assured that the moving object is on $route(gid_i, rid_i)$ at time $t_q$, and its location at $t_q$ is a graph point $gp_q = (gid_i, rid_i, pos_q)$ where $pos_q$ can be computed with the following formula:
$$pos_q = pos_i + \frac{pos_{i+1} - pos_i}{t_{i+1} - t_i} \times (t_q - t_i) \tag{10.15}$$
If $route(gid_i, rid_i) \ne route(gid_{i+1}, rid_{i+1})$, then from the location update strategies we know that $\mu_i$ and $\mu_{i+1}$ are generated by an update, $Eucl(gp_i) = Eucl(gp_{i+1})$, and at time $t_q$ the moving object is in junction $getjunct(gp_i, gp_{i+1})$. The graph point corresponding to this junction will be returned as the final result.
• Case 3. $t_n < t_q \le t_{now}$ In this case, we know that the moving object is still on $route(gid_n, rid_n)$. Otherwise, there would be an update triggered after $t_n$. Therefore, the location of the moving object at time $t_q$ is a graph point $gp_q = (gid_n, rid_n, pos_q)$, where $pos_q$ can be computed as follows:
$$pos_q = pos_n + vm_n \times (t_q - t_n) \tag{10.16}$$
By computing the location of a moving object through interpolation, the location of the moving object at any time instant can be simply presented as a graph point. As a result, the query processing mechanism and the query language of the MOD system can be simplified. Besides, the location update mechanism can also be simplified. The result from the above computing method is only an approximate description of the actual location, and the error introduced is closely related to the distance threshold ξ .
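The three interpolation cases can be sketched as a single lookup function. The tuple layout `(t, gid, rid, pos, vm)`, the function name `locate`, and the choice to return the graph point of the later moving unit as a stand-in for the junction in the transfer case are assumptions made for this illustration; speeds are assumed to be expressed in route fractions per time unit.

```python
from bisect import bisect_right


def locate(units, tq, t_now):
    """Approximate graph point (gid, rid, pos) at time tq from moving units."""
    times = [u[0] for u in units]
    if tq < times[0] or tq > t_now:
        raise ValueError("tq lies outside the recorded history")
    k = bisect_right(times, tq) - 1
    t, gid, rid, pos, vm = units[k]
    if tq == t:                       # Case 1: tq is a location update time
        return (gid, rid, pos)
    if k == len(units) - 1:           # Case 3: extrapolate from the last unit
        return (gid, rid, pos + vm * (tq - t))
    t2, gid2, rid2, pos2, _vm2 = units[k + 1]
    if (gid, rid) == (gid2, rid2):    # Case 2, same route: linear interpolation
        return (gid, rid, pos + (pos2 - pos) / (t2 - t) * (tq - t))
    # Case 2, route change: both units sit at the junction; return the later one
    return (gid2, rid2, pos2)


units = [
    (0.0, "g", "r1", 0.0, 0.05),
    (10.0, "g", "r1", 0.4, 0.04),
    (25.0, "g", "r1", 1.0, 0.0),
    (26.0, "g", "r2", 0.0, 0.05),
]
mid = locate(units, 5.0, 30.0)        # between two updates on the same route
last = locate(units, 30.0, 30.0)      # after the last update
```

The sketch does not clamp extrapolated positions to [0, 1]; by the argument of Case 3, an update would have been triggered before the object leaves the route.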
10.4.2 Querying Moving Objects with Uncertainty Even though in a lot of MOD applications, the interpolation technique described is sufficient, a better solution is to take the uncertainty brought about by the location update policy into consideration. As stated in [3], the location of a moving object other than location update time is actually uncertain. Therefore, we should introduce the concept of “possible location” in presenting the location of the moving object instead of just expressing it as a precise point. In MODTN, the uncertainty management problem can be better solved because the possible location of a moving object at any historical or present time instant is reduced to a route section (See Fig. 10.6). In the following discussion, we suppose that ξ , vimax , and vimin have already been transformed to the [0, 1] scope according to the length of the corresponding route. Besides, we will focus on the uncertainty caused by the sampling method alone so that we assume the uncertainty caused by other factors to be negligible.
Fig. 10.6 Possible locations of a moving object
• Case 1. $\exists i \in \{1, \ldots, n\} : t_q = t_i$ In this case, the possible location of the moving object is a graph point $gp_q = (gid_i, rid_i, pos_i)$.
• Case 2. $\exists i \in \{1, \ldots, n-1\} : t_i < t_q < t_{i+1}$ If $route(gid_i, rid_i) = route(gid_{i+1}, rid_{i+1})$, then we can be assured that the moving object is on $route(gid_i, rid_i)$, and its possible position is a graph route section $grs_q = (gid_i, rid_i, seg_q)$ where $seg_q \subseteq [0, 1]$ satisfies the following conditions:
1. $seg_q \subseteq [pos_q^* - \xi, pos_q^* + \xi]$, where $pos_q^* = pos_i + vm_i \times (t_q - t_i)$. Otherwise there would be an update between $t_i$ and $t_{i+1}$;
2. $seg_q \subseteq [pos_i + v_i^{min} \times (t_q - t_i), pos_i + v_i^{max} \times (t_q - t_i)]$, where $v_i^{min} \times (t_q - t_i)$ and $v_i^{max} \times (t_q - t_i)$ are the shortest and the longest distances the moving object can cover during $\Delta t$ $(\Delta t = t_q - t_i)$ without triggering an update;
3. $seg_q \subseteq [pos_{i+1} - v_i^{max} \times (t_{i+1} - t_q), pos_{i+1} - v_i^{min} \times (t_{i+1} - t_q)]$. Otherwise, the moving object would not be able to arrive at $gp_{i+1}$ in time without triggering a speed-triggered location update.
To sum up, since $seg_q$ must satisfy all of these conditions, the possible location of the moving object is their intersection:
$$\begin{aligned} seg_q = {} & [0, 1] \cap [pos_q^* - \xi, pos_q^* + \xi] \\ & \cap [pos_i + v_i^{min} \times (t_q - t_i), pos_i + v_i^{max} \times (t_q - t_i)] \\ & \cap [pos_{i+1} - v_i^{max} \times (t_{i+1} - t_q), pos_{i+1} - v_i^{min} \times (t_{i+1} - t_q)] \end{aligned} \tag{10.17}$$
where $pos_q^* = pos_i + vm_i \times (t_q - t_i)$. If $route(gid_i, rid_i) \ne route(gid_{i+1}, rid_{i+1})$, then we know that $\mu_i$ and $\mu_{i+1}$ are generated by an update. In this case, the possible location of the moving object is the graph point corresponding to $getjunct(gp_i, gp_{i+1})$.
• Case 3. $t_n < t_q \le t_{now}$ In this case, we know that the moving object is still on $route(gid_n, rid_n)$. Otherwise, there would be an update triggered after the last location update. Therefore, the location of the moving object at time $t_q$ is a graph route section $grs_q = (gid_n, rid_n, seg_q)$ where $seg_q \subseteq [0, 1]$ satisfies the following conditions:
1. $seg_q \subseteq [pos_q - \xi, pos_q + \xi]$, where $pos_q = pos_n + vm_n \times (t_q - t_n)$. Otherwise, there would be an update triggered after $t_n$;
2. $seg_q \subseteq [pos_n + v_n^{min} \times (t_q - t_n), pos_n + v_n^{max} \times (t_q - t_n)]$. Otherwise, there would be an update triggered after $t_n$.
Therefore, $seg_q$ can be computed as follows:
$$seg_q = [0, 1] \cap [pos_q - \xi, pos_q + \xi] \cap [pos_n + v_n^{min} \times (t_q - t_n), pos_n + v_n^{max} \times (t_q - t_n)] \tag{10.18}$$
where $pos_q = pos_n + vm_n \times (t_q - t_n)$. By computing the location of the moving object with uncertainty involved, the moving graph point defined in Definition 10.13 is actually interpreted as a moving graph route section. Besides, when uncertainty is involved, we need to adapt related operations to the uncertainty context. For instance, the inside operation can be extended to inside possibly and inside definitely, as stated in [2].
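The possible route section can be computed by intersecting the bounding intervals listed in Cases 2 and 3. The sketch below assumes a positive speed measure, assumes that ξ, ψ, and all speeds are already scaled to the [0, 1] route range, and uses the hypothetical names `intersect` and `possible_segment`; omitting the next moving unit gives the Case 3 variant.

```python
def intersect(*intervals):
    """Intersection of closed intervals (lo, hi); None if it is empty."""
    lo = max(a for a, _ in intervals)
    hi = min(b for _, b in intervals)
    return (lo, hi) if lo <= hi else None


def possible_segment(pos_i, vm_i, t_i, tq, xi, psi, pos_next=None, t_next=None):
    """Possible positions at tq given unit i and, optionally, the following unit."""
    dt = tq - t_i
    v_min, v_max = vm_i - psi, vm_i + psi
    expected = pos_i + vm_i * dt
    bounds = [
        (0.0, 1.0),                                   # the route itself
        (expected - xi, expected + xi),               # distance threshold
        (pos_i + v_min * dt, pos_i + v_max * dt),     # reachable from unit i
    ]
    if pos_next is not None:
        rest = t_next - tq
        # positions from which the next update location is still reachable
        bounds.append((pos_next - v_max * rest, pos_next - v_min * rest))
    return intersect(*bounds)


between = possible_segment(0.2, 0.02, 0.0, 5.0, xi=0.05, psi=0.005,
                           pos_next=0.42, t_next=10.0)
after_last = possible_segment(0.2, 0.02, 0.0, 5.0, xi=0.05, psi=0.005)
```

In this example the knowledge of the next update narrows the section from roughly [0.275, 0.325] to roughly [0.295, 0.325].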
10.4.3 Location Prediction in Transportation Networks Since we assume that moving objects can only move inside the predefined transportation networks, we can predict their future locations more accurately. Suppose the query is “tell me the location of moving object mo at tq (tq > tnow )”. The predicted location of the moving object can be computed in the following way (we assume that moving objects do not submit moving plans proactively). First, the algorithm needs to search in the graph system and to decide a foreseeable future path, f f p, according to the possible position of the moving object at time tnow . f f p is a graph line which starts from the computed position of the moving object at time tnow and finishes at the first junction after which multiple consecutive traffic flows exist, as shown in Fig. 10.7.
Fig. 10.7 Predicting future locations
Then, the system needs to predict a future speed for the moving object. This can be done either by using the speed contained in $\mu_n$ (if $t_q$ is not too far away from $t_{now}$), or by computing an average speed from the speed information contained in the moving units of the moving object. In the latter case, the speed can be computed as follows:
$$\bar{v} = \frac{\sum_{i=1}^{n-1} \left(|vm_i| \times (t_{i+1} - t_i)\right) + |vm_n| \times (t_{now} - t_n)}{t_{now} - t_1} \tag{10.19}$$
where $|vm|$ is the absolute value of $vm$. With $\bar{v}$ known, we can predict the distance the moving object can cover in $\Delta t^*$ time $(\Delta t^* = t_q - t_{now})$ as follows:
$$d = \bar{v} \times \Delta t^* \tag{10.20}$$
From ffp and $d$ we can get a graph point value $gp_q$, which is the predicted position of the moving object at time $t_q$. Depending on the application, $gp_q$ can be either presented directly as the final result or extended to a graph line value. In the latter case, we need to apply an error factor, which can be a function of $\Delta t^*$, to $gp_q$ so that the final result is a graph line value. In MODTN, we confine the prediction to the scope of the foreseeable future path ffp, since after ffp the moving object can take multiple possible directions, so that the number of its possible positions can explode, as shown in Fig. 10.7.
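The speed and distance prediction of Eqs. (10.19) and (10.20) can be illustrated with a minimal Python sketch. It assumes that the moving units are available as a time-ordered list of (start time, speed) pairs; this representation and the function names are assumptions made for the example, not the MODTN data structures.

```python
from typing import List, Tuple


def average_speed(units: List[Tuple[float, float]], t_now: float) -> float:
    """Time-weighted average of |vm_i| over [t_1, t_now], as in Eq. (10.19).

    units: (t_i, vm_i) pairs sorted by t_i; vm_i holds from t_i to t_{i+1}
    (the last one holds until t_now).
    """
    if not units:
        raise ValueError("at least one moving unit is required")
    t_1 = units[0][0]
    if t_now <= t_1:
        raise ValueError("t_now must be later than the first moving unit")
    covered = 0.0
    # Units 1 .. n-1: each speed holds until the next unit starts.
    for (t_i, vm_i), (t_next, _) in zip(units, units[1:]):
        covered += abs(vm_i) * (t_next - t_i)
    # Unit n: its speed holds from t_n until now.
    t_n, vm_n = units[-1]
    covered += abs(vm_n) * (t_now - t_n)
    return covered / (t_now - t_1)


def predicted_distance(units: List[Tuple[float, float]],
                       t_now: float, t_q: float) -> float:
    """Distance covered between t_now and t_q, as in Eq. (10.20)."""
    return average_speed(units, t_now) * (t_q - t_now)
```

The predicted position $gp_q$ would then be obtained by advancing this distance along ffp from the current position, stopping at the end of ffp.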
10.5 Summary
In this chapter, we presented another application of moving objects databases: moving objects on dynamic transportation networks (MODTN). In the MODTN system, transportation networks are modeled as dynamic graphs and moving objects are modeled as moving graph points. The MODTN system has the following features: (1) the system supports logical road names, while queries based on Euclidean space can also be supported; (2) both historical and current location information can be queried, and the system can also support future location queries based on predicted information; (3) the uncertainty problem can be better managed, because the possible position of a moving object at any historical or present time instant is reduced to a moving route section; and (4) general events of the system, such as blockages and topology changes, can also be expressed, so that the system can deal with the interaction between the moving objects and the underlying transportation networks.
References
1. Ding Z, Güting RH (2004) Managing Moving Objects on Dynamic Transportation Networks. In: Proceedings of the 16th International Conference on Scientific and Statistical Database Management (SSDBM 2004), Santorini Island, Greece, pp 287-296
2. Trajcevski G, Wolfson O, Chamberlain S, Zhang F (2002) The Geometry of Uncertainty in Moving Objects Databases. In: Proceedings of the 8th International Conference on Extending Database Technology (EDBT 2002), Prague, Czech Republic, pp 233-250
3. Wolfson O, Xu B, Chamberlain S, Jiang L (1998) Moving Object Databases: Issues and Solutions. In: Proceedings of the 10th International Conference on Statistical and Scientific Database Management (SSDBM 1998), Capri, Italy, pp 111-122
Chapter 11
Clustering Analysis of Moving Objects
Jidong Chen¹, Xiaofeng Meng²
¹ EMC Research China, ² Renmin University of China
[email protected], [email protected]
Abstract In many moving objects management applications, real-time data analysis such as clustering analysis is becoming one of the most important requirements. Most spatial clustering algorithms deal with objects in Euclidean space. In many real-life applications, however, the accessibility of spatial objects is constrained by spatial networks (e.g., road networks). It is therefore more realistic to cluster objects in a road network. The distance metric in such a setting is the network distance, which requires an expensive shortest path computation over the network. The existing methods are not applicable to such cases. Therefore, we use the information of nodes and edges in the network to present two new static clustering algorithms that prune the search space and avoid unnecessary distance computations. In addition, we present the problem of clustering moving objects in spatial networks and propose a unified framework to address it. The goals are to optimize the cost of clustering moving objects and to support multiple types of clusters in a single application.
Key words: clustering, network distance, spatial networks, moving object
11.1 Introduction
Clustering is one of the most important analysis techniques. It groups similar data to provide a summary of data distribution patterns in a dataset. Early research mainly focused on clustering a static dataset [9, 12, 18, 4, 14, 7, 11, 5]. In recent years, there has been increasing research on clustering moving objects [10, 17, 8], which has various applications in domains such as weather forecasting, traffic jam prediction, and animal migration analysis, to name but a few. However, most existing work on clustering moving objects assumes a free movement space and defines the similarity between objects by their Euclidean distance. In the real world, objects move within spatially constrained networks, e.g., vehicles move on road networks and trains on railway networks. Thus, it is more practical to define the similarity between objects by their network distance, that is, the shortest path distance over the network. Therefore, by exploiting the unique features of road networks, two new clustering algorithms for static objects are presented, which use the information of nodes and edges in the network to prune the search space and avoid some unnecessary distance computations. For clustering moving objects in road networks, we propose a unified framework for “clustering moving objects in spatial networks” (CMON). Because the positions of moving objects change continuously, the clustering results change dynamically. By exploiting the unique features of road networks, the CMON framework first introduces the notion of a cluster block (CB) as the underlying clustering unit. We then divide the clustering process into the continuous maintenance of CBs and the periodical construction of clusters with different criteria based on CBs. Algorithms for efficiently maintaining and organizing the CBs to construct clusters are proposed.
11.2 Underlying Clustering Analysis Methods
The goal of clustering analysis is to divide a collection of objects into groups, such that the similarity between objects in the same group is high and objects from different groups are dissimilar. In spatial databases, objects are characterized by their position in the Euclidean space and, naturally, the dissimilarity between two objects is defined by their Euclidean distance. Several clustering techniques have been proposed for static datasets in a Euclidean space. They can be classified into partitioning [9, 12], hierarchical [18, 4, 14], density-based [11], grid-based [15, 1], and model-based [3] clustering methods. The generic definition of clustering is usually refined depending on the type of data to be clustered and the clustering objective. In other words, different clustering paradigms use different definitions and evaluation criteria. Partitioning methods divide the objects into k groups and iteratively exchange objects between them until the quality of the clusters does not improve further. k-means and k-medoids are representative methods of this class. In k-means algorithms, clusters are represented by a mean value (e.g., the Euclidean centroid of the points in the cluster), and the exchange of objects stops when the average distance from objects to their cluster's mean value converges to a minimum. k-medoids algorithms represent each cluster by an actual object in it. First, k medoids are chosen randomly from the dataset. An evaluation function sums the distances from all points to their nearest medoid. Then, a medoid is replaced by a random object, and the change is committed only if it results in a smaller value of the evaluation function. A local optimum is considered to be reached after a long sequence of unsuccessful replacements. This process is repeated for a number of initial random medoid sets, and the clusters are finalized according to the best local optimum found.
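The k-medoids procedure described above can be illustrated with a minimal Python sketch. It performs a single run from one random initial medoid set (the repetition over several initial sets is omitted), and the stopping parameter max_failures, the seed, and all function names are assumptions made for the example. Any distance function can be supplied, so the same sketch applies to Euclidean or network distance.

```python
import random
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def total_cost(points: Sequence[T], medoids: Sequence[T],
               dist: Callable[[T, T], float]) -> float:
    """Evaluation function: sum of distances from each point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)


def k_medoids(points: List[T], k: int, dist: Callable[[T, T], float],
              max_failures: int = 200, seed: int = 0) -> Tuple[List[T], float]:
    """One run of randomized k-medoids with single-medoid replacements."""
    rng = random.Random(seed)
    medoids = rng.sample(points, k)            # k random initial medoids
    cost = total_cost(points, medoids, dist)
    failures = 0
    # Stop after a long sequence of unsuccessful replacements.
    while failures < max_failures:
        i = rng.randrange(k)                   # medoid to replace
        candidate = rng.choice(points)         # random replacement object
        if candidate in medoids:
            failures += 1
            continue
        trial = medoids[:i] + [candidate] + medoids[i + 1:]
        trial_cost = total_cost(points, trial, dist)
        if trial_cost < cost:                  # commit only if it improves
            medoids, cost, failures = trial, trial_cost, 0
        else:
            failures += 1
    return medoids, cost
```

For instance, k_medoids([0, 1, 2, 10, 11, 12], 2, lambda a, b: abs(a - b)) groups the one-dimensional points into the two visible groups, represented by actual data points.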
Another class, the (agglomerative) hierarchical clustering techniques, defines the clusters in a bottom-up fashion, by first assuming that all objects are individual clusters and then gradually merging the closest pair of clusters until a desired number of clusters remain. Several definitions of the distance between clusters exist; the single-link approach considers the minimum distance between objects from the two clusters. Others consider the maximum such distance (complete-link) or the distance between cluster representatives. Divisive hierarchical methods operate in a top-down fashion by iteratively splitting an initial global cluster that contains all objects. The cost of brute-force hierarchical methods is $O(N^2)$, where N is the number of objects, which is not suitable for practical use. Moreover, they are sensitive to outliers (like partitioning methods). Algorithms like BIRCH [18] and CURE [4] were proposed to improve the scalability of agglomerative clustering and the quality of the discovered partitions. C2P [14] is another hierarchical algorithm similar to CURE, which employs closest-pair algorithms and uses a spatial index to improve scalability. Density-based methods discover dense regions in space, where objects are close to each other, and separate them from regions of low density. DBSCAN [11] is the most representative method in this class. First, DBSCAN selects a point p from the dataset. A range query with center p and radius ε is applied to verify whether the neighborhood of p contains at least a minimum number of points, MinPts (i.e., whether it is dense). If so, these points are put in the same cluster as p, and this process is iteratively applied again for the new points of the cluster. DBSCAN continues until the cluster cannot be expanded further; at that point, the whole dense region in which p falls has been discovered. The process is repeated for unvisited points until all clusters and outlier points have been discovered. A limitation of this approach (addressed in [2]) is that it is hard to find appropriate values for ε and MinPts. In many real applications, however, the accessibility of spatial objects is constrained by spatial (e.g., road) networks.
It is therefore realistic to define the dissimilarity between objects by their network distance, instead of the Euclidean distance. The network distance between two objects p and q is defined as the length of the shortest path that reaches q from p and vice versa, assuming an undirected network graph. There are also a few studies [5, 7, 16, 19] on clustering nodes or objects in a spatial network. Jain and Dubes [5] used the agglomerative hierarchical approach to cluster the nodes of a graph. They treat each node as a cluster and then merge the clusters until one remains. The single-link variant of this method has complexity $O(|V|^2)$, whereas the complete-link variant has complexity $O(|V|^2 \log |V|)$. Neither method is scalable for large networks. Another variant [19] applies divisive clustering on the minimum spanning tree of the graph, which can be computed in $O(|V| \log |V|)$ time. However, this method is very sensitive to outliers. CHAMELEON [7] is a general-purpose algorithm, which transforms the problem space into a weighted kNN graph, where each object is connected with its k nearest neighbors. The weight of each edge reflects the similarity between the objects. Yiu and Mamoulis [16] defined the problem of clustering objects based on the network distance. They proposed algorithms for three different clustering paradigms, i.e., k-medoids for K-partitioning, ε-link for density-based, and single-link for hierarchical clustering. These algorithms avoid computing distances between every pair of network nodes by exploiting the properties of the network. However, all these solutions assume a static dataset. A straightforward extension of these algorithms to moving objects by periodical re-evaluation is inefficient. Besides, Jin et al. [6] studied the problem of mining distance-based outliers in spatial networks, and found the problem to be only a byproduct of clustering. Clustering analysis on moving objects has recently drawn increasing attention. Li et al.
[10] first addressed this problem by proposing a concept of micro moving
cluster (MMC), which denotes a group of objects that are similar both at the current time and in the near future. Each MMC is tightly bounded by a rectangle that contains its moving objects and whose size grows over time. In order to obtain high-quality clusters, the MMCs are kept geographically small. Specifically, split and merge events are identified, and the bounding boxes of the clusters are maintained dynamically according to their width or height. Zhang and Lin [17] proposed a histogram construction technique based on a clustering paradigm. In [8], Kalnis et al. proposed three algorithms to discover moving clusters from historical trajectories of objects. Nehme and Rundensteiner [13] applied the idea of clustering moving objects to optimize continuous spatio-temporal query execution. The moving cluster is represented by a circle in their algorithms. However, most of these studies only considered moving objects in unconstrained environments and defined the similarity between objects by their Euclidean distance. This chapter specifies the problem of clustering network-constrained moving objects whose similarity is defined by network distance.
11.3 Clustering Static Objects in Spatial Networks
For clustering objects in a spatial network, the distance metric is the network distance, which requires an expensive shortest path computation over the network. We present two new clustering algorithms that use the information of nodes and edges in the network to prune the search space and avoid some unnecessary distance computations.
11.3.1 Problem Definition
In this section, we formally define the problem space to which we apply clustering and the distance metric used in our setting. We introduce the definitions of network, network distance between objects, and network distance between clusters. We then identify the peculiarities of the problem through the definition of a cluster block and discuss why existing clustering algorithms are inapplicable or inefficient for objects that lie on a network.
Definition 11.1. A network is an undirected weighted graph $G = (V, E, W)$, where $V$ is the set of vertices (i.e., nodes), $E$ is the set of edges, and $W : E \to \mathbb{R}^+$ associates each edge with a positive real number. An object (i.e., point) is located on an edge $e \in E$ of the network. The position of the object in the network can be expressed by the triplet $\langle n_i, n_j, pos \rangle$, where $pos \in [0, W(e)]$ is the distance of the point from node $n_i$ along the edge. A point lies on exactly one edge. (In real-life problems, some objects may not lie on edges of the network. In such cases, we assume that the object is represented by the position on the network which is most directly accessible from it.) To ensure that the position of the object is expressed unambiguously by one triplet, we require that $n_i < n_j$ (assuming a total ordering of node labels).
Let $p$ and $q$ be two points whose positions are $\langle n_a, n_b, pos_p \rangle$ and $\langle n_c, n_d, pos_q \rangle$, respectively. The distances between points and nodes are defined in Definition 11.2.
Definition 11.2. The direct distance $D_d$ and the network distance $D_n$ are defined as follows:
1. The direct distance between points on the same network edge: $D_d(p, q) = |pos_p - pos_q|$ if $p$ and $q$ lie on the same edge (i.e., $n_a = n_c$ and $n_b = n_d$); otherwise, it is defined as $\infty$;
2. The direct distance between a point and a node of the same network edge: $D_d(p, n_a) = pos_p$; $D_d(p, n_b) = W(n_a, n_b) - pos_p$;
3. The network distance between nodes: $D_n(n_i, n_j)$ is the length of the shortest path from $n_i$ to $n_j$;
4. The network distance between objects on different network edges: $D_n(p, q) = \min_{x \in \{a, b\},\, y \in \{c, d\}} \left( D_d(p, n_x) + D_n(n_x, n_y) + D_d(n_y, q) \right)$.
In the computation of the network distance between objects, the first two cases give the direct distance, which is not necessarily the network distance but can be found in constant time. The last two cases give the network distance, which requires the computation of the shortest path between nodes.
Definition 11.3. The network distance between clusters is the minimum network distance among the boundary objects of the clusters. Let the boundary objects of the cluster $C_x$ be $p_{x1}, p_{x2}, \dots, p_{xm}$ and the boundary objects of the cluster $C_y$ be $p_{y1}, p_{y2}, \dots, p_{yn}$; the network distance between the clusters $C_x$ and $C_y$ is $M(C_x, C_y) = \min_{i \in \{1, \dots, m\},\, j \in \{1, \dots, n\}} D_n(p_{xi}, p_{yj})$.
Computing the network distance between clusters requires finding their boundary objects. In a road network, we treat the objects of a cluster that are closest to the nodes as its boundary objects, since the boundary of a cluster is determined relative to the nodes. Clusters of objects in a road network are usually composed of sets of objects lying on a few network edges.
Therefore, to represent the clustering results better and to reduce the computation of the network distance between objects, we introduce the definition of a cluster block (CB).
Definition 11.4. A cluster block is represented by $(O, n_a, n_b, head, tail, ObjNum)$, where $O$ is a list of objects $\{o_1, o_2, \dots, o_i, \dots, o_n\}$ with $o_i = (oid_i, n_a, n_b, pos_i)$. Without loss of generality, assuming $pos_1 \le pos_2 \le \dots \le pos_n$, the list must satisfy $|pos_{i+1} - pos_i| \le \varepsilon$ ($1 \le i \le n - 1$). Since all objects are on the same edge $(n_a, n_b)$, the position of the cluster block is determined by an interval $(head, tail)$ in terms of the network distance from $n_a$. Thus, the length of the CB is $|tail - head|$. $ObjNum$ is the number of objects in the CB.
There can be several CBs on the same edge, but each CB belongs to only one edge. A CB itself is a cluster. Therefore, clusters in the network are composed of one or several adjacent CBs. Given a collection of N object points that lie on a network, our objective of grouping them into a set of clusters becomes one of constructing the CBs according to their direct distance and merging these CBs according to their network distance. Existing spatial clustering methods group objects only by their spatial similarity, which is either infeasible or inefficient for clustering network-based objects. The
replacement of the Euclidean distance by the network distance increases the complexity, since the distance between two arbitrary objects can no longer be computed in constant time but requires an expensive shortest path algorithm. However, we find that the clustering results in a network are related to the edges and nodes of the network. For example, the objects on the same edge or on adjacent edges are likely to be grouped into one cluster. Similarly, the objects around a node are likely to be clustered together (e.g., traffic jams usually occur at crossroads). Based on these observations, we exploit the information of edges and nodes and propose two clustering methods, edge-based clustering and node-based clustering, in the following sections.
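To make the cost difference concrete, the distances of Definition 11.2 can be sketched in Python as follows. The adjacency-dictionary graph representation, the function names, and the use of Dijkstra's algorithm for the node-to-node distance are assumptions made for illustration. For two points on the same edge, the sketch returns the smaller of the direct distance and the best route through the end nodes, which is one possible reading of the remark that the direct distance is not necessarily the network distance.

```python
import heapq
from typing import Dict, Tuple

Graph = Dict[str, Dict[str, float]]   # node -> {neighbor: edge weight}, symmetric
Point = Tuple[str, str, float]        # (n_i, n_j, pos), pos measured from n_i


def node_distances(graph: Graph, source: str) -> Dict[str, float]:
    """Shortest-path distance D_n from source to every reachable node (Dijkstra)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                              # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def direct_to_nodes(graph: Graph, p: Point) -> Dict[str, float]:
    """Direct distances D_d from a point to the two end nodes of its edge."""
    a, b, pos = p
    return {a: pos, b: graph[a][b] - pos}


def network_distance(graph: Graph, p: Point, q: Point) -> float:
    """Network distance between two points, following Definition 11.2."""
    best = float("inf")
    if p[:2] == q[:2]:                            # same edge: direct distance
        best = abs(p[2] - q[2])
    q_ends = direct_to_nodes(graph, q)
    for n_x, d_p in direct_to_nodes(graph, p).items():
        d_nodes = node_distances(graph, n_x)      # the expensive part
        for n_y, d_q in q_ends.items():
            if n_y in d_nodes:
                best = min(best, d_p + d_nodes[n_y] + d_q)
    return best
```

Each call runs up to two shortest-path searches, which is why the algorithms in the following sections try to avoid evaluating the network distance for arbitrary pairs of objects.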
11.3.2 Edge-Based Clustering Algorithm
Most hierarchical clustering methods initially assume that each point is a cluster and then iteratively merge the closest pair of clusters until one cluster remains. The user may opt to stop the algorithm after a desired number of k clusters have been discovered. Since the algorithm initializes one cluster for each point in the dataset, considerable merging of clusters is involved when the number of objects becomes high. The edge-based clustering algorithm is actually a hierarchical method. However, it solves the scalability issues of the traditional hierarchical clustering methods by constructing initial groups according to the edges on which the objects lie and refining the results through group splitting and merging. During the merging process, the algorithm only merges the clusters adjacent to nodes and further reduces the number of merges by introducing the ε parameter, so that fine-grained clustering results can be found as early as possible. The edge-based clustering algorithm involves the following three phases:
• Initiation phase: Construct initial groups according to the edges on which the objects lie. This involves assigning the objects on the same edge to one cluster. The number of initial clusters is at most the number of edges in the network, since this phase filters out the edges on which no objects lie. Therefore, unnecessary network traversals are avoided.
• Splitting phase: Split large initial groups into smaller cluster blocks to obtain the intermediate results. Specifically, for each initial group, if the network distance between two adjacent objects in this group is larger than the predefined threshold ε, the group is split into two smaller cluster blocks between the two adjacent objects. Otherwise, the group is treated as an individual cluster block. This process ensures the compactness of cluster blocks.
• Merging phase: Iteratively merge the adjacent cluster (blocks) to form the final cluster results.
Specifically, for each node, the process merges the adjacent cluster (blocks) around the node iteratively until no more of them can be merged according to the threshold ε and the network distance between clusters. Figure 11.1 shows an example of the edge-based clustering process. A part of the road network is represented as road segments (denoted as S) and intersections (denoted as J) in the figure, which correspond to edges and nodes, respectively, in the network graph. Objects (denoted as P) are represented by small rectangles and clusters by circles or polygons. In the initiation phase, all objects in the same road
segment are clustered into one group. For example, for segment S1, the objects p9, p5, p6, p7, p8, p12 and the other objects between them are grouped together. Similarly, the cluster in segment S2 contains all objects between p2 and p3. In the splitting phase, since the network distances between p5 and p6, as well as between p7 and p8, are larger than the threshold ε, the corresponding cluster is split into three parts, as shown in Fig. 11.2. In the merging phase, since the network distances between the adjacent cluster blocks around nodes J1, J2, and J3 are less than the threshold ε, these blocks are merged into a large cluster. Figure 11.3 shows the final results after repeating this merging process.
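The initiation and splitting phases for a single edge can be illustrated with a short Python sketch. The ClusterBlock class mirrors the fields of Definition 11.4, but its layout, the function name split_edge, and the object positions used in the usage example are hypothetical; the positions are chosen only so that the gaps between p5 and p6 and between p7 and p8 exceed ε, as in the example of segment S1.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ClusterBlock:
    """A cluster block on edge (na, nb); objects are (oid, pos) sorted by pos."""
    na: str
    nb: str
    objects: List[Tuple[str, float]]

    @property
    def head(self) -> float:
        return self.objects[0][1]

    @property
    def tail(self) -> float:
        return self.objects[-1][1]

    @property
    def obj_num(self) -> int:
        return len(self.objects)


def split_edge(na: str, nb: str, objects: List[Tuple[str, float]],
               eps: float) -> List[ClusterBlock]:
    """Group the objects of one edge, then split wherever the gap exceeds eps."""
    blocks: List[ClusterBlock] = []
    current: List[Tuple[str, float]] = []
    for oid, pos in sorted(objects, key=lambda o: o[1]):
        # Direct distance between adjacent objects on the same edge.
        if current and pos - current[-1][1] > eps:
            blocks.append(ClusterBlock(na, nb, current))
            current = []
        current.append((oid, pos))
    if current:
        blocks.append(ClusterBlock(na, nb, current))
    return blocks


# Hypothetical positions on segment S1 (distance from its first node).
s1 = [("p9", 1.0), ("p5", 2.0), ("p6", 6.0), ("p7", 7.0), ("p8", 12.0), ("p12", 13.0)]
cbs = split_edge("J1", "J4", s1, eps=2.0)
```

With these positions the edge yields three cluster blocks, and only one sorted pass over the objects of the edge is needed because the direct distance is available in constant time.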
Fig. 11.1 Initiation phase
Fig. 11.2 Splitting phase
The pseudocode of the edge-based algorithm is shown in Algorithm 22. To reduce the number of object accesses in the network, in the initiation phase, the algorithm creates clusters for each edge having objects instead of assigning objects to clusters. In the splitting phase, when traversing the objects in the corresponding
Algorithm 22: Edge_CMON()
// Q: priority queue used to store the clusters to be merged around nodes
// Initial phase
for each network edge (nx, ny) in edge-list with moving objects on it do
    Create a new cluster C for edge (nx, ny), assign cid for it;
end
for each cluster Ci do
    // Splitting phase
    o is the first object on edge (nx, ny) in which Ci lies;
    o.cid = Ci.cid;
    if Dd(o.pos, nx) ≤ ε then
        Insert <nx, Dd(o.pos, nx)> into nodelist of Ci;
    end
    nexto is the next object on edge (nx, ny) from o to ny;
    C = Ci;
    while nexto is not NULL do
        if Dd(o.pos, nexto.pos) > ε then
            Split C into CB1 and CB2;
            C = CB2;
        end
        if nexto is the last object on edge (nx, ny) AND Dd(nexto.pos, ny) ≤ ε then
            Insert <ny, Dd(nexto.pos, ny)> into nodelist of Ci;
        end
        o = nexto;
        nexto is the next object on edge (nx, ny) from o to ny;
        o.cid = C.cid;
    end
    // Merging phase
    for each ni in nodelist of Ci do
        Q = new priority queue;
        Insert clusters in ni into Q according to their distance to ni;
        if notempty(Q) then
            Ci is the first cluster in Q;
            Cj is the next cluster in Q;
            while (Dd(Ci, ni) + Dd(Cj, ni)) ≤ ε do
                Merge Cj into Ci and merge the nodelist of Cj into the one of Ci;
                if notempty(Q) then
                    Cj is the next cluster in Q;
                else
                    Break;
                end
                for each adjacent node ns of ni do
                    if no cluster in edge (ni, ns) and (Dd(Ci, ni) + W(ni, ns)) ≤ ε then
                        Insert <ns, (Dd(Ci, ni) + W(ni, ns))> into nodelist of Ci;
                    end
                end
            end
        end
    end
end
Fig. 11.3 Merging phase
edge, objects are assigned to the split CBs. The key part of the algorithm is the merging of the clusters (blocks). After splitting, there are two cases in which clusters need to be merged: (1) clusters adjacent to the same node, and (2) clusters separated by a short edge that contains no other clusters, where the network distance between them is less than the threshold. The algorithm maintains an ε-nodelist for each cluster, in which each entry is a pair (ε-node, ε-dist). An ε-node is a node adjacent to the cluster such that the network distance between the node and the cluster is less than ε, and ε-dist denotes this network distance. During merging, the algorithm uses the ε-nodelists of the clusters and sorts the clusters by their network distance to the node. This filters out the clusters whose distance to the node is larger than ε and avoids recomputing the distances between clusters and nodes as they change after merging.
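The first merging case, clusters adjacent to the same node, can be sketched in Python as follows. The sketch assumes that the ε-nodelist entries referring to one node have already been collected as (cluster id, ε-dist) pairs; this input format and the function name are hypothetical, and the second case (clusters separated by a short empty edge) is not covered.

```python
import heapq
from typing import List, Tuple


def merge_at_node(entries: List[Tuple[str, float]],
                  eps: float) -> Tuple[List[str], List[str]]:
    """Merge the clusters around one node.

    entries: (cluster_id, distance from the cluster to the node).
    Returns (ids merged into the nearest cluster, ids left unmerged).
    """
    queue = [(dist, cid) for cid, dist in entries]
    heapq.heapify(queue)                      # clusters ordered by distance to the node
    if not queue:
        return [], []
    base_dist, base = heapq.heappop(queue)    # nearest cluster absorbs the others
    merged = [base]
    # Two clusters on different edges around the node are within eps of each
    # other when the sum of their distances to the node does not exceed eps.
    while queue and base_dist + queue[0][0] <= eps:
        merged.append(heapq.heappop(queue)[1])
    remaining = [cid for _, cid in sorted(queue)]
    return merged, remaining
```

Because the queue is ordered by distance, the loop can stop at the first cluster that is too far away; all later clusters are at least as far from the node. The distance from the merged cluster to the node stays equal to that of the nearest cluster, so it does not have to be recomputed.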
11.3.3 Node-Based Clustering Algorithm
Given a random point p, the density-based clustering method identifies the cluster to which p belongs by applying an ε-range query around p and checking whether there are at least MinPts points in this range. If so, a new cluster for p is created, containing the points returned by the range query. It iteratively applies range queries for the new points in the cluster, until the cluster cannot be expanded any further. The node-based clustering algorithm adapts this density-based method to our network model. A main module of the algorithm finds the ε-neighborhood of a point p in the network. This can be done by expanding the network around p and assigning points until the distance exceeds ε. The node-based clustering algorithm further optimizes the network-expanding process by exploiting the node information of the network, which avoids the redundant computation of expanding from random points. The main idea is to traverse the network starting from a node and to group the objects around the node whose network distance is less than ε. Then, the algorithm expands the cluster to the adjacent nodes so that the other objects around the adjacent nodes
are also grouped into this cluster when they satisfy the same condition. The process continues until the cluster cannot expand (i.e., the distance between any adjacent object and the cluster exceeds ε). We repeat this process for the other nodes that have not been traversed, until all objects around the nodes are assigned to some cluster. Finally, we examine the isolated objects that could not join any cluster to see whether they can form individual clusters. The core part of the node-based clustering algorithm is to cluster the objects around one node and to expand to the adjacent nodes. The clustering process based on one node can be divided into two steps: the initial phase and the expanding phase. In the initial phase, the algorithm first filters out those edges incident to the node on which the distance between the objects and the node is larger than ε. Then, the objects satisfying the distance condition are sorted by their distance to the node. In this way, the object nearest to the node is treated as an initial cluster. During the expanding phase, we expand the initial cluster according to the ordered objects on the adjacent edges. Then we continue the sorting and expanding process for the adjacent nodes until the network distance between adjacent objects is larger than ε and the cluster cannot expand any further. The algorithm is shown in Algorithm 23. Figure 11.4 shows an example of the node-based clustering process, which starts from the node J1. The objects around J1 are traversed and ordered according to their distance to J1. Then, the initial cluster expands from J1 to the adjacent node J2 and on to the next adjacent node J3. Consequently, the objects around these nodes are traversed and assigned to this cluster if they satisfy the distance condition. When the cluster cannot expand any more, the algorithm selects other nodes that have not been traversed and repeats this process until all nodes are traversed (as shown in Fig. 11.5).
Finally, it checks the individual objects, such as p6, p7, and the objects between them, to see whether they can form an individual cluster according to the distance between each pair of adjacent objects. The final clustering result in this example is shown in Fig. 11.6. If the user sets the minimum number of objects in a cluster to five, for example, the objects p6, p7, and the ones between them are treated as outliers.
Fig. 11.4 Cluster of the node J1
The algorithm needs a priority queue Q to keep the adjacent edges of a node and the network distance of the objects adjacent to the node. In Q, edges are grouped by nodes and
Fig. 11.5 Clusters of all nodes
Fig. 11.6 Clusters of all objects
sorted according to their distance to each node. The nodes traversed in the network-expanding process are inserted at the head of Q. The array Ndist keeps, for each node, the distance of the nearest adjacent object, in order to decide which adjacent edges need to be traversed and which objects need to be added to the initial clusters. The algorithm also deals with the case of clusters separated by a short edge that contains no other clusters, where the network distance between them is less than the threshold. In this case, the objects need to be added to the initial clusters and expanded.
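The network-expanding step on which this section relies can be illustrated with a small Python sketch that collects the objects within network distance ε of a start node. This is not Algorithm 23 itself, which chains objects by the distance between adjacent objects; it only shows the bounded expansion from a node. The graph and object representations and the function name are assumptions made for the example.

```python
import heapq
from typing import Dict, Tuple

Graph = Dict[str, Dict[str, float]]          # node -> {neighbor: edge weight}
Point = Tuple[str, str, float]               # (n_i, n_j, pos from n_i)


def objects_within_eps(graph: Graph, objects: Dict[str, Point],
                       start: str, eps: float) -> Dict[str, float]:
    """Return {oid: network distance} for objects within eps of the start node."""
    inf = float("inf")
    dist = {start: 0.0}
    heap = [(0.0, start)]
    # Dijkstra expansion that never goes beyond distance eps.
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                         # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd <= eps and nd < dist.get(v, inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    result: Dict[str, float] = {}
    for oid, (a, b, pos) in objects.items():
        w = graph[a][b]
        # Reach the object through whichever end node of its edge is closer.
        best = min(dist.get(a, inf) + pos, dist.get(b, inf) + w - pos)
        if best <= eps:
            result[oid] = best
    return result
```

Only the nodes within ε of the start node are ever expanded, so the cost depends on the size of the ε-neighborhood rather than on the size of the whole network.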
Algorithm 23: Node_CMON()
// Q: priority queue in which each entry B is <node1, node2, dist>, meaning that the distance between node1 and the adjacent objects in edge (node1, node2) is dist
// Ndist[ni]: stores the distance of the nearest object in adjacent edges to ni; the default value is ∞
// Initial phase
for each adjacent node nz to ni do
    o is the first object in edge (ni, nz);
    if (o is not NULL AND Dd(o.pos, ni) ≤ ε) OR (o is NULL AND W(ni, nz) ≤ ε) then
        Insert <ni, nz, Dd(o.pos, ni)> or <ni, nz, W(ni, nz)> into Q by the distance between o or nz to ni;
    end
end
if notempty(Q) then
    Create a new cluster C, assign cid for it;
end
// Expanding phase
while notempty(Q) do
    B = Dequeue(Q);
    nx = B.node1; ny = B.node2;
    if B.dist < Ndist[nx] then
        Ndist[nx] = B.dist;
    end
    o is the first object in edge (nx, ny);
    if Dd(o.pos, nx) + Ndist[nx] ≤ ε OR o is the closest object to nx then
        o.cid = C.cid; clustered(o) = True;
        nexto is the next object in edge (nx, ny) from o to ny;
        while nexto is not NULL AND Dd(o.pos, nexto.pos) ≤ ε do
            nexto.cid = C.cid; clustered(nexto) = True;
            o = nexto;
            nexto is the next object in edge (nx, ny) from o to ny;
        end
        if Dd(o.pos, ny) ≤ ε then
            if Dd(o.pos, ny) < Ndist[ny] then
                Ndist[ny] = Dd(o.pos, ny);
            end
            visited(ny) = True;
            for each adjacent node nz (except nx) to ny do
                o is the first object in edge (ny, nz);
                if (o is not NULL AND Dd(o.pos, ny) ≤ ε) OR (o is NULL AND Ndist[ny] + W(ny, nz) ≤ ε) then
                    Insert <ny, nz, Dd(o.pos, ny)> or <ny, nz, Ndist[ny] + W(ny, nz)> into Q by the distance between o or nz to ny;
                end
            end
        end
    end
end

11.4 Clustering Moving Objects in Spatial Networks
Clustering moving objects in spatial networks is more complex than in free space. The increased complexity is mainly due to the network distance metric. The distance between two arbitrary objects cannot be obtained in constant time, but requires an expensive shortest path computation. Moreover, the clustering results are related to the segments of the network, and their changes are affected by the network constraint. For example, a cluster is likely to move along the road segments and to change (i.e., split and merge) at the road junctions due to the objects' diverse spatio-temporal properties (e.g., moving in different directions). It is not efficient to predict these changes only by measuring the compactness of the clusters. Thus, the
existing clustering methods for free space cannot be applied to spatial networks efficiently. On the other hand, the existing clustering algorithms based on the network distance [16] mainly focus on static objects that lie on spatial networks. To extend them to moving objects, we can periodically apply them to the current positions of the objects in the network. However, this approach is costly because every clustering evaluation starts from scratch. In addition, the clustering algorithms for different clustering criteria (e.g., K-partitioning, distance-based, and density-based) are totally different in their implementation. This is inefficient for the many applications that need to run multiple clustering algorithms at the same time. For example, in a traffic management application, it is important to monitor densely populated areas (by density-based clusters) so that traffic control can be applied; at the same time, there may be a requirement for assigning K police officers to each of the congested areas. In this case, it is favorable to partition the objects into K clusters and keep track of the K-partitioned clusters. Separate evaluation of different types of clusters may incur computational redundancy.
In this section, we introduce a unified framework for clustering moving objects in spatial networks (CMON). The goals are to optimize the cost of clustering moving objects and to support multiple types of clusters in a single application. The CMON framework divides the clustering process into the continuous maintenance of cluster blocks (CBs) and the periodical construction of clusters with different criteria based on CBs. A CB groups a set of objects on a road segment that are in close proximity to each other at present and in the near future. In general, a CB satisfies two basic requirements: (1) it is inexpensive to maintain in a spatial network setting; (2) it serves as a building block for different types of application-level clusters.
11.4.1 CMON Framework
We model a spatial network as a graph where objects are moving on the edges (we use the words “segments” and “edges” interchangeably). The distance between any two objects, called network distance, is measured by the length of the shortest path connecting them in the network. We employ a motion model similar to that in [10], where moving objects are assumed to move in a piecewise linear manner (i.e., each object moves at a stable velocity on each edge). We assume that an object location update has the form (oid, na, nb, pos, speed, next_node), where oid is the id of the moving object, (na, nb) represents the edge on which the object moves (from na towards nb), pos is the relative location to na, and speed is the moving speed. We also assume that the next edge to move along, (nb, next_node), is known in advance. The requirement is to continuously monitor the moving clusters with various criteria at some predefined period.
As shown in Fig. 11.7, the proposed CMON framework is composed of two components: the incremental maintenance of cluster blocks (CBs) and the periodical construction of different types of application-level clusters. We have defined a CB in Definition 11.4. A CB is a group of moving objects close to each other at present and in the near future. For easy maintenance, we constrain the objects in a CB to move in the same direction and on the same edge segment. Additionally, a CB imposes a strict clustering criterion so as to support different types of application-level
clusters. Specifically, the network distance between each pair of neighboring objects in a CB does not exceed a preset threshold ε. We incrementally maintain each CB by taking into account the objects’ anticipated movements. We capture the predicted update events (including split and merge events) of each CB during the continuous movement and process these events accordingly. At any time, clusters of different criteria can be constructed from the CBs instead of from the entire set of moving objects, which keeps the construction cost low. Moreover, to reduce unnecessary computation of the network distance between the CBs, we adapt the network expansion method to combine CBs into the application-level clusters.
Fig. 11.7 CMON framework
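As a minimal sketch of the data involved, the following Python fragment models the location update tuple and groups the objects on each directed edge into blocks whose neighboring objects are within ε of each other. The names LocationUpdate and build_cbs are hypothetical, and the sketch checks only the present-time distance condition, not the near-future condition of Definition 11.4.

```python
from dataclasses import dataclass


@dataclass
class LocationUpdate:
    oid: str        # object id
    na: str         # start node of the current edge
    nb: str         # end node of the current edge (direction na -> nb)
    pos: float      # offset from na along the edge
    speed: float    # current speed on this edge
    next_node: str  # node the object heads to after nb


def build_cbs(updates, eps):
    """Group updates into blocks of object ids (illustrative helper).

    Objects are grouped per directed edge (na, nb), sorted by position,
    and cut wherever two neighbors are more than eps apart.
    """
    lanes = {}
    for u in updates:
        lanes.setdefault((u.na, u.nb), []).append(u)

    cbs = []
    for lane in lanes.values():
        lane.sort(key=lambda u: u.pos)
        block = [lane[0].oid]
        for prev, cur in zip(lane, lane[1:]):
            if cur.pos - prev.pos <= eps:
                block.append(cur.oid)
            else:
                cbs.append(block)
                block = [cur.oid]
        cbs.append(block)
    return cbs
```

Keying the grouping on the ordered pair (na, nb) keeps objects that travel in opposite directions on the same road in different blocks, which mirrors the same-direction constraint stated above.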
11.4.2 Construction and Maintenance of CBs
Initially, based on the CB definition, a set of CBs is created by traversing all edge segments in the network and their associated objects. The CBs are incrementally maintained after their creation. As time elapses, the distance between adjacent objects in a CB may exceed ε, and hence we need to split the CB. A CB may also merge with adjacent CBs when they are within a distance of ε. Thus, for each CB, we predict the time when it may split or merge. The predicted split and merge events are then inserted into an event queue. Afterwards, when the first event in the queue takes place, we process it and update (compute) the split and merge events for the affected CBs (and for new CBs, if any). This process is continuously repeated. The key problems are: (1) how to predict the split/merge time of a CB, and (2) how to process a split/merge event of a CB.
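The event-driven maintenance loop described above could be organized around a priority queue keyed by predicted event time. The sketch below is an assumption about one possible layout, not the authors' implementation; the class name EventQueue and its methods are hypothetical. Re-predicted events replace older ones lazily, so stale entries are skipped when they surface.

```python
import heapq


class EventQueue:
    """Min-heap of predicted (time, kind, cb_id) events with lazy deletion."""

    def __init__(self):
        self._heap = []
        # (cb_id, kind) -> time of the prediction that is currently valid
        self._latest = {}

    def schedule(self, time, kind, cb_id):
        """Record a new prediction; it supersedes any earlier one."""
        self._latest[(cb_id, kind)] = time
        heapq.heappush(self._heap, (time, kind, cb_id))

    def cancel(self, cb_id):
        """Drop all predictions of a CB, e.g. after it was merged away."""
        for kind in ("split", "merge"):
            self._latest.pop((cb_id, kind), None)

    def pop(self):
        """Return the earliest still-valid event, or None if there is none."""
        while self._heap:
            time, kind, cb_id = heapq.heappop(self._heap)
            if self._latest.get((cb_id, kind)) == time:
                del self._latest[(cb_id, kind)]
                return time, kind, cb_id
        return None
```

A maintenance loop would repeatedly call pop(), process the split or merge, and call schedule() for every affected or newly created CB.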
The split of a CB may occur in two cases. The first is when a CB arrives at the end of the segment (i.e., an intersection node of the spatial network). When the moving objects in a CB reach an intersection node, the CB has to be split since they may head in different directions. In this case, the split time is the time at which the first object in the CB arrives at the node. The second case is when the distance between some neighboring objects moving on the segment exceeds ε. This split time is harder to predict because the neighborhood of objects changes over time, so the main task is to dynamically maintain the order of objects on the segment. We compute the earliest time instance at which two adjacent objects in the CB meet, denoted tm. We then compare the maximum distance between each pair of adjacent objects with ε until tm. If this distance exceeds ε at some time, the process stops and the earliest time at which ε is exceeded is recorded as the split time of the CB. Otherwise, we update the order of objects starting from tm and repeat the same process until some distance exceeds ε or one of the objects arrives at the end of the segment. When the velocity of an object changes on the segment, we need to re-predict the split and merge times of the CB.
Figure 11.8 shows an example. Given ε = 7, we compute the split time as follows. At the initial time t0, the CB is formed with a list of objects <o1, o2, o3, o4, o5>. We first compute the time te when the first object (i.e., o2) arrives at the end of the segment (i.e., le). For adjacent objects, we find that the earliest meeting time is t1, at which o2 and o3 first meet. We then compare the maximum distance for each pair of adjacent objects during [t0, t1] and find no pair whose distance exceeds 7. At t1, the object list is updated to <o1, o3, o2, o4, o5>. In the same way, the next meeting time is t2, for o2 and o4. There are also no neighboring objects whose distance exceeds 7 during [t1, t2]. As the algorithm continues, at t4 the object list becomes <o3, o1, o4, o5, o2> and t5 is the next time at which o1 and o4 meet. When comparing neighboring objects during [t4, t5], we find that the distance between o4 and o5 exceeds 7 at time ts. Since ts < te, we obtain ts as the split time of the CB.
Fig. 11.8 Prediction of splitting CB
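The split-time prediction walked through above can be sketched as follows. This is an illustrative reading of the procedure, assuming every object keeps a constant positive speed on the segment; the function name predict_split_time and the (pos, speed) input format are hypothetical. Exact rational arithmetic is used so that objects meeting at the same point are ordered reliably.

```python
from fractions import Fraction
from math import inf


def predict_split_time(objects, seg_len, eps, t0=0):
    """Return (time, reason) of the next split of a CB.

    objects: list of (pos, speed) pairs at time t0, all with speed > 0.
    reason is "gap" when two neighbors drift more than eps apart and
    "node" when the first object reaches the end of the segment.
    """
    objs = [(Fraction(p), Fraction(v)) for p, v in objects]
    seg_len, eps, t0 = Fraction(seg_len), Fraction(eps), Fraction(t0)
    assert all(v > 0 for _, v in objs), "sketch assumes positive speeds"

    # te: first arrival at the end of the segment
    te = min(t0 + (seg_len - p) / v for p, v in objs)
    t = t0
    while True:
        # Current order on the segment; on a tie the slower object is
        # placed behind, because the faster one is about to overtake it.
        cur = sorted((p + v * (t - t0), v) for p, v in objs)
        t_exceed = t_meet = inf
        for (pa, va), (pb, vb) in zip(cur, cur[1:]):
            gap = pb - pa
            if gap > eps:
                return t, "gap"
            if vb > va:    # neighbors drift apart: gap reaches eps later
                t_exceed = min(t_exceed, t + (eps - gap) / (vb - va))
            elif va > vb:  # rear object catches up: the order changes
                t_meet = min(t_meet, t + gap / (va - vb))
        if t_exceed <= min(t_meet, te):
            return t_exceed, "gap"
        if te <= t_meet:
            return te, "node"
        t = t_meet  # re-sort at the meeting time and repeat
```

Between two consecutive meeting times the gaps are linear in time, so it is enough to solve one linear equation per adjacent pair instead of sampling the interval.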
We now discuss how to handle a split event. If the split event occurs on the segment, we can simply split the CB into two and predict the split and merge events for each of them. If the split event occurs at the end of the segment, the processing is more complex. One straightforward method is to handle the
departure of the objects individually each time an object reaches the end of the segment. The cost of this method is high, because every departure triggers a separate event. To reduce the processing cost, we propose a group split scheme. When the first object leaves the segment, we split the original CB into several new CBs according to the objects’ directions (which can be inferred from next_node). On one hand, we compute a to-be-expired time (i.e., the time until the departure from the segment) for each object in the original CB and retain the CB until the last object leaves the segment. On the other hand, we attach a to-be-valid time (with the same value as the to-be-expired time) to each object in the new CBs. Only valid objects are counted when constructing application-level clusters. Figure 11.9 illustrates this split. When CB1 reaches J1, objects p1 and p3 will move to the segment <J1, J2> while p2 and p4 will follow <J1, J6>. Thus, CB1 is split into two such that p2 and p4 join CB3, and p1 and p3 form a new cluster CB4. We still keep CB1 until p4 leaves <J4, J1>. As can be observed, the group split scheme reduces the number of split events and hence the cost of CB maintenance.
Fig. 11.9 Group split at an edge intersection
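The group split scheme can be illustrated with a short sketch. It assumes each object is described by a hypothetical tuple (oid, pos, speed, next_node) and that speeds are positive; the function name group_split is likewise an assumption made for this example.

```python
from collections import defaultdict


def group_split(cb_objects, seg_len, now):
    """Split a CB once, when its first object reaches the end node.

    Returns (new_cbs, expire, retire_at):
      new_cbs   maps next_node -> list of (oid, to_be_valid_time)
      expire    maps oid -> to_be_expired_time in the original CB
      retire_at is when the original CB can finally be discarded
    """
    new_cbs = defaultdict(list)
    expire = {}
    for oid, pos, speed, next_node in cb_objects:
        leave_time = now + (seg_len - pos) / speed
        expire[oid] = leave_time                        # leaves old CB
        new_cbs[next_node].append((oid, leave_time))    # valid in new CB
    return dict(new_cbs), expire, max(expire.values())


def is_valid(to_be_valid_time, t):
    """An object counts in its new CB only once it has left the old edge."""
    return t >= to_be_valid_time
```

One call replaces the per-object departure events of the straightforward method; the per-object times only gate which CB an object is counted in when application-level clusters are built.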
The merge of CBs may occur when adjacent CBs in a segment are moving together (i.e., their network distance ≤ ε ). To predict the initial merge time of CBs, we dynamically maintain the boundary objects of each CB and their validity time (the period when they are treated as the boundary of the CB), and compare the minimum distances between the boundary objects of two CBs with the threshold ε at their validity time. The boundary objects of CBs can be obtained by maintaining the order of objects while computing the split time. For the example in Fig. 11.8, the boundary objects of the CB are represented by (o1 , o5 ) for validity time [t0 ,t3 ], (o3 , o5 ) for [t3 ,t4 ], and (o3 , o2 ) for [t4 ,te ]. The processing of the merge event is similar to the split event on the segment. We obtain the merge event and time from the event queue to merge the CBs into one CB and compute the split time and merge time of the merged CB. Finally, the corresponding affected CBs in the event queue are updated. Besides the split and merge of a CB, new objects may come into the network or existing objects may leave. For a new object, we locate all CBs of the same segment that the object enters and see if the new object can join any CB according to the CB definition. If the object can join some CB, the CB’s split and merge events are updated. If no such CBs are found, a new CB for the object is created and the merge
event is computed. For a leaving object, we update its original CB’s split and merge events if necessary.
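The merge prediction described above compares boundary objects of two neighboring CBs during their validity times. A minimal sketch, assuming each boundary is given as a list of hypothetical pieces (t_start, t_end, pos_at_t_start, speed) and that both CBs move in the same direction on the same segment:

```python
def predict_merge_time(rear_head, front_tail, eps):
    """Earliest time at which the gap between two CBs drops to eps.

    rear_head:  boundary pieces of the front-most object of the rear CB
    front_tail: boundary pieces of the rear-most object of the front CB
    Returns None when no merge is predicted within the validity times.
    """
    best = None
    for s1, e1, p1, v1 in rear_head:
        for s2, e2, p2, v2 in front_tail:
            lo, hi = max(s1, s2), min(e1, e2)
            if lo > hi:
                continue  # the two validity times do not overlap
            # Gap at the start of the overlap; it is linear afterwards.
            gap = (p2 + v2 * (lo - s2)) - (p1 + v1 * (lo - s1))
            if gap <= eps:
                t = lo
            else:
                closing = v1 - v2  # how fast the rear CB catches up
                if closing <= 0:
                    continue
                t = lo + (gap - eps) / closing
                if t > hi:
                    continue
            best = t if best is None else min(best, t)
    return best
```

The piece lists correspond to boundary histories such as (o1, o5) for [t0, t3] and (o3, o5) for [t3, t4] in the example of Fig. 11.8, which are obtained as a by-product of the split-time computation.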
11.4.3 CMON Construction with Different Criteria
This section discusses how to construct application-level clusters of different criteria from CBs. We focus our discussion on three common clustering criteria: distance-based, density-based, and K-partitioning.
11.4.3.1 Distance-based CMON
A common clustering criterion is based on the minimum distance metric. The Minimum Distance CMON is defined as follows.
Definition 11.5. For each object in a Minimum Distance CMON (MD-CMON), the minimum network distance to the other objects in the cluster is not longer than a user-specified threshold δ (δ ≥ ε).
The requirement ε ≤ δ is necessary because it guarantees that a CB does not cross two clusters in the MD-CMON. The MD-CMON can be constructed by combining the CBs. Generally, for two CBs, we need to compute their network distance (i.e., the minimum network distance of their boundary objects) to determine whether to combine them. This simple method has a time complexity of $O(N^2)$, where N is the number of CBs. In order to reduce the computation cost, we adapt the incremental network expansion method to combine the CBs. The detailed algorithm can be found in Algorithm 24. The algorithm starts with a CB and adds its adjacent nodes that are within δ to a queue Q using the Dijkstra algorithm. Take Fig. 11.10 as an example. Suppose δ = 10 and the algorithm starts with CB1. Thus, initially CB1 is marked “visited” and J1 is added to Q. The algorithm proceeds to dequeue the first node in Q (i.e., J1). All adjacent edges of J1 (except the checked edge <J6, J1>) are examined. For each edge <J1, Ji>, assuming dist(J1, Ji) to be the edge length, if Ji satisfies dist(CB1, J1) + dist(J1, Ji) ≤ δ, Ji is added to Q and dist(CB1, Ji) = dist(CB1, J1) + dist(J1, Ji). Moreover, all unvisited CBs on each adjacent edge are checked. For a CBi on <J1, Ji>, if dist(CB1, J1) + dist(J1, CBi) ≤ δ, CBi is merged into CB1’s MD-CMON cluster. If dist(CBi, Ji) ≤ δ and Ji has not been added to Q, it is added to Q. The algorithm continues with the same process until Q becomes empty and the CBs around CB1 are combined into a cluster C1.
Afterwards, the algorithm picks up another unvisited CB and repeats the same process until all CBs are visited.
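To make the expansion idea concrete, here is a simplified sketch rather than a transcription of Algorithm 24. It runs a Dijkstra expansion bounded by δ from every CB and unions all CBs it reaches. The data layout is an assumption: adj is an undirected adjacency map, each CB is a hypothetical tuple (u, v, lo, hi) occupying offsets [lo, hi] measured from u on edge (u, v), and every edge is always written with the same orientation.

```python
import heapq
from collections import defaultdict


def md_cmon(adj, cbs, delta):
    """Return MD-CMON clusters as sorted lists of CB indices."""
    parent = list(range(len(cbs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def to_node(cb, node):
        """Distance from the CB boundary to one end node of its edge."""
        u, v, lo, hi = cb
        return lo if node == u else adj[u][v] - hi

    by_node, by_edge = defaultdict(list), defaultdict(list)
    for i, (u, v, lo, hi) in enumerate(cbs):
        by_node[u].append(i)
        by_node[v].append(i)
        by_edge[(u, v)].append(i)

    for i, cb in enumerate(cbs):
        u, v, lo, hi = cb
        # CBs on the same edge: compare the gap along the edge directly.
        for j in by_edge[(u, v)]:
            if j != i and max(cbs[j][2] - hi, lo - cbs[j][3]) <= delta:
                parent[find(j)] = find(i)
        # Network expansion from both end nodes, cut off at delta.
        heap = [(to_node(cb, u), u), (to_node(cb, v), v)]
        heapq.heapify(heap)
        dist = {}
        while heap:
            d, x = heapq.heappop(heap)
            if d > delta or x in dist:
                continue
            dist[x] = d
            for j in by_node[x]:
                if d + to_node(cbs[j], x) <= delta:
                    parent[find(j)] = find(i)
            for y, w in adj[x].items():
                if y not in dist and d + w <= delta:
                    heapq.heappush(heap, (d + w, y))

    groups = defaultdict(list)
    for i in range(len(cbs)):
        groups[find(i)].append(i)
    return sorted(groups.values())
```

Unlike Algorithm 24, the sketch restarts the expansion from every CB instead of skipping visited ones, which is simpler to verify but does more work; the cut-off at δ still limits each expansion to the neighborhood of its CB.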
11.4.3.2 Density-based CMON
The second clustering criterion is density-based, which is suitable for filtering out noise data.
Definition 11.6. For each cluster in the Density-based CMON (DB-CMON), the average density should be higher than a given threshold ρ. Moreover, there should
Fig. 11.10 The combination of CBs
Fig. 11.11 The Cross-CB
not be any empty segment (without any objects lying on it) whose length is longer than E.
Suppose there are m (m > 1) objects in a CB; the density of the CB is at least $\frac{m}{\varepsilon(m-1)} > \frac{1}{\varepsilon}$. The second condition is necessary to avoid very skewed clusters. It is equivalent to the condition that for any object in the cluster, the nearest object is within a distance E. Thus, to construct the DB-CMON clusters from CBs, we require $\varepsilon \le \max\{E, \frac{1}{\rho}\}$. The cluster formation algorithm is the same as the one described in Algorithm 24 except that the minimum-distance constraint (transformed from the density constraint) is dynamic. Suppose the density of the current cluster with k objects is ρ and a CB has m objects with a length of L. If the CB can be merged into the cluster, their minimum distance D must satisfy $\frac{k+m}{k/\rho + L + D} \ge \rho$, i.e., $D \le \frac{k+m-\rho(k/\rho+L)}{\rho}$.
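The dynamic distance bound can be computed directly. The sketch below assumes that the density of the current cluster and the density threshold are tracked as two separate quantities, named rho_cluster and rho_min here; both names, and the function name, are hypothetical. With them, the merge condition reads (k + m) / (k / rho_cluster + L + D) ≥ rho_min.

```python
import math


def max_merge_distance(k, rho_cluster, m, cb_length, rho_min):
    """Largest gap D at which a CB may still join the cluster.

    k, rho_cluster: object count and density of the current cluster,
                    so its length is k / rho_cluster
    m, cb_length:   object count and length of the candidate CB
    rho_min:        density threshold of the DB-CMON
    """
    cluster_length = k / rho_cluster
    return (k + m) / rho_min - cluster_length - cb_length


def can_merge(k, rho_cluster, m, cb_length, rho_min, gap, max_empty):
    """Check both DB-CMON conditions for a candidate CB at distance gap."""
    bound = max_merge_distance(k, rho_cluster, m, cb_length, rho_min)
    return gap <= bound and gap <= max_empty
```

For example, a cluster of 4 objects with density 1.0 and a CB of 2 objects with length 2 can be merged under a threshold of 0.5 as long as their gap is at most 6, since 6 / (4 + 2 + 6) = 0.5.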
Algorithm 24: MD_CMON()
  foreach CBi do
    if CBi.visited == false then
      Q = new priority queue;
      find edge (nx, ny) where CBi lies;
      CB = CBi; C = CB;
      nextCB = Next CB on (nx, ny) from CBi to ny;
      while (nextCB ≠ null) and Dist(CB.head, nextCB.tail) ≤ δ do
        Merge_Expand(CB, nextCB, C, nx, ny);
      end
      if (nextCB == null) and Dist(CB.head, ny) ≤ δ then
        B.node = ny; B.dist = Dist(CB.head, ny);
        Enqueue(Q, B);
      end
      while notempty(Q) do
        B = Dequeue(Q);
        foreach node nz adjacent to B.node do
          nextCB = Next CB from B.node to nz;
          if (nextCB ≠ null) and Dist(B.node, nextCB.tail) + B.dist ≤ δ then
            newdnz = Dist(nextCB.head, nz);
            Merge_Expand(CB, nextCB, C, B.node, nz);
            while (nextCB ≠ null) and Dist(CB.head, nextCB.tail) ≤ δ do
              newdnz = Dist(nextCB.head, nz);
              Merge_Expand(CB, nextCB, C, B.node, nz);
            end
          end
          if (no CBs on edge (B.node, nz)) then
            newdnz = B.dist + Dist(B.node, nz);
          end
          if (nextCB == null) and (newdnz ≤ δ) then
            Bnew.node = nz; Bnew.dist = newdnz;
            Enqueue(Q, Bnew);
          end
        end
      end
    end
  end

Procedure Merge_Expand(CB1, CB2, C, node1, node2)
  if CB2.visited == false then
    C = MergeClst(C, CB2);
    CB1 = CB2; CB1.visited = true;
    CB2 = Next CB from node1 to node2;
  else
    C1 = FindCluster(CB2);
    C = MergeClst(C, C1);
  end
11.4.3.3 K-Partitioning CMON
K-Partitioning CMON is similar to the K-Partitioning clustering method [9, 12]. It can be defined as follows.
Definition 11.7. Given a set of objects, K-Partitioning CMON (KP-CMON) groups them into K clusters such that the sum of distances between all adjacent objects in each cluster is minimized.
According to the definition of CBs, the sum of distances between all adjacent objects in each CB is minimized. Therefore, it is intuitive to construct the KP-CMON clusters from the CBs. An exhaustive method is to iteratively combine the closest pairs of CBs until K clusters are obtained. This method requires computing the distances between all pairs of CBs, which is costly. Instead, we propose a low-complexity heuristic similar to the K-means algorithm [9, 12]. We initially select K CBs as the seeds for K clusters. We then assign each remaining CB to its nearest cluster so as to minimize the sum of distances between adjacent objects. Note that this heuristic may not lead to the optimal solution. Suppose that in Fig. 11.11, the distances between CBs are: dist(CB2, CB3) < dist(CB2, CB5) < dist(CB3, CB1) < dist(CB2, CB1) < dist(CB3, CB5), and that the initial seed CBs are CB1 and CB5 for K = 2. When CB3 is checked, it will be assigned to the cluster of {CB1}. Then, CB2 will be assigned to the cluster of {CB5}, which differs from the optimal solution, where CB2 and CB3 should be grouped together since dist(CB2, CB3) < dist(CB2, CB5). To compensate for such mistakes, we introduce the concept of Cross-CB. For adjacent CBs lying around the same node, if their minimum distance is less than ε, we group them into a Cross-CB. Then, the clustering algorithm is applied over the CBs and Cross-CBs.
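The seed-based heuristic and the effect of a Cross-CB can be reproduced with a small sketch. It assumes that a CB is assigned by its distance to the seed CB of each cluster, which matches the outcome of the example above; the function kp_cmon, the unit representation (a tuple of CB names, with several names for a Cross-CB), and the numeric distances are all made up for illustration and only respect the ordering given in the text.

```python
def kp_cmon(units, seeds, dist):
    """Assign every non-seed unit to the cluster of its nearest seed.

    units: tuples of CB names; a plain CB is a 1-tuple, a Cross-CB a
           longer tuple (assumed not to contain a seed)
    seeds: the K seed CB names
    dist:  function giving the network distance between two CBs
    """
    clusters = {seed: [seed] for seed in seeds}
    for unit in units:
        if any(cb in clusters for cb in unit):
            continue  # seeds already own a cluster
        nearest = min(seeds, key=lambda s: min(dist(cb, s) for cb in unit))
        clusters[nearest].extend(unit)
    return clusters


# Invented distances that follow the ordering used in the text.
D = {frozenset(pair): d for pair, d in [
    (("CB2", "CB3"), 1), (("CB2", "CB5"), 2), (("CB3", "CB1"), 3),
    (("CB2", "CB1"), 4), (("CB3", "CB5"), 5)]}


def dist(a, b):
    return D[frozenset((a, b))]


plain = kp_cmon([("CB1",), ("CB3",), ("CB2",), ("CB5",)],
                ["CB1", "CB5"], dist)
crossed = kp_cmon([("CB1",), ("CB2", "CB3"), ("CB5",)],
                  ["CB1", "CB5"], dist)
```

In the plain run CB3 ends up with CB1 and CB2 with CB5, as in the example; once CB2 and CB3 form a Cross-CB they are kept together and both join the cluster of CB5.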
11.5 Summary
In this chapter, we studied the problem of clustering moving objects in a spatial network and proposed a framework to address this problem. By introducing the notion of a cluster block, this framework, on one hand, amortizes the cost of clustering into CB maintenance and combination based on the object movement features in the road network; on the other hand, it efficiently supports different clustering criteria. We have exploited the features of the road network to predict the split and merge of CBs accurately and efficiently. Three different clustering criteria have been defined, and cluster construction algorithms based on CBs have been proposed.
References
1. Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 1998), Seattle, Washington, USA, pp 94-105
2. Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: Ordering Points to Identify the Clustering Structure. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 1999), Philadelphia, Pennsylvania, USA, pp 49-60
3. Fisher D (1987) Knowledge Acquisition Via Incremental Conceptual Clustering. Machine Learning 2:139-172
4. Guha S, Rastogi R, Shim K (1998) CURE: An Efficient Clustering Algorithm for Large Databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 1998), Seattle, Washington, USA, pp 73-84
5. Jain AK, Dubes RC (1988) Algorithms for Clustering Data. Prentice Hall
6. Jin W, Jiang Y, Qian W, Tung AKH (2006) Mining Outliers in Spatial Networks. In: Proceedings of the 11th International Conference on Database Systems for Advanced Applications (DASFAA 2006), Singapore, pp 156-170
7. Karypis G, Han EH, Kumar V (1999) Chameleon: Hierarchical Clustering Using Dynamic Modeling. IEEE Computer 32(8):68-75
8. Kalnis P, Mamoulis N, Bakiras S (2005) On Discovering Moving Clusters in Spatio-Temporal Data. In: Proceedings of the 9th Symposium on Spatial and Temporal Databases (SSTD 2005), Angra dos Reis, Brazil, pp 364-381
9. Kaufman L, Rousseeuw PJ (1990) Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley and Sons Inc.
10. Li YF, Han JW, Yang J (2004) Clustering Moving Objects. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004), Seattle, Washington, USA, pp 617-622
11. Ester M, Kriegel HP, Sander J, Xu X (1996) A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In: Proceedings of the 2nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (SIGKDD 1996), Portland, Oregon, pp 226-231
12. Ng RT, Han J (1994) Efficient and Effective Clustering Methods for Spatial Data Mining. In: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB 1994), Santiago de Chile, Chile, pp 144-155
13. Nehme RV, Rundensteiner EA (2006) SCUBA: Scalable Cluster-Based Algorithm for Evaluating Continuous Spatio-Temporal Queries on Moving Objects. In: Proceedings of the 10th International Conference on Extending Database Technology (EDBT 2006), Munich, Germany, pp 1001-1019
14. Nanopoulos A, Theodoridis Y, Manolopoulos Y (2001) C2P: Clustering Based on Closest Pairs. In: Proceedings of the 27th International Conference on Very Large Data Bases (VLDB 2001), Roma, Italy, pp 331-340
15. Wang W, Yang J, Muntz R (1997) STING: A Statistical Information Grid Approach to Spatial Data Mining. In: Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB 1997), Athens, Greece, pp 186-195
16. Yiu ML, Mamoulis N (2004) Clustering Objects on a Spatial Network. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2004), Paris, France, pp 443-454
17. Zhang Q, Lin X (2004) Clustering Moving Objects for Spatio-Temporal Selectivity Estimation. In: Proceedings of the 15th Australasian Database Conference (ADC 2004), Dunedin, New Zealand, pp 123-130
18. Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: An Efficient Data Clustering Method for Very Large Databases. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 1996), Montreal, Canada, pp 103-114
19. Zahn C (1971) Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters. IEEE Transactions on Computers 20(1):68-86
Chapter 12
Location Privacy
Xiaofeng Meng¹, Jidong Chen²
¹ Renmin University of China, ² EMC Research China
[email protected],
[email protected]
Abstract With the rapid development of sensors and wireless mobile devices, it is easy to access mobile users’ location information anytime and anywhere. On one hand, location-based services (LBS) are becoming more and more valuable and important. On the other hand, the location privacy issues raised by such applications have also gained more attention. However, due to the specificity of location information, traditional privacy-preserving techniques for data publishing cannot be used. In this chapter, we introduce location privacy, analyze the challenges of location privacy preservation, and survey existing work, including system architecture, location anonymization, and query processing.
Key words: location-based service, moving object, location privacy, privacy-preserving, location anonymization
12.1 Introduction
In LBS applications, mobile users send their location information to service providers and enjoy various types of location-based services, such as mobile yellow pages (e.g., “Where is my nearest restaurant”), mobile buddy lists (e.g., “Where is my nearest friend”), traffic navigation (e.g., “What is my shortest path to the Summer Palace”), and emergency support services (e.g., “I need help; send me the nearest police”). LBS is playing an important role in people’s daily life. According to a study from ABI Research, the number of GPS-enabled LBS subscribers will reach 315 million in 2010, up from 12 million in 2006. While people benefit greatly from the useful and convenient information provided by LBSs, the privacy threat of revealing a mobile user’s personal information
(including the identifier and location) has become a severe issue. It has been reported in [18] and in a USA Today article of December 2002¹ that a man was tracking his ex-girlfriend with GPS. Companies increasingly use GPS-enabled cell phones to track employees². Although many cases illustrate the various benefits of mobile devices, the user’s privacy is threatened. To enjoy a good quality of LBS, an exact location is needed; to meet privacy requirements, however, the exact location needs to be hidden. In this chapter we first give an LBS example and show which privacy issues arise. Then, the state of the art of privacy in location-based services is introduced, including challenges, system architecture, and cloaking methods. Finally, the key challenges of privacy-aware queries on moving objects are introduced and compared with traditional query processing.
12.2 Privacy Threats in LBS
A major privacy threat specific to LBS usage is the location privacy breach [7]. Such breaches take place when a party that is not trusted gets access to information that reveals the locations visited by an individual as well as the times at which these visits took place. An adversary can use such location information to infer details about the private life of an individual, such as political affiliations, alternative lifestyles, or medical problems, or about the private business of an organization, such as new business initiatives and partnerships.
For example, using her PDA phone, Alice issues a query to the service provider (e.g., Google Map) to find out “where is the nearest hospital with specialty in cancer.” Alice wants to hide her exact location (e.g., being in a hospital or at home), as well as the fact that it is she who issued a query about cancer. Otherwise, an adversary may infer that Alice has some medical problem.
Location privacy is a particular type of information privacy [1]. Westin defined information privacy as “the claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others” [22]. Location privacy, in turn, is defined as the ability to prevent other parties from learning one’s current or past location. Sensitive data [2] refers to information of general concern, like medical information or financial data that could be transmitted as part of a service request; it may also be the spatiotemporal information regarding the user, as possibly collected by a location-based service provider. Examples include (1) information on the specific location of individuals at specific times, (2) movement patterns of individuals (specific routes at specific times and their frequency), and (3) personal points of interest (frequent visits to specific shops, clubs, or institutions).
¹ “GPS system used to stalk woman” (http://www.usatoday.com/tech/news/2002-12-30-gpsstalkerx .html)
² “Companies increasingly use GPS-enabled cell phones to track employees” (http://wifi.weblogsinc.com/2004/09/24/companies-increasingly-use-gps-enabled-cell-phones-totrack/) Weblogsinc. September, 2004

Privacy threats related to location-based services are classified into two categories [9]: communication privacy threats and location privacy threats. In the communication privacy domain, sender anonymity is maintained, which implies that
eavesdroppers on the network and LBS providers cannot determine the originator of a message. Compared to non-LBS web services, the location information is the key problem: an adversary can re-identify the sender of an otherwise anonymous message by correlating the location information with prior knowledge or observations about a subject’s location. Consider the case where a subject reveals his/her location L in a message M to a location-based service and an adversary A has access to this information. Then, sender anonymity and location privacy are threatened by location information in the following ways:
• Restricted Space Identification. If A knows that space L exclusively belongs to subject S, then A learns that S is in L and that S has sent M. For example, when the owner of a suburban house sends a message from his garage or driveway, the coordinates can be correlated with a database of geocoded postal addresses to identify the residence. An address lookup in phone or property listings then reveals the owner and likely originator of the message.
• Observation Identification. If A has observed the current location L of subject S and finds a message M from L, then A learns that S has sent M. For example, the subject has revealed his/her identity and location in a previous message and then wants to send an anonymous message. The latter message can be linked to the previous one through the location information.
• Location Tracking. If A has identified subject S at location Li and can link a series of location updates L1, L2, ..., Li, ..., Ln to the subject, then A learns that S visited all locations in the series.
Location privacy threats describe the risk that an adversary learns the locations that a subject visited (and the corresponding times). Through these locations, the adversary receives clues about private information such as political affiliations, alternative lifestyles, or medical problems.
Assuming that a subject does not disclose his/her identity at such a private location, an adversary could still gain this information through location tracking. If the subject transmits his/her location with high frequency, the adversary can, at least in less populated areas, link subsequent location updates to the same subject. If at any point the subject is identified, his/her complete movements are also known.
There have been a number of follow-up studies on location privacy preservation, which can be divided into two directions. (1) How to perform location anonymization. Anonymity is the state of not being identifiable within a set of subjects, referred to as the anonymity set [19]. Location anonymity guarantees the inability to associate location information with a particular individual, group, or institution through inference attacks [15]. Specifically, its goal is to prevent disclosure of unnecessary information, including the identity and location of an individual, through explicit or implicit control of what information is given to “whom and when”. (2) How to efficiently answer location-based queries (e.g., nearest neighbor and range queries) with cloaked regions [12, 16]. In a privacy-aware LBS system, location information is fuzzy instead of exact: it can be a set of locations or an obfuscated location. Consequently, query processing as done in traditional moving object databases is no longer applicable, and we have to extend it or find new methods for answering queries with anonymized locations.
The challenges faced in location privacy preservation can be summarized as follows:
1. A trade-off is needed between protecting location privacy and enjoying location-based services. As the data precision increases, so does the data utility; however,
the privacy is threatened. It is often desirable to strike a balance between location privacy and quality of service (QoS) requirements. The same holds for location privacy preservation: when a user issues a query, he or she has to publish an exact location. The more exact the location data, the higher the QoS, but the lower the level of privacy. The QoS here includes response time, communication cost, etc.
2. Location information is multi-dimensional, and its dimensions depend on each other. By contrast, the attributes in data publishing are independent, and each attribute takes a one-dimensional value. In privacy preservation for data publishing, the data are partitioned into groups based on all attributes, and the anonymization method can differ for each dimension. A location, however, is multi-dimensional information whose dimensions cannot be handled separately.
3. Location privacy preservation is on-line and service-centric, and it should tolerate a high frequency of location updates. Data anonymization in data publishing applies to the current snapshot of the data; it is off-line and data-centric, and therefore has no constraints on response time. For location privacy protection, however, the processor has to handle a large number of moving objects whose locations are updated frequently. Therefore, the cloaking time is a very important factor for location anonymization. Meanwhile, the problem of privacy compromise in location cloaking for continuous location updates should be considered, e.g., trajectory anonymization.
4. QoS is a very important factor. In privacy protection for data publishing, the focus is only on whether the user’s private information is protected. In location privacy preservation, however, privacy protection is only one of several issues. Other issues include how long users have to wait for the query answer and how much it costs to obtain the answer. Specifically, the location is fuzzy as a result of anonymization. It is therefore a challenge to provide highly efficient, accurate, and anonymous location-based services based on knowledge of the cloaked spatial areas rather than the exact location information.
5. Privacy requirements are personalized. Different people have different privacy requirements. Moreover, the privacy level for the same person may differ with place or time. For example, when someone is shopping, his/her privacy level is low; if he/she is in a hospital, the privacy level increases. Therefore, we cannot unify everyone’s privacy requirements or force users to accept a minimum level of privacy.
To accommodate personalized privacy requirements, each user can specify at least four parameters for protecting location privacy:
• k: the anonymity level in the location k-anonymity model. Each cloaked region should cover at least k different users. The larger the value of k, the stronger the privacy protection.
• Amin: the minimum area that the cloaked region should have. It prevents the cloaked region from being too small in highly populated areas.
• Amax: the maximum area of the cloaked region. As the area of the cloaked region affects the accuracy and size of the query result, this parameter is a QoS constraint.
• δt: the maximum tolerable cloaking delay, also a QoS parameter. The larger the δt value, the worse the service quality, since the user has a higher chance of moving away from the location where the query was issued.
The first two parameters constrain location anonymization and define the minimum level of privacy. The last two constrain location service quality and define the worst acceptable QoS.
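The four parameters can be captured in a small per-user profile. The following Python sketch is illustrative only: the class name PrivacyProfile, its field names, and the helper satisfies are assumptions introduced here and do not come from any system described in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivacyProfile:
    """Hypothetical container for the four user-specified parameters."""
    k: int          # anonymity level: minimum number of users in the cloaked region
    a_min: float    # minimum area of the cloaked region (privacy constraint)
    a_max: float    # maximum area of the cloaked region (QoS constraint)
    delta_t: float  # maximum tolerable cloaking delay (QoS constraint)

    def __post_init__(self):
        if self.k < 1:
            raise ValueError("k must be at least 1")
        if not 0 <= self.a_min <= self.a_max:
            raise ValueError("need 0 <= a_min <= a_max")
        if self.delta_t < 0:
            raise ValueError("delta_t must be non-negative")


def satisfies(profile: PrivacyProfile, users_covered: int,
              area: float, delay: float) -> bool:
    """Check whether a cloaked region meets all four constraints of a profile."""
    return (users_covered >= profile.k
            and profile.a_min <= area <= profile.a_max
            and delay <= profile.delta_t)
```

Because privacy requirements depend on place and time, a user could keep several such profiles (for example, one for shopping and a stricter one for a hospital visit) and attach the appropriate one to each query.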
12.3 System Architecture
System architectures for location privacy are classified into three categories: non-cooperative architecture, centralized architecture, and peer-to-peer architecture. In the non-cooperative architecture, users rely only on their own knowledge to preserve their location privacy. In the centralized architecture, a centralized entity is responsible for gathering information and providing the required privacy for each user. In the peer-to-peer architecture, users collaborate with each other, without a centralized entity, to provide customized privacy for each user.
12.3.1 Non-Cooperative Architecture
The non-cooperative architecture [5] consists of many mobile users and an untrusted service provider. Each client is assumed to be location-aware, i.e., able to determine its own position (e.g., using GPS or WLAN-based positioning), and to have enough computing power and storage to derive the anonymized location according to its personalized privacy requirement. Location obfuscation is performed at the client’s end. On receiving the anonymized location, the untrusted service provider processes the request and sends the candidate results back to the user. As the client knows its own exact location, it obtains the true result on its own. In short, both location anonymization and result refinement are completed by the clients themselves. The advantage of this architecture is that it is simple and easy to combine with other technologies. However, it places high demands on the client. Worst of all, the client generates the anonymized location using only its own knowledge and ignores other users’ locations, so privacy is easily threatened. For example, [6] reduces the resolution of the location to protect location privacy, and thus issues a cloaked region. However, the region may cover only one user, in which case a query issued from it can easily be matched with its issuer, and query privacy is disclosed.
12.3.2 Centralized Architecture
The system consists of many mobile users, a trusted anonymizing proxy, and an untrusted service provider. Compared with the non-cooperative architecture, a third-party anonymizing proxy (middleware) is required for all communication between mobile users and LBS applications. Its functions can be summarized as follows:
• It receives the exact locations from clients.
• It blurs the locations and sends the blurred locations to the service provider.
• It receives and refines the candidate results sent by the service provider, and relays the exact query result to the clients.
Mobile clients communicate with third-party LBS providers through the anonymizing proxy, which acts as a secure gateway to the LBS providers. The mobile user sends location-based queries to the anonymizing proxy. Upon receiving a query, the proxy removes any identifiers, such as IP addresses, and invokes the location cloaking algorithm to generate a cloaked region in accordance with the user’s privacy requirement. It then forwards the modified query to the service provider. Finally, it relays the result returned by the service provider to the mobile user. With a trusted anonymizing proxy, this architecture provides strong privacy guarantees with high-quality services. However, it still suffers from the following drawbacks [10]:
• The centralized anonymizer is a bottleneck, because it handles query requests, frequent updates of user locations, and result post-processing. Moreover, the anonymizer is a single point of failure; the system cannot function without it.
• The anonymizer’s complete knowledge of the locations and queries of all users is a serious security threat if it is compromised. Even if there is no attack, the centralized anonymizer may be subject to governmental control, and may be banned or forced to disclose sensitive user information.
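The message flow through the anonymizing proxy can be sketched as follows. This is a minimal illustration under stated assumptions, not an implementation of any published system: the names Query, grid_cloak, and handle_query are hypothetical, the service provider is modeled as a plain callable, and grid_cloak is only a placeholder that snaps a location to a fixed grid cell without enforcing k-anonymity.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Provider = Callable[[Rect, str], List[Point]]  # hypothetical provider interface


@dataclass
class Query:
    user_id: str     # identifier; must never reach the provider
    location: Point  # exact location; must be cloaked
    content: str     # query content; forwarded unchanged


def grid_cloak(location: Point, cell: float = 10.0) -> Rect:
    """Placeholder cloaking: return the grid cell containing the location.
    A real cloaking algorithm would also enforce the user's privacy profile."""
    x, y = location
    x0, y0 = (x // cell) * cell, (y // cell) * cell
    return (x0, y0, x0 + cell, y0 + cell)


def handle_query(query: Query, provider: Provider,
                 cloak: Callable[[Point], Rect] = grid_cloak) -> Optional[Point]:
    """Proxy steps: drop the identifier, cloak the location, forward the
    query, then refine the candidate results with the exact location."""
    region = cloak(query.location)
    candidates = provider(region, query.content)  # user_id is not forwarded
    qx, qy = query.location
    return min(candidates,
               key=lambda p: (p[0] - qx) ** 2 + (p[1] - qy) ** 2,
               default=None)
```

In this sketch the provider only ever sees the cloaked rectangle and the query content, while the refinement step, which needs the exact location, stays inside the trusted proxy.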
12.3.3 Peer-to-Peer Architecture
Similar to the non-cooperative architecture, the peer-to-peer architecture consists of mobile clients and service providers. Unlike in the non-cooperative architecture, however, the users collaborate with each other to provide customized privacy for each user. Each mobile user carries a mobile device (e.g., mobile phone, PDA) with embedded positioning capabilities (e.g., GPS). The devices have processing power and access the network through a wireless protocol such as WiFi, GPRS, or 3G. Moreover, each device has a unique network identity (e.g., IP address) and can establish point-to-point communication (e.g., TCP/IP sockets) with any other device in the system through a base station (i.e., the two devices do not need to be within communication range of each other). For security reasons, all communication links are encrypted. In addition, there is a trusted central Certification Server (CS), where users are registered. Prior to entering the system, a user u must authenticate against the CS and obtain a certificate. Users having a certificate are trusted by all other users. Typically, a certificate is valid for a few hours; it can be renewed by recontacting the CS. Apart from the certificate, the CS returns to u the IP addresses of some users who are currently in the system. u uses this list to identify an entry point to the distributed network. Note that the CS does not know the locations of the users and does not participate in the anonymization process. Therefore, the workload of the CS is low (i.e., no location updates); moreover, it does not store any sensitive information.
Each user corresponds to a peer. Peers are partitioned into groups according to their location, and within each group the peers elect a head. The anonymization process can be completed either by the group head or by the user who issues the request. The candidate results, however, are refined by the group head for the users in its group. To achieve load balancing, group heads can be rotated in a round-robin manner [10]. Three main issues have to be addressed in this architecture: anonymization, query processing, and head selection. Group Formation [4] and PRIVE [10] are two representative works.
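Grouping peers by location and rotating the group head can be illustrated with a very small sketch. It assumes, purely for illustration, that groups are formed by a fixed grid over the plane and that the head role cycles through the members in sorted order; the cited systems use their own grouping and election schemes, and the function names group_by_cell and head_rotation are hypothetical.

```python
from collections import defaultdict
from itertools import cycle
from typing import Dict, Iterator, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def group_by_cell(peers: Dict[str, Point], cell_size: float) -> Dict[Cell, List[str]]:
    """Partition peers into groups by the grid cell that contains them
    (an assumed grouping rule, used here only as an example)."""
    groups: Dict[Cell, List[str]] = defaultdict(list)
    for peer_id in sorted(peers):
        x, y = peers[peer_id]
        groups[(int(x // cell_size), int(y // cell_size))].append(peer_id)
    return dict(groups)


def head_rotation(members: List[str]) -> Iterator[str]:
    """Round-robin schedule: each call to next() yields the next group head."""
    return cycle(members)
```

Calling next() on the iterator once per rotation period spreads the refinement workload evenly over the members of a group.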
12.4 Location Anonymization Techniques
The goal of location anonymization is to protect the user’s location while meeting user-specified QoS requirements. A query in LBS can be formalized as r = (id, l, q), where id is the user’s identifier, l = (x, y) is the user’s current location, and q is the query content. These three components have different implications. First, id uniquely identifies a user. It must not be revealed to any third party and should be removed before the query is forwarded to the LBS server. Second, l is a quasi-identifier (QI) attribute: it cannot directly identify a user, but may reveal the user’s association with requests when joined with external data (e.g., background knowledge such as yellow pages, or location data obtained by network-based positioning). Thus, l should be cloaked (enlarged) in the request sent to the LBS server. Third, q is a sensitive attribute, which may be confidential to an individual (subject to his/her preference) but must be sent to the LBS server in order to answer the request. Following this analysis, the simplest approach is to replace the user’s identity with a pseudonym before sending the query to the service provider. However, as described in Section 12.2, this is not enough: the location information has to be anonymized as well. There have been a number of studies on this issue [5, 6, 10, 13, 14, 15, 21].
12.4.1 Location K-Anonymity Model
The location k-anonymity model is the most widely accepted metric for location privacy preservation. The k-anonymity model was originally proposed by Sweeney [20] for privacy protection in data publishing. As defined in [20], a release of data provides k-anonymity protection if the information for each individual contained in the release cannot be distinguished from that of at least k − 1 other individuals whose information also appears in the release. To address the location privacy issue, location k-anonymity was proposed by Gruteser and Grunwald [9]. A mobile user is considered location k-anonymous if and only if the location information sent to the service provider is indistinguishable from that of at least k − 1 other users. More specifically, location information is represented by a tuple containing three intervals ([x1, x2], [y1, y2], [t1, t2]). The intervals [x1, x2] and [y1, y2] describe a two-dimensional area where the subject is located, and [t1, t2] describes a time period during which the subject was present in the area. Note that the intervals represent uncertainty ranges; we only know that at some point in time within the temporal interval the subject was present at some point of the area given by the spatial intervals. Thus, a location tuple for a subject is k-anonymous when it describes not only the location of the subject, but also the locations of k − 1 other subjects. In other words, k − 1 other subjects must also have been present in the area during the time period described by the tuple.
For example, Fig. 12.1 shows a location 4-anonymity example (for convenience, the time interval is omitted here). The locations of A, B, C, and D are all extended to a rectangle CR = ([xbl, xur], [ybl, yur]), where (xbl, ybl) and (xur, yur) are the bottom-left and upper-right corners of the cloaked region, respectively. The same information is shown in tabular form in Table 12.1. Thus, the adversary cannot be sure of the exact location of each mobile user. The users in the cloaked region constitute the cloaking set; in this example, the cloaking set is {A, B, C, D}. Generally speaking, the larger the anonymity level k is, the higher the degree of anonymity. Note that k is specified by the user and is one of the four parameters mentioned in Section 12.2. In general, the larger k is, the larger the cloaked region is, but its size largely depends on the surrounding environment. With k = 100, the cloaked region will be very small if the user is in a shopping mall, but may be very large if the user is in a desert.
Fig. 12.1 Location 4-anonymity
Table 12.1 Location 4-anonymity
User    Real location    Anonymity location
A       (xA, yA)         ([xbl, xur], [ybl, yur])
B       (xB, yB)         ([xbl, xur], [ybl, yur])
C       (xC, yC)         ([xbl, xur], [ybl, yur])
D       (xD, yD)         ([xbl, xur], [ybl, yur])
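The idea behind Table 12.1 can be reproduced with a naive cloaking routine. The sketch below is an assumption made for illustration and is not one of the algorithms cited in this chapter: it simply takes the bounding rectangle of the issuer and its k − 1 nearest users, which is easy to follow but not robust against all attacks. The function names cloak_knn and cloaking_set are hypothetical.

```python
from typing import Dict, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_bl, y_bl, x_ur, y_ur)


def cloak_knn(users: Dict[str, Point], issuer: str, k: int) -> Rect:
    """Naive cloaking: bounding rectangle of the issuer and its k-1 nearest users."""
    if not 1 <= k <= len(users):
        raise ValueError("k must be between 1 and the number of users")
    ix, iy = users[issuer]
    nearest = sorted(users.values(),
                     key=lambda p: (p[0] - ix) ** 2 + (p[1] - iy) ** 2)[:k]
    xs = [p[0] for p in nearest]
    ys = [p[1] for p in nearest]
    return (min(xs), min(ys), max(xs), max(ys))


def cloaking_set(users: Dict[str, Point], region: Rect) -> Set[str]:
    """All users whose real location falls inside the cloaked region."""
    x_bl, y_bl, x_ur, y_ur = region
    return {u for u, (x, y) in users.items()
            if x_bl <= x <= x_ur and y_bl <= y <= y_ur}
```

For four users that are close together and a fifth that is far away, every one of the four obtains the same rectangle for k = 4, which corresponds to the identical "Anonymity location" column in Table 12.1.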
12.4.2 p-Sensitivity Model
Several methods have been proposed to support location-based services without revealing mobile users’ private information. There are two types of privacy concerns in location-based services: location privacy and query privacy. Existing studies, based on location k-anonymity, mainly focus on location privacy and are insufficient to protect query privacy. In particular, because it lacks semantics, location k-anonymity is vulnerable to the query homogeneity attack. In many LBS applications, mobile users do not mind revealing their exact location. However, they would like to hide the fact that they have issued queries with sensitive content, as such information may reveal their personal interests (e.g., searching for the nearest clinic while in an insensitive public place). In this section, we discuss the protection of query privacy for LBS applications.
The existing location k-anonymity technique can be used to improve the protection of query privacy, but the protection it provides is not sufficient. Consider a scenario where each query location is enlarged in accordance with k-anonymity, so that each query location is covered by at least k queries (hereafter called the anonymity set). Thus, even if the adversary knows the exact location of a user, he cannot link the user to a specific query, only to a set of k queries. However, one main weakness of k-anonymity is that it considers only spatial proximity in forming anonymity sets, not query semantics. In an extreme case, if all queries in the anonymity set have the same content, query privacy is still revealed. This situation is not uncommon. For example, when friends meet after office hours and discuss visiting some club, they may all issue location-based queries containing the keyword “club”. Since these query locations are spatially close, they are very likely to be anonymized together in the same anonymity set. As a result, although the adversary cannot infer which user issued which query, he knows that all of these users queried about clubs.
As another example, consider several specialty clinics located in a small downtown area where people easily lose their way after leaving the highway exit. Users near the exit may often issue location-based queries to find the way to one of these clinics, and these queries are then likely to be anonymized with each other. Furthermore, even if the k queries in an anonymity set are not of the same kind (e.g., satisfying l-diversity [17]), the result is still unacceptable to some users if all of the queries contain sensitive information (e.g., some ask about clubs and others about clinics). In short, because it lacks semantics, location k-anonymity can only prevent the association between users and requests, not the association between users and (sensitive) query contents, and hence suffers from the aforementioned attacks.
To protect query privacy, we first define the query semantics. For simplicity, we assume that each query can be classified into one of two types according to its content: (1) insensitive query (Qi), e.g., queries about traffic; (2) sensitive query (Qs), e.g., queries about bars, clinics, and political information. Under this assumption, the attacker may obtain the tables R′ and R∗ (as shown in Fig. 12.2) and attempt to establish their relationship. Figure 12.3 illustrates the query homogeneity attack with an example. We assume that there are six users u1 through u6. In the external table R∗, user u4 has the location l4∗ = (5, 8). When l4∗ is joined with R′, the attacker can observe that l4∗ is covered by the cloaking regions of four requests, each of which covers more than one location in R∗. Thus, the attacker only knows that u4 has sent one of the four queries but cannot tell which one. However, all four queries are about “club”. Hence, the attacker can conclude that u4 must have queried about “club”, which might be sensitive with respect to u4’s privacy preference. In short, the attacker can infer with high confidence that a user has issued some sensitive query.
Fig. 12.2 Original table, anonymized table, and external table.
Fig. 12.3 Query homogeneity attack
To protect against the location linking attack, each query can be de-linked from its issuer by confusing the attacker with more than one user appearing in the cloaking region of the query; likewise, each user can be de-linked from his/her query by confusing the attacker with more than one query whose cloaking region covers the user’s location. Given an anonymized query r′, let r′.Su be the set of users located in its cloaking region, and denote by P(r′ → u∗) the probability that a user u∗ in r′.Su is the true issuer of r′. Given a user u∗, let u∗.Sr be the set of queries whose cloaking regions cover the user’s location, and denote by P(u∗ → r′) the probability that a query r′ in u∗.Sr was sent by u∗. Under the assumption of uniform background knowledge, a query is equally likely to have been sent by any user in its Su, and a user is equally likely to have sent any request in his/her Sr. Thus, in order to defend against the location linking attack, both P(r′ → u∗) and P(u∗ → r′) are required to be less than or equal to the user-specified threshold 1/k:
$$P(r' \to u^*) = \frac{1}{|r'.S_u|} \le \frac{1}{k} \tag{12.1}$$
$$P(u^* \to r') = \frac{1}{|u^*.S_r|} \le \frac{1}{k} \tag{12.2}$$
To protect against the query homogeneity attack, each user can be de-linked from sensitive queries by confusing the attacker with some insensitive queries in the user’s Sr. Given a user u∗, denote by P(u∗ → Qs) the probability that u∗ has sent some sensitive query. It is required that P(u∗ → Qs) is always less than the user-specified threshold p, which can be formalized as:
$$P(u^* \to Q_s) = \frac{\sum_{r_i \in u^*.S_r} v_i}{|u^*.S_r|} < p \tag{12.3}$$
where vi is the sensitivity value of query ri (1 if ri is sensitive and 0 otherwise), so that Σ vi is the total number of sensitive queries in the request anonymity set of user u∗. Eqs. (12.1) and (12.2) ensure that any query is linked with at least k users and any user is linked with at least k queries. Eq. (12.3) ensures that the probability of any user having sent some sensitive query is less than p. Finally, we summarize the p-sensitivity model as follows.
p-Sensitivity: p-sensitivity is satisfied if and only if:
• for each user u∗, P(u∗ → r′) ≤ 1/k and P(u∗ → Qs) < p;
• for each query r′, P(r′ → u∗) ≤ 1/k.
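The three conditions of the p-sensitivity model can be checked mechanically once the user locations and the anonymized requests are known. The following sketch is a hedged illustration: the class AnonRequest, the function is_p_sensitive, and the choice to skip users that are covered by no request are assumptions made here, and the sensitive fraction is computed as the number of sensitive requests divided by the size of the user's request set.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_bl, y_bl, x_ur, y_ur)


@dataclass(frozen=True)
class AnonRequest:
    region: Rect     # cloaking region of the anonymized request
    sensitive: bool  # True for a sensitive query (Qs), False for Qi


def covers(region: Rect, loc: Point) -> bool:
    x_bl, y_bl, x_ur, y_ur = region
    return x_bl <= loc[0] <= x_ur and y_bl <= loc[1] <= y_ur


def is_p_sensitive(users: Dict[str, Point], requests: List[AnonRequest],
                   k: int, p: float) -> bool:
    # Each request must be linked with at least k users: 1/|Su| <= 1/k.
    for r in requests:
        s_u = [u for u, loc in users.items() if covers(r.region, loc)]
        if len(s_u) < k:
            return False
    for loc in users.values():
        s_r = [r for r in requests if covers(r.region, loc)]
        if not s_r:
            continue  # assumption: users linked to no request are not at risk
        # Each user must be linked with at least k requests: 1/|Sr| <= 1/k.
        if len(s_r) < k:
            return False
        # The fraction of sensitive requests in Sr must stay below p.
        if sum(r.sensitive for r in s_r) / len(s_r) >= p:
            return False
    return True
```

With four identical regions that all carry sensitive queries, the check fails for any p ≤ 1, which is exactly the query homogeneity attack; replacing two of them by insensitive queries lowers the sensitive fraction to 0.5.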
12.4.3 Anonymization Algorithms
In terms of the techniques used for protecting location privacy, the existing approaches can be classified into three categories: dummies, cloaking, and encryption. The first technique is to generate dummies. A user reports a dummy location instead of his/her genuine location. In Fig. 12.4, the circle represents the query and the square represents the queried object; the black point is the true location and the white points are dummies. Because the reported location is false, the user’s privacy is protected. The QoS and the degree of privacy mainly depend on how far the dummy is from the true location: the larger the distance, the worse the QoS, but the more privacy is preserved.
Fig. 12.4 Dummies
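A dummy-based report can be sketched in a few lines. The version below assumes the variant in which the true location is sent together with several dummies, so that the recipient cannot tell which one is real; the function name report_with_dummies, the square sampling area, and the radius parameter are illustrative assumptions rather than details taken from the cited work.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def report_with_dummies(true_loc: Point, n_dummies: int, radius: float,
                        rng: random.Random) -> List[Point]:
    """Return the true location mixed with n_dummies random dummy locations.
    Dummies are drawn uniformly from a square of half-width `radius`
    centred on the true location (an assumption of this sketch)."""
    x, y = true_loc
    reported = [true_loc]
    for _ in range(n_dummies):
        reported.append((x + rng.uniform(-radius, radius),
                         y + rng.uniform(-radius, radius)))
    rng.shuffle(reported)  # hide the position of the true location in the list
    return reported
```

A larger radius pushes the dummies further from the true location, which increases privacy but also enlarges the set of results the provider has to return, matching the trade-off described above.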
The second technique is cloaking. The main idea of cloaking is to reduce the spatio-temporal resolution of the user location. A precise location is replaced with a cloaked region, as shown in Fig. 12.5, so that the attacker cannot know the exact location of the user. The cloaked region is a closed region of any shape, with a predefined probability distribution of the user’s position within it. Most existing work uses a rectangle or a circle to represent a cloaked region and assumes that the user is equally likely to be anywhere in the region. The difference between cloaking and dummies is that a cloaked location is fuzzy, whereas dummy locations are all precise and the attacker simply cannot tell which one is real. The larger the cloaked region, the more privacy is preserved, but the less specific the request.
Fig. 12.5 Cloaking
The third technique is encryption, which has recently been suggested for location privacy protection [11]. Its main idea is that the query is encrypted so that the service provider answers it without knowing what information is being retrieved. The user then decrypts the candidate results and refines them at the client side. For example, Ghinita et al. [11] proposed a framework based on Private Information Retrieval (PIR). The framework partitions the space into grid cells, and the user requests the content of the cell in which he/she is located. Thanks to PIR, the user can conceal which cell is requested while still receiving the correct content.
12.5 Evaluation Metrics
Compared with dummies and encryption, cloaking is the most widely used method [3, 7, 8, 14]. In this section, we introduce several evaluation metrics for system-level control of the balance between privacy and performance in terms of QoS. These metrics [7] can be used to evaluate the effectiveness and efficiency of cloaking-based anonymization algorithms.
Success rate is an important measure of the effectiveness of the location k-anonymity model. Given the set S of requests issued and the subset S′ ⊆ S of requests that are successfully anonymized, it is defined as the percentage of messages that are successfully anonymized:
$$SR = \frac{|S'|}{|S|} \tag{12.4}$$
where |S′| is the number of requests that have been anonymized successfully and |S| is the number of requests issued.
Relative anonymity level is a measure of the level of anonymity provided by the cloaking algorithm, normalized by the level of anonymity required by the messages. It is measured by k′/k, where k′ is the number of users actually included in the cloaking region and k is the number required by the user. Note that the relative anonymity level cannot go below 1. Higher relative anonymity levels mean that, on average, messages are anonymized with larger k values than the user-specified minimum k-anonymity levels. In general, we prefer algorithms that provide higher relative anonymity levels.
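The first two metrics are simple ratios and can be computed as in the sketch below. The function names are hypothetical, and averaging k′/k over all anonymized messages is an assumption based on the phrase "on average" in the text.

```python
from typing import Iterable, Tuple


def success_rate(num_anonymized: int, num_issued: int) -> float:
    """SR = |S'| / |S|, the fraction of issued requests that were anonymized."""
    if num_issued <= 0:
        raise ValueError("at least one request must have been issued")
    return num_anonymized / num_issued


def relative_anonymity_level(levels: Iterable[Tuple[int, int]]) -> float:
    """Mean of k'/k over (k_actual, k_required) pairs of anonymized messages."""
    ratios = [k_actual / k_required for k_actual, k_required in levels]
    if not ratios:
        raise ValueError("no anonymized messages given")
    return sum(ratios) / len(ratios)
```

For example, if 8 of 10 requests are anonymized, the success rate is 0.8; two messages that both require k = 5 and are cloaked with 5 and 10 users give a relative anonymity level of 1.5.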
Relative spatial resolution is a measure of the spatial resolution provided by the cloaking algorithm, normalized by the minimum acceptable spatial resolution defined by the spatial tolerances. Higher relative spatial resolution values imply that anonymization is performed with smaller spatial cloaking regions relative to the constraint boxes specified.
Relative temporal resolution is a measure of the temporal resolution provided by the cloaking algorithm, normalized by the minimum acceptable temporal resolution defined by the temporal tolerances. Higher relative temporal resolution values imply that anonymization is performed with smaller temporal cloaking intervals and thus with smaller delays due to perturbation. Relative spatial and temporal resolutions cannot go below 1.
Message processing time is a measure of the running time performance of the anonymization algorithm. It is the period from when a request is received to when the request is successfully cloaked, and it includes the cloaking time as well as the time spent waiting to be cloaked. The message processing time may become a critical issue if the available computational power is not enough to handle incoming messages at a high rate.
Important measures of efficiency include relative anonymity level, relative temporal resolution, relative spatial resolution, and message processing time. The first three are related to quality of service, whereas the last one is a performance measure.
12.6 Summary
This chapter presented the definitions, models, and techniques of location privacy preservation. It consists of four main components. First, we introduced location privacy threats and gave an overview of the state-of-the-art research. Second, we presented three system architectures for location privacy preservation. Third, we discussed various location privacy models and techniques. Finally, we introduced several evaluation metrics for system-level control of the balance between privacy and performance in terms of QoS. In real life, several major privacy threats arise from the use of location-detection devices. Therefore, location privacy is a major obstacle to the ubiquitous deployment of location-based services. Location privacy protection is a newly developing field, and several open issues remain to be researched.
References
1. Beresford AR, Stajano F (2003) Location Privacy in Pervasive Computing. IEEE Pervasive Computing 2(1):46-55
2. Bettini C, Wang XS, Jajodia S (2005) Protecting Privacy Against Location-based Personal Identification. In: Proceedings of the VLDB Workshop on Secure Data Management (SDM 2005), Trondheim, Norway, pp 185-199
3. Bamba B, Liu L, Pesti P, Wang T (2008) Supporting Anonymous Location Queries in Mobile Environments with PrivacyGrid. In: Proceedings of the 17th International Conference on World Wide Web (WWW 2008), Beijing, China, pp 237-246
4. Chow CY, Mokbel MF, Liu X (2006) A Peer-to-Peer Spatial Cloaking Algorithm for Anonymous Location-Based Services. In: Proceedings of the 14th ACM International Symposium on Geographic Information Systems (GIS 2006), Arlington, Virginia, USA, pp 171-178
5. Cheng R, Zhang Y, Bertino E, Prabhakar S (2006) Preserving User Location Privacy in Mobile Data Management Infrastructures. In: Proceedings of the 6th Workshop on Privacy Enhancing Technologies (PET 2006), Cambridge, United Kingdom, pp 393-412
6. Du J, Xu J, Tang Z, Hu H (2007) iPDA: Supporting Privacy-Preserving Location-Based Mobile Services. In: Proceedings of the 8th International Conference on Mobile Data Management (MDM 2007), Mannheim, Germany, pp 212-214
7. Gedik B, Liu L (2008) Protecting Location Privacy with Personalized k-Anonymity: Architecture and Algorithms. IEEE Transactions on Mobile Computing 7(1):1-18
8. Gedik B, Liu L (2005) Location Privacy in Mobile Systems: A Personalized Anonymization Model. In: Proceedings of the 25th International Conference on Distributed Computing Systems (ICDCS 2005), Columbus, Ohio, USA, pp 620-629
9. Gruteser M, Grunwald D (2003) Anonymous Usage of Location Based Services Through Spatial and Temporal Cloaking. In: Proceedings of the 1st International Conference on Mobile Systems, Applications, and Services (MobiSys 2003), San Francisco, California, USA, pp 31-42
10. Ghinita G, Kalnis P, Skiadopoulos S (2007) MobiHide: A Mobile Peer-to-Peer System for Anonymous Location-Based Queries. In: Proceedings of the 10th Symposium on Spatial and Temporal Databases (SSTD 2007), Boston, Massachusetts, USA, pp 221-238
11. Ghinita G, Kalnis P, Khoshgozaran A, Shahabi C, Tan K (2008) Private Queries in Location Based Services: Anonymizers Are Not Necessary. In: Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2008), Vancouver, BC, Canada, pp 121-132
12. Hu H, Lee D (2006) Range Nearest-Neighbor Query. IEEE Transactions on Knowledge and Data Engineering 18(1):78-91
13. Kido H, Yanagisawa Y, Satoh T (2005) An Anonymous Communication Technique Using Dummies for Location-Based Services. In: Proceedings of the 2005 IEEE International Conference on Pervasive Services (ICPS 2005), Santorini, Greece, pp 88-97
14. Kalnis P, Ghinita G, Mouratidis K, Papadias D (2006) Preserving Anonymity in Location Based Services. Technical Report TRB6/06, Department of Computer Science, National University of Singapore
15. Liu L (2007) From Data Privacy to Location Privacy: Models and Algorithms. In: Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB 2007), Vienna, Austria, pp 1429-1430
16. Mokbel MF, Chow C, Aref WG (2006) The New Casper: A Privacy-Aware Location-Based Database Server. In: Proceedings of the International Conference on Very Large Data Bases (VLDB 2006), Seoul, Korea, pp 1499-1500
17. Machanavajjhala A, Gehrke J, Kifer D (2006) l-Diversity: Privacy Beyond k-Anonymity. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE 2006), Atlanta, Georgia, USA, pp 24-37
18. Man Accused of Stalking Ex-girlfriend with GPS. Fox News. September, 2004. http://www.foxnews.com/story/0,2933,131487,00.html
19. Pfitzmann A, Koehntopp M (2000) Anonymity, Unobservability, and Pseudonymity - A Proposal for Terminology. In: Proceedings of the International Workshop on Design Issues in Anonymity and Unobservability, Berkeley, California, USA, pp 1-9
20. Sweeney L (2002) K-anonymity: A Model for Protecting Privacy. International Journal on Uncertainty, Fuzziness and Knowledge-Based Systems 10(5):557-570
21. Xiao Z, Meng X, Xu J (2007) Quality-Aware Privacy Protection for Location-Based Services. In: Proceedings of the 12th International Conference on Database Systems for Advanced Applications (DASFAA 2007), Bangkok, Thailand, pp 434-446
22. Westin AF (1967) Privacy and Freedom. New York NY: Atheneum
Index
B+ -tree, 34 Bx -tree, 34, 70 abstract data type, 14, 15 acceleration, 25 adaptive network R-tree, 48, 52 adaptive unit, 48 adversary, 173 aggregate nearest neighbor query, 69 air traffic control, 127 animal migration analysis, 149 ANN query, 69 anonymity, 173 anonymity proxy, 176 ANR-tree, 52, 58 answer loss, 87 AU, 48 augmented reality, 126, 127 bandwidth allocation, 127 blockage, 139 Bluetooth, 4 bottom-up, 99, 150 boundary, 164 branch-and-bound, 69 CA, 17, 105 CB, 150, 153 cell, 17, 96, 105 dense cell, 96 dense leaf cell, 95 sparse cell, 96 cellular automaton, 14, 17 cellular communication, 126 cellular network, 127 centralized architecture, 175 certification server, 176 change-tolerant property, 60 cloaked region, 173, 178 cloaking, 172, 181 cluster block, 150, 153, 161
cluster unit, 88 clustering density-based, 150, 151, 157, 167 DBSCAN, 151 grid-based, 150 hierarchical, 150, 154 agglomerative, 150 divisive, 150 model-based, 150 partitioning, 150, 167 k-means, 150 k-medoids, 150 clustering analysis, vi, vii, 8, 150 clustering criteria, 150, 160, 164 clustering moving object, 149 CMON, 150 compactness, 159 computer simulation, 18 congestion, 106 constant function, 16 context-awareness, 126, 127 coordinate, 15 CTR-tree, 60 CU, 88 data anonymization, 174 data mining, vi, 7 data model, vi data publishing, 174 data sketch, 58 DB-CMON, 167 DBMS, 6, 13 dead space, 128 dense area, 69, 80 dense region, 151 dense segment, 80, 87 dense segment set, 90 density, 87 density query, 67, 69, 80, 86 continuous density query, 93 effective density query, 87
period density query, 69 snapshot density query, 69, 93 depth-first, 69 detour, 130 deviation, 30 digital battlefield, 126 Dijkstra algorithm, 71, 128, 165 dimension reduction, 60 dimension transformation, 34 direct access table, 50 discrete change, 135 discrete point, 106 disjoint cell, 87 dissimilarity, 150 distance metric, 152 distance of vector, 83 drill-down, 130 dummy, 181 dummy location, 181 dynamic attribute, 7, 14, 15 dynamic graph, 137 dynamic graph system, 137 dynamic route, 138 dynamic traffic navigation, 127 dynamic transportation navigation, vii dynamic transportation network, vii, 135 DyNSA, 132
edge, 18, 153, 154 edge-based clustering, 154 electronic map, 16 emergency support service, 171 encryption, 181 Euclidean distance, 70, 82, 149 Euclidean restriction, 71 Euclidean space, v, 34, 67, 70
fly-through visualization, 126, 127 FNR-tree, 34 FT-Quad-tree, 43 future trajectory, 34, 55 future trajectory Quad-tree, 43 fuzzy, 173 GCA, 17, 105 structure, 18 trajectory, 19 transition, 105 geographic resource discovery, 126 global positioning system, v, 4 GPS, 4, 125 granularity, 129 graph line, 141 graph of cellular automata, vi, 17 graph point, 140 graph region, 141 graph route section, 140 grid, 37, 94 grid block, 41 grid bucket, 37
Grid File, vi, 35, 36, 128 group, 150 GSP, 28
head selection, 177 hierarchy aggregation tree, 128 histogram, 151 historical trajectory, 33, 58 index traverse, 69 index update, 34, 59 inference attack, 173 information privacy, 172 insensitive query, 179 integrated tourist service, 5 intelligent transportation management, v interpolation, 143 intersection, 16, 154 junction, 136 in-graph junction, 136, 138 inter-graph junction, 138 intergraph junction, 136 k-anonymity model, 174, 177 kNN query, vi, 71, 83 KP-CMON, 167
laminar traffic, 106 lane, 20 lazy update, 60 lazy update R-tree, 59 LBS, 5 linear constraint, 14 linear constraint database, 15 linear function, 16 linear movement, 14 linear prediction, vii, 23, 103, 104 linear regression, 106 location anonymity, 173 location anonymization, 173, 177 location evaluation metric, 182 message processing time, 183 relative anonymity level, 182 relative spatial resolution, 182 relative temporal resolution, 183 success rate, 182 location linking attack, 180 location management, v, 6 location modeling, 13 location obfuscation, 175 location privacy, vi, vii, 8, 172, 179 location privacy breach, 172 location privacy threat, 172 location representation, 15 location tracking, 173 location update, vi, 23, 103, 114 group location update, vi, 28 proactive location update, vi, 26 location-aware advertising, 5
location-aware content delivery, 126 location-based games and entertainment, 5 location-based service, v, 126, 171 lower bound, 107 LUR-tree, 59, 70 MBR, 35 MD-CMON, 165 micro moving cluster, 151 minimum bounding rectangle, 35 minimum distance metric, 164 MMC, 151 Mobile Ad hoc Network, 127 mobile buddy list, 171 mobile computing, vi, 3 mobile data management, 6 mobile database, 6 mobile e-commerce and marketing, 127 mobile resource management, 126 mobile user, 125 mobile workforce management, 127 mobile yellow page, 171 mobility, 3 MOD, 6 MON-tree, 34 motion vector, 15, 114 movement similarity, 28 moving graph point, 141 discrete presentation, 141 moving object, 13 moving object management, 126 moving object query, vi density query, vi similar trajectory query, vi moving objects database, v moving objects indexing, 7 moving objects management, v moving objects modeling, 7 moving objects querying, 8 moving objects updating, 7 moving pattern, 80 moving plan, 43 moving segment, 43 moving space, vi, 68 multi-dimensional index structure, 59 network, 152 network constraint, 14, 159 network distance, 67, 70, 149, 151, 152 network expansion, 71, 73 network representation graph, 15 kilometer-post, 15 two-dimensional geographical, 15 NN query, 67–69 node, 153, 154 node-based clustering, 157 non-cooperative architecture, 175 non-linear function, 16 non-linear prediction, 103
observation identification, 173 OLAP, 130 OLSE, 106 optimal path, 128 ordinary least square estimation, 106 out-degree, 108 outlier, 81, 151 p-sensitivity model, 181 peer, 177 peer-to-peer architecture, 176 piece-wise linear segment, 104 PMR Quad-tree, 37 point-to-point communication, 176 polyline, 19, 33 positioning technique, vi, 4 pre-aggregation, 128 pre-computation, 69 predictive query, 34 predictive range query, 55 priority queue, 158 privacy preserving, 173 privacy requirement, 174 privacy threat, 171, 172 privacy-aware LBS, 173 privacy-aware query, 172 private information retrieval, 181 QoS, 173 Quad-tree, vi, 35, 94, 128 quadrant, 37, 94 quality of service, 173 quasi-identifier, 177 query homogeneity attack, 180 query privacy, 175, 179 query processing, 67, 144 R*-tree, 33, 38 R-tree, vi, 33, 35, 38, 116, 128 range query, vi, 67, 68, 73 reference area, 130 region, 128 restricted space identification, 173 reverse nearest neighbor query, 69 RFID, 4 risk, 173 RNN query, 69 road segment, 104, 154 round-robin, 177 route, 24, 105, 136 route section, 144, 145 RUM-tree, 60 safe interval, 81, 93, 94, 99 scalability, 151 segment container, 58 sender anonymity, 172 sensitive data, 172 sensitive query, 179 shape, 83 shared trajectory segment, 44
shortest path, 67, 151 multi-pass shortest path, 70 shortest path distance, 149, 152 similar movement pattern, 81 similar sequence matching, 80 similar trajectory query, 79, 80 similarity, 84, 149 similarity query shape-based similarity query, 85 spatial similarity, 84 spatio-temporal similarity, 84 temporal similarity, 84 simulation, 106 simulation-based prediction, vii, 25, 51 slowdown rate, 19 SP, 51 space partitioning, 71, 129 space transformation, 70 space-filling curve, 60 spatial condition, 34 spatial database, 6, 150 spatial index, 128 spatial network, v, 9, 14, 34, 48, 67, 70, 87, 159 spatial predicate, vi, 68 spatio-temporal access method, 34 spatio-temporal data, 6 spatio-temporal database, 6, 81 spatio-temporal indexing, 7, 33 spatio-temporal property, 159 spatio-temporal query, 34, 152 spatio-temporal similarity, 85 split, 36 state, 135, 139 state change, 135 static attribute, 14 static data, 33 stochastic behavior, 14, 26, 48 sub-region, 129 sub-trajectory, 81 TB-tree, 70 temporal attribute, 135, 138, 139 temporal database, 6 temporal predicate, vi, 68 temporal unit, 135 threshold, 24 time interval, 34 time interval query, 81 time series data, 81 time series database, 80, 82 time slice query, 81 time-segmented prediction, 109 time-series prediction, 109
to-be-expired time, 89, 163 to-be-valid time, 89, 163 top-down, 130, 150 topology change, 135 TPR*-tree, 60, 70 TPR-tree, 34, 38, 60, 70 tracking, 24 traffic condition, 106, 128 traffic coordination and management, 5 traffic density, 128 traffic flow, 138 traffic jam, 80 traffic jam prediction, 149 traffic navigation, 127, 171 traffic rule, 106 traffic simulation, 14 traffic surveillance, 127 trajectory, 19, 43, 82, 114 discrete trajectory, 83 global trajectory, 19, 108 in-edge trajectory, 19, 106, 108 length, 82 temporal interval, 82 temporal normalized discrete trajectory, 85 trajectory anonymization, 174 trajectory bound, 50 trajectory prediction, 103 trajectory segment, 41, 43 transition, 19, 25 transportation, 126 transportation network, 14 uncertain location, 114 uncertain query operator, 119 uncertain trajectory, 114 uncertain trajectory R-tree, 115 uncertain trajectory unit, 114 uncertainty management, vi, 7, 8, 144 uncertainty model, vii update frequency, 23 upper bound, 107 UTR-tree, 115 validity time, 89 VBR, 38 velocity bounding rectangle, 38 vertice, 18 view, 130 view tree, 130 Voronoi graph, 69 weather forecast, 149 WiFi, 4