Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen University of Dortmund, Germany Madhu Sudan Massachusetts Institute of Technology, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max-Planck Institute of Computer Science, Saarbruecken, Germany
4978
Manindra Agrawal Dingzhu Du Zhenhua Duan Angsheng Li (Eds.)
Theory andApplications of Models of Computation 5th International Conference, TAMC 2008 Xi’an, China, April 25-29, 2008 Proceedings
13
Volume Editors Manindra Agrawal Indian Institute of Technology (IIT) Kanpur 208016, India E-mail:
[email protected] Dingzhu Du University of Texas Dallas, TX, USA E-mail:
[email protected] Zhenhua Duan Xidian University Xi’an 710071, China E-mail:
[email protected] Angsheng Li Chinese Academy of Sciences Beijing 100080, China E-mail:
[email protected]
Library of Congress Control Number: 2008924623 CR Subject Classification (1998): F.1.1-2, F.2.1-2, F.4.1, I.2.6, J.3 LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues ISSN ISBN-10 ISBN-13
0302-9743 3-540-79227-9 Springer Berlin Heidelberg New York 978-3-540-79227-7 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. Springer is a part of Springer Science+Business Media springer.com © Springer-Verlag Berlin Heidelberg 2008 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper SPIN: 12257317 06/3180 543210
Preface
Theory and Applications of Models of Computation (TAMC) is an international conference series with an interdisciplinary character, bringing together researchers working in computer science, mathematics (especially logic), and the physical sciences. This crossdisciplinary character, together with its focus on algorithms, complexity, and computability theory, gives the conference a special flavor and distinction. TAMC 2008 was the fifth conference in the series. The previous four meetings were held during May 17–19, 2004 in Beijing, May 17–20, 2005 in Kunming, May 15–20, 2006 in Beijing, and May 22–25, 2007 in Shanghai. TAMC 2008 was held in Xi’an, during April 25–29, 2008. At TAMC 2008 we had two plenary speakers, Bernard Chazelle and Cynthia Dwork, giving one-hour talks each. Bernard spoke on “Why Algorithms Matter” and Cynthia on “Differential Privacy: A Survey of Results.” Their respective papers accompanying the talks are included in the proceedings. In addition, there were two special sessions organized by Barry Cooper and Ying Jiang on “Models of Computation” and by Jianer Chen on “Algorithms and Complexity.” The invited speakers in the first session were Jose Felix Costa, Vincent Danos, Luke Ong, Mingsheng Ying, Miklos Santha, and Gilles Dowek. Invited speakers in the second session were Daniel Brown, Dieter Kratsch, Xiaotie Deng, and Jianer Chen. The TAMC conference series arose naturally in response to important scientific developments affecting how we compute in the twenty-first century. At the same time, TAMC is already playing an important regional and international role, and promises to become a key contributor to the scientific resurgence seen throughout China and other parts of Asia. For TAMC 2008, we received 192 submissions from all over the world, one of which was withdrawn. The Program Committee finally selected 50 papers for presentation at the conference and inclusion in this LNCS volume. We are very grateful to the Program Committee, and the many outside referees they called on, for their hard work and expertise during the difficult selection process. We also wish to thank all those authors who submitted their work for our consideration. The Program Committee could have accepted many more submissions without compromising standards, and were only restrained by the practicalities of timetabling so many talks, and by the inevitable limitations on the size of this proceedings volume. Finally, we would like to thank the members of the Editorial Board of Lecture Notes in Computer Science and the Editors at Springer for their encouragement and cooperation throughout the preparation of this conference.
VI
Preface
Of course TAMC 2008 would not have been possible without the support of our sponsors, and we therefore gratefully acknowledge their help in the realization of this conference. April 2008
Manindra Agrawal Dingzhu Du Zhenhua Duan Angsheng Li
Organization
The conference was organized by the Institute of Computing Theory and Technology, Xidian University, Xi’an, China.
Steering Committee Manindra Agrawal (IIT Kanpur, India) Jin-Yi Cai (University of Wisconsin-Madison, USA) S. Barry Cooper (University of Leeds, UK) Angsheng Li (Chinese Academy of Sciences, China)
Conference Chair Dingzhu Du (University of Texas at Dallas, USA)
Program Committee Co-chairs Manindra Agrawal (IIT Kanpur, India) Dingzhu Du (University of Texas at Dallas, USA) Zhenhua Duan (Xidian University, China) Angsheng Li (Chinese Academy of Sciences, China)
Program Committee Manindra Agrawal (IIT Kanpur, India) Giorgio Ausiello (Rome, Italy) Paola Bonizzoni (University of Milano-Bicocca, Italy) Jin-Yi Cai (University of Wisconsin-Madison, USA) Cristian S. Calude (University of Auckland, Australia) Jianer Chen (Texas A&M University, USA) Francis Chin (University of Hong Kong) S. Barry Cooper (University of Leeds, UK) Decheng Ding (Nanjing University, China) Rod Downey (Wellington, New Zealand) Dingzhu Du (University of Texas at Dallas, USA) Zhenhua Duan (Xidian University, China) Rudolf Fleischer (Fudan University, China) Hiroshi Imai (University of Tokyo, Japan) Kazuo Iwama (Kyoto University, Japan) Ying Jiang (Chinese Academy of Sciences, China)
VIII
Organization
Valentine Kabanets (Simon Fraser University, Canada) Maciej Koutny (Newcastle University, UK) Andrew Lewis (University of Leeds, UK) Angsheng Li (Chinese Academy of Sciences, China) Xingwu Liu (Chinese Academy of Sciences, China) Satyanarayana Lokam (Microsoft Research, India) Giuseppe Longo (Paris, France) Janos Makowsky (Israel Institute of Technology, Israel) Jaikumar Radhakrishnan (Tata Institute of Fundamental Research, India) Rudiger Reischuk (Universitat zu Lubeck, Germany) Helmut Schwichtenberg (Mathematisches Institut der Universit¨ at M¨ unchen, Germany) Xiaoming Sun (Tsinghua University, China) Luca Trevisan (UC Berkeley, USA) Christopher Umans (CalTech, USA) Hanpin Wang (Beijing University, China) Osamu Watanabe (Tokyo Institute of Technology, Japan) Mingsheng Ying (Tsinghua University, China) Shengyu Zhang (California Institute of Technology, USA) Ting Zhang (Microsoft Research Asia) Wenhui Zhang (ISCAS) Yunlei Zhao (Fudan University, China) Hong Zhu (Fudan University, China)
Organizing Committee Dingzhu Du (University of Texas at Dallas, USA) Zhenhua Duan (Xidian University, Xi’an, China) Angsheng Li (Institute of Software, Chinese Academy of Sciences, China)
Sponsoring Institutions Institute of Computing Theory and Technology of Xidian University Xidian University The National Natural Science Foundation of China based on NSFC Grant No. 60433010
Table of Contents
Plenary Lectures Differential Privacy: A Survey of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cynthia Dwork
1
Special Session Lectures On the Complexity of Measurement in Classical Physics . . . . . . . . . . . . . . Edwin Beggs, Jos´e F´elix Costa, Bruno Loff, and John Tucker
20
Quantum Walk Based Search Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . Miklos Santha
31
Contributed Lectures Propositional Projection Temporal Logic, B¨ uchi Automata and ω-Regular Expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cong Tian and Zhenhua Duan
47
Genome Rearrangement Algorithms for Unsigned Permutations with O(log n) Singletons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiaowen Lou and Daming Zhu
59
On the Complexity of the Hidden Subgroup Problem . . . . . . . . . . . . . . . . . Stephen Fenner and Yong Zhang
70
An O∗ (3.523k ) Parameterized Algorithm for 3-Set Packing . . . . . . . . . . . . Jianxin Wang and Qilong Feng
82
Indistinguishability and First-Order Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . Skip Jordan and Thomas Zeugmann
94
Derandomizing Graph Tests for Homomorphism . . . . . . . . . . . . . . . . . . . . . Angsheng Li and Linqing Tang
105
Definable Filters in the Structure of Bounded Turing Reductions . . . . . . . Angsheng Li, Weilin Li, Yicheng Pan, and Linqing Tang
116
Distance Constrained Labelings of Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jiˇr´ı Fiala, Petr A. Golovach, and Jan Kratochv´ıl
125
A Characterization of NCk by First Order Functional Programs . . . . . . . . Jean-Yves Marion and Romain P´echoux
136
X
Table of Contents
The Structure of Detour Degrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lars Kristiansen and Paul J. Voda Hamiltonicity of Matching Composition Networks with Conditional Edge Faults . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sun-Yuan Hsieh and Chia-Wei Lee Local 7-Coloring for Planar Subgraphs of Unit Disk Graphs . . . . . . . . . . . J. Czyzowicz, S. Dobrev, H. Gonz´ alez-Aguilar, R. Kralovic, E. Kranakis, J. Opatrny, L. Stacho, and J. Urrutia Generalized Domination in Degenerate Graphs: A Complete Dichotomy of Computational Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Petr Golovach and Jan Kratochv´ıl
148
160 170
182
More on Weak Bisimilarity of Normed Basic Parallel Processes . . . . . . . . Haiyan Chen
192
Extensions of Embeddings in the Computably Enumerable Degrees . . . . . Jitai Zhao
204
An Improved Parameterized Algorithm for a Generalized Matching Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jianxin Wang, Dan Ning, Qilong Feng, and Jianer Chen
212
Deterministic Hot-Potato Permutation Routing on the Mesh and the Torus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Andre Osterloh
223
Efficient Algorithms for Model-Based Motif Discovery from Multiple Sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bin Fu, Ming-Yang Kao, and Lusheng Wang
234
Ratio Based Stable In-Place Merging . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pok-Son Kim and Arne Kutzner A Characterisation of the Relations Definable in Presburger Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Mathias Barra Finding Minimum 3-Way Cuts in Hypergraphs . . . . . . . . . . . . . . . . . . . . . . Mingyu Xiao Inapproximability of Maximum Weighted Edge Biclique and Its Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jinsong Tan Symbolic Algorithm Analysis of Rectangular Hybrid Systems . . . . . . . . . Haibin Zhang and Zhenhua Duan
246
258 270
282
294
Table of Contents
On the OBDD Complexity of the Most Significant Bit of Integer Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Beate Bollig
XI
306
Logical Closure Properties of Propositional Proof Systems . . . . . . . . . . . . . Olaf Beyersdorff
318
Graphs of Linear Clique-Width at Most 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . Pinar Heggernes, Daniel Meister, and Charis Papadopoulos
330
A Well-Mixed Function with Circuit Complexity 5n ± o(n): Tightness of the Lachish-Raz-Type Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Kazuyuki Amano and Jun Tarui A Logic for Distributed Higher Order π-Calculus . . . . . . . . . . . . . . . . . . . . . Zining Cao Minimum Maximal Matching Is NP-Hard in Regular Bipartite Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M. Demange and T. Ekim A Topological Study of Tilings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gr´egory Lafitte and Michael Weiss A Denotational Semantics for Total Correctness of Sequential Exact Real Programs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Thomas Anberr´ee
342 351
364 375
388
Weak Bisimulations for the Giry Monad (Extended Abstract) . . . . . . . . . . Ernst-Erich Doberkat
400
Approximating Border Length for DNA Microarray Synthesis . . . . . . . . . . Cindy Y. Li, Prudence W.H. Wong, Qin Xin, and Fencol C.C. Yung
410
On a Question of Frank Stephan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Klaus Ambos-Spies, Serikzhan Badaev, and Sergey Goncharov
423
A Practical Parameterized Algorithm for the Individual Haplotyping Problem MLF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Minzhu Xie, Jianxin Wang, and Jianer Chen
433
Improved Algorithms for Bicluster Editing . . . . . . . . . . . . . . . . . . . . . . . . . . Jiong Guo, Falk H¨ uffner, Christian Komusiewicz, and Yong Zhang
445
Generation Complexity Versus Distinction Complexity . . . . . . . . . . . . . . . . Rupert H¨ olzl and Wolfgang Merkle
457
Balancing Traffic Load Using One-Turn Rectilinear Routing . . . . . . . . . . . Stephane Durocher, Evangelos Kranakis, Danny Krizanc, and Lata Narayanan
467
XII
Table of Contents
A Moderately Exponential Time Algorithm for Full Degree Spanning Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Serge Gaspers, Saket Saurabh, and Alexey A. Stepanov
479
Speeding up Dynamic Programming for Some NP-Hard Graph Recoloring Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Oriana Ponta, Falk H¨ uffner, and Rolf Niedermeier
490
A Linear-Time Algorithm for Finding All Door Locations That Make a Room Searchable (Extended Abstract) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . John Z. Zhang and Tsunehiko Kameda
502
Model Theoretic Complexity of Automatic Structures (Extended Abstract) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bakhadyr Khoussainov and Mia Minnes
514
A Separation between Divergence and Holevo Information for Ensembles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Rahul Jain, Ashwin Nayak, and Yi Su
526
Unary Automatic Graphs: An Algorithmic Perspective . . . . . . . . . . . . . . . . Bakhadyr Khoussainov, Jiamou Liu, and Mia Minnes
542
Search Space Reductions for Nearest-Neighbor Queries . . . . . . . . . . . . . . . . Micah Adler and Brent Heeringa
554
Total Degrees and Nonsplitting Properties of Σ20 Enumeration Degrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . M.M. Arslanov, S. Barry Cooper, I.Sh. Kalimullin, and M.I. Soskova s-Degrees within e-Degrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Thomas F. Kent
568 579
The Non-isolating Degrees Are Upwards Dense in the Computably Enumerable Degrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . S. Barry Cooper, Matthew C. Salts, and Guohua Wu
588
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
597
Differential Privacy: A Survey of Results Cynthia Dwork Microsoft Research
[email protected]
Abstract. Over the past five years a new approach to privacy-preserving data analysis has born fruit [13, 18, 7, 19, 5, 37, 35, 8, 32]. This approach differs from much (but not all!) of the related literature in the statistics, databases, theory, and cryptography communities, in that a formal and ad omnia privacy guarantee is defined, and the data analysis techniques presented are rigorously proved to satisfy the guarantee. The key privacy guarantee that has emerged is differential privacy. Roughly speaking, this ensures that (almost, and quantifiably) no risk is incurred by joining a statistical database. In this survey, we recall the definition of differential privacy and two basic techniques for achieving it. We then show some interesting applications of these techniques, presenting algorithms for three specific tasks and three general results on differentially private learning.
1
Introduction
Privacy-preserving data analysis is also known as statistical disclosure control, inference control, privacy-preserving datamining, and private data analysis. Our principal motivating scenario is a statistical database. A statistic is a quantity computed from a sample. Suppose a trusted and trustworthy curator gathers sensitive information from a large number of respondents (the sample), with the goal of learning (and releasing to the public) statistical facts about the underlying population. The problem is to release statistical information without compromising the privacy of the individual respondents. There are two settings: in the noninteractive setting the curator computes and publishes some statistics, and the data are not used further. Privacy concerns may affect the precise answers released by the curator, or even the set of statistics released. Note that since the data will never be used again the curator can destroy the data (and himself) once the statistics have been published. In the interactive setting the curator sits between the users and the database. Queries posed by the users, and/or the responses to these queries, may be modified by the curator in order to protect the privacy of the respondents. The data cannot be destroyed, and the curator must remain present throughout the lifetime of the database. Of course, any interactive solution yields a non-interactive solution, provided the queries are known in advance: the curator can simulate an interaction in which these known queries are posed, and publish the resulting transcript. There is a rich literature on this problem, principally from the satistics community (see, e.g., [10, 14, 27, 28, 29, 38, 40, 26, 39] and the literature on controlled M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 1–19, 2008. c Springer-Verlag Berlin Heidelberg 2008
2
C. Dwork
release of tabular data, contingency tables, and cell suppression), and from such diverse branches of computer science as algorithms, database theory, and cryptography, for example as in [3, 4, 21, 22, 23, 33, 34, 41, 48], [1, 24, 25, 31], and [6, 9, 11, 12, 13, 18, 7, 19]; see also the survey [2] for a summary of the field prior to 1989. This survey is about differential privacy. Roughly speaking, differential privacy ensures that the removal or addition of a single database item does not (substantially) affect the outcome of any analysis. It follows that no risk is incurred by joining the database, providing a mathematically rigorous means of coping with the fact that distributional information may be disclosive. We will first describe three differentially private algorithms for specific, unrelated, data analysis tasks. We then present three general results about computational learning when privacy of individual data items is to be protected. This is not usually a concern in the learning theory literature, and signals the emergence of a new line of research.
2
Differential Privacy
In the sequel, the randomized function K is the algorithm applied by the curator when releasing information. So the input is the data set, and the output is the released information, or transcript. We do not need to distinguish between the interactive and non-interactive settings. Think of a database as a set of rows. We say databases D1 and D2 differ in at most one element if one is a proper subset of the other and the larger database contains just one additional row. Definition 1. A randomized function K gives -differential privacy if for all data sets D1 and D2 differing on at most one element, and all S ⊆ Range(K), Pr[K(D1 ) ∈ S] ≤ exp() × Pr[K(D2 ) ∈ S]
(1)
The probability is taken is over the coin tosses of K. A mechanism K satisfying this definition addresses concerns that any participant might have about the leakage of her personal information: even if the participant removed her data from the data set, no outputs (and thus consequences of outputs) would become significantly more or less likely. For example, if the database were to be consulted by an insurance provider before deciding whether or not to insure a given individual, then the presence or absence of that individual’s data in the database will not significantly affect her chance of receiving coverage. Differential privacy is therefore an ad omnia guarantee. It is also a very strong guarantee, since it is a statistical property about the behavior of the mechanism and therefore is independent of the computational power and auxiliary information available to the adversary/user. Differential privacy is not an absolute guarantee of privacy. In fact, Dwork and Naor have shown that any statistical database with any non-trivial utility
Differential Privacy: A Survey of Results
3
compromises a natural definition of privacy [15]. However, in a society that has decided that the benefits of certain databases outweigh the costs, differential privacy ensures that only a limited amount of additional risk is incurred by participating in the socially beneficial databases. Remark 1. 1. The parameter in Definition 1 is public. The choice of is essentially a social question and is beyond the scope of this paper. That said, we tend to think of as, say, 0.01, 0.1, or in some cases, ln 2 or ln 3. If the probability that some bad event will occur is very small, it might be tolerable to increase it by such factors as 2 or 3, while if the probability is already felt to be close to unacceptable, then an increase by a factor of e0.01 ≈ 1.01 might be tolerable, while an increase of e, or even only e0.1 , would be intolerable. 2. Definition 1 discusses the behavior of the mechanism K, and is independent of any auxiliary knowledge the adversary, or user, may have about the database. Thus, a mechanism satisfying the definition protects the privacy of an individual row in the database even if the adversary knows every other row in the database. 3. Definition 1 extends to group privacy as well (and to the case in which an individual contributes more than a single row to the database). A collection of c participants might be concerned that their collective data might leak information, even when a single participant’s does not. Using this definition, we can bound the dilation of any probability by at most exp(c), which may be tolerable for small c. Of course, the point of the statistical database is to disclose aggregate information about large groups (while simultaneously protecting individuals), so we should expect privacy bounds to disintegrate with increasing group size.
3
Achieving Differential Privacy in Statistical Databases
We will presently describe an interactive mechanism, K, due to Dwork, McSherry, Nissim, and Smith [19], for the case of continuous-valued queries. Specifically, in this section a query is a function mapping databases to (vectors of) real numbers. For example, the query “Count P ” counts the number of rows in the database having property P . When the query is a function f , and the database is X, the true answer is the value f (X). The mechanism K adds appropriately chosen random noise to the true answer to produce what we call the response. The idea of preserving privacy by responding with a noisy version of the true answer is not new, but this approach is delicate. For example, if the noise is symmetric about the origin and the same question is asked many times, the responses may be averaged, cancelling out the noise1 . We must take such factors into account. 1
We do not recommend having the curator record queries and their responses so that if a query is issued more than once the response can be replayed: If the query language is sufficiently rich, then semantic equivalence of two syntactically different queries is undecidable; even if the query language is not so rich, the devastating attacks demonstrated by Dinur and Nissim [13] pose completely random and unrelated queries.
4
C. Dwork
Definition 2. For f : D → Rk , the sensitivity of f is Δf = max f (D1 ) − f (D2 )1 D1 ,D2
(2)
for all D1 , D2 differing in at most one element. In particular, when k = 1 the sensitivity of f is the maximum difference in the values that the function f may take on a pair of databases that differ in only one element. For many types of queries Δf will be quite small. In particular, the simple counting queries discussed above (“How many rows have property P ?”) have Δf = 1. Our techniques work best – introduce the least noise – when Δf is small. Note that sensitivity is a property of the function alone, and is independent of the database. The sensitivity essentially captures how great a difference (between the value of f on two databases differing in a single element) must be hidden by the additive noise generated by the curator. √ The scaled symmetric exponential distribution with standard deviation 2Δf / denoted Lap(Δf /), has mass at x proportional to exp(−|x|(/Δf )). More precisely, let b = Δf /. The probability density function is p(x) = exp(−|x|/b)/2b and the cumulative distribution function is D(x) = (1/2)(1 + sgn(x)(1 − exp(|x|/b))). On query function f the privacy mechanism K responds with f (X) + (Lap(Δf /))k adding noise with distribution Lap(Δf /) independently to each of the k components of f (X). Note that decreasing , a publicly known parameter, flattens out the Lap(Δf /) curve, yielding larger expected noise magnitude. When is fixed, functions f with high sensitivity yield flatter curves, again yielding higher expected noise magnitudes. For simplicity, consider the case k = 1. The proof that K yields -differential privacy on the single query function f is straightforward. Consider any subset S ⊆ Range(K), and let D1 , D2 be any pair of databases differing in at most one element. When the database is D1 , the probability mass at any r ∈ Range(K) is proportional to exp(−|f (D1 )−r|(Δf /)), and similarly when the database is D2 . Applying the triangle inequality in the exponent we get a ratio of at most exp (−|f (D1 ) − f (D2 )|(Δf /)). By definition of sensitivity, |f (D1 ) − f (D2 )| ≤ Δf , and so the ratio is bounded by exp(−), yielding -differential privacy. It is easy to see that -differential privacy can be achieved for any (adaptively chosen) query sequence f1 , . . . , fd by running K with noise distribution Lap( i Δfi /) on each query. In other words, the quality of each answer deteriorates with the sum of the sensitivities of the queries. Interestingly, it is sometimes possible to do better than this. Roughly speaking, what matters is the maximum possible value of Δ = ||(f1 (D1 ), f2 (D1 ), . . . , fd (D1 )) − (f1 (D2 ), f2 (D2 ), . . . , fd (D2 ))||1 . The precise formulation of the statement requires some care, due to the potentially adaptive choice of queries. For a full treatment see [19]. We state the theorem here for the non-adaptive case, viewing the (fixed) sequence of queries d f1 , f2 , . . . , fd , with respective arities k1 , . . . , kd , as a single k = i=1 ki -ary query f , and recalling Definition 2 for the case of arbitrary k.
Differential Privacy: A Survey of Results
5
Theorem 1 ([19]). For f : D → Rk , the mechanism Kf that adds independently generated noise with distribution Lap(Δf /) to each of the k output terms enjoys -differential privacy. The mechanism K described above has excellent accuracy for insensitive queries. In particular, the noise needed to ensure differential privacy depends only on the sensitivity of the function and on the parameter . Both are independent of the database and the number of rows it contains. Thus, if the database is very large, the errors for many typical queries introduced by the differential privacy mechanism is relatively quite small. We can think of K as a differential privacy-preserving interface between the analyst and the data. This suggests a general approach to privacy-preserving data analysis: find algorithms that require few, insensentitive, queries. See, e.g., [7, 8, 32]. Indeed, even counting queries are extremely powerful, permitting accurate and differentially private computations of many standard datamining tasks including principal component analysis, k-means clustering, perceptron learning of separating hyperplanes, and generation of an ID3 decision tree [7], as well as (nearby) halfspace learning [8] (see Section 4.3 below). Among the many applications of Theorem 1, of particular interest is the class of histogram queries. A histogram query is an arbitrary partitioning of the domain of database rows into disjoint “cells,” and the true answer is the set of counts describing, for each cell, the number of database rows in this cell. Although a histogram query with k cells may be viewed as k individual counting queries, the addition or removal of a single database row can affect the entire k-tuple of counts in at most one location (the count corresponding to the cell to (from) which the row is added (deleted); moreover, the count of this cell is affected by at most 1, so by Definition 2, every histogram query has sensitivity 1. Many data analyses are simply histograms; it is thus particularly encouraging that complex histograms, rather than requiring large variance in each cell, require very little. 3.1
When Noise Makes No Sense
In some tasks, the addition of noise makes no sense. For example, the function f might map databases to strings, strategies, or trees. In a recent paper McSherry and Talwar address the problem of optimizing the output of such a function while preserving -differential privacy [35]. Assume the curator holds a database X and the goal is to produce an object y. In a nutshell, their exponential mechanism works as follows. There is assumed to be a utility function u(X,y) that measures the quality of an output y, given that the database is X. For example, if the database holds the valuations that individuals assign a digital good during an auction, u(X, y) might be the revenue, with these valuations, when the price is set to y. Auctions are a good example of where noise makes no sense, since an even slightly too high price may prevent many bidders from buying. McSherry and Talwar’s exponential mechanism outputs y with probability proportional to exp(−u(X, y)/2). This ensures Δu-differential privacy, or
6
C. Dwork
-differential privacy whenever Δu ≤ 1. Here Δu is defined slightly differently from above; it is the maximum possible change to the value of u caused by changing the data of a single row (as opposed to removing or adding a row; the notions differ by at most a factor of two); see [35]. With this approach McSherry and Talwar obtain approximately-truthful auctions with nearly optimal selling price. Roughly speaking, this says that a participant cannot dramatically reduce the price he pays by lying about his valuation. Interestingly, they show that the simple composition of differential privacy can be used to obtain auctions in which no cooperating group of c agents can significantly increase their utility by submitting bids other than their true valuations. This is analagous to the situation of Remark 1 above, where composition is used to obtain privacy for groups of c individuals,
4
Algorithms for Specific Tasks
In this section we describe differentially private algorithms for three unrelated tasks. 4.1
Statistical Data Inference
The results in this Section are due to Dwork and Nissim [18]. Consider a setting in which each element in the database is described by a set of k Boolean attributes α1 , . . . αk , and the rows are independently sampled from some underlying distribution on {0, 1}k . Let 1 ≤ ≤ k/2 be an integer. The goal here is to use information about the incidence of settings of any attribute values to learn the incidence of settings of any 2 attribute values. Although we use the term “queries,” these will all be known in advance, and the mechanism will be non-interactive. From something like k 2 pieces of released information k 2 it will be possible to compute approximations to the incidence of all 2 2 minterms. This will allow the data analyst to approximate k the probabilities of all 2(2)2 subsets of 2-ary minterms of length 2, provided the initial approximations are sufficiently accurate. We will identify probability with incidence, so the probability space is over rows in the database. Fix any set of attributes. The incidence of all possible settings of these attribute values is described by a histogram with 2 cells, and histograms have sensitivity 1, so we are going to be working with a query sequence of overall sensitivity proportional to k (in fact, it will be worse than this by a factor t, discussed below). Let α and β be attributes. We say that α implies β in probability if the conditional probability of β given α exceeds the unconditional probability of β. The ability to measure implication in probability is crucial to datamining. Note that since Pr[β] is simple to estimate well using counting queries, the problem of measuring implication in probability reduces to obtaining a good estimate of Pr[β|α]. Moreover, once we can estimate Pr[β|α], Pr[β], and Pr[a], we can use Bayes’ Rule and de Morgan’s Laws to determine the statistics for any Boolean function of
Differential Privacy: A Survey of Results
7
attribute values. For example, Pr[α ∧ β] = Pr[α] Pr[β|α], so if we have estimates of the two multiplicands, within an additive η, we have an estimate for the products that is accurate within 3η. As a step toward the non-interactive solution, consider the interactive case and assume that we have a good estimate for Pr[α] and Pr[β]. The key to determining Pr[β|α] is to find a heavy set for α, that is, a set q ⊆ [n] such that the incidence of α is at least, say, a standard deviation higher than expected, and then to determine whether the incidence of β on this heavy set is higher than the overall incidence of β. More specifically, one can test whether this conditional incidence is higher than a given threshold, and then use binary search to find the “right” threshold value. Finding the heavy set is easy because a randomly chosen subset of [n] has constant probability of exceeding the expected incidence of α by at least one standard deviation. To “simulate” the interactive case, the curator chooses some number of random subsets and for each one releases (noisy) estimates of the incidence of α and the incidence of β within this subset. With high probability (depending on t), at least one of the subsets is heavy for α. Putting the pieces together: for t random subsets of [n] the curator releases good approximations to the incidence of all m = k 2 conjunctions of literals. Specifically, we require that with probability at least 1 − δ/m2 a computed implication in distribution Pr[α|β] is accurate to within η/3m2 , where α and β are now minterms of literals. This ensures that with probability least 1 − δ all computed implications in distribution are accurate to within η/3m2 , and so all estimated probabilities for minterms of 2 literals are accurate to within η/m2 . The number t is rather large, and depends on many factors, including the differential privacy parameter as well as η, δ, k and . The analysis in [18] shows that, when k η and δ are constant, this approach reduces the number of (one histogram for each 2-tuple of variables (not literals!)), to queries from 2 O(24 k 2 log k). Note the interesting tradeoff: we require accuracy that depends on m2 in order to avoid making m2 queries. When the database is sufficiently large this tradeoff can be accomplished. 4.2
Contingency Table Release
The results in this Section are due to Barak, Chaudhuri, Dwork, Kale, McSherry, and Talwar [5]. A contingency table is a table of counts. In the context of a census or other survey, we think of the data of an individual as a row in a database. We do not assume the rows are mutually independent. For the present, each row consists of k bits describing the values of k binary attributes a1 , . . . , ak .2 k Formally, the contingency table is a vector in R2 describing, for each setting of the k attributes, the number of rows in the database with this setting of the attribute values. In other words, it is a histogram with 2k cells. Commonly, the contingency table itself is not released, as it is likely to be sparse when k is large. Instead, for various subsets of attributes, the data curator 2
Typically, attributes are non-binary. Any attribute with m possible values can be decomposed into log(m) binary attributes.
8
C. Dwork
releases the projection of the contingency table onto each such subset, i.e., the counts for each of the possible settings of the restricted set of attributes. These smaller tables of counts are called marginals, each marginal being named by a subset of the attributes. A marginal named by a set of j attributes, j ≤ k, is called a j-way marginal. The data curator will typically release many sets of low-order marginals for a single contingency table, with the goal of revealing correlations between many different, and possibly overlapping, sets of attributes. Since a contingency table is a histogram, we can add independently generated noise proportional to −1 to each cell of the contingency table to obtain an differentially private (non-integer and not necessarily non-negative) table. We will address the question of integrality and non-negativity later. For now, we simply note that any desired set of marginals can be computed directly from this noisy table, and consistency among the different marginals is immediate. A drawback of this approach, however, is that while the noise in each cell of the contingency table is relatively small, the noise in the computed marginals may be large. For example, the variance in the 1-way table describing attribute a1 is 2k−1 −2 . We consider this unacceptable, especially when n 2k . Marginals are also histograms. A second approach, with much less noise in the (common) case of low-order marginals, but not offering consistency between marginals, works as follows. Let C be the set of marginals to be released. We can think of a function f that, when applied to the database, yields the desired marginals. Now apply Theorem 1 with this choice of f , (adding noise to each cell in the collection of tables independently), with sensitivity Δf = |C|. When n (the number of rows in the database) is large compared to |C|/, this also yields excellent accuracy. Thus we would be done if the small table-to-table inconsistencies caused by independent randomization of each (cell in each) table are not of concern, and if the user is comfortable with occasionally negative and typically non-integer cell counts. We have no philosophical or mathematical objection to these artifacts – inconsistencies, negativity, and non-integrality – of the privacy-enhancing technology, but in practice they can be problematic. For example, the cell counts may be used as input to other, possibly off-the-shelf, programs that anticipate positive integers, giving rise to type mismatch. Inconsistencies, not to mention negative values, may also be confusing to lay users, such as casual users of the American FactFinder website. We now outline the main steps in the work of Barak et al [5]. Move to the Fourier Domain. When adding noise, two natural solutions present themselves: adding noise to entries of the source table (this was our first proposal; accuracy is poor when k is large), or adding noise to the reported marginals (our second proposal; consistency is violated). A third approach begins by transforming the data into the Fourier domain. This is just a change of basis. Were we to compute all 2k Fourier coefficients we would have a non-redundant encoding of the entire consistency table. If we were to perturb the Fourier coefficients and then convert back to the contingency table domain, we would get a (different, possibly non-integer, possibly negative) contingency table, whose “distance” (for example,
Differential Privacy: A Survey of Results
9
2 distance) from the original is determined by the magnitude of the perturbations. The advantage of moving to the Fourier domain is that if only a set C of marginals is desired then we do not need the full complement of Fourier coefficients. For example, if C is the set of all 3-way marginals,then weneed only the Fourier coefficients of weight at most 3, of which there are k3 + k2 + k + 1. This will translate into a much less noisy set of marginals. The Fourier coefficients needed to compute the marginals C form a model of the dataset that captures everything that can be learned from the set C of marginals. Adding noise to these coefficients as indicated by Theorem 1 and then converting back to the contingency table domain yields a procedure for generating synthetic datasets that ensures differential privacy and yet to a great (and measurable) extent captures the information in the model. This is an example of a concrete method for generating synthetic data with provable differential privacy. The Fourier coefficients exactly describe the information required by the marginals. By measuring exactly what is needed, Barak et al. add the least amount of noise possible using the techniques of [19]. Moreover, the Fourier basis is particularly attractive because of the natural decomposition according to sets of attribute values. Even tighter bounds than those in Theorem 4 below can be placed on sub-marginals (that is, lower order marginals) of a given marginal, by noting that no additional Fourier coefficients are required and fewer noisy coefficients are used in computing the low-order marginal, improving accuracy by reducing variance. Use Linear Programming and Rounding. Barak et al. [5] employ linear programming to obtain a non-negative, but likely non-integer, data set with (almost) the given Fourier coefficients, and then round the results to obtain an integer solution. Interestingly, the marginals obtained from the linear program are no “farther” (made precise in [5]) from those of the noisy measurements than are the true marginals of the raw data. Consequently, the additional error introduced by the imposition of consistency is no more than the error introduced by the privacy mechanism itself. Notation and Preliminaries. Recall that, letting k denote the number of k (binary) attributes, we can think of the data set as a vector x ∈ R2 , indexed by attribute tuples. For each α ∈ {0, 1}k the quantity xα is the number of data elements with this setting of attributes. We let n = x1 be the total number of tuples, or rows, in the data set. For any α ∈ {0, 1}k , we use α1 for the number of non-zero locations. We write β α for α, β ∈ {0, 1}k if every zero location in α is also a zero in β. The Marginal Operator. Barak et al. describe the computation of a set of marginals as the result of applying a marginal operator to the contingency table k α1 for α ∈ {0, 1}k maps contingency tables vector x. The operator C α : R2 → R2 to the marginal of the attributes that are positively set in α (there are 2α1 possible settings of these attributes). Abusing notation, C α x is only defined at those
10
C. Dwork
locations β for which β α: for any β α, the outcome of C α x at position β is the sum over those coordinates of x that agree with β on the coordinates described by α: xγ (3) (C α (x))β = γ:γ∧α=β
Notice that the operator C α is linear for all α. k
Theorem 2. The f α form an orthonormal basis for R2 . Consequently, one can write any marginal as the small summation over relevant Fourier coefficients: f α , x C β f α . (4) Cβ x = αβ
The coefficients f α , x are necessary and sufficient data from x for the computation of C β x. Theorem 3 ([5]). Let B ⊆ {0, 1}k describe a set of Fourier basis vectors. Releasing the set φβ = f β , x + Lap(|B|/2k/2 ) for β ∈ B preserves -differential privacy. Proof: Each tuple contributes exactly ±1/2k/2 to each output coordinate, and consequently the L1 sensitivity of the set of |B| outputs is at most |B|/2k/2 . By Theorem 1, the addition of symmetric exponential noise with standard deviation |B|/2k/2 gives -differential privacy. Remark: To get a sense of scale, we could achieve a similar perturbation to each coordinate by randomly adding or deleting |B|2/ individuals in the data set, which can be much smaller than n. Putting the Steps Together. To compute a set A of marginals, we need all the Fourier coefficients f β for β in the downward closure of A uner . Marginals(A ⊆ {0, 1}k , D): 1. Let B be the downward closure of A under . 2. For β ∈ B, compute φβ = f β , D + Lap(|B|/2k/2 ). 3. Solve for wα in the following linear program, and round to the nearest integral weights, wα . minimize
b
subject to: wα ≥ 0 ∀α wα fαβ ≤ b ∀β ∈ B φβ − α
φβ −
wα fαβ ≥ −b ∀β ∈ B
α
4. Using the contingency table wα , compute and return the marginals for A.
Differential Privacy: A Survey of Results
11
Theorem 4 ([5]). Using the notation of Marginals(A), with probability 1 − δ, for all α ∈ A, C α x − C α w 1 ≤ 2α1 2|B| log(|B|/δ)/ + |B| .
(5)
When k is Large. The linear program requires time polynomial in 2k . When k is large this is not satisfactory. However, somewhat surprisingly, non-negativity (but not integrality) can be achieved by adding a relatively small amount to the first Fourier coefficient before moving back to the data domain. No linear program is required, and the error introduced is pleasantly small. Thus if polynomial in 2k is an unbearable cost and one can live with non-integrality then this approach serves well. We remark that non-integrality was a non-issue in a pilot implementation of this work as counts were always converted to percentages. 4.3
Learning (Nearby) Halfspaces
We close this Section with an example inspired by questions in learning theory, appearing in a forthcoming paper of Blum, Ligett, and Roth [8]. The goal is to give a non-interactive solution to half-space queries. At a high level, their approach is to publish information that (approximately) answers a large set of “canonical” queries of a certain type, with the guarantee that for any (possibly non-canonical) query of the given type there is a “nearby” canonical query. Hence, the data analyst can obtain the answer to a query that in some sense is close to the query of interest. The queries in [8] are halfspace queries in Rd , defined next. Throughout this section we adopt the assumption in [8] that the database points are scaled into the unit sphere. Definition 3. Given a database D ⊂ Rd and unit length y ∈ Rd , a halfspace query Hy is d |{x ∈ D : i=1 xi · yi ≥ 0}| . Hy (D) = |D| Note that a halfspace query can be from two counting queries: “What estimated d is |D|?” and “What is |{x ∈ D : i=1 xi · yi ≥ 0}|?” Thus, the halfspace query has sensitivity at most 2. The distance between halfspace queries Hy1 and Hy2 is defined to be the sine of the angle between them, sin(y1 , y2 ). With this in mind, the algorithm of Blum, Ligett, and Roth, ensures the following notion of utility: Definition 4 ([8]). A database mechanism A is (, δ, γ)-useful for queries in class C according to some metric d if with probability 1 − δ, for every Q ∈ C and every database D, |Q(A(D)) − Q (D)| ≤ for some Q ∈ C such that d(Q, Q ) ≤ γ. Note that it is the queries that are close, not (necessarily) their answers. Given a halfspace query Hy1 , the algorithm below will output a value v such that |v − Hy2 (D)| < for some Hy2 that is γ-close to Hy1 . Equivalently, the
12
C. Dwork
algorithm arbitrarily counts or fails to count points x ∈ D such that cos(x, y1 ) ≤ γ. Blum et al. note that γ plays a role similar to the notion of margin in machine learning, and that even if Hy1 and Hy2 are γ-close, this does not imply that true answers to the queries Hy1 (D) and Hy2 (D) are close, unless most of the data points are outside a γ margin of Hy1 and Hy2 . Definition 5 ([8]). A halfspace query Hy is b-discretized if for each i ∈ [d], yi can be specified with b bits. Let Cb be the set of all b-discretized halfspaces in Rd . Consider a particular k < d dimensional subspace of Rd defined by a random d × k matrix M with entries chosen independently and uniformly from {−1, 1}. √ Consider the projection PM (x) = (1/ k)x · M , which projects database points into the subspace and re-scales them to the unit sphere. For a halfspace query Hy , the projection PM (Hy ) is simply the k-dimensional halfspace query defined by the projection PM (y). The key fact is that, for a randomly chosen M , projecting a database point x and the halfspace query specifier y is very unlikely to significantly change the angle between them: Theorem 5 (Johnson-Lindenstrauss Theorem). Consider a projection of a point x and a halfspace Hy onto a random k-dimensional subspace as defined by a projection matrix M . Then Pr[| cos(x, Hy ) − cos(PM (x), HPM (y) )| ≥ γ/4] ≤ 2e−((γ/16)
2
−(γ/16)3 )k/4
.
The dimension k of the subspace is chosen such that the probability that projecting a point and a halfspace changes the angle between them by more than γ/4 is at most 1 /4. This yields k≥
4 ln(8/1 ) . (γ/16)2 − (γ/16)3
Thus, the answer to the query Hy can be estimated by a privacy-preserving estimate of the answer to the projected halfspace query, and overall accuracy could be improved by choosing m projection matrices; the angle between x and y would be estimated by the median of the angles induces by the m resulting pairs of projections of x and y. Of course, if the goal were to respond to a few half-space queries there would be no point in going through the projection process, let alone taking several projections. But the goal of [8] is more ambitious: an(, δ, γ)-useful non-interactive mechanism for (non-discretized) halfspace queries; this is where the lower dimensionality comes into play. The algorithm chooses m projection matrices, where m depends on the discretization parameter b, the dimension d, and the failure probability δ (more specifically, m ∈ O(ln(1/δ)+ln(bd))). For each random subspace (defined by a projection matrix M ), the algorithm selects a net NM of “canonical” halfspaces (defined by canonical vectors in the subspace) such that for every vector y ∈ Rk there is a nearby canonical vector, specifically, of distance (induced sine) at most (3/4)γ.
Differential Privacy: A Survey of Results
13
The number of canonical vectors needed is O(1/γ k−1 ). For each of these, the curator publishes a privacy-preserving estimate of the projected halfspace query. The mechanism is non-interactive and the curator will play no further role. To handle an arbitrary query y, the analyst begins with an empty multiset. For each of the m projections M , the analyst finds a vector yˆ ∈ NM closest to PM (y), adding to the multiset the answer to that halfspace query. The algorithm outputs the median of these m values. Theorem 6 ([8]). Let n≥
log(1/δ) + log m + (k − 1) log(1/γ) + mO(1/γ)k−1 2 α
Then the above algorithm is (, γ, δ)-useful while maintaining α-differential privacy for a database of size n. The algorithm runs in time poly(log(1/δ), 1/, 1/α, b, d) for constant γ.
5
General Learning Theory Results
We briefly outline three general results regarding what can be learned privately in the interactive model. We begin with a result of Blum, Dwork, McSherry and Nissim, showing that anything learnable in the statistical queries learning model can also be efficiently privately learned interactively [7]. We then move to results that ignore computational issues, showing that the exponential mechanism of McSherry and Talwar [35] can be used to 1. Privately learn anything that is PAC learnable [32]; and 2. Generate, for any class of functions C with polynomial VC dimension, a differentially private “synthetic database” that gives “good” answers to any query in C [8]. The use of the exponential mechanism in this context is due to Kasiviswanathan, Lee, Nissim, Raskhodnikova, and Smith [32]. 5.1
Emulating the Statistical Query Model
The Statistical Query (SQ) model, proposed by Kearns in [10], is a framework for examining statistical algorithms executed on samples drawn independently from an underlying distribution. In this framework, an algorithm specifies predicates f1 , . . . , fk and corresponding accuracies τ1 , . . . , τk , and is returned, for 1 ≤ i ≤ k, the expected fraction of samples satisfying fi , to within additive error τi . Conceptually, the framework models drawing a sufficient number of samples so that the observed count of samples satisfying each fi is a good estimate of the actual expectation. The statistic/al queries model is most commonly used in the computational learning theory community, where the goal is typically to learn a (in this case, Boolean) concept, that is, a predicate on the data, to within a certain degree of
14
C. Dwork
accuracy. Formally, an algorithm δ-learns a concept c if it produces a predicate such that the probability of misclassification under the latent distribution is at most 1 − δ. Blum, Dwork, McSherry, and Nissim have shown that any concept that is learnable in the statistical query model is privately learnable using the equivalent algorithm on a so-called “SuLQ” database [7]; the proof is not at all complicated. Reformulting these results using the slightly different technology presented in the current survey the result is even easier to argue. Assume we have an algorithm in the SQ model that makes the k statistical queries f1 , . . . , fk , and let τ = min{τ1 , . . . , τk } be the minimum required tolerance in the SQ algorithm. Assume that n, the size of the database, is known. In this case, we can compute the answer to fi by asking the predicate/counting query corresponding to fi , call it pi , and dividing the result by n. Thus, we are dealing with a query squence of sensitivity at most k. Let b = k/. Write τ = ρ/n, for ρ to be determined later, so that if the noise added to the true answer when the counting query is pi has magnitude bounded by ρ, the response to the statistical query is within tolerance τ . We want to find ρ so that the probability that a response has noise magnitude at least ρ is bounded by δ/k, when the noise is generated according to Lap(k/). The Laplace distribution is symmetric, so it is enough to find x < 0 such that the cumulative distribution function at x is bounded by δ/2k: δ 1 −|x|/b e < 2 2k By a straightforward calculation, this is true provided |x| > b ln(k/δ), ie, when |x| > (k/) ln(k/δ). We therefore set ρ > (k/) ln(k/δ). So long as ρ/n < τ , or, more to the point, so long as n > ρ/τ , we can emulate the SQ algorithm. This analysis only takes into account noise introduced by K; that is, it assumes 1 n i=1 fj (di ) = Prx∈R D [fj (x)], 1 ≤ j ≤ k, where D is the distribution on n examples. The results above apply, mutatis mutandis, when we assume that the rows in the database are drawn iid according to D using the well known fact that taking n > τ −2 log(k/δ) is sufficient to ensure tolerance τ with probability at least 1 − δ for all fj , 1 ≤ j ≤ k, simultaneously. Replacing τ by τ /2 everywhere and finding the maximum lower bound on n handles both types of error. 5.2
Private PAC Learning
The results in this section are due to Kasiviswanathan, Lee, Nissim, Raskhodnikova, and Smith [32]. We begin with an informal and incomplete review of the concept of probably-appproximately-correct (PAC) learning, a notion due to Valiant [47]. Consider a concept t : X −→ Y that assigns to each example taken from the domain X a label from the range Y . As in the previous section, a learning algorithm is given labeled examples drawn from a distribution D, labeled by a target concept; the goal is to produce an hypothesis h : X −→ Y from a specified hypothesis class, with small error, defined by:
Differential Privacy: A Survey of Results
15
error(h) = Pr [t(x) = h(x)]. x∈R D
A concept class is a set of concepts. Learning theory asks what kinds of concept classes are learnable. Letting α, β denote two error bounds, if the target concept belongs to C, then the goal is to minimize error, or at least to ensure that with probability 1 − β the error is bounded by α. This is the setting of traditional PAC learning. If the target concept need not belong to C, the goal is to produce an hypothesis that, intuitively, does almost as well as any concept in C: Letting OPT = minc∈C {error(c)}, we want Pr[error(h) ≤ OPT + α] ≥ 1 − β, where the probability is over the samples seen by the learner, and the learner’s randomness. This is known as agnostic learning.3 Following [32] we will index concept classes, domains, and ranges by the length d of their binary encodings. For a target concept t : Xd → Yd and a distribution D over Xd , let Z ∈ Dn be a database containing n labeled independent draws from D. That is, Z contains n pairs (xi , yi ) where yi = t(xi ), 1 ≤ i ≤ n. The goal of the data analyst will be to agnostically learn a hypothesis class C; the goal of the curator will be to ensure -differential privacy. Theorem 7 ([32]). Every concept class C is -differentially pivately agnostically learnable using hypothesis class H = C with n ∈ O(log Cd + log(1β) · 1 , α12 }). The learner may not be efficient. max{ α The theorem is proved using the exponential mechanism of McSherry and Talwar, with utility function u(Z, h) = −|{i : t(xi ) = h(xi )}| for Z ∈ (X × Y )n , h ∈ Hd . Note that u has sensitivity 1, since changing any element in the database can change the number of misclassifications by at most 1. The (inefficient) algorithm outputs c ∈ Hd with probability proportional to exp(u(Z, c)/2). Privacy follows from the properties of the exponential mechanism and the small sensitivity of u. Accuracy (low error with high probablity) is slightly more difficult to argue; the proof follows, intuitively, from the fact that outputs are produced with probability that falls exponentially in the number of misclassifications. 5.3
Differentially Private Queries of Classes with Polynomial VC Dimension
Our last example is due to Blum, Ligett, and Roth [8] and was inspired by the result of the previous section, This learning result again ignores computational 3
The standard agnostic model has the input drawn from an arbitrary distribution over labeled examples (x, y) (that is, the label need not be a deterministic function of the example). The error of a hypothesis is defined with respect to the distribution (i.e. probability that y = h(x)). The results (and proofs) of Kasiviswanathan et al. stay the same in this more general setting.
16
C. Dwork
efficiency, using the exponential mechanism of McSherry and Talwar. The object here is to release a “synthetic dataset” that ensures “reasonably” accurate answers to all queries in a specific class C. The reader is assumed to be familiar with the Vapnick-Chervonenkis (VC) dimension of a class of concepts. Roughly speaking, it is a measure of how complicated a concept in the class can be. Definition 6 ([8]). A database mechanism A is (γ, δ)-useful for queries in class = A(D), C if with probability 1−δ, for every Q ∈ C and every database D, for D |Q(D) − Q(D)| ≤ γ. Let C be a fixed class of queries. Given a database D ∈ ({0, 1}d)n of size n, where n is sufficiently large (as a function of the VC dimension of C, as well ˆ that is as of and δ), the goal is to produce produce a synthetic dataset D (γ, δ)-useful for queries in C, while ensuring -differential privacy. The synthetic dataset will contain m = O(VC dim(C)/γ 2 ) d-tuples. It is chosen according to the exponential mechanism using the utility function ˆ . ˆ = − max h(D) − n h(D) u(D, D) h∈C m Theorem 8 ([8]). For any class of functions C, and any database D ⊂ {0, 1}d such that d · VC dim(C) log(1/δ) + |D| ≥ O γ3 γ that preserves -differential privacy. we can output an (γ, δ)-useful database D Note that the algorithm is not necessarily efficient. Blum et al. note that this suffice for (γ, δ)-usefulness because the set of all databases of this size forms a γ-cover with respect to C of the set of all possible databases. ˆ < |D|, the number of database entries One can resolve the fact that, since |D| matching any query will be proportionately smaller by considering the fraction of entries matching any query.
6
Concluding Remarks
The privacy mechanisms discussed herein add an amount of noise that grows with the complexity of the query sequence applied to the database. Although this can be ameliorated to some extent using Gaussian noise instead of Laplacian, an exciting line of research begun by Dinur and Nissim [13] (see also [17, 20]) shows that this increase is essential. To a great extent, the results of Dinur and Nissim drove the development of the mechanism K and the entire interactive approach advocated in this survey. A finer analysis of realistic attacks, and a better understanding of what failure to provide -differential privacy can mean in practice, are needed in order to sharpen these results – or to determine this is impossible, in order to understand the how to use these techniques for all but very large, “internet scale,” data sets.
Differential Privacy: A Survey of Results
17
Acknowledgements. The author thanks Avrim Blum, Katrina Ligett, Aaron Roth and Adam Smith for helpful technical discussions, and Adam Smith for excellent comments on an early draft of this survey. The author is grateful to all the authors of [32, 8] for sharing with her preliminary versions of their papers.
References [1] Achugbue, J.O., Chin, F.Y.: The Effectiveness of Output Modification by Rounding for Protection of Statistical Databases. INFOR 17(3), 209–218 (1979) [2] Adam, N.R., Wortmann, J.C.: Security-Control Methods for Statistical Databases: A Comparative Study. ACM Computing Surveys 21(4), 515–556 (1989) [3] Agrawal, D., Aggarwal, C.: On the Design and Quantification of Privacy Preserving Data Mining Algorithms. In: Proceedings of the 20th Symposium on Principles of Database Systems (2001) [4] Agrawal, R., Srikant, R.: Privacy-Preserving Data Mining. In: Proceedings of the ACM SIGMOD Conference on Management of Data, pp. 439–450 (2000) [5] Barak, B., Chaudhuri, K., Dwork, C., Kale, S., McSherry, F., Talwar, K.: Privacy, Accuracy, and Consistency Too: A Holistic Solution to Contingency Table Release. In: Proceedings of the 26th Symposium on Principles of Database Systems, pp. 273–282 (2007) [6] Beck, L.L.: A Security Mechanism for Statistical Databases. ACM TODS 5(3), 316–338 (1980) [7] Blum, A., Dwork, C., McSherry, F., Nissim, K.: Practical Privacy: The SuLQ framework. In: Proceedings of the 24th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (June 2005) [8] Blum, A., Ligett, K., Roth, A.: A Learning Theory Approach to Non-Interactive Database Privacy. In: Proceedings of the 40th ACM SIGACT Symposium on Thoery of Computing (2008) [9] Chawla, S., Dwork, C., McSherry, F., Smith, A., Wee, H.: Toward Privacy in Public Databases. In: Proceedings of the 2nd Theory of Cryptography Conference (2005) [10] Dalenius, T.: Towards a methodology for statistical disclosure control. Statistik Tidskrift 15, 222–429 (1977) [11] Denning, D.E.: Secure Statistical Databases with Random Sample Queries. ACM Transactions on Database Systems 5(3), 291–315 (1980) [12] Denning, D., Denning, P., Schwartz, M.: The Tracker: A Threat to Statistical Database Security. ACM Transactions on Database Systems 4(1), 76–96 (1979) [13] Dinur, I., Nissim, K.: Revealing Information While Preserving Privacy. In: Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 202–210 (2003) [14] Duncan, G.: Confidentiality and statistical disclosure limitation. In: Smelser, N., Baltes, P. (eds.) International Encyclopedia of the Social and Behavioral Sciences, Elsevier, New York (2001) [15] Dwork, C.: Differential Privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006) [16] Dwork, C., Kenthapadi, K., McSherry, F., Mironov, I., Naor, M.: Our Data, Ourselves: Privacy Via Distributed Noise Generation. In: Vaudenay, S. (ed.) EUROCRYPT 2006. LNCS, vol. 4004, pp. 486–503. Springer, Heidelberg (2006)
18
C. Dwork
[17] Dwork, C., McSherry, F., Talwar, K.: The Price of Privacy and the Limits of LP Decoding. In: Proceedings of the 39th ACM Symposium on Theory of Computing, pp. 85–94 (2007) [18] Dwork, C., Nissim, K.: Privacy-Preserving Datamining on Vertically Partitioned Databases. In: Franklin, M. (ed.) CRYPTO 2004. LNCS, vol. 3152, pp. 528–544. Springer, Heidelberg (2004) [19] Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating Noise to Sensitivity in Private Data Analysis. In: Proceedings of the 3rd Theory of Cryptography Conference, pp. 265–284 (2006) [20] Dwork, C., Yekhanin, S.: New Efficient Attacks on Statistical Disclosure Control Mechanisms (manuscript, 2008) [21] Evfimievski, A.V., Gehrke, J., Srikant, R.: Limiting Privacy Breaches in Privacy Preserving Data Mining. In: Proceedings of the Twenty-Second ACM SIGACTSIGMOD-SIGART Symposium on Principles of Database Systems, pp. 211–222 (2003) [22] Agrawal, D., Aggarwal, C.C.: On the design and Quantification of Privacy Preserving Data Mining Algorithms. In: Proceedings of the 20th Symposium on Principles of Database Systems, pp. 247–255 (2001) [23] Agrawal, R., Srikant, R.: Privacy-Preserving Data Mining. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 439–450 (2000) [24] Chin, F.Y., Ozsoyoglu, G.: Auditing and infrence control in statistical databases. IEEE Trans. Softw. Eng. SE-8(6), 113–139 (1982) [25] Dobkin, D., Jones, A., Lipton, R.: Secure Databases: Protection Against User Influence. ACM TODS 4(1), 97–106 (1979) [26] Fellegi, I.: On the question of statistical confidentiality. Journal of the American Statistical Association 67, 7–18 (1972) [27] Fienberg, S.: Confidentiality and Data Protection Through Disclosure Limitation: Evolving Principles and Technical Advances. In: IAOS Conference on Statistics, Development and Human Rights (September 2000), http://www. statistik.admin.ch/about/international/fienberg final paper.doc [28] Fienberg, S., Makov, U., Steele, R.: Disclosure Limitation and Related Methods for Categorical Data. Journal of Official Statistics 14, 485–502 (1998) [29] Franconi, L., Merola, G.: Implementing Statistical Disclosure Control for Aggregated Data Released Via Remote Access. In: United Nations Statistical Commission and European Commission, joint ECE/EUROSTAT work session on statistical data confidentiality, Working Paper No.30 (April 2003), http://www.unece. org/stats/documents/2003/04/confidentiality/wp.30.e.pdf [30] Goldwasser, S., Micali, S.: Probabilistic Encryption. J. Comput. Syst. Sci. 28(2), 270–299 (1984) [31] Gusfield, D.: A Graph Theoretic Approach to Statistical Data Security. SIAM J. Comput. 17(3), 552–571 (1988) [32] Kasiviswanathan, S., Lee, H., Nissim, K., Raskhodnikova, S., Smith, S.: What Can We Learn Privately? (manuscript, 2007) [33] Lefons, E., Silvestri, A., Tangorra, F.: An analytic approach to statistical databases. In: 9th Int. Conf. Very Large Data Bases, October-November 1983, pp. 260–274. Morgan Kaufmann, San Francisco (1983) [34] Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M.: l-Diversity: Privacy Beyond k-Anonymity. In: Proceedings of the 22nd International Conference on Data Engineering (ICDE 2006), p. 24 (2006)
Differential Privacy: A Survey of Results
19
[35] McSherry, F., Talwar, K.: Mechanism Design via Differential Privacy. In: Proceedings of the 48th Annual Symposium on Foundations of Computer Science (2007) [36] Narayanan, A., Shmatikov, V.: How to Break Anonymity of the Netflix Prize Dataset, http://www.cs.utexas.edu/∼ shmat/shmat netflix-prelim.pdf [37] Nissim, K., Raskhodnikova, S., Smith, A.: Smooth Sensitivity and Sampling in Private Data Analysis. In: Proceedings of the 39th ACM Symposium on Theory of Computing, pp. 75–84 (2007) [38] Raghunathan, T.E., Reiter, J.P., Rubin, D.B.: Multiple Imputation for Statistical Disclosure Limitation. Journal of Official Statistics 19(1), 1–16 (2003) [39] Reiss, S.: Practical Data Swapping: The First Steps. ACM Transactions on Database Systems 9(1), 20–37 (1984) [40] Rubin, D.B.: Discussion: Statistical Disclosure Limitation. Journal of Official Statistics 9(2), 461–469 (1993) [41] Shoshani, A.: Statistical databases: Characteristics, problems and some solutions. In: Proceedings of the 8th International Conference on Very Large Data Bases (VLDB 1982), pp. 208–222 (1982) [42] Samarati, P., Sweeney, L.: Protecting Privacy when Disclosing Information: kAnonymity and its Enforcement Through Generalization and Specialization, Technical Report SRI-CSL-98-04, SRI Intl. (1998) [43] Samarati, P., Sweeney, L.: Generalizing Data to Provide Anonymity when Disclosing Information (Abstract). In: Proceedings of the Seventeenth ACM SIGACTSIGMOD-SIGART Symposium on Principles of Database Systems, p. 188 (1998) [44] Sweeney, L.: Weaving Technology and Policy Together to Maintain Confidentiality. J. Law Med Ethics 25(2-3), 98–110 (1997) [45] Sweeney, L.: k-anonymity: A Model for Protecting Privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems 10(5), 557–570 (2002) [46] Sweeney, L.: Achieving k-Anonymity Privacy Protection Using Generalization and Suppression. International Journal on Uncertainty, Fuzziness and Knowledgebased Systems 10(5), 571–588 (2002) [47] Valiant, L.G.: A Theory of the Learnable. In: Proceedings of the 16th Annual ACM SIGACT Symposium on Theory of computing, pp. 436–445 (1984) [48] Xiao, X., Tao, Y.: M-invariance: towards privacy preserving re-publication of dynamic datasets. In: SIGMOD 2007, pp. 689–700 (2007)
On the Complexity of Measurement in Classical Physics Edwin Beggs1 , Jos´e F´elix Costa2,3, , Bruno Loff2,3 , and John Tucker1 1
3
Department of Mathematics and Department of Computer Science University of Wales Swansea, Singleton Park, Swansea, SA3 2HN Wales, United Kingdom
[email protected],
[email protected] 2 Department of Mathematics, Instituto Superior T´ecnico Universidade T´ecnica de Lisboa Lisboa, Portugal
[email protected],
[email protected] Centro de Matem´ atica e Aplica¸co ˜es Fundamentais do Complexo Interdisciplinar Universidade de Lisboa Lisbon, Portugal
Abstract. If we measure the position of a point particle, then we will come about with an interval [an , bn ] into which the point falls. We make use of a Gedankenexperiment to find better and better values of an and bn , by reducing their relative distance, in a succession of intervals [a1 , b1 ] ⊃ [a2 , b2 ] ⊃ . . . ⊃ [an , bn ] that contain the point. We then use such a point as an oracle to perform relative computation in polynomial time, by considering the succession of approximations to the point as suitable answers to the queries in an oracle Turing machine. We prove that, no matter the precision achieved in such a Gedankenexperiment, within the limits studied, the Turing Machine, equipped with such an oracle, will be able to compute above the classical Turing limit for the polynomial time resource, either generating the class P/poly either generating the class BP P//log∗, if we allow for an arbitrary precision in measurement or just a limited precision, respectively. We think that this result is astonishingly interesting for Classical Physics and its connection to the Theory of Computation, namely for the implications on the nature of space and the perception of space in Classical Physics. (Some proofs are provided, to give the flavor of the subject. Missing proofs can be found in a detailed long report at the address http://fgc.math.ist.utl.pt/papers/sm.pdf.)
1
Introducing a Gedankenexperiment
If a physical experiment were to be coupled with algorithms, would new functions and relations become computable or, at least, computable more efficiently?
Corresponding author. The authors are listed in alphabetic order.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 20–30, 2008. c Springer-Verlag Berlin Heidelberg 2008
On the Complexity of Measurement
21
To pursue this question, we imagine using an experiment as an oracle to a Turing machine, which on being presented with, say, xi as its i-th query, returns yi to the Turing machine. In this paper we will consider this idea of coupling experiments and Turing machines in detail. The first thing to note is that choosing a physical experiment to use as an oracle is a major undertaking. The experiment comes with plenty of theoretical baggage: concepts of equipment, experimental procedure, instruments, measurement, observable behaviour, etc. In earlier work [5], an experiment was devised to measure the position of the vertex of a wedge to arbitrary accuracy, by scattering particles that obey some laws of elementary Newtonian kinematics. Let SME denote this Scatter Machine Experiment. The Newtonian theory was specified precisely and the SME was put under a theoretical microscope: theorems were proved that showed that the experiment was able to compute positions that were not computable by algorithms. Indeed, the SME could, in principle, measure any real number. Thus, [5] contains a careful attempt to answer the question above, in the positive; it does so using a methodology developed and applied in earlier studies [3,4]. To address the question, here we propose to use the SME as an oracle to a Turing machine and to classify the computational power of the new type of machine, which we call an analogue-digital scatter machine. Given the results in [5], we expect that the use of SME will enhance the computational power and efficiency of the Turing machine. To accomplish this, we must establish some principles that do not depend upon the SME. In a Turing machine, the oracle is normally specified very abstractly by a set. Here we have to design a new machine where the oracle is replaced by a specification of some physical equipment and a procedure for operating it. The design of the new machine depends heavily upon the interface and interaction between the experiment and the Turing machine. Following some insights provided by the work of Hava T. Siegelmann and Eduardo Sontag [6], we use non uniform complexity classes of the form B/F, where B is the class of computations and F is the advice class. Context and proofs are, however, different. Examples of interest for B are P and BP P ; examples for F are poly and log. The power of the machines correspond with the choice of different B/F.
2
The Scatter Machine
Experiments with scatter machines are conducted exactly as described in [5], but, for convenience and to use them as oracles, we need to review and clarify some points. The scatter machine experiment (SME) is defined within the Newtonian mechanics, comprising of the following laws and assumptions: (a) point particles obey Newton’s laws of motion in the two dimensional plane, (b) straight line barriers have perfectly elastic reflection of particles, i.e., kinetic energy is conserved exactly in collisions, (c) barriers are completely rigid and do not deform on impact, (d) cannons, which can be moved in position, can project
22
E. Beggs et al.
6
right collecting box
6
sample trajectory
1
@ @
5m
6 x 0?
@ @ @ @
t
6
cannon
z
0?
limit of traverse of point of wedge
?
1
10 m/s
limit of traverse of cannon
left collecting box
5m
-
Fig. 1. A schematic drawing of the scatter machine
a particle with a given velocity in a given direction, (e) particle detectors are capable of telling if a particle has crossed a given region of the plane, and (f) a clock measures time. The machine consists of a cannon for projecting a point particle, a reflecting barrier in the shape of a wedge and two collecting boxes, as in Figure 1. The wedge, our motionless point particle, can be at any position. But we will assume it is fixed for the duration of all the experimental work. Under the control of a Turing machine, the cannon will be moved and fired repeatedly to find information about the position of the wedge. Specifically, the way the SME is used as an oracle in Turing machine computations, is this: a Turing machine will set a position for the canon as a query and will receive an observation about the result of firing the cannon as a response. For each input to the Turing machine, there will be finitely many runs of the experiment. In Figure 1 the parts of the machine are shown in bold lines, with description and comments in narrow lines. The double headed arrows give dimensions in meters, and the single headed arrows show a sample trajectory of the particle after being fired by the cannon. The sides of the wedge are at 45◦ to the line of the cannon, and we take the collision to be perfectly elastic, so the particle is deflected at 90◦ to the line of the cannon, and hits either the left or right collecting box, depending on whether the cannon is to the left or right of the point of the wedge. Since the initial velocity is 10 m/s, the particle will enter one of the two boxes within 1 second of being fired. Any initial velocity v > 0
On the Complexity of Measurement
23
will work with a corresponding waiting time. The wedge is sufficiently wide so that the particle can only hit the 45◦ slopping sides, given the limit of traverse of the cannon. The wedge is sufficiently rigid so that the particle cannot move the wedge from its position. We make the further assumption, without loss of generality (see the report mentioned in the abstract) that the vertex of the wedge is not a dyadic rational. Suppose that x is the arbitrarily chosen, but non dyadic and fixed, position of the point of the wedge. For a given cannon position z, there are two outcomes of an experiment: (a) one second after firing, the particle is in the right box — conclusion: z > x —, or (b) one second after firing, the particle is in the left box — conclusion: z < x. The SME was designed to find x to arbitrary accuracy by altering z, so in our machine 0 ≤ x ≤ 1 will be fixed, and we will perform observations at different values of 0 ≤ z ≤ 1. Consider the precision of the experiment. When measuring the output state the situation is simple: either the ball is in one tray or in the other tray. Errors in observation do not arise. Now consider some of the non-trivial ways in which precision depends on the positions of the cannon. There are different postulates for the precision of the cannon, and we list some in order of decreasing strength: Definition 2.1. The SME is error-free if the cannon can be set exactly to any given dyadic rational number. The SME is error-prone with arbitrary precision if the cannon can be set only to within a non-zero, but arbitrarily small, dyadic precision. The SME is error-prone with fixed precision if there is a value ε > 0 such that the cannon can be set only to within a given precision ε. The Turing machine is connected to the SME in the same way as it would be connected to an oracle: we replace the query state with a shooting state (qs ), the “yes” state with a left state (ql ), and the “no” state with a right state (qr ). The resulting computational device is called the analog-digital scatter machine, and we refer to the vertex position of an analog-digital scatter machine when mean to discuss the vertex position of the corresponding SME. In order to carry out a scatter machine experiment, the analog-digital scatter machine will write a word z in the query tape and enter the shooting state. This word will either be “1”, or a binary word beginning with 0. We will use z indifferently to denote both a word z1 . . . zn ∈ {1} ∪ {0s : s ∈ {0, 1}∗} and the n corresponding dyadic rational i=1 2−i+1 zi ∈ [0, 1]. In this case, we write |z| to denote n, i.e., the size of z1 . . . zn , and say that the analog-digital scatter machine is aiming at z. The Turing machine computation will then be interrupted, and the SME will attempt to set the cannon at z. The place where the cannon is actually set at depends on whether the SME is error-free or error-prone. If the SME is error-free, the cannon will be placed exactly at z. If the SME is errorprone with arbitrary precision, then the cannon will be placed at some point in the interval [z − 2−|z|−1 , z + 2−|z|−1 ] with a uniform probability distribution over this interval. This means that for different words representing the same dyadic rational, the longest word will give the highest precision. If the SME is errorprone with fixed dyadic precision ε, then the cannon will be placed somewhere in the interval [z − ε, z + ε], again with a uniform probability distribution.
24
E. Beggs et al.
After setting the cannon, the SME will fire a projectile particle, wait one second and then check if the particle is in either box. If the particle is in the right collecting box, then the Turing machine computation will be resumed in the state qr . If the particle is in left box, then the Turing machine computation will be resumed in the state ql . With this behaviour, we obtain three distinct analog-digital scatter machines. Definition 2.2. An error-free analog-digital scatter machine is a Turing machine connected to an error-free SME. In a similar way, we define an errorprone analog-digital scatter machine with arbitrary precision, and an error-prone analog-digital scatter machine with fixed precision. The error-free analog-digital scatter machine has a very simple behaviour. If such a machine, with vertex position x ∈ [0, 1], aims at a dyadic rational z ∈ [0, 1], we are certain that the computation will be resumed in the state ql if z < x, and that it will be resumed in the state qr when z > x. We define the following decision criterion. Definition 2.3. Let A ⊆ Σ ∗ be a set of words over Σ. We say that an errorfree analog-digital scatter machine M decides A if, for every input w ∈ Σ ∗ , w is accepted if w ∈ A and rejected when w ∈ / A. We say that M decides A in polynomial time, if M decides A, and there is a polynomial p such that, for every w ∈ Σ ∗ , the number of steps of the computation is bounded by p(|w|). The error-prone analog-digital scatter machines, however, do not behave in a deterministic way. If such a machine aims the cannon close enough to the vertex position, and with a large enough error, there will be a positive probability for both the particle going left or right. If the vertex position is x, and the error-prone analog-digital scatter machine M aims the cannon at z, then the probability of the particle going to the left box, denoted by P(M ← left ), is given by: ⎧ if z < x − ε ⎨1 if x − ε ≤ z ≤ x ˜+ε P(M ← left ) = 12 + x−z 2ε ⎩ 0 if z > x ˜+ε The value ε will be either 2−|z|−1 or a fixed value, depending on the type of error-prone analog-digital scatter machine under consideration. The probability of the particle going to the right box is P(M ← right ) = 1 − P(M ← left ). We can thus see that a deterministic decision criteria is not suitable for these machines. For a set A ⊆ Σ ∗ , an error-prone analog-digital scatter machine M, and an input w ∈ Σ ∗ , let the error probability of M for input w be either the probability of M rejecting w, if w ∈ A, or the probability of M accepting w, if w ∈ A. Definition 2.4. Let A ⊆ Σ ∗ be a set of words over Σ. We say that an errorprone analog-digital scatter machine M decides A if there is a number γ < 12 , such that the error probability of M for any input w is smaller than γ. We say
On the Complexity of Measurement
25
that M decides A in polynomial time, if M decides A, and there is a polynomial p such that, for every input w ∈ Σ ∗ , the number of steps in every correct computation is bounded by p(|w|).
3
The Relevant Computational Classes
We will see, in the sections that follow, that the non-uniform complexity classes give the most adequate characterisation of the computational power of the analog-digital scatter machine. Non-uniform complexity classifies problems by studying families of finite machines (e.g., circuits) {Cn }n∈N , where each Cn decides the restriction of some problem to inputs of size n. It is called non-uniform, because for every n = m the finite machines Cn and Cm can be entirely unrelated, while in uniform complexity the algorithm is the same for inputs of every size. A way to connect the two approaches is by means of advice classes: one assumes that there is a unique algorithm for inputs of every size, which is aided by certain information, called advice, which may vary for inputs of different sizes. The advice is given, for each input w, by means of function f : N → Σ ∗ , where Σ ∗ is the input alphabet. Definition 3.1. Let B be a class of sets and F a class of functions. The advice class B/F is the class of sets A for which some B ∈ B and some f ∈ F are such that, for every w, w ∈ A if and only if w, f (|w|) ∈ B. F is called the advice class and f is called the advice function. Examples for B are the well known classes P , or BP P (see [1, Chapter 6]). We will be considering two instances for the class F : poly is the class of functions with polynomial size values, i.e., poly is the class of functions f : N → Σ ∗ such that, for some polynomial p, |f (n)| ∈ O(p(n)); log is the class of functions g : N → Σ ∗ such that |g(n)| ∈ O(log(n)). We will also need to treat prefix non-uniform complexity classes. For these classes we may only use prefix functions, i.e., functions f such that f (n) is always a prefix of f (n + 1). The idea behind prefix non-uniform complexity classes is that the advice given for inputs of size n may also be used to decide smaller inputs. Definition 3.2. Let B be a class of sets and F a class of functions. The prefix advice class B/F ∗ is the class of sets A for which some B ∈ B and some prefix function f ∈ F are such that, for every length n and input w, with |w| ≤ n, w ∈ A if and only if w, f (n) ∈ B. For the non-deterministic classes we need a refined definition: Definition 3.3. BP P//poly is the class of sets A for which a probabilistic polynomial Turing machine M, a function f ∈ poly, and a constant γ < 12 exist such that M rejects w, f (|w|) with probability at most γ if w ∈ A and accepts w, f (|w|) with probability at most γ if w ∈ / A. A similar definition applies to the class BP P// log ∗.
26
E. Beggs et al.
It can be shown that BP P//poly = BP P/poly, but it is unknown whether BP P/ log ∗ ⊆ BP P// log ∗. It is important to notice that the usual non-uniform complexity classes contain undecidable sets, e.g., P/poly contains the halting set.
4
Infinite Precision
first gedankenexperiment: the cannon can be placed at some dyadic rational with infinite precision. I.e., we began by investigating the error-free machine, which gives the simplest situation. For every n ∈ N and x ∈ R, let xn be the rational number obtained by truncating x after the first n digits in its binary expansion. Then we showed the following result: Proposition 4.1. Let M be an error-free analog-digital scatter machine with ˜ be the same machine, but with the the vertex placed at the position x. Let M ˜ after vertex placed at the position x ˜ = x t . Then, for any input w, M and M, t steps of computation, make the same decisions. We may now sketch the proof of the following main theorem of this section. Theorem 4.1. The class of sets decided by error-free analog-digital scatter machines in polynomial time is exactly P/poly. The sketch of the proof is done by the way of polynomial advice. Let A be a set in P/poly, and, by definition, let B ∈ P , f ∈ poly be such that w ∈ A ⇐⇒ w, f (|w|) ∈ B. Let f˜ : N → Σ ∗ be a function, also in poly, such that, if the symbols of f (n) are ξ1 ξ2 . . . ξp(n) , then f˜(n) = 0ξ1 0ξ2 . . . 0ξp(n) We can create an error-free analog-digital scatter machine which also decides A, setting the ˜ vertex at the position x = 0.f˜(1)11f(2)11 f˜(3)11 . . . Given any input w of size n, the error-free analog-digital scatter machine Sx can use the bisection method to obtain f (n) in polynomial time. Then the machine uses the polynomial-time algorithm which decides B, and accepts if and only if w, f (n) is in B. Thus we have shown that an error-free analog-digital scatter machine can decide any set in P/poly in polynomial time. As for the converse, let C be any set decided in polynomial time by an errorfree analog-digital scatter machine with the vertex at the position x. Proposition 4.1 ensures that to decide on any input w, the machine only makes use of p(|w|) digits of x, where p is a polynomial. Thus we can see that the set must be in P/poly, using the advice function g ∈ poly, given by g(n) = x p(n) . We then conclude that measuring the position of a motionless point particle in Classical Physics, using a infinite precision cannon, 1 in polynomial time, we are deciding a set in P/poly. Note that, the class P/poly includes the Halting Set. But, most probably, if we remove the infinite precision criterion for the cannon, we will loose accuracy of observations, and we will loose the computational power of our physical oracle... 1
Remember that by infinite precision we mean the cannon to be settle at a rational point with infinite precision.
On the Complexity of Measurement
5
27
From Infinite to Unlimited Precision
second gedankenexperiment: the cannon can be placed at some dyadic rational z up to some dyadic arbitrary precision ε, let us say ε = 2−|z|−1 . I.e., we will be investigating the error-prone machine, which gives the next simplest situation. We will conclude that the computational power of these machines is not altered by considering such a small error, since they decide exactly BP P//poly = BP P/poly = P/poly in polynomial time. As in Section 4, we showed that only a number of digits of the vertex position linear in the length of computation influences the decision of such an error-prone analog-digital scatter machine. Notice that the behaviour of the error-prone analog-digital scatter machines is probabilistic, because if the machine shoots close enough to the motionless point particle position — the vertex —, and with a large enough error ε, the projectile can go both left or right with a non-zero probability. Thus, we can not ensure that when we truncate the vertex position to O(t(n)) digits, the state of the machine will be exactly the same after t(n) steps. Instead we showed that if a machine decides in time t(n), then by truncating the vertex position to O(t(n)) digits, the machine will decide the same set for inputs up to size n. In the following statement, note that if a set is decided by an error-prone analog-digital scatter machine with error probability bounded by γ, we may assume without loss of generality that γ < 14 . Proposition 5.1. Let M be an error-prone analog-digital scatter machine with arbitrary precision, with the vertex at x, deciding some set in time t(n) with ˜ be an error-prone machine with error probability bounded by γ < 14 . Let M arbitrary precision, with the same finite control as M and with the vertex placed ˜ make the same decision on every input of size at x˜ = x 5t(n) . Then M and M smaller or equal to n. This allows us to show the following. Proposition 5.2. Every set decided by an error-prone analog-digital scatter machine with arbitrary precision in polynomial time is in BP P//poly. Let A be a set decided by a precise error-free analog-digital scatter machine M in polynomial time p, and with a error probability bounded by 14 . Let x be the position of the vertex of M. We use the advice function f ∈ poly, given by ˜ which decides A f (n) = x 5p(n) , to construct a probabilistic Turing machine M in polynomial time. ˜ can carry out a Bernoulli Given any dyadic rational x˜ ∈ [0, 1], the machine M trial X with an associated probability P(X = 1) = x ˜. If x˜ has the binary expan˜ tosses its balanced coin k times, and constructs sion ξ1 . . . ξk , the machine M a word τ1 . . . τk , where τi is 1 if the coin turns up heads and 0 otherwise. The Bernoulli trial will have the outcome 1 if ξ1 . . . ξk < τ1 . . . τk , and 0 otherwise, and this will give the desired probability. ˜ will decide if w ∈ A by simulating M on the The probabilistic machine M input w with the vertex placed at the position x ˜ = x 5p(n). In order to mimic
28
E. Beggs et al.
the shooting of the cannon from the position z, which should have an error ε = 2−|z|−1 , the machine will carry out a Bernoulli trial X with an associated dyadic probability. ˜ will simulate a left hit when X = 1 and a right hit when X = 0. Then M As we have seen in the previous Proposition 5.1, M will, when simulated in this way, decide the same set in polynomial time and with bounded error probability. In a way similar to the first part of the proof of Theorem 4.1 we can prove the statement: Proposition 5.3. An error-prone analog-digital scatter machine with arbitrary precision can obtain n digits of the vertex position in O(n2 ) steps. The conclusion for this section comes with the following theorem. Theorem 5.1. The class of sets decided by error-prone analog-digital scatter machines with arbitrary precision in polynomial time is exactly BP P//poly = P/poly. It seems that measurement in more reasonable conditions do not affect the computational power of a motionless point particle position taken as oracle. I.e., making measurements in Classical Physics with incremental precision decide languages above the Turing limit, namely the halting set. But, surely, the computational power of a fuzzy motionless point particle position, i.e., a point to which we have access only with a finite a priori precision in measurement, will drop above the Turing limit...
6
A Priori Finite Precision
third gedankenexperiment: the cannon can be placed at some dyadic rational z up to some dyadic fixed precision ε. We will show that such machines may, in polynomial time, make probabilistic guesses of up to a logarithmic number of digits of the position of the vertex. We will then conclude that these machines decide exactly BP P// log ∗. Proposition 6.1. For any real value δ < 12 , prefix function f ∈ log, there is an error-prone analog-digital scatter machine with fixed precision which obtains f (n) in polynomial time with an error of at most δ. The proof will take two steps. First we show that if f is a prefix function in log, then there is a real value 0 ≤ r ≤ 1 such that it is possible to obtain the value f (n) from a logarithmic number of digits of r. Then we will show that by carefully choosing the vertex position, we can guess a logarithmic number of digits of r with an error rate δ. If f (n) is ultimately constant, then the result is trivial, and so we assume it is not so. After the work of [2], we can assume, without loss of generality, that there exist a, b ∈ N such that |f (n)| = a log n + b. Since f is a prefix function, we can consider the infinite sequence ϕ which is the limit of f (n) as n → ∞. Let ϕn be the n-th symbol (1 or 0) in this sequence,
On the Complexity of Measurement
29
−n and set r = ∞ . Then, since f is not ultimately constant, the digits in n=1 ϕn 2 the binary expansion of r are exactly the symbols of ϕ. Then the value f (n) can be obtained from the first a log n + b digits of r. The real number r is a value strictly between 0 and 1. Now suppose that ε is the error when positioning the cannon. We then set the vertex of our analog-digital scatter machine at the position x = 12 − ε + 2rε. Our method for guessing digits of r begins by commanding the cannon to shoot from the point 12 a number z of times. If the scatter machine experiment is carried out by entering the shooting state after having written the word 01 in the query tape, the cannon will be placed at some point in the interval [ 12 − ε, 12 + ε], with a uniform distribution. Then we conclude that the particle will go left with a probability of r and go right with a probability of 1 − r. After shooting the cannon z times in this way, we count the number of times that the particle went left, which we denote by L. The value r˜ = Lz will be our estimation of r. In order for our estimation to be correct in the sufficient number of digits, it is required that |r − r˜| ≤ 2−|f (n)|−1. By shooting the cannon z times, we have made z Bernoulli trials. Thus L is a random variable with expected value μ = zr and variance ν = zr(1 − r). By Chebyshev’s inequality, we conclude that, for every Δ, Δ ν P(|L − μ| > Δ) = P(|z˜ r − zr| > Δ) = P |˜ r − r| > ≤ 2. z Δ Choosing Δ = z2−|f (n)|−1, we get r(1 − r)22|f (n)|+2 z And so, the probability of making a mistake can be bounded to δ by making z > δ −1 r(1 − r)22a log n+2b+2 ∈ O(n2a ) experiments. The proposition above will guarantee us that for every fixed error ε we can find a vertex position that will allow for an SME to extract information from this vertex position. It does not state that we can make use of any vertex position independently of the fixed error ε. It can be shown, however, that if ε is a dyadic rational, then we may guess O(log n) digits of the vertex position in polynomial time. P(|˜ r − r| > 2−|f (n)|−1 ) ≤
Proposition 6.2. The class of sets decided in polynomial time by error-prone analog-digital scatter machines with a priori fixed precision is exactly BP P//log ∗. Since the class BP P// log ∗ include non-recursive sets, measurements in this more realistic condition still decide super-Turing languages. We can even think about changing the (uniform) probability distribution in the last two sections, making experiments closer and closer to reality...
7
Conclusion
We have seen that every variant of the analogue-digital scatter machine has a hypercomputational power. For instance, if K = {0n : the Turing machine coded
30
E. Beggs et al.
by n halts on input 0}, then K can be decided, in polynomial time, either by an error-free analog-digital scatter machine, or by an error-prone analog-digital scatter machine with arbitrary precision. It is obvious that the hypercomputational power of the analog-digital scatter machine arises from the precise nature of the vertex position. If we demand that the vertex position is a computable real number, then the analog-digital scatter machine can compute no more than the Turing machine, although it can compute, in polynomial time, faster than the Turing machine, namely REC P/poly, the recursive part of P/poly. In order to use the scatter machine experiment as an oracle, we need to assume that the wedge is sharp to the point and that the vertex is placed on a precise value x. Without these assumptions, the scatter machine becomes useless, since its computational properties arise exclusively from the value of x. The existence of an arbitrarily sharp wedge seems to contradict atomic theory, and for this reason the scatter machine is not a valid counterexample to the physical Church– Turing thesis. If this is the case, then what is the relevance of the analog-digital scatter machine as a model of computation? The scatter machine is relevant when it is seen as a Gedankenexperiment. In our discussion, we could have replaced the barriers, particles, cannons and particle detectors with any other physical system with this behaviour. So the scatter machine becomes a tool to answer the more general question: if we have a physical system to measure an answer to the predicate y ≤ x, to what extent can we use this system in feasible computations? As an open problem, besides a few other aspects of the measurement apparatus that we didn’t cover up in this paper, we will study a point mass in motion, according to some physical law, like a Newtonian gravitation field, and we will apply instrumentation to measure its position and velocity.
References 1. Balc´ azar, J.L., D´ıas, J., Gabarr´ o, J.: Structural Complexity I. Springer, Heidelberg (1988) 2. Balc´ azar, J.L., Hermo, M.: The structure of logarithmic advice complexity classes. Theoretical Computer Science 207(1), 217–244 (1998) 3. Beggs, E., Tucker, J.: Embedding infinitely parallel computation in newtonian kinematics. Applied Mathematics and Computation 178(1), 25–43 (2006) 4. Beggs, E., Tucker, J.: Can Newtonian systems, bounded in space, time, mass and energy compute all functions? Theoretical Computer Science 371(1), 4–19 (2007) 5. Beggs, E., Tucker, J.: Experimental computation of real numbers by Newtonian machines. Proceedings of the Royal Society 463(2082), 1541–1561 (2007) 6. Siegelmann, H.: Neural Networks and Analog Computation: Beyond the Turing Limit. Birkh¨ auser (1999)
Quantum Walk Based Search Algorithms Miklos Santha1,2 2
1 CNRS–LRI, Universit´e Paris–Sud, 91405 Orsay, France Centre for Quantum Technologies, Nat. Univ. of Singapore, Singapore 117543
Abstract. In this survey paper we give an intuitive treatment of the discrete time quantization of classical Markov chains. Grover search and the quantum walk based search algorithms of Ambainis, Szegedy and Magniez et al. will be stated as quantum analogues of classical search procedures. We present a rather detailed description of a somewhat simplified version of the MNRS algorithm. Finally, in the query complexity model, we show how quantum walks can be applied to the following search problems: Element Distinctness, Matrix Product Verification, Restricted Range Associativity, Triangle, and Group Commutativity.
1
Introduction
Searching is without any doubt one of the major problems in computer science. The corresponding literature is tremendous, most manuals on algorithms include several chapters that deal with searching procedures [22,14]. The relevance of finite Markov chains (random walks in graphs) to searching was recognized from early on, and it is still a flourishing field. The algorithm of Aleliunas et al. [4] that solves s–t connectivity in undirected graphs in time O(n3 ) and in space O(log n), and Sch¨ oning’s algorithm [33] that provides the basis of the currently fastest solutions for 3-SAT are among the most prominent examples for that. Searching is also a central piece in the emerging field of quantum algorithms. Grover search [16], and in general amplitude amplification [11] are well known quantum procedures which are provably faster than their classical counterpart. Grover’s algorithm was used recursively by Aaronson and Ambainis [2] for searching in grids. Discrete time quantum walks were introduced gradually by Meyer[27,28] in connection with cellular automata, and by Watrous in his works related to space bounded computations [37]. Different parameters related to quantum walks and possible speedups of classical algorithms were investigated by several researchers [30,8,3,29,19,32]. The potential of discrete time quantum walks with respect to searching problems was first pointed out by Shenvi, Kempe, and Whaley [34] who designed a quantum walk based simulation of Grover search. Ambainis, in his seminal paper [7],
Research supported by the European Commission IST Integrated Project Qubit Applications (QAP) 015848, and by the ANR Blanc AlgoQP grant of the French Research Ministry.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 31–46, 2008. c Springer-Verlag Berlin Heidelberg 2008
32
M. Santha
used quantum walks on the Johnson graphs to settle the query complexity of the Element Distinctness problems. Inspired by the work of Ambainis, Szegedy [35] designed a general method to quantize classical Markov chains, and developed a theory of quantum walk based search algorithms. A similar approach for the specific case of searching in grids was taken by Ambainis, Kempe and Rivosh [9]. The frameworks of Ambainis and Szegedy were used in various contexts to find algorithms with substantial complexity gains over simple Grover search [26,12,23,15]. In a recent work, Magniez, Nayak, Roland and Santha [25] proposed a new quantum walk based search method that expanded the scope of the previous approaches. The MNRS search algorithm is also conceptually simple, and improves various aspects of many walk based algorithms. In this survey paper we give an intuitive (though formal) treatment of the quantization of classical Markov chains. We will be concerned with discrete time quantum walks, the continuous case will not be covered here. Grover search and the quantum walk based search algorithms of Ambainis, Szegedy and Magniez et al. will be stated as quantum analogues of classical search procedures. We present a rather detailed description of a somewhat simplified version of the MNRS algorithm. Finally, in the query complexity model, we show how quantum walks can be applied to the following search problems: Element Distinctness, Matrix Product Verification, Restricted Range Associativity, Triangle, and Group Commutativity. For a detailed introduction to quantum walks the reader is referred to the excellent surveys of Kempe [20] and Ambainis [5]. Another survey on quantum search algorithms is also due to Ambainis [6].
2
Classical Search Algorithms
At an abstract level, any search problem may be cast as the problem of finding a marked element from a set X. Let M ⊆ X be the set of marked elements, and let ε be a known lower bound on |M |/|X|, the fraction of marked elements, whenever M is non-empty. If no further information is available on M , we can choose ε as 1/|X|. The simplest approach, stated in Search Algorithm 1, to solve this problem is to repeatedly sample from X uniformly until a marked element is picked, if there is any.
Search Algorithm 1 Repeat for t = O(1/ε) steps 1. Sample x ∈ X according to the uniform distribution. 2. If x is in M then output it and stop. More sophisticated approaches might use a Markov chain on the state space X for generating the samples. In that case, to generate the next sample, the resources expended for previous generations are often reused. Markov chains can be viewed as random walks on directed graphs with weighted edges. We will identify a Markov chain with its transition matrix
Quantum Walk Based Search Algorithms
33
P = (pxy ). A chain is irreducible if every state is accessible from every other state. An irreducible chain is ergodic if it is also aperiodic. The eigenvalues of a Markov chain are at most 1 in magnitude. By the Perron-Frobenius theorem, an irreducible chain has a unique stationary distribution π = (πx ), that is a unique left eigenvector π with eigenvalue 1 and positive coordinates summing up to 1. If the chain is ergodic, the eigenvalue 1 is the only eigenvalue of P with magnitude 1. We will denote by δ = δ(P ) the eigenvalue gap of P , that is 1 − |λ|, where λ is an eigenvalue with the second largest magnitude. It follows that when P is ergodic then δ > 0. The time-reversed Markov chain P ∗ of P is defined by the equations πx pxy = πy p∗yx . The Markov chain P is said to be reversible if P ∗ = P . Reversible chains can be viewed as random walks on undirected graphs with weighted edges, and in these chains the probability of a transition from a state x to another state y in the stationary distribution is the same as the probability of the transition in the reverse direction. The Markov chain P is symmetric if P = P t where P t denotes the transposed matrix of P . The stationary distribution of symmetric chains is the uniform distribution. They can be viewed as random walks on regular graphs, and they are time-reversible. We consider two search algorithms based on some ergodic and symmetric chain P . Search Algorithm 2 repeatedly samples from approximately stationary distributions, and checks if the element is marked. To get a sample the Markov chain is simulated long enough to mix well. Search Algorithm 3 is a greedy variant: a check is performed after every step of the chain.
Search Algorithm 2 1. Initialize x to a state sampled from the uniform distribution over X. 2. Repeat for t2 = O(1/ε) steps (a) If the element reached in the previous step is marked then output it and stop. (b) Simulate t1 = O(1/δ) steps of P starting with x.
Search Algorithm 3 1. Initialize x to a state sampled from the uniform distribution over X. 2. Repeat for t = O(1/εδ) steps (a) If the element reached in the previous step is marked then output it and stop. (b) Simulate one step of P starting with x. We state formally the complexity of the three algorithms to clarify their differences. They will maintain a data structure d that associates some data d(x) with every state x ∈ X. Creating and maintaining the data structure incurs a certain cost, but the data d(x) can be helpful to determine if x ∈ M . We distinguish three types of cost.
34
M. Santha
Setup cost S: The cost to sample x ∈ X according to the uniform distribution, and to construct d(x). Update cost U: The cost to simulate a transition from x to y according to P , and to update d(x) to d(y). Checking cost C: The cost of checking if x ∈ M using d(x). The cost may be thought of as a vector listing all the measures of complexity of interest, such as query and time complexity. The generic bounds on the efficiency of the three search algorithms can be stated in terms of the cost parameters. Proposition 1. Let P be an ergodic and symmetric Markov chain on X. Then all three algorithms find a marked element with high probability if there is any. The respective costs incurred by the algorithms are of the following order: 1. Search Algorithm 1: (S + C)/ε, 2. Search Algorithm 2: S + (U/δ + C)/ε, 3. Search Algorithm 3: S + (U + C)/δε. The generic bound of O(1/δε) in Search Algorithm 3 on the hitting time is not always optimal, which in some cases, for example in the 2-dimensional grid, can be significantly smaller.
3
Quantum Analogue of a Classical Markov Chain
We define a quantum analogue of an arbitrary irreducible Markov chain P as it is given by Magniez et al. [25]. This definition is based on and slightly extends the concept of quantum Markov chain due to Szegedy [35]. The latter was inspired by an earlier notion of quantum walk due to Ambainis [7]. We also point out that a similar process on regular graphs was studied by Watrous [37]. The quantum walk may be thought of as a walk on the edges of the original Markov chain, rather than on its vertices. Thus, its state space is a vector subspace of H = CX×X ∼ = CX ⊗ CX . For a state |ψ ∈ H, let Πψ = |ψψ| denote the orthogonal projector onto Span(|ψ), and let ref(ψ) = 2Πψ − Id denote the reflection through the line generated by |ψ, where Id is the identity operator on H. If K is a subspace of H spanned by a set of mutually orthogonal states {|ψi : i ∈ I}, then let ΠK = i∈I Πψi be the orthogonal projector onto K, and let ref(K) = 2ΠK −Id be the reflection through K. Let A = Span(|x|px : x ∈ X) and B = Span(|p∗y |y : y ∈ X) be vector subspaces of H, where √ |px = y∈X pxy |y and |p∗y = x∈X p∗yx |x. Definition 1 (Quantum walk). The unitary operation W (P ) = ref(B)·ref(A) defined on H by is called the quantum walk based on the classical chain P . Let us give some motivations for this definition. Classical random walks do not quantize in the space of the vertices. The standard way advocated by several papers (see the survey [20]) is to extend the vertex space X by a coin space C, and define the state space of the walk as X ×C. Then a step of the walk is defined
Quantum Walk Based Search Algorithms
35
as the product of two unitary operations. The first one is the flip operation F controlled by the vertex state, which means that for every x ∈ X, it performs a unitary coin flip F x on the states {|x, c : c ∈ C}. For d-regular undirected graphs, C can be taken as the set {1, . . . , d}, and in that case the coin flip F x is independent from x. The second one is the shift operation S which is controlled by the coin state, and takes a vertex to one if its neighboring vertices. For dregular graphs the simplest way to define it is via a labeling of the directed edges by the numbers between 1 and d such that for every 1 ≤ i ≤ d, the directed edges labeled by i form a permutation. Then, if the coin state is i, the new vertex is the ith neighbor according to the labeling. For general walks it is practical to take the coin space also to be X, then the state space of the walk corresponds naturally to the directed edges of the graph. In this case there is a symmetry between the two spaces, and the shift operation simply exchanges the vertices, that is S|x, y = |y, x, for every x, y ∈ X. Let us pause here for a second and consider how a classical walk defined by some Markov chain P can be thought of as a walk on the directed edges of the graph (instead of the vertices). Let’s think about an edge (x, u) as the state of the walk P being at x, where the previous state was u. According to this interpretation, in one step the walk on edges should move from state (x, u) to state (y, x) with probability pxy . This move can be accomplished by the stochastic flip operation F controlled by the left end-point of the edge, where x Fuy = pxy for all x, u, y ∈ X, followed by the shift S defined previously. If we define the flip F as F but controlled by the right end-point of the edge, then it is not hard to see that SF SF = F F . Therefore one can get rid of the shift operations, and two steps of the walk can be accomplished by two successive flips where the control and the target registers alternate. Coming back to the quantization of classical walks, we thus want to find unitary coin flips which mirror the walk P , and which alternately mix the right end-point of the edges over the neighbors of the left end-point, and then the left end-point of the edges over the neighbors of the new right end-point. The reflections ref(A) and ref(B) are natural choices for that. They are also generalizations of the Grover diffusion operator [16]. Indeed, when the transition to each neighbor is equally likely, they correspond exactly to Grover diffusion. In Szegedy’s original definition the alternating reflections were ref(A) and ref(B ) with B = Span(|py |y : y ∈ X), mirroring faithfully the classical edge based walk. The reason why the MNRS quantization chooses every second step a reflection based on the reversed walk P ∗ is explained now. The eigen-spectrum of the transition matrix P plays an important role in the analysis of a classical Markov chain. Similarly, the behaviour of the quantum process W (P ) may be inferred from its spectral decomposition. The reflections through subspaces A and B are (real) orthogonal transformations, and so is their product W (P ). An orthogonal matrix may be decomposed into a direct sum of the identity, reflection through the origin, and two-dimensional rotations over orthogonal vector subspaces [17, Section 81]. These subspaces and the corresponding eigenvalues are revealed by the singular value decomposition of the
36
M. Santha
product ΠA ΠB of the orthogonal projection operators onto the subspaces A and B. Equivalently, as done by Szegedy, values of the one can consider the singular √ √ discriminant matrix D(P ) = ( pxy p∗yx ). Since pxy p∗yx = πx pxy / πy , we have D(P )
=
diag(π)1/2 · P · diag(π)−1/2 ,
where diag(π) is the invertible diagonal matrix with the coordinates of the distribution π in its diagonal. Therefore D(P ) and P are similar, and their spectra are the same. When P is reversible then D(P ) is symmetric, and its singular values are equal to the absolute values of its eigenvalues. Thus, in that case we only have to study the spectrum of P . Since the singular values of D(P ) all lie in the range [0, 1], they can be expressed as cos θ, for some angles θ ∈ [0, π2 ]. The following theorem of Szegedy relates the singular value decomposition of D(P ) to the spectral decomposition of W (P ). Theorem 1 (Szegedy [35]). Let P be an irreducible Markov chain, and let cos θ1 , . . . , cos θl be an enumeration of those singular values (possibly repeated) of D(P ) that lie in the open interval (0, 1). Then the exact description of the spectrum of W (P ) on A + B is: 1. On A + B those eigenvalues of W (P ) that have non-zero imaginary part are exactly e±2iθ1 , . . . , e±2iθl , with the same multiplicity. 2. On A ∩ B the operator W (P ) acts as the identity Id. A ∩ B is spanned by the left (and right) singular vectors of D(P ) with singular value 1. 3. On A∩B ⊥ and A⊥ ∩B the operator W (P ) acts as −Id. A∩B ⊥ (respectively, A⊥ ∩ B) is spanned by the left (respectively, right) singular vectors of D(P ) with singular value 0. Let us now suppose in addition that P is ergodic and reversible. As we just said, reversibility implies that the singular values of D(P ) are equal to the absolute values of the eigenvalues of P . From the ergodicity it also follows that D(P ) has a unique singular vector with singular value 1. We have therefore the following corollary. Corollary 1. Let P be an ergodic and reversible Markov chain. Then, on A + B the spectrum of W (P ) can be characterized as: √ √ πx |x|px = πy |p∗y |y |π = x∈X
y∈X
is the unique 1-eigenvector, e±2iθ are eigenvalues for every singular value cos θ ∈ (0, 1) of D(P ), and all the remaining eigenvalues are -1. The phase gap Δ(P ) = Δ of W (P ) is defined as 2θ, where θ is the smallest angle in (0, π2 ] such that cos θ is a singular value of D(P ). This definition is motivated by the previous theorem and corollary: the angular distance of 1 from any other eigenvalue of W (P ) on A+B is at least Δ. When P is ergodic and reversible, there
Quantum Walk Based Search Algorithms
37
is a quadratic relationship between the phase gap Δ of the quantum walk W (P ) and √ the eigenvalue gap δ of the classical Markov chain P , more precisely Δ ≥ 2 δ. Indeed, let δ ∈ (0, π2 ] such that δ = 1 − cos θ and Δ = 2θ. The following √ √ (in)equalities can easily be checked: Δ ≥ 1 − e2iθ = 2 1 − cos2 θ ≥ 2 δ. The origin of the quadratic speed-up due to quantum walks may be traced to this phenomenon.
4
Quantum Search Algorithms
As in the classical case, the quantum search algorithms look for a marked element in a finite set X. We suppose that the elements of X are coded by binary strings and that ¯ 0, the everywhere 0 string is in X. A data structure attached to both vertex registers is maintained during the algorithm. Again, three types of cost will be distinguished, generalizing those of the classical search. In all quantum search algorithms the overall complexity is of the order of these specific costs, which justifies their choices. The operations not involving manipulations of the data will be charged at unit cost. For the sake of simplicity, we do not formally include the data into the description of the unitary operations defining the costs. The initial state of the algorithm is explicitly related to the stationary distribution π of P . (Quantum) Setup cost S: The cost for constructing the state √ ¯ π |x| 0 with data. x x∈X (Quantum) Update cost U: The cost to realize any of the unitary transformations and inverses with data √ |x|¯ 0 → |x y∈X pxy |y, ∗ |¯ 0|y → pyx |x|y. x∈X (Quantum) Checking cost C: The cost to realize the unitary transformation with data, that maps |x|y to −|x|y if x ∈ M , and leaves it unchanged otherwise. In the checking cost we could have included the cost of the unitary transformation which realizes a phase flip also when y ∈ M , our choice was made just for simplicity. Observe that the quantum walk W (P ) with data can be implemented at cost 4U + 2. Indeed, the reflection ref(A) is implemented by mapping 0, applying ref(CX ⊗ |¯ 0), and undoing the first transforstates |x|px to |x|¯ mation. In our accounting we charge unit cost for the second step since it does not depend on the database. Therefore the implementation of ref(A) is of cost 2U + 1. The reflection ref(B) may be implemented similarly. Let us now describe how the respective algorithms of Grover, Ambainis and Szegedy are related to the classical search algorithms of Section 2. We suppose that ε, a lower bound on the proportion of marked elements is known in advance, though the results remain true even if it is not the case. Grover search (which we discuss soon in detail) is the quantum analogue of Search Algorithm 1, and it doesn’t involve my specific data structure.
38
M. Santha
Theorem 2 (Grover [16]). There exists a quantum algorithm which with high √ . probability finds a marked element, if there is any, at cost of order S+C ε In the original application of Grover’s result to unordered search there is no data structure involved, therefore S + C = O(1), and the cost is of order √1ε . The algorithm of Ambainis is the quantum analogue of Search Algorithm 2 in the special case of the walk on the Johnson graph and for some specific marked sets. Let us recall that for 0 < r ≤ n/2, the vertices of the Johnson graph J(n, r) are the subsets of [n] of size r, and there is an edge between two vertices if the size of their symmetric difference is 2. In other words, two vertices are adjacent if by deleting an element from the first one and adding a new element to it we get the second vertex. The eigenvalue gap δ of the symmetric walk on J(n, r) is n/r(n − r) = Θ(1/r). If the set of marked vertices in J(n, r) is either empty, or it consists of vertices that contain a fixed subset of constant size k ≤ r then k ε = Ω( nr k ). Theorem 3 (Ambainis [7]). Let P be the random walk on the Johnson graph J(n, r) where r = o(n), and let M be either empty, or the class of vertices that contain a fixed subset of constant size k ≤ r. Then there is a quantum algorithm that finds, with high probability, the k-subset if M is not empty at cost of order S + √1ε ( √1δ U + C). Szegedy’s algorithm is the quantum analogue of Search Algorithm 3 for the class of ergodic and symmetric Markov chains. His algorithm is therefore more general than the one of Ambainis with respect to the class of Markov chains and marked sets it can deal with. Nonetheless, the approach of Ambainis has its own advantages: it is of smaller cost when C is substantially greater than U, and it also finds a marked element. Theorem 4 (Szegedy [35] ). Let P be an ergodic and symmetric Markov chain. There exists a quantum algorithm that determines, with high probability, if M is non-empty at cost of order S + √1δε (U + C). The MNRS algorithm is a quantum analogue of Search Algorithm 2 for ergodic and reversible Markov chains. It generalizes the algorithms of Ambainis an Szegedy, and it combines their benefits in terms of being able to find marked elements, incurring the smaller cost of the two, and being applicable to a larger class of Markov chain. Theorem 5 (Magniez et al. [25]). Let P be an ergodic and reversible Markov chain, and let ε > 0 be a lower bound on the probability that an element chosen from the stationary distribution of P is marked whenever M is non-empty. Then, there exists a quantum algorithm which finds, with high probability, an element of M if there is any at cost of order S + √1ε ( √1δ U + C). There is an additional feature of Szegedy’s algorithm which doesn’t fit into the MNRS algorithmic paradigm. In fact, the quantity √1δε in Theorem 4 can be
Quantum Walk Based Search Algorithms
39
replaced by the square root of the classical hitting time [35]. The search algorithm for the 2-dimensional grid obtained this way, and the one given in [9] have smaller complexity than what follows from Theorem 5.
5
The MNRS Search Algorithm
We give a high level description of the MNRS search algorithm. Assume that M = ∅. Let M = CM×X denote the marked subspace, that is the subspace with marked items in the first register. The purpose of the algorithm is to approximately transform the initial state |π to the target state |μ, which is the normalized projection of |π onto M: ΠM |π 1 √ = √ πx |x|px , ΠM |π ε x∈M where ε = ΠM |π2 = x∈M πx is the probability of the set M of marked states under the stationary distribution π. Let us recall that Grover search [16] solves this problem via the iterated use of the rotation ref(π) · ref(μ⊥ ) in the two-dimensional real subspace S = Span(|π, |μ), where |μ⊥ is the state in S orthogonal to |μ making some acute angle ϕ with |π. The angle ϕ is given √ by sin ϕ = μ|π = ε. Then ref(π) · √ ref(μ⊥ ) is a rotation by 2ϕ within the space S, and therefore O(1/ϕ) = O(1/ ε) iterations of this rotation, starting with |π, approximates well |μ. The MNRS search basically follows this idea. Restricted to the subspace S, the operator ref(μ⊥ ) is identical to −ref(M). Therefore, if the state of the algorithm remains close to the subspace S throughout, it can be implemented at the price of the checking cost. The reflection ref(π) is computationally harder to perform. The idea is to apply the phase estimation algorithms of Kitaev [21] and Cleve at al. [13] to W (P ). Corollary 1 implies that |π is the only 1-eigenvector of W (P ), and all the other eigenvectors have phase at least Δ. Phase estimation approximately resolves any state |ψ in A+ B along the eigenvectors of W (P ), and thus distinguishes |π from all the others. Therefore it is possible to flip the phase of all states with a non-zero estimate of the phase, that is simulate the effect of the operator ref(π) in A + B. The following result of [25] resumes this discussion: |μ
=
Theorem 6. There exists a uniform family of quantum circuits R(P ) that uses O(k log(Δ−1 )) additional qubits and satisfies the following properties: 1. It makes O(kΔ−1 ) calls to the controlled quantum walk c−W (P ) and its inverse. 2. R(P )|π = |π. 3. If |ψ ∈ A + B is orthogonal to |π, then (R(P ) + Id)|ψ ≤ 2−k . The essence of the MNRS search algorithm is the following simple procedure that satisfies the conditions of Theorem 5, but with a slightly higher complexity, of the order of S + √1ε ( √1δ log √1ε U + C). Again, we suppose that ε is known.
40
M. Santha
Quantum Search(P ) 1. Start from the √ initial state |π. 2. Repeat O(1/ ε)–times: (a) For any basis vector |x|y, flip the phase if x ∈ M . √ (b) Apply circuit R(P ) of Theorem 6 with k = O(log(1/ ε)). 3. Observe the first register and output if it is in M . To see the correctness, let |φi be the result of i iterations of ref(π) · ref(μ⊥ ) applied to |π, and let |ψi be the result of i iterations of step (2) in Quantum Search(P ) applied to |π. It is not hard to show by induction on i, using a that hybrid argument as in [10,36], that |ψi − |φi ≤ O(i2−k ). This implies √ |ψk −|φk is an arbitrarily small constant when k is chosen to be O(log(1/ ε)) and therefore the success probability is arbitrarily close to 1. The cost of the procedure is simple to analyze. Initialization costs S + U, and in each iteration the single phase flip costs C. In the circuit R(P ), the controlled quantum walk and its inverse can be implemented, similarly to W (P ), at cost 0) and ref(|¯0 ⊗ CY ). The√number of 4U + 2, simply by controlling ref(CX ⊗ |¯ ε)). Since steps of √the controlled quantum walk and its inverse is O((1/Δ) log(1/ √ Δ ≥ 2 δ, this finishes the cost analysis. Observe that the log(1/ ε)-factor in the update cost was necessary for reducing the error of the approximate reflection operator. In [25] it is described how it can be eliminated by adapting the recursive amplitude amplification algorithm of Høyer et al. [18]
6
Applications
We give here a few examples where the quantum search algorithms, in particular the MNRS algorithm can be applied successfully. All examples will be described in the query model of computation. Here the input is given by an oracle, a query can be performed at unit cost, and all other computational steps are free. A formal description of the model can be found for example in [23] or [26]. In fact, in almost all cases, the circuit complexity of the algorithms given will be of the order of the query complexity, with additional logarithmic factors. 6.1
Grover Search
As a first (and trivial) application, we observe that Grover’s algorithm [16] for the unordered search problem is a special case of Theorem 5. Unordered Search Oracle Input: A boolean function f defined on [n]. Output: An element i ∈ [n] such that f (i) = 1. Theorem 7. Unordered Search can be solved with high probability in quantum query complexity O((n/k)1/2 ), where |{i ∈ [n] : f (i) = 1}| = k.
Quantum Walk Based Search Algorithms
41
Proof. Consider the symmetric random walk in the complete graph on n vertices, where an element v is marked if f (v) = 1. The eigenvalue gap of the walk is 1 1 − n−1 , and the probability ε that an element is marked is k/n. There is no data structure involved in the algorithm, the setup, update and checking costs are 1. 6.2
Johnson Graph Based Algorithms
All these examples are based on the symmetric walk in the Johnson graph J(n, r), with eigenvalue gap Θ(1/r). Element Distinctness. This is the original problem for which Ambainis introduced the quantum walk based search method [7]. Element Distinctness Oracle Input: A function f defined on [n]. Output: A pair of distinct elements i, j ∈ [n] such that f (i) = f (j) if there is any, otherwise reject. Theorem 8. Element Distinctness can be solved with high probability in quantum query complexity O(n2/3 ). Proof. A vertex R ⊆ [n] of size r is marked if there exist i = j ∈ R such f (i) = f (j). The probability ε that an element is marked, if there is any, is in Ω((r/n)2 ). For every R, the data is defined as {(v, f (v)) : v ∈ R}. Then the setup cost is in O(r), the update cost is O(1), and the checking cost is 0. Therefore the overall cost is O(r + n/r1/2 ) which is O(n2/3 ) when r = n2/3 . This upper bound is tight, the Ω(n2/3 ) lower bound is due to Aaronson and Shi [1]. Matrix Product Verification. This problem was studied by Buhrman and Spalek [12], and the algorithm in the query model is almost identical to the previous one. Matrix Product Verification Oracle Input: Three n × n matrices A, B and C. Output: Decide if AB = C and in the negative case find indices i, j such that (AB)ij = Cij . Theorem 9. Matrix Product Verification can be solved with high probability in quantum query complexity O(n5/3 ). Proof. For an n × n matrix M and a subset of indices R ⊆ [n], let M |R denote the |R| × n submatrix of M corresponding to the rows restricted to R. The submatrices M |R and M |R R are defined similarly, when the restriction concerns the columns and both the rows and columns. A vertex R ⊆ [n] is marked if there exist i, j ∈ R such that ABij = Cij . The probability of being marked, if there is such an element, is in Ω((r/n)2 ). For every R, the data is defined as the
42
M. Santha
set of entries in A|R , B|R and C|R R . Then the setup cost is in O(rn), the update cost is O(n), and the checking cost is 0. Therefore the overall cost is O(n5/3 ) when r = n2/3 . The best known lower bound is Ω(n3/2 ), and in [12] an O(n5/3 ) upper bound was also proven for the time complexity with a somewhat more complicated argument. Restricted Range Associativity. The problem here is to decide if a binary operation ◦ is associative. The only algorithm known for the general case is Grover search, but D¨orn and Thierauf [15] have proved that when the range of the operation is restricted, Theorem 5 (or Theorem 3) give a non-trivial bound. A triple (a, b, c) is called non associative if (a ◦ b) ◦ c = a ◦ (b ◦ c). Restricted Range Associativity Oracle Input: A binary operation ◦ : [n] × [n] → [k] where k ∈ O(1). Output: A non associative triple (a, b, c) if there is any, otherwise reject. Theorem 10. Restricted Range Associativity can be solved with high probability in quantum query complexity O(n5/4 ). Proof. We say that R ⊆ [n] is marked if there exist a, b ∈ R and c ∈ [n] such that (a, b, c) is non associative. Therefore ε is in Ω((r/n)2 ), if there is a marked element. For every R ⊆ [n], the data structure is defined as {(a, b, a ◦ b) : a, b ∈ R ∪ [k]}. Then the setup cost is O((r + k)2 = O(r2 ) and the update cost is O(r + k) = O(r). Observe that if b ∈ R and c ∈ [n] are fixed, then computing (a◦ b)◦ c and a◦ (b ◦ c) for all a ∈ R requires at most k + 1 queries with the help of the data cost is √ Thus using Grover search to find b and c, the √ checking √ √ structure. O(k rn) = O( rn). The overall complexity is then O(r2 + nr ( rr+ rn)) which √ is O(n5/4 ) when r = n. The best lower bound known both in the restricted range and the general case is Ω(n). Triangle. In an undirected graph G, a complete subgraph on three vertices is called a triangle. The algorithm of Magniez et al. [26] for finding a triangle uses the algorithm for Element Distinctness in the checking procedure. Triangle Oracle Input: The adjacency matrix f of a graph G on vertex set [n]. Output: A triangle if there is any, otherwise reject. Theorem 11. Triangle can be solved with high probability in quantum query complexity O(n13/10 ). Proof. We show how to find the edge of a triangle, if there is any, in query complexity O(n13/10 ). This implies the theorem since given such an edge, Grover search finds the third vertex of the triangle with O(n1/2 ) additional queries. An element R ⊆ [n] is marked if it contains a triangle edge. The probability ε that an element is marked is in Ω((r/n)2 ), if there is a triangle. For every R, the data structure is the adjacency matrix of the subgraph induced by R. defined
Quantum Walk Based Search Algorithms
43
as {(v, f (v)) : v ∈ R}. Then the setup cost is O(r2 ), and the update cost is O(r). The interesting part of the algorithm is the checking procedure, which is a quantum walk based search itself, and the claim is that it can be done at cost √ O( n × r2/3 ). To see this, let R be a set of r vertices such that the graph G restricted to R is explicitly known, and for which we would like to decide if it is marked. Observe that R is marked exactly when there exists a vertex v ∈ [n] such that v and an edge in R form a triangle. Therefore for every vertex v, one can define a secondary search problem on R via the boolean oracle fv , where for every u ∈ R, by definition fv (u) = 1 if {u, v} is an edge. The output of the problem is by definition positive if there is an edge {u, u } such that fv (u) = fv (u ) = 1. To solve the problem we consider the Johnson graph J(r, r2/3 ), and look for a subset which contains such an edge. In that search problem both the probability of being marked and the eigenvalue gap of the underlying Markov chain are in Ω(r−2/3 ). The data associated with a subset of R is just the values of fv at its elements. Then the setup cost is r2/3 , the update cost is O(1), and the checking cost is 0. Therefore by Theorem 5 the cost of solving a secondary search problem is in O(r2/3 ). Finally the checking procedure of the original search problem consists of a Grover search for a vertex v such that the secondary search problem defined by fv has a positive outcome. Putting things √ together, √ the problem can be solved in quantum query complexity O(r2 + nr ( r × r + n × r2/3 )) which is O(n13/10 ) when r = n3/5 . The best known lower bound for the problem is Ω(n). 6.3
Group Commutativity
The problem here is to decide if a group multiplication is commutative in the (sub)group generated by some set of group elements. It was defined and studied in the probabilistic case by Pak [31], the quantum algorithm is due to Magniez and Nayak [23]. Group Commutativity Oracle Input: The multiplication operation ◦ for a finite group whose base set contains [n]. Output: A non commutative couple (i, j) ∈ [n] × [n] if G, the (sub)group generated by [n], is non-commutative, otherwise reject. Theorem 12. Group Commutativity can be solved with high probability in quantum query complexity O(n2/3 log n). Proof. For 0 < r < n let S(n, r) be the set of all r-tuples of distinct elements ¯ = u1 ◦ · · · ◦ ur . We define from [n]. For u = (u1 , . . . , ur ) in S(n, r), we set u a random walk over S(n, r). Let u = (u1 , . . . , ur ) be the current vertex. Then with probability 1/2 stay at u, and with probability 1/2 pick uniformly random i ∈ [r] and j ∈ [n]. If j = um for some m then exchange ui and um , otherwise set ui = j. The random walk P at the basis of the quantum algorithm is over S(n, r) × S(n, r), and it consists of two independent simultaneous copies of the
44
M. Santha
above walk. The stationary distribution of P is the uniform distribution, and it is proven in [23] that its eigenvalue gap is Ω(1/(r log r)). A vertex (u, v) is marked if u¯ ◦ v¯ = v¯ ◦ u ¯. It is proven again in [23] that when G is non-commutative and r ∈ o(n), then the probability ε that an element is marked is Θ(r2 /n2 ). For u ∈ S(n, r) let Tu be the balanced binary tree with r leaves that are labeled from left to right by u1 , . . . , ur , and where each internal node is labeled by the product of the labels of its two sons. For every vertex (u, v) the data consists of (Tu , Tv ). Then the setup cost is r, and the update cost is O(log r) for recomputing the leaf–root paths. The checking cost is simply 2 for querying u ¯ ◦√ v¯ and v¯ ◦ u¯. Therefore the query complexity to find a marked element is O(r + nr ( r log r log r + 1)) which is O(n2/3 log n) when r = n2/3 log n. Once a marked element is found, Grover search yields a non-commutative couple at cost O(r). In [23] an Ω(n2/3 ) lower bound is also proven. And, it turns out that a Johnson graph based walk can be applied to this problem too [24], yielding an algorithm of complexity O((n log n)2/3 ).
Acknowledgment I would like to thank Fr´ed´eric Magniez, Ashwin Nayak and J´er´emie Roland, my coauthors in [25], for letting me to include here several ideas which were developed during that work, and for numerous helpful suggestions.
References 1. Aaronson, S., Shi, Y.: Quantum lower bounds for the collision and the element distinctness problems. Journal of the ACM, 595–605 (2004) 2. Aaronson, S., Ambainis, A.: Quantum search of spatial regions. Theory of Computing 1, 47–79 (2005) 3. Aharonov, D., Ambainis, A., Kempe, J., Vazirani, U.: Quantum walks on graphs. In: Proc. of the 33rd ACM Symposium on Theory of Computing, pp. 50–59 (2001) 4. Aleliunas, R., Karp, R., Lipton, R., Lov´ asz, L., Rackoff, C.: Random Walks, Universal Traversal Sequences, and the cComplexity of Maze Problems. In: Proc. of the 20th Symposium on Foundations of Computer Science, pp. 218–223 (1979) 5. Ambainis, A.: Quantum walks and their algorithmic applications. International Journal of Quantum Information 1(4), 507–518 (2003) 6. Ambainis, A.: Quantum search algorithms. SIGACT News 35(2), 22–35 (2004) 7. Ambainis, A.: Quantum Walk Algorithm for Element Distinctness. SIAM Journal on Computing 37, 210–239 (2007) 8. Ambanis, A., Bach, E., Nayak, A., Vishwanath, A., Watrous, J.: One-dimensional quantum walks. In: Proc. of the 33rd ACM Symposium on Theory of computing, pp. 37–49 (2001) 9. Ambainis, A., Kempe, J., Rivosh, A.: Coins make quantum walks faster. In: Proc. of the 16th ACM-SIAM Symposium on Discrete Algorithms, pp. 1099–1108 (2005) 10. Bennett, C., Bernstein, E., Brassard, G., Vazirani, U.: Strengths and weaknesses of quantum computing. SIAM Journal on Computing 26(5), 1510–1523 (1997)
Quantum Walk Based Search Algorithms
45
11. Brassard, G., Høyer, P., Mosca, M., Tapp, A.: Quantum amplitude amplification and estimation. In: Lomonaco Jr., S.J., Brandt, H.E. (eds.) Quantum Computation and Quantum Information: A Millennium Volume, American Mathematical Society. Contemporary Mathematics Series, vol. 305, pp. 53–74 (2002) 12. Buhrman, H., Spalek, R.: Quantum verification of matrix products. In: Proc. of the 17th ACM-SIAM Symposium on Discrete Algorithms, pp. 880–889 (2006) 13. Cleve, R., Ekert, A., Macchiavello, C., Mosca, M.: Quantum algorithms revisited. In: Proc. of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 454(1969), pp. 339–354 (1998) 14. Cormen, T., Leiserson, C., Rivest, R., Stein, C.: Introduction to Algorithms, 2nd edn. The MIT Press and McGraw-Hill (2001) 15. D¨ orn, S., Thierauf, T.: The Quantum Query Complexity of Algebraic Properties. In: Proc. of the 16th International Symposium on Fundamentals of Computation Theory, pp. 250–260 (2007) 16. Grover, L.: A fast quantum mechanical algorithm for database search. In: Proc. of the 28th ACM Symposium on the Theory of Computing, pp. 212–219 (1996) 17. Halmos, P.: Finite-dimensional vector spaces. Springer, Heidelberg (1974) 18. Høyer, P., Mosca, M., de Wolf, R.: Quantum search on bounded-error inputs. In: Baeten, J.C.M., Lenstra, J.K., Parrow, J., Woeginger, G.J. (eds.) ICALP 2003. LNCS, vol. 2719, pp. 291–299. Springer, Heidelberg (2003) 19. Kempe, J.: Discrete Quantum Walks Hit Exponentially Faster. In: Proc. of the International Workshop on Randomization and Approximation Techniques in Computer Science, pp. 354–369 (2003) 20. Kempe, J.: Quantum random walks – an introductory survey. Contemporary Physics 44(4), 307–327 (2003) 21. Kitaev, A.: Quantum measurements and the abelian stabilizer problem. Electronic Colloquium on Computational Complexity (ECCC) 3 (1996) 22. Knuth, D.: Sorting and Searching. The Art of Computer Programming, vol. 3. Addison-Wesley, Reading (1973) 23. Magniez, F., Nayak, A.: Quantum complexity of testing group commutativity. Algorithmica 48(3), 221–232 (2007) 24. Magniez, F., Nayak, A.: Personal communication (2008) 25. Magniez, F., Nayak, A., Roland, J., Santha, M.: Search via quantum walk. In: Proc. of the 39th ACM Symposium on Theory of Computing, pp. 575–584 (2007) 26. Magniez, F., Santha, M., Szegedy, M.: Quantum Algorithms for the Triangle Problem. SIAM Journal of Computing 37(2), 413–427 (2007) 27. Meyer, D.: From quantum cellular automata to quantum lattice gases. Journal of Statistical Physics 85(5-6), 551–574 (1996) 28. Meyer, D.: On the abscence of homogeneous scalar unitary cellular automata. Physical Letter A 223(5), 337–340 (1996) 29. Moore, C., Russell, A.: Quantum Walks on the Hypercube. In: Proc. of the 6th International Workshop on Randomization and Approximation Techniques in Computer Science, pp. 164–178 (2002) 30. Nayak, A., Vishwanath, A.: Quantum walk on the line. Technical Report quantph/0010117, arXiv (2000) 31. Pak, I.: Testing commutativity of a group and the power of randomization (2000), Electronic version http://www-math.mit.edu/∼ pak/research.html 32. Richter, P.: Almost uniform sampling via quantum walks. New Journal of Physics (to appear) 33. Sch¨ oning, U.: A Probabilistic Algorithm for k -SAT Based on Limited Local Search and Restart. Algorithmica 32(4), 615–623 (2002)
46
M. Santha
34. Shenvi, N., Kempe, J., Whaley, K.B.: Quantum random-walk search algorithm. Physical Review A 67(052307) (2003) 35. Szegedy, M.: Quantum Speed-Up of Markov Chain Based Algorithms. In: Proc. of the 45th IEEE Symposium on Foundations of Computer Science, pp. 32–41 (2004) 36. Vazirani, U.: On the power of quantum computation. Philosophical Transactions of the Royal Society of London, Series A 356, 1759–1768 (1998) 37. Watrous, J.: Quantum simulations of classical random walks and undirected graph connectivity. Journal of Computer and System Sciences 62(2), 376–391 (2001)
Propositional Projection Temporal Logic, B¨ uchi Automata and ω-Regular Expressions Cong Tian and Zhenhua Duan Institute of Computing Theory and Technology Xidian University {ctian,zhhduan}@mail.xidian.edu.cn
Abstract. This paper investigates the language class defined by Propositional Projection Temporal Logic with star (PPTL with star). To this end, B¨ uchi automata are first extended with stutter rule (SBA) to accept finite words. Correspondingly, ω-regular expressions are also extended (ERE) to express finite words. Consequently, by three transformation procedures between PPTL with star, SBA and ERE, PPTL with star is proved to represent exactly the full regular language. Keywords: Propositional Projection Temporal Logic, B¨ uchi automata, ω-regular expression, expressiveness.
1
Introduction
Temporal logic is a useful formalism for describing sequences of transitions between states in a reactive system. In the past thirty years, many kinds of temporal logics are proposed within two categories, linear-time and branching-time logics. In the community of linear-time logics, the most widely used logics are Linear Temporal Logic (LTL) [?] and its variations. In the propositional framework, Propositional LTL (PLTL) has been proved to have the expressiveness of starfree regular expressions [12,16]. Considering the expressive limitation of PLTL, extensions such as Quantified Linear Time Temporal Logic (QLTL) [13], Extended Temporal Logic (ETL) [9,14] and Linear mu-calculus (νTL) [15] etc, were introduced to PLTL to express the full regular language. Nevertheless, results [17,18,19,20] have shown that temporal logic needs some further extensions in order to support a compositional approach to the specification and verification of concurrent systems. These extensions should enable modular and compositional reasoning about loops and sequential composition as well as about concurrent ones. Therefore, kinds of extensions are proposed. Prominently, one of the important extensions is the addition of the chop operator. The work in [9] showed that process logic with both chop operator and its reflexive-transitive closure (chop star ), which is called slice in process logic, is strictly more expressive. The resulting logic is still decidable and in fact has the expressiveness of full regular expressions.
This research is supported by the NSFC Grant No.60433010.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 47–58, 2008. c Springer-Verlag Berlin Heidelberg 2008
48
C. Tian and Z. Duan
Interval Temporal Logic (ITL) [4] is an easily understood temporal logic with next, chop and a projection operator proj. In the two characteristic operators, chop implements a form of sequential composition while proj yields repetitive behaviors. ITL without projection has similar expressiveness as Rosner and Pnueli’s choppy logic [3]. Further, addition of the proj operator will brings more powerful expressiveness, since repetitive behaviors are allowed. However, no systematic proofs have been given in this aspect. Projection Temporal Logic (PTL) [5] is an extension of ITL. It extends ITL to include infinite models and a new projection construct, (P1 , ..., Pm ) prj Q, which is much more flexible than the original one. However, in the propositional case1 , the projection construct needs further to be extended to projection star, (P1 , ..., (Pi , ..., Pj ) , ..., Pm ) prj Q, so that it can subsume chop, chop star, and the original projection (proj ) in [4]. This extension makes the underlying logic more powerful without the lose of decidability [22]. Within PTL, plenty of logic laws have been formalized and proved [5], and a decision procedure for checking the satisfiability of Propositional Projection Temporal Logic (PPTL) formulas are given in [6,7]. Based on the decision procedure, a model checking approach based on SPIN for PPTL is proposed [8]. Further, in [22], projection star is introduced to PPTL, and the satisfiability for PPTL with star formulas is proved to be still decidable. Instinctively, PPTL with star is powerful enough to express the full regular expression. Thus, by employing PPTL with star formulas as the property specification language, the verification of concurrent systems with the model checker SPIN will be completely automatic. This will overcome the error-prone hand-writing of a never claim in the original SPIN since some properties cannot be specified by PLTL formulas. Further, since PPTL with star can subsume chop construct, compositional approach for the specification and verification of concurrent systems with SPIN will be allowed. Therefore, we are motivated to give a systematic proof concerning the expressiveness of PPTL with star formulas. To this end, stutter B¨ uchi automata and extended ω-regular expressions are introduced first. Subsequently, by three transformation procedures beteen PPTL with star, SBA, and ERE, PPTL with star is proved to represent exactly the full regular language. The paper is organized as follows. The syntax and semantics of PPTL with star are briefly introduced in the next section. Section 3 and Section 4 present the definition of stutter B¨ uchi automata and extended regular expressions respectively. Section 5 is devoted to proving the expressiveness of PPTL with star formulas. Precisely, three transformations between PPTL with star, SBA and ERE are given. Finally, conclusions are drawn in Section 6.
2
Propositional Projection Temporal Logic with Star
Our underlying logic is propositional projection temporal logic with star. It extends PPTL to include projection star. It is an extension of propositional interval temporal Logic (PITL). 1
In the first order case, projection star is a derived formula.
Propositional Projection Temporal Logic
49
Syntax: Let Prop be a countable set of atomic propositions. The formula P of PPTL with star is given by the following grammar: P ::= p | P |¬P |P ∨ Q|(P1 , ..., Pm ) prj Q|(P1 , ..., (Pi , ..., Pj ) , ..., Pm ) prj Q where p ∈ Prop, P1 , ..., Pm , P and Q are all well-formed PPTL formulas. (next ), prj (projection) and prj (projection star ) are basic temporal operators. The abbreviations true, f alse, ∧, → and ↔ are defined as usual. In particudef def lar, true = P ∨ ¬P and f alse = P ∧ ¬P . In addition, we have the following derived formulas. def def more = ¬empty empty = ¬ true 0 P
def
n P
= P
def
len(0) = empty def
skip = len(1) def
P ; Q = (P, Q) prj empty 2P
def
P
1
def
P
∗
def
= ¬3¬P = P
def
= (n−1 P ), n ≥ 1
def
len(n) = n empty, n ≥ 1 def P = empty ∨ P 3P
def
P
0
def
P
n
def
= true; P = empty = (P (n) ) prj empty, n ≥ 1
def
= (P ) prj empty P = (P ⊕ ) prj empty (P1 , ..., Pi−1 , (Pi , ..., Pj )(0) , Pj+1 , ..., Pm ) prj Q +
def
= (P1 , ..., Pi−1 , Pj+1 , ..., Pm ) prj Q (P1 , ..., Pi−1 , (Pi , ..., Pj )(1) , Pj+1 , ..., Pm ) prj Q
def
= (P1 , ..., Pi−1 , Pi , ...Pj , Pj+1 , ..., Pm ) prj Q (P1 , ..., Pi−1 , (Pi , ..., Pj )(n) , Pj+1 , ..., Pm ) prj Q
def
= (P1 , ..., Pi−1 , Pi , ...Pj , (Pi , ..., Pj )n−1 , Pj+1 , ..., Pm ) prj Q, n ≥ 1 (P1 , ..., Pi−1 , (Pi , ..., Pj )⊕ , Pj+1 ..., Pm ) prj Q
def
= (P1 , ..., Pi−1 , Pi , ...Pj , (Pi , ..., Pj ) , Pj+1 , ..., Pm ) prj Q
where (weak next), 2 (always), 3 (sometimes), ; (chop), prj ⊕ (projection plus), ∗ (chop star) and + (chop plus) are derived temporal operators; empty denotes an interval with zero length, and more means the current state is not the final one over an interval. Semantics: Following the definition of Kripke’s structure [2], we define a state s over Prop to be a mapping from Prop to B = {true, f alse}, s : P rop −→ B. We will use s[p] to denote the valuation of p at state s. An interval σ is a non-empty sequence of states, which can be finite or infinite. The length, |σ|, of σ is ω if σ is infinite, and the number of states minus 1 if σ is finite. To have a uniform notation for both finite and infinite intervals, we will use extended integers as indices. That is, we consider the set N0 of non-negative integers and ω, Nω = N0 ∪ {ω}, and extend the comparison operators, =, <, ≤, to Nω by considering ω = ω, and for all i ∈ N0 , i < ω. Moreover, we define as ≤ −{(ω, ω)}. To simplify definitions, we will denote σ by < s0 , ..., s|σ| >, where s|σ| is undefined if σ is infinite. With such a notation, σ(i..j) (0 ≤ i j ≤ |σ|) denotes the sub-interval < si , ..., sj >
50
C. Tian and Z. Duan
and σ (k) (0 ≤ k |σ|) denotes < sk , ..., s|σ| >. Further, the concatenation (·) of two intervals σ and σ is defined as follows, σ, if |σ| = ω σ · σ = < s0 , ..., si , si+1 , ... > if σ =< s0 , ..., si >, σ =< si+1 , ... >, i ∈ N0 And the fusion of two intervals σ and σ is also defined as below, σ, if |σ| = ω σ ◦ σ = < s0 , ..., si , ... > if σ =< s0 , ..., si >, σ =< si , ... >, i ∈ N0 Moreover, σ ·ω means infinitely many interval σ are concatenated, while σ ◦ω denotes infinitely many σ are fused together. Let σ =< s0 , s1 , . . . , s|σ| > be an interval and r1 , . . . , rh be integers (h ≥ 1) such that 0 ≤ r1 ≤ r2 ≤ . . . ≤ rh |σ|. The projection of σ onto r1 , . . . , rh is the interval (namely projected interval) σ ↓ (r1 , . . . , rh ) =< st1 , st2 , . . . , stl > where t1 , . . . , tl is obtained from r1 , . . . , rh by deleting all duplicates. That is, t1 , . . . , tl is the longest strictly increasing subsequence of r1 , . . . , rh . For instance, < s0 , s1 , s2 , s3 , s4 >↓ (0, 0, 2, 2, 2, 3) =< s0 , s2 , s3 > We need also to generalize the notation of σ ↓ (r1 , ..., rm ) to allow ri to be ω. For an interval σ =< s0 , s1 , ..., s|σ| > and 0 ≤ r1 ≤ r2 ≤ ... ≤ rh ≤ |σ| (ri ∈ Nω ), we define σ ↓ () = ε, σ ↓ (r1 , ..., rh , ω) = σ ↓ (r1 , ..., rh ). This is convenient to define an interval obtained by taking the endpoints (rendezvous points) of the intervals over which P1 , . . . , Pm are interpreted in the projection construct. An interpretation is a tuple I = (σ, k, j), where σ is an interval, k is an integer, and j an integer or ω such that k j ≤ |σ|. We use the notation (σ, k, j) |= P to denote that formula P is interpreted and satisfied over the subinterval < sk , ..., sj > of σ with the current state being sk . The satisfaction relation (|=) is inductively defined as follows: I − prop I − not I − or I − next I − prj
I − prj
I I I I I
|= p iff sk [p] = true, for any given proposition p |= ¬P iff I |= P |= P ∨ Q iff I |= P or I |= Q |= P iff k < j and (σ, k + 1, j) |= P |= (P1 , ..., Pm ) prj Q, if there exist integers r0 ≤ r1 ≤ ... ≤ rm ≤ j such that (σ, r0 , r1 ) |= P1 , (σ, rl−1 , rl ) |= Pl , 1 < l ≤ m, and (σ , 0, |σ |) |= Q for one of the following σ : (a) rm < j and σ = σ ↓ (r0 , ..., rm ) · σ(rm +1..j) or (b) rm = j and σ = σ ↓ (r0 , ..., rh ) for some 0 ≤ h ≤ m I |= (P1 , ..., (Pi , ..., Pj ) , ..., Pm ) prj Q iff ∃n ∈ N0 , I |= (P1 , ..., (Pi , ..., Pj )n , ..., Pm ) prj Q or there exist integers r0 ≤ r1 ≤ ... ≤ ri ≤ ... ≤ rj ≤ r2i ≤ ... ≤ r2j ≤ r3i ... ≤ rk ω, limk→ω rk = ω, such that (σ, rl−1 , rl ) |= Pl , 0 < l < i, (σ, rl−1 , rl ) |= Pt , l ≥ i, t = i + (l mod (j − i + 1)), and (σ , 0, |σ |) |= Q, where σ = σ ↓ (r0 , ..., rk , ω), l mod 1 = 0.
Satisfaction and Validity: A formula P is satisfied by an interval σ, denoted by σ |= P , if (σ, 0, |σ|) |= P . A formula P is called satisfiable if σ |= P for some σ. A formula P is valid, denoted by |= P , if σ |= P for all σ.
Propositional Projection Temporal Logic
3
51
B¨ uchi Automata with Stutter
Definition 1. A B¨ uchi automaton is a tuple B = (Q, Σ, I, δ, F ), where, – – – – –
Q = {q0 , q1 , ..., qn } is a finite, non-empty set of locations; Σ = {a0 , a1 , ..., am } is a finite, non-empty set of symbols, namely alphabet; I ⊆ Q is a non-empty set of initial locations; δ ⊆ Q × Σ × Q is a transition function; F ⊆ Q is a set of accepting locations.
An infinite word w over Σ is an infinite sequence w = a0 a1 ... of symbols, ai ∈ Σ. A run of B over an infinite word w = a0 a1 ... is an infinite sequence ρ = q0 q1 ... of locations qi ∈ Q such that q0 ∈ I and (qi , ai , qi+1 ) ∈ δ holds for all i ∈ N0 . In this case, we call w the word associated with ρ, and ρ the run associated with w. The run ρ is an accepting run iff there exists some q ∈ F such that qi = q holds uchi automaton for infinitely many i ∈ N0 . The language L(B) accepted by a B¨ B is the set of infinite words for which there exists some accepting run ρ of B. Similar to the approach adopted in SPIN [10] for modeling finite behaviors of a system with a B¨ uchi automaton, the stuttering rule is adopted so that the classic notion of acceptance for finite runs (thus words) would be included as a special case in B¨ uchi automata. To apply the rule, we extend the alphabet Σ with a fixed predefined null-label , representing a no-op operation that is always executable and has no effect. For a B¨ uchi automaton B, the stutter extension of finite run ρ with final state qn is the ω-run ρ such that (qn , , qn )ω is the suffix of ρ. The final state of the run can be thought to repeat null action infinitely. It follows that such a run would satisfy the rules for B¨ uchi acceptance if and only if the original final location qn is in the set F of accepting locations. This means that it indeed generalizes the classical definition of the finite acceptance. In what follows, we denote B¨ uchi automata with the stutter extension, simply as stutter-B¨ uchi automata (SBA for short).
4
Extended Regular Expression
Corresponding to the stutter-B¨ uchi automata, we define a kind of extended ωregular expression (ERE) which is capable of defining both finite and infinite strings. Let Υ = {r1 , ..., rn } be a finite set of symbols, namely alphabet. The extended ω-regular expressions are defined as follows, ERE R ::= ∅ | | r | R + R | R • R | Rω | R∗ where r ∈ Υ , denotes an empty string; +, • and ∗ are union, concatenation and Kleene (star) closure respectively; Rω means infinitely many R are concatenated. In what follows, we use ERE to denote the set of extended regular expressions. Before defining the language expressed by the extended regular expressions, we first introduce strings and operations on strings. A string is a finite or infinite sequence of symbols, a0 a1 ...ai ..., where each ai is chosen from the alphabet Υ . The length of a finite string w, denoted by |w|, is the number of the symbols in
52
C. Tian and Z. Duan
w while the length of an infinite string is ω. For two strings w and w , w • w , w∗ and wω are defined as follows, w, if |w| = ω w • w = a0 ...ai ai+1 ..., if w = a0 ...ai , and w = ai+1 ... ⎧ ⎨ w, if |w| = ω wω = a 0 ...ai ...a0 ...ai ..., if w is finite and w = a0 ...ai
⎩ ω times ⎧ ⎨ w, if |w| = ω ...a0 ...ai , if w is finite and w = a0 ...ai w∗ = ∃n ∈ Nω , a0 ...ai
⎩ n times Further, if W and W are two sets of strings. Then W • W , W ω and W ∗ are defined as follows, W • W = {w • w | w ∈ W and w ∈ W } W ω = {wω | w ∈ W } W ∗ = {w∗ | w ∈ W } Accordingly, the language L(R) expressed by extended regular expression R is given by, Lr1 L(∅) = ∅ Lr3 L( ) = { } Lr5 L(R • R) = L(R) • L(R) Lr7 L(R∗ ) = L(R)∗
Lr2 L(r) = {r} Lr4 L(R+R) = L(R) ∪ L(R) Lr6 L(Rω ) = L(R)ω
For a string w, if w ∈ L(R), w is called a word of expression R. For convenience, we use PPTL* to denote the set of all PPTL with star formulas, SBA the set of all Stutter B¨ uchi Automata, and ERE the set of all Extended ω-Regular Expressions. Further, the language classes determined by PPTL*, SBA and ERE are represented by L(PPTL*), L(SBA) and L(ERE) respectively. That is, L(P P T L∗) = {L(P )|P ∈ P P T L∗} L(SBA) = {L(B)|B ∈ SBA} L(ERE) = {L(R)|R ∈ ERE}
5
Relationship between PPTL*, ERE and SBA
Even though the extended ω-regular expressions, PPTL with star formulas, and stutter-B¨ uchi automata describe languages fundamentally in different ways, however, it turns out that they represent exactly the same class of languages, named the “full regular languages” as concluded in Theorem 1. Theorem 1. PPTL with star formulas, extended regular expressions and stutter B¨ uchi automata have the same expressiveness. 2
Propositional Projection Temporal Logic
53
L(PPTL∗ )
L(SBA)
L(ERE )
Fig. 1. The relationship between three language classes
In order to prove this theorem, we will show the following facts: (1) For any P , there is an SBA B such that L(B) = L(P ); (2) For any SBA B, there is an ERE R such that L(R) = L(B); (3) For any ERE R, there is a PPTL with star formula P such that L(P ) = L(R). The relationship is depicted in Fig.1, where an arc from language class X to Y means that each language in X can also be defined in Y . This convinces us that three language classes are equivalence. For the purpose of transformations between PPTL*, SBA and ERE, we define the set Qp of atomic propositions appearing in PPTL with star formula Q, |Qp | = l , and further need define alphabets Σ and Υ for SBA and ERE. To do so, We first define sets Ai , 1 ≤ i ≤ l, as follows, Ai = {{qj˙ 1 , ..., q˙ji } | qjk ∈ Qp , qj˙k denotes qjk or ¬qjk , 1 ≤ k ≤ i} l l Then, Σ = i=1 Ai ∪ {true} ∪ { }, and Υ = i=1 Ai ∪ {true}. For instance, if Qp = {p1 , p2 , p3 }, it is obtained that, A1 = {{p˙1}, {p˙2 }, {p˙3 }}, A2 = {{p˙1 , p˙2 }, {p˙1 , 3 p˙3 }, {p˙2 , p˙3 }}, A3 = {{p˙1 , p˙2 , p˙3 }}. So, we have,Σ = i=1 Ai ∪ {true} ∪ { } = 3 {{p˙1 }, {p˙2 }, {p˙3 }, {p˙1 , p˙2 }, {p˙1 , p˙3 }, {p˙2 , p˙3 }, {p˙1 , p˙2 , p˙3 }, true, }, Υ = i=1 Ai ∪ {true} = {{p˙1 }, {p˙2 }, {p˙3 }, {p˙1 , p˙2 }, {p˙1 , p˙3 }, {p˙2 , p˙3 }, {p˙1 , p˙2 , p˙3 }, true}. Obviously, for each r ∈ Υ , r is a set of atomic propositions or their negations, denoted by true or {q˙i , ..., q˙j }, where 1 ≤ i ≤ j ≤ l. 5.1
From PPTL with Star Formulas to SBAs
For PPTL with star formulas, their normal forms are the same as for PPTL formulas [6,7]. In [22], an algorithm was given for transforming a PPTL with star formula to its normal form. Further, based on the normal form, labeled normal form graph (LNFG) for PPTL with star formulas are constructed to precisely characterize the models of PPTL with star formulas. Also an algorithm was given to construct the LNFG of a PPTL with star formula [22]. The details about normal forms and LNFGs can be found in [7,22]. Here we focus on how to transform an LNFG G to an SBA B (corresponding SBA of G). For the clear presentation of the transformation, LNFGs are briefly introduced first. The LNFG of formula P can be expressed as a graph, G = (V, E, v0 , Vf ), where V denotes the set of nodes in the LNFG, E is the set of directed edges among V , v0 ∈ V is the initial (or root) node, and Vf ⊆ V denotes the set of nodes with finite label F . V and E in G can inductively be constructed by algorithm LNFG composed in [7]. Actually, in an LNFG, a node v ∈ V denotes a PPTL with star formula and initial node is P itself while an edge from node vi to v j is a tuple (vi , Qe , vj ) where vi and vj are PPTL with star formulas and Qe ≡ i q˙i , qi is an atomic proposition, q˙i denotes qi or ¬qi . The following is an example of LNFG.
54
C. Tian and Z. Duan
true
v0
v1
q
q v4
v0 : ¬(true; ¬ q) ∨ p ∧ q
p v2
true
v1 : q ∧ ¬(true; ¬ q) v2 : q
q v3
v3 : true v4 : ε
true
Fig. 2. LNFG of formula ¬(true; ¬ q) ∨ p ∧ q
Example 1. The LNFG of formula ¬(true; ¬ q) ∨ p ∧ q. As shown in Fig.2, the LNFG of formula ¬(true; ¬ q) ∨ p ∧ q is G={V, E, v0 , Vf }, where V = {v0 , v1 , v2 , v3 , v4 }; E = {(v0 , true, v1 ), (v0 , p, v2 ), (v1 , q, v1 ), (v2 , q, v4 ), (v2 , q, v3 ), (v3 , true, v3 ), (v3 , true, v4 )}; the root node is v0 ; and Vf = ∅. 2 Factually, an LNFG contains all the information of the corresponding SBA. The set of nodes is in fact the set of locations in the corresponding SBA; each edge (vi , Qe , vj ) forms a transition; there exists only one initial location, the root node; the set of accepting locations consists of ε node and the nodes which can appear in infinite paths for infinitely many times. Given an LNFG G = (V, E, v0 , Vf ) of formula P , an SBA of formula P , B = (Q, Σ, I, δ, F ), over an alphabet Σ can be constructed as follows. • Sets of the locations Q and the initial locations I: Q = V , and I = {v0 }. • Transition δ: Let q˙k be an atomic proposition or its negation. We define a m0 function atom( m0k=1 q˙k ) for picking up atomic propositions or their negations appearing in k=1 q˙k as follows, atom(true) = true {qk }, if q˙k ≡ qk 1 ≤ k ≤ l atom(q˙k ) = {¬qk }, otherwise m0 m0 atom( k=1 q˙k ) = atom(q˙1 ) ∪ atom( k=2 q˙k ) For each ei = (vi , Qe , vi+1 ) ∈ E, there exists vi+1 ∈ δ(vi , atom(Qe )). For node ε, δ(ε, ) = {ε}. • Accepting locations F : We have proved in [6,7] that infinite paths in an LNFG precisely characterize the infinite models of the corresponding formula. In fact, there exists an infinite path if and only if there are some nodes appearing in the path for infinitely many times. Therefore, the nodes appearing in infinite paths for infinitely many times are defined as the accepting locations in the SBA. In addition, by employing the stutter extension rule, ε node is also an accepting location. Lemma 1. For any PPTL with star formula P , there is an SBA B such that L(B) = L(P ). 2 Formally, algorithm Lnfg-Sba is useful for transforming an LNFG to an SBA. Also Example 2 is given to show how the algorithm works.
Propositional Projection Temporal Logic q4 init
q1 a1
a0
q0 a2
q2
a3
a1
q3
a0 = {true} a1 = {q} a2 = {p}
a0 a1
55
a0
a3 = {τ }
Fig. 3. Stutter-B¨ uchi automaton of formula P ≡ ¬(true; ¬ q) ∨ p ∧ q
Example 2. Constructing the SBA, B=(Q, Σ, I, δ, F ), from the LNFG in Example 1. As depicted in Fig.3, the set of locations, Q={q0 , q1 , q2 , q3 , q4 }, comes from V directly. The set of initial locations I={q0 } is root node v0 in G. The set of the accepting locations F ={q1 , q3 , q4 } consists of nodes v1 , v3 appearing in loops and ε node in V . The transitions, δ(q0 , a0 )={q1 }, δ(q0 , a2 )={q2 }, δ(q1 , a1 )={q1 }, δ(q2 , a1 )={q3 , q4 }, δ(q3 , a0 )={q3 , q4 }, δ(q4 , a3 )={q4 } are formalized according to the edges in E. 2 Function Lnfg-Sba(G) /* precondition: G = (V, E, v0 , Vf ) is the LNFG of PPTL formula P */ /* postcondition: Lnfg-Sba(G) computes an SBA B = (Q, Σ, I, δ, F ) from G*/ begin function Q = ∅; F = φ; I = ∅; for each node vi ∈ V , add a state qi to Q, Q = Q ∪ {qi }; if vi is ε, F = F ∪ {qi }; δ(qi , ) = {qi }; else if vi appears in loops and vi ∈ Vf , F = F ∪ {qi }; end for if q0 ∈ Q, I = I ∪ {q0 }; for each edge e = (vi , Pe , vj ) ∈ E, qj ∈ δ(qi , atom(Pe )); end for return B = (Q, Σ, I, δ, F ) end function
5.2
From SBAs to EREs
For the proof of the language L(A) of any finite state automaton A being regular [24], Arden’s rule [11] plays an important role. Lemma 2. (Arden’s Rule) For any sets of strings S and T , the equation X= S • X + T has X = S ∗ • T as a solution. Moreover, this solution is unique if ∈ S. 2 From now on we shall often drop the concatenation symbol •, writing SX for S • X etc. In the following, we show how Arden’s rule is used to equivalently transform an SBA to an ERE. Given a stutter-B¨ uchi automaton B with Q = {q0 , ..., qn } and the starting location q0 . For 1 ≤ i ≤ n, let Xi denote the ERE where L(Xi ) equals to the set of strings accepted by the sub-automaton of B starting at location qi ; thus L(B) = L(X0 ). We can write an equation for each Xi in terms of the languages defined by its successor locations. For example, for the stutter-B¨ uchi automaton B in Example 2, we have,
56
C. Tian and Z. Duan
(0) X0 = a0 X1 + a2 X2 (1) X1 = a1 X1 + aω 1 (2) X2 = a1 X4 + a1 X3 (3) X3 = a0 X3 + a0 X4 + aω 0 (4) X4 = a3 X4 + aω 3 ω ω Note that X1 , X3 and X4 contains aω 1 , a0 and a3 respectively because q1 , q3 and q4 are accepting states with self-loops2 . Now we use Arden’s rule to solve the equations. First, for (4), since a3 is , ∗ ω ω X 4 = a3 X 4 + aω 3 = a3 a3 = a3 = Replacing X4 in (3), ω ∗ ∗ ω ∗ ∗ X 3 = a0 X 3 + a0 X 4 + aω 0 = a0 X3 + a0 + a0 = a0 a0 + a0 a0 = a0 a0 = true true Replacing X3 and X4 in (2), X2 = a1 X4 + a1 X3 = {q} + {q}true∗ true For (1), ∗ ω ω ω X 1 = a1 X 1 + aω 1 = a1 a1 = a1 = {q} Finally, replacing X1 and X2 in (0), we have, X0 = a0 X1 + a2 X2 = a0 {q}ω + a2 ({q} + {q}true∗ true) = true{q}ω + {p}{q} + {p}{q}true∗true
Lemma 3. For any SBA B, there is an ERE R such that L(R) = L(B). 5.3
2
From EREs to PPTL with Star Formulas
Let Γ be the set of all models of PPTL with star. For any extended regular expression R ∈ ERE, we can construct a PPTL with star formula FR such that, (1) for any model σ ∈ Γ , if σ |= FR , then Ω(σ) ∈ L(R); and (2) for any word w ∈ L(R), there exists σ ∈ Γ , σ |= FR and Ω(σ) = w. The mapping function Ω : Γ → Υ ∗ from models of PPTL formulas to words of extended regular expression is defined ⎧ as follows, if |σ| = 0 ⎨ , Ω(σ) = A(s0 )...A(sj−1 ) if σ is finite and σ =< s0 , ..., sj >, j ≥ 1 ⎩ A(s0 )...A(sj )... if σ is infinite and σ =< s0 , ..., sj , ... > where A(si ) denotes true, or the set of propositions and their negations holding at state si . It is not difficult to prove that Ω(σ1 ◦ σ2 ) = Ω(σ1 ) • Ω(σ2 ), Ω(σ ◦ω ) = Ω(σ)ω and Ω(σ ◦∗ ) = Ω(σ)∗ . FR is constructed inductively on the structure of R. def F∅ = f alse def
F = empty p˙i ∧ ... ∧ p˙j ∧ skip, if r = {p˙i , ..., p˙j }, 1 ≤ i ≤ j ≤ l def Fr = true ∧ skip, if r = true where r ∈ Υ . Inductively, if R1 and R2 are extended regular expressions, then def FR1 +R2 = FR1 ∨ FR2 def
FR1 •R2 = FR1 ; FR2 FRω FR∗1 2
def
= FR∗ ∧ 2more
def
= FR∗ 1
For finite state automata, Xi contains if qi is accepted.
Propositional Projection Temporal Logic
57
Now we need to prove that, for any R ∈ ERE and σ ∈ Γ , if σ |= FR , then Ω(σ) ∈ L(R); for any w, if w ∈ L(R), then there exists σ ∈ Γ such that Ω(σ) = w and σ |= FR . Lemma 4. For any ERE R, there is a PPTL with star formula P such that L(P ) = L(R). 2 Example 3. Constructing PPTL with star formula from the extended regular expression, true{q}ω + {p}{q} + {p}{q}true∗true obtained in Example 2. Ftrue{q}ω +{p}{q}+{p}{q}true∗ true ≡ Ftrue{q}ω ∨ F{p}{q} ∨ F{p}{q}true∗ true ≡ Ftrue ; F{q}ω ∨ F{p} ; F{q} ∨ F{p} ; F{q} ; Ftrue∗ ; Ftrue ≡ true ∧ skip; F{q}∗ ∧ 2more ∨ p ∧ skip; q ∧ skip ∨ p ∧ skip; q ∧ skip; (true ∧ skip)∗ ; true ∧ skip ≡ skip; (q ∧ skip)∗ ∧ 2more ∨ p ∧ skip; q ∧ skip ∨ p ∧ skip; q ∧ skip; skip∗ ; skip 2
6
Conclusions
Further, it is readily to prove the following useful conclusions concerning characters of fragments of PPTL with star. To avoid abuse of notations, we use an expression like L(next, chop) to refer to the specific fragment of PPTL with star with temporal operators next, chop and the basic connections in the typical propositional logic. 1 L(chop) has the same expressiveness as star-free regular expressions without . 2 L(next, chop) has the same expressiveness as star-free regular expressions. 3 L(next, prj) has the same expressiveness as regular expressions without ω. 4 L(next, chop, chop∗ ) has the same expressiveness as full regular expressions. In this paper, we have proved that the expressiveness of PPTL with star is the same as the full regular expressions. Also, the proof itself provides approaches to translate a PPTL with star formula to an equivalent Buchi automaton, a Buchi automaton to an equivalent extended ω-regular expression, and an extended ω-regular expression to a PPTL with star formula. This enables us to specify and verify concurrent systems by compositional approach with PPTL with star. Further, we have developed a model checker based on SPIN for PPTL with star. Therefore, any systems with regular properties can be automatically verified within SPIN using PPTL with star.
References 1. Pnueli, A.: The temporal logic of programs. In: Proceedings of the 18th IEEE Symposium on Foundations of Computer Science, pp. 46–67.2 IEEE, New York (1977) 2. Kripke, S.A.: Semantical analysis of modal logic I: normal propositional calculi. Z. Math. Logik Grund. Math. 9, 67–96 (1963) 3. Rosner, R., Pnueli, A.: A choppy logic. In: First Annual IEEE Symposium on Logic In Computer Science. LICS, pp. 306–314 (1986)
58
C. Tian and Z. Duan
4. Moszkowski, B.: Reasoning about digital circuits. Ph.D Thesis, Department of Computer Science, Stanford University. TRSTAN-CS-83-970 (1983) 5. Duan, Z.: An Extended Interval Temporal Logic and A Framing Technique for Temporal Logic Programming. PhD thesis, University of Newcastle Upon Tyne (May 1996) 6. Duan, Z., Tian, C.: Decidability of Propositional Projection Temporal Logic with Infinite Models. In: Cai, J.-Y., Cooper, S.B., Zhu, H. (eds.) TAMC 2007. LNCS, vol. 4484, pp. 521–532. Springer, Heidelberg (2007) 7. Duan, Z., Tian, C., Zhang, L.: A Decision Procedure for Propositional Projection Temporal Logic with Infinite Models. Acta Informatica 45, 43–78 (2008) 8. Tian, C., Duan, Z.: Model Checking Propositional Projection Temporal Logic Based on SPIN. In: Butler, M., Hinchey, M.G., Larrondo-Petrie, M.M. (eds.) ICFEM 2007. LNCS, vol. 4789, pp. 246–265. Springer, Heidelberg (2007) 9. Wolper, P.L.: Temporal logic can be more expressive. Information and Control 56, 72–99 (1983) 10. Holzmann, G.J.: The Model Checker Spin. IEEE Trans. on Software Engineering 23(5), 279–295 (1997) 11. Arden, D.: Delayed-logic and finite-state machines. In: Theory of Computing Machine Design, Univ. of Michigan Press, pp. 1–35 (1960) 12. Gabbay, D., Pnueli, A., Shelah, S., Stavi, J.: On the temporal analysis of fairness. In: POPL 1980: Proceedings of the 7th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pp. 163–173. ACM Press, New York (1980) 13. Sistla, A.P.: Theoretical issues in the design and verification of distributed systems. PhD thesis, Harvard University (1983) 14. Vardi, M.Y., Wolper, P.: Yet another process logic. In: Clarke, E., Kozen, D. (eds.) Logic of Programs 1983. LNCS, vol. 164, pp. 501–512. Springer, Heidelberg (1984) 15. Vardi, M.Y.: A temporal fixpoint calculus. In: POPL 1988, pp. 250–259 (1988) 16. McNaughton, R., Papert, S.A.: Counter-Free Automata (M.I.T research monograph no.65). The MIT Press, Cambridge (1971) 17. Barringer, H., Kuiper, R., Pnueli, A.: Now You May Compose Temporal Logic Specifications. In: Proc. 16th STOC, pp. 51–63 (1984) 18. Barringer, H., Kuiper, R., Pnueli, A.: The Compositional Temporal Approach to CSP-like Language. In: Proc.IFIP Conference, The Role of Abstract Models in Information Processing (January 1985) 19. Nguyen, V., Demers, A., Gries, D., Owicki, S.: Logic of Programs 1985. LNCS, vol. 193, pp. 237–254. Springer, Heidelberg (1985) 20. Nguyen, V., Gries, D., Owicki, S.: A Model and Temporal Proof System for Networks of Processes. In: Proc. 12th POPL, pp. 121–131 (1985) 21. Harel, D., Peleg, D.: Process Logic with Regular Formulas. Theoretical Computer Science 38, 307–322 (1985) 22. Tian, C., Duan, Z.: Complexity of Propositional Projection Temporal Logic with Star. Technical Report No.25, Institute of computing Theory and Technology, Xidian University, Xian P.R.China (2007) 23. Arden, D.: Delayed-logic and finite-state machines. In: Theory of Computing Machine Design, pp. 1–35. Univ. of Michigan Press (1960) 24. Milner, R.: Communicating and Mobile System: The -Calculus. Cambridge University Press, Cambridge (1999)
Genome Rearrangement Algorithms for Unsigned Permutations with O(log n) Singletons Xiaowen Lou and Daming Zhu School of Computer Science and Technology, Shandong University
Abstract. Reversal, transposition and transreversal are common events in genome rearrangement. The genome rearrangement sorting problem is to transform one genome into another using the minimum number of rearrangement operations. Hannenhalli and Pevzner discovered that singleton is the major obstacle for unsigned reversal sorting. They also gave a polynomial algorithm for reversal sorting on those unsigned permutations with O(log n) singletons. This paper involves two aspects. (1) We describe one case for which Hannenhalli and Pevzner’s algorithm may fail, and propose a corrected algorithm for unsigned reversal sorting. (2) We propose a (1 )-approximation algorithm for the weighted sorting problem on unsigned permutations with O(log n) singletons. The weighted sorting means: sorting a permutation by weighted reversals, transpositions and transreversals, where reversal is assigned weight 1 and transposition(including transreversal) is assigned weight 2.
1 Introduction The genome rearrangement problem is to find a shortest sequence of evolutionary events (such as reversals, transpositions, and transreversals) that transform one genome into another. Genomes can be represented as signed or unsigned permutations, depending on the data got from experiments. It is well known that this problem is equivalent to that of sorting a permutation by the same set of operations into the identity permutation. Sorting unsigned permutations by reversals is proved to be NP-hard by Caprara [4]. This problem can be approximated to 1.375 [2] as we know. However, sorting signed permutations by reversals can be solved exactly in polynomial time [6]. Hannenhalli and Pevzner discovered that singleton is the major obstacle for sorting unsigned permutations by reversals. They designed a polynomial algorithm for unsigned permutations with O(log n) singletons [7]. However, there is one case for which Hannenhalli and Pevzner’s algorithm may fail. We describe the case and propose a corrected algorithm. Bafna and Pevzner suggested the sorting problem that considers reversals and transpositions simultaneously helps to understand the genome rearrangements related to the evolution of mammalian and viral [3]. Eriksen observed that an algorithm looking for the minimal number of such operations will produce a solution heavily biased towards transposition. Instead, he studied the weighted reversal transposition transreversal sorting Supported by (1) National Nature Science Foundation of China, 60573024. (2) Chinese National 973 Plan, previous special, 2005cca04500. M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 59–69, 2008. c Springer-Verlag Berlin Heidelberg 2008
60
X. Lou and D. Zhu
problem, which assigns 1 to reversal and 2 to transposition or transreversal. The task is to sort a permutation by reversals or transpositions or transreversals with the minimum value of rev2trp, where rev and trp is the number of reversals and transpositions transreversals respectively. Eriksen designed a (1 )-approximation algorithm for the weighted sorting of signed permutations [5]. In this paper, we focus on the same weighted sorting problem for unsigned permutations. We present a (1 )-approximation algorithm for the sorting of unsigned permutations by weighted reversal transposition transreversals, where the permutation is restricted to containing O(log n) singletons.
2 Preliminaries Let [g1 gn ] be a permutation of 1 n. It is also known as an unsigned permutation because each element of has no signs. If each element of has been added a sign of ‘’ or ‘’ to represent gene’s direction, then it is called a signed permutation. Let [g1 gn ] be an unsigned permutation and ¼ [g¼1 g¼n ] be a signed permutation. We say ¼ is a spin of if either g¼i gi or g¼i gi for 1in. A segment of is a sequence of consecutive elements in . For example, segment [gi g j ] (i j) of contains all the elements from gi to g j . We consider three kinds of rearrangement operations: reversal, transposition and transreversal. A reversal r(i j) (i j) cuts the segment [gi1 g j ] out of , reverses it, and then pastes it back to exactly where it is cut; hence the permutation becomes [g1 gi g j gi1 g j1 gn ]. A transposition t(i j k) (i jk) exchanges two consecutive segments [gi1 g j ] and [g j1 gk ] of , and the permutation becomes [g1 gi g j1 gk gi1 g j , gk1 gn ]. A transreversal tr (i j k) (i jk) reverses the segment [gi1 g j ] while exchanging the two consecutive segments [gi1 g j ] and [g j1 gk ], and the permutation becomes [g1 gi g j1 gk g j gi1 gk1 gn ]. Reversing a segment of ¼ changes the direction of each element in the segment. Thus for a signed permutation ¼ , each element in the reversed segment also has its sign flipped. The reversal sorting problem asks to transform into the identity permutation by the minimum number of reversals, where the signed identity permutation is [1 2, n], the unsigned identity permutation is [1 2 n]. And this number is called the reversal distance of , denoted by dr ( ). A reversal scenario is a sequence of reversal operations that transform into . The weighted sorting problem asks to transform into by the minimum weighted sum number of reversals, transpositions, and transreversals, where one reversal counts for once and one transposition or transreversal counts for twice. A weighted scenario is a sequence of such rearrangement operations that transform into . Formally, the weighted distance of is defined as: drt ( )min ¾ rev() 2trp(), where is the set of weighted scenarios sorting , rev() and trp() is the number of reversals and transposition transreversals in a certain weighted scenario respectively. Among all the 2n spins of an unsigned permutation , if there is a spin ¼ satisfying dr ( ¼ )dr ( ), call ¼ an optimal r-spin; if there is a spin ¼ satisfying drt ( ¼ )drt ( ), call ¼ an optimal rt-spin. Write i j if i j1. For an unsigned permutation , a pair of consecutive elements (gi gi1 ) form an adjacency if gi gi1 ; otherwise a breakpoint. A segment [gi g j ]
Genome Rearrangement Algorithms
61
of is called a strip if each (gk gk1 ) is an adjacency for ik j, but both (gi 1 gi ) and (g j g j1 ) are breakpoints. A strip of one element is a singleton; a strip of two elements is called a 2-strip; and otherwise a long strip. If has no singletons, call it a singleton-free permutation. A strip s[gi g j ] is increasing (decreasing) if gi g j (gi g j ). Let ¼ be a spin of , an increasing (decreasing) strip s of is canonical in ¼ if all elements of s are positive (negative) in ¼ . An increasing (decreasing) strip s of is anti-canonical in ¼ if all elements of s are negative (positive) in ¼ . An anti-canonical 2-strip is called an anti-strip. If every strip in ¼ is canonical, then call ¼ a canonical spin of . Two spins 1¼ and 2¼ are twins(with respect to segment s[gi g j ]) if they dier only in the signs of elements in s. For a signed permutation ¼ , there exists a polynomial algorithm computing the reversal distance. Here are some details. First, transform a signed permutation ¼ [g¼1 g¼n ] of order n to a permutation of order 2(n1) as follows: replace the positive element x by (2x1 2x), and the negative element x by (2x 2x1). Then add 0 at the beginning and 2n1 at the end. Such a permutation is called the extended permutation of ¼ , denoted as ¼¼ . The reversal operation on ¼¼ never touches 0 and 2n1, and never breaks the pair (2x1 2x) or (2x 2x1). Therefore, it has the same eect of operating on ¼ . The breakpoint graph G( ¼¼ ) of ¼¼ is defined as follows: set a vertex of G( ¼¼ ) for each element of ¼¼ ; draw a gray edge between vertices 2x and 2x1 if their positions are not consecutive in ¼¼ ; draw a black edge between vertices g2i and g2i1 if they form a breakpoint. Note that every vertex has degree 2 in G( ¼¼ ), so it can be uniquely decomposed into alternating cycles, i.e., cycle with every two consecutive edges having distinct colors. We use l-cycle to denote an alternating cycle with l black edges. Note that the length of a cycle is at least 2. A cycle is oriented if, when we travel along it, at least one black edge is traveled from left to right and at least one black edge is traveled from right to left; otherwise, an unoriented cycle. Two gray edges e1 (gi1 gi2 ) and e2 (g j1 g j2 ) are crossing if i1 j1 i2 j2 or j1 i1 j2 i2 . Cycles C1 and C2 are crossing if there exist crossing gray edges e1 C1 and e2 C2 . If we take each alternating cycle of G( ¼¼ ) as a vertex, and draw an edge between two vertices if the cycles they present are crossing, we get another graph G . Those alternating cycles of G( ¼¼ ) denoted by vertices belonging to the same connected component of G together are referred to as a component of G( ¼¼ ). For every component of G( ¼¼ ), if it contains at least one oriented cycle, call it an oriented component; otherwise, an unoriented component. Imagining we bend a permutation into a circle of counterclockwise order and make (0, 2n1) an adjacency. If there is an interval on the circle which contains one and only one unoriented component, then this component is called a hurdle. If an unoriented component is not a hurdle, call it a non-hurdle. Note that a non-hurdle always stretches over an interval that contains a hurdle. A hurdle is called a super hurdle if removing it will turn a non-hurdle into a hurdle [5][6]. If G( ¼¼ ) has an odd number of hurdles and all these hurdles are super hurdles , then the permutation ¼ is referred to as a fortress. Let b( ¼ ), c( ¼ ), h( ¼ ) denote the number of breakpoints, cycles and hurdles in G( ¼¼ ), respectively. If ¼ is a fortress, let f ( ¼ )1; otherwise, f ( ¼ )0. It is proved in [6] that: dr ( ¼ ) b( ¼ ) c( ¼ ) h( ¼ ) f ( ¼ )
(1)
62
X. Lou and D. Zhu
K1
K2
K3
K4
K5
K6
Fig. 1. A breakpoint graph with six unoriented components
Fig. 1 gives an example of a breakpoint graph with six components K1 K6 . All these components are unoriented components, where K3 , K4 and K6 are simple hurdles, K1 is a super hurdle, K2 and K5 are non-hurdles. For a signed permutation ¼ , there exists a (1 )-approximation algorithm to sort ¼ by weighted reversals, transpositions, and transreversals. First construct G( ¼¼ ) from ¼ as aforementioned. For a component K of the breakpoint graph, we use b(K) and c(K) to denote the number of breakpoints and cycles in K respectively. If K is oriented, we can use b(K)c(K) times of reversals to transform K into b(K) disjoint adjacencies, which is just an optimal weighted scenario sorting K. However, for an unoriented component, we cannot always have a method to sort it optimally by the weighted operations. Thus if drt (K) b(K)c(K), call K a strongly unoriented component (SUC). If an unoriented component is not a SUC, call it a non-SUC. Again imagine bending a permutation into a circle, if there is an interval on the circle which contains one and only one SUC, then the SUC is a strong hurdle. A strong hurdle is a super-strong-hurdle if removing it will make another SUC become a strong hurdle. If a strong hurdle is not a super-stronghurdle, call it a simple-strong-hurdle. If G( ¼¼ ) has an odd number of strong hurdles and all these strong hurdles are super-strong-hurdles, the permutation ¼ is called a strong fortress. Let b( ¼ ), c( ¼ ) and ht ( ¼ ) denote the number of breakpoints, cycles and strong hurdles in G( ¼¼ ), respectively. If ¼ is a strong fortress, let ft ( ¼ )1; otherwise ft ( ¼ )0. It is proved in [5] that: drt ( ¼ ) b( ¼ ) c( ¼ ) ht ( ¼ ) ft ( ¼ )
(2)
Since ht ( ¼ ) and ft ( ¼ ) cannot be computed exactly, only an approximation algorithm is available [5]. As an example, in Fig. 1, K1 , K2 , K4 and K5 are all SUC’s, K3 and K6 are non-SUC’s, since each of them can be removed by one transposition, thus d(K3 )2b(K3 )c(K3 ). And both K1 and K4 are super-strong-hurdles.
3 Corrected Algorithm for Unsigned Reversal Sorting Using formula (1), Hannenhalli and Pevzner proved: Lemma 1. [7] For any unsigned permutation , there exists an optimal r-spin ¼ of such that all the long strips of are canonical in ¼ ; every 2-strip of is either canonical or anti-canonical in ¼ .
Genome Rearrangement Algorithms
63
Lemma 2. [7] For any unsigned permutation , there exists an optimal r-spin ¼ of such that (I) an unoriented component in ¼ does not contain any anti-strip, and (II) an oriented component in ¼ contains at most one anti-strip. Definition 1. super spin [7]: Suppose ¼ is a canonical spin of , for every unoriented component K of G( ¼¼ ), if K contains 2-strips, arbitrarily select any one of them and transform it into an anti-strip, all the other 2-strips remain canonical. Such a spin of is called a super spin. Hannenhalli and Pevzner proved that every super spin of a singleton-free permutation is an optimal r-spin. However, such a super spin is not always optimal. Here is an example: Suppose [5, 6, 3, 4, 1, 2], the canonical spin of is 1¼ [5, 6, 3, 4, 1, 2] and consists of one unoriented component. According to their algorithm, suppose we select 2-strip [5,6] to be turned into anti-strip and get a super spin ¼ ¼ ¼ 2 [5 6 3 4 1 2]. From formula (1), dr ( 1 )4213, dr ( 2 )5104 ¼ ¼ dr ( 1 ); thus 2 can not be an optimal r-spin. Moreover, if we select the canonical 2-strip [3,4] or [1,2] to be turned into an anti-strip, the resulted super spin cannot be optimal either. The breakpoint graphs of G( 1¼¼ ) and G( 2¼¼ ) are shown in Fig. 2.
0 9 10 11 12 5 6 7 8
1 2 3 4 13
0 10 9 12 11 5 6 7 8 1 2 3 4 13
Fig. 2. Breakpoint graph G(
¼¼
1)
and G(
¼¼
2)
In fact, from lemma 1 and 2, to get an optimal r-spin from the canonical spin, it only needs to decide which canonical 2-strips of the canonical spin have to be turned into anti-strips. If one choose a canonical 2-strip which belongs to two cycles of the same unoriented component to be turned into anti-strip, the reversal distance will be increased. And this case is not considered in [7]. The following will consider rectifying Hannenhalli and Pevzner’s algorithm. Without lose of generality, let s[gi gi1 ] (gi gi1 ) be a 2-strip of an unsigned permutation , its canonical form in a canonical spin ¼ is s[gi gi1 ], which is transformed to [2gi 1 2gi 2gi1 1 2gi1] in ¼ ’s extended permutation ¼¼ . If 2gi 1 and 2gi1 belong to the same cycle of G( ¼¼ ), call s a satisfied 2-strip. Using this definition, we correct the definition of super spin as follows: Definition 2. super-r-spin: Suppose ¼ is a canonical spin of , for every unoriented component K of G( ¼¼ ), if K contains satisfied 2-strips, arbitrarily select only one of them and transform it into an anti-strip; otherwise, all the strips of K remain canonical. We call such a spin a super-r-spin. The dierence between super spin and super-r-spin is that only those satisfied 2-strips can be turned into anti-strips. In order to prove the super-r-spin’s optimality, we present some simple lemmas firstly. In what follows, we use f lip(s) to denote flipping the sign of every element in s.
64
X. Lou and D. Zhu
Lemma 3. Let s be a canonical 2-strip, flip(s) will increase the number of breakpoints by 1. Proof. Let s[gi gi1 ] (0gi gi1 ), its extended form is [2gi 1 2gi 2gi1 1 2gi1 ]. We use b1 (x 2gi 1) and b2 (2gi1 y) to denote the two black edges incident to s. The anti-strip s¼ of s is [gi gi1 ], its extended form is [2gi 2gi 1 2gi1 2gi1 1], where (2gi 1 2gi1) forms a new breakpoint, (x 2gi) and (2gi1 1 y) replace the old black edges incident to s. So the overall number of breakpoints will increase by 1. Lemma 4. Let s be a canonical 2-strip of an unoriented component K, flip(s) will turn K into one oriented component K ¼ . Proof. Let s[2gi 1 2gi 2gi1 1 2gi1] in the extended form and b1 and b2 (n1 and n2 ) denote the two black edges (gray edges) incident to 2gi 1 and 2gi1 respectively. From lemma 3, f lip(s) always adds a new black edge m1 (2gi 1 2gi1) and a new gray edge m2 (2gi 2gi11) to the breakpoint graph. After f lip(s), black edge m1 connects n1 with n2 , and gray edge m2 connects the altered b1 with b2 . Since K is unoriented, we can suppose that all the black edges of K are traveled from left to right, see Fig. 3. For convenience, we add an arrow to each edge, the head and tail of the arrow are called the head and tail of the edge. If we retain the direction of b1 , b2 , n1 , n2 , then in K ¼ , no matter s is a satisfied 2-strip or not, the newly created black edge m1 joins n2 ’s head with n1 ’s tail, and the new gray edge m2 joins b1 ’s head with b2 ’s tail. To see it clearly, we stretch out the alternating cycle into a circle, Fig. 4(a) shows m2 b1
n2
n1
flip(s)
b2
x 2gi–1 2gi 2gi+1–1 2gi+1 y
x
b2
n1 m1 n2
b1
2gi 2gi–1 2gi+1 2gi+1–1
y
Fig. 3. Transform a 2-strip into an anti-strip, b1 has the same direction with b2
b1
(a)
n1
m1
n2
(b)
n1
n2
n1
m1 m2
n2
(c)
n1
(bc)
n1
m1
m2
b2
b2 m1
(cc) n2
b2
b1
Fig. 4. Stretched circle
n2
m2
b1
m2
b1
b1
m2
b2
b2
b2
b1
(ac)
m1
n1
m1 n2 m2
Genome Rearrangement Algorithms
65
the case when s is a satisfied 2-strip, and Fig. 4(b) shows the case when s belongs to two cycles. Note that the newly created edges m1 and m2 can not form a 1-cycle, and all the cycles in K ¼ must be alternating cycles. So the configuration of stretched cycles in K ¼ must be the one shown in Fig. 4(a¼ ) and Fig. 4(b¼ ) correspondingly. It is clear that in each case, m1 is directed from n2 ’s head to n1 ’s tail and the directions of all the other edges remain the same. What’s more, in K ¼ , m2 crosses with both n1 and n2 , which makes K ¼ remain one component. From Fig. 3, we know that in both cases, K ¼ is an oriented component. Lemma 5. Let s be a canonical 2-strip of , the two black edges incident to it be b1 and b2 . If s belongs to two cycles, f lip(s) will decrease c( ) by 1; if s is a satisfied 2-strip, f lip(s) will increase c( ) by 1 if b1 has the same direction with b2 , or retain c( ) if b1 has the opposite direction with b2 . Proof. Let s[2gi 1 2gi 2gi1 1 2gi1], its two incident black edges be b1 (x 2gi 1) and b2(2gi1 y). If s belongs to two cycles, i.e., gray edge n1 joins black edge b1 through one more edges and gray edge n2 joins b2 through one more edges. After f lip(s), although b1 and b2 are altered, n1 still joins x through one more edges and n2 joins y through one more edges. On the other hand, in K ¼ , n1 and n2 are joined by the newly created black edge m1 (2gi 1 2gi1 ), the altered b1 and b2 are joined by the newly created gray edge m2 (2gi 2gi1 1), so the two cycles are merged into one. Fig. 4(b) and (b¼ ) shows the stretched cycle transformation when b1 has the same direction with b2 .
b1 x
n1
n2
m2
flip(s) b2
2gi–1 2gi 2gi+1–1 2gi+1 y
b1 x
n1
m1 n2
b2
2gi 2gi–1 2gi+1 2gi+1–1 y
Fig. 5. Transform a 2-strip into an anti-strip, b1 has the opposite direction with b2
If s belongs to the same cycle, the change of c( ) depends on the relative direction of b1 and b2 . Case 1: b1 and b2 have the same direction, as shown in lemma 4 and Fig. 4(a¼ ), in K ¼ , the new black edge m1 is directed from n2 ’s head to n1 ’s tail, and the new gray edge m2 is directed from b1 ’s head to b2 ’s tail, each in a closed cycle, thus increasing c( ) by 1. Case 2: b1 and b2 have the opposite direction, see Fig. 5. If we retain the direction of b1 , b2 , n1 , n2 , then in K ¼ , m1 joins two tails of gray edge n1 and n2 , m2 joins two heads of black edge b1 and b2 . So the direction of some edges must change in K ¼ . The corresponding stretched cycle is shown in Fig. 4(c). Since the newly created edges m1 and m2 can not form a 1-cycle, the cycle configuration of K ¼ must be the one shown in Fig. 4(c¼ ). It is clear that c( ) remains the same. Theorem 1. For a singleton-free unsigned permutation , every super-r-spin an optimal r-spin of .
¼
of
is
Proof. Let ¼1 be an optimal r-spin of containing the minimum number of anti-strips among all the optimal r-spins satisfying the conditions (I) and (II) of lemma 2. Let s
66
X. Lou and D. Zhu
be an anti-strip of ¼1 , and let 2¼ be the twin of ¼1 with canonical 2-strip s, i.e., ¼1 ¼ ¼ 2 f lip(s). Then 2 also satisfies the conditions (I) and (II) of lemma 2. From lemma ¼ ¼ 3, b( 1)b( 2 )1. Note that a 2-strip s can belong to two cycles of 2¼ , or belong to the same cycle of 2¼ . The following will discuss them respectively. Case 1: If s belongs to two cycles of 2¼ . From lemma 5, c( 1¼ )c( 2¼ )1. Since f lip(s) can aect at most two hurdles of ¼2 , and when f lip(s) removes two hurdles, they must both be simple hurdles. From the definition of fortress, we know (h f )( 1¼ ) (h f )( ¼2 )2. 1 From formula (1), dr ( 2¼ )dr ( 1¼ ). This implies that 2¼ is an optimal r-spin of satisfying conditions (I) and (II) of lemma 2, and that ¼2 contains fewer anti-strips than 1¼ , a contradiction. Case 2: If s belongs to the same cycle of 2¼ . From lemma 5, the change of cycle number due to f lip(s) on 2¼ depends on the relative direction of black edges b1 and b2 which are incident to s. Case 2.1 If b1 and b2 have the opposite direction, then c( 1¼ )c( 2¼ ). Since s is in an oriented component of ¼2 , (h f )( 1¼ )(h f )( 2¼ ). From formula (1), dr ( 1¼ ) dr ( 2¼ ), a contradiction. Case 2.2 If b1 and b2 have the same direction, then c( ¼1 )c( 2¼ )1. If b1 and b2 are in an oriented component of G( 2¼¼ ), then (h f )( 1¼ )(h f )( 2¼ ). From formula (1), dr ( 1¼ )dr ( 2¼ ). This implies that 2¼ is an optimal r-spin of satisfying conditions (I) and (II) of lemma 2, and that ¼2 contains fewer anti-strips than 1¼ , a contradiction. So b1 and b2 must belong to one cycle of an unoriented component K of G( 2¼¼ ), i.e., s is a satisfied 2-strip of K. In this case, f lip(s) will never increase (h f )( 2¼ ), and can at most decrease (h f )( 2¼ ) by 1. If (h f )( 1¼ )(h f )( 2¼ ), then dr ( 1¼ )dr ( 2¼ ), a contradiction. Therefore (h f )( 1¼ )(h f )( 2¼ )1 and dr ( 1¼ )dr ( 2¼ )1. The above analysis shows that if ¼1 is an optimal r-spin of containing the minimum number of anti-strips among all the optimal r-spins satisfying the conditions (I) and (II) of lemma 2, then its twin 2¼ with canonical s must have s in one cycle of an unoriented component of G( ¼¼2 ) and (h f )( ¼1 )(h f )( 2¼ )1 holds. Suppose that 1¼ contains t anti-strips s¼1 s¼2 s¼t . Applying f lip(s) to all of them will turn them into canonical 2-strips s1 s2 st , thus turning 1¼ into a canonical spin, denoted as Æ. Note that dr (Æ)dr ( 1¼ )t must hold, and each si (1it) must be a satisfied 2-strip of Æ. Let K1 K2 Kt be the t unoriented components of Æ containing satisfied 2-strips s1 s2 st . Let ¼3 be a spin of obtained from Æ by arbitrarily choosing a satisfied 2-strip of Ki (for 1it), and transforming it into an anti-strip. From lemma 3, b( 3¼ )b(Æ)tb( 1¼ ), from lemma 5, c( ¼3 )c(Æ)tc( 1¼ ). Moreover, (h f )( ¼1 )(h f )( 3¼ ). This is because for an unoriented component Ki , if it contains more than one satisfied 2-strips, turning any one of them into an anti-strip has the same eect of turning Ki into an oriented component, thus having the same eect on h f . Therefore, dr ( 3¼ )dr ( 1¼ ). If 3¼ does not have additional unoriented components containing satisfied 2-strips, ¼ ¼ 3 is a super-r-spin of . Otherwise, suppose 3 has unoriented components Kt1 ,..., K x , each of them contains satisfied 2-strips. Then for every K j (t1 j x), arbitrarily select a satisfied 2-strip s j K j , performing f lip(s j ) will transform K j into an oriented component without increasing the value of (bc)( ¼3) and (h f )( ¼3 ), which means 1
(h f )( ¼1 ) is a simplified notation of h( 1¼ ) f ( 1¼ ).
Genome Rearrangement Algorithms
67
dr ( 3¼ f lip(s j )) dr ( 3¼ ). Let 4¼ ¼3 f lip(st1 ) f lip(s x ). Then 4¼ is a super-r-spin of , and dr ( 4¼ ) dr ( 3¼ ) dr ( 1¼ ), which implies that ¼4 is an optimal r-spin. This completes the proof of the theorem. The corrected algorithm is given as Reversal Sorting( ). For an unsigned permutation with k singletons, there are 2k possible canonical spins. Reversal Sorting( ) first gets the super-r-spin of each canonical spin, then computes the reversal distance of each superr-spin, and finally takes the super-r-spin which has the minimum distance as the optimal r-spin. To decide whether a 2-strip is satisfied takesO(n) time. Step 10 can be completed in O(n) time [1], step 16 can be completed in O(n nlogn) time [8]. Thus the time com plexity of Reversal Sorting( ) can reach O(2k nn nlogn). If the number of singletons, k, is O(logn), the computation can be completed in polynomial time.
Algorithm Reversal Sorting( ) n; 1 dr ( ) 2 for every canonical spin ¼ of 3 construct G( ¼¼ ); 4 for every unoriented component Ki of G( ¼¼ ) 5 if Ki has satisfied 2-strips 6 arbitrarily select one of them, denoted as s; ¼ ¼ f lip(s); 7 endif 8 endfor 9 10 compute dr ( ¼ ) according to formula (1); 11 if dr ( ¼ )dr ( ) 12 dr ( ) dr ( ¼ ); £ ¼ ; 13 endif 14 endfor 15 16 sort £ ; 17 sorting of £ mimics optimal sorting of by dr ( ) reversals.
4 Approximation Algorithm for Unsigned Weighted Sorting 4.1 Algorithm for Singleton-Free Permutations In this section, we consider sorting a singleton-free unsigned permutation by weighted reversals, transpositions and transreversals. The weighted distance is defined as drt ( ) min ¾ rev() 2trp(). Eriksen studied the signed version of this problem, and gave a (1 )-approximation algorithm [5]. For a signed permutation ¼ , a component K of G( ¼¼ ) is either an oriented component or an unoriented component, and an unoriented component is either a SUC or a non-SUC. If K is an oriented component, one can eliminate it using b(K)c(K) reversals. If K is a non-SUC, one can eliminate it using ((b(K)c(K)) 2 transpositions, which contributes b(K)c(K) to the weighted distance, while it costs more than b(K)c(K) to eliminate a non-SUC by reversals. If K is a SUC, it takes more than
68
X. Lou and D. Zhu
((b(K)c(K)) 2 transpositions to eliminate, while just b(K)c(K)1 reversals will do the job. However, it is diÆcult to distinguish between a SUC and a non-SUC, only approximation algorithm is available. The reason we do not mention transreversals is that in both signed case and unsigned case, each transreversal can be replaced by two reversals, without aecting the objective function [5]. So we only consider reversals and transpositions. Details about formula (2) see [5]. For an arbitrary unsigned permutation of order n, let be the set of all 2n spins of . Using the same idea as reversal sorting, we can get the following lemmas and theorem. The proofs are similar to the those proved for reversal sorting and are given in the full version of this paper. Lemma 6. For any unsigned permutation , drt ( )min ¾ drt ( ¼ ). ¼
Lemma 7. Let 1¼ be a spin of with canonical 3-element strip s[gi gi1 gi2 ], and let 2¼ be a twin of 1¼ with respect to s. Then drt ( 1¼ )drt ( 2¼ ) Lemma 8. For any unsigned permutation , there exists an optimal rt-spin that all the long strips of are canonical in ¼ .
¼
of such
Lemma 9. For any unsigned permutation , there exists an optimal rt-spin that every 2-strip of is either canonical or anti-canonical in ¼ .
¼
of such
Lemma 10. For any unsigned permutation , there exists an optimal rt-spin ¼ of such that (I) an unoriented component in ¼ does not contain any anti-strip, and (II) an oriented component in ¼ contains at most one anti-strip. ¼
Theorem 2. For a singleton-free unsigned permutation , every super-r-spin an optimal rt-spin of .
of
is
According to theorem 2, by running Eriksen’s algorithm on the super-r-spin of , we can also get a (1 )-approximation solution for sorting by weighted reversal transposition transreversals. The algorithm is given as Weighted Sorting( ). Algorithm Weighted Sorting( ) 1 construct the canonical spin ¼ of and G( ¼¼ ); 2 for every unoriented component Ki of G( ¼¼ ) 3 if Ki has satisfied 2-strips 4 arbitrarily select one of them and turn it into an anti-strip; endfor 5 6 run the (1 )-approximation algorithm (Eriksen [5]) on the resulting
¼
.
4.2 Algorithm for Permutations with O(log n) Singletons We show that algorithm Weighted Sorting can be applied to permutations with O(log n) singletons simply by enumerating of all the singleton’s signs and guarantee the (1 )approximation ratio. Theorem 3. For any unsigned permutation with O(log n) singletons, a (1 )-approximation solution for sorting by weighted reversal transposition transreversals can be computed in polynomial time. »
»
Genome Rearrangement Algorithms
69
Proof. First we can enumerate all the singleton’s signs to get the set of all the canonical spins of . Let P 1¼ ¼2 ¼m denote the set of canonical spins by the enumeration, Æi denote the super-r-spin of i¼ , and Art (Æi ) denote the distance of Æi computed by running Eriksen’s algorithm on Æi . By running Weighted Sorting on every canonical spin in P and taking the minimum value, we can get the approximation distance for sorting , i.e., Art ( )min Art (Æ1 ) Art (Æ2 ) Art (Æm )Art (Æ x )(1xm). On the other hand, from theorem 2, the optimal weighted distance for sorting is drt ( ) min drt (Æ1 ) drt (Æ2 ) drt (Æm ). Let drt ( )drt (Æy )(1ym). Since Adrrtt((ÆÆii)) 1 for 1i m, we have: Art ( ) drt ( )
min Art (Æ1 ) Art (Æ2 ) Art (Æm ) min drt (Æ1 ) drt (Æ2 ) drt (Æm )
Art (Æ x ) drt (Æy )
Art (Æy ) drt (Æy )
1
When there are O(log n) singletons in the permutation, the size of set P is O(nk ), where k is a fixed constant. Similarly to section 3, we can get a polynomial time algorithm on such permutations and guarantee the (1 )-approximation ratio. Acknowledgments. We would like to thank the reviewers for their helpful suggestions.
References 1. Bader, D.A., Moret, B.M.E., Yan, M.: A linear-time algorithm for computing inversion distance between signed permutations with an experimental study. J. Comput. Biol. 8, 483–491 (2001) 2. Berman, P., Hannenhalli, S., Karpinki, M.: 1.375-approximation algorithm for sorting by reversals. In: M¨ohring, R.H., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 200–210. Springer, Heidelberg (2002) 3. Bafna, V., Pevzner, P.A.: Sorting by transpositions. SIAM J. Discrete Math. 11, 272–289 (1998) 4. Caprara, A.: Sorting by reversals is diÆcult. In: Proc. 1st Annu. Int. Conf. Res. Comput. Mol. Biol. (RECOMB 1997), pp. 75–83 (1997) 5. Eriksen, N.: (1 )-approximation of sorting by reversals and transpositions. Theor. Comput. Sci. 289, 517–529 (2002) 6. Hannenhalli, S., Pevzner, P.A.: Transforming cabbage into turnip (polynomial algorithm for sorting signed permutations by reversals). In: Proc. 27th Annu. ACM Symp. Theory Computing (STOC 1995), pp. 178–189 (1995) 7. Hannenhalli, S., Pevzner, P.A.: To cut or not to cut (applications of comparative physical maps in molecular evolution). In: Proc. 7th Annu. ACM-SIAM Symp. Discrete Algorithms (SODA 1996), pp. 304–313 (1996) 8. Tannier, E., Sagot, M.-F.: Sorting by Reversals in Subquadratic Time. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 1–13. Springer, Heidelberg (2004)
On the Complexity of the Hidden Subgroup Problem Stephen Fenner1, and Yong Zhang2, 1
Department of Computer Science & Engineering, University of South Carolina Columbia, SC 29208 USA
[email protected] 2 Department of Mathematical Sciences, Eastern Mennonite University Harrisonburg, VA 22802, USA
[email protected]
Abstract. We show that several problems that figure prominently in quantum computing, including Hidden Coset, Hidden Shift, and Orbit Coset, are equivalent or reducible to Hidden Subgroup. We also show that, over permutation groups, the decision version and search version of Hidden Subgroup are polynomial-time equivalent. For Hidden Subgroup over dihedral groups, such an equivalence can be obtained if the order of the group is smooth. Finally, we give nonadaptive program checkers for Hidden Subgroup and its decision version.
1
Introduction
The Hidden Subgroup problem generalizes many interesting problems that have efficient quantum algorithms but whose known classical algorithms are inefficient. While we can solve Hidden Subgroup over abelian groups satisfactorily on quantum computers, the nonabelian case is more challenging. Although there are many families of groups besides abelian ones for which Hidden Subgroup is known to be solvable in quantum polynomial time, the overall successes are considered limited. People are particularly interested in solving Hidden Subgroup over two families of nonabelian groups, permutation groups and dihedral groups, since solving them will immediately give solutions to Graph Isomorphism [Joz00] and Shortest Lattice Vector [Reg04], respectively. To explore more fully the power of quantum computers, researchers have also introduced and studied several related problems. Van Dam, Hallgren, and Ip [vDHI03] introduced the Hidden Shift problem and gave efficient quantum algorithms for some instances. Their results provide evidence that quantum computers can help to recover shift structure as well as subgroup structure. They also introduced the Hidden Coset problem to generalize Hidden Shift and Hidden Subgroup. Recently, Childs and van Dam [CvD07] introduced the Generalized Hidden Shift problem, extending Hidden Shift from a
Partially supported by NSF grant CCF-0515269. Partially supported by an EMU Summer Research Grant 2006.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 70–81, 2008. c Springer-Verlag Berlin Heidelberg 2008
On the Complexity of the Hidden Subgroup Problem
71
different angle. In an attempt to attack Hidden Subgroup using a divide-andconquer approach over subgroup chains, Friedl et al. [FIM+ 03] introduced the Orbit Coset problem, which they claimed to be an even more general problem including Hidden Subgroup and Hidden Shift1 as special instances. They called Orbit Coset a quantum generalization of Hidden Subgroup and Hidden Shift, since the definition of Orbit Coset involves quantum functions. In Section 3, we show that all these related problems are equivalent or reducible to Hidden Subgroup with different underlying groups. In particular, 1. Hidden Coset is polynomial-time equivalent to Hidden Subgroup, 2. Orbit Coset is equivalent to Hidden Subgroup if we allow functions in the latter to be quantum functions, and 3. Hidden Shift and Generalized Hidden Shift reduce to instances of Hidden Subgroup over a family of wreath product groups.2 There are a few results in the literature about the complexity of Hidden Subgroup. Hidden Subgroup over abelian groups is in class BQP [Kit95, Mos99]. Ettinger, Hoyer, and Knill [EHK04] showed that Hidden Subgroup (over arbitrary finite groups) has polynomial quantum query complexity. Arvind and Kurur [AK02] showed that Hidden Subgroup over permutation groups is in the class FPSPP and is thus low for the counting complexity class PP. In Section 4 we study the relationship between the decision and search versions of Hidden Subgroup, denoted as Hidden SubgroupD and Hidden SubgroupS , respectively. It is well known that NP-complete sets such as SAT are self-reducible, which implies that the decision and search versions of NP-complete problems are polynomial-time equivalent. We show this is also the case for Hidden Subgroup and Hidden Shift over permutation groups. There are evidences in the literature showing that Hidden SubgroupD over permutation groups is difficult [HRTS03, GSVV04, MRS05, KS05, HMR+ 06, MRS07]. In particular, Kempe and Shalev [KS05] showed that under general conditions, various forms of the Quantum Fourier Sampling method are of no help (over classical exhaustive search) in solving Hidden SubgroupD over permutation groups. Our results yield evidence of a different sort that this problem is difficult—namely, it is just as hard as the search version. For Hidden Subgroup over dihedral groups, our results are more modest. We show the search-decision equivalence for dihedral groups of smooth order, i.e., where the largest prime dividing the order of the group is small. Combining our results in Sections 3 and 4, we obtain nonadaptive program checkers for Hidden Subgroup and Hidden SubgroupD over permutation groups. We give the details in Section 5. 1 2
They actually called it the Hidden Translation problem. Friedl et al. [FIM+ 03] gave a reduction from Hidden Shift to instances of Hidden Subgroup over semi-direct product groups. However, their reduction only works when the group G is abelian.
72
S. Fenner and Y. Zhang
2
Preliminaries
2.1
Group Theory
Background on general group theory and quantum computation can be found in textbooks [Sco87] and [NC00]. A special case of the wreath product groups plays an important role in several proof. Definition 1. For any finite group G, the wreath product G Zn of G and Zn = {0, 1, . . . , n − 1} is the set {(g1 , g2 , . . . , gn , τ ) | g1 , g2 , . . . , gn ∈ G, τ ∈ Zn } equipped with the group operation ◦ such that (g1 , g2 , . . . , gn , τ ) ◦ (g1 , g2 , . . . , gn , τ ) = (gτ (1) g1 , gτ (2) g2 , . . . , gτ (n) gn , τ τ ). We abuse notation here by identifying τ and τ with permutations over the set {1, . . . , n} sending x to x + τ mod n and to x + τ mod n, respectively. Let Z be a set and SZ be the symmetric group of permutations of Z. We define the composition order to be from left to right, i.e., for g1 , g2 ∈ SZ , g1 g2 is the permutation obtained by applying g1 first and then g2 . For n ≥ 1, we abbreviate S{1,2,...,n} by Sn . Subgroups of Sn are the permutation groups of degree n. For a permutation group G ≤ Sn and an element i ∈ {1, . . . , n}, let G(i) denote the stabilizer subgroup of G that fixes the set {1, . . . , i} pointwise. The chain of the stabilizer subgroups of G is {id} = G(n) ≤ G(n−1) ≤ · · · ≤ G(1) ≤ G(0) = G. Let Ci be a complete set of right coset representatives of G(i) in G(i−1) , 1 ≤ i ≤ n. Then the cardinality of Ci is at most n−i and ∪ni=1 Ci forms a strong generator set for G [Sim70]. Any element g ∈ G can be written uniquely as g = gn gn−1 · · · g1 with gi ∈ Ci . Furst, Hopcroft, and Luks [FHL80] showed that given any generator set for G, a strong generator set can be computed in polynomial time. For X ⊆ Z and G ≤ SZ , we use GX to denote the subgroup of G that stablizes X setwise. It is evident that GX is the direct sum of SX and SZ\X . We are particularly interested in the case when G is Sn . In this case, a generating set for GX can be easily computed. Let G be a finite group. Let Γ be a set of mutually orthogonal quantum states. Let α : G × Γ → Γ be a group action of G on Γ , i.e., for every x ∈ G the function αx : Γ → Γ mapping |φ to |α(x, |φ ) is a permutation over Γ , and the map h from G to the symmetric group over Γ defined by h(x) = αx is a homomorphism. We use the notation |x · φ instead of |α(x, |φ ) , when α is clear from the context. We let G(|φ ) denote the orbit of |φ with respect to α, i.e., the set {|x · φ : x ∈ G}, and we let G|φ denote the stabilizer subgroup of |φ
in G, i.e., {x ∈ G : |x · φ = |φ }. Given any positive integer t, let αt denote the group action of G on Γ t = {|φ ⊗t : |φ ∈ Γ } defined by αt (x, |φ ⊗t ) = |x · φ ⊗t . We need αt because the input superpositions cannot be cloned in general. Definition 2. Let G be a finite group. – Given a generating set for G and a function f that maps G to some finite set S, where the values of f are constant on a subgroup H of G and distinct
On the Complexity of the Hidden Subgroup Problem
73
on each left (right) coset of H. The Hidden Subgroup problem is to find a generating set for H. The decision version of Hidden Subgroup, denoted as Hidden SubgroupD , is to determine whether H is trivial. The search version, denoted as Hidden SubgroupS , is to find a nontrivial element, if there exists one, in H. – Given a generating set for G and n injective functions f1 , f2 , . . . , fn defined on G, with the promise that there is a “shift” u ∈ G such that for all g ∈ G, f1 (g) = f2 (gu), f2 (g) = f3 (gu), . . . , fn−1 (g) = fn (gu), the Generalized Hidden Shift problem is to find u. If n = 2, this problem is called the Hidden Shift problem. – Given a generating set for G and two functions f1 and f2 defined on G such that for some shift u ∈ G, f1 (g) = f2 (gu) for all g in G, the Hidden Coset problem is to find the set of all such u. – Given a generating set for G and two quantum states |φ0 , |φ1 ∈ Γ , the Orbit Coset problem is to either reject the input if G(|φ0 ) ∩ G(|φ1 ) = ∅, or else output both a u ∈ G such that |u · φ1 = |φ0 and also a generating set for G|φ1 . 2.2
Program Checkers
Let π be a computational decision or search problem. Let x be an input to π and π(x) be the output of π. Let P be a deterministic program (supposedly) for π that halts on all inputs. We are interested in whether P has any bug, i.e., whether there is some x such that P (x) = π(x). A efficient program checker C for P is a probabilistic expected-polynomial-time oracle Turing machine that uses P as an oracle and takes x and a positive integer k (presented in unary) as inputs. The running time of C does not include the time it takes for the oracle P to do its computations. C will output CORRECT with probability ≥ 1 − 1/2k if P is correct on all inputs (no bugs), and output BUGGY with probability ≥ 1 − 1/2k if P (x) = π(x). This probability is over the sample space of all finite sequences of coin flips C could have tossed. However, if P has bugs but P (x) = π(x), we allow C to behave arbitrarily. If C only queries the oracle nonadaptively, then we say C is a nonadaptive checker. See Blum and Kannan [BK95] for more details.
3
Several Reductions
The Hidden Coset problem is to find the set of all shifts of the two functions f1 and f2 defined on the group G. In fact, the set of all shifts is a coset of a subgroup H of G and f1 is constant on H (see [vDHI03] Lemma 6.1). If we let f1 and f2 be the same function, this is exactly Hidden Subgroup. On the other hand, if f1 and f2 are injective functions, this is Hidden Shift. Theorem 1. Hidden Coset is polynomial-time equivalent to Hidden Subgroup.
74
S. Fenner and Y. Zhang
Proof. Let G and f1 , f2 be the input of Hidden Coset. Let the set of shifts be Hu, where H is a subgroup of G and u is a coset representative. Define a function f that maps G Z2 to S × S as follows: for any (g1 , g2 , τ ) ∈ G Z2 , (f1 (g1 ), f2 (g2 )) if τ = 0, f (g1 , g2 , τ ) = (f2 (g2 ), f1 (g1 )) if τ = 1. The values of f are constant on the set K = (H × u−1 Hu × {0}) ∪ (u−1 H × Hu × {1}), which is a subgroup of G Z2 . Furthermore, the values of f are distinct on all left cosets of K. Given a generating set of K, there is at least one generator of the form (k1 , k2 , 1). Pick k2 to be the coset representative u of H. Form a generating set S of H as follows. S is initially empty. For each generator of K, if it is of the form (k1 , k2 , 0), then add k1 and uk2 u−1 to S; if it is of the form (k1 , k2 , 1), then add uk1 and k2 u−1 to S. Corollary 1. Hidden Coset has polynomial quantum query complexity. It was mentioned in Friedl et al. [FIM+ 03] that Hidden Coset in general is of exponential (classical) query complexity. Recently Childs and van Dam [CvD07] proposed Generalized Hidden Shift where there are n injective functions (encoded in a single function). Using a similar approach, we show Generalized Hidden Shift essentially addresses Hidden Subgroup over a different family of groups. Proposition 1. Generalized Hidden Shift reduces to Hidden Subgroup in time polynomial in n. Proof. The input for Generalized Hidden Shift is a group G and n injective functions f1 , f2 , . . . , fn defined on a group G such that for all g ∈ G, f1 (g) = f2 (gu), . . . , fn−1 (g) = fn (gu). Consider the group G Zn . Define a function f such that for any element in (g1 , . . . , gn , τ ) ∈ G Zn , f ((g1 , . . . , gn , τ )) = (fτ (0) (g0 ), . . . , fτ (n) (gn )). The function values of f are constant and distinct for left cosets of the n-element cyclic subgroup generated by (u−(n−1) , u, u, · · · , u, 1). Van Dam, Hallgren, and Ip [vDHI03] introduced the Shifted Legendre Symbol problem as a natural instance of Hidden Shift. They claimed that assuming a conjecture this problem can also be reduced to an instance of Hidden Subgroup over dihedral groups. By Proposition 1, this problem can be reduced to Hidden Subgroup over wreath product groups without any conjecture. Next we show Orbit Coset is not a more general problem either, if we allow the function in Hidden Subgroup to be a quantum function. We need this generalization since the definition of Orbit Coset involves quantum functions, i.e., the ranges of the functions are sets of orthogonal quantum states. In Hidden Subgroup, the function is implicitly considered by most researchers to be a classical function, mapping group elements to a classical set. For the purposes of quantum computation, however, this generalization to quantum functions is natural and does not affect any existing quantum algorithms for Hidden Subgroup.
On the Complexity of the Hidden Subgroup Problem
75
Proposition 2. Orbit Coset is quantum polynomial-time equivalent to Hidden Subgroup. Proof. Let G and two orthogonal quantum states |φ0 , |φ1 ∈ Γ be the inputs of Orbit Coset. Define the function f : G Z2 → Γ ⊗ Γ as follows: |g1 · φ0 ⊗ |g2 · φ1 if τ = 0, f (g1 , g2 , τ ) = |g2 · φ1 ⊗ |g1 · φ0 if τ = 1. The values of the function f are identical and orthogonal on each left coset of the following subgroup H of G Z2 : If there is no u ∈ G such that |u · φ1 = |φ0 , then H = G|φ0 × G|φ1 × {0}. If there is such a u, then H = (G|φ0 × G|φ1 × {0}) ∪ (G|φ1 u−1 × uG|φ1 × {1}). For i, j ∈ {0, 1}, let gi ∈ G be the i’th coset representative of G|φ0 (i.e., |gi · φ0 = |φi ), and let gj ∈ G be the j’th coset representative of G|φ1 (i.e., |gj · φ1 = |φj ). Then elements of the left coset of H represented by (gi , gj , 0) will all map to the same value |φi ⊗ |φj via f .
4
Decision Versus Search
For NP-complete problems, the decision and search version are polynomial-time equivalent. This equivalence was also obtained for problems which are not known to be N P -complete, such as Graph Isomorphism [Mat79] and Group Intersection [AT01]. Since both problems reduce to instances of Hidden Subgroup, we ask the question whether Hidden Subgroup has such property. In this section we show that over permutation groups Sn , this decision-search equivalence can actually be obtained for Hidden Subgroup and also Hidden Shift. On the other hand, over dihedral groups Dn , this equivalence can only be obtained for Hidden Subgroup when n has small prime factors. Lemma 1. Given (generating sets for) a group G ≤ Sn , a function f : G → S that hides a subgroup H ≤ G, and a sequence of subgroups G1 , . . . , Gk ≤ Sn , an instance of Hidden Subgroup can be constructed to hide the group D = {(g, g, . . . , g) | g ∈ H ∩ G1 ∩ · · · ∩ Gk } inside G × G1 × · · · × Gk . Proof. Define a function f over the direct product group G × G1 × · · · × Gk so that for any element (g, g1 , . . . , gk ), f (g, g1 , . . . , gk ) = (f (g), gg1−1 , . . . , ggk−1 ). The values of f are constant and distinct over left cosets of D. In the following, we will use the tuple G, f to represent a standard Hidden Subgroup input instance, and G, f, G1 , . . . , Gk to represent a Hidden Subgroup input instance constructed as in Lemma 1. We define a natural isomorphism that identifies Sn Z2 with a subgroup of SΓ , where Γ = {(i, j) | i ∈ {1, . . . , n}, j ∈ {1, 2}}. This isomorphism can be viewed as a group action, where the group element (g1 , g2 , τ ) maps (i, j) to (gj (i), τ (j)). Note that this isomorphism can be efficiently computed in both directions. Theorem 2. Over permutation groups, Hidden SubgroupS is truth-table reducible to Hidden SubgroupD in polynomial time.
76
S. Fenner and Y. Zhang
Proof. Suppose f hides a nontrivial subgroup H of G, first we compute a strong generating set for G, corresponding to the chain {id} = G(n) ≤ G(n−1) ≤ · · · ≤ G(1) ≤ G(0) = G. Define f over G Z2 such that f maps (g1 , g2 , τ ) to (f (g1 ), f (g2 )) if τ is 0, and (f (g2 ), f (g1 )) otherwise. It is easy to check that for the group G(i) Z2 , f |G(i) Z2 hides the subgroup H (i) Z2 . Query the Hidden SubgroupD oracle with inputs G(i) Z2 , f |G(i) Z2 , (SΓ ){(i,1),(j,2)} , (SΓ ){(i,2),(j ,1)} , (SΓ ){(k,1),(,2)} for all 1 ≤ i ≤ n, all j, j ∈ {i + 1, . . . , n}, and all k, ∈ {i, . . . , n}. Claim. Let i be such that H (i) = {id} and H (i−1) = {id}. For all i < j, j ≤ n and all i ≤ k, l ≤ n, there is a (necessarily unique) permutation h ∈ H (i−1) such that h(i) = j, h(j ) = i and h(k) = if and only if the query G(i−1) Z2 , f |G(i−1) Z2 , (SΓ ){(i,1),(j,2)} , (SΓ ){(i,2),(j ,1)} , (SΓ ){(k,1),(,2)} to the Hidden SubgroupD oracle answers “nontrivial.” Proof of Claim. For any j > i, there is at most one permutation in H (i−1) that maps i to j. To see this, suppose there are two distinct h, h ∈ H (i−1) both of which map i to j. Then h h−1 ∈ H (i) is a nontrivial permutation, contradicting the assumption H (i) = {id}. Let h ∈ H (i−1) be a permutation such that h(i) = j, h(j ) = i, and h(k) = . Then (h, h−1 , 1) is a nontrivial element in the group H (i−1) Z2 ∩(SΓ ){(i,1),(j,2)} ∩(SΓ ){(i,2),(j ,1)} ∩(SΓ ){(k,1),(,2)} , and thus the oracle answers “nontrivial.” Conversely, if the oracle answers “nontrivial,” then the nontrivial element must be of the form (h, h , 1) where h, h ∈ H (i−1) , since the other form (h, h , 0) will imply that h and h both fix i and thus are in H (i) = {id}. Therefore, h will be a nontrivial element of H (i−1) with h(i) = j, h(j ) = i, and h(k) = . This proves the Claim. Find the largest i such that the query answers “nontrivial” for some j, j > i and some k, ≥ i. Clearly this is the smallest i such that H (i) = {id}. A nontrivial permutation in H (i−1) can be constructed by looking at the query results that involve G(i−1) Z2 . Corollary 2. Hidden SubgroupD and Hidden SubgroupS are polynomialtime equivalent over permutation groups,. Next we show that the search version of Hidden Shift, as a special case of Hidden Subgroup, also reduces to the corresponding decision problem. Definition 3. Given a generating set for a group G and two injective functions f1 , f2 defined on G, the problem Hidden ShiftD is to determine whether there is a shift u ∈ G such that f1 (g) = f2 (gu) for all g ∈ G.
On the Complexity of the Hidden Subgroup Problem
77
Theorem 3. Over permutation groups, Hidden ShiftD and Hidden ShiftS are polynomial-time equivalent. Proof. We show that if there is a translation u for the two injective functions defined on G, we can find u with the help of an oracle that solves Hidden ShiftD . First compute the strong generator set ∪ni=1 Ci of G using the procedure in [FHL80]. Note that ∪ni=k Ci generates G(k−1) for 1 ≤ k ≤ n. We will proceed in steps along the stabilizer subgroup chain G = G(0) ≥ G(1) ≥ · · · ≥ G(n) = {id}. Claim. With the help of the Hidden ShiftD oracle, finding the translation ui for input (G(i) , f1 , f2 ) reduces to finding another translation ui+1 for input (G(i+1) , f1 , f2 ). In particular, we have ui = ui+1 σi . Proof of Claim. Ask the oracle whether there is a translation for the input instance (G(i+1) , f1 |G(i+1) , f2 |G(i+1) ). If the answer is yes, then we know ui ∈ G(i+1) and therefore set σi = id and ui = ui+1 σi . If the answer is no, then we know that u is in some right coset of G(i+1) in G(i) . For every τ ∈ Ci+1 , define a function fτ such that fτ (x) = f2 (xτ ) for all x ∈ G(i+1) . Ask the oracle whether there is a translation for input (G(i+1) , f1 |G(i+1) , fτ ). The oracle will answer yes if and only if u and τ are in the same right coset of G(i+1) in G(i) , since u and τ are in the same right coset of G(i+1) in G(i) ⇐⇒ u = u τ for some τ ∈ G(i+1) ⇐⇒ f1 (x) = f2 (xu) = f2 (xu τ ) = fτ (xu ) for all x ∈ G(i) ⇐⇒ u is the translation for (G(i+1) , f1 |G(i+1) , fτ ). Then we set σi = τ . We apply the above procedure n − 1 times until we reach the trivial subgroup G(n) . The translation u will be equal to σn σn−1 · · · σ1 . Since the size of each Ci is at most n − i, the total reduction is in classical polynomial time. For Hidden Subgroup over dihedral groups Dn , we can efficiently reduce search to decision when n has small prime factors. For a fixed integer B, we say an integer n is B-smooth if all the prime factors of n are less than or equal to B. For such an n, the prime factorization can be obtained in time polynomial in B + log n. Without loss of generality, we assume that the hidden subgroup is an order-two subgroup of Dn [EH00]. Theorem 4. Let n be a B-smooth number, Hidden Subgroup over the dihedral group Dn reduces to Hidden SubgroupD over dihedral groups in time polynomial in B + log n. Proof. Without loss of generality, we assume the generator set for Dn is {r, σ}, where the order of r and σ are n and 2, respectively. Let pe11 pe22 · · · pekk be the prime factorization of n. Since n is B-smooth, pi ≤ B for all 1 ≤ i ≤ k. Let the hidden subgroup H be {id, ra σ} for some a < n.
78
S. Fenner and Y. Zhang
First we find a mod pe11 as follows. Query the Hidden SubgroupD oracle with input groups (we will always use the original input function f ) rp1 , σ , rp1 , rσ , . . . , rp1 , rp1 −1 σ . It is not hard to see that the Hidden SubgroupD oracle will answer “nontrivial” only for the input group rp1 , rm1 σ where m1 = a mod p1 . The next set of input groups to the Hidden SubgroupD oracle are 2 2 2 rp1 , rm1 σ , rp1 , rp1 +m1 σ , . . . , rp1 , r(p1 −1)p1 +m1 σ . From the oracle answers we obtain m2 = a mod p21 . Repeat the above procedure until we find a mod pe11 . Similarly, we can find a mod pe22 , . . . , a mod pekk . A simple usage of the Chinese Remainder Theorem will then recover a. The total number of queries is e1 p1 + e2 p2 + · · · + ek pk , which is polynomial in log n + B.
5
Nonadaptive Checkers
An important concept closely related to self-reducibility is that of a program checker, which was first introduced by Blum and Kannan [BK95]. They gave program checkers for some group-theoretic problems and selected problems in P. They also characterized the class of problems having polynomial-time checkers. Arvind and Tor´ an [AT01] presented a nonadaptive NC checker for Group Intersection over permutation groups. In this section we show that Hidden SubgroupD and Hidden Subgroup over permutation groups have nonadaptive checkers. For the sake of clarity, we give the checker for Hidden SubgroupD first. Let P be a program that solves Hidden SubgroupD over permutation groups. The input for P is a permutation group G given by its generating set and a function f that is defined over G and hides a subgroup H of G. If P is a correct program, then P (G, f ) outputs TRIVIAL if H is the trivial subgroup of G, and NONTRIVIAL otherwise. The checker C P (G, f, 0k ) checks the program P on the input G and f as follows: Begin Compute P (G, f ). if P (G, f ) = NONTRIVIAL, then Use Theorem 2 and P (as if it were bug-free) to search for a nontrivial element h of H. if f (h) = f (id), then return CORRECT else return BUGGY if P (G, f ) = TRIVIAL, then Do k times (in parallel): generate a random permutation u ∈ G. define f over G such that f (g) = f (gu) for all g ∈ G, use (G, f, f ) to be an input instance of Hidden Shift use Theorem 1 to convert (G, f, f ) to an input instance (GZ2 , f ) of Hidden Subgroup3 3
Using the natural isomorphism we define in Section 4, the group G Z2 is still considered as a permutation group.
On the Complexity of the Hidden Subgroup Problem
79
use Theorem 2 and P to search for a nontrivial element h of the subgroup of G Z2 that f hides. if h = (u−1 , u, 1), then return BUGGY End-do return CORRECT End
Theorem 5. If P is a correct program for Hidden SubgroupD , C P (G, f, 0k ) always outputs CORRECT. If P (G, f ) is incorrect, the probability of C P (G, f, 0k ) outputting CORRECT is ≤ 2−k . Moreover, C P (G, f, 0k ) runs in polynomial time and queries P nonadaptively. Proof. If P is a correct program and P (G, f ) outputs NONTRIVIAL, then C P ((G, f, 0k ) will find a nontrivial element of H and outputs CORRECT. If P is a correct program and P (G, f ) outputs TRIVIAL, the function f constructed by C P (G, f, 0k ) will hide the two-element subgroup {(id, id, 0), (u, u−1 , 1)}. Therefore, C P (G, f, 0k ) will always recover the random permutation u correctly, and output CORRECT. On the other hand, if P (G, f ) outputs NONTRIVIAL while H is actually trivial, then C P (G, f, 0k ) will fail to find a nontrivial element of H and thus output BUGGY. If P (G, f ) outputs TRIVIAL while H is actually nontrivial, then the function f constructed by C P (G, f, 0k ) will hide the subgroup (H × u−1 Hu × {0}) ∪ (u−1 H × Hu × {1}). P correctly distinguishes u and other elements in the coset Hu only by chance. Since the order of H is at least 2, the probability that C P (G, f, 0k ) outputs CORRECT is at most 2−k . Clearly, C P (G, f, 0k ) runs in polynomial time. The nonadaptiveness follows from Theorem 2. Similarly, we can construct a nonadaptive checker C P (G, f, 0k ) for a program P (G, f ) that solves Hidden Subgroup over permutation groups. The checker makes k nonadaptive queries. Begin Run P (G, f ), which outputs a generating sets S. Verify that elements of S are indeed in H. Do k times (in parallel): generate a random element u ∈ G. define f over G such that f (g) = f (gu) for all g ∈ G, use (G, f, f ) to be an input instance of Hidden Coset use Theorem 1 to convert (G, f, f ) to an input instance (G Z2 , f ) of Hidden Subgroup P (G Z2 , f ) will output a set S of generators and a coset representative u if S and S don’t generate the same group or u and u are not in the same coset of S, then return BUGGY End-do return CORRECT End
The proof of correctness for the above checker is similar to the proof of Theorem 5.
80
6
S. Fenner and Y. Zhang
Discussion
The possibility of achieving exponential quantum savings is closely related to the underlying algebraic structure of a problem. Some researchers argued that generalizing from abelian groups to nonabelian groups may be too much for quantum computers [qua], since nonabelian groups exhibit radically different characteristics comparing with abelian groups. The results in this paper seem to support this point of view. Given the rich set of nonabelian group families, such as wreath product groups, Hidden Subgroup is indeed a “robust” and difficult problem. Recently Childs, Schulman, and Vazirani [CSV07] suggested an alternative generalization of abelian Hidden Subgroup, namely to the problems of finding nonlinear structures over finite fields. They gave two such problems, Hidden Radius and Hidden Flat of Centers, which exhibit exponential quantum speedup. An interesting open problem is whether these two problems are instances of Hidden Subgroup over some nonabelian group families.
Acknowledgments We thank Andrew Childs and Wim van Dam for valuable comments on a preliminary version of this paper. We also thank anonymous referees for helpful suggestions.
References [AK02] Arvind, V., Kurur, P.P.: Graph Isomorphism is in SPP. In: Proceedings of the 43rd IEEE Symposium on Foundations of Computer Science, IEEE, New York (2002) [AT01] Arvind, V., Tor´ an, J.: A nonadaptive NC checker for permutation group intersection. Theoretical Computer Science 259, 597–611 (2001) [BK95] Blum, M., Kannan, S.: Designing programs that check their work. Journal of the ACM 42(1), 269–291 (1995) [CSV07] Childs, A.M., Schulman, L.J., Vazirani, U.V.: Quantum algorithms for hidden nonlinear structures. In: FOCS 2007: Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007), pp. 395–404. IEEE Computer Society, Washington (2007) [CvD07] Childs, A.M., van Dam, W.: Quantum algorithm for a generalized hidden shift problem. In: SODA 2007: Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pp. 1225–1232 (2007) [EH00] Ettinger, M., Høyer, P.: On quantum algorithms for noncommutative hidden subgroups. Advances in Applied Mathematics 25, 239–251 (2000) [EHK04] Ettinger, M., Høyer, P., Knill, E.: The quantum query complexity of the hidden subgroup problem is polynomial. Information Processing Letters 91(1), 43–48 (2004) [FHL80] Furst, M.L., Hopcroft, J.E., Luks, E.M.: Polynomial-time algorithms for permutation groups. In: Proceedings of the 21st IEEE Symposium on Foundations of Computer Science, pp. 36–41 (1980)
On the Complexity of the Hidden Subgroup Problem
81
[FIM+ 03] Friedl, K., Ivanyos, G., Magniez, F., Santha, M., Sen, P.: Hidden translation and orbit coset in quantum computing. In: Proceedings of the 35th ACM Symposium on the Theory of Computing, pp. 1–9 (2003) [GSVV04] Grigni, M., Schulman, J., Vazirani, M., Vazirani, U.: Quantum mechanical algorithms for the nonabelian hidden subgroup problem. Combinatorica 24(1), 137–154 (2004) otteler, M., Russell, A., Sen, P.: Limitations [HMR+ 06] Hallgren, S., Moore, C., R¨ of quantum coset states for graph isomorphism. In: STOC 2006: Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, pp. 604–617 (2006) [HRTS03] Hallgren, S., Russell, A., Ta-Shma, A.: The hidden subgroup problem and quantum computation using group representations. SIAM Journal on Computing 32(4), 916–934 (2003) [Joz00] Jozsa, R.: Quantum factoring, discrete algorithm and the hidden subgroup problem (manuscript, 2000) [Kit95] Kitaev, A.Y.: Quantum measurements and the Abelian Stabilizer problem, quant-ph/9511026 (1995) [KS05] Kempe, J., Shalev, A.: The hidden subgroup problem and permutation group theory. In: Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms, Vancouver, British Columbia, January 2005, pp. 1118–1125 (2005) [Mat79] Mathon, R.: A note on the graph isomorphism counting problem. Information Processing Letters 8, 131–132 (1979) [Mos99] Mosca, M.: Quantum Computer Algorithms. PhD thesis, University of Oxford (1999) [MRS05] Moore, C., Russell, A., Schulman, L.J.: The symmetric group defies strong fourier sampling. In: FOCS 2005: Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, pp. 479–490 (2005) [MRS07] Moore, C., Russell, A., Sniady, P.: On the impossibility of a quantum sieve algorithm for graph isomorphism. In: STOC 2007: Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pp. 536–545 (2007) [NC00] Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2000) [qua] Quantum pontiff, http://dabacon.org/pontiff/?p=899 [Reg04] Regev, O.: Quantum computation and lattice problems. SIAM Journal on Computing 33(3), 738–760 (2004) [Sco87] Scott, W.R.: Group Theory. Dover Publications, Inc. (1987) [Sim70] Sims, C.C.: Computational methods in the study of permutation groups. In: Computational problems in abstract algebra, pp. 169–183 (1970) [vDHI03] van Dam, W., Hallgren, S., Ip, L.: Quantum algorithms for some hidden shift problems. In: Proceedings of the 14th annual ACM-SIAM symposium on Discrete algorithms, pp. 489–498 (2003)
An O ∗(3.523k ) Parameterized Algorithm for 3-Set Packing Jianxin Wang and Qilong Feng School of Information Science and Engineering, Central South University
[email protected]
Abstract. Packing problems have formed an important class of NPhard problems. In this paper, we provide further theoretical study on the structure of the problems, and design improved algorithm for packing problems. For the 3-Set Packing problem, we present a deterministic algorithm of time O∗ (3.523k ), which significantly improves the previous best algorithm of time O∗ (4.613k ).
1
Introduction
In the complexity theory, packing problem forms an important class of NP-hard problems, which are used widely in scheduling and code optimization fields. We first give some related definitions [1]. Assume all the elements used in this paper are from U . Set Packing: Given a pair (S, k), where S is a collection of n sets and k is an integer, find a largest subset S such that no two sets in S have the common elements. (Parameterized) 3-Set Packing: Given a pair (S, k), where S is a collection of n sets and k is an integer, each set contains 3elements, either construct a k-packing or report that no such packing exists. 3-Set Packing Augmentation: Given a pair (S, Pk ), where S is a collection of n sets and Pk is a k-packing in S, either construct (k + 1)-packing or report that no such packing exists. Recently, Downey and Fellows [2] proved that the 3-D Matching problem is Fixed Parameter Tractable (FPT), and gave an algorithm with time complexity O∗ ((3k)!(3k)9k+1 ), which can be applied to solve 3-Set Packing problem. Jia, Zhang and Chen [3] reduced the time complexity to O∗ ((5.7k)k ) using greedy localization method. Koutis [4] proposed a randomized algorithm with time
This work is in part supported by the National Natural Science Foundation of China under Grant No. 60773111 and No. 60433020, Provincial Natural Science Foundation of Hunan (06JJ10009), the Program for New Century Excellent Talents in University No. NCET-05-0683 and the Program for Changjiang Scholars and Innovative Research Team in University No. IRT0661.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 82–93, 2008. c Springer-Verlag Berlin Heidelberg 2008
An O∗ (3.523k ) Parameterized Algorithm
83
complexity O∗ (10.883k ) and a deterministic algorithm with time complexity at least O∗ (320003k ). Fellows et al [5] gave an algorithm with time complexity at least O∗ (12.673k T (k)) for the 3-D Matching problem, (based on current technology, T (k) is at least O∗ (10.43k )). Kneis, Moelle, S.R and Rossmanith [6] presented a deterministic algorithm of time O∗ (163k ) using randomized divide and conquer. Chen, Lu, S.H.S and Zhang [7] gave a randomized algorithm with time complexity O∗ (2.523k ) based on the divide and conquer method, whose deterministic algorithm is of time complexity O∗ (12.83k ). Liu, Lu, Chen and H Sze [1] gave a deterministic algorithm of time O∗ (4.613k ) based on greedy localization and color coding, which is currently the best result in the world. In this paper, we will discuss how to construct a (k + 1)-packing from a kpacking, so as to solve the 3-Set Packing problem. After further analyzing the structure of the problem, we can get the following property: if the 3-Set Packing Augmentation problem can not be solved in polynomial time, then each set in Pk+1 should contain at least one element of Pk . Based on the above property, we can get a randomized algorithm of time O∗ (3.523k ) using randomized divide and conquer. According to the structure analysis and the derandomization method given in [8], we can get a deterministic algorithm with the same time complexity O∗ (3.523k ), which greatly improves the current best result O∗ (4.613k ).
2
Related Terminology and Lemmas
We first introduce two important lemmas [1]. Lemma 1. For any constant c > 1, the 3-Set Packing Augmentation problem can be solved in time O∗ (ck ) if and only if the 3-Set Packing problem can be solved in time O∗ (ck ). Lemma 2. Let (S, Pk ) be an instance of 3-Set Packing Augmentation, where Pk is a k-packing in S. If S also has (k + 1)-packings, then there exists a (k + 1)packing Pk+1 in S such that every set in Pk contains at least two elements in Pk+1 . By lemma 1, reducing the time complexity of 3-Set Packing Augmentation problem is our objective. For the convenience of analyzing the structure of the 3-Set Packing Augmentation problem, we give the following definitions. Definition 1. Let (S, Pk ) be an instance of 3-Set Packing Augmentation problem, where Pk is a k-packing in S. Assume there exists Pk+1 , for a certain (k + 1)-packing P and a set ρi in Pk , if only one element of ρi is contained in P , ρi is called 1-Set; if no element of ρi is contained in P , it is called 0-Set. The collection of all the 1-Set and 0-Set in Pk is called (1, 0)-Collection of the (k+1)-packing P . If the (1, 0)-Collection of P is null, then P is called a (0, 1)-free packing.
84
J. Wang and Q. Feng
We need to point out that: different (k + 1)-packing may have different (1, 0)Collection. It is easy to get the following lemma from relation between Pk and Pk+1 . Lemma 3. Let (S, Pk ) be an instance of 3-Set Packing Augmentation problem, where Pk is k-packing in S. Assume there exists Pk+1 , any (k + 1)-packing can be transformed into a (0, 1)-free packing in polynomial time. Proof. For an instance of 3-Set Packing Augmentation problem (S, Pk ). Assume there exists Pk+1 , for any (k + 1)-packing P , do the following process. Find out all the 1-Set and 0-Set in Pk , denoted by W . For each set ρi in W , discuss it in the following two cases. Case 1: ρi is a 1-Set. Assume one element a in ρi is contained in P and the set ρj in P contains the element a. Use ρi to replace ρj in P such that the number of 1-Set in Pk is reduced by one. Because of the replacement, there may produce new 1-Set or 0-Set in Pk . Find out all the new 1-Set and 0-Set in Pk . If a new 1-Set or 0-Set has not existed in W , add this set into W . Case 2: ρi is a 0-Set. Since Pk has k sets and Pk+1 has k + 1 sets, there must exists one set ρj in P that is not contained in Pk . Use ρi to replace ρj in P such that the number of 0-Set in Pk is reduced by one. Because of the replacement, there may produce new 1-Set or 0-Set in Pk . Find out all the new 1-Set and 0-Set in Pk . If a new 1-Set or 0-Set has not existed in W , add this set into W . After processing all the sets in W , there are no 1-Set and 0-Set in Pk . Therefore, the (k + 1)-packing P is converted into a (0, 1)-free packing. Now, we prove that the above process can be done in polynomial time. In order to find out all the 1-Set and 0-Set, for each set ρi in Pk , we need to consider all the sets in P . Obviously, the time complexity of this process is bounded by O(k 2 ). When a set in P is replaced by 1-Set or 0-Set,it needs to redetermine the 1-Set and 0-Set in Pk . The whole time complexity of this process is bounded by O(k 3 ). This completes the proof of the lemma. Based on the lemma 3, we can get following lemma. Lemma 4. Let (S, Pk ) be an instance of 3-Set Packing Augmentation problem, where Pk is a k-packing in S. Assume there exists Pk+1 , for any (k + 1)-packing P , if P is a (0, 1)-free packing, then each set in Pk should have at least 2 elements be contained in P . Proof. Assume there exists Pk+1 in S, for any (k+1)-packing P , if P is a (0, 1)-free packing, the (1, 0)-Collection of P is null, that is, there is no 1-Set or 0-Set in Pk . Therefore, each set in Pk should have at least 2 elements be contained in P . Combing lemma 4 with the structure analysis of Pk+1 , we can get the following lemma. Lemma 5. Given an instance of 3-Set Packing Augmentation problem (S, Pk ), where Pk is k-packing in S. For any (0, 1)-free packing P , assume there are 2k+x (0 ≤ x ≤ k) elements from Pk contained in P , if the 3-Set Packing Augmentation problem can not be solved in polynomial time, each set in P should contain at least one of those 2k + x elements.
An O∗ (3.523k ) Parameterized Algorithm
85
Proof. By the lemma 4, each (0, 1)-free packing contains at least 2k elements of Pk . We use contradiction method to prove. Assume that: although the 3-Set Packing Augmentation problem can not be solved in polynomial time, one or more sets in P contain none of those 2k + x elements. Assume that there is a set α in P containing none of those 2k + x elements. Except those 2k + x elements, other elements in Pk are definitely not in P . Therefore, α and all sets in Pk have no common elements. According to the relation between elements and sets, construct the bipartite graph G = (V1 ∪ V2 , E), where the vertices in V1 correspond to the elements in U , and the vertices in V2 denote the sets in S. If an element is contained in a set, then the corresponding vertices will be connected by an edge. In graph G, we can find out all the sets having no common elements with Pk , thus, α is definitely in those sets. A (k + 1)-packing can be constructed by the k sets in Pk and α, which can be done in polynomial time. This contradicts with the assumption. This completes the proof of the lemma.
3
The Randomized Algorithm
In this part, randomized divide and conquer will be used efficiently to solve the 3-Set Packing Augmentation problem. Based on the lemma 3, we can assume that all the (k + 1)-packing used in the following are (0, 1)-free packing. By lemma 5, we can get the following lemma. Lemma 6. Given an instance of 3-Set Packing Augmentation problem (S, Pk ), where Pk is a k-packing in S. Assume there exists Pk+1 , for a (k + 1)-packing P , if P can not be found in polynomial time, P is composed of the following there parts. (1) P has r sets, each of which contains only one element of Pk , 0 ≤ r ≤ k+3 2 . (2) P has s sets, each of which contains only two elements of Pk , 0 ≤ s ≤ k+1. (3) P has t sets, each of which contains three elements of Pk , 0 ≤ t ≤ k − 1. where r + s + t = k + 1. Proof. By lemma 5, if there exists Pk+1 and could not find a (k + 1)-packing in polynomial time, each set in Pk+1 should contains at least one element of Pk . Therefore, for the (k +1)-packing P , each set in P may contain 1, 2 or 3 elements of Pk , which is one of the three types given in the lemma. Now we will prove that there are at most k+3 2 sets in P , each of which contains only one element of Pk . Assume that P contains 2k + x (0 ≤ x ≤ k) elements of Pk . If s = 0 and each set in P has already contained one of those 2k + x elements. When the remaining 2k + x − (k + 1) = k + x − 1 elements are used to form the sets containing three elements of Pk , r gets the maximum value: k + 1 − k+x−1 ≤ k+3 2 2 . When each set in P contains only two elements of Pk , s gets the maximum value k + 1, thus, s ≤ k + 1. Because each set in P contains at least one element of Pk , the maximum number of sets in P containing three elements of Pk is k − 1, thus, t ≤ k − 1.
86
J. Wang and Q. Feng
Let Ci (1 ≤ i ≤ 3) denote all the sets having i common elements with Pk , which can be found in polynomial time based on the relation of elements and sets. Assume UPk denotes the 3k elements in Pk and US−Pk denotes the elements in S − Pk . Therefore, each set in C2 contains 2 elements from UPk , and each set in C3 contains 3 elements from UPk . Let UC2 −Pk denote the elements in C2 but not in UPk , then UC2 −Pk ⊆ US−Pk . By lemma 6, if Pk+1 exists, there are r sets in Pk+1 such that each of which contains only one element of Pk , which are obviously included in C1 . To find the enumerations. Let H be the collection of r elements from UPk , there are 3k r sets in C1 containing one of those r elements. Assume UPk+1 −Pk denotes all the elements in Pk+1 but not in Pk , and the size of the UPk+1 −Pk is denoted by y = |UPk+1 −Pk |. By lemma 5, Pk+1 contains at least 2k elements of UPk , thus, UPk+1 −Pk contains at most k + 3 elements of US−Pk , that is, y ≤ k + 3. It can be seen that the elements in UPk+1 −Pk are either in H or in C2 ∪ C3 . Assume that UP k+1 −Pk denotes the elements of UPk+1 −Pk belonging to H. When the elements in UPk+1 −Pk are partitioned, the probability that the elements in UP k+1 −Pk are exactly partitioned into H is 21y . The general ideal of our randomized algorithm is as follows: Divide Pk+1 into two parts to handle, one of which is in H and the other in C2 ∪ C3 . For the part contained in C2 ∪ C3 , we use dynamic programming to find a (k + 1 − r)-packing; For the part in H, we use randomized divide and conquer to handle. 3.1
Use Dynamic Programming to find a (k + 1 − r)-Packing in C2 ∪ C3
For the convenience of describing the algorithm, we first give the concept of symbol pair. For each set ρi ∈ C2 , the elements from UPk in set ρi is called a symbol pair. The algorithm of finding a k = k + 1 − r (k ≤ k + 1) packing in C2 ∪ C3 is given in figure 1. Theorem 1. If there exists k -packing in C2 ∪ C3 , algorithm SP will definitely return a collection of symbol pairs and 3-sets with size k , and the time complexity is bounded by O∗ (23k ). Proof. If there exists k -packing in C2 ∪ C3 , assume that the number of sets in C2 contained in the k -packing is k , 0 ≤ k ≤ k . Therefore, the k sets of k -packing in C2 can form a k -packing. We need to prove the following two parts. (1) After the execution of the for-loop in step 3, Q1 must contain a collection of symbol pairs with size k . (2) After the execution of the for-loop in step 5, Q1 must contain a collection of symbol pairs and 3-sets with size k . The proof of the first part is as follows. It can be seen from step 3.1-3.4 that the C added into Q1 in the step 3.7 is a collection of symbol pairs from the right packing. We get a induction for the i in
An O∗ (3.523k ) Parameterized Algorithm
87
Algorithm SP Input: C2 , C3 , k , UPk Output: if there exists k -packing in C2 ∪ C3 , return a collection of symbol pairs and 3-sets with size k 1. assume the elements in UC2 −Pk are x1 , x2 , . . . xm ; 2. Q1 ={φ}; Qnew ={φ}; 3. for i = 1 to m do 3.1 for each collection C in Q1 do 3.2 for each 3-set ρ in C2 having element xi do 3.3 if C has no common element with ρ then 3.4 C = C ∪ {elements in ρ belonging to UPk }; 3.5 if C is not larger than k and no collection in Qnew has used exactly the same elements as that used in C then 3.6 add C into Qnew ; 3.7 Q1 = Qnew ; 4. assume the 3-sets in C3 are z1 , z2 , . . . zl ; 5. for h = 1 to l do 5.1 for each collection C in Q1 do 5.2 if C does not have common elements with zh thenC = C ∪ {zh }; 5.4 if C is not larger than k and no collection in Qnew has used exactly the same elements as that used in C then add C into Qnew ; 5.6 Q1 = Qnew ; 6. if there is a collection of symbol pairs and 3-sets with size k in Q1 then return the collection.
Fig. 1. Use dynamic programming to find a (k + 1 − r)-packing in C2 ∪ C3
the step 3 so as to prove that: if there exists k -packing in C2 , Q1 must contain a collection of symbol pairs with size k . There are m different elements in UC2 −Pk : x1 , x2 , . . . xm . For any arbitrary i (1 ≤ i ≤ m), assume that Xi denotes all the sets containing the element in {x1 , x2 , . . . xi }. Therefore, we only need to prove the following claim. Claim 1. If there exists a j-packing symbol pairs collection Pj in Xi , then after i-thexecution of the for-loop in step 3, Q1 contains a j-packing symbol pairs collection Pj , which uses the same 2j elements with Pj . In the step 2, Q1 ={φ}. Thus, if Xi has a 0-packing, the claim is true. When i ≥ 1, assume there exists a j-packing symbol pair collection Pj = {ϕl1 , ϕl2 , . . . , ϕlj }, where 1 ≤ l1 < l2 < · · · < lj ≤ i, then there must exists a (j − 1)-packing symbol pair collection Pj−1 = {ϕl1 , ϕl2 , . . . , ϕlj −1 } in Xlj −1 . By the induction assumption, after the (lj − 1)-th execution of for-loop in step 3, Q1 contains a (j − 1)-packing symbol pairs collection Pj−1 , which use the same 2(j − 1) elements of UPk with Pj−1 . By the assumption, Xi contains a jpacking symbol pair collection Pj . Therefore, when the set containing the ϕlj is considered in step 3.2, the elements belonging to UPk in Pj−1 are totally different from the elements in ϕlj . As a result, if there is no collection of symbol pairs
88
J. Wang and Q. Feng
in Q1 containing the same 2j elements with Pj−1 ∪ {ϕlj }, the j-packing symbol pairs Pj−1 ∪ {ϕlj } will be added into Q1 . Because all the collection of symbol pairs in Q1 are not removed from Q1 and lj ≤ i, after the i-th execution of the for-loop in step 3, Q1 must contains a j-packing symbol pairs collection using the same 2j elements with Pj . When i = m, if there exists k -packing Pk in C2 , Q1 must contain a collection of symbol pairs using the same 2k elements of UPk with Pk . The proof of the second part is similar to the first part, which is neglected here. At last, we analyze the time complexity of algorithm SP. If considering C2 only, for each j (0 ≤ j ≤ k ) and any subset of UPk containing 2j elements, Q1 record at most one collection symbol pairs using those 2j elements, thus, k +1 Q1 contains at most j=0 3k 2j collections. If considering C3 only, for each j (0 ≤ j ≤ k ) and any subset of containing 3j elements, Q1 record at most one k −1 collection of 3-sets using those 3j elements, thus, Q1 contains at most j=0 3k 3j collections. Therefore, the time complexity of algorithm SP is bounded by k +1 k −1 3k ∗ ∗ 3k max{O∗ ( j=0 3k j=0 2j ), O ( 3j )} = O (2 ).
If there exists k -packing in C2 ∪ C3 , the collection returned by algorithm SP may contain symbol pairs, which can be converted into 3-sets using the bipartite maximum matching. 3.2
Use Randomized Divide and Conquer to find a r-Packing in H
Assume that we have already picked r (0 ≤ r ≤ k+3 2 ) elements from UPk , and let H be the collection of sets in C1 containing one of those r elements. The algorithm of finding a r-packing in H is given in figure 2. Theorem 2. If H has r-packing, algorithm RSP will return a collection D containing the r-packing with probability larger than 0.75, and the time complexity is bounded by O∗ (42r ). Proof. Algorithm RSP divides H into two parts to handle: H1 , H2 . If there exists r-packing Pr , Pr has 2r elements of US−Pk , denoted by UPr −Pk . Assume that UP r −Pk denotes the elements belonging to UPr −Pk in H1 , thus, the elements belonging to UPr −Pk in H2 can be denoted by UPr −Pk − UP r −Pk . Mark all the elements from US−Pk in H1 and H2 with red and blue colors. When elements in UP r −Pk are exactly marked with red and elements in UPr −Pk − UP r −Pk are marked with blue, it is called that elements in H1 and H2 are rightly marked, which occurs with probability 212r . Therefore, the probability that the elements in H1 and H2 are not rightly marked is 1 − 212r . If there exists r-packing in H, let δr be the probability that algorithm RSP can not find the r-packing. In the step 3, H is divided into two parts: H1 , H2 . Therefore, the probability that RSP(H1 ) and RSP(H2 ) can not find the corresponding packing respectively is δr/2 . After 3·22r iterations, the probability 2r 1 that algorithm RSP could not find the Pk+1 is (1− 212r + 22r−1 ·δr/2 )3·2 . We need
An O∗ (3.523k ) Parameterized Algorithm
89
Algorithm RSP Input: H, r Output: return a collection D of packings 1. if r = 0 then return φ; 2. if r = 1 then return H; 3. randomly pick r2 elements from the r elements, and let H1 denote all the sets containing one of those r2 elements; 4. H2 = H − H1 ; D = φ; 5. for 3 · 22r times do 5.1 mark all the elements from US−Pk in H1 and H2 with red and blue; for each set ρ in H1 , if the colors of the elements belonging to US−Pk are not both red, delete the set ρ; for each set ρ in H2 , if the colors of the elements belonging to US−Pk are not both blue, delete the set ρ ; 5.2 D1 =RSP(H1 , r2 ); 5.3 D2 =RSP(H2 , r2 ); 5.4 for each packing α in D1 do for each packing β in D2 do if there does not exist α ∪ β in D, add α ∪ β into D; 6. return D; Fig. 2. Use randomized divide and conquer to find a r-packing in H
to prove that: for any r, δr ≤ 1/4. It is obvious that δ1 = 0. Assume δr/2 ≤ 1/4. 2r 1 · δr/2 )3·2 ≤ By the induction assumption, we can get that: δr = (1 − 212r + 22r−1 2r 2r+1 3 1 1 (1 − 212r + 22r−1 · 1/4)3·2 = (1 − 22r+1 ) 2 ·2 ≤ e−3/2 < 1/4. Let Tr denote the number of recursive calls in algorithm RSP, then Tr ≤ 3·22r ·(T r2 +T r2 ) ≤ 3·22r+1 ·T r2 = O(3log 2r 42r ) = O((2r)log 3 42r ) = O∗ (42r ). 3.3
The General Algorithm for 3-Set Packing Augmentation
Based on the above two algorithm, the general algorithm for 3-Set Packing Augmentation problem is given in figure 3. Theorem 3. If S has (k + 1)-packing, algorithm GSP will return the (k + 1)packing with probability larger than 0.75, and the time complexity is bounded by O∗ (3.523k ). Proof. In the above algorithm, we need to consider all the enumeration of r. If S has a (k + 1)-packing, there must exist a r and an enumeration satisfying the condition. The algorithm divides Pk+1 into two parts to handle, one of which is in H and the other in C2 ∪ C3 . When elements in UP r −Pk are exactly marked with white and elements in UPr −Pk − UP r −Pk are marked with black, it is called that elements in US−Pk are rightly marked, which occurs with probability 21y .
90
J. Wang and Q. Feng
Algorithm GSP Input: S, k Output: whether there is a (k + 1)-packing in S do 1. for r = 0 to k+3 2 1.1 enumerate r elements from UPk , and get 3k enumerations; r 1.2 for each enumeration do let H be the collection of sets in C1 containing one of those r elements; 1.3 for 24 · 2k times do C2 = C2 ; C3 = C3 ; for each element a in UPk , if a belongs to H, then delete all the sets in C2 ∪ C3 containing a; use colors black and white to mark all the elements in US−Pk ; for each set ρ in C2 , if the color of the element belonging to US−Pk is not black, delete the set ρ; for each set ρ in H, if the colors of the elements belonging to US−Pk are not both white, delete the set ρ ; Q2 =SP(C2 , C3 , k + 1 − r, UPk ); Q3 =RSP(H, r); use the bipartite maximum matching algorithm to convert the symbol pairs in Q2 into 3-sets; if Q2 is a (k + r − 1)-packing and Q3 has a r-packing then return the (k + 1)-packing; stop; 2. return no (k + 1)-packing in S; Fig. 3. The general algorithm for 3-Set Packing Augmentation
Therefore, the probability that the elements in US−Pk are not rightly marked is 1 − 21y . If S has Pk+1 , let δk denote the probability that algorithm GSP can not find the Pk+1 . If H has r-packing, let δr denote the probability that algorithm RSP can not find the r-packing. By theorem 3, we know that δr ≤ 1/4. If Pk+1 exists, for a certain r and an enumeration, after 24 · 2k iterations of step 1.3, the k 1 probability that algorithm GSP could not find the Pk+1 is (1− 21y + 2y−1 ·δr )24·2 , that is, k k k 1 1 1 · δr )24·2 ≤ (1 − 21y + 2y−1 · 14 )24·2 = (1 − 2y+1 )24·2 ≤ δk = (1 − 21y + 2y−1 e−3/2 < 1/4. By theorem 1, If there exists (k + 1 − r)-packing in C2 ∪ C3 , algorithm SP will definitely return a collection of symbol pairs and 3-sets with size k + 1 − r. By theorem 2, if H has r-packing, algorithm RSP will return the r-packing with probability larger than 0.75. Therefore, if S has (k + 1)-packing, algorithm GSP will return the (k + 1)-packing with probability larger than 0.75. we analyze time complexity of the above algorithm. For each r, there are 3kNow ways to enumerate r elements from UPk . By theorem 1, the time complexity r of algorithm SP is bounded by O∗ (23k ). By theorem 2, the time complexity of
An O∗ (3.523k ) Parameterized Algorithm
91
algorithm RSP is O∗ (42r ). Because of 0 ≤ r ≤ k+3 2 , the running time of algorithm RSP is bounded by O∗ (4k ). Using bipartite maximum matching algorithm to covert symbol pairs in Q2 can be done in polynomial time. Therefore, the total k+3 3k k 3k−r time complexity of algorithm GSP is r=02 + 4k )) = O∗ (3.523k ). r (2 (2
4
Derandomization
When there exists (k + 1)-packing, in order to make failure impossible, we need to derandomize the above algorithm. We first point out that: partitioning a set means dividing the set into two parts. Because the size of the UPk+1 −Pk is y, there are 2y ways to partition UPk+1 −Pk . Therefore, there must exist a partition satisfying the following property: elements in UP k+1 −Pk are exactly partitioned into H, and elements in UPk+1 −Pk −UP k+1 −Pk are exactly in C2 ∪ C3 . However, the problem is that: UPk+1 −Pk is unknown. Naor, Schulman and Srinivasan [9] gave the solution for the above problem. Moreover, Chen and Lu [8] presented a more detailed description of that method. Now we introduce a very important lemma in [10]. Lemma 7. Let n, k be two integers such that 0 < k ≤ n. There is an (n, k)2 universal set P of size bounded by O(n2k+12 log k+12 log k+6 ), which can be con2 structed in time O(n2k+12 log k+12 log k+6 ). The (n, k)-universal set in the above lemma is a set F of splitting functions, such that for every k-subset W of {0, 1, · · · , n − 1} and any partition (W1 , W2 ) of W , there is a splitting function f in F that implements (W1 , W2 ). In the construction of the above lemma, Chen and Lu [8] constructed a fuction h(x)=((ix mod p)mod k 2 ) from {0, 1, · · · , n − 1} to {0, 1, · · · , k 2 − 1}, and used the fact that there are at most 2n h(x) to get the above lemma. However, the bound 2n is not tight. Now, we introduce an important lemma in [9]. Lemma 8. There O(k 6 log k log n).
is
an
explicit
(n, k, k 2 )-splitter
A(n, k)
of
size
In the above lemma, the (n, k, k 2 )-splitter A(n, k) denotes the function from {0, 1, · · · , n − 1} to {0, 1, · · · , k 2 − 1}. Thus, the number of functions from {0, 1, · · · , n − 1} to {0, 1, · · · , k 2 − 1} are bounded by O(k 6 log k log n). Based on lemma 7, lemma 8, we can get the following lemma. Lemma 9. Let n, k be two integers such that 0 < k ≤ n. There is an (n, k)2 universal set P of size bounded by O(log n2k+12 log k+18 log k ), which can be con2 structed in time O(log n2k+12 log k+18 log k ). By the lemma 9, we can get the following theorem.
92
J. Wang and Q. Feng
Theorem 4. 3-Set Packing Augmentation problem can be solved deterministically in time O∗ (3.523k ). Proof. Given an instance of 3-Set Packing Augmentation problem (S, Pk ), if there exists Pk+1 , by lemma 5, Pk+1 contains at least 2k elements of UPk , thus, UPk+1 −Pk contains at most k + 3 elements of US−Pk . After picking r elements from UPk , Pk+1 is divided into two parts to handle in the randomized algorithm, one of which is in H and the other in C2 ∪ C3 . By lemma 9, we can construct the (S − Pk , k + 3)-universal set, whose size is bounded by 2 O(log n2k+3+12 log (k+3)+18 log(k+3) ). For each partition to US−Pk in the (S − Pk , k + 3)-universal set, let UH−Pk denote the elements partitioned into H. If H has a r-packing Pr , there are 2r elements of US−Pk in Pr , denoted by UPr −Pk . Assume UP r −Pk denotes the elements of UPr −Pk in H1 . In order to find Pr , UP r −Pk should be partitioned into H1 , and UPr −Pk − UP r −Pk should be in H2 . By lemma 9, we can construct (UH−Pk , 2r)-universal set, whose size is bounded by 2 O(log n2k+3+12 log (k+3)+18 log(k+3) ). In the derandomization of algorithm RSP, the time complexity is: 2 Tr ≤ log n2k+3+12 log (k+3)+18 log(k+3) (T r2 + T r2 ) ≤ log n2k+3+12 log
2
(k+3)+18 log(k+3)+1
T r2 log log n 2(k+3)+4 log3 (k+3)+15 log2 (k+3)+13 log(k+3)
= O((k + 3) 2 ). In the practical point of view, Tr is bounded by: 3 2 O(22(k+3)+4 log (k+3)+15 log (k+3)+11 log(k+3) ). If there exists Pk+1 , on the basis of (S − Pk , k + 3)-universal set and the above result, we can get the Pk+1 deterministically with time complexity: k+3 k+3+12 log2 (k+3)+18 log(k+3) 2 3k r=0 r (log n2 3 2 (23k−r + 22(k+3)+4 log (k+3)+15 log (k+3)+13 log(k+3) )) = O∗ (3.523k ). By lemma 1 and theorem 4, we can get the following corollary. Corollary 1. 3-Set Packing can be solved in O∗ (3.523k ).
5
Conclusions
For the 3-Set Packing problem, we construct a (k + 1)-packing Pk+1 from a kpacking Pk . After further analyzing the structure of the problem, we can get the following property: for any (0, 1)-free packing P , if the 3-Set Packing Augmentation problem can not be solved in polynomial time, each set in P should contains at least one element of Pk . On the basis of the above property, we get a randomized algorithm of time O∗ (3.523k ). Based on the derandomization method given in [10], we can get a deterministic algorithm with the same time complexity, which greatly improves the current best result O∗ (4.613k ). Our results also imply improved algorithms for various triangle packing problems in graphs [10].
An O∗ (3.523k ) Parameterized Algorithm
93
References 1. Liu, Y., Lu, S., Chen, J., Sze, S.H.: Greedy localization and color-coding:improved matching and packing algorithms. In: Bodlaender, H.L., Langston, M.A. (eds.) IWPEC 2006. LNCS, vol. 4169, pp. 84–95. Springer, Heidelberg (2006) 2. Downey., R., Fellows, M.: Parameterized complexity. Springer, New York (1999) 3. Jia, W., Zhang, C., Chen, J.: An efficient parameterized algorithm for m-SET PACKING. J. Algorithms 50, 106–117 (2004) 4. Koutis, I.: A faster parameterized algorithm for set packing. Information Processing Letters 94, 7–9 (2005) 5. Fellows, M., Knauer, C., Nishimura, N., Ragde, P., Rosamond, F., Stege, U., Thilikos, D., Whitesides, S.: Faster fixed-parameter tractable algorithms for matching and packing problems. In: Albers, S., Radzik, T. (eds.) ESA 2004. LNCS, vol. 3221, Springer, Heidelberg (2004) 6. Kneis, J., Moelle, D., Richter, S., Rossmanith, P.: Divide-and-Color. In: Fomin, F.V. (ed.) WG 2006. LNCS, vol. 4271, pp. 58–67. Springer, Heidelberg (2006) 7. Chen, J., Lu, S., Sze, S.-H., Zhang, F.: Improved algorithms for path, matching, and packing problems. In: Proc. 17th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2007), pp. 298–307 (2007) 8. Chen, J., Lu, S.: Improved parameterized set splitting algorithms: A probabilistic approach. Algorithmica (to appear) 9. Naor, M., Schulman, L., Srinivasan, A.: Splitters and near-optimal derandomization. In: Proceedings of the 36th Annual Symposium on Foundations of Computer Science (FOCS 1995), pp. 182–190 (1995) 10. Mathieson, L., Prieto, E., Shaw, P.: Packing edge disjoint triangles: A parameterized view. In: Downey, R.G., Fellows, M.R., Dehne, F. (eds.) IWPEC 2004. LNCS, vol. 3162, pp. 127–137. Springer, Heidelberg (2004)
Indistinguishability and First-Order Logic Skip Jordan and Thomas Zeugmann Division of Computer Science Hokkaido University, N-14, W-9, Sapporo 060-0814, Japan {skip,thomas}@ist.hokudai.ac.jp
Abstract. The “richness” of properties that are indistinguishable from first-order properties is investigated. Indistinguishability is a concept of equivalence among properties of combinatorial structures that is appropriate in the context of testability. All formulas in a restricted class of second-order logic are shown to be indistinguishable from first-order formulas. Arbitrarily hard properties, including RE-complete properties, that are indistinguishable from first-order formulas are shown to exist. Implications on the search for a logical characterization of the testable properties are discussed. Keywords: Property testing, logic, graph theory, descriptive complexity.
1
Introduction
In property testing we are interested in efficiently deciding whether a given structure possesses, or is far from possessing, a desired property. Although algorithmic efficiency is often defined as polynomial time, this is not always ideal. For example, we may not have an explicit representation of the input but instead only the ability to query an oracle for individual bits. These queries may be expensive and even a linear-time algorithm, which requires us to compute the entire input explicitly, may be unacceptable. If we are only concerned about distinguishing with high probability between the input satisfying and being far from satisfying a property, a sub-linear number of queries may be sufficient. Testers are randomized algorithms that, given the size of the input, are restricted to a certain number of queries which does not depend on the input size. We shall give formal definitions below, but mention here that such algorithms are probabilistic approximation algorithms. In a certain sense, the whole area can be traced back to Freivalds [14] who introduced a program result checker for matrix multiplication over a finite field. Subsequently, the study of property testing originally arose in the context of program verification (see Blum et al. [7] and Rubinfeld and Sudan [22]), and the first explicit definition appears to be in [22]. For surveys of the field see e.g., Fischer [12] and Ron [21]. Characterizing the testable properties has been called the most important problem in the area (see Alon and Shapira [5]). The class of regular languages was shown to be testable in Alon et al. [4], a result that was extended in Chockler and Kupferman [8]. However, there are context-free languages that are not M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 94–104, 2008. c Springer-Verlag Berlin Heidelberg 2008
Indistinguishability and First-Order Logic
95
testable [12]. It is then perhaps at first surprising that there are many natural, testable properties that are considerably harder. Testers for NP-complete graph properties including k-color were given in Goldreich et al. [16]. Restricting ourselves to characterizations of testable graph properties, the first step towards a logical characterization was obtained by Alon et al. [2], later extended by Fischer [13]. They show that all properties expressible in first-order logic (FO) with all quantifier alternations of type “∃∀” are testable, whereas there exists a property expressible with a single quantifier alternation of type “∀∃” that is not testable. It is useful to note the equivalence of first-order logic with arithmetic and uniform AC0 (see Barrington et al. [6]). Later, a characterization of the graph properties testable with one-sided error by algorithms that are unaware of the input size was given in [5], a result that was extended to hypergraphs by R¨ odl and Schacht [20]. An exact combinatorial characterization of the graph properties testable with a constant number of queries has been obtained by Alon et al. [3]. In the present paper we focus on a question raised in [13]: the expressive power of first-order logic in the context of indistinguishability and testing. The concept of indistinguishability was introduced by Alon et al. [2] as a suitable form of equivalence in the context of testing. First-order logic with arithmetic is equivalent to uniform AC0 , and so it is strictly contained in NP (see Furst et al. [15]). Therefore, properties that are complete for NP under first-order reductions such as k-color cannot be expressed in FO (see Allender et al. [1]). However, it has been noted in Alon et al. [2] that there are first-order expressible properties that are indistinguishable from such properties, including k-color. In this sense, the descriptive power of first-order logic with indistinguishability is surprisingly rich. We examine the set of properties that are indistinguishable from FO-expressible properties and show that this set is larger than was previously known. The paper is organized as follows. We begin by giving more formal definitions. In Section 3 we show that all graph properties expressible in a restriction of monadic second-order existential logic (MSO∃) are indistinguishable from FO properties. We next prove that there are arbitrarily-hard properties, including RE-complete properties, that are indistinguishable from FO properties (cf. Section 4). In fact, we can construct arbitrarily-hard testable properties. Finally, we discuss the implications of the results obtained (cf. Section 5).
2
Preliminaries
We generally restrict our attention to graph properties, and so give the following definitions for graphs. We consider only finite, undirected graphs without loops. We use G and H to refer to graphs, n = |G| as the number of vertices in a graph, and P , Q and R to refer to properties. Let G and G be two graphs having the same vertex set and let ε > 0. If G can be constructed from G by adding and removing no more than εn2 edges then we say that G and G differ in no more than εn2 places.
96
S. Jordan and T. Zeugmann
Definition 1 (Alon et al. [2]). Let P be a property of graphs and let ε > 0. (1) A graph G with n vertices is called ε-far from satisfying P if no graph G with the same vertex set, which differs from G in no more than εn2 places, satisfies P . (2) An ε-test for P is a randomized algorithm which, given the quantity n and the ability to make queries whether or not a desired pair of vertices of an input graph G with n vertices are adjacent, distinguishes with probability at least 23 between the case of G satisfying P and the case of G being ε-far from satisfying P . Note that in Definition 1 the choice of 23 is of course traditional and arbitrary. Any probability strictly greater than 12 can be chosen and the resulting test can be iterated a constant number of times to achieve any desired accuracy strictly less than one, see e.g., Hromkoviˇc [19]. Definition 2 (Alon et al. [2]). The property P is called testable if for every fixed ε > 0 there exists an ε-test for P whose total number of queries is bounded only by a function of ε, which is independent of the size of the input graph. We allow the tester to know the size of the input, and to make its queries in any computable fashion. In [17] this was shown to be equivalent to the “nonadaptive” model, where a tester uniformly chooses a set of vertices, receives the induced subgraph, and makes a decision based on whether that subgraph has some fixed property. We note that this definition of testability is not an o(n) number of queries and that “ε-far” clearly depends on n. In general we do not require a uniformity condition. However, it is very natural to require the ε-tests for P in the above definition to be computable given ε. We refer to properties satisfying this additional condition as uniformly testable where the uniform and non-uniform cases differ (Proposition 1). We note that testers given in the literature are generally presented as a single algorithm that takes ε as a parameter and therefore uniform. Definition 3 (Alon et al. [2]). Two graph properties P and Q are called indistinguishable if for every ε > 0 there exists N = N (ε) satisfying the following. For every graph G having n > N vertices that satisfies P there exists a graph G with the same vertex set, differing from G in no more than εn2 places, which satisfies Q; and for every graph H with n > N vertices satisfying Q there exists a graph H with the same vertex set, differing from H in no more than εn2 places, which satisfies P . As notation, we use φ, ψ and γ to refer to logical formulas. Whether these formulas are first- or second-order will be clear from context. We use Φ to denote a logical interpretation of variables. First-order variables are denoted by xi , yi and ti while second-order variables are denoted by Ci . Members of the universe (nodes in the graph) are referred to as ui when we wish to distinguish between variables and the nodes bound to them. We write E(x, y) to denote the predicate
Indistinguishability and First-Order Logic
97
“there is an edge between x and y.” Since we consider only finite, undirected graphs without loops, it follows that E(x, y) implies E(y, x) for all x, y and that E(x, x) is false for all x. Logical structures such as graphs allow us to interpret predicate symbols. The combination of structures and interpretations, e.g., (G, Φ), then allows us to interpret formulas. We write G |= φ, read G models φ, if formula φ holds when interpreting the edge predicate according to G. Inductively we use interpretations to interpret bound variables, and write (G, Φ) |= φ if formula φ holds when interpreting bound variables according to Φ and the edge predicate (assuming it isn’t bound by a second-order quantifier) according to G. For a more formal introduction to the logics used, see e.g., Enderton [9]. We assume that ordering and arithmetic are not present in the logics considered, however ordering and arithmetic can be defined in logics containing existential second-order. In proofs we treat the universal quantifier (∀) as the dual of the existential quantifier (∃). Although properties are often implicitly assumed to be computable, we shall see that there are implications to this assumption, and so explicitly state that we do not make this assumption. Finally, we define DTIME(f (n)) in the usual manner as the set of decision problems computable on a deterministic Turing machine in f (n) steps.
3
Monadic Second-Order Existential Logic
In this section we show that all graph properties expressible in a restriction of monadic second-order existential logic (see Definition 4 below) are indistinguishable from FO properties. Definition 4. Let rMSO∃ denote the graph properties expressible in monadic second-order existential logic that satisfy the following. For all P ∈rMSO∃ expressible with r second-order quantifiers, there exists an N such that in all graphs G satisfying P with n > N vertices (1) there exists a set of r vertices such that removing them does not affect P , and (2) adding r disconnected vertices to the graph does not affect P . Note that rMSO∃ contains natural problems such as k-color. The first restriction is similar to but weaker than hereditariness. The proof of the following theorem is a generalization of a result regarding the indistinguishability of k-color from FO properties (see, e.g., Alon et al. [2]). Theorem 1. All properties expressible in rMSO∃ are indistinguishable from properties expressible in FO. Proof. Let P be the rMSO∃ property expressed by formula φ := ∃C1 C2 . . . Cr ψ, where ψ is first-order. We construct a first-order formula φ expressing property Q such that P and Q are indistinguishable. φ := ∃t1 t2 . . . tr ψ ,
98
S. Jordan and T. Zeugmann
where the symbols ti do not occur in ψ (if they do occur, simply rename them). Formula ψ is derived from ψ with the following changes: 1. Replace all 2. Replace all ∃x : γ with ∀x : γ with
occurrences of Ci (x) with E(ti , x). quantifiers with restricted quantifiers: ∃x : (x = t1 ∧ . . . ∧ x = tr ) ∧ γ and ∀x : (x = t1 ∧ . . . ∧ x = tr ) → γ.
First, let ε > 0 be arbitrary and let G model φ. Assume |G| = n > N . Choose the set of r vertices guaranteed to exist by Restriction (1) of Definition 4, call them u1 . . . ur and remove them. Call this graph G− . Take an interpretation Φ under which G− models φ. Replace the ui as disconnected vertices. Connect ui and x iff Ci (x) holds in Φ. Call the resulting graph G . Note that we have changed at most r(n − 1) edges, which is less than εn2 for sufficiently large n. Claim 1. Graph G models φ . Proof. We construct a satisfying interpretation Φ from Φ. The only change is to bind ti to ui . Recall that the ti appear only where we added them above and thus the ui are only referred to in these contexts. Note that: – (G− , Φ) |= x = y ⇐⇒ (G , Φ ) |= x = y, because the ui cannot appear here and all other members are retained. – (G− , Φ) |= Ci (x) ⇐⇒ (G , Φ ) |= E(ti , x), by construction. – Logical operators ∧, ¬ are preserved inductively. – (G− , Φ) |= ∃x : γ ⇐⇒ (G , Φ ) |= ∃x : (x = t1 ∧ . . . ∧ x = tr ) ∧ γ, because G− does not contain the vertices referred to by the ti . Therefore, G models φ and thus has the first-order property Q.
Claim 1
Next, let ε > 0 be arbitrary and assume that graph H models φ . Assume further that |H| > N . Let Φ be an interpretation satisfying ψ . Let the vertices bound to the ti be called ui . Recall that because of the restricted quantifiers in φ , the ui are referred to only as ti , and the ti only occur where we explicitly added them. Remove the ui from H and call this graph H − . Next, re-add the ui as r isolated vertices and call this graph H . We claim that both H − and H model φ, and construct a satisfying interpretation Φ from Φ . Because of Restriction (2) in Definition 4, it is sufficient to prove that H − models φ as adding r isolated vertices will not affect property P . We set Ci (x) to be true in Φ iff there is an edge between ui and x in H. Claim 2. (H − , Φ) |= φ. Proof. Note that: – (H − , Φ) |= x = y ⇐⇒ (H, Φ ) |= x = y, because x and y cannot be bound to ui on the right. – (H − , Φ) |= Ci (x) ⇐⇒ (H, Φ ) |= E(ti , x), by construction.
Indistinguishability and First-Order Logic
99
– Logical operators ∧, ¬ are preserved inductively. – (H − , Φ) |= ∃x : γ ⇐⇒ (H, Φ ) |= ∃x : (x = t1 ∧ . . . ∧ x = tr ) ∧ γ, because H − does not contain the vertices referred to by the ti . Claim 2
Therefore, H − has property P and by Restriction (2) of Definition 4, H does too. We have changed at most r(n−1) edges, which is less than εn2 for sufficiently large n. Properties P and Q are therefore indistinguishable.
4
Hard Properties
In the previous section, we showed that every property expressible in a restriction of monadic second-order existential logic is indistinguishable from some FO-expressible property. All properties expressible in this logic are contained in NP by Fagin’s [10] theorem. For graphs it is known that MSO∃ is strictly less expressive than SO∃: there are graph properties in NP that are not expressible in MSO∃ (see, Fagin et al. [11]). In this section we continue our study of the set of properties indistinguishable from FO-expressible properties, and show that it contains much harder properties. We show that this set contains uncomputable properties, and also, for every f (n), computable (and testable) properties that are not computable in time f (n). In this sense, the power of first-order logic in the context of indistinguishability is even larger than previously known. However, we also show that there exist computable properties that are distinguishable from every first-order expressible property. In the next proof we use the concept of RE-completeness. A decision problem is in RE (recursively enumerable) iff there exists a Turing machine that halts and accepts all positive instances and does not accept negative instances. The machine is not required to halt on negative instances. A decision problem D is RE-complete iff it is in RE and all other problems in RE can be decided by machines given an oracle for D. The halting problem is the canonical REcomplete problem. Theorem 2. There exists an RE-complete graph property that is indistinguishable from a FO-expressible property. Proof. We define property P such that graph G satisfies it iff (1) there is a vertex i such that all edges are incident to i, and (2) taking the degree d of vertex i as the number of a Turing machine Md in some canonical enumeration (Mi )i∈N of Turing machines where every Turing machine appears at least once, provided Md halts on the empty string. If Md does not halt on the empty string, then graph G does not have property P . For convenience, we require that machine M0 in the enumeration halts on the empty string. We shall show that P is an RE-complete property that is indistinguishable from the FO-expressible property of being an empty graph.
100
S. Jordan and T. Zeugmann
Lemma 1. P is RE-complete. Proof. Let HALT={a | Ma halts on the empty string}. We show that P is REcomplete by reducing HALT to it. On input a, representing Turing machine Ma in the enumeration, we output a graph on a+1 vertices. There is an edge between vertices i and j iff exactly one of them is zero. All edges are then incident to the zero vertex, and as the degree of vertex zero is n − 1 = a, the graph has property Lemma 1
P iff machine Ma halts on the empty string. Lemma 2. P is indistinguishable from the FO property of being an empty graph (∀x, y : ¬E(x, y)). Proof. Assume graph G satisfies property P and let ε > 0 be arbitrary. Let i denote the vertex that is mentioned in Property (1) of the Theorem, and let d be its degree. Remove the d ≤ n − 1 edges incident to i. By assumption, all edges in G were incident to i, and so the resulting graph is empty. For n sufficiently large, n − 1 < εn2 . Now assume graph G is an empty graph. By assumption, M0 halts. Choose an arbitrary vertex i and note that all zero edges are incident to it. Consequently, G has property P . We have not changed any edges, and therefore 0 < εn2 for all non-zero n. Lemma 2
Lemma 1 and 2 directly yield the theorem.
The following proposition is essentially obvious: an ε-tester is a probabilistic machine. It remains only to mention that we can, by choosing ε > 0 appropriately as a function of n, remove the “approximation” in “probabilistic approximation algorithm.” Of course, we then make a number of queries that depends on the input size. It is important to note that this proposition does not hold in the non-uniform case. Proposition 1. All uniformly testable graph properties can be decided by a probabilistic Turing machine with success probability at least 23 . Proof. Assume graph property P is testable. We construct a probabilistic machine deciding P . On input G of size n, choose ε such that εn2 < 1, for example, ε = n21+1 . Run the ε-test on G and output the result. Because εn2 < 1, being at most ε-far from G implies that we may not change any edges. We therefore distinguish with probability at least 23 between the case of G satisfying P and G not satisfying P .
Corollary 1. All uniformly testable properties are decidable by probabilistic machines. Proof. Simply modify εn2 in the proof to the appropriate definition of ε-far for the vocabulary in question.
The following also follows immediately, as randomization does not allow us to compute uncomputable functions. We provide a proof sketch for completeness.
Indistinguishability and First-Order Logic
101
Corollary 2. All uniformly testable properties are recursive. Proof. We convert the probabilistic machine into a deterministic machine using the following generic construction. All probabilistic machines can be modified such that their randomness is taken from a special binary “random tape” that is randomly fixed when the machine is started, in which each digit is 0 or 1 with equal probability. All halting probabilistic machines must eventually halt, regardless of the random choices made. We can then simulate the machine over all initial segments of increasing lengths, keeping track of “accepting,” “rejecting” and “still running” states. Once any given segment has halted, all random strings beginning with that initial segment must also halt. Therefore, the percentage of halting paths is increasing, and we shall eventually reach a length such that at least 70% of the paths have halted. Our error probability is at most 13 , strictly less than half of 70% and so we can output the decision of the majority of the halting paths.
Theorem 3 (Alon et al. [2]). Every first-order property P of the form ∃x1 , . . . , xt ∀y1 , . . . , ys : A(x1 , . . . , xt , y1 , . . . , ys ), where A is quantifier-free, is testable. Note that our RE-complete property P defined in the proof to Theorem 2, is indistinguishable from a first-order property of this form (t = 0). Theorem 4 (Alon et al. [2]). If P and Q are indistinguishable graph properties, then P is testable if and only if Q is testable. However, we now see a contradiction in the uniform case. Our RE-complete property P , defined in the proof to Theorem 2, is indistinguishable from a FOproperty that, according to Theorem 3, is testable (which it is). Theorem 4 then implies that this RE-complete property P is also testable, which contradicts Corollary 2. Therefore, Theorem 4 is not strictly correct in the uniform case. The proof given in [2] assumes that if input G has strictly less than N = N (ε) vertices, there exists a decision procedure that gives “accurate output according to whether it satisfies” the property in question. Of course, no such (uniform) procedure exists for RE-complete properties. Theorem 4 holds in the uniform case when restricted to recursive properties. We can however use a similar construction to Theorem 2 to obtain the following, restricting ourselves to recursive properties. By the time hierarchy theorem, for every computable f (n) there exist computable properties that cannot be decided in time f (n), see Hartmanis and Stearns [18]. We define arbitrarily-hard properties to be any set of computable properties that “for each computable f (n), contains properties that cannot be computed in DTIME(f (n)).” It is of course possible to use other complexity measures. We have restricted these sets to computable properties and so they are obviously infinite.
102
S. Jordan and T. Zeugmann
The reduction in the following proof increases the input length by an exponential factor. However, because we are interested in arbitrarily-hard properties and by the time hierarchy theorem, we can choose Q such that it is not computable in, e.g., DTIME(2f (n) ). Then, after the input length is increased exponentially, we have a property that cannot be computed in DTIME(f (n)). This can be done for all f (n). Theorem 5. There are arbitrarily-hard testable properties. Proof. Let Q be an arbitrarily-hard property. We define property R such that Q is reducible to R. Let q(x) be the characteristic function for Q with an appropriate encoding of the input. Similar to Theorem 2, we define R to hold in graph G of size n iff 1. there is a vertex i such that all edges are incident to i, 2. and either the degree of i is zero or q(n) = 1. We can obviously reduce Q to R by computing the encoding x, and outputting a graph on x vertices with one edge. We can therefore construct arbitrarily hard properties R. Using the same proof as that used for Theorem 2, we see that all such R are also indistinguishable from the empty graph. The property of being the empty graph is testable by Theorem 3, and so R is testable by Theorem 4, if it is decidable. We can therefore construct arbitrarily hard, testable properties.
Computable graph properties that are distinguishable from all first-order properties do exist however. We show the following by a simple diagonalization argument. We define distinguishable as “not indistinguishable.” Theorem 6. There exist computable graph properties that are distinguishable from all first-order properties. Proof. We let first-order formula φi denote the i’th formula in some enumeration of first-order formulas on graphs, in which all such formulas occur infinitely often. We define property Q such that G has property Q iff φ|G| does not hold on any graph with |G| = n vertices. We show that Q is distinguishable from all first-order properties by contradiction. Assume that first-order ψ expresses a property that is indistinguishable from Q. Find the first i > N such that ψ = φi . There are two cases. First, assume that there is a graph G on i vertices such that G satisfies ψ. Then, there is no graph on i vertices with property Q, by construction. We therefore cannot obtain a graph G on i vertices with property Q by changing at most εn2 edges in G, because no such graph exists. There must then be no such graph G on i vertices satisfying ψ. In this case, by definition all graphs on i vertices have property Q. Taking any of them, we see that we cannot obtain a graph on i vertices satisfying ψ by modifying at most εn2 edges, because again no such graph exists. There must then be no first-order ψ expressing a property indistinguishable from Q.
Indistinguishability and First-Order Logic
5
103
Discussion
The descriptive power of first-order logic with indistinguishability is surprisingly large. As we have seen, we can construct testable properties of arbitrary hardness. However, as seen in [2], there are problems in FO with quantifier alternations “∀∃” that are not testable. So, although the class of testable properties contains arbitrarily hard properties, it does not strictly contain uniform AC0 or even the context-free languages. In this sense, it is a rather odd class. A complete, logical characterization of the testable properties must then not contain FO entirely, but must contain arbitrarily hard properties. However, in the uniform case we have seen that all testable properties are recursive, and so a characterization of the uniformly testable properties must not be able to express e.g. RE-complete properties. We also believe that the distinction between the uniformly testable and nonuniformly testable properties is important. In particular, given our motivation of searching for a class that is very efficiently computable, it seems undesirable to admit uncomputable properties. In this sense, the uniform case is preferable (Proposition 1). It would also be worthwhile to consider other possible definitions of uniformity. Acknowledgements. We wish to thank Osamu Watanabe for an inspiring discussion regarding the importance of uniformity conditions.
References [1] Allender, E., Balc´ azar, J.L., Immerman, N.: A first-order isomorphism theorem. SIAM J. Comput. 26(2), 539–556 (1997) [2] Alon, N., Fischer, E., Krivelevich, M., Szegedy, M.: Efficient testing of large graphs. Combinatorica 20(4), 451–476 (2000) [3] Alon, N., Fischer, E., Newman, I., Shapira, A.: A combinatorial characterization of the testable graph properties: It’s all about regularity. In: STOC 2006: Proceedings of the 38th Annual ACM Symposium on Theory of Computing, pp. 251–260. ACM, New York, NY, USA (2006) [4] Alon, N., Krivelevich, M., Newman, I., Szegedy, M.: Regular languages are testable with a constant number of queries. SIAM J. Comput. 30(6), 1842–1862 (2001) [5] Alon, N., Shapira, A.: A characterization of the (natural) graph properties testable with one-sided error. In: Proceedings, 46th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2005, pp. 429–438. IEEE Computer Society Press, Los Alamitos (2005) [6] Barrington, D.A.M., Immerman, N., Straubing, H.: On uniformity within NC1. J. of Comput. Syst. Sci. 41(3), 274–306 (1990) [7] Blum, M., Luby, M., Rubinfeld, R.: Self-testing/correcting with applications to numerical problems. J. of Comput. Syst. Sci. 47(3), 549–595 (1993) [8] Chockler, H., Kupferman, O.: ω-regular languages are testable with a constant number of queries. Theoret. Comput. Sci. 329(1-3), 71–92 (2004) [9] Enderton, H.B.: A Mathematical Introduction to Logic, 2nd edn. Academic Press, London (2000)
104
S. Jordan and T. Zeugmann
[10] Fagin, R.: Generalized first-order spectra and polynomial-time recognizable sets. In: Karp, R.M. (ed.) Complexity of Computation, SIAM-AMS Proceedings, vol. VII, Amer. Mathematical Soc., pp. 43–73 (1974) [11] Fagin, R., Stockmeyer, L.J., Vardi, M.Y.: On monadic NP vs. monadic co-NP. Inform. Comput. 120(1), 78–92 (1995) [12] Fischer, E.: The art of uninformed decisions. Bulletin of the European Association for Theoretical Computer Science 75(97) (October 2001); Columns: Computational Complexity [13] Fischer, E.: Testing graphs for colorability properties. Random Struct. Algorithms 26(3), 289–309 (2005) [14] Freivalds, R.: Fast probabilistic algorithms. In: Becvar, J. (ed.) MFCS 1979. LNCS, vol. 74, pp. 57–69. Springer, Heidelberg (1979) [15] Furst, M.L., Saxe, J.B., Sipser, M.: Parity, circuits, and the polynomial-time hierarchy. Mathematical Systems Theory 17(1), 13–27 (1984) [16] Goldreich, O., Goldwasser, S., Ron, D.: Property testing and its connection to learning and approximation. J. ACM 45(4), 653–750 (1998) [17] Goldreich, O., Trevisan, L.: Three theorems regarding testing graph properties. Random Struct. Algorithms 23(1), 23–57 (2003) [18] Hartmanis, J., Stearns, R.E.: On the computational complexity of algorithms. Transactions of the American Mathematical Society 117, 285–306 (1965) [19] Hromkoviˇc, J.: Design and Analysis of Randomized Algorithms: Introduction to Design Paradigms. Springer, Heidelberg (2005) [20] R¨ odl, V., Schacht, M.: Property testing in hypergraphs and the removal lemma. In: STOC 2007: Proceedings of the 39th Annual ACM Symposium on Theory of Computing, pp. 488–495. ACM, New York, NY, USA (2007) [21] Ron, D.: Property testing ch. 15. In: Rajasekaran, S., Pardalos, P.M., Reif, J.H., Rolim, J. (eds.) Handbook of Randomized Computing, vol. II, ch. 15, pp. 597–649. Kluwer Academic Publishers, Dordrecht (2001) [22] Rubinfeld, R., Sudan, M.: Robust characterizations of polynomials with applications to program testing. SIAM J. Comput. 25(2), 252–271 (1996)
Derandomizing Graph Tests for Homomorphism Angsheng Li1, and Linqing Tang1,2, 1
State Key Lab. of Computer Science, Institute of Software, Chinese Academy of Sciences 2 Graduate University of Chinese Academy of Sciences {angsheng,linqing}@ios.ac.cn
Abstract. In this article, we study the randomness-efficient graph tests for homomorphism over arbitrary groups (which can be used in locally testing the Hadamard code and PCP construction). We try to optimize both the amortized-tradeoff (between number of queries and error probability) and the randomness complexity of the homomorphism test simultaneously. For an abelian group G = Zm p , by using the λ-biased set S of G, we show that, on any given bipartite graph H = (V1 , V2 ; E), the graph test for linearity over G is a test with randomness complexity |V1 | log |G| + |V2 |O(log |S|), query complexity |V1 | + |V2 | + |E| and error probability at most p−|E| + (1 − p−|E| ) · δ for any f which √ δ 2 −λ
)-far from being affine linear. For a non-abelian is 1 − p−1 (1 + 2 group G, we introduce a random walk of some length, say, on expander graphs to design a probabilistic homomorphism test over G with randomness complexity log |G| + O(log log |G|), query complexity 2 + 1 δ 2 2 for any f and error probability at most 1 − (δ + δ 2 2 − δ 2 ) + 2ψ(λ, ) which is 2δ/(1 − λ)-far from being affine homomorphism, here ψ(λ, ) = j−i−1 . 0≤i<j≤−1 λ
1
Introduction
For any two finite groups G and Γ , a homomorphism is a function f : G → Γ such that for every g1 , g2 ∈ G we have f (g1 · g2 ) = f (g1 ) · f (g2 ). If G and Γ are abelian groups, we may also call f as linear function. An affine homomorphism is a function f such that f (0)−1 ·f is a homomorphism, here 0 is the unit element of group G. For two functions f and g, define the distance d(f, g) = P rx∈G [f (x) = g(x)]. A homomorphism test, which tests the proximity of a function to a family of homomorphism functions, was first raised by Blum, Luby and Rubinfeld [2]. In the last decade, some interesting results about the test focused on two different objects have been achieved: one is the graph test of Samorodnitsky and Trevisan [4] which tried to optimize the tradeoff between the number of queries and the
The authors are partially supported by NSFC grant number 60325206, and NSFC Grand International Joint Project No. 60310213.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 105–115, 2008. c Springer-Verlag Berlin Heidelberg 2008
106
A. Li and L. Tang
error probability of the test. Another is the randomness-efficient version of linearity test of Ben-Sasson et al [11], which tried to save the randomness used by the test. 1.1
The Affine Homomorphism Testing
A (δ, , q, r)-test for (affine) homomorphisms from G to Γ is a test which queries q values of f , uses r random bits, always accepts homomorphism f and rejects any f which is -far from being a (affine) homomorphism with probability at least δ. Blum, Luby and Rubinfeld [2] first gives a (δ, 9δ/2, 3, 2 log |G|)-test (conditioned on δ < 2/9) for functions over abelian groups, so we always call the basic linearity test BLR-test. The BLR-test just picks two random elements x, y from the abelian group G and checks whether f (x)f (y) = f (xy). Ben-Or et al [1] extended the result and showed that the test works for functions over any general group, they showed that: Theorem 1. For any δ < 2/9, there is a (δ, τ, 3, 2 log |G|)-test for homomorphism over any group G, where τ is the smaller root of 3x − 6x2 = δ. Note that τ = δ/3 + O(δ 2 ) and the analysis is nearly optimal since the best possible τ is at least δ/3. 1.2
The Graph Test for Linearity
In order to reuse the queried bits and get better tradeoff between the number of queries and the error probability of the test, Samorodnitsky and Trevisan [4] introduced a notion of graph test and obtained very strong results for linearity test (for PCP construction too). Given a graph H = (V, E), the graph test for linearity chooses a random element from the group G = Zm p for each vertex v ∈ V , implements a basic BLR-test on each edge (x, y) = e ∈ E using the values of f at x, y and xy. Samorodnitsky and Trevisan [4] used Fourier analysis and proved (the analysis was simplified by H˚ astad and Wigderson [6] later): Theorem 2. For a graph H = (V, E), if the graph test for linearity over abelian −|E| + , then |fˆα | ≥ for group G = Zm p accepts f : G → μp with probability p m −1 some α ∈ Zp . In particular, f has agreement ≥ p (1 + /2)-with some affine linear function. Moreover, the test queries |V | + |E| values from f and uses |V |log|G| random bits. 1.3
The Randomness-Efficient Linearity Test
Reducing the number of random bits required by the homomorphism test is also a very intriguing problem. Several authors have supplied methods for this purpose. Ben-Sasson et al [11] used -biased sets to derandomize the BLR-test for linearity over abelian group G = Zm p , which uses only (1 + o(1)) log |G| random bits whereas the original test uses 2 log |G| random bits, while just lost a quite small quantity in soundness.
Derandomizing Graph Tests for Homomorphism
107
Theorem 3. For group G = Zm p and λ > 0, f : G → μp , let S be a λ-biased set in G, the test randomly chooses x ∈ G, y ∈ S and checks whether f (x)f (y) = √ f (xy). If P r( test accepts ) ≥ p1 + (1 − p1 ) · δ, then |fˆαa | ≥ δ 2 − λ for some a a α ∈ Zm p and 1 ≤ a ≤ p − 1, here f (x) = (f (x)) . In particular, f has agreement √
≥ p−1 (1 + δ 2−λ ) with some affine linear function. Moreover, the test queries 3 values of f and uses log|G| + log|S| random bits. 2
The main technique for analysis of Ben-Sasson et al [11] is Fourier analysis, which seems hard to be generalized to non-abelian groups. Recently Shpilka and Wigderson [3] extended the result of Ben-Sasson et al [11] to general groups by using combinatorial arguments and properties of expanding Cayley graphs. They showed: Theorem 4. For any group G, Γ and any λ > 0, let S ⊆ G be an expanding generating set with λ(Cay(G; S)) < λ, there is a δ, 4δ/(1−λ), 3, log|G|+log|S| test for affine homomorphism from G to Γ , given that 12δ/(1 − λ) < 1. 1.4
Our Results
For the linearity test over abelian groups, Samorodnitsky and Trevisan [4] and Ben-Sasson et al [11] tried to optimize different parameter but fail to consider the other one. A natural question asks how about trying to optimize both the amortized-tradeoff (between number of queries and error probability) and the randomness complexity simultaneously. In this paper we study this problem, we extend the result of [4] to a derandomized version. We show the following Theorem 5. For any δ > 0, group G = Zm p with a λ-biased set S and bipartite graph H = (V1 , V2 ; E), there is a derandomized graph test for f : G → μp on graph H, if the test accepts f with probability p−|E| + (1 − p−|E| ) · δ, then f has √ 2 agreement ≥ p−1 (1 + δ 2−λ ) with some affine linear function. Moreover, the test queries |V1 | + |V2 | + |E| values from f and uses |V1 | log |G| + |V2 | log |S| random bits. Note that for group G = Zm p , λ-biased set of size O(log |G|) can be efficiently constructed (see [13], [12]). Specifically, when graph H is a star graph with k leaves, we only use log |G| + k log log |G| random bits compared to the original graph test uses (k + 1) log |G| random bits. We are able to extend this result to graph test on graphs which is the union of a bipartite graph H = (V1 , V2 ; E ) and the clique graph over V1 easily. We also try to extend the derandomized graph test for homomorphism to general groups. We partially realized this goal. The main technique we use is the random walk of length over an expanding Cayley graph, and for each edge on the walk, we do a basic homomorphism test. is a parameter to be fixed later. It can be thought that the test is a graph test on a path.
108
A. Li and L. Tang
Theorem 6. For groups G, Γ , δ > 0 and a symmetric subset S of G. Let X0 be chosen uniformly at random and X0 , ..., X be a random walk on the expanding Cayley graph Cay(G; S) starting at X0 , the test checks whether f (xi )f (x−1 i xi+1 ) = f (xi+1 ) for each edge (Xi , Xi+1 ). Then if P r( test accepts f ) ≥ 1 −
δ 2 2 (δ + δ 2 2 − δ 2 ) + 2ψ(λ, )
2δ -close to some affine homomorphism, here ψ(λ, ) = 0≤i<j≤−1 1−λ λj−i−1 . Moreover, the test queries 2 + 1 values from f and uses log |G| + O(log log |G|) random bits. then f is
In order to keep the randomness complexity of the test to be (1 + o(1)) log |G|, we may let = O(log log |G|). Note that the test of Shpilka and Wigderson [3] is just the case of = 1. We remark that the error probability bound of derandomized homomorphism test of Shpilka and Wigderson [3] can be improved slightly. We have (compare to Theorem 4): Theorem 7. For any groups G, H ,δ > 0 and let S ⊆ G be an expanding generating set with λ(Cay(G; S) < λ, the test that picks uniformly an edge e = −1 (x, y) in Cay(G; S) and checks whether f (x)f (x y) = f (y) is a (δ, 2δ/(1 − λ)), 3, log |G| + O(log log |G|) -test for homomorphism given 6δ/(1 − λ) < 1. By using a little more random bits (but still log |G| + O(log log |G|)), we can decrease the part corresponding to λ in the error probability exponentially. It’s easy to have Theorem 8. For any groups G, H ,δ > 0 and let S ⊆ G be an expanding generating set with λ(Cay(G; S) < λ, the test that picks uniformly a random walk of lengh on Cay(G; S) (with starting and ending points x and y respec−1 − λ )), 3, log |G| + tively) and checks whether f (x)f (x y) = f (y) is a (δ, 2δ/(1 O(log log |G|) -test for homomorphism given 6δ/(1 − λ ) < 1.
2
Preliminaries
In this section, we give some necessary background. 2.1
Fourier Transformation over Abelian Groups, λ-Biased Sets
Let G be an Abelian group with |G| = k. Let F = f |G → C∗ where C∗ is the multiplicative group of complex numbers. We define the standard inner product over F : f (x)g(x) f, g = |G|−1 x∈G ∗
where a is the conjugation of a in C .
Derandomizing Graph Tests for Homomorphism
109
A character of G is a homomorphism χ : G → C∗ , it is easy to verify that the set of characters forms an orthonormal basis for the vector space F , so for any f ∈ F, we can represent it as fˆχ χ f= χ
where fˆχ = f, χ is the Fourier coefficient corresponding to χ. Let G be a group, a set S ⊆ G is called symmetric if it is closed under inverses, that is s ∈ S iff −s ∈ S. Definition 1. For G a finite abelian group, let S ⊆ G be a symmetric multiset. We call S λ-biased if for all nontrivial characters χ, we have χ(x)| ≤ λ |S|−1 | x∈S −1 We S = |S| define inner product f, g
note that G is 0-biased set. We can S f, f S on vector space C similar to x∈S f (x)g(x) and 2 -norm f S = that in G. Naor and Naor [13] first defined and gave a construction of λ-biased sets. Since then many other constructions appeared for various groups (see [13], [12]).
2.2
Random Walk on Expanding Cayley Graphs
Let H = (V, E) be a graph on n vertices. Let AH be its adjacency matrix. For two sets A, B ∈ V denote E(A, B) = {(u, v)|u ∈ A, v ∈ B}. Let e(A, B) = |E(A, B)|. Denote with λ1 ≥ ... ≥ λn the normalized eigenvalues of AH . If G is a d-regular graph, then λ1 = 1 . Let λ(H) = max(λ2 , |λn |) We denote λ = λ(H) sometimes when H is obviously known. Definition 2. An (n, d, α)-expanding graph is an n-vertice, d-regular graph with max(λ2 , |λn |) ≤ α . Definition 3. For G a finite group, let S ⊆ G be a symmetric generating set for G. Define the expanding Cayley graph Cay(G; S) = (V, E) as follows: let the vertices be the elements of G and the edges be the pairs (g, gs) for any g ∈ G and s ∈ S. It is easy to see that if (g1 , g2 ) ∈ E, then (xg1 , xg2 ) ∈ E for any x ∈ G.
110
A. Li and L. Tang
Lemma 1. [WX [12]] Given a group G of size n and λ < 1, there exists an algorithm running in time poly(n, λ), constructs a symmetric set S ⊆ G of size O(log n) such that λ(Cayley(G; S)) ≤ λ. For the relation between the expansion of H and λ(H), Alon and Chung [9] proved the following: Lemma 2. [expander mixing lemma] Given an (n, d, α)-expanding graph H = (V, E), for any two sets A, B ⊆ V we have d(|A| · |B|) e(A, B) − ≤ λ · d · |A| · |B| n Specifically, Shpilka and Wigderson showed that (as Corollary 2 in [3]) Lemma 3. Let H = (V, E) be an (n, d, α)-expanding graph, then: 1. For set A ⊆ V we have min(|A|, n − |A|) ≤
e(A, Ac ) 2 · 1−λ d
4δ ) · n in G if 2. There exists a connected component of size at least (1 − 1−λ 1−λ 12δ removing only 2δdn < 6 · dn edges from it (here conditioned on 1−λ < 1).
If we regard the expanding Cayley graph Cay(G; S) = (V, E) as a directed graph, i.e., there is an directed edge (y, ys) from y to ys for any y ∈ G and s ∈ S, then each vertex in Cay(G; S), since S is symmetric, has both out-degree and in-degree to be d. As usual, we say a directed graph is weakly connected if it is possible to reach any node starting from any other node by traversing edges in some direction (i.e., not necessarily in the direction they point). We remark that there is a similar result as Lemma 3 for the size of weakly connected component in Cay(G; S). 2δ ·n 1−λ in Cay(G; S) if removing only 2δdn directed edges from it (here conditioned on 6δ < 1). 1−λ We delay the proof of Lemma 4 to the full version. A random walk on an expanding Cayley graph H = (V, E) is a stochastic process (X0 , X1 , ...): chooses a vertex X0 from some initial distribution on V and Xi+1 uniformly from the neighbors of Xi . Recently, Dinur [8] proved: Lemma 4. There exists a weakly connected component of size at least
Proposition 1. Let H = (V, E) be a d-regular graph with λ(H) = λ. Let F ⊆ E be a set of edges, and let K be the distribution on vertices induced by selecting a random edge in F , and then a random endpoint. The probability p that a random walk that starts with distribution K takes the i + 1st step in F , is bounded by |F | i |E| + λ .
Derandomizing Graph Tests for Homomorphism
111
Corollary 1. Let H = (V, E) be a (n, d, λ)-graph and F ⊂ E be of size μ|E|, X0 , ...X is a random walk on G with X0 uniformly chosen from V , then for i < j, we have P r( edge (Xj , Xj+1 ) is in F | edge (Xi , Xi+1 ) is in F ) ≤ μ + λj−i−1 .
3
Derandomized Graph Test for Linearity in Abelian Groups
In this section, we prove Theorem 5. Let G = Zm p be an abelian group, S be a λ-biased set of G. f : G → μp is the function needs to be tested, where μp is the group of p’th roots of unity. Given a graph H = (V1 , V2 ; E) with |V1 | = k1 and |V2 | = k2 , the graph test (Test 1) works as follows: 1. For each vertex u ∈ V1 , chooses a random element xu from G uniformly. 2. For each vertex v ∈ V2 , chooses a random element yv from S uniformly. 3. For each edge e = (u, v) ∈ V1 × V2 , checks whether f (xu )f (yv ) = f (xu yv ). The test accepts iff it’s true for all edges in E. Now we analyze the accept probability of the above test. It is straightforward that the test always accept a linear function. Moreover, we have the following: Lemma 5. If Test 1 accepts f with probability at least p−|E| + (1 − p−|E| )δ, then there exists some character χ of G and some 1 ≤ a ≤ p − 1 such that √ fˆχa ≥ δ 2 − λ, here fˆχa = f a , χ and f a (x) = (f (x))a . We delay the proof of Lemma 5 to the appendix. Note that Ben-Sasson et al [11] have showed that (lemma 3.3 in [11]) √ √ 2 Lemma 6. If |fˆχa | ≥ δ 2 − λ, then f has agreement ≥ p−1 (1 + δ 2−λ ) with some affine function. Theorem 5 follows from the combining of the above two lemmas.
4 4.1
Derandomized Homomorphism Test by Random Walk on Expander Graphs Proof Theorem 7
Proof. We divided the proof into three claims. Define function φ(x) = P luralityy∈Gf (xy)f (y)−1 . Claim 1. For any x ∈ G, P ry∈G [f (xy)f (y)−1 = φ(x)] ≥ 1 − 2δ/(1 − λ). Proof. For a fixed x ∈ G, we know P ry∈G,s∈S [f (y)f (s) = f (ys)] = δ P ry∈G,s∈S [f (xy)f (s) = f (xys)] = δ.
112
A. Li and L. Tang
Construct a subgraph of Cay(G, S) as follows: delete edge y → ys from the Cay(G; S) (take it as directed graph) if f (y)f (s) = f (ys) or f (xy)f (s) = f (xys). By the above two equations, we delete at most 2δdn directed edges, by lemma 4 we know that the remaining graph H x contains a weakly connected component Cx of size at least 1 − 2δ/(1 − λ) n. We claim that for every two elements u, v ∈ Cx , we have f (xu)f (u)−1 = f (xv)f (v)−1 , then since |Cx | > |G|/2, claim 1 follows. For the last claimed equation, we may assume v = v1 , ..., vt = u be a path (do not consider the direction of edges) between v and u in Cx , since Cx is weakly connected, such path always exists. W.l.o.g., we may assume the direction is vi → vi+1 for edge vi , vi+1 , i.e., vi+1 = vi ·si for some si . Since the edge vi , vi+1 is in Cx , we know that f (vi )f (si ) = f (vi si ) = f (vi+1 ) and f (xvi )f (si ) = f (xvi si ) = f (xvi+1 ), then f (vi )−1 f (vi+1 ) = f (si ) = f (xvi )−1 f (xvi+1 ) and f (xvi )f (vi )−1 = f (xvi+1 )f (vi+1 )−1 . So f (xv)f (v)−1 = f (xu)f (u)−1 follows. Claim 2. φ is a homomorphism. Claim 3. There exists γ ∈ Γ such that P rx∈G [φ(x) = f (x)γ] ≥ 1 − 2δ/(1 − λ) . The proof of claim 2 and claim 3 is same as that in Shpilka and Wigderson [3]. We omit it here. We finish our proof by combining the three claims. 4.2
Proof Theorem 8
We do the following test: Choose a random vertex x in G, then independently choose elements s1 , ..., s from S, check whether f (x)f (s1 ·...·s ) = f (x·s1 ·...·s ), the test accepts if it is true. The above test may be thought as taking a random walk of length on Cayley graph Cay(G; S), then use the values of f at the starting point and the terminal point of the random walk to check whether the homomorphism condition is satisfied. We construct a new graph H = (V , E ) as follows: The vertex set V is still all the group elements. For each path of length from u to v in Cay(G; S), construct a corresponding edge (u, v) in E (parallel edges are permitted). Since graph Cay(G; S) is d-regular graph with λ(Cay(G; S)) = λ, it is easy to see that graph H is an (n, d , λ )-expander. The new test we defined is just taking a random edge from H , and tests whether it satisfies the homomorphism property. So we just need similar analysis to that in proving Theorem7, to prove that the test is a (δ, 2δ/(1 − λ ), 3, log|G| + O(log|S|)-test for affine homomorphism, which proves Theorem 8.
5 5.1
Derandomized Graph Test for Homomorphism in General Groups Proof of Theorem 6
Ben-or et al [1] proved that the BLR-test works over non-abelian groups too. Later Shpilka and Wigderson [3] used the Expanding Cayley graphs to derandomize the
Derandomizing Graph Tests for Homomorphism
113
homomorphism testing in general groups. However, derandomized graph test for homomorphism over non-abelian groups seems untouched till now. We will try to go further in this area. However, in order to maintain the quantity of random bits used by the test to be log |G| + O(log log |G|), we are only able to analyze the case that the test for homomorphism uses a random walk of length , queries + 1 values of f , and runs basic tests. For any group G, Γ , and subset S of G. Let X0 be chosen uniformly at random and X0 , ..., X be a random walk on the expanding Cayley graph Cay(G; S) starting at X0 . We define a homomorphism test (Test 2) as follows: checks whether f (xi )f (x−1 i xi+1 ) = f (xi+1 ) for each edge (Xi , Xi+1 ) met on the random walk. It may be viewed as a graph test for homomorphism on a path of length . First we introduce a useful lemma for our proof. −1 Lemma 7. [MU [7]] Let Xi be a 0-1 random variable and X = i=0 Xi , then P r[X > 0] ≥
−1 P r(Xi = 1) . E[X|Xi = 1] i=0
Now we proof the following key lemma for Theorem 6. Lemma 8. If P r( Test 2 accepts f ) ≥ 1−A(δ, λ), then f is 2δ/(1−λ)-close to δ 2 2 some affine homomorphism, where A(δ, λ) = and 2 2 (δ + δ − δ 2 ) + 2δψ(λ, ) −1 ψ(λ, ) = 0≤i<j≤−1 λj−i−1 = t=0 t · λ−1−t . Proof. Define random variables Xi for i = 0, ..., − 1 such that:
1 if f (xi )f (x−1 i xi+1 ) = f (xi+1 ) Xi = 0 otherwise −1 and X = i=0 Xi . Define set F = (x, xs) ∈ Cay(G; S)|f (x)f (s) = f (xs) and let |F | μ = |E(Cay(G;S)| , then we know E[Xi ] = P r(Xi = 1) = μ E[X] =
−1
E(Xi ) = μ
i=0
P r[ test rejects f ] = P r[X > 0] ≥
−1 −1 P r(Xi = 1) μ = E[X|X = 1] E[X|X i i = 1] i=0 i=0
≥ −1 i=0
2 μ E[X|Xi = 1]
114
A. Li and L. Tang
where the first inequality is from lemma 7 and the last inequality is from Jensen’s inequality. And −1 i=0
E[Xj=1 |Xi = 1]
i,j=0
−1
=
−1
E[X|Xi = 1] =
P r(Xj=1 |Xi = 1) = μ−1
i,j=0
−1
P r(Xj=1 , Xi = 1)
i,j=0
= μ−1 E[X 2 ]. Moreover E[X 2 ] =
−1
E[Xi Xj ] =
i,j=0
−1
E[Xi2 ] + 2
i=0
E[Xi Xj ]
0≤i<j≤−1
Since each Xi is 0-1 variable, so E[Xi2 ] = E[Xi ] = μ and for i < j E[Xi Xj ] = P r(Xi = 1) · P r(Xj = 1|Xi = 1) ≤ μ(μ + λj−i−1 ) where the last inequality is from corollary 1. It implies E[X 2 ] ≤ μ+2 μ(μ+λj−i−1 ) = (μ+μ22 −μ2 )+2μ 0≤i<j≤−1
Define ψ(λ, ) =
0≤i<j≤−1
λj−i−1
0≤i<j≤−1
λj−i−1 =
−1 t=0
t · λ−1−t then
E[X 2 ] ≤ (μ + μ2 2 − μ2 ) + 2μψ(λ, ) and P r[ test rejects f ] = P r[X > 0] ≥ ≥
−1 P r(Xi = 1) E[X|X i = 1] i=0
μ2 2 = A(μ, λ) . (μ + μ2 2 − μ2 ) + 2μψ(λ, )
So if P r( Test 2 accepts f ) ≥ 1 − A(δ, λ) then P r( Test 2 rejects f ) ≤ A(δ, λ) which implies P r( test of SW [3] rejects f ) = P ry∈G,s∈S (f (y)f (s) = f (ys)) ≤ δ By the analysis of Theorem 7, we know that in this case f is 2δ/(1 − λ)-close to some affine homomorphism.
Derandomizing Graph Tests for Homomorphism
115
Remark 1: When δ is relatively large, say δ > 1−α 2 , we are able to prove a better −1 bound on the error probability, say P r( Test 3 accepts ) ≤ (1 − δ + 1−α . 2 ) Remark 2: Be different from the abelian group case, there exists some kind of graph test may not work over general groups.
References 1. Ben-Or, M., Coppersmith, D., Luby, M., Rubinfeld, R.: Non-Abelian Homomorphism Testing, and Distributions Close to their Self-Convolutions. In: Jansen, K., Khanna, S., Rolim, J.D.P., Ron, D. (eds.) RANDOM 2004. LNCS, vol. 3122, Springer, Heidelberg (2004) 2. Blum, M., Luby, M., Rubinfeld, R.: Self-Testing/Correcting with Applications to Numerical Problems. Journal of Computer and System Sciences 47(3), 549–595 (1993) 3. Shpilka, A., Wigderson, A.: Derandomizing Homomorphism Testing in General Groups. In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing (STOC), pp. 427–435 (2004) 4. Samorodnitsky, A., Trevisan, L.: A PCP characterization of NP with optimal amortized query complexity. In: Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, Portland, OR, USA, May 21-23, 2000, pp. 191–199 (2000) 5. Ajtai, M., Koml´ os, J., Szemer´edi, E.: Deterministic simulation in LOGSPACE. In: Proceedings of the 19th Annual ACM Symposium on Theory of Computing, pp. 132–140 (1987) 6. H˚ astad, J., Wigderson, A.: Simple Analysis of Graph Tests for Linearity and PCP. Random Structures and Algorithms 22(2), 139–160 (2003) 7. Mitzwnmacher, M., Upfal, E.: Probability and Computing: Randomized Algoriths and Probabilistic Analysis. Cambridge University Press, Cambridge (2005) 8. Dinur, I.: The PCP Theorem by Gap Amplification. In: Proceedings of the 38th Annual ACM Symposium on Theory of Computing, pp. 241–250 (2006) 9. Alon, N., Chung, F.R.K.: Explicit construction of linear sized tolerant networks. Discrete Mathematics 72, 15–19 (1988) 10. Bellare, M., Coppersmith, D., Hastad, J., Kiwi, M., Sudan, M.: Linearity testing in characteristic two. In: The 36th Annual Symposium on Foundations of Computer Science, p. 432 (1995) 11. Ben-Sasson, E., Sudan, M., Vadhan, S., Wigderson, A.: Randomness-efficient Low Degree Tests and Short PCPs via Epsilon-Biased Sets. In: Proceedings of the Thirty-fifth Annual ACM Symposium on Theory of Computing, San Diego, CA, USA, June 9-11, pp. 612–621 (2003) 12. Wigderson, A., Xiao, D.: Derandomizing the AW matrix-valued Chernoff bound using pessimistic estimators and applications. In: ECCC TR06-105 (2006) 13. Naor, J., Naor, M.: Small Bias Probability Spaces: Efficient Constructions and Applications. SIAM Journal on Computing 22(4), 838–856 (1993)
Definable Filters in the Structure of Bounded Turing Reductions Angsheng Li1, , Weilin Li1,2, , Yicheng Pan1,2, , and Linqing Tang1,2, 1
State Key Lab. of Computer Science, Institute of Software, Chinese Academy of Sciences 2 Graduate University of Chinese Academy of Sciences {angsheng,weilin,yicheng,linqing}@ios.ac.cn
Abstract. In this article, we show that there exist c.e. bounded Turing degrees a, b such that 0 < a < 0 , and that for any c.e. bounded Turing degree x, b ∨ x = 0 if and only if x ≥ a. The result gives an unexpected definability theorem in the structure of bounded Turing reducibilities.
1
Introduction
A set A ⊆ ω is called computably enumerable (c.e., for short), if there is an algorithm to enumerate the elements of it. Given sets A, B ⊆ ω, we say that A is Turing reducible to B, if there is an oracle Turing machine, Φ say, such that A = ΦB (denoted by A ≤T B), and furthermore, if the bits of oracle queries are bounded by a computable function, we say that A is bounded Turing reducible to B (written A ≤bT B). A Turing and a bounded Turing (or bT, for short) degree is the equivalence class of a set under the Turing reductions and the bounded Turing reductions respectively. A degree is called computably enumerable (c.e.), if it contains a c.e. set. Let E and EbT be the structures of the c.e. degrees under the Turing reductions and the bounded Turing reductions respectively. During the past decades, the studies of both structures focused on that of the algebraic properties, leading to major achievements such as the decidability results of the Σ1 -theory of E, and the Σ2 -theory of EbT (Ambos-Spies, P. Fejer, S. Lempp and M. Lerman [1996]), and the undecidability results of the Σ3 -theory of E (Lempp, Nies, and Slaman [1998]), and of the Σ4 -theory of EbT (Nies and Lempp [1995]). This progress brings the decidability problems of the Σ2 -theory of E, and the Σ3 -theory of EbT into sharper focus, for which new ingredients are welcome. In recent years, the study of the computably enumerable degrees has focused on Turing definability in the structure E. For instance, Slaman asked in 1985 if there are any c.e. degrees that are incomplete and nonzero which are definable in the c.e. degrees E. A natural approach to this problem is to find some definable substructures of E that have nontrivial minimal/maximal and/or least/greatest
The authors are partially supported by NSFC Grant No. 60325206, and No. 60310213.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 116–124, 2008. c Springer-Verlag Berlin Heidelberg 2008
Definable Filters in the Structure of Bounded Turing Reductions
117
members. This resumes interests in topics such as the continuity of the c.e. degrees, started by Lachlan early in 1967. Harrington and Soare [1992] showed that there is no maximal minimal pair in the c.e. Turing degrees, and Seetapun [1991] proved a stronger result that for any c.e. degree b = 0, 0 , there is a c.e. Turing degree a > b such that for any c.e. Turing degree x, a ∧ x = 0 if and only if b ∧ x = 0. In the dual case, Ambos-Spies, Lachlan and Soare [1993] showed that for any c.e. Turing degrees x, y, if x and y are nontrivial splitting of 0 , then there exists a c.e. Turing degree a < x such that a ∨ y = 0 . Remarkably, Cooper and Li [ta] (the major subdegree theorem) showed that for any c.e. Turing degree b = 0, 0 , there exists a c.e. Turing degree a < b such that for any c.e. Turing degree x, b ∨ x = 0 if and only if a ∨ x = 0 (. However, the study of the continuity and the definability in the bounded Turing structures is started just recently. Paul Brodhead, Angsheng Li and Weilin Li [ta] (BLL) showed that the Seetapun’s result holds in the c.e. bounded Turing degrees, that is, for any c.e. bounded Turing (bT) degree b = 0, 0 , there exists a c.e. bT-degree a > b such that for any c.e. bT-degree x, a ∧ x = 0 if and only if b ∧ x = 0. In the present paper, we consider the dual case of this result, the BLL result. It is very unexpected that the dual case fails badly. In fact we are able to prove: Theorem 1. There exist c.e. bounded Turing degrees a, b with the following properties: (1) 0 < a < 0 , (2) For any c.e. bounded Turing degree x, b ∨ x = 0 if and only if x ≥ a. An immediate corollary of the theorem is that there exist c.e. bounded Turing degrees a, b such that a ∨ b = 0 but for no c.e. bT-degree x < a with x ∨ b = 0 , and that the Cooper-Li major subdegree theorem fails badly in the c.e. bT-degrees. However the result gives a very nice theorem in the Turing definability of the c.e. bounded Turing degrees. That is, there is a principal filter [a, 0 ] which is definable by equation x ∨ b = 0 for some nonzero and incomplete c.e. bTdegree b. The result may provide ingredients to the decidability/undecidability problem of the Σ3 -theory of the c.e. bounded Turing degrees. The rest of this paper is devoted to proving theorem 1.1, our main result. In section 2, we formulate the conditions of the theorem by requirements, and for each requirement, we give corresponding strategy to satisfy it; in section 3, we arrange all strategies to satisfy the requirements on nodes of a tree, or more precisely, the priority tree T . In section 4, we use the priority tree to describe a stage-by-stage construction of the objects we need. Finally, in section 5 we verify that the construction in section 4 satisfies all of the requirements, finishing the proof of the theorem. Our notation and terminology are standard and generally follow Soare [1987]. During the course of a construction, notations such as A, Φ are used to denote the current approximations to these objects, and if we want to specify the values
118
A. Li et al.
immediately at the end of stage s, then we denote them by As , Φ[s] etc. For a partial computable (p.c., or for simplicity, also a Turing) functional, Φ say, the use function is denoted by the corresponding lower case letter φ. The value of the use function of a converging computation is the greatest number which is actually used in the computation. For a Turing functional, if a computation is not defined, then we define its use function = −1. During the course of a construction, whenever we define a parameter, p say, as fresh, we mean that p is defined to be the least natural number which is greater than any number mentioned so far. In particular, if p is defined as fresh at stage s, then p > s. The notion of bounded Turing reducibility is taken from Soare’s new book: Computability Theory and Applications [ta] (Definition 3.4.1).
2
Requirements and Strategies: Theorem 1.1
2.1
The Requirements
We will build c.e. sets A, B, C and D to satisfy the following requirements: T Pe Se Re
: : : :
K ≤bT A ⊕ B A = θe C = Ψe (A) D = Φe (Xe , B) −→ A ≤bT Xe
where e ∈ ω, {(θe , Φe , Ψe , Xe ) : e ∈ ω} is an effective enumeration of all quadruples (θ, Φ, Ψ, X) of all computable partial functions θ, of all bounded Turing (bT , for short) reductions Φ, and Ψ , and of all c.e. sets X; and K is a fixed creative set. Let a, b and x be the bT-degrees of A, B and X respectively. By the P-requirements and the S-requirements, we have 0 < a < 0 . By the Rrequirements, we know that for any c.e. bT degree x , if x ∨ b = 0 , then a ≤ x. By the T -requirement, for any c.e. bounded Turing degree x, if a ≤ x, then 0 ≤ a ∨ b ≤ x ∨ b. Therefore satisfying all the requirements is sufficient to prove the theorem. We introduce some conventions of the bounded Turing reductions for describing the strategies. We will assume that for any given bounded Turing reduction Φ or Ψ , the use functions φ and ψ will be increasing in arguments. Now we are ready to describe the strategies. 2.2
The T -Strategy
To satisfy the T -requirement, K ≤bT A ⊕ B, we need to construct a bounded Turing reduction Γ such that K = Γ (A, B). We construct the Γ by coding K into A and B as follows: For any k, if k ∈ K, then either 2k ∈ A or 2k ∈ B. Therefore, k ∈ K if and only if either A(2k) = 1 or B(2k) = 1. K is bounded Turing reducible to A ⊕ B, the T -requirement is satisfied.
Definable Filters in the Structure of Bounded Turing Reductions
2.3
119
A P-Strategy
A P-strategy satisfying a P-requirement, A = θ say, is a Friedberg-Muchnik procedure, and proceeds as follows: 1. Define a witness a as a fresh odd number. 2. Wait for a stage, v say, at which θ(a) ↓= 0 = A(a), then — enumerate a into A and terminate. By the strategy, we know if step 2 of the procedure occurs, then θ(a) ↓= 0 = 1 = A(a) is created, otherwise θ(a) = 0 = A(a) occurs at each stage. In either case, the P-requirement is satisfied. We use a node on a tree (the priority tree as we’ll see later), γ say, to denote a P-strategy. 2.4
An S-Strategy
Suppose that we want to satisfy an S-requirement, C = Ψ (A) say. It will be satisfied by a Friedberg-Muchnik procedure as follows: 1. Define a witness c as a fresh odd number. 2. Wait for a stage, v say, at which Ψ (A; c)[v] ↓= 0 = Cv (c), then — enumerate c into C; — define an A-restraint rA = ψ(c). The key point to the satisfaction of S is that the A-restraint rA = ψ(c) will be preserved once step 2 of the strategy occurs. In this case, we have that if step 2 occurs, then Ψ (A; c) ↓= 0 = 1 = C(c) is created and preserved, else if step 2 never occurs, then Ψ (A; c) = 0 = C(c). In either case, the S-requirement is satisfied. We use a node β, say, on the priority tree to denote the S-strategy. 2.5
An R-Strategy
Suppose we want to satisfy an R requirement, R: D = Φ(X, B) −→ A ≤bT X, say. We will build a bounded Turing reduction Δ such that if D = Φ(X, B) then Δ(X) = A. The bounded Turing reduction Δ will be built by an ω-sequence of cycles n, cycle n will be responsible for defining Δ(X; n). The R-strategy will proceed as follows: 1. Let n be the least x, such that Δ(X; x) ↑. Define a witness block Un such that min{y ∈ Un } is fresh, and that |Un | = n + 1. 2. Wait for a stage, v say, at which the following conditions occur: For all y ∈ Un for some n ≤ n, we have Φ(X, B; y) ↓= D(y) then: — define Δ(X; n) ↓= A(n) with δ(n) = max{φ(y)|y ∈ Un , n ≤ n}. 3. If there is an x such that Δ(X; x) ↓= A(x), then let n be the least such x, and go on to step 4.
120
A. Li et al.
4. If for all y ∈ Un , Φ(X, B; y) ↓= D(y), then — let x be the least y ∈ Un \D; — enumerate x into D; → — define a conditional restraint − r = (n, δ(n)]. → 5. Suppose that a conditional restraint − r = (n, δ(n)] is kept, then there is no element b with n < b ≤ δ(n) that can be enumerated into B. Note that if X changes below δ(n), then Δ(X; n) becomes undefined, and → simultaneously the conditional restraint − r = (n, δ(n)] drops. The new idea of this proof is the notion of the witness block, which seems the first time it becomes available, and the notion of conditional restraints. The later notion was largely the first author’s idea that has already been used several times in the literature. Now we analyze the correctness of the R-strategy. We proceed the arguments by cases: Case 1: Δ(X) is built infinitely often. In this case we will prove that Δ(X) is a total function (i.e., Δ(X; x) is defined for every x). Note that for every n, Δ(X; n) is redefined only finitely many times. So there exists some s , such that for any stage s with s > s, Δ(X; n)[s ] does not change any more. Assume by contradiction, there exists an x such that Δ(X; x) ↑ eventually, let n be the least such x. Assume after stage v, the Δ(X; n) would never been defined any more. By step 2 of the R-strategy, we know that for each stage s with s ≥ v, there exists some y ∈ Un for some n ≤ n such that Φ(X, B; y) ↓= D(y) (otherwise Δ(X; n) will be defined again by step 2 of the R-strategy), then for any undefined Δ(X; m) with m > n, it would never be defined after stage s with s ≥ v. So Δ(X) is finitely built, a contradiction. That means that Δ(X) is a total function. Now if Δ(X) = A, i.e., there exists an x such that Δ(X; x) ↓= A(x), we prove that Φ(X, B) = D under this assumption. Let n be the least such x, we prove that there is an x ∈ Un such that Φ(X, B; x) ↓= 0 = 1 = D(x). Let sn be the stage at which the permanent computation Δ(X; n) was created, and let tn > sn be the stage at which n is enumerated into A. By step 4 of the R-strategy, for any s > tn , if Φ(X, B; y) ↓= D(y) holds for all y ∈ Un , then we enumerate an element x ∈ Un into D, and create a conditional restraint (n, δ(n)]. By the choice of sn , the conditional restraint will be kept forever. By the conditional restraint, no number x with n < x ≤ δ(n) . could be enumerated into B, therefore rn = {Φ(X, B; y)|y ∈ Un } can be injured only by numbers x ≤ n. Once rn is injured, we may waste a witness x ∈ Un . However, rn can be injured at most n many times and |Un | = n + 1. Therefore after up to n many times injury, there is a witness y ∈ Un which can be used to create a permanent inequality between Φ(X, B) and D. Let vn > tn be a minimal stage after which B will never change below n. Therefore if at a stage s > tn we have that
Definable Filters in the Structure of Bounded Turing Reductions
121
(1) Δ(X; n) ↓= A(n), and (2) for any y ∈ Un , Φ(X, B; y)[s] ↓= Ds (y), then we enumerate the least x ∈ Un \D, x0 say, into D. By the choice of n, vn , → and by the conditional restraint − r = (n, δ(n)], Φ(X, B; x0 ) ↓= 0 = 1 = D(x0 ) holds permanently, which means Φ(X, B) ↓= D. Case 2: Δ is finitely built. In this case, assume w.l.o.g. that Δ(X; 0) ↓, · · · , Δ(X; n − 1) ↓, and Δ(X; n) ↑ eventually, we can assume after sn , Δ(X; n) doesn’t have chance to be defined any more (since if it has infinitely many chances to be defined, then it must be defined). By step 2 of the R-strategy, for any stage s with s > sn , there exists a y ∈ Un for some n ≤ n such that Φ(X, B; y) = D(y) (otherwise Δ(X; n) will be defined again by step 2 of the R-strategy). Since n ≤n Un is finite, so there exists some n with some y0 ∈ Un such that Φ(X, B; y0 )[s] = Ds (y0 ) for infinitely many stages s, that means Φ(X, B) = D. In either case, the R-requirement is satisfied. We use a node on the priority tree, α say, to denote the R-strategy. Note in the case Δ(X) = A, there is no permanent conditional restraint −→ − → r . However, there may be infinitely many stages at which we create r[s] = (a[s], b[s]]. The key to the proof is that this unbounded conditional restraints −→ r[s] are harmless, because: −→ (1) A conditional restraint r[s] = (a[s], b[s]] controls elements x only if a[s] < x ≤ b[s]. (2) The conditional restraints ensure that the R-strategy has no influence on any number no larger than a[s]. We say it’s not important because in this case, a[s] will be unbounded. This guarantees that the R-strategy would not injure a lower priority S-strategy after some fixed stage. That is to say, a lower priority S-strategy is injured by the R-strategy only finitely many times. This is the reason why we can only use conditional restraints here, instead of the typical restraints, which is of course one of the main contributions of this paper. We thus define the possible outcomes of the R-strategy α by 0
3
The Priority Tree
In this section, we will build a priority tree of strategies T ⊂ Λ<ω , with Λ = {0, 1}. Let P < R denote that the priority ranking of P is higher than that of R. Also let
122
A. Li et al.
Definition 1. Define the priority ranking of the requirements such that T has the highest priority, and ∀e ∈ ω: Re < Se < Pe < Re+1 . Note that both a P-strategy and an S-strategy have only one possible outcome, we denote it by 1. However, as we mentioned before, an R-strategy has two outcomes which we denote by 0
4
The Construction
Our construction will perform different actions at even and odd stages. Suppose that K is enumerated at odd stages only, and that there is exactly one element that enters K at each odd stage. At even stages, strategies on the tree will act to satisfy the requirements. During the course of the construction, we may initialize a node, ξ say, which means that all the actions taken by ξ previously, are canceled, or set to be totally undefined. Now we give the construction stage-by-stage. Definition 4. (The Construction) The construction is defined as follows: Stage s = 0. Set A = B = C = D = ∅, and initialize every node of the priority tree T . Stage s = 2n + 1. For ks ∈ Ks \Ks−1 . Let x = 2ks , we need to decide either x ∈ A or x ∈ B: Case 1: Find the <-least ξ such that (a) or (b) below occurs: −→ (a) ξ = α is an R-strategy such that α has conditional restraint r[s] ↓= (a[s], b[s]] and a[s] < x ≤ b[s]; (b) ξ = β is an S-strategy such that rA (β) ↓≥ x. then: Subcase 1: ξ = α, i.e., (a) occurs — enumerate x into A; — initialize all nodes ξ to the right of αˆ0. Subcase 2: ξ = β, i.e., (b) occurs
Definable Filters in the Structure of Bounded Turing Reductions
123
— enumerate x into B; — initialize all nodes ξ with ξ ≤ β. Case 2: Otherwise, we can’t find node ξ such that (a) or (b) occurs for ξ: — enumerate x into A. Stage s = 2n + 2. We specify certain strategies to be eligible to act. First we allow the root node to be eligible to act at substage t = 0. At each substage t, we let the strategy be eligible to act run its program, and then, either close the current stage or specify a node to be eligible to act next, i.e., at the next substage of stage s. Substage t. Suppose that node ξ is eligible to act at substage t of stage s. If t = s, then initialize all nodes ξ ≤ ξ, and close the current stage. Otherwise, corresponding to different types of the strategy, there are three cases: Case 1. ξ = β is an S-strategy. Then run the following: Program β: 1. If c(β) ↓, and Ψβ (A; c(β)) ↓= 0 = C(c(β)), then (a) If for any α with αˆ0 ⊆ β we have dom(α) > ψβ (c), then — enumerate c(β) into C, and — define rA (β) = ψ(c(β)), where dom(α) is the size of the domain of Δα . (b) Otherwise, then do nothing, in either case, initialize all ξ ≤ β and go to stage s + 1. 2. If c(β) ↑, then — define c(β) as fresh; — initialize all ξ ≤ β and go to stage s + 1. 3. Otherwise, let βˆ1 be eligible to act next. Case 2. ξ = γ is a P-strategy. Run the following Program γ: 1. If a(γ) ↓, and θγ (a(γ)) ↓= 0 = A(a(γ)), then — enumerate a(γ) into A, – initialize all nodes ξ with ξ ≤ γ, and go to stage s + 1. 2. If a(γ) ↑, then — define a(γ) as fresh; — initialize all ξ ≤ γ and go to stage s + 1. 3. Otherwise, let γˆ1 be eligible to act next. Case 3. ξ = α is an R-strategy. In this case, we perform the following Program α: 1. If there is a k > b(α), such that Δα (Xα ; k) ↓= A(k), then: → (a) If − r (α) ↓, then: — Let αˆ1 be eligible to act next.
124
2.
3.
4.
5.
A. Li et al.
(b) Otherwise and ∀y ∈ Ukα , Φα (Xα , B; y)[s] ↓= D(y), then: — let i be the least k such that Δα (Xα ; k) ↓= A(k), — let x be the least y ∈ Uiα \D, enumerate x into D; → — define a conditional restraint − r (α) = (i, δ(i)]; — initialize all nodes ξ to the right of αˆ0, and go to stage s + 1. (c) Otherwise: – let αˆ1 be eligible to act next. Otherwise and for k = min{x > b(α)|Δα (Xα ; x) ↑} we have: (a) Ukα ↓; (b) ∀y ∈ Ukα for some k ≤ k, Φα (Xα , B; y)[s] ↓= D(y). Then: — define Δα (Xα ; k) ↓= A(k) with δα (k) = max{φα (y)|y ∈ Ukα , k ≤ k}; — let αˆ0 be eligible to act next. Otherwise and for k = min{x > b(α)|Δα (Xα ; x) ↑} we have Ukα ↑, then: — define Ukα = (l, l + k + 1), where l is a fresh number; — initialize all nodes ξ to the right of αˆ0, and close the current stage. Otherwise and b(α) ↑, then: — define b(α) as fresh; — initialize all ξ ≤ α and go to stage s + 1. Otherwise, then let αˆ1 be eligible to act next.
This completes the description of the construction. The verification of this construction will be given in the full version of the paper.
References 1. Ambos-Spies, K., Fejer, P.A., Lempp, S., Lerman, M.: Decidability of the twoquantifier theory of the recursively enumerable weak truthtable degrees and other distributive upper semi-lattices. Journal of Symbolic Logic 61, 880–905 (1996) 2. Ambos-Spies, K., Lachlan, A.H., Soare, R.I.: The continuity of cupping to 0’. Annals of Pure and Applied Logic 64, 195–209 (1993) 3. Brodhead, P., Li, A., Li, W.: Continuity of capping in εdT (to appear) 4. Cooper, S.B., Li, A.: On Lachlan’s major subdegree problem (to appear) 5. Harrington, L., Soare, R.I.: Games in recursion theory and continuity properties of capping degrees. In: Judah, H., Just, W., Woodin, W.H. (eds.) Set Theory and the Continuum, Proceedings of Workshop on Set Theory and the Continuum, MSRI, Berkeley, CA, October, 1989, pp. 39–62. Springer, Heidelberg (1992) 6. Nies, A., Lempp, S.: The undecidability of the Π4 -theory for the r.e. wtt- and Turing degrees. Journal of Symbolic Logic 60, 1118–1136 (1995) 7. Nies, A., Lempp, S., Slaman, T.A.: The Π3 -theory of the enumerable Turing degrees is undecidable. Transactions of the American Mathematical Society 350, 2719–2736 (1998) 8. Seetapun, D.: Contributions to Recursion Theory, Ph. D thesis, Trinity College (1991) 9. Soare, R.: Recursively Enumerable Sets and Degrees. Springer, Heidelberg (1987) 10. Soare, R.I.: Computability Theory and Applications. Springer, Heidelberg (to appear)
Distance Constrained Labelings of Trees Jiˇr´ı Fiala1 , Petr A. Golovach2, , and Jan Kratochv´ıl1 1,
Institute for Theoretical Computer Science and Department of Applied Mathematics, Charles University, Prague, Czech Republic {fiala,honza}@kam.mff.cuni.cz 2 Institutt for informatikk, Universitetet i Bergen, Norway
[email protected]
Abstract. An H(p, q)-labeling of a graph G is a vertex mapping f : VG → VH such that the distance between f (u) and f (v) (measured in the graph H) is at least p if the vertices u and v are adjacent in G, and the distance is at least q if u and v are at distance two in G. This notion generalizes the notions of L(p, q)- and C(p, q)-labelings of graphs studied as particular graph models of the Frequency Assignment Problem. We study the computational complexity of the problem of deciding the existence of such a labeling when the graphs G and H come from restricted graph classes. In this way we extend known results for linear and cyclic labelings of trees, with a hope that our results would help to open a new angle of view on the still open problem of L(p, q)-labeling of trees for fixed p > q > 1 (i.e., when G is a tree and H is a path). We present a polynomial time algorithm for H(p, 1)-labeling of trees for arbitrary H. We show that the H(p, q)-labeling problem is NP-complete when the graph G is a star. As the main result we prove NP-completeness for H(p, q)-labeling of trees when H is a symmetric q-caterpillar.
1
Introduction
Motivated by models of wireless communication, the notion of so called distance constrained graph labelings has received a lot of interest in Discrete Mathematics and Theoretical Computer Science in recent years. In the simplest case of constraints at distance two, the typical task is, given a graph G and parameters p and q, to assign integer labels to vertices of G such that labels of adjacent vertices differ by at least p, while vertices that share a common neighbor have labels at least q apart. The aim is to minimize the span, i.e., the difference between the smallest and the largest labels used. The notion of proper graph coloring is a special case of this labeling notion — when (p, q) = (1, 0). Thus it is generally NP-hard to decide whether such
Supported by Norwegian Research Council. Supported by the Ministry of Education of the Czech Republic as project 1M0021620808.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 125–135, 2008. c Springer-Verlag Berlin Heidelberg 2008
126
J. Fiala, P.A. Golovach, and J. Kratochv´ıl
a labeling exists. On the other hand, in some special cases polynomial-time algorithms exist. For example, when G is a tree and p ≥ q = 1, the algorithm of Chang and Kuo based on dynamic programming finds a labeling of the minimum span [3,6] in superlinear time O(nΔ5 ), where Δ is the maximum degree of the tree. Distance constrained labelings can be generalized in several ways. For example, constraints on longer distances can be involved [18,14], or the constraints on the difference of labels between close vertices can be directly implemented by edge weights — the latter case is referred to as the Channel Assignment Problem [22]. Alternatively, the cyclic metric or even more complex metrics on the label set are considered. In particular, if the metric is given as the distance between vertices in some graph H, we get the following mapping: Definition 1. Given two positive integers p and q, we say that f is an H(p, q)labeling of a graph G if f maps the vertices of G onto vertices of H such that the following hold: – if u and v are adjacent in G, then distH (f (u), f (v)) ≥ p, – if u and v are nonadjacent but share a common neighbor, then distH (f (u), f (v)) ≥ q. Observe that if the graph H is a path, the ordinary linear metric is obtained. This has been introduced by Roberts and studied in a number of papers — see, e.g., recent surveys [23,1]. The cyclic metric (corresponding to the case when H is a cycle) was studied in [17,20]. The general approach was suggested in [8] and several (both P-time and NP-hardness) results for various fixed graphs H were presented in [7,10]. Several computational problems can be defined by restricting the graph classes of the input graphs, and/or by fixing some values as parameters of the most general problem which we refer to as follows: Distance Labeling Instance: G, H, p and q Question: Does G allow an H(p, q)-labeling?
DL
As it was already mentioned, the linear metric is often considered, i.e. when H = Pλ+1 is a path of length λ. In this case we use the traditional notation “L(p, q)-labeling of span λ” for “Pλ+1 (p, q)-labeling” and also define the problem explicitly: L(p, q)-Distance Labeling Parameters: p, q Instance: G and λ Question: Does G allow an L(p, q)-labeling of span λ?
L(p, q)-DL
We focus our attention on various parameterized versions of the DL problem. Griggs and Yeh [15] showed NP-hardness of the L(2, 1)-DL of a general graph,
Distance Constrained Labelings of Trees
127
which means that DL is NP-complete for fixed (p, q) = (2, 1) and H being a path. Later Fiala et al. [9] showed that the L(2, 1)-DL problem remains NP-complete for every fixed λ ≥ 4. Similarly, the labeling problem with the cyclic metric is NP-complete for a fixed span [8], i.e., when (p, q) = (2, 1) and H = Cλ for an arbitrary λ ≥ 6. From the very beginning it was noticed that the distance constrained labeling problem is in certain sense more difficult than ordinary coloring. The first polynomial time algorithm for L(2, 1)-DL for trees came as a little surprise [3]. Attention was then paid to input graphs that are trees and their relatives, either paths and caterpillars as special trees, or graphs of bounded treewidth. Table 1 briefly summarizes the known results on the complexity of the DL problem on these graph classes and the new results presented in this paper. Notice in particular that L(2, 1)-DL belongs to a handful of problems that are solvable in polynomial time on graphs of treewidth one and NP-complete on treewidth two [5]. Table 1. Summary of the complexity of the DL problem Class of G bounded tw tw ≤ 2 tw ≤ 2 pw ≤ 2 stars trees, L trees trees trees trees
Class of H finite class paths cycles all graphs all graphs paths all graphs cycles q-caterpillars paths
p q on input on input 2 1 2 1 2 1 fixed fixed, ≥ 2 fixed fixed, ≥ 2 on input 1 on input ≥ q on input ≥ 1 fixed, ≥ 2q + 1 fixed, ≥ 2 fixed fixed, ≥ 2
Complexity P [∗] NP-c [5] NP-c [5] NP-c [5] NP-c [∗] NP-c [11] P [3,2, ∗] P [20,19] NP-c [∗] Open
The symbol tw means treewidth; pw is pathwidth; L indicates the list version of the DL problem where every vertex has prescribed set (list) of possible labels; P are problems solvable in polynomial time; NP-c are NP-complete problems; the reference [∗] indicates results of this paper.
We start with two observations. For the seventh line, the algorithm of Chang and Kuo [3] can be easily modified to work for arbitrary p and H on the input. It only suffices to modify all tests whether labels of adjacent vertices have difference at least two into tests whether they are mapped onto vertices at distance at least p in H. The first line of the table is a corollary of a strong theorem of Courcelle [4], who proved that properties expressible in Monadic Second Order Logic (MSOL) can be recognized in polynomial time on any class of graphs of bounded treewidth. If H belongs to a finite graph class, then the graph property ”to allow an H(p, q)labeling” straightforwardly belongs to MSOL. A special case arises, of course, if a single graph H is a fixed parameter of the problem.
128
J. Fiala, P.A. Golovach, and J. Kratochv´ıl
In particular, if λ is fixed, then L(p, q)-DL is polynomially solvable for trees. On the other hand, without this assumption on λ the computational complexity of L(p, q)-DL on trees remains open so far. It is regarded as one of the most interesting open problems in the area. We focus on the following variant when both p and q are fixed: (p, q)-Distance Labeling Parameters: p and q Instance: G and H Question: Does G allow an H(p, q)-labeling?
(p, q)-DL
Our main contribution is given by Theorems 1 and 2 showing two NP-hardness results for the (p, q)-DL problem: in the case when G is a star (with no restriction on H), and in the case when G is a tree and H is a specific symmetric qcaterpillar. The latter choice of H is motivated by the similarity of a caterpillar with a path, which might provide a useful step in a proof of NP-hardness of the L(p, q)-labeling problem.
2
Preliminaries
Throughout the paper the symbol [a, b] means the interval of integers between a and b (including the bounds), i.e., [a, b] := {a, a + 1, . . . , b}. We also define [a] := [1, a]. The relation i ≡q j means that i and j are congruent modulo q, i.e., q divides i − j. For i ≡q j we define [i, j]≡q := {i, i + q, i + 2q, . . . , j − q, j}. A set M of integers is t-sparse if the difference between any two elements of M is at least t. We say that a set of integers M is λ-symmetric if for every x ∈ M , it holds that λ − x ∈ M . All graphs are assumed to be finite, undirected, and simple, i.e., without loops or multiple edges. Throughout the paper VG stands for the set of vertices, and EG for the set of edges, of a graph G. We use standard terminology: a path Pn is a sequence of n consecutively adjacent vertices (its length being n − 1); a cycle is a path where the first and the last vertex are adjacent as well; a graph is connected if each pair of vertices is joined by a path; the distance between two vertices is the length of a shortest path that connects them; a tree is a connected graph without a cycle; and a leaf is a vertex of degree one. For precise definitions of these terms see, e.g., the textbooks [21,16]. A q-caterpillar is a tree that can be constructed from a path, called the backbone, by adding new disjoint paths of length q, called legs, and merging one end of each leg with some backbone vertex. We say that a caterpillar is symmetric, if it allows an automorphism that reverses the backbone. Obviously, a path can be viewed as a symmetric caterpillar. As a technical tool for proving NP-hardness results we use the following problem of finding distant representatives:
Distance Constrained Labelings of Trees
129
System of q-distant representatives Sq-DR Parameter: q Instance: A collection of sets Mi , i ∈ [m] of integers. Question: Is there a collection of elements ui ∈ Mi , i ∈ [m] that pairwise differ by at least q? It is known that the S1-DR problem allows a polynomial time algorithm (by finding a maximum matching in a bipartite graph), while for all q ≥ 2 the Sq-DR problem is NP-complete, even if each set M has at most three elements [12]. We conclude this section with several observations specific to the H(p, q)labelings when H is a symmetric caterpillar. If f is such a labeling then the ”reversed” mapping f defined as f composed with the backbone reversing automorphism is a valid H(p, q)-labeling as well. Hence, if we look for a specific graph construction where only a fixed label is allowed on a certain vertex, we can not avoid symmetry of labelings f and f . For that purpose we need a stronger concept of systems of q-distant representatives. Lemma 1. For any q ≥ 2 and t ≥ q, the Sq-DR problem remains NP-complete even when restricted to instances whose sets are of size at most 6, t-sparse and λ-symmetric for some λ. Proof. We extend the construction from [12] where an instance of the well known NP-complete problem 3-Satisfiability (3-SAT) [13, problem L02] was transformed into an instance of the Sq-DR problem as follows: Assume that the given Boolean formula is in the conjunctive normal form and consists of m clauses, each of size at most three, over n variables, each with one positive and two negative occurrences. – The three literals for a variable xi are represented by a triple ai , bi , ai + q such that ai < bi < ai + q. The number bi represents the positive literal, while ai , ai + q the negative ones. – Triples representing different variables are at least t apart (e.g., the elements ai form an arithmetic progression of step q + t). – The sets Mi , i ∈ [m] represent clauses and are composed from at most three numbers, each uniquely representing one literal of the clause. The equivalence between the existence of a satisfying assignment and the existence of a set of q-distant representatives is straightforward (for details see [12]). Without loss of generality assume that there are positive integers α and β such that Mi ⊂ [α, β] for every i ∈ [m] (we may assume that these bounds are arbitrarily high, but sufficiently apart). We set λ := 2β + t and construct the family of sets Mi , i ∈ [m + n] as follows: – for i ∈ [m] : Mi := {a, λ − a : a ∈ Mi }, – for i ∈ [n] : Mm+i := {bi , λ − bi }. If we choose the representatives for the sets Mm+i , i ∈ [n] arbitrarily, and exclude infeasible numbers from the remaining sets, then the remaining task is equivalent to the original instance Mi , i ∈ [m] of the Sq-DR problem.
130
3
J. Fiala, P.A. Golovach, and J. Kratochv´ıl
Distance Labeling of Stars
We first prove NP-hardness of the (p, q)-DL problem when G belongs to a very simple class of graphs, namely to the class of stars. Theorem 1. For any p ≥ 0 and q ≥ 2, the (p, q)-DL problem is NP-complete even when the graph G is required to be a star. Proof. We reduce the Independent Set (IS) problem, which for a given graph G and an integer k asks whether G has k pairwise nonadjacent vertices. The IS problem is well known to be NP-complete [13, problem GT20]. Let H0 and k be an instance of IS. We construct a graph H such that the star K1,k has an H(p, q)-labeling if and only if H0 has k independent vertices. The construction of H goes in three steps: Firstly, if q = 2, then we simply let H1 := H0 and M := VH0 . Otherwise, i.e., for q ≥ 2, we replace each edge of H0 by a path of length q − 1 to obtain H1 . Now let M be the set of the middle points of the replacement paths (i.e., M is of size |EH0 | when q is odd; otherwise M is twice bigger). For the second step we first prepare a path of length max{0, 2p−q−1 } (ob2 serve that this path consists of a single vertex when 2p ≤ q + 1). We make one its its ends adjacent to all vertices in the set M . We denote the other end of the path by w. Finally, if q is odd, we insert new edges to make all vertices in M pairwise adjacent, i.e., the set M now induces a clique. This concludes the construction of the graph H. The properties of H can be summarized as follows: – If two vertices were adjacent in H0 , then they have distance q − 1 in H. Analogously, they have distance q if they were non-adjacent. – Every original vertex is at distance at least p from w, and this bound is attained whenever 2p ≥ q − 1. – If p ≤ q then every newly added vertex except w is at distance less than q from any other vertex of H. – If p ≥ q then every newly added vertex is at distance less than p from w. Straightforwardly, if H0 has an independent set S of size k, then we map the center of K1,k onto w and the leaves of K1,k bijectively onto S. This yields a valid H(p, q)-labeling of K1,k . For the opposite implication assume that K1,k has an H(p, q)-labeling f . We distinguish three cases: – When p < q, then the images of the leaves of K1,k are pairwise at distance q in H. Hence, by the properties of H, these are k original vertices that form an independent set in H0 . – If p = q, then the q-distant vertices of H are some nonadjacent original vertices together with the vertex w. As the image of the center of K1,k belongs into this set as well, H0 has at least k independent vertices. – If p > q, then w is the image of the center of K1,k (unless k = 1, but then the problem is trivial). Analogously as in the previous cases, images of the leaves of K1,k form an independent set of H0 .
Distance Constrained Labelings of Trees
4
131
Distance Labeling between Two Trees
In this section we show that the DL problem is NP-complete for any q ≥ 2 and p ≥ 2q + 1 even when both graphs G and H are required to be trees. Before we state the theorem, we describe the target graph H and explore its properties. Let p and q be given, such that q ≥ 2 and p ≥ 2q + 1. Assume that l > 2(p − q) and for i ∈ [l] define mi := l3 + il. For convenience we also let mi := m2l−i = l3 + (2l − i)l for i ∈ [l + 1, 2l − 1]. We construct a graph Hl as follows: We start with a path of length 2l − 2 on vertices v1 , . . . , v2l−1 , called the backbone vertices. For each vertex vi , i ∈ [2l−1], we prepare mi paths of length q and identify one end of each of these mi paths with the vertex vi . By symmetry, every vertex v2l−i has the same number of pending q-paths as the vertex vi . Observe that the resulting graph depicted in Fig. 1 is a q-caterpillar. For i ∈ [2l − 1] and j ∈ [mi ], let ui,j denote the final vertex of the j-th path hanging from the vertex vi . v1
v2
vi−s−1
vi−s
vi
vi+s
v2l−1
p − 2q q u1,1
ui,j m1
u2l−1,m2l−1
mi = l3 + il
Fig. 1. Construction of the target tree Hl . White vertices define ni as well as a lower bound on r(ui,j ).
Observe that the total number of leaves in Hl is 2l4 . We define s := p − 2q to shorten some expressions. For i ∈ [2l − 1], let ni be the number of leaves of Hl at distance at least p − q from vi , i.e., ⎧ 4 2l − (s + i)l3 − (s + 2i)il + ls(s+1) if i ∈ [s − 1], ⎪ 2 ⎪ ⎨ 4 3 if i ∈ [s, l − s], and mi = 2l4 − (2s + 1)l3 − (2s + 1)il ni := ⎪ 2l − (2s + 1)l − (2s + 1)il + ⎪ j∈Si ⎩ if i ∈ [l − s + 1, l], + (s + i − l)(s + i − l + 1)l where Si := [2l + 1] \ [i − s, i + s]. By symmetry, ni := n2l−i for i ∈ [l + 1, 2l − 1]. Observe that the sequence n1 , . . . , nl−s is decreasing. For a vertex u ∈ VHl , we further define r(u) to be the maximum size of a set of vertices of Hl that are pairwise at least q apart, and that are also at distance at least p from u. In other words, r(u) is an upper bound on the degree of a vertex which is mapped onto u in an Hl (p, q)-labeling.
132
J. Fiala, P.A. Golovach, and J. Kratochv´ıl
We claim that for the leaves ui,j with i ∈ [l], j ∈ [mi ] the r(ui,j ) can be bounded by 2l − p , ni ≤ r(ui,j ) ≤ ni + q since the desired set can be composed from the ni leaves that are at distance at least p − q from vi together with suitable backbone vertices. Consequently, r(ui,j ) < ni−1 for i ∈ [2, l − s]. Finally, observe that if u is a non-leaf vertex of H, then r(u) < nl−s , since with every step away from a leaf the size of the set of p distant vertices decreases by the factor of Ω(l3 ). By the properties of the values ni and r(u) we get that: Lemma 2. For given p, q and l such that p ≥ 2q + 1 and l > 2(p − q) + 1, let T be a tree of three levels such that for every i ∈ [l − s] and j ∈ [mi ], the root y of T has two children xi,j , x2l−i,j , both of degree ni . Every Hl (p, q)-labeling f of T satisfies that f (y) ∈ {ul,j | j ∈ [ml ]}, and ∀i ∈ [l − s] : {f (xi,j ), f (x2l−i,j ) | j ∈ [mi ]} = {ui,j , u2l−i,j | j ∈ [mi ]}. In addition, T has an Hl (p, q)-labeling such that the leaves of T are mapped onto the leaves of Hl . Proof. Assume by induction that all vertices xk,j and x2l−k,j with k < i are mapped onto the set W = {uk,j , u2l−k,j | k < i, j ∈ [mk ]}. Then vertices xi,j , x2l−i,j with j ∈ [mk ] must be mapped onto vertices that are at least q apart from W , i.e., on the backbone vertices or inside a path under some vi with i ∈ [i, 2l − i]. Among those vertices only leaves ui,j and u2l−i,j satisfy r(ui,j ) = r(u2l−i,j ) ≥ ni and can be used as images for xi,j , x2l−i,j . When the labels of all xi,j are fixed, the root must be mapped onto a vertex that is at distance at least p from all ui,j with i ≤ l − p + 2q or i ≥ l + p − 2q. The only such vertices are ul,j with j ∈ [ml ]. If the first two levels of T are partially labeled as described above, then the children of xi,j can be labeled by vertices uk,j with |k − i| > s, only the label of the root y must be avoided. This provides a valid Hl (p, q)-labeling of T . For i ∈ [l − s], let Ti denote the tree T rearranged such that its root is one of the children of xi,1 . We are ready to prove the main theorem of this section. Theorem 2. For any q ≥ 2 and p ≥ 2q + 1, the (p, q)-DL problem is NPcomplete, even if G is a tree and H is a symmetric q-caterpillar. Proof. We reduce the 3-SAT problem and extend the reduction to the Sq-DR problem exposed in Lemma 1. For a formula with n variables, we set α := p − q + 1, β := 2p − 3q, t := p, l := α + (n − 1)p + nq + β, and λ := 2l. According to these parameters we build the graph Hl and transform the given formula into a collection of t-sparse sets Mi , i ∈ [m + n].
Distance Constrained Labelings of Trees
133
r y
T
G w1
...
wi
...
wm+n
y ...
xi,j x2l−i,j j ∈ [mi ]
x1,1
... xl+s,ml−s
...
...
ni − 1
ni − 1
... T
for every k ∈ Ni : (mk + 1) × Tk
Fig. 2. Construction of trees T and G
We construct the tree G of six levels as follows: the root r has children wi , i ∈ [m + n], representing sets Mi . s] that are For each i ∈ [m + n], let Ni be the set containing all numbers of [l − at least p − q apart from any number of Mi . Formally, Ni := [l − s] \ j∈Mi [j − p + q + 1, j + p − q − 1]. For each k ∈ Ni , we take mk + 1 copies of the tree Tk and add mk + 1 edges between the roots of these trees and the vertex wi . Finally, we insert a copy of the tree T and insert a new edge so that the root of this T is also a child of wi . We repeat the above construction for all i ∈ [m + n] to obtain the desired graph G. (See Fig. 2.) We claim that if f is an arbitrary Hl (p, q)-labeling of G then every vertex wi is mapped on some vertex vj with j ∈ Mi . The child of wi , which is the root y, maps on some ul,j . Also the children of y map onto all leaves of form ui,j , i ∈ [l − p + q], j ∈ [mi ]. Hence, the image of wi is one of the backbone vertices vi with i ∈ [l − p + q] ∪ [l + p − q, 2l − 1]. On the other hand, for any k ∈ Ni the image of wi is at least p apart from some uk,l as well as from some u2l−k,l with l, l ∈ [mk ]. This follows from the fact that wi has in Tk more children than there are the leaves under vk (or under v2l−k ), so both k and l − k appear as the first index of the leaf which is the image of a child of wi . This proves the claim. Therefore, the existence of such mapping f yields a valid solution of the original 3-SAT and Sq-DR problems. In the opposite direction observe that any valid solution of the Sq-DR problem transforms naturally to the mapping on vertices wi , i ∈ [m + n]. We extend this partial mapping onto the remaining vertices of G such that the root r is mapped onto u1,1 , all vertices y onto ul,1 . The copies of trees T are labeled as described in Lemma 2. For every wi , we map its children in copies of Tk onto distinct vertices of the set {uk,j , u2l−k,j | j ∈ [mk ]} \ {u1,1 , ul,1 }. Then we extend the labeling onto the entire copy of each Tk like in Lemma 2 without causing any conflicts with other labels. In particular, every child of wi in Tk is of degree nk +1 and its children are labeled by leaves of Hl , while its parent (the vertex wi ) by a backbone vertex.
134
5
J. Fiala, P.A. Golovach, and J. Kratochv´ıl
Conclusion
In this paper we have studied the computational complexity of the H(p, q)labeling problem when both the input graph G and the label space graph H are trees. This could hopefully pave the way to the solution of the L(p, q)-labeling of trees (with both p and q fixed), which is the most interesting open problem in the area of computational complexity of distance constrained labeling problems. Another persistent open problem is the complexity of the L(2, 1)-labeling problem for graphs of bounded pathwidth.
References 1. Calamoneri, T.: The L(h, k)-labeling problem: A survey and annotated bibliography. Computer Journal 49(5), 585–608 (2006) 2. Chang, G.J., Ke, W.-T., Liu, D.D.-F., Yeh, R.K.: On l(d, 1)-labellings of graphs. Discrete Mathematics 3(1), 57–66 (2000) 3. Chang, G.J., Kuo, D.: The L(2, 1)-labeling problem on graphs. SIAM Journal of Discrete Mathematics 9(2), 309–316 (1996) 4. Courcelle, B.: The monadic second-order logic of graphs. I: Recognizable sets of finite graphs. Inf. Comput. 85(1), 12–75 (1990) 5. Fiala, J., Golovach, P.A., Kratochv´ıl, J.: Distance Constrained Labelings of Graphs of Bounded Treewidth. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds.) ICALP 2005. LNCS, vol. 3580, pp. 360–372. Springer, Heidelberg (2005) 6. Fiala, J., Kloks, T., Kratochv´ıl, J.: Fixed-Parameter Complexity of λ-Labelings. In: Widmayer, P., Neyer, G., Eidenbenz, S. (eds.) WG 1999. LNCS, vol. 1665, Springer, Heidelberg (1999) 7. Fiala, J., Kratochv´ıl, J.: Complexity of Partial Covers of Graphs. In: Eades, P., Takaoka, T. (eds.) ISAAC 2001. LNCS, vol. 2223, pp. 537–549. Springer, Heidelberg (2001) 8. Fiala, J., Kratochv´ıl, J.: Partial covers of graphs. Discussiones Mathematicae Graph Theory 22, 89–99 (2002) 9. Fiala, J., Kratochv´ıl, J., Kloks, T.: Fixed-parameter complexity of λ-labelings. Discrete Applied Mathematics 113(1), 59–72 (2001) 10. Fiala, J., Kratochv´ıl, J., P´ or, A.: On the computational complexity of partial covers of theta graphs. Electronic Notes in Discrete Mathematics 19, 79–85 (2005) 11. Fiala, J., Kratochv´ıl, J., Proskurowski, A.: Distance Constrained Labeling of Precolored Trees. In: Restivo, A., Ronchi Della Rocca, S., Roversi, L. (eds.) ICTCS 2001. LNCS, vol. 2202, pp. 285–292. Springer, Heidelberg (2001) 12. Fiala, J., Kratochv´ıl, J., Proskurowski, A.: Systems of distant representatives. Discrete Applied Mathematics 145(2), 306–316 (2005) 13. Garey, M.R., Johnson, D.S.: Computers and Intractability. W. H. Freeman and Co., New York (1979) 14. Golovach, P.A.: Systems of pair of q-distant representatives and graph colorings (in Russian). Zap. nau. sem. POMI 293, 5–25 (2002) 15. Griggs, J.R., Yeh, R.K.: Labelling graphs with a condition at distance 2. SIAM Journal of Discrete Mathematics 5(4), 586–595 (1992) 16. Harary, F.: Graph theory. Addison-Wesley Series in Mathematics IX (1969)
Distance Constrained Labelings of Trees
135
17. Leese, R.A.: A fresh look at channel assignment in uniform networks. In: EMC 1997 Symposium, Zurich, pp. 127–130 (1997) 18. Leese, R.A.: Radio spectrum: a raw material for the telecommunications industry. In: 10th Conference of the European Consortium for Mathematics in Industry, Goteborg (1998) 19. Leese, R.A., Noble, S.D.: Cyclic labellings with constraints at two distances. Electr. J. Comb. 11(1) (2004) 20. Liu, D.D.-F., Zhu, X.: Circulant distant two labeling and circular chromatic number. Ars Combinatoria 69, 177–183 (2003) 21. Matouˇsek, J., Neˇsetˇril, J.: Invitation to Discrete Mathematics. Oxford University Press, Oxford (1998) 22. McDiarmid, C.: Discrete mathematics and radio channel assignment. In: Recent advances in algorithms and combinatorics. CMS Books Math./Ouvrages Math. SMC, vol. 11, pp. 27–63. Springer, Heidelberg (2003) 23. Yeh, R.K.: A survey on labeling graphs with a condition at distance two. Discrete Mathematics 306(12), 1217–1231 (2006)
A Characterization of NCk by First Order Functional Programs Jean-Yves Marion and Romain P´echoux ´ Loria-INPL, Ecole Nationale Sup´erieure des Mines de Nancy, B.P. 239, 54506 Vandœuvre-l`es-Nancy Cedex, France
[email protected],
[email protected]
Abstract. This paper is part of a research on static analysis in order to predict program resources and belongs to the implicit computational complexity line of research. It presents intrinsic characterizations of the classes of functions, which are computable in NCk , that is by a uniform, poly-logarithmic depth and polynomial size family of circuits, using first order functional programs. Our characterizations are new in terms of first order functional programming language and extend the characterization of NC1 in [9]. These characterizations are obtained using a complexity measure, the sup-interpretation, which gives upper bounds on the size of computed values and captures a lot of program schemas.
1
Introduction
Our work is related to machine independent characterizations of functional complexity classes initiated by Cobham’s work [14] and studied by the Implicit computational complexity (ICC) community, including safe recursion of Bellantoni and Cook [4], data tiering of Leivant [23], linear type disciplines by Girard et al. [17,18], Lafont [22], Baillot-Mogbil [2], Gaboardi-Ronchi Della Rocca [16] and Hofmann [19] and studies on the complexity of imperative programs using matrix algebra by Kristiansen-Jones [21] and Niggl-Wunderlich [28]. Traditional results of the ICC focus on capturing all functions of a complexity class and we should call this approach extensional whereas our approach, which tries to characterize a class of programs, which represents functions in some complexity classes, as large as possible, is rather intensional. In other words, we try to delineate a broad class of programs using a certain amount of resources. Our approach relies on methods combining term rewriting systems and interpretation methods for proving complexity upper bounds by static analysis. It consists in assigning a function from real numbers to real numbers to some symbols of a program. Such an assignment is called a sup-interpretation if it satisfies some specific semantics properties introduced in [27]. Basically, a sup-interpretation provides upper bounds on the size of computed values. Supinterpretation is a generalization of the notion of quasi-interpretation of [10]. The problem of finding a quasi-interpretation or sup-interpretation of a given program, called synthesis problem, is crucial for potential applications of the M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 136–147, 2008. c Springer-Verlag Berlin Heidelberg 2008
A Characterization of NCk by First Order Functional Programs
137
method. It consists in automatically finding an interpretation of a program in order to determine an upper bound on its complexity. It was demonstrated in [1,8] that the synthesis problem is decidable in exponential time for small classes of polynomials. Quasi-interpretations and sup-interpretations have already been used to capture the sets of functions computable in polynomial time and space [26,7] and capture a broad class of algorithms, including greedy algorithms and dynamic programming algorithms. Consequently, it is a challenge to study whether this approach can be adapted to characterize small parallel complexity classes. Parallel algorithms are difficult to design. Employing the supinterpretation method leads to delineate efficient parallel programs amenable to circuit computing. Designing parallel implementations of first order functional programs with interpretation methods for proving complexity bounds, might be thus viable in the near future. A circuit Cn is a directed acyclic graph built up from Boolean gates And, Or and Not. Each gate has an in-degree less or equal to two and an out-degree equal to one. A circuit has n input nodes and g(n) output nodes, where g(n) = O(nc ), for some constant c ≥ 1. Thus, a circuit Cn computes a function fn : {0, 1}n → {0, 1}g(n). A family of circuits is a sequence of circuits C = (Cn )n , which computes a family of finite functions (fn )n over {0, 1}∗ . A function f is computed by a family of circuits (Cn )n if the restriction of f to inputs of size n is computed by Cn . A uniformity condition ensures that there is a procedure which, given n, produces a description of the circuit Cn . Such a condition is introduced to ensure that a family of circuits computes a reasonable function. All along, we shall consider UE ∗ -uniform family of circuits defined in [29]. The complexity of a circuit depends on its depth (the longest path from an input to an output gate) and its size (the number of gates). The class NCk is the class of functions computable by a UE ∗ -uniform family of circuits of size bounded by O(nd ), for some constant d, and depth bounded by O(logk (n)). Intuitively, it corresponds to the class of functions computed in poly-logarithmic time with a polynomial number of processors. Following [3], the main motivation in the introduction of such classes was the search for separation results: “NC1 is the at the frontier where we obtain interesting separation results”. NC1 contains binary addition, substraction, prefix sum of associative operators. Buss [11] has demonstrated that the evaluation of boolean formulas is a complete problem for NC1 . A lot of natural algorithms belong to the distinct levels of the NCk hierarchy. In particular, the reachability problem in a graph or the search for a minimum covering tree in a graph are two problems in N C 2 . In this paper, we define a restricted class of first order functional programs, called fraternal and arboreal programs, using the notion of sup-interpretation of [27]. We demonstrate that functions, which are computable by these programs at some rank k, are exactly the functions computed in NCk . This result generalizes the characterization of NC1 established in [9]. To our knowledge, these are the first results, which connect small parallel complexity classes and first order functional programs.
138
2
J.-Y. Marion and R. P´echoux
First Order Functional Programs
2.1
Syntax of Programs
We define a generic first order functional programming language. The vocabulary Σ = Var, Cns, Op, Fct is composed of four disjoint sets of symbols. The arity of a symbol is the number n of arguments that it takes. A program p consists in a vocabulary and a set of rules R defined by the following grammar: (Values) T (Cns) v ::= c | c(v1 , · · · , vn ) (Patterns) T (Var, Cns) p ::= c | x | c(p1 , · · · , pn ) (Expressions) T (Var, Cns, Op, Fct) e ::= c | x | c(e1 , · · · , en ) | op(e1 , · · · , en ) | f(e1 , · · · , en ) (Rules) Rr ::= f(p1 , · · · , pn ) → e where x ∈ Var is a variable, c ∈ Cns is a constructor symbol, op ∈ Op is an operator, f ∈ Fct is a function symbol, p1 , · · · , pn ∈ T (Var, Cns) are patterns and e1 , · · · , en ∈ T (Var, Cns, Op, Fct) are expressions. The program’s main function symbol is the first function symbol in the program’s list of rules. Throughout the paper, we only consider orthogonal programs having disjoint and linear rule patterns. Consequently, each program is confluent [20]. We will use the notation e to represent a sequence of expressions, that is e = e1 , . . . , en . 2.2
Semantics
The domain of computation of a program p is the constructor algebra Values = T (Cns). Set Values∗ = Values∪{Err}, where Err is a special symbol associated to runtime errors. An operator op of arity n is interpreted by a function op from Valuesn to Values∗ . Operators are essentially basic partial functions like destructors or characteristic functions of predicates like =. Set Values# = Values ∪ {Err, ⊥}, where ⊥ means that a program is nonterminating. Given a program p of vocabulary Var, Cns, Op, Fct and an expression e ∈ T (Cns, Op, Fct), the computation of e, noted e, is defined by e = w ∗ ∗ iff e→w and w ∈ Values∗ , otherwise e = ⊥, where → is the reflexive and transitive closure of the rewriting relation → induced by the rules of R. By definition, if no rule is applicable, then an error occurs and e = Err. A program of main function symbol f computes a partial function φ : Valuesn → Values∗ defined by ∀u1 , · · · , un ∈ Values, φ(u1 , · · · , un ) = w iff f(u1 , · · · , un ) = w. Definition 1 (Size). The size of an expression e is defined by |e| = 0, if e is a 0-arity symbol, and |b(e1 , · · · , en )| = i∈{1,...,n} |ei | + 1, if e = b(e1 , · · · , en ). Example 1. Consider the following program which computes the logarithm function over binary numbers using the constructor symbols {0, 1, }: f(x) → rev(log(x)) log(i(x)) → if(Msp(Fh(i(x)), Sh(i(x))), 0(log(Fh(i(x)))), 1(log(Fh(i(x))))) log() →
A Characterization of NCk by First Order Functional Programs
139
where if(u, v, w) is an operator, which outputs v or w depending on whether u is equal to 1() or 0(), rev is an operator, which reverses a binary value given as input, Msp is an operator, which returns 1() if the leftmost |x| |y| bits of x (where ∀x, y ∈ N, x y = 0 if y > x and x − y otherwise) are equal to the empty word and returns 0() otherwise, Fh is an operator, which outputs the leftmost |x|/2 bits of x, and Sh is an operator, which outputs the rightmost |x|/2 bits of x. The algorithm tests whether the number |x|/2 of leftmost bits is equal to the number |x|/2 of rightmost bits in the input x using the operator Msp. In this case, the last digit of the logarithm is a 0, otherwise it is a 1. Finally, the computation is performed by applying a recursive call over half of the input digits and the result is obtained by reversing the output, using the operator rev. For any binary value v, we have log(v) = u, where u is the value representing the binary logarithm of the input value v. For example, log(1(0(0(0()))) = 1(0(0())). 2.3
Call-Tree
We now describe the notion of call-tree which is a representation of a program state transition sequences induced by the rewrite relation → using a call-by-value strategy. In this paper, the notion of call-tree allows to control the successive function calls corresponding to a recursive rule. First, we define the notions of context and substitution. A context is an expression C[1 , · · · , r ] containing one occurrence of each i , with i new variables which do not appear in Σ. A substitution is a finite mapping from variables to T (Var, Cns, Op, Fct). The substitution of each i by an expression di in the context C[1 , · · · , r ] is noted C[d1 , · · · , dr ]. A ground substitution σ is a mapping from variables to Values. Throughout the paper, we use the symbol σ and the word “substitution” to denote a ground substitution. The application of a substitution σ to an expression (or a sequence of expressions) e is noted eσ. Definition 2. Suppose that we have a program p. A state f, u1 , · · · , un of p is a tuple where f is a function symbol of arity n and u1 , · · · , un are values of Values∗ . There is state transition, noted η1 η2 , between two states η1 = f, u1 , . . . , un and η2 = g, v1 , · · · , vm if there are a rule f(p1 , · · · , pn ) → e of p, a substitution σ, a context C[−] and expressions e1 , · · · , em such that ∀i ∈ {1, n} ∗ pi σ = ui , ∀j ∈ {1, m}, ej σ = vj and e = C[g(e1 , · · · , em )]. We write to denote the reflexive and transitive closure of . A call-tree of p of root f, u1 , · · · , un is the following tree: – the root is the node labeled by the state f, u1 , · · · , un . ∗ – the nodes are labeled by states of {η | f, u1 , · · · , un η}, – there is an edge between two nodes η1 and η2 if there is a transition between both states which label the nodes (i.e. η1 η2 ).
140
J.-Y. Marion and R. P´echoux
A branch of the call-tree is a sequence of states of the call-tree η1 , · · · , ηk such that η1 η2 . . . ηk−1 ηk . Given a branch B of a call-tree, the depth of the branch depth(B) is the number of states in the branch, i.e. if B = η1 , . . . , ηi−1 , ηi , then depth(B) = i. Notice that a call-tree may be infinite if it corresponds a non-terminating program. 2.4
Fraternity
The fraternity is the main syntactic notion, we use in order to restrict the computational power of considered programs. Definition 3. Given a program p, the precedence ≥Fct is defined on function symbols by f ≥Fct g if there is a rule f(p) → C[g(e)] in p. The reflexive and transitive closure of ≥Fct is also noted ≥Fct . Define ≈Fct by f ≈Fct g iff f ≥Fct g and g ≥Fct f and define >Fct by f >Fct g iff f ≥Fct g and not g ≥Fct f. We extend the precedence to operators and constructor symbols by ∀f ∈ Fct, ∀b ∈ Cns ∪ Op, f >Fct b. Definition 4. Given a program p, an expression C[g1 (e1 ), . . . , gr (er )] is a fraternity activated by f(p1 , · · · , pn ) if: 1. f(p1 , · · · , pn ) → C[g1 (e1 ), . . . , gr (er )] is a rule of p, 2. For each i ∈ {1, r}, gi ≈Fct f, 3. For every symbol b in the context C[1 , · · · , r ], f >Fct b. Notice that a fraternity corresponds to a recursive rule. Example 2. if(Msp(Fh(i(x)), Sh(i(x))), 0(log(Fh(i(x)))), 1(log(Fh(i(x))))) is the only fraternity in the program of example 1. It is activated by log(i(x)) by taking C[1 , 2 ] = if(Msp(Fh(i(x)), Sh(i(x))), 0(1 ), 1(2 )) since log ≈Fct log.
3 3.1
Sup-interpretations Monotonic, Polynomial and Additive Partial Assignments
Definition 5. A partial assignment I is a partial mapping from the vocabulary Σ such that, for each symbol b of arity n in the domain of I, it yields a partial function I(b) : (R+ )n −→ R+ , where R+ is the set of non-negative real numbers. The domain of a partial assignment I is noted dom(I) and satisfies Cns ∪ Op ⊆ dom(I). A partial assignment I is monotonic if for each symbol b ∈ dom(I), we have ∀i ∈ {1, . . . , n} , ∀Xi , Yi ∈ R+ , Xi ≥ Yi ⇒ I(b)(X1 , · · · , Xn ) ≥ I(b)(Y1 , · · · , Yn ). A partial assignment I is polynomial if for each symbol b of dom(I), I(b) is a max-polynomial function ranging over R+ . That is, I(b) = max(P1 , . . . , Pk ), with Pj polynomials. A partial assignment is additive if the assignment nof each constructor symbol c of arity n is of the shape I(c)(X1 , · · · , Xn ) = i=1 Xi + αc , with αc ≥ 1, whether n > 0, and I(c) = 0 otherwise.
A Characterization of NCk by First Order Functional Programs
141
If each function symbol of a given expression e having m variables x1 , · · · , xm belongs to dom(I) then, given m fresh variables X1 , · · · , Xm ranging over R+ , we define the homomorphic extension I ∗ (e) of the assignment I inductively by: 1. If xi is a variable of Var, then I ∗ (xi ) = Xi 2. If b is a symbol of Σ of arity 0, then I ∗ (b) = I(b). 3. If b is a symbol of arity n > 0 and e1 , · · · , en are expressions, then I ∗ (b(e1 , · · · , en )) = I(b)(I ∗ (e1 ), . . . , I ∗ (en )) Given a sequence e = e1 , · · · , en , we will sometimes use the notation I ∗ (e) to denote I ∗ (e1 ), . . . , I ∗ (en ). 3.2
Sup-interpretations
Definition 6. A sup-interpretation is a partial assignment θ which satisfies the three conditions below: 1. θ is a monotonic assignment. 2. For each v ∈ Values, θ∗ (v) ≥ |v| 3. For each symbol b ∈ dom(θ) of arity n and for each value v1 , . . . , vn of Values, if b(v1 , . . . , vn ) ∈ Values, then θ∗ (b(v1 , . . . , vn )) ≥ θ∗ (b(v1 , . . . , vn )) A sup-interpretation is additive (respectively polynomial) if it is an additive (respectively polynomial) assignment. Notice that if a sup-interpretation is an additive assignment then the second condition of the above definition is always satisfied. Intuitively, the sup-interpretation is a special program interpretation which bounds from above the output size of the function denoted by the program, as demonstrated in the following lemma: Lemma 1 ([27]). Given a sup-interpretation θ and an expression e defined over dom(θ), if e ∈ Values then we have |e| ≤ θ∗ (e) ≤ θ∗ (e) Example 3. Consider the program of example 1. Define the additive assignment θ by θ(1)(X) = θ(0)(X) = X + 1, θ() = 0, θ(Msp)(X, Y ) = X Y = max(X − Y, 0), θ(Fh)(X) = X/2 and θ(Sh)(X) = X/2. We claim that θ is an additive and polynomial sup-interpretation. Indeed, all these functions are monotonic. Moreover, for every binary value v, we have θ∗ (v) = |v| since the sup-interpretation of a value is equal to its size. Finally, such an assignment satisfies the third condition of the above definition. In particular, we check that, for every binary values u and v, if Msp(u, v) ∈ Values then we have: θ∗ (Msp(u, v)) = θ(Msp)(θ∗ (u), θ∗ (v)) ∗
∗
By definition of assignments
= θ (u) θ (v)
By definition of θ(Msp)
= |u| |v| = |Msp(u, v)|
Since ∀w ∈ Values, θ∗ (w) = |w| By definition of Msp
= θ∗ (Msp(u, v))
Since ∀w ∈ Values, θ∗ (w) = |w|
142
J.-Y. Marion and R. P´echoux
Notice that θ is a partial assignment since it is not defined on the symbols if and log. However, we can extend θ by θ(if)(X, Y, Z) = max(Y, Z) and θ(log)(X) = X in order to obtain a total, additive and polynomial sup-interpretation.
4
Arboreal and Fraternal Programs
In this section, we give two restrictions on programs (i) the arboreal condition which ensures that the number of successive recursive calls corresponding to a function symbol (in the depth of a call-tree) is bounded logarithmically by the input size (ii) and the fraternal condition which ensures that the size of each computed value is polynomially bounded by the input size. 4.1
Arboreal Programs
An arboreal program is a program whose recursion depth (number of successive recursive calls) is bounded logarithmically by the input size. This logarithmic upper bound is obtained by ensuring that some complexity measure, corresponding to the combination of sup-interpretations and motononic and polynomial partial assignments, is divided by a fixed constant K > 1 at each recursive call. Definition 7. A program p is arboreal iff there are a polynomial and additive sup-interpretation θ, a monotonic and polynomial partial assignment ω and a constant K > 1 such that for every fraternity C[g1 (e1 ), . . . , gr (er )] activated by f(p), the following conditions are satisfied: – For any substitution σ, ω(f)(θ∗ (pσ)) ≥ 1 – For any substitution σ and ∀i ∈ {1, . . . , r}, ω(f)(θ∗ (pσ)) ≥ K×ω(gi )(θ∗ (ei σ)) – There is no function symbol h ∈ ei such that h ≈Fct f. Example 4. In the program of example 1, there is one fraternity: if(Msp(Fh(i(x)), Sh(i(x))), 0(log(Fh(i(x)))), 1(log(Fh(i(x))))) activated by log(i(x)). Taking the additive and polynomial sup-interpretation θ of example 3, the polynomial partial assignment ω(log)(X) = X and the constant K = 2, we check that the program is arboreal: ω(log)(θ∗ (i(x))) = X + 1 ≥ 2 × (X + 1)/2 = K × ω(log)(θ∗ (Fh(i(x)))) ≥ 1 Lemma 2. Assume that p is an arboreal program. Then p is terminating. That is, for every function symbol f and for any values u in Values, f(u) is in Values∗ . Moreover, for each branch B = f, u1 , · · · , un , . . . , g, v1 , · · · , vm of a calltree corresponding to one execution of p and such that f ≈Fct g, we have: depth(B) ≤ α × log(ω(f)(θ∗ (u1 ), . . . , θ∗ (un ))), for some constant α
A Characterization of NCk by First Order Functional Programs
143
Proof. Consider an arboreal program p, a call-tree of p and one of its branch of the shape f, u1 , · · · , un g, v1 , · · · , vm , with f ≈Fct g. We know, by definition of a call-tree, that there are a rule of the shape f(p1 , · · · , pn ) → C[g(e1 , · · · , em )] and a substitution σ such that pi σ = ui and ej σ = vj . We obtain: ω(f)(θ∗ (u1 ), . . . , θ∗ (un )) = ω(f)(θ∗ (p1 σ), . . . , θ∗ (pn σ)) ≥ K × ω(gi )(θ∗ (e1 σ), . . . , θ∗ (em σ)) ∗
∗
≥ K × ω(gi )(θ (v1 ), . . . , θ (vm ))
By definition 7 By lemma 1
Applying the same reasoning, we demonstrate, by induction on the depth of a branch, that for each branch B = f, u1 , · · · , un . . . g, v1 , · · · , vm , with f ≈Fct g and depth(B) = i: ω(f)(θ∗ (u1 ), . . . , θ∗ (un )) ≥ K i × ω(g)(θ∗ (v1 ), . . . , θ∗ (vm )) Consequently, the depth is bounded by logK (ω(f)(θ∗ (u1 ), . . . , θ∗ (un ))), because of the first condition of definition 7, whenever f, u1 , · · · , un is the first state of the considered branch. It remains to combine this result with the equality K (x) log(x) = log logK (2) in order to obtain the required result. Since every branch corresponding to a recursive call has a bounded depth and we are considering confluent programs, the program is terminating. 4.2
Fraternal Programs
Definition 8. A program p is fraternal if there is a polynomial and additive sup-interpretation θ such that for each fraternity C[g1 (e1 ), . . . , gr (er )] activated by f(p1 , · · · , pn ) and for each symbol b of arity m appearing in C or in ej , there are constants αbi,j , βjb ∈ R+ satisfying: m θ(b)(X1 , · · · , Xm ) = max( αbi,j × Xi + βjb ) j∈J
i=1
where J is a finite set of indices. In other words, a program is fraternal if every symbol in a context or in an argument of a fraternity admits an affinely bounded sup-interpretation. Example 5. C[log(Fh(i(x))), log(Fh(i(x)))] is the only fraternity in the program of example 1, where C[1 , 2 ] = if(Msp(Fh(i(x)), Sh(i(x))), 0(1 ), 1(2 )). Consequently, we have to check that the symbols if, Msp, Fh, Sh, 0 and 1 admit affine sup-interpretations. This is the case by taking the polynomial supinterpretation of example 3 and, consequently, the program is fraternal. Lemma 3. Given a sup-interpretation θ and a monotonic and polynomial partial assignment ω for which the program p is arboreal and fraternal, there is a polynomial P such that for each sequence of values u1 , · · · , un and for each function symbol f of arity n, we have: |f(u1 , · · · , un )| ≤ P (maxi=1..n (|ui |)).
144
J.-Y. Marion and R. P´echoux
Proof (Sketch). Lemma 2 states that the number of successive recursive calls occurring in the depth of the call-tree is logarithmically bounded by the input size. By definition 8, the contexts and arguments of a recursive call (fraternity) of a fraternal program are affinely bounded by the input size. Consequently, a logarithmic composition of affinely bounded functions remains polynomially bounded.
5
Characterizations of NCk and NC
Similarly to Buss’ encoding [11], we represent constructors and destructors (implicitly used in pattern matching definitions) by UE ∗ -uniform circuits of constant depth and polynomial size. Given such an encoding code and a function φ : Valuesn → Values∗ computed by some program p, we define a function ˜ φ˜ : {0, 1}∗ → {0, 1}∗ by ∀u ∈ Valuesn φ(code(u)) = code(φ(u)). A function φ of Values is computable in NCk relatively to the encoding code if and only if φ˜ is computable by a UE ∗ -uniform family of circuits in NCk . Now, we define a notion of rank in order to make a distinction between the levels of the NCk hierarchy: Definition 9. Given a program p composed by a vocabulary Σ, a set of rules R and an encoding code, the rank of a symbol b, rk(b), and the rank of a symbol b relatively to an expression e, ∇(b, e), are partial functions ranging over N and defined by induction over the precedence ≥Fct : – If b is a variable or a constructor symbol then rk(b) = 0. – If b is an operator, which can be computed by a UE ∗ -uniform family of circuits of polynomial size and depth bounded by logk relatively to the encoding code, then rk(b) = k. – If b is a function symbol we define its rank relatively to an expression e by: • If ∀b ∈ e, b >Fct b then ∇(b, e) = maxb ∈e (rk(b )) • Otherwise ∃b ∈ e such that b ≈Fct b and e = b (e1 , · · · , en ): ∗ If b >Fct b then ∇(b, e) = max(rk(b ) + 1, ∇(b, e1 ), . . . , ∇(b, en )) ∗ Otherwise b ≈Fct b then ∇(b, e) = max(∇(b, e1 ), . . . , ∇(b, en )) + 1 – Finally, we define the rank of a function symbol b by: • rk(b) = maxb(p)→e∈R (∇(b, e)) where b ∈ e means that the symbol b appears in the expression e. The rank of a program is defined to be the highest rank of a symbol of a program. Example 6. In the program of example 1, the operators if, Fh, Sh, rev and Msp are of rank 0 since they all belong to N C 0 (Cf.[5] for Fh, Sh and Msp) using an encoding code which returns the binary value in T ({0, 1, }) corresponding to a binary string in {0, 1}∗. Consequently, we obtain that: rk(log) = ∇(log, if(Msp(Fh(i(x)), Sh(i(x))), 0(log(Fh(i(x)))), 1(log(Fh(i(x)))))) = max(rk(if) + 1, ∇(log, 0(log(Fh(i(x))))), ∇(log, 1(log(Fh(i(x)))))) = max(1, rk(0) + 1, rk(1) + 1, ∇(log, log(Fh(i(x))))) = max(1, ∇(log, Fh(i(x))) + 1) = 1
A Characterization of NCk by First Order Functional Programs
145
Theorem 1. A function φ from Valuesn to Values∗ is computed by a fraternal and arboreal program of rank k ≥ 1 (resp. k ∈ N) if and only if φ is computable in NCk (resp. NC). Proof (Sketch). We can show, by induction on the rank and the precedence ≥Fct , that an arboreal and fraternal program of rank k can be simulated by a UE ∗ -uniform family of circuits of polynomial size and logk depth, using a discriminating circuit of constant depth which, given some inputs, picks the right rule to apply. The logk depth relies on a logarithmic number of logk−1 depth circuits compositions by lemma 2. The polynomial size is a consequence of lemma 3. Conversely, we use the characterization of Clote [12] to show the completeness. Clote’s algebra is based on two recursion schemas called Concatenation Recursion on Notation (CRN) and Weak bounded Recursion on Notation (WBRN) defined over a function algebra from natural numbers to natural numbers when considering the following initial functions zero(x) = 0, s0 (x) = 2 × x, s1 (x) = 2 × x + 1, πkn (x1 , · · · , xn ) = xk , |x| = log2 (x + 1), x#y = 2|y|×|x|, bit(x, i) = x/2i mod 2 and a function tree, which computes alternations of bitwise conjunctions and bitwise disjunctions. All Clote’s initial functions can be simulated by operators of rank 1 since they are in AC 0 ([12]). We show that a function of rank k in Clote’s algebra can be simulated by an arboreal and fraternal program of rank k + 1, using divide-and-conquer algorithms (This requirement is needed because of the arboreal property). A difficulty to stress here is that Clote’s rank differs from the notion of rank we use because of the function tree and the CRN schema.
6
Comparison with Previous Works
This work extends the characterization of NC1 in [9] to the NCk hierarchy. For that purpose, we have substituted the more general notion of fraternal program to the notion of explicitly fraternal of [9] and we have introduced a notion of rank. In the literature, there are many characterizations of NC1 using several computational models, like ATM or functions algebra. Compton and Laflamme [15] gave a characterization of NC1 based on finite functions. Bloch [5] used ramified recurrence schema relying on a divide and conquer strategy to characterize NC1 . Leivant and Marion [25] have established another characterization using ramified recurrence over a specific data structure, well balanced binary trees. We can show that our characterization strictly generalizes the ones of Bloch and LeivantMarion since they both rely on divide-and-conquer strategies. The characterizations of the NCk classes by Clote [13], using a bounded recurrence schema `a la Cobham, are distinct from our characterizations since they do not rely on a divide and conquer strategy. However, like Cobhams’ work, Clote’s WBRN schema requires an upper bound on the computed function which is removed from our characterizations. Other characterizations of NC are also provided in [24,6]. All these purely syntactic characterizations capture a few algorithmic patterns. On the contrary, our work tries to delineate a broad class of algorithms using the semantics notion of sup-interpretation.
146
J.-Y. Marion and R. P´echoux
References 1. Amadio, R.: Synthesis of max-plus quasi-interpretations. Fundamenta Informaticae 65(1–2) (2005) 2. Baillot, P., Mogbil, V.: Soft lambda-calculus: A language for polynomial time computation. In: Walukiewicz, I. (ed.) FOSSACS 2004. LNCS, vol. 2987, pp. 27–41. Springer, Heidelberg (2004) 3. Barrington, D., Immerman, N., Straubing, H.: On uniformity within NC. J. of Computer System Science 41(3), 274–306 (1990) 4. Bellantoni, S., Cook, S.: A new recursion-theoretic characterization of the poly-time functions. Computational Complexity 2, 97–110 (1992) 5. Bloch, S.: Function-algebraic characterizations of log and polylog parallel time. Computational complexity 4(2), 175–205 (1994) 6. Kahle, R., Marion, J.-Y., Bonfante, G., Oitavem, I.: Towards an implicit charac´ Z. (ed.) CSL 2006. LNCS, vol. 4207, pp. 212–224. terization of N C k . In: Esik, Springer, Heidelberg (2006) 7. Bonfante, G., Marion, J.-Y., Moyen, J.-Y.: On lexicographic termination ordering with space bound certifications. In: Bjørner, D., Broy, M., Zamulin, A.V. (eds.) PSI 2001. LNCS, vol. 2244, Springer, Heidelberg (2001) 8. Bonfante, G., Marion, J.-Y., Moyen, J.-Y., P´echoux, R.: Synthesis of quasiinterpretations. In: LCC2005, LICS affiliated Workshop (2005), http://hal.inria.fr 9. Bonfante, G., Marion, J.-Y., P´echoux, R.: A characterization of alternating log time by first order functional programs. In: Hermann, M., Voronkov, A. (eds.) LPAR 2006. LNCS (LNAI), vol. 4246, pp. 90–104. Springer, Heidelberg (2006) 10. Bonfante, G., Marion, J.Y., Moyen, J.Y.: Quasi-interpretations, a way to control resources. TCS (2007) 11. Buss, S.: The boolean formula value problem is in ALOGTIME. In: STOC, pp. 123–131 (1987) 12. Clote, P.: Sequential, machine-independent characterizations of the parallel complexity classes ALOGTIME, AC k , N C k and NC. In: Buss, R., Scott, P. (eds.) Workshop on Feasible Math, pp. 49–69. Birkh¨ auser, Basel (1989) 13. Clote, P.: Computational models and function algebras. In: Leivant, D. (ed.) LCC 1994. LNCS, vol. 960, pp. 98–130. Springer, Heidelberg (1995) 14. Cobham, A.: The intrinsic computational difficulty of functions. In: Conf. on Logic, Methodology, and Philosophy of Science, pp. 24–30. North-Holland, Amsterdam (1962) 15. Compton, K.J., Laflamme, C.: An algebra and a logic for N C. Inf. Comput. 87(1/2), 240–262 (1990) 16. Gaboardi, M., Rocca, S.R.D.: A soft type assignment system for λ-calculus. In: Duparc, J., Henzinger, T.A. (eds.) CSL 2007. LNCS, vol. 4646, pp. 253–267. Springer, Heidelberg (2007) 17. Girard, J.-Y.: Light linear logic. Inf. and Comp. 143(2), 175–204 (1998) 18. Girard, J.Y., Scedrov, A., Scott, P.: Bounded linear logic. Theoretical Computer Science 97(1), 1–66 (1992) 19. Hofmann, M.: Programming languages capturing complexity classes. SIGACT News Logic Column 9 (2000) 20. Huet, G.: Confluent reductions: Abstract properties and applications to term rewriting systems. Journal of the ACM 27(4), 797–821 (1980)
A Characterization of NCk by First Order Functional Programs
147
21. Kristiansen, L., Jones, N.D.: The flow of data and the complexity of algorithms. In: Cooper, S.B., L¨ owe, B., Torenvliet, L. (eds.) CiE 2005. LNCS, vol. 3526, pp. 263–274. Springer, Heidelberg (2005) 22. Lafont, Y.: Soft linear logic and polynomial time. Theoretical Computer Science 318(1-2), 163–180 (2004) 23. Leivant, D.: Predicative recurrence and computational complexity I: Word recurrence and poly-time. In: Feasible Mathematics II, pp. 320–343. Birkh¨ auser, Basel (1994) 24. Leivant, D.: A characterization of NC by tree recurrence. In: FOCS 1998, pp. 716– 724 (1998) 25. Leivant, D., Marion, J.-Y.: A characterization of alternating log time by ramified recurrence. TCS 236(1-2), 192–208 (2000) 26. Marion, J.-Y., Moyen, J.-Y.: Efficient first order functional program interpreter with time bound certifications. In: Parigot, M., Voronkov, A. (eds.) LPAR 2000. LNCS (LNAI), vol. 1955, pp. 25–42. Springer, Heidelberg (2000) 27. Marion, J.-Y., P´echoux, R.: Resource analysis by sup-interpretation. In: Hagiya, M., Wadler, P. (eds.) FLOPS 2006. LNCS, vol. 3945, pp. 163–176. Springer, Heidelberg (2006) 28. Niggl, K.-H., Wunderlich, H.: Certifying polynomial time and linear/polynomial space for imperative programs. SIAM J. on Computing (to appear) 29. Ruzzo, W.: On uniform circuit complexity. J. of Computer System Science 22(3), 365–383 (1981)
The Structure of Detour Degrees Lars Kristiansen1 and Paul J. Voda2 1
2
Department of Mathematics, University of Oslo
[email protected] http://www.iu.hio.no/~larskri Institute of Informatics, Comenius University Bratislava
[email protected] http://www.fmph.uniba.sk/~voda
Abstract. We introduce a new subrecursive degree structure: the structure of detour degrees. We provide definitions and basic properties of the detour degrees, including that they form a lattice and admit a jump operator. Our degree structure sheds light upon the open problems involving the small Grzegorcyk classes. There are also connections to complexity theory and complexity classes defined by resource-bounded Turing machines.
1
Introduction
Two main categories of subrecursive degrees are known from the literature: (i) degrees of honest functions and (ii) degrees of computable sets. An honest function is a total computable and monotone increasing function with a simple graph, i.e., the characteristic function for the relation f (x) = y is of degree 0. (E.g., if we are working with elementary degrees, the graph should be elementary; if we are working with primitive recursive degrees, the graph should be primitive recursive.) Typical reducibility relations between honest functions will be “being (Kalmar) elementary in” and “being primitive recursive in”. Strong resource bounded reducibilities like “being polynomial time computable in” do not work when we are dealing with honest functions, and hence, computer scientists and complexity theoreticians have never been very attracted to honest degree theory. Honest subrecursive degree structures were studied in the early seventies by e.g. Basu [2], Meyer & Ritchie [20], Machtey [16][17][18]; and more recently by the first author, see [7][8][9][10]. The structure of honest elementary degrees is a distributive lattice with strong density properties. Kristiansen studies a jump operator on the honest elementary degrees, see e.g. [7] [10]. The study of degrees of computable sets started off in the mid seventies with Ladner’s seminal paper [15]. In contrast to honest functions, computable sets admit degree structures induced by strong resource bounded reducibility relations, like e.g. “polynomial time computable in” and “logarithmic space computable in”. The resulting degree structures, which of course are highly interesting from a complexity-theoretic point of view, are all upper semi lattices with strong density properties. See Merkle [19] for very general embedding results. Ladner [15] M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 148–159, 2008. c Springer-Verlag Berlin Heidelberg 2008
The Structure of Detour Degrees
149
proved that neither the structure of polynomial time m-degrees nor the structure of polynomial time T-degrees are lattices, and his proof methods and results generalise to wide variety of reducibilities. See Ambos-Spies [1] for more on degrees of computable sets. In this paper we introduce a new subrecursive degree structure that does not fall into either of the two categories discussed above. The theory we develop has a flavour of honest degree theory, but our reducibility relation is sufficiently strong to make the theory interesting from a complexity-theoretic point of view. We prove that our degree structure is a distributive lattice, and we introduce a jump operator.
2
Preliminaries
We will use some notation and terminology from Clote [3]. An operator, here also called (definition) scheme, is a mapping from functions to functions. Let X be a set of functions (possibly given in a slightly informal notation), and let op be a collection of operators. The function algebra [X ; op] is the smallest set of functions containing X and closed under the operations of op. Let comp denote the scheme of composition, i.e. the scheme f (x) = h(g1 (x), . . . , gm (x)) where m ≥ 0, and let pr denotes the scheme of primitive recursion, that is, the scheme f (x, 0) = g(x)
f (x, y + 1) = h(x, y, f (x, y)) .
Further, bounded minimalisation is the scheme f (x) = (μi ≤ y)[R(x, i)] where the least i ≤ y such that the relation R(x, i) holds (μi ≤ y)[R(x, i)] = 0 if no such i exists. Any function f : Nn → N will be identified with a relation Rf ⊆ Nn . The relation Rf (n) holds iff f (x) = 0, and the relation Rf belongs to a class of functions F iff f belongs to F . A problem is a subset of N, or equivalently, a unary relation. We will use A, B, C, . . . to denote problems, and a function f decides a problem A when f (x) = 0 iff x ∈ A. For any set F of number-theoretic functions, F∗ denotes the set of problem decided by the functions in F .1 Let Iin : Nn → N denote the projection function, i.e. Iin (x1 , . . . , xn ) = xi where x = x1 , . . . , xn and 1 ≤ i ≤ n. Let I denote the set of all such projection functions. Let Ck denote the constant function yielding the natural number k. The small Grzegorczyk classes are defined by E 0 := [I, C0 , S, max; comp, br], E 1 := [I, C0 , S, +; comp, br] and E 2 := [I, C0 , S, +, x2 + 2; comp, br] where S denotes the successor function and br denotes the scheme of bounded primitive recursion, that is, the scheme f (x, 0) = g(x) 1
f (x, y + 1) = h(x, y, f (x, y))
f (x, y) ≤ j(x, y) .
Our use of the subscript ∗ differs slightly from the standard in the literature. Normally, F∗ denotes the 0-1 valued functions in F whereas we use F∗ to denote the set of problem decided by the functions in F. This is a matter of convenience, and the deviation has no essential mathematical implications.
150
3
L. Kristiansen and P.J. Voda
Detour Functions and P−-Computability
Definition. P− := [I, C1 ; comp, pr].
Roughly speaking, P− contains the primitive recursive functions which can be defined without the successor function, and it is easy to see that we have f (x) ≤ max(x, 1) for any f ∈ P− . Even though P− contains only non-increasing functions, surprisingly powerful relations and predicates belong to the class. Indeed, it − 0 1 2 is an open problem if P− ∗ = linspace; we have P∗ = E∗ ⊆ E∗ ⊆ E∗ = linspace, 2 and it is not known if any of the inclusions are strict. For more on P− and similar non-increasing function algebras, see e.g. Kristiansen [5] [6] and Kristiansen & Voda [13] [14]. Lemma 1 (Basic functions). The following number-theoretic functions belong ˙ (modified to P− : (i) 0,1 (constant functions); (ii) P(x) (predecessor); (iii) x−y subtraction); (iv) c where c(x, y1 , y2 ) = y1 if x = 0 and c(x, y1 , y2 ) = y2 if x = 0; (v) max(x, y) and min(x, y); (vi) f where f (x, m) = x + 1 (mod m + 1) for x ≤ m; (vii) f where f (x, y, m) = x + y (mod m + 1) for x, y ≤ m; (viii) f where f (x, y, m) = x × y (mod m + 1) for x, y ≤ m; (ix) f where f (x, y, m) = xy (mod m + 1) for x, y ≤ m; (x) for each k ∈ N, the almost everywhere constant function νk where νk (x) = min(x, k). Moreover, the class P− is closed under bounded minimalisation. Proof. To define the constant function 0 is slightly nontrivial. Define g by primitive recursion such that g(x, 0) = x and g(x, y + 1) = y. Then we can define the predecessor P from g since P (x) = g(x, x). Further, we can define the constant function 0 by 0 = P (1). This proves that (i) and (ii) hold. (iii) holds since we ˙ = x and x−(y ˙ + 1) = P (x−y). ˙ have x−0 (iv) holds since c(0, y1 , y2 ) = I12 (y1 , y2 ) 4 and c(x + 1, y1 , y2 ) = I2 (y1 , y2 , x, c(x, y1 , y2 )). Furthermore, (v) holds since ˙ −y), ˙ ˙ −y), ˙ max(x, y) = c(1−(x x, y) and min(x, y) = c(1−(x y, x). (vi) holds since ˙ 0, m−((m ˙ ˙ −1)) ˙ c(m−x, −x) = x + 1 (mod m + 1) for x ≤ m. We omit the rest of the proof. Lemma 2 (Basic relations). (i) The relations x ≤ y and x = y belong to P− . (ii) The relations of P− are closed under the propositional operations (and, or, not) and under bounded existential and bounded universal quantification, i.e., ∃x ≤ y[. . .] and ∀x ≤ y[. . .]. ˙ = 0 iff x ≤ y; we Proof. Apply the functions given by Lemma 1. We have y −x have c(f (x), c(g(x), 0, 1), 1) = 0 iff f (x) = 0 and g(x) = 0; we have c(f (x), 1, 0) = 0 iff f (x) = 0; and so on. Use the bounded minimalisation operator to define the bounded quantifiers. 2
?
?
The notorious problem E∗0 ⊆ E∗1 ⊆ E∗2 is more than 50 years old and was posed 0 in Grzegorczyk [4]. It is trivial that P− ∗ ⊆E∗ , and it follows from Theorem 5 in this ?
0 − 0 paper that P− ∗ = E∗ . (The problem P∗ ⊆ E∗ was posed as open in Kristiansen & 2 Barra [11].) Ritchie [22] proved that E∗ = linspace.
The Structure of Detour Degrees
151
Definition. A detour function f : N → N satisfies the following requirements: (1) x ≤ f (x); (2) f (x) ≤ f (x + 1); and (3) the graph of f , i.e. the relation f (x) = y, is in P− . Henceforth we will use Latin letters f, g, h, etc., to denote detour functions. and Greek letters φ, ψ, ξ, etc., to denote arbitrary functions. A function ψ is P− -computable in the detour function f if there exists φ ∈ P− such that ψ(x) = φ(x, z) for all z ≥ f (max(x)). Let P− (f ) denote the set of functions P− -computable in f . What motivates our definitions? Well, we are mainly interested in P− (f )∗ , that is, the set of problems P− -computable in f . A problem A will be P− -computable in a detour function f if f provides enough resources for a P− -computation to solve the problem. Any P− -function is non-increasing, and thus, a P− -computation solving a problem on input x in a detour function f , can make detours via any number less than f (x), but no further detours are possible. Our definitions assure that if g provides more resources (longer detours) than f , that is if f (x) ≤ g(x), then any problem P− -computable in f will also be P− -computable in g. It is essential that we require the graph of a detour function to be in P− . This prevents us from coding a set A of high computational complexity into a detour function f , e.g. by letting f (x) = 2x when x ∈ A and f (x) = 2x + 1 when x ∈ A, and thus, the requirement ensures that a detour function provides nothing but resources to a P− -computation. (Without the requirement our degree structure would simply degenerate to a degree structure of computable sets.) Definition. For any set A ⊆ N, we define the characteristic function χA by χA (x) = 0 if x ∈ A and χA (x) = 1 if x ∈ A. We define the function f −1 by f −1 (x) = (μi ≤ x)[f (i) ≥ x], and when f is a detour function, we will say that f −1 is the inverse function of f . For any f : N → N and g : N → N, we define the functions max[f, g] and min[f, g] by respectively max[f, g](x) = max(f (x), g(x)) and min[f, g](x) = min(f (x), g(x)) The two following lemmas are collections of very basic facts. The lemmas will be applied frequently throughout the paper, but we will only occasionally refer explicitly to the lemmas. Lemma 3. Let f be any detour function. (i) There exists ξ ∈ P− such that φ(x) = ξ(x, f (max(x))) if, and only if, there exist ψ ∈ P− such that φ(x) = ψ(x, z) for any z ≥ f (max(x)); (ii) A ∈ P− (f )∗ iff χA ∈ P− (f ); (iii) P− (f )∗ is closed under unions, intersections and complements; (iv) f −1 ∈ P− and f −1 (f (x)) ≤ x, besides, if f is strictly monotone, we also have f −1 (f (x)) = x. Proof. We prove the only-if-direction of (i). (The if-direction is trivial.) Assume there exists ξ ∈ P− such that φ(x) = ξ(x, f (max(x))). Let ξ0 (x, y) = (μi ≤ y)[f (max(x)) = i]. The graph of f belongs to P− , and P− is closed under bounded minimalisation, and thus, we have ξ0 ∈ P− . Let ξ1 (x, z) = ξ(x, ξ0 (x, z)). Now we have ξ1 ∈ P− and φ(x) = ξ1 (x, z) for any z ≥ f (max(x)). Next we prove (iv). By the definition of f −1 we have f −1 (f (x)) = (μi ≤ f (x))[f (i) ≥ f (x)]. Since f is monotone, it cannot be the case that f −1 (f (x)) > x, and if f is strictly
152
L. Kristiansen and P.J. Voda
monotone, it cannot be the case that f −1 (f (x)) < x. Thus, f −1 (f (x)) = x when f is strictly monotone. The graph of f belongs to P− , and P− is closed under bounded minimalisation. Thus, f −1 ∈ P− . We omit the rest of the proof. Lemma 4. (i) Any non-constant polynomial p(x) (generated from x and the constants of N by + and ×) is a detour function. (ii) The function 2x is a detour function. (iii) The detour functions are closed under compositions. (iv) max[f, g] and min[f, g] are detour functions if f and g are detour functions. Proof. (i) and (ii) follow from Lemma 1. Let f and g be detour functions. Obviously, we have x ≤ f (g(x)) and f (g(x)) ≤ f (g(x + 1)), and thus, (iii) follows from Lemma 2 since f (g(x)) = y ⇔ ∃z ≤ y[g(x) = z ∧ f (z) = y]. Further, (iv) follows from Lemma 2 since max[f, g](x) = y ⇔ (f (x) = y ∧ ∃z ≤ y[g(x) = z]) ∨ (g(x) = y ∧ ∃z ≤ y[f (x) = z]) and min[f, g](x) = y ⇔ (f (x) = y ∧ ¬∃z < y[g(x) = z]) ∨ (g(x) = y ∧ ¬∃z < y[f (x) = z]) .
4
The Lattice of Detour Degrees
Definition. We define f g :⇔ P− (f )∗ ⊆ P− (g)∗ ; f ≡ g :⇔ f g ∧ g f ; f ≺ g :⇔ f g ∧ g f . Now, ≡ is an equivalence relation, and the detour degrees are the ≡-equivalence classes of the detour functions. We denote the equivalence class of f by dg(f ), that is, dg(f ) = {g | g ≡ f }, and following the tradition of classical computability theory, we use boldface lowercase Latin letters a, b, c, . . . to denote our degrees. Furthermore, we use < and ≤ to denote the relations induced on the degrees by respectively ≺ and , and we use D to denote the set of detour degrees. Lemma 5. Let f, g, h be detour functions. (i) If h f and h g, then h min[f, g]. (ii) If f h and g h, then max[f, g] h. Proof. The following simple claim is the key to the proof. (Claim) Let f and g be detour functions, and let B = {x|f (x) ≤ g(x)}. We have B ∈ P− (min[f, g])∗ (and thus B ∈ P− (f )∗ and B ∈ P− (g)∗ ). Let ξ(x) = (μi ≤ min[f, g](x)) [f (x) = i ∨ g(x) = i] and let χB (x) = c(f (x) = ξ(x), 0, 1). Now, χB is the characteristic function of B, and χB ∈ P− (min[f, g]) by the lemmas in Section 3. This proves the claim. We prove (i). Assume h f and h g (*). (We will prove h min[f, g].) Assume A ∈ P− (h)∗ . (We will prove A ∈ P− (min[f, g])∗ .) By (*), we have A ∈ P− (f )∗ and A ∈ P− (g)∗ , and hence, we have φ1 , φ2 ∈ P− such that χA (x) = φ1 (x, f (x)) and χA (x) = φ2 (x, g(x)). By (Claim) we have η ∈ P− such that
The Structure of Detour Degrees
153
η(x, min[f, g](x)) = 0 iff f (x) ≤ g(x). Let ψ(x, z) = c(η(x, z), φ1 (x, z), φ2 (x, z)). Then, ψ ∈ P− and χA (x) = ψ(x, min[f, g](x)), and thus, A ∈ P− (max[f, g])∗ . This completes the proof of (i). In order to prove (ii), we assume f h and g h (*) and prove max[f, g] h. To prove max[f, g] h, we assume A ∈ P− (max[f, g])∗ and prove A ∈ P− (h)∗ . Since A ∈ P− (max[f, g])∗ , we have φ ∈ P− such that χA (x) = φ(x, max[f, g](x)). Let B = {x|f (x) ≤ g(x)}, and let x ∈ A1 ⇔ φ(x, f (x)) = 0 ∧ x ∈ B
and
x ∈ A2 ⇔ φ(x, g(x)) = 0 ∧ x ∈ B .
By (Claim), we have A1 ∈ P− (f )∗ and A2 ∈ P− (g)∗ , and by (*), we have A1 , A2 ∈ P− (h)∗ . By Lemma 3, we have A1 ∪ A2 ∈ P− (h)∗ . This completes the proof because A = A1 ∪ A2 . Lemma 6. For any detour functions f, f1 , g, g1 , we have (i) if f ≡ f1 and g ≡ g1 , then max[f, g] ≡ max[f1 , g1 ]; (ii) if f ≡ f1 and g ≡ g1 , then min[f, g] ≡ min[f1 , g1 ]. Proof. We prove f f1 ∧ g g1 ⇒ max[f, g] max[f1 , g1 ]. It follows that (i) holds. Assume f f1 and g g1 . Since f1 max[f1 , g1 ] and g1 max[f1 , g1 ], we have f max[f1 , g1 ] and g max[f1 , g1 ]. By Lemma 5 (ii), we have max[f, g] max[f1 , g1 ]. This completes the proof of (i). The proof of (ii) is symmetric; apply Lemma 5 (i) in place of Lemma 5 (ii). Lemma 4 (iv) and Lemma 6 show that the minimum and the maximum operators on the detour functions induce operators on the detour degrees, and thus, the next definition make sense. Definition. Let f ∈ a and g ∈ b be detour functions. We define a ∪ b and a ∩ b by respectively dg(max[f, g]) and dg(min[f, g]). Theorem 1 (The Lattice of Detour Degrees). The structure D, ≤, ∪, ∩ is a distributive lattice, that is, for any a, b, c ∈ D we have (i) a ∩ b is the greatest lower bound (glb) of a and b under the ordering ≤; (ii) a ∪ b is the least upper bound (lub) of a and b under the ordering ≤; (iii) a ∪ (b ∩ c) = (a ∪ b) ∩ (a ∪ c) and a ∩ (b ∪ c) = (a ∩ b) ∪ (a ∩ c). Proof. We prove (i). Since we have min[f, g] f and min[f, g] g, it is obvious that a ∩ b is a lower bound of a and b. To see that a ∩ b indeed is the greatest lower bound of a and b, pick any degree c that lies below both a and b. By Lemma 5 (i), we have that c also lies below a ∩ b. Thus, (i) holds. The proof of (ii) is symmetric (use Lemma 5 (ii) in place of Lemma 5 (i)). Finally, (iii) holds since the max operator distributes over the min operator and vice versa.
5
The Jump Operator
Lemma 7. Let f, g, h be detour functions, and let h be strictly monotone. If f g, then f ◦ h g ◦ h.
154
L. Kristiansen and P.J. Voda
˙ A (i). The function #A (x) gives the Proof. For any set A, let #A (x) = i≤x 1−χ cardinality of the set {y | y ∈ A ∧ y ≤ x}, and when f (x) ≥ x + 1, we have A ∈ P− (f )∗ ⇔ #A ∈ P− (f )
(Claim)
We skip the proof of the claim and assume f g. (We will prove f ◦ h g ◦ h.) To prove f ◦ h g ◦ h, we have to prove that A ∈ P− (f ◦ h)∗ entails A ∈ − P (g ◦ h)∗ . We assume A ∈ P− (f ◦ h)∗ (and prove A ∈ P− (g ◦ h)∗ .) By (Claim), we have #A ∈ P− (f ◦ h), and thus, there exists φ ∈ P− such that #A (x) = φ(x, f h(x)). Let B be the set given by #B (x) = φ(h−1 (x), f (x)). Then, B ∈ P− (f )∗ and #B (h(x)) = φ(h−1 h(x), f h(x)) = φ(x, f h(x)) = #A (x) . Now, since B ∈ P− (f )∗ and f g, we have B ∈ P− (g)∗ , and then, by (Claim), there exists ψ ∈ P− such that #B (x) = ψ(x, g(x)). Thus, #A (x) = #B (h(x)) = ψ(h(x), gh(x)) .
(*)
Let h (x, y) = (μz ≤ y)[h(x) = z] and ξ(x, y) = ψ(h (x, y), y). Then, ξ ∈ P− and ξ(x, gh(x)) = ψ(h (x, gh(x)), gh(x)) = ψ(h(x), gh(x)) = #A (x) . (*)
Thus, we have #A ∈ P− (g ◦ h), and by (Claim), we also have A ∈ P− (g ◦ h)∗ . It follows from Lemma 7 that f ≡ g ⇒ f ◦ h ≡ g ◦ h for any detour functions f , g and any strictly monotone detour function h. Now, 2x is a strictly monotone detour function, and furthermore, the detour functions are closed under composition. Hence, the next definition makes sense. Definition. We define the jump operator · on the detour functions by f (x) = f (2x ). We define the jump operator · on the detour degrees by a = dg(f ) where f is some detour function such that a = dg(f ). Let 0 denote the degree dg(I11 ). It is easy to see that we have 0 ≤ a for any degree a. (We have a = dg(f ) for some f such that f (x) ≥ I11 (x) = x, and thus P− (I11 )∗ ⊆ P− (f )∗ .) To prove that 0 < 0 < 0 < 0 < . . . , we have to resort to a machine model of computation. We have chosen the Turing machines. Lemma 8. There exists φ ∈ P− such that for any A ∈ P− (f )∗ we have a fixed k ∈ N and fixed e0 , e1 , e2 , . . . ∈ N such that χA (x) = φ(ei , x, z) for any z ≥ f (x)k and any i ∈ N. Proof. We will work with Turing machines accepting pairs of natural numbers. The two inputs numbers are given to the machines in dyadic notation, and |z| denotes the number of bits required to represent the number z. We assume some computable enumeration M0 , M1 , M2 , . . . of such machines, furthermore, we can w.l.o.g. assume that each machine occurs infinitely many times in the enumeration. We define the function Φ: Let Φ(e, x, y, z) = 0 if Me accepts the input x, y in space |z| and time z; let Φ(e, x, y, z) = 1 otherwise. We skip the proof of the following claim.
The Structure of Detour Degrees
155
(Claim 1) Φ ∈ P− . Obviously, we have P− ⊆ E 2 , and it is well known that any function in E 2 can be computed by a Turing machine working in linear space in the length of the input (Ritchie [22]). Thus, if φ is a binary function in P− , then φ(x, y) is computable by a Turing machine M running in space k1 | max(x, y)| for some fixed k1 ∈ N. Now, if M runs in space k1 | max(x, y)|, there also exists a fixed k ∈ N such that M runs in space | max(x, y)k | and time max(x, y)k . Hence, the next claim holds. (Claim 2) Let ψ be a binary 0-1 valued function in P− . There exists a fixed k ∈ N and infinitely many e ∈ N such that ψ(x, y) = Φ(e, x, y, max(x, y)k ). Let ·, · be a standard bijection from N × N into N, e.g. x, y = ( i) + y, i≤x+y
and let π1 , π2 be the corresponding decoding functions, i.e. π1 (x, y) = x and π2 (x, y) = y. The pairing function is not in P− , but the decoding functions 1 π1 , π2 are. Furthermore, the binary function x y belongs to P− . We define the function φ by 1 φ(w, x, y) = Φ(π1 (w), x, y π2 (w) , y) . It follows from our discussion and (Claim 1) that we have φ ∈ P− . To complete our proof, we will prove that for any A ∈ P− (f )∗ there exist k ∈ N and infinitely many e ∈ N such that χA (x) = φ(e, x, z) holds for any z ≥ f (x)k . By the definition of φ we have φ(e, k, x, z k ) = Φ(e, x, z, z k ) .
(*)
Let A be any set in P− (f )∗ . Then we have ψ ∈ P− such that χA (x) = ψ(x, z) for any z ≥ f (x). Furthermore, for some e, k and and any z ≥ f (x), we have χA (x) = ψ(x, z) = Φ(e, x, z, max(x, z)k ) = Φ(e, x, z, z k ) = φ(e, k, x, z k ) .
(Claim 2) since x ≤ f (x) ≤ z (*)
and indeed, by (Claim 2) there are infinitely many numbers e such that these equalities hold. Let d0 , d1 , d2 , . . . denote these numbers, and let ei = di , k. Now we have χA (x) = φ(ei , x, z k ) for any i ∈ N and any z ≥ f (x), and thus, χA (x) = φ(ei , x, z) for any i ∈ N and any z ≥ f (x)k . Theorem 2. Let f be any detour function such that for any k ∈ N we have f (x)k ≤ f (2x ) for all but finitely many x. Then, we have f f . Proof. Let B be the set given by 0 if φ(x, x, f (x)) = 1 χB (x) = 1 otherwise.
(*)
156
L. Kristiansen and P.J. Voda
where φ is the function given by Lemma 8. Obviously, we have B ∈ P− (f ). We prove that B ∈ P− (f ). Assume for the sake of a contradiction that B ∈ P− (f ). Then, by Lemma 8 and (*), we have ei and k such that χB (ei ) = φ(ei , ei , f (ei )k ) = φ(ei , ei , f (ei )) = χB (ei ) . (Since we have infinitely many ei ’s, it is possible to pick ei such that f (ei )k ≤ f (2ei ) = f (ei ).) It is a contradiction that χB (ei ) = χB (ei ). Theorem 3. Let 00 = 0 and 0n+1 = (0n ) . We have 0n < 0n+1 for any n ∈ N. Proof. It is trivial that 0n ≤ 0n+1 . Let g0 (x) = x and gi+1 (x) = 2gi (x) . Then we have dg(gi ) = 0i . Further, for any fixed k, we have gn (x)k < gn (2x ) = gn+1 (x) for all sufficiently large x. Hence, we have 0n+1 ≤ 0n by Theorem 2. Theorem 4 (Jump Inversion). Let a be any detour degree above 0 , i.e. 0 ≤ a. There exists a detour degree b such that b = a. Proof. Let a = dg(f ). We can w.l.o.g. assume that f (x) ≥ 2x . Let log2 x be the inverse function of 2x , and let g(x) = f (log2 x). Now, g is a detour function, and obviously we have g (x) = f (log2 2x ) = f (x), and hence, dg(g) = a.
6
A Few Theorems and Some Comments
The implication f (x) ≤ g(x) ⇒ dg(f ) ≤ dg(g) follows straightforwardly from our definitions. The reverse implication is not true. (So, our degree structure is not trivial.) Theorem 5. P− (x + 1)∗ ⊆ P− (x)∗ . By Lemma 7 and Theorem 5, we have dg(f (x)+a) = dg(f (x)+b) for any a, b ∈ N and any detour function f . The long and technical proof of the Theorem 5 is available in a preprint posted on the Internet [12]. By extending the ideas of this proof we expect to be able to prove that we have dg(xn ) = dg(xn + p(x)) for any n ∈ N and any detour polynomial p of degree strictly less than n. Theorem 6. For any decidable problem (computable set) A there exists a detour function f such that A ∈ P− (f )∗ . Proof. Let A be any computable set. By Kleene’s Normal Form Theorem (see e.g. Odifreddi [21] p. 179) there exists a primitive recursive function U and a (x, i)]). It can be primitive recursive relation T such that χA (x) = U((μi)[T proved that U, T ∈ P− . Let g(x) = (μi)[T (x, i)] and f (x) = i≤x g(i). Now, f is a detour function and A ∈ P− (f )∗ . The previous theorem shows that the detour degrees span all the computable sets. The next theorems show that the degree structure is sufficiently fine-grained
The Structure of Detour Degrees
157
to shed light upon (open) problems in subrecursion theory and complexity theory. We omit the proofs of these theorems due to space limitations. (Recall that any unary number-theoretic function generated by composition from constants, +, × and 2x , is a detour function. The small Grzegorczyk classes are defined in Section 2.) Theorem 7 (Turing Machine Space Classes). For i ∈ N, let space 2lin i be the set of problems decidable by a deterministic Turing machine working in x c|x| space 2i for some c ∈ N (where 2x0 = x and 2xi+1 = 22i ). Then, A ∈ space 2lin i p(x) iff A ∈ P− (2i )∗ for some polynomial p. Moreover, for any i ∈ N and any p(x) polynomial p, the function 2i is a detour function Theorem 8 (Small Grzegorczyk Classes). (i) A ∈ E∗0 iff A ∈ P− (x + k)∗ for some k. (ii) A ∈ E∗1 iff A ∈ P− (kx)∗ for some k. (iii) A ∈ E∗2 iff A ∈ P− (p)∗ for some polynomial p. 0 Now, it follows from Theorem 5 that P− ∗ = E∗ , and it is an open problem whether 0 1 2 any of the inclusions E∗ ⊆ E∗ ⊆ E∗ are strict. Thus, it will be very hard to prove that 0 = dg(p) for some polynomial p. It might even not be true. However, ?
?
in our degree structure the open problem E∗0 ⊆ E∗1 ⊆ E∗2 manifests itself as a special case of a more general open problem: Do there exist n ∈ N and a detour p(x) polynomial p such that 0n < dg(2n )? Finally, we will argue that there exist incomparable degrees below 0 , i.e., we have a, b < 0 such that a ≤ b and b ≤ a. Let d0 = 0 and di+1 = 2di , and for j = 0, 1, 2, let x 2 if d3i+j ≤ x < d3i+j+1 fj (x) = ˙ max(fj (x−1), x) otherwise. Now, f0 , f1 , f2 are detour functions such that max(f0 (x), f1 (x), f2 (x)) = 2x and min(f0 (x), f1 (x), f2 (x)) = x. Let a = dg(f0 ), b = dg(f1 ) and c = dg(f2 ). Then, we have a ∪ b ∪ c = 0 and a ∩ b ∩ c = 0. Now, since 0 = 0 , any two of the three degrees will probably be incomparable, but we are not sure how we best can prove this. (Since the three degrees are uniformly constructed it is extremely unlikely that one of them should equal 0 and one of them should equal 0 .) The construction can be generalised to yield incomparable degrees between any two degrees a, b where a < b. In general, we expect our degree structure to be interesting and challenging from a degree-theoretic point of view. The methods required to investigate the structure will probably be a mixture of the number-theoretic techniques developed by Kristiansen for honest degrees and the techniques developed for degrees of computable sets, e.g. delayed diagonalisation.
7
Stronger Reducibility Relations
Our reducibility relation is slightly weak from a complexity-theoretic point of view as the number of steps in a P− -computation is not bounded by a polynomial
158
L. Kristiansen and P.J. Voda
in the length of the input. (The number of steps in a P− -computation is bounded by a polynomial in the input x and thus by a number-theoretic expression of the form 2k|x| where |x| denotes the length of the input.) However, we believe that we can develop the structure of detour degrees for stronger reducibilities, e.g. − by resorting to the function algebra P− b . This function algebra is defined as P but the scheme of primitive recursion is replaced by the scheme of (primitive) recursion on notation, i.e., the scheme φ(x, 0) = ξ(x)
φ(x, 2y) = ψ0 (x, y, φ(x, y))
φ(x, 2y + 1) = ψ1 (x, y, φ(x, y)) .
The number of steps in a P− b -computation is bounded by a polynomial in the length of the input, and P− b∗ is included in logspace.
References 1. Ambos-Spies, K.: Polynomial time reducibilities and degrees. In: Griffor, E. (ed.) Handbook of Computability Theory, Elsevier, Amsterdam (1999) 2. Basu, S.: On the structure of subrecursive degrees. J. Comput. System Sci. 4, 452–464 (1970) 3. Clote, P.: Computation models and function algebra. In: Griffor, E. (ed.) Handbook of Computability Theory, Elsevier, Amsterdam (1999) 4. Grzegorczyk, A.: Some classes of recursive functions. Rozprawy Matematyczne IV, Warszawa (1953) 5. Kristiansen, L.: Neat function algebraic characterizations of Logspace and Linspace. Computational Complexity 14, 72–88 (2005) 6. Kristiansen, L.: Complexity-theoretic hierarchies induced by fragments of G¨ odel’s T . In: Theory of Computing Systems, Springer, Heidelberg (July 2007) 7. Kristiansen, L.: A jump operator on honest subrecursive degrees. Archive for Mathematical Logic 37, 105–125 (1998) 8. Kristiansen, L.: Subrecursive degrees and fragments of Peano arithmetic. Archive for Mathematical Logic 40, 365–397 (2001) 9. Kristiansen, L.: Information content and computational complexity of recursive sets. In: G¨ odel 1996. Logical Foundations of Mathematics, Computer Science and Physics – Kurt G¨ odel Legacy. Lecture Notes in Logic., vol. 6, pp. 235–246. Springer, Heidelberg (1996) 10. Kristiansen, L.: Low n , highn , and intermediate subrecursive degrees. In: Combinatorics, computation and logic. Proceedings of DMTCS 1999 and CATS 1999, Auckland, New Zealand. Australian Computer Science Communications, vol. 21, pp. 235–246. Springer, Heidelberg (1996) 11. Kristiansen, L., Barra, G.: The small Grzegorczyk classes and the typed λ-calculus. In: Cooper, S.B., L¨ owe, B., Torenvliet, L. (eds.) CiE 2005. LNCS, vol. 3526, pp. 252–262. Springer, Heidelberg (2005) 0 12. Kristiansen, L., Voda, P.: Constant detours do not matter and so P− ∗ = E∗ , http://www.ii.fmph.uniba.sk/∼ voda/E0.ps 13. Kristiansen, L., Voda, P.: Programming languages capturing complexity classes. Nordic Journal of Computing 12, 1–27 (2005) (special issue for NWPT 2004) 14. Kristiansen, L., Voda, P.: The trade-off theorem and fragments of G¨ odel’s T . In: Cai, J.-Y., Cooper, S.B., Li, A. (eds.) TAMC 2006. LNCS, vol. 3959, pp. 654–674. Springer, Heidelberg (2006)
The Structure of Detour Degrees
159
15. Ladner, R.: On the structure of polynomial time reducibility. J. Assoc. Comput. Mach. 22, 155–171 (1975) 16. Machtey, M.: On the density of honest subrecursive classes. J. Comput. System Sci. 10, 183–199 (1975) 17. Machtey, M.: The honest subrecursive classes are a lattice. Information and Control 24, 247–263 (1974) 18. Machtey, M.: Augmented loop languages and classes of computable functions. J. Comput. System Sci. 6, 603–624 (1972) 19. Merkle, W.: Lattice embeddings for abstract bounded reducibilities. SIAM Journal on Computing 31, 1119–1155 (2002) 20. Meyer, A., Ritchie, D.: A classification of the recursive functions. Z. Math. Logik Grundlagen Math. 18, 71–82 (1972) 21. Odifreddi, P.: Classical recursion theory. In: Studies in Logic and the Foundations of Mathematics, vol. 125, (Paperback edition 1992 edn.) North-Holland Publishing Co., Amsterdam (1989) 22. Ritchie, R.W.: Classes of predictably computable functions. Transactions of the American Mathematical Society 106, 139–173 (1963)
Hamiltonicity of Matching Composition Networks with Conditional Edge Faults Sun-Yuan Hsieh and Chia-Wei Lee Department of Computer Science and Information Engineering, National Cheng Kung University
[email protected],
[email protected]
Abstract. In this paper, we sketch structure characterization of a class of networks, called Matching Composition Networks (MCNs), to establish necessary conditions for determining the conditional fault hamiltonicity. We then apply our result to n-dimensional restricted hypercube-like networks, including n-dimensional crossed cubes, and n-dimensional locally twisted cubes, to show that there exists a fault-free Hamiltonian cycle if there are at most 2n − 5 faulty edges in which each node is incident to at least two fault-free edges. We also demonstrate that our result is worstcase optimal with respect to the number of faulty edges tolerated. Keywords: Algorithmica aspect of network problems, conditional edge faults, fault-tolerance, graph theory, Hamiltonian cycles, Hamiltonicity, matching composition networks, multiprocessor systems, restricted hypercube-like networks.
1
Introduction
Cycles (rings), the most fundamental class of network topologies for parallel and distributed computing, are suitable for designing simple algorithms with low communication costs. Numerous efficient algorithms based on cycles for solving various algebraic problems and graph problems can be found in [1,10]. These algorithms can also be used as control/data flow structures for distributed computing in arbitrary networks. Usually, when the Hamiltonicity of a graph G is an issue, investigations center in whether G is Hamiltonian or Hamiltonian-connected. A Hamiltonian cycle (Hamiltonian path) in a graph is a cycle (path) goes through every node of G exactly once. G is called Hamiltonian if there is a Hamiltonian cycle in G, and Hamiltonian-connected if there is a Hamiltonian path between every two distinct nodes of G. Examples of applying Hamiltonian paths and cycles to practical problems include on-line optimization of a complex flexible manufacturing system [2] and wormhole routing [16,17]. These applications motivated us to embed Hamiltonian cycles in networks.
Corresponding author.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 160–169, 2008. c Springer-Verlag Berlin Heidelberg 2008
Hamiltonicity of Matching Composition Networks
161
Since link (edge) faults may occur when a network is put into use, it is important to consider faulty network. Investigating the Hamiltonicity for various faulty networks has received a great deal of attention in recent years[4,5,6,7,9,12,13,14]. Among the proposed works, there are two assumptions for faulty edges: One is the standard fault-assumption in which there is no restriction on the distribution of faulty edges. This assumption is also adapted in [4,7,12,14]. The other is the conditional fault-assumption in which each node is incident to at least two fault-free edges. This assumption is adapted in [5,6,9,13]. In this paper, we investigate the hamiltonicity of a wide class of interconnection networks, called the Matching Composition Networks (MCNs), under the conditional fault-assumption. Basically, each MCN is constructed from two graphs G0 and G1 with the same number of nodes by adding a perfect matching between the nodes of G0 and G1 . We first establish necessary conditions for determining the hamiltonicity of an MCN with faulty edges. We then apply our result to a subclass of MCNs, called n-dimensional restricted hypercube-like networks, which include n-dimensional crossed cubes, and n-dimensional locally twisted cubes. We show that each mentioned cube contains a fault-free Hamiltonian cycle if there are at most 2n − 5 faulty edges in which each node is incident to at least two fault-free edges. Two of these particular applications are previously known results [6,9] using rather lengthy proofs. Our approach unifies these special cases and our proof is much simpler. We also demonstrate that our result is optimal in the worst case, with respect to the number of faulty edges tolerated. The remainder of this paper is organized as follows: Section 2 introduces some definitions, notations, and properties. In Sections 3 and 4, we investigate the respective Hamiltonicity of MCNs and restricted hypercube-like networks, under the conditional fault-assumption. In Section 5, we apply our result to two multiprocessor systems. Finally, some concluding remarks are given in Section 6.
2
Preliminaries
A graph G = (V, E) is comprised of a node set V and an edge set E, where V is a finite set and E is a subset of {(u, v)| (u, v) is an unordered pair of V }. We also use V (G) and E(G) to denote the node set and edge set of G, respectively. Two nodes u and v are adjacent if (u, v) is an edge in G. For a node v, we call the nodes adjacent to it the neighbors of v, denoted by NG (v). The degree of node v, denoted by dG (v), is the number of edges incident to it, i.e., dG (v) = |NG (v)|. Let δ(G) = min{dG (v)| v ∈ V (G)}. A path P [v0 , vt ] = v0 , v1 , ..., vt , is a sequence of distinct nodes such that any two consecutive nodes are adjacent. A path may contain other subpaths, denoted by v0 , v1 , ..., vi , P [vi , vj ], vj , vj+1 , ..., vt , where P [vi , vj ] = vi , vi+1 , ..., vj−1 , vj . A cycle v0 , v1 , . . . , vk , v0 for k ≥ 2 is a sequence of nodes in which any two consecutive nodes are adjacent, where v0 , v1 , . . . , vk are all distinct. A cycle that traverses each node of G exactly once is a Hamiltonian cycle and a graph is said to be Hamiltonian if there exists a Hamiltonian cycle. A Hamiltonian path is a path that contains every node of G and a graph is Hamiltonian-connected
162
S.-Y. Hsieh and C.-W. Lee
if there exists a Hamiltonian path between any two distinct nodes of G. A graph G is called k-fault Hamiltonian (respectively, k-fault Hamiltonian-connected ) if there exists a Hamiltonian cycle (respectively, a Hamiltonian path between each pair of distinct nodes) in G − F for any set F of faulty edges with |F | ≤ k. A graph G is k-mixed-fault Hamiltonian-connected if there exists a Hamiltonian path between each pair of distinct nodes in G − F for any set F of faulty elements (nodes and/or edges) with |F | ≤ k. The term k-mixed-fault Hamiltonian can be defined similarly. A graph G is k-fault edge-Hamiltonian-connected if there exists a Hamiltonian path P [x, y] between arbitrary two adjacent nodes x and y in G with |F | ≤ k. Furthermore, a graph G is said to be conditional k-fault Hamiltonian if there exists a Hamiltonian cycle in G − F for any set F of faulty edges with |F | ≤ k, under the conditional fault-assumption. Lemma 1. [12] If a graph G is k-mixed-fault Hamiltonian-connected and (k+1)mixed-fault Hamiltonian, then k ≤ δ(G) − 3 and thus k + 4 ≤ |V (G)|. A matching in a graph G is a set of non-loop edges with no shared end-nodes. The nodes incident to the edges of a matching M are saturated by M ; the others are unsaturated. A perfect matching in a graph is a matching that saturates every node. The class of Matching Composition Networks (MCNs) is defined as follows. Definition 1. Let G0 and G1 be two graphs with the same number of nodes. Let P M be an arbitrary perfect matching between V (G0 ) and V (G1 ), i.e., P M is a set of edges connecting V (G0 ) and V (G 1 ) in a one to one fashion; the resulting composition graph, denoted by G0 G1 , is called a Matching Composition Network (MCN), where the symbol “ ” represents an operation that connects G0 and G1 by adding a perfect matching between V (G0 ) and V (G1 ). an MCN.Note that For convenience, G0 and G1 are called thecomponents of V (G0 G1 ) = V (G0 ) V (G1 ) and E(G0 G1 ) = E(G0 ) E(G1 ) P M . For two end-nodes of an edge in P M , one is said to be the crossing neighbor of the other. For convenience, we use x ¯ to denote the crossing neighbor of x. PM
G0
G1
Fig. 1. An example of an MCN
Hamiltonicity of Matching Composition Networks
163
Vaidya et al.[15] introduced a class of hypercube-like networks, called HLgraphs, which can be defined recursively as follows: (1) HL0 = {K1 }, where K1 is a single node, and (2) for n ≥ 1, HLn = {G0 G1 | G0 , G1 ∈ HLn−1 }. A graph which belongs to HLn is called an n-dimensional HL-graph. Clearly, HLn belongs to MCNs. Note that each graph in HLn has 2n nodes and is regular with the common degree n. Furthermore, non-bipartite HL-graphs are said to be restricted HL-graphs (RHL), which can be recursively defined as fol{K1 }, RHL1 = {K2 }, RHL2 = {C4 }, RHL3 = {G(8, 4)}, lows [12]: RHL0 = and RHLn = {G0 G1 | G0 , G1 ∈ RHLn−1 } for n ≥ 4, where C4 is a cycle graph with 4 nodes and G(8, 4) is a recursive circulant defined in [11]. A graph which belongs to RHLn is called an n-dimensional restricted HL-graph. Definition 2. A graph G is said to be 2-disjoint-path-coverable (2-DPC for short) if given four distinct nodes x, y, u, and v in G, there exists two paths, P [x, y] and P [u, v], such that V (P [x, y]) V (P [u, v]) = ∅ and V (P [x, y]) V (P [u, v]) = V (G). Hereafter, let F be the set of faulty edges in G = G0 G1 and let F0 and F1 be the sets of faulty edges in G0 and G1 , respectively. Also, let FP M be the set of faulty edges in P M . Note that F = F0 F1 FP M . For convenience, let f0 = |F0 |, f1 = |F1 |, and fP M = |FP M |. Definition 3. A graph G is said to be k-fault super-Hamiltonian if the following properties hold: (1) G is k-mixed-fault Hamiltonian-connected, (2) G is (k + 1)fault Hamiltonian, (3) G is (k + 1)-fault edge-Hamiltonian-connected, and (4) G is conditional (2k + 1)-fault Hamiltonian. Definition 4. An MCN G = G0 G1 has the edge-property if for a Hamiltonian cycle Ci in Gi , i = 0, 1, there exist two edges (x, y) and (u, v) in Ci such that (¯ x, y¯) and (¯ u, v¯) are two edges in G1−i , where x, y, u, v are all distinct.
3
Conditional Fault Hamiltonicity of MCNs
Given the set of faulty edges F in a graph G, the notation δ(G − F ) ≥ 2 means that each node is incident to at least two fault-free edges in G − F . The following result determines the conditional fault hamiltonicity of MCNs. Theorem 1. Let G = G0 G1 be an MCN and let ki be a positive integer such that |V (Gi )| ≥ 4ki + 7 for i = 0, 1. Then, G is conditional (2k + 3)-fault Hamiltonian, where k = min{k0 , k1 }, if the following conditions hold: (1) each Gi is 2-DPC, (2) Gi is ki -fault super-Hamiltonian, and (3) G has the edge-property. Proof. Without loss of generality, we assume that f = 2k + 3 and f0 ≥ f1 . Hence, f1 ≤ (2k + 3)/2 = k + 1. We have the following three cases. Case 1: f0 ≤ 2k + 1. Since G0 is k0 -mixed-fault Hamiltonian-connected and (k0 + 1)-fault Hamiltonian, then by Lemma 1, δ(G0 ) ≥ k0 + 3. Hence, there is at most one node incident to only one fault-free edge in G0 ; otherwise,
164
S.-Y. Hsieh and C.-W. Lee
f0 ≥ (δ(G0 ) − 1) + (δ(G0 ) − 1) − 1 = 2δ(G0 ) − 3 ≥ 2k0 + 3 ≥ 2k + 3, which contradicts to the fact that f0 ≤ 2k + 1, which leads to the following subcases. Case 1.1: δ(G0 − F0 ) ≥ 2. Since f0 ≤ 2k + 1 ≤ 2k0 + 1 and G0 is conditional (2k0 + 1)-fault Hamiltonian, there exists a Hamiltonian cycle C0 in G0 − F0 . We have the following scenarios. Case 1.1.1: f1 = k + 1; hence, fP M ≤ 2k + 3 − 2(k + 1) = 1. Since G has the edge-property, there are two edges (x, y) and (u, v) in C0 such that (¯ x, y¯) and (¯ u, v¯) are also two edges in G1 , wherex, y, u, v are all ¯), (y, y¯), (u, u ¯), (v, v¯)} FP M | ≤ 1 and distinct. Since fP M ≤ 1, |{(x, x thus there are at least three fault-free edges in {(x, x ¯), (y, y¯), (u, u ¯), (v, v¯)}. Without loss of generality, we assume that (x, x¯) and (y, y¯) are both fault-free. By deleting (x, y) from C0 , we obtain a path P0 [x, y]. Moreover, since f1 = k + 1 ≤ k1 + 1 and G1 is (k1 + 1)-fault edgey, x ¯] in Hamiltonian-connected, there exists a Hamiltonian path P1 [¯ G0
G1
G0
x
x C0
x
x
y
y
C0
y
y
G1
(a)
(b)
G0
G1 w
G0
G1
y x
x
z
z
C1
x
x
y
y
C0
(c)
(d) G0
G1
x y C0
r
r
z
z
(e) Fig. 2. Illustration of the proof of Theorem 1, where dotted lines represent faulty edges
Hamiltonicity of Matching Composition Networks
165
G1 − F1 . Therefore, a fault-free Hamiltonian cycle can be constructed (see Figure 2(a)). Case 1.1.2: f1 ≤ k. Since |V (G0 )| ≥ 4k0 + 7 ≥ 4k + 7, there exists an edge (x, y) in C0 such that (x, x¯) and (y, y¯) are both fault-free1 . By deleting (x, y) from C0 , we obtain a path P0 [x, y]. Moreover, since f1 ≤ k = min{k0 , k1 } and G1 is k1 -mixed-fault Hamiltoniany, x ¯] in G1 − F1 . A connected, there exists a Hamiltonian path P1 [¯ fault-free Hamiltonian cycle can be constructed (see Figure 2(b)). Case 1.2: δ(G0 −F0 ) = 1. Let x be a unique node incident to only one faultfree edge in G0 . Note that f0 ≥ δ(G0 ) − 1 ≥ k0 + 2 ≥ k + 2, f1 + fP M ≤ (2k + 3) − (k + 2) = k + 1, and (x, x ¯) is fault-free. We have the following scenarios. Case 1.2.1: f1 = k + 1; hence, fP M = 0 and f0 = k + 2. Since f1 = k + 1 ≤ 2k+1 ≤ 2k1 +1 and G1 is conditional (2k1 +1)-fault hamiltonian, there is a Hamiltonian cycle C1 in G1 − F1 . Let (w, x) be the unique fault-free edge incident to x in G0 . Since f0 = k+2 and the number of faulty edges incident to x in G0 is at least δ(G0 ) − 1 ≥ k0 + 2 ≥ k + 2, the number of faulty edges in G0 are all incident to x. Let y and z be two neighbors of x ¯ in C1 . Clearly, y¯ = w or z¯ = w. Without loss of generality, we assume that z¯ = w. Moreover, since G0 is k0 -mixedfault Hamiltonian connected, there is a Hamiltonian path in G0 − x, where G0 −x is the graph obtained by deleting x from G0 . Therefore, a fault-free Hamiltonian cycle can be constructed (see Figure 2(c)). Case 1.2.2: f1 ≤ k. Since fP M ≤ k+1 and there are at least k+2 faulty / edges incident to x, there is an edge (x, y) ∈ F0 such that (y, y¯) ∈ FP M . Since f1 ≤ k ≤ k1 and G1 is k1 -mixed-fault Hamiltonianconnected, there exists a Hamiltonian path P1 [¯ y, x¯] in G1 − F1 . Since δ(G0 − (F0 − {(x, y)})) = 2, there is a Hamiltonian cycle C0 in G0 − (F0 − {(x, y)}) such that C0 contains (x, y). Therefore, a faultfree Hamiltonian cycle can be constructed (see Figure 2(d)). Case 2: f0 = 2k + 2; hence, f1 + fP M ≤ 1. Note that there is at most one node that is incident to only one fault-free edge in G0 . There are the following two subcases. Case 2.1: δ(G0 −F0 ) ≥ 2. We can select an edge (x, y) ∈ F0 such that (x, x¯) and (y, y¯) are both fault-free. Since |F0 − {(x, y)}| = 2k + 1 ≤ 2k0 + 1 and G0 is conditional (2k0 +1)-fault Hamiltonian, there is a Hamiltonian cycle C0 in G0 − (F0 − {(x, y)}). If C0 contains (x, y), then a fault-free Hamiltonian cycle can be constructed similar as Case 1.2.2. However, if C0 does not contain (x, y), we select an arbitrary edge (r, z) in C0 . By replacing x with r, y with z, x ¯ with r¯, and y¯ with z¯, a fault-free Hamiltonian cycle can also be constructed (see Figure 2(e)). Case 2.2: δ(G0 −F0 ) = 1. Let x be a unique node incident to only one faultfree edge in G0 . Since fP M ≤ 1, we can select an edge (x, y) ∈ F0 such 1
Since |V (G0 )| ≥ 4k + 7, C0 contains at least 4k + 7 edges and contributes total 4k + 7 > 2k + 3 ≥ f , which is choices. If such an edge does not exist, then fP M ≥ 4k+7 2 a contradiction.
166
S.-Y. Hsieh and C.-W. Lee
that (y, y¯) ∈ / FP M . Since |F0 −{(x, y)}| = 2k+1, δ(G1 −(F0 −{(x, y)})) = 2, and G0 is conditional (2k0 + 1)-fault Hamiltonian, there is a Hamiltonian cycle C0 in G0 − (F0 − {(x, y)}) such that C0 contains (x, y). Therefore, a fault-free Hamiltonian cycle can be constructed similar as Case 1.2.2. Case 3: f0 = 2k + 3; hence, f1 = fP M = 0. In this case, there are at most two nodes in which each node is incident to only one fault-free edge in G0 . We can construct a fault-free Hamiltonian cycle similar to that used in above cases. By combining above cases, we complete the proof.
4
Conditional Fault Hamiltonicity of RHLn
In this session, we will apply Theorem 1 to n-dimensional restricted hypercubelike networks RHLn . We will demonstrate that the conditions described in Theorem 1 can be further simplified because RHLn is recursively constructed. Lemma 2. [12] Every RHLn for n ≥ 3 is (n − 3)-mixed-fault Hamiltonianconnected and (n − 2)-mixed-fault Hamiltonian. The following lemmas show that RHLn is 2-DPC for n ≥ 4, and (n − 2)-fault edge-Hamiltonian-connected for n ≥ 6. Due to the space limitation, the proofs are omitted. Lemma 3. For any four distinct nodes x, y, u, and v in a graph G∈ RHLn for n ≥ 4, thereexist two paths, P [x, y] and P [u, v], such that P [x, y] P [u, v] = ∅ and P [x, y] P [u, v] = V (G). Lemma 4. Let G = G0 G1 be a graph in RHLn for n ≥ 6. If G has the edge property and G0 , G1 ∈ RHLn−1 are (n − 3)-fault edge-Hamiltonian-connected, then G is (n − 2)-fault edge-Hamiltonian-connected. Theorem 2. Let G = G0 G1 be a graph in RHLn with |V (G)| ≥ 8n − 18. Then, G is conditional (2n − 5)-fault Hamiltonian if the following conditions hold: (1) each Gi is (n − 3)-fault edge-Hamiltonian-connected, (2) each Gi is conditional (2n − 7)-fault Hamiltonian, and (3) G has the edge-property. Proof. Note that Gi ∈ RHLn−1 . We prove this theorem by checking whether all the conditions of Theorem 1 hold. First, by Lemma 3, each Gi is 2-DPC. Secondly, since |V (G)| ≥ 8n − 18, |V (Gi )| ≥ 8n−18 = 4(n − 4) + 7. By Lemma 2 2 and Conditions 2–3, each Gi is (n − 4)-fault super-Hamiltonian. Moreover, the edge-property hold by Condition 3. Therefore, all the conditions of Theorem 1 are satisfied and the result follows.
Hamiltonicity of Matching Composition Networks
5
167
Application to Multiprocessor Systems
In this section, we apply Theorem 2 to two popular multiprocessor systems: ndimensional crossed cubes, and n-dimensional locally twisted cubes, all of which belong to RHLn . It is of course possible to apply them to many other potentially useful systems. Interested readers can refer to [3,18] for the definitions of the above networks. To prove Theorem 3, we first show two useful lemmas in Lemma 5 and Lemma 6. Due to the space limitation, the proofs are omitted. Lemma 5. If C is a Hamiltonian cycle in CQin−1 for i = 0, 1, where n ≥ 3, then there exist two edges (x, y) and (u, v) in C such that (¯ x, y¯) and (¯ u, v¯) are also two edges in CQ1−i n−1 , where x, y, u and v are all distinct. Lemma 6. CQn for n ≥ 5 is (n − 2)-fault edge-Hamiltonian-connected. We can prove the following theorem by induction on n. The basis case for n = 3, 4, 5 can be shown using a computer program. For n ≥ 6, we show the result by Theorem 2, Lemmas 5 and 6. Then, we have the following theorem. Theorem 3. CQn for n ≥ 3 is conditional (2n − 5)-fault Hamiltonian. The following lemmas demonstrate that LT Qn has edge-property, and LT Qn is (n − 2)-fault-Hamiltonian-connected. The proofs of lemmas are omitted. Lemma 7. If C is a Hamiltonian cycle in LT Qin−1 for i = 0, 1, where n ≥ 3, then there exist two edges (x, y) and (u, v) in C such that (¯ x, y¯) and (¯ u, v¯) are , where nodes x, y, u and v are distinct. also two edges in LT Q1−i n−1 Lemma 8. LT Qn for n ≥ 5 is (n − 2)-fault edge-Hamiltonian-connected. We can prove the following theorem by induction on n. The basis case for n = 3, 4, 5 can be shown using a computer program. For n ≥ 6, we show the result by Theorem 2, Lemmas 7 and 8. Thus, we have Theorem 4. Theorem 4. LT Qn for n ≥ 3 is conditional (2n − 5)-fault Hamiltonian.
6
Concluding Remarks
In this paper, we have focused on investigating conditional fault hamiltonicty of matching composition networks. By applying the established theorem to restricted hypercube-like networks, we have successfully determined the conditional fault-tolerant hamiltonicities of two popular multiprocessor systems, including crossed cubes, and locally twisted cubes. Since an n-dimensional restricted hypercube-like network G, where n ≥ 3, studied in this paper possesses the edge-property, the girth (the length of the shortest cycle) equals four. This leads to a worst-case scenario: Let u, t, v, s, u forms a cycle of length 4. Also, assume that (u, s), (u, t), (v, s) and (v, t) are
168
S.-Y. Hsieh and C.-W. Lee
s fault-free
faulty
fault-free
u
v
fault-free
faulty
fault-free
t Fig. 3. A worst case scenario
fault-free. Clearly, any fault-free cycle containing nodes u and v must contain (u, s), (u, t), (v, s) and (v, t). Since G is regular of the common degree n, the above worst case contributed total 2n − 4 faulty edges; moreover, it is impossible to generate a fault-free Hamiltonian cycle containing nodes u, t, v, and s. Therefore, 2n−5 is worst-case optimal with respect to the number of faulty edges tolerated.
References 1. Akl, S.G.: Parallel Computation: Models and Methods. Prentice Hall, Englewood Cliffs (1997) 2. Ascheuer, N.: Hamiltonian Path Problems in the On-Line Optimization of Flexible Manufacturing Systems. PhD thesis, University of Technology, Berlin, Germany (1995) 3. Efe, K.: The crossed cube architecture for parallel computation. IEEE Transactions on Parallel and Distributed Systems 3(5), 513–524 (1992) 4. Fu, J.-S.: Fault-tolerant cycle embedding in the hypercube. Parallel Comptuing 29(6), 821–832 (2003) 5. Fu, J.-S.: Conditional fault-tolerant hamiltonicity of star graphs. In: 7th Parallel and Distributed Computing, Applications and Technologies (PDCAT 2006), pp. 11–16. IEEE Computer Society, Los Alamitos (2006) 6. Fu, J.-S.: Conditional fault-tolerant hamiltonicity of twisted cubes. In: 7th Parallel and Distributed Computing, Applications and Technologies (PDCAT 2006), pp. 5–10. IEEE Computer Society, Los Alamitos (2006) 7. Hsieh, S.-Y., Ho, C.-W., Chen, G.-H.: Fault-free hamiltoninan cycles in faulty arrangement graphs. IEEE Transactions on Parallel and Distributed Systems 10(3), 223–237 (1999) 8. Hsieh, S.-Y., Chen, G.-H., Ho, C.-W.: Longest fault-free paths in star graphs with edge faults. IEEE Transactions on Computers 50(9), 960–971 (2001) 9. Hung, H.-S., Chen, G.-H., Fu, J.-S.: Conditional fault-tolerant cycle-embedding of crossed graphs. In: 7th Parallel and Distributed Computing, Applications and Technologies (PDCAT 2006), pp. 90–95. IEEE Computer Society, Los Alamitos (2006) 10. Leighton, F.T.: Introduction to parallel algorithms and architecture: arrays· trees· hypercubes. Morgan Kaufmann, San Mateo, CA (1992) 11. Park, J.-H., Chwa, K.-Y.: Recursive circulant: A new topology for multicomputer networks. In: Internation Symposium Parallel Architectures, Algorithms and Networks (ISPAN 1994), pp. 73–80. IEEE press, New York (1994)
Hamiltonicity of Matching Composition Networks
169
12. Park, J.-H., Kim, H.-C., Lim, H.-S.: Fault-hamiltonicity of hypercube-like interconnection networks. In: 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2005), IEEE Computer Society, Los Alamitos (2005) 13. Tsai, C.-H.: Linear array and ring embeddings in conditional faulty hypercubes. Theoretical Computer Science 314, 431–443 (2004) 14. Tseng, Y.-C., Chang, S.-H., Sheu, J.-P.: Fault-tolerant ring embedding in a star graph with both link and node failures. IEEE Transactions on Parallel and Distributed Systems 8(12), 1185–1195 (1997) 15. Vaidya, A.S., Rao, P.S.N., Shankar, S.R.: A class of hypercube-like networks. In: 5th IEEE Symposium on Parallel and Distributed Proceeding (SPDP 1993), pp. 800–803. IEEE Press, Los Alamitos (1993) 16. Wang, N.-C., Chu, C.-P., Chen, T.-S.: A dual-hamiltonian-path-based multicasting strategy for wormhole-routed star graph interconnection networks. Journal of Parallel and Distributed Computing 62(12), 1747–1762 (2002) 17. Wang, N.-C., Yan, C.-P., Chu, C.-P.: Multicast communication in wormhole-routed symmetric networks with hamiltonian cycle model. Journal of Systems Architecture 51(3), 165–183 (2005) 18. Yang, X., Evans, D.J., Megson, G.M.: The locally twisted cubes. International Journal of Computer Mathematics 82(4), 401–413 (2005)
Local 7-Coloring for Planar Subgraphs of Unit Disk Graphs J. Czyzowicz1 , S. Dobrev2 , H. Gonz´ alez-Aguilar3, R. Kralovic4, E. Kranakis5, 6 J. Opatrny , L. Stacho7 , and J. Urrutia8 1
D´epartement d’informatique, Universit´e du Qu´ebec en Outaouais, Gatineau, Qu´ebec J8X 3X7, Canada. Research supported in part by NSERC. 2 Slovak Academy of Sciences, Bratislava, Slovakia. Research partially funded by grant APVV 0433-06. 3 Centro de Investigacion en Matematicas, Guanajuato, Gto., C.P. 36000, Mexico. 4 Department of Computer Science, Faculty of Mathematics, Physics and Informatics Comenius University, Bratislava, Slovakia. Research partially funded by grant APVV 0433-06. 5 School of Computer Science, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, Canada K1S 5B6. Research supported in part by NSERC and MITACS. 6 Department of Computer Science, Concordia University, 1455 de Maisonneuve Blvd West, Montr´eal, Qubec, Canada, H3G 1M8. Research supported in part by NSERC. 7 Department of Mathematics, Simon Fraser University, 8888 University Drive, Burnaby, British Columbia, Canada, V5A 1S6. Research supported in part by NSERC. 8 ´ Instituto de Matem´ aticas, Universidad Nacional Aut´ onoma de M´exico, Area de la investigaci´ on cientifica, Circuito Exterior, Ciudad Universitaria, Coyoac´ an 04510, M´exico, D.F. M´exico.
Abstract. The problem of computing locally a coloring of an arbitrary planar subgraph of a unit disk graph is studied. Each vertex knows its coordinates in the plane, can directly communicate with all its neighbors within unit distance. Using this setting, first a simple algorithm is given whereby each vertex can compute its color in a 9-coloring of the planar graph using only information on the subgraph located within at most 9 hops away from it in the original unit disk graph. A more complicated algorithm is then presented whereby each vertex can compute its color in a 7-coloring of the planar graph using only information on the subgraph located within a constant number of hops away from it.
1
Introduction
We are interested in the graph vertex coloring as applicable to wireless ad-hoc networks. The wireless ad-hoc networks of interest to us are geometrically embedded in the plane and consist of a number of location aware nodes, say n, whereby two nodes are adjacent if and only if they are within the transmission range of each other. If all the nodes have the same transmission range then these networks are known as unit disk graphs. For such graphs there have been several papers in the literature addressing the coloring problem. Among these it is M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 170–181, 2008. c Springer-Verlag Berlin Heidelberg 2008
Local 7-Coloring for Planar Subgraphs of Unit Disk Graphs
171
worth mentioning the work by Marathe et al. [10] (presenting an on-line coloring heuristic which achieves a competitive ratio of 6 for unit disk graphs: the heuristic does not need a geometric representation of unit disk graphs which is used only in establishing the performance guarantees of the heuristics), Graf et al. [7] (which improves on a result of Clark, Colbourn and Johnson (1990) and shows that the coloring problem for unit disk graphs remains NP-complete for any fixed number of colors k ≥ 3), Caragiannis et al. [2] (which proves an improved upper bound on the competitiveness of the on-line coloring algorithm First-Fit in disk graphs which are graphs representing overlaps of disks on the plane) and Miyamoto et al. [11] (which constructs multi-colorings of unit disk graphs represented on triangular lattice points). There are also several papers on coloring restricted to planar graphs of which we note Ghosh et al. [6] because it is concerned with a self-stabilizing algorithm for coloring such graphs. Their algorithm achieves a 6 coloring by transforming the planar graph into a DAG of out-degree at most five. However, this algorithm needs the full knowledge of the topology of the graph. The specificity of the problem for ad-hoc networks requires a different approach. An ad-hoc network can be a very large dynamic system, and in some cases a node can join or leave a network at any time. Thus, the full knowledge of the topology of an ad-hoc network might not be available, or possible for each node of the network at all times. Thus algorithms that can make computations in a fully distributed manner, using in each node only information about the network within a fixed distance neighborhood of the node, are of particular interest in ad hoc networks. Examples of algorithms of this type are the Gabriel test [5] for constructing a planar spanner, face routing [8],[1] or an approximation of the minimum spanning tree [14], [3]. To reduce network complexity, the unit disk graph G is sometimes reduced to a much smaller subset P of its edges called a spanner. A good spanner must have some properties so that certain parameters of communication within P are preserved. To ensure all to all communication P must be connected. An important property is having a constant stretch factor s, guaranteeing that the length of a path joining two nodes in G is at most s times shorter that the shortest path joining these nodes in P . A desired property of P is planarity, which, on one hand, permits an efficient routing scheme based on face routing and, on the other hand, ensures linear complexity of P with respect to its number of nodes. Planar graphs also have low chromatic number, hence a small set of frequencies is sufficient to realize radio communication. In this paper, we are interested in local distributed coloring algorithms whereby messages emanating from any node can propagate for only a constant number of hops. This model was first introduced in the seminal paper of Linial [9]. One of the advantages of this model is that it aims to obtain algorithms that could cope with a dynamically changing infrastructure in a network. In this approach, each node may communicate with nodes at a bounded distance from it and thus a local change in the network only needs a local adjustment of a solution. In Linial’s model of communication, locality results in a constant-time distributed algorithm.
172
1.1
J. Czyzowicz et al.
Network Model and Results of the Paper
We are given a set S of n points in the plane and a planar subgraph G of the unit disk graph induced by S. We assume that G is connected and all the nodes either know their exact (x, y) coordinates (which could be achieved for example by having the nodes equipped with GPS devices) or have a consistent relative coordinate system. If G is not connected, all reasoning can be applied to each connected component of G independently, in which case the coloring constructed applies to each connected component. We propose two local coloring algorithms. The first simple algorithm computes a 9-coloring of the planar graph. We assume that each node knows its 9-neighborhood in the original unit disk graph (i.e. all nodes at distance at least 9 hops away from it), it can communicate with each node of its neighborhood and it is aware which of these nodes belong to the planar subgraph. We then present a more complicated algorithm whereby each vertex can compute its color in a 7-coloring of the planer graph using only information on the subgraph located within at most a constant number h = 201 of hops away from it. The constant h is quite large though in practice nodes at much smaller distance will need to communicate. The algorithm does not determine locally either what the different connected components are or even what are the local parts of a component connected somewhere far away. Moreover, the correctness of the algorithm is independent of the connectivity of the planar subgraph.
2
Simple Local Coloring Algorithm
The basic idea of the coloring algorithms in this paper is to partition the plane containing G into fixed sized areas, compute a coloring of the subgraph of G within each such area independently and possibly adjust colors of some vertices that are on the border of an area and thus are adjacent to nodes in another area. This is possible to do consistently and without any pre-processing because the nodes know their coordinates and thus can determine the area in which they belong. Since each area is of fixed size, a subgraph of the given unit disk graph belonging to this area is of a bounded diameter. Hence a constant number of hops is needed for a node to communicate with each other node of the same area. 2.1
Coloring with Regular Hexagonal Tilings
The simplest partitioning we consider is obtained by tiling the plane with regular hexagons having sides of size 1. We suppose that two edges of the hexagons are horizontal and one of the hexagons is centered in coordinates [0, 0]. To assure the disjointness of the hexagon areas we assume that only the upper part of the boundary and the leftmost vertex belongs to each hexagon area while the rightmost vertex does not belong to it. Under such conditions, two vertices of G can be connected by an edge only if they are in the same or adjacent hexagon areas. As each vertex knows its own coordinates, it can calculate which hexagon it belongs to. In the first step each vertex communicates with vertices within its
Local 7-Coloring for Planar Subgraphs of Unit Disk Graphs
173
hexagon and learns the part of the subgraph located in its hexagon. By Lemma 1, communication inside a hexagon may be done using at most nine hops. Lemma 1. Any connected component of the subgraph of a unit disk graph induced by its nodes belonging to a regular hexagon with sides of size 1 has a diameter smaller or equal to nine. Moreover, there exist configurations of nodes inside the hexagon of diameter equal to nine. We could apply the standard 4-coloring algorithm for each connected component of the graph induced by G on points within each hexagon. Since the tiling of the plane by hexagons can be 3-colored, three disjoint sets of four colors can be used, one set of colors for vertices in each hexagon area of the same color. This
5,6,7,8 9,10,11,12
9,10,11,12 1,2,3,4
5,6,7,8
5,6,7,8 9,10,11,12
Fig. 1. Coloring of a hexagonal tiling of the plane using 12 colors
would lead to a 12 coloring of G (see the coloring scheme of the hexagonal tiling depicted in Figure 1). The number of colors can be reduced by coloring the outer face of each component in a hexagon with prescribed colors, using the result of the following theorem. Theorem 1 (3 + 2 Coloring, Thomassen [12]). Given a planar graph G, 3 prescribed colors and an outer face F of G, graph G can be 5-colored while the vertices of F use only the prescribed three colors. The precise result in Thomassen’s paper [12] (see also D¨ orre [4]) states that every planar graph is L-list colorable for every list assignment with lists of size 5. The proof of this result however implies the following statement taken from Tuza et al. [13] that for a planar graph G with outer face F , every pre-coloring of two adjacent vertices v1 , v2 of F can be extended to a list coloring of G for every list assignment with |L(v)| = 3 if v ∈ V (F ) − {v1 , v2 } and |L(v)| = 5 if v ∈ V (F ), where V (F ) denotes the vertex set of F . Using Theorem 1 the number of colors can be reduced to 9 as follows. The idea is that the vertices of G in a hexagon can be adjacent only to the outer face vertices of the graphs induced in the neighboring hexagons. Three disjoint
174
J. Czyzowicz et al.
4,5,6 7,8,9
1,2
7,8,9
1,2
1,2,3
1,2
4,5,6
4,5
4,5,6
1,2
7,8,9
1,2
1,2
Fig. 2. Colorings of a hexagonal tiling of the plane using 9-colors
sets of colors of size three are used as the prescribed colors of the vertices on the outer faces, the inner vertices of G in a hexagon can employ, in addition to the three colors used on the outer face of its hexagon, two additional colors of the outer faces of other hexagons (see Figure 2). Theorem 2. Using the partitioning of a plane into regular hexagons with sides of size 1, a vertex can compute its color in a 9-coloring of the planar graph using only information on the subgraph located within at most 9 hops away from it.
3
The 7-Coloring Algorithm
In this section we give a 7 coloring algorithm and prove its correctness. The tradeoff is the larger area of the network a node needs to examine in this algorithm and a more complex partitioning of the plane. 3.1
Reducing the Number of Colors Using Mixed Tilings
We will employ the tiling using octagons and squares shown in Figure 3. Each square is of size 5 + while the slanted part of an octagon border is of length 3 + , meaning that these sides can be chosen arbitrarily close to but greater than 5 and 3, respectively. We shall assume that one of the octagons is centered in coordinates [0, 0]. The reasons for choosing the sizes of tiles this way is to isolate the meeting places of octagons (the slanted border part) from each other so that each of them can be dealt with independently and locally, and to ensure that there is no edge between two vertices in different squares or between two vertices of the same squares that could be recolored for different crossings in the algorithm. In handling a meeting place, only vertices at a distance at most 2 from it will be recolored, therefore it is impossible to have two neighboring vertices recolored due to different crossing.
3
1,
2, 4,
5,
6
1,
2,
4,6
2,
3
1.2
1.2
6 5,
4,6
4,
4,6
4,
5,
6
1,
4, 3
4,6
2, 1,
6
1.2
3
4,6
5,
6
1,
2,
1.2
5, 4,
1.2
4,
5,
6
1,
1.2
3
4,6
4,6
175
2, 4,
5,
6
1,
1.2
3
4,
5,
6
2,
3
Local 7-Coloring for Planar Subgraphs of Unit Disk Graphs
1.2
Fig. 3. A coloring of the set of points based on the octagon/square tiling of the plane
Similarly as in the previous coloring, the subgraph induced by the vertices in a tile is colored using 5 colors, three of them are used on the outer face and these three colors plus the additional two colors are used on vertices not on the outer face using Theorem 1. Since each node knows its location, it can determine which tile it belongs to. By Lemma 2 the communication inside an octagon may be done using at most 201 hops. Lemma 2. Any subgraph of a unit disk graph induced by its vertices belonging to an octagon used in the algorithm has a diameter smaller or equal to 201. Despite the fact that the simple, surface comparing √ argument leaves some room for improvement (the packing density is at most π 3/6 = 0.907), it is possible to construct configurations of nodes, centered inside the octagon, inducing a graph of diameter at least 183. Since the square tile admits a smaller hop diameter, any node can determine the subgraph induced by the vertices in its tile by examining nodes at hop distance at most 201. The color sets used in the tiles are as specified in Figure 3. The resulting coloring is using only 6 colors. Due to the chosen sizes of the tiles and the chosen coloring scheme, an edge of G crossing from a square to an octagon is between vertices of different colors. After this initial color assignment, any edge of G whose endpoints are of the same color is necessarily an edge crossing the slanted part of the border of two adjacent octagons. The following construction shows that using one additional color and with careful attention to detail near the common border of adjacent octagons, some of the vertices can be recolored in order to achieve a 7-coloring of G. Details follow in Subsection 3.2. 3.2
Adjusting the Coloring
As seen from Figure 3, the centers of the tiles form an infinite regular mesh. We shall denote the hexagon tile that is centered at coordinates [0, 0] as S0,0 . For
176
J. Czyzowicz et al.
a tile denoted Si,j , its horizontal left, horizontal right, vertical down, vertical up neighboring tile is denoted Si−1,j , Si+1,j , Si,j+1 , Si,j−1 , respectively. Let Gi,j denote the subgraph of G induced by the vertices located within Si,j and suppose that Fi,j denotes the outer face of Gi,j . In case that Gi,j is not connected, we consider each connected component of Gi,j separately. (Notice that we only need to consider those components of Gi,j that contain vertices adjacent to more than one octagon, for otherwise the coloring of such a component could be added to the coloring of Gi+1,j+1 .) Let Si,j be one of the hexagonal tiles, i.e., i + j is even. Consider the place where Si,j and Si+1,j+1 meet (we call it a CRi,j crossing). Consider the sequence of vertices obtained in a counterclockwise cyclical traversal of the outer face Fi,j of Gi,j and let L = {u1 , u2 , . . . , uk } be the shortest subsegment of this traversal containing all the vertices connected to Gi+1,j+1 , i.e., as in Figure 4. Notice that L could contain some vertex more than once if the outer face is not a simple curve. Define M = {v1 , v2 , . . . , vl } analogously for Gi+1,j+1 , using clockwise traversal. We say that crossing CRi,j is simple if no inside vertex of L different from u1 and uk is connected both to a vertex of M and to a vertex in Gi,j+1 or Gi+1,j , and the same analogous condition holds for the inside vertices of M . If the crossing is simple, the problem of having some edges between vertices of L or M having both endpoints of the same color can be resolved using the following lemma.
uk
Gi+1,j
Gi,j u3
u2
vl u1
v2
Gi+1,j+1
v1 Gi,j+1
Fig. 4. A typical simple crossing
Local 7-Coloring for Planar Subgraphs of Unit Disk Graphs
177
Lemma 3. Let CRi,j be a simple crossing (see Figure 4). Then the vertices of L and M and some of the neighbors of u1 , v1 , uk and vk can be recolored, possibly with the help of color 7, in such a way that no edge incident to L or M or a neighbor of either of L or M has both endpoints of the same color. Proof. After the initial coloring, the only edges which might have endpoints of the same color are the edges connecting vertices of L to vertices of M . Without loss of generality we may assume that the vertices of L and M use colors 1, 2, 3, the inside of Gi,j uses in addition colors 4, 5, while the inside of Gi+1,j+1 uses colors 4, 6 in addition to 1, 2, 3. The recoloring is done in two steps, where in the first step the conflict vertices of L and M (i.e. the ones with a neighbor of the same color) are recolored as indicated in Table 1. Table 1. First step of recoloring Conflict vertex of old color new color L 1 6 M 2 5 L 3 7
This ensures that no edge incident to an inner vertex of L or M , i.e, different from {u1 , v1 , uk , vl }, has both endpoints of the same color since: – if there was an edge from L to M connecting two vertices of the same color, one endpoint of this edge has been recolored. – As no color 6 was used in Gi,j , the vertices of L recolored to 6 have no neighbors of color 6 in Gi,j (and they cannot be neighbors, as both had color 1 in the coloring of Gi,j ). They also do not have neighbors of color 6 in Gi+1,j+1 as all their neighbors in Gi+1,j+1 are on the outer face and thus of colors 1, 2, 3 (and newly 5). Finally, from the second property of simple crossing it follows that the inner conflict vertices of L and M do not have neighbors in Gi,j+1 and Gi+1,j , – Analogous argument applies for vertices recolored to 5 in M and vertices recolored to 7 in L. It remains to consider edges incident to {u1 , v1 , uk , vl } that might have endpoints of the same color; for example u1 was recolored from 1 to 6 but it has a neighbor of color 6 in Si,j+1 (note that there is no problem if the new color was 7). In such a case, these same-color neighbors in Si,j+1 are recolored to color 7. We claim that this does not create same-color edges. First note that if u1 recolored its neighbors of color 6 in Gi,j+1 , then v1 necessarily kept its original color, since v1 might change its color only if it was originally 2. Hence, by the simplicity of the crossing, only the neighbors of v1 of color 6 (or, by symmetry, only the neighbors of u1 of color 5) need to be recolored to color 7. The cases for other extreme vertices of L and M are analogous. Since the width of the gap
178
J. Czyzowicz et al.
u
Gi+1j uk
Gij
vl
u
u1
v1
Gi+1j+1
Gij+1
Fig. 5. A not simple crossing
between the squares is greater than 3, it ensures that the recoloring of vertices in Gi,j+1 and Gi+1,j does not create any conflict in coloring. Furthermore, any two vertices of Gi,j+1 which were recolored to 7 due to different crossings with octagons cannot be neighbors since the size of the squares is 5 + . While the width of the gap between the squares ensures the first condition for a crossing to be simple is always satisfied, there can be a case when several different vertices of L are connected both to a vertex of M and to a vertex in Gi,j+1 or Gi+1,j , see Figure 5. Notice that this may happen when some inner vertices of L or M are cut vertices in Gi,j or Gi+1,j+1 . We resolve the problem of a crossing that is not simple by a pre-processing phase in which some of the vertices in the octagons are assigned to the neighboring squares, with the goal to make the crossing simple. Consider a crossing CRi,j and let L and M be defined as before. Let u be the last (in L) occurrence of a node connected to both M and Gi,j+1 (if there is no such node, set u = u1 ). Similarly, let u be the first occurrence in L of a node connected to both M and Gi+1,j . Define v and v analogously in M , using clockwise traversal, see Figure 5. Any vertex of L which is connected to both Gi,j+1 and Gi+1,j+1 must occur in the segment of L from u1 to u , since the edges incident with u connecting it to Gi,j+1 and Gi+1,j+1 act as separators in the planar graph. Similarly, any node of L which is connected to both Gi+1,j and Gi+1,j+1 must occur in the segment of L from u to uk (see Figure 6).
Local 7-Coloring for Planar Subgraphs of Unit Disk Graphs
179
Gi+1j Gij
uk = u vl
L u
v
L2 M
L1 u1
v1
v
Gi+1j+1
Gij+1
Fig. 6. L and M in a crossing
We now partition L into three parts: Let L1 be the shortest initial segment of L from u1 to to first occurrence of u so that all vertices connected to both Gi,j+1 and Gi+1,j+1 are contained in L1 , let L3 be the shortest final segment of L starting with an occurrence of u so that all vertices connected to both Gi+1,j and Gi+1,j+1 are contained in L3 , and L2 be the remaining part of L. We define M1 , M2 , and M3 analogously as segments of M using v and v . To make the crossing simple, we assign the components of Gi,j separated by u and encountered in the traversal of L1 to Gi,j+1 , and the components of Gi,j separated by u and encountered in the traversal of L3 are assigned to are Gi+1,j . The same is applied to the segments of Gi+1,j+1 separated by v and v and encountered in the traversal of M1 and M3 , (see Figure 7). All the components that are assigned to Gi,j+1 are inside the area bordered by the edges connecting u or v to Gi+1,j+1 and Gi,j+1 or Gi,j and Gi,j+1 . Similarly all the components that are assigned to Gi+1,j are inside the area bordered by the edges connecting u or v to Gi+1,j+1 and Gi+1,j or Gi,j and Gi+1,j . Since the length of the crossing is more than 3, there cannot be any edge between vertices assigned to Gi+1,j and Gi,j+1 . Furthermore, after this reassignment, u is the only vertex in Gi,j that can be connected to both Gi+1,j+1 and Gi,j+1 and u is the only vertex in Gi,j that can be connected to both Gi+1,j+1 and Gi+1,j . The analogous statement can be made about v and v , Thus the crossing L and M between Gi,j and Gi+1,j+1 is a subset of u , L2 , u and v , M2 , v and this modified crossing CRi,j satisfies both conditions of a simple crossing, see Figure 7, and we can thus proceed with the coloring as stated in Lemma 3. The following lemma, together with the size of the squares being selected as 5 + , allows us to apply Lemma 3 to each crossing independently.
180
J. Czyzowicz et al.
Gi+1j Gij
uk = u L v
u M u1
v1
v
Gi+1j+1
Gij+1
Fig. 7. L and M after the transformation. The shaded areas belong to Gi,j+1 and Gi+1,j .
Lemma 4. Any vertex recolored due to resolving conflicts in crossing CRi,j is at a distance at most 2 from the line separating Si,j and Si+1,j+1 . 3.3
Local 7-Coloring Algorithm
Putting the pieces together we have the following local, fully distributed algorithm that is executed at each vertex of the graph to obtain a valid 7-coloring of the graph. Algorithm 1. The local 7-coloring algorithm for a vertex v 1: Learn your neighborhood up to distance 201 // Note that all steps can be performed locally using the information learned in the first (communication) step, without incurring further communication. 2: From your coordinates, identify the square/octagon Si,j you are located in, and calculate the connected component of Gi,j you belong to. // The next step is for vertices near a crossing 3: Calculate L and M , and then L and M . Determine whether you have been shifted to a neighboring square. Determine whether L and M are connected, if not but the squares are now connected, repeat the process until the final L∗ and M ∗ are computed. 4: Apply the 3 + 2 coloring algorithm from Theorem 1 for each Gi,j , as in Figure 3 5: Apply the recoloring from Lemma 3.
The results of this section can be summarized in the following theorem. Theorem 3. Given a planar subgraph of the unit disk graph whose vertices correspond to hosts that are each aware of its geometric location in the plane, Algorithm 1 computes locally a 7-coloring of this subgraph using only information on the subgraph located within a constant number of hops away from it.
Local 7-Coloring for Planar Subgraphs of Unit Disk Graphs
181
An interesting remaining open problem is whether the number of colors needed for a local coloring of a planar subgraphs of a unit disk graph can be decreased any further.
References 1. Bose, P., Morin, P., Stojmenovic, I., Urrutia, J.: Routing with guaranteed delivery in ad hoc wireless networks. wireless networks 7, 609–616 (2001) 2. Caragiannis, I., Fishkin, A.V., Kaklamanis, C., Papaioannou, E.: A tight bound for online coloring of disk graphs. In: Pelc, A., Raynal, M. (eds.) SIROCCO 2005. LNCS, vol. 3499, pp. 78–88. Springer, Heidelberg (2005) 3. Chavez, E., Dobrev, S., Kranakis, E., Opatrny, J., Stacho, L., Urrutia, J.: Local construction of planar spanners in unit disk graphs with irregular transmission ranges. In: Cardoso, J., Sheth, A.P. (eds.) SWSWPC 2004. LNCS, vol. 3387, pp. 286–297. Springer, Heidelberg (2005) 4. D¨ orre, P.: Every planar graph is 4-colourable and 5-choosable a joint proof. Fachhochschule S¨ udwestfalen (University of Applied Sciences) (unpublished note) 5. Gabriel, K.R., Sokal, R.R.: A new statistical approach to geographic variation analysis. Systemic Zoology 18, 259–278 (1972) 6. Ghosh, S., Karaata, M.H.: A self-stabilizing algorithm for coloring planar graphs. Distributed Computing 7, 55–59 (1993) 7. Gr¨ af, A., Stumpf, M., Weißenfels, G.: On coloring unit disk graphs. Algorithmica 20(3), 277–293 (1998) 8. Kranakis, E., Singh, H., Urrutia, J.: Compass routing on geometric networks. In: Proc. of 11th Canadian Conference on Computational Geometry, August 1999, pp. 51–54 (1999) 9. Linial, N.: Locality in distributed graph algorithms. SIAM J. COMP. 21(1), 193– 201 (1992) 10. Marathe, M.V., Breu, H., Hunt III, H.B., Ravi, S.S., Rosenkrantz, D.J.: Simple heuristics for unit disk graphs. Networks 25(1), 59–68 (1995) 11. Miyamoto, Y., Matsui, T.: Multicoloring unit disk graphs on triangular lattice points. In: SODA, pp. 895–896. SIAM, Philadelphia (2005) 12. Thomassen, C.: Every planar graph is 5-choosable. Combinatorial Theory Series B 62(1), 180–181 (1994) 13. Tuza, Z., Voigt, M.: A note on planar 5-list colouring: non-extendability at distance 4. Discrete Mathematics 251(1), 169–172 (2002) 14. Wang, Y., Li, X.-Y.: Localized construction of bounded degree and planar spanner for wireless ad hoc networks. In: DialM: Proceedings of the Discrete Algorithms and Methods for Mobile Computing & Communications (2003)
Generalized Domination in Degenerate Graphs: A Complete Dichotomy of Computational Complexity Petr Golovach1, and Jan Kratochv´ıl2, 1
Department of Informatics, University of Bergen, PB 7803, 5020 Bergen, Norway
[email protected] 2 Department of Applied Mathematics, and Institute for Theoretical Computer Science, Charles University, Malostransk´e n´ am. 25, 118 00 Praha 1, Czech Republic
[email protected]
Abstract. The so called (σ, ρ)-domination, introduced by J.A. Telle, is a concept which provides a unifying generalization for many variants of domination in graphs. (A set S of vertices of a graph G is called (σ, ρ)-dominating if for every vertex v ∈ S, |S ∩ N (v)| ∈ σ, and for every v ∈ / S, |S ∩ N (v)| ∈ ρ, where σ and ρ are sets of nonnegative integers and N (v) denotes the open neighborhood of the vertex v in G.) It is known that for any two nonempty finite sets σ and ρ (such that 0 ∈ / ρ), the decision problem whether an input graph contains a (σ, ρ)-dominating set is NP-complete, but that when restricted to some graph classes, polynomial time solvable instances occur. We show that for every k, the problem performs a complete dichotomy when restricted to k-degenerate graphs, and we fully characterize the polynomial and NPcomplete instances. It is further shown that the problem is polynomial time solvable if σ, ρ are such that every k-degenerate graph contains at most one (σ, ρ)-dominating set, and NP-complete otherwise. This relates to the concept of ambivalent graphs previously introduced for chordal graphs. Subject: Computational complexity, graph algorithms.
1
Introduction and Overview of Results
We consider finite undirected graphs without loops or multiple edges. The vertex set of a graph G is denoted by V (G) and its edge set by E(G). The open neighborhood of a vertex is denoted by N (u) = {v : (u, v) ∈ E(G)}. The closed neighborhood of a vertex u is the set N [u] = N (u) ∪ {u}. If U ⊂ V (G), then G[U ] denotes the subgraph of G induced by U . Let σ, ρ be a pair of nonempty sets of nonnegative integers. A set S of vertices of G is called (σ, ρ)-dominating if for every vertex v ∈ S, |S ∩ N (v)| ∈ σ, and for every v ∈ / S, |S ∩ N (v)| ∈ ρ. The concept of (σ, ρ)-domination was introduced by
Supported by Norwegian Research Council. Supported by the Czech Ministry of Education as Research Project No. 1M0545.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 182–191, 2008. c Springer-Verlag Berlin Heidelberg 2008
Generalized Domination in Degenerate Graphs
183
J.A. Telle [4,5] (and further elaborated on in [6,2]) as a unifying generalization of many previously studied variants of the notion of dominating sets. In particular, (N0 ,N)-dominating sets are ordinary dominating sets, ({0},N0 )-dominating sets are independent sets, (N0 ,{1})-dominating sets are efficient dominating sets, ({0},{1})-dominating sets are 1-perfect codes (or independent efficient dominating sets), ({0},{0, 1})-dominating sets are strong stable sets, ({0},N)-dominating sets are independent dominating sets, ({1},{1})-dominating sets are total perfect dominating sets, or ({r},N0 )-dominating sets are induced r-regular subgraphs (N and N0 denote the sets of positive and nonnegative integers, respectively). We are interested in the complexity of the problem of existence of a (σ, ρ)dominating set in an input graph, and we denote this problem by ∃(σ, ρ)domination. It can be easily seen that if 0 ∈ ρ, then the ∃(σ, ρ)-domination problem has a trivial solution S = ∅. So throughout our paper we suppose that 0∈ / ρ. We consider only finite sets σ and ρ and use the notation pmin = min σ, pmax = max σ, qmin = min ρ, and qmax = max ρ. It is known that for any nontrivial combination of finite sets σ and ρ (considered as fixed parameters of the problem), ∃(σ, ρ)-domination is NPcomplete [4]. It is then natural to pay attention to restricted graph classes for inputs of the problem. The problem is shown polynomial time solvable for interval graphs in [3], where also the study of its complexity for chordal graphs was initiated. A full dichotomy for chordal graphs was proved in [1], where a direct connection between the complexity of ∃(σ, ρ)-domination and the so called ambivalence of the parameter sets σ, ρ was noted. A pair (σ, ρ) is called ambivalent for a graph class G if there exists a graph in G containing at least two different (σ, ρ)-dominating sets (such a graph will be called (σ, ρ)-ambivalent), and the pair (σ, ρ) is called non-ambivalent otherwise. It is shown in [1] that for finite sets σ, ρ, ∃(σ, ρ)-domination is polynomial time solvable for chordal graphs if the pair (σ, ρ) is non-ambivalent (for chordal graphs), and it is NP-complete otherwise. It should be noted that the characterization which is given in [1] is nonconstructive in the sense that the authors did not provide a structural description of ambivalent (or non-ambivalent) pairs σ, ρ (and there is indication that such a description will not be simple). In this paper we consider the connection between ambivalence and computational complexity of ∃(σ, ρ)-domination for k-degenerate graphs. A graph G is called k-degenerate (with k being a positive integer) if every induced subgraph of G has a vertex of degree at most k. For example, trees are exactly connected 1-degenerate graphs, every outerplanar graph is 2-degenerate, and every planar graph are 5-degenerate. An ordering of vertices v1 , v2 , . . . , vn is called a k-degenerate ordering if every vertex vi has at most k neighbors among the vertices v1 , v2 , . . . , vi−1 . It is well known that a graph is k-degenerate if and only if it allows a k-degenerate ordering of its vertices. It is known [6] that for trees (and for graphs of bounded treewidth), ∃(σ, ρ)domination can be solved in polynomial time. Thus we assume k ≥ 2 throughout the paper. We prove that also in the case of k-degenerate graphs, ambivalence and NP-hardness of ∃(σ, ρ)-domination go hand in hand.
184
P. Golovach and J. Kratochv´ıl
Theorem 1. For finite sets σ, ρ, ∃(σ, ρ)-domination is polynomial (linear) time solvable for k-degenerate graphs if the pair (σ, ρ) is non-ambivalent for k-degenerate graphs (moreover, the problem can be solved by an algorithm which is polynomial not only in the size of the graph, but also in pmax and qmax ), and it is NP-complete otherwise. Unlike the case of chordal graphs, for k-degenerate graphs we are able to describe a complete and constructive classification of ambivalent and non-ambivalent pairs. Theorem 2. Let σ, ρ be finite sets, and k ≥ 2. If pmin > k, then no k-degenerate graph has a (σ, ρ)-dominating set. If pmin ≤ k, then the pair (σ, ρ) is nonambivalent for k-degenerate graphs if and only if one of the following two conditions holds: 1. (σ ∪ ρ) ∩ {0, 1, . . . , k} = {0}, 2. for every p ∈ σ and every q ∈ ρ, |p − q| > k. The last section of the paper is devoted to planar graphs. These undoubtedly form one of the most interesting k-degenerate classes of graphs (k = 5 in this case). Here we end up with several open problems. We are able to prove the NP-hardness part of an analog of Theorem 1. Theorem 3. For finite sets σ, ρ, ∃(σ, ρ)-domination is NP-complete for planar graphs if the pair (σ, ρ) is ambivalent for planar graphs. However, we do not know if non-ambivalence implies polynomial time recognition algorithm in this case. We are able to classify ambivalent and non-ambivalent pairs for some special pairs of sets σ and ρ, e.g., one-element sets, but even in this case the proof of non-ambivalence is nonconstructive and does not yield an algorithm. Theorem 4. Let σ, ρ be one-element sets, σ = {p}, ρ = {q}, and 0 = q. If p > 5, then no planar graph has a (σ, ρ)-dominating sets. And if p ≤ 5, then the pair (σ, ρ) is non-ambivalent for planar graphs if and only if q − p > 3 or p − q > 2.
2
Classification of Ambivalent and Non-ambivalent Pairs for k-Degenerate Graphs
In this section we present a structural characterization of ambivalent and nonambivalent pairs of sets (σ, ρ) for k-degenerate graphs. We also describe an algorithm which (in case of a non-ambivalent pair (σ, ρ)) constructs the unique (σ, ρ)-dominating set (if it exists) in an input k-degenerate graph. We start with the following simple statement. Lemma 1. Let σ, ρ be finite sets, and let k be a positive integer. If pmin > k, then no k-degenerate graph contains a (σ, ρ)-dominating set.
Generalized Domination in Degenerate Graphs
185
Proof. Let v1 , v2 , . . . , vn be a k-degenerate ordering of a k-degenerate graph G. Since deg vn ≤ k < pmin , the vertex vn does not belong to any (σ, ρ)-dominating set. But then every (σ, ρ)-dominating set of G is also a (σ, ρ)-dominating set of the subgraph of G induced by {v1 , v2 , . . . , vn−1 }. By repeating this argument inductively, we conclude that only the empty set can be (σ, ρ)-dominating. And since 0 ∈ / ρ, this is impossible. Now we assume that pmin ≤ k and we prove that the conditions given in Theorem 2 are sufficient for non-ambivalence of σ, ρ. Towards this end, we describe greedy algorithms which construct the unique candidate for a (σ, ρ)-dominating set. Let G be a k-degenerate graph with n vertices, and suppose that v1 , v2 , . . . , vn is a k-degenerate ordering of the vertices of G. We consider two cases, and in each if them a set S, which is a unique candidate for (σ, ρ)-dominating set in G, is constructed. Case 1. (σ ∪ ρ) ∩ {0, 1, . . . , k} = {0}. Procedure Construct A; U := V (G), S := ∅; while U = ∅ do i := max{j : vj ∈ U }; S := S ∪ {vi }, U := U \ N [vi ]; Return S Case 2. (σ ∪ ρ) ∩ {0, 1, . . . , k} = {0}, and for every p ∈ σ and every q ∈ ρ, |p − q| > k. Procedure Construct B; U := V (G), S := ∅; for i := n downto 1 do r := |N (vi ) ∩ S|, s := |N (vi ) ∩ U |; if there is p ∈ σ such that r ≤ p ≤ r + s then S := S ∪ {vi }, U := U \ {vi } else if there is q ∈ ρ such that r ≤ q ≤ r + s then U := U \ {vi } else Return There is no (σ, ρ)-dominating set, Halt; Return S Even if the procedures Construct A or Construct B construct a set S, it is still possible that this set is not (σ, ρ)-dominating. So we have to test for this property:
186
P. Golovach and J. Kratochv´ıl
Procedure Test; for i := 1 to n do if (vi ∈ S and |N (vi ) ∩ S| ∈ / σ) or (vi ∈ / S and |N (vi ) ∩ S| ∈ / ρ) then Return There is no (σ, ρ)-dominating set, Halt; Return S The properties of the algorithms are summarized in the next statement. Lemma 2. Let σ, ρ be finite sets. Suppose that k is a positive integer, pmin ≤ k and either (σ ∪ ρ) ∩ {0, 1, . . . , k} = {0}, or for every p ∈ σ and every q ∈ ρ, |p − q| > k. Then the described algorithms correctly construct the (σ, ρ)dominating set (if it exists) for any k-degenerate graph G, and this set is unique. The running time is O((pmax + qmax )(n + m)), where n is the number of vertices of G, and m is the number of its edges. Proof. The correctness of the procedure Construct A is straightforward. The loop invariant of the procedure is that no vertex of U has a neighbor in S. Hence if (σ ∪ ρ) ∩ {0, 1, . . . , k} = {0}, then every vertex v ∈ U with degree no more than k < qmin must belong to every (σ, ρ)-dominating set, and vertices of N (v) can not belong to any such set. The correctness of the procedure Construct B follows from the following observation: If for every p ∈ σ and every q ∈ ρ, |p − q| > k, then the set {r, r + 1, . . . , s} can not contain elements of both sets σ and ρ, since s ≤ k. And since the number of S-neighbors of vi (in the final (σ, ρ)-dominating set S) will end up in the interval [r, r + s], the justification is clear. It is known that a k-degenerate ordering can be constructed in time O(n+m). Then the estimate of the running time immediately follows from the description of the algorithms. (Note here that we can assume that pmax , qmax ≤ n since otherwise we can truncate the sets σ and ρ.) To complete the proof of Theorem 2 we have to prove that the conditions given in the theorem are not only sufficient but also necessary. We do so by constructing graphs with at least two different (σ, ρ)-dominating sets. Let σ, ρ be finite sets, and let k ≥ 2 be a positive integer. Suppose that pmin ≤ k, (σ∪ρ)∩{0, 1, . . . , k} = {0}, and there are p ∈ σ and q ∈ ρ such that |p − q| ≤ k. We consider 3 different cases. Case 1. max(σ ∩ {0, 1, . . . , k}) = 0. Since (σ ∪ ρ) ∩ {0, 1, . . . , k} = {0}, there is a q ∈ ρ such that q ≤ k. Then each class of bipartition of the complete bipartite graph Kq,q is a (σ, ρ)-dominating set in Kq,q and Kq,q (and consequently the pair (σ, ρ)) is ambivalent. Case 2. 1 ∈ σ. If p < q, then we start the construction with the complete bipartite graph Kq−p,q−p . Let the bipartition of its vertex set be {u1 , u2 , . . . , uq−p } and {v1 , v2 , . . . , vq−p }. We further join every pair of vertices ui and vi by p different paths of length 2. Let us denote by X the set of the middle vertices of these paths. Since k ≥ 2 and q − p ≤ k, the graph constructed is k-degenerate. And it has two different (σ, ρ)-dominating sets: {u1 , u2 , . . . , uq−p }∪X and {v1 , v2 , . . . , vq−p }∪X.
Generalized Domination in Degenerate Graphs
187
If p ≥ q, then the construction starts with two copies of the complete graph Kp−q+1 , with vertex sets {u1 , u2 , . . . , up−q+1 } and {v1 , v2 , . . . , vp−q+1 }. Again, we join every pair of vertices ui and vi by q different paths of length 2, and we let X denote the set of the middle vertices of these paths. Since k ≥ 2 and p− q ≤ k, the graph constructed in this way is k-degenerate. And it has two different (σ, ρ)dominating sets: {u1 , u2 , . . . , up−q+1 } ∪ X and {v1 , v2 , . . . , vp−q+1 } ∪ X. Case 3. r ∈ σ for some 2 ≤ r ≤ k. Let H denote the complete graph Kr+1 with one edge deleted, and let u, v be the endvertices of this edge. We will further refer to these vertices as the poles of H. If p < q, then we start the construction with two copies of the complete bipartite graph Kq−p,q−p with the bipartition of the vertex sets {u1 , u2 , . . . , uq−p }, {v1 , v2 , . . . , vq−p } and {x1 , x2 , . . . , xq−p }, {y1 , y2 , . . . , yq−p }, respectively. Then for every i ∈ {1, 2, . . . , q − p}, we introduce p copies of H and join one pole of each of them with ui and vi by edges, and the other pole with xi and yi . Let X be the union of the sets of vertices of all added graphs H. The resulting graph is k-degenerate (first the non-pole vertices of H’s have degrees r ≤ k, after their deletion the pole vertices have degrees 2 ≤ k, and after the deletion of these all the remaining vertices have degrees p − q ≤ k), and it has two different (σ, ρ)-dominating sets: {u1 , u2 , . . . , uq−p } ∪ {x1 , x2 , . . . , xq−p } ∪ X and {v1 , v2 , . . . , vq−p } ∪ {y1 , y2 , . . . , yq−p } ∪ X. If p ≥ q, then we start the construction with four copies of the complete graph Kp−q+1 , with vertex sets {u1 , u2 , . . . , up−q+1 }, {v1 , v2 , . . . , vp−q+1 }, {x1 , x2 , . . . , xp−q+1 }, and {y1 , y2 , . . . , yp−q+1 }. For every i ∈ {1, 2, . . . , p − q + 1}, we add q copies of the graph H and join one pole of each of them with ui and vi , and the other one with xi and yi . Again let X be the union of the sets of vertices of the added copies of H. The resulting graph is k-degenerate, and it has two different (σ, ρ)-dominating sets: {u1 , u2 , . . . , up−q+1 } ∪ {x1 , x2 , . . . , xp−q+1 } ∪ X and {v1 , v2 , . . . , vp−q+1 } ∪ {y1 , y2 , . . . , yp−q+1 } ∪ X. Unifying the claims of Lemmas 1, 2 and these constructions we have completed the proof of Theorem 2. Also since we presented polynomial time algorithms which construct unique (σ, ρ)-dominating sets (if they exist) for the non-ambivalent pairs (σ, ρ), the polynomial part of Theorem 1 is proved. To conclude this section, let us point out a property of the constructed graphs which will be used in the next section. Lemma 3. For every ambivalent pair (σ, ρ), there is a k-degenerate graph G with at least two different (σ, ρ)-dominating sets, which has a k-degenerate ordering v1 , v2 , . . . , vn such that for some , the first vertices v1 , . . . , v belong to one and are not included to the other (σ, ρ)-dominating set.
3
NP-Completeness of ∃(σ, ρ)-domination for Ambivalent Pairs
It this section we outline the proofs of the NP-hardness part of Theorem 1 and of Theorem 3.
188
P. Golovach and J. Kratochv´ıl
We use a reduction from a special covering problem. Let r be a positive integer. An instance of the Cover by no more than r sets is a pair (X, M ), where X is a nonempty finite set and M is a collection of sets of elements of X. We ask about the existence of a collection M ⊂ M of sets such that every element of X belongs to at least one and to at most r sets of M . The graph G(X, M ) of an instance (X, M ) is the bipartite graph with the vertex set X ∪ M and edge set {xm|x ∈ m ∈ M }. The proof of the following lemma will appear in the full version of the paper. Lemma 4. For every fixed r ≥ 1, the Cover by no more than r sets problem is NP-complete even for instances (X, M ) for which the graph G(X, M ) is 2-degenerate. It also stays NP-complete if the graph G(X, M ) is planar. The main technical part of the NP-hardness proof is the construction of a gadget which “enforces” on a given vertex the property of “not belonging to any (σ, ρ)dominating set”, and which guarantees that this vertex has a given number of neighbors in any (σ, ρ)-dominating set in the gadget: Lemma 5. Assume that k ≥ 2 and pmin ≤ k. Let r be a nonnegative integer. Then there is a rooted graph F with the root u such that: 1. there is set S ⊂ V (F ) \ {u} such that for every x ∈ S, |N (x) ∩ S| ∈ σ, and for every x ∈ / S, x = u, |N (x) ∩ S| ∈ ρ; 2. for every such set S, |N (u) ∩ S| = r; 3. for every set S ⊂ V (F ) such that u ∈ S, either there is x ∈ S, x = u, for / S for which |N (x) ∩ S| ∈ / ρ; which |N (x) ∩ S| ∈ / σ, or there is x ∈ 4. F has a k-degenerate ordering with u as the first vertex. The construction of F (which will appear in the full version of the paper) is technical and requires a lengthy case analysis. A specific variant of the gadget F for planar graphs is also constructed, and the construction will also appear in the full version of the paper. Now we complete the proof of Theorem 1. Suppose that σ, ρ are finite sets of integers, k ≥ 2, and the pair (σ, ρ) is ambivalent for k-degenerate graphs. Let / ρ, i + 1 ∈ ρ}. Since 0 ∈ / ρ, r is correctly defined. We are r = max{i ∈ N0 : i ∈ going reduce Cover by no more than t sets for t = qmax − r. Suppose that (X, M ) is an instance of Cover by no more than t sets such that the graph G(X, M ) is 2-degenerate, X = {x1 , x2 , . . . , xn } and M = {s1 , s2 , . . . , sm }. Let H be a k-degenerate ambivalent rooted graph with root u, such that u belongs to some (σ, ρ)-dominating set and u is not included in some other (σ, ρ)-dominating set, and H has a k-degenerate ordering for which the root is the first vertex. The existence of such a graph was proved in Lemma 3. For every vertex si of the graph G(X, M ), we take a copy of the graph H and identify its root with si . For every vertex xj of our graph, a copy of the graph F (cf. Lemma 5) is constructed and its root is identified with xj . Denote the graph obtained in this way by G. Clearly, G is k-degenerate. We claim that the graph G has a (σ, ρ)-dominating set if and only if (X, M ) allows a cover by no more than t sets. Since the graphs H and F depend only
Generalized Domination in Degenerate Graphs
189
on σ and ρ, G has O(n + m) vertices, our reduction is polynomial and the proof will be concluded. Suppose first that G has a (σ, ρ)-dominating set S. Let M = {si ∈ M : si ∈ S}. It follows from the properties of the forcing gadget F that every vertex xj has exactly r neighbors in the gadget with root xj and xj ∈ / S. Then xj has at least one S-neighbor in the set {s1 , s2 , . . . , sm }, but no more than t = qmax − r such neighbors. So, M is a cover of X by no more than t sets. Suppose now that M ⊆ M is a cover of X by no more than t sets. For every i = 1, 2, . . . , m, we choose a (σ, ρ)-dominating set Si in the copy of H with the root si such that si ∈ Si if and only if si ∈ M . Let S1 , S2 , . . . , Sn be (σ, ρ)-dominating sets in the copies of F . Since {t + 1, t + 2, . . . , qmax } ⊆ ρ, S = S1 ∪ S2 ∪ · · · ∪ Sm ∪ S1 ∪ S2 ∪ . . . Sn is a (σ, ρ)-dominating set in G. The proof of Theorem 3 follows along the same lines. Suppose that (X, M ) is an instance of Cover by no more than t sets such that the graph G(X, M ) is planar, X = {x1 , x2 , . . . , xn } and M = {s1 , s2 , . . . , sm }. Let H be a planar ambivalent rooted graph with root u, such that u belongs to some (σ, ρ)-dominating set and u is not included in some other (σ, ρ)-dominating set. For every vertex si of the graph G(X, M ), we construct a copy of the graph H and unify its root with si . For every vertex xj of our graph, a copy of the forcing gadget F is constructed and the root of F is identified with xj . Let G be the resulting graph. Obviously, G is planar and G has a (σ, ρ)-dominating set if and only if (X, M ) allows a cover by no more than t sets.
4
Ambivalence and Non-ambivalence for Planar Graphs
Since planar graphs are 5-degenerate, Theorem 2 gives sufficient conditions for non-ambivalence, but these conditions are not necessary for planar graphs. In this section we give some new sufficient conditions for non-ambivalence for planar graphs for certain cases of sets σ and ρ, and prove that in some cases these conditions are also necessary. We start with the case qmin > pmax . Lemma 6. Let σ, ρ be finite sets, and pmin ≤ 5. If qmin − pmax > 3, then the pair (σ, ρ) is non-ambivalent for planar graphs. Proof. Assume that qmin − pmax > 3 and let G be a planar graph with two different (σ, ρ)-dominating sets S1 and S2 . Let X = S1 ∩ S2 , Y1 = S1 \ S2 and Y2 = S2 \ S1 . If x ∈ Y1 , then since x ∈ S1 , |N (x) ∩ X| ≤ pmax , and since x ∈ / S2 , |N (x) ∩ S2 | ≥ qmin . So, x has at least 4 neighbors in Y2 . Similarly for y ∈ Y2 . Hence G[Y1 ∪ Y2 ] is a planar bipartite graph such that every vertex has degree at least 4, but this is impossible, since planar bipartite graphs are 3-degenerate. Now we consider the case qmax < pmin . Lemma 7. Let σ = {p} for some p ≤ 5 and 0 ∈ / ρ. If p − qmax > 2, then the pair (σ, ρ) is non-ambivalent for planar graphs.
190
P. Golovach and J. Kratochv´ıl
Proof. Suppose that p − qmax > 2 and let G be a planar graph with two different (σ, ρ)-dominating sets S1 and S2 . Let X = S1 ∩S2 , Y1 = S1 \S2 and Y2 = S2 \S1 . If x ∈ Y1 , then since x ∈ / S2 , |N (x)∩X| ≤ qmax , and since x ∈ S1 , |N (x)∩S1 | = p. So x has at least 3 neighbors in Y1 , i.e., G[Y1 ] and G[Y2 ] are planar graphs with all vertices of degree at least 3. Assume that a vertex x ∈ Y1 is not adjacent to any vertex of Y2 . Then x has some neighbor y ∈ X. The vertex y must be adjacent to some vertex z ∈ Y2 because σ contains exactly one element. Hence every vertex of Y1 is either adjacent to some vertex of Y2 , or is connected with some vertex of Y2 by a path of length two with the middle vertex from X. Consider a plane embedding of G. It induces plane embeddings of G[Y1 ] and G[Y2 ]. Let x ∈ Y1 . It is joined by an edge or by a path of length two to some vertex y ∈ Y2 , which belongs to some component H of G[Y2 ]. This graph H lies completely in one face of G[Y1 ]. Since all vertices of H have degrees at least 3, the graph H is not outerplanar. Then there is a vertex z ∈ V (H) which does not belong to the boundary of the external face of H. By repeating the same arguments for z instead of x, we conclude that some component of G[Y1 ] lies completely in some internal face of H, and so on. Since the number of components of G[Y1 ] and G[Y2 ] is finite, this immediately gives a contradiction. The conditions given in Lemmas 6 and 7 are not only sufficient but also necessary for one-element sets σ and ρ, and this completes the proof of Theorem 4. The proof of this claim is provided by examples which are omitted here and will be given the full version of the paper. We conclude the section and the paper by explicitly stating some questions left open for planar graphs. Problem 1. Is ∃(σ, ρ)-domination polynomial (NP-complete) when restricted to planar graphs if and only if the pair σ, ρ is non-ambivalent (ambivalent, respectively) for planar graphs? We believe that it would be interesting to solve this problem even for one-element sets σ and ρ. Problem 2. Complete the characterization of ambivalent pairs σ, ρ for planar graphs.
References 1. Golovach, P., Kratochvil, J.: Computational complexity of generalized domination: A complete dichotomy for chordal graphs. In: Brandstaedt, A., Kratsch, D., M¨ uller, H. (eds.) Graph-theoretic Concepts in Computer Science, Proceedings WG 2007. LNCS, vol. 4769, pp. 1–11. Springer, Heidelberg (2007) 2. Heggernes, P., Telle, J.A.: Partitioning graphs into generalized dominating sets. Nordic J. Comput. 5, 173–195 (1998) 3. Kratochv´ı, J., Manuel, P., Miller, M.: Generalized domination in chordal graphs. Nordic Journal of Computing 2, 41–50 (1995)
Generalized Domination in Degenerate Graphs
191
4. Telle, J.A.: Complexity of domination-type problems in graphs. Nordic Journal of Computing 1, 157–171 (1994) 5. Telle, J.A.: Vertex partitioning problems: characterization, complexity and algorithms on partial k-trees, Ph. D tesisis, Department of Computer Science, Universiy of Oregon, Eugene (1994) 6. Telle, J.A., Proskurowski, A.: Algorithms for vertex partitioning problems on partial k-trees. SIAM J. Discrete Math. 10, 529–550 (1997)
More on Weak Bisimilarity of Normed Basic Parallel Processes Haiyan Chen1,2 1
State Key Laboratory Computer Science of Institute of Software, Chinese Academy of Sciences 2 Graduate School of the Chinese Academy of Sciences
[email protected]
Abstract. Deciding strong and weak bisimilarity of BPP are challenging because of the infinite nature of the state space of such processes. Deciding weak bisimilarity is harder since the usual decomposition property which holds for strong bisimilarity fails. Hirshfeld proposed the notion of bisimulation tree to prove that weak bisimulation is decidable for totally normed BPA and BPP processes. In this paper, we present a tableau method to decide weak bisimilarity of totally normed BPP. Compared with Hirshfeld’s bisimulation tree method, our method is more intuitive and more direct. Moreover from the decidability proof we can derive a complete axiomatisation for the weak bisimulation of totally normed BPP.
1
Introduction
A lot of attention has been devoted to the study of decidability and complexity of verification problems for infinite-state systems [1,15,16]. In [2], Baeten, Bergstra, and Klop proved the remarkable result that bisimulation equivalence was decidable for irredundant context-free grammars (without the empty product). Subsequently, many algorithms in this domain were proposed. In [7], Hans H¨ uttel and Colin Stirling proved the decidability of normed BPA by using a tableau method, which can also be used as a decision procedure. Decidability of strong bisimilarity for BPP processes has been established in [13]. Furthermore, [14] proved that deciding strong bisimilarity of BPP is PSPACE-complete. For weak bisimilarity, much less is known. Semidecidability of weak bisimilarity for BPP has been shown in [5]. In [6] it is shown that weak bisimilarity is decidable for those BPA and BPP processes which are “totally normed”. P.Janˇ car conjectured that the method in [14] might be used to show the decidability of weak bisimilarity for general BPP. However, the problem of decidability of weak bisimilarity for general BPP is open. Our work is inspired by Hirshfeld’s idea. In [6] Hirshfeld proposed the notion of bisimulation tree to prove the decidability of weak bisimulation of totally normed
Supported by the National Natural Science Foundation of China under Grant Nos. 60673045, 60496321.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 192–203, 2008. c Springer-Verlag Berlin Heidelberg 2008
More on Weak Bisimilarity of Normed Basic Parallel Processes
193
BPP. Based on the idea, we show that weak bisimulation for totally normed BPP is decidable by a tableau method. In [13], S. Christensen, Y. Hirshfield and F. Moller proposed a tableau decision procedure for deciding strong bisimilarity of normed BPP. The key for tableau method to work is a nice decomposition property which holds for strong bisimulation, but fails for weak bisimulation. In our work, instead of using decomposition property, we apply Hirshfeld’s idea to control the size of the tableaux to make the tableau method work correctly. This approach not only provides us a more direct decision method, but also has the advantage of providing a completeness proof of an equational theory for weak bisimulation of totally normed BPP processes, similar to the tableau method of [13] provides such a completeness proof for strong bisimulation of normed BPP processes. Moreover, the termination proof for tableau is greatly simplified. The paper is organized as follows. Section 2 introduces the notion of BPP processes and weak bisimulation and describes weak bisimulation equivalence. Section 3 gives the tableau decision method and presents the soundness and completeness results. In Section 4 we prove the completeness of the equational theory. Finally, Section 5 sums up conclusions and gives suggestions for further work.
2
BPP Processes and Weak Bisimulation Equivalence
Assuming a set of variables V, V={X, Y, Z, · · ·} and a set of actions Actτ , Actτ = {τ, a, b, c, · · ·} which contains a special element τ , we consider the set of BPP expressions E given by the following syntax; we shall use E, F, . . . as metavariables over E. E ::= 0 |X | E1 + E2 | μE | E1 |E2
(inaction) (variables, X ∈ V) (summation) (μ ∈ Actτ ) (merge)
A BPP process is defined by a finite family of recursive process equations def
Δ = {Xi = Ei |1 ≤ i ≤ n} where the Xi ∈ V are distinct variables and each Ei is BPP expressions, and free variables in each Ei range over set {X1 , . . . , Xn }. In this paper, we concentrate on guarded BPP systems. Definition 1. A BPP expression E is guarded if each occurrence of variable is within the scope of an atomic action, and a BPP system is guarded if each Ei is guarded for 1 ≤ i ≤ n. Definition 2. The operational semantics of a guarded BPP system can be simply given by a labeled transition system (S, Actτ , −→) where the transition relation −→ is generated by the rules in Table 1.
194
H. Chen Table 1. Transition rules rec
E −→ α
sum2
a
E + F −→ α a
(X = E ∈ Δ)
F −→ β a
E + F −→ β a
E −→ α
par1
a
X −→ E a
a
sum1
E −→ E a
a
aE −→ E
act
par2
a
E|F −→ α|F
F −→ β a
E|F −→ E|β
where the state space S consists of finite parallel of BPP processes, and the transition relation −→⊆ S × Actτ × S is generated by the rules in Table 1, in which (as also later) we use Greek letters α, β, · · · as meta variables ranging over elements of S, Each such α denotes a BPP process by forming the product of the elements of α, i.e. by combining the elements of α in parallel using the merge operator. We write for empty sequence. We shall write X n to represent the term X| · · · |X consisting of n copies of X combined in parallel. By length(α) we denote the cardinality of α. It is shown in [3] that any guarded system can be effectively transformed into a 3-GNF normal form def
ni aij αij |1 ≤ i ≤ m} {Xi = Σj=1
where for all i, j such that 1 ≤ i ≤ m, 1 ≤ j ≤ ni , length(αij ) < 3. So we only considered BPP processes given in 3-GNF in this paper. τ a a Moreover, we write α =⇒ β for α(−→)∗ β, and write α =⇒ β for α=⇒ −→=⇒ β. Let ˆ : Actτ → {Actτ − τ } ∪ be the function such that a ˆ = a when a = τ and τˆ = , then the following general definition of weak bisimulation on S is standard. Definition 3. A binary relation R ⊆ S × S is a weak bisimulation if for all (α, β) ∈ R the following conditions hold: a
a ˆ
1. whenever α −→ α , then β =⇒ β for some β with (α , β ) ∈ R; a a ˆ 2. whenever β −→ β , then α =⇒ α for some α with (α , β ) ∈ R. Two states α and β are said to be weak bisimulation equivalent, written α ≈ β, if there is a weak bisimulation R such that (α, β) ∈ R. It is standard to prove that ≈ is an equivalence relation between processes. Moreover it is a congruence with respect to composition on S: Proposition 1. If α1 ≈ β1 and α2 ≈ β2 then α1 |α2 ≈ β1 |β2 . Proposition 2. α|β ≈ β|α.
More on Weak Bisimilarity of Normed Basic Parallel Processes
195
Proposition 3. (α|β)|γ ≈ α|(β|γ). Definition 4. A process α is said to be normed if there exists a finite sequence α −→ . . . −→ αn −→ transitions from α to , and un-normed otherwise. The weak norm of a normed α is the length of the shortest transition sequence of the a1 an ai form α =⇒ . . . =⇒ , where each ai = τ and =⇒ is counted as 1. We denote by ||α|| the weak norm of α. Also, for unnormed α, we follow the convention that ||α|| = ∞ and ∞ > n for any number n. A BPP system Δ is totally normed if for every variable X appears Δ, 0 < X < ∞. With this definition, it is obvious that weak norm is additive: for normed α, β ∈ S, ||α|β|| = ||α||+ ||β||. Moreover, the following proposition says that weak norm is respected by ≈. Proposition 4. If α ≈ β, then either both α, β are un-normed, or both are normed and ||α|| = ||β||.
3
The Tableau Decision Method
From now on, we restrict our attention to the totally normed BPP processes in 3-GNF, i.e. processes of a parallel labeled rewrite system S, Actτ , −→ where ∞ > ||X|| > 0 for all X ∈ V. And throughout the rest of the paper, we assume that all the processes considered are totally normed unless stated otherwise. With the preparation of the previous section, in this section we can devise a tableau decision method. The rules of the tableau system are built around equations α = β, where α, β ∈ S. Each rule of the tableau system has the form name
α=β α1 = β1 . . . αn = βn
side condition.
The premise of a rule represents the goal to be achieved while the consequents are the subgoals. There are three rules altogether. One for unfolding. Two rules for substituting the states. We now explain the three rules in turn. Table 2. Tableau rules subl
α1 |β1 = α2 |β2 α1 |β1 = α1 |β2
(if there is α1 ≺ α2 and a dominated node labeled α1 = α2 or α2 = α1 )
subr
α1 |β1 = α2 |β2 α2 |β1 = α2 |β2
(if there is α2 ≺ α1 and a dominated node labeled α1 = α2 or α2 = α1 )
unfold
α=β {α = β | (α , β ) ∈ M }
M is a match f or (α, β)
196
3.1
H. Chen
Substituting the States
The next two rules can be used to substitute the expressions in the goal. The rules are based on the following observation. Definition 5. (dominate and improve)[6] 1. The pair (α1 |α2 , β1 |β2 ) dominates the pair (α1 , β1 ). k m 2. X1k1 | · · · |Xi0i0 | · · · |Xnkn improves X1m1 | · · · |Xi0 i0 | · · · |Xnmn iff there is some i0 such that for i < i0 the (total) number of occurrences of Xi is equal in both pairs, i.e. ki = mi while the number of occurrences of Xi0 is smaller in k m (X1k1 | · · · |Xi0i0 | · · · |Xnkn ) than in (X1m1 | · · · |Xi0 i0 | · · · |Xnmn ) i.e. ki0 < mi0 . Proposition 5. Every sequence of pairs in which every pair improves the previous one is finite. Proposition 6. Every sequence of pairs in which no pair dominates a previous one is finite. Definition 6. By ≺ we denote the well-founded ordering on S given as follows: X1k1 | · · · |Xnkn ≺ X1l1 | · · · |Xnln iff there exists j such that kj < lj and for all i < j we have ki = li . It is easy to show that ≺ is well-founded. Moreover, We shall rely on the fact that ≺ is total in the sense that for any α, β ∈ S such that α = β we have α ≺ β or β ≺ α. Also we shall rely on the fact that α ≺ β implies α|α ≺ β|α as well as α ≺ β|α for any α ∈ S. All these properties are easily seen to hold for ≺. When building tableaux basic nodes might dominate other basic nodes; we say a basic node n : α1 |β1 = α2 |β2 or α2 |β2 = α1 |β1 dominates any node n : α1 = α2 or n : α2 = α1 which appears above n in the tableau. There n : α1 = α2 or n : α2 = α1 is called the dominated node. Definition 7. We define a weight function ω, s.t. for α = X1k1 | · · · |Xnkn , β = Y1m1 | · · · |Ynmn , ω(α, β) = 1×k1 +1×k2 +· · ·+1×kn +1×m1 +1×m2 +· · ·+1×mn. Proposition 7. For every α1 , β1 , if α ≈ β, then α|α1 ≈ β|β1 iff α|α1 ≈ α|β1 iff β|α1 ≈ β|β1 . Proof. For the only if direction, suppose α|α1 ≈ β|β1 , since α ≈ β and β1 ≈ β1 , then α|β1 ≈ β|β1 by Proposition 1, by α|α1 ≈ β|β1 , so α|α1 ≈ α|β1 since ≈ is an equivalence. For the if direction, suppose α|α1 ≈ α|β1 , since α ≈ β and β1 ≈ β1 , α|β1 ≈ β|β1 by Proposition 1, by α|α1 ≈ α|β1 , so α|α1 ≈ β|β1 since ≈ is an equivalence. For β it is similar to previous proof. Proposition 8. One of the pairs (α|α1 , α|β1 ) or (β|α1 , β|β1 ) is an improvement of (α|α1 , β|β1 ) where α = β. This proposition guarantees the soundness and backwards soundness of subl, subr rules. In fact in section 2, from Proposition 5 we know that every sequence of pairs in which every pair improves the previous one is finite. So this means that there are only finitely many different ways to apply the rules.
More on Weak Bisimilarity of Normed Basic Parallel Processes
3.2
197
Unfolding by Matching the Transitions
Definition 8. Let (α, β) ∈ S × S. A binary relation M ⊆ S × S is a match for (α, β) if the following hold: a
a ˆ
1. whenever α −→ α then β =⇒ β for some (α , β ) ∈ M ; a a ˆ 2. whenever β −→ β then α =⇒ α for some (α , β ) ∈ M ; a a 3. whenever (α , β ) ∈ M then ||α || = ||β || and either α −→ α or β −→ β for some a ∈ Actτ . It is easy to see that for a given (α, β) ∈ S × S, there are finitely many possible M ⊆ S × S which satisfies 3. Above and moreover each of them must be finite. And for such M it is not difficult to see that it is decidable whether M is a match for (α, β). The rule can be used to obtain subgoals by matching transitions, and it is based on the following observation. Proposition 9. Let α, β ∈ S. Then α ≈ β if and only if there exists a match M for (α, β) such that α ≈ β for all (α , β ) ∈ M . This proposition guarantees the soundness and backwards soundness of unfold rule. As pointed out above there are finitely many matches for a given (α, β), so there are finitely many ways to apply this rule on (α, β). 3.3
Constructing Tableau
We determine whether α ≈ β by constructing a tableau with root α = β using the three rules introduced above. A tableau is a finite tree with nodes labeled by equations of the form α = β, where α, β ∈ S. Moreover if α = β labels a non-leaf node, then the following are satisfied: 1. ||α|| = ||β||; 2. its sons are labeled by α1 = β1 . . . αn = βn obtained by applying rule subl, subr or unfold in Table 2 to α = β, in that priority order; 3. no other non-leaf node is labelled by α = β. A tableau is a successful tableau if the labels of all its leaves have the forms: 1. α = β where there is a non-leaf node is also labeled α = β; 2. α ≡ β 3.4
Decidability, Soundness, and Completeness
Lemma 1. Every tableau with root α = β is finite, Furthermore, there is only a finite number of tableaux with root α = β. Theorem 1. If α ≈ β then there exists a successful tableau with root labeled α = β.
198
H. Chen
Proof. Suppose α ≈ β. If we can construct a tableau T (α = β) for α = β with the property that any node n : α = β of T (α = β) satisfies α ≈ β, then by Lemma 1 that construction must terminate and each terminal will be successful. Thus the tableau itself will be successful. We can construct such a T (α = β) if we verify that each rule of the tableau system is forward sound in the sense that if the antecedent relates bisimilar processes then it is possible to find a set of consequents relating bisimilar processes. For the rule subl or subr we know from Proposition 7. For the rest of the tableau rules it is easily verified that they are forward sound in the above sense. Finally we must show soundness of the tableau system, namely that the existence of a successful tableau for α = β indicates that α ≈ β. This follows from the fact that the tableau system tries to construct a family of binary relations which are bisimilar. Definition 9. A sound tableau is a tableau such that if α = β is a label in it then α ≈ β. Theorem 2. A successful tableau is a sound tableau. Proof. Let T be a successful tableau. We define W = {B ⊆ S × S} to be the smallest binary relations satisfies the following: 1. if α ≡ β labels a node in T then (α, β) ∈ W ; 2. if there is a node in T labeled α = β and on which rule unfold is applied then (α, β) ∈ W ; 3. if (α1 , α2 ) ∈ W , (α1 |β1 , α1 |β2 ) ∈ W where α1 ≺ α2 then (α1 |β1 , α2 |β2 ) ∈ W ; 4. if (α1 , α2 ) ∈ W , (α2 |β1 , α2 |β2 ) ∈ W where α2 ≺ α1 then (α1 |β1 , α2 |β2 ) ∈ W . We will prove the following properties about W : A. If α = β labels a node in T then (α, β) ∈ W . B. If (α, β) ∈ W , then the following hold: a
a ˆ
a
a ˆ
(a) if α −→ α then β =⇒ β for some β such that (α , β ) ∈ W ; (b) if β −→ β then α =⇒ α for some α such that (α , β ) ∈ W . Clearly property B. implies that B = {(α, β) | (α, β) ∈ W } is a weak bisimulation. Then together with property A. it implies that T is a sound tableau. We prove A. by induction on weight ω = ω(α, β). If α = β is a label of an nonleaf node, there are three cases according to which rule is applied on this node. If unfold is applied, then by rule 2. of the construction of W clearly (α, β) ∈ W . If subl is applied, in this case α = β is of the form α1 |β1 = α2 |β2 , and the node has sons labeled by α1 |β1 = α1 |β2 . Clearly ω(α1 |β1 , α1 |β2 ) < ω(α1 |β1 , α2 |β2 ), then by the induction hypothesis (α1 |β1 , α1 |β2 ) ∈ W . Then by rule 3. in the
More on Weak Bisimilarity of Normed Basic Parallel Processes
199
construction of W , (α1 |β1 , α2 |β2 ) ∈ W . If subr is applied, it is similar to subl proof. If α = β is a label of a leaf node, then since T is a successful tableau either there is a non-leaf node also labeled by α = β and in this case we have proved that (α, β) ∈ W , or α ≡ β must hold and in this case by rule 1. in the construction of W we also have (α, β) ∈ W . We prove B. by induction on the four rules define W . Suppose (α, β) ∈ W , there are the following cases. Case of rule 1. i.e. α ≡ β. It is obvious B. holds. Case of rule 2. i.e. there exists M which is a match for (α, β) such that α = β is a label of T for all (α , β ) ∈ M . Then by A. it holds that (α , β ) ∈ W for all (α , β ) ∈ M , then by definition of a match, clearly B. holds. Case of rule 3. i.e. there exist (α1 , α2 ) ∈ W , (α1 |β1 , α1 |β2 ) ∈ W where α1 ≺ α2 a and α = α1 |β1 . If α1 |β1 −→ α , we have to match this by looking for a β such a ˆ
that α2 |β2 =⇒ β and (α , β ) ∈ W . By transition rule for α has two cases: a the first case α1 −→ α1 then α = α1 |β1 . Now (α1 , α2 ) ∈ W , by the induction a ˆ
hypothesis there exists α2 ∈ α such that α2 =⇒ α2 and (α1 , α2 ) ∈ W . since (α1 |β1 , α1 |β2 ) ∈ W , by the induction hypothesis there exists α1 |β2 ∈ α such that a ˆ
α1 |β2 =⇒ α1 |β2 and (α1 |β1 , α1 |β2 ) ∈ W . By rule 3 we have (α1 |β1 , α2 |β2 ) ∈ W . a The another direction can be proved in a similar way; the second case β1 −→ β1 then α = α1 |β1 . since (α1 |β1 , α1 |β2 ) ∈ W , by the induction hypothesis a ˆ
there exists α1 |β1 ∈ α such that α1 |β2 =⇒ α1 |β2 and (α1 |β1 , α1 |β2 ) ∈ W . Now (α1 , α2 ) ∈ W , by rule 3 we have (α1 |β1 , α2 |β2 ) ∈ W . The another direction can be proved in a similar way. Case of rule 4. it is similar rule 3. proof. Theorem 3. Let α, β ∈ S be totally normed. Then α ≈ β if and only if there exists a successful tableau with root α = β.
4
The Equational Theory
We will develop the equational theory proposed by Sφren Chirstensen, Yoram Hirshfeld, Faron Moller in [13] for strong bisimulation on normed BPP processes given in 3-GNF. We now describe a sound and complete axiomatisation for totally normed BPP processes. We pay attention to BPP processes in 3-GNF. The axiomatisation shall be parameterised by Δ and consists of axioms and inference rules that enable one to derive the root of successful tableaux. The axiomatisation is built around sequences of the form Γ Δ E = F where Γ is a finite set of assumptions of the form α = β and E, F are BPP expressions. Let Δ be a finite family of BPP processes in 3-GNF. A sequent is interpreted as follows: Definition 10. We write Γ |=Δ E = F when it is the case that if the relation def
{(α, β)|α = β ∈ Γ } ∪ {(Xi , Ei )|Xi = Ei ∈ Δ} is part of a bisimulation then E ≈ F.
200
H. Chen
Thus, the special case ∅ |=Δ E = F states that E ≈ F (relative to the system of process equations Δ). For the presentation of rule unfold we introduce notation unf (α) to mean the unfolding of α given as follows (assuming that α ≡ Y1 |Y2 | · · · |Ym ): m
i=1 {ai γi : ai αi ∈ Yi }, m where γi = αi |( j=1,j=i Yj ) and the notation aα ∈ Y means that aα is a summand of the defining equation for Y . The proof system is presented in Table 3. Equivalence and congruence rules are R1-6. In [13] the rule R5 of the axiomatisation for normed BPP processes can not directly apply in our rules, since we know that weak bisimulation is not preserved by summation, i.e. if E1 ≈ E2 and F1 ≈ F2 , but we can’t get E1 +F1 ≈ E2 + F2 . So we increase two rules R5-6 to achieve summation. The rules R7-15 correspond to the BPP laws; notably we have associativity and commutativity for merge. Finally, we have two rules characteristic for this axiomatisation; R16 is an assumption introduction rule underpinning the role of the assumption list Γ and R17 is an assumption elimination rule and also a version of fixed point induction. The special form of R17 has been dictated by the rule unfold of the tableau system presented in Table 2.
unf (α) =
Definition 11. A proof of Γ Δ E = F is a proof tree with root labeled Γ Δ E = F , instances of the axioms R1 and R7-R16 as leaves and where the father of a set of nodes is determined by an application of one of the inference rules R2-R6 or R17. Definition 12. The relations ≈o for ordinals o are defined inductively as follows, where we assume that l is a limit ordinal E ≈0 F for all E, F E ≈o+1 F iff for a ∈ (Actτ ∪ {}) a
a ˆ
a
a ˆ
E =⇒ E , then ∃F .F =⇒ F and E ≈o F F =⇒ F , then ∃E .E =⇒ E and E ≈o F . E ≈l F iff ∀o < l.E ≈o F ∞ So we can get a fact that is ≈= ≈n . n=0
Theorem 4. (Soundness) If Γ Δ E = F then we have Γ |=Δ E = F . In particular if Δ E = F then E ≈ F . The similar proof for soundness can be found in [13]. Lemma 2. If Γ Δ E = F then Γ, Γ Δ E = F for any Γ . The completeness proof rests on a number of lemmas and definitions which tell us how to determine our sets of hypotheses throughout a proof of E ≈ F from a successful tableau for E ≈ F . We prove completeness from [13] idea.
More on Weak Bisimilarity of Normed Basic Parallel Processes
201
Table 3. The axiomatisation Equivalence
Congruence
R1
Γ Δ E = E
R4
Γ Δ E 1 = F 1 Γ Δ E 2 = F 2 Γ Δ E1 |E2 = F1 |F2
R2
Γ Δ E = F Γ Δ F = E
R5
Γ Δ E = F Γ Δ E = F + τ E
R3
Γ Δ E = F Γ Δ F = G Γ Δ E = G
R6
Γ Δ E = F Γ Δ aE + R = aF + R
Axioms R7 R8 R9 R10 R11 Recursion R16
Γ Γ Γ Γ Γ
Δ Δ Δ Δ Δ
E + (F + G) = (E + F ) + G E+F =F +E E+E =E E+0=E E|(F |G) = (E|F )|G
Γ, α = β Δ α = β
R12 R13 R14 R15
R17
Γ Γ Γ Γ
Δ Δ Δ Δ
E|F = F |E E|0 = E τE = E a(E + τ F ) + aF = a(E + τ F )
Γ, α = β Δ unf (α) = unf (β) Γ Δ α = β
Definition 13. For any node n of a tableau, Rn(n) denotes the set of labels of the nodes above n to which the rule unfold is applied. In particular, Rn(r)=∅ where r is the root of the tableau. Theorem 5. (Completeness)If α ≈ β then Γ Δ α = β Proof. If α ≈ β, then there exists a finite successful tableau with root labeled α = β. Let T (α = β) be such a tableau. We shall prove that for any node n : E = F of T (α = β) we have Rn(n) Δ E = F . In particular, for the root r : α = β, this reduces to Δ α = β, so we shall have our result. We prove Rn(n) Δ E = F by induction on the depth of the subtableau rooted at n. As the tableau is built modulo associativity and commutativity of merge and by removing 0 components sitting in parallel or in sum we shall assume that the axioms R12-R14 are used whenever required to accomplish the proof. Firstly, if n : E = F is a terminal node then either E and F are identical terms α, so Rn(n) E = F follows from R1. Hence assume that n : E = F is a internal nodes. We proceed to apply to n according to the tableau rule. (i) Suppose unfold is applied. Then n is the label E = F and the son n of n is labeled Ei = Fi (i ∈ {1 · · · n}, Ei = Fi is match of E = F ), by induction hypothesis Rn(Ei = Fi ) Ei = Fi , Rn(Ei = Fi ) − {E = F }, E = F Ei = Fi , Rn(Ei = Fi ) − {E = F } E = F by R17, we know Rn(E = F ) = Rn(Ei = Fi ) − {E = F }, so Rn(E = F ) E = F . (ii) Suppose sub is applied wlog that is subl. Then the label E = F is of the form E1 |F1 = E2 |F2 with the corresponding node n labeled E1 = E2 and
202
H. Chen
the son n of n is labeled E1 |F1 = E1 |F2 , by induction hypothesis Rn(E1 |F1 = E1 |F2 ) E1 |F1 = E1 |F2 since Rn(E1 |F1 = E1 |F2 ) = Rn(E1 |F1 = E2 |F2 ), and E1 = E2 ∈ Rn(E1 |F1 = E2 |F2 ), so Rn(E1 |F1 = E2 |F2 ) E1 = E2 by R16. Hence from R1, R4, R3 we have Rn(E1 |F1 = E2 |F2 ) E1 |F1 = E2 |F2 , last Rn(E = F ) E = F is required. This completes the proof.
5
Conclusions and Directions for Further Work
In this paper we proposed a tableau method to decide whether a pair of totally normed BPP processes is a weak bisimilar relation. The whole procedure is direct and easy to understand, while the termination proof is also very simple. This tableau method also helps us to show the completeness of Sφren Chirstensen, Yoram Hirshfeld, Faron Moller’s equational theory on totally normed BPP systems. Recent resultsby Richard Mayr show that weak bisimulation of Basic P Parallel Processes is 2 -hard[18]. The study of bisimulation decision problems in the fields of BPA and BPP processes has been already rather sophisticated. All the results were recorded and updated by J.Srba[17], as well as open problems in this field. About algorithms, the things left should be concerned with lowing complexity and improving the efficiency. As the equational theory depends on assumptions, it is somewhat different from Milner’s equational theory for regular processes[9]. One direction of interest is the construction of equational theory of ≈ since many decision results for weak bisimulation are already given.
References 1. Moller, F.: Infinite results. In: Sassone, V., Montanari, U. (eds.) CONCUR 1996. LNCS, vol. 1119, pp. 195–216. Springer, Heidelberg (1996) 2. Baeten, J.C.M., Bergstra, J.A., Klop, J.W.: Decidability of bisimulation equivalence for processes generating context-free languages. Journal of the Association for Computing Machinery 93(40), 653–682 (1993) 3. Christensen, S.: Forthcoming Ph.D. thesis. University of Edinburgh 4. Hirshfeld, Y., Jerrum, M., Moller, F.: Bisimulation equivalence is decidable for normed process algebra. In: Wiedermann, J., Van Emde Boas, P., Nielsen, M. (eds.) ICALP 1999. LNCS, vol. 1644, pp. 412–421. Springer, Heidelberg (1999) 5. Esparza, J.: Petri nets, commutative context-free grammars, and basic parallel processes. In: Reichel, H. (ed.) FCT 1995. LNCS, vol. 965, pp. 221–232. Springer, Heidelberg (1995) 6. Hirshfeld, Y.: Bisimulation trees and the decidability of weak bisimulations. In: INFINITY 1996. Proceedings of the 1st International Workshop on Verification of Infinite State Systems, Germany, vol. 5 (1996) 7. H¨ uttel, H., Stirling, C.: Actions speak louder than words. Proving bisimilarity for context-free processes. In: LICS 1991. Proceedings of 6th Annual symposium on Logic in Computer Science, Amsterdam, pp. 376–386 (1991) 8. R., Milner: A Calculus of Communicating Systems. LNCS, vol. 92. Springer, Heidelberg (1980)
More on Weak Bisimilarity of Normed Basic Parallel Processes
203
9. Milner, R.: Communication and Concurrency. Prentice-Hall International, Englewood Cliffs (1989) 10. Bergstra, J.A., Klop, J.W.: Process theory based on bisimulation semantics. In: Prodeedings of REX Workshop, pp. 50–122. The Netherlands, Amsterdam (1988) 11. Caucal, D.: Graphes canoniques de graphes alg´ ebriques. Informatique th´ eorique et Applications(RAIRO) 90-24(4), 339–352 (1990) 12. Dixon, L.E.: Finiteness of the odd perfct and primitive abundant numbers with distinct factors. American Journal of Mathematics 35, 413–422 (1913) 13. Christensen, S., Hirshfield, Y., Moller, F.: Decomposability, Decidability and Axiomatisability for Bisimulation Equivalence on Basic Parallel Processes. In: LICS 1993. Proceedings of the Eighth Annual Symposium on Logic in Computer Science, Montreal, Canada, pp. 386–396 (1993) 14. Janˇ car, P.: Strong bisimilarity on basic parallel processes is PSPACE-complete. In: LICS 2003. Proceedings of the 18th Annual IEEE Symposium on Logic in Computer Science, Ottawa, Canada, pp. 218-227 (2003) 15. Esparza, J.: Decidability of model checking for infinite-state concurrent systems. Acta Informatica 97-34, 85–107 (1997) 16. Burkart, O., Esparza, J.: More infinite results. Electronic Notes in Theoretical computer Science 5 (1997) 17. Srba, J.: Roadmap of Infinite Results. Bulletin of the European Association for Theoretical Computer Science 78, 163–175 (2002), columns:Concurrency 18. Mayr, R.: On the Complexity of Bisimulation Problems for Basic Parallel Processes. In: Welzl, E., Montanari, U., Rolim, J.D.P. (eds.) ICALP 2000. LNCS, vol. 1853, pp. 329–341. Springer, Heidelberg (2000) 19. Mayr, R.: Weak Bisimulation and Model Checking for Basic Parallel Processes. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 88–99. Springer, Heidelberg (1996)
Extensions of Embeddings in the Computably Enumerable Degrees Jitai Zhao1,2, 1
State Key Lab. of Computer Science, Institute of Software, Chinese Academy of Sciences 2 Graduate University of Chinese Academy of Sciences
[email protected]
Abstract. In this paper, we study the extensions of embeddings in the computably enumerable Turing degrees. We show that for any c.e. degrees x ≤ y, if either y is low or x is high, then there is a c.e. degree a such that both 0 < a ≤ x and x ≤ y ∪ a hold.
1
Introduction
A set A ⊆ ω is called computably enumerable (c.e.) if and only if either A = ∅ or A is the range of a computable function. ¯ the complement of A, is infinite but A set A is simple if A is c.e. and A, contains no infinite c.e. set. Clearly if A is simple then A is not computable ¯ because A¯ can not be c.e., since otherwise it contains an infinite c.e. set, i.e., A. Given sets A, B ⊆ ω, we say that A is Turing reducible to B, if there is an oracle Turing machine Φ such that A = Φ(B) (denoted by A ≤T B). A ≡T B if A ≤T B and B ≤T A. The Turing degree of A is defined to be a = deg(A) = {B : B ≡T A}. A degree a ≤ 0 is low, if a = 0 , and high if a = 0 . A set A ≤T ∅ is low (high), if deg(A) is low (high). The degrees D form a partially ordered set under the relation deg(A) ≤ deg(B) if and only if A ≤T B. We write deg(A) < deg(B) if A ≤T B and B T A. A degree is called computably enumerable (c.e.), if it contains a c.e. set. Let E denote the class of c.e. degrees with the some ordering as that for D. As we know (D; ≤, ∪) and (E; ≤, ∪) will form upper semi-lattices. Given r.e. degrees 0 < b < a, we say that b cups to a if there exists an r.e. degree c < a such that b ∪ c = a; if no such c exists then b is an anti-cupping witness for a. The r.e. degree a has the anti-cupping (a.c.) property if it has an anti-cupping witness. Extensions and non-extensions of embeddings in the computably enumerable Turing degrees have been an extensively studied phenomena in the past decades since Shoenfield [1965] published his conjecture. The conjecture was soon proved false by the minimal pair theorem of Lachlan [1966]. However the characterization of the structure satisfying Shoenfield’s conjecture has become a successful
The author is partially supported by NSFC Grant No. 60325206, and No. 60310213.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 204–211, 2008. c Springer-Verlag Berlin Heidelberg 2008
Extensions of Embeddings in the Computably Enumerable Degrees
205
research in the local Turing degrees, leading to the final resolution of Slaman and Soare [1995] of the full characterization of the problem. The interests in the strong extensions of embeddings in the computably enumerable degrees come from the close relationship between the problem and the decidability/undecidability of the Σ2 -fragment of the c.e. degrees. For instance, Slaman asked in [1983] the following question: For any two c.e. degrees x and y, with x y, does there exist a c.e. degree a, satisfying: 1. 0 < a ≤ x, 2. x y ∪ a? Intuitively, the problem wants to construct a c.e. degree a which should “code more information than” 0, and which cannot compute x even if it joins with y. In fact it is not always possible to find such a c.e. degree a, which is stated in Slaman and Soare [2001]. This problem has recently been resolved negatively in progress by Barmpalias, Cooper, Li, Xia, and Yao. In this article, we consider possible partial results on the positive side of the problem above. We consider two special cases, i.e., when y is low and x is high. We will show the following two theorems: Theorem 1. For any two c.e. degrees x and l, with x l and l low, there is a c.e. degree a, satisfying: 1. 0 < a ≤ x, and 2. x l ∪ a. Theorem 2. For any two c.e. degrees h and y, with h y and h high, there is a c.e. degree a, satisfying: 1. 0 < a ≤ h, and 2. h y ∪ a. Note that Theorem 2 can be directly deduced from the following theorem which appears in Miller [1981]. Theorem 3. Every high r.e. degree h has the a.c. property via a high r.e. witness a. We briefly describe how to show Theorem 2 by Theorem 3: for a given high r.e. degree h, there exists a high r.e. degree a, 0 < a < h, and for a given r.e. y, h ≤ y ∪ a implies h ≤ y, i.e., h y implies h y ∪ a. Therefore a is a desired r.e. degree in Theorem 2. The rest of the paper is devoted to proving Theorem 1, our main result. Our notations and terminology are standard and generally follow Soare [1987] and Cooper [2003]. During the course of a construction, notations such as A, Φ are used to denote the current approximations to these objects, and if we want to specify the values immediately at the end of stage s, then we denote them
206
J. Zhao
by As , Φ[s] etc. For a computable partial functional (c.p., or for simplicity, also a Turing functional), Φ say, the use function is denoted by the corresponding lower case letter φ. The value of the use function of a converging computation is the greatest number which is actually used in the computation. For a Turing functional, if a computation is not defined, then we define its use function equal to −1.
2
Proof of Theorem 1
2.1
Requirements and Strategies
Given c.e. sets X ∈ x, and L ∈ l, with X ≤T L and L low, we will build a c.e. set A to satisfy the following requirements: T : A ≤T X Pe : We inf inite ⇒ We ∩ A = ∅
Φe (L ⊕ A) Ne : X = where e ∈ ω, {Φe : e ∈ ω} is an effective enumeration of all Turing reductions Φ, and We is the e-th c.e. set. Let a be the Turing degree of A. By the T -requirement, a ≤ x, by the Prequirements, A is simple, so it is not computable, i.e., a > 0, and by the N -requirements, x l ∪ a = deg(L) ∪ deg(A) = deg(L ⊕ A). Therefore the requirements are sufficient to prove the theorem. Let {Xs }s∈ω , {Ls }s∈ω be computable enumerations of X, L respectively. During the construction, the requirements may be divided into the positive requirements Pe , which attempt to put elements into A, and the negative requirements Ne , which attempt to keep elements out of A, i.e., impose an Arestraint function with priority Ne . The priority rank of the requirements is Ne < Pe < Ne+1 , for all e ∈ ω. First we introduce an easy method in Yates [1965] for constructing a c.e. set A which is computable in a given non-computable c.e. set B by enumerating an element x in A at some stage s only when B permits x in the sense that some element y < x appears in B at the same stage s. Proposition 1. If {As }s∈ω and {Bs }s∈ω are computable enumerations of c.e. sets A and B respectively, such that x ∈ As+1 −As implies (∃y < x)[y ∈ B −Bs ], then A ≤T B. Proof. To B-recursively compute whether x ∈ A, find a stage s such that Bs x = B x. Now x ∈ A if and only if x ∈ As . 2 The strategy for meeting the T -requirement is attached onto the positive requirements. When an element x is enumerated into A, it must satisfy that Xs+1 x = Xs x so that A ≤T X holds according to Proposition 1 (Soare [1987]). Note that this kind of x can always be found because X is not computable in L.
Extensions of Embeddings in the Computably Enumerable Degrees
207
The strategy for meeting a single requirement Pe is the same as for Post’s simple set. Intuitively, enumerate We until the first element > 2e appears in We simultaneously satisfying other conditions and put it into A. We now give a property of a low c.e. set, as found in Soare [1987]. Proposition 2. If L is a low set then ¯ ≤T ∅ . C = {j : (∃n ∈ Wj )[Dn ⊆ L]}
(1)
where Wj is the j-th c.e. set, and Dn is a finite set with canonical index n. L Proof. Clearly, C is 1 , so C ≤T L . If L is low then L ≤T ∅ , so C ≤T ∅ . 2 According to the Limit Lemma, let g(e, s) be a computable function such that lims g(e, s) is the characteristic function of C. Now we state the basic strategy for meeting a requirement Ne , without loss of generality, let A ⊆ 2ω, the even numbers, and L ⊆ 2ω + 1, the odd numbers. Note that A ⊕ L ≡T A ∪ L, so we use the latter from now on, i.e., Ne : X = Φe (L ∪ A). We follow some basic idea in Robinson [1971] of proving the Robinson Low Splitting Theorem which also can be found in Soare [1987]. Fix e and x. Intuitively, we use the lowness of L to help to “L-certify” a computation Φe ((L ∪ A) u; x)[s] where u = φe (L ∪ A; x)[s] is the use function of this computation as follows: ¯ s u. Enumerate n into a c.e. V that we shall build during Let Dn = L the construction. By the Recursion Theorem we may assume that we have in advance an index j such that V = Wj . Find the least t ≥ s such that either Dn ∩ Lt = ∅, in which case the computation is obvious disturbed, or g(j, t) = 1, in which case we “L-certify” the computation and guess that it is L-correct. It may happen that we were wrong and L u = Ls u, but this happens at most finitely often by Proposition 2 and the Limit Lemma. Since we are really using ¯ for the current Dn = L ¯ s u, it is g as an oracle to inquire whether Dn ⊆ L very important that there are no previous m ∈ Vs unless Dm ∩ Ls = ∅. Thus, whenever an L-certified computation first becomes A-invalid by At u = As u, we abandon the old c.e. set V and start with a new version of V and hence a new index j such that Wj = V . This L-certification process is best formalized by transforming the function Φe (L ∪ A; x) to a computable function Φˆe (L ∪ A; x). When we have fixed e, for notational convenience, we let ˆs (x) ↔ Φˆe (L ∪ A; x)[s], Φ and usx ↔ φe (L ∪ A; x)[s]. ˆs−1 (x) ↓ beIf Φˆs−1 (x) ↓ but Φˆs (x) ↑ we say that the (e, x)-computation Φ comes A-invalid if (∃z < us−1 x )[z ∈ As − As−1 ] and otherwise becomes L-invalid.
208
J. Zhao
Fix e, x and s, we define Φˆs (x) as follows: given At and Lt , t ≤ s and assume that Φe (L ∪ A; x)[s] ↓= y, and ¬(∃z < us−1 x )[z ∈ (As ∪ Ls ) − (As−1 ∪ Ls−1 )]. ¯ s usx . Enumerate n into Vse,x . Let v be the greatest stage less than Let Dn = L s at which an (e, x)-computation becomes A-invalid, and v = 0if no such stage exists. By the Recursion Theorem, choose j such that Wj = {Vte,x : t > v}. Find the least t ≥ s such that either Dn ∩ Lt = ∅,
(2)
g(j, t) = 1.
(3)
or ˆs (x) ↑. If the latter holds, define Φˆs (x) ↓= y. Otherwise, Φ We use a strategy which is similar to the Sacks’ preserving agreement strategy to meet a negative requirement. Here we want to construct a c.e A to meet a requirement of the form X = Φe (L ∪ A). During the construction we preserve agreement between X and Φe (L ∪ A). Sufficient preservation will guarantee that if X = Φe (L ∪ A), then in fact X ≤T L, contrary to hypothesis. As usual, we define the computable functions: (length f unction) ˆ l(e, s) = max{x : (∀y < x)[Xs (y) = Φˆs (y)]}, (restraint f ucntion) rˆ(e, s) = max{usx : x ≤ ˆl(e, s) & Φˆs (x) ↓}. We say that x injures Ne at stage s + 1 if x ∈ As+1 − As and x ≤ rˆ(e, s). Define the injury set for Ne , ˆ = {x : (∃s)[x ∈ As+1 − As & x ≤ rˆ(e, s)]}. (injury set) I(e) The positive requirements of course are never injured. 2.2
Construction and Verification
Proof of Theorem 1. Construction of A. Stage s = 0. Set A0 = ∅. Stage s + 1. Since As has already been defined, we can define, for all e, the length function ˆl(e, s) and restraint function rˆ(e, s). We say Pe requires attention at stage s + 1 if We,s ∩ As = ∅, Then find if ∃x,
x ∈ We,s ,
Extensions of Embeddings in the Computably Enumerable Degrees
209
x > 2e, Xs+1 x = Xs x, and (∀i ≤ e)[ˆ r (i, s) < x]. Choose the least i ≤ s such that Pi requires attention, and then enumerate the least such x into As+1 , and we say that Pi receives attention. Hence Wi,s ∩As+1 = ∅ and (∃x ∈ As+1 )[Xs+1 x = Xs x], so Pi is satisfied and never again requires attention. If i does not exist, do nothing. Let A = As . This ends the construction. To verify that the construction succeeds we must prove the following lemmas. ˆ is finite]. (Ne is injured at most finitely often.) Lemma 1. (∀e) [I(e) Proof. Note that once Pi receives attention, it will become satisfied and remain satisfied forever. Hence each Pi contributes at most one element to A, and Ne ˆ can be injured by Pi only if i < e. So |I(e)| ≤ e. 2 Lemma 2. (∀e) [X = Φe (A ∪ L)]. (Ne is met.) Proof. Assume for a contradiction that X = Φe (A ∪ L). Then lims ˆl(e, s) = ∞. By Lemma 1, choose s1 such that Ne is never injured after stage s1 . We shall show that X ≤T L contrary to hypothesis. To L-recursively compute X(p) for p ∈ ω, find some stage s > s1 such that ˆ l(e, s) > p and each computation Φˆs (x), s x ≤ p, is L-correct, namely, Ls ux = L usx . It follows by induction on t ≥ s that (∀t ≥ s)[ˆl(e, t) > p & rˆ(e, t) ≥ max{usx : x ≤ p}], (4) and hence that for all t ≥ s, Φˆt (p) = Φe (A ∪ L; p) = X(p). So X is computable in L. To prove (4), when t = s, clearly it is true. Assume that it holds for t. Then by the definition of rˆ(e, t) and s > s1 , for any x ≤ p, it ensures that (At+1 ∪ Lt+1 ) z = (At ∪ Lt ) z for all numbers z used in a computation Φˆt (x) ↓= y. Hence, Φˆt+1 (x) ↓= Φˆt (x) ↓= Xt (x). So ˆl(e, t + 1) > p unless Xt+1 (x) = Xt (x). But if Xt (x) = Xs (x) for some t ≥ s, since X is c.e., the disagreement Φˆt (x) ↓ = Xt (x) is preserved forever, so X(x) = ˆt (x) ↓= Φe (A ∪ L; x), contrary to the hypothesis X = Φe (A ∪ L). 2 Xt (x) = Φ Lemma 3. (∀e)[lims rˆ(e, s) exists and is finite].
210
J. Zhao
Proof. By Lemma 1, choose s1 such that Ne is never injured after stage s1 . By Lemma 2, choose p = (μx)[X(x) = Φe (A ∪ L; x)]. Choose s2 ≥ s1 sufficiently large so that for all s ≥ s2 , (∀x < p)[Φˆs (x) ↓= Φe (A ∪ L; x)], and (∀x ≤ p)[Xs (x) = X(x)]. Case 1. Φe (A ∪ L; p) ↓ = X(p). Choose s3 ≥ s2 such that for all s ≥ s3 , Φˆs (p) ↓= q = X(p). Hence, for all s ≥ s3 , ˆl(e, s) = ˆl(e, s3 ) and rˆ(e, s) = rˆ(e, s3 ). Case 2. Φe (A ∪ L; p) ↑. We shall find a stage v such that for all s ≥ v, Φˆs (p) ↑. Hence, for all s ≥ v, rˆ(e, s) = rˆ(e, v). Note that if Φˆs (p) ↓ for any s ≥ s2 then rˆ(e, s) ≥ usp , so by induction on t ≥ s, ˆt (p) = Φˆs (p) holds as long as it remains L-valid. Let s be the computation Φ the least t such that no (e, p)-computation becomes A-invalid at any stage ≥ t. By the Recursion Theorem, choose j such that Wj = {Vse,p : s ≥ s }. Since Φe (A ∪ L; p) ↑, any computation Φˆs (p), s ≥ s , becomes L-invalid at some stage t > s, at which time Dm Ct = ∅ for every m ∈ Vte,p . Hence, lims g(j, s) = 0 by (1). Choose v > s2 such that Φˆv (p) ↑ and g(j, s) = 0 for all s ≥ v. We claim that Φˆs (p) ↑ for all s ≥ v. Suppose s > v, Φˆs−1 (p) ↑ and Φˆs (p) ↓. Then we enumerate ¯ s us , and we choose the least t ≥ s satisfying (2) or n ∈ Vse,p , where Dn = L p ˆs (p) ↑. 2 (3). But (3) could not occur by the choice of v, so (2) occurs and Φ Lemma 4. (∀e) [We inf inite ⇒ We ∩ A = ∅]. (Pe is met, simultaneously, T is met.) Proof. By the above lemmas, for all i ≤ e, let rˆ(i) = lim rˆ(i, s) s
and ˆ R(e) = max{ˆ r (i) : i ≤ e}. Choose s0 such that (∀t ≥ s0 )(∀i ≤ e)[ˆ r(e, t) = rˆ(e)], and no Pi , i < e, receives attention after stage s0 . Now choose s ≥ s0 , if ∃x, x ∈ We,s , x > 2e, Xs+1 x = Xs x, and ˆ R(e) < x.
Extensions of Embeddings in the Computably Enumerable Degrees
211
Now either We,s ∩ As = ∅ or else Pe receives attention at stage s + 1, then in either case We,s ∩ As+1 = ∅, so Pe is met by the end of stage s + 1. And by Proposition 1, A ≤T X is obviously met. It is remarkable that searching for an x with the condition Xs+1 x = Xs x does not impact the requirement Pe . Suppose for the sake of contradiction that if We infinite, while A ∩ We = ∅, then A ⊆ W e , choose an increasing c.e. sequence ˆ for all stage s ≥ s0 . Choose of elements x1 < x2 < · · · in We such that x1 > R(e) sk minimal such that sk > s0 and xk ∈ We,sk . Now Xsk xk = X xk so X is 2 computable, contrary to X T L. Note that A¯ is infinite by the clause “x > 2e”. To see this, note that A contains at most e elements in {0, 1, . . . , 2e}, hence card(A¯ (2e + 1)) ≥ 2e + 1 − e = e + 1. A is simple. This ends the proof of Theorem 1.
References 1. Lachlan, A.H.: Lower bounds for pairs of recursively enumerable degrees. Proc. London Math. Soc. 16, 537–569 (1966) 2. Barmpalias, Cooper, Li, Xia, Yao, Super minimal pairs (in progress) 3. Yates, C.E.M.: Three theorems on the degrees of recursively enumerable sets. Duke Math. J.32, 461–468 (1965) 4. Miller, D.: High recursively enumerable degrees and the anti-cupping property. In: Lerman, Schmerl, and Soare, pp. 230–245 (1981) 5. Shoenfield, J.R.: Application of model theory to degrees of unsolvability. In: Addison, Henkin, and Tarski, pp. 359–363 (1965) 6. Soare, R.I.: Recursively Enumerable Sets and Degrees. Springer, Heidelberg (1987) 7. Robinson, R.W.: Interpolation and embedding in the recursively enumerable degrees. Ann. of Math. 2(93), 285–314 (1971) 8. Cooper, S.B.: Computability Theory. Chapman Hall/CRC Mathematics Series, vol. 26 (2003) 9. Slaman, T.A.: The recursively enumerable degrees as a substructure of the Δ02 degrees (1983) (unpublished notes) 10. Slaman, T.A., Soare, R.I.: Algebraic aspects of the computably enumerable degrees. Proceedings of the National Academy of Science, USA 92, 617–621 (1995) 11. Slaman, T.A., Soare, R.I.: Extension of embeddings in the computably enumerable degrees. Ann. of Math (2) 154(1), 1–43 (2001)
An Improved Parameterized Algorithm for a Generalized Matching Problem Jianxin Wang, Dan Ning, Qilong Feng, and Jianer Chen School of Information Science and Engineering, Central South University
[email protected]
Abstract. We study the parameterized complexity of a generalized matching problem, the P2 -packing problem. The problem is NP-hard and has been studied by a number of researchers. In this paper, we provide further study of the structures of the P2 -packing problem, and propose a new kernelization algorithm that produces a kernel of size 7k for the problem, improving the previous best kernel size 15k. The new kernelization leads to an improved algorithm for the problem with running time O∗ (24.142k ), improving the previous best algorithm of time O∗ (25.301k ).
1
Introduction
Packing problem has formed an important class of NP-hard problems. In particular, as one of the graph packing problem, the H-packing problem has gained more attention, which arises in applications such as scheduling, wireless sensor tracking, wiring-board design and code optimization, etc. The problem is defined as follows [1]. Definition 1. Given a graph G = (V, E) and a fixed graph H. An H-packing of G is a set of vertex disjoint subgraphs of G, each of which is isomorphic to H. From the optimization point of view, the problem of MAXIMUN H-packing is to find the maximum number of vertex disjoint copies of H in G. If the H is the complete graph K2 , the MAXIMUN H-packing becomes the familiar maximum matching problem in bipartite graph, which can be solved in polynomial time. When the graph H is a connected graph with at least three vertices, D. G. Kirkpatrick and P. Hell [2] gave that the problem is NP-complete. From the approximation point of view, V. Kann [3] proved that the MAXIMUN H-packing problem is MAX-SNP-complete. C. Hurkens and A. Schrijver [4] presented an approximation algorithm with ratio |VH |/2 + ε for any ε > 0.
This work is in part supported by the National Natural Science Foundation of China under Grant No. 60773111 and No. 60433020, Provincial Natural Science Foundation of Hunan (06JJ10009), the Program for New Century Excellent Talents in University No. NCET-05-0683 and the Program for Changjiang Scholars and Innovative Research Team in University No. IRT0661.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 212–222, 2008. c Springer-Verlag Berlin Heidelberg 2008
An Improved Parameterized Algorithm for a Generalized Matching Problem
213
Recently, parameterized complexity theory has been used to design efficient algorithms for H-packing problem. M. Fellows et. al. [5] proposed a parameterized algorithm with time complexity of O(2O(|H|k log k+k|H| log |H|) ) for any arbitrary graph H. For the edge disjoint triangle packing problem, L. Mathieson, E. Prieto and P. Shaw [6] proved that the problem has a 4k kernel and gave a 9k 9k parameterized algorithm of running time O(2 2 log k+ 2 ) based on the kernel. When H belongs to the restricted family of graphs K1,s , a star with s leaves, we can get the K1,s -packing problem, which is defined as follows: Definition 2. Parameterized K1,s -PACKING(k-K1,s -PACKING): Given a graph G = (V, E) and a positive integer k, whether there are at least k vertex disjoint K1,s in G? M. Fellows, E. Prieto and C. Sloper [7] gave that the parameterized K1,s -packing problem is fixed-parameter tractable and got a O(k 3 ) kernel. M. Fellows [7] et.al. also studied the P2 -packing problem, where P2 is a path of three vertices (one center vertex and two endpoints) and two edges, which is defined as follows: Definition 3. Parameterized P2 -PACKING (k-P2 -PACKING): Given a graph G = (V, E) and a positive integer k, whether there are at least k vertex disjoint P2 in G? In [7], for the P2 -packing problem, M. Fellows et.al. gave a kernel of size at most 15k and proposed an algorithm with time complexity O∗ (25.301k ). In this paper, we mainly focus on the kernelization of the k-P2 -packing problem and give a kernel of size at most 7k. Based on the kernel, we present a parameterized algorithm with time complexity O∗ (24.142k ), which greatly improves the current best result O∗ (25.301k ). This paper is organized as follows. In section 2, we introduce some related definitions and lemmas. In section 3, we present all the steps of the kernelization algorithm, and prove that the k-P2 -packing problem has a size of 7k kernel. In section 4, we give the general algorithm solving the k-P2 -packing. In section 5, we draw some final conclusions.
2
Related Definitions and Lemmas
We first give some concepts and terminology about graph [8]. Assume G = (V, E) denotes a simple, undirected, connected graph, where |V | = n. The neighbors of a vertex v are denoted as N (v). The induced subgraph of S ⊆ V is denoted G[S]. For an arbitrary subgraph H of G, let N (H) denote the vertices that are not in H but connect with at least one vertex in H. We use the simpler G\v to denote G[V \v] for a vertex v and G\e to denote G = (V, E\e) for an edge e. Likewise, G\V denotes G[V \V ] and G\E denotes G = (V, E\E ) where V is a set of vertices and E is a set of edges. For the convenience of description, we firstly introduce the definitions of ‘double crown’ decomposition and ‘fat crown’ decomposition [7].
214
J. Wang et al.
Definition 4. A double crown decomposition (H, C, R) in a graph G = (V, E) is a partitioning of the vertices of the graph into three sets H, C and R that have the following properties: (1) H (the head) is a separator in G such that there are no edges in G between vertices belonging to C and vertices belonging to R. (2) C = Cu ∪ Cm ∪ Cm2 (the crown) is an independent set in G. (3) |Cm | = |H|, |Cm2 | = |H| and there is a perfect matching between Cm and H, and a perfect matching between Cm2 and H. Definition 5. A fat crown decomposition (H, C, R) in a graph G = (V, E) is a partitioning of the vertices of the graph into three sets H, C and R that have the following properties: (1) H (the head) is a separator in G such that there are no edges in G between vertices belonging to C and vertices belonging to R. (2) G[C] is a forest where each component is isomorphic to K2 . (3) |C| ≥ |H|, and there is a perfect matching M between H and a subset of cardinality |H| in C, where one endpoint of each edge in M is in H, and the other is the endpoint of K2 in C. We introduce the following lemmas [7] about the ‘double crown’ decomposition and ‘fat crown’ decomposition that will be used in our algorithm. Lemma 1. A graph G = (V, E) that admits a ‘double crown’-decomposition (H, C, R) has a k-P2 -packing if and only if G\(H ∪C) has a (k−|H|)-P2 -packing. Lemma 2. A graph G = (V, E) that admits a ‘fat crown’-decomposition (H, C, R) has a k-P2 -packing if and only if G\(H ∪C) has a (k−|H|)-P2 -packing. Lemma 3. A graph G with an independent set I, where|I| ≥ 2|N (I)|, has a double crown decomposition (H, C, R), H ⊆ N (I), which can be constructed in linear time. Lemma 4. A graph G with a collection J of independent K2 s, where |J| ≥ |N (J)|, has a fat crown decomposition (H, C, R), H ⊆ N (J), which can be constructed in linear time.
3
Kernelization Algorithm for the k-P2 -Packing Problem
In this section we propose a kernelizaiton algorithm that can get a kernel of size at most 7k for the parameterized version of P2 -packing problem. Assume W denotes a maximal P2 -packing and the vertices in W are denoted by V (W ). Let W be {L1 , ..., Lt }, t ≤ k − 1, where each of Li (1 ≤ i ≤ t) is a subgraph in G that is isomorphic to P2 . Let Li be (e1 , c, e2 ), 1 ≤ i ≤ t, where e1 and e2 are two endpoints of Li , and c is the center vertex of Li . Therefore, each connected component of the graph induced by Q = V \V (W ) is either a single vertex or a single edge [7]. Let Q0 be the set of all vertices such that each vertex in Q0 makes a connected component of the graph induced by Q, and each vertex in Q0 will be called a Q0 -vertex. Let Q1 be the set of all edges such that each edge in Q1 makes a connected component of the graph induced by Q. Each edge in Q1 will be called a Q1 -edge and each vertex in Q1 will be called a Q1 -vertex.
An Improved Parameterized Algorithm for a Generalized Matching Problem
3.1
215
RPLW Algorithm
Based on the kernelization algorithm given in [7], the kernelization process we propose is to apply the algorithm RPLW repeatedly. By using the kernelization algorithm in [7], we can get a graph G, which consists of a maximal packing W and Q = V \V (W ). The algorithm RPLW is to further reduce the vertices in W and Q to get a better kernel, whose general idea is given in the following: Algorithm RPLW deals with the Q0 -vertices and Q1 -edges in Q. When the size of W is not changed, the algorithm aims at reducing the number of Q0 vertices in Q. When the number of Q1 -edges in Q is reduced, the size of W becomes larger (the number of disjoint P2 in W is increased) and the algorithm returns the larger W . Then we call the algorithm for the larger W . If the ‘double crown’ decomposition or the ‘fat crown’ decomposition is found, the parameter k becomes smaller and the algorithm returns the smaller parameter. Then we call the algorithm for the smaller parameter. For the convenience of analyzing the RPLW algorithm, we first discuss the following two structures as shown in Fig.1 and Fig.2, which use solid circles and thick lines for vertices and edges in the maximal P2 -packing W , and use hollow circles and thin lines for vertices and edges not in W . (In particular, thin lines that connect two hollow circles are Q1 -edge.) q1
t1
q1
q2
c
t2
t1
c
q2
q1
q2
t2
t1
c
t2
q2
q1
t1
c
t2
(a) (b)
Fig. 1. Reduce the number of Q0 -vertex
The general idea of Fig.1 is that: In order to decrease the number of Q0 vertices in Q, replace the Li in W . The specific process is as follows. In Fig.1(a), assume the P2 is the Li in W whose center vertex is c and two endpoints are t1 , t2 . Q0 -vertex q2 is adjacent to c, and Q0 -vertex q1 is adjacent to t1 . Vertices q2 , c and t2 can form a new P2 . Let the new P2 be Li . If Li is replaced by Li in W , the number of Q0 -vertices in Q is just reduced by 2 (q1 and t1 form a Q1 -edge in Q). In Fig.1(b), assume the P2 is the Li in W whose center vertex is c and two endpoints are t1 , t2 . Q0 -vertex q2 is adjacent to t2 , and Q0 -vertex q1 is adjacent to t1 . Vertices q1 t1 and c can form a new P2 . Let the new P2 be Li . If Li is replaced by Li in W , the number of Q0 -vertices in Q is just reduced by 2 (q2 and t2 form a Q1 -edge in Q). The general idea of Fig.2 is that: In order to decrease the number of Q1 -edges in Q and increase the number of disjoint P2 in W , replace Li in W . The specific process is as follows.
216
J. Wang et al.
e1
e2
t1
c
e3
e4
t2
e1
t1
e2
e3
c
e4
t2
e1 e2 e3
t1
c
e4
e1 e2 e3 e4
t2
(a)
t1
c
t2
(b)
Fig. 2. Reduce the number of Q1 -edge
In Fig.2(a), assume the P2 is the Li in W whose center vertex is c and two endpoints are t1 , t2 . Q1 -edge (e1 , e2 ) is adjacent to t1 , and Q1 -edge (e3 , e4 ) is adjacent to c. Vertices e1 , e2 and t1 can form a new P2 , which can be denoted as Li . Vertices e3 , e4 and c can also form a new P2 , which is denoted as Li . If Li is replaced by Li and Li in W , the number of Q1 -edges in Q is just reduced by 1 and the number of P2 in W is increased by 1. In Fig.2(b), assume the P2 is the Li in W whose center vertex is c and two endpoints are t1 , t2 . Q1 -edge (e1 , e2 ) is adjacent to t1 , and Q1 -edge (e3 , e4 ) is adjacent to t2 . Vertices e1 , e2 and t1 can form a new P2 which is denoted as Li . Vertices e3 , e4 and t2 can also form a new P2 which is denoted as Li . If Li is replaced by Li and Li in W , the number of Q1 -edges in Q is just reduced by 1and the number of P2 in W are increased by 1. Fig.1 and Fig.2 vividly illustrate how to replace a Li in W to change the number of Q0 -vertices and Q1 -edges. According to Fig.1 and Fig.2, we can obtain the following rules. Rule1. If a Li in W has two vertices that each is adjacent to a different Q0 vertex, then apply the processes described in Fig.1 to decrease the number of Q0 -vertices by 2 (and increase the number of Q1 -edges by 1). Rule2. If a Li in W has two vertices that each is adjacent to a different Q1 -edge, then apply the processes described in Fig.2 to decrease the number of Q1 -edges by 1(and increase the size of the maximal P2 -packing by 1). The RPLW algorithm tries to reduce the number of Q0 -vertices and the number of Q1 -edges by applying Rule1 and Rule2 consecutively. Note that these rules cannot be applied forever. As shown in Fig.3, the while-loop in step1 of the algorithm tries to reduce the number of Q0 -vertices in Q. Because the number of vertices in input graph G is limited (at most 15k [7]) and each applications of Rule1 reduces the number of Q0 -vertices by 2, the number of consecutive applications of Rule1 is bonded by 7.5k . During the applications of these rules, the resulting W may becomes non-maximal. In this cases, we simply first make W maximal again, using any proper greedy algorithm in step2 of the algorithm before we further apply the rules. Thus, the P2 founded in Q can be put into W to make W larger. Assume the larger packing is W , then call the algorithm for W . During the process of the replacement of Q1 -edges in step3 of the algorithm, since each application of Rule2 increases the number of P2 in W by 1, the
An Improved Parameterized Algorithm for a Generalized Matching Problem
217
Algorithm RPLW Input: G, W , k Output: a maximal P2 -packing W and |W | > |W |, or a smaller parameter k and |k | < |k|, or a reduced graph G 1. while W is a maximal P2 -packing and a P2 in W has two vertices that each is adjacent to two different Q0 -vertices do apply Rule1 to replace W by a packing of the same size with reduced Q0 -vertices; 2. if W is not maximal then use greedy algorithm to construct a larger P2 -packing W ; return (G, W , k). 3. if two Q1 -edges are adjacent to two different vertices on a P2 in W then apply Rule2 to obtain a larger P2 -packing W ; return (G, W , k). 4. if |Q0 | ≥ 2|W | then construct a double crown decompositon (H, C, R), then k = k − |H|; return (G, W, k ). 5. if |Q1 | ≥ |W | then construct a fat crown decompositon (H, C, R), then k = k − |H|; return (G, W, k ). 6. Assume the reduced graph is G , return (G , W, k).
Fig. 3. RPLW algorithm
total number of applications of Rule2 is bounded by k. Step4 and step5 of the algorithm aim at finding ‘double crown’ and ‘fat crown’ in G induced by the replacement of Q0 -vertices and Q1 -edges. Once ‘double crown’ or ‘fat crown’ is found, the parameter k must be reduced (k = k − |H|). For completeness, we verify the algorithm’s correctness, and analyze its precise complexity. Lemma 5. Repeatedly calling the algorithm RPLW will either find a k-P2 packing or reduce the size of G, and those can be done in O(k 3 ). Proof. From step2 and step3 of the RPLW algorithm, it can be seen that the number of disjoint P2 in W is increased by the replacement of Q0 -vertices and Q1 -edges. By calling the algorithm repeatedly, when the number of disjoint P2 in W is k, a k-P2 -packing is found in graph G. Because of the replacements in step4 and step5 of the algorithm, ‘double crown’-decomposition or ‘fat crown’decomposition will be found in G. Therefore, the parameter k is decreased and the number of disjoint P2 needed to be found is also decreased. By calling the algorithm repeatedly, when the parameter k is reduced to 0, a k-P2 -packing is found in graph G. On the other hand, because the replacement in step1-3 of the algorithm limits the number of Q0 -vertices and Q1 -edges, and some vertices are removed by the ‘double crown’-decomposition or ‘fat crown’-decomposition in
218
J. Wang et al.
step4-5 of the algorithm, the size of G will be reduced. Thus, if a k-P2 -packing is not found in the algorithm, the algorithm returns a reduced G . At last, we analyze the time complexity of those whole process. Calling the RPLW algorithm repeatedly is to apply Rule1 and Rule2 consecutively, which can be finished in polynomial time. The number of consecutive applications of Rule1 is bonded by 7.5k, and the total number of applications of Rule2 is bounded by k. When ‘double crown’-decomposition or ‘fat crown’-decomposition is applicable, the parameter k is reduced accordingly. The ‘double crown’-decomposition or ‘fat crown’-decomposition can be founded in O(k 2 ) [7]. The algorithm must return when the number of disjoint P2 in W is increased or the parameter k is reduced, and the algorithm is called again for the larger packing W or the smaller parameter k . If a k-P2 -packing is not found in the algorithm, the algorithm returns a reduced G . Therefore, the algorithm is called at most 9.5k times. In consequence, the whole process can be computed in O(k 3 ) time. Our kernelization process is to apply the RPLW algorithm repeatedly, i.e, to apply Rule1 and Rule2 repeatedly by starting with a maximal P2 -packing. The whole process can finish in polynomial time. The kernelization will either find a k-P2 -packing or reduce the size of G until Rule1 and Rule2 are not applicable. The reduced G can be considered as a kernel of the k-P2 -packing problem. Note that when Rule1 and Rule2 are not applicable, the maximal P2 -packing W (for each Li in W ) has the following properties: Property 1. If more than one Q0 -vertices are adjacent to Li , then all these Q0 -vertices must be adjacent to the same (and unique) vertex in Li . Property 2. If more than one vertex in Li are adjacent to Q0 -vertices, then all these vertices in Li must be adjacent to the same (and unique) Q0 -vertex. Property 3. If more than one Q1 -edges are adjacent to Li , then all these Q1 edges must be adjacent to the same (and unique) vertex in Li . Property 4. If more than one vertex in Li are adjacent to Q1 -edges, then all these vertices in Li must be adjacent to the same (and unique) Q1 -edge. 3.2
A Smaller Kernel
In the following, we first analyze the number of Q0 -vertices and Q1 -edges in Q after the kernelization. Then we will present how the kernel of size at most 7k is obtained for k-P2 -packing problem. We first analyze the number of Q0 -vertices in Q. Theorem 1. The number of Q0 -vertices is bounded by 2(k − 1), that is, |Q0 vertex| ≤ 2(k−1), or else we can find a double crown decomposition in polynomial time. Proof. When Rule1 and Rule2 are not applicable, let W be the maximal packing W = L1 , · · · , Lt , t ≤ k − 1, which is a collection of disjoint P2 . We partition the disjoint P2 in W into two groups: {L1 , · · · , Ld }, {Ld+1, · · · , Lt }, which satisfy
An Improved Parameterized Algorithm for a Generalized Matching Problem
219
the following property: for each Li , 1 ≤ i ≤ d, each Q0 -vertex adjacent to Li can be adjacent to more than one vertex in Li , and we denote these Q0 -vertex as Q0i (1 ≤ i ≤ d); for each Lj , j > d, each Q0 -vertex adjacent to Lj is at most adjacent to one vertex in Lj . Consider the vertex set Q0 -vertex=Q0-vertex−{Q01, · · · , Q0d }, and let W = {v1 , · · · , vs } be the set of vertices in Ld+1 ∪ · · · ∪ Lt such that each vertex in W has neighbor in Q0 -vertex. By the above partition property, each Lj (j > d) has at most one vertex in W . Thus, s ≤ t − d. Moreover, by Property2, no vertex in Q0 -vertex is adjacent to any Li . Therefore, each vertex in Q0 -vertex has all its neighbors in W , that is, W = N (Q0 -vertex). Assume the total number of vertices in Q0 -vertex is p. If p > 2s, there is a ‘double crown’-decomposition in the input graph (note that the set of Q0 vertex is an independent set). We can call the RPLW algorithm again, which contradicts that Rule1 and Rule2 are not applicable. On the other hand, if p ≤ 2s, the total number of Q0 -vertices in the graph is that: |Q0 -vertex| = |Q0 vertex| + d = p + d ≤ 2s + d ≤ 2(s + d) ≤ 2t ≤ 2(k − 1). This completes the proof. In the following, we analyze the number of Q1 -vertices in Q. Theorem 2. The number of Q1 -vertices is bounded by 2(k − 1), that is, |Q1 edge| ≤ k − 1, or else we can find a double crown decomposition in polynomial time. Proof. When Rule1 and Rule2 are not applicable, let W be the maximal packing W = L1 , · · · , Lt , t ≤ k − 1, which is a collection of disjoint P2 . We partition the disjoint P2 in W into two groups: {L1 , · · · , Ld }, {Ld+1, · · · , Lt }, which satisfy the following property: for each Li , 1 ≤ i ≤ d, each Q1 -edge adjacent to Li can be adjacent to more than one vertex in Li , and we denote these Q1 -edges as Q1i (1 ≤ i ≤ d); for each Lj , j > d, each Q1 -edge adjacent to Lj is at most adjacent to one vertex in Lj . Consider the vertex set Q1 -edge=Q1 -edge−{Q11, · · · , Q1d }, and let W = {v1 , · · · , vs } be the set of vertices in Ld+1 ∪ · · · ∪ Lt such that each vertex in W has neighbors in Q1 -vertex. By the above partition property, each Lj (j > d) has at most one vertex in W . Thus, s ≤ t − d. Moreover, by Property4, no vertex in Q1 -edge is adjacent to any Li , 1 ≤ i ≤ d. Therefore, each vertex in Q1 -edge has all its neighbors in W , that is, W = N (Q1 -edge). Assume the total number of edges in Q1 -edge is p. If p > s, there is a ‘fat crown’-decomposition in the input graph (note that the set of Q1 -edge is an independent set of K2 ). We can call the RPLW algorithm again, which contradicts that Rule1 and Rule2 are not applicable. On the other hand, if p ≤ s, the total number of Q1 -edges in the graph is that: |Q1 -edge| = |Q1 edge| + d = p + d ≤ s + d ≤ t ≤ k − 1. Each Q1 -edge has two Q1 -vertices, therefore, the number of Q1 -vertices is bounded by |Q1 -vertex| ≤ 2(k − 1). This completes the proof. Based on theorem 1 and theorem 2, we can get the following theorem.
220
J. Wang et al.
Theorem 3. The k-P2 -packing problem has a kernel of size at most 7k − 7. Proof. By applying the RPLW algorithm repeatedly until the two rules are not applicable. The vertices in G consist of the vertices in W and Q. Assume V (G) denotes the vertices in G. The vertices in Q contains only Q0 -vertices and Q1 vertices. By Theorem1, we can get that |Q0 -vertex| ≤ 2(k − 1). By Theorem2, we can get that |Q1 -vertex| ≤ 2(k − 1). Since |V (W )| ≤ 3(k − 1), thus, |V (G)| = |V (W )| + |Q0 | + |Q1 | ≤ 3(k − 1) + 2(k − 1) + 2(k − 1) = 7k − 7. Therefore, the k-P2 -packing problem has a kernel of size at most 7k − 7.
4
The Improved Parameterized Algorithm
For the k-P2 -packing problem, we proposed an improved parameterized algorithm based on the 7k kernel. We first apply the kernelizaiton algorithm to obtain a kernel for the problem. Since each P2 has a center vertex, in order to find k vertex disjoint P2 , we just need to find k center vertices in brute force manner on the 7k kernel. The specific algorithm is given in figure 4.
Algorithm KPPW Input: G = (V, E) Output: a k-P2 -packing in G, or can not find a k-P2 -packing in G 1. compute a maximal P2 -packing W with a greedy algorithm; 2. apply the RPLW(G, W, k) until rule1 and rule2 are not applicable; 3. if |V (G)| > 7k then report “there exists a k-P2 -packing” and stop; 4. find all possible subsets C of size k in reduced G; 5. for each C do 6. for each vertex v in C, produce a copy vertex v ; 7. Construct a bipartite graph G = (V1 ∪ V2 , E) in the following way: the edges connecting to v are also connected to v . The k vertices and its copy vertices are put into V1 , and the neighbors of the k vertices in C are put into V2 ; 8. use Maximum bipartite matching algorithm to find the k center vertices; 9. if all the vertices on V1 are matched then report “there exists a k-P2 -packing in G” and stop; 10. report “there is no a k-P2 -packing in G” and stop;
Fig. 4. KPPW agorithm
Theorem 4. If there exists a k-P2 -packing, the KPPW algorithm will find the k-P2 -packing in time O∗ (24.142k ). Proof. It can be seen from the algorithm, the step2 is the whole kernelization process applying the RPLW algorithm repeatedly. As a result, we can obtain
An Improved Parameterized Algorithm for a Generalized Matching Problem
221
a kernel of size at most 7k. We check the number of vertices in reduced G in Step3. If the number of vertices is more than 7k, there must be a k-P2 -packing in G, and the KPPW algorithm does not need to run. We apply a straightforward brute-force method on the kernel to find the optimal solution from Step 4 to Step 8. The general idea is as follows: We find all possible subsets C of size k in reduced graph G. For each C, we will construct a bipartite graph G = (V1 ∪ V2 , E) in the following way: first, for each vertex v in C, we produce a copy vertex v with the property that the edges connecting to v are also connected to v . The k vertices and its copy vertices are put into V1 , and the neighbors of the k vertices in C are put into V2 . If all the vertices on the V1 are matched by a maximum bipartite matching, the k vertices in C must be the center vertices of a k-P2 -packing, therefore, report “there exists a k-P2 -packing in G” , and the algorithm stops. If for all C, we cannot find k center vertices, report “there does not exist a k-P2 -packing in G”, and the algorithm stops. In the following, we analyze the time complexity of algorithm KPPW. Step1: Using greedy algorithm to find a maximal packing can be done in time O(|E|). Step2: The kernelization process given in [7] can be done in O(n3 ) time, and the whole process that call the algorithm RPLW repeatedly until Rule1 and Rule2 are not applicable runs in O(k 3 ) time, therefore, the time complexity of Step2 is O(n3 + k 3 ). Step3: Obviously, the running time of Step 3 is linear in the size of V . Step4-Step8: We find the center vertices of the P2 -packing in a brute force manner, which has 7k k enumerations. By Stirling’s formula, this is bounded by 24.142k . We construct a bipartite graph G = (V1 ∪V2 , E) with k vertices in C and its copy vertices in V1 , and the neighbors of k vertices in V2 . Thus, the original question is transformed to find the maximum matching2.5problem in bipartite G which can be solved in time O( |V1 + V2 ||E|) = O(k ). Therefore, the total running time of Step4-Step8 is O(24.142k k 2.5 ). As a result, the total running time of algorithm KPPW is bounded by O(|E|+ |n3 + k 3 + |V | + k + 24.142k k 2.5 ) = O∗ (24.142k ).
5
Conclusions
In this paper, we mainly focus on the kernelization for the k-P2 -packing problem. We give further structure analysis of the problem, and propose a kernelization algorithm obtaining a kernel of size at most 7k. Comparing with the kerneliztion given in [7], our algorithm makes further optimization on the vertices of any P2 in W and their Q0 -vertex neighbors and Q1 -edge neighbors, which reduces the number of Q0 -vertices and Q1 -edges in Q. Based on the 7k kernel, we also present an improved parameterized algorithm with time complexity O∗ (24.142k ), which greatly improves the current best result O∗ (25.301k ).
222
J. Wang et al.
References 1. Hell, P.: Graph Packings. Electronic Notes in Discrete Mathematics 5 (2000) 2. Kirkpatrick, D.G., Hell, P.: On the complexity of general graph factor problems. SIAM J. Comput. 12, 601–609 (1983) 3. Kann, V.: Maximum bounded H-matching is MAX-SNP-complete. J. nform. Process. Lett. 49, 309–318 (1994) 4. Hurkens, C., Schrijver, A.: On the size of systems of sets every t of which have an SDR, with application to worst case ratio of Heuristics for packing problems. ISIAM J. Discrete Math. 2, 68–72 (1989) 5. Fellows, M., Heggernes, P., Rosamond, F., Sloper, C., Telle, J.A.: Exact algorithms for finding k disjoint triangles in an arbitrary graph. In: Hromkoviˇc, J., Nagl, M., Westfechtel, B. (eds.) WG 2004. LNCS, vol. 3353, pp. 235–244. Springer, Heidelberg (2004) 6. Mathieson, L., Prieto, E., Shaw, P.: Packing edge disjoint triangles: a parameterized view. In: Downey, R.G., Fellows, M.R., Dehne, F. (eds.) IWPEC 2004. LNCS, vol. 3162, pp. 127–137. Springer, Heidelberg (2004) 7. Prieto, E., Sloper, C.: Look at the stars. Theoretical Computer Science 351, 437–445 (2006) 8. Chen, J.: Parameterized computation and complexity: a new approach dealing with NP-hardness. Journal of Computer Science and Technology 20, 18–37 (2005)
Deterministic Hot-Potato Permutation Routing on the Mesh and the Torus Andre Osterloh FernUniversit¨ at in Hagen Germany
[email protected]
Abstract. In this paper we consider deterministic hot-potato routing algorithms on n × n meshes and tori. We present algorithms for the permutation routing problem on these networks and achieve new upper bounds. The basic ideas used in the presented algorithms are sorting, packet concentration and fast algorithms for one-dimensional submeshes. Using this ideas we solve the permutation routing problem in 3.25n+o(n) steps on an n × n mesh and in 2.75n + o(n) steps on an n × n torus.
1
Introduction
This paper studies routing in a synchronous network with bidirectional links in which at most one packet is able to traverse any link in each time step and direction. We consider two of the most studied network with a fixed interconnection, the two-dimensional n × n mesh and torus. The problem of routing packets through a network is fundamental to the study of parallel computation. One of the best studied routing problems is the problem where each processor is source and destination of one packet, the permutation routing problem. In the literature many different approaches to solve permutation routing problems have been studied, e.g. adaptive, oblivious, cutthrough, wormhole, fault-tolerant, local, etc. [8]. Here we consider a routing strategy known as hot-potato or deflection routing. In hot-potato routing no buffers are used for storing packets. In this routing strategy in each step and in each processor all incoming packets, unless they have reached their destination, have to leave the processor in the following step (see Figure 1). Hot-potato algorithms are attractive for practical applications because they tend to be simple and due to the lack of buffers they have efficient hardware realizations. They have been observed to work well in practice and have been used successfully on several parallel machines or optical networks [10, 18, 19, 1, 9]. Although hot-potato routing algorithms are rather simple they are very hard to analyze. Compared to other routing strategies the gap between the known upper and lower bounds for hot-potato routing algorithms on meshes (and tori) is large. To give an example, for greedy1 hot-potato routing no asymptotically 1
In greedy hot-potato routing a packet has to use an outgoing link in the direction of its destination whenever it is possible.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 223–233, 2008. c Springer-Verlag Berlin Heidelberg 2008
224
A. Osterloh
A
B A
D
B D
B
C
D
C A
C step t−1
routing decision
step t
Fig. 1. One step in a processor in hot-potato routing
optimal routing algorithm is known. Nearly all fast routing algorithms for the mesh and torus use sorting to achieve their bounds, e.g. see [8, 11, 17]. In the case of hot-potato routing it is surprising that sorting can be used. Newman and Schuster in [16] introduced it for hot-potato routing and we use a variant of their technique here. Another well known technique to achieve fast routing algorithms is to concentrate packets in a smaller subnetwork and to solve the problem in this subnetwork. In combination with hot-potato routing the concentration technique was not used before. Since in hot-potato routing no buffers are allowed to store packets the possibilities to concentrate packets are very limited. In the algorithm we concentrate the packets in a subnetwork with the half number of nodes, so there are two packets on each node in a step. This can be done without buffers. In routing algorithms for two-dimensional meshes or tori, one-dimensional routing is often used as a subroutine. We use one-dimensional routing where each processor is source and destination of at most two packets at several points of algorithm. Hence we analyze one-dimensional routing and achieve new upper bounds for two-dimensional hot-potato routing on meshes and tori. The rest of the paper is organized as follows. In Section 2 we present related work. In Section 3 we give some necessary definitions. In Section 4 we analyze one-dimensional routing with the farthest destination first strategy and give some simple results. In Section 5 we present and analyze our 3.25n+o(n) (2.75n+o(n)) step permutation routing algorithms for the mesh (torus). In Section 6 we give a short conclusion.
2
Related Work
Several randomized or deterministic hot-potato routing algorithms have been designed for the n × n-mesh or torus [2, 3, 4, 5, 6, 7, 11, 14, 16]. In [2] BenAroya et al. provide a deterministic greedy hot-potato routing algorithm for the n × n mesh that delivers a batch of l packets in 2(l − 1) + dmax steps, where dmax is the maximal initial soure to destination distance of a packet. In their batch problems a processor is source of at most four and destination of at most l
Deterministic Hot-Potato Permutation Routing
225
packets. Since l = n2 in permutation routing this results an O(n2 ) √ bound for the permutation routing problem. In [4] Ben-Dor et al. achieve a O(n l) bound for the batch problem in an n × n mesh with the help of potential function analysis. For the two-dimensional mesh, Kunde in [14] presents the first deterministic 2 greedy hot-potato routing algorithm √ with a o(n ) bound for permutation routing. Kunde achieves a bound of O(n n log n) steps. In [5] Busch et al. present a randomized greedy hot-potato routing algorithm for the n × n mesh that solves any permutation routing problem in O(n ln n) and batch routing problems in O(m ln n) steps with high probability, where m = min{mr , mc } ∈ Θ(n) and mr (mc ) is the maximum number of packets targeted to a single row (column). In [6] the same authors study batch routing where each processor is source of at most one and destination of at most 4n2 packets. Their algorithm needs O(LB ·log3 n) steps with high probability, where LB ∈ Ω(n) is a lower bound based on dmax and the maximum congestion of an instance. For non-greedy hot-potato routing, Feige and Raghavan [7] present a randomized algorithm that routes any random destination problem in 2n + O(ln n) steps with high probability. They also solve any permutation routing problem in 9n steps with high probability. Newman and Schuster [16] give a deterministic non-greedy hot-potato routing algorithm for permutation routing that needs 7n + o(n) steps on the mesh and 4n + o(n) steps on the torus. The algorithms for the mesh and torus in [16] use sorting to solve the routing problem. In [11] Kaufmann et al. improve this result to 3.5n + o(n). Hence 3.5n + o(n) was the so far best known bound for deterministic hot-potato routing on an n × n mesh or torus. For non hot-potato routing several optimal 2n − 2 step algorithms for the permutation routing problem are known [8].
3
Basic Definitions
In an n × n mesh each processor (node) is given as a pair (r, c), 1 ≤ r, c ≤ n. A processor (r, c) lies in row r and column c and is connected to at most four adjacent nodes by bidirectional links. In a mesh a processor (r, c) is connected with (r , c ) iff |r − r | + |c − c | = 1, 1 ≤ r, r , c, c ≤ n. The diameter of a mesh is 2n − 2 and the bisection width is n. We call a node (r, c) where r + c is even a black node and a node where r + c is odd a white node. Packets that are initially on a black (white) node are called black (white) packets. In hotpotato routing white and black packets never meet in any node. We denote the (at most) four incoming and outgoing links of a processor by 1, 2, 3 or 4 (see Figure 2). A torus is a mesh with wrap-around links. Node (i, 1) is connected with (i, n) and node (1, j) is connected with (n, j), 1 ≤ i, j ≤ n. The advantage of a torus is that all nodes can be treated in the same way since no border or corner nodes exists. The diameter of a torus is half the diameter of the corresponding mesh and its bisection width is twice as large.
226
A. Osterloh
outgoing links: 1−link 4−link
2−link 3−link
incoming links: 1−link 4−link
2−link 3−link
Fig. 2. A node (r, c) in an n × n mesh or torus. In the case of the mesh 1 < r, c < n
4
Basic Problems
First we consider the following problem in row i, 1 < i ≤ n, of the mesh2 . We call the problem 2-2 basic routing. We want to route up to n packets in row i using hot-potato routing. Initially there are at most two packets on black (white) nodes and no packets on white (black) nodes. Each node is destination of at most two packets. During the routing the packets are allowed to use the 2-links and 4-links in row i and the 1-links and 3-links between row i and row i − 1. We assume that a packet is taken out of the mesh when it arrives at its destination. For a problem instance R of 2-2 basic routing we define rR (k, l) (lR (l, k)) as the number of packets that have to pass from left to right (right to left) node (i, k) and node (i, l). We route the packets greedily to their destination using the farthest destination first strategy. In the case that during routing two packets in a node want to use the same 2-link or 4-link, the packet with lower priority uses the 1-link and go to row i − 1. In the next step it uses the 3-link to come back into the node in row i. If a packet uses neighboring nodes to wait, we say that the packet vibrates. The following holds (for a similar result for (non hot-potato) routing see Lemma 3.1 in [12]): Lemma 1. Solving a 2-2 basic routing problem R in row i, using the farthest destination first strategy, takes at most maxk,l∈X {2(max{rR (k, l), lR (l, k)}−1)+ l − k} steps, where X = {(k, l) | k < l ∧ (rR (k, l) = 0 ∨ lR (l, k) = 0)}. Proof. In this proof we restrict our attention to packets traveling from left to right. For the other packets the bound could be proved analogously. We show the bound by proving 2
Row i consists of processors (i, x), 1 ≤ x ≤ n.
Deterministic Hot-Potato Permutation Routing
227
∀n > 0. n = maxk
fR (kmax , lmax ) for a k < kmax ). If there are one or two packets from M in kmax , we get fR (kmax , lmax ) = n − 1, since rR (kmax , lmax ) = rR (kmax , lmax )−1. If there are two packets in kmax from which only one is in M , the packet in M has higher priority because the destination of the other packet is smaller than lmax . So fR (kmax , lmax ) = n − 1 also for this case. Now assume that there are a, b such that fR (a, b) = n. No packet of MR (a, b) has passed column a in the two steps otherwise fR (a, b) > n + 1. So there are two possibilities: (1) There is no packets in a in instance R or (2) all packets in a have a destination column < b. In both cases we get MR (a−1, b) = MR (a, b) and fR (a − 1, b) ≥ fR (a, b) + 1 = n + 1. So we get the maximum of n + 1 for columns a − 1 and b. For a maximum we know that a packet of MR (a − 1, b) leaves a − 1 in the first step and hence leaves a in the second step. A contradiction to (1) and (2). • The result of Lemma 1 holds also for row 1. In row 1 the packets vibrate using row 2. Since the vibration of packets in different rows and the vibration of black and white packets to do not interfere we get: Lemma 2. The 2-2 basic routing problem can be solved simultaneously for all n rows and black and white packets in at most maxk
228
A. Osterloh
in a row i > 1 use row i − 1. Now we want to note that it is possible to sort an n × n mesh in O(n) steps. To do so a variant of the sorting algorithm presented in [15] or [17] can be used. Lemma 4. Let each black (white) node of the n × n mesh have two packets. There is a hot-potato algorithm that sorts simultaneously black and white packets in snake-like column-major order in O(n) time. (Finally the black (white) packets are in black (white) nodes.) Of course the result of Lemma 4 also holds for column-major, row-major snakelike row-major and similar indexing schemes. For the algorithm presented in the next section it is sufficient to sort an n3/4 × n3/4 mesh in o(n) steps. For this purpose also algorithms like Shearsort which are not asymptotically optimal can be used. The benefit of Shearsort is that it is simple and easy to implement.
5
Permutation Routing on the Mesh and Torus
We begin with a rough version of the algorithm and give a more detailed version of step 2 and step 3 later. The mesh is devided in four submeshes of n × n/4 nodes. We call these submeshes A, B, C, and D (see Figure 3). In step 1 the packets are concentrated in the middle n/2 columns of the mesh (in BC). In step 2 the algorithm uses a kind of 2-2 sorting to solve the routing problem. The packets are sorted according to their destination column. To save time we do not totally sort the packets and stop after 3n/2 + o(n) steps. In step 3 the packets are transported to their final destination. Algorithm Permutation Routing (1) Move the packets n/4 steps horizontally from submesh A to submesh B and from submesh D to submesh C. (n/4 steps) (2) In the n/2×n mesh BC partially sort the packets according to the destination column of a packet. (3n/2 + o(n) steps) (3) Route packets to their destination: (3a) Horizontally to their destination column. (n/2 + o(n) steps) (3b) Vertically within the destination column. (n + o(n) steps) Lemma 5. Algorithm Permutation Routing on a n × n mesh needs 3.25n + o(n) steps. Since black and white packets never interact we can restrict our attention to black packets. All the following results and proofs are given only for black packets. They also hold for white packets. Details of Step 2. Sorting in BC. In the following we use column-major and snake-like column-major in the usual sense and assume that n/4, n1/4 /4 are integer3 . The algorithm is based on Schnorr/Shamir’s algorithm from [17]. We divide BC into blocks of size 2n3/4 ×
Deterministic Hot-Potato Permutation Routing a horizontal slice of blocks 1/4 consists of n /4 blocks
n/4
A
B
229
C
D
n/4
n/2
2n3/4
a block (b)
n/4
2n3/4 (a)
Fig. 3. Division of the Mesh for Permutation Routing
2n3/4 . We call a row of n1/4 /4 blocks a horizontal slice and a column of n1/4 /2 blocks a vertical slice. BC consists of n1/4 /2 horizontal and n1/4 /4 vertical slices. Algorithm Partially Sorting (1) Sort each block in snake-like column-major order. (o(n) steps) (2) Perform an n1/4 /2-way unshuffle of the rows. (The packets move within the vertical slices). (n steps) (3) Sort each block into snake-like column-major order. (o(n) steps) (4) Sort each row in linear order. (n/2 steps) (5) Collectively sort blocks 1 and 2, blocks 3 and 4, blocks 5 and 6, etc. of each horizontal slice into column-major order. (o(n) steps) (6) Collectively sort blocks 2 and 3, blocks 4 and 5, blocks 6 and 7, etc. of each horizontal slice into column-major order. (o(n) steps) In the following proofs we use the 0-1 principle([13]) several times. For k, 1 ≤ k ≤ n, we mark all packets with a destination column < k as 0s, all packets with destination column k with 1s and all packets with destination column > k with 2s. Lemma 6. After step 6 of algorithm Partially Sorting the following holds: (0) In any horizontal slice there are at most n3/4 + n1/2 /4 1s. (1) There are at most 3 neighboring columns in BC that contain black 1s. (2) There are at most 2 black packets in each row with the same column destination. Proof. In step 2 of algorithm Partially Sorting the number of 0s (1s, 2s) a block sends to any two blocks of a vertical slice (vs for short) differs at most by two. Since there are n1/4 /2 blocks in a vs, the number of 0s (1s, 2s) in any two blocks in a vs differs at most by n1/4 after step 2. There are n1/4 /4 blocks in a horizontal slice (hs for short). Therefore, the number of 0s (1s, 2s) in any two hs differs at most by n1/2 /4 after step 2. Steps (3) to (6) do not change the number of 0s (1s, 2s) in a hs. Hence after step 6 the number of 0s (1s, 2s) in any two hs differ at most by n1/2 /4. Furthermore, steps 3 to 6 sort the horitzontal slices in 3
Since n1/4 /4 is integer we have n ≥ 44 .
230
A. Osterloh
column-major order. Let l1s,min be the minimal number of 1s in a hs, l1s,max the maximal number of 1s in a hs and t1s be the total number of 1s in BC. Then 2t1s /n1/4 − n1/2 /4 < l1s,min ≤ l1s,max < 2t1s /n1/4 + n1/2 /4. So (0) holds. Since (2t1s /n1/4 + 2n1/2 /4)/(2n3/4) < 2 (the additional (n1/2 /4)/ (2n3/4 ) stems from the difference of 0s within a hs) (1) holds. Although the 1s could be in three neighboring columns, the cut of these three columns and any row consists of at most two neighboring nodes. This is due to the sorting in the blocks. Since from any two neighboring nodes in a row, one node is a black node and one node is a white node (2) is fulfilled. • Note that BC is not sorted in column-major but the horizontal slices are. Furthermore, for each destination column i, 1 ≤ i ≤ n, the black packets with destination column i can be found in at most three neighboring columns of the mesh. Additionally there are at most two black packets in a row with destination column i. Finally we consider the running time of the sorting step: Lemma 7. Algorithm Partially Sorting needs 1.5n + O(n3/4 ) steps. Proof. The running time of steps (1), (3), (5), and (6) follows from Lemma 4. The running time of step (2) follows from Lemma 2 and the running time for step (4) follows from Lemma 3. • Details of Step 3. Greedy Routing to the destination. In step (3a) we have to solve an instance of 2-2 basic routing. The destination of a packet in this problem is its destination column. This, together with the fact that the packets are partially sorted according to their destination column, ensures that there is no overtaking of packets during the horizontal routing, i.e. if two packets π and π in a row are initially in column i and i , i < i , and their destinations are j and j , then j ≤ j . A packet that has reached its destination column in step (3a) vibrates until step (3b) starts. We have to make sure that the packets are able to vibrate before the beginning of step (3b). Packets in row i with destination column j use nodes (i − 1, j) and (i + 1, j) to vibrate. So columns 1 and n could be a problem. Since there is no overtaking, node (i, 1) is able to vibrate the two packets to node (i, 2) and node (i + 1, 1) if i < n. Analogously node (i, n) can vibrate packets to node (i, n − 1) and node (i + 1, n). For the nodes (n, 1) and (n, n) a special treatment is necessary. One packet of the possibly two packets can vibrate using node (n, 2) ((n, n − 1)). The other packet have to vibrate from node (n, 2) to node (n − 1, 2) (node (n, n − 1) and (n − 1, n − 1)). In step (3b) this special treatment of the two corner nodes only results in O(1) extra steps. The time bound of n/2 + O(1) for step (3a) can be seen with the help of Lemma 2. Lemma 8. Step (3a) can be done in n/2 + O(1) steps. Proof. (Sketch) We only consider packets traveling from left to right. The bounds for packets traveling from right to left are the same. To avoid notational clutter, we
Deterministic Hot-Potato Permutation Routing
231
omit constants that stem from the fact that packets with a destination column k, 1 ≤ k ≤ n, are located in three neighboring columns. Instead we assume that BC is sorted. This reduces the running time only by a constant. We have to give an upper bound for max(i,j)∈X {2(max{r(i, j), l(i, j)} − 1) + j − i}. To do so we construct a worst case, i.e. we place the packets such that max(i,j)∈X {2(max{r(i, j), l(i, j)} − 1)+ j − i} is maximal. We have to maximize 2(r(i, j)− 1)+ j − i. Such a worst case occurs when all black packets have a destination in the right half of the mesh. In the following n/4 ≤ i < j ≤ 3n/4, l := j − i, and k := 3n/4 − j. Note that n black packets fit in one column of BC and that a column of the mesh is destination of n packets. Packets located in columns j to 3n/4 in the beginning of step 3a have to go to columns n−k to n. Hence at most (n−k−j)·n free destinatons exist in columns t where t ≥ j. Since n−k −j = n/4 we have at most n2 /4 free destinations. Packets between columns i and j have a destination to the right of column i. Hence at most n2 /4 − ln free destinations to the right of column i exist. Since n2 /4 − ln packets fit in n/4 − l columns, at most n/4 − l black packets in a row have to pass i and j from left to right. So we get j − i + 2(n/4 − l) = n/2 − j + i ≤ n/2. • Simulations of the algorithm have shown that the average time the packets need in step (3a) is approximately n/4. In the following we describe how packets are routed in step (3b). We restrict our attention to packets with destination column k, 1 < k < n, and begin with the description of the initial situation. We come back to the description of the situation for packet with destination column 1 or n later. Initially the packets are in columns k − 1, k, and k + 1 such that in column k there are at most two packets on any black node and no packets on any white node. In columns k − 1 and k + 1 there is at most one packet on any black node and no packet on any white node. The packets are routed to their destination in the following way. Packets in column k − 1 and k + 1 are routed horizontally to column k. Packets in column k are routed vertically towards their destination. If two packets in column k compete for a vertical link (1-link or 3-link) the packet with the farthest destination wins. The other packet is routed horizontally to column k − 1 or k. For the time of step (3b) we get Lemma 9. For packets with destination column k, 1 < k < n, step (3b) can be done in n + o(n) steps. Proof. (sketch) Let u(i, j) be the number of packets that have to pass row i and j downwards and d(i, j) be the number of packets that have to pass row i and j upwards. (u(i, j) and d(i, j) are defined as r(i, j) and l(i, j) for 2-2 basic routing). Then a bound of max(k,l)∈X {2(max{u(k, l), d(l, k)} − 1) + l − k}+O(1) steps can be proved analogously to the proof of Lemma 2. The additional O(1) steps result from the fact that initially some packets are on a node in column k − 1 or k + 1. So, to get a bound for the running time of step (3b), we have to find an upper bound for the number of packets that pass node i and node j. We restrict our attention to black packets that have to travel downwards. From Lemma 6.(0) we know that in an horizontal slice at most 2n3/4 + O(n1/2 ) packets with destination column k exist. So in r horizontal slices there are at
232
A. Osterloh
most r(2n3/4 + O(n1/2 )) packets with destination column k. Since we consider permutation routing, each node is destination of exactly one packet. So at most min{i/(2n3/4 ) (2n3/4 + O(n1/2 ), n − j} = min{i + o(n), n − j} packets pass i and j. Since 2(d(i, j) − 1) + j − i ≤ 2min{i + o(n), n − j} + j − i = n + o(n) we have proved the desired bound. • Finally we have to consider destination columns 1 and n. We restrict our attention to column n. Column 1 can be treated in the same way. In a horizontal slice there are at most 2n3/4 + O(n1/2 ) black packets with destination column n. Nearly all (up to O(n1/2 )) of these packets are in column 3n/4 after step 2 of algorithm Partially Sorting. So only from row i ≥ n − O(n1/2 ) each row has two packets. In rows i < n − O(n1/2 ) only every second row has two packets. Hence we have a situation very similar to a 2-2 basic routing problem that can be solved in n + o(n) steps. We omit a proof here. We get Theorem 1. There is a deterministic hot-potato routing algorithm that solves Permutaion Routing on an n × n mesh in 3.25n + o(n) steps. Algorithm Permutation Routing also works on a torus but does not benefit from the connections between row 1 (column 1) and row n (column n). On a torus no special treatment for columns (rows) 1 or n is necessary and step (2) of algorithm Partially Sorting can be done within n/2 steps. At the moment we do not know whether step (3b) can be performed faster than in n steps on a torus. We assume that it can be done in 3n/4 + o(n) steps. To get a better bound for step (3b) a deeper analysis of hot-potato algorithms for 2-2 basic routing problems on rings is necessary. So we have: Theorem 2. There is a deterministic hot-potato routing algorithm that solves Permutaion Routing on an n × n torus in 2.75n + o(n) steps. Under the assumption that step (3b) can be done in 3n/4 + o(n) steps a bound of 2.5n + o(n) could be achieved.
6
Conclusion
In this paper we have shown new upper bounds for deterministic hot-potato routing on the mesh and torus. For permutation routing we reduced the gap between the known upper and lower bound. To achieve the new bound we analyzed routing problems on one-dimensional meshes. The results achieved there could be useful for the design of fast hot-potato routing algorithms for higher dimensional meshes or similar networks. To design fast (permutation routing) algorithms for the torus it would be helpful to have a similar result for rings. For local routing problems we achieved asymtotically optimal deterministic hotpotato algorithms. Our algorithms use sorting to solve the routing problem. As far as we know no asymptotically optimal deterministic hot-potato routing algorithm for the Permutation Routing problem exists that does not use sorting.
Deterministic Hot-Potato Permutation Routing
233
References [1] Acampora, A.S., Shah, S.I.A.: Multihop Lightwave Networks: Acomparison of Store-and-Forward and Hot-Potato Routing. In: Proceedings of IEEE INFOCOM, pp. 10–19 (1991) [2] Ben-Aroya, I., Eilam, T., Schuster, A.: Greedy Hot-Potato Routing on the TwoDimensional Mesh. Distributed Computing 9(1), 3–19 (1995) [3] Borodin, A., Rabani, Y., Schieber, B.: Beterministic Many-To-Many Hot Potato Routing. Theory of Computer Systems 31(1), 41–61 (1998) [4] Ben-Or, A., Halevi, S., Schuster, A.: Randomized Single-Target Hot-Potato Routing. Journal of Algorithms 23(1), 101–120 (1997) [5] Busch, C., Herlihy, M., Wattenhofer, R.: Randomized Greedy Hot-Potato Routing. In: Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2000), San Francisco, Calefornia, USA, January 2000, pp. 458–466 (2000) [6] Busch, C., Herlihy, M., Wattenhofer, R.: Hard-potato Routing. In: Proceedings of the Thirty-second Annual ACM Symposium on Theory of Computing (STOC 2000), Portland, Oregon, USA, May 2000, pp. 278–285 (2000) [7] Feige, U., Raghavan, P.: Exact Analysis of Hot-Potato Routing. In: IEEE Proceedings of the 33rd Annual Symposium on Foundations of Computer Science (FOCS 1992), Pittsburgh, Pennsylvania, USA, October 1992, pp. 553–562 (1992) [8] Grammatikakis, M.D., Hsu, D.F., Sibeyn, J.F.: Packet routing in fixed-connection networks: A survey. Journal of Parallel and Distributed Computing 54(2), 77–132 (1998) [9] Greenber, A.G., Goodmann, J.: Sharp Approximate Models of Deflection Routing in Mesh Networks. IEEE Transactions on Communications 41(1), 210–223 (1993) [10] Hillis, W.D.: The Connection Machine. MIT Press, Cambridge (1985) [11] Kaufmann, M., Lauer, H., Schr¨ oder, H.: Fast Deterministic Hot-Potato Routing on Meshes. In: Du, D.-Z., Zhang, X.-S. (eds.) ISAAC 1994. LNCS, vol. 834, pp. 273–282. Springer, Heidelberg (1994) [12] Kaufmann, M., Rajasekaran, S., Sibeyn, J.F.: Matching the Bisection Bound for Routing and Sorting on the Mesh. In: Proceedings of the 4th Symposium on Parallel Algorithms and Architectures (SPAA 1992), pp. 31–40 (1992) [13] Knuth, D.: The Art of Computer Programming, Sorting and Searching, vol. III. Addison-Wesley, Reading (1973) [14] Kunde, M.: A New Bound for Pure Greedy Hot Potato Routing. In: Thomas, W., Weil, P. (eds.) STACS 2007. LNCS, vol. 4393, pp. 49–60. Springer, Heidelberg (2007) [15] Leighton, T.: Introduction to Parallel Algorithms and Architectures: ArraysTrees-Hypercubes. Morgan-Kaufmann Publishers, San Francisco (1992) [16] Newman, I., Schuster, A.: Hot-Potato Algorithms for Permutation Routing. IEEE Transactions on Parallel and Distributed Systems 6(11), 1168–1176 (1995) [17] Schnorr, C.P., Schamir, A.: An Optimal Sorting Algorithm for Mesh Connected Computers. In: Prooceedings of the 18th Symposium on Theory of Computing (STOC 1986), pp. 255–263 (1986) [18] Seitz, C.L.: Mosaic C: An Experimental, Fine-Grain Multicomputer. In: Bensoussan, A., Verjus, J.-P. (eds.) INRIA 1992. LNCS, vol. 653, pp. 69–85. Springer, Heidelberg (1992) [19] Smith, B.J.: Architecture and Applications of the HEP multiprocessor computer. Soc. Photocopti. Instrum. Eng. 298, 241–248
Efficient Algorithms for Model-Based Motif Discovery from Multiple Sequences Bin Fu1 , Ming-Yang Kao2 , and Lusheng Wang3 1
3
Dept. of Computer Science, University of Texas - Pan American TX 78539, USA [email protected] 2 Department of Electrical Engineering and Computer Science, Northwestern University, Evanston, IL 60208, USA [email protected] Department of Computer Science, The City University of Hong Kong, Kowloon, Hong Kong [email protected]
Abstract. We study a natural probabilistic model for motif discovery that has been used to experimentally test the quality of motif discovery programs. In this model, there are k background sequences, and each character in a background sequence is a random character from an alphabet Σ. A motif G = g1 g2 . . . gm is a string of m characters. Each background sequence is implanted a randomly generated approximate copy of G. For a randomly generated approximate copy b1 b2 . . . bm of G, every character is randomly generated such that the probability for bi = gi is at most α. In this paper, we give the first analytical proof that multiple background sequences do help for finding subtle and faint motifs.
1
Introduction
Motif discovery is an important problem in computational biology and computer science. For instance, it has applications to coding theory [3,4], locating binding sites and conserved regions in unaligned sequences [18,10,6,17], genetic drug target identification [9], designing genetic probes [9], and universal PCR primer design [13,2,16,9]. This paper focuses on the application of motif discovery to finding conserved regions in a set of given DNA, RNA, or protein sequences. Such conserved regions may represent common biological functions or structures. Many performance measures have been proposed for motif discovery. Let C be a subset of 0-1 sequences of length n. The covering radius of C is the smallest integer r such that each vector in {0, 1}n is at a distance at most r from a set of 0-1 sequence of length n. The decision problem associated with the covering radius for a set of binary sequences is NP-complete [3]. Another similar problem called closest string problem was also proved to be NP-hard [3,9]. Some approximation algorithms have also been proposed. Li et al. [12] gave an approximation scheme for the closest string and substring problems. The related consensus patterns problem is that give n sequences s1 , · · · , sn , it asks for a region of length L in each M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 234–245, 2008. c Springer-Verlag Berlin Heidelberg 2008
Motif Discovery from Multiple Sequences
235
si , and a median string s of length L so that the total Hamming distance from s to these regions is minimized. Approximation algorithms for the consensus patterns problem were also reported in [11]. Furthermore, a number of heuristics and programs have been developed [15,7,8,19,1]. In many applications, motifs are faint and may not be apparent when two sequences alone are compared but may become clearer when more sequences are together [5]. For this reason, it has been conjectured that comparing more sequences together can help identifying faint motifs. In this paper, we give the first analytical proof for this conjecture. In this paper, we study a natural probabilistic model for motif discovery. In this model, there are k background sequences and each character in the background sequence is a random character from an alphabet Σ. A motif G = g1 g2 . . . gm is a string of m characters. Each background sequence is implanted a randomly generated approximate copy of G. For a randomly generated approximate copy b1 b2 . . . bm of G, every character is randomly generated such that the probability for bi = gi is at most α. This model was first proposed in [15] and has been widely used in experimentally testing motif discovery programs [7,8,19,1]. We design an algorithm that for a reasonably large k can discover the implanted motif with high probability. Specifically, we prove that for α < 0.1771 and any constant x ≥ 8, there exist constants t0 , δ0 , δ1 > 0 such that if the length of the motif is at least δ0 log n, the alphabet has at least t0 characters, and there are at least δ1 log n0 input sequences, then in O(n3 ) time the algorithm finds the motif with probability at least 1 − 21x , where n is the longest length of any input sequence and n0 ≤ n is an upper bound for the length of the motif. When x is considered as a parameter of order O(log n), the parameters t0 , δ0 , δ1 > 0 do not depend on x. We also show some lower bounds that imply our conditions for the length of the motif and the number of input sequences are tight to within a constant multiplicative factor. This algorithm’s time complexity depends on the length of input sequences and is independent of the number of the input sequences. This is because that for a fixed x, Θ(log n) sequences are sufficient to guarantee the probability of at least 1 − 21x to discover the motif. In contrast to the NP-hardness of other variants of the common substring problem, motif discovery is solvable in O(n3 ) time in this probabilistic model. Our algorithm is an exact algorithm that has provable high probability to return the motif. The algorithm employs novel methods that extract similar consecutive regions among multiple sequences while tolerating noises. The algorithm needs the motif to be long enough, but does not need to have the length of the motif as an input. The algorithm allows the motif to appear any position at each sequence, and each mutation in a motif to be arbitrary (a mutation lets a character to be changed to an arbitrary character without any probabilistic condition). We also derive lower bounds that indicate the upper bounds are almost optimal. We give a brief description about the algorithm as section 3. Before giving the algorithm, we set up a few parameters and constants that will affect the algorithm at section 4.1. Then entire Algorithm Find-Noisy-Motif is described.
236
B. Fu, M. Kao, and L. Wang
We give the analysis and proof about Algorithm Find-Noisy-Motif and state it in our main theorem (Theorem 1). Two lower bounds are presented at section 5.
2
Notations
For a set A, |A| denotes the number of elements in A. Σ is an alphabet with |Σ| = t ≥ 2. For an integer n ≥ 0, Σ n is the set of sequences of length n with characters from Σ. For a sequence S = a1 a2 · · · an , S[i] denotes the character ai , and S[i, j] denotes the substring ai · · · aj for 1 ≤ i ≤ j ≤ n. |S| denotes the length of the sequence S. We use ∅ to represent the empty sequence, which has length 0. Let G = g1 g2 · · · gm be a fixed sequence of m characters. G is the motif to be discovered by our algorithm. A Θα (n, G)-sequence has the form S = a1 · · · an1 b1 · · · bm an1 +1 · · · an2 , where n2 + m ≤ n, each ai has probability 1t to be equal to π for each π ∈ Σ, and bi has probability at most α not equal to gi for 1 ≤ i ≤ m, where m = |G|. ℵ(S) denotes the motif region b1 · · · bm of S. The motif region b1 · · · bm of S may start at an arbitrary or worst-case position in S. Also, a mutation may convert a character gi in the motif into an arbitrary or worst-case different character bi only subject to the restriction that gi will mutate with probability at most α. A mutation converts a character gi in the motif into an arbitrary different character bi without probability restriction. This allows a character gi in the motif to change into any character bi in Σ − {gi } with even different probability. For two sequences S1 = a1 · · · am and S2 = b1 · · · bm of the same length, let i=1,···,m}| , i.e., the ratio of difference between the two diff(S1 , S2 ) = |{i|ai =bi for m sequences. Definition 1. Assume that S = a1 a2 · · · an is a sequence. For its substring S = S[i1 , j1 ] and S = S[i2 , j2 ], define shiftS (S , S ) = min(|i1 − i2 |, |j1 − j2 |). The analysis of our algorithm employs the well known Chernoff bound [14].
S1
ℵ(S1 )
ℵ(S2 )
-
S2
ℵ(S3 )
-
S3 Fig. 1. The motif regions of S1 , S2 and S3 are not aligned
Motif Discovery from Multiple Sequences
3
237
A Sketch of the Algorithm Find-Noisy-Motif
Our Algorithm Find-Noisy-Motif has two phases. The first phase exploits the fact that with high probability, the motif area in some sequences conserves the first and last characters. Furthermore, the middle area of the motif changes with a small ratio. We will select enough pairs of Θα (n, G)-sequences S , S and find their substrings G and G of S and S , respectively such that G and G match in their left and right most characters. Furthermore, G and G only have a relatively small difference in the middle area. For each such pair S and S , the substring G of S is extracted. During the second phase, a new set of Θα (n, G)-sequences S1 , S2 , · · · , Sk2 will be used. For each G extracted from a pair of sequences in the first phase, it is used to match a substring Gi of Si for i = 1, 2, · · · , k2 . Assume that G1 , · · · , Gk2 are derived from matching G to all sequences S1 , S2 , · · · , Sk2 . Some Gi may be an empty sequence if G can not match well to any substring of Si . If G has the same length as that of motif G and is very similar to G, then the number of non-empty sequences among G1 , · · · , Gk2 is much larger than k22 and the i-th character G[i] of G can be recovered from voting among G1 [i], · · · , Gk2 [i]. In other words, G[i] is the character that appears more than k22 times in G1 [i], · · · , Gk2 [i]. We prove that with high probability, such a G exists. The conversion from figure 1 to figure 2 shows how we recover the motif via voting. On the other hand, if |G | > |G| or G does not match G well, we can prove that the number of non-empty sequences among G1 , · · · , Gk2 is less than k22 . Our algorithm’s time complexity depends on the length of the input sequences and is independent of the number of the input sequences. This is because that for a fixed x, Θ(log n) sequences are sufficient to guarantee the probability of at least 1 − 21x the motif will be discovered. Additional sequences can improve the probability but are not needed for the high probability guarantee.
S1
ℵ(S1 )
-
ℵ(S2 )
-
ℵ(S3 )
-
S2
S3 Fig. 2. S1 , S2 and S3 with their motif in the same column region
4
Algorithm Find-Noisy-Motif
In this section, we give an algorithm that any motif G can be discovered in O(n3 ) time. It requires that the size of alphabet is larger than a fixed constant.
238
B. Fu, M. Kao, and L. Wang
Some parameters and constants will be used in Algorithm Find-Noisy-Motif . In section 4.1, we give a list of assignments for some parameters and constants that are used in the algorithm. The description of Algorithm Find-Noisy-Motif is given at section 4.2. The analysis of the algorithm is given at section 4.3. 4.1
Parameters
As multiple parameters affect the performance of Algorithm Find-Noisy-Motif, we list the parameters and discuss some useful inequalities here. – Let x be any constant at least 8. We will prove that Algorithm Find-NoisyMotif has probability at least 1 − 21x to output the motif G. Let α be any constant with α < 0.1771. Note that (1 − α)2 − α > 12 . Let η = 16 and let 1 ρ0 = 24 . Let > 0 be any constant such that (1 − α)2 − α − 3 > 12 . – Select any constant r0 > 0 such that (1 − α)2 − α − 3 − 2r0 >
1 . 2
(1)
– Let v be the least integer that satisfies the inequalities below: 1 ≤ v, v 2c 1 − α − 3 − 2r0 > , (1 − α)2 − 1−c 2 2c2 v 3 cv < ρ0 , 1−c 2cv r0 < , 1−c 2 2cv < ρ0 , 1−c
(2) (3) (4) (5) (6)
2
where c = e− 3 . Note that the existence of v for (3) follows from (1). – We define the following Q0 . It will be first used in Lemma 1. Let Q0 = 2cv (1 − α)2 − 1−c . – Let c2 be a constant to be specified in Lemma 6. – Let t0 be any constant such that 2(v − 1) r0 ≤ , t0 2 3 c2 v ≤ ρ0 , t0 t0 − 1 − β > , t0
(7) (8) (9)
where R is to be defined in Lemma 8. In the remainder of this paper, we always assume the parameter t ≥ t0 . Combining (3), (5), (7) and the definition of R, we have Q0 − α − 3 − 2R > 12 .
Motif Discovery from Multiple Sequences
239
– Let β = 2α + 2 . – The constant z is selected so that z ≥ v, and – The number k1 is selected such that (1 − Q1 )k1 ≤
2 z 3 2 1−e− 3
4e−
η , 2x
≤ ρ0 .
(10)
1 where Q1 is defined in Lemma 6, and is at least 12 . Note that k1 = O(1) is a constant independent of the length of the input sequences. – Select a constant δ0 > 0 and let d = δ0 log n such that n2 e−d ≤ 2ηx , 2
n 2 e − 3 d ≤ ρ0 . – We require that the length of the motif G is at least d. Let n be the largest length of an input Θα (n, G)-sequence. Let parameter n0 ∈ [d, n] be a given upper bound on the length of the motif G that will be discovered by Algorithm Find-Noisy-Motif . 2 – Select a constant δ1 > 0 and let k2 = δ1 log n0 −2k1 so that n0 k2 e− 3 k2 ≤ 2ηx , 2
and k1 e− 3 k2 ≤
η 2x .
The motif G is a pattern unknown to Algorithm Find-Noisy-Motif , and Algorithm Find-Noisy-Motif will attempt to recover G from a series of Θα (n, G)sequences generated by the probabilistic model, which is controlled by the parameters α, n, and G. The source of randomness comes entirely from the input sequence. Let’s imagine how a sequence S is generated in this model. 1). Generate a sequence S with n − |G| characters, in which each character is a random character Σ. 2). Generate G such that with probability at most α, G [i] = G[i]. For G [i] = G[i], it represents a mutation. Note that there is no restriction about how a character will change to in a mutation. 3). Insert G , which servers the motif region ℵ(S) of S, into any position of S . Let Z0 be a set of k1 pairs of random Θα (n, G)-sequences (S1 , S1 ), · · · , (Sk 1 , Sk1 ). Let Z1 be the set Θα (n, G)-sequences {S1 , S1 , · · · , Sk 1 , Sk1 } in the k1 pairs of sequences in Z0 , where k1 is defined by inequality (10). Let Z2 be a set of k2 sequences used in the second phase of Algorithm Find-Noisy-Motif . Let k = 2k1 + k2 be the total number of Θα (n, G)-sequences that are used as the input to Algorithm FindNoisy-Motif . In the remainder of this paper, we assume that the alphabet has t ≥ t0 characters. 4.2
Description of Algorithm Find-Noisy-Motif
Algorithm Find-Noisy-Motif has two phases. The input to Phase 1 is k1 pairs of Θα (n, G)-sequences in the set Z0 . The input to Phase 2 is k2 Θα (n, G)sequences in the set Z2 and the output result from Phase 1. All the Θα (n, G)sequences are independent random Θα (n, G)-sequences. Note that k1 is constant, k2 = O(log n0 ), and n0 (≤ n) is an upper bound for the length of the motif G according to the setting in Section 4.1. Algorithm Find-Noisy-Motif is a deterministic algorithm, which is based on the randomness of those sequences in
240
B. Fu, M. Kao, and L. Wang
both Z0 and Z2 and the independence in selecting them. Algorithm Find-NoisyMotif is deterministic, but its input is generated by a probabilistic model. The following steps generate data sequenced for Algorithm Find-Noisy-Motif for Z0 and Z2 . Step 1. Randomly select 2k1 Θα (n, G)-sequences S1 , S1 , S2 , S2 , · · · , Sk 1 , Sk1 and let Z0 = {(S1 , S1 ), (S2 , S2 ), · · · , (Sk 1 , Sk1 )}. Step 2. Randomly select k2 Θα (n, G)-sequences S1 , · · · , Sk2 and let Z2 = {S1 , · · · , Sk2 }. Definition 2. – Two sequences X1 and X2 are left matched if (1) |X1 | = |X2 |, (2) X1 [1] = X2 [1], and (3) diff(X1 [1, i], X2 [1, i]) ≤ β for all integers i, v ≤ i ≤ |X1 |. – Two sequences X1 and X2 are right matched if X1R and X2R are left matched, where X R = an · · · a1 is the inverse sequence of X = a1 · · · an . – Two sequences X1 and X2 are matched if X1 and X2 are both left and right matched. The function Extract(S1 , S2 ) below extracts the longest similar region between two sequences S1 and S2 . Function Extract(S1 , S2 ) Input: a pair of Θα (n, G)-sequences S1 and S2 Output: a subsequence of S2 which is similar to a subsequence of S1 . Steps: for h = min(|S1 |, |S2 |) to d (recall from Section 4.1 that |G| ≥ d) for i = 1 to |S1 | for j = 1 to |S2 | let i = i + h − 1 and j = j + h − 1; if S1 [i, i ] and S2 [j, j ] are both left and right matched (see Definition 2) then return S2 [j, j ]; return ∅ (the empty sequence); End of Extract The following are the steps of Phase 1 of Algorithm Find-Noisy-Motif : Phase 1: Input: Z0 = {(S1 , S1 ), (S2 , S2 ), · · · , (Sk 1 , Sk1 )}, a set of pairs of sequences generated at Step 1 in the initial stage of the algorithm. Output: a set W that contains a similar region of each pair in Z0 . Steps: let W = ∅ (empty set); for each pair of sequence (S, S ) ∈ Z0 let G = Extract(S, S ) and put G into W ; return W , which will be used in Phase 2; End of Phase 1
Motif Discovery from Multiple Sequences
241
After a set of motif candidates W is produced from Phase 1 of Algorithm Find-Noisy-Motif, we use this set to match with another set of sequences to recover the hidden motif via voting. Function Match(G , Si ) Input: a motif candidate G , which is returned from the function Extract(), and a sequence S from the group Z2 ; Output: either a subsequence Gi of Si of the same length as G or an empty sequence. Gi will be considered the motif region ℵ(Si ) of Si if it is not empty, and the empty sequence means the failure in extracting the motif region ℵ(Si ) of Si . Steps: find a substring Gi of Si with |G| = |Gi | such that G and Gi are matched (see Definition1) if such a Gi does not exist, let Gi = ∅ (empty string). Output Gi ; End of Match The function Vote(G1 , G2 , · · · , Gk ) is to generate another sequence G via voting, where G [i] is the most frequent character among G1 [i], G2 [i], · · · , Gk [i]. Function Vote(G1 , G2 , · · · , Gk ) Input: sequences G1 , G2 , · · · , Gk of the same length with k ≤ k2 ; Output: a sequence G , which is derived from voting at every position of the input sequences. Steps: let m = |G1 |; for each j = 1, · · · , m if strictly more than k22 characters from G1 [j], · · · , Gk [j] are equal to some character a then let aj = a else return “failure”; return G = a1 · · · am ; End of Vote The following are the steps of Phase 2 of Algorithm Find-Noisy-Motif . It uses the candidates of motif derived in the Phase 1 to extract the motif regions of another set Z2 of sequences, and recover the motif via voting. Phase 2: let Z2 = {S1 , · · · , Sk2 } as defined in the begining of Section 4.2. for each G ∈ W , let Gi = Match(G , Si ) for i = 1, · · · , k2 . let G1 , · · · , Gk be the list of all non-empty sequences in the list 2 G1 , · · · , Gk2 (Note: For every non-empty sequence that appears multiple times in the second list, it also appears the same number of times in the first list.) If k2 ≥ (Q0 − 2R − 2 )k2 then output Vote(G1 , G2 , · · · , Gk ) (which will be proven to be 2 identical to G with probability at least 1 − 21x ). End of Phase 2
242
4.3
B. Fu, M. Kao, and L. Wang
Analysis of Phase 1 of Algorithm Find-Noisy-Motif
We present Lemma 1 that shows that with high probability, the initial part and last part of motif region in a Θα (n, G)-sequence do not change much. v
2c Lemma 1. With probability at least Q0 = (1 − α)2 − 1−c , a Θα (n, G)-sequence S contains G = ℵ(S) satisfying the following conditions: (1) G [1] = G[1]; (2) G [m] = G[m]; (3) diff(G [1, h], G[1, h]) ≤ β2 for all h = v, v + 1, · · · , m; (4) diff(G [m − h, m], G[m − h, m]) ≤ β2 for h = v − 1, v + 1, · · · , m − 1, where
c = e−
2 3
and m = |G| as defined in Sections 4.1 and 2, respectively.
Lemma 2 shows that with small probability, a sequence can match a random sequence. It will be used to prove that when two subsequences in two different Θα (n, G)-sequences are similar, they are unlikely to stay away the motif regions in the two Θα (n, G)-sequences, respectively. Lemma 2. Assume that X1 and X2 are two independent sequences of the same length and that every character of X2 is a random character from Σ. Then 1. if 1 ≤ |X1 | = |X2 | < v, then the probability that X1 and X2 are matched is ≤ 1t ; and 2. if v ≤ |X1 | = |X2 |, then the probability for diff(X1 , X2 ) ≤ β is at most e−
2 |X1 | 3
.
Function Extract(S1 , S2 ) returns a subsequence of S2 . We expect that Extract( S1 , S2 ) is the motif region ℵ(S2 ) in S2 . Lemma 3 shows that with small probability, the region for Extract(S1 , S2 ) in S2 has no overlap with the motif region ℵ(S2 ) of S2 . Lemma 3. With probability at most ρ0 , Extract(S1 , S2 ) and ℵ(S2 ) are not overlaping substrings of S2 . In other words, with probability is at most ρ0 , Extract(S1 , S2 ) = S2 [j, j ], ℵ(S2 ) = S2 [t, t ], and the two intervals [j, j ] and [f, f ] have no overlap ([j, j ] ∩ [f, f ] = ∅). In order to show that Extract(S1 , S2 ) is efficient to find a motif region in S2 , we give Lemma 4 show that with small probability, the region to fetch Extract(S1 , S2 ) in S2 shift much from the motif region ℵ(S2 ) of S2 . Lemma 4. For every z > 0, the probability is at most H1 = 2ρ0 that for a pair of sequences (S1 , S2 ) from Z0 , shiftS2 (M, ℵ(S2 )) ≥ z and |M | ≥ |G|, where M = Extract(S1 , S2 ). We need the Lemma 5, which will be useful to give the upper bound of probability analysis. It is derived by the standard methods in calculus. Lemma 5. Let a be a real constant in interval (0, 1) and j be an integer ≥ 1. jaj −(j−1)aj+1 jaj i < (1−a) Then, 1. ∞ 2 ; and i=j ia = (1−a)2 2 ∞ 2 i 2j 2 aj j (j −(j−1)(j+1)a)(1−a)−(j−(j−1)a)2(−a) ) < (1−a) 2. i=j i a = a ( 3. (1−a)3
Motif Discovery from Multiple Sequences
243
Lemma 6 gives a lower bound for the probability that Extract(S1 , S2 ) returns the motif region ℵ(S2 ) of S2 . Furthermore, the motif region ℵ(S2 ) of S2 does not have much difference with the original motif G. Lemma 6. Given two independent Θα (n, G)-sequences S1 and S2 , it has the probability at least Q1 = Q20 − H2 − H1 ≥ Q20 − 4ρ0 that G = Extract(S1 , S2 ) is ℵ(S2 ), and ℵ(S2 ) satisfies the conditions of G Lemma 1, where H1 is defined in Lemma 4, H2 = c2 v 3 ( 1t + cv ) and c2 = O(1) is a constant. 1 defined in Section 4.1, we By (3), we have Q0 ≥ 12 . By Lemma 6 and ρ0 = 24 1 2 have Q1 ≥ Q0 −4ρ0 ≥ 12 . Since the number k1 is selected to be large enough that (1 − Q1 )k1 ≤ 2ηx (see (10)), the probability is at least 1 − (1 − Q1 )k1 ≥ 1 − 2ηx (by Lemma 6) that there is G0 = Extract(S1 , S2 ) = ℵ(S2 ), where S1 and S2 satisfy the conditions of Lemma 1. We now assume there is such a G0 that satisfies the conditions described above.
4.4
Analysis of Phase 2 of Algorithm Find-Noisy-Motif
Lemma 7 shows that with small probability, Z1 generated in the initial stage (step 2) of Algorithm Find-Noisy-Motif has a sequence whose motif region has many mutations. 2
Lemma 7. With probability at most 2k1 e− 3 d , there is a sequence S in Z1 that changes more than β2 |G| characters in its motif region ℵ(S). Lemma 8 shows that with high probability, phase 2 of Algorithm Find-NoisyMotif extracts motif regions from the sequences in Z1 . Lemma 8. 1. Assume that G = Extract(Si , Si ) with |G| ≤ |G |. Let S be a Θα (n, G)-sequence with M = Match(G , S) and let w0 be the number of characters of M that vare not in the region of ℵ(S). Then the probability is at c most R = 2( v−1 t + 1−c ) that w0 ≥ 1. 2. The probability is at least Q0 − R that given a random Θα (n, G)-sequence S, ℵ(S) = Match(G0 , S). Lemma 9 shows that we can use G to extract most of the motif regions for the sequences in Z2 if G = G0 (recall that G0 is close to the original motif G and G0 is defined right after Lemma 6). Lemma 9. Assume that |G | ≥ |G| and Gi = Match(G , Si ) for Si ∈ Z2 = {S1 , · · · , Sk2 } and i = 1, · · · , k2 (Recall that each sequence Gi is either an empty sequence or a sequence of the length |G |). 2 k2 3
1. If G = G0 , then the probability is at least 1 − e− than (Q0 − R − )k2 sequences Gi with Gi = ℵ(Si ). 2. The probability is at least 1 − e− 1, · · · , k2 )}| ≤ (R + )k2 .
2 k2 3
that there are more
that for every G , |{i|Gi = ℵ(Si )(i =
244
B. Fu, M. Kao, and L. Wang
Theorem 1 (Main). Assume that α is a constant less than 0.1771. There exist constants t0 , δ0 , and δ1 such that if the size t of the alphabet Σ is at least t0 and the length of the motif G is at least δ0 log n, then given k independent Θα (n, G)-sequences with k ≥ δ1 log n0 , Algorithm Find-Noisy-Motif outputs G with probability ≥ 1 − 21x and runs in O(n3 ) time, where n is the longest length of any input sequences and n0 ≤ n is a given upper bound for the length of G.
5
Lower Bounds on the Parameters
In this section, we show some lower bounds for the length of the motif and the number of input sequences that are needed to recover the motif with high probability. Theorem 2 shows that when the motif is short, it is impossible to recover it with a small number O(log n) of sequences. Thus, the upper bounds of Algorithm Find-Noisy-Motif and the lower bounds here have constant factor multiplicative. Theorem 2. Assume that constant > 0 and the alphabet has constant number t characters. There is a constant δ > 0 such that with probability at least 1 − o(1) that given n1− independent random Θα (n, G)-sequences S1 , · · · , Sn1− , every sequence of length m0 = δ log n is a substrings of each Si for i = 1, 2, · · · , n1− . We consider the lower bound for the number of sequences needed for recovering the motif. Theorem 3 shows that if the number of sequences is o(log n), it is impossible to recover the motif correctly. Theorem 3. There exists a constant δ such that no algorithm can recover the motif G with at most δ log n Θα (n, G)-sequences. Open Problems: An interesting open problem is whether there exists an algorithm to recover all the motifs for the alphabet with four characters. Acknowledgements. We thank Mikl´ os Cs¨ ur¨ os and Manan Sanghi for helpful discussions. Ming-Yang Kao is supported in part by National Science Foundation Grant CNS-0627751. Lusheng Wang is fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CityU 1196/03E).
References 1. Chin, F., Leung, H.: Voting algorithms for discovering long motifs. In: Proceedings of the 3rd Asia-Pacific Bioinformatics Conference, pp. 261–272 (2005) 2. Dopazo, J., Rodr´ıguez, A., S´ aiz, J.C., Sobrino, F.: Design of primers for PCR amplification of highly variable genomes. Computer Applications in the Biosciences 9, 123–125 (1993) 3. Frances, M., Litman, A.: On covering problems of codes. Theoretical Computer Science 30, 113–119 (1997)
Motif Discovery from Multiple Sequences
245
4. G¸asieniec, L., Jansson, J., Lingas, A.: Efficient approximation algorithms for the Hamming center problem. In: Proceedings of the Tenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. S905–S906 (1999) 5. Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge (1997) 6. Hertz, G., Stormo, G.: Identification of consensus patterns in unaligned DNA and protein sequences: a large-deviation statistical basis for penalizing gaps. In: Proceedings of the 3rd International Conference on Bioinformatics and Genome Research, pp. 201–216 (1995) 7. Keich, U., Pevzner, P.: Finding motifs in the twilight zone. Bioinformatics 18, 1374–1381 (2002) 8. Keich, U., Pevzner, P.: Subtle motifs: defining the limits of motif finding algorithms. Bioinformatics 18, 1382–1390 (2002) 9. Lanctot, J.K., Li, M., Ma, B., Wang, L., Zhang, L.: Distinguishing string selection problems. In: Proceedings of the 10th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 633–642 (1999) 10. Lawrence, C., Reilly, A.: An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins 7, 41–51 (1990) 11. Li, M., Ma, B., Wang, L.: Finding similar regions in many strings. In: Proceedings of the Thirty-first Annual ACM Symposium on Theory of Computing, pp. 473–482 (1999) 12. Li, M., Ma, B., Wang, L.: On the closest string and substring problems. Journal of the ACM 49(2), 157–171 (2002) 13. Lucas, K., Busch, M., Mossinger, S., Thompson, J.: An improved microcomputer program for finding gene- or gene family-specific oligonucleotides suitable as primers for polymerase chain reactions or as probes. Computer Applications in the Biosciences 7, 525–529 (1991) 14. Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, Cambridge (2000) 15. Pevzner, P., Sze, S.: Combinatorial approaches to finding subtle signals in DNA sequences. In: Proceedings of the 8th International Conference on Intelligent Systems for Molecular Biology, pp. 269–278 (2000) 16. Proutski, V., Holme, E.C.: Primer master: a new program for the design and analysis of PCR primers. Computer Applications in the Biosciences 12, 253–255 (1996) 17. Stormo, G.: Consensus patterns in DNA. In: Doolitle, R.F. (ed.) Molecular evolution: computer analysis of protein and nucleic acid sequences. Methods in Enzymolog, 183, 211–221 (1990) 18. Stormo, G., Hartzell III, G.: Identifying protein-binding sites from unaligned DNA fragments. Proceedings of the National Academy of Sciences of the United States of America 88, 5699–5703 (1991) 19. Wang, L., Dong, L.: Randomized algorithms for motif detection. Journal of Bioinformatics and Computational Biology 3(5), 1039–1052 (2005)
Ratio Based Stable In-Place Merging Pok-Son Kim1 and Arne Kutzner2 1
Kookmin University, Department of Mathematics, Seoul 136-702, Rep. of Korea [email protected] 2 Seokyeong University, Department of Computer Science, Seoul 136-704, Rep. of Korea [email protected]
Abstract. We investigate the problem of stable in-place merging from n based point of view where m, n are the sizes of the input a ratio k = m sequences with m ≤ n . We introduce a novel algorithm for this problem that is asymptotically optimal regarding the number of assignments as well as comparisons. Our algorithm uses knowledge about the ratio of the input sizes to gain optimality and does not stay in the tradition of Mannila and Ukkonen’s work [8] in contrast to all other stable in-place merging algorithms proposed so far. It has a simple modular structure and does not demand the additional extraction of a movement imitation buffer as needed by its competitors. For its core components we give concrete implementations in form of Pseudo Code. Using benchmarking we prove that our algorithm performs almost always better than its direct competitor proposed in [6]. As additional sub-result we show that √ stable in-place merging is a quite simple problem for every ratio k ≥ m by proving that there exists a primitive algorithm that is asymptotically optimal for such ratios.
1
Introduction
Merging denotes the operation of rearranging the elements of two adjacent sorted sequences of size m and n, so that the result forms one sorted sequence of m + n elements. An algorithm merges two sequences in place when it relies on a fixed amount of extra space. It is regarded as stable, if it preserves the initial ordering of elements with equal value. There are two significant lower bounds for merging. The lower bound for the number of assignments is m + n because every element of the input sequences can change its position in the sorted output. As shown e.g. in Knuth [7] the n + 1)), where m ≤ n. lower bound for the number of comparisons is Ω(m log( m A merging algorithm is called asymptotically fully optimal if it is asymptotically optimal regarding the number of comparisons as well as assignments. We will inspect the merging problem on the foundation of a ratio based apn of the sizes of the proach. In the following k will always denote the ratio k = m input sequences. The lower bounds for merging can be expressed on the foundation of such a ratio as well. We get Ω(m log(k + 1)) as lower bound for the number of comparisons and m · (k + 1) as lower bound for the number of assignments. M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 246–257, 2008. c Springer-Verlag Berlin Heidelberg 2008
Ratio Based Stable In-Place Merging
247
In the first part of this paper we will show that there is a simple √ asymptotically fully optimal stable in-place merging algorithm for every ratio k ≥ m. Afterward we will introduce a novel stable in-place merging algorithm that is asymptotically fully optimal for any ratio k. The new algorithm has a modular structure and does not rely on the techniques described by Mannila and Ukkonen [8] in contrast to all other works ([10,4,2,6]) known to us. Instead it exploits knowledge about the ratio of the input sizes to achieve optimality. In its core our algorithm consists of two separated operations named “Block rearrangement” and “Local merges”. The separation allowed the omitting of the extraction of an additional movement imitation buffer as e.g. necessary in [6]. For core parts of the new algorithm we will give an implementation in Pseudo-Code. Some benchmarks will show that it performs better than its competitor proposed in [6] for a wide range of inputs. A first conceptual description of a stable asymptotically fully optimal in-place merging algorithm can be found in the work of Symvonis [10]. Further work was done by Geffert et al. [4] and Chen [2] where Chen presented a simplified variant of Geffert et al’s algorithm. All three publications delivered neither an implementation in Pseudo-Code nor benchmarks. Recently Kim and Kutzner [6] published a further algorithm together with benchmarks. These benchmarks proved that stable asymptotically fully optimal in-place merging algorithms are competitive and don’t have to be viewed as theoretical models merely.
2
A Simple Asymptotically Optimal Algorithm for √ k≥ m
We now introduce some notations that will be used throughout the paper. Let u and v be two ascending sorted sequences. We define u ≤ v (u < v) iff x ≤ y (x < y) for all elements x ∈ u and for all elements y ∈ v. |u| denotes the size of the sequence u. Unless stated otherwise, m and n (m ≤ n) are the sizes of two input sequences u and v respectively. δ always denotes some block-size with δ ≤ m. Tab. 1 contains the complexity regarding comparisons and assignments for six elementary algorithms that we will use throughout this paper. Brief descriptions Table 1. Complexity of the Toolbox-Algorithms Algorithm Hwang and Lin
(1) - ext. buffer (2) - m rotat. Block Swapping Block Rotation
Arguments u, v with |u| ≤ |v| let m = |u| , n = |v|
Comparisons
Assignments t
m(t + 1) + n/2 where t = log(n/m)
2m + n n + m2 + m u, v with |u| = |v| 3 |u| u, v |u| + |v| + gcd(|u| , |v|) ≤ 2(|u| + |v|) Binary Search u, x (searched element) log |u| + 1 Minimum Search u |u| − 1 m(m−1) m(m+1) Insertion Sort u, let m = |u| + (m − 1) −1 2 2
248
P.-S. Kim and A. Kutzner
of these algorithms except for “Minimum Search” can be found in [6]. In the case of “Minimum Search” we assume that u is unsorted, therefore a linear search is necessary. First we will now show that there is a simple stable merging algorithm called Block-Rotation-Merge that is asymptotically fully optimal for any ratio √ k ≥ m. Afterward we will prove that there is a relation between the number of different elements in the shorter input sequence u and the number of assignments performed by the rotation based variant of Hwang and Lin’s algorithm [5]. Algorithm 1: Block-Rotation-Merge (u, v, δ) 1. We split the sequence u into blocks u1 u2 . . . u m so that all sections u2 to δ u m are of equal size δ and u1 is of size m mod δ. Let xi be the last element δ of v of ui (i = 1, · · · , m δ ). Using binary searches we compute a splitting m into sections v1 v2 . . . v m so that vi < xi ≤ vi+1 (i = 1, · · · , δ − 1). δ 2. u1 u2 . . . u m v1 v2 . . . v m is reorganized to δ δ u1 v1 u2 v2 . . . u m v m using m δ − 1 many rotations. δ δ calls of the rotation based variant 3. We locally merge all pairs ui vi using m δ of Hwang and Lin’s algorithm ([5]). The steps 2 and 3 are interlaced as follows: After creating a new pair ui vi (i = 1, · · · , m δ ) as part of the second step we immediately locally merge this pair as described in step 3. 2
Lemma 1. Block-Rotation-Merge performs m 2·δ + 2n + 6m + m · δ many assignments at most if we use the optimal algorithm from Dudzinski and Dydek [3] for all block-rotations. Proof. For the first rotation from u1 u2 · · · u m v1 to u1 v1 u2 · · · u m the alδ
δ
gorithm performs |u2 | + · · · + |u m | + |v1 | + gcd(|u2 | + · · · + |u m |, |v1 |) asδ δ signments. The second rotation from u2 u3 · · · u m v2 to u2 v2 u3 · · · u m requires δ
δ
|u3 |+· · ·+|u m |+|v2 |+gcd(|u3 |+· · ·+|u m |, |v2 |) assignments, and so on. For δ δ the last rotation from u m −1 u m v m −1 v m to u m −1 v m −1 u m v m δ
δ
δ
δ
δ
δ
δ
δ
the algorithm requires |u m | + |v m −1 | + gcd(|u m |, |v m −1 |) assignments. δ δ δ δ 2 Additionally m δ (3δ + 3δ + δ ) = 6m + m · δ assignments are required for the local m merges. Altogether the algorithm performs δ · (( m δ − 1) + ( δ − 2) + · · · + 1) + m2 m m2 n + n + 6m + m · δ = 2·δ − 2 + 2 · n + 6m + m · δ ≤ 2·δ + 2 · n + 6m + m · δ assignments at most. Lemma 2. Block-Rotation-Merge is asymptotically optimal regarding the number of comparisons. √ Corollary 1. If we assume a block-size of √
m then Block-Rotation-Merge is asymptotically fully optimal for all k ≥ m.
Ratio Based Stable In-Place Merging
249
√ So, for k ≥ m there is a quite primitive asymptotically fully optimal stable in-place merging algorithm. In the context of complexity deliberations in the next section we will rely on the following Lemma. Lemma 3. Let λ be the number of different elements in u. Then the number of assignments performed by the rotation based variant of Hwang and Lin’s algorithm is O(λ · m + n) = O((λ + k) · m). Proof. Let u = u1 u2 . . . uλ , where every ui (i = 1, · · · , λ) is a maximally sized section of equal elements. We split v into sections v1 v2 . . . vλ vλ+1 so that we get vi < ui ≤ vi+1 (i = 1, · · · , λ). (Some vi can be empty.) We assume that Hwang and Lin’s algorithm already merged a couple of section and comes to the first elements of the section ui (i = 1, · · · , λ). The algorithm now computes the section vi and moves it in front of ui using one rotation of the form · · · ui . . . uλ vi · · · to · · · vi ui . . . uλ · · ·. This requires |ui | + · · · + |uλ | + |vi | + gcd(|ui | + · · · + |uλ |, |vi |) ≤ 2(m+|vi |) many assignments. Afterward the algorithm continues with the second element in ui . Obviously there is nothing to move at this stage because all elements in ui are equal and the smaller elements from v were already moved in the step before. Because we have only λ different sections we proved our conjecture. Corollary 2. Hwang and Lin’s algorithm is fully asymptotically optimal if we have either k ≥ m or k ≥ λ where λ is the number of different elements in the shorter input sequence u .
3
Novel Asymptotically Optimal Stable In-Place Merging Algorithm
We will now propose a novel stable in-place merging algorithm called StableOptimal-Block-Merge that is fully asymptotically optimal for any ratio. Notable properties of our algorithm are: It does not rely on the block management techniques described in Mannila and Ukonnen’s work [8] in contrast to all other such algorithms proposed so far. It degenerates to the simple Block-Rotation√ Merge algorithm for roughly k ≥ m/2 . The internal buffer for local merges and the movement imitation buffer share a common buffer area. The two operations “block rearrangement” and “local merges” stay separated and communicate using a common block distribution storage. There is no lower bound regarding the size of the shorter input sequence. Algorithm 2: Stable-Optimal-Block-Merge Step 1: Block distribution storage assignment √ Let δ = m be our block-size. We split the input sequence u into u = s1 ts2 u so that s1 and s2 are two sequences of size m/δ + n/δ and t is a sequence of maximal size with elements equal to the last element of s1 . We assume that there are enough elements to get a nonempty u and call s1 together with s2 our block distribution storage (in the following shortened to bd-storage).
250
P.-S. Kim and A. Kutzner elements originating from u s1
s2
t
w
b
block distribution storage
v
buffer (all elements in this area are distinct)
Fig. 1. Segmentation after the buffer extraction
Step 2: Buffer extraction In front of the remaining sequence u we extract an ascending sorted buffer b of size δ so that all pairs of elements inside b are distinct. Simple techniques how to do so are proposed e.g. in [9] or [6]. Once more we assume that there are enough elements to do so. Now let w be the remaining right part of u after the buffer extraction. The segmentation of our input sequences after the buffer extraction is shown in Fig. 1. Step 3: Block rearrangement We logically split the sequence wv into blocks of equal size δ as shown in Fig. 2 (a). The two blocks w1 and v n +1 are undersized and can even be empty. δ In the following we call every block originating from w a w-block and every block originating from v a v-block. The minimal w-block of a sequence of w-blocks is always the w-block with the lowest order (smallest elements) regarding the original order of these blocks. We rearrange all blocks except of the two undersized blocks w1 and v n +1 , δ so that the following 3 properties hold: (1) If a v-block is followed by a w-block, then the the last element of the v-block must be smaller than the first element of the w block (Fig. 2(b)). buffer area (here used as movement imitation buffer) w1
(a)
w2
wr−1
v-block
δ
≤
< (b)
v n
v1
wr
w-block
w-block
v-block
> (elements exchanged) (c)
v-block
w-block
v-block
first and second section of the block distribution storage
Fig. 2. Graphical remarks to the block rearrangement process
v n +1 δ
Ratio Based Stable In-Place Merging
251
(2) If a w-block is followed by a v-block, then the first element of the w-block must be smaller or equal to the last element of the v-block (Fig. 2(b)). (3) The relative order of the v-blocks as well as w-blocks stays unchanged. This rearrangement can be easily realized by “rolling” the w-blocks through the v-blocks and “drop” minimal w-blocks so that the above properties are fulfilled. During this rolling the w-blocks stay together as group but they can be moved out of order. So, due to the need for stability, we have to track their positions. For this reason we mirror all block replacements in the buffer area using a technique called movement imitation (The technique of movement imitation is described e.g. in [10] and [6]). Every time when a minimal w-block was dropped we can find the position of the next minimal block using this buffer area. Later we will have to find the positions of w-blocks in the block-sequence created as output of the rearrangement process. For this purpose we store the positions of w-blocks in the block distribution storage as follows: The block distribution storage consist of two sections of size m/δ+ n/δ and the i-th element of the first section together with the i-th element of the second section belong to the i-th block in the result of the rearrangement process. Please note that, due to the technique used for constructing the bd-storage, such pairs of elements are always different with the first one smaller than the second one. If the i-th block originates from w we exchange the corresponding elements in the bd-storage otherwise we leave them untouched. Fig. 2(c) shows this graphically. Step 4: Local merges We visit every w-block and proceed as follows: Let p be the w-block to be merged and let q be the sequence of all v-originating elements immediately to the right of p that are still unmerged. Further let x be the first element of p. (1) Using a binary search we split q into q = q1 q2 so that we get q1 < x ≤ q2 . It holds |q1 | < δ due to the block rearrangement applied before. (2) We rotate pq1 q2 to q1 pq2 . (3) We locally merge p and q2 by Hwang and Lin’s algorithm, where we use the buffer area as internal buffer. This visiting process starts with the rightmost w-block and moves sequentially w-block by w-block to the left. The positions of the w-blocks are detected using the information hold in the bd-storage. Every time when we locate the position of a w-block in the bd-storage we bring the corresponding bd-storage elements back to their original order. So, after finishing all local merges both sections of the bd-storage are restored to their original form. Step 5: Final sweeping up On the left there is a still unmerged sub-sequence s1 ts2 bw1 v where v is the subsection of v that consists of the remaining unmerged elements. We proceed as follows: (1) We split v into v = v1 v2 so that v1 < x ≤ v2 where x is the last element of s2 . Afterward we rotate bw1 v1 v2 to v1 bw1 v2 and locally merge w1 and v2 using Hwang and Lin’s algorithm with the internal buffer. (2) In the same v1,2 so that we get v1,1 < y ≤ v1,2 where y is the way we split v1 into v = v1,1 last element of s1 . We rotate s1 ts2 v1,1 v1,2 to s1 v1,1 ts2 v1,2 and locally merge s1
252
P.-S. Kim and A. Kutzner
with v1,1 and s2 √ with v1,2 ’ using the Block-Rotation-Merge algorithm with a block-size of m. (3) We sort the buffer area using Insertion-Sort and merge it with all elements right of it using the rotation based variant of Hwang and Lin’s algorithm.
Lack of Space in Step 1 The inputs are so asymmetric that u becomes empty. Using a binary search we split v into v = v1 v2 so that we get v1 < t ≤ v2 and rotate s1 ts2 v1 v2 to√s1 v1 ts2 v2 . Using the Block-Rotation-Merge algorithm with a block-size
m we locally merge s1 with v1 and s2 with v2 . If s2 is empty we ignore it and directly merge s1 with v in the same style. √ Extracted buffer smaller than m in Step 2 √ We assume that we could extract a buffer of size λ with λ < m. We change our block-size δ to |u| /λ and apply the algorithm as described but with the modification that we use the rotation based variant of Hwang and Lin’s algorithm for all local merges. Corollary 3. Stable-Optimal-Block-Merge is stable. Theorem 1. The Stable-Optimal-Block-Merge algorithm requires O(m+ n) = O(m · (k + 1)) assignments.. Proof. It is enough to prove that every step is performed with O(m + n) assignments. In the first step no assignments occur at all. The buffer extraction in step 2 requires O(m) assignments, as shown in [6]. In step 3 the “rolling” of the w-blocks through√the √ v-blocks together with the “dropping” of the minimal wblocks requires 3 m·( m+ √nm ) = O(m+n) assignments. The rotations for the √ √ integrated “movement imitation” contribute O( m·( m+ √nm )) = O(m+n) assignments. The marking of the positions of the w-blocks in the bd-storage needs √ assignments. In step O( m) assignments. So, altogether √ √ √ step√3 requires√O(m+n) 4 each w-block rotation requires m + √m +√gcd( m, m) = 3 m assignments at most. So all w-block rotations need 3 m· m = O(m) assignments. The local mergings using Hwang and Lin’s algorithm consume 2m + n assignments altogether. The reconstruction√of the original order of the exchanged elements in the bd-storage contributes O( m) assignments. In step 5 the first rotation √ √ requires 4 m assignments at most and the local merging√of w1 and v2 needs 3 m assignments at most. The second rotation requires 3 m + √nm assignments at most. √ √ The success in step 1 implies that roughly k ≤ m/2, so we get m ≤ m. √ k · m+n Further we have m/δ + n/δ is roughly equal to (k + 1) · m = √m . So, according to Lemma 1 each local merging with Block-Rotation-Merge needs √ √ √ √ √ (k m)·k m √ + 2 · n + 6k m + k( m m) ≤ k·m 2 + 2n + 6m + k · m assignments at 2· m most. The buffer sorting using insertion sort contributes O(m)√assignments and the final call of Hwang and Lin’s algorithm requires n + m + m assignments. So, step 5 needs altogether O(m + n) assignments as well. √ In the first exceptional case “Lack of Space in Step 1” we have roughly k ≥ m/2 and directly switch to Block-Rotation-Merge. According to Corollary 1 Block-Rotation-Merge is fully asymptotically optimal for such k.
Ratio Based Stable In-Place Merging
253
√ In the second exceptional case “Extracted √ buffer smaller than m” we change the block-size to |u| /λ with λ < m and use the rotation based variant of Hwang and Lin’s algorithm for local merges. A recalculation of the steps 3 to 5, were we use Lemma 3 in the context of all local merges, proves that the number of assignments is still O(m + n) . n n Lemma 4. If k = i=1 ki for any ki > 0 and integer n > 0, then i=1 log ki ≤ n log(k/n).
Proof. It holds because the function log x is concave. Theorem 2. The Stable-Optimal-Block-Merge n + 1)) = O(m log(k + 1)) comparisons. O(m log( m
algorithm
requires
Proof. As in the case of the assignments it is enough to show that every step keeps the asymptotic optimality. Step 1 contains one binary search over m merely. The buffer extraction in step 2 requires m comparisons at most, as shown in [6]. The rearrangement of all blocks except of the two undersized blocks w1 and v n +1 in δ √ step 3 requires 2 m + √nm comparisons at most. The detection of the minimal el√ √ ement in the movement imitation buffer demands m · m many comparisons √at √ most. In step 4 the binary searches for splitting the q-sequences cost m · log m comparisons at most. Now let (m1 , n1 ), (m2 , n2 ), · · · , (mr , nr ) be the sizes of all r-groups that are locally merged√by Hwang and Lin’s algorithm. According to r ni ) + 1) + Lemma 4, Table 1 and since r < m this task requires i=1 (mi (log( m i r r r ni ni mi ) = i=1 (mi log( mi ) + 2mi ) ≤ i=1 mi log( mi ) + 2m = i=1 (mi log ni − √ r √ √ √ mi log mi )+2m = m i=1 (log ni −log mi )+2m ≤ m( m log nr − m log m r )+ n n 2m ≤ m(log( m + 1)) + 2m = O(m log( m + 1)) comparisons. The asymptotic optimality in step 5 as well as in the exceptional case “Lack of Space in Step 1” is obvious due to Lemma 2. The change√of the block-size in the second exceptional case “Extracted buffer smaller than m” triggers a simple recalculation of step 3 and step 4, where we leave the details to the reader. Corollary 4. Stable-Optimal-Block-Merge is an asymptotically fully optimal stable in-place merging algorithm. Table 2. Pseudo-code Definitions of the Toolbox Algorithms Pseudo-code Definition
Description of the Arguments
Hwang-And-Lin(A, f irst1, f irst2, last) u is in A[f irst1 : f irst2 − 1], v is in A[f irst2 : last − 1] Binary-Search(A, f irst, last, x) delivers the position of the first occurrence of x in A[f irst : last−1] Minimum(A, pos1, pos2) delivers the index of the minimal element in A[pos1 : pos2 − 1] Block-Swap(A, pos1, pos2, len) u is in A[pos1 : pos1 + len − 1], v is in A[pos2 : pos2 + len − 1] Block-Rotate(A, f irst1, f irst2, last) u, v as in Hwang-And-Lin Exchange(A, pos1, pos2) is equal to Block-Swap(A, pos1, pos2, 1).
254
P.-S. Kim and A. Kutzner
Algorithm 1. Pseudo-code of the procedure for the block rearrangement Rearrange-Blocks(A, f irst1, f irst2, last, buf, bds1, bds2, blockSize) 1 w2 . . . wx is in A[f irst1 : f irst2 − 1], v1 . . . vy−1 is in A[f irst2 : last − 1] √ 2 buffer b is in A[buf : buf + m − 1] √ √ 3 bd-storage s{1|2} is in A[bds{1|2} : bds{1|2} + m + n/ m − 1] 4 5 buf End ← buf + (f irst2 − f irst1) / blockSize 6 minBlock ← f irst1 7 while f irst1 < f irst2 8 do if f irst2+blockSize < last and A[f irst2+blockSize−1] < A[minBlock] 9 then Block-Swap(A, f irst1, f irst2, blockSize) 10 Block-Rotation(A, buf, buf + 1, buf End) 11 if minBlock = f irst1 12 then minBlock ← f irst2 13 f irst2 ← f irst2 + blockSize 14 else Block-Swap(A, minBlock, f irst1, blockSize) 15 Exchange(A, buf, buf + (minBlock − f irst1) / blockSize) 16 Exchange(A, bds1, bds2) 17 buf ← buf + 1 18 if buf < end 19 then minIndex ← Minimum(A, buf, buf End) 20 minBlock ← f irst1 + (minIndex − buf ) ∗ blockSize 21 bds1 ← bds1 + 1; bds2 ← bds2 + 1 22 f irst1 = f irst1 + blockSize
Pseudo-code implementations for the core operations “block rearrangement” and “local merges” are given in Alg. 1 and Alg. 2, respectively. Both code segments contain calls of the toolbox algorithms mentioned in section 2. The Pseudo-code definitions for these toolbox algorithms are summarized in Tab. 2. 3.1
Optimizations
We now report about several optimizations that help improving the performance of the algorithm without any impact on its asymptotic properties. The immediate mirroring of all w-block movements in the movement imitation buffer (occurs in Step 3) triggers a rotation (line 10 in Alg. 1) every time when a v-block is moved into front of the group of w-blocks. The number of necessary rotations can be reduced by first counting the number of v-blocks moved into front of the wblocks. This counting follows a single update of the movement imitation buffer if the placement of a minimal w-block happens. In the context of the movement of v-blocks into front of w-blocks (Step 3) the floating hole technique (for a description see [4] or [6]) can be applied for reducing the number of assignments. Similarly the floating hole technique can also be applied during the local merges (Step 4) by combining the block swap to the internal buffer with the rotation
Ratio Based Stable In-Place Merging
255
Algorithm 2. Pseudo-code of the function for local merges Local-Merges(A, f irst, last, buf, bds1, bds2, blockSize, numBlocks) 1 A[f irst : last − 1] contains all blocks in distributed form 2 3 index ← ((last − f irst) / blockSize) − 1 4 while numBlocks > 0 5 do while A[bsd1 + index] < A[bsd2 + index] 6 do index ← index − 1 7 f irst2 ← f irst + ((index + 1) ∗ blockSize) 8 if f irst2 < last 9 then b ← Binary-Search(f irst2, last, A[f irst2 − blockSize]) 10 Block-Rotation(A, f irst2 − blockSize, f irst2, b) 11 Hwang-Lin(A, b − blockSize, b, last, buf ) 12 last ← b − blockSize 13 Exchange(A, bds1 + index, bds2 + index) 14 numBlocks ← numBlocks − 1; index ← index − 1 15 return last
that moves smaller v-originating elements to √ the front of the w-block. In the special case “Extracted buffer smaller then m” the sorting of the buffer b in Step 5 is unnecessary because the buffer is already sorted after Step 3 and stays unchanged during Step 4. Insertion-Sort can be replace by some more efficient sorting algorithm. Please note that there is no need for stability in the context of the buffer sorting because all buffer elements are distinct.
4
Experimental Work
We did some experimental work with our algorithm in order to get an impression of its performance. We compared it with the stable fully asymptotically optimal algorithm presented in [6] as well as the simple standard algorithm that relies Table 3. Practical comparison of various merge algorithms n m Stable-O.-B.-Merge SOFSEM 2006 Alg. Linear Standard Alg. #comp #assign te #comp #assign te #comp #assign te 221 221 5843212 37551852 227 5961524 49666369 335 4194239 8388608 121 221 218 1500433 15866835 100 1505766 17182008 122 2359288 4718592 71 221 215 280611 17350896 87 280412 12681115 68 2129890 4259840 64 221 212 43611 4422493 35 47330 10512479 53 2100804 4202496 63 223 29 8057 16350956 133 8589 38150052 202 8373039 16778240 251 223 26 1200 15459824 131 1271 30749720 161 8234508 16777344 254 223 23 172 11322991 119 170 7535160 68 7572307 16777232 301 223 20 23 4163489 55 24 4163489 55 4225121 16777218 261 te : Execution time in ms, #comp : Number of comp., m, n : Lengths of inp. seq.
256
P.-S. Kim and A. Kutzner
on external space of size m. The results of our experimental work summarizes Tab. 3 where every line shows average values for 50 runs with different data. We took a standard desktop computer with 2GHz processor as hardware platform. All coding happened in the C programming language. For the measurement of the number of assignments we applied the optimal block rotation algorithm presented in [3]. Although this algorithm is optimal regarding the number of assignments it is quite slow in practice due to its high computational demands. Therefore for the time measurements we applied a block-swap based algorithm presented e.g. in [1] using identical data. Regarding the buffer extraction (Step 2) there are several alternatives. The extraction process can be started from the left end as well as from the right end of the input and we can choose between a binary search and linear search for the determination of the next element. All 4 possible combinations keep the asymptotic optimality. However, there is no clear “best choice” among them because the most advantageous combination can vary depending on the structure of the input. In the context of the Stable-Optimal-Block-Merge algorithm we decided for the variant “starting from the left combined with binary search”, the SOFSEM 2006 algorithm already originally chose “starting from the right combined with linear search”. Except for two combinations of input sizes our new algorithm is always faster than its predecessor. The bad performance in the case (221 , 215 ) reflects the lack of the implementation of the floating hole technique as mentioned in the section about optimizations. The application of Block-Rotation-Merge triggers unnecessary rotations in the case (223 , 23 ). This can be fixed by introduction of a check whether k ≥ m and a direct switch to the rotation based variant of Hwang and Lin’s algorithm if true.
5
Conclusion
We investigated the problem of stable in-place merging from a ratio based point n of view by introducing a ratio k = m , where m,n are the sizes of the input sequences with m ≤ n. We could show that there is a simple asymptotically fully optimal (optimal regarding the number of comparisons √ as well as assignments) stable in-place merging algorithm for any ratio k ≥ m. In the second part of this paper we introduced a novel asymptotically fully optimal stable in-place merging algorithm which is constructed on the foundation of deliberations regarding the ratio of the input sizes. Highlights of this algorithm are: It has a modular structure and does not rely on techniques described by Mannila and Ukkonen [8] in contrast to all its known competitors ([10,4,6]). The tasks “block-distribution” and “local block mergings” are modular separated. As side effect they can share a common buffer area and the extraction of a separated movement imitation buffer is not necessary. The algorithm demands no lower bound for the size of the shorter input sequence (32 elements in case of the alg. in [4] and 10 elements for the alg. in [6]).
Ratio Based Stable In-Place Merging
257
Our algorithm performs for a wide range of inputs remarkably better than its direct competitor presented in [6]. There is a superiority in particular for symmetrically sized inputs, a fact that is of importance in the context of the Merge-sort algorithm. The number of comparisons and assignments are good measurements for the efficiency of merging algorithms. However, the impact of other operations as e.g. numerical calculations and index comparisons deserves investigation as well. As motivation we would like to refer to a well known effect with the optimal blockrotation algorithm introduced by Dudzinski and Dydek in [3]. Their algorithm is optimal regarding the number of assignments but has a bad performance due to a included computation of a greatest common divisor. For our further work we plan to include deliberations regarding such so far uncounted operations.
References 1. Bentley, J.: Programming Pearls, 2nd edn. Addison-Wesley, Reading (2000) 2. Chen, J.: Optimizing stable in-place merging. Theoretical Computer Science 302(1/3), 191–210 (2003) 3. Dudzinski, K., Dydek, A.: On a stable storage merging algorithm. Information Processing Letters 12(1), 5–8 (1981) 4. Geffert, V., Katajainen, J., Pasanen, T.: Asymptotically efficient in-place merging. Theoretical Computer Science 237(1/2), 159–181 (2000) 5. Hwang, F.K., Lin, S.: A simple algorithm for merging two disjoint linearly ordered sets. SIAM J. Comput. 1(1), 31–39 (1972) 6. Kim, P.-S., Kutzner, A.: On optimal and efficient in place merging. In: Wiedermann, J., Tel, G., Pokorný, J., Bieliková, M., Štuller, J. (eds.) SOFSEM 2006. LNCS, vol. 3831, pp. 350–359. Springer, Heidelberg (2006) 7. Knuth, D.E.: The Art of Computer Programming. Sorting and Searching, vol. 3. Addison-Wesley, Reading (1973) 8. Mannila, H.: A simple linear-time algorithm for in situ merging. Information Processing Letters 18, 203–208 (1984) 9. Pardo, L.T.: Stable sorting and merging with optimal space and time bounds. SIAM Journal on Computing 6(2), 351–372 (1977) 10. Symvonis, A.: Optimal stable merging. Computer Journal 38, 681–690 (1995)
A Characterisation of the Relations Definable in Presburger Arithmetic Mathias Barra Dept. of Mathematics, University of Oslo, P.B. 1053, Blindern, 0316 Oslo, Norway [email protected] http://folk.uio.no/georgba
Abstract. Four sub-recursive classes of functions, B, D, BD and BDD are defined, and compared to the classes G0 , G1 and G2 , originally defined by Grzegorczyk, based on bounded minimalisation, and characterised by Harrow in [5]. B is essentially G0 with predecessor substituted for successor ; BD is G1 with (truncated) difference substituted for addition. We prove that the induced relational classes are preserved (G0 = B and G1 = BD ). We also obtain D = PrAqf (the quantifier free fragment of Presburger Arithmetic), and BD = PrA , and BDD = G2 , where BDD is G2 with integer division and remainder substituted for multiplication, and where G2 is known to be equal to the predicates definable by a bounded formula in Peano Arithmetic.
1
Introduction and Notation
Introduction: This paper emerges from some investigations into (very) small sub-recursive classes. The original motivation has been to discard all increasing functions, and to compare the resulting classes to otherwise similar classes. This approach—banning all growth—has proved successful in the past, and has repeatedly yielded surprising and enlightening results, see e.g. Jones [7,8]; Kristiansen and Voda in [12,13] (with functionals of higher types, and imperative programming languages respectively); Kristiansen and Barra [11] (with function algebras and λ-calculus); and Kristiansen [9,10]. The author has been interested in further investigations into the consequences of omitting increasing functions from the set of initial functions of various inductively defined classes (see definitions below). The seminal paper by A. Grzegorczyk Some classes of recursive functions [4] from 1953 was the source of great inspiration to many researchers during the decades to follow. One significant contribution emerged with Harrow’s Ph.D. dissertation Sub-elementary classes of functions and relations [5], and his findings were later summarised and enhanced in Small Grzegorczyk classes and limited minimum [6]. He there answered several questions (in the negative), originally posed by Grzegorczyk, and with regard to the interchangeability of the schemata bounded primitive recursion and bounded minimalisation. Another result from [5] is that G1 (this and other classes mentioned below are defined in the sequel) M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 258–269, 2008. c Springer-Verlag Berlin Heidelberg 2008
A Characterisation of the Relations Definable
259
is identical to the set of predicates PrA , i.e. those subsets of INk definable by a formula in the language of Presburger Arithmetic. More precisely, for i = 0, 1, 2, we study the three classes Gi which are defined analogously to Grzegorczyk’s E i , but where the schema of bounded minimalisation is substituted for bounded primitive recursion. These classes contain the increasing functions successor (in G0 and E 0 ), addition (in G1 and E 1 ) and multiplication (in G2 and E 2 ). We will show that the classes Gi contain redundancies, in the sense that the increasing functions ‘S’, ‘+’ and ‘×’ can be substituted with their nonincreasing inverses predecessor, (truncated) difference, and integer division and remainder, without affecting the induced relational classes Gi . That is, the growth provided by e.g. addition, in the restricted framework of composition and bounded minimalisation, does not contribute to the number of computable predicates. In fact, we show that the quantifier free fragment of Presburger Arithmetic may be captured in a much weaker system: essentially only truncated difference and composition is necessary. Notation: Unless otherwise specified, a function means a function f : INk → IN The arity of f , denoted ar(f ), is then k. When F is a set of functions, F k denotes the k-ary functions in F . A function is nonincreasing if, for some cf ∈ IN we have1 f (x) ≤ max(x, cf ) for all x ∈ INk . We say that f has top index i if f (x) ≤ max(xi , cf ). If cf = 0, we say that f is strictly nonincreasing, and that i is a strict top index. def The bounded f , denoted fˆ, is the (ar(f )+1)-ary function fˆ(x, b) = min(f (x), b). These bounded versions, in particular the bounded versions of increasing funcˆ the bounded successor function, will be of major importance for the tions like S, def ensuing developments. The predecessor function, denoted P, is defined by P(x) = max(x − 1, 0). The case function, denoted C, and the (truncated) difference func· are defined by tion, denoted −, x , if z = 0 0 , if x ≤ y def def def · C(x, y, z) = and x −y = max(x − y, 0) = y , else x − y , if x > y When we use the symbol ‘−’ without the dot in an expression or formula, we mean the usual minus on ZZ. def Let φ(x, y, n, r) ⇔ 0 ≤ r < y ∧ x = ny + r. The remainder function and integer division function, denoted rem and ·· respectively, are defined by x def x , if y = 0 x , if y = 0 def = and rem(x, y) = , n , if φ(x, y, n, r) r , if φ(x, y, n, r) y x 2 respectively. We define e.g. 0 = x in order to make the functions total on IN . Note that we have xy y + rem(x, y) = x for all x and y. 1
The bound which holds for f in G0 and E 0 is f ( x) ≤ max( x)+cf ; note the distinction. x) The latter bound is sometimes referred to as 0-boundedness, while f ( x) ≤ cf max( as 1-boundedness.
260
M. Barra
I is the set of all projections Iki (x) = xi , and N is the set of all constant functions c(x) = c for all c ∈ IN. An inductively defined class of functions (idc.), is generated from a set X called the initial, primitive or basic functions, as the least class containing X and closed under the schemata, functionals or operations of some set op of functionals. We write [X ; op] for this set2 . The schemata composition, denoted comp, and bounded minimalisation 3 , denoted bmin, will be used extensively. We write h ◦ g for the function f (x) = h(g1 (x), . . . , gm (x)), and μz≤y [g1 = g2 ] for the function f (x, y) which equals the least z ≤ y satisfying the equation g1 (x, z) = g2 (x, z), if such exists, and 0 else. A relation is a subset R of INk for some k. Relations are interchangeably called predicates. Sets of predicates are usually subscripted with a . For a set F of relations, we say that F is boolean, when F is closed under finite intersections and complements. When F is some set of relations of varying arity, Fk denotes the k-ary relations of F . When R = f −1 (0), the function f is referred to as a characteristic function of R, and is denoted χR . Let F be a set of functions. F denotes the set of relations of F , viz. those subsets R ⊆ INk with χR ∈ F. Formally Fk = {f −1 (0) ⊆ INk |f ∈ F k } and F = Fk . k∈IN
The graph of f , denoted Γf , is the relation {(x, y) ∈ INar(f )+1 |f (x) = y}. We overload Γf to also denote its characteristic function. Whenever a symbol occurs under an arrow, e.g. x, we usually do not point out the length of the list—by convention it shall be k unless otherwise specified.
2
The Classes B and G0
def ˆ comp, bmin]. Note that Let the class B be defined by B = [I ∪ N ∪ {P, C}; def ˆ ˆ the case function C takes four arguments: C(x, y, z, b) = min(C(x, y, z), b). G0 is defined by (in our notation) [I ∪ {0, S, P}; comp, bmin]. The essential difference between the two classes is that G0 includes the increasing initial function S, whereas B only includes nonincreasing initial functions. It is clear that N ⊆ G0 , ˆ ∈ G0 . Hence B ⊆ G0 . The inclusion is strict, since and it is easy to prove that C all functions in B satisfy
Lemma 1 (top index). Let f ∈ B k . Then f has a top index. Furthermore, if f (INk ) is infinite, the top index is strict.
We skip the proof, which is easy and by induction on f ∈ B. Observe next that · ˆ 1, f1 , f2 ) = 0 ⇔ f1 = 0 ∨ f2 = 0. Since B is closed ˆ 0, f, 1) = 1 −f and C(0, C(1, under bmin by definition, we obtain 2 3
This notation is adopted from Clote [2], where an i.d.c is called a function algebra. a.k.a. bounded search a.k.a. limited minimum.
A Characterisation of the Relations Definable
Proposition 1. B is boolean, and is closed under ∃y≤x type quantifiers.
261
ˆ amongst B’s initial functions4 . As the main This explains the inclusion of C result of this section will show, the only essential use of S in [6] is to produce · via the successor and bounded minimalisation. functions like e.g. 1 −x Harrow proves in [6] that G0 = E 0 ∩ PL, where PL is the set of piecewise linear functions. This class is defined as the functions which may be put on the form f (x) = Li (x) ⇔ Pi (x), where each Li is either constant or of the form · xj −c or xj + c, and where the Pi are conjunctions of predicates of the form Lr ≤ Ls . Note that there should be a finite number of clauses5 . Note that I11 is a characteristic function for the relation ‘equals zero’. Also, x < y ⇔ y = 0 ∧ (x = 0 ∨ μz≤y [P(z) = x] = 0), and x = y ⇔ ¬(x < y ∨ y < x). def ˆ b) = C(b, ˆ f, f, b), and so Importantly, for f = μz≤b [P(z) = x], we have S(x, ˆ x + 1 = y ⇔ x < y ∧ S(x, y) = y. Clearly this generalizes to relations x + c = y for c ∈ IN. Combining the above, it is not hard to see that we may decide in B whether P (x) for any relation P of the form specified in the definition of PL, and—for ditto L—whether L(x) = y.
But now, for f ∈ PL, clearly f (x) = y ⇔ j (Pj (x) ∧ Lj (x) = y), with the latter relation in B , whence Theorem 1. B = G0 = PL . Proof. Since B ⊆ G0 , inclusion to the right is trivial. If R ∈ G0 , by definition R = f −1 (0) for some f ∈ G0 ⊆ PL. Hence x ∈ R ⇔ f (x) = 0 ∈ B .
Hence, only nonincreasing functions are necessary to carry out the analysis. In section 4 we show the analogous result holds also of Harrow’s G1 . We first dedicate some time to the truncated difference function. As the results of the next section show, it is surprisingly strong from a computational point of view.
3
The Class D
· Define D = [I ∪ N ∪ { −}; comp]. The lemma below is the starting point for showing that the class D is surprisingly powerful. The reader is asked to note that composition is the sole closure operation of D. A top-index lemma also holds for D. def
Lemma 2 (top index). Let f ∈ Dk . Then f has a top index. Furthermore, if f (INk ) is infinite, the top index is strict.
4
5
ˆ cannot be omitted, e.g. proposition 1 would not hold. We also conjecture that C However, we can prove that with an alternative version of bmin, which returns ‘y’ rather than ‘0’ upon a failed search, I ∪ N suffice as initial functions. The proofs of this claim are straightforward, but involved. 5 , if x1 ≤ x2 ∧ 3 ≤ x1 is a member of PL, since the ‘else E.g. f (x1 , x2 ) = x2 + 4 , else clause’ can be split into the two clauses x2 + 1 ≤ x1 and x1 ≤ 2.
262
M. Barra
Lemma 3 (bounded addition). The function min(x + y, z) belongs to D. · · · · · Proof. Set f (x, y, z) = z −((z −x) −y). If x + y ≥ z, then (z −x) −y = 0, which · · · yields z −((z −x) −y) = z − 0 = z On the other hand, if z > x + y ≥ x, then · · z −x = z − x > y > 0. Hence (z − x) −y = ((z − x) − y) > 0. But now z > (z − (x + y)) > 0, and so · · · −x) −y) = z − (z − (x + y)) = z − z + (x + y) = x + y.
z −((z def
This function is the key to proving several properties of D and D . Proposition 2. We have that (i) min(x, y) ∈ D; (ii) D is boolean; (iii) χ= , χ< ∈ D; (iv) Γmax , ΓC ∈ D; (v) If A or INk \ A is finite, then A ∈ D . Proof. Clearly min(x, y) = min(x + 0, y), thus (i). Next, given χR1 and χR2 we have that · χR1 ∩R2 = min(χR1 + χR2 , 1) and χINk \R = 1 −χ R1 · · hence (ii). Since x −y = 0 ⇔ x ≤ y and y −x = 0 ⇔ y ≤ x yields (iii) in conjunction with (ii). Next, max(x, y) = z ⇔ y ≤ x ∧ z = x ∨ x < y ∧ z = y . Also, C(x, y, z) = w ⇔ z = 0 ∧ w = x ∨ z < 0 ∧ w = y , hence (iv). (v) is proved by induction on n = min(|A|, |INk \ A|). We skip the details.
Armed with proposition 2, we may prove the following lemma Lemma 4. Let r ∈ {<, =}, and let 1 ≤ j < k ∈ IN be arbitrary. Then, the following relations belong to D : j
i=1
xi r
k
xi
i=j+1
k · Proof. Observe first that the function f (x, y ) = x − i=1 yi ∈ D, by the length of y consecutive applications of composition. Note also that when R(x) ∈ Dk , def and f ∈ Dm , then the relation S(y ), defined by S(y ) ⇔ R(f1 (y ), . . . , fk (y )), m belongs to D . We prove the lemma by induction on k. proposition 2 constitutes induction start. Note that k j k j k j i=1 xi = i=j+1 xi ⇔ ¬( i=1 xi < i=j+1 xi ) ∧ ¬ i=j+1 xi < i=1 xi . Hence, it is sufficient to perform the induction step when r is ‘<’. Moreover, as D is boolean, we may invoke the induction hypothesis (i.h.) for r ∈ {≥, ≤, <, >, =}. We proceed to the case k + 1, with 2 ≤ k, and we shall first prove the special cases of j = k. Clearly k
This shows that
k
i=1
i=1
· xi < xk+1 ⇔ 0 < xk+1 −
k
xi
.
i=1
xi < xk+1 ∈ D by the initial remarks.
A Characterisation of the Relations Definable
263
Next consider the general case 1 ≤ j < k. j
xi <
k
i=1 j i=1
xi + xk+1 ⇔
i=j+1
j j k
· xk+1 ≤ . xi < xk+1 ∨ xi ∧ xi −x xi k+1 <
i=1
ψ
i=1
i=j+1
φ
The conjunct marked φ above, needs special attention. Consider the function j
· gj (x1 , . . . , xj , xk+1 ) = ( xi ) −x k+1 . i=1
Now gj is not in D. However, we note that · · · · · · · · gj (x, y) = (x1 −y)+(x 2 −(y −x1 ))+· · ·+(xj −(y −x1 −x2 − · · · −xj−1 )) · · · )) (†) Furthermore, when ψ is true, we have that gj (x, y) = i xi − y. Importantly for us, the expression (†) is a sum of j summands, each summand being a Dfunction of the variables involved. Hence
ψ∧φ ⇔ ψ∧
j
i−1 k
· · xi − < xk+1 − x xi ,
i=1
=1
φ
i=j+1
which concludes the proof when r is ‘<’, since the φ is in D by the i.h. 3.1
Presburger Arithmetic and D
Let PrA be the 1st -order language {0, S, +, <, =} with the intended structure def N = (IN, 0, S, +, <)—the natural numbers with the usual order, successor and addition. Terms, (atomic) formulae, and the numerals m are defined in the standard way. As usual we use abbreviations extensively, e.g. t ≤ s for t = s ∨ t < s. Many readers will no doubt have recognized PrA as the language of Presburger Arithmetic, see e.g. Enderton [3, p. 188]. It is known that the theory N is decidable (we return to this point in section 4), and that it does not admit quantifier elimination. We overload the symbol PrA to also denote the set of PrA-formulae. For def def ∈ INk |N |= φ(m)}, and set PrA = φ(x) ∈ PrA, define Rφ ⊆ INk by Rφ = {m {Rφ |φ ∈ PrA}. PrAqf denotes the set of quantifier free PrA-formulae; PrAΔ0 Δ0 the Δ0 -formulae. PrAqf and PrA are defined as expected. It is well-known that for terms t ∈ PrA, we have N |= t = m for some m and that this m is unique. Also, any PrA-term t is clearly equivalent to some term t = ( i≤k ai xi ) + m, where the right side is shorthand for the term
264
M. Barra
(x1 + · · · + x1 ) + · · · + (xk + · · · + xk ) + m . a1 −times
ak −times
Thus, any atomic formula φ(x) ∈ PrA is equivalent to a formula of the form6
bi yi + b . (‡) ai xi + a r The main result of this section is Theorem 2. PrAqf = D . Last section’s lemma 4 provides us with most of the proof of PrAqf ⊆ D . Proof (of PrAqf ⊆ D ). Let R ∈ PrAqf . Then, by definition we have R = Rφ for some φ(x) ∈ PrAqf . Since D is boolean, it is sufficient to prove the lemma for φ atomic. Let φ(x) be atomic. Hence it is of the form specified in (‡) above. If we let R be the relation
def ai xi + z r bi yi + w , R(x, y ) = we have by lemma 4 that χR ∈ D. But then
χR (n, m, a, b) = 0 ⇔ ai n i + a r bi mi + b ⇔ N |= φ(n, m) , and so χR (x, y , a, b) = χRφ . Hence χRφ ∈ D.
To facilitate the proof of the opposite inclusion, we first consider the language · def · · PrA − = PrA ∪ { −, −}, viz. PrA augmented with new function symbols ‘ −’ def · · and ‘−’, and with intended model Z = (ZZ, 0, S, +, −, −, <). Note that − is well-defined on all of ZZ 2 by its original definition max(x − y, 0). We also just remark on the fact that for φ ∈ PrA variable free, we have Z |= φ ⇔ N |= φ. · Secondly, to every function f ∈ Dk , there is a PrA − -term tf (x) such that, for all n, m ∈ IN we have Z |= tf (n) = m ⇔ f (n) = m . ·
Lemma 5. Let φ(x) ∈ PrA − be atomic. Then, there is a φ (x) ∈ PrAqf such that, for all n ∈ IN we have N |= φ (n) ⇔ Z |= φ(n) . ·
Proof. For a PrA − -term t, we define rk −· (φ) to be the number of occurrences of · · the symbol − in t. Note that when φ ∈ PrA − is atomic it is of the form t1 r t2 , and that rk −· (t1 ) + rk −· (t2 ) = 0 implies that either φ ∈ PrAqf , or it is equivalent to φ ∈ PrAqf by basic arithmetical considerations. The proof is by induction on rk −· (t1 ) + rk −· (t2 ); the comments above effects the induction start. Induction step: Let rk −· (t1 ) + rk −· (t2 ) = + 1. At least one of the ti must · · (s) = 1, i.e. s may contain a sub-term s of the form s1 −s 2 , and with with rk − be chosen to satisfy rk −· (s1 ) = rk −· (s2 ) = 0. Next, consider the terms defined 6
We continue to use r as meta variable, ranging over {<, =} if not otherwise specified.
A Characterisation of the Relations Definable
265
by t− i = ti [s := s1 − s2 ] and ti = ti [s := 0], where ti [s := s ] denotes the result of substituting s for all occurrences of the sub-term s. We note that 0 · (t ) < rk − · (ti ) for at least one i since we have removed at least rk −· (t− i ) = rk − i · one occurrence of − from one of the ti in each construction. Thus we have 0 · (t2 ) = rk − · (t ) + rk − · (t2 ) ≤ , and moreover rk −· (t− 1 ) + rk − 1 s (x) < s (x) ∧ t0 (x) r t0 (x) 1 2 1 2 . t1 (x) r t2 (x) ⇔ − s2 (x) ≤ s1 (x) ∧ t− ( x ) r t x) 1 2 ( def
0
def
By the i.h., the relation t1 (x) r t2 (x) is in PrAqf , since each disjunct above is a · conjunction of atomic PrA − -terms of rank strictly less than + 1.
We are now ready to finish the proof of theorem 2. Proof (of D ⊆ PrAqf ). Let R ∈ Dk . Then R = f −1 (0) for some f ∈ Dk . Next, · fix a PrA − -term tf such that Z |= t(x) = y iff f (x) = y. Apply lemma 5 to obtain φ ∈ PrAqf satisfying N |= φ(x, y) ⇔ Z |= t(x) = y. But then also x ∈ f −1 (0) ⇔ N |= φ(x, 0), and thus f −1 (0) = Rφ ∈ PrAqf
. Considering that no increasing functions are available in D, and perhaps even more striking, only composition is allowed, the class D really delivers more than one—at a first glance—would expect7 . It is also of interest that when only composition is available, none of the standard linear initial functions add anything to D . More precisely · Theorem 3. [I ∪ N ∪ {max, min, C, S, P, +, −}; comp] = D .
Space does not permit a proof, but by theorem 2 it is sufficient to construct PrAqf formulae representing the graphs of the functions in this class—surely the reader can supply the details.
4
The Classes BD and G1
In this section we turn our attention to the class BD, the class obtained by adding the schema of bounded minimalisation to the closure operations of the def · class D. Thus, define BD = [I ∪ N ∪ { −}; comp, bmin]. Obviously D ⊆ BD. It is straightforward to show that lemma 1 is true also of f ∈ BD, in other words (x) ≤ max(xi , cf ) for some fixed i and cf , and cf = 0 when f (INk ) is infinite. Also, BD is boolean—since functions for intersection and complement are already in D—and closed under quantifiers bounded by a variable—since all classes F where F is closed under bmin share this feature. Define PrAΔV by PrAqf ⊆ PrAΔV , and φ ∈ PrAΔV ⇒ ∃z≤y φ ∈ PrAΔV . The ‘V ’ in ΔV is meant to reflect the requirement that the quantified variable be bounded by a variable, and not a general term, and is also known as finite or linear quantification. 7
In contrast, e.g. [I ∪ N ∪ {C}; comp] essentially consists of ∅, {0}, IN \ {0} and IN, (and their products), and most other familiar functions, like S, P or +, even fail to produce a boolean set of relations.
266
M. Barra
V Lemma 6. If f ∈ BD, then Γf ∈ PrAΔ .
Proof. By induction on f . That D = PrAqf effects the induction start. Case h ◦ g : Let ij (for 1 ≤ j ≤ m) be the strict top index of gj if gj (IN) is infinite, and max gj (IN) otherwise. Fix PrAΔV -formulae φh (z , y) and φj (x, zj ) representing the graphs of h and the gj ’s respectively. Then (h ◦ g)(x) = y ⇔ ∃z1 ≤t1 · · · ∃zm ≤tm φ(x, zi ) ∧ φh (z, y) , 1≤i≤m
where tj = xij for unbounded gj , and ij else. Clearly, quantification bounded by a constant is merely a finite disjunction, and as such even PrAqf is closed under ∃x≤c type quantifiers. Case μz≤y [g1 = g2 ]: Let φi represent gi (x) = y, and define ⎧ y ∧ ¬∃z≤y (φ1 (x, z) ∧ φ2 (x, z)) ⎨w = w = z ∧ φ1 (x, z) ∧ φ2 (x, z) φ(x, y, w) ⇔ ⎩ ∃z≤y ∀u≤z (¬φ1 (x, u) ∨ ¬φ2 (x, u) ∨ u = z) Then φ ∈ PrAΔV , and N |= φ(x, y, w) ⇔ f (x, y) = w as required.
Corollary 1. D BD =
V PrAΔ
.
The class G1 is defined by G1 = [I ∪ {0, S, +}; comp, bmin], and Harrow proved that G1 = PrA . Our next theorem is def
Theorem 4. BD = G1 = PrA . We obtain theorem 4 via the original proof that the theory of PrA is decidable. In 1930 Moj˙zes Presburger demonstrated this now well-known fact in8 ¨ Uber die Vollst¨ andigket eines gewissen Systems der Arithmetik ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt [15] by proving that the theory of the intended structure N≡ = (IN, 0, S, +, <, ≡2 , ≡3 , . . .) for the def language PrA≡ = PrA ∪ {≡2 , ≡3 , . . .} does admit quantifier elimination. In particular, for any PrA-formula φ, there is a φ ∈ PrA≡,qf such that N≡ |= φ ⇔ N≡ |= φ ⇔ N |= φ . The relation ‘congruence modulo λ’, is definable by x ≡λ y ⇔ ∃u≤x x = y + u + · · · + u ∨ ∃u≤y y = x + u + · · · + u . λ-terms
λ-terms
V Since the right side is clearly PrAΔ , by corollary 1 these predicates belong to BD for all λ ∈ IN. But we can do even better.
8
The author has consulted the excellent translation by D. Jaquette, On the Completeness of a Certain System of Arithmetic of Whole Numbers in Which Addition Occurs as the Only Operation [16].
A Characterisation of the Relations Definable
267
Lemma 7. Let p, q ∈ IN[x] be linear polynomials, and let λ ∈ IN. Then, the relation p(x) ≡λ q(x) is in BD . Proof. First note that the unary function remλ (x) = rem(x, λ) ∈ BD, for each fixed λ ∈ IN, since rem(x, λ) = μz≤λ [χ≡λ (x, z) = 0]. kp kp p ai xi + mp . Set Ap = i=1 ai , and note that Ap is indeWrite p(x) = i=1 pendent of x. Also, since remλ (x) < λ, we have def
p
s (x) =
kp
ai remλ (xi ) < Ap λ .
i=1
Similarly for q(x). Then p(x) ≡λ q(x) ⇔ sp (x) ≡λ sq (x) . It remains to show that the relation sp (x) ≡λ sq (x) is in BD . Since bounded def addition is in D, we also have sˆp (x, z) = min(sp (x), z) in BD. But then, for A = max(λAp , λAq ), we have p(x) ≡λ q(x) ⇔ sˆp (x, A) ≡λ sˆq (x, A).
Because lemma 7 yields a decision procedure for all atomic PrA≡ -formulae qf within BD , and since BD is boolean, we have PrA≡, ⊆ BD . Whence, via Presburger’s original results qf V = PrA ⊇ PrAΔ = BD , BD ⊇ PrA≡,
which constitutes a proof of theorem 4. Note that this also proves that V = PrA . Corollary 2. PrAΔ
5
The Classes BDD and G2
Let PA = PrA ∪ {×}, be the language of Peano Arithmetic, and adopt the notations of the previous sections. The definition of the class G2 , minted in def our notation, is G2 = [I ∪ N ∪ {0, S, +, ×}; comp, bmin]. Harrow proved that Δ0 ΔV 2 0 0 PA = G , and that PAΔ course PAΔ = PA . Off PA . def · · Let the class BDD = [I ∪ N ∪ { −, · , rem}; comp, bmin]; informally BDD is obtained by adding integer division and remainder to BD. A natural question to ask is whether the inclusion of these new functions, which together make for an inverse to multiplication, makes BDD to G2 what BD is to G1 . The answer is yes. def
2 0 Theorem 5. BDD = PAΔ = G .
We only sketch the proof. Observe that xy = z ⇔
z z ∨ φ ⇔ y > 0 ∧ rem(z, y) = 0 ∧ y >0∧x= =x ∨φ , y y
where φ(x, y, z) is e.g. y = 0 ∧ z = 0, and so the graph of multiplication is computable in BDD .
268
M. Barra
The class CA of constructive arithmetic predicates as introduced by Smullyan (see [17]), is defined as the closure of the graphs of addition and multiplication under explicit transformations, boolean operations and quantification bounded by a variable. Hence CA ⊆ BDD . It is straightforward to prove that BDD ⊆ 0 PAΔ , by a lemma analogous to lemma 6, especially since here we need not worry about the ΔV -restriction and simply define the graphs of BDD-functions 0 within the framework of PAΔ (easy). This is sufficient, since results by Bennett, Wrathall and Lipton in [1,18,14] Lip Ben Wra 0 , which completes imply the nontrivial identities CA = RUD = LH = PAΔ a proof of theorem 5. (Here RUD are Smullyan’s rudimentary relations, and LH the linear hierarchy.) Acknowledgments and final remarks. The author would like to thank Lars Kristiansen for valuable discussions, and the referees for their thorough reading and commenting on the article. In particular, section 5 is new to the final version of this paper, and materialised after one of the referees pointed in the 0 direction of Lipton [14]. The correct reference to the identity CA = PAΔ proved invaluable. This theorem is widely—and falsely—attributed to Harrow, and the surrounding confusion initially stalled the proof of theorem 5. Note however that it is not clear if DD (where DD is ‘BDD without bmin’) is equal to PAqf , and so the question of whether the analogue to theorem 2 qf is true is an open problem. DD qf ⊆ PA is obvious, but the converse inclusion might very well be proper; expressing general polynomial equations in BDD appears to make use of quantifiers in an essential way.
References 1. Bennett, J.H.: On Spectra, Ph.D. thesis, Princeton University (1962) 2. Clote, P.: Computation Models and Function Algebra. In: Handbook of Computability Theory, Elsevier, Amsterdam (1996) 3. Enderton, H.B.: A mathematical introduction to logic. Academic Press, Inc., San Diego (1972) 4. Grzegorczyk, A.: Some classes of recursive functions. In: Rozprawy Matematyczne, vol. IV, Warszawa (1953) 5. Harrow, K.: Sub-elementary classes of functions and relations, Ph.D.Thesis, New York University (1973) 6. Harrow, K.: Small Grzegorczyk classes and limited minimum. Zeitscr. f. math. Logik und Grundlagen d. Math. 21, 417–426 (1975) 7. Jones, N.D.: Logspace and Ptime characterized by programming languages. Theoretical Computer Science 228, 151–174 (1999) 8. Jones, N.D.: The expressive power of higher-order types or, life without CONS. J. Functional Programming 11, 55–94 (2001) 9. Kristiansen, L.: Neat function algebraic characterizations of logspace and linspace. Computational Complexity 14(1), 72–88 (2005) 10. Kristiansen, L.: Complexity-Theoretic Hierarchies Induced by Fragments of G¨ odel’s T: Theory of Computing Systems (2007), http://www.springerlink.com/content/e53l54627x063685/
A Characterisation of the Relations Definable
269
11. Kristiansen, L., Barra, M.: The small Grzegorczyk classes and the typed λ-calculus. In: Cooper, S.B., L¨ owe, B., Torenvliet, L. (eds.) CiE 2005. LNCS, vol. 3526, pp. 252–262. Springer, Heidelberg (2005) 12. Kristiansen, L., Voda, P.J.: The surprising power of restricted programs and G¨ odel’s functionals. In: Baaz, M., Makowsky, J.A. (eds.) CSL 2003. LNCS, vol. 2803, pp. 345–358. Springer, Heidelberg (2003) 13. Kristiansen, L., Voda, P.J.: Complexity classes and fragments of C. Information Processing Letters 88, 213–218 (2003) 14. Lipton, R.J.: Model theoretic aspects of computational complexity. In: Proc. 19th Annual Symp. on Foundations of Computer Science, Silver Spring MD, pp. 193– 200. IEEE Computer Society, Los Alamitos (1978) ¨ 15. Presburger, M.: Uber die Vollst¨ andigket eines gewissen Systems der Arithmetik ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt. In: Sprawozdanie z I Kongresu Matematyk´ ow Slowa´ nskich, pp. 92–101 (1930) 16. Presburger, M., Jaquette, D.: On the Completeness of a Certain System of Arithmetic of Whole Numbers in Which Addition Occurs as the Only Operation. History and Philosophy of Logic 12, 225–233 (1991) 17. Smullyan, R.M.: Theory of formal systems (revised edition). Princeton University Press, Princeton, New Jersey (1961) 18. Wrathall, C.: Rudimentary predicates and relative computation. SIAM J. Comput. 7(2), 194–209 (1978)
Finding Minimum 3-Way Cuts in Hypergraphs Mingyu Xiao Department of Computer Science and Engineering The Chinese University of Hong Kong [email protected]
Abstract. The minimum 3-way cut problem in an edge-weighted hypergraph is to find a partition of the vertices into 3 sets minimizing the total weight of hyperedges with at least two endpoints in two different sets. In this paper we present some structural properties for minimum 3-way cuts and design an O(dmn3 ) algorithm for the minimum 3-way cut problem in hypergraphs, where n and m are the numbers of vertices and edges respectively, and d is the sum of the degrees of all the vertices. Our algorithm is the first deterministic algorithm finding minimum 3-way cuts in hypergraphs.
1
Introduction
Given a hypergraph G = (V, E) with nonnegative hyperedge weights, and an integer k, a k-way cut for G is a subset of hyperedges whose deletion separates the graph into k nonempty components, and the minimum k-way cut problem is to find a k-way cut minimizing the total weight of it. In the literature k-way cuts are also referred as k-cuts or multi-component cuts. The minimum k-way cut problem is an extension of the classical minimum cut problem, and it has great applications in the area of VLSI system design, parallel computing systems, clustering, network reliability and finding cutting planes for the travelling salesman problems. The minimum k-way cut problem in graphs was well studied during the past twenty years. First, Goldschmidt and Hochbaum [5] proved that this problem is NP-hard when k is part of the input and presented a polynomial algorithm for 2 fixed k with running time O(nk T (n, m)), where T (n, m) is the time required to compute a minimum cut or maximum flow in an edge-weighted graph. Recently √ Kamidoi et al. [8] improved the running time bound to O(n4k/(1−1.71/ k)−34 T (n, m)). Karger and Stein [10] proposed a Monte Carlo algorithm with O(n2(k−1) log3 n) time. It seems that all the above algorithms do not work in hypergraphs and no solid results in hypergraphs for k > 2 are known. When k is a small number, the problem has also drawn much attention. The minimum 2-way cut problem is commonly known as the minimum cut problem. Another version of the minimum 2-way cut problem is the minimum (s, t) cut problem, which asks us to find a minimum cut that splits two given vertices s and t. These two problems are classical and fundamental problems in the subject of graph connectivity. For graphs, the minimum cut problem can M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 270–281, 2008. c Springer-Verlag Berlin Heidelberg 2008
Finding Minimum 3-Way Cuts in Hypergraphs
271
be solved in O(mn + n2 log n) time by Nagamochi and Ibaraki’s algorithm [17] or Stoer and Wagner’s algorithm [21], and the minimum (s, t) cut problem can be solved in O(mn log n2 /m) time by Goldberg and Tarjan’s algorithm [4] and O(min{n2/3 , m1/2 }m log(n2 /m) log U ) time by Goldberg and Rao’s algorithm [3], where U is the maximum capacity of the edge. For hypergraphs, there are also some good results on them. Klimmek and Wagner [13] and Mak and Wong [16] extended Stoer and Wagner’s algorithm [21] to hypergraphs and gave an O(dn + n2 log n) algorithm for the minimum cut problem in hypergraphs, where d is the sum of the degrees of all the vertices. Lawler [14] showed that a minimum (s, t) cut in a hypergraph can be computed by using one maximum flow computation in an auxiliary digraph with n + 2m vertices and 2d + m edges. Then a minimum (s, t) cut in a hypergraph can be found in O(dm) time. In the remainder of the paper, we use T (n, m) and T (n, m, d) to denote the running times of computing a minimum (s, t) cut in a graph and a hypergraph respectively. For the minimum 3-way cut problem in graphs, Kapoor [9] and Kamidoi et al. [7] showed that it can be solved by using O(n3 ) maximum flow computations. Burlet and Goldschmidt [1] and Nagamochi and Ibaraki [18] improved the result to O(n2 ). Furthermore, Nagamochi et al. [18], [19] proved that the minimum k ) time for k = 4, 5, 6. Unfortunately, k-way cut problem can be solved in O(mn we do not find many results in hypergraphs for k ≥ 3. In fact, Nagamochi et al. [18] considered the minimum {3, 4}-way cuts in hypergraphs, but their measure is different, which makes their problem easier. Currently the frequently used algorithms in VLSI system design to partition a hypergraph are some heuristic algorithms without any theoretic guarantee, such as hMETIS [11] and other algorithms based on multilevel partitioning framework [12]. Effective algorithms for minimum 3-way cuts in hypergraphs have potential applications in VLSI system design. A deeper understanding of the structure of 3-way cuts will help us investigating the structure of the general k-way cut problem. Motivated by these, in this paper we study the minimum 3-way cut problem in hypergraphs. Roughly speaking, there are three techniques frequently used to design algorithm for finding minimum 3-way cuts, as well as minimum k-way cuts, in graphs. The first one is based on searching minimum (S, T ) cuts. The first algorithm for the general k-way cut problem presented by Goldschmidt and Hochbaum [5] is an example. They proved that there are 4 vertices a, b, c, d such that the minimum ({a, b}, {c, d}) cut is contained in a minimum 3-way cut. If we try all the O(n4 ) possibilities by taking each one as a subset of a 3-way cut and finding a such 3-way cut with weight minimized, then we can get a minimum 3-way cut by selecting a lightest one among all the O(n4 ) 3-way cuts. Later Kapoor [9] improved this method to O(n3 ) maximum flow computations. The second technique is enumerating all small 2-way cuts (We will simply call a 2-way cut a cut). We sort all cuts in the graph in the order of nondecreasing weights. If we try on each cut by taking it as a subset of a 3-way cut, finally we will meet a cut contained in a minimum 3-way cut and then get a minimum 3-way cut. Nagamochi et al. [18] proved that at least one of the first O(n) small cuts is
272
M. Xiao
contained in a minimum 3-way cut and the first O(n) cuts can be enunciated by using O(n2 ) maximum flow computations. So a minimum 3-way cut can be found in O(n2 T (n, m)) time. Levine [15] proved that a minimum 3-way cut can be found by considering the first O(n) small cuts of all the 4/3-minimum cuts in 3 graphs. Since Karger and Stein’s algorithms [10] can find all the 4/3-minimum cuts and a minimum cut in O(n2 log n) and O(m log3 n) time with high probability respectively, Levine got a Monte Carlo algorithm finding minimum 3-way cuts with time bound O(mn log3 n). The last technique is based on ‘divide-andconquer’, such as Kamidoi et al.’s algorithm [7]. We first find a proper cut of the graph that is noncrossing with a minimum 3-way cut, then we can find minimum 3-way cuts in two smaller graphs. The hard part of this method is that the proper cut is not always easy to get and sometimes we will turn to the second technique to find one. All of the above algorithms can not be extended to the problem in hypergraphs directly. We look at one of the fastest algorithms for finding minimum 3-way cuts in graphs presented by Nagamochi et al. [18]. Their algorithm is based on the observation that at least one of the first r cuts {C1 , · · · , Cr } in the order of nondecreasing weights is contained in a minimum 3-way cut, where r is the first integer such that Cr is crossing with Cq (1 ≤ q < r). We give an example in Figure 1 to show that this theorem does not hold in hypergraphs. G = (V, E) is a hypergraph with 6 vertices and 6 hyperedges, V = {a, b, c, d, e, f }, E = {abc, cdef, cd, de, ef, cf }, and w(abc) = 4.5, w(cdef ) = 2, w(cd) = w(de) = w(ef ) = e(cf ) = 1. [{a, b, c, d}, {e, f }] and [{a, b, c, f }, {d, e}] are the first two small cuts and they are crossing, but none of them is contained in the minimum 3-way cut [{a}, {b}, {c, d, e, f }]. In this paper, based on the idea of Kamidoi et al.’s divided-and-conquer algorithm [7], we define a class of cuts called k-way free cuts (See Definition 3) and present a framework of designing algorithms for the minimum k-way cut problem in hypergraphs by finding k-way free cuts. Then we prove that minimum 3-way cuts in hypergraphs can be found by using O(n3 ) hypergraph minimum (s, t) cut computations. a
c
1
1
2
4.5 b
f
1
d 1 e
The first two small cuts: [{a, b, c, d}, {e, f}] and [{a, b, c, f}, {d, e}]. The minimum 3-way cut: [{a}, {b}, {c, d, e, f}].
Fig. 1. The first two small cuts not being contained in a minimum 3-way cut
2
Preliminaries
Let G = (V, E) be an edge-weighted hypergraph with |V | = n vertices and |E| = m hyperedges. For any hyperedge e ∈ E, V (e) denotes the set of endpoints of e. An induced sub hypergraph H = G[V ] with V ⊆ V is a sub hypergraph with hyperedges whose endpoints in V . For any hyperedge subset C ⊆ E, w(C) denotes the total weight of the hyperedges in C. Let X1 , X2 , · · · , Xk ⊂ V be k (2 ≤
Finding Minimum 3-Way Cuts in Hypergraphs
273
k ≤ n) disjoint nonempty subsets of vertices, then [X1 , X2 , · · · , Xk ] denotes the set of hyperedges crossing at least two different vertex sets of {X1 , X2 , · · · , Xk } (Hyperedges that the endpoints of each of them are in at least two different sets of {X1 , X2 , · · · , Xk }). We also define another important hyperedge set: k e(X1 , X2 , · · · , Xk ) = {e|e ∈ [X1 , X2 , · · · , Xk ]&V (e) ⊆ i=1 Xi }. There is an illustration for these two notations in Figure 2. Let V = {a, b, c, d, e} and E = {ab, abc, abe, ae, bde, cd}. Then [{a, b}, {c}, {d}] = {abc, bde, cd} and e({a, b}, {c}, k {d}) = {abc, cd}. When i=1 Xi = V , i.e., {X1 , X2 , · · · , Xk } is k-partition of V , we say [X1 , X2 , · · · , Xk ] is a k-way cut of the hypergraph and Xi is a component of the k-way cut. Over all the k-way cuts those with the minimum weight are called minimum k-way cuts. A 2-way cut [X, X] is also simply called a cut of the hypergraph, where X = V −X. If we require a special vertex s in X and a special vertex t in X, then such kind of cuts are called (s, t) cuts. Generally for any two disjoint sets of vertices S and T , a group of hyperedges is called an (S, T ) cut, if deleting it splits S away from T . Merging some vertices means identifying these vertices as a new vertex while keeping all the hyperedges incident on them (we also need to update the hyperedges: combining ‘parallel’ hyperedges with weight being the sum of all the weights and deleting self-loops). For a vertex subset X ⊂ V , GX stands for the graph generated by merging X = V − X into a single vertex. This notation is frequently used in this paper. Sometimes a singleton set {s} is simply written as s, w([X1 , X2 , · · · , Xk ]) as w(X1 , X2 , · · · , Xk ), and w(e(X1 , X2 , · · · , Xk )) as we (X1 , X2 , · · · , Xk ).
a
5 vertices: a, b, c, d, e b
e d
c
6 hyperedges: ab, abc, abe, ae, bde, cd [{a, b}, {c}, {d}] = {abc, bde, cd} e({a, b}, {c}, {d}) = {abc, cd}
Fig. 2. An illustration for notations [] and e()
Definition 1. Two cuts [X, X] and [Y, Y ] are noncrossing if X ⊆ Y or X ⊆ Y . Otherwise, we say that they are crossing. Note that when X = Y or X = Y , the two cuts are the same. Some references exclude this case in their definition of noncrossing. The properties of noncrossing between two cuts are studied in many references [6], [2]. Here we extend the definition of noncrossing to describe a relation between a cut and a k-way cut. Definition 2. A cut [X, X] and a k-way cut [Y1 , Y2 , · · · , Yk ] are noncrossing if there exists i ∈ {1, 2, · · · , k} such that X ⊆ Yi or X ⊆ Yi . Otherwise, we say that they are crossing.
274
M. Xiao
Lemma 1. Given a hypergraph G and two vertices u and v in G, let G be the hypergraph after merging u and v into a new vertex, Ck be a minimum k-way cut of G and Ck be the corresponding hyperedge set of Ck in G. If there is a minimum k-way cut C of G such that u and v are in a same component of C, then Ck is also a minimum k-way cut of G. Proof. Let C be the corresponding hyperedge set of C in G . u and v are in a same component of C, so C is a k-way cut of G . Ck is a minimum k-way cut of G , and then w(Ck ) = w(Ck ) ≤ w(C ) = w(C). Thus, Ck is a minimum k-way cut of G. Lemma 2. Given a hypergraph G and a cut C = [X, X] of G, let CX and CX be a minimum k-way cut of GX and GX respectively, where GX is the graph obtained by merging X into a single vertex and GX is the graph obtained by merging X into a single vertex. And let CX and CX be the corresponding edge sets of CX and CX in G respectively. If there is a minimum k-way cut noncrossing with cut C, then either CX or CX is a minimum k-way cut of G. Proof. Assume that minimum k-way cut Ck = [Y1 , Y2 , · · · , Yk ] is noncrossing with cut C. Then there exists i ∈ {1, 2, · · · , k} such that X ⊆ Yi or X ⊆ Yi . According to Fact 1, when X ⊆ Yi , we can merge X into a single vertex; when X ⊆ Yi , we can merge X into a single vertex. Thus, the lemma holds. Lemma 2 provides a prospective divide-and-conquer method to find a minimum k-way cut in hypergraphs. If we get a proper cut which is noncrossing with a minimum k-way cut and each component of the cut contains at least two vertices, then we can find a minimum k-way cut by searching on two smaller graphs. This idea is one of the main ideas we used to find minimum 3-way cuts in this paper. For convenience, we give the following definition. Definition 3. A cut is k-way free, if there exists a minimum k-way cut noncrossing with it. Note that k-way free is just used to describe cuts, but not for k-way cuts for k ≥ 3. To find a minimum k-way cut, we need to find proper k-way free cuts. In Section 3.2, we first present some structural properties of minimum 3-way cuts, which provide us some ways to get 3-way free cuts. The detailed algorithm and running time analysis are presented in Section 4. In Section 3.1, we first give some properties for hyperedges, which will be used in our main proofs.
3 3.1
Properties Hyperedge’s Properties
Lemma 3. Let {X1 , X2 , · · · , Xk } of vertices in a hypergraph, then
{Y1 } be a group of disjoint nonempty subsets
[X1 + Y1 , X2 , · · · , Xk ] ⊇ [X1 , X2 , · · · , Xk ] + e(Y1 , X2 + · · · + Xk ).
Finding Minimum 3-Way Cuts in Hypergraphs
275
If {X1 , X2 , · · · , Xk } {Y1 } is a partition of the vertex set, then [X1 + Y1 , X2 , · · · , Xk ] = [X1 , X2 , · · · , Xk ] + e(Y1 , X2 + · · · + Xk ). Lemma 4. Let {X1 , X2 , · · · , Xk } {Y1 , Y2 } be a group of disjoint nonempty subsets of vertices in a hypergraph, then [X1 +Y1 , X2 +Y2 , X3 , · · · , Xk ] ⊇ [X1 , · · · , Xk ]+e(Y1 +Y2 , X3 +· · ·+Xk )+e(Y1 , X2 )+e(X1 , Y2 ).
Fact 3 and Fact 4 can be simply proved by enumerating all kinds of hyperedges and we just ignore the detailed proof steps here. These two facts are frequently used in the proofs in Section 3.2, and to make the proofs fluent we do not point out which fact being used each time. In fact, we only use the cases k = 2 and 3, which are somewhat intuitive. 3.2
Structural Properties
Lemma 5. In a hypergraph, any minimum cut is 3-way free. Proof. We need to prove that for any minimum cut C = [X, X] of G, there is a Let C3 = [Y1 , Y2 , Y3 ] minimum 3-way cut C3 such that C and C3 are noncrossing. be an arbitrary minimum 3-way cut, and Zij = Xi Yj , (i = 1, 2, j = 1, 2, 3), where X1 = X and X2 = X (See Figure 3). If C and C3 are crossing, then ∃j ∈ {1, 2, 3} such that Z1j and Z2j are nonempty sets. Without loss of generality, we assume that they are Z11 and Z21 . At least one of Z13 and Z23 is not an empty set. Without loss of generality, we further assume that Z23 = ∅. Case 1: Z22 = ∅. We will prove that C3 = [Y1 + Z12 + Z13 , Z22 , Z23 ] is a satisfied minimum 3-way cut (See Figure 3). It is easy to see that C3 is a 3-way cut noncrossing with C. We only need to prove that C3 is minimum. [Z11 , Z11 ] = [Z11 , X2 ] + e(Z11 , Z12 + Z13 ) and [X1 , X2 ] = [Z11 , X2 ] + e(Z12 + Z13 , X2 ). [X1 , X2 ] is a minimum cut. w(Z11 , Z11 ) ≥ w(X1 , X2 ). So we (Z11 , Z12 + Z13 ) ≥ we (Z12 + Z13 , X2 ). Y1
Y2
Y3
X1
Z11
Z12
Z13
X2
Z 21
Z 22
Z 23
: the minimum 3-way cut C3 in Case 1.
Fig. 3. Case 1 of the proof
(1)
276
M. Xiao
s
3
2
a
t
2 1
: the minimum ( s, t ) cut.
b
: the minimum 3-way cut.
Fig. 4. A minimum (s, t) cut not being 3-way free
On the other hand, C3 = [Y1 + Z12 + Z13 , Z22 , Z23 ] = [Y1 , Z22 , Z23 ] + e(Z12 + Z13 , Z22 + Z23 ) and C3 = [Y1 , Y2 , Y3 ] ⊇ [Y1 , Z22 , Z23 ] + e(Y1 , Z12 + Z13 ). It is obvious that e(Z12 + Z13 , Z22 + Z23 ) ⊆ e(Z12 + Z13 , X2 ) and e(Z11 , Z12 + Z13 ) ⊆ e(Y1 , Z12 + Z13 ). According to (1), we get that w(C3 ) ≤ w(C3 ). C3 is a minimum 3-way cut. Case 2: Z22 = ∅. For this case, Z12 = ∅. Let C3 = [Z11 , Z12 , Z21 + Y3 ]. Like what we do in Case 1, we will prove that C3 is a satisfied minimum 3-way cut. [Z23 , Z23 ] = [X1 , Z23 ] + e(Z21 , Z23 ) and [X1 , X2 ] = [X1 , Z23 ] + e(X1 , Z21 ). (Z23 , Z23 ) ≥ w(X1 , X2 ). So we (Z21 , Z23 ) ≥ we (X1 , Z21 ).
(2)
C3 = [Z11 , Z12 , Z21 + Y3 ] = [Z11 , Z12 , Y3 ] + e(Z11 + Z12 , Z21 ), where e(Z11 + Z12 , Z21 ) = e(Z11 , Z21 )+e(Z12 , Z21 )+e(Z11 , Z12 , Z21 ) ⊆ e(Z12 , Z21 )+e(X1 , Z21 ). And C3 = [Y1 , Y2 , Y3 ] ⊇ [Z11 , Z12 , Y3 ]+ e(Z12 , Z21 )+ e(Z21 , Y3 ) ⊇ [Z11 , Z12 , Y3 ]+ e(Z12 , Z21 ) + e(Z21 , Z23 ). According to (2), we know that w(C3 ) ≤ w(C3 ). We have discussed all the cases and then finished the proof. Lemma 5 shows a noncrossing property between minimum cuts and minimum 3-way cuts. The noncrossing property between minimum (s, t) cuts and minimum 3-way cuts is a little weaker. Given two vertices s and t, the minimum (s, t) cut may not be 3-way free even in graphs. We give an example in Figure 4. G = (V, E) is a graph with 4 vertices and 4 edges, V = {a, b, s, t}, E = {ab, bt, st, as}, and w(ab) = 1, w(bt) = 2, w(st) = 3, w(as) = 2. It is easy to see that [a, b, {s, t}] is the unique minimum 3-way cut and [{a, s}, {b, t}] is the unique minimum (s, t) cut in the graph. They are crossing. In the next two lemmas we show some further properties between minimum (s, t) cuts and minimum 3-way cuts. Lemma 6. Given a hypergraph G and two vertices s and t in it, if there is a minimum 3-way cut whose removal disconnects s from t (s and t are in two different components of the 3-way cut), then any minimum (s, t) cut is 3-way free. Proof. The proof is based on the proof of Lemma 5. Let C3 = [Y1 , Y2 , Y3 ] be a minimum 3-way cut, and s and t be two vertices in two different components
Finding Minimum 3-Way Cuts in Hypergraphs
277
of C3 . Suppose C = [X1 , X2 ] is a minimum (s, t) cut, where X2 = X1 . Let Zij = Xi Yj , (i = 1, 2, j = 1, 2, 3). Without loss of generality, we also assume that Z11 , Z21 = ∅,s ∈ X1 , t ∈ X2 , and t ∈ Y3 . Then t ∈ Z23 and s is in either Z11 or Z12 . Case 1: s ∈ Z11 . The proof just follows the proof of Lemma 5, because [Z11 , Z11 ] and [Z23 , Z23 ] are still (s, t) cuts and the two cases in the proof of Lemma 5 still work. Case 2: s ∈ Z12 . When Z22 = ∅, we exchange Y1 and Y2 and then it becomes Case 1. When Z22 = ∅, we only need to do Case 2 in the proof of Lemma 5. For this case, [Z23 , Z23 ] is still an (s, t) cut, so the proof is the same. Lemma 7. Given a hypergraph G and two disjoint vertex subsets S, T ⊂ V , if there is a minimum 3-way cut whose removal disconnects S from T , then any minimum (S, T ) cut is 3-way free. Proof. Let C3 = [Y1 , Y2 , Y3 ] be a minimum 3-way cut whose removal disconnects S from T , C = [X1 , X2 ] be a minimum (S, T ) cut, and Zij = Xi Yj , (i = 1, 2, j = 1, 2, 3). Since removal of C3 will disconnect S from T , either S or T is contained in a component of C3 . Without loss of generality, we assume that S ⊆ Y1 . Furthermore, we assume that S ⊆ X1 and T ⊆ X2 . Then S ⊆ Z11 and T ⊆ Z22 Z23 . When T is also contained in a component of C3 , by Fact 1 we can safely merge S into a single vertex and merge T into a single vertex. The minimum 3-way cut in the remaining graph is still a minimum 3-way cut in the original graph. And by Lemma 6 we know that there is a minimum 3-way cut noncrossing with C. Otherwise, we let T1 = T ∩ Y2 and T2 = T ∩ Y3 . T1 , T2 = ∅. For this case, we can prove that C3 = [Y1 + Z12 + Z13 , Z22 , Z23 ] is a minimum 3-way cut noncrossing with C like what we do in Case 1 of the proof of Lemma 5. The reason is that T1 ⊆ Z22 = ∅ and T2 ⊆ Z23 = ∅, and [Z11 , Z11 ] is still an (S, T ) cut. If we want to use Lemma 2 to design algorithms, we first need to find a 3-free cut [X, X] such that neither X nor X only contains one vertex. The 3-way free cuts discussed above, including the minimum cuts and the minimum (s, t) cuts, may not have this property. For convenience, we give the following definition. Definition 4. A cut [X, X] is called a 2-size cut, if |X| ≥ 2 and |X| ≥ 2. Over all the 2-size cuts those with the minimum weight are called minimum 2-size cuts. Lemma 8. Let G be a hypergraph with more than 6 vertices. If G has a minimum 3-way cut such that each component of it has at least 2 vertices, then any minimum 2-size cut in G is 3-way free. Proof. Let the minimum 3-way cut be C3 = [Y1 , Y2 , Y3 ], the minimum 2-size cut be C = [X1 , X2 ], and Zij = Xi Yj , (i = 1, 2, j = 1, 2, 3). We will prove the lemma by using Lemma 7.
278
M. Xiao
1
3 2
2
3
4
3
3 3 3
1 : the minimum cut. : the minimum 4-way cut.
: the minimum cut. : the minimum 5-way cut.
Fig. 5. Minimum cuts not being {4, 5}-way free
If there is a component of C3 being contained in a component of C, say Y1 ⊆ X1 , we can prove the lemma as the follows. Y1 has at least two vertices a and b. X2 has least two vertices c and d, and c, d ∈ Y2 ∪ Y3 . Let S = {a, b} and T = {c, d}. Removal of C3 will disconnect S from T and C is a minimum (S, T ) cut. By Lemma 7 we know the lemma holds. Otherwise, none of Zij (i = 1, 2, j = 1, 2, 3) is an emptyset. G has more than 6 vertices. At least one of Zij (i = 1, 2, j = 1, 2, 3) contains more than one vertex. Without loss of generality, we assume that two different vertices a, b ∈ Z11 . Z22 , Z23 = ∅. Assume that c ∈ Z22 and d ∈ Z23 . Let S = {a, b} and T = {c, d}. We can also prove the lemma by using Lemma 7. Remark. In Lemma 8, if graph G has right 6 vertices, the result may not hold. Here is an example. G = (V, E) is a hypergraph with 6 vertices and 5 hyperedges, V = {a, b, c, d, e, f }, E = {ad, be, cf, abc, def }, and w(ad) = w(be) = w(cf ) = 2.5, w(abc) = w(def ) = 4. [{a, d}, {b, e}, {c, f }] is the unique minimum 3-way cut and each component of it has two vertices. [{a, b, c}, {d, e, f }] is the unique minimum 2-size cut. They are crossing. Lemma 9. A minimum cut may not be k-way free, k ≥ 4. Two examples that minimum cuts are not being {4, 5}-way free are shown in Figure 5. In fact, Lemma 6, Lemma 7 and Lemma 8 also do not hold for minimum 4-way cuts.
4
The Algorithm
Using Lemma 2 and lemmas mentioned in Section 3.2, we can design various algorithms for the minimum 3-way cut problem in hypergraphs. But we should pay attention that when we meet a 3-way free cut [X, X] such that X or X contains only one vertex, then our algorithms may not work any more, because GX or GX will not be a smaller graph. At that time, we need to use Lemma 8. Next, we use Lemma 5 and Lemma 8 to design a recursive algorithm. We iteratively find a minimum cut C = [X, X] as a 3-way free cut, and then find minimum 3-way cuts in smaller hypergraphs GX and GX . When the 3-way free
Finding Minimum 3-Way Cuts in Hypergraphs
279
cut we found is not 2-size, we turn to Lemma 8. There is a 3-way free cut being 2-size or a minimum 3-way cut such that one component of it has only one vertex (Assume the graph has more than 6 vertices). For the former case, the recursive step can continue. And for the later case, we can find a solution directly by trying on each vertex. In fact, the algorithm only includes Step 1 and Step 4 of the following algorithm still works. The reason for it lies in Lemma 8. Step 2 and Step 3 can accelerate the algorithm sometimes but do not help in the running time analysis. Algorithm Hyper3waycut(G): 0. Initially let the solution set S be the whole edge set E. 1. If the graph has less than or equal to 6 vertices, find the solution directly and return it; otherwise, do 2. Find a minimum cut [X, X] of the graph. 3. If both of X and X have more than one vertex, update S ←− min {Hyper3waycut(GX ), Hyper3waycut(GX )}, where GX is the graph obtained by merging X into a single vertex and GX is the graph obtained by merging X into a single vertex. 4. Otherwise, do 4.1 For each vertex v in the current graph, find a minimum cut Cv in induced subgraph G[V − {v}] and update S ←− min{S, Cv ∪ [v, V − {v}]}. 4.2 Find a minimum 2-size cut [X, X] in the current graph and update S ←− min{S, Hyper3waycut(GX ), Hyper3waycut(GX )}. 5. Return S. In each iteration, we will find a 3-way free cut being 2-size in either Step 3 or Step 4.2 and divide the problem into two smaller problems. Now we consider how much time we will use in each iteration. First we compute a minimum cut in the hypergraph in Step 1. If one component of the cut has only one vertex, we will compute n minimum cuts in Step 4.1 and compute a minimum 2-size cut in Step 4.2. Via brute-force search, a minimum 2-size cut can be found by using O(n4 ) minimum (s, t) cut computations. In fact, we can improve it to O(n2 ) minimum (s, t) cut computations by enumerating small cuts. Vazirani and Yannakakis [22] presented an algorithm for finding all the cuts in a graph in the order of nondecreasing weights by building a 0-1 tree. And the delay between two successive outputs is at most n − 1 minimum (s, t) cut computations. This algorithm still works in hypergraphs. A hypergraph has at most n cuts not being 2-size. So at most we need to compute the first n+ 1 outputs to get a minimum 2-size cut. We can find a minimum 2-size cut by using O(n2 ) minimum (s, t) cut computations in hypergraphs. Suppose the current graph has n vertices and X has x vertices (2 ≤ x ≤ n−2). Then GX has x + 1 vertices and GX has n − x + 1 vertices. We get recurrence relation: C(n) ≤ C(x + 1) + C(n − x + 1) + O(n2 ),
(3)
280
M. Xiao
where C(n) is the number of minimum (s, t) cut computations used when Hyper3waycut computes on a hypergraph with n vertices. It is easy to verify that C(n) = O(n3 ) satisfies (3). Theorem 1. Algorithm Hyper3waycut finds a minimum 3-way cut in a hy3 pergraph in O(n3 T (n, m, d)) = O(dmn ) time.
5
Conclusion
In this paper, we have presented an O(dmn3 ) algorithm for finding minimum 3way cuts in hypergraphs. As far as we know, it is the first deterministic algorithm for solving the minimum 3-way cut problem in hypergraphs. k-way partitioning with minimum cost in hypergraphs is one of central problems in VLSI system design [20]. Effective algorithms for small k may have direct applications. But unfortunately most properties and algorithms for k-way cuts in graphs do not hold in hypergraphs, and currently there are not many known algorithms for hypergraphs except some heuristic algorithms without any theoretic guarantee. It would be interesting to find other fast algorithms for the k-way cut problem in hypergraphs, even when k is a small number. Furthermore, to prove the correctness of our algorithm, we present some properties of hypergraphs and minimum k-way cuts, which may be helpful for us to understand the structure of the minimum k-way cuts and differences between the k-way cut problems in graphs and hypergraphs. We note that some good properties for 3-way cuts do not hold for 4-way cut cases, and the algorithm presented in this paper can not be simply extended to the minimum 4-way cut problem in hypergraphs. It seems that other approaches or properties are needed for the minimum k-way cut problem for k ≥ 4.
References 1. Burlet, M., Goldschmidt, O.: A new and improved algorithm for the 3-cut problem. Operations Research Letters 21(5), 225–227 (1997) 2. Easley, R.F., Hartvigsen, D.: Crossing properties of multiterminal cuts. Networks 34(3), 215–220 (1999) 3. Goldberg, A.V., Rao, S.: Beyond the flow decomposition barrier. J. ACM. 45(5), 783–797 (1998): A preliminary version appeared in FOCS 1997 4. Goldberg, A.V., Tarjan, R.E.: A new approach to the maximum-flow problem. J. ACM 35(4), 921–940 (1988) 5. Goldschmidt, O., Hochbaum, D.: A polynomial algorithm for the k-cut problem for fixed k. Mathematics of Operations Research 19(1), 24–37 (1994): A preliminary version appeared in FOCS 1988 6. Gomory, R.E., Hu, T.C.: Multi-terminal network flows. J. SIAM. 9(4), 551–670 (1961) 7. Kamidoi, Y., Wakabayashi, S., Yoshida, N.: A divide-and-conquer approach to the minimum k-way cut problem. Algorithmica 32(2), 262–276 (2002) 8. Kamidoi, Y., Yoshida, N., Nagamochi, H.: A deterministic algorithm for finding all minimum k-way cuts. SIAM Journal on Computing 36(5), 1329–1341 (2006)
Finding Minimum 3-Way Cuts in Hypergraphs
281
9. Kapoor, S.: On minimum 3-cuts and approximating k-cuts using cut trees. In: Proceedings of the 5th International IPCO Conference on Integer Programming and Combinatorial Optimization, Springer, London (1996) 10. Karger, D.R., Stein, C.: A new approach to the minimum cut problem. Journal of the ACM 43(4), 601–640 (1996): Preliminary portions appeared in SODA 1993 and STOC 1993 11. Karypis, G., Kumar, V.: hmetis: A hypergraph partitioning package version 1.5, user manual (1998), http://glaros.dtc.umn.edu/gkhome/fetch/sw/hmetis/manual.pdf 12. Karypis, G., Kumar, V.: Multilevel k-way hypergraph partitioning. VLSI Design 11(3), 285–300 (2000) 13. Klimmek, R., Wagner, F.: A simple hypergraph min cut algorithm, Internal Report B 96-02 Bericht FU Berlin Fachbereich Mathematik und Informatik (1995) 14. Lawler, E.L.: Cutsets and partitions of hypergraphs. Networks 3(3), 275–285 (1973) 15. Levine, M.S.: Fast randomized algorithms for computing minimum {3,4,5,6}-way cuts. In: Proceedings of the 11th annual ACM-SIAM symposium on Discrete algorithms (SODA 2000), Philadelphia, PA, USA. Society for Industrial and Applied Mathematics (2000) 16. Mak, W.-K., Wong, D.F.: A fast hypergraph min-cut algorithm for circuit partitioning. Integration, the VLSI Journal 30(1), 1–11 (2000) 17. Nagamochi, H., Ibaraki, T.: Computing edge connectivity in multigraphs and capacitated graphs. SIAM Journal on Discrete Mathematics 5(1), 54–66 (1992) 18. Nagamochi, H., Ibaraki, T.: A fast algorithm for computing minimum 3-way and 4-way cuts. Mathematical Programming 88(3), 507–520 (2000) 19. Nagamochi, H., Katayama, S., Ibaraki, T.: A Faster Algorithm for Computing Minimum 5-Way and 6-Way Cuts in Graphs. In: Asano, T., Imai, H., Lee, D.T., Nakano, S.-i., Tokuyama, T. (eds.) COCOON 1999. LNCS, vol. 1627, Springer, Heidelberg (1999) 20. Preas, B.T., Lorenzetti, M.: Physical Design Automation of VLSI Systems, Benjamin-Cummings, California (1988) 21. Stoer, M., Wagner, F.: A simple min-cut algorithm. Journal of the ACM 44(4), 585–591 (1997); A preliminary version appeared in ESA 1994 22. Vazirani, V.V., Yannakakis, M.: Suboptimal cuts: Their enumeration, weight and number. In: Kuich, W. (ed.) Automata, Languages and Programming. Proc. of the 19th International Colloquium, Springer, Berlin (1992)
Inapproximability of Maximum Weighted Edge Biclique and Its Applications Jinsong Tan Department of Computer and Information Science School of Engineering and Applied Science University of Pennsylvania, Philadelphia, PA 19104, USA [email protected]
Abstract. Given a bipartite graph G = (V1 , V2 , E) where edges take on both positive and negative weights from set S, the maximum weighted edge biclique problem, or S-MWEB for short, asks to find a bipartite subgraph whose sum of edge weights is maximized. This problem has various applications in bioinformatics, machine learning and databases and its (in)approximability remains open. In this paper, that for a wide we show min δ−1/2 S ∈ Ω(η ) ∩ O(η 1/2−δ ) range of choices of S, specifically when max S (where η = max{|V1 |, |V2 |}, and δ ∈ (0, 1/2]), no polynomial time algorithm can approximate S-MWEB within a factor of n for some > 0 unless RP = NP. This hardness result gives justification of the heuristic approaches adopted for various applied problems in the aforementioned areas, and indicates that good approximation algorithms are unlikely to exist. Specifically, we give two applications by showing that: 1) finding statistically significant biclusters in the SAMBA model, proposed in [18] for the analysis of microarray data, is n -inapproximable; and 2) no polynomial time algorithm exists for the Minimum Description Length with Holes problem [4] unless RP = NP.
1
Introduction
Let G = (V1 , V2 , E) be an undirected bipartite graph. A biclique subgraph in G is a complete bipartite subgraph of G and maximum edge biclique (MEB) is the problem of finding a biclique subgraph with the most number of edges. MEB is a well-known problem and received much attention in recent years because of its wide range of applications in areas including machine learning [14], management science [16] and bioinformatics, where it is found particularly relevant in the formulation of numerous biclustering problems for biological data analysis [5,2,18,19,17], and we refer readers to the survey by Madeira and Oliveira [13] for a fairly extensive discussion on this. Maximum edge biclique is shown to be NPhard by Peeters [15] via a reduction from 3SAT. Its approximability status, on
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 282–293, 2008. c Springer-Verlag Berlin Heidelberg 2008
Inapproximability of Maximum Weighted Edge Biclique
283
the other hand, remains an open question despite considerable efforts [7,8,12]1 . In particular, Feige and Kogan [8] conjectured that maximum edge biclique is hard to approximate within a factor of n for some > 0. In this paper, we consider a weighted formulation of this problem defined as follows Definition 1. S-Maximum Weighted Edge Biclique (S-MWEB) Instance: A complete bipartite graph G = (V1 , V2 , E) (throughout the paper, let η = max{|V1 |, |V2 |} and n = |V1 | + |V2 |), a weight function wG : E → S, where S is a set consisting of both positive and negative integers. Question: Find a biclique subgraph of G where the sum of weights on edges is maximized. A few comments are in order. First note it is not a lose of generality but a technical convenience to require the graph be complete, one can always think of an incomplete bipartite graph as complete where non-edges are assigned weight 0. Also note we require that both positive and negative weights be in S at the same time because otherwise S-MWEB becomes a trivial problem. Our study of S-MWEB is motivated by the problem of finding statistically significant biclusters in microarray data analysis in the SAMBA model [18] and the Minimum Description Length with Holes (MDLH) problem [3,4,10]; detailed discussion of the two problems can be found in Sect. 4. Our main technical min S contribution of this paper is to show that if S satisfies the condition | max S| ∈ δ−1/2 1/2−δ Ω(η ) ∩ O(η ), where δ > 0 is any arbitrarily small constant, then no polynomial time algorithm can approximate S-MWEB within a factor of n for some > 0 unless RP = NP. This result enables us to answer open questions regarding the hardness of the SAMBA model and the MDLH problem. Since maximum edge biclique can be characterized as a special case of S-MWEB with S = {−η, 1}, the n -inapproximability result also provides interesting insights into the conjectured n -inapproximability [8] of maximum edge biclique. The rest of the paper is organized in three sections. In Sect. 2, we present the main technical result by proving the aforementioned inapproximability of SMWEB. We give applications of this by answering hardness questions regarding two applied problems in Sect. 3. We conclude this work by raising a few open problems in the last section.
2
Approximating S-Maximum Edge Biclique Is Hard
We start this section by giving two lemmas about CLIQUE, which will be used in establishing inapproximability for the biclique problems we consider later. Lemma 1 is a recent result by Zuckerman [20], obtained by a derandomization of results of H˚ astad [11]; Lemma 2 follows immediately from Lemma 1. 1
Note it might be easy to confuse the MEB problem with the Bipartite Clique problem discussed by Khot in [12]. Bipartite Clique, which also known as Balanced Complete Bipartite Subgraph [8], aims to maximize the number of vertices of a balanced subgraph whereas MEB aims to maximize the total weights on edges in a (not necessarily balanced) subgraph.
284
J. Tan
Lemma 1. ([20]) It is NP-hard to approximate CLIQUE within a factor of n1− , for any > 0. Lemma 2. For any constant > 0, no polynomial time algorithm can approx1 unless imate CLIQUE within a factor of n1− with probability at least poly(n) RP = NP. 2.1
A Technical Lemma
We first describe the construction of a structure called {γ, {α, β}}-Product, which will be used in the proof of our main technical lemma. Definition 2. ({γ, {α, β}}-Product) Input: An instance of S-MWEB on complete bipartite graph G = V1 ×V2 , where γ ∈ S and α < γ < β; an integer N . Output: Complete bipartite graph GN = V1N × V2N constructed as follows: V1N and V2N are N duplicates of V1 and V2 , respectively. For each edge (i, j) ∈ GN , let (φ(i), φ(j)) be the corresponding edge in G. If wG (φ(i), φ(j)) = γ, assign weight α or β to (i, j) independently and identically at random with expectation being γ, denote the weight by random variable X. If wG (φ(i), φ(j)) = γ, then keep the weight unchanged. Call the weight function constructed this way w(·). For any subgraph H of GN , denote by wγ (H) (resp., w−γ (H)) the total weight of H contributed by former-γ-edges (resp., other edges). Clearly, w(H) = wγ (H) + w−γ (H). With a graph product constructed in this randomized fashion, we have the following lemma. Lemma 3. Given an S-MWEB instance G = (V1 , V2 , E) where γ ∈ S, and a δ(3−2δ)+3
number δ ∈ (0, 12 ]; let η = max (|V1 |, |V2 |), N = η δ(1+2δ) , GN = (V1N , V2N , E) be the {γ, {α, β}}-product of G and S = (S ∪ {α, β}) − {γ}. If 1 1. |β − α| = O((N η) 2 −δ ); and 2. there is a polynomial time algorithm that approximates the S -MWEB instance within a factor of λ, where λ is some arbitrary function in the size of the S -MWEB instance then there exists a polynomial time algorithm that approximates the S-MWEB 1 . instance within a factor of λ, with probability at least poly(n) Proof. For notational convenience, we denote η 2 −δ by f (η) throughout the proof. Define random variable Y = X − γ, clearly E[Y ] = 0. Suppose there is a polynomial time algorithm A that approximates S -MWEB within a factor of λ, we can then run A on GN , the output biclique G∗B corresponds to N 2 bicliques in G (not necessarily all distinct). Let G∗A be the most weighted among these N 2 subgraphs of G, in the rest of the proof we show that with high probability, G∗A is a λ-approximation of S-MWEB on G. Denote by E1 the event that G∗B does not imply a λ-approximation on G. Let H be the set of subgraphs of GN that do not imply a λ-approximation on G, 1
Inapproximability of Maximum Weighted Edge Biclique
285
clearly, |H| ≤ 4N η . Let H be an arbitrary element in H, we have the following inequalities P r {E1 } ≤ P r at least one element in H is a λ-approximation of GN ≤ 4N η · P r H is a λ-approximation of GN = 4N η · P r{E2 } where E2 is the event that H is a λ-approximation of GN . Let the weight of an optimal solution U1 × U2 of G be K, denote by U1N × U2N the corresponding N 2 -duplication in GN . Let x1 and x2 be the number of formerγ-edges in H and U1N × U2N , respectively. Suppose E2 happens, then we must have w−γ (H ) + x1 γ ≤ N 2 ( K λ − 1) w−γ (H ) + wγ (H ) ≥ λ1 (w−γ (U1N × U2N ) + wγ (U1N × U2N )) where the first inequality follows from the fact that we only consider integer weights. Since w−γ (U1N × U2N ) = N 2 K − x2 γ, it implies (wγ (H ) − x1 γ) −
1 (wγ (U1N × U2N ) − x2 γ) ≥ N 2 λ
so we have the following statement on probability P r{E2 } ≤ P r (wγ (H ) − x1 γ) − λ1 (wγ (U1N × U2N ) − x2 γ) ≥ N 2 Let z1 (resp., z2 and z3 ) be the number of edges in E(H ) − E(U1N × U2N ) ( resp., E(U1N × U2N ) − E(H ) and E(U1N × U2N ) ∩ E(H ) ) transformed from former-γ-edges in G. We have 2 P r (wγ (H ) − x1 γ) − λ1 (wγ (U1N × U2N ) − x2 γ) ≥ N z1 z3 2 2 = Pr Yi − λ1 zj=1 Yj + λ−1 k=1 Yk ≥ N λ i=1 z1 1 z2 λ−1 z3 2 = Pr Y + (−Y ) + Y ≥ N i j k k=1 λ j=1 i=1 λ z3 z1 z2 N2 1 N2 N2 ≤ Pr +P r +P r λ−1 Y ≥ (−Y j) ≥ 3 i=1 i j=1 k=1 Yk ≥ 3 3 λ λ z1 z2 z3 N2 N2 N2 ≤ Pr Y ≥ (−Y ) ≥ Y ≥ + P r + P r i j k i=1 j=1 k=1 3 3 3
2 N2 ≤ i∈{1,2,3} exp −2zi 3zi (c1 f (N η)) (Hoeffding bound)
1+2δ (zi ≤ η 2 N 2 ) ≤ 3 · exp −c2 · N η 3−2δ 3−2δ
where c1 , c2 are constants (c2 > 0). Now if we set N = η 1+2δ +θ for some θ, we have
4 P r {E1 } ≤ 4N η · P r {E2 } ≤ 3 · exp ln 4 · η (1+2δ) +θ − c2 · η (1+2δ)θ For this probability to be bounded by 12 as η is large enough, we need to have 2 + θ < (1 + 2δ)θ. Solving this inequality gives θ > δ(1+2δ) . Therefore, for any
4 1+2δ
δ ∈ (0, 12 ], by setting N = η
δ(3−2δ)+3 δ(1+2δ)
, we have P r{E1 }, i.e. the probability that
286
J. Tan
the solution returned by A does not imply a λ-approximation of G, is bounded from above by 12 once input size is large enough. This gives a polynomial time algorithm that approximates S-MWEB within a factor of λ with probability at
least 12 . This lemma immediately leads to the following corollary. Corollary 1. Following the construction in Lemma 3, if S -MWEB can be ap proximated within a factor of n , for some > 0, then there exists a polynomial time algorithm that approximates S-MWEB within a factor of n , where 1 2 = (1 + δ(3−2δ)+3 δ(1+2δ) ) , with probability at least poly(n) . Proof. Let |G| and |GN | be the number of nodes in the S-MWEB and S -MWEB
problem, respectively. Since λ = |GN | ≤ |G|(1+ from Lemma 3. 2.2
δ(3−2δ)+3 ) δ(1+2δ)
, our claim follows
{−1, 0, 1}-MWEB
In this section, we prove inapproximability of {−1, 0, 1}-MWEB by giving a reduction from CLIQUE; in subsequence sections, we prove inapproximability results for more general S-MWEB by constructing randomized reduction from {−1, 0, 1}-MWEB. Lemma 4. The decision version of the {−1, 0, 1}-MWEB problem is NP-complete. Proof. We prove this by describing a reduction from CLIQUE. Given a CLIQUE instance G = (V, E), construct G = (V , E ) such that V = V1 ∪V2 where V1 , V2 are duplicates of V in that there exist bijections φ1 : V1 → V and φ2 : V2 → V . And E E1 E2 E3
= E1 ∪ E2 ∪ E3 = {(u, v) | u ∈ V1 , v ∈ V2 and (φ1 (u), φ2 (v)) ∈ E} = {(u, v) | u ∈ V1 , v ∈ V2 , φ1 (u) = φ2 (v) and (φ1 (u), φ2 (v)) ∈ / E} = {(u, v) | u ∈ V1 , v ∈ V2 , and φ1 (u) = φ2 (v)}
Clearly, G is a biclique. Now assign weight 0 to edges in E1 , −1 to edges in E2 and 1 to edges in E3 . We then claim that there is a clique of size k in G if and only if there is a biclique of total edge weight k in G . First consider the case where there is a clique of size k in G, let U be the set −1 of vertices of the clique, then taking the subgraph induced by φ−1 1 (U ) × φ2 (U ) in G gives us a biclique of total weight k. Now suppose that there is a biclique U1 × U2 of total weight k in G . Without loss of generality, assume U1 and U2 correspond to the same subset of vertices in 2
Note we are slightly abusing notation here by always representing the size of a given problem under discussion by n. Here n refers to the size of S -MWEB (resp. S MWEB) when we are talking about approximation factor n (resp. n ). We adopt the same convention in the sequel.
Inapproximability of Maximum Weighted Edge Biclique
287
V because if (φ1 (U1 ) − φ2 (U2 )) ∪ (φ2 (U2 ) − φ1 (U1 )) is not empty, then removing (U1 − U2 ) ∪ (U2 − U1 ) will never decrease the total weight of the solution. Given φ1 (U1 ) = φ2 (U2 ), we argue that there is no edge of weight −1 in biclique U1 ×U2 ; suppose otherwise there exists a weight −1 edge (i1 , j2 ) (i1 ∈ U1 , and j2 ∈ U2 ), then the corresponding edge (j1 , i2 ) (j1 ∈ U1 , and i2 ∈ U2 ) must be of weight −1 too and removing i1 , i2 from the solution biclique will increase total weight by at least 1 because among all edges incident to i1 and i2 , (i1 , i2 ) is of weight 1, (i1 , j2 ) and (i2 , j1 ) are of weight −1 and the rest are of weights either 0 or −1. Therefore, we have shown that if there is a solution U1 × U2 of weight k in G , U1 and U2 correspond to the same set of vertices U ∈ V and U is a clique of size k. It is clear that the reduction can be performed in polynomial time and the problem is NP, and thus NP-complete.
Given Lemma 1, the following corollary follows immediately from the above reduction. Theorem 1. For any constant > 0, no polynomial time algorithm can approximate problem {−1, 0, 1}-MWEB within a factor of n1− unless P = NP. Proof. It is obvious that the reduction given in the proof of Lemma 4 preserves inapproximability exactly, and given that CLIQUE is hard to approximate within
a factor of n1− unless P = NP, the theorem follows. Theorem 2. For any constant > 0, no polynomial time algorithm can approx1 imate {−1, 0, 1}-MWEB within a factor of n1− with probability at least poly(n) unless RP = NP. Proof. If there exists such a randomized algorithm for {−1, 0, 1}-MWEB, combining it with the reduction given in Lemma 4, we obtain an RP algorithm for CLIQUE. This is impossible unless RP = NP.
2.3
{−1, 1}-MWEB
Lemma 5. If there exists a polynomial time algorithm that approximates {−1, 1}MWEB within a factor of n , then there exists a polynomial time algorithm that ap1 . proximates {−1, 0, 1}-MWEB within a factor of n5 with probability at least poly(n) Proof. We prove this by constructing a {γ, {α, β}}-Product from {−1, 0, 1}MWEB to {−1, 1}-MWEB by setting γ = 0, α = −1 and β = 1. Since δ = 12 , according to Corollary 1, it is sufficient to set N = η 4 so that the probability of 1 .
obtaining a n5 -approximation for {−1, 0, 1}-MWEB is at least poly(n) Theorem 3. For any constant > 0, no polynomial time algorithm can approx1 1 imate {−1, 1}-MWEB within a factor of n 5 − with probability at least poly(n) unless RP = NP. Proof. This follows directly from Theorem 2 and Lemma 5.
288
2.4
J. Tan
{−η 2 −δ , 1}-MWEB and {−η δ− 2 , 1}-MWEB 1
1
In this section, we consider the generalized cases of the S-MWEB problem. Theorem 4. For any δ ∈ (0, 12 ], there exists some constant such that no poly1 nomial time algorithm can approximate {−η 2 −δ , 1}-MWEB within a factor of 1 n with probability at least poly(n) unless RP = NP. The same statement holds 1
for {−η δ− 2 , 1}-MWEB. Proof. We prove this by first construct a {γ, {α, β}}-Product from {−1, 1}1 1 MWEB to {−η 2 −δ , 1}-MWEB by setting γ = −1, α = −(N η) 2 −δ and β = 1. By 1 Corollary 1, we know that for any δ ∈ (0, 2 ], if there exists a polynomial time al1 gorithm that approximates {−η 2 −δ , 1}-MWEB within a factor of n , then there exists a polynomial time algorithm that approximates {−1, 1}-MWEB within a factor of n(1+
δ(3−2δ)+3 δ(1+2δ)
)
with probability at least
1 poly(n) .
So invoking the hardness
result in Theorem 3 gives the desired hardness result for {−η 2 −δ , 1}-MWEB. 1 The same conclusion applies to {−1, η 2 −δ }-MWEB by setting γ = 1, α = −1 1 and β = (N η) 2 −δ . Since η is a constant for any given graph, we can simply 1 1
divide each weight in {−1, η 2 −δ } by η 2 −δ . 1
Theorem 4 leads to the following general statement. min S δ−1/2 Theorem 5. For any small constant δ ∈ (0, 12 ], if max )∩O(η 1/2−δ ), S ∈ Ω(η then there exists some constant such that no polynomial time algorithm can ap1 unless proximate S-MWEB within a factor of n with probability at least poly(n) RP = NP.
3
Two Applications
In this section, we describe two applications of the results establish in Sect. 3 by proving hardness and inapproximability of problems found in practice. 3.1
SAMBA Model Is Hard
Microarray technology has been the latest technological breakthrough in biological and biomedical research; in many applications, a key step in analyzing gene expression data obtained through microarray is the identification of a bicluster satisfying certain properties and with largest area (see the survey [13] for a fairly extensive discussion on this). In particular, Tanay et. al. [18] considered the Statistical-Algorithmic Method for Bicluster Analysis (SAMBA) model. In their formulation, a complete bipartite graph is given where one side corresponds to genes and the other size corresponds to conditions. An edges (u, v) is assigned a real weight which could be either positive or negative, depending on the expression level of gene u in condition v, in a way such that heavy subgraphs corresponds to statistically significant
Inapproximability of Maximum Weighted Edge Biclique
289
biclusters. Two weight-assigning schemes are considered in their paper. In the first, or simple statistical model, a tight upper-bound on the probability of an observed biclusters in computed; in the second, or refined statistical model, the weights are assigned in a way such that a maximum weight biclique subgraph corresponds to a maximum likelihood bicluster. The Simple SAMBA Statistical Model: Let H = (V1 , V2 , E ) be a subgraph of G = (V1 , V2 , E), E = {V1 × V2 } − E and p = |V1|E| ||V2 | . The simple statistical model assumes that edges occur independently and identically at random with probability p. Denote by BT (k, p, n) the probability of observing k or more successes in n binomial trials, the probability of observing a graph at least as dense as H is thus p(H) = BT (|E |, p, |V1 ||V2 |). This model assumes p < 12 and |V1 ||V2 | |V1 ||V2 |, therefore p(H) is upper bounded by
p∗ (H) = 2|V1 ||V2 | p|E | (1 − p)|V1 ||V2 |−|E
|
The goal of this model is thus to find a subgraph H with the smallest p∗ (H). This is equivalent to maximizing − log p∗ (H) = |E |(−1 − log p) + (|V1 ||V2 | − |E |)(−1 − log (1 − p)) which is essentially solving a S-MWEB problem that assigns either positive weight (−1 − log p) or negative weight (−1 − log (1 − p)) to an edge (u, v), depending on whether gene u express or not in condition v, respectively. The summation of edge weights over H is defined as the statistical significance of H. (1−p) ∈ Ω( log1 η ) ∩ O(1). Since η12 ≤ p < 12 , asymptotically we have −1−log −1−log p Invoking Theorem 5 gives the following. Theorem 6. For the Simple SAMBA Statistical model, there exists some > 0 such that no polynomial time algorithm, possibly randomized, can find a bicluster whose statistical significance is within a factor of n of optimal unless RP = NP.
The Refined SAMBA Statistical Model: In the refined model, each edge (u, v) is assumed to Bernoulli trial with parameter pu,v , take an independent therefore p(H) = ( (u,v)∈E pu,v )( (u,v)∈E (1 − pu,v )) is the probability of observing a subgraph H. Since p(H) generally decreases as the size of H increases, Tanay et al. aims with the largest (normalized) likelihood ra to find a bicluster ( (u,v)∈E pc )( (u,v)∈E (1 − pc )) , where pc > max(u,v)∈E pu,v is a tio L(H) = p(H) constant probability and chosen with biologically sound assumptions. Note this is equivalent to maximizing the log-likelihood ratio
290
J. Tan
log L(H) =
(u,v)∈E
log
pc + pu,v
log
(u,v)∈E
1 − pc 1 − pu,v
c With this formulation, each edge is assigned weight either log ppu,v > 0 or
1−pc < 0 and finding the most statistically significant bicluster is equivalog 1−p u,v 1−pc c , log ppu,v }. Since pc is a constant lent to solving S-MWEB with S = {log 1−p u,v
and η12 ≤ pu,v < pc , we have Theorem 5 gives the following.
log (1−pc )−log (1−pu,v ) log pc −log pu,v
∈ Ω( log1 η ) ∩ O(1). Invoking
Theorem 7. For the Refined SAMBA Statistical model, there exists some > 0 such that no polynomial time algorithm, possibly randomized, can find a bicluster whose log-likelihood is within a factor of n of optimal unless RP = NP. 3.2
Minimum Description Length with Holes (MDLH) Is Hard
Bu et. al [4] considered the Minimum Description Length with Holes problem (defined in the following); the 2-dimensional case is claimed NP-hard in this paper and the proof is referred to [3]. However, the proof given in [3] suffers from an error in its reduction3 , thus whether MDLH is NP-complete remains unsettled. In this section, by employing the results established in the previous sections, we show that no polynomial time algorithm exists for MDLH, under the slightly weaker (than P = NP) but widely believed assumption RP = NP. We first briefly describe the Minimum Description Length summarization with Holes problem; for a detailed discussion of the subject, we refer the readers to [3,4]. Suppose one is given a k-dimensional binary matrix M , where each entry is of value either 1, which is of interest, or of value 0, which is not of interest. Besides, there are also k hierarchies (trees) associated with each dimension, namely T1 , T2 , ..., Tk , each of height l1 , l2 , ..., lk respectively. Define level l = maxi (li ). For each Ti , there is a bijection between its leafs and the ’hyperplanes’ in the ith dimension (e.g. in a 2-dimensional matrix, these hyperplanes corresponds to rows and columns). A region is a tuple (x1 , x2 , ..., xk ), where xi is a leaf node or an internal node in hierarchy Ti . Region (x1 , x2 , ..., xk ) is said to cover cell (c1 , c2 , ..., ck ) if ci is a descendant of xi , for all 1 ≤ i ≤ k. A k-dimensional l-level MDLH summary is defined as two sets S and H, where 1) S is a set of regions covering all the 1-entries in M ; and 2) H is the set of 0-entries covered (undesirably) by S and to be excluded from the summary. The length of a summary is defined as |S| + |H|, and the MDLH problem asks the question if there exists a MDLH summary of length at most K, for a given K > 0. In an effort to establish hardness of MDLH, we first define the following problem, which serves as an intermediate problem bridging {−1, 1}-MWEB and MDLH. 3
In Lemma 3.2.1 of [3], the reduction from CLIQUE to CEW is incorrect.
Inapproximability of Maximum Weighted Edge Biclique
291
Definition 3. (Problem P) Instance: A complete bipartite graph G = (V1 , V2 , E) where each edge takes on a value in {−1, 1}, and a positive integer k. Question: Does there exist an induced subgraph (a biclique U1 × U2 ) whose total weight of edges is ω, such that |U1 | + |U2 | + ω ≥ k. Lemma 6. No polynomial time algorithm exists for Problem P unless RP = NP. Proof. We prove this by constructing a reduction from {−1, 1}-MWEB to Problem P as follows: for the given input biclique G = (V1 , V2 , E), make N duplicates of V1 and N duplicates of V2 , where N = (|V1 | + |V2 |)2 . Connect each copy of V1 to each copy of V2 in a way that is identical to the input biclique, we then claim that there is a size k solution to {−1, 1}-MWEB if and only if there is a size N 2 k solution to Problem P. If there is a size k solution to {−1, 1}-MWEB, then it is straightforward that there is a solution to Problem P of size at least N 2 k. For the reverse direction, we show that if no solution to {−1, 1}-MWEB is of size at least k, then the maximum solution to Problem P is strictly less than N 2 k. Note a solution U1N × U2N to Problem P consists of at most N 2 (not necessarily all distinct) solutions to {−1, 1}-MWEB, and each of them can contribute at most (k − 1) in weight to U1N × U2N , so the total weight gained from edges is at most N 2 (k −√1). And note the total weight gained from vertices√is at most N (|V1 |+|V2 |) = N N , therefore the weight is upper bounded by N N + N 2 (k − 1) < N 2 k and this completes the proof. As a conclusion, we have a polynomial time reduction from {−1, 1}-MWEB to Problem P. Since no polynomial time algorithm exists for {−1, 1}-MWEB unless RP = NP, the same holds for Problem P.
Theorem 8. No polynomial time algorithm exists for MDLH summarization, even in the 2-dimension 2-level case, unless RP = NP. Proof. We prove this by showing that Problem P is a complementary problem of 2-dimensional 2-level MDLH. Let the input 2D matrix M be of size n1 ×n2 , with a tree of height 2 associated with each dimension. Without loss of generality, we only consider the ’sparse’ case where the number of 1-entries is less than the number of 0-entries by at least 2 so that the optimal solution will never contain the whole matrix as one of its regions. Let S be the set of regions in a solution. Let R and C be the set of rows and columns not included in S. Let Z be the set of all zero entries in M . Let z be the total number of zero entries in the R × C ’leftover’ matrix and let w be the total number of 1-entries in it. MDLH tries to minimize the following: (n1 − |R|) + (n2 − |C|) + (|Z| − z) + w = (n1 + n2 + |Z|) − (|R| + |C| + z − w) Since (n1 + n2 + |Z|) is a fixed quantity for any given input matrix, the 2dimensional 2-level MDLH problem is equivalent to maximizing (|R|+|C|+z−w), which is precisely the definition of Problem P.
292
J. Tan
Therefore, 2-dimensional 2-level MDLH is a complementary problem to Problem P and by Lemma 6 we conclude that no polynomial time algorithm exists for 2-dimensional 2-level MDLH unless RP = NP.
4
Concluding Remarks
Maximum weighted edge biclique and its variants have received much attention in recently years because of it wide range of applications in various fields including machine learning, database, and particularly bioinformatics and computational biology, where many computational problems for the analysis of microarray data are closely related. To tackle these applied problems, various kinds of heuristics are proposed and experimented and it is not known whether these algorithms give provable approximations. In this work, we answer this question by showing that it is highly unlikely (under the assumption RP = NP) that good polynomial time approximation algorithm exists for maximum weighted edge biclique for a wide range of choices of weight; and we further give specific applications of this result to two applied problems. We conclude our work by listing a few open questions. 1. We have shown that {Θ(−η δ ), 1}-MWEB is n -inapproximable for δ ∈ (− 12 , 12 ); also it is easy to see that (i) the problem is in P when δ ≤ −1, where the entire input graph is the optimal solution; (ii) for any δ ≥ 1, the problem is equivalent to MEB, which is conjectured to be n -inapproximable [8]. Therefore it is natural to ask what is the approximability of the {−nδ , 1}-MWEB problem when δ ∈ (−1, − 21 ] and δ ∈ [ 12 , 1]. In particular, can this be answered by a better analysis of Lemma 3? 2. We are especially interested in {−1, 1}-MWEB, which is closely related to the formulations of many natural problems [1,3,4,18]. We have shown that no polynomial time algorithm exists for this problem unless RP = NP, and we believe this problem is NP-complete, however a proof has eluded us so far.
References 1. Bansal, N., Blum, A., Chawla, S.: Correlation clustering. Machine Learning 56, 89–113 (2004) 2. Ben-Dor, A., Chor, B., Karp, R., Yakhini, Z.: Discovering local structure in gene expression data: The Order-Preserving Submatrix Problem. In: Proceedings of RECOMB 2002, pp. 49–57 (2002) 3. Bu, S.: The summarization of hierarchical data with exceptions. Master Thesis, Department of Computer Science, University of British Columbia (2004), http://www.cs.ubc.ca/grads/resources/thesis/Nov04/Shaofeng Bu.pdf 4. Bu, S., Lakshmanan, L.V.S., Ng, R.T.: MDL Summarization with Holes. In: Proceedings of VLDB 2005, pp. 433–444 (2005) 5. Cheng, Y., Church, G.: Biclustering of expression data. In: Proceedings of ISMB 2000, pp. 93–103. AAAI Press, Menlo Park (2000) 6. Dawande, M., Keskinocak, P., Swaminathan, J.M., Tayur, S.: On Bipartite and multipartite clique problems. Journal of Algorithms 41(2), 388–403 (2001)
Inapproximability of Maximum Weighted Edge Biclique
293
7. Feige, U.: Relations between average case complexity and approximation complexity. In: Proceedings of STOC 2002, pp. 534–543 (2002) 8. Feige, U., Kogan, S.: Hardness of approximation of the Balanced Complete Bipartite Subgraph problem. Technical Report MCS 2004-2004, The Weizmann Institute of Science (2004) 9. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-completeness. Freeman, San Francisco (1979) 10. Fontana, P., Guha, S., Tan, J.: Recursive MDL Summarization and Approximation Algorithms (preprint, 2007) 11. H˚ astad, J.: Clique is hard to approximate within n1− . Acta Mathematica 182, 105–142 (1999) 12. Khot, S.: Ruling out PTAS for Graph Min-Bisection, Densest Subgraph and Bipartite Clique. In: Proceedings of FOCS 2004, pp. 136–145 (2004) 13. Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics 1, 24–45 (2004) 14. Mishra, N., Ron, D., Swaminathan, R.: On finding large conjunctive clusters. In: Proceedings of COLT 2003, pp. 448–462 (2003) 15. Peeters, R.: The maximum edge biclique problem is NP-complete. Discrete Applied Mathematics 131, 651–654 (2003) 16. Swaminathan, J.M., Tayur, S.: Managing Broader Product Lines Through Delayed Differentiation Using Vanilla Boxes. Management Science 44, 161–172 (1998) 17. Tan, J., Chua, K., Zhang, L., Zhu, S.: Complexity study on clustering problems in microarray data analysis. Algorithmica 48(2), 203–219 (2007) 18. Tanay, A., Sharan, R., Shamir, R.: Discovering statistically significant biclusters in gene expression data. Bioinformatics 18(1), 136–144 (2002) 19. Zhang, L., Zhu, S.: A New Clustering Method for Microarray Data Analysis. In: Proceedings of CSB 2002, pp. 268–275 (2002) 20. Zuckerman, D.: Linear Degree Extractors and the Inapproximability of Max Clique and Chromatic Number. In: Proceedings of STOC 2006, pp. 681–690 (2006)
Symbolic Algorithm Analysis of Rectangular Hybrid Systems Haibin Zhang and Zhenhua Duan Institute of Computing Theory and Technology, Xidian University {hbzhang,ZhhDuan}@mail.xidian.edu.cn
Abstract. This paper investigates symbolic algorithm analysis of rectangular hybrid systems. To deal with the symbolic reachability problem, a restricted constraint system called hybrid zone is formalized. Hybrid zones are also applied to a symbolic model-checking algorithm for verifying some important classes of timed computation tree logic formulas. To present hybrid zones, a data structure called difference constraint matrix is defined. Using this structure, all reachability operations and model checking algorithms for rectangular hybrid systems are implemented. These enable us to deal with the symbolic algorithm analysis of rectangular hybrid systems in an efficient way. Keywords: hybrid systems, model checking, temporal logic, reachability analysis.
1
Introduction
Hybrid systems are state-transition systems consisting of a non-trivial mixture of continuous activities and discrete events [6,8]. Most of verification tools [2,3,4] for hybrid systems deal with the reachability analysis and model checking [16,17]. In this paper, we will be concerned with a symbolic approach which is based on the manipulation of conjunctions of inequalities. Many tools for timed automata use data structures and algorithms to manipulate timing constraints over clock variables called clock zone [11], which is a conjunction of inequalities that compare either a clock value or the difference between two clock values to an integer. In this paper, we are motivated to deal with the reachability of rectangular hybrid systems using some constraint system similar to clock zones. However, the immediate problem we encounter is that the constraint system is getting more complicated than clock zones. For linear hybrid systems, researchers have used convex polyhedra [10] as the basis for the symbolic manipulations. Using convex polyhedra, there is a basic operation “quantifier elimination” to compute reachability states, which has an exponential complexity [15]. As a subset of linear hybrid systems, we could also
This research is supported by the NSFC under Grant No. 60373103 and 60433010, the SRFDP under Grant 20030701015.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 294–305, 2008. c Springer-Verlag Berlin Heidelberg 2008
Symbolic Algorithm Analysis of Rectangular Hybrid Systems
295
use convex polyhedra as a basic unit for the symbolic reachability analysis of rectangular hybrid systems. However, we use a more effective constraint system called hybrid zone in this paper1 , which is a conjunction of inequalities that compare either a variable value or a linear expression of two variables to a rational number. We prove that hybrid zones are closed over the reachability operations of rectangular hybrid systems. The model checking problem for rectangular hybrid systems decides whether a rectangular hybrid system presented by a rectangular automaton meets a property given by a temporal logic formula. Unfortunately, this problem is proved to be undecidable. In [1], a semidecision model-checking algorithm for linear hybrid systems is given to check timed computation tree logic (TCTL) formulas. In the model-checking procedure, there are two important operations on state spaces of linear hybrid systems. They are backward time closure and precondition. We proved that our hybrid zones are also closed under those two operations for rectangular hybrid systems, which enables us to use the same algorithm to check properties of rectangular hybrid systems presented by some important classes of TCTL formulas. To represent hybrid zones, we define a matrix data structure difference constraint matrix (DCM), which is indexed by the variables in the rectangular hybrid systems together, and each entry in the matrix represents an inequality in the hybrid zone. Using DCM, we implement all reachability operations and model checking algorithms for rectangular hybrid systems. Similar to DBM [5], after the DCM has been converted to canonical form, those reachability operations and model checking algorithms for rectangular hybrid systems can be implemented straightforwardly. Hence, the main computation is the operation for obtaining the canonical form of hybrid zones. Finding the canonical form can be automated by an algorithm with polynomial time complexity. However, if we use convex polyhedra, the quantifier elimination will lead to an exponential complexity. This paper is organized as follows. The next section introduces rectangular automata. In Section 3, hybrid zones are defined. Section 4 investigates the model checking problem for rectangular hybrid systems. In Section 5, a data structure DCM is formalized. Conclusions are drawn in Section 6.
2
Rectangular Automata
Let Y = {y1 , · · · , yn } be a set of variables. A rectangle B ⊆ Rn over Y is defined by a conjunction of linear (in)equalities of the form yi ∼ c, where ∼ is <, ≤, =, ≥, or >, and c is a rational number [7,9]. For a rectangle B ⊆ Rn , we denote Bi by its projection onto the ith coordinate. The set of all n-dimensional rectangles is denoted by Bn . Definition 1. A rectangular automaton [9] is a tuple (Q, X, init, E, jump, update, reset, act, inv), where: 1
The definition of the hybrid zone is firstly formalized in our another paper [13], which is perfected in this paper.
296
H. Zhang and Z. Duan
– Q is a finite set of locations. – X is a finite set of real-numbered variables. – Init : Q → Bn is a labeling function that assigns an initial condition to each location q ∈ Q. – E ⊆ Q × Q is a finite set of edges called transitions. – jump : E → Bn is a labeling function that assigns a jump condition to each edge e ∈ E, which relates values of variables before edge e. – update : E → 2{1,...,n} is a function that maps an element in 2{1,...,n} to each edge e ∈ E. update(e) indicates the variables {xi |i ∈ update(e)} whose values must be changed over edge e. – reset : E → Bn is a labeling function that assigns a reset condition to each edge e ∈ E, which relates values of the variables after edge e. – inv : Q → Bn is a labeling function that assigns an invariant condition to each location q ∈ Q. – act : Q → Bn is a labeling function that assigns a flow condition to each location q ∈ Q being the form of x˙ = a (x ∈ X, a ∈ act(q)). In this paper, we investigate the initialized rectangular automata with bounded nondeterminism [7]. In addition, we require that for e = (q, q ) ∈ E, jump(e) is bounded; and for each interval act(q)i , either act(q)i ⊂ (−∞, 0] or act(q)i ⊂ [0, ∞) while both of its end-points are integers. A valuation is a function from (x1 , ..., xn ) to Rn , where xi ∈ X. We use V to denote the set of valuations. A state is a tuple (q, v), where q ∈ Q and v ∈ V . We use S to denote the set of states. For v ∈ V and d ∈ R+ , v + l · d denotes the valuation which maps each variable xi ∈ X to the value v(xi ) + li · d. For θ ⊆ {1, ..., n}, B ∈ Bn , [θ → B]v denotes the valuation which maps each variable xi ∈ X (i ∈ θ) to a value in Bi and agrees with v over {xk |k ∈ {1, ..., n}\θ}. For e = (q, q ) ∈ E, states (q, v) and (q , v ), if v ∈ jump(e) and v ∈ reset(e), then state (q , v ) is called a transition successor of state (q, v). The semantics of rectangular automata can be given by an infinite transition system. A run of a rectangular automaton H is a finite or infinite sequence ρ : s0 →tf00 s1 →tf11 s2 →tf22 · · · of states si = (qi , vi ) ∈ S, nonnegative reals ti ∈ R+ , and fi ∈ act(qi ), such that for all i ≥ 0: 1. For all 0 ≤ t ≤ ti , vi + fi · t ∈ inv(qi ). 2. The state si+1 is a transition successor of the state (qi , vi + fi · ti ). The state si+1 is called a successor of state si . The run ρ diverges if ρ is infinite and the infinite sum ti diverges. i≥0
3
Reachability Analysis
Given two states s and s of a hybrid system H, the reachability problem asks whether there exists a run of H which starts at s and ends at s . The reachability
Symbolic Algorithm Analysis of Rectangular Hybrid Systems
297
problem is central to the verification of hybrid systems. In the sequel, we will give a constraint system to deal with the symbolic reachability analysis of rectangular hybrid systems. We can use the algorithm in Fig. 1 to deal with the symbolic reachability analysis of rectangular hybrid systems, which checks whether a rectangular automaton may reach a state satisfying a given state formula φ. This algorithm can be expressed in terms of the following three reachability operations: given two sets of valuations D and D , two locations q, q ∈ Q and the edge e = (q, q ) ∈ E, intersection D ∧ D is the set of valuations in both D and D , time progress in location q D↑q is the set of valuations {u + l · d|u ∈ D ∧ l ∈ act(q) ∧ d ∈ R+ } that are reachable from some valuation in D by time progressing, variable reset Resete (D) is the set of valuations {[update(e) → reset(e)]u|u ∈ D} that are reachable from some valuation in D over the transition e (In Fig. 1, (q, D) → (qs , Ds ) iff qs = q ∧ Ds = (D ∧ inv(q))↑q or qs = q ∧ Ds = Resete (D ∧ jump(e )) ∧ e = (q, qs )).
Passed:={} Wait:={(q0 , D0 )} while Wait= {} get (q, D) from Wait if (q, D) |= φ then return ’YES’ else if D ⊆ D for all (q, D ) ∈Passed add (q, D) to Passed Next:={(qs , Ds )| (q, D) → (qs , Ds ) ∧ Ds = ∅} for all (qs , Ds ) ∈Next do put (qs , Ds ) to Wait return ’NO’ Fig. 1. An algorithm for symbolic reachability analysis
Hybrid zones In the sequel, for each variable xi ∈ X, li denotes the left end-point of act(q)i , and ri the right end-point, without clarification. We define a function g(a, b) as g(a, b) = gcd(a, b), if a · b = 0, otherwise, g(a, b) = 1, where gcd(a, b) is the greatest common divisor of a and b. Definition 2. For a location q of a rectangular automaton, let li denote the left end-point of act(q)i , and ri the right end-point. A q-zone is the conjunction of inequalities: (xi ≺ ci0 ∧ −xi ≺ c0i ) ∧ aij xi − bij xj ≺ cij , 0
such that for 0 < i = j ≤ n
0
298
H. Zhang and Z. Duan
⎧ ⎪ ⎪ ⎨ ⎪ ⎪ ⎩
aij aij aij aij
= lj / g(lj , ri ) = lj / g(lj , li ) = rj / g(rj , ri ) = rj / g(rj , li )
bij bij bij bij
= ri / g(lj , ri ), = li / g(lj , li ), = ri / g(rj , ri ), = li / g(rj , li ),
if li ≥ 0 and lj ≥ 0 if li ≥ 0 and rj < 0 if ri < 0 and lj ≥ 0 if ri < 0 and rj < 0
A hybrid zone is a conjunction of inequalities that accords with some q–zone, where q ∈ Q is a location of a rectangular automaton. By using a variable x0 which is always 0, we can obtain a general form of a hybrid zone: x0 = 0 ∧ aij xi − bij xj ≺ cij (3 − 1) 0≤i=j≤n
where a0k = b0k = ak0 = bk0 = 1, for 0 ≤ k ≤ n. Given a hybrid zone D represented by ax − by ≺ c, we call (≺, c) the bound. For two bounds (≺, c) and (≺ , c ), (≺, c) is tighter than (≺ , c ), denoted by (≺, c) < (≺ , c ), iff c < c or c = c , ≺ is < and ≺ is ≤. Further, for ≺ and ≺ , we define min(≺, ≺ ) is ≤ if both ≺ and ≺ are ≤, and min(≺, ≺ ) is < if one of ≺ and ≺ is <. For a constraint ax − by ≺ c in D, if (≺ , c ) is the tightest bound of linear expression ax − by that can be deduced from the conjunction of other constraints in D, then we call ax − by ≺ c the canonical constraint ; and D with each constraint being canonical the canonical hybrid zone, where (≺ , c ) = (≺, c) if (≺, c) < (≺ , c ), otherwise, (≺ , c ) = (≺ , c ). The following lemmas and theorems ensure that hybrid zones are closed over those three reachability operations of rectangular hybrid systems, which enables hybrid zones to be used as the basis for the state reachability analysis algorithm for rectangular hybrid systems in Fig. 1. To prove that the projection of a hybrid zone onto a lower dimensional subspace is also a hybrid zone, we need to prove that any newly produced constraint from arbitrary two constraints contained in the hybrid zone must be redundant. For this, we give a sufficient condition that a hybrid zone has q–property. If a qzone D has q-property, then any constraint newly produced from two constraints in D can be deduced from the conjunction of other constraints in D. Given a location q of a rectangular automaton and a q-zone D, the canonical form of D is represented by (3-1). We say D has q-property, iff for 0 < i = j = k ≤ n, one of the following conditions is satisfied: 1. if li ≥ 0, lj ≥ 0, lk ≥ 0, then g(lk , ri )lj cik + g(lj , rk )ri ckj g(lj , ri )lk cij + (rk ri − lk ri )c0j ≤ 2. if li ≥ 0, rj ≥ 0, rk < 0, then − g(lj , ri )rk cij + (rk ri − lk ri )c0j ≤ g(ri , rk )lj cki + g(lk , lj )ri cjk 3. if li ≥ 0, lj < 0, lk ≥ 0, then g(lj , lk )li ckj − g(li , rk )lj cki g(lj , li )rk cij + (rk li − lk li )cj0 ≤ 4. if li ≥ 0, lj < 0, lk < 0, then − g(lj , li )lk cij + (rk li − lk li )cj0 ≤ g(rk , lj )li cjk − g(lk , li )lj cik 5. if li < 0, lj ≥ 0, lk ≥ 0, then g(lk , rj )ri cjk g(rj , ri )rk cij + (lk ri − rk ri )c0j ≤ g(rk , ri )rj cik −
Symbolic Algorithm Analysis of Rectangular Hybrid Systems
299
6. if li < 0, rj ≥ 0, rk < 0, then − g(rj , ri )lk cij + (lk ri − rk ri )c0j ≤ g(ri , lk )rj cki − g(rj , rk )ri ckj 7. if li < 0, lj < 0, lk ≥ 0, then g(rj , li )lk cij + (lk li − rk li )cj0 ≤ − g(li , lk )rj cki − g(rk , rj )li cjk 8. if li < 0, lj < 0, lk < 0, then − g(rj , li )rk cij + (lk li − rk li )cj0 ≤ − g(rk , li )rj cik − g(rj , lk )li ckj For a canonical q-zone D, and a variable xr , D has q-property is a sufficient condition for ∃xr [D] being a q-zone with variables X \ {xr }. Thus, the following lemma can be readily obtained. Lemma 1. Given a location q of a rectangular automaton, if D is a q-zone with q-property, xr is a variable in X, then ∃xr [D] is a q-zone with q-property over variables X \ {xr }. Proof. Suppose the canonical form of D is given by (3-1). Without loss of generation, let r = n > 0. Since D has q-property, then any inequality obtained from two inequalities in D by eliminating variable xn are redundant. So, ∃xn [D] is the q-zone x0 = 0 ∧ aij xi − bij xj ≺ cij . 0≤i=j
Since D has q-property, so ∃xn [D] has q-property over X \ {xn }.
Given a canonical hybrid zone D represented by (3-1), it is easy to prove that D is empty iff there exists a k.(0 < k ≤ n) such that (min(≺0k , ≺k0 ), c0k + ck0 ) < (≤, 0). For any location q of a rectangular automaton, a bounded rectangle can be easily expressed as a q-zone with q-property. This ensures that the assignment of values to variables in the initial condition and every variable constraint used in the invariant condition of a rectangular automaton location can be expressed as hybrid zones. Moreover the jump condition of a transition is a hybrid zone. The following three theorems ensure that hybrid zones keep closed over the three reachability operations. Theorem 1. Given a location q of a rectangular automaton, if D is a q-zone with q-property, then D↑q is a q-zone with q-property. Theorem 2. Given a location q of a rectangular automaton, if D is a bounded rectangle and D is a q-zone with q-property, then D ∧ D is a q-zone with qproperty. The proofs of Theorem 1 and 2 can refer to our another paper [13]. To ensure D ∧D having q-property is necessary and crucial, as after being intersected with D , the hybrid zone D will traverse an edge e with q as the left end location, D ∧ D having q-property is the sufficient condition that all the valuation that can be reached from D ∧ D by traversing the edge e can be represented by the hybrid zone. For the q-zone D, an edge e = (q, q ), and a single variable set update(e ) = {xi }, by Lemma 1, Resete [D] = ∃xi [D] ∧ xi ∈ reset(e )i is a q – zone. It is important that Resete [D] has q -property, as it will traverse another
300
H. Zhang and Z. Duan
edge by time elapsing and intersected with a jump condition. This is ensured by the following lemma. The result can easily be extended to sets with more than one variable by induction, which is the conclusion of Theorem 3. The proofs of Lemma 2 and Theorem 3 can refer to our paper [13]. Lemma 2. Given two locations q and q of a rectangular automaton. Suppose D, given by x0 = 0 ∧ aij xi − bij xj ≺ cij 0≤i=j
is a canonical q-zone with q-property, and q is a location such that act(q )i = act(q)i , for 0 < i < n. Then aij xi − bij xj ≺ cij x0 = 0 ∧ x0 − xn ≺ c0n ∧ xn − x0 ≺ cn0 ∧ 0≤i=j
is a q -zone with q -property, where c0n and cn0 are two constant rational numbers such that c0n + cn0 ≥ 0. Theorem 3. Given an edge e = (q, q ) of a rectangular automaton. If D is a q-zone with q-property, then Resete [D] is a q -zone with q -property.
4
Model Checking
In this section, we address the model checking problem of rectangular hybrid systems, which is to check whether a given rectangular automaton satisfies a requirement expressed in the timed computing tree logic [1,14]. 4.1
Timed Computation Tree Logic
Here, we introduce the TCTL presented in [1]. Let C be a set of clocks satisfying C ∩ X = ∅, a state predicate is a linear formula over the set C ∪ X. The syntax of TCTL is given by the grammar φ ::= ψ | ¬φ | φ1 ∨ φ2 | z.φ1 | φ1 ∃Uφ2 | φ1 ∀Uφ2 where ψ is a state predicate, ∃U and ∀U are temporal operators. The formula φ is closed if all occurrences of a clock z ∈ C are within the scope of a reset def quantifier z. In the following, some abbreviations are given: ∀♦φ = true∀Uφ, def def def ∃♦φ = true∃Uφ, ∀2φ = ¬∃♦¬φ, ∃2φ = ¬∀♦¬φ. We also put timing constraints as subscripts on the temporal operators. For example, the formula z.∃♦(φ ∧ z ≤ 3) is abbreviated to ∃♦z≤3 φ. Given a run ρ = s0 →tf00 s1 →tf11 s2 →tf22 . . . of a rectangular automaton H, we can construct a function g : R+ → S such that for any t ∈ R+ , g(t) = i (qi , vi + fi · t ), where i and t ∈ R+ satisfy t = tj + t and si = (qi , vi ). We j=0
use ρ(t) to denote the state g(t).
Symbolic Algorithm Analysis of Rectangular Hybrid Systems
301
A clock valuation ν is a function from C to R+ . For any clock z ∈ C, we use ν[z := 0] to denote the clock valuation ν such that ν (z) = 0 and ν (z ) = ν(z ), for all z (z = z) ∈ C. An extended state is a tuple (s, ν). The extended state (s, ν) satisfies the TCTL-formula φ, denoted (s, ν) |= φ, if (s, ν) |= ψ iff (s, ν)(ψ). (s, ν) |= ¬φ iff (s, ν) |= φ. (s, ν) |= φ1 ∨ φ2 iff (s, ν) |= φ1 or (s, ν) |= φ2 . (s, ν) |= z.φ iff (s, ν[z := 0]) |= φ. (s, ν) |= φ1 ∃Uφ2 iff there exists a run ρ of rectangular automaton H with ρ(0) = s and a time t ∈ R+ such that (ρ(t), ν+t) |= φ2 and for any 0 ≤ t ≤ t, (ρ(t ), ν + t ) |= φ1 ∨ φ2 . 6. (s, ν) |= φ1 ∀Uφ2 iff for all divergent runs ρ of rectangular automaton H with ρ(0) = s and a time t ∈ R+ such that (ρ(t), ν + t) |= φ2 and for any 0 ≤ t ≤ t, (ρ(t ), ν + t ) |= φ1 ∨ φ2 .
1. 2. 3. 4. 5.
For a closed TCTL formula φ, a state s ∈ S satisfies φ, denoted by s |= φ, if (s, ν) |= φ for all clock valuation ν. The rectangular automaton H satisfies φ, denoted by H |= φ, if all states of H satisfies φ. We use |φ| to denote the states of H that satisfies φ. 4.2
Model Checking
Given a closed TCTL-formula φ, a model-checking algorithm computes the characteristic set |φ|. We use the algorithm in [1] to deal with the model-checking problem for rectangular hybrid systems with our hybrid zones. The procedure is based on fixpoint characterizations of the TCTL-modalities in terms of a binary next operator . Given two sets of states R and R , R R is the set of the state s that have a successor s ∈ R such that all states between s and s are contained in R ∪ R : (q, v) ∈ R R iff ∃(q , v ) ∈ R , ∃t ∈ R+ , ∃e ∈ E.((q , v ) ∈ (q , Resete ({v}↑q ))∧ ∀0 ≤ t ≤ t.((q, v + act(q) · t) ∈ R ∪ R )), that is, the operator is a single-step until operator. To define the operator syntactically, we introduce two notations. Given an edge e = (q, q ) ∈ E and a set of valuations D, we define the backward time elapsing in location q D↓q and the backward zone Bace (D) as v ∈ D↓q if f ∃v ∈ D, ∃f ∈ act(q), ∃t ∈ R+ .(v = v + f · t ∧ ∀0 ≤ t ≤ t.((v + f · t ) ∈ inv(q))), v ∈ Bace (D) if f ∃v ∈ D.(v ∈ Resete ({v})). Then, for two sets of states R and R , let D = {v|∃q ∈ Q.((q, v) ∈ R )}, we can define R R as (q, v ) ∈ R R if f ∃ e ∈ E, ∃ v ∈ Bace (D), ∃ f ∈ act(q).(v ∈ {v}↓q ).
302
H. Zhang and Z. Duan
Lemma 3. Given a location q of a rectangular automaton, if D is a q-zone with q-property, then D↓q is a q-zone with q-property. Proof. Suppose q is a location such that for all k ∈ R, k ∈ act(q) if and only if −k ∈ act(q ). By the definition of q–zone and q–property, D is a q-zone iff D is a q -zone, D has q-property iff D has q -property. By the definition of backward time elapsing, D↓q = D↑q . By Theorem 1, D↑q is a q -zone with q -property, then D↓q is a q-zone with q-property. Lemma 4. Given an edge e = (q, p) of a rectangular automaton, if D is a p-zone with p-property, then Bace [D] is a q-zone with q-property. Proof. Suppose update(e) = {i1 , ..., im }, by the definition of backward zone, Bace [D] is ∃xi1 · · · ∃xim [D ∧ xj ∈ reset(e)j ]. j∈update(e)
By the definition of initialled rectangular automaton, for ∀k ∈ {1, ..., n} \ update (e), act(q)k = act(p)k . By Lemma 1 and Theorem 2, Bace [D] is a q-zone with q-property. Theorem 4. Given two locations q and q of a rectangular automaton, if D is a q-zone, D is a q -zone, R = (q, D) and R = (q , D ) are two state regions, then D∗ = {v|(q, v) ∈ R R } is a q-zone. Proof. It is the immediate consequence of Lemma 3 and 4.
∃U(φ1 , φ2 ) { / ∗ φ1 , φ2 are two formulas ∗/ R1 = |φ1 |; R2 = |φ2 |; R = |φ2 |; R = |φ1 | |φ2 |; while(R ⊆ R ) { R = R ∪ R; R = R1 R; } return R ; } ∀2(φ) { / ∗ φ is a formula ∗/ R = |φ|; R = |φ| ∩ ¬(true ¬|φ|) while(R ⊆ R ) { R = R ∩ R; R = R ∩ ¬(true ¬R); } return R ; } ∀♦≤c (φ) { / ∗ φ is a formula ∗/ R = |φ|; R∗ = |z > c|; R = |z > c| ∩ ((¬R) ¬|z > c|); /*z ∈ C*/ while(R ⊆ R∗ ) { R∗ = R∗ ∩ R ; R = R ∩ ((¬R) ¬R ); } return R∗ ; } Fig. 2. The model checking procedures
Symbolic Algorithm Analysis of Rectangular Hybrid Systems
303
The meaning of both TCTL- modalities ∀U and ∃U can be computed iteratively as fixpoints using the operator. By [1], the iterative fixpoint computation always terminates for rectangular hybrid systems. Theorem 4 ensures that all regions that are computed by the process are hybrid zones. Refer to the method presented in [1], Fig. 2 gives the model checking procedures for some important classes of TCTL-formulas,
5
Difference Constraint Matrix
To represent hybrid zones, we formalize a data structure difference constraint matrix. This matrix is indexed by the variables in X together with a special variable x0 whose value is always 0. This variable plays exactly the same role as the variable x0 in the previous section. Each entry Dij (i = j) in the matrix D has the form (aij , bij , dij , ≺ij ) and represents the inequality aij xi − bij xj ≺ij dij , or (aij , bij , ∞, <), if no bound is known for aij xi − bij xj . Each entry Dii has the form (1, 1, dii , ≺ii ). Given a hybrid zone D in its general form represented by (3-1). We can obtain the matrix D shown below: – Dij = (aij , bij , cij , ≺ij ), for 0 ≤ i = j ≤ n. – Dii = (1, 1, 0, ≤), for 0 ≤ i ≤ n. A DCM D with each Dij = (aij , bij , dij , ≺ij ) is canonical iff the hybrid zone represented by D is canonical, and (min(≺i0 , ≺0i ), (di0 + d0i )) < (≺ii , dii ), for all 0 ≤ i ≤ n. We describe five operations on canonical DCM. These operations correspond to the five operations defined on hybrid zones. Intersection. Given a location q of a rectangular automaton, and two q-zones respectively represented by two DCMs D1 and D2 , we define D = D1 ∧ D2 . Let 1 2 = (a, b, c1 , ≺1 ) and Dij = (a, b, c2 , ≺2 ). Then Dij = (a, b, min(c1 , c2 ), ≺), Dij where ≺ is defined as follows: – – – –
If If If If
c1 c2 c1 c1
< c2 , then ≺=≺1 . < c1 , then ≺=≺2 . = c2 and ≺1 =≺2 , then ≺=≺1 . = c2 and ≺1 =≺2 , then ≺=<.
Variable Reset. Given an edge e = (q, q ) of a rectangular automaton, and a qzone represented by a DCM D. Suppose Dij = (aij , bij , cij , ≺ij ) and reset(e)k = −xk ≺k μk ∧ xk ≺k νk , for k ∈ update(e). We define D = Resete (D) as follows: – Dij = (aij , bij , ∞, <), for i ∈ update(e) or j ∈ update(e) such that i = j and i · j = 0, where aij and bij satisfy the definition of q −zone. – Dij = Dij , for (i = j) ∈ update(e). – D0i = (1, 1, μi , ≺i ), Di0 = (1, 1, νi , ≺i ), for i ∈ update(e) and 0 < i ≤ n. = Dii , for 0 ≤ i ≤ n. – Dii
Elapsing of Time in Location q. Given a location q of a rectangular automaton, and a q-zone represented by a DCM D, we define D = D↑q as follows:
304
H. Zhang and Z. Duan
– Di0 = (1, 1, ∞, <), D0i = D0i , if li ≥ 0; Di0 = Di0 , D0i = (1, 1, ∞, <), if ri < 0, for i = 0. = Dij , for i = j = 0 or i · j = 0. – Dij
Backward Time Elapsing in Location q. Suppose Dij = (aij , bij , cij , ≺ij ), define D = D↓q as follows: – Di0 = (1, 1, ∞, <), D0i = D0i , if li < 0; Di0 = Di0 , D0i = (1, 1, ∞, <), if ri ≥ 0, for any i = 0. = Dij , if i = j = 0 or i = 0 ∧ j = 0. – Dij
Backward Zone. Suppose e = (q, q ) is an edge of a rectangular automa ton, and the canonical form of D = D ∧ reset(e)i has each Dij as i∈update(e)
(aij , bij , cij , ≺ij ). We define D∗ = Bace (D) as follows: ∗ – Dij = (aij , bij , ∞, <), for i ∈ update(e) or j ∈ update(e). ∗ = Dij , for i, j ∈ update(e). – Dij
In each case the resulting DCM may fail in its canonical form. Thus, as a final step we need to reduce the DCM to the canonical form. Given a hybrid zone D represented by (3-1), to find the canonical form of a constraint aij xi −bij xj ≺ cij is equal to find the maximum of function f (xi , xj ) = aij xi − bij xj over the constraint system D. This is the issue of linear programming, and it can be automated by the algorithm in [12]. As clarified in [12], the running-time of this algorithm is O(n3.5 L2 ), where n is the dimension of the problem and L is the number of bits in the input. Hence, finding canonical form of hybrid zone D can be automated within a polynomial-time O(n5.5 L2 ).
6
Conclusion
In this paper, a hybrid zone with constraints less than n2 and a data structure DCM are defined. After the DCM has been converted to canonical form, the manipulating operations of rectangular hybrid systems on hybrid zones can be implemented straightforwardly. Hence, the main computation is n2 operations for obtaining canonical form at most. Finding the canonical form can be automated by an algorithm with polynomial time-complexity. In addition, we implement the reachability operations and model checking algorithms for rectangular hybrid systems using DCMs.
References 1. Alur, R., Courcoubetis, C., Halbwachs, N., Henzinger, T.A., Ho, P.-H., Nicollin, X., Olivero, A., Sifakis, J., Yovine, S.: The algorithmic analysis of hybrid systems. Theoretical Computer Science 138, 3–34 (1995) 2. Alur, R., Courcoubetis, C., Henzinger, T.A., Ho, P.-H.: Hybrid Automata: an Algorithmic Approach to the Specification and Verification of Hybrid Systems. In: Grossman, R.L., Ravn, A.P., Rischel, H., Nerode, A. (eds.) HS 1991 and HS 1992. LNCS, vol. 736, Springer, Heidelberg (1993)
Symbolic Algorithm Analysis of Rectangular Hybrid Systems
305
3. Alur, R., Henzinger, T.A., Ho, P.-H.: Automatic Symbolic Verification of Embedded Systems. In: Proceedings of 1993 IEEE Real-Time System Symposium (1993) 4. Annichini, A., Asarin, E., Bouajjani, A.: Symbolic Techniques for Parametric Reasoning about Counter and Clock Systems. In: Emerson, E.A., Sistla, A.P. (eds.) CAV 2000. LNCS, vol. 1855, Springer, Heidelberg (2000) 5. Dill, D.L.: Timing Assumptions and Verification of Finite-State Concurrent Systems. LNCS, vol. 407, pp. 197–212 (1989) 6. Duan, Z.: Modeling of hybrid systems. Ph.D thesis, Department of Computer Science, University of Shefield, UK (February 1997) 7. Henzinger, T.A., Kopke, P.W., Puri, A., Varaiya, P.: What s decidable about hybrid automata? J.Comput.Syst.Sci. 57, 94–124 (1998) 8. Henzinger, T.A., Majumdar, R.: Symbolic model checking for rectangular hybrid systems(Abstract). In: Schwartzbach, M.I., Graf, S. (eds.) ETAPS 2000 and TACAS 2000. LNCS, vol. 1785, pp. 146–156. Springer, Heidelberg (2000) 9. Kopke, P.W.: The thoery of Rectangular Hybrid Automata. Ph.D thesis, Cornell University (1996) 10. Wang, F.: Symbolic Parametric Safety Analysis of Linear Hybrid Systems with BDD-Like Data-Structures. In: Alur, R., Peled, D.A. (eds.) CAV 2004. LNCS, vol. 3114, pp. 295–307. Springer, Heidelberg (2004) 11. Behrmann1, G., Larsen, K.G., Pearson, J., Weise, C., Yi, W.: Efficient Timed Reachability Analysis Using Clock Diffierence Diagrams. In: Halbwachs, N., Peled, D.A. (eds.) CAV 1999. LNCS, vol. 1633, pp. 341–353. Springer, Heidelberg (1999) 12. Karmarkar, N.: A new polynomial-time algorithm for linear programming. Combinatorica 4(4), 373–395 (1984) 13. Zhang, H., Duan, Z.: Symbolic Reachability Analysis of Rectangular Hybrid Systems. In: ICCP 2006, Cluj-Napoca, Romania, pp. 240–248 (2006) 14. Alur, R., Courcoubetis, C., Dill, D.L.: Model-checking in dense real-time. Inform. Computat. 104, 2–34 (1993) 15. Anai, H., Weispfnening, V.: Reach Set Computations Using Real Quantifier Elimination. In: Di Benedetto, M.D., Sangiovanni-Vincentelli, A.L. (eds.) HSCC 2001. LNCS, vol. 2034, pp. 63–76. Springer, Heidelberg (2001) 16. Zhang, H., Duan, Z.: Symbolic Reachability Analysis of Multirate Hybrid Systems. Journal of Xi’an Jiao Tong University 41(4), 412–416 (2007) 17. Zhang, H., Duan, Z.: Decidability of The Dense Timed Interval Temporal Logic. Journal of Xidian University 34(3), 463–467 (2007)
On the OBDD Complexity of the Most Significant Bit of Integer Multiplication (Extended Abstract) Beate Bollig LS2 Informatik, TU Dortmund, 44221 Dortmund, Germany [email protected]
Abstract. Integer multiplication as one of the basic arithmetic functions has been in the focus of several complexity theoretical investigations. Ordered binary decision diagrams (OBDDs) are the most common dynamic data structure for boolean functions. Among the many areas of application are verification, model checking, computer-aided design, relational algebra, and symbolic graph algorithms. In this paper it is shown that the OBDD complexity of the most significant bit of integer multiplication is exponential answering an open question posed by Wegener (2000). Keywords: Computational complexity, integer multiplication, lower bounds, ordered binary decision diagrams.
1
Introduction and Result
Integer multiplication is certainly one of the most important functions in computer science and a lot of effort has been spent in designing good algorithms and small circuits and in determining its complexity. For one of the latest results see, e.g., [8]. Ordered binary decision diagrams (OBDDs) are the most common dynamic data structure for boolean functions. Although many even exponential lower bounds on the OBDD size of boolean functions are known and the lower bound methods how to obtain such bounds are simple, it is often a more difficult task to prove large lower bounds for some predefined and interesting functions. Despite the well-known lower bounds on the OBDD size of the so-called middle bit of multiplication ([7], [17]), until now the OBDD complexity of the most significant bit of multiplication has been unknown and Wegener [16] has asked whether its OBDD complexity is exponential. In the following we answer his question affirmatively. 1.1
Branching Programs or Binary Decision Diagrams
Besides boolean circuits and formulae, branching programs (BPs), sometimes also called binary decision diagrams (BDDs), are one of the standard representations for boolean functions. (For a history of results on branching programs see, e.g., the monograph of Wegener [16]). M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 306–317, 2008. c Springer-Verlag Berlin Heidelberg 2008
On the Most Significant Bit of Integer Multiplication
307
Definition 1. A branching program (BP) on the variable set Xn = {x1 , . . . , xn } is a directed acyclic graph with one source and two sinks labeled by the constants 0 and 1. Each non-sink node (or decision node) is labeled by a boolean variable and has two outgoing edges, one labeled by 0 and the other by 1. An input b ∈ {0, 1}n activates all edges consistent with b, i.e., the edges labeled by bi which leave nodes labeled by xi . A computation path for an input b in a BP G is a path of edges activated by the input b which leads from the source to a sink. A computation path for an input b which leads to the 1-sink is called accepting path for b. Let Bn denote the set of all boolean functions f : {0, 1}n → {0, 1}. The BP G represents a function f ∈ Bn for which f (b) = 1 iff there exists an accepting path for the input b. The size of a branching program G is the number of its nodes and is denoted by |G|. The branching program size of a boolean function f is the size of the smallest BP representing f . The length of a branching program is the maximum length of a path. It is well known that the logarithm of the branching program size is essentially the same as the space complexity of the nonuniform variant of Turing machines. Hence, it is a fundamental open problem to prove superpolynomial lower bounds on the size of branching programs for explicitly defined boolean functions. In order to develop and strengthen lower bound techniques one considers restricted computation models. There are several possibilities to restrict BPs, among them restrictions on the multiplicity of variable tests or the order in which variables may be tested. Definition 2. i) A branching program is called read-k-times (BPk) if each variable is tested on each path at most k times. ii) A branching program is called s-oblivious for a sequence of variables s = (s1 , . . . , sl ), si ∈ Xn , or short oblivious, if the set of decision nodes can be partitioned into disjoint sets Vi , 1 ≤ i ≤ l, such that all nodes from Vi are labeled by si and the edges which leave Vi -nodes reach a sink or a Vj -node where j > i. The length of an s-oblivious branching program is the length of the sequence s. Besides the complexity theoretical viewpoint people have used branching programs in applications. Representations of boolean functions that allow efficient algorithms for many operations, in particular synthesis (combine two functions by a binary operation) and equality test (do two representations represent the same function?) are necessary. Bryant [6] introduced ordered binary decision diagrams (OBDDs) which have become the most popular data structure for boolean functions. Among the many areas of application are verification, model checking, computer-aided design, relational algebra, and symbolic graph algorithms. Definition 3. An OBDD is a branching program with a variable order given by a permutation π on the variable set. On each path from the source to the sinks, the variables at the nodes have to appear in the order prescribed by π (where
308
B. Bollig
some variables may be left out ). A π-OBDD is an OBDD ordered according to π. The π-OBDD size of f denoted by π-OBDD(f ) is the size of the smallest π-OBDD representing f . The OBDD size of f , sometimes also called OBDD complexity of f , (denoted by OBDD(f )) is the minimum of all π-OBDD(f ). 1.2
Integer Multiplication and Binary Decision Diagrams
Lower bounds for integer multiplication are motivated by the general interest in the complexity of important arithmetic functions. Definition 4. The boolean function MULi,n ∈ B2n maps two n-bit integers x = xn−1 . . . x0 and y = yn−1 . . . y0 to the ith bit of their product, i.e., MULi,n (x, y) = zi , where x · y = z2n−1 . . . z0 . The bit z2n−1 is the most significant bit of integer multiplication in the following sense. Let (z2n−1 , . . . , z0 ) be the binary representation of the integer z, i.e., i z = 2n−1 i=0 zi ·2 . Since the bit z2n−1 has the highest value, for the approximation of the value of the product of two n-bit numbers x and y it is the most interesting one. On the other hand for space bounded models of computation the most significant bit of integer multiplication is the easiest one to compute in the sense that if it cannot be computed with size s(n), then any other bit zi , 2n − 1 < i ≤ n − 1, cannot be computed with size s(n/2). and any other bit zi , n − 1 < i ≤ 0, cannot be computed in size s(i/2). The middle bit of integer multiplication (the bit zn−1 ) is the hardest bit to compute for space bounded models of computation in the sense that if it can be computed with size s(n), then any other bit can be computed with size at most s(2n). More precisely, any branching program for MUL2n−1,2n can be converted into a branching program representing MULi,n , 0 ≤ i ≤ 2n − 1, by relabeling the nodes and by replacing some inputs with the constant 0. Therefore, the first exponential lower bounds have been proved for MULn−1,n . For OBDDs Bryant [7] has presented an exponential lower bound of 2n/8 and Gergov has extended the result for so-called nondeterministic linear-length oblivious branching pro√ grams [9]. Later Ponzio has shown that the complexity of this function is 2Ω( n) for read-once branching programs [12]. Progress in the analysis of MULn−1,n has been achieved by a new approach using universal hashing. Woelfel [17] has improved Bryant’s lower bound to Ω(2n/2 ) and Bollig and Woelfel [3] have presented a lower bound of Ω(2n/4 ) for read-once branching programs. Exponential lower bounds have also been proved for more general read-once branching program models that allow limited nondeterminism and for models where some but not all variables may be tested multiple times (see, e.g., [2], [5], [18], [4]). Finally, Sauerhoff and Woelfel [13] have presented exponential lower bounds on the size of read-k-times branching programs representing the middle bit of multiplication. Despite the well-known lower bounds for the middle bit of multiplication, until now the OBDD complexity of the most significant bit of multiplication has been unknown. Since the most significant bit is a monotone function it seems to be easier to compute than the middle bit. The known upper bounds on the OBDD
On the Most Significant Bit of Integer Multiplication
309
size confirms this intuition. Amano and Maruoka [1] have presented an upper bound of O(2n ) on the OBDD size of the most significant bit of multiplication, whereas the best known upper bound for the middle bit is O(26/5n ). Furthermore, in the lower bound proofs on the OBDD size for MULn−1,n it has been shown that for an arbitrary variable order π there exists an assignment b to one of the input vectors such that the π-OBDD size for the resulting subfunction is exponential. In contrast it is not difficult to see that the OBDD size for any subfunction of MUL2n−1,n where one of the input vectors is a constant is O(n2 ). Computing the set of nodes that are reachable from some source s ∈ V in a digraph G = (V, E) is an important problem in computer-aided design, hardware verification, and model checking. Proving exponential lower bounds on the space complexity of a common class of OBDD-based algorithms for the reachability problem, Sawitzki [14] has presented the first exponential lower bound on the size of π-OBDDs representing the most significant bit for the variable order π where the variables are tested according to increasing significance, i.e. π = (x0 , y0 , x1 , y1 , . . . , xn−1 , yn−1 ). For the lower bounds on the space complexity of the OBDD-based algorithms he has used the assumption that the output OBDDs use the same variable order as the input OBDDs. But in contrast, practical algorithms usually run variable reordering heuristics on intermediate OBDD results in order to minimize their size. Therefore, it is interesting whether the OBDD size of the most significant bit of multiplication is exponential with respect to an arbitrary variable order. In this paper we present the following result. Theorem 1. OBDD(MUL2n−1,n ) = Ω(2n/720 ). As a by-product we obtain almost the same lower bound on the π-OBDD size for the variable order π = (x0 , y0 , x1 , y1 , . . . , xn−1 , yn−1 ) as Sawitzki using a much simpler proof.
2 2.1
Preliminaries Notation
In the rest of the paper we use the following notation. Let [x]lr , n − 1 ≥ l ≥ r ≥ 0, denote the bits xl . . . xr of a binary number x = (xn−1 , . . . , x0 ). For the ease of description we use the notation [x]lr = z if (xl , . . . , xr ) is the binary representation of the integer z ∈ {0, . . . , 2l−r+1 − 1}. Sometimes, we identify [x]lr with z if the meaning is clear from the context. Let ∈ {0, . . . , 2m − 1}, then denotes the number (2m − 1) − . 2.2
Communication Complexity
In order to obtain lower bounds on the size of OBDDs one-way communication complexity has become a standard technique (see Hromkoviˇc [10] and Kushilevitz and Nisan [11] for the theory of communication complexity and the results mentioned below).
310
B. Bollig
The main subject is the analysis of the following (restricted) communication game. Consider a boolean function f ∈ Bn which is defined on the variables in Xn = {x1 , . . . , xn }, and let Π = (XA , XB ) be a partition of Xn . Assume that Alice has only access to the input variables in XA and Bob has only access to the input variables in XB . In a one-way communication protocol, upon a given input x, Alice is allowed to send a single message (depending on the input variables in XA ) to Bob who must then be able to compute the answer f (x). The oneway communication complexity of the function f denoted by C(f ) is the worst case number of bits of communication which need to be transmitted by such a protocol that computes f . It is easy to see that an OBDD G with respect to a variable order where the variables in XA are tested before the variables in XB can be transformed into a communication protocol and C(f ) ≤ log |G|. Therefore, linear lower bounds on the communication complexity of a function f lead to exponential lower bounds on the OBDD complexity. One central notion of communication complexity are fooling sets which play an important role for the lower bound proof used later on. Definition 5. Let f : {0, 1}|Xa| × {0, 1}|XB | → {0, 1}. A set S ⊆ {0, 1}|Xa | × {0, 1}|XB | is called fooling set for f if f (a, b) = c for all (a, b) ∈ S and some c ∈ {0, 1} and if for different pairs (a1 , b1 ), (a2 , b2 ) ∈ S at least one of f (a1 , b2 ) and f (a2 , b1 ) is unequal to c. Theorem 2. If f : {0, 1}|Xa| × {0, 1}|XB | → {0, 1} has a fooling set of size t, the communication complexity of f is bounded below by log t. Because of our considerations above, the size t of a fooling set for f is a lower bound on the size of OBDDs representing f with respect to a variable order where the variables XA are tested before the variables XB . Because of the symmetric definition of fooling sets, t is also a lower bound on the size of OBDDs representing f with respect to a variable order where the variables XB are tested before the variables XA . The crucial thing to prove lower bounds on the OBDD complexity of a function is to obtain for all partitions of the variables lower bounds on the size of fooling sets for subfunctions of the given function. Now we take a look at known results about the communication complexity of some functions. Let EQ: {0, 1}n × {0, 1}n → {0, 1} be defined by EQ(a, b) = 1 iff the vectors a = (a1 , . . . , an ) and b = (b1 , . . . , bn ) are equal. It is well-known that C(EQ) = n. The same holds for the function IP: {0, 1}n × {0, 1}n → {0, 1} for which IP(a, b) = 1 iff ai ⊕ bi = 1 for all i ∈ {1, . . . , n}. Similar results can be obtained for the functions GT : {0, 1}n × {0, 1}n → {0, 1}, GT∗ : {0, 1}n × {0, 1}n → {0, 1}, and GT∗∗ : {0, 1}n × {0, 1}n → {0, 1}, where GT(a, b) = 1 iff [a]n1 ≤ [b]n1 , GT∗ (a, b) = 1 iff α ≤ [b]n1 , where α is the integer with binary representation a, and GT∗∗ (a, b) = 1 iff [a]n1 ≤ β, where β is the integer with binary representation b. Furthermore, obviously the same results can be obtained if Alice gets exactly one of the variables ai and bi , 1 ≤ i ≤ n. The reason is the following one. The addition function ADDi,n ∈ B2n maps two n-bit integers x = xn−1 . . . x0 and y = yn−1 . . . y0 to the ith bit of their sum, i.e., ADDi,n (x, y) = si , where x + y = sn . . . s0 . It is well-known that ADDn,n
On the Most Significant Bit of Integer Multiplication
311
has a fooling set of size 2n if for each i, 0 ≤ i ≤ n − 1, Alice gets exactly one of the variables xi and yi (variables of the same significance are symmetric variables for binary addition). GT((an−1 , . . . , a0 ), (bn−1 , . . . , b0 )) is equal to ADDn,n ((an−1 , . . . , a0 ), (bn−1 , . . . , b0 )), where ADDn,n (x, y) is the negated function ADDn,n . The idea of Bryant’s lower bound proof on the OBDD size of MULn−1,n [7] is the following. For each variable order, there is a subfunction of MULn−1,n which essentially equals the computation of the output bit at position m of the addition of two m-bit numbers x and y where m ≥ n/8. The variable order is bad in the sense that among Alice’s m variables is exactly one of the variables xi and yi .
3
An Exponential Lower Bound on the OBDD Complexity of the Most Significant Bit of Integer Multiplication
In this section, we determine the lower bound on the size of OBDDs for the representation of the most significant bit of multiplication mentioned above. Besides Bryant’s lower bound proof on the size of OBDDs representing the middle bit of multiplication we use the idea of the following reduction from multiplication to squaring presented by Wegener [15] where squaring computes the square of an n-bit input. For two n-bit numbers u and w the number z := u · 22(n+1) + w is defined. Then z 2 = u2 · 24(n+1) + uw22(n+1)+1 + w2 . Since w2 and uw are numbers of length 2n, the binary representation of the product uw can be found in the binary representation of z 2 . In the following for the sake of simplicity we do not apply floor or ceiling functions to numbers even when they need to be integers whenever this is clear from the context and has no bearing on the essence of the proof. We start our proof of the lower bound on the OBDD complexity of MUL2n−1,n by the following observation (due to the lack of space we have to omit the proof). Lemma 1. A pair (xi , yi+1 ), n/2 + 1 ≤ i ≤ (3/4)n − 2, is called (x, y)-pair. Let S be the set of the first |S| variables according to a variable order π. A pair (xi , yi+1 ) is called separated with respect to S iff xi ∈ S and yi+1 ∈ / S or vice versa. If there exist a set S according to π such that there are at least m separated (x, y)-pairs with respect to S, the π-OBDD size of the most significant bit of integer multiplication is at least 2m/2 . Now let π be an arbitrary variable order. The key observation for our lower bound proof is the following one. Fact 1. For a number 2n−1 + 2(1/2)n , ≤ 2(1/6)n−1 , the corresponding smallest number such that the product of the two numbers is at least 22n−1 is 2n − 2(1/2)n+1 + 42 . (Figure 1 shows the corresponding x- and y-inputs.)
312
B. Bollig n 2
n−1
10
...
+
n 6
−2
n 2
0...0
0 u
0
0
...
0
x
w l
n 2
n−1
1 ...
+
n 6
−1
1
n 2
n 3
−1
2
0 0 ... 0
¯l + 1
0
00
y
l2
Fig. 1. The partition of the inputs x and y
In the following we take a closer look at the variables xn/2 , . . . , xn/2+n/6−2 . For the ease of description we assume that (n/6 − 1) mod 3 = 2. We rename n/2+n/18−2 n/2+n/6−2 by [w]m−1 and [x]n/2+n/9 by [u]m−1 , where m := (n/6 − 3)/3. [x]n/2 0 0 Figure 1 illustrates the partition of the input x. Let S be the set of the first |S| variables according to π where there are at least (1/2)m variables from {w0 , . . . , wm−1 } for the first time. Let IS ⊆ {0, . . . , m− 1} be the set of indices i for which wi ∈ S. Using simple counting arguments we can prove that there exists a distance parameter d such that there exists a set of pairs / IS or i ∈ / IS and (i + d) ∈ IS , where 0 ≤ P = {(wi , wi+d )|i ∈ IS and (i + d) ∈ i < m/2 ≤ i + d ≤ m − 1} and |P | ≥ m/8 (see [7] for a similar proof). Let I be the set of indices i, 0 ≤ i < m/2, where wi belongs to a pair in P . Case 1: There are at least (2/5)m/8 separated (xn/2+i , yn/2+i+1 )-pairs with respect to S, where i ∈ I or i − d ∈ I . Using Lemma 1 we can conclude that the π-OBDD size of the most significant bit of integer multiplication is at least 2(1/5)m/8 . Case 2: There are less than (2/5)m/8 separated (xn/2+i , yn/2+i+1 )-pairs with respect to S, where i ∈ I or i − d ∈ I . Let I ⊆ I be the set of indices such that (xn/2+i , yn/2+i+1 ) and (xn/2+i+d , yn/2+i+d+1 ), i ∈ I , are not separated with respect to S. Obviously, |I | ≥ (3/5)m/8. Now we replace some of the variables in the following way. -
yn−1 , . . . , yn/2+n/6 are replaced by 1, yn/2 , . . . , yn/3 , y1 , and y0 are replaced by 0, xn−1 is replaced by 1, xn−2 , . . . , xn/2+n/6−1 are replaced by 0,
On the Most Significant Bit of Integer Multiplication 6m + 3
4m + 4
4m + 2
2m + 3
0
u2
2m − 1
313
0
l2
000
w2
u·w
Fig. 2. The bit composition of the number l2 6m + 5
4m + 6
u2
4m + 4
0
2m + 5
2m + 1
000 1
2
...
1
y
u·w Fig. 3. The effect of the replacements of some of the y-variables
- xn/2+n/9−1 , . . . , xn/2+n/18−1 are replaced by 0, - x0 , . . . , xn/2−1 are replaced by 0. Figure 1 illustrates these replacements. Furthermore, u0 and ud are set to 1, all other u-variables are set to 0. The effect of these replacements is that = 2d + 1 =: u. The variables y4m+6 , y4m+d+7 , and y4m+2d+6 are set to [u]m−1 0 1, the other variables yj with 4m + 6 ≤ j ≤ 6m + 5 are set to 0. The effect of 2 these replacemenst is that [y]6m+5 4m+6 = u (Figure 3 shows these replacements). The variables y4m+5 , y2m+4 , y2m+3 and y2m+2 are set to 0, and the variables y2m+1 , . . . , y2 are set to 1. The effect of the last replacements is that 22m > > w2 , where w is defined as the integer with binary representation [y]2m+1 2 m−1 [w]0 . Figure 3 illustrates these replacements. Now we take a closer look at the product u · w, where u is equal to 2d + 1. A pair (wi+d , y2m+5+2d+i ), i ∈ I , is called (w, y)-pair. A (w, y)-pair is called separated with respect to S iff wi+d ∈ S and y2m+5+2d+i ∈ / S or vice versa. Case 2.1 In the following we prove the existence of a fooling set with at least 2(1/5)m/8 elements. For this reason we choose a subfunction of MUL2n−1,n such that the computation of this subfunction resembles the computation of the function GT(1/5)m/8 . There are at least (2/5)m/8 separated (w, y)-pairs with respect to S. W.l.o.g. we assume that there are at least (1/5)m/8 separated w-variables in S. Our aim is to prove that there exists a fooling set of size at least (1/5)m/8. The separated w-variables in S and their corresponding y-variables are called free. Furthermore, a variable yn/2+i+d+1 for which the variable wi+d is free is also
314
B. Bollig
called free. Remember that the variable yn/2+i+d+1 is also in S because of the definition of I . Let min I be the minimal and max I be the maximal element of I . In the rest of the proof we choose for each variable yn/2+i+d+1 , where wi+d is a free variable and i = min I , an assignment such that yn/2+i+d+1 = wi+d ⊕ 1 without further mentioning it. - The variables wmin I +d , y2m+5+2d+min I , and yn/2+min I +d+1 are set to 1. - The variables wj and yn/2+j+1 , 0 ≤ j < min I + d, are set to 0. - All other variables wi which are not free are set to 0, their corresponding variables yn/2+i+1 are set to 1. - The other y-variables which are not free are replaced in the following way. The variable y2m+5+max I +d+1 is set to 1, the remaining y-variables without the free variables to 0. What is the effect of these replacements? n/2+m−1 Remember that [w]m−1 = [x]n/2 . We only consider assignments to the 0 n/2+3m+1
variables for which the following holds. If [x]n/2
n/2+3m+2
= then [y]n/2+1
=
+ 1. Now iff a number r, where r ≥ , the product x · . We take a closer look at the variables y2 , . . . , y6m+3 . y is greater than 2 Figure 2 shows the composition of the number 2 . One effect of our replacements 2m+1 2 is that [y]6m+5 > w2 . Therefore, iff [y]4m+4 4m+6 = u and [y]2 2m+5 represents a 6m+5 number r , where r ≥ u · w, [y]2 represents a number r, where r ≥ 2 . Since we have replaced the variable y2m+5+max I +d+1 by 1 and because of our other replacements, [y]26m+5 represents a number r, where r ≥ 2 , iff for each separated (w, y)-pair, the assignment to the variable y2m+5+2d+i is at least as large as the assignment to the variable wi+d . Therefore, the considered subfunction resembles the function GT1/5·m/8 . In the rest of the proof we show that all possible assignments to the free variables wi+d together with all possible assignments to the variables y2m+5+2d+i , where y2m+5+2d+i = wi , are a fooling set of size at least 1/5 · m/8. Together with the replacements to constants an assignment to the free wvariables can be seen as a number 2n−1 + 1 2n/2 , the corresponding assignment to the free y-variables together with the replacements to constants as number 2n − 1 2n/2+1 + c, where c > 421 . Therefore, the product of the two numbers is larger than 22n−1 . If we choose another assignment to the free variables wi+d such that x can be seen as number 2n−1 + 2 2n/2 , 2 > 1 , we get the following result (2n−1 + 2 2n/2 ) · (2n − 2 2n/2+1 + c) < 22n−1 , [y]6m+3 represents 2 2n−1
2
because c < 422 . Therefore, we are done. Case 2.2: In the following we prove the existence of a fooling set with at least 2(1/5)m/8 elements. For this reason we choose a subfunction of MUL2n−1,n such that the computation of this subfunction resembles the computation of the function GT∗∗ (1/5)m/8 . There are less than (2/5)m/8 separated (w, y)-pairs with respect to S. Let I ⊆ I be the set of indices such that (wi+d , y2m+5+2d+i ), i ∈ I , are not separated
On the Most Significant Bit of Integer Multiplication
315
with respect to S. Obviously, |I| ≥ (1/5)m/8. Let min I and max I be the minimal resp. maximal element of I. For this reason we replace some of the variables in the following way. - The variables wmin I and yn/2+min I+1 are set to 1, the variable wmin I+d is set to 0, the variable yn/2+min I+d+1 is set to 1, - the variables wi , i < min I, are set to 0, the corresponding variables yn/2+i+1 are set to 0, - the variables wi , min I < i < max I and i ∈ / I are set to 1, the corresponding variables yn/2+i+1 are set to 0, - all other variables wj , j ∈ / I and j − d ∈ / I, are set to 0, the corresponding variables yn/2+j+1 are set to 1. Furthermore, the variables y2m+5+2d+i , i ∈ I \ I, are replaced by 0. The variables y2m+5+2d+i , yn/2+i+1 , and yn/2+i+d+1 , i ∈ I, are called free. The yvariables which are not free are replaced in the following way. - The variables yj , 2m + 5 + min I ≤ j ≤ 2m + 5 + max I, are set to 1, - the variables yj , 2m + 5 + min I + d ≤ j ≤ 2m + 5 + max I + d, are set to 1, and - all other variables yj with 2m + 5 ≤ j ≤ 6m + 5 besides the free y-variables are set to 0. The free y-variables are not separated from their corresponding w-variables, since we know that wi ∈ S and yn/2+i+1 ∈ S or wi ∈ S and yn/2+i+1 ∈ S, where i ∈ I, because of the definition of I. The same holds for wi+d , yn/2+i+d+1 , and y2m+5+2d+i , where i ∈ I. In the rest of the proof we only consider assignments with the property that - yn/2+i+1 = wi ⊕ 1, - yn/2+i+d+1 = wi+d ⊕ 1, and - y2m+5+2d+i = wi+d , where i ∈ I, without further mentioning it. In the following we prove that all possible assignments to the variables wi , i ∈ I, together with the assignments to the variables wi+d , i ∈ I, such that wi+d = wi ⊕ 1 are a fooling set of size at least (1/5)m/8. Together with the replacements to constants our assignments to the variables wi , i ∈ I or i − d ∈ I, can be seen as a number 2n−1 + 2n/2 . The corresponding assignments to the yvariables can be interpreted as number 2n −2n/2+1 +c, where c > 42 . Therefore, the product of the two numbers is larger than 22n−1 . To see this we decompose and w = [w]m−1 . The number c can be into u · 22m+2 + w, where u = [u]m−1 0 0 decomposed into 4m+6 2m+5 [y]6m+5 + [y]4m+4 + [y]2m+1 · 22 . 4m+6 · 2 2m+5 · 2 2 6m+5 = u2 and w2 < [y]2m+1 < 22m (see Figure 3). As mentioned before, [y]4m+6 2 m−1 max I+d I The number [w]0 can be decomposed into [w]min I+d · 2min I+d + [w]max min I ·
316
B. Bollig
max I+d 4m+4 I 2min I . Let w := [w]max min I and w := [w]min I+d . Now the number [y]2m+5 can 2m+5+2d+min I max I−min I+1 2m+5+d+min I + (2 − 1) · 2 + be decomposed into w · 2 (2max I−min I+1 − 1) · 22m+5+min I . Iff w + w ≤ 2max I−min I+1 − 1, the number 6m+5 > 2 . Therefore, we can [y]4m+4 2m+5 is greater than u · w and altogether [y]2 2 conclude c > 4 . If w +w > 2max I−min I+1 −1, the number [y]4m+4 2m+5 is less than u·w. Therefore, x can be seen as number 2n−1 + 2n/2 and y as 2n − 2n/2+1 + c, where c < 42 . Since (2n−1 + 2n/2 ) · (2n − 2n/2+1 + c) < 22n−1
we are done. Altogether we have shown that for an arbitrary variable order π the π-OBDD size for the most significant bit of multiplication is at least 2(1/5)m/8 . Considering the fact that m := (n/6−3)/3 = n/18−1 we obtain a lower bound of 2n/720−1 = Ω(2n/720 ) on the OBDD complexity of MUL2n−1,n . Using techniques from analytical number theory Sawitzki [14] has presented a lower bound of 2n/6 on the size of π-OBDDs representing the most significant bit of integer multiplication for the variable order π where the variables are tested according to increasing significance, i.e. π = (x0 , y0 , x1 , y1 , . . . , xn−1 , yn−1 ). Almost the same lower bound can be proved in an easier way and without analytical number theory using the fact that for a number 2n−1 + 2(1/2)n , ≤ 2(1/6)n−1 , the corresponding smallest number such that the product of the two numbers is at least 22n−1 is 2n − 2(1/2)n+1 + 42 . Furthermore, we only want to mention here that similar to Gergov’s [9] generalization of Bryant’s lower bound on the size of OBDDs for the middle bit of multiplication to arbitrary oblivious programs of linear length the result for the most significant bit of multiplication can be extended.
Acknowledgement The author would like to thank Martin Sauerhoff for careful reading the first version of the paper and for valuable suggestions which helped to improve the paper.
References 1. Amano, K., Maruoka, A.: Better upper bounds on the QOBDD size of integer multiplication. Discrete Applied Mathematics 155, 1224–1232 (2007) 2. Bollig, B.: Restricted nondeterministic read-once branching programs and an exponential lower bound for integer multiplication. RAIRO Theoretical Informatics and Applications 35, 149–162 (2001) 3. Bollig, B., Woelfel, P.: A read-once branching program lower bound of Ω(2n/4 ) for integer multiplication using universal hashing. In: Proc. of 33rd STOC, pp. 419–424 (2001)
On the Most Significant Bit of Integer Multiplication
317
4. Bollig, B., Waack, S., Woelfel, P.: Parity graph-driven read-once branching programs and an exponential lower bound for integer multiplication. Theoretical Computer Science 362, 86–99 (2006) 5. Bollig, B., Woelfel, P.: A lower bound technique for nondeterministic graph-driven read-once branching programs and its applications. Theory of Computing Systems 38, 671–685 (2005) 6. Bryant, R.E.: Graph-based algorithms for Boolean manipulation. IEEE Trans. on Computers 35, 677–691 (1986) 7. Bryant, R.E.: On the complexity of VLSI implementations and graph representations of Boolean functions with application to integer multiplication. IEEE Trans. on Computers 40, 205–213 (1991) 8. F¨ uhrer, M.: Faster integer multiplication. In: Proc. of 39th STOC, pp. 57–66 (2007) 9. Gergov, J.: Time-space trade-offs for integer multiplication on various types of input oblivious sequential machines. Information Processing Letters 51, 265–269 (1994) 10. Hromkoviˇc, J.: Communication Complexity and Parallel Computing. Springer, Heidelberg (1997) 11. Kushilevitz, E., Nisan, N.: Communication Complexity. Cambridge University Press, Cambridge (1997) 12. Ponzio, S.: A lower bound for integer multiplication with read-once branching programs. SIAM Journal on Computing 28, 798–815 (1998) 13. Sauerhoff, M., Woelfel, P.: Time-space trade-off lower bounds for integer multiplication and graphs of arithmetic functions. In: Proc. of 33rd STOC, pp. 186–195 (2003) 14. Sawitzki, D.: Exponential lower bounds on the space complexity of OBDD-based graph algorithms. In: Proc. of LATIN. LNCS, vol. 3831, pp. 471–482 (2005) 15. Wegener, I.: Optimal lower bounds on the depth of polynomial-size threshold circuits for some arithmetic functions. Information Processing Letters 46(2), 85–87 (1993) 16. Wegener, I.: Branching Programs and Binary Decision Diagrams - Theory and Applications. SIAM Monographs on Discrete Mathematics and Applications (2000) 17. Woelfel, P.: New bounds on the OBDD-size of integer multiplication via universal hashing. Journal of Computer and System Science 71(4), 520–534 (2005) 18. Woelfel, P.: On the complexity of integer multiplication in branching programs with multiple tests and in read-once branching programs with limited nondeterminism. In: Proc. of 17th Computational Complexity, pp. 80–89 (2002)
Logical Closure Properties of Propositional Proof Systems (Extended Abstract) Olaf Beyersdorff Institut f¨ ur Theoretische Informatik, Leibniz Universit¨ at Hannover, Germany [email protected]
Abstract. In this paper we define and investigate basic logical closure properties of propositional proof systems such as closure of arbitrary proof systems under modus ponens or substitutions. As our main result we obtain a purely logical characterization of the degrees of schematic extensions of EF in terms of a simple combination of these properties. This result underlines the empirical evidence that EF and its extensions admit a robust definition which rests on only a few central concepts from propositional logic.
1
Introduction
In their seminal paper [11] Cook and Reckhow gave a very general complexitytheoretic definition of the concept of a propositional proof system, focusing on efficient verification of propositional proofs. Due to the expressivity of Turing machines (or any other model of efficient computation) this definition includes a variety of rather unnatural propositional proof systems. In contrast, prooftheoretic research concentrates on propositional proof systems which, beyond efficient verification, satisfy a number of additional natural properties. Proof systems with nice structural properties are also exclusively used in practice (e.g. for automated theorem proving). Supported by this empirical evidence, we therefore formulate the thesis, that the Cook-Reckhow framework is possibly too broad for the study of natural proof systems of practical relevance. Motivated by these observations, we investigate the interplay of central logical closure properties of propositional proof systems, such as the ability to use modus ponens or substitutions in arbitrary proof systems. Proof systems are compared with respect to their strength by simulations, and all equivalent systems form one degree of proof systems. Since in proof complexity we are mostly interested in the degree of a propositional proof system and not so much in specific representatives of this degree, we only study properties which are preserved inside a simulation degree. In particular, we think that it would be desirable to characterize the degrees of important proof systems
Part of this work was done while at Humboldt University Berlin. Supported by DFG grant KO 1053/5-1.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 318–329, 2008. c Springer-Verlag Berlin Heidelberg 2008
Logical Closure Properties of Propositional Proof Systems
319
(e.g. resolution, cutting planes, or Frege) by meaningful and natural properties. Such results would provide strong confirmation for the empirical evidence, that these systems have indeed a natural and robust definition. One would expect that according to the general classification of propositional proof systems into logical systems (such as resolution, Frege, QBF), algebraic systems (polynomial calculus, Nullstellensatz) and geometric systems (cutting planes), these underlying principles should also be of logical, algebraic, and geometrical character, respectively. As a first step of this more general program we exhibit a purely logical characterization of the degrees of schematic extensions of the extended Frege system EF . These schematic extensions enhance the extended Frege system by additional sets of polynomial-time decidable axiom schemes. Such systems are of particular importance: Firstly, because every propositional proof system is simulated by such an extension of EF , and secondly, because these systems admit a fruitful correspondence to theories of bounded arithmetic [8,14,16]. For our characterization we formalize closure properties such as modus ponens and substitutions in such a way that they are applicable for arbitrary propositional proof systems. We analyse the mutual dependence of these properties, providing in particular strong evidence for their independence. Our characterization of extensions of EF involves the properties modus ponens, substitutions, and reflection. This result tells us that the essence of extended Frege systems (and its generalizations) lies in the ability to use modus ponens and substitutions, and to prove the consistency of the system with short proofs (this property is known as reflection). Thus schematic extensions of EF are exactly those systems (up to p-equivalence) which can prove their consistency and are closed under modus ponens and substitutions. This result also allows the characterization of the existence of optimal propositional proof systems (which simulate every other system). The paper is organized as follows. We start in Sect. 2 by recalling some background information on propositional proof systems and particularly Frege systems and their extensions. In Sect. 3 we define and investigate natural properties of proof systems which we use throughout this paper. A particularly important property for strong systems is the reflection property, which gives a propositional description of the correctness of a proof system. Different versions of such consistency statements are discussed in Sect. 4, leading in particular to a robust definition of the reflection property. Section 5 contains the main result of this paper, consisting of a purely logical characterization of the degrees of schematic extensions of EF . This directly leads to a similar characterization of the existence of p-optimal proof systems. These results can also be explained in the more general context of the correspondence between strong propositional proof systems and arithmetic theories, of which we will sketch an axiomatic approach. Finally, in Sect. 6 we conclude with some open problems. Most definitions and results of this paper can be given in two versions, one for simulations and the other, using slightly stronger assumptions, for the case
320
O. Beyersdorff
of p-simulations between proof systems (cf. Sect. 2). For brevity we will restrict this exposition to the efficient case of p-simulations. Due to space limitations we only sketch proofs or omit them in this extended abstract.
2
Propositional Proof Systems
Propositional proof systems were defined in a very general way by Cook and Reckhow [11] as polynomial-time functions P which have as their range the set TAUT of all propositional tautologies. A string π with P (π) = ϕ is called a P -proof of the tautology ϕ. By P ≤m ϕ we indicate that there is a P -proof of ϕ of size ≤ m. If Φ is a set of propositional formulas we write P ∗ Φ if there is a polynomial p such that P ≤p(|ϕ|) ϕ for all ϕ ∈ Φ. If Φ = {ϕn | n ≥ 0} is a sequence of formulas we also write P ∗ ϕn instead of P ∗ Φ. Proof systems are compared according to their strength by simulations, introduced in [11] and [18]. A proof system S simulates a system P (denoted by P ≤ S) if there exists a polynomial p such that for all tautologies ϕ and P -proofs π of ϕ there is an S-proof π of ϕ with |π | ≤ p (|π|). If such a proof π can even be computed from π in polynomial time we say that S p-simulates P and denote this by P ≤p S. Systems P and S, that mutually (p-)simulate each other, are called (p-)equivalent, denoted by P ≡(p) S. A proof system is (p-)optimal if it (p-)simulates all proof systems. A prominent example of a class of proof systems is provided by Frege systems which are usual textbook proof systems based on axioms and rules. In the context of propositional proof complexity these systems were first studied by Cook and Reckhow [11], and it was proven there that all Frege systems, i.e., systems using different axiomatizations and rules, are p-equivalent. A different characterization of Frege systems is provided by Gentzen’s sequent calculus [12], that is historically one of the first and best analysed proof systems. The sequent calculus is widely used, both for propositional and first-order logic, and it is straightforward to verify that Frege systems and the propositional sequent calculus LK p-simulate each other [11]. Augmenting Frege systems by the possibility to abbreviate complex formulas by propositional variables, we arrive at the extended Frege proof system EF . The extension rule might further reduce the proof size, but it is not known whether EF is really stronger than ordinary Frege systems. Both Frege and the extended Frege system are very strong systems for which no non-trivial lower bounds to the proof size are currently known (cf. [6]). It is often desirable to further strengthen the proof system EF by additional axioms. This can be done by allowing a polynomial-time computable set Φ as new axioms, i.e., formulas from Φ as well as their substitution instances may be freely used in EF -proofs. These schematic extensions of EF are denoted by EF + Φ. In this way, we obtain proof systems of arbitrary strength (cf. Theorem 15). More detailed information on Frege systems and its extensions can be found in [9,16].
Logical Closure Properties of Propositional Proof Systems
3
321
Closure Properties of Proof Systems
In this section we define and investigate natural properties of propositional proof systems that are satisfied by many important proof systems. One of the most common rules is modus ponens, which serves as the central rule in Frege systems. Carrying out modus ponens in a general proof system might be formalized as: Definition 1. A proof system P is closed under modus ponens if there exists a polynomial-time computable algorithm that takes as input P -proofs π1 , . . . , πk of propositional formulas ϕ1 , . . . , ϕk together with a P -proof πk+1 of the implication (ϕ1 → (ϕ2 → . . . (ϕk → ϕk+1 ) . . . )) and outputs a P -proof of ϕk+1 . Defining closure under modus ponens by requiring k = 1 in the above definition seems to lead to a too restrictive notion. Namely, in some applications we need to use modus ponens polynomially many times (cf. Theorem 11). In this case, the above definition with k = 1 would only guarantee an exponential upper bound on the size of the resulting proof, whereas Definition 1 results only in a polynomial increase. If π is a Frege proof of a formula ϕ, then we can prove substitution instances σ(ϕ) of ϕ by applying the substitution σ to every formula in the proof π. This leads us to the general concept of closure of a proof system under substitutions. Definition 2. P is closed under substitutions if there exists a polynomial-time procedure that takes as input a P -proof of a formula ϕ as well as a substitution instance σ(ϕ) of ϕ and computes a P -proof of σ(ϕ). It also makes sense to consider other properties like closure under conjunctions or disjunctions. A particularly simple property is the following: a proof system evaluates formulas without variables if formulas using only constants but no propositional variables have polynomial-size proofs. As this is true even for truthtable evaluations, all proof systems simulating the truth-table system evaluate formulas without variables. We can classify properties of proof systems like those above along the following lines. Some properties are monotone in the sense that they are preserved from weaker to stronger systems, i.e., if P ≤ Q and P has the property, then also Q satisfies the property. Evaluation of formulas without variables is such a monotone property. Other properties might not be monotone but still robust in the sense that the property is preserved when we switch to a p-equivalent system. Since we are interested in the degree of a proof system and not in the particular representative of that degree, it is desirable to investigate only robust or even monotone properties. It is straightforward to verify that closure under modus ponens and closure under substitutions are robust properties. We remark that Frege systems and their extensions have very good closure properties. Proposition 3. The Frege system F , the extended Frege system EF , and all extensions EF + Φ by polynomial-time computable sets of axioms Φ ⊆ TAUT are closed under modus ponens and under substitutions.
322
O. Beyersdorff
It is interesting to ask whether these properties of propositional proof systems are independent from each other. With respect to this question we observe the following. Proposition 4. Assume that the extended Frege proof system is not optimal. Then there exist proof systems which are closed under substitutions but not under modus ponens. Proof. (Idea) We use the assumption of the non-optimality of EF to obtain polynomial-time constructable sequences ϕn and ψn of tautologies, such that EF ∗ ϕn , but EF ∗ ψn . We then encode the implications ϕn → ψn into an extension of EF , thus obtaining a system Q that is closed under substitutions, but
not under modus ponens, because Q ∗ ϕn , Q ∗ ϕn → ψn , and Q ∗ ψn . Candidates for proof systems that are closed under modus ponens but not under substitutions come from extensions of Frege systems by polynomial-time computable sets Φ ⊆ TAUT as new axioms. Clearly these systems are closed under modus ponens. In [3], however, we exhibit a suitable hypothesis, involving disjoint NP-pairs, which guarantees that these proof systems are not even closed under substitutions by constants for suitable choices of Φ.
4
Consistency Statements
Starting with this section, we will use the correspondence of propositional proof systems to theories of bounded arithmetic. Bounded arithmetic is the general denomination of a whole collection of weak fragments of Peano arithmetic, that are defined by adding a controlled amount of induction to a set of basic axioms (cf. [14]). One of the most prominent examples of these arithmetic theories is Buss’ theory S21 , defined in [8]. In addition to the usual ingredients, the language L of S21 uses a number of technical symbols to allow a smooth formalization of syntactic concepts. A central ingredient of the correspondence of arithmetic theories to propositional proof systems is the translation of first-order arithmetic formulas into propositional formulas [10,19]. An L-formula in prenex normal form with only bounded existential quantifiers is called a Σ1b -formula. These formulas describe NP-predicates in the sense that the class of all Σ1b -definable subsets of N coincides with the class of all NP-subsets of N (cf. [25,8]). Likewise, Π1b -formulas only have bounded universal quantifiers and describe coNP-predicates. A Π1b -formula ϕ(x) is translated into a sequence ϕ(x) n of propositional formulas containing one formula per input length for the number x, such that ϕ(x) is true, i.e., N |= (∀x)ϕ(x), if and only if ϕ(x) n is a tautology where n = |x| (cf. [16]). We use ϕ(x) to denote the set { ϕ(x) n | n ≥ 1}. The consistency of a proof system P is described by the consistency statement Con(P ) = (∀π)¬Prf P (π, ⊥), where Prf P (π, ϕ) is a suitable arithmetic formula describing that π is a P -proof of ϕ. The formula Prf P can be chosen such that Prf P is provably equivalent in S21 both to a Σ1b - and a Π1b -formula (such formulas are called Δb1 -formulas with respect to S21 , cf. [16]).
Logical Closure Properties of Propositional Proof Systems
323
A somewhat stronger formulation of consistency is given by the reflection principle of a propositional proof system P , which is defined by the arithmetic formula RFN (P ) = (∀π)(∀ϕ)Prf P (π, ϕ) → Taut(ϕ) , where Taut is a Π1b -formula formalizing propositional tautologies. Therefore Con(P ) and RFN (P ) are ∀Π1b -formulas, i.e., these formulas are in prenex normal form with unbounded ∀-quantifiers followed by bounded ∀-quantifiers and can therefore be translated via . into sequences of propositional formulas. The two consistency notions Con(P ) and RFN (P ) are compared by the following well-known observation, contained e.g. in [16]: Proposition 5. Let P be a proof system that is closed under substitutions and modus ponens and evaluates formulas without variables, and assume that these properties are provable in S21 . Then S21 RFN (P ) ↔ Con(P ). Very often propositional descriptions of the reflection principle are needed. These can be simply obtained by translating RFN (P ) to a sequence of propositional formulas using the translation . . Definition 6. A propositional proof system P has the reflection property if there exists a polynomial-time algorithm that on input 1n outputs a P -proof of RFN (P ) n . There is a subtle problem with Definition 6 which is somewhat hidden in the definition. Namely, the formula Prf P describes the computation of some Turing machine computing the function P . However, the provability of the formulas RFN (P ) n with polynomial-size P -proofs might depend on the actual choice of the Turing machine computing P . Let us illustrate this with the following example. Proposition 7. If EF is not p-optimal, then there exists a proof system Q ≡p EF such that S21 does not prove the reflection principle of Q, i.e., S21 does not prove the formula (∀π)(∀ϕ)Prf Q (π, ϕ) → Taut(ϕ) for some suitable choice of the Turing machine that computes Q and is used for the formula Prf Q . Proof. (Sketch) If EF is not p-optimal, then there exists a proof system R such that R ≤p EF . We define the system P as EF + RFN (R) and the system Q as ⎧ ⎪ if π = 0π and π is an EF -proof of ϕ ⎨ϕ Q(π) = P (π ) if π = 1π and P (π ) ∈ {, ⊥} ⎪ ⎩ otherwise. It is easily checked that EF and Q are ≤p -equivalent. We have to show that S21 does not prove the formula RFN (Q) where for the predicate Prf Q we use the canonical Turing machine M according to the above definition of Q, i.e., on input 0π the machine M checks whether π is a correct EF -proof and on input 1π the machine M evaluates P (π ). Assume on the contrary that S21 RFN (Q). Because of line 2 of the definition of Q this means that S21 can prove
324
O. Beyersdorff
that there is no P -proof of ⊥, i.e., S21 proves the consistency statement of P . The system P is closed under substitutions by constants and modus ponens. Therefore Con(P ) and RFN (P ) are equivalent in S21 by Proposition 5. Hence S21 not only proves Con(P ), but also RFN (P ), which gives a p-simulation of P by EF (cf. Definition 13 and Theorem 14 below). This, however, contradicts the choice of P , and hence S21 RFN (Q).
Note that S21 RFN (EF ) (cf. [18]), contrasting S21 RFN (Q) in the above proposition. This observation tells us that we should understand the meaning of Definition 6 in the following, more precise way: Definition 8. A propositional proof system P has the robust reflection property if there exists a deterministic polynomial-time Turing machine M computing the function P such that for some Δb1 -formalization Prf P of the computation of M with respect to S21 we have a polynomial-time algorithm that constructs on input 1n a P -proof of the formula (∀π)(∀ϕ)Prf P (π, ϕ) → Taut(ϕ) n . For this definition of reflection we can show the robustness of the reflection principle under p-simulations: Proposition 9. Let P and Q be p-equivalent proof systems. Then P has the robust reflection property if and only if Q has the robust reflection property. Proof. (Idea) If Q proves its reflection principle with respect to the Turing machine M , then P can prove its reflection for the Turing machine M ◦ N , where the machine N computes a p-simulation of Q by P .
It is known that strong propositional proof systems like EF and its extensions have the reflection property [18]. In contrast, weak systems like resolution do not have reflection. Pudl´ ak [21] proved that the cutting planes system CP requires nearly exponential-size refutations of some canonical formulation of the formulas RFN (CP ). Atserias and Bonet [1] obtained the same result for the resolution system Res. This, however, does not exclude the possibility that we have short proofs for the reflection principle of resolution or CP with respect to some other formalization of PrfCP , PrfRes , or Taut . It therefore remains as an open question whether these systems have the robust reflection property. Alternatively, we can view robust reflection as a condition on the canonical disjoint NP-pair (Ref (P ), Sat ∗ ) of a proof system P , introduced by Razborov [22]. Its first component Ref (P ) = {(ϕ, 1m ) | P ≤m ϕ} contains information about proof lengths in P , and the second component Sat ∗ = {(ϕ, 1m ) | ¬ϕ ∈ SAT} is a padded version of SAT. The link of the canonical pair with the reflection property was already noted by Pudl´ ak [21]. We can extend this idea to obtain a characterization of robust reflection for weak proof systems. Proposition 10. Let P be the resolution or cutting planes system. Then P has the robust reflection property if and only if the canonical pair of P is p-separable, i.e., there exists a polynomial-time decidable set S such that Ref (P ) ⊆ S and S ∩ Sat ∗ = ∅.
Logical Closure Properties of Propositional Proof Systems
325
Proof. (Idea) Robust reflection for P means that we can efficiently generate P proofs for the disjointness of (Ref (P ), Sat ∗ ) with respect to some propositional representations of its components. Using feasible interpolation for P [17,7,20], we get a polynomial-time computable separator for (Ref (P ), Sat ∗ ). Conversely, if the canonical P -pair is p-separable, then it can be given a simple propositional description, for which we can devise short P -proofs of the disjointness of the pair. This is possible, as all p-separable disjoint NP-pairs are equivalent (via suitable reductions for pairs [13]). We can then choose a simple pseparable pair, prove its disjointness in P , and translate these proofs into proofs for the disjointness of (Ref (P ), Sat ∗ ) (cf. [2] for the details of this approach).
As it is conjectured that none of the canonical pairs of natural proof systems is p-separable [21], Proposition 10 indicates the absence of robust reflection for weak systems that satisfy the interpolation property.
5
Characterizing the Degree of Extended Frege Systems
Using the results from the previous section, we will now exhibit a characterization of the degrees of schematic extensions of EF . Theorem 11. For all proof systems P ≥p EF the following conditions are equivalent: 1. P is p-equivalent to a proof system of the form EF + ϕ with a true Π1b formula ϕ. 2. P is p-equivalent to a proof system of the form EF + Φ with a polynomialtime decidable set of true Π1b -formulas Φ. 3. P has the robust reflection property and is closed under modus ponens and substitutions. Proof. Item 1 trivially implies item 2. For the implication 2 ⇒ 3 let P ≡p EF + Φ . Then the closure properties of EF + Φ are transferred to P . Systems of the form EF + Φ are known to have the reflection property (cf. [19] and Theorem 14 below). By Proposition 9 robust reflection for EF + Φ is transferred to P . The main part of the proof is the implication 3 ⇒ 1. Its proof involves a series of results that are also of independent interest. The first step is an efficient version of the deduction theorem for EF : 1. Efficient Deduction theorem for EF . There exists a polynomial-time procedure that takes as input an EF -proof of a formula ψ from a finite set of tautologies Φ as extra assumptions, and produces an EF -proof of the implication ( ϕ∈Φ ϕ) → ψ. A similar deduction theorem was shown for Frege systems by Bonet and Buss [4,5]. For stronger systems like EF we just remark that there are different ways to formalize deduction. These deduction properties seem to be quite powerful, as
326
O. Beyersdorff
they allow the characterization of the existence of optimal and even polynomially bounded proof systems [3]. In the second step we compare schematic extensions of EF and strong proof systems with sufficient closure properties. 2. Simulation of extensions of EF by sufficiently closed systems. Let P be a proof system such that EF ≤p P and P is closed under substitutions and modus ponens. Let Φ be some polynomial-time decidable set of tautologies such that P -proofs of all formulas from Φ can be constructed in polynomial time. Then EF + Φ ≤p P . The idea of the proof of this simulation is the following: if EF + Φ ≤m ϕ, then there are substitution instances ψ1 , . . . , ψk of formulas from Φ such that we have an EF -proof of ϕ from ψ1 , . . . , ψk . Using the deduction theorem for EF , we get k a polynomial-size EF -proof of ( i=1 ψi ) → ϕ. By the hypotheses P ≥p EF and P ∗ Φ, together with the closure properties of P , we can transform this proof into a polynomial-size P -proof of ϕ. Item 2 is most useful in the following form: 3. If the proof system P ≥p EF has the robust reflection property and P is closed under substitutions and modus ponens, then we get the p-simulation EF + RFN (P ) ≤p P . The converse simulation extends a result of Kraj´ıˇcek and Pudl´ ak [18], namely that every proof system P is p-simulated by the system EF + RFN (P ) . 4. Simulation of arbitrary systems by extensions of EF . Let P be an arbitrary proof system, and let Φ be some polynomial-time decidable set of tautologies. If EF + Φ-proofs of RFN (P ) n can be generated in polynomial time, then P ≤p EF + Φ. After these preparations we can now prove the implication 3 ⇒ 1. Let P be a proof system which has the robust reflection property and is closed under modus ponens and substitutions. We choose the formula ϕ as RFN (P ). Then EF + ϕ ≤p P by the above item 3. The converse simulation P ≤p EF + ϕ follows from item 4.
The equivalence of items 1 and 2 in the above corollary expresses some kind of compactness for extensions of EF : systems of the form EF + Φ are always equivalent to a system EF + ϕ with a single arithmetic formula ϕ. The equivalence to item 3 shows that these systems have a robust logical definition, independent of the particular axiomatization chosen for EF . We will now apply Theorem 11 to characterize the existence of optimal and p-optimal proof systems. These problems were posed by Kraj´ıˇcek and Pudl´ ak [18] and have been intensively studied during the last years [15,23,24]. We call a set A printable if there exists a polynomial-time algorithm which on input 1n outputs all words from A of length n.
Logical Closure Properties of Propositional Proof Systems
327
Corollary 12. The following conditions are equivalent: 1. P is a p-optimal propositional proof system. 2. P ≥p EF and P is closed under modus ponens and substitutions. Further, for every printable set of tautologies, P -proofs can be constructed in polynomial time. Using non-constructive versions of the conditions in item 2 we get a similar characterization of the existence of optimal proof systems. Probably the strongest available information on EF and its extensions stems from the connection of these systems to theories of bounded arithmetic. Actually, also Theorem 11 can be derived as a consequence from this more general context. In the remaining space we will sketch an axiomatic approach to this correspondence, as suggested by Kraj´ıˇcek and Pudl´ ak [19]. The correspondence works for pairs (T, P ) of arithmetic theories T and propositional proof systems P . It can be formalized as follows: Definition 13. A propositional proof system P is called regular if there exists an L-theory T such that the following two properties are fulfilled for (T, P ). 1. Let ϕ(x) be a Π1b -formula such that T (∀x)ϕ(x). Then there exists a polynomial-time computable function which on input 1n outputs a P -proof of ϕ(x) n . 2. T proves the correctness of P , i.e., T RFN (P ). Furthermore, P is the strongest proof system for which T proves the correctness, i.e., T RFN (Q) for a proof system Q implies Q ≤p P . Probably the most important instance of the general correspondence is the relation between S21 and EF . Property 1 of the correspondence, stating the simulation of S21 by EF , is essentially contained in [10], but for the theory PV instead of S21 . Examining the proof of this result, it turns out that the theorem is still valid if both the theory S21 and the proof system EF are enhanced by further axioms. Property 2 of the correspondence between S21 and EF was established by Kraj´ıˇcek and Pudl´ ak [19]. Again, this result can be generalized to extensions of S21 and EF by additional axioms. Combining these results, we can state: Theorem 14 (Cook [10], Buss [8], Kraj´ıˇ cek, Pudl´ ak [19]). Let Φ be a polynomial-time decidable set of true Π1b -formulas. Then the proof system EF + Φ is regular and corresponds to the theory S21 + Φ. Using these results, we can exhibit sufficient conditions for the regularity of a propositional proof system. From the definition of a regular system it is clear that regular proof systems have the reflection property. Furthermore, a combination of the properties of proof systems introduced in Sects. 3 and 4 guarantees the regularity of the system, namely: Theorem 15. If P is a proof system such that EF ≤p P and P has the robust reflection property and is closed under substitutions and modus ponens, then P is regular and corresponds to the theory S21 + RFN (P ).
328
O. Beyersdorff
Proof. The hypotheses on P imply EF + RFN (P ) ≡p P by Theorem 11. We will now check the axioms of the correspondence for S21 + RFN (P ) and P . Suppose ϕ is a ∀Π1b -formula such that S21 + RFN (P ) ϕ. By Theorem 14 we can construct EF + RFN (P ) -proofs of ϕ n in polynomial time. As we already know that EF + RFN (P ) is p-simulated by P , we obtain polynomial-size P -proofs of ϕ n . This proves part 1 of the correspondence. It remains to verify the second part. Clearly S21 +RFN (P ) RFN (P ). Finally, assume S21 + RFN (P ) RFN (Q) for some proof system Q. By Theorem 14 this implies that we can efficiently construct proofs of RFN (Q) in the system EF + RFN (P ) . Applying items 3 and 4 from the proof of Theorem 11 we infer
Q ≤p EF + RFN (P ) ≤p P .
6
Conclusion
The results of this paper suggest that logical closure properties can be used to give robust definitions of strong proof systems such as EF and its extensions. Continuing this line of research, it is interesting to ask, whether we can also characterize the degrees of weak systems like resolution or cutting planes in terms of similar closure properties. In particular, these weak systems are known to satisfy the feasible interpolation property [17]. Can interpolation in combination with other properties be used to characterize the degrees of weak systems? Pudl´ ak [21] provides strong evidence that interpolation and reflection are mutually exclusive properties. Which other combinations of such properties are possible? Further investigation of these questions will hopefully contribute to a better understanding of propositional proof systems. Acknowledgements. I am indebted to Jan Kraj´ıˇcek for many helpful and stimulating discussions on the topic of this paper. I also thank Sebastian M¨ uller for detailed suggestions on how to improve the presentation of the paper.
References 1. Atserias, A., Bonet, M.L.: On the automatizability of resolution and related propositional proof systems. Information and Computation 189(2), 182–201 (2004) 2. Beyersdorff, O.: Classes of representable disjoint NP-pairs. Theoretical Computer Science 377, 93–109 (2007) 3. Beyersdorff, O.: The deduction theorem for strong propositional proof systems. In: Arvind, V., Prasad, S. (eds.) FSTTCS 2007. LNCS, vol. 4855, pp. 241–252. Springer, Heidelberg (2007) 4. Bonet, M.L.: Number of symbols in Frege proofs with and without the deduction rule. In: Clote, P., Kraj´ıˇcek, J. (eds.) Arithmetic, Proof Theory and Computational Complexity, pp. 61–95. Oxford University Press, Oxford (1993) 5. Bonet, M.L., Buss, S.R.: The deduction rule and linear and near-linear proof simulations. The Journal of Symbolic Logic 58(2), 688–709 (1993) 6. Bonet, M.L., Buss, S.R., Pitassi, T.: Are there hard examples for Frege systems? In: Clote, P., Remmel, J. (eds.) Feasible Mathematics II, pp. 30–56. Birkh¨ auser, Basel (1995)
Logical Closure Properties of Propositional Proof Systems
329
7. Bonet, M.L., Pitassi, T., Raz, R.: Lower bounds for cutting planes proofs with small coefficients. The Journal of Symbolic Logic 62(3), 708–728 (1997) 8. Buss, S.R.: Bounded Arithmetic. Bibliopolis, Napoli (1986) 9. Buss, S.R.: An introduction to proof theory. In: Buss, S.R. (ed.) Handbook of Proof Theory, pp. 1–78. Elsevier, Amsterdam (1998) 10. Cook, S.A.: Feasibly constructive proofs and the propositional calculus. In: Proc. 7th Annual ACM Symposium on Theory of Computing, pp. 83–97 (1975) 11. Cook, S.A., Reckhow, R.A.: The relative efficiency of propositional proof systems. The Journal of Symbolic Logic 44, 36–50 (1979) 12. Gentzen, G.: Untersuchungen u ¨ber das logische Schließen. Mathematische Zeitschrift 39, 68–131 (1935) 13. Grollmann, J., Selman, A.L.: Complexity measures for public-key cryptosystems. SIAM Journal on Computing 17(2), 309–335 (1988) 14. H´ ajek, P., Pudl´ ak, P.: Metamathematics of First-Order Arithmetic. In: Perspectives in Mathematical Logic, Springer, Berlin (1993) 15. K¨ obler, J., Messner, J., Tor´ an, J.: Optimal proof systems imply complete sets for promise classes. Information and Computation 184, 71–92 (2003) 16. Kraj´ıˇcek, J.: Bounded Arithmetic, Propositional Logic, and Complexity Theory. Encyclopedia of Mathematics and Its Applications, vol. 60. Cambridge University Press, Cambridge (1995) 17. Kraj´ıˇcek, J.: Interpolation theorems, lower bounds for proof systems and independence results for bounded arithmetic. The Journal of Symbolic Logic 62(2), 457–486 (1997) 18. Kraj´ıˇcek, J., Pudl´ ak, P.: Propositional proof systems, the consistency of first order theories and the complexity of computations. The Journal of Symbolic Logic 54, 1079–1963 (1989) 19. Kraj´ıˇcek, J., Pudl´ ak, P.: Quantified propositional calculi and fragments of bounded arithmetic. Zeitschrift f¨ ur mathematische Logik und Grundlagen der Mathematik 36, 29–46 (1990) 20. Pudl´ ak, P.: Lower bounds for resolution and cutting planes proofs and monotone computations. The Journal of Symbolic Logic 62, 981–998 (1997) 21. Pudl´ ak, P.: On reducibility and symmetry of disjoint NP-pairs. Theoretical Computer Science 295, 323–339 (2003) 22. Razborov, A.A.: On provably disjoint NP-pairs. Technical Report TR94-006, Electronic Colloquium on Computational Complexity (1994) 23. Sadowski, Z.: On an optimal propositional proof system and the structure of easy subsets of TAUT. Theoretical Computer Science 288(1), 181–193 (2002) 24. Sadowski, Z.: Optimal proof systems, optimal acceptors and recursive presentability. Fundamenta Informaticae 79(1–2), 169–185 (2007) 25. Wrathall, C.: Rudimentary predicates and relative computation. SIAM Journal on Computing 7(2), 149–209 (1978)
Graphs of Linear Clique-Width at Most 3 Pinar Heggernes, Daniel Meister, and Charis Papadopoulos Department of Informatics, University of Bergen, 5020 Bergen, Norway [email protected], [email protected], [email protected] Abstract. We give the first characterisation of graphs of linear cliquewidth at most 3, and we give a polynomial-time recognition algorithm for such graphs.
1
Introduction
Clique-width is an important graph parameter that is useful for measuring the computational complexity of NP-hard problems. In particular, all problems that can be expressed in a certain kind of monadic second order logic can be solved in linear time on graphs whose clique-width is bounded by a constant [5]. The clique-width of a graph is defined as the smallest number of labels that are needed for constructing the graph using the graph operations ‘vertex creation’, ‘union’, ‘join’ and ‘relabel’. The related graph parameter linear clique-width is obtained by restricting the allowed clique-width operations to only ‘vertex creation’, ‘join’ and ‘relabel’. Both parameters are NP-hard to compute, even on complements of bipartite graphs [7]. The relationship between clique-width and linear clique-width is similar to the relationship between treewidth and pathwidth, and the two pairs of parameters are related [7,10,12]. However, clique-width can be viewed as a more general concept than treewidth since there are graphs of bounded clique-width but unbounded treewidth, whereas graphs of bounded treewidth have bounded clique-width. While treewidth is widely studied and well understood the knowledge on clique-width is still limited. The study of the more restricted parameter linear clique-width is a step towards a better understanding of clique-width. For example, NP-hardness of clique-width is obtained by showing that linear clique-width is NP-hard to compute [7]. In this paper, we contribute to the study of linear clique-width. The main result that we report is the first characterisation of graphs that have linear cliquewidth at most 3 and the first polynomial-time algorithm to decide whether a graph has linear clique-width at most 3. We also show that such graphs are both cocomparability and weakly chordal graphs, and we present a decomposition scheme for them. Such a decomposition scheme is useful for designing algorithms especially tailored for this graph class. For bounded linear clique-width, until now only graphs of linear clique-width at most 2 could be recognised in polynomial
This work is supported by the Research Council of Norway through grant 166429/V30. Some proofs are omitted in this version. They can be found in [11].
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 330–341, 2008. c Springer-Verlag Berlin Heidelberg 2008
Graphs of Linear Clique-Width at Most 3
331
time [8,3], and it was an open question whether the same is true for graphs of linear clique-width at most 3 [9,10]. For bounded clique-width, graphs of clique-width at most 2 [6] and at most 3 [2] can be recognized in polynomial time. Whereas graphs of clique-width at most 2 are characterised as the class of cographs [6], no characterisation is known for graphs of clique-width at most 3. Furthermore, from the proposed algorithm in [2] there is even no straightforward way of deciding whether a graph of clique-width at most 3 has linear clique-width at most 3. Before giving the mentioned results, we define a graph reduction operation that preserves linear clique-width.
2
Graph Preliminaries and Linear Clique-Width
We consider undirected finite graphs with no loops or multiple edges. For a graph G = (V, E), we denote its vertex and edge set by V (G) = V and E(G) = E, respectively. Usually, graphs are non-empty, and may be empty only in case it is explicitly mentioned. For a vertex set S ⊆ V , the subgraph of G induced by S is denoted by G[S]. We denote by G − S the graph G[V \ S] and by G−v the graph G[V \ {v}]. The neighbourhood of a vertex x in G is NG (x) = {v : xv ∈ E} and its degree is |NG (x)|. The closed neighbourhood of x is NG [x] = NG (x) ∪ {x}. Vertex x is isolated in G if NG (x) = ∅ and universal if NG [x] = V (G). Two vertices x, y of G are called false twins if NG (x) = NG (y). An induced cycle on k vertices is denoted by Ck and an induced path on k vertices is denoted by Pk . The graph consisting of only two disjoint edges is denoted by 2K2 , the complement of the graph consisting of two disjoint P3 ’s is denoted by co-(2P3 ). Let G and H be two vertex-disjoint graphs. The (disjoint) union of G and H, denoted by G ⊕ H, is the graph with vertex set V (G) ∪ V (H) and edge set E(G) ∪ E(H). The join of G and H, denoted by G ⊗ H, is the graph obtained from the union of G and H and adding all edges between vertices of G and vertices of H. Let H be a family of graphs. A graph is H-free if none of its induced subgraphs is in H. The class of cographs is defined recursively as follows: a single vertex is a cograph, the disjoint union of two cographs is also a cograph, and the complement of a cograph is a cograph. The class of cographs coincides with the class of P4 -free graphs [3]. The notion of clique-width was first introduced in [4]. The clique-width of a graph G, denoted by cwd(G), is defined as the minimum number of labels needed to construct G, using the following operations: (i) (ii) (iii) (iv)
Creation of a new vertex v with label i, denoted by i(v); Disjoint union, denoted by ⊕; Changing all labels i to j, denoted by ρi→j ; Adding edges between all vertices with label i and all vertices with label j, i = j, denoted by ηi,j (= ηj,i ).
332
P. Heggernes, D. Meister, and C. Papadopoulos
An expression built by using the above four operations is called a clique-width expression. If k labels are used in a clique-width expression then it is called a k-expression. We say that a k-expression t defines a graph G if G is equal to the graph obtained by using the operations in t in the order given by t. The linear clique-width of a graph, denoted by lcwd(G), is introduced in [10] and defined by restricting the disjoint union operation (ii) of clique-width. In a linear clique-width expression all clique-width operations are allowed, but whenever the ⊕ operation is used, at least one of the two operands must be an expression defining a graph on a single vertex. The restricted version of operation (ii) becomes redundant if we allow operation (i) to automatically add the vertex to the graph as an isolated vertex when it is created. For simplicity, we adopt this notation in this paper. Hence whenever a vertex v is created with operation i(v), it is added to the graph as an isolated vertex v with label i, which means that we never use operation ⊕ in our linear clique-width expressions. With this convention, the linearity of a linear clique-width expression becomes clearly visible. For every k-expression that defines a graph G there is a parse tree that encodes the composition of G starting with single vertices followed by interleaved operations of relabeling, edge insertion, and disjoint union. If t is a linear clique-width expression then its parse tree is path-like. Thus one can view the relationship between clique-width and linear clique-width analogous to the relationship between treewidth and pathwidth. Note that the difference between clique-width and linear clique-width of a graph can be arbitrarily large. For example cographs and trees have bounded clique-width [6] but unbounded linear clique-width [10].
3
A Graph Reduction Operation That Preserves Linear Clique-Width
Some graph operations preserve clique-width. Substituting a vertex by a graph is the replacement of a vertex v of G with a graph H such that every vertex of H is adjacent to its neighbours in H and the neighbours of v in G. The modular decomposition is the reverse operation of obtaining a graph recursively by substitution. A module in a graph is a set of vertices that have the same neighbourhood outside of the module. A prime graph with respect to modular decomposition is a graph that cannot be obtained by non-trivial substitution. The clique-width of a graph G is equal to the maximum of the clique-width of all prime induced subgraphs of G [6]. Thus for clique-width it is enough to consider the prime graphs appearing in the modular decomposition. For linear clique-width this is not true, since cographs have unbounded linear clique-width whereas they are completely decomposable with respect to modular decomposition and they have no prime graph [1]. In other words modules do not affect the clique-width of a graph and the scheme provided by modular decomposition gives an efficient way of considering only the clique-width of the decomposed subgraphs. Motivated by the above property on clique-width, in this section we give an analogous result for linear
Graphs of Linear Clique-Width at Most 3
333
clique-width. In particular we are able to show that for certain types of modules this nice property holds when restricted to linear clique-width of graphs. Definition 1. A set of vertices M in a graph G is a maximal independentset module of G if M is an inclusion-maximal set of vertices that is both an independent set and a module of G. For a graph G, we define a binary relation for false twins u and v, denoted by u ∼ft v. Since ∼ft is an equivalence relation, the corresponding equivalence classes partition the vertex set. It is not difficult to verify that the equivalence classes are exactly the maximal independent-set modules of G. We define the graph G/ ∼ft as follows: there is a vertex in G/ ∼ft for every maximal independent-set module of G, and two vertices of G/ ∼ft are adjacent if and only if the corresponding maximal independent-set modules in G contain adjacent vertices. Clearly G/ ∼ft is isomorphic to an induced subgraph of G. An independent-set module is called trivial if it contains a single vertex. Lemma 1. For a graph G, the independent-set modules of G/ ∼ft are trivial. Lemma 2. For a graph G, lcwd(G) = lcwd(G/ ∼ft ). Proof. Since G/ ∼ft is isomorphic to an induced subgraph of G, the inequality lcwd(G/ ∼ft ) ≤ lcwd(G) is immediate. For showing lcwd(G) ≤ lcwd(G/ ∼ft ), let Mx for x ∈ V (G/ ∼ft ) be the maximal independent-set module of G corresponding to x. Let a be a linear clique-width k-expression for G/ ∼ft where k = lcwd(G/ ∼ft ) and let x be the label of x when adding x in the expression a. Define a linear clique-width k-expression a for G as: for every vertex x of G/ ∼ft place the vertices of Mx at the occurrence of the addition of x and give them the same label of x. That is, we replace the appearance of x (x) in a with x (v1 ) · · · x (v|Mx | ), where vi ∈ Mx . Mx is a module in G and every vertex of Mx will obtain the same neighbourhood as x in the graph defined by a . And since every vertex of Mx receives the same label in a , Mx is an independent set in the graph defined by a . Hence lcwd(G) ≤ lcwd(G/ ∼ft ).
4
Characterisation of Graphs of Linear Clique-Width at Most 3
We introduce a graph decomposition scheme such that graphs of linear cliquewidth at most 3 are exactly the graphs that are completely decomposable. Using this characterisation, we show that graphs of linear clique-width at most 3 are both cocomparability and weakly-chordal graphs, which particularly means that graphs of linear clique-width at most 3 do not contain long cycles or complements of long cycles as induced subgraphs. As a first step, we give a characterisation of graphs of linear clique-width at most 2. The only known characterisation of this graph class is as the class of {2K2 , P4 , co-(2P3 )}-free graphs [8].
334
P. Heggernes, D. Meister, and C. Papadopoulos
Definition 2. A graph G is a simple cograph if G satisfies one of the following conditions: (1) G is an edgeless graph. (2) G can be partitioned into two graphs A and B such that A is a simple cograph and B is an edgeless graph and G = A ⊗ B or G = A ⊕ B. Recall that cographs are graphs that are completely decomposable into edgeless graphs with respect to operations ‘join’ and ‘union’. Hence simple cographs are defined as the graphs that are completely decomposable in the same way, with the restriction that one operand is always an edgeless graph in the ‘join’ and ‘union’ operations. Theorem 1. For a graph G, the following statements are equivalent: (1) G is a simple cograph. (2) G is {2K2 , P4 , co-(2P3 )}-free. (3) G can be reduced to a graph on a single vertex by repeatedly deleting an isolated vertex, a universal vertex or a false twin vertex. Note that characterisation (3) of Theorem 1 implies a simple and linear-time recognition algorithm for graphs of linear clique-width at most 2. Similar to graphs of linear clique-width at most 2, we want to give a decomposition scheme for graphs of linear clique-width at most 3. We define a set of operations and show that graphs of linear clique-width at most 3 are exactly the graphs that are completely decomposable into edgeless graphs by using these operations. Unlike for simple cographs, the operations are complex. We define the decomposition as a class of formal expressions. An lc3-expression is inductively defined as follows, where d ∈ {l, r}. (d1) (A) is an lc3-expression, where A is a non-empty set of vertices. (d2) Let T be an lc3-expression and let A be a (possibly empty) set of vertices
not containing a vertex appearing in T , then (d[T ], A) is an lc3-expression. (d3) Let T be an lc3-expression and let A1 , A2 , A3 , A4 be disjoint (and possi-
bly empty) sets of vertices not containing a vertex appearing in T , then (A1 |A2 , d[T ], A3 |A4 ) is an lc3-expression. (d4) Let T be an lc3-expression and let A1 , A2 , A3 , A4 be disjoint (and possibly empty) sets of vertices not containing a vertex appearing in T , and let p be one of the following number sequences: 123, 132, 312, 321, 12, 32 (not allowed are 213, 231, 21, 23, 13, 31), then T (A1 |A2 , •), T (•, A1 |A2 ) and T ◦ (A1 , A2 |A3 , A4 , d[p]) are lc3-expressions where ∈ {⊗, ⊕}. This completes the definition of lc3-expressions. These expressions define graphs. Important and necessary is that these expressions allow to fix a vertex partition for the defined graphs. The graph defined by an lc3-expression and its associated vertex partition is obtained according to the following inductive definition. Let T be an lc3-expression. Then, G(T ) is the following graph, where T always means an lc3-expression and A, A1 , A2 , A3 , A4 are sets of vertices and d ∈ {l, r}:
Graphs of Linear Clique-Width at Most 3
335
(i1) Let T = (A), then G(T ) is the edgeless graph on vertex set A; the vertex
partition associated with G(T ) is (A, ∅).
(i2) Let T = T (A1 |A2 , •) for ∈ {⊗, ⊕}, and let G(T ) with vertex par-
(i3) (i4)
(i5)
(i6)
tition (B, C) be given, then G(T ) is obtained from G(T ) by adding the vertices in A1 and A2 and the edges B (A1 ∪ A2 ) and A2 ⊗ C; the vertex partition associated with G(T ) is ((B ∪ A1 ∪ A2 ), C). The case T = T (•, A1 |A2 ) is similar to the previous one, where the edges are C (A1 ∪ A2 ) and A2 ⊗ B and the vertex partition is (B, (C ∪ A1 ∪ A2 )). Let T = T ◦(A1 , A2 |A3 , A4 , d[p]), and let G(T ) with vertex partition (B, C) be given, then G(T ) is obtained from G(T ) by adding the vertices in A1 ∪ A2 ∪ A3 ∪ A4 ; the set of edges is dependent on p: we have three start sets, A2 , B, C, and three additional sets, A3 , A1 , A4 , that correspond to A2 , B, C, respectively; the first join is executed between the two specified start sets, then the two additional sets are added to the sets involved in the first join; the second join is executed between a now enlarged set and a start set, then the third additional set is added to its corresponding start set and the last join operation (if there is one) is executed; so the result depends on the order of the operations; the numbers stand for: 1 means A2 ⊗ B, 2 means B ⊗ C, 3 means A2 ⊗ C; note that the start sets become bigger after each join operation; the vertex partition associated with G(T ) is ((B ∪ A1 ∪ A2 ∪ A3 ), (C ∪ A4 )) or ((B ∪ A1 ), (C ∪ A4 ∪ A2 ∪ A3 )) for d = l or d = r, respectively. Let T = (d[T ], A), and let G(T ) with vertex partition (B, C) be given, then G(T ) is the graph defined by T ⊗ (•, A|∅) or T ⊗ (A|∅, •) for d = l or d = r, respectively; the vertex partition associated with G(T ) is (B ∪ C, A). Let T = (A1 |A2 , d[T ], A3 |A4 ), and let G(T ) with vertex partition (B, C) be given, then G(T ) is the graph defined by T ◦ (A1 , A3 |A4 , A2 , d[p ]) where p =def 132 or p =def 312 for d = l or d = r, respectively; the vertex partition associated with G(T ) is ((A1 ∪ A2 ∪ B ∪ C), (A3 ∪ A4 )).
Definition 3. A graph G is an lc3-graph if there is an lc3-expression T such that G = G(T ). It is not difficult to show that induced graphs of lc3-graphs are also lc3-graphs. Note that this is not trivial, since a given lc3-expression might require considerable changes to obtain an lc3-expression for the induced subgraph. Before showing that lc3-graphs are exactly the graphs of linear clique-width at most 3, we give two structural properties. For simplicity, we assume that empty graphs are simple cographs. Lemma 3. Let T be an lc3-expression, and let (B, C) be the vertex partition associated with G(T ). Then, the subgraph of G(T ) induced by C is a simple cograph. Proof. We give a construction, that works on the inductive definition of lc3expressions. Let GC denote the subgraph of G(T ) induced by vertex partition set C. If T = (A) for some set A of vertices, then C is empty, and then GC is
336
P. Heggernes, D. Meister, and C. Papadopoulos
G1
G3 H3
H1
G2 H2
Fig. 1. The result of an lc3-composition of the graphs G1 , G2 , G3 , H1 , H2 , H3 . The thick lines from graphs G1 and H1 represent joins. The bow tie between G2 ⊕ H2 and G3 ⊕ H3 means either a join or no edge at all.
an empty graph. Empty graphs are simple cographs. If T = (d[T ], A) or T = (A3 |A4 , d[T ], A1 |A2 ) for d ∈ {l, r}, T an lc3-expression and A1 , A2 , A3 , A4 sets of vertices, then GC is an edgeless graph on vertex set A or A1 ∪A2 , respectively. If T = T (A1 |A2 , •) where ∈ {⊕, ⊗}, then C induces GC also in G(T ), and we obtain the result by application of the induction hypothesis. Now, let (B , C ) be the vertex partition associated with G(T ). If T = T ◦ (A1 , A2 |A3 , A4 , l[p]) and A4 is non-empty, then GC is equal to GC ⊕ G(T )[A4 ], where GC denotes the subgraph of G induced by C . According to induction hypothesis, GC is a simple cograph, and since G(T )[A4 ] is an edgeless graph, GC is a simple cograph. Using the same definitions for the case T = T (•, A1 |A2 ) where ∈ {⊕, ⊗}, GC is equal to GC G(T )[A1 ∪ A2 ], which is a simple cograph. Finally, let T = T ◦(A1 , A2 |A3 , A4 , r[p]). Depending on p, GC is one of the following graphs: – (GC ⊗ G(T )[A2 ∪ A3 ]) ⊕ G(T )[A4 ] – (GC ⊗ G(T )[A2 ]) ⊕ G(T )[A3 ∪ A4 ] – (GC ⊕ G(T )[A4 ]) ⊗ G(T )[A2 ∪ A3 ] . All these graphs are simple cographs, and we conclude the proof. The second property of lc3-graphs is a closure property for a special composition operation. Let G1 , G2 , G3 be an edgeless graph, a simple cograph and an lc3graph (in arbitrary assignment) and let H1 , H2 , H3 be edgeless graphs. Graphs may be empty but at least two of them are non-empty. Then, the graph that is obtained from these six graphs and the additional edges as in Figure 1 is an lc3graph. The bow tie in Figure 1 means either a join between G2 ⊕H2 and G3 ⊕H3 or no edge at all. We call this operation complete or incomplete lc3-composition depending on whether the bow tie represents a join operation (‘complete’) or a union operation (‘incomplete’). Lemma 4. Let G1 , G2 , G3 be an edgeless graph, a simple cograph and an lc3graph and let H1 , H2 , H3 be edgeless graphs, where at least two of the six graphs are non-empty. Then, both the complete and the incomplete lc3-compositions of these graphs yield lc3-graphs.
Graphs of Linear Clique-Width at Most 3
337
Proof. Let G2 be an edgeless graph, G3 be a simple cograph, G1 be an lc3-graph. Let T be an lc3-expression for G1 . Let T =def (l[T ], ∅). Obtain T ∗ by adding operations of the form ⊕(•, ∅|A) and ⊗(•, ∅|A) to T to obtain an lc3-expression for G1 ⊗ G3 . This can be done by reversing the construction procedure described in the proof of Lemma 3. Then, the expressions T ∗ ⊕ (V (H1 )|∅, •) ⊕ (•, V (H3 )|∅) ⊗ (•, V (H2 )|V (G2 )) T ∗ ⊕ (V (H1 )|∅, •) ⊕ (•, V (H3 )|∅) ⊕ (•, V (H2 )|V (G2 )) define lc3-expressions for the complete and incomplete lc3-composition of the six graphs. In case that G2 is an edgeless graph, G1 is a simple cograph and G3 is an lc3-graph, we obtain lc3-expressions for the lc3-compositions in a similar way. Let G1 be an edgeless graph. Similar to the construction above, we obtain an lc3-expression T for G2 ⊕ G3 . Then, T ◦ (V (H2 ), V (G1 )|V (H1 ), V (H3 ), l[p])
or
T ◦ (V (H1 ), V (G1 )|V (H1 ), V (H2 ), l[p]) for an appropriate choice of p is an lc3-expression for the complete or incomplete lc3-composition. The case of G3 being an edgeless graph is analogous. Our main theorem is proven by considering linear clique-width 3-expressions. Linear clique-width expressions may contain redundant operations with no affect to the defined graph. As an auxiliary result, the following lemma shows a normalisation result for linear clique-width expressions. We use the following notation. Let t = t1 · · · tq be a linear clique-width k-expression. By G[t1 · · · ti ], we denote the graph that is defined by subexpression t1 · · · ti of t. We say that vertices with the same label in G[t1 · · · ti ] belong to the same label class of G[t1 · · · ti ]. Lemma 5. Let k ≥ 1 and let G be a graph that has a linear clique-width kexpression. Then, G has a linear clique-width k-expression a = a1 · · · ar such that the following holds for all join and relabel operations ai in a: (1) G[a1 · · · ai ] does not contain an isolated vertex (2) G[a1 · · · ai−1 ] contains vertices of the two label classes involved in ai . Proof. Let b = b1 · · · bs be a linear clique-width k-expression for G. Let bi be a join or relabel operation and suppose that G[b1 · · · bi ] contains an isolated vertex, say x. Let c be the label of x in G[b1 · · · bi ]. If bi is a join operation then c is not one of the two join labels. We obtain b = b1 · · · bs from b by deleting the vertex creation operation for x in b and adding the operation c(x) right after operation bi . Then, G[b1 · · · bi ] = G[b1 · · · bi ]. Iterated application of this operation shows existence of a linear clique-width k-expression having the first property. For the second property, let d = d1 · · · dt be a linear clique-width k-expression that has the first property. Let di = ηc,c be a join operation. If G[d1 · · · di ] does not contain a vertex with label c or c , di does not add an edge to G[d1 · · · di−1 ], i.e.,
338
P. Heggernes, D. Meister, and C. Papadopoulos
G[d1 · · · di−1 ] = G[d1 · · · di ]. We obtain d from d by deleting operation di . Let di = ρc→c be a relabel operation, and suppose that one of the two involved label classes is empty in G[d1 · · · di−1 ]. If the label class corresponding to c is empty, we obtain d from d by just deleting operation di . If the class corresponding to c is non-empty but the class corresponding to c is empty, we obtain d from d by first exchanging c and c in all operations di+1 , . . . , dt and then deleting di . It is clear that G[d1 · · · dj ] and G[d1 · · · dj−1 ] correspond to each other for every j ∈ {i + 1, . . . , t} with the exception that the classes corresponding to c and c are exchanged. Repeated application of the modification completes the proof. Theorem 2. A graph has linear clique-width at most 3 if and only if it is an lc3-graph. The main idea of the proof of the ‘only-if’ part is to partition a given normalised linear clique-width 3-expression into subexpressions between two relabel operations. The modifications of the described graph are expressed by appropriate combinations of lc3-expression operations. For the converse, it is mainly sufficient to give linear clique-width 3-expressions for each lc3-expression operation. Using the new characterisation of graphs of linear clique-width at most 3, we can show that these graphs are cocomparability graphs. A graph is called cocomparability graph if directions can be assigned to the edges of its complement to obtain a directed graph that is transitive. For a graph G = (V, E), a vertex ordering σ is called cocomparability ordering if for every triple u, v, w of vertices such that u ≺σ v ≺σ w, uw ∈ E implies uv ∈ E or vw ∈ E. A graph is cocomparability if and only if it has a cocomparability ordering [13]. Proposition 1. Lc3-graphs are cocomparability graphs. Proof. We show the statement by induction over the definition of lc3-expressions. Let G = (V, E) be an lc3-graph with lc3-expression T . We show that there is a cocomparability ordering for G that respects the vertex partition associated with G(T ). First, let T = (A) for a set A of vertices. Then, G(T ) is an edgeless graph, and every vertex sequence for G is a cocomparability ordering for G. Furthermore, every vertex sequence for G respects the vertex partition (A, ∅). Now, let T be more complex. For the rest of the proof, let T be an lc3-expression with associated vertex partition (B, C), let σ be a cocomparability ordering for G(T ) that respects partition (B, C), which means that the vertices in B appear consecutively and the vertices in C appear consecutively in σ , let A, A1 , A2 , A3 , A4 be sets of vertices, d ∈ {l, r} and p an appropriate number sequence. We consider only the more interesting cases here. – T = T (A1 |A2 , •) for ∈ {⊕, ⊗}. We obtain σ from σ by adding the vertices in A1 to the left of the vertices in B and the vertices in A2 between the vertices in B and C. Then, σ is a cocomparability ordering for G and the vertices in B ∪ A1 ∪ A2 appear consecutively. – T = (l[T ], A). The vertices in A are adjacent to only the vertices in C. We obtain σ from σ by adding the vertices in A to the right of the vertices in C. Then, σ is a cocomparability ordering for G that respects the partition (B ∪ C, A).
Graphs of Linear Clique-Width at Most 3
339
– T = T ◦(A1 , A2 |A3 , A4 , l[12]). The vertices in B and A2 are adjacent and the vertices in B ∪ A1 and C are adjacent. We obtain σ by placing the vertices in the following order: A3 , A2 , B, A1 , C, A4 , and the vertices in B and C appear in order determined by σ . Then, σ is a cocomparability ordering for G and respects the vertex partition ((B ∪ A1 ∪ A2 ∪ A3 ), (C ∪ A4 )). – T = T ◦ (A1 , A2 |A3 , A4 , r[12]). We place the vertices in order A4 , A3 , A2 , C, B, A1 . The remaining cases are (A1 |A2 , d[T ], A3 |A4 ) and T = T ◦ (A1 , A2 |A3 , A4 , d[p]) where p ∈ {123, 132, 312, 321}. We begin with the situation in Figure 1. Suppose cocomparability orderings for the graphs G1 , G2 , G3 , H1 , H2 , H3 are given. We define three vertex orderings for the depicted graph. The vertices of every partition graph appear consecutively and in order defined by the given orderings. The order of the partition graphs is as follows: H2 , G3 , G2 , H3 , G1 , H1 ; H3 , H1 , G1 , G3 , G2 , H2 ; H2 , G1 , G2 , H1 , G3 , H3 . It is easy to check that all three vertex orderings actually define cocomparability orderings for the graph (scheme) depicted in Figure 1. Furthermore, for every i ∈ {1, 2, 3}, there is a cocomparability ordering such that the vertices of Gi ⊕ Hi appear consecutively at the end of the ordering. Depending on p, the pairs (B, A1 ), (C, A4 ), (A2 , A3 ) are matched to (G1 , H1 ), (G2 , H2 ), (G3 , H3 ), and depending on d and the particular case, one of the vertex orderings is chosen to achieve the correct vertex partition. This completes the proof. Proposition 1 shows that lc3-graphs do not contain induced cycles of length more than 4, since such cycles are no cocomparability graphs. This also holds the complements of such cycles. Graphs that neither contain induced cycles of length at least 5 nor their complements as induced subgraphs, are called weaklychordal. The proof of the following proposition requires a careful but not difficult argumentation. Proposition 2. Lc3-graphs are weakly-chordal.
5
Recognition of Graphs of Linear Clique-Width at Most 3
The results of the previous section provide an efficient recognition algorithm for lc3-graphs. This algorithm, called Lc3-graphRecognition, is given in Figure 2. The main loop decomposes the input graph using the reverse of lc3composition as long as possible. When such a decomposition is not possible, a vertex partition has to be determined. Algorithm Simplify applies the opposite strategy, since it starts already with a vertex partition. We call a vertex almostuniversal if it has exactly one non-neighbour. The following lemma summarises properties of Simplify. Lemma 6. Let G = (V, E) be a graph, and let (B, C) be a vertex partition for G. Let G not have false twin vertices in B and in C. If Simplify applied to G and (B, C) . . .
340
P. Heggernes, D. Meister, and C. Papadopoulos
Algorithm Lc3-graphRecognition Input a graph G = (V, E) Result an answer accept or reject set G := G/ ∼ft ; while if
no return
do
G is edgeless
then
return accept
else-if G is the result of an lc3-composition of at least two non-empty graphs if all partition graphs are simple cographs then return accept else set G to the partition graph that is not a simple cograph end if
then
else-if
there is a vertex u such that Simplify on G−u with partition (NG (u), V (G) \ NG [u]) returns a graph G set G := G
then
there is a pair u, v of non-adjacent vertices where the degree of v is |V (G)| − 2 such that Simplify on G − {u, v} with partition (NG (u), V (G) \ (NG (u) ∪ {u, v})) returns a graph G then set G := G
else-if
else
return reject
end if
end while. Algorithm Simplify Input a graph G and a vertex partition (B, C) for G Result an answer accept or reject or a graph G , where accept returns a graph on a single vertex while if
no return
do
G is edgeless
else-if
then
B = ∅ or C = ∅
return accept then
return G
else-if
G has a vertex u such that NG (u) = ∅ or NG (u) = B or NG (u) = C or NG [u] = V (G) or NG [u] = B or NG [u] = C then set G := G−u and (B, C) := (B \ {u}, C \ {u})
else-if
there is a pair u, v of non-adjacent vertices where v is almost-universal such that u, v ∈ B and NG (u) = B \ {u, v} or u, v ∈ C and NG (u) = C \ {u, v} then set G := G − {u, v} and (B, C) := (B \ {u, v}, C \ {u, v})
else-if
B or C is singleton set containing vertex u and |V (G) \ NG [u]| = 2 or V (G) \ NG [u] = {x, z} and the sets NG (x) \ {z} and NG (z) \ {x} can be ordered by inclusion then set G := G−u and (B, C) := (V (G) \ NG [u], NG (u))
else-if
G is the result of an lc3-composition of at least two non-empty graphs that respects the given vertex partition then if all partition graphs are simple cographs then return accept else return the partition graph that is not a simple cograph end if
else
return reject
end if
end while.
Fig. 2. The recognition algorithm for lc3-graphs
(1) returns accept then G is an lc3-graph and there is an lc3-expression T for G that associates G(T ) with vertex partition (B, C) or (C, B). (2) returns reject then G is not an lc3-graph or there is no lc3-expression T for G that associates G(T ) with vertex partition (B, C) or (C, B). (3) returns a graph G then G is an lc3-graph and there is an lc3-expression T for G that associates G(T ) with vertex partition (B, C) or (C, B) if and only if G is an lc3-graph. Furthermore, G is a module and subgraph of G that is induced by vertices only in B or in C. Theorem 3. There is an O(n2 m)-time algorithm for recognising lc3-graphs. Correctness. If the algorithm accepts, an lc3-expression can be constructed for the input graph. Important subroutines are the proofs of Lemmata 4 and 2. The converse is proven by induction. Let G be the input graph, and let T be an lc3-expression for G. For space reasons, we restrict G to have only trivial independent-set modules and not to be the result of an lc3-composition. The last operation in T is not of the forms (d1) and (d3) and the complex operation of (d4). Let the last operation be of the form (d2). There is a vertex u such that G−u is an lc3-graph with lc3-expression T that associates G(T ) with the
Graphs of Linear Clique-Width at Most 3
341
vertex partition (NG (u), V (G)\NG [u]) or its reverse. Due to Lemma 6, Simplify outputs a proper subgraph of G, that is accepted. Let the last operation of T be of the form (A1 |A2 , •) or (•, A1 |A2 ). If = ⊕ then A1 is empty. The vertex in A2 , say u, defines vertex partition (V (G) \ NG [u], NG (u)) or (NG (u), V (G) \ NG [u]) for G−u, and this partition corresponds to the vertex partition associated with G(T )−u. So, G is accepted. Let = ⊗. If A2 is empty, the case is similar to the previous one. If A1 is empty, the vertex in A2 is universal. Thus, A1 and A2 are non-empty. The vertex in A2 is adjacent to all vertices but the vertex in A1 , and the vertex in A1 defines a vertex partition for G. The algorithm accepts. Running time. Graph G/ ∼ft can be computed in linear time. Every whileloop execution is done with a smaller graph, so that there are at most n whileloop executions. A single while-loop execution takes time O(nm): linear time for checking for result of an lc3-composition and at most 2n applications of Simplify, that requires linear time each call. This shows the total O(n2 m) running time.
References 1. Brandst¨ adt, A., Le, V.B., Spinrad, J.P.: Graph Classes: A Survey. SIAM Monographs on Discrete Mathematics and Applications (1999) 2. Corneil, D.G., Habib, M., Lanlignel, J.-M., Reed, B.A., Rotics, U.: Polynomial time recognition of clique-width ≤ 3 graphs. In: Gonnet, G.H., Viola, A. (eds.) LATIN 2000. LNCS, vol. 1776, pp. 126–134. Springer, Heidelberg (2000) 3. Corneil, D.G., Perl, Y., Stewart, L.K.: A linear recognition algorithm for cographs. SIAM J. Comput. 14, 926–934 (1985) 4. Courcelle, B., Engelfriet, J., Rozenberg, G.: Handle-rewriting hypergraph grammars. J. Comput. System Sci. 46, 218–270 (1993) 5. Courcelle, B., Makowsky, J.A., Rotics, U.: Linear time solvable optimization problems on graphs of bounded clique-width. Theory Comput. Syst. 33, 125–150 (2000) 6. Courcelle, B., Olariu, S.: Upper bounds to the clique width of graphs. Discrete Applied Mathematics 101, 77–114 (2000) 7. Fellows, M.R., Rosamond, F.A., Rotics, U., Szeider, S.: Clique-width Minimization is NP-hard. In: Proceedings of the thirty-eighth annual ACM Symposium on Theory of Computing, STOC 2006, pp. 354–362 (2006) 8. Gurski, F.: Characterizations for co-graphs defined by restricted NLC-width or clique-width operations. Discrete Mathematics 306, 271–277 (2006) 9. Gurski, F.: Linear layouts measuring neighbourhoods in graphs. Discrete Mathematics 306, 1637–1650 (2006) 10. Gurski, F., Wanke, E.: On the relationship between NLC-width and linear NLCwidth. Theoretical Computer Science 347, 76–89 (2005) 11. Heggernes, P., Meister, D., Papadopoulos, Ch.: Graphs of small bounded linear clique-width. Technical report, no.362, Institute for Informatics, University of Bergen, Norway (2007) 12. Hlinˇen´ y, P., Oum, S., Seese, D., Gottlob, G.: Width parameters beyond tree-width and their applications. Computer Journal (to appear) 13. Kratsch, D., Stewart, L.: Domination on cocomparability graphs. SIAM Journal on Discrete Mathematics 6, 400–417 (1993)
A Well-Mixed Function with Circuit Complexity 5n ± o(n): Tightness of the Lachish-Raz-Type Bounds Kazuyuki Amano1 and Jun Tarui2 1
2
Dept of Comput Sci, Gunma Univ, Tenjin 1-5-1, Kiryu, Gunma 376-8515, Japan [email protected] Dept of Info & Comm Eng, Univ of Electro-Comm, Chofu, Tokyo 182-8585, Japan [email protected]
Abstract. A Boolean function on n variables is called k-mixed if for any two different restrictions fixing the same set of k variables must induce different functions on the remaining n−k variables. In this paper, we give an explicit construction of an n − o(n)-mixed Boolean function whose circuit complexity over the basis U2 is 5n + o(n). This shows that a lower bound method on the size of a U2 -circuit that uses the property of k-mixed, which gives the current best lower bound of 5n − o(n) on a U2 -circuit size (Iwama, Lachish, Morizumi and Raz [STOC ’01, MFCS ’02]), has reached the limit.
1
Introduction and Results
A Boolean function on n variables is called k-mixed if for any two different restrictions fixing the same set of k variables must induce different functions on the remaining n − k variables. The notion of a k-mixed Boolean function, which was introduced by Jukna [7], plays an important role on deriving explicit lower bounds on several computational models. The best known 5n − o(n) lower bound on the size of a Boolean circuit over the basis U2 can be established for any n − o(n)-mixed Boolean function [5,6]. The basis U2 is the set of all Boolean functions over two variables except for the XOR function and its complement. It is also known that the size of any read-once branching program computing a k-mixed Boolean function is at least 2k (see e.g., [12,10]). This result was used to show the existence of a√function in P having read-once branching program size not smaller than 2n−O( n) [10] (see also [2] for an improved result). In this paper, we focus on the complexity of Boolean circuits over the basis U2 . Deriving a good lower bound for an explicit Boolean function in such a general circuit model is one of the central problems in computer science. In 1991, Zwick [14] gave a lower bound of 4n − O(1) for a certain family of symmetric Boolean functions. After a decade, Lachish and Raz [9] introduced a new family of Boolean functions called strongly-two-dependent and improved the lower bound to 4.5n − o(n). Shortly after, Iwama and Morizumi [5] proved a lower M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 342–350, 2008. c Springer-Verlag Berlin Heidelberg 2008
A Well-Mixed Function with Circuit Complexity 5n ± o(n)
343
bound of 5n − o(n) for the same family of Boolean functions, which is the current best (see also [6]). Since it is easily shown that the property of k-mixed is stronger than that of strongly-two-dependent, the above mentioned lower bound of 5n − o(n) can be established for every k-mixed √ function with k = n − o(n). A simple and explicit construction of an n − 3 n-mixed Boolean function was given by Savick´ y and Z´ ak [10]. Their function is of the form h(x) = xφ(x) , where φ : {0, 1}n → {1, 2, . . . , n} is a kind of weighted sum of inputs. If we consider the circuit complexity of their function, a straightforward implementation yields an upper bound of O(n log n). It seems to be difficult to improve this upper bound without changing the definition of the function, since we need to manipulate n numbers each has O(log n)-digits in order to compute the index function φ(·). Hence it would be natural to expect that a lower bound method that uses the property of k-mixed [6,9,5], which gives the current best lower bound on a U2 circuit size of 5n − o(n), can further be extended for obtaining a higher lower bound. The main result of this paper is to show that this is not the case since there is a well-mixed function with circuit complexity 5n + o(n). Precisely, we give a construction of an explicit Boolean function fn on n variables such that √ (i) fn is n − t(n)-mixed for any t(n) = ω( n log2 n), and (ii) fn can be computed by a circuit of size 5n + o(n) over the basis U2 . The circuit complexity of our function over the basis U2 is optimal up to a lower order term. The result also shows that a lower bound method that uses the property of k-mixed (Iwama, Lachish, Morizumi and Raz [6,9,5]) has reached the limit. The organization of the paper is as follows. In Section 2, we give some preliminaries. In Section 3, we describe the definition and the analysis of our function. Finally, we give some concluding remarks in Section 4.
2
Notations and Definitions
For a natural number n, [n] denotes the set {1, 2, . . . , n}. For a binary sequence k−1 x = xk−1 · · · x0 , (x)2 denotes the integer represented by x, i.e., (x)2 = i=0 2i xi . Throughout the paper, we consider a Boolean function on the variable set Xn = {x1 , . . . , xn }. Let B2 denote the set of all (sixteen) Boolean functions over two variables, and let U2 denote B2 − {⊕, ≡}, i.e., U2 contains all Boolean functions over two variables except for the XOR function and its complement. For a basis B ⊆ B2 , a Boolean circuit over the basis B is a directed acyclic graph with nodes of in-degree 0 or 2. Nodes of in-degree 0 are called input-nodes, and each one of them is labeled by a variable in Xn or a constant 0 or 1. Nodes of in-degree 2 are called gate-nodes, and each one of them has two inputs and an output, and is labeled by a function in B. For a basis B and for a Boolean function f , the circuit complexity of f over the basis B, which is denoted by SizeB (f ), is the minimum number of gate-nodes in a circuit over the basis B that computes f .
344
K. Amano and J. Tarui
Let f be a Boolean function on Xn . A partial assignment is a function σ : Xn → {0, 1, ∗}, where σ(xi ) = 0 or 1 means that the input variable xi is fixed to the corresponding constant, and σ(xi ) = ∗ means xi remains free. For a partial assignment σ, the support of σ, denoted by Sup(σ), is Sup(σ) = {xi | σ(xi ) = ∗}. Let f |σ denote the subfunction (over Xn \Sup(σ)) obtained by restricting all variables xi in Sup(σ) to σ(xi ). All logarithms in the paper are base 2.
3
The Function and Proofs
In this section, we give an explicit construction of a well-mixed Boolean function whose circuit complexity over the basis U2 is 5n ± o(n). Definition 1. For k ∈ N, a Boolean function f on Xn is called k-mixed, if for every V ⊆ Xn such that |V | = k and for any two distinct partial assignments α, β with support V yield different subfunctions, i.e., f |α = f |β . We now describe the definition of our function. Our function is of the form fn (x) = xz where z is computed from a weighted sum of block parities of input variables. Definition 2. For an integer n, let p[n] be the smallest prime greater than or equal to n. Note that the Bertrand-Chebyshev theorem (see e.g., [4, p.373]), which states that there is a prime p such that n ≤ p < 2n for every n > 1, guarantees that p[n] < 2n. Let wn : N → [n] be defined as follows. For an integer s, let s˜ be the unique integer satisfying s˜ ≡ s (mod p[n]) and 1 ≤ s˜ ≤ p[n]. Then, s˜, 1 ≤ s˜ ≤ n, wn (s) = 1, otherwise. Let b = logn2 n . For an index i ∈ [n], the value logi2 n is called a weight of the index i and is denoted by wgt(i). We divide [n] into b blocks D1 , . . . , Db such that j ∈ Di iff wgt(j) = i. For a total assignment x to Xn and i ∈ [b], let PAR(x, i) denote the parity of variables whose weight is i, i.e., PAR(x, i) = xj . j∈Di
For every n, the Boolean function fn : {0, 1}n → {0, 1} is defined as fn (x) = xz , where b i · PAR(x, i) . (1) z = wn i=1
We should note that our function is a modification of√the function introduced by Savick´ y and Z´ ak [10], which was shown to be n − 3 n-mixed. The difference
A Well-Mixed Function with Circuit Complexity 5n ± o(n)
345
between their function and ours is that they defined z := wn ( ni=1 i · xi ) instead of Eq. (1) in Definition 2. Below we √ prove that the function fn defined above is n − t(n) mixed for every t(n) = ω( n log2 n) (Theorem 3), and then prove that the circuit complexity of fn over the basis U2 is 5n + o(n) (Theorem 6). √ Theorem 3. For any t = t(n) = ω( n log2 n), the function fn is (n − t)-mixed. As for the result of Savick´ y and Z´ ak [10], we use the following theorem due to da Silva and Hamidoune [11] (see also e.g., [1]). Theorem 4. Let p be a prime number and let z and h be two integers. Let h ≤ z ≤ p and let A ⊆ Zp such that |A| = z. Let B be the set of all sums of h distinct elements of A. Then |B| ≥ min(p, hz − h2 + 1). The outline of the proof of Theorem 3 is similar to the proof of Theorem 2.5 in [10]. However, we need more detailed analysis since the effect of a partial assignment to our function is more complex than that to the function introduced by Savick´ y and Z´ ak [10]. Proof. (of Theorem 3) Let I ⊆ [n] with |I| = n − t and let u and v be two different partial assignments both of whose support is I. To show fn |u = fn |v , it is sufficient to show that there are two total assignments x and y such that fn (x) = fn (y), where x is an extension of u and y is an extension of v. Let u0 (v 0 , resp.) be a total assignment obtained from u (v, resp.) by additionally assigning every variables in [n]\I to the constant 0. Let J be an arbitrary maximal subset of [n]\I such that every two elements of J have distinct weights, i.e., wgt(i) = wgt(j) for every i, j ∈ J with i = j. √ Note that |J| = ω( n) since there are at most O(log2 n) indices having same weight. Intuitively, Theorem 4 guarantees that, for every assignment to the variable set [n]\J, it is possible to set the remaining inputs J in such a way that the weighted sum of parities in Eq. (1) belongs to any given value residue class modulo Zp[n] . Now we divide the set of indices J into two classes Js and Jd defined as Js = {j ∈ J | PAR(u0 , wgt(j)) = PAR(v 0 , wgt(j))}, Jd = {j ∈ J | PAR(u0 , wgt(j)) = PAR(v 0 , wgt(j))}. We divide the proof of the theorem into two cases depending on the sizes of Js and Jd . In the rest of the proof, the symbol ≡ means that the congruence modulo p[n]. Case A. |Js | ≥ |Jd |. √ ˜ (˜ v , resp.) be a partial assignment In this case we have |Js | = ω( n). Let u obtained from u (v, resp.) by additionally assigning every variables in ([n]\I)\Js to 0.
346
K. Amano and J. Tarui
Let S(u) = bi=1 i · PAR(u0 , i) and S(v) = bi=1 i · PAR(v 0 , i). Let Δ be an element of Zp such that Δ ≡ S(v) − S(u). Case A-1. Δ ≡ 0. ˜ and We extend u˜ and v˜ to u and v by setting some positions not fixed in u v˜. We first choose any j ∈ Js \{1}. Let = wn (j + Δ). Since ≡ j + Δ ≡ j or = 1 = j, we have j = . If ∈ Js , in u and v we set the position j to 0 and set the position to 1. This ensures that uj = vj = u = v . If ∈ Js , we set the position j of both u and v√ in such a way that uj = vj = v . We have |Js | − 2 = ω( n) positions that are still not specified in both u and v . We denote A by the set of these positions. For an index j ∈ [n], the contribution of j, denoted by cj , is defined as wgt(j), if PAR(u0 , wgt(j)) = 0, cj = p[n] − wgt(j), if PAR(u0 , wgt(j)) = 1. Intuitively, setting 1 to an unassigned variable xj increases the total weight of b n x (i.e., i=1 i · PAR(x, i)) by cj . Since wgt(j) ≤ b = log 2 n < p[n]/2 for every j, all indices in A have distinct contributions. Let h = |A|/2. Since |A|h − h2 + 1 = ω(n) ≥ p[n], Theorem 4 guarantees that there is a set H ⊆ A of size h such that
ci ≡ j −
b
i · PAR(u , i)
i=1
i∈H
Let x be an extension of u such that xi = 1 for every i ∈ H and xi = 0 for every i ∈ A\H. Then, we have b
i · PAR(x, i) ≡ j.
i=1
This implies that fn (x) = uj . Let y be an assignment extending v such that the bits with indices in Js have the same value in x and y. This implies that b b ωn i · PAR(y, i) = ωn i · PAR(x, i) + Δ = , i=1
i=1
and hence fn (y) = v = fn (x). Case A-2. Δ ≡ 0. √ Let j ∈ I with uj = vj . Since |Js | = ω( n), Theorem 4 guarantees that there b is an extension x of u such that i=1 i · PAR(x, i) ≡ j. Let y be an assignment extending v such that the bits with indices in Js have the same value in x. Then, we have fn (x) = uj = vj = fn (y).
A Well-Mixed Function with Circuit Complexity 5n ± o(n)
347
Case B. |Jd | > |Js |. We divide the set Jd into two sets Jd,0 and Jd,1 defined as Jd,0 = {j ∈ Jd | PAR(u0 , wgt(j)) = 0 ∧ PAR(v 0 , wgt(j)) = 1}, Jd,1 = {j ∈ Jd | PAR(u0 , wgt(j)) = 1 ∧ PAR(v 0 , wgt(j)) = 0}. Without loss of generality, we can assume that √ |Jd,0 | ≥ |Jd,1 | (otherwise we swap u with v). Note that |Jd,0 | ≥ |Jd |/2 = ω( n). Let u ˜ (˜ v , resp.) be a partial assignment obtained from u (v, resp.) by additionally assigning every variables in ([n]\I)\Jd,0 to 0. Let S(u) = bi=1 i · PAR(u0 , i) and S(v) = bi=1 i · PAR(v 0 , i). Let Δ be an element of Zp such that Δ ≡ S(v) − S(u). As for Case A-1, we will extend u ˜ and v˜ to u and v by setting some positions in Jd,0 . The difference to Case A-1 is that if we set the value 1 to an unassigned variable whose weight is k, then the total weight of u ˜ increases by k and that of v˜ decreases by k. Let k be an arbitrary element in Zp that satisfies ωn (S(u)+k) ∈ Jd,0 \{1}, and let j = ωn (S(u) + k) for a chosen k. Let = ωn (S(v) − k) = ωn (S(u) + Δ − k). Without loss of generality, we can assume that j = . This is because that since j = implies S(u)+k ≡ S(u)+Δ−k, the number of choices of k that forces j = is at most one. Hence we can always avoid such a k since |Jd,0 | is sufficiently large. If ∈ Jd,0 , in u and v we set the position j to 0 and set the position to 1. If ∈ Jd,0 , we set the position j of both u and v in such a way that uj = vj = v . The rest of the √ proof is an analogous to Case A-1. Since we have at least |Jd,0 | − 2 = ω( n) unassigned variables, Theorem 4 guarantees that we can b always extend u to x such that i=1 i · PAR(x, i) ≡ S(u) + k. Let y be a total assignment b obtained from v by applying the same extension as for u . Then we have i=1 i · PAR(y, i) ≡ S(v) − k, and therefore fn (x) = xj = y = fn (y). This completes the proof of Theorem 3. Now we proceed to the analysis of the complexity of our function. We use a circuit called decoder whose definition is as follows. n
Definition 5. An n-to-2n decoder Decoden : {0, 1}n → {0, 1}2 is the function that takes an n-bit binary input x and outputs d0 d1 · · · d2n −1 such that di = 1 iff (x)2 = i. It is well known that SizeU2 (Decoden ) = 2n + O(n2n/2 ) (see e.g., [13, p. 75]). Theorem 6. The function fn can be computed by a circuit of size at most 5n + o(n) over the basis U2 . Proof. We break down the computation of fn into five steps as follows : (i) Compute PAR(x, i) for each i = 1, . . . , b. Recall that b denotes the number of blocks of the input variables.
348
K. Amano and J. Tarui
(ii) Compute the binary representation of i · PAR(x, i) for each i = 1, . . . , b. b (iii) Compute the binary representation of i=1 i ·PAR(x, i). b i · PAR(x, i) . (iv) Compute the binary representation of z = wn i=1 (v) Output xz . In step (i), since a parity gate can be implemented by using three U2 -gates (g1 = x1 ∧ x2 , g2 = x1 ∧ x2 , g3 = x1 ⊕ x2 = g1 ∨ g2 ), all PAR(x, i)’s can be computed by a circuit of size 3(n − b) < 3n over the basis U2 . Let (iq , . . . , i1 ) be the binary representation of an i ∈ [b], where q = O(log n). Then the binary representation of i · PAR(x, i) is obviously (iq ∧ PAR(x, i), . . . , i1 ∧ PAR(x, i)).
(2)
Since ij ’s are not depending on an input, i.e., ij ’s can be considered as constants, we need no gates in step (ii). In fact, we can replace ij ∧ PAR(x, i) in Eq. (2) by the constant 0 for each j with ij = 0, and by PAR(x, i) for each j with ij = 1. In step (iii), we need b − 1 O(log n)-bit adders, which can be realized using at most b · O(log n) = o(n) gates over the basis U2 , since the addition of two k bit numbers can be implemented by a circuit of size O(k) (see, e.g., [13]). In step (iv), we only need several basic arithmetic operations on O(log n)-digit numbers. The number of gates needed is obviously a polynomial in O(log n) = o(n). In step (v), we use the n-way multiplexer (a.k.a. storage access function) Mn whose definition is as follows: Let q = log n . The function Mn takes q + n binary inputs and is defined as Mn (z1 , . . . , zq , x0 , . . . , xn−1 ) = x(zq ···z1 )2 . If (zq · · · z1 )2 ≥ n, then the output of Mn is unspecified. It is well known that Mn can be computed by a circuit of size 2n + o(n) over the basis U2 when n is a power of two [8] (see also [13, p.77]). Below we describe the construction due to [8] and verify that SizeU2 (Mn ) = 2n + o(n) for every n. By using the identity [8], we have Mn (z1 , . . . , zq , x0 , . . . , xn−1 ) = M2q/2 (z1 , . . . , zq/2 , x0 , . . . , x2q/2 −1 ), where, for each i = 0, . . . , 2q/2 − 1, xi
=
2q/2
−1
(dt ∧ x2q/2 i+t ),
(3)
t=0
and dt is the t-th output of Decodeq/2 applied to (zq/2 +1 , . . . , zq ). Here we put xj = 0 for every j ≥ n in Eq. (3). Thus, Mn can be realized using n AND gates and at most n OR gates, and additional O(2q/2 ) = o(n) gates those are used to compute the function M2q/2 and Decodeq/2 .
A Well-Mixed Function with Circuit Complexity 5n ± o(n)
349
Overall, the total size of our circuit for the function fn is (3 + 2)n + o(n) = 5n + o(n). This completes the proof of Theorem 6. In summary, we have: Theorem 7. There is a sequence of Boolean functions {fn }n , which is given by Definition 2, √ such that (i) fn is a Boolean function on n variables, (ii) for every t = t(n) = ω( n log2 n), fn is n − t-mixed, and (iii) the circuit complexity of fn over the basis U2 satisfies 5n − o(n) ≤ SizeU2 (fn ) ≤ 5n + o(n).
4
Concluding Remarks
In the paper, we gave an explicit construction of an n−o(n)-mixed Boolean function with circuit complexity 5n ± o(n). Our results shows that a lower bound method on the size of a U2 -circuit that uses the property of k-mixed has reached the limit. This gives a strong motivation to find another property of Boolean functions that can be used for deriving a higher lower bound, which is an interesting and challenging open problem.
References 1. Alon, N., Nathanson, M.B., Ruzsa, I.: The Polynomial Method and Restricted Sums of Congruence Classes. J. Number Theory 56, 404–417 (1996) 2. Andreev, A., Baskakov, J., Clementi, A., Rolim, J.: Small Pseudo-Random Sets yields Hard Functions: New Tight Explicit Lower Bounds for Branching Programs. In: Wiedermann, J., Van Emde Boas, P., Nielsen, M. (eds.) ICALP 1999. LNCS, vol. 1644, pp. 179–189. Springer, Heidelberg (1999); Also in ECCC TR97-053 (1997) 3. Forster, C.C., Stockton, F.D.: Counting Responders in an Associative Memory. IEEE Trans. Comput. C-20(12), 1580–1583 (1971) 4. Hardy, G.H., Wright, E.M.: An Introduction to the Theory of Numbers, 5th edn. Oxford University Press, Oxford (1978) 5. Iwama, K., Morizumi, H.: An Explicit Lower Bound of 5n − o(n) for Boolean Circuits. In: Proc. 27th MFCS, pp. 353–364 (2002) 6. Iwama, K., Lachish, O., Morizumi, H., Raz, R.: An Explicit Lower Bound of 5n − o(n) for Boolean Circuits (manuscript, 2005), (preliminary versions are [9] and [5]), http://www.wisdom.weizmann.ac.il/∼ ranraz/publications/ 7. Jukna, S.P.: Entropy of Contact Circuits and Lower Bounds on Their Complexity. Theoret. Comput. Sci. 57, 113–129 (1988) 8. Klein, P., Paterson, M.S.: Asymptotically Optimal Circuit for a Storage Access Function. IEEE Trans. Comput. 29(8), 737–738 (1980) 9. Lachish, O., Raz, R.: Explicit Lower Bound of 4.5n − o(n) for Boolean Circuits. In: Proc. 33rd STOC, pp. 399–408 (2001) 10. Savick´ y, P., Z´ ak, S.: A Large Lower Bound for 1-Branching Programs. ECCC TR96-036 Rev.01 (1996) 11. Dias da Silva, J.A., Hamidoune, Y.O.: Cyclic Spaces for Grassmann Derivatives and Additive Theory. Bull. London. Math. Soc. 26, 140–146 (1994)
350
K. Amano and J. Tarui
12. Simon, J., Szegedy, M.: A New Lower Bound Theorem for Read Only Once Branching Programs and its Applications. In: Advances in Computational Complexity Theory, DIMACS Series, vol. 13, pp. 183–193 (1993) 13. Wegener, I.: The Complexity of Boolean Functions, Willey-Teubner (1987) 14. Zwick, U.: A 4n Lower Bound on the Combinatorial Complexity of Certain Symmetric Boolean Functions over the Basis of Unate Dyadic Boolean Functions. SIAM J. Comput. 20, 499–505 (1991)
A Logic for Distributed Higher Order π-Calculus Zining Cao Department of Computer Science and Technology Nanjing University of Aero. & Astro., Nanjing 210016, P.R. China [email protected]
Abstract. In this paper, we present a spatial logic for distributed higher order π-calculus. In order to prove that the induced logical equivalence coincides with distributed context bisimulation, we present some new bisimulations, and prove the equivalence between these new bisimulations and distributed context bisimulation. Furthermore, we present a variant of this spatial logic and prove that it gives a logical characterisation of distributed bisimulations.
1
Introduction
Higher order π-calculus was proposed and studied intensively in Sangiorgi’s dissertation [13]. In higher order π-calculus, processes and abstractions over processes of arbitrarily high order, can be communicated. Some interesting equivalences for higher order π-calculus, such as barbed equivalence, context bisimulation and normal bisimulation, were presented in [13]. Barbed equivalence can be regarded as a uniform definition of bisimulation for a variety of concurrent calculi. Context bisimulation is a very intuitive definition of bisimulation for higher order π-calculus, but it is heavy to handle, due to the appearance of universal quantifications in its definition. In the definition of normal bisimulation, all universal quantifications disappeared, therefore normal bisimulation is a very economic characterisation of bisimulation for higher order π-calculus. The coincidence between the three weak equivalences was proved [13,12,9]. In [3] we design a distributed higher order π-calculus which allows the observer to see the distributed nature of processes. For this distributed higher order πcalculus, the processes under observation are considered to be distributed in nature and locations are associated with parallel components which represent sites. So, the observer cannot only test the process by communicating with it but also can observe or distinguish that part of the distributed process which reacted to the test. Furthermore, three general notions of bisimulation related to this observation of distributed systems are introduced. The equivalence between the distributed bisimulations is proved, which generalizes the equivalence between
This work was supported by the National Natural Science Foundation of China under Grant 60473036.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 351–363, 2008. c Springer-Verlag Berlin Heidelberg 2008
352
Z. Cao
barbed equivalence, normal bisimulation and context bisimulation for distributed process calculus. Providing a logical characterisation for various equivalence relations has been one of the major research topics in the development of process theories. A logical characterisation not only allows us to reason about process behaviours, but also helps to understand the properties of the operators. For CCS, a famous modal logic is Hennessy-Milner logic [10], for which logical equivalence is known to characterise bisimilarity. For higher order π-calculus, the logical characterisation is in general difficult to give. The main problem lies in the fact that universal quantifications appear in the definition of bisimulations for higher order π-calculus. For instance, to equal two processes with higher order input prefixing, one has to equal the residuals under all possible process substitutions. Some Hennessy-Milner style logics for bisimulation in CHOCS were presented in [1,2]. Spatial logic was presented in [7]. Spatial logic extends classical logic with connectives to reason about the structure of the processes. The additional connectives belong to two families. Intensional operators allow one to inspect the structure of the process. A formula A1 |A2 is satisfied whenever we can split the process into two parts satisfying the corresponding subformula Ai , i = 1, 2. In presence of restriction in the underlying model, a process P satisfies formula nA if we can write P as (νn)P with P satisfying A. Finally, formula 0 is only satisfied by the inaction process. Connectives | and come with adjunct operators, called guarantee () and hiding () respectively, that allow one to extend the process being observed. In this sense, these can be called contextual operators. P satisfies A1 A2 whenever the spatial composition (using |) of P with any process satisfying A1 satisfies A2 , and P satisfies An if (νn)P satisfies A. Some spatial logics have an operator for fresh name quantification [6]. Existing spatial logics for concurrency are intensional [11], in the sense that they induce an equivalence that coincides with structural congruence, which is much finer than bisimilarity. In [8], Hirschkoff studied an extensional spatial logic. This logic only has spatial composition adjunct (), revelation adjunct (), a simple temporal modality, and an operator for fresh name quantification. For π-calculus, this extensional spatial logic was proved to induce the same separative power as strong early bisimilarity. There are lots of works of spatial logics for π-calculus and Mobile Ambients. But as far as we know, there is no spatial logic for higher order π-calculus up to now. In this paper, we present a spatial logic for distributed higher order π-calculus, called W L, which comprises two barb detecting predicates ⇓ and ⇓μ , a simple temporal modality and a spatial composition adjunct . We show that this equivalence induced by this spatial logic coincides with distributed context bisimulation. To establish this result, our strategy is to first give a simpler bisimulation which is equivalent to distributed context bisimulation, and then present a logical characterisation for this simpler bisimulation. Consequently, this logic is also a characterisation for context bisimulation. Similar to [8], we
A Logic for Distributed Higher Order π-Calculus
353
exploit the characterisation of distributed context bisimulation in the terms of distributed barbed equivalence. However, in the definition of distributed barbed equivalence, the testing context is an arbitrary process. It is difficult to give the logical characterisation of barbed equivalence directly. Therefore we first give a variant of normal bisimulation, ≈dnr , where in the case of higher order output action, we only need to test the remainder process with some kind of finite processes. Then we give a similar variant of contextual barbed bisimulation, ≈dnb , where a process only need to be tested with some special finite processes. The bisimulations ≈dnr and ≈dnb are proved to coincide with weak context bisimulation. Furthermore, we prove that W L is a characterisation of ≈dnb . Thus W L is a logic characterisation of weak context bisimulation. Finally, we present a variant of W L, named W SL, which does not have ⇓μ , but contains . We prove that W SL induces an equivalence that coincides with weak context bisimulation. Thus W SL is also a logic characterisation of context bisimulation. This paper is organized as follows: In Section 2, we introduce a distributed higher order π-calculus. In Section 3, we present some distributed bisimulations and give the equivalence between these bisimulations. In Section 4, we give a spatial logic for weak distributed context bisimulation, and prove that the logical induced equivalence coincides with weak distributed context bisimulation. The paper is concluded in Section 5.
2
Syntax and Labelled Transition System of Distributed Higher Order π-Calculus
In this section, a distributed higher order π-calculus is given, which was first presented in [3]. We now introduce the concept of distributed process. Given a location set Loc, w.l.o.g., let Loc be the set of natural numbers, the syntax for distributed processes allows the parallel composition of located process {P }i , which may share a defined name, using the construct (νa)−. The class of the distributed processes is denoted as DP r, ranged over by K, L, M, N, .... The formal definition of distributed process is given as follows: M ::= {P }i | M1 |M2 | (νa)M, where i ∈ Loc and P is a process of higher order π-calculus. Intuitively, {P }i represents process P residing on location i. The actions of such a process will be observed to occur “within location i”. M1 |M2 represents the parallel of two distributed systems M1 and M2 . (νa)M is the restriction operator, which makes name a local to system M . For example, {(νl)(bl.0|!τ.0.0|l.0)}0 |{b(U ).U }1 , (νl)({l.0}0 |{l.0|!τ.0}1), {b!τ.0.0}0 |{b(U ).U }1 and {0}0|{!τ.0}1 are distributed processes. Here intuitively {(νl)(bl.0|!τ.0.0|l.0)}0 |{b(U ).U }1 represents (νl)(bl.0|!τ.0.0|l.0) and b(U ).U running on locations 0 and 1 respectively, (νl)({l.0}0 |{l.0|!τ.0}1) represents the parallel of {l.0}0 and {l.0|!τ.0}1 with a private name l. In each distributed process of the form (νa)M the occurrence of a is a bound within the scope of M . An occurrence of a in M is said to be free iff it does not lie within the scope of a bound occurrence of a. The set of names occurring
354
Z. Cao
free in M is denoted f n(M ). An occurrence of a name in M is said to be bound if it is not free, we write the set of bound names as bn(M ). n(M ) denotes the set of names of M , i.e., n(M ) = f n(M ) ∪ bn(M ). We use n(M, N ) to denote n(M ) ∪ n(N ). Higher order input prefix {a(U ).P }i binds all free occurrences of U in P . The set of variables occurring free in M is denoted as f v(M ). We write the set of bound variables in M as bv(M ). A distributed process is closed if it has no free variable; it is open if it may have free variables. DP rc is the set of all closed distributed processes. Processes M and N are α-convertible, M ≡α N , if N can be obtained from M by a finite number of changes of bound names and bound variables. Structural congruence is a congruence relation including the following rules: {P }i |{Q}i ≡ {P |Q}i ; M |N ≡ N |M ; (L|M )|N ≡ L|(M |N ); M |0 ≡ M ; (νa)0 ≡ 0; (νa)(νb)M ≡ (νb)(νa)M ; (νa)(M |N ) ≡ M |(νa)N if a ∈ / f n(M ); M ≡ N if M ≡α N. The distributed actions are given by Iα ::= {τ }i,j | {l}i | {l}i | {aE}i | {aE}i | {(νb)aE}i We write bn(Iα) for the set of names bound in Iα, which is {b} if Iα is {(νb)aE}i and ∅ otherwise. n(Iα) denotes the set of names that occur in Iα. We give the operational semantics of distributed processes in Table 1. We have omitted the symmetric of the parallelism and communication rules. The main character of Table 1 is that the label Iα on the transition arrow is of the Table 1. M −→ M
P −→ P
Iα
ALP :
Iα
N −→ N
τ
M ≡ N, M ≡ N P −→ P
T AU :
P −→ P
l
OU T 1 :
l
IN 1 :
{l}i
{P }i −→ {P }i (ν b)aE
−→
P
OU T 2 :
{P }i
P −→ P {P }i
{aE}i
−→ {P }i
bn(Iα) ∩ f n(N ) = ∅
Iα
M |N −→ M |N {l}i
COM 2 :
{P }i −→ {P }i
IN 2 :
−→ {P }i Iα M −→ M
COM 1 :
{l}i
aE
P
{(ν b)aE}i
P AR :
{τ }i,i
{P }i −→ {P }i
M −→ M
{l}j
N −→ N
{τ }i,j
{(ν b)aE}i
M |N −→ M |N
OP EN :
−→
M
{aE}j
−→ N b ∩ f n(N ) = ∅ M |N −→ (νb)(M |N ) Iα M −→ M a∈ / n(Iα) RES : Iα (νa)M −→ (νa)M
M
N
{τ }i,j
M
{(ν c)aE}i
(νb)M
−→
M
{(νb, c)aE}i
−→
M
a = b, b ∈ f n(E) − c
A Logic for Distributed Higher Order π-Calculus
355
form {α}i or {τ }i,j , where α is an input or output action, i and j are locations. From the point of distributed view, {α}i can be regarded as an input or output action performed on location i, and {τ }i,j can be regarded as a communication, where the output happens on location i and the input happens on location j. In the following, we view {α}i as a distributed input or output action, and view {τ }i,j as a distributed communication. α
Remark: In the above table, the labelled transition system of P −→ P was {α}i
given in operational semantics of higher order π-calculus [13]. Transition M −→ M means that distributed process M performs an action on location i, then {τ }i,j
continues as M ; and transition M −→ M means that after a communication between locations i and j, distributed process M continues as M .
3 3.1
Bisimulations for Distributed Higher Order π-Calculus Distributed Bisimulations
In our distributed calculus, processes contain locations, and actions in the scope of locations will be observed on those locations. For such distributed processes, some new distributed bisimulations, distributed context bisimulation and distributed normal bisimulation, will be proposed. The new semantics we give to distributed processes additionally takes the distribution in space into account. For distributed calculus, two distributed processes are equated if not only the actions, but also the locations, where the actions happen, can be matched against each other. So we get the following definition of distributed context bisimulation. In the following, we abbreviate M {E/U } as M E. We first give the weak distributed context bisimulation, weak distributed normal bisimulation and the weak distributed reduction bisimulation. Then we give their congruence property and the equivalence between these three bisimulations. Before giving the weak distributed bisimulations, let us compare communication {τ }i,i with {τ }i,j firstly, where i = j. For weak distributed bisimulations, it seems natural to view {τ }i,i as an invisible communication and {τ }i,j as a visible communication, since for an observer who can distinguish between sites, {τ }i,i is an internal communication on location i, and {τ }i,j represents an external communication between two different locations i and j. Therefore in the following, we regard {τ }i,i as a private event on location i, and {τ }i,j as a visible event between locations i and j. For example, let us consider a system consisting of a satellite and an earth station. It is clear that in this system, the satellite is physically far away from the earth station. If the program controlling the satellite has to be changed, either because of a program error or because the job of the satellite is to be changed, then a new program will be sent to the satellite. The satellite is ready to receive a new program. After reception it acts according to this program until it is “interrupted” either by a new job or because a program error has occurred or because the program has finished.
356
Z. Cao
In this example, the satellite and the earth can be specified in our distributed calculus syntax as follows: def Sat = {!a(U ).U |satsys}S , where S represents location satellite. def
Earth = {anewprg1 .anewprg2 ...}E , where E represents location earth. Now the system is specified as: def Sys = (νa)({!a(U ).U |satsys}S |{anewprg1 .anewprg2 ...}E ). The earth can send a new program to the satellite, then in the satellite, this program interacts with the old satellite system: {τ }S,E
{τ }S,S
Sys −→ (νa)({!a(U ).U |newprg1 |satsys}S |{anewprg2 ...}E ) =⇒ (νa)({!a(U ).U |newsatsys}S |{anewprg2 ...}E ). In this transition sequence, {τ }S,E is a communication between locations satellite and earth, {τ }S,S is an internal action on location satellite. Intuitively, we can consider {τ }S,E as an external communication and it is visible; meanwhile, {τ }S,S can be viewed as an internal action of satellite and it is private. Therefore, from the point of view of distributed calculus, we can neglect internal communication such as {τ }S,S but {τ }S,E is treated as a visible action. Firstly we give the definition of weak distributed context bisimulation. The difference from strong distributed context bisimulation is that in the case of weak bisimulation we neglect {τ }i,i since from the point of distributed view {τ }i,i happens internally on location i. ε
In the following, we use M =⇒ M to abbreviate M Iα ε Iα ε and use M =⇒ M to abbreviate M =⇒−→=⇒ M .
{τ }i1 ,i1
−→
...
{τ }in ,in
−→
M ,
Definition 1. Weak distributed context bisimulation Let M , N ∈ DP rc , we write M ≈dcxt N , if there is a symmetric relation R, such that M R N implies: ε ε (1) whenever M =⇒ M , there exists N such that N =⇒ N and M R N ; {τ }i,j
{τ }i,j
(2) whenever M =⇒ M , there exists N such that N =⇒ N with M R N , where i = j;
{l}i
{l}i
{l}i
{l}i
(3) whenever M =⇒ M , there exists N such that N =⇒ N with M R N ; (4) whenever M =⇒ M , there exists N such that N =⇒ N with M R N ; (5) whenever M R N ;
{aE}i
=⇒ M , there exists N such that N
{(νb)aE}i
{aE}i
=⇒ N and M {(ν c)aF }i
=⇒ M , there exist N , F , c, such that N =⇒ N , (6) whenever M and for any distributed process C(U ) with f n(C(U )) ∩ {b, c} = ∅, (νb)(M | CE) R (ν c)(N |CF ). Definition 2. Weak distributed normal bisimulation Let M , N ∈ DP rc , we write M ≈dnor N , if there is a symmetric relation R, such that M R N implies: ε ε (1) whenever M =⇒ M , there exists N such that N =⇒ N and M R N ;
A Logic for Distributed Higher Order π-Calculus {τ }i,j
357
{τ }i,j
(2) whenever M =⇒ M , there exists N such that N =⇒ N with M R N , where i = j; {l}i
{l}i
{l}i
{l}i
(3) whenever M =⇒ M , there exists N such that N =⇒ N with M R N ; (4) whenever M =⇒ M , there exists N such that N =⇒ N with M R N ; {am.0}i
(5) whenever M =⇒ M , there exists N such that N M R N , where m is a fresh name;
{am.0}i
=⇒
{(νb)aE}i
N and
{(ν c)aF }i
(6) whenever M =⇒ M , there exist N , F , c, such that N =⇒ N , and for a fresh name m and fresh location j, (νb)(M |{!m.E}j ) R (ν c)(N |{!m.F }j ). In the following, we present some new bisimulations, and prove that they are equivalent to distributed context bisimulation. These results are used to give a logical charatersation of ≈dcxt . Now we first give a variant of distributed normal bisimulation, where parallel of finitary copies is used as a limit form of replication. We prove it coincides with distributed normal bisimulation. In the following, we write Πk M to denote the def def parallel composition of k copies of M , i.e., Πk M = M |Πk−1 M and Π0 M = 0. For example, Π3 M represents M |M |M . Definition 3. A symmetric relation R ⊆ DP rc × DP rc is a weak distributed limit normal bisimulation if M R N implies: ε ε (1) whenever M =⇒ M , there exists N such that N =⇒ N and M R N ; {τ }i,j
{τ }i,j
(2) whenever M =⇒ M , there exists N such that N =⇒ N with M R N , where i = j; {l}i
{l}i
{l}i
{l}i
(3) whenever M =⇒ M , there exists N such that N =⇒ N with M R N ; (4) whenever M =⇒ M , there exists N such that N =⇒ N with M R N ; {am.0}i
(5) whenever M =⇒ M , there exists N such that N M R N , where m is a fresh name; {(νb)aE}i
{am.0}i
=⇒
N and
{(ν c)aF }i
=⇒ M , there exist N , F , c such that N =⇒ N (6) whenever M and (νb)(M |{Πk m.E}j ) R (ν c)(N |{Πk m.F }j ) for all k ∈ {0, 1, 2, ...}, where m is a fresh name. We write M ≈dnr N if M and N are weakly distributed limit normal bisimilar. In [8], the logical characterisation of strong early bisimulation was exploited in terms of strong barbed equivalence. The extensional spatial logic [8] was proved to capture strong barbed equivalence firstly, then since strong early bisimulation coincides with strong barbed equivalence, this spatial logic is also a logical characterisation of strong early bisimulation. But the proof techniques in [8] does not work for weak bisimulation. In [8], the problem of the spatial logical equivalence of weak early bisimulation was presented as an open question. To give a logical charatersation of ≈dcxt , we will give a distributed variant of barbed equivalence, where testing contexts are some special processes.
358
Z. Cao
Definition 4. For each name or co-name μ, the observability predicate ↓μ is defined by {l}i
(1) M ↓l if there exists M such that M −→ M ; {l}i
(2) M ↓l if there exists M such that M −→ M ; (3) M ↓a if there exist E, M such that M
{aE}i
(4) M ↓a if there exist b, E, M such that M
−→ M ;
{(νb)aE}i
−→
M .
Definition 5. A symmetric relation R ⊆ DP rc × DP rc is a weak distributed normal barbed bisimulation if M R N implies: (1) M |C R N |C for any C in the form of {0}i , {l.0}i , {l.0}i, {am.0.0}i , {a(U ).Πk m.U }i , where k is any natural number; ε ε (2) whenever M =⇒ M there exists N such that N =⇒ N and M R N ; {τ }i,j
{τ }i,j
(3) whenever M =⇒ M , there exists N such that N =⇒ N with M R N , where i = j; ε (4) if M ⇓μ , then also N ⇓μ , where M ⇓μ means ∃M , M =⇒ M ↓μ . We write M ≈dnb N if M and N are weakly distributed normal barbed bisimilar. It is worth to note that all testing contexts in the above definition are finite processes. Hence it is possible to give the characteristic formulas for these testing contexts. 3.2
Equivalence of Distributed Bisimulations
Now we study the equivalence between ≈dcxt , ≈dnor and ≈dnr . Lemma 1. For any M, N ∈ DP rc , M ≈dcxt N ⇒ M ≈dnr N. P roof : It is trivial by the definition of ≈dcxt and ≈dnr . Lemma 2. For any M, N ∈ DP rc , M ≈dnr N ⇒ M ≈dnor N. Proposition 1. For any M, N ∈ DP rc , M ≈dcxt N ⇔ M ≈dnr N ⇔ M ≈dnor N. P roof : By the equivalence between ≈dcxt and ≈dnor [3], M ≈dnor N ⇔ M ≈dcxt N. Furthermore, by Lemmas 1 and 2, we have that the proposition holds. The following proposition states that ≈dnr coincides with ≈dnb . Proposition 2. For any M, N ∈ DP rc , M ≈dnr N ⇔ M ≈dnb N. Proposition 3. For any M, N ∈ DP rc , M ≈dcxt N ⇔ M ≈dnr N ⇔ M ≈dnor N ⇔ M ≈dnb N. P roof : By Propositions 1 and 2. In fact, a symmetric relation R satisfying conditions (1), (2) and (3) of Definition 5 is also equivalent to ≈dcxt .
A Logic for Distributed Higher Order π-Calculus
4
359
A Logic for Distributed Higher Order π-Calculus
In this section, we present a logic to reason about distributed higher order πcalculus. Roughly speaking, this logic extends propositional logic with four connectives: (1) formula ⇓ is an atom formula, which means that the process can communicate through some channel; (2) formula ⇓μ is an atom formula, which means that the process can communicate through channel μ; (3) formula A means that the process can perform several {τ }i,j actions, where i = j, and transform to a process that satisfies A; (4) formula i, jA means that process can perform {τ }i,j and transform to a process that satisfies A; (5) if M satisfies A1 A2 , then the parallel composition of processes M and N satisfies formula A2 if N satisfies formula A1 . To prove that the equivalence induced by this logic coincides with distributed context bisimulation, we present a variant of distributed normal bisimulation, ≈dnr , where replication is replaced by a series of parallel composition. Then we prove the equivalence between ≈dnr and ≈dnor . Furthermore, we present a simplified version of distributed contextual barbed bisimulation, ≈dnb . In the definition of distributed contextual barbed bisimulation, two equivalent processes should be tested by any context. So it is not convenient to verify the equivalence between two processes. In the definition of ≈dnb , to equate two processes we only need to test them by some special contexts. We prove that ≈dnb coincides with ≈dnr . Moreover, since ≈dcxt , ≈dnor and ≈dnr are equivalent, to give a logical characterisation of ≈dcxt , it is enough to prove that this logic captures ≈dnb . Since the special contexts appearing in the definition of ≈dnb are finite, we can give characterisation formulas for all these processes. Thus that the equivalence induced by this logic coincides with ≈dnb can be proved. 4.1
Syntax and Semantics of Logic W L
Now we introduce a spatial logic called W L. Definition 6. Syntax of W L A :=⇓ | ⇓x | ⇓x | ¬A | ∧i∈I Ai | A | i, jA | A1 A2 Definition 7. Semantics of W L ε M |=⇓ iff ∃M . M =⇒ M , and M ↓x or M ↓x for some x; ε M |=⇓x iff ∃M . M =⇒ M ↓x ; ε M |=⇓x iff ∃M . M =⇒ M ↓x ; M |= ¬A iff M |= A; M |= ∧i∈I Ai iff M |= Ai for any i ∈ I; ε M |= A iff ∃M . M =⇒ M and M |= A; {τ }i,j
M |= i, jA iff ∃M . M =⇒ M and M |= A; M |= A1 A2 iff ∀N . N |= A1 implies M |N |= A2 . Definition 8. M and N are logically equivalent with respect to W L, written M =W L N, iff for any formula A, M |= A iff N |= A.
360
4.2
Z. Cao
W L is a Logical Characterisation of ≈dcxt
In this section we prove the equivalence between ≈dnb and =W L . Since ≈dnb coincides with ≈dcxt , we have the equivalence between ≈dcxt and =W L . Proposition 4. For any M, N ∈ DP rc , M ≈dnb N implies M =W L N. P roof : We prove by structural induction on A that whenever M ≈dnb N and M |= A, we have N |= A. The case corresponding to the adjunct operator follows from congruence properties of ≈dnb with respect to parallel composition. To prove the converse proposition, we need the following definition, which gives the characteristic formulas for the testing contexts in the definition of ≈dnb . Definition 9 def def (1) W F{0}i = ⇓, where ⇓ = ¬ ⇓; def
def
(2) W F{ch(x)}i = ⇓x ∧[j, i]⊥ ∧ ([j, i]⊥ j, iW F{0}i ), where A B = def
¬(A ¬B), [i, j]A = ¬i, j¬A. def
(3) W F{ch(x)}i = ⇓x ∧[i, j]⊥ ∧ ([i, j]⊥ i, jW F{0}i ). def
(4) W F{l.0}i = W F{ch(l)}i ; def
(5) W F{l.0}i = W F{ch(l)}i ; def
(6) W F{a0.0}i = W F{ch(a)}i ∧ (W F{ch(a)}j [i, j]( ⇓a → W F{0}i )); def
(7) W F{am.0.0}i = W F{ch(a)}i ∧ (W F{ch(a)}j [i, j]( ⇓a → W F{m.0}i )); def
(8) W F{Π0 m.0}i = W F{0}i ; def
(9) W F{Πk m.0}i = W F{m.0}j j, iW F{Πk−1 m.0}i ; def
(10) W F{Π0 m.n.0}i = W F{0}i ; def
(11) W F{Πk m.n.0}i = ⇓n ∧(W F{m.0}j j, i)k W F{Πk n.0}i ∧ W F{m.0}j j, i def
(W F{n.0}h i, hW F{Πk−1 m.n.0}i ), where k = 0 (B i, j)i C = B i, j(B def
i, j)i−1 C, (B i, j)1 C = B i, jC; def
(12) W F{a(U).Πk m.U}i = ⇓m ∧W F{a0.0}j j, iW F{Πk m.0}i ∧W F{an.0.0}j j, iW F{Πk m.n.0}i . Intuitively, formula W FM captures the class of processes that are bisimilar to process M . The following lemma states this formally: Lemma 3. The above formulas have the following interpretation: (1) M |= W F{0}i iff M ≈dcxt {0}i ; (2) M |= W F{ch(a)}i where a is a higher order name iff M ≈dcxt {a(U ).C(U )}i , and there exist b, E, N, such that (νb)({CE}i |N ) ≈dcxt {0}i ; (3) M |= W F{ch(a)}i where a is a higher order name iff M ≈dcxt (νb)({aE.Q}i ), and there exists N (U ), such that (νb)(N E|{Q}i ) ≈dcxt {0}i ; (4) M |= W F{l.0}i iff M ≈dcxt {l.0}i ; (5) M |= W F{l.0}i iff M ≈dcxt {l.0}i ;
A Logic for Distributed Higher Order π-Calculus
361
(6) M |= W F{a0.0}i iff M ≈dcxt {a0.0}i ; (7) M |= W F{am.0.0}i iff M ≈dcxt {am.0.0}i ; (8) M |= W F{Πk m.0}i iff M ≈dcxt {Πk m.0}i , where Πk M denotes the parallel composition of k copies of M ; (9) M |= W F{Πk m.n.0}i iff M ≈dcxt {Πk m.n.0}i , where m = n; (10) M |= W F{a(U).Πk m.U}i iff M ≈dcxt {a(U ).Πk m.U }i . Lemma 4. D |= W FC ⇔ C ≈dcxt D, where C is in the form of {0}i , {l.0}i , {l.0}i , {am.0.0}i , or {a(U ).Πk m.U }i , where k is an arbitrary natural number. P roof : It is clear by Lemma 3. Proposition 5. For any M, N ∈ DP rc , M =W L N implies M ≈dnb N. The following proposition states that W L captures ≈dcxt . Proposition 6. For any M, N ∈ DP rc , M ≈dcxt N ⇔ M =W L N. P roof : By Proposition 3, 4 and 5. 4.3
A Variant of W L
We have proved that ≈dcxt coincides with =W L . In the following we consider a variant of W L, called W SL, in which we remove ⇓ and ⇓μ , and add ⇓in , ⇓out and . We show that ⇓ and ⇓μ can be defined by using ⇓in , ⇓out and , then it is clear that W SL is a spatial logical characterisation of ≈dcxt . Definition 10. Syntax and semantics of W SL Formulas of W SL are defined by the following grammar: A ::=⇓in | ⇓out | ¬A | ∧i∈I Ai | A | i, jA | A1 A2 | A n Semantics of W SL is similar to W L except that the definition of ⇓μ is eliminated and the definitions of ⇓in , ⇓out and are added as follows: ε M |=⇓in iff ∃M . M =⇒ M ↓μ where μ is in the form of l or a; ε M |=⇓out iff ∃M . M =⇒ M ↓μ where μ is in the form of l or a; M |= A n iff (νn)M |= A. , where n = {n1 , n2 , ..., nk }. We abbreviate A n1 n2 ... nk as A n Definition 11. M and N are logically equivalent with respect to W SL, written M =W SL N, iff for any formula A, M |= A iff N |= A. Lemma 5. M |=⇓⇔ M |=⇓in ∨ ⇓out P roof : It is trivial. Lemma 6. (1) M |=⇓μ ⇔ M |=⇓in ∧(W F0 m μ) ∧ ¬(W F0 m), where μ is in the form of l or a, m = f n(M ) − {μ}; (2) M |=⇓μ ⇔ M |=⇓out ∧(W F0 m μ) ∧ ¬(W F0 m), where μ is in the = f n(M ) − {μ}. form of l or a, μ = l if μ = l, μ = a if μ = a, and m Now we can give the equivalence between ≈dcxt and =W SL . Proposition 7. For any M, N ∈ DP rc , M ≈dcxt N ⇔ M =W SL N.
362
Z. Cao
P roof : Similar to Proposition 4, we prove by structural induction on A that whenever M ≈dcxt N and M |= A, we have N |= A. Thus M ≈dcxt N ⇒ M =W SL N. By Lemmas 5, 6 and Proposition 6, we have M =W SL N ⇒ M ≈dcxt N . Hence the proposition holds.
5
Conclusions
In this paper, we have defined a logic W L, which comprises some spatial operators. We have shown that the induced logical equivalence coincides with weak distributed context bisimulation for distributed higher order π-calculus. As far as we know, this is the first spatial logical characterisation of distributed context bisimulation. In [8], Hirschkoff studied a contextual spatial logic for the π-calculus, which lacks the spatial operators to observe emptiness, parallel composition and restriction, and only has composition adjunct and hiding. The induced logical equivalence was proved to coincide with strong early bisimulation. The proof involves the definition of non-trivial formulas, including characteristic formulas for restriction-free processes up to bisimulation and characteristic formulas for barbed bisimulation. In [8], Hirschkoff pointed out that his techniques do not apply directly if we consider weak early bisimulation, and studying spatial logical equivalence of weak early bisimulation was viewed as a challenging question. In this paper, to prove that =W L coincides with ≈dcxt , we present two simplified notions of observable equivalence on distributed higher order processes named ≈dnr and ≈dnb . Then ≈dnr and ≈dnb was proved to coincide with ≈dcxt . Moreover, since the testing contexts in the definition of ≈dnb are finite processes, we can give the characteristic formulas for such testing contexts. Thus the equivalence between =W L and ≈dcxt is proved.
References 1. Amadio, R.M., Dam, M.: Reasoning about Higher-order Processes. In: Mosses, P.D., Schwartzbach, M.I., Nielsen, M. (eds.) CAAP 1995, FASE 1995, and TAPSOFT 1995. LNCS, vol. 915, pp. 202–216. Springer, Heidelberg (1995) 2. Baldamus, M., Dingel, J.: Modal Characterization of Weak Bisimulation for Higherorder Processes. In: Bidoit, M., Dauchet, M. (eds.) CAAP 1997, FASE 1997, and TAPSOFT 1997. LNCS, vol. 1214, pp. 285–296. Springer, Heidelberg (1997) 3. Cao, Z.: Bisimulations for a Distributed Higher Order π -Calculus. In: Jones, C.B., Liu, Z., Woodcock, J. (eds.) ICTAC 2007. LNCS, vol. 4711, pp. 94–108. Springer, Heidelberg (2007) 4. Cao, Z.: More on bisimulations for higher-order π -calculus. In: Aceto, L., Ing´ olfsd´ ottir, A. (eds.) FOSSACS 2006 and ETAPS 2006. LNCS, vol. 3921, pp. 63–78. Springer, Heidelberg (2006) 5. Castellani, I.: Process Algebras with Localities, ch. 15. In: Bergstra, J., Ponse, A., Smolka, S. (eds.) Handbook of Process Algebra, pp. 945–1045. North-Holland, Amsterdam (2001)
A Logic for Distributed Higher Order π-Calculus
363
6. Caires, L., Cardelli, L.: A Spatial Logic for Concurrency (Part II). Theoretical Computer Science 322(3), 517–565 (2004) 7. Caires, L., Cardelli, L.: A Spatial Logic for Concurrency (Part I). Information and Computation 186(2), 194–235 (2003) 8. Hirschkoff, D.: An Extensional Spatial Logic for Mobile Processes. In: Gardner, P., Yoshida, N. (eds.) CONCUR 2004. LNCS, vol. 3170, pp. 325–339. Springer, Heidelberg (2004) 9. Jeffrey, A., Rathke, J.: Contextual equivalence for higher-order π-calculus revisited. In: Proceedings of Mathematical Foundations of Programming Semantics, Elsevier, Amsterdam (2003) 10. Milner, R., Parrow, J., Walker, D.: Modal logics for mobile processes. Theoretical Computer Science 114(1), 149–171 (1993) 11. Sangiorgi, D.: Extensionality and Intensionality of the Ambient Logic. In: Proc. of the 28th POPL, pp. 4–17. ACM Press, New York (2001) 12. Sangiorgi, D.: Bisimulation in higher-order calculi. Information and Computation 131(2) (1996) 13. Sangiorgi, D.: Expressing mobility in process algebras: first-order and higher-order paradigms. Ph.D thesis, University of Einburgh (1992)
Minimum Maximal Matching Is NP-Hard in Regular Bipartite Graphs M. Demange1 and T. Ekim2, 1
2
ESSEC Business School, Avenue Bernard HIRSH, BP 105, 95021 Cergy Pontoise cedex France [email protected] Bo˘ gazi¸ci University, Department of Industrial Engineering, 34342, Bebek-Istanbul, Turkey [email protected]
Abstract. Yannakakis and Gavril showed in [10] that the problem of finding a maximal matching of minimum size (MMM for short), also called Minimum Edge Dominating Set, is NP-hard in bipartite graphs of maximum degree 3 or planar graphs of maximum degree 3. Horton and Kilakos extended this result to planar bipartite graphs and planar cubic graphs [6]. Here, we extend the result of Yannakakis and Gavril in [10] by showing that MMM is NP-hard in the class of k-regular bipartite graphs for all k ≥ 3 fixed. Keywords: stable marriage, unmatched pairs, Minimum Maximal Matching, regular bipartite graphs.
1
Introduction
Given a graph, a matching is a set of edges which are pairwise non-adjacent. A matching M is said to be maximal if no other edge can be added to M while keeping the property of being a matching. We say that an edge in a matching M dominates all edges adjacent to it. Also, vertices contained in edges of M are said to be saturated by M . In a given graph, a matching saturating all vertices is called perfect. Note that in a maximal matching M of a graph G = (V, E), every edge in E is necessarily dominated. The problem of finding a maximal matching of minimum size is called Minimum Maximal Matching (MMM) or Minimum (Independent) Edge Dominating Set (see [5] for the equivalence of these problems as well as all other graph theoretical definitions not given here). MMM, NP-hard in general, is extensively studied due to its wide range of applications. See [10] for the description of an application related to a telephone switching network built to route phone calls from incoming lines to outgoing trunks. The worst case behavior of such a network (minimum number of calls routed when the system is saturated) can be evaluated by a minimum maximal matching.
The research of Tınaz Ekim is funded by the B.U. Research Fund, Grant 08A301, whose support is greatly acknowledged.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 364–374, 2008. c Springer-Verlag Berlin Heidelberg 2008
Minimum Maximal Matching Is NP-Hard in Regular Bipartite Graphs
365
A second application appears in the so-called stable marriage problem which involves a set of institutions and applicants. Suppose that institutions, respectively applicants, have preference lists on each other set. The objective is to find a matching between institutions and applicants which is stable; that is, a matching where there is no pair of institution-applicant which are not matched but which would be both better off if matched to each other. The problem to solve is to find a stable matching in a bipartite graph where one part of the bipartition represents institutions and the other part applicants, and where the preferences of each institution and each applicant are given. In [4], Gale and Shapley give an algorithm to determine a stable matching between medical school graduate students and hospitals. Their algorithm works in a greedy fashion; in a successive way, institutions tentatively admit applicants and admitted applicants tentatively accept the admissions. It is easy to see that a stable matching is necessarily a maximal matching since otherwise there would be at least one more pair of institution-applicant where both prefer to be matched to each other compared to the current situation. However, the well-known Gale-Shapley algorithm finding a stable matching in a bipartite graph may leave unmatched some institutions and applicants; this is clearly not a desired situation. One can easily see that the number of unmatched institutions and applicants is bounded above by the number of the remaining vertices after the removal of a minimum maximal matching (since a stable matching is necessarily maximal). In some problems, the number of applicants can be very large; it is the case for instance during the university admission examination in Turkey where each year over two million students apply. Here, the national examination office places students to universities by giving priority to the preferences of students with best scores (see [2] for further information). Departing from the idea that the examination score can not be the only criterion in the student placement procedure, after the first placement results, it may be convenient to establish shortlists (of the same lenght) based on the initial preferences of students and universities (which are for now represented by examination scores). These shortlists can be obtained by constructing a stable many-to-many matching where each student and each university has the same number of choices; this is represented by a regular bipartite graph (see [1] for the description of such an algorithm). Once we have the shortlists, students and universities can have a second round to express their new preferences which are this time based on interviews and a better knowledge of each other; criteria other then exam scores for institutions or university reputations for students are now taken into account. A stable matching obtained from this regular bipartite graph with new preferences will be the ultimate placement result. So this motivates the problem of studying the tractability of minimum maximal matching restricted to regular bipartite graphs. In [10], Yannakakis and Gavril show that MMM is NP-hard in several classes of graphs including bipartite (or planar) graphs with maximum degree 3. In [6], Horton and Kilakos obtained some extensions of these results including the NP-hardness of MMM in planar bipartite graphs and planar cubic graphs.
366
M. Demange and T. Ekim
In the present paper, we give another strengthening of the result of Yannakakis and Gavril in [10] by showing that MMM is NP-hard in k-regular bipartite graphs for any fixed k ≥ 3; this shows the NP-hardness of finding the best upper bound on the number of unmatched pairs given by a stable matching based on shortlists. Note that in the case k = 2, MMM is polynomially solvable since 2-regular graphs consist of disconnected cycles and therefore any maximal matching is in fact a maximum matching. Our result implies in particular that MMM is NP-hard already in cubic bipartite graphs which are strictly contained in the class of bipartite graphs with maximum degree 3. Besides, one may observe that the generalization of this result in cubic bipartite graphs to k-regular bipartite graphs for k > 3 is far from being straightforward. Note also that the class of k-regular bipartite graphs for k ≥ 3 has no inclusion relationship with the class of planar bipartite graphs while having a non-empty intersection (the Heawood graph, also called (3,6)-cage graph, is a 3-regular bipartite graph which is nonplanar [5]). These results contribute to narrowing down the gap between P and NP-complete with respect to MMM. Other polynomial time solvable cases for MMM can be found in [8] for trees, in [7] for block graphs and in [9] for bipartite permutation graphs and cotrianglated graphs. Throughout the paper, we consider only connected bipartite graphs; all results remain valid for not necessarily connected bipartite graphs simply by repeating the described procedures for each connected component. The paper is organized in the following way. In Section 2, we show that if MMM is NP-hard in k-regular bipartite graphs then it is also NP-hard in (k +1)regular bipartite graphs. In Section 3, the NP-hardness of MMM in 3-regular bipartite graphs is established; to this purpose, we first reduce a restricted 3SAT problem to MMM in bipartite graphs with vertices of degree 2 or 3 (inspired from [10]), and then the later problem to MMM in 3-regular bipartite graphs by using the gadget Rk for k = 2 already described in Section 2. Finally, we put together the results of Sections 2 and 3 to conclude that MMM is NP-hard in k-regular bipartite graphs for all fixed k ≥ 3.
2
MMM in k-Regular Bipartite Graphs ∝ MMM in (k + 1)-Regular Bipartite Graphs
Let Bk = (Vk , Ek ) be a k-regular bipartite graph. We shall explain a polynomial time reduction from MMM in k-regular bipartite graphs to MMM in (k + 1)-regular bipartite graphs. It has two steps; in the first step, called degree increasing step, we increase by 1 the degree of all vertices in Bk by branching to each of them a Degree Increasing gadget DIk which itself contains vertices of degree k and k + 1. This transformation is followed by the addition of a Regk , contains ularity gadget Rk to each DIk . The obtained graph, denoted by B vertices of degree k and k + 1. The second step, called regularity step, consists of k (k + 1)-regular by first taking k + 1 copies of B k and then branching making B Regularity gadgets Rk ’s to the vertices of degree k in such a way that, in the
Minimum Maximal Matching Is NP-Hard in Regular Bipartite Graphs
367
final graph Bk+1 , all vertices have degree k + 1. To conclude, we show that if one can solve MMM in Bk+1 , then one can also solve MMM in Bk . 2.1
Degree Increasing Step
Let us first discuss about some properties of the gadget DIk depicted in Figure 1. DIk is bipartite as can be observed in the bipartition shown in Figure 1 by black (referred to as sign −) and white (referred to as sign +) vertices. DIk has one vertex in Level 1, k − 1 vertices in Level 2, k vertices in Level 3 and two vertices in Level 4; let ij denote the j’th vertex (from left) in Level i in a DIk . The edges of DIk are the following: there are all possible edges between Levels 1 and 2 and between Levels 3 and 4; every vertex in Level 2 has exactly k − 1 edges towards the vertices of Level 3 which are distributed as equally as possible (hence each vertex of Level 3 receives exactly k − 2 edges from Level 2 except one vertex which receives k − 1). We suppose without loss of generality that this is the vertex 31 in Level 3 which receives k − 1 edges from Level 2. The dashed edge e0 is the connection edge to the original graph Bk whereas the other k + 1 dashed edges e1 , . . . , ek+1 will be used to connect DIk to a Regularity gadget Rk .
Fig. 1. DIk : Degree Increasing gadget of order k
Suppose we have branched a gadget DIk to each vertex of the k-regular graph Bk using the edge e0 . Now, we branch a Regularity Gadget Rk (see Figure 2) to each DIk using the k + 1 (dashed) connection edges e1 , . . . , ek+1 . The graph k = (Vk , E k ) and depicted in Figure 3. obtained in this manner is denoted by B The gadget Rk has k + 1 vertices in each one of Level 1, Level 2 and Level 3, and k vertices in Level 4; the edges of Rk are such that a k-regular bipartite graph is induced by the vertices of Level 1 and 2, a 1-regular bipartite graph (a perfect matching) is induced by the vertices of Level 2 and 3, and all possible edges exist between Levels 3 and 4.
368
M. Demange and T. Ekim
Fig. 2. Rk : Regularity gadget of order k
k = (Vk , E k ): The graph obtained at the end of the first step Fig. 3. B
It is important to note that the connection edges e0 , e1 , . . . , ek+1 are not included in DIk , nor in Rk . Let nk = |Vk | be the number of vertices in Bk . The following property is a k : direct consequence of the construction of B k is bipartite; moreover all of its vertices are of degree k +1, except Property 1. B k vertices in each one of the nk gadgets DIk which are of the same sign and of degree k. k follows from the fact that both DIk and Rk are biparProof. A bipartition of B tite; moreover all end-vertices of the connection edges e1 . . . , ek+1 in DIk , respectively in Rk , have the same sign in the bipartition of DIk , respectively of Rk . k is a direct consequence of its The property on the degrees of vertices in B definition; the connection edges e0 ’s ensure that the degree of all vertices of
Minimum Maximal Matching Is NP-Hard in Regular Bipartite Graphs
369
Bk is k + 1; all vertices in Rk ’s have degree k + 1 since the connection edges e1 , . . . , ek+1 are present; all vertices in DIk ’s are of degree k + 1 except the vertices 11 , 32 , 33 , . . . , 3k in each copy of DIk which are of degree k. Other properties follow from the above definitions. Property 2. There is a unique minimum maximal matching (of size k + 1) in Rk ; it leaves non-dominated all connection edges. Proof. First, notice that at least k + 1 edges are needed to dominate all edges in Rk since there is an induced perfect matching of size k + 1 between Levels 2 and 3 (shown by bold edges in Figure 2). In addition, this matching dominates all edges in Rk , hence it is a minimum maximal matching in Rk . Besides, suppose that there is another maximal matching M in Rk which does not contain p > 0 edges of the matching between Levels 2 and 3. This requires that min(p, k) other edges between Levels 3 and 4, and at least p/2 other edges between Levels 1 and 2 have to be in M (since there are p × k edges to be dominated between Levels 1 and 2, and one edge can dominate at most 2k − 1 edges (plus itself)). It follows that this matching has at least k + 1 − p + min(p, k) + p/2 > k + 1 edges and consequently there is no other minimum maximal matching in Rk . Let Hk be the graph which consists of a DIk = (VD , ED ) connected to a Rk = (VR , ER ) with the connection edges e1 , . . . , ek+1 and where all edges incident to e0 (that is all edges between Levels 1 and 2 in DIk ) are removed. In other words, ˜k induced by the vertex set (VD \ {11 }) ∪ VR . The following Hk is a subgraph of B property shows that, even if an edge e0 connecting a couple of gadgets DIk and Rk (which are already connected) to Bk is included in a maximal matching of ˜k , we still need at least 2(k + 1) additional edges to dominate all edges of these B gadgets DIk and Rk . Property 3. There are at least 2(k + 1) edges in any maximal matching MH of Hk . R D R D ∪ MH ∪ MH where MH = MH ∩ ER and MH = Proof. We set MH = MH D MH ∩ ED and hence MH ⊆ {e1 , . . . , ek+1 }. Firstly, we note that |MH | ≥ k. In fact, the k vertices of DIk in Level 3 are of degree at least k and an edge incident to one of them can dominate at most one edge which is incident to D another vertex of Level 3. As a consequence, if |MH | = k − 1 then these k − 1 edges can dominate at most all edges incident to k − 1 vertices of Level 3 and k − 1 edges incident to the remaining vertex of Level 3; hence at least one edge remains non-dominated. R ; suppose p ≥ 0 of them are adjacent to Let us now consider the edges in MH R one of the connection edges e1 , . . . , ek+1 . If p = 0, then |MH | = k + 1 since there is an induced matching of size k + 1 (between Levels 2 and 3) in Rk . In this case, R none of the connection edges e1 , . . . , ek+1 is dominated by an edge in MH and D moreover, they form an induced matching of size k + 1; thus |MH ∪ MH | ≥ k + 1 R contains at least k and therefore |MH | ≥ 2(k + 1). If p ≥ 1, note that MH additional edges to dominate all edges between Levels 3 and 4. Indeed, either R k edges saturating all vertices of Level 4 are in MH ; or at least one vertex of
370
M. Demange and T. Ekim
R Level 4 is not saturated by an edge in MH and consequently at least k + 1 R R edges are in MH . In both cases we have |MH | ≥ k + p. Now, if p ≥ 2 then R D |MH | ≥ |MH | + |MH | ≥ 2(k + 1). Finally, if p = 1, we consider two cases R contains k edges dominating all vertices of Level 4, mentioned above. If MH then there is at least one edge which is not yet dominated between Levels 1 and R 2; thus, at least one more edge (in MH or MH ) is necessary in addition to k + 1 R edges in MH , this gives |MH | ≥ 2(k + 1). If there is at least one vertex of Level R 4 which is not saturated, then |MH | ≥ k + 1 + 1 and therefore |MH | ≥ 2(k + 1), which concludes the proof.
The following result summarizes the above discussion on the degree increasing step. Lemma 1. There is a maximal matching M0 of size at most m0 in Bk if and only if there is a maximal matching M1 of size at most m1 = m0 + 2nk (k + 1) k where nk = |Vk | and all vertices in copies of DIk are saturated. in B Proof. (=⇒) The matching M1 contains all (at most) m0 edges in M0 which dominate all edges in Bk . In addition, the matchings of size (k + 1) in each DIk (there are nk of them) as depicted in Figure 1 by bold edges, and the matchings of size (k + 1) in each Rk (there are nk of them) as depicted in Figure 2 by k . Note that edges e0 , e1 , . . . , ek+1 bold edges dominate together all edges in B are dominated since the matchings in DIk ’s saturate their end-vertices in DIk ’s. This matching of size m1 = m0 + 2nk (k + 1) is clearly maximal. (⇐=) Suppose we have a maximal matching M1 of size at most m1 = m0 + k . Let Ve0 be the set of vertices in Bk which are incident to 2nk (k + 1) in B an edge of type e0 ∈ M1 . By Property 3, we know that |M1 | ≥ 2nk (k + 1) + |Ve0 | + |M1 ∩ Ek |; hence m0 ≥ |Ve0 | + |M1 ∩ Ek |. Besides, the edges of Bk not dominated by M1 ∩ Ek are incident to a vertex in Ve0 and the partial subgraph of Bk formed by these edges is a bipartite graph with Ve0 as a vertex cover (of its edges). Consequently, any maximal matching M1 of this subgraph has at most |Ve0 | edges. Therefore M1 ∪ (M1 ∩ Ek ) is a maximal matching of Bk of size at most m0 . 2.2
Regularity Step
k obtained at the end of the Property 1 states that all vertices in the graph B first step has degree k + 1 except k vertices of the same sign in each one of k a the nk copies of DIk . Now, we describe a procedure which allows to make B k , (k + 1)-regular bipartite graph. To this purpose, we first take k + 1 copies of B k . Then, for each vertex x ∈ Vk of degree k, we consider the denoted by (k + 1)B k ; clearly, they are all of degree k and k + 1 copies x1 , . . . , xk+1 of x in (k + 1)B there is a bipartition of (k + 1)Bk where all of x1 , . . . , xk+1 have the same sign. We branch a regularity gadget Rk to vertices x1 , . . . , xk+1 for every vertex x of k ; this new graph is denoted by Bk+1 . We have the following. degree k in B
Minimum Maximal Matching Is NP-Hard in Regular Bipartite Graphs
371
Theorem 1. There is a maximal matching M0 of size at most m0 in Bk if and only if there is a maximal matching M2 of size at most m2 = (k + 1)m1 + k(k + 1)nk in Bk+1 where m1 = m0 + 2nk (k + 1) and nk = |Vk |. k , there are k vertices of degree k in each one Proof. (=⇒) By Property 1, in B of the nk gadgets DIk . This results in knk copies of the regularity gadget Rk in Bk+1 . By Property 2, we know that we can dominate all edges in these knk copies of Rk by k(k + 1)nk edges. Moreover, we know by Lemma 1 that one can find a minimum maximal matching M1 of size m1 = m0 + 2nk (k + 1) in k where all vertices in different copies of DIk are saturated. Therefore, by B k ) in M2 , we obtain a matching of size taking k + 1 copies of M1 (one by each B m2 = (k + 1)m1 + k(k + 1)nk which also dominates all connection edges between DIk ’s and newly added Rk ’s. (⇐=) There are knk independent copies of Rk ’s in Bk+1 which are added during the regularity step; by Property 2, there are at least k(k + 1)nk edges of M2 which dominate all edges in these Rk ’s (but not necessarily their connection edges e1 , . . . , ek+1 ). Consequently, the remaining at most (k+1)(m0 +2nk (k+1)) k and their edges in M2 dominate all edges in the k + 1 independent copies of B connection edges to the newly added knk copies of Rk . It follows from Lemma 1 that there is a maximal matching M0 of size at most m0 in Bk .
3
MMM in 3-Regular Bipartite Graphs Is NP-Hard
We will make use of the following problem in the sequel. Restricted 3-SAT : A set of clauses C1 , . . . , Cp with variables x1 , . . . xn where there are at most three literals (negative or positive variables) per clause, such that every variable occurs twice and its negation once. Lemma 2. The decision version of MMM in bipartite graphs with vertices of degree 2 or 3 is NP-complete. Proof. The reduction is a modified version of the reduction of Restricted 3-SAT to the decision version of MMM in bipartite graphs with vertices of maximum degree 3 [10]. We construct a graph B = (V, E) as follows: for each variable xi , there is a gadget Gi1 as depicted in Figure 4 where the edges (dj1 , dj1 ) and (dj2 , dj2 ) represent the two clauses Cj1 , Cj2 in which xi appears in positive form, and (dj3 , dj3 ) the clause Cj3 in which xi appears in negative form. Edges j2 j3 (aj1 i , dj1 ), (bi , dj2 ) and (ci , dj3 ) are the connection edges between subgraphs representing variables and clauses. In this way, all variables and clauses are represented; nevertheless, the vertices dj (respectively dj ) corresponding to clauses Cj which consist of 3-positive (respectively 3-negative) variables will have degree four. We avoid such a situation by representing this kind of clauses by the gadget xr , x ¯k , x ¯h )), Gj2 of Figure 4. For a clause Cj = (xr , xk , xh ) (respectively Cj = (¯ j j j j j the connection edges l1 , l2 , l3 are incident to ar or br , ak or bk , and ah or bjh (respectively cjr , cjk and cjh ); a connection edge is incident to a vertex of type aji
372
M. Demange and T. Ekim
Fig. 4. Gadgets G1 and G2
if the first occurrence of variable xi is in the clause Cj , and to a vertex of type bji if the second occurrence of variable xi is in the clause Cj . Now, we notice that the constructed graph B is bipartite with vertices of degree 2 or 3. Suppose a satisfying assignment of Restricted 3-SAT is given; let p3 be the number of clauses with 3-positive or 3-negative variables, then a maximal matching M of size 9n + 5p3 in B is obtained as follows: j2 j3 i M = {(aj1 i , dj1 ), (bi , dj2 ), (ci , ci )|xi = 1} ∪ {6 bold edges in G1 |xi = 1} j2 j3 i ∪ {(ai , aj1 i ), (bi , bi ), (ci , dj3 )|xi = 0} ∪ {6 bold edges in G1 |xi = 0}
∪
{5 bold edges in Gj2 |Cj has 3-positive or 3-negative variables}.
Conversely, let M be a maximal matching of size 9n + 5p3 in B, a truth assignment for Restricted 3-SAT can be obtained in polynomial time. First, notice that M must contain at least 5 edges per gadget Gj2 even though all connection edges l1 , l2 , l3 are in M (since there are three disjoint paths of length 4 between vertices u and v). In particular, if at least one connection edge is in M , then 5 additional edges suffice to dominate all edges of Gj2 (In Figure 4, the bold edges of Gj2 show these 5 edges in case l1 is in M ). On the other hand, if none of the connection edges is in M , then at least 6 edges from Gj2 should be in M . Secondly, one can observe that in Gi1 , we need at least 6 edges to dominate all edges containing non-labeled vertices (such a set of 6 edges in G1 is shown in bold in Figure 4). Moreover, there is no such a set of 6 edges which also j3 j3 j2 j2 j1 dominates at least one edge from {(aj1 i , a i ), (ci , c i ), (bi , b i )}. It follows j1 j3 j2 that vertices ai , ci and bi must be saturated for all variables xi in every maximal matching of B, so in particular in M . This implies that there are at least three more edges per variable in M . Since |M | = 9n + 5p3 and at least 6n edges are necessarily used to dominate all edges containing non-labeled vertices as well as at least 5p3 edges to dominate all edges in Gj2 ’s, there are j3 exactly three additional edges per variable which should saturate aj1 i , ci and
Minimum Maximal Matching Is NP-Hard in Regular Bipartite Graphs
373
bj2 i ; therefore neither (ai , ci ) nor (ci , bi ) is in M for any xi . Let us now have a closer look at these three remaining edges per variable in M . If (ci , cj3 / M, i ) ∈ j2 j3 j3 ), (b , b ), (c , d ) ∈ M . If (c , c ) ∈ M , then we can then necessarily (ai , aj1 i i i i j3 i i j1 j2 suppose without loss of generality that (ai , dj1 ), (bi , dj2 ) ∈ M because if not, M can be transformed in polynomial time into such a maximal matching since / M. |M | = 9n + 5p3 implies that for every j we have (dj , dj ) ∈ We define a truth assignment by setting xi = 0 in the first case and xi = 1 / M , there is at least one edge in M which is in the second case. As ∀j, (dj , dj ) ∈ incident to dj or dj and which determine(s) the variable(s) satisfying the clause j3 j2 j. One can notice that if for some i, we have (ai , aj1 i ), (ci , ci ), (bi , bi ) ∈ M then variable xi can be set equally to 0 or 1; in fact, this simply means that there is a truth assignment of the Restricted 3-SAT instance where variable xi does not satisfy any clause. Theorem 2. The decision version of MMM in 3-regular bipartite graphs is NPcomplete. Proof. The reduction is from MMM in bipartite graphs with degree 2 or 3. Let B be a bipartite graph with vertices of degree 2 or 3. We take 3 copies of B and for each vertex x of degree 2 in B, we branch a Regularity gadget (see Figure 2) of order k = 2 (R2 ) to its three copies x1 , x2 and x3 . Let us denote this new graph by B ; it is easy to see that B is a 3-regular bipartite graph. Now, it follows from the discussion in the proof of Theorem 1 that there is a maximal matching M of size m in B if and only if there is a maximal matching M of size m = 3m + 6|V 2 | in B where V 2 is the set of vertices of degree 2 in B.
4
Final Remarks
The following is a corollary of Theorems 1 and 2: Corollary 1. MMM is NP-hard in k-regular bipartite graphs for all fixed k ≥ 3. A simple computation shows that our reduction can be done in polynomial time if k is fixed. In fact, if the number of vertices in a k-regular bipartite graph is nk , then our reduction (in Theorem 1) gives a (k + 1)-regular bipartite graph with nk (10k 2 + 14k + 5) vertices. It follows that the graph obtained (by successive applications of Theorem 1 starting from parameter k = 3 until k = k) to show the NP-hardness of MMM in k-regular bipartite graphs will have a number of vertices which is a multiplier of k!. Our result narrowed down the gap between P and NP for MMM in bipartite graphs. This theoretical progress opens the doors of several research directions. Several approximation algorithms with performance guarantee for MMM are known in the literature (MMM is 2-approximable even if the edges are weighted [3]); one could hope to derive algorithms with better performance guarantee in the case of regular bipartite graphs. The sharpness of such approximation ratios in special cases can be discussed knowing that this allows to bound the number of unmatched pairs in a stable matching.
374
M. Demange and T. Ekim
References 1. Alkan, A.: Current Trends in Economics: Theory and Applications, pp. 30–39. Springer, Heidelberg (1999) 2. Balinski, M., S¨ onmez, T.: A Tale of Two Mechanisms: Student placement. Journal of Economic Theory 84, 73–94 (1999) 3. Fujito, T., Nagamochi, H.: A 2-approximation algorithm for the minimum weight edge dominating set problem. Discrete Applied Mathematics 118, 199–207 (2002) 4. Gale, D., Shapley, I.S.: College admissions and the stability of marriage. Amer. Math. Montly 69, 9–14 (1962) 5. Harary, F.: Graph Theory. Addison-Wesley, Reading (1994) 6. Horton, J.D., Kilakos, K.: Minimum edge dominating sets. SIAM J. Disc. Math. 6(3), 375–387 (1993) 7. Hwang, S.F., Chang, G.J.: The edge domination problem. Discuss. Math. Graph. Theory 15(1), 51–57 (1995) 8. Mitchell, S.L., Hedetniemi, S.T.: Edge domination in trees. In: Proc. of the 8th Southeastern Conference on Combinatorics, Graph Theory and Computing (Louisiana State Univ., Baton Rouge, La., 1977), pp. 489–509 (1977) 9. Srinivasan, A., Madhukar, K., Nagavamsi, P., Pandu Rangan, C., Chang, M.-S.: Edge domination on bipartite permutation graphs and cotriangulated graphs. Inform. Process. Lett. 56(3), 165–171 (1995) 10. Yannakakis, M., Gavril, F.: Edge dominating sets in graphs. SIAM J. Appl. Math. 38, 364–372 (1980)
A Topological Study of Tilings Gr´egory Lafitte1 and Michael Weiss2, 1
2
Laboratoire d’Informatique Fondamentale de Marseille (LIF), CNRS – Aix-Marseille Universit´e, 39, rue Joliot-Curie, F-13453 Marseille Cedex 13, France Centre Universitaire d’Informatique, Universit´e de Gen`eve, Battelle bˆ atiment A, 7 route de Drize, 1227 Carouge, Switzerland
Abstract. To tile consists in assembling colored tiles on Z2 while respecting color matching. Tilings, the outcome of the tile process, can be seen as a computation model. In order to better understand the global structure of tilings, we introduce two topologies on tilings, one ` a la Cantor and another one ` a la Besicovitch. Our topologies are concerned with the whole set of tilings that can be generated by any tile set and are thus independent of a particular tile set. We study the properties of these two spaces and compare them. Finally, we introduce two infinite games on these spaces that are promising tools for the study of the structure of tilings.
1
Introduction
Wang was the first to introduce in [Wan61] the study of tilings with colored tiles. A tile is a unit size square with colored edges. Two tiles can be assembled if their common edge has the same color. A finite set of tiles is called a tile set. To tile consists in assembling tiles from a tile set on the grid Z2 . One of the first famous problems on tilings was the domino problem: can one decide whether given a tile set, there exists a tiling of the plane generated by this tile set? Berger proved the undecidability of the domino problem by constructing an aperiodic set of tiles, i.e., a tile set that can generate only non-periodic tilings [Ber66]. Simplified proofs can be found in [Rob71] and later [AD96]. The main argument of this proof was to simulate the behavior of a given Turing machine with a tile set, in the sense that the Turing machine M stops on an instance ω if and only if the tile set τM,ω does not tile the plane. Hanf and later Myers [Mye74, Han74] have strengthened this and constructed a tile set that has only non-recursive tilings. Later, tilings have been studied for different purposes: some researchers have used tilings as a tool for studying mathematical logical problems [AD96], others have studied the different kinds of tilings that one tile set can produce [CK97, DLS01, Rob71], or defined tools to quantify the regular structure of a tiling [Dur99]. One of the most striking facts concerning tilings is that tilings constitute a Turing equivalent computation model. This computation model is particularly
This author has been supported by the FNS grant 200020-105515.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 375–387, 2008. c Springer-Verlag Berlin Heidelberg 2008
376
G. Lafitte and M. Weiss
relevant as a model of computation on the plane. Notions of reductions that have led to notions of universality for tilings and completeness for tile sets have been introduced in [LW07]. It is difficult to quantify this completeness property and other ones as periodicity, quasiperiodicity: one would like to be able to measure how common such a property is, in order to determine when and how they occur, or to give a size to the different sets of tilings with a certain property, or to say if a tile set is more likely to produce tilings with a certain property. One of our ultimate goals is to be able to say that if a set of tilings (generated from a given tile set or a family of tile sets) is large enough, then it necessarily contains a tiling with such and such properties. Naturally, topological tools on tilings would be the first step in this direction; first step that we aim at developing in this paper. We introduce two metrics on tilings which have the particularity to be independent of a particular tile set, i.e., we can measure a distance between two tilings that are not necessarily generated by the same tile set. These two metrics are similar to two traditional metrics used in the study of cellular automata: the so-called Cantor and Besicovitch metrics. The former gives more importance to the local structure around (0, 0) and the later measures the asymptotic difference of information contained in the two tilings. They give rise to two natural topologies on the set of tilings. The topological study of subsets of reals is inherently linked to the study of infinite games. In some of these games, such as Banach-Mazur games, a strong connection exists between the existence of a winning strategy and the co-meager property of the set on which the game is played. This connection allows one to show that some sets are meager. Others games, such as Gale-Stewart games, yield a hierarchy of winning strategies for one of the two players - we say in this case that the game is determined - depending on the structure of the sets on which the games are played. Having such a game-topological study of tilings, instead of subsets of reals, can lead to a better understanding of the structure of the tilings generated by a tile set. This paper is organized as follows: we first recall basic notions on tilings, define the two topologies that we use and prove basic properties of these topologies. Then we study in a deeper way the structure of our topological spaces. We conclude by introducing two types of games on tilings, which help us to prove in a simpler way results of the previous section, and open a new direction for the study of the structure of tilings.
2 2.1
Topologies on Tilings Tilings
In this paper we use the following terminologies: a tile set S is an initial subset {1, . . . , n} of N. To map consists in placing the numbers of S on the grid Z2 . A mapping generated by S is called a S-mapping. It is associated to a mapping 2 function fA ∈ S Z that gives the tile of S at position (x, y) in A. We call M the 2 set of all mappings, i.e., M ≡ {{1, . . . , n}Z }n≥1 .
A Topological Study of Tilings
377
An S-pattern without constraints, or just S-pattern, is an S-mapping defined on a finite subset of Z2 . A Wang tile is an oriented unit size square with colored edges from C, where C is a finite set of colors. A Wang tile set, or just tile set, is a finite set of different tiles. To tile consists in placing the tiles of a given tile set on the grid Z2 such that two adjacent tiles share the same color on their common edge. A tiling P generated by a tile set τ is called a τ -tiling. It is associated to a tiling function fP where fP (x, y) gives the tile at position (x, y) in P . An S-mapping A ∈ M is a Wang tiling if there exist a Wang tile set τ , a τ -tiling P and a bijective function h : S → τ such that h ◦ fA (x, y) = fP (x, y). By this, we mean that A works as P . We define T as the subset of mappings of M which are Wang tilings. Different kinds of tile sets and tilings have been identified: a periodic tiling is a tiling such that there exist two integers a and b such that for any (x, y), the tiles at position (x, y) and (x+a, y+b) are the same; a tile set is periodic if it generates a periodic tiling; a tiling is finite if it is a pattern; a tiling P is universal if for any tile set τ , P simulates at least one τ -tiling. For more precisions on simulation and universality we refer the reader to [LW07]. Different tools are used to quantify the regular structure of a tiling. One of these is the quasiperiodic function. For a tiling P , the quasiperiodic function of P , denoted gP , is the function that given an integer n, gives the smallest integer s which has the following property: if m is a square pattern of P of size n (the length of the sides of the square), then m appears in any square of size s in Z2 . Thus, the quasiperiodic function of a tiling quantifies the regularity of appearance of the patterns in the tiling. Some tilings do not have a quasiperiodic function defined for every n. We say that a tiling is quasiperiodic if it has a quasiperiodic function defined for every n. An important result in [Dur99] is that any tile set, that can tile the plane, can generate a quasiperiodic tiling of the plane. 2.2
A Besicovitch Topology
The first metric we introduce is a metric similar to the cellular automata metric a la Besicovitch. A {n × n} τ -pattern m can be seen has a sequence of numbers ` placed in a rectangle of size n × n ; we have the number k in position (x, y) if fm (x, y) = tk where tk is the k th tile of τ . Then intuitively any reordering of the tiles of τ gives the same pattern. Thus, we would like to say that the distance between two patterns m and m is c if the proportion of different tiles between m and m is at most c up to a reordering of the tiles. We formalize this notion in the following definitions, by defining a metric for any pattern without constraints: Definition 1. Let S and S be two initial subsets of N such that |S| ≤ |S |. Let A be a {n × n} S-pattern, B be a {n × n} S -pattern, and g be a one-toone function from S to S . We define the metric related to g by: δg (A, B) = #{ (x,y) | g◦fA (x,y)=fB (x,y) } . n2
378
G. Lafitte and M. Weiss
If A is an S-pattern and B is an S -pattern such that |S| ≥ |S |, then δg (A, B) is defined to be equal to δg (B, A). We define the absolute metric by: δ(A, B) = ming∈S S {δg (A, B)}. From this definition of distance between patterns, we have that δ is symmetric and satisfies the triangle inequality. Now that we have defined a metric between patterns, we can generalize it to tilings of the whole plane, since a tiling can be seen as the infinite union of patterns of ever increasing sizes: Definition 2. Let A be an S-tiling and let B be an S -tiling. For any function g ∈ S S , we define the tiling metric dg by: dg (A, B) = lim supi→∞ δg (Ai , Bi ), where the Ai (resp. Bi ) are the {n × n} S-patterns (resp. S -patterns) centered around (0, 0) in A (resp. B). We define the absolute tiling metric d by: d(A, B) = ming∈S S {dg (A, B)}. Since δ satisfies the triangle inequality, and since reflexivity and symmetry are obvious, then d is a pseudometric on M. The natural way to obtain a metric on M is to introduce an equivalence relation ≡d defined by: A ≡d B ⇔ d(A, B) = 0. One can see that ≡d is an equivalence relation. We can now consider the quotient space M/ ≡d where a typical element is the equivalence class [A] = { B | d(A, B) = 0 }. By adding to this space the metric d we obtain a metric space that we call MB . In this paper, [A] denotes the equivalence class of the particular mapping A; an element of MB is designated by a capital letter in bold, e.g., A; and we say ”let A ∈ MB ” in the sense that we consider a mapping of the equivalence class of A, i.e., a mapping in [A]. Similarly, we define the space T/≡d where a typical element is [P ] = { Q | d(P, Q) = 0 }. By adding to this space the metric d we obtain the metric space TB . Of course, we have TB ⊂ MB . An element of TB is an equivalence class of MB that can contain mappings that are not Wang tilings, but which work ”almost” like Wang tilings since the local constraint is respected almost everywhere. From this definition, we have the two following results: for any two mappings A, B ∈ M, the distance between A and B is in [0, 1[ and for any mapping C ∈ M (resp. any tiling P ∈ T) and any ∈ [0, 1[, there exists a tiling D (resp. Q) such that d(C, D) ≥ 1 − (resp. d(P, Q) ≥ 1 − ). Therefore, for any tiling A, we can build a tiling B such that the distance between A and B is almost one. To obtain a topological space, since MB is a metric space, we use the natural topology induced by the metric where the open sets are the balls B(A, r), where A is a mapping. We use the subset topology on TB . The Besicovitch metric is one of the traditional metrics used for tilings. The other one is the Cantor one which gives more importance to the finite patterns centered around the origin. 2.3
A Cantor Metric
We define and study another traditional metric adapted for tilings, a metric ` a la Cantor. The metric studied above, is a metric that allows one to understand
A Topological Study of Tilings
379
the behavior of a tiling in the whole Z2 grid. The distance between two tilings is small if their behavior is close. Another way to measure distance between tilings is to consider the greatest common pattern centered around (0, 0) that they share. We first define the function p as the function p : N → Z2 such that p(0) = (0, 0), p(1) = (0, 1), p(2) = (1, 1), p(3) = (1, 0) . . . and p keeps having the behavior of a spiral afterward. This function allows us to enumerate the tiles of a given tiling. We define the prefix-patterns of a tiling P : Definition 3. Let P be a tiling. The prefix-pattern of size n of P is the pattern m defined on the finite subset D = {p(0), . . . , p(n)} Z2 with the pattern function fm (x, y) = fP (x, y) if (x, y) ∈ D. We denote the prefix-pattern m by a finite sequence of tiles ordered by the function p: m = {fm ◦ p(0), fm ◦ p(1), . . . , fm ◦ p(n − 1)}. If m is a prefix-pattern of size n, then m.t1 , the concatenation of m and the tile t1 , is the prefix-pattern of size n + 1 such that m.t1 = {fm ◦ p(0), fm ◦ p(1), . . . , fm ◦ p(n − 1), t1 }. The set of prefix-patterns can be enumerated by a tree T . The rules of the construction of T are the following: at level 0 in T , we have the empty prefixpattern; at level 1 we have an unique prefix-pattern {1}. Then a pattern m at the level i, composed of j different tiles, has j + 1 sons: m.1, m.2, . . . , m.(j + 1). One can see that for any tile set S and any prefix-pattern m = {m1 , . . . , mn } generated by S, there exists an unique bijective function em : S → {1, . . . , |S|} def such that the prefix-pattern em (m) = {em (m1 ), em (m2 ), . . . , em (mn )} is an element of T . em (m) is said to be the canonical form of m. In T , an infinite branch corresponds to a mapping of the plane. Thus, to any S-mapping A there exists a def unique bijective function eA such that the set eA (A) = {eA ◦ fA ◦ p(0), eA ◦ fA ◦ p(1), . . .} corresponds to an infinite branch of T . eA (A) is said to be the canonical form of A. We say that m is a prefix-pattern of A if em (m) is a prefix-pattern of eA (A). We can now define a metric ` a la Cantor: Definition 4. Let A be an S-mapping and B be an S -mapping. We define the Cantor metric δC as: δC (A, B) = 2−i where i is the size of the greatest common prefix-pattern of eA (A) and eB (B), i.e., the highest level in T where eA (A) and eB (B) are equal. If eA (A) = eB (B) then δC (A, B) = 0. We can see that δC is a pseudometric on M. In fact, dC is a hypermetric, i.e., a metric such that for any three mappings A, B, C, dC (A, C) ≤ max{dC (A, B), dC (B, C)}. This is a stronger version of the inequality of the triangle. One can note that in a hypermetric space, any point of an open ball is center of this ball. To obtain a metric on M, we say that two tilings P and Q are equivalent, P ≡C Q, if their Cantor distance is null. Thus, two tilings are equivalent if they represent the same tiling up to a color permutation. We denote MC the space of equivalence classes M/≡C equipped with the metric δC . Similarly, we define T/≡C that we denote TC . The metric δC is a metric on TC . From this, we can define a topology on TC : we say that the set Um = { P | m is a prefix-pattern of P } is a clopen set for any prefix-pattern m. This
380
G. Lafitte and M. Weiss
topology gives rise to a different understanding of the topology of tilings than the topological space MB since it gives more importance to the local structure centered around {0, 0}. Since there is a finite set of pattern of a given size, then we can cover MB and TB with a finite set of open sets. Therefore, MB and TB are precompact, i.e., for all r, there does not exist a finite set of open balls of radius r that covers these spaces. Since we have a Hausdorff space, because MC is a metric space, and since we have a basis of clopen sets, then MC is a 0 − dimensional space. We finish the definition of the Cantor space by stating some obvious facts about the Cantor metric: if A and B are two mappings, then dC (A, B) ∈ [0, 1/2], and for any mapping A, there exists a mapping B such that dC (A, B) = 1/2. 2.4
Basic Properties
The space MC is well-defined since two mappings at distance 0 are in fact the same tiling up to a reordering of the tiles, or, to say it differently, have the same canonical form. The space MB is slightly different, since two tilings at distance 0 can be different. We have to redefine properties for the mappings of MB : if a tiling class contains a tiling with a certain property, then all the class has this property, since in fact, all other tilings of the class have ”almost” the property. Thus, we can define the following tiling classes: if P ∈ TB is a tiling class, we say that P is: periodic if P contains a periodic tiling, quasiperiodic if P contains a quasiperiodic tiling, finite if P contains a finite tiling (i.e., a pattern), universal if P contains a universal tiling and a tiling with a quasiperiodic function f if P contains a tiling with a quasiperiodic function f and if P does not contain a tiling with a quasiperiodic function g < f . We obtain now a space that works almost like the space of all Wang tilings. We can see how the different classes work. The following proposition states that the distance between two periodic tilings of TB is a rational number and gives a characterization of the classes of periodic tilings. Proposition 1. If P and Q are two different periodic tilings, then there exist n, m ∈ N∗ such that d(P, Q) = n/m. Therefore, if P ∈ MB is periodic, then it contains one and only one periodic tiling up to a reordering of the tiles. The following proposition shows that two quasiperiodic tilings which belong to the same equivalence class have the same quasiperiodic function: Proposition 2. If P and Q are two quasiperiodic tilings such that P ≡d Q, then gP = gQ , where gP and gQ are the quasiperiodic functions of P and Q. We recall a basic notion of tilings: extraction. Consider an infinite set of τ patterns {m1 , m2 , . . .} of ever increasing sizes. We can see them as an infinite tree with the root representing the empty pattern and where a pattern m is a direct son of a pattern n if n is a subpattern of m, and if there does not exist a pattern m = m such that n is a subpattern of m and m is a subpattern of m. Thus, we obtain an infinite tree with finite degree. Therefore, by Koenig’s
A Topological Study of Tilings
381
lemma, we have at least one infinite branch. In our tree, this branch represents a tiling Q of the plane. This tiling Q is said to be an extraction of the set {m1 , m2 , . . .}. Now, we say that Q is extracted from P if there exists an infinite set {m1 , m2 , . . .} of patterns of P such that Q is an extraction of {m1 , m2 , . . .}. Notions of universality and completeness for tilings have been introduced in [LW07]. We relate them to our topological space: Proposition 3. Let P ∈ MB be a universal (resp. quasiperiodic, periodic) tiling. Then for any tiling A ∈ P, we can extract from A a universal (resp. quasiperiodic, periodic) tiling A . The previous result shows again that belonging to an equivalence class with a certain property is almost like having this property. Now we study the different distances that we can obtain between Wang tilings and mappings in our two Besicovitch spaces. We have the following properties: Theorem 1. i) There exist a mapping A ∈ MB \ TB and > 0 such that the ball B(A, ) does not contain any Wang tilings; ii) For any n, there exists an infinite subset H of TB such that for any two tilings P and Q of H, d(P, Q) ≥ 1 − 1/n2 . As a corollary, we have that the spaces MB and TB are not precompact. We can remark that that there exist prefix-patterns that can not be represented by the local constraint of a Wang tile set. From this, we obtain the following proposition: Proposition 4. There exists an open set in MC that does not contain any Wang tiling.
3 3.1
Properties of Our Topologies Properties of the Metric Spaces
We first study some basic notions of our spaces to have a better understanding of how they work. First of all, since we have metrizable spaces, we have that our spaces are completely Hausdorff and that there are no isolated points neither in MB nor in TB . This seems natural for MB , and is more interesting in the case of TB . Proposition 5. MB , TB , MC and TC are all perfectly normal Hausdorff and perfect. The set of tilings is uncountable. But there exist tile sets that generate an uncountable set of tilings but only a countable set of equivalence classes in MB . From this, and with the fact that any equivalence class contains an uncountable set of mappings, there arises the question of the cardinality of MB . The following proposition shows that even the equivalence classes of TB are uncountable.
382
G. Lafitte and M. Weiss
Proposition 6. TB has the cardinality of the continuum. The next theorem is important for the understanding of Wang tilings. The metric used is strongly related to the information contained in the tiling. Thus, two tilings are close if the information contained in them is similar. So, theorem 2 can be stated as follows: a countable set of Wang tilings can not approach all the information that Wang tilings can generate. Then, by generalizing this theorem, we obtain a nice corollary. Theorem 2. There does not exist a countable set of Wang tilings which is dense in TB . Corollary 1. i) For any countable set of Wang tilings H, there exist a tiling P and a natural number n such that d(P, Q) ≥ 1/n for any Q ∈ H; ii) There exist a Wang tiling P and an integer n such that for any tile set τ , there exists at least one τ -tiling Q such that d(P, Q) ≥ 1/n. The next proposition shows how different the two topological spaces TB and TC are. This comes from the fact that one takes a glimpse at the whole tiling since the other one just look at it with blinkers. Proposition 7. There exists a countable set of Wang tilings that is dense in τC . From this, we have that MB is separable, and since it is completely metrizable, we have that MB is a Polish space. The next theorem shows that for any two tilings A, B ∈ TB , there exists a continuous path c : [0, 1] → TB such that c(0) = A and c(1) = B: Theorem 3. TB is a path-connected space. 3.2
Topological Properties
We now study the topological structure of our spaces. Since they are metric spaces, then natural topologies are induced on them by the metrics. We define the following sets: i) Mapping(S) = { [A] | A is an S-mapping }, ii) Wang(S) = { [P ] | P is a Wang S-tiling }. And for any tile set τ , we define the set: Wang(τ ) = { [P ] | P is a τ -tiling }. The following theorem shows that the set of mappings or tilings generated by a tile set is a closed set. Then we give a characterization of the set of Wang tilings that can produce a tile set: Theorem 4. Let S and τ be two tile sets. Then Wang(τ ) and Mapping(S) are closed sets and Wang(τ ) is either a closed discrete set, or a closed non-discrete nowhere-dense set. Corollary 2. i) TB is meager in MB ; ii) MB \ TB is dense in MB .
A Topological Study of Tilings
383
We now show that our spaces are Baire spaces: Theorem 5. TB , TC , MB and MC are complete metric spaces, and thus, are Baire spaces. In the following section, we introduce games on our topological spaces as tools for the study of the structure of tilings.
4
Games on Tilings
Since the tilings computation model is equivalent to the Turing machines, tilings make possible a geometrical point of view of computability. The different topological tools studied in this paper have shown some interesting aspects of the behavior of computability in tiling spaces. As we have seen, the set of tilings generated by a tile set τ , gives rise to complex subsets of MB and MC . These sets can even be uncountable. A natural next step for studying these sets is to consider infinite games on tilings. Several kinds of infinite games exist and are used in many different fields. Games have been studied for computation models such as pushdown automata (see Serre’s PhD thesis [Ser05] for detailed survey). Considering the tilings computation model, we now give definitions for two types of infinite games on tilings. The first one, Banach-Mazur games [Oxt57], is a play on the topological structure of the space. Different definitions of Banach-Mazur games exist. We propose this one: Definition 5. Let X be a topological space and Y a family of subsets of X such that: i) any member of Y has nonempty interior; ii) any nonempty open subset of X contains a member of Y . Let C be a subset of X. The game proceeds as follows: Player I chooses a subset Y1 of Y . Player II chooses a subset Y2 of Y such that Y2 ⊆ Y1 . Then Player I chooses a subset Y3 of Y such that Y3 ⊆ Y2 and so on. At the end of the infinite game, we obtain a decreasing sequence of sets: X ⊇ Y1 ⊇ Y2 ⊇ . . . such that Player I has chosen the sets with odd indexes and Player II has chosen the sets with even indexes. Player II wins the game if n≥1 Yi ⊆ X. The study of the different subsets of X such that Player II has a winning strategy is the main application of Banach-Mazur games. Of course, if C = X Player II has a winning strategy. The question is: how big C has to be to allow Player II to have a winning strategy? This gives rise to classical theorem concerning BanachMazur games on topological spaces which states: a subset C of X is meager if and only if Player II has a winning strategy for the game on {X, X \ C}. We propose a Banach-Mazur game on the space MC : Definition 6. Let X be a subset of MC and C ⊆ X. The game {X, C} is defined as follows: Player I chooses a prefix-pattern m1 such that m1 is a prefix-pattern
384
G. Lafitte and M. Weiss
of a mapping of X. Player II chooses a prefix-pattern m2 such that m1 ⊂ m2 and such that m2 is a prefix-pattern of a mapping of X and so on. At the end of the game, we obtain a sequence of prefix-patterns m1 ⊂ m2 ⊂ m3 . . . from which we extract a unique mapping A. Player II wins the game if A ∈ C. This amounts to playing the classical Banach-Mazur game with X = MC and Y ⊆ { Um | Um is an open set in MC }. In our Cantor space, choosing a prefixpattern amounts to choosing an open set, since the open sets of MC can be defined from prefix-patterns. Using this tool, we show that the set of Wang tilings is meager in the set of mappings for our topology MC : Theorem 6. TC is meager in MC . Proof. We use the game {MC , MC \ TC } . We now have to show that Player II has a winning strategy, i.e., Player II can always chooses integers in such a way that the final mapping can not be a Wang tiling. This is true since whatever plays Player I at the first round, then Player II can choose a prefix-pattern which does not respect any possible local constraint generated by Wang tilings. This trivial proof shows the convenience of using games to prove that some subsets are meager. To obtain the same kind of results for MB we define a Banach-Mazur game more adapted to the topology of MB : Definition 7. Let X be a subset of MB and C ⊆ X. A Banach-Mazur game on {X, C} is defined as follows: Player I chooses a mapping A1 of X and an integer n1 ; Player II chooses a mapping A2 of X such that d(A1 , A2 ) ≤ 1/n1 and an integer n2 ≥ n1 ; Player I chooses a mapping A3 of X such that d(A3 , A2 ) ≤ 1/n2 and an integer n3 ≥ n2 , and so on. Player II wins the game if limi→∞ Ai ∈ C. This game is still equivalent to a classical Banach-Mazur game, and since MB is a topological space, we still have that Player II has a winning strategy if and only if C is co-meager. We now prove the same result for MB : Theorem 7. TB is meager in MB . Proof. We will show that Player II has a winning strategy in the game {MB , MB \ TB }. To show this, we first prove the following result: for any tiling P and any open ball B(P, 1/n), there exist a mapping A and an integer m > n such that B(A, 1/m) ⊂ B(P, 1/n) and B(A, 1/m) ∩ TB = ∅. Let P be a tiling and n an integer. The idea is to insert error patterns in P . We can build a pattern of size six generated by two tiles such that it can not be represented by a Wang pattern since its construction would imply that the two tiles that compose it are equal. Thus, at least one tile of this pattern can not be represented by Wang tiles. We introduce it in P in such a way that the new mapping A that we obtain is at distance 1/2n of P . Because of the error patterns, A can not be a Wang tiling. In the error pattern we have at least one of the six tiles that can not be represented by a Wang tile. Therefore if Q
A Topological Study of Tilings
385
is a Wang tiling, then d(Q, A) ≥ 1/12n. Thus, B(A, 1/12n) ⊂ B(P, 1/n) and B(A, 1/12n) ∩ TB = ∅. With this result, we can see that the strategy of Player II will be to choose the tiling A and the integer 12n1 to be sure to win. Thus, TB is meager. We introduce another type of games for the study of the complexity of the set of tilings generated by a given tile set: games ` a la Gale-Stewart. Here is a general definition of these games: Definition 8. Let A be a nonempty set and X ⊆ AN . We associate with X the following game: Player I chooses an element a1 of A, Player II chooses an element a2 of A and so on. Player I wins if {an }n∈N ∈ X. We denote G(A, X) this game. A traditional question about a game G(A, X) is to know whether one of the two players has a winning strategy, or in the terminology of games, if the game is determined. In [Mar75], Martin has shown that any Borel set is determined. Thus, we have to go beyond the Borelian hierarchy to find subsets complicated enough not to be determined. We would like to use these games on tilings to obtain similar structural complexity results for the set of tilings generated by a tile set. In that direction we give the following definition: Definition 9. Let H ⊆ MB and X ⊆ H. The Gale-Stewart game G(H, X) is defined as follows: Player I chooses a tile a1 such that {a1 } is a prefix-pattern of a tiling of H; Player II chooses a tile a2 such that {a1 , a2 } is a prefix-pattern of a tiling of H and so on. Player I wins if the tiling {a1 , a2 , . . .} ∈ X. If one of the two players has a winning strategy we say that G(H, X) is determined or that X is determined in H. We say that τ is determined if Wang(τ ) is determined in Tτ , where Tτ is the set of all τ -tilings and τ -patterns, and that τ is completely undetermined if for any subset X ∈ Wang(τ ), X is undetermined in Wang(τ ). The question is to know which kinds of games on tilings are determined, and which ones are not. We give a glimpse in that direction by showing that there exist tile sets determined and other ones completely undetermined: Theorem 8. i) There exists a determined tile set; ii) There exists a completely undetermined tile set. Proof. i) To show this, we just have to find a tile set simple enough to generate a determined game. The tile set Easy5composed of a unicolor blue tile and four tile with three sides blue and one red satisfies the theorem. Player I has a winning strategy in the game G(TEasy5 , Wang(Easy5)): in this game, the goal of player I is to obtain a tiling of the plane, and the goal of player II, while respecting the local constraint of Easy5, is to obtain a situation where Player I can not move anymore. Player I can force the play of Player II by playing always one of the tile with a colored edge to force Player II to play the symmetric of this tile. Thus, the game is determined.
386
G. Lafitte and M. Weiss
ii) Since Hanf and Myers [Mye74, Han74], we know that there exist tile sets that generate only non-recursive tilings. Let τ be one of them; we consider the game G(Wang(τ ), X) where X is a subset of Wang(τ ). If this game is determined, then there exists a winning strategy for one of the two players. Without loss of generality, suppose Player I has a winning strategy. Therefore, whatever Player II plays, Player I has a recursive process that allows him to choose a tile to go in a winning position: this strategy can generate a τ -tiling in a recursive way. This is a contradiction.
5
Concluding Remarks
The topologies and games introduced in this paper have made possible some descriptions of the structure of Wang tilings. This is a first step in the direction of measuring the largeness or meagerness of sets of tilings. One of the many remaining questions is to be able to measure how common universality is. To reach these goals, the topological study of tilings through games appears as a promising approach.
References [AD96] [Ber66] [CD04] [CK97] [DLS01] [Dur99] [Dur02] [GS53] [Han74] [LW07] [Mar75] [Mye74] [Oxt57] [Rob71]
Allauzen, C., Durand, B.: The Classical Decision Problem. In: Appendix A: “Tiling problems”, pp. 407–420. Springer, Heidelberg (1996) Berger, R.: The undecidability of the domino problem. Memoirs of the American Mathematical Society 66, 1–72 (1966) Cervelle, J., Durand, B.: Tilings: recursivity and regularity. Theoretical Computer Science 310(1-3), 469–477 (2004) Culik II, K., Kari, J.: On aperiodic sets of Wang tiles. In: Foundations of Computer Science: Potential - Theory - Cognition, pp. 153–162 (1997) Durand, B., Levin, L.A., Shen, A.: Complex tilings. In: Proceedings of the Symposium on Theory of Computing, pp. 732–739 (2001) Durand, B.: Tilings and quasiperiodicity. Theoretical Computer Science 221(1-2), 61–75 (1999) Durand, B.: De la logique aux pavages. Theoretical Computer Science 281(12), 311–324 (2002) Gale, F., Stewart, F.M.: Infinite games with perfect information. Ann. Math. Studies 28, 245–266 (1953) Hanf, W.P.: Non-recursive tilings of the plane. I. Journal of Symbolic Logic 39(2), 283–285 (1974) Lafitte, G., Weiss, M.: Universal Tilings. In: Thomas, W., Weil, P. (eds.) STACS 2007. LNCS, vol. 4393, pp. 367–380. Springer, Heidelberg (2007) Martin, D.A.: 1975. Annals of Math. 102, 363–371 (1975) Myers, D.: Non-recursive tilings of the plane. II. Journal of Symbolic Logic 39(2), 286–294 (1974) Oxtoby, J.C.: Contribution to the theory of games, Vol. III. Ann. of Math. Studies 39, 159–163 (1957) Robinson, R.: Undecidability and nonperiodicity for tilings of the plane. Inventiones Mathematicae 12, 177–209 (1971)
A Topological Study of Tilings [Ser05]
387
Serre, O.: Contribution ´ a l’´etude des jeux sur des graphes de processus ´ a pile, PhD thesis, Universit´e Paris VII (2005) [Wan61] Wang, H.: Proving theorems by pattern recognition II. Bell System Technical Journal 40, 1–41 (1961) [Wan62] Wang, H.: Dominoes and the ∀∃∀-case of the decision problem. In: Proceedings of the Symposium on Mathematical Theory of Automata, pp. 23–55 (1962)
A Denotational Semantics for Total Correctness of Sequential Exact Real Programs Thomas Anberr´ee Division of Computer Science, University of Nottingham in N´ıngb¯ o, P.R. China [email protected] http://www.cs.bham.ac.uk/~txa
Abstract. We provide a domain-based denotational semantics for a sequential language for exact real number computation, equipped with a non-deterministic test operator. The semantics is only an approximate one, because the denotation of a program for a real number may not be precise enough to tell which real number the program computes. However, for many first-order common functions f : Rn → R, there exists a program for f whose denotation is precise enough to show that the program indeed computes the function f . In practice such programs possessing a faithful denotation are not difficult to find.
1
Introduction
We provide a denotational model for a functional programming language for exact real number computation. A well known difficulty in real number computation is that the tests x = y and x ≤ y are undecidable and hence cannot be used to control the execution flow of programs. One solution, proposed by Boehm and Cartwright, is to use a non-deterministic test [2]. For any two rational numbers p < q and any real number x, at least one of the relations p < x or x < q can be determined to hold; thus, operators rtestp,q are used, whose evaluation never diverges when x is a real number: 1. rtestp,q (x) evaluates to true or to false, 2. rtestp,q (x) may evaluate to true iff x < q and 3. rtestp,q (x) may evaluate to false iff p < x. Since a program can in general produce different results in different runs, Escard´ o and Marcial-Romero took the view in previous work that programs of realnumber type denote sets of real numbers, and the question arose as to which power domains would be suitable for modelling the behaviour of rtest [7,8]. It was shown that, among the known power domains, only the Hoare power domains are suitable. However, a semantics based on Hoare power domains only accounts for partial correctness of programs (If a program computes a real number then it is the one given by its denotation). We argue that, although Smyth power domains cannot faithfully model the rtestp,q operators, they can nevertheless provide an approximation which is sufficiently precise to define many common first-order functions, and has the further M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 388–399, 2008. c Springer-Verlag Berlin Heidelberg 2008
A Denotational Semantics for Total Correctness
389
advantage over other choices that total correctness of programs is expressed in the model (If the denotation of a program is a real number, then the program computes this real number). Writing M and [M ] for the denotational and operational meaning of a program M , respectively, we show that, in general, one has M [M ], but not necessarily M = [M ]. However, if M represents a real number, then M is maximal, and hence M = [M ]. To illustrate the usefulness of the approximate semantics in practice, it can be shown that many programs of interest have a total denotation, which allows to establish their total correctness without resorting to operational methods. Some background in domain theory is assumed [1,6]. In this paper, a domain is a continuous, directed-complete poset and all domains are seen as topological spaces, with the Scott topology. Continuity of functions between domains is to be understood relatively to the Scott topologies.
2
The Language
The language we consider is an extension of the functional language PCF [9,10] with a type for real numbers and related basic constructors. Types are given by σ = real | nat | bool | σ → σ.
(1)
Types real, nat and bool are called ground types. The typing judgements and reductions rules are recapitulated in Table 1 and Table 2 respectively. The expressions |M | = ⊥, |L| < 1 and 0 < |L|, to the right of certain reduction rules, are decidable, side conditions indicating when these rules are applicable (cf. Definition 2 below). 2.1
Approximating Real Numbers with Intervals
The constructors bounda , p+ and p× are used to approximate real numbers with intervals and operate on these approximations. The index a ranges over RQ = {[s, t] | s ≤ t and s, t ∈ Q} and p over the rational numbers. The set RQ ∪ {(−∞, +∞)} forms a basis of the interval domain [5,3]. Definition 1. The interval domain R is the set of all non-empty, closed and bounded real intervals ordered by reverse inclusion, along with a least element ⊥R = (−∞, +∞). We write x y for x ⊇ y. The interval domain is a continuous Scott domain in which the supremum of a directed set is its intersection. The real numbers are embedded in R via the continuous map r → {r} = [r, r].
390
T. Anberr´ee
Table 1. Typing judgements
Γ, x : σ x : σ
Γ M: σ →τ Γ N: σ Γ M (N ) : τ
Γ, x : σ M : τ Γ (λx : σ.M ) : σ → τ
Γ M: σ →σ Γ Yσ (M ) : σ
Γ true : bool
Γ false : bool
Γ L : nat Γ (0 =) (L) : bool Γ L : bool, Γ M : γ, Γ N: γ γ ∈ {bool, nat, real} Γ ifγ (L, M, N ) : γ Γ n : nat
for all n ∈ N
Γ M : nat Γ succ(M ) : nat
Γ M : nat Γ pred(M ) : nat
Γ M : real Γ bounda (M ) : real
a ∈ RQ , where RQ = {[p, q] | p ≤ q,
Γ M : real Γ p + (M ) : real
p∈Q
Γ M : real Γ p × (M ) : real
p∈Q
p, q ∈ Q}
Γ L : real Γ rtest(L) : bool
Γ ranges over contexts, that is finite sequences x1 : σ1 , . . . , xk : σk where xi are pairwise distinct variables and σi are types.
The operation bounda : R → R that appears as index in the reduction rule (15) of Table 2 is continuous. It is the direct image function obtained from the function bounda : R → R, r → max(a, min(r, a)) where a and a are the upper and lower bounds of the interval a (i.e. a = [a, a]). Notice that we have
A Denotational Semantics for Total Correctness
391
Table 2. Reduction relation (1)
(3) (4)
(6)
(7)
Yσ (M ) M (Yσ (M ))
(2)
(λx : σ.M )(N ) M [N/x]
(5)
if(false, M, N ) N
M M M (N ) M (N ) if(true, M, N ) M B B if(B, M, N ) if(B , M, N ) (8)
(0 =)(0) true
(10)
(0 =)(n + 1) false
succ(n) n + 1
(12)
pred(0) 0
(15)
bounda boundb (M ) boundbounda (b) (M )
(16)
M M |M | = ⊥ bounda (M ) bounda (M )
(17)
(19) (21)
(23)
(13)
pred(n + 1) n
(9)
L L (= 0)(L) (= 0)(L )
(11)
M M succ(M ) succ(M )
(14)
M M pred(M ) pred(M )
(18)
M M |M | = ⊥ p + (M ) p + (M )
(20)
M M |M | = ⊥ p × (M ) p × (M )
p + bounda (M ) boundp+a p + (M )
p × bounda (M ) boundp×a p × (M ) rtest(L) true
|L| < 1
(22)
rtest(L) false
0 < |L|
L L rtest(L) rtest(L )
bounda (⊥R ) = ⊥R and hence a bounda (x), for all x ∈ R. Furthermore, if a and x are consistent (a ∩ x = ∅), then bounda (x) = a ∩ x = a x. Similarly, p+ and p× : R → R are the continuous functions obtained from addition and multiplication of real numbers.
392
T. Anberr´ee
As an example, here are a few reduction steps from a program zero for 0. 1 zero = Yreal λz : real. × bound[−1,1] (z) 2 1 1 λz : real. × bound[−1,1] (z) (zero) × bound[−1,1] (zero) 2 2 1 × (zero) bound − 1 , 1 2 2 2
1 1 4 bound 1 1 × × (zero) bound 1 1 −2,2 −4,4 2 2
1 1 bound − 1 , 1 × × (zero) 2 2 4 4
1 1 1 5 bound 1 1 × × × (zero) bound 1 1 −4,4 −8,8 2 2 2
1 1 1 bound 1 1 × × × (zero) etc. −8,8 2 2 2 It is not hard to see that there is only one, infinite reduction path from program zero, of the form zero . . . bound[− 12 , 12 ]
1 1 1 zero . . . bound[− 21n , 21n ] × . . . × zero . . . 2 2 2
The reduction relation reduces a program of type real to another one that possibly displays a better approximation of its result, in the form of intervals a, at the head of programs such as bounda (M ), but the evaluation process never terminates. We say that a is the immediate output of program bounda (M ). Definition 2 (Immediate outputs of programs). The immediate output of a program of type real is an element of R and is defined by: | bounda (N )| = a |M | = ⊥R
for any term N : real → real and any a ∈ RQ , for any program not of the form bounda (N ).
We also define the immediate output of programs of type bool and nat, as an element of B⊥ or N⊥ : | true | = true, | false | = f alse, |n| = n and |M | = ⊥ in other cases. The immediate output of a program of ground type is the information that can be read at once from the head of a program without further evaluation. Definition 3. The supremum |Mn | of immediate outputs of programs along a reduction path (Mn ) is called the output of the path. At type real, the output of a path is an element of the interval domain R. For example the output of the reduction path from program zero given above,
A Denotational Semantics for Total Correctness
393
is [0, 0], that is {0}, which represents the real number 0 in R. The output of any reduction path exists because a program can only reduce to a program with equal or greater immediate output. It is easy to see that there exists only one reduction path from any given term with no occurrence of rtest. In this deterministic setting, the operational meaning [M ] of a program M of ground type could simply be defined as the output of the unique reduction path from it, that is as an element of R, N⊥ or B⊥ . However, the expressivity of the language without rtest is very weak [4]. 2.2
Redundant Test: The rtest Constructor
We have already motivated the introduction of the rtest operator in Section 1. Reduction rules for the rtest constructor are the only ones with overlapping premises, which renders the operational semantics non-deterministic, in the sense that one term may reduce to several ones (Table 2). We only introduce one constructor rtest0,1 , simply written rtest, the other ones are definable because in the language, e.g. rtestp,q (x) = rtest0,1
x−p q−p
.
The purpose of rule (23) is to eventually enable the applicability of one or both rules (21) and (22). Only when none of (21) and (22) become applicable, should rule (23) be allowed to be applied infinitely often. 2.3
Fair Reduction Paths
To avoid applying rule (23) infinitely many times in a row when doing otherwise is possible, one could give priority to the application of rules (21) and (22) over the application of rule (23), by stipulating that rule (23) should not be applied whenever one of the two other rules can be applied. But this would make rtest able to distinguish between programs computing the same real number. For example, suppose that half is a program for computing 12 , i.e. all reduction paths from half have output [ 12 , 12 ]; for example, take half = 12 + zero. The term rtest(bound[0, 1 ] (half)) would then only reduce to true whereas the 2
term rtest(bound[ 1 ,1] (half)) would only reduce to false, although both terms 2 bound[0, 1 ] (half) and bound[ 1 ,1] (half) compute the number 12 . We wish to remain 2 2 as general as possible and not put artificial constraints on the operational behaviour of the language. It should be up to a particular implementation to make specific decisions to avoid undesirable paths. In order to rule out undesirable paths without imposing too much constraint, we stipulate that only fair paths should be allowed. Roughly, a fair path is one along which rules (21) and (22) are eventually applied when possible. Definition 4. A computation is a maximal, fair reduction path. Computations are the only paths that any implementation of the language should consider. What is important, is that there exists a strategy that allows to produce maximal fair paths. Even if we only consider computations from a fixed program,
394
T. Anberr´ee
these may have different outputs. For example, the program rtest(half) reduces to both true and false. Definition 5. The outputs of a program is the set of outputs of all computations from this program:
|Mn | (Mn ) is a computation from M . Outputs (M ) = n
Since there is at least one fair path from a given term M , the set Outputs(M ) is not empty. A program M : real computes a real number r if Outputs(M ) = [r, r]. For example, consider the program abs : real → real recursively defined by abs (x) = if rtest−1,0 (x) then −x else if rtest0,1 (x) then 12 bound[−1,1] (abs (2x)) else x. (2) It can be shown that, for any program M : real that computes some real number r, the program abs (M ) computes the absolute value of r. 2.4
Operational Meaning of Programs of Ground Types
To account for the multi-valuedness of programs, both the operational meaning [M ] and the denotation M of a program M of ground type are defined in a power domain PD. As mentioned in the introduction, our choice is to use Smyth power domains because it allows us to prove the correctness of programs. Definition 6 (Smyth power domains). Given a domain D, its Smyth power domain P S D is the set of non empty, Scott compact and upper subsets of D, ordered by reverse inclusion (It is itself a domain). The Hasse diagram of the Smyth power domain P S B⊥ is represented in Figure 1. {true}
{false}
{true, false}
{⊥, true, false} Fig. 1. Smyth power domain P S B⊥ over the lifted booleans B⊥
Definition 7 (Interpretation of ground types.). The interpretation γ of a ground type γ is given by: nat = P S N⊥ , bool = P S B⊥ , real = P S R.
A Denotational Semantics for Total Correctness
395
Definition 8. The operational meaning [M ] of a closed term M of ground type γ is the greatest element in the domain γ that contains the set of outputs of M: [M ] = {K ∈ γ | K ⊇ Outputs(M )} . (3) Notice that, for γ = real, Definition 8 relies on the fact that, for any closed term M of ground type γ, the set {K ∈ γ | K ⊇ Outputs (M )} is directed. This follows from the fact that, in continuous Scott domains (bounded complete domains), the intersection of finitely many Scott compact upper sets is Scott compact [1]. It is not always possible to recover the set of outputs of a program from its operational meaning. However, for a program M of ground type whose operational meaning is a finite set of maximal elements of N⊥ , B⊥ or R, we have that Outputs (M ) = [M ]. In particular, if a program M computes a real number r, then [M ] = {[r, r]} in P S R. The main motivation of our work is to define a denotational semantics such that, in such cases, M coincides with [M ] and hence with Outputs (M ), at least for enough programs M for the denotational semantics to be useful.
3
An Approximate Semantics for Total Correctness
The interpretations of ground types were given in Definition 7. The interpre tation of a higher type σ → τ is given by σ → τ = Dσ→τ = σ → τ , the domain of Scott continuous functions from σ to τ . The denotation of a typing judgement x1 : σ1 , . . . , xk : σk M : σ is a continuous function from σ1 ×. . .×σk to σ in the category of bounded-complete continuous domains. Table 3 on the following page gives the recursive definition of the denotations of typing judgements. The fact that the denotations of terms are continuous comes from classical results of domain theory (see e.g. [10]) and the fact that the denotations of basic constructors other than rtest are given as functions known to be continuous. We only have space to give details on the interpretation of rtest. 3.1
Denotation of rtest
Theorem 1. The function ttest : R → P S B⊥ defined as follows is continuous. ⎧ {|true|} ⎪ ⎪ ⎪ ⎨{|false|} ttest(l) = ⎪ {|true, false|} ⎪ ⎪ ⎩ {|⊥B⊥}|
if if if if
l<0 1
(4)
∗ S S The denotation of rtest is the continuous function ttest : P R → P B⊥ that maps K to {↑ ttest(k) | k ∈ K}.
396
T. Anberr´ee Table 3. Denotation of typing judgements
→ − x1 : σ1 , . . . , xk : σk xi : σi ( d ) = di → − → − Γ (λx : σ.M ) : σ → τ ( d )(e) = Γ, x : σ M : τ ( d , e) → − → − Γ Yσ (M ) : σ ( d ) = μDσ Γ M : σ → σ ( d ) → − → − → − Γ M (N ) : τ ( d ) = Γ M : σ → τ ( d ) Γ N : σ ( d ) → − Γ (0 =)(M ) : bool ( d ) = P S (0 =) (Γ M : nat) → − Γ succ(M ) : nat ( d ) = P S (succ) (Γ M : nat) → − Γ pred(M ) : nat ( d ) = P S (pred) (Γ M : nat) → − Γ true : bool ( d ) = {true} → − Γ false : bool ( d ) = {false} → − Γ n : nat ( d ) = {n} → − → − Γ rtest(L) : bool ( d ) = ttest∗ Γ L : real ( d ) → − Γ ifγ (B, M, N ) : γ ( d ) ⎧ → − ⎪Γ M : γ ( d ) ⎪ ⎪ → ⎪ ⎨Γ N : γ (− d) = → − → − ⎪ Γ M : γ ( d ) ∪ Γ N : γ ( d) ⎪ ⎪ ⎪ ⎩ ⊥Dγ
→ − if Γ B : bool ( d ) = {true} → − if Γ B : bool ( d ) = {false} → − if Γ B : bool ( d ) = {true, false} otherwise
→ − Γ bounda (M ) : real ( d ) = P S (bounda ) (Γ M : real) → − Γ p + (M ) : real ( d ) = P S (p+) (Γ M : real) → − Γ p × (M ) : real ( d ) = P S (p×) (Γ M : real)
The left side of Figure 2 represents the values taken by the function ttest on the interval domain. The black wedges indicate to which area the boundary lines belong. In particular ttest ([0, 0]) = ttest ([1, 1]) = {true, false} .
(5)
Similarly, the right side represents the values taken by rtest when seen as a function rtest from R to P(B⊥ ), the power set of B⊥ , in the following way:
A Denotational Semantics for Total Correctness
{true}
{0} {1} {true, false}
{false}
{true}
{0} {1} {true, false}
{⊥B⊥ , true, false}
{⊥B }
⊥R
⊥R
397
{false}
Fig. 2. Values of ttest(l) for l ∈ R
given a program M : real such that M = ↑ {x} = {y ∈ R | x y} for some interval x ∈ R, the figure indicates the set of outputs of the program rtest (M ). On the areas that are of different colours from one figure to the other, we have that ttest(x) strictly contains rtest(x). In particular rtest ([0, 0]) = {true} and rtest([1, 1]) = {false}. It follows that for some programs M : real, even some computing real numbers, one has rtest (M ) < [rtest (M )] . Our main technical result states that adequacy between the operational and denotational semantics holds for programs of ground type with maximal denotation. Theorem 2. For any closed term M of ground type, M [M ]. In particular, if M has a maximal denotation, then equality holds. Our proof makes use of a logical relation and is adapted from the proof of adequacy for PCF as found in e.g. [10]. In itself, Theorem 2 does not say much about our semantics, because the semantics that assigns bottom to every term also satisfies the conclusion of the theorem. However, our model is “close enough” to the operational semantics, as we shall see in the next two sections.
4
Example
For many common first-order computable functions, it is not difficult to find programs whose denotation is total (sends maximal elements to maximal elements). We only have space to provide one example (but see conclusion). The denotation of the closed term we gave earlier (page 394) for the absolute value function is the map abs : P S R → P S R considered in the next theorem. Theorem 3. The following recursively defined function abs: P S R → P S R represents the absolute value map on real numbers: abs (x) = cases x ≤ 0 −1 ≤ x
→ →
−x cases x ≤ 1 0≤x
→ →
bound[−1,1] x
1 2
abs(2x)
398
T. Anberr´ee
where cases x ≤ q p≤x
→ →
M0 is a notation for if ttestp,q (x) then M0 else M1 . M1
Let us say that a program M defines a real number r if M = {[r, r]}. Using the fact that {[r, r]} is maximal in P S R, it follows from Theorems 2 and 3 that, whenever a program M : real defines r ∈ R, the program abs(M ) defines the absolute value |r| of r, and hence that the program abs(M ) computes |r|: all computations from abs(M ) output [|r|, |r|], which represents |r| in the interval domain.
5
Conclusion
The undecidability of (in)equality tests on real numbers from a constructive point of view creates problems in the design of control-flow mechanisms for programming languages for exact real-number computation. A good operational solution, based on constructive analysis, has been proposed by Boehm and Cartwright [2] and developed by Marcial-Romero and Escard´ o [8] to some extent. A main contribution of [8] is that the Hoare power domain of the interval domain can be used to reason about partial correctness in a natural way, combining real analysis and domain theory. A main problem left open in the same work is the development of a denotational semantics that would allow for total correctness proofs. Marcial-Romero and Escard´o ruled out the Smyth and Plotkin power domains for such a semantics, by proving that there is no continuous interpretation of Boehm and Cartwright’s rtest construction in those power domains. In this work, we have shown that the best continuous approximation of rtest in the Smyth power domain allows one to develop total correctness proofs based on real analysis and denotational semantics, in a natural way, avoiding termination proofs based on operational semantics. Technically, our main contribution is the formulation and proof of a suitable computational adequacy theorem for the approximate semantics, which we expressed as M [M ], for any program M of ground type. Acknowledgement. This work is part of my PhD Thesis, under the supervision of Mart´ın Escard´ o, to whom I express all my gratitude. Detailed proofs and many other examples of programs will appear in my PhD dissertation, a draft of which is already available on my web page. The dissertation also contains a universality result (all computable first-order total functions are definable in the language); this will be the subject of another paper.
References 1. Abramsky, S., Jung, A.: Domain theory. In: Abramsky, S., Gabbay, D.M., Maibaum, T.S.E. (eds.) Handbook of Logic in Computer Science, vol. 3, pp. 1– 168. Clarendon Press (1994) 2. Boehm, H.J., Cartwright, R.: Exact real arithmetic: Formulating real numbers as functions. In: Research topics in functional programming, pp. 43–64. AddisonWesley Longman Publishing Co., Inc., Boston, MA, USA (1990)
A Denotational Semantics for Total Correctness
399
3. Escard´ o, M.H.: PCF extended with real numbers: A domain-theoretic approach to higher-order exact real number computation. PhD thesis, Department of Computing, Imperial College, University of London (1996) 4. Farjudian, A.: Sequentiality in Real Number Computation. PhD thesis, School of Computer Science, University of Birmingham (2004) 5. Di Gianantonio, P.: A Functional Approach to Computability on Real Numbers. PhD thesis, University of Pisa, Udine (1993) 6. Gierz, G., Hofmann, K.H., Keimel, K., Lawson, J.D., Mislove, M., Scott, D.S.: Continuous Lattices and Domains. Encyclopedia of Mathematics and its Applications, vol. 93. Cambridge University Press, Cambridge (2003) 7. Marcial-Romero, J.R.: Semantics of a Sequential Language for Exact Real-Number Computation. PhD thesis, School of Computer Science, University of Birmingham U.K. (2004) 8. Marcial-Romero, J.R., Escard´ o, M.H.: Semantics of a sequential language for exact real-number computation. Theoretical Computer Science 379(1-2), 120–141 (2007) 9. Plotkin, G.D.: LCF considered as a programming language. Theor. Comput. Sci. 5(3), 225–255 (1977) 10. Streicher, T.: Domain-Theoretic Foundations of Functional Programming. Imperial College Press, London (2006)
Weak Bisimulations for the Giry Monad (Extended Abstract) Ernst-Erich Doberkat Chair for Software Technology Technical University of Dortmund [email protected]
Abstract. The existence of bisimulations for objects in the Kleisli category associated with the Giry monad of subprobabilities over Polish spaces is studied. We first investigate these morphisms and show that the problem can be reduced to the existence of bisimulations for objects in the base category of stochastic relations using simulation equivalent congruences. This leads to a criterion for two objects related through the monad to be bisimilar.
1 Introduction The Giry monad [8] provides a categorical approach to the measure-theoretic foundations of Probability Theory. Permitting to represent a variety of models in computer Science, it has recently attracted some attention with respect to the foundations of concurrency theory (like bisimulations) [11,1,3,5], Kripke models for modal logics with applications to model checking [15,4], and even to modelling software architectures [2]. The monograph [6] provides an overview over the recent development. E. Moggi proposes in his paper Notions of Computations and Monads [10] a monadic model of computation. Functor T acts as a type constructor, so that if A is a type, then TA is the object of computations of type A. Assuming that T is the functorial part of a monad, the Kleisli category for this monad is identified as the category of programs, and morphisms in this category are the programs. The present paper is about these morphisms in the Giry monad; they are usually referred to as stochastic relations. It studies the problem under which conditions two stochastic relations are bisimilar in the Kleisli category for the Giry monad. Thus, given two stochastic relations K and L, we ask under which conditions we can find a stochastic relation M and two Kleisli morphisms Φ : K M and Ψ : L M, hence we want to know under which conditions a span of morphisms in the Kleisli category for the Giry monad exists. The strong case, i.e., the case that F and G both are replaced by Borel measurable maps (and Kleisli composition by the composition of maps) has been studied extensively in different settings [6,7], and here general criteria are known. The case of weak morphisms, however, did not find sufficient attention yet. This problem is interesting for a variety of reasons. First, it is known that there is an intimate connection between bisimilarity and Kripke models for modal logics, not
Research funded in part by Deutsche Forschungsgemeinschaft, grant DO 263/8-3, Algebraische Eigenschaften stochastischer Relationen.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 400–409, 2008. c Springer-Verlag Berlin Heidelberg 2008
Weak Bisimulations for the Giry Monad
401
only in the case when the logics is interpreted coalgebraically through a set based functor [13] but also when the subprobability based functor is used [7]. It can be shown that the weak probabilistic case is by no means as clearly cut as the strong case, hence the interest is somewhat pronounced here. Then, bisimilarity is a basic notion in the theory of concurrency, consequently, a better knowledge about an interesting special case is welcome. From a technical point of view it is important to see that weak morphisms are closely related to strong ones, which gives some insight into the inner working of the Kleisli construction for the Giry monad. The problem is tackled here in terms of congruences with an approach similar to the one proposed in [4]. Congruences are introduced in Section 3, after having recalled in Section 2 some familiar constructions with the Giry monad. In Section 4 we reduce the problem to finding a weak morphism to the strong case through a construction that resembles a tensor product and analyse congruences that occur in this case. We know that the existence of simulation equivalent congruences implies the existence of a bisimulation for the strong case, hence we study simulation equivalence in the context of the transformed problem. After some transformations involving the corresponding factor spaces, we show that two stochastic relations are weakly bisimilar if we can find simulation equivalent congruences on the base spaces that are compatible with the Kleisli composition. The conclusing Section 5 wraps it all up and proposes further work.
2 Some Constructions Associated with the Giry Monad Given a measurable space (X, A), S (X, A) is the space of all subprobability measures on (X, A). It is again a measurable space with the ∗ − σ-algebra A• . This is the smallest σ-algebra on S (X, A) which makes all the evaluations μ → μ(Q) with Q ∈ A measurable. Defining for another measurable space (Y, B) and an A-B-measurable map f : X → Y the map S (f ) → S (Y ) through S (f ) (μ)(B) := μ(f −1 [B]) constitutes a A• -B • -measurable map. The subprobability functor S is hence an endofunctor on the category of measurable spaces. This functor is the functorial part of a monad, the Giry monad [8]. The Kleisli construction [9] for this monad is particularly interesting: If (X, A) and (Y, B) are measurable spaces, then a stochastic relation K : (X, A) (Y, B) is a map K : X → S (Y, B) which is A-B • -measurable, stochastic relations are just the Kleisli morphisms for the Giry monad. The reader may wish to compare this with Kleisli morphisms for the powerset monad: they are exactly the relations, and in fact there are many interesting and not so obvious similarities, see, e.g. [6]. In Probability Theory, stochastic relations are known as subprobability kernels, the algebraic context is usually not being considered there. The Kleisli composition of the stochastic relations K : (X, A) (Y, B) and L : (Y, B) (Z, C) (for some measurable space (Z, C)) is the stochastic relation L∗K : (X, A) (Z, C) defined through (x ∈ X, G ∈ C) L(y)(G) K(x)(dy). L∗K (x)(G) := Y
In Probability Theory, this operation is usually referred to as the convolution of the kernels K and L.
402
E.-E. Doberkat
Define the coproduct K+K of the stochastic relations K = ((X, A), (Y, B), K) and K = ((X , A ), (Y , B ), K ) through (X + X , A+ A ), (Y + Y , B + B ), K + K , where if z = injX (x), S (injY ) ◦ K (x), K + K (z) := S (injY ) ◦ K (x ), if z = injX (x ).
Here e.g., injY is the injection Y → Y + Y from Y into the sum Y + Y . Thus we have e.g. for x ∈ X and the measurable set T ⊆ Y + Y the identity K + K (injX (x))(T ) = K(x)(T ∩ Y ). It is not difficult to see that this defines the coproduct in the category of stochastic relations, anticipating the definition of strong morphisms from Definition 3. If X is a Polish space, i.e., a Hausdorff space with a countable dense subset for which a complete metric exists, then X becomes a measurable space with the Borel sets B(X) as a σ-algebra. B(X) is the smallest σ-algebra that contains the open (or, equivalently, the closed) sets of X. We suppress B(X) usually from the notation of a Polish space X. Then S (X) is a Polish space as well [12], taking the topology of weak convergence as a topology, and it is well known that the ∗ − σ-algebra are just the Borel sets for this topology. Let again (X, A) and (Y, B) be measurable spaces, and denote by A ⊗ B the product σ-algebra on X × Y . Define for a stochastic relation K : (X, A) (Y, B) and for a subprobability μ ∈ S (Y, B) on (X × Y, A ⊗ B) the measure μ ⊗ K (H) := K(x)(Hx ) μ(dx) X
(here Hx := {y ∈ Y | x, y ∈ H} is the vertical cut of H at x ∈ X). It is folklore in Probability Theory that this defines a measure on the product space. The following Lemma will be helpful in expressing the Kleisli product in a sometimes more convenient manner. Lemma 1. x → K(x) ⊗L defines a stochastic relation K(·) ⊗ L : (X, A) (Y × Z, B ⊗ C) such that L∗K (x)(G) = S (πY ×Z,Z ) (K(x) ⊗ L)(G) for all G ∈ C. 2 Now let X and Y be Polish spaces, and assume that a measure μ ∈ S (X × Y ) is given. Then it is well known that μ can be represented through a measure on X and a stochastic relation K : X Y (see [12, Theorem V.8.1] or [6, Section 1.5.3]): Theorem 1. Given μ ∈ S (X × Y ), then μ = S (πX×Y,X ) (μ)⊗K for some stochastic relations K : X Y . K is unique S (πX×Y,X ) (μ)-almost everywhere. K is called a disintegration of μ with respect to X, Y . 2 Thus, whenever μ ∈ S (X × Y ) and H ∈ B(X × Y ), we can write μ(H) = K(x)(Hx ) S (πX×Y,X ) (μ)(dx). X
The Change of Variables Formula will be a helpful technical tool. It uses the basic fact stated above that a measurable map between measurable spaces induces a measurable map between the corresponding subprobabilities. We formulate it for convenience for Polish spaces, but it holds for general measurable spaces as well.
Weak Bisimulations for the Giry Monad
403
Lemma 2. Let X and Y be Polish spaces, and assume that f : X → Y is Borel measurable. Then g ◦ f (x) μ(dx) g(y) S (f ) (μ)(dy) = f −1 [B]
B
whenever B ⊆ Y is a Borel set.
2
3 Smooth Relations If (X, A) is a measurable space, then an equivalence relation on X is called smooth (or countably generated) iff there exists a sequence (Qn )n∈N of sets in A such that x x iff ∀n ∈ N : [x ∈ Qn ⇔ x ∈ Qn ]. The sequence (Qn )n∈N is said to determine . For example, when one constructs an equivalence relation from the sets on which the formulas of a modal logic are valid [4], these sets then form a determining sequence. In this case two states are equivalent iff they cannot be separated by the logic. Concerning the measurable structure of the factor space, it is well known that if X is Polish, then the factor space X/ is analytic (i.e., homeomorphis to a continuous image of a Polish space), whenever is smooth, provided the factor carries the largest σ-algebra that makes the factor map e : x → [x] B(X)-measurable [14, Exercise 5.1.14]. As a rule, the factor spaces are analytic spaces, even if the base spaces are Polish, see [7, Example 2.7] for a discussion. Call a set B ⊆ X -invariant iff B = {[x] | x ∈ B}, so that x ∈ B and x x implies x ∈ B. The -invariant A-measurable sets INV (A, ) := {C ∈ A | C is -invariant} form a σ-algebra for a measurable space (X, A). This is a collection of results for smooth equivalence relations that will be used silently throughout. Lemma 3. Let X be a Polish space, be a smooth equivalence relation with the Borel sets (Qn )n∈N as determining sequence. Then a. INV (B(X), ) = σ ({Qn | n ∈ N}), −1 b. C ∈ B(X/) iff e−1 [C] ∈ INV (B(X), ), and e [e [A]] = A for each A ∈ INV (B(X), ) , c. x x iff x ∈ Q ⇔ x ∈ Q holds for all Q ∈ INV (B(X), ) . d. The equivalence classes [x] are exactly the atoms of the σ-algebra INV (B(X), ). As a consequence of the characterization for the atoms of INV (B(X), ) we note that the INV (B(X), )-measurable real functions are constant on the equivalence classes. Corollary 1. Let X and be as in Lemma 3. If f : X → R is INV (B(X), )measurable, then ⊆ ker (f ) . 2 A pair (α, β) of smooth equivalence relations α on X and β on Y (with both X and Y Polish) is called a congruence for the stochastic relation K : X Y iff K(x)(B) = K(x )(B) whenever x α x and B ⊆ Y is a β-invariant Borel set. Thus if α cannot
404
E.-E. Doberkat
distinguish x from x , and if β cannot discern the elements of B, then K(x) and K(x ) both assign the same probability to B. Given a congruence (α, β) for K : X Y , one constructs the factor relation K(α,β) : X/α Y /β upon setting K(α,β) [x]α (G) := K(x) e−1 β [G] = S (eβ ) ◦ K (x)(G) for x ∈ X, G ∈ B(Y /β). This is a characterization of congruences in terms of the relation involved [5, Lemma 2.8]: Proposition 1. The following statements are equivalent for the Polish spaces X and Y a. (α, β) is a congruence for K : X Y . b. K : INV (B(X), α) → INV (B(Y ), β) is a stochastic relation.
2
For the investigation of bisimilarity, the notion of simulation equivalent congruences will be important. A necessary condition for the bisimilarity of stochastic relations is the existence of equivalent congruences on them. This condition is technically a bit involved because it requires the notion of the mutual generation of smooth equivalence relations; this concept is called spawning, it is discussed in detail in [4], so is the closely related concept of simulation equivalent congruences. Lemma 4. Let α and β be smooth equivalence relations on the Polish spaces X resp. Y , and assume that p : X/α → Y /β is a map between the equivalence classes. √ √ Put p(A) := {p([x]α ) | x ∈ A} for A ⊆ X. Then p : INV (B(X), α) → √ 2 INV (B(Y ), β) . If p is injective, then p is a Boolean σ-morphism. √ It is useful to notice that p(A) = e−1 β [p [eα [A]]] . Definition 1. Let α and β be smooth equivalence relations on the Polish spaces X resp. Y , and assume that p : X/α → Y /β is an injective map between the equivalence classes. We say that α spawns β via (p, A0 ) iff A0 is a countable generator √ INV (B(X), α) such that { p(A)|A ∈ A0 } is a generator of INV (B(Y ), β). Thus if α spawns β, then the measurable structure induced by α on X is all we need for constructing the measurable structure induced by β on Y : the map p can be made to carry over the generator A0 from INV (B(X), α) to INV (B(Y ), β) and to transport the atoms from one σ-algebra to the other. This is of particular interest since the atoms constitute the equivalence classes. It can be shown that the specific generator in question is not important because the map between the classes is injective, so that any generator will do. Definition 2. Let K : X Y and K : X Y be stochastic relations over analytic spaces on which congruences (α, β) resp. (α , β ) are defined. a. Call (α, β) proportional to (α , β ) (written as (α, β) ∝ (α , β )) iff α spawns α via (p, A0 ), β spawns β via (q, B0 ) such that both A0 and B0 are closed under finite intersections and √ ∀x ∈ X∀x ∈ p([x]α )∀B ∈ B0 : K(x)(B) = K (x )( q(B)).
Weak Bisimulations for the Giry Monad
405
b. Call these congruences simulation equivalent iff both (α, β) ∝ (α , β ) and (α , β ) ∝ (α, β) We require the generators to be closed under finite intersections since this is — by the famous π-λ-Theorem of Measure Theory [6, Theorem 1.1] — a helpful condition to ensure uniqueness of a measure. Simulation equivalent congruences behave in identical fashion on the class structure induced by the respective congruences. ¯, x ¯ ∈ X+X If α and α are smooth equivalence relations on X resp. X , define for x the equivalence relation α + α on the coproduct X + X through [injX (x)]α+α := [x]α whenever x ∈ X, and [injX (x )]α+α := [x ]α , whenever x ∈ X . Thus (X + X )/(α + α ) is isomorphic to X/α + X /α , hence both spaces will be identified. Simulation equivalence is preserved through coproducts. The proof is fairly straighforward. Lemma 5. Let K and L be stochastic relations with simulation equivalent congruences (χ, α) and (υ, β) and assume that K and L are stochastic relations with simulation equivalent congruences (χ , α ) and (υ , β ). Then (χ + χ , α + α ) and (υ + υ , β + β ) are simulation equivalent congruences for K + K and L + L . 2
4 Weak Bisimilarity We will define weak and strong morphisms now in order to be able to define bisimilarity. This property is introduced through a span of morphisms, thus we take here a coalgebraic point of view. In what follows, all spaces are Polish. Definition 3. Let K = (X, Y, K) and L = (A, B, L) be stochastic relations, then a pair (f, g) : K → L is called a strong morphism iff f : X → A and g : Y → B are surjective Borel maps, such that L ◦ f = S (g) ◦ K. Thus strong morphisms are based on measurable Borel maps. In contrast, weak morphisms are based on Kleisli morphisms. Definition 4. Let K = (X, Y, K) and L = (A, B, L) be stochastic relations, then a pair (F, G) : K L is called a weak morphism iff F : X A and G : Y B are Kleisli morphisms with L∗F = G∗K. The condition L∗F = G∗K entails for each x ∈ X and each Borel set P ∈ B(B) L∗F (x)(P ) =
L(a)(P ) F (x)(da) =
A
G(y)(P ) K(x)(dy) = G∗K (x)(P )
Y
If (f, g) : K → L is a strong morphisms, then L(f (x))(P ) = K(x)(g −1 [P ]) holds. Clearly, strong morphisms are special cases of weak ones. This is so because δf : x → δf (x) is a stochastic relation X Y , whenever f : X → Y is a Borel map, and because Y h(y) δf (x)(dy) = h(f (x)), whenever f : Y → R is a measurable map. Some properties of weak morphisms are discussed in [5]. The coproduct permits to reduce co-spans of weak morphisms to weak morphisms.
406
E.-E. Doberkat
t1 t2 / X RR X o P l RRR l l l RRR K l F lll RRR l M M N l 1 2 RRR l ll RR( vlll / S (Y × B) / S (Y ) S (A) o S (A × B) o S (Q) KK S(s1 ) S(s2 ) s S(πA×B,A ) S(πY ×B,Y ) KK s s KK ss K ssS(πY ×B,B ) S(πA×B,B ) KK s % ys
S (B) Fig. 1. Derived diagram for the proof of Proposition 3
Lemma 6. Let (F, G) : K N , (F , G ) : K N be a span of weak morphisms. 2 Then (F + F , G + G ) : K + K N + N constitutes a weak morphism. Proposition 2. A weak morphism (F, G) : (X, Y, K) (A, B, L) is characterized through the stochastic relations M1 : X A × B and M2 : X Y × B with these properties: a. S (πA×B,B ) ◦ M1 = S (πY ×B,B ) ◦ M2 . b. The disintegration of M1 (x) with respect to A, B is independent of x ∈ X and coincides with L. c. K = S (πY ×B,Y ) ◦ M2 . d. The disintegration of M2 (x) with respect to Y, B is independent of x ∈ X. Then G is the disintegration of M2 (x) with respect to Y, B, and F = S (πA×B,A )◦M1 . 2 Using this construction, we derive a first criterion for the weak bisimilarity of two stochastic relations. Proposition 3. Construct M1 : X A × B and M2 : X Y × B from the weak morphism (F, G) : (X, Y, K) (A, B, L). If M1 and M2 are strongly bisimilar, then K and L are weakly bisimilar. Proof. The construction yields the diagram of commuting maps displayed in Fig. 1 for a suitable stochastic relation N : P Q. Chasing L (which is hidden through disintegration in M1 ) and K, the assertion is easily established. 2 This reduction result encourages to look for conditions that ensure the derived relations M1 and M2 being strongly bisimilar. For this, fix smooth equivalence relations α on A, β on B, χ on X and υ on Y and a weak morphism (F, G) : (X, Y, K) (A, B, L) such that (χ, υ) is a congruence for K, (α, β) is a congruence for L, (χ, α) is a congruence for F and finally, (υ, β) is a congruence for G. We assume that (χ, α) and (χ, υ) are simulation equivalent such that p : A/α → Y /υ is a bijection with inverse q. The maps p and q are used to transport the structures to and fro in the sense of Definition 1 with A0 and Y0 as the respective ∩-stable generators for INV (B(A), α) and INV (B(Y ), υ). We assume thatthe identity on X/χ maps the factor structure onto it√ self so that K(x)(W0 ) = F (x ) q(W0 ) holds for every W0 ∈ Y0 whenever x χ x . Recall from both A0 and Y0 are closed under finite intersections, see Definition 2.
Weak Bisimulations for the Giry Monad
407
Define the product α × β of the equivalence relations α and β through a, b a , b (α × β) iff a α a and b β b , then α×β and, similarly, υ×β are smooth equivalence relations on A×B resp. Y ×B such that e.g. INV (B(A × B), α × β) = INV (B(A), α)⊗ INV (B(B), β) , see [6, Lemma 5.13]. Let B0 be a ∩-stable countable generator of INV (B(B), β), then {A0 × B0 | A0 ∈ A0 , B0 ∈ B0 } is a countable and ∩-stable generator of INV (B(A × B), α × β). A similar characterizsation holds for the σ-algebra INV (B(Y × B), υ × β). We claim that (χ, α × β) is simulation equivalent to (χ, υ × β). It suffices to demon√ strate that these equations hold: M2 (x)( p(A0 ) × B0 ) = M1 (x )(A0 × B0 ), and √ M1 (x)( q(Y0 ) × B0 ) = M2 (x )(Y0 × B0 ), whenever A0 ∈ A0 , B0 ∈ B0 , Y0 ∈ Y0 and x, x ∈ X with x χ x are given. It is enough to establish the first equality, since the maps p and q are symmetric, the second one will follow in the same way. The proof is broken into several steps. The first step shows that √ G(υ,β) (t)(eβ [B0 ]) K(χ,υ) ([x]χ )(dt). M2 (x)( p(A0 ) × B0 ) = √ eυ [ p(A0 )]
Because (υ, β) is a congruence for G, we know from Proposition1 that y → G(y)(B0 ) is INV (B(Y ), υ)-measurable, since B0 ∈ INV (B(B), β). From Corollary 1 we infer that this map is constant on the atoms of INV (B(Y ),υ), which are just the equivalence classes, thus we may conclude that G(y)(B0 ) = G(υ,β) ◦ eυ (y)(eυ [B0 ]). Apply the Change of Variables Formula, and observe that for a Borel set T ∈ B(Y /υ) the √ √ √ equality e−1 p(A0 ) is equivalent to T = eυ [ p(A0 )] , because p(A0 ) ∈ υ [T ] = INV (B(Y ), υ). Recalling the construction of the factor relation, it is evident that the measures S (eυ ) (K(x)) and K(χ,υ) ([x]χ ) coincide on B(Y /υ), yielding the equation. Because (α, β) and (υ, β) are simulation equivalent, we may conclude that G(υ,β) (t)(eβ [B0 ]) = L(α,β) (q(t))(eβ [B0 ]). Using change of variables, and observing √ √ q−1 [q [eυ [ p(A0 )]]] = eυ [ p(A0 )] , because q is injective, we continue and show in a second step √ L(α,β) (s)(eβ [B0 ]) S (q) K(χ,υ) ([x]χ ) (ds) M2 (x)( p(A0 ) × B0 ) = √ q[eυ [ p(A0 )]]
Now let G ∈ B(A/α), then S (q) K(χ,υ) ([x]χ ) (G) = S (eα ) F (x ) (G) This is derived from the definition of the factor relation, from the assumption that , and x−1χ x −1 √ −1 −1 −1 from the assumption that q is injective, since q◦eυ ◦q = eα ◦q◦eυ ◦eυ ◦q = −1 −1 e−1 α . From eα ◦ q ◦ eυ ◦ eυ ◦ p ◦ eα = id we obtain √ L(α,β) (s)(eβ [B0 ]) S (eα ) F (x ) (ds) M2 (x)( p(A0 ) × B0 ) = √ q[eυ [ p(A0 )]]
=
√ e−1 α [q[eυ [ p(A0 )]]]
=
L(α,β) (eα (a))(eβ [B0 ]) F (x )(da)
L(a)(B0 ) F (x )(da)
A0
= M1 (x )(A0 × B0 ),
408
E.-E. Doberkat
as desired. As an aside, it is noted that the proof does not use the assumption that (F, G) is a weak morphism. The considerations above establish the main result now. Theorem 2. Let (χ, υ) and (α, β) be congruences for the stochastic relations K = (X, Y, K) resp. L = (A, B, L) and assume that there exists a weak morphism (F, G) : K L such that (χ, α) and (υ, β) are simulation equivalent congruences for (X, A, F ) resp. (Y, B, G). Then K and L are weakly bisimilar. Proof. Define the relations M1 : X A × B through M1 (x) := F (x) ⊗ L resp. M2 : X Y × B through M2 (x) := K(x) ⊗ G. The discussion above shows that the congruences (χ, α × β) and (χ, υ × β) are simulation equivalent. Thus we obtain from the general criterion in [4, Theorem 4.8] that M1 and M2 are strongly bisimilar. From Proposition 3we infer the claim. 2 In the category of stochastic relations with strong morphisms, a bisimulation is constructed through a cospan. This construction is helpful, e.g., when showing that Kripke models for a modal logic are bisimilar, provided they are behavioral equivalent [6, 6.2.3]. Thus the question arises whether a similar construction is possible as well when strong morphisms are replaced by weak ones. We did reduce a cospan of weak morphisms to a weak morphism on a coproduct in Lemma 6, and this construction will be helpful now for answering the question above. Proposition 4. Let (F, G) : K N and (F , G ) : K N be a cospan of weak morphisms, and assume that (χ, υ) and (χ , υ ) is a congruence on K resp. K , and (α, β) is a congruence on N such that (χ, α) is simulation equivalent to (υ, β), (χ , α) is simulation equivalent to (υ , β). Then K and K are weakly bisimilar. Proof. We infer from Lemma 5 that (χ + χ , α + α) is simulation equivalent to (υ + υ , β + β ), so we find through Theorem 2 a stochastic relation M = (C, D, M ) and weak morphisms (I, J) : M K + K and (I , J ) : M N + N (using the notation from Theorem 2 with N = (A, B, C)). Discarding the latter morphisms, we define the stochastic relation I1 : C X through I1 (c)(G) := I(c)(G ∩ X) with G ∈ B(X); the relations I2 : C X , J1 : D Y and J2 : D Y are defined similarly. An easy integration argument shows that K ∗I2 = J2 ∗M, holds. 2 Thus we have a general criterion for two stochastic relations being weakly bisimilar. It is technically more involved than the corresponding criterion for strong morphisms (which simply states that two relations are strongly bisimilar if there exists simulation equivalent congruences for them). Both statements have in common that they do not need an external instance (like a modal logic) for a decision of whether or not they are bisimilar, and both use the existence of a cospan, albeit in very different ways.
5 Conclusion We derive a criterion for weak bisimilarity in the Giry monad. Technically, this was done through a reduction argument together with an appliction of a general criterion
Weak Bisimulations for the Giry Monad
409
for strong bisimilarity. This shows a close relationship between Kleisli morphisms and morphisms in the base category which has not yet been sufficiently exploited in this special case or in the general case, as it seems. Looking again into tensored categories [1] for this purpose may be useful.
References 1. Abramsky, S., Blute, R., Panangaden, P.: Nuclear and trace ideal in tensored *-categories. J. Pure Appl. Alg. 143(1–3), 3–47 (1999) 2. Doberkat, E.-E.: Pipelines: Modelling a software architecture through relations. Acta Informatica 40, 37–79 (2003) 3. Doberkat, E.-E.: Eilenberg-Moore algebras for stochastic relations. Information and Computation 204, 1756–1781 (2006) 4. Doberkat, E.-E.: Stochastic relations: congruences, bisimulations and the Hennessy-Milner theorem. SIAM J. Computing 35(3), 590–626 (2006) 5. Doberkat, E.-E.: Kleisli morphisms and randomized congruences for the Giry monad. J. Pure Appl. Alg. 211, 638–664 (2007) 6. Doberkat, E.-E.: Stochastic Relations. Foundations for Markov Transition Systems. Chapman & Hall/CRC Press, Boca Raton, New York (2007) 7. Doberkat, E.-E., Schubert, C.: Coalgebraic logic for stochastic right coalgebras. Technical report, Chair for Software-Technology, University of Dortmund (September 2007) 8. Giry, M.: A categorical approach to probability theory. In: Categorical Aspects of Topology and Analysis, J. Pure Appl. Alg., 915, 68–85. (1981) 9. Mac Lane, S.: Categories for the Working Mathematician. In: Graduate Texts in Mathematics, Springer, Berlin (1997) 10. Moggi, E.: Notions of computation and monads. Information and Computation 93, 55–92 (1991) 11. Panangaden, P.: Probabilistic relations. In: Baier, C., Huth, M., Kwiatkowska, M., Ryan, M. (eds.) Proc. PROBMIV, pp. 59–74 (1998) 12. Parthasarathy, K.R.: Probability Measures on Metric Spaces. Academic Press, New York (1967) 13. Schr¨oder, L., Pattinson, D.: Modular Algorithms for Heterogeneous Modal Logics. In: Arge, L., Cachin, C., Jurdzi´nski, T., Tarlecki, A. (eds.) ICALP 2007. LNCS, vol. 4596, pp. 459– 471. Springer, Heidelberg (2007) 14. Srivastava, S.M.: A Course on Borel Sets. In: Graduate Texts in Mathematics, Springer, Berlin (1998) 15. Zhou, C.: Complete Deductive Systems for Probabilistic Logic with Application to Harsany Type spaces. PhD thesis, Department of Mathematics, University of Indiana (2007)
Approximating Border Length for DNA Microarray Synthesis Cindy Y. Li1 , Prudence W.H. Wong1 , Qin Xin2 , and Fencol C.C. Yung3 1
3
Department of Computer Science, University of Liverpool, UK {cindyli,pwong}@liverpool.ac.uk 2 Simula Research Lab, Norway [email protected] Department of Computer Science, University of Hong Kong, Hong Kong [email protected]
Abstract. We study the border minimization problem (BMP), which arises in microarray synthesis to place and embed probes in the array. The synthesis is based on a light-directed chemical process in which unintended illumination may contaminate the quality of the experiments. Border length is a measure of the amount of unintended illumination and the objective of BMP is to find a placement and embedding of probes such that the border length is minimized. The problem is believed to be NP-hard. In this paper we show that BMP admits √ an O( n log2 n)-approximation, where n is the number of probes to be synthesized. In the case where the placement is given in advance, we show that the problem is O(log2 n)-approximable. We also study a related problem called agreement maximization problem (AMP). In contrast to BMP, we show that AMP admits a constant approximation even when placement is not given in advance.
1 Introduction DNA microarrays [9] have become a very important research tool which have proved to benefit areas including gene discovery, disease diagnosis, and multi-virus discovery. They are used for performing a large number of hybridization experiments simultaneously. Besides their prevalent use to measure the amount of gene expression [21] in a cell, microarray is an efficient tool for making a qualitative statement about the presence or absence of biological target sequences in a sample. A DNA microarray (“chip”) is a plastic or glass slide which consists of thousands of (about 60,000) short DNA sequences known as probes. DNA microarray design raises a number of challenging combinatorial problems, such as probe selection [10,14,18,22], deposition sequence design [17,19] and probe placement and synthesis [12,3,4,5,15,16]. In this paper, we focus on the probe placement and synthesis problem. Probes are synthesized on the microarray through the process called very large-scale immobilized polymer synthesis (VLSIPS) [8]. In each step, light is selectively allowed through a mask to expose spots in the microarray in order to activate the nucleotides in the spots. The patterns of the masks used and the sequence of the deposition nucleotides in the illumination define the ultimate sequence of nucleotides of the array spot. A mask consists of masked (blocking light) and unmasked (allowing light) regions and induces M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 410–422, 2008. c Springer-Verlag Berlin Heidelberg 2008
Approximating Border Length for DNA Microarray Synthesis
AC
TA
CT
CA
C
M4
C
A
M3
A
411
A A
unmasked region
T
M2
masked region
C
M1
T T
C
C
Fig. 1. Synthesis of a 2 × 2 microarray. The deposition sequence D = CTAC corresponds to the sequence of four masks M1 , M2 , M3 , and M4 . The masked regions are shaded. The borders between the masked and unmasked regions are represented by bold lines.
deposition of a particular nucleotide (A, C, G or T) at its exposed array spots. The deposition sequence D corresponding to the sequence of masks is a supersequence of all probes in the array (see example in Figure 1). DNA microarray synthesis consists of two components, namely probe placement and probe embedding. Given a set of probes to be synthesized, probe placement is to place each probe to a unique spot in the microarray and probe embedding is the sequence of masked and unmasked steps used in the synthesis. For example, in Figure 2, the deposition sequence is (ACGT)3 and the sequence (a) A(−)4 C(−)5 T is a possible embedding of the probe ACT, where “ − ” represents a space. We distinguish two types of synthesis, namely, synchronous and asynchronous synthesis. In synchronous synthesis, each deposition nucleotide can only be deposited to the i-th position of the probes for a particular i. In asynchronous synthesis, there is no such restriction, allowing arbitrary embeddings. For example, Figure 1 shows an asynchronous synthesis in which M2 deposits a nucleotide to the second position of the sequence CT and the first position of TA. Asynchronous synthesis is more flexible, yet more difficult to optimize. In this paper we focus on asynchronous synthesis. Due to diffraction, internal reflection and scattering, spots on the border between masked and unmasked regions are often subject to unintended illumination [8]. This uncertainty produces unpredicted probes that can compromise experimental results. As microarray chip is expensive to synthesize, it is usual that as many probes as possible are placed in a chip (i.e., as many entries are used), while unintended illumination has to be minimized. The magnitude of unintended illumination can be measured by the border length of the masks used, which is the number of borders shared between masked and unmasked regions, e.g., in Figure 1, the border length of M1 , M3 , M4 is 2 and M2 is 4. To reduce the amount of unintended illumination, one can exploit freedom in placing probes in the microarray during probe placement and choosing different probe embeddings. The Border Minimization Problem (BMP) [12] is to find a placement of the probes on the microarray together with their embeddings in such a way that the sum of border lengths over all masks is minimized. It has been stated in [3,4] that the problem is believed to be NP-hard because of the exponential number of possible placements
412
C.Y. Li et al. p = ACT
(a) (b) (c) (d)
D ε1
A C G T A C G T A C G T
ε2
A C
ε3 ε4
A
C
A
T T C
A C
T
T
Fig. 2. Different embeddings of probe p = ACT into deposition sequence D = (ACGT)3
(although we are not aware of an NP-hardness proof). For this reason, we focus on approximation algorithms for BMP in this paper. Previous work. The BMP problem has attracted a lot of attention [12,3,4,5,15,16] and most work is experimental in nature. As far as we know, no polynomial time approximation algorithm is known for BMP with non-trivial performance guarantee. BMP was first formally defined by Hannenhalli et al. [12]. They focused on synchronous synthesis and the only concern becomes probe placement. Their algorithm computes an approximated travelling salesman path (TSP) in the complete graph with nodes representing probes and edge costs representing the Hamming distance between the probes. The TSP path is then placed on the microarray in a certain way called threading. Experiments shows that threading is effective in reducing border length. Since then, other algorithms [15,16,4] have been proposed to improve the experimental results. Asynchronous probe embedding was introduced by Kahng et al. [15]. They studied a special case that the deposition sequence D is given and the embeddings of all but one probes are known. A polynomial time dynamic programming algorithm was proposed to compute the optimal embedding of this single probe whose neighbors are already embedded. This algorithm is used as the basis for several heuristics [3,4,5,15,16] that are shown experimentally to reduce unintended illumination in terms of border length. On the other hand, there are few theoretical results. In [15], lower bounds on the total border length for synchronous and asynchronous BMP problem were given, which are based on Hamming distance, and Longest Common Subsequence (LCS), respectively. The asynchronous dynamic programming mentioned above computes the optimal embedding of a single probe in time O(|D|), where is the length of a probe and D is the deposition sequence. The algorithm can be extended to an exponential time algorithm to find the optimal embedding of all n probes in O(2n n |D|) time. Our contribution. In this paper, we study approximation of BMP in asynchronous synthesis. √ This is the first result with proved performance guarantee. The main result is an O( n log2 n)-approximation, where n is the number of probes in the microarray. This is based on an approximation algorithm for the variant when the placement of probes is given in advance (called P-BMP problem). We show that P-BMP is O(log2 n)approximable. We further show that if the array is one-dimensional, P-BMP can be solved optimally in polynomial time and there is a constant approximation for BMP. On the other hand, we show that BMP can be defined as the maximum agreement problem (AMP) with a different objective called “agreement”. Minimizing the border length is
Approximating Border Length for DNA Microarray Synthesis
413
equivalent to maximizing the agreement. Yet we are able to devise O(1)-approximation algorithms for AMP regardless of whether the placement is given in advance or not. Organization of the paper. In Section 2, we give some definitions and notations. In Sections 3 and 4, we present and analyze approximation algorithms for BMP and AMP, respectively. Finally we give a conclusion and discuss future work in Section 5.
2 Preliminaries √ √ We are given a set of n length- √ probes P = {p1 , p2 , . . . , pn }, a n × n array (for simplicity, we assume that n is an integer). For any sequence pi , we denote √ the t-th √ character of a sequence pi by pi [t]. The probes in P are to be placed on the n × n array. We represent this array by a grid graph G = (V, E). Two grid vertices (x1 , y1 ) and (x2 , y2 ) are said to be neighbor if |x1 − x2 | + |y1 − y2 | = 1. For each vertex v ∈ V , we denote the set of neighbors of v by N (v). Placement and embedding. A placement of the probes is a bijective function φ : P → V that maps each probe to a unique vertex in the grid G. An embedding of a set of probes P into a deposition sequence D is denoted by ε = {ε1 , ε2 , . . . , εn }. For 1 ≤ i ≤ n, εi is a length-|D| sequence such that (1) εi [t] is either D[t] or a space “ − ”; and (2) removing all spaces from εi gives pi . The hamming distance between εi and εj measures the border length between pi and pj if they are neighbors in a certain placement. We define this quantity as the conflict between the embeddings of pi and pj , denoted by confε (pi , pj ). Note that confε (pi , pj ) ≤ 2. We define the share between the embeddings of pi and pj as 2 − confε (pi , pj ), and denote it by shareε (pi , pj ). Border length and agreement. The border length of a placement φ and an embedding ε is defined as the sum of conflicts between the embeddings of probes that are neighbors in the placement φ in G: 1 BL(φ, ε) = confε (pi , pj ) . (1) 2 pi , pj : φ(pj ) ∈ N (φ(pi ))
The objective of the BMP problem is to find a placement φ and an embedding ε, so that BL(φ, ε) is minimized. We denote the optimal placement and the corresponding optimal embedding by φ∗ and ε∗ , respectively. We further define the counter part of border length, the agreement, which is the sum of shares between the embeddings of probes that are neighbors in the placement φ in G: 1 A(φ, ε) = shareε (pi , pj ) (2) 2 pi , pj : φ(pj ) ∈ N (φ(pi ))
The Maximum Agreement Problem (AMP) is to find a placement φ and an embedding ε, √ so that A(φ, ε) is maximized. Since A(φ, ε) = 4(n − n) − BL(φ, ε) , minimizing the border length BL(φ, ε) is equivalent to maximizing the agreement A(φ, ε). Common subsequence and common supersequence. The border length is closely related to the common subsequence and common supersequence between neighboring
414
C.Y. Li et al.
sequences in the placement. Consider any two length- sequences p, q. We denote the longest common subsequence and shortest common supersequence of two sequences p and q by LCS(p, q) and SCS(p, q), respectively, and the corresponding length as |LCS(p, q)| and |SCS(p, q)|, respectively. SCS(p, q) can be obtained by finding LCS(p, q) and inserting into p the characters in q that are not in LCS(p, q) while preserving the order in q. Therefore, |SCS(p, q)| = 2 − |LCS(p, q)|. For any embedding ε, the maximum number of common deposition nucleotides between p and q is |LCS(p, q)|, in other words, confε (p, q) ≥ 2( − |LCS(p, q)|) and shareε (p, q) ≤ 2|LCS(p, q)|. We define the LCS distance to be 2( − |LCS(p, q)|), denoted by dist(p, q). In other words, dist(p, q) is a lower bound of confε (p, q) for any embedding ε. Multiple sequence alignment (MSA) and Weighted MSA (WMSA). As we will see in later sections, a variant of BMP problem, named P-BMP (BMP problem in which the placement is given), can be polynomial time reducible to WMSA. As a consequence, we can apply the approximation results on WMSA to P-BMP, which we can further use as a building block for the approximation for BMP. We first review the MSA and WMSA problems. MSA and WMSA have been studied extensively [2,7,11,20]. Let Σ be the set of characters and S = {S1 , S2 , . . . , Sk } be a set of k sequences, with maximum length m, over Σ. An alignment of S is a matrix S = (S1 , S2 , . . . , Sk ) such that |Si | = m and Si is formed by inserting spaces function δ(a, b) where a, b ∈ Σ ∪ {−}, the pair-wise score into Si . For a given distance of Si and Sj is defined as 1≤y≤m δ(Si [y], Sj [y]). Given a weight function w(i, j) sum-of-pair (SP) score SP(S , w) = forthe pair of sequences Si and Sj , the weighted 1 1≤i,j≤k w(i, j) 1≤y≤m δ(Si [y], Sj [y]). The WMSA problem is to find an align2 ment S such that SP(S , w) is minimized. WMSA has been proved to be NP-complete. An O(log2 n)-approximation algorithm [23] has been given via a reduction to the minimum routing cost tree problem (MRCT) [1]. Minimum routing cost tree problem (MRCT). In this problem, a graph with weighted edges is given. For a spanning tree of the graph, the routing cost between two vertices is the sum of weights of the edges on the unique path between the two vertices in the spanning tree. The routing cost of the spanning tree is defined as the sum of routing cost between every pair of two vertices. The MRCT problem is to find a spanning tree whose routing cost is minimum. The results in [1] state that there is a polynomial time reduction from WMSA to MRCT. Each sequence in the input of WMSA corresponds to a vertex in the input graph of MRCT. The edge weight between two vertices is set to be the weighted edit distance between the two corresponding sequences. The reduction result states that (1) there is a routing spanning tree T whose routing cost is at most O(log2 n) times i,j w(i, j)d(i, j), where d(i, j) is the edit distance between the two sequences i and j; and (2) there is an alignment S whose SP(S , w) is at most the routing cost of T . Note that i,j w(i, j)d(i, j) is a lower bound on the weighted SP score. Therefore, the following lemma follows. Lemma 1. [23] There is an O(log2 n)-approximation algorithm for the WMSA problem, where n is the number of sequences to be aligned.
Approximating Border Length for DNA Microarray Synthesis
415
3 The BMP Problem In this section, we study the BMP problem. We are to find a placement and an embed√ ding for the given probe set. An O( n log2 n)-approximation algorithm is given for BMP (Section 3.2), which is based on an approximability result for a variant of BMP, named P-BMP (Section 3.1). At the end of this section, we also discuss the case when the array is one-dimensional and we show that BMP admits better results in this case. 3.1 P-BMP: Finding Embedding When Placement Is Given In this section, we study the P-BMP problem, a variant of BMP with a placement given in advance. The concern becomes to find an embedding. We show that P-BMP is O(log2 n)-approximable by giving a reduction to the weighted multiple sequence alignment problem (WMSA), for which there is an O(log2 n)-approximation algorithm [23]. Lemma 2. There is a polynomial time reduction from P-BMP to WMSA. Proof. Let I be an instance of the P-BMP problem with a given placement φ. We construct an instance I for WMSA such that there is a solution for I with border length X if and only if there is a solution for I with a weighted SP score of X. Construction of I . We first show the construction of I . The input sequence for WMSA is the same as the input probe set P. The weight w(i, j) is defined as follows: 1 if φ(pj ) ∈ N (φ(pi )), w(i, j) = 0 otherwise. The distance function δ(a,⎧b), for a, b ∈ Σ ∪ {−}, is defined as follows: ⎪ ⎨0 if a = b, δ(a, b) = 1 if a = b and (a = “ − ” or b = “ − ”), ⎪ ⎩ ∞ otherwise. Note that the edit distance of pi and pj in WMSA is the same as dist(pi , pj ) in BMP. Solution for I implies solution for I . Suppose we have an embedding ε for I. Note that ε = {ε1 · · · εn } is an alignment for score of εi and εj equals P and the pairwise confε (pi , pj ). So, SP(P , w) = 12 1≤i,j≤n w(i, j) 1≤y≤|D| δ(εi [y], εj [y]) = 1 1 1≤i,j≤n w(i, j)confε (pi , pj ) = 2 pi ,pj :φ(pj )∈N (φ(pi )) confε (pi , pj ) = BL(φ, ε). 2 The second last equality is due to the definition of w(i, j), which is based on φ. Solution for I implies solution for I. On the other hand, suppose we have a solution for I , i.e., an alignment P = (p1 · · · pn ) for P and |pi | = m , for some m . In the alignment P , each column contains the same character or “ − ” because of the definition of the distance function δ(a, b). We denote the resulting matrix as ε = (ε1 · · · εn ). It can be seen that ε is an embedding for P and the hammingdistance between εi and εj equals the pair-wise score of pi and pj . Then BL(φ, ε) = 12 pi ,pj :φ(pj )∈N (φ(pi )) confε (pi , pj ) = 1 1 pi ,pj :φ(pj )∈N (φ(pi )) 1≤y≤|D| δ(pi [y], pj [y]) = 1≤i,j≤n w(i, j) 1≤y≤|D| 2 2 δ(pi [y], pj [y]) = SP(P , w). Note that the second last equality holds for the same reason as above. Therefore, the lemma follows.
Corollary 1. The P-BMP problem is O(log2 n)-approximable.
416
C.Y. Li et al.
Fig. 3. Row-by-row threading of a TSP (solid edges) on a grid. Solid and dotted edges connect neighbors in the placement that are and are not, respectively, neighbors on the TSP.
3.2 BMP: Finding Placement and Embedding In this section, we study the BMP problem √ in which we are to find both the placement as well as the embedding. We give an O( n log2 n)-approximation, which makes use of the approximability result for P-BMP (Section 3.1). To make use of the result for P-BMP, we need a certain placement, the choice of which is guided by some travelling salesman path (TSP) on a particular graph (to be defined). Note that finding the minimum TSP is NP-hard, yet there is a polynomial time O(1)-approximation [6]. The algorithm P LACE &E MBED. The approximation algorithm P LACE &E MBED is shown in Algorithm 1. The graph Gc constructed in the algorithm is a weighted complete graph with vertices representing P and edge weight representing dist() between the two vertices. A travelling salesman path (TSP) is obtained from Gc , which we “thread” on the grid G in a row-by-row fashion to form a placement [12]: the TSP is placed from left to right on the first row, right to left on the second, and then alternate in the same way in the remaining rows (see Figure 3 for an example). We then employ the approximation algorithm in Section 3.1. We denote the placement and embedding computed by P LACE &E MBED as φ˜ and ε˜, respectively. Algorithm 1. P LACE &E MBED: Approximation algorithm for BMP.
√ √ Input: Probe set P = {p1 , p2 , . . . , pn } to be placed on a n × n array. Output: A placement φ˜ and an embedding ε˜ for P. 1: Construct the weighted complete graph Gc . ˜ for Gc using algorithm in [6]. 2: Find an approximate TSP Q ˜ ˜ 3: Thread Q in a row-by-row fashion to obtain a placement φ. 4: Run the approximation algorithm for P-BMP in Section 3.1 (i.e., by reducing the P-BMP instance to an WMSA instance) to obtain an embedding ε˜.
√ Theorem 1. Algorithm P LACE &E MBED is an O( n log2 n)-approximation for BMP. To analyze the performance of P LACE &E MBED, we need some notations. Recall that we define for any sequences p, q, dist(p, q) = 2( − |LCS(p, q)|). We overload the notation dist() for any subgraph of Gc . For any subgraph H of Gc , we define the LCS distance of H, denoted by dist(H), to be the sum of LCS distances of neighboring probes in H, i.e., dist(H) = 12 p, q : q ∈ N (p) in H dist(p, q). As mentioned before in Section 2, dist(p, q) is the minimum conflict between probes p and q. Yet the embeddings needed to achieve dist(p, q) may not be compatible with
Approximating Border Length for DNA Microarray Synthesis
417
each other in a particular placement. For example, consider the placement φ in Figure 1, dist(φ) = 8 since dist(p, q) = 2 for every neighboring pair p, q. Yet the minimum border length is 10 with CTAC as the deposition sequence, and embeddings (− − AC, −TA−, CT − −, C − A−). We summarize this as follows. Observation 1. Given a placement φ, dist(φ) ≤ BL(φ, ε), for any embedding ε. Observation 1 implies that for the optimal placement φ∗ and embedding ε∗ , dist(φ∗ ) ≤ BL(φ∗ , ε∗ ). To approximate BMP, it suffices to bound the border length by dist(φ∗ ). On the other hand, we make an observation about a graph H1 and its subgraph H2 . The observation is true since any neighbors in H2 are also neighbors in H1 . Observation 2. Consider any graph H1 and a subgraph H2 of it. dist(H2 ) ≤ dist(H1 ). Corollary 2. Suppose Q∗ is the optimal TSP for Gc . Then, we have dist(Q∗ ) ≤ dist(φ∗ ). Proof. φ∗ can be viewed as threading a TSP Q in a row-by-row fashion. By Observation 2, dist(Q) ≤ dist(φ∗ ). As Q∗ is the optimal TSP, dist(Q∗ ) ≤ dist(Q) ≤ dist(φ∗ ).
˜ ≤ It is known that TSP can be approximated by 3/2 (Lemma 3). So, dist(Q) ∗ 3 dist(Q )/2. Lemma 3. [6] The travelling salesman problem admits a 3/2-approximation if the weight satisfies the triangle inequality. ˜ ≤ 2√n dist(Q); ˜ ε˜) ≤ O(log2 n) dist(φ). ˜ ˜ and (ii) BL(φ, Lemma 4. (i) dist(φ) ˜ = {u1 , u2 , . . . , un }. Note that the LCS distance dist() Proof (Sketch). (i) Suppose Q satisfies the triangular inequality, i.e., dist(ui , uj ) ≤ i≤k<j dist(uk , uk+1 ). Neigh˜ ˜ boring probes on Q are also neighbors in φ but not vice√versa. For any two probes ui ˜ we have 1 ≤ |j − i| < 2 n. When we sum up dist(φ), ˜ and uj which are neighbors in φ, √ k, may be counted more than once, but no more than 2 n times. dist(uk , uk+1 ), for any√ ˜ ≤ 2 n dist(Q). ˜ Therefore, dist(φ) (ii) In Step 4 of P LACE &E MBED, we reduce the P-BMP instance with φ˜ as the placement to an WMSA instance. Lemma 2 asserts that the border length of the embedding obtained is the same as the weighted SP score of the alignment. Furthermore, we have seen in Section 2 that approximation for WMSA can be found by the approximation for MRCT and the resulting routing tree has a routing cost, and thus, the weighted SP score, at most O(log2 n) times the total weighted edit distance in WMSA. In the proof of Lemma 2, we note that the weighted edit distance of two sequences is the same as ˜ ˜ ε˜) ≤ O(log2 n) dist(φ).
dist() of the two sequences. So, BL(φ, ˜ ε˜) ≤ O(√n log2 n) Proof (Theorem 1). By Lemmas 4, 3, and Corollary 2, we have BL(φ, √ √ 2 2 ∗ ∗ ˜ ≤ O( n log n) dist(Q ) ≤ O( n log n) dist(φ ) . Furthermore, Observadist(Q) tion 1 holds for all placements, and hence for φ∗ , in other words, dist(φ∗ ) ≤ BL(φ∗ , ε∗ ). √ ˜ ε˜) ≤ O( n log2 n) BL(φ∗ , ε∗ ). Therefore, BL(φ,
418
C.Y. Li et al. p
1 0 0 01 1 0 1 00 1 11 0 00 11 0 00 1 11 0 1 0 1 0 1 x1
S
x2
y1
q
new S
11 00 00 11
1 0 0 0 1 1 0 1 00 11 0 1 00 11 0 1 00 11 0 1 0 1 00 0 011 1 00 1 11 0 1 x4
x3
y2
1 0 0 1
y3
y4
1 0 0 1
1 0 0 1
Fig. 4. An illustration of E XTEND. Shaded squares refer to characters in LCS(p, q). Characters in q but not in LCS(p, q) are inserted into S so that the order preserves as in q (see the arrows).
3.3 One Dimensional Array In this section, we study the special case on an 1D array. Intuitively, the problem is easier than the 2D case. We show that P-BMP on an 1D array can be solved optimally in polynomial time while BMP on an 1D array admits an O(1)-approximation. P-BMP on 1D array. The algorithm E MBED 1 D shown in Algorithm 2 makes use of a procedure called E XTEND. E XTEND takes two sequences p and q, and a supersequence S of p as input and returns a supersequence of S and q. Let c = |LCS(p, q)|, x1 , x2 , . . . , xc be the indices of S corresponding to p that belongs to LCS(p, q), and y1 , y2 , . . . , yc be the indices of q that belongs to LCS(p, q). E XTEND then extends S by inserting characters in q but not in LCS(p, q): characters between q[yk−1 ] and q[yk ] are inserted right before S[xk ] and characters beyond q[yc ] are appended to the end of S. E XTEND keeps track of the indices of the new S that correspond to q (see Figure 4). Algorithm 2. E MBED 1 D: Optimal embedding for P-BMP on 1D array. Input: Probe set P = {p1 , p2 , . . . , pn }, placed on a 1D array in that order. Output: An embedding ε with minimum border length. 1: Set D = p1 . 2: For i > 1, call the procedure E XTEND with pi−1 , pi and D as the input to obtain a new D. 3: For each pi , set εi such that ε[y] = D[y] if D[y] corresponds to a character in pi kept track by E XTEND, and ε[y] = “ − ” otherwise.
Theorem 2. E MBED 1 D finds an optimal embedding for the P-BMP problem on 1D array in polynomial time. Proof. We first observe that D constructed in each iteration by E XTEND is a common supersequence of p1 , . . . , pi . This is clear from the way E XTEND finds LCS(pi−1 , pi ) and inserts characters into D. It also implies that the number of nucleotides shared by pi−1 and pi is maintained as |LCS(pi−1 , pi )|, which is the maximum possible. Note that this property does not change by later steps. Hence, the border length of the final embedding is the minimum. As for time complexity, the bottleneck is finding the longest common subsequences of two sequences, which is known to take polynomial time [13]. This is done for n−1 times only. Therefore, E MBED 1 D also takes polynomial time.
Approximating Border Length for DNA Microarray Synthesis
419
BMP on 1D array. Similar to the case on 2D array, we find a placement by finding an approximate TSP on the weighted complete graph Gc and then find an embedding by E MBED 1 D. This algorithm gives a 3/2-approximation for BMP on 1D array. Theorem 3. There is a polynomial time algorithm for BMP on 1D array with approximation ratio 3/2.
4 The Maximum Agreement Problem (AMP) In this section, we study the counter part of BMP, which we called maximum agreement problem (AMP) (recall definition in Section 2). In contrast to BMP, AMP admits constant approximations, whether the placement is given in advance or not. 4.1 Approximation for P-AMP We first study the P-AMP problem, a variant of AMP with a placement already given. Algorithm AE MBED. The algorithm AE MBED (E MBED for Agreement) makes use of procedure E XTEND in Section 3.3. The order of probes to be considered is determined by a certain tree T with the bottom rightmost probe in G being the root. To construct T , for each probe p, we assign a parent to the probe, denoted by parent(p). We denote by r(p) and b(p) the right and bottom neighbors of probe p, respectively. The probes in the rightmost column and bottommost column has r(p) = NULL and b(p) = NULL, respectively. We set parent(p) to r(p) or b(p) depending on whether |LCS(p, r(p))| or |LCS(p, b(p))| is larger. Details of AE MBED is shown in Algorithm 3. The embedding found is denoted by εˆ. Figure 5 shows an example. Analysis. To analyze the performance of AE MBED, we first observe that in the final embedding εˆ, the number of nucleotides shared by a probe and its parent equals to the length of their LCS (by a similar argument as the proof of Theorem 2). We then bound the performance of AE MBED as follows.
ACT 2 GAC
2
2
2 TAC
1
2 CTG 1 GCC 2 GCT
0
2
AAT 1
CAT
C C
GCT
AAT GCC CTG
TAC GAC
CTT ACT
(a)
C C
CTT CAT 2
(b)
G G G A G A T A
C C C C
A A A A
C A
T A A T C A A T C A A T C A A T C A A T C A A T C A A T C A A T
T T T T G T G T G T G C T G C T G C
(c)
Fig. 5. (a) A set of probes placed on a 3 × 3 grid G. The values represent the length of LCS between the two neighboring probes. An arrow from p to q means parent(p) = q. (b) The tree constructed by AE MBED with root CTT. (c) How the deposition sequence D changes iteratively. The sequences are drawn in a way the characters align with the final D.
420
C.Y. Li et al.
Algorithm 3. AE MBED: Approximate algorithm for P-AMP.
√ √ Input: Probe set P = {p1 , p2 , . . . , pn } placed on a n × n array according to a placement φ. Output: An embedding εˆ for P. 1: Construct a tree T by assigning parent to each probe p: if |LCS(p, r(p))| ≥ |LCS(p, b(p))| set parent(p) = r(p) else set parent(p) = b(p). 2: Set D to be the bottom rightmost probe in the grid G. 3: Traverse T in a pre-order fashion: for each probe p traversed, call the procedure E XTEND with parent(p), p and D as input. 4: For each pi , set εˆi such that εˆ[y] = D[y] if D[y] corresponds to a character in pi kept track by E XTEND, and εˆ[y] = “ − ” otherwise.
Theorem 4. AE MBED is a polynomial-time 2-approximation algorithm for P-AMP. ∗ Proof. For the given placement φ and the optimal embedding ε , the optimal agreement ∗ is: A(φ, ε ) = p∈P (shareε∗ (p, r(p)) + shareε∗ (p, b(p))). We assume shareε∗ (p, q) = 0 if q = NULL. As mentioned in Section 2, for any embedding, the share between the embeddings of probes p, q is at most 2|LCS(p, q)|. Thus, 2|LCS(p, r(p))| ≥ shareε∗ (p, r(p)) and 2|LCS(p, b(p))| ≥ shareε∗ (p, b(p)). Note that shareεˆ(p, parent(p)) = 1 2 max{ |LCS(p, r(p))|, r(p)) + shareε∗ (p, b(p))). |LCS(p, b(p))|} ≥ 2 (shareε∗ (p, 1 ∗ Therefore, A(φ, εˆ) = p∈P shareεˆ(p, parent(p)) ≥ 2 A(φ, ε ). Finally, AE MBED runs in polynomial time as the bottleneck is finding LCS between two sequences.
4.2 Approximation for AMP In this section, we study the general AMP problem to find both the placement and the embedding to maximize the agreement. We prove that the algorithm AP LACE &E MBED as shown in Algorithm 4 has an asymptotic approximation ratio of 4. Algorithm 4. AP LACE &E MBED: Approximation algorithm for AMP.
√ √ Input: Probe set P = {p1 , p2 , . . . , pn } to be placed on a n × n array. Output: A placement φˇ and an embedding εˇ for P. 1: Partition P into four disjoint groups A, C, G and T : a probe belongs to A if the number of A in the probe is the maximum over the number of other characters (similarly for C, G and T ). 2: Thread the probes in group A on the array in a row-by-row fashion, followed by threading of ˇ probes in C, G, and T to form the placement φ. 3: For probes in A, align them such that the maximum number of A are aligned while different characters are not aligned. This forms a partial embedding εˇa with deposition sequence Da . Similarly, find εˇc , εˇg , εˇt and Dc , Dg , Dt . 4: Combine Da , Dc , Dg , and Dt to form D (append one after the other). 5: Extend the embeddings εˇa , εˇc , εˇg , εˇt according to D by inserting “ − ” in the columns corresponding to other groups. The union of the extended embeddings is the resulting embedding εˇ.
Approximating Border Length for DNA Microarray Synthesis
421
Theorem 5. The asymptotic approximation ratio of AP LACE &E MBED is 4. Proof. Consider the optimal placement φ∗ and embedding ε∗ . For √ every pair of neighboring probes p, q, shareε (p, q) ≤ 2. There are a total of 2(n − n) pairs √ of neighbors on the grid in total. So, the optimal agreement A(φ∗ , ε∗ ) ≤ 4(n − n). On the other hand, consider φˇ and εˇ returned by AP LACE &E MBED. According to the way we partition the probes into group, for any two probes p, q in a group, the number of nucleotides that can be shared is at least /4.√Hence, shareεˇ(p, q) ≥ 2(/4) = /2. As we seen above, there are altogether 2(n − n) pairs of neighbors in the grid. We may not share any nucleotide when the pair belongs √ According to the √ to different groups. way we thread the groups, there are at most 3 n + 3 such pairs ( n pairs of vertical neighbors between consecutive groups and 3 pairs of neighbors that are the last√one in a group and the first one in the next group). As a result, we have at least 2n − 5 n − 3 ˇ εˇ) ≥ (n − 2.5√n − 1.5). Then pairs each with shareεˇ() at least /2. Therefore, A(φ, ˇ εˇ)/A(φ∗ , ε∗ ) tends to 4 as A(φ∗ , ε∗ ) tends to infinity. So, the asymptotic approxA(φ,
imation ratio of AP LACE &E MBED is 4.
5 Concluding Remarks To summarize, we study the border minimization problem which is believed to be NPhard with no known NP-hardness proof. An open question is to derive an NP-hardness proof. Another interesting open question is to improve the approximation ratio and/or derive inapproximability result. As mentioned before, there is an exponential time algorithm to compute the optimal BMP solution. Improving the exponential time algorithm could be useful in practice and is of theoretical interest.
References 1. Bartal, Y.: Probabilistic approximation of metric spaces and its algorithmic applications. In: Proc. 37th FOCS, pp. 184–193 (1996) 2. Bonizzoni, P., Vedova, G.D.: The complexity of multiple sequence alignment with SP-score that is a metric. Theoretical Computer Science 259(1–2), 63–79 (2001) 3. Carvalho Jr., S.A., Rahmann, S.: Improving the layout of oligonucleotide. microarrays: Pivot partitioning. In: Proc. 6th WABI, pp. 321–332 (2006) 4. Carvalho Jr., S.A., Rahmann, S.: Microarray layout as quadratic assignment problem. In: Proc. GCB, pp. 11–20 (2006) 5. Carvalho Jr., S.A., Rahmann, S.: Improving the design of genechip arrays by combining placement and embedding. In: Proc. 6th CSB, pp. 54–63 (2007) 6. Christofides, N.: Worst-case analysis of a new heuristic for the travelling salesman problem. Technical Report 388, Graduate School of Industrial Administration, Carnegie-Mellon University, Pittsburgh, PA (ND33) (1976) 7. Feng, D.F., Doolittle, R.F.: Approximation algorithms for multiple sequence alignment. Theoretical Computer Science 182(1), 233–244 (1987) 8. Fodor, S., Read, J.L., Pirrung, M.C., Stryer, L., Lu, A.T., Solas, D.: Light-directed, spatially addressable parallel chemical synthesis. Science 251(4995), 767–773 (1991) 9. Gerhold, D., Rushmore, T., Caskey, C.T.: DNA chips: promising toys have become powerful tools. Trends in Biochemical Sciences 24(5), 168–173 (1999)
422
C.Y. Li et al.
10. Gasieniec, ˛ L., Li, C.Y., Sant, P., Wong, P.W.H.: Randomized probe selection algorithm for microarray design. Journal of Theoretical Biology 248(3), 512–521 (2007) 11. Gusfield, D.: Efficient methods for multiple sequence alignment with guaranteed error bounds. Bulletin of Mathematical Biology 55(1), 141–154 (1993) 12. Hannenhalli, S., Hubell, E., Lipshutz, R., Pevzner, P.A.: Combinatorial algorithms for design of DNA arrays. Advances in Biochemical Engineering/Biotechnology 77, 1–19 (2002) 13. Hirschberg, D.S.: A linear space algorithm for computing maximal common subsequences. Communications of the ACM 18(6), 341–343 (1975) 14. Kaderali, L., Schliep, A.: Selecting signature oligonucleotides to identify organisms using DNA arrays. Bioinformatics 18, 1340–1349 (2002) 15. Kahng, A.B., Mandoiu, I.I., Pevzner, P.A., Reda, S., Zelikovsky, A.: Scalable heuristics for design of DNA probe arrays. Journal of Computational Biology 11(2/3), 429–447 (2004) 16. Kahng, A.B., Mandoiu, I.I., Reda, S., Xu, X., Zelikovsky, A.: Computer-aided optimization of DNA array design and manufacturing. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 25(2), 305–320 (2006) 17. Kasif, S., Weng, Z., Detri, A., Beigel, R., DeLisi, C.: A computational framework for optimal masking in the synthesis of oligonucleotide microarrays. Nucleic Acids Research 30(20), e106 (2002) 18. Li, F., Stormo, G.: Selection of optimal DNA oligos for gene expression arrays. Bioinformatics 17(11), 1067–1076 (2001) 19. Rahmann, S.: The shortest common supersequence problem in a microarray production setting. Bioinformatics 19(suppl. 2), 156–161 (2003) 20. Reinert, K., Lenhof, H.P., Mutzel, P., Mehlhorn, K., Kececioglu, J.D.: A branch-and-cut algorithm for multiple sequence alignment. In: Proc. 1st RECOMB, pp. 241–250 (1997) 21. Slonim, D.K., Tamayo, P., Mesirov, J.P., Golub, T.R., Lander, E.S.: Class prediction and discovery using gene expression data. In: Proc. 4th RECOMB, pp. 263–272 (2000) 22. Sung, W.K., Lee, W.H.: Fast and accurate probe selection algorithm for large genomes. In: Proc. 2nd CSB, pp. 65–74 (2003) 23. Wu, B.Y., Lancia, G., Bafna, V., Chao, K.M., Ravi, R., Tang, C.Y.: A polynomial-time approximation scheme for minimum routing cost spanning trees. SIAM Journal on Computing 29(3), 761–778 (1999)
On a Question of Frank Stephan Klaus Ambos-Spies, Serikzhan Badaev, and Sergey Goncharov 1
University of Heidelberg, Im Neuenheimer Feld 294, D-69120 Heidelberg, Germany [email protected] http://www.math.uni-heidelberg.de/logic/personen.html 2 Kazakh National University, 71 Al-Farabi ave., Almaty 050038, Kazakhstan [email protected] 3 Sobolev’s Math. Institute, 4 Koptyug ave., Novosibirsk 630090, Russia [email protected] http://mmfd.nsu.ru/mmf/persons/gonchar/
Abstract. For many TxtEX-learnable computable families of recursively enumerable sets, all their computable numberings are equivalent with respect to the reduction via the functions recursive in the halting problem. We show that this holds for every TxtEX-learnable computable family of recursively enumerable sets, but, in general, the converse is not true. Keywords: computable family of sets, TxtEX learning, equivalent numberings.
1
Introduction
It is well-known (see [1]) that computable classes of finite sets, finite classes of recursively enumerable sets, and some classes of the graphs of recursive functions are TxtEX-learnable. On the other hand, the computable numberings of any of these classes are pairwise equivalent with respect to the reduction by 0 -recursive functions. It might be that these observations led Frank Stephan to propose the following conjecture to one of the authors of this paper: For every computable family A of r.e. sets, the following are equivalent. (i) A is TxtEX-learnable. (ii) All computable numberings of A are 0 -equivalent. Our aim is to show that one of the directions of this statement is true (namely the (easy) direction (i) ⇒ (ii)) while the converse fails. We follow the monograph [4] of Yu.L. Ershov in Russian and the survey papers [2], [3] for the terminology and notations accepted in the theory of numberings. A mapping α : N −→ L of the set N of natural numbers onto a family L of recursively enumerable (r.e. for short) sets is called a computable numbering of L if the set {x, n : x ∈ α(n)} is r.e., and a family L of subsets of N is called computable if it has a computable numbering. In other words, a computable family L is a uniformly r.e. class of sets, and every computable numbering α of L defines a uniform r.e. sequence α(0), α(1), . . . of the members of L. A M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 423–432, 2008. c Springer-Verlag Berlin Heidelberg 2008
424
K. Ambos-Spies, S. Badaev, and S. Goncharov
numbering α is called reducible (0 -reducible) to a numbering β if α = β ◦ f for some recursive (recursive relative to the halting problem) function f . Numberings α, β are called equivalent (0 -equivalent) if they are reducible (0 -reducible) to each other. A language in this paper is just an r.e. set. The class of all r.e. subsets of N is denoted by E, and W0 , W1 , . . . stands for the standard computable numbering of the family E. A text for a set L is a function t : N −→ N such that range(t) = L. Definition 1. A learner is a partial function M : N∗ −→ N. Notice that a learner is not required to be recursive. We call a learner M computable if the function M is partial recursive. Definition 2. A learner M learns or identifies a language L if for every text t for L, 1. limn M (t n) exists (we say in this case that M converges on the text t); 2. L = Wlimn M(tn) . If L is a family of languages, we say that M learns or identifies L if M identifies every language L ∈ L. L is TxtEX-learnable if it is identified by some computable learner. TxtEX denotes the class of all TxtEX-learnable families, [5].
Remark 1. If a computable family L of languages is TxtEX-learnable then it is identifiable by some primitive recursive learner.
2
Computable Numberings of TxtEX-Learnable Families
In this section we will show that any two computable numberings of a computable TxtEX-learnable family of languages are 0 -equivalent. Our proof will use the following representation of computable numberings by Lachlan (see [6] for the details). Let L be a computable family of r.e. sets. We say that an r.e. set A represents L if L = {Wx : x ∈ A}. Now if A represents L then any recursive function f enumerating A induces a computable numbering αf of L where αf (x) = Wf (x) . Moreover, if f and g are recursive functions enumerating A then the corresponding numberings αf and αg of L are equivalent. So, up to equivalence, any r.e. set A representing L induces a unique computable numbering of L in the just described way. Conversely, for any computable numbering α of L, there is an r.e. set A representing L such that the numbering induced by A is equivalent to α. The latter follows from the following well-known fact of the theory of numberings: α : N −→ L ⊆ E is a computable numbering iff α is reducible to the standard numbering W = We : e ≥ 0, so A can be chosen as the range of a function which reduces α to W . In the following we will refer to the above observations on representations as Lachlan’s representation theorem. The following lemma gives a criterion for the numberings induced by two representations of a computable family to be 0 -equivalent.
On a Question of Frank Stephan
425
Lemma 1. Let A and B be r.e. sets such that {Wa : a ∈ A} = {Wb : b ∈ B}
(1)
and let α and β be the numberings induced by the sets A and B respectively. Then α is 0 -reducible to β if and only if there exists a recursive function g : N×N → N such that, for any a ∈ A, lim g(a, s) ↓ & lim g(a, s) ∈ B & Wa = Wlims s
s
g(a,s) .
(2)
Proof. The proof is a straightforward modification of the proof of Lemma 2.2 from [6]. Theorem 1. Computable numberings of a computable TxtEX-learnable family of languages are pairwise 0 -equivalent. Proof. Let C be a computable TxtEX-learnable family, let M be a computable learner of C, and let A and B be any r.e. sets such that C = {Wa : a ∈ A} = {Wb : b ∈ B}. By Lachlan’s representation theorem and by Lemma 1, it suffices to define a recursive function g satisfying (2). In order to do so, we will show that there is a recursive function state : N3 → {0, 1} such that, for any numbers a ∈ A and b ∈ B the following hold. Wa = Wb ⇒ lim state(a, b, s) ↓ & lim state(a, b, s) = 1
(3)
Wa = Wb ⇒ lim state(a, b, s) ↓ & lim state(a, b, s) = 0
(4)
s
s
s
s
Then the function g defined by letting g(a, s) be the least b ∈ Bs such that state(a, b, s) = 1 (if there is such a b and by letting g(a, s) = 0 otherwise) will have the desired properties. Namely, given a ∈ A, by (1), we may fix b minimal such that b ∈ B and Wa = Wb . Then, given a stage s0 such that b ∈ Bs0 , by (3) and (4), we may fix s1 ≥ s0 such that, for s ≥ s1 , state(a, b, s) = 1 and state(a, b , s) = 0 for all b < b. So, for any stage s ≥ s1 , g(a, s) = b. The function state(a, b, s) is defined by induction on s. Simultaneously with state(a, b, s) we define a finite string σ(a, b, s) over N and we let content(σ(a, b, s)) be the set of numbers occuring in σ(a, b, s). For s = 0 we let σ(a, b, 0) = λ and state(a, b, 0) = 1. For the definition of σ(a, b, s + 1) and state(a, b, s + 1) we distinguish the following cases. Case 1 : content(σ(a, b, s)) ⊆ Wa,s ∩ Wb,s . Then let σ(a, b, s + 1) = σ(a, b, s) and state(a, b, s + 1) = 0.
426
K. Ambos-Spies, S. Badaev, and S. Goncharov
Case 2 : content(σ(a, b, s)) ⊆ Wa,s ∩ Wb,s and M (σ(a, b, s)) = M (σ(a, b, s)Wa,s ) = M (σ(a, b, s)Wb,s ). (Here Wa,s is viewed as the string of the elements of Wa,s in the order of enumeration (and, similarly, for Wb,s ).) Then let σ(a, b, s + 1) = σ(a, b, s) and state(a, b, s + 1) = 1. Case 3 : Otherwise, i.e., content(σ(a, b, s)) ⊆ Wa,s ∩ Wb,s and M (σ(a, b, s)) = M (σ(a, b, s)Wa,s ) or M (σ(a, b, s)) = M (σ(a, b, s)Wb,s ). Then let σ(a, b, s+1) = σ(a, b, s)Wa,s Wb,s if M (σ(a, b, s)) = M (σ(a, b, s)Wa,s ) and σ(a, b, s + 1) = σ(a, b, s)Wb,s Wa,s otherwise. In either case let state(a, b, s + 1) = 0. This completes the definition of σ and state. To show that the function state satisfies (3) and (4), fix a ∈ A and b ∈ B. Note that, by definition, σ(a, b, s) σ(a, b, s + 1),
(5)
content(σ(a, b, s)) ⊆ Wa,s ∪ Wb,s ,
(6)
σ(a, b, s) σ(a, b, s + 1) ⇒ content(σ(a, b, s)) ⊆ Wa,s ∩ Wb,s .
(7)
and Next we will show that there are only finitely many stages s such that Case 3 applies in the definition of σ(a, b, s + 1) and state(a, b, s + 1). For a contradiction assume that this happens infinitely often. Since, for s < s such that Case 3 applies to s + 1 and s + 1, content(σ(a, b, s + 1)) = Wa,s ∪ Wb,s and content(σ(a, b, s + 1)) ⊆ content(σ(a, b, s )) ⊆ Wa,s ∩ Wb,s , it follows that Wa ∪ Wb ⊆ Wa ∩ Wb whence Wa = Wb . So the infinite sequence lims σ(a, b, s) is an enumeration (i.e., a text) of both Wa and Wb . Moreover, if Case 3 holds at stage s + 1 then the extension σ(a, b, s + 1) of σ(a, b, s) is chosen in such a way that M (σ(a, b, s)) = M (σ) for some σ with σ(a, b, s) σ σ(a, b, s + 1). So on the sequence lims σ(a, b, s) the learner M changes its mind infinitely often, hence does not learn Wa contrary to assumption. Now, since Case 3 applies only finitely often and since σ(a, b, s) is only extended at a stage s + 1 at which Case 3 applies, we may fix s0 such that σ(a, b, s) = σ(a, b, s0 ) for all s > s0 and Case 3 does not apply after stage s0 . Distinguish the following two cases. First assume that content(σ(a, b, s0 )) ⊆ Wa ∩ Wb . Then, by (6), Wa = Wb and Case 1 applies to all stages s > s0 whence lims state(a, b, s) = 0. So (4) holds. Finally assume that content(σ(a, b, s0 )) ⊆ Wa ∩ Wb . Then, for s1 > s0 with content(σ(a, b, s0 )) ⊆ Wa,s1 ∩ Wb,s1 , Case 2 applies to all stages s > s1 whence lims state(a, b, s) = 1. It remains to show that Wa = Wb in this case. Now, by definition, M (σ(a, b, s0 )) = M (σ(a, b, s0 )Wa,s ) and M (σ(a, b, s0 )) = M (σ(a, b, s0 )Wb,s ), so given the enumeration of Wa obtained by the initial segments σ(a, b, s0 )Wa,s (s ≥ s1 ), the learner M does not make any changes in his
On a Question of Frank Stephan
427
prediction after stage s1 . Since M learns Wa this implies that M (σ(a, b, s0 )) is an index of Wa . By a similar argument, M (σ(a, b, s0 )) is an index of Wb too, whence Wa = Wb . This completes the proof.
3
A Counterexample
Now we will give a counterexample to the converse part of the conjecture of Frank Stephan. Indeed, we will prove the following stronger statement. Theorem 2. There exists a computable family L ∈ / TxtEX whose computable numberings are pairwise equivalent. Proof. We will only sketch the proof by discussing the strategies for meeting the requirements, the conflicts among these strategies and how these conflicts are resolved. We will construct a computable numbering α such that for the family L = {α(x) : x ∈ N} of r.e. sets the following hold. (a) All computable numberings of L are equivalent to α. (b) L is not TxtEX-learnable. Indeed, we will build a positive numbering α, i.e., a numbering α such that the relation α(x) = α(y) is r.e. in x and y. Any positive numbering is minimal under reduction, [4], and, therefore, in order to ensure (a) it suffices to reduce all computable numberings of L to that numbering α. Moreover, in order to ensure (b), by Remark 1 it suffices to guarantee that no primitive recursive learner identifies L. Requirements. Let M0 , M1 , . . . be a computable sequence of all primitive recursive learners, and let γ0 , γ1 , . . . be a uniformly computable sequence of all possible computable numberings of computable families of r.e. sets. We fix any uniform approximation γks (x) for this sequence. Then the numbering α has to meet the following requirements for all k, e ∈ N: P : α is a computable positive numbering. Rk : If γk is a numbering of L then γk is reducible to α. Me : There exists an α-index m such that Me fails to learn the set α(m). We will refer to Rk as the kth reduction requirement and to Me as the eth nonlearning requirement. The priority ordering among the requirements Rk and Me is defined as usual by giving requirements with smaller index higher priority and by giving Rn higher priority than Mn . We identify each α-index n with a triple of numbers, n = e, i, j, where the individual components of n have the following meaning:
428
K. Ambos-Spies, S. Badaev, and S. Goncharov
– e means that the language α(n) might be used for diagonalizing against the learner Me , i.e., for meeting the nonlearning requirement Me , – i denotes the attempt number for trying to diagonalize against Me (due to our strategy for meeting the higher priority reduction requirements, a single attempt might not suffice), – j means that α(n) is the jth candidate in the ith attempt for diagonalizing against Me . We denote the components e, i, j of a triple n = e, i, j by π1 (n), π2 (n), and π3 (n) respectively. Moreover, we refer to the sets α(n) with π1 (n) = e and π2 (n) = i as the sets in section (e, i). (So sets in section (e, i) are reserved for the ith attempt for meeting Me .) Strategies. For meeting the requirements P and Rk we have to take some precautions in the enumerations αs (n) of the sets α(n). Strategy for meeting P. In order to make α positive we ensure, for all stages s and numbers m and n, αs (m) = αs (n) ⇒ ∀ t ≥ s (αt (m) = αt (n))
(8)
α(m) = α(n) ⇒ ∃t (αt (m) = αt (n)).
(9)
and s
s
So, in particular, α(m) = α(n) if and only if α (m) = α (n) for some stage s. Obviously, this implies that {(m, n) : α(m) = α(n)} is r.e. Strategy for meeting Rk . Let b(n) = 2n and let a(n) = 2n + 1. Initially, we let α0 (n) = {b(n)} (10) and call b(n) the base element of α(n). So, for n = m, α0 (n) = α0 (m) and α0 (n) can be positively distinguished from α0 (m) by its base element. Numbers a(n) may be enumerated into some sets α(m) later, where a(s) will not enter any set α(m) before stage s. Sets in different sections will be distinguishable by their base elements, i.e., the base element of a set α(n) will never be put into any set α(m) in a different section. For sets α(m) and α(n) in the same section (e, i), however, our strategy for meeting the nonlearning requirements may force us to enumerate the base element of α(n) into α(m). If this happens, the role of the base element of α(n) will be played by some new number a(s) put into α(n) before b(n) enters α(m) unless we make α(n) and α(m) agree. To be more precise, the enumeration of the sets in a given section (e, i) will obey the following rules. At any stage s there will be a distinguished set, α(e, i, je,i (s)), called the active set, where je,i (0) = 1 and je,i (s + 1) ≥ je,i (s) is defined at stage s + 1. Then, at any stage s ≥ 1, – the sets α(e, i, j) for j < je,i (s) have been previously merged with the primary set of the section, α(e, i, 0), i.e., αs (e, i, j) = αs (e, i, 0) for j < je,i (s);
On a Question of Frank Stephan
429
– αs (e, i, je,i (s)) and αs (e, i, 0) can be positively distinguished from each other, namely the base element b(e, i, je,i (s)) of α(e, i, je,i (s)) is not in αs (e, i, 0) and some element of αs−1 (e, i, 0), denoted by ˆb(e, i, 0, s − 1) below, is not in αs (e, i, je,i (s)); and – the sets α(e, i, j) for j > je,i (s) are still in their initial states and can be positively distinguished from all other sets by their base elements, i.e., αs (e, i, j) = {b(e, i, j)} and b(e, i, j) ∈ αs (m) for all m = e, i, j. This will be achieved by initially letting je,i (0) = 1 and ˆb(e, i, 0, 0) = b(e, i, 0), and by allowing only the following procedures to be applied to the sets of section (e, i) at a stage s + 1 > 0. – COPY(e, i, s + 1). αs+1 (e, i, je,i (s)) = αs (e, i, je,i (s)) ∪ [αs (e, i, 0) \ {ˆb(e, i, 0, s)}]. – MERGE(e, i, s + 1). For all j ≤ je,i (s) let αs+1 (e, i, j) = αs (e, i, 0) ∪ αs (e, i, je,i (s)) ∪ {a(s + 1)} and set ˆb(e, i, 0, s + 1) = a(s + 1) and je,i (s + 1) = je,i (s) + 1. where these procedures are applied alternatingly starting with COPY and where at most one of the procedures is applied at any stage s + 1 > 0. (If not stated otherwise, a parameter will maintain its value at stage s + 1.) Note that the above will ensure that (8) and (9) are satisfied whence α is positive. Now, in order to show how the above will help us to meet the reduction requirements, fix k. Assuming that γk is a numbering of L we have to give a reduction function gk from γk to α. Given x, we will define gk (x) as follows. 1. Wait for a stage s1 > x and a number n such that b(n) ∈ γks1 (x), say n = e, i, j. Distinguish the following two cases. 2. If j ≥ je,i (s1 ) then let gk (x) = n. 3. If j < je,i (s1 ) then wait for the least stage s2 ≥ s1 such that γks2 (x) = αs2 (e, i, 0) or je,i (s1 ) < je,i (s2 ) or γks2 (x) = αs2 (e, i, je,i (s1 )). In the former two cases let gk (x) = e, i, 0, in the latter case let gk (x) = e, i, je,i (s1 ). Note that if we wait for a stage s1 and n as above forever then γk (x) ∈ L, hence γk is not a numbering of L. Similarly, if there is no stage s2 ≥ s1 such that je,i (s1 ) < je,i (s2 ) then only finitely many operations are applied to section (e, i). So there will be s ≥ s1 at which all sets in the section have reached their final (finite) state, and γk (x) will disagree from all of these sets. So, again, γk is not a numbering of L. So we may conclude, that gk is total. (In the actual construction we will define gk (x) according to the above procedure only if, for s1 and n as above, π1 (n) ≥ k. If π1 (n) < k then the value of gk (x) will be specified depending on the outcomes of the strategies for meeting the nonlearning requirements M0 , . . . , Mk−1 .)
430
K. Ambos-Spies, S. Badaev, and S. Goncharov
It remains to argue that γk (x) = α(gk (x)). Note that if we let gk (x) = n, n = e, i, j, at stage s then b(n) ∈ γks (x) & ∀ m = n (b(n) ∈ αs (m)) (namely in case of 2. and in the third case of 3.) or j = 0 & [ (b(n) ∈ γks (x) ∩ αs (e, i, 0) & b(n) ∈ αs (e, i, je,i (s))) or γks (x) = αs (e, i, 0) ]. Now, in the former case, if α(n) is not merged with α(e, i, 0), then b(n) will positively distinguish α(n) from all sets α(m) with m = n in the limit. So it suffices to show that if the first case applies and α(n) is merged with α(e, i, 0) or if the second case applies then γk (x) = α(e, i, 0). Here a problem may arise by a COPY operation applied at a stage t > s. Then the current approximation γkt (x) of γk (x) may be a subset of both, αt (e, i, 0) and αt (e, i, je,i (t − 1)), and in the limit the latter two sets may disagree. (So it could turn out that γk (x) = α(e, i, je,i (t − 1)) = α(e, i, 0).) We can prevent this from happening by requiring that any COPY operation has to wait for k-confirmation. Here we say that stage s + 1 is k-confirmed (w.r.t. section (e, i)) if, for any x and j such that gks (x) = e, i, j for some j < je,i (s), αs (e, i, 0) ⊆ γks (x). Note that if s + 1 is k-confirmed then (for x and j as above), in particular, ˆb(e, i, 0, s) ∈ γks (x), so if COPY is applied at stage s, still γks (x) ⊆ αs+1 (e, i, je,i (s)). Hence if we limit the COPY operation to k-confirmed stages then the reduction gk will be correct. But what effect does this limitation may have on the Me strategies? For a single k such that γk is a numbering of L there will be infinitely many k-confirmed stages (provided that we sufficiently slow down the enumeration of gk ), so that waiting for confirmation will not interfere with the intended action on section (e, i). For k such that γk is not a numbering of L, however, we may wait for a k-confirmation forever. So any attack on the nonlearning requirement Me will be provided with a guess at which of the higher priority reduction requirements have a correct hypothesis. An attack based on guess (i0 , . . . , ie ) ∈ {0, 1}e+1 will wait for kconfirmation for those k ≤ e such that ik = 1. If the guess is correct, waiting for confirmation will not interfere with the nonlearning startegy. On the other hand, if in the course of the construction it seems that an attack erroneously ignores some k, then the attack is abandoned and the sets in the section (e, i) used by the attack are all identified (by letting α(e, i, j) = {b(e, i, j ) : j ≥ 0} ∪ {a(n) : n ≥ 0}). So the reduction gk will be trivially correct on section (e, i). Strategy for meeting Me . Given a section (e, i), in the ith attempt for meeting Me we build a (finite or infinite) sequence of strings over N, namely ∗
j 0 1 σe,i · · · σe,i σe,i
(j ∗ ≥ 0) or
0 1 2 σe,i σe,i σe,i ...
On a Question of Frank Stephan
431
such that -if the ith attempt is the successful one- either ∗
j content(σe,i ) ⊆ α(e, i, 0) ∩ α(e, i, j ∗ + 1) & Me fails to learn α(e, i, 0) or α(e, i, j ∗ + 1)
or
(11)
j content(σe,i ) = α(e, i, 0) & j Me fails to learn α(e, i, 0) from the text σe,i = limj→∞ σe,i . j∈N
(12)
j This is achieved by induction on j ≥ 1, where string σe,i is defined in the j0 cycle given below (σe,i is the empty string). Cycle j will affect only α(e, i, 0) and j−1 )∪ α(e, i, j). If cycle j is started at stage sj then αsj (e, i, 0) = content(σe,i {ˆb(e, i, 0, sj )}, j is active at stage sj , i.e., j = je,i (sj ), and αsj −1 (e, i, j) is still in its initial state.
Cycle j (j ≥ 1). 1. Wait for the least stage sj > sj which is k-confirmed for k ≤ e such that ik = 1 where (i0 , . . . , ie ) is the guess underlying the attack. Perform COPY(e, i, sj ). 2. Wait for the least stage sj > sj such that
j−1 ˆ j−1 Me (σe,i ) b(e, i, 0, sj )sj −sj ) = Me (σe,i
or
j−1 j−1 b(e, i, j)sj −sj ) = Me (σe,i ). Me (σe,i
(13) (14)
Then MERGE(e, i, sj ), let j−1 ˆ b(e, i, 0, sj )sj −sj b(e, i, j) if (13) holds σe,i j σe,i = j−1 b(e, i, j)sj −sj ˆb(e, i, 0, sj ) otherwise, σe,i and start cycle j + 1. The success of this strategy (if based on the correct guess, hence not stuck in step 1 of any cycle) is shown as follows. If the strategy gets stuck in step 2 of cycle j then, for j ∗ = j − 1, (11) holds. Namely, the learner Me will make j−1 ˆ b(e, i, 0, sj )ω of α(e, i, 0) = αsj (e, i, 0) the same prediction for the text σe,i j−1 and for the text σe,i b(e, i, j)ω of α(e, i, j) = αsj (e, i, je,i (sj )) though these sets differ. If all cycles are completed then (12) holds since Me makes infinitely j many changes on the text σe,i = limj→∞ σe,i for α(e, i, 0). This completes our discussion of the strategies and the basic ideas underlying the proof. Acknowledgments. The second author’s research was partially supported by the State Grants of Kazakhstan “Best Teacher of Higher Education” for 2006 and 2007. The third author was partially supported by the Grant RFBR-08-0100336 and the Grant for Leading Scientific Schools SS-4413 for 2006.
432
K. Ambos-Spies, S. Badaev, and S. Goncharov
References 1. Jain, S., Osherson, D., Royer, J.S., Sharma, A. (eds.): Systems That Learn. An Introduction to Learning Theory, 2nd edn. The MIT Press, Cambridge Massachusetts (1999) 2. Ershov, Y. L.: Theory of Numberings. In: Handbook of Computability Theory, pp. 473–503. North-Holland, Amsterdam (1999) 3. Badaev, S.A., Goncharov, S.S.: The Theory of Numberings: Open Problems. In: Cholak, P.A., Lempp, S., Lerman, M., Shore, R.A. (eds.) Computability Theory and its Applications. Current Trends and Open Problems, pp. 23–38, Amer. Math. Soc., Providence (2000) 4. Ershov, Y.L.: Theory of Numberings. Nauka, Moscow (1977) 5. Gold, E.M.: Language Identification in the Limit. Information and Control 10, 447– 474 (1967) 6. Badaev, S.A., Goncharov, S.S., Podzorov, S., Podzorov, S.Y., Sorbi, A.: Algebraic Properties of Rogers Semilattices of Arithmetical Numberings. In: Cooper, S.B., Goncharov, S.S. (eds.) Computability and Models, pp. 45–77. Kluwer / Plenum Publishers, New York (2003)
A Practical Parameterized Algorithm for the Individual Haplotyping Problem MLF Minzhu Xie1,2 , Jianxin Wang1, , and Jianer Chen1 1
School of Information Science and Engineering, Central South University 2 College of Physics and Information Science, Hunan Normal University [email protected],[email protected] http://netlab.csu.edu.cn
Abstract. The individual haplotyping problem Minimum Letter Flip (MLF) is a computational problem that, given a set of aligned DNA sequence fragment data of an individual, induces the corresponding haplotypes by flipping minimum SNPs. There has been no practical exact algorithm to solve the problem. In DNA sequencing experiments, due to technical limits, the maximum length of a fragment sequenced directly is about 1kb. In consequence, with a genome-average SNP density of 1.84 SNPs per 1 kb of DNA sequence, the maximum number k1 of SNP sites that a fragment covers is usually small. Moreover, in order to save time and money, the maximum number k2 of fragments that cover a SNP site is usually no more than 19. Based on the properties of fragment data, the current paper introduces a new parameterized algorithm of running time O(nk2 2k2 + mlogm + mk1 ), where m is the number of fragments, n is the number of SNP sites. The algorithm solves the MLF problem efficiently even if m and n are large, and is more practical in real biological applications. Keywords: SNP (single-nucleotide polymorphism); haplotype; MLF (Minimum Letter Flip); NP-hardness; parameterized algorithm.
1
Introduction
A single nucleotide polymorphism (SNP) is a single base mutation of a DNA sequence that occurs in at least 1% of the population. SNPs are the predominant form of human genetic variation, and more than 3 million SNPs are distributed throughout the human genome [1, 2]. Detection of SNPs is used in identifying
This research was supported in part by the National Natural Science Foundation of China under Grant Nos. 60433020 and 60773111, the Program for New Century Excellent Talents in University No. NCET-05-0683, the Program for Changjiang Scholars and Innovative Research Team in University No. IRT0661, and the Scientific Research Fund of Hunan Provincial Education Department under Grant No.06C526. The corresponding author.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 433–444, 2008. c Springer-Verlag Berlin Heidelberg 2008
434
M. Xie, J. Wang, and J. Chen
biomedically important genes for the diagnosis and therapy of human hereditary diseases, in identification of individual and descendant, and in the analysis of genetic relations of populations. In humans and other diploid organisms, chromosomes are paired up. A haplotype describes the SNP sequence of a chromosome. In Fig. 1, the haplotypes of the individual are “ATACG” and “GCATG”. Because two chromosomes paired up are not completely identical, the haplotype contains more information than the A . . T . . A . . C . . G . genotype. Haplotyping, i.e. identification of chromosome hap- G . . C . . A . . T . . G . lotypes, plays an important role in SNP applications [3]. Stephens et al. [4] identified 3899 SNPs that were present Fig. 1. SNPs within 313 genes from 82 unrelated individuals of diverse ancestry. Their analysis of the pattern of haplotype variation strongly supports the recent expansion of human populations. Based on linkage studies of SNPs and the association analysis between haplotypes and type 2 diabetes, Horikawa et al. [5] localized the gene NIDDM1 to the distal long arm of chromosome 2 and found 3 SNPs in CAPN10 associated with type 2 diabetes. Haplotyping has been time-consuming and expensive using biological techniques. Therefore, effective computational techniques have been in demand for solving the haplotyping problem. A number of combinatorial versions of the haplotyping problem have been proposed. In the current paper, we will be concentrated on an important version Minimum Letter Flips (MLF) [6, 7] that comes from the individual haplotyping problem [8]: Given a set of aligned SNP sequence fragment data from the two copies of a chromosome, find a minimum number of SNPs to correct so that there exist two haplotypes compatible with the corrected fragments. The MLF problem is also called the Minimun Error Correction (MEC) problem [9, 10]. The problem is NP-hard [9] and there has been no practical exact algorithm to solve the problem [6, 10, 11]. By carefully studying related properties of fragment data, we have found the following fact. In all sequencing centers, due to technical limits, the sequencing instruments such as ABI 3730 and MageBACE can only sequence DNA fragments whose maximum length is about 1000 nucleotide bases. In consequence, with a genome-average SNP density of 1.84 SNPs per 1 kb of DNA sequence [12], the maximum number k1 of SNP sites that a fragment covers is small. Moreover, in order to save time and money, the maximum number k2 of fragments that cover a SNP site is usually no more than 19 [1, 13, 14]. Based on the observation above, the current paper proposes a new algorithm of time O(nk2 2k2 + mlogm + mk1 ), where m is the number of fragments and n is the number of SNP sites. The algorithm solves the problem efficiently even if m and n are large, and is more practical in real biological applications.
2
The Individual Haplotyping MLF Problem
For a pair of chromosomes, a SNP site where both haplotypes have the same nucleotide is called a homozygous site, and a SNP site where both haplotypes
A Practical Parameterized Algorithm
613
$
%
$
$
$
$
%
$
$
$
%
$
$
IUDJPHQW
613
IUDJPHQW
$
$
%
$
%
$
%
$
$
%
$
%
%
$
%
%
$
435
(a) Error-free
(b) M3,3 is an error read Fig. 2. SNP Matrices
have different nucleotides is called a heterozygous site. To reduce the complexity, a haplotype can be represented as a string over a two-letter alphabet {A, B} rather than the four-letter alphabet {A, C, G, T} , where ‘A’ denotes the major allele and ‘B’ denotes the minor. In Fig. 1, the haplotypes can be represented by “ABABB” and “BAAAB”. Only considering SNPs, the aligned DNA sequence fragment data of an individual can be represented as an m× n SNP matrix M over the alphabet {A,B,-}, in which n columns represent a sequence of SNPs according to the order of sites in a chromosome and m rows represent m fragments. Fig. 2 shows two 5 × 4 SNP matrices. In the matrix M , the ith row’s value at the jth column is denoted by Mi,j , which equals the ith fragment’s value at the jth SNP site. If the value of the ith fragment at the jth SNP site misses (i.e. there is a hole in the fragment) or the ith fragment doesn’t cover the jth SNP site, then Mi,j takes the value “−” (the value “−” will be called the empty value). The following are some definitions related to the SNP matrix M . We say that the ith row covers the jth column if there are two indices k and r such that k ≤ j ≤ r, and both Mi,k and Mi,r are not empty. For example, in Fig. 2(a), row 2 covers columns 1, 2 and 3. The set of (ordered) rows covering the jth column is denoted by rowset(j). The first and the last column that the ith row covers are denoted by lef t(i) and right(i) respectively. If Mi,j = “ − ”, Mk,j = “ − ” and Mi,j = Mk,j , then the ith and kth rows of M are said to conflict at column j. If the ith and kth rows of M do not conflict at any column then they are compatible. A SNP matrix M is feasible if its rows can be partitioned into two classes such that the rows in each class are all compatible. Obviously, a set of DNA sequence fragment data without error corresponds to a feasible SNP matrix, with each of the compatible classes in the matrix corresponding to one of a pair of haplotypes. As in Fig. 2(a), it is easy to see fragments 1 and 2 are from a chromosome and fragments 3, 4 and 5 are from another chromosome. So it is easy to infer that a haplotype is “ABAA” and another is “BABA”.
436
M. Xie, J. Wang, and J. Chen
However, because of contaminants and read errors during the sequencing process, the SNP matrix corresponding to the fragment data is in general not feasible. This fact has resulted in the following combinatorial optimization problem [6, 9, 10]: Minimum Letter Flips (MLF): Given a SNP matrix M , flip (or correct) a minimum number of elements (“A” into “B” and vice versa) so that the resulting matrix is feasible, i.e. the corrected SNP fragments can be divided into two disjoint sets of pairwise compatible fragments, with each set determining a haplotype. With a SNP matrix M , we will denote MLF(M ) a solution to the MLF problem (i.e. the minimum number of elements have to be flipped to make M feasible). As for the SNP matrix M in Fig. 2(b), MLF(M )=1 because with no element flipped, M is not feasible, and once M3,3 is flipped to “B”, M becomes feasible. The MLF problem is NP-hard [9]. Wang et al. [10] proposed a branch-andbound algorithm whose time complexity is O(2m ), which is impractical when the number of fragments m is large. There has been no more efficient exact algorithm [6, 10, 11]. Therefore it is of practical importance to introduce more efficient algorithms.
3
Parameterized MLF Algorithm
Currently the main method to identify SNPs is DNA direct sequencing [15]. The prevailing method to sequence a DNA fragment is the Sanger’s technology of DNA sequencing with chain-termination inhibitors [16], which cannot sequence any DNA fragment longer than 1200 bases due to the technique limits. A recent research [17] revealed that two copies of the human genome differ from one another by approximately 0.5% of nucleotide sites, and the number of SNP sites in a fragment read is very small and no more than 10 according to the available data in spite of the varying distribution density along a chromosome [3, 18]. In order to save money and time, in DNA sequencing experiments, the fragment coverage is also small. In Celera’s whole-genome shotgun assembly of the human genome, the fragment average coverage is 5.11 [1], and in the human genome project of the International Human Genome Sequencing Consortium, the fragment average coverage is 4.5 [14]. Huson et al. [13] have analyzed the fragment data of the human genome project of Celera’s, and given a fragment coverage plot. Although the fragment covering rate is not the same at all sites along the whole genome, the plot shows that most sites are covered by 5 fragments, and that the maximum of fragments covering a site is no more than 19. Therefore, for an SNP site, compared with the total number of fragments, the number of fragments covering the SNP site is very small. Based on the observations above, we introduce the following parameterized condition.
A Practical Parameterized Algorithm
437
Definition 1. The (k1 , k2 ) parameterized condition: the number of SNP sites covered by a single fragment is bounded by k1 , and each SNP site is covered by no more than k2 fragments. Accordingly, in a SNP matrix satisfying the (k1 , k2 ) parameterized condition, each row covers at most k1 columns and each column is covered by at most k2 rows. For an m × n SNP matrix M , the parameters k1 and k2 can be obtained by scanning all rows of M . In the worst case, k1 = n and k2 = m. But as to the fragment data of Celera’s human genome project, k2 is no more than 19 [13]. For a solution to the MLF problem for a SNP matrix M , after flipping the corresponding elements, all the rows of M can be partitioned into two classes H0 and H1 , such that all pairs of rows in the same class are compatible. Definition 2. Let R be a subset of (ordered) rows of a SNP matrix M . A partition function P on R maps each row in R to one of the values {0, 1}. Suppose that R contains h > 0 rows, then a partition function P on R can be coded by an h-digit binary number in {0, 1}, where the i-th digit is the P value of the ith row in R. For completeness, if R = ∅, we also define a unique partition function P , which is coded by −1. For briefness, a partition function defined on rowset(j) is called a partition function at column j. For a SNP matrix M satisfying the (k1 , k2 ) parameterized condition, there are at most 2k2 different partition functions at column j. Let R be a set of rows of the matrix M , and P be a partition function on R. For a subset R of R, the partition function P on R obtained by restricting P on the subset R is called the projection of P on R , and P is called an extension of P on R. Definition 3. Fix a j. Let P be a partition function on a row set R. Define VE [P, j] to be any subset S of elements of M that satisfies the following conditions: (1) For each element Mr,k in S: 1 ≤ k ≤ j. (2) After flipping the elements of S, there is a partition (H0 , H1 ) of all rows of M such that any two rows in the same class do not conflict at any column from 1 to j, and for any row i ∈ R, row i is in the class Hq if and only if P (i) = q, for q ∈ {0, 1}. Definition 4. Fix a j. Let P be a partition function at column j. Define SE [P, j] to be any VE [P, j] that minimizes the number of elements in VE [P, j]. And E[P, j] is defined as the number of elements in SE [P, j]. From Definitions 3 and 4, it is easy to verify that the following equation holds true: MLF(M ) =
min
(E[P, n])
P : P is a partition function at column n
(1)
Let P be a partition function at column j, for k ∈ {0, 1}, NA (P, j, k) denotes the number of the rows whose P value is k, and whose value at column j is
438
M. Xie, J. Wang, and J. Chen
CompFlips(j, P, F ) // F denotes F (P, j) // As [k] denotes NA (P, j, k), Bs [k] denotes NB (P, j, k), k = 0, 1 1. As [0] = Bs [0] = As [1] = Bs [1] = 0; k = 0; Ptmp = P ; F = ∅; 2. for i=the first row .. the last row of rowset(j) do 2.1. k =the least significant bit of Ptmp ; //k = P (i) 2.2. Ptmp right shift 1 bit; 2.3. if Mi,j =‘A’ then As [k] = As [k] + 1; 2.4. if Mi,j =‘B’ then Bs [k] = Bs [k] + 1; 3. for k = 0..1 do 3.1. if As [k] > Bs [k] then M inor[k] =‘B’; 3.2. else M inor[k] =‘A’; 4. Ptmp = P ; 5. for i = the first row .. the last row of rowset(j) do 5.1. k = the least significant bit of Ptmp ; Ptmp right shift 1 bit; 5.2. if Mi,j = M inor[k] then F = F ∪ Mi,j ;
Fig. 3. Function CompFlips
‘A’, i.e. NA (P, j, k) = | {i | P (i) = k ∧ Mi,j = ‘A’} |; similarly, | {i | P (i) = k ∧ Mi,j = ‘B’} | is denoted by NB (P, j, k). If NA (P, j, k) > NB (P, j, k), then M inor(P, j, k) = ‘B’, otherwise M inor(P, j, k) = ‘A’. Under the condition that the rows covering column j are partitioned by P and the rows partitioned into the same class don’t conflict at column j, the values at column j of some rows have to be flipped to avoid confliction. The number of the elements flipped at column j is minimal if and only if those Mi,j , whose value equal to M inor(P, j, P (i)), are flipped. We denote the set of the flipped elements above by F (P, j), i.e. F (P, j) = {Mi,j | Mi,j = M inor(P, j, P (i))}. The function CompFlips to compute F (P, j) is given in Fig. 3, whose time complexity is O(k2 ). So SE [P, 1] and E[P, 1] can be obtained as follows. SE [P, 1] = F (P, 1) E[P, 1] = |F (P, 1)|,
i.e. the number of elements in F (P, 1)
(2) (3)
In order to present our algorithm, we need to extend the above concepts from one column to two columns as follows. Let the set of all rows that cover both columns j1 and j2 be Rc (j1 , j2 ). Definition 5. Fix a j. Let P be a partition function defined on Rc (j, j + 1). Define SB [P , j] to be any VE [P , j] that minimizes the number of elements in VE [P , j]. And B[P , j] is defined as the number of elements in SB [P , j].
A Practical Parameterized Algorithm
439
Given j and a partition function P on Rc (j, j + 1). If E[P, j] and SE [P, j] are known for every extension P of P on rowset(j), then B[P , j] and SB [P , j] can be calculated by the following equations: B[P , j] =
min
P : P is an extension of P on rowset(j)
(E[P, j])
SB [P , j] = SE [P, j] | P minimizes E[P, j]
(4) (5)
Inversely, for each partition function P on rowset(j), since Rc (j − 1, j) is a subset of rowset(j), the project P of P on Rc (j −1, j) is unique. Once B[P , j −1] and SB [P , j − 1] are known, E[P, j] and SE [P, j] can be calculated according to the following equations, whose correctness can be proven similarly as Equation (2) and (3). SE [P, j] = SB [P , j − 1] ∪ F (P, j)
(6)
E[P, j] = B[P , j − 1]+ | F (P, j) |
(7)
Based on the equations above, the solution to the MLF problem for a SNP matrix M can be obtained as follows: firstly, E[P, 1] and SE [P, 1] can be obtained according to Equations (2) and (3) for all partition functions P at column 1; secondly, B[P , 1] and SB [P , 1] can be obtained by using Equations (4) and (5) for all partition functions P on Rc (1, 2); thirdly, E[P, 2] and SE [P, 2] can be obtained by using Equations (6) and (7) for all partition functions P on rowset(2); and so on, at last E[P, n] and SE [P, n] can be obtained for all partition functions P at column n. Once E[P, n] and SE [P, n] for all possible P are known, a solution to the MLF problem for M can be obtained by using Equation (1). Please see Fig. 4 for the details of our P-MLF algorithm. Theorem 1. For an m × n SNP matrix M , if M satisfies the (k1 , k2 ) parameterized condition, then the P-MLF algorithm solves the MLF problem correctly in time O(nk2 2k2 + mlogm + mk1 ) and space O(mk1 2k2 + nk2 ). Proof. The P-MLF algorithm is based on Eqs. (1)-(7). Eqs. (2)-(7) have been proven in the above discussion and the correctness of Eq. (1) is obvious. Given an m×n SNP matrix M satisfying the (k1 , k2 ) parameterized condition, consider the following storage structure: each row keeps the first and the last column that the row covers, i.e. its left and right value, and its values at the columns from its left column to its right column. In such a storage structure, M takes space O(mk1 ). It is easy to see that rowset takes space O(nk2 ), H takes space O(n), E and B take space O(2k2 ), and SE and SB take space O(mk1 2k2 ). In summary, the space complexity of the algorithm is O(mk1 2k2 + nk2 ). Now we discuss the time complexity of the P-MLF algorithm. In Step 1, sorting takes time O(mlogm). All rowsets can be obtained by scanning the rows only once, which takes time O(mk1 ). Since M satisfies the (k1 , k2 ) parameterized condition, there are no more than k2 rows covering a column. Therefore, the
440
M. Xie, J. Wang, and J. Chen
Algorithm P-MLF input: an m × n SNP matrix M output: a solution to the MLF problem for M 1. initiation: sort the rows in M in ascending order such that for any two rows i1 and i2 , if i1 < i2 , then lef t(i1 ) ≤ lef t(i2 ); for each column j, calculate an ordered set rowset(j) and the number H[j] of the rows that cover column j; 2. for P = 0..2H[1] − 1 do // P is coded by a binary number 2.1. CompFlips(1, P, SE [P ]); //calculate SE [P, 1] using Eq. (2) 2.2. E[P ] =| SE [P ] |; //calculate E[P, 1] using Eq. (3), 3. j=1; 4. while j < n do //recursion based on Eqs. (4)-(7) // MAX denotes the maximum integer 4.1. calculate NC , the number of rows that cover both columns j and j + 1, and a vector Bits such that Bits[i]=1 denotes the ith row of rowset(j) covers column j + 1; 4.2. for P = 0..2NC − 1 do B[P ]=MAX; 4.3. for P = 0..2H[j] − 1 do 4.3.1. calculate the project P of P on Rc (j, j + 1) using Bits. //Eqs. (4) and (5) 4.3.2. if B[P ] > E[P ] then B[P ] = E[P ]; SB [P ] = SE [P ]; 4.4. j + +; //next column 4.5. for P = 0..2NC − 1 do 4.5.1. for each extensions P of P on rowset(j) do 4.5.1.2. CompFlips(j, P, F ); 4.5.1.3. SE [P ] = SB [P ] ∪ F ; E[P ] = B[P ]+ | F |; //Eqs. (6) and (7) //Eq. (1) 5. output the minimum E[P ] and the corresponding SE [P ] (P = 0..2H[n] −1). Fig. 4. P-MLF Algorithm
function CompFlips takes time O(k2 ), and for each column j, H[j] ≤ k2 . In consequence, Step 2 takes time O(k2 2k2 ). In Step 4.1, scanning rowset(j) and rowset(j+1) simultaneously can obtain NC and Bits, and takes time O(k2 ). Step 4.2 takes time O(2k2 ), and Step 4.3 takes time O(k2 2k2 ). In Step 4.5, for each P , there are 2H[j]−NC extensions of P on rowset(j). Given P , an extension of P can be obtained by a bit-or operation in time O(1). In all, Step 4.5 takes time O(k2 2Nc 2k2 −Nc ). Then Step 4 is iterated n − 1 times and takes time O(nk2 2k2 ). Step 5 takes time O(2k2 ). In summary, the time complexity of the algorithm is O(nk2 2k2 + mlogm + mk1 ). This completes the proof.
4
Experimental Results
To the best of our knowledge, real DNA sequence fragment data in the public domain are not available, References [15, 19, 20] used computer-generated
A Practical Parameterized Algorithm
441
Table 1. Comparison of performances of two algorithms Parameters n m e 0.01 16 32 0.03 0.05 0.01 22 44 0.03 0.05 0.01 50 100 0.03 0.05 a
Tavg (s) a B-MLF P-MLF 0.04 0.002 0.40 0.002 5.62 0.002 186.9 0.002 200.1 0.003 532.3 0.003 0.013 0.018 0.025
Tmax (s) a B-MLF P-MLF 0.13 0.011 2.34 0.011 56.8 0.013 738.5 0.018 1503 0.019 3214 0.029 >96 hours 0.030 >96 hours 0.037 >96 hours 0.038
R B-MLF 0.985 0.978 0.971 0.962 0.958 0.953 -
a
P-MLF 0.984 0.980 0.971 0.963 0.958 0.952 0.949 0.945 0.943
All experiments are repeated 100 times except for B-MLF with n = 50 and m = 100. Tavg (the average time), Tmax (the maximum running time) and R (the reconstruction rate) are over the repeated experiments with the same parameters.
simulated data under some realistic assumptions to compare the algorithms on the haplotypes assembly problem. Accordingly, we generated artificial fragment data in the same way and using the same parameters as in the above references. In order to make the generated data have the same statistical features as the real data, a widely used shotgun assembly simulator Celsim [21] is adopted. At first a haplotype h1 of length n is generated at random, then another haplotype h2 of the same length is generated by flipping every char of h1 with the probability of d, then Celsim is invoked to generated m fragments whose lengths are between lM in and lM ax. At last the output fragments are processed to plant reading errors with probability e and empty values with probability p. In the DNA sequencing experiments, fragment coverage rate c is about 10 [13]. In our experiments, the parameters are as follows: fragment coverage rate c = 10, the difference rate between two haplotypes d =10%, the minimal length of fragment lM in = 3, the maximal length of fragment lM ax = 7 and empty values probability p = 2%. The length of haplotype n, the number of fragment m (m = 2×n×c/(lM ax+ lM in)) and the reading error probability e are varied to compare the performance of Wang et al.’s algorithms B-MLF [10] and our algorithms P-MLF. Please refer to [20] and [21] for the details about how to generate artificial data. P-MLF is implemented in C++ and B-MLF comes from Wang [10]. We ran our experiments on a Linux server (4 Intel Xeon 3.6GHz CPU and 4GByte RAM). In the experiments, we compare the running time and the reconstruction rate of haplotypes [10] between both the algorithms. The reconstruction rate of haplotypes is defined as the ratio of the number of the SNP sites that are correctly inferred by an algorithm to the total number of the SNP sites of the haplotypes. From Table 1, we can see when n and m increase, the running time of BMLF increases sharply. When n = 50 and m = 100, B-MLF cannot work out a solution in 4 days, but the running time of P-MLF is still small and less than 1 second. We also can see when the reading error probability e is small both the algorithms have good performance in reconstructing the haplotypes.
442
M. Xie, J. Wang, and J. Chen
(a) Reconstruction rate
(b) Running time
Fig. 5. The performance of P-MLF
From Fig. 5, we can also see that P-MLF still has good performance when the length of the haplotypes increases to 300.
5
Conclusion
Haplotyping plays a more and more important role in some regions of genetics such as locating of genes, designing of drugs and forensic applications. MLF is an important computational model to infer the haplotypes of an individual from one’s aligned SNPs fragment data by flipping the minimum number of SNPs. MLF has been proven to be NP-hard. To solve the problem, Wang et al. [10] proposed a branch-and-bound algorithm. Because it is of O(2m ) time complexity, Wang et al.’s alogrithm is impractical when the number of fragments m is large. Based on the fact that the maximum number of fragments covering a SNP site is small (usually no more than 19 [13]), the current paper introduced a new parameterized algorithm P-MLF to solve the MLF problem. With the fragments of maximum length k1 and the maximum number k2 of fragments covering a SNP site, the P-MLF algorithm can solve the MLF problem in time O(nk2 2k2 + mlogm + mk1 ) and in space O(mk1 2k2 + nk2 ). Compared with other exact algorithms, the P-MLF algorithm has good scalability and is more efficient in practice. As for other MLF type computational models such as MEC/GI ( MEC with Genotype Information ) [10], the P-MLF algorithm can also be applicable with a little change to ensure that the haplotypes reconstructed by the algorithm make up the input genotype. Acknowledgement. We thank Dr. Rui-Sheng Wang and Prof. Gene Myers for their kindly providing us with the source codes of B-MLF and Celsim respectively.
A Practical Parameterized Algorithm
443
References [1] Venter, J.C., Adams, M.D., Myers, E.W., et al.: The sequence of the human genome. Science 291(5507), 1304–1351 (2001) [2] The International HapMap Consortium: A haplotype map of the human genome. Nature 437(7063) 1299–1320 (2005) [3] Gabriel, S.B., Schaffner, S.F., Nguyen, H., et al.: The structure of haplotype blocks in the human genome. Science 296(5576), 2225–2229 (2002) [4] Stephens, J.C., Schneider, J.A., Tanguay, D.A., et al.: Haplotype variation and linkage disequilibrium in 313 human genes. Science 293(5529), 489–493 (2001) [5] Horikawa, Y., Oda, N., Cox, N.J., et al.: Genetic variation in the gene encoding calpain-10 is associated with type 2 diabetes mellitus. Nature Genetics 26(2), 163–175 (2000) [6] Greenberg, H.J., Hart, W.E., Lancia, G.: Opportunities for combinatorial optimization in computational biology. INFORMS J. Comput. 16(3), 211–231 (2004) [7] Zhao, Y.Y., Wu, L.Y., Zhang, J.H., Wang, R.S., Zhang, X.S.: Haplotype assembly from aligned weighted snp fragments. Computational Biology and Chemistry 29(4), 281–287 (2005) [8] Lancia, G., Bafna, V., Istrail, S., Lippert, R., Schwartz, R.: Snps problems, complexity and algorithms. In: Meyer auf der Heide, F. (ed.) ESA 2001. LNCS, vol. 2161, pp. 182–193. Springer, Heidelberg (2001) [9] Lippert, R., Schwartz, R., Lancia, G., Istrail, S.: Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem. Brief. Bioinform 3(1), 1–9 (2002) [10] Wang, R.S., Wu, L.Y., Li, Z.P., Zhang, X.S.: Haplotype reconstruction from snp fragments by minimum error correction. Bioinformatics 21(10), 2456–2462 (2005) [11] Bonizzoni, P., Vedova, G.D., Dondi, R., Li, J.: The haplotyping problem: an overview of computational models and solutions. J. Comp. Sci. Technol. 18(6), 675–688 (2003) [12] Chen, C., Wang, J., Cohen, B.: The strength of selection on ultraconserved elements in the human genome. The American Journal of Human Genetics 80(4), 692–704 (2007) [13] Huson, D.H., Halpern, A.L., Lai, Z., Myers, E.W., Reinert, K., Sutton, G.G.: Comparing assemblies using fragments and mate-pairs. In: Gascuel, O., Moret, B.M.E. (eds.) WABI 2001. LNCS, vol. 2149, pp. 294–306. Springer, Heidelberg (2001) [14] International Human Genome Sequencing Consortium: Initial sequencing and analysis of the human genome. Nature 409(6822), 860–921 (2001) [15] Wernicke, S.: On the algorithmic tractability of single nucleotide polymorphism (SNP) analysis and related problems. Ph. d. thesis, Univ. T¨ ubingen (2003) [16] Sanger, F., Nicklen, S., Coulson, A.R.: Dna sequencing with chain-terminating inhibitors. PNAS 74(12), 5463–5467 (1977) [17] Levy, S., Sutton, G., Ng, P.C., et al.: The diploid genome sequence of an individual human. PLoS Biology 5(10), October 2007, e254–e254 (2007) [18] Hinds, D.A., Stuve, L.L., Nilsen, G.B., Halperin, E., Eskin, E., Ballinger, D.B., Frazer, K.A., Cox, D.R.: Whole-genome patterns of common dna variation in three human populations. Science 307(5712), 1072–1079 (2005) [19] H¨ uffner, F.: Algorithm engineering for optimal graph bipartization. In: Nikoletseas, S.E. (ed.) WEA 2005. LNCS, vol. 3503, pp. 240–252. Springer, Heidelberg (2005)
444
M. Xie, J. Wang, and J. Chen
[20] Panconesi, A., Sozio, M.: Fast hare: a fast heuristic for single individual snp haplotype reconstruction. In: Jonassen, I., Kim, J. (eds.) WABI 2004. LNCS (LNBI), vol. 3240, pp. 266–277. Springer, Heidelberg (2004) [21] Myers, G.: A dataset generator for whole genome shotgun sequencing. In: Lengauer, T., Schneider, R., Bork, P., Brutlag, D.L., Glasgow, J.I., Mewes, H.W., Zimmer, R. (eds.) Proc. ISMB, California, pp. 202–210. AAAI Press, Menlo Park (1999)
Improved Algorithms for Bicluster Editing Jiong Guo1, , Falk H¨ uffner1, , Christian Komusiewicz1, , and Yong Zhang2 1
2
Institut f¨ ur Informatik, Friedrich-Schiller-Universit¨ at Jena Ernst-Abbe-Platz 2, D-07743 Jena, Germany {guo,hueffner,ckomus}@minet.uni-jena.de Department of Mathematical Sciences, Eastern Mennonite University Harrisonburg, VA 22802, USA [email protected]
Abstract. The NP-hard Bicluster Editing is to add or remove at most k edges to make a bipartite graph G = (V, E) a vertex-disjoint union of complete bipartite subgraphs. It has applications in the analysis of gene expression data. We show that by polynomial-time preprocessing, one can shrink a problem instance to one with 4k vertices, thus proving that the problem has a linear kernel, improving a quadratic kernel result. We further give a search tree algorithm that improves the running time bound from the trivial O(4k + |E|) to O(3.24k + |E|). Finally, we give a randomized 4-approximation, improving a known approximation with factor 11.
1
Introduction
Data clustering is a classical task, where the goal is to partition a data set into clusters such that elements within a cluster are similar, while between clusters there is less similarity. This similarity is often modeled as a graph: Each vertex represents a data point, and two vertices are connected by an edge iff the entities that they represent have some (context-specific) similarity. If the data were perfectly clustered, this would result in a cluster graph, that is, a graph where every connected component is a clique. However, for real-world data, there is typically noise in the data. A simple clustering model is then the Cluster Editing problem [4, 19]: find a minimum set of edges to add or delete to make the graph a cluster graph. Cluster Editing is NP-hard [15]; a number of approaches have been recently suggested to deal with this. After a series of improvements, the best known polynomial-time approximation is by a factor of 2.5 [2, 21]. Another technique is that of fixed-parameter (FPT) algorithms [7, 9, 17]. The idea is to accept the superpolynomial running time that seems to be inherent to NP-hard problems, but to restrict the combinatorial explosion to a parameter that is expected to be small. For Cluster Editing, the number of editing operations k is a suitable
Supported by the Deutsche Forschungsgemeinschaft, Emmy Noether research group PIAF (fixed-parameter algorithms), NI 369/4. Supported by a PhD fellowship of the Carl-Zeiss-Stiftung.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 445–456, 2008. c Springer-Verlag Berlin Heidelberg 2008
446
J. Guo et al.
parameter, since for data with not too much noise it should be low. Several fixed-parameter algorithms for Cluster Editing have been suggested (see also H¨ uffner et al. [14] for a survey on FPT techniques in graph-modeled clustering). The search tree algorithm of Gramm et al. [10] with a running time bound of O(2.27k + n3 ) has been experimentally evaluated [6]. A recent manuscript [5] claims a running time of O(1.83k + n3 ) by using a different branching strategy and reports further experimental results. An important tool of FPT algorithmics is kernelization [7, 9, 17]. A kernelization is a polynomial-time preprocessing that reduces an instance to a size that depends only on the parameter k, and not on the input size |G| anymore. Clearly, such a preprocessing is useful for basically any approach to solving the problem, be it exact, approximative, or heuristic. For Cluster Editing, after a series of improvements [8, 10, 18], a kernel of only 4k vertices is known [11]. In some settings, the standard clustering model is not satisfactory. An important example is clustering of gene expression data, where under a number of conditions the level of expression of a number of genes is measured. This yields a bipartite similarity graph. Here, clustering only genes or only conditions often does not yield sufficient insight; we would like to find subsets of genes and subsets of conditions that together behave in a consistent way. This is called biclustering [16, 20]. A simple formulation of biclustering analogous to Cluster Editing is Bicluster Editing. Here, as a consistency condition for a cluster, we demand that it forms a biclique, that is, a complete bipartite subgraph. With bipartite graphs, we mean two-colorable graphs. Further, we do not allow any clusters to overlap. Bicluster Editing Instance: A bipartite graph G = (V, E) and an integer k ≥ 0. Question: Can we delete and add at most k edges in G such that it becomes a bicluster graph, that is, a graph where every connected component is a biclique? Further applications of biclustering arise in collaborative filtering, information retrieval, and data mining. Despite its importance, there are fewer results for Bicluster Editing than for Cluster Editing. Amit [3] proved the NPhardness and gave a factor-11 approximation based on the relaxation of a linear program. Using a simple branching strategy, the problem can be solved in O(4k + m) time [18], where m is the number of edges in the graph. Protti et al. [18] showed how to construct a problem kernel with 4k 2 + 6k vertices. Contributions. Following the work recently done for Cluster Editing, our aim is to improve FPT and approximation algorithms also for its sister problem Bicluster Editing. We first improve the size of the problem kernel from 4k 2 +6k to 4k vertices (Sect. 2). The methods used are similar to those of Guo [11]. If the input graph is not already a bipartite graph, we can still get a 6k-vertex kernel by similar means. Next, we show that the trivial O(4k + m) time branching algorithm can be improved to O(3.24k + m) time by a more refined branching strategy (Sect. 3). Finally, we give a randomized approximation algorithm
Improved Algorithms for Bicluster Editing
447
with an expected approximation factor of 4, using similar techniques as those introduced for Cluster Editing [1]. Preliminaries. We consider only undirected graphs G = (V, E) with n := |V | and m := |E|. Since singleton vertices do not play an interesting role in our problems, we assume that n ∈ O(m). Let P4 denote an induced path comprising 4 vertices. Furthermore, let ijkl denote a P4 in which i and l have degree 1 and j and k have degree 2. The neighborhood of a vertex v is denoted by N (v), and the closed neighborhood N (v) ∪ {v} is denoted by N [v]. We furthermore extend this notation to vertex sets, that is, for a vertex set S, N (S) := ( v∈S N (v)) \ S. For a vertex v, N2 (v) := N (N (v)) \ {v} denotes the set of vertices that have distance exactly 2 from v. Due to lack of space, several proofs are deferred to a full version of this paper.
2
Linear Problem Kernel
In this section, we present a kernelization algorithm for Bicluster Editing that produces a kernel consisting of at most 4k vertices, improving the kernel consisting of O(k 2 ) vertices given by Protti et al. [18]. This kernelization follows the idea of the kernelization algorithm for Cluster Editing in [11] that also produces a kernel consisting of at most 4k vertices. However, since here we are dealing with bipartite graphs and bicliques, the concrete handling of the data reduction rules and the argumentation of the kernel size are different from the one for Cluster Editing. The first step is to introduce a useful structure. Definition 1. A set S of vertices is called a critical independent set if all vertices in S have the same open neighborhood and S is maximal under this property. Observe that every critical independent set is an independent set. The connection between critical independent sets and Bicluster Editing is given by the following lemma. Lemma 1. For any critical independent set I, there is an optimal solution of Bicluster Editing in which any two vertices v1 and v2 from I end up in the same biclique. We apply the following two data reduction rules; the second one works on critical independent sets. Rule 1. Remove all connected components that are bicliques from the graph. Rule 2. Consider a critical independent set R. Let S := N (R) and T := N (S) \ R. If |R| > |T |, then remove arbitrary vertices from R until |R| = |T |. Rule 1 is clearly correct and can be carried out in O(m) time. A situation in which Rule 2 can be applied is illustrated in Fig. 1. Next, we prove the correctness of Rule 2.
448
J. Guo et al.
R
S
T
U
Fig. 1. Example for the application of Rule 2
Lemma 2. Rule 2 is correct and works in O(n2 ) time. Proof. To prove the correctness of Rule 2, we first claim that, as long as |R| ≥ |T |, there is always an optimal solution constructing a bicluster graph that contains a biclique B with R ∪ S ⊆ B ⊆ R ∪ S ∪ T and, thus, deleting or inserting no edge incident to R. Since the input graph G and the graph resulting by one application of Rule 2 to G differ only in the size of R, the correctness of Rule 2 follows. To show the claim, let U := N (T ) \ S. First, observe that R will not be “split” (following from Lemma 1), that is, there is always an optimal solution leaving a biclique B with R ⊆ B. Second, we prove that no vertex outside of R ∪ S ∪ T can be in B, that is, B ⊆ R ∪S ∪T . To see this, if a vertex u ∈ / R ∪S ∪T is in B, then obviously u is from U . However, to add u to B needs at least |R| edge insertions. Thus, as long as |R| ≥ |T |, adding u to B is never better than putting u in a biclique different from B, which requires at most |T | edge deletions. Finally, we show that no vertex from S can be outside of B, that is, R ∪ S ⊆ B. This is easy to see, since every vertex u ∈ S has only neighbors in R and T . By |R| ≥ |T | and B ⊆ R ∪ S ∪ T , including u in B requires at most |T | edge modifications, namely, deleting all edges between u and N (u) ∩ T and adding edges between u and T \N (u). In comparison, excluding u from B needs at least |R| edge deletions, since R ⊆ N (u). Concerning the running time, one can compute all critical independent sets in O(m) time [12]. Then, we determine the sets S and T for all independent sets R, which can be done in O(n2 ) time. To check the applicability of Rule 2, one iterates over all critical independent sets and uses the already computed information about S and T to decide if the precondition of Rule 2 is fulfilled by R. Note that after one application of Rule 2, one has only to consider the critical independent sets whose vertices are in S and T and to change the sizes of their S’s and T ’s. Therefore, each application of Rule 2 can be carried out in O(n) time. Rule 2 can be applied at most n times, which gives the total running time O(n2 ). With these two rules we can now prove a kernel consisting of at most 4k vertices. Theorem 1. Bicluster Editing on bipartite graphs admits a 4k-vertex problem kernel. Proof. Let G denote a bipartite graph on which Rules 1 and 2 have been exhaustively applied. Furthermore, let F be a bicluster editing set with |F | ≤ k
Improved Algorithms for Bicluster Editing
449
and let G be the resulting bicluster graph after applying the edge modifications in F to G. We partition the vertices in G into two sets, X containing the endpoints of the edges in F , and Y the rest. Clearly, |X| ≤ 2k. It remains to upper-bound |Y |. Suppose G consists of l biclusters, B1 , . . . , Bl . It is easy to see that for every i ∈ {1, . . . , l}, the unaffected vertices from one partition in Bi must have the same neighborhood in G. Hence, for every i ∈ {1, . . . , l}, the vertices in Bi ∩ Y form at most two critical independent sets in G. Let R be one critical independent set in Bi , S := NG (R), and T := NG (S) \ R. Due to Rule 2, |R| ≤ |T |. Due to Rule 1, all vertices of T are in X. Some of them are in Bi after gaining some edges between them and the vertices in S and the others are in other bicliques after losing all edges between them and S. Therefore, summing up over all critical independent sets in all bicliques, we can conclude that |Y | ≤ |X|, giving the claimed number of vertices in the kernel. Protti et al. [18] have considered Bicluster Editing on general graphs as well and presented a problem kernel with O(k 2 ) vertices. With a slight modification of Rule 2, we can improve this result to a 6k-vertex problem kernel. The main difference to the kernelization for bipartite graphs lies in the edges between the vertices of S: If the vertices in S have much more edges between them than the size of R, it could be better to keep them and to remove some edges between R and S. To take this into account, we make a partition of the vertices in S as described below. Modified Rule 2. Consider a critical independent set R, let S := N (R) and T := N (S) \ R. Further partition S into two sets, S1 the set of vertices without neighbors in S and S2 := S \ S1 . If |R| > |S2 | + |T |, then reduce R until |R| = |S2 | + |T |. The correctness proof of the modified Rule 2 is almost the same as the one for Lemma 2, namely, showing that in case |R| ≥ |S2 | + |T | there is always an optimal solution creating a biclique B, such that R ∪ S ⊆ B ⊆ R ∪ S ∪ T . The only difference concerns the vertices in S2 . Since they have neighbors in S2 , including them in B requires not only deleting and adding edges between them and T but also deleting edges between them and their neighbors in S2 . However, if |R| ≥ |S2 | + |T |, then it is never better to exclude them from B than to include them in B. Theorem 2. Bicluster Editing on general graphs admits a 6k-vertex problem kernel.
3
Fixed-Parameter Algorithm
In this section, we present a search tree algorithm for Bicluster Editing in bipartite graphs that is based on the forbidden subgraph characterization of Bicluster Editing and has a running time of O(3.24k + m), improving upon the trivial search tree algorithm with a running time of O(4k + m) [18].
450
J. Guo et al.
Let ijkl be a P4 in G. The trivial search tree algorithm for Bicluster Editing branches on ijkl in 4 cases: one case corresponds to adding edge {i, l}; the other three cases correspond to removing one of the three edges of ijkl. The improvement of the running time of our algorithm is achieved by applying a refined branching strategy on larger subgraphs that contain a P4 . In this branching strategy, we distinguish two main cases. For the first case, we show that an improved branching can be achieved. For the second case, we show that it can be solved in polynomial time. In the following, we describe this branching strategy. Clearly, branching is only performed as long as G is not a bicluster graph. Therefore, we assume that G contains a P4 . Furthermore, we deal with each connected component separately. Therefore, without loss of generality assume that G is connected. We distinguish two main cases. Case 1: There is a connected subgraph of size 5 of G that contains a P4 and has 4 edges. Let G be such a subgraph, and let ijkl denote a P4 contained in G . Since G is connected and contains 5 vertices and 4 edges, it must contain a vertex u that is connected to exactly one vertex in ijkl. Hence, the resulting graph is either a P5 —in case u is adjacent to i or l—or a so-called fork —in case u is adjacent to j or k. We describe the branching strategy for P5 ’s in detail. Branching on forks works analogously. Both branchings are depicted in Fig. 2. Let ijklu be the P5 that we branch on. In the first branch, we delete the edge {k, l}, the parameter is decreased by 1. In the second branch, we delete the edge {j, k} and the parameter is decreased by 1. Since we have already considered deleting {k, l} or {j, k}, we can mark these two edges as permanent, that is, we may not delete these edges in the remaining branches. To destroy jklu, we must either delete {l, u} or add {j, u}. But after performing either of these two edge modifications the graph still contains ijkl, and {j, k} and {k, l} are marked as permanent. Hence, for each of these two branches, we create two subbranches, one in which {i, l} is added, and one in which {i, j} is deleted. In total, we have 6 branches, two branches in which the parameter is decreased by 1, and 4 branches in which the parameter is decreased by 2. To estimate the size of the search tree, we use the concept of branching vectors [17]. The branching vector of this branching is (1, 1, 2, 2, 2, 2). The branching on a fork works analogously. Case 2: Otherwise. Since Case 1 did not apply, every connected subgraph of size 5 that contains a P4 has at least 5 edges. We show that in this case, no branching is needed because G can be turned into a biclique by adding one edge. Lemma 3. Let G = (V1 , V2 , E) be a fork-free and P5 -free connected bipartite graph, and let ijkl be a P4 in G. Then, adding edge {i, l} transforms G into a biclique. Proof. W.l.o.g. assume that {i, k} ⊆ V1 and {j, l} ⊆ V2 . We prove the lemma by showing that with the exception of {i, l} all edges are present in G. First, we show that k is adjacent to all vertices in V1 , and that j is adjacent to all vertices in V2 . Clearly, if i has a neighbor u, then u is a neighbor of k. Otherwise, G[{i, j, k, l, u}] is a P5 , and G is thus not P5 -free. Also, every vertex u ∈
Improved Algorithms for Bicluster Editing i k u
l
k i
u
k
j
u
l
i j
u
l
l
i k
k
l j i
j u
j
i
l
k
j
i
l
k
l
j l
u
k u
k
j
u l (a) Case 1.1: Branching on a P5 .
i
j
l
u
l
k
j
i
k
j
i
i j
451
u
i
j u
k
l
k
j u l j
u
k
i
k l (b) Case 1.2: Branching on a fork.
j l
k
Fig. 2. Branching on subgraphs of size 5 that contain a P4 and exactly 4 edges. Dashed lines are deleted edges; bold lines are permanent edges.
N (k) \ {l} is a neighbor of i. Otherwise, G[{i, j, k, l, u}] is a fork, and G thus not fork-free. Therefore N (i) = N (k) \ {l}. Analogously, we can show that N (l) = N (j) \ {i}. Furthermore, every vertex v ∈ N2 (i) is adjacent to j. Otherwise, suppose that there is a vertex v ∈ N2 (i) that is not adjacent to j and let u be the common neighbor of i and v. Then, the subgraph G[{i, k, l, u, v}] is a fork, because u is also adjacent to k (since u ∈ N (i) ⊂ N (k)), and u is not adjacent to l (since u∈ / N (j) ⊃ N (l)). Hence, G is not fork-free in this case. Analogously, we can show that k is adjacent to all vertices in N2 (l). With this it becomes obvious that k is adjacent to all vertices in V1 , and j is adjacent to all vertices in V2 . Now we show that every pair of vertices u ∈ V1 \ {i} and v ∈ V2 \ {l} must be pairwise adjacent. Suppose, there are two vertices u ∈ V1 \ {i} and v ∈ V2 \ {l} that are not pairwise adjacent. Since i is adjacent to all vertices in V1 \ {l}, it is adjacent to v. Analogously, one can show that j and l are adjacent to u. Therefore, G[{i, j, l, v, u}] is a P5 if v and u are not adjacent, contradicting the fact that G is P5 -free. Since all vertices in V1 \ {i} are adjacent to all vertices V2 \ {l}, it is clear that adding edge {i, l} transforms G into a biclique. In the following theorem, we bound the running time of the described search tree algorithm, when it is combined with kernelization. Theorem 3. Bicluster Editing can be solved in O(3.24k + m) time.
452
J. Guo et al.
ApproxBicluster(G = (V1 , V2 , E)) 1 G := (∅, ∅, ∅) 2 while V1 ∪ V2 = ∅: 3 randomly select a pivot vertex i ∈ V1 ∪ V2 4 C := {i} ∪ N (i) 5 for all j ∈ {v = i | N (v) ∩ N (i) = ∅} : 6 if N (j) = N (i) : add j to C 7 else : add j to C with probability 1/2 8 transform G[C] into an isolated biclique add G[C] as new component to G 9 G := G ∪ G[C] 10 G := G[V \ C] remove C from G 11 output set of edge modifications from G to G Fig. 3. A randomized factor-4 approximation algorithm for Bicluster Editing
4
Randomized 4-Approximation Algorithm
We present a polynomial time randomized factor-4 approximation algorithm for Bicluster Editing that is based on a technique introduced by Ailon et al. [1]. This improves the previously best factor-11 approximation algorithm by Amit [3]. The basic strategy of the algorithm is to randomly pick a pivot vertex v, and then to randomly destroy all P4 ’s that contain v. In doing so, we create an isolated biclique that contains v, since a connected component in which one vertex does not appear in a P4 is a biclique. This procedure is applied until the graph is a bicluster graph. In the following, we describe how the P4 ’s containing the pivot vertex v are destroyed. Given a pivot vertex i, we create a vertex set C that initially contains N [i]. In the end this set C contains the vertices that are in the same biclique as i in the final bicluster graph. First, we add all vertices that are in the same critical independent set as i. Then we randomly decide for each vertex w that is adjacent to at least one vertex of N (i) whether w should be added to C. Since w is adjacent to neighbors of N (i) but is not in the same critical independent set as i, there must be a P4 that contains i and w. By randomly deciding whether i and w end up in the same biclique, we randomly decide which edge modification is made to destroy the P4 . After this is done for all such vertices, we output C and cut C from G. This is done until G has no vertex. The pseudo-code of the algorithm is shown in Fig. 3. In order to apply the method of Ailon et al. [1], the algorithm must guarantee that after an edit operation is made on an edge, this edge is never again modified during the course of the algorithm, and that for a given P4 each edit operation has the same probability. In our case this probability is 14 , which leads to an approximation factor of 4. To prove this upper bound on the approximation factor, we first need the notion of a fractional packing.
Improved Algorithms for Bicluster Editing
i
j
k
l
pivot i
P r[Eij ] P r[Eil ] P r[Ekj ] P r[Ekl ] 1 1 0 0 2 2
j
0
k
1 2 1 2 1 4
l i∨j∨k∨l
453
1 2 1 2
0
1 2
0
0
0
1 2 1 4
0
1 4
1 4
Fig. 4. The probabilities of edge modifications in ijkl in case one of {i, j, k, l} is chosen as pivot. The event that there is an edge modification between two vertices i and j is denoted by Eij .
Definition 2. Let G = (V1 , V2 , E) be a bipartite graph, P the set of P4 ’s of G, and w : P → + a weightfunction. The function w is called a fractional packing of P if ∀i ∈ V1 , j ∈ V2 : {p∈P |{i,j}∈p} w(p) ≤ 1.
Ê
In the following lemma, we show that given a fractional packing w, the sum of the weights w(p) of all p ∈ P is a lower bound on the cost of optimal solutions. Lemma 4. Let G = (V1 , V2 , E) be a bipartite graph, COpt the cost of an optimal solution of Bicluster Editing of G, and P the set of P4 ’s. If a weight function w : P → + is a fractional packing of P , then p∈P w(p) ≤ COpt .
Ê
Using Lemma 4, we can show an upper bound on the approximation factor of ApproxBicluster, as follows. First, we show that the sum of the probabilities of making edge modifications in all P4 ’s caused by choosing one of their vertices as pivot equals the expected cost of the solution that is output by ApproxBicluster. Then, we show that dividing these probabilities by 4 also yields a fractional packing and thus that the expected cost of the output solution is at most 4 times the cost of an optimal solution. Theorem 4. ApproxBicluster is a randomized factor-4 approximation algorithm for Bicluster Editing, running in O(n2 ) time. Proof. Obviously, the output of ApproxBicluster is a solution of Bicluster Editing. We thus prove the theorem by bounding the approximation factor of the expected cost of the output of ApproxBicluster. Let C be the cost of a solution that is output by ApproxBicluster, and let COpt be the cost of an optimal solution. We prove the theorem by showing that the expected cost E[C] ≤ 4 · COpt . Clearly, the algorithm inserts or removes edges only between vertices that appear in a P4 . An edit operation is performed in a P4 ijkl only if i, j, k, and l are still in the graph and one of them is chosen as pivot. Let Aijkl denote the event that one vertex in {i, j, k, l} is chosen as pivot when all of them are still in the graph. Furthermore, let πijkl denote the probability that event Aijkl occurs during the execution of ApproxBicluster. In case that event Aijkl occurs, we perform exactly one edge edit operation between the vertices of {i, j, k, l}. Therefore, the expected cost E[C] of ApproxBicluster is p∈P πp .
454
J. Guo et al.
We complete the proof by first showing that the weight function w that is π obtained by assigning the weight ijkl to the P4 ijkl is a fractional packing 4 of the P4 ’s of G, and then showing that this leads to the claimed expected approximation factor. Let p be a P4 , and let i ∈ V1 , j ∈ V2 be two vertices in p. Let Eij denote the event that there is an edit operation between i and j. As Fig. 4 shows, the probability P r[Eij | Ap ] = 14 . Therefore, P r[Eij ∧ Ap ] = P r[Eij | Ap ] · P r[Ap ] =
1 πp . 4
Furthermore, note that after event Eij occurred, at least one of i and j is removed from the graph, and thus no further editing between i and j takes place. Therefore, for distinct P4 ’s p and p , the events Eij ∧Ap and Eij ∧Ap are disjoint and thus 1 πp ≤ 1. P r[Eij ∧ Ap ] = 4 {p∈P |{i,j}∈p}
{p∈P |{i,j}∈p}
With this, it becomes obvious that assigning the weight 14 πp to every p ∈ P results in a fractional packing of the P4 ’s of G. As shown by Lemma 4, this means that p∈P 14 πp ≤ COpt . Therefore, E[C] =
πp ≤ 4 · COpt ,
p∈P
which proves the upper bound on the approximation factor. Now we prove the running time of the algorithm. In a preprocessing step, we compute the critical independent sets of the graph, which can be performed in O(n+m) time [12]. This is done so that the test in line 6 of the algorithm can be performed in constant time. Determining which vertices end up in C takes O(m) time overall, since the test for membership in the same critical independent set can now be performed in constant time and each edge is visited at most once: either the edge is cut or it is part of the isolated biclique that is removed from G and added to G . Finally, the number of added edges is in O(n2 ), which results in the claimed running time bound. Note that the running time of Theorem 4 can be improved to O(m) when the output is merely a list of the bicliques and the vertices they contain. Otherwise, a linear running time cannot be achieved, because the output size cannot be bounded by O(m).
5
Outlook
We have improved kernelization, parameterized algorithm, and approximation algorithm for Bicluster Editing. It is probably possible to further improve the bound of the FPT algorithm, albeit only at the cost of a more complicated case
Improved Algorithms for Bicluster Editing
455
distinction. Further improvement of the approximation factor and derandomization of the result of Theorem 4 should be possible using similar techniques as for Cluster Editing [1, 21, 22]. Together, the improvements make non-heuristic algorithm implementations much more feasible. In particular useful seems the kernelization, which for example is guaranteed to reduce an instance with 1000 vertices and k = 50 to only 200 vertices, without losing optimality (note that here the quadratic kernelization [18] does not give any useful bound). It is conceivable that in many cases the kernelized instance can be solved by the branching algorithm from Sect. 3 within reasonable time. Further, it would be interesting to see whether the observed approximation quality of the approximation algorithm from Sect. 4 improves by the preprocessing. Variants of Bicluster Editing are also of interest, for example considering weights, allowing the deletion of vertices instead of adding and deleting edges, or the generation of a prespecified number of bicliques (the corresponding variations for Cluster Editing have received some attention, see e. g. [11, 13]).
References [1] Ailon, N., Charikar, M., Newman, A.: Aggregating inconsistent information: ranking and clustering. In: Proc. 37th STOC, pp. 684–693. ACM Press, New York (2005) [2] Ailon, N., Charikar, M., Newman, A.: Proofs of conjectures in Aggregating inconsistent information: Ranking and clustering. Technical Report TR-719-05, Department of Computer Science, Princeton University (2005) [3] Amit, N.: The bicluster graph editing problem. Master’s thesis, Tel Aviv University, School of Mathematical Sciences (2004) [4] Bansal, N., Blum, A., Chawla, S.: Correlation clustering. Machine Learning 56(1– 3), 89–113 (2004) [5] B¨ ocker, S., Briesemeister, S., Bui, Q.B.A., Truß, A.: PEACE: Parameterized and exact algorithms for cluster editing. Manuscript, Lehrstuhl f¨ ur Bioinformatik, Friedrich-Schiller-Universit¨ at Jena (September 2007) [6] Dehne, F.K.H.A., Langston, M.A., Luo, X., Pitre, S., Shaw, P., Zhang, Y.: The cluster editing problem: Implementations and experiments. In: Bodlaender, H.L., Langston, M.A. (eds.) IWPEC 2006. LNCS, vol. 4169, Springer, Heidelberg (2006) [7] Downey, R.G., Fellows, M.R.: Parameterized Complexity. Springer, Heidelberg (1999) [8] Fellows, M.R., Langston, M.A., Rosamond, F.A., Shaw, P.: Efficient parameter´ ized preprocessing for cluster editing. In: Csuhaj-Varj´ u, E., Esik, Z. (eds.) FCT 2007. LNCS, vol. 4639, pp. 312–321. Springer, Heidelberg (2007) [9] Flum, J., Grohe, M.: Parameterized Complexity Theory. Springer, Heidelberg (2006) [10] Gramm, J., Guo, J., H¨ uffner, F., Niedermeier, R.: Graph-modeled data clustering: Exact algorithms for clique generation. Theory of Computing Systems 38(4), 373– 392 (2005) [11] Guo, J.: A more effective linear kernelization for Cluster Editing. In: Chen, B., Paterson, M., Zhang, G. (eds.) ESCAPE 2007. LNCS, vol. 4614, Springer, Heidelberg (2007)
456
J. Guo et al.
[12] Hsu, W., Ma, T.: Substitution decomposition on chordal graphs and applications. In: Hsu, W.-L., Lee, R.C.T. (eds.) ISA 1991. LNCS, vol. 557, pp. 52–60. Springer, Heidelberg (1991) [13] H¨ uffner, F., Komusiewicz, C., Moser, H., Niedermeier, R.: Fixed-parameter algorithms for cluster vertex deletion. In: Lin, G. (ed.) COCOON. LNCS, vol. 4598, pp. 711–722. Springer, Heidelberg (2007) [14] H¨ uffner, F., Niedermeier, R., Wernicke, S.: Fixed-parameter algorithms for graphmodeled data clustering. In: Clustering Challenges in Biological Networks, World Scientific, Singapore (to appear, 2008) [15] Kˇriv´ anek, M., Mor´ avek, J.: NP-hard problems in hierarchical-tree clustering. Acta Informatica 23(3), 311–323 (1986) [16] Madeira, S.C., Oliveira, A.L.: Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics 1(1), 24–45 (2004) [17] Niedermeier, R.: Invitation to Fixed-Parameter Algorithms. Oxford Lecture Series in Mathematics and Its Applications, vol. 31. Oxford University Press, Oxford (2006) [18] Protti, F., da Silva, M.D., Szwarcfiter, J.L.: Applying modular decomposition to parameterized bicluster editing. In: Bodlaender, H.L., Langston, M.A. (eds.) IWPEC 2006. LNCS, vol. 4169, pp. 1–12. Springer, Heidelberg (2006); To appear under the title “Applying modular decomposition to parameterized cluster editing problems” in Theory of Computing Systems. [19] Shamir, R., Sharan, R., Tsur, D.: Cluster graph modification problems. Discrete Applied Mathematics 144(1–2), 173–182 (2004) [20] Tanay, A., Sharan, R., Shamir, R.: Biclustering algorithms: A survey. In: Aluru, S. (ed.) Handbook of Computational Molecular Biology, pp. 26–1 – 26–17. Chapman Hall/CRC Press (2006) [21] van Zuylen, A.Z., Williamson, D.P.: Deterministic algorithms for rank aggregation and other ranking and clustering problems. In: Kaklamanis, C., Skutella, M. (eds.) WAOA 2007. LNCS, vol. 4927, pp. 260–273. Springer, Heidelberg (2007) [22] van Zuylen, A.Z., Hegde, R., Jain, K., Williamson, D.P.: Deterministic pivoting algorithms for constrained ranking and clustering problems. In: Proc. 18th SODA, pp. 405–414. SIAM, Philadelphia (2007)
Generation Complexity Versus Distinction Complexity Rupert H¨ olzl and Wolfgang Merkle Institut f¨ ur Informatik, Ruprecht-Karls-Universit¨ at, Heidelberg, Germany {hoelzl,merkle}@math.uni-heidelberg.de
Abstract. Among the several notions of resource-bounded Kolmogorov complexity that suggest themselves, the following one due to Levin [Le] has probably received most attention in the literature. With some appropriate universal machine U understood, let the Kolmogorov complexity of a word w be the minimum of |d| + log t over all pairs of a word d and a natural number t such that U takes time t to check that d determines w. One then differentiates between generation complexity and distinction complexity [A, Sip], where the former asks for a program d such that w can actually be computed from d, whereas the latter asks for a program d that distinguishes w from other words in the sense that given d and any word u, one can effectively check whether u is equal to w. Allender et al. [A] consider a notion of solvability for nondeterministic computations that for a given resource-bounded model of computation amounts to require that for any nondeterministic machine N there is a deterministic machine that exhibits the same acceptance behavior as N on all inputs for which the number of accepting paths of N is not too large. They demonstrate that nondeterminism is solvable for computations restricted to polynomially exponential time if and only if for any word the generation complexity is at most polynomial in the distinction complexity. We extend their work and a related result by Fortnow and Kummer [FK] as follows. First, nondeterminism is solvable for linearly exponential time bounds if and only if generation complexity is at most linear in distinction complexity. Second, nondeterminism is solvable for polynomial time bounds if and only if the conditional generation complexity of a word w given a word y is at most linear in the conditional distinction complexity of w given y; hence, in particular, the latter condition implies that P is equal to UP. Finally, in the setting of space bounds it holds unconditionally that generation complexity is at most linear in distinction complexity.
In general, the Kolmogorov complexity of a word w is the length |d| of a shortest program d such that d determines w effectively. In a setting of unbounded computations, this approach leads canonically to the usual notion of plain Kolmogorov complexity and its prefix-free variant. In a setting of resource-bounded computations though, there are several notions of Kolmogorov complexity that are in some sense natural – and none of them is considered canonical. M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 457–466, 2008. c Springer-Verlag Berlin Heidelberg 2008
458
R. H¨ olzl and W. Merkle
A straight-forward approach is to cap the execution time and/or used space by simply not allowing descriptions that take too long or too much space for producing the word we want to describe. This notion has the disadvantage that for a fixed resource-bound there is no canonical notion of universal machine. Another approach, which has received considerable attention in the literature, was introduced by Levin [Le], where, in contrast to the notion just mentioned, arbitrarily long computations are allowed, but a large running time increases in some way the complexity value. More precisely, with some appropriate universal machine U understood, in Levin’s model the Kolmogorov complexity of a word d is the minimum of |d|+log t over all pairs of a word d and a natural number t such that U takes time t to check that w is the word determined by d. As for other notions of resource-bounded Kolmogorov complexity, here one can differentiate between generation complexity and distinction complexity [A, Sip], where the former asks for a program d such that w can actually be computed from d, whereas the latter asks for a program d that distinguishes w from other words in the sense that given d and any word u, one can effectively check whether u is equal to w. The question of how generation and distinction complexity relate to each other in the setting of Levin’s notion of resource-bounded Kolmogorov complexity has been investigated by Allender et al. [A]. They consider a notion of solvability for nondeterministic computations that — for a given resource-bounded model of computation — amounts to require that for any nondeterministic machine N there is a deterministic machine that exhibits the same acceptance behavior as N on all inputs for which the number of accepting paths of N is not too large, e.g., is at most logarithmic in the number of all possible paths. Their main result then asserts that nondeterminism is solvable for computations restricted to polynomially exponential time if and only if for any word the generation complexity is at most polynomial in the distinction complexity. We extend the work of Allender et al. [A] and a related result by Fortnow and Kummer [FK] as follows. First, nondeterminism is solvable for linearly exponential time bounds if and only if generation complexity is at most linear in distinction complexity. Second, nondeterminism is solvable for polynomial time bounds if and only if the conditional generation complexity of a word w given a word y is at most linear in the conditional distinction complexity of w given y; as a consequence, the latter condition implies in particular that P is equal to UP. Combining the result on polynomial time bounds with a result by Fortnow and Kummer [FK] about Kolmogorov complexity defined in terms of fixed polynomial time bounds, one obtains that in the model just mentioned conditional generation and distinction complexity are close if and only if they are close in Levin’s model. Finally, in the setting of space bounds, more precisely, for complexity measures Ks and KDs that logarithmically count the used space instead of the running time used on a program, it holds unconditionally that generation complexity is at most linear in distinction complexity. The notion of generation complexity considered below differs from Levin’s original notion insofar as one has to generate only single bits of the word to be
Generation Complexity Versus Distinction Complexity
459
generated but not the word as a whole. This variant has already been used by Allender et al. [A]; their results mentioned above, as well as the results demonstrated below extend to Levin’s original model by almost identical proofs. For a complexity class C, we will refer by C-machine to any machine M that uses a model of computation and obeys a time- or space-bound such that M witnesses L(M ) ∈ C with respect to the standard definition of C. For example, an NE-machine is a nondeterministic machine that runs in linearly exponential time. The individual bits of a word x will be denoted by x1 to x|x| . We fix an appropriate universal machine U that receives as input encoded tuples of words and e.g. (x, y, z) will be encoded by x˜01˜ y01˜ z where the word u˜ is obtained by doubling every symbol in u, i.e., u ˜ = u1 u1 u2 u2 . . . u|u| u|u| . Logarithms to base 2 are denoted by log, and often a term of the form log t will indeed denote the least natural number s such that t ≤ 2s .
1
Known Results
Definition 1 (Levin [Le], Allender et al. [A]). Time-bounded generation complexity Kt(.) and distinction complexity KDt(.) are defined by ∀b ∈ {0, 1, ∗} : ∀i ≤ |x| + 1 : U (d, i, b) Kt(x) = min |d| + log t , runs for t steps and accepts iff (x∗)i = b ∀y ∈ Σ |x| : U (d, y) runs for . KDt(x) = min |d| + log t t steps and accepts iff x = y Observe that in the definition of Kt-complexity the symbol ∗ has to be generated as an end marker for the word x. Remark 2. The notion of Kt-complexity introduced in Definition 1 was proposed by Allender et al. in [A] as a variation of Levin’s original definition, where the latter requires to generate whole words instead of individual bits. Levin’s original definition has the advantage of assuring that for all x, it holds that KDt(x) ≤ Kt(x) + log |x|. In connection with Theorem 14 we also use the following conditional complexity notions. Definition 3. The conditional time-bounded distinction complexity Kt(.|.) and conditional generation complexity KDt(.|.) are defined by ∀b ∈ {0, 1, ∗} : ∀i ≤ |x| + 1 : U (d, y, i, b) , Kt(x|y) = min |d| + log t runs for t steps and accepts iff (x∗)i = b ∀z ∈ Σ |x| : U (d, y, z) runs for KDt(x|y) = min |d| + log t . t steps and accepts iff z = x
460
R. H¨ olzl and W. Merkle
We will shortly review Theorem 17 from Allender et al. [A] before we will state our extensions. Definition 4 (Allender et al. [A]). We say that FewEXP search instances are EXP-solvable if, for every NEXP-machine N and every k, there is an k EXP-machine M with the property that if N has fewer than 2|x| accepting paths on input x, then M produces on input x some accepting path as output if there is one. We say that FewEXP decision instances are EXP-solvable if, for every NEXP-machine N and every k, there is an EXP-machine M with the property k that if N has fewer than 2|x| accepting paths on input x, then M accepts x if and only if N accepts x. We say that FewEXP decision instances are EXP/poly-solvable if, for every NEXP-machine N and every k, there is an EXP-machine M having access to k advice of polynomial length, such that if N has fewer than 2|x| accepting paths on input x, then M accepts x if and only if N accepts x. The notion of solvability can be equivalently characterized in terms of promise problems [CHV, FK]. This will be discussed further in connection with Theorem 17 by Fortnow and Kummer. Remark 5. Note that by definition EXP-solvability of FewEXP decision instances implies FewEXP = EXP, but it is unknown whether the reverse implication holds as well. This is because the definition of EXP-solvability does not require the considered machines to have a limited number of accepting paths on all inputs. Theorem 6 (Allender et al. [A]). The following statements are equivalent: 1 2 3 3 4
For all x, Kt(x) ∈ (KDt(x))O(1) . FewEXP search instances are EXP-solvable. FewEXP decision instances are EXP-solvable. FewEXP decision instances are EXP/poly-solvable. For all A ∈ P and for all y ∈ A=l it holds that Kt(y) ∈ (log |A=l | + log l)O(1) .
In words this means, that if generating is “not much more difficult” than distinguishing, then witnesses for certain nondeterministic computations with few witnesses can be found deterministically, and vice versa.
2
Tools
In what follows we will use a corollary of the following result by Buhrman et al., which have also been used by Allender et al. We omit proofs and more detailed discussion due to space considerations.
Generation Complexity Versus Distinction Complexity
461
Lemma 7 (Buhrman et al. [BFL]). Let n be large enough and let A := {x1 , x2 , . . . , x|A| } ⊆ {l, l + 1, . . . , l + n − 1}. Then for all xi ∈ A and at least half of the prime numbers p ≤ 4 · |A| · log2 n it holds for all j = i that xi ≡ xj (mod p). Corollary 8. Let A ⊆ Σ ∗ , y ∈ Σ ∗ and l ∈ N. Let Ay,l := A ∩ {x | y x ∧ |x| = l}. Then it holds that ∀y ∈ Ay,l : KDtAy,l (x) ≤ 2 log |Ay,l | + O(log l) In particular, if there is a machine that on input y, l and x decides in polynomial time whether x is in the set Ay,l , then ∀x ∈ Ay,l : KDt(x|y) ≤ 2 log |Ay,l | + O(log l).
3
New Results
We will now state our variants of Theorem 6 by Allender et al., which are demonstrated by similar proofs. Definition 9. We say that FewE search instances are E-solvable if, for every NE-machine N and every k, there is an E-machine M with the property that if N has fewer than 2k·|x| accepting paths on input x, then M produces on input x some accepting path as output if there is one. We say that FewE decision instances are E-solvable if, for every NE-machine N and every k, there is an E-machine M with the property that if N has fewer than 2k·|x| accepting paths on input x, then M accepts x if and only if N accepts x. We say that FewE decision instances are E/lin-solvable if, for every NEmachine N and every k, there is an E-machine M having access to advice of linear length, such that if N has fewer than 2k·|x| accepting paths on input x, then M accepts x if and only if N accepts x. We say that UE decision instances are E-solvable if, for every NE-machine N and every k, there is an E-machine M with the property that if N has at most one accepting path on input x, then M accepts x if and only if N accepts x. Theorem 10. The following statements are equivalent: 1 2 3 3 3
For all words x, Kt(x) ∈ O(KDt(x)). FewE search instances are E-solvable. FewE decision instances are E-solvable. UE decision instances are E-solvable. FewE decision instances are E/lin-solvable.
462
4
R. H¨ olzl and W. Merkle
For all A ∈ P it holds that for Ay,l := A ∩ {x | y x ∧ |x| = l} and for all x ∈ Ay,l Kt(x) ∈ O(log |Ay,l | + log l + |y|).
We omit the proof of Theorem 10 due to space constraints. Remark 11. Theorem 10 remains valid by essentially the same proof when formulated for Levin’s original notion of Kt instead of the variant of Allender et al. Corollary 12. If for all x, Kt(x) ∈ O(KDt(x)), then UE = E. Proof. According to the theorem, the assumption implies that UE decision instances are E-solvable. Since a language in UE contains only such instances, the claim follows.
The equivalence stated in Theorem 10 can be extended to the setting of polynomial time bounds when considering conditional complexities Definition 13. We say that FewP search instances are P-solvable if, for every NP machine N and every k there is a P machine M with the property that if N has fewer than |x|k accepting paths on input x, then M produces on input x some accepting path as output if there is one. We say that FewP decision instances are P-solvable if, for every NP machine N and every k there is a P machine M with the property that if N has fewer than |x|k accepting paths on input x, then M accepts x if and only if N accepts x. We say that UP decision instances are P-solvable if, for every NP-machine N and every k, there is a P-machine M with the property that if N has at most one accepting path on input x, then M accepts x if and only if N accepts x. Theorem 14. The following statements are equivalent: 1 2 3 3 4
For all words x and y, Kt(x|y) ∈ O(KDt(x|y)). FewP search instances are P-solvable. FewP decision instances are P-solvable. UP decision instances are P-solvable. For all A ∈ P it holds that for Ay,l := A ∩ {x | y x ∧ |x| = l} and for all x ∈ Ay,l Kt(x|y) ∈ O(log |Ay,l | + log l).
Proof. (1 ⇒ 4): We have access to y through the conditioning. If we also have access to l, we can decide membership of x in Ay,l in polynomial time. To do this, we first check whether y x and whether y has the correct length l. If yes, compute the value A(x) = Ay,l (x) using the fact that A ∈ P. Corollary 8 then yields KDt(x|y) ≤ 2 log |Ay,l | + O(log l). Using assumption 1 the claim follows.
Generation Complexity Versus Distinction Complexity
463
(4 ⇒ 2): Let N be any nondeterministic machine running in polynomial time nk , where we can assume that N branches binarily. Let L denote L(N ). Let D := {yx | x ∈ {0, 1}|y| codes an accepting computation of N on y}. k
Obviously, D ∈ P. Now fix any y such that M on input y has at most |y|k accepting paths. Then the set Dy := D ∩ {yx | |x| = |y|k } contains at most |y|k words and by assumption 4 it follows that ∀yx ∈ Dy : Kt(yx|y) ∈ O(log |Dy | + log(|y| + |y|k )) = O(log(|y|)) So, in order to find an accepting path of M on input y, if there is one, it suffices to search through all words x with Kt(x|y) ≤ O(log |y|). This can be done in polynomial time, so that L ∈ P as was to be shown. Note that it causes no problems that we have to deal with conditional complexity here. This is because when we are searching for an accepting path x for a word y we obviously have access to y. (2 ⇒ 3): This is trivial. (3 ⇒ 3 ): This is trivial. (3 ⇒ 1): Let us assume that we have a KDt-description d, finite conditioning information y and a described word x such that the universal machine U accepts the triple (d, y, x) in tKDt steps. Assuming that this is the optimal description for x we would have KDt(x|y) = |d| + log tKDt . Since we can in tKDt steps only access the first tKDt bits of y (with the encoding of tupels introduced with the universal machine U ) we can use y[1..tKDt ] instead of y for the rest of the proof and can therefore w.l.o.g. assume |y| < tKDt . By a similar argument we can assume |d| < tKDt . Consider a variant UUP of the universal machine U where UUP is given ˆ yˆ, 1tˆ, ˆi, ˆb, n ˆ ). On any such input, if n ˆ > tˆ, then reject. inputs of the form (d, n ˆ Otherwise guess a word xˆ ∈ {0, 1} and check whether x ˆi = ˆb. If yes, UUP ˆ ˆ yˆ, x behaves for tˆ steps like U on input (d, yˆ, x ˆ), that is UUP accepts iff U (d, ˆ) accepts in these tˆ steps. For our fixed triple (d, y, x) this computation takes tUP steps, with tUP ∈ Θ(|x| + tKDt ). Note that the running time of the simulation is coded unarily into its input. Let’s call the coded number tcoded. So we have tKDt = tcoded. The execution of a distinguishing description for x on U takes at least |x| steps (otherwise x could not even be read completely, and therefore it could not be correctly distinguished). So we have tUP ∈ Θ(tKDt ). This computation is now a nondeterministic one that already correctly recognizes the given bit of x. Because no part of the input pentuple P is longer than tcoded , we have that the program has length Θ(tcoded) = Θ(tKDt ) and that therefore the running time tUP of the nondeterministic computation is in Θ(|P |). Since d is a distinguishing description for the word x ∈ {0, 1}n, for all i on input (d, y, 1tcoded , i, xi , |x|) there is a unique accepting path of UUP and
464
R. H¨ olzl and W. Merkle
none on input (d, y, 1tcoded , i, x¯i , |x|). By assumption 3 there is a deterministic machine M that for all such inputs has the same acceptance and rejection behaviour as N and works in some fixed polynomial time bound. The input for M together with an encoding of M is a generating program for x. It only remains to prove that this program is small enough and computes fast enough, compared to the KDt-program. This then implies Kt(x|y) ≤ O(KDt(x|y)), as desired. Let us first inspect the program length. The input for M consists of tcoded, |x| (both encoded in binary), M , d. Since tKDt counted logarithmically for KDt we have log tcoded ≤ KDt(x|y). KDt(x|y) is always greater than log |x|, for the same argument as above. One fixed M works for all appropriate inputs and its encoding therefore has constant length. Obviously, d ≤ KDt(x|y). Furthermore, y is given as part of the input to M , but does not add to Kt(x|y). Let us now inspect the running time of the code. The nondeterministic machine used a running time in Θ(|P |). After the conversion to a deterministic procedure we have by assumption a running time of (|P |)O(1) . In other words: A polynomial overhead might have been introduced relative to the nondeterministic running time tUP . Therefore we have tP ∈ O((tUP )c ) = O(tcKDt ), hence log(tP ) ∈ O(log tKDt ). All this, together with the fact that running time counts logarithmically, results in the required inequality Kt(x|y) ≤ O(KDt(x|y)).
Remark 15. For the same reason as in Remark 11, this proof would also work if we considered Levin’s original definition of Kt. Corollary 16. If for all x and y, Kt(x|y) ∈ O(KDt(x|y)), then UP = P. Proof. According to the theorem, the assumption implies that UP decision instances are P-solvable. Since a language in UP contains only such instances, the claim follows.
Fortnow and Kummer [FK, Theorem 24] proved an equivalence related to Theorem 14 in the setting of the “traditional”polynomially time-bounded Kolmogorov complexities C t and CDt [LiV, Chapter 7] where for example Ct (x) = min{|d| | U (d) runs on input x for t(|x|) steps and outputs x}. Theorem 17 (Fortnow, Kummer). The following two statements are equivalent: 1. UP decision instances are P-solvable. 2. For any polynomial t there are a polynomial t and a constant c ∈ N such that for all x and y it holds that C t (y|x) ≤ CDt (y|x) + c. Remark 18. In fact Fortnow and Kummer formulated their equivalence in terms of promise problems. Instead of the first statement in the theorem they used the assertion that the promise problem (1SAT, SAT) is in P, where for a promise
Generation Complexity Versus Distinction Complexity
465
problem (Q, R) to be in P means that there is a P-machine that accepts all x ∈ Q ∩ R and rejects all x ∈ Σ ∗ − R. Their formulation of the first statement is indeed equivalent to the one used above, because (1SAT, SAT) is complete for UP , as witnessed by a parsimonious version of Cook’s Theorem due to Simon [Sim, Theorem 4.1]. The following corollary is immediate from Theorems 14 and 17. Corollary 19. The following two statements are equivalent. 1. For all x and y, Kt(x|y) ∈ O(KDt(x|y)). 2. For any polynomial t there is a polynomial t and a constant c ∈ N such that for all x and y it holds that C t (y|x) ≤ CDt (y|x) + c. In analogy to the time-bounded case one can define the following two notions of space-bounded Kolmogorov complexity. Definition 20. The space-bounded distinction complexity Ks and generation complexity KDs are defined by ∀b ∈ {0, 1, ∗} : ∀i ≤ |x| + 1 : U (d, i, b) , Ks(x) = min |d| + log s runs in space s and accepts iff (x∗)i = b ∀y ∈ Σ |x| : U (d, y) runs in . KDs(x) = min |d| + log max(s, |x|) space s and accepts iff x = y Here U is a machine with a two-way read-only input tape where only the space on the work tapes is counted. Remark 21. For the definition of KDs it is relevant how the candidate y is provided to U and if the space for y is counted. Here we chose to do count the space for y which accounts for the term max |x| in the definition of KDs. This then implies the inequality log |x| ≤ KDs(x), which is analogous to the corresponding statement for KDt and will be used in the proof of Theorem 22. Theorem 22. For almost all x, it holds that Ks(x) ≤ 5 · KDs(x). Proof. Let N be a nondeterministic machine which on input (d, s, i, b, n) guesses a word y ∈ {0, 1}n, simulates the computation of U (d, y) while limiting the used space to s, and then accepts iff yi = b and U (d, y) accepts. In particular, if d is a distinguishing description for a word x ∈ {0, 1}n, then for all sufficiently large s and for all i ≤ n there is an accepting path of N on input (d, s, i, xi , |x|) but none on (d, s, i, x¯i , |x|). By the Theorem of Savitch there is a deterministic machine M that has the same acceptance behavior as N and uses space at most s2 ; observe in this connection that s is specified in the input of N and M , hence doesn’t have to be computed by M . Given a word x, fix a pair d and s such that d is a distinguishing program for x, it holds that |d| + log s ≤ KDs(x), and U uses space at most s on input (d, x).
466
R. H¨ olzl and W. Merkle
The specification of d, s, |x| and M therefore constitutes a Ks-program for x which runs in space s2 . By choice of d and s we have |d| + log s + log |x| ≤ 2KDs(x). Furthermore, the space s2 used in the computation of M counts only logarithmically, where 2 · log s ≤ 2 · KDs(x). Taking into account that M has to be specified and that some additional information is needed to separate the components of the Ks-program for x, we obtain Ks(x) ≤ 5 · KDs(x) for all sufficiently large x.
References [A]
Allender, E., Kouck´ y, M., Ronneburger, D., Roy, S.: Derandomization and distinguishing complexity. In: Proc. 18th Annual IEEE Conference on Computational Complexity, pp. 209–220. IEEE Computer Society Press, Los Alamitos (2003) [BFL] Buhrman, H., Fortnow, L., Laplante, S.: Resource-bounded Kolmogorov complexity revisited. SIAM Journal on Computing 31(3), 887–905 (2002) [CHV] Cai, J., Hemachandra, L.A., Vyskoˇc, J.: Promises and fault-tolerant database access. In: Ambos-Spies, K., Homer, S., Sch¨ oning, U. (eds.) Complexity Theory, pp. 227–244. Cambridge University Press, Cambridge (1993) [FK] Fortnow, L., Kummer, M.: On resource-bounded instance complexity. Theoretical Computer Science 161, 123–140 (1996) [Le] Levin, L.A.: Randomness conservation inequalities: Information and independence in mathematical theories. Information and Control 61, 15–37 (1984) [LiV] Li, M., Vit´ anyi, P.: An Introduction to Kolmogorov Complexity and Its Applications. Springer, Heidelberg (1997) [Sim] Simon, J.: On some central problems in computational complexity. Technical Report TR75-224, Cornell University (1975) [Sip] Sipser, M.: A complexity theoretic approach to randomness. In: Proc. 15th ACM Symp. Theory Comput, pp. 330–335. ACM Press, New York (1983)
Balancing Traffic Load Using One-Turn Rectilinear Routing Stephane Durocher1 , Evangelos Kranakis2, Danny Krizanc3 , and Lata Narayanan4 1
4
School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada [email protected] 2 School of Computer Science, Carleton University, Ottawa, Ontario, Canada [email protected] 3 Department of Mathematics and Computer Science, Wesleyan University, Middletown, Connecticut, USA [email protected] Department of Computer Science, Concordia University, Montr´eal, Qu´ebec, Canada [email protected]
Abstract. We consider the problem of load-balanced routing, where a dense network is modelled by a continuous square region and origin and destination nodes correspond to pairs of points in that region. The objective is to define a routing policy that assigns a continuous path to each origin-destination pair while minimizing the traffic, or load, passing through any single point. While the average load is minimized by straight-line routing, such a routing policy distributes the load nonuniformly, resulting in higher load near the center of the region. We consider one-turn rectilinear routing policies that divert traffic away from regions of heavier load, resulting in up to a 33% reduction in the maximum load while simultaneously increasing the path lengths by an average of less than 28%. Our policies are simple to implement, being both local and oblivious. We provide a lower bound that shows that no one-turn rectilinear routing policy can reduce the maximum load by more than 39% and we give a polynomial-time procedure for approximating the optimal randomized policy.
1
Introduction
The problem of routing in multi-hop wireless networks has received extensive attention in the last decade [1,2,12,14,18]. Many of the proposed routing protocols attempt to find shortest paths between pairs of nodes, or try to bound the stretch factor of the paths, while trying to ensure that the paths are loop-free. This approach takes into account a single packet traversing the network and tries to optimize performance for this packet. A more global and realistic view would consider the performance of the protocol under the assumption of many traffic flows in the network. In this situation, there can often be congestion created by several packets that need to be forwarded by the same intermediate nodes at the same time. This congestion is very likely to influence the latency experienced by M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 467–478, 2008. c Springer-Verlag Berlin Heidelberg 2008
468
S. Durocher et al.
a packet. A routing protocol should therefore attempt to avoid creating highly congested nodes. Not only does this improve packet latency, it would also improve the lifetime of a wireless network, where heavily loaded nodes may run out of battery power and disconnect the network. In this paper, we investigate routing protocols for wireless networks with the aim of minimizing the congestion experienced at nodes. We consider a multihop ad hoc network consisting of identical location-aware nodes, uniformly and densely deployed within a given planar region. Furthermore, we assume that the traffic pattern is uniform point-to-point communication, i.e., each node has the same number of packets to send to every other node in the network. This is sometimes called the all-to-all communication pattern. A routing policy must define, for every ordered pair of nodes (u, v), a path in the network to get from u to v. The load at a given node v is the number of paths that pass through v. The average (maximum) load for a network with respect to a particular routing policy is the average (respectively maximum) load over all nodes in the network. The fundamental question we wish to answer is: what routing policy minimizes the maximum load in the network? It seems intuitively evident that for nodes within a convex planar region, shortest path routing should cause maximum load near the geometric center. Indeed, this has been proved analytically for disks (see for example [16]) and squares and rectangles (see Section 3). This suggests that if load balancing is a fundamental concern then a good routing policy should redirect some of the traffic away from the geometric center and other areas of high load. However, load balancing cannot be the only concern: taking unnecessarily long paths just to bypass the center can drastically increase the stretch factor and the average load of nodes in the network, and can therefore be very inefficient in terms of energy consumption. Furthermore, it is critical that the forwarding strategy required to implement the routing policy be simple and have low memory requirements. Ideally, the routing policy should be oblivious (the route between u and v depends only on the identities or locations of u and v) and the forwarding strategy should be local (the forwarding node can make its decision based only on itself and its neighbors, and the packet header contains only the address of the destination). In the setting of nodes uniformly distributed in a given planar convex region, very little research has been done on finding a simple routing policy that achieves both a reasonable stretch factor and a minimum value of maximum congestion. In [6], an algorithm achieving a good tradeoff between stretch factor and load balance is shown for the special case when all nodes are located in a narrow strip of width at most 0.86 times the transmission radius. The analysis is not specific to the all-to-all communication pattern. Popa et al. [16] address the all-to-all routing problem for the case when the region containing the nodes is a unit disk. They establish quantitatively the crowded center effect for shortest-path routing as a nearly-quadratic function that peaks at the center of the disk and present a theoretical approach that is guaranteed to find paths minimizing the maximum load. They also give a practical solution (curveball routing) whose performance
Balancing Traffic Load Using One-Turn Rectilinear Routing
469
compares favorably to the optimum. No theoretical bounds are given on the stretch factor of the routes for either strategy. In this paper, we investigate the problem of load-balanced routing when the nodes are uniformly and densely packed in a square or rectangular region. As in [16], our approach is to look at the unit square (and the k × 1 rectangle) as a continuous space rather than formed by discrete nodes. This makes it possible to analyze the average and maximum load induced by a routing policy, without regard to the topology of the actual network. At the same time, the results should predict the behavior of a network with very densely and uniformly deployed nodes. Shortest-path routing corresponds to straight-line routing in this setting. We derive the average and maximum load for straight-line routing in a unit square and confirm the crowded-center effect for squares and rectangles. In keeping with the goal of minimizing congestion while ensuring a reasonable stretch factor, we investigate the class of rectilinear routing policies that assign to each origin-destination pair of nodes one of the two possible rectilinear paths containing only one turn. It is not difficult to show that √ all such one-turn rectilinear strategies have a maximum stretch factor of 2. Furthermore, they are simple and realistic in the ad hoc network setting; the routing policy is oblivious and the forwarding algorithm is local. We propose and analyze several simple rectilinear strategies, the best of which reduces the maximum load by about 33% compared to the straight-line policy. We also characterize the optimal randomized rectilinear policy as the solution to an optimization problem and provide an efficient procedure for approximating it. 1.1
Overview of Results
Our main contributions are summarized below: – We derive an exact expression for the load induced by a straight-line routing policy at an arbitrary point in the unit square. We show that the average and maximum load for the straight-line routing policy are 0.5214 and 1.1478 respectively. – We show that the average load for every one-turn rectilinear routing policy is 2/3.√The maximum and average stretch factor for such policies are shown to be 2 and 1.2737 respectively. – We propose several one-turn rectilinear routing policies and derive their maximum load. The best of these, called the diagonal rectilinear policy, achieves a maximum load of 0.7771, which represents a 33% improvement over straightline routing. – We prove a lower bound of 0.7076 on the the maximum load for any one-turn rectilinear policy. – We characterize the optimal randomized rectilinear policy as the solution to an optimization problem and provide an efficient procedure for approximating it. Numerical results suggest that the maximum load for the best possible rectilinear policy is close to 0.74. Detailed proofs for some results are omitted due to space restrictions.
470
1.2
S. Durocher et al.
Related Work
In this section, we briefly describe other efforts to address the congestion problem. Several studies confirm the crowded center effect for shortest path routing [8,11,15,16]. In [15,16], the load at the center of a circular area is derived analytically, by modelling the area as a continuous region, as in this paper, rather than as formed by discrete nodes. The node distribution resulting from a random waypoint mobility model in an arbitrary convex domain is analyzed in [10]; this is related to the load probability density for straight-line routing. The tradeoffs between congestion and stretch factor in wireless networks has been studied in [13] and [7]. For instance, for growth-bounded wireless networks, Gao and Zhang [7] show routing algorithms that simultaneously achieve a stretch factor of c and a load balancing ratio of O((n/c)1−1/k ) where k is the growth rate. (The load balancing ratio is defined to be the ratio between the maximum load on any node induced by the algorithm versus that created by the optimal algorithm.) They also derive an algorithm for unit disk graphs with bounded density and show that √if the density is constant, shortest path routing has a load balancing ratio of Θ( n). The communication patterns considered are arbitrary, the lower bound does not derive from the all-to-all communication pattern, and the routing algorithms are not oblivious. The all-to-all communication pattern has been studied extensively in the context of interconnection networks, and particularly in WDM optical networks. In this context, [4] defined the forwarding index of a communication network with respect to a specific routing algorithm to be the maximum number of paths going through any vertex in the graph. The forwarding index of the network itself is the minimum over all possible routing algorithms for the network. This notion was extended to the maximum load on an edge [9], which is more appropriate to wired networks. However, for wireless networks, the node forwarding index captures the load on a wireless node better. While the node forwarding index for specific networks, including the ring and torus networks has been derived exactly [4], it has not been studied for two-dimensional grid networks, which would perhaps be a good approximation for the dense wireless networks of interest to us. Our results in Section 6 provide an approximation for the forwarding index in grid graphs for the class of one-turn rectilinear routing schemes. There does not appear to be much work on routing with a view to reducing the congestion for the all-to-all communication pattern in specific planar regions, the model of interest in this paper. As stated earlier, [6] looks at nodes contained in a narrow strip and [16] addresses the problem for the unit disk. Busch et al. [3] analyze routing on embedded graphs via a random intermediate point located near the perpendicular bisector of the origin and destination; we consider the generalization of this strategy to convex regions in Section 4.4. Popa et al. [16] give expressions for the maximum and average load induced by straightline routing in unit disks, and propose a practical algorithm called curveball routing whose performance is close to the optimum for disks. They also provide experimental results on greedy routing versus curveball routing in square- and
Balancing Traffic Load Using One-Turn Rectilinear Routing
471
rectangular-shaped areas, and show that curveball routing achieves a reduction in load in such areas, but they do not provide any theoretical results.
2 2.1
Definitions Routing Policies and Traffic Load
Given a convex region A ⊆ R2 , a routing policy P assigns a route to every origindestination pair (u, v) ∈ A2 , where the route from u to v, denoted routeP (u, v), is a plane curve segment contained in A, whose endpoints are u and v. For a given routing policy P on a region A, the traffic load at a point p is proportional to the number of routes that pass through p. Formally, Definition 1. Given a routing policy P on a region A, the load at point p is 1 if p ∈ routeP (u, v), fP (p, u, v) du dv, where fP (p, u, v) = λP (p) = 0 otherwise. A The average load of routing policy P on region A is given by 1 λavg (P ) = λP (p) dp, (1) Area(A) A where Area(A) = A dp denotes the area of region A. The average length of a route determined by policy P between two points in A is given by 1 lengthavg (P ) = length(routeP (p, q)) dq dp. (2) Area(A)2 A Since length(routeP (u, v)) = A fP (p, u, v) dp, Proposition 1 follows from (1) and (2): Proposition 1. Given routing policy P on a region A, λavg (P ) = Area(A) · lengthavg (P ).
(3)
In addition to average load, a routing policy P on a region A is also characterized by its maximum load, given by λmax (P ) = max λP (p). p∈A
2.2
(4)
Straight-Line Routing Policy
The straight-line routing policy, denoted S, assigns to every pair (u, v) the route consisting of the line segment between u and v. In straight-line routing, (5) length(routeS (p, q)) = ||p − q|| = (px − qx )2 + (py − qy )2 .
472
S. Durocher et al.
Since the line segment from u to v is the shortest route from u to v, it follows that straight-line routing minimizes (2). Consequently, for any convex region A and any routing policy P = S, λavg (S) ≤ λavg (P ).
(6)
The average stretch factor and maximum stretch factor of routing policy P on region A are respectively given by length(routeP (p, q)) 1 dq dp stravg (P ) = Area(A)2 A length(routeS (p, q)) length(routeP (p, q)) strmax (P ) = max . {p,q}⊆A length(routeS (p, q)) 2.3
(7) (8)
One-Turn Rectilinear Routing Policies
Recent related work on this problem has considered the case when region A is a disk [16]. In this paper, we consider the case when region A is bounded by a square or a rectangle. As we show in Section 3, the load in straight-line routing on a square or a rectangle is maximized at its center. The maximum load can be decreased by redirecting routes that pass near the center to regions of lower traffic. This motivates the examination of one-turn rectilinear routing policies which we now define. A monotonic rectilinear routing policy assigns to every pair (u, v) a route consisting of a monotonic rectilinear path from u to v, i.e., a path comprised of a series of axis-parallel line segments such that any axis-parallel line intersects the path at most once. A one-turn rectilinear routing policy assigns to every pair (u, v) a monotonic rectilinear path consisting of one horizontal line segment and one vertical line segment joining u to v via an intermediate point w. Point w may coincide with u or v. For any monotonic rectilinear routing policy P , length(routeP (p, q)) = |px − qx | + |py − qy |.
(9)
In general, there are two possible one-turn rectilinear routes from a given origin (ux , uy ) to a given destination (vx , vy ). We refer to these as row-first and columnfirst, where the row-first route passes through the intermediate point (vx , uy ) and the column-first route passes through the intermediate point (ux , uy ).
3
Straight-Line Routing on a Square
In this section we examine the load of straight-line routing on the unit square. These values serve as milestones against which the optimality of all other routing policies on the unit square are compared.
Balancing Traffic Load Using One-Turn Rectilinear Routing
3.1
473
Average Load
By Proposition 1, the average load in the unit square under straight-line routing is equal to the expected distance between two points selected at random in the square. This value is a box integral with the following solution [17]: √ √ 2 + 2 + 5 ln(1 + 2) ≈ 0.5214. (10) lengthavg (S) = 15 By (6), the average load (and maximum load) of any routing policy on the unit square is bounded from below by (10). 3.2
Load at an Arbitrary Point
Since straight-line routing is symmetric in the x- and y-dimensions, we derive the load at an arbitrary point p located in an octant of the unit square. The load at an arbitrary point in the unit square is then easily found using the appropriate coordinate transformation. Theorem 1. Given a point p = (px , py ) such that 1/2 ≤ py ≤ px ≤ 1, the load at p using straight-line routing is given by
α π/2−β g4 (θ) λS (p) = (1 − px )px g1 (θ) + (1 − px )2 py θ=0 θ=α
π/2−β π/2 + (1 − px )p2y g3 (θ) + (1 − py )py g2 (θ) θ=α θ=π/2−β
π/2+γ π−δ + (1 − py )py g2 (θ) − (1 − py )2 (1 − px ) g3 (θ) θ=π/2 θ=π/2+γ
π−δ π g4 (θ) − px (1 − px ) g1 (θ) , (11) + (1 − py )(1 − px )2 θ=π/2+γ
θ=π−δ
where expressions for α, β, γ, δ, and g1 through g4 are omitted for lack of space. Expression (11) has a closed-form polylogarithmic representation (free of any trigonometric terms). The complete expression is not reproduced here due to the large number of terms but can be easily reconstructed from (11). 3.3
Maximum Load
We now derive the maximum load for straight-line routing on the unit square and show that this value is realized at the center of the square. Theorem 2. The maximum load for straight-line routing on the unit square is √ √ 3 1 1 λmax (S) = √ + ln( 2 + 1) − ln( 2 − 1) ≈ 1.1478, 8 8 2 realized uniquely at the center of the square.
(12)
474
4
S. Durocher et al.
One-Turn Rectilinear Routing on a Square
In this section we consider various one-turn rectilinear routing policies on the unit square and compare these against straight-line routing. Our objective in designing these policies was to reduce the maximum load by redirecting routes for particular regions of origin-destination pairs away from high-traffic areas and towards low-traffic areas while maintaining a low stretch factor. 4.1
Average Load
Theorem 3. The average load for any monotonic rectilinear routing policy on the unit square is 2/3. Proof. By Proposition 1 and (9), the average load is equal to the average 1 distance between two points in the unit square. This value is 111 1 |ux − vx | + |uy − vy | dvy dvx duy dux =
λavg (P ) =
2 . 3
(13)
0 0 0 0
4.2
Average Stretch Factor
It is straightforward to see that √ the maximum stretch factor for any monotonic rectilinear routing policy is 2. We now consider the average stretch factor. Theorem 4. The average stretch factor for any monotonic rectilinear routing policy P on the unit square is √ √ 1 10 ln(2 + 2) + 2 2 − 4 − 5 ln(2) ≈ 1.2737. stravg (P ) = (14) 6 4.3
Diagonal Rectilinear Routing
We define a routing policy in terms of the partition of the unit square induced by its two diagonals. Let R1 through R4 denote the four regions of the partition such that R1 is at the bottom of the square and the regions are numbered in clockwise order. If the origin lies in R1 or R3 , the row-first route is selected. Otherwise, the column-first route is selected. We refer to this routing policy, denoted PD , as diagonal rectilinear routing. As we did in Section 3.2, we derive the load at an arbitrary point p located in an octant of the unit square since PD is symmetric in the x- and y-dimensions. The load at an arbitrary point in the unit square is then easily found using the appropriate coordinate transformation. Theorem 5. Given a point p = (px , py ) such that 0 ≤ py ≤ px ≤ 1/2, the load at p using diagonal rectilinear routing is 7 3 λPD (p) = 2p3x − 5p2x + px − 2px py + py − 3p2y + 2p3y . 2 2
(15)
Balancing Traffic Load Using One-Turn Rectilinear Routing
1a
1b
1c
1d
2a
2b
475
2c
Fig. 1. Illustration in support of Theorem 5. The white dot denotes point p.
Proof. Let u = (ux , uy ) denote the origin and let v = (vx , vy ) denote the destination. The relative positions of u, v, and p can be divided into seven cases such that the load at p corresponds to the sum of the measure of the regions of possible origin-destination combinations in each case. In Cases 1a through 1d, u lies in region R1 or R3 and, consequently, the row-first route is selected. In Cases 2a through 2c, u lies in region R2 or R4 and, consequently, the column-first route is selected. See Fig. 1. The theorem follows by summing the contribution to load in each of these cases. Details are omitted for lack of space. √ It can be shown that (15) is maximized when px = 56 − 13 3 − 12 11 ≈ √
√ 1 0.4472 and py = 23 − 611 ≈ 0.1139, with load λmax (PD ) = 27 ≈ 11 − 31 2 0.7771. 4.4
Additional Policies Considered
We describe additional one-turn rectilinear routing policies considered. In each case, the maximum load was shown to be strictly greater than that of diagonal rectilinear routing. Recall that all one-turn rectilinear routing policies have equal average load (Theorem 3). Equal Distribution. A simple initial strategy to consider is to assign to each origin-destination pair (u, v) the row-first route. For any point p ∈ [0, 1]2 , λPR (p) = λP (p), where PR denotes the row-first routing policy and P denotes any policy that assigns the pairs (u, v) and (v, u) different one-turn rectilinear routes for all u and v. Outer Turn. Consider the routing policy that selects the one-turn rectilinear route whose intermediate point is furthest from the center of the square. If the two intermediate points are equidistant from the origin, then a route is assigned as in the equal distribution policy. Grid-Based Regions. Divide the unit square into nine rectangular regions whose boundaries intersect the x- and y-axes at 0, k, 1 − k, and 1, respectively, for some fixed k ∈ [0, 1/2]. There are three types of regions: corner, mid-boundary, and one central region. If the origin is located in a mid-boundary region and the destination is in a non-adjacent corner region then select the one-turn rectilinear route that avoids passing through the central region. Similarly, a route from a corner regions to a non-adjacent mid-boundary region must avoid passing through the central region. For all other origin-destination pairs, routes are
476
S. Durocher et al.
assigned as in the equal distribution policy. A second grid-based routing policy is defined by adding the constraint that a route from a mid-boundary region to an adjacent mid-boundary region must also avoid passing through the central region. Line Division. Let l denote the line passing through the origin u and destination v. If l does not pass through the center of the square, c, select the one-turn rectilinear route whose intermediate point is opposite l from c. If l passes through c, then a route is assigned as in the equal distribution policy. Random Intermediate Point. A random intermediate point connected to the origin and destination by straight-line routes results in load exactly twice that of straight-line routing. Perhaps a better strategy is that described by Busch et al. [3] which can be generalized to convex regions; an intermediate point is selected at random on the perpendicular bisector of the origin and destination. Note that although this policy involves a single turn, it is not a rectilinear routing policy. Summary. Table 1 summarizes bounds on average and maximum load for each of the above routing policies and straight-line routing. We derived the load at an arbitrary point in the unit square for five of these policies; the corresponding plots are illustrated in Fig. 2. The diagonal rectilinear routing policy, PD , achieves the lowest maximum load, significantly lower than the maximum load of straight-line routing and not much greater than the lower bound. Table 1. Comparing routing policies on the unit square routing policy λavg straight-line S diagonal PD equal dist. PR outer turn
1.2
λmax
routing policy λavg λmax
0.5214 1.1478 2/3 0.7771 2/3 1 2/3 ≥ 0.8977
1.2
grid-based (1) grid-based (2) line division lower bound
1.2
2/3 7/8 2/3 0.8541 2/3 ≥ 7/8 0.7076
1.2
1.2
1
1
1
1
1
0.8
0.8
0.8
0.8
0.8
0.6
0.6
0.6
0.6
0.6
0.4
0.4
0.2
0.4
0.2
0.8 0.8 0.6 y
0.6 0.4 x
0.4
0.2
0.2 0
0
0.2
1 0 1
0.8 0.8 0.6 y
0.6 0.4 x
0.4
0.2
0.2 0
0
0.4
0.4
0.2
1 0 1
0.2
1 0 1
0.8 0.8 0.6 y
0.6 0.4 x
0.4
0.2
0.2 0
0
1 0 1
0.8 0.8 0.6 y
0.6 0.4 x
0.4
0.2
0.2 0
0
1 0 1
0.8 0.8 0.6 y
0.4
0.4
0.6 x
0.2
0.2 0
0
Fig. 2. These plots display λP (p) for p ∈ [0, 1]2 for five routing policies: (left to right) straight-line S, equal distribution PR , grid-based (two policies), and diagonal PD
Balancing Traffic Load Using One-Turn Rectilinear Routing
5
477
Lower Bounds on Load for One-Turn Rectilinear Routing Policies
Naturally, no monotonic rectilinear routing policy can have a maximum load less than the average load of 2/3. In this section we establish a stronger lower bound on the maximum load of any one-turn rectilinear routing policy. Theorem 6. No one-turn rectilinear routing policy can guarantee a maximum load less than 0.7076.
6
Optimal Randomized One-Turn Rectilinear Routing Policies
In this section we give a characterization of the optimal randomized one-turn rectilinear strategy as the solution of an optimization problem and provide an efficient procedure for approximating it. A deterministic one-turn rectilinear strategy is equivalent to a function P : [0, 1]4 → {0, 1} where P (u, v, s, t) = 1 iff the route from (u, v) to (s, t) uses the column-first path. (If u = s or v = t, we define P (u, v, u, t) = P (u, v, s, v) = 1.) We can generalize this to randomized rectilinear schemes by considering Q : [0, 1]4 → [0, 1] where if Q(u, v, s, t) = q then a packet travelling from point (u, v) to (s, t) takes the the column-first path with probability q and the row-first path with probability 1 − q. A formula for the expected λ(x, y) at a point (x, y) can be found easily. The optimal strategy is given by the solution to the following optimization problem: minQ max(x,y) λ(x, y). While we can’t directly solve this problem we can approximate it by considering finer and finer partitions of the square into n2 1/n by 1/n subsquares and giving a strategy for all packets routing between each pair of subsquares. Now our problem is equivalent to finding a randomized oneturn rectilinear routing strategy for an n × n grid that minimizes the number of packets using any particular node of the grid under an all-to-all communication pattern. Let pijkl , 1 ≤ i, j, k, l ≤ n, be the probability that a packet starting in subsquare (i, j) going to subsquare (k, l) uses the column-first path and let the maximum expected load at any point in subsquare (r, s) be λ(r, s). An upper bound on λ(r, s) is easily derived and our problem now reduces to minpijkl maxr,s λ(r, s), which is equivalent to the following linear program with n4 + 1 variables and 2n4 + n2 constraints (solvable in polynomial time): Minimize z Subject to 0 ≤ pijkl ≤ 1, 1 ≤ i, j, k, l ≤ n, z − λ(r, s) ≥ 0, 1 ≤ r, s ≤ n. Table 2 shows an upper bound on the maximum load achieved by the strategy obtained by using an n × n grid to approximate the unit square for 2 ≤ n ≤ 12. The results indicate that the optimal strategy achieves a maximum load
478
S. Durocher et al.
Table 2. Approximations to the optimal randomized strategy using n × n grids n 2 3 4 5 6 7 8 9 10 11 12 max. load 1.0000 0.8889 0.8264 0.8009 0.7813 0.7759 0.7650 0.7610 0.7530 0.7499 0.7446
of approximately 0.74. The solutions were found using the CVXOPT convex optimization package [5]. We were unable to obtain results for larger n due to memory limitations.
References 1. Bose, P., Morin, P., Stojmenovic, I., Urrutia, J.: Routing with guaranteed delivery in ad hoc wireless networks. Wireless Networks 7, 609–616 (2001) 2. Broch, J., Johnson, D., Maltz, D.: The dynamic source routing protocol for mobile ad hoc networks (1998): Internet-draft, draft-ietf-manet-dsr-00.txt 3. Busch, C., Magdon-Ismail, M., Xi, J.: Oblivious routing on geometric networks. In: Proc. ACM SPAA, vol. 17 (2005) 4. Chung, F.R.K., Coffman Jr., E.G., Reiman, M.I., Simon, B.: The forwarding index of communication networks. IEEE Trans. Inf. Th. 33(2), 224–232 (1987) 5. Dahl, J.: Cvxopt: A python package for convex optimization. In: Proc. Eur. Conf. Op. Res. (2006) 6. Gao, J., Zhang, L.: Load balanced short path routing in wireless networks. In: Proc. IEEE INFOCOM, vol. 23, pp. 1099–1108 (2004) 7. Gao, J., Zhang, L.: Tradeoffs between stretch factor and load balancing ratio in routing on growth restricted graphs. In: Proc. ACM PODC, pp. 189–196 (2004) 8. Gupta, P., Kumar, P.R.: The capacity of wireless networks. IEEE Trans. Inf. Th. 46, 388–404 (2000) 9. Heydemann, M.C., Meyer, J.C., Sotteau, D.: On forwarding indices of networks. Disc. App. Math. 23, 103–123 (1989) 10. Hyyti¨ a, E., Lassila, P., Virtamo, J.: Spatial node distribution of the random waypoint mobility model with applications. IEEE Trans. Mob. Comp. 6, 680–694 (2006) 11. Jinyang, L., Blake, C., Couto, D.D., Lee, H., Morris, R.: Capacity of ad hoc wireless networks. In: Proc. ACM MOBICOM (2001) 12. Kranakis, E., Singh, H., Urrutia, J.: Compass routing on geometric networks. In: Proc. CCCG, pp. 51–54 (1999) 13. Meyer auf der Heide, F., Schindelhauer, C., Volbert, K., Grunwald, M.: Energy, congestion, and dilation in radio networks. In: Proc. ACM SPAA, pp. 230–237 (2002) 14. Park, V., Corson, S.: A highly adaptive distributed routing algorithm for mobile wireless networks. In: Proc. IEEE INFOCOM, pp. 1405–1413 (1997) 15. Pham, P.P., Perreau, S.: Performance analysis of reactive shortest path and multipath routing mechanism with load balance. In: Proc. IEEE INFOCOM, pp. 251– 259 (2003) 16. Popa, L., Rostami, A., Karp, R.M., Papadimitriou, C., Stoica, I.: Balancing traffic load in wireless networks with curveball routing. In: Proc. ACM MOBIHOC (2007) 17. Santal´ o, L.A.: Integral Geometry and Geometric Probability. Cambridge University Press, Cambridge (2004) 18. Stojmenovic, I.: Position based routing in ad hoc networks. IEEE Comm. Mag. 40, 128–134 (2002)
A Moderately Exponential Time Algorithm for Full Degree Spanning Tree Serge Gaspers, Saket Saurabh, and Alexey A. Stepanov Department of Informatics, University of Bergen, N-5020 Bergen, Norway {serge,saket}@ii.uib.no, [email protected]
Abstract. We consider the well studied Full Degree Spanning Tree problem, a NP-complete variant of the Spanning Tree problem, in the realm of moderately exponential time exact algorithms. In this problem, given a graph G, the objective is to find a spanning tree T of G which maximizes the number of vertices that have the same degree in T as in G. This problem is motivated by its application in fluid networks and is basically a graph-theoretic abstraction of the problem of placing flow meters in fluid networks. We give an exact algorithm for Full Degree Spanning Tree running in time O(1.9172n ). This adds Full Degree Spanning Tree to a very small list of “non-local problems”, like Feedback Vertex Set and Connected Dominating Set, for which non-trivial (non brute force enumeration) exact algorithms are known.
1
Introduction
The problem of finding a spanning tree of a connected graph arises at various places in practice and theory, like the analysis of communication or distribution networks, or modeling problems, and can be solved efficiently in polynomial time. On the other hand, if we want to find a spanning tree with some additional properties like maximizing the number of leaves or minimizing the maximum degree of the tree, the problem becomes NP-complete. This paper deals with one of the NP hard variants of Spanning Tree, namely Full Degree Spanning Tree from the view point of moderately exponential time algorithms. Full Degree Spanning Tree (FDST): Given an undirected connected graph G = (V, E), find a spanning tree T of G which maximizes the number of vertices of full degree, that is the vertices having the same degree in T as in G. The FDST problem is motivated by its applications in water distribution and electrical networks [15,16,17,18]. Pothof and Schut [18] studied this problem in the context of water distribution networks where the goal is to determine or control the flows in the network by installing and using a small number of flow meters. It turns out that to measure flows in all pipes, it is sufficient to find a full degree spanning tree T of the network and install flow meters (or pressure M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 479–489, 2008. c Springer-Verlag Berlin Heidelberg 2008
480
S. Gaspers, S. Saurabh, and A.A. Stepanov
gauges) at each vertex of T that does not have full degree. We refer to [1,4,11] for a more detailed description of various applications of FDST. The FDST problem has attracted a lot of attention recently and has been studied extensively from different algorithmic paradigms, developed for coping with NP-completeness. Pothof and Schut [18] studied this problem first and gave a simple heuristic based algorithm. Bhatia et al. [1] studied it from √ the view point of approximation algorithms and gave an algorithm of factor O( n). On the negative side, they show that FDST is hard to approximate within a factor 1 of O(n 2 − ), for any > 0, unless coR = N P , a well known complexity-theoretic hypothesis. Guo et al. [10] studied the problem in the realm of parameterized complexity and observed that the problem is W[1]-complete. The problem which is dual to FDST is also studied in the literature, that is the problem of finding a spanning tree that minimizes the number of vertices not having full degree. For this dual version of the problem, Khuller et al [11] gave an approximation algorithm of factor 2 + for any fixed > 0, and Guo et al. [10] gave a fixed parameter tractable algorithm running in time 4k nO(1) . FDST has also been studied on special graph classes like planar graphs, bounded degree graphs and graphs of bounded treewidth [4]. The goal of this paper is to study Full Degree Spanning Tree in the context of moderately exponential time algorithms, another coping strategy to deal with NP-completeness. We give a O(1.9172n) time algorithm breaking the trivial 2n nO(1) barrier. Exact exponential time algorithms have an old history [5,14] but the last few years have seen a renewed interest in the field. This has led to the advancement of the state of the art on exact algorithms and many new techniques based on Inclusion-Exclusion, Measure & Conquer and various other combinatorial tools have been developed to design and analyze exact algorithms [2,3,7,8,12]. Branch & Reduce has always been one of the most important tools in the area but its applicability was mostly limited to ‘local problems’ (where the decision on one element of the input has direct consequences for its neighboring elements) like Maximum Independent Set, SAT and various other problems, until recently. In 2006, Fomin et al.[9] devised an algorithm for Connected Dominating Set (or Maximum Leaf Spanning Tree) and Razgon [19] for Feedback Vertex Set combining sophisticated branching and a clever use of measure. Our algorithm adheres to this machinery and adds an important real life problem to this small list. We also need to use an involved measure, which is a function of the number of vertices and the number of edges to be added to the spanning tree, to get the desired running time.
2
Preliminaries
Let G be a graph. We use V (G) and E(G) to denote the vertices and the edges of G respectively. We simply write V and E if the graph is clear from the context. For V ⊆ V we define an induced subgraph G[V ] = (V , E ), where E = {uv ∈ E : u, v ∈ V }.
A Moderately Exponential Time Algorithm
481
Let v ∈ V , we denote by N (v) the neighborhood of v, namely N (v) = {u ∈ V : uv ∈ E}. The closed neighborhood N [v] of v is N (v) ∪ {v}. In the same way we define N [S] for S ⊆ V as N [S] = ∪v∈S N [v] and N (S) = N [S] \ S. We define the degree of vertex v in G as the number of vertices adjacent to v in G. Namely, dG (v) = |{u ∈ V (G) : uv ∈ E(G)}|. Let G be a graph and T be a spanning tree of G. A vertex v ∈ V (G) is a full degree vertex in T , if dG (v) = dT (v). We define a full degree spanning tree to be a spanning tree with the maximum number of full degree vertices. One can similarly define full degree spanning forest by replacing tree with forest in the earlier definition. A set I ⊆ V is called an independent set for G if no vertex v in I has a neighbor in I.
3
Algorithm for Full Degree Spanning Tree
In this Section we give an exact algorithm for the FDST problem. Given an input graph G = (V, E), the basic idea is that if we know a subset S of V for which there exists a spanning tree T where all the vertices in S have full degree then, given this set S, we can construct a spanning tree T where all the vertices in S have full degree in polynomial time. Our first observation towards this is that all the edges incident to the vertices in S, that is ES = {uv ∈ E such that u ∈ S or v ∈ S }
(1)
induce a forest. For our polynomial time algorithm we start with the forest (V, ES ) and then complete this forest into a spanning tree by adding edges to connect the components of the forest. The last step can be done by using a slightly modified version of the Spanning Tree algorithm of Kruskal [13] that we denote by poly fdst(G, S). The rest of the section is devoted to finding a largest subset of vertices S for which we can find a spanning tree where the vertices of S have full degree. Our algorithm follows a branching strategy and as a partial solution keeps a set of vertices S for which there exists a spanning tree where the vertices in S have full degree. The standard branching step chooses a vertex v that could be included in S and then recursively tries to find a solution by including v in S and not including v in S. But when v is not included in S, it cannot be removed from further consideration as cycles involving v might be created later on in (V, ES ) by adding neighbors of v to S. Hence we resort to a coloring scheme for the vertices, which can also be thought of as a partition of the vertex set of the input graph. At any point of the execution of the algorithm, the vertices are partitioned as below: 1. Selected S: The set of vertices which are decided to be of full degree. 2. Discarded D: The set of vertices which are not required to be of full degree. 3. Undecided U : The set of vertices which are neither in S nor D, that is those vertices which are yet to be decided. So, U = V \ (S ∪ D).
482
S. Gaspers, S. Saurabh, and A.A. Stepanov
Next we define a generalized form of the FDST problem based on the above partition of the vertex set. But before that we need the following definition. Definition 1. Given a vertex set S ⊆ V , we define the partial spanning tree of G induced by S as T (S) = (N [S], ES ) where ES is defined as in Equation (1). For our generalized problem, we denote by G = (S, D, U, E) the graph (V, E) with vertex set V = S ∪ D ∪ U partitioned as above. Generalized Full Degree Spanning Tree (GFDST): Given an instance G = (S, D, U, E) such that T (S) is connected and acyclic, the objective is to find a spanning forest which maximizes the number of vertices of U of full degree under the constraint that all the vertices in S have full degree. If we start with a graph G, an instance of FDST, with the vertex partition S = D = ∅ and U = V then the problem we will have at every intermediate step of the recursive algorithm is GFDST. Also, note that a full degree spanning forest of a connected graph can easily be extended to a full degree spanning tree and that a full degree spanning tree is a full degree spanning forest. As suggested earlier our algorithm is based on branching and will have some reduction rules that can be applied in polynomial time, leading to a refined partitioning of the vertices. Before we come to the detailed description of the algorithm, we introduce a few more important definitions. For given sets S, D and U , we say that an edge is (a) unexplored if one of its endpoints is in U and the other one in U ∪ D, (b) forced if at least one of its endpoints is in S, and (c) superfluous if both its endpoints are in D. The basic step of our algorithm chooses an undecided vertex u ∈ U and considers two subcases that it solves recursively: either u is selected, that is u is moved from U to S, or u is discarded, that is moved from U to D. But the main idea is to choose a vertex in a way that the connectivity of T (S) is maintained in both recursive calls. To do so we choose u from U ∩ N [N [S]]. This brings us to the following definition. Definition 2. The vertices in U ∩ N [N [S]] are called candidate vertices. On the other hand, if S is not empty and the graph does not contain a candidate vertex, then D can be partitioned into two sets: (a) those vertices in D that have neighbors in S and (b) those that have neighbors in U . Superfluous edges (with both endpoints in D) are removed by reduction rule R1 making G disconnected in this case, and then the algorithm is executed on each connected component. Now we are ready to describe the algorithm in details. We start with a procedure for reduction rules in the next subsection and prove that these rules are correct.
A Moderately Exponential Time Algorithm
3.1
483
Reduction Rules
Given an instance G = (S, D, U, E) of GFDST, a reduced instance of G is computed by the following procedure. Reduce(G = (S, D, U, E)) R1 If there is a superfluous edge e, then return Reduce((S, D, U, E \ {e})) R2 If there is a vertex u ∈ D ∪ U such that d(u) = 1, then remove the unique edge e incident on it and return Reduce((S, D, U, E \ {e})). R3 If there is an undecided vertex u ∈ U such that T (S ∪ {u}) contains a cycle, then discard u, that is return Reduce((S, D ∪ {u}, U \ {u}, E)). R4 If there is a candidate vertex u that is incident to at most one vertex in U ∪ D, then select u, and return Reduce((S ∪ {u}, D, U \ {u}, E)). R5 If S = ∅ and there exists a vertex u ∈ U of degree 2, then select u and return Reduce((S ∪ {u}, D, U \ {u}, E)). R6 If there is a candidate vertex u of degree 2, then select u and return Reduce((S ∪ {u}, D, U \ {u}, E)). Else return G Now we argue about the correctness of the reduction rules, more precisely that there exists a spanning forest of G such that a maximum number of vertices preserve their degree and the partitionning of the vertices into the sets S, D and U of the graph resulting from a call to Reduce(G = (S, D, U, E)) is respected. Note that the reduction rules are applied in the order of their appearance. The correctness of R1 follows from the fact that discarded vertices are not required to have full degree. For the correctness of reduction rule R2, consider a vertex u ∈ D ∪ U of degree 1 with unique neighbor w. Let G = (S, D, U, E \ {uw}) be the graph resulting from the application of the reduction rule. Note that the edge uw is not part of any cycle and that a full degree spanning forest of G can be obtained from a full degree spanning forest of G by adding the edge uw. As Algorithm poly fdst(G, S) adds edges to make the obtained spanning forest into a spanning tree, the edge uw is added to the final solution. For the correctness of reduction rule R3, it is enough to observe that if for a subset S ⊆ V , there exists a spanning tree T such that all the vertices of S have full degree then T (S) is a forest. We prove the correctness of R4, R5 and R6 by the following lemmata. Lemma 1. Let G = (V, E) be a graph and T be a full degree spanning forest for G. If v ∈ V is a vertex of degree dG (v) − 1 in T , then there exists a full degree spanning forest T such that v has degree dG (v) in T . Proof. Let u ∈ V be the neighbor of v such that uv is not an edge of T . Note that both u and v do not have full degree in T , are not adjacent and belong to the same tree in T . The last assertion follows from the fact that if u and v belong to two different trees of T then one can safely add uv to T and obtain a forest T that has a larger number of full degree vertices, contradicting that T
484
S. Gaspers, S. Saurabh, and A.A. Stepanov
is a full degree spanning forest. Now, adding the edge uv to T creates a unique cycle passing through u and v. We obtain the new forest T by removing the other edge incident to u on the cycle, say uw, w = v. So, T = T \ {uw} + {uv}. The number of full degree vertices in T is at least as high as in T as v becomes a full degree vertex and at most one vertex, w, could become non full degree.
We also need a generalized version of Lemma 1. Lemma 2. Let G = (S, D, U, E) be a graph and T be a full degree spanning forest for G such that the vertices in S have full degree. Let v ∈ U a candidate vertex such that its neighbors in D ∪ U are not incident to a forced edge. If v has degree dG (v) − 1 in T , then there exists a full degree spanning forest T such that v has degree dG (v) in T and the vertices in S have full degree. Proof. The proof is similar to the one of Lemma 1. The only difference is that we need to show that the vertices of S remain of full degree and for that we need to show that all the edges of T (S) remain in T . To this observe that all the edges incident to the neighbors of v in D ∪ U in T do not belong to edges of T (S), that is they are not forced edges. So if uv is the unique edge incident to v missing in T then we can add uv to T and remove the other non-forced edge
on u from the unique cycle in T + {uv} and get the desired T . Now consider reduction rule R4. If u is a candidate vertex with unique neighbor w in D ∪ U then (a) u ∈ N (S) and (b) all the edges incident to w are not forced, otherwise reduction rule R2 or R3 would have applied. Now the correctness of the reduction rule follows from Lemma 2. The correctness proof of reduction rule R6 is similar. Here u belongs to N [N [S]] ∩ U but all the edges incident to its unique neighbor in V \ N [S] are not forced and again Lemma 2 comes into play. To prove the correctness of reduction rule R5, we need to show that there exists a spanning forest where u has full degree. Suppose not and let T be any full degree spanning forest of G. Without loss of generality, suppose that u has degree 1 in T (if u is an isolated vertex in T , then add one edge incident to u to T ; this does not create any cycle in T and does not decrease the number of vertices of full degree in T ). Let v be the unique neighbor of u in T . But since S = ∅, there are no forced edges and we can apply Lemma 2 again and conclude. This finishes the correctness proof of the reduction rules. Before we go into the details of the algorithm we would like to point out that all our reduction rules preserve the connectivity of T (S). 3.2
Algorithm
In this section we describe our algorithm in details. Given an instance G = (S, D, U, E) of GDPST, our algorithm recursively solves the problem by choosing a vertex u ∈ U and including u in S or in D and then returning as solution the one which has maximum sized S. The algorithm has various cases based on the number of unexplored edges incident to u. Algorithm fdst(G), described below, returns a super-set S ∗ of S corresponding to the full degree vertices in a full degree spanning forest respecting the
A Moderately Exponential Time Algorithm
485
initial choices for S and D. After this, poly fdst(G, S ∗ ) returns a full degree spanning tree of G as described in the beginning of the section. The description of the algorithm consists of the application of the reduction rules and a sequence of cases. A case consists of a condition (first sentence) and a procedure to be executed if the condition holds. The first case which applies is used in the algorithm. Thus, inside a given case, the conditions of all previous cases are assumed to be false. fdst(G = (S, D, U, E)) Replace G by Reduce(G). Case 1: U is a set of isolated vertices. Return S ∪ U . Case 2: S = ∅. Choose a vertex u ∈ U of degree at least 3. Return the largest set among fdst((S ∪ {u}, D, U \ {u}, E)) and fdst((S, D ∪ {u}, U \ {u}, E)). Case k3: G has at least 2 connected components, say G1 , G2 , · · · , Gk . Return i=1 fdst((S ∩ V (Gi ), D ∩ V (Gi ), U ∩ V (Gi ), E ∩ E(Gi ))). Case 4: There is a candidate vertex u with at least 3 unexplored incident edges. Make two recursive calls: fdst((S ∪ {u}, D, U \ {u}, E)) and fdst((S, D ∪ {u}, U \ {u}, E)), and return the largest obtained set. Case 5: There is a candidate vertex u with at least one neighbor v in U and exactly two unexplored incident edges. Make two recursive calls: fdst((S ∪ {u}, D, U \ {u}, E)) and fdst((S, D ∪ {u, v}, U \ {u, v}, E)), and return the largest obtained set. From now on let v1 and v2 denote the discarded neighbors of a candidate vertex u (see Figure 1). Case 6: Either v1 and v2 have a common neighbor x = u; or v1 (or v2 ) has a neighbor x = u that is a candidate vertex; or v1 (or v2 ) has a neighbor x of degree 2. Make two recursive calls: fdst((S ∪ {u}, D, U \ {u}, E)) and fdst((S, D ∪ {u}, U \ {u}, E)), and return the largest obtained set. Case 7: Both v1 and v2 have degree 2. Let w1 and w2 (w1 = w2 ) be the other (different from u) neighbors of v1 and v2 in U respectively. Make recursive calls as usual, but also explore all the possibilities for w1 and w2 if u ∈ S. When u is in S, recurse on all possible ways one can add a subset of A = {w1 , w2 } to S. That is make recursive calls fdst((S, D ∪ {u}, U \ {u}, E)) and fdst((S ∪ {u} ∪ X, D ∪ (A − X), U \ ({u} ∪ A), E)) for each independent set X ⊆ A, and return the largest obtained set. Case 8: At least one of {v1 , v2 } has degree ≥ 3. Let {u, w1 , w2 , w3 } ⊆ N ({v1 , v2 }) and let A = {w1 , w2 , w3 }. Make recursive calls fdst((S, D ∪ {u}, U \ {u}, E)) and fdst((S ∪ {u} ∪ X, D ∪ (A − X), U \ ({u} ∪ A), E)) for each independent set X ⊆ A, and return the largest obtained set.
4
Correctness and Time Complexity of the Algorithm
We prove the correctness and the time complexity of Algorithm fdst in the following theorem.
486
S. Gaspers, S. Saurabh, and A.A. Stepanov
S
v1
u
w1
w2
D
v2
U
Fig. 1. Illustration of Case 7. Cases 6 and 8 are similar.
Theorem 1. Given an input graph G = (S, D, U, E) on n vertices such that T (S) is connected and acyclic, Algorithm fdst returns a maximum size set S ∗ , S ⊆ S ∗ ⊆ S ∪ U such that there exists a spanning forest for G where all the vertices in S ∗ have full degree in time O(1.9172n). Proof. The correctness of the reduction rules is described in Section 3.1. The correctness of Case 1 follows, as any isolated vertex belonging to U has full degree in any spanning forest. Similarly, the correctness of Case 3 follows from the fact that any spanning forest of G is a spanning forest of each of the connected components of G. The remaining cases, except Case 5, of Algorithm fdst are branching steps where the algorithm chooses a vertex u ∈ U and tries both possibilities: u ∈ S or u ∈ D. Sometimes the algorithm branches further by looking at the local neighborhood of u and trying all possible ways these vertices can be added to either S or D. Since all possibilities are tried to add vertices of U to D or S in Cases 3, 4 and 6 to 8, these cases are correct and do not need any further justifications. The correctness of Case 5 requires special attention. Here we use the fact that there exists a full degree spanning forest with all the vertices in S having full degree, such that either u ∈ S or u and its neighbor v ∈ U are in D. We prove the correctness of this assertion by contradiction. Suppose all the full degree spanning forests such that all the vertices in S are of full degree have u of non full degree and v of full degree. But notice that u ∈ N (S) (see R6) and all the neighbors of u in D ∪ U do not have any incident forced edges. Now we can use Lemma 2 to get a spanning forest which contains u and is a full degree spanning forest with all the vertices in S having full degree. Now we move on to the time complexity of the algorithm. The measure of subproblems is generally chosen as a function of structure, like vertices, edges or other graph parameters, which change during the recursive steps of the algorithm. In our algorithm, this change is reflected when vertices are moved to either S or D from U . The second observation is that any spanning tree on n vertices has at most n − 1 edges and hence when we select a vertex in S we increase the number of edges in T (S) and decrease the number of edges we can add to T (S). Finally we also gain when the degree of a vertex becomes two because reduction rules apply as soon as the degree 2 vertex becomes a candidate vertex. Our measure is precisely a function of these three parameters and is defined as follows:
A Moderately Exponential Time Algorithm
μ(G) = η|U2 | + |U≥3 | + αm ,
487
(2)
where U2 is the subset of undecided vertices of degree 2, U≥3 is the subset of undecided vertices of degree at least 3, m = n − 1 − |E(T (S))| is the number of edges that can be added to the spanning tree and α = 0.372 and η = 0.5 are numerically obtained constants to optimize the running time. We write μ instead of μ(G) if G is clear from the context. We prove that the problem can be solved for an instance of size μ in time O(xμ ) where x < 1.60702. As μ ≤ 1.372n, the final running time of the algorithm will be O(x1.372n ) = O(1.9172n). Denote by P [μ] the maximum number of times the algorithm is called recursively on a problem of size μ (i. e. the number of leaves in the search tree). Then the running time T (μ) of the algorithm is bounded by P [μ] · nO(1) because in any node of the search tree, the algorithm executes only a polynomial number of steps. We use induction on μ to prove that P [μ] ≤ xμ . Then T (μ) = xμ · nO(1) , and since the polynomial is suppressed by rounding the exponential base, we have T (μ) = O(1.60702μ). Clearly, P [0] = 1. Suppose that P [k] ≤ xk for every k < μ and consider a problem of size μ. We remark that in all the branching steps all the candidate vertices in U have degree at least 3, otherwise reduction rules R5 or R6 would have applied. Case 2: In this case, the number of vertices in U≥3 decreases by one in both recursive calls and the number of edges in T (S) increases by at least 3 in the first recursive call. Thus, P [μ] ≤ P [μ − 1 − 3α] + P [μ − 1]. Case 3: Here we branch on different connected components of G, and hence P [μ(G)] ≤
k
P [μ(Gi )].
i=1
Case 4: This case has the same recurrence as Case 2 as the number of vertices in U≥3 decreases by one in both recursive calls and the number of edges in T (S) increases by at least 3 in the first recursive call. Case 5: When the algorithm adds u to S, the number of vertices in U≥3 decreases by one and the number of edges in T (S) increases by 2 while in the other case, |U≥3 | decreases by two as both u and v are candidate vertices. So we get: P [μ] ≤ P [μ − 1 − 2α] + P [μ − 2]. Case 6: When the algorithm adds u to S, reduction rule R3 or R6 applies to x. We obtain the following recurrences, based on the degree of x: P [μ] ≤ P [μ − 1 − η − 2α] + P [μ − 1], P [μ] ≤ P [μ − 2 − 2α] + P [μ − 1].
488
S. Gaspers, S. Saurabh, and A.A. Stepanov
Case 7: In this case we distinguish two subcases based on the degrees of w1 and w2 . Our first subcase is when either w1 or w2 has degree 3 and the other subcase is when both w1 and w2 have degree at least 4. (Note that because of Case 5, v1 and v2 do not have a common neighbor and do not have a neighbor of degree 2). Suppose w1 has degree 3. When the algorithm adds u to D, the edges uv1 and uv2 are removed (R1), the degree of v1 is reduced to 1 and then reduction rule R2 is applied and makes w1 of degree 2. So, in this subcase, μ decreases by 2 − η. The analysis of the remaining branches is standard and we get the following recurrence: P [μ] ≤ P [μ − 3 − 2α] + 2P [μ − 3 − 5α] + P [μ − 3 − 8α] + P [μ − 2 + η]. For the other subcase we get the following recurrence: P [μ] ≤ P [μ − 3 − 2α] + 2P [μ − 3 − 6α] + P [μ − 3 − 10α] + P [μ − 1]. Case 8: This case is similar to Case 7 and we get the following recurrence: P [μ] ≤ P [μ−4−2α]+3P [μ−4−5α]+3P [μ−4−8α]+P [μ−4−11α]+P [μ−1]. In each of these recurrences, P [μ] ≤ xμ which completes the proof of the theorem.
The bottleneck of the analysis is the second recurrence in Case 7. Therefore, an improvement of this case would lead to a faster algorithm.
5
Conclusion
In this paper we have given an exact algorithm for the Full Degree Spanning Tree problem. The most important feature of our algorithm is the way we exploit connectivity arguments to reduce the size of the graph in the recursive steps of the algorithm. We think that this idea of combining connectivity while developing Branch & Reduce algorithms could be useful for various other nonlocal problems and in particular for other NP-complete variants of the Spanning Tree problem. Although the theoretical bound we obtained for our algorithm seems to be only slightly better than a brute-force enumeration algorithm, practice shows that Branch & Reduce algorithms perform usually better than the running time proved by a worst case analysis of the algorithm. Therefore we believe that this algorithm, combined with good heuristics, could be useful in practical applications. One problem which we would like to mention here is Minimum Maximum Degree Spanning Tree, where, given an input graph G, the objective is to find a spanning tree T of G such that the maximum degree of T is minimized. This problem is a generalization of the famous Hamiltonian Path problem for which no algorithm faster than 2n nO(1) is known. It remains open to find even a 2n nO(1) time algorithm for the Minimum Maximum Degree Spanning Tree problem.
A Moderately Exponential Time Algorithm
489
References 1. Bhatia, R., Khuller, S., Pless, R., Sussmann, Y.J.: The full degree spanning tree problem. Networks 36(4), 203–209 (2000) 2. Bj¨ orklund, A., Husfeldt, T.: Inclusion–Exclusion Algorithms for Counting Set Partitions. In: The proceedings of FOCS 2006, pp. 575–582 (2006) 3. Bj¨ orklund, A., Husfeldt, T., Kaski, P., Koivisto, M.: Fourier meets M¨ obius: Fast Subset Convolution. In: The proceedings of STOC 2007, pp. 67–74 (2007) 4. Broersma, H., Koppius, O.R., Tuinstra, H., Huck, A., Kloks, T., Kratsch, D., M¨ uller, H.: Degree-preserving trees. Networks 35(1), 26–39 (2000) 5. Christofides, N.: An Algorithm for the Chromatic Number of a Graph. Computer Journal 14(1), 38–39 (1971) 6. Fomin, F.V., Gaspers, S., Pyatkin, A.V.: Finding a Minimum Feedback Vertex Set in Time O(1.7548n ). In: Bodlaender, H.L., Langston, M.A. (eds.) IWPEC 2006. LNCS, vol. 4169, pp. 184–191. Springer, Heidelberg (2006) 7. Fomin, F.V., Grandoni, F., Kratsch, D.: Measure and Conquer: Domination - A Case Study. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds.) ICALP 2005. LNCS, vol. 3580, pp. 191–203. Springer, Heidelberg (2005) 8. Fomin, F.V., Grandoni, F., Kratsch, D.: Measure and Conquer: A simple O(20.288n ) Independent Set Algorithm. In: The proceedings of SODA 2006, pp. 18–25 (2006) 9. Fomin, F.V., Grandoni, F., Kratsch, D.: Solving Connected Dominating Set Faster Than 2n . In: Arun-Kumar, S., Garg, N. (eds.) FSTTCS 2006. LNCS, vol. 4337, pp. 152–163. Springer, Heidelberg (2006) 10. Guo, J., Niedermeier, R., Wernicke, S.: Fixed-Parameter Tractability Results for Full-Degree Spanning Tree and Its Dual. In: Bodlaender, H.L., Langston, M.A. (eds.) IWPEC 2006. LNCS, vol. 4169, pp. 203–214. Springer, Heidelberg (2006) 11. Khuller, S., Bhatia, R., Pless, R.: On Local Search and Placement of Meters in Networks. SIAM Journal of Computing 32(2), 470–487 (2003) 12. Koivisto, M.: An O(2n ) Algorithm for Graph Colouring and Other Partitioning Problems via Inclusion-Exclusion. In: The proceedings of FOCS 2006, pp. 583–590 (2006) 13. Kruskal, J.B.: On the Shortest Spanning Subtree and the Traveling Salesman Problem. The proceedings of the American Mathematical Society 7, 48–50 (1956) 14. Lawler, E.L.: A Note on the Complexity of the Chromatic Number Problem. Information Processing Letters 5(3), 66–67 (1976) 15. Lewinter, M.: Interpolation Theorem for the Number of Degree-Preserving Vertices of Spanning Trees. IEEE Transaction Circ. Syst. 34, 205 (1987) 16. Ormsbee, L.E.: Implicit Network Calibration, Journal of Water Resources. Planning and Management 115(2), 243–257 (1989) 17. Ormsbee, L.E., Wood, D.J.: Explicit Pipe Network Calibratio. Journal of Water Resources, Planning and Management 112(2), 166–182 (1986) 18. Pothof, I.W.M., Schut, J.: Graph-theoretic approach to identifiability in a water distribution network. Memorandum, vol. 1283, Universiteit Twent (1995) 19. Razgon, I.: Exact Computation of Maximum Induced Forest. In: Arge, L., Freivalds, R. (eds.) SWAT 2006. LNCS, vol. 4059, pp. 160–171. Springer, Heidelberg (2006)
Speeding up Dynamic Programming for Some NP-Hard Graph Recoloring Problems Oriana Ponta1, , Falk H¨ uffner2, , and Rolf Niedermeier2 1
Mathematisches Institut, Ruprecht-Karls-Universit¨ at Heidelberg, Im Neuenheimer Feld 288, D-69120 Heidelberg, Germany [email protected] 2 Institut f¨ ur Informatik, Friedrich-Schiller-Universit¨ at Jena, Ernst-Abbe-Platz 2, D-07743 Jena, Germany {hueffner,niedermr}@minet.uni-jena.de
Abstract. A vertex coloring of a tree is called convex if each color induces a connected component. The NP-hard Convex Recoloring problem on vertex-colored trees asks for a minimum-weight change of colors to achieve a convex coloring. For the non-uniformly weighted model, where the cost of changing a vertex v to color c depends on both v and c, we improve the running time on trees from O(Δκ ·κn) to O(3κ ·κn), where Δ is the maximum vertex degree of the input tree T , κ is the number of colors, and n is the number of vertices in T . In the uniformly weighted case, where costs depend only on the vertex to be recolored, one can instead parameterize on the number of bad colors β ≤ κ, which is the number of colors that do not already induce a connected component. Here, we improve the running time from O(Δβ · βn) to O(3β · βn). For the case where the weights are integers bounded by M , using fast subset convolution, we further improve the running time with respect to the exponential part to O(2κ · κ4 n2 M log 2 (nM )) and O(2β · β 4 n2 M log2 (nM )), respectively. Finally, we use fast subset convolution to improve the exponential part of the running time of the related 1-Connected Coloring Completion problem.
1
Introduction
The issue of recoloring vertex-colored graphs by a minimum-cost set of color changes in order to achieve a desired property of the color classes such as being connected has recently received considerable attention; approximation as well as fixed-parameter algorithms have been developed for the corresponding NPhard problems [2, 7, 8, 15, 16, 20]. Here, we focus on exact fixed-parameter algorithms [9, 11, 18] for two prominent types of these problems, significantly improving on the associated exponential running time factors. The two types of problems we investigate are as follows. First, we study vertex-colored trees and the task is to recolor some of its vertices such that each color class forms
Supported by the Deutsche Forschungsgemeinschaft, Emmy Noether research group PIAF (fixed-parameter algorithms), NI 369/4.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 490–501, 2008. c Springer-Verlag Berlin Heidelberg 2008
Speeding up Dynamic Programming
491
a connected component. The problem was introduced by Moran and Snir [15], and it concerns the major part of this work. The second type of problem is not only concerned with trees, but with general graphs. However, here we cannot recolor freely, but a subset of vertices is uncolored and the task is to complete the coloring such that each color class forms a connected component [8]. Convex Recoloring. Most of this work deals with convex recoloring problems on trees. The most general version, that is, non-uniformly weighted, is defined as follows. Convex Recoloring Instance: A tree T = (V, E) with a vertex coloring C : V → C and a weight function w : V × C → + , where w(v, C(v)) = 0 for all v ∈ V . Task: Find a convex coloring C : V → C with minimum weight w(C ) := v∈V w(v, C (v)). We defined Convex Recoloring only for trees. There are some positive results for the special case of paths [15, 16], but there do not seem to be positive results for general graphs (Moran et al. [17] considered the slightly more general class of galled networks). Let κ be the number of colors |C| and n the number of vertices |V |. Let β ≤ κ be the number of bad colors, that is, colors that do not already induce a connected component. Convex Recoloring was introduced by Moran and Snir [15], who showed that the decision version is NP-complete, even for unweighted paths. They also gave an algorithm for non-uniformly weighted Convex Recoloring running in O(Δκ κn) time, where Δ is the maximum degree of the input graph. They further gave an algorithm running in O((κ/ log κ)κ · κn4 ) time, thus showing that the problem is fixed-parameter tractable with respect to the parameter κ. For the uniformly weighted case, they showed that κ can be replaced by the potentially smaller parameter β in these running times. For the unweighted case, Razgon [20] gave an 256k · nO(1) time algorithm, where k is the number of vertices recolored. This can be related to other results by noting that k ≥ β/2 (every color change can make at most two bad colors good). Bodlaender and Weyer [5] considered a different parameter, namely the separation of colors , which is the maximum number of colors separated by a vertex, where we say that a vertex v separates a color c if there is a path between two vertices of color c that passes through v. They presented a running time of O(3 ·3 n). Since they showed that ≤ k+1, this also improves Razgon’s result to O(3k ·kn). Bar-Yehuda et al. [2] further improved the bound to O(2k ·kn+ n2) by doing a better analysis of a variant of the dynamic programming algorithm of Moran and Snir [15]. Bodlaender et al. [6] showed a problem kernel with O(k 6 ) vertices, which was later improved to O(k 2 ) vertices [7]. Finally, Moran and Snir [16] gave a factor-3 approximation for the uniformly weighted case running in O(κn2 ) time. This was improved to a (2 + )-approximation running in O(n2 + n(1/)2 41/ ) time [2].
492
O. Ponta, F. H¨ uffner, and R. Niedermeier
Bachoore and Bodlaender [1] gave an O(4k n) time algorithm for the variant where only the leaves are precolored. Connected Coloring Completion. The second type of problem we study has only recently been introduced by Chor et al. [8]; accordingly, so far less results are known for this problem. As Convex Recoloring, it is motivated by applications in bioinformatics. 1-Connected Coloring Completion Instance: A graph G = (V, E) with k uncolored vertices U ⊆ V and a vertex coloring C : V \ U → C. Task: Find a convex coloring C : V → C that extends C, that is, for all v ∈ V \ U : C (v) = C(v). Chor et al. [8] also considered the more general r-Connected Coloring Completion, where the goal is to find a coloring where each color induces at most r connected components. They showed that 1-Connected Coloring Completion is NP-hard, even for only two colors, but can be solved in O(8k ·k+ 2k · kn) time on an n-vertex graph. They further showed that for the parameter treewidth, r-Connected Coloring Completion is fixed-parameter tractable for r = 1 but W[1]-hard for r ≥ 2. Our contributions. The main purpose of this paper can be seen in “engineering” dynamic programs for weighted Convex Recoloring problems and for (unweighted) 1-Connected Coloring Completion with respect to their exponential running time factors. To this end, we make use of two main technical tricks investigated in greater depth in the following two sections. First, we observe how a method for tree problems originally going back to Maffioli [14] (which meanwhile has found several applications, see, e.g., [4, 5]) also helps to significantly speed up and somewhat simplify dynamic programming algorithms for weighted convex recoloring problems. Second, we show how a recent general breakthrough result of Bj¨ orklund et al. [3] concerning a more efficient computation of subset convolutions can be tailored towards applying it to recoloring problems.1 More specifically, for non-uniformly weighted Convex Recoloring we improve a previous exponential factor of Δκ to 3κ and further on to 2κ , and for uniformly weighted Convex Recoloring we improve a previous exponential factor of Δβ to 3β and further on to 2β ; herein, Δ denotes the maximum vertex degree in the tree. Note that the improvements from exponential base 3 to 2 come along with increased polynomial factors in the running time. Finally, we also adapt the subset convolution trick to 1-Connected Coloring Completion in order to improve the previous exponential factor of 8k to 4k . 1
Lingas and Wahlen [13] recently presented an application in the context of subgraph homeomorphism problems.
Speeding up Dynamic Programming
2
493
Fine-Grained Dynamic Programming
The major part of this work is concerned with improvements for non-uniformly and uniformly weighted Convex Recoloring based on a more efficient dynamic programming strategy. The essence of the underlying trick can be traced back to work of Maffioli [14]. We start with the somewhat less technical case concerning a dynamic program for non-uniformly weighted Convex Recoloring with respect to the parameter “number of colors” and then extend our findings to uniformly weighted Convex Recoloring with respect to the parameter “number of bad colors”. 2.1
Non-uniformly Weighted Convex Recoloring
In this section, we show how to improve the running time of the dynamic programming by Moran and Snir [15] from O(Δκ · κn) to O(3κ · κn), where κ is the number of colors and n is the number of vertices in the input graph. The dynamic programming works bottom-up from the leaves of the tree. The improvement comes from not considering all children of an inner vertex at once, but rather taking them into account one-by-one. This is a classical trick for dynamic programming on trees (see e. g., [14, 5, 4]). A more detailed presentation of our result is given in the thesis of Ponta [19]. We designate an arbitrary vertex r of T as the root. For each vertex v ∈ V , we denote by Tv the subtree induced by v and all descendants of v. For a vertex v with children w1 , . . . , wp in an arbitrary but fixed order, we denote by Tv,i the subtree induced by v, the first i children w1 , . . . , wi of v, and all descendants of w1 , . . . , wi . Note that Tv,0 contains only the vertex v and that Tv,p equals Tv . The basic structure of Moran and Snir’s original algorithm is preserved. The algorithm visits the vertices in postorder. We start by determining the trivial convex recolorings for the leaves of the tree and proceed with the computation of weights of convex recolorings of subtrees Tv for internal vertices v in a bottom-up fashion. A solution for Tv is constructed using the previously computed solutions for the subtrees induced by the children of v. The way a solution for the extended problem is computed differs from Moran and Snir’s algorithm and is the key to the running time improvement. For the description of the algorithm, we need two dynamic programming tables denoted by opt and optr . Let C [Tv ] be the set of colors appearing in the subtree Tv . Definition 1. Let v ∈ V and D ⊆ C be a set of colors. A recoloring C is a (Tv , D)-coloring if it is a convex recoloring of Tv such that C [Tv ] = D. The cost of an optimal (Tv , D)-coloring of Tv is denoted by opt(Tv , D). If Tv has less than |D| vertices or D = ∅, then no (Tv , D)-coloring exists, and we set opt(Tv , D) = ∞. A (Tv , D)-coloring is a convex recoloring of Tv that uses exactly the colors from D. Thus, the cost of an optimal convex recoloring of T can be calculated as minD⊆C opt(Tr , D). To retrieve the recoloring that realizes
494
O. Ponta, F. H¨ uffner, and R. Niedermeier
this cost, we can use standard dynamic programming backtracing methods. It remains to describe how to fill in the dynamic programming table opt. For this, we need a second table optr . Definition 2. Let v ∈ V , D ⊆ C and c ∈ C. A recoloring C is a (Tv , D, c)coloring if it is a (Tv , D)-coloring such that C (v) = c. The cost of an optimal (Tv , D, c)-coloring is denoted by optr (Tv , D, c). / D. It is easy to calculate opt from optr : We set optr (Tv , D, c) = ∞ if c ∈ opt(Tv , D) = min optr (Tv , D, c). c∈D
(1)
For a subtree Tv consisting of only the vertex v, we set optr (Tv , {c}, c) = w(v, c) and optr (Tv , D, c) = ∞ for D = {c}. For an interior vertex v with children w1 , . . . , wp , we inductively assume that the values optr (Twi , ·, ·) for 1 ≤ i ≤ p are already calculated. We then iteratively calculate optr (Tv,i+1 , ·, ·) for i = 0, . . . , p, obtaining optr (Tv , ·, ·) = optr (Tv,p , ·, ·). Thus, each iteration has to take into account the subtree Twi+1 in addition to the subtree Tv,i considered in the previous iteration. In contrast, Moran and Snir [15] take into account all child subtrees at once. The childwise iterative approach to dynamic programming on trees has also been used e. g. to find minimum-weight subtrees of a tree [14, 4] or for unweighted Convex Recoloring with a different parameter [5]. In our context, the technique allows to avoid the maximum vertex degree Δ in the base of the exponential part of the running time of Moran and Snir’s algorithm. For a simpler notation of the recurrence for optr (Tv,i+1 , ·, ·), we define the function optc (Twi+1 , D, c) for the (i + 1)th child wi+1 of v and D ⊆ C, v ∈ C as optc (Twi+1 , D, c) = min{opt(Twi+1 , D \ {c}), optr (Twi+1 , D ∪ {c}, c)}.
(2)
Thus, the value optc (Twi+1 , D, c) is the minimum cost of a convex recoloring C of Twi+1 that uses every color in D \ {c}, no color from C \ (D ∪ {c}), and uses color c in Twi+1 only if C (wi+1 ) = c. The following lemma describes the central recurrence for optr . Lemma 1. Let v be an interior vertex with children w1 , . . . , wp . For any color set D and any color c ∈ D it holds that optr (Tv,i+1 , D, c) =
min
(optr (Tv,i , D1 ∪{c}, c)+optc (Twi+1 , D2 , c)). (3)
D1 ∪D2 =D\{c} D1 ∩D2 =∅
Proof. “≥”: Let C be an optimal (Tv,i+1 , D, c)-coloring. The weight of the recoloring C is then w(C ) = optr (Tv,i+1 , D, c). Let D1 = C [Tv,i ] \ {c} be the set of colors different from c that C uses in the recoloring of Tv,i , and let D2 = C [Twi+1 ]\{c} be the set of colors different from c that C uses in the recoloring of Twi+1 . Given the fact that C (v) = c and the convexity of (C , Tv,i+1 ), it follows that D1 ∩ D2 = ∅. By the definitions of the sets D1 , D2 , and D, it holds that D1 ∪ D2 = D \ {c}.
Speeding up Dynamic Programming
495
For a subtree T of a tree T and a coloring C of T , let C|T be the restriction of C to the vertices of T . Since C |Tv,i is a (Tv,i , D1 , c)-coloring of Tv,i and C |Twi+1 is a (Twi+1 , D2 )- or (Twi+1 , D2 , c)-coloring of Twi+1 , it holds that w(C |Tv,i ) ≥ optr (Tv,i , D1 ∪ {c}, c) and w(C |Twi+1 ) ≥ optc (Twi+1 , D2 , c). Consequently, w(C ) is at least the right-hand side of (3). “≤”: Consider D1 and D2 with D1 ∪ D2 = D \ {c} and D1 ∩ D2 = ∅ such that the sum optr (Tv,i , D1 ∪ {c}, c) + optc (Twi+1 , D2 , c) is minimized. Denote with Cv,i the recoloring of Tv,i witnessing the cost optr (Tv,i , D1 ∪ {c}, c) and with Cwi+1 the recoloring of Twi+1 witnessing the cost optc (Twi+1 , D2 , c). We can then combine Cv,i and Cw to obtain a coloring C for Tv,i+1 . The weight i+1 of C equals the right-hand side of (3). By construction, C is a convex recoloring of Tv,i+1 that uses exactly the colors in D and has C (v) = c. Thus, w(C ) is at least optr (Tv,i+1 , D, c).
Theorem 1. Non-uniformly weighted Convex Recoloring can be solved in O(3κ · κn) time for a tree with n vertices and κ colors. Proof. We have shown how to solve non-uniformly weighted Convex Recoloring by dynamic programming using the recurrences (1), (2), and (3). By visiting each vertex in postorder, it is possible to fill in opt, optc , and optr while only accessing already calculated entries. It remains to bound the running time. The bottleneck is clearly the calculation of (3). Since there are O(n) edges in a tree, we have O(n) values for the first component Tv,i+1 . For fixed c and Tv,i+1 , the computation of optr (Tv,i+1 , D, c) effectively needs to examine all 3-ordered partitions of C \ {c} of the form (C \ D, D1 , D2 ); there are 3κ−1 such partitions. In total, we arrive at the claimed running time.
2.2
Uniformly Weighted Convex Recoloring
In this section, we show that for the uniformly weighted case, the parameter κ (number of colors) can be replaced by β (number of bad colors) in the running time of Theorem 1. This is particularly attractive for scenarios where the input is already almost convex. Moran and Snir [15] have shown how to get an O(Δβ ·βn) time algorithm from their O(Δκ · κn) time dynamic programming algorithm for the non-uniformly weighted case. We show that analogously, our O(3κ · κn) time algorithm (Theorem 1) can be improved to O(3β · βn) time for the nonuniformly weighted case. The approach is similar to that of Moran and Snir, but we considerably simplify some concepts and proofs. When recoloring, typically good colors are overwritten by bad colors, in order to connect different regions of a bad color. It is tempting to just restrict the search of alternative colors to bad colors, which would reduce the size of the dynamic programming tables defined in Sect. 2.1 and give the desired speedup of replacing κ by β in the base of the exponential factor. However, this is not correct: sometimes a bad color has to be overwritten with a good color in order to wipe out a region of this bad color. The central observation of Moran and Snir [15] is that when overwriting a color with a good color, we do not have to decide
496
O. Ponta, F. H¨ uffner, and R. Niedermeier
immediately which good color to use—the goal is after all only to get rid of the bad color of the vertex that is being recolored. We capture this in the notion of a restricted recoloring, which is a coloring V → C ∪ {∗}, where ∗ ∈ / C serves to mark vertices that are uncolored.2 It is easy to see that a standard recoloring is convex iff all vertices on a path between two vertices with the same color c also have color c. In analogy, we say that a restricted recoloring is convex iff all vertices on a path between two vertices with the same color c = ∗ have color c. In the uniform cost model, we can assign a cost to a restricted recoloring by simply giving cost w(v) to the recoloring of v ∈ V with ∗ (this is not possible in the non-uniform model, where cost also depends on the actual color used in recoloring a vertex). The following lemma shows that to find an optimal convex recoloring, it suffices to look for optimal restricted recolorings. Lemma 2. In the uniformly weighted model, any convex restricted recoloring can be converted in linear time into a convex recoloring of the same weight and vice versa. Proof. Given a convex restricted recoloring, we can fill in the colors of the uncolored vertices by a depth-first search starting from some not uncolored vertex, where we recolor an uncolored vertex with the color of its predecessor in the search. This clearly produces a convex recoloring with the same weight. The only way a convex recoloring Cˆ of a coloring C might not already be a restricted recoloring is that some vertex color was overwritten with a good color. We construct C from Cˆ by recoloring these vertices by ∗ instead. Clearly, ˆ and we claim that C is also convex. For this, C has the same weight as C, consider two vertices v1 , v2 with C (v1 ) = C (v2 ) = c = ∗. By construction ˆ 1 ) = C(v ˆ 2 ) = c. Thus, every vertex on the path between v1 of C , then also C(v ˆ and v2 is colored c by C. If c is a bad color, then also every vertex on the path between v1 and v2 is colored c by C , since only good colors are used differently between Cˆ and C ; if c is a good color, then Cˆ has left v1 and v2 unchanged from C, and because c is a good color, any vertex between v1 and v2 must also be colored c by C, and thus also by Cˆ and C . In summary, every vertex on the path between v1 and v2 is colored c by C , and thus C is convex.
By Lemma 2, it suffices to find the weight of an optimal restricted recoloring to solve the uniformly weighted Convex Recoloring problem. The dynamic programming from Sect. 2.1, based on the three tables opt, optr , and optc calculated by the recurrences (1), (2), and (3) in tree postorder remains almost unchanged. Therefore, we only point out the differences here. Let B be the set of bad colors. The table opt now only covers restricted colorings, that is, opt(Tv , D) is the weight of an optimal restricted coloring C such that for each c ∈ B there is a vertex x in Tv with C (x) = C(x) and C (x) = c iff c ∈ D. Table optr is adapted analogously, and we additionally allow ∗ as third argument. In the initialization, we also set optr (Tv , ∅, ∗) = w(v) 2
Moran and Snir [15] use the more complicated notion of a conservative recoloring.
Speeding up Dynamic Programming
497
and optr (Tv , D, ∗) = ∞ for all D = ∅. For the case that c = ∗ in recurrence (3), we use optr (Tv,i+1 , D, ∗) =
min
D1 ∪D2 =D D1 ∩D2 =∅
(optr (Tv,i , D1 , ∗) + opt(Twi+1 , D2 )).
(4)
We omit the proof, which is analogous to that in Sect. 2.1. Theorem 2. Uniformly weighted Convex Recoloring can be solved in O(3β · βn) time for a tree with n vertices and β bad colors.
3
Fast Subset Convolution
When we restrict the weights to be integers bounded by M , we can further improve the exponential part of the running time for Convex Recoloring by using fast subset convolution. This novel technique was developed by Bj¨ orklund et al. [3], who used it to speed up several dynamic programming algorithms such as the classical Dreyfus–Wagner algorithm [10] for Steiner Tree in graphs. It was also used to improve the speed of subgraph homeomorphism algorithms [13]. Let f and g be functions defined on the power set of a finite set N with |N | = p, that is, f, g : P(N ) → I. For any ring over I that defines addition and multiplication on elements of I, the subset convolution of f and g, denoted by f ∗ g, is defined for each S ⊆ N as f (T )g(S \ T ). (5) f ∗ g : P(N ) → I, (f ∗ g)(S) = T ⊆S
To calculate the subset convolution means to determine the value of f ∗ g for all 2p possible inputs, assuming that f and g can be evaluated in constant time (typically by being stored in atable). A naive algorithm that calculates each value independently needs O( pi=0 pi 2i ) = O(3p ) ring operations. The following result shows a substantial improvement. Theorem 3 (Bj¨ orklund et al. [3]). The subset convolution over an arbitrary ring can be computed with O(2p · p2 ) ring operations. Bj¨ orklund et al. [3] showed how to apply Theorem 3 to also calculate the subset convolution for the integer min-sum semiring
f ∗ g : P(N ) → ,
(f ∗ g)(S) = min f (T ) + g(S \ T ) T ⊆S
(6)
by embedding it into the standard integer sum-product ring. Here, it is not appropriate to assume that addition and multiplication can be done in constant orklund et al. [3] time, since the numbers involved can have up to n bits.3 Bj¨ did not give a precise estimation, but it is not too hard to derive the following bound from their Theorem 3 [3]. Proposition 1. The subset convolution over the integer min-sum ring with M := maxi∈(f (P(N ))∪g(P(N ))) |i| can be computed in O(2p · p3 M log2 (M p)) time. 3
To avoid complicated terms, we assume a bound of O(n log2 n) on the running time of integer multiplication of two n-bit numbers. Better bounds are known [12].
498
3.1
O. Ponta, F. H¨ uffner, and R. Niedermeier
Convex Recoloring
We now use fast subset convolution over the integer min-sum to speed up the dynamic programming for Convex Recoloring. Recall that the bottleneck in deriving the running time of O(3κ · κn) (Theorem 1) comes from recurrence (3), which we recall here: optr (Tv,i+1 , D, c) =
min
(optr (Tv,i , D1 ∪ {c}, c) + optc (Twi+1 , D2 , c)).
D1 ∪D2 =D\{c} D1 ∩D2 =∅
Consider fixed Tv,i+1 and c. Then (3) can be seen as a subset convolution over the c (D) = optr (Tv,i , D ∪ {c}, c) integer min-sum semiring (like in (6)) by setting fv,i c and gwi +1 (D) = optc (Twi+1 , D, c): c c optr (Tv,i+1 , D, c) = (fv,i ∗ gw )(D \ {c}). i+1
(7)
Theorem 4. The non-uniformly weighted Convex Recoloring problem with integer weights bounded by M can be solved in O(2κ · κ4 n2 M log2 (nM )) time for a tree with n vertices and κ colors. Proof. We solve non-uniformly weighted Convex Recoloring by dynamic programming using the recurrences (1), (2), and (3), where (3) is calculated by fast subset convolution as in (7). Using Proposition 1, for fixed Tv,i+1 and c, we can c c calculate (7) in O(2κ ·κ3 nM log2 (nM )) time, because the values of fv,i and gw i +1 are bounded by nM , since they are weights of recolorings, and κ ≤ n. The rest of the analysis is as in Theorem 1.
In the same way as for Theorem 2, we obtain a running time of O(2β · β 4 n2 M log2 (nM )) for the uniformly weighted case. Theorem 5. The uniformly weighted Convex Recoloring problem with integer weights bounded by M can be solved in O(2β · β 4 n2 M log2 (nM )) time for a tree with n vertices and β bad colors. 3.2
1-Connected Coloring Completion
Chor et al. [8] gave simple linear-time preprocessing rules that allow without loss of generality to assume κ ≤ k, that is, there are at most as many colors as uncolored vertices. The data reduction also collapses each maximal connected monochromatic subgraph into a single vertex. Thus, the problem can be restated as finding a coloring of the set U of uncolored vertices such that each color induces a connected subgraph in U and each vertex in V \ U with color c is adjacent to a vertex with color c in U . Chor et al. [8] solved 1-Connected Coloring Completion by using a binary-valued dynamic programming table T (C , U ) for C ⊆ C and U ⊆ U with the following semantics: T (C , U ) = 1 iff if it is possible to color U with C such that each color c ∈ C induces a connected subgraph Gc in U that dominates the vertices colored c in V \ U , meaning that each such vertex is adjacent to at
Speeding up Dynamic Programming
499
least one vertex in Gc . Thus, if T (C , U ) = 0, then it is not possible to solve the instance by assigning (exclusively) the colors from C to the vertices in U ; but if T (C , U ) = 1, then solving the instance is still possible by finding a suitable allocation of the remaining colors C \ C to the remaining uncolored vertices U \ U . Clearly, if T (C, U ) = 1, then the instance is solvable, and we can find the corresponding solution by backtracing. Chor et al. [8] used the following recurrence to fill in T : T (C , U ) = 1 ⇐⇒ ∃c ∈ C , U ⊂ U : T (C \ {c}, U ) = 1 and U \ U induces a connected subgraph that dominates the vertices of color c,
(8)
which can be simplified to T (C \ {c}, U ) ∧ T ({c}, U \ U )
T (C , U ) =
(9)
U ⊆U
for some c ∈ C . To be able to calculate recurrence (9), we need all values of T ({c}, U ) for c ∈ C and U ⊆ U . The calculation can clearly be done in O(2k ·kn) time, since there are 2k ·k such entries, and each can be calculated in linear time. A straightforward calculation of (9) for an entry then takes O(2k ) time, and there are 4k table entries, thus giving a total running time of O(8k + 2k · kn). To speed up the exponential part of the calculation of (9), we use fast subset convolution over the or-and semiring. Proposition 2. The subset convolution over the or-and semiring f (T ) ∧ g(S \ T ) f ◦ g : P(N ) → {0, 1}, (f ◦ g)(S) =
(10)
T ⊆S
with |N | = p can be calculated in O(2p · p3 log2 p) time. Proof. It holds that (f ◦ g)(S) =
f (T ) ∧ g(S \ T )
(11)
T ⊆S
=
=
1 if maxT ⊆S (f (T ) + g(S \ T )) = 2 0 otherwise
(12)
1 if (f ∗ g)(S) = 2 0 otherwise,
(13)
and the subset convolution “∗” in the integer min-sum semiring can be calculated using Proposition 1 with M = 1.
Next, we define fC (U ) = T (C \ {c}, U )
gC (U ) = T ({c}, U ),
(14) (15)
500
O. Ponta, F. H¨ uffner, and R. Niedermeier
for some c ∈ C , which gives us T (C , U ) = (fC ◦ gC )(U ).
(16)
Theorem 6. 1-Connected Coloring Completion can be solved in O(4k · k 3 log2 k + 2k · kn) time. Proof. By Proposition 2, for fixed C , we can calculate (16) in O(2k · k 3 log2 k) time. There are 2k subsets C ⊆ C. Thus, together with the O(2k · kn) time for the table initialization, we arrive at the claimed running time.
4
Outlook
We improved known fixed-parameter tractability results based on dynamic programming for several NP-hard recoloring problems in trees and graphs. These problems are mainly motivated by applications in bioinformatics (particularly, phylogenetics). The running times now seeming practically feasible, so it would be desirable to experimentally test the algorithms on real-world data. In particular, it would be interesting to see how the improvements concerning the exponential factors that have been achieved due to fast subset convolution pay off in practice. Moreover, also the space consumption of our algorithms is exponential and so memory space could become the real bottleneck in applications—this invites further research on improvement strategies.
References [1] Bachoore, E.H., Bodlaender, H.L.: Convex recoloring of leaf-colored trees. In: Proc. 3rd ACiD. Texts in Algorithmics, vol. 9, pp. 19–33. College Publications, London (2007) [2] Bar-Yehuda, R., Feldman, I., Rawitz, D.: Improved approximation algorithm for convex recoloring of trees. Theory of Computing Systems, (to appear, 2007) [3] Bj¨ orklund, A., Husfeldt, T., Kaski, P., Koivisto, M.: Fourier meets M¨ obius: fast subset convolution. In: Proc. 39th STOC, pp. 67–74. ACM Press, New York (2007) [4] Blum, C.: Revisiting dynamic programming for finding optimal subtrees in trees. European Journal of Operational Research 177(1), 102–115 (2007) [5] Bodlaender, H.L., Weyer, M.: Convex and connected recolorings of trees and graphs (unpublished manuscript, 2005) [6] Bodlaender, H.L., Fellows, M.R., Langston, M.A., Ragan, M.A., Rosamond, F.A., Weyer, M.: Kernelization for convex recoloring. In: Proc. 2nd ACiD. Texts in Algorithmics, vol. 7, pp. 23–35. College Publications, London (2006) [7] Bodlaender, H.L., Fellows, M.R., Langston, M.A., Ragan, M.A., Rosamond, F.A., Weyer, M.: Quadratic kernelization for convex recoloring of trees. In: Lin, G. (ed.) COCOON. LNCS, vol. 4598, pp. 86–96. Springer, Heidelberg (2007) [8] Chor, B., Fellows, M.R., Ragan, M.A., Razgon, I., Rosamond, F.A., Snir, S.: Connected coloring completion for general graphs: Algorithms and complexity. In: Lin, G. (ed.) COCOON. LNCS, vol. 4598, pp. 75–85. Springer, Heidelberg (2007)
Speeding up Dynamic Programming
501
[9] Downey, R.G., Fellows, M.R.: Parameterized Complexity. Springer, Heidelberg (1999) [10] Dreyfus, S.E., Wagner, R.A.: The Steiner problem in graphs. Networks 1(3), 195– 207 (1972) [11] Flum, J., Grohe, M.: Parameterized Complexity Theory. Springer, Heidelberg (2006) [12] F¨ urer, M.: Faster integer multiplication. In: Proc. 39th STOC, pp. 57–66. ACM Press, New York (2007) [13] Lingas, A., Wahlen, M.: On exact complexity of subgraph homeomorphism. In: Cai, J.-Y., Cooper, S.B., Zhu, H. (eds.) TAMC 2007. LNCS, vol. 4484, pp. 256– 261. Springer, Heidelberg (2007) [14] Maffioli, F.: Finding a best subtree of a tree. Technical Report 91.041, Politecnico di Milano, Dipartimento di Elettronica, Italy (1991) [15] Moran, S., Snir, S.: Convex recolorings of strings and trees: Definitions, hardness results and algorithms. In: Dehne, F., L´ opez-Ortiz, A., Sack, J.-R. (eds.) WADS 2005. LNCS, vol. 3608, pp. 218–232. Springer, Heidelberg (2005) (to appear in Journal of Computer and System Sciences) [16] Moran, S., Snir, S.: Efficient approximation of convex recolorings. Journal of Computer and System Sciences 73(7), 1078–1089 (2007) [17] Moran, S., Snir, S., Sung, W.-K.: Partial convex recolorings of trees and galled networks: Tight upper and lower bounds (February 2007) (manuscript) [18] Niedermeier, R.: Invitation to Fixed-Parameter Algorithms. Oxford Lecture Series in Mathematics and Its Applications, vol. 31. Oxford University Press, Oxford (2006) [19] Ponta, O.: The Fixed-Parameter Approach to the Convex Recoloring Problem. Diplomarbeit, Mathematisches Institut, Ruprecht-Karls-Universit¨at. Springer, Heidelberg (2007) [20] Razgon, I.: A 2O(k) poly(n) algorithm for the parameterized convex recoloring problem. Information Processing Letters 104(2), 53–58 (2007)
A Linear-Time Algorithm for Finding All Door Locations That Make a Room Searchable (Extended Abstract) John Z. Zhang1 and Tsunehiko Kameda2 1
Department of Mathematics and Computer Science, University of Lethbridge Lethbridge, AB, Canada T1K 3M4 [email protected] 2 School of Computing Science, Simon Fraser University Burnaby, BC, Canada V5A 1S6 [email protected]
Abstract. A room is a simple polygon with a prespecified point, called the door, on its boundary. Search may be conducted by two guards on the boundary who keep mutual visibility at all times, or by a single boundary searcher with a flashlight. Search starts at the door, and must detect any intruder that was in the room at the time the search started, preventing the intruder from escaping through the door. A room may or may not be searchable, depending on where the door is placed or no matter where the door is placed. We want to find all intervals on the boundary where the door can be placed for the resultant room to be searchable. It is known that this problem can be solved in O(n log n) time, if the given polygon has n sides. We improve this complexity to O(n).
1
Introduction
Imagine intruders who move freely within a dark polygonal region. One or more searchers, who are equipped with flashlights, try to detect them. It is assumed that the intruders can move arbitrarily fast and try to escape detection. Suzuki and Yamashita [1] formulated polygon search by introducing the k-searcher whose vision is restricted to the beams from the k flashlights. The purpose of search is to detect the intruders by eventually illuminating all of them. A searcher who has 360o visibility and can illuminate an intruder in any direction is called an ∞-searcher. In street search [2], two points on the polygon boundary, the entrance and the exit, are prespecified. Two guards start moving in the opposite directions from the entrance along the boundary, while maintaining mutual visibility. They may move backwards from time to time, as long as each of them does not move beyond the entrance and exit. The search completes when they meet at the exit. A street is said to be walkable if every intruder is detected by the time the search is completed. Heffernan [3] proposed a linear-time algorithm to check whether a street is walkable. Tseng et al. [4] considered the problem of finding all the M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 502–513, 2008. c Springer-Verlag Berlin Heidelberg 2008
A Linear-Time Algorithm
503
pairs of entrance and exit to make the resultant street walkable. Bhattacharya et al. [5] came up with the optimal algorithm for the same problem. In [6], Crass et al. studied an ∞-searcher in an open-edge “corridor”, which uses edges as the entrance and exit instead of vertices. Another special case of polygon search is room search, which has been studied extensively [7,8,9,10,11]. A room, denoted by P (d), is a simple polygon P with a designated point d on its boundary, called the door, which is like the entrance in street search. Unlike street search, however, no exit is prespecified. If two boundary guards are used, they move on the boundary as in street search and eventually meet somewhere on the boundary [9]. We may also use a boundary 1-searcher who can move only along the room boundary, starting at the door [7]. An intruder is detected if he is illuminated by the beam. We want to test if a given room P (d) is searchable by two guards or a 1-searcher. We proposed a linear-time algorithm for checking whether a given room is searchable by two guards in [12]. In [13], we also presented an O(n log n) time algorithm for finding all the door locations for a given polygon to make the resultant room searchable by two guards or a 1-searcher, respectively, where n is the number of vertices of the polygon. In this paper, we show that this problem can be solved in O(n) time. Other related work can be found in [8,10,11]. The paper is organized as follows. In Section 2, we introduce the notation used throughout the paper, and review some basic concepts related to room search. In particular, we review the visibility diagram. In Section 3, we discuss the relation between a room’s searchability and its LR-visibility. In Section 4, we analyze and characterize searchable rooms using the visibility diagram. Section 5 presents our linear-time algorithm for identifying a set of possible positions for the door such that the resultant rooms are searchable by two guards. We then extend our discussions to the 1-searcher case. Finally in Section 6, we summarize our work. The full version of this paper, complete with all the proofs, is available as a technical report [14].
2 2.1
Preliminaries Notation
A simple polygon P is defined by a clockwise sequence of distinct vertices numbered 0, 1, · · · , n − 1, (n ≥ 3), and n edges, connecting adjacent vertices. The edge between vertices u and v is denoted by (u, v). The boundary of P , denoted by ∂P , consists of all its vertices and edges. The vertices immediately preceding and succeeding vertex v in the clockwise order are denoted by Pred(v) and Succ(v), respectively. For any two points a, b ∈ ∂P , the open and closed portions of ∂P clockwise from a to b are denoted by ∂P (a, b) and ∂P [a, b], respectively. A vertex whose interior angle between its two incident edges in the polygon is more than 180o is called a reflex vertex. Consider reflex vertex r. Extend the edge (Succ(r), r) toward the interior of P , and let B(r) ∈ ∂P denote the backward extension point, where this extension leaves P for the first time. The polygonal area formed by ∂P [r, B(r)] and the chord rB(r) is called the clockwise component
504
J.Z. Zhang and T. Kameda
associated with r, and is denoted by Ccw (r). Similarly, the extension of (Pred(r), r) determines forward extension point, F (r), and the counter-clockwise component, Cccw (r), associated with r is bounded by ∂P [F (r), r] and the chord rF (r). If Ccw (r) (resp. Cccw (r)) does not totally contain any other clockwise (resp., counter-clockwise) component, it is said to be a non-redundant component. Since our result in this paper makes use of some previous results, which are based on the assumption that no three vertices of the polygon are collinear, and no three lines defined by edges intersect in a common point, we adopt the same assumption in our discussions. 2.2
Visibility Diagram and Skeleton V-Diagram
Two points u and v are said to be mutually visible if the line segment uv is completely contained inside P , where we consider that ∂P is inside P . Given a polygon P , we define a configuration to be an element in ∂P × ∂P [15]. We call p, q ∈ ∂P × ∂P a visible (resp. invisible) configuration, if p and q are mutually visible (resp. invisible). Let us pick an arbitrary point on the boundary ∂P as the origin, and measure all distances along ∂P clockwise from the origin. Let |∂P | denote the length of ∂P . For x ∈ IR,1 x represents the point p(x) ∈ ∂P which is at distance x − k|∂P | from the origin, where k is an integer such that 0 ≤ x − k|∂P | < |∂P |. We thus can consider any x ∈ IR as representing point p(x) on ∂P . Note that there are infinitely many real numbers x ∈ IR that represent the same point p(x) on ∂P . Let x, y ∈ IR. Without loss of generality, we maintain one side of the beam (or the line connecting two guards on the polygon boundary) clear of the intruders. As the search progresses, x and y, which are the points on ∂P where the cleared area ends, can be considered as real-valued functions x, y : [0, 1] → IR.2 A schedule is a pair σ = (x(t), y(t)). Since the length of the cleared part on the boundary is at most |∂P |, we impose the constraint x(t) − |∂P | ≤ y(t) ≤ x(t) for any t ∈ [0, 1]. The visibility space, denoted by V, consists of the infinite area between and including the lines y = x (the start line S) and y = x − |∂P | (the goal line G), as shown in Fig. 1 [16].3 The visibility diagram (V-diagram for short) for a given polygon is drawn in the V-space by shading some areas in it gray as follows: point (x, y) ∈ V is gray if configuration x, y is invisible. For example, Fig. 2 (b) shows the V-diagram for the polygon in Fig. 2 (a). In the diagram, coordinate (x, y) on line S or G is labeled by vertex v if x = + k|∂P | and y = + k |∂P | for some integers k and k , where is the distance of v from the origin on ∂P . The straight-edge boundaries of the gray areas touching either the S line of G line constitute the skeleton V-diagram (SV-diagram, for short). See the highlighted line segments on the boundary of the gray areas in Fig. 2 (b). The maximal contiguous white area in an SV-diagram is called a cell. A cell that has either three or four sides is called a simple cell. 1 2
3
IR denotes the set of all real numbers. We shall see in Section 3 that the formulation here naturally maps to the room search by two guards. The two dashed lines originating from point (d, d) will be discussed in Section 4.1.
A Linear-Time Algorithm y=x
y
505
y=x−L
(d, d)
(d+L, d)
S G
(d, d−L)
0
x
Fig. 1. Visibility space (L = |∂P |)
13 12 11 10
6
4
7 9 4
8
5
10
2
3
12
11 1 13 0
1
2
5
67
8
9
3
0 0 13 12 12 11 11 10 10 9 9 8 8 7 6 7 6 5 5 4 4 3 2 3 2 1 0 0 1
(a)
00 1 23 4 5 67 8 9 101112 13 0 0 13 12 11 10 9 8 67 5 4 3 2 1 0 13
(b)
Fig. 2. (a) An example polygon; (b) Its V-diagram
3
Searchability and LR-Visibility
Given a polygon P , let d ∈ ∂P . Room P (d) is searchable by two boundary guards if there exist two continuous functions : [0, 1] → ∂P and r : [0, 1] → ∂P such that [9] (a) (0) = r(0) = d and (1) = r(1) ∈ ∂P ; (b) For any t ∈ (0, 1), (t) = r(t) and (t) and r(t) are mutually visible; and (c) For any t ∈ [0, 1], d ∈ ∂P [r(t), (t)] holds.
Functions (t) and r(t), both of which are continuous, represent the positions of the left guard and right guard4 on ∂P at t ∈ [0, 1], respectively. It is clear from the above definition that, if a room is searchable, for any t ∈ [0, 1], the partition of P that lies on the door side of the line segment (t)r(t) is always clear of intruders. If P (d) is searchable, we call d a safe door. 4
Viewed from door d.
506
J.Z. Zhang and T. Kameda
A polygon P is LR-visible if there exists a pair of points u and v on ∂P such that ∂P (u, v) and ∂P (v, u) are mutually weakly visible, i.e., any point on ∂P (u, v) is visible to at least one point on ∂P (v, u) and vice versa [17]. If a polygon is LR-visible, there is an O(n) time algorithm to determine all possible pairs of maximal boundary chains {Ai , Bi }, i = 0, 1, · · · , m, such that for any u ∈ Ai and any v ∈ Bi , P is LR-visible with respect to the pair {u, v} [17]. It is shown in [1] that if a simple polygon P is 1-searchable, there exists a pair of vertices {u, v} such that P is weakly visible from the shortest path connecting u and v, i.e., every point on the two sides of P partitioned by u and v can be seen at least by one point on the shortest path connecting u and v. In this case, it is clear that P is LR-visible with respect to u and v [18]. Similarly, there is a relation between the searchability of a room and its LR-visibility. Lemma 1. [9,10] Let v be a reflex vertex of a searchable room P (d). If d ∈ / / Cccw (v)), the final meeting place of the two guards p ∈ ∂P (v, d) Ccw (v) (resp. d ∈ (resp. ∂P (d, v)).
Lemma 2. If a room P (d) is searchable by two guards, P is LR-visible with respect to d and a point p ∈ ∂P . Proof. For the full proof, see [14].
Corollary 1. If d ∈ ∂P is a safe door, d is on some Ai or Bi .
4 4.1
Searchability Analysis of a Room A Simple Characterization
We investigate room search by two guards using the V-diagram introduced in Section 2.2. Since no guard can move across the door at any time according to condition (c) of the definition of a searchable room given in Section 3, configuration l(t), r(t) is always within the V-space bounded by x ≥ d and y ≤ d. See the triangular portion of the V-space bounded by the dashed lines in Fig. 1. The initial configuration for the two guards d, d is represented by the only point on S, called the door point, at the upper left corner of this triangular section. The portion of the SV-diagram for room P (d) bounded by x ≥ d and y ≤ d is called the visibility triangle for P (d) and denoted by VT P (d). Fig. 3 shows an example. By convention, we label point (x, x) on the diagonal by x. By definition, in a searchable room P (d), all the configurations in {l(t), r(t) | t ∈ (0, 1)} correspond to white points in the V-diagram. Thus, the following proposition follows immediately [16]. Proposition 1. A room is searchable by two boundary guards if and only if there exists a cell in VT P (d) that touches both the door point and the diagonal side.
A cell in VT P (d) that does not touch both the door point and the diagonal side is called a blocking cell. We call a blocking cell that has either three sides
A Linear-Time Algorithm
d d
456 7 8 9 1011d
13 0 1 2 3
11 10 9 8 7 6 5 4 3 2 1 0 13
507
d
11 10
45 1
2
67
9
8
3
0 13
d d
Fig. 3. The visibility triangle where vertex 12 is considered as the door for the polygon in Fig. 2
(a triangle) or four sides (a rectangle) a simple blocking cell.5 Note that in Fig. 3, for example, the blocking cell to the right of the vertical line segment extending from vertex 5 on the diagonal G and bounded by the two horizontal line segments at vertices 8 and 11 does not touch G at 8, since the line segments have non-zero width.
r
r’ r’
r
r
d
r’ r
(a)
d
d r’
r
r’
r’
d r’
r (b)
r
r’ r
(c)
(d)
Fig. 4. Simple blocking cells that touch the door point
All the simple blocking cell patterns that touch the door point are shown in Fig. 4. In Fig. 4 (a) (resp. Fig. 4 (b)) there are two reflex vertices whose clockwise (resp. counter-clockwise) components are disjoint from each other. In Fig. 4 (c) there are two reflex vertices such that the clockwise component of one is disjoint from the counter-clockwise component of the other. The two arrows in Fig. 4 (d) need not intersect. It follows immediately from Proposition 1 that if P (d) contains any pattern in Fig. 4, P (d) is not searchable by two boundary guards. However, Fig. 4 does not exhaust all the possible cases where P (d) is not searchable. Let us call an interval on the diagonal that is touched by a simple blocking cell (a triangle) an 5
There may be other “dangling” line segments inside a simple blocking cell.
508
J.Z. Zhang and T. Kameda
r r’
r
r
r’
r r
r’ d
d
d
d r
d
d
d
r
r’
d
(a)
r
r d
(b)
d r’
r’
r’
r d
d
d
r’
r
r’
d
d
d
d
d r
r
r
(c)
r
r d
(d)
d
(e)
Fig. 5. Simple blocking cells that touch the diagonal
unreachable interval. An interval that is not unreachable is called reachable. Fig. 5 shows all the simple blocking cell patterns that cause unreachable intervals [13]. In the figure, the unreachable intervals are indicated by dashed line segments (in the bottom half of the figure), and the corresponding boundary portions are indicated by dotted curves (in the top half of the figure). In Fig. 5 we show the visibility triangles embedded in V-diagrams. Note that the unreachable intervals of Fig. 5 (a) and (b) appear in (d) and (e), respectively, as well. In addition, Fig. 5 (a), (d), and (e) appear in Fig. 4 but the positions of d are different in the two figures. The following theorem characterizes a safe door. Theorem 1. [13] Room P (d) is searchable by two boundary guards if and only if (a) no simple blocking cell touches the door point, and (b) there exists a point in a reachable interval on the diagonal of VT P (d). 4.2
A Linear-Time Testing Algorithm
Suppose the given P is LR-visible with respect to some pair of vertices. Then all the pairs of maximal boundary chains {{Ai , Bi } | i = 1, 2, · · · , m}, such that for any s ∈ Ai and any t ∈ Bi , P is LR-visible with respect to the pair {s, t}, can be computed in O(n) time [17]. In [12] we made use of the LR-visibility of a searchable room in our linear-time algorithm for checking if a given room is searchable by two guards. Below we present our algorithm as a procedure [13]. Its major goal is to detect the patterns in Fig. 5 (d) and (e) in linear time. We focus on the pattern in Fig. 5 (d), since the one in Fig. 5 (e) can be treated symmetrically. We will use this procedure in Section 5. We first construct two lists in linear time: Lcw is a list of reflex vertices on ∂P (d, p) that cause non-redundant clockwise components, ordered counterclockwise from p to d, while Rcw is a list of all reflex vertices on ∂P (p, d) ordered
A Linear-Time Algorithm
509
clockwise from p to d. We now modify Rcw to construct a new list Rcw by merging Rcw and {B(u) | u ∈ Lcw } and appending d at the end. If Lcw could be augmented by {B(u) | u ∈ Rcw }, it would be easy to find the farthest r , but it would require O(n log n) time to do so [13], because Rcw may contain reflex vertices that cause redundant clockwise components. Our aim is to do away with the computation of {B(u) | u ∈ Rcw }, since we do not know how to do it in linear time. We use variables Pr and Pl , initialized to the first element of Rcw and to be null (Λ), respectively, in order to scan Rcw and Lcw , respectively. We maintain another variable, v, to remember the “current” reflex vertex in Lcw . Procedure Farthest(d) 1. 2. 3. 4. 5. 6. 7. 8.
5
While Pr = B(u) for some vertex u do {v = u; advance Pr }; If Pr = d no pattern in Fig. 5 (d) is present (stop); If Pr and v are mutually visible, set w = v. Otherwise, go to step 6; If r = w and r = Pr form the pattern of Fig. 5 (d), Pr is the farthest vertex (stop);6 Advance Pr to the next element in Rcw and go to Step 1; If Pl = Λ, set Pl = v; While Pl and Pr are not mutually visible {advance Pl to the next element in Lcw };
Set w = Pl and go to Step 4;
A Linear-Time Algorithm for Finding all Safe Door Locations
We presented an O(n log n) solution to the problem of finding all safe doors in [13]. Here we present an optimal O(n) algorithm in five steps, by reusing the same visibility information for all candidate door positions. If a polygon is not LR-visible, by Lemma 2 there can be no safe door. Therefore, we shall always assume that the given polygon is LR-visible. 5.1
Critical Points and Sections
We define a section to be the part of the polygon boundary between two adjacent critical points. Observe in Fig. 5 (d) that, as we slide an imaginary d clockwise from r , the interval ∂P (r , r) remains unreachable as long as d is within ∂P (r , r). In other words, ∂P (r , r) remains unreachable for the range of d bounded by the two reflex vertices r and r. This means that the safety of any door position on ∂P (r , r) is not affected by the position of B(r ). Note that the clockwise component from r could be redundant. A similar observation applies to r in Fig. 5 (e), due to its symmetry to Fig. 5 (d). Therefore, we do not need to consider the extension points from the reflex vertices whose components are redundant. Note that we do need to consider the reflex vertices themselves. Proposition 2. The points in a section are either all safe or all unsafe. 6
To test this condition efficiently, determine if the extension of edge (Succ(Pr ), Pr ) is to the left of line segment Pr w viewed from Pr and Pr = B(w).
510
5.2
J.Z. Zhang and T. Kameda
Data Structures
We maintain two lists, Candidates and Sections, and an array A[ ]. Candidates initially contains one representative door position for each section, and unsafe positions will be deleted from it as they become known. Candidates functions as an index for Sections. The entry of Sections corresponding to a candidate door d in Candidates contains at most two maximal unreachable intervals that are currently known for d. The edge (i, i + 1) will be numbered i, where i = 0, 1, . . . , n − 1. A[ ] is an array indexed by the vertex number. If edge i is in interval ∂P (r, r ) for any pair of reflex vertices r and r in the pattern shown in Fig. 5 (a) components A[i].cw and A[i].ccw will store the clockwise and counter-clockwise end (reflex) vertices, respectively, of the maximal unreachable interval on which edge i lies. Otherwise, we set A[i].cw = A[i].ccw = i. We can construct A[ ] in linear time. 5.3
Patterns in Fig. 4 and Fig. 5 (a)
By condition (a) of Theorem 1, point d in any of the figures in Fig. 4 is unsafe. Step 1: Determine the critical points by computing the extension points of all the reflex vertices that cause non-redundant components. Construct Candidates by placing one representative door position for each section, excluding the sections that are unsafe, according to Fig. 4.
Since we assume the given polygon is LR-visible, we can compute the Ai s and Bi s in O(n) time. By deleting from Candidates all positions that do not belong to any Ai or Bi (see Corollary 1), we eliminate all candidates that are positioned as d in Figs. 4 (a), (b), or (c). As a result of Step 1, a safe d can only be in the clockwise component due to r or r . Step 2: Referring to the deadlock intervals in Fig. 5 (a), construct array A[ ].
5.4
Patterns in Fig. 5 (b)-(e)
Fig. 5 (b) and (c) are shown combined in Fig. 6 (a), and Fig. 5 (d) and (e) are shown combined in Fig. 6 (b). We now construct the entry of Sections for each representative door d. It consists of at most two unreachable intervals on the diagonal of visibility triangle VT P (d). They are of the form (rH (d), d) or (d, rL (d), if any. In Fig. 6, rH (d) and rL (d) are the farthest reflex vertices from d that cause such patterns [13]. Thus the intervals ∂P (d, rL (d)) and ∂P (rH (d), d) are unreachable. We update the unreachable intervals in the entry of Sections corresponding to d by Steps 3 and 4. Clearly, if the union of the unreachable intervals in the entry for position d and those in A[ ] spans the entire boundary, d is unsafe. Step 3: For each point d in Candidates, compute rL (d) and rH (d) of Fig. 6 (a), by determining the farthest (clockwise from d) non-redundant clockwise component and the farthest (counter-clockwise from d) non-redundant counter-clockwise component, respectively.
A Linear-Time Algorithm
511
d
d
r H (d)
rH(d) rL (d) (a)
r L(d) (b)
Fig. 6. (a) Unreachable interval of Type A; (b) Unreachable interval of Type B
We can partition Candidates into two groups: group Gd1 consists of those door positions belonging to the same clockwise components as d and group Gd2 consists of the rest. Note that all door positions in Gd1 (resp. Gd2 ) share the same farthest rH . This implies that we can invoke procedure Farthest(d) for one representative d from each group. The pattern in Fig. 5 (e) can be treated similarly to determine rL (d) for each d ∈ Candidates. Step 4: Pick an element d from Candidates and determine the two groups, Gd1 and Gd2 , defined above. Update rL (d ) and rH (d ) for each d in Sections.
5.5
Final Step
All the information needed to determine the safety of the doors in Candidates is contained in A[ ] and Sections. We now need to compute the union of the unreachable intervals on the diagonal, if they overlap. We then test if the resulting unreachable interval equals the entire boundary. Step 5: For each entry (let it be the entry for representative door d) in Sections do the following: 1. If either rL (d) = d or rH (d) = d, the diagonal is reachable and d is safe. Stop 2. Using rL (d) and rH (d) as indices into array A[ ], find A[rL (d)].cw and A[rH (d)].ccw. If, starting clockwise from d, A[rH (d)].ccw is encountered before A[rL (d)].cw, door d is unsafe and otherwise it is safe.
Theorem 2. Given polygon P with n vertices, we can identify in O(n) time all the safe door locations when two boundary guards search.
5.6
Extension to 1-Searcher Case
A boundary 1-searcher always moves on the polygon boundary, starting from the door, aiming her flashlight at the other side of the door. Her vision is restricted to the beam (inside the polygon) from the flashlight, up to the point where it leaves the polygon for the first time. In the 1-searcher case, we need to consider the beam head jump [15,16]. The following results are immediate [14].
512
J.Z. Zhang and T. Kameda
Theorem 3. Room P (d) is searchable by a boundary 1-searcher if and only if (a) no simple blocking cell touches the door point, and (b) the unreachable intervals caused by the patterns in Figs. 5 (a)-(d) do not span the entire boundary ∂P .
Theorem 4. Given polygon P with n vertices, we can identify in O(n) time all the safe door locations when a boundary 1-searcher searches.
6
Conclusion
The ultimate objective of our research on polygon search is to characterize the sets of polygons that can be searched by a given number of k-searchers or guards and generate search schedules in an efficient way. For some special cases where k = 1 or 2, or where there are other restrictions (such as streets or rooms), several interesting results are known, thanks to recent research effort by many researchers. As for room search, it is now fairly well-understood, and an O(n) time algorithm is known for testing if a given door on a given polygon is safe, i.e., it makes the room searchable by two guards. There is also an O(n log n) time algorithm to find all such safe door locations. We have shown in this paper that all the safe doors can be found in O(n) time, in other words, we can do so in the most efficient way. In a sense this concludes research on the room search problem by two guards or a 1-searcher.
References 1. Suzuki, I., Yamashita, M.: Searching for a mobile intruder in a polygonal region. SIAM J. on Computing 21(5), 863–888 (1992) 2. Icking, C., Klein, R.: The two guards problem. Int’l J. of Computational Geometry and Applications 2(3), 257–285 (1992) 3. Heffernan, P.: An optimal algorithm for the two-guard problem. Int’l J. of Computational Geometry and Applications 6, 15–44 (1996) 4. Tseng, L.H., Heffernan, P.J., Lee, D.T.: Two-guard walkability of simple polygons. Int’l J. of Computational Geometry and Applications 8(1), 85–116 (1998) 5. Bhattacharya, B.K., Mukhopadhyay, A., Narasimhan, G.: Optimal algorithms for two-guard walkability of simple polygons. In: Proc. 7th Int’l Workshop on Algorithms and Data Structures, pp. 438–449 (2001) 6. Crass, D., Suzuki, I., Yamashita, M.: Searching for a mobile intruder in a corridor: the open edge variant of the polygon search problem. Int’l J. of Computational Geometry and Applications 5(4), 397–412 (1995) 7. Lee, J.H., Park, S.M., Chwa, K.Y.: Searching a polygonal room with one door by a 1-searcher. Int’l J. of Computational Geometry and Applications 10(2), 201–220 (2000) 8. Lee, J.H., Shin, S.Y., Chwa, K.Y.: Visibility-based pursuit-evasions in a polygonal room with a door. In: Proc. ACM Symp. on Computational Geometry, pp. 281–290 (1999)
A Linear-Time Algorithm
513
9. Park, S.M., Lee, J.H., Chwa, K.Y.: Characterization of rooms searchable by two guards. In: Proc. Int’l Symp. on Algorithms and Computation, pp. 515–526 (2000) 10. Park, S.M., Lee, J.H., Chwa, K.Y.: Searching a room by two guards. Int’l J. of Computational Geometry and Applications 12(4), 339–352 (2002) 11. Tan, X.: Efficient algorithms for searching a polygonal room with a door. In: Japanese Conf. on Discrete and Computational Geometry, pp. 339–350 (2000) 12. Bhattacharya, B.K., Zhang, J.Z., Shi, Q.S., Kameda, T.: An optimal solution to room search problem. In: Proc. 18th Canadian Conf. on Computational Geometry, August 2006, pp. 55–58 (2006) 13. Zhang, J.Z., Kameda, T.: Where to build a door. In: Proc. IEEE/RSJ Int’l Conf. on Intelligent Robots and Systems, October 2006, pp. 4084–4090 (2006) 14. Zhang, J.Z., Kameda, T.: A linear-time algorithm for finding all door locations that make a room searchable. Technical Report CMPT TR 2007-27 (submitted to a journal), School of Computing Science, Simon Fraser University (November 2007), http://www.cs.sfu.ca/research/publications/techreports/ 15. LaValle, S.M., Simov, B., Slutzki, G.: An algorithm for searching a polygonal region with a flashlight. Int’l J. of Computational Geometry and Applications 12(1-2), 87– 113 (2002) 16. Kameda, T., Yamashita, M., Suzuki, I.: On-line polygon search by a seven-state boundary 1-searcher. IEEE Trans. on Robotics 22, 446–460 (2006) 17. Das, G., Heffernan, P.J., Narasimhan, G.: LR-visibility in polygons. Computational Geometry 7, 37–57 (1997) 18. Bhattacharya, B., Ghosh, S.K.: Characterizing LR-visibility polygons and related problems. Computational Geometry: Theory and Applications 18, 19–36 (2001)
Model Theoretic Complexity of Automatic Structures (Extended Abstract) Bakhadyr Khoussainov1 and Mia Minnes2 1
Department of Computer Science University of Auckland Auckland, New Zealand [email protected] 2 Mathematics Department Cornell University Ithaca, New York 14853 [email protected]
Abstract. We study the complexity of automatic structures via wellestablished concepts from both logic and model theory, including ordinal heights (of well-founded relations), Scott ranks of structures, and CantorBendixson ranks (of trees). We prove the following results: 1) The ordinal height of any automatic well-founded partial order is bounded by ω ω ; 2) The ordinal heights of automatic well-founded relations are unbounded below ω1CK ; 3) For any infinite computable ordinal α, there is an automatic structure of Scott rank at least α. Moreover, there are automatic structures of Scott rank ω1CK , ω1CK + 1; 4) For any ordinal α < ω1CK , there is an automatic successor tree of Cantor-Bendixson rank α.
1
Introduction
In recent years, there has been increasing interest in the study of structures that can be presented by automata. The underlying idea is to apply techniques of automata theory to decision problems that arise in logic and applications such as databases and verification. A typical decision problem is the model checking problem: for a structure A (e.g. a graph), design an algorithm that, given a formula φ(¯ x) in a formal system and a tuple a ¯ from the structure, decides if φ(¯ a) is true in A. In particular, when the formal system is the first order predicate logic or the monadic second order logic, we would like to know if the theory of the structure is decidable. Fundamental early results in this direction by B¨ uchi ([4], [5]) and Rabin ([20]) proved the decidability of the monadic second order theories of the successor on the natural numbers and of the binary tree. There have been numerous applications and extensions of these results in logic, algebra, verification, model checking, and databases (see, for example, [9] [23] [24] and [25]). Moreover, automatic structures provide a theoretical framework for constraint databases over discrete domains such as strings and trees [1]. M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 514–525, 2008. c Springer-Verlag Berlin Heidelberg 2008
Model Theoretic Complexity of Automatic Structures
515
A structure A = (A; R0 , . . . , Rm ) is automatic if the domain A and all the relations R0 , . . . , Rm of the structure are recognised by finite automata (precise definitions are in the next section). Independently, Hodgson [13] and later Khoussainov and Nerode [14] proved that for any given automatic structure there is an algorithm that solves the model checking problem for the first order logic. In particular, the first order theory of the structure is decidable. There is a body of work devoted to the study of resource-bounded complexity of the model checking problem for automatic structures. Most current results demonstrate that automatic structures are not complex in various concrete senses. However, in this paper we use well-established concepts from both logic and model theory to prove results in the opposite direction. We now briefly describe the measures of complexity we use (ordinal heights of well-founded relations, Scott ranks of structures, and Cantor-Bendixson ranks of trees) and connect them with the results of this paper. A relation R is called well-founded if there is no infinite sequence x1 , x2 , x3 , . . . such that (xi+1 , xi ) ∈ R for i ∈ ω. In computer science, well-founded relations are of interest due to a natural connection between well-founded sets and terminating programs. We say that a program is terminating if every computation from an initial state is finite. This is equivalent to well-foundedness of the collection of states reachable from the initial state, under the reachability relation [3]. The ordinal height is a measure of the depth of well-founded relations. Since all automatic structures are computable, the obvious bound for ordinal heights of automatic well-founded relations is ω1CK (the first non-computable ordinal). Sections 3 and 4 study the sharpness of this bound. Theorem 1 characterizes automatic well-founded partial orders in terms of their ordinal heights, whereas Theorem 2 shows that ω1CK is the sharp bound in the general case. Theorem 1. For each ordinal α, α is the ordinal height of an automatic wellfounded partial order if and only if α < ω ω . Theorem 2. For each (computable) ordinal α < ω1CK , there is an automatic well-founded relation A whose ordinal height is greater than α. Section 5 is devoted to building automatic structures with high Scott ranks. The concept of Scott rank comes from a well-known theorem of Scott stating that for every countable structure A there exists a sentence φ in Lω1 ,ω -logic which characterizes A up to isomorphism [22]. The minimal quantifier rank of such a formula is called the Scott rank of A. A known upper bound on the Scott rank of computable structures implies that the Scott rank of automatic structures is at most ω1CK + 1. But, until now, all the known examples of automatic structures had low Scott ranks. Results in [19], [7], [17] suggest that the Scott ranks of automatic structures could be bounded by small ordinals. This intuition is falsified in Section 5 with the theorem: Theorem 3. For each infinite computable ordinal α there is an automatic structure of Scott rank at least α.
516
B. Khoussainov and M. Minnes
In the last section, we investigate the Cantor-Bendixson ranks of automatic trees. A partial order tree is a partially ordered set (T, ≤) such that there is a ≤-minimal element of T , and each subset {x ∈ T : x ≤ y} is finite and is linearly ordered under ≤. A successor tree is a pair (T, S) such that the reflexive and transitive closure ≤S of S produces a partial order tree (T, ≤S ). The derivative of a tree T is obtained by removing all the nonbranching paths of the tree. One applies the derivative operation to T successively until a fixed point is reached. The minimal ordinal that is needed to reach the fixed point is called the Cantor-Bendixson (CB) rank of the tree. The CB rank plays an important role in logic, algebra, and topology. Informally, the CB rank tells us how far the structure is from algorithmically (or algebraically) simple structures. Again, the obvious bound on CB ranks of automatic successor trees is ω1CK . In [16], it is proved that the CB rank of any automatic partial order tree is finite and can be computed from the automaton for the ≤ relation on the tree. It has been an open question whether the CB ranks of automatic successor trees can be bounded by small ordinals. We answer this question in the following theorem. Theorem 4. For α < ω1CK there is an automatic successor tree of CB rank α. The main tool we use to prove results about high ranks is the configuration spaces of Turing machines, considered as automatic graphs. It is important to note that graphs which arise as configuration spaces have very low model-theoretic complexity: their Scott ranks are at most 3, and if they are well-founded then their ordinal heights are at most ω (see Propositions 1 and 2). Hence, the configuration spaces serve merely as building blocks in the construction of automatic structures with high complexity, rather than contributing materially to the high complexity themselves.
2
Preliminaries
A (relational) vocabulary is a finite sequence (P1m1 , . . . , Ptmt , c1 , . . . , cs ), where m each Pj j is a predicate symbol of arity mj > 0, and each ck is a constant symbol. A A structure with this vocabulary is a tuple A = (A; P1A , . . . , PtA , cA 1 , . . . , cs ), A A where Pj and ck are interpretations of the symbols of the vocabulary. When convenient, we may omit the superscripts A. We only consider infinite structures. A finite automaton M over an alphabet Σ is a tuple (S, ι, Δ, F ), where S is a finite set of states, ι ∈ S is the initial state, Δ ⊂ S × Σ × S is the transition table, and F ⊂ S is the set of final states. A computation of A on a word σ1 σ2 . . . σn (σi ∈ Σ) is a sequence of states, say q0 , q1 , . . . , qn , such that q0 = ι and (qi , σi+1 , qi+1 ) ∈ Δ for all i ∈ {0, . . . , n − 1}. If qn ∈ F , then the computation is successful and we say that automaton M accepts the word σ1 σ2 . . . σn . The language accepted by the automaton M is the set of all words accepted by M. In general, D ⊂ Σ is finite automaton recognisable, or regular, if D is the language accepted by some finite automaton M. To define automaton recognisable relations, we use n-variable (or n-tape) automata. An n–tape automaton can be thought of as a one-way Turing machine
Model Theoretic Complexity of Automatic Structures
517
with n input tapes [8]. Each tape is regarded as semi-infinite, having written on it a word over the alphabet Σ followed by an infinite succession of blanks (denoted by symbols). The automaton starts in the initial state, reads simultaneously the first symbol of each tape, changes state, reads simultaneously the second symbol of each tape, changes state, etc., until it reads a blank on each tape. The automaton then stops and accepts the n–tuple of words if it is in a final state. The set of all n–tuples accepted by the automaton is the relation recognised by the automaton. For a formal definition see, for example, [14]. Definition 1. A structure A = (A; R0 , R1 , . . . , Rm ) is automatic over Σ if its domain A and all relations R0 , R1 , . . ., Rm are regular over Σ. The configuration graph of any Turing machine is an example of an automatic structure. The graph is defined by letting the configurations of the Turing machine be the vertices, and putting an edge from configuration c1 to configuration c2 if the machine can make an instantaneous move from c1 to c2 . Many examples of automatic structures can be formed using the ω-fold disjoint union of a structure A (the disjoint union of ω many copies of A). Lemma 1. [21] If A is automatic then its ω-fold disjoint union is isomorphic to an automatic structure. The class of automatic structures is a proper subclass of the computable structures. In this paper, we will be coding computable structures into automatic ones. Good references for the theory of computable structures include [11], [15]. Definition 2. A computable structure is a structure A = (A; R1 , . . . , Rm ) whose domain and relations are all computable. The domains of computable structures can always be identified with the set ω of natural numbers. Under this assumption, we introduce new constant symbols cn for each n ∈ ω and interpret cn as n. In this context, A is computable iff the atomic diagram of A (the set of G¨ odel numbers of all quantifier-free sentences in the extended vocabulary that are true in A) is computable.
3
Ranks of Automatic Well-Founded Partial Orders
In this section we consider structures A = (A; R) with a single binary relation. An element x is said to be R-minimal for a set X if for each y ∈ X, (y, x) ∈ / R. The relation R is said to be well-founded if every non-empty subset of A has an R-minimal element. This is equivalent to saying that (A; R) has no infinite chains x1 , x2 , x3 , . . . where (xi+1 , xi ) ∈ R for all i. A ranking function for A is an ordinal-valued function f such that f (y) < f (x) whenever (y, x) ∈ R. For f a ranking function on A, let ord(f ) = sup{f (x) : x ∈ A}. The structure A is well-founded if and only if A admits a ranking function. The ordinal height of A, denoted r(A), is the least ordinal α which is ord(g) for some ranking function g on A. For B ⊆ A, we write r(B) for the ordinal height of the structure obtained by restricting R to B. Recall that if α < ω1CK then α is a computable ordinal.
518
B. Khoussainov and M. Minnes
Lemma 2. If α < ω1CK , there is a computable well-founded relation of ordinal height α. Lemma 2 amounts to taking a computable copy of any linear order of type α. The next lemma follows easily from well-foundedness of ordinals and of R. Lemma 3. For a structure A = (A; R) where R is well-founded, if r(A) = α and β < α then there is an x ∈ A such that rA (x) = β. For the remainder of this section, we assume further that R is a partial order. For convenience, we write ≤ instead of R. Thus, we consider automatic well-founded partial orders A = (A, ≤). We will use the notion of natural sum of ordinals. The natural sum of ordinals α, β (denoted α + β) is defined recursively: α + β is the least ordinal strictly greater than γ + β for all γ < α and strictly greater than α + γ for all γ < β. Lemma 4. Let A1 and A2 be disjoint subsets of A such that A = A1 ∪ A2 . Consider the partially ordered sets A1 = (A1 , ≤1 ) and A2 = (A2 , ≤2 ) obtained by restricting ≤ to A1 and A2 respectively. Then, r(A) ≤ α1 + α2 , where αi = r(Ai ). Proof. We will show that there is a ranking function on A whose range is contained in the ordinal α1 + α2 . For each x ∈ A consider the partially ordered sets A1,x and A2,x obtained by restricting ≤ to {z ∈ A1 | z < x} and {z ∈ A2 | z < x}, respectively. Define f (x) = r(A1,x ) + r(A2,x ). It is not hard to see that f is the desired ranking function. Corollary 1. If r(A) = ω n and A = A1 ∪ A2 , where A1 ∩ A2 = ∅, then either r(A1 ) = ω n or r(A2 ) = ω n . Khoussainov and Nerode [14] show that, for each n, there is an automatic presentation of the ordinal ω n . It is clear that such a presentation has ordinal height ω n . The next theorem shows that ω ω is the sharp bound on ranks of all automatic well-founded partial orders. Now that Corollary 1 has been established, the proof of Theorem 1 follows Delhomm´e [7] and Rubin [21]. Theorem 1. For each ordinal α, α is the ordinal height of an automatic wellfounded partial order if and only if α < ω ω . Proof. One direction of the proof is clear. For the other, assume for a contradiction that there is an automatic well-founded partial order A = (A, ≤) with r(A) = α ≥ ω ω . Let (SA , ιA , ΔA , FA ) and (S≤ , ι≤ , Δ≤ , F≤ ) be finite automata over Σ recognizing A and ≤ (respectively). By Lemma 3, for each n > 0 there is un ∈ A such that rA (un ) = ω n . For each u ∈ A we define the set u ↓= {x ∈ A : x < u}. Note that if rA (u) is a limit ordinal then rA (u) = r(u ↓). We define a finite partition of u ↓ in order to apply Corollary 1. To do so, for u, v ∈ Σ , define
Model Theoretic Complexity of Automatic Structures
519
Xvu = {vw ∈ A : w ∈ Σ & vw < u}. Each set of the form u ↓ can then be partitioned based on the prefixes of words as follows: u ↓= {x ∈ A : |x| < |u| & x < u} ∪ Xvu . v∈Σ :|v|=|u|
(All the unions above are finite and disjoint.) Hence, applying Corollary 1, for each un there exists a vn such that |un | = |vn | and r(Xvunn ) = r(un ↓) = ω n . On the other hand, we use the automata to define the following equivalence relation on pairs of words of equal lengths: v v (u, v) ∼ (u , v ) ⇐⇒ ΔA (ιA , v) = ΔA (ιA , v ) & Δ≤ (ι≤ , ) = Δ≤ (ι≤ , ) u u There are at most |SA | × |S≤ | equivalence classes. Thus, the infinite sequence (u1 , v1 ), (u2 , v2 ), . . . contains m, n such that m = n and (um , vm ) ∼ (un , vn ).
Lemma 5. For any u, v, u , v ∈ Σ , if (u, v) ∼ (u , v ) then r(Xvu ) = r(Xvu ).
To prove the lemma, consider g : Xvu → Xvu defined as g(vw) = v w. From the equivalence relation, we see that g is well-defined, bijective, and order preserving. Hence Xvu ∼ = Xvu (as partial orders). Therefore, r(Xvu ) = r(Xvu ). By Lemma 5, ω m = r(Xvumm ) = r(Xvunn ) = ω n , a contradiction with the assumption that m = n. Therefore, there is no automatic well-founded partial order of ordinal height greater than or equal to ω ω .
4 4.1
Ranks of Automatic Well-Founded Relations Configuration Spaces of Turing Machines
In the following, we embed computable structures into automatic ones via configuration spaces of Turing machines. Let M be an n-tape deterministic Turing machine. The configuration space of M, denoted by Conf (M), is a directed graph whose nodes are configurations of M. The nodes are n-tuples, each of whose coordinates represents the contents of a tape. Each tape is encoded as (w q w ), where w, w ∈ Σ are the symbols on the tape before and after the location of the read/write head, and q is one of the states of M. The edges of the graph are all the pairs of the form (c1 , c2 ) such that there is an instruction of M that transforms c1 to c2 . The configuration space is an automatic graph. The out-degree of every vertex in Conf (M) is 1; the in-degree need not be 1. Definition 3. A deterministic Turing machine M is reversible if Conf (M) consists only of finite chains and chains of type ω. Lemma 6. [2] For any deterministic 1-tape Turing machine there is a reversible 3-tape Turing machine which accepts the same language.
520
B. Khoussainov and M. Minnes
Proof. (Sketch) Given a deterministic Turing machine, define a 3-tape Turing machine with a modified set of instructions. The modified instructions have the property that neither the domains nor the ranges overlap. The first tape performs the computation exactly as the original machine would have done. As the new machine executes each instruction, it stores the index of the instruction on the second tape, forming a history. Once the machine enters a state which would have been halting for the original machine, the output of the computation is copied onto the third tape. Then, the machine runs the computation backwards and erases the history tape. The halting configuration contains the input on the first tape, blanks on the second tape, and the output on the third tape. We establish the following notation for a 3-tape reversible Turing machine M given by the construction in this lemma. A valid initial configuration of M is of the form (λ ι x, λ, λ), where x in the domain, λ is the empty string, and ι is the initial state of M. From the proof above, observe that a final (halting) configuration is of the form (x, λ, λ qf y), with qf a halting state of M. Also, because of the reversibility assumption, all the chains in Conf (M) are either finite or ω-chains (the order type of the natural numbers). In particular, this means that Conf (M) is well-founded. We call an element of in-degree 0 a base (of a chain). The set of valid initial or final configurations is regular. We classify the components (chains) of Conf (M) as follows: – Terminating computation chains: finite chains whose base is a valid initial configuration; that is, one of the form (λ ι x, λ, λ), for x ∈ Σ . – Non-terminating computation chains: infinite chains whose base is a valid initial configuration. – Unproductive chains: chains whose base is not a valid initial configuration. Configuration spaces of reversible Turing machines are locally finite graphs (graphs of finite degree) and well-founded. Hence, the following proposition guarantees that their ordinal heights are small. The proof is left to the reader. Proposition 1. If G = (A, E) is a locally finite graph then E is well-founded and the ordinal height of E is not above ω, or E has an infinite chain. 4.2
Automatic Well-Founded Relations of High Rank
Theorem 2. For each computable ordinal α < ω1CK , there is an automatic well-founded relation A whose ordinal height is greater than α Proof. The proof of the theorem uses properties of Turing machines and their configuration spaces. We take a computable well-founded relation whose ordinal height is α, and “embed” it into an automatic well-founded relation with similar ordinal height. By Lemma 2, let C = (C, Lα ) be a computable well-founded relation of ordinal height α. We assume without loss of generality that C = Σ for some finite alphabet Σ. Let M be the Turing machine computing the relation Lα . On each
Model Theoretic Complexity of Automatic Structures
521
pair (x, y) from the domain, M halts and outputs “yes” or “no” . By Lemma 6, we can assume that M is reversible. Recall that Conf (M) = (D, E) is an automatic graph. We define the domain of our automatic structure to be A = Σ ∪ D. The binary relation of the automatic structure is: R = E ∪ {(x, (λ ι (x, y), λ, λ)) : x, y ∈ Σ } ∪ {(((x, y), λ, λ qf “yes” ), y) : x, y ∈ Σ }.
Intuitively, the structure (A; R) is a stretched out version of (C, Lα ) with infinitely many finite pieces extending from elements of C, and with disjoint pieces which are either finite chains or chains of type ω. The structure (A; R) is automatic because its domain is a regular set of words and the relation R is recognisable by a 2-tape automaton. We should verify, however, that R is well-founded. Let Y ⊂ A. If Y ∩ C = ∅ then since (C, Lα ) is well-founded, there is x ∈ Y ∩ C which is Lα -minimal. The only possible elements u in Y for which (u, x) ∈ R are those which lie on computation chains connecting some z ∈ C with x. Since each such computation chain is finite, there is an R-minimal u below x on each chain. Any such u is R-minimal for Y . On the other hand, if Y ∩ C = ∅, then Y consists of disjoint finite chains and chains of type ω. Any such chain has a minimal element, and any of these elements are R-minimal for Y . Therefore, (A; R) is an automatic well-founded structure. We now consider the ordinal height of (A; R). For each element x ∈ C, an easy induction on rC (x), shows that rC (x) ≤ rA (x) ≤ ω + rC (x). We denote by (a, b) the (finite) length of the computation chain of M with input (a, b). For any element ax,y in the computation chain which represents the computation of M determining whether (x, y) ∈ R, we have rA (x) ≤ rA (ax,y ) ≤ rA (x) + (x, y). For any element u in an unproductive chain of the configuration space, 0 ≤ rA (u) < ω. Therefore, since C ⊂ A, r(C) ≤ r(A) ≤ ω + r(C).
5
Automatic Structures and Scott Rank
The Scott rank of a structure is introduced in the proof of Scott’s Isomorphism Theorem [22]. Here we follow the definition of Scott rank from [6]. Definition 4. For structure A and tuples a ¯, ¯b ∈ An (of equal length), define – a ¯ ≡0 ¯b if a ¯, ¯b satisfy the same quantifier-free formulas in the language of A; – For α > 0, a ¯ ≡α ¯b if for all β < α, for each c¯ (of arbitrary length) there is ¯ ¯ and for each d¯ (of arbitrary length) there is c¯ such d such that a ¯, c¯ ≡β ¯b, d; β ¯ ¯ that a ¯, c¯ ≡ b, d. Then, the Scott rank of the tuple a ¯, denoted by SR(¯ a), is the least β such that ¯) ∼ ¯ ≡β ¯b implies that (A, a for all ¯b ∈ An , a = (A, ¯b). Finally, the Scott rank of A, denoted by SR(A), is the least α greater than the Scott ranks of all tuples of A. Example 1. SR(Q, ≤) = 1, SR(ω, ≤) = 2, and SR(n · ω, ≤) = n + 1. Configuration spaces of reversible Turing machines are locally finite graphs. By the Proposition below, they all have low Scott Rank.
522
B. Khoussainov and M. Minnes
Proposition 2. Let G = (V, E) be a locally finite graph, then SR(G) ≤ 3. Proof. The neighbourhood of diameter n of a subset U , denoted Bn (U ), is defined as follows: B0 (U ) = U and Bn (U ) is the set of v ∈ V which can be reached from U by n or fewer edges. The proof of the proposition relies on two lemmas whose proofs are left to the reader. Lemma 7. Let a ¯, ¯b ∈ V be such that a ¯ ≡2 ¯b. Then for all n, there is a bijection ¯ of the n-neighbourhoods around a ¯, b which sends a ¯ to ¯b and which respects E. Lemma 8. Let G = (V, E) be a graph. Suppose a ¯, ¯b ∈ V are such that for ∼ ¯ ¯ all n, (Bn (¯ a), E, a ¯) = (Bn (b), E, b). Then there is an isomorphism between the component of G containing a ¯ and that containing ¯b which sends a ¯ to ¯b. To prove the proposition, we note that for any a ¯, ¯b in V such that a ¯ ≡2 ¯b, Lemmas 7 and 8 yield an isomorphism from the component of a ¯ to the component of ¯b 2 ¯ ¯ that maps a ¯ to b. Hence, if a ¯ ≡ b, there is an automorphism of G that maps a ¯ to ¯b. Therefore, for each a ¯ ∈ V , SR(¯ a) ≤ 2, so SR(G) ≤ 3. Let C = (C; R1 , . . . , Rm ) be a computable structure. We construct an automatic structure A whose Scott rank is (close to) the Scott rank of C. Since the domain of C is computable, we assume that C = Σ for some finite Σ. The construction of A involves connecting the configuration spaces of Turing machines computing relations R1 , . . . , Rm . Note that Proposition 2 suggests that the high Scott rank of the resulting automatic structure is the main part of the construction because it is not provided by the configuration spaces themselves. We detail the construction for Ri . Let Mi be a Turing machine for Ri . By a simple modification of the machine we assume that Mi halts if and only if its output is “yes” . By Lemma 6, we can also assume that Mi is reversible. We now modify the configuration space Conf (Mi ) so as to respect the isomorphism type of C. This will ensure that the construction (almost) preserves the Scott rank of C. We use the terminology from Subsection 4.1. Smoothing out unproductive parts. The length and number of unproductive chains is determined by the machine Mi and hence may differ even for Turing machines computing the same set. In this stage, we standardize the format of this unproductive part of the configuration space. We add ω-many chains of length n (for each n) and ω-many copies of ω. This ensures that the (smoothed) unproductive section of the configuration space of any Turing machine will be isomorphic and preserves automaticity. Smoothing out lengths of computation chains. We turn our attention to the chains which have valid initial configurations at their base. The length of each finite chain denotes the length of computation required to return a “yes” answer. We will smooth out these chains by adding “fans” to each base. For this, we connect to each base of a computation chain a structure which consists of ω many chains of each finite length. To do so we follow Rubin [21]: consider the structure whose domain is 0 01 and whose relation is given by xEy if and only if |x| = |y| and y is the least lexicographic successor of x. This structure has a
Model Theoretic Complexity of Automatic Structures
523
finite chain of every finite length. As in Lemma 1, we take the ω-fold disjoint union of the structure and identify the bases of all the finite chains. We get a “fan” with infinitely many chains of each finite size whose base can be identified with a valid initial computation state. Also, the fan has an infinite component if and only if Ri does not hold of the input tuple corresponding to the base. The result is an automatic graph, Smooth(Ri ) = (Di , Ei ), which extends Conf (Mi ). Connecting domain symbols to the computations of the relation. We apply the construction above to each Ri in the signature of C. Taking the union of the resulting automatic graphs and adding vertices for the domain, we have the structure (Σ ∪∪i Di , E1 , . . . , En ) (where we assume that the Di are disjoint). Assume that each Mi has a different initial state, and denote it by ιi . We add n predicates Fi to the signature of the automatic structure connecting the elements of the domain of C with the computations of the relations Ri : Fi = {(x0 , . . . , xmi −1 , (λ ιi (x0 , . . . , xmi −1 ), λ, λ)) | x0 , . . . , xmi −1 ∈ Σ }. Note that for x¯ ∈ Σ , Ri (¯ x) if and only if Fi (¯ x, (λ ιi x ¯, λ, λ)) holds and all ¯, λ, λ) are finite. We have built the automatic Ei chains emanating from (λ ιi x structure A = (Σ ∪ ∪i Di , E1 , . . . , En , F1 , . . . , Fn ). Two technical lemmas are used to show that the Scott rank of A is close to α: Lemma 9. For x¯, y¯ in the domain of C and for ordinal α, if x¯ ≡α ¯ then x ¯ ≡α ¯. C y A y xx¯ u ¯) ≤ 2 + SRC (¯ y ). Lemma 10. If x ¯ ∈ Σ ∪ ∪i Di , there is y¯ ∈ Σ with SRA (¯ Putting these together, we conclude that SR(C) ≤ SR(A) ≤ 2 + SR(C). Applying the above construction to the computable structures of Scott rank ω1CK and ω1CK + 1 built by Harrison [12] and Knight and Millar [18], we get automatic structures of Scott rank ω1CK , ω1CK + 1. We also apply the construction to [10], where it is proved that there are computable structures with Scott ranks above each computable ordinal. In this case, we get the following theorem. Theorem 3. For each infinite computable ordinal α, there is an automatic structure of Scott rank at least α.
6
Cantor-Bendixson Rank of Automatic Successor Trees
In this section we show that there are automatic successor trees of high CantorBendixson (CB) rank. Recall the definitions of partial order trees and successor trees from Section 1. Note that if (T, ≤) is an automatic partial order tree then the successor tree (T, S), where the relation S is defined by S(x, y) ⇐⇒ (x < y) & ¬∃z(x < z < y), is automatic. Definition 5. The derivative of a tree T , d(T ), is the subtree of T whose domain is {x ∈ T : x lies on at least two infinite paths in T }. By induction, d0 (T ) = T , dα+1 (T ) = d(dα (T )), and for γ a limit ordinal, dγ (T ) = ∩β<γ dβ (T ). The CB rank of the tree, CB(T ), is the least α such that dα (T ) = dα+1 (T ).
524
B. Khoussainov and M. Minnes
The CB ranks of automatic partial order trees are finite [16]. This is not true of automatic successor trees. The main theorem of this section provides a general technique for building trees of given CB ranks. It uses the fact that for each α < ω1CK there is a computable successor tree of CB rank α. This fact can be proven by recursively coding up computable trees of increasing CB rank. Theorem 4. For α < ω1CK there is an automatic successor tree of CB rank α. Proof. Suppose we are given α < ω1CK . Take a computable tree Rα of CB rank α. We use the same construction as in the case of well-founded relations (see the proof of Theorem 2). The result is a stretched out version of the tree Rα , where between each two elements of the original tree we have a coding of their computation. In addition, extending from each x ∈ Σ we have infinitely many finite computation chains. Those chains which correspond to output “no” are not connected to any other part of the automatic structure. Finally, there is a disjoint part of the structure consisting of chains whose bases are not valid initial configurations. By the reversibility assumption, each unproductive component of the configuration space is isomorphic either to a finite chain or to an ω-chain. Moreover, the set of invalid initial configurations which are the base of such an unproductive chain is regular. We connect all such bases of unproductive chains to the root and get an automatic successor tree, Tα . We now consider the CB rank of Tα . Note that the first derivative removes all the subtrees whose roots are at distance 1 from the root and are invalid initial computations. This occurs because each of the invalid computation chains has no branching and is not connected to any other element of the tree. Next, if we consider the subtree of Tα rooted at some x ∈ Σ , we see that all the paths which correspond to computations whose output is “no” vanish after the first derivative. Moreover, x ∈ d(Tα ) if and only if x ∈ d(Rα ) because the construction did not add any new infinite paths. Therefore, after one derivative, the structure is exactly a stretched out version of d(Rα ). Likewise, for all β < α, dβ (Tα ) is a stretched out version of dβ (Rα ). Hence, CB(Tα ) = CB(Rα ) = α.
Acknowledgement We thank Moshe Vardi who posed the question about ranks of automatic wellfounded relations. We also thank Anil Nerode and Frank Stephan with whom we discussed Scott and Cantor-Bendixson ranks of automatic structures.
References 1. Benedikt, M., Libkin, L.: Tree extension algebras: Logics, automata, and query languages. In: Proceedings of LICS 2002, pp. 203–212 (2002) 2. Bennett, C.H.: Logical Reversibility of Computation. IBM Journal of Research and Development, 525–532 (1973) 3. Blaas, A., Gurevich, Y.: Program Termination and Well Partial Orderings. ACM Transactions on Computational Logic, 1–25 (2006)
Model Theoretic Complexity of Automatic Structures
525
4. B¨ uchi, J.R.: Weak second-order arithmetic and finite automata. Zeitschrift Math. Logik und Grundlagen det Mathematik, 66–92 (1960) 5. B¨ uchi, J.R.: On a decision method in restricted second-order arithmetic. In: Nagel, E., Suppes, P., Tarski, A. (eds.) Proc. International Congress on Logic, Methodology and Philosophy of Science, 1960, pp. 1–12. Stanford University Press (1962) 6. Calvert, W., Goncharov, S.S., Knight, J.F.: Computable structures of Scott rank ω1CK in familiar classes. In: Advances in Logic (Proceedings of the North Texas Logic Conference). Contemporary Mathematics, vol. 425, pp. 49–66. American Mathematical Society (2007) 7. Delhomm´e, C.: Automaticit´e des ordinaux et des graphes homog`enes. C.R. Acad`emie des sciences Paris, Ser. I 339, 5–10 (2004) 8. Eilenberg, S.: Automata, Languages, and Machines (Vol. A). Academic Press, New York (1974) 9. Epstein, D.B.A., et al.: Word Processing in Groups, A.K. Peters Ltd. (Natick, Massachusetts) (1992) 10. Goncharov, S.S., Knight, J.F.: Computable structure and non-structure theorems. Algebra and Logic 41, 351–373 (2002) 11. Harizanov, V.S.: Pure Computable Model Theory. In: Yu. Ershov, S., Goncharov, A., Nerode, J. (eds.) Handbook of Recursive Mathematics, pp. 3–114. NorthHolland, Amsterdam (1998) 12. Harrison, J.: Recursive Pseudo Well-Orderings. Transactions of the American Mathematical Society 131(2), 526–543 (1968) 13. Hodgson, B.R.: On Direct Products of Automaton Decidable Theories. Theoretical Computer Science 19, 331–335 (1982) 14. Khoussainov, B., Nerode, A.: Automatic presentations of structures. In: Leivant, D. (ed.) LCC 1994. LNCS, vol. 960, pp. 367–392. Springer, Heidelberg (1995) 15. Khoussainov, B., Shore, R.A.: Effective Model Theory: The Number of Models and Their Complexity. In: Cooper, S.B., Truss, J.K. (eds.) Models and Computability, Invited Papers from LC 1997. LMSLNS, vol. 259, pp. 193–240. Cambridge University Press, Cambridge, England (1999) 16. Khoussainov, B., Rubin, S., Stephan, F.: On automatic partial orders. In: Proceedings of 18th LICS, pp. 168–177 (2003) 17. Khoussainov, B., Rubin, S., Stephan, F.: Automatic linear orders and trees. ACM Transactions on Computational Logic 6(4), 675–700 (2005) 18. Knight, J.F., Millar, J.: Computable Structures of Rank ω1CK . Journal of Mathematical Logic; posted on arXiv 25 (August 2005) (submitted) 19. Lohrey, M.: Automatic structures of bounded degree. In: Y. Vardi, M., Voronkov, A. (eds.) LPAR 2003. LNCS (LNAI), vol. 2850, pp. 344–358. Springer, Heidelberg (2003) 20. Rabin, M.O.: Decidability of Second-Order Theories and Automata on Infinite Trees. Transactions of the American Mathematical Society 141, 1–35 (1969) 21. Rubin, S.: Automatic Structures, PhD Thesis, University of Auckland (2004) 22. Scott, D.: Logic with Denumerably Long Formulas and Finite Strings of Quantifiers. In: Addison, J., Henkin, L., Tarski, A. (eds.) The Theory of Models, pp. 329–341. North-Holland, Amsterdam (1965) 23. Vardi, M.Y., Wolper, P.: Automata-Theoretic Techniques for Modal Logics of Programs. In: Proceedings of 16th STOC, pp. 446–456 (1984) 24. Vardi, M.Y.: An Automata-Theoretic Approach to Linear Temporal Logic. In: Moller, F., Birtwistle, G. (eds.) Logics for Concurrency. LNCS, vol. 1043, pp. 238– 266. Springer, Heidelberg (1996) 25. Vardi, M.Y.: Model Checking for Database Theoreticians. In: Proceedings of ICDL, vol. 5 (2005)
A Separation between Divergence and Holevo Information for Ensembles Rahul Jain1, , Ashwin Nayak2, , and Yi Su3, 1
2
University of Waterloo [email protected] University of Waterloo & Perimeter Institute [email protected] 3 University of Waterloo [email protected]
Abstract. The notion of divergence information of an ensemble of probability distributions was introduced by Jain, Radhakrishnan, and Sen [1,2] in the context of the “substate theorem”. Since then, divergence has been recognized as a more natural measure of information in several situations in quantum and classical communication. We construct ensembles of probability distributions for which divergence information may be significantly smaller than the more standard Holevo information. As a result, we establish that lower bounds previously shown for Holevo information are weaker than similar ones shown for divergence information.
1
Introduction
In this article, we study the relationship between two different measures of information contained in an ensemble of probability distributions. The first measure, Holevo information, is a standard notion from information theory, and is equivalent to the notion of mutual information between two random variables. Consider jointly distributed random variables XY , with X taking values in a sample space X . Consider the ensemble of distributions E = {(λi , Yi ) : i ∈ X },
School of Computer Science, and Institute for Quantum Computing, University of Waterloo, 200 University Ave. W., Waterloo, ON N2L 3G1, Canada. Research supported in part by ARO/NSA USA. Department of Combinatorics and Optimization, and Institute for Quantum Computing, University of Waterloo, 200 University Ave. W., Waterloo, ON N2L 3G1, Canada. Research supported in part by NSERC Canada, CIFAR, MITACS, QuantumWorks, and an ERA from the Province of Ontario. A.N. is also Associate Member, Perimeter Institute for Theoretical Physics, Waterloo, Canada. Research at Perimeter Institute for Theoretical Physics is supported in part by the Government of Canada through NSERC and by the Province of Ontario through MRI. Department of Pure Mathematics, University of Waterloo, 200 University Ave. W., Waterloo, ON N2L 3G1, Canada. Research supported in part by an NSERC Canada Undergraduate Research Award.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 526–541, 2008. c Springer-Verlag Berlin Heidelberg 2008
A Separation between Divergence and Holevo Information for Ensembles
527
where λi = Pr(X = i), and Yi = Y |(X = i), obtained by conditioning on values assumed by X. The Holevo information of the ensemble is given by χ(E) = I(X : Y ) = Ei∼X S(Yi Y ), where S(··) measures the relative entropy of a random variable (equivalently, distribution) with respect to another. This notion may be extended to ensembles of quantum states (see, e.g., the text [3]), and the term ‘Holevo information’ is derived from the literature in quantum information theory. The second measure, divergence information, was introduced by Jain, Radhakrishnan, and Sen [1,2]. It arises in the study of relative entropy, and its connection with a “substate property”. The observational divergence (or simply divergence) of two classical distributions P, Q on the same finite sample space is maxE P (E) log2 (P (E)/Q(E)), where E ranges over all events. We may view this as a (scaled) measure of the factor by which P may exceed Q for an event of interest. The notion of divergence information is derived from this as D(E) = Ei∼X D(Yi Y ), in analogy with Holevo information. A quantum generalisation of this measure may also be defined [2]. Relative entropy and Holevo (or mutual) information have been studied extensively in communication theory and beyond (see, e.g, [4]) as they arise in a variety of applications. Since the discovery of the substate theorem [1], divergence is being recognized as a more natural measure of information in a growing number of applications [2, Section 1]. The applications include privacy trade-offs in communicatioin protocols for computing relations [5] and bit-string commitment [6], and the communication complexity of remote state preparation [7]. In particular, divergence captures, up to a constant factor, the substate property for probability distributions. It thus becomes relevant in every application where the substate theorem is used. We construct ensembles of probability distributions (equivalently, jointly distributed random variables) for which the Holevo and divergence information are quantitatively different. Theorem 1. For every positive integer N , and real number k ≥ 1 such that N > 2 236k , there is an ensemble E of distributions over a sample space of size N such that D(E) = k and χ(E) = Θ(k log log N ). A more precise statement of this theorem (Theorem 4) and related results may be found in Section 3. The ensembles we construct satisfy the property that the ensemble average (i.e., the distribution of the random variable Y in the description above) is uniform. We show that the above separation is essentially the best possible whenever the ensemble average is uniform (Theorem 6). The result also applies to ensembles of quantum states, where the ensemble average is the completely mixed state (Theorem 7). We leave open the possibility of larger separations for classical or quantum ensembles with non-uniform averages. The difference between the two measures demonstrated by Theorem 1 shows that in certain applications, divergence is quantitatively a more relevant measure of information. In Appendix A, we describe two applications where functionally similar lower bounds have been established in terms of both measures. This
528
R. Jain, A. Nayak, and Y. Su
article shows that the lower bounds in terms of divergence information are, in fact, stronger. In prior work on the subject, Jain et al. [2, Appendix A] compare relative entropy and divergence for classical as well as quantum states. For pairs of distributions P, Q over a sample space of size N , they show that D(P Q) ≤ S(P Q) + 1, and S(P Q) ≤ D(P Q) · (N − 1). This extends to the corresponding measures of information in an ensemble: D(E) ≤ χ(E)+1 and χ(E) ≤ D(E)·(N − 1). They show qualitatively similar relations for ensembles of quantum states. In addition, they construct a pair of distributions P, Q such that S(P Q) = Ω(D(P Q) · N ). However, they do not translate their construction to a similar separation for ensembles of probability distributions. Our work fills this gap for ensembles (of classical or quantum states) with a uniform average.
2
Preliminaries
Here, we summarise our notation and the information-theoretic concepts we encounter in this work. We refer the reader to the text by Cover and Thomas [4] for a deeper treatment of (classical) information theory. While the bulk of this article pertains to classical information theory, as mentioned in Section 1, it is motivated by studies in (and has implications for) quantum information. We refer the reader to the text [3] for an introduction to quantum information. For a positive integer N , let [N ] represent the set {1, . . . , N }. We view probability distributions over [N ] as vectors in RN . The probability assigned by distribution P to a sample point i ∈ [N ] is denoted by pi (i.e., with the same letter in small case). We denote by P ↓ the distribution obtained from P by composing ↓ ↓ ↓ ↓ it with a permutation π on [N ] so that pi = pπ(i) and p1 ≥ p2 ≥ · · · ≥ pN . For an event E ⊆ [N ], let P (E) = i∈E pi denote the probability of that event. We denote the uniform distribution over [N ] by UN . The expected value of a function f : [N ] → R with respect to the distribution P over [N ] is abbreviated as EP f . We appeal to the majorisation relation for some of our arguments. The relation tells us which of two given distributions is “more random”. Definition 1 (Majorisation). Let P, Q be distributions over [N ]. We say that P majorises Q, denoted as P Q, if i j=1
p↓j ≥
i
qj↓ ,
j=1
for all i ∈ [N ]. The following is straightforward. Fact 2. Any probability distribution P on [N ] majorises UN , the uniform distribution over [N ]. Throughout this article, we use ‘log’ to denote the logarithm with base 2, and ‘ln’ to denote the logarithm with base e.
A Separation between Divergence and Holevo Information for Ensembles
529
Definition 2 (Entropy, relative entropy). Let P, Q be probability distribu def tions on [N ]. The entropy of P is defined as H(P ) = − N i=1 pi log pi . The relative entropy between P, Q, denoted S(P Q), is defined as S(P Q)
def
=
N
pi log
i=1
pi . qi
Note that the relative entropy with respect to the uniform distribution is connected to entropy as S(P UN ) = log N − H(P ). We can formalise the connection between majorisation and randomness through the following fact. Fact 3. If P, Q are distributions over [N ] such that P majorises Q, i.e. P Q, then H(P ) ≤ H(Q). The notion of observational divergence was defined by Jain, Radhakrishnan, and Sen [1] in the context of the “substate theorem”. Definition 3 (Observational divergence). Let P, Q be probability distributions on [N ]. Then the observational divergence between them, denoted D(P Q), is defined as EP f def D(P Q) = . max (EP f ) log f :[N ]→[0,1] EQ f Note that we allow the quantity to take the value +∞. Throughout the paper we refer to ‘observational divergence’ as simply ‘divergence’. Divergence D(P Q) is always non-negative, and it is finite precisely when the support of P is contained in the support of Q [1]. Due to convexity, the divergence between two distributions is attained by the characteristic function of an event. Lemma 1. D(P Q)
=
maxE⊆[N ] P (E) log
P (E) Q(E)
.
Proof. Let F denote the (convex) set of functions from [N ] to [0, 1]. The extreme points of F are precisely the characteristic functions of events in [N ]. For an extreme point, say the characteristic function fE of the event E ⊆ [N ], we have EP fE = P (E). If the divergence is +∞, then there is an event for which the right hand side also takes the value +∞. So assume that the divergence is finite. In this case, the right hand side also is finite, as the support of P is contained in the support of Q. By restricting f : [N ] → [0, 1] to characteristic functions of events, we see that D(P Q) is at least the expression on the right hand side above. For the inequality in the other direction, we note that the function ax + b g(x) = (ax + b) log cx + d defined on [0, 1] is convex in x, for any a, b, c, d ∈ R such that ax + b ≥ 0 and cx + d > 0 when x ∈ [0, 1]. Therefore, the function g(x) attains its maximum at either x = 0 or at x = 1.
530
R. Jain, A. Nayak, and Y. Su
The convexity of g(x) implies that for any α ∈ [0, 1], and functions f, f ∈ F, we have EP (αf + (1 − α)f ) EQ (αf + (1 − α)f ) α(EP f − EP f ) + EP f = (α(EP f − EP f ) + EP f ) log α(EQ f − EQ f ) + EQ f EP f EP f , (EP f ) log ≤ max (EP f ) log . EQ f EQ f
(EP (αf + (1 − α)f )) log
Thus, the divergence is attained at an extreme point of F . This proves the claim. Henceforth, we only use the equivalent definition of divergence given by this lemma. The divergence of any distribution with respect to the uniform distribution is bounded. Lemma 2. For any probability distribution P on [N ], we have 0 ≤ D(P UN ) ≤ log N . Proof. Consider the event E which achieves the divergence between P and UN . W.l.o.g., the event E is non-empty. Therefore P (E) ≥ UN (E) ≥ 1/N , and 0
≤
D(P UN )
≤
P (E) log P (E)N
≤
log N.
We observe that we need only maximise over N events to calculate divergence with respect to the uniform distribution. Lemma 3. For any probability distribution P on [N ] such that P ↓ = P , i.e., p1 ≥ p2 ≥ · · · ≥ pN , we have D(P UN )
=
max P ([i]) log i∈[N ]
N · P ([i]) . i
Proof. By definition of observational divergence, the RHS above is bounded by D(P UN ). For the inequality in the other direction, we note that the probability P (E) of any event E with size nE = |E| is bounded by P ([nE ]), the probability of the first nE elements in [N ]. We thus have N · P (E) nE N · P ([nE ]) ≤ max P (E) log E⊆[N ] nE N · P ([nE ]) ≤ max P ([nE ]) log , E⊆[N ] nE
D(P Q) = max P (E) log E⊆[N ]
A Separation between Divergence and Holevo Information for Ensembles
since P majorises UN (Fact 2) and P ([nE ]) ≥ in the statement of the lemma.
nE N .
531
This is equivalent to the RHS
Definition 4 (Ensemble). An ensemble is a sequence of pairs {(λj , Qj ): j∈ [M ]}, for some integer M , where Λ = (λj ) ∈ RM is a probability distribution on [M ] and Qj are probability distributions over the same sample space. Definition 5 (Holevo information). The Holevo information of an ensemble E = {(λj , Qj ) : j ∈ [M ]}, denoted as χ(E), is defined as χ(E)
M
def
=
λj S(Qj Q),
j=1
where Q =
M j=1
λj Qj is the ensemble average.
Definition 6 (Divergence information). The divergence information of an ensemble E = {(λj , Qj ) : j ∈ [M ]}, denoted as D(E) is defined as D(E)
def
=
M
λj D(Qj Q),
j=1
where Q =
3
M j=1
λj Qj is the ensemble average.
Divergence Versus Relative Entropy
In this section, we describe the construction of an ensemble for which there is a large separation between divergence and Holevo information. The ensemble has the property that the ensemble average is uniform. As a by-product of our construction, we also obtain a bound on the maximum possible separation for ensembles with a uniform average. We begin with the construction of the ensemble. Let fL (k, N ) = k(ln log(kN )− ln(6k) + 1) − log(1 + k ln 2) − 1 − ln12 on point in the positive orthant in R2 with N k > 1. Theorem 4. For every integer N > 1, and every positive real number 16 N ≤k < 1 1 log N , there is an ensemble E = ( N , Qi ) : i ∈ [N ] with N i Qi = UN , the uniform distribution over [N ], with D(E) ≤ k, and χ(E)
≥
fL (k, N ).
To construct the ensemble described in the theorem above, we first construct a probability distribution P on [N ] with observational divergence D(P UN ) ≤ k such that its relative entropy S(P UN ) is large as compared with k. Let fU = k(ln log(N k) − ln k + 1) be defined on points in the positive orthant of R2 with kN > 1. Theorem 5. For every integer N > 1, and every positive real number log N , there is a probability distribution P with D(P UN ) = k, and
16 N
≤k<
532
R. Jain, A. Nayak, and Y. Su
fL (k, N )
≤
S(P UN )
≤
fU (k, N ).
The construction of the ensemble is now immediate. Proof of Theorem 4: Let Qj = P ◦πj , where πj is the cyclic permutation of [N ] by j − 1 places. We endow the set of the N cyclic permutations {Qj : j ∈ [N ]} of P with the uniform distribution. By construction, the ensemble average is UN . Since both observational divergence and relative entropy with respect to the uniform distribution are invariant under permutations of the sample space, D(E) = D(P UN ) ≤ k, and χ(E) = S(P UN ) ≥ fL (k, N ). We turn to the construction of the distribution P . Our construction is such that P ↓ = P , i.e., p1 ≥ p2 ≥ · · · ≥ pN . Lemma 3 tells us that we need only ensure that P ([i]) log
N · P ([i]) ≤ k, i
∀ i ∈ [N ],
(1)
to ensure D(P Q) ≤ k. Since S(P UN ) = log N − H(P ), we wish to minimise the entropy of P subject to the constraints in Eq. (1). This is equivalent to successively maximising p1 , p2 , . . ., and motivates the following definitions. Define the function g(y, x) = y log(N y/x) − k on the positive orthant of R2 . Consider the function h : R+ → R+ implicitly defined by the equation g (h(x), x) = 0. Lemma 4. The function h : R+ → R+ is well-defined, strictly increasing, and concave. Proof. Fix an x ∈ R+ , and consider the function gx (y) = g(y, x). This function is continuous on R+ , tends to −k < 0 as y → 0+ , and tends to ∞ as y → ∞. By Intermediate Value Theorem, for some y > 0, we have gx (y) = 0. Moreover, gx (y) < −k for 0 < y ≤ x/N , and is strictly increasing for y > x/N e (its derivative is gx (y) = log eNx y ). Therefore there is a unique y such that gx (y) = 0 and h(x) is well-defined. The function h satisfies the equation h log Nxh = k, and therefore the identity x
=
2 N h exp − k ln . h
Differentiating with respect to h, we see that
k ln 2 dx 2 =N 1+ exp − k ln , and h dh h
N (k ln 2)2 d2 x 2 . = exp − k ln h dh2 h3 So dh dx > 0 for all x > 0, and h is a strictly increasing function. Note also 2 that ddhx2 > 0 for all h > 0, so x is a convex function of h. Since h is an increasing function, convexity of x(h) implies concavity of h(x).
A Separation between Divergence and Holevo Information for Ensembles
533
def
Let v0 = 0. For i ∈ [N ], let vi = h(i), i.e., vi log Nivi = k. Let si = min{1, vi }, for i ∈ [N ]. Let p1 = s1 , and pi = si − si−1 for all 2 ≤ i ≤ N . Lemma 4 guarantees that these numbers are well-defined. We claim that Lemma 5. The vector P = (pi ) ∈ RN defined above is a probability distribution, and P ↓ = P , i.e., p1 ≥ p2 ≥ · · · ≥ pN . Proof. By definition, we have vi > 0 for all i ∈ [N ]. Therefore s1 = min {1, v1 } > 0. Since h(x) is an increasing function in x, the sequence (vi ) is also increasing, so (si ) is non-decreasing. Therefore pi = si − si−1 ≥ 0 for i > 1. Now vN log vN = k > 0. Since x log x ≤ 0 for x ∈ (0, 1], we have vN > 1. So sN = min {1, vN } = 1. Therefore N i=1 pi = sN = 1. So P is a probability distribution on [N ]. Note that (v2 /2) log(N v2 /2) = k/2 < k, so v1 > v2 /2. So s1 ≥ s2 /2, i.e., p1 ≥ p2 . For i ≥ 2, we have pi −pi+1 = (si −si−1 )−(si+1 −si ) = 2si −si−1 −si+1 . Since h(x) is concave, so is the function min {1, h(x)}. Therefore, si ≥ (si−1 + si+1 )/2, and the sequence (pi ) is non-decreasing. The vector S = (si ) ∈ RN thus represents the (cumulative) distribution function corresponding to P . Proof of Theorem 5: We claim that the probability distribution P constructed above satisfies the properties stated in the theorem. Since P ↓ = P , by Lemma 3, we need only verify that si log(N si /i) ≤ k for i ∈ [N ]. If si = vi , then the condition is satisfied with equality. (Note that since k < log N , we have s1 = v1 < 1.) Else, si = 1 < vi , so si log(N si /i) < vi log(N vi /i) = k. We now bound the relative entropy S(P UN ) from below. Let n be the smallest positive integer such that vn−1 ≤ 1 and vn > 1. Note that n > 1. We also have n ≤ N , since vN > 1 (as vN log vN = k > 0). Therefore, we have si = vi (equivalently, N si = i2k/si ) for i ∈ [n − 1], and sn = 1 < vn . Thus, for 1 < i < n, k
k
N pi = i2 si − (i − 1)2 si−1 k
k
k
k
k si
k si
k
= 2 si + (i − 1)(2 si − 2 si−1 ) k
= 2 si + (i − 1)2 si−1 (2 si =2 ≥2
k si
k
= 2 si
−s
−s
k i−1
− 1)
k i−1
+ N si−1 (2 − 1) k k + N si−1 − ln 2 si si−1 N pi k − ln 2. si
The penultimate line follows from the inequality 2x ≥ 1 + x ln 2 for all x ∈ R. Thus we have k
N pi
≥
2 si . 1 + ski ln 2
(2)
534
R. Jain, A. Nayak, and Y. Su k
Since N p1 = N s1 = 2 s1 , this also holds for i = 1. We bound the relative entropy using Eq. (2).
S(P UN ) =
N
pi log N pi
=
i=1 n−1
n
pi log N pi
i=1 k
2 si + pn log N pn ≥ pi log 1 + ski ln 2 i=1 n−1 pi k n−1 k ln 2 ≥ − pi log 1 + + pn log N pn . si si i=1 i=1
(3)
We bound each of the three terms in the RHS of Eq.
separately. (3) n−1 We start with i=1 psiik . Let p = p1 , and let m = p1 . For every j ∈ [m], there is an i ∈ [n], say i = ij , such that jp ≤ sij ≤ (j +1)p. (Otherwise, for some i > 1, the probability pi = si − si−1 is strictly larger than p, an impossibility.)
1/s1 1/s2 1/s3 1/s4 1/s5
s1 p
s2
s3 2p
s4 3p
s5
1/s6
s6
4p
n−1 n−1 i−1 We interpret the sum i=2 psii = i=2 si −s as a Riemann sum approxisi mating the area under the curve 1/x between s1 and sn−1 with the area under the solid lines in Figure 3. This area is bounded from below by the area under the dashed lines, which corresponds to the area of rectangles of uniform width p and height 1/sj+1 for the jth interval. Thus,
A Separation between Divergence and Holevo Information for Ensembles n−1 i=1
535
m 1 pi k ≥k+k p· si sij+1 j=1
≥k+k
m
1 (j + 2)p
p·
j=1
=k+k
m j=1
1 j+2
m+3
1 dx x 3 m+3 = k + k ln . 3 ≥k+k
(4)
We lower bound m = function for y > point q = log2kkN : g1 (q)
=
1 eN ,
1 p
next. Recall that g1 (y) = y log(N y) is an increasing
and p = p1 ≥ 1/N . Consider the value of g1 (y) at the
2N k 2k log log kN log kN
log log kN 2k 1 − log kN
>
≥
since kN ≥ 16. As g1 (q) > g1 (p) > 0, we have q > p. Therefore, m ≥ log kN 2k
k, 1 p
−1 ≥
− 1. Together with Eq. (4), we get n−1 i=1
pi k ≥ k(ln log kN − ln 6k + 1). si
(5)
Next, we derive a lower bound for the second term in Eq. (3). −
n−1 n−1 k ln 2 pi log 1 + pi log(si + k ln 2) + pi log si =− si i=1 i=1 i=1
n−1
≥ − log(1 + k ln 2) +
n−1
pi log si .
(6)
i=1
Viewing the second term above as a Riemann sum, we get n−1
sn−1
pi log si ≥
log x dx 0
i=1
1
≥
log x dx 0
=−
1 . ln 2
(7)
536
R. Jain, A. Nayak, and Y. Su
Combining Eq. (6) and (7), we get n−1 k ln 2 1 − . pi log 1 + ≥ − log(1 + k ln 2) − si ln 2 i=1
(8)
We bound the third term in Eq. (3) crudely as pn log N pn ≥ −1. Along with the bounds for the previous two terms, Eq. (5), (8), this shows that S(P UN )
≥
fL (k, N )
def
k(ln log kN −ln 6k+1)−log(1+k ln 2)−1−
=
1 . ln 2 (9)
This proves the lower bound on the relative entropy. Moving to an upper bound, we have for i ≥ 2, k
k
N pi = i2 si − (i − 1)2 si−1 k
k
k
= 2 si + (i − 1)(2 si − 2 si−1 ) k
≤ 2 si , since the second term is negative. This also holds for i = 1, since p1 = s1 and s1 log N s1 = k. Therefore, S(P UN ) =
n
pi log N pi
i=1 n
kpi si i=1 ≤k+k ≤
1
s1
1 ds s
= k − k ln s1 log N k ≤ k + k ln k = k(1 − ln k + ln(log N k)). In the last inequality, we used the lower bound s1 ≥ k/ log N k.
The upper and lower bounds on the relative entropy of P with respect to the uniform distribution both behave as k log log N k up to constant factors. Proof of Theorem 1: The dominating term in both of lower bound and upper bound on the relative entropy S(P UN ), with P as in Theorem 5, is k ln log N k 2 when N is large as compared with k. Specifically, when N > 236k , we have 1 k log log N k 2
≤
S(P UN )
≤
2k log log N k.
By hypothesis, 1 ≤ k and by Lemma 2, we have k ≤ log N . Thus, S(P UN )
=
Θ(D(P UN ) log log N ).
A Separation between Divergence and Holevo Information for Ensembles
The same holds for the ensembles constructed in Theorem 4.
537
The separation we demonstrated above is the best possible for ensembles of distributions that have a uniform average distribution. Theorem 6. For any positive integer N , and any ensemble E= {(λj , Qj ) : j ∈ [M ]} M of distributions over [N ] such that j=1 λj Qj = UN , we have χ(E) ≤ K(2 ln log N − ln K + 1) + 16, where K = D(E). Proof. Let D(Qj UN ) = kj . We show that S(Qj UN ) ≤ kj (2 ln log N − ln kj + 1) 16 when kj ≥ 16 N . When kj < N , we have S(Qj UN ) < 16. Since k(2 ln log N −ln k+ 1) is a concave function in k, averaging over j with respect to the distribution Λ = (λj ) gives the claimed bound. ↓ Fix an j such that kj > 16 N . Let R = Qj . Note that D(RUN ) = kj and S (RUN ) = S(Qj UN ). Consider the distribution P constructed as in Section 3 with k = kj . Using the notation of that section, we have si log(N si /i) = kj for i def all i < n, and sn = 1. Let ti = l=1 rl , where rl = Pr(R = l). By definition, we have ti log(N ti /i) ≤ kj = si log(N si /i). Since the function gi (y) = y log(N y/i) is strictly increasing for y ≥ i/N e, and ti ≥ i/N (Fact 2), we have ti ≤ si for i < n. Since si = 1 for i ≥ n, we have ti ≤ si for these i as well. In other words, P R. By Fact 3, we have H(P ) ≤ H(R). This is equivalent to S(RUN ) ≤ S(P UN ). By Theorem 5, S(P UN ) ≤ kj (ln log(N kj ) − ln kj + 1). Since kj ≤ log N , this is at most kj (2 ln log N − ln kj + 1). Finally, we observe that this is also the best separation possible for an ensemble of quantum states with a completely mixed ensemble average. Theorem 7. For any positive integer N , and any ensemble E = {(λj , ρj ) : j ∈ [M ]} M of quantum states ρj over a Hilbert space of dimension N such that j=1 λj ρj = NI , the completely mixed state of dimension N , we have χ(E) ≤ K(2 ln log N − ln K + 1) + 16, where K = D(E). Proof. Let Qj be the probability distribution on [N ] corresponding to the eigenvalues of ρj . By definition of observational divergence for quantum states, D (Qj UN ) ≤ D(ρj NI ). Further, we have S(ρj NI ) = S(Qj UN ). We now apply the same reasoning as in the proof of Theorem 6, note that the divergence of the ensemble {(λj , Qj ) : j ∈ [M ]} is bounded by D(E), and that the RHS in the statement is a non-decreasing M function of K. This gives us the stated bound. (Note that we do not need j=1 λj Qj = UN to use the reasoning in Theorem 6.)
538
R. Jain, A. Nayak, and Y. Su
References 1. Jain, R., Radhakrishnan, J., Sen, P.: Privacy and interaction in quantum communication complexity and a theorem about the relative entropy of quantum states. In: Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002, pp. 429–438. IEEE Computer Society Press, Los Alamitos (2002) (A more complete version appears as In: [2]) 2. Jain, R., Radhakrishnan, J., Sen, P.: A theorem about relative entropy of quantum states with an application to privacy in quantum communication. Technical Report arXiv:0705.2437v1, ArXiv.org Preprint Archive (2007), http://www.arxiv.org/ 3. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2000) 4. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley Series in Telecommunications. John Wiley & Sons, New York, NY, USA (1991) 5. Jain, R., Radhakrishnan, J., Sen, P.: Prior entanglement, message compression and privacy in quantum communication. In: Proceedings of the 20th Annual IEEE Conference on Computational Complexity, 2005, pp. 285–296. IEEE Computer Society Press, Los Alamitos (2005) 6. Jain, R.: Stronger impossibility results for quantum string commitment. Technical Report arXiv:quant-ph/0506001v4, ArXiv.org Preprint Archive (2005), http://www.arxiv.org/ 7. Jain, R.: Communication complexity of remote state preparation with entanglement. Quantum Information and Computation 6, 461–464 (2006) 8. Buhrman, H., Christandl, M., Hayden, P., Lo, H.-K., Wehner, S.: Security of quantum bit string commitment depends on the information measure. Physical Review Letters 97 (2006) (Article no. 250501) 9. Mayers, D.: Unconditionally secure quantum bit commitment is impossible. Physical Review Letters 78(17), 3414–3417 (1997) 10. Lo, H.-K., Chau, H.F.: Is quantum bit commitment really possible? Physical Review Letters 78, 3410–3413 (1997) 11. Kent, A.: Quantum bit string commitment (Article no. 237901). Physical Review Letters 90 (2003) (Article no. 237901)
A A.1
Implications for Quantum Protocols Quantum String Commitment
A string commitment scheme is an extension of the well-studied and powerful cryptographic primitive of bit commitment . In such schemes, one party, Alice, wishes to commit an entire string x ∈ {0, 1}n to another party, Bob. The protocol is required to be such that Bob not be able to identify the string until it is revealed by Alice. In turn, Alice should not be able to renege on her commitment at the time of revelation. Formally, quantum string commitment protocols are defined as follows [8,6]. Definition 7 (Quantum string commitment (QSC)). Let P = {px : x ∈ {0, 1}n} be a probability distribution and let B be a measure of information contained in an ensemble of quantum states. A (n, a, b)-B-QSC protocol for P is a quantum communication protocol between two parties, Alice and Bob.
A Separation between Divergence and Holevo Information for Ensembles
539
Alice gets an input x ∈ {0, 1}n chosen according to the distribution P . The starting joint state of the qubits of Alice and Bob is some pure state independent of x. The protocol runs in two phases: the commit phase, followed by the reveal phase. There are no intermediate measurements during the protocol. At the end of the reveal phase, Bob measures his qubits according to a POVM {My : y ∈ {0, 1}n} ∪ {I − y My } to determine the value of the committed string by Alice or to detect cheating. The protocol satisfies the following properties. 1. (Correctness) Suppose Alice and Bob act honestly. Let ρx be the state of Bob’s qubits at the end of the reveal phase of the protocol, when Alice gets input x. Then (∀x, y) Tr My ρx = 1 iff x = y, and 0 otherwise. 2. (Concealing property) Suppose Alice acts honestly, and Bob possibly cheats, i.e., deviates from the protocol in his local operations. Let σx be the state of Bob’s qubits after the commit phase when Alice gets input x. Then the B information B(E) of the ensemble E = {px , σx } is at most b. In particular, this also holds when both Alice and Bob follow the protocol honestly. 3. (Binding property) Suppose Bob acts honestly , and Alice possibly cheats. Let c ∈ {0, 1}n be a string in a special cheating register C with Alice that she keeps independent of the rest of the registers till the end of the commit phase. Let τc be the state of Bob’s qubits at the end of the reveal phase when def Alice has c in the cheating register. Let qc = Tr Mc τc . Then pc qc ≤ 2a−n c∈{0,1}n
The idea behind the above definition is as follows. At the end of the reveal phase of an honest run of the protocol Bob identifies x from ρx by performing the POVM measurement {My }y ∪ {I − y My }. He accepts the committed string to be x iff the observed outcome y = x; this happens with probability Tr Mx ρx . He declares that Alice is cheating if outcome I − x Mx is observed. Thus, at the end of an honest run of the protocol, with probability 1, Bob accepts the committed string as being exactly Alice’s input string. The concealing property ensures that the amount of B information about x that a possibly cheating Bob gets is bounded by b. In bit -commitment protocols, the concealing property is quantified in terms of the probability with which Bob can guess Alice’s bit. Here we instead use different notions of information contained in the corresponding ensemble. The binding property ensures that when a cheating Alice wishes to postpone committing to a string string until after the commit phase, then she succeeds in forcing an honest Bob to accept her choice with bounded probability (in expectation). Strong string commitment, in which both parameters a, b above are required to be 0, is impossible for the same reason that of strong bit-commitment protocols are impossible [9,10]. Weaker versions are nonetheless possible, and exhibit a trade-off between the concealing and binding properties. The trade-off between the parameters a and b has been studied by several researchers [11,8,6]. Buhrman, Christandl, Hayden, Lo, and Wehner [8] study this trade-off both in the scenario
540
R. Jain, A. Nayak, and Y. Su
of a single execution of the protocol and also in the asymptotic regime, with an unbounded number of parallel executions of the protocol. In the asymptotic scenario, they show the following result in terms of Holevo information (which is denoted by χ). Theorem 8 ([8]). Let Π be an (n, a1 , b)-χ-QSC scheme. Let Πm represent m independent, parallel executions of Π (so Π1 = Π). Let am represent the binding def parameter of Πm and let a = limm→∞ am /m. Then, a + b ≥ n. Jain [6] shows a similar trade-off result regarding QSCs, in terms of the divergence information of an ensemble (denoted by D). Theorem 9 ([6]). For single execution of the protocol of an (n, a, b)-D-QSC scheme, √ a + b + 8 b + 1 + 16 ≥ n. As mentioned before, for any ensemble E, divergence information is bounded by the Holevo χ-information D(E) ≤ χ(E) + 1. This immediately implies: Theorem 10 ([6]). For single execution of the protocol of a (n, a, b)-χ-QSC scheme √ a + b + 8 b + 2 + 17 ≥ n. As Jain shows, this implies the asymptotic result due to Buhrman et al. (Theorem 8). The separation that we demonstrate between divergence and Holevo information (Theorem 1) shows that for some ensembles over n qubits, D(E) may be a log n larger than χ(E). For such ensembles the binding-concealing trade-off of Theorem 9 is stronger than that of Theorem 8. A.2
Privacy Trade-Off for Two-Party Protocols for Relations
Let us consider two-party protocols between Alice and Bob for computing a relation f ⊆ X × Y × Z. The goal here is to find a z ∈ Z such that (x, y, z) ∈ f , when Alice and Bob are given x ∈ X and y ∈ Y, respectively. Jain, Radhakrishnan, and Sen [1] studied to what extent the two parties may solve f while keeping their respective inputs hidden from the other party. They showed the following: Result 11 ([5], informal statement). Let μ be a product distribution on X × Y. Let (f ) represent the one-way distributional complexity of f with a single Qμ,A→B 1/3 communication from Alice to Bob and distributional error under μ at most 1/3. Let X and Y represent the random variables corresponding to Alice and Bob’s inputs respectively. If there is a quantum communication protocol for f where Bob leaks divergence information at most b about his input Y , then Alice leaks (f )/2O(b) ) about her input X. A simdivergence information at least Ω(Qμ,A→B 1/3 ilar statement also holds with the roles of Alice and Bob interchanged.
A Separation between Divergence and Holevo Information for Ensembles
541
From the upper bound on the divergence information in terms of Holevo information this immediately implies the following. Result 12 ([5], informal statement). Let μ be a product distribution on X × Y. Let Qμ,A→B (f ) represent the one-way distributional complexity of f with 1/3 a single communication from Alice to Bob and distributional error under μ at most 1/3. Let X and Y represent the random variables corresponding to Alice and Bob’s inputs respectively. If there is a quantum communication protocol for f where Bob leaks Holevo information at most b about his input Y , then Alice (f )/2O(b) ) about her input X. A leaks Holevo information at least Ω(Qμ,A→B 1/3 similar statement also holds with the roles of Alice and Bob interchanged. It follows from Theorem 1 that Result 11 is much stronger than the second, Result 12 in case the ensembles arising in the protocol between Alice and Bob has divergence information much smaller than its Holevo information.
Unary Automatic Graphs: An Algorithmic Perspective Bakhadyr Khoussainov1, Jiamou Liu1 , and Mia Minnes2 1
Department of Computer Science University of Auckland, New Zealand 2 Department of Mathematics Cornell University, USA
Abstract. This paper studies infinite graphs produced from a natural unfolding operation applied to finite graphs. Graphs produced via such operations are of finite degree and can be described by finite automata over the unary alphabet. We investigate algorithmic properties of such unfolded graphs given their finite presentations. In particular, we ask whether a given node belongs to an infinite component, whether two given nodes in the graph are reachable from one another, and whether the graph is connected. We give polynomial time algorithms for each of these questions. Hence, we improve on previous work, in which nonelementary or non-uniform algorithms were found.
1
Introduction
The underlying idea of automatic structures consists of using automata to represent structures and then to study the logical and algorithmic consequences of such presentations. For example, there are descriptions of automatic linear orders and trees in model theoretic terms such as Cantor-Bendixson ranks [13], [10]. Thomas and Oliver gave a full description of finitely generated automatic groups [12]. Khoussainov, Nies, Rubin and Stephan have characterized the isomorphism types of automatic Boolean algebras [8]. These results give the decidability of the isomorphism problems for automatic ordinals and Boolean algebras [13]. The complexity of the first-order theories of automatic structures has also been studied. Gr¨ adel and Blumensath constructed examples of automatic structures whose first-order theories are non-elementary [2]. Lohrey, on the other hand, proved that the first-order theory of any automatic graph of bounded degree is elementary [11]. This paper continues this line of research and investigates computational properties of unary automatic graphs of finite degree. We use a fundamental algorithmic property of automatic structures proved by Khoussainov and Nerode: the first-order theory of any automatic graph is decidable [7]. In particular, for a fixed first-order formula φ(¯ x) and an automatic graph G, determining if a tuple a ¯ from G satisfies φ(¯ x) can be done in linear time. Refining this, we find polynomial time algorithms for natural graph theoretic questions in the class of unary automatic graphs of finite degrees. Since all such graphs can be obtained by an unfolding operation applied to finite graphs (see Theorem 2), M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 542–553, 2008. c Springer-Verlag Berlin Heidelberg 2008
Unary Automatic Graphs: An Algorithmic Perspective
543
we measure complexity based on the input size of the finite graphs. Specifically, we are interested in the following decision problems for the graph G determined by the pair of finite graphs (D, F ): • • • •
Connectivity Problem. Is the graph G connected? Reachability Problem. Given vertices x, y, is there a path from x to y? Infinite Component Problem. Does G have an infinite component? Infinity Testing Problem. Given a vertex x, is it in an infinite component?
For finite graphs, the first two problems can be solved in linear time and the last two have obvious answers. However, for infinite graphs, much more work is needed to investigate these problems. In the class of all automatic graphs, all of these problems are undecidable (see [13]). Since all unary automatic graphs are first-order definable in S1S (the monadic second-order logic of the successor function), it is not hard to prove that all the problems above are decidable ([1], [13]). However, the constructions which appeal to S1S yield algorithms with non-elementary time complexity, since one needs to transform S1S formulas into automata ([4]). The reachability problem has been studied in [3], [5], and [14] via pushdown graphs. A pushdown graph is the configuration space of a pushdown automaton. Unary automatic graphs are examples of pushdown graphs [14]. In [3], [5], [14] it is proved that for a given node v in a pushdown graph, there is an automaton that recognizes all nodes reachable from v. The size of this automaton depends on the input node v. Moreover, the automata constructed by this algorithm are not uniform (different automata are built for different vertices v). It is therefore interesting to see for which classes of graphs the reachability problem has a uniform solution (an automaton that tells whether any two nodes belong to the same component). The practical advantage of a uniform solution is that, once the automaton that recognizes reachability relation is built, deciding whether node v is reachable from u by a path takes only linear time In this paper, we show that for unary automatic graphs of finite degree, all the problems above can be solved in polynomial time. Moreover, the reachability problem has a uniform solution. We now outline the rest of the paper. Section 2 introduces the main definitions needed and recalls a characterization theorem (Theorem 1) for unary automatic graphs. Section 3 introduces unary automatic graphs of finite degree; Theorem 2 explicitly provides a method for building these graphs and is used throughout the paper. Section 4 and Section 5 solve the infinite component problem and infinity testing problem, respectively. For easy reference, we list the main results below. G is a given unary automatic graph of finite degree, A is the unary automaton recognizing G, and n is the number of states of A. 3
Theorem 3. The infinite component problem for G is solved in O(n 2 ). 5
Theorem 4. The infinity testing problem for G is solved in O(n 2 ). When A is fixed, a constant time algorithm decides the infinity testing problem on G.
544
B. Khoussainov, J. Liu, and M. Minnes
Section 6 gives a polynomial time algorithm constructing uniform automata that solve the reachability problem. This algorithm also yields a solution to the connectivity problem for unary automatic graphs of finite degree. Theorem 5. A polynomial time algorithm solves the reachability problem on 5 G. For inputs u, v, the running time of the algorithm is O(|u| + |v| + n 2 ). Theorem 6. The connectivity problem for G is solved in O(n3 ).
2
Preliminaries
A finite automaton A over Σ is a tuple (Q, ι, Δ, F ), where Q is a finite set of states, ι ∈ Q is the initial state, Δ ⊂ Q × Σ × Q is the transition table, and F ⊂ Q is the set of final states. A run of A on a word σ1 . . . σn ∈ Σ is a sequence q0 , . . . , qn such that q0 = ι and (qi , σi+1 , qi+1 ) ∈ Δ for all i ≤ n − 1. If qn ∈ F then the run is successful and we say that the automaton A accepts the word. The language accepted by the automaton A is the set of all words accepted by A. A set D ⊂ Σ is FA recognizable if D is the language accepted by some finite automaton. For two states q0 , q1 , the distance from q0 to q1 is the minimum number of transitions required for A to go from q0 to q1 . If |Σ| = 1, we call A a unary automaton. A 2-tape automaton is a oneway Turing machine with two semi-infinite input tapes. Each tape has written on it a word from Σ followed by a succession of symbols. The automaton starts in the initial state, reads simultaneously the first symbol of each tape, changes state, reads simultaneously the second symbol of each tape, changes state, etc., until it reads on each tape. The automaton then stops and accepts the 2-tuple of words on its input tapes if it is in a final state. Formally, set Σ = Σ ∪ {} where ∈ Σ. The convolution of a tuple (w1 , w2 ) ∈ Σ 2 is the string w1 ⊗ w2 of length maxi |wi | over the alphabet (Σ )2 which is defined as follows: the k th symbol is (σ1 , σ2 ) where σi is the k th symbol of wi if k ≤ |wi |, and is otherwise. The convolution of a relation E ⊂ Σ 2 is the language ⊗E = {w1 ⊗ w2 | (w1 , w2 ) ∈ E}. The relation E ⊂ Σ 2 is FA recognizable if ⊗E is recognizable by a 2-tape automaton. A graph G = (V, E) is automatic over Σ if its vertex set V ⊂ Σ and the edge relation E are FA recognizable. The binary tree ({0, 1}, E), where E = {(x, y) | y = x0 or y = x1}, is an automatic graph. We are interested in the following class of automatic graphs: Definition 1. A unary automatic graph is a graph (V, E) whose domain is a regular subset of {1} and whose edge relation E is regular. Convention. To eliminate bulky exposition, we fix the following assumptions: 1) By “automatic graph”, we always mean “unary automatic graph”. 2) All graphs are infinite unless explicitly specified otherwise. 3) The domains of automatic graphs coincide with the set 1 of all unary strings {λ, 1, 11, 111, . . .}. Hence, the automaton recognizing the edge relation is sufficient for describing the graph. 4) The graphs are undirected. All the notions and results below can be adapted
Unary Automatic Graphs: An Algorithmic Perspective
545
to the case when the domains are regular subsets of 1 and when the graphs are directed without materially changing the complexity of the algorithms. Let G = (V, E) be an automatic graph. Let A be a unary automaton recognizing E with n states. The general shape of A is given in Figure 1. All the states reachable from the initial state by reading input (1, 1) are called (1, 1)-states. A tail in A is a sequence of states linked by transitions without repetition. A loop is a sequence of states linked by transitions such that the last state coincides with the first one, and with no repetition in the middle. The set of (1, 1)-states is a disjoint union of a tail and a loop, called the (1, 1)-tail and the (1, 1)-loop. Let q be a (1, 1)-state. All the states reachable from q by reading inputs (1, ) are called (1, )-states. This collection of (1, )-states is also a disjoint union of a tail and a loop (see the figure), called the (1, )-tail and the (1, )-loop. The (, 1)-tails and (, 1)-loops are defined in a similar way. Since we consider undirected graphs, we simplify the general shape of the automaton by only considering edges labelled by (, 1) and (1, 1). An automaton is standard if the lengths of all its loops and tails equal some number p, called the loop constant. (, 1)-loop
(1, )-loop (, 1)-tail
(1, )-tail (1, 1)-tail
(1, 1)-loop
Fig. 1. A Typical Unary Graph Automaton
We recall a characterization theorem of unary automatic graphs from [13]. Let D = (D, ED ) and F = (F, EF ) be finite graphs. Let R1 , R2 be subsets of D × F , and R3 , R4 be subsets of F × F . Consider the graph D followed by ω many copies of F , ordered as F 0 , F 1 , F 2 , . . .. Formally, the vertex set of F i is F × {i} and we write f i = (f, i) for f ∈ F and i ∈ ω. The edge set E i of F i consists of all pairs (ai , bi ) such that (a, b) ∈ EF . We define the infinite graph, ¯ as follows: the vertex set is D ∪ F 0 ∪ F 1 ∪ F 2 ∪ . . .; the edge unwind(D, F , R), set contains ED ∪ E 0 ∪ E 1 ∪ . . . as well as the following edges, for all a, b ∈ F , d ∈ D, and i, j ∈ ω: – (d, b0 ) when (d, b) ∈ R1 , and (d, bi+1 ) when (d, b) ∈ R2 , – (ai , bi+1 ) when (a, b) ∈ R3 , and (ai , bi+2+j ) when (a, b) ∈ R4 . Theorem 1. [9] A graph G has a unary automaton presentation if and only if it ¯ for some parameters D, F , and R. ¯ Moreover, is isomorphic to unwind(D, F , R) ¯ can be if A is a standard automaton representing G then the parameters D, F , R 2 2n extracted in O(n ); otherwise, the parameters can be extracted in O(n ), where n is the number of states in A.
546
3
B. Khoussainov, J. Liu, and M. Minnes
Unary Automatic Graphs of Finite Degree
A graph is of finite degree if there are finitely many edges connected to each vertex v. A unary automaton A recognizing a binary relation is a one-loop automaton if its transition diagram contains exactly one loop, the (1, 1)-loop. The following is an easy proposition: Proposition 1. Let G = (V, E) be a unary automatic graph, then G is of finite degree if and only if there is a one-loop unary automaton A recognizing E.
Each unary automaton has an equivalent standard unary automaton. In general, the standard automaton may have exponentially more states. However, if A is a one-loop automaton with n states, the (1, 1)-loop of the equivalent standard one-loop automaton has at most n states, so the automaton itself has at most 4n2 states. Below, we assume the input automaton A is standard. Let p be the loop constant of A, then A has exactly 4p2 states. In the following, we state all results in terms of p rather than n, the number of states of the input automaton. Definition 2 (Unfolding Operation). Let D = (VD , ED ) and F = (VF , EF ) be finite graphs. The finite sets ΣD,F , ΣF contain all mappings η : VD → P (VF ) and σ : VF → P (VF ) (respectively). The sequence α = ησ0 σ1 . . . where η ∈ ΣD,F and σi ∈ ΣF for each i yields the infinite graph Gα = (Vα , Eα ) as follows: • Vα = VD ∪ {(v, i) | v ∈ VF , i ∈ ω}. • Eα = ED ∪ {(d, (v, 0)) | v ∈ η(d)} ∪ {((v, i), (v , i)) | (v, v ) ∈ EF , i ∈ ω} ∪ {((v, i), (v , i + 1)) | v ∈ σi (v), i ∈ ω}. Figure 2 illustrates the general shape of a unary automatic graph of finite degree built from D, F , η, and σ ω (σ ω is the infinite word σσσ · · · ). We use Definition 2 to recast Theorem 1 for graphs of finite degree. The proof is omitted.
Fig. 2. Unary automatic graph of finite degree Gησ ω
Theorem 2. A graph of finite degree G = (V, E) possesses a unary automatic presentation if and only if there exist finite graphs D, F and mappings η : VD → P (VF ) and σ : VF → P (VF ) such that G is isomorphic to Gησω .
If G is a unary automatic graph of finite degree, the parameters D, F , σ and η can be extracted in O(p2 ) time, where p is the loop constant of the one-loop automaton representing the graph. Furthermore, |VF | = |VD | = p.
Unary Automatic Graphs: An Algorithmic Perspective
4
547
Deciding the Infinite Component Problem
A component of a graph is the transitive closure of a vertex under the edge relation. The infinite component problem asks whether a graph G has an infinite component. Theorem 3. The infinite component problem for unary automatic graphs of finite degree G is solved in O(p3 ), where p is the loop constant of the unary automaton recognizing G. By Theorem 2, it suffices to consider the case when G = Gσω since Gησω has an infinite component if and only if Gσω has one. Let F i be the ith copy of F in G and xi be the copy of vertex x in F i . The finite directed graph F σ = (V σ , E σ ) is defined as follows. Nodes in V σ are the distinct connected components of F . For simplicity, we assume that |V σ | = |VF | and use x to denote its own component in F . The case in which |V σ | < |VF | is similar. For x, y ∈ VF , put (x, y) ∈ E σ if and only if y ∈ σ(x ) for some x and y that are in the same component as x and y, respectively. Constructing F σ requires finding connected components of F and hence takes time O(p2 ). To prove Theorem 3, we make essential use of the following definition which is taken from [6]. Definition 3. An oriented walk in a directed graph G is a subgraph P of G that consists of a sequence of nodes v0 , . . . , vk such that for 1 ≤ i ≤ k, either (vi−1 , vi ) or (vi , vi−1 ) is an arc in G, and for each 1 ≤ i ≤ k, exactly one of (vi−1 , vi ) and (vi , vi−1 ) belongs to P. An oriented walk is an oriented cycle if v0 = vk and there are no repeated nodes in v1 , . . . , vk . In an oriented walk P, an arc (vi , vi+1 ) is called a forward arc and (vi+1 , vi ) is called a backward arc. The net length of P, denoted disp(P), is the difference between the number of forward arcs and backward arcs. Note that the net length can be negative. Given an oriented walk P = v0 , . . . , vm , we define the low point of P as min{disp(v0 . . . v ) | 0 ≤ ≤ m}. The low point of the oriented walk P is at most min{0, disp(P)}, and hence is not positive. The next lemma establishes a connection between oriented walks in F σ and paths in G. Lemma 1. Let P be an oriented walk from x to y whose net length is d and low point is − . For every i ≥ , the oriented walk P defines a path P i in G from xi to y i+d . Moreover, the smallest j such that P i ∩ F j = ∅ is equal to i − .
Lemma 2. There is an infinite component in G if and only if there is an oriented cycle in F σ with positive net length. Proof. We prove one direction; the other is left to the reader. Suppose there is an infinite component D in G. Since F is finite, there must be some x in VF such that there are infinitely many copies of x in D. Let xi and xj be two copies of x in D with i < j. Consider a path between xi and xj . We can assume that on this path there is at most one copy of any vertex y ∈ VF apart from x (otherwise, there is another vertex in VF having an infinite number of copies in the infinite component with these properties). By definition of Gσω and F σ , the node x must be on an oriented cycle of F σ with net length j − i.
548
B. Khoussainov, J. Liu, and M. Minnes
Proof (Theorem 3). By Lemma 2, it suffices to decide if F σ contains an oriented cycle with positive net length. Such an oriented cycle exists if and only if there is an oriented cycle with negative net length. Therefore, the following algorithm searches for oriented cycles with non-zero net length. ALG:Oriented-Cycle 1. Pick the first node x ∈ F σ for which a queue has not been built. Initialize the queue Qx to be empty. Let d(x) = 0, and put x into Qx marked as unprocessed. If there is no such x ∈ F σ , stop the process and return NO. 2. Define y to be the first unprocessed node in the queue Qx . If there are no unprocessed nodes in Qx , return to (1). 3. For each node z in the set {z | (y, z) ∈ E σ or (z, y) ∈ E σ }, do the following: (a) If (y, z) ∈ E σ , set d (z) = d(y) + 1; if (z, y) ∈ E σ , set d (z) = d(y) − 1. (If both hold, do steps (a), (b), (c) first for (z, y) and then for (y, z).) (b) If z ∈ / Qx , set d(z) = d (z), put z into Qx , and mark z as unprocessed. (c) If z ∈ Qx then if d(z) = d (z), move to next z; if d(z) = d (z), stop the process and return YES. 4. Mark y as processed and go back to (2). We claim that the algorithm returns YES if and only if there is an oriented cycle in F σ with non-zero net length. Suppose the algorithm returns YES. Then, there is a base node x and a node z such that d(z) = d (z). Thus, there is an oriented walk P from x to z with net length d(z) and there is an oriented walk P from x to z with net length d (z). Let (P )− be the oriented walk P in reverse direction. Consider the oriented walk P(P )− : it is an oriented walk from x to x with net length d(z) − d (z) = 0. If there are no repeated nodes in P(P )− , it is the required oriented cycle. Otherwise, let y be a repeated node in P(P )− such that no nodes between the two occurrences of y are repeated. Consider the oriented walk between these two occurrences of y; if it has a non-zero net length it is our required oriented cycle and otherwise we can make the oriented walk P(P )− shorter without altering its net length. Conversely, suppose there is an oriented cycle P = x0 , . . . , xm of non-zero net length where x0 = xm . We assume for a contradiction that the algorithm returns NO. Consider how the algorithm acts when we pick x0 at step (1). For each 0 ≤ i ≤ m, the following statements hold (by induction on i). ( ) xi gets a label d(xi ) ( ) d(xi ) equals the net length of the oriented walk from x0 to xi in P. These statements suffice to yield a contradiction, and hence prove the correctness of Oriented-Cycle. Putting these pieces together, the following algorithm solves the infinite component problem. Suppose we are given a unary automaton (with loop constant p) which recognizes the unary automatic graph of finite degree G. Recall that p = |VF |. We compute F σ in time O(p2 ). Then we run Oriented-Cycle to decide if F σ contains an oriented cycle with positive net length. For each node x in F σ , the run time is O(p2 ). Since F σ contains p nodes, this takes time O(p3 ).
Unary Automatic Graphs: An Algorithmic Perspective
5
549
Deciding the Infinity Testing Problem
The infinity testing problem asks for an algorithm that, given a vertex v and graph G, decides if the vertex belongs to an infinite component of the graph G. Theorem 4. The infinity testing problem for G, a unary automatic graph of finite degree with loop constant p, is solved in O(p5 ). When A is fixed, there is a constant time algorithm that decides the infinity testing problem on G. To prove Theorem 4 we outline several lemmas, the more difficult of which we prove. The set C is defined as all nodes x in F σ for which there exists an oriented cycle from x with positive net length and low point 0. We call an oriented walk simple if it contains no repeated nodes. For any k ≥ 0, let C[k] be the set of all nodes x ∈ / C[0] ∪ . . . ∪ C[k − 1] that can reach C via a simple oriented walk with low point −k. Note that C ⊆ C[0]. Moreover, since |F σ | ≤ p, a simple oriented walk may have at most p steps and hence C[k] = ∅ for k > p − 1. Lemma 3. Let x ∈ VF . If xi belongs to an infinite component of G then for all j > 0, xi+j also belongs to an infinite component of G.
Lemma 4. If x ∈ C, then xi is in an infinite component for all i ∈ ω.
Lemma 5. For each vertex xi , xi belongs to an infinite component in G if and only if node x ∈ C[k] for some 0 ≤ k ≤ min{i, p − 1}. Proof. We prove the harder direction: if xi is in an infinite component, there is an oriented walk from x to C with low point −k, where 0 ≤ k ≤ min{i, p−1}. Let D be the infinite component of xi . Since F is finite, there must be y in VF such that D contains infinitely many copies of y. Let y s and y t be two copies of y in D with s < t. Take a path P in G between y s and y t such that P contains no more than one copy of each vertex in VF apart from y. (If there is no such path P , choose another vertex y in VF with these properties). Let be the least number such that P ∩ F = ∅. Let z be a vertex in P . Then P is divided into two paths P1 and P2 , where P1 goes from y s to z and P2 goes from z to y t . Hence there is a path P3 from y t to z +t−s . By joining P2 and P3 together we obtain a path between z and z +t−s . We have defined an oriented cycle in F σ with positive net length and low point 0. Hence, z ∈ C. Take a path in G between xi and a copy of z in D containing no more than one copy of each vertex in F . This is an
oriented walk in F σ from x to z with low point not more than min{i, p − 1}. Lemma 6. If G is a unary automatic graph of finite degree presented by A with loop constant p, the set C for G can be computed in time O(p4 ). Proof. For each x ∈ F σ , do a breadth-first search through F σ for oriented walks starting at x. To compute the path P, put (y, d) in a queue, where y is the incremental destination of P and d is its net length. We keep track of the following properties of the pair (y, d): 1. level(y, d) is the length of the oriented walk P from x to y; and
550
B. Khoussainov, J. Liu, and M. Minnes
2. path(y, d) is a tuple of pairs (x0 , d0 ) . . . (xlevel(y,d) , dlevel(y,d) ) coding the initial segment of P. Note: (x0 , d0 ) = (x, 0) and (xlevel(y,d) , dlevel(y,d) ) = (y, d). Given input x ∈ F σ , the following algorithm checks membership in C. ALG:C-Membership 1. Put (x, 0) into the (initially empty) queue Q. Mark (x, 0) as unprocessed and set level(x, 0) = 0, path(x, 0) = (x, 0). 2. If no unprocessed pair is left in the queue, stop and output NO. Otherwise, take the first unprocessed (y, d) in Q. 3. If level(y, d) ≥ p, stop and output NO. 4. For arcs e of the form (y, z) or (z, y) in E σ do the following: (a) If e = (y, z), set j = d + 1; if e = (z, y), set j = d − 1. (b) If z = x and j > 0, stop the process and return YES. / Q, then (c) If (z, d ) is not in path(y, d) for any d , and if j ≥ 0 and (z, j) ∈ put (z, j) into Q, mark (z, j) as unprocessed, set level(z, j) = level(y, d) + 1, set path(z, j) = path(y, d) · (z, j). 5. Mark (y, d) as processed and go back to (2). We claim that C-Membership on input x returns YES if and only if x ∈ C. Suppose the algorithm returns YES. Then there is a simple oriented walk P from x to x with positive net length. Let P be x0 , . . . , xm such that x0 = xm = x. The algorithm ensures that the net length of the (sub)oriented walk in P from x0 to each xi is non-negative. Thus, the low point of P is no less than 0 and x ∈ C. For the other direction, suppose P = x0 , . . . , xm is an oriented cycle of positive net length and zero low point. Assume the algorithm does not return YES. Run the algorithm from x0 . For all xi , the following statements hold by induction. ( ) There exists di ≥ 0 such that (xi , di ) ∈ Q. ( ) di equals the net length of the oriented walk from x0 to xi in P. Note that level(y, d) ≤ p for all (y, d) ∈ Q. Moreover, every time the level is incremented by 1 the net length either goes up or down by 1, hence d must also be no more than p. Thus, the cardinality of Q is bounded above by p2 and so for each x ∈ F σ , the algorithm takes time O(p3 ). To compute C, we need to run C-Membership on every x in F σ , taking time O(p4 ).
Using Lemma 6, we iteratively compute C[k] for any 0 ≤ k ≤ p − 1 as follows. First, compute the set C in time O(p4 ). For each x ∈ / C[0] ∪ · · · ∪ C[k − 1] we run operations similar to the ones described above, except that at step (4)(b), the process stops and returns YES whenever z ∈ C and j ≥ −k, and at step (4)(c), the process puts a pair (z, j) into the queue Q if j ≥ −k and (z, j) ∈ / Q. The proof of correctness is like that of C-Membership. This algorithm runs in O(p4 ). Proof (Theorem 4). We assume the input vertex xi is given by tuple (x, i). By Lemma 5, to check if xi is in an infinite component, the algorithm needs to compute C[0], . . . , C[min{i, p − 1}]. As a consequence of Lemma 6, this takes time O(p5 ). The algorithm then checks whether x ∈ C[k] for some 0 ≤ k ≤ min{i, p − 1}. Once the sets C[0], . . . , C[p − 1] are found, checking whether xi belongs to an infinite component takes constant time.
Unary Automatic Graphs: An Algorithmic Perspective
6
551
Deciding the Reachability and Connectivity Problems
The reachability problem asks whether two given vertices u and v in a unary automatic graph of finite degree belong to the same component. Theorem 5. Suppose G is a unary automatic graph of finite degree represented by unary automaton A of loop constant p. A polynomial time algorithm solves the reachability problem on G. For inputs xi , y j , the algorithm runs in O(i + j + p5 ). We restrict to the case G = Gσω (the general case requires few changes). The infinity testing algorithm checks if xi is in a finite component in O(p5 ) time, and leads to two possible cases. First, suppose that xi is in a finite component. Lemma 7. If xi is in a finite component, then xi and y j are in the same component only if i − p < j < i + p.
To check if xi and y j are in the same component, we run a breadth first search in G starting from xi visiting all vertices in F i−i , . . . , F i+p (i = min{p, i}) . ALG: FiniteReach 1. Put (x, 0) into the (initially empty) queue Q, marked as unprocessed. 2. If there are no unprocessed pairs in Q, stop the process. Otherwise, let (y, d) be the first unprocessed pair. For arcs e of the form (y, z) or (z, y) in E σ : (a) If e = (y, z), let d = d + 1; if e = (z, y), let d = d − 1. (b) If −i ≤ d ≤ p and (z, d ) ∈ / Q, put (z, d ) into Q marked as unprocessed. 3. Mark (y, d) as processed, and go to (2). Then, xi and y j are in the same (finite) component if and only if after running FiniteReach on the input xi , the pair (y, j − i) is in Q. The running time is bounded by the number of edges in G restricted to F 0 , . . . , F 2p , hence is O(p3 ). Corollary 1. If all components of G are finite and if we represent (xi , y j ) by
(xi , y j , j − i), an O(p3 )-algorithm checks reachability for xi and y j . On the other hand, suppose that xi is in an infinite component. We begin with an algorithm that computes all vertices y ∈ VF whose ith copy lies in the same component as xi . The algorithm is identical to FiniteReach, except that Line (2b) in FiniteReach is changed to the following: (2b’) If |d | ≤ p and (z, d ) ∈ Q, then put (z, d ) into Q and mark (z, d ) as unprocessed. We use this modified algorithm to define the set Reach(x) = {y | (y, 0) ∈ Q}. Intuitively, we can think of the algorithm as a breadth first search through F 0 ∪ · · · ∪ F 2p which originates at xp . Therefore, y ∈ Reach(x) if and only if there exists a path from xp to y p in G, restricted to F 0 ∪ · · · ∪ F 2p . Lemma 8. If xi , y i are both in infinite components, they are in the same component iff y ∈ Reach(x). Proof. Assume xi , y i are in infinite components. Suppose y ∈ Reach(x). There is a path P in G from xp to y p . Let be least such that F ∩ P = ∅. If i ≥ p − ,
552
B. Khoussainov, J. Liu, and M. Minnes
then xi and y i are in the same component. Thus, suppose i < p − . Let z be such that z ∈ P . Then P is P1 P2 where P1 is a path from xp to z and P2 is a path from z to y p . By Lemma 3, since xi is in an infinite component, so is xp . There is r > 0 such that the set {xp+rm | m ∈ ω} is contained in a single component. Likewise, there is an r > 0 such that {y p+r m | m ∈ ω} is in one component. Consider xp+rr and y p+rr . There is a path P1 P2 from xp+rr to y p+rr . A second path P from xp to y p goes from xp to xp+rr , then along P1 P2 from xp+rr to y p+rr , and finally to y p . The least such that F ∩ P = ∅ is larger than . Iteratively lengthening the path between xp and y p until i < p − brings us to the previous case. To prove the implication in the other direction, we assume that xi and y i are in the same infinite component. We want to prove that y ∈ Reach(x). Let i = min{p, i}. Let P be a path in G from xi to y i . We use P to construct a path which stays in F i−i ∪ · · · ∪ F i+p . Let (P ) be largest such that P ∩ F (P ) = ∅; let (P ) be least such that P ∩ F (P ) = ∅. If i − i ≤ (P ) and (P ) ≤ i + p, we are done. Otherwise, let P1 , . . . , Pk be a sequence of subpaths of P , each beginning and ending in F i , such that P = P1 · · · Pk and for each 1 ≤ j ≤ k, (Pj ) = i or (Pj ) = i. It is not hard to see that each Pj can be replaced by a path Pj with the same start and end points and which satisfies i − i ≤ (Pj ) ≤ (Pj ) ≤ i + p. This new path witnesses that y ∈ Reach(x).
We inductively define a sequence Cl0 (x), Cl1 (x), . . . such that each Clk (x) is a subset of VF . Set Cl0 (x) = Reach(x). For k > 0, set Clk (x) = Reach(σ(Clk−1 (x))). Lemma 9. Suppose j ≥ i and xi , y j are both in infinite components. xi and y j are in the same component if and only if y ∈ Clj−i (x).
The following algorithm uses the lemma to solve the reachability problem. ALG: Na¨ ıveReach 1. Check if each of xi , y j are in an infinite component of G (see Theorem 4). 2. If exactly one of xi and y j is in a finite component, then return NO. 3. If both xi , y j are in finite components, run FiniteReach on xi and check if (y, j − i) ∈ Q. 4. If both xi and y j are in infinite components, check if y ∈ Clj−i (x). Na¨ ıve Reach computes Cl0 (x) in time O(p3 ). Given Clk−1 (x), we can compute Clk (x) in time O(p4 ). Hence, on input xi , y j , Na¨ ıveReach takes time O((j − i) · p4 ). We will now improve this bound. From Lemma 5, xi is in an infinite component in G if and only if there is an oriented cycle C with positive net length, zero low point, and reachable from x by a simple oriented walk with low point ≥ −i. Assume xi is in an infinite component. The algorithm for the infinity testing problem finds such an oriented cycle C. And, it can compute the net length r of C. All vertices in {xi+mr | m ∈ ω} belong to the same component. Lemma 10. Cl0 (x) = Clr (x).
We give a new algorithm, Reach, by replacing line (4) in Na¨ ıveReach with: (4’) If xi and y j belong to infinite components, compute Cl0 (x), . . ., Clr−1 (x). If y ∈ Clk (x) for k < r with j − i = k mod r, return YES ; otherwise, return NO.
Unary Automatic Graphs: An Algorithmic Perspective
553
Proof (Theorem 5). By Lemma 9 and Lemma 10, Reach returns YES iff xi and y j are in the same component. Calculating Cl0 (x), . . . , Clr−1 (x) requires time O(p5 ). Therefore the running Reach on xi , y j takes O(i + j + p5 ).
In fact, the algorithm produces k < p such that to check if xi , y j (j > i) are in the same component, we need to test if j − i < p and if j − i = k mod p. If G is fixed, we may pre-compute Cl0 (x), . . . , Clrx −1 (x) for all x, so deciding if two vertices u, v belong to the same component takes linear time. The above proof can also be used to build a unary automaton that decides reachability uniformly. Corollary 2. With G as above, there is a deterministic automaton with at most 2p4 + p3 states that solves the reachability problem on G. The time required to
construct this automaton is O(p6 ). This corollary can be applied to solve the connectivity problem. Theorem 6. The connectivity problem for unary automatic graphs of finite degree is solved in time O(p6 ), where p is the loop constant of the unary automaton.
References 1. Blumensath, A.: Automatic Structures. Diploma Thesis, RWTH Aachen (1999) 2. Blumensath, A., Gr¨ adel, E.: Finite presentations of infinite structures: Automata and interpretations. Theory of Computing Systems 37, 642–674 (2004) 3. Bouajjani, A., Esparza, J., Maler, O.: Reachability analysis of pushdown automata: Application to model-checking. In: Mazurkiewicz, A., Winkowski, J. (eds.) CONCUR 1997. LNCS, vol. 1243, pp. 135–150. Springer, Heidelberg (1997) 4. B¨ uchi, J.R.: On a decision method in restricted second-order arithmetic. In: Proc. CLMPS, pp. 1–11. Stanford University Press (1960) 5. Esparza, J., Hansel, D., Rossmanith, P., Schwoon, S.: Efficient algorithms for model checking pushdown systems. In: Emerson, E.A., Sistla, A.P. (eds.) CAV 2000. LNCS, vol. 1855, pp. 232–247. Springer, Heidelberg (2000) 6. Hell, P., Ne˘ set˘ ril, J.: Graphs and Homomorphisms. Oxford University Press, Oxford (2004) 7. Khoussainov, B., Nerode, A.: Automatic presentation of structures. In: Leivant, D. (ed.) LCC 1994. LNCS, vol. 960, pp. 367–392. Springer, Heidelberg (1995) 8. Khoussainov, B., Nies, A., Rubin, S., Stephan, F.: Automatic structures: richness and limitations. In: Proc. LICS, pp. 44–53 (2004) 9. Khoussainov, B., Rubin, S.: Graphs with automatic presentations over a unary alphabet. J. of Automata, Languages and Combinatorics 6(4), 467–480 (2001) 10. Khoussainov, B., Rubin, S., Stephan, F.: Automatic linear orders and trees. ACM Trans. Comput. Log. 6(4), 675–700 (2005) 11. Lohrey, M.: Automatic structures of bounded degree. In: Vardi, M.Y., Voronkov, A. (eds.) LPAR 2003. LNCS (LNAI), vol. 2850, pp. 344–358. Springer, Heidelberg (2003) 12. Oliver, G.P., Thomas, R.M.: Automatic presentations for finitely generated groups. In: Diekert, V., Durand, B. (eds.) STACS 2005. LNCS, vol. 3404, pp. 693–704. Springer, Heidelberg (2005) 13. Rubin, S.: Automatic Structures, PhD Thesis, University of Auckland (2004) 14. Thomas, W.: A short introduction to infinite automata. In: Kuich, W., Rozenberg, G., Salomaa, A. (eds.) DLT 2001. LNCS, vol. 2295, pp. 130–144. Springer, Heidelberg (2002)
Search Space Reductions for Nearest-Neighbor Queries Micah Adler1 and Brent Heeringa2 1
2
Department of Computer Science, University of Massachusetts, Amherst 140 Governors Drive Amherst, MA 01003 Department of Computer Science, Williams College, Williamstown, MA, 01267 [email protected], [email protected]
Abstract. The vast number of applications featuring multimedia and geometric data has made the R-tree a ubiquitous data structure in databases. A popular and fundamental operation on R-trees is nearest neighbor search. While nearest neighbor on R-trees has received considerable experimental attention, it has received somewhat less theoretical consideration. We study pruning heuristics for nearest neighbor queries on R-trees. Our primary result is the construction of non-trivial families of R-trees where k-nearest neighbor queries based on pessimistic (i.e. min-max) distance estimates provide exponential speedup over queries based solely on optimistic (i.e. min) distance estimates. The exponential speedup holds even when k = 1. This result provides strong theoretical evidence that min-max distance heuristics are an essential component to depth-first nearest-neighbor queries. In light of this, we also consider the time-space tradeoffs of depth-first versus best-first nearest neighbor queries and construct a family of R-trees where best-first search performs exponentially better than depth-first search even when depth-first employs min-max distance heuristics.
1
Introduction
Nearest neighbor queries on the R-tree play an integral role in many modern database applications. This is due in large part to the prevalence and popularity of multimedia data indexed geometrically by a vector of features. It is also because nearest neighbor search is a common primitive operation in more complex queries [1]. Although the performance of nearest neighbor search on R-trees has received some theoretical consideration (e.g., [2,3]), its increasing prominence in todays computing world warrants even further investigation. The authors of [1] note that three issues affect the performance of nearest neighbors on R-trees: – the order in which children are visited, – the traversal type, and – the pruning heuristics. We show that at least two of these — traversal type and pruning heuristics — have a quantitatively profound impact on efficiency. In particular we prove the following: M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 554–567, 2008. c Springer-Verlag Berlin Heidelberg 2008
Search Space Reductions for Nearest-Neighbor Queries
555
1. There exists a family of R-trees where depth-first k-nearest neighbor search with pessimistic (i.e. min-max) distance pruning performs exponentially better than optimistic (i.e. min) distance pruning alone. This result holds even when k = 1. 2. There exists a family of R-trees where best-first k-nearest neighbor queries perform exponentially better than depth-first nearest neighbor queries even when the depth-first search uses both optimistic and pessimistic pruning heuristics. This result also holds when k = 1. Our first result provides strong theoretical evidence that pruning strategies based on pessimistic distance estimates are valuable in depth-first nearest neighbor queries. These results rely on subtle changes to existing algorithms. In fact, without these nuanced changes, the exponential speedup may completely disappear. Our second result deals with the known time efficiency benefits of best-first nearest neighbor algorithms over depth-first nearest neighbor algorithms. Given our first result, it is natural to ask whether pessimistic distance pruning closes part of the time efficiency gap. We answer this question in the negative through several general constructions. Still, the benefit of pessimistic pruning strategies should not be overlooked. They provably enhance the time-efficiency of the already space-efficient depth-first nearest neighbor queries. Such algorithms still play an increasingly prominent role in computing given the frequency and demand for operations on massive data sets. The outline of this paper is as follows: In sections 2 and 3 we briefly review definitions for R-trees, MinDist, MinMaxDist, and the three common pruning heuristics employed in depth-first nearest neighbor queries. Sections 4 and 5 describe our R-tree constructions and prove the power of pessimistic pruning. Section 6 discusses the time-space tradeoffs of best-first versus depth-first nearest neighbor queries. We conclude in Section 7.
2
Background
R-trees [4] and their variants (e.g. [5,6]. See [1] for a list of others) are data structures for organizing spatial objects in Euclidean space. They support dynamic insertion and deletion operations. Internal and leaf nodes contain records. A record r belonging to an internal node is a tuple M, μ where μ is a pointer to the child node of r and M is an n-dimensional minimum bounding rectangle (MBR). M tightly bounds the spatial objects located in the subtree of r. For example, given the points (1, 2), (4, 5), and (3, 7) in 2-space, the MBR would be (1, 4), (2, 7). The records of leaf nodes are also tuples but have the form M, o where o is either the actual spatial object or a reference to it. The number of records in a node is its branching factor. Every node of an Rtree contains between b and B records where both b and B are positive integers and b ≤ B2 . One exception is the root node which must have at least two records. R-trees are completely balanced—all leaf nodes have the same depth. Figure 1 depicts an example collection of spatial objects, their MBRs, and their R-tree. More details on R-trees are available in [1].
556
M. Adler and B. Heeringa
R1
R4 R5
R3
R9 R14
R8
R10
R11
(ii)
R13 R2
R7
R12
R19 R6 R18
R16
R17
R15
(i)
Fig. 1. (i) A collection of spatial objects (solid lines) and their hierarchy of minimum bounding rectangles (dashed lines). (ii) An R-tree for the objects in (i).
We consider nearest neighbor searches on R-trees. In all cases these searches have the form: Given a query point q and an R-tree T with spatial objects of matching dimension to q, find the k-nearest objects to q in T . Nearest here and throughout the rest of the paper is defined by Euclidean distance.
3
Nearest Neighbors
There are two dominant nearest neighbor algorithms for the R-tree. The first is a best-first search algorithm (denoted HS) due to Hjatlson and Samet [7]. HS is optimal in the sense that it only searches nodes with bounding boxes intersecting the k-nearest neighbor hypersphere [7,8]. However, it has worst case space complexity that is linear in the total number of tree nodes. With large data sets this cost may become prohibitive [3]. The second algorithm due to Roussopoulos et al. [9] (denoted here by RKV) is a branch and bound depth-first search. RKV employs several heuristics to prune away branches of the tree. We define and discuss the subtlety of these heuristics below. While RKV may search more nodes than HS, it has worst-case space complexity that is only logarithmic in the number of tree nodes. In addition, the authors of [10] note that statically constructed indices map all pages on a branch to contiguous regions on disk, so a depth-first search may “yield fewer disk head movements than the distance-driven search of the HS algorithm.” In these cases RKV may be preferable to HS for performance reasons beyond space complexity. 3.1
Distances
RKV uses three different strategies (here called H1, H2, and H3 respectively and defined formally below) to prune branches. H3 is based on a measure called MinDist which gives the actual distance between a node and a query point. In
Search Space Reductions for Nearest-Neighbor Queries
557
MinDistance MinMaxDistance
Fig. 2. A visual explanation of MinDist and MinMaxDist in two dimensions
other words, MinDist(q, M ) is the length of the shortest line between the query point q and the nearest face of the MBR M . When the query point lies within the MBR, MinDist is 0. Figure 2 shows the MinDist values for a query point and three minimum bounding rectangles. Because an MBR tightly encapsulates the spatial objects within it, each face of the MBR must touch at least one of the objects it encloses [9]. This is called the MBR face property. Figure 2 shows that MinDist is a lower bound or optimistic estimate of the distance between the query point and some spatial object inside its MBR. However, the actual distance between a query point and the closest object may be much larger. H1 and H2 use a second measure called MinMaxDist which provides an upper bound on the distance between an actual object in a node and a query point. In other words, MinMaxDist provides a pessimistic estimate of the distance between the query point and some spatial object within its MBR. Figure 2 depicts these distances for a query point and three MBRs. From its name one can see MinMaxDist is calculated by finding the minimal distance from a set of maximal distances. This set of maximal distances is formed as follows: Suppose we have an n-dimensional minimum bounding rectangle. If we fix one of the dimensions, we are left with two n − 1 dimensional hyperplanes; one representing the MBR’s lower bounds, the other representing its upper bounds. We know from the MBR face property that at least one spatial object touches each of these hyperplanes. However, given only the MBR, we cannot identify this location. But, given a query point, we can say that an object is at least as close as the distance from that point to the farthest point on the closest hyperplane. This distance is an upper bound on the distance between the query point and a spatial object located within the MBR. By iteratively fixing each dimension of an MBR and finding the upper bound, we can form the set of maximal distances. Since each maximal distance is an upper bound, it follows that the minimum of these is also an upper bound. This minimum distance is what we call the MinMaxDist(q, M ) of a query point q and an MBR M .
558
3.2
M. Adler and B. Heeringa
Pruning Heuristics
Pruning strategies based on MinDist and MinMaxDist potentially remove large portions of the search space. The following three strategies were originally defined in [9] for use in RKV. All assume a query point q and a list of MBRs M (to potentially prune) sorted by MinDist. The latter assumption is based on empirical results from both [9] and [7]. In addition, two strategies, H2 and H3, assume a current nearest object o. Definition 1 (H1). Discard any MBR Mi ∈ M if there exists Mj ∈ M with MinDist(q, Mi ) > MinMaxDist(q, Mj ) Definition 2 (H2). Discard o if there exists Mi ∈ M such that MinMaxDist (q, Mi ) < Dist(q, o). Definition 3 (H3). Discard any minimum bounding rectangle M ∈ M if MinDist(q, M ) > Dist(q, o) Both Cheung et al. [11] and Hjaltason et al. [7] show that any node pruned by H1 is also pruned by H3. Furthermore, they note that H2 serves little purpose since it does not perform any pruning. This has led to the development of simpler but behaviorally-identical versions of RKV that rely exclusively on H3 for pruning. As a result, we take RKV to mean the original RKV without H1 and H2.
Algorithm 1. 1NN(q, n, e) Require: A query point q, a node n and a nearest neighbor estimate e. e may be a distance estimate object or a (pointer to a) spatial object. 1: if LeafNode(n) then 2: for M, o in records[n] do {M is a MBR, o is a (pointer to a) spatial object} 3: if Dist(q, M ) ≤ Dist(q, e) then 4: e←o 5: end if 6: end for 7: else 8: ABL ← Sort(records[n]) {Sort records by MinDist } 9: for M, μ in ABL do {M is an MBR, μ points to a child node} 10: if MinMaxDist(q, M ) ≤ e then {H2* Pruning} 11: e ← MinMaxDist(q, M ))) 12: end if 13: end for 14: for M, μ in ABL do {M is an MBR, μ points to a child node} 15: if MinDist(q, M ) < Dist(q, e) then {H3 Pruning} 16: 1NN(q, μ, e) 17: end if 18: end for 19: end if
Search Space Reductions for Nearest-Neighbor Queries
559
The benefits of MinMaxDist, however, should not be overlooked — it can provide very useful information about unexplored areas of the tree. The key is to replace an actual object o with a distance estimate e (some call this the closest point candidate) and then adjust H2 so that we replace the estimate with the MinMaxDist instead of discarding the object. This gives us a new definition of H2 which we call H2*. Definition 4 (H2*). Replace e with MinMaxDist(q, Mi ) if there exist Mi ∈ M such that MinMaxDist(q, Mi ) < e. This definition is not new. In fact, the authors of [2] use replace instead of discard in their description of H2. However, updating the definition of H2 to H2* in RKV does not yield the full pruning power of MinMaxDist. We need to apply H2* early in the search process. This variation on RKV yields Algorithm 1 which we refer to it as 1NN. Note that the RKV traditionally applies H2 after line 1. This diminishes the power of pessimistic pruning. In fact, our exponential speedup results in Sections 4 and 5 hold even when H2* replaces H2 in the original algorithm. 3.3
k-Nearest Neighbors
Correctly generalizing H2* to k-nearest neighbor queries is essential in light of the potential power of pessimistic pruning. However, as B¨ ohm et. al [10] point out, such an extension takes some care. We begin by replacing e with a priority queue L of k-closest neighbors estimates. Note that H2* doesn’t perform direct pruning, but instead, updates the neighbor estimate when distance guarantees can be made. If the MinMaxDist of a node is less than the current distance estimate, then we can update the estimate because the future descent into that node is guaranteed to contain an object with actual distance at most MinMaxDist. We call estimates in updates of this form promises because they are not actual distances, but are upper bounds on distances. Moreover each estimate is a promise of, or place holder for, a spatial object that is as least as good the promise’s prediction. A natural but incorrect generalization of H2 places a promise in the priority queue whenever the maximum-distance element in L is farther away than the MinMaxDist. This leads to two problems. First, multiple promises may end up referring to the same spatial object. Second, a promise may persist past its time and eventually refer to a spatial object already in the queue. These problems are depicted visually in Figure 3. The key to avoiding both problems is to always remove a promise from the queue before searching the node which generated it; it will always be replaced by an equal or better estimate or by an actual object. This leads us to the following generalization of H2 which we call Promise-Pruning: Definition 5 (Promise-Pruning). If there exists Mi ∈ M such that δ(q, Mi ) = MinMaxDist(q, Mi ) < Max(L), then add a promise with distance δ(q, Mi ) to L. Additionally, replace any promise with distance δ(q, Mi ) from L with ∞ before searching Mi .
560
M. Adler and B. Heeringa
a X c q
d Y e
b X
r1 r2
f
a
b
Y d
e
r3
Fig. 3. Blindly inserting promises into the queue, without removing them correctly, wreaks havoc on the results. For example, when performing a 2-nearest neighbor search on the tree above, a promise with distance f is placed in the queue at the root. After investigating X, the queue retains f and a. However, if f is not removed before descending into Y , the final distances in the queue are e and f — an incorrect result.
This generalization is tantamount to an extension suggested by B¨ohm et al. in [10]. Our primary contribution is to show that this extension, when performed at the right point, may provide an exponential performance speedup on depthfirst nearest neighbor queries. For completeness, we also provide generalizations of H1 and H3 which we call K1 and K3 respectively. We also prove that K3 dominates K1, just as it did in the 1-nearest neighbor case. This proof is a simple extension of those appearing in [11,7]. Definition 6 (K1). Discard any minimum bounding rectangle Mi ∈ M if there exists M ⊆ M such that |M | ≥ k and for every Mj ∈ M it is the case that MinDist(q, Mi ) > MinMaxDist(q, Mj ) Definition 7 (K3). Discard any minimum bounding rectangle Mi ∈ M if MinDist(q, Mi ) > Max(L) where Max returns the largest estimate in the priority queue. Theorem 1. Given a query point q, a list of MBRs M, and a priority queue of k closest neighbor estimates L, any MBR pruned by K1 in the depth-first RKV algorithm is also pruned by K3. Proof. Suppose we are performing a k nearest neighbor search with query point q and K1 prunes MBR M from M. From Definition 6 there exists M ⊂ M such that |M | ≥ k and every M in M has MinMaxDist(q, M ) < MinDist(q, M ). Since for any MBR N , MinDist(q, N ) ≤ MinMaxDist(q, N ), each M in M will be searched before M because M is sorted by MinDist. Because each M in M is guaranteed to contain a spatial object with actual distance at most that of than an object found in M and since we have |M | ≥ k we know Max(L) < δ(q, M ). Therefore, from Definition 7, M would also be pruned using K3.
Search Space Reductions for Nearest-Neighbor Queries
561
Algorithm 2. KNN(k, q, n, L) Require: An integer k, a query point q, a node n and a priority queue L of fixed size k. L initially contains k neighbor estimates with distance ∞. 1: if LeafNode(n) then 2: for M, o in records[n] do {M is a MBR, o is a (pointer to a) spatial object} 3: if Dist(q, M ) < max(L) then {Here max returns the object or estimate of greatest distance} 4: insert(L, o) {Inserting o into L replaces some other estimate or object.} 5: end if 6: end for 7: else 8: ABL ← Sort(records[n]) {Sort records by MinDist } 9: for M, μ in ABL do {M is an MBR, μ points to a child node} if MinMaxDist(q, M ) < max(L) then {Promise-Pruning} 10: 11: insert(L, Promise(MinMaxDist(q, M ))) 12: end if 13: end for 14: for M, μ in ABL do {M is an MBR, μ points to a child node} 15: if MinDist(q, M ) < max(L) then {K3 Pruning} 16: if L contains a promise generated from M then 17: remove(L, Promise(MinMaxDist(q, M ))) 18: end if 19: KNN(k, q, μ, L) 20: end if 21: end for 22: end if
Theorem 1 means K1 is redundant with respect to K3, so we do not use it in our depth-first k-nearest neighbor procedure outlined in Algorithm 2. We call this procedure KNN.
4
The Power of Pessimism
In this section we show that 1NN may perform exponentially faster than RKV despite the fact that it differs only slightly from the original definition. As we noted earlier, the original RKV is equivalent to RKV with only H3 pruning. Thus, RKV is Algorithm 1 without lines 9-13. Theorem 2. There exists a family of R-tree and query point pairs T = {(T1 , qn ), . . . , (Tm , qm )} such that for any (Ti , qi ), RKV examines O(n) nodes and 1NN examines O(log n) nodes on a 1-nearest neighbor query. Proof. For simplicity, we restrict our attention to R-trees composed of points in R2 so that all the MBRs are rectangles. Also, we only construct complete binary trees where each node has two records. The construction follows the illustration in Figure 4. Let δ(i, j) be the Euclidean distance from point i to point j and let (i, j) be their rectangle. Let q be the query point. Choose three points a,
562
M. Adler and B. Heeringa b
Y
Z
f
X U
q
T
X
T
2
c Leaves are pairs of points in V with MBRs like Y or Z
r1 r2 r3
d
W
1
W
a e
V
r4
b
c
L1
d
e
points in X
L2
Fig. 4. A visual explanation of the tree construction for Theorem 2
b, and c such that δ(q, a) = r1 < δ(q, b) = r4 < δ(q, c) and (b, c) forms a rectangle W with a corner a. Similarly, choose three points d, e, and f such r1 < δ(q, f ) = r2 < δ(q, d) = r3 < r4 < δ(q, e) and (d, e) forms a rectangle X with corner f . Let T be a complete binary tree over n leaves where each node has two records. Let T1 be the left child of T and let T2 be the right child of T . Let L1 be the far left leaf of T1 and let L2 be the far left leaf of T2 . Place b and c in L1 , and d and e in L2 . In the remaining leaves of T1 , place pairs of points (pi , pj ) such that pi and pj are interior to V , δ(q, pi ) > r4 and δ(q, pj ) > r4 but (pi , pj ) form a rectangle with corner p such that r3 < δ(q, p ) < r4 . Rectangles Y and Z in Figure 4 are examples of this family of point pairs. In the remaining leaves of T2 place pairs of points (pk , pl ) such that pk and pl are interior to U (i.e., so that MinDist(q, (pk , pl )) > r3 ). The construction yields a valid R-tree because we place pairs of points at the leaves and build up the MBRs of the internal nodes accordingly. Claim. Given a tree T and query point q as constructed above, both RKV and 1NN prune away all of T2 save the left branch down to L2 on a 1-nearest neighbor query. Proof. Note that d is the nearest neighbor to q in T so both algorithms will search T2 . L2 , by construction, has the least MinDist of any subset of points in T2 , so both algorithms, when initially searching T2 , will descend to it first. Since δ(q, d) is the realization of this MinDist and no other pair of points in X has MinDist < δ(q, d), both algorithms will prune away the remaining nodes using H3. Now we’ll show that RKV must examine all the nodes in T1 while 1NN can use information from X to prune away all of T1 save the left branch down to L1 . Lemma 1. Given an R-tree T and query point q as constructed above, RKV examines every node in T1 on a 1-nearest neighbor query. Proof. Since MinDist(q, W ) = δ(q, a), RKV descends to L1 first and claims b as its nearest neighbor. However, RKV is unable to prune away any of the remaining leaves of T1 . To see this, let Li = (pi1 , pi2 ) and Lj = (pj1 , pj2 ) be distinct leaves of
Search Space Reductions for Nearest-Neighbor Queries
563
T1 (but not L1 ). Note that MinDist(q, (pi1 , pi2 )) < r4 < min(δ(q, pj1 ), δ(q, pj2 )) and MinDist(q, (pj1 , pj2 )) < r4 < min(δ(q, pi1 ), δ(q, pi2 )). This means that the MinDist of any leaf node is at most r4 but every point is at least r4 so RKV must probe every leaf. As a result, it cannot prune away any branches. Lemma 2. Given a tree T and query point q as constructed above, 1NN prunes away all nodes in T1 except those on the branch leading to L1 in a 1-nearest neighbor query. Proof. 1NN uses the MinMaxDist information from X as an indirect means of pruning. Before descending into T1 , the algorithm updates its neighbor estimate with δ(q, d). Like RKV, 1NN descends into T1 directly down to L1 since δ(q, W ) < δ(q, d) and W has the smallest MinDist of all the nodes. Unlike RKV, it reaches and ignores b because δ(q, d) < δ(q, b). In fact, the promise granted by X allows us to prune away all other branches of the tree since the remaining nodes are all interior to V and δ(q, d) < MinDist(q, V ). The theorem follows from Lemma 1 and Lemma 2. RKV searches all of T1 (O(n) nodes) while 1NN searches only the paths leading to L1 and L2 (O(log n) nodes). As a consequence, 1NN can reduce the search space exponentially over RKV.
In the original RKV, H2 pruning appears on line 1. Our results hold even if we replace H2 with H2*. This is because all the pruning in T1 relies on the MinMaxDist found at the root node. Hence the promotion of pessimistic pruning in Algorithm 1 plays a crucial role in the performance of depth-first nearest neighbor queries.
5
Search Space Reductions with K-Nearest Neighbors
Here we show that the benefits 1NN reaps from MinMaxDist extend to KNN when H2* is properly generalized to Promise-Pruning. In particular, we construct a class of R-trees where KNN reduces the number of nodes visited exponentially when compared with RKV. As in Section 4, we take RKV to mean Algorithm 2 without lines 9-13 (and additionally lines 16-18).
Y
Z b
V T1
f
X U
q
c
T2
Leaves are pairs of points in V with MBRs like Y or Z
r1 r2 r3
d
X
W
a e
W
b
r4 r5
c
L1
d
e
points in U
L2
Fig. 5. A visual explanation of the tree construction used in Theorem 3
564
M. Adler and B. Heeringa
Theorem 3. There exists a family of R-trees and query point pairs T = {(T1 , qn ), . . . , (Tm , qm )} such that for any (Ti , qi ), RKV examines O(n) nodes and KNN examines O(log n) nodes on a 2-nearest neighbor query. Proof. The proof follows the outline of Theorem 2. The construction is similar to Figure 4 except that b shifts slightly down and V shifts slightly up so that δ(q, b) < MinDist(q, V ). We illustrate this in Figure 5 and give details where the two proofs differ. Claim. Given a tree T and query point q as constructed in Figure 5, both KNN and RKV prune away all of T2 save the left branch down to L2 on a 2-nearest neighbor query. Proof. Note that b and d are the two nearest-neighbors to q in T . Both algorithms will search T1 first because MinDist(q, W ) < MinDist(q, X). Since both algorithms are depth-first searches, b is in tow by the time T2 is searched. Because d has yet to be realized, both algorithms will search T2 and prune away the remaining nodes just as in 4. Just as before, we’ll show that RKV must examine all the nodes in T1 while KNN can use information from X to prune away all of T1 save the left branch down to L1 . Lemma 3. Given an R-tree T and query point q as constructed above, RKV examines every node in T1 in a 2-nearest neighbor search. Proof. Since MinDist(q, W ) = δ(q, a), RKV descends to L1 first and inserts b (and c) into its 2-best priority queue. However, RKV is unable to prune away any of the remaining leaves of T1 because every pair of leaf points have MinDist at most r5 but all points in T1 (besides b) lie outside r5 . As a result RKV must probe every leaf. Lemma 4. Given a tree T and query point q as constructed above, KNN prunes away all nodes in T1 except those on the branch leading to L1 in a 2-nearest neighbor search. Proof. Before descending into T1 , KNN inserts a promise with distance MinMaxDist(q, X) = δ(q, d) into its 2-best priority queue. The algorithm descends into T1 directly down to L1 , finding b and inserting it into its 2-best priority queue. Unlike RKV, the promise granted by X allows us to prune away all other nodes of the tree since the remaining nodes are all interior to V and δ(q, d) < MinDist(q, V ). The theorem follows from Lemma 3 and Lemma 4. Note that if KNN did not remove the promise granted by d at X the final result would be the point d and its promise – an error.
We can generalize the construction given in Theorem 3 so that the exponential search space reduction holds for any k nearest neighbor query. In particular,
Search Space Reductions for Nearest-Neighbor Queries
565
b
W
Y
Z
T1
V
W
X
T2
a d
q
c Leaves are pairs of points in V with MBRs like Y or Z
r1 r2
X
r3 r4
e
b
c
L1
d
e
points in X
L2
f
Fig. 6. A visual explanation of the tree construction used in Theorem 4
for any constant k > 0 there exists a class of R-tree and query point pairs for which KNN reduces the search space exponentially over RKV. A simple way of accomplishing this is to place k − 1 points at b and insert them into the far left leaves of T1 .
6
Time / Space Tradeoffs
Given the search space reductions offered by 1NN and KNN, it is natural to ask if the space-efficient depth-first algorithms can approach the time efficiency of the best-first algorithms. In other words, how does KNN stack up against HS? We answer this question here in the negative by constructing a family of R-trees where HS performs exponentially better than KNN. The HS algorithm uses a priority queue to order the nodes by MinDist. It then performs a best-first search by MinDist while pruning away nodes using K3. We direct the reader to [7] for more details. Theorem 4. There exists a family of R-tree and query point pairs T = {(T1 , qn ), . . . , (Tm , qm )} such that for any (Ti , qi ), HS examines O(log n) nodes and 1NN examines O(n) nodes on a 1-nearest neighbor query. Proof. This construction resembles the construction in Theorem 2 and is depicted visually in Figure 6. We organize T1 in exactly the same way as Theorem 2: we choose three points a, b, and c such that δ(q, a) = r1 < δ(q, b) = r4 < δ(q, c) and (b, c) forms a rectangle W with a corner a. Next we choose three points d, e, and f such r1 < δ(q, d) = r2 < r4 < δ(q, f ) < δ(q, e) and (d, e) forms a rectangle X with corner f . Let T be a complete binary tree over n leaves where each node has two records. Let T1 , T2 , L1 , and L2 be as in Theorem 2. Place b and c in L1 , and d and e in L2 . In the remaining leaves of T1 , place pairs of points (pi , pj ) such that pi and pj are interior to V , δ(q, pi ) > r4 and δ(q, pj ) > r4 but (pi , pj ) form a rectangle with corner p such that r3 < δ(q, p ) < r4 . Rectangles Y and Z in Figure 6 are examples of this family of point pairs. In the remaining leaves of T2 place pairs of points (pk , pl ) such that pk and pl are interior to X.
566
M. Adler and B. Heeringa
Claim. Given a tree T and query point q as constructed above, both 1NN and HS prune away all of T2 save the left branch down to L2 on a 1-nearest neighbor query. Proof. d is the nearest neighbor to q in T so both algorithms must search T2 . L2 , by construction, has the least MinDist of any subset of points in T2 , so both algorithms, when initially searching T2 , will descend to it first. Since δ(q, d) is the realization of this MinDist and no other pair of points in X has MinDist < δ(q, d), both algorithms will prune away the remaining nodes using H3. Now we’ll show that 1NN must examine all the nodes in T1 while HS can use the MinDist of X to bypass searching all of T1 save the path down to L1 . Lemma 5. Given an R-tree T and query point q as constructed above, 1NN examines every node in T1 on a 1-nearest neighbor query. Proof. 1NN descends into T1 before T2 since MinDist(q, W ) < MinDist(q, X). Note that the distance estimate delivered by f is useless given that every pair of leaf points in T1 has MinDist less than δ(q, f ). 1NN will descend to L1 and find b but it cannot rule out the rest of T1 because every pair of leaf points forms a rectangle with MinDist smaller than r4 . To see this, let Li = (pi1 , pi2 ) and Lj = (pj1 , pj2 ) be distinct leaves of T1 (but not L1 ). Note that MinDist(q, (pi1 , pi2 )) < r4 < min(δ(q, pj1 ), δ(q, pj2 )) and MinDist(q, (pj1 , pj2 )) < r4 < min(δ(q, pi1 ), δ(q, pi2 )). This means that the MinDist of any leaf node is at most r4 but every point is beyond r4 so 1NN cannot use H3 pruning. Furthermore, since MinMaxDist is always an upper bound on the points, it can never use H2* pruning. Thus, 1NN searches all of T1 . Lemma 6. Given a tree T and query point q as constructed above, HS prunes away all nodes in T1 except those on the branch leading to L1 in a 1-nearest neighbor query. Proof. Like 1NN, HS descends directly to L1 , however, once b is in tow, it immediately jumps back to X since MinDist(q, X) < MinDist(q, V ). Since d is the 1-nearest neighbor and since MinDist(q, X) = δ(q, d), it immediately descends to L2 to find d. Since δ(q, d) < MinDist(q, V ) it can use H3 to prune away the rest of T1 . The theorem follows from Lemma 5 and Lemma 6. 1NN searches all of T1 (O(n) nodes) while HS searches only the paths leading to L1 and L2 (O(log n) nodes). As a consequence, HS can prune the search space exponentially over 1NN even when 1NN has the advantages of H2*.
Extending Theorem 4 to k-nearest neighbors is fairly straight-forward. Add k −1 points to X between r2 and r3 and place these points in leaves as adjacent to L2 as possible. Since this set of points forms a rectangle with MinDist smaller than r3 and since this rectangle is encountered en route to L2 , HS will find the points immediately and then use H3 to prune away the rest of T1 and T2 . This gives us the following theorem:
Search Space Reductions for Nearest-Neighbor Queries
567
Theorem 5. There exists a family of R-tree and query point pairs T = {(T1 , qn ), . . . , (Tm , qm )} such that for any (Ti , qi ), HS examines O(log n) nodes and KNN examines O(n) nodes on a k-nearest neighbor query.
7
Open Problems and Future Work
The most natural open problem is quantifying the time/space trade-off of depthfirst versus best-first nearest-neighbor algorithms on the R-tree. One line of future work might explore hybrid algorithms that combine the space-efficiency of depth-first search along with the time-efficiency of best-first search.
References 1. Manolopoulos, Y., Nanopoulos, A., Papadopoulos, A.N., Theodoridis, Y.: R-Trees: Theory and Applications. In: Advanced Information and Knowledge Processing, 1st edn., Springer, Heidelberg (2006) 2. Papadopoulos, A., Manolopoulos, Y.: Performance of nearest neighbor queries in r-trees. In: Proceedings of the 6th International Conference on Database Theory, pp. 394–408 (1997) 3. Berchtold, S., B¨ ohm, C., Keim, D.A., Krebs, F., Kriegel, H.P.: On optimizing nearest neighbor queries in high-dimensional data spaces. In: Van den Bussche, J., Vianu, V. (eds.) ICDT 2001. LNCS, vol. 1973, pp. 435–449. Springer, Heidelberg (2000) 4. Guttman, A.: R-trees: A dynamic index structure for spatial searching. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 47–57 (1984) 5. Sellis, T., Roussopoulos, N., Faloutsos, C.: R+-tree: A dynamic index for multidimensional objects. In: Proceedings of the 13th International Conference on Very Large Databases, pp. 507–518 (1988) 6. Beckmann, N., Kriegel, H., Schneider, R., Seeger, B.: R*-tree: An efficient and robust access method for points and rectangles. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 322–331 (1990) 7. Hjaltason, G.R., Samet, H.: Distance browsing in spatial databases. ACM Transactions on Database Systems 24, 265–318 (1999) 8. Berchtold, S., B¨ ohm, C., Keim, D.A., Kriegel, H.P.: A cost model for nearest neighbor search in high-dimensional data space. In: Proceedings of the Sixteenth ACM Symposium on Principles of Database Systems, pp. 78–86. ACM Press, New York (1997) 9. Roussopoulos, N., Kelley, S., Vincent, F.: Nearest neighbor queries. In: Proceedings ACM SIGMOD Internaiontal Conference on the Management of Data, pp. 71–79 (1995) 10. B¨ ohm, C., Berchtold, S., Keim, D.A.: Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases. ACM Computing Surveys (CSUR) 33, 322–373 (2001) 11. Cheung, K.L., Fu, A.W.C.: Enhanced nearest neighbour search on the r-tree. SIGMOD Record 27, 16–21 (1998)
Total Degrees and Nonsplitting Properties of Σ20 Enumeration Degrees M.M. Arslanov1, , S.B. Cooper2 , I.Sh. Kalimullin1 , and M.I. Soskova2, 1
Department of Mathematics, Kazan State University, Kazan 420008, Russia 2 School of Mathematics, University of Leeds, Leeds LS2 9JT, U.K.
This paper continues the project, initiated in [ACK], of describing general conditions under which relative splittings are derivable in the local structure of the enumeration degrees. The main results below include a proof that any high total e-degree below 0e is splittable over any low e-degree below it, and a construction of a Π10 e-degree unsplittable over a Δ2 e-degree below it. In [ACK] it was shown that using semirecursive sets one can construct minimal pairs of e-degrees by both effective and uniform ways, following which new results concerning the local distribution of total e-degrees and of the degrees of semirecursive sets enabled one to proceed, via the natural embedding of the Turing degrees in the enumeration degrees, to results concerning embeddings of the diamond lattice in the e-degrees. A particularly striking application of these techniques was a relatively simple derivation of a strong generalisation of the Ahmad Diamond Theorem. This paper extends the known constraints on further progress in this direction, such as the result of Ahmad and Lachlan [AL98] showing the existence of a nonsplitting Δ02 e-degree > 0e , and the recent result of Soskova [Sos07] showing that 0e is unsplittable in the Σ20 e-degrees above some Σ20 e-degree < 0e . This work also relates to results (e.g. Cooper and Copestake [CC88]) limiting the local distribution of total e-degrees. For further background concerning enumeration reducibility and its degree structure, the reader is referred to Cooper [Co90], Sorbi [Sor97] or Cooper [Co04], chapter 11. Theorem 1. If a < h ≤ 0 , a is low and h is total and high then there is a low total e-degree b such that a ≤ b < h. Corollary 2. Let a < h ≤ 0 , h be a high total e-degree, a be a low e-degree. Then there are Δ02 e-degrees b0 < h and b1 < h such that a = b0 ∩ b1 and h = b0 ∪ b1 . Proof. Immediately follows from Theorem 1, and Theorem 6 of [ACK].
Proof of Theorem 1. Assume A has low e-degree, H ⊕ H has high e-degree (i.e., H has high Turing degree) and A ≤e H ⊕ H.
The first and the third authors are partially supported by RFBR grant 05-01-00605. The fourth author is partially supported by Marie Curie Early Training Site MATHLOGAPS(MEST-CT-2004-504029).
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 568–578, 2008. c Springer-Verlag Berlin Heidelberg 2008
Total Degrees and Nonsplitting Properties
569
We want to construct an H-computable increasing sequence of initial segments {σs }s∈ω such that the set B = ∪s σs satisfies the requirements Pn : n ∈ A ⇐⇒ (∃y)[ n, y ∈ B] and / Wnτ ]] Rn : (∃σ ⊂ B)[n ∈ Wnσ ∨ (∀τ ⊃ σ)[τ ∈ S A =⇒ n ∈ for each n ∈ ω, where S A = {τ : (∀x)(∀y)[τ ( x, y ) ↓= 1 =⇒ x ∈ A]}. Note that Pn -requirements guarantee that A ≤e B, and hence A ≤e B ⊕ B. To prove that the Rn -requirements provide B ≡T ∅ , first note that S A ≡e A, which has low e-degree, and X = { σ, n : (∃τ ⊃ σ)[τ ∈ S A & n ∈ Wnτ ]} ≤e S A . Then X ∈ Δ02 and / X], n∈ / B ⇐⇒ (∃σ ⊂ B)[ σ, n ∈ so that B is co-c.e. in B ⊕ ∅ ≡T ∅ . Thus B ≤T ∅ by Post’s Theorem. Since the set B will be computable in H, the set Q = {n : (∀σ ⊂ B)(∃τ ⊃ σ)[τ ∈ S A & n ∈ Wnτ ]} will be computable in (H ⊕ ∅ ) ≡T H – indeed, we have n ∈ Q ⇐⇒ (∀σ ⊂ B)[ σ, n ∈ X], so that Q is co-c.e. in H ⊕ ∅ . Now to construct the desired set B we can apply the Recursion Theorem and fix an H-computable function g such that Q(x) = lims g(x, s). Let {As }s∈ω and {SsA }s∈ω be respective H-computable enumerations of A and S A . Construction Stage s = 0. σ0 = λ. Stage s + 1 = 2 n, z (to satisfy Pn ). Given σs define l = |σs |. If n ∈ / As , then let σs+1 = σs 0. If n ∈ As , then choose the least k ≥ l such that k = n, y for some y ∈ ω and define σs+1 = σs 0k−l 1 (so that σs+1 (k) = 1). Stage s + 1 = 2 n, z + 1 (to satisfy Rn ). H-computably find the least stage τ for some τ satisfying τ ∈ StA and t ≥ s such that either g(n, t) = 0, or n ∈ Wn,t τ ⊃ σs . (Such stage t exists since if lims g(n, s) = 1 then n ∈ Q, and hence there exists some τ ⊃ σs such that n ∈ Wnτ and τ ∈ S A .) If g(n, t) = 0 then define σs+1 = σs 0. τ Otherwise, choose the first τ ⊃ σs such that τ ∈ StA and n ∈ Wn,t . Define σs+1 = τ. This completes the description of the construction.
570
M.M. Arslanov et al.
Let B = ∪s σs . Clearly B ≤T H since each σs is obtained effectively in H. Each Pn -requirement is satisfied by the even stages of the construction since σs ∈ S A for any s ∈ ω. To prove that each Rn -requirement is met suppose that (∀σ ⊂ B)(∃τ ⊇ σ)[τ ∈ S A & n ∈ Wnτ ] for some n. This means that n ∈ Q. Choose any odd stage s = 2 n, z + 1 such that g(n, t) = 1 for all t ≥ s. Then by the construction n ∈ Wnσs . Hence A ≤e B ⊕ B ≤e H ⊕ H, and dege (B ⊕ B) is low. Theorem 3. There is a Π10 e-degree a and a 3-c.e. e-degree b < a such that a is not splittable over b. Proof. We construct a Π10 set A and 3-c.e. set B satisfying both the global requirement: G : B = Ω(A), and the requirements RΞ,Ψ,Θ : A = Ξ(Ψ (A) ⊕ Θ(A)) =⇒ (∃ e-operator Γ )A = Γ (Ψ (A) ⊕ B)∨ (∃ e-operator Λ)A = Λ(Θ(A) ⊕ B) for each triple of e-operators Ξ, Ψ, Θ, and NΦ : A = Φ(B) for each e-operator Φ. In fact A will be constructed as a 2-c.e. set. Note that the e-degrees of Π1 sets coincide with the e-degrees of 2-c.e. sets. Hence this will still produce the desired enumeration degrees. Basic Strategies Suppose we have an effective listing of all requirements R1 , R2 , . . . and N1 , N2 , . . . The requirements will then be arranged by priority in the following way: G < R1 < N1 < R2 < N2 < . . . To satisfy the requirement G we will make sure that every time we enumerate an element into the set B, we enumerate a corresponding axiom into the set Ω; and every time we extract an element from B, we make the corresponding axiom invalid by extracting elements from A. More precisely every element y that enters B will have a corresponding marker m in A and an axiom y, {m} in Ω. If y is extracted from B then we extract m from A. If y is later re-enumerated into B – this can happen since B is 3-c.e. – then we will just enumerate the axiom y, ∅ into Ω. To satisfy the requirements Ri we will initially try to construct an operator Γ using information from both of sets B and Ψ (A). Again, enumeration of elements into A is always accompanied by enumeration of axioms into Γ , and extraction of elements from A can be rectified via B-extractions.
Total Degrees and Nonsplitting Properties
571
The N -strategies follow a variant of the Friedberg -Muchnik strategy while at the same time respecting the Γ -rectification, so we will call them (NΦ , Γ )strategies. They choose a follower x, enumerate it in A, then wait until x ∈ Φ(B). If this happens - they extract the element x from A while restraining B ϕ(x) in B. The need to rectify Γ after the extraction of the follower x from A can be in conflict with the restraint on B. To resolve this conflict we try to obtain a change in the set Ψ (A) which would enable us to rectify Γ without any extraction from the set B. To do this we monitor the length of agreement lΞ,Ψ,Θ (s) = max{y : (∀y < x)[y ∈ A[s] ⇐⇒ y ∈ Ξ(Ψ (A) ⊕ Θ(A))[s]]}. We only proceed with actions directed at a particular follower once it is below the length of agreement. This ensures that the extraction of x from A will have one of the following consequences 1. The length of agreement will never return so long as at least one of the axioms that ensure x ∈ Ξ(Ψ (A) ⊕ Θ(A)) remains valid. 2. There is a useful change in the set Ψ (A). 3. There is a useful change in the set Θ(A). We will initially assume that it is the case that the third consequence is true and commence a backup strategy (NΦ , Λ) which is devoted to building an enumeration operator Λ with information from A and Θ(A). This is a new copy of the N -strategy working with the same follower. It will try to make use of this change in Θ(A) to satisfy the requirement. Only when we are provided with evidence that our assumption is wrong will we return to the initial strategy (NΦ , Γ ). Basic module for an NΦ -strategy below one RΞ,Ψ,Θ -strategy We will first consider the simple case involving just two requirements. Assume we have NΦ , which we refer to as the N -requirement, below RΞ,Ψ,Θ , which we refer to as the R-requirement. At the root we have the R-strategy denoted by (R, Γ ). It will have two out/ A. In the case comes e
572
M.M. Arslanov et al.
Outcome w indicates that Γ is correct on x and the N -requirement is satisfied as x ∈ A − Φ(B). Outcome f is only accessible once a follower x has been returned. It will indicate that Γ is again correct on x and the N -requirement is satisfied via x ∈ Φ(B) − A. Below outcome λ strategies will be devoted to constructing an operator Λ with A = Λ(Θ(A) ⊕ B) where they will receive their followers from (N, Γ ). Again we have a controlling strategy (R, Λ) with only one outcome e which makes sure that the operator Λ can be rectified at all times. In case it sees an element x ∈ /A for which the axiom enumerated in Λ is valid, it will send x back to (N, Γ ). We will be able to argue that x has provided evidence of a useful change in Ψ (A). Below (R, Λ)’s only outcome e we try to meet N by (N, Λ) with A = ΛΦ (Θ(A)⊕ B). The strategy below the outcome λ acts only when the (N, Γ )-strategy sends its follower x. It performs similar actions with regard to (N, Γ ) and has two outcome f
Total Degrees and Nonsplitting Properties
573
2)Let v(x) = max(ϕ(x), y(x)) and restrain B on v(x). Extract y(x) from Bs and m(x) from As , noting that x is still in Ξ(Ψ (A) ⊕ Θ(A)) as the marker m(x) is chosen as a fresh number after G(x) and H(x) are already defined. Send x. Let the outcome be o = λ. At the next stage start from Setup1, choosing a new current follower. The strategy working below outcome λ will believe B only below a right boundary Rs = y(x). Note that the next follower will choose its B-marker of greater value. So if the outcome λ is visited infinitely often then the right boundary R will grow unboundedly. Result Let the returned follower be x. Put y(x) into Bs and y(x), ∅ into Ωs . For each follower z of this strategy such that z ∈ A put the axiom z, ∅ into Γs . 1) For the returned follower we know that x ∈ / As and H(x) ⊂ Θ(As ). The outcome λ will not be accessible anymore so we can preserve H(x) ⊆ Θ(At ) at further stages t. Also if G(x) ⊆ Ψ (As ) then the (R, Γ )-strategy would have outcome gw preserving the difference and satisfying R globally. The (N, Γ )-strategy would not be accessible any longer. Otherwise G(x) A and the outcome is o = f . Return at Result1 at the next stage. The (R, Λ)-strategy below outcome λ 1. Scan all followers x ∈ / A. 2. If x ∈ Λ(Θ(A) ⊕ B) then return x. End this stage. 3. If all followers are scanned and none have been returned then let the outcome be e. The (N, Λ)-strategy below outcome λ Setup 1) Let x ∈ A be a new integer which was sent by the (N, Γ )-strategy. Now x becomes the follower of the (N, Λ)-strategy. Go to Setup2. 2) Put x, Hx ⊕ B v(x) into Λ. Go to W ait. Wait If x ∈ Φ(B) with use ϕ(x) < Rs then go to Attack. Otherwise the outcome is o = w, return to W ait at the next stage. Attack Extract x from A. Go to result. Result Let the outcome be o = f . Return to Result at the next stage. The (N, F M )-strategy below outcome l or gw Setup Choose a new follower x bigger than any previously set restraint on A and enumerate it into A. Go to W ait. Wait If x ∈ Φ(B) go to Attack. Otherwise the outcome is o = w, return to W ait at the next stage. Attack Extract x from A and go to Result. Result Let the outcome be o = f . Return to Result at the next stage.
574
M.M. Arslanov et al.
Now the (N, F M ) strategy below outcome l will also be changing A. To keep Γ and Λ rectified, every time we initialise the (N, F M )-strategy and cancel its follower x, if x ∈ A we will add the axiom x, ∅ in Γ and Λ. If the (R, Γ )-strategy has outcome gw on stage s for the first time, then the (N, F M )-strategy working below will will be initialised on the previous stage and will choose its follower x anew, respecting the restraint on A that (N, Γ ) has set up. So (R, Γ ) will have outcome gw on all further stages and B will not be modified any longer. The (N, F M )-strategy will be able to satisfy its requirement. Suppose that (R, Γ )-strategy never has outcome gw. We will analyse all possible outcomes of the N -strategies and see that in each case the requirements are satisfied. Consider first the possible outcomes of the strategy (N, Γ ). If one of the cycles stops at Setup2, i.e. on all stages t > s the strategy has outcome l, then the true outcome will be (o = l). The length of agreement lΞ,Ψ,Θ (s) = max{y : (∀y < x)[y ∈ A[s] ⇐⇒ y ∈ Ξ(Ψ (A) ⊕ Θ(A))[s]]} is bounded and hence the requirement R is trivially satisfied. The set B is not modified after stage s and the simple strategy (N, F M ), active on all stages t ≥ s succeeds to satisfy the requirement N . Suppose now that no cycle of the (N, Γ )-strategy stops at Setup2. In this case the (N, F M )-strategy may be activated infinitely many times and will be initialised every time the (N Γ )-strategy moves on to W ait. The current follower x of the (N, F M )-strategy will be cancelled and if it is not yet extracted from A the corresponding axiom x, ∅ will be enumerated in Γ and Λ. This ensures that both operators will be correct at x for all cancelled followers x of the strategy (N, F M ). We first consider the case when the (N, Γ )-strategy during its work sends only finitely many integers. Then some cycle with a follower x stops either at W ait or reaches Result. If the cycle stops at W ait then the outcome is o = w and x ∈ A − Φ(B), hence the N -requirement is satisfied. On the other hand for all followers z we have z ∈ A ⇐⇒ z ∈ Γ (Ψ (A) ⊕ B) and m(z) ∈ A ⇐⇒ m(z) ∈ Γ (Ψ (A) ⊕ B) since y(z) ∈ B ⇐⇒ z = x. Hence Γ is correct at all followers z. If the cycle reaches Result then we have y(x) ∈ B and hence x ∈ Φ(B) − A, so N is satisfied. Also Hx ⊆ Θ(A) via some finite set Px ⊂ A. If Gx ⊆ Ψ (A) then this will be apparent at some finite stage s, i.e. on stage s we will see a finite set Qx ⊂ A such that Gx ⊆ Ψ (Qx ). Then from stage s on the (R, Γ )-strategy will have outcome o = gw, contradicting our assumption. So Gx Ψ (A) giving x∈ / Γ (Ψ (A) ⊕ B). Since again y(z) ∈ B ⇐⇒ z = x we have z ∈ A ⇐⇒ z ∈ Γ (Ψ (A) ⊕ B) and m(z) ∈ A ⇐⇒ m(z) ∈ Γ (Ψ (A) ⊕ B) for any follower z. Hence the operator Γ remains correct at all further stages. Suppose now that the (N, Γ )-strategy during its work sends infinitely many integers. In particular, no x is returned to (N, Γ ). Then the true outcome is o = λ and we will see that the (N, Λ)-strategy is successful. If the (N, Λ)- strategy stops at W ait then x ∈ A − Φ(B). Indeed if we assume that x ∈ Φ(B) then there is some finite Mx ⊂ B such that x ∈ Φ(Mx ). The
Total Degrees and Nonsplitting Properties
575
right boundary R grows unboundedly, so eventually there will be a stage s with Rs > max(Mx ) and the strategy will move on to Attack. The second case is if the strategy reaches Result. Then x ∈ Φ(B) − A because at some stage s we found a set Mx ⊂ Bs with max Mx < R such that x ∈ Φ(Mx ). The strategy (N, Γ ) will not extract any more markers from B after stage s that are below the right boundary Rs , hence x ∈ Φ(B). At this stage of the construction we can only prove that Λ will be correct at the follower x and all cancelled followers of the strategy (NΦ , F M ). To prove that the operator is correct at the rest of the followers enumerated in A by the (N, Γ )-strategy we will need to consider how all N -strategies will work together. Basic module for many NΦ -strategies under one RΞ,Ψ,Θ -strategy We will try to meet all requirements NΦ1 , NΦ2 , . . . . Each requirement NΦj will be denoted by Nj and met by one of the following strategies: 1. (Nj , Γ ) with outcomes λ, f , w and l; 2. (Nj , F M ) with outcomes f and w and situated in the subtree of the strategy (Ni , Γ ) with outcome l, where i ≤ j. 3. (Nj , Λ) with outcomes f and w and situated in the subtree of the strategy (Ni , Γ ) with outcome λ where where i ≤ j. We now need to be more careful as more strategies will enumerate and extract markers from A and B. We will have to ensure that the operator constructed on the true path is correct and manages to satisfy the R-requirement. The first rule that we will implement in order to achieve this follows the idea of cancelling followers of the (N, F M )-strategy from the previous section. Namely, whenever we initialise a strategy (Nj , S) on an node α in the tree of strategies whose follower x is in A we will enumerate an axiom x, ∅ into all operators Γ and Λ that are constructed on nodes β < α. If m(x) is in A we will also enumerate an axiom m(x), ∅ into these operators. Secondly we will be more careful when enumerating axioms in the corresponding operators. Instead of just using the sets G(x) and H(x), we will use the information from all axioms defined up until now. More precisely we will modify the modules of the strategies from the previous section in the following way: The (Nj , Γ )-strategy is the same as the as the (NΦ , Γ )-strategy with the exception of step Setup3, which is now as follows: Setup3) Enumerate all z, Gx ⊕ B yx ∪ U into Γ where z is either x, or mx , or a follower z ∈ A from a previous cycle of the strategy and U is the union of all sets D such that v, D is a valid axiom in Γ , where v ∈ A is a follower of the strategy (Ni , Γ ) with i < j. The (NΦj , Λ)-strategy is the same as the (NΦ , Λ)-strategy with the exception of Setup2), which is now as follows: Setup2) Enumerate x, (Hx ⊕ B v(x)) ∪ U into Λ where U is the union of all finite sets D such that v, D ∈ Λ for some follower v ∈ A of an (Nk , Λ)-strategy with k < j.
576
M.M. Arslanov et al.
The main idea behind the added sets U in the axioms is that a strategy α working below another strategy β where α and β construct the same operator O believes that β s work is final and the axioms enumerated in O by β will remain true. In the case that β changes its mind and invalidates one of these axioms α will be initialised as β will have an outcome to the left of α. If α s followers are still in A then an axioms for them will be enumerated in the operator as stated in above. But if α s follower is not in A, then we need to ensure that there isn’t a valid axiom in O for it. α will not be able to monitor this follower any longer, so the job is going to be transferred to β automatically via the set U which includes an axiom for β s follower, which β observes and makes sure is invalid. Two R-requirements Now we need to consider the case when there are two R-requirements. Corresponding to them there are nodes on the tree: an (R1 , Γ1 )-strategy and an (R2 , Γ2 )-strategy along each path, scanning for an appropriate global win for the R-requirements. Below outcome gw for an Ri -strategy the N -requirements simply ignore the requirement Ri and act as in the previous section. There now more possibilities for an N -strategy working below outcomes e of both (Ri , Γi )-strategies depending on how it believes the Ri -requirements will be satisfied. The main strategy will be again the one that deals with operators Γ1 and Γ2 . It will try to obtain the necessary changes in the sets Ψ1 (A) and Ψ2 (A) using backup strategies that try to satisfy the R requirements in a different manner. The requirement R1 is of higher priority. The method for satisfying the lower priority requirement R2 will be decided after we have established the method for satisfying R1 unless we have already evidence that the R2 -requirement is trivially satisfied. The N -strategy starts off assuming that the requirements will be satisfied via operators Γ1 and Γ2 . It will be denoted by (N, Γ1 , Γ2 ). Its outcomes are λ2
Total Degrees and Nonsplitting Properties
577
R2 must be decided again. The strategy is (N, Λ2 , Γ2 ) with outcomes λ2
578
M.M. Arslanov et al.
The (R, Γ1 ) strategy plays a similar role. Suppose that αˆl2 ⊂ β. Now β is sharing the same method Γ1 as α. If a follower x of β is extracted from A we must ensure that the axioms for x defined in the operator Γ1 are invalid. If α moves on to outcome w thereby initialising β we lose control on x and it could happen that G1 (x) ⊂ Ψ1 (A) at a later stage. We will be able to argue that if the axiom for x in Γ1 is valid, then H1 (x) ⊂ Θ1 (A) and (R, Γ1 ) will have outcome gw at all further stages. In [ACKS] we combine the ideas from the above description to obtain the construction that meets all requirements.
References [Ah91] Ahmad, S.: Embedding the diamond in the Σ2 enumeration degrees. J. Symbolic Logic 50, 195–212 (1991) [AL98] Ahmad, S., Lachlan, A.H.: Some special pairs of Σ20 e-degrees properties of Σ20 − enumeration degrees. Math. Logic Quarterly 44, 431–449 (1998) [ACK] Arslanov, M.M., Cooper, S.B., Sh. Kalimullin, I.: Splitting properties of total enumeration degrees. Algebra and Logic 42(1) (2003) [ACKS] Arslanov, M.M., Cooper, S.B., Sh. Kalimullin, I., Soskova, M.: Splitting and nonsplitting in the Σ20 enumeration degrees (to appear) [AS99] Arslanov, M.M., Sorbi, A.: Relative splittings of 0e in the Δ02 enumeration degrees. In: Buss, S., Pudlak, P. (eds.) Logic Colloquium 1998, Springer, Heidelberg (1999) [Co90] Cooper, S.B.: Enumeration reducibility, nondeterministic computations and relative computability of partial functions. In: Ambos-Spies, K., M¨ uller, G., Sacks, G.E. (eds.) Recursion Theory Week, Oberwolfach 1989. Lecture Notes in Mathematics, vol. 1432, pp. 57–110. Springer, Heidelberg (1990) [Co04] Cooper, S.B.: Computability Theory. Chapman & Hall/CRC, Boca Raton, London, New York, Washington, D.C (2004) [CC88] Cooper, S.B., Copestake, C.S.: Properly Σ2 enumeration degrees. Z. Math. Logik Grundlag. Math. 34, 491–522 (1988) [MC85] McEvoy, K., Cooper, S.B.: On minimal pairs of enumeration degrees. J. Symbolic Logic 50, 983–1001 (1985) [La75] Lachlan, A.H.: A recursively enumerable degree which will not split over all lesser ones. Ann. Math. Logic 9, 307–365 (1975) [NS00] Nies, A., Sorbi, A.: Branching in the enumeration degrees of the Σ20 sets. J. Symb. Log. 65(1), 285–292 (2000) [Sor97] Sorbi, A.: The enumeration degrees of the Σ20 sets. In: Sorbi, A. (ed.) Complexity, Logic and Recursion Theory, pp. 303–330. Marcel Dekker, New York (1997) [Sos07] Soskova, M.I.: A non-splitting theorem in the enumeration degrees. Annals of Pure and Applied Logic (to appear)
s-Degrees within e-Degrees Thomas F. Kent Dipartimento di Scienze Matematiche ed Informatiche “Roberto Magari” University of Siena, Italy [email protected]
Abstract. For any enumeration degree a let Das be the set of s-degrees contained in a. We answer an open question of Watson by showing that if a is a nontrivial Σ20 -enumeration degree, then Das has no least element. We also show that every countable partial order embeds into Das .
1
Introduction
Positive reducibilities formalize models of relative computability which use only “positive ” oracle information. The most comprehensive positive reducibility is enumeration reducibility, denoted by ≤e . Intuitively a set A is enumeration reducible to a set B if there is some effective procedure for enumerating A given any enumeration of B. This is made mathematically precise by defining A ≤e B if there exists a c.e. set Φ such that A = {x : (∃ finite D)[x, D ∈ Φ & D ⊆ B]} (often denoted by A = ΦB ) where finite sets are identified with their canonical indices. In this context a c.e. set Φ is also called an enumeration operator. It is clear that given a set B, an enumeration operator Φ, and a given x, there is no bound to the number n of oracle questions which are needed to enumerate x in ΦB , i.e. to the cardinality of a finite set D for which we need D ⊆ B, in order to have x ∈ ΦB . One can therefore introduce restricted, or strong, versions of enumeration reducibility by requesting instead that there be such a bound, such as is done in [1]. Although extreme, the case n = 1, in which for any given x we need at most one oracle question, is particularly interesting, and occurs often in practical applications of enumeration reducibility. This suggests the following definition: Definition 1.1. An enumeration operator Φ is called an s-operator if for every x, D ∈ Φ, we have that D has at most one element. It is straightforward to see that the s-operators (s stands for singleton) can be effectively listed, and give rise to a reducibility (called s-reducibility), denoted by ≤s . The corresponding degree structure, denoted by Ds , consists of the
The author has been supported by a Marie Curie Incoming International Fellowship of the European Community FP6 Program under contract number MIFI-CT-2006021702.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 579–587, 2008. c Springer-Verlag Berlin Heidelberg 2008
580
T.F. Kent
equivalence classes, called s-degrees, of the subsets of ω under the equivalence relation ≡s generated by ≤s . The s-degree of a set A will be denoted by degs (A). The structure Ds is an upper semilattice with least element 0s = degs (∅) consisting of the c.e. sets, and the operation of least upper bound is given by degs (A) ∪ degs (B) = degs (A ⊕ B), where ⊕ denotes the usual disjoint union of sets. It is clear that if A ≤s B then A ≤e B and so it is natural to ask questions about the structure of the s-degrees contained within a single enumeration degree. Given a set A, define A∗ = {n : Dn ∈ A} where n is the canonical index of the finite set Dn . It follows that A∗ ≡e A and if B ≤e A then B ≤s A∗ , and so degs (A∗ ) is the greatest s-degree in dege (A). Zacharov [10] showed that ≤s is properly contained in ≤e by showing that every nonzero enumeration degree contains at least two s-degrees. Zacharov’s proof is a unique gem of its kind and will be sketched in Section 3. Watson [9] showed that no nonzero Δ02 or Σ20 -high enumeration degree contains a minimal s-degree, thus showing that every such enumeration degree contains infinitely many s-degrees. Based also on Copestake’s result in [3] that every 1-generic enumeration degree contains infinitely many s-degrees (in fact an ω-chain of s-degrees), Watson then raised the question ([9, p. 90]) as whether every nontrivial Σ20 -enumeration degree contains infinitely many s-degrees, conjecturing that this is so. In this paper we give a positive answer to Watson’s question, by giving in Theorem 4.1 a uniform priority-free proof of the fact that no nonzero Σ20 -enumeration degree contains a minimal s-degree. That every nontrivial Σ20 -enumeration degree contains infinitely many s-degrees is also implied by Theorem 5.1, in which we show that one can embed any countable partial order in any nontrivial Σ20 enumeration degree. This theorem generalizes also Copestake’s result on ω-chains within any 1-generic enumeration degree. For the reader who is interested in restricted versions of enumeration reducibility, we finally observe that Theorem 4.1 still holds if we replace enumeration reducibility with computably bounded enumeration reducibility, where we say that a set A is computably bounded enumeration reducible to a set B, if A ≤e B via an enumeration operator Φ such that there exists a computable function f satisfying (∀x, D)[x, D ∈ Φ ⇒ |D| ≤ f (x)], where |D| denotes the cardinality of the finite set D. The reader is referred to the papers [1], [2], and [8] for a survey of results on s-reducibility.
2
Conventions
We give some conventions and notation that we will use throughout the article. Definition 2.1. Given a Σ20 -approximation As s∈ω to a set A, we say that the stage s is true if As ⊆ A, and we say that the approximation is good if it contains infinitely many true stages.
s-Degrees within e-Degrees
581
Even though we are using a good Σ20 -approximation to A, it is possible that a strategy on the true path of the construction will be active only during finitely many true stages. To circumvent this problem, each strategy in the tree of strategies will useits own relativized version of the approximation to A defined by Aα s s∈ω = s0
3
Sketch of Zacharov’s Proof
Let A be a non-c.e. set. We outline the main steps of Zacharov’s proof to show that there exists a set B such that A ≡e B, but A s B. One of the key points is the following observation: If a set X is not c.e. and for every c.e. subset W ⊆ X there exists a computable set V such that W ⊆ V ⊆ X (examples of such sets X include the immune sets and the non-c.e. semirecursive sets), then for every set Y containing a simple set, we have that X s Y . 1. Define J(A) = x : x ∈ ΦA where Φe e∈ω denotes here an effective listing x of all enumeration operators. It is easy to see that A ≡e J(A), and for every set B, if B ≤e J(A) then B ≤1 J(A), where ≤1 denotes 1 − 1-reducibility. 2. Assume now that S is Post’s simple set, and let {Fn }n∈ω be a strong array of finite sets (i.e. Fn = Df (n) for some computable function f ), which partition ω and such that Fn ∩ S = ∅, for every n. Define Fn , B=S∪ n∈J(A)
so that B contains a simple set. It is not difficult to see that J(A) ≡e B. 3. We now claim that J(A) s B, thus showing that the enumeration degree of A contains two distinct s-degrees. Suppose that J(A) ≤s B. By [4, Theorem 3.6] let R be a semirecursive set such that R ≤e J(A) and J(A) ≤T R. Hence R ≤1 J(A), which implies that R ≤s B. But, as observed at the beginning of this section, since R is semirecursive this is possible only if R is c.e.. It follows that J(A) ∈ Δ02 . It is now easy to see that there exists an immune set C such that C ≤e J(A): In fact any non-c.e. Σ20 set is enumeration equivalent to a hyperimmune set (see [6]). Therefore C ≤1 J(A), which implies that C ≤s B, a contradiction since B contains a simple set, and C is immune.
4
There Is No Minimal Element
Watson [9] proves that if A is Δ02 and non-c.e., or A is Σ20 -high, then there exists a set B such that B ≡e A and B <s A. He actually gives two distinct proofs
582
T.F. Kent
depending on whether A lies in Δ02 or A is Σ20 -high. The following theorem gives a uniform and priority-free proof of Watson’s result that works for every non-c.e. Σ20 set. Theorem 4.1. Let A be a non-c.e. Σ20 -set. There exists a set B ≡e A such that B <s A. Corollary 4.1. If a is a nontrivial Σ20 -enumeration degree, then there is no minimal s-degree within a. Proof (of Corollary 4.1). Immediate. We now give the construction, and a sketch of the verification, for Theorem 4.1. 4.1
The Requirements
Let A be a non-c.e. Σ20 -set. We build an enumeration operator Γ and an soperator Λ such that for each s-operator Φ we meet the following requirements: R: B = ΛA Q: A = Γ B PΦ : A = ΦB ⇒ A is c.e. Order the PΦ -requirements as Pi i∈ω . 4.2
The Strategies
Partition ω as {Fn : n ∈ ω} where |Fn | = (n + 2)3 and max(Fn ) < min(Fn+1 ). For each x, define σ(x) to be such that x ∈ Fσ(x) . Define Γ = {n, Fn : n ∈ ω} The construction of Λ is by stages. At stage s define a finite approximation Λs s to Λ, and let of course Bs = ΛA s . Initially define Λ0 = {y, {n} : n ∈ ω and y ∈ Fn } . Notice that if we defined B = ΛA 0 , we would immediately have n ∈ A ⇔ Fn ⊆ B ⇔ n ∈ Γ B , i.e. A = Γ B . Unfortunately, strategies attempting to meet PΦ -requirements may enumerate additional axioms into Λ as needed during the construction, while Γ is never changed. These additional axioms will have the form z, ∅ ∈ Λ, in which case we say that we dump z into B. However we will guarantee that A = Γ B for the eventual set B = ΛA , by ensuring that for every n, not all the elements of Fn are dumped into B. It follows from the definition of the operator Γ that if n belongs to A then Fn is a subset of B. The fact that not all the elements of Fn are dumped into B guarantees that if n does not belong to A then Fn will not be a subset of B and hence n will not belong to Γ B .
s-Degrees within e-Degrees
583
The strategy for PΦ . The basic strategy to meet requirement PΦ = Pt tries to permanently restrain in ΦB the elements of A ∩ ΦB : If at stage s we see a number y ∈ A∩ΦB which has not been so far restrained, then we do so, by either automatic restraint provided by an axiom y, ∅ ∈ Φ, or otherwise by taking an axiom y, {z} ∈ Φ, with currently z ∈ B, and dumping z into B. If A = ΦB then we can show that A is c.e. by arguing that A coincides, modulo a c.e. set, with the elements restrained in ΦB . Extra care will be taken in order to achieve that the strategy only dumps elements of sets Fn with t ≤ n, and for each such n, PΦ dumps at most n + 1 elements of Fn . This, together with the fact that |Fn | = (n + 2)3 , will guarantee that not all the elements of Fn are dumped into B: In fact, for each n, Fn ∩ Λ∅ ≤ (n + 1)3 . 4.3
The Construction
Each stage s of the construction consists of substages t < s, where strategy Pt is eligible to act. Let strategy Pt = PΦ be eligible to act at substage t of stage Ais Bsi+1 i 0 i+1 i+1 s = A where A = A t, B = Λ and A = Φ . Let s. Define A s s s s s s i∈ω s s s . If y exists, let z ∈ Bs be least such y ∈ As ∩ ΦB be least such that y ∈ / A s s that y, {z} ∈ Φs and σ(z) ∈ ΦB s . If z exists, dump z into B by enumerating s is the smallest c.e. set such that, for the axiom z, ∅ into Λ. (Intuitively, A s s A B s .) Bs = Λs , we have As = Φs , and As t ⊆ A The reader is referred to [5] for a thorough verification that the construction works.
5
Embedding Countable Partial Orders
A further easy consequence of Theorem 4.1 is the following corollary. Let Z− denote the partial order of the nonpositive integers: Corollary 5.1. If a is any nontrivial Σ20 enumeration degree, then Z− embeds into the s-degrees within a. Proof. If A is a non-c.e. Σ20 set, define B0 = A . Having defined Bi then apply Theorem 4.1 to Bi to get Bi+1 ≡e Bi (hence Bi+1 ≡e A), and Bi+1 <s Bi . We improve on the previous embedding result by showing that in fact every countable partial order embeds in the s-degrees within any nontrivial Σ20 enumeration degree. Theorem 5.1 generalizes also Copestake’s result in [3] stating that every 1-generic enumeration degree contains a chain of s-degrees orderisomorphic to ω. We say that a family {bi }i∈ω of s-degrees is independent if for every i ∈ ω, and any computable set J with i ∈ / J, one has bj bi ≤ j∈J
where the join is defined by taking suitable representatives Bj ∈ bj and letting
584
T.F. Kent
bj = degs (
j∈J
({j} × Bj ))
j∈J
Theorem 5.1. Let a be a non-trivial Σ20 -enumeration degree. Then there exists an independent family of s-degrees {bi }i∈ω such that for each i, bi ⊆ a. Corollary 5.2. For every non-trivial Σ20 -enumeration degree a and countable partial order P = P, ≤, we have that P embeds into the s-degrees within a, i.e. there exists a mapping F , associating with any p ∈ P an s degree F (p) ⊆ a, and satisfying, for all p, q ∈ P , p ≤ q ⇔ F (p) ≤s F (q). Proof (of Corollary 5.2). It is a standard argument (see for instance [7, Theorem V.2.9]) to show how to embed any countable partial order in a class of degrees which contains a computably independent collection of degrees, and is closed under computable joins. We now give the construction, and a sketch of the verification, for Theorem 5.1. 5.1
The Requirements
Fix a non-c.e. Σ20 -set A. We build enumeration operators Γi and Λi , such that for all s-operators Φ and i ∈ ω, we meet the following requirements: Ri : Bi = ΛA i Qi : A = ΓiB i Pi,Φ : Bi = Φ j=i Bj ⇒ ∃Δ (A = Δ) where Δ is a c.e. set constructed by us. 5.2
The Tree of Strategies
Let our set of outcomes be 0 < 1 < · · · < ∞ and define T = (ω ∪ {∞})<ω . Order the P-requirements as Pi i∈ω and assign to each α ∈ T the requirement P|α| . In the rest of the proof, notations and terminology about trees are standard. 5.3
The Construction
[i] Let As s∈ω approximation to A. We define Bi ⊆ ω , and so we
be a good can take j=i Bj = j=i Bj . Using the convention that max(∅) = min(∅) = [i] [i] −1, partition ω as {Fn : n ∈ ω} where |Fn | = (n + 2)2 if i ≤ n and Fn = ∅ [i] [i] otherwise, and such that max(Fn ) ≤ min(Fn+1 ). For each x define σ(x) to be such that x ∈ Fσ(x) . For each i ∈ ω, define Λi = x, An : n ≥ i and x ∈ Fn[i] , and
Γi = x, Fn[i] : n ≥ i and x ∈ An .
s-Degrees within e-Degrees
585
We will enumerate additional axioms into Λi as needed during the construction, and so at each stage of the construction, for i, x ∈ ω and α ∈ T, define
G(i, x) = {F : x, F ∈ Λi } , and Gα =
α {G(i, x) : x ∈ Bi and x = bα m for some m < n } ,
where nα and the bα m are parameters defined by α during the construction. 5.4
The Strategies
As in the proof of the previous theorem, it is immediate to see that if we let Bi Bi = ΛA i , with Λj as is initially defined, then it would follow that A = Γi . In defining the additional axioms for Λi (as needed by the Pi,Φ -strategies) we must [i] therefore ensure that for every k, there is at least one element y ∈ Fk for which we do not define additional axioms. The strategy α for Pi,Φ can be visualized as working on cycles, one cycle for each natural number n: Cycle n:
1. If bm ∈ Φ j=i Bj − Bi , for some m < n, then stop all cycles k > m. If later bm ∈ Φ j=i Bj ∩ Bi again then resume
all existing cycles; / {b0 , . . . , bn−1 }, 2. otherwise, wait for a number b ∈ Φ j=i Bj ∩ Bi , with b ∈ such that there is either an axiom b, ∅ ∈ Φ, or an axiom b, {y} ∈ Φ where y ∈ Bj , some j = i, and y can be conditionally dumped into Bj , by adding an axiom y, G ∈ Λj
, where G is such that if b0 , . . . , bn−1 ∈ Bi then G ⊆ ⊕k=i Bk , forcing b ∈ Φ j=i Bj . In doing so we must not violate the constraints imposed by higher priority strategies, and care should be taken to guarantee that for every z the strategy conditionally dumps at most one [j] element of Fσ(z) . 3. Once such a number b appears, appoint bn = b, conditionally dump the / Φ), impose a restraint on bn so that corresponding y into Bj (if y, ∅ ∈ lower priority strategies are not allowed to add additional Λ-axioms for bn , enumerate the elements of the finite set G, as defined in Step 2, into an auxiliary c.e. set Δ, and go to Cycle n + 1. We briefly describe the outcomes of this strategy, and how they are recorded on the true path if α is on the true path. If there is a least m such that outcome (1) holds
infinitely often, than the requirement is satisfied since in this case path. Outcome (2) would bm ∈ Φ j=i B j − Bi and so α m is on the true
imply that Φ j=i Bj ∩ Bi is c.e.; therefore if Φ j=i Bj = Bi then Bi is c.e., and so A = ΓiBi is c.e., a contradiction. On the other hand, each time we pass through (2) we let α ∞ be eligible to act next. Finally if all cycles are defined
and there is no bm ∈ Φ j=i Bj − Bi , then we can argue that A = Δ, i.e. A is c.e., again a contradiction. In this last case, the construction ends the current stage. However, since A is not c.e., we can conclude that there is indeed an outcome o
586
T.F. Kent
such that α o is on the true path, and this outcome is of the form o = m, for some bm ∈ Φ j=i Bj − Bi . Before proceeding with the details of the construction let us spend a few more words on how different strategies interact with each other. Let α1 have higher priority than α2 . By initialization, we may assume that α1 ⊂ α2 , and by the above informal remarks we consider only the case of some m such that α 1 m ⊆ α2 . The restraints imposed by α1 on α2 are that α2 is not allowed to add additional Λ-axioms for numbers bn that α1 conditionally dumps in (3) of the above module. We can argue that these restraints guarantee that for each i [i] and n ≥ i, there is a y ∈ Fn such that the only axiom of the form y, F ∈ Λi is y, An , thus letting the Qi - and Ri -requirements be satisfied. On the other hand these restraints, being finitary by the above discussion, do not prevent α2 from being satisfied. Each stage s of the construction consists of substages t < s, where a strategy α ∈ T, with |α| = t is eligible to act. Let α be a Pi,Φ -strategy eligible to act at substage t of stage s and let s0 be the first stage at which α was eligible to act after its last initialization. If this is the first time that α has been eligible to act, set n = 0. Choose the first case which applies.
Case 1. There is an m < n such that bm ∈ Φ j=i Bj − Bi : Let m0 be the least such m and end the current substage by letting α m0 be eligible to act next.
Case 2. For all m < n, bm ∈ Φ
j=i
Bj
∩ Bi :
Case 2.1. There is a b ∈ Φ j=i Bj ∩ Bi − {bm : m < n} with σ(b) >
s0 and Gα ⊆ G(i, b) such that either b, ∅ ∈ Φ or b, {y} ∈ Φ with y ∈ j=i Bj , σ(y) > s0 , y = bβm for all β ⊆ α and m < nβ , and y ∈ / Fσ(ym ) for all m < n: / Φ, let yn be the least such y for bn , Let bn to be the least such b. If bn , ∅ ∈ and enumerate yn , β⊆α Gβ into Λj for the j such that yn ∈ Bj . Enumerate G(i, bn ) into Δ. Set n = n + 1, and end the current stage. Case 2.2. Else: Let α ∞ be eligible to act next. Ending the stage s: For any α > fs cancel all local parameters and set Δ = ∅. Again, the reader is referred to [5] for more details on the construction and its verification.
References 1. Cooper, S.B.: Enumeration reducibility using bounded information: counting minimal covers. Z. Math. Logik Grundlag. Math. 33(6), 537–560 (1987) 2. Cooper, S.B.: Enumeration reducibility, nondeterministic computations and relative computability of partial functions. In: Ambos-Spies, K., M¨ uller, G., Sacks, G.E. (eds.) Recursion theory week (Oberwolfach, 1989). Lecture Notes in Math, vol. 1432, pp. 57–110. Springer, Heidelberg (1990) 3. Copestake, C.S.: Enumeration Degrees of Σ20 -sets. PhD thesis, Department of Pure Mathematics, University of Leeds (1987)
s-Degrees within e-Degrees
587
4. Jockusch Jr., C.G.: Semirecursive sets and positive reducibility. 131, 420–436 (1968) 5. Kent, T.F.: On a conjecture of Watson (to appear, 2008) 6. McEvoy, K.: The Structure of the Enumeration Degrees. PhD thesis, School of Mathematics, University of Leeds (1984) 7. Odifreddi, P.: Classical Recursion Theory (vol. I), Amsterdam (1989) 8. Sh. Omanadze, R., Sorbi, A.: Strong enumeration reducibilities. Arch. Math. Logic 45(7), 869–912 (2006) 9. Watson, P.: On restricted forms of enumeration reducibility. Ann. Pure Appl. Logic 49(1), 75–96 (1990) 10. Zakharov, S.D.: e- and s-degrees. Algebra i Logika 23(4), 395–406, 478 (1984)
The Non-isolating Degrees Are Upwards Dense in the Computably Enumerable Degrees S. Barry Cooper1 , Matthew C. Salts, and Guohua Wu2, 1
2
School of Mathematics, University of Leeds, Leeds LS2 9JT, U.K. [email protected] http://www.amsta.leeds.ac.uk/ pmt6sbc/ School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 639798, Republic of Singapore [email protected] http://www.ntu.edu.sg/home/guohua/
Abstract. The existence of isolated degrees was proved by Cooper and Yi in 1995 in [7], where a d.c.e. degree d is isolated by a c.e. degree a if a < d is the greatest c.e. degree below d. A computably enumerable degree c is non-isolating if no d.c.e. degree above c is isolated by c. Obviously, 0 is a non-isolating degree. Cooper and Yi asked in [7] whether there is a nonzero non-isolating degree. Arslanov et al. showed in [3] that nonzero non-isolating degrees exist and that these degrees are downwards dense in the c.e. degrees and can also occur in every jump class. In [11], Salts proved that there is an interval of computably enumerable degrees, each of which isolates a d.c.e. degree. Recently, Cenzer et al. [4] proved that such intervals are dense in the computably enumerable degrees, and hence the non-isolating degrees are nowhere dense in the computably enumerable degrees. In this paper, using a different type of construction to that of [3], we prove that the non-isolating degrees are upwards dense in the computably enumerable degrees. In the context of [4], this is the best possible such result.
1 Introduction The existence of isolated degrees was proved by Cooper and Yi in 1995 in [7], where a d.c.e. degree d is isolated by a c.e. degree a if a < d is the greatest c.e. degree below d. Ding and Qian [8], LaForte [10] independently, proved that the isolated degrees, and hence the isolating degrees, are dense in the computably enumerable degrees. In [3], Arslanov, Lempp and Shore proved that the non-isolated degrees are also dense in the computably enumerable degrees. In [13], Wu use the isolation phenomenon an alternative proof of Downey’s diamond embedding theorem. Ishmukhametov and Wu
This research was partially supported by a London Mathematical Society collaborative small grant. The first author was supported by EPSRC grant No. GR /S28730/01, and by the NSFC Grand International Joint Project, No. 60310213, New Directions in the Theory and Applications of Models of Computation, the second author by an EPSRC Research Studentship and the EU Human Capital and Mobility Network Complexity, Logic and Recursion Theory (COLORET), and the third author is partially supported by a research grant No. RG58/06 from Nanyang Technological University.
M. Agrawal et al. (Eds.): TAMC 2008, LNCS 4978, pp. 588–596, 2008. c Springer-Verlag Berlin Heidelberg 2008
The Non-isolating Degrees Are Upwards Dense
589
[12] proved that sometimes, the isolated degrees can be far from the corresponding isolating degrees. That is, there is a high d.c.e. degree isolated by a low c.e. degree. This too has had an interesting application (see [1]), related to Post’s problem for the d.c.e. degrees. In this paper, we are mainly concerned with the non-isolating degrees, where a computably enumerable degree c is non-isolating if no d.c.e. degree above c is isolated by c. Obviously, 0 is a non-isolating degree. Cooper and Yi asked in [7] whether there is a nonzero non-isolating degree. Arslanov et al. showed in [3] that nonzero non-isolating degrees exist and that these degrees are downwards dense in the c.e. degrees and can also occur in every jump class. Arslanov et al. actually proved a stronger result. They first pointed out that for any c.e. degree c and d.c.e. degree d > c, there is a degree a c.e. in c such that c < a < d, and then proved that there is a c.e. degree failing to isolate any 2-CEA degree in it. In [11], Salts proved that there is an interval of computably enumerable degrees, each of which isolates a d.c.e. degree. Recently, Cenzer et al. [4] proved that such intervals are dense in the computably enumerable degrees, and hence the non-isolating degrees are nowhere dense in the computably enumerable degrees. In this paper, we prove that the non-isolating degrees are upwards dense in the computably enumerable degrees, so completing a near comprehensive characterisation of the situation. Theorem 1. For any incomplete c.e. degree a < 0 , there is an incomplete non-isolating degree c above a. Our construction of the non-isolating degrees is direct, and is different from the one given by Arslanov et al. in [3]. In section 2, we show how to construct a non-isolating degree. In section 3, we describe how to combine our construction with the upwards density to prove Theorem 1.
2 Constructing a Non-isolating Degree In this section, we present a new construction of non-isolating degrees. We will construct c.e. sets A, C satisfying the following requirements: Pe : A = Φe ; Qe : C = ΦA e ; e = ΦA ⇒ ∃Be ≤T De ⊕ A(Be ≤T A) ∨ De ≤T A; Re : D e where {(De , Φe ) : d ∈ ω} is an effective list of pairs (D, Φ), where D is a d.c.e. set is the Lachlan set of D, with respect and Φ is a partial computable functional. Here, D to an effective (d.c.e.) approximation {Ds : s ∈ ω}. That is: = { x, s : x ∈ Ds & ∃t > s(x ∈ Dt )}. D ≤T D and it is a c.e. set. From the approximation {Ds : s ∈ ω}, we have Obviously, D x, s is enumerated into D at stage t if t is the following effective enumeration of D: the least stage such x ∈ Dt .
590
S.B. Cooper, M.C. Salts, and G. Wu
The strategy for satisfying the P and Q requirements is the standard FriedbergMuchnik one, and we assume the readers are familiar with it. The R-requirements are non-isolating requirements. That is, for a d.c.e. set D, if D is not reducible to A, then we want to find a c.e. set reducible to A ⊕ D, but not reducible to A, so that A does not or some other, B, isolate A ⊕ D. This c.e. set may either be the natural candidate D, which we will need to construct. On the other hand, if D itself is reducible to A (where, is reducible to A), then we will need to show this fact. of course, D To satisfy Re , we need to construct a c.e. set Be , and a p.c. functional Γe for which Be = ΓeA⊕D . At the same time, we also want to ensure that Be ≤T A, provided that D is not reducible to A. That is, the following requirements should be also satisfied: Se,i :
A Be = ΦA i or there is a p.c. functional Δe,i such that D = Δe,i .
An Re strategy defines Γe at sufficiently large, that is “big”, expansionary stages. Here we say that an expansionary stage is big if the length of agreement between D and ΦA is bigger than any number specified by a substrategy S. Obviously, if there are e infinitely many expansionary stages, there are also infinitely many big expansionary stages. An Re strategy has two outcomes: f for finitely many expansionary stages, and ∞ for infinitely many expansionary stages, with ∞
The Non-isolating Degrees Are Upwards Dense
591
4. Say D(n) changes at stage t > s. There are two possibilities. (a) n enters D at stage t. In this case, the D(n) change undefines ΓeA⊕D (xn ), and and ΦA instead of putting xn into Be immediately at this stage, we wait for D e to agree on (all numbers ≤) n, t . [We delay the enumeration of xn into Be mainly because n may leave D later, making ΓeA⊕D correct, forcing the enumeration of a number into A to change ΓeA⊕D (xn ), resulting in no real progress.] at this stage. Since the computation (b) n leaves D at stage t. Then n, s enters D ΦA ( n, s ) = 0 is preserved at stage s by restraining A, and we get a global e win via ΦA e ( n, s ) = 0 = 1 = D( n, s ). and ΦA 5. At stage t > t, D e agree on (all numbers ≤) n, t . We enumerate xn into Be , and put restraint on A to preserve both computations ΦA i (xn ) = 0, and ΦA ( n, t ) = 0. e [Here, at stage t, n enters D, which undefines ΓeA⊕D (xn ). From now on, we wait and ΦA to exceed n, t , and during this period, we for the agreement between D e A⊕D do not define Γe (xn ). Thus, at stage t , we can enumerate xn into Be directly, as ΓeA⊕D (xn ) is undefined at this stage. If there is no such a stage t , then Re will have outcome f , and below this outcome, no substrategies Se,i are listed. We consider the case when a strategy X is between Re and Se,i . W.o.l.g., suppose that X is an Se ,j -strategy with e < e. Then under outcome f , another strategy is arranged to satisfy the Se ,j -requirement. A nontrivial case is that after X puts a number into Be , De changes now and the permission from De used for the enumeration of the number into Be goes away — (6) is reached. In this case, Re e and ΦA is found. is satisfied permanently because a disagreement between D e We will also see that in the whole construction, only P strategies put numbers into A. This explains why 0 is non-isolating and why the nonzero non-isolating degrees are downwards dense in the c.e. degrees. We will see in the next section that it will not be like this when we prove the upwards density, where a change of K is required.] at this stage. As the 6. At stage t > t , n leaves D. Then as in 4(b), n, t enters D computation ΦA ( n, t ) = 0 is preserved at stage t , by keeping this restraint on A, e we get a global win via ΦA e ( n, t ) = 0 = 1 = D( n, t ). We now consider the outcomes of Se,i , which can run finitely many or infinitely many steps. Notice that if step n2 is in progress, and now we have an A-change so that we come back to 2 of step n1 < n2 , then ΔA e,i (n2 ) is undefined automatically by this A-change. After this, we need to start step n2 from 1. That is, we choose a new xn2 . If some step n passes 3, then we win either by 4(b) or 6, which is a global win via ΦA e = D, where this disagreement is preserved forever. This means that there will be no
592
S.B. Cooper, M.C. Salts, and G. Wu
more Re -expansionary stages, or we win by 5, where we get a D-change at n, where this n remains in D, ensuring that the previous axioms enumerating xn into Γ A,D are invalid forever. If no step passes 3, then notice that we put no restraint on A. It can happen that there A is a least n such that ΦA e (xn ) does not converge to 0, or for each n, Φe (xn ) converges to 0. Se,i is satisfied in both cases: In the former case, Be (xn ) = 0 = ΦA e (xn ); and in the latter case, each step remains at 3, and hence ΔA e,i (n) is defined and equal to D(n). So ΔA e,i = D. So this Se,i module has two outcomes: w and s with w
3 Upwards Density We are now ready to present the strategy for constructing an incomplete non-isolating degree above any incomplete c.e. degree. Fix U as an incomplete c.e. set. We will construct c.e. sets A and C satisfying the following requirements: ; Pe : C = ΦA,U e A,U ⇒ (∃Be ≤T A ⊕ De ⊕ U )(Be ≤T A ⊕ U ) ∨ De ≤T A ⊕ U ; Re : De = Φe where {(De , Φe ) : d ∈ ω} is a standard list of pairs (D, Φ), where D is a d.c.e. set and is the Lachlan set of D. Here we did not Φ is a partial computable functional, and D require that A is not reducible to U , as we can obtain this by applying Sacks’ density theorem first. We apply the Sacks preservation strategy to satisfy the P requirements by running (infinitely many) cycles to threaten the assumption that U is incomplete via a p.c. functional Θ. Cycle n behaves as follows: 1. Choose a big number xn . (xn ) ↓= 0. 2. Wait for ΦA,U e 3. Say ΦA,U (x ) converges to 0 at stage s. Define ΘU (n) = Ks (n) with use θ(n) = n e ϕ(xn ). Restrain A from changing below ϕ(xn ). Wait for a change of K(n) or a change of U below ϕ(n), and simultaneously start the next cycle. 4. Say U changes below ϕ(xn ) first. Then go back to 2. Note that ΘU (n ), each n ≥ n, is undefined by this U change. 5. Say K(n) changes first. Then we put xn into C, and wait for a change of U below ϕ(xn ). [If there is no such a U -change, then ΘU (n) differs from K(n). But as U below ϕ(xn ) is fixed, we satisfy P because ΦA,U (xn ) ↓= 0 is preserved forever, and e C(xn ) = 1. Otherwise, as above, a U -change undefines ΘU (n ), each n ≥ n, allowing us to redefine ΘU (n) = K(n) = 1.]
The Non-isolating Degrees Are Upwards Dense
593
6. Undefine ΘU (n ) for n ≥ n, and redefine ΘU (n) = K(n) = 1 with use 0. Start the next cycle. The P module has the following outcomes: n, w : cycle n stops at 2. P is satisfied since ΦA,U (xn ) does not converge to 0 and e C(xn ) = 0. (xn ) converges to 0 and C(xn ) = 1. n, s : cycle n stops at 2. P is satisfied since ΦA,U e n, u : cycle n runs through the loop 2-3-4-2 infinitely often. P is satisfied since (xn ) does not converge at all, and C(xn ) = 0. ΦA,U e These outcomes are ordered: 0, u
Be = ΦA⊕U or there is a p.c. functional Δe,i such that D = ΔA⊕U i e,i .
As in section 2, a Re strategy defines Γe at big expansionary stages, and has two outcomes: f for finitely many expansionary stages, and ∞ for infinitely many expansionary stages, with ∞
594
S.B. Cooper, M.C. Salts, and G. Wu
Whenever A or U changes below δe,i (n), we go back to 2, in which case, ΔA,U e,i (n) is undefined. [Note that here, n can be in Ds or not in Ds . If n is currently not in Ds , then n can enter D later, and leave at a further stage. is equal to ΦA⊕U , When n enters D, at stage s , we will use the assumption that D e and we want to restrain A from changing to retain as then n, s is not in D, this computation. If n leaves D later (Γ A⊕D⊕U (xj,n ) reverts to a previous value, making D and ΦA⊕U disagree at which is equal to 0), then n, s will enter D, e n, s . This disagreement can now fail only when U changes, which will undefine Γ A⊕D⊕U (xj,n ). If n is in Ds , then we do nothing here, as if n leaves D later, this change will allow us to undefine Γ A⊕D⊕U (xn ), and we can put xj,n into Be . Notice that axioms enumerated into Γe before n enters D are all invalid due to the A or U changes. Since otherwise we will be in the situation described in the last paragraph, and ΦA⊕U .] where n leaving D will cause a disagreement between D e Wait for D(n) to change, and simultaneously commence the step j, n + 1 . 4. Say D(n) changes at stage t > s. There are two possibilities: (a) n enters D at stage t. In this case, the D(n) change makes ΓeA⊕D⊕U (xj,n ) undefined, and instead of putting xj,n into Be immediately at this stage, we and ΦA to agree on (all numbers ≤) n, t . wait for D e (b) n leaves D at stage t. Then this D(n) change necessarily results in ΓeA⊕D⊕U (xj,n ) becoming undefined, and we define ΘU (j) = K(j) with use θ(j) = ϕi (xj,n ), and wait for U to change below θ(j) or K(n) to change. Of course, a restraint is put on A to preserve the computation ΦA⊕U (xj,n ). e If U changes first, go back to (2), and the restraint is canceled. If K(n) changes first, go to (6). and ΦA agree on (all numbers ≤) n, t . We put restraint on A to 5. At stage t > t, D e preserve both computations ΦA,U (xj,n ) = 0, and ΦA,U ( n, t ) = 0. Create a link e i between Se,i and Re , the mother node. This link can be traveled and canceled when we find at Re that U has changes below ϕe ( n, t ). If there is no such a U -change, then this link can survive for ever. We define ΘU (j) = K(j) with use θ(j) = max{ϕi (xj,n ), φe ( x, s )}, and wait for U to change below θ(j), or K(n) to change, or n to leave D. Simultaneously, start cycle j + 1. If K(j) changes first, then go to (6). If n leaves D first, go to (7). If U changes first, see whether U has a change below ϕi (xj,n ). If yes, then go back to (2). Of course, the restraint on A established at stage t will be canceled. and ΦA⊕D to agree on (all numbers ≤) Otherwise, go back to (4) and wait for D e n, t . 6. K(j) changes. We enumerate xj,n into Be , and the current γe (xj,n ) into A. Note that this new γe (xj,n ) is bigger than θ(j), since it is defined after stage t .
The Non-isolating Degrees Are Upwards Dense
595
Wait for U to change below θ(j), and also wait for n to leave D. If n leaves D first, then go to (7). If U changes first, go to (8). at this stage. As the computation 7. At stage t > t , n leaves D. Then n, t enters D A Φe ( n, t ) = 0 is preserved at stage t , we get a global win via ΦA,U ( n, t ) = 0 = 1 = D( n, t ), e provided that U does not change change below ϕe ( n, t ). Wait for U to change below θ(j). 8. Put γe (xj,n ) into A and redefine ΘU (j) = K(j), with use 0. Start cycle j + 1. The outcomes for Se,i are rather more complicated. As in the P-module, we will have outcomes for each cycle, and each cycle has outcomes different from those given in section 2. Below, we only describe the outcome for a particular cycle j. j, w : Cycle j runs many steps, and each one stays at (2) or (3), or returns to (2) infinitely often from (3) or (4b). In this case, no restraint is put on A, and Se,i is satisfied by either ΦA,U (xj,n ) not converging for some xj,n (some step e stops at (2) or loops between (2) and (3), or (4b)), or by ΔA,U e,i becoming totally defined and computing D correctly. [Note that if this outcome is true, a step can return to (2) from other point beyond (5) at most once, via a U change.] No restraint is put on A in this outcome. and Π A⊕U j, n, u : Step n of cycle j reaches (5) and later returns to (4) to wait for D e to agree on number ≤ n, t , infinitely often by U -changes, where t is the stage at which n enters D. That is, infinitely many links to the mother node Re are created and canceled in the construction. If this outcome is true, then ΦA⊕U ( n, t ) diverges, leading to a global win for Re . e No restraint is put on A in this outcome. j, n, d : Step n of cycle j reaches (7) because after (5), n leaves D, no matter whether K(j) has changed or not. As we put a restraint on A at (5), if U does not change below ϕe ( n, t ), then we will have ΦA⊕D ( n, t ) = 0 and n, t ∈ D e — again, a global win for Re . If this outcome applies, then there are only finitely many expansionary stages. That is, Re will have f as its outcome, and we do not put this outcome below Se,i . [Notice that if this outcome applies, ΓeA⊕D⊕U (xj,n ) can be wrong in the case that step n reaches (6), and xj,n is put into Be when n is in D. If later n leaves D, then ΓeA⊕D⊕U (xj,n ) can be reinstated to the one before n enters D, which is currently defined as 0. If so, then either cycle j reaches (7) and stays at (7), or it eventually reaches (8). If the former case, as indicated above, we have a global win for Re . If cycle j eventually reaches (8), then cycle j also wins as ΘU (j) is defined, and is equal to K(j). To keep ΓeA⊕D⊕U (xj,n ) correct, at (8), we also put γe (xj,n ) (the old one) into A. This enumeration does not injure cycle j.]
596
S.B. Cooper, M.C. Salts, and G. Wu
j, s : Some step n of cycle j reaches and stays at (6). Then Se,i is satisfied, via Be (xj,n ) = 1 = 0 = ΦA⊕U (xj,n ). Notice that since n remains in D (othi erwise step n will go to (7), which cannot be true by our assumption), the enumeration of γe (xj,n ) (the new one) into A will not change the computation ΦA⊕U (xj,n ). The enumeration of this γe (xj,n ) makes ΓeA⊕D⊕U welli defined and correct at xj,n . Also note that U will not change so as to injure this computation, since otherwise step n will go to (8), which again cannot be true. We arrange the outcomes of cycle j as: j, w
References 1. Afshari, B., Barmpalias, G., Cooper, S.B., Stephan, F.: Post’s Programme for the Ershov Hierarchy. J. of Logic and Computation (to appear) 2. Arslanov, M.M., Lempp, S., Shore, R.A.: Interpolating d-r.e. and REA degrees between r.e. degrees. Ann. Pure. Appl. Logic 78, 29–56 (1994) 3. Arslanov, M.M., Lempp, S., Shore, R.A.: On isolating r.e. and isolated d-r.e. degrees. In: Computability, enumerability, unsolvability. London Math. Soc. Lecture Note Ser, vol. 224, pp. 61–80. Cambridge Univ. Press, Cambridge (1996) 4. Cenzer, D., LaForte, G., Wu, G.: The nonisolated degrees are nowhere dense (in preparation) 5. Cooper, S.B.: Computability Theory. Chapman & Hall/CRC, Boca Raton, London, New York, Washington, D.C (2004) 6. Cooper, S.B., Salts, M., Wu, G.: The non-isolating degrees are upwards dense in the computably enumerable degrees (to appear) 7. Cooper, S.B., Yi, X.: Isolated d.r.e. degrees. University of Leeds, Dept. of Pure Math. 17 (1995) (preprint series) 8. Ding, D., Qian, L.: Isolated d.r.e. degrees are dense in r.e. degree structure. Arch. Math. Logic 36, 1–10 (1996) 9. Downey, R.G.: D.r.e. degrees and the nondiamond theorem. Bull. London Math. Soc. 21, 43–50 (1989) 10. LaForte, G.: The isolated d.r.e. degrees are dense in the r.e. degrees. Math. Logic Quart. 42, 83–103 (1996) 11. Salts, M.: An interval of computably enumerable isolating degrees. Math. Logic Quart. 45, 59–72 (1999) 12. Ishmukhametov, S., Wu, G.: Isolation and the high/low hierarchy. Arch. Math. Logic 41, 259–266 (2002) 13. Wu, G.: Isolation and lattice embeddings. Jour. Symb. Logic 67, 1055–1064 (2002)
Author Index
Adler, Micah 554 Amano, Kazuyuki 342 Ambos-Spies, Klaus 423 Anberr´ee, Thomas 388 Arslanov, M.M. 568 Badaev, Serikzhan 423 Barra, Mathias 258 Beggs, Edwin 20 Beyersdorff, Olaf 318 Bollig, Beate 306 Cao, Zining 351 Chen, Haiyan 192 Chen, Jianer 212, 433 Cooper, S. Barry 568, 588 Costa, Jos´e F´elix 20 Czyzowicz, J. 170 Demange, M. 364 Doberkat, Ernst-Erich 400 Dobrev, S. 170 Duan, Zhenhua 47, 294 Durocher, Stephane 467 Dwork, Cynthia 1 Ekim, T.
364
Feng, Qilong 82, 212 Fenner, Stephen 70 Fiala, Jiˇr´ı 125 Fu, Bin 234 Gaspers, Serge 479 Golovach, Petr A. 125, 182 Goncharov, Sergey 423 Gonz´ alez-Aguilar, H. 170 Guo, Jiong 445 Heeringa, Brent 554 Heggernes, Pinar 330 H¨ olzl, Rupert 457 Hsieh, Sun-Yuan 160 H¨ uffner, Falk 445, 490
Jain, Rahul 526 Jordan, Skip 94 Kalimullin, I.Sh. 568 Kameda, Tsunehiko 502 Kao, Ming-Yang 234 Kent, Thomas F. 579 Khoussainov, Bakhadyr 514, 542 Kim, Pok-Son 246 Komusiewicz, Christian 445 Kralovic, R. 170 Kranakis, Evangelos 170, 467 Kratochv´ıl, Jan 125, 182 Kristiansen, Lars 148 Krizanc, Danny 467 Kutzner, Arne 246 Lafitte, Gr´egory 375 Lee, Chia-Wei 160 Li, Angsheng 105, 116 Li, Cindy Y. 410 Li, Weilin 116 Liu, Jiamou 542 Loff, Bruno 20 Lou, Xiaowen 59 Marion, Jean-Yves 136 Meister, Daniel 330 Merkle, Wolfgang 457 Minnes, Mia 514, 542 Narayanan, Lata 467 Nayak, Ashwin 526 Niedermeier, Rolf 490 Ning, Dan 212 Opatrny, J. 170 Osterloh, Andre 223 Pan, Yicheng 116 Papadopoulos, Charis 330 P´echoux, Romain 136 Ponta, Oriana 490
598
Author Index
Salts, Matthew C. 588 Santha, Miklos 31 Saurabh, Saket 479 Soskova, M.I. 568 Stacho, L. 170 Stepanov, Alexey A. 479 Su, Yi 526 Tan, Jinsong 282 Tang, Linqing 105, 116 Tarui, Jun 342 Tian, Cong 47 Tucker, John 20 Urrutia, J. Voda, Paul J.
170 148
Wang, Jianxin 82, 212, 433 Wang, Lusheng 234 Weiss, Michael 375 Wong Prudence W.H. 410 Wu, Guohua 588 Xiao, Mingyu 270 Xie, Minzhu 433 Xin, Qin 410 Yung, C.C. Fencol
410
Zeugmann, Thomas 94 Zhang, Haibin 294 Zhang, John Z. 502 Zhang, Yong 70, 445 Zhao, Jitai 204 Zhu, Daming 59