Theoretical Computer Science 410 (2009) 2301–2307
Preface

This special issue is dedicated to Professor Sheng Yu to honor him and celebrate his 60th birthday. It contains a collection of research papers on formal languages and their applications, the general research area in which Professor Sheng Yu has made significant contributions. The authors are colleagues, co-authors, friends and, in many cases, former students and postdocs of Professor Sheng Yu. All the papers have been refereed according to the usual standards of the journal Theoretical Computer Science.

Professor Yu obtained his Doctor of Philosophy degree from the University of Waterloo in 1986 under the guidance of Karel Culik II. After a research visit to Turku, Finland, he taught for two years at Kent State University, and in 1989 took up a position at the University of Western Ontario, where he is currently a Professor of Computer Science.

Professor Yu’s research in theoretical computer science is reflected in more than 150 scientific publications. To mention some examples, the well-known classification of cellular automata into Culik–Yu classes resulted from some of his early work. Since the 1990s, one of the foci of his work has been the descriptional complexity of finite state machines. Indeed, Professor Yu’s papers have made a strong impact in this area. He has solved difficult technical problems, and his work has opened novel and interesting avenues of research. Following this preface, we include a list of Professor Yu’s publications.

Professor Yu has presented many invited plenary lectures at international meetings. He is one of the founders and the Steering Committee chair of the international conference series Implementation and Application of Automata, and has been the program committee chair of numerous international conferences. He will chair the 14th International Conference on Developments in Language Theory, to be held in London, Ontario, in 2010.
The editors and many of the authors of this issue share fond memories of workshops and conferences that Professor Yu has organized at the University of Western Ontario since the mid-1990s, as well as of other occasions of fruitful scientific cooperation. As a collaborator, Professor Yu is innovative, knowledgeable, inspiring, and very reliable. His many Ph.D. students are grateful for his friendly and supportive guidance. We wish Professor Sheng Yu continued success in the years to come.

To conclude, we thank the Editor-in-Chief of TCS-A, Giorgio Ausiello, for the opportunity to publish this special issue. We thank the Journal Manager, Mick van Gijlswijk, for efficient cooperation in handling the issue.

Lucian Ilie
Department of Computer Science, University of Western Ontario, London, Ontario N6A 5B7, Canada
E-mail address: [email protected].

Grzegorz Rozenberg
Leiden Center for Natural Computing LCNC - LIACS, Leiden University, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands
E-mail address: [email protected].

Arto Salomaa
Turku Centre for Computer Science, Joukahaisenkatu 3-5 B, 20520 Turku, Finland
E-mail address: [email protected].

doi:10.1016/j.tcs.2009.02.026
Kai Salomaa∗
School of Computing, Queen’s University, Kingston, Ontario K7L 3N6, Canada
E-mail address: [email protected].

∗ Corresponding editor.
List of Sheng Yu’s Publications

Journal papers

[1] ‘‘Deciding determinism of caterpillar expressions’’, by K. Salomaa, S. Yu, J. Zan, Theoretical Computer Science, to appear.
[2] ‘‘State complexity of basic language operations combined with reversal’’, by G. Liu, C. Martin-Vide, A. Salomaa, S. Yu, Information and Computation 206(9–10) (2008) 1178–1186.
[3] ‘‘The state complexity of two combined operations: Star of catenation and star of reversal’’, by Y. Gao, K. Salomaa, S. Yu, Fundamenta Informaticae 83(1–2) (2008) 75–89.
[4] ‘‘On the state complexity of combined operations and their estimation’’, by K. Salomaa, S. Yu, International Journal of Foundations of Computer Science 18(4) (2007) 683–698.
[5] ‘‘Sc-expressions in object-oriented languages’’, by S. Yu, Q. Zhao, International Journal of Foundations of Computer Science 18(6) (2007) 1441–1452.
[6] ‘‘A family of NFAs free of state reductions’’, by C. Campeanu, N. Santean, S. Yu, Journal of Automata, Languages and Combinatorics 12(1–2) (2007) 69–78.
[7] ‘‘Representation and uniformization of algebraic transductions’’, by S. Konstantinidis, N. Santean, S. Yu, Acta Informatica 43(6) (2007) 395–417.
[8] ‘‘Fuzzification of rational and recognizable sets’’, by S. Konstantinidis, N. Santean, S. Yu, Fundamenta Informaticae 76(4) (2007) 413–447.
[9] ‘‘On the existence of prime decompositions’’, by Y.-S. Han, A. Salomaa, K. Salomaa, D. Wood, S. Yu, Theoretical Computer Science 376(1–2) (2007) 60–69.
[10] ‘‘State complexity of combined operations’’, by A. Salomaa, K. Salomaa, S. Yu, Theoretical Computer Science 383(2–3) (2007) 140–152.
[11] ‘‘Nondeterministic bimachines and rational relations with finite codomain’’, by N. Santean, S. Yu, Fundamenta Informaticae 73(1–2) (2006) 237–264.
[12] ‘‘Subword conditions and subword histories’’, by A. Salomaa, S. Yu, Information and Computation 204 (2006) 1741–1755.
[13] ‘‘Type theory and language constructs for objects with states’’, by H. Xu, S. Yu, Electronic Notes in Theoretical Computer Science 135(3) (2006) 141–151.
[14] ‘‘State complexity: Recent results and open problems’’, by S. Yu, Fundamenta Informaticae 64(1–4) (2005) 471–480.
[15] ‘‘Mergible states in large NFA’’, by C. Campeanu, N. Santean, S. Yu, Theoretical Computer Science 330 (2005) 23–34.
[16] ‘‘Pattern expressions and pattern languages’’, by C. Campeanu, S. Yu, Information Processing Letters 92 (2004) 267–274.
[17] ‘‘On the state complexity of reversals of regular languages’’, by A. Salomaa, D. Wood, S. Yu, Theoretical Computer Science 320 (2004) 293–313.
[18] ‘‘Subword history and Parikh matrices’’, by A. Mateescu, A. Salomaa, S. Yu, Journal of Computer and System Sciences 68(1) (2004) 1–21.
[19] ‘‘Word complexity and repetitions in words’’, by L. Ilie, S. Yu, International Journal of Foundations of Computer Science 15(1) (2004) 41–56.
[20] ‘‘Follow automata’’, by L. Ilie, S. Yu, Information and Computation 186(1) (2003) 140–162.
[21] ‘‘Reducing NFA by invariant equivalences’’, by L. Ilie, S. Yu, Theoretical Computer Science 306(1–3) (2003) 373–390.
[22] ‘‘A formal study of practical regular expressions’’, by C. Campeanu, K. Salomaa, S. Yu, International Journal of Foundations of Computer Science 14(6) (2003) 1007–1018.
[23] ‘‘Decidability of EDT0L structural equivalence’’, by K. Salomaa, S. Yu, Theoretical Computer Science 276(1–2) (2002) 245–259.
[24] ‘‘On the robustness of primitive words’’, by G. Paun, N. Santean, G. Thierrin, S. Yu, Discrete Applied Mathematics 117 (2002) 239–252.
[25] ‘‘Tight lower bound for the state complexity of shuffle of regular languages’’, by C. Campeanu, K. Salomaa, S. Yu, Journal of Automata, Languages and Combinatorics 7(3) (2002) 303–310.
[26] ‘‘Factorizations of languages and commutativity conditions’’, by A. Mateescu, A. Salomaa, S. Yu, Acta Cybernetica 15 (2002) 339–351.
[27] ‘‘A sharpening of the Parikh mapping’’, by A. Mateescu, A. Salomaa, K. Salomaa, S. Yu, Theoretical Informatics and Applications 35 (2001) 551–564.
[28] ‘‘An efficient algorithm for constructing minimal cover automata for finite languages’’, by C. Campeanu, A. Paun, S. Yu, International Journal of Foundations of Computer Science 13(1) (2002) 83–97.
[29] ‘‘Minimal cover-automata for finite languages’’, by C. Campeanu, N. Santean, S. Yu, Theoretical Computer Science 267 (2001) 3–16.
[30] ‘‘On the state complexity of k-entry deterministic finite automata’’, by M. Holzer, K. Salomaa, S. Yu, Journal of Automata, Languages and Combinatorics 6(4) (2001) 453–466.
[31] ‘‘Tree-systems of morphisms’’, by J. Dassow, G. Paun, G. Thierrin, S. Yu, Acta Informatica 38 (2001) 131–153.
[32] ‘‘State complexity of regular languages’’, by S. Yu, Journal of Automata, Languages and Combinatorics 6(2) (2001) 221–234.
[33] ‘‘Efficient implementation of regular languages using reversed alternating finite automata’’, by K. Salomaa, X. Wu, S. Yu, Theoretical Computer Science 231(1) (2000) 103–111.
[34] ‘‘Using DNA to solve the bounded Post correspondence problem’’, by L. Kari, G. Gloor, S. Yu, Theoretical Computer Science 231(2) (2000) 193–203.
[35] ‘‘On fairness of many-dimensional trajectories’’, by A. Mateescu, K. Salomaa, S. Yu, Journal of Automata, Languages and Combinatorics 5(2) (2000) 145–157.
[36] ‘‘Alternating finite automata and star-free languages’’, by K. Salomaa, S. Yu, Theoretical Computer Science 234 (2000) 167–176.
[37] ‘‘Synchronization expressions and languages’’, by K. Salomaa, S. Yu, Journal of Universal Computer Science 5(9) (1999) 610–621.
[38] ‘‘On synchronization in P systems’’, by G. Paun, S. Yu, Fundamenta Informaticae 38(4) (1999) 397–410.
[39] ‘‘Generalized fairness and context-free languages’’, by K. Salomaa, S. Yu, Acta Cybernetica 14 (1999) 193–203.
[40] ‘‘Synchronization expressions with extended join operation’’, by K. Salomaa, S. Yu, Theoretical Computer Science 207 (1998) 73–88.
[41] ‘‘DNA computing, sticker systems and universality’’, by L. Kari, G. Paun, G. Rozenberg, A. Salomaa, S. Yu, Acta Informatica 35 (1998) 401–420.
[42] ‘‘NFA to DFA transformation for finite languages over arbitrary alphabets’’, by K. Salomaa, S. Yu, Journal of Automata, Languages and Combinatorics 2(3) (1997) 177–186.
[43] ‘‘Physical versus computational complementarity I’’, by C. Calude, K. Svozil, S. Yu, International Journal of Theoretical Physics 36(7) (1997) 1495–1523.
[44] ‘‘Language-theoretic complexity of disjunctive sequences’’, by C. Calude, S. Yu, Discrete Applied Mathematics 80(2–3) (1997) 199–205.
[45] ‘‘Structural equivalence and ET0L grammars’’, by K. Salomaa, D. Wood, S. Yu, Theoretical Computer Science 164 (1996) 123–140.
[46] ‘‘On synchronization languages’’, by L. Guo, K. Salomaa, S. Yu, Fundamenta Informaticae 25 (1996) 423–436.
[47] ‘‘Program reuse via kind-bounded polymorphism’’, by S. Yu, Q. Zhuang, Journal of Computing and Information 2(1) (1996) 1163–1181.
[48] ‘‘Complexity of EOL structural equivalence’’, by K. Salomaa, D. Wood, S. Yu, RAIRO Theoretical Informatics and Applications 29(6) (1995) 471–485.
[49] ‘‘Decision problems for patterns’’, by T. Jiang, A. Salomaa, K. Salomaa, S. Yu, Journal of Computer and System Sciences 50(1) (1995) 53–63.
[50] ‘‘P, NP and the Post correspondence problem’’, by A. Mateescu, A. Salomaa, K. Salomaa, S. Yu, Information and Computation 121(2) (1995) 135–142.
[51] ‘‘Measures of nondeterminism for pushdown automata’’, by K. Salomaa, S. Yu, Journal of Computer and System Sciences 49(2) (1994) 362–374.
[52] ‘‘Algorithmic abstraction in object-oriented languages’’, by S. Yu, Q. Zhuang, Journal of Object-Oriented Systems 2 (1995) 217–236.
[53] ‘‘Fuzzy automata in lexical analysis’’, by A. Mateescu, A. Salomaa, K. Salomaa, S. Yu, Journal of Universal Computer Science 1(5) (1995).
[54] ‘‘Pumping and pushdown machines’’, by K. Salomaa, D. Wood, S. Yu, RAIRO Theoretical Informatics and Applications 28(3–4) (1994) 221–232.
[55] ‘‘Decidability of the intercode property’’, by H. Jurgensen, K. Salomaa, S. Yu, Journal of Information Processing and Cybernetics 29(6) (1993) 375–380.
[56] ‘‘Pattern languages with and without erasing’’, by T. Jiang, E. Kinber, A. Salomaa, K. Salomaa, S. Yu, International Journal of Computer Mathematics 50 (1994) 147–163.
[57] ‘‘On the state complexity of some basic operations on regular languages’’, by S. Yu, Q. Zhuang, K. Salomaa, Theoretical Computer Science 125 (1994) 315–328.
[58] ‘‘Transducers and the decidability of independence in free monoids’’, by H. Jürgensen, K. Salomaa, S. Yu, Theoretical Computer Science 134 (1994) 107–117.
[59] ‘‘On sparse languages L such that LL = Σ∗’’, by P. Enflo, A. Granville, J. Shallit, S. Yu, Discrete Applied Mathematics 52 (1994) 275–285.
[60] ‘‘Limited nondeterminism for pushdown automata’’, by K. Salomaa, S. Yu, EATCS Bulletin 50 (1993) 186–193.
[61] ‘‘Attempting guards in parallel: A dataflow approach to execute generalized communication guards’’, by R. Govindarajan, S. Yu, International Journal of Parallel Programming 21(4) (1992) 225–268.
[62] ‘‘Decidability of structural equivalence of EOL grammars’’, by K. Salomaa, S. Yu, Theoretical Computer Science 82 (1991) 131–139.
[63] ‘‘Cellular automata, ωω-regular sets, and sofic systems’’, by K. Culik II, S. Yu, Discrete Applied Mathematics 32 (1991) 85–101.
[64] ‘‘Primality types of instances of the Post correspondence problem’’, by A. Salomaa, K. Salomaa, S. Yu, EATCS Bulletin 44 (1991) 226–241.
[65] ‘‘Computation theoretic aspects of global cellular automata behavior’’, by K. Culik II, L.P. Hurd, S. Yu, Physica D 45 (1990) 357–378.
[66] ‘‘Finite-time behavior of cellular automata’’, by K. Culik II, L.P. Hurd, S. Yu, Physica D 45 (1990) 396–403.
[67] ‘‘Constructions on alternating finite automata’’, by A. Fellah, H. Jurgensen, S. Yu, International Journal of Computer Mathematics 35(3–4) (1990) 117–132.
[68] ‘‘The immortality problem for Lag systems’’, by K. Salomaa, S. Yu, Information Processing Letters 36 (1990) 311–315.
[69] ‘‘A pumping lemma for deterministic context-free languages’’, by S. Yu, Information Processing Letters 31 (1989) 47–51.
[70] ‘‘On the limit sets of cellular automata’’, by K. Culik II, J. Pachl, S. Yu, SIAM Journal on Computing 18(4) (1989) 831–842.
[71] ‘‘Undecidability of CA classification schemes’’, by K. Culik II, S. Yu, Complex Systems 2 (1988) 177–190.
[72] ‘‘The emptiness problem for CA limit sets’’, by K. Culik II, S. Yu, Mathematical and Computer Science Modeling 11 (1988) 363–366.
[73] ‘‘Can the catenation of two sparse languages be dense?’’, by S. Yu, Discrete Applied Mathematics 20 (1988) 265–267.
[74] ‘‘Fault-tolerant schemes for some systolic systems’’, by K. Culik II, S. Yu, International Journal of Computer Mathematics 22 (1987) 13–42.
[75] ‘‘Decision problems resulting from grammatical inference’’, by S. Horvath, E. Kinber, A. Salomaa, S. Yu, Annales Academiae Scientiarum Fennicae, Series A.I. Mathematica 12 (1987) 287–298.
[76] ‘‘On a public-key cryptosystem based on iterated morphisms and substitutions’’, by A. Salomaa, S. Yu, Theoretical Computer Science 48 (1986) 283–296.
[77] ‘‘Real time, pseudo real time and linear time ITA’’, by K. Culik II, S. Yu, Theoretical Computer Science 47 (1986) 15–26.
[78] ‘‘A property of real-time trellis automata’’, by S. Yu, Discrete Applied Mathematics 15 (1986) 117–119.
[79] ‘‘On the equivalence of grammars inferred from derivations’’, by E. Kinber, A. Salomaa, S. Yu, EATCS Bulletin 29 (1986) 186–193.
[80] ‘‘Iterative tree arrays with logarithmic depth’’, by K. Culik II, O.H. Ibarra, S. Yu, International Journal of Computer Mathematics 20 (1986) 187–204.
[81] ‘‘Iterative tree automata’’, by K. Culik II, S. Yu, Theoretical Computer Science 32 (1984) 227–247.

Books and journal issues edited

[82] International Journal of Foundations of Computer Science 16(3) (2005), edited by K. Salomaa, S. Yu.
[83] Implementation and Application of Automata, edited by M. Domaratzki, A. Okhotin, K. Salomaa, S. Yu, Springer LNCS 3317, 2005.
[84] International Journal of Foundations of Computer Science 13(1) (2002), edited by S. Yu.
[85] A Half Century of Automata Theory, edited by A. Salomaa, D. Wood, S. Yu, World Scientific, 2001.
[86] Words, Semigroups, Transductions, edited by M. Ito, G. Paun, S. Yu, World Scientific, 2001.
[87] Implementation and Application of Automata, edited by S. Yu, A. Paun, Springer LNCS 2088, 2001.
[88] Theoretical Computer Science 231(1), edited by K. Salomaa, D. Wood, S. Yu.
[89] Automata Implementation (WIA97), Springer LNCS 1436, 1997, edited by D. Wood, S. Yu.
[90] Automata Implementation (WIA96), Springer LNCS 1260, 1997, edited by D. Raymond, D. Wood, S. Yu.

Book chapters and invited papers

[91] ‘‘State complexity of finite and infinite regular languages’’, by S. Yu, in Current Trends in Theoretical Computer Science: The Challenge of the New Century, Vol. 2, edited by G. Paun, G. Rozenberg, A. Salomaa, World Scientific, 2004, 567–580.
[92] ‘‘On NFA reductions’’, by L. Ilie, G. Navarro, S. Yu, in Theory Is Forever, edited by J. Karhumaki, H. Maurer, G. Paun, G. Rozenberg, Springer LNCS 3113, 2004, 112–124.
[93] ‘‘Finite automata’’, by S. Yu, in Formal Languages and Applications, edited by C. Martin-Vide, V. Mitrana, G. Paun, Studies in Fuzziness and Soft Computing 148, Springer, 2004, 55–85.
[94] ‘‘Class-is-type is inadequate for object reuse’’, by S. Yu, ACM SIGPLAN Notices 36(6) (2001) 50–59.
[95] Chapter 14: ‘‘The time dimension of computation models’’, by S. Yu, in Where Mathematics, Computer Science, Linguistics and Biology Meet, edited by C. Martin-Vide, V. Mitrana, Kluwer, 2001, 161–172.
[96] Chapter 5: ‘‘State complexity of regular languages: Finite versus infinite’’, by C. Campeanu, K. Salomaa, S. Yu, in Finite vs Infinite: Contributions to an Eternal Dilemma, edited by C. Calude, Gh. Păun, Springer, 2000, 53–73.
[97] ‘‘Synchronization expressions: Characterization results and implementation’’, by K. Salomaa, S. Yu, in Jewels are Forever, edited by J. Karhumäki, H.A. Maurer, Gh. Păun, G. Rozenberg, Springer, 1999.
[98] Chapter 2: ‘‘Regular languages’’, by S. Yu, in Handbook of Formal Languages, edited by G. Rozenberg, A. Salomaa, Springer, 1998, 41–110.
[99] Chapter 2: ‘‘Topological transformation of systolic systems’’, by K. Culik II, S. Yu, in Transformational Approaches to Systolic Design, edited by G.M. Megson, Chapman and Hall, 1994, 34–52.
[100] ‘‘Rewriting rules for synchronization languages’’, by K. Salomaa, S. Yu, Lecture Notes in Computer Science 1261, Springer, 1997, 322–338.
[101] ‘‘Rediscovering pushdown machines’’, by K. Salomaa, D. Wood, S. Yu, in Lecture Notes in Computer Science 812, Springer-Verlag, 1994, 372–385.
[102] ‘‘On the state complexity of intersection of regular languages’’, by S. Yu, Q. Zhuang, ACM SIGACT News 22(3) (1991) 52–54.

Papers in conference proceedings

[103] ‘‘State complexity of combined operations for prefix-free regular languages’’, by Y.-S. Han, K. Salomaa, S. Yu, 3rd International Conference on Language and Automata Theory and Applications (LATA 2009), Springer LNCS 5457.
[104] ‘‘Length codes, products of languages and primality’’, by A. Salomaa, K. Salomaa, S. Yu, Language and Automata Theory and Applications (LATA 2008), Springer LNCS 5196, 2008, 476–486.
[105] ‘‘Deterministic caterpillar expressions’’, by K. Salomaa, S. Yu, J. Zan, International Conference on Implementation and Application of Automata (CIAA 2007), Springer LNCS 4783, 97–108.
[106] ‘‘On the state complexity of combined operations’’, by S. Yu, International Conference on Implementation and Application of Automata (CIAA 2006), Springer LNCS 4094, 11–22.
[107] ‘‘State complexity of catenation and reversal combined with star’’, by Y. Gao, K. Salomaa, S. Yu, Descriptional Complexity of Formal Systems (DCFS 2006) 153–164.
[108] ‘‘On weakly ambiguous finite transducers’’, by N. Santean, S. Yu, 10th International Conference on Developments in Language Theory (DLT 2006), LNCS 4036, 156–167.
[109] ‘‘Large NFA without mergible states’’, by C. Campeanu, N. Santean, S. Yu, Proceedings of the 7th Descriptional Complexity of Formal Systems (DCFS 2005) 75–84.
[110] ‘‘Type theory and language constructs for objects with states’’, by H. Xu, S. Yu, International Workshop on Developments in Computational Models (DCM 2005) 45–54.
[111] ‘‘Reducing the size of NFAs by using equivalences and preorders’’, by L. Ilie, R. Solis-Oba, S. Yu, Proceedings of the 16th Annual Symposium on Combinatorial Pattern Matching (CPM 2005), LNCS 3537, 310–321.
[112] ‘‘Adding states into object types’’, by H. Xu, S. Yu, Proceedings of the 2005 International Conference on Programming Languages and Compilers (PLC05) 101–107.
[113] ‘‘Process traces with the option operation’’, by S. Yu, Q. Zhao, Proceedings of the 2004 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA04) (2004) 750–755.
[114] ‘‘Introduction to process traces’’, by L. Ilie, S. Yu, Q. Zhao, Proceedings of the 2003 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA03) (2003) 1706–1712.
[115] ‘‘Fast algorithms for extended regular expression matching and searching’’, by L. Ilie, B. Shan, S. Yu, 20th Annual Symposium on Theoretical Aspects of Computer Science (STACS 2003), LNCS 2607, 179–190.
[116] ‘‘Regex and extended regex’’, by C. Campeanu, K. Salomaa, S. Yu, International Conference on Implementation and Application of Automata (CIAA 2002), LNCS 2608, 81–89.
[117] ‘‘Algorithms for computing small NFAs’’, by L. Ilie, S. Yu, Mathematical Foundations of Computer Science (MFCS 2002), LNCS 2420 (2002) 328–340.
[118] ‘‘Repetition complexity of words’’, by L. Ilie, S. Yu, K. Zhang, International Conference on Computing and Combinatorics (COCOON 2002), LNCS 2387, 320–329.
[119] ‘‘Constructing NFAs by optimal use of positions in regular expressions’’, by L. Ilie, S. Yu, Symposium on Combinatorial Pattern Matching (CPM 2002), LNCS 2373, 279–288.
[120] ‘‘Minimal covers of formal languages’’, by M. Domaratzki, J. Shallit, S. Yu, Developments in Language Theory (2001) 333–344.
[121] ‘‘An O(n²) algorithm for constructing minimal cover automata for finite languages’’, by A. Paun, N. Santean, S. Yu, Fifth International Conference on Implementation and Application of Automata (2000) 233–241.
[122] ‘‘State complexity of regular languages’’, by S. Yu, Descriptional Complexity of Automata, Grammars and Related Structures (1999) 77–88.
[123] ‘‘State complexity of basic operations on finite languages’’, by C. Campeanu, K. Culik II, K. Salomaa, S. Yu, Proceedings of the Fourth International Workshop on Implementing Automata, VIII 1–11, 1999. Also in LNCS 2214, 60–70.
[124] ‘‘Decomposition of finite languages’’ (invited lecture), by A. Salomaa, S. Yu, Proceedings of the Fourth International Conference on Developments in Language Theory (1999) 8–20.
[125] ‘‘Metric lexical analysis’’, by C. Calude, K. Salomaa, S. Yu, Proceedings of the Fourth International Workshop on Implementing Automata, VI 1–12, 1999.
[126] ‘‘Practical rules for reduction on the number of states of a state diagram’’, by J. Ma, S. Yu, Proceedings of the 26th International Conference on Technology of Object-Oriented Languages and Systems (TOOLS USA 98), Santa Barbara, 1998, 46–57.
[127] ‘‘Implementing R-AFA operations’’, by S. Huerter, K. Salomaa, X. Wu, S. Yu, Proceedings of the Third International Workshop on Implementing Automata (WIA98), 1998, 54–64.
[128] ‘‘Minimal cover-automata for finite languages’’, by C. Campeanu, N. Santean, S. Yu, Proceedings of the Third International Workshop on Implementing Automata (WIA98), 1998, 32–42.
[129] ‘‘An efficient implementation of regular languages using r-AFA’’, by K. Salomaa, X. Wu, S. Yu, Proceedings of the Second International Workshop on Implementing Automata (WIA97), 1997, 33–42. Also in LNCS 1436, 176–184.
[130] ‘‘Decidability of fairness for context-free languages’’, by A. Mateescu, K. Salomaa, S. Yu, Proceedings of the Third International Conference on Developments in Language Theory (1997) 351–364.
[131] ‘‘At the crossroads of DNA computing and formal languages: Characterizing recursively enumerable languages using insertion–deletion systems’’, by L. Kari, G. Paun, G. Thierrin, S. Yu, Proceedings of the Third Annual DIMACS Workshop on DNA Based Computers (1997) 329–346.
[132] ‘‘EDT0L structural equivalence is decidable’’, by K. Salomaa, S. Yu, Proceedings of DMTCS96 (Discrete Mathematics and Theoretical Computer Science, Springer), Dec. 1996, 363–375.
[133] ‘‘Loop-free alternating finite automata’’, by K. Salomaa, S. Yu, Proceedings of the 8th International Conference on Automata and Formal Languages, July 29–Aug. 2, 1996, 979–988.
[134] ‘‘NFA to DFA transformation for finite languages’’, by K. Salomaa, S. Yu, International Workshop on Implementing Automata (WIA 1996), Aug. 28–30, 1996. Also in Lecture Notes in Computer Science 1260, Springer, 1997, 149–158.
[135] ‘‘Language-theoretic complexity of disjunctive sequences’’, by C. Calude, S. Yu, Proceedings of the Australian Theory Symposium, 1996, 175–180.
[136] ‘‘Software reuse via algorithm abstraction’’, by S. Yu, Q. Zhuang, Proceedings of the 17th International Conference on Technology of Object-Oriented Languages and Systems (TOOLS-USA 17), 1995, 277–292.
[137] ‘‘Measuring nondeterminism of pushdown automata’’, by K. Salomaa, S. Yu, International Conference on Developments in Language Theory, July 1995, 154–165.
[138] ‘‘Synchronization expressions and languages’’, by L. Guo, K. Salomaa, S. Yu, Proceedings of the IEEE Symposium on Parallel and Distributed Processing, Oct. 1994, 257–264.
[139] ‘‘Rediscovering pushdown machines’’, by K. Salomaa, D. Wood, S. Yu, Results and Trends in Theoretical Computer Science, Colloquium Proceedings, Lecture Notes in Computer Science 812, 1994, 372–385.
[140] ‘‘Inclusion is undecidable for pattern languages’’, by T. Jiang, A. Salomaa, K. Salomaa, S. Yu, Proceedings of the 20th International Colloquium on Automata, Languages, and Programming (ICALP93), 1993, 301–312.
[141] ‘‘Structural equivalence and ET0L grammars’’, by K. Salomaa, D. Wood, S. Yu, Proceedings of the 10th International Conference on Fundamentals of Computation Theory, 1993, 430–439.
[142] ‘‘Characterizing regular languages with polynomial densities’’, by A. Szilard, S. Yu, K. Zhang, J. Shallit, Proceedings of the 17th International Symposium on Mathematical Foundations of Computer Science, Aug. 1992, 494–503 (Lecture Notes in Computer Science 629).
[143] ‘‘State complexity of some basic operations on regular languages’’, by S. Yu, Q. Zhuang, Proceedings of the 4th International Conference on Computer and Information, 1992, 95–99.
[144] ‘‘Iterative tree automata, alternating Turing machines, and uniform Boolean circuits: Relationships and characterizations’’, by A. Fellah, S. Yu, Proceedings of the 1992 ACM Symposium on Applied Computing, 1992, 1159–1166.
[145] ‘‘Degrees of nondeterminism for context-free languages’’, by K. Salomaa, S. Yu, Proceedings of the 8th International Conference on Fundamentals of Computation Theory, Sept. 1991, 380–389.
[146] ‘‘PARC project: Practical constructs for parallel programming languages’’, by S. Yu, L. Guo, R. Govindarajan, P. Wang, Proceedings of the IEEE Fifteenth Annual International Computer Software and Applications Conference, Sept. 1991, 183–189.
[147] ‘‘Attempting guards in parallel: A data flow approach to execute generalized guarded commands’’, by R. Govindarajan, S. Yu, Proceedings of PARLE91 (Parallel Architectures and Languages Europe), June 1991, 372–389.
[148] ‘‘Alternating finite automata’’, by A. Fellah, H. Jurgensen, S. Yu, Proceedings of the International Conference on Computer and Information, 1990, 140–143.
[149] ‘‘Translation of systolic algorithms between systems of different topology’’, by K. Culik II, S. Yu, Proceedings of the IEEE International Conference on Parallel Processing, 1985, 756–763.

Extended abstracts in conference proceedings

[150] ‘‘State complexity: Recent results and open problems’’, by S. Yu, invited talk at the ICALP Formal Language Symposium, 2004.
[151] ‘‘Evolutions of cellular automaton configurations’’, by S. Yu, invited talk at the American Mathematical Society Meeting, Tampa, March 23, 1991.
[152] ‘‘Sofic systems, ωω-rational sets and CA’’, by S. Yu, Cellular Automata: Theory and Experiment, Sept. 1989, Los Alamos.
[153] ‘‘On the computing power of tree architecture’’, by S. Yu, First Montreal Conference on Combinatorics and Computer Science (1987).
[154] ‘‘The emptiness problem of CA limit sets’’, by S. Yu, Sixth International Conference on Mathematical Modeling (1987).
[155] ‘‘ITA with logarithmic depths’’, by S. Yu, invited talk at the 1986 Finnish Mathematics Conference.
Theoretical Computer Science 410 (2009) 2308–2315
The parallel complexity of signed graphs: Decidability results and an improved algorithm

Artiom Alhazov a,b, Ion Petre c,d,∗, Vladimir Rogojin a,d

a Department of Information Technologies, Åbo Akademi University, Finland
b Institute of Mathematics and Computer Science, Academy of Sciences of Moldova, Str. Academiei 5, Chişinău, MD-2028, Republic of Moldova
c Academy of Finland, Finland
d Turku Centre for Computer Science, FIN-20520 Turku, Finland
Keywords: Ciliates; Parallel gene assembly; Parallel complexity; Signed graphs; Algorithm; Decidability
abstract

We consider a graph-based model for the process of gene assembly in ciliates, as proposed in [A. Ehrenfeucht, T. Harju, I. Petre, D.M. Prescott, G. Rozenberg, Computation in Living Cells: Gene Assembly in Ciliates, Springer, 2003]. The model consists of three operations, each reducing the order of the signed graph. Reducing the graph to the empty graph through a sequence of operations corresponds to assembling a gene. We investigate parallel reductions of a given signed graph, where the graph is reduced through a sequence of parallel steps. A parallel step consists of operations such that all of their sequential compositions are applicable to the current graph. We improve the basic exhaustive-search algorithm reported in [A. Alhazov, C. Li, I. Petre, Computing the graph-based parallel complexity of gene assembly, Theoretical Computer Science, 2008 (in press)] for computing the parallel complexity of signed graphs. On the one hand, we reduce the number of sets of operations that must be checked for parallel applicability; on the other hand, we speed up the parallel applicability check procedure. We also prove that deciding whether a given parallel composition of operations is applicable to a given signed graph is a coNP problem. Deciding whether the parallel complexity (the length of a shortest parallel reduction) of a signed graph is bounded by a given constant is in NP^NP.

© 2009 Elsevier B.V. All rights reserved.
1. Introduction

Ciliates are an old and diverse group of unicellular eukaryotes that, as a unique feature, possess two kinds of nuclei. The macronucleus is the somatic nucleus, while the micronucleus is the germline nucleus. The micronucleus remains silent throughout the life cycle, except at a certain stage following ciliate conjugation. Then ciliates destroy all old micronuclei and macronuclei and transform a mitotic copy of the micronucleus into a macronucleus. This process involves massive DNA manipulations, with a large amount of DNA being excised, inverted, and/or translocated. The reason for these manipulations lies in the drastically different genome structure in the micronuclei and the macronuclei. Macronuclear genes, for example, are continuous DNA sequences. The same gene in the micronucleus is broken into many coding blocks, presented in a shuffled order, some of them even inverted, separated by non-coding blocks. The transformation from a micronucleus to a macronucleus requires identifying all coding blocks and assembling them in the correct order, while excising all non-coding blocks. We refer to [11] for a survey of this topic.
∗ Corresponding author at: Turku Centre for Computer Science, FIN-20520 Turku, Finland.
E-mail addresses: [email protected] (A. Alhazov), [email protected] (I. Petre), [email protected] (V. Rogojin).
0304-3975/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2009.02.028
A. Alhazov et al. / Theoretical Computer Science 410 (2009) 2308–2315
A clue to how gene assembly is possible is given by the special structure of all coding blocks: each coding block ends with a nucleotide sequence (called a pointer) that is repeated at the beginning of the coding block that should follow it in the assembled macronuclear gene. We consider in this paper an intramolecular model for gene assembly proposed in [3,12]. A different, intermolecular model was previously proposed in [9]. The intramolecular model consists of three operations called ld, hi, and dlad. Each of them postulates that the DNA molecule folds on itself into a shape that is specific to that operation, in such a way as to enable recombination of consecutive coding blocks on their common pointers. These molecular operations have been described in a number of previous publications such as [2,7,8]. It is enough for the purpose of this paper to focus on a mathematical model associated with them in terms of signed graphs. To each micronuclear gene one may associate the string consisting of its sequence of pointers, where each pointer is denoted by a letter. The inversion of a pointer p is denoted by p̄. The resulting structure is a signed double occurrence string. One can then further associate to this string (and so, to the gene) the corresponding signed overlap graph. The three molecular operations can then be formulated as rewriting rules for signed graphs in such a way that the result of a graph operation models the result of the corresponding molecular operation. For all the details of these transformations we refer to [2,8]. In this paper we focus exclusively on the graph theoretical formalism associated to gene assembly, which we introduce in Section 3. We focus on a notion of parallelism, which is most natural to consider from a biological perspective.
In the graph theoretical framework of our paper, parallelism is defined as follows: a set of operations is applicable in parallel to a graph if all sequential compositions of operations in this set are applicable to this graph. In this case it follows that all sequential compositions of operations lead to the same result; see [6]. This notion enables a notion of complexity in terms of the minimum number of parallel steps needed to reduce the graph to the empty one. We recall from [1] the following table giving the size n of the smallest known graphs with complexity c for small values of c:

c  1  2  3  4  5   6
n  1  2  3  5  12  24
A number of partial results have been obtained, see [6,5,4], but the main problem of this research area remains open: Is the parallel complexity of signed graphs finitely bounded? Addressing this question, we establish in this paper several results related to its computational complexity. We prove that for a signed graph G (i) it is a coNP problem whether a set of operations is applicable to G; (ii) it is a coNP problem whether a sequence of sets of operations is a parallel reduction of G; (iii) it is an NP^NP problem whether the parallel complexity of G is bounded by a given constant. An algorithm to compute the parallel complexity and an optimal parallel reduction for a given signed graph was introduced in [1]. Its complexity has been estimated as O(n^{2n+4} d^n) for d = e^2/√8. We propose in this paper a speed-up
on this algorithm that remains, however, of prohibitive computational complexity.

2. Preliminaries

A signed graph is a triple G = (V, E, σ), where (V, E) is an undirected graph without loops and σ : V → {+, −}. The edge between u, v ∈ V is denoted by uv. Since the graph is undirected, we make the convention that uv = vu. We let V+ = σ^{−1}(+) and V− = σ^{−1}(−). By N_G(u) = {v ∈ V | uv ∈ E} we denote the neighborhood of u ∈ V. For signed graphs G1 = (V1, E1, σ1) and G2 = (V2, E2, σ2), we will need the following graph-theoretic operations:
• G1 ∩ G2 = (V, E, σ), with V = V1 ∩ V2, E = E1 ∩ E2 and σ = σ1|V (defined when σ1|V = σ2|V), is the intersection of graphs;
• G1 ∪ G2 = (V, E, σ), with V = V1 ∪ V2, E = E1 ∪ E2 and σ = σ1 ∪ σ2|V2\V1, is the union of graphs (on V1 ∩ V2 we take the signing from G1);
• G1 \ G2 = (V1, E1 \ E2, σ1) is the graph G1 without the edges of G2;
• G1 ∆ G2 = (G1 \ G2) ∪ (G2 \ G1) is the graph formed by the symmetric difference of the edges of G1 and G2.

For a set S ⊆ V we denote by G|S = (S, E ∩ (S × S), σ|S) the subgraph generated by S. We also write G − S = G|V\S. For a set S ⊆ V we denote by K_G(S) = (S, {uv | u, v ∈ S, u ≠ v}, σ|S) the clique generated by S. For sets S1, S2 ⊆ V with S1 ∩ S2 = ∅, we use the notation K_G(S1, S2) = (S1 ∪ S2, {uv | u ∈ S1, v ∈ S2}, σ|S1∪S2) to represent the biclique (also called the complete bipartite graph) generated by S1 and S2. For a graph G = (V, E, σ), we denote by neg(G) = (V, E, σ′), where σ′(u) = − if and only if σ(u) = +, u ∈ V, the graph with complemented signing. Then com(G) = neg(K_G(V) \ G) stands for the graph with complemented edges and signing. For a set S ⊆ V, the graph with complemented edges and signing over S is com_S(G) = com(G|S) ∪ (G \ K_G(S)). Finally, for a node u ∈ V we denote by loc_u(G) = com_{N_G(u)}(G) the graph with complemented edges and signing over the neighborhood of u; we also refer to it as the graph G with the complemented neighborhood of u.
Fig. 1. Graphs (a) G, (b) gnr_1(G), (c) gpr_2(G) and (d) gdr_{6,7}(G).
3. Three graph operations The following three graph operations have been introduced as a model for gene assembly in ciliates. Each micronuclear and intermediate gene is modeled as a signed graph and its assembly process is modeled as a composition of the three operations. For details on this model we refer to [2]. Definition 1. Consider a signed graph G = (V , E , σ ).
• The operation gnr_x is applicable to vertices x ∈ V− with N(x) = ∅. In this case, gnr_x(G) = G − {x}.
• The operation gpr_x is applicable to vertices x ∈ V+. In this case, gpr_x(G) = loc_x(G) − {x}.
• The operation gdr_{x,y} is applicable to adjacent vertices x, y ∈ V−. In this case, gdr_{x,y}(G) = (G ∆ G_{x,y}) − {x, y}, where G_{x,y} = K_G(N_G(x), N_G(y) \ N_G(x)) ∪ K_G(N_G(y), N_G(x) \ N_G(y)). Equivalently, for p, q ∈ V \ {x, y}, we have pq ∈ G ∆ gdr_{x,y}(G) if and only if
  · p ∈ N_G(x) and q ∈ N_G(y) \ N_G(x), or
  · p ∈ N_G(x) \ N_G(y) and q ∈ N_G(x) ∩ N_G(y).

The sets of all gnr, gpr and gdr operations are denoted by GNR, GPR and GDR, respectively. We also use the notations dom(gnr_x) = {x}, dom(gpr_x) = {x} and dom(gdr_{x,y}) = {x, y}. We extend the notation to sets: dom(S) = ⋃_{r∈S} dom(r), for S ⊆ GNR ∪ GPR ∪ GDR. For an operation r and a sequential composition ϕ of operations in GNR ∪ GPR ∪ GDR, we say that ϕ ◦ r is applicable to G if r is applicable to G and ϕ is applicable to r(G). If ψ = r_k ◦ · · · ◦ r_1, with r_1, . . . , r_k ∈ GNR ∪ GPR ∪ GDR, is applicable to G and ψ(G) = ∅, then we say that ψ is a sequential reduction of G. Applications of a gnr operation, a gpr operation and a gdr operation are illustrated by an example in Fig. 1.

3.1. Parallelism

We discuss in this section a notion of parallelism, as introduced in [6].

Definition 2. We say that a set of operations S is applicable in parallel to G if any permutation ϕ of operations from S is applicable to G.

We recall the following lemma.

Lemma 1 ([6]). If S is applicable in parallel to G, then for any two sequential compositions ϕ1, ϕ2 of the operations in S, we have ϕ1(G) = ϕ2(G).

Therefore, whenever a set S is applicable in parallel to G, we denote S(G) = ϕ(G), where ϕ is an arbitrary sequential composition of all operations from S. We recall the following criterion for the applicability in parallel of operations from GNR ∪ GPR ∪ GDR.

Lemma 2 ([6]).
Consider a signed graph G and a subset S ⊆ GNR ∪ GPR ∪ GDR of operations applicable to G. Then S is applicable in parallel to G if and only if N_{G|dom(S)}(u) = ∅ for all gpr_u ∈ S and S ∩ GDR is applicable in parallel to G.

Definition 3. For a signed graph G and sets S1, . . . , Sm ⊆ GNR ∪ GPR ∪ GDR, we say that Φ = Sm ◦ · · · ◦ S1 is applicable in parallel to G if Si is applicable in parallel to S_{i−1} ◦ · · · ◦ S1(G), for all 1 ≤ i ≤ m. We call each of the sets Si, 1 ≤ i ≤ m, a parallel step of Φ. We say that Φ is a parallel reduction of G if, moreover, Φ(G) = ∅. The parallel complexity of Φ is the number of parallel steps in Φ: C(Φ) = m. The parallel complexity of the graph G is
C (G) = min{C (R) | R is a parallel reduction of G}.
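The three graph operations of Definition 1 can be prototyped directly over an explicit set-based representation. The following is a minimal sketch of ours (the representation and helper names are not from [1] or [13]): a graph is a triple of a vertex set, a set of two-element frozensets for the edges, and a sign dictionary, and each rule asserts its applicability condition before rewriting.

```python
from itertools import combinations

def nbrs(E, u):
    """Neighborhood of u in an edge set of two-element frozensets."""
    return {w for e in E if u in e for w in e if w != u}

def _drop(V, E, sigma, X):
    """Delete the vertices in X together with all incident edges."""
    return (V - X,
            {e for e in E if not (e & X)},
            {v: s for v, s in sigma.items() if v not in X})

def gnr(V, E, sigma, x):
    # gnr_x: x is negative and isolated; simply delete it
    assert sigma[x] == '-' and not nbrs(E, x)
    return _drop(V, E, sigma, {x})

def gpr(V, E, sigma, x):
    # gpr_x: x is positive; complement edges and signs over N(x), delete x
    assert sigma[x] == '+'
    N = nbrs(E, x)
    E2 = set(E)
    for u, v in combinations(sorted(N), 2):
        E2 ^= {frozenset((u, v))}                 # toggle edge uv
    s2 = {v: (('-' if s == '+' else '+') if v in N else s)
          for v, s in sigma.items()}
    return _drop(V, E2, s2, {x})

def gdr(V, E, sigma, x, y):
    # gdr_{x,y}: x, y negative and adjacent; toggle every edge joining two
    # distinct classes among N(x)\N(y), N(y)\N(x), N(x)∩N(y); delete x, y
    assert sigma[x] == '-' and sigma[y] == '-' and frozenset((x, y)) in E
    Nx, Ny = nbrs(E, x) - {y}, nbrs(E, y) - {x}
    E2 = set(E)
    for A, B in combinations([Nx - Ny, Ny - Nx, Nx & Ny], 2):
        for u in A:
            for v in B:
                E2 ^= {frozenset((u, v))}
    return _drop(V, E2, sigma, {x, y})
```

For instance, on the triangle with vertices 1, 2, 3 and σ(1) = +, σ(2) = σ(3) = −, applying gpr_1 deletes vertex 1, removes the edge 23 and flips the signs of 2 and 3.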
4. A complexity result

We prove in this section that deciding whether a given set of gdr operations is applicable to a given graph is a coNP problem. We also prove that deciding whether the parallel complexity of a given signed graph is at most k, for a given k, is an NP^NP problem.

Definition 4. We recall first a few notions of computational complexity. For details we refer to [10].
• A problem is said to be in the class NP if it can be solved on a non-deterministic Turing machine in polynomial time. Equivalently, a problem P is in the class NP if and only if its solution (the computation that ends with answer yes) can be verified in polynomial time by a deterministic Turing machine. Note that verifying the solution does not include finding the solution. The dual problem P′, for which the answer is no if and only if the answer to P is yes, is called a coNP problem. Equivalently, a problem is in the class coNP if and only if a counter-example (the computation that ends with answer no) can be verified in polynomial time.
• An oracle is an always-halting Turing machine whose computation is abstracted and counted as a single (macro)step, part of a larger computation of a different Turing machine. An oracle machine is a Turing machine connected to an oracle. The machine is able to query the oracle on various inputs throughout its computation, get the answer in one step, and continue its computation according to the answer it receives. We refer to [10] for a formal definition of an oracle machine.
• If C is an arbitrary (deterministic or nondeterministic) complexity class and A is an arbitrary oracle, the complexity class C^A consists of all languages that can be decided by machines deciding the class C, extended with the oracle A. If the oracle A is in the complexity class C′, then we obtain the complexity class C^{C′}. In particular, a problem is said to be in the class NP^NP if it can be solved on a non-deterministic Turing machine with NP oracles in a polynomial number of meta-steps. A meta-step is understood here as either a transition of the Turing machine, or asking the oracle the answer to a problem in NP and modifying the state of the Turing machine depending on the oracle's answer. Equivalently, a problem is in the class NP^NP if and only if its solution can be verified in polynomial time using NP oracles.
We can prove now the following results on the parallel complexity of a signed graph.

Lemma 3. Let G be a signed graph and S ⊆ GNR ∪ GPR ∪ GDR a set of operations. Deciding whether S is applicable to G in parallel is a coNP problem.

Proof. It follows from Definition 2 that S is not applicable in parallel to G if and only if there exists a sequential composition of the operations in S that is not applicable to G. Verifying such a composition can be done in polynomial time.

Lemma 4. Let G be a signed graph and S1, . . . , Sk, k ≥ 1, some sets of operations. Deciding whether Sk ◦ · · · ◦ S1 is a parallel reduction of G is a coNP problem.

Proof. Clearly, Sk ◦ · · · ◦ S1 is not a parallel reduction if (i) there exists 1 ≤ i ≤ k such that Si is not applicable in parallel to S_{i−1} ◦ · · · ◦ S1(G), or (ii) Sk ◦ · · · ◦ S1 is applicable to G but Sk ◦ · · · ◦ S1(G) ≠ ∅. Deciding the problem can be done in polynomial time by a non-deterministic Turing machine as follows. First, guess whether (i) or (ii) is to be checked. In the case of (ii), for each Si, 1 ≤ i ≤ k, let φi be an arbitrary sequential composition of all operations in Si and compute in polynomial time φk ◦ · · · ◦ φ1(G). In the case of (i), guess first an i, 1 ≤ i ≤ k, and then guess an operation f ∈ Si and a sequential composition ψi of operations of Si \ {f}. Then check whether f is not applicable to ψi ◦ φ_{i−1} ◦ · · · ◦ φ1(G).

Theorem 5. Let G be a signed graph and k ≥ 1. Deciding whether C(G) ≤ k is an NP^NP problem.

Proof. With a nondeterministic Turing machine we may guess in polynomial time some sets of operations S1, . . . , Sl, l ≤ k, and then, using an NP oracle, we may verify as in the proof of Lemma 4 whether Sl ◦ · · · ◦ S1 is a parallel reduction of G.

Corollary 6. Given a signed graph G and an integer k, deciding whether C(G) ≥ k is a coNP^NP problem.

5.
Computing the parallel complexity: The basic algorithm

The algorithm in [1] to compute the parallel reduction complexity C(G) for the graph G, referred to in what follows as the basic algorithm, is essentially based on the following basic observation:
C(G) = 1 + min{C(G′) | G′ = S(G), S ⊆ GNR ∪ GPR ∪ GDR}.   (1)
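Recursion (1) can be phrased generically, with the operations passed as opaque values together with applicability and application callbacks (the interface and all names below are ours, not the implementation of [1]). Parallel applicability is checked directly by Definition 2, i.e., over all permutations:

```python
from itertools import permutations

def parallel_applicable(G, S, applicable, apply_op):
    # Definition 2: every permutation of S must be sequentially applicable to G
    for order in permutations(S):
        H = G
        for r in order:
            if not applicable(H, r):
                return False
            H = apply_op(H, r)
    return True

def parallel_complexity(G, ops_of, applicable, apply_op, is_empty):
    # recursion (1): C(G) = 1 + min { C(S(G)) : S applicable in parallel to G }
    if is_empty(G):
        return 0
    avail = [r for r in ops_of(G) if applicable(G, r)]
    best = float('inf')
    for mask in range(1, 1 << len(avail)):
        S = [avail[i] for i in range(len(avail)) if mask >> i & 1]
        if parallel_applicable(G, S, applicable, apply_op):
            H = G
            for r in S:              # in the gene-assembly model any order
                H = apply_op(H, r)   # yields the same graph (Lemma 1)
            best = min(best, 1 + parallel_complexity(H, ops_of, applicable,
                                                     apply_op, is_empty))
    return best

# toy instance (not a signed graph): removing an element r of a set is
# allowed only while r - 1 is absent, so {1, 2} needs two parallel steps
ops_of = lambda G: sorted(G)
applicable = lambda G, r: r in G and (r - 1) not in G
apply_op = lambda G, r: G - {r}
is_empty = lambda G: not G
assert parallel_complexity(frozenset({1, 2}), ops_of, applicable,
                           apply_op, is_empty) == 2
```

The double exponential cost of this scheme (all subsets, all permutations) is exactly what Sections 6.1 and 6.2 attack.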
We denote by app(G) the set of all operations applicable to G. We compute the parallel reduction complexity C (G) as follows.
• If G is empty, then C(G) = 0;
• For all subsets S ⊆ app(G) applicable in parallel to G do:
  · Let G′ = S(G);
  · Compute C(G′) (using the same algorithm as for G);
• Choose S yielding the minimal C(G′);
• Then C(G) = 1 + C(G′).

The algorithm is detailed in [1] on three levels as follows:
(i) For a given graph, construct all sets S of operations from GNR ∪ GPR ∪ GDR, with each operation applicable to G.
(ii) For every set S built at (i), check whether it is applicable in parallel to G.
(iii) Repeat the algorithm for all graphs S(G).

For step (i) of the algorithm, one has to consider at most 2^{n(n−1)/2} sets of operations for a graph G with n vertices. For step (ii), for m gdr operations, one should verify that all m! sequential compositions are applicable to G; the check related to gnr and gpr operations can be done in linear time based on Lemma 2. A greedy type of simplification can be considered: investigate only maximal sets of operations S, i.e., sets such that for all T with S ⊊ T, T is not applicable in parallel to G. This greedy algorithm may, however, fail to give a reduction strategy with the minimal number of steps. For such an example, we refer to [1].

6. An improved algorithm

We discuss in this section two improvements over the algorithm in [1], presented in Section 5. On the one hand, we reduce the number of sets of applicable operations that need to be considered throughout the algorithm. On the other hand, we reduce the number of sequential compositions that need to be checked in step (ii) from m! to 2^m.

6.1. A different strategy for computing the parallel complexity

We focus first on decreasing the number of sets of applicable operations considered throughout the algorithm. We illustrate the idea on an example with two rules. Assume that some operations r1, r2 are applicable in parallel to a graph G.
When computing the parallel complexity of G, one should consider at least the following three cases for the first step of a parallel reduction of G: {r1}, {r2}, and {r1, r2}. Assume now that, after choosing {r1} in the first step of a parallel reduction, we choose a set {r2} ∪ S in the second step of the reduction. We claim that this case need not be considered, since it yields the same complexity as a reduction applying {r1, r2} in the first step and S in the second. Indeed, in this case S is applicable in parallel to r2(r1(G)) = {r1, r2}(G) and (S ◦ {r1, r2})(G) = ((S ∪ {r2}) ◦ {r1})(G). The argument above can be generalized to the following result.

Theorem 7. Let G be a signed graph and U, V, W ⊆ GNR ∪ GPR ∪ GDR such that (W ∪ V) ◦ U is applicable to G. If V ∪ U is applicable in parallel to G, then ((W ∪ V) ◦ U)(G) = (W ◦ (V ∪ U))(G).

Proof. The result is straightforward. From Lemma 1, noting that (V ◦ U)(G) = (V ∪ U)(G), it follows that both sides are well-defined and equal to (W ◦ V ◦ U)(G).

Note that Theorem 7 does not imply that a greedy strategy, where each parallel step is maximal, leads to a minimal strategy. It only implies that in the next parallel step, S_{i+1}, one need not consider operations that could be applied in parallel with the current step, S_i. However, S_i need not be maximal: operations that are applicable in parallel with S_i may be considered for steps S_j, with j ≥ i + 2.

6.2. A faster test for the parallel applicability of a set of operations

We discuss now the problem of checking the parallel applicability of a set of operations. Since we construct the sets of operations incrementally, the problem we are interested in is the following: given a graph G, a set S of operations applicable in parallel to G, and an operation r ∉ S applicable to G, verify whether S ∪ {r} is applicable in parallel to G. We only consider this problem for the case when S ∪ {r} ⊆ GDR.
A straightforward approach, implemented in the basic algorithm of [1], is to consider all sequential compositions of the operations in S ∪ {r}. Let us assume instead that a total order relation < is defined on the set of all operations. For a set S of operations, we denote by lex(S) the sequential composition of the elements of S in increasing order with respect to <.

Lemma 8. Let G be a signed graph and S a set of operations. If for all S′ ⊆ S and r ∈ S \ S′ the composition r ◦ lex(S′) is applicable to G, then S is applicable in parallel to G.
Note that, although the hypothesis of the lemma implicitly requires that lex(S′) be applicable to G, this condition is checked automatically if the subsets S′ ⊆ S are considered in increasing order of inclusion, as the statement of the lemma then guarantees it. We now proceed with the proof.

Proof. We prove that all subsets S′ ⊆ S are applicable in parallel to G by induction on the cardinality of S′. The claim is trivially true for ∅. Assume that, for a given k, every subset with fewer than k operations is applicable in parallel to G. Consider an arbitrary subset S′ ⊆ S with |S′| = k. Take an arbitrary sequential composition ψ of the operations in S′. Then we can write ψ = r ◦ ψ′, where r is an operation and ψ′ is a sequential composition of the operations in the set S′′ = S′ \ {r}. By the induction hypothesis, S′′ is applicable in parallel to G and ψ′(G) = (lex(S′′))(G). By the premise of the lemma, r is applicable to (lex(S′′))(G), so ψ is applicable to G. Since ψ was chosen arbitrarily, S′ is applicable in parallel to G.

Lemma 8 gives a way to test the parallel applicability of a set S of k operations by considering 2^k sequential compositions instead of k!. Indeed, for each of the 2^k subsets S′ of S, one only needs to verify the applicability of lex(S′) to G. When S is constructed in an incremental way, as in the algorithm of [1], the test is faster, as shown in the next result.

Lemma 9. Let G be a signed graph and S a set of k − 1 operations applicable in parallel to G, k ≥ 1. For any r ∉ S applicable to G, we may decide the parallel applicability of S ∪ {r} to G by applying at most k·2^{k−1} operations in GNR ∪ GPR ∪ GDR.

Proof. One only needs to verify that, for all S′ ⊆ S, r ◦ lex(S′) is applicable to G.

Based on Lemmas 2 and 9, we can now give the following procedure to check the parallel applicability of a set of operations; see function Check.
Input: graph G, set S, op r
Output: boolean
Data: op r′, set S′

  if r ∈ GPR then
    return N_G(dom(r)) ∩ dom(S) = ∅;
  else if r ∈ GDR then
    if N_G(dom(r)) ∩ dom(S ∩ GPR) ≠ ∅ then return false;
    foreach S′ ⊆ (S ∪ {r}) ∩ GDR do
      foreach r′ ∈ (S ∪ {r}) \ S′ do
        if not applicable_{r′ ◦ lex(S′)}(G) then return false;
    return true;

Function Check. Deciding whether the operation r is applicable in parallel with the set S of operations.
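The subset-based test of Lemmas 8 and 9 is straightforward to code. In the sketch below (the interface passes operations as opaque values with applicability/application callbacks; all names are ours), the subsets S′ are enumerated in increasing size, lex(S′) is applied, and every remaining operation is tested on the result, so a set of k operations costs 2^k prefix checks rather than k! permutations:

```python
from itertools import combinations

def parallel_applicable_lex(G, ops, applicable, apply_op):
    # Lemma 8: S is applicable in parallel to G if for every S' ⊊ S and
    # every r ∈ S \ S', the composition r ∘ lex(S') is applicable to G.
    ops = sorted(ops)                      # the fixed total order <
    for k in range(len(ops)):              # subsets in increasing size, so
        for Sp in combinations(ops, k):    # lex(S') is already known applicable
            H = G
            for r in Sp:                   # apply lex(S') in increasing order
                H = apply_op(H, r)
            if any(not applicable(H, r) for r in ops if r not in Sp):
                return False
    return True

# toy check (not a signed graph): removing an element r of a set is
# allowed only while r + 1 is still present; {1, 2} is not parallel on {1,2,3}
G = frozenset({1, 2, 3})
pred = lambda H, r: r in H and (r + 1) in H
assert parallel_applicable_lex(G, [1, 2], pred, lambda H, r: H - {r}) is False
```

Because the subsets are scanned by increasing size and the function returns at the first failed check, every blind application of lex(S′) above is performed only after its applicability has already been certified, mirroring the remark before the proof of Lemma 8.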
6.3. The new algorithm

The new strategy proposed in Section 6.1 to compute the parallel complexity of a signed graph aims to investigate the parallelizations of sequential reductions of the graph. Rather than investigating all possible sequential reductions, we propose an idea of ''parallelization on the fly'', as explained below. Assume a total order relation < on all operations in GNR ∪ GPR ∪ GDR, and assume that we have already chosen a set S of rules applicable in parallel to G. We then examine all possible operations r applicable to S(G) as follows. If r′ < r for all r′ ∈ S, we write S < r and also r > S. We now explain the algorithm for finding the parallel complexity and an associated strategy. The answer is obtained by calling the function Complexity, giving it as parameters the corresponding graph, the empty set, the same graph, the empty set again, and the number of nodes plus one. The function Complexity takes five parameters: a graph G, a set S of operations already chosen to be applied in the current step, the graph G′ before the previous step, the set F of operations applied in the previous step, and an integer bound. The function returns the best reduction strategy of G in fewer than bound steps, with the first step of the reduction including S. At the same time, based on Theorem 7, the first step of the reduction may not include any operation applicable in parallel with F to G′. The recursion consists of checking all possible operations r ∈ GPR ∪ GDR applicable to S(G). If r is not applicable in parallel with S, then we consider a possible reduction where the current step remains S and the next parallel step includes {r}, while excluding any operation applicable in parallel with S. Otherwise, if r > S, then r is added to the current step S of the reduction and the scan continues. If r < S, then it is not added to the current step S. In this way, for any G we consider any parallel step applicable to G at most once.
Input: graph G, set S, graph G′, set F, integer bound
Output: integer, strategy
Data: strategy R, R′; integer i; set S′

  S′ ← app(S(G));
  if S′ \ GNR = ∅ then
    if G = ∅ then return (0, ∅);
    else if N_G(u) = ∅ for all gnr_u ∈ S′ then return (1, S ∪ S′);
    else return (2, S′ ◦ S);
  else
    R ← ∅;
    if bound > 1 then
      foreach r ∈ S′ \ GNR do
        if Check(G, S, r) = false then
          (i, R′) ← Complexity(S(G), {r}, G, S, bound − 1);
          if i + 1 < bound then bound ← i + 1; R ← R′ ◦ S;
        else if r > S and Check(G′, F, r) = false then
          (i, R′) ← Complexity(G, S ∪ {r}, G′, F, bound);
          if i < bound then bound ← i; R ← R′;
    return (bound, R);

Function Complexity. The central routine: find the best reduction strategy of G in fewer than bound steps, with the first-step reduction including S but containing no operations applicable in parallel with F.

This greedy-like approach is justified by Theorem 7. Note that it differs from the greedy-like approach considered in Section 5, where only maximal sets applicable in parallel are considered. With the help of the variable bound, strategies are not computed beyond the depth of the best strategy already found. We return from the recursion in case no operations from GPR ∪ GDR are applicable to S(G). The complexity is 0 if G is empty; otherwise it is 1 if all GNR operations applicable to S(G) are also applicable to G, and it is 2 if they are not. Finally, the current best strategy and its length are returned.

7. Complexity estimates

Consider a graph G with n nodes. The idea of the search consists in considering sequences of operations, and deciding whether each subsequent operation belongs to the same step or begins the next one (not considering sequences that do not satisfy the criterion justified by Theorem 7). There can be no more than n!
possible sequences of operations ϕ, and the bottleneck is checking the parallel applicability of the operations in them, which consists in the examination of at most 2^{n/2} sequential compositions of some operations in ϕ. Checking a sequential composition means applying a linear number of rules; each application takes at most quadratic time with respect to n, so the total complexity can be estimated as O(n! · 2^{n/2} · n^3). Since from the Stirling formula it follows that n! = Θ(n^{n+1/2}/e^n), we can rewrite the complexity as O(n^{n+7/2} c^n) for c = e/√2.
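The Stirling step can be sanity-checked numerically; the constant hidden in the Θ is √(2π):

```python
import math

# Stirling: n! = Θ(n^(n + 1/2) / e^n), with hidden constant sqrt(2*pi)
n = 100
ratio = math.factorial(n) * math.e ** n / n ** (n + 0.5)
assert abs(ratio - math.sqrt(2 * math.pi)) < 0.01
```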
While the complexity estimate of the basic algorithm in [1] grows almost as fast as (n^n)^2, the present estimate of the improved method grows almost as fast as n^n.

8. Discussion

For a set V, the number of possible signed graphs whose set of nodes is V is 2^{|V|(|V|+1)/2}. Therefore, the complexity problem for all 1 + 2 + 8 + 64 + 1024 + 32768 graphs with fewer than 6 nodes can be easily computed on a standard PC using a bottom-up algorithm.
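The count is quickly verified; the six listed terms correspond to node counts n = 0, . . . , 5:

```python
# a sign per node (2^n choices) times an edge subset of the n(n-1)/2 pairs:
# 2^n * 2^(n(n-1)/2) = 2^(n(n+1)/2) signed graphs on n labelled nodes
counts = [2 ** (n * (n + 1) // 2) for n in range(6)]
assert counts == [1, 2, 8, 64, 1024, 32768]
assert sum(counts) == 33867
```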
This might be quite useful because the number of times the algorithm considers small intermediate graphs grows very quickly as |V| grows. The pre-computed complexity of all ''small'' graphs can be used in the following way. Assume we are at step s, the best solution found so far is b, the current graph is G, the operations already chosen for step s form a set S, and S(G) is ''small'', so we know C(S(G)). In this case, the best solution we can obtain on this branch of the search tree is either s + C(S(G)) or s + C(S(G)) − 1 (because we have not finished step s yet). Therefore, unless s + C(S(G)) − 1 < b, we can ignore this branch of the search tree and continue by backtracking. The method presented in this article has been implemented in the C++ programming language; see [13] (except for the test Check(G′, F, r), which would asymptotically speed up the algorithm but needs more code). While the running time of an implementation of the basic algorithm on a graph with 24 nodes and complexity 6 is about 30 h, the implementation of the improved method gives the result in less than 5 min on the same computer.

Acknowledgments

This work was supported by the Academy of Finland, grants 203667 and 108421, and by the Science and Technology Center in Ukraine, project 4032. Vladimir Rogojin is on leave of absence from the Institute of Mathematics and Computer Science of the Academy of Sciences of Moldova.

References

[1] A. Alhazov, C. Li, I. Petre, Computing the graph-based parallel complexity of gene assembly, Theoretical Computer Science (2008) (in press).
[2] A. Ehrenfeucht, T. Harju, I. Petre, D.M. Prescott, G. Rozenberg, Computation in Living Cells: Gene Assembly in Ciliates, Springer, 2003.
[3] A. Ehrenfeucht, D.M. Prescott, G. Rozenberg, Computational aspects of gene (un)scrambling in ciliates, in: L.F. Landweber, E. Winfree (Eds.), Evolution as Computation, Springer, Berlin, Heidelberg, New York, 2001, pp. 216–256.
[4] T. Harju, C. Li, I.
Petre, Parallel complexity of signed graphs for gene assembly in ciliates, in: Soft Computing – A Fusion of Foundations, Methodologies and Applications, Springer, Berlin, Heidelberg, 2008 (in press).
[5] T. Harju, C. Li, I. Petre, G. Rozenberg, Complexity measures for gene assembly, in: K. Tuyls (Ed.), Proceedings of the Knowledge Discovery and Emergent Complexity in Bioinformatics Workshop, in: Lecture Notes in Bioinformatics, vol. 4366, Springer, 2007, pp. 42–60.
[6] T. Harju, C. Li, I. Petre, G. Rozenberg, Parallelism in gene assembly, Natural Computing 5 (2) (2006) 203–223.
[7] T. Harju, I. Petre, G. Rozenberg, Gene assembly in ciliates: Molecular operations, in: G. Paun, G. Rozenberg, A. Salomaa (Eds.), Current Trends in Theoretical Computer Science, 2004, pp. 527–542.
[8] T. Harju, I. Petre, G. Rozenberg, Gene assembly in ciliates: Formal frameworks, in: G. Paun, G. Rozenberg, A. Salomaa (Eds.), Current Trends in Theoretical Computer Science, 2004, pp. 543–558.
[9] L.F. Landweber, L. Kari, The evolution of cellular computing: Nature's solution to a computational problem, in: Proceedings of the 4th DIMACS Meeting on DNA-Based Computers, Philadelphia, PA, 1998, pp. 3–15.
[10] C.H. Papadimitriou, Computational Complexity, Addison Wesley, 1994.
[11] D.M. Prescott, The DNA of ciliated protozoa, Microbiol. Rev. 58 (2) (1994) 233–267.
[12] D.M. Prescott, A. Ehrenfeucht, G. Rozenberg, Molecular operations for DNA processing in hypotrichous ciliates, European J. Protistology 37 (2001) 241–260.
[13] I. Petre, S. Skogman, Gene Assembly Simulator, 2006. http://combio.abo.fi/simulator/simulator.php
Theoretical Computer Science 410 (2009) 2316–2322
Binary sequences with optimal autocorrelation

Ying Cai a,b, Cunsheng Ding c,∗

a School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
b Beijing Information Science and Technology University, Beijing, China
c Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong

Keywords: Almost difference set; Difference sets; Relative difference sets; Sequences

Abstract
Sequences have important applications in ranging systems, spread spectrum communication systems, multi-terminal system identification, code-division multiple access communication systems, global positioning systems, software testing, circuit testing, computer simulation, and stream ciphers. Sequences and error-correcting codes are also closely related. In this paper, we give a well rounded treatment of binary sequences with optimal autocorrelation. We survey known ones and construct new ones. © 2009 Elsevier B.V. All rights reserved.
1. Introduction

The autocorrelation of a binary sequence (s(t)) of period N at shift w is

AC_s(w) = Σ_{t=0}^{N−1} (−1)^{s(t+w)−s(t)},   (1)
where each s(t) ∈ {0, 1}. These AC_s(w), w ∈ {1, 2, . . . , N − 1}, are called the out-of-phase autocorrelation values. For applications in direct-sequence code-division multiple access, coding theory and cryptography, we wish to have binary sequences of period N with minimal value max_{1≤w≤N−1} |AC_s(w)|. Throughout this paper, let (s(t)) be a binary sequence of period N. The set

C_s = {0 ≤ i ≤ N − 1 : s(i) = 1}   (2)
is called the support of (s(t)); and (s(t)) is referred to as the characteristic sequence of C_s ⊆ Z_N. The mapping s ↦ C_s is a one-to-one correspondence from the set of all binary sequences of period N to the set of all subsets of Z_N. Hence, the study of binary sequences of period N is equivalent to that of subsets of Z_N. For any subset C of Z_N, the difference function of C is defined as

d_C(w) = |(w + C) ∩ C|, w ∈ Z_N.   (3)
Let (s(t)) be the characteristic sequence of C. It is easy to show that

AC_s(w) = N − 4(k − d_C(w)),   (4)

where k := |C|. Thus the study of the autocorrelation property of the sequence (s(t)) becomes that of the difference function d_C of the support C of the sequence (s(t)).
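Identity (4) is easy to check computationally for any binary sequence; a minimal sketch (function names ours):

```python
def autocorrelation(s, w):
    # AC_s(w) = sum over t of (-1)^(s(t+w) - s(t)), indices taken modulo N
    N = len(s)
    return sum(1 if s[(t + w) % N] == s[t] else -1 for t in range(N))

def d(C, v, w):
    # difference function d_C(w) = |(w + C) ∩ C| in Z_v
    return sum(1 for c in C if (c + w) % v in C)

# identity (4) on an arbitrary sequence: AC_s(w) = N - 4*(k - d_C(w))
s = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
N = len(s)
C = {i for i in range(N) if s[i] == 1}
k = len(C)
for w in range(1, N):
    assert autocorrelation(s, w) == N - 4 * (k - d(C, N, w))
```

In particular, the sequence (0, 0, 0, 1) satisfies autocorrelation([0, 0, 0, 1], w) == 0 for w = 1, 2, 3, illustrating the perfect case discussed next.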
∗ Corresponding author.
E-mail addresses: [email protected] (Y. Cai), [email protected] (C. Ding).
0304-3975/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2009.02.021
Y. Cai, C. Ding / Theoretical Computer Science 410 (2009) 2316–2322
The following results follow from (4):

(1) Let N ≡ 3 (mod 4). Then max_{1≤w≤N−1} |AC_s(w)| ≥ 1. On the other hand, max_{1≤w≤N−1} |AC_s(w)| = 1 iff AC_s(w) = −1 for all w ≢ 0 (mod N). In this case, the sequence (s(t)) is said to have ideal autocorrelation and optimal autocorrelation.
(2) Let N ≡ 1 (mod 4). There is some evidence [11] that there is no binary sequence of period N > 13 with max_{1≤w≤N−1} |AC_s(w)| = 1. It is then natural to consider the case max_{1≤w≤N−1} |AC_s(w)| = 3. In this case AC_s(w) ∈ {1, −3} for all w ≢ 0 (mod N).
(3) Let N ≡ 2 (mod 4). Then max_{1≤w≤N−1} |AC_s(w)| ≥ 2. On the other hand, max_{1≤w≤N−1} |AC_s(w)| = 2 iff AC_s(w) ∈ {2, −2} for all w ≢ 0 (mod N). In this case, the sequence (s(t)) is said to have optimal autocorrelation.
(4) Let N ≡ 0 (mod 4). Clearly max_{1≤w≤N−1} |AC_s(w)| ≥ 0. If max_{1≤w≤N−1} |AC_s(w)| = 0, the sequence (s(t)) is called perfect. The only known perfect binary sequence, up to equivalence, is (0, 0, 0, 1). It is conjectured that there is no perfect binary sequence of period N ≡ 0 (mod 4) with N > 4 [10]. This conjecture is true for all N < 108900 [10]. Hence, it is natural to construct binary sequences of period N ≡ 0 (mod 4) with max_{1≤w≤N−1} |AC_s(w)| = 4.

Binary sequences with optimal autocorrelation have close connections with certain combinatorial designs. The objective of this paper is to give a well rounded treatment of binary sequences with optimal autocorrelation. We will survey known constructions, and present new constructions.

2. Combinatorial characterizations

To characterize binary sequences with optimal autocorrelation, we need to introduce difference sets and almost difference sets. Let (A, +) be an abelian group of order v. Let C be a k-subset of A. The set C is a (v, k, λ) difference set (DS) in A if d_C(w) = λ for every nonzero element w of A, where d_C(w) = |C ∩ (C + w)| is the difference function defined on A.
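As a concrete instance of case (1) and of the difference-set characterization: the quadratic residues {1, 2, 4} modulo 7 form a (7, 3, 1) difference set, and the characteristic sequence of this set has ideal autocorrelation. A small check (function names ours):

```python
def difference_counts(C, v):
    # d_C(w) = |(w + C) ∩ C| for every nonzero w of Z_v
    return [sum(1 for c in C if (c + w) % v in C) for w in range(1, v)]

def autocorrelation(s, w):
    N = len(s)
    return sum(1 if s[(t + w) % N] == s[t] else -1 for t in range(N))

C, v = {1, 2, 4}, 7                          # quadratic residues modulo 7
assert difference_counts(C, v) == [1] * 6    # a (7, 3, 1) difference set
s = [1 if i in C else 0 for i in range(v)]   # its characteristic sequence
assert all(autocorrelation(s, w) == -1 for w in range(1, v))
```

By contrast, for the prime 13 ≡ 1 (mod 4) the quadratic residues {1, 3, 4, 9, 10, 12} yield the two difference-function values 2 and 3, the almost-difference-set behaviour of the N ≡ 1 (mod 4) case.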
The complement of a (v, k, λ) difference set C in A is defined as A \ C and is a (v, v − k, v − 2k + λ) difference set. The reader is referred to [9,12] for details of difference sets.

Let (A, +) be an abelian group of order v. A k-subset C of A is a (v, k, λ, t) almost difference set (ADS) in A if d_C(w) takes on the value λ altogether t times and λ + 1 altogether v − 1 − t times when w ranges over all the nonzero elements of A [1].

Two subsets D and E of a cyclic abelian group A of order v are said to be equivalent if there are an integer ℓ relatively prime to v and an element a ∈ A such that E = ℓD + a. In particular, this gives the definition of equivalence for two almost difference sets and for two difference sets in any cyclic group.

Binary sequences of period N with optimal autocorrelation are characterized by the following [1].

Theorem 2.1. Let (s(t)) be a binary sequence of period N, and let C_s be its support.
(1) Let N ≡ 3 (mod 4). Then AC_s(w) = −1 for all w ≢ 0 (mod N) iff C_s is an (N, (N + 1)/2, (N + 1)/4) or (N, (N − 1)/2, (N − 3)/4) DS in Z_N.
(2) Let N ≡ 1 (mod 4). Then AC_s(w) ∈ {1, −3} for all w ≢ 0 (mod N) iff C_s is an (N, k, k − (N + 3)/4, Nk − k² − (N − 1)²/4) ADS in Z_N.
(3) Let N ≡ 2 (mod 4). Then AC_s(w) ∈ {2, −2} for all w ≢ 0 (mod N) iff C_s is an (N, k, k − (N + 2)/4, Nk − k² − (N − 1)(N − 2)/4) ADS in Z_N.
(4) Let N ≡ 0 (mod 4). Then AC_s(w) ∈ {0, −4} for all w ≢ 0 (mod N) iff C_s is an (N, k, k − (N + 4)/4, Nk − k² − (N − 1)N/4) ADS in Z_N.

Due to this theorem, we describe binary sequences with optimal autocorrelation below by difference sets and almost difference sets in Z_N. Two binary sequences (s₁(t)) and (s₂(t)) of period N are said to be equivalent if there is an integer ℓ relatively prime to N and an integer j such that s₂(t) = s₁(ℓt + j) for every t ≥ 0. It is easily shown that two binary sequences are equivalent if and only if their supports are equivalent.

3. The N ≡ 3 (mod 4) case
Due to Theorem 2.1, we only need to describe all known (N, (N − 1)/2, (N − 3)/4) or (N, (N + 1)/2, (N + 1)/4) difference sets of Z_N, which are called Paley–Hadamard difference sets.

3.1. Cyclotomic cyclic difference sets and their sequences

Let q = df + 1 be a power of a prime, and let θ be a fixed primitive element of GF(q). Define D_i^(d,q) = θ^i ⟨θ^d⟩, a coset of the subgroup ⟨θ^d⟩ generated by θ^d. The cosets D_i^(d,q) are called the index classes or cyclotomic classes of order d with respect to GF(q). Clearly GF(q) \ {0} = ∪_{i=0}^{d−1} D_i^(d,q). Define (l, m)_d = |(D_l^(d,q) + 1) ∩ D_m^(d,q)|. These constants (l, m)_d are called cyclotomic numbers of order d with respect to GF(q) [21].
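The cyclotomic classes just defined are easy to generate when q is a prime (so that GF(q) = Z_q); the following sketch, with our own helper names, checks that they partition GF(q) \ {0} for q = 13, θ = 2 and d = 4:

```python
def cyclotomic_classes(q, theta, d):
    """The classes D_i^(d,q) = theta^i * <theta^d> for 0 <= i < d (q prime)."""
    f = (q - 1) // d
    subgroup = {pow(theta, d * s, q) for s in range(f)}
    return [{(pow(theta, i, q) * x) % q for x in subgroup} for i in range(d)]

def cyclotomic_number(q, theta, d, l, m):
    """(l, m)_d = |(D_l + 1) intersect D_m|."""
    D = cyclotomic_classes(q, theta, d)
    return len({(x + 1) % q for x in D[l]} & D[m])

# q = 13, theta = 2 (a primitive root mod 13), d = 4: four classes of
# size 3 partitioning GF(13) \ {0}.
D = cyclotomic_classes(13, 2, 4)
assert all(len(Di) == 3 for Di in D)
assert set().union(*D) == set(range(1, 13))
# D_0 + 1 avoids 0 here, so its elements fall into the four classes:
assert sum(cyclotomic_number(13, 2, 4, 0, m) for m in range(4)) == 3
```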
3.1.1. The Hall construction

Let p be a prime of the form p = 4s² + 27 for some s. The Hall difference set [8] is defined by D = D_0^(6,p) ∪ D_1^(6,p) ∪ D_3^(6,p). The characteristic sequence of D is a binary sequence of period p with ideal autocorrelation.

3.1.2. The Paley construction

Let p ≡ 3 (mod 4) be a prime. The set of all quadratic residues modulo p is a (p, (p − 1)/2, (p − 3)/4) difference set in Z_p [17]. The characteristic sequence of this difference set is a binary sequence of period p with ideal autocorrelation.
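A minimal sketch of the Paley construction (assuming s(t) = 1 iff t is a nonzero quadratic residue mod p, and the standard periodic autocorrelation; names are ours):

```python
def legendre_sequence(p):
    """Characteristic sequence of the quadratic residues mod a prime p."""
    residues = {(x * x) % p for x in range(1, p)}
    return [1 if t in residues else 0 for t in range(p)]

def autocorrelation(s, w):
    N = len(s)
    return sum((-1) ** (s[t] ^ s[(t + w) % N]) for t in range(N))

s = legendre_sequence(11)   # 11 = 3 (mod 4)
# Ideal autocorrelation: every out-of-phase value equals -1.
assert all(autocorrelation(s, w) == -1 for w in range(1, 11))
```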
3.1.3. The twin-prime construction

Let p and p + 2 be two primes. Define N = p(p + 2). The twin-prime difference set is defined by

{(g, h) ∈ Z_p × Z_{p+2} : g, h ≠ 0 and χ(g)χ(h) = 1} ∪ {(g, 0) : g ∈ Z_p},

where χ(x) = +1 if x is a nonzero square in the corresponding field, and χ(x) = −1 otherwise. Note that Z_p × Z_{p+2} is isomorphic to Z_{p(p+2)}. The image of the difference set above is a difference set in Z_{p(p+2)} whose characteristic sequence has ideal autocorrelation. For detailed information about this construction, see [2, Chapt. V].

3.2. Cyclic difference sets with Singer parameters and their sequences

Cyclic difference sets in GF(2^m)^∗ with Singer parameters are those with parameters (2^m − 1, 2^{m−1} − 1, 2^{m−2} − 1) or their complements. Let D be any cyclic difference set with Singer parameters in GF(2^m)^∗. Then the characteristic sequence of log_α D ⊆ Z_{2^m−1} is a binary sequence with ideal autocorrelation, where α is any primitive element of GF(2^m). There are many constructions of cyclic difference sets with Singer parameters in GF(2^m)^∗; we introduce them below.

3.2.1. The Singer construction

The Singer difference set [20] is defined by D_a = {x ∈ GF(2^m) : Tr(ax) = 1} and has parameters (2^m − 1, 2^{m−1}, 2^{m−2}). Its characteristic sequence is (s(t)), where s(t) = Tr(α^t) for any t ≥ 0 and α is a primitive element of GF(2^m). This is also called the maximum-length sequence of period 2^m − 1.

3.2.2. The hyperoval construction

A function f from GF(2^m) to GF(2^m) is called two-to-one if for every y ∈ GF(2^m), |{x ∈ GF(2^m) : f(x) = y}| = 0 or 2. Another class of cyclic difference sets with Singer parameters is the hyperoval sets discovered by Maschietti in 1998 [14]. Let m be odd. Maschietti showed that M_κ := GF(2^m) \ {x^κ + x : x ∈ GF(2^m)} is a difference set if x ↦ x^κ is a permutation of GF(2^m) and the mapping x ↦ x^κ + x is two-to-one. The following values of κ yield difference sets:
• κ = 2 (the Singer case).
• κ = 6 (the Segre case).
• κ = 2^σ + 2^π with σ = (m + 1)/2 and 4π ≡ 1 (mod m) (the Glynn I case).
• κ = 3 · 2^σ + 4 with σ = (m + 1)/2 (the Glynn II case).
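The maximum-length sequence of Section 3.2.1 can also be generated by a linear feedback shift register. A minimal sketch for m = 4, using the primitive polynomial x⁴ + x + 1 (our choice, not the paper's), whose recurrence is s(t + 4) = s(t + 1) + s(t) over GF(2):

```python
def m_sequence(length=15):
    """m-sequence of period 2^4 - 1 = 15 from the LFSR for x^4 + x + 1."""
    s = [0, 0, 0, 1]                  # any nonzero initial state
    while len(s) < length:
        s.append(s[-3] ^ s[-4])       # s(t+4) = s(t+1) + s(t) over GF(2)
    return s[:length]

def autocorrelation(s, w):
    N = len(s)
    return sum((-1) ** (s[t] ^ s[(t + w) % N]) for t in range(N))

seq = m_sequence()
# Ideal autocorrelation, as for all Singer-parameter difference sets.
assert all(autocorrelation(seq, w) == -1 for w in range(1, 15))
```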
3.2.3. The five-people construction

Let m ≢ 0 (mod 3) be a positive integer. Define δ_k(x) = x^d + (x + 1)^d ∈ GF(2^m)[x], where d = 4^k − 2^k + 1 and k = (m ± 1)/3. Put

N_k = δ_k(GF(2^m)) if m is odd, and N_k = GF(2^m) \ δ_k(GF(2^m)) if m is even.
Then N_k is a difference set with Singer parameters in GF(2^m)^∗. This family of cyclic difference sets was conjectured by No, Chung and Yun [15], and the conjecture was confirmed by Dillon and Dobbertin [3].

3.2.4. The Dillon–Dobbertin construction

Let m be a positive integer. For each k with 1 ≤ k < m/2 and gcd(k, m) = 1, define Δ_k(x) = (x + 1)^d + x^d + 1, where d = 4^k − 2^k + 1. Then B_k := GF(2^m) \ Δ_k(GF(2^m)) is a difference set with Singer parameters in GF(2^m)^∗. Furthermore, for each fixed m, the φ(m)/2 difference sets B_k are pairwise inequivalent, where φ is the Euler function. This family of cyclic difference sets was described by Dillon and Dobbertin [3].
3.2.5. The Gordon–Mills–Welch construction

Consider a proper subfield GF(2^{m0}) of GF(2^m), where m0 > 2 is a divisor of m. Let R := {x ∈ GF(2^m) : Tr_{GF(2^m)/GF(2^{m0})}(x) = 1}. If D is any DS with Singer parameters (2^{m0} − 1, 2^{m0−1}, 2^{m0−2}) in GF(2^{m0}), then U_D := R(D^{(r)}) is a DS with Singer parameters in GF(2^m)^∗, where r is any representative of a 2-cyclotomic coset modulo 2^{m0} − 1, and D^{(r)} := {y^r : y ∈ D} [7]. The Gordon–Mills–Welch construction is very powerful and generic: any difference set with Singer parameters (2^{m0} − 1, 2^{m0−1}, 2^{m0−2}) in any subfield GF(2^{m0}) can be plugged into it, and may produce a new difference set with Singer parameters.

4. The N ≡ 0 (mod 4) case
In this section, we describe known constructions of binary sequences of period N ≡ 0 (mod 4) with optimal out-of-phase autocorrelation values {0, −4}.
4.1. The Sidelnikov–Lempel–Cohn–Eastman construction

Let q ≡ 1 (mod 4) be a power of an odd prime. Define C_q = log_α(D_1^(2,q) − 1). Then the set C_q is a (q − 1, (q − 1)/2, (q − 5)/4, (q − 1)/4) almost difference set in Z_{q−1}. The characteristic sequence of C_q has optimal autocorrelation values {0, −4} [13,19].
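A small sketch of this construction for the primes q = 5 and q = 13 with primitive root α = 2 (the helper names and the discrete-log table are ours):

```python
def slce_support(q, alpha):
    """C_q = log_alpha(D_1^(2,q) - 1) as a subset of Z_{q-1} (q prime)."""
    log = {pow(alpha, t, q): t for t in range(q - 1)}     # discrete log table
    D1 = {pow(alpha, t, q) for t in range(1, q - 1, 2)}   # non-residues
    return {log[(d - 1) % q] for d in D1}                 # 1 is a square, so d != 1

def autocorrelation(s, w):
    N = len(s)
    return sum((-1) ** (s[t] ^ s[(t + w) % N]) for t in range(N))

for q, alpha in [(5, 2), (13, 2)]:       # alpha is a primitive root mod q
    C = slce_support(q, alpha)
    s = [1 if t in C else 0 for t in range(q - 1)]
    assert {autocorrelation(s, w) for w in range(1, q - 1)} <= {0, -4}
```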
4.2. Two more constructions
Two constructions were presented in [1]. The first one is the following. Let C be any (l, (l − 1)/2, (l − 3)/4) or (l, (l + 1)/2, (l + 1)/4) difference set of Z_l, where l ≡ 3 (mod 4). Define a subset of Z_{4l} by

U = [(l + 1)C mod 4l] ∪ [(l + 1)(C − δ) + l mod 4l] ∪ [(l + 1)C^∗ + 2l mod 4l] ∪ [(l + 1)(C − δ)^∗ + 3l mod 4l],   (5)

where C^∗ and (C − δ)^∗ denote the complements of C and C − δ in Z_l, respectively. Then U is a (4l, 2l − 1, l − 2, l − 1) or (4l, 2l + 1, l, l − 1) almost difference set in Z_{4l}.

The second construction presented in [1] is similar. Let D₁ be any (l, (l − 1)/2, (l − 3)/4) (respectively, (l, (l + 1)/2, (l + 1)/4)) difference set in Z_l, and let D₂ be a trivial difference set in Z₄ with parameters (4, 1, 0). Then D := (D₂ × D₁^∗) ∪ (D₂^∗ × D₁) is a (4l, 2l − 1, l − 2, l − 1) (respectively, (4l, 2l + 1, l, l − 1)) almost difference set of Z₄ × Z_l. Let φ : Z₄ × Z_l → Z_{4l} be an isomorphism. Then the characteristic sequence of φ(D) has optimal autocorrelation values {0, −4}. This sequence is obtained from a binary sequence of period l with ideal autocorrelation and a binary perfect sequence of length 4. An alternative description is given in [1].

The constructions are generic and yield many binary sequences of length N ≡ 0 (mod 4) with optimal autocorrelation. All the cyclic difference sets described in Section 3 can be plugged into this generic construction.

5. The N ≡ 2 (mod 4) case
In this section, we describe known constructions of binary sequences of period N ≡ 2 (mod 4) with optimal out-of-phase autocorrelation values {2, −2}.
5.1. The Sidelnikov–Lempel–Cohn–Eastman construction

Let q ≡ 3 (mod 4) be a power of an odd prime. Define C_q = log_α(D_1^(2,q) − 1). Then the set C_q is a (q − 1, (q − 1)/2, (q − 3)/4, (3q − 5)/4) almost difference set in Z_{q−1}. The characteristic sequence of C_q has optimal autocorrelation values {2, −2} [13,19].
5.2. The Ding–Helleseth–Martinsen constructions

Let q ≡ 5 (mod 8) be a prime. It is known that q = s² + 4t² for some s and t with s ≡ ±1 (mod 4). Set n = 2q. Let i, j, l ∈ {0, 1, 2, 3} be three pairwise distinct integers, and define

C₁ = [{0} × (D_i^(4,q) ∪ D_j^(4,q))] ∪ [{1} × (D_l^(4,q) ∪ D_j^(4,q))].

Then C₁ is an (n, (n − 2)/2, (n − 6)/4, (3n − 6)/4) almost difference set of A = Z₂ × Z_q if the generator of Z_q^∗ employed to define the cyclotomic classes D_i^(4,q) is properly chosen and
(1) t = 1 and (i, j, l) = (0, 1, 3) or (0, 2, 1); or
(2) s = 1 and (i, j, l) = (1, 0, 3) or (0, 1, 2).

Another construction is the following. Let i, j, l ∈ {0, 1, 2, 3} be three pairwise distinct integers, and define

C₂ = [{0} × (D_i^(4,q) ∪ D_j^(4,q))] ∪ [{1} × (D_l^(4,q) ∪ D_j^(4,q))] ∪ {(0, 0)}.

Then C₂ is an (n, n/2, (n − 2)/4, (3n − 2)/4) almost difference set of A = Z₂ × Z_q if the generator of Z_q^∗ employed to define the cyclotomic classes D_i^(4,q) is properly chosen and

(1) t = 1 and (i, j, l) ∈ {(0, 1, 3), (0, 2, 3), (1, 2, 0), (1, 3, 0)}; or
(2) s = 1 and (i, j, l) ∈ {(0, 1, 2), (0, 3, 2), (1, 0, 3), (1, 2, 3)}.

Let φ : Z₂ × Z_q → Z_{2q} be an isomorphism. Then the characteristic sequence of φ(C_i) has optimal autocorrelation values {2, −2} [6].

5.3. Other constructions

Four families of binary sequences of period p^m − 1 with optimal autocorrelation are presented in [16]. Two of them are balanced and are equivalent to the Sidelnikov–Lempel–Cohn–Eastman sequence; the other two are almost balanced and are obtained by modifying one bit of the Sidelnikov–Lempel–Cohn–Eastman sequence.

6. The N ≡ 1 (mod 4) case
In this section, we describe known constructions of binary sequences of period N ≡ 1 (mod 4) with optimal out-of-phase autocorrelation values {1, −3}.
6.1. The Legendre construction

Let p ≡ 1 (mod 4) be a prime. The set of quadratic residues modulo p forms an almost difference set in Z_p. Its characteristic sequence is the Legendre sequence, which has optimal out-of-phase autocorrelation values {−3, 1}.

6.2. The Ding–Helleseth–Lam construction

Let q ≡ 1 (mod 4) be a prime, and let D_i^(4,q) be the cyclotomic classes of order 4. For all i, the set D_i^(4,q) ∪ D_{i+1}^(4,q) is a (q, (q − 1)/2, (q − 5)/4, (q − 1)/2) almost difference set, if q = x² + 4 and x ≡ 1 (mod 4) [5]. The characteristic sequence of these almost difference sets has optimal out-of-phase autocorrelation values {−3, 1}.
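A small numerical check of this construction (helper names are ours; q = 5 = 1² + 4 and q = 29 = 5² + 4 with 5 ≡ 1 (mod 4), and θ = 2 is a primitive root in both cases — by the "for all i" statement, every consecutive pair of classes should work):

```python
def cyclotomic_classes(q, theta, d):
    f = (q - 1) // d
    return [{pow(theta, d * s + i, q) for s in range(f)} for i in range(d)]

def autocorrelation(s, w):
    N = len(s)
    return sum((-1) ** (s[t] ^ s[(t + w) % N]) for t in range(N))

for q, theta in [(5, 2), (29, 2)]:        # q = x^2 + 4 with x = 1 (mod 4)
    D = cyclotomic_classes(q, theta, 4)
    for i in range(4):
        C = D[i] | D[(i + 1) % 4]
        s = [1 if t in C else 0 for t in range(q)]
        assert {autocorrelation(s, w) for w in range(1, q)} <= {1, -3}
```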
6.3. A construction with generalized cyclotomy

Let g be a fixed common primitive root of both primes p and q. Define d = gcd(p − 1, q − 1), and let e be given by de = (p − 1)(q − 1). Then there exists an integer x such that Z_{pq}^∗ = {g^s x^i : s = 0, 1, . . . , e − 1; i = 0, 1, . . . , d − 1}. Whiteman's generalized cyclotomic classes D_i are defined by

D_i = {g^s x^i : s = 0, 1, . . . , e − 1}, i = 0, 1, . . . , d − 1.
Let D₀ and D₁ be the generalized cyclotomic classes of order 2. Define C = D₁ ∪ {p, 2p, . . . , (q − 1)p}. If q − p = 4 and (p − 1)(q − 1)/4 is odd, then C is a

(p(p + 4), (p + 3)(p + 1)/2, (p + 3)(p + 1)/4, (p − 1)(p + 5)/2)

almost difference set of Z_{p(p+4)} [4]. The characteristic sequence of these almost difference sets has optimal out-of-phase autocorrelation values {−3, 1}.

6.4. Comments on the N ≡ 1 (mod 4) case

There are only three known constructions of binary sequences of period N ≡ 1 (mod 4) with optimal autocorrelation. It can be proved that they are not equivalent.
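As an illustration of the construction in Section 6.3, take the smallest admissible pair p = 3, q = 7 (q − p = 4 and (p − 1)(q − 1)/4 = 3 is odd; g = 5 is a common primitive root of 3 and 7). The brute-force search for a valid x is our own device:

```python
from math import gcd

def whiteman_classes(p, q, g, x):
    """Whiteman's generalized cyclotomic classes for the given g and x."""
    n, d = p * q, gcd(p - 1, q - 1)
    e = (p - 1) * (q - 1) // d
    return [{pow(g, s, n) * pow(x, i, n) % n for s in range(e)} for i in range(d)]

def autocorrelation(s, w):
    N = len(s)
    return sum((-1) ** (s[t] ^ s[(t + w) % N]) for t in range(N))

p, q, g, n = 3, 7, 5, 21
units = {x for x in range(n) if x % p and x % q}
found = set()
for x in units:                            # search for a valid x
    D = whiteman_classes(p, q, g, x)
    if set().union(*D) == units:           # (s, i) -> g^s x^i covers Z*_pq
        C = D[1] | {k * p for k in range(1, q)}
        s = [1 if t in C else 0 for t in range(n)]
        if {autocorrelation(s, w) for w in range(1, n)} <= {1, -3}:
            found.add(frozenset(C))
assert found                               # some valid x gives optimal values
```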
7. A generic construction of binary sequences with out-of-phase autocorrelation values {−1, 3}

Throughout this section, let m be an even positive integer. In this section, we describe a generic construction of binary sequences of period 2^m − 1 with out-of-phase autocorrelation values {−1, 3} only.

Let (G, +) be an abelian group of order v, and let D be a k-element subset of G. Assume that N is a subgroup of order n and index m of G. The set D is called a relative difference set with parameters (m, n, k, λ) if the multiset {r − r′ : r, r′ ∈ D, r ≠ r′} contains no element of N and covers every element in G \ N exactly λ times. A relative difference set is called cyclic if G is cyclic.

Theorem 7.1. Let R₂ be any (2^{m/2} − 1, 2^{(m−2)/2} − 1, 2^{(m−4)/2} − 1) difference set in GF(2^{m/2})^∗. Define

R₁ = {x ∈ GF(2^m) : Tr_{2^m/2^{m/2}}(x) = 1},  R = {r₁r₂ : r₁ ∈ R₁, r₂ ∈ R₂}.
Then R is a (2^m − 1, 2^{m−1} − 2^{m/2}, 2^{m−2} − 2^{m/2}, 2^{m/2} − 2) almost difference set in GF(2^m)^∗. Furthermore, the characteristic sequence of the set log_α R has only the out-of-phase autocorrelation values {−1, 3}, where α is any generator of GF(2^m)^∗.

Proof. For the convenience of description, define G = GF(2^m)^∗ and H = GF(2^{m/2})^∗. We first prove that R₁ is a relative difference set with parameters (2^{m/2} + 1, 2^{m/2} − 1, 2^{m/2}, 1) in G relative to H. This is to prove that

R₁R₁^{(−1)} = k₁ + λ₁(G \ H),

where R₁^{(−1)} := {r^{−1} : r ∈ R₁}, k₁ = 2^{m/2} and λ₁ = 1. Clearly, |R₁| = k₁. Note that Tr_{2^m/2^{m/2}}(x) = x + x^γ, where γ := 2^{m/2}. We need to compute the number of solutions (x, y) ∈ GF(2^m)^∗ × GF(2^m)^∗ of the following set of equations:

x + x^γ = 1,  y + y^γ = 1,  xy^{−1} = a,   (6)
where a ∈ GF(2^m)^∗. It is easily seen that the number of solutions (x, y) ∈ GF(2^m)^∗ × GF(2^m)^∗ of (6) is the same as the number of solutions y ∈ GF(2^m)^∗ of the following set of equations:

y + y^γ = 1,  (a − a^γ)y = a^γ − 1.   (7)

If a ∈ GF(2^{m/2})^∗, then a − a^γ = 0, so (7) has no solution. If a ∈ GF(2^m)^∗ \ GF(2^{m/2}), then a − a^γ ≠ 0 and (7) has the unique solution y = (a^γ − 1)/(a^γ − a). This proves the relative difference set property of R₁.

Since R₂ is a (2^{m/2} − 1, 2^{(m−2)/2} − 1, 2^{(m−4)/2} − 1) difference set in GF(2^{m/2})^∗, we have
R₂R₂^{(−1)} = k₂ + λ₂(H \ {1}),

where k₂ = 2^{(m−2)/2} − 1 and λ₂ = 2^{(m−4)/2} − 1. Now we have

(R₁R₂)(R₁R₂)^{(−1)} = (k₁ + λ₁(G \ H))((k₂ − λ₂) + λ₂H)
 = k₁(k₂ − λ₂) + λ₁(k₂ − λ₂)(G \ H) + k₁λ₂H + λ₁λ₂(G \ H)H
 = k₁(k₂ − λ₂) + λ₁(k₂ − λ₂)G + λ₁λ₂|H|G − λ₁(k₂ − λ₂)H + k₁λ₂H − λ₁λ₂|H|H
 = k₁(k₂ − λ₂) + (λ₁(k₂ − λ₂) + λ₁λ₂|H|)G + (k₁λ₂ − λ₁(k₂ − λ₂) − λ₁λ₂|H|)H.
Note that

λ₁(k₂ − λ₂) + λ₁λ₂|H| = 2^{m−2} − 2^{m/2} + 1 and k₁λ₂ − λ₁(k₂ − λ₂) − λ₁λ₂|H| = −1.

We obtain

(R₁R₂)(R₁R₂)^{(−1)} = (2^{m−2} − 2^{m/2} + 1)(G \ H) + (2^{m−2} − 2^{m/2})(H \ {1}) + (2^{m−1} − 2^{m/2}) · 1.

This proves the almost difference set property of R. It is then easy to prove that the characteristic sequence of log_α R has only the out-of-phase autocorrelation values {−1, 3}; this is left to the reader as an exercise.

This construction is generic in the sense that the difference sets with Singer parameters described in Section 3 can be plugged in to obtain many classes of binary sequences of period 2^m − 1 with the out-of-phase autocorrelation values {−1, 3} only. As seen before, the Gordon–Mills–Welch construction of Section 3.2.5 is generic and powerful in constructing difference sets with Singer parameters. The Gordon–Mills–Welch construction was generalized for constructing relative difference sets in [18, Proposition 3.2.1]. The idea of the construction of almost difference sets in this section is the same as the one in [18, Proposition 3.2.1]; however, our objective here is to construct almost difference sets.
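A numerical check of Theorem 7.1 for m = 4, with GF(16) realized as GF(2)[x]/(x⁴ + x + 1) (our choice of primitive polynomial; α = x is primitive for it, and R₂ = {α⁵} is a (3, 1, 0) difference set in GF(4)∗ = {1, α⁵, α¹⁰}):

```python
MOD = 0b10011  # x^4 + x + 1

def gmul(a, b):
    """Carry-less multiplication in GF(16) modulo x^4 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0b10000:
            a ^= MOD
        b >>= 1
    return r

def gpow(a, n):
    r = 1
    for _ in range(n):
        r = gmul(r, a)
    return r

alpha = 0b0010
log = {gpow(alpha, t): t for t in range(15)}    # discrete log table
trace = lambda x: x ^ gpow(x, 4)                # Tr_{2^4/2^2}(x) = x + x^4
R1 = {x for x in range(1, 16) if trace(x) == 1}
R2 = {gpow(alpha, 5)}                           # one element of GF(4)*
R = {gmul(r1, r2) for r1 in R1 for r2 in R2}
C = {log[r] for r in R}                         # support in Z_15

def autocorrelation(s, w):
    N = len(s)
    return sum((-1) ** (s[t] ^ s[(t + w) % N]) for t in range(N))

s = [1 if t in C else 0 for t in range(15)]
assert len(R1) == 4 and len(C) == 4             # k = 2^{m-1} - 2^{m/2} = 4
assert {autocorrelation(s, w) for w in range(1, 15)} == {-1, 3}
```

Here the value −1 occurs exactly at the shifts w ∈ {5, 10}, i.e. on log_α H, matching the group-ring computation in the proof.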
8. Concluding remarks

There are only a few constructions of binary sequences of period N with optimal autocorrelation for N ≡ 2 (mod 4) and N ≡ 1 (mod 4). It may be challenging to find new constructions. The classification of the binary sequences with optimal autocorrelation according to equivalence is open in some cases. For the equivalence of binary sequences with ideal autocorrelation, the reader is referred to [22] for information.

Acknowledgments

The authors wish to thank Qing Xiang for his help with the proof of Theorem 7.1, and the reviewer for the comments and suggestions that greatly improved the quality of this paper. The research of Cunsheng Ding is supported by the Research Grants Council of the Hong Kong Special Administrative Region, China, Proj. No. 612405.

References

[1] K.T. Arasu, C. Ding, T. Helleseth, P.V. Kumar, H. Martinsen, Almost difference sets and their sequences with optimal autocorrelation, IEEE Trans. Inform. Theory 47 (2001) 2834–2843.
[2] L.D. Baumert, Cyclic Difference Sets, in: Lecture Notes in Mathematics, vol. 182, Springer, Berlin, 1971.
[3] J.F. Dillon, H. Dobbertin, New cyclic difference sets with Singer parameters, Finite Fields Appl. 10 (2004) 342–389.
[4] C. Ding, Autocorrelation values of the generalized cyclotomic sequences of order 2, IEEE Trans. Inform. Theory 44 (1998) 1698–1702.
[5] C. Ding, T. Helleseth, K.Y. Lam, Several classes of sequences with three-level autocorrelation, IEEE Trans. Inform. Theory 45 (1999) 2606–2612.
[6] C. Ding, T. Helleseth, H.M. Martinsen, New families of binary sequences with optimal three-level autocorrelation, IEEE Trans. Inform. Theory 47 (2001) 428–433.
[7] B. Gordon, W.H. Mills, L.R. Welch, Some new difference sets, Canad. J. Math. 14 (1962) 614–625.
[8] M. Hall Jr., A survey of difference sets, Proc. Amer. Math. Soc. 7 (1956) 975–986.
[9] D. Jungnickel, Difference sets, in: J. Dinitz, D.R. Stinson (Eds.), Contemporary Design Theory, A Collection of Surveys, in: Wiley-Interscience Series in Discrete Mathematics and Optimization, Wiley, New York, 1992, pp. 241–324.
[10] D. Jungnickel, A. Pott, Difference sets: An introduction, in: A. Pott, P.V. Kumar, T. Helleseth, D. Jungnickel (Eds.), Difference Sets, Sequences, and their Correlation Properties, Kluwer, Amsterdam, 1999, pp. 259–296.
[11] D. Jungnickel, A. Pott, Perfect and almost perfect sequences, Discrete Appl. Math. 95 (1999) 331–359.
[12] D. Jungnickel, B. Schmidt, Difference sets: An update, in: J.W.P. Hirschfeld, S.S. Magliveras, M.J. de Resmini (Eds.), Geometry, Combinatorial Designs and Related Structures, Cambridge University Press, Cambridge, 1997, pp. 89–112.
[13] A. Lempel, M. Cohn, W.L. Eastman, A class of binary sequences with optimal autocorrelation properties, IEEE Trans. Inform. Theory 23 (1977) 38–42.
[14] A. Maschietti, Difference sets and hyperovals, Des. Codes Cryptogr. 14 (1998) 89–98.
[15] J.S. No, H. Chung, M.S. Yun, Binary pseudorandom sequences of period 2^m − 1 with ideal autocorrelation generated by the polynomial z^d + (z + 1)^d, IEEE Trans. Inform. Theory 44 (1998) 1278–1282.
[16] J.S. No, H. Chung, H.Y. Song, K. Yang, J.D. Lee, T. Helleseth, New construction for binary sequences of period p^m − 1 with optimal autocorrelation using (z + 1)^d + az^d + b, IEEE Trans. Inform. Theory 47 (2001) 1638–1644.
[17] R.E.A.C. Paley, On orthogonal matrices, J. Math. Phys. MIT 12 (1933) 311–320.
[18] A. Pott, Finite Geometry and Character Theory, in: Lecture Notes in Mathematics, vol. 1601, Springer, Heidelberg, 1995.
[19] V.M. Sidelnikov, Some k-valued pseudo-random sequences and nearly equidistant codes, Probl. Inf. Transm. 5 (1969) 12–16.
[20] J. Singer, A theorem in finite projective geometry and some applications to number theory, Trans. Amer. Math. Soc. 43 (1938) 377–385.
[21] T. Storer, Cyclotomy and Difference Sets, Markham, Chicago, 1967.
[22] Q. Xiang, Recent results on difference sets with classical parameters, in: A. Pott, P.V. Kumar, T. Helleseth, D. Jungnickel (Eds.), Difference Sets, Sequences, and their Correlation Properties, Kluwer, Amsterdam, 1999, pp. 419–437.
Theoretical Computer Science 410 (2009) 2323–2335
Topology on words

Cristian S. Calude (The University of Auckland, New Zealand), Helmut Jürgensen (The University of Western Ontario, London, Canada), Ludwig Staiger (Martin-Luther-Universität Halle-Wittenberg, Germany)
Keywords: Formal languages; Combinatorics on words; Topology on words; ω-languages; Order-based topologies

Abstract. We investigate properties of topologies on sets of finite and infinite words over a finite alphabet. The guiding example is the topology generated by the prefix relation on the set of finite words, considered as a partial order. This partial order extends naturally to the set of infinite words; hence it generates a topology on the union of the sets of finite and infinite words. We consider several partial orders which have similar properties and identify general principles according to which the transition from finite to infinite words is natural. We provide a uniform topological framework for the set of finite and infinite words to handle limits in a general fashion. © 2009 Elsevier B.V. All rights reserved.
1. Introduction and preliminary considerations

We investigate properties of various topologies on sets of words over a finite alphabet. When X is a finite alphabet, one considers the set X^∗ of finite words over X, the set X^ω of (right-)infinite words over X and the set X^∞ = X^∗ ∪ X^ω of all words over X. On the set X^∞ concatenation (in the usual sense) is a partial binary operation defined on X^∗ × X^∞. Infinite words are commonly considered limits of sequences of finite words in the following sense. A finite word u is said to be a prefix of a word w ∈ X^∞, written as u ≤_p w, if there is a word v ∈ X^∞ such that w = uv; when u ≠ w, u is a proper prefix of w, written as u <_p w.
ϕ : X^∗ → Y^∗ is monotone with respect to ≤. How can the mapping ϕ be extended, in a natural fashion, to a mapping ϕ : X^∞ → Y^∞?
In particular, we investigate which partial orders on X^∗ yield reasonable extensions. It turns out that prefix-based partial orders, that is, partial orders ≤ containing the prefix order, allow for such extensions of the topology Top_≤. Moreover, we consider properties of the limits defined with respect to these topologies on X^∗ and their extensions. Specifically, we explore to which extent topologies derived from such partial orders ≤ support a natural description of infinite words as limits of sequences of finite words, thus allowing for the extension of ≤-monotone mappings as indicated above. An important issue is how to present an infinite word ξ ∈ X^ω as a limit of a sequence, of order type ω, of finite words (w_j)_{j∈N} in such a way that ξ is a limit point of (w_j)_{j∈N} if and only if w₀ < w₁ < · · · < w_j < · · · < ξ.

In the case of the prefix order ≤_p, the concept of adherence plays a crucial rôle in extending continuous, that is, ≤_p-monotone, mappings from X^∗ to X^ω. We apply the ideas leading to the definition of adherence to partial orders different from the prefix order. We then investigate the properties of the resulting generalized notion of adherence with respect to limits.

Several fundamentally different ways of equipping the set X^∗ with a topology are proposed in the literature. Roughly, these can be classified as follows:
• Topologies arising from the comparison of words.
• Topologies arising from languages, that is, sets of words.
• Topologies arising from the multiplicative structure.

A similar classification can be made for topologies on X^ω and X^∞. For X^∞, topologies have not been studied much; however, to achieve a mathematically sound transition between X^∗ and X^ω, precisely such topologies are needed.

Our paper is structured as follows. In Section 2 we introduce notation and review some basic notions. In Sections 4 and 5 we briefly discuss topologies for the sets of finite and of infinite words as considered in the literature. General background regarding topologies and specifics relevant to topologies on words are introduced in Section 3. In Section 6 we consider extensions of partial orders on X^∗ to X^ω. Intuitively, the limits are related to reading from left to right, that is, according to the order type ω; topologies derived from partial orders rely on this idea. In Section 7 we explore this intuition. Section 8 provides a discussion of special cases. In Section 9 we summarize the ideas and discuss the results.

A preliminary version of this paper was presented at the Joint Workshop Domains VIII and Computability Over Continuous Data Types, Novosibirsk, September 11–15, 2007 [5].

2. Notation and basic notions

We introduce the notation used and also review some basic notions. By N we denote the set {0, 1, . . .} of non-negative integers; R denotes the set of real numbers; let R⁺ be the set of non-negative real numbers. For a set S, card S is the cardinality of S, and 2^S is the set of all subsets of S. If T is also a set then S^T is the set of mappings of T into S. The symbol ω denotes the smallest infinite ordinal number. As usual, ω is identified with the set N. Thus S^ω is the set of all mappings of N into S, hence the set of all infinite sequences of elements of S.
When considering singleton sets, we often omit the set brackets unless there is a risk of confusion.

An alphabet is a non-empty, finite set. The elements of an alphabet are referred to as symbols or letters. Unless specifically stated otherwise, every alphabet considered in this paper has at least two distinct elements. Let X be an alphabet. Then X^∗ denotes the set of all (finite) words over X including the empty word ε, and X⁺ = X^∗ \ {ε}. The set X^ω is the set of (right-)infinite words over X. Let X^∞ = X^∗ ∪ X^ω. With γ ∈ {∗, ω, ∞}, a γ-word is a word in X^γ. Similarly, a γ-language is a subset of X^γ. When we do not specify γ, γ = ∞ is implied. For a word w ∈ X^∞, |w| is its length.

On the set X^∞ concatenation (in the usual sense) is a partial binary operation defined on X^∗ × X^∞. With concatenation as operation, X^∗ is a free monoid and X⁺ is a free semigroup; moreover, X^∞ can be considered as a left act (also called a left operand)¹ resulting in a representation of the monoid X^∗ as a monoid of (left) transformations of the set X^∞. We also consider the shuffle product x, which is defined as follows: For u ∈ X^∗ and w ∈ X^∞,

u x w = {v ∈ X^∞ | ∃n ∃u₁, u₂, . . . , u_n ∈ X^∗ ∃w₀, w₁, . . . , w_{n−1} ∈ X^∗ ∃w_n ∈ X^∞ : u = u₁u₂ · · · u_n, w = w₀w₁ · · · w_n, v = w₀u₁w₁u₂ · · · w_{n−1}u_nw_n}.
We consider binary relations % ⊆ X^∞ × X^∞ and their restrictions to X^∗ × X^∗. Unless there is a risk of confusing the relations, the latter is also just denoted by %. Usually, such a relation is defined by some property of words, say P, and we write %_P to indicate this fact. When the restriction of %_P to X^∗ × X^∗ is a partial or strict order, we write ≤_P or <_P, respectively.

¹ See [13] for basic definitions.
• Prefix order: u ≤_p v if v ∈ uX^∞.
• Infix order: u ≤_i v if, for some w ∈ X^∗, v ∈ wuX^∞.
• Embedding (or shuffle) order: u ≤_e v if, for some w ∈ X^∞, v ∈ u x w.

For the next definitions we need a total ordering on the alphabet X as afforded, for instance, by a bijective mapping α of X onto the set {1, 2, . . . , q} where q = card X. Let u = u₁u₂ · · · u_n and v = v₁v₂ · · · with u₁, u₂, . . . , v₁, v₂, . . . ∈ X.
• Lexicographic order: If u ≰_p v and v ≰_p u, let i₀ = min{i | u_i ≠ v_i}. Then u ≤_lex v if u ≤_p v or if u ≰_p v, v ≰_p u and α(u_{i₀}) < α(v_{i₀}).
• Quasi-lexicographic (or pseudo-lexicographic) order: u ≤_q-lex v if |u| < |v| or if |u| = |v| and u ≤_lex v.

If ≤ is any one of these relations, then u < v if u ≤ v and u ≠ v. For a more comprehensive list of important binary relations, especially partial orders, on finite strings and their rôles in the definition of classes of languages or codes see [31,54,65].

Let ≤ be a partial order on X^∗. The right extension of ≤ to X^∗ × X^∞ is defined as follows: For u ∈ X^∗ and v ∈ X^ω, u ≤ v if there is a word w ∈ X^∗ such that w ≤_p v and u ≤ w. For v ∈ X^∞, the set Pred_≤ v = {u | u ∈ X^∗, u ≤ v} is the set of predecessors of v with respect to ≤. The set Succ_≤ v = {u | u ∈ X^∞, v ≤ u} is the set of successors of v with respect to ≤. In particular, Succ_≤ v = ∅ for v ∈ X^ω. For L ⊆ X^∞, let

Pred_≤ L = ∪_{v∈L} Pred_≤ v and Succ_≤ L = ∪_{v∈L} Succ_≤ v.
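For concreteness, the orders above on finite words can be implemented directly; a small sketch over X = {a, b} (function names are ours, and the bijection α is taken to be the usual character order):

```python
def is_prefix(u, v):
    return v.startswith(u)

def is_infix(u, v):
    return u in v

def is_embedded(u, v):
    # u is a scattered subword of v (embedding/shuffle order on finite words)
    it = iter(v)
    return all(c in it for c in u)

def leq_lex(u, v):
    # u <=_lex v: u is a prefix of v, or they branch and u branches lower
    if is_prefix(u, v):
        return True
    if is_prefix(v, u):
        return False
    i = next(i for i in range(min(len(u), len(v))) if u[i] != v[i])
    return u[i] < v[i]

def leq_qlex(u, v):
    return len(u) < len(v) or (len(u) == len(v) and leq_lex(u, v))

assert is_prefix("ab", "abba") and not is_prefix("ba", "abba")
assert is_infix("bb", "abba")
assert is_embedded("aa", "abba")      # a...a scattered inside abba
assert leq_lex("ab", "abba")          # prefix case
assert leq_lex("abab", "abb")         # branching case: a < b at position 2
assert leq_qlex("abb", "abab")        # shorter word comes first
```

Note that ≤_lex and ≤_q-lex disagree here: "abab" ≤_lex "abb" but "abb" ≤_q-lex "abab".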
Specifically, we define Pref = Pred_{≤_p} and Inf = Pred_{≤_i}. A ∗-language L is said to be prefix-free (or a prefix code) if, for all u, v ∈ L, u ≤_p v implies u = v. Similarly, L is infix-free (or an infix code) if, for all u, v ∈ L, u ≤_i v implies u = v. In general, for a binary relation %, a language L is %-free (or %-independent) if, for all u, v ∈ L, (u, v) ∈ % implies u = v. For further details concerning %-freeness and codes see [31].

3. General topologies

We now present some basic background concerning topologies; we use [21,35] as general references. For topologies on partially ordered sets see also [3,39].

3.1. Definitions

A topology τ on a set X is a pair τ = (X, O) where O ⊆ 2^X is a family of subsets, called open sets, containing X itself and being closed under finite intersections and arbitrary unions. Alternatively, a topology on X can be defined by a closure operator cl : 2^X → 2^X having the following properties:

M ⊆ cl(M)   (1)
cl(M) = cl(cl(M))   (2)
cl(M₁ ∪ M₂) = cl(M₁) ∪ cl(M₂)   (3)
cl(∅) = ∅   (4)
A set M satisfying cl(M) = M is said to be closed; the family of all complements of closed sets, O = {M | M ⊆ X ∧ cl(X \ M) = X \ M}, is closed under finite intersection and arbitrary union, hence a family of open sets.

A basis of a topology τ = (X, O) is a family B ⊆ 2^X such that every M ∈ O is a union of sets in B. A sub-basis of a topology τ = (X, O) is a family B′ ⊆ 2^X such that the family {∩_{j=1}^n M_j | n ∈ N ∧ M_j ∈ B′ for 1 ≤ j ≤ n} is a basis of τ. Every family B′ ⊆ 2^X when used as a sub-basis defines a topology on X.

A point x ∈ X is an accumulation point of a set M ⊆ X when x ∈ cl(M \ {x}). This condition is equivalent to the requirement that every open set M′ which contains x satisfies M′ ∩ (M \ {x}) ≠ ∅. One can define the closure via accumulation points:

cl(M) = M ∪ {x | x is an accumulation point of M}.   (5)
For a topological space (X, O) and a subset M ⊆ X, the pair (M, O_M) with O_M = {M ∩ M′ | M′ ∈ O} is the subspace topology on M induced by (X, O). Here B_M = {M ∩ M′ | M′ ∈ B} is a basis for (M, O_M) if B is a basis for (X, O).
3.2. Sequences and limits
A sequence in a space X is an ordered family (x_j)_{j∈N} where x_j ∈ X but not necessarily x_i ≠ x_j for i ≠ j; that is, such a sequence is an element of X^N. A point x in a topological space (X, O) is called a limit point of the sequence (x_j)_{j∈N} if, for every open set M ∈ O containing x, there is j₀ ∈ N such that x_j ∈ M for all j ≥ j₀. The set of all limit points of a sequence (x_j)_{j∈N} is denoted by lim x_j. Observe that a sequence may have more than one limit point or no limit point at all. In general topological spaces limit points of sequences are not sufficient to determine closed sets. In metric spaces the situation is different. Only the following holds true in general (see [21, Ch. I.6]).

Theorem 1. If a topological space (X, O) has a countable basis then for every M ⊆ X its closure cl(M) is the set of all limit points of sequences (x_j)_{j∈N} where x_j ∈ M for all j ∈ N.
A cluster point of a sequence (xj)j∈N is a point x such that for every open set M′ containing x there are infinitely many j such that xj ∈ M′ (see [21]). Similarly, a point x ∈ X is a cluster point of a set M ⊆ X if, for every open set M′ containing x, the intersection M′ ∩ M is infinite.

Remark 2. Every cluster point of M is also an accumulation point of M. In spaces where every finite set is closed, every accumulation point is also a cluster point. The difference in the definitions of accumulation and cluster points is useful in what follows, as most of the spaces considered in this paper have finite subsets which are not closed.

3.3. Right topology

In this last preliminary part we recall the concept of right (or Alexandrov) topology α≤ on a set X partially ordered by some relation ≤. This topology is generated by the basis of right-open intervals Bx = {y | y ∈ X ∧ x ≤ y}. It has the following properties (see [21]).

Proposition 3. Let (X, ≤) be a partially ordered set, and let α≤ be defined as (X, O≤) where O≤ = {⋃_{x∈M} Bx | M ⊆ X}. Then the following hold true.
(1) Bx is the smallest open set containing x.
(2) An arbitrary intersection of open sets is again open.
(3) For every pair of distinct points x, y ∈ X there is an open set containing one of the points but not the other. In particular, if y ≰ x then x ∉ By.
(4) A point x ∈ X is an isolated point, that is, the set {x} is open, if and only if x is a maximal element with respect to ≤ in X.

Note that, because of Property (3), α≤ is a T0 topology.
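The behaviour of the sets Bx can be made concrete on a small finite example. The following sketch (ours, not from the paper; the choice of the divisibility order on {1, . . . , 12} is our own) builds the right topology and checks Properties (1)/(4) and a sample of Property (2) of Proposition 3.

```python
# Illustration (ours, not from the paper): the right topology alpha_<=
# for the divisibility order on X = {1,...,12}.
X = set(range(1, 13))

def leq(x, y):
    return y % x == 0                      # x <= y  iff  x divides y

def B(x):                                  # B_x = {y | x <= y}
    return frozenset(y for y in X if leq(x, y))

# Property (4): {x} is open, i.e. B(x) = {x}, iff x is maximal.
maximal = {x for x in X if all(x == y or not leq(x, y) for y in X)}
isolated = {x for x in X if B(x) == frozenset({x})}
assert isolated == maximal == {7, 8, 9, 10, 11, 12}

# Property (2): intersections of basic opens are again open, e.g.
# B(2) ∩ B(3) = B(6) (common multiples of 2 and 3).
assert B(2) & B(3) == B(6)
```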
4. Review of topologies for finite words

Several fundamentally different ways of equipping the set X∗ with a topology are proposed in the literature, roughly classified as follows:

• Topologies arising from the comparison of words.
• Topologies arising from languages.
• Topologies arising from the multiplicative structure.

In most cases, the intended application of the topology requires that X∗ with the topology be a metric space. Topologies related to X∗ arise also when one considers the space of formal power series R⟨⟨X⟩⟩ with a semiring R as the coefficient domain and with the elements of X as non-commuting variables (see [34], for example).

4.1. Topologies from comparing words

At least two methods have been proposed for comparing words and deriving topologies from them. One of the historical origins is the theory of codes, where the size and, implicitly, the improbability of an error are measured in terms of the difference between words.² When only words of the same length are compared, as is the case in the theory of error-correcting codes, the Hamming or the Lee metric, depending on the physical context, is commonly used. The Hamming metric just counts the number of positions in which two words of the same length differ; the Lee metric assumes a cyclic structure on the alphabet X and reflects the sum of the cyclic differences of two words of the same length. Neither of these metrics seems to lead to a meaningful topology on the whole of X∗.

Also originating with the theory of codes is the Levenshtein distance [37] between words of arbitrary length; sometimes this distance measure is also called editing distance. It is widely used in the context of string matching algorithms as needed, for instance, in genome research. On the set X∗ one considers the three operations σ of substituting a symbol, ι of inserting a symbol and δ of deleting a symbol. To change a word x ∈ X∗ into a word y ∈ X∗, one can use a sequence of these operations; the reverse of this sequence will change y into x. The length of the shortest such sequence of operations is the Levenshtein distance³ between x and y; the operation σ is redundant as it can be simulated by ιδ. Hence one gets two different distance measures dσ,ι,δ and dι,δ, both being metrics, which give rise to homeomorphic topologies.

² See [31] for an explanation of the connection between error probability and difference of words.

Another idea is proposed in [7]. Let f : X∗ → R+ be an injective function such that f(ε) = 0. Then the function df : X∗ × X∗ → R with df(x, y) = |f(x) − f(y)| for x, y ∈ X∗ is a metric. For example, with card X = q, let α : X → {1, 2, . . . , q} be a bijection; for a word x = a1a2 ··· an ∈ X∗ with ai ∈ X for all i, let f(x) = Σ_{i=1}^{n} (q + 1)^{−i} α(ai). Then f corresponds to the lexicographical ordering of words in the following sense: f(x) < f(y) if and only if x <lex y.

In general, a partial order ≤ on X∗ gives rise to a topology Top≤ defined by the family {Succ≤ u | u ∈ X∗} as a sub-basis of open sets. Among these the prefix topology Top≤p plays a special rôle as the concept of successor coincides with the usual left-to-right reading of words. For the prefix order ≤p the set of successors of a word u ∈ X∗ is the set uX∗. For a given partial order, one can derive natural definitions of the notions of density and convexity. For the former, see [30,32]; for the latter, see [1], where the term continuity is used instead. For additional information see [54,65]. Another interesting method by which a topology could be derived from the comparison of words is analysed in [9] in an abstract setting, not in any way related to orders on words.

4.2. Topologies from languages

Let L ⊆ X∗ be a language (natural or formal) and let u, v ∈ X∗. A question raised early on in linguistics was how to quantify the comparison of the rôles played by the words u and v with respect to the language L (see [40]). The set CL(u) = {(x, y) | x, y ∈ X∗, xuy ∈ L} of permitted contexts of u is called the distribution class of u. The distribution class of a word can be interpreted as a description of the syntactic or semantic category of the word.
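Distribution classes can be explored computationally on finite data. The sketch below (ours, not from the paper; the sample language L, the alphabet {a, b} and all length bounds are our own hypothetical choices) computes CL(u) restricted to contexts of bounded length and exhibits two words with the same distribution class.

```python
# Sketch (ours, not from the paper): distribution classes
# C_L(u) = {(x, y) | x u y in L}, restricted to short contexts,
# for a hypothetical sample language L.
from itertools import product

def words(alphabet, max_len):
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield ''.join(t)

# L: words of length <= 6 over {a,b} with an even number of a's.
L = {w for w in words('ab', 6) if w.count('a') % 2 == 0}

def dist_class(u, max_ctx):
    # contexts (x, y) with |x|, |y| <= max_ctx such that x u y is in L
    return {(x, y) for x in words('ab', max_ctx)
                   for y in words('ab', max_ctx) if x + u + y in L}

# 'aa' and 'bb' admit exactly the same contexts; 'a' does not:
assert dist_class('aa', 2) == dist_class('bb', 2)
assert dist_class('a', 2) != dist_class('aa', 2)
```

The equality reflects that, for this L, only the parity of a's matters, so 'aa' and 'bb' play the same syntactic rôle.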
Thus one would like to express the topological relation between u and v in terms of a comparison of their distribution classes CL(u) and CL(v). A probabilistic version of these relations was introduced in [33]. Generalizing these thoughts one attempts to compare classes of words, that is, languages. While most of the elementary concepts concerning distribution classes can easily be extended to ∞-languages, the topological consequences of such a generalization have not been explored. Several different proposals for deriving topologies on X∗ and for equipping X∗ with a metric, which are based on language-theoretic concepts, are presented and analysed in [18,17,20,19,66,49,8]. Topologies on X∗ which are not induced by order relations were considered in [47,48,51]. Further topological properties derived for languages, automata or grammars are studied in [12,64,41,63,27,10,53].

4.3. Topologies from the multiplicative structure

In [22] a topology for free groups was introduced (see also [46]). These ideas were generalized to free monoids, that is, to X∗, in [44,45]. At this point we do not know how this work relates to our results.

5. Review of topologies for finite and infinite words

It seems that for finite and infinite words one usually only considers the topology related to the prefix order. See [43] for a general introduction. These topologies resemble the ones defined on semirings of formal power series (see [34]). Topologies on X∞, while needed for a sound definition of ω-words as limits of sequences of ∗-words, have not been studied much. As far as we know, the earliest such investigation is reported in [42,4]. There, instead of X∞, one considers (X ∪ {⊥})ω, where ⊥ is a new symbol such that a ∗-word w is represented by w⊥ω; the topology is then based on the prefix order. As mentioned above, we are looking for a natural way of extending mappings from finite words to infinite words.
The following method, applicable in the case of the prefix topology, will guide the ideas. Let ϕ : X∗ → X∗ be a mapping which is monotone with respect to ≤p. The natural extension of ϕ to a mapping ϕ : X∞ → X∞ is then defined by

ϕ(ξ) = sup≤p {ϕ(w) | w ∈ X∗ ∧ w ≤p ξ}

as shown in Fig. 1. For language-theoretic aspects see [38,4,58].

5.1. Topologies related to the prefix-limit process

We consider two topologies which are related to the extension process defined above. The first one is closely related to the topology of the Cantor space (Xω, ρ) where the function ρ : Xω × Xω → R, defined as

ρ(ξ, ζ) = inf{(card X)^{−|w|} | w ∈ Pref ξ ∩ Pref ζ}

for ξ, ζ ∈ Xω, is a metric.
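On finite approximations the metric ρ is easy to evaluate: the infimum is attained at the longest common prefix, so ρ(ξ, ζ) = (card X)^{−L} where L is the length of that prefix (and 0 when ξ = ζ). A numerical sketch (ours, not from the paper; the function names and truncation depth are our own):

```python
# Sketch (ours, not from the paper): the Cantor metric evaluated on
# finite truncations of omega-words given as infinite generators.
from itertools import islice

def rho(xi, zeta, q, depth=64):
    # Compare the first `depth` letters; if they all agree, the result
    # q**(-depth) approximates the true value 0 for xi = zeta.
    a = ''.join(islice(xi, depth))
    b = ''.join(islice(zeta, depth))
    lcp = 0
    while lcp < depth and a[lcp] == b[lcp]:
        lcp += 1
    return q ** (-lcp)       # inf over common prefixes w of q**(-|w|)

def periodic(p):             # the omega-word p p p ...
    while True:
        yield from p

# Over X = {0,1}: 0^omega and (0001)^omega share the prefix 000.
assert rho(periodic('0'), periodic('0001'), q=2) == 2 ** -3
```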
3 For algorithms to compare strings according to the Levenshtein distance and for applications to DNA-sequencing see [2,11].
Fig. 1. Extension of a mapping.
5.1.1. Cantor topology

For details regarding the Cantor topology we refer to [4]. As mentioned above, we introduce a new symbol ⊥ and represent the words w ∈ X∗ by the infinite words w⊥ω. For η, η′ ∈ X∞ one has

ρ(η, η′) = 0, if η = η′, and ρ(η, η′) = (card X)^{1−card(Pref η ∩ Pref η′)} otherwise.

Thus, the space (X∞, ρ) is considered as a subspace of the Cantor space ((X ∪ {⊥})ω, ρ) with all w ∈ X∗ as isolated points.
5.1.2. Redziejowski's topology

A different approach to defining a natural topology on X∞ is proposed in [50]. We refer to this topology as τR.

Definition 4. Let W ⊆ X∗ and F ⊆ Xω. We define W⃗ = {ξ | ξ ∈ Xω ∧ Pref ξ ∩ W is infinite} and the closure clR(W ∪ F) = W ∪ F ∪ W⃗.

We list a few properties of the topology τR (see [50]).

Proposition 5. The topology τR on X∞ has the following properties:
(1) The topology τR is not a metric topology.
(2) Every subset F ⊆ Xω is closed.
(3) The topological space (X∞, τR) is completely regular, hence a Hausdorff space.
(4) In contrast to the Cantor topology, where limn→∞ 0^n·1 = 0^ω, the sequence (0^n·1)n∈N has no limit in τR, while limn→∞ 0^n = 0^ω in both topologies.
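The Cantor side of Property (4) can be checked numerically: representing the finite word 0^n·1 as 0^n·1·⊥^ω in (X ∪ {⊥})^ω with X = {0, 1}, its distance to 0^ω is 3^{1−(n+1)} = 3^{−n}, which tends to 0. A sketch (ours, not from the paper; the truncation depth and the finite stand-in for 0^ω are our own):

```python
# Numerical sketch (ours, not from the paper): in the Cantor metric on
# (X ∪ {⊥})^omega with X = {0,1}, the finite words 0^n·1 converge to
# 0^omega; rho = 3**(1 - c), where c counts the common prefixes.

def rho(u, v, q=3, depth=200):
    # u, v: strings, padded with the symbol '⊥' to length `depth`;
    # q = card(X ∪ {⊥}) = 3.
    u = (u + '⊥' * depth)[:depth]
    v = (v + '⊥' * depth)[:depth]
    lcp = 0
    while lcp < depth and u[lcp] == v[lcp]:
        lcp += 1
    return q ** (1 - (lcp + 1))        # c = lcp + 1 common prefixes

zeros = '0' * 150                      # finite stand-in for 0^omega
for n in range(5):
    assert rho('0' * n + '1', zeros) == 3 ** (-n)
```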
5.2. Adherences

An operator very similar to the closure operator of the Cantor topology, called adherence (or ls-operator), was introduced to formalize the transition from finite to infinite words (see [57,61,38,42,4,58,59,26,28,36,15,23–25,55,62]). Adherence is defined as an operator on languages as follows.

Definition 6. The adherence of a language W ⊆ X∗ is the set Adh W = {ξ | ξ ∈ Xω ∧ Pref ξ ⊆ Pref W}.

An ω-word ξ is an element of Adh W if and only if, for all v ≤p ξ, the set W ∩ vX∗ is infinite.

5.2.1. Adherence and topologies

The following facts connect the concept of adherence with the closure operator in the Cantor topology of X∞.

Proposition 7. Let W ⊆ X∗ and F ⊆ Xω. The Cantor topology on X∞ has the following properties:
(1) The adherence Adh W is the set of cluster points of W.
(2) The closure of W ∪ F is the set W ∪ Adh(W ∪ Pref F).

5.2.2. Adherences as limits

Given the connection between adherence and closure, it is not surprising that adherence can be viewed as a kind of limit.

Proposition 8. Let ϕ : X∗ → X∗ be a mapping which is monotone with respect to ≤p and let ξ ∈ Xω. If the set ϕ(Pref ξ) is infinite then {ϕ(ξ)} = Adh {ϕ(w) | w ≤p ξ}. Moreover, for W ⊆ X∗,

ϕ^{−1}(Adh W) = Adh ϕ^{−1}(Pref W).
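On finite data, membership in Adh W amounts to checking that every tested prefix of ξ is a prefix of some word of W. The sketch below (ours, not from the paper; the language W = 0∗·1 and the testing depth are our own hypothetical choices) illustrates Definition 6; here Adh W = {0^ω}.

```python
# Illustrative sketch (ours, not from the paper): testing membership in
# Adh W up to a fixed depth, for W = {0^n 1 | n >= 0}.

def in_adh(xi_prefixes, W_pref):
    # xi_prefixes: finite list of prefixes of xi;
    # W_pref(v): is v a prefix of some word in W?
    return all(W_pref(v) for v in xi_prefixes)

def pref_of_W(v):
    # Pref(0*·1) = 0* ∪ 0*·1
    only_zeros = set(v) <= {'0'}
    zeros_then_one = (v.count('1') == 1 and v.endswith('1')
                      and set(v[:-1]) <= {'0'})
    return only_zeros or zeros_then_one

def prefixes(s):
    return [s[:i] for i in range(len(s) + 1)]

assert in_adh(prefixes('0' * 50), pref_of_W)            # 0^omega in Adh W
assert not in_adh(prefixes('0' * 5 + '11'), pref_of_W)  # 0^5·1^omega not
```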
Fig. 2. The situation of Definition 14.
6. Extending partial orders

As mentioned above, the need to consider partial orders different from the prefix order arose from the following general consideration in [6]: We needed to make a statement about the density of a certain kind of language with respect to all kinds of reasonable topologies; the prefix topology would have been just one special, albeit natural, case. Moreover, we needed a topologically well-founded transition between X∗ and X∞ which did not rely on the artifact of a padding symbol like ⊥ considered before. Therefore, in this section we consider extensions of partial orders ≤ on X∗ to the set X∞. Since we want the infinite words to be limits of sequences of finite words, we make them maximal elements in the extended order.

Definition 11. Let ≤ be a partial order on X∗. The relation on X∞ × X∞, again denoted by ≤, is defined as follows: η ≤ η′ if and only if one of the following holds:

η ≤ η′ in the original order, if η, η′ ∈ X∗;
η = η′, if η, η′ ∈ Xω;
∃v (v ∈ X∗ ∧ η ≤ v ∧ v ≤p η′), if η ∈ X∗ and η′ ∈ Xω.   (6)

From the third case of Definition 11 one concludes:

Remark 12. Let ≤ be an extended partial order on X∞. For all w ∈ X∗ and ξ ∈ Xω, if w ≤p ξ then w ≤ ξ.

Proposition 13. Let ξ ∈ Xω, w ∈ X∗, and let ≤ be an extended partial order on X∞. Then ξ ∈ Bw, that is, w ≤ ξ, if and only if Pref ξ ∩ Bw ≠ ∅.

Proof. If w ≤ ξ there is a u ∈ X∗ with w ≤ u and u ≤p ξ; then u ∈ Pref ξ ∩ Bw. Conversely, if u ∈ Pref ξ ∩ Bw then w ≤ u and u ≤p ξ, whence w ≤ ξ by Eq. (6).
By Corollary 20 and Example 21, the extensions of several highly relevant partial orders are indeed confluent. For extended partial orders we obtain the following equivalence.

Lemma 16. Let ≤ be an extended partial order. The relation ≤ is confluent if and only if

Bw ∩ Bv = ⋃_{u∈X∗, w,v≤u} Bu for all w, v ∈ X∗.
Proof. For all u, v ∈ X∗ one has v ≤ u if and only if Bu ⊆ Bv, for any extended partial order ≤. This proves the inclusion ⊇. Now assume that ≤ is confluent. We prove the converse inclusion. Let η ∈ Bw ∩ Bv, that is, w, v ≤ η. If η ∈ X∗ then η ∈ Bη ⊆ ⋃_{w,v≤u} Bu. If η ∈ Xω then, in view of Definition 14, there is a u ∈ X∗ with w, v ≤ u and u ≤ η; hence η ∈ Bu ⊆ ⋃_{w,v≤u} Bu.
Example 17. We consider the infix order ≤i.
(1) If X = {0, 1} then B0 ∩ B1 = B01 ∪ B10, and the union is finite, whereas the minimal representation B01 ∩ B10 = ⋃_{n≥1} (B01ⁿ0 ∪ B10ⁿ1) is an infinite union.
(2) If we consider B0 ∩ B1 over the ternary alphabet X = {0, 1, 2} then a minimal representation is B0 ∩ B1 = ⋃_{n≥0} (B02ⁿ1 ∪ B12ⁿ0), where the union is infinite.
Neither B01 ∩ B10 nor, in the ternary case, B0 ∩ B1 can be represented as a finite union.
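Part (1) can be verified exhaustively on short words: over X = {0, 1}, a word contains both 01 and 10 as infixes exactly when it contains some 0·1^n·0 or 1·0^n·1 as an infix. A brute-force sketch (ours, not from the paper; the length bound is our own):

```python
# Sketch (ours, not from the paper): exhaustive check of the minimal
# representation of B_01 ∩ B_10 for the infix order over {0,1}.
from itertools import product

def words(max_len):
    for n in range(max_len + 1):
        for t in product('01', repeat=n):
            yield ''.join(t)

def has_pattern(w):
    # some 0·1^n·0 or 1·0^n·1 (n >= 1) occurs as an infix of w
    return any(('0' + '1' * n + '0') in w or ('1' + '0' * n + '1') in w
               for n in range(1, len(w)))

for w in words(12):
    assert (('01' in w) and ('10' in w)) == has_pattern(w)
```

The check passes because any word with at least three maximal blocks of equal letters contains such a middle block, and conversely.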
6.1. Prefix-based partial orders

Intuitively, taking limits of words implies that one moves from prefixes to prefixes; hence the predominance of considerations based on the prefix order. While we shall not dwell on this point in the present paper, it is far less intuitive what a topology on words would look like if one took away the European way of reading words from left to right. In this section we consider topologies from partial orders which are compatible with the prefix order. Hence, ideas derived for the latter can be adequately generalized. We investigate particular cases of confluent extended partial orders. Several prominent instances of such orders are given in Example 21.

Definition 18. A partial order ≤ on X∗ is said to be prefix-based if, for all w, v ∈ X∗, w ≤p v implies w ≤ v.

Lemma 19. A partial order ≤ on X∗ is prefix-based if and only if, for all w, v, u ∈ X∗, w ≤ v and v ≤p u imply w ≤ u.

Proof. Let ≤ be prefix-based and let w ≤ v and v ≤p u. Then v ≤ u and, since ≤ is transitive, we get w ≤ u. Conversely, if w ≤ v and v ≤p u always imply w ≤ u, we choose w = v and obtain that v ≤p u implies v ≤ u, that is, ≤ is prefix-based.

Corollary 20. If a partial order ≤ on X∗ is prefix-based then its extension to X∞ is confluent.

Proof. Assume w, v ≤ ξ for w, v ∈ X∗ and ξ ∈ Xω. According to Eq. (6) there are uw, uv ∈ X∗ such that w ≤ uw ≤p ξ and v ≤ uv ≤p ξ. As uw and uv are prefixes of the same ω-word, one of them is a prefix of the other, say uw ≤p uv. By Lemma 19, w ≤ uv; hence w, v ≤ uv and uv ≤ ξ, and ≤ is confluent.

Example 21. The following partial orders on X∗ are prefix-based: the infix order ≤i, the embedding (or shuffle) order ≤e, the quasi-lexicographical order ≤q-lex, and the lexicographical order ≤lex.
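These orders can be written down as simple predicates on finite words; the sketch below (ours, not from the paper; function names are our own, and the symbol order 0 < 1 is assumed for the lexicographical cases) also spot-checks that each of them is prefix-based in the sense of Definition 18.

```python
# Sketches (ours, not from the paper) of the orders of Example 21.
from itertools import product

def leq_infix(w, v):
    return w in v                              # w is a factor of v

def leq_embed(w, v):
    it = iter(v)                               # w is a scattered subword
    return all(c in it for c in w)

def leq_qlex(w, v):
    return (len(w), w) <= (len(v), v)          # by length, then lex

def leq_lex(w, v):
    return v.startswith(w) or w < v            # prefix precedes extension

words = [''.join(t) for n in range(6) for t in product('01', repeat=n)]
for leq in (leq_infix, leq_embed, leq_qlex, leq_lex):
    for v in words:
        for i in range(len(v) + 1):
            assert leq(v[:i], v)               # every prefix is below v
```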
7. Topologies defined by extended partial orders

The topology τ≤ on X∞ is defined by taking as a sub-basis the family (Bw)w∈X∗ of sets

Bη = {η′ | η′ ∈ X∞ ∧ η ≤ η′}.   (7)

When ≤ = ≤p the resulting topology τ≤p on X∞ is a Scott topology (see [56]), that is, every directed family w0 ≤p ··· wi ≤p wi+1 ≤p ··· has a least upper bound. The partial orders considered above do not have this property. Consider, for example, the directed family 0 ≤ ··· ≤ 0^i ≤ 0^{i+1} ≤ ··· where ≤ is a partial order. When ≤ = ≤p, the ω-word 0^ω is the unique (and ''natural'') upper bound. On the other hand, when ≤ is any one of the relations considered above, in addition to 0^ω, also ∏_{i=0}^{∞} 0^i·1 is an upper bound.

For prefix-based relations ≤ we have a connection between ≤ and the prefix relation similar to Proposition 13.

Proposition 22. Let ξ ∈ Xω, w ∈ X∗, and let ≤ be the extension of a prefix-based partial order. Then w ≤ ξ if and only if Pref ξ \ Bw is finite.

Proof. If w ≤ ξ there is a u ∈ X∗ with w ≤ u and u ≤p ξ. By Lemma 19, w ≤ v for every v with u ≤p v ≤p ξ; hence Pref ξ \ Bw contains only proper prefixes of u and is finite. Conversely, if Pref ξ \ Bw is finite then, as Pref ξ is infinite, there is a u ∈ Pref ξ ∩ Bw, and w ≤ ξ follows from Eq. (6).

Similarly to the right topology α≤, for w ∈ X∗, the set Bw is the smallest open set containing w, and, since the family (Bw)w∈X∗ is countable, the topology τ≤ has the countable basis {⋂_{i=1}^{n} Bwi | n ∈ N ∧ wi ∈ X∗ for i = 1, . . . , n}.
From Lemma 16, we obtain a necessary and sufficient condition as to when the family (Bw)w∈X∗ is a basis.

Proposition 23. The family (Bw)w∈X∗ is a basis of the topology τ≤ if and only if the extended partial order ≤ is confluent.

Proposition 24. Let ≤ be an extended partial order on X∞. Then, for ξ ∈ Xω, one has Bξ = {ξ}, and Bξ is not open in τ≤. If, moreover, the order ≤ is confluent, then no non-empty subset F ⊆ Xω is open.

Proof. By definition, Bξ = {ξ}. Assume that {ξ} is open. Then {ξ} contains a non-empty basis set ⋂_{i=1}^{n} Bwi. Hence ξ ∈ Bwi for all i = 1, . . . , n. Consequently, for every i = 1, . . . , n, there is a prefix ui of ξ with wi ≤ ui. Let u be the longest of the words u1, . . . , un. Then every ω-word ξ′ with u ≤p ξ′ satisfies wi ≤ ui ≤p ξ′ and hence wi ≤ ξ′ by Eq. (6); thus uXω ⊆ ⋂_{i=1}^{n} Bwi ⊆ {ξ}, which is impossible as uXω contains more than one ω-word. If, moreover, ≤ is confluent then, by Proposition 23, every non-empty open set contains a set Bu with u ∈ X∗; as u ∈ Bu, it contains a finite word and, therefore, cannot be a subset of Xω.
The next example shows that the hypothesis for ≤ to be confluent is indeed essential. Example 25. Consider X = {0, 1} and the suffix order ≤s . Then B0 ∩ B1 = X ω \ {0ω , 1ω } ⊆ X ω is open.
7.1. Accumulation points and cluster points

In this part we use the fact that Bw is the smallest open set containing w ∈ X∗ to describe the accumulation and cluster points in the topology τ≤ in greater detail. As an immediate consequence, we obtain a result on finite words.

Lemma 26. Let w ∈ X∗ and M ⊆ X∞.
(1) w is an accumulation point of M with respect to τ≤ if and only if Bw ∩ (M \ {w}) ≠ ∅.
(2) w is a cluster point of M with respect to τ≤ if and only if Bw ∩ M is infinite.

For infinite words we obtain the following.

Lemma 27. Let ξ ∈ Xω and M ⊆ X∞.
(1) ξ is an accumulation point of M with respect to τ≤ if and only if Pref ξ ⊆ {v | Bv ∩ (M \ {ξ}) ≠ ∅}.
(2) ξ is a cluster point of M with respect to τ≤ if and only if Pref ξ ⊆ {v | card(Bv ∩ M) ≥ ℵ0}.
Proof. If ξ is an accumulation point of M then M′ ∩ (M \ {ξ}) ≠ ∅ for every open set M′ containing ξ. This holds, in particular, for every basis set Bv with v ∈ Pref ξ, since each such Bv is an open set containing ξ; hence Pref ξ ⊆ {v | Bv ∩ (M \ {ξ}) ≠ ∅}. Conversely, assume Pref ξ ⊆ {v | Bv ∩ (M \ {ξ}) ≠ ∅}, and let Bw be a sub-basis set containing ξ. Thus, w ≤ ξ and, according to Definition 11, there is a v ∈ Pref ξ with w ≤ v; then Bv ⊆ Bw and, consequently, Bw ∩ (M \ {ξ}) ≠ ∅. The argument for cluster points is analogous.
Now Eq. (5) yields the following characterisation of the closure cl≤.

Corollary 28. Let ≤ be an extended partial order on X∞ and, for M ⊆ X∞, let M̃ = {w | w ∈ X∗ ∧ Bw ∩ M ≠ ∅}. Then

cl≤(M) = M̃ ∪ {ξ | ξ ∈ Xω ∧ Pref ξ ⊆ M̃}.
The following example shows that in our topologies – unlike metric topologies – accumulation points and cluster points, even in Xω, can be different.

Example 29. Consider X = {0, 1} and the quasi-lexicographic order ≤q-lex. All non-empty open sets contain 1ω. Thus every ξ ∈ {0, 1}ω \ {1ω} is an accumulation point of the set M = {1ω}. But M has no cluster points.

Definition 30. A partial order ≤ on X∞ is well-founded if, for every w ∈ X∗, the set Pred≤ w of predecessors of w is finite.

Theorem 31. Let ≤ be a well-founded prefix-based partial order on X∞, let W ⊆ X∗ and let ξ ∈ Xω. Then ξ is an accumulation point of W if and only if it is a cluster point of W.

Proof. By Remark 2 every cluster point of W is an accumulation point of W. For the converse, we use Lemmata 26 and 27 to show that if, for all v ∈ Pref ξ, Bv ∩ W ≠ ∅, then every set Bv ∩ W with v ∈ Pref ξ is infinite. If w ∈ ⋂_{u∈Pref ξ} (Bu ∩ W) then u ≤ w for all u ∈ Pref ξ, which contradicts the fact that ≤ is well-founded. Consequently, ⋂_{u∈Pref ξ} (Bu ∩ W) = ∅. As ≤ is prefix-based, the family (Bu ∩ W)_{u∈Pref ξ} is a chain decreasing with respect to inclusion; if some Bv ∩ W were finite, this chain would eventually be constant and non-empty, contradicting the empty intersection. Hence every Bv ∩ W is infinite and, by Lemma 27, ξ is a cluster point of W.
We conclude this subsection with two examples which show that the assumptions regarding the partial order ≤ in Theorem 31 are essential.

Example 32. The lexicographical order ≤lex is not well-founded but prefix-based. Consider the language W = {11} ⊆ {0, 1}∗. Then the infinite word 1·0ω is an accumulation point of W. Since W is finite, it cannot have cluster points.

Example 33. The suffix order ≤s is well-founded but not prefix-based. Let again X = {0, 1} and consider the language W = {0} ∪ 1∗·101·1∗ and the infinite word ξ = 0·1ω. Here Bw ∩ W ≠ ∅ for all w ∈ Pref ξ = {ε} ∪ 0·1∗, and B0 ∩ W is finite.
7.2. Adherences related to the topologies τ≤

It is interesting to note that the closure operator cl≤ of the topology τ≤ is closely related to the language-theoretical operation of adherence. Adherence (or ls-limit) was first introduced for the prefix relation ≤p (see [57,61,38,42,4,58,59]), and then in [16] for the infix order ≤i. In this section we define the operation of adherence for arbitrary extended partial orders ≤ and we prove its relation to the corresponding closure operation cl≤. Moreover, we show that for prefix-based partial orders adherence can be expressed with the aid of the prefix order.

For notational convenience, given a partial order ≤ on X, we define a relation, also denoted by ≤, on 2^X as follows: Let M, M′ ⊆ X. Then M ≤ M′ if and only if for every x ∈ M there is an x′ ∈ M′ such that x ≤ x′.

Proposition 34. The relation ≤ on 2^X has the following properties:
(1) ≤ is reflexive and transitive.
(2) ≤ is not necessarily anti-symmetric.
(3) M ⊆ M′ implies M ≤ M′.
(4) ∅ ≤ M for all M ⊆ X.
(5) Let I be a set and, for i ∈ I, let Mi, Mi′ ⊆ X such that Mi ≤ Mi′. Then ⋃_{i∈I} Mi ≤ ⋃_{i∈I} Mi′.
(6) With X = X∞, let ≤ be an extended partial order, let M ⊆ X∗ and M′ ⊆ X∞. Then M ≤ M′ if and only if M ⊆ {w | Bw ∩ M′ ≠ ∅}.
Proof. Assertions (1)–(5) are direct consequences of the definition. For (6) one uses Eq. (7).

By Proposition 34(6), in the particular case of the prefix order ≤p and subsets W ⊆ X∗, M′ ⊆ X∞, one has W ≤p M′ if and only if W ⊆ Pref M′. For extended partial orders we obtain the following properties.

Lemma 35. Let ≤ be an extended partial order on X∞, let ξ ∈ Xω, and let M ⊆ X∞. Then Pred≤ ξ ≤ M if and only if Pref ξ ≤ M.

Proof. By Remark 12, Pref ξ ⊆ Pred≤ ξ; hence, if Pred≤ ξ ≤ M then Pref ξ ≤ M. To prove the converse implication, consider v ≤ ξ. Then there is a u ∈ X∗ with v ≤ u and u ≤p ξ. Since Pref ξ ≤ M, there is an η ∈ M with u ≤ η; by transitivity, v ≤ η. Thus Pred≤ ξ ≤ M.
is the ≤-adherence of W . Remark 39. Adh≤ W = {ξ | ξ ∈ X ω ∧ Pred≤ ξ ≤ W }. Proposition 40. If ≤ is an extended partial order then Adh≤ W = {ξ | ξ ∈ X ω ∧ Pref ξ ≤ W }. Proof. This follows from Lemma 35. Lemma 41. Let ≤ be an extended partial order on X ∞ and W ⊆ X ∗ . (1) Adh≤ W is the set of accumulation points of W in X ω . (2) If ≤ is well-founded and prefix-based then Adh≤ W is the set of cluster points of W .
Proof. Let ξ ∈ Xω. In view of the equivalence of v ≤ w and w ∈ Bv, we have Pref ξ ⊆ {v | Bv ∩ W ≠ ∅} if and only if Pref ξ ≤ W. Now Proposition 40 proves the first assertion. Assertion (2) follows from (1) and Theorem 31.

Now we can prove the result as announced.

Theorem 42. Let W ⊆ X∗, F ⊆ Xω, and let ≤ be an extended partial order on X∞. Then the closure of W ∪ F in the topology τ≤ satisfies

cl≤(W ∪ F) = Pred≤(W ∪ F) ∪ Adh≤(W ∪ Pref F).

Proof. By Corollary 28 one has

cl≤(W ∪ F) = {v | v ∈ X∗ ∧ Bv ∩ (W ∪ F) ≠ ∅} ∪ {ξ | ξ ∈ Xω ∧ Pref ξ ⊆ {v | Bv ∩ (W ∪ F) ≠ ∅}}.

Observe that {v | v ∈ X∗ ∧ Bv ∩ (W ∪ F) ≠ ∅} = Pred≤(W ∪ F). Lemma 36 shows that the conditions Bv ∩ (W ∪ F) ≠ ∅ and Bv ∩ (W ∪ Pref F) ≠ ∅ are equivalent. Thus

{ξ | ξ ∈ Xω ∧ Pref ξ ⊆ {v | Bv ∩ (W ∪ F) ≠ ∅}} = {ξ | ξ ∈ Xω ∧ Pref ξ ⊆ Pred≤(W ∪ Pref F)} = Adh≤(W ∪ Pref F),

and the assertion is proved.

For the infix order, Dare and Siromoney [16] obtained the identity cl≤i(W ∪ F) = Inf(W ∪ F) ∪ Adh≤i(W ∪ Inf F) where Inf M = {v | v ∈ X∗ ∧ ∃η(η ∈ M ∧ v ≤i η)}. As Pred≤i = Inf, the result of [16] is a special case of Theorem 42.

Corollary 43. Let ≤ be an extended partial order on X∞, and let W ⊆ X∗. Then Adh≤ W = cl≤(W) ∩ Xω.

7.3. Limits of sequences

We investigate general properties of the topological spaces τ≤ in connection with the language-theoretical operation of adherence. As mentioned before, we want to study limits of sequences w0 < ··· < wj < wj+1 < ··· in the topology τ≤. Recall that a point η ∈ X∞ is in the limit of the sequence (wj)j∈N if and only if wj ≤ η for almost all j ∈ N. Thus, if wi ≠ wj for i ≠ j, the set of limit points lim wj is a subset of the set of cluster points of {wj | j ∈ N}.
Lemma 44. Let w0 < w1 < ··· < wj < ··· be an infinite family of words, and let the partial order ≤ be well-founded. Then lim (wj)j∈N = Adh≤ {wj | j ∈ N}.

Proof. As ≤ is well-founded, no limit point of (wj)j∈N can be a finite word. The inclusion lim (wj)j∈N ⊆ cl≤ {wj | j ∈ N} follows from Theorem 1, because the topology τ≤ has a countable basis, and from Corollary 43.

Conversely, let ξ ∈ cl≤ {wj | j ∈ N} ∩ Xω = Adh≤ {wj | j ∈ N}. Then, according to Corollary 28, for every open set M containing ξ there is a j0 ∈ N such that wj0 ∈ M. Without loss of generality, we may assume M = ⋂_{i=1}^{n} Bvi to be a basis set. Thus vi ≤ wj0 for i = 1, . . . , n. Now the assumption w0 < w1 < ··· < wj < ··· shows that wj ∈ Bvi for all i = 1, . . . , n and j ≥ j0; hence ξ ∈ lim (wj)j∈N.

8. The topology on Xω induced by τ≤
In this section we briefly investigate the topologies τ≤^(ω) on the space of infinite words Xω which are induced by the quasi-right topologies τ≤ on X∞. These topologies are defined by the sub-basis (Ew)w∈X∗ where Ew = {ξ | ξ ∈ Xω ∧ w ≤ ξ}.

The first result concerns the closure operator cl≤^(ω) of τ≤^(ω).

Theorem 45. Let ≤ be an extended partial order on X∞. Then cl≤^(ω) F = Adh≤ Pref F is the closure of F ⊆ Xω in the topology τ≤^(ω).

Proof. Since τ≤^(ω) is the topology on Xω induced by τ≤, we have cl≤^(ω)(F) = cl≤(F) ∩ Xω. Now the assertion follows from Corollary 43.

In connection with Lemma 44 this result establishes conditions for an increasing family of words w0 < ··· < wj < wj+1 < ··· to have a unique limit point in Xω. A necessary condition for this is, obviously, that the topology τ≤^(ω) should have the singletons {ξ}, ξ ∈ Xω, as closed sets. We now investigate this issue for the partial orders of Example 21.
8.1. Quasi-lexicographical and lexicographical order

The case of the quasi-lexicographical order ≤q-lex is trivial.

Example 46. The topology on Xω induced by τ≤q-lex is trivial: only ∅ and Xω are open, as w ≤q-lex ξ for all w ∈ X∗ and ξ ∈ Xω.

For the case of the lexicographical order some preliminary considerations are needed. Regard the alphabet X as the set of non-zero q-ary digits X = {1, . . . , q − 1} where card X = q − 1, and identify an ∞-word η ∈ X∞ with the q-ary expansion 0.η of a number in the real interval [0, 1]. For ω-words this yields an injective and continuous mapping ν from Xω into the interval [0, 1], the image ν(Xω) of which is closed.

Example 47. For w ∈ X∗ and ξ ∈ Xω, w ≤lex ξ if and only if ν(w) ≤ ν(ξ). This implies that, for ζ, ξ ∈ Xω, ν(ζ) ≤ ν(ξ) if and only if Pred≤lex ζ ⊆ Pred≤lex ξ. Thus the topology on Xω induced by τ≤lex is homeomorphic to the right topology on the closed subset ν(Xω) of the unit interval. Among its closed sets, only the set {1ω} is finite; all other closed sets are infinite. Note that ν(1ω) is the minimum of ν(Xω).

8.2. Subword topology and disjunctive ω-words
The topology τ≤i^(ω), also known as the subword topology, was investigated in [16,60]. To study it, the following notion of disjunctivity is useful.

Definition 48 ([29]). An ω-word ξ ∈ Xω is disjunctive if w ≤i ξ for all w ∈ X∗.

The subword topology on Xω has the following property.

Example 49 ([60]). The topology on Xω induced by τ≤i has the set of all disjunctive ω-words as the intersection of all its non-empty open sets; that is, the closure of every singleton set {ξ}, where ξ is disjunctive, is the whole space Xω. The only closed singleton sets in this topology are the sets {aω} where a ∈ X.

8.3. Embedding order
The investigation of the topology τ≤e^(ω) induced by the embedding order can be carried out in a manner analogous to the subword topology (see also [16]). Here the ω-words containing each letter a ∈ X infinitely often play the same rôle as the disjunctive words in the case of the subword topology.

Example 50. The topology on Xω induced by τ≤e has the set of all ω-words containing each letter a ∈ X infinitely often as the intersection of all its non-empty open sets; that is, the closure of every singleton {ξ}, where ξ contains each letter infinitely often, is the whole space Xω. The only closed singletons in this topology are the sets {aω} where a ∈ X.

9. Final comments

We have identified some principles of inference by which sequences of finite words are extrapolated to infinite words and by which continuous functions on words can be defined. These principles are not restricted to the prefix order of words itself, but still rely on it quite heavily. It should be possible to derive far more general principles which apply to many more relations between words by changing the intuition about words being read left to right. Our main point in this paper is to focus on the underlying topologies and to expose the difficulty of defining meaningful topologies on X∞.

Acknowledgement

This research was supported in part by the Natural Sciences and Engineering Research Council of Canada.

References
[1] T. Ang, J.A. Brzozowski, Continuous languages, in Csuhaj-Varjú and Ésik [14], 74–85.
[2] A. Apostolico, String editing and longest common subsequences, in Rozenberg and Salomaa [52], vol. 2, 361–398.
[3] J. Bertrema, Topologies sur des espaces ordonnés, RAIRO Inform. Théor. 16 (1982) 165–182.
[4] L. Boasson, M. Nivat, Adherences of languages, J. Comput. System Sci. 20 (1980) 285–309.
[5] C.S. Calude, H. Jürgensen, L. Staiger, Topology on strings, in: Joint Workshop: Domains VIII and Computability over Continuous Data Types, Novosibirsk, 11–15 September, 2007.
[6] C.S. Calude, H. Jürgensen, M. Zimand, Is independence an exception? Appl. Math. Comput. 66 (1994) 63–76.
[7] C.S. Calude, Sur une classe de distances dans un demi-groupe libre, Bull. Math. Soc. Sci. Math. R. S. Roumanie (N.S.) 17 (65) (1973) 123–133.
[8] C.S. Calude, On the metrizability of a free monoid, Discrete Math. 15 (1976) 307–310.
[9] C.S. Calude, V.E. Căzănescu, On topologies generated by Moisil resemblance relations, Discrete Math. 25 (1979) 109–115.
[10] C.S. Calude, S. Marcus, L. Staiger, A topological characterization of random sequences, Inform. Process. Lett. 88 (2003) 245–250.
[11] C.S. Calude, K. Salomaa, S. Yu, Additive distance and quasi-distances between words, J. UCS 8 (2002) 141–152.
[12] Y.A. Choueka, Structure automata, IEEE Trans. Comput. C-23 (1974) 1218–1227.
Theoretical Computer Science 410 (2009) 2336–2344
On the intersection of regex languages with regular languages
Cezar Câmpeanu a,∗, Nicolae Santean b
a Department of Computer Science and Information Technology, University of Prince Edward Island, Canada
b Department of Computer and Information Sciences, Indiana University South Bend, IN, USA
Article info
Keywords: Extended regular expression; Regex automata system; Regex
Abstract
In this paper we revisit the semantics of extended regular expressions (regex), defined succinctly in the 90s [A.V. Aho, Algorithms for finding patterns in strings, in: Jan van Leeuwen (Ed.), Handbook of Theoretical Computer Science, in: Algorithms and Complexity, vol. A, Elsevier and MIT Press, 1990, pp. 255–300] and rigorously in 2003 by Câmpeanu, Salomaa and Yu [C. Câmpeanu, K. Salomaa, S. Yu, A formal study of practical regular expressions, IJFCS 14 (6) (2003) 1007–1018], where the authors stated an open problem, namely whether regex languages are closed under intersection with regular languages. We give a positive answer; to do so, we propose a new class of machines, regex automata systems (RAS), which are equivalent to regex. Among other uses, these machines provide a consistent and convenient method of implementing regex in practice. We also prove, as a consequence of this closure property, that several languages, such as the mirror language, the language of palindromes, and the language of balanced words, are not regex languages. © 2009 Elsevier B.V. All rights reserved.
1. Introduction
Regular expressions are powerful programming tools present in many scripting languages such as Perl, Awk, PHP, and Python, as well as in most programming languages implemented after the year 2000. Despite a similar nomenclature, these practical regular expressions (called regex in our paper) are more powerful than the regular expressions defined in formal language theory, mainly due to the presence of the back-reference operator. This operator allows us to express patterns (repetitions) in words; therefore regex can specify languages beyond the regular family. For example, the regex (a∗b)\1 expresses all the double words starting with arbitrarily many a's followed by a b: the operator "\1" is a reference to (a copy of) the content of the first pair of parentheses. The current implementations of extended regular expressions are plagued by many conceptual problems, which can readily be demonstrated on many systems. For example, the use of the Perl1 regex ((a)|(b))∗\2 or ((a)|(b))∗\2\3 leads to erratic behavior due to its inherent semantic ambiguity. Furthermore, in Perl, the expression () is considered to match the empty word, whereas it should arguably match the empty set; thus, there is no semantic difference between the Perl expressions () and ()∗. Moreover, in theory, a back-reference should replicate the last match of its corresponding parenthesis if such a match has occurred, or ∅ otherwise. In the following Perl example this is not the case: ((a|b)|(b|a))∗c\2\3 matches babbbcbb but not babbcab; however, ((a|b)|(b|a)∗)∗c\2\3 matches both in some implementations.2 Here the behavior suggests that the second parenthesis always matches ε and never b's. Tested on babbcba and abcba, we discover that these words are matched, suggesting that non-determinism in these regex implementations is selective. Thus, we observe implementation inconsistencies, ambiguities and a lack of standard semantics. This unfortunate status quo of having flawed
∗ Corresponding author. E-mail address: [email protected] (C. Câmpeanu).
1 Tested on more than ten different implementations of Perl 5.x on Solaris and Linux systems.
2 Newer versions of Perl seem to have fewer such pathological cases; however, we found other cases of queer behavior that were not present in the previous versions.
0304-3975/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2009.02.022
regex implementations, as well as an incomplete theoretical foundation, has recently led to an increased research effort aiming at their better understanding. Some of the problems of regex semantics have been addressed recently in the work of Câmpeanu, Kai Salomaa, and Sheng Yu, who initiated a rigorous formalism for regex in [2]. In addition, Câmpeanu and Sheng Yu provide an alternative to this formalism by introducing pattern expressions in [4]. The present paper continues their line of research, focusing on two matters: to deal with some pathological aspects of regex semantics and, most importantly, to answer an open problem stated in [2, Conclusion], namely whether regex languages are closed under intersection with regular languages.
2. Definitions and notation
Let Σ be an alphabet, that is, a finite set of symbols (or letters). By Σ∗ we denote the set of all words (strings of symbols) over Σ, and by ε we denote the empty word, i.e., the word with no letters. If w ∈ Σ∗, we denote by |w|a the number of occurrences of the symbol a in w, and by |w| the length of w (the total number of letters in w). A language L is a subset of Σ∗. The cardinality of a set X is denoted by #(X). For other notions we refer the reader to [7–10]. An extended regular expression, or regex for brevity, is a regular expression with back-references [6]. This extension can be found in most programming languages and has been conceptualized in several studies, such as [1,2,4]. We give here a definition equivalent to the one found in [1, Ch. 5, Section 2.3, p. 261].
Definition 1. A regex over Σ is a well-formed parenthesized formula, consisting of operands in Σ∗ ∪ {\i | i ≥ 1}, the binary operators · and +, and the unary operator ∗ (Kleene star). By convention, () and any other form of "empty" expression is a regex denoting ∅ (consequently, ()∗ will denote ε).
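Definition 1 and the introductory example can be illustrated with Python's re module, whose backslash-digit syntax implements exactly this back-reference operator (an illustrative sketch using one concrete regex implementation, not part of the paper's formalism):

```python
import re

# The regex (a*b)\1 from the introduction: \1 re-matches the exact
# sub-word captured by the first pair of parentheses.
double_word = re.compile(r"(a*b)\1")

assert double_word.fullmatch("aabaab")           # \1 duplicates "aab"
assert double_word.fullmatch("bb")               # \1 duplicates "b"
assert double_word.fullmatch("aabab") is None    # the two halves differ
assert double_word.fullmatch("aabaabb") is None  # trailing b unmatched
```

The matched language { wb wb | w ∈ a∗ } is not regular, which is exactly why back-references push regex strictly beyond finite automata.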
Besides the common rules governing regular expressions, a regex obeys the following syntactic rule: every control character \i is found to the right of the ith pair of parentheses, where parentheses are indexed according to the occurrence sequence of their left parenthesis. The language represented by a regex r is that of all words matching r in the sense of regular expression matching, with the additional semantic rules: (1) During the matching of a word with a regex r, a control \i should match a sub-word that has matched the parenthesis i in r. There is one exception to this rule: (2) If the ith pair of parentheses is under a Kleene star and '\i' is not under the same Kleene star, then '\i' matches the content of the pair of parentheses under the Kleene star, as given by its last iteration.
Example 2. The expression r = (a∗)b\1 defines the language {a^n b a^n | n ≥ 0}. For the expression r = (a∗b)∗\1, aabaaabaaab ∈ L(r) and aabaaabaab ∉ L(r).
Remark 3. Most programming languages use | instead of + to avoid ambiguity between the + sign and the + superscript. Since there is no danger of confusion, in this paper we will use + for alternation.
There is a regex construct that exhibits a semantic ambiguity, which should arguably be reported as an error during the syntactic analysis preceding the regex parsing.3 Consider the following example: r = ((a) + (b))(ab + \2). Here, we have a back-reference to the second pair of parentheses, which is involved in an alternation. What happens when this pair of parentheses is not instantiated? We adopt the following convention4: if a control \i refers to the pair of parentheses i which has not been instantiated due to an alternation, we assume that pair of parentheses instantiated with ∅; thus \i will match ∅ (note that ∅ concatenated with any word or language yields ∅). It turns out that although regex languages are not regular, they are subject to a pumping lemma similar to that for regular languages [2].
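Semantic rule (2), the last-iteration behavior of Example 2, is also observable in Python's re engine, where a group under a star retains only its last match (an illustrative sketch; Python is used here only as one concrete implementation of these semantics):

```python
import re

# r = (a*b)*\1 : the starred group is re-instantiated on every
# iteration, and \1 refers to its *last* iteration only.
r = re.compile(r"(a*b)*\1")

m = r.fullmatch("aabaaabaaab")      # iterations aab, aaab; then \1 = aaab
assert m and m.group(1) == "aaab"
assert r.fullmatch("aabaaabaab") is None  # last iteration is not a suffix
```

Incidentally, Python's treatment of a back-reference to a group that never matched (the match fails) agrees with the ∅ convention adopted above.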
We finally mention the complexity of the membership problem for regex:
Theorem 4 ([1]). The membership problem for regex is NP-complete.
This theorem holds regardless of the interpretation of regex. Notice the big gap between this complexity and the complexity of the membership problem for classical regular expressions, which is solvable in polynomial time.
3. Regex machines: Regex automata systems
In this section we propose a system of finite automata, with computations governed by a stack, that addresses the membership problem for regex. The purpose of this automata system is twofold: to give a theoretically sound method for implementing regex in practice, and to prove the closure property of regex under intersection with regular languages.
3 In most programming languages these expressions are called "bad regex", and the recommendation is to avoid such expressions.
4 All the proofs in this paper can be adapted to any other alternative semantics.
First we give a definition of a Regex Automata System (RAS), independent of the concept of regex. Let Σ be a finite alphabet and {u1, v1, . . . , un, vn} be a set of 2n variable symbols, n ≥ 1. For k ∈ {1, . . . , n} we denote by Σk the alphabet Σ ∪ {u1, v1, . . . , uk−1, vk−1} (thus, Σ1 = Σ). Let n > 0 and let (Ak = (Σk, Qk, 0k, δk, Fk))k∈{1,...,n} be a system of finite automata satisfying the following conditions: (1) for any k ∈ {1, . . . , n}, the variable symbol uh appears as the label of at most one transition, and in at most one automaton Ak, with h < k. When this occurs, we write uh ≺ uk and say that "the instantiation of uh is included in that of uk". We further denote by ⪯ the transitive and reflexive closure of ≺, as an order over the set {u1, . . . , un}. (2) for any k ∈ {1, . . . , n}, the variable symbol vk does not appear as a transition label of any automaton Ah with uh ⪯ uk. These two conditions have an important role for the correct operation of a RAS and in the relationship between regex and RAS. Note that by the first condition, un cannot appear as a transition label of any automaton Ak with 1 ≤ k ≤ n. If we denote Q = ⋃_{k=1}^{n} Qk, then we define a regex automata system (RAS) as a tuple A = (A1, . . . , An, Γ, V1, . . . , Vn)
of n finite automata Ak, a stack Γ of depth at most n storing elements of Q, and n buffers Vk that store words in Σ∗, 1 ≤ k ≤ n. To lighten the formalism, we will make no distinction between a buffer and its content, or between the stack and its content. Our RAS A is described at any moment of its computation by a configuration of the form (q, w, Γ, V1, V2, . . . , Vn), where q ∈ Q, w ∈ Σ∗, Γ is the stack content (of elements of Q), and the buffer Vk stores a word in Σ∗ that has the role of instantiating the variable uk, for all k ∈ {1, . . . , n}. The computation starts with the initial configuration (0n, w, ε, ∅, ∅, . . . , ∅) (with n empty buffers), and the system transits from configuration to configuration,
(s, αw, Γ(t), V1(t), V2(t), . . . , Vn(t)) ↦ (q, w, Γ(t+1), V1(t+1), V2(t+1), . . . , Vn(t+1)),
in one of the following circumstances:
(1) letter-transition: α = a ∈ Σ, s ∈ Qk, q ∈ δk(s, a), Γ(t+1) = Γ(t), Vh(t+1) = Vh(t) a for all h such that uk ⪯ uh, and Vh(t+1) = Vh(t) in all the other cases.
(2) v-transition: α ∈ Σ∗, s ∈ Qk, q ∈ δk(s, vh), Γ(t+1) = Γ(t), Vh(t) = α, Vl(t+1) = Vl(t) α for all l such that uk ⪯ ul, and Vl(t+1) = Vl(t) in all the other cases. Obviously, when Vh(t) = ∅ this transition cannot be performed.
(3) u-transition: α = ε, s ∈ Qk, r ∈ δk(s, uh), q = 0h, Γ(t+1) = push(r, Γ(t)), Vh(t+1) = ε, and Vl(t+1) = Vl(t) for all l ≠ h.
(4) context switch: α = ε, s ∈ Fh (h ≠ n), q = top(Γ(t)), Γ(t+1) = pop(Γ(t)), and Vl(t+1) = Vl(t) for all l.
If f ∈ Fn, then a configuration (f, ε, ε, V1, V2, . . . , Vn) is final. A computation is successful if it reaches a final configuration. At the end of a successful computation, the buffer Vn will store the initial input word, whereas the buffers Vk with 1 ≤ k < n contain the last word that has instantiated the variable uk. The difference between the variables uk and vk sharing the same buffer Vk is that uk is "instantiated" (or re-instantiated) with the content of the buffer, while vk simply uses the buffer to match a portion of the input word with the last instantiation of uk. Note that the stack Γ may have at most n − 1 elements. If qq′ ∈ Γ (the top of the stack is to the right) with q ∈ Qk and q′ ∈ Qh, we have that uh ≺ uk. Thus, Γ can be viewed as part of the finite control of A. What makes a RAS more powerful than a finite automaton is the set of n buffers, each capable of holding an arbitrarily long word. In order to prove that RAS and regex are equivalent, we present a conversion of a regex into a RAS and vice versa. For our construction, the usual indexing of regex used in practice is not convenient. We require another manner of indexing parentheses in a regex, obeying the following rules: (1) the entire expression is enclosed by a pair of parentheses; (2) inner pairs of parentheses have an index smaller than those of the parentheses that surround them; (3) if two pairs of parentheses enclosed in an outer one are not nested, then the left pair has a higher index than the right one. This order corresponds to an inverse BFS (breadth-first search) order of traversing the parenthesis tree. We mention that the third condition above is not crucial; however, it helps the formalism. One can easily transform a "classical" regex into one obeying the above rules. For example, the regex (1 (2 a∗b)∗c)\2 + (3 a∗(4 b + ba))\3 is reindexed, and the back-references are adjusted, as follows: (5 (4 (2 a∗b)∗c)\2 + (3 a∗(1 b + ba))\3).
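The inverse-BFS reindexing just described can be sketched programmatically. The following Python function (an illustrative sketch with a hypothetical name, not the authors' algorithm) maps each '(' position of a fully parenthesized regex to its new index:

```python
from collections import deque

def reindex_parens(regex):
    """Assign inverse-BFS indices to the parenthesis pairs of a fully
    parenthesized regex: the whole expression gets the highest index,
    deeper and right-most pairs get smaller ones (illustrative sketch)."""
    # Build the parenthesis tree; each node is the position of its '('.
    children = {None: []}        # parent position -> child positions
    stack = [None]
    for pos, ch in enumerate(regex):
        if ch == "(":
            children[stack[-1]].append(pos)
            children[pos] = []
            stack.append(pos)
        elif ch == ")":
            stack.pop()
    # BFS from the outermost pair, then number in reverse visit order,
    # so siblings visited earlier (left) receive higher indices.
    order, queue = [], deque(children[None])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children[node])
    n = len(order)
    return {pos: n - i for i, pos in enumerate(order)}
```

On the paper's example, wrapped in the required outer pair, the '(' positions 0, 1, 2, 13, 16 of `(((a*b)*c)\2+(a*(b+ba))\3)` receive the indices 5, 4, 2, 3, 1, matching the reindexed expression above.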
It is easy to observe that changing the rules of indexing in this manner and adjusting the back-references accordingly does not change the interpretation of the regex. Let r be a regex with parentheses indexed according to this new convention. The parentheses of r are numbered 1, 2, . . . , n, and obviously, the nth pair of parentheses is the outermost one. To each pair of parentheses (k −) we associate a variable symbol uk, regardless of whether this pair is back-referenced or not. To each back-reference \k we associate another variable symbol vk. These two sets of variables are used in the matching of a word as follows: uk will store the content of the
kth parenthesis used in matching, whereas vk will enforce the matching of an input sub-word with the already instantiated content of uk . To every pair of parentheses uk we associate a regular expression rk over Σk (the sub-expression enclosed by these parentheses), such that substituting the variable uk with the corresponding regular expression rk , and each variable vk with \k, we obtain the original regex r (= rn ) corresponding to the variable un . We illustrate this breakdown in the following example.
Example 5. Let r = (5 (4 (2 a∗b)∗c)\2 + (3 a∗(1 b + ba))\3). We have two sets of variables, {u1, u2, u3, u4, u5} and {v1, v2, v3, v4, v5}, and to each ui we associate a regular expression as follows: u2 → (a∗b) = r2, u4 → (u2∗ c) = r4, u1 → (b + ba) = r1, u3 → (a∗u1) = r3, and u5 → (u4v2 + u3v3) = r5. Notice that these regular expressions have no other parentheses except the enclosing pair.
Denoting Σk = Σ ∪ {u1, . . . , uk−1} ∪ {v1, . . . , vk−1}, the expression rk is a regular expression over Σk. If the variable uh is used in the regex rk, i.e., |rk|uh > 0, we say that uh ⊏ uk. Note that if uh ⊏ uk, then uh ⊏̸ uk′ for all k′ ≠ k. In other words, once a variable uh is used in an expression rk, it will not be used again in another expression, since each pair of parentheses is transformed into a u-variable that "masks" the "inner" u-variables. This relation can be extended to an order relation ⊑ by taking the transitive and reflexive closure. During the matching of an input word with the regex r, if uh ⊑ uk, then each time we attempt to match a sub-word with the expression rh, we have to consider updating the string that matches rk as well, since the expression rh is included in rk. Notice the distinction between ⪯, defined in the context of a RAS, and ⊑, defined for a regex (and yet, the parallel between them is clear). Anticipating Theorem 6, we outline here the parallel between regex and RAS. Given a regex r, we can construct an equivalent RAS A = (A1, . . . , An, Γ, V1, . . . , Vn) by associating to each expression rk (in the breakdown of r as above) an automaton Ak = (Σk, Qk, 0k, δk, Fk) recognizing the language L(Ak) = L(rk). One can easily see that, indeed, A verifies the RAS conditions. Vice versa, given a RAS A = (A1, . . . , An, Γ, V1, . . . , Vn), one can construct a corresponding regex r by reversing the previous construction: for each Ak we find the equivalent regular expression rk over the alphabet Σk, and starting with rn, we recursively substitute each symbol uk by its corresponding regular expression rk, and each symbol vk by the back-reference \k. We eventually obtain a regex over Σ. The conditions governing the structure of A ensure that the obtained regex r is indexed according to the new rules introduced in Section 3.
From here, it is straightforward to reindex r and adjust the back-references accordingly, to obtain a "classical" regex. Note that there may be cases when this construction leads to back-references occurring to the left of the referenced parentheses. Indeed, A may have, in theory, transitions labeled with vk that are triggered before uk is instantiated (they can easily be detected). Such transitions are useless; however, in order to keep the definition of RAS simple, we did not impose restrictions for avoiding them. Consequently, the resulting regex r may have "orphan" back-references, which we agree to replace with ∅, performing the proper simplifications.
Theorem 6. RAS and regex are equivalent.
Proof. We have already shown how a regex r can be associated with a RAS A, by a two-way construction: r → A and A → r. The outcome of these conversions is not unique, depending on the algorithms used to convert a finite automaton into a regular expression and vice versa. Given r and A = (A1, . . . , An, Γ, V1, . . . , Vn), we make the following remarks:
i) In the definition of the transitions of A (Section 3), case (1) corresponds to the matching of an input letter, case (2) corresponds to the matching of a back-reference, case (3) corresponds to starting the matching process for a parenthesis, while case (4) corresponds to ending the matching process for a parenthesis k, marking the moment when uk has been instantiated and can be used in a subsequent back-referencing (by a vk).
ii) During a computation, A cannot transit along a transition labeled with a variable symbol vk for which uk has not been instantiated. This behavior is consistent with the common understanding of regex evaluation, where we cannot use a back-reference to a parenthesis which has not been matched yet (e.g., as a result of an alternation); more precisely, we use ∅, equivalent to having no match.
iii) The operation of A is non-deterministic, since it follows closely the non-deterministic matching of a word by the regex r.
iv) ⪯ for A "coincides" with ⊑ for the corresponding r.
The idea of proving the equivalence of r and A is as follows. Consider a successful computation in A, for some input w ∈ L(A):
(0n, w, ε, ∅, ∅, . . . , ∅) ↦∗ (fk, α, Γ(t), V1(t), V2(t), . . . , Vn(t)) ↦∗ (fn, ε, ε, V1(l), V2(l), . . . , Vn(l)),
where fk ∈ Fk. In other words, in this computation we emphasize a configuration immediately before a context switch (case (4) in the description of the transitions of A in Section 3). One can check that when this configuration has been reached, all buffers Vh with Vh ≠ ∅ and uk ⪯ uh will have as a suffix the word Vk corresponding to uk, that is, the word that matches the kth pair of parentheses in r (or, equivalently, which matches the expression rk). Notice that when a variable uk is involved in an iteration, the buffer Vk is "reset" at the beginning of each iteration and will eventually hold the last iterated value of uk at that point of the computation. At the end of the computation, the buffers {Vk}, 1 ≤ k ≤ n, provide the matching sub-words used for parsing w according to r. The converse argument works similarly. Given a word w in L(r), one can construct a matching tree [2] for w, and the node corresponding to the kth pair of parentheses in r will hold a sub-word reconstructed in the buffer Vk during a successful computation of A on input w.
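The configurations and, for instance, the letter-transition (case (1)) can be sketched in code. The sketch below uses hypothetical names (Config, letter_step, an explicit set `order` of pairs (k, h) with uk ⪯ uh) and is only a minimal illustration of the bookkeeping, not the authors' implementation:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Config:
    """A RAS configuration (q, w, Gamma, V_1, ..., V_n) -- sketch."""
    state: tuple    # (k, q): index of the current automaton and its state
    rest: str       # unread part of the input word w
    stack: tuple    # return states; the top of Gamma is at the right
    buffers: tuple  # V_1 .. V_n; None plays the role of the empty set

def letter_step(cfg, delta, order, a):
    """Letter-transition: read a in Sigma within A_k and append a to
    every buffer V_h with u_k <= u_h (the pairs listed in `order`).
    For simplicity an uninstantiated buffer is treated here as epsilon."""
    k, q = cfg.state
    if not cfg.rest.startswith(a) or (k, q, a) not in delta:
        return None  # transition not applicable
    bufs = tuple(
        (v if v is not None else "") + a if (k, h) in order else v
        for h, v in enumerate(cfg.buffers, start=1))
    return replace(cfg, state=(k, delta[(k, q, a)]),
                   rest=cfg.rest[len(a):], buffers=bufs)
```

For example, with delta = {(1, 0, "a"): 1} and order = {(1, 1)}, stepping Config((1, 0), "ab", (), (None, "x")) on "a" advances the state to (1, 1), leaves "b" unread, and appends "a" only to the first buffer.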
Corollary 7. The membership problem for regex has O(mn) space complexity, where n is the number of pairs of parentheses in the regex and m is the length of the input word.
Proof. Since regex and RAS are equivalent, we use RAS to decide word membership. A RAS has at most as many buffers as there are pairs of parentheses in the regex (n), and the size of a buffer is at most m (the size of the input). Notice that the depth of the stack is at most n.
Remark 8. We may have different semantic variations of regex, such as:
(1) Some regex implementations consider non-instantiated variables as ε, e.g., in the UNIX Bourne shell interpretation. To adapt a RAS to this interpretation, we start the computation with the initial configuration (0n, w, ε, ε, ε, . . . , ε) (with n buffers holding ε).
(2) We may consider that each re-instantiation of a variable uk resets the values of all previously instantiated variables uh such that uh ⪯ uk. In this case, for step (3) we set Vh(t+1) = ∅ or Vh(t+1) = ε for all h ≠ k such that uh ⪯ uk, depending on the initial values for uninstantiated variables.
All the results of this paper can easily be adapted to any regex semantics, including the ones implemented in current programming environments. From now on we assume, without loss of generality, that all components Ak of a RAS A = (A1, . . . , An) are trim (all states and transitions are useful), and that no transition vk can be triggered before a preceding transition uk (these situations can be detected and such transitions vk can be removed).
4. Main result: Intersection with regular languages
In this section we present a construction of a RAS that recognizes the intersection of a regex language with a regular language, based on the equivalence of regex with RAS. Because the orders ⪯ and ⊑ coincide for a regex and its corresponding RAS, we will only use ⪯. We now give some additional definitions and results.
Definition 9. We say that a regex r is in star-free normal form if (1) every pair of parentheses included in a starred sub-expression is not back-referenced outside that sub-expression, i.e., in a sub-expression to the right of that starred sub-expression; (2) all star operations are applied to parentheses.
This definition says that, in a star-free normal form regex, a pair of parentheses and its possible back-references occur only in the same sub-expression under a star operator. In other words, the following situation is avoided:
(. . . (k −) . . . )∗ . . . \k . . .
Example 10. The expressions (1 a)∗\1 and (2 (1 a∗)b\1)∗\1 are not in star-free normal form, while (2 (1 a)∗)\2 and (4 (2 (1 a)∗)(3 a)b\3)∗ are.
Lemma 11. For every regex r there exists an equivalent regex r′ in star-free normal form.
Proof. The second condition can easily be satisfied; therefore we only consider expressions where star is applied to parentheses. For the first condition, let u be a sub-expression under a star operator, which includes a pair of parentheses back-referenced outside u. The situation can be generically expressed as
(. . . (k −) . . . )∗ . . . \k . . . ,   the starred sub-expression being u.
Our argument is based on the following straightforward equality: u∗ = (u∗u + ε). Then, we can rewrite the regex as
((. . . (−) . . . )∗ (. . . (h −) . . . ) + ε) . . . \h . . . ,
where the first factor is u∗ and the second is a copy of u,
without changing the accepted language. Notice that we have adjusted the back-reference \k to a new value \h, to account for the introduction of new pairs of parentheses during the process. The idea is to isolate two cases: when the iteration actually occurs, in which case we know exactly what an eventual back-reference will duplicate, and when the iteration does not occur (zero iterations), in which case an eventual back-reference is set to ∅. A proof by induction on the number of parentheses that are back-referenced is straightforward.
Remark 12. For a RAS obtained from a regex in star-free normal form, if a variable uh is instantiated within a loop of an automaton Ak, then its value cannot be used by a transition labeled vh, unless that transition belongs to the same loop.
Example 13. Let r = ((a∗b)∗c\1)∗\1\2. We rewrite it in star-free normal form as follows:
( ( ((a∗b)∗(a∗b) + ε) c \i )∗ ( ((a∗b)∗(a∗b) + ε) c \j ) + ε ) \j \k,
where the equality u∗ = (u∗u + ε) has been applied to both starred sub-expressions, and \i, \j and \k stand for the accordingly renumbered back-references to the copies of (a∗b), and of the outer sub-expression, that now occur outside the stars.
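The rewriting u∗ = (u∗u + ε) behind Lemma 11 can be sanity-checked on small inputs with Python's re engine (alternation written | instead of +, and an empty branch playing the role of ε; the group numbers are those of these particular Python patterns, chosen for illustration):

```python
import re

# (a)*\1 : pair 1 is under a star and back-referenced outside it.
original = re.compile(r"(a)*\1")
# Star-free-normal-form rewrite ((a)*(a) + eps)\2 : the copy of (a)
# outside the star (group 2) now carries the back-reference.
rewritten = re.compile(r"(?:(a)*(a)|)\2")

# Both patterns accept exactly the same words (Python makes a
# back-reference to an unset group fail, matching the paper's
# convention that an uninstantiated parenthesis yields the empty set).
for w in ("", "a", "aa", "aaa", "aaaa", "ab"):
    assert bool(original.fullmatch(w)) == bool(rewritten.fullmatch(w)), w
```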
Let B = (Σ , QB , 0B , δB , FB ) be a trim DFA. We consider the family of functions τwB : QB → QB defined as τwB (q) = δB (q, w). Since QB is finite, the number of functions {τwB | w ∈ Σ ∗ } is also finite. These functions, together with composition and
τεB as identity, form a finite monoid: the transition monoid TB of B. We partition Σ ∗ into equivalent classes, given by the equivalence relation of finite index u ≡B v ⇔ τuB = τvB and let WB = Σ ∗ /≡B be the quotient of Σ ∗ under ≡B . The transition functions τ B , can now be indexed by elements of WB , i.e., {τcB }c ∈WB . For every c ∈ WB , we can construct a DFA Dc = (Σ , Qc , δc , 0c , Fc ) such that L(Dc ) = c. We can repeat the above construction for each automaton Dc , obtaining an equivalence relation ≡c , the functions {τwc }w∈Σ ∗ , and the set of equivalence classes Wc = Σ ∗ / ≡c . Let W0 = WB . The above relations are of finite index, and, iterating again the above construction, we can define the following equivalence relations: let ≡0 be identical with ≡B , and x ≡l+1 y iff
x ≡l y and there is c ∈ Wl such that x ≡c y.
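The finiteness of the transition monoid TB is what the whole construction rests on, and it is easy to make concrete. The sketch below is illustrative only (the dict-based DFA encoding is our own assumption, not the paper's); it enumerates the functions τwB of a small DFA by breadth-first search over words, stopping when no new function appears.

```python
from collections import deque

def transition_monoid(states, alphabet, delta):
    """Enumerate the functions tau_w : Q -> Q of a DFA by BFS over words.

    `delta` maps (state, letter) -> state.  Returns a dict from a shortest
    representative word w to the tuple (tau_w(q) for q in states)."""
    identity = tuple(states)                      # tau_epsilon
    monoid = {"": identity}
    queue = deque([("", identity)])
    while queue:
        w, tau = queue.popleft()
        for a in alphabet:
            # tau_{wa}(q) = delta(tau_w(q), a)
            new_tau = tuple(delta[(q, a)] for q in tau)
            if new_tau not in monoid.values():
                monoid[w + a] = new_tau
                queue.append((w + a, new_tau))
    return monoid

# Two-state DFA over {a, b}: 'a' swaps the states, 'b' fixes them.
delta = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}
monoid = transition_monoid([0, 1], "ab", delta)
```

For this two-state DFA only the identity and the swap arise, so ≡B has exactly two classes (words with an even, resp. odd, number of a's).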
These classes induced by B have the following property: For any l > 1, if w1 , w2 ∈ cl ∈ Wl , then there is a unique cl−1 ∈ Wl−1 such that w1 , w2 ∈ cl−1 , and we have both
δcl(i, w1) = δcl(i, w2) and δcl−1(i, w1) = δcl−1(i, w2).  (1)
In what follows, we consider two classes c and c′ to be distinct if c ∈ Wj and c′ ∈ Wj′ with j ≠ j′; thus we will make a difference between a class c and the language represented by the class, L(c). If c ∈ Wj, we denote Λ(c) = j. For c ∈ Wl we define the function
τc : QB ∪ ⋃c⊆c′ Qc′ −→ QB ∪ ⋃c⊆c′ Qc′
by τc(i) = δc′(i, w) for w ∈ c, where i ∈ QB or i ∈ Qc′, c ⊆ c′, Λ(c′) ≤ Λ(c). The functions τc are well defined, based on property (1). Theorem 14. The family of regex languages is closed under intersection with regular languages. Proof. Let r be a regex in star-free normal form (Lemma 11) such that the occurrence of uk in the dependency tree of the automata system is at a level lower than or equal to the levels of any occurrence of the corresponding vk (otherwise we surround uk by the required number of parentheses), and let B = (Σ, QB, 0B, δB, FB) be a trim DFA with m = #(QB). We consider a RAS C = (C1, C2, . . . , Cn) such that L(C) = L(r), Ck = (Qk, Σk, δk, 0k, Fk), where C is obtained by using the construction in Section 3 from the regex r. Let l be the number of levels of the dependency tree of the automata system C, and consider the equivalence classes Wj, 0 ≤ j ≤ l, induced by the automaton B. Denote Level0 = {k | uk ⋠ uh for any 1 ≤ h ≤ n − 1, and uk is a label in Cn} and Levelj+1 = {h | uh ≺ uk, k ∈ Levelj} for 0 ≤ j < l. Thus, for a word w ∈ L(C) ∩ L(B), for each k ∈ Levelj, at the end of the computation we can consider that Vk ∈ L(ck) and ck ∈ Wj (j = Λ(ck)). For this computation, when we update a buffer Vh, we also update all buffers Vk such that uh ≺ uk. Hence, for uh ≺ uk, processing a word in a class ch using the automaton Ch requires updates of words processed by the automaton Ck, but updates of words processed by the automaton Ck may or may not require updates of words processed by the automaton Ch. Hence, in the RAS A constructed for the intersection L(C) ∩ L(B), a buffer Vk for C may turn into a set of buffers in A, where variables are in sets of variables resulting from uk and vk respectively, considering all possible class instantiations.
The indices of the new variables u− and v− of the RAS A must contain the index k of the module Ck to which they are related, and information about the instantiation classes ch ∈ WΛ(ch), where uh ≺ uk, h < k. For all k, 1 ≤ k < n, k ∈ Levelj, all c ∈ Wl, and d ∈ {1, 2} we define: S^k_icd = {(αk, αk−1, . . . , α1) | αk = icd, and for all h, 1 ≤ h < k, αh = 0 or αh = 1 or there is 1 ≤ h′ < h ≤ k, uh′ ≺ uh ≺ uk, such that αh = ih ch dh, αh′ = ih′ ch′ dh′, ih′ ∈ Qch and ch′ ∈ WΛ(ch)+1, dh, dh′ ∈ {1, 2}, dh′ ≤ dh}. For k ∈ Levelj and j > 1, we denote Sk = {S ∈ S^k_icd | d = 1, 2, c ∈ Wj, i ∈ Qc′, c′ ∈ Wj−1}. For k ∈ Levelj and j = 1, we denote Sk = {S ∈ S^k_icd | d = 1, 2, c ∈ Wj−1, i ∈ QB}, and Sn = Sn−1. The projections πh : Sk −→ (Qch Wj {1, 2} ∪ {0, 1}), ch ∈ Wj−1, are defined for 1 ≤ h ≤ k by πh(S) = αh, where S = (αk, . . . , α1). The components of the RAS A corresponding to the variables resulting from the variable uk are Ak,S, where 1 ≤ k < n and S ∈ Sk verifies πh(S) ≠ 1 for all 1 ≤ h ≤ k. The states of Ak,S are in Qk × Qck × Sk, πk(S) = ik ck dk, and variable labels are in Qh × Qch × Sh, where uh ≺ uk, ch ∈ WΛ(ck)+1. Given a projection h of S ∈ Sk and αh, we have the following interpretation:
• if αh = ih ch dh , then ch is a class for Dck , ih ∈ Qck , and dh is 1 if this is the first instantiation of a variable resulting from uh , and 2 if it is another (re)instantiation of a variable resulting from uh .
• if αh = 0, then all variables resulting from uh are not instantiated yet.
• if αh = 1, then at least one variable resulting from uh has been instantiated and another one resulting from uh is to be (re)instantiated. The information about the previous instantiation is erased. This value is only possible for states.
Only transitions with variables of type u change instantiation classes for buffers, and each such transition must be unique. We know when a transition with a variable resulting from vk is possible, because the name of the state contains the last instantiation class. There is only one state in Q where a variable uk is instantiated; thus we denote by init(k) the state in Q that has an outward transition labeled with uk, and Init = {init(k) | 1 ≤ k < n}. For the new modules, the states having transitions
with variables resulting from uk should only be allowed if one component is init(k) and the k component of the S name is 0 (first instantiation) or 1 (reinstantiation). If the k component is not in {0, 1}, then the previous instantiation of the variable resulting from uk is considered, and there is no transition with variables resulting from uk. For reinstantiating a variable resulting from uh in a module for a variable resulting from uk, uh ≺ uk, we need to consider all possible (re)instantiations of variables resulting from uh, as well as for some of the variables resulting from uh′ with uh′ ≺ uh. To achieve this, we define the set E(S, h), where S ∈ Sk, and c, c′ are such that S ∈ S^k_icd, c ∈ WΛ(c′)+1 and i ∈ Qc′: E(S, h) = {S′ ∈ S^k_icd | πh(S′) = 1 and for all 1 ≤ h″ < h′ ≤ h, if uh″ ≺ uh′ ≺ uh and πh″(S′) = 1, then πh′(S′) = 1, otherwise πh′(S) = πh′(S′)}. Note that if S′ ∈ E(S, h), then πh′(S) = πh′(S′) for all h′ < k with uh′ ⋠ uk. The following set contains all cases for the reinstantiation of the new variables: Choice(S) = ⋃uh≺uk E(S, h). In this set, one component h and only some of the components h′ with uh′ ≺ uh are set to 1, preparing them for a reinstantiation. For state names, the components of S which are reinstantiated must be 1, and after the u-transitions they must be different from 0 and 1. The next set describes this situation: Follow(S, h) = {(αk, . . . , α1) | πh(S) ∈ {0, 1}, αh ∉ {0, 1}, and for all 1 ≤ h′ < k, αh′ ≠ 1; if πh′(S) = 0, then αh′ ∈ {0} ∪ Qc′ WΛ(ch′) 1, and if πh′(S) = 1, then αh′ ∈ Qc′ WΛ(ch′) 2, Λ(c′) + 1 = Λ(ch′), otherwise αh′ = πh′(S)}. Now we are ready to give the formal definitions for the modules of A. (1) For all k such that Ck does not have transitions labeled with variables, and S = (ik ck dk, αk−1, . . . , α1):
Ak,S = (Qk × Qck, Σ, (0k, 0ck), δk,S, Fk,ck), where Fk,ck = Fk × Fck, and for all (p, r) ∈ Qk × Qck and a ∈ Σ:
δk,S((p, r), a) = {(q, δck(r, a)) | q ∈ δk(p, a)}. This is the case when back-references are not processed, so the construction is the usual Cartesian product of automata [7], Ck × Dck. Note also that this corresponds to the case of an innermost pair of parentheses (i.e., with no dependencies). (2) For all k ∈ {2, . . . , n − 1} (the case k = 1 does not involve any dependency) and S ∈ Sk with S = (ik ck dk, αk−1, . . . , α1), we
have:
Ak,S = (Qk × Qck × Sk, Σ ∪ {uk′,S′ | k′ < k, S′ ∈ Sk−1} ∪ {vk′,S′ | k′ < k, S′ ∈ Sk−1}, (0k, 0ck, αk−1, . . . , α1), δk,S, Fk,S),
where Fk,S = Fk × Fck × {S′ ∈ Sk | πh(S′) = αh, 1 ≤ h ≤ k}, and i) letter-transition: for all (p, i, S′) ∈ Qk × Qck × Sk and a ∈ Σ:
δk,S((p, i, S′), a) = {(q, δck(i, a), S′) | q ∈ δk(p, a) − Init} ∪ {(q, δck(i, a), T′) | q ∈ δk(p, a) ∩ Init, T′ ∈ Choice(S′)}
ii) u-transition: for all (p, i, S′) ∈ Qk × Qck × Sk such that there is k′ < k with πk′(S′) ∈ {0, 1}, p ∈ init(k′), and for all T′ ∈ Follow(S′, k′) s.t. πk′(T′) = icd:
δk,S((p, i, S′), uk′,T′) = {(q, τc(i), T′) | q ∈ δk(p, uk′) − Init} ∪ {(q, τc(i), T″) | q ∈ δk(p, uk′) ∩ Init, T″ ∈ Choice(T′)}
iii) v-transition: for all (p, i, S′) ∈ Qk × Qck × Sk, k′ < k and πk′(S′) = icd:
δk,S((p, i, S′), vk′,S′) = {(q, τc(i), S′) | q ∈ δk(p, vk′) − Init} ∪ {(q, τc(i), T′) | q ∈ δk(p, vk′) ∩ Init, T′ ∈ Choice(S′)}.
Note that after each transition triggered by uk′,T′ we reach states where the transition with vk′,T′ is possible, but the transitions with vk′,T″, πk′(T′) ≠ πk′(T″), are not defined. This ensures a correlation between uk′,S′ and vk′,S′ which mimics the correlation between uk′ and vk′ in Ck.
(3) An,FB = (Qn × QB × Sn, Σ ∪ {uk′,S′ | k′ < n, S′ ∈ Sn} ∪ {vk′,S′ | k′ < n, S′ ∈ Sn}, (0n, 0, . . . , 0), δn,FB, Fn,FB), with n − 1 zero components in the initial state,
where Fn,FB = Fn × FB × Sn , and
i) letter-transition: for all (p, i, S′) ∈ Qn × QB × Sn, a ∈ Σ:
δn,FB((p, i, S′), a) = {(q, δB(i, a), S′) | q ∈ δn(p, a)} ∪ {(q, δB(i, a), T′) | q ∈ δn(p, a) ∩ Init, T′ ∈ Choice(S′)}
ii) u-transition: for all (p, i, S′) ∈ Qn × QB × Sn such that there is k′ < n with πk′(S′) ∈ {0, 1}, p ∈ init(k′), and for all T′ ∈ Follow(S′, k′) s.t. πk′(T′) = icd:
δn,FB((p, i, S′), uk′,T′) = {(q, τc(i), T′) | q ∈ δn(p, uk′)} ∪ {(q, τc(i), T″) | q ∈ δn(p, uk′) ∩ Init, T″ ∈ Choice(T′)}
iii) v-transition: for all (p, i, S′) ∈ Qn × QB × Sn, k′ < n and πk′(S′) = icd:
δn,FB((p, i, S′), vk′,S′) = {(q, τc(i), S′) | q ∈ δn(p, vk′)} ∪ {(q, τc(i), T′) | q ∈ δn(p, vk′) ∩ Init, T′ ∈ Choice(S′)}.
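Case (1) of the construction is the standard Cartesian product of automata [7]. As a reference point, a generic product of two DFAs recognizing L(A) ∩ L(B) can be sketched as follows (the encoding of a DFA as a tuple with a dict transition function is our own illustrative choice, not the paper's notation):

```python
def product_dfa(A, B):
    """Product DFA recognizing L(A) ∩ L(B).

    Each DFA is (states, alphabet, delta, start, finals), with `delta`
    a dict (state, letter) -> state; both DFAs share the alphabet."""
    QA, sigma, dA, sA, FA = A
    QB, _, dB, sB, FB = B
    states = [(p, q) for p in QA for q in QB]
    delta = {((p, q), a): (dA[(p, a)], dB[(q, a)])
             for p in QA for q in QB for a in sigma}
    finals = {(p, q) for p in FA for q in FB}
    return states, sigma, delta, (sA, sB), finals

def accepts(dfa, w):
    _, _, delta, state, finals = dfa
    for a in w:
        state = delta[(state, a)]
    return state in finals

# A accepts words with an even number of a's; B accepts words ending in b.
A = ([0, 1], "ab", {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}, 0, {0})
B = ([0, 1], "ab", {(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1}, 0, {1})
P = product_dfa(A, B)
```

The product state simply tracks both automata in lockstep, which is exactly what the pairs (p, r) ∈ Qk × Qck do in case (1).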
Considering that An,FB is the "main" automaton of our newly constructed RAS, the dependence between the constituent automata is straightforward. Let H be the number of automata in A. We make the following observations, which justify the correctness of our construction: (1) A is indeed a RAS. The transitions with uk′,T′, πk′(T′) = icd, are only possible for states (p, i, S′) in Qk × Qck × Sk with πk′(S′) ∈ {0, 1}. Since p is unique for k′, so is (p, i, S′) for uk′,S′. (2) If in the RAS A we consider only the first component of each state and ignore the S-subscript, we observe that a computation in A for an input word w is successful if and only if there exists a successful computation for w in this reduced version of A, since all automata Ak,S are identical with Ck, for all S (we have a surjective morphism from Ak,S to Ck). For a fixed k, the buffers Vk,S in Ak,S are not simultaneously used, therefore it does not matter whether for each k we use one buffer or several buffers. The subtle point in this construction is to avoid the danger of using a back-reference vk,S corresponding to a variable uk,S that does not represent the last uk instance, i.e., it is not uk,S, but rather uk,S′ for some index S′ with πk(S) ≠ πk(S′). However, this problem is avoided by using a RAS obtained from a regex in star-free normal form. This guarantees that every time we reinstantiate uk′,S′, we update the projection k′ of S′; therefore, all the other variables uk′,S″ with πk′(S″) ≠ πk′(S′) are not on the path for uk′,S′. Indeed, πk′(S″) = πk′(S′) for all states following uk′,S′ in a successful computation path, since star is only applied to a reinstantiated variable (a variable between parentheses). Thus, only the transitions with vk′,S′ are possible. The synchronization is done using the k′ projection of the index S′.
(3) For every transition ((s, i, S), αw, Γ(t), V1(t), V2(t), . . . , VH(t)) ↦A ((q, j, T), w, Γ(t+1), V1(t+1), V2(t+1), . . . , VH(t+1)), we have that i = j and α = ε, or δB(i, α) = j (in the case when α matches a variable, τc(i) = j and α ∈ L(c), or α is a letter). In conclusion, for every computation
((0n, 0B, 0, . . . , 0), w, ∅, ∅, . . . , ∅) ↦∗A (q, ε, ∅, V1, V2, . . . , VH), with n − 1 zeros in the initial state and H empty buffers, we have that δB(0B, w) = q and
(0, αw, Γ(t), V1(t), V2(t), . . . , Vn(t)) ↦C (q, w, Γ(t+1), V1(t+1), V2(t+1), . . . , Vn(t+1)), which means that w ∈ L(A) iff w ∈ L(B) and w ∈ L(C). Thus, the automata system A recognizes the intersection of L(C) and L(B), proving that the intersection is a regex language.
5. Consequences and conclusion
We use Theorem 14 to show that a few remarkable languages, such as the mirror language, are not regex languages. In [3,4] it was proved that the following languages satisfy neither the regex nor the PE pumping lemma:
L1 = {(aababb)^n (bbabaa)^n | n ≥ 0}, L2 = {a^n b^n | n ≥ 0},
L3 = {a^{2n} b^n | n ≥ 0}, L4 = {a^n b^n c^n | n ≥ 0},
L5 = {{a, b}^n c^n | n ≥ 0}, L6 = {{a, b}^n c {a, b}^n | n ≥ 0}.
Since the pumping lemmas for regex and PE are essentially the same, it is clear that none of these languages is a regex language. This allows us to infer that some other languages, more difficult to control, are not regex languages either – as the following result shows.
Corollary 15. The following languages are not regex languages:
L7 = {ww^R | w ∈ Σ∗}, L8 = {w | w = w^R},
L9 = {w | |w|a = |w|b}, L10 = {w | |w|b = 2|w|a},
L11 = {w | |w|a = |w|b = |w|c}, L12 = {w | |w|a + |w|b = |w|c},
L13 = {ucv | |u|a + |u|b = |v|a + |v|b}.
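The intersections used in the proof below can be sanity-checked mechanically on bounded word lengths; the following sketch (illustrative only, and of course not a proof) confirms that L9 ∩ a∗b∗ contains exactly the words a^n b^n among all words of length at most 12:

```python
from itertools import product

def words(alphabet, max_len):
    """All words over the alphabet up to the given length."""
    for n in range(max_len + 1):
        for t in product(alphabet, repeat=n):
            yield "".join(t)

def in_a_star_b_star(w):
    """Membership in the regular language a*b*."""
    i = 0
    while i < len(w) and w[i] == "a":
        i += 1
    return all(c == "b" for c in w[i:])

# Bounded check of L9 ∩ a*b* = L2 = {a^n b^n | n >= 0} on words of length <= 12.
lhs = {w for w in words("ab", 12)
       if w.count("a") == w.count("b") and in_a_star_b_star(w)}
rhs = {"a" * n + "b" * n for n in range(7)}
```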
Proof. We observe that L7 ∩ (aababb)∗(bbabaa)∗ = L8 ∩ (aababb)∗(bbabaa)∗ = L1, L9 ∩ a∗b∗ = L2, L10 ∩ a∗b∗ = L3, L11 ∩ a∗b∗c∗ = L4, L12 ∩ (a + b)∗c∗ = L5, and L13 ∩ (a + b)∗c(a + b)∗ = L6. If any of L7, . . . , L13 were a regex language, so would be its corresponding intersection, leading to a contradiction. We should mention that none of the languages L7, . . . , L13 could be proven to be non-regex by the pumping lemma alone. As a theoretical application of the closure property, some previous results involving elaborate proofs, such as Lemma 3 in [2], follow immediately from Theorem 14. Consequently, we also infer that the family of regex languages is not closed under shuffle with regular languages. To conclude, in this paper we have defined a machine counterpart of regex, Regex Automata Systems (RAS), and used them to answer an open problem reported in [2], namely, whether regex languages are closed under intersection with regular languages. We have provided a positive answer to this question, and used this closure property to show that several classical languages, such as the mirror language, the language of palindromes and the language of balanced words, are not regex – thus revealing some limitations of regex unforeseen before. Regex automata systems also have a practical impact: they give a rigorous method for implementing regex in programming languages and they avoid semantic ambiguities. It remains open whether regex languages are closed under intersection. We conjecture that they are not, since the proof of closure under intersection with regular languages uses in a crucial manner the transition monoid of a DFA and its corresponding equivalence of finite index. Other open problems include the relation between regex and other formalisms, such as pattern expressions [4] or grammar systems [5]. References [1] A.V.
Aho, Algorithms for finding patterns in strings, in: J. van Leeuwen (Ed.), Handbook of Theoretical Computer Science, vol. A: Algorithms and Complexity, Elsevier and MIT Press, 1990, pp. 255–300.
[2] C. Câmpeanu, K. Salomaa, S. Yu, A formal study of practical regular expressions, International Journal of Foundations of Computer Science 14 (6) (2003) 1007–1018.
[3] C. Câmpeanu, N. Santean, On pattern expression languages, Technical Report CS-2006-20, David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada, 2006.
[4] C. Câmpeanu, S. Yu, Pattern expressions and pattern automata, Information Processing Letters 92 (2004) 267–274.
[5] E. Csuhaj-Varjú, J. Dassow, J. Kelemen, G. Păun, Grammar Systems: A Grammatical Approach to Distribution and Cooperation, Gordon and Breach Science Publishers, 1994.
[6] J.E.F. Friedl, Mastering Regular Expressions, O'Reilly & Associates, Inc., Cambridge, 1997.
[7] J.E. Hopcroft, R. Motwani, J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, Reading, MA, 2006.
[8] A. Salomaa, Theory of Automata, Pergamon Press, Oxford, 1969.
[9] A. Salomaa, Formal Languages, Academic Press, New York, 1973.
[10] S. Yu, Regular languages, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, Springer-Verlag, 1997, pp. 41–110.
Theoretical Computer Science 410 (2009) 2345–2351
Contents lists available at ScienceDirect
Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs
Conjugacy of finite biprefix codes✩
Julien Cassaigne a, Juhani Karhumäki b, Petri Salmela b,∗
a Institut de Mathématiques de Luminy—CNRS/FRUMAM, Case 907, FR-13288 Marseille Cedex 9, France
b Department of Mathematics and TUCS, University of Turku, FI-20014 University of Turku, Finland
Article info
Keywords: Conjugacy; Language equation; Biprefix code
Abstract. Two languages X and Y are called conjugates if they satisfy the conjugacy equation XZ = ZY for some non-empty language Z. We will compare solutions of this equation with those of the corresponding equation of words and study the case of finite biprefix codes X and Y. We show that the maximal Z in this case is rational. We will also characterize X and Y in the case where they are both finite biprefix codes. This yields the decidability of the conjugacy of two finite biprefix codes. © 2009 Elsevier B.V. All rights reserved.
1. Introduction
The conjugacy equation xz = zy is a basic equation for words. Words x and y are conjugates, i.e., they satisfy the conjugacy equation for some word z, if and only if x and y have factorizations x = pq and y = qp for some words p and q, and then the above z can be expressed as z = (pq)^i p. For languages, we say that X and Y are conjugates if they satisfy the conjugacy equation XZ = ZY for some non-empty language Z. For the empty set Z the conjugacy equation always holds. We also restrict our attention to languages X and Y which do not include the empty word, since we concentrate on finite biprefix codes. We note that not all biprefix codes X and Y are conjugates. For example, with X = {a} and Y = {b} the conjugacy equation aZ = Zb does not have any non-empty solution Z. The conjugacy equation on languages is not as easy to solve as the same equation on words. The formula for the general solution of the conjugacy equation on words can be extended to languages simply by replacing the words x, y, z, p and q in the formula by languages X, Y, Z, P and Q. However, in several cases this formula does not capture all possible solutions. For example, as observed in [2], the solution X = {a, ab, abb, ba, babb}, Y = {a, ba, bba, bbba}, Z = {a, ba} is not of this type. However, for some special classes of languages all solutions can be obtained essentially with the same formula as for the conjugacy of words. To analyze this is the topic of this note. In this paper we first define the so-called word type solutions of the conjugacy equation on languages. As a starting point, we note that the solutions for words can be expressed as x = (pq)^k, y = (qp)^k and z = (pq)^i p for some integers i, k and a primitive word pq. This formulation of solutions is equivalent to the standard one mentioned in the beginning. This formulation, however, has some advantages. For language equations we refer to solutions of the form X = (PQ)^k,
Y = (QP)^k and Z = ⋃i∈I (PQ)^i P
with primitive (see below) languages PQ as word type solutions. This notion has been defined in [2], however, our definition in Section 3 is a slight extension.
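The counterexample from [2] quoted above is easy to verify mechanically: catenating the finite languages on both sides of XZ = ZY yields the same set. A quick sketch (illustrative only):

```python
def cat(X, Y):
    """Catenation XY of two finite languages."""
    return {x + y for x in X for y in Y}

# The solution from [2] that is not of the word type.
X = {"a", "ab", "abb", "ba", "babb"}
Y = {"a", "ba", "bba", "bbba"}
Z = {"a", "ba"}
# cat(X, Z) == cat(Z, Y), so X and Y are indeed conjugates via Z.
```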
✩ Supported by the Academy of Finland under grant 203354 and the Finnish Mathematical Society International Visitors Program.
∗ Corresponding author.
E-mail addresses: [email protected] (J. Cassaigne), [email protected] (J. Karhumäki), [email protected] (P. Salmela).
0304-3975/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2009.02.030
J. Cassaigne et al. / Theoretical Computer Science 410 (2009) 2345–2351
Now, we describe our four results. First we define and study the conjugator of X and Y, that is, the largest language Z (with respect to the subset relation) such that XZ = ZY. We show that for finite biprefix codes X and Y the conjugator is rational, in fact, even of the form X∗U for some finite language U. After this we characterize finite biprefix codes X and Y satisfying the conjugacy equation for some non-empty language Z. We show that these languages can always be factorized as X = UV and Y = VU for some biprefix codes U and V. This is achieved by a rather complicated combinatorial analysis. This factorization is not necessarily unique, but we also provide a unique representation. Next we characterize the conjugator of given finite biprefix codes and show that in this case all solutions are of word type. Our last result proves that the conjugacy problem for finite biprefix codes, i.e., the problem whether given finite biprefix codes X and Y are conjugates, is decidable. This is shown as a corollary of the previous results and the fact that the set of all biprefix codes is a free monoid. In the case of arbitrary finite languages the problem is open, and does not seem to be easy, see [8].
2. Preliminaries
Let A be a finite alphabet, and A∗ the free monoid generated by A. Lowercase letters are used to denote words, i.e., elements of A∗, and uppercase letters languages, i.e., subsets of A∗. The empty word will be denoted by 1. For words, the notation |w| means the length of the word w, and for languages |X| is the cardinality of X. A language is uniform if all its elements have the same length. The notation Pref(X) is used for the set of all prefixes of words in X, and similarly Suf(X) means all suffixes of words in X; the empty word and the words of X are included. We also use the shorthand L^I for the union of powers ⋃i∈I L^i, and L^≤n as a shorthand for ⋃0≤i≤n L^i.
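For finite languages, the notation just introduced is straightforward to realize in code; the helpers below are an illustrative sketch (the names are our own, not standard library functions):

```python
from functools import reduce

def pref(X):
    """Pref(X): all prefixes of words in X, including the empty word."""
    return {w[:i] for w in X for i in range(len(w) + 1)}

def suf(X):
    """Suf(X): all suffixes of words in X, including the empty word."""
    return {w[i:] for w in X for i in range(len(w) + 1)}

def power(L, i):
    """L^i as a set of words (L^0 = {empty word})."""
    return reduce(lambda A, B: {a + b for a in A for b in B}, [L] * i, {""})

def power_set(L, I):
    """L^I: the union of the powers L^i over i in I."""
    return {w for i in I for w in power(L, i)}
```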
A language L is called primitive if L = K^i implies L = K and i = 1, i.e., if the language L is not a proper power of any other language. If the language is not primitive, it is imprimitive. We note that the representation X = K^i with K primitive is closely related to prime factorizations of languages. Such research was initiated in [14], and shown to be a rich research topic in [7]. When we say that an element w in a language L is prefix (resp. suffix) incomparable, we mean that neither w is a prefix (resp. suffix) of any other word in L nor any other word in L is a prefix (resp. suffix) of w. Sometimes this kind of element is also called left (resp. right) singular in L (see [9,16] or [13]). The language L is a prefix (resp. suffix) code, or just a prefix (resp. suffix), if all elements in L are left (resp. right) singular. If the language L is both a prefix and a suffix code, we say it is a biprefix code, or just a biprefix. It is known that the families of prefix, suffix and biprefix codes are free monoids [1,15]. This means that each prefix (resp. suffix or biprefix) code has a unique factorization as a catenation of indecomposable prefix (resp. suffix or biprefix) codes. This also means that a prefix (resp. suffix or biprefix) set can be viewed as a word over a special alphabet of indecomposable prefix (resp. suffix or biprefix) codes. The free base of each of these monoids is infinite, but in many considerations only finite subsets are needed. We also recall that for any prefix (resp. suffix or biprefix) code L there always exists the unique primitive root ρ(L), see [1,15]. For codes the existence of the primitive root is an open problem, see [9], while for arbitrary sets it is not unique, see, e.g., [4]. The following simple fact is needed in many later considerations. Any solution Z of the conjugacy equation XZ = ZY satisfies Z ⊆ Pref(X∗) ∩ Suf(Y∗).
This is clear, since obviously also X^n Z = ZY^n for any integer n, and so for any words z ∈ Z and y ∈ Y there exist words xi ∈ X and z′ ∈ Z such that zy^{|z|} = x1 · · · x|z| z′. This means, since |z| < |x1| + · · · + |x|z||, that z is a prefix of x1 · · · x|z| ∈ X^{|z|}, i.e., z ∈ Pref(X∗). Dually, z is also a suffix of some word in Y∗.
3. Word type solutions
We recall that the conjugacy equation xz = zy for non-empty words has the general solution
xz = zy ⇐⇒ ∃p, q ∈ Σ∗ s.t. x = pq, y = qp and z ∈ (pq)∗p.  (1)
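For concrete words, the solution set in (1) can be confirmed by bounded enumeration; the sketch below (illustrative only) checks that for p = a, q = b the solutions z of xz = zy up to length 9 are exactly the words (pq)^i p:

```python
from itertools import product

p, q = "a", "b"
x, y = p + q, q + p  # x = pq = "ab", y = qp = "ba"

# Enumerate all z over {a, b} with |z| <= 9 solving xz = zy.
solutions = set()
for n in range(10):
    for t in product("ab", repeat=n):
        z = "".join(t)
        if x + z == z + y:
            solutions.add(z)

# Formula (1) predicts exactly z in (pq)*p; here (ab)^i a for i = 0..4.
expected = {(p + q) * i + p for i in range(5)}
```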
This motivates the notion of a word type solution of the conjugacy equation of languages. In [2] this has been straightforwardly defined as:
X = PQ, Y = QP and Z = (PQ)^I P  (2)
for languages P, Q and a set I ⊆ N. We call these solutions word type 1 solutions. However, there is also a slightly more general way to define word type solutions. Condition (1), in the case of words, is equivalent to the condition
xz = zy ⇐⇒ ∃p, q ∈ Σ∗, k ∈ N s.t. x = (pq)^k, y = (qp)^k and z ∈ (pq)∗p,  (3)
where pq and qp are primitive words. This motivates defining a word type solution of languages as:
X = (PQ)^k, Y = (QP)^k and Z = (PQ)^I P  (4)
for languages P, Q such that PQ and QP are primitive, an integer k and a set I ⊆ N.
We call such solutions word type 2 solutions; clearly they include all word type 1 solutions. Unlike in the case of words, these notions are not equivalent in the case of languages, as shown in the next example.
Example 1. Let X = BCBC and Y = CBCB for B = {b} and C = {c} (or some other biprefix codes). Now both solutions
P1 = B, Q1 = CBC, X = P1Q1, Y = Q1P1, Z1 = P1Q1P1 = (BCBC)B
and
P2 = BCB, Q2 = C, X = P2Q2, Y = Q2P2, Z2 = P2Q2P2 = (BCBC)BCB
are of word type in the sense of (2), but their union Z1 ∪ Z2 = BCBCB ∪ BCBCBCB is not. However, if we would use (4) as the definition of word type solution, we would have
P = B, Q = C, X = (PQ)^2, Y = (QP)^2, Z1 = (PQ)^2 P = (BC)^2 B,
P = B, Q = C, X = (PQ)^2, Y = (QP)^2, Z2 = (PQ)^3 P = (BC)^3 B
and Z = Z1 ∪ Z2 = (PQ)^{2,3} P. Based on the above, we choose (4) as our definition of word type conjugation of languages.
4. The conjugator
For the commutation equation XY = YX there has been active research on the centralizer, that is, on the largest language commuting with a given language X. J.H. Conway asked in [6] whether the centralizer of a given rational language is rational as well. This so-called Conway's problem was open for a long time; it has been solved negatively in general [12], but has been shown to have positive answers in several special cases, such as sets with at most two elements [5], rational codes [9], three-element sets [10] and languages with certain special elements [13]. For the conjugacy equation XZ = ZY we can similarly study the maximal solution Z for given languages X and Y. The maximal solution exists and is the unique largest one. We call this solution the conjugator. In the case that X and Y are not conjugates, the maximal (and only) solution is the empty set. If X and Y are conjugates, and conjugated via languages Zi for i in some index set I, then they are, by the distributivity of catenation over union, conjugated also via the union ⋃i∈I Zi. Hence the unique maximal solution is the union of all solutions Z. The special case where X = Y gives us the centralizer of X. We can ask a question similar to Conway's problem, namely whether the conjugator of given languages X and Y is rational. The general answer is of course negative, since the original Conway's problem has a negative answer. However, we can again study some special cases. In what follows we use reasoning for conjugacy similar to that used for commutation in [11]. First we need the following lemma. Lemma 2 (Interchange Lemma).
If X and Y are 1-free languages such that Y has a suffix incomparable element y and XZ = ZY for some language Z, then for each word z ∈ Z there exist an integer n and a word u ∈ Pref(X) \ X such that z = x1x2 · · · xn u for some xi ∈ X, and moreover X^n u ⊆ Z. Proof. Let X and Y be 1-free languages, y a suffix incomparable element in Y, and Z such that XZ = ZY. Then for each z ∈ Z there exist an integer n and a factorization z = x1x2 · · · xn u such that xi ∈ X, u ∈ Pref(X) \ X and zy^n = x1x2 · · · xn uy^n ∈ ZY^n = X^n Z with uy^n ∈ Z. Then again x′1x′2 · · · x′n uy^n ∈ X^n Z = ZY^n, where the x′i are arbitrary elements of X. This shows that X^n u ⊆ Z, since y is suffix incomparable in Y.
Theorem 3. For finite languages X and Y such that Y has a suffix incomparable element y, the conjugator is rational. Proof. Let X and Y be finite languages, y a suffix incomparable element in Y, and Z their conjugator. By Lemma 2, for each word z ∈ Z we have z ∈ X^n u ⊆ Z for some integer n and word u ∈ Pref(X). Since X^2 Z = XZY, the language XZ is included in the conjugator Z. Hence also X∗Z ⊆ Z and X∗X^n u ⊆ Z. Let U ⊆ Pref(X) be the set of all words u occurring in the above constructions. Since the language X is finite, so is U. Now, for each u ∈ U, there exists a minimal integer nu such that X∗X^{nu} u ⊆ Z, and each word z ∈ Z is in one of these sets. Hence we conclude that the conjugator of X and Y is
Z = X∗ (⋃u∈U X^{nu} u).
This set is rational, since the set ∪u∈U X nu u is finite. Note that if X and Y are not conjugates, then Z is the empty set.
The proof of the previous theorem is not constructive, since it needs the conjugator to be given. Hence the result is non-effective. In a suffix set all elements are suffix incomparable; therefore this result holds in the case of finite biprefix codes X and Y. Finally, we remark that the interchange lemma can also be proven in a sharper form, using the primitive root ρ(X) instead of the language X. This way we obtain that u ∈ Pref(ρ(X)) \ ρ(X), z = r1r2 · · · rn u for some ri ∈ ρ(X), and ρ(X)^n u ⊆ Z. This gives us a smaller number of words u.
5. Characterization of conjugacy of finite biprefix codes
In this section we characterize when finite biprefix codes X and Y are conjugates. The fact that the set of biprefix codes is a free monoid suggests that this conjugacy would be similar to the conjugacy of words, i.e., of word type. However, we cannot use this freeness property to characterize X and Y, since we do not know for sure whether the solution Z is also in this free monoid of biprefixes, or even a union of such biprefix solutions. Hence we are tied to a complicated analysis, as in the case of determining the centralizer of a prefix code, see [16]. When we have obtained this characterization, we are able, in Section 6, finally to prove that Z indeed is a union of such biprefix solutions. We can also note that using the looser condition where X is a prefix code and Y is a suffix code does not guarantee the conjugacy to be of word type. As an example we can take the languages X = {abaa, baa} and Y = {aaba, aab}, which are prefix and suffix, respectively. These languages are conjugates, for example via the language Z = {b, ab, ba, aba}, but their conjugacy is not of word type. In what follows, we assume that X and Y are finite biprefix codes such that XZ = ZY for some non-empty language Z. Lemma 4. For every integer n ≥ min{|x| | x ∈ X} there exist finite biprefix codes Un and Vn satisfying
X ∩ A^≤n = UnVn ∩ A^≤n and Y ∩ A^≤n = VnUn ∩ A^≤n.  (5)
Proof. Let X0 , Y0 , Z0 be the sets of elements in X , Y , Z of minimal lengths and n0 = min{|x| | x ∈ X }. Then, since X0 , Y0 and Z0 are uniform languages, X0 Z0 = Z0 Y0 holds and the solution is of word type, see [2]. This means that X0 = Un0 Vn0 , Y0 = Vn0 Un0 and Z0 = (Un0 Vn0 )m Un0 for some uniform Un0 and Vn0 and integer m ≥ 0. Hence (5) holds for n = n0 . Let us choose u0 ∈ Un0 , v0 ∈ Vn0 and z0 = (u0 v0 )m u0 ∈ Z0 . We assume, inductively, that we have already constructed Ui and Vi for n0 ≤ i < n and construct Un and Vn for n > n0 satisfying (5), so that Un−1 ⊆ Un and Vn−1 ⊆ Vn . First we show that Un−1 Vn−1 ∩ A≤n ⊆ X and Vn−1 Un−1 ∩ A≤n ⊆ Y . Let u ∈ Un−1 , v ∈ Vn−1 such that |uv| = n, if such elements exist. Then |uv0 | < n and |u0 v| < n, so uv0 , u0 v ∈ X and v0 u, v u0 ∈ Y . Now z0 v0 uv u0 ∈ ZY 2 = X 2 Z and by regrouping elements we have z0 v0 uv u0 (v0 u0 )m = (u0 v0 )m+1 uv z0 ∈ ZY m+2 = X m+2 Z and since X is biprefix, we get uv z0 ∈ XZ . Hence uv z0 = xz with x ∈ X and z ∈ Z . Here |z | ≥ |z0 |, i.e., x is a prefix of uv ∈ Un−1 Vn−1 . If |x| < n, i.e., x is a proper prefix of uv , then also x ∈ Un−1 Vn−1 and this is a contradiction, since Un−1 Vn−1 is a biprefix. Therefore |x| = n and x = uv ∈ X . Similarly, v u ∈ Y and so Un−1 Vn−1 ∩ A≤n ⊆ X and Vn−1 Un−1 ∩ A≤n ⊆ Y . Next we deal with the words in X ∩ An \ Un−1 Vn−1 (and in Y ∩ An \ Vn−1 Un−1 ), and show that some words can be added to Un−1 and Vn−1 to form Un and Vn , still satisfying (5). If there exists x ∈ X ∩ An \ Un−1 Vn−1 , then
(u0 v0)^{m+1} xz0 = z0 v0 xu0 (v0 u0)^m ∈ X^{m+2} Z = ZY^{m+2}, and hence, since Y is a biprefix code, z0 v0 xu0 ∈ ZY^2. Therefore z0 v0 xu0 = zyy′ for some y, y′ ∈ Y and z ∈ Z with |z| ≥ |z0|, see Fig. 1 for an illustration. Now yy′ is a suffix of v0 xu0 and |u0| ≤ n0 ≤ |y′| ≤ |v0 xu0| − |y| = n + n0 − |y| ≤ n. So y′ = v′u0, where v′ is a suffix of x. We have two cases:

(i) If |y′| < n, then y′ = v′u0 ∈ Vn−1 Un−1 and, since Un−1 is a biprefix code, v′ ∈ Vn−1. Now x = u′v′, where u′ ∉ Un−1, and y is a suffix of v0 u′. For the lengths we now have n0 ≤ |y| ≤ |v0 u′| = |v0 xu0| − |y′| = n + n0 − |y′| ≤ n. There are two subcases on the length of y:

If |y| < n, then y = v″u″ ∈ Vn−1 Un−1 for some v″ ∈ Vn−1 and u″ ∈ Un−1. Now |v″u″| ≤ |v0 u′|, since y = v″u″ is a suffix of v0 u′, and also |v″| ≥ |v0|. Hence |u″| ≤ |u′| and u″ is a suffix of u′. In fact u′ ≠ u″, since u′ ∉ Un−1 and u″ ∈ Un−1. Now u″v′ ∈ Un−1 Vn−1 and, as we just proved above, by its length
|u″v′| ≤ |v0 xu0| − |v″| − |u0| ≤ n, also u″v′ ∈ X. This means that u″v′ and x = u′v′ are both in X and u″v′ is a proper suffix of x = u′v′. This contradicts the fact that X is a biprefix code. On the other hand, if |y| = n = |x|, then |y′| = n0, |z| = |z0| and y = v0 u′. In this case we add u′ to Un, so that x ∈ Un Vn0.

(ii) If |y′| = n, then x = u′v′ with |u′| = |u0| and |y| = |v0 xu0| − |y′| = n0, so y = v0 u′. Hence y ∈ Vn0 Un0 and so u′ ∈ Un0. In this case we add v′ to Vn, so that x ∈ Un0 Vn.

We proceed similarly for y ∈ Y ∩ A^n \ Vn−1 Un−1. Note that, by the construction of Un and Vn, max_{v∈Vn} |v| + min_{u∈Un} |u| ≤ n and max_{u∈Un} |u| + min_{v∈Vn} |v| ≤ n.
J. Cassaigne et al. / Theoretical Computer Science 410 (2009) 2345–2351
Fig. 1. Illustration of the equation z0 v0 xu0 = zyy′.
Now for each element u in Un \ Un−1 there exist elements v′ and v″ in Vn0 such that uv′ ∈ X ∩ A^n and v″u ∈ Y ∩ A^n. We have to show that uVn0 ⊆ X and Vn0 u ⊆ Y. Let v ∈ Vn0. Then v u0 ∈ Y and u0 v ∈ X. Since
(u0 v0)^m u0 v″uv z0 = z0 (v″u)(v u0)(v0 u0)^m ∈ ZY^{m+2} = X^{m+2} Z,

we obtain u0 v″uv z0 ∈ X^2 Z. Since u0 v″ ∈ Un0 Vn0 ⊆ X, we obtain uv z0 ∈ XZ, so uv z0 = xz. If |x| < n = |uv|, then x ∈ Un−1 Vn−1 ⊆ Un Vn and x is a proper prefix of uv ∈ Un Vn. However, this cannot be the case, since Un and Vn are both biprefix codes (see below). If |x| > n, then |z| < |z0|, which contradicts the minimality of |z0|. Hence |x| = n = |uv| and x = uv ∈ X. The inclusion Vn0 u ⊆ Y is obtained dually. Similarly, for each element v in Vn \ Vn−1 there exist elements u′ and u″ in Un0 such that u′v ∈ X ∩ A^n and v u″ ∈ Y ∩ A^n, and we can prove that Un0 v ⊆ X and v Un0 ⊆ Y.

By now we have constructed sets Un and Vn satisfying (5). Hence it remains to show that they are biprefix codes. If u′ ∈ Un is a proper prefix of u ∈ Un, we can assume that |u| = n − |v0| (otherwise both are in Un−1, which is a biprefix code) and u′ ∈ Un−1. Then there exists v″ ∈ Vn0 such that v″u ∈ Y, but then also v″u′ ∈ Vn0 Un−1 ⊆ Y. Since Y is a biprefix code, we have a contradiction. Similar reasoning applies if u′ ∈ Un is a proper suffix of u ∈ Un. Hence Un is also a suffix code, and therefore it is a biprefix code. Similarly, Vn is a biprefix code.

Theorem 5. If finite biprefix codes X and Y are conjugates, then X = UV and Y = VU for some biprefix codes U and V.

Proof. Applying Lemma 4 for n = max_{x∈X} |x| + max_{y∈Y} |y| − n0, we obtain:

for all u ∈ Un, uv0 ∈ X, so |u| ≤ max_{x∈X} |x| − |v0|;
for all v ∈ Vn, v u0 ∈ Y, so |v| ≤ max_{y∈Y} |y| − |u0|;
so that |uv| ≤ n for all u ∈ Un and v ∈ Vn. Hence we obtain:

Un Vn ∩ A≤n = Un Vn,
Vn Un ∩ A≤n = Vn Un,
X ∩ A≤n = X,
Y ∩ A≤n = Y,

implying that X = Un Vn and Y = Vn Un.
Theorem 5 deserves a few comments. It shows that if finite biprefix codes X and Y are conjugates, that is, if they satisfy the conjugacy equation XZ = ZY with a non-empty Z, then they can be decomposed into the form

X = PQ and Y = QP

for some biprefix codes P and Q.
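The converse direction, that X = PQ and Y = QP are always conjugate, can be sanity-checked on concrete finite codes. In the following sketch (Python; the example codes P and Q and the helper are ours) we verify XZ = ZY for several finite index sets I, with Z the union of the languages P(QP)^i, i ∈ I:

```python
def cat(*langs):
    """Catenation of finitely many finite languages (sets of strings)."""
    out = {""}
    for L in langs:
        out = {u + v for u in out for v in L}
    return out

P = {"ab", "ba"}          # a biprefix code
Q = {"aa", "bb"}          # a biprefix code
X, Y = cat(P, Q), cat(Q, P)

for I in [{0}, {1}, {0, 2}, {0, 1, 3}]:
    # Z = union of P(QP)^i over i in I
    Z = set().union(*(cat(P, *([Q, P] * i)) for i in I))
    assert cat(X, Z) == cat(Z, Y)
```

Indeed, X·P(QP)^i = P(QP)^{i+1} = P(QP)^i·Y by associativity of catenation, so every such Z is a solution.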
Of course, the converse holds as well: such X and Y satisfy the conjugacy equation, e.g., for Z = P(QP)^I with I ⊆ N. Hence conjugacy in the case of finite biprefix codes can be defined equivalently in either of the above two ways. In general, these definitions are not equivalent, as discussed in [3].

To continue our analysis, let us see what happens if the biprefix codes X and Y have two different factorizations

X = UV, Y = VU and X = U′V′, Y = V′U′.

This is indeed possible if X and Y are not primitive, as pointed out in Example 1. We show that a unique factorization for X and Y can be given. For this we need the following simple lemma on words.

Lemma 6. All solutions of the pair of word equations
xy = uv, yx = vu
over the alphabet A are of the form x = β(αβ)^i, y = (αβ)^j α, u = β(αβ)^k and v = (αβ)^l α, with i + j = k + l, for some integers i, j, k, l and words α, β ∈ A∗.
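That the stated family does solve both equations can be verified directly: xy = β(αβ)^{i+j}α = β(αβ)^{k+l}α = uv, and yx = (αβ)^{i+j+1} = (αβ)^{k+l+1} = vu. A brute-force confirmation (Python; the sample words and exponent tuples are ours):

```python
# Check that x = β(αβ)^i, y = (αβ)^j α, u = β(αβ)^k, v = (αβ)^l α
# with i + j = k + l satisfies xy = uv and yx = vu.
words = ["", "a", "b", "ab", "ba", "aab"]

for alpha in words:
    for beta in words:
        for i, j, k, l in [(0, 0, 0, 0), (1, 2, 0, 3), (2, 1, 3, 0), (1, 1, 2, 0)]:
            assert i + j == k + l
            x = beta + (alpha + beta) * i
            y = (alpha + beta) * j + alpha
            u = beta + (alpha + beta) * k
            v = (alpha + beta) * l + alpha
            assert x + y == u + v and y + x == v + u
```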
Proof. The proof is given here for the sake of completeness. If we assume that |u| ≤ |x|, the first equation implies that x = ut for some word t, and hence v = ty and yut = tyu. The latter condition means that yu and t commute, i.e., we can write

t = (αβ)^f, y = (αβ)^d α and u = β(αβ)^e,

where α, β ∈ A∗ and d, e, f ≥ 0. This leads to the solutions

x = β(αβ)^{e+f}, y = (αβ)^d α, u = β(αβ)^e, v = (αβ)^{f+d} α.

The case |x| ≤ |u| is symmetric, and the solutions are the same up to a renaming of x, y, u and v.
Since biprefix codes can be viewed as words over the alphabet of all indecomposable biprefix codes, we conclude from Theorem 5 and Lemma 6 the following theorem.

Theorem 7. If finite biprefix codes X and Y are conjugates, then X = (PQ)^i and Y = (QP)^i for some integer i, primitive languages PQ and QP, and unique biprefix codes P and Q.

Proof. Theorem 5 implies that X and Y have some factorization X = UV and Y = VU with biprefix codes U and V. If X = UV = U′V′ and Y = VU = V′U′ are two different such factorizations of X and Y, then we can apply Lemma 6 to the equations
UV = U′V′, VU = V′U′.
Here the biprefix codes are viewed as words over the alphabet of an appropriate finite set of indecomposable biprefix codes. This gives U = P(QP)^j, V = (QP)^k Q, U′ = P(QP)^l and V′ = (QP)^m Q for some integers j, k, l and m. Then X = (PQ)^i and Y = (QP)^i for some integer i. Naturally, P and Q can be chosen so that PQ and QP are the primitive roots of X and Y, respectively. Hence all the different factorizations X = UV, Y = VU can be given in the form described in the theorem, that is, as products of the same biprefix codes P and Q.

Now we are ready to conclude our remarks. If X and Y are finite biprefix codes which are conjugates, then there exist unique biprefix codes P and Q such that PQ and QP are primitive, X = (PQ)^i and Y = (QP)^i. Hence X and Y are conjugates of word type as in formula (4). In the next section we complete our characterization by showing that the form of Z is always Z = (PQ)^I P, for some non-empty set I ⊆ N.

6. The conjugator of finite biprefix codes

Now it is rather easy to show that the conjugacy of finite biprefix codes X and Y is always of word type, i.e., of the form (4). The proof is based on some nontrivial results originally proved in [16], see also [9].

Lemma 8. Let X be a prefix code, ρ(X) its primitive root, and C(X) its centralizer. Then C(X) = ρ(X)∗.

Lemma 9. For any prefix code X, if a set of words L commutes with X, then L = ρ(X)^I for some I ⊆ N.

With the help of the above lemmas we can characterize the conjugator of two finite biprefix codes.

Theorem 10. For given finite biprefix codes X and Y, the conjugator, i.e., the largest solution Z of the equation XZ = ZY, is Z = (PQ)∗P, where P and Q are biprefix codes such that ρ(X) = PQ and ρ(Y) = QP.

Proof. From the previous theorems we know that X = (PQ)^k and Y = (QP)^k for some P and Q such that ρ(X) = PQ and ρ(Y) = QP. Lemma 8 shows that the centralizer of X is C(X) = (PQ)∗. Let Z be the conjugator of X and Y.
When we catenate the language Q to both sides of the equation XZ = ZY and notice that YQ = (QP)^k Q = Q(PQ)^k = QX, we obtain XZQ = ZYQ = ZQX. This means that the language ZQ commutes with X. Now Lemma 8 implies that ZQ ⊆ C(X) = ρ(X)∗ = (PQ)∗. Since clearly the empty word is not in ZQ, we can write ZQ ⊆ (PQ)^+.
The language Q is a biprefix code, so we can cancel the right factor Q, since the semigroup of biprefix codes is free, and hence we obtain Z ⊆ (PQ)∗P. On the other hand, we know that (PQ)∗P clearly is a solution of XZ = ZY, and hence (PQ)∗P ⊆ Z. As a conclusion, we see that the conjugator Z is Z = (PQ)∗P.

More generally, we can characterize all conjugators of finite biprefix codes as follows.

Theorem 11. If a non-empty solution of the conjugacy equation XZ = ZY for finite biprefix codes X and Y exists, it is of word type, i.e.,

X = (PQ)^k, Y = (QP)^k and Z = (PQ)^I P,
for languages P, Q and some set I ⊆ N.

Proof. As in the previous proof, we know that X = (PQ)^k and Y = (QP)^k, where PQ and QP are primitive. Let Z be an arbitrary non-empty language such that XZ = ZY. Again XZQ = ZQX and, by Lemma 9, we have ZQ = (PQ)^J for some J ⊆ N. Clearly 0 ∉ J, and we can again cancel the right factor, the biprefix code Q, from the equation. This gives us the conjugator Z = (PQ)^I P with the index set I = {i ∈ N | i + 1 ∈ J}.

7. The conjugacy problem for finite biprefix codes

We will refer to the problem ‘‘Are given finite languages X and Y conjugates?’’ as the conjugacy problem [8]. In general, the decidability status of this problem is not known, and it is expected to be hard. Our results allow us to answer it in the case of biprefix codes.

Theorem 12. The conjugacy problem for finite biprefix codes is decidable.

Proof. Let X and Y be finite biprefix codes. The languages X and Y have unique factorizations as catenations of indecomposable biprefix codes. These factorizations can be found, for example, by constructing the minimal DFAs of these biprefix codes [1]. Theorem 5 shows that if X and Y are conjugates, then X = UV and Y = VU for some biprefix codes U and V. Since the prime factorizations of X and Y are finite, there are only finitely many candidates for U and V. If suitable U and V can be found, then the equation XZ = ZY has at least the word type solutions with the given X and Y. If, on the other hand, suitable U and V cannot be found, then X and Y are not conjugates.

References

[1] J. Berstel, D. Perrin, Theory of Codes, Academic Press, New York, 1985.
[2] J. Cassaigne, J. Karhumäki, J. Maňuch, On conjugacy of languages, Theor. Inform. Appl. 35 (2001) 535–550.
[3] Ch. Choffrut, Conjugacy in free inverse monoids, in: Proceedings of the Second International Workshop on Word Equations and Related Topics, in: LNCS, vol. 677, Springer-Verlag, London, UK, 1991, pp. 6–22.
[4] Ch. Choffrut, J. Karhumäki, Combinatorics of words, in: G. Rozenberg, A.
Salomaa (Eds.), Handbook of Formal Languages, vol. 1, Springer, Berlin, 1997, pp. 329–438.
[5] Ch. Choffrut, J. Karhumäki, N. Ollinger, The commutation of finite sets: A challenging problem, Theoret. Comput. Sci. 273 (1–2) (2002) 69–79.
[6] J.H. Conway, Regular Algebra and Finite Machines, Chapman Hall, 1971.
[7] Y.-S. Han, A. Salomaa, K. Salomaa, D. Wood, S. Yu, On the existence of prime decompositions, Theoret. Comput. Sci. 376 (1–2) (2007) 60–69.
[8] J. Karhumäki, Combinatorial and computational problems on finite sets of words, in: Machines, Computations, and Universality, in: LNCS, vol. 2055, Springer, 2001, pp. 69–81.
[9] J. Karhumäki, M. Latteux, I. Petre, The commutation with codes, Theoret. Comput. Sci. 340 (2005) 322–333.
[10] J. Karhumäki, M. Latteux, I. Petre, The commutation with ternary sets of words, Theory Comput. Syst. 38 (2) (2005) 161–169.
[11] J. Karhumäki, I. Petre, The branching point approach to Conway's problem, in: W. Brauer (Ed.), Formal and Natural Computing, in: LNCS, vol. 2300, Springer-Verlag, Berlin Heidelberg, 2002, pp. 69–76.
[12] M. Kunc, The power of commuting with finite sets of words, Theory Comput. Syst. 40 (4) (2007) 521–551.
[13] P. Massazza, P. Salmela, On the simplest centralizer of a language, RAIRO Theoret. Inform. Appl. 40 (2006) 295–301.
[14] A. Mateescu, A. Salomaa, S. Yu, Factorizations of languages and commutativity conditions, Acta Cybernet. 15 (3) (2002) 339–351.
[15] D. Perrin, Codes conjugués, Inform. Control 20 (1972) 222–231.
[16] B. Ratoandromanana, Codes et motifs, RAIRO Inform. Théor. 23 (4) (1989) 425–444.
Theoretical Computer Science 410 (2009) 2352–2364
Asynchronous spiking neural P systems

Matteo Cavaliere a, Oscar H. Ibarra b,∗, Gheorghe Păun d,e, Omer Egecioglu b, Mihai Ionescu c, Sara Woodworth b

a Microsoft Research-University of Trento, Centre for Computational and Systems Biology, Trento, Italy
b Department of Computer Science, University of California, Santa Barbara, CA 93106, USA
c Research Group on Mathematical Linguistics, Universitat Rovira i Virgili, Pl. Imperial Tàrraco 1, 43005 Tarragona, Spain
d Institute of Mathematics of the Romanian Academy, PO Box 1-764, 014700 Bucharest, Romania
e Department of Computer Science and AI, University of Sevilla, Avda Reina Mercedes s/n, 41012 Sevilla, Spain
Keywords: Membrane computing; Spiking neural P system; Turing computability; Counter machine; Decidability
Abstract

We consider here spiking neural P systems with a non-synchronized (i.e., asynchronous) use of rules: in any step, a neuron can apply or not apply its rules which are enabled by the number of spikes it contains (further spikes can come, thus changing the rules enabled in the next step). Because the time between two firings of the output neuron is now irrelevant, the result of a computation is the number of spikes sent out by the system, not the distance between certain spikes leaving the system. The additional non-determinism introduced in the functioning of the system by the non-synchronization is proved not to decrease the computing power in the case of using extended rules (several spikes can be produced by a rule). That is, we obtain again the equivalence with Turing machines (interpreted as generators of sets of (vectors of) numbers). However, this problem remains open for the case of standard spiking neural P systems, whose rules can produce only one spike. On the other hand, we prove that asynchronous systems with extended rules, where each neuron is either bounded or unbounded, are not computationally complete. For these systems, the configuration reachability, membership (in terms of generated vectors), emptiness, infiniteness, and disjointness problems are shown to be decidable. However, containment and equivalence are undecidable.

© 2009 Elsevier B.V. All rights reserved.
1. Introduction

Spiking neural P systems (SN P systems, for short) were introduced in [12] with the aim of incorporating specific ideas from spiking neurons into membrane computing. Currently, neural computing based on spiking is a field that is being heavily investigated (see, e.g., [5,14,15]). In short, an SN P system consists of a set of neurons placed in the nodes of a directed graph and sending signals (spikes, denoted in what follows by the symbol a) along the arcs of the graph (called synapses). Thus, the architecture is that of a tissue-like P system, with only one kind of object present in the cells (the reader is referred to [18] for an introduction to membrane computing and to [21] for up-to-date information about this research area). The objects evolve by means of standard spiking rules, which are of the form E/a^c → a; d, where E is a regular expression over {a} and c, d are natural numbers, c ≥ 1, d ≥ 0. The meaning is that a neuron containing k spikes such that a^k ∈ L(E), k ≥ c, can consume c spikes and produce one spike, after a delay of d steps. This spike is sent to all neurons connected by an outgoing synapse from the
∗ Corresponding author.
E-mail addresses: [email protected] (M. Cavaliere), [email protected] (O.H. Ibarra), [email protected], [email protected] (G. Păun), [email protected] (O. Egecioglu), [email protected] (M. Ionescu), [email protected] (S. Woodworth).
doi:10.1016/j.tcs.2009.02.031
neuron where the rule was applied. There are also forgetting rules, of the form a^s → λ, with the meaning that s ≥ 1 spikes are removed, provided that the neuron contains exactly s spikes. Extended rules were considered in [4,17]: these rules are of the form E/a^c → a^p; d, with the meaning that when the rule is used, c spikes are consumed and p spikes are produced. Because p can be 0 or greater than 0, we obtain a generalization of both standard spiking and forgetting rules.

In this paper we consider extended spiking rules with restrictions on the type of the regular expressions used. In particular, we consider two types of rules. Rules of the first type are called bounded and are of the form a^i/a^c → a^p; d, where 1 ≤ c ≤ i, p ≥ 0, and d ≥ 0. We also consider unbounded rules, of the form a^i(a^j)∗/a^c → a^p; d, where i ≥ 0, j ≥ 1, c ≥ 1, p ≥ 0, d ≥ 0. A neuron is called bounded if it has only bounded rules, and unbounded if it has only unbounded rules. A neuron is called general if it has both bounded and unbounded rules. An SN P system is called bounded if it has only bounded neurons, and unbounded if each neuron is either bounded or unbounded. A general SN P system is a system with general neurons. It was shown in [10] that general SN P systems are universal.

An SN P system (of any type) works in the following way. A global clock is assumed, and in each time unit each neuron which can use a rule should do so (the system is synchronized), but the work of the system is sequential in each neuron: only (at most) one rule is used in each neuron. One of the neurons is considered to be the output neuron, and its spikes are also sent to the environment. The moments of time when (at least) a spike is emitted by the output neuron are marked with 1, and the other moments are marked with 0. This binary sequence is called the spike train of the system — it is infinite if the computation does not stop.
With a spike train we can associate various numbers, which can be considered as computed (we also say generated) by the SN P system. For instance, in [12] only the distance between the first two spikes of a spike train was considered; then, in [19], several extensions were examined: the distance between the first k spikes of a spike train, or the distances between all consecutive spikes, taking into account all intervals or only intervals that alternate, all computations or only halting computations, etc. An SN P system can also work in the accepting mode: a neuron is designated as the input neuron and two spikes are introduced into it, at an interval of n steps; the number n is accepted if the computation halts.

Two main types of results were obtained (for general systems, with standard rules): computational completeness in the case when no bound was imposed on the number of spikes present in the system, and a characterization of semilinear sets of numbers in the case when a bound was imposed. In [12] it is proved that synchronized SN P systems using standard rules characterize NRE; improvements in the form of the regular expressions, removing the delay, or removing the forgetting rules can be found in [10]. The result is true both for the generative and the accepting case.

In the proofs of these results the synchronization plays a crucial role, but both from a mathematical point of view and from a neuro-biological point of view it is rather natural to consider non-synchronized systems, where the use of rules is not obligatory. Even if a neuron has a rule enabled in a given time unit, this rule is not necessarily used. The neuron may choose to remain unfired, perhaps receiving spikes from the neighboring neurons. If the unused rule is still enabled later, it can be applied later, without any restriction on the interval during which it has remained unused. If the new spikes have made the rule non-applicable, then the computation continues in the new circumstances (maybe other rules are enabled now).
This way of using the rules applies also to the output neuron, so the distance in time between the spikes sent out by the system is no longer relevant. Hence, for non-synchronized SN P systems, the result of a computation is the total number of spikes sent out to the environment. This makes it necessary to consider only halting computations. (Computations which do not halt are ignored and provide no output.) We stress the fact that we count all spikes sent out. A possibility which we do not consider is to count only the steps in which at least one spike exits the system. Moreover, it is also possible to consider systems with several output neurons. In this case one counts the spikes emitted by the output neurons and collects them as vectors.

Synchronization is in general a powerful feature, useful in controlling the work of a computing device. However, it turns out that the loss in power entailed by removing the synchronization is compensated in the case of general SN P systems where extended rules are used. In fact, we prove that such systems are still equivalent to Turing machines (as generators of sets of (vectors of) natural numbers). On the other hand, we also show that a restriction which looks, at first sight, rather minor has a crucial influence on the power of the systems and decreases their computing power: specifically, we prove that unbounded SN P systems are not computationally complete (as mentioned above, for bounded systems this result is already known from [12]). Moreover, for unbounded systems, the configuration reachability, membership (in terms of generated vectors), emptiness, infiniteness, and disjointness problems can be decided. However, containment and equivalence are undecidable. Note that, for general SN P systems, even reachability and membership are undecidable, because these systems are universal (in a constructive way). However, universality remains open for non-synchronized SN P systems using standard rules.
We find this problem worth investigating (a non-universality result – as we expect will be the case – would show an interesting difference between synchronized and non-synchronized devices, with the loss in power compensated by the additional ‘‘programming capacity’’ of extended rules). The non-synchronized case remains to be considered also for other issues specific to SN P systems, such as looking for small universal systems as in [17], normal forms as in [10], generating languages or processing finite or infinite sequences [3,4,19], characterizations of multi-dimensional semilinear sets of numbers as in [8], using the rules in the exhaustive mode as in [13], etc.
Another mode of computation of an SN P system that has been studied earlier [9] is the sequential mode. In this mode, at every step of the computation, if there is at least one neuron with at least one fireable rule, we allow only one such neuron and one such rule (both chosen non-deterministically) to fire. It was shown in [9] that certain classes of sequential SN P systems are equivalent to partially blind counter machines, while others are universal.

2. Prerequisites

We assume the reader to have some familiarity with (basic elements of) language and automata theory, e.g., from [20], and introduce only a few notations and the definitions related to SN P systems (with extended rules). For an alphabet V, V∗ is the free monoid generated by V with respect to the concatenation operation and the identity λ (the empty string); the set of all non-empty strings over V, that is, V∗ − {λ}, is denoted by V+. When V = {a} is a singleton, we write simply a∗ and a+ instead of {a}∗, {a}+. The length of a string x ∈ V∗ is denoted by |x|. The family of Turing computable sets of natural numbers is denoted by NRE (it is the family of length sets of recursively enumerable languages) and the family of Turing computable sets of vectors of natural numbers is denoted by PsRE.

A spiking neural P system (in short, an SN P system) of degree m ≥ 1 is a construct of the form
Π = (O, σ1, . . . , σm, syn, out), where:

(1) O = {a} is the singleton alphabet (a is called spike);
(2) σ1, . . . , σm are neurons, of the form σi = (ni, Ri), 1 ≤ i ≤ m, where:
(a) ni ≥ 0 is the initial number of spikes contained by the neuron;
(b) Ri is a finite set of extended rules of the form E/a^c → a^p; d, where E is a regular expression with a the only symbol used, c ≥ 1, and p, d ≥ 0, with c ≥ p; if p = 0, then d = 0, too;
(3) syn ⊆ {1, 2, . . . , m} × {1, 2, . . . , m} with (i, i) ∉ syn for 1 ≤ i ≤ m (synapses);
(4) out ∈ {1, 2, . . . , m} indicates the output neuron.

A rule E/a^c → a^p; d with p ≥ 1 is called an extended firing (we also say spiking) rule; a rule E/a^c → a^p; d with p = d = 0 is written in the form E/a^c → λ and is called a forgetting rule. If L(E) = {a^c}, then the rules are written in the simplified forms a^c → a^p; d and a^c → λ. A rule of the type E/a^c → a; d or a^c → λ is said to be restricted (or standard).

In this paper, we investigate extended spiking rules using particular types of regular expressions. A rule is bounded if it is of the form a^i/a^c → a^p; d, where 1 ≤ c ≤ i, c ≥ p ≥ 0, and d ≥ 0. A neuron is bounded if it contains only bounded rules. A rule is unbounded if it is of the form a^i(a^j)∗/a^c → a^p; d, where i ≥ 0, j ≥ 1, c ≥ 1, c ≥ p ≥ 0, d ≥ 0. (In all cases we assume c ≥ p; this restriction rules out the possibility of ‘‘producing more than consuming’’, but it plays no role in the arguments below and can be omitted.) A neuron is unbounded if it contains only unbounded rules. A neuron is general if it contains both bounded and unbounded rules. An SN P system is bounded if all the neurons in the system are bounded. It is unbounded if it has bounded and unbounded neurons. Finally, an SN P system is general if it has general neurons (i.e., it contains at least one neuron which has both bounded and unbounded rules).
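Since every regular expression involved is over the one-letter alphabet {a}, bounded and unbounded rules admit a very simple machine representation. The following sketch (Python; the encoding is ours, with j = 0 marking the bounded case a^i) shows how applicability of a rule to a neuron holding k spikes can be tested:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    # regular expression a^i (a^j)^* over {a}; j == 0 encodes the bounded case a^i
    i: int
    j: int
    c: int   # spikes consumed
    p: int   # spikes produced

    def enabled(self, k: int) -> bool:
        """Is the rule enabled in a neuron currently holding k spikes?"""
        if k < self.c:
            return False
        if self.j == 0:                  # bounded: a^k must equal a^i
            return k == self.i
        # unbounded: k must lie in {i, i + j, i + 2j, ...}
        return k >= self.i and (k - self.i) % self.j == 0

# a^2 / a -> a : bounded, enabled only with exactly two spikes
r1 = Rule(i=2, j=0, c=1, p=1)
# a (a^2)^* / a^3 -> a^2 : unbounded, enabled on any odd k >= 3
r2 = Rule(i=1, j=2, c=3, p=2)

assert r1.enabled(2) and not r1.enabled(3)
assert r2.enabled(5) and not r2.enabled(4) and not r2.enabled(1)
```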
It is known that any regular set over the one-letter alphabet {a} can be expressed as a finite union of regular sets of the form {a^i(a^j)^k | k ≥ 0} for some i, j ≥ 0. Note that such a set is finite if j = 0.

The rules are applied as follows: if the neuron σi contains k spikes, a^k ∈ L(E) and k ≥ c, then the rule E/a^c → a^p; d ∈ Ri is enabled and it can be applied. This means that c spikes are consumed, k − c spikes remain in the neuron, the neuron is fired, and it produces p spikes after d time units. If d = 0, then the spikes are emitted immediately, if d = 1, then the spikes are emitted in the next step, and so on. In the case d ≥ 1, if the rule is used in step t, then in steps t, t + 1, t + 2, . . . , t + d − 1 the neuron is closed; this means that during these steps it uses no rule and it cannot receive new spikes (if a neuron has a synapse to a closed neuron and sends spikes along it, then the spikes are lost). In step t + d, the neuron spikes and becomes open again, hence it can receive spikes (which can be used in step t + d + 1). Notice that distinct rules may have different delays, i.e., distinct d's. The p spikes emitted by a neuron σi are replicated and go to all neurons σj such that (i, j) ∈ syn (each σj receives p spikes). If the rule is a forgetting one, of the form E/a^c → λ, then, when it is applied, c ≥ 1 spikes are removed.

In the synchronized mode, considered up to now in the investigations of SN P systems, a global clock is assumed, marking the time for all neurons, and in each time unit, in each neuron which can use a rule, a rule must be used. Because two rules E1/a^{c1} → a^{p1}; d1 and E2/a^{c2} → a^{p2}; d2 can have L(E1) ∩ L(E2) ≠ ∅, it is possible that two or more rules can be applied in a neuron, and then one of them is chosen non-deterministically. Note that the neurons work in parallel (synchronously), but each neuron processes its spikes sequentially, using only one rule in each time unit.
In the non-synchronized case considered here the definition of a computation in an SN P system is easy: in each time unit, any neuron is free to use a rule or not. Even if enabled, a rule is not necessarily applied, the neuron can remain still in spite of the fact that it contains rules which are enabled by its contents. If the contents of the neuron are not changed, a rule
which was enabled in a step t can fire later. If new spikes are received, then it is possible that other rules become enabled — and are applied or not. It is important to point out that when a neuron spikes, its spikes immediately leave the neuron and reach the target neurons simultaneously (as in the synchronized systems, no time is needed for passing along a synapse from one neuron to another).

The initial configuration of the system is described by the numbers n1, n2, . . . , nm representing the initial number of spikes present in each neuron. Using the rules as suggested above, we can define transitions among configurations. Any sequence of transitions starting in the initial configuration is called a computation. A computation is successful if it reaches a configuration where all bounded and unbounded neurons are open but none is fireable (i.e., the SN P system has halted). Because now ‘‘the time does not matter’’, the spike train can have arbitrarily many occurrences of 0 between any two occurrences of 1; hence the result of a computation can no longer be defined in terms of the number of steps between two consecutive spikes, as in the standard SN P system definition. That is why the result of a computation is defined here as the total number of spikes sent into the environment by the output neuron. Specifically, a number x is generated by the SN P system if there is a successful computation of the system in which the output neuron emits exactly x spikes (if several spikes are emitted by the output neuron at the same time, all of them are counted). Because of the non-determinism in using the rules, a given system computes in this way a set of numbers. Successful computations which send no spike out can be considered as generating the number zero, but in what follows we adopt the convention of ignoring the number zero when comparing the computing power of two devices.
Of course, a natural definition of the result of a computation could also be the number of spikes present in a specified neuron in the halting configuration. This is much closer to the traditional style of membrane computing, but there is no difference with respect to the previous definition: consider an additional neuron which receives the spikes emitted by the previous output neuron and has no rule inside. When the computation halts, the contents of this additional neuron is the result of the computation.

SN P systems can also be used for generating sets of vectors, by considering several output neurons, σi1, . . . , σik. In this case, the system is called a k-output SN P system. Here a vector of numbers (n1, . . . , nk) is generated by counting the numbers of spikes sent out by the neurons σi1, . . . , σik, respectively, during a successful computation.

We denote by N^{nsyn}_{gen}(Π) [Ps^{nsyn}_{gen}(Π)] the set of numbers [of vectors, resp.] generated in the non-synchronized way by a system Π, and by NSpik^{nsyn}_{tot}EPm(α, del_d) [PsSpik^{nsyn}_{tot}EPm(α, del_d)], α ∈ {gen, unb, boun}, d ≥ 0, the family of such sets of numbers [sets of vectors of numbers, resp.] generated by systems of type α (gen stands for general, unb for unbounded, boun for bounded), with at most m neurons and rules having delay at most d. When m is not bounded, it is replaced by ∗. (The subscript tot reminds us of the fact that we count all spikes sent to the environment.)

A 0-delay SN P system is one where the delay in all the rules of the neurons is zero. Because in this paper we always deal with 0-delay systems, the delay (d = 0) is never specified in the rules.

An SN P system working in the non-synchronized manner can also be used in the accepting way: a number n is introduced in the system, in the form of n spikes placed in a distinguished input neuron, and if the computation eventually stops, then n is accepted. In what follows we will only occasionally mention the accepting case.
Since there is no risk of confusion, in this paper non-synchronized SN P systems are often simply called SN P systems. The examples from the next section will illustrate and clarify the above definitions. 3. Three examples In order to clarify the previous definitions, we start by discussing some examples, which are also of interest per se. In this way, we also introduce the standard way to pictorially represent a configuration of an SN P system, in particular the initial configuration. Specifically, each neuron is represented by a ‘‘membrane’’, marked with a label and having inside both the current number of spikes (written explicitly, in the form a^n for n spikes present in a neuron) and the evolution rules. The synapses linking the neurons are represented by directed edges (arrows) between the membranes. The output neuron is identified both by its label, out, and pictorially by a short arrow exiting the membrane and pointing to the environment. Example 1. The first example is the system Π1 given in Fig. 1. We have only two neurons, initially each of them containing one spike. In the synchronized manner, Π1 works forever, with both neurons using a rule in each step — hence the output neuron sends one spike out in each step, i.e., the spike train is the infinite sequence of symbols 1, written 1^ω. In the non-synchronized mode, the system can halt at any moment: each neuron can wait an arbitrary number of steps before using its rule; if both neurons fire at the same time, then the computation continues; if not, one neuron consumes its spike and the other one gets two spikes inside and can never use its rule. Consequently, N_gen^nsyn(Π1) = N, the set of natural numbers.
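The possible behaviours of Π1 can be enumerated mechanically. The following sketch (ours, not part of the paper; a cap on the number of emitted spikes stands in for the infinite search space) performs an exhaustive search over configurations (spikes in σ1, spikes in σout), where each neuron has the single rule a → a and is therefore enabled exactly when it holds one spike:

```python
def reachable_outputs(max_emitted=5):
    # A configuration is (spikes in sigma_1, spikes in sigma_out, emitted).
    # In a non-synchronized 0-delay system, waiting changes nothing, so it
    # suffices to explore all choices of a nonempty set of enabled neurons.
    outputs = set()
    seen = set()
    stack = [(1, 1, 0)]           # initial configuration, 0 spikes emitted
    while stack:
        a, b, e = stack.pop()
        if (a, b, e) in seen or e > max_emitted:
            continue
        seen.add((a, b, e))
        en1, en2 = a == 1, b == 1
        if not en1 and not en2:   # no rule enabled: successful halting
            outputs.add(e)
            continue
        if en1 and en2:           # both fire together: configuration repeats
            stack.append((1, 1, e + 1))
        if en1:                   # only sigma_1 fires: sigma_out gets 2 spikes
            stack.append((a - 1, b + 1, e))
        if en2:                   # only sigma_out fires: emits 1, sigma_1 gets 2
            stack.append((a + 1, b - 1, e + 1))
    return outputs
```

Every halting configuration is of the form (0, 2) or (2, 0), reached once the two neurons fall out of step, and the emitted counts cover an initial segment of N, as the example claims.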
It is worth noting that synchronized systems with one or two neurons characterize the finite sets of numbers (see [12]), hence we already have here an essential difference between the two modes of using the rules: in the non-synchronized mode, systems with two neurons can generate infinite sets of numbers. Clearly, it is possible to construct non-synchronized systems producing a finite set of numbers.
M. Cavaliere et al. / Theoretical Computer Science 410 (2009) 2352–2364
Fig. 1. An example of an SN P system where synchronization matters.
Fig. 2. An SN P system functioning in the same way in both modes.
Fig. 3. A version of the system from Fig. 2.
Example 2. The two neurons of the system above can be synchronized by means of a third neuron even when they do not work synchronously, and this is shown in Fig. 2. This time, the intermediate neuron σ2 stores the spikes produced by the two neurons σ1 , σout , so that only after both of these neurons have spiked do they receive spikes back. Both in the synchronized and in the non-synchronized way, this system never halts, and the number of spikes sent out is infinite in both cases. Example 3. A slight (at first sight) change in the neuron σ2 from the previous example leads to a much more intricate functioning of the system — this is the case with the system Π3 from Fig. 3. The system behaves like that from Fig. 2 as long as neuron σ2 uses the rule a^2 → a. If, instead, rule a^2/a → a is used, then either the computation stops (if both σ1 and σout spike, then σ2 will get 3 spikes and will never spike again), or it continues working forever. In this latter case, there are two possibilities: σ2 will cooperate with σ1 or with σout (the neuron which
spikes receives one spike back, but the other one gets two spikes and is blocked; σ2 continues by using the rule a^2/a → a, otherwise the computation halts, because σ2 will next time get only one spike). If the computation continues between σ2 and σ1 , then no spike will be sent outside; if the cooperation is between σ2 and σout , then the system sends out an arbitrary number of spikes. Again the number of spikes sent out is the same in both the synchronized and the non-synchronized modes (the generated set is again N), but the functioning of the system is rather different in the two modes. 4. Computational completeness of general SN P systems We now prove that the power of general neurons (where extended rules, producing more than one spike at a time, are used) can compensate for the loss of power entailed by removing the synchronization. In the following proof we use the characterization of NRE by means of multicounter machines (abbreviated CM, and also called register machines) [16]. Such a device – in the non-deterministic version – is a construct M = (m, H , l0 , lh , I ), where m is the number of counters, H is the set of instruction labels, l0 is the start label (labeling an ADD instruction), lh is the halt label (assigned to instruction HALT), and I is the set of instructions; each label from H labels only one instruction from I, thus precisely identifying it. When it is useful, a label can be seen as a state of the machine, l0 being the initial state, lh the final/accepting state. The labeled instructions are of the following forms:
• li : (ADD(r ), lj , lk ) (add 1 to counter r and then go to one of the instructions with labels lj , lk non-deterministically chosen), • li : (SUB(r ), lj , lk ) (if counter r is non-empty, then subtract 1 from it and go to the instruction with label lj , otherwise go to the instruction with label lk ),
• lh : HALT (the halt instruction). A counter machine M generates a set N (M ) of numbers in the following way: we start with all counters empty (i.e., storing the number zero), we apply the instruction with label l0 , and we continue to apply instructions as indicated by the labels (and made possible by the contents of the counters). If we reach the halt instruction, then the number n present in counter 1 at that time is said to be generated by M. It is known (see, e.g., [16]) that counter machines generate all sets of numbers which are Turing computable. A counter machine can also accept a set of numbers: a number n is accepted by M if, starting with n in counter 1 and all other counters empty, the computation eventually halts (without loss of generality, we may assume that in the halting configuration all counters are empty). Deterministic counter machines (i.e., with ADD instructions of the form li : (ADD(r ), lj )) working in the accepting mode are known to be equivalent to Turing machines. It is also possible to consider counter machines producing sets of vectors of natural numbers. In this case a distinguished set of k counters (for some k ≥ 1) is designated as the output counters. A k-tuple (n1 , . . . , nk ) ∈ N^k is generated if M eventually halts and the contents of the output counters are n1 , . . . , nk , respectively. Without loss of generality we may assume that in the halting configuration all the counters, except the output ones, are empty. We also assume (without loss of generality) that the output counters are non-decreasing: they are never the subject of SUB instructions, but only of ADD instructions. We will refer to a CM with k output counters (the other counters are auxiliary counters) as a k-output CM. It is well known that a set S of k-tuples of numbers is generated by a k-output CM if and only if S is recursively enumerable; therefore, k-output CMs characterize PsRE.
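The generating mode just described is easy to make concrete. Below is a small nondeterministic counter-machine interpreter (an illustrative sketch; the dictionary encoding of programs and the step bound used to truncate the infinite run tree are our own choices):

```python
def generated(prog, m, l0, max_steps=30):
    # prog maps a label to ('ADD', r, lj, lk), ('SUB', r, lj, lk) or ('HALT',).
    # Explore all nondeterministic runs of at most max_steps steps and
    # collect the value of counter 1 in every halting configuration.
    results = set()
    stack = [(l0, (0,) * m, 0)]   # counter r is stored at position r - 1
    while stack:
        lab, cnt, steps = stack.pop()
        op = prog[lab]
        if op[0] == 'HALT':
            results.add(cnt[0])
            continue
        if steps >= max_steps:    # prune runs longer than the bound
            continue
        _, r, lj, lk = op
        if op[0] == 'ADD':
            nc = cnt[:r-1] + (cnt[r-1] + 1,) + cnt[r:]
            stack.append((lj, nc, steps + 1))   # branch non-deterministically
            stack.append((lk, nc, steps + 1))
        else:                     # SUB: branch on whether counter r is empty
            if cnt[r-1] > 0:
                nc = cnt[:r-1] + (cnt[r-1] - 1,) + cnt[r:]
                stack.append((lj, nc, steps + 1))
            else:
                stack.append((lk, cnt, steps + 1))
    return results

# A two-instruction machine: keep adding to counter 1 or halt.
prog = {'l0': ('ADD', 1, 'l0', 'lh'), 'lh': ('HALT',)}
```

For the two-instruction machine prog, the runs of at most ten steps generate exactly {1, . . . , 10}; without the bound, the machine generates every positive integer.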
We shall refer to a 1-output CM simply as a CM.
Theorem 4.1. N^nsyn Spik_tot EP_*(gen, del_0) = NRE.
Proof. We only have to prove the inclusion NRE ⊆ N^nsyn Spik_tot EP_*(gen, del_0), and to this aim we use the characterization of NRE by means of counter machines used in the generating mode. Let M = (m, H , l0 , lh , I ) be a counter machine with m counters, having the properties specified above: the result of a computation is the number from counter 1, and this counter is never decremented during the computation. We construct a spiking neural P system Π as follows. For each counter r of M let tr be the number of instructions of the form li : (SUB(r ), lj , lk ), i.e., of all SUB instructions acting on counter r (of course, if there is no such SUB instruction, then tr = 0, which is the case for r = 1). Denote T = 2 · max{tr | 1 ≤ r ≤ m} + 1. For each counter r of M we consider a neuron σr in Π whose contents correspond to the contents of the counter. Specifically, if the counter r holds the number n ≥ 0, then the neuron σr will contain 3Tn spikes. With each label l of an instruction in M we also associate a neuron σl . Initially, all these neurons are empty, with the exception of the neuron σl0 associated with the start label of M, which contains 3T spikes. This means that this neuron is ‘‘activated’’. During the computation, the neuron σl which receives 3T spikes will become active. Thus, simulating an instruction li : (OP(r ), lj , lk ) of M means starting with neuron σli activated, operating the counter r as requested by OP, then introducing 3T spikes in one of the neurons σlj , σlk , which becomes in this way active. When activating the neuron σlh , associated with the halting label of M, the computation in M is completely simulated in Π ; we will then send to the
Fig. 4. Module ADD (simulating li : (ADD(r ), lj , lk )).
environment a number of spikes equal to the number stored in the first counter of M. Neuron σ1 is the output neuron of the system. Further neurons will be associated with the counters and the labels of M in a way described below. All of them are initially empty. The construction itself is not given in symbols; instead, we present the modules associated with the instructions of M (as well as the module for producing the output) in the graphical form introduced in the previous section. These modules are presented in Figs. 4–6. Before describing these modules and their work, let us remember that the labels are injectively associated with the instructions of M, hence each label precisely identifies one instruction, either an ADD or a SUB one, with the halting label having a special situation — it will be dealt with by the FIN module. Remember also that counter 1 is never decremented. As mentioned before, because the system we construct has only rules with delay 0, the delay is not specified in the figures below. Simulating an ADD instruction li : (ADD(r ), lj , lk ) — module ADD (Fig. 4). The initial instruction, the one labeled with l0 , is an ADD instruction. Assume that we are in a step when we have to simulate an instruction li : (ADD(r ), lj , lk ), with 3T spikes present in neuron σli (like σl0 in the initial configuration) and no spike in any other neuron, except those neurons associated with the counters. Having 3T spikes inside, neuron σli can fire, and at some time it will do so, producing 3T spikes. These spikes will simultaneously go to neurons σi,1 and σi,2 (as well as to neuron σr , thus simulating the increase of the value of counter r by 1). These neurons, too, can spike at any time. If one of them does so, then 3T spikes arrive in neuron σi,3 , which cannot use them. This means that neuron σi,3 must wait until a further 3T spikes come from whichever of the neurons σi,1 , σi,2 fires later.
With 6T spikes inside, neuron σi,3 can fire, by using one of its rules, non-deterministically chosen. These rules determine the non-deterministic choice of the neuron σlj , σlk to activate. If, for instance, the rule a^{6T} → a^{3T} was used, then both σi,4 and σi,5 receive 3T spikes. Only σi,4 can use them for spiking, while σi,5 can forget them. Eventually σi,4 fires, otherwise the computation does not halt. If this ADD instruction is simulated again and further spikes are sent to neuron σi,5 although it has not removed its spikes, then it will accumulate at least 6T spikes and will never fire again. This means that no ‘‘wrong’’ step is done in the system Π because of the non-synchronization. If in σi,3 one uses the rule a^{6T} → a^{4T}, then the computation proceeds in a similar way, eventually activating neuron σlk . Consequently, the simulation of the ADD instruction is possible in Π , and no computation in Π will end and provide an output (see also below) if this simulation is not correctly completed. Simulating a SUB instruction li : (SUB(r ), lj , lk ) — module SUB (Fig. 5). Let us examine now Fig. 5, starting from the situation of having 3T spikes in neuron σli and no spike in other neurons, except neurons associated with counters; assume that neuron σr holds a number of spikes of the form 3Tn, n ≥ 0. Assume
Fig. 5. Module SUB (simulating li : (SUB(r ), lj , lk )).
also that this is the sth instruction of this type dealing with counter r, for 1 ≤ s ≤ tr , in a given enumeration of instructions (because li precisely identifies the instruction, it also identifies s). At some point, neuron σli spikes and sends 3T − s spikes both to σr and to σi,0 . These spikes can be forgotten in this latter neuron, because 2T < 3T − s < 4T . At some point, neuron σr will also fire, and will send 2T + s or 3T + s spikes to neuron σi,0 . If no spike is present there, then no other action can be done; these spikes, too, will eventually be removed, and no continuation is possible (in particular, no spike is sent out of the system; remember that the number zero is ignored, hence we have no output in this case). If neuron σi,0 does not forget the spikes received from σli (this is possible, because of the non-synchronized mode of using the rules), then eventually neuron σr will send here either 3T + s spikes – in the case where it contains more than 3T − s spikes (hence counter r is not empty) – or 2T + s spikes – in the case where its only spikes are those received from σli . In either case, neuron σi,0 accumulates more than 4T spikes, hence it cannot forget them. Depending on the number of spikes accumulated, either 6T or 5T , neuron σi,0 eventually spikes, sending 3T or 2T spikes, respectively, to neurons σi,1 , σi,2 , and σi,3 . The only possible continuation of neuron σi,1 is to activate neuron σlj (precisely in the case where counter r of M was not empty). Neurons σi,2 and σi,3 will eventually fire and either forget their spikes or send 4T spikes to neuron σi,4 , which activates neuron σlk (in the case where counter r of M was empty). It is important to note that if any neuron σi,u , u = 1, 2, 3, skips using the rule which is enabled and receives further spikes, then no rule can be applied there anymore and the computation is blocked, without sending spikes out.
The simulation of the SUB instruction is correct in both cases, and no ‘‘wrong’’ computation is possible inside the module from Fig. 5. What remains to be examined is the possible interference between modules. First, let us consider the easy issue of the exit labels of the instructions of M, which can be labels of either ADD or SUB instructions, or can be lh . To handle this question, in both the ADD and the SUB modules we have written the rules from the neurons σlj , σlk in the form a^{3T} → a^{δ(lu)}, where δ is the function defined on H as follows:
Fig. 6. Module FIN (ending the computation).
δ(l) = 3T , if l is the label of an ADD instruction; δ(l) = 3T − s, if l is the label of the sth SUB instruction dealing with a counter r of M; δ(l) = 1, if l = lh . What is more complicated is the issue of passing spikes among modules, but not through the neurons which correspond to labels of M. This is the case with the neurons σr for which there are several SUB instructions, and this was the reason for considering the number T in writing the contents of neurons and the rules. Specifically, each σr for which there exist tr SUB instructions can send spikes to all neurons σi,0 as in Fig. 5. However, only one of these target neurons also receives spikes from a neuron σli , the one identifying the instruction which we want to simulate. Assume that we simulate the sth instruction li : (SUB(r ), lj , lk ), hence neuron σr sends 3T + s or 2T + s spikes to all neurons of the form σi′,0 for which there is an instruction li′ : (SUB(r ), lj′ , lk′ ) in M. These spikes can be forgotten, and this is the correct continuation of the computation (note that 2T < 2T + s < 3T + s < 4T , hence there is a forgetting rule to apply in each σi′,0 ). If these spikes are not forgotten and at a subsequent step of the computation neuron σi′,0 receives further spikes from the neuron σr (the number of received spikes is 3T + s′ or 2T + s′ , for some 1 ≤ s′ ≤ tr ), then we accumulate a number of spikes which is bigger than 4T (hence no forgetting rule can be used) but not equal to 5T or 6T (hence no firing rule can be used). Similarly, if these spikes are not forgotten and at a subsequent step of the computation the neuron σi′,0 receives spikes from the neuron σli′ (which is associated with σi′,0 in a module SUB as in Fig. 5), then again no rule can ever be applied here: if li′ : (SUB(r ), lj′ , lk′ ) is the s′ th SUB instruction acting on counter r, then s ≠ s′ and the neuron accumulates a number of spikes greater than 4T (we receive 3T − s′ spikes from σli′ ) and different from 5T and 6T .
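The interference analysis above reduces to arithmetic over the constant T. The script below (illustrative only, not part of the construction) checks, for small values of t_r with T = 2·t_r + 1 (an admissible choice when t_r is the maximum over all counters), that single batches of spikes fall into the forgettable window (2T, 4T), while every wrong accumulation exceeds 4T and never equals the firing thresholds 5T or 6T:

```python
def check_sub_interference(max_t=6):
    # For every number t of SUB instructions on a counter, set T = 2t + 1
    # and verify the spike-count claims of the SUB-module correctness argument.
    for t in range(1, max_t + 1):
        T = 2 * t + 1
        for s in range(1, t + 1):
            # A batch from sigma_li can be forgotten: 2T < 3T - s < 4T.
            assert 2*T < 3*T - s < 4*T
            # A lone batch from sigma_r can be forgotten: both counts in (2T, 4T).
            assert 2*T < 2*T + s and 3*T + s < 4*T
            for s2 in range(1, t + 1):
                # Two batches from sigma_r accumulate: > 4T, never 5T or 6T.
                for x in (3*T + s, 2*T + s):
                    for y in (3*T + s2, 2*T + s2):
                        assert x + y > 4*T and x + y not in (5*T, 6*T)
                # A sigma_r batch plus a wrong sigma_li' batch (s != s2):
                # again > 4T and never 5T or 6T, so the neuron is stuck.
                if s != s2:
                    for x in (3*T + s, 2*T + s):
                        assert x + (3*T - s2) > 4*T
                        assert x + (3*T - s2) not in (5*T, 6*T)
    return True
```

Running the check for all t_r up to 6 confirms the case analysis without exception.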
Consequently, no computation can use the neurons σi′,0 if they do not forget the spikes received from σr . This means that the only computations in Π which can reach the neuron σlh associated with the halting instruction of M are the computations which correctly simulate the instructions of M and correspond to halting computations in M. Ending a computation — module FIN (Fig. 6). When the neuron σlh is activated, it (eventually) sends one spike to neuron σ1 , corresponding to the first counter of M. From now on, this neuron can fire, and it sends out one spike for each 3T spikes present in it, hence the system will emit a number of spikes which corresponds to the contents of the first counter of M at the end of a computation (after reaching instruction lh : HALT). Consequently, N_gen^nsyn(Π) = N (M ), and this completes the proof. Clearly, the previous construction is the same for the accepting mode, and can be carried out for deterministic counter machines (the ADD instructions are of the form li : (ADD(r ), lj )), hence the obtained system is also deterministic. Similarly, if the result of a computation is defined as the number of spikes present in a specified neuron in the halting configuration, then the previous construction is the same; we only have to add one further neuron which is designated as the output neuron and which collects all spikes emitted by neuron σ1 . Theorem 4.1 can easily be extended by allowing more output neurons and then simulating a k-output CM, producing in this way sets of vectors of natural numbers.
Theorem 4.2. Ps^nsyn Spik_tot EP_*(gen, del_0) = PsRE.
Note that the system Π constructed in the proof of Theorem 4.1 is general: neurons σr involved in SUB instructions contain both bounded and unbounded rules. 5. Unbounded SN P systems As mentioned in the Introduction, synchronized bounded SN P systems characterize the semilinear sets of numbers, and this equivalence is proven in a constructive manner — see, e.g., [12]. The proof can easily be extended to non-synchronized SN P systems. Thus, the interesting case which remains to investigate is that of unbounded SN P systems. In the following constructions we restrict the SN P systems syntactically to make checking a valid computation easier. Specifically, for an SN P system with unbounded neurons σ1 , . . . , σk (one of which is the output neuron) we assume as given non-negative integers m1 , . . . , mk , and for the rules in each σi we impose the following restriction: if mi > 0, then a^{mi} ∉ L(E ) for any regular expression E appearing in a rule of neuron σi . This restriction guarantees that if neuron σi contains mi spikes, then the neuron is not fireable. It follows that when the following conditions are met during a computation, the system has halted and the computation is valid: (1) All bounded neurons are open, but none is fireable. (2) Each σi contains exactly mi spikes (hence none of them is fireable either).
This way of defining a successful computation, based on a vector (m1 , . . . , mk ), is called µ-halting. In the notation of the generated families we add the subscript µ to N or to Ps, in order to indicate the use of µ-halting. As defined earlier, a non-synchronized SN P system is one in which, at each step, we select zero or more neurons to fire. Clearly, for 0-delay SN P systems, selecting zero or more neurons to fire at each step is equivalent to selecting one or more neurons to fire at each step. This is due to the fact that there are no delays: if we select no neuron to fire, the entire configuration of the system remains the same. 5.1. 0-delay unbounded SN P systems and partially blind counter machines In this section we give a characterization of 0-delay unbounded SN P systems in terms of partially blind counter machines. A partially blind k-output CM (k-output PBCM) [7] is a k-output CM where the counters cannot be tested for zero. The counters can be incremented by 1 or decremented by 1, but if there is an attempt to decrement a zero counter, the computation aborts (i.e., the computation becomes invalid). Note that, as usual, the output counters are non-decreasing. Again, by definition, a successful generation of a k-tuple requires that the machine enters an accepting state with all non-output counters zero. We denote by NPBCM the family of sets of numbers generated by PBCMs and by PsPBCM the family of sets of vectors of numbers generated by k-output PBCMs. It is known that k-output PBCMs can be simulated by Petri nets, and vice versa [7]. Hence, PBCMs are not universal. We shall refer to a 1-output PBCM simply as a PBCM. We show that unbounded 0-delay SN P systems with µ-halting are equivalent to PBCMs. This result generalizes to the case when there are k outputs. First, we describe a basic construction. Basic construction Let C be a counter of a PBCM and let us consider the following operations, each of them executed in one step: (1) C remains unchanged.
(2) C is incremented by 1. (3) If the contents of C are of the form i + kj (for some k ≥ 0), then C is decremented by d (here i, j, d are fixed non-negative integers with i ≥ 0, j > 0, d > 0). Note that in (3) we may not know whether i + kj is greater than or equal to d, or what k is (the multiplicity of j), since we cannot test for zero. But if we know that C is of the form i + kj, then when we subtract d from it and it becomes negative, the machine aborts and the computation is invalid, so we are safe. Note that if C contains i + kj and is greater than or equal to d, then C will contain the correct value after the decrement by d. It is possible to show that a PBCM can implement operations (1), (2) and (3), and such a PBCM can be obtained by only adding to C a finite-state control. To prove this assertion we need to distinguish two cases, according to the values of i and j. The i < j case. Define a modulo-j counter to be a counter that can count from 0 to j − 1. We can think of the modulo-j counter as an undirected circular graph with nodes 0, 1, . . . , j − 1, where node s is connected to node s + 1 for 0 ≤ s ≤ j − 2 and node j − 1 is connected to node 0. Node s represents count s. We increment the modulo-j counter by going through the nodes in a ‘‘clockwise’’ direction. So, e.g., if the current node is s and we want to increment by 1, we go to s + 1, provided s ≤ j − 2; if s = j − 1, we go to node 0. Similarly, decrementing the modulo-j counter goes in the opposite direction, i.e., ‘‘counter-clockwise’’ — we go from s to s − 1; if s = 0, we go to j − 1. The parameters of the machine are the triple (i, j, d) with i ≥ 0, j > 0, d > 0. We associate with counter C a modulo-j counter, J, which is initially in node (count) 0. During the computation, we keep track of the currently visited node of J. Whenever we increment/decrement C , we also increment/decrement J.
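For the i < j case, the counter together with its finite-control modulo-j counter can be sketched as a small class (the names are invented for illustration; the explicit exception models the silent abort of a partially blind machine whose counter would go negative):

```python
class GuardedCounter:
    """A partially blind counter C plus a modulo-j counter J in the
    finite control, supporting operations (2) and (3) of the basic
    construction for parameters (i, j, d) with 0 <= i < j and d > 0."""

    def __init__(self, i, j, d):
        assert 0 <= i < j and d > 0
        self.i, self.j, self.d = i, j, d
        self.value = 0    # the blind counter (never inspected by the control)
        self.node = 0     # current node of the modulo-j counter J

    def increment(self):                       # operation (2)
        self.value += 1
        self.node = (self.node + 1) % self.j   # J moves clockwise with C

    def decrement_d(self):                     # operation (3)
        # The form i + k*j is checked purely in the finite control, via J:
        if self.node != self.i:
            raise RuntimeError("counter is not of the form i + k*j")
        # In a real PBCM this abort is implicit (the counter goes negative
        # and the run becomes invalid); here we raise to model it.
        if self.value < self.d:
            raise RuntimeError("abort: decrement of a too-small counter")
        self.value -= self.d
        self.node = (self.node - self.d) % self.j
```

With parameters (i, j, d) = (1, 3, 2), four increments leave the counter at 4 = 1 + 1·3 with J in node 1, so decrement_d is permitted and the counter ends at 2.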
Clearly, the requirement that the value of C has to be of the form i + kj for some k ≥ 0 in order to decrement by d translates to J being in node i, which is easily checked. The i ≥ j case. Suppose i = r + sj where s > 0 and 0 ≤ r < j. Subcase 1: If d > i − j, then we run the i < j case described above with parameters (r , j, d). When we want to perform a decrement-d, it is enough to check that the counter is of the form r + kj for some k ≥ 0. Note that if r + kj < r + sj, then the machine will abort, so the computation branch is not successful anyway. Subcase 2: If d ≤ i − j, then we run the i < j case described above with parameters (r , j, d), with the following difference. When we want to perform a decrement-d, we make sure that the counter is of the form r + kj for some k ≥ 0. Then we first subtract i − j + 1
from the counter (and if the machine aborts, nothing is lost), and then add back i − j + 1 − d to the counter. The intermediate step of subtracting i − j + 1 from the counter is accomplished by a suitably modified copy of the original machine. We are now ready to prove the following result.
Lemma 5.1. N_µ^nsyn Spik_tot EP_*(unb, del_0) ⊆ NPBCM.
Proof. We describe how a PBCM M simulates an unbounded 0-delay SN P system Π . Let B be the set of bounded neurons; assume that there are g ≥ 0 such neurons. The bounded neurons can easily be simulated by M in its finite control, so we focus on the simulation of the unbounded neurons. Let σ1 , . . . , σk be the unbounded neurons (one of which is the output neuron). M uses counters C1 , . . . , Ck to simulate the unbounded neurons. M also uses a non-decreasing counter C0 to keep track of the spikes sent by the output neuron to the environment. Clearly, the operation of C0 can easily be implemented by M. We introduce another counter, called ZERO (initially with value 0), whose purpose will become clear later. Assume for the moment that each bounded neuron in B has only one rule, and each unbounded neuron σt (1 ≤ t ≤ k) has only one rule, of the form a^{i_t}(a^{j_t})^*/a^{d_t} → a^{e_t}. M incorporates in its finite control a modulo-j_t counter, J_t , associated with counter C_t (as described above). One step of Π is simulated in five steps by M as follows: (1) Non-deterministically choose a number 1 ≤ p ≤ g + k. (2) Non-deterministically select a subset of size p of the neurons in B ∪ {σ1 , . . . , σk }. (3) Check if the chosen neurons are fireable. The neurons in B are easy to check, and the unbounded neurons can be checked as described above, using their associated J_t ’s (modulo-j_t counters). If at least one is not fireable, abort the computation by decrementing counter ZERO by 1. (4) Decrement the chosen unbounded counters by their d_t ’s and update their associated J_t ’s, as described above. The chosen bounded counters are also easily decremented by the amounts specified in their rules (in the finite control). (5) Increment the chosen bounded counters and unbounded counters by the total number of spikes sent to the corresponding neurons by their neighbors (again updating the associated J_t ’s of the chosen unbounded counters).
Also, increment C0 by the number of spikes the output neuron sends to the environment. At some point, M non-deterministically guesses that Π has halted: it checks that all bounded neurons are open and none is fireable, and that the unbounded neurons have their specified numbers of spikes. M can easily check the bounded neurons, since they are stored in the finite control. For the unbounded neurons, M decrements the corresponding counter by the specified number of spikes in that neuron. Clearly, C0 = x (for some number x) with all other counters zero if and only if the SN P system outputs x with all the neurons open and non-fireable (i.e., the system has halted) and the unbounded neurons containing their specified values. It is straightforward to verify that the above construction generalizes to the case when the neurons have more than one rule. An unbounded neuron with m rules will have m modulo-j counters associated with it, one for each rule; during the computation, these counters are operated in parallel to determine which rule can be fired. A bounded neuron with multiple rules is easily handled by the finite control. We then have to modify item (3) above to: non-deterministically select a rule in each chosen neuron; check if the chosen neurons with the selected rules are fireable. The neurons in B are easy to check, and the unbounded neurons can be checked as described above, using the associated J_t ’s (modulo-j_t counters) for the chosen rules. If at least one is not fireable, abort the computation by decrementing counter ZERO by 1. We omit the details. Clearly, Lemma 5.1 generalizes to the following.
Corollary 5.1. Ps_µ^nsyn Spik_tot EP_*(unb, del_0) ⊆ PsPBCM.
We now show the converse of Lemma 5.1.
Lemma 5.2. NPBCM ⊆ N_µ^nsyn Spik_tot EP_*(unb, del_0).
Proof. To simulate a PBCM we need to be able to simulate an addition instruction, a subtraction instruction, and a halting instruction (but we do not need to test for zero). The addition instruction will add one to a counter. The halting instruction will cause the system to halt. The subtraction instruction will subtract one from a counter and cause the system to abort if the counter was zero. Also, from our definition of a ‘‘valid computation’’ as a µ-halting computation, for the output of the SN P system to be valid, the system must halt and be in a valid configuration — we will see that in our construction all neurons (bounded and unbounded) will contain zero spikes, except the output neuron, which will contain exactly one spike. This means that any computation that leaves the non-output neurons with a positive spike count is invalid. To create a 0-delay unbounded SN P system Π that will simulate a PBCM M we follow the simulation in the proof of Theorem 4.1. To simulate an instruction of the form li : (ADD(r ), lj , lk ), we create the same ADD module as in the proof of Theorem 4.1. It is important to note that all neurons in this module are bounded. Also, when the instruction is done executing, all neurons in the module contain zero spikes if the module executed in a valid manner. (There are some alternative computations which leave some spikes in some of these neurons. These computations are invalid and the system will not generate any output. This is explained more precisely in the proof of Theorem 4.1.)
To simulate an instruction of the form li : (SUB(r ), lj , lk ), we use the SUB module from the proof of Theorem 4.1 with a few small changes. In this module we remove the rule a^{3T−s} → a^{2T+s} from neuron σr . Before, the neuron was a general neuron, but by removing all the finite rules we are left only with rules of the form a^i(a^j)^*/a^d → a^p , and hence the neuron is unbounded. Note that all of the other neurons in the module are bounded. This rule change still allows neuron σr (representing counter r) to fire if it stored 3Tn spikes for some n ≥ 1 (representing a positive count in the counter) before instruction li is executed. In this case, the firing of neuron σr continues the computation. However, if neuron σr contained no spikes before the execution of instruction li (representing a count of zero), neuron σr will not fire, causing the system to eventually halt (after the other neurons forget). In this case, M tried to decrement a zero counter and so the system aborted. In the simulation, Π has halted in an invalid configuration, since no neuron is fireable but neuron σr is not empty and still contains 3T − s spikes. (Also, no output was generated by the system.) The final change to the SUB module is that the rule a^{5T} → a^{2T} is changed to a^{6T} → a^{2T}, causing the next instruction (lj or lk ) to be chosen non-deterministically if the subtraction simulation was successful. Note that a correct execution of this module also leaves all the neurons (other than σr ) with zero spikes. To simulate the instruction lh : HALT, we again create the same FIN module given in the proof of Theorem 4.1. To generalize this simulation for a k-output PBCM we modify the FIN module slightly, to trigger all of the k output neurons. This is done by creating extra synapses from neuron σlh to the neurons σ2 , . . . , σk . In this case, an accepting configuration leaves all non-output neurons with zero spikes and all output neurons with exactly one spike.
Again, Lemma 5.2 generalizes to:

Corollary 5.2. PsPBCM ⊆ Ps_µ Spik_tot EP_*^{nsyn}(unb, del_0).

From Corollaries 5.1 and 5.2, we have the main result of this section:

Theorem 5.1. Ps_µ Spik_tot EP_*^{nsyn}(unb, del_0) = PsPBCM.
It is known that PBCMs with only one output counter can only generate semilinear sets of numbers. Hence:

Corollary 5.3. 0-delay unbounded SN P systems with µ-halting can only generate semilinear sets of numbers.

Theorem 5.1 is the best possible result we can obtain, since if we allow both bounded rules and unbounded rules in the neurons, SN P systems become universal, as shown in Theorem 4.1, where the subtraction module (Fig. 5) has a neuron with the rules a^{6T−s}(a^{3T})*/a^{6T−s} → a^{3T+s} and a^{3T−s} → a^{2T+s}.
5.2. Closure properties and decision problems

The following theorem is known:

Theorem 5.2.
(1) (Union, intersection, complementation) The sets of k-tuples generated by k-output PBCMs are closed under union and intersection, but not under complementation.
(2) (Membership) It is decidable to determine, given a k-output PBCM M and a k-tuple α (of integers), whether M generates α.
(3) (Emptiness) It is decidable to determine, given a k-output PBCM, whether it generates an empty set of k-tuples.
(4) (Infiniteness) It is decidable to determine, given a k-output PBCM, whether it generates an infinite set of k-tuples.
(5) (Disjointness) It is decidable to determine, given two k-output PBCMs, whether they generate a common k-tuple.
(6) (Containment, equivalence) It is undecidable to determine, given two k-output PBCMs, whether the set generated by one is contained in the set generated by the other (or whether they generate the same set).
(7) (Reachability) It is decidable to determine, given a PBCM with k output counters and m auxiliary counters (thus a total of k + m counters) and configurations α = (i1, . . . , ik, j1, . . . , jm) and β = (i′1, . . . , i′k, j′1, . . . , j′m) (the first k components correspond to the output), whether α can reach β.

Then, from Theorems 5.1 and 5.2 parts 1–6, we have:

Corollary 5.4. Theorem 5.2 parts 1–6 also hold for 0-delay unbounded k-output SN P systems with µ-halting.

In the construction of the PBCM from the SN P system in the proof of Lemma 5.1, we only provided counters for the unbounded neurons and a counter to keep track of the number of spikes that the output neuron sends to the environment. The bounded neurons are simulated in the finite control of the PBCM. We could have also allocated a partially blind counter for each bounded neuron (for manipulating a bounded number) and used the finite control to make sure that these added counters never become negative.
Then the PBCM will have m + 1 counters, where m is the total number of neurons (bounded and unbounded) in the SN P system and σ1 corresponds to the output. In the case of a k-output SN P system, the PBCM will have m + k counters. Then from Theorem 5.2 part 7, we have:

Corollary 5.5. It is decidable to determine, given a 0-delay unbounded k-output SN P system with m neurons, and configurations α = (i1, . . . , ik, j1, . . . , jm) and β = (i′1, . . . , i′k, j′1, . . . , j′m) (the first k components correspond to the output), whether α can reach β.

Note that for the above corollary we do not need to define what a halting configuration is for the SN P system, as we are only interested in reachability and not in the set of tuples the system generates.
6. Final remarks

We have considered spiking neural P systems with a non-synchronized use of rules: in any step, a neuron may or may not apply one of its rules enabled by the number of spikes it contains (further spikes can arrive, thus changing the rules enabled in the next step). Asynchronous spiking neural P systems have been proved to be universal when using extended rules (several spikes can be produced by a rule) and neurons containing both bounded and unbounded rules. Moreover, we have given a characterization of a class of spiking neural P systems – the unbounded ones, with µ-halting – in terms of partially blind counter machines. In the proof of the equivalence of asynchronous unbounded SN P systems with partially blind counter machines, we have assumed the µ-halting way of defining successful computations; the resulting decidability consequences are also based on this condition. This assumption can be removed. In a recent paper [11], it was shown that µ-halting can be replaced with the usual halting (hence ignoring the contents of neurons in the halting configuration) and the results still hold. SN P systems operating in the sequential mode were studied earlier in [9]. In this mode, at every step of the computation, if there is at least one neuron with at least one fireable rule, we allow only one such neuron and one such rule (both chosen non-deterministically) to fire. It was shown in [9] that certain classes of sequential SN P systems are equivalent to partially blind counter machines, while others are universal. Thus, in some sense, the non-synchronized and sequential modes of computation are equivalent. Many issues remain to be investigated for non-synchronized SN P systems, starting with the main open problem of whether or not SN P systems with standard rules (rules that can only produce one spike) are Turing complete also in this case. Then, most of the questions considered for synchronized systems are relevant also for the non-synchronized case.
We just list some of them: associating strings to computations (if i ≥ 1 spikes exit the output neuron, then the symbol b_i is generated); finding universal SN P systems, if possible, with a small number of neurons; considering restricted classes of systems (e.g., with a bounded number of spikes present at any time in any neuron). In the bibliography below we indicate papers dealing with each of these issues for the case of synchronized SN P systems. A natural question is to investigate the class of systems for which ‘‘the time does not matter’’, for instance, such that N^{syn}_{gen}(Π) = N^{nsyn}_{gen}(Π) (like in the second example from Section 3). Suggestions in this respect can be found, e.g., in [1,2].

Acknowledgements

The work of the authors was supported as follows. O. Egecioglu, O.H. Ibarra and S. Woodworth were supported in part by NSF Grants CCF-0430945 and CCF-0524136. M. Ionescu was supported by the fellowship ‘‘Formación de Profesorado Universitario’’ from the Spanish Ministry of Education, Culture and Sport. Gh. Păun was partially supported by the project BioMAT 2-CEx06-11-97/19.09.06. This research was in part carried out during a visit of M. Ionescu and Gh. Păun at the Microsoft Research-University of Trento Center for Computational and Systems Biology, Trento, Italy. Useful comments by the two anonymous referees are gratefully acknowledged.

References

[1] M. Cavaliere, V. Deufemia, Further results on time-free P systems, International Journal of Foundations of Computer Science 17 (1) (2006) 69–90.
[2] M. Cavaliere, D. Sburlan, Time-independent P systems, in: Membrane Computing. International Workshop WMC5, Milano, Italy, 2004, in: LNCS, vol. 3365, Springer, 2005, pp. 239–258.
[3] H. Chen, R. Freund, M. Ionescu, Gh. Păun, M.J. Pérez-Jiménez, On string languages generated by spiking neural P systems, in: [6], vol. I, pp. 169–194, and Fundamenta Informaticae 75 (1–4) (2007) 141–162.
[4] H. Chen, T.-O. Ishdorj, Gh. Păun, M.J. Pérez-Jiménez, Spiking neural P systems with extended rules, in: [6], vol. I, pp. 241–265.
[5] W. Gerstner, W. Kistler, Spiking Neuron Models. Single Neurons, Populations, Plasticity, Cambridge Univ. Press, 2002.
[6] M.A. Gutiérrez-Naranjo, et al. (Eds.), Proceedings of Fourth Brainstorming Week on Membrane Computing, Febr. 2006, Fenix Editora, Sevilla, 2006.
[7] S. Greibach, Remarks on blind and partially blind one-way multicounter machines, Theoretical Computer Science 7 (3) (1978) 311–324.
[8] O.H. Ibarra, S. Woodworth, Characterizations of some restricted spiking neural P systems, in: Proc. 7th Workshop on Membrane Computing, Leiden, July 2006, in: LNCS, vol. 4361, Springer, Berlin, 2006, pp. 424–442.
[9] O.H. Ibarra, S. Woodworth, F. Yu, A. Păun, On spiking neural P systems and partially blind counter machines, in: Proc. 5th International Conference on Unconventional Computation, in: LNCS, vol. 4135, Springer, Berlin, 2006, pp. 113–129.
[10] O.H. Ibarra, A. Păun, Gh. Păun, A. Rodríguez-Patón, P. Sosik, S. Woodworth, Normal forms for spiking neural P systems, in: [6], vol. II, pp. 105–136, and Theoretical Computer Science 372 (2–3) (2007) 196–217.
[11] O.H. Ibarra, S. Woodworth, Characterizations of some classes of spiking neural P systems, Natural Computing 7 (4) (2008) 499–517.
[12] M. Ionescu, Gh. Păun, T. Yokomori, Spiking neural P systems, Fundamenta Informaticae 71 (2–3) (2006) 279–308.
[13] M. Ionescu, Gh. Păun, T. Yokomori, Spiking neural P systems with exhaustive use of rules, International Journal of Unconventional Computing 3 (2) (2007) 135–154.
[14] W. Maass, Computing with spikes, Foundations of Information Processing of TELEMATIK 8 (1) (2002) 32–36 (special issue).
[15] W. Maass, C. Bishop (Eds.), Pulsed Neural Networks, MIT Press, Cambridge, 1999.
[16] M. Minsky, Computation — Finite and Infinite Machines, Prentice Hall, Englewood Cliffs, NJ, 1967.
[17] A. Păun, Gh. Păun, Small universal spiking neural P systems, in: [6], vol. II, pp. 213–234, and BioSystems 90 (1) (2007) 48–60.
[18] Gh. Păun, Membrane Computing — An Introduction, Springer, Berlin, 2002.
[19] Gh. Păun, M.J. Pérez-Jiménez, G. Rozenberg, Spike trains in spiking neural P systems, International Journal of Foundations of Computer Science 17 (4) (2006) 975–1002.
[20] G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, 3 Volumes, Springer, Berlin, 1997.
[21] The P systems web page: http://ppage.psystems.eu.
Theoretical Computer Science 410 (2009) 2365–2376
On the similarity metric and the distance metric✩

Shihyen Chen, Bin Ma, Kaizhong Zhang*
Department of Computer Science, The University of Western Ontario, London, Ontario, Canada, N6A 5B7

Keywords: Similarity metric; Distance metric; Normalized similarity metric; Normalized distance metric

Abstract. Similarity and dissimilarity measures are widely used in many research areas and applications. When a dissimilarity measure is used, it is normally required to be a distance metric. However, when a similarity measure is used, there is no formal requirement. In this article, we have three contributions. First, we give a formal definition of the similarity metric. Second, we show the relationship between the similarity metric and the distance metric. Third, we present general solutions to normalize a given similarity metric or distance metric. © 2009 Elsevier B.V. All rights reserved.
1. Introduction

Distance and similarity measures are widely used in bioinformatics research and other fields. Here, we give a few examples. Distance: Sequence edit distance and tree edit distance are used in many areas [14,28]. Moreover, the widespread use of distance is exemplified in the following contexts: constructing phylogenetic trees [19,21,23], improving database search [18], describing the relationship between words [3], comparing graphs or attributed trees [2,24], comparing information contents [8], and evaluating the importance of attributes in data mining [10,17,25]. Similarity: Protein sequence similarity based on BLOSUM matrices is used for protein sequence comparison [20]. Similarity metrics are used in data mining for evaluating the importance of attributes [5–7,9,12,16].

The distance metric is a well-defined concept. In contrast, although similarity measures are widely used and their properties are studied and discussed [22,26], it seems that there is no formal definition for the concept. In this article, we give a formal definition of the similarity metric. We then show the relationship between the similarity metric and the distance metric. Furthermore, we consider the problem of normalizing similarity metrics and distance metrics. Although there are studies on normalizing specific similarity and distance metrics [2,5,7,8,10,12,17,24], there is no general solution. We present general solutions to normalize a given similarity metric or distance metric. Finally, we illustrate with examples the generality of the presented solutions.

The rest of the paper is organized as follows. Section 2 reviews the definition of the distance metric and introduces a formal definition of the similarity metric while showing some useful properties. Section 3 concerns the relationship between the similarity metric and the distance metric. Section 4 concerns the normalized similarity metric. Section 5 concerns the normalized distance metric. Section 6 concerns the generality of the presented solutions.
Section 7 presents concluding remarks.
✩ Preliminary work can be found in [S. Chen, B. Ma, K. Zhang, The normalized similarity metric and its applications, in: Proceedings of 2007 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2007, 2007, pp. 172–180; B. Ma, K. Zhang, The similarity metric and the distance metric, in: Proceedings of the 6th Atlantic Symposium on Computational Biology and Genome Informatics, 2005, pp. 1239–1242]. This work was partially supported by NSERC grants.
* Corresponding author.
E-mail addresses: [email protected] (S. Chen), [email protected] (B. Ma), [email protected] (K. Zhang).
0304-3975/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2009.02.023
S. Chen et al. / Theoretical Computer Science 410 (2009) 2365–2376
2. Similarity metric and distance metric

Recall the formal definition of a distance metric as follows.

Definition 1 (Distance Metric). Given a set X, a real-valued function d(x, y) on the Cartesian product X × X is a distance metric if for any x, y, z ∈ X, it satisfies the following conditions:
1. d(x, y) ≥ 0 (non-negativity),
2. d(x, y) = d(y, x) (symmetry),
3. d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality),
4. d(x, y) = 0 if and only if x = y (identity of indiscernibles).
To our knowledge, there is no formal metric definition for similarity. In the following, we present a formal definition for the similarity metric [4,11].

Definition 2 (Similarity Metric). Given a set X, a real-valued function s(x, y) on the Cartesian product X × X is a similarity metric if, for any x, y, z ∈ X, it satisfies the following conditions:
1. s(x, y) = s(y, x),
2. s(x, x) ≥ 0,
3. s(x, x) ≥ s(x, y),
4. s(x, y) + s(y, z) ≤ s(x, z) + s(y, y),
5. s(x, x) = s(y, y) = s(x, y) if and only if x = y.
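On a finite domain, Definition 2 can be checked mechanically. The sketch below is our illustrative code, not part of the paper; it verifies the five conditions exhaustively for two example functions: s(x, y) = min(x, y), which satisfies them on the non-negative integers, and the plain distance |x − y|, which violates condition 3:

```python
import itertools

def is_similarity_metric(s, domain, eps=1e-9):
    """Exhaustively check the five conditions of Definition 2 on a finite domain."""
    for x, y, z in itertools.product(domain, repeat=3):
        if abs(s(x, y) - s(y, x)) > eps:                 # condition 1: symmetry
            return False
        if s(x, x) < -eps:                               # condition 2: self-similarity >= 0
            return False
        if s(x, x) < s(x, y) - eps:                      # condition 3
            return False
        if s(x, y) + s(y, z) > s(x, z) + s(y, y) + eps:  # condition 4
            return False
    for x, y in itertools.product(domain, repeat=2):     # condition 5 (iff)
        coincide = abs(s(x, x) - s(y, y)) < eps and abs(s(y, y) - s(x, y)) < eps
        if coincide != (x == y):
            return False
    return True

domain = range(6)
assert is_similarity_metric(lambda x, y: min(x, y), domain)       # min is a similarity metric
assert not is_similarity_metric(lambda x, y: abs(x - y), domain)  # a distance fails condition 3
```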
Condition 1 states that s(x, y) is symmetric. Condition 2 states that for any x the self-similarity is non-negative. Although it is not mandatory to set this lower bound at zero, this is a common and reasonable choice. Condition 3 states that for any x the self-similarity is no less than the similarity between x and any y. Condition 4 states that the similarity between x and z through y is no greater than the direct similarity between x and z plus the self-similarity of y. This property is the equivalent of the triangle inequality in the distance metric. Condition 5 states that the statements s(x, x) = s(y, y) = s(x, y) and x = y are equivalent. With the possible exceptions of conditions 4 and 5, the remaining conditions clearly agree with the intuitive meaning of similarity. As to conditions 4 and 5, although their relevance to similarity may not be intuitively clear, we explain in the following that they are indeed indispensable properties for similarity.

Consider condition 4. At first sight, this inequality might appear unnatural since, from the triangle inequality, one might expect it to be s(x, y) + s(y, z) ≤ s(x, z) without the s(y, y) term. In a deeper analysis, as follows, we shall see why this s(y, y) term should be included. Intuitively, the notion of similarity serves as a means to quantify the common information shared by two objects. Two scenarios arise. In the first scenario, only non-negative values are used to quantify similarity. In the second scenario, real values are used to quantify similarity. In the current discussion, we borrow notations from set theory due to their convenience in conveying the intuition underlying similarity.

For non-negative quantification, the similarity between x and y may be expressed as |x ∩ y|, which represents that which is commonly shared by both objects. Moreover, note that |x ∩ y| = |x ∩ y ∩ z| + |x ∩ y ∩ z̄|, where x̄ denotes the complement of x. In this scenario, we are concerned with the inequality

|x ∩ y| + |y ∩ z| ≤ |x ∩ z| + |y|.

The validity of this inequality is justified as

|x ∩ y| + |y ∩ z| = |x ∩ y ∩ z| + |x ∩ y ∩ z̄| + |x ∩ y ∩ z| + |x̄ ∩ y ∩ z| ≤ |x ∩ z| + |y|,

due to the facts that |x ∩ y ∩ z| ≤ |x ∩ z| and |x ∩ y ∩ z̄| + |x ∩ y ∩ z| + |x̄ ∩ y ∩ z| ≤ |y|. Without the presence of |y|, one cannot say that |x ∩ z| alone is enough to bound all the terms on the other side of the inequality. A simple example is when x ∩ z = ∅ while x ∩ y ≠ ∅ and y ∩ z ≠ ∅.

For general quantification, the similarity between x and y may be expressed as k × |x ∩ y| − k′ × (|x ∩ ȳ| + |y ∩ x̄|), where both common and non-common contributions are taken into account. In this scenario, we are concerned with the inequality

k × (|x ∩ y| + |y ∩ z|) − k′ × (|x ∩ ȳ| + |y ∩ x̄| + |y ∩ z̄| + |z ∩ ȳ|) ≤ k × (|x ∩ z| + |y|) − k′ × (|x ∩ z̄| + |z ∩ x̄|).

From the results in the non-negative quantification, if we can show the validity of the following inequality then the validity of the above inequality follows:

|x ∩ ȳ| + |y ∩ x̄| + |y ∩ z̄| + |z ∩ ȳ| ≥ |x ∩ z̄| + |z ∩ x̄|.
As shown in the following, this is indeed true:

|x ∩ ȳ| + |y ∩ x̄| + |y ∩ z̄| + |z ∩ ȳ| ≥ |x ∩ ȳ ∩ z̄| + |x̄ ∩ y ∩ z| + |x ∩ y ∩ z̄| + |x̄ ∩ ȳ ∩ z|
= (|x ∩ y ∩ z̄| + |x ∩ ȳ ∩ z̄|) + (|x̄ ∩ y ∩ z| + |x̄ ∩ ȳ ∩ z|)
= |x ∩ z̄| + |z ∩ x̄|.

Now consider condition 5. The ‘‘if’’ part is clear. The ‘‘only-if’’ part, which states that if s(x, x) = s(y, y) = s(x, y) then x = y, is justified by Lemma 1.

Lemma 1. Let s(x, y) be a real function satisfying similarity metric conditions 1, 2, 3 and 4. If s(x, x) = s(y, y) = s(x, y), then for any z, s(x, z) = s(y, z).

Proof. From s(x, y) + s(y, z) ≤ s(x, z) + s(y, y), we have s(y, z) ≤ s(x, z). From s(y, x) + s(x, z) ≤ s(y, z) + s(x, x), we have s(x, z) ≤ s(y, z). This means that for any z, s(x, z) = s(y, z).

From the definitions, the negation of a distance metric is a similarity metric. Therefore the similarity metric is a more general notion. The next two lemmas consider the result of adding or multiplying two similarity metrics.

Lemma 2. Let s1(x, y) ≥ 0 and s2(x, y) ≥ 0 be two similarity metrics. Then s1(x, y) + s2(x, y) is a similarity metric.

Proof. Trivial.

Lemma 3. Let s1(x, y) ≥ 0 and s2(x, y) ≥ 0 be two similarity metrics. Then s1(x, y) × s2(x, y) is a similarity metric.

Proof. We only show the proof for condition 4, as the other conditions can be proved trivially. Condition 4: Let dxz = max{s2(x, y) + s2(y, z) − s2(y, y), 0}; then dxz ≤ s2(x, z), dxz ≤ s2(y, y) and s2(x, y) + s2(y, z) ≤ s2(y, y) + dxz. Without loss of generality, we assume that s1(x, y) ≥ s1(y, z). Then,

s1(x, y) × s2(x, y) + s1(y, z) × s2(y, z)
= (s1(x, y) − s1(y, z)) × s2(x, y) + s1(y, z) × (s2(x, y) + s2(y, z))
≤ (s1(x, y) − s1(y, z)) × s2(y, y) + s1(y, z) × (s2(y, y) + dxz)
= s1(x, y) × (s2(y, y) − dxz) + (s1(x, y) + s1(y, z)) × dxz
≤ s1(y, y) × (s2(y, y) − dxz) + (s1(y, y) + s1(x, z)) × dxz
≤ s1(y, y) × s2(y, y) + s1(x, z) × s2(x, z).
Following the definitions of distance and similarity, the normalized metrics are defined as follows.

Definition 3 (Normalized Distance Metric). A distance metric d(x, y) is a normalized distance metric if d(x, y) ≤ 1.

Definition 4 (Normalized Similarity Metric). A similarity metric s(x, y) is a normalized similarity metric if |s(x, y)| ≤ 1.

Corollary 1. If s(x, y) is a normalized similarity metric and for any x, s(x, x) = 1, then (1/2) × (1 − s(x, y)) is a normalized distance metric. If, in addition, s(x, y) ≥ 0, then 1 − s(x, y) is a normalized distance metric. If d(x, y) is a normalized distance metric, then 1 − d(x, y) is a normalized similarity metric.

Proof. The statements follow directly from the basic definitions.

Therefore, if di(x, y) ≥ 0, 1 ≤ i ≤ n, are normalized distance metrics, then ∏_{i=1}^{n} (1 − di(x, y)) is a normalized similarity metric and 1 − ∏_{i=1}^{n} (1 − di(x, y)) is a normalized distance metric. In the following, we discuss some properties of concave and convex functions that will be useful later.
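As an illustrative check of Corollary 1 combined with the product construction (our code, not from the paper; the two concrete metrics are our choices), the sketch below verifies exhaustively that 1 − ∏ᵢ(1 − dᵢ) is again a normalized distance metric:

```python
import itertools

def is_distance_metric(d, domain, eps=1e-9):
    """Exhaustively check Definition 1 on a finite domain."""
    for x, y, z in itertools.product(domain, repeat=3):
        if d(x, y) < -eps or abs(d(x, y) - d(y, x)) > eps:
            return False
        if d(x, z) > d(x, y) + d(y, z) + eps:
            return False
    return all((abs(d(x, y)) < eps) == (x == y) for x in domain for y in domain)

domain = range(5)
d1 = lambda x, y: abs(x - y) / 4          # normalized distance metric on 0..4
d2 = lambda x, y: min(abs(x - y), 2) / 2  # truncated metric, also normalized

# By Corollary 1 and Lemma 3, 1 - (1 - d1)(1 - d2) is again a normalized distance metric.
d = lambda x, y: 1 - (1 - d1(x, y)) * (1 - d2(x, y))
assert is_distance_metric(d, domain)
assert all(0 <= d(x, y) <= 1 for x in domain for y in domain)
```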
Definition 5. A function f is concave over an interval [a, b] if for every x1, x2 ∈ [a, b] and 0 ≤ λ ≤ 1,

λ × f(x1) + (1 − λ) × f(x2) ≤ f(λ × x1 + (1 − λ) × x2).   (1)

Definition 6. A function f is convex over an interval [a, b] if for every x1, x2 ∈ [a, b] and 0 ≤ λ ≤ 1,

λ × f(x1) + (1 − λ) × f(x2) ≥ f(λ × x1 + (1 − λ) × x2).

Lemma 4. If a function f is concave over the interval (−∞, ∞), then for any a, b ≥ 0 and c ≥ 0, f(a) + f(a + b + c) ≤ f(a + b) + f(a + c).

Proof. Let a + b = λ × a + (1 − λ) × (a + b + c) and a + c = λ′ × a + (1 − λ′) × (a + b + c). Consequently λ = c/(b + c), λ′ = b/(b + c), and λ + λ′ = 1. From Eq. (1), we have

λ × f(a) + (1 − λ) × f(a + b + c) ≤ f(λ × a + (1 − λ) × (a + b + c)) = f(a + b),
λ′ × f(a) + (1 − λ′) × f(a + b + c) ≤ f(λ′ × a + (1 − λ′) × (a + b + c)) = f(a + c).

Hence, f(a) + f(a + b + c) ≤ f(a + b) + f(a + c).
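A quick numeric sanity check of Lemma 4, using the concave function f(t) = √t (an illustrative choice of ours, restricted to non-negative arguments):

```python
import math
import random

f = math.sqrt  # concave and non-decreasing on [0, infinity)

random.seed(0)
for _ in range(10_000):
    a = random.uniform(0, 10)
    b = random.uniform(0, 10)
    c = random.uniform(0, 10)
    # Lemma 4: f(a) + f(a+b+c) <= f(a+b) + f(a+c) for concave f
    assert f(a) + f(a + b + c) <= f(a + b) + f(a + c) + 1e-12
```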
2368
S. Chen et al. / Theoretical Computer Science 410 (2009) 2365–2376
Lemma 5. If a function f is convex over the interval (−∞, ∞), then for any a, b ≥ 0 and c ≥ 0, f(a) + f(a + b + c) ≥ f(a + b) + f(a + c).

Proof. Symmetric to Lemma 4.

Lemma 6. Let f be a non-negative concave function over [0, ∞). Then x/f(b + x) ≤ y/f(b + y), where 0 ≤ x ≤ y and 0 ≤ b.

Proof. Let 0 ≤ λ ≤ 1 be such that λ × b + (1 − λ) × (b + y) = b + x. Then (1 − λ) × y = x and

f(b + x)/x ≥ (λ × f(b) + (1 − λ) × f(b + y))/x = λ × f(b)/x + f(b + y)/y ≥ f(b + y)/y.

Hence x/f(b + x) ≤ y/f(b + y).
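As with Lemma 4, the ratio inequality of Lemma 6 can be sanity-checked numerically with f(t) = √t (again an illustrative choice of ours; x is kept strictly positive to avoid a zero denominator):

```python
import math
import random

f = math.sqrt  # a non-negative concave function on [0, infinity)

random.seed(0)
for _ in range(10_000):
    b = random.uniform(0.0, 10.0)
    x = random.uniform(0.001, 10.0)
    y = random.uniform(x, 10.0)   # ensures 0 < x <= y
    # Lemma 6: x / f(b + x) <= y / f(b + y)
    assert x / f(b + x) <= y / f(b + y) + 1e-12
```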
The next lemma states the consequence of setting a similarity metric as an argument of a convex function.

Lemma 7. Let s(x, y) be a similarity metric, and f a convex function such that f(0) ≥ 0, and f(x) < f(y) if x < y. Then f(s(x, y)) is a similarity metric.

Proof. Conditions 1, 2 and 3: Trivial. Condition 4: Let a = s(x, y) + s(y, z) − s(y, y), b = s(y, y) − s(x, y) and c = s(y, y) − s(y, z). Then it is straightforward to verify this condition with the help of Lemma 5. Condition 5: If x = y then clearly f(s(x, x)) = f(s(y, y)) = f(s(x, y)). Conversely, f(s(x, x)) = f(s(y, y)) = f(s(x, y)) implies s(x, x) = s(y, y) = s(x, y) due to the condition that f(x) < f(y) if x < y; hence x = y.

Note. If the functional condition in Lemma 7 becomes ‘‘f(x) ≤ f(y) if x < y’’, then by partitioning the set into equivalence classes such that x and y are in the same class if and only if f(s(x, x)) = f(s(y, y)) = f(s(x, y)), f(s(x, y)) is still a similarity metric on the quotient set.

Corollary 2. Given a similarity metric s(x, y) on X, we define s⁺(x, y) as follows:

s⁺(x, y) = s(x, y) if s(x, y) ≥ 0, and s⁺(x, y) = 0 if s(x, y) < 0.

Then s⁺(x, y) is a similarity metric on X′, where all x ∈ X such that s(x, x) = 0 correspond to a single element in X′.

Proof. The result follows directly from the preceding note.

3. Relationship between similarity metric and distance metric

We consider the relationship between the similarity metric and the distance metric. In particular, we establish transformations that transform a given similarity metric to a distance metric and vice versa. We first consider transformations from similarity metric to distance metric. Given a similarity metric s(x, y), we define two transformations, Fp(s) = dp and Fm(s) = dm, as follows:

Fp(s(x, y)) = (s(x, x) + s(y, y))/2 − s(x, y),
Fm(s(x, y)) = max{s(x, x), s(y, y)} − s(x, y).

In the following, we prove that these transformations produce distance metrics.

Lemma 8. Let s(x, y) be a similarity metric. Then

dp(x, y) = (s(x, x) + s(y, y))/2 − s(x, y)

is a distance metric.

Proof. Condition 1:

dp(x, y) = (s(x, x) + s(y, y))/2 − s(x, y) = (s(x, x) − s(x, y) + s(y, y) − s(x, y))/2 ≥ 0.

The inequality is due to similarity metric condition 3. Condition 2: Trivial.
Condition 3:

dp(x, z) = (s(x, x) + s(z, z) − 2 × s(x, z))/2
≤ (s(x, x) + s(z, z) + 2 × s(y, y) − 2 × s(x, y) − 2 × s(y, z))/2
= (s(x, x) + s(y, y) − 2 × s(x, y))/2 + (s(y, y) + s(z, z) − 2 × s(y, z))/2
= dp(x, y) + dp(y, z).
Condition 4: If x = y then clearly dp(x, y) = 0. Conversely, dp(x, y) = 0 means s(x, x) + s(y, y) − 2 × s(x, y) = 0. Since s(x, x) ≥ s(x, y) and s(y, y) ≥ s(x, y), we must have s(x, x) = s(x, y) and s(y, y) = s(x, y) for s(x, x) + s(y, y) − 2 × s(x, y) = 0 to hold, that is, s(x, x) = s(y, y) = s(x, y). Hence, x = y.

Lemma 9. Let s(x, y) be a similarity metric. Then dm(x, y) = max{s(x, x), s(y, y)} − s(x, y) is a distance metric.

Proof. Conditions 1 and 2: Trivial. Condition 3:

dm(x, z) = max{s(x, x), s(z, z)} − s(x, z)
≤ max{s(x, x), s(z, z)} + s(y, y) − s(x, y) − s(y, z)
≤ max{s(x, x), s(y, y)} − s(x, y) + max{s(y, y), s(z, z)} − s(y, z)
= dm(x, y) + dm(y, z).

Condition 4: If x = y, then clearly dm(x, y) = 0. Conversely, dm(x, y) = 0 means max{s(x, x), s(y, y)} − s(x, y) = 0. Since s(x, x) ≥ s(x, y) and s(y, y) ≥ s(x, y), this implies s(x, x) = s(y, y) = s(x, y), hence x = y.

Next, we consider transformations from distance metric to similarity metric. Given a distance metric d(x, y) on X, we define, for any fixed o ∈ X, transformations G^k_p(d) = s^k_p with k ≥ 1, and G^k_m(d) = s^k_m with k > 0, as follows:

G^k_p(d(x, y)) = (d(x, o) + d(y, o))/k − d(x, y),
G^k_m(d(x, y)) = k × min{d(x, o), d(y, o)} − d(x, y).

In the following, we prove that these transformations produce similarity metrics.

Lemma 10. Let d(x, y) be a distance metric on X. Then for k ≥ 1, and any fixed o ∈ X,

s^k_p(x, y) = (d(x, o) + d(y, o))/k − d(x, y)

is a similarity metric.

Proof. Conditions 1, 2, 3 and 4: Trivial. Condition 5: If x = y then s^k_p(x, x) = s^k_p(y, y) = s^k_p(x, y) holds trivially. Conversely, s^k_p(x, x) = s^k_p(y, y) = s^k_p(x, y) implies 2 × d(x, o) = 2 × d(y, o) = d(x, o) + d(y, o) − k × d(x, y). This means that d(x, o) = d(y, o) and therefore 2 × d(x, o) = 2 × d(x, o) − k × d(x, y). This yields d(x, y) = 0, hence x = y.

Lemma 11. Let d(x, y) be a distance metric on X. Then for k > 0, and any fixed o ∈ X, s^k_m(x, y) = k × min{d(x, o), d(y, o)} − d(x, y) is a similarity metric.

Proof. Conditions 1, 2 and 3: Trivial. Condition 4:

s^k_m(x, y) + s^k_m(y, z) = k × min{d(x, o), d(y, o)} − d(x, y) + k × min{d(y, o), d(z, o)} − d(y, z)
≤ k × min{d(x, o), d(y, o)} − d(x, z) + k × min{d(y, o), d(z, o)} − d(y, y)
≤ k × min{d(x, o), d(z, o)} − d(x, z) + k × min{d(y, o), d(y, o)} − d(y, y)
= s^k_m(x, z) + s^k_m(y, y).

Condition 5: If x = y then s^k_m(x, x) = s^k_m(y, y) = s^k_m(x, y) clearly holds. Conversely, s^k_m(x, x) = s^k_m(y, y) = s^k_m(x, y) implies k × d(x, o) = k × d(y, o) = k × min{d(x, o), d(y, o)} − d(x, y). This means d(x, y) = 0, hence x = y.
2370
S. Chen et al. / Theoretical Computer Science 410 (2009) 2365–2376
Note. Given a distance metric d, we have Fp(G^k_p(d)) = d. Given a similarity metric s, in general G^k_p(Fp(s)) ≠ s. Only when there exists a fixed o ∈ X such that (k − 1) × (s(x, x) + s(y, y)) = 2 × (s(o, o) − s(x, o) − s(y, o)) do we have G^k_p(Fp(s)) = s. The following lemma states a result involving transformation via the exponential function.
Lemma 12. If d(x, y) is a distance metric, then e^{−d(x,y)} is a normalized similarity metric and 1 − e^{−d(x,y)} is a normalized distance metric.

Proof. From (1 − e^{−d(x,y)}) × (1 − e^{−d(y,z)}) ≥ 0, we have e^{−d(x,y)} + e^{−d(y,z)} ≤ e^{−(d(x,y)+d(y,z))} + 1. Therefore e^{−d(x,y)} + e^{−d(y,z)} ≤ e^{−d(x,z)} + e^{−d(y,y)}. The other properties are trivial.

4. Normalized similarity metric

We first present several similarity metrics with a normalized appearance but which may not be strictly normalized according to Definition 4. Following these, we strengthen the functional condition so as to normalize these metrics.

Theorem 1. Let s(x, y) be a similarity metric, and f a concave function over [0, ∞) satisfying f(0) ≥ 0, f(x) > 0 if x > 0, and f(x) ≤ f(y) if x < y. Then

s̄(x, y) = s(x, y)/f(s(x, x) + s(y, y) − s(x, y))

is a similarity metric.

Proof. Conditions 1, 2, and 3: Trivial. Condition 4: Let f1 = f(s(x, x) + s(y, y) + s(z, z) − s(x, y) − s(y, z)), f2 = f(s(x, x) + s(y, y) − s(x, y)), f3 = f(s(y, y) + s(z, z) − s(y, z)), f4 = f(s(y, y)). Consequently, f1 ≥ {f2, f3} ≥ f4. Further, let a = s(y, y), b = s(x, x) − s(x, y), and c = s(z, z) − s(y, z). Therefore,

s̄(x, y) + s̄(y, z) − s̄(y, y) − s̄(x, z)
= s̄(x, y) + s̄(y, z) − s̄(y, y) − s(x, z)/f(s(x, x) + s(z, z) − s(x, z))
≤ s̄(x, y) + s̄(y, z) − s̄(y, y) − (s(x, y) + s(y, z) − s(y, y))/f(s(x, x) + s(z, z) − (s(x, y) + s(y, z) − s(y, y)))   (2)
= (1/f1) × (s(x, y) × (f1 − f2)/f2 + s(y, z) × (f1 − f3)/f3 − s(y, y) × (f1 − f4)/f4)
≤ (s(y, y)/(f1 × f4)) × (f1 + f4 − f2 − f3)
= (s(y, y)/(f1 × f4)) × (f(a + b + c) + f(a) − f(a + b) − f(a + c))   (3)
≤ 0.

The inequality in (2) clearly holds for s(x, z) ≥ 0. When s(x, z) < 0, this relation also holds due to Lemma 6. The last inequality in (3) holds due to Lemma 4.

Condition 5: If x = y, clearly s̄(x, x) = s̄(y, y) = s̄(x, y). Conversely, if s̄(x, x) = s̄(y, y) = s̄(x, y), then s(x, x)/f(s(x, x)) = s(y, y)/f(s(y, y)) = s(x, y)/f(s(x, x) + s(y, y) − s(x, y)). Since s(y, y) ≥ s(x, y) and f(s(y, y)) ≤ f(s(x, x) + s(y, y) − s(x, y)), in order for s(y, y)/f(s(y, y)) = s(x, y)/f(s(x, x) + s(y, y) − s(x, y)) to hold we must have s(y, y) = s(x, y). Similarly we must have s(x, x) = s(x, y). This means s(x, x) = s(y, y) = s(x, y), hence x = y.
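Theorem 1 covers a familiar special case: with s(x, y) = |x ∩ y| on finite sets and f the identity (which satisfies f(0) ≥ 0, f(x) > 0 for x > 0, non-decreasing, concave), s̄ becomes |x ∩ y|/(|x| + |y| − |x ∩ y|) = |x ∩ y|/|x ∪ y|, the Jaccard index. The sketch below is our illustrative code (the convention for two empty sets is ours); it spot-checks condition 4 on random sets:

```python
import itertools
import random

def jaccard(x, y):
    # Theorem 1 instance: s(x, y) = |x & y|, f = identity, so
    # s_bar = |x & y| / (|x| + |y| - |x & y|) = |x & y| / |x | y|.
    u = len(x | y)
    return len(x & y) / u if u else 1.0  # our convention when both sets are empty

random.seed(1)
universe = range(8)
sets = [frozenset(random.sample(universe, random.randint(0, 8))) for _ in range(40)]
for x, y, z in itertools.product(sets, repeat=3):
    # similarity-metric condition 4 for the normalized metric
    assert jaccard(x, y) + jaccard(y, z) <= jaccard(x, z) + jaccard(y, y) + 1e-12
```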
Theorem 2. Let f be a function satisfying f(0) ≥ 0, f(x) > 0 if x > 0, and f(x) ≤ f(y) if x < y. Then, given a similarity metric s(x, y) ≥ 0,

s̄(x, y) = s(x, y)/f(max{s(x, x), s(y, y)})

is a similarity metric.
Proof. Conditions 1, 2, and 3: Trivial. Condition 4: To show s̄(x, y) + s̄(y, z) ≤ s̄(x, z) + s̄(y, y), there are three cases to consider.

1. s(z, z) ≤ s(x, x) ≤ s(y, y):

s̄(x, y) + s̄(y, z) = s(x, y)/f(s(y, y)) + s(y, z)/f(s(y, y))
≤ s(x, z)/f(s(y, y)) + s(y, y)/f(s(y, y))
≤ s(x, z)/f(s(x, x)) + s(y, y)/f(s(y, y))
= s̄(x, z) + s̄(y, y).

2. s(z, z) ≤ s(y, y) ≤ s(x, x):

s̄(x, y) + s̄(y, z) = (f(s(y, y)) × (s(x, y) + s(y, z)) + (f(s(x, x)) − f(s(y, y))) × s(y, z))/(f(s(x, x)) × f(s(y, y)))
≤ (f(s(y, y)) × (s(x, z) + s(y, y)) + (f(s(x, x)) − f(s(y, y))) × s(y, y))/(f(s(x, x)) × f(s(y, y)))
= s̄(x, z) + s̄(y, y).

3. s(y, y) ≤ s(z, z) ≤ s(x, x):

s̄(x, y) + s̄(y, z) = (f(s(z, z)) × (s(x, y) + s(y, z)) + (f(s(x, x)) − f(s(z, z))) × s(y, z))/(f(s(x, x)) × f(s(z, z)))
≤ (f(s(z, z)) × (s(x, z) + s(y, y)) + (f(s(x, x)) − f(s(z, z))) × s(y, y))/(f(s(x, x)) × f(s(z, z)))
≤ s(x, z)/f(s(x, x)) + s(y, y)/f(s(y, y))
= s̄(x, z) + s̄(y, y).

Condition 5: It is clear that s̄(x, x) = s̄(y, y) = s̄(x, y) if x = y. Conversely, if s̄(x, x) = s̄(y, y) = s̄(x, y), then s(x, x)/f(s(x, x)) = s(y, y)/f(s(y, y)) = s(x, y)/f(max{s(x, x), s(y, y)}). Since s(y, y) ≥ s(x, y) and f(s(y, y)) ≤ f(max{s(x, x), s(y, y)}), in order for s(y, y)/f(s(y, y)) = s(x, y)/f(max{s(x, x), s(y, y)}) to hold we must have s(y, y) = s(x, y). Similarly we must have s(x, x) = s(x, y). This means s(x, x) = s(y, y) = s(x, y), hence x = y.

Theorem 3. Let f be a concave function over [0, ∞) satisfying f(0) ≥ 0, f(x) > 0 if x > 0, and f(x) ≤ f(y) if x < y. Then, given a similarity metric s(x, y) ≥ 0, for 0 ≤ k ≤ 1,

s̄(x, y) = s(x, y)/f(max{s(x, x), s(y, y)} + k × (min{s(x, x), s(y, y)} − s(x, y)))

is a similarity metric.

Proof. We only prove that s̄(x, y) + s̄(y, z) ≤ s̄(y, y) + s̄(x, z), as the rest is similar to the above theorems. Let

f1 = f(max{s(x, x), s(z, z)} + k × (min{s(x, x), s(z, z)} − s(x, y) − s(y, z) + s(y, y))),
f1′ = f(max{s(x, x), s(z, z)} + k × (min{s(x, x), s(z, z)} − s(x, z))),
f2 = f(max{s(x, x), s(y, y)} + k × (min{s(x, x), s(y, y)} − s(x, y))),
f3 = f(max{s(y, y), s(z, z)} + k × (min{s(y, y), s(z, z)} − s(y, z))),
f4 = f(s(y, y)).

It is straightforward to verify that all the above terms are non-negative. As will be clear soon, it is useful to sort out the relative magnitudes for {f1, f1′} and {f1, f2, f3, f4}. For {f1, f1′}, we have f1 ≥ f1′ since s(x, y) + s(y, z) ≤ s(x, z) + s(y, y). Using the fact that s(x, z) ≥ 0, we have

s̄(x, y) + s̄(y, z) − s̄(y, y) − s̄(x, z)
= s(x, y)/f2 + s(y, z)/f3 − s(y, y)/f4 − s(x, z)/f1′
≤ s(x, y)/f2 + s(y, z)/f3 − s(y, y)/f4 − s(x, z)/f1
≤ s(x, y) × (f1 − f2)/(f1 × f2) + s(y, z) × (f1 − f3)/(f1 × f3) − s(y, y) × (f1 − f4)/(f1 × f4).
S. Chen et al. / Theoretical Computer Science 410 (2009) 2365–2376

Table 1
A comparison of metric conditions.

  Formula                                                           s(x,y)   k          f
  s(x,y) / f(s(x,x) + s(y,y) − s(x,y))                              Any      1          Concave
  s(x,y) / f(max{s(x,x), s(y,y)} + k × (min{s(x,x), s(y,y)} − s(x,y)))   ≥ 0      0 ≤ k ≤ 1  Concave
  s(x,y) / f(max{s(x,x), s(y,y)})                                   ≥ 0      0          Any
For {f1, f2, f3, f4}, we have {f2, f3} ≥ f4. The full order for {f1, f2, f3, f4} depends on the relative magnitudes of {s(x,x), s(y,y), s(z,z)}. Since x and z are symmetric in the formula, we can assume that s(x,x) ≥ s(z,z). Therefore there are three cases to consider, namely s(x,x) ≥ s(z,z) ≥ s(y,y), s(x,x) ≥ s(y,y) ≥ s(z,z), and s(y,y) ≥ s(x,x) ≥ s(z,z). The cases s(x,x) ≥ s(z,z) ≥ s(y,y) and s(x,x) ≥ s(y,y) ≥ s(z,z) give rise to the partial order f1 ≥ {f2, f3} ≥ f4. The case s(y,y) ≥ s(x,x) ≥ s(z,z) results in multiple possibilities: f1 ≥ {f2, f3} ≥ f4, f2 ≥ f1 ≥ f3 ≥ f4, f3 ≥ f1 ≥ f2 ≥ f4, {f2, f3} ≥ f1 ≥ f4, and {f2, f3} ≥ f4 ≥ f1. We first derive the following result for f1 ≥ {f2, f3} ≥ f4, as it is relevant in all three cases:
s̄(x,y) + s̄(y,z) − s̄(y,y) − s̄(x,z)
≤ s(x,y) × (f1 − f2)/(f1 × f2) + s(y,z) × (f1 − f3)/(f1 × f3) − s(y,y) × (f1 − f4)/(f1 × f4)
≤ (s(y,y)/(f1 × f4)) × (f1 + f4 − f2 − f3).
Since s(y,y)/(f1 × f4) ≥ 0, in the following when f1 ≥ {f2, f3} ≥ f4, it suffices to prove f1 + f4 − f2 − f3 ≤ 0.
1. s(x,x) ≥ s(z,z) ≥ s(y,y): Let a = s(y,y), b = s(x,x) − s(y,y) + k × (s(y,y) − s(x,y)), c = s(z,z) − s(y,y) + k × (s(y,y) − s(y,z)), and c′ = k × (s(z,z) − s(y,z)). From c − c′ = (1 − k) × (s(z,z) − s(y,y)) ≥ 0, we have c ≥ c′. Then
f1 + f4 − f2 − f3 = f(a + b + c′) + f(a) − f(a + b) − f(a + c) ≤ f(a + b + c) + f(a) − f(a + b) − f(a + c) ≤ 0.
2. s(x,x) ≥ s(y,y) ≥ s(z,z): Let a = s(y,y), b = s(x,x) − s(y,y) + k × (s(y,y) − s(x,y)), and c = k × (s(z,z) − s(y,z)). Then
f1 + f4 − f2 − f3 = f(a + b + c) + f(a) − f(a + b) − f(a + c) ≤ 0.
3. s(y,y) ≥ s(x,x) ≥ s(z,z):
• f1 ≥ {f2, f3} ≥ f4: Similar to the above.
• f2 ≥ f1 ≥ f3 ≥ f4: Using the fact that s(y,y) ≥ s(y,z), we have
s̄(x,y) + s̄(y,z) − s̄(y,y) − s̄(x,z)
≤ s(x,y) × (f1 − f2)/(f1 × f2) + s(y,z) × (f1 − f3)/(f1 × f3) − s(y,y) × (f1 − f4)/(f1 × f4)
≤ s(x,y) × (f1 − f2)/(f1 × f2) + (s(y,y) × (f1 − f4)/(f1 × f3 × f4)) × (f4 − f3)
≤ 0.
• f3 ≥ f1 ≥ f2 ≥ f4: Similar to the above.
• {f2, f3} ≥ f1 ≥ f4 or {f2, f3} ≥ f4 ≥ f1: Using the fact that min{f2, f3} ≥ max{f1, f4}, we have
s̄(x,y) + s̄(y,z) − s̄(y,y) − s̄(x,z) ≤ s(x,y)/f2 + s(y,z)/f3 − s(y,y)/f4 − s(x,z)/f1
≤ (s(x,y) + s(y,z) − s(y,y) − s(x,z)) / max{f1, f4}
≤ 0.
Therefore, we have proved that s¯(x, y) + s¯(y, z ) ≤ s¯(y, y) + s¯(x, z ).
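The concavity step used in cases 1 and 2, namely f(a+b+c) + f(a) ≤ f(a+b) + f(a+c) for b, c ≥ 0, is a standard fact about concave functions. A short derivation (our addition, included for completeness) runs as follows:

```latex
% For concave f and b, c \ge 0 with b + c > 0 (the case b + c = 0 is trivial),
% write a+b and a+c as convex combinations of a and a+b+c:
a + b = \tfrac{c}{b+c}\,a + \tfrac{b}{b+c}\,(a+b+c),
\qquad
a + c = \tfrac{b}{b+c}\,a + \tfrac{c}{b+c}\,(a+b+c).
% Concavity of f applied to each combination gives
f(a+b) \ge \tfrac{c}{b+c}\,f(a) + \tfrac{b}{b+c}\,f(a+b+c),
\qquad
f(a+c) \ge \tfrac{b}{b+c}\,f(a) + \tfrac{c}{b+c}\,f(a+b+c).
% Adding the two inequalities (the coefficients sum to 1 on each side) yields
f(a+b) + f(a+c) \ge f(a) + f(a+b+c).
```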
Note. We may define s̄(x,y) = 0 if both the numerator and the denominator are 0.
Corollary 3. In the above theorems, we obtain normalized similarity metrics with an additional condition, f(x) ≥ x.
Proof. Trivial.
Remark. We see that s(x,y) / f(max{s(x,x), s(y,y)} + k × (min{s(x,x), s(y,y)} − s(x,y))) reduces to s(x,y) / f(s(x,x) + s(y,y) − s(x,y)) or to s(x,y) / f(max{s(x,x), s(y,y)}) with k = 1 or k = 0, respectively. A comparison of their respective metric conditions is listed in Table 1. When k lies strictly between 0 and 1, the conditions required are more stringent than at the limits k = 0 and k = 1: when k = 0 the condition on f is relaxed, whereas when k = 1 the condition on s(x,y) is relaxed.
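As an illustrative numerical sanity check (our addition, not part of the paper), the key inequality of similarity condition 4, s̄(x,y) + s̄(y,z) ≤ s̄(x,z) + s̄(y,y), can be spot-checked for the whole family of formulas in Table 1. The concrete choices below, the set-overlap similarity s(A,B) = |A ∩ B| and the concave function f(x) = √x, are ours and only serve as one admissible instance:

```python
import math
import random

def f(x):                 # concave, nondecreasing, f(0) >= 0, f(x) > 0 for x > 0
    return math.sqrt(x)

def s(a, b):              # set-overlap similarity (a similarity metric, see Section 6.1)
    return len(a & b)

def s_bar(a, b, k):       # the Theorem 3 family; k = 1 and k = 0 give Theorems 1 and 2
    sxx, syy, sxy = s(a, a), s(b, b), s(a, b)
    return sxy / f(max(sxx, syy) + k * (min(sxx, syy) - sxy))

random.seed(0)
U = list(range(8))
def rand_set():           # random non-empty subset, so every denominator is positive
    return frozenset(random.sample(U, random.randint(1, len(U))))

for _ in range(2000):
    x, y, z = rand_set(), rand_set(), rand_set()
    for k in (0.0, 0.5, 1.0):
        assert s_bar(x, y, k) + s_bar(y, z, k) <= s_bar(x, z, k) + s_bar(y, y, k) + 1e-9
print("condition 4 holds on all sampled triples")
```

Such a check cannot prove the theorem, but it catches transcription mistakes in the formulas quickly.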
5. Normalized distance metric

Theorem 4. Let d(x,y) be a distance metric on X. Let f be a concave function on [0, ∞) such that f(0) ≥ 0, f(x) > 0 if x > 0, and f(x) ≤ f(y) if x < y. Then for any fixed o ∈ X,
d̄(x,y) = d(x,y) / f(d(x,y) + (d(x,o) + d(y,o))/k)
is a distance metric, where k ≥ 1.
Proof. We prove that d̄(x,y) ≤ d̄(x,z) + d̄(y,z), as the rest is trivial.
d̄(x,y) = d(x,y) / f(d(x,y) + (d(x,o) + d(y,o))/k)
≤ (d(x,z) + d(y,z)) / f(d(x,z) + d(y,z) + (d(x,o) + d(y,o))/k)    (4)
≤ d(x,z) / f(d(x,z) + (d(x,o) + d(z,o))/k) + d(y,z) / f(d(y,z) + (d(y,o) + d(z,o))/k)
= d̄(x,z) + d̄(y,z).
The inequality in (4) is due to Lemma 6.
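To make Theorem 4 concrete, here is a small numerical spot check (our addition; the base metric d(x,y) = |x − y| on the reals, the reference point o = 0, and f(x) = √x are our own illustrative choices):

```python
import math
import random

def f(x):                # concave, nondecreasing, f(0) >= 0, f(x) > 0 for x > 0
    return math.sqrt(x)

def d(x, y):             # base distance metric on the reals
    return abs(x - y)

K = 2.0                  # any fixed k >= 1 is admissible per Theorem 4
O = 0.0                  # fixed reference point o

def d_bar(x, y):         # Theorem 4: d(x,y) / f(d(x,y) + (d(x,o) + d(y,o))/k)
    if d(x, y) == 0:
        return 0.0       # covers the 0/0 case at x = y = o
    return d(x, y) / f(d(x, y) + (d(x, O) + d(y, O)) / K)

random.seed(0)
for _ in range(5000):
    x, y, z = (random.uniform(-10, 10) for _ in range(3))
    assert d_bar(x, y) <= d_bar(x, z) + d_bar(y, z) + 1e-9
print("triangle inequality holds on all sampled triples")
```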
Corollary 4. With an additional condition that f(x) ≥ (k/(k+1)) × x,
d(x,y) / f(d(x,y) + (d(x,o) + d(y,o))/k)
is a normalized distance metric.
Proof. Trivial.
Theorem 5. Let d(x,y) be a distance metric on X. Let f be a function such that f(0) ≥ 0, f(x) > 0 if x > 0, and f(x) ≤ f(y) if x < y. Then for any fixed o ∈ X,
d̄(x,y) = (d(x,y) − min{d(x,o), d(y,o)}) / f(max{d(x,o), d(y,o)}) + min{d(x,o), d(y,o)} / f(min{d(x,o), d(y,o)})
is a distance metric.
Proof. Let s(x,y) = d(x,o) + d(y,o) − d(x,y); then from Lemma 10, s(x,y) is a non-negative similarity metric. Since f(x/2) satisfies the conditions of Theorem 2, the following is a similarity metric:
s̄(x,y) = (d(x,o) + d(y,o) − d(x,y)) / f(max{d(x,o), d(y,o)}).
Applying Lemma 8 to s̄(x,y), we have that
d(x,o)/f(d(x,o)) + d(y,o)/f(d(y,o)) − (d(x,o) + d(y,o) − d(x,y)) / f(max{d(x,o), d(y,o)})
is a distance metric. Therefore
d̄(x,y) = (d(x,y) − min{d(x,o), d(y,o)}) / f(max{d(x,o), d(y,o)}) + min{d(x,o), d(y,o)} / f(min{d(x,o), d(y,o)})
is a distance metric.
Note that from the formula, d̄(x,o) needs a special definition. We can define d̄(o,o) = 0 and d̄(x,o) = d(x,o)/f(d(x,o)).
Corollary 5. With an additional condition that f(x) ≥ 2 × x,
(d(x,y) − min{d(x,o), d(y,o)}) / f(max{d(x,o), d(y,o)}) + min{d(x,o), d(y,o)} / f(min{d(x,o), d(y,o)})
is a normalized distance metric.
Proof. Trivial.
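The Theorem 5 normalizer with f(t) = 2t, i.e. the Corollary 5 form, can likewise be exercised numerically. The sketch below (ours, not from the paper) uses d(x,y) = |x − y| on the integers with o = 0, and implements the special definitions d̄(o,o) = 0 and d̄(x,o) = d(x,o)/f(d(x,o)) noted above:

```python
import random

def d(x, y):                  # base distance metric on the integers
    return abs(x - y)

O = 0                         # fixed reference point o

def d_bar(x, y):              # Theorem 5 with f(t) = 2t, as in Corollary 5
    lo, hi = sorted((d(x, O), d(y, O)))
    if hi == 0:
        return 0.0            # special definition: d_bar(o, o) = 0
    if lo == 0:
        return 0.5            # special definition: d_bar(x, o) = d(x,o)/(2 d(x,o))
    return (d(x, y) - lo) / (2 * hi) + lo / (2 * lo)

random.seed(0)
for _ in range(5000):
    x, y, z = (random.randint(-20, 20) for _ in range(3))
    assert 0.0 <= d_bar(x, y) <= 1.0          # normalized, per Corollary 5
    assert d_bar(x, y) <= d_bar(x, z) + d_bar(y, z) + 1e-9
```

Note that the first fractional term can be negative (when d(x,y) < min{d(x,o), d(y,o)}), yet the sum stays within [0, 1], since d(x,y) ≥ |d(x,o) − d(y,o)|.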
Table 2
Summary: Set similarity metrics and distance metrics.

  Similarity                Distance
  |A ∩ B|                   |A − B| + |B − A|;  max{|A − B|, |B − A|}
  |A ∩ B| / |A ∪ B|         (|A − B| + |B − A|) / |A ∪ B|
  |A ∩ B| / max{|A|, |B|}   max{|A − B|, |B − A|} / max{|A|, |B|}
Corollary 6. With an additional condition that f(x) is concave,
(d(x,y) − min{d(x,o), d(y,o)} + max{d(x,o), d(y,o)}) / f(max{d(x,o), d(y,o)})
is a distance metric.
Proof. In the last step of the proof of the theorem, applying Lemma 9 instead of Lemma 8 and using the fact that
max{ d(x,o)/f(d(x,o)), d(y,o)/f(d(y,o)) } = max{d(x,o), d(y,o)} / f(max{d(x,o), d(y,o)}),
the result follows.

6. Examples

In specific problem settings, several similarity and distance metrics have been proposed, for example, in finding a maximal common subgraph between two graphs, in defining information distance based on the notion of Kolmogorov complexity, and in evaluating the importance of attributes. These are special solutions, each of which is only suitable for the specific context from which it is derived. In the following, we show that by casting the solutions of the previous sections into each of these contexts, these metrics readily follow.

6.1. Set similarity and distance

Given sets A and B, we denote by A − B the relative complement of B in A, i.e. A − B = A ∩ B̄ = {x ∈ A | x ∉ B}.
Graph distance: An example of a graph distance metric [2], based on the notion of maximal common subgraph, is
1 − |G1 ∩ G2| / max{|G1|, |G2|} = max{|G1 − G2|, |G2 − G1|} / max{|G1|, |G2|},
where G1 ∩ G2 represents the maximal common subgraph between the graphs G1 and G2 and |G1 ∩ G2| is a similarity metric.
Attributed tree distance: An attributed tree is a tree in which every node is associated with a vector of attributes. A way of defining a distance metric between two attributed trees is based on maximum similarity subtree isomorphism [24]. Examples are
• |T1| + |T2| − 2 × |T1 ∩ T2| = |T1 − T2| + |T2 − T1|,
• max{|T1|, |T2|} − |T1 ∩ T2| = max{|T1 − T2|, |T2 − T1|},
• 1 − |T1 ∩ T2| / |T1 ∪ T2| = (|T1 − T2| + |T2 − T1|) / |T1 ∪ T2|,
• 1 − |T1 ∩ T2| / max{|T1|, |T2|} = max{|T1 − T2|, |T2 − T1|} / max{|T1|, |T2|},
where T1 ∩ T2 represents a maximum similarity subtree between the two attributed trees T1 and T2 and |T1 ∩ T2| is a similarity metric.
The formulation of the metrics in the above examples is essentially based on the notion of set similarity and distance. Therefore, we now cast the general solution in this context. Given finite sets A, B and C, we have |A ∩ B| + |B ∩ C| − |A ∩ C| ≤ |B|. Note that this inequality is the equivalent of that in similarity condition 4. It is easy to verify that |A ∩ B| is a similarity metric. From Lemmas 8 and 9 it follows that both |A − B| + |B − A| and max{|A − B|, |B − A|} are distance metrics. From Theorem 1, it follows that |A ∩ B| / |A ∪ B| is a similarity metric and consequently (|A − B| + |B − A|) / |A ∪ B| is a distance metric. From Theorem 2, it follows that |A ∩ B| / max{|A|, |B|} is a similarity metric and consequently max{|A − B|, |B − A|} / max{|A|, |B|} is a distance metric. We summarize the results in Table 2.
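The two normalized set distances of Table 2 are easy to test empirically; the following sketch (our addition) checks the triangle inequality and the range [0, 1] for both on random non-empty subsets of a small universe:

```python
import random

def jaccard_dist(a, b):           # (|A - B| + |B - A|) / |A ∪ B|
    return len(a ^ b) / len(a | b)

def maxnorm_dist(a, b):           # max{|A - B|, |B - A|} / max{|A|, |B|}
    return max(len(a - b), len(b - a)) / max(len(a), len(b))

random.seed(0)
U = range(10)
def rand_set():                   # non-empty, so denominators never vanish
    return frozenset(x for x in U if random.random() < 0.5) or frozenset({0})

for _ in range(3000):
    a, b, c = rand_set(), rand_set(), rand_set()
    for dist in (jaccard_dist, maxnorm_dist):
        assert 0.0 <= dist(a, b) <= 1.0
        assert dist(a, b) <= dist(a, c) + dist(b, c) + 1e-9
```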
Remark. Note that these are a subset of the metrics that may be derived from the general solution. Evidently they encompass the metrics in the examples. For the formulae in fractional form, we have chosen the simple concave function f(x) = x. There are many other functions to choose from, so long as they meet the functional conditions specified in the previous sections. It is, in general, easier to determine whether a given function meets a set of conditions than to prove that a given formula involving this function is a metric.
Table 3
Summary: Similarity and distance metrics for evaluating the importance of attributes.

  Similarity                        Distance
  I(X,Y) [6,9,16]                   H(X|Y) + H(Y|X) [10,25]
  I(X,Y) / H(X,Y) [12]              (H(X|Y) + H(Y|X)) / H(X,Y) [10,17]
  I(X,Y) / max{H(X), H(Y)} [5,7]
Table 4
Summary: Information similarity metrics and distance metrics.

  Similarity                     Distance
  I(X,Y)                         H(X|Y) + H(Y|X);  max{H(X|Y), H(Y|X)}
  I(X,Y) / H(X,Y)                (H(X|Y) + H(Y|X)) / H(X,Y)
  I(X,Y) / max{H(X), H(Y)}       max{H(X|Y), H(Y|X)} / max{H(X), H(Y)}
6.2. Information similarity and distance

Kolmogorov complexity: There has been some study of defining information distance based on the notion of Kolmogorov complexity [8]. The Kolmogorov complexity K(x) of a string x is the length of a shortest binary program x* to compute x on an appropriate universal computer. The distance between two objects may be defined to be the length of the shortest program that can transform either object into the other. Examples of such information distance metrics are (K(x|y*) + K(y|x*)) / K(x,y) and max{K(x|y*), K(y|x*)} / max{K(x), K(y)}.
Data mining: An attribute is deemed important in data mining if it partitions the database such that new patterns are revealed [27]. Several similarity and distance metrics were proposed in the context of evaluating the importance of attributes. They are listed in Table 3.
The formulation of the metrics in the above examples is essentially based on the notion of information similarity and distance. We now cast the general solution in this context. Denote by H(X) the information entropy of a discrete random variable X, H(Y|X) the entropy of Y conditional on X, H(X,Y) the joint entropy of X and Y, and I(X,Y) the mutual information between X and Y. From information theory, we have H(X|Y) ≤ H(X|Z) + H(Z|Y). The mutual information between X and Y is defined as I(X,Y) = H(X) − H(X|Y), with I(X,Y) = I(Y,X). With the above, we have I(X,Y) + I(Y,Z) ≤ I(X,Z) + I(Y,Y). Then it is straightforward to verify that I(X,Y) is a similarity metric. From Lemmas 8 and 9 it follows that both H(X|Y) + H(Y|X) and max{H(X|Y), H(Y|X)} are distance metrics. From Theorem 1, it follows that I(X,Y)/H(X,Y) is a similarity metric, where H(X,Y) is the joint entropy of X and Y defined as H(X,Y) = H(X) + H(Y|X) with H(X,Y) = H(Y,X). Consequently, (H(X|Y) + H(Y|X)) / H(X,Y) is a distance metric. From Theorem 2, it follows that I(X,Y) / max{H(X), H(Y)} is a similarity metric. Consequently, max{H(X|Y), H(Y|X)} / max{H(X), H(Y)} is a distance metric. We summarize the results in Table 4.
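The identity H(X|Y) + H(Y|X) = 2H(X,Y) − H(X) − H(Y) makes these information distances straightforward to compute from a joint distribution, and their triangle inequality can be spot-checked on random joint pmfs. The script below (our addition) does this for the raw distance and for its H(X,Y)-normalized version from Table 4:

```python
import itertools
import math
import random

def H(p):                  # Shannon entropy (bits) of a pmf given as {outcome: prob}
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def marginal(p, idx):      # marginal pmf over the coordinates listed in idx
    m = {}
    for key, v in p.items():
        k = tuple(key[i] for i in idx)
        m[k] = m.get(k, 0.0) + v
    return m

def d_raw(p, i, j):        # H(Xi|Xj) + H(Xj|Xi) = 2 H(Xi,Xj) - H(Xi) - H(Xj)
    return 2 * H(marginal(p, (i, j))) - H(marginal(p, (i,))) - H(marginal(p, (j,)))

def d_norm(p, i, j):       # (H(Xi|Xj) + H(Xj|Xi)) / H(Xi,Xj)
    return d_raw(p, i, j) / H(marginal(p, (i, j)))

random.seed(0)
outcomes = list(itertools.product((0, 1), repeat=3))
for _ in range(500):
    w = [random.random() + 1e-3 for _ in outcomes]
    t = sum(w)
    p = {o: wi / t for o, wi in zip(outcomes, w)}   # random joint pmf of (X, Y, Z)
    for dist in (d_raw, d_norm):
        assert dist(p, 0, 2) <= dist(p, 0, 1) + dist(p, 1, 2) + 1e-9
```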
Remark. Note the resemblance between the above metrics and those for the set case, both constructed from the general solution. Furthermore, it is evident that the metrics in these examples can all be obtained from the same principle. In the context of Kolmogorov complexity, the basic quantities K(x), K(x,y), K(x|y) and I(x : y) are analogous to H(X), H(X,Y), H(X|Y) and I(X,Y), respectively, and their respective formulae take on equivalent forms. Analogous to I(X,Y), I(x : y) is a similarity metric. With this, the two distance metrics readily follow from the general solution.
6.3. Sequence edit distance and similarity It is well known that if the cost for basic operations of insertion, deletion, and substitution is a distance metric, then the sequence edit distance d(s1 , s2 ), defined between two sequences s1 and s2 and derived from finding the minimum-cost sequence of operations that transform s1 to s2 , is also a distance metric. d(s ,s )
d(s ,s )
1 2 1 2 Several normalized edit distances have been proposed and studied [13,15]. Examples are |s |+| , , and s2 | max{|s1 |,|s2 |} 1 d(s1 ,s2 ) n(s1 , s2 ) = min{ |p| | p is a path that changes s1 to s2 }. Although these are referred to as normalized edit distance, they are not distance metric.
From the results of Section 5, choosing o to be the empty sequence, we obtain two normalized edit distance metrics. If the indel cost is 1, then the following is a normalized distance metric:
(1/2) × ((d(s1, s2) − min{|s1|, |s2|}) / max{|s1|, |s2|} + 1).
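This normalized edit distance metric can be implemented directly on top of the standard unit-cost edit distance; the sketch below (our addition) checks both the [0, 1] range and the triangle inequality exhaustively over all short binary strings, using the special definitions for the empty sequence from Section 5:

```python
def lev(a, b):   # unit-cost edit distance (insertion, deletion, substitution)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def nlev(a, b):  # (1/2) * ((d - min{|a|,|b|}) / max{|a|,|b|} + 1)
    if not a and not b:
        return 0.0               # d_bar(o, o) = 0
    if not a or not b:
        return 0.5               # d_bar(x, o) = |x| / (2|x|) = 1/2
    return 0.5 * ((lev(a, b) - min(len(a), len(b))) / max(len(a), len(b)) + 1)

import itertools
words = [''.join(w) for n in range(4) for w in itertools.product('ab', repeat=n)]
for x, y, z in itertools.product(words, repeat=3):
    assert 0.0 <= nlev(x, y) <= 1.0
    assert nlev(x, y) <= nlev(x, z) + nlev(y, z) + 1e-9
```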
For sequence similarity, one popular measure is protein sequence similarity based on BLOSUM matrices using the Smith–Waterman algorithm [20]. In fact, based on the original scores without rounding, any BLOSUM-N matrix with N ≥ 55 is a similarity metric. Therefore protein sequence similarity based on those BLOSUM matrices with the Smith–Waterman algorithm is a similarity metric.
For normalized sequence similarity, an example is s(s1, s2) / (|s1| + |s2| + k) where k > 0 [1]. This, however, is not a similarity metric, since condition 4 of the similarity metric is not satisfied.

7. Conclusions

We have given a formal definition of the similarity metric. We have shown the relationship between the similarity metric and the distance metric. We have given general formulae to normalize a similarity metric or a distance metric. We have shown, with examples, how the general solutions are useful in constructing metrics suitable for various contexts.

Acknowledgments

We thank an anonymous referee for offering helpful suggestions for improving the presentation of this article. Part of the work was done while Kaizhong Zhang was visiting the Institute for Mathematical Sciences, National University of Singapore in 2006. The visit was partially supported by the Institute.

References

[1] A.N. Arslan, Ö. Eğecioğlu, P.A. Pevzner, A new approach to sequence alignment: Normalized sequence alignment, Bioinformatics 17 (4) (2001) 327–337.
[2] H. Bunke, K. Shearer, A graph distance metric based on the maximal common subgraph, Pattern Recognition Letters 19 (1998) 255–259.
[3] C.S. Calude, K. Salomaa, S. Yu, Additive distances and quasi-distances between words, Journal of Universal Computer Science 8 (2) (2002) 141–152.
[4] S. Chen, B. Ma, K. Zhang, The normalized similarity metric and its applications, in: Proceedings of 2007 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2007, 2007, pp. 172–180.
[5] Y. Horibe, Entropy and correlation, IEEE Transactions on Systems, Man, and Cybernetics 15 (1985) 641–642.
[6] A.J. Knobbe, P.W. Adriaans, Analysing binary associations, in: E. Simoudis, J. Han, U. Fayyad (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996, pp. 311–314.
[7] T.O. Kvålseth, Entropy and correlation: Some comments, IEEE Transactions on Systems, Man, and Cybernetics 17 (1987) 517–519.
[8] M. Li, X. Chen, X. Li, B. Ma, P.M.B. Vitányi, The similarity metric, IEEE Transactions on Information Theory 50 (12) (2004) 3250–3264.
[9] E.H. Linfoot, An informational measure of correlation, Information and Control 1 (1) (1957) 85–89.
[10] R. López de Mántaras, ID3 revisited: A distance-based criterion for attribute selection, in: Z. Ras (Ed.), Proceedings of the Fourth International Symposium on Methodologies for Intelligent Systems, 1989, pp. 342–350.
[11] B. Ma, K. Zhang, The similarity metric and the distance metric, in: Proceedings of the 6th Atlantic Symposium on Computational Biology and Genome Informatics, 2005, pp. 1239–1242.
[12] F.M. Malvestuto, Statistical treatment of the information content of a database, Information Systems 11 (1986) 211–223.
[13] A. Marzal, E. Vidal, Computation of normalized edit distance and applications, IEEE Transactions on Pattern Analysis and Machine Intelligence 15 (9) (1993) 926–932.
[14] S.B. Needleman, C.D. Wunsch, A general method applicable to the search for similarities in the amino-acid sequences of two proteins, Journal of Molecular Biology 48 (1970) 443–453.
[15] B.J. Oommen, K. Zhang, The normalized string editing problem revisited, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (6) (1996) 669–672.
[16] J.R. Quinlan, Induction of decision trees, Machine Learning 1 (1) (1986) 81–106.
[17] C. Rajski, A metric space of discrete probability distributions, Information and Control 4 (4) (1961) 371–377.
[18] S.C. Sahinalp, M. Tasan, J. Macker, Z.M. Ozsoyoglu, Distance based indexing for string proximity search, in: Proceedings of the 19th International Conference on Data Engineering, 2003, pp. 125–136.
[19] N. Saitou, M. Nei, The neighbor-joining method: A new method for reconstructing phylogenetic trees, Molecular Biology and Evolution 4 (1987) 406–425.
[20] T.F. Smith, M.S. Waterman, Comparison of biosequences, Advances in Applied Mathematics 2 (1981) 482–489.
[21] R.R. Sokal, C.D. Michener, A statistical method for evaluating systematic relationships, University of Kansas Scientific Bulletin 28 (1958) 1409–1438.
[22] A. Stojmirovic, V. Pestov, Indexing schemes for similarity search in datasets of short protein fragments, ArXiv Computer Science e-prints (cs/0309005), September 2003.
[23] J.A. Studier, K.J. Keppler, A note on the neighbor-joining algorithm of Saitou and Nei, Molecular Biology and Evolution 5 (1988) 729–731.
[24] A. Torsello, D. Hidović-Rowe, M. Pelillo, Polynomial-time metrics for attributed trees, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (7) (2005) 1087–1099.
[25] S.J. Wan, S.K.M. Wong, A measure for concept dissimilarity and its application in machine learning, in: Proceedings of the First International Conference on Computing and Information, 1989, pp. 267–273.
[26] M.S. Waterman, T.F. Smith, Some biological sequence metrics, Advances in Mathematics 20 (1976) 367–387.
[27] Y.Y. Yao, S.K.M. Wong, C.J. Butz, On information-theoretic measures of attribute importance, in: N. Zhong (Ed.), Proceedings of the Third Pacific-Asia Conference on Knowledge Discovery and Data Mining, 1999, pp. 133–137.
[28] K. Zhang, D. Shasha, Simple fast algorithms for the editing distance between trees and related problems, SIAM Journal on Computing 18 (6) (1989) 1245–1262.
Theoretical Computer Science 410 (2009) 2377–2392
State complexity of power

Michael Domaratzki (a,∗), Alexander Okhotin (b,c)

a Department of Computer Science, University of Manitoba, Winnipeg, MB, Canada R3T 2N2
b Academy of Finland, Helsinki, Finland
c Department of Mathematics, University of Turku, Turku FIN-20014, Finland

Keywords: Descriptional complexity; Finite automata; State complexity; Combined operations; Concatenation; Power

Abstract. The number of states in a deterministic finite automaton (DFA) recognizing the language L^k, where L is a regular language recognized by an n-state DFA and k ≥ 2 is a constant, is shown to be at most n2^{(k−1)n} and at least (n−k)2^{(k−1)(n−k)} in the worst case, for every n > k and for every alphabet of at least six letters. Thus, the state complexity of L^k is Θ(n2^{(k−1)n}). In the case k = 3, the corresponding state complexity function for L^3 is determined as (6n−3)/8 · 4^n − (n−1)2^n − n, with the lower bound witnessed by automata over a four-letter alphabet. The nondeterministic state complexity of L^k is demonstrated to be nk. This bound is shown to be tight over a two-letter alphabet. © 2009 Elsevier B.V. All rights reserved.
1. Introduction

State complexity, the measure given by the minimal number of states in any DFA accepting a given regular language, is one of the most well-studied descriptional complexity measures for formal languages; the topic has been an active research area for over ten years. Many results on the state complexity of various operations on formal languages have been obtained. We note in particular that the state complexity of concatenation was obtained by Maslov [10] and further studied by Yu et al. [16] and Jirásek et al. [7], who determined the effect of the number of final states on the state complexity. The state complexity of concatenation over a unary alphabet was considered by Yu et al. [16] and subsequently by Pighizzini and Shallit [11], while Holzer and Kutrib [6] have studied the state complexity of concatenation with respect to nondeterministic finite automata (NFA).
Recently, A. Salomaa et al. [14] initiated the study of the state complexity of combinations of basic operations. More such operations were subsequently examined [4,5,8,9,15,16]. In each result, a certain combination of operations over independent arguments is examined to determine its exact state complexity; in many cases, the state complexity of the combined operation is less than the direct composition of the deterministic state complexities of the individual operations.
As noted by K. Salomaa and Yu [15], an interesting research topic is the state complexity of combined operations with ‘‘nonlinear variables’’, that is, combined operations in which one or more operands are used in several positions in the expression. Rampersad [12] gives results on nonlinear combined operations by studying the state complexity of powers of a language: L^k for k ≥ 2. In particular, Rampersad shows that if the state complexity of L is n, then L^2 has state complexity at most n2^n − 2^{n−1}, and this bound can be reached for any n ≥ 3 over an alphabet of size two.
Rampersad also addresses the problem of the state complexity of L^k with k ≥ 3 for unary languages L, but leaves the state complexity of L^k for k ≥ 3 and arbitrary alphabets open. In this paper, we consider this problem of the state complexity of L^k for L over an alphabet of size at least two. In particular, we show a general bound for L^k which holds for any k ≥ 2. A lower bound which is optimal up to a constant factor
∗ Corresponding author. E-mail addresses: [email protected] (M. Domaratzki), [email protected] (A. Okhotin).
0304-3975/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2009.02.025
M. Domaratzki, A. Okhotin / Theoretical Computer Science 410 (2009) 2377–2392
(with the constant depending on k) is given over a six-letter alphabet. For the state complexity of L^3, we show an improved upper bound and a matching lower bound over a four-letter alphabet. Finally, we address the problem of the nondeterministic state complexity of power. We show that if the nondeterministic state complexity of L is n, then the nondeterministic state complexity of L^k is nk for all k ≥ 2, and give a matching lower bound over a binary alphabet.

2. Definitions

For additional background in formal language and automata theory, see Rozenberg and A. Salomaa [13]. Let Σ be a finite set of symbols, called letters. The set Σ is called an alphabet. A string over Σ is any finite sequence of letters from Σ. The empty string, which contains no letters, is denoted ε. The set Σ* is the set of all strings over Σ. A language L is any subset of Σ*. If x = a1 a2 ··· an is a string, with ai ∈ Σ, then the length of x, denoted by |x|, is n. Given languages L1, L2 ⊆ Σ*, L1 L2 = {xy : x ∈ L1, y ∈ L2} is the concatenation of L1 and L2. The kth power of a language L is defined recursively as L^1 = L and L^k = L L^{k−1} for all k ≥ 2.
A deterministic finite automaton (DFA) is a quintuple A = (Q, Σ, δ, q0, F) where Q is a finite set of states, Σ is an alphabet, δ : Q × Σ → Q is the transition function, q0 ∈ Q is the distinguished start state and F ⊆ Q is the set of final states. We extend δ to a function acting on Q × Σ* in the usual way: δ(q, ε) = q for all q ∈ Q, and δ(q, wa) = δ(δ(q, w), a) for any q ∈ Q, w ∈ Σ* and a ∈ Σ. A DFA A = (Q, Σ, δ, q0, F) is said to be complete if δ is defined for all pairs (q, a) ∈ Q × Σ. In this paper, we assume that all DFAs are complete. A string w is accepted by A if δ(q0, w) ∈ F. The language L(A) is the set of all strings accepted by A: L(A) = {w ∈ Σ* : δ(q0, w) ∈ F}. A language L is regular if there exists a DFA A such that L(A) = L.
A nondeterministic finite automaton (NFA) is a quintuple A = (Q, Σ, δ, q0, F) where Q, Σ, q0 and F are as in the deterministic case, but the transition function is δ : Q × Σ → 2^Q. The extension of δ to Q × Σ* is accomplished by δ(q, ε) = {q} and δ(q, wa) = ∪_{q′ ∈ δ(q, w)} δ(q′, a). For an NFA A, L(A) = {w ∈ Σ* : δ(q0, w) ∩ F ≠ ∅}. It is known that NFAs accept exactly the regular languages.
The (deterministic) state complexity of a regular language L, denoted sc(L), is the minimum number of states in any DFA which accepts L. Similarly, the nondeterministic state complexity of L is the minimum number of states in any NFA which accepts L, and is denoted by nsc(L). Given a DFA A = (Q, Σ, δ, q0, F), a state q ∈ Q is said to be reachable if there exists a string w ∈ Σ* such that δ(q0, w) = q. Given two states q1, q2 ∈ Q, we say that they are equivalent if δ(q1, w) ∈ F if and only if δ(q2, w) ∈ F, for all w ∈ Σ*. If a pair of states is not equivalent, we say that they are inequivalent.

3. State complexity of L^k

In this section, we consider the state complexity of L^k, while treating the value k as a constant. We show an upper bound which is based on reachability of states, while an explicit lower bound with respect to an alphabet of size six is also given. The upper and lower bounds differ by a multiplicative factor of 2^{k(k−1)} · n/(n−k) = Θ(1).

3.1. Upper bound

Let L be a regular language with sc(L) = n, and assume that the minimal DFA for L has f final states. Note that the construction of Yu et al. [16, Thm. 2.3] for concatenation gives the following upper bound on L^k for an arbitrary k ≥ 2:
n2^{(k−1)n} − f(2^{nk} − 1)/(2(2^n − 1)) − f/2.
We now describe the construction of a DFA for L^k, which we use throughout what follows. Let A = (Q, Σ, δ, 0, F) be an arbitrary DFA. Assume without loss of generality that Q = {0, 1, ..., n−1}. For a subset P ⊆ Q and for w ∈ Σ*, we use the notation δ(P, w) = {δ(p, w) | p ∈ P}. The DFA for L(A)^k is defined as Ak = (Qk, Σ, δk, Sk, Fk) with the set of states Qk = Q × (2^Q)^{k−1}, of which the initial state is Sk = (0, ∅, ..., ∅) if 0 ∉ F and Sk = (0, {0}, ..., {0}) if 0 ∈ F, while the set of final states Fk consists of all states (i, P1, P2, ..., P{k−1}) ∈ Qk such that P{k−1} ∩ F ≠ ∅. The transition function δk : Qk × Σ → Qk is defined as δk((i, P1, P2, ..., P{k−1}), a) = (i′, P1′, P2′, ..., P{k−1}′), where:
(1) i′ = δ(i, a);
(2) if i′ ∈ F, then P1′ = {0} ∪ δ(P1, a); otherwise, P1′ = δ(P1, a);
(3) for all 1 ≤ j ≤ k−2, if Pj′ ∩ F ≠ ∅, then P{j+1}′ = {0} ∪ δ(P{j+1}, a); otherwise, P{j+1}′ = δ(P{j+1}, a).
According to this definition, it is easy to see that if δk(Sk, w) = (i, P1, ..., P{k−1}), then δ(0, w) = i, and further ℓ ∈ Pj if and only if there exists a factorization w = u0 u1 ··· u{j−1} v with u0, u1, ..., u{j−1} ∈ L(A) and with δ(0, v) = ℓ. It follows that L(Ak) = L(A)^k.
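A direct way to convince oneself of this construction is to implement it and compare against brute-force membership in L(A)^k. The following sketch (ours, not from the paper; the two-state example DFA is an arbitrary hypothetical choice) does this for L(A) = strings over {a, b} ending in a, with k = 2:

```python
from itertools import product

# Illustrative example DFA A: L(A) = strings over {a, b} ending in 'a'.
delta = {0: {'a': 1, 'b': 0}, 1: {'a': 1, 'b': 0}}
F = {1}
k = 2

def run_power(w):
    # simulate A_k per rules (1)-(3) above; 0 is not final, so S_k = (0, ∅, ..., ∅)
    state = [0] + [frozenset({0}) if 0 in F else frozenset()] * (k - 1)
    for a in w:
        i2 = delta[state[0]][a]
        new, prev_final = [i2], i2 in F
        for P in state[1:]:
            P2 = frozenset(delta[q][a] for q in P)
            if prev_final:
                P2 |= {0}
            prev_final = bool(P2 & F)
            new.append(P2)
        state = new
    return bool(state[-1] & F)      # accept iff P_{k-1} hits F

def accepts(w):                     # membership in L(A)
    q = 0
    for a in w:
        q = delta[q][a]
    return q in F

def in_power(w, j):                 # brute-force membership in L(A)^j
    if j == 1:
        return accepts(w)
    return any(accepts(w[:i]) and in_power(w[i:], j - 1)
               for i in range(len(w) + 1))

for n in range(7):
    for w in map(''.join, product('ab', repeat=n)):
        assert run_power(w) == in_power(w, k)
print("A_k agrees with brute force on all strings of length <= 6")
```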
Fig. 1. Representing states from Qk as diagrams.
Fig. 2. Transition table of A{k,n} and its action on (A{k,n})k. States with no arrow originating from them are unchanged by the letter.
The above construction of Ak will be used throughout this paper. States from Qk will be represented by diagrams as in Fig. 1. Each row represents one of the k components of Qk, with the jth row representing the jth component. Accordingly, the top row is an element of Q, and all other rows represent subsets of Q. A solid dot will represent that a particular state is an element of the component: the left-most column represents state 0, the next left-most state 1, etc. Since |Qk| = n2^{(k−1)n}, the following upper bound on the state complexity of the kth power can be inferred:
Lemma 1. Let k ≥ 2 and let L be a regular language with sc(L) = n. Then the state complexity of L^k is at most n2^{(k−1)n}.

3.2. Lower bound

In order to establish a close lower bound on the state complexity of the kth power, it is sufficient to present a sequence of automata A{k,n} (2 ≤ k < n) over the alphabet Σ = {a, b, c, d, e, f}, with every A{k,n} using n states, so that L(A{k,n})^k requires Ω(n2^{(k−1)n}) states. Let each A{k,n} have a set of states Q = {0, 1, ..., n−1}, of which 0 is the initial state, n−1 is the sole final state, and where the transitions are defined as follows:
δ(j, a) = j+1 if 1 ≤ j ≤ n−k−1;  1 if j = n−k;  j otherwise;
δ(j, b) = j+1 if n−k+1 ≤ j ≤ n−2;  n−k+1 if j = n−1;  j otherwise;
δ(j, c) = 1 if j = 0;  0 if j = 1;  j otherwise;
δ(j, d) = 1 if j = n−k+1;  j otherwise;
δ(j, e) = n−1 if j = 0;  j−1 if n−k+2 ≤ j ≤ n−1;  j otherwise;
δ(j, f) = n−1 if j = 1;  n−2 otherwise.
We now construct a DFA (A{k,n})k for the language L(A{k,n})^k as described in Section 3.1. Its set of states is Qk = Q × (2^Q)^{k−1}, and its initial state is (0, ∅, ..., ∅). Fig. 2 shows the effect of the letters from Σ on states from Qk. In particular, the letter a rotates the elements in the range {1, ..., n−k} forward, and leaves the remaining states unchanged. The letter b rotates the states in the range {n−k+1, ..., n−1} forward, also leaving the rest of the states unchanged. An occurrence of the letter c swaps the states 0 and 1, leaving all others unchanged, while d collapses the state n−k+1 onto state 1, leaving all other elements unchanged. The letter e maps the state 0 onto the state n−1, and shifts the states in the range {n−k+2, ..., n−1} back by one. Finally, the letter f collapses all states except 1 onto n−2, and maps 1 to n−1.
We recall that, according to the construction of (A{k,n})k, n−1 ∈ Pi implies 0 ∈ P{i+1}, for all 1 ≤ i < k−1. In the diagrams, this means that a state at the end of one row implies the existence of a state at the beginning of the next row (if such a row is present). We now show the reachability and inequivalence of a large subset of states, which will establish the lower bound. Lemmas 2 and 3 establish that the states are reachable, and Lemma 4 shows that all such states are inequivalent.
Lemma 2. Every state of the form (n−k+1, P1, ..., P{k−1}), where Pi \ {1, ..., n−k} = {0, n−k+i+1} for all 1 ≤ i < k−1 and P{k−1} \ {1, ..., n−k} = {0}, is reachable from the initial state.
There are 2^{(k−1)(n−k)} such states, and their general form is presented in the diagram in Fig. 3(b). In these diagrams, white areas without a dot indicate regions that are empty: no states are present in these regions. Grey areas in the diagram represent regions which may or may not be filled: any state in Pi in a grey region may or may not be present.
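Before proceeding to the proof, the transition table just defined can be checked mechanically. The following script (ours, not part of the paper; the letter d is not exercised here) simulates (A{k,n})k and verifies that reading e^{k−1} b^{k−1} from the initial state reaches the basis state (n−k+1, {0, n−k+2}, ..., {0, n−1}, {0}) claimed in the proof of Lemma 2:

```python
def make_delta(n, k):
    # transition function of A_{k,n} over the letters a..f, as defined above
    def d(j, x):
        if x == 'a': return j + 1 if 1 <= j <= n-k-1 else (1 if j == n-k else j)
        if x == 'b': return j + 1 if n-k+1 <= j <= n-2 else (n-k+1 if j == n-1 else j)
        if x == 'c': return 1 if j == 0 else (0 if j == 1 else j)
        if x == 'd': return 1 if j == n-k+1 else j
        if x == 'e': return n-1 if j == 0 else (j-1 if n-k+2 <= j <= n-1 else j)
        if x == 'f': return n-1 if j == 1 else n-2
    return d

def run_power(n, k, word):
    # simulate (A_{k,n})_k from its initial state, per rules (1)-(3) of Section 3.1
    d, F = make_delta(n, k), {n - 1}
    state = [0] + [frozenset()] * (k - 1)   # 0 is not final, so S_k = (0, ∅, ..., ∅)
    for x in word:
        i2 = d(state[0], x)
        new, prev_final = [i2], i2 in F
        for P in state[1:]:
            P2 = frozenset(d(q, x) for q in P)
            if prev_final:
                P2 |= {0}
            prev_final = bool(P2 & F)
            new.append(P2)
        state = new
    return tuple(state)

for n, k in ((7, 3), (8, 3)):
    reached = run_power(n, k, 'e' * (k - 1) + 'b' * (k - 1))
    expected = ((n - k + 1,)
                + tuple(frozenset({0, n - k + i + 1}) for i in range(1, k - 1))
                + (frozenset({0}),))
    assert reached == expected
print("basis states of Lemma 2 reached via e^{k-1} b^{k-1}")
```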
M. Domaratzki, A. Okhotin / Theoretical Computer Science 410 (2009) 2377–2392
Fig. 3. Outline of the reachability proof for L^k.
Fig. 4. Adding j to P_i using the string a^{n−k−j+1} b^{k−1−i} cc b^i a^{j−1}.
Proof. All states of this form will be reached by induction on the total number of elements in P_1, . . . , P_{k−1}.

Basis: Σ_{i=1}^{k−1} |P_i| = 2(k − 2) + 1, that is, P_i = {0, n − k + i + 1} for all i < k − 1 and P_{k−1} = {0}; see Fig. 3(a). Then the state is (n − k + 1, {0, n − k + 2}, {0, n − k + 3}, . . . , {0, n − 1}, {0}), and it is reachable from the initial state (0, ∅, . . . , ∅) by e^{k−1} b^{k−1}.

Induction step: Let (n − k + 1, P_1, P_2, . . . , P_{k−1}) be an arbitrary state with P_i \ {1, . . . , n − k} = {0, n − k + i + 1} for 1 ≤ i < k − 1 and with P_{k−1} \ {1, . . . , n − k} = {0}. Let 1 ≤ i ≤ k − 1 and 1 ≤ j ≤ n − k be any numbers with j ∉ P_i. The goal is to reach the state (n − k + 1, P_1, . . . , P_{i−1}, P_i ∪ {j}, P_{i+1}, . . . , P_{k−1}) from (n − k + 1, P_1, P_2, . . . , P_{k−1}), which is sufficient to establish the induction step. In order to add j to P_i, we apply the string a^{n−k−j+1} b^{k−1−i} cc b^i a^{j−1}, as shown in Fig. 4. The prefix a^{n−k−j+1} rotates the empty square to column 1, the next substring b^{k−1−i} rotates the element n − k + i ∈ P_{i−1} to column n − 1, then cc swaps columns 0 and 1 twice, effectively filling the empty square, and the suffix b^i a^{j−1} rotates the columns back to their original order. □

It remains to move the solid dot in the top row to any column among 1, . . . , n − k.

Lemma 3. Every state of the form (j_0, P_1, . . . , P_{k−1}), where 1 ≤ j_0 ≤ n − k, P_i \ {1, . . . , n − k} = {0, n − k + i + 1} for all 1 ≤ i < k − 1 and P_{k−1} \ {1, . . . , n − k} = {0}, is reachable from the initial state.

There are (n − k) 2^{(k−1)(n−k)} such states, illustrated in the diagram in Fig. 3(c), in which the arrow represents the range of j_0.

Proof. Let (j_0, P_1, . . . , P_{k−1}) be an arbitrary state satisfying the conditions of the lemma. We claim that there exists a reachable state (n − k + 1, P'_1, . . . , P'_{k−1}) such that, after reading d a^{j_0−1}, the automaton arrives at the state (j_0, P_1, . . . , P_{k−1}). This will establish the lemma.

Define P'_i as follows. Take P'_i \ {1, . . . , n − k} = {0, n − k + i + 1} for all 1 ≤ i < k − 1 and P'_{k−1} \ {1, . . . , n − k} = {0}. Next, let P'_i ∩ {1, . . . , n − k} = {ℓ − j_0 + 1 : ℓ ∈ P_i}, where subtraction and addition are taken modulo n − k in the range {1, . . . , n − k}. Then the state (n − k + 1, P'_1, . . . , P'_{k−1}) is reachable by Lemma 2, and the subsequent computation upon reading d a^{j_0−1} is presented in Fig. 5. By d, the automaton goes from this state to (1, P'_1, . . . , P'_{k−1}). Next, after the application of a^{j_0−1}, each P'_i is properly rotated (in the range {1, . . . , n − k}) to P_i, that is, the automaton proceeds to (j_0, P_1, . . . , P_{k−1}). □

Lemma 4. All states of the above form are pairwise inequivalent.

Proof. We first require the following three claims:
Fig. 5. Moving j to position j_0 using the string d a^{j_0−1}.
Claim 1. Let (j, P_1, . . . , P_{k−1}) be an arbitrary state and let 1 ≤ i ≤ k − 1. After reading the string (cf)^{k−i}, the automaton (A_{k,n})^k is in a final state if and only if 0 ∈ P_i.

Proof. The proof is by induction on i, starting with i = k − 1.

For i = k − 1, first suppose that 0 ∈ P_{k−1}. Then after reading c, the automaton is in a state (j', P'_1, . . . , P'_{k−1}) with 1 ∈ P'_{k−1}. After reading f, the automaton is in a state (j'', P''_1, . . . , P''_{k−1}) with n − 1 ∈ P''_{k−1}. This is a final state, as required. Now suppose that 0 ∉ P_{k−1}. After reading c, we are in a state (j', P'_1, . . . , P'_{k−1}) with 1 ∉ P'_{k−1}. After reading f, the automaton is in a state (j'', P''_1, . . . , P''_{k−1}) where n − 1 ∉ P''_{k−1}, as f maps all states but 1 to the state n − 2. Thus, as n − 1 ∉ P''_{k−1}, the state is not final.

Assume that the statement holds for all i with ℓ < i ≤ k − 1. We now establish it for i = ℓ < k − 1. Assume first that 0 ∉ P_ℓ. Again, after reading cf, we are in a state (j', P'_1, . . . , P'_{k−1}) where n − 1 ∉ P'_ℓ. Thus, cf does not add 0 to P'_{ℓ+1}. On the other hand, the application of cf ensures that P'_{ℓ+1} ⊆ {n − 2, n − 1}, since f maps all states into that pair, and 0 is not added to P'_{ℓ+1} after reading cf. Thus, 0 ∉ P'_{ℓ+1}. By induction, after reading (cf)^{k−ℓ−1} from (j', P'_1, . . . , P'_{k−1}), we are not in a final state. Now assume that 0 ∈ P_ℓ. After reading cf, we can verify that we are in a state (j', P'_1, . . . , P'_{k−1}) where n − 1 ∈ P'_ℓ, and thus 0 ∈ P'_{ℓ+1}. Now, by induction, after reading (cf)^{k−ℓ−1} we arrive at a final state. □

Claim 2. For all j (0 ≤ j ≤ n − 1), the string a^{n−k−j+1} f (cf)^{k−1} is accepted from (j_0, P_1, . . . , P_{k−1}) if and only if j = j_0.

Proof. To establish this claim, first note that if j = j_0, then a^{n−k−j+1} moves to a state (1, P'_1, . . . , P'_{k−1}), which is then mapped to (n − 1, P''_1, . . . , P''_{k−1}) by f. Thus, after reading a^{n−k−j+1} f, we have 0 ∈ P''_1.
By Claim 1, after reading (cf)^{k−1}, we arrive at a final state. On the other hand, if j ≠ j_0, then a^{n−k−j+1} maps j_0 to a state which is mapped to n − 2 after reading f. Thus, after reading a^{n−k−j+1} f, the DFA is in a state (n − 2, P'_1, P'_2, . . . , P'_{k−1}) with 0 ∉ P'_1: we have just read f, which maps all elements to either n − 1 or n − 2, and the first component is not n − 1, which would add 0 to P'_1. Again, using Claim 1, we can establish that upon reading (cf)^{k−1}, such a state (n − 2, P'_1, . . . , P'_{k−1}) proceeds to a non-final state. □

Claim 3. For all i, j with 1 ≤ i ≤ k − 1 and 0 ≤ j ≤ n − 1, the string a^{n−k−j+1} f (cf)^{k−1−i} is accepted from a state (j_0, P_1, . . . , P_{k−1}) if and only if j ∈ P_i.

Proof. If j ∈ P_i, then upon reading a^{n−k−j+1}, the automaton moves to a state (j', P'_1, . . . , P'_{k−1}) where 1 ∈ P'_i, which is subsequently mapped by f to a state (j'', P''_1, . . . , P''_{k−1}) where n − 1 ∈ P''_i. Thus, if i < k − 1, then 0 ∈ P''_{i+1}. By Claim 1, after reading (cf)^{k−i−1}, the state (j'', P''_1, . . . , P''_{k−1}) proceeds to a final state. Otherwise, if i = k − 1, then (cf)^{k−i−1} = ε, and since n − 1 ∈ P''_{k−1}, the state (j'', P''_1, . . . , P''_{k−1}) is final.

If j ∉ P_i, then after reading a^{n−k−j+1} the automaton moves to a state (j', P'_1, . . . , P'_{k−1}) with 1 ∉ P'_i, and the subsequent transition by f moves it to (j'', P''_1, . . . , P''_{k−1}) with n − 1 ∉ P''_i. If i = k − 1, the string ends here and it is not accepted, which settles the case. Otherwise, if i < k − 1, consider that we have just read an f, which maps every element to either n − 1 or n − 2; then, as n − 1 ∉ P''_i, we must have 0 ∉ P''_{i+1}. Again, by Claim 1, after reading (cf)^{k−i−1}, we arrive at a non-final state. □

With these three claims, we can easily establish that if (j_0, P_1, P_2, . . . , P_{k−1}) ≠ (j'_0, P'_1, P'_2, . . . , P'_{k−1}), then there exists a string w ∈ a* f (cf)* such that exactly one of the states leads to a final state on reading w. This proves Lemma 4. □

Theorem 1. For every n-state regular language L with n ≥ 1, the language L^k requires at most n 2^{(k−1)n} states. Furthermore, for every k ≥ 2, n ≥ k + 1 and alphabet Σ with |Σ| ≥ 6, there exists an n-state regular language L ⊆ Σ* such that L^k requires at least (n − k) 2^{(k−1)(n−k)} states.

Proof. By Lemmas 1, 3 and 4. □
Corollary 1. For every constant k ≥ 2, the state complexity of L^k is Θ(n 2^{(k−1)n}).

4. State complexity of L^3

The state complexity of L^2 is known precisely from Rampersad [12], who determined it as n 2^n − 2^{n−1} for n ≥ 3. For the next power, the cube, Corollary 1 asserts that the state complexity of L^3 is Θ(n 4^n), and Theorem 1 states in particular that it lies between (n − 3) 4^{n−3} and n 4^n for each n ≥ 4. We now obtain a precise expression for this function.
Fig. 6. Unreachable states in Lemma 5.
4.1. Upper bound

Let A = (Q, Σ, δ, 0, F) be an arbitrary DFA. Assume without loss of generality that Q = {0, 1, . . . , n − 1}. Recall from Section 3.1 the construction of A^k for k = 3. In particular, A^3 = (Q_3, Σ, δ_3, S_3, F_3) with the set of states Q_3 = Q × 2^Q × 2^Q, in which the initial state is S_3 = (0, ∅, ∅) if 0 ∉ F and S_3 = (0, {0}, {0}) if 0 ∈ F, while F_3 consists of all states (i, P, R) ∈ Q_3 with R ∩ F ≠ ∅. The transition function δ_3 : Q_3 × Σ → Q_3 is defined by δ_3((i, P, R), a) = (i', P', R'), where:

(1) i' = δ(i, a);
(2) if i' ∈ F, then P' = {0} ∪ δ(P, a), otherwise P' = δ(P, a);
(3) if P' ∩ F ≠ ∅, then R' = {0} ∪ δ(R, a), otherwise R' = δ(R, a).
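The three rules above translate directly into code. The following sketch uses our own helper names (`cube_step`, `cube_accepts`); the paper fixes the initial state to 0, which is generalized here to a parameter q0. As a check, the verdict of the cube automaton can be compared against a brute-force split of the input word into three parts:

```python
def cube_step(delta, F, q0, state, letter):
    """One transition of A^3; a state is (i, P, R) with P, R frozensets."""
    i, P, R = state
    i2 = delta[i][letter]                        # rule (1)
    P2 = frozenset(delta[p][letter] for p in P)
    if i2 in F:                                  # rule (2): restart a second copy of A
        P2 |= {q0}
    R2 = frozenset(delta[r][letter] for r in R)
    if P2 & F:                                   # rule (3): restart a third copy of A
        R2 |= {q0}
    return (i2, P2, R2)

def cube_accepts(delta, F, q0, w):
    """Run A^3 on w; accept iff the third component meets F."""
    init = frozenset({q0}) if q0 in F else frozenset()
    state = (q0, init, init)
    for letter in w:
        state = cube_step(delta, F, q0, state, letter)
    return bool(state[2] & F)
```

For any DFA A, `cube_accepts` then decides membership in L(A)^3, which can be confirmed by trying all two-cut splits of short words.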
We now give a description of unreachable states in A^3. We will again use diagrams as in the case of L^k in Section 3 to represent states; in this case, as we are considering the cube, the diagrams have three rows.

Lemma 5. The following states in Q_3 are unreachable:

(a) (i, P, R) such that i ∈ F and 0 ∉ P;
(b) (i, P, R) such that P ∩ F ≠ ∅ and 0 ∉ R;
(c) (i, ∅, R) where R ≠ ∅.

Additionally, when there is only one final state and this final state is not initial (assume without loss of generality that it is state n − 1), the following states are also unreachable:

(d) (i, {i}, Q) where 0 ≤ i < n − 1;
(e) (i, {i}, Q \ {i}) where 0 ≤ i < n − 1;
(f) (0, Q, {0}).

The six cases listed in this lemma are illustrated by the diagrams in Fig. 6.

Proof. Cases (a) and (b) follow immediately from the definition of δ_3: if a final state appears in a component, 0 must be added to the next component. Case (c) also follows from the definition of δ_3: elements of R can only be added when elements of P are already present, and once some states appear in P, they will never completely disappear, since the DFA is complete.

We now turn to case (d). Let i ≠ n − 1 and assume that δ_3((i', P', R'), a) = (i, {i}, Q) for some state (i', P', R') and some letter a. Since i is not a final state, the third component of (i, {i}, Q) must be obtained as δ(R', a) = Q, which may only happen if R' = Q. Then δ(Q, a) = Q, that is, a is a permutation, so every state has a unique inverse image, and we must have P' = {i'}. Thus, the preceding state (i', P', R') is (i', {i'}, Q), which is of the same form. Therefore, the states of the form (i, {i}, Q) are reachable only from states of the same form, and hence unreachable from the start state.

Case (e) is similar to case (d). Assume that δ_3((i', P', R'), a) = (i, {i}, Q \ {i}) for some state (i', P', R') and some letter a, where i ≠ n − 1.
Since i is not final, the third component of (i, {i}, Q \ {i}) is obtained as δ(R', a) = Q \ {i}. On the other hand, δ(i', a) = i, so in fact δ(Q, a) = Q and a is a permutation. Therefore, P' = {i'} and R' = Q \ {i'}, that is, the state (i, {i}, Q \ {i}) is again reachable only from a state (i', {i'}, Q \ {i'}) of the same form. This group of states is therefore also not reachable from the initial state.

Finally, for case (f), consider the state (0, Q, {0}). Let (j, P, R) ∈ Q_3 and a ∈ Σ be such that δ_3((j, P, R), a) = (0, Q, {0}). As 0 ∉ F, we have Q = δ(P, a), and thus it must be that P = Q. But now j is the unique state such that δ(j, a) = 0, and R = {j}. Thus, δ_3((j, Q, {j}), a) = (0, Q, {0}). If j ≠ 0, then the state (j, Q, {j}) is already unreachable by case (b). Thus, the only other possibly reachable state leading to (0, Q, {0}) is itself, and the state is unreachable. □

Note that Lemma 5 does not consider the case of the initial state being the unique final state. This case is in fact trivial in terms of state complexity, as will be discussed in the proof of Lemma 6 below.
Lemma 6. Let L be a regular language with sc(L) = n ≥ 3. Then the state complexity of L^3 is at most

((6n − 3)/8) · 4^n − (n − 1) 2^n − n.    (1)
This upper bound is reachable only if the minimal DFA A for L has a unique final state that is not initial, and only if all states in the corresponding automaton A^3 are reachable except those in Lemma 5.

Proof. Let A be a DFA with n states and f final states. We first note that if A has only one final state, we may assume without loss of generality that it is not the initial state. Indeed, if the lone final state is also the initial state, then L(A) = L(A)*. Thus L(A)^k = L(A)* for all k ≥ 1, and the state complexity is unaffected by taking powers (and the upper bound given by (1) obviously holds). Therefore, in what follows, in the cases where A has only one final state we assume that it is not the initial state.

Consider first the case of more than one final state. Then the conditions (a), (b) and (c) from Lemma 5 are applicable. The total number of states is n 4^n. We can also count the number of unreachable states:

(a) There are f 2^{2n−1} states of the form (i, P, R) such that i ∈ F and 0 ∉ P.
(b) If 0 ∉ F, there are n (2^f − 1) 2^{2n−f−1} states of the form (i, P, R) such that P ∩ F ≠ ∅ and 0 ∉ R. Of them, f (2^f − 1) 2^{2n−f−2} states also satisfy i ∈ F and 0 ∉ P, and hence have already been excluded by (a). In total, there are n (2^f − 1) 2^{2n−f−1} − f (2^f − 1) 2^{2n−f−2} new unreachable states. On the other hand, if 0 ∈ F, there are n (2^f − 1) 2^{2n−f−1} − f (2^{f−1} − 1) 2^{2n−f−1} states of the form (i, P, R) such that P ∩ F ≠ ∅ and 0 ∉ R not already excluded by (a).
(c) There are (n − f)(2^n − 1) states of the form (i, ∅, R) with R ≠ ∅ not already excluded by (a).

The refined total of reachable states in the case that 0 ∉ F is:

n 4^n − f 2^{2n−1} − (2^f − 1) 2^{2n−f−2} (2n − f) − (n − f)(2^n − 1).    (2)
In the case where 0 ∈ F, it is

n 4^n − f 2^{2n−1} − ((2n − f) 2^{f−1} − (n − f)) 2^{2n−f−1} − (n − f)(2^n − 1).    (3)
For one final state, cases (d), (e) and (f) of Lemma 5 yield an additional 2(n − 1) + 1 = 2n − 1 unreachable states. Thus, the total for one final state (which is not the initial state by assumption) is, using (2),

n 4^n − 2^{2n−1} − 2^{2n−3}(2n − 1) − (n − 1)(2^n − 1) − 2n + 1.    (4)
Simplifying the above, we get the expression

((6n − 3)/8) · 4^n − (n − 1) 2^n − n.

Now, consider the case of f ≥ 2: we can easily verify that f (2^{f−1} − 1) 2^{2n−f−1} < f (2^f − 1) 2^{2n−f−2}, and hence the expression in (3) is larger than (2). Thus, in order to show that (4) is the true upper bound, we must show that it is larger than (3). That is, we must show that the inequality

n 4^n − f 2^{2n−1} − ((2n − f) 2^{f−1} − (n − f)) 2^{2n−f−1} − (n − f)(2^n − 1) < ((6n − 3)/8) · 4^n − (n − 1) 2^n − n

holds for all n ≥ 3 and 2 ≤ f ≤ n − 1. Rewriting the left-hand side of the inequality, we get

n 4^n − f 2^{2n−1} − ((2n − f) 2^{f−1} − (n − f)) 2^{2n−f−1} − (n − f)(2^n − 1)
  = 4^n (n/2 − f/4 + (n − f)/2^{f+1} − (n − f)/2^n) + (n − f)
  ≤ 4^n (n/2 − 1/2 + (n − 2)/8) + (n − 2) = ((5n − 6)/8) · 4^n + (n − 2).

In the above inequality, we use the facts that n ≥ 3 and 2 ≤ f ≤ n − 1. Now, note that subtracting the final quantity ((5n − 6)/8) · 4^n + (n − 2) from the right-hand side of the original inequality gives

((n + 3)/8) · 4^n − (n − 1) 2^n − 2n + 2.

It is now easy to verify that this quantity is strictly above zero for all n ≥ 3. □
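The counts of unreachable states used for expressions (2) and (3) can be cross-checked by exhaustively enumerating Q_3 for a small n and an example choice of F. The function names in the sketch below are our own:

```python
from itertools import combinations

def subsets(universe):
    """All subsets of `universe`, as frozensets."""
    for r in range(len(universe) + 1):
        for c in combinations(universe, r):
            yield frozenset(c)

def not_excluded_count(n, F):
    """Number of states of Q_3 = Q x 2^Q x 2^Q that survive
    conditions (a), (b), (c) of Lemma 5."""
    Q = range(n)
    count = 0
    for i in Q:
        for P in subsets(Q):
            for R in subsets(Q):
                if i in F and 0 not in P:        # condition (a)
                    continue
                if P & F and 0 not in R:         # condition (b)
                    continue
                if not P and R:                  # condition (c)
                    continue
                count += 1
    return count

def formula_2(n, f):
    # expression (2): the case 0 not in F
    return (n * 4**n - f * 2**(2*n - 1)
            - (2**f - 1) * 2**(2*n - f - 2) * (2*n - f)
            - (n - f) * (2**n - 1))

def formula_3(n, f):
    # expression (3): the case 0 in F
    return (n * 4**n - f * 2**(2*n - 1)
            - ((2*n - f) * 2**(f - 1) - (n - f)) * 2**(2*n - f - 1)
            - (n - f) * (2**n - 1))
```

For instance, with n = 4 the enumeration agrees with (2) for F = {3} and F = {2, 3}, and with (3) for F = {0, 3}.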
Fig. 7. Witness automata A_n for the cube.
4.2. Lower bound

We now turn to showing that the upper bound in Lemma 6 is attainable over a four-letter alphabet. Consider a sequence of DFAs {A_n}_{n≥3} defined over the alphabet Σ = {a, b, c, d}, where each automaton A_n has the set of states Q = {0, . . . , n − 1}, of which 0 is the initial state and n − 1 is the only final state, while the transition function is defined as follows:

δ(i, a) = i + 1, if 0 ≤ i ≤ n − 3;  0, if i = n − 2;  n − 1, if i = n − 1;
δ(i, b) = 0, if i = 0;  i + 1, if 1 ≤ i ≤ n − 2;  1, if i = n − 1;
δ(i, c) = n − 1, if i = 0;  i, if 1 ≤ i ≤ n − 2;  0, if i = n − 1;
δ(i, d) = i, if 0 ≤ i ≤ n − 2;  0, if i = n − 1.

That is, a cyclically shifts the states {0, . . . , n − 2} and fixes n − 1, b cyclically shifts {1, . . . , n − 1} and fixes 0, c transposes 0 and n − 1, and d collapses n − 1 onto 0.
The form of these automata is illustrated in Fig. 7. Note that the transition tables for a, b and c are permutations of the set of states, and therefore, for every σ ∈ {a, b, c}, one can consider its inverse σ^{−1} : Q → Q; denote by σ^{−1}(j), for j ∈ Q, the unique state k such that δ(k, σ) = j. One can also consider sequences of negative powers: for any ℓ ≥ 0, denote by σ^{−ℓ}(j) the unique state k with δ(k, σ^ℓ) = j. This notation is naturally extended to sets of states: for any set P ⊆ {0, . . . , n − 1}, for any letter σ ∈ {a, b, c} and for any ℓ ≥ 0, we use the notation σ^{−ℓ}(P) to denote the uniquely defined set P' ⊆ {0, . . . , n − 1} such that δ(P', σ^ℓ) = P.

We use the construction for (A_n)^3 given in Section 3.1, and again use diagrams as in the case of L^k in Section 3 to represent states. We now establish three lemmas to show the reachability of all states in (A_n)^3: first those states whose third component is empty, then those of the form (i, P, R) where i ∉ P, and finally those with i ∈ P.

Lemma 7. Every state of the form (i, P, ∅), where

(I) i ∉ P;
(II) n − 1 ∉ P;
(III) if i = n − 1, then 0 ∈ P,

is reachable by a string from {a, b}*.

In other words, Lemma 7 claims that all states (i, P, R) with i ∉ P and R = ∅ that are not deemed unreachable by Lemma 5 are in fact reachable.

Proof. Induction on |P|.

Basis: P = ∅. A state (i, ∅, ∅) with 0 ≤ i < n − 1 is reachable via a^i from the start state (0, ∅, ∅).

Induction step. The proof is organized into several cases, some of which are split into subcases. Each case is illustrated in Fig. 8.

Case 1: i = n − 1. Consider a state S = (n − 1, P, ∅) with 0 ∈ P and n − 1 ∉ P.

Case 1(a): If 1 ∉ P, then S is reachable from (n − 2, b^{−1}(P \ {0}), ∅) by b, while the latter state is reachable according to the induction hypothesis, as |b^{−1}(P \ {0})| < |P|.

Case 1(b): If 1 ∈ P, then S is reachable by a from (n − 1, a^{−1}(P \ {0}), ∅), which is in turn reachable by the induction hypothesis.

Case 2: i = 1.
Consider any state S = (1, P, ∅) with 1, n − 1 ∉ P.

Case 2(a): If 0 ∈ P, then S is reachable from (n − 1, {0} ∪ b^{−1}(P \ {0}), ∅) by b, where the latter state was shown to be reachable in the previous case.

Case 2(b): If 0 ∉ P, consider the greatest number ℓ with ℓ ∈ P. The state (1, {0} ∪ (P \ {ℓ}), ∅) is reachable as in Case 2(a), and from this state the automaton goes to S by b^{n−1−ℓ} a^ℓ.

Case 3: i ≠ n − 1. Finally, any state S = (i, P, ∅) with 0 ≤ i ≤ n − 2 and n − 1 ∉ P is reachable from the state (1, a^{−(i−1)}(P), ∅) by a^{i−1}. States of the latter form have been shown to be reachable in Case 2.
Fig. 8. Reachability of states (i, P , ∅) in Lemma 7.
The above Lemma 7 will now be extended to reach all states (i, P, R) with i ∉ P that are not unreachable due to Lemma 5.

Lemma 8. Every state of the form (i, P, R), where

(I) i ∉ P;
(II) |P| ≥ 1;
(III) if i = n − 1, then 0 ∈ P;
(IV) if n − 1 ∈ P, then 0 ∈ R,

is reachable.

Proof. Induction on |R|. The basis, R = ∅, is given by Lemma 7. For the induction step, we have three major cases, each of which is broken into several subcases. These cases are illustrated in Fig. 9.

Case 1: n − 1 ∈ P.

Case 1(a): 1 ∉ P, i ≠ 1. Then the state (b^{−1}(i), b^{−1}(P), b^{−1}(R \ {0})) is reachable by the induction hypothesis, and from it the state (i, P, R) is reachable by b.

Case 1(b): 1 ∉ P, i = 1, 0 ∉ P. Then (1, P, R) is reachable from (1, c^{−1}(P), c^{−1}(R \ {0})) by c, where the latter state is reachable by the induction hypothesis.

Case 1(c): 1 ∉ P, i = 1, 0 ∈ P. Then the state (n − 1, b^{−1}(P), b^{−1}(R \ {0})) is reachable by the induction hypothesis, and from this state the automaton goes by b to (1, P, R).

Case 1(d): 1 ∈ P. Let j be the greatest number such that 1, . . . , j ∈ P. Then either i > j or i = 0, and in each case (i, P, R) is reachable from (b^{−j}(i), b^{−j}(P), b^{−j}(R)) by b^j. The latter state has n − 1 ∈ b^{−j}(P) and 1 ∉ b^{−j}(P), and hence it has been proved to be reachable in Cases 1(a)–1(c).

Case 2: n − 1 ∉ P, n − 1 ∈ R.

Case 2(a): 0 ∈ P. This state is reachable by c from (c^{−1}(i), c^{−1}(P), c^{−1}(R)), which has n − 1 ∈ c^{−1}(P) and is therefore reachable as in Case 1.

Case 2(b): 0 ∉ P. Let j be the least number in P. Then this state is reachable by a^j from (a^{−j}(i), a^{−j}(P), a^{−j}(R)), which is reachable as in Case 2(a).

Case 3: n − 1 ∉ P, n − 1 ∉ R.

Case 3(a): 0 ∈ P, 0 ∈ R. This case is further split into three subcases depending on the cardinalities of P and R:

(3(a1)) First assume |P| ≥ 2 and let j be the least element of P \ {0}. Then (i, P, R) is reached by b^j from (b^{−j}(i), b^{−j}(P), b^{−j}(R)), which is in turn reachable as in Case 1, since n − 1 ∈ b^{−j}(P).

(3(a2)) Similarly, if |R| ≥ 2, then setting j as the least element of R \ {0}, one can reach (i, P, R) by b^j from (b^{−j}(i), b^{−j}(P), b^{−j}(R)), which has n − 1 ∈ b^{−j}(R) and hence is reachable as in Case 1 or Case 2.
(3(a3)) The remaining possibility is |P| = |R| = 1, that is, P = {0} and R = {0}. Consider the state (1, {n − 1}, {0}), which was shown to be reachable in Case 1(b). From this state, the automaton goes to (1, {0}, {0}) by d and then to (i, {0}, {0}) by b^{i−1}.
Fig. 9. Reachability of (i, P, R) with i ∉ P: cases in the proof of Lemma 8.
Case 3(b): 0 ≤ i ≤ n − 2, P ∩ R ≠ ∅. Let j ∈ P ∩ R be the least such number. Then this state is reachable by a^j from (a^{−j}(i), a^{−j}(P), a^{−j}(R)), which is reachable as in Case 3(a).

Case 3(c): 0 ≤ i ≤ n − 2, P ∩ R = ∅. Since P, R ≠ ∅, there exists at least one pair (j, k) with j ∈ P and j + k (mod n − 1) ∈ R.
Fig. 10. Reachability of (i, P , R) with i ∈ P: cases in the proof of Lemma 9.
(V) if P = {i}, then R ≠ Q and R ≠ Q \ {i};
(VI) if i = 0 and P = Q, then R ≠ {0},

is reachable.

Note that the last two conditions of Lemma 9 exactly match the last three cases of Lemma 5.

Proof. The proof again involves examining several cases, though this time there is no induction. These cases are illustrated in Fig. 10. The first case is based upon Lemma 8; the other cases depend on the first case and on each other. All cases except the last one, Case 4, deal with i ≠ n − 1: Case 1 assumes n − 1 ∉ P and n − 1 ∉ R, Case 2 uses n − 1 ∈ P, and Case 3 handles the last possibility: n − 1 ∉ P and n − 1 ∈ R.

Case 1: i ≠ n − 1, n − 1 ∉ P and n − 1 ∉ R (that is, the column n − 1 in a diagram is empty). Any such state is reachable by d a^i from (n − 1, a^{−i}(P), a^{−i}(R)), which has 0 ∈ a^{−i}(P), n − 1 ∉ a^{−i}(P) and n − 1 ∉ a^{−i}(R), and is therefore reachable by Lemma 8.

Case 2: i ≠ n − 1 and n − 1 ∈ P (and therefore 0 ∈ R).

Case 2(a): 0 ∉ P, and therefore i ≠ 0. This state is reachable from (i, c^{−1}(P), c^{−1}(R \ {0})) by c, which is reachable as in Case 1.

Case 2(b): 0 ∈ P and i ≠ 0. Consider the state (0, b^{−i}(P) \ {n − 1}, c^{−1}(b^{−i}(R)) \ {n − 1}), which has empty column n − 1 and is therefore reachable as in Case 1. From this state, the automaton goes by c to (n − 1, b^{−i}(P), b^{−i}(R)), which has 0 ∈ b^{−i}(P) and 0 ∈ b^{−i}(R). Therefore, by b^i the automaton further proceeds to (i, P, R).

Case 2(c): 0 ∈ P and i = 0. This case will be proved at the end of the proof.

Case 3: i ≠ n − 1, n − 1 ∉ P and n − 1 ∈ R.

Case 3(a): |P| ≥ 2. Let j ∈ P \ {i} and consider the state (a^{−j}(i), c^{−1}(a^{−j}(P)), c^{−1}(a^{−j}(R))), which is reachable as in Case 2(a). From this state, the automaton goes to (a^{−j}(i), a^{−j}(P), a^{−j}(R)) by c and then to (i, P, R) by a^j.

Case 3(b): |P| = 1. Then this is a state of the form (i, {i}, R). By condition (V) in the statement of the lemma, R ≠ Q and R ≠ Q \ {i}. Therefore, there exists j ∉ R with j ≠ i.
If i < j, then (i, {i}, R) is reachable by b^{j−i} a^i from (0, {0}, b^{−(j−i)}(a^{−i}(R)));
the latter state has n − 1 ∉ b^{−(j−i)}(a^{−i}(R)), and so it is reachable as in Case 1. The same construction is applicable for any j ≠ i, if one starts from (0, {0}, b^{−(n−1−i+j)}(a^{−i}(R))) and uses b^{n−1−i+j} a^i.

Case 4: i = n − 1 (and therefore 0 ∈ P and 0 ∈ R). This state is reachable by c from (0, c^{−1}(P) \ {n − 1}, c^{−1}(R) \ {n − 1}), which is in turn reachable as in Case 1.

This completes the case study. It now remains to prove the last case, 2(c), in which i = 0, 0 ∈ P and n − 1 ∈ P (and therefore 0 ∈ R). It follows from condition (VI) in the statement of the lemma that there exists j > 0 with j ∉ P or j ∈ R: indeed, if there were no such j, then P = Q and R = {0}, which would contradict condition (VI). The proof splits into two subcases depending on j and its membership in P and in R:

(2(c1)) j ∈ P (and therefore j ∈ R by the definition of j). This state is reachable by c b^j from (n − 1, c^{−1}(b^{−j}(P)), c^{−1}(b^{−j}(R))), which is in turn reachable as in Case 4.

(2(c2)) j ∉ P. Consider the state (0, b^{−j}(P), b^{−j}(R) \ {0}), which is reachable as in Case 1 or 3(a). From this state, the automaton goes by b^{n−1} to the state (0, b^{−j}(P), b^{−j}(R)), because b^{−j}(P) contains the element n − 1 − j, which will eventually pass through position n − 1 and hence put 0 in R. Next, the automaton goes to (0, P, R) by b^j.

This remaining case concludes the proof. □

Thus, by the previous three lemmas, all the states which are not proven to be unreachable by Lemma 5 are, in fact, reachable. We now prove that distinct states are inequivalent.

Lemma 10. All states in Q_3 are pairwise inequivalent.

Proof. Let (i, P, R) ≠ (i', P', R'). To show the inequivalence of these states, it is sufficient to construct a string that is accepted from one of these states but not from the other.

If R ≠ R', then we can assume without loss of generality that there exists a state j ∈ R \ R'. If j ≥ 1, then the string b^{n−1−j} is accepted from (i, P, R) but not from (i', P', R').
If j = 0, then a b^{n−2} is accepted from (i, P, R) but not from (i', P', R').

If P ≠ P', then assume without loss of generality that there is a state j ∈ P \ P'. If j ≤ n − 2, then a^{n−2−j} d a c a b^{n−2} is accepted from (i, P, R) but not from (i', P', R'). If j = n − 1, then b^{n−2} d a c a b^{n−2} is accepted from (i, P, R) but not from (i', P', R').

Suppose i ≠ i'. If i ≤ n − 2, then a^{n−2−i} d a c a^{n−2} d a c a b^{n−2} is accepted from (i, P, R) but not from (i', P', R'). If i = n − 1, then b^{n−2} d a c a^{n−2} d a c a b^{n−2} is accepted from (i, P, R) but not from (i', P', R'). □

Theorem 2. The state complexity of L^3 is at most ((6n − 3)/8) · 4^n − (n − 1) 2^n − n for all n ≥ 3. This upper bound is reached on every alphabet of at least 4 letters.
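For small n, the bound of Theorem 2 can be confirmed by brute force: build (A_n)^k by the construction of Section 3.1, collect the reachable states, and count equivalence classes. The sketch below uses our own helper names (`witness`, `power_sc`) and assumes the transition tables exactly as written in Section 4.2; by Proposition 1 below, the same witness also attains Rampersad's bound n 2^n − 2^{n−1} for the square.

```python
def witness(n):
    """Transition tables of the witness DFA A_n over {a, b, c, d}
    (states 0..n-1, initial state 0, sole final state n-1)."""
    a = [i + 1 if i <= n - 3 else (0 if i == n - 2 else n - 1) for i in range(n)]
    b = [0 if i == 0 else (i + 1 if i <= n - 2 else 1) for i in range(n)]
    c = [n - 1 if i == 0 else (i if i <= n - 2 else 0) for i in range(n)]
    d = [i if i <= n - 2 else 0 for i in range(n)]
    return {'a': a, 'b': b, 'c': c, 'd': d}

def power_sc(n, k):
    """Minimal DFA size of L(A_n)^k via the construction of Section 3.1."""
    delta, F, q0 = witness(n), {n - 1}, 0

    def step(state, letter):
        t = delta[letter]
        first = t[state[0]]
        comps = [first]
        prev_final = first in F
        for P in state[1:]:
            P2 = frozenset(t[p] for p in P)
            if prev_final:
                P2 |= {q0}          # restart the next copy of A_n
            prev_final = bool(P2 & F)
            comps.append(P2)
        return tuple(comps)

    init = frozenset({q0}) if q0 in F else frozenset()
    start = (q0,) + (init,) * (k - 1)
    states, frontier = {start}, [start]
    while frontier:                 # collect all reachable states
        s = frontier.pop()
        for letter in 'abcd':
            t = step(s, letter)
            if t not in states:
                states.add(t)
                frontier.append(t)
    trans = {(s, l): step(s, l) for s in states for l in 'abcd'}
    # Moore-style partition refinement: count pairwise-inequivalent states
    cls = {s: bool(s[-1] & F) for s in states}
    nclasses = len(set(cls.values()))
    while True:
        sig = {s: (cls[s],) + tuple(cls[trans[s, l]] for l in 'abcd') for s in states}
        ids = {}
        for s in states:
            ids.setdefault(sig[s], len(ids))
        if len(ids) == nclasses:
            return nclasses
        cls = {s: ids[sig[s]] for s in states}
        nclasses = len(ids)
```

The computed sizes can then be compared with the L^2 and L^3 columns of Table 1 below.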
4.3. From cube to square

We now give an interesting result which states that any witness for the worst-case state complexity of L^3 is a witness for L^2 as well.

Proposition 1. Let L be a regular language with sc(L) = n ≥ 3 and sc(L^3) = ((6n − 3)/8) · 4^n − (n − 1) 2^n − n. Then sc(L^2) = n 2^n − 2^{n−1}.
Proof. As sc(L) ≥ 3, we note that L ≠ ∅. Let A = (Q, Σ, δ, 0, F) be a DFA for L and assume without loss of generality that Q = {0, . . . , n − 1}. Then A^2 = (Q_2, Σ, δ_2, S_2, F_2) is a DFA for L^2, where Q_2 = Q × 2^Q \ {(i, P) : i ∈ F, 0 ∉ P}, S_2 = (0, ∅) if 0 ∉ F and S_2 = (0, {0}) otherwise, F_2 = {(i, P) : P ∩ F ≠ ∅}, and δ_2 is defined by δ_2((i, P), a) = (i', P'), where:

(1) i' = δ(i, a);
(2) if i' ∈ F, then P' = {0} ∪ δ(P, a), otherwise P' = δ(P, a).

Assume that sc(L^2) < n 2^n − 2^{n−1}. Then, when we use the construction of Yu et al. [16], we obtain either a state which is unreachable, or a pair of equivalent states.

Consider reachability first. Let (i, P) ∈ Q_2 be arbitrary. Consider the state S ∈ Q_3 defined by S = (i, P, ∅) if P ∩ F = ∅, and S = (i, P, {0, i'}) for some arbitrary state i' ∈ Q \ {0} otherwise (note that since n ≥ 3, such a state i' exists). The construction for L^2 of Yu et al. excludes those states such that i ∈ F and 0 ∉ P, so we note that condition (a) of Lemma 5 does not hold for S. Further, by the definition of S, conditions (b)–(e) do not hold. Condition (f) does not hold either, since the third component of S has size zero or two by definition. Thus, S does not satisfy the conditions of Lemma 5, so it must be reachable. But then (i, P) must also be reachable in A^2 by the same input.

We now turn to equivalence. In what follows, for any (i_1, P_1), (i_2, P_2) ∈ Q_2, we denote by (i_1, P_1) ∼_2 (i_2, P_2) the fact that for all x ∈ Σ*, if δ_2((i_1, P_1), x) = (i'_1, P'_1) and δ_2((i_2, P_2), x) = (i'_2, P'_2), then P'_1 ∩ F ≠ ∅ if and only if P'_2 ∩ F ≠ ∅. That is, ∼_2 is the equivalence of states for A^2. We require the following claim:

Claim 4. Let i_1, i_2 ∈ Q and P_1, P_2 ⊆ Q with (i_1, P_1) ∼_2 (i_2, P_2), and let Y ⊆ Q be arbitrary. For all x ∈ Σ*, there exists R ⊆ Q such that

δ_3((i_1, P_1, Y), x) = (i'_1, P'_1, R) and δ_3((i_2, P_2, Y), x) = (i'_2, P'_2, R).
Proof. The proof is by induction on |x|. For |x| = 0, we have x = ε and

δ_3((i_1, P_1, Y), ε) = (i_1, P_1, Y) and δ_3((i_2, P_2, Y), ε) = (i_2, P_2, Y).

Assume that the result holds for all x ∈ Σ* with |x| < k. Let x ∈ Σ* be an arbitrary string of length k, and write x = x'a, where |x'| = k − 1 and a ∈ Σ. By the induction hypothesis,

δ_3((i_1, P_1, Y), x') = (i'_1, P'_1, R) and δ_3((i_2, P_2, Y), x') = (i'_2, P'_2, R)

for some R ⊆ Q. Let

δ_3((i'_1, P'_1, R), a) = (i''_1, P''_1, R_1) and δ_3((i'_2, P'_2, R), a) = (i''_2, P''_2, R_2)

for some i''_1, i''_2 ∈ Q and P''_1, P''_2, R_1, R_2 ⊆ Q. We have two cases:

(i) P''_1 ∩ F = ∅. By equivalence in A^2, the same is true of P''_2. Thus, by the definition of δ_3, we have R_1 = δ(R, a) and R_2 = δ(R, a) as well. Thus, R_1 = R_2.
(ii) P''_1 ∩ F ≠ ∅. In this case, R_1 = R_2 = δ(R, a) ∪ {0}.

Thus, the claim holds. □

We now show that all pairs of reachable states in Q_2 are inequivalent. Assume not. Then there exist distinct (i_1, P_1), (i_2, P_2) ∈ Q_2 such that (i_1, P_1) ∼_2 (i_2, P_2). There are three cases:

(i) P_1 ∩ F = ∅ (note that P_2 ∩ F = ∅ as well, by equivalence of states, in particular with x = ε). In this case, as we assume that sc(L^3) achieves the bound in Lemma 6, and as the states (i_1, P_1, ∅) and (i_2, P_2, ∅) are not unreachable by Lemma 5, we must have that both (i_1, P_1, ∅) and (i_2, P_2, ∅) are reachable. In particular, note that conditions (d) and (e) are not satisfied, since the final component is empty and n ≥ 3. Further, (i_1, P_1, ∅) and (i_2, P_2, ∅) are equivalent in A^3 by Claim 4: every state reachable from them on x has the same third component.
(ii) P_1 ∩ F ≠ ∅, but (i_1, P_1) ≠ (0, Q) and (i_2, P_2) ≠ (0, Q). In this case, the states (i_1, P_1, {0}) and (i_2, P_2, {0}) are reachable. Further, as in case (i), they are equivalent.
(iii) (i_1, P_1) = (0, Q) (a similar case handles (i_2, P_2) = (0, Q)). In this case, (i_1, P_1, {0, i}) and (i_2, P_2, {0, i}) are reachable states in A^3 for any choice of i ∉ F with i > 0. They are equivalent by the same argument used in case (i).

Thus, in all cases, we have constructed a pair of states in Q_3 which are reachable and equivalent. This is a contradiction, since all states in Q_3 are pairwise inequivalent by assumption. □
We note that the reverse implication in Proposition 1 does not hold: for example, the witness languages given by Rampersad for the worst-case complexity of L^2 are over a two-letter alphabet. But by the calculations in Section 6, we will see that no language over a two-letter alphabet may give the worst-case complexity for L^3 for small values of n.

5. Nondeterministic state complexity

We now turn to nondeterministic state complexity. Nondeterministic state complexity for basic operations has been examined by Holzer and Kutrib [6] and Ellul [3]. We give tight bounds on the nondeterministic state complexity of L^k for any k ≥ 2.

We adopt the fooling set method for proving lower bounds on nondeterministic state complexity, in the form of Birget [1, p. 188]. A fooling set for an NFA M = (Q, Σ, δ, q_0, F) is a set S ⊆ Σ* × Σ* such that (a) xy ∈ L(M) for all (x, y) ∈ S, and (b) for all (x_1, y_1), (x_2, y_2) ∈ S with (x_1, y_1) ≠ (x_2, y_2), either x_1 y_2 ∉ L(M) or x_2 y_1 ∉ L(M). If S is a fooling set for M, then nsc(L(M)) ≥ |S|.

Theorem 3. For all regular languages L with nsc(L) = n and all k ≥ 2, nsc(L^k) ≤ kn. Furthermore, for all n ≥ 2 and k ≥ 2, the bound is reached by a language over a binary alphabet.

Proof. The upper bound is given by the construction of Holzer and Kutrib [6] or Ellul [3] for concatenation, which states that if nsc(L_1) = n and nsc(L_2) = m, then nsc(L_1 L_2) ≤ n + m. For the lower bound, consider the language L_n = a^{n−1}(b a^{n−1})*, which is recognized by the n-state NFA given in Fig. 11(a). The language (L_n)^k = (a^{n−1}(b a^{n−1})*)^k is recognized by the NFA in Fig. 11(b). The following facts will be useful:

Claim 5. The only string in (L_n)^k ∩ a* is a^{k(n−1)}.

Claim 6. The following equality holds: (L_n)^k ∩ a* b a* = {a^{j(n−1)} b a^{(k−j+1)(n−1)} : 1 ≤ j ≤ k}. In particular, each string in the intersection has length (k + 1)(n − 1) + 1.
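Claims 5 and 6 can be confirmed mechanically for small n and k. The sketch below (the helper name `in_power` is ours) decides membership in (L_n)^k by iterating a one-block matcher over prefix split points:

```python
import re

def in_power(w, n, k):
    """Is w in (L_n)^k, where L_n = a^{n-1} (b a^{n-1})* ?"""
    block = re.compile(rf"a{{{n - 1}}}(?:ba{{{n - 1}}})*")
    # splittable holds the prefix lengths decomposable into m words of L_n
    splittable = {0}
    for _ in range(k):
        splittable = {end
                      for start in splittable
                      for end in range(start, len(w) + 1)
                      if block.fullmatch(w, start, end)}
    return len(w) in splittable
```

Enumerating all strings in a* and a*ba* up to a modest length then reproduces exactly the sets stated in the two claims.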
M. Domaratzki, A. Okhotin / Theoretical Computer Science 410 (2009) 2377–2392
Fig. 11. NFAs for Lₙ and for (Lₙ)^k.

Table 1
Worst case complexity of L^k.

n | L²  | L³    | L⁴     | L⁵      | L⁶     | L⁷   | L⁸
2 | 5   | 7     | 9      | 11      | 13     | 15   | 17
3 | 20  | 101   | 410    | 1331    | 3729   | 8833 | 18176
4 | 56  | 620   | 6738   | 65854   | 564566 |      |
5 | 144 | 3323  | 76736  | 1713946 |        |      |
6 | 352 | 16570 | 782092 |         |        |      |
7 | 832 | 79097 |        |         |        |      |
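The L² column of this table can be reproduced with a short computation. The sketch below is our own reconstruction of the method described in Section 6 (with our own names; it replaces the O(n)-assignments technique of [2] by plain enumeration of all 2^n final-state sets, which is feasible for n ≤ 3): build, for the n-state DFA whose n^n letters realize all functions Q → Q, a DFA for the square via the standard subset construction for concatenation, and minimize it by partition refinement.

```python
from itertools import product

def minimal_size(states, trans, accept, alphabet):
    # Moore-style partition refinement; returns the number of classes
    colour = {s: int(s in accept) for s in states}
    while True:
        sig = {s: (colour[s],) + tuple(colour[trans[s, a]] for a in alphabet)
               for s in states}
        number = {v: i for i, v in enumerate(sorted(set(sig.values())))}
        new = {s: number[sig[s]] for s in states}
        if len(set(new.values())) == len(set(colour.values())):
            return len(set(new.values()))
        colour = new

def worst_case_square(n):
    # DFA A with states 0..n-1 over an n^n-letter alphabet: one letter per
    # function Q -> Q.  Maximize sc(L(A)^2) over all final-state sets F.
    funcs = list(product(range(n), repeat=n))
    alphabet = range(len(funcs))
    best = 0
    for bits in product([0, 1], repeat=n):
        F = {q for q in range(n) if bits[q]}
        start = (0, frozenset([0] if 0 in F else []))
        states, stack, trans = {start}, [start], {}
        while stack:                 # subset construction for L(A) . L(A)
            i, P = stack.pop()
            for a in alphabet:
                f = funcs[a]
                j = f[i]
                P2 = frozenset(f[p] for p in P) | frozenset([0] if j in F else [])
                trans[(i, P), a] = (j, P2)
                if (j, P2) not in states:
                    states.add((j, P2))
                    stack.append((j, P2))
        accept = {s for s in states if s[1] & F}
        best = max(best, minimal_size(states, trans, accept, alphabet))
    return best
```

Running worst_case_square(3) should reproduce the entry 20 in the L² column above.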
Our fooling set is S_{n,k} = {(ε, a^{n−1} b a^{k(n−1)})} ∪ S_{n,k,1} ∪ S_{n,k,2}, where

S_{n,k,1} = {(a^{(n−1)j+i}, a^{n−i−1} b a^{(n−1)(k−j)}) : 1 ≤ i ≤ n − 1, 0 ≤ j ≤ k − 1},
S_{n,k,2} = {(a^{(n−1)j} b, a^{(k−j+1)(n−1)}) : 2 ≤ j ≤ k}.

The total size of the fooling set is nk, as S_{n,k,1} has size k(n − 1) and S_{n,k,2} has size k − 1. Further, by Claim 6, all of the elements (x, y) ∈ S_{n,k} satisfy xy ∈ (Lₙ)^k. It remains to show that for all (x₁, y₁), (x₂, y₂) ∈ S_{n,k} with (x₁, y₁) ≠ (x₂, y₂), either x₁y₂ ∉ (Lₙ)^k or x₂y₁ ∉ (Lₙ)^k. We say such pairs are inequivalent in what follows.
First note that none of a^{n−i−1} b a^{(n−1)(k−j)} with 1 ≤ i ≤ n − 1 and 0 ≤ j ≤ k − 1, nor any a^{(k−j+1)(n−1)}, is in (Lₙ)^k. Thus, the element (ε, a^{n−1} b a^{k(n−1)}) is inequivalent with all elements of S_{n,k,1} ∪ S_{n,k,2}.
Next, we consider two pairs from S_{n,k,1}. Take the pairs (a^{(n−1)j+i}, a^{n−i−1} b a^{(n−1)(k−j)}) and (a^{(n−1)j′+i′}, a^{n−i′−1} b a^{(n−1)(k−j′)}) for some i, i′, j, j′ with 1 ≤ i, i′ ≤ n − 1 and 0 ≤ j, j′ ≤ k − 1. Assume (i, j) ≠ (i′, j′). Consider the string a^{(n−1)j+i} a^{n−i′−1} b a^{(n−1)(k−j′)}. Its length is (n − 1)j + i + n − i′ + (n − 1)(k − j′) = (j − j′)(n − 1) + (i − i′) + (n − 1)(k + 1) + 1. Suppose j ≠ j′; then |(j − j′)(n − 1)| ≥ n − 1, and since |i − i′| < n − 1, we have (j − j′)(n − 1) + (i − i′) ≠ 0, that is, the length of the string is different from (n − 1)(k + 1) + 1. If j = j′ and i ≠ i′, then (j − j′)(n − 1) + (i − i′) = i − i′ ≠ 0, and again the string is not of length (n − 1)(k + 1) + 1. In each case the string is not in (Lₙ)^k by Claim 6.
Now consider two pairs from S_{n,k,2}. If we take (a^{(n−1)j} b, a^{(k−j+1)(n−1)}) and (a^{(n−1)j′} b, a^{(k−j′+1)(n−1)}) for some 2 ≤ j < j′ ≤ k, then we can consider the string w = a^{(n−1)j} b a^{(k−j′+1)(n−1)}. Note that this string has length (n − 1)(k − (j′ − j) + 1) + 1 < (n − 1)(k + 1) + 1. Therefore, w is not in (Lₙ)^k by Claim 6.
Finally, it remains to consider pairs from S_{n,k,1} × S_{n,k,2}. Consider p₁ = (a^{(n−1)j+i}, a^{n−i−1} b a^{(n−1)(k−j)}) and p₂ = (a^{(n−1)j′} b, a^{(k−j′+1)(n−1)}) for some 1 ≤ i ≤ n − 1, 0 ≤ j ≤ k − 1 and 2 ≤ j′ ≤ k. There are two cases:
(a) if i ≠ n − 1, then consider a^{(n−1)j+i} a^{(k−j′+1)(n−1)}, obtained by concatenating the first component of p₁ and the second component of p₂. As i ≠ n − 1, the length of this string is not divisible by n − 1, and thus it is certainly not in (Lₙ)^k ∩ a* by Claim 5.
(b) if i = n − 1, then consider a^{(n−1)j′} b a^{n−i−1} b a^{(n−1)(k−j)}, which is the first component of p₂ concatenated with the second component of p₁. Simplifying, we note that this string has an occurrence of bb, which is impossible as n ≥ 2.
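The whole case analysis can also be cross-checked mechanically for small n and k. The sketch below (ours; the names are not from the paper) builds S_{n,k} literally and checks Birget's two conditions against a regular-expression membership test for (Lₙ)^k.

```python
import re

def accept_Lnk(n, k):
    # membership test for (L_n)^k, L_n = a^{n-1}(b a^{n-1})*
    return re.compile("(a{%d}(ba{%d})*){%d}" % (n - 1, n - 1, k)).fullmatch

def fooling_set(n, k):
    # S_{n,k} = {(eps, a^{n-1} b a^{k(n-1)})} ∪ S_{n,k,1} ∪ S_{n,k,2}
    S = {("", "a" * (n - 1) + "b" + "a" * (k * (n - 1)))}
    S |= {("a" * ((n - 1) * j + i),
           "a" * (n - i - 1) + "b" + "a" * ((n - 1) * (k - j)))
          for i in range(1, n) for j in range(k)}                  # S_{n,k,1}
    S |= {("a" * ((n - 1) * j) + "b", "a" * ((k - j + 1) * (n - 1)))
          for j in range(2, k + 1)}                                # S_{n,k,2}
    return S

def is_fooling(S, accept):
    pairs = list(S)
    if not all(accept(x + y) for x, y in pairs):   # condition (a)
        return False
    for p in range(len(pairs)):
        for q in range(p + 1, len(pairs)):
            (x1, y1), (x2, y2) = pairs[p], pairs[q]
            if accept(x1 + y2) and accept(x2 + y1):
                return False                        # condition (b) violated
    return True
```

For instance, is_fooling(fooling_set(3, 2), accept_Lnk(3, 2)) confirms a fooling set of size nk = 6, matching the lower bound of Theorem 3.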
This completes the proof. □

6. Calculations

We present some numerical calculations of the worst case state complexity of L^k for k from 2 to 8 and for small values of n. In each case, this state complexity can be computed by considering automata over an n^n-letter alphabet, in which the transitions by different letters represent all possible functions from Q to Q. For the final states, we follow the computational technique described by Domaratzki et al. [2], which requires considering only O(n) different assignments of final states. The computed results are given in Table 1. For instance, the worst case complexity of L⁴ for all DFAs of size 6 (782092) is taken with respect to an alphabet of size 6⁶ = 46656. In particular, the column for L², starting from n = 3, is known from Rampersad [12], who obtained a closed-form expression n2^n − 2^{n−1}; note that for n = 2 the upper bound is five states, which is slightly less than the general bound.
The case of L³ is presented in more detail in Table 2, which demonstrates the worst case state complexity of L³ over alphabets of size 2, 3, 4 and of size n^n (where n is the number of states) for automata of size n between 1 and 5. The final
Table 2
Worst case state complexity of L³.

n | |Σ| = 2 | |Σ| = 3 | |Σ| = 4 | |Σ| = n^n | Upper bound
1 | 1       | 1       | 1       | 1         |
2 | 7       | 7       | 7       | 7         |
3 | 64      | 96      | 101     | 101       | 101
4 | 410     | 608     | 620     | 620       | 620
5 | 2277    |         |         | 3323      | 3323
6 |         |         |         | 16570     | 16570
7 |         |         |         | 79097     | 79097
column gives the upper bound from Theorem 2. Note that the table demonstrates that this upper bound cannot be reached for small values of n on alphabets of size three or fewer.
Let us mention how these calculations helped us in obtaining the theoretical results in this paper. One of our computations considered all minimal 4-state DFAs over a 4-letter alphabet, pairwise nonisomorphic with respect to permutations of states and letters. There are 364644290 such automata; for each of them, the minimal DFA for its cube was computed, which took in total less than 6 days of machine time. In total, 52 DFAs giving the top result (620 states) were found, and one of them was exactly the DFA A₄ defined in Section 4.2. We obtained the general form of the automata Aₙ that witness the state complexity of the cube by generalizing this single example.

7. Conclusions and open problems

We have continued the investigation of the state complexity of power, previously investigated by Rampersad [12]. We have given an upper bound for the state complexity of L³ over alphabets of size two or more, and shown that it is optimal for alphabets of size four by giving a matching lower bound. By calculation, the bound is not attainable for alphabets of size two or three, at least for small DFA sizes.
For the case of general L^k, we have established an asymptotically tight bound. In particular, we have shown that if L is a regular language with state complexity n and k ≥ 2, then the state complexity of L^k is Θ(n2^{(k−1)n}). The upper and lower bounds on the state complexity of L^k differ by a factor of 2^{k(k−1)} · n/(n−k); we leave it as a topic for future research to improve the bounds for k ≥ 4.
Very recently, Ésik et al. [4] have determined the state complexity of concatenations of three and four regular languages: L₁ · L₂ · L₃ and L₁ · L₂ · L₃ · L₄. Unlike the cases of L³ and L⁴ studied in this paper, the languages being concatenated in these expressions need not be the same.
Hence, the restrictions of Lemma 5(d)–(f) are not applicable in this case, and the set of reachable states has basically the same structure as in the case of concatenation of two languages. Accordingly, the worst case state complexity of concatenation of multiple languages is slightly higher than that in the case of powers of a single language.
We have also considered the nondeterministic state complexity of L^k for alphabets of size two or more, and have shown a tight bound of kn. We leave open the problem of the nondeterministic state complexity of L^k over a unary alphabet, as the nondeterministic state complexity of concatenation over a unary alphabet is not currently known exactly [6].

Acknowledgements

The first author's research was conducted at the Department of Mathematics, University of Turku, during a research visit supported by the Academy of Finland under grant 118540. The first author's research was supported in part by the Natural Sciences and Engineering Research Council of Canada. The second author's work was supported by the Academy of Finland under grant 118540.

References

[1] J.-C. Birget, Intersection and union of regular languages and state complexity, Information Processing Letters 43 (1992) 185–190.
[2] M. Domaratzki, D. Kisman, J. Shallit, On the number of distinct languages accepted by finite automata with n states, Journal of Automata, Languages and Combinatorics 7 (2002) 469–486.
[3] K. Ellul, Descriptional complexity measures of regular languages, Master's Thesis, University of Waterloo, Canada, 2002.
[4] Z. Ésik, Y. Gao, G. Liu, S. Yu, Estimation of state complexity of combined operations, in: C. Cămpeanu, G. Pighizzini (Eds.), 10th International Workshop on Descriptional Complexity of Formal Systems, DCFS 2008, Charlottetown, PEI, Canada, July 16–18, 2008, pp. 168–181.
[5] Y. Gao, K. Salomaa, S. Yu, The state complexity of two combined operations: Star of catenation and star of reversal, Fundamenta Informaticae 83 (2008) 75–89.
[6] M. Holzer, M. Kutrib, Nondeterministic descriptional complexity of regular languages, International Journal of Foundations of Computer Science 14 (2003) 1087–1102.
[7] J. Jirásek, G. Jirásková, A. Szabari, State complexity of concatenation and complementation, International Journal of Foundations of Computer Science 16 (3) (2005) 511–529.
[8] G. Jirásková, A. Okhotin, On the state complexity of star of union and star of intersection, Turku Centre for Computer Science Technical Report 825, Turku, Finland, August 2007.
[9] G. Liu, C. Martín-Vide, A. Salomaa, S. Yu, State complexity of basic operations combined with reversal, Information and Computation 206 (2008) 1178–1186.
[10] A.N. Maslov, Estimates of the number of states of finite automata, Soviet Mathematics Doklady 11 (1970) 1373–1375.
[11] G. Pighizzini, J. Shallit, Unary language operations, state complexity and Jacobsthal's function, International Journal of Foundations of Computer Science 13 (1) (2002) 145–159.
[12] N. Rampersad, The state complexity of L² and L^k, Information Processing Letters 98 (2006) 231–234.
[13] G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, Springer, 1997.
[14] A. Salomaa, K. Salomaa, S. Yu, State complexity of combined operations, Theoretical Computer Science 383 (2–3) (2007) 140–152.
[15] K. Salomaa, S. Yu, On the state complexity of combined operations and their estimation, International Journal of Foundations of Computer Science 18 (2007) 683–698.
[16] S. Yu, Q. Zhuang, K. Salomaa, The state complexity of some basic operations on regular languages, Theoretical Computer Science 125 (1994) 315–328.
Theoretical Computer Science 410 (2009) 2393–2400
Twin-roots of words and their properties

Lila Kari, Kalpana Mahalingam¹, Shinnosuke Seki∗

Department of Computer Science, The University of Western Ontario, London, Ontario, Canada, N6A 5B7
Keywords: f-symmetric words; Twin-roots; Morphic and antimorphic involutions; Primitive roots
Abstract. In this paper we generalize the notion of an ι-symmetric word, from an antimorphic involution, to an arbitrary involution ι, as follows: a nonempty word w is said to be ι-symmetric if w = αβ = ι(βα) for some words α, β. We propose the notion of the ι-twin-roots (x, y) of an ι-symmetric word w. We prove the existence and uniqueness of the ι-twin-roots of an ι-symmetric word, and show that the left factor α and right factor β of any factorization of w as w = αβ = ι(βα) can be expressed in terms of the ι-twin-roots of w. In addition, we show that for any involution ι, the catenation of the ι-twin-roots of w equals the primitive root of w. We also provide several characterizations of the ι-twin-roots of a word, for ι being a morphic or antimorphic involution.

Crown Copyright © 2009 Published by Elsevier B.V. All rights reserved.
1. Introduction Periodicity, primitivity, overlaps, and repetitions of factors play an important role in combinatorics of words, and have been the subject of extensive studies, [8,12]. Recently, a new interpretation of these notions has emerged, motivated by information encoding in DNA computing. DNA computing is based on the idea that data can be encoded as biomolecules, [1], e.g., DNA strands, and molecular biology tools can be used to transform this data to perform, e.g., arithmetic and logic operations. DNA (deoxyribonucleic acid) is a linear chain made up of four different types of nucleotides, each consisting of a base (Adenine, Cytosine, Guanine, or Thymine) and a sugar-phosphate unit. The sugar-phosphate units are linked together by covalent bonds to form the backbone of the DNA single strand. Since nucleotides may differ only by their bases, a DNA strand can be viewed as simply a word over the four-letter alphabet {A, C, G, T}. A DNA single strand has an orientation, with one end known as the 5’ end, and the other as the 3’ end, based on their chemical properties. By convention, a word over the DNA alphabet represents the corresponding DNA single strand in the 5’ to 3’ orientation, i.e., the word GGTTTTT stands for the DNA single strand 5’-GGTTTTT-3’. A crucial feature of DNA single strands is their Watson–Crick complementarity: A is complementary to T, G is complementary to C, and two complementary DNA single strands with opposite orientation will bind to each other by hydrogen bonds between their individual bases to form a stable DNA double strand with the backbones at the outside and the bound pairs of bases lying at the inside. Thus, in the context of DNA computing, a word u encodes the same information as its complement θ (u), where θ denotes the Watson–Crick complementarity function, or its mathematical formalization as an arbitrary antimorphic involution. 
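The Watson–Crick complementarity function θ described above is, mathematically, the composition of a letter-to-letter complement map with string reversal — an antimorphic involution. A minimal sketch (ours, for illustration only):

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def theta(u):
    # complement each base, then reverse: an antimorphic involution
    return "".join(COMPLEMENT[c] for c in reversed(u))
```

For example, theta("GGTTTTT") yields "AAAAACC", the strand that binds to 5'-GGTTTTT-3'; one can also check the involution property theta(theta(u)) == u and the antimorphism property theta(u + v) == theta(v) + theta(u).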
This special feature of DNA-encoded information led to new interpretations of the concepts of repetitions and periodicity in words, wherein u and θ(u) were considered to encode the same information. For example, [4] proposed the notion of θ-primitive words for an antimorphic involution θ: a nonempty word w is θ-primitive iff it cannot be written in the form w = u₁u₂ . . . uₙ where uᵢ ∈ {u, θ(u)}, n ≥ 2. Initial results concerning this special class of primitive words are promising and include, e.g., an extension [4] of the Fine and Wilf theorem [5].
∗ Corresponding author. Tel.: +1 519 661 2111; fax: +1 519 661 3515.
E-mail addresses: [email protected] (L. Kari), [email protected], [email protected] (K. Mahalingam), [email protected] (S. Seki).
1 Current address: Department of Mathematics, Indian Institute of Technology, Madras 600042, India.
0304-3975/$ – see front matter Crown Copyright © 2009 Published by Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2009.02.032
L. Kari et al. / Theoretical Computer Science 410 (2009) 2393–2400
To return to our motivation, the proof of the extended Fine-and-Wilf theorem [4], as well as that of an extension of the Lyndon–Schützenberger equation u^i = v^j w^k in [10] to cases involving both words and their Watson–Crick complements, pointed out the importance of investigating overlaps between the square u² of a word u and its complement θ(u), i.e., overlaps of the form u² = vθ(u)w for some words v, w. This is an analogue of the classical situation wherein u² overlaps with u, i.e., u² = vuw, which happens iff v = p^i and w = p^j for some i, j ≥ 1, where p is the primitive root of u. A natural question is thus whether there is any kind of 'root' which characterizes overlaps between u² and θ(u) in the same way in which the primitive root characterizes the overlaps between u² and u.
For an arbitrary involution ι, this paper proposes as a candidate the notion of the ι-twin-roots of a word. Unlike the primitive root, the ι-twin-roots are defined only for ι-symmetric words. A word u is ι-symmetric if u = αβ = ι(βα) for some words α, β, and the connection with the overlap problem is the following: If ι is an involution and u is an ι-symmetric word, then u² overlaps with ι(u), i.e., u² = αι(u)β. The implication becomes an equivalence if ι is a morphic or antimorphic involution.
In this paper, we prove that an ι-symmetric word u has unique ι-twin-roots (x, y) such that xy is the primitive root of u (i.e., u = (xy)^n for some n ≥ 1). In addition, if u = αβ = ι(βα), then α = (xy)^i x and β = y(xy)^{n−i−1} for some i ≥ 0 (Proposition 4). Moreover, we provide several characterizations of ι-twin-roots for the case when ι is morphic or antimorphic.
The paper is organized as follows. After basic notations, definitions and examples in Section 2, in Section 3 we investigate relationships between the primitive root and twin-roots of a word. Namely, we show that for an involution ι, the primitive root of an ι-symmetric word equals the catenation of its ι-twin-roots.
Furthermore, for a morphic or antimorphic involution δ, we provide several characteristics of the δ-twin-roots of words. In Section 4, we place the set of δ-symmetric words in the Chomsky hierarchy of languages. As an application of these results, in Section 5 we investigate the µ-commutativity between languages, XY = µ(Y)X, for a morphic involution µ.

2. Preliminaries

Let Σ be a finite alphabet. A word over Σ is a finite sequence of symbols in Σ. The empty word is denoted by λ. By Σ*, we denote the set of all words over Σ, and Σ⁺ = Σ* \ {λ}. For a word w ∈ Σ*, the sets of its prefixes, infixes, and suffixes are defined as follows: Pref(w) = {u ∈ Σ⁺ | ∃v ∈ Σ*, uv = w}, Inf(w) = {u ∈ Σ⁺ | ∃v, v′ ∈ Σ*, vuv′ = w}, and Suff(w) = {u ∈ Σ⁺ | ∃v ∈ Σ*, vu = w}. For other notions in formal language theory, we refer the reader to [11,12].
A word u ∈ Σ⁺ is said to be primitive if u = v^i implies i = 1. By Q we denote the set of all primitive words. For any nonempty word u ∈ Σ⁺, there is a unique primitive word p ∈ Q, called the primitive root of u, such that u = p^n for some n ≥ 1. The primitive root of u is denoted by √u.
An involution is a mapping f such that f² is the identity. A morphism (resp. antimorphism) f over an alphabet Σ is a mapping such that f(uv) = f(u)f(v) (resp. f(uv) = f(v)f(u)) for all words u, v ∈ Σ*. We denote by f, ι, µ, θ, and δ an arbitrary mapping, an involution, a morphic involution, an antimorphic involution, and a d-morphic involution (an involution that is either morphic or antimorphic), respectively. Note that an involution is not always length-preserving, but a d-morphic involution is.
A palindrome is a word which is equal to its mirror image. The concept of palindromes was generalized to θ-palindromes, [7,9], where θ is an arbitrary antimorphic involution: a word w is called a θ-palindrome if w = θ(w).
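The primitive root √u defined above can be computed with the classical doubling trick: the smallest positive index at which u occurs inside uu is the length of √u. The sketch below (ours) also verifies, by brute force, the classical overlap fact quoted in the introduction: u² = vuw exactly when v and w are powers (possibly empty) of the primitive root of u.

```python
from itertools import product

def primitive_root(u):
    # smallest p with u = p^m: first occurrence of u in uu after position 0
    return u[:(u + u).find(u, 1)]

def check_overlap_fact(max_len, alphabet="ab"):
    def is_power(s, p):
        return len(s) % len(p) == 0 and s == p * (len(s) // len(p))
    for L in range(1, max_len + 1):
        for tup in product(alphabet, repeat=L):
            u = "".join(tup)
            uu, p = u + u, primitive_root(u)
            for i in range(L + 1):            # |v| = i forces |w| = L - i
                v, w = uu[:i], uu[L + i:]
                overlaps = (uu[i:L + i] == u)  # does u^2 = v u w hold?
                assert overlaps == (is_power(v, p) and is_power(w, p))
    return True
```

For instance, primitive_root("ATTAATTA") returns "ATTA", and check_overlap_fact(5) confirms the overlap characterization for all binary words of length at most 5.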
This definition can be generalized as follows: For an arbitrary mapping f on Σ*, a word w ∈ Σ* is called an f-palindrome if w = f(w). We denote by P_f the set of all f-palindromes over Σ*. The name f-palindrome serves as a reminder of the fact that, in the particular case when f is the mirror-image function, i.e., the identity function on Σ extended to an antimorphism of Σ*, an f-palindrome is an ordinary palindrome. An additional reason for this choice of term was the fact that, in biology, the term ''palindrome'' is routinely used to describe DNA strings u with the property that θ(u) = u, where θ is the Watson–Crick complementarity function. In the case when f is an arbitrary function on Σ*, what we here call an f-palindrome is simply a fixed point of the function f.

Lemma 1. Let u ∈ Σ⁺ and δ be a d-morphic involution. Then u ∈ P_δ if and only if √u ∈ P_δ.

Proof. Let u = (√u)^n. Note that δ((√u)^n) = δ(√u)^n for a d-morphic involution δ. If u ∈ P_δ, then we have u = δ(u), that is, (√u)^n = δ(√u)^n. Since δ is length-preserving, √u = δ(√u). The opposite direction can be proved in a similar way. □

The θ-symmetric property of a word was introduced in [9] for antimorphic involutions θ. In [9], a word is said to be θ-symmetric if it can be written as a product of two θ-palindromes. We extend this notion to the f-symmetric property, where f is an arbitrary mapping. For a mapping f, a nonempty word w ∈ Σ⁺ is f-symmetric if w = αβ = f(βα) for some α ∈ Σ⁺ and β ∈ Σ*. Our definition is a generalization of the definition in [9]. Indeed, when f is an antimorphic involution, w = αβ = f(βα) = f(α)f(β) implies α, β ∈ P_f. For an f-symmetric word w, we call a pair (α, β) such that w = αβ = f(βα) an f-symmetric factorization of w. Given an f-symmetric factorization (α, β) of a word, α is called its left factor and β is called its right factor. We denote by S_f the set of all f-symmetric words over Σ*. We have the following observation on the inclusion relation between P_f and S_f.

Proposition 2. For a mapping f on Σ*, P_f ⊆ S_f.
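The definitions of P_f and S_f translate directly into a brute-force membership test, and Proposition 2 can then be checked empirically on short words (a sketch of ours; taking α = w and β = λ shows why every f-palindrome is f-symmetric):

```python
from itertools import product

def is_f_palindrome(w, f):
    return f(w) == w

def is_f_symmetric(w, f):
    # w = αβ = f(βα) for some α ∈ Σ+, β ∈ Σ*
    return w != "" and any(f(w[i:] + w[:i]) == w for i in range(1, len(w) + 1))

def check_P_subset_S(f, alphabet, max_len):
    # Proposition 2: every f-palindrome is f-symmetric
    for L in range(1, max_len + 1):
        for tup in product(alphabet, repeat=L):
            w = "".join(tup)
            if is_f_palindrome(w, f):
                assert is_f_symmetric(w, f)
    return True
```

The check can be run, for example, with f being the mirror-image antimorphism or a morphic complement map.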
3. Twin-roots and primitive roots

Given an involution ι, in this section we define the notion of the ι-twin-roots of an ι-symmetric word u. We prove that any ι-symmetric word u has unique ι-twin-roots. We show that the right and left factors of any ι-symmetric factorization of u as u = αβ = ι(βα) can all be expressed in terms of the twin-roots of u with respect to ι. Moreover, we show that the catenation of the twin-roots of an ι-symmetric word u with respect to ι equals the primitive root of u. We also provide several other properties of twin-roots, for the particular case of d-morphic involutions.
We begin by recalling a theorem from [6] on language equations of the type Xu = vX, whose corollary will be used for finding the ''twin-roots'' of an ι-symmetric word.

Corollary 3 ([6]). Let u, v, w ∈ Σ⁺. If uw = wv, then there uniquely exist two words x, y ∈ Σ* with xy ∈ Q such that u = (xy)^i, v = (yx)^i, and w = (xy)^j x for some i ≥ 1 and j ≥ 0.

Proposition 4. Let ι be an involution on Σ* and u be an ι-symmetric word. Then there uniquely exist two words x, y ∈ Σ* such that u = (xy)^i for some i ≥ 1 with xy ∈ Q, and if u = αβ = ι(βα) for some α, β ∈ Σ*, then there exists k ≥ 0 such that α = (xy)^{i−k−1} x and β = y(xy)^k.

Proof. Suppose that u is ι-symmetric and that (α, β) is an ι-symmetric factorization of u. It is easy to see that βu = ι(u)β holds. Then from Corollary 3, there exist two words x, y ∈ Σ* such that xy ∈ Q, u = (xy)^i, ι(u) = (yx)^i, and β = y(xy)^k for some k ≥ 0. Since u = αβ = (xy)^i, we have α = (xy)^{i−k−1} x.
Now we have to prove that such an (x, y) does not depend on the choice of (α, β). Suppose there were an ι-symmetric factorization (α′, β′) of u for which x′y′ ∈ Q, u = (x′y′)^i, ι(u) = (y′x′)^i, α′ = (x′y′)^{i−j−1} x′, and β′ = y′(x′y′)^j for some 0 ≤ j < i and x′, y′ ∈ Σ* such that (x, y) ≠ (x′, y′). Then we have xy = x′y′ and yx = y′x′, which contradicts the primitivity of xy. □
The preceding result shows that, if u is ι-symmetric, then its left factor and right factor can be written in terms of a unique pair (x, y). We call (x, y) the twin-roots of u with respect to ι, or shortly the ι-twin-roots of u. We denote the ι-twin-roots of u by ι√u. Note that x ≠ y, and we can assume that x cannot be empty whereas y can. Proposition 4 has the following two consequences.

Corollary 5. Let ι be an involution on Σ* and u be an ι-symmetric word. Then the number of ι-symmetric factorizations of u is n for some n ≥ 1 if and only if u = (√u)^n.

Corollary 6. Let ι be an involution on Σ* and u be an ι-symmetric word such that ι√u = (x, y). Then the primitive root of u is xy.
Corollary 6 is the first result that relates the notion of the primitive root of an ι-symmetric word to ι-twin-roots. For the particular case of a d-morphic involution δ, the primitive root and the δ-twin-roots are related more strongly. Firstly, we make a connection between the two elements of the δ-twin-roots.

Lemma 7. Let δ be a d-morphic involution on Σ*, and u be a δ-symmetric word with δ-twin-roots (x, y). Then xy = δ(yx).

Proof. Let u = (xy)^i = αβ = δ(βα) for some i ≥ 1 and α, β ∈ Σ*. Due to Proposition 4, α = (xy)^k x and β = y(xy)^{i−k−1} for some 0 ≤ k < i. Substituting these into (xy)^i = δ(βα) results in (xy)^i = δ((yx)^i). Since δ is either morphic or antimorphic, we have xy = δ(yx). □
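Proposition 4 and Corollary 6 together yield a concrete procedure for computing ι-twin-roots: collect all ι-symmetric factorizations of u, and read (x, y) off the primitive root, since β = y(xy)^k forces |y| = |β| mod |xy|. A sketch of ours (the names are not from the paper):

```python
def primitive_root(u):
    # shortest p with u = p^m (classical uu-doubling trick)
    return u[:(u + u).find(u, 1)]

def twin_roots(u, iota):
    # all ι-symmetric factorizations u = αβ = ι(βα) with α nonempty
    facts = [(u[:i], u[i:]) for i in range(1, len(u) + 1)
             if iota(u[i:] + u[:i]) == u]
    if not facts:
        return None                    # u is not ι-symmetric
    p = primitive_root(u)              # p = xy by Corollary 6
    roots = set()
    for alpha, beta in facts:
        ylen = len(beta) % len(p)      # β = y (xy)^k  =>  |y| = |β| mod |xy|
        roots.add((p[:len(p) - ylen], p[len(p) - ylen:]))
    assert len(roots) == 1             # uniqueness, as in Proposition 4
    return roots.pop()
```

With the DNA involutions, this reproduces Examples 11–13 below; e.g., for the morphic complement A↔T, twin_roots("ATTAATTA", ·) yields ("AT", "TA").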
Proposition 8. Let δ be a d-morphic involution on Σ*, and u, v be δ-symmetric words. Then √u = √v if and only if δ√u = δ√v.

Proof. (If) For δ√u = δ√v = (x, y), Corollary 6 implies √u = √v = xy. (Only if) Let δ√u = (x, y) and δ√v = (x′, y′). Corollary 6 implies √u = xy and √v = x′y′. Let p = √u = √v; then p = xy = x′y′. From Lemma 7, both (x, y) and (x′, y′) are δ-symmetric factorizations of p. If (x, y) ≠ (x′, y′), then, due to Corollary 5, p = (√p)^n for some n ≥ 2, a contradiction. □

Proposition 9. Let δ be a d-morphic involution on Σ*, and u be a δ-symmetric word such that δ√u = (x, y).
(1) If δ is antimorphic, then both x and y are δ-palindromes.
(2) If δ is morphic, then either (i) x is a δ-palindrome and y = λ, or (ii) x is not a δ-palindrome and y = δ(x).

Proof. Due to Lemma 7, we have xy = δ(yx). If δ is antimorphic, then this means that xy = δ(x)δ(y), and hence x = δ(x) and y = δ(y). If δ is morphic, then xy = δ(y)δ(x). If y = λ, then we have x = δ(x). Otherwise, we have three cases depending on the lengths of x and y. If they have the same length, then y = δ(x); the primitivity of xy forces x not to be a δ-palindrome. If |x| < |y|, then y = y₁y₂ for some y₁, y₂ ∈ Σ⁺ such that δ(y) = xy₁ and y₂ = δ(x). Then xy = xδ(x)δ(y₁) = δ(y₁)xδ(x), which is a contradiction with xy ∈ Q. The case when |y| < |x| can be proved by symmetry. □

Next we consider the δ-twin-roots of a δ-palindrome; indeed, δ-palindromes are δ-symmetric (Proposition 2), and hence have δ-twin-roots. The δ-twin-roots of δ-palindromes have the following property.

Lemma 10. Let δ be a d-morphic involution and u be a δ-symmetric word such that δ√u = (x, y) for some x ∈ Σ⁺ and y ∈ Σ*. Then u is a δ-palindrome if and only if x is a δ-palindrome and y = λ.
Proof. (If) Since y = λ, u = x^i for some i ≥ 1. Then δ(u) = δ(x^i) = δ(x)^i = x^i, and hence u ∈ P_δ.
(Only if) First we consider the case when δ is antimorphic. From Proposition 9, x, y ∈ P_δ. Suppose y ≠ λ. Since u ∈ P_δ, Lemma 1 implies √u ∈ P_δ, and hence xy = δ(xy) = δ(y)δ(x) = yx. This means that the nonempty words x and y commute, a contradiction with xy ∈ Q. Next we consider the case of δ being morphic. Since u is a δ-palindrome, any letter a occurring in u has the palindrome property, i.e., δ(a) = a. Then all prefixes of u satisfy the palindrome property, so that x = δ(x). Proposition 9 implies either y = λ or y = δ(x), but the latter, with √u = xy, leads to √u = x², a contradiction. □

Note that the notions of ι-symmetry and the ι-twin-roots of a word depend on the involution ι under consideration. Thus, for example, a word u may be ι₁-symmetric and not ι₂-symmetric, and its twin-roots might be different depending on the involution considered. The following two examples show that there exist words u and morphic involutions µ₁ and µ₂ such that the µ₁-twin-roots of u are different from the µ₂-twin-roots of u, and the same situation can be found in the antimorphic case.

Example 11. Let u = ATTAATTA, µ₁ be the identity on Σ extended to a morphism, and µ₂ be the morphic involution such that µ₂(A) = T and µ₂(T) = A. Then u is both µ₁-symmetric and µ₂-symmetric. Indeed, u = ATTA · ATTA = µ₁(ATTA)µ₁(ATTA), and u = AT · TAATTA = µ₂(TAATTA)µ₂(AT). The µ₁-symmetric property of u implies that µ₁√u = (ATTA, λ), and the µ₂-symmetric property of u implies µ₂√u = (AT, TA). We can easily check that √u = ATTA · λ = AT · TA.

Example 12. Let u = TAAATTTAAATT, mi be the identity on Σ extended to an antimorphism, namely the well-known mirror-image mapping, and θ be the antimorphic involution such that θ(A) = T and θ(T) = A. We can split u into two palindromes TAAAT and TTAAATT so that u is mi-symmetric.
By the same token, u is a product of the two θ-palindromes TAAATTTA and AATT, and hence θ-symmetric. We have that mi√u = (TAAAT, T) and θ√u = (TA, AATT). Note that √u = TAAAT · T = TA · AATT holds.
The last example shows that it is possible to find a word u, and morphic and antimorphic involutions µ and θ, such that the µ-twin-roots of u and the θ-twin-roots of u are distinct.

Example 13. Let u = AACGTTGC, and let µ and θ be morphic and antimorphic involutions, respectively, which map A to T, C to G, and vice versa. Then u = µ(TTGC)µ(AACG) = θ(AACGTT)θ(GC), so that u is both µ-symmetric and θ-symmetric. We have that µ√u = (AACG, TTGC) and θ√u = (AACGTT, GC). Moreover, √u = AACG · TTGC = AACGTT · GC.

4. The set of symmetric words in the Chomsky hierarchy

In this section we consider the classification, in the Chomsky hierarchy [2,11], of the language S_µ of the µ-symmetric words with respect to a morphic involution µ, and of the language S_θ of the θ-symmetric words with respect to an antimorphic involution θ. For a morphic involution µ, we show that P_µ, the set of all µ-palindromes, is regular (Proposition 14). Unless empty, the set S_µ \ P_µ of all µ-symmetric but non-µ-palindromic words is not context-free (Proposition 16) but is context-sensitive (Proposition 19). As a corollary of these results we show that, unless empty, the set S_µ of all µ-symmetric words is context-sensitive (Corollary 20), but not context-free (Corollary 17). In contrast, for an antimorphic involution θ, the set of all θ-symmetric words turns out to be context-free (Proposition 21).

Proposition 14. Let µ be a morphic involution on Σ*. Then P_µ is regular.

Proof. For Σ_p = {a ∈ Σ | a = µ(a)}, we have P_µ = Σ_p*, which is regular. □

Next we consider S_µ \ P_µ. If c = µ(c) holds for all letters c ∈ Σ, then Σ* = P_µ, that is, S_µ \ P_µ is empty. Therefore, we assume the existence of a character c ∈ Σ satisfying c ≠ µ(c). Under this assumption, we show that S_µ \ P_µ is not context-free but context-sensitive.

Lemma 15.
Let µ be a morphic involution on Σ*. If there is c ∈ Σ such that c ≠ µ(c), then S_µ \ P_µ is infinite.

Proof. This is clear from the fact that (cµ(c))^k ∈ S_µ \ P_µ for all k ≥ 1. □

Proposition 16. Let µ be a morphic involution on Σ*. If Σ contains a character c ∈ Σ satisfying c ≠ µ(c), then S_µ \ P_µ is not context-free.

Proof. Lemma 15 implies that S_µ \ P_µ is not finite. Suppose S_µ \ P_µ were context-free. Then there is an integer n given to us by the pumping lemma. Let us choose z = a^n µ(a)^n a^n µ(a)^n for some a ∈ Σ satisfying a ≠ µ(a). We may write z = uvwxy subject to the usual constraints: (1) |vwx| ≤ n, (2) vx ≠ λ, and (3) for all i ≥ 0, z_i = uv^i wx^i y ∈ S_µ \ P_µ.
Note that for any w ∈ S_µ \ P_µ and any a ∈ Σ satisfying a ≠ µ(a), the number of occurrences of a in w must be equal to that of µ(a) in w. Therefore, if vx contained different numbers of a's and µ(a)'s, z₀ = uwy would not be a member of S_µ \ P_µ.
Suppose vwx straddles the first block of a's and the first block of µ(a)'s of z, and vx consists of k a's and k µ(a)'s for some k > 0. Note that 2k < n because |vx| ≤ |vwx| ≤ n. Then z₀ = a^{n−k} µ(a)^{n−k} a^n µ(a)^n, and z₀ ∈ S_µ \ P_µ means that there exist γ ∉ P_µ and an integer m ≥ 1 such that z₀ = (γµ(γ))^m. Thus, µ(γ) ∈ Σ*µ(a), i.e., γ ∈ Σ*a. This implies that the last block of µ(a)'s of z₀ is a suffix of the last µ(γ) of z₀, and hence |γ| = |µ(γ)| ≥ n. As a result, a^{n−k} µ(a)^k ∈ Pref(γ), i.e., µ(a)^{n−k} a^k ∈ Pref(µ(γ)). Since a ≠ µ(a), we have µ(γ) = µ(a)^{n−k} a^k β µ(a)^n for some β ∈ Σ*.
This implies |µ(γ)| ≥ 2n. On the other hand, |z₀| = 4n − 2k, and hence |µ(γ)| ≤ 2n − k. We have thus reached a contradiction. Even if we supposed that vwx straddles the second block of a's and the second block of µ(a)'s of z, we would reach the same contradiction.
Finally, suppose that vwx were a substring of the first block of µ(a)'s and the second block of a's of z. Then z₀ = a^n µ(a)^{n−k} a^{n−k} µ(a)^n = (γµ(γ))^m for some m ≥ 1. As proved above, µ(a)^n ∈ Suff(µ(γ)), and this is equivalent to a^n ∈ Suff(γ). Since z₀ contains n consecutive a's only as the prefix a^n, we have γ = a^n, i.e., µ(γ) = µ(a)^n. However, the prefix a^n is followed by at most n − k occurrences of µ(a), and k ≥ 1. This is a contradiction. Consequently, S_µ \ P_µ is not context-free. □

The proof of Proposition 16 suggests that, for an alphabet Σ containing a character c satisfying c ≠ µ(c), S_µ is not context-free either.

Corollary 17. Let µ be a morphic involution on Σ*. If Σ contains a character c ∈ Σ satisfying c ≠ µ(c), then S_µ is not context-free.

Next we prove that S_µ \ P_µ is context-sensitive. We will construct a type-0 grammar and prove that the grammar is indeed context-sensitive. For this purpose, the workspace theorem is employed, which requires some terminology. Let G = (N, T, S, P) be a grammar, and consider a derivation D according to G of the form D : S = w₀ ⇒ w₁ ⇒ · · · ⇒ wₙ = w. The workspace of w by D is defined as WS_G(w, D) = max{|wᵢ| : 0 ≤ i ≤ n}. The workspace of w is defined as WS_G(w) = min{WS_G(w, D) | D is a derivation of w}.

Theorem 18 (Workspace Theorem [11]). Let G be a type-0 grammar. If there is a nonnegative integer k such that WS_G(w) ≤ k|w| for all nonempty words w ∈ L(G), then L(G) is context-sensitive.

Proposition 19. Let µ be a morphic involution on Σ*. If Σ contains a character c ∈ Σ satisfying c ≠ µ(c), then S_µ \ P_µ is context-sensitive.

Proof. We provide a type-0 grammar which generates the language S_µ \ P_µ.
Let G = (N, Σ, P, S), where N = {S, Ẑ, Z⃖, X̂i, X̂m, Y, L⃖, #} ∪ ⋃_{a∈Σ} {X⃗a, C⃗a} is the set of nonterminal symbols, and P is the set of production rules given below. First, this grammar creates αµ(α) for an α ∈ Σ∗ that contains a character c ∈ Σ satisfying c ≠ µ(c); rules 1–7 of the list P below achieve this task. Second, rules 5 and 10–18 copy αµ(α) an arbitrary number of times, so that the resulting word is of the form (αµ(α))^i.
1. S → # Ẑ a X̂i X⃗a Y #,  ∀a ∈ Σ,
2. S → # Ẑ b X̂m X⃗b Y #,  ∀b ∈ Σ such that b ≠ µ(b),
3. X⃗a c → c X⃗a,  ∀a, c ∈ Σ,
4. X⃗a Y → L⃖ µ(a) Y,  ∀a ∈ Σ,
5. c L⃖ → L⃖ c,  ∀c ∈ Σ,
6. X̂i L⃖ → a X̂i X⃗a,  ∀a ∈ Σ,
7. X̂i L⃖ → b X̂m X⃗b,  ∀b ∈ Σ such that b ≠ µ(b),
8. X̂m L⃖ → a X̂m X⃗a,  ∀a ∈ Σ,
9. X̂m L⃖ → L⃖,
10. Ẑ a L⃖ → a Ẑ C⃗a,  ∀a ∈ Σ,
11. C⃗a c → c C⃗a,  ∀a, c ∈ Σ,
12. C⃗a Y → Y C⃗a,  ∀a ∈ Σ,
13. C⃗a # → L⃖ a #,  ∀a ∈ Σ,
14. Y L⃖ → L⃖ Y,
15. Ẑ Y L⃖ → Z⃖ L⃖ Y,
16. Ẑ Y L⃖ → λ,
17. c Z⃖ → Z⃖ c,  ∀c ∈ Σ,
18. # Z⃖ → # Ẑ,
19. # → λ.
This grammar works in the following manner. After the 1st or 6th rule generates a terminal symbol a ∈ Σ, the 3rd and 4th rules deliver the information of this symbol to Y and generate µ(a) just before Y, and by the 5th rule the header L⃖ goes back to X̂i. This process is repeated until a character b ∈ Σ satisfying b ≠ µ(b) is generated, which is followed by changing X̂i to X̂m and generating µ(b) just before Y. Now the grammar may continue the a–µ(a) generating process or shift to a copy phase (9th rule X̂m L⃖ → L⃖). From then on, whenever the a–µ(a) process ends, the grammar can make this choice. Just after using the 9th rule X̂m L⃖ → L⃖, the sentential form of this derivation is Ẑ α L⃖ µ(α) Y for some α ∈ Σ+ which contains at least one character b ∈ Σ satisfying b ≠ µ(b). The 5th and 10–18th rules copy αµ(α) at the end of the sentential form. Just after copying αµ(α), the sentential form αµ(α) Ẑ Y L⃖ (αµ(α))^m appears, so that if the 15th rule is applied, then another
αµ(α) is copied; otherwise the derivation terminates. Therefore, a word w derived by this grammar G can be represented as (αµ(α))^n for some n ≥ 1, and hence w ∈ Sµ. In addition, G generates only non-µ-palindromic words, so that w ∈ Sµ \ Pµ. Thus, L(G) ⊆ Sµ \ Pµ. Conversely, if w ∈ Sµ \ Pµ, then it has the µ-twin-roots √µ(w) = (x, y) and w = (xy)^n for some n ≥ 1. Since y = µ(x), w can be generated by G. Therefore, Sµ \ Pµ ⊆ L(G). Consequently, L(G) = Sµ \ Pµ.
Furthermore, this grammar satisfies the hypothesis of the workspace theorem (Theorem 18): any sentential form in a derivation of a word w is no longer than |w| + c for some constant c ≥ 0. Therefore, L(G) is context-sensitive.
Corollary 20. Let µ be a morphic involution on Σ∗. If Σ contains a character c ∈ Σ satisfying c ≠ µ(c), then Sµ is context-sensitive.
Finally, we show that the set of all θ-symmetric words for an antimorphic involution θ is context-free.
Proposition 21. For an antimorphic involution θ, Sθ is context-free.
Proof. It is known that Pθ is context-free and that the family of context-free languages is closed under catenation. Since Sθ = Pθ · Pθ, Sθ is context-free.
5. On the pseudo-commutativity of languages
We conclude this paper with an application of the results obtained in Section 3 to the µ-commutativity of languages for a morphic involution µ. For two languages X, Y ⊆ Σ∗, X is said to µ-commute with Y if XY = µ(Y)X holds.
Example 22. Let Σ = {a, b} and let µ be the morphic involution such that µ(a) = b and µ(b) = a. For X = {ab(baab)^i | i ≥ 0} and Y = {(baab)^j | j ≥ 1}, XY = µ(Y)X holds.
In this section we investigate languages X which µ-commute with a set Y of µ-symmetric words. When analyzing such pseudo-commutativity equations, the first step is to investigate equations wherein the set of the shortest words in X µ-commutes with the set of the shortest words of Y. (In [3], the author used this strategy to find a solution to the classical commutativity of formal power series, a result known as Cohn's theorem.)
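Example 22 lends itself to a mechanical sanity check: on the truncations below, both XY and µ(Y)X reduce to {ab(baab)^k | 1 ≤ k ≤ 2N}, so the two finite sets coincide exactly. (An illustrative sketch of ours, not part of the paper; the helper name mu and the bound N are our choices.)

```python
def mu(w):
    # the morphic involution of Example 22: µ(a) = b, µ(b) = a
    return w.translate(str.maketrans("ab", "ba"))

N = 6  # truncation bound (our choice)
X = {"ab" + "baab" * i for i in range(N + 1)}   # ab(baab)^i, 0 <= i <= N
Y = {"baab" * j for j in range(1, N + 1)}       # (baab)^j,   1 <= j <= N

XY = {x + y for x in X for y in Y}
muY_X = {mu(y) + x for y in Y for x in X}
assert XY == muY_X  # X µ-commutes with Y on this truncation
```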
For n ≥ 0, by Xn we denote the set of all words in X of length n, i.e., Xn = {w ∈ X | |w| = n}. Let m and n be the lengths of the shortest words in X and Y, respectively. Then XY = µ(Y)X implies XmYn = µ(Yn)Xm. The main contribution of this section is to use results from Section 3 to prove that X cannot contain any word shorter than the shortest left factor of all µ-twin-roots of words in Yn (Proposition 28). Its proof requires several results, namely Lemmata 25–27.
Lemma 23 ([12]). Let u, v ∈ Σ+ and X ⊆ Σ∗. If X is not empty and Xu = vX holds, then |Xn| ≤ 1 for all n ∈ N0.
Lemma 24. Let u, v ∈ Σ+ and X ⊆ Σ∗. If X is not empty and uX = µ(X)v holds, then |Xn| ≤ 1 for all n ∈ N0.
Let X ⊆ Σ∗, Y ⊆ Sµ \ Pµ such that XY = µ(Y)X, and let n be the length of the shortest words in Y. For n ≥ 1, let Yn,ℓ = {y ∈ Yn | √µ(y) = (x, µ(x)), |x| = ℓ}. Informally speaking, Yn,ℓ is the set of words in Y of length n having µ-twin-roots whose left factor is of length ℓ.
Lemma 25. Let Y ⊆ Sµ \ Pµ, y1, y2 ∈ Yn,ℓ for some n, ℓ ≥ 1, and u, w ∈ Σ∗. If uy1 = µ(y2)w and |u|, |w| ≤ ℓ, then u = w.
Proof. Since |y1| = |y2| = n, we have |u| = |w|. Let y1 = (x1µ(x1))^{n/2ℓ} and y2 = (x2µ(x2))^{n/2ℓ}, where √µ(y1) = (x1, µ(x1)) and √µ(y2) = (x2, µ(x2)) for some x1, x2 ∈ Σ+. Now we have u(x1µ(x1))^{n/2ℓ} = µ(x2µ(x2))^{n/2ℓ}w. This equation, with |u| ≤ ℓ, implies that ux1µ(x1) = µ(x2µ(x2))w. Then we have µ(x2) = uα for some α ∈ Σ∗, and ux1µ(x1) = uαµ(u)µ(α)w. This means x1 = αµ(u) and µ(x1) = µ(α)w, which together imply u = w.
Lemma 26. Let X ⊆ Σ∗, and Y ⊆ Sµ \ Pµ such that XY = µ(Y)X. For integers m, n ≥ 1 such that XmYn = µ(Yn)Xm and m ≤ min{ℓ | Yn,ℓ ≠ ∅}, we have XmYn,ℓ = µ(Yn,ℓ)Xm for all ℓ ≥ 1.
Proof. Let y1 ∈ Yn be such that y1 = (x1µ(x1))^i for some i ≥ 1, where √µ(y1) = (x1, µ(x1)). Since XmYn = µ(Yn)Xm holds, there exist u, v ∈ Xm and y2 ∈ Yn satisfying uy1 = µ(y2)v. Writing y2 = (x2µ(x2))^j for some j ≥ 1, where √µ(y2) = (x2, µ(x2)), we will show that i = j.
Suppose i ≠ j. We only have to consider the case where i and j are relatively prime. The symmetry makes it possible to assume i < j, and we consider three cases: (1) i = 1 and j is even; (2) i = 1 and j is odd; and (3) i, j ≥ 2.
Firstly, we consider case (1), where we have ux1µ(x1) = (µ(x2)x2)^j v. Since |u| ≤ |x1|, |x2|, we can let ux1 = (µ(x2)x2)^{j/2}α and αµ(x1) = (µ(x2)x2)^{j/2}v for some α ∈ Σ∗. Note that |α| = |u| = |v| because |x1µ(x1)| = |(µ(x2)x2)^j|. Since |u| ≤ |x2|, let µ(x2) = uβ for some β ∈ Σ∗. Then the former of the preceding equations implies x1 = βx2(µ(x2)x2)^{j/2−1}α. Substituting these into the latter equation gives αµ(β)µ(x2)(x2µ(x2))^{j/2−1}µ(α) = uβx2(µ(x2)x2)^{j/2−1}v. This provides us with x2 = µ(x2), which contradicts x2 ∉ Pµ.
Case (2) is that i = 1 and j is odd. In a similar way as in the preceding case, let ux1 = (µ(x2)x2)^{(j−1)/2}µ(x2)α and αµ(x1) = x2(µ(x2)x2)^{(j−1)/2}v for some α ∈ Σ∗. Since |u| ≤ |x2|, the first equation implies that µ(x2) = uβ for some β ∈ Σ∗. Then substituting this into the second equation results in α = µ(u). By the same token, we have α = µ(v), and hence u = v. Therefore, ux1µ(x1) = (µ(x2)x2)^j u = uβµ(u)µ(β)(uβµ(u)µ(β))^{j−1}u = u(βµ(u)µ(β)u)^j. Thus, x1µ(x1) = (βµ(u)µ(β)u)^j, which contradicts the primitivity of x1µ(x1), because the assumption that j is odd and i < j implies j ≥ 3.
Fig. 1. It is not always the case that |α1| < |α2| < · · · < |αj|. However, we can say that for any k1, k2, if k1 ≠ k2, then |αk1| ≠ |αk2|.
What remains now is case (3), where i, j ≥ 2 are relatively prime. Since n = i · |x1µ(x1)| = j · |x2µ(x2)|, the relative primeness of i and j means that |x1µ(x1)| = jℓ and |x2µ(x2)| = iℓ for some ℓ ≥ 1. For all 1 ≤ k ≤ j, u(x1µ(x1))^{ik}αk = µ(x2µ(x2))^k for some 0 ≤ ik ≤ i and αk ∈ Pref(x1µ(x1)). We claim that for some ℓ′ satisfying 0 ≤ ℓ′ < ℓ, there exists a 1-to-1 correspondence between {|α1|, . . . , |αj|} and {0 + ℓ′, ℓ + ℓ′, 2ℓ + ℓ′, . . . , (j − 1)ℓ + ℓ′}. Indeed, u(x1µ(x1))^{ik}αk = µ(x2µ(x2))^k implies |u| + ik·jℓ + |αk| = k|x2µ(x2)|. Then, |αk| = k|x2µ(x2)| − ik·jℓ − |u| = (ki − ik·j)ℓ − |u|. Thus, |αk| ≡ −|u| (mod ℓ). We can easily check that if there exist 1 ≤ k1, k2 ≤ j satisfying k1·i − ik1·j ≡ k2·i − ik2·j (mod j), then k1 ≡ k2 (mod j), because i and j are relatively prime. As a result, ⋃_{k=1}^{j} {(ki − ik·j) mod j} = {0, 1, . . . , j − 1}. By letting ℓ′ = −|u| mod ℓ, the existence of the 1-to-1 correspondence has been proved.
Since ℓ′ < ℓ and iℓ = |x2µ(x2)|, let µ(x2µ(x2)) = βwα for some β, w, α ∈ Σ∗ such that |β| = ℓ − ℓ′, |w| = (i − 1)ℓ, and |α| = ℓ′. Then u(x1µ(x1))^{ik}αk = µ(x2µ(x2))^k implies that for all k, α ∈ Suff(αk). Recall that for all k, αk ∈ Pref(x1µ(x1)). Then, with the 1-to-1 correspondence, we can say that α appears in x1µ(x1) at even intervals. Let x1µ(x1) = αβ1αβ2 · · · αβj (see Fig. 1), where |β1| = · · · = |βj| = |β|. We get (x1µ(x1))^{ik+1−ik}αk+1 = αkµ(x2µ(x2)) = αkβwα for any 1 ≤ k ≤ j − 1 by substituting µ(x2µ(x2))^k = u(x1µ(x1))^{ik}αk into µ(x2µ(x2))^{k+1} = u(x1µ(x1))^{ik+1}αk+1. Note that ik+1 ≥ ik; otherwise, we would have (x1µ(x1))^{ik−ik+1}αkµ(x2µ(x2)) = αk+1, which contradicts the fact that |x1µ(x1)| ≥ |αk+1|. Since |αkβ| ≤ |x1µ(x1)|, we have αkβ ∈ Pref(x1µ(x1)). Even if ik+1 − ik = 0, αkβ ∈ Pref(αk+1) ⊆ Pref(x1µ(x1)). Thus, there exists an integer 1 ≤ j′ ≤ j such that β1 = · · · = βj′−1 = βj′+1 = · · · = βj = β, that is, x1µ(x1) = (αβ)^{j′−1}αβj′(αβ)^{j−j′}.
If j′ < j, then there exist k1, k2 such that αk1 = (αβ)^{j′−1}αβj′α and αk2 = α(βα)^k for some k ≥ 1. Clearly, |αk1|, |αk2| ≥ ℓ. By the original definitions of αk1 and αk2, they must share the suffix of length ℓ. Hence, βj′ = β. If j′ = j, then we claim that for all 1 ≤ k < j and any w ∈ Σ^{≤2ℓ}, αkw ∈ Pref(x1µ(x1)) implies w ∈ Pref(µ(x2µ(x2))). Indeed, as above we have (x1µ(x1))^{ik+1−ik}αk+1 = αkµ(x2µ(x2)). If ik+1 − ik ≥ 1, then this means that αkw ∈ Pref(αkµ(x2µ(x2))), and hence w ∈ Pref(µ(x2µ(x2))); otherwise, αk+1 = αkµ(x2µ(x2)). Since αk+1 ∈ Pref(x1µ(x1)) and |x2µ(x2)| ≥ 2ℓ, we have αkw ∈ Pref(αk+1), and hence w ∈ Pref(µ(x2µ(x2))). Let αk1 = (αβ)^{j−3}α and αk2 = (αβ)^{j−2}α. Then αk1βαβα ∈ Pref(x1µ(x1)) implies βαβα ∈ Pref(µ(x2µ(x2))). By the same token, αk2βαβj = x1µ(x1) implies βαβj ∈ Pref(µ(x2µ(x2))). Thus, βj = β. Consequently, x1µ(x1) = (αβ)^j. Since j ≥ 3, this contradicts the primitivity of x1µ(x1).
Lemma 27. Let X ⊆ Σ∗, and Y ⊆ Sµ \ Pµ such that XY = µ(Y)X. If there exist m, n ≥ 1 such that XmYn = µ(Yn)Xm and m ≤ min{ℓ | Yn,ℓ ≠ ∅}, then |Yn,ℓ| ≤ 1 holds for all ℓ ≥ 1.
Proof. Lemma 26 implies that XmYn,ℓ = µ(Yn,ℓ)Xm for all ℓ ≥ 1. Let us consider this equation for some ℓ such that Yn,ℓ ≠ ∅. Then for y1 ∈ Yn,ℓ, there must exist u, w ∈ Xm and y2 ∈ Yn,ℓ satisfying uy1 = µ(y2)w. Lemma 25 enables us to say u = w because m ≤ ℓ. Thus, XmYn,ℓ = µ(Yn,ℓ)Xm is equivalent to: for all u ∈ Xm, uYn,ℓ = µ(Yn,ℓ)u. For the latter equation, Lemma 24 and the assumption |Yn,ℓ| ≥ 1 make it possible to conclude |Yn,ℓ| = 1.
Having proved the required lemmata, we can now prove the main results.
Proposition 28. Let X ⊆ Σ∗, and Y ⊆ Sµ \ Pµ such that XY = µ(Y)X. Let n be the length of the shortest words in Y. Then X does not contain any nonempty word which is strictly shorter than the shortest left factor of the µ-twin-roots of the elements of Yn.
Proof.
Suppose that X contained such a nonempty word; then the shortest nonempty words of X are strictly shorter than any left factor of the µ-twin-roots of words in Yn. Let u be one of the shortest nonempty words in X, and let |u| = m for some m ≥ 1. Then XY = µ(Y)X implies XmYn = µ(Yn)Xm. Moreover, Lemma 26 implies that XmYn = µ(Yn)Xm if and only if XmYn,ℓ = µ(Yn,ℓ)Xm for all ℓ ≥ 1. Then, Lemma 27 implies |Yn,ℓ| ≤ 1 for all ℓ ≥ 1. Let us consider the minimum ℓ satisfying |Yn,ℓ| = 1. Such an ℓ certainly exists because Yn ≠ ∅. Let Yn,ℓ = {y}, where y = (xµ(x))^i for some i ≥ 1 and √µ(y) = (x, µ(x)). Then, uy = µ(y)u means u(xµ(x))^i = µ((xµ(x))^i)u. Moreover, the condition |u| < |x| results in uxµ(x) = µ(x)xu. Letting µ(x) = uα for some α ∈ Σ+, we have uxµ(x) = uαµ(u)µ(α)u, which means xµ(x) = α · µ(u)µ(α)u = µ(u)µ(α)u · α. Since α, u ∈ Σ+, this is a contradiction with the primitivity of xµ(x).
Corollary 29. Let X ⊆ Σ∗ and Y ⊆ Sµ \ Pµ such that XY = µ(Y)X, and let m, n be the lengths of the shortest words in X and in Y, respectively. If m = min{ℓ | Yn,ℓ ≠ ∅}, then both Xm and Yn are singletons.
Proof. It is obvious that XmYn = µ(Yn)Xm holds. Lemma 26 implies that XmYn,ℓ = µ(Yn,ℓ)Xm for all ℓ ≥ 1. Moreover, Lemma 27 implies that |Yn,ℓ| ≤ 1 for all ℓ. If there existed ℓ′ > m such that |Yn,ℓ′| = 1, then XmYn,ℓ′ = µ(Yn,ℓ′)Xm would have to hold. This contradicts Proposition 28, where Xm and Yn,ℓ′ correspond to X and Y in the proposition, respectively. Now we know that Yn is a singleton. Then Lemma 23 implies that Xm is a singleton.
Proposition 30. Let X ⊆ Σ∗ and Y ⊆ Sµ \ Pµ such that XY = µ(Y)X. Let m and n be the lengths of the shortest words in X and Y, respectively. If m = min{ℓ | Yn,ℓ ≠ ∅}, then a language which commutes with Y cannot contain any nonempty word which is strictly shorter than the primitive root of a word in Yn.
Proof. Corollary 29 implies that Yn is a singleton. Let Yn = {w}, and let w = (xµ(x))^i for some i ≥ 1, where √µ(w) = (x, µ(x)). Then from Corollary 6, we have √w = xµ(x). Let Z be a language which commutes with Y. Suppose that the shortest word in Z, say v, is strictly shorter than √w. Let |v| = ℓ′. Then Zℓ′Yn = YnZℓ′, i.e., Zℓ′w = wZℓ′. Lemma 23 results in |Zℓ′| = 1. Let Zℓ′ = {v}. Now we have vw = wv. This implies that v is a power of √w, which contradicts the fact that |v| < |√w| and v ≠ λ.
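The last step of this proof rests on the classical fact (Lyndon and Schützenberger [10]) that vw = wv forces v and w to be powers of a common primitive word. A brute-force sanity check of that fact over short binary words (our own sketch; the helper primitive_root is not from the paper):

```python
from itertools import product

def primitive_root(w):
    # the shortest word p such that w = p^k for some k >= 1
    for d in range(1, len(w) + 1):
        if len(w) % d == 0 and w == w[:d] * (len(w) // d):
            return w[:d]

# vw = wv forces v and w to be powers of a common primitive word
words = ["".join(p) for m in range(1, 6) for p in product("ab", repeat=m)]
for v in words:
    for w in words:
        if v + w == w + v:  # v and w commute
            assert primitive_root(v) == primitive_root(w)
```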
6. Conclusion
This paper generalizes the notion of f-symmetric words to an arbitrary mapping f. For an involution ι, we propose the notion of the ι-twin-roots of an ι-symmetric word, show their uniqueness, and show that the catenation of the ι-twin-roots of a word equals its primitive root. Moreover, for a morphic or antimorphic involution δ, we prove several additional properties of twin-roots. We use these results to make steps toward solving pseudo-commutativity equations on languages.
Acknowledgements
This research was supported by a Natural Sciences and Engineering Research Council of Canada Discovery Grant and a Canada Research Chair Award to L.K.
References
[1] L. Adleman, Molecular computation of solutions to combinatorial problems, Science 266 (1994) 1021–1024.
[2] N. Chomsky, M.P. Schützenberger, The algebraic theory of context-free languages, in: P. Braffort, D. Hirschberg (Eds.), Computer Programming and Formal Systems, North Holland, Amsterdam, 1963, pp. 118–161.
[3] P.M. Cohn, Factorization in noncommuting power series rings, Proceedings of the Cambridge Philosophical Society 58 (1962) 452–464.
[4] E. Czeizler, L. Kari, S. Seki, On a special class of primitive words, in: Proc. Mathematical Foundations of Computer Science (MFCS 2008), in: LNCS, vol. 5162, Springer, Torun, Poland, 2008, pp. 265–277.
[5] N.J. Fine, H.S. Wilf, Uniqueness theorems for periodic functions, Proceedings of the American Mathematical Society 16 (1965) 109–114.
[6] C.C. Huang, S.S. Yu, Solutions to the language equation LB = AL, Soochow Journal of Mathematics 29 (2) (2003) 201–213.
[7] L. Kari, K. Mahalingam, Watson–Crick conjugate and commutative words, in: M. Garzon, H. Yan (Eds.), DNA 13, in: LNCS, vol. 4848, 2008, pp. 273–283.
[8] M. Lothaire, Combinatorics on Words, Cambridge University Press, 1983.
[9] A. de Luca, A. De Luca, Pseudopalindrome closure operators in free monoids, Theoretical Computer Science 362 (2006) 282–300.
[10] R. Lyndon, M. Schützenberger, The equation a^M = b^N c^P in a free group, Michigan Mathematical Journal 9 (1962) 289–298.
[11] G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, Springer-Verlag, Berlin, Heidelberg, 1997.
[12] S.S. Yu, Languages and Codes, in: Lecture Notes, Department of Computer Science, National Chung-Hsing University, Taichung, Taiwan 402, 2005.
Theoretical Computer Science 410 (2009) 2401–2409
Contents lists available at ScienceDirect
Theoretical Computer Science journal homepage: www.elsevier.com/locate/tcs
Decimations of languages and state complexity
Dalia Krieger a, Avery Miller a,1, Narad Rampersad a,2, Bala Ravikumar b, Jeffrey Shallit a,∗
a School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
b Computer Science Department, 141 Darwin Hall, Sonoma State University, 1801 East Cotati Avenue, Rohnert Park, CA 94928, USA
article info
In Honor of Sheng Yu's 60th Birthday
Keywords:
Deterministic finite automaton
State complexity
Decimation
Context-free language
Slender language
abstract
Let the words of a language L be arranged in increasing radix order: L = {w0, w1, w2, . . .}. We consider transformations that extract terms from L in an arithmetic progression. For example, two such transformations are even(L) = {w0, w2, w4, . . .} and odd(L) = {w1, w3, w5, . . .}. Lecomte and Rigo observed that if L is regular, then so are even(L), odd(L), and analogous transformations of L. We find good upper and lower bounds on the state complexity of these transformations. We also give an example of a context-free language L such that even(L) is not context-free.
© 2009 Elsevier B.V. All rights reserved.
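For concreteness, the two transformations named in the abstract can be computed directly on a finite radix-ordered prefix of L = {0, 1}∗ (an illustrative sketch of ours, not part of the paper):

```python
from itertools import product

# enumerate L = {0,1}* in radix order: by length, then lexicographically
L = [""] + ["".join(w) for m in range(1, 5) for w in product("01", repeat=m)]

even_L = L[0::2]  # even(L) = {w0, w2, w4, ...}
odd_L = L[1::2]   # odd(L)  = {w1, w3, w5, ...}

assert even_L[:4] == ["", "1", "01", "11"]
assert odd_L[:4] == ["0", "00", "10", "000"]
```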
1. Introduction
Let k ≥ 1 and let Σ = {a0, a1, . . . , ak−1} be a finite alphabet. We put an ordering on the symbols of Σ by defining a0 < a1 < · · · < ak−1. This ordering can be extended to the radix order³ on Σ∗ by defining w < x if
• |w| < |x|, or
• |w| = |x|, where w = a0a1 · · · an−1, x = b0b1 · · · bn−1, and there exists an index r, 0 ≤ r < n, such that ai = bi for 0 ≤ i < r and ar < br.
(For words of the same length, the radix order coincides with the lexicographic order.) Thus, given a language L ⊆ Σ∗, we can consider the elements of L in radix order, say L = {w0, w1, w2, . . .}, where w0 < w1 < · · · . Let I ⊆ N be an index set. Given an infinite language L, we let its extraction by I, denoted L[I], be the elements of L in radix order corresponding to the indices of I, where index 0 denotes the first element of L. For example, if L = {0, 1}∗ = {ε, 0, 1, 00, 01, 10, 11, . . .} and I = {2, 3, 5, 7, 11, 13, . . .}, the prime numbers, then L[I] = {1, 00, 10, 000, 100, 110, . . .}.
In this paper we give a new proof of a result of Lecomte and Rigo [9], which characterizes those index sets that preserve regularity. Next, we determine upper and lower bounds on the state complexity of the transformation that maps a language
∗ Corresponding author.
E-mail addresses: [email protected] (D. Krieger), [email protected] (A. Miller), [email protected] (N. Rampersad), [email protected] (B. Ravikumar), [email protected] (J. Shallit).
1 Present address: Department of Computer Science, Sandford Fleming Building, University of Toronto, 10 King's College Road, Toronto, Ontario M5S 3G4, Canada.
2 Present address: Department of Mathematics, University of Winnipeg, Winnipeg, Manitoba R3B 2E9, Canada.
3 Sometimes erroneously called the lexicographic order in the literature.
0304-3975/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.tcs.2009.02.024
2402
D. Krieger et al. / Theoretical Computer Science 410 (2009) 2401–2409
to its "decimation" (extraction by an ultimately periodic index set). Finally, answering an open question of Ravikumar, we show that if a language is context-free, its decimation need not be context-free.
We note that our operation is not the same as the related one previously considered by Birget [4], Shallit [15] and Berstel and Boasson [2], which extracts the lexicographically least word of each length from a language. Nor is our operation the same as that introduced in Berstel, Boasson, Carton, Petazzoni, and Pin [3], which filters each word in a language by extracting the letters that occur in positions specified by an index set. (Our operation simply removes words from a language, but does not change the actual words themselves.)
2. Regularity-preserving index sets
Let I ⊆ N be an index set. We say that I is ultimately periodic if there exist integers r ≥ 0, m ≥ 1 such that for all i ≥ r we have i ∈ I ⇒ i + m ∈ I. For a language L, we define the (m, r)-decimation decm,r(L) to be L[I], where I = {im + r : i ≥ 0}. Two particular decimations of interest are even(L) = dec2,0(L) and odd(L) = dec2,1(L).
We now introduce some notation. Let us assume that our alphabet is Σ = {a0, a1, . . . , ak−1} with a0 < a1 < · · · < ak−1, and for a word w ∈ Σ∗, let F(w) be the set of words that are less than w in the radix order, that is, F(w) = {x ∈ Σ∗ : x < w}.
Lemma 1. We have F(waj) = {ε} ∪ F(w)Σ ∪ {w}{a0, . . . , aj−1}, and this union is disjoint.
Proof. Suppose x < waj. Then either |x| = 0, which corresponds to the term {ε}, or |x| ≥ 1. In the latter case, we can write x = ya for some symbol a ∈ Σ. Then either y < w, which corresponds to the term F(w)Σ, or y = w, in which case a ∈ {a0, . . . , aj−1}, which corresponds to the last term of the union.
We now show how to count the number of words accepted by a deterministic finite automaton (DFA) which are, in radix order, less than a given word.
Lemma 2. Let A = (Q, Σ, δ, q0, F) be a DFA with n states.
For any finite language L, define M(L) to be the matrix such that the entry in row i and column j is the number of words x ∈ L with δ(qi, x) = qj. For 0 ≤ l < k, define Ml to be the n × n matrix where the entry in row i and column j is 1 if δ(qi, al) = qj, and 0 otherwise. Then
M(F(waj)) = M({ε}) + M(F(w))(M0 + M1 + · · · + Mk−1) + M({w})(M0 + · · · + Mj−1).
Proof. By standard results in path algebra and Lemma 1.
We now state and prove a theorem that is essentially due to Lecomte and Rigo [9]. (Their proof is somewhat different, and does not explicitly provide the bound on state complexity that is the main focus of this article.)
Theorem 3. Let I ⊆ N be an index set. Then L[I] is regular for all regular languages L if and only if I is either finite or ultimately periodic.
Proof. Suppose L[I] is regular for all regular languages L. Then, in particular, L[I] is regular for L = a∗. But L[I] = {a^i : i ∈ I}. Then, by a well-known characterization of unary regular languages [11], I is either finite or ultimately periodic.
For the converse, assume that L is regular. If I is finite, then L[I] is trivially regular. Hence assume that I is ultimately periodic. We can then decompose I as a finite union of arithmetic progressions (mod m). Since the class of regular languages is closed under finite union and finite modification, it suffices to show that L[I] is regular for all I of the form {jm + r : j ≥ 0}, where m ≥ 1, 0 ≤ r < m.
Since L is regular, it is accepted by a deterministic finite automaton A = (Q, Σ, δ, q0, F), where, as usual, Q is a finite nonempty set of states, δ is the transition function, q0 is the start state, and F is the set of final states. We show how to construct a new DFA A′ that accepts L[I] where I = {jm + r : j ≥ 0}. Let Q = {q0, q1, . . . , qn−1}. The states of A′ are pairs of the form ⟨v, q⟩, where v is a vector with entries in Z/(m) and q is a state of Q. The intent is that if we reach the state ⟨v, q⟩ by a path labeled x, then the ith entry of v counts the number (modulo m) of words y < x that take A from state q0 to qi and, further, that δ(q0, x) = q.
More formally, let A′ = (Q′, Σ, δ′, q′0, F′), where the components are defined as follows. For 0 ≤ l < k, define Ml to be the n × n matrix where the entry in row i and column j is 1 if δ(qi, al) = qj, and 0 otherwise. Let ej be the vector with a 1 in position j and 0's elsewhere, and let M = M0 + M1 + · · · + Mk−1. Then Q′ = {⟨v, q⟩ : v ∈ (Z/(m))^n, q ∈ Q}, q′0 = ⟨[0, 0, . . . , 0], q0⟩,
F′ = {⟨v, q⟩ : ∑_{i : qi ∈ F} v[i] ≡ r (mod m) and q ∈ F},
and
δ′(⟨v, qj⟩, ai) = ⟨vM + e0 + ej(M0 + M1 + · · · + Mi−1), δ(qj, ai)⟩,
(1)
where the entries in the matrix product are computed over Z/(m). It is now clear that L(A′) = decm,r(L).
Corollary 4. Suppose m ≥ 1 and 0 ≤ r < m. If L is regular, accepted by an n-state DFA, then the state complexity of decm,r(L) is at most nm^n.
3. The unary case
In the case where |Σ| = 1, we can improve the upper bound on the state complexity of decm,r(L) as follows:
Theorem 5. If L is defined over a unary alphabet, accepted by an n-state DFA, and m ≥ 1, 0 ≤ r < m, then decm,r(L) is accepted by an mn-state DFA.
Proof. By a well-known result, the DFA for L consists of a "tail" of t states and a "loop" of n − t states. We can then accept decm,r(L) using a tail of at most t/m states and a loop of m(n − t) states.
There is also a matching lower bound:
Theorem 6. Let L = (a^n)∗, accepted by an n-state DFA. Then decm,0(L) = (a^{mn})∗, which is accepted by no DFA with fewer than mn states.
Proof. Clear.
4. Lower bound
We now turn to the question of a lower bound on the state complexity of decimation in the case of larger alphabets. We introduce some notation. Let |x|a be the number of occurrences of the symbol a in the word x. For an integer n ≥ 1, define Ln := {x ∈ {0, 1}∗ : |x|1 ≡ 0 (mod n)}. Let Σ = {0, 1}. Then Ln can be accepted in the obvious way by a DFA An = (Q, Σ, δ, q0, F) with n states. Here Q = {q0, q1, . . . , qn−1}, F = {q0}, and δ is defined by δ(qi, 0) := qi and δ(qi, 1) := q(i+1) mod n for 0 ≤ i < n. Note that δ(q0, w) = qi if and only if |w|1 ≡ i (mod n). We will prove:
Theorem 7. For odd integers n ≥ 3, any DFA accepting odd(Ln) has at least (n + 1)2^{n−1} states.
The outline of the proof is as follows. First, we use the construction of Theorem 3 to create a DFA A′n with n · 2^n states accepting odd(Ln). We then re-interpret the transition function in the case of Ln using Eq. (1). Next, we show that each state of A′n is reachable from q′0.
Finally, we determine all pairs of equivalent states in A′n and show that A′n has (n + 1)2^{n−1} pairwise inequivalent states. The result follows by the Myhill–Nerode theorem.
For the rest of this section, we adopt the following conventions. Vectors are denoted in boldface, such as v. Since the vectors we will deal with are in (Z/(2))^n, we write v = [v0, v1, . . . , vn−1], and arithmetic with vectors and terms of vectors is always done implicitly mod 2. Similarly, any state qi represents qi mod n, and we do not explicitly write the (mod n) part. For the basis vector ej we write ej = (ej,0, ej,1, . . . , ej,n−1).
Lemma 8. We have A′n = (Q′, Σ, δ′, q′0, F′), where
• Q′ = {⟨v, q⟩ : v ∈ (Z/(2))^n, q ∈ Q};
• Σ = {0, 1};
• q′0 = ⟨[0, 0, . . . , 0], q0⟩;
• F′ = {⟨v, q0⟩ : v0 = 1};
• δ′(⟨[v0, v1, . . . , vn−1], qi⟩, 0) = ⟨[v0 + vn−1 + 1, v0 + v1, v1 + v2, . . . , vn−2 + vn−1], qi⟩;
• δ′(⟨[v0, v1, . . . , vn−1], qi⟩, 1) = ⟨ei + [v0 + vn−1 + 1, v0 + v1, v1 + v2, . . . , vn−2 + vn−1], qi+1⟩.
Proof. Follows directly from the characterization in Theorem 3.
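Lemma 8 can be checked mechanically. The sketch below (our own code, with n = 3 and hypothetical helper names) implements δ′ exactly as stated, runs A′n on all binary words of length at most 8 in radix order, and confirms both that A′n accepts precisely the words of Ln of odd rank (i.e., odd(Ln)) and the parity behavior asserted in Lemma 9 below:

```python
from itertools import product

n = 3  # A_n accepts L_n = {x in {0,1}* : |x|_1 ≡ 0 (mod n)}

def delta_prime(state, a):
    """The transition function of A'_n from Lemma 8 (arithmetic mod 2)."""
    v, i = state
    w = [(v[0] + v[-1] + 1) % 2] + [(v[j] + v[j + 1]) % 2 for j in range(n - 1)]
    if a == 1:
        w[i] = (w[i] + 1) % 2  # add the basis vector e_i
        return tuple(w), (i + 1) % n
    return tuple(w), i

# all words of {0,1}* up to length 8, in radix order
words = [""] + ["".join(b) for m in range(1, 9) for b in product("01", repeat=m)]

rank = 0  # number of words of L_n seen so far in radix order
for x in words:
    state = ((0,) * n, 0)  # q'_0 = <[0,...,0], q_0>
    for c in x:
        state = delta_prime(state, int(c))
    v, q = state
    accepted = (q == 0 and v[0] == 1)  # F' = {<v, q_0> : v_0 = 1}
    in_L = x.count("1") % n == 0
    assert accepted == (in_L and rank % 2 == 1)  # A'_n accepts odd(L_n)
    if x:  # Lemma 9: the vector parity is odd after reading 0, even after 1
        assert sum(v) % 2 == (1 if x[-1] == "0" else 0)
    if in_L:
        rank += 1
```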
For a ∈ {0, 1}, we define ā := 1 − a. If v = [v0, v1, . . . , vn−1], then v̄ := [v̄0, v̄1, . . . , v̄n−1] = v + [1, 1, . . . , 1]. If q = ⟨v, qi⟩, define q̄ := ⟨v̄, qi⟩. We say that the parity of ⟨v, qi⟩ is odd if v contains an odd number of entries equal to 1. Otherwise the parity of ⟨v, qi⟩ is even.
Lemma 9. For all q ∈ Q′, the parity of δ′(q, 0) is odd and the parity of δ′(q, 1) is even.
Proof. From Lemma 8 we have that the sum of the entries of the vector of δ′(q, 0) is 2v0 + 2v1 + · · · + 2vn−1 + 1 ≡ 1 (mod 2), and the sum of the entries of the vector of δ′(q, 1) is 2v0 + 2v1 + · · · + 2vn−1 + 1 + ei,i ≡ 0 (mod 2).
Lemma 10. Let p, q ∈ Q′. Then δ′(p, s) = δ′(q, s) for some s ∈ Σ∗ iff p = q or p = q̄.
Proof. If p = q, then δ′(p, ε) = δ′(q, ε). Hence assume that p = q̄. It now immediately follows from Lemma 8 that δ′(p, a) = δ′(q, a) for a ∈ {0, 1}.
Now we prove the converse. Suppose δ′(p, s) = δ′(q, s) for some s ∈ Σ∗. If p = q we are done, so we may assume that p ≠ q. Let p = ⟨v, qi⟩ and q = ⟨w, qj⟩. Then from δ′(p, s) = δ′(q, s) we get qi = qj. Let t be the shortest prefix of s such that δ′(p, t) = δ′(q, t). If t = ε then p = q, a contradiction. Hence |t| ≥ 1.
Case 1: |t| = 1. Suppose t = 0. From Lemma 8, we deduce that if ⟨u, r⟩ = δ′(⟨v, qi⟩, 0) then r = qi and
u0 = v0 + vn−1 + 1
u1 = v0 + v1
u2 = v1 + v2
⋮
un−1 = vn−2 + vn−1.
Hence
vn−1 = u0 + v0 + 1
vn−2 = vn−1 + un−1 = un−1 + u0 + v0 + 1
vn−3 = un−2 + un−1 + u0 + v0 + 1
⋮
v1 = u2 + v2 = u2 + u3 + · · · + un−1 + u0 + v0 + 1.
Thus v = [v0, v0, v0, . . . , v0] + [0, u2 + · · · + un−1 + u0 + 1, u3 + · · · + un−1 + u0 + 1, . . . , un−1 + u0 + 1, u0 + 1]. Similarly, w = [w0, w0, w0, . . . , w0] + [0, u2 + · · · + un−1 + u0 + 1, u3 + · · · + un−1 + u0 + 1, . . . , un−1 + u0 + 1, u0 + 1]. Since v ≠ w, it follows that v0 ≠ w0. Thus v0 = w̄0 and v = w̄. Hence p = q̄.
On the other hand, if t = 1, then similar reasoning gives r = qi+1 and
v = [v0, v0, . . . , v0] + [0, u2 + · · · + un−1 + u0 + 1, u3 + · · · + un−1 + u0 + 1, . . . , un−1 + u0 + 1, u0 + 1] + [0, ei,2 + · · · + ei,n−1 + ei,0, ei,3 + · · · + ei,n−1 + ei,0, . . . , ei,n−1 + ei,0, ei,0]
and
w = [w0, w0, . . . , w0] + [0, u2 + · · · + un−1 + u0 + 1, u3 + · · · + un−1 + u0 + 1, . . . , un−1 + u0 + 1, u0 + 1] + [0, ei,2 + · · · + ei,n−1 + ei,0, ei,3 + · · · + ei,n−1 + ei,0, . . . , ei,n−1 + ei,0, ei,0].
Again, since v ≠ w, it follows that v0 ≠ w0. Thus v0 = w̄0 and v = w̄. Thus p = q̄.
Case 2: |t| > 1. Write t = rab for a, b ∈ Σ, r ∈ Σ∗. Let p′ = δ′(p, ra) and q′ = δ′(q, ra). Then p′ ≠ q′ by the definition of t and r. However, δ′(p′, b) = δ′(q′, b), so from Case 1 we have p′ = q̄′. But then, since n is odd, the parities of p′ and q′ differ. On the other hand, p′ = δ′(δ′(p, r), a) and q′ = δ′(δ′(q, r), a). From Lemma 9, we conclude that p′ and q′ are of the same parity. This is a contradiction, and so this case cannot occur.
Corollary 11. In the transition diagram of A′, every state p has exactly two incoming arrows, both labeled with the same letter a, arising from states of different parity, q and q̄. If p is of odd parity, then a = 0, and if p is of even parity, then a = 1.
Proof. This follows from the proof of Lemma 10, where |s| = 1.
We say that a state q ∈ Q′ is reachable if there exists a string x ∈ {0, 1}∗ such that δ′(q′0, x) = q.
Lemma 12. Every state of A′ is reachable.
Proof. Here is the outline of the proof. We define two partial functions:
INCR : {0, 1}∗ × {0, 1, . . . , n − 1} × {0, 1, . . . , n − 1} → {0, 1}∗
SHIFT : {0, 1}∗ × {0, 1, . . . , n − 1} → {0, 1}∗.
INCR(t, k, l) produces a string t′ such that if δ′(q′0, t) = ⟨w, qj⟩ and w has odd parity, then δ′(q′0, tt′) = ⟨w + ek + el, ql⟩. In other words, the effect of reading t′ after t has been read is to increment the kth and lth bits in the first component of the state, and change the second component to ql.
SHIFT(t, l) produces a string t′ such that if δ′(q′0, t) = ⟨w, qj⟩ and w has odd parity, then δ′(q′0, tt′) = ⟨w, ql⟩. In other words, the effect of reading t′ after t has been read is to change the second component of the state to ql.
We will show below how to define these two functions. For the moment, however, assume that these functions exist; we show how to apply them successively to form a path to any state ⟨v, qi⟩. The general idea is to apply INCR to add 1-bits to the first component of the state, and then fix up the second component by applying SHIFT. We start with t = 0; this takes us from q′0 to the state ⟨[1, 0, 0, . . . , 0], q0⟩.
Case 1: ⟨v, qi⟩ has odd parity. Find the minimum index l such that vl = 1. If l = 0, then no action is necessary. If l ≠ 0, use INCR(t, 0, l) to get to the state ⟨el, ql⟩. At this point the first 1-bit is set correctly. Since v has odd parity, there is an even number, say 2j, of remaining 1-bits. We now apply INCR j times to increment the remaining 1-bits in pairs. Because we change an even number of bits each time, each new state reached after an application of INCR will be of odd parity. Finally, fix up the second component by applying SHIFT.
Case 2: p = ⟨v, qi⟩ has even parity. By Corollary 11 there is a unique state q = ⟨u, qi−1⟩ of odd parity such that δ′(q, 1) = p. Use Case 1 to get to q, and then append 1 to get to p.
It now remains to see how to construct the functions INCR and SHIFT. First, we show that from any reachable state with odd parity, we eventually return to that state after reading some number of 0's.

Lemma 13. Given a state p of odd parity and any word s ∈ {0, 1}* such that δ′(q′₀, s) = p, there exists t = 0^l, l ≥ 1, such that δ′(q′₀, st) = p.

Proof. By Lemma 9, the parity of each of the states δ′(q′₀, s0^i), i ≥ 0, is odd. Since there are only finitely many states, we must have r := δ′(q′₀, s0^i) = δ′(q′₀, s0^j) for some 0 ≤ i < j. Further, choose i to be minimal, and j to be minimal for this i. Suppose, to get a contradiction, that i ≥ 1. Define r′ := δ′(q′₀, s0^{i−1}) and r″ := δ′(q′₀, s0^{j−1}). Then r′ ≠ r″, for otherwise i, j would not be minimal. Thus r′ and r″ are distinct states of odd parity from which we reach r on input 0, contradicting Corollary 11. Hence i = 0, and we can take l = j. □

Now let p be a reachable state of odd parity. Let l(p) be the least positive integer l such that δ′(p, 0^l) = p.

Lemma 14. If p = ⟨v, q_i⟩ is a reachable state of odd parity, then l(p) ≥ 3 unless v = [0, 0, 0, . . . , 1], in which case l(p) = 1.

Proof. If v = [0, 0, 0, . . . , 1], then from Lemma 8 we get δ′(⟨v, q_i⟩, 0) = ⟨v, q_i⟩, so l(p) = 1. For the converse, suppose l(p) = 1. Then if v = [v₀, v₁, . . . , v_{n−1}], we get by Lemma 8 that
[v₀, v₁, . . . , v_{n−1}] = [v₀ + v_{n−1} + 1, v₀ + v₁, v₁ + v₂, . . . , v_{n−2} + v_{n−1}].

Solving this system gives v = [0, 0, . . . , 1]. If l(p) = 2, then by Lemma 8 we get

[v₀, v₁, . . . , v_{n−1}] = [v₀ + v_{n−2}, v₁ + v_{n−1} + 1, v₀ + v₂, v₁ + v₃, . . . , v_{n−3} + v_{n−1}].

Solving this system gives v = [0, 0, . . . , 1]; but then l(p) = 1, a contradiction. □

We now define τ(p) := max(3, l(p)); hence if p is a reachable state of odd parity, then τ(p) ≥ 3 and δ′(p, 0^{τ(p)}) = p.

Lemma 15. Let p = ⟨v, q_i⟩ be a reachable state of odd parity. Then
(a) δ′(p, 0^{τ(p)−3}010) = ⟨v + e_i + e_{i+1}, q_{i+1}⟩;
(b) δ′(p, 0^{τ(p)−3}110) = ⟨v + e_i + e_{i+1}, q_{i+2}⟩.

Proof. Since p is reachable, there exists a string s such that δ′(q′₀, s) = p = ⟨v, q_i⟩. Now δ′(q′₀, s) = δ′(q′₀, s0^{τ(p)}). From the construction of A′ we know that if v = [v₀, v₁, . . . , v_{n−1}], then v_i counts, modulo 2, the number of words w such that w is lexicographically less than s0^{τ(p)} and |w|₁ ≡ i (mod n). Now consider the words from s0^{τ(p)} to s0^{τ(p)−3}110. In increasing lexicographic order, they are

s0^{τ(p)−3}000, s0^{τ(p)−3}001, s0^{τ(p)−3}010, s0^{τ(p)−3}011, s0^{τ(p)−3}100, s0^{τ(p)−3}101, s0^{τ(p)−3}110.
Now |s|₁ = |s0^{τ(p)−3}000|₁ ≡ i (mod n). Thus

δ′(q′₀, s0^{τ(p)−3}001) = ⟨v + e_i, q_{i+1}⟩.

Similarly, |s0^{τ(p)−3}001|₁ ≡ i + 1 (mod n). Thus

δ′(q′₀, s0^{τ(p)−3}010) = ⟨v + e_i + e_{i+1}, q_{i+1}⟩,

and (a) is proved. With a similar computation, we find

δ′(q′₀, s0^{τ(p)−3}110) = ⟨v + e_i + e_{i+1}, q_{i+2}⟩,

which proves (b). □

Corollary 16. Let p = ⟨v, q_i⟩ be any reachable state of odd parity in A′. For all k ≥ 1, there exists a word y_k ∈ {0, 1}* such that δ′(p, y_k) = ⟨v + e_i + e_{i+k}, q_{i+k}⟩.

Proof. By Lemma 15(a), we have δ′(p, 0^{τ(p)−3}010) = ⟨v + e_i + e_{i+1}, q_{i+1}⟩. If k = 1, we are done. Otherwise, use induction. Suppose we have found a string x_k such that δ′(p, x_k) = p′ := ⟨v + e_i + e_{i+k−1}, q_{i+k−1}⟩. Then by Lemma 15(a) we have δ′(p′, 0^{τ(p′)−3}010) = ⟨v + e_i + e_{i+k}, q_{i+k}⟩. Thus we can take y_k = x_k 0^{τ(p′)−3}010. □

Now let us show that the function SHIFT exists.

Lemma 17. Let p = ⟨v, q_i⟩ be any reachable state of odd parity in A′. Then for all j ≥ 0 there exists a word w_j ∈ {0, 1}* such that δ′(p, w_j) = ⟨v, q_j⟩.

Proof. If i = j we can take w_j = ε. Otherwise, use Corollary 16 with k = n − 2 to get to the state ⟨v + e_i + e_{i+n−2}, q_{i+n−2}⟩. Now use Lemma 15(b) to get to the state ⟨v + e_i + e_{i+n−1}, q_i⟩. Now use Corollary 16 with k = n − 1 to get to the state ⟨v, q_{i−1}⟩. If j ≡ i − 1 (mod n), we are done. Otherwise, repeat the sequence of steps above until q_j is reached. □

Thus the SHIFT function exists. We now turn to INCR.

Lemma 18. Let p = ⟨v, q_i⟩ be any reachable state of odd parity in A′. Then for all j and l there exists a word x_{j,l} ∈ {0, 1}* such that δ′(p, x_{j,l}) = ⟨v + e_j + e_l, q_l⟩.

Proof. First, use SHIFT to get to the state ⟨v, q_j⟩. From there, use Corollary 16 with k = l − j to get to the state ⟨v + e_j + e_l, q_l⟩. □

This shows that INCR exists. We have now completed the proof of Lemma 12. □
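Lemmas 13 and 14 can be probed computationally from the single 0-transition quoted above from Lemma 8. The sketch below is ours, not part of the original development: it implements only the map on the first state component, v ↦ [v₀ + v_{n−1} + 1, v₀ + v₁, . . . , v_{n−2} + v_{n−1}] (mod 2), and computes the least l ≥ 1 with T^l(v) = v when it exists. Vectors that are not reachable in A′ need not be periodic (which is why Lemma 13 restricts to reachable states), so the sketch reports None for them.

```python
from itertools import product

def step0(v):
    # the 0-transition on the first state component, taken from Lemma 8:
    # v -> [v0 + v_{n-1} + 1, v0 + v1, ..., v_{n-2} + v_{n-1}]  (mod 2)
    n = len(v)
    return tuple([(v[0] + v[n - 1] + 1) % 2] +
                 [(v[i - 1] + v[i]) % 2 for i in range(1, n)])

def period(v):
    # least l >= 1 with step0^l(v) = v, i.e. the quantity l(p) of the text;
    # None if v never returns (possible only for unreachable vectors)
    w, l, limit = step0(v), 1, 2 ** len(v) + 1
    while w != v and l <= limit:
        w, l = step0(w), l + 1
    return l if w == v else None

for n in range(2, 9):
    for v in product((0, 1), repeat=n):
        if sum(v) % 2 == 1:                      # odd-parity vectors only
            l = period(v)
            if v == (0,) * (n - 1) + (1,):
                assert l == 1                    # Lemma 14: [0, ..., 0, 1] is fixed
            elif l is not None:
                assert l >= 3                    # Lemma 14: otherwise l(p) >= 3
```

For every odd-parity vector that does return, the computed period is 1 exactly for v = [0, . . . , 0, 1] and at least 3 otherwise, matching Lemma 14.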
Now that we know that every state of A′ is reachable, it remains to show that the number of pairwise distinguishable states is (n + 1)2^{n−1}. To do so, we determine when two states are equivalent. We say that a state p is equivalent to a state q if, for all x ∈ Σ*, we have δ′(p, x) ∈ F′ iff δ′(q, x) ∈ F′. The first step is the following lemma.

Lemma 19. Let p₀, r₀ ∈ Q′. Suppose there exists a word s ∈ Σ* such that δ′(p₀, s) = p₁ and δ′(r₀, s) = r₁, where p₁ ≠ r₁ and p₁, r₁ ∈ F′. Then there exists a word t = 0^k, k ≥ 1, such that exactly one of {δ′(p₁, t), δ′(r₁, t)} is in F′.

Proof. Since p₁, r₁ ∈ F′, we can write

p₁ = ⟨[u₀, u₁, . . . , u_{n−1}], q₀⟩,  r₁ = ⟨[v₀, v₁, . . . , v_{n−1}], q₀⟩,

where u₀ = v₀ = 1. Let i be the greatest index such that u_i ≠ v_i; since by hypothesis p₁ ≠ r₁, such an index must exist, and since u₀ = v₀ = 1, we have 1 ≤ i ≤ n − 1. By the definition of i we have u_i ≠ v_i and u_j = v_j for j > i. Define p₂ := δ′(p₁, 0) and r₂ := δ′(r₁, 0).

Suppose i = n − 1. Then from Lemma 8 we have

p₂ = ⟨[u₀ + u_{n−1} + 1, u₀ + u₁, u₁ + u₂, . . . , u_{n−2} + u_{n−1}], q₀⟩
r₂ = ⟨[v₀ + v_{n−1} + 1, v₀ + v₁, v₁ + v₂, . . . , v_{n−2} + v_{n−1}], q₀⟩.

Consider the first entries of the vectors in p₂ and r₂. Since u_{n−1} ≠ v_{n−1} and u₀ = v₀ = 1, we get u₀ + u_{n−1} + 1 ≠ v₀ + v_{n−1} + 1. Thus exactly one of p₂, r₂ is in F′, and the conclusion follows with t = 0, k = 1.

Otherwise i < n − 1. Write

p₂ = ⟨[x₀, . . . , x_{n−1}], q₀⟩,  r₂ = ⟨[y₀, . . . , y_{n−1}], q₀⟩.
We have u_i ≠ v_i and u_{i+1} = v_{i+1}. Also, 1 ≤ i ≤ n − 2, so 2 ≤ i + 1 ≤ n − 1. Now by Lemma 8 we get

x_{i+1} = u_i + u_{i+1}  and  y_{i+1} = v_i + v_{i+1} = u_i + u_{i+1} + 1,

so it follows that x_{i+1} ≠ y_{i+1}. Thus the largest index j where x_j ≠ y_j is ≥ i + 1. We now repeat this process until j = n − 1, at which point we can finish with the argument above. □

Next we show that we can always get to at least one final state from any state.

Lemma 20. At least one final state of A′ is reachable from any state of A′.

Proof. Let p = ⟨v, q_i⟩ be a state of A′. From Lemma 12 we know that there is a string y such that δ′(q′₀, y) = p. Now let s₁ = 1^{n−1−i}01 and s₂ = 1^{n−1−i}10. Clearly ys₂ directly follows ys₁ in lexicographic order, and both ys₁, ys₂ ∈ L. So at least one of these two strings must be in odd(L), that is, at least one of δ′(p, s₁), δ′(p, s₂) is in F′. □

We now consider when two distinct states p = ⟨v, q_i⟩ and q = ⟨w, q_j⟩ are equivalent. For a state p = ⟨v, q_i⟩, write p̄ := ⟨v̄, q_i⟩, where v̄ is the bitwise complement of v.

Lemma 21. A state p = ⟨v, q_i⟩ is equivalent to a distinct state q = ⟨w, q_j⟩ iff q = p̄ and i = j ≠ 0.

Proof. By Lemma 20 we know that there is a word s such that δ′(p, s) = f₁ ∈ F′. If δ′(q, s) ∉ F′, then p and q are inequivalent. Thus assume that δ′(q, s) = f₂ ∈ F′. If f₂ ≠ f₁, then we use Lemma 19 to see that f₁ and f₂ are not equivalent; thus p and q are not equivalent. It follows that f₁ = f₂. Hence i = j. Thus δ′(p, s) = δ′(q, s), and by Lemma 10, we know that q = p̄.

If i = 0, then p and q are inequivalent, since the empty string distinguishes them (v₀ ≠ w₀, so exactly one of these is 1). If i ≠ 0, then we claim p and q are equivalent. To see this, we consider δ′(p, t) and δ′(q, t) for all strings t. If |t| = 0, then neither δ′(p, t) = p nor δ′(q, t) = q is in F′, since in order to be in F′ a state's second component must be q₀. If |t| = 1, then from Lemma 8 and the fact that q = p̄, we see that δ′(q, t) = δ′(p, t). From this we see immediately that δ′(q, u) = δ′(p, u) for all |u| ≥ 2. Thus the result follows. □

Lemma 22. The number of pairwise distinguishable states is n · 2^n − (n − 1)2^{n−1} = (n + 1)2^{n−1}.

Proof. There are n · 2^n states in A′_n. These are all reachable by Lemma 12.
Of this number, a state is equivalent to at most one other state, and this occurs iff the state is of the form ⟨v, q_i⟩ with i ≠ 0. Thus we need to subtract (n − 1)2^{n−1} to account for the equivalent states, leaving (n + 1)2^{n−1} pairwise inequivalent states. □

We have now completed the proof of Theorem 7.

5. Decimations of context-free languages

Suppose L is a context-free language. In some cases, decimations of L are still context-free. For example, if PAL = {x ∈ {a, b}* : x = x^R}, the palindrome language, then

even(PAL) = {ε} ∪ {xbx^R : x ∈ {a, b}*} ∪ {xbbx^R : x ∈ {a, b}*},

which is clearly context-free. If L = {a^n b^n : n ≥ 0}, then it is easy to see that any decimation of L is context-free. This raises the following natural question: if L is a context-free language (CFL), need its decimation be context-free? In this section we give two examples where this is not the case.

For the first example, let B be the balanced parentheses language on the symbols {a, b}, i.e.,

B = {ε, ab, aabb, abab, aaabbb, aababb, aabbab, abaabb, ababab, aaaabbbb, . . .}.

This is a well-known CFL, generated by the context-free grammar S → aSbS | ε. We will show that

even(B) = {ε, aabb, aaabbb, aabbab, ababab, . . .}

is not a CFL. First, we state some useful lemmas.

Lemma 23. The number of words of length 2n in B is the Catalan number C_n = (1/(n + 1)) (2n choose n).
Proof. Very well known; see, for example, [10, pp. 116–117]. □

Now let ν₂(n) denote the exponent of the highest power of 2 dividing n, and let s₂(n) denote the number of 1's in the binary expansion of n.

Lemma 24. For n ≥ 0 we have ν₂(n!) = n − s₂(n).

Proof. A well-known result due to Legendre; see, for example, [1, Corollary 3.2.2]. □
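Legendre's identity is easy to confirm numerically. The following sketch (ours) compares a direct count of the factors of 2 in n!, via ⌊n/2⌋ + ⌊n/4⌋ + ⋯, with n − s₂(n).

```python
def nu2_factorial(n):
    # nu_2(n!): number of factors of 2 in n!, via floor(n/2) + floor(n/4) + ...
    total, p = 0, 2
    while p <= n:
        total += n // p
        p *= 2
    return total

def s2(n):
    # number of 1's in the binary expansion of n
    return bin(n).count("1")

# Lemma 24 (Legendre): nu_2(n!) = n - s_2(n)
for n in range(2000):
    assert nu2_factorial(n) == n - s2(n)
```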
Lemma 25. For n ≥ 0, C_n is odd if and only if n = 2^i − 1 for some integer i ≥ 0.

Proof. We have

ν₂(C_n) = ν₂((2n choose n) / (n + 1))
        = ν₂((2n)!) − 2ν₂(n!) − ν₂(n + 1)
        = (2n − s₂(2n)) − 2(n − s₂(n)) − ν₂(n + 1)
        = s₂(n) − ν₂(n + 1),

using Lemma 24 and the fact that s₂(2n) = s₂(n). Thus C_n is odd if and only if s₂(n) = ν₂(n + 1), if and only if n = 2^i − 1 for some i ≥ 0. □

Lemma 26. For n ≥ 0 define D_n := Σ_{1≤i≤n} C_i. (Thus D₀ = 0.) Then D_n is even if and only if there exists i ≥ 0 such that 2^{2i} − 1 ≤ n < 2^{2i+1} − 1.

Proof. Follows immediately from Lemma 25. □
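Lemmas 23, 25 and 26, as well as the observation used next that (ab)^n is the D_n-th word of B in radix order, can all be checked by brute force for small n. The sketch below is ours; it enumerates B directly.

```python
from math import comb
from itertools import product

def catalan(n):
    return comb(2 * n, n) // (n + 1)

def balanced(w):
    # membership in B, reading a as '(' and b as ')'
    depth = 0
    for ch in w:
        depth += 1 if ch == "a" else -1
        if depth < 0:
            return False
    return depth == 0

def B_in_radix_order(maxlen):
    # words of B ordered by length, then lexicographically (a < b)
    out = []
    for m in range(maxlen + 1):
        out += ["".join(w) for w in product("ab", repeat=m) if balanced(w)]
    return out

words = B_in_radix_order(12)
for n in range(1, 7):
    # Lemma 23: B has C_n words of length 2n
    assert sum(1 for w in words if len(w) == 2 * n) == catalan(n)
    # Lemma 25: C_n is odd iff n = 2^i - 1, i.e. iff n + 1 is a power of 2
    assert (catalan(n) % 2 == 1) == ((n + 1) & n == 0)
    # (ab)^n has index D_n = C_1 + ... + C_n in radix order (0-based)
    D_n = sum(catalan(i) for i in range(1, n + 1))
    assert words.index("ab" * n) == D_n
    # Lemma 26: D_n is even iff 2^(2i) - 1 <= n < 2^(2i+1) - 1 for some i
    assert (D_n % 2 == 0) == any(4**i - 1 <= n < 2 * 4**i - 1 for i in range(4))
```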
We are now ready to prove:

Theorem 27. The language even(B) is not a context-free language.

Proof. First, we observe that (ab)^n is the lexicographically greatest word of length 2n in B. It follows that (ab)^n is the D_n-th word of B in radix order, where D_n = Σ_{1≤i≤n} C_i. (Recall that we start indexing at 0.) Suppose even(B) is context-free, and define the morphism h : {c}* → {a, b}* by h(c) = ab. By a well-known theorem [6, Theorem 6.3], h^{−1}(even(B)) is a context-free language. But h^{−1}(even(B)) = {c^n : D_n is even}. From Lemma 26, we have h^{−1}(even(B)) = {c^n : ∃ i ≥ 0 such that 2^{2i} − 1 ≤ n < 2^{2i+1} − 1}. Since h^{−1}(even(B)) is a unary CFL, by a well-known theorem it is actually regular. But the lengths of the strings in a unary regular language form an ultimately periodic set, a contradiction. Hence even(B) is not context-free. □

Corollary 28. odd(B) is not context-free.

Proof. This follows from the fact that h^{−1}(odd(B)) = c* − h^{−1}(even(B)). □

Recall that a language is a deterministic context-free language (DCFL) if it is accepted by a pushdown automaton that has at most one choice of move from every configuration.

Corollary 29. The class of DCFLs is not closed under decimation.

Proof. B is a DCFL, and even(B) is not a CFL. □

For our second example, consider the language

D = {x ∈ {a, b}* : |x|_a = |x|_b} = {ε, ab, ba, aabb, abab, abba, baab, baba, bbaa, aaabbb, . . .}.

We will show:

Theorem 30. even(D) is not context-free.

Proof. The proof is similar to that for the language B. We assume that even(D) is context-free and derive a contradiction. First, note that there are (n choose n/2) strings of length n in D if n is even, and 0 if n is odd. In particular, the number of strings of length n in D is even for n > 0. Since D contains the empty string, a nonempty string w of length n is in even(D) if and only if it is of odd index, lexicographically speaking, among the strings of length n in D. Since, by assumption, even(D) is context-free, so is

D′ = even(D) ∩ aba*b* = {abab, abaaabbb, . . .}.

We claim that aba^n b^n is, lexicographically speaking, of index (2n choose n−1) among all strings in D of length 2n + 2. To see this, observe that a string of length 2n + 2 in D is lexicographically less than aba^n b^n if and only if it begins with aa, and there are exactly (2n choose n−1) such strings. Thus aba^n b^n ∈ even(D) if and only if (2n choose n−1) is odd. Now (2n choose n−1) is odd if and only if n = 2^k − 1 for some k ≥ 1. Thus

D′ = {aba^{2^k−1} b^{2^k−1} : k ≥ 1},

which is clearly not context-free. □
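The index claim in the proof of Theorem 30 can likewise be verified by brute force (the code is ours): among the length-(2n + 2) words of D in lexicographic order, aba^n b^n sits at 0-based index (2n choose n−1), and that binomial is odd exactly when n = 2^k − 1.

```python
from math import comb
from itertools import combinations

def D_words(length):
    # all words over {a,b} of the given (even) length with |w|_a = |w|_b,
    # in lexicographic order (a < b)
    out = []
    for apos in combinations(range(length), length // 2):
        w = ["b"] * length
        for i in apos:
            w[i] = "a"
        out.append("".join(w))
    return sorted(out)

for n in range(1, 6):
    ws = D_words(2 * n + 2)
    # index of aba^n b^n among the length-(2n+2) words of D
    assert ws.index("ab" + "a" * n + "b" * n) == comb(2 * n, n - 1)
    # (2n choose n-1) is odd iff n = 2^k - 1, k >= 1
    assert (comb(2 * n, n - 1) % 2 == 1) == ((n + 1) & n == 0)
```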
6. Decimation and slender languages

Next we consider extractions and decimations of slender context-free languages. A language L is slender if there exists a constant c such that for every n ≥ 0, the number of words of length n in L is at most c. Charlier, Rigo, and Steiner [5] showed that if L is regular and slender, then extraction by an index set I gives a regular language if and only if I is a finite union of arithmetic progressions. We will show that the class of slender context-free languages is closed under the operation dec_{m,r}.

We first review some properties of slender context-free languages. Ilie [7,8], confirming a conjecture of Păun and Salomaa [12], proved that a context-free language is slender if and only if it is a finite disjoint union of languages of the form {uv^n wx^n y : n ≥ 0}, and further, such a decomposition is effectively computable. Ilie [8, Corollary 13] also proved that the class of slender context-free languages is effectively closed under intersection and set difference.

Theorem 31. The class of slender context-free languages is effectively closed under the operation dec_{m,r}.

Proof. Let L be a slender context-free language and let c be an upper bound on the number of words of any given length in L. We write L as a finite union L = L₁ ∪ ⋯ ∪ L_c, where, for i = 1, . . . , c, L_i is the set consisting of the lexicographically ith words of each length in L. We first show that each L_i is context-free. Let min(L) denote the set of the lexicographically least words of each length in L. Berstel and Boasson [2] showed that for any context-free language L, min(L) is context-free, and further, this closure is effective. In our case, the language L is slender by assumption, and the language L₁ = min(L) is slender by definition. Since the class of slender context-free languages is closed under set difference, the language L′ = L \ L₁ is also a slender context-free language. We next define L₂ := min(L′).
Continuing this process, we see that each L_i is a slender context-free language, as required, and further, this decomposition is effectively computable.

For i = 1, . . . , c, let A_i be a PDA accepting L_i. We show how to accept dec_{m,r}(L) by modifying each A_i appropriately. Recall that we may write L as a finite disjoint union L = P₁ ∪ ⋯ ∪ P_k, where each P_j is a language of the form {uv^n wx^n y : n ≥ 0}. Let us denote the length set {|uwy| + n|vx| : n ≥ 0} of P_j by len(P_j). Let N_w denote the number of words in L of length < |w|. We modify A_i by adding a modulo-m counter. If w = w₁ ⋯ w_n is the input to A_i, and A_i has processed the prefix w₁ ⋯ w_{t−1}, t ≤ n, then the counter stores N_{w₁⋯w_{t−1}} (mod m). On reading w_t, A_i increments the counter by 1 for each language P_j such that t − 1 ∈ len(P_j). The modified PDA A_i accepts w if and only if w ∈ L_i and N_w + i ≡ r (mod m). It follows that L(A₁) ∪ ⋯ ∪ L(A_c) = dec_{m,r}(L), as required. □

7. Additional remarks

We point out some additional relevant results of Rigo. In [14, Theorem 13], he proved that if P is a polynomial that is non-negative on the natural numbers, then there exists a regular language such that extraction by the index set {P(n) : n ≥ 0} is regular. In [13, Proposition 17], he sketches a proof that extraction of an infinite regular language by the index set I = {2, 3, 5, 7, . . .} of primes is always non-regular.

8. Open problems

(1) Numerical evidence suggests that if T_n = (ε + (0 + 1)*0)(1^n)* (which can be accepted with an n-state DFA), then even(T_n) requires (n + 2)2^{n−2} − 1 states. Prove this and generalize to larger alphabets.
(2) Given a CFL L, is it decidable whether or not even(L) is a CFL?

Acknowledgments

We thank the referees for a careful reading of the paper.

References
[1] J.-P. Allouche, J. Shallit, Automatic Sequences: Theory, Applications, Generalizations, Cambridge University Press, 2003.
[2] J. Berstel, L. Boasson, The set of minimal words in a context-free language is context-free, J. Comput. System Sci. 55 (1997) 477–488.
[3] J. Berstel, L. Boasson, O. Carton, B. Petazzoni, J.-E. Pin, Operations preserving regular languages, Theoret. Comput. Sci. 354 (2006) 405–420.
[4] J.-C. Birget, Partial orders on words, minimal elements of regular languages, and state complexity, Theoret. Comput. Sci. 119 (1993) 267–291.
[5] E. Charlier, M. Rigo, W. Steiner, Abstract numeration systems on bounded languages and multiplication by a constant, INTEGERS 8 (2008) #A35.
[6] J.E. Hopcroft, J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, 1979.
[7] L. Ilie, On a conjecture about slender context-free languages, Theoret. Comput. Sci. 132 (1994) 427–434.
[8] L. Ilie, On lengths of words in context-free languages, Theoret. Comput. Sci. 242 (2000) 327–359.
[9] P.B.A. Lecomte, M. Rigo, Numeration systems on a regular language, Theory Comput. Syst. 34 (2001) 27–44.
[10] J.H. van Lint, R.M. Wilson, A Course in Combinatorics, Cambridge University Press, 1992.
[11] G. Pighizzini, J. Shallit, Unary language operations, state complexity and Jacobsthal's function, Internat. J. Found. Comput. Sci. 13 (2002) 145–159.
[12] G. Păun, A. Salomaa, Thin and slender languages, Discrete Appl. Math. 61 (1995) 257–270.
[13] M. Rigo, Generalization of automatic sequences for numeration systems on a regular language, Theoret. Comput. Sci. 244 (2000) 271–281.
[14] M. Rigo, Construction of regular languages and recognizability of polynomials, Discrete Math. 254 (2002) 485–496.
[15] J. Shallit, Numeration systems, linear recurrences, and regular sets, Inform. Comput. 113 (1994) 331–347.
Theoretical Computer Science 410 (2009) 2410–2423
On two open problems of 2-interval patterns

Shuai Cheng Li, Ming Li*
David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON N2L 3G1, Canada
Article info
Keywords: 2-interval pattern; Contact map; NP-hard; Bioinformatics

Abstract
The 2-interval pattern problem, introduced in [Stéphane Vialette, On the computational complexity of 2-interval pattern matching problems, Theoret. Comput. Sci. 312 (2–3) (2004) 223–249], models general problems with biological structures such as protein contact maps and macroscopic describers of secondary structures of ribonucleic acids. Given a set of 2-intervals D and a model R, the problem is to find a maximum cardinality subset D′ of D such that any two 2-intervals in D′ satisfy R, where R is a subset of the relations on disjoint 2-intervals: precedence (<), nesting (⊏), and crossing (≬). The problem left unanswered at present is whether there is a polynomial time solution for the 2-interval pattern problem when R = {<, ≬} and all the support intervals of D are disjoint. In this paper, we present a reduction from the clique problem to show that, in this case, the problem is NP-hard.

The disjoint 2-interval pattern matching problem is to decide whether a disjoint 2-interval pattern (called the pattern) is a substructure of another disjoint 2-interval pattern (called the target). In general, the problem is NP-hard, but when there are restrictions on the form of the pattern, the problem can, in some cases, be solved in polynomial time. In particular, a polynomial time algorithm has been proposed (Gramm, WABI 2004 and IEEE/ACM TCBB 2004) for the case where the patterns are so-called crossing contact maps. In this paper we show that the problem is actually NP-hard and point out an error in the analysis of the above algorithm.¹
© 2009 Elsevier B.V. All rights reserved.
1. Introduction

This paper answers two open questions related to 2-interval patterns. 2-interval patterns are graph-theoretic models that are often used to model structures in bioinformatics, such as protein contact maps and macroscopic describers of ribonucleic acid secondary structures [8]. Similar graph models have also been proposed for RNA multiple structural alignments [3]. Given a set of 2-intervals and a model, the 2-interval pattern problem is to identify a largest subset of the 2-intervals under this model. The model describes whether two disjoint 2-intervals may be in precedence order (<), may be nested (⊏), may cross (≬), or any combination of these three relations. The complexity of the 2-interval pattern problem under different models was first investigated by Vialette [8], and then by Blin et al. [1,2]. Thanks to these studies, we now know that this problem is NP-complete in the most general case, while sub-cases of the problem, with restrictions on the form of the intervals and the models, are sometimes solvable in polynomial time. However, the complexity of one sub-case, namely whether the 2-interval pattern problem with disjoint support intervals and the model {<, ≬} admits a polynomial time algorithm, remained unknown. We show in the present paper that the 2-interval pattern problem in this case is NP-hard.

This problem is closely related to our second question, an open problem known as the 2-interval pattern matching problem. The complexity of this problem was first investigated by Vialette [8], and then by Blin et al. [1]. The problem of
* Corresponding author.
E-mail addresses: [email protected] (S.C. Li), [email protected] (M. Li).
¹ The second part of this paper appeared in WABI 2006.
doi:10.1016/j.tcs.2009.02.033
Table 1
Complexity of the 2-interval pattern problem, for n = |D|. When not specified, the result is from [1,8]. The ★ entry is the contribution of this paper.

Model      | Unlimited   | Unitary     | Disjoint
{<, ⊏, ≬}  | NP-complete | NP-complete | O(n√n)
{⊏, ≬}     | NP-complete | NP-complete | O(n²) [2]
{<, ⊏}     | O(n²)       | O(n²)       | O(n²)
{<, ≬}     | NP-complete | NP-complete | NP-complete ★
{<}        | O(n log n)  | O(n log n)  | O(n log n)
{⊏}        | O(n log n)  | O(n log n)  | O(n log n)
{≬}        | O(n²) [2]   | O(n²) [2]   | O(n²) [2]
whether the 2-interval pattern matching problem admits a polynomial time algorithm for disjoint interval ground sets and {<, ≬}-structured patterns was left unanswered in these works. Gramm [6,7] proposed a polynomial time algorithm intended to solve this problem. Regrettably, we noticed a flaw in an assumption made by the algorithm. We show in this paper that the problem is actually NP-hard.

2. Preliminaries

The notation in this paper largely follows that of [1,8,6]. A 2-interval consists of two (support) intervals. When the support intervals are disjoint, each interval is equivalent to an integer; hence, in this paper we simplify the interval notation to integers. A disjoint 2-interval pattern (DIS-2-IP) consists of a pair (S, D), where S is a set of integers and D is a set of ordered pairs over S, D ⊆ {(s_l, s_r) | s_l, s_r ∈ S, s_l < s_r}. A pair (s_l, s_r) is referred to as an arc. For an arc (s_l, s_r), s_l and s_r are referred to as the left endpoint and right endpoint of the arc, and we define L((s_l, s_r)) = s_l and R((s_l, s_r)) = s_r. The integers in S are also referred to as points. Three kinds of relations between two given arcs a = (s_l, s_r) and a′ = (s′_l, s′_r) are defined:
• a < a′ (a precedes a′) iff s_r < s′_l;
• a ⊏ a′ (a is nested in a′) iff s′_l < s_l < s_r < s′_r;
• a ≬ a′ (a crosses a′) iff s_l < s′_l < s_r < s′_r.

The above relations do not include all the possible relations between two arcs; for example, two arcs may share an endpoint. A DIS-2-IP is equivalent to a contact map; therefore, we use the terms DIS-2-IP and contact map interchangeably in the current paper.

2.1. Maximum 2-interval pattern problem

The maximum 2-interval pattern (2-IP) problem under a model R is to find a largest subset of 2-intervals that is R-comparable. Formally:

Disjoint 2-IP pattern (DIS-2-IP) problem
Input: A set of arcs D and a model R.
Output: An R-comparable subset of D with the largest cardinality.
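The three relations and the maximization problem can be specified directly in code. The following brute-force sketch (the names are ours, not from the paper) represents an arc as a pair of integers; being exponential, it is only a reference specification, in line with the hardness result established in Section 3.

```python
from itertools import combinations

def precedes(a, b):                 # a < b
    return a[1] < b[0]

def nested(a, b):                   # a is nested in b
    return b[0] < a[0] and a[1] < b[1]

def crosses(a, b):                  # a crosses b
    return a[0] < b[0] < a[1] < b[1]

def comparable(a, b, model):
    # two arcs are R-comparable if some relation of R holds, one way or the other
    return any(rel(a, b) or rel(b, a) for rel in model)

def max_pattern(arcs, model):
    # brute-force DIS-2-IP: a largest pairwise R-comparable subset of the arcs
    for k in range(len(arcs), 0, -1):
        for sub in combinations(arcs, k):
            if all(comparable(a, b, model) for a, b in combinations(sub, 2)):
                return list(sub)
    return []

# model {<, crossing}: precedence and crossing allowed, nesting forbidden
arcs = [(1, 4), (2, 6), (3, 5), (7, 9), (8, 10)]
best = max_pattern(arcs, (precedes, crosses))   # (3,5) is nested in (2,6), so
assert len(best) == 4                           # one of those two must be dropped
```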
We denote the disjoint 2-IP problem under model R by DIS-2-IP-R. The DIS-2-IP problem for every model except DIS-2-IP-{<, ≬} has been shown to be polynomially solvable [8]. These results are summarized in Table 1. In this paper we show NP-hardness of the DIS-2-IP-{<, ≬} problem.

2.2. Disjoint 2-interval pattern matching problem

As DIS-2-IPs are equivalent to contact maps, we use CM to denote a disjoint 2-interval pattern. A DIS-2-IP (S, D) is called a crossing contact map (CCM) iff it is {<, ≬}-structured. The disjoint 2-interval pattern matching (D2IPM) problem is: given two disjoint 2-interval patterns CM(S_p, D_p) (called the pattern) and CM(S, D) (called the target) with |S_p| ≤ |S|, find a subset S′ of S with |S′| = |S_p| such that there is a one-to-one mapping M from the elements of S_p to the elements of S′ that satisfies the following two conditions:
• if s₁, s₂ ∈ S_p and s₁ < s₂, then M(s₁), M(s₂) ∈ S′ and M(s₁) < M(s₂);
• if (s₁, s₂) ∈ D_p, then (M(s₁), M(s₂)) ∈ D.
Fig. 1. A simple graph G0 to illustrate the reduction.
If such a mapping exists, we say that CM(S_p, D_p) occurs in CM(S, D). In general, the 2-interval pattern matching problem is NP-hard [5,8]. However, some cases with restrictions on the patterns have been shown to be solvable in polynomial time: the D2IPM problem with {<}-, {⊏}-, {≬}-, or {<, ⊏}-structured patterns can be solved in polynomial time, but is NP-hard for {⊏, ≬}- and {<, ⊏, ≬}-structured patterns [8]. In this paper, we are interested in the remaining case, where the patterns are CCMs. The following formally states the problem:

Crossing contact map pattern matching (CCMPM) problem [7]
Input: DIS-2-IPs CM(S_p, D_p) and CM(S, D), where CM(S_p, D_p) is a CCM.
Output: Does CM(S_p, D_p) occur in CM(S, D)?
2.3. Clique problem

We use the clique problem, a well-known NP-hard problem [4], for both reductions in this paper. Let an instance of the clique problem be given by a directed graph G(V, E) and a positive integer ℓ. Without loss of generality, assume V = {1, . . . , n}. For an edge (u, v) ∈ E, we assume u < v. In general, the clique problem is defined for undirected graphs; for ease of notation, we assign a linear order to the vertices and direct each edge from u to v when u < v, where u is referred to as the source vertex and v as the target vertex of the edge (u, v). An ℓ-clique of a directed graph consists of ℓ vertices u_i ∈ V, 1 ≤ i ≤ ℓ, such that u₁ < u₂ < ⋯ < u_ℓ and, for all 1 ≤ i < j ≤ ℓ, (u_i, u_j) ∈ E.

3. NP-hardness of DIS-2-IP-{<, ≬}

The construction is rather involved, so we give the big picture before going into the details. We build a set of 2-intervals D, and then we prove that D has a DIS-2-IP-{<, ≬} of size (2ℓ − 1)n² + (ℓ − 1)n + ℓ iff G(V, E) contains an ℓ-clique. In the reduction, we construct subsets of arcs to represent the edge set E, and define these arc sets to be QR¹, . . . , QR^ℓ. For a maximum DIS-2-IP-{<, ≬} pattern, we select ℓ arcs, which correspond to ℓ − 1 edges in E, from QR¹; we select ℓ − 1 arcs from QR² (corresponding to ℓ − 2 edges in E), . . . , and finally we select only one arc from QR^ℓ. The edges corresponding to the arcs selected from QR¹ are (u₁, u₂), (u₁, u₃), . . . , (u₁, u_ℓ); and the edges corresponding to the arcs selected from QR^j are (u_j, u_{j+1}), (u_j, u_{j+2}), . . . , (u_j, u_ℓ) for some u₁, u₂, . . . , u_ℓ. If we succeed in selecting these arcs, then u₁, u₂, . . . , u_ℓ form an ℓ-clique. We will use the graph G₀ in Fig. 1 to illustrate the construction.
In the following subsections, we first define some additional notation, then construct the endpoints and the order among the points, followed by the construction of the arcs; finally we show the correctness of the construction.

3.1. Additional notation

A set D of k distinct arcs such that for all a, a′ ∈ D, either a ≬ a′ or a′ ≬ a, is called a k-arc crossing cluster. Given two disjoint sets of arcs D₁, D₂, we say D₁ is nested in D₂ (written D₁ ⊏ D₂) iff for all a₁ ∈ D₁ and all a₂ ∈ D₂, a₁ ⊏ a₂. For two arcs a and a′, we say a is propagated to a′ if, for a maximum DIS-2-IP-{<, ≬} pattern, the selection of a′ ensures the selection of a. For two arc sets D and D′ with |D| ≥ |D′|, D is propagated to D′ if the selection of D′ ensures the selection of D for a maximum DIS-2-IP-{<, ≬}. For k arc sets D₁, . . . , D_k with |D₁| ≥ ⋯ ≥ |D_k|, the k arc sets are propagated if the selection of D₁ ensures the selections of D₂, . . . , D_k for a maximum DIS-2-IP-{<, ≬}. Given two point sets S₁ and S₂, we write S₁ < S₂ iff for all s₁ ∈ S₁ and all s₂ ∈ S₂, s₁ < s₂.

3.2. The set of endpoints

We construct the following sets of endpoints: (1) I; (2) P^j, Q^j and R^j, 1 ≤ j ≤ ℓ; and (3) S^j, T^j and U^j, 2 ≤ j ≤ ℓ. Details are as follows:
Fig. 2. Arc sets IP and PQ¹ for the graph G₀. IP is a full bipartite connection between I and P¹. PQ^j is an n²-arc crossing cluster.
1. I contains ℓ points, ordered according to their subscripts in increasing order: I = {I_j | 1 ≤ j ≤ ℓ}, I₁ < ⋯ < I_ℓ.
2. P^j = {P^j_{u,v} | 1 ≤ u ≤ n, 1 ≤ v ≤ n}, 1 ≤ j ≤ ℓ. P^j contains n² points, ordered first by their first subscripts increasingly, then by their second subscripts increasingly: P^j_{u₁,v₁} < P^j_{u₂,v₂} if (1) u₁ < u₂, or (2) u₁ = u₂ and v₁ < v₂.
3. Q^j = {Q^j_{u,v} | 1 ≤ u ≤ n, 1 ≤ v ≤ n}, 1 ≤ j ≤ ℓ. Q^j contains n² points. The order relation is as for P^j: Q^j_{u₁,v₁} < Q^j_{u₂,v₂} if (1) u₁ < u₂, or (2) u₁ = u₂ and v₁ < v₂.
4. R^j = {R^j_{u,v} | 1 ≤ u ≤ n, 0 ≤ v ≤ n}, 1 ≤ j ≤ ℓ. R^j has n² + n elements, ordered by their first subscripts decreasingly, then by their second subscripts increasingly: R^j_{u₁,v₁} < R^j_{u₂,v₂} if (1) u₁ > u₂, or (2) u₁ = u₂ and v₁ < v₂. Note that case (1) differs from case (1) for P^j and Q^j.
5. S^j = {S^j_{u,v} | 1 ≤ u ≤ n, 1 ≤ v ≤ n}, 2 ≤ j ≤ ℓ. S^j has n² elements. Its order relation is as for R^j: (1) if u₁ < u₂, then S^j_{u₁,v₁} > S^j_{u₂,v₂}; or (2) if u₁ = u₂ and v₁ < v₂, then S^j_{u₁,v₁} < S^j_{u₂,v₂}.
6. T^j = {T^j_u | 1 ≤ u ≤ n}, 2 ≤ j ≤ ℓ. T^j contains n points, ordered by their subscripts increasingly: T^j_{u₁} < T^j_{u₂} if u₁ < u₂.
7. U^j = {U^j_u | 1 ≤ u ≤ n}, 2 ≤ j ≤ ℓ. U^j contains n points, ordered as for T^j: U^j_{u₁} < U^j_{u₂} if u₁ < u₂.
Furthermore, we specify the following order:

P¹ < Q¹ < R¹
S^j < T^j < U^j < P^j < Q^j < R^j,  2 ≤ j ≤ ℓ.

Let W¹ = P¹ ∪ Q¹ ∪ R¹ and W^j = S^j ∪ T^j ∪ U^j ∪ P^j ∪ Q^j ∪ R^j for 2 ≤ j ≤ ℓ. We introduce the following order:

I < W¹ < W² < ⋯ < W^ℓ.

Now we have defined a total order on all the points. Let S = I ∪ W¹ ∪ ⋯ ∪ W^ℓ.
3.3. Construction of the arcs

In this subsection, we specify the construction of the arcs.

3.3.1. Arc set IP
An arc is created between each point of I and each point of P¹ (Fig. 2). Formally,

IP = {(I_j, P¹_{u,v}) | 1 ≤ j ≤ ℓ, 1 ≤ u, v ≤ n}.

This construction ensures that at most ℓ arcs of IP can occur in a DIS-2-IP-{<, ≬} pattern, since I contains ℓ points and no two arcs in a DIS-2-IP-{<, ≬} pattern can share an endpoint.

3.3.2. Arc set PQ^j, 1 ≤ j ≤ ℓ
In total, there are n² arcs in PQ^j. An arc is created for each pair of points having the same subscripts; formally,

PQ^j = {(P^j_{u,v}, Q^j_{u,v}) | 1 ≤ u ≤ n, 1 ≤ v ≤ n}.

All the arcs in PQ^j cross each other (Figs. 2 and 5). Any combination of the arcs in PQ^j can occur in a DIS-2-IP-{<, ≬}.
S.C. Li, M. Li / Theoretical Computer Science 410 (2009) 2410–2423
Fig. 3. Arc sets QR^j for G_0. QR^j codes the edge information. QR^ℓ contains only the anchor arcs.
Fig. 4. Arc sets RS^j, ST^j and TU^j for G_0. RS^j is an n²-arc crossing cluster. Every n arcs in ST^j share one endpoint in T^j. TU^j is an n-arc crossing cluster.
3.3.3. Arc set QR^j, 1 ≤ j ≤ ℓ
QR^j is the place where the edge information is coded. QR^j, 1 ≤ j ≤ ℓ − 1, contains |E| + n arcs, and QR^ℓ contains n arcs. For 1 ≤ j ≤ ℓ and 1 ≤ u ≤ n, an arc is created between Q^j_{u,u} and R^j_{u,0}; and for 1 ≤ j ≤ ℓ − 1, there is an arc between Q^j_{u,v} and R^j_{u,v} if and only if (u, v), u < v, is an edge of G:

QR^j = {(Q^j_{u,v}, R^j_{u,v}) | (u, v) ∈ E} ∪ {(Q^j_{u,u}, R^j_{u,0}) | 1 ≤ u ≤ n}, 1 ≤ j ≤ ℓ − 1
QR^ℓ = {(Q^ℓ_{u,u}, R^ℓ_{u,0}) | 1 ≤ u ≤ n}

Let QR^j_u = {(Q^j_{u,v}, R^j_{u,v}) | (Q^j_{u,v}, R^j_{u,v}) ∈ QR^j, 0 ≤ v ≤ n}. QR^j_u is a crossing cluster. As the elements in Q^j are ordered increasingly according to the first subscripts, and the elements in R^j are ordered decreasingly according to the first subscripts, we have that QR^j_{u1} is nested in QR^j_{u2} for 1 ≤ u2 < u1 ≤ n (Fig. 3). Formally:

Lemma 1. QR^j_{u1} is nested in QR^j_{u2}, 1 ≤ u2 < u1 ≤ n.

This property ensures that only those arcs in QR^j whose endpoints share the same first subscript may occur in a DIS-2-IP-{<, G}. It implies that only edges with the same source node (denoted by the first subscript) may have their corresponding arcs occurring in a DIS-2-IP-{<, G}.
We call the arc (Q^j_{u,u}, R^j_{u,0}) the anchor of QR^j_u. The anchor (Q^j_{u,u}, R^j_{u,0}) crosses the rest of the arcs in QR^j_u. The anchor arcs in QR^j are nested within each other, and at most one anchor arc can occur in a DIS-2-IP-{<, G}. In Section 3.4, we will prove that to have a maximum DIS-2-IP-{<, G}, one anchor arc is to be selected from each QR^j, 1 ≤ j ≤ ℓ. By our construction, at most one arc in QR^ℓ can be selected for a maximum DIS-2-IP-{<, G}.
Lemma 2. At most one arc in QR^ℓ can be selected for a maximum DIS-2-IP-{<, G}.

3.3.4. Arc set RS^j, 2 ≤ j ≤ ℓ
RS^j contains n² arcs, whose construction is similar to that of PQ^j (Fig. 4); formally,

RS^j = {(R^{j−1}_{u,v}, S^j_{u,v}) | 1 ≤ u ≤ n, 1 ≤ v ≤ n}

RS^j is an n²-arc crossing cluster, and any combination of its arcs can occur in a DIS-2-IP-{<, G}.

3.3.5. Arc set ST^j, 2 ≤ j ≤ ℓ
ST^j contains n² arcs, and every n arcs in ST^j share one endpoint in T^j (Fig. 4), namely

ST^j = {(S^j_{u,v}, T^j_v) | 1 ≤ u ≤ n, 1 ≤ v ≤ n}

At most n arcs in ST^j can occur in a DIS-2-IP-{<, G}.
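The edge-coding in QR^j can be made concrete with a short sketch. The following Python illustration (helper and variable names are ours, not the paper's) builds QR^j for a small example graph and checks the stated counts |QR^j| = |E| + n for j < ℓ and |QR^ℓ| = n.

```python
def build_QR(n, ell, edges):
    """Arc set QR^j: one anchor (Q^j_{u,u}, R^j_{u,0}) per u, plus, for
    j < ell, one arc (Q^j_{u,v}, R^j_{u,v}) per edge (u, v) with u < v."""
    QR = {}
    for j in range(1, ell + 1):
        arcs = [(("Q", j, u, u), ("R", j, u, 0)) for u in range(1, n + 1)]  # anchors
        if j < ell:
            arcs += [(("Q", j, u, v), ("R", j, u, v)) for (u, v) in edges if u < v]
        QR[j] = arcs
    return QR

n, ell = 4, 3
edges = [(1, 2), (1, 3), (2, 3), (2, 4)]      # a small example graph on n = 4 nodes
QR = build_QR(n, ell, edges)
assert all(len(QR[j]) == len(edges) + n for j in range(1, ell))   # |E| + n arcs for j < ell
assert len(QR[ell]) == n                                          # only anchors in QR^ell
```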
Fig. 5. Arc sets UP^j and PQ^j for G_0. Every n arcs in UP^j share a point in U^j. PQ^j is an n²-arc crossing cluster.
Fig. 6. Arc propagation for A^1.
3.3.6. Arc set TU^j, 2 ≤ j ≤ ℓ
TU^j is an n-arc crossing cluster (Fig. 4). An arc is created between two points with the same subscript, with one point in T^j and the other in U^j:

TU^j = {(T^j_v, U^j_v) | 1 ≤ v ≤ n}

3.3.7. Arc set UP^j, 2 ≤ j ≤ ℓ
Every n arcs in UP^j share one endpoint in U^j (Fig. 5):

UP^j = {(U^j_v, P^j_{u,v}) | 1 ≤ u ≤ n, 1 ≤ v ≤ n}

At most n arcs of UP^j can appear in a DIS-2-IP-{<, G}.
Let A^1 = IP ∪ PQ^1 ∪ QR^1 and A^j = QR^{j−1} ∪ RS^j ∪ ST^j ∪ TU^j ∪ UP^j ∪ PQ^j ∪ QR^j (2 ≤ j ≤ ℓ). Denote the set of all the arcs constructed as D.

3.4. Correctness of the construction
Define L = (2ℓ − 1)n² + (ℓ − 1)n + ℓ. First, we prove that the only way to obtain a maximum DIS-2-IP-{<, G} P of size L in D is to select ℓ arcs from IP and ℓ − j + 1 arcs from QR^j, 1 ≤ j ≤ ℓ. Second, using the edge information coded in QR^j, we prove that the edges corresponding to the arcs selected from QR^j, 1 ≤ j ≤ ℓ, form a clique.

Theorem 3. There is a maximum DIS-2-IP-{<, G} P in D of size L if and only if P contains ℓ arcs from IP and ℓ − j + 1 arcs from QR^j, 1 ≤ j ≤ ℓ.
Furthermore, the arcs that are selected for P from IP are (I_1, P^1_{u1,u1}), (I_2, P^1_{u1,u2}), ..., (I_ℓ, P^1_{u1,uℓ}), and those from QR^j are (Q^j_{uj,uj}, R^j_{uj,0}),
(Q^j_{uj,u_{j+1}}, R^j_{uj,u_{j+1}}), ..., (Q^j_{uj,uℓ}, R^j_{uj,uℓ}) for some u1, u2, ..., uℓ with 1 ≤ u1 < u2 < ⋯ < uℓ ≤ n.
The proof has three steps: (1) we prove that in A^1, the arcs selected from IP are propagated to the arcs selected from QR^1; (2) we prove that in A^j, the arcs selected from QR^{j−1} are propagated to the arcs selected from QR^j; and (3) by combining (1) and (2), we have that the arcs selected from IP and QR^j, 1 ≤ j ≤ ℓ, are all propagated.
First we prove that the arcs selected from IP and from QR^1 are propagated (an example is shown in Fig. 6).

Lemma 4. For A^1, suppose k0 arcs are selected from IP and k1 arcs are selected from QR^1 for P; then at most n² + min{k0, k1} arcs are selected from A^1 for P. Furthermore, if k0 = k1 and the number of arcs selected from A^1 is n² + k0, then the arcs selected from QR^1 have their endpoints in Q^1 as Q^1_{u,u1}, ..., Q^1_{u,u_{k0}} and the arcs selected from IP have their endpoints in P^1 as P^1_{u,u1}, ..., P^1_{u,u_{k0}} for some u, u1, ..., u_{k0}, 1 ≤ u1 < ⋯ < u_{k0} ≤ n.

Proof. k0 arcs from IP imply that at most n² − k0 points in P^1 can be used by the arcs from PQ^1. k1 arcs from QR^1 imply that at most n² − k1 points in Q^1 can be used by the arcs selected from PQ^1. PQ^1 is an n²-arc crossing cluster and at least max{k0, k1} arcs in PQ^1 share endpoints with arcs from IP or QR^1. Therefore the number of arcs that are selected from A^1 is at most k0 + k1 + n² − max{k0, k1}, or equivalently n² + min{k0, k1}.
If k0 = k1 and the number of arcs selected from A^1 is n² + k0, the maximum possible number of arcs that can be selected from A^1 is achieved. This maximum value is achievable if and only if (1) the number of arcs from PQ^1 is n² − k0; and (2) the
subscripts of the right endpoints of the k0 arcs from IP are in one–one correspondence with the subscripts of the left endpoints of the k0 arcs from QR^1. According to Lemma 1, the first subscripts of the endpoints of the arcs from QR^1 are all the same. Hence the statement holds.
Then we prove that the arcs selected from QR^{j−1} and QR^j are propagated.

Lemma 5. For A^j, suppose k_{j−1} arcs are selected from QR^{j−1} and k_j arcs are selected from QR^j for P; then at most 2n² + n + min{k_{j−1}, k_j + 1} arcs are selected from A^j for P.
Furthermore, if k_{j−1} = k_j + 1 and the number of arcs from A^j is 2n² + n + k_{j−1}, then the arcs selected from QR^{j−1} have their endpoints in R^{j−1} as R^{j−1}_{u,0}, R^{j−1}_{u,u1}, R^{j−1}_{u,u2}, ..., R^{j−1}_{u,u_{k_j}} and the arcs selected from QR^j have their endpoints in Q^j as Q^j_{u0,u1}, Q^j_{u0,u2}, ..., Q^j_{u0,u_{k_j}} for some u, u0, u1, u2, ..., u_{k_j}, with 1 ≤ u1 < u2 < ⋯ < u_{k_j} ≤ n.
Proof. Let the number of arcs selected from ST^j be s, and the number of arcs selected from UP^j be t. Consider QR^{j−1} ∪ RS^j ∪ ST^j: as in the argument of Lemma 4, the number of arcs selected from RS^j is n² − max{k_{j−1} − 1, s}, as at most one of the anchor arcs in QR^{j−1} is selected and the anchor arc does not share endpoints with arcs in RS^j. Similarly, the numbers of arcs that can be selected from TU^j and PQ^j are n − max{s, t} and n² − max{t, k_j}, respectively. Thus the number of arcs that can be selected from A^j is

2n² + n + k_{j−1} + k_j + s + t − max{k_{j−1} − 1, s} − max{s, t} − max{t, k_j}    (1)

Eq. (1) can be rearranged as

2n² + n + k_{j−1} + min{s − k_{j−1} + 1, 0} + min{t − s, 0} + min{k_j − t, 0} ≤ 2n² + n + k_{j−1}

The maximum 2n² + n + k_{j−1} is achievable only if: (1) s + 1 ≥ k_{j−1}; (2) t ≥ s; and (3) k_j ≥ t. Similarly, we can rearrange Eq. (1) as

2n² + n + k_j + 1 + min{0, k_{j−1} − s − 1} + min{s − t, 0} + min{0, t − k_j} ≤ 2n² + n + k_j + 1

The maximum 2n² + n + k_j + 1 is achievable only if: (4) k_{j−1} ≥ s + 1; (5) s ≥ t; and (6) t ≥ k_j. Then we have that at most 2n² + n + min{k_{j−1}, k_j + 1} arcs are selected from A^j for P, and we can verify that this number is achievable.
If k_{j−1} − 1 = k_j, in order to maximize Eq. (1), by combining conditions (1)–(6), we have s = t = k_j. We know that one anchor arc in QR^{j−1} is selected. Let the arcs from QR^{j−1} have their endpoints in R^{j−1} as R^{j−1}_{u,0}, R^{j−1}_{u,u1}, R^{j−1}_{u,u2}, ..., R^{j−1}_{u,u_{k_j}} for some u, u1, u2, ..., u_{k_j}, with 1 ≤ u1 < u2 < ⋯ < u_{k_j} ≤ n. Considering QR^{j−1} ∪ RS^j ∪ ST^j, the s arcs from ST^j have their endpoints in T^j as T^j_{u1}, T^j_{u2}, ..., T^j_{u_{k_j}}. We know that the left endpoints of the arcs from QR^j have the same first subscript by Lemma 1. It is not difficult to see that the t arcs from UP^j have their endpoints in P^j as P^j_{u0,u1}, P^j_{u0,u2}, ..., P^j_{u0,u_{k_j}} for some u0, 1 ≤ u0 ≤ n. This implies that the arcs selected from QR^j have their endpoints in Q^j as Q^j_{u0,u1}, Q^j_{u0,u2}, ..., Q^j_{u0,u_{k_j}}.
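Both rearrangements of Eq. (1) are pure max/min identities, so they can be checked mechanically. The standalone script below (our own sanity check, not part of the paper) confirms them over a grid of small integer values, with the middle term of the first form written as min{t − s, 0}.

```python
import itertools

def eq1(k_prev, k, s, t):
    # k_{j-1} + k_j + s + t - max{k_{j-1}-1, s} - max{s, t} - max{t, k_j}
    # (the constant 2n^2 + n is dropped from all three expressions)
    return k_prev + k + s + t - max(k_prev - 1, s) - max(s, t) - max(t, k)

def first_form(k_prev, k, s, t):
    # k_{j-1} + min{s - k_{j-1} + 1, 0} + min{t - s, 0} + min{k_j - t, 0}
    return k_prev + min(s - k_prev + 1, 0) + min(t - s, 0) + min(k - t, 0)

def second_form(k_prev, k, s, t):
    # k_j + 1 + min{0, k_{j-1} - s - 1} + min{s - t, 0} + min{0, t - k_j}
    return k + 1 + min(0, k_prev - s - 1) + min(s - t, 0) + min(0, t - k)

for k_prev, k, s, t in itertools.product(range(6), repeat=4):
    assert eq1(k_prev, k, s, t) == first_form(k_prev, k, s, t)
    assert eq1(k_prev, k, s, t) == second_form(k_prev, k, s, t)
```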
To achieve a maximum pattern, we will prove that one anchor arc is selected for each arc set QR^j; by this, we can prove that u0 = u1 in Lemma 5. Combining the results of Lemmas 4 and 5, we now prove that the arcs selected from IP and QR^j, 1 ≤ j ≤ ℓ, are propagated, which yields Theorem 3.

Proof. Suppose k0 arcs are selected from IP, and k_j arcs are selected from QR^j for P. Lemma 5 implies that at most 2n² + n + min{k_{j−1}, k_j + 1} − k_{j−1} arcs can be selected from A^j − QR^{j−1}. Then this DIS-2-IP-{<, G} has size at most

n² + min{k0, k1} + Σ_{j=2}^ℓ (2n² + n + min{k_{j−1}, k_j + 1} − k_{j−1})
= (2ℓ − 1)n² + (ℓ − 1)n + min{k0, k1} + Σ_{j=2}^ℓ min{k_j + 1 − k_{j−1}, 0}
≤ (2ℓ − 1)n² + (ℓ − 1)n + k0 ≤ L

since I contains ℓ points and hence k0 ≤ ℓ. The value L is attained only if: (1) k0 = ℓ; (2) k1 ≥ ℓ; and (3) k_{j−1} ≤ k_j + 1 for j ≥ 2. By Lemma 2, the maximum value of k_ℓ is 1. So we have k0 = ℓ and k_j = ℓ − j + 1 for 1 ≤ j ≤ ℓ. Let the arcs from IP be (I_1, P^1_{u1,u1}), (I_2, P^1_{u1,u2}), ..., (I_ℓ, P^1_{u1,uℓ}) for some u1, u2, ..., uℓ with 1 ≤ u1 < u2 < ⋯ < uℓ ≤ n.
We know that one anchor arc is selected from QR^1 to achieve the maximum value. Also, the two subscripts of the left endpoint of an anchor arc are equal. According to Lemma 4, the arcs from QR^1 are (Q^1_{u1,u1}, R^1_{u1,0}),
(Q^1_{u1,u2}, R^1_{u1,u2}), ..., (Q^1_{u1,uℓ}, R^1_{u1,uℓ}). By similar arguments, according to Lemma 5, the arcs from QR^j are (Q^j_{uj,uj}, R^j_{uj,0}), (Q^j_{uj,u_{j+1}}, R^j_{uj,u_{j+1}}), ..., (Q^j_{uj,uℓ}, R^j_{uj,uℓ}) (2 ≤ j ≤ ℓ).
Lastly, according to the edge information coded in the arcs selected from QR^j, 1 ≤ j ≤ ℓ, we can prove that a clique of size ℓ exists if there is a DIS-2-IP-{<, G} of size L in D.

Lemma 6. If there is a DIS-2-IP-{<, G} P of size L in D, then there is an ℓ-clique in G.

Proof. From Theorem 3, we know that to have a DIS-2-IP-{<, G} of size L, the arcs selected from QR^j, 1 ≤ j ≤ ℓ, must be (Q^j_{uj,uj}, R^j_{uj,0}), (Q^j_{uj,u_{j+1}}, R^j_{uj,u_{j+1}}), ..., (Q^j_{uj,uℓ}, R^j_{uj,uℓ}). The ℓ − j non-anchor arcs selected from QR^j imply that E contains the edges (uj, u_{j+1}), ..., (uj, uℓ). This implies that all the edges (v1, v2), v1 < v2, v1, v2 ∈ {u1, u2, ..., uℓ}, are in E. Therefore u1, ..., uℓ form a clique of size ℓ.
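Assuming the per-set selection counts derived in the proofs of Lemmas 4 and 5 (with k_j = s = t = ℓ − j + 1 for j ≥ 2), the total size of the clique-induced selection can be tallied mechanically. The sketch below (our own arithmetic check) confirms that the tally matches L = (2ℓ − 1)n² + (ℓ − 1)n + ℓ.

```python
def selection_size(n, ell):
    """Arcs selected when G has an ell-clique: ell from IP, n^2 - ell from
    PQ^1, ell from QR^1, and, for each j >= 2, the counts forced by Lemma 5
    with k_{j-1} = ell - j + 2 and k_j = s = t = ell - j + 1."""
    total = ell + (n * n - ell) + ell          # IP, PQ^1, QR^1
    for j in range(2, ell + 1):
        a = ell - j + 1                        # k_j = s = t
        total += (n * n - a)                   # RS^j
        total += a                             # ST^j
        total += (n - a)                       # TU^j
        total += a                             # UP^j
        total += (n * n - a)                   # PQ^j
        total += a                             # QR^j
    return total

for n in range(3, 8):
    for ell in range(2, n + 1):
        assert selection_size(n, ell) == (2 * ell - 1) * n * n + (ell - 1) * n + ell
```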
It is easy to prove that if G contains an ℓ-clique, then a maximum DIS-2-IP-{<, G} in D has size L. It is obvious that this reduction is polynomial. Naturally, we have:

Theorem 7. D has a size-L DIS-2-IP-{<, G} if and only if G has a clique of size ℓ; hence the DIS-2-IP-{<, G} problem is NP-hard.

4. NP-hardness of the crossing pattern matching problem
In the following, we will first define some terms to facilitate the presentation of the reduction. We will then construct (1) a target map CM(S_G, D_G), and (2) a pattern CM(S_{n,ℓ}, D_{n,ℓ}) with parameters ℓ and n, from a given graph G(V, E). Then we analyze the reduction and show its correctness.

4.1. Additional notation and definitions
A set D of k distinct arcs where ∀a, a′ ∈ D, either a G a′ or a′ G a, is called a k-arc crossing cluster. Given two disjoint sets of arcs D_1, D_2, we say D_1 crosses D_2, or D_2 is crossed by D_1 (written as D_1 G D_2), just in case either (1) ∀a_1 ∈ D_1, ∀a_2 ∈ D_2, a_1 G a_2, or (2) one of D_1 or D_2 is an empty set. D_1 < D_2 (D_1 is less than D_2, or D_2 is greater than D_1) and D_1 ⊏ D_2 (D_1 is nested in D_2) are defined similarly. We also say that an arc a crosses a set of arcs D to mean {a} G D (the cases for < and ⊏ are defined similarly). For any three sets of arcs D_1, D_2 and D_3, we say that:
• D_3 is from D_1 to D_2 iff D_1 < D_2 and D_1 G D_3, D_3 G D_2; and
• D_3 is anchored by D_1 and D_2 iff D_1 < D_3 and D_2 G D_3.

Given two point sets S_1 and S_2, we write S_1 < S_2 iff ∀s_1 ∈ S_1 and ∀s_2 ∈ S_2, s_1 < s_2. For an arc set D, we define L(D) = ⋃_{a∈D} {L(a)} and R(D) = ⋃_{a∈D} {R(a)}.
The subscript '∗' is a special symbol which matches every defined subscript. That is, A_{∗,j} refers to the set {A_{i,j} | A_{i,j} is defined}, and A_{∗,∗} refers to the set of all A_{i,j} that have been defined.
If CM(S_{n,ℓ}, D_{n,ℓ}) occurs in CM(S_G, D_G), there exists a one–one mapping M between the elements of D_{n,ℓ} and some elements of D_G. Here, we extend the definition of the mapping to any set D′_p ⊆ D_{n,ℓ} such that M(D′_p) = ⋃_{a∈D′_p} {M(a)}.
4.2. Target contact map construction
In this section, we construct a target contact map CM(S_G, D_G) from a given graph G(V, E). We first build some large crossing clusters, and then we construct the arcs which connect these clusters.

4.2.1. Large crossing clusters
Firstly, we construct 2n + 2 crossing clusters: H, Z_u (1 ≤ u ≤ n), T and V_u (1 ≤ u ≤ n). H is a 28n⁴-arc crossing cluster, Z_u is a 5n³-arc crossing cluster, T is a 9n⁴-arc crossing cluster and V_u is a 5n³-arc crossing cluster. Let Z = ⋃_{u=1}^n Z_u and V = ⋃_{u=1}^n V_u. Furthermore, we define the following order for these large clusters:

H < Z_1 < ⋯ < Z_n < T < V_1 < ⋯ < V_n

4.2.2. Arcs from H to Z_u
There is a 2-arc crossing cluster from H to Z_u for each u, 1 ≤ u ≤ n. Denote the two arcs as A_{u,1} and A_{u,2}, A_{u,1} G A_{u,2}. Let A_u = {A_{u,1}, A_{u,2}}. Furthermore, we define the following orders:

H G A_u, A_u G Z_u, 1 ≤ u ≤ n    (2)
A_{u1} < A_{u2}, 1 ≤ u1 < u2 ≤ n    (3)

Eq. (2) ensures that A_u is from H to Z_u. Eq. (3) forces that at most one pair of arcs in A_{∗,∗} can be included in a CCM. Let A = ⋃_{u=1}^n A_u; it is clear that |A| = 2n.
4.2.3. Arcs from Z_u to Z_v
There are two kinds of arcs from Z_u to Z_v (1 ≤ u < v ≤ n): E_{u,v} and C_{u,v}. E_{u,v} consists of u crossing clusters, denoted as E_{u,v,w}, 1 ≤ w ≤ u. Each cluster E_{u,v,w} contains three arcs: E_{u,v,w,1}, E_{u,v,w,2} and E_{u,v,w,3} with E_{u,v,w,1} G E_{u,v,w,2}, E_{u,v,w,1} G E_{u,v,w,3} and E_{u,v,w,2} G E_{u,v,w,3}. Each C_{u,v} is a single arc.
We now define the orders among the arcs E_{∗,∗,∗,∗} and C_{∗,∗} which are needed for our proof. Diagrams of these orders are depicted in the Appendix.
Firstly, we ensure that E_{u,∗,∗,∗} and C_{u,∗} are crossed by Z_u, while E_{∗,v,∗,∗} and C_{∗,v} cross Z_v:

Z_u G E_{u,∗,∗,∗}, Z_u G C_{u,∗}, 1 ≤ u ≤ n − 1    (4)
E_{∗,v,∗,∗} G Z_v, C_{∗,v} G Z_v, 2 ≤ v ≤ n    (5)

Secondly, we define the orders among the arcs which cross Z_v (2 ≤ v ≤ n):

R(E_{∗,v,1,∗}) < R(E_{∗,v,2,∗}) < ⋯ < R(E_{∗,v,v−1,∗}) < R(C_{∗,v})    (6)
R(E_{∗,v,w,1}) < R(E_{∗,v,w,2}) < R(E_{∗,v,w,3}), 1 ≤ w < v    (7)
E_{v−1,v,w,i} < E_{v−2,v,w,i} < ⋯ < E_{w,v,w,i}, 1 ≤ w < v, 1 ≤ i ≤ 3    (8)
C_{v−1,v} < C_{v−2,v} < ⋯ < C_{1,v}    (9)
E_{∗,v,∗,∗} < A_v, C_{∗,v} < A_v    (10)

Eq. (6) ensures that for the arcs crossing Z_v, the right endpoints are ordered according to the third subscripts; also, the right endpoints of C_{∗,v} are greater than the right endpoints of E_{∗,v,∗,∗}. Furthermore, Eq. (7) orders (the right endpoints of) E_{∗,v,w,∗} according to the fourth subscripts for any given v and w, and then Eq. (8) orders them by their first subscripts. Eq. (9) defines the order between the arcs of C_{∗,∗}, so that at most one arc in C_{∗,v} can be selected for a CCM. Eq. (10) defines the relations between the arcs of C_{∗,v}, E_{∗,v,∗,∗} and A_{v,∗}: if A_{v,∗} is selected for a CCM, then none of the arcs in E_{∗,v,∗,∗} and C_{∗,v} can be used.
Thirdly, for the arcs which are crossed by Z_u (1 ≤ u ≤ n − 1), we introduce the orders below:

E_{u,u+1,w,∗} < E_{u,u+2,w,∗} < ⋯ < E_{u,n,w,∗}, 1 ≤ w ≤ u    (11)
C_{u,u+1} < C_{u,u+2} < ⋯ < C_{u,n}    (12)

Eq. (11) ensures that for any given u and w, at most one 3-arc crossing cluster can be chosen for a CCM, namely E_{u,v,w,∗} for some v. Similarly, Eq. (12) ensures that for a given u, at most one arc in C_{u,∗} appears in a CCM.
Lastly, we define the orders between those arcs which are crossed by Z_z and those which cross Z_z (1 ≤ z ≤ n):

E_{∗,z,w,1} < E_{z,∗,w,∗}, E_{∗,z,w,2} G E_{z,∗,w,∗}, 1 ≤ w < z ≤ n    (13)
A_{z,1} < E_{z,∗,z,∗}, A_{z,2} G E_{z,∗,z,∗}, 1 ≤ z ≤ n    (14)
A_{z,2} < C_{z,∗}, 1 ≤ z ≤ n    (15)

Eq. (13) ensures that the arcs E_{z,∗,w,∗} from Z_z for a given w are anchored by E_{∗,z,w,1} and E_{∗,z,w,2}. Notice that for w = z, the set E_{∗,z,z,∗} is not defined; the arc set E_{z,∗,z,∗} is instead anchored by the arcs A_{z,1} and A_{z,2} (by Eq. (14)). Combined with Eq. (4), Eq. (15) ensures that the arc C_{z,∗} is anchored by the arc A_{z,2} and the arc set Z_z.
Define C = C_{∗,∗}; we know that |C| = (n² − n)/2. Let E = E_{∗,∗,∗,∗}; we have |E| = (n³ − n)/2.
Eq. (13) ensures that the arc from Zu for a given w is anchored by E∗,z ,w,1 and E∗,z ,w,2 . Notice that for w = z, the set E∗,z ,z ,∗ is not defined. The arc Ez ,∗,z ,∗ is anchored by arcs Az ,1 and Az ,2 (by Eq. (14)). Combining with Eq. (4), Eq. (15) ensures that arc Cz ,∗ is anchored by arc Az ,2 and arc set Zz . Define C = C∗,∗ ; we know that |C | = 1/2(n2 − n). Let E = E∗,∗,∗,∗ and we have |E | = 1/2(n3 − n). 4.2.4. Arcs from Zu to T Arcs from Zu to T are denoted as Fu . Fu consists of u 2-arc crossing clusters, and the clusters are denoted as Fu,w , 1 ≤ w ≤ u. Fu,w contains two arcs: Fu,w,1 and Fu,w,2 , where Fu,w,1 G Fu,w,2 . Firstly we ensure that Fu is from Zu to T (1 ≤ u ≤ n): Zu G Fu,∗,∗ , Fu,∗,∗ G T . Furthermore, we define the following orders: E∗,u,w,1 < Fu,w,∗ , E∗,u,w,2 G Fu,w,∗ , Au,1 < Fu,u,∗ , Au,2 G Fu,u,∗ , Eu,∗,w,∗ < Fu,w,∗ ,
1≤w
(16)
1≤u≤n
(17)
1≤w≤u
(18)
R(F∗,1,∗ ) < R(F∗,2,∗ ) < · · · < R(F∗,n,∗ ) R(F∗,w,1 ) < R(F∗,w,2 ), Fn,w,i < Fn−1,w,i < · · · < Fw,w,i ,
(19) 1≤w≤n
(20)
1 ≤ w ≤ n, 1 ≤ i ≤ 2
(21)
Eqs. (16) and (17) ensures that Fu,w,∗ are anchored by E∗,u,w,1 and E∗,u,w,2 or by Au,1 and Au,2 respectively. Eq. (18) ensures that if some arcs of Fu,w,∗ appear in a CCM, then none of the arcs of Eu,∗,w,∗ can appear in a CCM. The right endpoints of F∗,∗,∗ are ordered according to their second subscripts by Eq. (19), and then by the third subscript (by Eq. (20)). Furthermore, Eq. (21) ensures that only one arc is possible for a CCM in the set F∗,w,i for given w and i. Let F = F∗,∗,∗ . Note that |F | = n2 + n.
4.2.5. Arcs from T to V_v and from V_u to V_v
Two kinds of arcs, I_{u,v} and P_{u,v}, are defined; they are induced from the edges of G(V, E). I_{u,v} is either a 3-arc crossing cluster or an empty set. If I_{u,v} ≠ ∅, we denote the three arcs in it as I_{u,v,1}, I_{u,v,2} and I_{u,v,3}, with I_{u,v,1} G I_{u,v,2}, I_{u,v,1} G I_{u,v,3} and I_{u,v,2} G I_{u,v,3}. P_{u,v} contains n − v crossing clusters; each cluster P_{u,v,w} (v < w ≤ n) is empty or has two crossing arcs. If P_{u,v,w} ≠ ∅, we denote the two arcs as P_{u,v,w,1} and P_{u,v,w,2}, P_{u,v,w,1} G P_{u,v,w,2}.
The arcs from T to V_v are in two sets: P_{0,v,w} and I_{0,v}. P_{0,v,w} is a 2-arc crossing cluster and I_{0,v} is a 3-arc crossing cluster; these are all nonempty.
The edge information of G(V, E) is used to construct the arcs from V_u to V_v. For I_{u,v}, 1 ≤ u < v ≤ n: if (u, v) ∉ E, then I_{u,v} = ∅; otherwise (u, v) ∈ E and I_{u,v} is a 3-arc crossing cluster. For P_{u,v,w}, 1 ≤ u < v < w ≤ n: if (u, w) ∉ E, then P_{u,v,w} = ∅; otherwise (u, w) ∈ E and P_{u,v,w} is a 2-arc crossing cluster.
Firstly, we ensure that I_{0,∗,∗} and P_{0,∗,∗,∗} are crossed by T; I_{u,∗,∗} and P_{u,∗,∗,∗} are crossed by V_u (1 ≤ u ≤ n − 1); and I_{∗,v,∗} and P_{∗,v,∗,∗} cross V_v:

T G I_{0,∗,∗}, T G P_{0,∗,∗,∗}    (22)
V_u G I_{u,∗,∗}, 1 ≤ u < n; V_u G P_{u,∗,∗,∗}, 1 ≤ u < n − 1    (23)
I_{∗,v,∗} G V_v, 1 ≤ v ≤ n; P_{∗,v,∗,∗} G V_v, 1 ≤ v < n    (24)

For the arcs which cross V_v, we define the following orders:

R(I_{∗,v,∗}) < R(P_{∗,v,v+1,∗}), 1 ≤ v ≤ n − 1    (25)
R(P_{∗,v,v+1,∗}) < R(P_{∗,v,v+2,∗}) < ⋯ < R(P_{∗,v,n,∗}), 1 ≤ v ≤ n − 1    (26)
R(P_{∗,v,w,1}) < R(P_{∗,v,w,2}), 1 ≤ v < w ≤ n    (27)
P_{v−1,v,w,i} < P_{v−2,v,w,i} < ⋯ < P_{0,v,w,i}, 1 ≤ v < w ≤ n, 1 ≤ i ≤ 2    (28)
I_{v−1,v,∗} < I_{v−2,v,∗} < ⋯ < I_{0,v,∗}, 1 ≤ v ≤ n    (29)

Eq. (26) ensures that for a given v, the right endpoints of P_{∗,v,∗,∗} are sorted according to the third subscript. Then Eq. (27) ensures that for any given v and w, the right endpoints of P_{∗,v,w,∗} are sorted according to the fourth subscript. Furthermore, for any given v, w and i, Eq. (28) ensures that at most one arc in P_{∗,v,w,i} can be selected for a CCM.
Next we introduce the orders for the arcs which are crossed by T, and the arcs which are crossed by V_u:

P_{u,u+1,w,∗} < P_{u,u+2,w,∗} < ⋯ < P_{u,n,w,∗}, 0 ≤ u, u + 1 < w ≤ n    (30)
I_{u,w,∗} < P_{u,∗,w,∗}, 0 ≤ u, u + 1 < w ≤ n    (31)

Eqs. (30) and (31) ensure that either (1) one 2-arc crossing cluster P_{u,v,w,∗} can be selected for a CCM, or (2) the 3-arc crossing cluster I_{u,w,∗} is selected for a CCM, or (3) none of them is selected. Furthermore, for the arcs which are crossed by T, we define the following orders:

F_{∗,w,1} < I_{0,w}, F_{∗,w,2} G I_{0,w}, 1 ≤ w ≤ ℓ    (32)
F_{∗,w,1} < P_{0,∗,w,∗}, F_{∗,w,2} G P_{0,∗,w,∗}, 2 ≤ w ≤ ℓ    (33)

Eq. (32) ensures that I_{0,w} is anchored by F_{∗,w,1} and F_{∗,w,2}, and Eq. (33) ensures that P_{0,∗,w,∗} is anchored by F_{∗,w,1} and F_{∗,w,2}. Lastly, we define the orders between those arcs crossed by V_z and the arcs which cross V_z:

P_{∗,z,w,1} < I_{z,w}, P_{∗,z,w,2} G I_{z,w}, 1 ≤ z, z + 1 ≤ w ≤ n    (34)
P_{∗,z,w,1} < P_{z,∗,w,∗}, P_{∗,z,w,2} G P_{z,∗,w,∗}, 1 ≤ z, z + 1 < w ≤ n    (35)

Eq. (34) ensures that I_{z,w} is anchored by P_{∗,z,w,1} and P_{∗,z,w,2}, and Eq. (35) ensures that P_{z,∗,w,∗} is anchored by P_{∗,z,w,1} and P_{∗,z,w,2}.
Let P = P_{∗,∗,∗,∗} and I = I_{∗,∗}; it is not difficult to show that |P| ≤ (n³ − n)/3 and |I| ≤ 3(n² + n)/2.
Let D_G = H ∪ A ∪ C ∪ Z ∪ E ∪ F ∪ I ∪ P ∪ V, and let S_G be the endpoints of the arcs in D_G. The target contact map CM(S_G, D_G) is fully specified. The following results can be shown for CM(S_G, D_G):

Lemma 8. (i) An arc a ∈ E crosses no more than 9n³ arcs. (ii) An arc a ∈ F crosses no more than 17n⁴ arcs. (iii) An arc a ∈ I crosses no more than 9n³ arcs. (iv) |D_G − H| < |H|.

Proof. We know that |A| + |C| + |E| + |F| ≤ 2n + (n² − n)/2 + (n³ − n)/2 + (n² + n) ≤ 4n³. The only arcs that an arc a ∈ E can cross are from A, C, E, F, and Z_u for some u with 1 ≤ u ≤ n; since |Z_u| = 5n³, claim (i) follows. An arc a ∈ F may cross, besides arcs in A, C, E and F, some arcs in P and I, and in T as well; we have |P| + |I| < 4n³ and |T| = 9n⁴, which gives (ii). An arc a ∈ I crosses only arcs from P, I, and one V_u for some u with 1 ≤ u ≤ n, which gives (iii). It is easy to verify that |D_G − H| < |H|.
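The cardinalities used in Lemma 8 can be cross-checked by enumerating the subscript ranges directly, taking G to be a complete graph so that P and I attain their upper bounds. A small standalone check (ours, not part of the paper):

```python
def target_counts(n):
    """Count the arcs in C, E, F and, for a complete graph, P and I."""
    C = sum(1 for u in range(1, n + 1) for v in range(u + 1, n + 1))      # one arc per pair u < v
    E = sum(3 * u for u in range(1, n + 1) for v in range(u + 1, n + 1))  # u clusters of 3 arcs each
    F = sum(2 * u for u in range(1, n + 1))                               # u clusters of 2 arcs each
    # For a complete graph, every P_{u,v,w} (0 <= u < v < w <= n) and
    # every I_{u,v} (0 <= u < v <= n) is nonempty.
    P = sum(2 for u in range(0, n + 1) for v in range(u + 1, n + 1)
            for w in range(v + 1, n + 1))
    I = sum(3 for u in range(0, n + 1) for v in range(u + 1, n + 1))
    return C, E, F, P, I

for n in range(2, 9):
    C, E, F, P, I = target_counts(n)
    assert C == (n * n - n) // 2
    assert E == (n ** 3 - n) // 2
    assert F == n * n + n
    assert P == (n ** 3 - n) // 3          # the stated upper bound on |P|
    assert I == 3 * (n * n + n) // 2       # the stated upper bound on |I|
```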
4.3. Pattern construction
4.3.1. Large crossing clusters
As in the target case, we first construct 2ℓ + 2 crossing clusters: H′, Z′_u (1 ≤ u ≤ ℓ), T′ and V′_u (1 ≤ u ≤ ℓ). H′ is a 28n⁴-arc crossing cluster, Z′_u is a 5n³-arc crossing cluster, T′ is a 9n⁴-arc crossing cluster, and V′_u (1 ≤ u ≤ ℓ) is a 5n³-arc crossing cluster. We also define Z′ = ⋃_{u=1}^ℓ Z′_u and V′ = ⋃_{u=1}^ℓ V′_u. Furthermore, we define the following order for these large clusters:

H′ < Z′_1 < ⋯ < Z′_ℓ < T′ < V′_1 < ⋯ < V′_ℓ
4.3.2. Arcs from H′ to Z′_1
There is a 2-arc crossing cluster from H′ to Z′_1, denoted as A′. Its two arcs are denoted as A′_1 and A′_2, A′_1 G A′_2. Furthermore, A′ is from H′ to Z′_1:

H′ G A′, A′ G Z′_1    (36)
4.3.3. Arcs from Z′_u to Z′_{u+1}
There are two kinds of arcs from Z′_u to Z′_{u+1}: E′_u and C′_u. C′_u is a single arc. E′_u contains u 3-arc crossing clusters, denoted as E′_{u,w}, 1 ≤ w ≤ u. For each cluster E′_{u,w}, its three arcs are denoted as E′_{u,w,1}, E′_{u,w,2} and E′_{u,w,3} with E′_{u,w,1} G E′_{u,w,2}, E′_{u,w,1} G E′_{u,w,3} and E′_{u,w,2} G E′_{u,w,3}.
Firstly, we ensure that E′_{u,∗,∗} and C′_u are from Z′_u to Z′_{u+1}:

Z′_u G E′_{u,∗,∗}, E′_{u,∗,∗} G Z′_{u+1}, 1 ≤ u ≤ ℓ − 1    (37)
Z′_u G C′_u, C′_u G Z′_{u+1}, 1 ≤ u ≤ ℓ − 1    (38)

Furthermore, we define the following orders:

A′_1 < E′_{1,∗,∗}, A′_2 G E′_{1,∗,∗}    (39)
E′_{u,w1,∗} G E′_{u,w2,∗}, 1 ≤ w1 < w2 ≤ u ≤ ℓ − 1    (40)
E′_{u,w,1} < E′_{u+1,w,∗}, E′_{u,w,2} G E′_{u+1,w,∗}, 1 ≤ w ≤ u < ℓ − 1    (41)
E′_{u,∗,∗} G C′_u, 1 ≤ u ≤ ℓ − 1    (42)
C′_{u−1} < E′_{u,u,∗}, 2 ≤ u ≤ ℓ − 1    (43)

E′_{1,∗,∗} (a 3-arc crossing cluster) is anchored by A′_1 and A′_2 (Eq. (39)). Eq. (40) ensures that the arcs in E′_{u,∗,∗} form a crossing cluster. Furthermore, Eq. (41) ensures that the 3-arc crossing cluster E′_{u+1,w,∗} is anchored by E′_{u,w,1} and E′_{u,w,2}. Eq. (42) means that the crossing cluster E′_{u,∗,∗} crosses the arc C′_u. Combining this with Eqs. (37) and (43), we have that the arc set E′_{u,u,∗} is anchored by C′_{u−1} and Z′_u.
Let C′ = C′_∗ and E′ = E′_{∗,∗,∗}.
4.3.4. Arcs from Z′_ℓ to T′
The arcs from Z′_ℓ to T′ are denoted as F′. F′ has ℓ crossing clusters, each of which contains two arcs. The crossing clusters are denoted as F′_w (1 ≤ w ≤ ℓ); the two arcs in F′_w are denoted as F′_{w,1} and F′_{w,2}, F′_{w,1} G F′_{w,2}. Furthermore, we have the following orders:

Z′_ℓ G F′_{∗,∗}, F′_{∗,∗} G T′    (44)
E′_{ℓ−1,w,1} < F′_{w,∗}, E′_{ℓ−1,w,2} G F′_{w,∗}, 1 ≤ w ≤ ℓ − 1    (45)
C′_{ℓ−1} < F′_{ℓ,∗}    (46)
F′_{w1,∗} G F′_{w2,∗}, 1 ≤ w1 < w2 ≤ ℓ    (47)

Eq. (44) ensures that F′_{∗,∗} is from Z′_ℓ to T′. F′_{w,∗} is anchored by E′_{ℓ−1,w,1} and E′_{ℓ−1,w,2} (Eq. (45)), and F′_{ℓ,∗} is anchored by C′_{ℓ−1} and Z′_ℓ (Eq. (46)). Furthermore, the arcs in F′_{∗,∗} form a crossing cluster by Eq. (47).
4.3.5. Arcs from T′ to V′_1 and from V′_u to V′_{u+1}
There are two kinds of arcs: I′_u (0 ≤ u < ℓ) and P′_u (0 ≤ u ≤ ℓ − 2). I′_u is a 3-arc crossing cluster, the three arcs being I′_{u,1}, I′_{u,2} and I′_{u,3}, where I′_{u,1} G I′_{u,2}, I′_{u,1} G I′_{u,3} and I′_{u,2} G I′_{u,3}. P′_u contains ℓ − u − 1 2-arc crossing clusters; each cluster is denoted as P′_{u,w} (u + 1 < w ≤ ℓ). Denote the two arcs of P′_{u,w} as P′_{u,w,1} and P′_{u,w,2}, P′_{u,w,1} G P′_{u,w,2}.
Firstly, we ensure that I′_0 and P′_{0,∗,∗} are crossed by T′; I′_u and P′_{u,∗,∗} are crossed by V′_u (1 ≤ u ≤ ℓ − 1); and I′_u and P′_{u,∗,∗} cross V′_{u+1}:

T′ G I′_0, T′ G P′_{0,∗,∗}    (48)
V′_u G I′_u, 1 ≤ u < ℓ    (49)
V′_u G P′_{u,∗,∗}, 1 ≤ u < ℓ − 1    (50)
I′_{u,∗} G V′_{u+1}, 0 ≤ u < ℓ    (51)
P′_{u,∗,∗} G V′_{u+1}, 0 ≤ u < ℓ − 1    (52)

Furthermore, the arcs in I′_u and P′_{u,∗} form a crossing cluster:

I′_u G P′_{u,∗}, 0 ≤ u < ℓ − 1    (53)
P′_{u,w1} G P′_{u,w2}, 0 ≤ u, u + 1 < w1 < w2 ≤ ℓ    (54)

Also, we introduce the following orders:

F′_{1,1} < I′_0, F′_{1,2} G I′_0    (55)
F′_{w,1} < P′_{0,w}, F′_{w,2} G P′_{0,w}, 2 ≤ w ≤ ℓ    (56)
P′_{u,u+2,1} < I′_{u+1}, P′_{u,u+2,2} G I′_{u+1}, 1 ≤ u < ℓ − 1    (57)
P′_{u,w,1} < P′_{u+1,w,∗}, P′_{u,w,2} G P′_{u+1,w,∗}, 1 ≤ u, u + 2 < w ≤ ℓ    (58)

Eq. (55) ensures that I′_0 is anchored by F′_{1,1} and F′_{1,2}, and Eq. (56) ensures that P′_{0,w} is anchored by F′_{w,1} and F′_{w,2}. I′_{u+1} is anchored by P′_{u,u+2,1} and P′_{u,u+2,2} (Eq. (57)), and P′_{u+1,w,∗} is anchored by P′_{u,w,1} and P′_{u,w,2} (Eq. (58)).
Let P′ = P′_{∗,∗,∗} and I′ = I′_∗. Let D_{n,ℓ} = H′ ∪ A′ ∪ C′ ∪ Z′ ∪ E′ ∪ F′ ∪ I′ ∪ P′ ∪ V′, and let S_{n,ℓ} be the endpoints of the arcs in D_{n,ℓ}. It is not difficult to verify the following result from the construction.
Lemma 9. CM(S_{n,ℓ}, D_{n,ℓ}) is a {<, G}-structured contact map, and CM(S_G, D_G) is a {<, ⊏, G}-structured contact map.

4.4. Correctness
According to the construction, we have the following results.

Lemma 10. If CM(S_{n,ℓ}, D_{n,ℓ}) occurs in CM(S_G, D_G), then for every M, M(H′) = H and M(A′) = A_{u1,∗} for some u1 with 1 ≤ u1 ≤ n.

Proof. H′ is a 28n⁴-arc crossing cluster, and the total number of arcs in D_G − H is less than 28n⁴ (Lemma 8). Thus for a mapping M, there exists h1 ∈ H′ such that M(h1) ∈ H. The arc M(h1) crosses and is crossed only by arcs in H ∪ A. Thus M(H′) ⊆ H ∪ A. On the other hand, there are 28n⁴ arcs crossing A′. Similarly, we argue that there exists h2 ∈ H′ such that M(h2) ∈ H and h2 crosses A′. We have M(A′) ⊆ H ∪ A since M(h2) crosses M(A′). Now we know that M(H′ ∪ A′) ⊆ H ∪ A. Note that A_u < A_v for 1 ≤ u < v ≤ n, which implies that A_u and A_v cannot occur simultaneously in a CCM. Thus, to form a (28n⁴ + 2)-arc crossing cluster, all the arcs in H and one pair of arcs in A, say A_{u1,∗} for some u1 (1 ≤ u1 ≤ n), must be used. Therefore the statement holds.

Lemma 11. If CM(S_{n,ℓ}, D_{n,ℓ}) occurs in CM(S_G, D_G), then for every M, M(E′_{1,1,∗}) = E_{u1,u2,u1,∗} and M(C′_1) = C_{u1,u2} for some u1, u2 with 1 ≤ u1 < u2 ≤ n.

Proof. From Lemma 10, if CM(S_{n,ℓ}, D_{n,ℓ}) occurs in CM(S_G, D_G), then M(A′) = A_{u1} for some u1. We know that A′_1 < E′_{1,1,∗} and A′_2 G E′_{1,1,∗}. Thus M(E′_{1,1,∗}) ⊆ E_{u1,∗,u1,∗} ∪ F_{u1,u1,∗}, since E_{u1,∗,u1,∗} ∪ F_{u1,u1,∗} contains all arcs which are greater than A_{u1,1} and are crossed by A_{u1,2}. We also note that E_{u1,v1,u1,∗} < E_{u1,v2,u1,∗} for v1 < v2, and E_{u1,∗,u1,∗} < F_{u1,u1,∗}, which implies that M(E′_{1,1,∗}) = E_{u1,u2,u1,∗} for some u2, or M(E′_{1,1,∗}) = F_{u1,u1,∗}. However, F_{u1,u1,∗} contains only two arcs, and thus M(E′_{1,1,∗}) = E_{u1,u2,u1,∗} for some u2.
E′_{1,1,∗} is crossed by Z′_1, which is a 5n³-arc crossing cluster. Thus there exists z1 ∈ Z′_1 with M(z1) ∈ Z_{u1}, since E_{u1,u2,u1,∗} is crossed by fewer than 2 × 5n³ arcs in total. Also, E′_{1,1,∗} crosses Z′_2, and Z′_2 is a 5n³-arc crossing cluster; by a similar argument, there exists z2 ∈ Z′_2 with M(z2) ∈ Z_{u2}. In CM(S_G, D_G), the only arc that satisfies the following three conditions is C_{u1,u2}: (1) it is greater than A_{u1,2}; (2) it is crossed by M(z1); and (3) it crosses M(z2). Thus M(C′_1) = C_{u1,u2}.
Lemma 12. If CM(S_{n,ℓ}, D_{n,ℓ}) occurs in CM(S_G, D_G) and M(E′_{1,1,∗}) = E_{u1,u2,u1,∗}, then M(E′_{2,v,∗}) = E_{u2,u3,uv,∗} and M(C′_v) = C_{uv,u_{v+1}} (v = 1, 2) for some u3 with u2 < u3 ≤ n.

Proof. Consider E′_{2,1,∗}. We know that E′_{1,1,1} < E′_{2,1,∗} and E′_{1,1,2} G E′_{2,1,∗}. Also note that E_{u2,v1,u1,∗} < E_{u2,v2,u1,∗} for u2 < v1 < v2 ≤ n, and F_{u2,1,∗} contains only two arcs. Hence M(E′_{2,1,∗}) = E_{u2,u3,u1,∗} for some u3.
Now we need to prove that M(E′_{2,2,∗}) = E_{u2,u3,u2,∗}. E′_{1,1,∗} crosses Z′_2, and Z′_2 is a 5n³-arc crossing cluster; thus there exists z2 ∈ Z′_2 with M(z2) ∈ Z_{u2}. By a similar argument for the arc set E′_{2,1,∗}, there exists z3 ∈ Z′_3 with M(z3) ∈ Z_{u3}. By Lemma 11, we know that M(C′_1) = C_{u1,u2}. In CM(S_G, D_G), in total there are four crossing arcs which are greater than C_{u1,u2}, are crossed by M(z2), and cross M(z3); these arcs are {C_{u2,u3}} ∪ E_{u2,u3,u2,∗}. In CM(S_{n,ℓ}, D_{n,ℓ}), in total there are four crossing arcs which are greater than C′_1, are crossed by z2, and cross z3; these arcs are {C′_2} ∪ E′_{2,2,∗}. Thus M(C′_2) = C_{u2,u3} and M(E′_{2,2,∗}) = E_{u2,u3,u2,∗}.
1 +1
, ( v1 =
1, . . . , k) for u1 < · · · < uk+1 ≤ n, then M (Ek0 +1,v2 ,∗ ) = Euk+1 ,uk+2 ,uv2 ,∗ and M (Cv0 2 ) = Cuv2 ,uv +1 (v2 = 1, . . . , k + 1) 2 for some uk+2 with uk+1 < uk+2 ≤ n. Lemma 13 can be shown using arguments similar to those in Lemmas 11 and 12. 0 0 Lemma 14. If CM (Sn,` , Dn,` ) occurs in CM (SG , DG ), then ∀M , M (E`− 1,v,∗ ) = Eu`−1 ,u` ,uv ,∗ and M (C`−1 ) = Cu`−1 ,u` with v = 1, . . . , ` − 1 and for some u1 , . . . , u` , 1 ≤ u1 < · · · < u` ≤ n.
Lemma 14 can be shown by induction, using Lemmas 10–13.

Lemma 15. If CM(S_{n,ℓ}, D_{n,ℓ}) occurs in CM(S_G, D_G), and if M(E′_{ℓ−1,v,∗}) = E_{u_{ℓ−1},uℓ,uv,∗} and M(C′_{ℓ−1}) = C_{u_{ℓ−1},uℓ} for u1, ..., uℓ with 1 ≤ u1 < ⋯ < uℓ ≤ n, then M(F′_{v,∗}) = F_{uℓ,uv,∗} (1 ≤ v ≤ ℓ).

Proof. Consider the arc set F′_{1,∗}. Since it is greater than E′_{ℓ−1,1,1} and is crossed by E′_{ℓ−1,1,2}, we have M(F′_{1,∗}) ⊆ F_{uℓ,u1,∗} ∪ E_{uℓ,∗,u1,∗} (if E_{uℓ,∗,u1,∗} is not defined, treat it as an empty set). As F′_{1,∗} crosses no fewer than 9n⁴ arcs in CM(S_{n,ℓ}, D_{n,ℓ}), while an arc in E_{uℓ,∗,u1,∗} crosses fewer than 9n⁴ arcs (Lemma 8), the only possible choice is M(F′_{1,∗}) = F_{uℓ,u1,∗}. The argument is similar for F′_{v,∗} with 2 ≤ v < ℓ. For the case of F′_{ℓ,∗}, note that it is greater than C′_{ℓ−1} and crosses a 9n⁴-arc crossing cluster, so the only choice is M(F′_{ℓ,∗}) = F_{uℓ,uℓ,∗}.
Lemma 16. If CM(S_{n,ℓ}, D_{n,ℓ}) occurs in CM(S_G, D_G), then for every M, M(I′_{0,∗}) = I_{0,u1,∗} and M(P′_{0,v,∗}) = P_{0,u1,uv,∗} (2 ≤ v ≤ ℓ) for some u1, ..., uℓ with 1 ≤ u1 < ⋯ < uℓ ≤ n.

Proof. By Lemma 15, given a mapping M, M(F′_{v,∗}) = F_{uℓ,uv,∗} for some u1, ..., uℓ with 1 ≤ u1 < ⋯ < uℓ ≤ n. The arc set I′_{0,∗} is greater than F′_{1,1} and is crossed by F′_{1,2}; therefore M(I′_{0,∗}) ⊆ I_{0,u1,∗} ∪ P_{0,∗,u1,∗}, where I′_{0,∗} is a 3-arc crossing cluster. The only 3-arc crossing cluster in I_{0,u1,∗} ∪ P_{0,∗,u1,∗} is I_{0,u1,∗}. Therefore M(I′_{0,∗}) = I_{0,u1,∗}.
Consider the arc set P′_{0,2,∗}: F′_{2,1} < P′_{0,2,∗} and F′_{2,2} G P′_{0,2,∗}. In order to satisfy these relations after applying the mapping M, we have M(P′_{0,2,∗}) ⊆ I_{0,u2,∗} ∪ P_{0,∗,u2,∗}. On the other hand, I′_0 crosses V′_1, which is a crossing cluster with 5n³ arcs. In total, I_{0,u1} crosses fewer than 2 × 5n³ arcs, so there exists v1 ∈ V′_1 with M(v1) ∈ V_{u1}. Then M(P′_{0,2,∗}) has to cross M(v1); the only possible pair of arcs in I_{0,u2,∗} ∪ P_{0,∗,u2,∗} which crosses M(v1) is P_{0,u1,u2,∗}. The same can be shown for 3 ≤ v ≤ ℓ.

The following can be shown similarly.

Lemma 17. If CM(S_{n,ℓ}, D_{n,ℓ}) occurs in CM(S_G, D_G), then for every M, M(I′_{w,∗}) = I_{uw,u_{w+1},∗} and M(P′_{w,v,∗}) = P_{uw,u_{w+1},uv,∗} (1 ≤ w < ℓ, w + 1 < v ≤ ℓ) for some u1, ..., uℓ with 1 ≤ u1 < ⋯ < uℓ ≤ n.
Then, by the construction of CM(S_G, D_G), we have:

Lemma 18. If CM(S_{n,ℓ}, D_{n,ℓ}) occurs in CM(S_G, D_G), then G has a size-ℓ clique.

Proof. By Lemma 17, if CM(S_{n,ℓ}, A_{n,ℓ}) occurs in CM(S_G, A_G), then M(I′_{w,∗}) = I_{u_w,u_{w+1},∗} and M(P′_{w,v,∗}) = P_{u_w,u_{w+1},u_v,∗} (1 ≤ w < ℓ, w + 1 < v ≤ ℓ) for some u_1, …, u_ℓ with 1 ≤ u_1 < ⋯ < u_ℓ ≤ n. By the construction of CM(S_G, A_G), P_{u_w,u_{w+1},u_v} ∈ A_G and P_{u_w,u_{w+1},u_v} ≠ ∅ iff (u_w, u_v) ∈ E_G, and I_{u_w,u_{w+1}} is not empty iff (u_w, u_{w+1}) ∈ E_G. Therefore u_1, …, u_ℓ forms a size-ℓ clique and the statement holds.
Finally, the following theorem can be shown:

Theorem 19. CM(S_{n,ℓ}, D_{n,ℓ}) occurs in CM(S_G, D_G) if and only if G contains a clique of size ℓ; hence the CCMPM problem is NP-hard.

Proof. The 'only if' case has already been shown. For the 'if' case, suppose there is a clique u_1, …, u_ℓ; a mapping M can then be constructed straightforwardly between CM(S_{n,ℓ}, A_{n,ℓ}) and a subset of arcs in CM(S_G, A_G). The reduction is polynomial; thus the statement holds.
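The 'if' direction rests on checking a clique certificate, which is a simple polynomial-time test. As an illustrative aside (the function and names below are ours, not from the paper), verifying that given vertices form a clique can be sketched as:

```python
# Illustrative helper (not from the paper): verify that the vertices in
# `vertices` form a clique in an undirected graph given as an edge set.
from itertools import combinations

def is_clique(vertices, edges):
    """True iff every pair of distinct vertices is joined by an edge."""
    norm = {frozenset(e) for e in edges}          # ignore edge orientation
    return all(frozenset((u, v)) in norm for u, v in combinations(vertices, 2))

# A triangle {1, 2, 3} plus a pendant vertex 4:
E = {(1, 2), (2, 3), (1, 3), (3, 4)}
```

Here `is_clique([1, 2, 3], E)` holds while `is_clique([1, 2, 4], E)` does not, since (1, 4) is not an edge.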
S.C. Li, M. Li / Theoretical Computer Science 410 (2009) 2410–2423
Fig. 7. An example demonstrating the flaw of the algorithm: (a) A {G, <}-structured CM as the pattern. (b) The target CM.
Note that we have shown a stronger result: the problem is NP-hard even when the target is a {<, <, G}-structured contact map (in general, arcs in a target can share endpoints). The maximum contact map overlap (CMO) problem with {<, G}-structured patterns is to find a maximum common CCM between two given contact maps. The complexity of this problem was an open question [6]. We now show that the problem is NP-hard using Theorem 19.

Theorem 20. The CMO problem is NP-hard.

Proof. Given a CCMPM problem instance with pattern CM(S_p, D_p) and target CM(S, D), find the maximum common CCM CM(S′_p, D′_p) between CM(S_p, D_p) and CM(S, D), and then verify whether CM(S′_p, D′_p) is identical to CM(S_p, D_p). Clearly this reduction is polynomial; thus the theorem holds.

5. Counterexample for the algorithm in [6,7]

In this section, we present a counterexample for the algorithm in [6,7]. The example is displayed in Fig. 7. The arcs are labeled with letters instead of numbers for ease of illustration. The pattern is a CCM with 24 arcs, while the target contains 42 arcs and is {<, <, G}-structured. The arcs are labeled so that an arc of the pattern is intended to map to the arc of the target labeled with the same letter in the opposite case. It can be verified that the pattern does not occur in the target, but the algorithm in [6,7] produces a 'yes' answer.

References

[1] Guillaume Blin, Guillaume Fertin, Stéphane Vialette, New results for the 2-interval pattern problem, in: Proc. 15th Annual Combinatorial Pattern Matching Symposium, CPM 2004, in: Lecture Notes in Computer Science, vol. 3109, Springer-Verlag, 2004, pp. 311–322.
[2] Erdong Chen, Linji Yang, Hao Yuan, Improved algorithms for largest cardinality 2-interval pattern problem, J. Comb. Optim. 13 (April) (2007) 263–275.
[3] Eugene Davydov, Serafim Batzoglou, A computational model for RNA multiple structural alignment, Theoret. Comput. Sci. 368 (3) (2006) 205–216.
[4] Michael R. Garey, David S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman & Company, 1979.
[5] Deborah Goldman, Christos H. Papadimitriou, Sorin Istrail, Algorithmic aspects of protein structure similarity, in: FOCS'99: Proceedings of the 40th Annual Symposium on Foundations of Computer Science, IEEE Computer Society, Washington, DC, USA, 1999, p. 512.
[6] Jens Gramm, A polynomial-time algorithm for the matching of crossing contact-map patterns, in: WABI, 2004, pp. 38–49.
[7] Jens Gramm, A polynomial-time algorithm for the matching of crossing contact-map patterns, IEEE/ACM Trans. Comput. Biol. Bioinformatics 1 (4) (2004) 171–180.
[8] Stéphane Vialette, On the computational complexity of 2-interval pattern matching problems, Theoret. Comput. Sci. 312 (2–3) (2004) 223–249.
Theoretical Computer Science 410 (2009) 2424–2430
On the Hopcroft’s minimization technique for DFA and DFCAI Andrei Păun a,b,c,∗ , Mihaela Păun d,e , Alfonso Rodríguez-Patón b a
Bioinformatics Department, National Institute of Research and Development for Biological Sciences, Splaiul Independenţei, Nr. 296, Sector 6, Bucharest, Romania
b
Universidad Politécnica de Madrid - UPM, Facultad de Informática, Campus de Montegancedo S/N, Boadilla del Monte, 28660 Madrid, Spain
c
Department of Computer Science/IfM, Louisiana Tech University, P.O. Box 10137, Ruston, LA 71272, USA
d
Faculty of Mathematics and Informatics, Spiru Haret University, Bucharest, Romania
e
Department of Mathematics and Statistics, Louisiana Tech University, P.O. Box 10348, Ruston, LA 71272, USA
Article info

Keywords: Cover automata; Minimization; Hopcroft's algorithm; Time complexity

Abstract

We show that the absolute worst-case time complexity of Hopcroft's minimization algorithm applied to unary languages is reached only for deterministic automata or cover automata following the structure of the de Bruijn words. A previous paper by Berstel and Carton gave the example of de Bruijn words as a language that requires O(n log n) steps in the case of deterministic automata, by carefully choosing the splitting sets and processing the list of splitting sets in FIFO order. We refine that result by showing that the Berstel/Carton example actually attains the absolute worst-case time complexity for unary languages and deterministic automata. We show that the same result also holds for cover automata and an algorithm based on Hopcroft's method for the minimization of cover automata. We also show that a LIFO implementation of the splitting list does not reach this absolute worst-case complexity for unary languages, either for regular deterministic finite automata or for the deterministic finite cover automata defined by S. Yu. Published by Elsevier B.V.
1. Introduction

This work is, in part, a continuation of the result reported by Berstel and Carton in [2] regarding the number of steps required for minimizing a unary language through Hopcroft's minimization technique. The second part of the paper considers the same problem in the setting of cover automata. This type of automaton was introduced by Prof. Dr. Sheng Yu in [6] and has since been investigated by several authors, such as in [4,7,13,14,17]. The notion proved to be one of the highest-impact contributions of Prof. Dr. Sheng Yu to finite automata theory, together with his work on the operational complexity of finite automata reported in [5,19–21] and [11,16,17]. Many of these results and algorithms, as well as established algorithms, have been implemented in the Grail+ project [22]. Dr. Yu also has high-impact results in several other areas of Computer Science.

In the first part of the paper we analyze the result by Berstel and Carton from [2] in more depth. In [3] it was shown that Hopcroft's minimization algorithm requires O(n log n) steps for an automaton having a structure based on de Bruijn words, if implementation decisions are ''bad''. The setting of the paper [2] is for languages over a unary alphabet, considering the input
✩ Work supported in part by NSF CCF-0523572, grants from the National Plan of R&D II of Romania (CNCSIS RP-13/2007, CNCSIS RP-5/Jan.2008, and CNMP PC-1284), the Spanish Ministry of Science and Education (MEC) under project TIN2006-15595, and support from the Comunidad de Madrid (grant No. CCG07-UPM/TIC-0386 to the LIA research group).
∗ Corresponding author at: Bioinformatics Department, National Institute of Research and Development for Biological Sciences, Splaiul Independenţei, Nr. 296, Sector 6, Bucharest, Romania. Tel.: +40 318 257 5135; fax: +40 318 257 5104.
E-mail addresses: [email protected] (A. Păun), [email protected] (M. Păun), [email protected] (A. Rodríguez-Patón).
0304-3975/$ – see front matter. Published by Elsevier B.V. doi:10.1016/j.tcs.2009.02.034
A. Păun et al. / Theoretical Computer Science 410 (2009) 2424–2430
languages having the number of states a power of 2 and choosing ''in a specific way'' which set becomes a splitting set in the case of ties. In this context, the previous paper showed that one needs O(n log n) steps for the algorithm to complete, reaching the theoretical asymptotic worst-case time complexity of the algorithm reported in papers such as [8–10,12]. We were initially interested in investigating further the complexity of the algorithm described by Hopcroft, specifically in the setting of unary languages, but for a stack implementation in the algorithm. Our effort has led to the observation that when considering the worst case for the number of steps of the algorithm (which in this case translates to the largest number of states appearing in the splitting sets), a LIFO implementation indeed outperforms a FIFO strategy, as suggested by the experimental results on random automata reported in [1].

One major observation/clarification is needed: we do not consider the asymptotic complexity of the run-time, but the actual number of steps. For the setting of the current paper, when comparing n log n steps with n log(n − 1) or 2n log n steps, we will say that n log n is worse than both n log(n − 1) and 2n log n, even though in the framework of asymptotic (big-O) complexity they are the same, i.e., n log(n − 1) ∈ Θ(n log n) and 2n log n ∈ Θ(n log n).

In Section 2 we give some definitions, notations and previous results; in Section 3 we give a brief description of the algorithm discussed and its features. Section 4 describes the properties of the automaton that reaches the worst possible case in terms of the number of steps required by the algorithm (as a function of the initial number of states of the automaton).
We then briefly touch upon the case of cover automata minimization with a modified version of Hopcroft's algorithm in Section 5, and conclude with some final remarks in Section 6.

2. Preliminaries

We assume the reader is familiar with the basic notions of formal languages and finite automata — see for example the excellent work by Yu [18] or Salomaa [15]. In the following we denote the cardinality of a finite set T by |T|, the set of words over a finite alphabet Σ by Σ∗, and the empty word by λ. The length of a word w ∈ Σ∗ is denoted by |w|. For l ≥ 0 we define the sets of words Σ^l = {w ∈ Σ∗ | |w| = l} and Σ^{≤l} = ⋃_{i=0}^{l} Σ^i, and for l > 0 we define Σ^{<l} = ⋃_{i=0}^{l−1} Σ^i.
A deterministic finite automaton (DFA) is a quintuple A = (Σ, Q, δ, q0, F) where Σ is a finite set of symbols, Q is a finite set of states, δ : Q × Σ → Q is the transition function, q0 is the start state, and F is the set of final states. We can extend δ from Q × Σ to Q × Σ∗ by δ(s, λ) = s and δ(s, aw) = δ(δ(s, a), w). We will usually denote this extension of δ simply by δ when no confusion arises. The language recognized by the automaton A is L(A) = {w ∈ Σ∗ | δ(q0, w) ∈ F}. In what follows we assume that δ is a total function, i.e., the deterministic automaton is also complete. For a DFA A = (Σ, Q, δ, q0, F) we can always assume, without loss of generality, that Q = {0, 1, …, |Q| − 1} and q0 = 0; we will use this convention whenever it simplifies our notation. If L = L(A) is finite and A is complete, there is at least one state, called the sink state or dead state, for which δ(sink, w) ∉ F for any w ∈ Σ∗. If L is a finite language, we denote by l the length of the longest word(s) in L.

We recall, in the following, some basic definitions and properties of cover languages and automata from [4] and [6]. We refer the interested reader to the aforementioned papers for more properties and proofs of the results recalled below.

Definition 1 (Cover Language). A language L′ over Σ is called a cover language for the finite language L if L′ ∩ Σ^{≤l} = L. A deterministic finite cover automaton (DFCA) for L is a deterministic finite automaton A such that the language accepted by A is a cover language of L.

Definition 2 (State Equivalence). Let A = (Σ, Q, δ, 0, F) be a DFA. We say that p ≡_A q (state p is equivalent to q in A) if for every w ∈ Σ∗, δ(p, w) ∈ F iff δ(q, w) ∈ F.

Definition 3 (Level). Let A = (Σ, Q, δ, 0, F) be a DFA (or a DFCA for L). We define, for each state q ∈ Q, level(q) = min{|w| | δ(0, w) = q}.
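As a minimal illustration of the DFA definition and the extended transition function (a sketch with names of our choosing, not from the paper):

```python
# Illustrative sketch: a complete DFA with the extended transition function
# delta(q, w) and the membership test for L(A). All names here are ours.
class DFA:
    def __init__(self, sigma, states, delta, q0, finals):
        self.sigma = sigma    # finite alphabet
        self.states = states  # Q, conventionally {0, 1, ..., |Q|-1}
        self.delta = delta    # total function as dict: (state, symbol) -> state
        self.q0 = q0          # start state
        self.finals = finals  # F, the set of final states

    def run(self, word):
        """Extended delta: delta(s, lambda) = s, delta(s, aw) = delta(delta(s, a), w)."""
        q = self.q0
        for a in word:
            q = self.delta[(q, a)]
        return q

    def accepts(self, word):
        """w is in L(A) iff delta(q0, w) is in F."""
        return self.run(word) in self.finals

# Example: unary DFA accepting exactly the words of even length.
even = DFA({'a'}, {0, 1}, {(0, 'a'): 1, (1, 'a'): 0}, 0, {0})
```

With this automaton, `even.accepts('aa')` holds while `even.accepts('a')` does not.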
The right language of a state p ∈ Q of a DFCA A = (Σ, Q, δ, q0, F) for L is R_p = {w | δ(p, w) ∈ F, |w| ≤ l − level_A(p)}.

Definition 4 (Word Similarity). Let x, y ∈ Σ∗. We define the similarity relation by: x ∼_L y if for all z ∈ Σ∗ such that xz, yz ∈ Σ^{≤l}, xz ∈ L iff yz ∈ L; we write x ≁_L y if x ∼_L y does not hold.

Definition 5 (State Similarity). Let A = (Σ, Q, δ, 0, F) be a DFCA for L. Consider two states p, q ∈ Q and let m = max{level(p), level(q)}. We say that p is similar to q in A, denoted p ∼_A q, if for every w ∈ Σ^{≤l−m}, δ(p, w) ∈ F iff δ(q, w) ∈ F. Two states are dissimilar if they are not similar. When the automaton is understood, we may omit the subscript A.

Lemma 6. Let A = (Σ, Q, δ, 0, F) be a DFCA for a finite language L. Let p, q ∈ Q, with level(p) = i, level(q) = j, and m = max{i, j}. If p ∼_A q, then R_p ∩ Σ^{≤l−m} = R_q ∩ Σ^{≤l−m}.

Definition 7. A DFCA A for a finite language is a minimal DFCA if and only if any two distinct states of A are dissimilar.

Once two states have been detected as similar, one can merge the one with the higher level into the one with the smaller level by redirecting transitions. We refer the interested reader to [6] for the merging theorem and other properties of cover automata.
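The cover-language condition L(A) ∩ Σ^{≤l} = L can be checked by brute force over all words of length at most l. A small sketch (the function name and representation are our assumptions):

```python
# Illustrative sketch: check that a DFA (given as an acceptance predicate) is a
# DFCA for a finite language L, i.e. L(A) and L agree on all words of length <= l,
# where l is the length of the longest word in L.
from itertools import product

def is_dfca(dfa_accepts, sigma, finite_L):
    """dfa_accepts: word -> bool; finite_L: nonempty set of words over sigma."""
    l = max(len(w) for w in finite_L)
    for n in range(l + 1):
        for tup in product(sorted(sigma), repeat=n):
            w = ''.join(tup)
            if dfa_accepts(w) != (w in finite_L):
                return False  # disagreement on a word of length <= l
    return True
```

For example, the even-length unary language is a cover language for L = {λ, aa} (l = 2), but not for L = {λ, a}.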
3. Hopcroft's state minimization algorithm

In [10] an elegant algorithm for the state minimization of DFAs was described. This algorithm was proven to run in O(n log n) time in the worst case (asymptotic evaluation). We study the complexity of the algorithm further by considering the various implementation choices left to the programmer by the author of the algorithm. We will show that by implementing the list of splitting sets as a queue, one is able to reach the absolute worst possible case with respect to the number of steps required for minimizing an automaton. We will also show that by changing the implementation strategy from a queue to a stack, one can never reach that absolute worst case; thus, at least in this respect, programmers should implement the list S of the following algorithm with a LIFO strategy. The algorithm uses a special (union-find type) data structure that makes the set operations of the algorithm fast. We give in the following the description of the algorithm for an arbitrary alphabet Σ and a DFA A = (Σ, Q, δ, q0, F); later we will restrict the discussion to the case of unary languages.

1: P = {F, Q − F}
2: for all a ∈ Σ do
3:   Add((min(F, Q − F), a), S) (min w.r.t. the number of states)
4: while S ≠ ∅ do
5:   get (C, a) from S (we extract (C, a) according to the strategy associated with the list S: FIFO/LIFO/...)
6:   for each B ∈ P that is split by (C, a) do
7:     B′, B″ are the sets resulting from splitting B w.r.t. (C, a)
8:     Replace B in P with both B′ and B″
9:     for all b ∈ Σ do
10:      if (B, b) ∈ S then
11:        Replace (B, b) in S by both (B′, b) and (B″, b)
12:      else
13:        Add((min(B′, B″), b), S)

Here the splitting of a set B by the pair (C, a) (line 6) means that δ(B, a) ∩ C ≠ ∅ and δ(B, a) ∩ (Q − C) ≠ ∅, where by δ(B, a) we denote the set {q | q = δ(p, a), p ∈ B}.
The sets B′ and B″ from line 7 are defined as the following two subsets of B: B′ = {b ∈ B | δ(b, a) ∈ C} and B″ = B − B′.

It is useful to explain briefly the working of the algorithm: we start with the partition P = {F, Q − F}, and one of these two sets is added to the splitting sequence S. The algorithm proceeds by breaking the partition into smaller sets according to the current splitting set retrieved from S. With each splitting of a set in P, the number of sets stored in S grows (either through instruction 11 or instruction 13). When all the splitting sets from S have been processed and S becomes empty, the partition P shows the state equivalences in the input automaton: all the states contained in the same set B of P are equivalent. Knowing all equivalences, one can easily minimize the automaton by merging all the states found in the same set of the final partition P.

We note that there are three levels of ''nondeterminism'' in the implementation of the algorithm. All three choices influence the number of steps that the algorithm performs for a given input automaton. We describe first the three implementation choices, and later we show the worst-case scenario for each of them. The ''most visible'' implementation choice is the strategy for processing the list stored in S: as a queue, as a stack, etc. The second and third choices appear when a set B is split into B′ and B″. If B is not present in S, then the algorithm chooses which of B′ or B″ is added to S, a choice based on the minimal number of states in the two sets (line 13). When B′ and B″ have the same number of states, we have the second implementation choice (choosing which of the two sets is added to S).
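The pseudocode above can be rendered in Python as follows. This is an illustrative sketch, not the authors' implementation: it uses plain Python sets and a precomputed inverse-transition table in place of the union-find structure the text mentions, processes S in FIFO order (replace `popleft()` by `pop()` for LIFO), and realizes the line 11 replacement by removing (B, b) and appending the two halves:

```python
# Sketch of Hopcroft's minimization, following the pseudocode's line numbers.
from collections import deque

def hopcroft(sigma, states, delta, finals):
    """Return the partition of `states` into equivalence classes.
    delta is a total dict (state, symbol) -> state."""
    F = frozenset(finals)
    NF = frozenset(states) - F
    P = {F, NF} - {frozenset()}                     # line 1
    S = deque()
    for a in sigma:                                 # lines 2-3
        S.append((min(F, NF, key=len), a))
    inv = {}                                        # inverse transitions
    for (q, a), r in delta.items():
        inv.setdefault((r, a), set()).add(q)
    while S:                                        # line 4
        C, a = S.popleft()                          # line 5 (FIFO strategy)
        pre = set()                                 # states sent into C by a
        for r in C:
            pre |= inv.get((r, a), set())
        for B in [B for B in P if pre & B and B - pre]:   # line 6: B is split
            B1, B2 = frozenset(B & pre), frozenset(B - pre)   # line 7
            P.remove(B)                             # line 8
            P.update({B1, B2})
            for b in sigma:                         # lines 9-13
                if (B, b) in S:
                    S.remove((B, b))                # line 11
                    S.extend([(B1, b), (B2, b)])
                else:
                    S.append((min(B1, B2, key=len), b))   # line 13
    return P
```

For the 4-state unary cycle with finals {0, 2}, the result is the two classes {0, 2} and {1, 3}, so the minimal DFA has two states.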
The third choice appears when the split set (B, a) is in the list S; the algorithm then mentions the replacement of (B, a) by (B′, a) and (B″, a) (line 11). This is actually implemented in the following way: (B″, a) replaces (B, a) and (B′, a) is added to the list S (or vice versa). Since the processing strategy of S matters, the choice of which of B′ or B″ is added to S and which one replaces the previous location of (B, a) also matters for the actual run-time of the algorithm.

In the original paper [10], and later in [8] and [12], when describing the complexity of the minimization method, the authors showed that the run-time is governed by the number of states that appear in the sets processed through S. Intuitively, this is the reason why the smaller of B′ and B″ is inserted in S at line 13, and this is what makes the algorithm sub-quadratic. In the following we focus on exactly this quantity: the number of states in the sets inserted in S.

4. Worst case scenario for unary languages

Let us start the discussion with several observations and preliminary clarifications: we are discussing languages over a unary alphabet. To make the proof easier, we restrict our discussion to the automata having the number of states a
power of 2. The three levels of implementation choices are set in the following way: we assume that the processing of S is based on a FIFO and that there is a strategy of choosing between two sets that have just been split. If the two sets have the same number of elements, the insertion strategy in the queue S makes the third implementation nondeterminism irrelevant; in other words, no splitting of a set already in S will take place (line 11 will not be executed).

Let us assume that such an automaton with 2^n states is given as input to the minimization algorithm described in the previous section. We note that since we have only one letter in the alphabet, the pairs (C, a) from the list S can be written without any problems as C; thus the list S (for the particular case of unary languages) becomes a list of sets of states. So let us assume that the automaton A = ({a}, Q, δ, q0, F) is given as the input of the algorithm, where |Q| = 2^n. The algorithm proceeds by choosing the first splitter set to be added to S. The first such set is chosen between F and Q − F based on their numbers of states. Since we are interested in the worst-case scenario for the algorithm, and the algorithm's run-time is influenced by the total number of states that appear in the list S throughout the run (as shown in [10,8,12] and mentioned in [2]), it is clear that we want to maximize the sizes of the sets that are added to S. It is time to give a lemma that will be useful in the following.

Lemma 8. For deterministic automata over unary languages, if a set R with |R| = m is the current splitter set, then R cannot add to the list S sets containing more than m states in total (line 13).

Proof. The statement of the lemma can be reformulated as follows: for all B ∈ P such that δ(B, a) ∩ R ≠ ∅ and δ(B, a) ∩ (Q − R) ≠ ∅, define the sets B′ = {q ∈ B | δ(q, a) ∈ R} and B″ = {q ∈ B | δ(q, a) ∈ Q − R}. Then Σ_{B_i ∈ P splittable} min(|B′_i|, |B″_i|) ≤ m.
We have only one letter in the alphabet, thus the number of states q such that δ(q, a) ∈ R is at most m. Each element min(|B′_i|, |B″_i|) of the summation is the size of the smaller of the two sets obtained when splitting B_i, thus min(|B′_i|, |B″_i|) ≤ |δ(B_i, a) ∩ R|, which implies that Σ_i min(|B′_i|, |B″_i|) ≤ Σ_i |δ(B_i, a) ∩ R| = |(⋃_i δ(B_i, a)) ∩ R| ≤ |R| (because all the B_i are disjoint). Thus we have proved that if we split according to a set R, the new sets added to S contain at most |R| states in total.

Coming back to our previous setting, we start with the automaton A = ({a}, Q, δ, q0, F), |Q| = 2^n, given as input to the algorithm, and we have to find the smaller of the sets F and Q − F. In the worst case (according to Lemma 8) we have |F| = |Q − F|: otherwise, fewer than 2^{n−1} states are contained in the set added to S, and thus fewer states are contained in the sets added to S in the second stage of the algorithm, and so on. So in the worst case the number of final states equals the number of non-final states.

To simplify the discussion we introduce some notation. For w = w_1 … w_n, the set S_w is defined as S_w = {q ∈ Q | for all 1 ≤ i ≤ |w|: δ(q, a^{i−1}) ∈ F if w_i = 1, and δ(q, a^{i−1}) ∉ F if w_i = 0}, where δ(p, a^0) denotes p. For example, S_1 = F, S_110 contains all the final states that are followed by a final state and then by a non-final state, and S_00000 denotes the states that are non-final and are followed in the automaton by four more non-final states. With this notation, at the initial step of the algorithm either F = S_1 or Q − F = S_0 can be added to S, as they have the same number of states. Whichever is added to the queue S will, in the worst-case scenario, split the partition P into the four possible sets S_00, S_01, S_10, S_11, each with 2^{n−2} states.
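The sets S_w can be computed directly from this definition. A sketch for the unary case (the successor-list representation is our choice, not the paper's), using as data the 8-state cyclic automaton of Fig. 1 below (de Bruijn word 11101000):

```python
# Illustrative computation of the sets S_w defined above, for a unary DFA given
# as a successor list `succ` and a set of final states `finals`.
def s_w(w, succ, finals):
    """States whose next |w| finality bits (1 = final, 0 = non-final) spell w."""
    result = set()
    for q in range(len(succ)):
        p, ok = q, True
        for bit in w:
            if (p in finals) != (bit == '1'):
                ok = False
                break
            p = succ[p]          # advance one transition
        if ok:
            result.add(q)
    return result

# The 8-state cyclic automaton for the de Bruijn word 11101000:
succ = [1, 2, 3, 4, 5, 6, 7, 0]
finals = {0, 1, 2, 4}
```

For this automaton S_1 = F = {0, 1, 2, 4}, |S_11| = 2 = 2^{3−2}, and every S_w with |w| = 3 is a singleton, matching the worst-case discussion.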
This is true because, by splitting the sets F and Q − F into sets with sizes other than 2^{n−2}, according to Lemma 8 we would not reach the worst possible number of states in the queue S, while splitting only F or only Q − F would add to S only one set of 2^{n−2} states instead of two. All this means that half of the non-final states go to a final state (|S_01| = 2^{n−2}) and the other half go to a non-final state (S_00); similarly, 2^{n−2} of the final states go to a final state (S_11) and the other half go to a non-final state. The partition at this step 1 of the algorithm is P = {S_00, S_01, S_10, S_11}, and the splitting sets are one of S_00, S_01 and one of S_10, S_11. Let us assume that it is possible to choose the splitting sets to be added to the queue S in such a way that no splitting of another set in S happens (choose in this case, for example, S_10 and S_00). We want to avoid splitting other sets in S, since if that happens, smaller sets are added to the queue S by the split set in S (such a choice of splitters is also described in [2]).

We have arrived at step 2 of the processing: when these two sets from S are processed, in the worst case each of them can add to the queue S at most 2^{n−2} states, by splitting two of the four current sets in the partition P. Of course, to reach the worst case we need them to split different sets; thus we obtain eight sets in the partition P, corresponding to all the binary words of length 3: P = {S_000, S_001, S_010, S_011, S_100, S_101, S_110, S_111}, each having 2^{n−3} states. Four of these sets are then added to the queue S. We can continue this reasoning up to the i-th step of the algorithm: we now have 2^{i−1} sets in the queue S, each having 2^{n−i} states, and the partition P contains 2^i sets S_w corresponding to all the words w of length i.
Each of the sets in the splitting queue is of the form S_{x_1 x_2 … x_i}; such a set can split at most the two sets S_{0 x_1 … x_{i−1}} and S_{1 x_1 … x_{i−1}} from the partition P. In the worst case, at iteration i, none of the sets in the splitting queue splits a set already in the queue, and each splits two distinct sets in the partition P, making the partition at step i + 1 the set P = {S_w | |w| = i + 1}, each such S_w having exactly 2^{n−i−1} states. In this way the process continues until we arrive at the n-th step; if the process terminated before step n, we would of course not reach the worst possible number of states passing through S.
Fig. 1. A cyclic automaton of size 8 for the de Bruijn word 11101000, containing all words of size 3 over the binary alphabet {0, 1} as subwords.
Let us now examine the properties of an automaton that would follow such a processing through Hopcroft's algorithm. We started with 2^n states, out of which 2^{n−1} are final and 2^{n−1} are non-final; out of the final states, 2^{n−2} precede another final state (S_11), and likewise 2^{n−2} non-final states precede other non-final states (S_00), and so on. The strongest restrictions come from the final partition sets S_w with |w| = n, which each have exactly one element; this means that all the words of length n over the binary alphabet can be found in this automaton by following the transitions between states, reading 1 for a final state and 0 for a non-final state. It is clear that the automaton needs to be circular and to follow the pattern of de Bruijn words [3]. Such an automaton for n = 3 was depicted in [2]; see Fig. 1.

It is now easy to see that a stack implementation of the list S cannot reach the maximum, as smaller sets are processed before larger sets are considered. This leads to the splitting of sets already in the list S. Once this happens for a set with j states, the number of states that will pass through S decreases by at least j, because the split sets cannot add as many states as a FIFO implementation could. We conjecture that in such a setting the LIFO strategy could make the algorithm linear with respect to the size of the input, if the aforementioned third level of implementation choice is set to add the smaller of B′, B″ to the stack and to replace B by the larger one. We have proved the following result:

Theorem 9. The absolute worst-case run-time complexity of Hopcroft's minimization algorithm for unary languages is reached when the splitter list S in the algorithm follows a FIFO strategy, and only for automata having a structure induced by de Bruijn words of size n.
In that setting the algorithm passes through the queue S exactly n·2^{n−1} states for an input automaton of size 2^n; thus for an input automaton with m states we have exactly (m log₂ m)/2 states passing through S.

Proof. By the previous discussion we know that the absolute maximum for the complexity of Hopcroft's algorithm is reached in the case of the FIFO strategy for the splitter list S, the maximum being attained when the input automaton follows the de Bruijn words over a binary alphabet. What remains to be proven is the actual number of states that pass through the queue S: in the first stage, exactly half of all states are added to S through one of the sets S_0 or S_1; in the second stage, again half of the states are added to S through two of the four sets S_00, S_01, S_10, S_11; at the third stage, again half of the states are added to S, because four of the sets S_000, S_001, S_010, S_011, S_100, S_101, S_110, S_111 are added to S, each having exactly 2^{n−3} states. We continue this process until the last stage of the algorithm, stage n, when still 2^{n−1} states are added to S, through the fact that exactly 2^{n−1} sets, each containing exactly one state, are added to the splitting queue. Of course, at this stage we have the partitioning into {S_w | |w| = n}, and half of these sets are added to S through the instruction at line 13. It should now be clear that we have exactly n stages in this execution of the algorithm, each with 2^{n−1} states added to S; hence the result.

5. Cover automata

In this section we discuss briefly an extension of Hopcroft's algorithm to cover automata. Körner reported at CIAA'02, and also in [13], a modification of Hopcroft's algorithm such that the resulting sets in the partition P give the similarities between states with respect to the input finite language L.
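The worst-case automaton of Theorem 9 can be built from any binary de Bruijn word. The following sketch is our construction (using the standard Lyndon-word generation of de Bruijn sequences rather than the specific word of Fig. 1); it checks the property the proof relies on, namely that every word of length n is spelled by exactly one state:

```python
# Illustrative: build the cyclic unary automaton induced by a binary de Bruijn
# word of order n (as in Fig. 1) and verify that each S_w with |w| = n is a
# singleton, i.e. every length-n binary word is spelled by exactly one state.
from itertools import product

def de_bruijn(n):
    """Binary de Bruijn sequence of order n (classic Lyndon-word construction)."""
    seq, a = [], [0] * (n + 1)
    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, 2):
                a[t] = j
                db(t + 1, t)
    db(1, 1)
    return ''.join(map(str, seq))          # length 2**n

def cyclic_automaton(word):
    """States 0..len-1 in a cycle; state i is final iff word[i] == '1'."""
    m = len(word)
    succ = [(i + 1) % m for i in range(m)]
    finals = {i for i in range(m) if word[i] == '1'}
    return succ, finals

def trajectory(q, succ, finals, n):
    """The length-n word spelled by the finality bits along the run from q."""
    bits = []
    for _ in range(n):
        bits.append('1' if q in finals else '0')
        q = succ[q]
    return ''.join(bits)

n = 4
succ, finals = cyclic_automaton(de_bruijn(n))
# every binary word of length n is spelled by exactly one state
spelled = [trajectory(q, succ, finals, n) for q in range(2 ** n)]
assert sorted(spelled) == sorted(''.join(p) for p in product('01', repeat=n))
```

Any rotation or complement of the de Bruijn word (such as the 11101000 of Fig. 1) yields an automaton with the same property.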
To achieve this, the algorithm is modified as follows: each state has its level computed at the start of the algorithm, and each element added to the list S has three components: the set of states, the alphabet letter, and the current length considered. We start with (F, a, 0), for example. The splitting of a set B by (C, a, l1) is defined as before, with the extra condition that during the splitting we ignore the states whose level plus l1 exceeds l. Formally, we can define the sets X = {p | δ(p, a) ∈ C, level(p) + l1 ≤ l} and Y = {p | δ(p, a) ∉ C, level(p) + l1 ≤ l}. Then a set B is split only if B ∩ X ≠ ∅ and B ∩ Y ≠ ∅.
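The extra bookkeeping described above, state levels and the level-restricted splitter sets X and Y, can be sketched as follows (unary case, successor-list representation; the function names are ours):

```python
# Illustrative sketch of the DFCA variant's bookkeeping: levels computed up
# front, and the level-restricted splitter sets X, Y for a triple (C, a, l1)
# with longest-word bound l.
from collections import deque

def levels(succ, q0):
    """level(q) = length of the shortest word reaching q (BFS from q0)."""
    lev = {q0: 0}
    todo = deque([q0])
    while todo:
        q = todo.popleft()
        r = succ[q]
        if r not in lev:
            lev[r] = lev[q] + 1
            todo.append(r)
    return lev

def split_sets(B, C, succ, lev, l1, l):
    """X, Y as defined above; states with level(p) + l1 > l are ignored."""
    X = {p for p in B if succ[p] in C and lev[p] + l1 <= l}
    Y = {p for p in B if succ[p] not in C and lev[p] + l1 <= l}
    return X, Y
```

On the 4-state cycle with start state 0, for example, C = {0}, l1 = 1 and l = 3 give X = ∅ (state 3 is excluded by its level), so B = {0, 1, 2, 3} is not split by this triple.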
The actual splitting of B ignores the states that have levels greater than or equal to l − l1. This adds another degree of nondeterminism to the implementation of the algorithm when such states appear, because the programmer can choose to add them to either of the two split sets obtained from B. The worst implementation choice would be to distribute the states with level greater than l − l1 so that they balance the number of states in B′ and B″ (where B′ = X ∪ Z′ and B″ = Y ∪ Z″, with Z′ ∩ Z″ = ∅ and Z′ ∪ Z″ = B − (X ∪ Y) being all the states of level greater than or equal to l − l1). In the following discussion we assume that the programmer does not spend extra steps balancing the number of states in B′ and B″, since this is the worst possible choice to make (spending extra steps to obtain a worse run-time). We therefore choose to split the states as in the DFA case: if δ(p, a) ∈ C then p ∈ X, otherwise p ∈ Y. This choice makes Lemma 8 valid also for the cover automata case. The algorithm proceeds as before, adding the smaller of the newly split sets to the list S together with the value l1 + 1.

Let us now consider the same problem as in [2], but for DFCA minimization through the algorithm described in [13]. We consider the same example as before, the automata based on de Bruijn words, as the input to the algorithm (we note that the modified algorithm can start directly with a DFCA for a specific language, so we can have as input even cyclic automata). We need to specify the actual length of the finite language considered and also the start state of the de Bruijn automaton (since the algorithm needs to compute the levels of the states). We can choose the length of the longest word in L as l = 2^n and the start state as S_{1^n}. For example, the automaton in Fig.
1 would be a cover automaton for the language L = {0, 1, 2, 4, 8} with l = 8 and the start state q0 = 1. Following the same reasoning as in [2], but for the new modified algorithm, we can show that also for DFCA the choice of a queue implementation for S (as in [13]) is worse than a stack implementation. We note that the discussion is not a straightforward extension of the work reported by Berstel in [2], as the new dimension added to the sets in S (the length) and also the levels of the states need to be discussed in detail. We give the details of the construction and a step-by-step discussion of this fact in the following. We start as before with an automaton defined on a unary language with 2^n states: A = ({a}, Q, δ, q0, F) where |Q| = 2^n. Let us take a look at the possible levels of the states in deterministic automata over unary languages. Such an automaton is formed by a line followed by a loop, either of which may be empty: if the loop is empty (or contains only non-final states), then the automaton accepts only a finite set of words; if the loop contains at least one final state, it accepts an infinite set. In either case the levels of the states are 0, 1, 2, 3, . . . , n − 2, n − 1; one can see that the highest level in such a unary DFA is at most n − 1. Following the variant of Lemma 8 for DFCA, it is clear that the worst possible case is when |F| = |Q − F|. Let us assume that S starts with the pair (F, 0) or (Q − F, 0); in either case, at the second stage of the algorithm the partition P is split into the following four possible sets (as in the DFA case): S00, S01, S10, S11. To continue with the worst possible case, each of these sets needs to contain exactly 2^{n−2} states (otherwise, according to Lemma 8 for the DFCA case, a set with fewer states is added to S, and in the next steps fewer states will be added to S).
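The level of a state, used throughout this discussion, is the length of the shortest word reaching it from the start state; it can be computed up front by a breadth-first search over the transition graph. A minimal sketch, assuming the same hypothetical nested-dict DFA representation:

```python
from collections import deque

def compute_levels(delta, q0):
    """level(p) = length of the shortest word leading from q0 to p."""
    level = {q0: 0}
    queue = deque([q0])
    while queue:
        p = queue.popleft()
        for a in delta[p]:
            q = delta[p][a]
            if q not in level:
                level[q] = level[p] + 1
                queue.append(q)
    return level
```

For a unary automaton shaped as a line followed by a loop, this yields the consecutive levels 0, 1, 2, . . . observed above.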
Also in this case it is necessary to make a "bad" choice of the sets added next to S (one of S00, S01 and one of S10, S11); we use the same choosing strategy as before. The difference is that these sets are added to S with the length 1: for example, at the next step S will contain (S00, 1) and (S10, 1). At the next stage of the algorithm we observe a difference from the DFA case: one of the states will not be split from its set because of its high level. Assuming we have a state of level l − 1, at this step this state is not considered and can be added to either of the two halves of the set containing it. For the final automaton, considering that S11...1 is the start state, the high-level state is S011...1. We continue the process until the i-th iteration of the algorithm in a similar fashion, by carefully choosing the splitting sets and by having at each iteration yet another state that is not considered in the splitting due to its high level. Because of the fourth implementation choice, the number of states in each set remains the same. At this moment we have 2^{i−1} pairs in the queue S, each formed by a set containing 2^{n−i} states and the value i − 1. Thus we compute the splitter sets X and Y as given before in the DFA case, with the extra condition that the states p satisfying the condition also satisfy level(p) + i − 1 ≤ l. At this moment the partition P has exactly 2^i components, which in the worst case are exactly the sets Sw for all w ∈ {0, 1}^i. In the worst case, none of the level-i states from the splitting queue S breaks a set already in the queue S, but at the same time each splits two other sets in the partition P. This is achieved by carefully choosing the order in which these sets arrive in the queue (one such "worst" strategy for additions to S was described in [2]).
In this way, at the end of stage i of the algorithm the partition P contains all the sets Sw with w ∈ {0, 1}^{i+1}, each of them having 2^{n−i−1} states, and the queue S contains 2^i pairs of sets with the number i; these splitting pairs are used in the next stage of the algorithm. This process continues until the (n − 1)-th stage as before (otherwise we would not be in the worst possible case), and at the n-th stage exactly n − 2 sets are not added to the queue S (as opposed to the DFA case), thus only 2^{n−1} − n + 2 singleton sets are added. This makes the absolute worst case for the run-time of the minimization of DFCA based on Hopcroft's method have exactly n2^{n−1} − n + 2 states pass through S. The input automaton still follows the structure induced by de Bruijn words, and when considering the start state S11...1, the states that are similar to other states are the n − 2 states of highest levels: S011...1, S001...1, . . . , S00...011. In fact, we have several similarities between these high-level states and other states in the automaton; more precisely, for an automaton with 2^n states (following the structure of de Bruijn words containing all the subwords of size n) we have the following pattern of similarities: the state S011...1 has exactly 2^{n−2} − 1 similarities
with other states in the automaton (because the level of this state is 2^n − 1, thus only the pattern 01 makes the difference between it and other states); for S001...1 we have 2^{n−3} − 1 similarities (as its level is 2^n − 2 and the pattern making the difference is 001), and so on, until S000...01, which has 2^{n−(n−1)} − 1 = 2 − 1 = 1 similarity (since its level is 2^n − n + 2 and the pattern making the difference is 000...01). These values are obtained by considering the fact that the structure of the automaton contains all the subwords of size n, thus we can compute how many times a particular pattern appears in the automaton. This shows that a result similar to Theorem 9 holds also for DFCA, with the only difference in the counting of states passing through S: n2^{n−1} − n + 2 rather than n2^{n−1}. It should be clear now that a stack implementation for the list S is more efficient, at least for unary languages and when considering the absolute worst possible run-time of the algorithm.

6. Final remarks

We showed that, at least in the case of unary languages, a stack implementation is more desirable than a queue for keeping track of the splitting sets in Hopcroft's algorithm. This is the first instance in which the stack has been shown to outperform the queue. It remains open whether there are examples of languages for which a LIFO approach would perform worse than the FIFO approach. Our conjecture is that the LIFO implementation always outperforms a FIFO implementation, which is also suggested by the experiments reported in [1]. As future work, it is worth mentioning our conjecture that there is a strategy for processing a LIFO list S such that the minimization of all unary languages is realized in linear time by the algorithm.
For the case of cover automata, one should settle the extra implementation choice (the fourth implementation choice mentioned in the text) as follows: rather than balancing the number of states in the two split sets, one should actually try to un-balance them by adding all the high-level states to the bigger set. These remarks should achieve a reasonable speed-up for the algorithm.

References

[1] M. Baclet, C. Pagetti, Around Hopcroft's algorithm, in: Proc. 11th Conference on Implementation and Application of Automata, CIAA'06, in: Lecture Notes in Computer Science, vol. 4094, 2006, pp. 114–125.
[2] J. Berstel, O. Carton, On the complexity of Hopcroft's state minimization algorithm, in: Proc. 9th Conference on Implementation and Application of Automata, CIAA'04, in: Lecture Notes in Computer Science, vol. 3317, 2004, pp. 35–44.
[3] N.G. de Bruijn, A combinatorial problem, Koninklijke Nederlandse Akademie v. Wetenschappen 49 (1946) 758–764.
[4] C. Câmpeanu, A. Păun, S. Yu, An efficient algorithm for constructing minimal cover automata for finite languages, International Journal of Foundations of Computer Science 13 (1) (2002) 83–97.
[5] C. Câmpeanu, K. Salomaa, S. Yu, Tight lower bound for the state complexity of shuffle of regular languages, Journal of Automata, Languages and Combinatorics 7 (3) (2002) 303–310.
[6] C. Câmpeanu, N. Sântean, S. Yu, Minimal cover-automata for finite languages, in: Proceedings of the Third International Workshop on Implementing Automata, WIA'98, 1998, pp. 32–42; Theoretical Computer Science 267 (2001) 3–16.
[7] M. Domaratzki, J. Shallit, S. Yu, Minimal covers of formal languages, in: Proceedings of Developments in Language Theory 2001, in: Lecture Notes in Computer Science, vol. 2295, 2002, pp. 319–329.
[8] D. Gries, Describing an algorithm by Hopcroft, Acta Informatica 2 (1973) 97–109.
[9] J.E. Hopcroft, J.D. Ullman, R. Motwani, Introduction to Automata Theory, Languages and Computation, Addison-Wesley, 2001.
[10] J.E. Hopcroft, An n log n algorithm for minimizing states in a finite automaton, in: Z. Kohavi, A. Paz (Eds.), Theory of Machines and Computations, Academic Press, 1971, pp. 189–196.
[11] L. Ilie, S. Yu, Follow automata, Information and Computation 186 (1) (2003) 140–162.
[12] T. Knuutila, Re-describing an algorithm by Hopcroft, Theoretical Computer Science 250 (1–2) (2001) 333–363.
[13] H. Körner, A time and space efficient algorithm for minimizing cover automata for finite languages, International Journal of Foundations of Computer Science 14 (6) (2003) 1071–1086.
[14] A. Păun, N. Santean, S. Yu, An O(n^2) algorithm for constructing minimal cover automata for finite languages, in: CIAA 2000, in: Lecture Notes in Computer Science, vol. 2088, 2001, pp. 243–251.
[15] A. Salomaa, Formal Languages, Academic Press, 1973.
[16] K. Salomaa, X. Wu, S. Yu, Efficient implementation of regular languages using reversed alternating finite automata, Theoretical Computer Science 231 (1) (2000) 103–111.
[17] N. Sântean, Towards a minimal representation for finite languages: Theory and practice, MSc Thesis, Department of Computer Science, The University of Western Ontario, 2000.
[18] S. Yu, Regular Languages, in: G. Rozenberg, A. Salomaa (Eds.), Handbook of Formal Languages, Springer, 1998, pp. 41–110.
[19] S. Yu, State complexity of finite and infinite regular languages, Bulletin of the EATCS 76 (2002) 142–152.
[20] S. Yu, State complexity: Recent results and open problems, Fundamenta Informaticae 64 (1–4) (2005) 471–480.
[21] S. Yu, On the state complexity of combined operations, in: CIAA 2006, in: Lecture Notes in Computer Science, vol. 4094, 2006, pp. 11–22.
[22] The Grail+ Project. A symbolic computation environment for finite state machines, regular expressions, and finite languages, available online at: http://www.csd.uwo.ca/research/grail/.
Theoretical Computer Science 410 (2009) 2431–2441
State complexity of unique rational operations

Narad Rampersad a, Nicolae Santean a,∗, Jeffrey Shallit a, Bala Ravikumar b

a School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, ON N2L 3G1, Canada
b Department of Computer Science, Sonoma State University, 1801 East Cotati Avenue, Rohnert Park, CA 94928, USA
Keywords: Unique union; Unique concatenation; Unique star; State complexity
Abstract

For each basic language operation we define its "unique" counterpart as the operation that results in a language whose words can be obtained uniquely through the given operation. These unique operations can arguably be viewed as combined basic operations, placing this work in the popular area of state complexity of combined operations on regular languages. We study the state complexity of unique rational operations, and we provide upper bounds and empirical results meant to cast light on this matter. Equally important, we hope to have provided a generic methodology for estimating their state complexity.
1. Introduction Finite automata (FAs) are ubiquitous objects in computer science theory as much as in computer applications. They model finite state systems, from a door lock to the entire Universe – in some views – and check the syntax of regular languages. Computers are deterministic finite automata (DFAs), and the English lexicon can be spell-checked by FAs. Recently, automata have found new practical applications, such as in natural language processing [17], communications [3] and software engineering [11] — applications increasingly demanding in terms of computing resources. In this context, the study of state complexity of operations on FAs and their languages has become a topic of paramount importance. From the formal languages point of view, FAs are yet another tool for defining the family of regular (or rational, as known in certain literature) languages, along with regular expressions and right linear grammars. They arise from the perpetual mathematical effort of expressing infinite objects by finite means. In this paper we pursue a new direction in their study, namely, that of analyzing the succinctness of expressing a language obtained by certain unique language operations, in terms of the descriptional complexity of the languages involved. Similar directions have been pursued in automata theory before, e.g., for basic language operations [29,5,6,25] and combined operations [9,24,28] on regular languages. In the present paper, we make a leap from the current trend, by addressing the succinctness of some special operations; namely, we address those operations derived from the basic ones, that reach a result in a unique manner: an object obtained in two (or more) ways by applying the given operation is excluded from the result. This work is distinct from another recent examination of concatenation uniqueness in [7], where the operation is defined in an ‘‘all-or-nothing’’ manner. 
Our definition of unique concatenation was briefly mentioned in [15], where pebble automata were used to infer the regularity of this operation. We go beyond this matter, by rigorously studying all rational operations, with a focus on their state complexity. An extended version of this paper, with more details, experiments, and complete proofs, can be found in [21].
∗ Corresponding address: Department of Computer and Information Sciences, Indiana University South Bend, 1700 Mishawaka Ave., South Bend, IN 46634, USA.
E-mail addresses: [email protected] (N. Rampersad), [email protected] (N. Santean), [email protected] (J. Shallit), [email protected] (B. Ravikumar).
doi:10.1016/j.tcs.2009.02.035
N. Rampersad et al. / Theoretical Computer Science 410 (2009) 2431–2441
2. Definitions and notation

Let Σ be an alphabet, i.e., a non-empty, finite set of symbols (letters). By Σ* we denote the set of all finite words (strings of symbols) over Σ, and by ε the empty word (a word having zero symbols). The operation of concatenation (juxtaposition) of two words u and v is denoted by u · v, or simply uv. For w ∈ Σ*, we denote by w^R the word obtained by reversing the order of symbols in w. A non-deterministic finite automaton over Σ, NFA for short, is a tuple M = (Q, Σ, δ, q0, F) where Q is a finite set of states, δ : Q × Σ → 2^Q is a next-state function, q0 is an initial state and F ⊆ Q is a set of final states. δ is extended over Q × Σ* in the usual way. M is deterministic (DFA) if δ : Q × Σ → Q. We consider complete DFAs, that is, those whose transition function is a total function. The size of M is the total number of its states. When we want to emphasize the number n of states of M, we say that M is an n-state DFA. The language of M, denoted by L(M), belongs to the family of regular languages and consists of those words accepted by M in the usual way. For a background on finite automata and regular languages we refer the reader to [27].

Definition 1. Let L, R be languages over Σ.
(i) The unique concatenation of L and R is the set L ◦ R = {w | w = uv, u ∈ L, v ∈ R, and such a factorization is unique}.
(ii) The unique star of L is the set L◦ = {ε} ∪ {w | w = u1 · · · un, n ∈ N, ui ∈ L \ {ε} for all 1 ≤ i ≤ n, and such a factorization is unique}.
(iii) The unique union of L and R is the set L ∪◦ R = (L \ R) ∪ (R \ L).

We could have defined L◦ such that the factorization in the above definition involves ε as well. In this case, if L contained ε, then L◦ would be empty. Moreover, the connection with unambiguous regular expressions (Lemma 6) could not be made. For these reasons we adopt the above definition.
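For finite languages these definitions can be checked directly by counting factorizations; a small illustrative sketch (the function names are ours, not from the paper):

```python
def unique_union(L, R):
    """Unique union: words in exactly one of L, R (symmetric difference)."""
    return (L - R) | (R - L)

def count_star_factorizations(w, L):
    """Number of ways to write w as u1...un with every ui in L minus the empty word."""
    parts = {u for u in L if u}
    ways = [1] + [0] * len(w)          # ways[i] = factorizations of the prefix w[:i]
    for i in range(1, len(w) + 1):
        ways[i] = sum(ways[i - len(u)]
                      for u in parts
                      if len(u) <= i and w[i - len(u):i] == u)
    return ways[len(w)]

def in_unique_star(w, L):
    """Membership in the unique star: ε always belongs; otherwise exactly one factorization."""
    return w == "" or count_star_factorizations(w, L) == 1
```

For instance, with L = {a, ab, b}, the word ab factors both as a·b and as ab, so it is excluded from the unique star of L, while ba (with its single factorization b·a) belongs to it.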
Notation-wise, we denote L ⋄ R = LR \ (L ◦ R), L⋄ = L* \ L◦, and L ∪⋄ R = L ∩ R, and we refer to these operations as poly concatenation, poly star and poly union. Note that ε ∉ L⋄. We also consider unique square and poly square, given by L◦2 = L ◦ L and L⋄2 = L^2 \ L◦2. The reversal operation is compatible with the unique operations and with their "poly" counterparts. Indeed, for L1, L2 ⊆ Σ*, one can verify that
(L1 ◦ L2)^R = L2^R ◦ L1^R,    (L1 ∪◦ L2)^R = L1^R ∪◦ L2^R,
(L1◦)^R = (L1^R)◦,    (L1 ⋄ L2)^R = L2^R ⋄ L1^R,
(L1⋄)^R = (L1^R)⋄,    (L1 ∪⋄ L2)^R = L1^R ∪⋄ L2^R.
The next examples show that unique and poly concatenation are not associative:
(1) For L1 = {b, ba^2}, L2 = {a^3, a^4} and L3 = {ab, a^2 b, a^3 b} we have (L1 ◦ L2) ◦ L3 = {ba^4 b, ba^9 b}, and L1 ◦ (L2 ◦ L3) = {ba^4 b, ba^6 b, ba^7 b, ba^9 b}.
(2) For L1 = {b, ba}, L2 = {a^3, a^4}, L3 = {ab, a^2 b, a^3 b} we have (L1 ⋄ L2) ⋄ L3 = ∅ and L1 ⋄ (L2 ⋄ L3) = {ba^6 b}.
Consequently, unique concatenation and unique star are unrelated: if L = {a, b, b^2} then ab^2 ∈ (L ◦ L) ◦ L; however ab^2 ∉ L◦. Various connections among unique operations can be drawn. For example,

Lemma 2. If L is an arbitrary language then

(L* \ {ε})⋄2 = L⋄  and  (L* \ {ε})◦2 = L◦ \ {ε}.
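Example (1) above is easy to verify mechanically: for every product word, count how many (u, v) pairs produce it and keep the words with exactly one such pair. A small self-contained sketch (names are ours):

```python
from collections import Counter

def unique_concat(L, R):
    """Unique concatenation: words of LR with exactly one factorization w = uv."""
    counts = Counter(u + v for u in L for v in R)
    return {w for w, k in counts.items() if k == 1}

L1 = {"b", "baa"}                # {b, ba^2}
L2 = {"aaa", "aaaa"}             # {a^3, a^4}
L3 = {"ab", "aab", "aaab"}       # {ab, a^2 b, a^3 b}

left = unique_concat(unique_concat(L1, L2), L3)    # only b a^4 b and b a^9 b survive
right = unique_concat(L1, unique_concat(L2, L3))   # also contains b a^6 b and b a^7 b
```

The two results differ, confirming the non-associativity claimed in example (1).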
Definition 3. By a unique regular expression, or unireg expression for short, we understand the well-formed, parenthesized formulas with the following operators: all symbols a ∈ Σ and ε are nullary, the unique star ◦ is unary, and ◦, ⊕ are binary. The operator ⊕ is the expression counterpart of ∪◦. Since {a} ◦ {b} = a ◦ b = ab for any symbols a, b ∈ Σ, we denote the unique concatenation of symbols by juxtaposition, as for the usual concatenation. The language L(e), denoted by the unireg expression e, is defined recursively as in the case of regular expressions [13]. However, for reasons that will become apparent later, we consider only fully-parenthesized expressions.

3. Properties

In the following, we denote by ш the shuffle operation on words or languages. We also use two new symbols 1, 2 ∉ Σ, and we denote by h12 the homomorphism that deletes these symbols from words, h12 : (Σ ∪ {1, 2})* → Σ*.

Lemma 4. If L and R are regular languages then L ◦ R, L◦, and L ∪◦ R are regular languages.
Proof. It is clear that L ∪◦ R is regular. For the other two, it suffices to observe that the languages

L ⋄ R = h12((L1R ш 2) ∩ (L2R ш 1) ∩ Σ*(1Σ⁺2 + 2Σ⁺1)Σ*)

and

L⋄ = h12(((L′1)* ш 2*) ∩ ((L′2)* ш 1*) ∩ ∆*(1Σ⁺2 + 2Σ⁺1)∆*),

where L′ = L \ {ε} and ∆ = Σ ∪ {1, 2}, are both regular, their definitions involving only operations under which the regular languages are closed. Then, since L ◦ R = LR \ (L ⋄ R) and L◦ = L* \ L⋄, the conclusion follows. Note that the expressions for L ⋄ R and L⋄ can be simplified using "sequential insertion" [16].

Let R be a regular expression over Σ, containing r occurrences of symbols in Σ (multiple occurrences are counted separately). Let Σ′ = {a1, . . . , ar} denote an alphabet of r new symbols, and consider hR : Σ′* → Σ* the homomorphism that maps ai to the symbol in Σ representing the i-th occurrence of a symbol in R. By Rh we denote the regular expression obtained from R by replacing each i-th occurrence of a symbol in Σ with the corresponding ai ∈ Σ′. For example, if R = (a + ab)* b*, then Rh = (a1 + a2 a3)* a4*, and hR(a1) = hR(a2) = a, hR(a3) = hR(a4) = b.

Definition 5. With the above notation, a regular expression R is unambiguous if and only if the restriction of hR to L(Rh) is injective. (This definition is equivalent to that given in [4].)

According to this definition, the above expression R = (a + ab)* b* is not unambiguous, since a1 a4, a2 a3 ∈ L(Rh) and hR(a1 a4) = hR(a2 a3) = ab. An unambiguous regular expression "matches" any word in at most one way. Let R be a fully-parenthesized regular expression over Σ and denote by R̃ the unireg expression obtained from R by replacing its regular operations with their unique counterparts. The following result can be proven by induction on the number of regular operations involved:

Lemma 6. If R is unambiguous then L(R) = L(R̃).

The converse of this lemma does not hold. Indeed, if R = a + (a + a), then R̃ = a ⊕ (a ⊕ a) and L(R) = L(R̃) = {a}. However, R is obviously ambiguous. From Lemmas 4 and 6, and from the fact that any regular language is represented by an unambiguous regular expression [4], we infer a fundamental fact:

Corollary 7.
Unireg expressions define the family of regular languages.

The question whether context-free languages are closed under unique operations arises naturally at this point, and is answered in the following.

Proposition 8. The following families are not closed under unique union: DCF, CF, and linear CF.
Proof. It is clear that Σ* ∪◦ L = L̄, the complement of L. However, it is well known that the families CF and linear CF are not closed under complement. For the DCF family, let L1 = {a^i b^j c^k | i ≠ j} and L2 = {a^i b^j c^k | j ≠ k}. Clearly L1 and L2 are deterministic CFLs. Then

L1 ∪◦ L2 = {a^i b^j c^k | i = j ≠ k or i ≠ j = k}.

We claim that L1 ∪◦ L2 is not context-free. Suppose it is, and let n be the constant of Ogden's lemma. Let z = a^n b^n c^{n+n!} and mark the b's. Let z = uvwxy be a decomposition satisfying the conditions of Ogden's lemma. We have the following cases:

– v = a^p and x = b^q. If p = q, then uv^{n!/p+1} w x^{n!/q+1} y = a^{n+n!} b^{n+n!} c^{n+n!} ∉ L1 ∪◦ L2. If p ≠ q, then clearly uv^2 w x^2 y ∉ L1 ∪◦ L2.
– v = b^p and x = c^q. Then clearly uv^2 w x^2 y ∉ L1 ∪◦ L2.
– vwx = b^p. Then clearly uv^2 w x^2 y ∉ L1 ∪◦ L2.

Thus the decomposition z = uvwxy fails to satisfy the conditions of Ogden's lemma, a contradiction; thus, L1 ∪◦ L2 is not context-free.

Proposition 9. The following families are not closed under unique concatenation: DCF, CF and linear CF.

Proof. Consider the following CFLs: L1 = {a^n | n ≥ 1} ∪ {a^n b^n | n ≥ 1} and L2 = {c^n | n ≥ 1} ∪ {b^n c^n | n ≥ 1}. It is easy to see that

L1 ◦ L2 = {a^n c^m | m, n ≥ 1} ∪ {a^n b^{n+m} c^m | m, n ≥ 1} ∪ {a^n b^m c^m | m ≠ n; m, n ≥ 1} ∪ {a^n b^n c^m | m ≠ n; m, n ≥ 1},

that is, the only words in L1 L2 that can be written as a concatenation in more than one way are a^n b^n c^n, n ≥ 1. We prove that L1 ◦ L2 is not context-free by Ogden's lemma. Assume by contradiction that it is, and let N be the constant of Ogden's lemma. Take the word z = a^N b^N c^{N+N!} ∈ L1 ◦ L2. By Ogden's lemma, if we mark the a's, there exists a factorization z = uvwxy such that vx has at least one marked symbol, vwx
has at most N marked symbols, and uv^i w x^i y ∈ L1 ◦ L2 for all i ≥ 0. One can observe that such a factorization must necessarily have v = a^t and x = b^t, 0 < t ≤ N. But then, for i = 1 + N!/t we have uv^i w x^i y = a^{N+N!} b^{N+N!} c^{N+N!}, contradicting the fact that this word is not in L1 ◦ L2. Since L1 and L2 are both deterministic and linear CF languages, the conclusion follows.

Proposition 10. The CF and linear CF families are not closed under unique star.

Proof. Let L1 and L2 be as in the previous proposition. Clearly, L1 ∪ L2 is context-free. It can be shown that (L1 ∪ L2)◦ is not context-free by showing that (L1 ∪ L2)◦ ∩ a*b*c* is not context-free, as before. From this, it follows that the CF family is not closed under unique star. Since L1 ∪ L2 is a linear CF language, it follows that the linear CF family is not closed under unique star either.

4. State complexity

Before dealing with the state complexity of unique operations, we first prove a result concerning unambiguous computations in NFAs. The idea behind the following construction will be useful in proving upper bounds for the state complexities of unique concatenation and unique star.

Lemma 11. Let A be an NFA of size m, and let L be the language of those words in Σ* that are accepted unambiguously by A. Then L is regular and its state complexity is at most 3^m − 2^m + 1.

Proof. We construct a DFA whose states are vectors with m components, recording the number of paths from the initial state to each state: 0, 1 or 2 (2 stands for "two or more paths"). The final states are those vectors that denote exactly one path to one final state of the initial NFA. Formally, let A = (QA, Σ, δA, qA, FA), |QA| = m. We construct a DFA B = (QB, Σ, δB, qB, FB) for L, of size 3^m − 2^m + 1, as follows:
(1) QB = (VB − VB′) ∪ {s}, where VB = {0, 1, 2}^m and VB′ is the subset of VB consisting of those vectors that do not have the value 1 in any component. Clearly, |QB| = 3^m − 2^m + 1. State s has the role of a sink state.
(2) qB = (1, 0, . . . , 0); FB is the set of all vectors v such that the sum of the components of v corresponding to the final states of A is precisely 1.
(3) For all a ∈ Σ, we denote by Ma the incidence matrix of A with respect to the symbol a (Ma[i, j] is 1 if there is a transition from qi to qj labeled with a, and 0 otherwise). Then, for all v ∈ VB − VB′, δB(v, a) = vMa if vMa ∉ VB′, and δB(v, a) = s if vMa ∈ VB′; moreover, δB(s, a) = s. Here the matrix multiplication is done as usual, but with ⊕ and ⊗ as component-wise addition and multiplication, defined as follows: for a, b ∈ {0, 1, 2}, a ⊕ b = min(a + b, 2) and a ⊗ b = min(a · b, 2). (See [18] for an early use of these operations.)

Notice that the construction can be modified to recognize the language of those words that are accepted ambiguously by the NFA: just make the appropriate states final. Since this symmetry holds for most constructions proposed throughout the paper, we state, a priori, the following fact:

Theorem 12. The state complexity results on unique operations, based on this construction and described in the following, hold for poly operations as well.

While we have not proven that the bound given in Lemma 11 is tight, we can give an exponential lower bound, as follows. For k ≥ 0, define the language Lk = (0 + 1)* 0 (0 + 1)^k 1 (0 + 1)*. The languages Lk (or variations thereof) have been used by several authors [22,12,14,8] to prove lower bounds for nondeterministic state complexity. The language Lk consists of all words containing at least one occurrence of a word in 0(0 + 1)^k 1. Now consider the language ULk = (0 ⊕ 1)◦ ◦ 0 ◦ (0 ⊕ 1)◦k ◦ 1 ◦ (0 ⊕ 1)◦, obtained from the regular expression for Lk by replacing the ordinary operations with the unique ones. The language ULk consists of all words containing exactly one occurrence of a word in 0(0 + 1)^k 1.

Lemma 13. Any NFA accepting ULk has at least 2^k states.

Proof. For every word x ∈ {0, 1}^k, define a pair (0x, 1x). Note that 0x1x is in ULk, since there is exactly one instance where a 0 is followed by a 1, k positions later. However, for any two distinct words x and y, at least one of the words 0x1y or 0y1x must contain two occurrences of a subword in 0(0 + 1)^k 1 (since x and y must mismatch in at least one place). Thus, at least one of 0x1y or 0y1x is not in ULk. Since there are 2^k pairs, it follows from a result of Birget [1] that any NFA for ULk has at least 2^k states.

From this, we easily deduce the following results:
Proposition 14. There exists an NFA Mk with O(k) states such that any DFA, NFA, or regular expression for the set of words accepted unambiguously by Mk has size at least 2^k.

Proof. The language Lk is accepted by an NFA Mk with O(k) states. The set of words accepted unambiguously by Mk is exactly ULk. The result now follows from Lemma 13.

Proposition 15. There exists a regular language generated by a unireg expression of size O(k) such that any equivalent DFA, NFA, or regular expression has size at least 2^k. (The language ULk used in Lemma 13 has the desired properties.)

4.1. Unique union

For the unique union, we observe that given two DFAs A and B, of sizes m and n respectively, we can easily construct a DFA of size mn for L(A) ∪◦ L(B) by performing the cross-product of A and B and setting as final states those state pairs that have exactly one final component. We prove that this upper bound of mn is tight.

Theorem 16. For m, n ≥ 3, let L1 and L2 be accepted by DFAs with m and n states respectively. The state complexity of L1 ∪◦ L2 is mn.

Proof. For m, n ≥ 3, we use the languages A = {w ∈ {0, 1}* | |w|0 ≡ m − 1 mod m} and B = {w ∈ {0, 1}* | |w|1 ≡ n − 1 mod n}, which were used by Maslov [19] to prove a similar result for the ordinary union. Clearly, A is accepted by an m-state DFA, B is accepted by an n-state DFA, and C = A ∪◦ B is accepted by an mn-state DFA. We show that mn states are necessary. For integers i, i′ and j, j′, 0 ≤ i, i′ ≤ m − 1 and 0 ≤ j, j′ ≤ n − 1, let x = 0^i 1^j and y = 0^{i′} 1^{j′} be distinct words. To complete the proof, it is enough to show that x and y are inequivalent with respect to the Myhill–Nerode equivalence relation. We leave the details to the reader.

4.2. Unique concatenation

We now approach the more difficult problem of determining the state complexities of unique concatenation and unique star.

Theorem 17. The state complexity of unique concatenation for regular languages is at most m3^n − k3^{n−1}, where m and n are the sizes of the input DFAs, and k is the number of final states of the first DFA.

Proof (Sketch). Let A = (QA, Σ, δA, qA, FA) and B = (QB, Σ, δB, qB, FB) be the input DFAs, of sizes m and n respectively. To prove the upper bound we construct a DFA C = (QC, Σ, δC, qC, FC) for L(A) ◦ L(B), of size m3^n − k3^{n−1}:
(1) QC = QA × VB − FA × VB′, where VB = {0, 1, 2}^n and VB′ is the subset of VB consisting of those vectors that have 0 in their first component. Clearly, |QC| = m3^n − k3^{n−1}.
(2) qC = ⟨qA, (0, . . . , 0)⟩ if qA ∉ FA and qC = ⟨qA, (1, 0, . . . , 0)⟩ otherwise; FC is the set of those states ⟨q, v⟩ such that the sum of the components of v corresponding to the final states of B is precisely 1.
(3) For a ∈ Σ, we denote by Ma the incidence matrix of B with respect to the symbol a. Then δC(⟨q, v⟩, a) = ⟨δA(q, a), v′⟩, where v′ = vMa if δA(q, a) ∉ FA and v′ = vMa + (1, 0, . . . , 0) otherwise. The matrix operations are done as usual, with the component-wise ⊕ and ⊗ described in the proof of Lemma 11.

The idea of this construction is to track the "multiplicity" of ambiguous computations of the NFA for L(A)L(B), as inspired by the proof of Lemma 11. Considering the DFA Nn in Fig.
3, we have found that this DFA is a state complexity worst-case for unique square, proving that the upper bound is reached: Proposition 18. For n ≥ 3, the state complexity of L(Nn )◦2 is n3n − 3n−1 , thus this is a sharp upper bound for unique square (when k = 1). Proof (Sketch — [21] Provides a Complete Proof). We show that the construction in the proof of Theorem 17 leads to a minimal DFA, by proving its total reachability and non-mergibility. Consider that the states of Nn are numbered (and named) from 0 to n − 1, and recall that the corresponding DFA (as constructed in Theorem 17) has states of the form hi, (x1 , . . . xn )i, where xj ∈ {0, 1, 2} and the component-wise operations of the vectors (x1 , . . . xn ) are x ⊕ y = min(x + y, 2) and x ⊗ y = min(x · y, 2). The following facts about state transitions are useful: a
⟨0, (x1, . . . , xn)⟩ →^a ⟨0, (x1 ⊕ xn, 0, x2, . . . , xn−1)⟩,
⟨i, (x1, . . . , xn)⟩ →^a ⟨i + 1 mod n, (x1 ⊕ xn, 0, x2, . . . , xn−1)⟩, ∀i ≠ 0,
⟨j, (x1, . . . , xn)⟩ →^b ⟨j + 1 mod n, (xn, x1, . . . , xn−1)⟩, ∀j ≠ n − 2,
⟨n − 2, (x1, . . . , xn)⟩ →^b ⟨n − 1, (xn ⊕ 1, x1, . . . , xn−1)⟩,
⟨0, (x1, . . . , xn)⟩ →^{b^n} ⟨0, (x1, x2 ⊕ 1, x3, . . . , xn)⟩,
⟨1, (x1, . . . , xn)⟩ →^{b^n} ⟨1, (x1, x2, x3 ⊕ 1, x4, . . . , xn)⟩,
...
⟨j, (x1, . . . , xn)⟩ →^{b^n} ⟨j, (x1, x2, . . . , xj+2 ⊕ 1, . . . , xn)⟩, ∀j ≠ n − 1,
...
⟨n − 2, (x1, . . . , xn)⟩ →^{b^n} ⟨n − 2, (x1, x2, . . . , xn ⊕ 1)⟩,
⟨n − 1, (x1, . . . , xn)⟩ →^{b^n} ⟨n − 1, (x1 ⊕ 1, x2, . . . , xn)⟩.

N. Rampersad et al. / Theoretical Computer Science 410 (2009) 2431–2441

Fig. 1. Histogram for unique concatenation over 3-state minimal DFAs.

Reachability. Let ⟨i, (x1, . . . , xn)⟩ be an arbitrary state, i < n − 1. From the initial state ⟨0, (0, . . . , 0)⟩ we first reach ⟨0, (xi+1, . . . , xn, x1, . . . , xi)⟩ by reading the word b^{nxi+1} a b^{nxi} a b^{nxi−1} · · · a b^{nx1} a b^{nxn} a b^{nxn−1} · · · a b^{nxi+2}. If we further apply the word b^i we reach ⟨i, (x1, . . . , xn)⟩. For reaching ⟨n − 1, (x1, . . . , xn)⟩, with x1 > 0 (recall that x1 cannot be 0 in this case), we first reach ⟨0, (xn, x1 − 1, . . . , xn−1)⟩, then we apply b^{n−2}, reaching ⟨n − 2, (x2, . . . , xn, x1 − 1)⟩, and then we apply b one more time.

Non-mergibility. We now show that no two distinct states ⟨i, (x1, . . . , xn)⟩ and ⟨j, (y1, . . . , yn)⟩ are mergible, by providing a word that maps one of these states into a final state and the other into a non-final state. Incidentally, it becomes apparent that our DFA has no sink state.
If i ≠ j, we choose the word a^{n−i} b^n a^{n−2}. From ⟨i, (x1, . . . , xn)⟩ we reach a final state, namely ⟨n − 2, (Σ_{j=1}^{n} xj, 0, . . . , 0, 1)⟩. For the other state we reach ⟨j + n − i mod n, (1 ⊕ (· · · ), 0, . . . , 1, . . . , 0)⟩, which is not final.
If i = j, there must be a position k such that xk ≠ yk, since the states are distinct. Without loss of generality we may assume that xk < yk (otherwise we flip the states). We distinguish the following subcases:
I. xk = 1 or yk = 1. For xk = 1 we use the word b^{n−k} (recall that xk ≠ yk), and for yk = 1 we flip the states.
II. xk = 0, yk = 2. Here we distinguish two situations. If k = i + 2 (thus, i ≤ n − 2) we choose the word b^{2(n−1)−i}, which maps ⟨i, (x1, . . . , xn)⟩ into the final state ⟨n − 2, (xk+1, . . . , xn, x1, . . . , 1 = xk ⊕ 1)⟩. The same word maps ⟨i, (y1, . . . , yn)⟩ into a non-final state, for yk ⊕ 1 = 2. If k ≠ i + 2, we first apply the word b^{n−k+2} a^{n−2} b a, which maps ⟨i, (x1, . . . , xn)⟩ into ⟨t, (xk, 0, z, 0, . . . , 0)⟩, for some t ∈ {0, . . .
, n − 1} and z ∈ {0, 1, 2}. From here, there exists a word w = a^r that continues the computation up to ⟨∗, (1 = xk ⊕ 1, 0, . . . , 0, z, 0, . . . , 0)⟩, and after that, the word b^{n−1} leads to the state ⟨∗, (0, . . . , 0, z, 0, . . . , 0, 1)⟩. Thus, the word b^{n−k+2} a^{n−2} b a w b^{n−1} maps ⟨i, (x1, . . . , xn)⟩ into a final state; however, this is not true for ⟨i, (y1, . . . , yn)⟩.

We can also prove the following exponential lower bound for the non-deterministic state complexity of unique concatenation.

Proposition 19. There exists a pair of NFAs M1 and M2 with O(k) states combined, such that L(M1)L(M2) is accepted by an O(k)-state NFA, but any NFA accepting L(M1) ◦ L(M2) has at least 2^k states.

Proof. Take L(M1) = (0 + 1)^∗ 0(0 + 1)^k 1 and L(M2) = (0 + 1)^∗. Then L(M1)L(M2) is accepted by an O(k)-state NFA, but any NFA accepting L(M1) ◦ L(M2) = UL_k has at least 2^k states (see Lemma 13 and the definition of UL_k used there).

On the experimental side, we generated all minimal DFAs with 3 states and performed unique concatenation on all pairs. There are 1028 distinct DFAs, leading to 1,056,784 operations. Fig. 1 provides a histogram of our results: the x-axis represents the size of the output DFAs, and the y-axis plots the number of cases that resulted in DFAs of that size. For two DFAs of size m and n, the theoretical upper bound is m3^n − k3^{n−1} (k is the number of final states in the first DFA). The largest DFAs obtained from this experiment are of size 72, and are the result of operations where the first DFA has precisely one final state. Thus the bound is reached for m = n = 3 and k = 1. Notice that small DFAs have a much higher incidence rate, which suggests that the worst-case scenarios are very sparse.
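The count |QC| = m·3^n − k·3^{n−1} in Theorem 17 can be confirmed by enumerating QA × VB − FA × V′B directly; a small Python sketch (which k states of A are taken as final does not affect the count, so we take the first k):

```python
from itertools import product

def unique_cat_states(m, n, k):
    # Q_C = Q_A x V_B minus F_A x V'_B, where V_B = {0,1,2}^n and
    # V'_B holds the vectors of V_B whose first component is 0.
    FA = set(range(k))                       # assume A's first k states are final
    VB = list(product([0, 1, 2], repeat=n))
    QC = [(q, v) for q in range(m) for v in VB
          if not (q in FA and v[0] == 0)]
    return len(QC)

# Matches the bound m*3^n - k*3^(n-1); the experiments' case m = n = 3, k = 1 gives 72.
assert unique_cat_states(3, 3, 1) == 72
for m, n, k in [(2, 4, 1), (4, 3, 2), (5, 2, 3)]:
    assert unique_cat_states(m, n, k) == m * 3**n - k * 3**(n - 1)
```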
Fig. 2. Histogram for unique square over 3-state minimal DFAs.
Fig. 3. Parameterized automata Ji and Ni .
Then we investigated whether the unique square has a smaller state complexity. For this we performed this operation on all minimal DFAs of sizes 3 and 4. The results for 3-state DFAs are shown in Fig. 2. We found 6 minimal DFAs of size 3 (with one final state) whose unique square reaches the upper bound of 72. Consider the two parameterized minimal DFAs, Ji and Ni, with i ≥ 3, shown in Fig. 3. Our experiments show that the upper bound is reached for any of the following combinations: L(Ji)^{◦2}, L(Ni)^{◦2}, L(Ji) ◦ L(Jj), L(Ni) ◦ L(Nj), L(Ji) ◦ L(Nj), with i, j arbitrary integers greater than 2. It is interesting to notice that Ji is given in [20] as an example reaching the upper bound for the normal concatenation; hence it may provide an example where the worst case is achieved for both concatenation and unique concatenation.

4.3. Unique star

Using a technique similar to that for unique concatenation, we can derive an upper bound for the unique star:

Theorem 20. If L \ {ε} is accepted by a DFA A of size m and with k final states, then a DFA for L^◦ has at most 3^{m−1} + (k + 2)3^{m−k−1} − (2^{m−1} + 2^{m−k−1} − 2) states.

Proof. Let A = (QA = {1, 2, . . . , m}, Σ, δA, 1, FA) be a DFA for L, of size m, with FA = {m − k + 1, . . . , m}. By Ma we denote the adjacency matrix of A with respect to the symbol a ∈ Σ, and the matrix operations are performed with the usual ⊕ and ⊗ component-wise operations. We define a DFA B = (QB, Σ, δB, 0, FB) for L^◦ as follows:
(1) QB = V ∪ {0}, where 0 is the initial state of B and V is the set of all vectors with m components holding values in {0, 1, 2}. The vector entries are indexed from 1 to m.
(2) The transition function is defined as follows:
(a) δB(0, a) = va, where va[δA(1, a)] = 1, va[1] = 1 if δA(1, a) ∈ FA, and va[i] = 0 for all other indices i.
(b) Denote by Sk(v) the value v[m − k + 1] ⊕ · · · ⊕ v[m]. For all v ∈ V and a ∈ Σ we set δB(v, a) = v′ + v″, where v′ = vMa and v″ = (Sk(v′), 0, 0, . . .
, 0).
(3) FB = {v ∈ V | Sk(v) = 1} ∪ {0}.
We use vectors to store the number of computations in A from the initial state to every state: 0, 1, or 2, with 2 standing for more than one computation. If a vector v is reached during the computation of B, the value Sk(v) gives the number of different computations in A reaching final states. This number is added to the first component of v, meaning that reaching a final state in A implies reaching its initial state as well, for we aim at accepting words in L^∗. If a word w ‘‘reaches’’ a state-vector v in B, then v[i] gives the number (0, 1, or 2) of distinct paths in A, labeled with w, from the initial state of A to its state i, when A is modified to accept L^∗ in the standard way. By setting as final states in B all those vectors that denote exactly one successful such path, we force B to accept exactly the words in L^◦.
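To see the construction in action, here is a minimal Python sketch for a hypothetical one-letter DFA A (not one from the paper) with states {1, 2}, final state 2 (so m = 2, k = 1) and a: 1 → 2, 2 → 1, i.e., L = {a^{2i+1}}; the word aaa is correctly rejected, since it factors over L both as (3) and as (1, 1, 1):

```python
def cap(x):
    return min(x, 2)                       # {0,1,2} semiring: 2 means "two or more"

M = {"a": [[0, 1], [1, 0]]}                # adjacency matrix of A: 1 -> 2, 2 -> 1
m, finals = 2, [2]                         # F_A = {2}, hence k = 1

def Sk(v):
    return cap(sum(v[f - 1] for f in finals))

def step(v, a):
    # v' = v M_a over the capped semiring, then add S_k(v') to v'[1]
    vp = [cap(sum(cap(v[i] * M[a][i][j]) for i in range(m))) for j in range(m)]
    vp[0] = cap(vp[0] + Sk(vp))
    return vp

def accepts(word):                         # nonempty word over {"a"}
    v = [0] * m
    v[2 - 1] = 1                           # delta_A(1, a) = 2 ...
    v[0] = 1                               # ... which is final, so v_a[1] = 1 as well
    for a in word[1:]:
        v = step(v, a)
    return Sk(v) == 1

assert accepts("a") and accepts("aa")          # unique factorizations over L
assert not accepts("aaa") and not accepts("aaaa")
```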
Fig. 4. Worst-case candidates for unique star.
Fig. 5. Histogram for unique star over 5-state minimal DFAs with k = 1.
We now make two crucial observations: (a) for a reachable state v ∈ V we must have v[1] ≥ Sk(v), and (b) any reachable state v ∈ V containing only the values 0 and 2 is mergible into (or, is equivalent to) the sink state. It now remains to compute how many states can possibly be reached in B:
1. There is an initial state 0 and possibly a sink state, accounting for 2 states.
2. At most 3^{m−k} − 1 vectors v with Sk(v) = 0 can be reached (the null vector cannot be reached). From these vectors we subtract those having only 0's and 2's, for they will eventually be merged into the sink state when B is minimized. There are 2^{m−k} − 1 such vectors, without counting the null vector. Thus, we have altogether 3^{m−k} − 2^{m−k} states in this case.
3. At most 2k · 3^{m−k−1} states v with Sk(v) = 1 can be reached. Observe that once Sk(v) = 1 we cannot have v[1] = 0, since Sk(v) has been added to v[1] during some transition. Thus, v[1] can take two values (1 and 2), the portion v[2, . . . , m − k] of the vector gives 3^{m−k−1} combinations, and there are at most k combinations of v[m − k + 1, . . . , m] that ensure Sk(v) = 1.
4. Finally, at most 3^{m−k−1}(3^k − k − 1) states v with Sk(v) = 2 can be reached. Indeed, we have at most 3^k − k − 1 combinations in v[m − k + 1, . . . , m] that ensure Sk(v) = 2. Then v[1] must be 2 (since Sk(v) has been added to it), and there are 3^{m−k−1} combinations for v[2, . . . , m − k]. However, some of these vectors are mergible into the sink state: those with only 0's and 2's. There are exactly 2^{m−k−1}(2^k − 1) such vectors v, since: v[1] = 2, there are 2^k − 1 combinations in v[m − k + 1, . . . , m] (this portion cannot be all 0's), and there are 2^{m−k−1} combinations of 0's and 2's in v[2, . . . , m − k]. Combining these numbers, we obtain 3^{m−k−1}(3^k − k − 1) − 2^{m−k−1}(2^k − 1) states in this case.
It remains to add up the figures obtained in the above cases 1–4.
The upper bound in Theorem 20 has been reached for k = 1 and m = 2, . . .
, 8 by the generic examples in Fig. 4 (which are good candidates for the worst case in general), and we conjecture that this upper bound is sharp in both m and k. In Fig. 5 we plot the histogram for all minimal DFAs with 5 states and one non-initial final state. The case when ε ∈ L and we are given a DFA for L is handled similarly, and may lead to a slightly different upper bound. In fact, we can immediately derive an upper bound by noticing that a DFA for L \ {ε} has one more state than the DFA for L (thus, we just replace m by m + 1 in the above result). Nevertheless, a proof as in Theorem 20 may improve such an upper bound, and it merely involves a different state-indexing scheme. We leave this exercise to the reader.
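The four case counts in the proof add up exactly to the closed-form bound stated in Theorem 20, which is easy to cross-check numerically:

```python
def case_counts(m, k):
    c1 = 2                                              # initial state and sink
    c2 = 3**(m - k) - 2**(m - k)                        # S_k(v) = 0
    c3 = 2 * k * 3**(m - k - 1)                         # S_k(v) = 1
    c4 = (3**(m - k - 1) * (3**k - k - 1)
          - 2**(m - k - 1) * (2**k - 1))                # S_k(v) = 2
    return c1 + c2 + c3 + c4

def star_bound(m, k):
    # Theorem 20: 3^(m-1) + (k+2) 3^(m-k-1) - (2^(m-1) + 2^(m-k-1) - 2)
    return (3**(m - 1) + (k + 2) * 3**(m - k - 1)
            - (2**(m - 1) + 2**(m - k - 1) - 2))

for m in range(2, 9):
    for k in range(1, m):
        assert case_counts(m, k) == star_bound(m, k)
```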
5. Decision problems

We consider two decision problems involving unireg expressions, and start with the membership problem.

Theorem 21. The membership problem for unireg expressions is in P.

Proof. Let R be a unireg expression over Σ, and let w be a string in Σ^∗. If w = ε, we can determine membership efficiently by consulting the parse tree of R. If w ≠ ε, we proceed as follows. Let w = a1 a2 . . . ak and let R′ be the regular expression obtained from R by replacing the unique operations with the corresponding regular operations. We use Glushkov's algorithm to obtain an ε-free NFA M such that the set of strings accepted by M is the same as the set of strings generated by R′ (with the possible exception of ε). It is known [4] that Glushkov's algorithm preserves the degree of ambiguity of the representation, that is, the number of accepting computations in M for an input word w equals the number of ways in which R′ generates w. Then L(R) consists of those words that are accepted by M in a unique computation or, as we say, unambiguously. Thus, it now suffices to detect whether our word w is accepted unambiguously by M. We assume that the states of M are numbered from 0 to m − 1, and denote by {Ta}_{a∈Σ} the set of incidence matrices of M. Let S = [s(0), s(1), . . . , s(m − 1)], where s(i) = 1 if i is the start state and 0 otherwise. Similarly, let F = [f(0), f(1), . . . , f(m − 1)], where f(j) = 1 if j is an accepting state of M and 0 otherwise. Then S Ta1 Ta2 . . . Tak F is the number of accepting paths for the string w in M. By computing the above matrix chain product, we can determine the number of accepting paths for w. If this number is 1, then w is accepted; otherwise it is rejected. This algorithm runs in time polynomial in |R| + |w|, where |R| is the number of symbols in R.

Theorem 22. The non-emptiness problem for unireg expressions is in PSPACE.

Proof.
Let R be a unireg expression over Σ and let R′ be the regular expression obtained from R by replacing the unique operations with the standard regular operations. Let M be the NFA obtained by applying Glushkov's algorithm to R′. Then L(R) is non-empty if and only if there exists a word w accepted unambiguously by M. By Savitch's theorem [26], it suffices to give a non-deterministic polynomial-space algorithm to test for the existence of such a word w. For a ∈ Σ, let Ba denote the adjacency matrix of M with respect to the input a. By Lemma 11, if there is a word w accepted unambiguously by M, then there is such a w of length at most 3^n, where n is the number of states of M. We thus non-deterministically guess such a word w = w1 w2 . . . wr, r ≤ 3^n, symbol by symbol, and we compute the matrix product Bw1 Bw2 · · · Bwr = B, reusing space after each matrix multiplication. Here the matrix multiplication is again done with ⊕ and ⊗ as component-wise operations. We maintain an O(n)-bit counter to keep track of the length of the guessed string w. We verify that M accepts w unambiguously by looking at the row of B corresponding to the start state of M and summing up the entries in the columns corresponding to the final states of M. This quantity is exactly 1 if and only if M accepts w unambiguously. The transformation of R to R′ and then to M can be done in polynomial space, and the non-deterministic algorithm described above uses only polynomial space. It follows that the non-emptiness problem can be solved in polynomial space.

6. Application: 2-DFAs with a pebble

It is well known that if M is a 2-DFA with a pebble [2], then L(M) is regular. Here we revisit the following question, studied in [10]: What is the worst-case blow-up in the number of states when a 2-DFA with a pebble is converted into a 1-DFA? Let f(n) denote this function.
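The matrix-chain test at the heart of the proofs of Theorems 21 and 22 can be sketched in a few lines of Python. The NFA below is a hypothetical example (not one from the paper): two states over {a, b}, accepting the words that contain an a, with state 0 looping on both letters, an a-transition from 0 to the accepting state 1, and state 1 looping on both letters; aa is accepted along two paths, so it would not belong to the unique language.

```python
def accepting_paths(word, T, S, F):
    # S: 0/1 row vector marking the start state; F: 0/1 vector of accepting
    # states; T[a][i][j] = 1 iff the NFA has a transition i --a--> j.
    row = S[:]
    for a in word:
        row = [sum(row[i] * T[a][i][j] for i in range(len(row)))
               for j in range(len(row))]
    return sum(row[j] * F[j] for j in range(len(row)))

T = {"a": [[1, 1], [0, 1]], "b": [[1, 0], [0, 1]]}
S, F = [1, 0], [0, 1]
assert accepting_paths("ba", T, S, F) == 1   # exactly one accepting path
assert accepting_paths("aa", T, S, F) == 2   # ambiguous: two accepting paths
```

For Theorem 22 the same product would be taken over the capped ⊕ and ⊗ operations, so that every entry stays in {0, 1, 2} and the space used is independent of the length of the guessed word.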
Formally, f is defined by the following two conditions: (1) there is an n-state 2-DFA with a pebble such that the minimum equivalent 1-DFA has f(n) states, and (2) for any n-state 2-DFA with a pebble, there is an equivalent 1-DFA with at most f(n) states. A lower bound on f(n) can be obtained from the results of the previous section. Indeed, the connection between the state complexity of converting a 2-DFA with a pebble to a 1-DFA and the state complexity of unique concatenation is provided by the following proposition.

Proposition 23. Let A and B be two DFAs, with m and n states. There exists a 2-DFA C with a pebble such that L(C) = L(A) ◦ L(B) and C has 2(m + n) + 2 states.

Proof (Sketch). The state set of C is given by QC = QA ∪ QB ∪ Q′A ∪ Q′B, where Q′A = {q′ | q ∈ QA} and Q′B = {q′ | q ∈ QB} ∪ {r, r′}. On an input string $w#, C starts with the reading head on the left end-marker, in its start state sC (which is, by definition, sA). The head moves to the right, and C simulates A until an accepting state is reached. At this point, C places a pebble on the current tape square, enters sB and moves to the right simulating B, until the right end-marker is reached. If at this point an accepting state of B is not reached, then C proceeds as in Step 1, else it proceeds as in Step 2, detailed as follows:
Step 1: C enters the state r and makes a right-to-left sweep until it reaches the left end-marker, and enters the state sA. Then it simulates A as usual, with the difference that when it reaches the pebble, it picks it up and continues the computation to the right until another final state of A is reached. Then it drops the pebble and continues with the simulation of B, as in the initial phase.
Step 2: C enters the state r′ and makes a right-to-left sweep until it reaches the left end-marker and enters state s′A. While in a state of the form q′a ∈ Q′A, C simulates A, but uses the primed states and keeps moving to the right.
More precisely, if δA(qa, b) = qd, then C, on input b and in state q′a, changes its current state to q′d and moves to the right. It continues
this phase until the square with the pebble is detected. Here it picks up the pebble and moves to the right, continuing the simulation of A using the primed states. At this point there are two cases to consider. (a) As the simulation of A continues, the right end-marker is reached without ever reaching an accepting state of A. In this case, C accepts the input and halts. (b) An accepting state of A is reached before the right end-marker is reached. When an accepting state of A is reached for the first time, the pebble is dropped on the square that stores the last symbol that caused A to reach the accepting state, C enters the state s′B and starts the simulation of B using the primed states. The simulation continues until the right end-marker is reached. At this point, if an accepting state of B is reached, C rejects the input and halts. If an accepting state is not reached, then C enters the state r′ and repeats Step 2. It is clear that C accepts L(A) ◦ L(B), and the proof is complete.

This result, together with Theorem 17 or Proposition 18, provides an exponential lower bound for f(n), whereas the lower bound given in [10] is doubly exponential.

7. Conclusion and further work

In this paper we studied unique rational operations and their state complexity. We drew connections between unireg expressions and unambiguous regular expressions, and we studied the closure of DCF, CF and linear CF languages under the unique operations. We obtained a sharp bound on the state complexity of unique union, comparable with that of ‘‘plain’’ union. For unique concatenation we gave a state complexity upper bound that we strongly believe to be sharp, for we provided generic (parameterized) examples that reached the upper bound in all our extensive experiments. For the unique square, we provided a sharp upper bound and a generic worst-case example, in the laborious proof of Proposition 18. Both bounds are significantly higher than those for plain concatenation.
For the non-deterministic state complexity of unique concatenation we provided an exponential lower bound. In Theorem 20 we provided a curious upper bound for the unique star, which, again based on empirical results, we believe to be sharp. Finally, we studied the complexity of the membership and non-emptiness problems for unireg expressions, and we drew a connection between 2-DFAs with a pebble and unique concatenation. This work is in progress, and we feel that much more needs to be done. A list of open questions and further directions can be found in the extended paper [21].

Acknowledgements

This study benefitted greatly from our extensive experiments, which would not have been possible without the use of Grail+, carefully developed and maintained by Derick Wood, Sheng Yu, and their project members for over 20 years.

References
[1] J.-C. Birget, Intersection and union of regular languages and state complexity, Information Processing Letters 43 (1992) 185–190.
[2] M. Blum, C. Hewitt, Automata on a 2-dimensional tape, in: Proc. 8th IEEE Symp. Switching and Automata Theory, 1967, pp. 155–160.
[3] G.V. Bochmann, Submodule construction and supervisory control: A generalization, in: Proc. CIAA 2001, in: LNCS, vol. 2494, 2001.
[4] R. Book, S. Even, S. Greibach, G. Ott, Ambiguity in graphs and expressions, IEEE Transactions on Computers C-20 (2) (1971) 149–153.
[5] C. Campeanu, K. Culik, K. Salomaa, S. Yu, State complexity of basic operations on finite languages, in: Proc. WIA 1999, in: LNCS, vol. 2214, 1999.
[6] C. Campeanu, K. Salomaa, S. Yu, Tight lower bound for the state complexity of shuffle of regular languages, Journal of Automata, Languages and Combinatorics 7 (2002) 303–310.
[7] M. Daley, M. Domaratzki, K. Salomaa, On the operational orthogonality of languages, in: Proc. TALE 2007, vol. 44, TUCS General Publication, June 2007.
[8] K. Ellul, B. Krawetz, J. Shallit, M. Wang, Regular expressions: New results and open problems, Journal of Automata, Languages and Combinatorics 10 (2005) 407–437.
[9] Y. Gao, K. Salomaa, S. Yu, State complexity of catenation and reversal combined with star, in: Proc. DCFS 2006, vol. 2494, 2006.
[10] N. Globerman, D. Harel, Complexity results for two-way and multi-pebble automata and their logics, Theoretical Computer Science 169 (2) (1996) 161–184.
[11] D. Harel, M. Politi, Modeling Reactive Systems with Statecharts, McGraw-Hill, 1998.
[12] M. Holzer, M. Kutrib, Nondeterministic descriptional complexity of regular languages, International Journal of Foundations of Computer Science 14 (6) (2003) 1087–1102.
[13] J. Hopcroft, R. Motwani, J. Ullman, Introduction to Automata Theory, Languages, and Computation, 3rd edition, Addison-Wesley Publishing Co., Boston, MA, 2006.
[14] J. Hromkovič, J. Karhumäki, H. Klauck, G. Schnitger, Communication complexity method for measuring nondeterminism in finite automata, Information and Computation 172 (2002) 202–217.
[15] T. Jiang, B. Ravikumar, A note on the space complexity of some decision problems for finite automata, Information Processing Letters 40 (1) (1991) 25–31.
[16] L. Kari, On language equations with invertible operations, Theoretical Computer Science 132 (2) (1994) 129–150.
[17] L. Karttunen, Applications of finite-state transducers in natural language processing, in: Proc. CIAA 2000, in: LNCS, vol. 2088, 2000.
[18] A. Mandel, I. Simon, On finite semigroups of matrices, Theoretical Computer Science 5 (1977) 101–111.
[19] A.N. Maslov, Estimates of the number of states of finite automata, Doklady Akademii Nauk SSSR 194 (1970) 1266–1268 (in Russian). English translation in Soviet Mathematics Doklady 11 (1970) 1373–1375.
[20] N. Rampersad, The state complexity of L^2 and L^k, Information Processing Letters 98 (2006) 231–234.
[21] N. Rampersad, B. Ravikumar, N. Santean, J. Shallit, A study on unique rational operations, Tech. Rep. TR-20071222-1, Department of Computer and Information Sciences, Indiana University South Bend, also available at: http://www.cs.iusb.edu/technical_reports.html, December 2007.
[22] B. Ravikumar, O. Ibarra, Relating the type of ambiguity of finite automata to the succinctness of their representation, SIAM Journal on Computing 18 (1989) 1263–1282.
[23] G. Rozenberg, A. Salomaa, Handbook of Formal Languages, Springer-Verlag, Berlin, Heidelberg, New York, 1997.
[24] A. Salomaa, K. Salomaa, S. Yu, State complexity of combined operations, Theoretical Computer Science 383 (2007) 140–152.
[25] A. Salomaa, D. Wood, S. Yu, On the state complexity of reversals of regular languages, Theoretical Computer Science 320 (2004) 293–313.
[26] W. Savitch, Relationship between nondeterministic and deterministic tape complexities, Journal of Computer and System Sciences 4 (1970) 177–192.
[27] S. Yu, Regular Languages, in [23], Ch. 1 (1997) 41–110.
[28] S. Yu, State complexity: Recent results and open problems, Fundamenta Informaticae 64 (2005) 471–480.
[29] S. Yu, Q. Zhuang, K. Salomaa, The state complexities of some basic operations on regular languages, Theoretical Computer Science 125 (1994) 315–328.
Theoretical Computer Science 410 (2009) 2442–2452
On minimal elements of upward-closed sets✩

Hsu-Chun Yen^{a,b,∗}, Chien-Liang Chen^{a}

^a Department of Electrical Engineering, National Taiwan University, Taiwan, ROC
^b Department of Computer Science, Kainan University, Taiwan, ROC

Keywords: Minimal element; Petri net; Upward-closed set; Vector addition system

Abstract. Upward-closed sets of integer vectors enjoy the merit of having a finite number of minimal elements, which is behind the decidability of a number of Petri net related problems. In general, however, such a finite set of minimal elements may not be effectively computable. In this paper, we develop a unified strategy for computing the sizes of the minimal elements of certain upward-closed sets associated with Petri nets. Our approach can be regarded as a refinement of a previous work by Valk and Jantzen (in which a necessary and sufficient condition for effective computability of the set was given), in the sense that complexity bounds now become available provided that a bound can be placed on the size of a witness for a key query. The sizes of several upward-closed sets that arise in the theory of Petri nets as well as in backward-reachability analysis in automated verification are derived in this paper, improving upon previous decidability results shown in the literature. © 2009 Elsevier B.V. All rights reserved.
1. Introduction

A set U over k-dimensional vectors of natural numbers is called upward-closed (or right-closed) if ∀x ∈ U, y ≥ x ⟹ y ∈ U. It is well known that an upward-closed set is completely characterized by its minimal elements, which always form a finite set. Aside from being of interest mathematically, evidence has suggested that upward-closed sets play a key role in a number of decidability results in automated verification of infinite-state systems. In the analysis of Petri nets, the notion of upward-closed sets is closely related to the so-called monotonicity property, which serves as the foundation for many decision procedures for Petri net problems. What the monotonicity property says is that if a sequence σ of transitions of a Petri net is executable from a marking (i.e., configuration) µ ∈ N^k, then the same sequence is legitimate at any marking greater than or equal to µ. That is, all the markings enabling σ form an upward-closed set. In spite of the fact that the set of all the minimal elements of an upward-closed set is always finite, such a set may not be effectively computable in general. Given the importance of upward-closed sets, it is of interest, theoretically and practically, to be able to characterize the class of upward-closed sets for which the minimal elements are computable. Along this line of research, Valk and Jantzen [8] presented a necessary and sufficient condition under which the set of minimal elements of an upward-closed set is guaranteed to be effectively computable. Suppose U is an upward-closed set over N^k and ω is a symbol representing something being arbitrarily large. In [8], it was shown that the set of minimal elements of U is effectively computable iff the question ‘reg(v) ∩ U ≠ ∅?’ is decidable for every v ∈ (N ∪ {ω})^k, where reg(v) = {x | x ∈ N^k, x ≤ v}. Such a strategy has been successfully applied to showing the computability of a number of upward-closed sets associated with Petri nets [8].
✩ A preliminary version of this work appears in LNCS 4546, Proceedings of the 28th International Conference on Applications and Theory of Petri Nets and Other Models of Concurrency.
∗ Corresponding author at: Department of Electrical Engineering, National Taiwan University, Taiwan, ROC. E-mail address: [email protected] (H.-C. Yen).
doi:10.1016/j.tcs.2009.02.036

Note, however, that [8] reveals no complexity bounds for the sizes of the minimal elements. As knowing the
size of the minimal elements might turn out to be handy in many cases, the following question arises naturally. If more is known about the query ‘reg(v) ∩ U ≠ ∅?’ (other than it just being decidable), can the size of the minimal elements be measured? In fact, answering this question in the affirmative is the main contribution of this work. Given a vector v ∈ (N ∪ {ω})^k, suppose ‖v‖ is defined to be the maximum component (excluding ω) of v. We demonstrate that for every v, if a bound on the size of a witness for ‘reg(v) ∩ U ≠ ∅?’ (if one exists) is available, then such a bound can be applied inductively to obtain a bound for all the minimal elements of U. In a recent article [9], such a strategy was first used for characterizing the solution space of a restricted class of parametric timed automata. In this paper, we move a step further by formulating a general strategy as well as applying our unified framework to a wide variety of Petri net problems with upward-closed solution sets.

Given a k-place, m-transition Petri net P with transition set T = {t1, . . . , tm}, a d × m integer matrix A, and a d × 1 integer column vector b, consider the set

S = {µ | µ →^{σ0} µ1 →^{σ1} µ2, µ2 ≥ µ1, A × #σ1 ≥ b},

where µ, µ1, µ2 ∈ N^k, σ0 ∈ T^∗, σ1 ∈ T^+, and #σ1 (∈ N^m) is a column vector whose i-th coordinate represents the number of times transition ti appears in σ1. In words, S consists of all the markings µ from which a ‘‘repeatable’’ path µ1 →^{σ1} µ2 can be reached such that the transition count along σ1 (i.e., #σ1) meets the linear constraint A × #σ1 ≥ b. By the monotonicity property of Petri nets, S is clearly upward-closed. In this paper, we first apply the inductive strategy developed by Rackoff in [6] (for deriving the complexities of the boundedness and covering problems for vector addition systems) to obtain a bound for the minimal elements of S. We then show the solutions of a wide variety of Petri net problems found in [8] to be characterizable by paths defined in S, immediately yielding bounds for the minimal elements of the upward-closed sets associated with those problems. In addition to considering those upward-closed sets investigated in [8] for general Petri nets, we illustrate the usefulness of our approach in performing backward-reachability analysis, which is a useful technique in automated verification. We show that for certain classes of Petri nets and state sets Q, the backward-reachability set of Q is not only upward-closed but also falls into the category to which our unified approach can be applied. Such Petri nets include several well-known subclasses for which reachability is characterizable by integer linear programming. Our analysis can also be applied to the model of lossy vector addition systems with states (VASSs) [4] to derive bounds for the backward-reachability sets. For (conventional or lossy) VASSs, we further enhance the work of [4] by providing complexity bounds for the so-called global model checking problem with respect to a certain class of formulas. (In [4], such a problem was only shown to be decidable; no complexity bounds were available there.) Upward-closed sets associated with a kind of parametric clocked Petri nets are also investigated in this paper, serving as yet another application of our unified approach.

2.
Preliminaries Let Z (resp., N) be the set of all integers (resp., nonnegative integers), and Zk (resp., Nk ) be the set of k-dimensional vectors of integers (resp., nonnegative integers). We define the max-value of v , denoted by kvk, to be max{|v(i)| | 1 ≤ i ≤ k}, i.e., the absolute value of the largest component in v . For a set of vectors V = {v1 , . . . , vm }, the max-value of V (also written as kV k) is defined to be max{kvi k | 1 ≤ i ≤ m}. In our subsequent discussion, we let Nω = N ∪{ω} (ω is a new element capturing the notion of something being ‘arbitrarily large’).1 We also let Nkω = (N ∪ {ω})k = {(v1 , . . . , vk ) | vi ∈ (N ∪ {ω}), 1 ≤ i ≤ k}. For a v ∈ Nkω , we also write kvk to denote max{v(i) | v(i) 6= ω} (i.e., the largest component in v excluding ω) if v 6= (ω, . . . , ω); k(ω, . . . , ω)k = 1. For an element v ∈ Nkω , let reg (v) = {w ∈ Nk | w ≤ v}. A set U (⊆ Nk ) is called upward-closed (or right-closed in some literature) if ∀x ∈ U, ∀y, y ≥ x =⇒ y ∈ U. An element x (∈ U ) is said to be minimal if there is no y (6= x) ∈ U such that y < x. We write min(U ) to denote the set of minimal elements of U. From Dickson’s lemma, it is well known that for each upward-closed set U (⊆ Nk ), min(U ) is finite. Even so, min(U ) might not be effectively computable in general. Given a function f , we write the k-fold composition of f as f (k) (i.e., k
f
(k)
}| { (x) = f ◦ · · · ◦ f (x)). z
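To make the order-theoretic notions above concrete, here is a minimal Python sketch (our own illustration, not part of the paper) of the componentwise order, of min(U) for the upward closure of a finite generating set, and of membership in reg(v), with an ω component encoded as None:

```python
def leq(x, y):
    """Componentwise order on N^k: x <= y iff x(i) <= y(i) for every i."""
    return all(a <= b for a, b in zip(x, y))

def minimal_elements(generators):
    """min(U) for the upward closure U of a finite set of generators:
    keep exactly the generators that strictly dominate no other generator."""
    gens = set(generators)
    return [x for x in gens if not any(leq(y, x) and y != x for y in gens)]

def in_reg(w, v):
    """w in reg(v) = {w in N^k | w <= v}; an omega component (None) of v
    is larger than every natural number."""
    return all(vi is None or wi <= vi for wi, vi in zip(w, v))

U = [(2, 1), (1, 3), (3, 2)]                 # (3, 2) dominates (2, 1)
print(sorted(minimal_elements(U)))           # [(1, 3), (2, 1)]
print(in_reg((5, 1), (None, 2)))             # True: first coordinate is omega
```

Dickson's lemma guarantees that min(U) is finite even when U itself is infinite; the sketch only handles the finitely generated case.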
A Petri net (PN, for short) is a 3-tuple P = (P, T, ϕ), where P is a finite set of places, T is a finite set of transitions, and ϕ : (P × T) ∪ (T × P) → N is a flow function. Let k and m denote |P| (the number of places) and |T| (the number of transitions), respectively; k is also called the dimension of the PN. A marking is a mapping µ : P → N. The transition vector of a transition t, denoted by t̄, is the k-dimensional column vector in Z^k with t̄(i) = ϕ(t, pi) − ϕ(pi, t), and the set of transition vectors, denoted by T̄, is {t̄ | t ∈ T}. For a sequence of transitions σ = t1 t2 · · · tj, we define ∆(σ) = Σ_{i=1}^{j} t̄i (∈ Z^k), i.e., the vector of net changes of tokens in P if σ is executed. A transition t ∈ T is enabled at a marking µ iff ∀p ∈ P, ϕ(p, t) ≤ µ(p). If a transition t is enabled, it may fire and yield the marking µ′ (written as µ →t µ′) with µ′(p) = µ(p) − ϕ(p, t) + ϕ(t, p), ∀p ∈ P. By fixing an ordering on the elements of P and T (i.e., P = {p1, . . . , pk} and T = {r1, . . . , rm}), we can view a marking µ as a k-dimensional column vector
¹ We assume the following arithmetic for ω: (1) ∀n ∈ N, n < ω; (2) ∀n ∈ Nω, n + ω = ω − n = ω, (n + 1) × ω = ω, and 0 × ω = ω × 0 = 0.
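The enabling and firing rules just defined translate directly into code. The following Python sketch is our own illustration (not the paper's), with ϕ stored as a dict over (place, transition) and (transition, place) pairs:

```python
def enabled(mu, phi_in, t):
    """t is enabled at marking mu iff phi(p, t) <= mu(p) for every place p."""
    return all(mu[p] >= phi_in[(p, t)] for p in mu)

def fire(mu, phi_in, phi_out, t):
    """mu --t--> mu' with mu'(p) = mu(p) - phi(p, t) + phi(t, p)."""
    assert enabled(mu, phi_in, t)
    return {p: mu[p] - phi_in[(p, t)] + phi_out[(t, p)] for p in mu}

# A two-place net in which t moves one token from p1 to p2.
phi_in  = {('p1', 't'): 1, ('p2', 't'): 0}   # phi(p, t)
phi_out = {('t', 'p1'): 0, ('t', 'p2'): 1}   # phi(t, p)
mu = {'p1': 2, 'p2': 0}
print(fire(mu, phi_in, phi_out, 't'))        # {'p1': 1, 'p2': 1}
```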
H.-C. Yen, C.-L. Chen / Theoretical Computer Science 410 (2009) 2442–2452
Fig. 1. Induction basis.
with its i-th component being µ(pi), and #σ as an m-dimensional column vector whose j-th entry denotes the number of occurrences of transition rj in σ. The reachability set of P with respect to µ0 is the set R(P, µ0) = {µ | ∃σ ∈ T*, µ0 →σ µ}.
F(P, µ0) (= {σ ∈ T* | µ0 →σ}) denotes the set of all fireable sequences of transitions in the PN (P, µ0). Given a σ ∈ T^ω, Inf_T(σ) denotes the set of all elements of T that occur infinitely many times in σ. A k-dimensional vector addition system with states (VASS) is a 5-tuple (v0, V, s1, S, δ), where v0 ∈ N^k is called the start vector, V (⊆ Z^k) is the set of addition rules, S is a finite set of states, δ (⊆ S × S × V) is the transition relation, and s1 (∈ S) is the initial state. Elements (p, q, v) of δ are called transitions and are usually written as p → (q, v). A configuration of a VASS is a pair (p, x) where p ∈ S and x ∈ N^k. The transition p → (q, v) can be applied to the configuration (p, x) and yields the configuration (q, x + v), provided that x + v ≥ 0.

3. A strategy for computing the sizes of minimal elements

In an article [8] by Valk and Jantzen, the following result was proven, giving a necessary and sufficient condition under which the set of minimal elements of an upward-closed set is effectively computable:

Theorem 1 ([8]). For each upward-closed set K (⊆ N^k), min(K) is effectively computable iff for every v ∈ N^k_ω, the problem ''reg(v) ∩ K ≠ ∅?'' is decidable. (Recall that reg(v) = {w ∈ N^k | w ≤ v}.)

What follows can be thought of as a refinement of Theorem 1.

Theorem 2. Given an upward-closed set U (⊆ N^k), if for every v ∈ N^k_ω a witness ŵ ∈ N^k for ''reg(v) ∩ U ≠ ∅'' (if one exists) can be computed with (i) ‖ŵ‖ ≤ b for some b ∈ N when v = (ω, . . . , ω), and (ii) ‖ŵ‖ ≤ f(‖v‖) when v ≠ (ω, . . . , ω), for some monotone function f, then ‖min(U)‖ ≤ f^(k−1)(b).

Proof. Given an arbitrary h, 1 ≤ h ≤ k, we show inductively that for each m ∈ min(U), there exist h indices i_{m1}, . . . , i_{mh} such that ∀l, 1 ≤ l ≤ h, m(i_{ml}) ≤ f^(h−1)(b).

• (Induction basis) Consider the case h = 1. We begin with v0 = (ω, . . . , ω).
Assume that w0 is a witness for reg(v0) ∩ U ≠ ∅, which, according to the assumption of the theorem, satisfies ‖w0‖ ≤ b = f^(0)(b). Let min1(U) = min(U) \ reg((b, . . . , b)), i.e., those elements of min(U) that have at least one component larger than b. If min1(U) = ∅, then the theorem follows, since f^(0)(b) is then a bound for ‖min(U)‖. Otherwise, for every m ∈ min1(U) it must be the case that ∃i, 1 ≤ i ≤ k, m(i) < b; otherwise, m would not be minimal, since w0 ≤ (b, . . . , b) ≤ m would place the element w0 of U below m. Hence, the assertion holds for h = 1, i.e., f^(0)(b) bounds at least one component of every element of min(U). See Fig. 1.

• (Induction step) Assume that the assertion holds for h (< k); we now show it for h + 1. Consider min_h(U) = min(U) \ ∪_{v∈N^k, ‖v‖≤f^(h−1)(b)} reg(v), i.e., the set of minimal elements that have at least one coordinate exceeding f^(h−1)(b). If min_h(U) = ∅, the assertion holds; otherwise, take an arbitrary m ∈ min_h(U), and let i_{m1}, . . . , i_{mh} be the indices of those components satisfying the assertion, i.e., ∀1 ≤ j ≤ h, m(i_{mj}) ≤ f^(h−1)(b). Let v_h^m be such that v_h^m(l) = m(i_{mj}) if l = i_{mj}, and v_h^m(l) = ω otherwise. That is, v_h^m agrees with m on coordinates i_{m1}, . . . , i_{mh}, and carries the value ω on the remaining coordinates. Notice that ‖v_h^m‖ ≤ f^(h−1)(b). According to the assumption of the theorem, a witness w_h^m for reg(v_h^m) ∩ U ≠ ∅ with ‖w_h^m‖ bounded by f(‖v_h^m‖) (≤ f(f^(h−1)(b)) = f^(h)(b)) can be obtained. Furthermore, w_h^m(i_{mj}) ≤ m(i_{mj}), 1 ≤ j ≤ h. (Notice that m ∈ min_h(U) ⊆ U implies the existence of such a witness.) There must exist an index i_{m(h+1)} (∉ {i_{m1}, . . . , i_{mh}}) such that m(i_{m(h+1)}) ≤ w_h^m(i_{m(h+1)}) (≤ f^(h)(b)), since otherwise w_h^m < m, contradicting the minimality of m. The induction step is therefore proven.
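The bound of Theorem 2 is obtained simply by iterating f, starting from the base bound b. A small Python sketch of this arithmetic (the concrete functions passed in below are hypothetical stand-ins for the witness-size function of the theorem):

```python
def min_elements_bound(f, b, k):
    """Upper bound f^(k-1)(b) on ||min(U)|| from Theorem 2, given the
    witness-size function f (assumed monotone) and the base bound b."""
    bound = b
    for _ in range(k - 1):
        bound = f(bound)
    return bound

# e.g. with f(x) = x**2 + 1 and b = 2 in dimension k = 3: f(f(2)) = 26
print(min_elements_bound(lambda x: x**2 + 1, 2, 3))  # 26
```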
4. Some applications

4.1. General Petri nets

Given a k-place m-transition PN (P, T, ϕ), a d × m integer matrix A, and a d × 1 integer column vector b, consider the set

S = {µ | µ →σ0 µ1 →σ1 µ2, µ2 ≥ µ1; A × #σ1 ≥ b},

where µ, µ1, µ2 ∈ N^k, σ0 ∈ T*, and σ1 ∈ T^+. (Here A × #σ1 ≥ b represents a system of d inequalities whose variables correspond to the frequency counts of the m transitions along the path σ1.) Clearly the set S is upward-closed. In what follows, we first use Theorem 2 to find a bound for the sizes of the minimal elements of S. We then show a wide variety of PN problems reported in the literature to be characterizable by the above set S as special cases, which in turn yields upper bounds for the sizes of their minimal elements. Our analysis makes use of the inductive strategy developed by Rackoff in [6], in which the complexities of the boundedness and covering problems for vector addition systems (equivalently, PNs) were derived. Intuitively speaking, Rackoff's strategy relies on showing that if a path exhibiting unboundedness or coverability exists, then there is a ''short'' witness. Before going into details, we require the following definitions, most of which can be found in [6]. A generalized marking is a mapping µ : P → Z (i.e., negative coordinates are allowed). A w ∈ Z^k is called i-bounded (resp. i–r bounded) if 0 ≤ w(j), ∀1 ≤ j ≤ i (resp. 0 ≤ w(j) ≤ r, ∀1 ≤ j ≤ i). Given a k-place PN P = (P, T, ϕ), suppose p = w1 w2 · · · wl (l > 1) is a sequence of vectors (generalized markings) in Z^k such that ∀j, 1 ≤ j < l, w_{j+1} − w_j ∈ T̄ (the set of transition vectors). The sequence p is said to be
• i-bounded (resp. i–r bounded) if every member of p is i-bounded (resp. i–r bounded),
• self-covering if there is a j, 1 ≤ j < l, such that wj ≤ wl,
• an i-loop if wl(j) = w1(j), ∀1 ≤ j ≤ i. The i-loop is called simple if it does not contain any i-loop as a proper subsequence; wl − w1 (∈ Z^k) is called the loop value of the i-loop.

With respect to the matrices [A]_{d×m} and [b]_{d×1}, let s(i, µ, A, b) be the length of the shortest i-bounded path µ →σ0 µ1 →σ1 µ2 (for some σ0 ∈ T* and σ1 ∈ T^+) such that µ2 ≥ µ1 and A × #σ1 ≥ b, i.e., a self-covering path with the transition count vector #σ1 satisfying A × #σ1 ≥ b. For convenience, we call such a path a ''self-covering (A, b)-path''. If no such path exists, then s(i, µ, A, b) = 0. We define h(i, A, b) = max{s(i, µ, A, b) | µ ∈ Z^k}. In what follows, we argue that h(i, A, b) ∈ N. To see this, first note that the function s is monotone with respect to µ, in the sense that if an i-bounded self-covering (A, b)-path p exists for a marking µ, then p is guaranteed to be an i-bounded self-covering (A, b)-path with respect to µ + ∆, for any ∆ ≥ 0. This implies s(i, µ + ∆, A, b) ≤ s(i, µ, A, b), for any ∆ ≥ 0. As a result, if we let E(i, A, b) = {µ | s(i, µ, A, b) > 0}, i.e., the set of all generalized markings from which i-bounded self-covering (A, b)-paths exist, then E(i, A, b) is upward-closed. Let E′(i, A, b) be the set of minimal elements of E(i, A, b). Then h(i, A, b) = max{s(i, µ, A, b) | µ ∈ Z^k} = max{s(i, µ, A, b) | µ ∈ E′(i, A, b)} is finite, and it does not depend on the starting marking. Before deriving our result, we need the following lemma bounding the size of the solutions of integer linear programming instances.

Lemma 3 (From [3]). Let d1, d2 ∈ N^+, let B be a d1 × d2 integer matrix, and let h be a d1 × 1 integer vector. Let e ≥ d2 be an upper bound on the absolute values of the integers in B and h. If there exists a vector v ∈ N^{d2} which is a solution to Bv ≥ h, then there exists a vector v such that Bv ≥ h and ‖v‖ ≤ e^{c·d1}, for some constant c independent of e, d1, d2.

The following lemma bounds the length of the shortest i–r bounded self-covering (A, b)-path in a k-place m-transition PN P = (P, T, ϕ). In the remainder of this section, we let n = max{d, k, m, ‖T̄‖, ‖A‖, ‖b‖}, where A is of dimension d × m.

Lemma 4.
If there is an i–r bounded self-covering (A, b)-path in PN P with initial marking µ, then there exists a witnessing path of length ≤ r^{n^c}, for some constant c independent of r and n.

Proof. The proof is similar to (but more involved than) the corresponding one in [6]. For the sake of completeness, a proof sketch is given below.

Let µ →σ0 µ1 →σ1 µ2 be an i–r bounded self-covering (A, b)-path. First note that µ →σ0 µ1 need not be longer than r^k; otherwise, there must exist an i-loop which can be removed without affecting the requirement of being a self-covering (A, b)-path. (Recall that the condition A × #σ1 ≥ b is independent of the prefix σ0.) We now decompose µ1 →σ1 µ2 into a possibly shorter path µ1 →σ′ µ2 and a multiset of simple i-loops {Q1, Q2, . . . , Qj} (for some j) such that

• the length of µ1 →σ′ µ2 is ≤ (r^k + 1)^2,
• the length of each Qi is ≤ r^k,
• each coordinate of the loop value of Qi (1 ≤ i ≤ j) has absolute value ≤ n · r^k,
• the number of distinct m-dimensional vectors in {#Q1, #Q2, . . . , #Qj} is ≤ (r^k + 1)^m.
The reason that such a decomposition exists can be found in [6]. Note that the total number of distinct loop values is ≤ (r^k + 1)^m, as the same transition count vector yields the same loop value. Let v1, . . . , vg (g ≤ (r^k + 1)^m) be the distinct vectors of transition counts of those simple i-loops, and let l1, . . . , lg be the respective loop values. Recall that each vi, 1 ≤ i ≤ g, is an m × 1 column vector. Now the path µ1 →σ1 µ2 can be characterized by the following system of linear inequalities:

∆(σ′) + a1 · l1 + · · · + ag · lg ≥ 0    (1a)
A × (#σ′ + a1 · v1 + · · · + ag · vg) ≥ b    (1b)

where ai, 1 ≤ i ≤ g, corresponds to the number of times that a simple i-loop with transition count vector vi occurs. (1a) and (1b) together can be regarded as a system of k + d inequalities in g unknown variables (i.e., a1, . . . , ag), and the max-value of the system is bounded by n · (r^k + 1)^2 (corresponding to the maximum increase/decrease of tokens in a place w.r.t. the firing of σ′). Note that r^{3n^2} ≥ max{n · (r^k + 1)^2, (r^k + 1)^m}. By letting d1 = k + d and e = r^{3n^2}, Lemma 3 yields a solution u ∈ N^g such that ‖u‖ ≤ (r^{3n^2})^{c′(k+d)}, for some constant c′. As a result, there exists a ''short'' i–r bounded self-covering (A, b)-path whose length is no more than r^{n^c}, for some constant c independent of r and n.
We are now ready to bound the max-value of the minimal elements of the set S = {µ | µ ∈ N^k, µ →σ0 µ1 →σ1 µ2, µ2 ≥ µ1, A × #σ1 ≥ b}.

Lemma 5. Given a PN P and a z ∈ N^k_ω, reg(z) ∩ S ≠ ∅ iff there is a witness z′ with ‖z′‖ ≤ 2^{2^{c1·k·log n}}, where c1 is a constant. (Note that the bound is independent of z.)
Proof. Recall that h(i, A, b) = max{s(i, µ, A, b) | µ ∈ Z^k}, where s(i, µ, A, b) is the length of the shortest i-bounded self-covering (A, b)-path from µ. We first show h(0, A, b) ≤ 2^{n^c} and h(i + 1, A, b) ≤ (n × h(i, A, b))^{n^c} + h(i, A, b), for 0 ≤ i < k.

h(0, A, b) ≤ 2^{n^c} can be proven along the same lines as Lemma 4.6 of [6]. To show h(i + 1, A, b) ≤ (n × h(i, A, b))^{n^c} + h(i, A, b), consider any (i + 1)-bounded self-covering (A, b)-path p : v1 · · · vr. We have two cases:

• Case 1: Path p is (i + 1)–(n × h(i, A, b)) bounded. Then according to Lemma 4, there exists a short such path of length ≤ (n × h(i, A, b))^{n^c}.
• Case 2: Otherwise, let vg be the first vector along p that is not (n × h(i, A, b))-bounded. By chopping off (i + 1)-loops, the prefix v1 · · · vg can be shortened if necessary to make its length ≤ (n × h(i, A, b))^{i+1}. Let p′ be the shortened prefix path. Without loss of generality, we assume the (i + 1)st position to be the coordinate whose value exceeds n × h(i, A, b) at vg. Recalling the definition of h(i, A, b), there is an i-bounded self-covering (A, b)-path, say l, of length ≤ h(i, A, b) from vg. On appending l to the shortened prefix p′ (i.e., replacing the original suffix path vg · · · vr by l), the new path is an (i + 1)-bounded self-covering (A, b)-path, because the value of the (i + 1)st coordinate exceeds n × h(i, A, b) and the path l (of length ≤ h(i, A, b)) can subtract at most n × h(i, A, b) from coordinate i + 1, since the application of a PN transition can subtract at most n from a given coordinate. Note that the length of the new path is bounded by (n × h(i, A, b))^{i+1} + h(i, A, b) ≤ (n × h(i, A, b))^{n^c} + h(i, A, b).

By solving the recurrence relation h(0, A, b) ≤ 2^{n^c} and h(i + 1, A, b) ≤ (n × h(i, A, b))^{n^c} + h(i, A, b), we have h(k, A, b) ≤ 2^{n^{c′·k}} = 2^{2^{c′·k·log n}}, for some constant c′. What this bound means is that regardless of the initial vector, if a self-covering (A, b)-path exists, then there is a short one whose length is bounded by 2^{2^{c′·k·log n}}. Since a path of length ≤ 2^{2^{c′·k·log n}} can subtract at most ‖T̄‖ × 2^{2^{c′·k·log n}} from any component, ‖z′‖ is therefore bounded by n · 2^{2^{c′·k·log n}} ≤ 2^{2^{c1·k·log n}}, for some constant c1.

Theorem 6. ‖min(S)‖ ≤ 2^{2^{c1·k·log n}}, where c1 is a constant.

Proof. Given a z ∈ N^k_ω, define f(‖z‖) = 2^{2^{c1·k·log n}} (where c1 is the constant stated in Lemma 5), which provides an upper bound for a witness certifying reg(z) ∩ S ≠ ∅, if one exists. Notice that the value of f is independent of z. Our result follows immediately from Theorem 2.
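The membership condition defining S can be checked directly once the witnessing markings and the transition-count vector of σ1 are in hand. A small Python sketch of our own (fireability of the underlying sequences is assumed rather than checked):

```python
def satisfies_S(mu1, mu2, counts, A, b):
    """Check the two conditions that place a marking in S:
    mu2 >= mu1 (the loop is repeatable) and A x #sigma1 >= b.
    The fireability of sigma0 and sigma1 is assumed, not verified."""
    if any(x2 < x1 for x1, x2 in zip(mu1, mu2)):
        return False
    for row, rhs in zip(A, b):
        if sum(a * c for a, c in zip(row, counts)) < rhs:
            return False
    return True

A = [[1, 0], [0, 1]]      # require each of the two transitions
b = [1, 1]                # to fire at least once along sigma1
print(satisfies_S((1, 0), (2, 0), (1, 1), A, b))   # True
print(satisfies_S((1, 0), (0, 2), (1, 1), A, b))   # False: place 1 loses tokens
```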
In what follows, we show that a wide variety of PN problems studied in the literature are actually special cases of finding paths satisfying µ →σ0 µ1 →σ1 µ2, µ2 ≥ µ1, A × #σ1 ≥ b, for some matrix A and vector b. As a result, Theorem 6 can immediately be applied to derive the max-value of the minimal elements of the upward-closed sets associated with those PN problems. We first examine some upward-closed sets defined and discussed in [8], recalling some definitions from [8] first. Given a PN (P, T, ϕ), a vector µ ∈ N^k is said to be:
• T̂-blocked, for T̂ ⊆ T, if ∀µ′ ∈ R(P, µ), ¬(∃t ∈ T̂, µ′ →t); in the case T̂ = T, µ is said to be a total deadlock;
• dead if F(P, µ) is finite;
• bounded if R(P, µ) is finite; otherwise, it is called unbounded;
• T̂-continual, for T̂ ⊆ T, if there exists a σ ∈ T^ω with µ →σ and T̂ ⊆ Inf_T(σ).
Consider the following four sets defined in [8]:
• NOTBLOCKED(T̂) = {µ ∈ N^k | µ is not T̂-blocked}.
• NOTDEAD = {µ ∈ N^k | µ is not dead}.
• UNBOUNDED = {µ ∈ N^k | µ is unbounded}.
• CONTINUAL(T̂) = {µ ∈ N^k | µ is T̂-continual}.
It has been shown in [8] that for each of the above four upward-closed sets, the ''reg(v) ∩ K ≠ ∅?'' query of Theorem 1 is decidable; as a consequence, the set of minimal elements is effectively computable. We now show how to use Theorem 6 to bound the minimal elements of each of the four sets.

• NOTBLOCKED(T̂): Consider a PN P′ = (P′, T′, ϕ′) constructed from P such that P′ = P ∪ {p′}, T′ = T ∪ {t′}, (ϕ′(p, t) = ϕ(p, t) and ϕ′(t, p) = ϕ(t, p), ∀p ∈ P, t ∈ T), (ϕ′(t, p′) = 1, ∀t ∈ T̂), and ϕ′(t′, p′) = ϕ′(p′, t′) = 1. Note that p′ and t′ form a self-loop that can be fired repeatedly, provided that p′ is not empty. It is not difficult to see that w.r.t. P′, NOTBLOCKED(T̂) = {µ | µ →σ0 µ1 →σ1 µ2, µ2 ≥ µ1, σ0 ∈ (T′)*, σ1 = t′}.
• NOTDEAD: It is easy to see that NOTDEAD = {µ | µ →σ0 µ1 →σ1 µ2, µ2 ≥ µ1, σ0 ∈ T*, σ1 ∈ T^+}.
• UNBOUNDED: Consider a PN P′ = (P′, T′, ϕ′) constructed from P such that P′ = P ∪ {p′}, T′ = T ∪ {t′} ∪ {tp | p ∈ P}, (ϕ′(p, t) = ϕ(p, t) and ϕ′(t, p) = ϕ(t, p), ∀p ∈ P, t ∈ T), (ϕ′(p, tp) = ϕ′(tp, p′) = 1, ∀p ∈ P), and ϕ′(p′, t′) = 1. Clearly, w.r.t. P′, UNBOUNDED = {µ | µ →σ0 µ1 →σ1 µ2, µ2 ≥ µ1, σ0 ∈ (T′)*, #σ1(t′) > 0}.
• CONTINUAL(T̂): It is easy to see that CONTINUAL(T̂) = {µ | µ →σ0 µ1 →σ1 µ2, µ2 ≥ µ1, σ0 ∈ T*, #σ1(t) > 0, ∀t ∈ T̂}.

We now consider a number of fairness-related problems defined in [8] (see Definition 6.9 of [8]), and see how they also fall into the general framework discussed above. Let A be a finite set of nonempty subsets of transitions. Consider the following six types of fairness notions [8]. With respect to A, an infinite transition sequence σ = t1 t2 · · · is said to be
• T1-fair iff ∃A ∈ A, ∃i ≥ 1, ti ∈ A,
• T1′-fair iff ∃A ∈ A, ∀i ≥ 1, ti ∈ A,
• T2-fair iff ∃A ∈ A, Inf_T(σ) ∩ A ≠ ∅,
• T2′-fair iff ∃A ∈ A, Inf_T(σ) ⊆ A,
• T3-fair iff ∃A ∈ A, Inf_T(σ) = A,
• T3′-fair iff ∃A ∈ A, A ⊆ Inf_T(σ).
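The notions that depend only on Inf_T(σ) (T2, T2′, T3, T3′) reduce to elementary set comparisons between Inf_T(σ) and the members of A. A Python sketch of our own (T1 and T1′ depend on the whole sequence, not just on Inf_T(σ), so they are omitted; T2′ is read here as set inclusion):

```python
def is_fair(kind, inf_set, calA):
    """Decide the Inf-based fairness notions from Inf_T(sigma) and
    the family calA of transition subsets."""
    inf_set = frozenset(inf_set)
    tests = {
        'T2':  lambda A: bool(inf_set & A),    # Inf meets A
        "T2'": lambda A: inf_set <= A,         # Inf contained in A
        'T3':  lambda A: inf_set == A,         # Inf equals A
        "T3'": lambda A: A <= inf_set,         # A contained in Inf
    }
    return any(tests[kind](frozenset(A)) for A in calA)

calA = [{'t1', 't2'}]
print(is_fair('T2', {'t2', 't3'}, calA))   # True: Inf meets {t1, t2}
print(is_fair('T3', {'t2', 't3'}, calA))   # False
```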
The fair nontermination problem (fair NTP, for short) with respect to T1 (resp. T1′, T2, T2′, T3, T3′) fairness is the problem of determining whether a PN P has an infinite T1-fair (resp. T1′-, T2-, T2′-, T3-, T3′-fair) computation from its initial marking µ.

Let X-FAIR-NTP(A) be the set {µ | ∃σ ∈ T^ω, µ →σ, σ is X-fair w.r.t. A}, where X ∈ {T1, T1′, T2, T2′, T3, T3′}; this set is clearly upward-closed. We now show that for all six fairness notions, X-FAIR-NTP(A) can be dealt with in the framework discussed earlier in this section.

• T1-FAIR-NTP(A): Consider a PN P′ = ((P ∪ {p′}), (T ∪ {t′}), ϕ′) constructed from P such that (∀p ∈ P, t ∈ T, ϕ′(p, t) = ϕ(p, t), ϕ′(t, p) = ϕ(t, p)), (∀t ∈ ∪_{A∈A} A, ϕ′(t, p′) = 1), and ϕ′(p′, t′) = ϕ′(t′, p′) = 1. It is easy to see that w.r.t. P′, µ ∈ T1-FAIR-NTP(A) iff µ →σ0 µ1 →σ1 µ2, µ2 ≥ µ1, #σ1(t′) > 0, Σ_{t∈T} #σ1(t) > 0, for some σ0 ∈ T*, σ1 ∈ (T ∪ {t′})^+.
• T1′-FAIR-NTP(A): µ ∈ T1′-FAIR-NTP(A) iff ∃A ∈ A such that µ →σ0 µ1 →σ1 µ2, µ2 ≥ µ1, for some σ0 ∈ A*, σ1 ∈ A^+.
• T2-FAIR-NTP(A): µ ∈ T2-FAIR-NTP(A) iff µ →σ0 µ1 →σ1 µ2, µ2 ≥ µ1, and Σ_{t∈∪_{A∈A}A} #σ1(t) > 0, for some σ0 ∈ T*, σ1 ∈ T^+.
• T2′-FAIR-NTP(A): µ ∈ T2′-FAIR-NTP(A) iff ∃A ∈ A such that µ →σ0 µ1 →σ1 µ2, µ2 ≥ µ1, and (Σ_{t∈A} #σ1(t) ≥ 0) ∧ (Σ_{t∉A} #σ1(t) = 0), for some σ0 ∈ T*, σ1 ∈ T^+.
• T3-FAIR-NTP(A): µ ∈ T3-FAIR-NTP(A) iff ∃A ∈ A such that µ →σ0 µ1 →σ1 µ2, µ2 ≥ µ1, and (∀t ∈ A, #σ1(t) > 0) ∧ (Σ_{t∉A} #σ1(t) = 0), for some σ0 ∈ T*, σ1 ∈ T^+.
• T3′-FAIR-NTP(A): µ ∈ T3′-FAIR-NTP(A) iff ∃A ∈ A such that µ ∈ CONTINUAL(A).
In view of the above, a bound for the max-value of the minimal elements of each of the aforementioned upward-closed sets follows from Theorem 6.
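The place/transition augmentations used above are easy to mechanize. The following Python sketch of our own builds the net P′ used for NOTBLOCKED(T̂), with ϕ stored as a dict (the zero entries are filled in explicitly so that the flow function stays total):

```python
def notblocked_net(P, T, phi, T_hat):
    """Build P' = (P u {p'}, T u {t'}, phi') of Section 4.1: every t in
    T_hat deposits a token in the fresh place p', and t' is a self-loop
    on p' that can repeat iff p' is nonempty."""
    P2, T2 = P | {"p'"}, T | {"t'"}
    phi2 = dict(phi)                                 # keep phi on P x T
    for t in T:
        phi2[(t, "p'")] = 1 if t in T_hat else 0
        phi2[("p'", t)] = 0
    phi2[("t'", "p'")] = phi2[("p'", "t'")] = 1      # the self-loop
    for p in P:                                      # t' touches no old place
        phi2[(p, "t'")] = phi2[("t'", p)] = 0
    return P2, T2, phi2
```

A marking then belongs to NOTBLOCKED(T̂) exactly when, in the constructed net, some reachable marking enables the self-loop t′ forever, which is the σ1 = t′ pattern used in the characterization above.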
Now we turn our attention to a problem that arises frequently in automated verification. Given a system S with initial state q, and a designated set of states Q , it is often of interest and importance to ask whether some state in Q can be reached from q, which constitutes a question related to the analysis of a safety property. Instead of using the forward-reachability analysis (which computes all the states that can be reached from q to see whether the intersection with Q is non-empty or not), an equally useful approach is to use the so-called backward-reachability analysis. In the latter, we compute the set pre∗ (S , Q ) which consists of all the states from which some state in Q is reachable, and then decide whether q ∈ pre∗ (S , Q ). In general, pre∗ (S , Q ) may not be computable for infinite state systems. For PNs, we define the backward-reachability (BR, for short) problem as follows:
• Input: A PN P and a set U of markings.
• Output: The set pre*(P, U) = {µ | R(P, µ) ∩ U ≠ ∅}.

Now suppose U is upward-closed; then {µ | R(P, µ) ∩ U ≠ ∅} is upward-closed as well, and is, in fact, equal to ∪_{ν∈min(U)} {µ | ∃µ′ ∈ R(P, µ), µ′ ≥ ν}. The latter is basically a coverability question for PNs. Hence, the max-value of the minimal elements can be derived along the same lines as for the set NOTBLOCKED.

4.2. Parametric clocked Petri nets

Clocked Petri nets are Petri nets augmented with a finite set of real-valued clocks and clock constraints. Clocks are used to measure the progress of real time in the system. All the clocks can be reset and increase at a uniform rate; we can also regard clocks as stop-watches which refer to the same global clock. This clock structure was originally introduced in [1] for defining timed automata. Given a set X = {x1, x2, . . . , xh} of clock variables, the set Φ(X) of clock constraints δ is defined inductively by δ := x ≤ c | c ≤ x | ¬δ | δ ∧ δ, where x is a clock in X and c is a constant in Q^+ (the set of nonnegative rationals). A clock reading is a mapping ν : X → R which assigns each clock a real value. For η ∈ R, we write ν + η to denote the clock reading which maps every clock x to the value ν(x) + η. That is, after η time units are added to the global clock, the value of every clock increases by η units as well. A clock reading ν for X satisfies a clock constraint δ over X, denoted by δ(ν) ≡ true, iff δ evaluates to true using the values given by ν. A clocked Petri net is a 6-tuple N = (P, T, ϕ, X, r, q), where (P, T, ϕ) is a PN, X is a finite set of real-valued clock variables, r : T → 2^X is a labeling function assigning clocks to transitions, and q : T → Φ(X) is a labeling function assigning clock constraints to transitions. Intuitively, r(t) contains those clock variables which are reset when transition t is fired.
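Clock constraints built from the grammar δ := x ≤ c | c ≤ x | ¬δ | δ ∧ δ are straightforward to evaluate against a clock reading. A Python sketch of our own, with constraints encoded as nested tuples and readings as dicts:

```python
def holds(delta, nu):
    """Evaluate a clock constraint delta under the clock reading nu."""
    op = delta[0]
    if op == 'le':                      # ('le', x, c): x <= c
        return nu[delta[1]] <= delta[2]
    if op == 'ge':                      # ('ge', x, c): c <= x
        return nu[delta[1]] >= delta[2]
    if op == 'not':
        return not holds(delta[1], nu)
    if op == 'and':
        return holds(delta[1], nu) and holds(delta[2], nu)
    raise ValueError(op)

def advance(nu, eta):
    """nu + eta: every clock grows uniformly with the global clock."""
    return {x: v + eta for x, v in nu.items()}

d = ('and', ('le', 'x1', 3), ('not', ('le', 'x2', 1)))   # x1 <= 3 and x2 > 1
print(holds(d, {'x1': 2.0, 'x2': 1.5}))                  # True
print(holds(d, advance({'x1': 2.0, 'x2': 1.5}, 2)))      # False: x1 becomes 4.0
```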
A configuration (µ, η, ν) of a clocked Petri net consists of a marking µ, the global time η, and the present clock reading ν. Note that the clock reading ν is continuously being updated as η, the global time, advances; hence, ν and η are not completely independent. Given a configuration (µ, η, ν) of a clocked Petri net P, a transition t is enabled iff ∀p ∈ P, ϕ(p, t) ≤ µ(p), and ν satisfies q(t), the clock constraint associated with transition t, i.e., q(t)(ν) ≡ true. Let µ be the marking and ν the clock reading at time η. Then t may fire at η if t is enabled in the marking µ with the clock reading ν. We then write (µ, ν) →(t,η) (µ′, ν′), where µ′(p) = µ(p) − ϕ(p, t) + ϕ(t, p) (for all p ∈ P), and ν′(x) = 0 (for all x ∈ r(t)). Note that the global time remains unchanged as a result of firing t; that is, the firing of a transition is assumed to let no time elapse at all. The global clock starts moving again immediately after the firing of a transition is completed. Initially, we assume the global time η0 and clock reading ν0 to be η0 = 0 and ν0(x) = 0 (∀x ∈ X), respectively. It is important to point out that, as opposed to timed PNs under the urgent firing semantics, enabledness is necessary but not sufficient for transition firing in a clocked PN; in other words, it is not required to fire all the enabled transitions at any point in time during the course of the computation.

Now consider clocked PNs with parameterized constraints. That is, the ''c'' in the atomic constraints x ≤ c and c ≤ x is not a constant; instead, it is an unknown parameter. We are interested in the following:

- Input: A clocked PN P with unknown integer parameters θ1, . . . , θn in its clock constraints, and a set Q of goal markings.
- Output: The values of θ1, . . . , θn (if they exist) such that there exists a computation reaching a marking in Q.

In what follows, we let S(θ1, . . . , θn) denote this set of solutions. Even for timed automata, it is known that the emptiness problem (i.e., the problem of deciding whether there exists a parameter setting under which the associated timed language is empty) is undecidable when three or more clocks are compared with unknown parameters [2]. In what follows, we consider a special case in which the atomic clock constraints involved are only of the form x ≤ θ or x < θ, and there are no negation signs immediately before inequalities. In this case, the set {(θ1, . . . , θn) | there exists a computation in P reaching a marking in Q under (θ1, . . . , θn)} is clearly upward-closed, as x ≤ θ =⇒ x ≤ θ′ and x < θ =⇒ x < θ′ whenever θ ≤ θ′; that is, whatever is enabled under θ is also enabled under θ′. A technique known to be useful for reasoning about timed automata is based on the notion of ''equivalence classes'' [1]. In spite of the differences in value, two different clock readings may induce identical system behaviors; in this case, they are said to be in the same clock region. For clock constraints of the types given in our setting, the number of distinct clock regions is always finite [1], meaning that a timed automaton (which is potentially of infinite state) is behaviorally equivalent
to a so-called region automaton (which is of finite state). Furthermore, the number of clock regions of a timed automaton A is bounded by 2|Q|(|X| · (C_A + 2))^{|X|}, where |Q| is the number of states of A, |X| is the number of clocks of A, and C_A is the maximum timing constant involved in A (see [1] for details). This, coupled with our earlier discussion of upward-closed sets and PNs, yields the following result:

Theorem 7. Given a k-dimensional clocked PN P with unknown integer parameters θ1, . . . , θn in its clock constraints, and an upward-closed set Q of goal markings, ‖min(S(θ1, . . . , θn))‖ is bounded by O((D · |X|)^{2^{d2·n·k·log k · |X|^n}}), where |X| is the number of clocks, D is the absolute value of the maximum number involved in P and min(Q), and d2 is a constant.
Proof. The proof is somewhat involved, and hence only a proof sketch is given here. Our derivation is based on the approach detailed in Theorem 2. For the PN to reach Q, it suffices to consider whether a marking covering an element of min(Q) is reachable. Recall from Theorem 2 that our approach for computing ‖min(S(θ1, . . . , θn))‖ begins by letting (θ1, . . . , θn) = (ω, . . . , ω) = v0. In this case, the associated clocked PN can be simplified by deleting all the clock constraints involving the θi, because x ≤ ω (resp. x < ω) always holds. Now the idea is to simulate clocked PNs by VASSs. To this end, we use the ''state'' portion of the VASS to capture the structure of the (finitely many) clock regions of a clocked PN as discussed earlier, and an ''addition vector'' of the VASS to simulate a transition of the PN. Using the analysis of [1], it is not hard to see that the number of clock regions is bounded by O((|X| · C0)^{|X|}), where |X| is the number of clocks and C0 is the maximum timing constant (i.e., the maximum value of the constants involved in clock constraints); this corresponds to the number of states of the VASS.

It was shown in [7], using the technique of multi-parameter analysis, that for an m-state, k-dimensional VASS whose largest integer can be represented in l bits, the length of the shortest witnessing path covering a given marking is bounded by O((2^l · m)^{2^{d·k·log k}}). Applying a similar analysis to our constructed VASS and the concept of clock regions, a witness for ''reg(v0) ∩ S(θ1, . . . , θn) ≠ ∅?'' (if it exists) of max-value bounded by d1 · (D · (|X| · C0)^{|X|})^{2^{d2·k·log k}} can be found, for some constants d1, d2. This bound corresponds to the value b in the statement of Theorem 2.

The next step is to start with v1 = (θ1, ω, . . . , ω) with θ1 < d1 · (D · (|X| · C0)^{|X|})^{2^{d2·k·log k}}. Let this value be C1. In this case, those clock constraints involving θ1 can no longer be ignored. We construct a new VASS simulating the associated clocked PN; such a VASS has its number of states bounded by O((|X| · C1)^{|X|}), implying that a witness for ''reg(v1) ∩ S(θ1, . . . , θn) ≠ ∅?'' (if it exists) of max-value bounded by d1 · (D · (|X| · C1)^{|X|})^{2^{d2·k·log k}} can be found, which corresponds to the f function (w.r.t. the variable C1) in Theorem 2. Finally, Theorem 2 immediately yields ‖min(S(θ1, . . . , θn))‖ = O((D · |X|)^{2^{d2·n·k·log k · |X|^n}}).
4.3. Subclasses of Petri nets

Consider the following problem:

• Input: A PN P = (P, T, ϕ), a marking µ0, and a system of linear (in)equalities L(v1, . . . , vm), where m = |T|. Clearly, the set pre*(P, (µ0, L)) = {µ | ∃σ ∈ T*, µ →σ µ′′, µ′′ ≥ µ0, and L(#σ(t1), . . . , #σ(tm)) holds} is upward-closed. (Intuitively, the set contains those markings µ from which there is a computation covering µ0 and along which the transition firing count vector (#σ(t1), . . . , #σ(tm)) satisfies L.)
• Output: min(pre*(P, (µ0, L))).

What makes the subclasses of PNs in Fig. 2 of interest is that their reachability sets can be characterized by integer linear programming (ILP), a relatively well-understood mathematical model (see [11]). In our subsequent discussion, we shall use normal PNs as an example to show how to derive the max-value of the minimal elements of the pre* associated with a normal PN and an upward-closed goal set U. A PN is normal [10] iff no transition can decrease the token count of a minimal circuit by firing at any marking. For the definitions and related properties of the rest of the PNs in Fig. 2, the reader is referred to [11].

In [5], the reachability problem of normal PNs was equated with ILP using the so-called decompositional approach. The idea behind the decompositional technique relies on the ability to decompose a PN P = (P, T, ϕ) (possibly in a nondeterministic fashion) into sub-PNs Pi = (P, Ti, ϕi) (1 ≤ i ≤ n, Ti ⊆ T, and ϕi the restriction of ϕ to (P × Ti) ∪ (Ti × P)) such that for an arbitrary computation µ0 →σ µ of PN P, σ can be rearranged into a canonical form σ1 σ2 · · · σn with µ0 →σ1 µ1 →σ2 µ2 · · · µn−1 →σn µn = µ, and for each i, a system of linear inequalities ILPi(x, y, z) can be set up (based upon the sub-PN Pi, where x, y, z are vector variables) in such a way that ILPi(µi−1, µi, z) has a solution for z iff there exists a σi in Ti* such that µi−1 →σi µi and z = #σi.

Consider a normal PN P = (P, T, ϕ) and let P = {p1, . . . , pk} and T = {t1, . . . , tm}. In [5], it was shown that an arbitrary computation of a normal PN can be decomposed according to a sequence of distinct transitions τ = t_{j1} · · · t_{jn}. More precisely, we define the characteristic system of inequalities for P and τ as S(P, τ) = ∪_{1≤h≤n} S_h, where

- S_h = {x_{h−1}(i) ≥ ϕ(pi, t_{jh}) | 1 ≤ i ≤ k} ∪ {x_h = x_{h−1} + A_h · y_h}, where A_h is a k × h matrix whose columns are t̄_{j1}, . . . , t̄_{jh} and y_h is an h × 1 column vector, for 1 ≤ h ≤ n.
Fig. 2. Containment relationship among subclasses of PNs.
The variables in S are the components of the k-dimensional column vectors x0, . . . , xn and the h-dimensional column vectors y_h, 1 ≤ h ≤ n. In [5], it was shown that µ′ ∈ R(P, µ) iff there exists a sequence of distinct transitions τ = t_{j1} · · · t_{jn} such that {x0 = µ} ∪ {xn = µ′} ∪ S(P, τ) has a nonnegative integer solution. In particular, for each 1 ≤ h ≤ n, the i-th coordinate of the vector variable y_h (an h × 1 column vector) represents the number of times transition t_{ji} (1 ≤ i ≤ h) is used along the path from x_{h−1} to x_h. Intuitively speaking, the decomposition is carried out in such a way that
• stage h involves one more transition (namely tjh ) than its preceding stage h − 1; furthermore, tjh must be enabled in xh−1 as the condition ‘xh−1 (i) ≥ ϕ(pi , tjh )’ in Sh enforces, • xh represents the marking at the end of stage h and the beginning of stage h + 1, • the computation from xh−1 to xh is captured by Sh , in which ‘xh = xh−1 + Ah · yh ’ simply says that the state equation associated with the sub-PN in stage h is sufficient and necessary to capture reachability between two markings. For convenience, we define y0h to be a vector in Nm such that y0h (ji ) = yh (i), 1 ≤ i ≤ h, and the remaining coordinates are zero. Note that y0h is an m-dimensional vector w.r.t. the ordering t1 , t2 , . . . , tm and yh is an h-dimensional vector w.r.t. the ordering tj1 , tj2 , . . . , tjh . Intuitively speaking, y0h (i) serves the purpose of rearranging the vector yh w.r.t. the ordering t1 , t2 , . . . , tm , while filling those coordinates not corresponding to tj1 , tj2 , . . . , tjh with zeros. Now we are in a position to derive a bound for the minimal elements of pre∗ for normal PNs. Theorem 8. Given a normal PN P = (P , T , ϕ) with |P | = k and |T | = m, a marking µ0 , and a linear constraint L(v1 , . . . , vm ), then kmin(pre∗ (P , (µ0 , L)))k ≤ (a1 )(c ∗a2 ) , where c is some constant, a1 = max{kT¯ k, s, (m + k)∗ m}∗ m ∗kT¯ k, a2 = (m ∗ k + r ), r the number of (in)equalities in L, and s the absolute value of the largest integer mentioned in L. k
Proof. Given a subset of places Q ⊆ P, we define the restriction of P to Q as the PN P_Q = (Q, T, ϕ_Q), where ϕ_Q is the restriction of ϕ to Q and T (i.e., ϕ_Q(p, t) = ϕ(p, t) and ϕ_Q(t, p) = ϕ(t, p) for p ∈ Q). It is obvious from the definition of normal PNs that P_Q is normal as well. Now consider a vector v ∈ N_ω^k. To find a witness for reg(v) ∩ pre*(P, (µ0, L)) ≠ ∅, if one exists, it suffices to consider the sub-PN with Q(v) = {p | v(p) ≠ ω, p ∈ P} as the set of places (as opposed to the original set P), since each ω place can supply an arbitrary number of tokens to each of its output transitions. (That is, places associated with ω components in v can be ignored as far as reaching a goal marking in U is concerned.) Hence, reg(v) ∩ pre*(P, (µ0, L)) ≠ ∅ iff for some τ (of length ≤ m), the following system of linear inequalities has a solution:
H ≡ S(P_Q, τ) ∪ {x_0 = v} ∪ {x_n ≥ µ0} ∪ L(v_1, ..., v_m) ∪ {(v_1, ..., v_m)^tr = y′_1 + ··· + y′_n},
where the superscript tr denotes the transpose of a matrix. Notice that in the above, {(v_1, ..., v_m)^tr = y′_1 + ··· + y′_n} ensures that for each transition t_i, the number of times t_i is used in the computation captured by S(P_Q, τ) (i.e., y′_1(i) + y′_2(i) + ··· + y′_n(i)) equals v_i. Recall that y′_h captures the firing count vector in segment h. A careful examination reveals that in H, the number of inequalities is bounded by O(m∗k + r), and the number of variables is bounded by O((m + k)∗m). Furthermore, the absolute value of the largest number in H is bounded by max{‖v‖, ‖T̄‖, s}. Using Lemma 3, if H has a solution, then a 'small' solution of max-value bounded by (max{‖v‖, ‖T̄‖, s, (m + k)∗m})^(b∗(m∗k+r)) exists, for some constant b. Recall that the vector variable y′_h represents the numbers of times the respective transitions are used along segment h of the reachability path. As a result, an initial marking with at most m ∗ ((max{‖v‖, ‖T̄‖, s, (m + k)∗m})^(b∗(m∗k+r))) ∗ ‖T̄‖ tokens in each of the ω places suffices for such a path to be valid in the original PN, since each transition consumes at most ‖T̄‖ tokens from a place. The above is bounded by (a_1 ∗ ‖v‖)^(b∗a_2) (for a_1, a_2 given in the statement of the theorem), where b is a constant. Now define f(‖v‖) = (a_1 ∗ ‖v‖)^(b∗a_2). From Theorem 2, ‖min(pre*(P, (µ0, L)))‖ is bounded by f^(k−1)(‖(ω, ..., ω)‖), which can easily be shown to be bounded by (a_1)^(c∗a_2), for some constant c.
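The restriction step at the start of the proof is easy to mechanize. A minimal sketch, assuming a hypothetical dictionary encoding of ϕ and a stand-in marker OMEGA for the ω components of v:

```python
OMEGA = float("inf")  # assumed stand-in for an omega component of v

def restrict_to_finite(places, transitions, phi, v):
    """Build the sub-PN P_Q for Q(v) = {p | v(p) != omega}: omega places
    are dropped, since each can supply arbitrarily many tokens to its
    output transitions (cf. the proof of Theorem 8)."""
    Q = [p for p in places if v[p] != OMEGA]
    phi_Q = {}
    for t in transitions:
        for p in Q:
            phi_Q[(p, t)] = phi.get((p, t), 0)  # consumption, restricted to Q
            phi_Q[(t, p)] = phi.get((t, p), 0)  # production, restricted to Q
    return Q, phi_Q

# t1 consumes from p1 and from the omega place p3, and produces on p2.
places, transitions = ["p1", "p2", "p3"], ["t1"]
phi = {("p1", "t1"): 1, ("t1", "p2"): 2, ("p3", "t1"): 1}
v = {"p1": 3, "p2": 0, "p3": OMEGA}
Q, phi_Q = restrict_to_finite(places, transitions, phi, v)
```

Here the ω place p3 disappears from the restricted net, while all arcs touching the remaining places keep their original weights, which is exactly what makes P_Q inherit normality.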
The above theorem provides a framework for analyzing a number of upward-closed sets associated with normal PNs. The BR problem mentioned at the end of Section 4.1 clearly falls into this category. Our results for normal PNs carry over to the rest of the subclasses listed in Fig. 2, although the bounds are slightly different. Due to space limitations, the details are omitted here.
4.4. Lossy Petri nets
Lossy Petri nets (or, equivalently, lossy vector addition systems with states) were first defined and investigated in [4] with respect to various model checking problems. A lossy Petri net (P, T, ϕ) is a PN in which tokens may be lost spontaneously, without transition firing, during the course of a computation. To be more precise, an execution step from marking µ to marking µ′, denoted by µ ⇒ µ′, of a lossy PN can take one of the following forms: (1) ∃t ∈ T, µ →^t µ′, or (2) µ > µ′ (denoting spontaneous token loss at µ). As usual, ⇒* is the reflexive and transitive closure of ⇒. It is easy to observe that for an arbitrary goal set U (not necessarily upward-closed), the set pre*(P, U) for a lossy PN P is always upward-closed. Consider the case when U = {µ0}. The following is easy to show:
Theorem 9. Given a lossy PN P = (P, T, ϕ) and a goal marking µ0, the minimal elements of the set pre*(P, {µ0}) = {µ | µ ⇒* µ0} have their max-values bounded by n^(2^(d×k×log k)), where n = max{‖T̄‖, ‖µ0‖} and d is a constant.
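Since loss steps can shed any surplus tokens, reaching µ0 exactly coincides with covering µ0, so the upward-closed set pre*(P, {µ0}) can be represented by its finite basis of minimal markings and computed by the standard backward fixpoint. The sketch below assumes tuple-encoded markings and transitions given as (consume, produce) pairs; it is an illustration of this folklore algorithm, not the construction used in [4]:

```python
def pre_t(b, cons, prod):
    """Minimal marking from which firing a transition (consuming cons,
    producing prod) yields a marking covering b."""
    return tuple(max(c, x - (p - c)) for c, p, x in zip(cons, prod, b))

def min_pre_star(goal, transitions):
    """Basis of pre*(P, {goal}) for a lossy PN: backward fixpoint over
    minimal markings (termination follows from Dickson's lemma)."""
    basis = {tuple(goal)}
    changed = True
    while changed:
        changed = False
        for cons, prod in transitions:
            for b in list(basis):
                cand = pre_t(b, cons, prod)
                if any(all(o[i] <= cand[i] for i in range(len(cand)))
                       for o in basis):
                    continue  # cand already covered by the basis
                # drop basis markings dominated by the new element
                basis = {o for o in basis
                         if not all(cand[i] <= o[i] for i in range(len(o)))}
                basis.add(cand)
                changed = True
    return basis
```

For the single transition that moves a token from place 1 to place 2 and the goal (0, 2), the fixpoint yields the three minimal markings (0, 2), (1, 1) and (2, 0), whose max-value is well within the bound of Theorem 9.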
The above result can easily be extended to the case when U is an upward-closed set. In [4], the global model checking problem for (conventional or lossy) VASSs with respect to formulas of the form ∃ω(π_1, ..., π_m) was shown to be decidable. An upward-closed constraint π over a variable set X = {x_1, ..., x_k} is of the form ⋁_{x_i∈X} x_i ≥ c_i, where c_i ∈ N, 1 ≤ i ≤ k. A k-dimensional vector v is said to satisfy π, denoted by v |= π, if v(i) ≥ c_i for some 1 ≤ i ≤ k. Consider a k-dim VASS V = (v_0, V, s_1, S, δ) with S = {s_1, ..., s_h}. Given h upward-closed constraints π_1, ..., π_h over the variable set X = {x_1, ..., x_k} and a configuration σ_1, we write σ_1 |= ∃ω(π_1, ..., π_h) iff there is an infinite computation σ_1, σ_2, ..., σ_i, ... such that ∀i ≥ 1, if state(σ_i) = s_j, then val(σ_i) |= π_j. In words, there exists an infinite path from configuration σ_1 along which the vector value of each configuration satisfies the upward-closed constraint associated with the state of the configuration. In [4], the following global model checking problem was shown to be decidable:
• Input: A k-dim VASS V = (v_0, V, s_1, S, δ) with S = {s_1, ..., s_h} and a formula φ = ∃ω(π_1, ..., π_h), for upward-closed constraints π_1, ..., π_h.
• Output: The set [[φ]]_V = {σ | σ |= φ in V}.
The following result gives a complexity bound for the above problem.
Theorem 10. For each state s ∈ S, ‖min({v ∈ N^k | (s, v) ∈ [[φ]]_V})‖ is bounded by n^(2^(d×k×log k)), where n = max{‖T̄‖, u}, u is the absolute value of the largest number mentioned in φ, and d is a constant.
Proof. The proof is obtained by constructing a VASS V′ = (v′_0, V′, s′_1, S′, δ′) from V such that (s_1, v_0) |= φ in V iff there exists an infinite path from (s′_1, v′_0) in V′. Assume that π_i = ⋁_{1≤l≤k}(x_l ≥ c_{i,l}). For convenience, for a value c and an index l, we define [c]_l to be the vector whose l-th coordinate equals c and whose remaining coordinates are zero. The construction is as follows:
• S′ = S ∪ {q_{i,l,j} | 1 ≤ i ≤ h, 1 ≤ l ≤ k, 1 ≤ j ≤ h}.
• For each addition rule v ∈ δ(s_i, s_j), k addition rules are used to test the k primitive constraints in π_i, by including the following rules: ∀1 ≤ l ≤ k, [−c_{i,l}]_l ∈ δ′(s_i, q_{i,l,j}). Furthermore, we also have (v + [c_{i,l}]_l) ∈ δ′(q_{i,l,j}, s_j), to restore the amount c_{i,l} subtracted for the test as well as to add the vector v.
• v′_0 = v_0 and s′_1 = s_1.
On the basis of the above construction, it is reasonably easy to show that (s_1, v_0) |= φ in V iff there exists an infinite path from (s′_1, v′_0) in V′. The bound of the theorem then follows from Theorem 6.
5. Conclusion
We have developed a unified strategy for computing the sizes of the minimal elements of certain upward-closed sets associated with Petri nets. Our approach can be regarded as a refinement of [8] in the sense that complexity bounds become available (as opposed to mere decidability, as was the case in [8]), as long as the size of a witness for a key query is known. Bounds for several upward-closed sets that arise in the theory of Petri nets, as well as in backward-reachability analysis in automated verification, have been derived in this paper. It would be interesting to seek additional applications of our technique.
Acknowledgement
The first author was partially supported by NSC Grant 96-2221-E-002-028.
References
[1] R. Alur, D. Dill, A theory of timed automata, Theoret. Comput. Sci. 126 (1994) 183–235.
[2] R. Alur, T. Henzinger, M. Vardi, Parametric real-time reasoning, in: Proc. 25th ACM STOC, 1993, pp. 592–601.
[3] I. Borosh, L. Treybig, Bounds on positive integral solutions of linear Diophantine equations, Proc. Amer. Math. Soc. 55 (1976) 299–304.
[4] A. Bouajjani, R. Mayr, Model checking lossy vector addition systems, in: Proc. STACS'99, Trier, Germany, in: LNCS, vol. 1563, 1999, pp. 323–333.
[5] R. Howell, L. Rosier, H. Yen, Normal and sinkless Petri nets, J. Comput. System Sci. 46 (1993) 1–26.
[6] C. Rackoff, The covering and boundedness problems for vector addition systems, Theoret. Comput. Sci. 6 (1978) 223–231.
[7] L. Rosier, H. Yen, A multiparameter analysis of the boundedness problem for vector addition systems, J. Comput. System Sci. 32 (1986) 105–135.
[8] R. Valk, M. Jantzen, The residue of vector sets with applications to decidability problems in Petri nets, Acta Inform. 21 (1985) 643–674.
[9] F. Wang, H. Yen, Timing parameter characterization of real-time systems, in: Proc. CIAA 2003, in: LNCS, vol. 2759, 2003, pp. 23–34.
[10] H. Yamasaki, Normal Petri nets, Theoret. Comput. Sci. 31 (1984) 307–315.
[11] H. Yen, Integer linear programming and the analysis of some Petri net problems, Theory Comput. Syst. 32 (4) (1999) 467–485.