Quantum Bio-Informatics IV From Quantum Information to Bio-Informatics
QP-PQ: Quantum Probability and White Noise Analysis*
Managing Editor: W. Freudenberg Advisory Board Members: L. Accardi, T. Hida, R. Hudson and K. R. Parthasarathy
QP-PQ: Quantum Probability and White Noise Analysis Vol. 28:
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya
Vol. 27:
Quantum Probability and Related Topics eds. R. Rebolledo and M. Orszag
Vol. 26:
Quantum Bio-Informatics III From Quantum Information to Bio-Informatics eds. L. Accardi, W. Freudenberg and M. Ohya
Vol. 25:
Quantum Probability and Infinite Dimensional Analysis Proceedings of the 29th Conference eds. H. Ouerdiane and A. Barhoumi
Vol. 24:
Quantum Bio-Informatics II From Quantum Information to Bio-informatics eds. L. Accardi, W. Freudenberg and M. Ohya
Vol. 23:
Quantum Probability and Related Topics eds. J. C. Garcia, R. Quezada and S. B. Sontz
Vol. 22:
Infinite Dimensional Stochastic Analysis eds. A. N. Sengupta and P. Sundar
Vol. 21:
Quantum Bio-Informatics From Quantum Information to Bio-Informatics eds. L. Accardi, W. Freudenberg and M. Ohya
Vol. 20:
Quantum Probability and Infinite Dimensional Analysis eds. L. Accardi, W. Freudenberg and M. Schürmann
Vol. 19:
Quantum Information and Computing eds. L. Accardi, M. Ohya and N. Watanabe
Vol. 18:
Quantum Probability and Infinite-Dimensional Analysis From Foundations to Applications eds. M. Schürmann and U. Franz
Vol. 17:
Fundamental Aspects of Quantum Physics eds. L. Accardi and S. Tasaki
Vol. 16:
Non-Commutativity, Infinite-Dimensionality, and Probability at the Crossroads eds. N. Obata, T. Matsui and A. Hara
Vol. 15:
Quantum Probability and Infinite-Dimensional Analysis ed. W. Freudenberg
*For the complete list of the published titles in this series, please visit: www.worldscibooks.com/series/qqpwna_series.shtml
QP-PQ Quantum Probability and White Noise Analysis Volume XXVIII
Quantum Bio-Informatics IV From Quantum Information to Bio-Informatics Tokyo University of Science, Japan
10 - 13 March 2010
Editors
Luigi Accardi Università di Roma "Tor Vergata", Italy
Wolfgang Freudenberg Brandenburgische Technische Universität Cottbus, Germany
Masanori Ohya Tokyo University of Science, Japan
World Scientific NEW JERSEY · LONDON · SINGAPORE · BEIJING · SHANGHAI · HONG KONG · TAIPEI · CHENNAI
Published by
World Scientific Publishing Co. Pte. Ltd. 5 Toh Tuck Link, Singapore 596224 USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601 UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
QUANTUM BIO-INFORMATICS IV QP-PQ: Quantum Probability and White Noise Analysis - Vol. 28 Copyright © 2011 by World Scientific Publishing Co. Pte. Ltd. All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers , MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN-13 978-981-4343-75-6 ISBN-10 981-4343-75-7
Printed in Singapore by Mainland Press Pte Ltd.
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (p. v)
PREFACE
This volume is based on the fourth international conference on quantum bio-informatics, held at the QBI Center of Tokyo University of Science. The purpose of the conference was to move toward a new stage in building interdisciplinary bridges between mathematics, physics, information science and the life sciences, and in particular to search for a new paradigm for information science and life science on the basis of quantum theory. More than 100 researchers from fields such as mathematics, physics, information science and biology came from all over the world. The conference lasted nearly one week and gave rise to many fruitful discussions. In this fourth conference, particular attention was paid to quantum entanglement, simulation of bio-systems, brain function, quantum-like dynamics and adaptive systems. Most of the speakers paid attention to the relation between their own topics and the mystery of life. The papers in this volume have all been refereed, and their contents are related to one of the following subjects:
(1) Mathematics of cryptography and its related topics
(2) Quantum algorithm and computation
(3) Quantum entanglement
(4) Quantum entropy and information dynamics
(5) Quantum dynamics and time operator
(6) Stochastic dynamics and white noise analysis
(7) Brain activity
(8) Quantum-like models and PD game
(9) Quantum physics and superconductivity
(10) Quantum tomography and sufficiency
(11) Adaptation in plants
(12) Alignment of sequences
Luigi Accardi Wolfgang Freudenberg Masanori Ohya
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. vii- x)
CONTENTS

Preface

The QP-DYN Algorithms
L. Accardi, M. Regoli and M. Ohya

Study of Transcriptional Regulatory Network Based on Cis Module Database
S. Akasaka, T. Urushibara, T. Suzuki and S. Miyazaki

On Lie Group-Lie Algebra Correspondences of Unitary Groups in Finite von Neumann Algebras
H. Ando, I. Ojima and Y. Matsuzawa

On a General Form of Time Operators of a Hamiltonian with Purely Discrete Spectrum
A. Arai

Quantum Uncertainty and Decision-making in Game Theory
M. Asano, M. Ohya, Y. Tanaka, A. Khrennikov and I. Basieva

New Types of Quantum Entropies and Additive Information Capacities
V. P. Belavkin

Non-Markovian Dynamics of Quantum Systems
D. Chruściński and A. Kossakowski

Self-collapses of Quantum Systems and Brain Activities
K.-H. Fichtner, L. Fichtner, W. Freudenberg and M. Ohya

Statistical Analysis of Random Number Generators
L. Accardi and M. Gabler

Entangled Effects of Two Consecutive Pairs in Residues and Its Use in Alignment
T. Hara, K. Sato and M. Ohya

The Passage from Digital to Analogue in White Noise Analysis and Applications
T. Hida

Remarks on the Degree of Entanglement
D. Chruściński, Y. Hirota, T. Matsuoka and M. Ohya

A Completely Discrete Particle Model Derived from a Stochastic Partial Differential Equation by Point Systems
K.-H. Fichtner, K. Inoue and M. Ohya

On Quantum Algorithm for Exptime Problem
S. Iriyama and M. Ohya

On Sufficient Algebraic Conditions for Identification of Quantum States
A. Jamiołkowski

Concurrence and Its Estimations by Entanglement Witnesses
J. Jurkowski

Classical Wave Model of Quantum-like Processing in Brain
A. Khrennikov

Entanglement Mapping vs. Quantum Conditional Probability Operator
D. Chruściński, A. Kossakowski, T. Matsuoka and M. Ohya

Constructing Multipartite Entanglement Witnesses
M. Michalski

On Kadison-Schwarz Property of Quantum Quadratic Operators on $M_2(\mathbb{C})$
F. Mukhamedov and A. Abduganiev

On Phase Transitions in Quantum Markov Chains on Cayley Tree
L. Accardi, F. Mukhamedov and M. Saburov

Space(-Time) Emergence as Symmetry Breaking Effect
I. Ojima

Use of Cryptographic Ideas to Interpret Biological Phenomena (and Vice Versa)
M. Regoli

Discrete Approximation to Operators in White Noise Analysis
Si Si

Bogoliubov Type Equations via Infinite-dimensional Equations for Measures
V. V. Kozlov and O. G. Smolyanov

Analysis of Several Categorical Data Using Measure of Proportional Reduction in Variation
K. Yamamoto, K. Tahata, N. Miyamoto and S. Tomizawa

The Electron Reservoir Hypothesis for Two-dimensional Electron Systems
K. Yamada, T. Uchida, M. Fujita, H. Koizumi and T. Toyoda

On the Correspondence between Newtonian and Functional Mechanics
E. V. Piskovskiy and I. V. Volovich

Quantile-Quantile Plots: An Approach for the Inter-species Comparison of Promoter Architecture in Eukaryotes
K. Feldmeier, J. Kilian, K. Harter, D. Wanke and K. W. Berendzen

Entropy Type Complexities in Quantum Dynamical Processes
N. Watanabe

A Fair Sampling Test for Ekert Protocol
G. Adenier, A. Yu. Khrennikov and N. Watanabe

Brownian Dynamics Simulation of Macromolecule Diffusion in a Protocell
T. Ando and J. Skolnick

Signaling Network of Environmental Sensing and Adaptation in Plants: Key Roles of Calcium Ion
K. Kuchitsu and T. Kurusu

NetzCope: A Tool for Displaying and Analyzing Complex Networks
M. J. Barber, L. Streit and O. Strogan

Study of HIV-1 Evolution by Coding Theory and Entropic Chaos Degree
K. Sato

The Prediction of Botulinum Toxin Structure Based on in Silico and in Vitro Analysis
T. Suzuki and S. Miyazaki

On the Mechanism of D-wave High Tc Superconductivity by the Interplay of Jahn-Teller Physics and Mott Physics
H. Ushio, S. Matsuno and H. Kamimura
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 1-15)
THE QP-DYN ALGORITHMS
LUIGI ACCARDI, MASSIMO REGOLI and MASANORI OHYA*
Centro Vito Volterra, Università di Roma Tor Vergata
*Quantum Bio-Informatics Center, Tokyo University of Science
1. QP-DYN algorithms: general scheme
The common denomination QP-DYN refers to a family of symmetric cryptographic algorithms based on a single mathematical structure but differing in a potentially infinite class of specific realizations. We will see in the following how this flexibility in the choice of the realization can be used to increase security arbitrarily. The construction of these algorithms is inspired by a special class of deterministic dynamical systems (Anosov systems) whose chaotic properties are well known in the mathematical literature (see [5]). The theoretical bases of the QP-DYN algorithms lie in the theory of chaotic dynamical systems and are described in the paper [2], which develops the earlier paper [1], where the main idea of the method was applied to the construction of sequences of pseudo-random vectors. This paper also explains why the transition from the pseudo-random generation method to the cryptographic algorithm is nontrivial: almost all results on this class of dynamical systems (in particular the proof of the chaoticity properties) are based on techniques of measure theory and are therefore valid only up to sets of zero measure. On the other hand, it can be proved that the set of rational numbers, which are the only numbers computers can deal with, is contained in the excluded zero-measure set. Therefore the standard mathematical theory cannot guarantee the required chaotic properties. More refined mathematical arguments (dealing with the ergodic properties of periodic orbits, see the discussion in [2]) can help, but this direction of the theory is much less developed than the standard one. For this reason the mathematical theory, even if it provides a useful intuition of the direction to pursue, cannot guarantee a priori satisfactory statistical properties of the generated sequences
and these have to be verified a posteriori using the standard batteries of statistical tests. Moreover, a straightforward transposition of the pseudo-random generation program produces an easily breakable cryptographic protocol (for more details see section (8)). Therefore the simulation of the chaotic dynamical system must be integrated with tricks of a specifically cryptographic nature. After the above mentioned proposal in [1], other authors have developed the idea of using the hyperbolic automorphisms of tori (Anosov systems) for the generation of pseudo-random sequences (e.g. [12], [9], [4], [13], [11], [6]); however, we have not found evidence in the literature of cryptographic applications of such algorithms.
In the analogy with dynamical systems, the secret key is the dynamical law and the potentially public part of the key (seed) is the initial state of the system. Given these data, the secret shared key (SSK) is easily constructed from the orbit of the system corresponding to the given initial state. The general idea of the algorithm can thus be summarized as follows: A (Alice) and B (Bob) share the (secret) dynamical law of a dynamical system whose state space is public. In order to produce an SSK they publicly exchange an initial state of the system, and each of them constructs the SSK by applying a publicly known procedure to the orbit. By increasing the complexity of the dynamical law one can increase the security beyond any limit, but the constructive complexity of the system also increases, and this decreases its speed. The balance between these two competing requirements characterizes good cryptographic algorithms. A posteriori one can recognize that, by appropriately choosing the state space, any pseudo-random generator can be fitted into this scheme. Thus what makes the difference are the specific features of the state space and of the dynamical law.
The goal of every algorithm of the QP-DYN family is the following: given in input a text $T$ of binary length $l_T \in \mathbb{N} \cup \{\infty\}$, produce a key of length equal to $l_T$. Such algorithms are used in two situations:
(i) the length of the text is known a priori: $l_T < \infty$
(ii) the length of the text is not known a priori: $l_T = \infty$
Case (i) is typical in storage applications, case (ii) in streaming applications. The QP-DYN protocols are separated into three independent subalgorithms:
(I) Initialization algorithm
(II) Secret shared key (SSK) generation algorithm
(III) Codification and decodification algorithms (i.e. use of the SSK to exchange messages).
The role of the initialization algorithm is to quickly lose memory of the public initial state (seed). This is equivalent to introducing a modification of the dynamical law in the first steps of the algorithm and can be done in a number of ways. The codification and decodification algorithms can be chosen arbitrarily. However, since in the present case the SSK has the same length as the text and has very good statistical properties, while the potentially public part of the key can be changed for every text at zero cost, it is sufficient to use as codification/decodification algorithm the XOR of the SSK with the clear text in one-time pad modality which, because of Shannon's theorem, is the one of maximum security. We will concentrate our attention on the core of the algorithm, i.e. the dynamical law which produces the SSK.
All the specific protocols described in the following have been realized in software programs that have been tested in a variety of applications. The paper [8], based on part of Markus Gabler's PhD thesis, discusses the results of a statistical analysis of the QP-DYN pseudo-random generators. This analysis has been repeated many times by several independent groups, confirming the good statistical properties of these generators. The report [10] was produced by the research group directed by Prof. Giuseppe Italiano of the Department of Computer Science, Systems and Production of the University of Rome Tor Vergata. It compares the performance of the QP-DYN algorithms on cellular telephones with several known suites of cryptographic algorithms realizing public key exchange (RSA, Diffie-Hellman, elliptic curves) and subsequent encoding/decoding (AES, RC5). The result is that the QP-DYN suite produces longer SSKs in shorter times.
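The codification step described above can be sketched in a few lines of Python. This is only an illustration of XOR in one-time pad modality: the function name `xor_bytes` and the key bytes are hypothetical placeholders, and the key below is not produced by a QP-DYN generator.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR a text with a key of at least the same length (one-time pad modality)."""
    if len(key) < len(data):
        raise ValueError("the SSK must be at least as long as the text")
    return bytes(d ^ k for d, k in zip(data, key))


clear_text = b"QP-DYN"
# Placeholder SSK for illustration only; in the scheme it comes from the orbit.
ssk = bytes([0x5A, 0x13, 0xC7, 0x88, 0x21, 0xF0])

cipher_text = xor_bytes(clear_text, ssk)   # codification
recovered = xor_bytes(cipher_text, ssk)    # decodification uses the same operation
```

Since XOR is its own inverse, the same function serves for both codification and decodification, which is why only the SSK generation needs further discussion.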
The content of the present paper is the following:
- section (2) describes the dynamical systems underlying the QP-DYN algorithms
- section (3) introduces the notion of key generating function (KGF) and describes how the secret shared key (SSK) is constructed from the orbits of the dynamical systems discussed in section (2)
- section (4) explains why the dynamical systems described in section (2) are not adequate for cryptographic purposes and outlines the modifications of the dynamical law introduced in order to achieve this goal
- section (5.1) describes the orbit jump function
- section (6) describes the mechanism of machine truncation
- section (7) introduces the use of multiple dynamical systems and explains how the KGF is modified in this framework
- section (8) shows how the introduction of the cut function alone is sufficient to increase enormously the complexity of attacks even on the single matrix algorithm
- section (9) shows how the introduction of a second dynamical system changes the situation qualitatively with respect to possible attacks, in the sense that the attacker now faces an indeterminate rather than a difficult problem.
Finally let us notice that the full control of the mathematical structure, in particular the heavy use of modular multiplications, has a price in terms of the speed of the algorithm (about 80 machine cycles per byte): this is quite fast for most purposes, but not enough to rank the present algorithm among the fastest stream ciphers presently available. A faster version (by a factor of about 8) of the QP-DYN algorithm (QP-DYN-S) has been implemented in software and submitted to all the tests of the evaluation program of the Lausanne SASC Conference (13-14 February 2008) (see [14]), available on the web page of the conference and consisting of 8 measures of speed and agility. We compared the performance of QP-DYN-S with that of the 8 finalist algorithms in the software profile selected by the conference. The results of these tests showed that QP-DYN-S was among the 4 best-performing algorithms. No algorithm among the 8 finalists (plus QP-DYN-S) turned out better than the others in all 8 parameters. For example, QP-DYN-S was about half as fast as the fastest one (10.25 machine cycles per byte against 4.48) but better in agility (21.44 against 29.50) and definitely faster than some popular algorithms, such as Salsa20.
A detailed description of the QP-DYN-S algorithm will be discussed elsewhere.
2. Dynamical systems underlying the QP-DYN algorithms
The QP-DYN cryptographic algorithms are realized using variations of the class of (discrete time) dynamical systems described in the present section. A dynamical system of this class is determined by:
(i1) a natural integer $d \in \mathbb{N}$, called the dimension of the algorithm
(i2) an invertible $d \times d$ matrix $M$ with coefficients $M[i,j]$ in the natural integers (we will write simply $M \in M(d;\mathbb{N})$), representing the law of motion of the dynamical system
(i3) a natural integer $p \in \mathbb{N}$, called the module (typically a large prime number)
Therefore such a dynamical system is determined by the triple:
$$\{d, M, p\} \qquad (1)$$
As usual we often identify a natural integer with the string of 0's and 1's defined by its binary expansion. We will use integers of $m$ bits; typically $m = 32$ or $m$ is a multiple of 32. Denoting by $\mathbb{Z}_p$ the finite field with $p$ elements, identified with the set of natural integers $\{0, 1, 2, \dots, p-1\}$, the state space of the dynamical system is by definition the vector space $\mathbb{Z}_p^d$. The system is supposed to be reversible, i.e. denoting by $\det(M)$ the determinant of the matrix $M$:
$$\det(M) \not\equiv 0 \pmod p$$
Definition 2.1. The orbit of the vector $v_0 \in \mathbb{Z}_p^d$ is by definition the set
$$O(M, v_0) := \{v_0\} \cup \{v_i \in \mathbb{Z}_p^d : v_{i+1} = M v_i,\ 1 \le i \in \mathbb{N}\}$$
and $v_0$ is called the initial vector of the orbit. Since $\mathbb{Z}_p^d$ contains exactly $p^d$ different vectors, any orbit $O$ is a finite set, and therefore for each vector $v_0$ there exists $T \in \mathbb{N}$ such that $v_T = v_0$. The smallest $T$ with this property is called the period of $v_0$. Since the dynamical system is deterministic and reversible, any vector in $O(M, v_0)$ has the same period as $v_0$, and an orbit can intersect itself only if it comes back to the initial vector $v_0$.
3. From sequences of vectors to sequences of bits
The orbits described in the previous section generate pseudo-random binary sequences of arbitrary length through the use of a sequence of key generating functions (KGF)
$$\kappa_n : \mathbb{N}^{dn} \to \mathbb{N}, \qquad n \in \mathbb{N}$$
As explained above, we identify an integer with its binary expansion; therefore each function $\kappa_n$ can be thought of as transforming $n$ $d$-dimensional vectors with integer components into a single binary string. Each step of the algorithm produces a $d$-dimensional vector with integer components (more precisely in $\{0, 1, \dots, p-1\}$). Each of these numbers is represented in base 2 with $b$ binary digits (typically $b = 32$). Therefore every step of the algorithm produces a string of $d \cdot b$ bits. Consequently, after $n$ steps of the algorithm a string of $n \cdot d \cdot b$ bits is obtained. The sequence of key generating functions $(\kappa_n)$ uses these strings to produce the SSK iteratively as follows: starting from the initial vector $v_0$ at step 0, after $n$ steps the algorithm either has halted or has produced the vectors $v_0, v_1, \dots, v_n$. The $(n+1)$-th step is the following. The algorithm:
(i) compares $\kappa_n(v_1, \dots, v_n)$ with $l_T$
(ii-a) stops if the length of $\kappa_n(v_1, \dots, v_n)$ is at least $l_T$ (2)
(ii-b) otherwise computes the $(n+1)$-th vector
$$v_{n+1} = M v_n \pmod p \qquad (3)$$
(iii) computes the $(n+1)$-step key $\kappa_{n+1}(v_1, \dots, v_{n+1})$
(iv) goes to the next step.
If the iteration is stopped at step $N$, the pseudo-random sequence produced is $\kappa_N(v_0, \dots, v_N)$.
3.1. Recursive construction of the sequence $(\kappa_n)$
A computationally efficient way to construct the sequence $(\kappa_n)$ consists in computing each $\kappa_n$ recursively, by fixing a function
$$\kappa : \mathbb{N}^d \times \mathbb{N} \to \mathbb{N}$$
and defining the first-step KGF $\kappa_1 : \mathbb{N}^d \to \mathbb{N}$ by means of the prescription
$$\kappa_1 : x \in \mathbb{N}^d \to \kappa_1(x) := \kappa(x, 0) \in \mathbb{N}$$
The sequence $(\kappa_n)$ is then defined inductively for $n \ge 2$ in the following way:
$$\kappa_{n+1} : (x, y) \in \mathbb{N}^d \times \mathbb{N}^{dn} \to \kappa_{n+1}(x, y) := \kappa(x, \kappa_n(y)) \in \mathbb{N}$$
Definition 3.1. A function $\kappa : \mathbb{N}^d \times \mathbb{N} \to \mathbb{N}$ that satisfies the condition
$$\kappa(x, n) \ge n \qquad (4)$$
will be called a binary $d$-dimensional KGF.
Remark 3.1. There are many interesting classes of binary $d$-dimensional KGF which are computationally easy to handle. The choice of such functions can be used:
(i) in order to create personalizations of the algorithm
(ii) in order to increase its robustness by keeping such choices secret
In the following section we will describe the choice made in the present implementation of the QP-DYN algorithm.
3.2. KGF by left concatenation
Definition 3.2. Given a function
$$\lambda : \mathbb{N} \to \mathbb{N} \equiv \bigcup_{m=1}^{\infty} \{0, 1\}^m$$
the $\lambda$-left concatenation function
$$\kappa_\lambda : \mathbb{N}^d \times \mathbb{N} \to \mathbb{N}$$
is defined by
$$\kappa_\lambda((n_1, \dots, n_d); n) := [\lambda(n_1), \dots, \lambda(n_d), n] \qquad (5)$$
where the right hand side denotes the binary string obtained by left concatenation of the binary strings $\lambda(n_1), \dots, \lambda(n_d), n$ in the given order.
Remark 3.2. The $\lambda$-left concatenation function $\kappa_\lambda$, defined by (5), depends on the choice of the function $\lambda$. The two choices that we have currently implemented are:
(i) the random truncation: $\lambda(n)$ removes all the bits of $n$ from the left up to and including the first 1 (if the first 1 is not removed, then each component of a vector produces a string of bits whose leftmost bit is always 1, thus decreasing the chaoticity of the procedure)
(ii) the deterministic truncation of order $T$: $\lambda(n)$ removes the first $T$ bits of $n$ from the left, where $T$ is a pre-defined number (this choice is more convenient in hardware implementations).
Example 3.1. If the components of the vectors are $b$-bit numbers, then in case (ii) each component produces $b - T$ bits, so that $n$ vectors produce a string of $(b - T)dn$ bits.
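The left concatenation KGF and the two truncations of Remark 3.2 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: binary strings are represented as Python `str` objects, the function names are hypothetical, the key is built from $v_1$ onwards, and the toy parameters (matrix, module, $b = 3$, cut of 1 bit) are not taken from any actual implementation.

```python
def mat_vec_mod(M, v, p):
    """Return M v (mod p) for a square integer matrix M and a vector v."""
    return tuple(sum(m_ij * v_j for m_ij, v_j in zip(row, v)) % p for row in M)


def random_truncation(n: int) -> str:
    """Drop everything up to and including the first 1 of the binary expansion."""
    return bin(n)[3:]            # bin(n) is '0b1...': skip '0b' and the first 1


def deterministic_truncation(n: int, b: int, t: int) -> str:
    """Write n with b bits and drop the first t bits from the left."""
    return format(n, f"0{b}b")[t:]


def left_concatenation(vector, previous: str, cut) -> str:
    """[cut(n_1), ..., cut(n_d), previous] as a single binary string."""
    return "".join(cut(n) for n in vector) + previous


def generate_key(M, v0, p, target_bits, cut) -> str:
    """Iterate v -> M v (mod p) until the key has at least target_bits bits."""
    key, v = "", tuple(v0)
    while len(key) < target_bits:
        v = mat_vec_mod(M, v, p)
        key = left_concatenation(v, key, cut)
    return key


def fixed_cut(n: int) -> str:
    # Toy choice: 3-bit components (enough for p = 7), first bit removed.
    return deterministic_truncation(n, b=3, t=1)


M, p, v0 = [[2, 1], [1, 1]], 7, (1, 0)
key = generate_key(M, v0, p, 20, fixed_cut)
```

With the fixed cut each step contributes $(b - T)d = 4$ bits, so the 20-bit target is reached after 5 steps, as Example 3.1 predicts. With the random truncation the number of bits per step varies, which is why the stopping rule checks the length at every step.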
4. Modifications of the dynamical law
The cryptographic robustness of the SSK constructed in section (3) above is based on the fact that the reconstruction of the dynamical law of a complex deterministic system from its orbits is a very difficult problem, even if these orbits are relatively simple. For example, the reconstruction of the gravitational law from the elliptic orbits of the planets in the solar system required nearly one century of hard work by the best mathematicians, physicists and astronomers. If the reconstruction of the dynamical law of the system from its orbits is easy, then the cryptographic scheme outlined in the previous sections is weak under clear text attacks, in the sense that, if an attacker E can obtain a pair of the form (clear text, encrypted text), then she can easily reconstruct the secret key. The dynamical system described in section (2) has two main defects:
(i) it is not chaotic enough, i.e. it does not pass some statistical tests
(ii) it is not complex enough, i.e. the matrix $M$ can be easily reconstructed once one knows a segment of orbit including a number of vectors of order $d$ (this attack is described in the first lines of section (8)).
The reason for this weakness is the linearity of the dynamical law described in section (2). One can remedy these drawbacks by introducing additional nonlinearities which are strong enough to defeat any attempt to reconstruct, in an efficient way, the dynamical law from an arbitrary number of its orbits, but simple enough to implement that they do not reduce the speed of the algorithm.
The additional operations, introduced to hide the initial algebraic structure of the dynamical law, are the following:
(i) the cut (already described at the end of section (3.2))
(ii) the jump of orbit (see section (5))
(iii) the machine truncation (see section (6))
(iv) the XOR with another sequence produced by another dynamical system (see section (7))
Section (8) is dedicated to estimating how much of the algebraic structure can be recovered after the single operation of cut, and at which computational cost. The estimate is done in the case of fixed cut. In the case of random cut, corresponding to the currently implemented software version of the algorithm, the complexity grows.
5. Orbit jump function
We have already seen that, since all operations are taken modulo $p$, the space of the possible vectors contains a finite number of points; hence every orbit of every dynamical system is periodic, and this can create problems even with simple statistical tests. It is usually assumed that a desirable condition for good cryptographic sequences is to have good statistical properties, i.e. to be able to pass some strongly demanding batteries of statistical tests (see however the considerations in [16], which show that this dogma has to be taken with great caution). In order to achieve such good statistical behavior it is necessary to ensure that the periods of such orbits are at least very long (which is a necessary but not sufficient condition for chaoticity). To achieve this goal the original dynamical system is modified as follows. The algorithm compares, at every step, the last vector produced, say $v_n$, with the initial vector of the orbit $v_0$ (since the system is reversible, only $v_0$ has to be memorized). If the two coincide, $v_n$ is replaced by $J(v_{n-1})$, where $J : \mathbb{N}^d \to \mathbb{N}^d$ is a function called the jump function. This is equivalent to beginning a new orbit from the vector $J(v_{n-1})$, which therefore becomes the new initial point. Thus the protocol described in section (3) is modified as follows:
(ii-c) after step (3) the algorithm verifies whether
$$v_{n+1} \neq v_0 \qquad (6)$$
(ii-d) if this happens, it goes to step (iii)
(ii-e) if (6) does not hold, it defines
$$v_{n+1} := J(v_n) \qquad (7)$$
and then goes to step (iii).
It is clear from the above description that the role of the jump function $J$ is to prevent, as long as possible, the occurrence of a periodic orbit, improving in this way the chaoticity of the generated sequences. Clearly this empirical prescription is not sufficient to guarantee the absence of short periods; in fact it is not difficult to build examples of systems that, even in the presence of a jump function, are locked forever in two short orbits. It is however an empirical fact that, with the introduction of a jump function, the occurrence of periodic orbits becomes very rare: after several years of trials and millions of terabytes produced, such an occurrence has never shown up.
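The modified iteration can be sketched as follows. This is a minimal Python sketch, assuming a purely hypothetical jump function (add 1 to the first component modulo $p$) that is not the one used in the actual implementation; the toy matrix, module and initial vector are likewise illustrative.

```python
def mat_vec_mod(M, v, p):
    """Return M v (mod p) for a square integer matrix M and a vector v."""
    return tuple(sum(m_ij * v_j for m_ij, v_j in zip(row, v)) % p for row in M)


def jump(v, p):
    """Hypothetical jump function: add 1 to the first component (mod p)."""
    return ((v[0] + 1) % p,) + tuple(v[1:])


def orbit_with_jump(M, v0, p, steps):
    """Iterate v -> M v (mod p), jumping whenever the orbit would close."""
    start = v = tuple(v0)
    produced = []
    for _ in range(steps):
        w = mat_vec_mod(M, v, p)
        if w == start:        # the orbit came back to its initial vector
            w = jump(v, p)    # replace the vector by J(v_n) ...
            start = w         # ... which becomes the initial vector of a new orbit
        produced.append(w)
        v = w
    return produced


# Toy system: without the jump, the orbit of (1, 0) closes after 8 steps.
trajectory = orbit_with_jump([[2, 1], [1, 1]], (1, 0), 7, 8)
```

In this toy run the eighth step would return to $(1, 0)$; the jump replaces it with $(2, 6)$, which starts a new orbit. Only the current initial vector has to be stored, as noted in the text.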
5.1. A choice of the orbit jump function
It is clear that the jump function is largely arbitrary and that its inclusion among the secret parameters of the algorithm improves its security. The algorithm actually implemented uses the following orbit jump function:
$$J : v \in \mathbb{Z}_p^d \to \begin{pmatrix} 2 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} v + \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 1 \end{pmatrix} \qquad (8)$$
6. Machine truncation
The fact that usual machines deal with numbers with a pre-defined number of bits, say $m$, can be exploited to introduce an additional nonlinearity which increases the randomness of the system. Let $m$ be the number of binary digits (precision) available for the computation and let the modulus $p$ be chosen so as to satisfy
$$2^m \geq 2p$$
Suppose that the entries of the secret key matrix $M$ are not taken modulo $p$ but are large enough (the more of them have $m$ or $m - 1$ bits the better) so that, when one constructs the orbit, the summation occurring in the matrix-vector product may lead to vectors whose components exceed $2m$ bits. When this happens the machine truncates the result to $m$ bits, before computing the modulo $p$, according to the scheme:
$$\left[ \left( \sum_k M[i,k] \cdot v[k] \right) \bmod 2^{2m} \right] \bmod p$$
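The effect of the truncation can be illustrated with a short Python sketch. It reads the scheme above as a reduction modulo $2^{2m}$ followed by a reduction modulo $p$ (this reading is an assumption); since Python integers never overflow, the register width is emulated with a bit mask. The word size $m = 4$, the matrix and the vector are toy values chosen only so that one component overflows.

```python
def truncated_step(M, v, p, m):
    """[(sum_k M[i][k] * v[k]) mod 2**(2*m)] mod p, computed row by row."""
    mask = (1 << (2 * m)) - 1    # keep only the low 2m bits, as a 2m-bit register would
    return tuple((sum(a * x for a, x in zip(row, v)) & mask) % p for row in M)


def plain_step(M, v, p):
    """The untruncated linear law M v (mod p), for comparison."""
    return tuple(sum(a * x for a, x in zip(row, v)) % p for row in M)


# Toy values: the entries of M are NOT reduced modulo p, as in the text.
M = [[15, 15, 15], [15, 14, 13], [1, 2, 3]]
v = (6, 6, 6)
p, m = 7, 4

with_truncation = truncated_step(M, v, p, m)
without_truncation = plain_step(M, v, p)
```

The first component differs between the two results because the sum 270 exceeds 8 bits and is cut to 14 before the reduction modulo 7. The resulting map is no longer linear over $\mathbb{Z}_p$, which is the purpose of this modification.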
7. Use of multiple dynamical systems
An additional nonlinearity in the dynamics described in section (2) can be introduced by replicating the original procedure. Two possible choices to achieve this goal are:
(i) to introduce a new dynamical law $M'$, leaving the environment, i.e. the field $\mathbb{Z}_p$, unaltered
(ii) to introduce a new environment $\mathbb{Z}_{p'}$, leaving the dynamical law $M$ unaltered.
Both choices lead to different dynamical systems, i.e. to different orbits. Since choice (ii) has the computational advantage that, at each step of the iteration, one saves a matrix-vector multiplication, we have implemented this choice. In what follows we will illustrate this implementation.
7.1. The 2 prime protocol
Given a text $T$ of length $l_T$ we consider two dynamical systems
$$\{d, M, p', v_0, \kappa_n, J\}, \qquad \{d, M, p'', v_0, \kappa_n, J\} \qquad (9)$$
with:
(i1) the same dimension $d \in \mathbb{N}$
(i2) the same dynamical law $M \in M(d;\mathbb{N})$
(i3) the same initial vector $v_0 \in \mathbb{N}^d$
(i4) the same orbit jump function $J : \mathbb{N}^d \to \mathbb{N}^d$, given by (8)
(i5) the same key generating functions (KGF) $\kappa_n : \mathbb{N}^{dn} \to \mathbb{N}$ ($n \in \mathbb{N}$)
(i6) different moduli $p', p'' \in \mathbb{N}$ (prime numbers).
One then executes the algorithm described in section (3), with the only difference that step (iii) (computation of the $(n+1)$-step key) is replaced by the following two steps:
(iii-2syst-a) having produced, without any cut, the two $(n+1)$-step keys
$$\kappa_{n+1}(v_1, \dots, v_{n+1}), \qquad \kappa_{n+1}(v'_1, \dots, v'_{n+1})$$
(remember that, in the present implementation, the KGF $\kappa_n$ are the same) the algorithm computes
$$\kappa_{n+1}(v_1, \dots, v_{n+1}) \oplus \kappa_{n+1}(v'_1, \dots, v'_{n+1}) \qquad (10)$$
where, for any two binary strings $x, y$, $x \oplus y$ denotes the string $x$ XOR $y$ and, if necessary, the two strings are made of the same length by adding zeros on the left.
(iii-2syst-b) it then removes from the bits of (10) all the leading 0's and the first leading 1 (or, in the case of fixed cut, the first $T$ bits from the left). The result is the $(n+1)$-step key of the modified algorithm:
$$\bar{\kappa}_{n+1}(v_1, \dots, v_{n+1}; v'_1, \dots, v'_{n+1})$$
The stopping rule is the same as in section (3).
Remark 7.1. It may happen that the string (10) has some leading bits equal to zero because, depending on the choice of the initial vector, the modulo operations with $p'$ and $p''$ may enter the game only after a certain number of orbit steps: in these steps the vectors produced by the two systems are identical. Therefore an initial part of the resulting sequence should not be considered: the length of the omitted part is a parameter (which can even be public with no harm for the security of the algorithm) included in the initialization procedure.
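The combination of the two systems can be sketched in Python as follows. This is a simplified per-step variant, stated as an assumption: at each step the uncut $b$-bit strings of the two new vectors are XORed and cut, and the result is concatenated on the left of the key. The jump function and the machine truncation are omitted, and all names and toy parameters are hypothetical.

```python
def mat_vec_mod(M, v, p):
    """Return M v (mod p) for a square integer matrix M and a vector v."""
    return tuple(sum(m_ij * v_j for m_ij, v_j in zip(row, v)) % p for row in M)


def xor_strings(x: str, y: str) -> str:
    """Bitwise XOR of two binary strings, padding the shorter one with zeros on the left."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)
    return "".join("1" if a != b else "0" for a, b in zip(x, y))


def cut_leading(s: str) -> str:
    """Remove all the leading 0's and the first leading 1."""
    return s.lstrip("0")[1:]


def two_prime_key(M, v0, p1, p2, steps, b):
    """Same law M and same seed v0, two different prime moduli p1 and p2."""
    v, w, key = tuple(v0), tuple(v0), ""
    for _ in range(steps):
        v = mat_vec_mod(M, v, p1)
        w = mat_vec_mod(M, w, p2)
        x = "".join(format(n, f"0{b}b") for n in v)   # uncut bits of the first system
        y = "".join(format(n, f"0{b}b") for n in w)   # uncut bits of the second system
        key = cut_leading(xor_strings(x, y)) + key
    return key


M, v0 = [[2, 1], [1, 1]], (1, 0)
first_two_steps = two_prime_key(M, v0, 7, 11, steps=2, b=4)
first_three_steps = two_prime_key(M, v0, 7, 11, steps=3, b=4)
```

In this toy run the first two steps contribute no bits at all, because neither modulus has acted yet and the two orbits coincide. This is the situation described in Remark 7.1, and it motivates discarding an initial part of the sequence.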
8. Attacks to the 1-matrix algorithm The robustness analysis that follows has been developed in the worst possible hypotheses for the defender. That is: - it is only considered the case of a single dynamical system (thus excluding the most important security factor) - the machine cut is excluded - the jump function is excluded - one supposes that the only secret key is the matrix M while the following information are considered public: - the prime number p (module), - the dimension d, - the initial data initial Vo, - the KGF sequence (h: n ): left concatenation without permutations Furthermore: - the bit cut (see the end of section (3.2)) is considered fixed and public. - the most favorable case for the attacker E is considered, i.e.: the clear text attack, in which the both original text T and the encrypted text are known to the attacker. Clearly the degree of security grows if, as it is always possible, some of these informations are part of the secret key shared a priori. The fact that, even under these extreme conditions, the breaking complexity of the algorithm can be very high helps to guess why up to now it has not been possible to find, even at theoretical level, attacks to the 2-matrix version of the algorithm. Suppose that: (i) E knows d + 1 consecutive (column) vectors of the orbit starting from some lEN: {VI,VI+l,'" ,Vl+d} (ii) the first d among these vectors are linearly independent and define the following (column) matrices:
$$V = (v_l, \dots, v_{l+d-1}) \in M(d; \mathbb{N}), \qquad V' = (v_{l+1}, \dots, v_{l+d}) \in M(d; \mathbb{N})$$
then $MV = V'$, and this allows one to obtain the secret key $M = V'V^{-1}$, hence to break the algorithm. However, since E only knows a binary string, her problem is to recover from it the components of the vectors $v_{l+i}$. This means that E has to discover which bit is the first bit of the first component of $v_l$. Once she has this information, since she knows from the public structure of the algorithm that the bits are generated from the vectors by left concatenation without permutations and that the cut is constant and equal to T, E can determine each component of each of the vectors $\{v_{l+1}, \dots, v_{l+d}\}$ up to an ambiguity of T bits per component. This implies $2^T$ possibilities per component and therefore $2^{dT}$ possibilities per vector. Since E needs d + 1 vectors, she has to choose among $2^{d(d+1)T}$ possibilities. For example, if d = 10 (a dimension that an ordinary personal computer can manage without any difficulty), then d(d + 1) = 110. Supposing, in order to further facilitate E's task, that T = 2, we see that E has to choose among $2^{220}$ possibilities. For each of these choices E must carry out one inversion and one multiplication of matrices of order 10 (each of these operations requires on the order of $10^3$ multiplications). Finally, notice that an increment of d or T, e.g. 15 instead of 10 or 3 instead of 2, increases the construction complexity of the orbit by a factor that is at most quadratic in the increment, while the complexity of the attack increases exponentially.
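The two ingredients of this argument can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the orbit is generated by $v_{n+1} = M v_n \bmod p$ with p prime and that the attacker already holds d + 1 full orbit vectors; the function names (`orbit`, `recover_key`, `candidate_count`) are hypothetical and not part of the algorithm's specification.

```python
def mat_mul_mod(A, B, p):
    """Product of two square matrices with entries reduced modulo p."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)]
            for row in A]


def mat_inv_mod(A, p):
    """Invert a square matrix over Z_p (p prime) by Gauss-Jordan elimination."""
    d = len(A)
    # Augment A with the identity matrix.
    aug = [[x % p for x in row] + [int(i == j) for j in range(d)]
           for i, row in enumerate(A)]
    for col in range(d):
        pivot = next((r for r in range(col, d) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular modulo p")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)  # modular inverse, Python 3.8+
        aug[col] = [(x * inv) % p for x in aug[col]]
        for r in range(d):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(x - f * y) % p for x, y in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def orbit(M, v0, steps, p):
    """Assumed orbit v_{n+1} = M v_n (mod p); returns steps + 1 vectors."""
    vectors = [list(v0)]
    for _ in range(steps):
        v = vectors[-1]
        vectors.append([sum(m * x for m, x in zip(row, v)) % p for row in M])
    return vectors


def recover_key(vectors, p):
    """Given d + 1 consecutive orbit vectors, return M = V' V^{-1} (mod p)."""
    d = len(vectors) - 1
    # Columns of V are v_l .. v_{l+d-1}; columns of V' are v_{l+1} .. v_{l+d}.
    V = [[vectors[j][i] for j in range(d)] for i in range(d)]
    V_prime = [[vectors[j + 1][i] for j in range(d)] for i in range(d)]
    return mat_mul_mod(V_prime, mat_inv_mod(V, p), p)


def candidate_count(d, T):
    """Number of candidates 2^{d(d+1)T} left by a cut of T bits per component."""
    return 2 ** (d * (d + 1) * T)
```

With a toy key of dimension 2 and modulus 101, `recover_key(orbit(M, v0, 2, 101), 101)` returns `M` whenever the first two orbit vectors are independent modulo 101, while `candidate_count(10, 2)` gives the $2^{220}$ candidates quoted in the text.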
9. Attacks on the 2-matrix algorithm
The attacks described in section (8) cannot be applied to the 2-matrix algorithm because in this case E can only recover the sequence
$$\kappa_N(v_1, \dots, v_N) \oplus \kappa_N(v'_1, \dots, v'_N)$$
(where $\oplus$ denotes the XOR operation), and it is impossible to know whether, in this sequence, a 1 has been obtained from the combination of a 0 in $\kappa_N(v_1, \dots, v_N)$ (SSK of the first dynamical system) and of a 1 in $\kappa_N(v'_1, \dots, v'_N)$ (SSK of the second dynamical system), or vice versa. Similarly, it is impossible to know whether a 0 comes from two 0's or from two 1's. In other words, and this is one of the main ideas of the new algorithm, E is not facing a difficult problem, but an indeterminate one, namely: given a sum of two elements in a ring, reconstruct the value of the addends. Since, fixing arbitrarily one of the two elements, the knowledge of the sum determines the other one uniquely and since, given the information available to E, all the elements of the ring are equiprobable, it follows that the ambiguity is of the same order as the number of elements of the ring. In our case this means that, for every component of every vector, E has an ambiguity of order p. For each vector the ambiguity will therefore be of order $p^d$ and, for d + 1 vectors, of order $p^{d(d+1)}$. Finally, the simultaneous use of three different fields, i.e. $\mathbb{Z}_{p'}$, $\mathbb{Z}_{p''}$, $\mathbb{Z}_2$ (where the last one refers to the XOR operation), makes an algebraic attack, even at the statistical level, practically impossible.
References
1. L. Accardi, F. de Tisi, A. Di Libero: Sistemi dinamici instabili e generazione di sequenze pseudo-casuali, in: Rassegna di metodi statistici e applicazioni, W. Racugno (ed.), Pitagora Editrice, Bologna (1981) 1-32
2. M. Abundo, L. Accardi, A. Auricchio: Hyperbolic automorphisms of tori and pseudo-random sequences, Calcolo 29 (1992) 213-240
3. L. Accardi, M. Regoli: Some simple algorithms for forms generations, in: L. Accardi (ed.), Fractals in nature and in mathematics, Acta Encyclopaedica, Istituto dell'Enciclopedia Italiana (1993) 109-116
4. L. Afflerbach and H. Grothe, J. Comput. Appl. Math. 23, 127 (1988)
5. V.I. Arnold, A. Avez: Ergodic problems in classical mechanics, New York: Benjamin (1968)
6. L. Barash, L.N. Shchur: Periodic orbits of the ensemble of Sinai-Arnold cat maps and pseudorandom number generation, Physical Review E 73, 036701 (2006)
7. M. Cugiani: Metodi Numerico statistici (1980)
8. Markus Gabler: Statistical Analysis of Random Number Generators, October (2007); see also M. Gabler's paper in these proceedings.
9. H. Grothe, Statistische Hefte 28, 233 (1987)
10. Giuseppe F. Italiano, Vittorio Ottaviani, Antonio Grillo, Alessandro Lentini: Benchmarking for the QP cryptographic suite, August (2009)
11. P. L'Ecuyer and P. Hellekalek, in Random and Quasi-Random Point Sets, No. 138 in Lecture Notes in Statistics, Springer, New York (1998)
12. H. Niederreiter, Math. Japonica 31, 759 (1986)
13. H. Niederreiter, J. Comput. Appl. Math. 31, 139 (1990)
14. Matthew Robshaw, Olivier Billet (Eds.): New Stream Cipher Designs, The eSTREAM Finalists, State-of-the-Art Survey, LNCS 4986, Springer (2008)
15. M. Regoli: pre-mRNA Introns as a Model for Cryptographic Algorithm: Theory and Experiments, in: Quantum Bio-Informatics III, From Quantum Information to Bio-Informatics, Tokyo University of Science, Japan, 11-14 March 2009
16. M. Regoli: A redundant cryptographic symmetric algorithm that confounds statistical tests, Open Systems and Information Dynamics (2011), to appear
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 17-28)
STUDY OF TRANSCRIPTIONAL REGULATORY NETWORK BASED ON CIS MODULE DATABASE
SHIZU AKASAKA†, TOMOKO URUSHIBARA, TOMONORI SUZUKI AND SATORU MIYAZAKI
Graduate School of Pharmaceutical Sciences, Tokyo University of Science, 2641 Yamazaki, Noda-city, Chiba, 278-8510, Japan
Microarray analysis is a high-throughput method for analyzing the expression levels of multiple genes, and many investigators therefore regard it as a powerful method. Handling the huge amount of data and judging which genes are differentially expressed require appropriate statistical analysis. When microarray analysis suggests that genes are co-expressed under a specific condition, it is highly possible that common transcriptional factors (TFs) control them. However, it is difficult to identify the TFs involved in co-expression through biochemical experiments alone. Examining the cis-element patterns related to co-expressed genes might be one way to infer the gene expression mechanism clearly. So far, we have constructed the Cis-Module database in order to specify the location and distribution of cis-elements on the genome. Using this database and rat microarray data, we have investigated the TF network related to co-expression of genes. If we could also extract the human genes that are orthologous to the co-expressed genes in rat, we could compare their cis-elements and TFs and consider the differences in gene expression profiles between rat and human. This should be useful for drug discovery that targets gene expression mechanisms.
1. Introduction
In 2003, the Human Genome Project was completed [1]: the human genome sequence was determined and genes were mapped onto it. Since then, many researchers have been studying gene expression in detail, because they want to find differentially expressed genes in these data in order to clarify the function of genes. However, it is not efficient to test a huge number of genes individually when analyzing gene expression. Microarray analysis has recently become a good method for analyzing the expression levels of multiple genes. Handling the huge amount of data and judging which genes are differentially expressed require appropriate statistical analysis. When microarray analysis suggests that a set of genes is expressed under some biological condition, one has a valuable clue to the function of those genes. If there are co-expressed genes under a specific condition, it is highly possible that these genes are controlled by common transcriptional factors (TFs). As in Fig. 1, if we confirm co-expression of gene A and gene D, TF3 and TF5 may be common to both. However, the number of co-expressed genes in microarray analysis is too large, so it is difficult to identify the TFs involved in co-expression through biochemical experiments alone. Here we tried to examine the cis-element patterns related to co-expressed genes by bioinformatics and to predict genes controlled by the same TFs. We aim at making the gene expression mechanism clear.
Figure 1 Co-expressed genes and common transcriptional factors
† Work partially supported by grant 2-4570.5 of the Swiss National Science Foundation.
2. Transcriptional control and cis-modules
As in Fig. 1, the transcription of some genes is controlled by multiple transcriptional factors (TFs). Each transcriptional factor (TF) recognizes a specific sequence upstream or downstream of a gene, and this sequence is called a cis-element (CE). When a TF recognizes its specific CE, gene transcription is activated or suppressed depending on the situation. Sets of cis-elements involved in the control of gene expression are called cis-modules. It is therefore important to study cis-modules in order to clarify the gene expression mechanism.
3. Available Cis-Module Data in public database
Currently, there are several available databases for transcriptional factors and their DNA binding sites, so many cis-element sequences have been studied and stored in databases. However, few databases collect them as cis-modules. For example, JASPAR and TRANSFAC, which are both transcriptional factor databases, hold the cis-element patterns that each TF recognizes. However, these databases contain no information associating genes with cis-elements.
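How a TF's cis-element pattern can be located in a sequence may be sketched as follows. This is a minimal illustration, assuming the pattern is written as an IUPAC consensus string and that only the forward strand is scanned; the pattern `RTGAGTNM` and the function name `find_cis_elements` are hypothetical examples, not entries taken from JASPAR or TRANSFAC.

```python
import re

# IUPAC nucleotide codes translated to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def find_cis_elements(sequence, pattern):
    """Return 0-based start positions of every (possibly overlapping) match
    of an IUPAC consensus pattern on the forward strand of `sequence`."""
    body = "".join(IUPAC[c] for c in pattern.upper())
    # A lookahead is used so that overlapping matches are also reported.
    regex = re.compile("(?=(" + body + "))")
    return [m.start() for m in regex.finditer(sequence.upper())]
```

For instance, `find_cis_elements("AAGTGAGTCATT", "RTGAGTNM")` reports a single site starting at position 2. A real search would also scan the reverse complement and would usually use position weight matrices rather than a strict consensus.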
Figure 2 Cis-element database and cis-element information
In addition, there is another problem with the cis-module information based on biochemical experiments that is stored in the International Nucleotide Sequence Database Collaboration (INSDC). Sometimes the cis-module for a gene is registered by different researchers as Entry_A and Entry_B (Fig. 3). In this case, the cis-elements SP1 and AP2 (SP1 and AP2 are names of transcriptional factors) are defined in the upstream region of Entry_A but not in Entry_B. Entry_B describes 3 cis-elements for NFKB, but Entry_A describes only 2 of them. That is, each entry holds different cis-element information. To resolve this problem, we need to integrate the entries that describe the same gene into one entry. Therefore, it is necessary to group the cis-elements of each gene together as a cis-module and to reconstruct a cis-module database.
Figure 3 Current situation of cis-module entry and reconstruction cis-module database
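The integration step described above can be sketched in Python as a union of the cis-elements reported by all entries for the same gene. The record layout (a `gene` key and a list of `(TF name, position)` pairs) and the positions used in the usage note are assumptions made for this illustration; they are not the INSDC or CMDB record format.

```python
from collections import defaultdict


def merge_entries(entries):
    """Group cis-elements from several entries into one cis-module per gene.

    Each entry is a dict with a 'gene' name and a list 'cis_elements' of
    (tf_name, position) pairs; duplicates across entries are kept once.
    """
    modules = defaultdict(set)
    for entry in entries:
        for tf_name, position in entry["cis_elements"]:
            modules[entry["gene"]].add((tf_name, position))
    # Sort each module by position so the layout along the sequence is visible.
    return {gene: sorted(elements, key=lambda e: (e[1], e[0]))
            for gene, elements in modules.items()}
```

Merging a hypothetical Entry_A (SP1, AP2 and two NFKB sites) with a hypothetical Entry_B (three NFKB sites, two of them shared) yields one cis-module with five distinct cis-elements for that gene.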
4. Construction of Cis-module Database
4.1. Data sources and Data Collection (Fig4-1)
In this work, a database covering five species (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster and Saccharomyces cerevisiae) was constructed. To collect cis-element information, we used DDBJ (http://www.ddbj.nig.ac.jp/), which is one of the three members of the International Nucleotide Sequence Database Collaboration (INSDC); this database stores information confirmed by biochemical experiments. At the same time, we extracted genome data from the Ensembl database (http://www.ensembl.org/index.html), which contains the locations of gene loci. By using the Ensembl data, we could bring the data together.
4.2. Identification of CDS location on Genome (Fig4-2)
We extracted the coding sequences on the genome (ensembl/CDSs) from the genome sequence data. Then the CDSs of DDBJ entries (DDBJ/CDSs) were extracted from the DDBJ data and compared with the ensembl/CDSs using the SSEARCH program. This process allowed us to identify the locations of the records on the genome.
4.3. Extraction and Comparison of upstream region (Fig4-3, 4-4)
The upstream regions of the CDSs of DDBJ records (USR/DDBJ) and those of the Ensembl genome data (USR/ensembl) were extracted. Each USR/DDBJ was then compared with the corresponding USR/ensembl, leading to the identification of the cis-element locations on the genome. Through this process, we obtain a cis-module entry, which is the cis-module information per gene.
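The mapping logic of sections 4.2 and 4.3 can be sketched as follows. This is only an illustration under strong simplifying assumptions: exact substring search stands in for the SSEARCH alignment, only the forward strand is considered, and all function names and the toy sequences in the usage note are hypothetical.

```python
def locate_cds(genome, cds):
    """Return the 0-based genome position of a CDS, or -1 if not found.
    Exact matching is used here as a stand-in for an SSEARCH alignment."""
    return genome.find(cds)


def upstream_region(genome, cds_start, length):
    """Return (start, sequence) of up to `length` bases upstream of the CDS."""
    start = max(0, cds_start - length)
    return start, genome[start:cds_start]


def map_cis_elements(genome, record_cds, record_usr, elements, usr_length):
    """Map cis-elements given as (name, offset within the record's upstream
    region) to genome coordinates; returns a list of (name, genome position)."""
    cds_start = locate_cds(genome, record_cds)
    if cds_start < 0:
        return []
    usr_start, genome_usr = upstream_region(genome, cds_start, usr_length)
    # Place the record's upstream region inside the genomic upstream region.
    offset = genome_usr.find(record_usr)
    if offset < 0:
        return []
    return [(name, usr_start + offset + rel) for name, rel in elements]
```

On a toy genome `"CCCCGGGCGGAAAAATGGCCTAA"` with record CDS `"ATGGCCTAA"`, record upstream region `"GGGCGGAAAA"` and one element `("SP1", 0)`, the element is placed at genome position 4. A production pipeline would rely on alignment scores and handle both strands and mismatches.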
Figure 4 Flow chart about construction of Cis-module database
(1) Obtaining data: get records including cis-element information from DDBJ; get genome data of each species (Human, Mouse, Rat, Fly, Yeast) from the Ensembl genome browser.
(2) Identification of DDBJ records on genome: extract CDS from DDBJ records using keyword search (2a); extract CDS from Ensembl genome data (2b); mapping of (2a) based on (2b).
(3) Mapping of (3a) based on (3b) and identification of cis-element location on genome.
(4) Get cis-module information per gene.
Figure 5 Reconstruction of Cis-module database. Each number in this figure corresponds to the number in Fig. 4 (flow chart). Panels: genome data (from each species); extraction of CDS from INSDB and Ensembl (2a, 2b); BLAST (mapping); extraction of upstream region from DDBJ records (3a) (USR/DDBJ); extraction of upstream region from genome data (3b) (USR/Ensembl); identification of cis-element location on genome.
5. Explanation of User Interface and Advantage of Cis-Module Database
This database is freely available on the web (http://www.pharmacoinformatics.jp/cis/). Fig. 6 illustrates how the cis-element information is displayed. Using this database, researchers can look up the cis-elements in the upstream region of each gene and make use of that information. At the same time, the distribution of cis-elements in the upstream region of a gene can be checked visually. In addition, the location of each cis-element on the genome is also provided. This database therefore reflects an overall picture of the gene expression mechanism, and no comparable database has existed until now.
Figure 6 User interface of Cis-Module Database (Cis-regulatory Module Database, Release 1.0).
6. Application of Cis-module Database
6.1. Summary of Co-occurrence of CEs and TFs from Cis-module Database
The expression of a gene is regulated by sets of cis-elements and transcriptional factors. Therefore, we extracted from the Cis-Module Database (CMDB) the entries that have information on more than two TFs, in order to know which TFs work together when a gene is expressed. In Table 1, when the gene (module_7) changes its expression, TF1, TF3 and TF6 act together to bring about the change. In this way we could obtain the co-occurrence of CEs and TFs from CMDB, and from this summary we could know which TFs and CEs are involved in the expression of a gene.
Table 1: The example of co-occurrence information about transcriptional factors (TFs).
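[Table 1 layout: rows are cis-module entries (module 7, ..., module 431); columns are transcriptional factors (TF1, TF2, TF3, ..., TF6, ...). The individual cell entries are not legible in the source.]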
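A summary of this kind could be computed along the following lines. This is a minimal sketch, assuming each cis-module entry is available as a set of TF names; the threshold `min_tfs` is left as a parameter because the text's "more than two" may mean either at least two or at least three, and the function name and the extra modules in the usage note are hypothetical.

```python
from collections import Counter
from itertools import combinations


def tf_cooccurrence(modules, min_tfs=2):
    """Count how often each pair of TFs appears in the same cis-module.

    `modules` maps a module name to the set of TFs recorded for it; modules
    with fewer than `min_tfs` TFs are skipped.
    """
    counts = Counter()
    for tfs in modules.values():
        if len(tfs) >= min_tfs:
            # Sorting gives every pair a canonical (alphabetical) order.
            counts.update(combinations(sorted(tfs), 2))
    return counts
```

For module_7 with TF1, TF3 and TF6, the pairs (TF1, TF3), (TF1, TF6) and (TF3, TF6) are each counted once; pairs that recur across many modules are candidates for TFs that work together.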
6.2. Prediction of gene network for caloric restriction Rat
Recently, a relationship between anti-aging and calorie control has been suggested, because calorie-restricted rodents live longer. We therefore performed the process below (Fig. 7) to identify the common transcriptional factors that regulate differentially expressed genes under the calorie-restricted condition. These factors are expected to play a key role in anti-aging.
(1) Data analysis (about Microarray data)
We used rat microarray data covering 31099 probed genes, which was kindly provided by Dr. Higami. In the present study, 3 groups were prepared: nontransgenic male Wistar rats fed ad libitum (AL), calorie-restricted (CR) rats and heterotransgenic rats (tg/-: TG). Each group had 4 rats. We analyzed the AL and CR rats to specify which genes change their expression under the calorie-restricted condition. AL group rats continued to receive food ad libitum, and CR group rats were provided with 70% of the mean daily intake of the AL group. After comparing the AL data and the CR data, 54 differentially expressed genes were identified.
(2) Bio-informational approach for the analysis of 54 genes with cis-module database
In this section, we describe how our bio-informational approach was applied to identify the common transcriptional factors (TFs) for the 54 differentially expressed genes in calorie-restricted rats and to cluster the 54 genes based on co-regulation by these TFs. First, we performed statistical analysis to identify co-expressed genes. Second, we extracted the rat genome sequence and the upstream region (USR) of each gene obtained from the analysis. Next, we aligned the cis-elements (CEs) for the TFs in the cis-module database against these regions to learn what types of TF are involved in the expression of those genes. Then, we tried to evaluate the co-occurrence pattern of CEs in the co-expressed genes.
Figure 7 Statistical analyses of microarray data and search cis-elements pattern (statistical analysis: 31099 genes to 54 co-expressed genes; USR extraction from the genome)
6.3. Evaluation of CE pattern predicted in USR of genes
We could find cis-element patterns automatically by using the procedure mentioned above (Fig. 7). However, the prediction might include many false positives. Therefore we need to propose a method that improves the accuracy of the cis-element patterns by using the Cis-module database (CMDB). If co-expressed genes are detected under some condition, we can predict the common cis-elements that are involved in regulating their co-expression. For example, if we obtain co-expressed genes under a certain condition and multiple CEs are found in the USR of each gene as in Fig. 8, CE1 and CE3 are predicted to be the common cis-elements involved in regulating co-expression. We can then predict that the TFs recognizing those CEs are related to co-expression under that condition.
[Figure 8 panels: gene A carries CE1, CE3, CE4 and CE5; gene C carries CE1, CE2 and CE3; gene F carries CE1, CE3 and CE5.]
Figure 8 Co-occurrence of cis-elements in co-expressed genes
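The prediction of common cis-elements in this example amounts to a set intersection, which the following minimal Python sketch makes explicit. The function name is hypothetical, and the input simply restates the CE sets of genes A, C and F as read from Fig. 8.

```python
def common_cis_elements(gene_to_ces):
    """Return the cis-elements present in the upstream region of every gene."""
    ce_sets = [set(ces) for ces in gene_to_ces.values()]
    return set.intersection(*ce_sets) if ce_sets else set()


co_expressed = {
    "gene A": ["CE1", "CE3", "CE4", "CE5"],
    "gene C": ["CE1", "CE2", "CE3"],
    "gene F": ["CE1", "CE3", "CE5"],
}
common = common_cis_elements(co_expressed)  # CE1 and CE3
```

A strict intersection is the simplest choice; with noisy predictions one might instead keep CEs found in a chosen fraction of the co-expressed genes.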
6.4. Identification of genes regulated by SREBP1
Other research groups have reported that SREBP1 is a key TF for calorie control. However, there is no biochemically supported information on the genes regulated by SREBP1, so it is very useful to find such genes. We aligned the cis-element pattern recognized by SREBP1 against the upstream regions of the 54 genes co-expressed under the calorie-restricted condition. As a result, we could successfully identify some genes.
7. Conclusion and Future works
In this review, we introduced the method of construction and the application of the Cis-Module Database (CMDB). Our database covers 5 species and groups the cis-elements of each gene together as a cis-module. The distinctive feature of CMDB is that we can check the sets of cis-elements actually involved in the expression of each gene, so we can know the relationship between a gene and its cis-elements. In general, the available databases on transcriptional factors and their DNA binding sites, which hold the cis-element patterns recognized by each transcriptional factor, do not contain any information linking genes and cis-elements. That is, to our knowledge our database is the only one that includes cis-elements together with their associated genes. Our database will also contribute to predicting the common cis-elements and transcriptional factors related to co-expressed genes. As a next step, we are planning to predict the transcriptional factor network based on the co-occurrence of cis-elements in differentially expressed genes. If we recognize that some genes are controlled by the same transcriptional factors, we put these genes together as one group. For example, if we recognize that gene A and gene C are controlled by TF1 and TF3, gene A and gene C are put together as one group. In the same manner, we can construct a cluster for the other TFs. It will then be possible to perform cluster analysis of TFs and to construct a gene network based on transcriptional factors.
Figure 9 Gene network based on transcriptional factors
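The grouping rule planned above can be sketched as follows. This is one possible reading, assuming that two genes belong to the same group exactly when they are controlled by the same set of TFs; the function name and the extra genes in the usage note are hypothetical.

```python
from collections import defaultdict


def group_genes_by_tfs(gene_to_tfs):
    """Group genes that are controlled by an identical set of TFs.

    Returns a dict mapping each shared TF set (as a frozenset) to the sorted
    list of genes it controls; TF sets with a single gene are omitted.
    """
    groups = defaultdict(list)
    for gene, tfs in gene_to_tfs.items():
        groups[frozenset(tfs)].append(gene)
    return {tfs: sorted(genes) for tfs, genes in groups.items()
            if len(genes) > 1}
```

With gene A and gene C both controlled by TF1 and TF3, the result contains one group `{TF1, TF3}` holding genes A and C. Linking groups that share at least one TF would then give the edges of a gene network.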
Currently, there are some reports describing cases in which "the gene groups are the same, but the TF groups controlling these gene groups are different". To evaluate this idea, we are also planning to extract the human genes homologous to the co-expressed rat genes and to construct the gene network of human (for example, gene D and gene D' are homologous). We then intend to compare the gene networks of the different species. This may indicate that transcriptional regulatory mechanisms vary among species. We believe that our results will be useful for drug discovery that targets the regulatory mechanism of gene expression.
Figure 10 Comparison gene network based on transcriptional factors (panels include the gene network of human; node labels TF2&9, TF1&4, TF7&8, TF1&3)
References
1. F.S. Collins, E.S. Lander, J. Rogers, R.H. Waterston, Finishing the euchromatic sequence of the human genome, (2004), Nature, 431, 931-945.
2. R. Anat, Y. Daniel, and B. Yoav, Identifying differentially expressed genes using false discovery rate controlling procedure, (2003), Bioinformatics, 19, 368-375.
3. E. Wingender, A.E. Kel, O.V. Kel, H. Karas, T. Heinemeyer, P. Dietze, R. Knuppel, A.G. Romaschenko and N.A. Kolchanov, TRANSFAC, TRRD and COMPEL: towards a federated database system on transcriptional regulation, (1997), Nucleic Acids Res., 25, 265-268.
4. J.C. Bryne, E. Valen, M.H. Tang, T. Marstrand, O. Winther, I. da Piedade, A. Krogh, B. Lenhard and A. Sandelin, JASPAR, the open access database of transcriptional factor-binding profiles: new content and tools in the 2008 update, (2008), Nucleic Acids Res., 36, D103-106.
5. The FANTOM Consortium and the Riken Omics Science Center, The transcriptional network that controls growth arrest and differentiation in a human myeloid leukemia cell line, (2009), Nature, 41, 553-562.
6. S. Liang, S. Fuhrman, R. Somogyi, REVEAL, a general reverse engineering algorithm for inference of genetic network architectures, (1998), Pacific Symposium on Biocomputing, 3, 18-29.
7. http://pharmacoinformatics.jp/cis
8. H. Sugawara, O. Ogasawara, K. Okubo, T. Gojobori and Y. Tateno, DDBJ with new system and face, (2007), Nucleic Acids Res., 1-3.
9. B.B. Tuch, H. Li, A.D. Johnson, Evolution of Eukaryotic Transcription Circuits, (2008), Science, 319, 1797-1799.
10. M.Z. Ludwig, C. Bergman, N.H. Patel, M. Kreitman, Evidence for stabilizing selection in a eukaryotic enhancer element, (2000), Nature, 403, 564-567.
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 29-39)
ON LIE GROUP-LIE ALGEBRA CORRESPONDENCES OF UNITARY GROUPS IN FINITE VON NEUMANN ALGEBRAS
HIROSHI ANDO* AND IZUMI OJIMA†
Research Institute for Mathematical Sciences, Graduate School of Science, Kyoto University, Sakyo-ku, Kyoto, 606-8502, Japan
FAX: 075-753-7272, TEL: 075-753-7202
* E-mail: [email protected]
† E-mail: [email protected]
YASUMICHI MATSUZAWA
Department of Mathematics, Hokkaido University, Kita 10, Nishi 8, Kita-ku, Sapporo, 060-0810, Japan
E-mail: [email protected]
Mathematisches Institut, Universität Leipzig, Johannisgasse 26, 04103, Leipzig, Germany
This article is a summary of our talk at QBIC2010. We give an affirmative answer to the question whether there exist Lie algebras for suitable closed subgroups of the unitary group $U(\mathcal{H})$ in a Hilbert space $\mathcal{H}$, with $U(\mathcal{H})$ equipped with the strong operator topology. More precisely, for any strongly closed subgroup G of the unitary group $U(\mathfrak{M})$ in a finite von Neumann algebra $\mathfrak{M}$, we show that the set of all generators of strongly continuous one-parameter subgroups of G forms a complete topological Lie algebra with respect to the strong resolvent topology. We also characterize the algebra $\overline{\mathfrak{M}}$ of all densely defined closed operators affiliated with $\mathfrak{M}$ from the viewpoint of a tensor category.
Keywords: finite von Neumann algebra, unitary group, affiliated operator, measurable operator, strong resolvent topology, tensor category, infinite dimensional Lie group, infinite dimensional Lie algebra.
1. Introduction
In this article we discuss a new kind of Lie algebra of unbounded operators. To motivate our discussion, let us start with the classical problem of considering an infinite dimensional unitary group as a Lie group. Let $\mathcal{H}$ be an infinite dimensional Hilbert space and $U(\mathcal{H})$ be the group of unitary operators on $\mathcal{H}$. When $\mathcal{H}$ is finite dimensional, $U(\mathcal{H})$ is a compact Lie group, and it is well known that any compact Lie group G can be realized as a subgroup of some U(n). Therefore it is natural to think of a subgroup G of $U(\mathcal{H})$ as the generalization of a Lie subgroup of U(n) and to discuss its "Lie algebra". Since most infinite dimensional unitary representations are not norm continuous, the appropriate topology to be introduced in $G \subset U(\mathcal{H})$ would be the strong topology. Then, in view of Stone's theorem, one may try to define its Lie algebra as the set
$$\mathfrak{g} = \{X ;\ X^* = -X,\ e^{tX} \in G,\ \forall t\}$$
of all (possibly unbounded) skew-adjoint operators which generate strongly continuous one-parameter subgroups of $G \subset U(\mathcal{H})$. However, this does not work well due to the domain problem of unbounded operators: even though $X^* = -X$ and $Y^* = -Y$ are densely defined on $\mathcal{H}$, the domains of the sum $X + Y$ and of the Lie bracket $XY - YX$ often fail to be dense. Even worse, it is possible that $\mathrm{dom}(X) \cap \mathrm{dom}(Y) = \{0\}$. In addition to this, there is another problem: what kind of topology should we introduce in $\mathfrak{u}(\mathcal{H})$? It is well known that, for a sequence $\{A_n\}$ of skew-adjoint operators on $\mathcal{H}$ and a skew-adjoint operator A,
$$\text{s-}\lim_{n} (A_n + 1)^{-1} = (A + 1)^{-1} \iff e^{tA_n} \to e^{tA} \ \text{(strongly)} \ \text{for all } t \in \mathbb{R}.$$
Therefore, the "strong resolvent topology" may be the appropriate one. However, there is no knowing whether
$$\text{s-}\lim_{n} (A_n + B_n + 1)^{-1} = (A + B + 1)^{-1}$$
holds true even if $A_n$ and $B_n$ converge respectively to A and B in the strong resolvent sense. Taking these into consideration, we see that G is too big to have a natural Lie algebra, in general. Are there subgroups G that have a nice Lie algebra beyond the finite dimensional ones? We answered this question. More precisely, let $\mathfrak{M}$ be a finite von Neumann algebra on $\mathcal{H}$. We showed that for any strongly closed subgroup G of $U(\mathfrak{M})$, the unitary group of $\mathfrak{M}$, there exists a Lie algebra which is complete with respect to the strong resolvent topology, and we studied its properties. The discussion is based on Murray and von Neumann's result stating that the set $\overline{\mathfrak{M}}$ of all densely defined closed operators affiliated with $\mathfrak{M}$ has a natural *-algebra structure. Furthermore, we determined the category of the *-algebras of unbounded operators that can be represented as $\overline{\mathfrak{M}}$ for some finite von Neumann algebra $\mathfrak{M}$.
Notes. After finishing this work, the authors were informed by Professor Daniel Beltita that he had recently written a paper whose subject is closely related to ours [3].
2. Murray-von Neumann's result
In this section, we review the fundamental results obtained by Murray and von Neumann [6]. For details about operator algebras and operator theory, see Refs. [11, 15]. Let $\mathcal{H}$ be a Hilbert space with an inner product $\langle \xi, \eta \rangle$, which is linear with respect to $\eta$. We denote the algebra of all bounded operators on $\mathcal{H}$ by $\mathfrak{B}(\mathcal{H})$. Let $\mathfrak{M}$ be a von Neumann algebra on $\mathcal{H}$; $\mathfrak{M}'$ is the commutant of $\mathfrak{M}$. The group of all unitary operators in $\mathfrak{M}$ is denoted by $U(\mathfrak{M})$. The lattice of all projections in $\mathfrak{M}$ is denoted by $P(\mathfrak{M})$. Next, we recall the notion of an affiliated operator. The domain of a linear operator T on $\mathcal{H}$ is written as dom(T) and its range is written as ran(T). If T is a closable operator, we write $\overline{T}$ for the closure of T.
Definition 2.1. A densely defined closable operator T on $\mathcal{H}$ is said to be affiliated with a von Neumann algebra $\mathfrak{M}$ if $uTu^* = T$ holds for any $u \in U(\mathfrak{M}')$. If T is affiliated with $\mathfrak{M}$, so is $\overline{T}$. The set of all densely defined closed operators affiliated with $\mathfrak{M}$ is denoted by $\overline{\mathfrak{M}}$. Each element in $\overline{\mathfrak{M}}$ is called an affiliated operator.
In general, $\overline{\mathfrak{M}}$ is not a *-algebra under the natural algebraic operations. This is the reason for the difficulty of constructing Lie theory in infinite dimensions. However, Murray and von Neumann proved, in the pioneering paper [6], that for a finite von Neumann algebra $\mathfrak{M}$, $\overline{\mathfrak{M}}$ does constitute a *-algebra of unbounded operators. That is,
Theorem 2.1 (Murray-von Neumann [6]). For an arbitrary finite von Neumann algebra $\mathfrak{M}$, the set $\overline{\mathfrak{M}}$ forms a *-algebra of unbounded operators, where the algebraic operations are defined by (see footnote a)
$$(X, Y) \mapsto \overline{X + Y}, \qquad (\alpha, X) \mapsto \overline{\alpha X}, \qquad (X, Y) \mapsto \overline{XY}, \qquad X \mapsto X^*.$$
This theorem is the starting point of our study.
Footnote a: $\overline{\alpha X}$ equals $\alpha X$ when $\alpha \neq 0$. However, $\mathrm{dom}(\overline{0 \cdot X}) = \mathcal{H} \neq \mathrm{dom}(X)$.
Remark 2.1. In [6], Murray and von Neumann proved Theorem 2.1 for the countably decomposable case. It can be generalized to an arbitrary finite von Neumann algebra. For the proof, see [1].
The converse of Theorem 2.1 is also true. Namely, if $\overline{\mathfrak{M}}$ is a *-algebra, then $\mathfrak{M}$ must be of finite type.
Theorem 2.2 [1]. Let $\mathfrak{M}$ be a von Neumann algebra acting on a Hilbert space $\mathcal{H}$. Assume that, for all $A, B \in \overline{\mathfrak{M}}$, the domains dom(A + B) and dom(AB) are dense in $\mathcal{H}$. If the set $\overline{\mathfrak{M}}$ forms a *-algebra with respect to the sum $\overline{A + B}$, the scalar multiplication $\overline{\alpha A}$ ($\alpha \in \mathbb{C}$), the multiplication $\overline{AB}$ and the involution $A^*$, then $\mathfrak{M}$ is a finite von Neumann algebra.
3. Lie Group-Lie Algebra Correspondences
In this section we state the main result of this paper. As explained in the introduction, Lie theory for $U(\mathcal{H})$ is a difficult issue. The first thing one has to resolve in discussing the Lie group-Lie algebra correspondence is the domain problem of the generators of one-parameter subgroups of $G \subset U(\mathcal{H})$. The second is the continuity of the Lie algebraic operations. However, we can show that, for any strongly closed subgroup G of the unitary group $U(\mathfrak{M})$ of some finite von Neumann algebra $\mathfrak{M}$, there exists canonically a complete topological Lie algebra. Since there are continuously many non-isomorphic finite von Neumann algebras on $\mathcal{H}$, there is also a wide variety of such groups.
3.1. Topological properties of $\overline{\mathfrak{M}}$
We first endow $\overline{\mathfrak{M}}$ with two topologies, called the strong resolvent topology and the $\tau$-measure topology. The former is an operator theoretical one and the latter is an operator algebraic one. The combined use of them is indispensable for our purpose.
3.1.1. Strong Resolvent Topology
First of all, we define the topology called the strong resolvent topology on a suitable subset of the densely defined closed operators. Let $\mathcal{H}$ be a Hilbert space.
Definition 3.1. We say that a densely defined closed operator A on $\mathcal{H}$ belongs to the resolvent class $\mathscr{RC}(\mathcal{H})$ if A satisfies the following two conditions:
(RC.1) there exist self-adjoint operators X and Y on $\mathcal{H}$ such that the intersection $\mathrm{dom}(X) \cap \mathrm{dom}(Y)$ is a core of X and Y,
(RC.2) $A = \overline{X + iY}$, $A^* = \overline{X - iY}$.
Note that (RC.1) implies that $\mathrm{dom}(X) \cap \mathrm{dom}(Y)$ is dense, so $X + iY$ and $X - iY$ are closable. Thus $\overline{X + iY}$ and $\overline{X - iY}$ are always defined. Furthermore, we have
$$\tfrac{1}{2}(A + A^*) = \tfrac{1}{2}\left(\overline{X + iY} + \overline{X - iY}\right) \supset X|_{\mathrm{dom}(X) \cap \mathrm{dom}(Y)}.$$
Since $A + A^*$ is closable, (RC.1) gives $\tfrac{1}{2}\overline{A + A^*} \supset X$. As X is self-adjoint, X has no non-trivial symmetric extension, so we have $\tfrac{1}{2}\overline{A + A^*} = X$. Therefore, X is uniquely determined. In the same way, Y is also unique and $\tfrac{1}{2i}\overline{A - A^*} = Y$. We denote
$$\mathrm{Re}(A) := X = \tfrac{1}{2}\overline{A + A^*}, \qquad \mathrm{Im}(A) := Y = \tfrac{1}{2i}\overline{A - A^*}.$$
Also note that bounded operators and (possibly unbounded) normal operators belong to $\mathscr{RC}(\mathcal{H})$. Now we endow $\mathscr{RC}(\mathcal{H})$ with the strong resolvent topology (SRT for short), the weakest topology for which the following mappings
$$\mathscr{RC}(\mathcal{H}) \ni A \mapsto \{\mathrm{Re}(A) - i\}^{-1} \in (\mathfrak{B}(\mathcal{H}), \mathrm{SOT}), \qquad \mathscr{RC}(\mathcal{H}) \ni A \mapsto \{\mathrm{Im}(A) - i\}^{-1} \in (\mathfrak{B}(\mathcal{H}), \mathrm{SOT})$$
are continuous. Thus a net $\{A_\alpha\}_\alpha$ in $\mathscr{RC}(\mathcal{H})$ converges to $A \in \mathscr{RC}(\mathcal{H})$ with respect to the strong resolvent topology if and only if
$$\{\mathrm{Re}(A_\alpha) - i\}^{-1}\xi \to \{\mathrm{Re}(A) - i\}^{-1}\xi, \qquad \{\mathrm{Im}(A_\alpha) - i\}^{-1}\xi \to \{\mathrm{Im}(A) - i\}^{-1}\xi,$$
for each $\xi \in \mathcal{H}$. This topology is well studied in the field of unbounded operator theory and is suitable for operator theoretical study. We denote the system of open sets of the strong resolvent topology by $\mathcal{O}_{\mathrm{SRT}}$. Let $\mathfrak{M}$ be a finite von Neumann algebra on a Hilbert space $\mathcal{H}$. We can show that $\overline{\mathfrak{M}}$ is a closed subset of the resolvent class $\mathscr{RC}(\mathcal{H})$.
3.1.2. $\tau$-Measure Topology
Let $\mathfrak{M}$ be a countably decomposable finite von Neumann algebra acting on a Hilbert space $\mathcal{H}$. Fix a faithful normal tracial state $\tau$ on $\mathfrak{M}$. The $\tau$-measure topology (MT for short) on $\overline{\mathfrak{M}}$ is the linear topology whose fundamental system of neighborhoods at 0 is given by
$$N(\varepsilon, \delta) := \left\{ A \in \overline{\mathfrak{M}} ;\ \text{there exists a projection } p \in \mathfrak{M} \text{ such that } \|Ap\| < \varepsilon,\ \tau(p^{\perp}) < \delta \right\},$$
where $\varepsilon$ and $\delta$ run over all strictly positive real numbers. It is known that $\overline{\mathfrak{M}}$ is a complete topological *-algebra with respect to this topology [9]. We denote the system of open sets with respect to the $\tau$-measure topology by $\mathcal{O}_\tau$. Note that the $\tau$-measure topology satisfies the first countability axiom.
Remark 3.1. In this context, the operators in $\overline{\mathfrak{M}}$ are also called $\tau$-measurable operators [4].
Thus there are two topologies on $\overline{\mathfrak{M}}$, the strong resolvent topology and the $\tau$-measure topology. These two topologies seem quite different. However, they in fact coincide on $\overline{\mathfrak{M}}$, i.e.,
Lemma 3.1. Let $\mathfrak{M}$ be a countably decomposable finite von Neumann algebra acting on a Hilbert space $\mathcal{H}$. Then the strong resolvent topology and the $\tau$-measure topology coincide on $\overline{\mathfrak{M}}$. In particular, $\overline{\mathfrak{M}}$ forms a complete topological *-algebra with respect to the strong resolvent topology. Moreover, the $\tau$-measure topology is independent of the choice of the faithful normal tracial state $\tau$.
3.2. Main Results
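Definition 3.2. For a strongly closed subgroup $G$ of $U(\mathfrak{M})$, the set
$$\mathfrak{g} = \mathrm{Lie}(G) := \{A \; ; \; A^* = -A \text{ on } \mathcal{H}, \ e^{tA} \in G \text{ for all } t \in \mathbb{R}\}$$
is called the Lie algebra of $G$. The complexification $\mathfrak{g}_{\mathbb{C}}$ of $\mathfrak{g}$ is defined by
$$\mathfrak{g}_{\mathbb{C}} := \{A + iB \; ; \; A, B \in \mathfrak{g}\}.$$
If $G = U(\mathfrak{M})$, we sometimes write $\mathfrak{g}$ as $\mathfrak{u}(\mathfrak{M})$.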
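As a finite-dimensional illustration of this definition (a sketch only, assuming the matrix algebra of n x n complex matrices in place of a general finite von Neumann algebra), the Lie algebra of the unitary group consists of the skew-Hermitian matrices. The snippet below checks numerically that exp(tA) is unitary for a skew-Hermitian A, and that the standard Lie product formula for matrices, (exp(tA/k) exp(tB/k))^k → exp(t(A+B)), holds approximately. The helper names `expm_skew`, `random_skew` and `trotter` are hypothetical and introduced only for this example.

```python
import numpy as np

def expm_skew(A):
    """Exponential of a skew-Hermitian matrix A via the spectral theorem.

    iA is Hermitian, so iA = V diag(w) V* and exp(A) = V diag(exp(-i w)) V*.
    """
    w, V = np.linalg.eigh(1j * A)
    return (V * np.exp(-1j * w)) @ V.conj().T

def random_skew(n, rng):
    """A random skew-Hermitian n x n matrix (illustrative generator)."""
    Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (Z - Z.conj().T) / 2

def trotter(A, B, t, k):
    """k-step product (exp(tA/k) exp(tB/k))^k."""
    step = expm_skew(t * A / k) @ expm_skew(t * B / k)
    return np.linalg.matrix_power(step, k)

rng = np.random.default_rng(0)
n, t = 4, 0.7
A, B = random_skew(n, rng), random_skew(n, rng)

# exp(tA) is unitary, so A belongs to the Lie algebra of U(n).
U = expm_skew(t * A)
unitarity_defect = np.linalg.norm(U.conj().T @ U - np.eye(n))

# The Trotter products approach exp(t(A + B)) as k grows.
target = expm_skew(t * (A + B))
errors = [np.linalg.norm(trotter(A, B, t, k) - target) for k in (10, 100, 1000)]

print("unitarity defect:", unitarity_defect)
print("Trotter errors for k = 10, 100, 1000:", errors)
```

Since each Trotter product is itself a product of elements of the group, a closed group that contains exp(tA) and exp(tB) also contains their limit exp(t(A+B)); this is the finite-dimensional shadow of why sums stay inside the Lie algebra.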
Remark 3.2. In general, the strong limit of unitary operators is not necessarily unitary. It is known that $U(\mathfrak{M})$ is strongly closed in $\mathfrak{B}(\mathcal{H})$ if and only if $\mathfrak{M}$ is a finite von Neumann algebra.
At first sight, it is not clear whether we can define algebraic operations on $\mathfrak{g}$. However,
Lemma 3.2. Under the above notations, $\mathfrak{g} \subset \overline{\mathfrak{M}}$ holds.
Therefore the sum $A + B$ and the Lie bracket $AB - BA$ are well-defined operations in $\overline{\mathfrak{M}}$, but it is not clear whether they belong to $\mathfrak{g}$ again. The following Lemma 3.4 guarantees the validity of the name "Lie algebra". The first part of its proof is based on two lemmata established by Trotter-Kato and by E. Nelson, which are of importance in their own right.
Lemma 3.3 (Trotter-Kato, Nelson [8]). Let $A$, $B$ be skew-adjoint operators on a Hilbert space $\mathcal{H}$.
(1) If $A + B$ is essentially skew-adjoint on $\mathrm{dom}(A) \cap \mathrm{dom}(B)$, then it holds that
$$\text{s-}\lim_{n \to \infty} \left(e^{tA/n} e^{tB/n}\right)^{n} = e^{t\overline{(A + B)}}$$
for all $t \in \mathbb{R}$.
(2) If $(AB - BA)$ is essentially skew-adjoint on $\mathrm{dom}(A^2) \cap \mathrm{dom}(AB) \cap \mathrm{dom}(BA) \cap \mathrm{dom}(B^2)$, then it holds that
$$\text{s-}\lim_{n \to \infty} \left(e^{-\sqrt{t/n}\,A} e^{-\sqrt{t/n}\,B} e^{\sqrt{t/n}\,A} e^{\sqrt{t/n}\,B}\right)^{n} = e^{t\overline{[A, B]}}$$
for all $t > 0$, where $[A, B] := AB - BA$.
Lemma 3.4. Let $G$ be a strongly closed subgroup of $U(\mathfrak{M})$. Then $\mathfrak{g}$ is a real Lie algebra with the Lie bracket $[X, Y] := XY - YX$.
Based on the above preliminaries, we can prove the following main result.
Theorem 3.1. Let $G$ be a strongly closed subgroup of the unitary group $U(\mathfrak{M})$ of a finite von Neumann algebra $\mathfrak{M}$. Then $\mathfrak{g}$ is a complete topological real Lie algebra with respect to the strong resolvent topology. Moreover, $\mathfrak{g}_{\mathbb{C}}$ is a complete topological Lie *-algebra.
Remark 3.3. It is easy to see that for $G = U(\mathfrak{M})$, its Lie algebra $\mathfrak{u}(\mathfrak{M})$ is equal to $\{A \in \overline{\mathfrak{M}} \; ; \; A^* = -A\}$ and the exponential map $\exp : \mathfrak{u}(\mathfrak{M}) \to U(\mathfrak{M})$ is continuous and surjective.
Proposition 3.1. Let $\mathfrak{M}_1$, $\mathfrak{M}_2$ be finite von Neumann algebras on Hilbert spaces $\mathcal{H}_1$, $\mathcal{H}_2$ respectively. Let $G_i$ be a strongly closed subgroup of $U(\mathfrak{M}_i)$ ($i = 1, 2$). For any strongly continuous group homomorphism
$\varphi : G_1 \to G_2$, there exists a unique SRT-continuous Lie algebra homomorphism $\Phi : \mathrm{Lie}(G_1) \to \mathrm{Lie}(G_2)$ such that $\varphi(e^{A}) = e^{\Phi(A)}$ for all $A \in \mathrm{Lie}(G_1)$. In particular, if $G_1$ is isomorphic to $G_2$ as a topological group, then $\mathrm{Lie}(G_1)$ and $\mathrm{Lie}(G_2)$ are isomorphic as topological Lie algebras.
As seen above, $G$ shares properties with finite-dimensional Lie groups. On the other hand, it also has an infinite-dimensional character.
Proposition 3.2. Let $\mathfrak{M}$ be a finite von Neumann algebra. Then the following are equivalent.
(1) The exponential map $\exp : \mathfrak{u}(\mathfrak{M}) \ni X \mapsto \exp(X) \in U(\mathfrak{M})$ is locally injective. Namely, the restriction of the map to some SRT-neighborhood of $0 \in \overline{\mathfrak{M}}$ is injective.
(2) $\mathfrak{M}$ is finite dimensional.
Remark 3.4. $\mathrm{Lie}(G)$ is not always locally convex, whereas most infinite-dimensional Lie theories assume local convexity. Indeed, $\mathfrak{u}(\mathfrak{M})$ is locally convex if and only if $\mathfrak{M}$ is atomic.
Next, we characterize closed *-subalgebras of $\overline{\mathfrak{M}}$.
Proposition 3.3. Let $\mathfrak{M}$ be a finite von Neumann algebra on a Hilbert space $\mathcal{H}$, and let $\mathscr{R}$ be an SRT-closed *-subalgebra of $\overline{\mathfrak{M}}$ with $1_{\mathcal{H}}$. Then there exists a unique von Neumann subalgebra $\mathfrak{N}$ of $\mathfrak{M}$ such that $\mathscr{R} = \overline{\mathfrak{N}}$.

4. Categorical Characterization of $\overline{\mathfrak{M}}$
In this section we consider some categorical aspects of the *-algebra $\overline{\mathfrak{M}}$. In particular, we determine when a *-algebra $\mathscr{R}$ of unbounded operators on a Hilbert space $\mathcal{H}$ turns out to be of the form $\overline{\mathfrak{M}}$, without any reference to von Neumann algebraic structure in advance. For this purpose, we define the category fRng of unbounded operator algebras, compare it with the category fvN of finite von Neumann algebras, and show that both of them have natural tensor category structures. Furthermore, we will see that they are isomorphic as tensor categories, in spite of the fact that an object in fRng is not locally convex in general while an object in fvN is a Banach space.
To begin with, let us introduce the structure of tensor category into fRng. First, it is well known that the usual tensor product $(\mathfrak{M}_1, \mathfrak{M}_2) \mapsto \mathfrak{M}_1 \overline{\otimes} \mathfrak{M}_2$ of von Neumann algebras and the tensor product $(\phi_1, \phi_2) \mapsto \phi_1 \otimes \phi_2$ of σ-weakly continuous homomorphisms make the
category of finite von Neumann algebras a tensor category. Therefore we define:
Definition 4.1. The category fvN is a category whose objects consist of pairs $(\mathfrak{M}, \mathcal{H})$ of a finite von Neumann algebra $\mathfrak{M}$ acting on a Hilbert space $\mathcal{H}$ and whose morphisms are σ-weakly continuous unital *-homomorphisms. The unit object is $(\mathbb{C}1_{\mathbb{C}}, \mathbb{C})$. The tensor functor is the usual tensor product functor of von Neumann algebras. The definition of the left and right unit constraint functors should be obvious.
If we are to characterize the objects in fRng, we must settle some subtleties due to the fact that we cannot use von Neumann algebraic structure from the outset. However, this difficulty can be overcome thanks to the notions of the strong resolvent topology and the resolvent class, whose definitions are independent of von Neumann algebras (see §3). We define fRng as follows.
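Definition 4.2. The category fRng is a category whose objects $(\mathscr{R}, \mathcal{H})$ consist of an SRT-closed subset $\mathscr{R}$ of the resolvent class $\mathscr{RC}(\mathcal{H})$ on a Hilbert space $\mathcal{H}$ with the following properties:
(1) $X + Y$ and $XY$ are closable for all $X, Y \in \mathscr{R}$.
(2) $\overline{X + Y}$, $\alpha X$, $\overline{XY}$ and $X^*$ again belong to $\mathscr{R}$ for all $X, Y \in \mathscr{R}$ and $\alpha \in \mathbb{C}$.
(3) $\mathscr{R}$ forms a *-algebra with respect to the sum $\overline{X + Y}$, the scalar multiplication $\alpha X$, the multiplication $\overline{XY}$ and the involution $X^*$.
(4) $1_{\mathcal{H}} \in \mathscr{R}$.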
The morphism set between $(\mathscr{R}_1, \mathcal{H}_1)$ and $(\mathscr{R}_2, \mathcal{H}_2)$ consists of SRT-continuous unital *-homomorphisms from $\mathscr{R}_1$ to $\mathscr{R}_2$.
Remark 4.1. From the definition of fRng, it is not clear whether the algebraic operations of each object in fRng are continuous. However, the next lemma shows that $\mathscr{R}$ is a complete topological *-algebra.
Lemma 4.1. Let $(\mathscr{R}, \mathcal{H})$ be an object in fRng. Then there exists a unique finite von Neumann algebra $\mathfrak{M}$ on $\mathcal{H}$ such that $\mathscr{R} = \overline{\mathfrak{M}}$. Furthermore, $\mathfrak{M} = \mathscr{R} \cap \mathfrak{B}(\mathcal{H})$ holds.
Note that for each finite von Neumann algebra $\mathfrak{M}$ on a Hilbert space $\mathcal{H}$, $(\overline{\mathfrak{M}}, \mathcal{H})$ is an object in fRng. The main result of this section is the next theorem.
Theorem 4.1. The category fRng is a tensor category. Moreover, fRng and fvN are isomorphic as tensor categories.
To prove this theorem, we need many lemmata. For the details, see Ref. [1].

Acknowledgement
The authors would like to express their sincere thanks to Professor Masanori Ohya, Professor Noboru Watanabe and all the organizers of QBIC2010 for their kind invitation to the conference and for the precious opportunity to give a talk there. They are grateful to Professor Daniel Beltiţă for informing them of his important paper [3] and for his kind correspondence and discussions. They also thank Ms. Mari Kuroda for her warm help and correspondence. H.A. and I.O. also thank Mr. Ryo Harada, Mr. Takahiro Hasebe, Mr. Kazuya Okamura and Mr. Hayato Saigo for their useful comments and discussions during the seminar. They also thank Professor Asao Arai at Hokkaido University for fruitful discussions, insightful comments and encouragement, Professor Konrad Schmüdgen at Universität Leipzig for informing them of the paper [12], and Mr. Yutaka Shikano at MIT for his professional advice about LaTeX.

References
1. H. Ando and Y. Matsuzawa, Lie Group-Lie Algebra Correspondences of Unitary Groups in Finite von Neumann Algebras, submitted. http://arxiv.org/abs/1005.4850
2. H. Ando, Y. Matsuzawa and I. Ojima, in preparation.
3. D. Beltiţă, Lie theoretic significance of the measure topologies associated with a finite trace, Forum Math. 22 (2010), 241-253.
4. T. Fack and H. Kosaki, Generalized s-numbers of τ-measurable operators, Pacific J. Math. 123 (1986), 260-300.
5. K. Hofmann and S. Morris, The Lie Theory of Connected Pro-Lie Groups, Europ. Math. Soc. Publ. House, 2007.
6. F. Murray and J. von Neumann, On Rings of Operators, Ann. Math. 37 (1936), 116-229.
7. K.-H. Neeb, Towards a Lie theory of locally convex groups, Jpn. J. Math. 1 (2006), 291-468.
8. E. Nelson, Topics in Dynamics I, Princeton University Press, Princeton, 1969.
9. E. Nelson, Notes on non-commutative integration, J. Funct. Anal. 15 (1974), 103-116.
10. H. Omori, Infinite-Dimensional Lie Groups, Transl. Math. Monogr. 158, Amer. Math. Soc., 1997.
11. M. Reed and B. Simon, Methods of Modern Mathematical Physics I, Academic Press, New York, 1972.
12. K. Schmüdgen, On domains of powers of closed symmetric operators, J. Operator Theory 9 (1983), 53-75.
13. K. Schmüdgen, Unbounded Operator Algebras and Representation Theory, Birkhäuser Verlag, Basel, 1990.
14. W. Stinespring, Integration theorems for gages and duality for unimodular groups, Trans. Amer. Math. Soc. 90 (1959), 15-56.
15. M. Takesaki, Theory of Operator Algebras I, Springer-Verlag, Berlin, 1979.
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 41-50)
ON A GENERAL FORM OF TIME OPERATORS OF A HAMILTONIAN WITH PURELY DISCRETE SPECTRUM
A. ARAI
Department of Mathematics, Hokkaido University
Sapporo, Hokkaido 060-0810, Japan

We review some results on determining a general form of time operators of a Hamiltonian with purely discrete spectrum.
Keywords: Canonical commutation relations; Hamiltonian; Spectrum; Time operator.
1. Introduction
Let $\mathcal{H}$ be a complex Hilbert space and $H$ be a self-adjoint operator on $\mathcal{H}$. A symmetric operator $T$ on $\mathcal{H}$ is called a time operator of $H$ if there is a subspace $\mathcal{D} \neq \{0\}$ (not necessarily dense in $\mathcal{H}$) such that $\mathcal{D} \subset D(TH) \cap D(HT)$ (for a linear operator $A$ on $\mathcal{H}$, $D(A)$ denotes the domain of $A$) and the canonical commutation relation (CCR) on $\mathcal{D}$
$$[T, H]\psi = i\psi, \quad \forall \psi \in \mathcal{D} \tag{1.1}$$
holds, where $[T, H] := TH - HT$ and $i$ is the imaginary unit. The name "time operator" comes from the physical context where $H$ is the Hamiltonian of a quantum system, but we use this terminology in the general mathematical context too. We call the subspace $\mathcal{D}$ a CCR-domain for the pair $(T, H)$. From a purely mathematical point of view, the pair $(T, H)$ is a (not necessarily self-adjoint) representation of the CCR with one degree of freedom. A study from this point of view has been made by Dorfmeister [9] in the case where $H$ is bounded and purely absolutely continuous. There is a stronger version of time operator: a symmetric operator $T$ on $\mathcal{H}$ is called a strong time operator of $H$ if, for all $t \in \mathbb{R}$, $e^{-itH}D(T) \subset D(T)$ and the weak Weyl relation
$$T e^{-itH}\psi = e^{-itH}(T + t)\psi, \quad t \in \mathbb{R}, \ \psi \in D(T)$$
holds. It is easy to see that a strong time operator $T$ with $D(HT) \cap D(TH) \neq \{0\}$ is a time operator. But the converse is not true. Strong time operators have been studied in [1-6, 11-13]. One of the fundamental properties of an $H$ which has a strong time operator is that $H$ is purely absolutely continuous. Hence a self-adjoint operator with eigenvalues cannot have a strong time operator.
In this paper, we consider time operators of a self-adjoint operator $H$ whose spectrum consists of only discrete eigenvalues and their possible accumulation points. By the fact just mentioned, such time operators cannot be strong time operators of $H$. This kind of time operator was first proposed by Galapon [10] (see also [8] and references therein for specific examples). A detailed, mathematically rigorous analysis of the Galapon time operator was then made by Arai-Matsuzawa [7]. It is interesting to investigate to what extent a general form of such time operators can be determined under a suitable condition. In the previous paper [5], some results on this aspect were established. Below is a summary of them.

2. Time Operators of a Hamiltonian with Discrete Eigenvalues (I)
We denote the inner product and the norm of $\mathcal{H}$ by $(\cdot, \cdot)$ (linear in the second variable) and $\|\cdot\|$ respectively. Let $\mathbb{N} = \{1, 2, 3, \dots\}$ be the set of natural numbers. A basic assumption in the present paper is as follows:
Hypothesis (H) The self-adjoint operator $H$ has a complete orthonormal system (CONS) $\{e_{n\alpha} \mid n \in \mathbb{N}, \alpha = 1, \dots, M_n\} \subset \mathcal{H}$ of eigenvectors with discrete eigenvalues $\{E_n\}_{n \in \mathbb{N}}$ ($E_n \neq E_m$, $n \neq m$, $n, m \in \mathbb{N}$):
$$H e_{n\alpha} = E_n e_{n\alpha}, \tag{2.1}$$
$$(e_{n\alpha}, e_{m\beta}) = \delta_{nm}\delta_{\alpha\beta}, \quad \alpha = 1, \dots, M_n, \ \beta = 1, \dots, M_m, \tag{2.2}$$
where $\delta_{ab}$ is the Kronecker delta and $M_n \in \mathbb{N}$ is the multiplicity of the eigenvalue $E_n$, obeying
(2.3) We set
(2.4)
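Hypothesis (H) implies that the spectrum of $H$, denoted $\sigma(H)$, is given by
$$\sigma(H) = \overline{\{E_n\}_{n=1}^{\infty}}, \tag{2.5}$$
where the right hand side is the closure of the set $\{E_n\}_{n=1}^{\infty}$, and $\sigma(H) \setminus \{E_n\}_{n=1}^{\infty}$ contains no eigenvalues of $H$. For a subset $S$ of $\mathcal{H}$, we denote by $\mathrm{l.h.}(S)$ the linear hull of $S$, i.e., the subspace algebraically spanned by the vectors in $S$. Since the set $\{e_{n\alpha} \mid n \in \mathbb{N}, \alpha = 1, \dots, M_n\}$ is a CONS of $\mathcal{H}$, the subspace
$$\mathcal{D}_0 := \mathrm{l.h.}(\{e_{n\alpha} \mid n \in \mathbb{N}, \alpha = 1, \dots, M_n\})$$
is dense in $\mathcal{H}$.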
2.1. A general class of time operators of H
Suppose that, for some $n_0 \in \mathbb{N}$,
$$\sum_{n=n_0}^{\infty} \frac{1}{E_n^2} < \infty. \tag{2.6}$$
Then, in the same way as in [7], one can define a linear operator $T_0$ as follows:
$$D(T_0) := \mathcal{D}_0, \tag{2.7}$$
$$T_0\psi := i \sum_{n=1}^{\infty} \sum_{\alpha=1}^{M} \left( \sum_{m \neq n} \frac{(e_{m\alpha}, \psi)}{E_n - E_m} \right) e_{n\alpha}, \quad \psi \in D(T_0). \tag{2.8}$$
It is easy to see that $T_0$ is a symmetric operator.
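The structure of such an operator can be explored numerically on a finite truncation. The following is a minimal sketch, not the construction of the paper itself: it assumes a non-degenerate spectrum with the illustrative choice E_n = n^2 and keeps only the first N eigenvectors, so that the matrix elements are (e_n, T_0 e_m) = i/(E_n - E_m) for n ≠ m and zero on the diagonal. The truncation size `N` and the test vectors are assumptions made for the example.

```python
import numpy as np

N = 8
E = np.arange(1, N + 1, dtype=float) ** 2   # assumed eigenvalues E_n = n^2
H = np.diag(E)

# Matrix elements (e_n, T0 e_m) = i / (E_n - E_m) for n != m, 0 on the diagonal.
diff = E[:, None] - E[None, :]
np.fill_diagonal(diff, 1.0)                 # placeholder to avoid division by zero
T0 = 1j / diff
np.fill_diagonal(T0, 0.0)

# T0 is symmetric (Hermitian as a finite matrix).
is_symmetric = np.allclose(T0, T0.conj().T)

# The commutator acts as multiplication by i on differences of eigenvectors.
comm = T0 @ H - H @ T0
psi = np.zeros(N, dtype=complex)
psi[1], psi[4] = 1.0, -1.0                  # psi = e_2 - e_5
ccr_residual = np.linalg.norm(comm @ psi - 1j * psi)

# On a single eigenvector the relation fails in this truncation.
e1 = np.zeros(N, dtype=complex)
e1[0] = 1.0
residual_on_e1 = np.linalg.norm(comm @ e1 - 1j * e1)

print(is_symmetric, ccr_residual, residual_on_e1)
```

In this truncated picture the commutator equals -i times (the all-ones matrix minus the identity), which is why it reproduces the CCR exactly on vectors whose coefficients sum to zero and on no others.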
Remark 2.1. Under condition (2.6), $H$ is unbounded, since (2.6) implies that $|E_n| \to \infty$ as $n \to \infty$.
Remark 2.2. In [10] and [7], it is assumed that $H$ is bounded below with $0 < E_n < E_{n+1}$, $n \in \mathbb{N}$. But, in the present paper, we do not assume semi-boundedness (boundedness below or boundedness above).
We introduce a subspace:
$$\mathcal{E}_M := \mathrm{l.h.}(\{e_{n\alpha} - e_{m\alpha} \mid n, m \in \mathbb{N}, \alpha = 1, \dots, M\}). \tag{2.9}$$
This subspace is not necessarily dense in $\mathcal{H}$:
Lemma 2.1. The subspace $\mathcal{E}_M$ is dense in $\mathcal{H}$ if and only if $M = M_n$ for all $n \in \mathbb{N}$.
The next theorem shows that $T_0$ is a time operator of $H$ with $\mathcal{E}_M$ being a CCR-domain for $(T_0, H)$:
Theorem 2.1. Under Hypothesis (H) with (2.6), $\mathcal{E}_M \subset D(T_0 H) \cap D(H T_0)$ and
$$[T_0, H]\psi = i\psi, \quad \forall \psi \in \mathcal{E}_M. \tag{2.10}$$
Remark 2.3. It is easy to see that $\sum_{k \neq n} E_k^2 / |E_k - E_n|^2 = \infty$. Hence it follows that $T_0 e_{n\alpha} \notin D(H)$. Therefore $\mathcal{D}_0$ is not a CCR-domain for $(T_0, H)$.
We next consider a perturbation of $T_0$ by a symmetric operator $T_1$ such that $T_0 + T_1$ is a time operator of $H$. Let $a := \{a_n(\alpha, \beta) \mid n \in \mathbb{N}, \alpha, \beta = 1, \dots, M_n\}$ be a set of complex numbers such that
$$a_n(\alpha, \beta)^* = a_n(\beta, \alpha), \quad n \in \mathbb{N}, \ \alpha, \beta = 1, \dots, M_n, \tag{2.11}$$
where $a_n(\alpha, \beta)^*$ is the complex conjugate of $a_n(\alpha, \beta)$. Then we define a linear operator $T_1(a)$ on $\mathcal{H}$ as follows:
(2 .12)
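(2.13)
It follows that $\mathcal{D}_0 \subset D(T_1(a))$ with
$$T_1(a) e_{n\alpha} = \sum_{\beta=1}^{M_n} a_n(\beta, \alpha) e_{n\beta}, \quad \forall n \in \mathbb{N}, \ \alpha = 1, \dots, M_n. \tag{2.14}$$
It is easy to see that $T_1(a)$ is a symmetric operator. Using (2.14), we see that $\mathcal{D}_0 \subset D(H T_1(a)) \cap D(T_1(a) H)$ and
$$T_1(a) H \psi = H T_1(a) \psi, \quad \forall \psi \in \mathcal{D}_0.$$
By this fact and Theorem 2.1 we obtain the next theorem: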
Theorem 2.2. Assume Hypothesis (H) and (2.6). Let $a$ be as above and
$$T(a) := T_0 + T_1(a). \tag{2.15}$$
Then $T(a)$ is a time operator of $H$ with $\mathcal{E}_M$ being a CCR-domain for $(T(a), H)$.
Thus (2.6) gives a sufficient condition for $H$ with Hypothesis (H) to have time operators of the form (2.15).
Remark 2.4. Boundedness or unboundedness of $T(a)$ can be investigated in the same way as in [7]. But, here, we do not go into the details.
2.2. Necessary condition for H to have time operators and the general form of them
We next consider a necessary condition for $H$ to have time operators and their general form.
Theorem 2.3. Let $H$ be a self-adjoint operator satisfying Hypothesis (H) and $T$ be a time operator of $H$ such that $\mathcal{E}_M$ is a CCR-domain for $(T, H)$ and $e_{n\alpha} \in D(T)$, $\forall n \in \mathbb{N}$, $\alpha = 1, \dots, M$. Then $H$ is unbounded and there is an $n_0 \in \mathbb{N}$ such that (2.6) holds. Moreover, the following (i) and (ii) hold:
(i) Let $M = M_n$, $\forall n \in \mathbb{N}$. Then, for all $\psi \in D(T)$,
(2.16)
and (2.17)
In particular, one has
$$T = T(a(T)) = T_0 + T_1(a(T)) \quad \text{on } \mathcal{D}_0, \tag{2.18}$$
where
(ii) Let $k$ be a natural number such that $M = M_n$, $n = 1, \dots, k$, and $M_n > M$, $n \geq k + 1$. Then, for all $\psi \in D(T)$,
(2.19)
$$\sum_{n=k+1}^{\infty} \sum_{\alpha=M+1}^{M_n} |(e_{n\alpha}, T\psi)|^2 < \infty \tag{2.20}$$
and
where
$$S_T\psi := \sum_{n=k+1}^{\infty} \sum_{\alpha=M+1}^{M_n} (e_{n\alpha}, T\psi)\, e_{n\alpha}$$
with
2.3. Non-existence theorems of time operators
Theorem 2.3 can be read as a non-existence theorem of time operators for a class of $H$, as shown below.
Theorem 2.4. Let $H$ be a self-adjoint operator with Hypothesis (H) such that
$$\sum_{n=n_0}^{\infty} \frac{1}{E_n^2} = \infty \tag{2.22}$$
for some $n_0 \in \mathbb{N}$. Then there exist no time operators $T$ of $H$ such that $\mathcal{D}_0 \subset D(T)$ and $\mathcal{E}_M$ is a CCR-domain for $(T, H)$.
Proof. This follows from the contraposition of Theorem 2.3. □
A simple consequence of this theorem is given as follows:
Theorem 2.5. Let $H$ be a self-adjoint operator with Hypothesis (H). Suppose that there exist a constant $\alpha \in [0, 1/2]$ and a real bounded sequence $\{b_n\}_{n=1}^{\infty}$ satisfying (2.23). Then there exist no time operators $T$ of $H$ such that $\mathcal{D}_0 \subset D(T)$ and $\mathcal{E}_M$ is a CCR-domain for $(T, H)$.
Theorem 2.6. Let $H$ be a bounded self-adjoint operator with Hypothesis (H). Then there exist no time operators $T$ of $H$ such that $\mathcal{D}_0 \subset D(T)$ and $\mathcal{E}_M$ is a CCR-domain for $(T, H)$.
Proof. If $H$ is bounded, then the sequence $\{E_n\}_{n=1}^{\infty}$ is bounded. Hence this is the case where $\alpha = 0$ in (2.23). Thus Theorem 2.5 implies the desired result. □
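The dividing line between existence and non-existence in this section is whether the series of 1/E_n^2 converges. As a rough numerical illustration (a sketch under assumed spectra, not taken from the paper), the snippet below compares partial sums for E_n = n^2, where the series converges (to π^4/90), with E_n = n^{1/2}, where it is the harmonic series and diverges. Both spectra and the helper name `partial_sum` are chosen only for this example.

```python
import math

def partial_sum(energy, n_terms):
    """Partial sum of 1 / E_n^2 for n = 1, ..., n_terms."""
    return sum(1.0 / energy(n) ** 2 for n in range(1, n_terms + 1))

def quadratic(n):
    """Assumed spectrum E_n = n^2: the series converges."""
    return float(n) ** 2

def sqrt_growth(n):
    """Assumed spectrum E_n = n^(1/2): the series is harmonic and diverges."""
    return math.sqrt(n)

for n_terms in (10**2, 10**3, 10**4, 10**5):
    print(n_terms,
          partial_sum(quadratic, n_terms),
          partial_sum(sqrt_growth, n_terms))
```

The first column of sums stabilises quickly, while the second keeps growing roughly like log(n), in line with the slow growth of the eigenvalues in the non-existence theorems above.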
3. Time Operators of a Hamiltonian with Discrete Eigenvalues (II)
In this section we present another type of time operators of $H$. Here we do not assume (2.3). We define
Then
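(3.1)
and $\{e_n\}_{n=1}^{\infty}$ is an orthonormal system of $\mathcal{H}$: $(e_n, e_m) = \delta_{nm}$, $n, m \in \mathbb{N}$. We introduce a subspace:
$$\mathcal{F}_0 := \mathrm{l.h.}(\{e_n \mid n \in \mathbb{N}\}).$$
It is easy to see that $\mathcal{F}_0$ is dense if and only if $M_n = 1$ for all $n \in \mathbb{N}$. Assume (2.6). Then, as in the case of the operator $T_0$ in Section 2, one can define a linear operator $\tilde{T}_0$ on $\mathcal{H}$ as follows:
$$D(\tilde{T}_0) := \{\psi = \psi_1 + \psi_2 \mid \psi_1 \in \mathcal{F}_0, \ \psi_2 \in \mathcal{F}_0^{\perp}\}, \tag{3.2}$$
$$\tilde{T}_0\psi := i \sum_{n=1}^{\infty} \left( \sum_{m \neq n} \frac{(e_m, \psi)}{E_n - E_m} \right) e_n, \quad \psi \in D(\tilde{T}_0). \tag{3.3}$$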
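It is easy to see that $\tilde{T}_0$ is densely defined and symmetric. We remark that, if $M_n = 1$, $\forall n \in \mathbb{N}$, then $\tilde{T}_0 = T_0$.
Let
(3.4)
It is obvious that
It is shown that, if every $E_n$ is simple, then $\mathcal{F}_-$ is dense in $\mathcal{H}$ [7, 10]. But $\mathcal{F}_-$ is not dense if at least one of the $E_n$ ($n \in \mathbb{N}$) is degenerate.
Theorem 3.1. The operator $\tilde{T}_0$ is a time operator of $H$ with $\mathcal{F}_-$ being a CCR-domain for $(\tilde{T}_0, H)$. Namely, $\mathcal{F}_- \subset D(\tilde{T}_0 H) \cap D(H \tilde{T}_0)$ and
$$[\tilde{T}_0, H]\psi = i\psi, \quad \forall \psi \in \mathcal{F}_-. \tag{3.5}$$
For a real sequence $c = \{c_n\}_{n=1}^{\infty}$, we define a linear operator $S(c)$ on $\mathcal{H}$ as follows:
(3.7)
Obviously we have $\mathcal{F}_0 \subset D(S(c))$ with
$$S(c) e_n = c_n e_n, \quad n \in \mathbb{N}. \tag{3.8}$$
Hence $c_n$ is an eigenvalue of $S(c)$. It follows that $S(c)$ is a symmetric operator. Using (3.8), we see that $\mathcal{F}_0 \subset D(H S(c)) \cap D(S(c) H)$ and
$$S(c) H \psi = H S(c) \psi, \quad \forall \psi \in \mathcal{F}_0.$$
Thus the operator
$$T(c) := \tilde{T}_0 + S(c) \tag{3.9}$$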
is a time operator of $H$ with $\mathcal{F}_-$ being a CCR-domain for $(T(c), H)$. Let
$$\mathcal{F} := \overline{\mathcal{F}_0}. \tag{3.10}$$
We denote the range of $T$ by $\mathrm{Ran}(T)$.
Theorem 3.2. Let $H$ be a self-adjoint operator satisfying Hypothesis (H) and $T$ be a time operator of $H$ such that $\mathcal{F}_0 \subset D(T)$, $\mathrm{Ran}(T) \subset \mathcal{F}$ and $\mathcal{F}_-$ is a CCR-domain for $(T, H)$. Then $H$ is unbounded and there is an $n_0 \in \mathbb{N}$ such that (2.6) holds. Moreover, for all $\psi \in D(T)$,
(3.11)
and
(3.12)
In particular, one has
$$T = T(c(T)) = \tilde{T}_0 + S(c(T)) \quad \text{on } \mathcal{F}_0, \tag{3.13}$$
where
As in the case of Theorem 2.3, Theorem 3.2 can be read as a non-existence theorem of time operators for a class of $H$.
Theorem 3.3. Let $H$ be a self-adjoint operator with Hypothesis (H) such that (2.22) holds for some $n_0 \geq 1$. Then there exist no time operators $T$ of $H$ such that $\mathcal{F}_0 \subset D(T)$, $\mathrm{Ran}(T) \subset \mathcal{F}$ and $\mathcal{F}_-$ is a CCR-domain for $(T, H)$.
Proof. This follows from the contraposition of Theorem 3.2. □
Theorem 3.4. Let $H$ be a self-adjoint operator with Hypothesis (H). Suppose that there exist a constant $\alpha \in [0, 1/2]$ and a real bounded sequence $\{b_n\}_{n=1}^{\infty}$ such that (2.23) holds. Then there exist no time operators $T$ of $H$ such that $\mathcal{F}_0 \subset D(T)$, $\mathrm{Ran}(T) \subset \mathcal{F}$ and $\mathcal{F}_-$ is a CCR-domain for $(T, H)$.
Theorem 3.5. Let $H$ be a bounded self-adjoint operator with Hypothesis (H). Then there exist no time operators $T$ of $H$ such that $\mathcal{F}_0 \subset D(T)$, $\mathrm{Ran}(T) \subset \mathcal{F}$ and $\mathcal{F}_-$ is a CCR-domain for $(T, H)$.

Acknowledgments
The author would like to thank Professor M. Ohya for inviting him to present a lecture at the International Conference QBIC'10 (Noda Campus of TUS, March 10-13, 2010). This work was supported by the Grant-In-Aid No. 21540206 for Scientific Research from Japan Society for the Promotion of Science (JSPS).

References
1. A. Arai, Generalized weak Weyl relation and decay of quantum dynamics, Rev. Math. Phys. 17 (2005), 1071-1109.
2. A. Arai, Spectrum of time operators, Lett. Math. Phys. 80 (2007), 211-221.
3. A. Arai, Some aspects of time operators, in Quantum Bio-Informatics (Editors: L. Accardi, W. Freudenberg and M. Ohya), World Scientific, Singapore, 2008, 26-35.
4. A. Arai, On the uniqueness of weak Weyl representations of the canonical commutation relation, Lett. Math. Phys. 85 (2008), 15-25.
5. A. Arai, Necessary and sufficient conditions for a Hamiltonian with discrete eigenvalues to have time operators, Lett. Math. Phys. 87 (2009), 67-80.
6. A. Arai and Y. Matsuzawa, Construction of a Weyl representation from a weak Weyl representation of the canonical commutation relation, Lett. Math. Phys. 83 (2008), 201-211.
7. A. Arai and Y. Matsuzawa, Time operators of a Hamiltonian with purely discrete spectrum, Rev. Math. Phys. 20 (2008), 951-978.
8. R. Caballar and E. A. Galapon, Characterizing multiple solutions of the time-energy canonical commutation relation via quantum dynamics, Phys. Lett. A 373 (2009), 2660-2666.
9. G. Dorfmeister and J. Dorfmeister, Classification of certain pairs of operators (P, Q) satisfying [P, Q] = -iId, J. Funct. Anal. 57 (1984), 301-328.
10. E. A. Galapon, Self-adjoint time operator is the rule for discrete semibounded Hamiltonians, Proc. R. Soc. Lond. A 458 (2002), 2671-2689.
11. M. Miyamoto, A generalized Weyl relation approach to the time operator and its connection to the survival probability, J. Math. Phys. 42 (2001), 1038-1052. 104, 570-578.
12. K. Schmüdgen, On the Heisenberg commutation relation. I, J. Funct. Anal. 50 (1983), 8-49.
13. K. Schmüdgen, On the Heisenberg commutation relation. II, Publ. RIMS, Kyoto Univ. 19 (1983), 601-671.
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 51- 60)
QUANTUM UNCERTAINTY AND DECISION-MAKING IN GAME THEORY

M. ASANO, M. OHYA and Y. TANAKA
Department of Information Sciences, Faculty of Science and Technology, Tokyo University of Science, Noda City, Chiba 278, Japan

A. KHRENNIKOV and I. BASIEVA
International Center for Mathematical Modelling in Physics and Cognitive Sciences, Linnaeus University, Växjö, S-35195 Sweden
Recently a few authors have pointed to the possibility of applying the mathematical formalism of quantum mechanics to cognitive psychology, in particular to games of the Prisoner's Dilemma (PD) type [6-18]. In this paper, we discuss the problem of rationality in game theory and point out that quantum uncertainty is similar to the uncertainty of knowledge which a player feels subjectively in his decision-making.
1. Introduction
Game theory is a branch of applied mathematics used in various disciplines: the social sciences, economics, political science, international relations, computer science, and philosophy. The theory analyzes the interdependency of decision-making entities (players) under a given institutional condition. In a normal-form game, players choose among strategies and obtain payoffs assigned to the results of their choices. Three assumptions are made about a player's decision-making. First, players know the rules of the game: each player knows all selectable strategies and payoffs. Second, a player behaves rationally so as to maximize his own payoff. Third, each rational player recognizes the rationality of the other players. In general, rationality includes the faculty of reasoning, so the second and third assumptions cannot be discussed separately. A rational player makes a decision by his own rationality and by reasoning about the rational behavior of the other players.
A goal of game theory is to explain various interdependencies in the real world, and it is believed that the concept of Nash equilibrium provides a normative solution for such explanation. However, in some experiments on games with Nash equilibria, real players frequently do not achieve the Nash equilibrium, which seems irrational [1-5]. Regarding this fact, some may think it only natural that some players make decisions carelessly, and others may think that players educated in game theory should not make such a mistake. These opinions may be reasonable in some cases; however, we believe that there are cases in which a real player's decision-making process is essentially different from that of a normative player in conventional game theory. We discuss this point using several examples of games. First we consider the following game.
A/B     0_B     1_B
0_A     4/4     2/5
1_A     5/2     3/3
This is a well-known two-player game called the prisoner's dilemma (PD) game. The players A and B each have two strategies, denoted by 0_{A,B} and 1_{A,B}, and payoffs are assigned to the results (0_A, 0_B), (0_A, 1_B), (1_A, 0_B) and (1_A, 1_B). In general, a PD game has a unique Nash equilibrium: since the dominant strategies 1_A and 1_B are best for the rational players A and B, the result (1_A, 1_B) is the Nash equilibrium. A player's decision-making process is described as a combination of the player's own rationality and his reasoning about the other player's rationality. The diagram in Fig. 1 shows player A's decision-making process.
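The equilibrium claim can be checked mechanically. The following is a small sketch, assuming the payoff table reads (0_A, 0_B) = 4/4, (0_A, 1_B) = 2/5, (1_A, 0_B) = 5/2 and (1_A, 1_B) = 3/3, with the first number going to A and the second to B. The function name `pure_nash_equilibria` is hypothetical and simply tests, for each result, whether either player could gain by deviating alone.

```python
from itertools import product

# payoffs[(a, b)] = (payoff to A, payoff to B); 0 and 1 label the two strategies.
PD = {(0, 0): (4, 4), (0, 1): (2, 5),
      (1, 0): (5, 2), (1, 1): (3, 3)}

def pure_nash_equilibria(payoffs):
    """Return all pure-strategy profiles from which no player gains by deviating alone."""
    equilibria = []
    for a, b in product((0, 1), repeat=2):
        # A cannot improve by switching a while b is held fixed.
        a_ok = all(payoffs[(a, b)][0] >= payoffs[(a2, b)][0] for a2 in (0, 1))
        # B cannot improve by switching b while a is held fixed.
        b_ok = all(payoffs[(a, b)][1] >= payoffs[(a, b2)][1] for b2 in (0, 1))
        if a_ok and b_ok:
            equilibria.append((a, b))
    return equilibria

print(pure_nash_equilibria(PD))   # [(1, 1)]
```

The only surviving profile is (1, 1), even though both players would receive more at (0, 0); this gap is what the rest of the discussion turns on.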
Figure 1. Diagram of the results (0_A, 0_B), (0_A, 1_B), (1_A, 0_B) and (1_A, 1_B), with arrows for player A's preference and for player B's preference reasoned by A.
The two solid arrows in this diagram indicate that player A prefers the result (1_A, 0_B) to (0_A, 0_B) and prefers (1_A, 1_B) to (0_A, 1_B). The two dotted arrows indicate that player A reasons that player B will prefer (0_A, 1_B) to (0_A, 0_B) and (1_A, 1_B) to (1_A, 0_B). These arrows represent the flow of A's thinking. One can see that A's thinking always reaches the expectation of the result (1_A, 1_B); namely, player A chooses 1_A believing that player B will choose 1_B. However, we wonder whether such a description of the decision-making process is enough to explain a real player's thinking. Let us consider the following prisoner's dilemma game:
A/B     0_B               1_B
0_A     100000/100000     2/100000+1
1_A     100000+1/2        3/3
If an experiment with this game is carried out, we expect that many real players will choose the strategy 0, and that their decision-making is similar to that of the following player A. Player A feels strongly that the payoff 100000 at the result (0_A, 0_B) is very attractive compared with the payoff 3 at (1_A, 1_B). Moreover, he reasons that "the other player B will think the same thing." Both players prefer the result (0_A, 0_B) to (1_A, 1_B). Player A cannot neglect this fact in his decision-making, and his expectation may shift from (1_A, 1_B) to (0_A, 0_B), as seen in the diagram of Fig. 2.
Figure 2. Diagram of the results (0_A, 0_B), (0_A, 1_B), (1_A, 0_B) and (1_A, 1_B), with arrows for player A's preference and for player B's preference reasoned by A.
This shift supports the inclination to choose 0, and player A's mind oscillates between the choice of 1_A and the choice of 0_A. Here, note that the inclination to choose 1_A works to increase a player's payoff by only 1. On the other hand, the inclination to choose 0_A works to increase the payoff by 100000 - 3. It is clear that the inclination to choose 0_A is dominant. Such an additional effect in the decision-making process can be discussed in various games, not only those of the prisoner's dilemma type. For example, let us consider the game with the table below.

A/B     0_B             1_B
0_A     4/100000+1      100000/100000
1_A     5/4             4/0

In this game, the result (1_A, 0_B) is a unique Nash equilibrium. However, as seen in the diagram of Fig. 3, by assuming a comparison between the results (1_A, 0_B) and (0_A, 1_B) in player A's decision-making process, we can expect that player A's mind will oscillate.
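For reference, the conventional expected-payoff comparison for player A in this game can be written out explicitly. The sketch below assumes the table reads, for A's payoffs, 4 at (0_A, 0_B), 100000 at (0_A, 1_B), 5 at (1_A, 0_B) and 4 at (1_A, 1_B), and that A assigns a probability q to B choosing 0_B. The names `A_PAYOFF`, `expected_payoff` and `q_star` are introduced only for this illustration.

```python
from fractions import Fraction

# Player A's payoffs: A_PAYOFF[a][b] for A's choice a and B's choice b.
A_PAYOFF = {0: {0: 4, 1: 100000},
            1: {0: 5, 1: 4}}

def expected_payoff(a, q):
    """Expected payoff to A for strategy a when B plays 0_B with probability q."""
    return q * A_PAYOFF[a][0] + (1 - q) * A_PAYOFF[a][1]

# Probability q* at which A is indifferent between 0_A and 1_A:
# 4q + 100000(1 - q) = 5q + 4(1 - q)  =>  q* = 99996 / 99997.
q_star = Fraction(100000 - 4, (100000 - 4) + (5 - 4))

for q in (Fraction(1, 2), Fraction(9, 10), q_star, Fraction(1)):
    print(q, expected_payoff(0, q), expected_payoff(1, q))
```

Under these assumptions the equilibrium choice 1_A has the higher expected payoff only when A is almost certain (q above 99996/99997) that B plays 0_B, which gives a feel for how large the pull toward 0_A is in this game.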
Figure 3. Diagram of the results of the game, with arrows for player A's preference and for player B's preference reasoned by A.
The additional shifts in Fig. 2 and Fig. 3 might seem to be strange in the framework of the game theory. Conventionally, the player A 's preferences are explained in the contexs as "If the player B chose x B E {O B , 1 B }, then I will prefer to choose YA E {OA , 1A}", and the player B's preferences reasoned by the A are explained in the same form. Such a contex consists of two parts; the part of assumption and the part of analysis. In the first part , the player A assumes one of the player B's choices. In the second part, the player A analizes his preference with based on the assumption. Note that such an analysis is contextually same with a posterior analysis
55
by someone who already knows a result of the B's choice. In principle, each player is not informed of another player's choice, and actually, the player A will just have to perform a prior analysis. The conventional game theory gives a prior analysis based on the concept of mixed storategy, where the player A assumes the player B chooses OE or IE probabilistically, and the A analyzes his preference in the comparison between the expectation values of payoffs. However, such the analysis is just a mixture of the posterior analyses based on the probabilistic distribution about the B's choice. Such the prior analysis never explains the effects of the additional shifts in Fig. 2 and Fig. 3. Generally, a player who performs a prior analysis, holds uncertainty about another player's choice. In the concept of mixed strategy, it is represented by a simple probabilistic distribution. On the other hands, the uncertainity we consider can not be explained by the classical probability theory. Actually, a few authors pointed to a possibility to apply the mathematical formalism of quantum mechanics to cognitive psychology. 6_18 It was found that statistical data obtained in some experiments of cognitive psychology. The theory of qunatum physics teaches us that there is a phenomenon which cannot be explained by the classical probability theory, and it means that a prior analysis about a phenomenon is essentially different from a simple mixture of posterior analyses. We explain this point by introducing the example of the double slit experiment. Let us consider the experimental apparatus shown in Fig. 4, which consists of the two plates with three slits and the photo-sensitive plate. A photon passing through the first and second plates is detected on the photo-sensitive plate. If the slit-Ion the second plate is closed, see Fig. 5, photons which are detected on the third plate certainly pass through the slit-O. Then, one can obtain the distribution of photons as seen in Fig. 5. 
If slit-0 is closed, one sees the distribution of Fig. 6. Next, we consider the case in which photon detectors are placed near the two slits on the second plate. With these detectors, one can know which slit a photon passes through. The distribution of photons in this case corresponds exactly to the simple mixture of the distributions in Fig. 5 and Fig. 6; see Fig. 7. Lastly, we consider the case in which the detectors are removed, so that one can never know which slit a photon passes through. The distribution of photons then forms an interference pattern, which is clearly different from the result of Fig. 7; see Fig. 8. Here, we explain the mathematical representations of the states of photons in the above double slit experiments. First, in the case of Fig. 5, we denote the state of a photon passing through slit-0 by $|0\rangle = \binom{1}{0} \in \mathbb{C}^2$, which is called a state vector defined in the Hilbert space $\mathcal{H} = \mathbb{C}^2$. Similarly, the state of a photon passing through slit-1 in Fig. 6 is denoted by $|1\rangle = \binom{0}{1} \in \mathbb{C}^2$.
[Figure 4. Double Slit Experiment (diagram labels: photon, photo-sensitive plate)]
[Figure 5. (diagram labels: photon, photo-sensitive plate)]
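These vectors are orthogonal to each other. They are also often represented in the form of density operators as
$$\rho_0 = |0\rangle\langle 0| = \begin{pmatrix} 1 \\ 0 \end{pmatrix} (1 \;\; 0) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad \rho_1 = |1\rangle\langle 1| = \begin{pmatrix} 0 \\ 1 \end{pmatrix} (0 \;\; 1) = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}.$$
[Figure 6. (diagram labels: photon, photo-sensitive plate)]
[Figure 7. (diagram labels: slit 0, photo-detector, photon, photo-sensitive plate)]
The density operator is useful for statistical description: a photon that passes through slit-0 or slit-1 with probabilities $p$ and $1 - p$, respectively, is represented by
$$\rho = p\,|0\rangle\langle 0| + (1 - p)\,|1\rangle\langle 1| = \begin{pmatrix} p & 0 \\ 0 & 1 - p \end{pmatrix}.$$
[Figure 8. (diagram label: photo-sensitive plate)]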
In the case of Fig. 7, the two photon detectors check photons that certainly pass through either slit-0 or slit-1. The statistically averaged state of the photon is written as the $\rho$ of the above form. The state of the photon in Fig. 8 is described in a form specific to quantum mechanics, called quantum superposition. The state of quantum superposition is given by the density operator
$$\rho = |\phi\rangle\langle\phi| = \begin{pmatrix} |\alpha|^2 & \alpha\beta^* \\ \alpha^*\beta & |\beta|^2 \end{pmatrix}.$$
Here, $|\phi\rangle$ is the state vector defined by $|\phi\rangle = \alpha|0\rangle + \beta|1\rangle = \binom{\alpha}{\beta}$, where $\alpha$ and $\beta$ are complex numbers satisfying $|\alpha|^2 + |\beta|^2 = 1$. The numbers $\alpha$ and $\beta$ are called probability amplitudes, and their squared moduli, $|\alpha|^2$ and $|\beta|^2$, have the meaning of probabilities. Generally, probabilities in quantum mechanics are defined secondarily from probability amplitudes. The nondiagonal parts of the above $\rho$ are called the parts of quantum fluctuations. The interference pattern seen in Fig. 8 is formed due to the effects of quantum fluctuations. The state vector $|\phi\rangle = \alpha|0\rangle + \beta|1\rangle$ is a mixture of $|0\rangle$ and $|1\rangle$ in a sense; however, it is not the probabilistic mixture seen in the form of $\rho$ in Fig. 7. The form of $|\phi\rangle$ affirms the states $|0\rangle$ and $|1\rangle$ simultaneously with weights $\alpha$ and $\beta$, which are not probabilities, and it does not imply "a photon passed through either slit-0 or slit-1". The probabilistic mixture $\rho$ never explains the interference pattern. The double slit experiment symbolically represents the decision-making process which we discussed previously. Let us replace the term "photon" by "player B", and replace the context "the photon passes through slit-0 (or 1)" by "player B chooses the strategy $0_B$ (or $1_B$)". Then, the distribution shown in Fig. 5 (or Fig. 6) represents a posterior analysis by player A, who determines that player B chose $0_B$ (or $1_B$). Furthermore, the distribution in Fig. 7 is interpreted as the mixture of posterior analyses corresponding to a prior analysis in conventional game theory. The existence of the two photon detectors in Fig. 7 means that player A determines player B's choice probabilistically. Game theory identifies this probabilistic determination with the uncertainty that A holds. However, we consider a deeper uncertainty, where player A can essentially never determine B's choice. The distribution in Fig. 8 represents a prior analysis based on such an uncertainty. The mathematical formalism of quantum mechanics provides the quantum uncertainty with quantum fluctuation, which is essentially different from the classical probability distribution. As pointed out previously, we believe that the quantum uncertainty is very similar to the uncertainty which player A holds about the other player B's choice. One might object that B's action is a classical phenomenon, so such an uncertainty should be described by classical probabilities. However, this stochastic phenomenon is just an objective fact recognized a posteriori from some statistical data. The uncertainty in decision-making should be described on the basis of the subjective fact that player A can essentially never determine B's choice.
This subjective fact is rather realistic for A's prior analysis in his decision-making.
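The contrast between the probabilistic mixture (Fig. 7) and the quantum superposition (Fig. 8) can be sketched numerically. The following is only an illustrative sketch under stated assumptions, not part of the original analysis: the function names are hypothetical, and the complex numbers `psi0` and `psi1` stand for assumed amplitudes with which a photon from slit-0 or slit-1 reaches a given point of the photo-sensitive plate.

```python
import cmath
import math

def mixture(p):
    """Density operator p|0><0| + (1-p)|1><1| (detectors present, as in Fig. 7)."""
    return [[complex(p), 0j], [0j, complex(1 - p)]]

def superposition(alpha, beta):
    """Density operator |phi><phi| for |phi> = alpha|0> + beta|1> (as in Fig. 8)."""
    v = [complex(alpha), complex(beta)]
    return [[v[i] * v[j].conjugate() for j in range(2)] for i in range(2)]

def screen_intensity(rho, psi0, psi1):
    """Assumed detection density at one screen point: sum_ij psi_i rho_ij conj(psi_j).

    psi0 and psi1 are hypothetical amplitudes for reaching that point
    from slit-0 and slit-1, respectively.
    """
    amps = [complex(psi0), complex(psi1)]
    total = 0j
    for i in range(2):
        for j in range(2):
            total += amps[i] * rho[i][j] * amps[j].conjugate()
    return total.real

s = 1 / math.sqrt(2)
rho_mix = mixture(0.5)
rho_sup = superposition(s, s)
for theta in (0.0, math.pi / 2, math.pi):
    psi1 = cmath.exp(1j * theta)  # relative phase between the two paths
    print(theta, screen_intensity(rho_mix, 1, psi1), screen_intensity(rho_sup, 1, psi1))
```

In this sketch the mixture gives the same intensity for every relative phase, because its nondiagonal parts vanish, while the superposition gives an intensity that oscillates with the phase. That oscillation is the interference produced by the nondiagonal parts of the density operator.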
References
1. Shafir, E. and Tversky, A.: Thinking through uncertainty: nonconsequential reasoning and choice. Cognitive Psychology 24, 449-474 (1992)
2. Tversky, A. and Shafir, E.: The disjunction effect in choice under uncertainty. Psychological Science 3, 305-309 (1992)
3. Croson, R.: The disjunction effect and reasoning-based choice in games. Organizational Behavior and Human Decision Processes 80, 118-133 (1999)
4. Hofstadter, D. R.: Dilemmas for superrational thinkers, leading up to a luring lottery. Scientific American, 6 (1983)
5. Hofstadter, D. R.: Metamagical themes: Questing for the essence of mind and pattern. Basic Books, New York (1985)
6. A. Khrennikov, Open Systems and Information Dynamics 11 (3), 267-275 (2004).
7. A. Khrennikov, BioSystems 84, 225-241 (2006).
8. K.-H. Fichtner, L. Fichtner, W. Freudenberg and M. Ohya, On a quantum model of the recognition process. QP-PQ: Quantum Prob. White Noise Analysis 21, 64-84 (2008).
9. Busemeyer, J. R., Wang, Z. and Townsend, J. T.: Quantum dynamics of human decision making. J. Math. Psychology 50, 220-241 (2006)
10. Busemeyer, J. R., Matthews, M., and Wang, Z.: A Quantum Information Processing Explanation of Disjunction Effects. In: Sun, R. and Miyake, N. (eds.) The 29th Annual Conference of the Cognitive Science Society and the 5th International Conference of Cognitive Science (pp. 131-135). Mahwah, NJ. Erlbaum (2006)
11. Busemeyer, J. R., Santuy, E., Lambert-Mogiliansky, A.: Comparison of Markov and quantum models of decision making. In: P. Bruza, W. Lawless, K. van Rijsbergen, D. A. Sofge, B. Coecke, S. Clark (eds.) Quantum interaction: Proceedings of the Second Quantum Interaction Symposium, pp. 68-74. London: College Publications (2008)
12. L. Accardi, A. Khrennikov, M. Ohya, The problem of quantum-like representation in economy, cognitive science, and genetics. In: Quantum Bio-Informatics II: From Quantum Information to Bio-Informatics. L. Accardi, W. Freudenberg, M. Ohya, eds., pp. 1-8, WSP, Singapore (2008).
13. L. Accardi, A. Khrennikov, M. Ohya, Quantum Markov Model for Data from Shafir-Tversky Experiments in Cognitive Psychology. Open Systems and Information Dynamics 16, 371-385 (2009).
14. Conte E., Khrennikov A., Todarello O., Federici A., Zbilut J. P. Mental States Follow Quantum Mechanics during Perception and Cognition of Ambiguous Figures. Open Systems and Information Dynamics 16, 1-17 (2009).
15. Khrennikov A., Haven E. Quantum mechanics and violations of the sure-thing principle: the use of probability interference and other concepts. Journal of Mathematical Psychology 53, 378-388 (2009).
16. A. Khrennikov, Ubiquitous quantum structure: from psychology to finance, Springer, Heidelberg-Berlin-New York, 2010.
17. A. Khrennikov, Contextual approach to quantum formalism (Fundamental Theories of Physics). Springer, Heidelberg-Berlin-New York, 2009.
18. M. Asano, A. Khrennikov, M. Ohya, Quantum-Like Model for Decision Making Process in Two Players Game. Foundations of Physics, DOI: 10.1007/s10701-010-9454-y (2010).
Quantum Bio-Informatics IV, eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 61-89)
NEW TYPES OF QUANTUM ENTROPIES AND ADDITIVE INFORMATION CAPACITIES
VIACHESLAV P. BELAVKIN
School of Mathematics, Nottingham University, Nottingham, NG7 2RD, United Kingdom
An elementary algebraic approach to unified quantum information theory is given. An operational meaning of entanglement as specifically quantum encoding is disclosed. The general relative entropy as information divergence is introduced, and the three most important types of relative information, namely the Araki-Umegaki (A-type), the Belavkin-Staszewski (B-type) and the thermodynamical (C-type), are shown. The true quantum entropy, different from the von Neumann semiclassical entropy, is introduced, and the proper quantum conditional entropy is shown. The general quantum mutual information via entanglement is defined, and the corresponding types of quantum channel capacities are formulated as the supremum over the generalized encodings. The additivity problem for quantum logarithmic capacities for the products of arbitrary quantum channels under the appropriate constraints on encodings is discussed. It is proved that the true quantum capacity, which is achieved on the standard entanglement as the optimal quantum encoding, reclaims the additivity property of the logarithmic quantum channel capacities via the entanglement on the products of quantum input states. This result, obtained earlier by V. P. B. for the quantum logarithmic information of A-type, is extended to any type of quantum information.
1. Introduction
It is not commonly known that quantum information is already almost 50 years old. On the mathematical side it was pioneered by Umegaki and developed by Ohya and other Japanese mathematicians. On the physical side it was pioneered by R. L. Stratonovich who, following the discovery of lasers and inspired by the ideas of Shannon's mathematical theory of communications [21], published a series of papers on information transmission via quantum channels in the 1960s and even completed a book on quantum information. In particular, he was the first to introduce the quantum analog of Shannon's mutual information [23] and computed it for entangled quantum Gaussian variables [24], predicted the classical capacity of quantum Gaussian channels [25], and found, jointly with Belavkin, the optimal decoding maximizing the capacity for such channels with a Gaussian source code [9,10]. Lebedev-Levitin [17] and Holevo [15] were among the followers of this pioneering work at that 'prehistoric' time of quantum information. Unlike classical channels, quantum channels can have several different capacities (e.g. for sending classical information or quantum information, one-way or two-way communication, prior or via entanglement, etc.); see the conceptions of coherent information [20], von Neumann mutual information [12] and entanglement-assisted information [11]. Unfortunately, most of these attempts seem unsatisfactory, because the defined quantities fail to preserve such naturally conjectured properties of informational capacity as additivity [16,22] for parallel channels, and some do not even have the monotonicity property for consecutive channels. Until recently the problem of unifying all these capacities within a general information-theoretic framework remained unsolved, despite the innumerable predictions of the advantages of quantum information. The comparison of classical and quantum capacities, and the advantages and disadvantages of quantum information, can be rigorously proved only within a unifying approach. This paper contributes to such a unifying approach, based on the operational entanglement theory of quantum channel capacity suggested in [1,2,7,8]. By enlarging the class of input encodings to include the encodings via entanglement for one-way communication, this approach shows that different capacities are in fact constrained quantum capacities related to some informational divergences, and that some of the 'capacities' which do not have the natural properties of the quantum divergence should not really be considered as capacities of quantum channels.
Recently a tremendous effort has been made to prove or disprove the additivity conjecture [16,22] for the Holevo semiclassical capacity, which gives the upper bound of the classical information capacity of a quantum communication channel. Eventually this additivity conjecture was disproved [13,14], showing that the quasiclassical constraint on quantum capacity is inappropriate for the additivity of quantum channels, as was predicted in [1,2]. In this paper we concentrate on extending the additivity property of the true, unconstrained quantum entangled capacity, first proved for the A-type capacity based on the Araki-Umegaki relative entropy in [1,2] and also for the B-type capacity based on the Belavkin-Staszewski relative entropy in [5]. We shall also introduce a new, thermodynamical type of quantum mutual information and the corresponding quantum channel capacities via entanglement in this unifying operational approach. In defining the general quantum capacity we shall follow an axiomatic approach outlined for the general relative entropy by Petz [19] and Ruskai [18] (see also [6]); however, we shall start from informational divergences and will treat the relative entropies and informational divergences differently than in [18,19]. This unifying approach has the natural level of generality based on von Neumann W*-algebras taken in [3,8]; however, in order to avoid the use of noncommutative integration theory we shall restrict the level of generality to the discretely decomposable algebras, as was done in [1,7]. The paper is organized as follows: section one introduces related notions of quantum probability and information theory, such as quantum state and quantum entanglement; section two introduces the general quantum divergence and new types of quantum relative entropies via the divergence; section three introduces the entangled quantum mutual information and the true quantum entropies of three types achieved via entanglement; section four introduces quantum channel capacity via entanglement encoding and shows additivity of the logarithmic entangled quantum channel capacity of any type; the final section is devoted to conclusions and further problems.
2. The Entanglement States and Operations
2.1. Normal states, densities and pairings
As usual in quantum information theory, we may consider only simple quantum systems described by matrix algebras $M_n$. However, in order to include classical random codings and keep a closer link with classical information theory, the tensor product algebras $\mathcal{A} = M_n \otimes C(J)$ of random matrices $a_j = a(j) \in M_n$ on a discrete set $J$ with probability distribution $\lambda_j \ge 0$, $\sum \lambda_j = 1$, describing a partially classical 'Alice' as a purely quantum (if $|J| = 1$) or semi-quantum ($|J| > 1$) system, will also be allowed for consideration. More generally, the semisimple matrix algebras $\mathcal{B} = \oplus_j M_{n_j}$ with $n_j^2 = \dim M_{n_j}$, describing a 'decomposable Bob' as a quantum system obeying the superselection rule, may also be represented on a Hilbert space $\mathfrak{h}$, say, of total dimensionality $\sum_j n_j \le \infty$, including the purely classical case corresponding to all $n_j = 1$, like the Schrödinger's cat $n_0 + n_1 = 2$ as a classical bit corresponding to $J = \{0, 1\}$. Each element $b \in \mathcal{B}$ is an orthogonal sum (or series) $b = \oplus b_j$ represented as the block-diagonal operator $b = [b_j \delta^i_j]$ on the Hilbert sum $\mathfrak{h} = \oplus \mathfrak{h}_j$ with arbitrary entries $b_j \in \mathcal{B}(\mathfrak{h}_j) := M_{n_j}$, where $n_j = \dim \mathfrak{h}_j$. Normally the quantum states
c; E (3 on such algebra IE are identified with the linear functionals C;
(b) :=
LTr [bjo-j] == T~ [bo-l jEJ
giving the expectations (b) = T~ [bo-l = c; (b) of b E IE considered as the output observables for a quantum system (Bob). Here o-j = 0-; 2: 0 are fJ-positive (i.e. semipositive Hermitian) matrices normalized to the probabilities Tr [0- j 1 = 7r j, L 7r j = 1, defining the decomposable covariant density operator 0- = tBjo-j which is in one-to-one correspondence C; f---+ 0- == ~ with the state C;. In order to avoid the consideration of unphysical operations such as partial transposition, and to deal only with completely positive operations, it is convenient to describe the states C; not by the decomposable density
~perators 0- = [o-i8;]
but by transposed density matrices 0-
=
[o-i8i]
=
0-. Such decomposable densities 0- = tBjo-j are also semipositive trace-one matrices which are in one-to-one correspondence C; f-+ ~ := 0- with the states C; (b) = (b,~) given, say, by the standard transposition ~ = 0- T with respect to the usual tensor pairing .
(b,~) := b~,nl~j,n
I
== Tr [b~] ,
where b = [b~l,n8i] == b T is defined in a normal basis of each fJj. The operators ~ = 0- are called contra-variant densities of the states C; with respect to the tensor pairing, and they coincide with the transposed ~ = 0for the symmetric states C; (b) = C; (b T). The contravariant densities are proved to be more appropriate for the operational entanglement theory 8, they can be defined (see the Appendix) independent of the basis, and they were introduced also for the quantum channels as infinite-dimensional Choi operators in 4. 2.2. Examples: the standard and the qubit paring If the transposition b f---+ b is taken standard b = b T in a basis of fJ, then jffi = IET coincides with IE only if the basis is com partible with the decomposition IE = tBIEj into th! full matrix algebras IEj = B (fJj). However, it is convenient ~ot to identify IE and IE even in the full matrix case IE = B (fJ), considering IE as an opposite algebra to IE. In general the transposition b f---+ b can be defined not related to any specific orthonormal basis of fJ as a linear antimultiplicative ac = ca invertible isometry IE --+ IE, commuting with the antilinear involution a f---+ a*,
a**
= a as a*:=8:=8:*,
a=a VaEE
in the usual notation identifying a f---+ a with the inverse transposition E ----t E. The contravariant densities <;- = ~, as f)-positive trace-one decomposable matrices, are in one-to-one correspondence with the states ~ (b) = (b, <;-) with respect to the tilde-symmetric pairing
(a, c) = Tr [ae] = Tr [ea] = (c, a) . The general tracial pairing (b, <;-) (see the appendix) is fully characterized by the property
(ab, de) = (bc, ad)
Vb E E, dEE
and strict positivity of the related scalar product (ale) = (8:, c) on jjj given by the complex conjugation a f---+ 8: into E. More generally than the usual transposition <;-T in a specific basis it can be given as ~ = h- 1<;-Th. Here h is a Hermitian invertible element h of E coinciding with the transposed h T = sh
= h up to a central signature element s = [sj
on
E
c (E) commuting
with E such that S2 = 11), which has in each Ej = B (f)j) the only two possible values sj = ±1. The following example shows how to obtain in this way the transposition a = h- 1a T h independent on basis in the case of the qubit f) = ((:2. Quantum bit algebra E = M2 of 2 x 2- matrices b = 8"b := [
bo + b3 h - ib 2 ] b1 + ib 2 bo - b3
=bo1 + b.~ 8"
----t
given in the Stock's basis & = (8"a) of M2 by the identity 8"0 = 1 and ----t the standard Pauli matrices 8" = (8"1,8"2,8"3), is naturally paired as in the Minkowski space 3
(b, <;-) = b . q :=
L baqa = baqa = tr (b~) a=O
with the same jjj = M2 by the normalized trace tr (b) = ~Tr [b] = bo, tr (1) = 1. The contravariant density matrices <;- = qa<;-a are most naturally given in the dual basis <;-0 = 1 = 8"0 and <;-a = -8"a, a = 1,2,3 such that all Pauli matrices <;-a := 8"a = ~a are h-transposed, 8"a = h - l<;-~h = 8"a, to <;-a with respect to the antisymmetric imaginary Pauli matrix h = 8"2 = h- 1 having the signature s = -1.
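The qubit statements above can be checked with a few lines of code. The following is a minimal sketch, assuming the reading that the basis-independent transposition is $a \mapsto h^{-1} a^{\mathsf T} h$ with $h = \sigma_2 = h^{-1}$ and that $\mathrm{tr}$ denotes the normalized trace $\frac{1}{2}\mathrm{Tr}$; all function names are hypothetical and introduced only for illustration.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    """Standard transposition in the fixed basis."""
    return [[a[j][i] for j in range(2)] for i in range(2)]

def scale(c, a):
    """Multiply a 2x2 matrix by the scalar c."""
    return [[c * x for x in row] for row in a]

# sigma_0 = identity, followed by the standard Pauli matrices sigma_1..sigma_3
SIGMA = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]
H = SIGMA[2]  # h = sigma_2, which is its own inverse

def h_transpose(a):
    """Assumed h-transposition a -> h^{-1} a^T h with h = sigma_2."""
    return matmul(matmul(H, transpose(a)), H)

def tr(a):
    """Normalized trace tr(a) = Tr[a] / 2, so that tr(1) = 1."""
    return (a[0][0] + a[1][1]) / 2

def from_coords(b):
    """Matrix b_0 * 1 + b_1 sigma_1 + b_2 sigma_2 + b_3 sigma_3."""
    b0, b1, b2, b3 = b
    return [[b0 + b3, b1 - 1j * b2], [b1 + 1j * b2, b0 - b3]]

for k in range(4):
    print(k, h_transpose(SIGMA[k]))
```

Under these assumptions the h-transposition leaves $\sigma_0$ unchanged and flips the sign of the three Pauli matrices, which is consistent with the dual basis $\sigma_0, -\sigma_1, -\sigma_2, -\sigma_3$ and with the signature $s = -1$ of the antisymmetric $h = \sigma_2$.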
2.3. Coupling operators
From now on we shall consider not only finite but semifinite algebras A with involution A * = A, identity 1 E A and a reference trace JL as a linear strictly positive functional JL (?!) == (1, ?!) /L on the predual space AT generated by the positive functionals on A of the form a f--+ (a, r) /L such that JL Car)
= (a,r)/L = JL (ra) Va E A,r EAT n A.
Note that the trace class AT with respect to the standard trace JL = Trg of a Hilbert space 9 is a dense subspace of the weak closure <:;; 8(g) of A represented on g, but in general it is the completion of AT n A with respect to the dual norm
if
to the usual operator norm on A. The entanglement theory for the general semisimple algebras was developed in 8, but here for simplicity we can consider only the discrete algebras A with normalized or nonnormalized reference traces T 9 (a) = JL (a). The linear functionals on A of the form i? (a) = (a,?!) /L given by ?! E AT are called regular (normal if?! ::::: 0). Every normal state i? on A <:;; 8(g) can be represented as
(1) by the density ?! = x* x E AT with an operator x : 9 ~ ~ satisfying the condition xAx* <:;; lB given by a (decomposable) algebra lB <:;; 8(~). Such operator x is said to be coupling the state i? to the state <; (b) = (b,~) v on lB given by the density ~ = x*x E lBT with respect to the trace
Both states i? and <; are the margins of an entangled state w defined on the tensor product algebra A Q9 lB by
JL[ax*bxl = w(a Q9 b) = v[bx*axl.
(2)
In the case 9 = ~ and -::: = ¢1/2 corresponding?! = ¢ this entanglement is called standard for lB = A. The normal state i? represented by the covariant density p = xx* = as the margin
e
i? (a)
=
Tg
(ap) = v (x*ax)
of w is said to be entangled to ~ = x* x E lBT by the coupling operator x. In the case 9 = ~ and x = ¢1/2 corresponding?! = ¢ this entanglement is
called standard on E = A. We shall see that it is maximal entanglement under the constraint x* x = g of the given input state.
2.4. Mixed entangled states The achieved entanglement defines on A Q?I E
= B(g Q?I ~) a pure state
w(aQ?lb);= (vi (aQ?lb)v) == v* (aQ?lb)v in terms of the scalar product on 9 Q?I ~. It is given by a unit vector v E 9 Q?I ~ obtained by a partial transposition v = of the entangling operator x such that
x
where J 9 denotes an isometric complex conjugation on g, and it has in general mixed marginal states
(a, g) = v*(a Q?I
l~)v,
(b,~)
= v*(lg Q?I b)v.
(3)
Mixed entangled states can be obtained by using a stochastic entangling operator as a row x. = (Xi) of entangling operators Xi : 9 ----+ ~ satisfying the constraints
The partial transposition defines the components Vi = ~ of the stochastic state vector as a row v. = (Vi) of Vi E 9 Q?I ~ such that
W := v.v! E A Q?I E, Trv!v.:=
L
V;Vi
= 1.
Given a stochastic state vector Vi E 9 Q?I ~ we define a _mixed compound state w : A Q?I E ----+ C by the contravariant density := W == w as
w
w(a Q?I b) = T[(aQ?lb)w] = (aQ?lb,w). with respect to the product trace
T
=
T9
(4)
Q?I T~.
Lemma 2.1. The compound state (4) can be achieved by
T~[x.ax!b] = w(aQ?lb) = Tg[x.bx!a],
where x. = (Xi) is the row of Xi, with w( aQ?ll~) = (a, g), w( 19 Q?lb) = (b,~) given by partial traces g = L xi Xi, ¢ = Li xi Xi . Moreover, the operator x. as the partially transposition of v. = (~) is uniquely defied up to a unitary transformation U : v. f-+ v.U on the minimal auxiliary space.
2.5. Entanglement as operation Let us write the above entangled state w on A ® Iffi as
(5) where the CP map 7f(b) = L:i X;bXi with values in AT bounded by Ilbll is dual 7f = 7f to the predual CP map 7fT = 7fT lA,
i
?!
(6)
Remark 2.1. The entanglements 7f and 7fT can be written in terms of the partial traces v and J-l on the compound density operator W = L: V;Vi as
7f(b) = v[(lg ® b)w] == (b, w)v'
(7)
7fT (a) = J-l[(a ® l~)w] == (a,w)/L'
(8)
Definition 2.1. A normal CP operation 7f : Iffi -+ AT normalized to a state density ?! = 7f( 1~) is called generalized entanglement of the state <;(b) = (lg,7f(b)) to Q(a) = (l~,7fT(a)).
It is proper (or true) entanglement if the transposed positive map 7f - : ~ --------b -+ 7f (b) into A is not CP on Iffi (Equivalently, if the map 7f; : a -+ 7fT (a) E Iffi is not CP.) 2.6. The standard entanglement The entanglement is called standard for the state <; on Iffi if it couples <; to the transposed state Q = ~ on A = iB by
(9) Obviously pT (a) = e / 2 ae/ 2 == (j (a), where ~ = Q, and p = pT iff Iffi = Iffi and <; is symmetric, <; = Q. Given <;, the standard entangled state is defined by
(10) Theorem 2.1. Every entanglement 7f on Iffi to the state?! E AT has a decomposition 7f = P 0 II,
(11)
where IT is normal CP contraction JR ::' A such that 19 2: IT (1~) 2: E{2 on the the minimal orthoprojector Ee E It. supporting the density operator Q. This decomposition is unique by the normalization condition IT ( 1 ~) = E {2' Proof. IT can be found as a solution to the equation
which is unique if Q is nondegenerate:
(12) If Q is degenerate, one should reduce 9 to the Hilbert subspace ge given by Ee'
=
Ee9 0
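3. Informational Divergences and Relative Entropies
3.1. Entanglement measures
The measure of entanglement in a state $\omega$ is usually determined by its divergence $\mathcal{V}_{\mathfrak{S}}(\omega) = \inf\{\mathcal{V}(\omega; \varphi) : \varphi \in \mathfrak{S}\}$ from a reference subset $\mathfrak{S}$ of disentangled states on $\mathcal{M} = \mathcal{A} \otimes \mathcal{B}$, say the convex span $\mathfrak{S}_d$ of all product states $\varrho \otimes \varsigma$ on $\mathcal{M}$. The simplest such divergence is given by the trace distance
$$\mathcal{V}(\omega; \varphi) = \tfrac{1}{2}\left(1, |\tilde\omega - \tilde\varphi|\right) \in [0, 1],$$
which is symmetric, $\mathcal{V}(\omega; \varphi) = \mathcal{V}(\varphi; \omega)$. Another important symmetric divergence is defined by the quantum Hellinger distance
$$\mathcal{V}(\omega; \varphi) = (\varphi + \omega)(1) - 2f(\omega, \varphi) \equiv \mathcal{V}_{l_{1/2}}(\omega; \varphi), \qquad (13)$$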
I I I where f(w ,'P) = e[(
f(w,'P) =
~sup{e(c(+(*~) :c~=&,(*(=
Although these divergences look very natural, it is usually difficult to compute them, unless the operators $\tilde\varphi$ and $\tilde\omega$ commute, in which case they both can be written as
$$\mathcal{V}_{l_r}(\omega; \varphi) = (\varphi - \omega)(1) - \omega\left(l_r(\tilde\varphi / \tilde\omega)\right), \qquad (14)$$
with $l_{1/2}(q) = 2(\sqrt{q} - 1)$ for $r = 1/2$ and
$$l_1(q) = \tfrac{1}{2}\left(q - 1 - |q - 1|\right) \equiv (q - 1)_-, \quad \forall q \ge 0. \qquad (15)$$
Note that the operator function $l_{1/2}$ is the square root case of the Rényi logarithm
$$l_r(q) = \frac{1}{r}\left(e^{r \ln q} - 1\right) = \frac{1}{r}\left(q^r - 1\right), \qquad l_0(q) = \ln q, \qquad (16)$$
or r-logarithm, which is well defined for any $0 \le r < 1$ as a smooth, strictly monotone and concave operator function of $q > 0$, including the limiting case $r \to 0$ when $l_0 = \lim l_r$ is the natural logarithm $l_0 = \ln$. It can be naturally extended to a proper concave function on $\mathbb{R}$ by $l(q) = -\infty$ on $q \le 0$; it has finite strictly negative values for $0 < q < 1$, with $l(1) = 0$ and the normalized derivative $l_r'(1) = 1$ at $q = 1$, and it is strictly positive if $q > 1$. However, in the case $r = 1$ the Rényi "logarithm" $l(q) = q - 1$ is not strictly concave but only affine, corresponding to the trivial divergence $d(\omega, \varphi) = 0$ in (14) if $l_1$ is replaced by this $l$. This is why in the case $r = 1$ we redefine the divergence by another monotone concave function (15) which is, however, not strictly monotone and concave and is not smooth at $q = 1$. The information divergence $\mathcal{V}(\omega; \varphi)$ of $\omega$ from $\varphi$ is usually defined as a positive negaentropy $\mathcal{V}_l = -S_l$ by the semifinite relative entropy
(17) Here w (ill) = w (m) and l is usually taken to be the Renyi logarithm (16), r E [0, 1[ for which In (rpj&) is usually understood as In rp - In &. 3.2. The general information divergences
The general information divergence $\mathcal{V}(\omega; \varphi)$ of states on a matrix algebra $\mathcal{M}$ can be defined like a distance to have only positive values; however, unlike a distance, it is not assumed to be symmetric or to satisfy the triangle inequality, and it is usually allowed to have also the infinite value $+\infty$, say, for some central states $\omega \ne \varphi$ on an infinite dimensional $\mathcal{M} \subseteq \mathcal{B}(\mathfrak{h})$. The reference state $\varphi$, or weight as an unnormalized state, is said to be tracial, or central, if the density operator $\tilde\varphi$ commutes with all $\tilde\omega \in \mathfrak{S}_{\mathcal{M}}$. Obviously this is the case if $\tilde\varphi = 1$, corresponding to the standard trace $\varphi(m) = (m, 1) = \mathrm{Tr}[m]$, which majorizes as $1 \ge \tilde\omega$ any state $\omega$ normalized as
$$\epsilon[\tilde\omega] := (1, \tilde\omega) = 1 \quad \forall \omega \in \mathfrak{S}_{\mathcal{M}}$$
with respect to the trace $\epsilon$ on $\mathcal{M}_\intercal$. If the state $\omega$ is not too different from the reference $\varphi$ in the sense that it is dominated by the equally normalized $\varphi$, i.e. if $\varphi(1) = \omega(1)$ and $\omega$ is majorized by $\lambda\varphi$ for a $\lambda > 0$, a finite information divergence $\mathcal{V}(\omega; \varphi)$ of $\omega$ from $\varphi$ is usually defined as a positive negaentropy $\mathcal{V}_l = -S_l$ by the semifinite relative entropy (17). More generally, we shall define a positive l-divergence $\mathcal{V}_l$ as dual to a relative l-entropy in the sense
$$\mathcal{V}_l(\omega, \varphi) + S_l(\omega, \varphi) = (\varphi - \omega)(r_l) \quad \text{with } r_l = l'(1) \qquad (18)$$
for a suitable contrast function $l$. One can always take $l(q) = e^{r \ln q} - 1 \equiv r\, l_r(q)$ having $r_l = l'(1) = r$, or the Rényi logarithm $l = l_r$, $0 \le r < 1$, normalized by $l_r'(1) := \partial_q l_r(1) = 1$, but it can be any operator-monotone concave function $l : \mathbb{R}_+ \to \mathbb{R}$, positive on $(1, \infty)$, negative on $(0, 1)$ and smooth at $q = 1$. Note that the latter condition excludes the function (15) corresponding to the trace distance $\mathcal{V} = d_1$, for which the entropy $S$ is not well-defined by (18) but can be taken to be zero, $S_{l_1}(\omega, \varphi) = 0$ for any $\omega \le \varphi$, by choosing the subderivative $r_l \in l_1'(1)$ as $r_l = 1$. The above properties of the contrast function are derived from the following divergence axioms if they hold on the whole cone $\mathfrak{K}$ of $\omega$, or at least on the convex subset $\mathfrak{K}_0 = \{\omega \ge 0 : \omega(1) \le 1\}$ containing $0 \in \mathfrak{K}$. By UCP we denote the properties of unitality $\Lambda(1) = 1$ and complete positivity $[\Lambda(b_i^* b_k)] \ge 0$ for a linear normal map $\Lambda : \mathcal{B} \to \mathcal{M}$.
(1) $\mathcal{V}(\omega; \varphi) \ge 0$ with $\mathcal{V} = 0 \Leftrightarrow \omega = \varphi$. (Strict positivity and distinguishability.)
(2) $\mathcal{V}(\oplus \lambda_i \omega_i, \oplus \lambda_i \varphi_i) = \sum \lambda_i \mathcal{V}(\omega_i, \varphi_i)$ $\forall \lambda_i \ge 0$, $\sum \lambda_i = 1$. (Direct affinity.)
(3) $\mathcal{V}(\omega \circ \Lambda; \varphi \circ \Lambda) \le \mathcal{V}(\omega; \varphi)$ for any UCP map $\Lambda$. (Operational monotonicity.)
In addition, we say that $\mathcal{V}(\omega; \varphi)$ is semifinite if
(4) $\mathcal{V}(\omega; \varphi) < \infty$ if $\omega \le \lambda\varphi$ for a positive $\lambda$;
that it is smooth (differentiable) if
(5) the function $d(s, t) = \mathcal{V}(\omega + s\vartheta; \varphi + t\theta)$ is smooth;
that it is negaentropy defining the entropy $S = (\varphi - \omega)(r) - \mathcal{V}$ if
(6) $S(\omega; \varphi) \le S(\omega; \varphi_1)$ for every $\varphi \le \varphi_1$ and an $r > 0$.
Note that in the commutative case $[\tilde\omega, \tilde\varphi] = 0$ the above axioms define the l-entropy (17) and the corresponding divergence uniquely up to the choice of the function $l$, similar to the entropic function $g$ in [18,19], if they hold on the whole cone $\mathfrak{K}$ of $\omega$, or at least on the convex subset $\mathfrak{K}_0 = \{\omega \ge 0 : \omega(1) \le 1\}$ containing $0 \in \mathfrak{K}$. Given a relative entropy $S$, the function $l$ can be easily deduced as $l(q) = S(1; q)$, corresponding to the case $\omega = 0 \oplus 1$, $\varphi = 0 \oplus q$, and $\mathcal{V}(\omega; \varphi)$ is uniquely defined by such $l$ for the completely decomposable $\omega = \oplus \lambda_i$ and $\varphi = \oplus \lambda_i q_i$ due to the direct additivity. In future we shall always choose $r = 1$ by normalizing $l$ such that $r_l = l'(1) = 1$.
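In the commutative case the l-divergence reduces to a sum over the diagonal entries of the two densities, which makes it easy to experiment with. The sketch below is an illustration only: it assumes commuting densities given as lists of strictly positive numbers, reads the relative l-entropy as $S_l(\omega; \varphi) = \sum_i \omega_i\, l_r(\varphi_i / \omega_i)$, and uses the duality (18) with the normalization $r_l = 1$; the function names are hypothetical.

```python
import math

def renyi_log(q, r):
    """Renyi r-logarithm l_r(q) = (q**r - 1) / r for 0 < r < 1, and l_0 = ln."""
    if r == 0:
        return math.log(q)
    return (q ** r - 1.0) / r

def relative_entropy(w, phi, r=0.0):
    """Commutative relative l-entropy S_l(w; phi) = sum_i w_i * l_r(phi_i / w_i).

    Assumes all entries of w and phi are strictly positive.
    """
    return sum(wi * renyi_log(pi / wi, r) for wi, pi in zip(w, phi))

def divergence(w, phi, r=0.0):
    """Commutative l-divergence V_l = (phi - w)(1) - S_l, using (18) with r_l = 1."""
    return sum(phi) - sum(w) - relative_entropy(w, phi, r)

w = [0.5, 0.5]
phi = [0.25, 0.75]
for r in (0.0, 0.5, 0.9):
    print(r, divergence(w, phi, r), divergence(w, w, r))
```

With $r = 0$ this reduces to the classical Kullback-Leibler divergence for equally normalized distributions, and with $r = 1/2$ it reproduces the Hellinger form $(\varphi + \omega)(1) - 2\sum_i \sqrt{\omega_i \varphi_i}$ of (13). Positivity follows from the concavity of $l_r$ together with $l_r(1) = 0$ and $l_r'(1) = 1$, which give $l_r(q) \le q - 1$.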
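3.3. The relative l-entropies of types A&B
If $\tilde\varphi$ and $\tilde\omega$ do not commute, $\tilde\varphi / \tilde\omega$ is not uniquely defined. In the logarithmic case $l = \ln$ the naive convention $\ln(\tilde\varphi / \tilde\omega) = \ln \tilde\varphi - \ln \tilde\omega$ gives the Araki-Umegaki relative entropy
$$S^{(A)}_{\ln}(\omega; \varphi) = \epsilon\left[\tilde\omega^{1/2} (\ln \tilde\varphi - \ln \tilde\omega)\, \tilde\omega^{1/2}\right] = \omega\left[\ln \tilde\varphi - \ln \tilde\omega\right]. \qquad (19)$$
Note that for noncommuting $\tilde\varphi$ and $\tilde\omega$ this convention does not lead to the natural generalization of the classical formula $S_{\ln}(\omega; \varphi) = \varphi\left(h(\tilde\omega / \tilde\varphi)\right)$ in terms of the logarithmic entropic function $h(p) = -p \ln p$ for the Radon-Nikodym (RN) density $p$ of $\omega$ with respect to $\varphi$. One can take this convention to produce also the relative l-entropy of the A-type for any contrast function $l$, although it may look even less natural, as in the case of the A-type Rényi entropy
$$S^{(A)}_{l}(\omega; \varphi) = \epsilon\left[\tilde\omega^{1/2} \left(e^{r(\ln \tilde\varphi - \ln \tilde\omega)} - 1\right) \tilde\omega^{1/2}\right] = \omega\left(e^{r(\ln \tilde\varphi - \ln \tilde\omega)} - 1\right),$$
corresponding to $l(q) = q^r - 1$. Moreover, if $\varphi$ is not dominated by $\omega$, this entropy is ill-defined, while $h(\tilde\omega / \tilde\varphi)$ may still be well-defined as a function of the positive bounded operator $\tilde\varphi^{-1/2}\, \tilde\omega\, \tilde\varphi^{-1/2}$ for any $\omega$ dominated by $\varphi$. Therefore, it is more natural to define the B-type relative l-entropy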
(20) 1
1
in terms of the weight ¢ = e 0 7r
h (p) = pl (p -1) is l-entropic function, say
defined as a positive and concave on [0,1] for any contrast function l, with h (0) = 0 = h (1), having a maximum p~l' (Po) > 0 at the unique solution Po of the equation pl' (p) + l (p) = 0 for a smooth l. The B-type of relative entropy was introduced by Belavkin-Staszewski in 1986 for the case l = In. One can rewrite it in the form similar to (19) as
where In(&0- 1 )&
= &In(0- 1 &) is understood as (22)
Ohya and Petz proved that the Belavkin-Staszewski divergence gives better distinction of W relative to tp than Araki-Umegaki divergence in the sense that V9;l 2': VIC: l , and that it satisfies all required axioms above. Note that the Hellinger divergences 1 (w; tp) of A&B-type can be 1/2 written as (13) in terms of the corresponding nonsymmetric A&B-type fidelities (w; tp), and
vi"
fU2
f;Al
= w (er(ln
fJ
define the relative lr-entropies of the corresponding type as
3.4. The entropy increase, its concavity and additivity

The relative entropy $S$, defined by the general divergence $V$ as

$$S(\omega;\varphi) = \varphi(1) - \omega(1) - V(\omega;\varphi),$$

has almost all properties of $V$ except the positivity (1), unless $\varphi \geq \omega$. Indeed, by taking the standard normalization r = 1 in the definition (6) of $S$ we have $S(\omega;\varphi) \leq \varphi(1) - \omega(1) < 0$ if $(\varphi - \omega)(1) < 0$, since $V(\omega;\varphi) \geq 0$. Thus the relative entropy $S$, unlike the divergence $V$, can be used to distinguish only the states with equal normalization $\omega(1) = \varphi(1)$, when $S = -V$.
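For equally normalized states, where $S = -V$, the logarithmic divergences of types A and B from Section 3.3 can be compared numerically. The following is a minimal sketch, assuming 2x2 real symmetric density matrices and the forms $\mathrm{Tr}\,\varpi(\ln\varpi - \ln\hat\varphi)$ for the Araki-Umegaki divergence and $\mathrm{Tr}\,\varpi\ln(\varpi^{1/2}\hat\varphi^{-1}\varpi^{1/2})$ for the Belavkin-Staszewski divergence. The helper names (`sym_fun`, `divergence_a`, `divergence_b`) and the test densities are illustrative assumptions, not taken from the text.

```python
import math

def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_sub(a, b):
    """Entrywise difference of two 2x2 matrices."""
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def sym_fun(a, f):
    """Apply a scalar function f to a real symmetric 2x2 matrix spectrally."""
    p, r = a[0][0], a[1][1]
    q = 0.5 * (a[0][1] + a[1][0])          # symmetrize against rounding
    mean = 0.5 * (p + r)
    gap = math.hypot(0.5 * (p - r), q)
    if gap < 1e-14:                        # matrix is a multiple of the identity
        v = f(mean)
        return [[v, 0.0], [0.0, v]]
    l1, l2 = mean + gap, mean - gap        # the two eigenvalues
    f1, f2 = f(l1), f(l2)
    # f(A) = f1*P1 + f2*P2 with P1 = (A - l2)/(l1 - l2), P2 = (l1 - A)/(l1 - l2)
    c1 = (f1 - f2) / (l1 - l2)
    c0 = (f2 * l1 - f1 * l2) / (l1 - l2)
    return [[c1 * p + c0, c1 * q], [c1 * q, c1 * r + c0]]

def divergence_a(w, phi):
    """A-type (Araki-Umegaki) divergence: Tr w (ln w - ln phi)."""
    diff = mat_sub(sym_fun(w, math.log), sym_fun(phi, math.log))
    return trace(mat_mul(w, diff))

def divergence_b(w, phi):
    """B-type (Belavkin-Staszewski) divergence: Tr w ln(w^{1/2} phi^{-1} w^{1/2})."""
    root = sym_fun(w, math.sqrt)
    inv_phi = sym_fun(phi, lambda x: 1.0 / x)
    middle = mat_mul(mat_mul(root, inv_phi), root)
    return trace(mat_mul(w, sym_fun(middle, math.log)))

w = [[0.7, 0.2], [0.2, 0.3]]      # non-diagonal state density, trace 1 (illustrative)
phi = [[0.8, 0.0], [0.0, 0.2]]    # diagonal reference density, trace 1 (illustrative)
print(divergence_a(w, phi), divergence_b(w, phi))
```

For commuting (for example, both diagonal) densities the two functions return the same value, which is the classical relative entropy; for the non-commuting pair above the B-type value is the larger one, in line with the Ohya-Petz comparison quoted in Section 3.3.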
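The property (2) of the general divergence $V$ is equivalent to the direct affinity

$$S(\oplus\lambda_i\omega_i;\ \oplus\lambda_i\varphi_i) = \sum\lambda_i S(\omega_i;\varphi_i) \qquad \forall\lambda_i \geq 0,\ \sum\lambda_i = 1$$

of the general relative entropy $S$:

$$\sum\lambda_i S(\omega_i;\varphi_i) = \sum\lambda_i(\varphi_i - \omega_i)(1) - \sum\lambda_i V(\omega_i;\varphi_i) = (\oplus\lambda_i\varphi_i - \oplus\lambda_i\omega_i)(1) - V(\oplus\lambda_i\omega_i;\ \oplus\lambda_i\varphi_i).$$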
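The monotonicity property (3) also implies the entropy semi-increase

$$S(\omega\circ\Lambda;\ \varphi\circ\Lambda) \geq S(\omega;\varphi) \qquad \forall\omega,\varphi$$

for every normal UCP map $\Lambda : \mathbb{B} \to \mathcal{M}$, which is equivalent to the semidecrease of the divergence $V$ under the coarse-graining $\Lambda$ due to

$$(\varphi\circ\Lambda - \omega\circ\Lambda)(1) = (\varphi - \omega)\circ\Lambda(1) = (\varphi - \omega)(1).$$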
In particular, by taking $\Lambda$ as the embedding $b \mapsto 1\otimes b$ of $\mathbb{B}$ into $\mathcal{M} = \oplus_{i\in I}\mathbb{B}_i \equiv \mathcal{A}\otimes\mathbb{B}$, corresponding to the identical copies $\mathbb{B}_i = \mathbb{B}$ indexed by a finite set $I$ with the abelian $\mathcal{A} = \mathbb{C}^I$ of the diagonal $I\times I$-matrices, we obtain by the direct affinity of $S$ that the general relative entropy $S$ must be jointly concave (but not convex as $V$):

$$S\Big(\sum\lambda_i\omega_i;\ \sum\lambda_i\varphi_i\Big) \geq S(\oplus\lambda_i\omega_i;\ \oplus\lambda_i\varphi_i) = \sum\lambda_i S(\omega_i;\varphi_i)$$

for any $\lambda_i \geq 0$, $\sum\lambda_i = 1$ and states (weights) $\omega_i, \varphi_i$ on $\mathbb{B}$. The additional properties (4)-(6) have similar formulations in terms of $S$. The logarithmic case is defined by another additional joint additivity property
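$$\varphi = \otimes\varphi_i,\quad \omega = \otimes\omega_i \quad\Rightarrow\quad S(\omega;\varphi) = \sum S(\omega_i;\varphi_i) \tag{23}$$

of the entropy with respect to products of weights $\varphi_i$ and states $\omega_i$ on $\mathcal{M}_i$. Here $\otimes\varphi_i = \otimes_{i=1}^{n}\varphi_i$, and similarly $\omega$ is the product state on $\mathcal{M} = \otimes_{i=1}^{n}\mathcal{M}_i$ of $\omega_i$. One can easily see this for the A- and B-type entropies,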
(24) where e[wln(wi/rPi)] = edwdn(wdc,Oi)] for each i due to the product trace e = ®e i and the normalizations e j [Wj] = 1 of each Wj with respect to the trace ej on the pre dual space of M j . The logarithmic additivity can be similarly formulated in terms of only for the equally normalized Wi and 'Pi'
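The joint additivity (23) is easy to check in the commutative case, where the densities are diagonal and the A- and B-type logarithmic entropies both reduce to the classical expression $S(\omega;\varphi) = \sum_k w_k(\ln p_k - \ln w_k)$. The sketch below is only an illustration under that assumption; the function names and the sample distributions are hypothetical. Note that only the states $\omega_i$ need to be normalized for the identity to hold.

```python
import math
from itertools import product

def rel_entropy(w, phi):
    """Commutative logarithmic relative entropy S(w; phi) = sum_k w_k (ln phi_k - ln w_k)."""
    return sum(wk * (math.log(pk) - math.log(wk))
               for wk, pk in zip(w, phi) if wk > 0)

def tensor(p, q):
    """Diagonal of the tensor product of two diagonal densities."""
    return [a * b for a, b in product(p, q)]

# Two normalized states with their reference weights (illustrative numbers).
w1, phi1 = [0.6, 0.4], [0.8, 0.2]
w2, phi2 = [0.1, 0.3, 0.6], [0.5, 0.25, 0.25]

joint = rel_entropy(tensor(w1, w2), tensor(phi1, phi2))
parts = rel_entropy(w1, phi1) + rel_entropy(w2, phi2)
print(joint, parts)   # the two numbers agree up to rounding
```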
VjS!
3.5. A new type of relative l-entropy C Since the joint convexity of V (w;
w f-7 Vrp (ftr)
Mi,
Vrp (&) = s~p {Vz,rp (m) - (m, &) : m = m* EM}
(25)
of the inverse Legendre-Fenchel transform (ILFT)
Vrp (m)
=
inf {(m, ftr) w
+ Vrp (w)
: ftr = ftr*
E
MT }
.
(26)
The ILFT image Vrp of Vrp is also well defined as a proper but concave function Vrp (m) E [-oo,oo[ of any Hermitian operator m from the dual algebra M. If cp commutes with m, one can explicitly find the optimal covariant density ftr also commuting with cp and evaluate for any type of l-divergence V z the transform Vz,rp (m) of l-divergence V z on such m E M independently of the type (.) for each smooth contrast function l. Thus, for the Ri"myi logarithm l = lr one can find that
V~) (m) =
--+
0 corresponding to l
(27)
= In it is
One can interpret this Vrp (m), which is usually finite on m > 0, as a free energy coinciding in first order with the total energy
=
lr.
The Legendre-Fenchel
transform
VZ,rp (w)
:=
sup {Vz,rp (m) - w (m) : m = m* EM},
(29)
m
of the classically evaluated Vz,rp and analytically extended to noncommuting cp and m, say, by (27) or (28) for I = lr and I = In, defines implicitly the informational divergence of a new, thermodynamical type C.
One can also take a sharper divergence V(C) (w;
= i~,f {w (m) + V(C) (w;
(1)
= 1}
and can be equivalently defined as V(C) = (
= inf {S (m;
(30)
where S(m;
evaluated for all iIi commuting with rp by the constraint transform of the entropy siC) written with the help of Lagrangian multiplier, as SI(C)
(m;
+, } .
(31)
'U7
and analytically extended to any m E M for which it makes sense, otherwise = 00. Thus, the reduced lr-divergence vi~) of type C is defined as the transform (25) of the constraint REmyi energy
VI(C) (m;
vi~~ (m;
(e- m)
(32)
coincides in the first order with the mean energy
3.6. Other new entropy types D&E Unlike in the case of types A and B the optimal state w~C) resolving the variational problem (26) for divergence of the type C can be found explicitly
even for noncommuting Indeed, it is given as &(e) - 1 free -
1
o
rp and iii at least in the logarithmic case l I n . (e) _
e (8-1) 'Pe -8md s, 'W nor A
A
-
e
r-111
e
(8-1)
'Pe A
-8md
s
o by solving the dual problem (25) respectively for free (28) or constraint (32) by normalization logarithmic energy using the ordering index method for noncommuting variation 8m of m. This implicitly defines the entropic function hi:) (& : rp) of C-type relative density for the state 'W with respect to 'P presenting the solution
Sl~e) ('W; 'P) =
e (rp ~ hi:) (& : rp) 'P ~)
=
'P
(hi:) (& : rp)) .
(33)
of the variational problem (30) coinciding on ('3 with unreduced logarithmic relative entropy S
rp ~ hr;:) (& : rp) rp ~ = iii - ,1 in terms of the variational derivative hr;:) (& : rp) = 8hi:) (& : rp) /8& resolving (31) for (33). Similarly we can define another two interesting logarithmic relative entropies of type D and E in the form
Sl~) ('W; 'P) = e (rp ~ hf;! (& : rp) 'P ~) =
'P ( hf;! (& : rp))
.
They are determined respectively by the logarithmic entropic functions hCJ;) (& : rp) and h~) (& : rp) as the solutions of the stationarity differential equation equations
rp~h~(&:rp)rp~ =iii-,1 defined by the variational derivatives h~ (& : rp) = 8hf;! (& : rp) /8& in terms of the RN derivative &SD) = &
= exp [In & -lnrp] (E-type):
rp~hCJ;) (&: rp) rp~
=
h' (&SD)), &SD)
=
rp-~&rp-~,
rp~ h~) (& : rp) rp~ = h' (&SE)) = h' (exp [In & -lnrp]). Here $h'(p) = -\ln p - 1$ is the derivative of the logarithmic entropic function $h(p) = -p\ln p$. Unfortunately, the free (28) or constraint (32) logarithmic energy of types D and E is in general difficult to evaluate explicitly if rp does not
commute with ill, however the optimal states resolving the corresponding minimization problems (25) can be easily found from the stationarity differential equation with I = 1 for the free case and I = r for the normalized case:
,(E) _
W free -
eIn ",,-iii ,
,(El _
W nor -
er-lln",,-iii e .
4. Quantum Mutual Information and Encodings 4.1. Entangled mutual information Here we define symmetric mutual information m a quantum compound state W achieved by an entanglement 7f : lBl ---7 AT' or, equivalently, by 7fT : A ---7 lBlT' as the general information divergence of the entangled state W with respect to the product state cp = 12 Q9 C; on M = A Q9 lBl:
(34)
I.t,k
vU
In particular, (7f) = (Wi 12 Q9 c;) is the usual logarithmic quantum mutual information of a particular type.
Theorem 4.1. Let A : lBl ---7 A~ be an entanglement of the state c;(b) = (1go,A(b)) on lBl to (AO, 12°) with AO <;;; 8(gO) as a CP map such as?/ = A(1~), and 7f = KT 0 A be the entanglement to the state 12 = (!oK on A <;;; 8 (g) defined as the composition of A through a channel KT : A~ ---7 AT as the predual to a normal UCP map K : A ---7 A0 • Then the following monotonicity holds
I A('l() ,IE 7f
~
(.) IE () I Ao , A .
(35)
Proof: This simply follows from the monotonicity of V. It is interesting to compute and compare the different types of entangled mutual informations Vz (Wi 12 Q9 c;) for the particular types of the contrast function l. In particular, to compute the RE'myi and the usual logarithmic informations (Wi 12 Q9 c;) for the special, say quantum Gaussian entangled states wand to compare them with the wellknown Gaussian entangled information of the type A.
vU
4.2. The proper quantum entropies Applying the monotonicity property for the standard entanglement A = 7fJ == a of C; to 12° = ~ decomposing every 7fT by the Theorem 1 as 7fT = aoK
we obtain immediately the explicit solution A = optimization problem sup IA,lIl\(1T) = IlIl\,j(a-)
iffi, 1TT
=
a- := 1T~ to the
== 7-{lIl\(C;).
(36)
7r:/-L07r=c;
Definition 4.1. The maximal quantum information
I lIl\,lIl\~(1T <; ) = 7-{lIl\(c;)
= I~
7-{e·) = Ie·) lIl\,lIl\ (1T~) <;, In In
(37)
over all entanglements 1TT E Kq(c;) of any (A,a) to (lffi,c;), achieved on 2ae/ 2 == a- (a), A 0 = iffi by the standard quantum entanglement 1T T (a) = is named as entangled, or proper quantum entropy of the state c;. The positive difference
e/
(38) is called the conditional proper quantum entropy 7-{lIl\IA = 7-{lIl\ - IA,lIl\ of the entanglement 1T : lffi ---> AT . The semiclassical entropy SlIl\ (c;) is defined the solution of the extremum problem (36) under the additional constraint 1T T E Kc (c;) that A is an Abelian (the diagonal) algebra A = C (I) indexed by a discrete set:I : sup
{IA,lIl\(1T): A
=
C (I)}
== SlIl\(c;).
7r:V07r=C;
It is achieved on a classical state a (a) = L aipi such that /W1T = L PiC;i = C; is given by pure optimal states C;i on lffi. If 1T T E Kc (c;), then, obviously,
(39) otherwise this semiclassical conditional entropy can be negative if 1TT does not satisfy the semiclassical constraint Kc (c;). Naturally, 7-{lIl\ (c;) 2- SlIl\ (c;) and 7-{lIl\IA (1T) 2- SlIl\lA (1T). Note that SI~A) (c;) is the usual von Neumann entropy
which is achieved as the solution of the semiclassical extremum problem (39) at any decomposition C; = LPiC;i into the pure states C;i as
st)(c;) = S(c;)
+
sup voc;1.=l
I:>iV [~i ln~il = S(c;)
Thus, 7-{}:) 2- S( c;), in particular, 7-{}:) algebra lffi
= 2S in the case of full matrix
= B (f)). It is interesting also to compare the entropies
SL) of the other types.
7-{J') r
and
4.3. Quantum noisy and noiseless channels Definition 4.2. A quantum channel is described on the output algebra of Bob, lE, by a linear normal unital completely positive map (UCP) A : lE ----+ lEO into the same or another input algebra lEo <:;; B(~O). The algebra lEo in the input state <;0 should not be identified with the "Alice" algebra A in a state f2 but rather with the transposed algebra lEO = A which can be the same as A but with the opposite products a*a to aa* E A and the transposed state <;0 = 72. Every channel A admits the Kraus decomposition
A(b) = LVkbV~ E lEo Vb E lE, k
where Vk are contractions ~ ----+ ~o, Lk VkV~ = 1~o and is dual A = A:~ to map AT IlE~ == AT on the predual space lE~ into lET decomposed as
ATW) = LVk~oV; V~o E lE~, k
where Vk = J~ V~J~o = Vk. For example, quantum noiseless channel in the case lE = B(~), lEo = B(~O) is described by any single coisometry V : ~ ----+ ~o, VV* = 1~o, as A(b) = VbV*. A noisy quantum channel sends input pure states f20 = <;0 on the algebra lEo = B(~O) into mixed states <; = <;0 A described by the output densities ~ = AT (~O) given by the predual map AT = AT IlE~ to the mixed normal UCP map A : lE ----+ lEo. It can always be written as A(b) = Trf+ [VbV*], where V is a stochastic coisometry, i.e. a linear operator ~ ----+ ~o Q9 f+ with the partial trace Trf+ [VV*] = 1~o on a separable Hilbert space f+ of quantum noise in the channel. Each input state <;0 is transmitted into an output state <; = <;0 A given for any ~o E lE~ by the density operator
4.4. Entanglement as quantum encoding Definition 4.3. A quantum encoding ofthe input state <;0 for the channel A with an alphabet, or Alice probe algebra A is any generalized entangling CP map K : A ----+ lEO normalized as K( 1p) = ~o. The proper quantum encodings correspond to the proper entanglements, while the semi-quantum encodings are described by the commutative (classical) alphabet algebras A = c (A).
The set of all quantum encodings is convex, denoted by Kq, or Kq (<;0) if the normalization <;0 is fixed. It includes the standard entanglements (J0 (a) = RaR, corresponding to AO = llllo, as pure quantum encodings, while the convex subset of all semi-quantum encodings, denoted respectively by K c, or Kc( <;0), excludes all proper entanglements. Every encoding K, E Kq (<;0) is a composition K, = (J0 0 K of the pure quantum encoding (J0 an~ normal UCP map K : A ----> AO into the sufficient alphabet algebra AO = llllo antiisomorphic to llllo. The transposed CP map K, T = KT 0 pO represents quantum encoding as a transmission process by KT : A~ ----> AT inducing the alphabet state e = K,T (1~) from the transposed input state eO = ¢o E A~ maximally entangled by (JOT = pO to the input state <;0 on the antipode llllO of the algebra A° . Each encoding K, E Kq (<;0) induces a compound state wO(a@bO):= Trf_[vo*(a@bO)vO]
== (a@bO,wO).
given on the input-probe algebra A@llllo by the density operator WO = vO*vo of the encoding K,(a) = ,1[(a@ l~o)wO]. 5. Quantum Channel Capacities and their Additivity
5.1. The quantum and semiclassical capacities The channel A transmits the probe-input encoding K, into the probe-output encoding AT K, entangling the output state via this channel to = 1T( 1~) by K, TA = 1T = KT A. The mutual entangled information, transmitted via the channel for quantum encoding K" is therefore
e
(41)
where A = pO A is the standard maximal input entanglement #b # transmitted via the channel A. Definition 5.1. Given a channel A : llll ----> llllO and a subset K channel information capacity is defined by
<;;;;
pO
(b) =
K q , the
(42) The channelled semiclassical and the proper quantum information capacities are defined respectively as
(43)
(44) Lemma 5.1. Let A(b) = VbV* be a unital completely positive map A: lBl lBlo (Noiseless channel.) Then
--->
Jq(C;0, A) = Hrrdc;O), Jc(C;°, A) = SlIdc;O) andC(A) = lndim~o, Q(A) = lndimlBlO for the logarithmic capacities of any type I12,lffi = Il~) ,
5.2. The true quantum capacity Theorem 5.1. The channelled entangled information achieves the value (45)
where A = 'ir 0 A is given by the optimal input entanglement 'ir 0 pO of the channel input state C;O to the transposed state (/ = ~ induced on the minimal sufficient alphabet A~ = lBl~ , Proof. Use the monotonicity of IA,lffi(KTA) with respect to KT and the commutativity of the following diagrams 0
Note that we have explicitly three types of such channeled information ~:j (C;O , A) for each contrast function l and obviously, ~:~) (C;O , A) :S
~:~)(c;O,A), QjA) (A):S QjB) (A), where
Qj') (A)
=
sup
{Ii',~o (AT
0
'ir,o) :
C;O E (3 (lBlO)} ,
It is interesting to compute and compare the Renyi and the usual logarithmic informations ~:j(c;O,A), ~:;(c;O,A) and the capacities Qj')(A),
cF) (A)
for the particular types for some special, say the quantum Gaussian channel A and input states c;o,
5.3. Block encoding for quantum product channels Here we consider the products Q~n) = ®i=l c;i and transposed states c;~n) = (j~n) respectively on the tensor products A~n) = ®i=l lBli and lBl~n) = A~n) and for the notational simplicity we implement the convention A~ = lBli, n) = ,® is the product f)i = ,""2 0 and A ® = A (n) f)® = t:O f)(n) such that ,-C t::o 0 0 , t::'o -:'0 ":to state on the transposed input algebra lBl~n) = lBl~ defining the standard
entanglement (J@ = "<:Yt=l /O,n (Ji0 of the input state ~o n(n) o the probe product state ~~ = (j~ on lE~ = A~. Let A@ = 0;'=1 Ai denote the product channel.
=
n@
t:o
on A 0(n)
= A@ to 0
Definition 5.2. A quantum block-encoding is a normal CP map f£(n) : A(n) ---+ A~T on an alphabet algebra A(n) such that f£(n) (l(n)) = 0~1~~ for the given states ~i E lEiT. Obviously, the compound density w~n) E A~n) 0 A~T for the product block encoding f£@ = 0;'=f£i is the product w~ = 0;'=1 Woi of the densities for the input compound states Woi = wi, each entangled by tri = f£I on Ai 0lEi. However, in general it may not be true, despite of f£(n) (1 (n)) = ?!~, since the input entangled state win) on A(n) o Ain ) may not be the product 0;'=1 wi even if A (n) is the product alphabet algebra A@ = 0~1 Ai as is A~n) = A~ but f£(n) (a@) i- 0;'=1 f£i (ai) for some a@ = 0;'=1 ai E A@. This is due to one-to-one correspondence w~n) f--t f£(n) :
5.4. The additivity problem for quantum capacities In the normalized case 'P (1) = 1 = w (1) the informational divergence = -Sl~) as negaentropy of any type has the multiplicative additivity property
vf;!
with respect to every finite product reference 'P@ = 0;'=1 'Pi. It was shown for (A) and (B) types entropy in (24), and for the type (C) it comes from em = 0emi and 'P = 0'Pi in the transform (29) with m = 2: 1@(i-1) 0 mi 0 l@(n-i).
The logarithmic informations I'JJ.,p, (7r(n)) are additive iff 7r(n) : 93 ---+ mT is product entanglement 7r@ : lE@ ---+ A~, and the logarithmic capacities :ftc
((}~n), A@ ), including
Hk) ((}~n)) =
sup ",(n)
Il~ ,'JJ.
(f£(n))
= :f~.) ((}~n), Id@)
Etc (e~n»)
are additive iff they are achieved on product encodings f£(n) = f£@. The latter holds if the constraint K on f£ is appropriate, or, as we will prove below, if there is no additional constraint on the set Kq ( (}~n)). On the other
hand, the semiclassical constraint K = Kc ((}~n)) in the case of noncommutative Ao seems inappropriate, although it is known that this constraint is appropriate for the logarithmic capacities of type A in the category of the commutative Ao, in which it is a trivial constraint. In particular, the semiclassical Holevo constraint K = Kc is not appropriate for the true quantum channels since, as is now known, the logarithmic capacities J}A) ({}~, A0 ) and C(A) (A0 ), known as the Holevo bounds 15, are not additive for the general quantum channels A.
5.5.
Optimal true quantum block encoding
The following Lemma is valid for any type of quantum entangled information, and in particular, for any Renyi information.
Lemma 5.2. Let A~ : A~T
---+
E~, (}~n) =
0i=1 {}~ == {}~ .
Then
Proof. Take AiT : EiT ---+ EiT and ~~ E EiT with Ei = A~ and c:;i = {}~ evaluating the densities ~i of the output states C:;i = {}~Ai as AiT ((}~). Due to Theorem 1 we can decompose the quantum block encodings ",en) : A en) ---+ A0 as ",(n) = p00 K(n)' where K(n) .• A(n) ---+ A,.0 is a normal UCP map and OT 0
p~ (b~)
=
0i=1/?lb~/?l is
the standard entanglement on
A,.~
:3
b~
to
(}~n) = 0~ 1{}~. Note that although the range of K(n) is product algebra A,.0 A0 with r0 = t:'o' n o , the domain A(n) is any algebra . For H(n) = K(n)o-0 T 0 "'!I
we maximize the information quantum encodings Kq Jq
({}~n)):
((}~n), A0 ) = sup
Ja® (H(n))
IA( n)
{I21
over the maximal convex set of
' ,'l:l
(K~n)o-~ A0 ) : K~n) 0 o-~ E Kq}
.
Since ~~ = l!~n) is fixed , the supremum should be taken only over all channels K~n) with fixed domain A,.~T but arbitrary range A~n). Thus, the required result (46) follows by monotonicity argument (Theorem 2) for any type of quantum channelled l-information J q ({}~, AO). D
5.6.
The additivity of true quantum capacities
Theorem 5.2. The logarithmic q-informations Jq(.) are additive, Jq(·)({}~, A0 )
= LJq(-)({}~,Ai)
and so is the logarithmic proper quantum capacity Q(')
(A 0 ) = sup J q(') ((/t!, A0) = eo
L
Q( ' )
(Ai) .
Proof. Follows immediately from Lemma 3 by additivity argument of the logarithmic function l = In . 0
Note that this proof does not work for any type of the logarithmic semiquantum information J e(') (Q~, A0) and the capacity C(·) (A 0) which have obviously the semiadditivity property
The lower bound achieved on the product encodings ",(n) = "'0 might not maximize I~:'13 (K~n) a~ A under the constraint that A (n) must be com-
0)
mutative for the semiclassical encodings satisfying
",(n)
(1 (n»)
= Q~.
But the above semiclassical capacities Je(A) and C(A) are appropriate for semiclassical channels A corresponding to the commutative input algebra Ao = lEo. The logarithmic capacities for such channels of the type A were first introduced by Levitin under the name of the defect of entropy Je(A)
(Qo, A) = SE (QoA) -
SEIA
(",~A),
C(A)
(A) = sup Je(A) (Qo, A), eo
where S EIA (",~A) and SE (QoA) are conditional and unconditional vonNeumann semiclassical entropies corresponding to the standard classical input encoding
In general, these capacities are smaller than the logarithmic semiclassical capacities
J}B) (Qo,A) =
e [goAT In (A~l~)J
C(B)
(A) =
SUPJ}B)
(Qo,A)
eo
of the type B. Here AT is the contravariant v-density v (AT b) = A (b) of the semiclassical channel AT : go f-+ J-l (goAT) mapping the classical J-l-desities go into the quantum v-densities ~ = AT (g), and = J-l Q9 v is the semiclassical trace on lB~ ®lET.
e
6. Conclusion
Quantum channel capacities can have several different formulations, depending on whether one considers sending classical or quantum information, one-way or two-way communication, prior or via entanglement, etc. However, they can all be considered as the capacities achieved under different constraints on the encoding class of true quantum entangling codes $K_q$. In any case, the operational quantum channel capacity and its asymptotic equivalence with the additive quantum capacities under the appropriate constraints is still a big, open and challenging research problem in quantum information theory. Another natural problem in this direction is to compare quantitatively the proper quantum entropies and additive capacities of different types for some interesting quantum channels, such as Gaussian channels, with other, smaller capacities of different types under well-known constraints, such as the semiclassical constraint, the entanglement-assisted capacity, etc., and to find for which class of channels they asymptotically coincide. More generally, how to access those capacities using physically implementable operations for encodings and decodings, such as the quantum channel capacity for one-way communication via entanglement, is of course an important open problem in quantum information and quantum computation. All those problems await forthcoming papers.

7. Appendix: The standard pairing
Let $\mathbb{L}$ and $M$ be complex linear spaces put in duality via a bilinear form $\langle\cdot,\cdot\rangle : M\times\mathbb{L}\to\mathbb{C}$. Let $M$ be closed with respect to an associative but generally non-commutative binary operation $M^2 \ni (x,z)\mapsto xz\in M$ (e.g. pointwise function or matrix multiplication) and an involution $*$ as a self-inverse antilinear map reversing the multiplication order, $(x^*z)^* = z^*x$, so that $M$ is a $*$-algebra with the positive cone of $x^*x$ generating $M$. Note that, if $M$ has the imaginary unit $i = -i^*$, then this is the case, as any $x\in M$ is a linear combination of the four positive elements $(x + i^n)^*(x + i^n)$ for $n = 0,1,2,3$. However, we do not require that $M$ is unital but will assume that $\mathbb{L}$ has the identity $1^* = 1$ defining a reference weight $\eta(x) = \langle x, 1\rangle$ as a strictly positive linear functional on $M$, i.e. $\eta(x^*x) > 0$ for all nonzero $x\in M$. Let the pairing be regular, in the sense that for any left multiplication $x\mapsto zx$ on $M$ there exists a transposed operator $y\mapsto z'y$ on $\mathbb{L}$ defined by $\langle zx, y\rangle = \langle x, z'y\rangle$, and that $\mathbb{L}$ is closed under the transposed involution, denoted also by $*$: $\langle x, y^*\rangle = \langle x^*, y\rangle^*$. (Every separating pairing on finite-dimensional $M$ and $\mathbb{L}$ is obviously regular.) Then the following theorem holds:
Theorem 7.1. The dual space lL of M with respect to the regular pairing is left module with respect to the transposed left action of M on lL such that (xz)' = z' x', and it is also the right module with respect to the right transposed action of M on lL given by y 1-+ yz*'*, where (x, yz *'*) := (* x ,z *, y *) * = (* z x * ,y *) * =- (xz, y ) .
Moreover, if Y = y* is positive element of lL in the sense that (x*x, y) 2: 0 for all x E M, then z*'yz'* is also positive for every z E M. The positive maps y 1-+ z*'yz'* in fact are completely positive (CP) in the sense that for every semipositive matrix Y = [Yi,k] in the sense
(x*x, Y)
:=
L
(X;Xk' yi ,k) 2: 0 VXj E M,j = 1,2, ... ,
i,k:;::l the matrix z*'Yz'* = [z*' I: yi,k z '*] remains semipositive. The pairing is called central, or tracial, if the left and right transpositions act identically on 1: z*'1 == z == 1z'* for all z E M such that f)
(zz*) = (z, 1z'*) = (z, z*'1) = f) (z*z) z E M .
The antilinear invertibe map z 1-+ Z, intertwining the invo~utions in M and lL, represents the *-algebra M in lL as the opposite algebra M with respect to the induced product z*z = z*z such that z = z*_becomes the transposition z*z = z*z of M onto the weakly dense domain M in lL. Moreover, the dual space lL becomes two-sided *-module over Mby identifying with left and right transpositions as z'* = z = z*' which adds the identity 1 E lL to ~. Note that if lL is taken as minimal predual MT completing the *-algebra M with respect to the dual to C*-algebra norm on M, then lL may not have the identity as the density of a not finite trace f), but then the identity can be added to M to represent the norm-trace (y) = (1, y) on lL = MT uniquely extending the transposed trace (x) = (1, x) = f) (x) from M. The basic example of such central pairing, called tilde-tracial pairing, is given by the standard trace
z
e
e
(x, y) := Tr (xy)
==
T
(yx) ,
or by any other trace e on lL as the linear positive functional e(y) = (1, y) such that e (xy) = e(yx). Note that the standard symmetric tracial pairing (x . z) = T (x . z) identifies the dual space of M with the complex conjugation M* = j[ of the predual space lL = M T. In this preadjoint space M* the left multiplication
z
z' is identified not with the left multiplication by but with the right multiplication by z. The preadjoint space is obviously two-sided M-module containing the algebra M itself as~a dense subspace, which, of course, coincides with M if M is symmetric, M = M.
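The identification of the transposed left multiplication with a right multiplication can be checked directly for the standard trace pairing $\langle x, y\rangle = \mathrm{Tr}(xy)$ on matrices: by cyclicity of the trace, $\langle zx, y\rangle = \mathrm{Tr}(zxy) = \mathrm{Tr}(x\,yz) = \langle x, yz\rangle$. The following sketch verifies this, together with the tracial symmetry of the pairing, on arbitrary 2x2 complex matrices; the function names and sample matrices are illustrative assumptions.

```python
def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def pairing(x, y):
    """Standard tracial pairing <x, y> = Tr(x y)."""
    prod = mat_mul(x, y)
    return sum(prod[i][i] for i in range(len(x)))

# Arbitrary complex test matrices (illustrative).
x = [[1 + 2j, 0.5], [-1j, 3.0]]
y = [[0.2, 1j], [2.0, -1 + 1j]]
z = [[0.0, 1.0], [2 - 1j, 0.5j]]

lhs = pairing(mat_mul(z, x), y)   # <z x, y>: left multiplication by z on M
rhs = pairing(x, mat_mul(y, z))   # <x, y z>: right multiplication by z on L
print(lhs, rhs)                   # equal up to rounding
```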
Acknowledgment

This work was supported by a QBIC grant of the Japanese Ministry of High Education and by a British Council PMI-2 grant for UK-Korea collaboration. The author would like to thank Professor Ohya for warm hospitality at the Science University of Tokyo, and Professors So Young Kim from the Korean Institute for Advanced Studies and Un Cig Ji from Chungbuk National University for their encouragement to prepare these lecture notes for publication.
References

1. V. P. Belavkin. On entangled information and quantum capacity. Open Sys. and Information Dyn., 8:1-18, 2001.
2. V. P. Belavkin. On entangled quantum capacity. In Tombesi and Hirota, editors, Quantum Communication, Computing and Measurement, pages 325-333. Kluwer/Plenum, 2001.
3. V. P. Belavkin. Quantum sex and mutual information. In New Development of Infinite-Dimensional Analysis and Quantum Probability, pages 61-82, Kyoto, 2001. Research Institute of Mathematical Sciences.
4. V. P. Belavkin. Contravariant densities, complete distances and relative fidelities for quantum channels. Reports on Mathematical Physics, 55:61-77, 2005.
5. V. P. Belavkin and X. Dai. An operational algebraic approach to quantum channel capacity. International Journal of Quantum Information, 6:357-374, 2008.
6. V. P. Belavkin and S. Hammersley. Information divergence for quantum channels. In Quantum Probability and Infinite Dimensional Analysis, volume 29, pages 149-167. World Scientific, 2006.
7. V. P. Belavkin and M. Ohya. Quantum entropy and information in discrete entangled states. Infinite-Dimensional Analysis, Quantum Probability and Related Topics, 4(2):137-160, 2001.
8. V. P. Belavkin and M. Ohya. Entanglement, quantum entropy and mutual information. Proc. R. Soc. Lond. A, 458:209-231, 2002.
9. V. P. Belavkin and R. Stratonovich. On optimisation of processing of quantum signals by information criterion. Radio Eng Electron Physics, 18(9):1839-1844, 1973.
10. V. P. Belavkin and A. Vantsian. On sufficient conditions of optimality of quantum signal processing. Radio Eng Electron Physics, 19(7):1391-1395, 1974.
11. C. H. Bennett, P. W. Shor, J. A. Smolin and A. V. Thapliyal. Entanglement assisted classical capacity of noisy quantum channels. Phys. Rev. Lett., 83:3081-3084, 1999.
12. N. Cerf and C. Adami. Von Neumann capacity of noisy quantum channels. Phys. Rev. A, 56:3470-3483, 1997.
13. M. B. Hastings. A counterexample to additivity of minimum output entropy. Nature Physics, 5:255, 2009.
14. P. Hayden and A. Winter. Counterexample to the maximal p-norm multiplicativity conjecture for all p > . arXiv.org: 0807.4753, 2008.
15. A. S. Holevo. Bounds for the quantity of information transmitted by a quantum communication channel. Problems of Information Transmission, 9:177-183, 1973.
16. C. King and M. B. Ruskai. Minimal entropy of states emerging from noisy quantum channels. IEEE Trans. Inf. Thy., 47:192-209, 2001.
17. D. S. Lebedev and L. B. Levitin. Information transmission by electromagnetic field. Information and Control, pages 1-22, 1966.
18. A. Lesniewski and M. Ruskai. Monotone Riemannian metrics and relative entropy of non-commutative probability spaces. math-ph/9808016, 1998.
19. D. Petz. Quasi-entropies for finite quantum systems. Reports on Mathematical Physics, 23:57-65, 1984.
20. B. Schumacher. Sending entanglement through noisy quantum channels. Phys. Rev. A, 54:2614-2628, 1996.
21. C. E. Shannon. A mathematical theory of communication. Bell Syst. Tech. Jour., 27:379-423, 1948.
22. P. W. Shor. Equivalence of additivity question in quantum information theory. Comm. Math. Phys., 246:453-472, 2004.
23. R. L. Stratonovich. On mutual information and the capacity of quantum channels. Izvestia Vuzov: Radiophysics, 4:15-24, 1965.
24. R. L. Stratonovich. Mutual information of quantum gaussian variables. Izvestia Vuzov: Radiophysics, 8:116; 129, 1965.
25. R. L. Stratonovich. On transmission of information via quantum channels. Problems of Information Transmission, 45:150-160, 1966.
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M . Ohya © 2011 World Scientific Publishing Co. (pp. 91- 99)
NON-MARKOVIAN DYNAMICS OF QUANTUM SYSTEMS

DARIUSZ CHRUŚCIŃSKI and ANDRZEJ KOSSAKOWSKI

Institute of Physics, Nicolaus Copernicus University, Grudziądzka 5/7, 87-100 Toruń, Poland
We analyze a local approach to the non-Markovian evolution of open quantum systems. It turns out that any dynamical map representing the evolution of such a system may be described either by a non-local master equation with a memory kernel or, equivalently, by an equation which is local in time. The price one pays for the local approach is that the corresponding generator might be highly singular and that it keeps the memory of the starting point $t_0$. Remarkably, singularities of the generator may lead to interesting physical phenomena like revival of coherence or sudden death and revival of entanglement.
Keywords: Open systems; quantum dynamics; non-Markovian evolution.
1. Introduction
The non-Markovian dynamics of open quantum systems attracts nowadays increasing attention.¹ It is very much connected to the growing interest in controlling quantum systems and to applications in modern quantum technologies such as quantum communication, cryptography and computation.² It turns out that the popular Markovian approximation, which does not take into account memory effects, is not sufficient for modern applications, and today's technology calls for a truly non-Markovian approach. Non-Markovian dynamics was recently studied in Refs. 3-18. The usual approach to the dynamics of an open quantum system consists in applying the Markovian approximation, which leads to the following local master equation

$$\frac{d}{dt}\rho(t) = \mathcal{L}_M\,\rho(t), \qquad \rho(t_0) = \rho_0, \tag{1}$$
where $\rho(t)$ is the density matrix of the system investigated and $\mathcal{L}_M$ the time-independent generator of the dynamical semigroup possessing the following well-known representation¹⁹,²⁰,²¹

$$\mathcal{L}_M\,\rho = -i[H,\rho] + \sum_\alpha\left(V_\alpha\rho V_\alpha^\dagger - \frac{1}{2}\{V_\alpha^\dagger V_\alpha,\rho\}\right). \tag{2}$$

The above structure of $\mathcal{L}_M$ guarantees that the dynamical map $\Lambda(t,t_0)$, defined by $\rho(t) = \Lambda(t,t_0)\rho_0$, is completely positive and trace preserving for $t \geq t_0$. Note that $\Lambda(t,t_0)$ itself satisfies the Markovian master equation

$$\frac{d}{dt}\Lambda(t,t_0) = \mathcal{L}_M\,\Lambda(t,t_0), \qquad \Lambda(t_0,t_0) = \mathbb{1}, \tag{3}$$
and the solution for A(t, to) is given by
A(t, to) =
e(t - tO)LM ,
(4)
which implies that $\Lambda(t,t_0)$ depends only upon the difference $t - t_0$, and hence $\Lambda(t) := \Lambda(t,0)$ defines a 1-parameter semigroup ($t \ge 0$) satisfying the homogeneous composition law

$$\Lambda(t_1)\,\Lambda(t_2) = \Lambda(t_1 + t_2) \tag{5}$$

for $t_1, t_2 \ge 0$. In general the external conditions which influence the dynamics of an open system may vary in time. The natural generalization of the Markovian master equation (1) involves a time-dependent generator $\mathcal{L}_M(t)$ which has exactly the same representation as in (2), with a time-dependent Hamiltonian $H(t)$ and time-dependent Lindblad operators $V_\alpha(t)$. Therefore one gets the following master equation for the dynamical map $\Lambda(t,t_0)$

$$\frac{d}{dt}\Lambda(t,t_0) = \mathcal{L}_M(t)\,\Lambda(t,t_0), \qquad \Lambda(t_0,t_0) = \mathbb{1}, \tag{6}$$

which leads to the following solution

$$\Lambda(t,t_0) = \mathrm{T}\exp\Big( \int_{t_0}^{t} \mathcal{L}_M(\tau)\,d\tau \Big), \tag{7}$$

where T stands for the chronological operator. Clearly, $\Lambda(t,t_0)$ no longer depends upon $t - t_0$, but it still satisfies the inhomogeneous composition law

$$\Lambda(t,s)\cdot\Lambda(s,t_0) = \Lambda(t,t_0) \tag{8}$$

for $t \ge s \ge t_0$. We stress that (6), although time-dependent, is perfectly Markovian. Note that the solution (7) has only a formal meaning, since the evaluation of the T-product is in general not feasible. In this paper we analyze a non-Markovian dynamics $\Lambda(t,t_0)$ which does not satisfy the composition law - this is the essence of non-Markovianity.
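The composition law (5) can be checked numerically on a small example. The following is a minimal sketch, not part of the original analysis: the qubit Hamiltonian, the single Lindblad operator and the rate 0.4 are arbitrary illustrative choices, the row-major vectorization and the helper names (`expm`, `lindblad_superoperator`, `semigroup`, `rescaled`) are assumptions of this sketch, and the map `rescaled`, whose exponent grows like $t^2$, is included only to show one way in which the law can fail.

```python
import numpy as np


def expm(a, terms=30):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    a = np.asarray(a, dtype=complex)
    norm = np.linalg.norm(a, 1)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = a / 2**squarings
    result = np.eye(a.shape[0], dtype=complex)
    term = np.eye(a.shape[0], dtype=complex)
    for k in range(1, terms):
        term = term @ scaled / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result


def lindblad_superoperator(h, jump_ops):
    """Matrix of the generator (2) acting on row-major vectorized density matrices.

    Uses vec(A rho B) = (A kron B^T) vec(rho) for row-major flattening.
    """
    d = h.shape[0]
    eye = np.eye(d)
    gen = -1j * (np.kron(h, eye) - np.kron(eye, h.T))
    for v in jump_ops:
        vdv = v.conj().T @ v
        gen = gen + np.kron(v, v.conj())
        gen = gen - 0.5 * (np.kron(vdv, eye) + np.kron(eye, vdv.T))
    return gen


# Illustrative qubit model (hypothetical parameters): H = sigma_z / 2, one decay channel.
sigma_z = np.diag([1.0, -1.0])
lowering = np.array([[0.0, 1.0], [0.0, 0.0]])
GEN = lindblad_superoperator(0.5 * sigma_z, [np.sqrt(0.4) * lowering])


def semigroup(t):
    """Markovian map exp(t L_M), cf. (4) with t_0 = 0."""
    return expm(t * GEN)


def rescaled(t):
    """exp(A(t) L_M) with the illustrative choice A(t) = t**2, which is not additive in t."""
    return expm(t**2 * GEN)


print(np.allclose(semigroup(0.3) @ semigroup(0.5), semigroup(0.8)))
print(np.allclose(rescaled(0.3) @ rescaled(0.5), rescaled(0.8)))
```

Under these assumptions the first check succeeds and the second fails, because $0.3^2 + 0.5^2 \neq 0.8^2$ while the generator has non-zero eigenvalues.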
In the next section we present a general strategy for the description of non-Markovian evolution based on the local in time generator, and in section 3 we present simple examples to illustrate the general approach. Final remarks are collected in section 4.

2. Local vs. non-local approach

The standard approach to the dynamics of an open system uses the Nakajima-Zwanzig projection operator technique$^{22}$ (see also Refs. 1, 21), which shows that under fairly general conditions the master equation for the reduced density matrix $\rho(t)$ takes the form of the following non-local equation

$$\frac{d}{dt}\rho(t) = \int_{t_0}^{t} \mathcal{K}(t-u)\,\rho(u)\,du, \qquad \rho(t_0) = \rho_0, \tag{9}$$

in which quantum memory effects are taken into account through the introduction of the memory kernel $\mathcal{K}(t)$: this simply means that the rate of change of the state $\rho(t)$ at time $t$ depends on its history (starting at $t = t_0$). Equivalently, one has the following non-local equation for the dynamical map

$$\frac{d}{dt}\Lambda(t,t_0) = \int_{t_0}^{t} \mathcal{K}(t-u)\,\Lambda(u,t_0)\,du, \qquad \Lambda(t_0,t_0) = \mathbb{1}. \tag{10}$$
Let us observe that homogeneity of $\mathcal{K}$ (it depends upon the difference $t - u$) implies that the corresponding solution $\Lambda(t,t_0)$ is homogeneous as well. This property enables one to rewrite (10) as follows

$$\frac{d}{dt}\Lambda(t) = \int_{0}^{t} \mathcal{K}(t-u)\,\Lambda(u)\,du, \qquad \Lambda(0) = \mathbb{1}, \tag{11}$$

where $\Lambda(t) := \Lambda(t + t_0, t_0)$. This form is usually studied in the literature.$^{1}$ Note, however, that there is no need to provide the initial condition at $t_0 = 0$. Equation (10) allows one to take a completely arbitrary $t_0$. Unfortunately, we do not know the conditions on the memory kernel $\mathcal{K}(t)$ which guarantee that the corresponding dynamical map $\Lambda(t,t_0)$ is a legitimate quantum evolution, i.e. that it is completely positive and trace preserving. Therefore, instead of the non-local approach we propose to analyze a much simpler approach which is based on the local in time master equation (this approach is usually called time-convolutionless (TCL)$^{1,23}$). Note that each solution $\Lambda(t)$ of (11) satisfies the following local in time equation$^{18}$

$$\frac{d}{dt}\Lambda(t,t_0) = \mathcal{L}(t-t_0)\,\Lambda(t,t_0), \qquad \Lambda(t_0,t_0) = \mathbb{1}, \tag{12}$$
where the time-dependent generator $\mathcal{L}(t)$ is defined by the following logarithmic derivative of the dynamical map

$$\mathcal{L}(t) := \dot{\Lambda}(t)\,\Lambda^{-1}(t), \tag{13}$$

and $\Lambda(t) := \Lambda(t + t_0, t_0)$. Let us observe that Eq. (12) is local in time, but its generator does remember the starting point $t_0$. This is the most important difference from the time-dependent Markovian equation (6). The appearance of $t_0$ in the generator $\mathcal{L}(t - t_0)$ implies that $\mathcal{L}$ is effectively non-local in time, that is, it contains a memory. Therefore, the local equation (12) is non-Markovian, contrary to the local equation (6), which does not keep any memory of $t_0$. Note that the solution to (12) is given by

$$\Lambda(t,t_0) = \mathrm{T}\exp\Big( \int_{0}^{t-t_0} \mathcal{L}(\tau)\,d\tau \Big). \tag{14}$$
It shows that $\Lambda(t,t_0)$ is indeed homogeneous in time (it depends on $t - t_0$). However, contrary to (7), it does not satisfy the composition law. Again, this is a clear sign of the memory effect. Now comes the natural question: how to construct a non-Markovian generator $\mathcal{L}(t)$ which guarantees legitimate quantum dynamics? The general answer is not known, but one may easily propose special constructions. Let $\mathcal{L}_M$ be a Markovian generator defined by (2) and define $\mathcal{L}(t) = a(t)\mathcal{L}_M$. It is clear that if $\int_0^t a(u)\,du \ge 0$ for $t \ge 0$, then $\Lambda(t) = \exp\big(\int_0^t a(u)\,du\;\mathcal{L}_M\big)$ defines a completely positive non-Markovian dynamics. This construction may be generalized as follows: consider $N$ mutually commuting Markovian generators $\mathcal{L}_M^{(1)}, \dots, \mathcal{L}_M^{(N)}$ and $N$ real functions $a_k(t)$ satisfying $\int_0^t a_k(u)\,du \ge 0$. Then

$$\mathcal{L}(t) = a_1(t)\,\mathcal{L}_M^{(1)} + \dots + a_N(t)\,\mathcal{L}_M^{(N)} \tag{15}$$

serves as a generator of non-Markovian evolution. Let $\mathcal{L}(t)$ ($t \ge 0$) be a commuting family of superoperators, i.e. $[\mathcal{L}(t), \mathcal{L}(s)] = 0$. If the integral $\int_0^t \mathcal{L}(u)\,du$ defines a legitimate Lindblad generator for each $t \ge 0$, then $\mathcal{L}(t)$ generates a legitimate non-Markovian evolution. If the family $\mathcal{L}(t)$ ($t \ge 0$) is non-commuting, then the problem of necessary and sufficient conditions for $\mathcal{L}(t)$ is still open.

3. Examples
To illustrate our approach let us consider the following simple examples. Throughout this section we put $\Lambda(t) := \Lambda(t,0)$, i.e. we pose the initial value problem at $t_0 = 0$.
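Example 3.1. Consider the pure decoherence model defined by the following Hamiltonian

$$H = H_R + H_S + H_{SR}, \tag{16}$$

where $H_R$ is the reservoir Hamiltonian, $H_S = \sum_n \epsilon_n P_n$ ($P_n = |n\rangle\langle n|$) the system Hamiltonian and

$$H_{SR} = \sum_n P_n \otimes B_n \tag{17}$$

the interaction part, $B_n = B_n^\dagger$ being reservoir operators. Hence, the total Hamiltonian has the following structure

$$H = \sum_n P_n \otimes Z_n, \tag{18}$$

where

$$Z_n = \epsilon_n \mathbb{1}_R + H_R + B_n \tag{19}$$

are reservoir operators. The initial product state $\rho \otimes \omega_R$ evolves according to the unitary evolution $e^{-iHt}(\rho \otimes \omega_R)e^{iHt}$, and by partial tracing with respect to the reservoir degrees of freedom one finds for the evolved system density matrix

$$\rho(t) = \mathrm{Tr}_R\big[ e^{-iHt}(\rho \otimes \omega_R)e^{iHt} \big] = \sum_{n,m} c_{mn}(t)\,P_m \rho P_n,$$

where $c_{mn}(t) = \mathrm{Tr}\big(e^{-iZ_m t}\,\omega_R\,e^{iZ_n t}\big)$. Note that the matrix $c_{mn}(t)$ is positive semidefinite and hence

$$\Lambda(t)\,\rho = \sum_{n,m} c_{mn}(t)\,P_m \rho P_n \tag{20}$$

defines the Kraus representation of the completely positive map $\Lambda(t)$. The solution of the pure decoherence model can therefore be found without explicitly writing down the underlying master equation. Our method, however, enables one to find the corresponding generator $\mathcal{L}(t)$. It is given by the following formula

$$\mathcal{L}(t)\,\rho = \sum_{n,m} a_{mn}(t)\,P_m \rho P_n, \tag{21}$$

where the functions $a_{mn}(t)$ are defined by $a_{mn}(t) = \dot{c}_{mn}(t)/c_{mn}(t)$. We stress that this example belongs to the commutative class, that is, $[\mathcal{L}(t), \mathcal{L}(s)] = 0$.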
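The pure decoherence model lends itself to a direct numerical check. The sketch below assumes a total Hamiltonian of the block form $H = \sum_n P_n \otimes Z_n$ with Hermitian reservoir operators $Z_n$, and coefficients $c_{mn}(t) = \mathrm{Tr}(e^{-iZ_m t}\omega_R e^{iZ_n t})$. It compares the reduced state obtained from the full unitary evolution with the entrywise product $c_{mn}(t)\rho_{mn}$. The dimensions, energies and random operators are hypothetical stand-ins chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_hermitian(dim):
    """Random Hermitian matrix, used as a stand-in reservoir operator."""
    m = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    return (m + m.conj().T) / 2


def random_state(dim):
    """Random full-rank density matrix."""
    m = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    state = m @ m.conj().T
    return state / np.trace(state).real


def propagator(hermitian, t):
    """exp(-i * hermitian * t), computed from the spectral decomposition."""
    w, u = np.linalg.eigh(hermitian)
    return (u * np.exp(-1j * w * t)) @ u.conj().T


d_sys, d_res = 3, 4                     # illustrative dimensions
energies = [0.0, 0.7, 1.5]              # epsilon_n (arbitrary values)
h_res = random_hermitian(d_res)         # H_R
# Z_n = epsilon_n * 1 + H_R + B_n with random Hermitian B_n
z_ops = [e * np.eye(d_res) + h_res + random_hermitian(d_res) for e in energies]
omega = random_state(d_res)             # reservoir state omega_R
rho = random_state(d_sys)               # initial system state


def c_matrix(t):
    """c_mn(t) = Tr(exp(-i Z_m t) omega_R exp(+i Z_n t))."""
    us = [propagator(z, t) for z in z_ops]
    return np.array([[np.trace(us[a] @ omega @ us[b].conj().T)
                      for b in range(d_sys)] for a in range(d_sys)])


def reduced_state_direct(t):
    """Evolve rho (x) omega_R with the full Hamiltonian, then trace out the reservoir."""
    h_total = np.zeros((d_sys * d_res, d_sys * d_res), dtype=complex)
    for n, z in enumerate(z_ops):
        p_n = np.zeros((d_sys, d_sys))
        p_n[n, n] = 1.0
        h_total += np.kron(p_n, z)
    u = propagator(h_total, t)
    total = u @ np.kron(rho, omega) @ u.conj().T
    return np.einsum("iaja->ij", total.reshape(d_sys, d_res, d_sys, d_res))


t = 0.8
c = c_matrix(t)
rho_kraus = c * rho                     # sum_mn c_mn P_m rho P_n is an entrywise product
rho_direct = reduced_state_direct(t)
print(np.allclose(rho_kraus, rho_direct))
print(np.linalg.eigvalsh(c).min() >= -1e-10)
```

Because the projectors $P_n$ are diagonal in the chosen basis, the map acts entrywise on the density matrix, which is why a single elementwise product suffices in this sketch.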
Example 3.2. Consider the dynamical map for a qudit (d-level quantum system) given by

$$\Lambda(t) = F(t)\,\mathbb{1} + [1 - F(t)]\,\mathcal{P}, \tag{22}$$

where $\mathcal{P} : B(\mathbb{C}^d) \to B(\mathbb{C}^d)$ denotes a completely positive trace preserving projection, and $F(t)$ is a real function such that

$$F(t) \in (0,1], \qquad F(0) = 1. \tag{23}$$

For example, take a fixed qudit state $\omega$ and define $\mathcal{P}$ by the formula $\mathcal{P}\rho = \omega\,\mathrm{Tr}\rho$. Another example of a completely positive projection is the following: let $P_n = |n\rangle\langle n|$ and define $\mathcal{P}\rho = \sum_n P_n \rho P_n$. In the latter case, for $d = 2$ one obtains the following formula for the evolution of $\rho(t)$:

$$\rho(t) = \begin{pmatrix} \rho_{11}(0) & \rho_{12}(0)F(t) \\ \rho_{21}(0)F(t) & \rho_{22}(0) \end{pmatrix}. \tag{24}$$

Clearly, $\Lambda(t)$, being a convex combination of $\mathbb{1}$ and $\mathcal{P}$, is a completely positive trace preserving map and hence it defines legal quantum dynamics of a qudit. One easily finds for the corresponding generator

$$\mathcal{L}(t) = a(t)\,\mathcal{L}_0, \tag{25}$$

where

$$a(t) = -\frac{\dot{F}(t)}{F(t)}, \tag{26}$$

and

$$\mathcal{L}_0 = \mathcal{P} - \mathbb{1} \tag{27}$$

is a legitimate Markovian generator. Note that if $F(t) = e^{-\gamma t}$, then $a(t) = \gamma$ and hence $\mathcal{L}(t) = \gamma\mathcal{L}_0$ defines a Markovian generator. Note also that we may have perfectly regular dynamics $\Lambda(t)$ which is generated by a highly singular generator $\mathcal{L}(t)$. Take for example $F(t) = \cos t$. One then obtains the oscillation of the qubit coherence $\rho_{12}(t) = \rho_{12}(0)\cos t$. However, the corresponding generator (25) is defined by $a(t) = \tan t$, which displays an infinite number of singularities.

Example 3.3. The previous example may be easily generalized to bipartite systems. Consider for example a 2-qubit system and let $\mathcal{P}$ be a projector onto the diagonal part with respect to the product basis $|m\rangle \otimes |n\rangle$ in $\mathbb{C}^2 \otimes \mathbb{C}^2$. Let us take as an initial density matrix the so-called X-state$^{24}$ represented by

$$\rho_0 = \begin{pmatrix} \rho_{11} & 0 & 0 & \rho_{14} \\ 0 & \rho_{22} & \rho_{23} & 0 \\ 0 & \rho_{32} & \rho_{33} & 0 \\ \rho_{41} & 0 & 0 & \rho_{44} \end{pmatrix}. \tag{28}$$
It is easy to see that $\Lambda(t)$ defined by (22) does preserve the structure of the X-state, that is, $\rho(t)$ has exactly the same form as in (28) with $t$-dependent $\rho_{mn}$. Let

$$F(t) = 1 - \int_0^t f(u)\,du. \tag{29}$$

It is clear that the diagonal elements are time independent, $\rho_{kk}(t) = \rho_{kk}$, and that $\rho_{kl}(t) = \big(1 - \int_0^t f(u)\,du\big)\rho_{kl}$ for $k \neq l$. The entanglement of the 2-qubit X-state $\rho(t)$ is uniquely determined by the concurrence

$$C(t) := 2\max\{C_1(t), C_2(t), 0\}, \tag{30}$$

where

$$C_1(t) = |\rho_{23}(t)| - \sqrt{\rho_{11}\rho_{44}}, \qquad C_2(t) = |\rho_{14}(t)| - \sqrt{\rho_{22}\rho_{33}}, \tag{31}$$

that is, $\rho(t)$ is entangled if and only if $C_1(t) > 0$ or $C_2(t) > 0$. Let us observe that the function $f(t)$ controls the evolution of quantum entanglement. Consider for example $f(t) = \epsilon\gamma e^{-\gamma t}$, with $\gamma > 0$ and $\epsilon \in (0,1]$. One finds from (26) the following formula: $a(t) = \epsilon\gamma\,[(1-\epsilon)e^{\gamma t} + \epsilon]^{-1}$. Note that for $\epsilon = 1$ it reduces to $a(t) = \gamma$, that is, it corresponds to the purely Markovian case. Hence, the parameter $1 - \epsilon$ measures the non-Markovianity of the dynamics. Suppose now that $\rho_0$ is entangled. The entanglement of the asymptotic state is governed by $C(\infty) = 2\max\{C_1(\infty), C_2(\infty), 0\}$ with $C_1(\infty) = (1-\epsilon)|\rho_{23}| - \sqrt{\rho_{11}\rho_{44}}$ and $C_2(\infty) = (1-\epsilon)|\rho_{14}| - \sqrt{\rho_{22}\rho_{33}}$. It is clear that in the Markovian case ($\epsilon = 1$) the asymptotic state is always separable ($C(\infty) = 0$). However, for sufficiently small $\epsilon$ (i.e. for a sufficiently big non-Markovianity parameter $1 - \epsilon$) one may have $C_1(\infty) > 0$ or $C_2(\infty) > 0$, that is, the asymptotic state might be entangled. This example shows the crucial difference between Markovian and non-Markovian dynamics of composed systems. In particular, by controlling $\epsilon$ we may avoid sudden death of entanglement.$^{24}$
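The role of the parameter $\epsilon$ can be made concrete with a short numerical sketch. It assumes the choice $f(t) = \epsilon\gamma e^{-\gamma t}$, so that $F(t) = 1 - \epsilon(1 - e^{-\gamma t})$, and evaluates the X-state concurrence with every off-diagonal element multiplied by $F(t)$. The initial state below is a hypothetical example chosen only so that it is entangled at $t = 0$; the function names are likewise illustrative.

```python
import numpy as np


def coherence_factor(t, eps, gamma):
    """F(t) = 1 - int_0^t f(u) du for f(u) = eps * gamma * exp(-gamma * u)."""
    return 1.0 - eps * (1.0 - np.exp(-gamma * t))


def rate(t, eps, gamma):
    """a(t) = -F'(t) / F(t) = eps * gamma / ((1 - eps) * exp(gamma * t) + eps)."""
    return eps * gamma / ((1.0 - eps) * np.exp(gamma * t) + eps)


def concurrence(rho0, t, eps, gamma):
    """Concurrence of the X-state whose off-diagonal elements are scaled by F(t)."""
    f = coherence_factor(t, eps, gamma)
    c1 = f * abs(rho0[1, 2]) - np.sqrt(rho0[0, 0].real * rho0[3, 3].real)
    c2 = f * abs(rho0[0, 3]) - np.sqrt(rho0[1, 1].real * rho0[2, 2].real)
    return 2.0 * max(c1, c2, 0.0)


# Hypothetical X-state: populations (0.1, 0.4, 0.4, 0.1), coherence rho_23 = 0.35.
rho0 = np.diag([0.1, 0.4, 0.4, 0.1]).astype(complex)
rho0[1, 2] = rho0[2, 1] = 0.35

gamma = 1.0
late = 50.0  # large enough that exp(-gamma * t) is negligible
print(concurrence(rho0, 0.0, 1.0, gamma))    # initial entanglement
print(concurrence(rho0, late, 1.0, gamma))   # Markovian case, eps = 1
print(concurrence(rho0, late, 0.5, gamma))   # non-Markovian case, eps = 0.5
```

For this particular state the Markovian choice $\epsilon = 1$ drives the concurrence to zero, whereas $\epsilon = 0.5$ leaves $C(\infty) = 2(0.5 \cdot 0.35 - 0.1) = 0.15$.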
4. Conclusions

In conclusion, any non-Markovian quantum evolution may be described either by the non-local equation (9) or by the time-local equation (12). The local approach is simpler and well suited for practical purposes. The price we pay for the local approach is that in general the corresponding generator may be highly singular (for example $\mathcal{L}(t) = \tan t\,\mathcal{L}_0$) and that it keeps a memory of the starting point $t_0$. Our examples show the power of this approach: one is able to provide a (possibly singular) local generator, whereas the construction of the corresponding memory kernel $\mathcal{K}(t)$ is not feasible. We stress that the problem of necessary and sufficient conditions on the local generator which guarantee that the corresponding dynamical map gives rise to a legitimate quantum evolution is still open and deserves further study.
Acknowledgments

This work was partially supported by the Polish Ministry of Science and Higher Education Grant No 3004/B/H03/2007/33.
References

1. H.-P. Breuer and F. Petruccione, The Theory of Open Quantum Systems (Oxford Univ. Press, Oxford, 2007).
2. M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information (Cambridge Univ. Press, Cambridge, 2000).
3. J. Wilkie, Phys. Rev. E 62, 8808 (2000); J. Wilkie and Yin Mei Wong, J. Phys. A: Math. Theor. 42, 015006 (2009).
4. A. A. Budini, Phys. Rev. A 69, 042107 (2004); ibid. 74, 053815 (2006).
5. H.-P. Breuer, Phys. Rev. A 69, 022115 (2004); ibid. 70, 012106 (2004).
6. S. Daffer et al., Phys. Rev. A 70, 010304 (2004).
7. A. Shabani and D. A. Lidar, Phys. Rev. A 71, 020101(R) (2005).
8. S. Maniscalco, Phys. Rev. A 72, 024103 (2005).
9. S. Maniscalco and F. Petruccione, Phys. Rev. A 73, 012111 (2006).
10. J. Piilo, K. Härkönen, S. Maniscalco and K.-A. Suominen, Phys. Rev. Lett. 100, 180402 (2008); Phys. Rev. A 79, 062112 (2009).
11. E. Andersson, J. D. Cresser and M. J. W. Hall, J. Mod. Opt. 54, 1695 (2007).
12. A. Kossakowski and R. Rebolledo, Open Syst. Inf. Dyn. 14, 265 (2007); ibid. 15, 135 (2008).
13. A. Kossakowski and R. Rebolledo, Open Syst. Inf. Dyn. 16, 259 (2009).
14. H.-P. Breuer and B. Vacchini, Phys. Rev. Lett. 101, 140402 (2008); Phys. Rev. E 79, 041147 (2009).
15. M. Moodley and F. Petruccione, Phys. Rev. A 79, 042103 (2009).
16. B. Vacchini and H.-P. Breuer, Phys. Rev. A 81, 042103 (2010).
17. D. Chruściński, A. Kossakowski and S. Pascazio, Phys. Rev. A 81, 032101 (2010).
18. D. Chruściński and A. Kossakowski, Phys. Rev. Lett. 104, 070406 (2010).
19. G. Lindblad, Comm. Math. Phys. 48, 119 (1976).
20. V. Gorini, A. Kossakowski and E. C. G. Sudarshan, J. Math. Phys. 17, 821 (1976).
21. R. Alicki and K. Lendi, Quantum Dynamical Semigroups and Applications (Springer, Berlin, 1987).
22. S. Nakajima, Prog. Theor. Phys. 20, 948 (1958); R. Zwanzig, J. Chem. Phys. 33, 1338 (1960).
23. H.-P. Breuer, B. Kappler and F. Petruccione, Phys. Rev. A 59, 1633 (1999).
24. T. Yu and J. H. Eberly, Opt. Comm. 264, 393 (2006); Q. Inf. Comp. 7, 459 (2007); Phys. Rev. Lett. 97, 140403 (2006); ibid. 93, 140404 (2004).
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 101-115)
SELF-COLLAPSES OF QUANTUM SYSTEMS AND BRAIN ACTIVITIES
K.-H. FICHTNER
University Jena, Institute of Applied Mathematics, 01143 Jena, Germany
E-mail: [email protected]

L. FICHTNER
University Jena, Institute of Psychology, 01143 Jena, Germany
E-mail: [email protected]

W. FREUDENBERG
Techn. University Cottbus, Dep. of Mathematics, 03013 Cottbus, Germany
E-mail: [email protected]

M. OHYA
Department of Information Science and Quantum Bio-Informatic Center, Tokyo University of Science, Noda City, Chiba 218-8510, Japan
E-mail: [email protected]

We explain the relation between the quantum statistical model of the recognition process developed in recent years in a series of papers (cf. [7, 9, 10, 11, 12]) and a certain process of self-collapses. This process will be modeled by a classical homogeneous Markov chain.
1. Introduction
Physicists such as R. Penrose or H. P. Stapp (cf. [25, 38, 39]), but also, at an increasing rate, specialists in modern brain research (cf. [35, 36, 34, 37]), are convinced that information processing in the brain cannot be described appropriately by models based on classical physics or classical stochastics. For instance, H. P. Stapp argues in [38] that "classical physics cannot explain consciousness because it cannot explain how the whole can be more than the part." A first attempt to explain the process of recognition in terms of quantum statistics was given in [7]. The procedure of recognition can be described as
follows: There is a set of complex signals stored in the memory. Choosing one of these signals may be interpreted as generating a hypothesis concerning an "expected view of the world". Then the brain compares a signal arising from our senses with the signal chosen from the memory leading to a change of the state of both signals. Furthermore, measurements of that procedure like EEG or MEG are based on the fact that recognition of signals causes a certain loss of excited neurons, i.e. the neurons change their state from excited to non-excited. As it was pointed out by R. Penrose [25] this change that comes along with recognition of a signal can be identified with a process of self-collapses. A quantum statistical model of the recognition process should reflect the change of the signals and the loss of excited neurons. In a (still incomplete) series of papers the procedures of creation of signals from the memory, amplification, accumulation and transformation of input signals, and measurements like EEG and MEG are treated in detail (cf. [17, 8, 9, 10, 11, 12, 14, 13, 22]). In the present note we will not present this approach in detail and in full mathematical strength. A few of the basic ideas and structures of the proposed model of the recognition process will be sketched and we will explain the relation between this model and a certain process of self-collapses. In Section 2 we will collect some biological facts and experiments described in [26]. Moreover, we will cite some opinions given by R. Penrose, W. Singer and others allowing to formulate some basic postulates and requirements each brain model has to fulfill. Our model seems to be appropriate and in accordance with the experimental results. The subsequent section contains the basic mathematical notions. We introduce the underlying Hilbert space representing the space of signals and the basic operators acting on the signal space. 
In the sequel we explain the basic ingredients of our model of the recognition process. For simplicity we will restrict our considerations to the case of interaction of two signals one coming from our senses and the other one created from our memory. In Section 4 the single steps of the process of recognition will be modelled by a Markov chain with discrete time.
2. Some biological facts and experiments

In this section we want to present some biological facts and experiments. Each mathematical model describing certain aspects of brain activities should be in accordance with these experimentally proven facts. Firstly, using EEG measurements one gets information on the densities of excited neurons located in the regions of the brain. These densities depend on time. The curves are of the following type:

[Figure: density of excited neurons versus time, with successive phases labelled "no activity", "activity", "no activity".]
The behaviour of that curve shows two types of periods: periods of oscillations, interpreted by specialists as phases of no activity, and periods interpreted as phases of special activity in the considered region of the brain. As an example we mention some experiments described in [26]. All regions of the cortex seem to have their own electromagnetic rhythms. For example, the so-called μ-rhythm, which is produced in certain parts of the cortex that process tactile stimuli and control movement, consists of two main frequency components, one around 10 Hz and the other around 20 Hz. The 20 Hz component of the μ-rhythm seems to be closely related to motor functions, while the other one is linked more to the sense of touch. "What exactly causes the oscillation is still a big puzzle" [26]. The experiments show that the functional state of the cortex can be monitored by measuring changes in the 20 Hz rhythm in that area of the brain [26]. Using MEG one could show that this rhythm is significantly suppressed if the subject moves his fingers, thereby activating the motor cortex. Interestingly, a similar suppression occurs when the subject merely imagines making such movements or just views someone else moving their finger. Similar effects were observed by G. Rizzolatti (University of Parma/Italy), who used needle electrodes to measure the response of neurons in the brain of a monkey. It was observed that the neurons discharge both when the monkey picks a
raisin and when it sees the experimenter make the same movement. The conclusion is that one has to unravel the significance (or unimportance) of the cortical rhythms during the periods of recognition of signals or other actions of the brain. Summarizing, one can state that there are periods of no activity in certain regions, indicated by the rhythms mentioned above, and there are periods of action where the rhythms are suppressed. In a quantum model of the brain, the periods of no activity in certain regions of the brain should be represented by a unitary evolution (the quantum system, resp. a part of the system, remains isolated from the environment), i.e. it is a process of type 2 in the sense of J. v. Neumann [40]. Now, if a signal arises from the senses, the process of recognition starts. That process represents a rapid sequence of "trials" checking whether the signal from the senses and the signal created by the brain (at least partially) coincide. Let us also mention Stapp (cf. [39]), who uses the second quantization of harmonic oscillators to explain the observed oscillations. Stapp identifies these trials with measurements performed by a mysterious "observer" in the brain, which seems to be unrealistic. In our model of recognition these trials are not represented by a process of measurements. Nevertheless, the results of the trials are represented by projections, as in the case of measurements. Now, the experimentally verified quantum Zeno effect tells us that a rapid repetition of a certain measurement will suppress the unitary evolution of the system, i.e. the effects of the measurements dominate the evolution of the system. We will see that the quantum model proposed by us enables one to explain the different phases of the curves obtained as the outcomes of EEG measurements using the quantum Zeno effect. Hameroff and Penrose discuss in [25] the concept of quantum theory related to some biological aspects of brain activities.
In particular, they deal with the problem of how recognition of signals is connected with a process of self-collapses. We would like to mention some basic statements taken from their paper [25]:
- As long as a quantum system remains isolated from its environment, it can be satisfactorily described in terms of a deterministic, unitarily evolving process. That process is computable, non-random, and reversible.
- The conventional quantum theory view is that the quantum state reduces by environment entanglement, measurement or observation (subjective reduction). The measurement process is non-computable, random, and irreversible, and it is known in various contexts as collapse of the wave function.
- A number of physicists have argued in support of special models in which the rules of standard quantum mechanics are modified by the inclusion of some additional procedure according to which the reduction of the state becomes an objectively real process (objective reduction): the system abruptly self-collapses. That would give a new non-computable theory, i.e. they are convinced that processes of self-collapses cannot be described in terms of conventional quantum mechanics; it requires additional postulates (relations to Gödel's incompleteness theorem?).
- Consciousness, it is argued, requires non-computability. The only readily available apparent source of non-computability are self-collapses. The self-collapse, irreversible in time, creates an instantaneous "now" event. Sequences of such events create a flow of time, and consciousness.

As mentioned in Section 1, in a series of papers we developed a quantum statistical model describing certain aspects of the recognition process. In the sequel we present certain aspects of this model. We will argue that the model is in accordance with the experimental results and the opinions cited above. The paper is focussed on the relation between the process of recognition and a certain process of self-collapses.

3. The Space of Signals

In the present section we briefly introduce the notions and notation needed in the sequel. For interpretation and motivation of the introduced notions we refer to the above mentioned papers. The starting point will be a set $G$ representing the space where the process of recognition and processing of the signals takes place. For the mathematical model the concrete structure of $G$ is irrelevant. So let $G$ be an arbitrary complete separable metric space equipped with a fixed finite diffuse measure $\mu$ on the $\sigma$-algebra of Borel sets of $G$. The elements of the Hilbert space $L^2(G,\mu)$ can be interpreted as functions of the excited neurons.
We assume that $G$ decomposes into disjoint regions $G_1, \dots, G_n$ responsible for different tasks. So $L^2(G_k)$ represents the space of the excited neurons in the region $G_k$. Now, let $\Gamma(L^2(G))$ denote the symmetric Fock space over $L^2(G)$.
Incoming signals are identified with states on the symmetric Fock space $\mathcal{H} = \Gamma(L^2(G))$. Why do we choose this Fock space as signal space? The main reason is the possibility to identify the Fock space over $L^2(G)$ with
the tensor product of the Fock spaces over $L^2(G_1), \dots, L^2(G_n)$:

$$\Gamma(L^2(G_1 \cup \dots \cup G_n)) = \Gamma(L^2(G_1)) \otimes \dots \otimes \Gamma(L^2(G_n)).$$

The signal decomposes according to the decomposition of the space. Now, we assume that the decomposition is fixed and maximal (very fine) in the sense that each region $G_r$ is responsible for only one simple task, represented by an $f^r \in L^2(G_r)$, $\|f^r\| = 1$, $r \le n$. For each $r \in \{1, \dots, n\}$, $k \ge 1$ define functions $f^r_k \in \Gamma(L^2(G_r))$ by

$$f^r_k(x) := \begin{cases} \sqrt{k!}\,\prod_{j=1}^{k} f^r(x_j), & x = (x_1, \dots, x_k) \in G_r^k, \\ 0, & \text{elsewhere,} \end{cases} \qquad f^r_0(x) := \mathbb{1}_0(x).$$

Observe that for each $r \in \{1, \dots, n\}$, $(f^r_k)_{k \ge 0}$ gives an orthonormal system in $\Gamma(L^2(G_r))$. We denote by $\mathcal{H}^{sig}_r \subset \Gamma(L^2(G_r))$ the Hilbert space with basis $(f^r_k)_{k \ge 0}$ and interpret this space as the space of signals of type $r$. An especially important class of functions in the Fock space are the so-called exponential vectors. Exponential vectors in $\mathcal{H}^{sig}_r$ are given by

$$\exp\{a \cdot f^r\} = \sum_{k=0}^{\infty} \frac{a^k}{\sqrt{k!}}\, f^r_k \qquad (a \in \mathbb{C}).$$

Observe that

$$\exp\{0 \cdot f^r\} = f^r_0 \qquad \text{and} \qquad \frac{d^m}{dt^m} \exp\{t \cdot f^r\}\Big|_{t=0} = \sqrt{m!}\cdot f^r_m, \quad m > 0.$$

One easily concludes that the linear span of the exponential vectors is dense in $\mathcal{H}^{sig}_r$, i.e.

$$\mathcal{H}^{sig}_r = \overline{\mathrm{Lin}}\,\{\exp\{a \cdot f^r\} : a \in \mathbb{C}\}, \qquad r \in \{1, \dots, n\}.$$

The space

$$\mathcal{H}^{sig} := \mathcal{H}^{sig}_1 \otimes \dots \otimes \mathcal{H}^{sig}_n \tag{1}$$

we will call the space of regular signals. States on $\mathcal{H}^{sig}$ are called regular signals. The states

$$\omega_g := e^{-\|g\|^2} \cdot (\exp\{g\}, \,\cdot\; \exp\{g\})$$

are called coherent states on $\mathcal{H}^{sig}_r$ if $g \in L^2(G_r)$, resp. on $\mathcal{H}^{sig}$ if $g \in L^2(G)$. Hereby, $(\cdot,\cdot)$ denotes the scalar product in the corresponding Hilbert space. Roughly speaking, coherent states describe states of systems of quantum particles where each particle is in the same one-particle state. Furthermore, $\omega_0 = (\exp\{0\}, \,\cdot\; \exp\{0\})$ is called the vacuum state.
4. The Process of Recognition

The recognition process is based on a comparison of signals: one signal is the input signal coming from our senses, the other one is taken from the memory. Both signals are modeled by the Hilbert space $\mathcal{H}^{sig}$ introduced above. The memory space $\mathcal{H}^{mem}$ is a further Hilbert space, the structure of which we will not discuss in the present paper. We will also not describe here the mechanism by which the signal is taken out of memory. A detailed discussion and interpretation can be found in [7] and [8, 14, 15]. The whole processing procedure will take place step by step on a tensor product space whose first two factors are copies of $\mathcal{H}^{sig}$.
The comparison procedure between the two mentioned signals is done with the aid of operators (projections) on $\mathcal{H} := \mathcal{H}^{sig} \otimes \mathcal{H}^{sig}$, and we restrict our considerations to these first two spaces of the tensor product space. Basic for our considerations will be the symmetric beam splitter, a well-known operator in quantum optics describing the splitting of coherent light into two beams.

Definition 4.1. Let $r \in \{1, \dots, n\}$ be fixed. The linear operator $V_r$ on $\Gamma(L^2(G_r)) \otimes \Gamma(L^2(G_r))$ characterized by

$$V_r(\exp\{f\} \otimes \exp\{g\}) := \exp\Big\{\tfrac{1}{\sqrt{2}}(f + g)\Big\} \otimes \exp\Big\{\tfrac{1}{\sqrt{2}}(f - g)\Big\} \qquad (f, g \in L^2(G_r)) \tag{2}$$
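we call the symmetric beam splitter in the region $r$. It is well known that tensor products of exponential vectors are total in $\Gamma(L^2(G_r)) \otimes \Gamma(L^2(G_r))$. So the symmetric beam splitter $V_r$ is fully characterized by formula (2).

Proposition 4.1. (Properties of the symmetric beam splitter) For $r \in \{1, \dots, n\}$ the symmetric beam splitter $V_r$ is unitary and self-adjoint: $V_r^* = V_r$ and $V_r^2 = \mathbb{1}_{\Gamma(L^2(G_r)) \otimes \Gamma(L^2(G_r))}$ ($\mathbb{1}$ denoting the identity operator). Moreover, for all $f, g \in L^2(G_r)$ and $\alpha, \beta \in \mathbb{C}$

$$V_r(\exp\{\alpha f\} \otimes \exp\{\beta f\}) = \exp\Big\{\tfrac{\alpha + \beta}{\sqrt{2}} f\Big\} \otimes \exp\Big\{\tfrac{\alpha - \beta}{\sqrt{2}} f\Big\} \tag{3}$$

$$V_r(\exp\{f\} \otimes \exp\{f\}) = \exp\{\sqrt{2} f\} \otimes \exp\{0\} \tag{4}$$

$$V_r(\exp\{0\} \otimes \exp\{f\}) = \exp\Big\{\tfrac{1}{\sqrt{2}} f\Big\} \otimes \exp\Big\{-\tfrac{1}{\sqrt{2}} f\Big\} \tag{5}$$

$$V_r(\exp\{f\} \otimes \exp\{0\}) = \exp\Big\{\tfrac{1}{\sqrt{2}} f\Big\} \otimes \exp\Big\{\tfrac{1}{\sqrt{2}} f\Big\} \tag{6}$$

$$V_r^2(\exp\{f\} \otimes \exp\{g\}) = \exp\{f\} \otimes \exp\{g\}. \tag{7}$$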
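On exponential vectors of the form $\exp\{a f\} \otimes \exp\{b f\}$ with $\|f\| = 1$, the beam splitter acts on the pair of complex amplitudes $(a, b)$ as the map $(a, b) \mapsto ((a+b)/\sqrt{2}, (a-b)/\sqrt{2})$. The sketch below checks several of the listed properties at the level of these amplitudes only; it does not construct the operator on Fock space. It relies on the standard inner product of exponential vectors, $(\exp\{a f\}, \exp\{b f\}) = e^{\bar{a} b}$, and the function names are illustrative.

```python
import numpy as np

# Amplitude-level action of the symmetric beam splitter: a real symmetric 2x2 matrix.
HADAMARD = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)


def beam_splitter(amps):
    """Map the amplitudes (a, b) of exp{a f} (x) exp{b f} to ((a+b)/sqrt2, (a-b)/sqrt2)."""
    return HADAMARD @ np.asarray(amps, dtype=complex)


def inner(amps1, amps2):
    """Scalar product of two product exponential vectors: exp(conj(a1)*a2 + conj(b1)*b2)."""
    return np.exp(np.vdot(np.asarray(amps1, dtype=complex),
                          np.asarray(amps2, dtype=complex)))


x = np.array([0.3 + 0.2j, -0.5 + 0.1j])
y = np.array([-0.1 + 0.4j, 0.25 - 0.6j])
a = 0.3 + 0.2j

# Scalar products are preserved (unitarity on exponential vectors).
print(np.isclose(inner(beam_splitter(x), beam_splitter(y)), inner(x, y)))
# Applying the beam splitter twice gives back the input amplitudes.
print(np.allclose(beam_splitter(beam_splitter(x)), x))
# Two equal inputs leave the second output in the vacuum exp{0}.
print(np.allclose(beam_splitter([a, a]), [np.sqrt(2.0) * a, 0.0]))
```

The checks succeed because the 2x2 matrix is real, symmetric and squares to the identity, which mirrors the self-adjointness and the involution property of the operator.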
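A good survey on properties of general (usually non-selfadjoint) beam splitting operators with an arbitrary number of in- and outputs is given in [24]. In the sequel $\mathrm{Pr}_\Psi := (\Psi, \cdot)\Psi$ denotes the projection onto (the subspace generated by) $\Psi$. For $r \in \{1, \dots, n\}$ define projections $T^r_1, \hat{T}^r_1, T^r_0, \hat{T}^r_0$ on $\mathcal{H}^{sig}_r \otimes \mathcal{H}^{sig}_r$ by

$$T^r_1 := V_r\big(\mathbb{1}_{\mathcal{H}^{sig}_r} \otimes \mathrm{Pr}_{\exp\{0\}}\big)V_r, \qquad \hat{T}^r_0 := V_r^2 = \mathbb{1}_{\mathcal{H}^{sig}_r} \otimes \mathbb{1}_{\mathcal{H}^{sig}_r} =: I_r,$$

$$T^r_0 := V_r\big(I_r - \mathbb{1}_{\mathcal{H}^{sig}_r} \otimes \mathrm{Pr}_{\exp\{0\}}\big)V_r = I_r - T^r_1 = I_r - \hat{T}^r_1.$$

Observe that $T^r_1$ has the property

$$T^r_1\big(\exp\{a f^r\} \otimes \exp\{b f^r\}\big) = \exp\Big\{\tfrac{1}{2}(a + b) f^r\Big\} \otimes \exp\Big\{\tfrac{1}{2}(a + b) f^r\Big\}.$$

So, $T^r_1$ corresponds to a projection onto the subspace of $\mathcal{H}^{sig}_r \otimes \mathcal{H}^{sig}_r$ with equal first and second factor of the tensor product. $T^r_1$ is interpreted as recognition in the $r$-th region, whereas $T^r_0 = I_r - T^r_1$ indicates that there was no recognition in the area $r$. Now we join together these operators $T^r_1$ and $T^r_0$ of partial recognition resp. "no" recognition in the single areas to operators on the whole tensor product signal space $\mathcal{H} := \mathcal{H}^{sig} \otimes \mathcal{H}^{sig}$. We put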
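$$\Omega := \{0,1\}^n = \{\bar\varepsilon = (\varepsilon_1, \dots, \varepsilon_n) : \varepsilon_k \in \{0,1\},\ k \le n\},$$

$$T_{(\varepsilon_1, \dots, \varepsilon_n)} := \bigotimes_{r=1}^{n} T^r_{\varepsilon_r}, \qquad \hat{T}_{(\varepsilon_1, \dots, \varepsilon_n)} := \bigotimes_{r=1}^{n} \hat{T}^r_{\varepsilon_r}.$$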
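For notational convenience we have put $\hat{T}^r_1 := T^r_1$ in the above definition (however, $\hat{T}^r_0 \neq T^r_0$). This assembly of the operators in the single areas was possible because of the tensor product structure (1) of the signal space.
Now we want to sketch the process of recognition. The starting point will be a state $\varrho_0$ on $\mathcal{H}$ representing a pair of signals, the first arising from the senses and the second one created by the brain. Now a certain element $\bar\varepsilon^1 = (\varepsilon^1_1, \dots, \varepsilon^1_n) \in \Omega$ is chosen, indicating what happens in the first step of recognition:
- $\varepsilon^1_k = 1$ indicates recognition in the $k$-th region,
- $\varepsilon^1_k = 0$ means there is no recognition in the $k$-th region.
The corresponding event will be identified with the projection $T_{(\varepsilon^1_1, \dots, \varepsilon^1_n)}$. As in the case of measurements, the probability of the event $(\varepsilon^1_1, \dots, \varepsilon^1_n)$ is given by

$$\mathrm{tr}\big(\varrho_0 \cdot T_{(\varepsilon^1_1, \dots, \varepsilon^1_n)}\big),$$

where $\mathrm{tr}(\cdot)$ denotes the usual trace of an operator. Together with the probability $\mathrm{tr}(\varrho_0 \cdot T_{\bar\varepsilon^1})$ representing the subjective reduction of the state $\varrho_0$ caused by the measurement according to $T_{\bar\varepsilon^1}$, we can consider the following transformation of the state $\varrho_0$, representing the objective reduction caused by the self-collapse indicated by $\bar\varepsilon^1$:

$$K_{\bar\varepsilon^1}(\varrho_0) := \frac{T_{\bar\varepsilon^1}\,\varrho_0\,T_{\bar\varepsilon^1}}{\mathrm{tr}(\varrho_0 \cdot T_{\bar\varepsilon^1})},$$

provided $\mathrm{tr}(\varrho_0 \cdot T_{\bar\varepsilon^1}) > 0$. So for each given sequence $(\varepsilon^1_1, \dots, \varepsilon^1_n)$ we get a channel mapping the initial state $\varrho_0$ into the new state $K_{\bar\varepsilon^1}(\varrho_0)$, where partial recognition appears in all regions $k$ with $\varepsilon^1_k = 1$. The starting point for the second step of recognition will be $\varrho_1 := K_{\bar\varepsilon^1}(\varrho_0)$ and a second sequence $\bar\varepsilon^2 \in \Omega$. Applying the same procedure with $\varrho_0$ replaced by $\varrho_1$, the probability of the event indicated by $\bar\varepsilon^2$ will be

$$\mathrm{tr}(\varrho_1 \cdot T_{\bar\varepsilon^2}),$$

and, caused by the self-collapse indicated by $\bar\varepsilon^2$, the state $\varrho_1 = K_{\bar\varepsilon^1}(\varrho_0)$ will be transformed to

$$\varrho_2 := K_{\bar\varepsilon^2}(\varrho_1) = \frac{T_{\bar\varepsilon^2}\,\varrho_1\,T_{\bar\varepsilon^2}}{\mathrm{tr}(\varrho_1 \cdot T_{\bar\varepsilon^2})},$$

provided $\mathrm{tr}(\varrho_1 \cdot T_{\bar\varepsilon^2}) > 0$. This procedure can be repeated arbitrarily often - given an initial state $\varrho_0$ and a sequence $(\bar\varepsilon^k)_{k=1}^{\infty}$ one obtains in this way a sequence $(\varrho_k)_{k=0}^{\infty}$ of states on $\mathcal{H}$ and a sequence $(\mathrm{tr}(\varrho_{k-1} \cdot T_{\bar\varepsilon^k}))_{k=1}^{\infty}$ of probabilities of the events $(\bar\varepsilon^k)_{k=1}^{\infty}$, provided these expressions exist (i.e. if $\mathrm{tr}(\varrho_{k-1} \cdot T_{\bar\varepsilon^k}) > 0$ for $k \ge 1$). A necessary condition for this to hold is that the sequence $(\bar\varepsilon^k)_{k=1}^{\infty}$ has to be in some sense increasing. More precisely, we require

$$\varepsilon^k_r \le \varepsilon^{k+1}_r \qquad (r \in \{1, \dots, n\}) \tag{8}$$

for all $k \in \mathbb{N}$. The relation (8) defines a semi-ordering of the elements in $\Omega$.
We write $\bar\varepsilon^k \le \bar\varepsilon^{k+1}$ to indicate relation (8). This property is in accordance with the requirements our model should fulfill. Once partial recognition in the area $r$ occurs, there will be no change for this region in the future; only further zeros may be transformed into 1. The full recognition of the signal would be obtained if for some $k \in \mathbb{N}$ we have $\bar\varepsilon^k = (1, \dots, 1)$. The sequence $(\varrho_k)_{k=0}^{\infty}$ is in accordance with this relation. Especially, we have if $\bar\varepsilon^{k-1} \le \bar\varepsilon^{k}$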
does not hold,
(9)
and 1'f
-k-l <8 E _ -k E .
(10)
The above identities are based on the following properties of the introduced projection the easy proof of which we will omit.
Proposition 4.2.
T~T~ = T~T~ = T~, LTE' = ][H
T~T[ = T[T~ = T[,
(E'l ::; E'2) implies
= TE'2 TE'l = T max{E'l ,E'2} TE'l TE'2 = TE'2 TE'l = TE'2 TE'l TE'2
E'l ::; E'2 (E'l, E'2 E n) (E'l ::; E'2)
T~T[ = T~T[ = 0
Summarizing, we state that the recognition process is characterised by an increasing sequence $(\bar\varepsilon^k)_{k=1}^{\infty}$ with a corresponding sequence of states $(\varrho_k)_{k=0}^{\infty}$.
Unfortunately, this sequence $(\bar\varepsilon^k)_{k=1}^{\infty}$ is unknown and seems to be non-computable. Only if the sequence $(\bar\varepsilon^k)_{k=1}^{\infty}$ were known could one identify the process of recognition with the sequence $(\varrho_k)_{k=0}^{\infty}$ of states. The way out of this dilemma is to consider the vectors $\bar\varepsilon^k$ as random and to consider a classical Markov chain describing this process of recognition.
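The collapse rule $\varrho \mapsto T\varrho T/\mathrm{tr}(\varrho T)$ and the way the step probabilities combine can be illustrated with a few lines of code. The sketch below is generic: two nested diagonal projections on $\mathbb{C}^4$ stand in for the projections of the model, and the initial state is a hypothetical example. It checks that for nested projections the product of the two step probabilities equals the probability of the smaller event in the initial state.

```python
import numpy as np


def collapse(rho, proj):
    """Return (probability, collapsed state) for the event described by the projection `proj`."""
    prob = np.trace(rho @ proj).real
    if prob <= 0.0:
        raise ValueError("the event has probability zero; the collapsed state is undefined")
    return prob, proj @ rho @ proj / prob


# Generic nested projections P2 <= P1 (stand-ins, not the model's operators).
p1 = np.diag([1.0, 1.0, 1.0, 0.0])
p2 = np.diag([1.0, 1.0, 0.0, 0.0])

# Hypothetical initial state: mixture of a uniform superposition and the maximally mixed state.
psi = np.ones(4) / 2.0
rho0 = 0.7 * np.outer(psi, psi) + 0.3 * np.eye(4) / 4.0

prob1, rho1 = collapse(rho0, p1)     # first step
prob2, rho2 = collapse(rho1, p2)     # second step, starting from the collapsed state
print(prob1, prob2, prob1 * prob2)
print(np.trace(rho0 @ p2).real)
```

In this example the first step has probability 0.75 and the second 2/3, so their product agrees with $\mathrm{tr}(\varrho_0 P_2) = 0.5$, as expected for nested events.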
5. A Markov Chain with Discrete Time

Since the sequence $(\varepsilon^k)_{k=1}^{\infty}$ is not known, we replace it by a sequence of random vectors $(X_k)_{k=1}^{\infty}$. Here, $X_k$ takes values in $\Omega$ and is interpreted as the random outcome in the $k$-th step of the recognition process. The sequence $(\varrho_k)_{k=0}^{\infty}$ of states we replace by a new sequence $(\beta_k)_{k=0}^{\infty}$ of states on $\mathcal{H}$ given by
$$\beta_k(\cdot) := \sum_{\varepsilon \in \Omega} P(X_k = \varepsilon) \cdot \operatorname{tr}\big(K_{\varepsilon}(\varrho_0)(\cdot)\big) \qquad (k \ge 1). \tag{11}$$
Here, $P(X_0 = (0, \dots, 0)) = 1$ and $\beta_0 = \varrho_0$ is the initial state on $\mathcal{H}$. Following the above arguments, the sequence $(X_k)_{k=0}^{\infty}$ should form a classical Markov chain with initial distribution $P_{X_0} = \delta_{(0, \dots, 0)}$ and with transition probabilities
$(k \ge 0)$
(12)
where $\varepsilon^0 = (0, \dots, 0)$. These probabilities represent the conditional probabilities of passing in the $k$-th step to $\varepsilon^k$ starting from $\varepsilon^{k-1}$. We conclude and summarize the above considerations with the following theorem, the proof of which we omit.

Theorem 5.1. Let $\varrho$ be a state on $\mathcal{H}$ ($\varrho$ is the initial state).
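(a) There exists exactly one probability measure $P_{\varrho}$ on $\Omega^{\mathbb{N}}$ (equipped with the canonical $\sigma$-algebra) with finite-dimensional distributions $P_{\varrho}^m$ given by
$$P_{\varrho}^m(\varepsilon^0, \dots, \varepsilon^m) = \delta_{(0, \dots, 0)}(\{\varepsilon^0\}) \prod_{k=1}^{m} \frac{\operatorname{tr}(\varrho\, T_{\varepsilon^{k-1}} T_{\varepsilon^k})}{\operatorname{tr}(\varrho\, T_{\varepsilon^{k-1}})} \tag{13}$$
if $\operatorname{tr}(\varrho\, T_{\varepsilon^1}) \cdots \operatorname{tr}(\varrho\, T_{\varepsilon^{m-1}}) > 0$, and
$$P_{\varrho}^m(\varepsilon^0, \dots, \varepsilon^m) = 0 \tag{14}$$
otherwise.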
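(b) The sequence $(X_k)_{k=0}^{\infty}$ of random vectors is increasing, i.e.
$$X_k \le X_{k+1} \qquad (k \ge 0).$$
(c) The sequence $(X_k)_{k=0}^{\infty}$ is a homogeneous Markov chain with initial distribution $\delta_{(0, \dots, 0)}$ and with transition probabilities
$$p(\varepsilon^1, \varepsilon^2) = \frac{\operatorname{tr}(\varrho\, T_{\varepsilon^2})}{\operatorname{tr}(\varrho\, T_{\varepsilon^1})}. \tag{15}$$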
(d) The m-steps transition probabilities are given by
(16) (e) (m;:::: 1, E EO).
(17)
Since $\Omega$ is a finite set and the sequence $(X_k)_{k=0}^{\infty}$ is increasing, it definitely reaches its maximum $X_{\max}$, which indicates the final result of the recognition process. This vector $X_{\max} \le (1, \dots, 1)$ may be non-random. Further, one may introduce the random number $\eta$ of steps one needs to reach $X_{\max}$. One has $\eta = 1 + \max\{k : X_k < X_{\max}\}$. In a more refined model one has to pass, in the sense of a scaling limit, from this discrete Markov chain to a homogeneous Markov chain with continuous time. This will be useful because the measuring device cannot resolve what happens in the single (very short) steps of the recognition process. We will deal with this problem in a forthcoming paper.
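The behaviour of an increasing chain on $\{0,1\}^n$ and the step count $\eta$ can be illustrated with a small simulation. This is only a toy sketch: the transition law used here (every remaining 0 flips to 1 independently with a hypothetical probability `q`) is an assumption chosen for illustration and is not the transition law (15) of the model; the function names are likewise hypothetical.

```python
import random


def step(x, q, rng):
    """One transition of the toy chain: each 0 becomes 1 with probability q,
    components that are already 1 never change (so the chain is increasing)."""
    return tuple(1 if (bit == 1 or rng.random() < q) else 0 for bit in x)


def simulate(n, q, seed=0, max_steps=10_000):
    """Run the toy chain from X_0 = (0, ..., 0) until it reaches (1, ..., 1)
    or max_steps is exceeded; return the whole path X_0, X_1, ..."""
    rng = random.Random(seed)
    x = (0,) * n
    path = [x]
    while x != (1,) * n and len(path) <= max_steps:
        x = step(x, q, rng)
        path.append(x)
    return path


def leq(a, b):
    """Componentwise semi-ordering a <= b, as in relation (8)."""
    return all(ai <= bi for ai, bi in zip(a, b))


def steps_to_maximum(path):
    """eta = 1 + max{k : X_k < X_max}, i.e. the first index at which the
    path equals its final (maximal) value."""
    x_max = path[-1]
    return next(k for k, x in enumerate(path) if x == x_max)


if __name__ == "__main__":
    path = simulate(n=5, q=0.3, seed=1)
    print("eta =", steps_to_maximum(path))
```

Because components can only switch from 0 to 1, every simulated path is increasing with respect to the semi-ordering, and the chain is absorbed in its maximum after finitely many steps.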
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 117-128)
STATISTICAL ANALYSIS OF RANDOM NUMBER GENERATORS
LUIGI ACCARDI Universita di Roma Tor Vergata, Centro Interdipartimentale Vito Volterra, Via Columbia 2, 00133 Roma, Italy, E-Mail: [email protected]
MARKUS GABLER Brandenburg Technical University Cottbus, Department of Mathematics, PO box 101344, 03013 Cottbus, Germany, E-Mail: [email protected]
In many applications, for example cryptography and Monte Carlo simulation, there is a need for random numbers. Any procedure, algorithm or device which is intended to produce such numbers is called a random number generator (RNG). What makes a good RNG? This paper gives an overview of empirical testing of the statistical properties of the sequences produced by RNGs and of special software packages designed for that purpose. We also present the results of applying a particular test suite, TestU01, to a family of RNGs currently being developed at the Centro Interdipartimentale Vito Volterra (CIVV), Roma, Italy.
1. Introduction
Assume $X_1, \dots, X_N$ is a random binary sequence, i.e. for all $1 \le k \le N$, $X_k$ is a Bernoulli random variable with $P(X_k = 1) = p_k \in (0,1)$. Such a sequence is called purely random if $X_1, \dots, X_N$ are independent and identically distributed (i.i.d.) with $p_k = p = 1/2$. In many applications, for example cryptography and Monte Carlo simulation, there is a need for random numbers. Using binary expansion and transformation methods, any such random numbers can be constructed from purely random binary sequences to an arbitrary precision, ignoring numerical difficulties for the moment. Many ways have therefore been invented to produce, or at least simulate, realizations $x_1, \dots, x_N$ of such sequences. Repeated flipping of a "fair" coin and recording "0" for "heads" and "1" for "tails" would be one way, but surely too time-consuming if you needed, say, $10^{18}$ exponential random numbers for some extensive stochastic simulation. Using randomness in physical quantities like noise in an electrical circuit or the timing of keystrokes at a user keyboard would be another. Even manipulating a definitely deterministic process (like a small computer program) in such a way that the outcomes appear to be purely random might serve just as well. Any such procedure, algorithm or device which is intended to produce realizations of random sequences we call a random number generator (RNG). Deterministic RNGs are sometimes also called pseudo-random number generators (PRNG) or algorithmic RNGs. But no matter what the nature of a particular RNG is, the question arises whether the sequences it produces can be distinguished from the ones coming from the purely random theoretical ideal.

This paper only deals with the statistical properties of binary sequences (viewed as realizations of random binary sequences). But while good statistical properties are necessary, there are, of course, other important quality criteria, including:

Efficiency with respect to time and memory.

Sufficiently large period: The internal state of a deterministic RNG runs over a finite set and is therefore periodic, so running through a significant portion of the whole cycle should be beyond reach in practice.

Repeatability: the ability to reproduce the same sequence as many times as needed. Non-deterministic RNGs are not repeatable.

Portability: independence from the software and hardware environment.

Unpredictability: The next output bit of an RNG cannot be predicted from knowledge of the preceding ones any better than by "tossing a fair coin". For a deterministic RNG this means that it should not be possible to find the internal state and/or the transition law from knowing the output sequence, at least not in reasonable time employing a reasonable amount of resources.

The importance of these criteria mostly depends on the application the RNG is intended for.
So while for some uses in cryptography nothing less than an unpredictable RNG will do, for simulation purposes the main emphasis might be on high efficiency and repeatability.

Not all RNGs are designed to produce binary (bit) sequences. Besides bit generators, i.i.d. Uniform$\{0, \dots, 2^m - 1\}$ and i.i.d. Uniform$[0, 1)$ generators are also very common. The former produce $m$-bit integers, where 31 and 32 bits are used quite frequently; the latter approximate reals, of course also only up to a certain precision (float, double, etc.). These different outputs can be transformed into one another quite easily, though. Dividing a bit sequence into $m$-bit blocks yields the binary representation of a sequence of $m$-bit integers. An $m$-bit integer divided by $2^m$ is a $[0,1)$ real. The converse works just as well. In addition, a random binary sequence is purely random if and only if the corresponding $m$-bit integer sequence is i.i.d. Uniform$\{0, \dots, 2^m - 1\}$. Thus, even though some tests need bit sequences and others, say, 32-bit integers as input, they can be applied to any RNG, as long as the output of the RNG is transformed to the right format.

Section 2 deals with statistical tests and special software packages designed for testing RNGs. Then, in section 3, the results of applying a particular test suite, TestU01, to a family of RNGs currently being developed at the Centro Interdipartimentale Vito Volterra (CIVV) at the Universita degli studi di Roma Tor Vergata, Italy, are presented.

2. Statistical Testing of RNGs

Bad RNGs are those that fail simple tests, whereas good RNGs fail only complicated tests that are hard to find and run. (P. L'Ecuyer)
2.1. Historical Development

Over the years many statistical tests for random number generators have been proposed. One of the first collections was found in earlier editions of Knuth [1]. These tests, plus a few others designed for testing parallel generators, were implemented in SPRNG: a scalable library for pseudorandom number generation by Mascagni and Srinivasan [2]. New and more stringent tests, compared to the ones from Knuth [1], were introduced by Marsaglia in 1985 [3]. (A test $T_1$ is considered more stringent than another test $T_2$ if a generator passing $T_1$ is also likely to pass $T_2$.) Most of these tests were later implemented in DIEHARD: A Battery of Tests of Randomness by Marsaglia in 1995 [4], probably the best-known software package for RNG testing. Because of its very inflexible setup, however, its usefulness has by now become rather limited. First of all, the sample sizes (as well as other parameters) are fixed in the package and not very large by modern standards, which makes the test results unreliable for many of today's applications. Second, the sequence must be provided to the package in the form of 32-bit integers in a binary file, rather than just passing the RNG function to the package, which then in turn produces the numbers "on demand". This also makes any generator having an accuracy of less than 32 bits fail DIEHARD.

The National Institute of Standards and Technology (NIST), USA, developed the NIST Statistical Test Suite by Rukhin et al. [5] for the evaluation of the Advanced Encryption Standard (AES) candidate algorithms.

The state-of-the-art library for testing RNGs today is TestU01: A C Library for Empirical Testing of Random Number Generators, introduced in 2007 by L'Ecuyer and Simard [6]. It implements: 1) a large variety of different RNGs proposed in the literature and/or used in software packages or operating systems, 2) most of the statistical tests from DIEHARD, the NIST package, the Knuth collection, other tests found in the literature and some original ones, 3) predefined test batteries and 4) tools for investigating, within a whole family of RNGs, the relationship between the period length of a generator and the sequence length at which this generator begins to fail a given test systematically.

Many of the statistical tests proposed for testing random numbers are easily passed, while other tests seem to be quite difficult to pass. So what makes a good test? In recent years some efforts have been undertaken which, eventually, might lead to something like a "hierarchy of tests". For example, Tsang et al. [7] define the so-called "stringency level" of a test in order to optimize the choice of parameters for the Collision Test. The results of testing a wide range of generators against some difficult-to-pass tests of randomness by Marsaglia and Tsang [8], as well as against other test suites, including DIEHARD, NIST and Knuth's collection, are found in [9]. According to their results, applying these three difficult-to-pass tests seems to "reduce the volume and concentrate the essence" of many of the statistical tests in use today. However, through empirical investigations on our part, we have found generators that do pass these three tests but fail others quite badly. More details on these results are reported in section 3.
2.2. General Setup of a Statistical Test

What is the basic setup of a statistical test? Let $x_1, \dots, x_N$ be a sequence of real numbers considered as realizations of random variables $X_1, \dots, X_N$ over a common probability space $(\Omega, \mathcal{F}, P)$, where $P$ is unknown. Denoting by $\mathcal{P}_0$ the set of all probability measures fulfilling certain given requirements about the joint distribution of $X_1, \dots, X_N$, the statement $H_0 : P \in \mathcal{P}_0$ is called the null hypothesis. Whether $H_0$ holds or not is unknown, and the object of a test is to gather information from the sequence $x_1, \dots, x_N$ in order to either reject $H_0$ or not. For the testing of binary sequences the null hypothesis throughout this paper will be given by $\mathcal{P}_0 = \{P : X_1, \dots, X_N \text{ is a purely random binary sequence}\}$, i.e.
$$H_0 : X_1, \dots, X_N \text{ i.i.d.}, \quad P(X_k = 1) = P(X_k = 0) = 1/2, \quad k = 1, \dots, N. \tag{1}$$

Definition 1. The random variable $Y := t(X_1, \dots, X_N)$, where $t$ is a real-valued measurable map, is called a statistic. A statistic $T$ taking only the values 0 and 1 is called a test. Thereby, the events $\{T = 1\}$ and $\{T = 0\}$ are interpreted as "reject $H_0$" and "do not reject $H_0$", respectively. If the probability of rejecting $H_0$ even though it is true is at most $\alpha \in (0,1)$, i.e. $P_0(T = 1) \le \alpha$ for all $P_0 \in \mathcal{P}_0$, the test is called a significance test at level $\alpha$.

Remark 2. Let $T = t(X_1, \dots, X_N)$ be a significance test at level $\alpha$.
(1) For a given realization $x_1, \dots, x_N$, $H_0$ is rejected if $t(x_1, \dots, x_N) = 1$.
(2) Common choices for $\alpha$ are 0.05, 0.01 or 0.001.
(3) Rejecting $H_0$ at level, say, $\alpha = 0.001$ does not mean that it is not true. It just means that the observed sequence would be very unlikely if $H_0$ were true: in that case $H_0$ would be rejected at most once in a thousand test runs, on average. It does happen nonetheless. On the other hand, not rejecting $H_0$ does not mean it is true. The sequence was just "not bad enough" to be rejected.

Now consider a statistic $Y$ whose distribution under $H_0$ is known, with distribution function $F_Y(y) := P_0(Y \le y)$. Then $U := F_Y(Y)$ is called the p-value of the statistic $Y$. If $F_Y$ is continuous, $U$ has a Uniform$[0, 1]$ distribution (under $H_0$). In this sense, p-values can be seen as standardized statistics. Moreover,
$$P_0(\alpha/2 \le U \le 1 - \alpha/2) = 1 - \alpha \qquad (\alpha \in (0,1),\ P_0 \in \mathcal{P}_0), \tag{2}$$
i.e.
$$T := \begin{cases} 0 & \text{if } \alpha/2 \le U \le 1 - \alpha/2, \\ 1 & \text{otherwise} \end{cases} \tag{3}$$
defines a significance test at level $\alpha$. Thus, $H_0$ is rejected if the p-value $U$ is close to 0 or 1. For general $F_Y$ define a left and a right p-value of $Y$ by
$$U_L := U = F_Y(Y) \tag{4}$$
and
$$U_R := \bar{F}_Y(Y),$$
respectively, where $\bar{F}_Y(y) := P_0(Y \ge y) = 1 - F_Y(y) + P_0(Y = y)$. Then reject $H_0$ if either the left or the right p-value is close to 0 (see L'Ecuyer and Simard [6]).

Example: The $\chi^2$-goodness-of-fit test
Let $X_1, \dots, X_N$ be a sequence of i.i.d. copies of a random variable $X$ and $H_0 : P = P_0$ the null hypothesis to be tested. For $1 \le k \le K$ define $p_k := P_0(X \in A_k)$, where $A_1, \dots, A_K$ is a finite partition of $\mathbb{R}$, and let
$$C_k := \sum_{n=1}^{N} \mathbb{1}_{\{X_n \in A_k\}} \qquad (1 \le k \le K) \tag{5}$$
be the number of times the $X_n$ fall into $A_k$, where $\mathbb{1}_B$ denotes the indicator function of the event $B$. Then, under $H_0$, $C_k$ follows a binomial distribution $B(N, p_k)$ and
$$Y := \sum_{k=1}^{K} \frac{(C_k - \mathrm{E}C_k)^2}{\mathrm{E}C_k} = \sum_{k=1}^{K} \frac{(C_k - N p_k)^2}{N p_k} \tag{6}$$
is approximately $\chi^2$-distributed with $K - 1$ degrees of freedom (as $N \to \infty$). Thus $H_0$ is rejected if the p-value $U := F_{\chi^2(K-1)}(Y)$ is really close to 0 or 1, where $F_{\chi^2(r)}$ denotes the (continuous) distribution function of the $\chi^2$-distribution with $r$ degrees of freedom.

2.3. Specialties for Testing of RNGs
As seen in the previous subsection, any statistic $Y$ whose distribution under $H_0$ is known, at least approximately, can be used to define a test. Hence it will be called a test statistic. In fact, for the testing of RNGs, a test can still be defined even if the distribution of $Y$ is not (approximately) known. A number of generators may be used to estimate the distribution, and if these estimates are consistent across a variety of different and "presumably good" generators, the estimate may serve as the $H_0$ target distribution. In essence, statistical testing of RNGs is nothing but a particular kind of Monte Carlo simulation. Conversely, when testing an RNG for suitability with respect to a particular Monte Carlo problem, running the simulation with a related but simplified model, that is, one where the distribution of the result can be obtained theoretically, may serve as a test. Even if the distribution is not known, the results of the designated RNG can still be compared to the ones produced by a few other "good" generators of quite different designs.

Most statistical tests for RNGs utilize the concept of a p-value. P-values of single tests should not only be in the proper range (not too close to 0 or 1), but should also be uniformly distributed on $[0, 1)$. Therefore, it might be useful to run the same test many times independently, i.e. on different parts of the original sequence. A number of independent first-level p-values can then be assessed by a second-level uniformity test resulting in an overall p-value. For example, while none of the single values of a sequence of supposedly independent p-values, say 0.24, 0.26, 0.23, 0.21, gives any reason to question $H_0$, the whole sequence does reveal a very non-uniform and/or non-independent behaviour. This second-level test of uniformity could be the above-mentioned $\chi^2$-goodness-of-fit test, but usually an Anderson-Darling or a Kuiper version of the Kolmogorov-Smirnov test is applied.

Very often RNGs are tested against whole batteries of tests, and therefore p-values close to 0 or 1 are not too uncommon even for good (including perfect) generators. Therefore, the choice of $\alpha$ could be somewhat different from the ones for usual statistical testing mentioned earlier. If the final (first- or second-level) p-value of a test is really close to 0 or 1, the RNG is said to fail the test. If the p-value is suspicious, the test is repeated and/or the sample size is increased, and often things will then clarify. Otherwise the RNG is said to have passed the test. But remember that this does not mean that the null hypothesis is true. The meaning of "really close" and "suspicious" should be made clear before running the test, of course. Test batteries usually have some suggested values for that purpose.

For some applications there is a need to assess much larger sequences than is feasible due to memory limitations. These limitations can be overcome by performing the same test many times on different subsequences and then combining the results. One way is the above-mentioned second-level uniformity test. But there is another useful approach. For most statistical tests the target ($H_0$-) distribution of the test statistic $Y$ is either a normal, a $\chi^2$- or a Poisson distribution. Thus, instead of calculating a p-value for each test and then performing a second-level test, all the single $Y$'s could be added, the sum again being normal, $\chi^2$ or Poisson, respectively. This sum statistic could then, in turn, be used to give the final p-value, see [10].
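The second-level idea can be made concrete with the four p-values 0.24, 0.26, 0.23, 0.21 used as an example above. The following is a minimal sketch, not the procedure of any particular test battery: it uses the plain two-sided Kolmogorov-Smirnov statistic rather than the Anderson-Darling or Kuiper variants, and it bounds the second-level p-value by twice the exact one-sided tail probability (Birnbaum-Tingey formula), which is an upper bound by the union bound. All function names are hypothetical.

```python
from math import comb, floor


def ks_statistic(p_values):
    """Two-sided Kolmogorov-Smirnov distance between the empirical
    distribution of the p-values and the Uniform[0,1] distribution."""
    xs = sorted(p_values)
    n = len(xs)
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def one_sided_tail(d, n):
    """Exact P(D_n^+ >= d) for 0 < d <= 1 and n uniform observations
    (Birnbaum-Tingey formula)."""
    total = 0.0
    for j in range(floor(n * (1 - d)) + 1):
        total += comb(n, j) * (d + j / n) ** (j - 1) * (1 - d - j / n) ** (n - j)
    return d * total


def second_level_p_value(p_values):
    """Upper bound on the two-sided KS p-value of the first-level p-values."""
    d = ks_statistic(p_values)
    return min(1.0, 2 * one_sided_tail(d, len(p_values)))


if __name__ == "__main__":
    first_level = [0.24, 0.26, 0.23, 0.21]
    print(ks_statistic(first_level), second_level_p_value(first_level))
```

For these four values the KS distance is 0.74 and the bound on the second-level p-value is below 0.01, although each single value lies comfortably inside (0.01, 0.99).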
Example: The Birthday Spacings Test

This test was introduced by George Marsaglia in 1984 [3] and uses the fact that (for large $n$) the number of collisions among the spacings induced by the order statistics of $m$ independent Uniform$\{0, \dots, n-1\}$ random variables is approximately Poisson with mean $\lambda = m^3/(4n)$. The proof of the corresponding limit theorem (Theorem 3 below) was never published, though. A different proof was provided by Klykova in 2002 [11]. Let $U_1, \dots, U_m$ (the birthdays) be independent and uniformly distributed on $\{0, \dots, n-1\}$ (the year),
$$0 =: U_{(0)} \le U_{(1)} \le \dots \le U_{(m)} \le U_{(m+1)} := n - 1 \tag{7}$$
the corresponding order statistics, $S_1, \dots, S_{m+1}$ the induced spacings, i.e. $S_k := U_{(k)} - U_{(k-1)}$, and $C$ the number of collisions among those spacings, i.e.
$$C := \sum_{k=1}^{m} \mathbb{1}_{\{S_{(k+1)} = S_{(k)}\}}, \tag{8}$$
where $\mathbb{1}_A$ stands for the indicator function of the event $A$ and $S_{(1)}, \dots, S_{(m+1)}$ are the order statistics of the spacings $S_1, \dots, S_{m+1}$. We call the distribution of $C$ the birthday spacings collision distribution and write $C \sim \mathrm{BSC}(m, n)$.

Theorem 3. If $C_n \sim \mathrm{BSC}(m_n, n)$ for all $n \in \mathbb{N}$ such that $m_n^3/(4n) \to \lambda$ as $n \to \infty$, then
$$P(C_n = k) \xrightarrow[n \to \infty]{} \frac{\lambda^k}{k!}\, e^{-\lambda} \qquad (k \ge 0). \tag{9}$$
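The collision count $C$ from (7) and (8) and the limiting Poisson probabilities from (9) can be sketched in a few lines. This is an illustrative sketch only; the function names are hypothetical, and the boundary convention $U_{(0)} := 0$, $U_{(m+1)} := n - 1$ follows definition (7).

```python
from math import exp, factorial


def birthday_spacings_collisions(birthdays, n):
    """Number of collisions C among the sorted spacings, following (7)-(8)."""
    # Order statistics with the boundary values U_(0) := 0 and U_(m+1) := n - 1.
    u = [0] + sorted(birthdays) + [n - 1]
    # Spacings S_k = U_(k) - U_(k-1), sorted to obtain S_(1) <= ... <= S_(m+1).
    spacings = sorted(b - a for a, b in zip(u, u[1:]))
    # Count the positions k with S_(k+1) = S_(k).
    return sum(1 for s, t in zip(spacings, spacings[1:]) if s == t)


def poisson_mean(m, n):
    """Mean lambda = m^3 / (4n) of the limiting Poisson distribution."""
    return m ** 3 / (4 * n)


def poisson_pmf(k, lam):
    """Limit probability lambda^k / k! * exp(-lambda) from (9)."""
    return lam ** k / factorial(k) * exp(-lam)


if __name__ == "__main__":
    # Tiny hand-checkable example: year of length 16, birthdays 3, 6 and 9.
    print(birthday_spacings_collisions([9, 3, 6], 16))
    print(poisson_mean(2 ** 12, 2 ** 32))
```

In the small example the order statistics are 0, 3, 6, 9, 15, the sorted spacings are 3, 3, 3, 6, and two adjacent pairs coincide, so $C = 2$.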
How does the test work? In [8] Marsaglia and Tsang suggest the following version of the Birthday Spacings Test. Choose $m = 2^{12} = 4096$, $n = 2^{32}$ and therefore $\lambda = 4$. Let the RNG generate 4096 32-bit integers, sort them and take differences to get the spacings. Then sort the spacings and count how many times adjacent spacings are equal. This gives the number of collisions. Repeat this process 5000 times to get a realization of a sequence of supposedly independent Poisson random variables with mean $\lambda = 4$. Then perform a $\chi^2$-goodness-of-fit test as follows: For $k = 0, \dots, 9$ let $N_k$ be the number of times there were $k$ collisions among the 5000 runs and let $N_{10}$ be the number of times there were more than 9 collisions. Also let $E_k := 5000\, p_k$ be the expected value of $N_k$ (under $H_0$), where $p_k := \frac{\lambda^k}{k!} e^{-\lambda}$ ($k = 0, \dots, 9$) is the probability of a Poisson random variable taking the value $k$, and $p_{10} := 1 - \sum_{k=0}^{9} p_k$. Then
$$Y := \sum_{k=0}^{10} \frac{(N_k - E_k)^2}{E_k} \tag{10}$$
follows approximately a $\chi^2$-distribution with 10 degrees of freedom. Therefore the test is failed if the p-value $U := F_{\chi^2(10)}(Y)$ is really close to 0 or 1, where $F_{\chi^2(r)}$ denotes the (continuous) distribution function of the $\chi^2$-distribution with $r$ degrees of freedom. Instead of performing a $\chi^2$-goodness-of-fit test on those 5000 collision counts, they could also be added, the sum being Poisson with mean $4 \cdot 5000 = 20000$. Even though it might not be feasible to calculate the corresponding p-value directly, this could be done by a normal approximation. In [10], L'Ecuyer and Simard design birthday spacings tests for testing uniformity in $d$ dimensions by dividing the corresponding $d$-dimensional hypercube into hyperboxes of equal size and then enumerating them in the natural order. Tests of this kind are implemented in TestU01 [6].

3. Test Results
Good statistical properties of the sequences it produces are not sufficient, but certainly necessary; any RNG aspiring to be called good therefore has to undergo intensive and extensive empirical testing. In this section we present the results of applying the test batteries from the RNG software package TestU01 to a family of generators first introduced by Abundo, Accardi and Auricchio [12] and currently being developed further at the Centro Interdipartimentale Vito Volterra (CIVV) in Roma, Italy. They are compared to the results of a few other popular generators, namely the Wichmann-Hill generator [13] used in Microsoft Excel 2003; CombLec88, a combination of two linear congruential generators proposed by L'Ecuyer [14] and used in RANLIB, CERNLIB, Boost, Octave and Scilab; KISS99, a combination of four different generators designed by Marsaglia [15]; the 2002 version of the Mersenne Twister generator; and ISAAC, proposed by Jenkins for use in cryptography [16]. The MT19937_2002 generator is described in [17], has good statistical properties and has become quite popular in software for numerical simulation (like MATLAB, for example). It is not unpredictable, though, and thus not recommended for cryptographic purposes.
Table 1. Results of the test battery Crush (software package TestU01 1.2.1). It produces 144 p-values from 96 independent tests, thereby sampling around 128 GByte of random bits (> 15 double layer DVDs).

Generator       Clear failures   Suspect (of which failed)   Total failed
QPdyn           0                3 (0)                       0
QPdyn-s         6                12 (8)                      14
MSExcel-2003    12               7 (*)                       12*
CombLec88       1                5 (1)                       2
Kiss99          0                2 (0)                       0
MT19937_2002    2                4 (0)                       2
ISAAC           0                3 (0)                       0

Note: * no follow-up testing done
Table 1 shows the results of applying the test battery Crush from the RNG software package TestU01 to QPdyn and QPdyn-s. Each test from the battery results in (at least) one p-value, a number in [0, 1]. It is interpreted according to table 2.

Table 2. Interpretation of p-values.

p-value                     Interpretation
0.01 < p < 0.99             Clear pass
p or (1 - p) < 10^{-10}     Clear failure
Otherwise                   Suspect behaviour; the test is repeated 3 times independently. If it continues to show suspect results, it is considered failed, otherwise passed.
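The first-level interpretation rule of table 2 amounts to a simple three-way classification, sketched below. The function name and the parameter names are hypothetical, the thresholds are those of table 2, and the follow-up procedure for suspect p-values (three independent repetitions) is deliberately not modelled here.

```python
def interpret_p_value(p, clear=1e-10, low=0.01, high=0.99):
    """Classify a single p-value according to table 2.

    Returns "pass" for 0.01 < p < 0.99, "fail" if p or 1 - p is below
    10^-10, and "suspect" in all remaining cases.
    """
    if low < p < high:
        return "pass"
    if p < clear or (1.0 - p) < clear:
        return "fail"
    return "suspect"


if __name__ == "__main__":
    for p in (0.5, 0.005, 1e-12, 0.995):
        print(p, interpret_p_value(p))
```

A battery summary in the style of table 1 could then be obtained by counting how many of the reported p-values fall into each of the three classes.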
As mentioned in section 2, Marsaglia and Tsang [8, 9] proposed three tests that they consider to be a concentration of DIEHARD and other statistical tests used today. However, the generator QPdyn-s passes these three tests but at the same time fails the test battery Crush quite badly, as can be seen from table 1. This might be a starting point for comparing the three-test battery of Marsaglia and Tsang with the battery Crush on many different generators, in order to decide which is more stringent. Or, more likely, they will be found to be somewhat complementary, in the sense that each battery is specialized in finding certain deficiencies which the other one does not detect. In this case an appropriate combination of both batteries should be considered.
4. Summary and Outlook

The software package TestU01 should be considered the standard for testing RNGs. The particular version of QP-Dyn tested in the previous section passes all of the tests. But this is only the beginning. Eventually, QP-Dyn is intended to become an algorithm that automatically, but somewhat randomly, chooses parameters to generate not random numbers but random number generators. The idea of using computers to find good parameters for families of generators is also mentioned in Section 3.7 of [18]. In a forthcoming paper we will study how to condense statistical testing in order to test QP-Dyn considered as a random RNG generator.

Acknowledgements

This work was supported by the Centro Interdipartimentale Vito Volterra at the Università degli Studi di Roma Tor Vergata, Italy, and the European Union Research Training Network "Quantum Probability with Applications to Physics, Information Theory and Biology".

References
1. Donald E. Knuth. The art of computer programming. Vol. 2: Seminumerical algorithms. 3rd ed. Bonn: Addison-Wesley. xiii, 762 p., 1998.
2. Michael Mascagni and Ashok Srinivasan. Algorithm 806: SPRNG: a scalable library for pseudorandom number generation. ACM Transactions on Mathematical Software, 26(3):436-461, September 2000. See correction 19.
3. George Marsaglia. A current view of random number generators. In L. Billard, editor, Computational Science and Statistics: The Interface, pages 3-10, Amsterdam, 1985. Elsevier Press.
4. George Marsaglia. DIEHARD: A battery of tests of randomness, 1996.
5. A. Rukhin et al. A statistical test suite for random and pseudorandom number generators for cryptographic applications. NIST Special Publication 800-22, National Institute of Standards and Technology (NIST), Gaithersburg, Maryland, USA, 2001.
6. Pierre L'Ecuyer and Richard Simard. TestU01: A C library for empirical testing of random number generators. ACM Transactions on Mathematical Software, 2007.
7. W. W. Tsang, L. C. K. Hui, K. P. Chow, C. F. Chong, and C. W. Tso. Tuning the collision test for power. In ACSC '04: Proceedings of the 27th Australasian conference on Computer science, pages 23-30, Darlinghurst, Australia, 2004. Australian Computer Society, Inc.
8. George Marsaglia and Wai Wan Tsang. Some difficult-to-pass tests of randomness. Journal of Statistical Software, 7(3):1-8, January 2002.
9. Wai Wan Tsang. Development of cryptographic random number generators. Technical report, The University of Hong Kong, Department of Computer Science and Information Systems, August 2003.
10. Pierre L'Ecuyer and Richard Simard. On the performance of birthday spacings tests with certain families of random number generators. Math. Comput. Simul., 55(1-3):131-137, 2001.
11. N. V. Klykova. Limit distribution of a number of coinciding intervals. Theory Probab. Appl., 47(1):151-156, 2002.
12. M. Abundo, Luigi Accardi, and A. Auricchio. Hyperbolic automorphisms of tori and pseudo-random sequences. Calcolo, 29(3-4):213-240, 1992.
13. B. A. Wichmann and I. D. Hill. An efficient and portable pseudo-random number generator. Applied Statistics, 31:188-190, 1982. See also corrections and remarks in the same journal by Wichmann and Hill, 33 (1984) 123; McLeod 34 (1985) 198-200; Zeisel 35 (1986) 89.
14. P. L'Ecuyer. Efficient and portable combined random number generators. Communications of the ACM, 31(6):742-749 and 774, 1988. See also the correspondence in the same journal, 32, 8 (1989) 1019-1024.
15. G. Marsaglia. Random numbers for C: The END? Posted to the electronic billboard sci.crypt.random-numbers, January 20 1999.
16. B. Jenkins. ISAAC. In Dieter Gollmann, editor, Fast Software Encryption, Proceedings of the Third International Workshop, Cambridge, UK, volume 1039 of Lecture Notes in Computer Science, pages 41-49. Springer-Verlag, 1996. http://burtleburtle.net/bob/rand/isaacafa.html.
17. Makoto Matsumoto and Takuji Nishimura. Mersenne twister: a 623-dimensionally equidistributed uniform pseudo-random number generator. ACM Trans. Model. Comput. Simul., 8(1):3-30, 1998.
18. P. L'Ecuyer. Uniform random number generation. In S. G. Henderson and B. L. Nelson, editors, Simulation, Handbooks in Operations Research and Management Science, chapter 3, pages 55-81. Elsevier, Amsterdam, The Netherlands, 2006.
19. Michael Mascagni and Ashok Srinivasan. Corrigendum: Algorithm 806: SPRNG: a scalable library for pseudorandom number generation. ACM Transactions on Mathematical Software, 26(4):618-619, December 2000. See 2.
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 129- 135)
ENTANGLED EFFECTS OF TWO CONSECUTIVE PAIRS IN RESIDUES AND ITS USE IN ALIGNMENT

TOSHIHIDE HARA*, KEIKO SATO and MASANORI OHYA
Department of Information Sciences, Tokyo University of Science, 2641 Yamazaki, Noda City, Chiba, Japan
* E-mail: [email protected]. tus.ac.jp

In the previous paper [1], we showed that sequence alignment can be significantly improved by considering the entanglement between two consecutive pairs of residues; there we introduced a new measure, defined on compound systems of two sequences, that takes this entanglement into account. In this paper, we explain the advantage of our new measure compared with the commonly used "sum of pairs" measure, and we also show the performance evaluation of our algorithm against six other alignment methods. Our alignment is available at our webpage: http://www.rs.noda.tus.ac.jp/%7Eohya-m/
Keywords: Sequence analysis; Sequence alignment; Entanglement; Transition probability.
1. Introduction
According to Anfinsen's dogma [2], for a small globular protein, its three-dimensional structure is determined by the amino acid sequence of the protein. There may exist intersite correlations, at least for two consecutive pairs of residues. Gonnet et al. considered this possibility [3]. We could improve alignment accuracy by taking into account the information of the intersite correlations. In the papers [1,4], we developed the alignment method called MTRAP based on this concept. For aligning the sequences, general methods find a path that gives the minimum value of the sum of differences (the maximum value of the sum of similarities) over the residue pairs of two sequences. Our method changes the way of defining the difference (and so the sum) above by introducing entanglement of two consecutive pairs of residues. We explain the advantage of our new measure compared with the commonly used "sum of pairs" measure, and we also show the performance evaluation of our algorithm against six other alignment methods.

2. A new measure taking entangled effects of two consecutive pairs in residues
First, let us establish some notation. Let $\Omega$ be the set of all amino acids, and let $\Omega^*$ be $\Omega$ together with the indel (gap) "*": $\Omega^* \equiv \Omega \cup \{*\}$. Let $[\Omega^*]$ be the set of all sequences of elements of $\Omega^*$. We call an element of $\Omega$ a residue and an element of $\Omega^*$ a symbol. In addition, let $\Gamma \equiv \Omega \times \Omega$ be the direct product of two copies of $\Omega$, and let $\Gamma^* \equiv \Omega^* \times \Omega^*$. Consider two arranged sequences, $A = a_1 a_2 \cdots a_n$ and $B = b_1 b_2 \cdots b_n$, both of length $n$, where $a_i, b_j \in \Omega^*$. We also denote the pair of sequences by $u_1 u_2 \cdots u_n$, where $u_i = (a_i, b_i) \in \Gamma^*$, and we call $u_i$ a site in the following discussion. In general, the relative likelihood that the sequences are related as opposed to being unrelated is known as the "odds ratio":

$$R(A,B) = \frac{p(A;B)}{p(A)\,p(B)} = \frac{p(a_1, a_2, \cdots, a_n; b_1, b_2, \cdots, b_n)}{p(a_1, a_2, \cdots, a_n)\, p(b_1, b_2, \cdots, b_n)}. \tag{1}$$

Here, $p(a)$ is the occurrence probability of the given segment and $p(a;b)$ is the joint probability that the two segments occur. In order to arrive at an additive scoring system, Equation (1) is typically simplified by assuming that the substitutions are independent of the location and that there are no intersite correlations; namely, $p(A) = \prod_i p(a_i)$, $p(B) = \prod_i p(b_i)$ and $p(A;B) = \prod_i p(a_i; b_i)$. Thus the logarithm of Equation (1), known as the log-odds ratio, is now a sum of independent parts:

$$\log R(A,B) = \sum_{i=1}^{n} s(a_i, b_i), \tag{2}$$

where

$$s(a,b) = \log \frac{p(a;b)}{p(a)\,p(b)} \tag{3}$$

is the log likelihood ratio of the symbol pair $(a,b)$ occurring as an aligned pair to that occurring as an unaligned pair. The quantity $s(a,b)$ is called a score and $S = (s(a,b))$ is called a substitution matrix. These quantities (Equations (2) and (3)) are used to define a measure for pairwise sequence alignment [5]. Here, we define a normalized substitution matrix (i.e., every element takes a value between 0 and 1) and define a difference of $A$ and $B$.
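As a small illustration of Equations (2) and (3), the following Python sketch computes per-site log-odds scores and sums them over an alignment. The two-letter alphabet and all probabilities are made-up values chosen only for the example, and the function names `log_odds_score` and `log_odds_ratio` are hypothetical; nothing here is taken from the MTRAP implementation.

```python
import math

def log_odds_score(p_joint, p_a, p_b):
    """Score s(a, b) = log( p(a; b) / (p(a) p(b)) ), as in Equation (3)."""
    return math.log(p_joint / (p_a * p_b))

def log_odds_ratio(pairs, joint, marginal):
    """Equation (2): the log-odds ratio as a sum of independent per-site scores."""
    return sum(
        log_odds_score(joint[(a, b)], marginal[a], marginal[b])
        for a, b in pairs
    )

# Hypothetical two-letter alphabet with invented probabilities.
marginal = {"A": 0.5, "G": 0.5}
joint = {("A", "A"): 0.4, ("G", "G"): 0.4, ("A", "G"): 0.1, ("G", "A"): 0.1}

# Three aligned sites: two matches and one mismatch.
alignment = [("A", "A"), ("G", "G"), ("A", "G")]
total = log_odds_ratio(alignment, joint, marginal)
print(total)
```

In this toy setting a matched pair scores positively and the mismatched pair negatively, which is the behaviour an additive scoring system relies on.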
Let $f_s : [s_{\min}, s_{\max}] \to \mathbb{R}$ be a normalizing function:

$$f_s(x) \equiv \frac{s_{\max} - x}{s_{\max} - s_{\min}}, \qquad 0 \le f_s(x) \le 1, \tag{4}$$

where

$$s_{\max} \equiv \max\Big\{\max_{u \in \Gamma}\{s(u)\},\ \text{gap cost}\Big\}, \tag{5}$$

$$s_{\min} \equiv \min\Big\{\min_{u \in \Gamma}\{s(u)\},\ \text{gap cost}\Big\}. \tag{6}$$

Let us put $\tilde{s}(a,b) \equiv f_s(s(a,b))$ for $a, b \in \Omega$. This $\tilde{s}(a,b)$ is a normalized expression of the score $s(a,b)$. By using this quantity, we define a normalized substitution matrix as $M = (\tilde{s}(a,b))$. Then a difference of $A$ and $B$ is defined by

$$d_{\mathrm{sub}}(A,B) = \sum_{i=1}^{n} \tilde{s}(a_i, b_i). \tag{7}$$

When the sequence $A$ is equal to $B$, the difference $d_{\mathrm{sub}}(A,B)$ has the minimum value 0.

One of the essential assumptions of the above approach (using a sum of independent parts as a difference of $A$ and $B$) was the independence of the occurrence probabilities. We could take a more informative approach by including the intersite correlations. Crooks et al. tried one such approach [6]. They introduced a measure for two sequences based on a multivariate probability approximated by using the intersite relative likelihood. However, they concluded that their approach is statistically indistinguishable from the standard algorithm. We feel that their measure (Equation (4) in their paper) is different from ours. To introduce their measure, they defined a type of joint probability. However, it cannot be a probability, because their quantity is a product of likelihood "ratios", so it can exceed 1. Moreover, we think that the intersite relative likelihood may not describe the difference of the sequences $A$ and $B$. Under the assumption that each site of the sequences has the Markov property, we propose a new measure for two sequences by adding a transition effect and its weight $\varepsilon$ (a degree of mixture):
$$R_{\mathrm{our}}(A,B) = R(A,B)^{1-\varepsilon}\, R_t(A,B)^{\varepsilon}, \tag{8}$$

where

$$R_t(A,B) \equiv \prod_{i=1}^{n-1} p(u_{i+1} \mid u_i). \tag{9}$$
Here we introduce a normalized transition $\tilde{t}(u_i, u_{i+1})$, called the "Transition Quantity", in order to simplify the equation. Let $\tilde{t}(u_i, u_{i+1})$ be a normalized transition defined as

$$\tilde{t}(u_i, u_{i+1}) \equiv f_t(t(u_i, u_{i+1}); u_i), \tag{10}$$

where

$$t(u_i, u_{i+1}) \equiv \log p(u_{i+1} \mid u_i), \tag{11}$$

and $f_t(x; u)$ is a normalizing function:

$$f_t(x; u) = \begin{cases} \dfrac{-x}{\max_{v \in \Gamma^*}\{-t(u,v)\}}, & \text{if } x > 0, \\[2ex] 1, & \text{otherwise.} \end{cases} \tag{12}$$

By using the above quantity, a difference of $A$ and $B$ representing the "intersite transition" is defined as

$$d_{\mathrm{trans}}(A,B) = \sum_{i=1}^{n-1} \tilde{t}(u_i, u_{i+1}). \tag{13}$$

Consequently, we define a difference measure for two sequences by a combination of the two differences $d_{\mathrm{sub}}$ and $d_{\mathrm{trans}}$:

$$d_{\mathrm{MTRAP}}(A,B) = (1-\varepsilon)\, d_{\mathrm{sub}}(A,B) + \varepsilon\, d_{\mathrm{trans}}(A,B). \tag{14}$$

The mathematical details can be found in the paper [4].
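The construction of this section can be sketched in Python for a toy alignment. This is only an illustration under stated assumptions, not the MTRAP code: the scores, the gap cost and the conditional probabilities are invented; sites containing a gap are scored with the gap cost; and the branch of the normalizing function in Equation (12) is read here as "the transition has positive probability", with unseen transitions assigned the maximal value 1. All function names are hypothetical.

```python
import math

GAP = "*"

def normalized_score(s, s_min, s_max):
    """Equation (4): map a raw score into [0, 1]; 0 is the best score."""
    return (s_max - s) / (s_max - s_min)

def d_sub(sites, score, gap_cost):
    """Equation (7): sum of normalized substitution scores over all sites."""
    s_max = max(max(score.values()), gap_cost)   # Equation (5)
    s_min = min(min(score.values()), gap_cost)   # Equation (6)
    total = 0.0
    for u in sites:
        raw = gap_cost if GAP in u else score[u]  # assumption: gapped sites use the gap cost
        total += normalized_score(raw, s_min, s_max)
    return total

def transition_quantity(u, v, cond_prob):
    """Normalized transition from site u to site v (Equations (10)-(12))."""
    row = cond_prob.get(u, {})
    p = row.get(v, 0.0)
    if p <= 0.0:
        return 1.0  # assumption: unseen transitions get the maximal difference
    worst = max(-math.log(q) for q in row.values() if q > 0.0)
    if worst == 0.0:
        return 0.0  # the only possible successor has probability 1
    return -math.log(p) / worst

def d_trans(sites, cond_prob):
    """Equation (13): sum of transition quantities over consecutive sites."""
    return sum(transition_quantity(u, v, cond_prob)
               for u, v in zip(sites, sites[1:]))

def d_mtrap(sites, score, gap_cost, cond_prob, eps):
    """Equation (14): mixture of the two differences with weight eps."""
    return (1 - eps) * d_sub(sites, score, gap_cost) + eps * d_trans(sites, cond_prob)

# Toy data: invented scores and conditional probabilities p(v | u).
AA, AG, GG = ("A", "A"), ("A", "G"), ("G", "G")
score = {AA: 2.0, GG: 2.0, AG: -1.0}
cond_prob = {
    AA: {AA: 0.5, AG: 0.25, GG: 0.25},
    AG: {GG: 0.5, AA: 0.25, AG: 0.25},
}
sites = [AA, AG, GG]
result = d_mtrap(sites, score, gap_cost=-3.0, cond_prob=cond_prob, eps=0.25)
print(result)
```

With `eps = 0` the measure reduces to the substitution-only difference, which makes the role of the mixture weight easy to inspect on small examples.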
3. Evaluation

We compared the performance of MTRAP to those of the six most often used methods: ClustalW2, MAFFT, TCoffee, DIALIGN, MUSCLE and Probcons. The details of these six methods are: (1) ClustalW2 [7,8], a typical progressive multiple alignment method; (2) MAFFT ver. 6 [9], a fast method with a Fourier transform algorithm; (3) TCoffee ver. 5.31 [10], a heuristic consistency-based method that combines global and local alignments; (4) DIALIGN ver. 2.2 [11], a method with a segment-segment approach; (5) MUSCLE ver. 3.7 [12], a method with the Log-Expectation algorithm; and (6) Probcons ver. 1.12 [13], a probabilistic consistency-based method. All programs except MAFFT were run with their default parameters, and MAFFT was run in the "L-INS-i" strategy mode.

To measure the accuracy of each method, we used three different databases: HOMSTRAD (version November 1, 2008) [14,15], PREFAB 4.0 [12] and BAliBASE 3.0 [16]. These are databases of structure-based alignments for homologous families. We used all 630 pairwise alignments obtained from HOMSTRAD for pairwise alignment tests, and all 1031 multiple alignments obtained from this database for multiple alignment tests. We also used all 1682 protein pairs obtained from PREFAB 4.0 for pairwise alignment tests. BAliBASE 3.0 contains 5 different reference sets of alignments for testing multiple sequence alignment methods. We used the BBS sets included in references 1, 2, 3 and 5. The BBS sets are described as being trimmed to homologous regions. In order to avoid using the same dataset for training and test, we estimated the transition quantity by using the superfamilies subset of the dataset SABmark, which is described in the section "Estimation of the Transition Quantity". We also used this dataset for optimizing the parameters.

Alignment accuracy was calculated with the Q (quality) score [12]. The Q score is defined as the ratio of the number of correctly aligned residue pairs in the test alignment (i.e., the alignment obtained by a specified algorithm such as MTRAP, Needle, etc.) to the total number of aligned residue pairs in the reference alignment. When all pairs are correctly aligned, the score has the maximum value 1, and when no pairs are correctly aligned the score has the minimum value 0. This score has previously been called the SPS (Sum of Pairs Score) [17] or the developer score [18]. Let us redefine this score in our notation. Let $A_i$ ($i = 1, \ldots, N$) denote the $i$th sequence of the reference alignment with length $L$, and let $a_{ik} \in \Omega^*$ denote the $k$th symbol in the sequence $A_i$. When $a_{ik} \ne *$, we need the number of the site in the test alignment corresponding to the symbol $a_{ik}$; this number is denoted by $n_{ik}$. When $a_{ik} = *$, put $n_{ik} = 0$ ($i = 1, \cdots, N$). Then the Q score is given as

$$Q\ \mathrm{score} = \frac{\displaystyle\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \sum_{k=1}^{L} \Delta_{a_{ik}, a_{jk}}\, \delta_{n_{ik}, n_{jk}}}{\displaystyle\sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \sum_{k=1}^{L} \Delta_{a_{ik}, a_{jk}}},$$

where $\delta$ is the Kronecker delta and

$$\Delta_{x,y} = \begin{cases} 1, & x \ne * \text{ and } y \ne *, \\ 0, & x = * \text{ or } y = *. \end{cases}$$
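The Q score defined above can be computed directly from two gapped alignments of the same sequences. The following Python sketch is one possible way to do so, assuming each alignment is given as a list of equal-length strings with "*" as the gap symbol; the function names and the example alignments are hypothetical.

```python
from itertools import combinations

GAP = "*"

def residue_columns(row):
    """Column index (site number) of each residue of a gapped row, in residue order."""
    return [col for col, sym in enumerate(row) if sym != GAP]

def q_score(reference, test):
    """Fraction of residue pairs aligned in the reference that are also aligned in the test."""
    test_cols = [residue_columns(row) for row in test]
    correct = 0
    total = 0
    for i, j in combinations(range(len(reference)), 2):
        ri = rj = 0  # residues of sequences i and j consumed so far
        for a, b in zip(reference[i], reference[j]):
            if a != GAP and b != GAP:
                total += 1  # Delta = 1: an aligned residue pair in the reference
                if test_cols[i][ri] == test_cols[j][rj]:
                    correct += 1  # delta = 1: same site in the test alignment
            if a != GAP:
                ri += 1
            if b != GAP:
                rj += 1
    return correct / total if total else 0.0

# Invented example: the test alignment misplaces one gap.
reference = ["AC*GT", "ACTGT"]
test = ["ACG*T", "ACTGT"]
print(q_score(reference, test))  # 0.75
```

Three of the four residue pairs aligned in the reference are recovered by the test alignment, hence the value 0.75; comparing an alignment with itself gives 1.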
4. Results
We compared MTRAP with six different alignment methods by using all 1682 protein pairs of PREFAB 4.0 and all 630 protein pairs of HOMSTRAD. We used the GONNET250 matrix with MTRAP. The similarity between the test alignment (the sequence alignment produced by each method) and the reference alignment (obtained from PREFAB 4.0 or HOMSTRAD) was measured with the Q score. Supposing, as usual, that the reference alignment is the optimal alignment, the results for PREFAB 4.0 (Table 1) and for HOMSTRAD (Table 2) indicate that our method works well compared with the other methods. Our method achieves the highest ranking among all methods in every range except one, the range 30-45% of HOMSTRAD. In particular, for the identity range 0-15%, MTRAP is about 4 to 5% more accurate than the second-ranked method. For the identity range 30-45% of HOMSTRAD, Probcons performs slightly better (about 1%).

Table 1. Average Q scores on the PREFAB 4.0 database

Method       0-15% (212)   15-30% (458)   30-45% (74)   All (1682)   CPU
MTRAP(a)     0.248†        0.674†         0.877†        0.615†       120
MAFFT        0.170         0.671          0.860         0.568        200
DIALIGN(b)   0.133         0.556          0.814         0.518        100
MUSCLE       0.205         0.632          0.867         0.581        35
ClustalW2    0.199         0.644          0.859         0.586        70
Probcons     0.204         0.647          0.875         0.590        120
TCoffee      0.198         0.642          0.872         0.585        180

Note: The average Q scores of four testing datasets with different identity ranges on PREFAB 4.0 are shown. The number in parentheses denotes the number of alignments in each sequence identity range. For each sequence identity range, the best score is marked with †. CPU is the total computing time for all alignments in seconds. (a) MTRAP uses the GONNET250 substitution matrix. (b) DIALIGN reported critical errors for some testing data. Therefore, the scores of DIALIGN were calculated from the partial testing data.

Table 2. Average Q scores on the HOMSTRAD database (pairwise only)

Method       0-15% (25)   15-30% (207)   30-45% (173)   All (630)   CPU
MTRAP(a)     0.412†       0.659†         0.879          0.819†      45
MAFFT        0.309        0.610          0.863          0.796       60
DIALIGN(b)   0.216        0.546          0.825          0.760       35
MUSCLE       0.337        0.625          0.868          0.802       15
ClustalW2    0.313        0.619          0.867          0.800       25
Probcons     0.344        0.650          0.884†         0.816       50
TCoffee      0.341        0.634          0.872          0.809       70

Note: The average Q scores of four testing datasets with different identity ranges on HOMSTRAD are shown. The notations are the same as in Table 1.
5. Conclusion

We showed that the alignment implementing our new measure, which introduces entangled effects of two consecutive pairs of residues, leads to a significant increase in alignment accuracy compared with the other six methods, and that this is especially apparent in the low identity range. We will apply our new technique to several methods for studying life from sequences, such as multiple alignment, motif finding and phylogenetic analysis, where it is usually assumed that the residue substitutions in a sequence are independent of the location.
References
1. T. Hara, K. Sato and M. Ohya, QP-PQ: Quantum Probability and White Noise Analysis (Quantum Bio-Informatics III) 26, 443 (2010).
2. C. B. Anfinsen, Science 181, 223 (Jul 1973).
3. G. Gonnet, M. Cohen and S. Benner, Biochemical and Biophysical Research Communications 199, 489 (1994).
4. T. Hara, K. Sato and M. Ohya, BMC Bioinformatics 11, p. 235 (2010).
5. S. Altschul, J. Mol. Biol. 219, 555 (1991).
6. G. Crooks, R. Green and S. Brenner, Bioinformatics 21, p. 3704 (2005).
7. J. D. Thompson, D. G. Higgins and T. J. Gibson, Nucleic Acids Res. 22, 4673 (Nov 1994).
8. M. A. Larkin, G. Blackshields, N. P. Brown, R. Chenna, P. A. McGettigan, H. McWilliam, F. Valentin, I. M. Wallace, A. Wilm, R. Lopez, J. D. Thompson, T. J. Gibson and D. G. Higgins, Bioinformatics 23, 2947 (Nov 2007).
9. K. Katoh, K. Misawa, K. Kuma and T. Miyata, Nucleic Acids Res. 30, 3059 (Jul 2002).
10. C. Notredame, D. G. Higgins and J. Heringa, J. Mol. Biol. 302, 205 (Sep 2000).
11. B. Morgenstern, Bioinformatics 15, 211 (Mar 1999).
12. R. C. Edgar, Nucleic Acids Res. 32, 1792 (2004).
13. C. Do, M. Mahabhashyam, M. Brudno and S. Batzoglou, Genome Research 15, p. 330 (2005).
14. K. Mizuguchi, C. M. Deane, T. L. Blundell and J. P. Overington, Protein Sci. 7, 2469 (Nov 1998).
15. L. Stebbings and K. Mizuguchi, Nucleic Acids Research 32, p. D203 (2004).
16. J. D. Thompson, P. Koehl, R. Ripp and O. Poch, Proteins 61, 127 (Oct 2005).
17. J. Thompson, F. Plewniak and O. Poch, Nucleic Acids Res. 27, 2682 (1999).
18. J. Sauder, J. Arthur and R. Dunbrack Jr, Proteins: Structure, Function and Genetics 40, 6 (2000).
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 137-143)
THE PASSAGE FROM DIGITAL TO ANALOGUE IN WHITE NOISE ANALYSIS AND APPLICATIONS
TAKEYUKI HIDA Professor Emeritus of Nagoya University and Meijo University
AMS 2000 Subject Classification: 60H40 White Noise Theory
1. Prologue

One of the basic ideas of white noise analysis is the Reduction of random phenomena (cf. [7]), so that the given system is made to be a function of idealized elemental random variables (abbr. i.e.r.v.), for example

$\dot{B}(t)$, white noise, and
$\dot{P}_u$, elemental Poisson system.

More precisely, it would be fine if complex random phenomena could be expressed as functions of elemental random variables, so that they are ready to be analyzed and the theory can be developed in line with stochastic analysis. As soon as we come to applications of the theory to actual problems, we need to approximate the i.e.r.v. by a system of ordinary independent random variables. An approximation of the analysis naturally follows.

Motivations

Examples of the motivation of our study.

i) Biological phenomena. A fluctuation is involved which is considered to be Gaussian. Behind it, we see a Gaussian process that can be reduced to white noise.

ii) We often meet probability distributions which have a long tail or fat tail, sometimes called fractional power distributions. They are not considered to be Gaussian. Such a distribution may appear as that of a stable stochastic process evaluated at some instant. The process admits the Lévy decomposition, or the Lévy-Itô decomposition, theoretically. Each component is an infinitesimal Poisson process and is even elemental. In each case, an efficient method of application requires a suitable method of approximation. Before we come to actual methods, we shall provide some considerations on a system of independent random variables which come from the effect of reduction.

iii) The third motivation has a somewhat different flavor. Good examples are stable distributions with some exponent. Such a distribution appears as the distribution of a stable stochastic process evaluated at some instant. There is a duality between time and Poisson components via the Lévy decomposition of a stable process.
2. I.i.d. random variables as a representation of the parameter set

Probabilistic properties of a sequence of i.i.d. (independent identically distributed) random variables may be discussed under the understanding that the sequence is a representation of the parameter set used as the index of a sequence of independent random variables. We assume that the sequence of i.i.d. random variables is Gaussian.
Case I. Digital

We have $X = \{X_n, n \in N\}$, where $N$ is the set of positive integers. The probability distribution $m$ is the direct product of countably many standard Gaussian distributions. The complex Hilbert space $L^2 = L^2(\mathbb{R}^\infty, m)$ of square integrable functions of the form $f(X_n, n \in N)$ is separable, and Hermite polynomials in the $X_n$'s form a complete orthonormal base of $L^2$. The differential operator $\partial_k$ is defined as follows:

$$\partial_k f(X_n, n \in N) = \frac{\partial}{\partial x_k} f(x_n, n \in N)\Big|_{x_n = X_n}.$$

It is a derivation and, in fact, it is an annihilation operator. Its adjoint operator $\partial_k^*$ can be defined in the usual manner. Polynomials in the annihilation and creation operators are defined. In particular, the Lévy Laplacian $\Delta_L$ acting on $L^2$ is given as follows:

$$\Delta_L = \lim_{N \to \infty} \frac{1}{N} \sum_{k=1}^{N} \partial_k^2.$$

Its domain is dense in $L^2$.
Case II. Analogue

II. 1) Analogue and separable

Gaussian case. The time parameter space is taken to be $T = [0, 1]$ to fix the idea. Let $B(t)$, $t \in T$, be a Brownian motion, from which white noise $\dot{B}(t)$ is defined. Formally speaking, the $\dot{B}(t)$'s form an independent system of idealized random variables. The probability distribution $\mu$ is introduced on the measurable space $(E^*, \mathcal{B})$, where $E^*$ is a space of generalized functions on $\mathbb{R}^1$ and where $\mathcal{B}$ is the sigma-field generated by the cylinder subsets of $E^*$.

Proposition 2.1. The measure space $(E^*, \mathcal{B}, \mu)$ is an (abstract) Lebesgue space where the calculus can be done.

As a result, we have the Hilbert space $(L^2) = L^2(E^*, \mu)$, which is, of course, separable and on which analysis can be carried out. The functionals in $(L^2)$ that we shall be concerned with are of the form

$$\varphi(\dot{B}(t), t \in T), \tag{1}$$

which will be simply written as $\varphi(\dot{B})$. Approximations can be done as follows. Take a partition $\Delta^n = \{\Delta_k^n = [k/2^n, (k+1)/2^n],\ 0 \le k \le 2^n - 1\}$. Then we have two ways of approximating white noise.

i) $\Delta_k^n B / |\Delta_k^n|$ tends to $\dot{B}(t)$ as $\Delta_k^n \to t$ in $H_1^{(-1)}$, the space of generalized linear white noise functionals.
ii) Use of $\Delta_k^n B$ ...

Poisson case. The parameter set is taken to be $(0, \infty)$. With each $u$ we associate an infinitesimal Poisson process $\dot{P}_u$, $u \in T_0$, where $T_0 = (0, \infty)$. As in the case of $\dot{B}(t)$, $\dot{P}_u$ itself is not an ordinary random variable, but it can be a generalized random variable, as is discussed in the next section.
II. 2) Analogue and non-separable

Let $X = \{X_t, t \in \mathbb{R}^1\}$ be a system of i.i.d. ordinary random variables, each of which is subject to the standard Gaussian distribution $N(0, 1)$. Then the probability distribution is $\nu = \prod_{t \in \mathbb{R}^1} \mu_t$, where $\mu_t$ is the distribution $N(0, 1)$.

Proposition 2.2. The probability measure space $(\mathbb{R}^{\mathbb{R}}, \prod_t \mathcal{B}_t, \nu)$ is not an abstract Lebesgue space.

With this property we see that the space $L^2(\mathbb{R}^{\mathbb{R}}, \nu)$ is quite different from a white noise space, and it is not useful in the calculus of random functionals.

3. Poisson noise

i) Background. We are going to propose a new direction for the treatment of random functions which are expressed as functionals of Poisson noise. As was briefly mentioned in the motivation, we are asked to introduce a method of analyzing functionals of Poisson noise. An urgent request has come from the study of random phenomena whose probability distribution is of fractional power or has a long (fat) tail. The standard distribution of this kind is the stable distribution. Suppose we are asked to approximate the given fractional power distribution by a stable distribution. Then the next step is to determine the power $\alpha$, which is one of the significant characteristics of the distribution. To this end, one may think of the least squares method in statistics. But it is not recommended, for many important reasons. Here we do not go into details on this problem. A reasonable method uses the evaluation of the area given by the histogram of the data over intervals far from 0. Even if the power $\alpha$ is obtained, we cannot investigate the structure of the given random phenomena. In reality, we have determined only a one-dimensional distribution, and it does not provide enough information for the determination of the random phenomena in question.

ii) What we can do is try to discover the history of the phenomena, together with its environment. A favorable case is that the given stable distribution can be regarded as that of a stable stochastic process evaluated at some instant. One may think that, because of the circumstances of the environment, the observed data might have come from the accumulation of independent data obtained in the past. In such a case, we say that the observed data can be embedded in a stable stochastic process. One may think this is too favorable, but we know that actual data have such a history.

iii) Steps of the analysis. Suppose we have found a favorable stable process, which is a Lévy process. We can then apply the famous Lévy decomposition. In the present case, there is no Gaussian component and the constant term can be ignored. We therefore have compound Poisson processes. To come to the next stage, we must make a few important remarks.

a) The Lévy decomposition says that the stable process just determined consists of many (actually, continuously many) independent Poisson type processes, although they are infinitesimal. (We call a process a Poisson type process if it has the same distribution as a Poisson process up to a constant.) We need a numerical technique for discriminating those Poisson type processes according to their different jumps. In other words, we need a suitable method of approximation to come to the actual steps.

b) Having obtained a single component of Poisson type, we must fix the time, say $t = 1$, to have a system of idealized elemental random variables. After that we can imitate the steps of the Gaussian case in order to discuss nonlinear functionals of them.

c) Each component of the stable process is a Poisson type process which is parametrized by $u$, the amount of the jump. If the jumps are different, then the components are mutually independent. We are now given a so-called generalized stochastic process with independent values at every point in the sense of Gel'fand.

d) The self-similarity of a stable stochastic process may be rephrased as the duality between the time $t$ and the amount $u$ of the jump. In the analysis of Poisson noise functionals, this fact should be taken into account.

4. Calculus of Poisson noise functionals

Noise, in this section, does not mean the time derivative, but means the derivative in $u$, the space parameter of Poisson type processes. The parameter $u$ denotes the amount of jump.
We are going to show the steps of the calculus in question, rather quickly. Details with proofs will be reported in the separate paper [6].

(1) We restrict the time interval to a finite interval $I = [\varepsilon, K]$. A Poisson type process with jump as high as $u$, with $u \in I$, is denoted by $P_u(t)$, the intensity of which is denoted by $\lambda(u) > 0$. The characteristic function of $P_u(1)$ is given by

$$E\big[e^{izP_u(1)}\big] = \exp\big[\lambda(u)(e^{izu} - 1)\big].$$

The sum of independent $P_{u_k}(1)$, $k = 1, 2, \cdots, n$, has the characteristic function of the form

$$\exp\Big[\sum_{k=1}^{n} \lambda(u_k)(e^{izu_k} - 1)\Big].$$

Finally we come to the expression of the form

$$\varphi(z) = \exp\Big[\int_I (e^{izu} - 1)\lambda(u)\,du\Big].$$

More generally, we may replace $\lambda(u)\,du$ with a measure $d\lambda(u)$. Now we must assume a condition on the intensity measure $d\lambda(u)$ so that the integral converges and has meaning. Namely, we assume

$$\int_I d\lambda(u) < \infty.$$
cp(z)
=
exp[j (e izu
-
l)d>.(u)].
The integral in the above expression is expressed in the form
143
1
eiZUd)"(u)
+ canst,
Z
E RI.
Noting that Z can vary in RI arbitrarily, we can make the intensity measure to be the delta measure, say bUD. This means that one can pick up an elemental (atomic) Poisson process with jump uo, let it be denoted by
F'(uo) (3) The last question is how to carryon approximation. It is now the time to remind the notion of (t, u)-set introduced by P. Levy [9] and also a formal expression of a compound Poisson process lim
1
p->oo p>lul>l/p
(F'(u) -
~ )dn(u), 1
+u
where dn(u) is the Levy measure. See [1] Section 3.2. In the present setup, the approximation of single Poisson component does correspond to the approximation of the delta measure on u-space. It is in line with the passage from digital to analogue.
References
1. T. Hida, Stationary stochastic processes. Princeton University Press. 1970.
2. T. Hida, Analysis of Brownian functionals. Carleton Math. Lecture Notes no. 13, Carleton University, 1975.
3. T. Hida, Brownian motion. Springer-Verlag. 1980.
4. T. Hida and Si Si, An innovation approach to random fields. Application of white noise theory. World Scientific Pub. Co. 2004.
5. T. Hida and Si Si, Lectures on white noise functionals. World Sci. Pub. Co. 2008.
6. T. Hida, Si Si and Win Win Htay, A noise of Poisson type and its generalized functionals, preprint (submitted).
7. J.L. Lions, The earth, planet, the role of mathematics and supercomputers. (original in Spanish). Spanish Inst. 1990.
8. Si Si, Effective determination of Poisson noise. IDAQP 6 (2003), 609-617.
9. P. Lévy, Théorie de l'addition des variables aléatoires. Gauthier-Villars, 1937.
10. P. Lévy, Processus stochastiques et mouvement brownien. Gauthier-Villars. 1948. 2ème éd. with supplement 1965.
11. P. Lévy, Problèmes concrets d'analyse fonctionnelle. Gauthier-Villars. 1951.
12. W. Feller, An introduction to probability theory and its applications. vol. 1. Wiley, 1950. Chapt. in particular Chapt. III.
13. I. Ojima, Lévy process and innovation theory in the context of Micro-Macro duality, Proc. The 5th Nagoya Lévy Seminar. 2006. 65-69.
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 145-156)
REMARKS ON THE DEGREE OF ENTANGLEMENT
DARIUSZ CHRUŚCIŃSKI^1, YUJI HIROTA^2, TAKASHI MATSUOKA^3 AND MASANORI OHYA^4
^1 Institute of Physics, Nicolaus Copernicus University
^2 Quantum Bio-Informatics Center, Tokyo University of Science
^3 Department of Business Administration and Information, Tokyo University of Science, Suwa
^4 Department of Information Science, Tokyo University of Science

We analyze a measure of quantum entanglement called the degree of entanglement (DEN). It is shown how DEN behaves for well known classes of bipartite states. Moreover, we compare DEN for quantum states having the same marginals. Contrary to naive expectation, it is shown that a separable state might possess stronger correlation (measured by DEN) than an entangled state.

Keywords: Quantum entanglement, Quantum entropy
1. Introduction
In recent years, due to the rapid development of quantum information theory [1], the classification of entangled states as a physical resource has become of primary importance. It is well known that it is extremely hard to check whether a given density matrix describing a quantum state of a composite system is separable or entangled. There are several operational criteria which enable one to detect quantum entanglement (see e.g. [2] for a recent review). The most famous one, the Peres-Horodecki criterion, is based on the partial transposition: if a state $\rho$ is separable, then its partial transposition $\rho^{\Gamma} = (\mathbb{1} \otimes T)\rho$ is positive. States which are positive under partial transposition are called PPT states. Clearly each separable state is necessarily PPT, but the converse is not true. We stress that it is easy to test whether a given state is PPT; however, there are no general methods to construct PPT states.

There are several measures of entanglement [2]. However, there is no universal measure, which shows that the problem of quantifying quantum entanglement cannot be reduced to computing a single quantity. Moreover, various measures are not compatible: if $E_1$ and $E_2$ are two measures, then one can find two states $\rho$ and $\rho'$ such that $E_1(\rho) < E_1(\rho')$ but $E_2(\rho) > E_2(\rho')$. This shows that various measures capture different aspects of quantum correlations. In the present paper we analyze a particular measure, called the degree of entanglement (DEN), which is based on the mutual entropy [10,11,18]. Like other measures, DEN uniquely characterizes the entanglement of pure states. However, it gives only a partial answer for mixed states.

This paper is organized as follows: in Section 2 we introduce basic properties of DEN. Sections 3 and 4 provide several examples of quantum states for which one can easily compute DEN. Moreover, since they possess the same marginal states (maximally mixed), one can compare the corresponding degrees of entanglement. Surprisingly, it turns out that a separable state can have stronger correlation (with respect to DEN) than an entangled state. Final conclusions are collected in the last section.
2. The Degree of Entanglement

We begin our discussion by recalling the definition of quantum entanglement. Throughout this paper, Hilbert spaces are assumed to be finite dimensional. If $\theta$ is a state on the Hilbert space $\mathcal{H}_1 \otimes \mathcal{H}_2$, then $\mathrm{Tr}_{\mathcal{H}_2}\theta$ denotes the partial trace of $\theta$ with regard to $\mathcal{H}_2$.

Definition 2.1. Let $\mathcal{H}$ be the tensor product Hilbert space of two Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$, and let $B(\mathcal{H})$ be the set of bounded operators on $\mathcal{H}$.

(1) A state $\theta$ on $B(\mathcal{H})$ is said to be separable if there exist finite sequences of density operators $\{\rho_i\}_{i=1}^{N} \subset B(\mathcal{H}_1)$ and $\{\sigma_i\}_{i=1}^{N} \subset B(\mathcal{H}_2)$ such that

$$\theta = \sum_{i=1}^{N} \lambda_i\, \rho_i \otimes \sigma_i \tag{1}$$

with $\sum_{i=1}^{N} \lambda_i = 1$ and $\lambda_i > 0$ ($\forall i = 1, \ldots, N$).

(2) A state $\theta$ on $B(\mathcal{H})$ is said to be entangled if it is not separable.

The classical example of an entangled pure state is given by $\omega = \frac{1}{\sqrt{2}}(e_0 \otimes e_1 - e_1 \otimes e_0)$, the Bell state of two qubits.
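Definition 2.2. Let $\mathcal{H}_1$, $\mathcal{H}_2$ be Hilbert spaces and $\theta$ a density operator on $\mathcal{H}_1 \otimes \mathcal{H}_2$ with marginal states $\rho = \mathrm{Tr}_{\mathcal{H}_2}\theta$ and $\sigma = \mathrm{Tr}_{\mathcal{H}_1}\theta$. The DEN for $\theta$ with regard to $\rho$, $\sigma$ is defined by the following formula
$$D(\theta : \rho, \sigma) = \frac{1}{2}\{S(\rho) + S(\sigma)\} - I_\theta(\rho, \sigma),$$
where $I_\theta(\rho, \sigma)$ is the mutual entropy for $\theta$:
$$I_\theta(\rho, \sigma) = \mathrm{tr}\, \theta(\log \theta - \log \rho \otimes \sigma). \tag{2}$$
In terms of the von Neumann entropy, $I_\theta(\rho, \sigma)$ can be rewritten in the form
$$I_\theta(\rho, \sigma) = S(\rho) + S(\sigma) - S(\theta).$$
Therefore, one obtains finally
$$D(\theta : \rho, \sigma) = S(\theta) - \frac{1}{2}\{S(\rho) + S(\sigma)\}. \tag{3}$$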
We will often use this form to calculate DEN in this and all subsequent sections. As an example let us calculate DEN for the singlet state $\omega$. The marginal states are $\omega_1 = \omega_2 = \frac{1}{2} I_2$ and hence one finds $D(\omega : \omega_1, \omega_2) = -\log 2 < 0$. Actually, one has the following

Theorem 2.1 (M. Ohya and T. Matsuoka [18]). Let $\theta$ be a pure state with marginal states $\rho$, $\sigma$. Then, we have the following classification:

(1) $\theta$ is separable if and only if $D(\theta : \rho, \sigma) = 0$;
(2) $\theta$ is entangled if and only if $D(\theta : \rho, \sigma) < 0$.

According to the above theorem, DEN gives a necessary and sufficient condition for the separability of pure states. For mixed states one has [3, 10, 11]

Theorem 2.2. Assume that the compound state $\theta$ is a mixed state. If $\theta$ is separable, then $D(\theta : \rho, \sigma) > 0$.

Now, we compare quantum bipartite states with respect to DEN.

Definition 2.3. Let $\theta_1$, $\theta_2$ be states which have common marginal states $\rho$, $\sigma$. The state $\theta_1$ possesses stronger correlations than $\theta_2$ if the following inequality holds:
$$D(\theta_1 : \rho, \sigma) < D(\theta_2 : \rho, \sigma). \tag{4}$$
The smaller the value of DEN, the stronger the correlations. Here a natural question arises:

Problem. Let $\theta_1$ and $\theta_2$ be compound states having the common marginal states $\rho$, $\sigma$. Assume that $D(\theta_1 : \rho, \sigma) < D(\theta_2 : \rho, \sigma)$. If $\theta_2$ is entangled, is $\theta_1$ entangled as well?

In what follows we show that this is not the case.
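As a numerical illustration of formula (3), the following minimal sketch computes DEN for the singlet state and for a pure product state. It assumes NumPy and natural logarithms; the helper names `entropy`, `marginals` and `den` are hypothetical and not taken from the paper.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy S(rho) = -tr(rho log rho), natural logarithm."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]              # convention: 0 log 0 = 0
    return float(-np.sum(vals * np.log(vals)))

def marginals(theta, d1, d2):
    """Partial traces of a density matrix theta on C^d1 (x) C^d2."""
    t = theta.reshape(d1, d2, d1, d2)
    rho = np.trace(t, axis1=1, axis2=3)    # trace out the second factor
    sigma = np.trace(t, axis1=0, axis2=2)  # trace out the first factor
    return rho, sigma

def den(theta, d1, d2):
    """D(theta : rho, sigma) = S(theta) - (S(rho) + S(sigma)) / 2, as in (3)."""
    rho, sigma = marginals(theta, d1, d2)
    return entropy(theta) - 0.5 * (entropy(rho) + entropy(sigma))

e0, e1 = np.eye(2)
psi = (np.kron(e0, e1) - np.kron(e1, e0)) / np.sqrt(2)   # singlet vector
omega = np.outer(psi, psi)                               # singlet density matrix
product = np.outer(np.kron(e0, e1), np.kron(e0, e1))     # pure product state

d_singlet = den(omega, 2, 2)     # equals -log 2, hence entangled
d_product = den(product, 2, 2)   # equals 0, hence separable
```

The two values agree with the pure-state classification of Theorem 2.1.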
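3. DEN for 2-qudit states

3.1. Circulant states

We start this section by recalling the definition of a circulant state [12] (see also Refs. [13, 14]). Consider the finite-dimensional Hilbert space $\mathbb{C}^d$ $(d \in \mathbb{N})$ with the standard basis $\{e_0, e_1, \ldots, e_{d-1}\}$. Let $\Sigma_0$ be the subspace of $\mathbb{C}^d \otimes \mathbb{C}^d$ generated by $e_i \otimes e_i$ $(i = 0, 1, \ldots, d-1)$:
$$\Sigma_0 = \mathrm{span}\{e_0 \otimes e_0, e_1 \otimes e_1, \ldots, e_{d-1} \otimes e_{d-1}\}. \tag{5}$$
For any non-negative integer $\alpha$, we define the operator $S^\alpha$ on $\mathbb{C}^d$ by
$$e_k \mapsto e_{k+\alpha \,(\mathrm{mod}\ d)}, \qquad (k = 0, 1, \ldots, d-1),$$
and denote by $\Sigma_\alpha$ the image of $\Sigma_0$ under $I_d \otimes S^\alpha$: $\Sigma_\alpha = (I_d \otimes S^\alpha)\Sigma_0$. It turns out that $\Sigma_\alpha$ and $\Sigma_\beta$ $(\alpha \neq \beta)$ are orthogonal to each other and
$$\mathbb{C}^d \otimes \mathbb{C}^d \cong \Sigma_0 \oplus \Sigma_1 \oplus \cdots \oplus \Sigma_{d-1}. \tag{6}$$
This decomposition is called the circulant decomposition. Let $\rho_0, \rho_1, \ldots, \rho_{d-1}$ be positive $d \times d$ matrices with entries in $\mathbb{C}$ which satisfy
$$\mathrm{tr}(\rho_0 + \cdots + \rho_{d-1}) = 1. \tag{7}$$
For each matrix $\rho_\alpha$ $(\alpha = 0, 1, \ldots, d-1)$, we define the new linear operator $\tilde{\rho}_\alpha$ on $(\mathbb{C}^d)^{\otimes 2}$ as
$$\tilde{\rho}_\alpha = \sum_{i,j=0}^{d-1} \langle e_i, \rho_\alpha e_j \rangle\, e_{ij} \otimes S^\alpha e_{ij} (S^\alpha)^*, \tag{8}$$
where $e_{ij}$ means $|e_i\rangle\langle e_j|$. Since $S^k e_{ij} (S^k)^* = e_{i+k,j+k}$, we note that $\tilde{\rho}_\alpha$ can also be written as
$$\tilde{\rho}_\alpha = \sum_{i,j=0}^{d-1} \langle e_i, \rho_\alpha e_j \rangle\, e_{ij} \otimes e_{i+\alpha,j+\alpha}. \tag{9}$$
One can easily check that the sum of these operators
$$\rho = \sum_{\alpha=0}^{d-1} \tilde{\rho}_\alpha \tag{10}$$
defines a density matrix on $(\mathbb{C}^d)^{\otimes 2}$. For further details of circulant states we refer to Ref. [12].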
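The construction (8)-(10) can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes NumPy, the function names `shift` and `circulant_state` are hypothetical, and the basis vector $e_i \otimes e_j$ is stored at index $d\,i + j$.

```python
import numpy as np

def shift(d, alpha):
    """Matrix of S^alpha, acting as e_k -> e_{(k + alpha) mod d}."""
    return np.roll(np.eye(d), alpha, axis=0)

def circulant_state(blocks):
    """Sum over alpha of sum_ij <e_i, rho_alpha e_j> e_ij (x) S^alpha e_ij (S^alpha)^*."""
    d = len(blocks)
    rho = np.zeros((d * d, d * d), dtype=complex)
    for alpha, block in enumerate(blocks):
        s = shift(d, alpha)
        for i in range(d):
            for j in range(d):
                e_ij = np.zeros((d, d))
                e_ij[i, j] = 1.0
                rho += block[i, j] * np.kron(e_ij, s @ e_ij @ s.conj().T)
    return rho

# Toy input for d = 2 (an arbitrary choice): two positive matrices with total trace 1.
rho_0 = np.full((2, 2), 0.25)
rho_1 = 0.25 * np.eye(2)
state = circulant_state([rho_0, rho_1])
eigenvalues = np.linalg.eigvalsh(state)
```

For this toy input the result has unit trace and non-negative eigenvalues, as expected of a density matrix.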
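Let us consider a particular example of a circulant state for $d = 3$:
$$\rho_0 = \frac{1}{\Lambda}\begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}, \qquad \rho_1 = \frac{\varepsilon}{\Lambda}\, I_3, \qquad \rho_2 = \frac{1}{\varepsilon\Lambda}\, I_3, \tag{11}$$
where
$$\Lambda = 3\,(1 + \varepsilon + \varepsilon^{-1}) \tag{12}$$
and $\varepsilon > 0$. It can easily be checked that $\mathrm{tr}(\rho_0 + \rho_1 + \rho_2) = 1$. In the ordered basis $e_0 \otimes e_0, e_0 \otimes e_1, \ldots, e_2 \otimes e_2$ one finds
$$\tilde{\rho}_0 = \frac{1}{\Lambda} \sum_{i,j=0}^{2} e_{ij} \otimes e_{ij}, \qquad \tilde{\rho}_1 = \frac{\varepsilon}{\Lambda}\,(e_{00} \otimes e_{11} + e_{11} \otimes e_{22} + e_{22} \otimes e_{00}) \tag{13}$$
and
$$\tilde{\rho}_2 = \frac{1}{\varepsilon\Lambda}\,(e_{00} \otimes e_{22} + e_{11} \otimes e_{00} + e_{22} \otimes e_{11}). \tag{14}$$
Finally, one obtains the following family of circulant states
$$\theta_1(\varepsilon) = \frac{1}{\Lambda}\begin{pmatrix}
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & \varepsilon & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & \varepsilon^{-1} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & \varepsilon^{-1} & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & \varepsilon & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \varepsilon & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & \varepsilon^{-1} & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 1
\end{pmatrix}. \tag{15}$$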
Figure 1. The graph of DEN for $\theta_1(x)$.
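The marginal states $\rho$, $\sigma$ of $\theta_1(\varepsilon)$ can be calculated as
$$\rho = \sigma = \frac{1}{3}(e_{00} + e_{11} + e_{22}). \tag{16}$$
Therefore, we have
$$D(\theta_1(\varepsilon) : \rho, \sigma) = -\frac{3}{\Lambda}\left[\log\frac{3}{\Lambda} + \varepsilon \log\frac{3\varepsilon}{\Lambda} + \frac{1}{\varepsilon}\log\frac{3}{\varepsilon\Lambda} + \log 3\right], \tag{17}$$
with $\Lambda$ defined in (12). Actually, one can classify the state $\theta_1(\varepsilon)$ by the value of $\varepsilon$ (see Ref. [17]).

Theorem 3.1.

(1) $\theta_1(\varepsilon)$ is separable iff $\varepsilon = 1$;
(2) $\theta_1(\varepsilon)$ is both PPT and entangled for $\varepsilon \neq 1$.

The corresponding graph of DEN for $\theta_1(\varepsilon)$ is shown in Fig. 1. The maximal value corresponds to $\varepsilon = 1$ and is given by $D(\theta_1(1) : \rho, \sigma) = \frac{2}{3}\log 3 \approx 0.7324$.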
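Formula (17) can be cross-checked against a direct numerical evaluation of (3). The sketch below assumes NumPy, natural logarithms, the basis ordering $e_i \otimes e_j \mapsto 3i + j$, and the normalization $\Lambda = 3(1 + \varepsilon + \varepsilon^{-1})$ implied by the unit-trace condition; the function names are hypothetical.

```python
import numpy as np

def theta1(eps):
    """9x9 matrix of theta_1(eps); e_i (x) e_j is stored at index 3*i + j."""
    lam = 3.0 * (1.0 + eps + 1.0 / eps)   # assumed normalization (unit trace)
    m = np.zeros((9, 9))
    for i in range(3):
        for j in range(3):
            m[4 * i, 4 * j] = 1.0                                 # e_ij (x) e_ij
        m[3 * i + (i + 1) % 3, 3 * i + (i + 1) % 3] = eps         # e_ii (x) e_{i+1,i+1}
        m[3 * i + (i + 2) % 3, 3 * i + (i + 2) % 3] = 1.0 / eps   # e_ii (x) e_{i+2,i+2}
    return m / lam

def entropy(rho):
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]              # convention: 0 log 0 = 0
    return float(-np.sum(vals * np.log(vals)))

def den(theta, d=3):
    """D = S(theta) - (S(rho) + S(sigma)) / 2 with numerically computed marginals."""
    t = theta.reshape(d, d, d, d)
    rho = np.trace(t, axis1=1, axis2=3)
    sigma = np.trace(t, axis1=0, axis2=2)
    return entropy(theta) - 0.5 * (entropy(rho) + entropy(sigma))

def den_closed_form(eps):
    """Right-hand side of (17)."""
    lam = 3.0 * (1.0 + eps + 1.0 / eps)
    return -(3.0 / lam) * (np.log(3.0 / lam) + eps * np.log(3.0 * eps / lam)
                           + np.log(3.0 / (eps * lam)) / eps + np.log(3.0))

d_at_one = den(theta1(1.0))   # (2/3) log 3, about 0.7324
d_at_two = den(theta1(2.0))   # smaller than the value at eps = 1
```

Both evaluations agree for the sampled values of $\varepsilon$, and the value at $\varepsilon = 1$ reproduces $\frac{2}{3}\log 3$.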
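3.2. Horodecki state

Let us consider another example of circulant states introduced in Refs. [15, 16]:
$$\theta_2(\alpha) = \frac{2}{7}\,|\psi\rangle\langle\psi| + \frac{\alpha}{7}\,\theta_+ + \frac{5-\alpha}{7}\,\theta_-, \qquad (0 \leq \alpha \leq 5), \tag{18}$$
where
$$\psi = \frac{1}{\sqrt{3}}(e_0 \otimes e_0 + e_1 \otimes e_1 + e_2 \otimes e_2), \tag{19}$$
$$\theta_+ = \frac{1}{3}(e_{00} \otimes e_{11} + e_{11} \otimes e_{22} + e_{22} \otimes e_{00}), \tag{20}$$
$$\theta_- = \frac{1}{3}(e_{11} \otimes e_{00} + e_{22} \otimes e_{11} + e_{00} \otimes e_{22}), \tag{21}$$
and hence
$$\theta_2(\alpha) = \frac{1}{21}\begin{pmatrix}
2 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 2 \\
0 & \alpha & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 5-\alpha & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 5-\alpha & 0 & 0 & 0 & 0 & 0 \\
2 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 2 \\
0 & 0 & 0 & 0 & 0 & \alpha & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & \alpha & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 5-\alpha & 0 \\
2 & 0 & 0 & 0 & 2 & 0 & 0 & 0 & 2
\end{pmatrix}. \tag{22}$$
The marginal states $\rho$ and $\sigma$ are the same as (16). The von Neumann entropy of $\theta_2(\alpha)$ is calculated to be
$$S(\theta_2(\alpha)) = -\frac{2}{7}\log\frac{2}{7} - \frac{\alpha}{7}\log\frac{\alpha}{7} - \frac{5-\alpha}{7}\log\frac{5-\alpha}{7} + \frac{5}{7}\log 3. \tag{23}$$
Therefore, we have
$$D(\theta_2(\alpha) : \rho, \sigma) = -\frac{2}{7}\log\frac{2}{7} - \frac{\alpha}{7}\log\frac{\alpha}{7} - \frac{5-\alpha}{7}\log\frac{5-\alpha}{7} - \frac{2}{7}\log 3. \tag{24}$$
As in the case of the circulant state $\theta_1(\varepsilon)$, it is known that the state $\theta_2(\alpha)$ can be classified by the values of $\alpha$ (see Refs. [15, 16]).

Theorem 3.2.

(1) $\theta_2(\alpha)$ is separable if and only if $\alpha \in [2, 3]$;
(2) $\theta_2(\alpha)$ is both entangled and PPT if and only if $\alpha \in [1, 2) \cup (3, 4]$;
(3) $\theta_2(\alpha)$ is not PPT if and only if $\alpha \in [0, 1) \cup (4, 5]$.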
Figure 2. The graph of DEN for $\theta_2(x)$.
Now, since $\theta_1(\varepsilon)$ and $\theta_2(\alpha)$ have common marginal states, we can compare DEN for them. One has for example
$$D(\theta_1(1) : \rho, \sigma) \approx 0.7324 < D(\theta_2(3.1) : \rho, \sigma) \approx 0.7587. \tag{25}$$
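The two numbers in (25) can be reproduced from the closed forms (17) and (24). This is a minimal sketch assuming natural logarithms; the function names, including the helper `classify_theta2` that encodes the intervals of Theorem 3.2, are hypothetical.

```python
import math

def plogp(p):
    """-p log p with the convention 0 log 0 = 0."""
    return 0.0 if p == 0.0 else -p * math.log(p)

def den_theta1(eps):
    """Closed form (17), with Lambda = 3 (1 + eps + 1/eps) as assumed normalization."""
    lam = 3.0 * (1.0 + eps + 1.0 / eps)
    return -(3.0 / lam) * (math.log(3.0 / lam) + eps * math.log(3.0 * eps / lam)
                           + math.log(3.0 / (eps * lam)) / eps + math.log(3.0))

def den_theta2(alpha):
    """Closed form (24), valid for 0 <= alpha <= 5."""
    return (plogp(2.0 / 7.0) + plogp(alpha / 7.0) + plogp((5.0 - alpha) / 7.0)
            - (2.0 / 7.0) * math.log(3.0))

def classify_theta2(alpha):
    """Classification of theta_2(alpha) according to Theorem 3.2."""
    if 2.0 <= alpha <= 3.0:
        return "separable"
    if 1.0 <= alpha <= 4.0:
        return "PPT entangled"
    return "not PPT"

d1 = den_theta1(1.0)    # about 0.7324 (separable state)
d2 = den_theta2(3.1)    # about 0.7587 (PPT entangled state)
```

Here `d1 < d2`, so in the sense of Definition 2.3 the separable state $\theta_1(1)$ has the stronger correlation.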
Note, however, that $\theta_1(1)$ is separable while $\theta_2(3.1)$ is entangled. This example gives a negative answer to our original question. The corresponding graph of DEN for $\theta_2(\alpha)$ is shown in Fig. 2.

4. Bell diagonal states

In this section, we analyze Bell diagonal states for $d = 3$ (see Refs. [4, 5, 6, 7, 8, 9]). Consider the Hilbert space $\mathcal{H} = \mathbb{C}^3$ with the standard basis $\{e_0, e_1, e_2\}$. We set
$$\Omega_{0,0} = \frac{1}{\sqrt{3}} \sum_{i=0}^{2} e_i \otimes e_i \tag{26}$$
and define $\Omega_{k,\ell}$ for any $k, \ell$ $(0 \leq k, \ell \leq 2)$ as
$$\Omega_{k,\ell} = (W_{k,\ell} \otimes I_3)\,\Omega_{0,0}, \tag{27}$$
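where $W_{k,\ell}$ means the circle action given by

$(i = 0, 1, 2)$. (28)

Finally, we define
$$\theta(\alpha, \beta) = \frac{1-\alpha-\beta}{9}\, I_3 \otimes I_3 + \alpha\, |\Omega_{0,0}\rangle\langle\Omega_{0,0}| + \frac{\beta}{2}\left(|\Omega_{1,0}\rangle\langle\Omega_{1,0}| + |\Omega_{2,0}\rangle\langle\Omega_{2,0}|\right). \tag{29}$$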
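Note that $\theta(\alpha, \beta)$ can be written in the form
$$\theta(\alpha, \beta) = \begin{pmatrix}
u & 0 & 0 & 0 & v & 0 & 0 & 0 & v \\
0 & w & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & w & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & w & 0 & 0 & 0 & 0 & 0 \\
v & 0 & 0 & 0 & u & 0 & 0 & 0 & v \\
0 & 0 & 0 & 0 & 0 & w & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & w & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & w & 0 \\
v & 0 & 0 & 0 & v & 0 & 0 & 0 & u
\end{pmatrix},$$
with the entries
$$u = \frac{1 + 2(\alpha+\beta)}{9}, \qquad v = \frac{2\alpha - \beta}{6}, \qquad w = \frac{1-\alpha-\beta}{9}.$$
Note that $\mathrm{tr}\,\theta(\alpha, \beta) = 1$ and the eigenvalues of $\theta(\alpha, \beta)$ are given by
$$\frac{1-\alpha-\beta}{9}, \qquad \frac{-2\alpha + 7\beta + 2}{18}, \qquad \frac{8\alpha - \beta + 1}{9}. \tag{30}$$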
Therefore, $\theta(\alpha, \beta)$ defines a state if and only if the parameters $\alpha$, $\beta$ satisfy
$$\alpha + \beta \leq 1, \qquad 2\alpha - 7\beta \leq 2, \qquad -8\alpha + \beta \leq 1. \tag{31}$$
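On the other hand, let us ask for the condition under which $\theta(\alpha, \beta)$ is positive under partial transposition. The partial transpose $T\theta(\alpha, \beta)$ of $\theta(\alpha, \beta)$ is given by
$$T\theta(\alpha, \beta) = \begin{pmatrix}
u & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & w & 0 & v & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & w & 0 & 0 & 0 & v & 0 & 0 \\
0 & v & 0 & w & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & u & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & w & 0 & v & 0 \\
0 & 0 & v & 0 & 0 & 0 & w & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & v & 0 & w & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & u
\end{pmatrix},$$
where $u = \frac{1 + 2(\alpha+\beta)}{9}$, $v = \frac{2\alpha - \beta}{6}$ and $w = \frac{1-\alpha-\beta}{9}$. From this one can check that the trace is equal to 1 and the eigenvalues are
$$\frac{1 + 2(\alpha+\beta)}{9}, \qquad \frac{-8\alpha + \beta + 2}{18}, \qquad \frac{4\alpha - 5\beta + 2}{18}. \tag{32}$$
Hence, the PPT condition holds for $\alpha$, $\beta$ such that
$$\alpha + \beta \geq -\frac{1}{2}, \qquad 8\alpha - \beta \leq 2, \qquad -4\alpha + 5\beta \leq 2. \tag{33}$$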
On the set which consists of $\alpha$, $\beta$ satisfying both (31) and (33), the state is either separable or bound entangled. However, it was shown [7] that for this class all PPT states are separable. Let us consider for example the line $\beta = 3/5$. One has: $\theta(\alpha, 3/5)$ is entangled if and only if
$$\alpha \in [0, 1/4) \cup (13/40, 2/5], \tag{34}$$
and it is separable iff
$$\alpha \in [1/4, 13/40]. \tag{35}$$
The marginal states $\rho$ and $\sigma$ are calculated to be the same as (16). Hence we obtain DEN for $\theta(\alpha, 3/5)$:
$$D(\theta(\alpha, 3/5) : \rho, \sigma) = \frac{10\alpha - 4}{15}\log\frac{2 - 5\alpha}{45} - \frac{31 - 10\alpha}{45}\log\frac{31 - 10\alpha}{90} - \frac{40\alpha + 2}{45}\log\frac{40\alpha + 2}{45} - \log 3. \tag{36}$$
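The PPT interval on the line $\beta = 3/5$ and formula (36) can be checked numerically. The sketch below assumes NumPy and natural logarithms, builds $\theta(\alpha, \beta)$ directly from its matrix entries $(1+2(\alpha+\beta))/9$, $(2\alpha-\beta)/6$ and $(1-\alpha-\beta)/9$, and transposes the second tensor factor; all function names are hypothetical.

```python
import numpy as np

def theta(alpha, beta):
    """9x9 matrix of theta(alpha, beta); e_i (x) e_j is stored at index 3*i + j."""
    u = (1 + 2 * (alpha + beta)) / 9
    v = (2 * alpha - beta) / 6
    w = (1 - alpha - beta) / 9
    m = w * np.eye(9)
    for r in (0, 4, 8):
        for c in (0, 4, 8):
            m[r, c] = u if r == c else v
    return m

def partial_transpose(m, d=3):
    """Transpose the second factor: entry ((i,k),(j,l)) moves to ((i,l),(j,k))."""
    return m.reshape(d, d, d, d).transpose(0, 3, 2, 1).reshape(d * d, d * d)

def is_ppt(alpha, beta, tol=1e-12):
    return np.linalg.eigvalsh(partial_transpose(theta(alpha, beta))).min() >= -tol

def den(alpha, beta):
    """S(theta) - log 3, using that both marginals are maximally mixed."""
    vals = np.linalg.eigvalsh(theta(alpha, beta))
    vals = vals[vals > 1e-12]
    return float(-np.sum(vals * np.log(vals)) - np.log(3))

def den_36(a):
    """Right-hand side of (36)."""
    return ((10 * a - 4) / 15 * np.log((2 - 5 * a) / 45)
            - (31 - 10 * a) / 45 * np.log((31 - 10 * a) / 90)
            - (40 * a + 2) / 45 * np.log((40 * a + 2) / 45) - np.log(3))

d_entangled = den(0.2, 0.6)   # alpha in [0, 1/4): entangled
d_separable = den(0.3, 0.6)   # alpha in [1/4, 13/40]: separable
```

With these sample points the entangled state has the larger DEN, i.e. the weaker correlation in the sense of Definition 2.3.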
By a simple calculation, it can easily be verified that $D(\theta(\alpha, 3/5) : \rho, \sigma)$ is monotonically decreasing if $\alpha \in [1/20, 2/5)$. Hence, the values of DEN for $\alpha \in [1/20, 1/4)$ are greater than the ones for $\alpha \in (1/4, 13/40)$ (see Fig. 3). This also gives a negative answer to our original question.

5. Conclusions

We provided several examples of bipartite quantum states for which one can easily compute DEN. Moreover, since they possess the same marginal states (maximally mixed), one can compare the corresponding degrees of entanglement. It turned out that a separable state can have stronger correlation (with respect to DEN) than an entangled state. This observation is inconsistent with the conventional understanding of quantum entanglement. Consequently, we propose that the meaning of DEN should be changed to express the intensity not of entanglement but of correlation. The details will be discussed in the forthcoming paper [19].
Figure 3. The graph of DEN for $\theta(x, 3/5)$.
Acknowledgments
This work was partially supported by the Polish Ministry of Science and Higher Education Grant No 3004/B/H03/2007/33.
References

1. M. A. Nielsen and I. L. Chuang, Quantum computation and quantum information, Cambridge University Press, Cambridge, 2000.
2. R. Horodecki, P. Horodecki, M. Horodecki and K. Horodecki: Quantum entanglement, Rev. Mod. Phys. 81 (2009), pp 865-942.
3. L. Accardi, T. Matsuoka, M. Ohya: Entangled Markov chains are indeed entangled, Infin. Dimens. Anal. Quantum Probab. Relat. Top., Vol. 9 (2006), pp 379-390.
4. R. A. Bertlmann and Ph. Krammer: Simplex of bound entangled multipartite qubit states, Phys. Rev. A 78 (2008), 10pp.
5. R. A. Bertlmann and Ph. Krammer: Geometric entanglement witnesses and bound entanglement, Phys. Rev. A 77, 024303 (2008), 4pp.
6. B. Baumgartner, B. Hiesmayr and H. Narnhofer: State space for two qutrits has a phase space structure in its cone, Phys. Rev. A 74 (2006), 14pp.
7. B. Baumgartner, B. Hiesmayr and H. Narnhofer: A special simplex in the state space for entangled qudits, J. Phys. A: Math. Theor. 40, 7919 (2007).
8. B. Baumgartner, B. C. Hiesmayr and H. Narnhofer: The geometry of biparticle qutrits including bound entanglement, Physics Letters A 372 (2008), pp 2190-2195.
9. D. Chruściński, A. Kossakowski, T. Matsuoka, K. Mlodawski: A class of Bell diagonal states and entanglement witnesses, to appear in Open Systems and Inf. Dynamics.
10. V. P. Belavkin and M. Ohya: Quantum entropy and information in discrete entangled state, Infin. Dimens. Anal. Quantum Probab. Relat. Top., Vol. 4 (2001), pp 137-160.
11. V. P. Belavkin and M. Ohya: Entanglement, quantum entropy and mutual information, Proc. R. Soc. London A 458 (2002), pp 209-231.
12. D. Chruściński and A. Kossakowski: Circulant states with positive partial transpose, Physical Review A 76 (2007), 14 pp.
13. D. Chruściński and A. Pittenger: Generalized Circulant Densities and a Sufficient Condition for Separability, J. Phys. A: Math. Theor. 41 (2008) 385301.
14. D. Chruściński and A. Kossakowski: Multipartite Circulant States with Positive Partial Transpose, Open Sys. Information Dyn. 15 (2008), pp 189-212.
15. P. Horodecki, M. Horodecki and R. Horodecki: Bound entanglement can be activated, Phys. Rev. Lett. 82 (1999), pp 1056-1059.
16. M. Horodecki, P. Horodecki and R. Horodecki: Mixed-state entanglement and quantum communication, in Quantum Information, Springer Tracts in Modern Physics 173 (2001), pp 151-195.
17. J. Jurkowski, D. Chruściński and A. Rutkowski: A class of bound entangled states of two qutrits, Open Syst. Inf. Dyn. 16 (2009), no 2-3, pp 235-242.
18. M. Ohya and T. Matsuoka: Quantum entangled state and its characterization, Found. Probab. Phys. no 3, 750 (2005), pp 298-306.
19. D. Chruściński, Y. Hirota, T. Matsuoka and M. Ohya: Quantum correlation and essential q-entanglement, in preparation.
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 157-171)
A COMPLETELY DISCRETE PARTICLE MODEL DERIVED FROM A STOCHASTIC PARTIAL DIFFERENTIAL EQUATION BY POINT SYSTEMS
KARL-HEINZ FICHTNER¹, KEI INOUE² AND MASANORI OHYA³

¹ Friedrich-Schiller-Universität Jena, Fakultät für Mathematik und Informatik, Institut für Angewandte Mathematik, 07737 Jena, Germany. E-mail: [email protected]
² Department of Electrical Engineering, Tokyo University of Science, Yamaguchi, Sanyo-Onoda, Yamaguchi 756-0884, Japan. E-mail: [email protected]
³ Department of Information Sciences, Tokyo University of Science, Noda City, Chiba 278-8510, Japan. [email protected]

Several scientific and technical problems can be described by a stochastic partial differential equation. The solution of the equation can be considered as the limit of a suitable discrete particle model. The existence of such an approximation was discussed in [5]. A completely discrete particle model, which is constructed so that it can be simulated by computer, is considered in [3]. In this paper we give proofs of some lemmas which are used to prove the main theorem in [3].
1. Introduction
Several scientific and technical problems can be described by partial differential equations of the type
$$\frac{\partial v}{\partial t} = \frac{1}{2}\Delta v + f. \tag{1}$$
A more precise model takes stochastic disturbances into account, i.e., corresponding source terms have to be added to the right-hand side of (1). Often this disturbance occurs as the so-called white noise. So we regard the stochastic partial differential equation
$$\frac{\partial v}{\partial t}(x, t) = \frac{1}{2}\Delta v(x, t) + f(x, t) + \sigma(x, t)\,\xi(x, t). \tag{2}$$
Note that (2) is connected with the idea of diffusion and generation of particles at random spatial-temporal points. Therefore the solution of (2) can be considered as the limit of a suitable discrete particle model. The existence of such an approximation was discussed in [5], and the related problems were considered in [4, 6]. In [3] we construct a completely discrete particle model approximating a continuous system which is different from the limit considered in [5]. In this paper we give proofs of some lemmas which are used to prove the main theorem in [3].
2. Basic notations

The Poisson process

Following [7], we introduce the notions and notations in this section. Let $G$ be a Polish space, i.e., a separable topological space for which there is a complete metric. We denote the $\sigma$-algebra of the Borel sets in $G$ by $\mathfrak{B}$ and the ring of the bounded Borel sets of $\mathfrak{B}$ by $\mathfrak{L}$. In the case $G = \mathbb{R}^d$ we use the notations $\mathfrak{B} = \mathfrak{B}^d$, $\mathfrak{L} = \mathfrak{L}^d$. Further, $M$ denotes the set of all integer-valued measures on $\mathfrak{B}$ being finite on $\mathfrak{L}$. Let $\mathfrak{M}$ be the smallest $\sigma$-algebra of $M$-subsets which makes the function
$$\Phi \mapsto \Phi(X), \qquad \Phi \in M,$$
measurable for each $X$ from $\mathfrak{L}$. A measurable mapping from a probability space $[\Omega, \mathcal{F}, P]$ into $[M, \mathfrak{M}]$ is called a random point system in $G$; the distribution on $[M, \mathfrak{M}]$ generated by such a random point system is said to be a point process with the phase space $G$. To each measure $H$ on $[M, \mathfrak{M}]$, a measure $\rho_H$ on $\mathfrak{B}$, called the intensity measure of $H$, is assigned by
$$\rho_H(X) := \int \Phi(X)\, H(d\Phi), \qquad X \in \mathfrak{B}.$$
Let $X_1, \ldots, X_m$ be elements of $\mathfrak{B}$. The measurable mapping
$$\Phi \mapsto [\Phi(X_1), \ldots, \Phi(X_m)]$$
from $M$ into $\mathbb{N}^m$ transforms each measure $H$ on $[M, \mathfrak{M}]$ with
$$H(\{\Phi : \Phi(X_i) = \infty\}) = 0, \qquad i = 1, \ldots, m,$$
into a measure $H_{X_1, \ldots, X_m}$ on $[\mathbb{N}^m, \mathfrak{P}(\mathbb{N}^m)]$. If $H$ is a point process, the measures $H_{X_1, \ldots, X_m}$ are called the finite dimensional distributions of $H$.

To any $\mathfrak{L}$-finite measure $\mu$ on $\mathfrak{B}$ we can associate the Poisson process $P_\mu$ with intensity measure $\mu$, characterized by
$$(P_\mu)_{X_1, \ldots, X_m} = \bigotimes_{i=1}^{m} \pi_{\mu(X_i)}$$
for all finite sequences $(X_i)_{i=1}^{m}$ of pairwise disjoint sets from $\mathfrak{L}$. Here $\pi_\lambda$, $\lambda > 0$, denotes the Poisson distribution with expectation $\lambda$. For every $\mathfrak{L}$-finite measure $\mu$ on $\mathfrak{B}$ it holds that
$$(P_\mu)^{n*} = P_{n\mu}, \tag{3}$$
where $(P_\mu)^{n*}$ is the $n$-th convolution power of $P_\mu$ (cf. [7]).

$\sigma$-additive processes
Definition 2.1. A random process $\eta = (\eta(B); B \in \mathfrak{L})$ on $[\Omega, \mathcal{F}, P]$ is called $\sigma$-additive if for all sequences $(B_i)_{i=1}^{\infty}$ of pairwise disjoint elements from $\mathfrak{L}$ such that $\bigcup_{i=1}^{\infty} B_i \in \mathfrak{L}$, we have

Definition 2.2. A $\sigma$-additive process $\eta = (\eta(B); B \in \mathfrak{L})$ is called decomposable if for all finite sequences $(B_i)_{i=1}^{m}$ of pairwise disjoint bounded measurable sets the random variables $\eta(B_1), \ldots, \eta(B_m)$ are independent (cf. [2]).

If $\Psi$ is a random point system in $G$ then the family $\Psi := (\Psi(B); B \in \mathfrak{L})$ represents a $\sigma$-additive process. Moreover, if the distribution of $\Psi$ is a Poisson process then $\Psi$ is a decomposable process. The white noise based on the Lebesgue measure on $[\mathbb{R}^d, \mathfrak{B}^d]$ as well as the so-called generalized Brownian motion represents a decomposable process in this sense (cf. [8]). Here $U$ is called a generalized Brownian motion on $G$ with noise intensity measure $\mu$ if

(i) each random variable $U(B)$ has law $N(0, \mu(B))$,
(ii) the variables $U(B_1), \ldots, U(B_m)$ are independent whenever $B_1, \ldots, B_m$ are pairwise disjoint.

Definition 2.3. For each $n \in \mathbb{N}$ let $\eta^{(n)}$ and $\eta$ be $\sigma$-additive processes on $[\Omega, \mathcal{F}, P]$. The sequence $\eta^{(n)}$ is said to converge toward $\eta$ if for every finite sequence $(B_i)_{i=1}^{m}$ of elements from $\mathfrak{L}$ the random vector $[\eta^{(n)}(B_1), \ldots, \eta^{(n)}(B_m)]$ converges in distribution (i.e., weak convergence of the probability distributions) to the random vector $[\eta(B_1), \ldots, \eta(B_m)]$.

3. A Special Type of a Stochastic Partial Differential Equation

We consider the stochastic partial differential equation
$$\frac{\partial v}{\partial t}(x, t) = \frac{1}{2}\Delta v(x, t) + f(x, t) + \sigma(x, t)\,\xi(x, t) \tag{4}$$
with initial value v(x,0) = 0) or negative ($f(x, t) < 0$) charged particles are added. Finally, a stochastic source term describing the stochastic disturbance is added. $\sigma^2(x, t)$ is the intensity of the noise. It must be noted that in the higher dimensional case ($x \in \mathbb{R}^d$, $d > 1$) the mathematical concept symbolized by (4) is not quite clear.
A Partially Discrete Particle Model

A treatment of this case was discussed in [5]. Firstly let us explain the main idea of the model. Let $K_\lambda$, $\lambda > 0$, be the stochastic kernel on $[\mathbb{R}^d, \mathfrak{B}^d]$ given by
$$K_\lambda(q, \cdot) = N(q, \lambda I, \cdot).$$
Here $I$ denotes the identity matrix and $N(q, \lambda I, \cdot)$ denotes the normal distribution with the expectation vector $q$ and the diagonal matrix $\lambda I$ as covariance matrix. Then

(5)

describes how the charge present at initial time zero diffuses until the time $t$. Here $l^d$ denotes the Lebesgue measure on $[\mathbb{R}^d, \mathfrak{B}^d]$. The discretization of the process corresponding to the source term $f$ occurs as follows: Let $f$ be a bounded measurable function defined on $\mathbb{R}^d \times \mathbb{R}_+$. We define functions $f^+$, $f^-$ by putting
$$f^+ := \max\{0, f\}, \qquad f^- := \max\{0, -f\}.$$
$\mu_1$ denotes the measure on $\mathbb{R}^{d+1} \times \{-1, 1\}$ characterized by
$$\mu_1(B \times \{1\}) = \int f^+(x)\,\chi_B(x)\, l^{d+1}(dx), \qquad B \in \mathfrak{L}^{d+1},$$
$$\mu_1(B \times \{-1\}) = \int f^-(x)\,\chi_B(x)\, l^{d+1}(dx), \qquad B \in \mathfrak{L}^{d+1}.$$
$\Phi_n$ denotes a random point system in $\mathbb{R}^{d+1} \times \{-1, 1\}$ with distribution $P_{n\mu_1}$. Thus $\Phi_n$ describes the configuration of charge points which are added spatially-temporally. If $\Phi_n(\{[x, s, 1]\}) = 1$ then a charge unit of the amount $1/n$ is added. If $\Phi_n(\{[x, s, -1]\}) = 1$ then a charge unit of the amount $-1/n$ is added. After a particle occurs, it diffuses. Thus the contribution of this source process to the charge density until $t > 0$ is given by
$$D_t^{(n)}(\cdot) = \frac{1}{n} \int c\, \big(K_{t-s}(x, \cdot)\,\chi_{(0,t)}(s) + \delta_x(\cdot)\,\chi_{\{t\}}(s)\big)\, \Phi_n(d[x, s, c]). \tag{6}$$
Here $\chi_B$ denotes the indicator function of a set $B$. The disturbance term is transformed similarly. Let $\sigma$ be a bounded measurable function defined on $\mathbb{R}^d \times \mathbb{R}_+$. $\sigma^2$ is considered as the density of a measure $\mu_2$ on $[\mathbb{R}^{d+1}, \mathfrak{B}^{d+1}]$. We define a measure $\mu$ on $\mathbb{R}^{d+1} \times \{-1, 1\}$ by
$$\mu := \mu_2 \times \frac{1}{2}(\delta_1 + \delta_{-1}).$$
Let $\Psi_n$ be a random point system in $\mathbb{R}^{d+1} \times \{-1, 1\}$ with distribution $P_{n\mu}$. Analogously to $\Phi_n$, the random point system $\Psi_n$ describes the configuration of charge points which are added spatially-temporally with mean intensity $n\sigma^2$, including the "sign" of the unit charges. In contrast to the first source process, individual charges of the value $1/\sqrt{n}$ are generated. Thus the contribution of this second source process to the charge density until $t > 0$ is given by
$$C_t^{(n)}(\cdot) = \frac{1}{\sqrt{n}} \int c\, \big(K_{t-s}(x, \cdot)\,\chi_{(0,t)}(s) + \delta_x(\cdot)\,\chi_{\{t\}}(s)\big)\, \Psi_n(d[x, s, c]). \tag{7}$$
Each particle should give a contribution to the entire charge and evolve independently of the other ones. These requirements do not follow from the properties of the Poisson process. Therefore, we assume that $\Phi_n$ and $\Psi_n$ are independent. Consequently the entire process can be defined as the sum of independent random variables

(8)
By the normalization $1/n$, respectively $1/\sqrt{n}$, the "potential" of one generated particle decreases when $n$ increases, whereas the average number of particles increases, i.e., the produced charge is "smeared over" in a certain manner. From Theorem 1.3.6 in [5] we can conclude:

Theorem 3.1. Let $t > 0$. Then the following holds:

(a) The sequence of the $\sigma$-additive processes $u_t^{(n)} = (u_t^{(n)}(B); B \in \mathfrak{L}^d)$ converges (in the sense of Definition 2.3) to a $\sigma$-additive process $u_t = (u_t(B); B \in \mathfrak{L}^d)$.
(b) For each finite sequence $(B_i)_{i=1}^{m}$ from $\mathfrak{L}^d$ the random vector $[u_t(B_1), \ldots, u_t(B_m)]$ is normally distributed. The distribution is characterized by the expectation values

and by the covariances

Let us note that the processes $(u_t)_{t>0}$ can be interpreted as the solution of the integral equation corresponding to (4) (cf. [5]).
A Completely Discrete Particle Model

In [3] a completely discrete particle model is considered, i.e., in addition to the source term $f$ and the diffusion term $\sigma^2$ the initial condition $\varphi$ has to be discretized. Furthermore the particles of the considered system move according to a Brownian motion. In the following we assume that $\varphi(x)$ is an integrable function on $\mathbb{R}^d$. Furthermore we assume that for all $t > 0$ the functions $f(x, s)\chi_{(0,t)}(s)$, $\sigma^2(x, s)\chi_{(0,t)}(s)$ are integrable. Those conditions ensure that the considered random point systems are always finite. In principle one can consider the case that $\varphi$, $f$, $\sigma^2$ are bounded measurable functions as in [5]. Now the initial condition $\varphi$ will be discretized as follows: $\mu_0$ denotes the measure on $\mathbb{R}^{d} \times \{-1, 1\}$ characterized by
$$\mu_0(B \times \{1\}) = \int \varphi^+(x)\,\chi_B(x)\, l^{d}(dx), \qquad B \in \mathfrak{L}^d,$$
$$\mu_0(B \times \{-1\}) = \int \varphi^-(x)\,\chi_B(x)\, l^{d}(dx), \qquad B \in \mathfrak{L}^d.$$
Let $\tilde{\Phi}_n$ be a Poisson random point system on $\mathbb{R}^d \times \{-1, 1\}$ with distribution $P_{n\mu_0}$. $\tilde{\Phi}_n$ describes the configuration of charge points. Further $((w_x(t))_{t \geq 0})_{x \in \mathbb{R}^d}$ denotes a family of independent standard Wiener processes on $\mathbb{R}^d$. A charge point starts from $x$ at initial time 0 and moves to $x + w_x(t)$ at time $t$. The contribution of this diffusion process to the charge density until $t > 0$ is given by

(9)

We have already discretized the source term $f$ and the diffusion term $\sigma^2$, i.e., we already have the point configurations $\Phi_n$ and $\Psi_n$. A charge point occurring at time $s$ at position $x$ moves to $x + w_x(t - s)$ at time $t$. The contributions of these source processes to the charge density until $t > 0$ are given by
$$\bar{D}_t^{(n)}(\cdot) = \frac{1}{n} \int c\, \chi_{(0,t)}(s)\, \delta_{x + w_x(t-s)}(\cdot)\, \Phi_n(d[x, s, c]), \tag{10}$$
$$\bar{C}_t^{(n)}(\cdot) = \frac{1}{\sqrt{n}} \int c\, \chi_{(0,t)}(s)\, \delta_{x + w_x(t-s)}(\cdot)\, \Psi_n(d[x, s, c]). \tag{11}$$
Furthermore we assume that $((w_x(t))_{t \geq 0})_{x \in \mathbb{R}^d}$, $\tilde{\Phi}_n$, $\Phi_n$, $\Psi_n$ are independent. The entire process can be defined as the sum of independent random variables
(12)

As for the partially discrete particle model, the discrete system $U_t^{(n)}$ should approximate a continuous system. The following theorem makes that more precise [3].

Theorem 3.2. Let $t > 0$, $B \in \mathfrak{L}^d$. Further let $\nu_t$, $\alpha_t$ be measures on $\mathbb{R}^d$ given by
$$\nu_t(B) = A_t(B) + \int K_{t-s}(x, B)\, f(x, s)\,\chi_{(0,t)}(s)\, l^{d+1}(d[x, s]),$$
$$\alpha_t(B) = \int K_{t-s}(x, B)\, \sigma^2(x, s)\,\chi_{(0,t)}(s)\, l^{d+1}(d[x, s]).$$
We assume that $V_t$ is a generalized Brownian motion with noise intensity measure $\alpha_t$. Then the sequence of the $\sigma$-additive processes $(U_t^{(n)})_{n \in \mathbb{N}}$ converges to a decomposable process $U_t$ given by the following equation
Remark 3.1. This theorem means that the completely discrete particle model approximates a continuous system which is different from the limit considered in [5], i.e., the limit of the completely discrete model has the same expectation values but different covariances, compared with the limit considered in [5].

4. Some Lemmas

For the proof of the main theorem in [3] (Theorem 3.2 in this paper) it is necessary to use the following lemmas.

Lemma 4.1. Let $\Psi_n$ be a Poisson random point system with intensity measure $n\mu$, where $\mu$ is a locally finite measure on $G$ and $n \in \mathbb{N}$. $u^{(n)}$ denotes the $\sigma$-additive process defined by $u^{(n)} := \frac{1}{n}\Psi_n$. Then the sequence $(u^{(n)})_{n \in \mathbb{N}}$ converges to the (trivial) $\sigma$-additive process $\mu$, i.e.,
$$C_{u^{(n)}}(g) \to \exp\Big\{i \int g(x)\,\mu(dx)\Big\} \qquad (n \to +\infty).$$
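The convergence stated in Lemma 4.1 can be illustrated numerically for a step function $g = \sum_k c_k \chi_{B_k}$ on pairwise disjoint sets. The sketch below is only an illustration under assumptions: it uses NumPy, the standard Poisson characteristic function, and arbitrarily chosen values for $c_k$ and $\mu(B_k)$; the function names are hypothetical.

```python
import numpy as np

def char_functional(n, c, mu_b):
    """E exp(i sum_k c_k Psi_n(B_k) / n) for independent Poisson counts with means n * mu(B_k)."""
    c = np.asarray(c, dtype=float)
    mu_b = np.asarray(mu_b, dtype=float)
    return np.exp(np.sum(n * mu_b * (np.exp(1j * c / n) - 1.0)))

def limit_functional(c, mu_b):
    """exp(i * integral of g d mu) for the step function g."""
    return np.exp(1j * np.dot(c, mu_b))

c = [1.0, -2.0, 0.5]       # values of g on B_1, B_2, B_3 (arbitrary choice)
mu_b = [0.3, 1.2, 2.0]     # mu(B_1), mu(B_2), mu(B_3) (arbitrary choice)

# Distance between the exact characteristic functional and its limit for growing n.
errors = [abs(char_functional(n, c, mu_b) - limit_functional(c, mu_b))
          for n in (10, 100, 1000, 10000)]

# Monte Carlo cross-check with simulated Poisson counts.
rng = np.random.default_rng(0)
n = 10000
counts = rng.poisson(n * np.array(mu_b), size=(2000, 3))
empirical = np.mean(np.exp(1j * (counts @ np.array(c)) / n))
```

The entries of `errors` shrink roughly like $1/n$, and the simulated average lies close to the limit.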
Lemma 4.2. Let <pr,
Now let $k_t(x, y)$ be the density of the transition probability of the $d$-dimensional standard Wiener process, i.e.,
$$k_t(x, y) = \Big(\frac{1}{2\pi t}\Big)^{\frac{d}{2}} \exp\Big(-\frac{1}{2t}\|x - y\|^2\Big) \qquad (t > 0,\ x, y \in \mathbb{R}^d).$$
Lemma 4.3. Let Q
Lemma 4.4. Let $\Phi$ be a Poisson random point system in $\mathbb{R}^d \times \mathbb{R}_+$ with finite intensity measure $\mu$, i.e., $\mu(\mathbb{R}^d \times (0, +\infty)) < +\infty$, being absolutely continuous w.r.t. the Lebesgue measure. $h$ denotes its density. Further let $((w_x(t))_{t \geq 0})_{x \in \mathbb{R}^d}$ be a family of independent standard Wiener processes in $\mathbb{R}^d$ being independent from $\Phi$. Then for each $t > 0$
$$\Phi_t := \int \Phi(d[x, s])\, \delta_{x + w_x(t-s)}\, \chi_{(0,t)}(s)$$
becomes a Poisson random point system in $\mathbb{R}^d$ with finite intensity measure $\mu_t$ being absolutely continuous w.r.t. the Lebesgue measure with density
$$h_t(y) = \int\!\!\int h(x, s)\,\chi_{(0,t)}(s)\, k_{t-s}(x, y)\, dx\, ds.$$
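5. Proof

In this section we give the proofs of the lemmas in Section 4. Using the lemmas, the proof of Theorem 3.2 is given in [3].

Proof of Lemma 4.1. Let us consider the characteristic functional $C_{u^{(n)}}(g)$ of $u^{(n)}$. Firstly we consider the special case $g^{(m)} = \sum_{k=1}^{m} c_k^{(m)} \chi_{B_k^{(m)}}$ with pairwise disjoint subsets $B_1^{(m)}, \ldots, B_m^{(m)}$. Then we have
$$\int g^{(m)}(x)\, du^{(n)}(x) = \sum_{k=1}^{m} c_k^{(m)} u^{(n)}(B_k^{(m)}) = \frac{1}{n} \sum_{k=1}^{m} c_k^{(m)} \Psi_n(B_k^{(m)}). \tag{13}$$
From (13) we obtain
$$\begin{aligned}
C_{u^{(n)}}(g^{(m)}) &= E \exp\Big(i \int g^{(m)}(x)\, du^{(n)}(x)\Big) \\
&= E \exp\Big(\frac{i}{n} \sum_{k=1}^{m} c_k^{(m)} \Psi_n(B_k^{(m)})\Big) \\
&= \sum_{l_1=0}^{+\infty} \cdots \sum_{l_m=0}^{+\infty} \prod_{k=1}^{m} \exp\Big(\frac{i c_k^{(m)}}{n}\, l_k\Big) \Pr\{\Psi_n(B_1^{(m)}) = l_1, \ldots, \Psi_n(B_m^{(m)}) = l_m\} \\
&= \prod_{k=1}^{m} \sum_{l_k=0}^{+\infty} \exp\Big(\frac{i c_k^{(m)}}{n}\, l_k\Big) \Pr\{\Psi_n(B_k^{(m)}) = l_k\}.
\end{aligned} \tag{14}$$
Since $\Psi_n(B_k^{(m)})$ is Poisson distributed with expectation $n\mu(B_k^{(m)})$, (14) gives
$$C_{u^{(n)}}(g^{(m)}) = \prod_{k=1}^{m} \exp\Big[n\mu(B_k^{(m)}) \Big\{\exp\Big(\frac{i c_k^{(m)}}{n}\Big) - 1\Big\}\Big]. \tag{15}$$
Using the Maclaurin expansion of the exponential function we have
$$n\mu(B_k^{(m)}) \Big\{\exp\Big(\frac{i c_k^{(m)}}{n}\Big) - 1\Big\} = i c_k^{(m)} \mu(B_k^{(m)}) + O\Big(\frac{1}{n}\Big). \tag{16}$$
From (15), (16) we obtain
$$\lim_{n \to +\infty} C_{u^{(n)}}(g^{(m)}) = \exp\Big\{i \sum_{k=1}^{m} c_k^{(m)} \mu(B_k^{(m)})\Big\} = \exp\Big\{i \int g^{(m)}(x)\,\mu(dx)\Big\}. \tag{17}$$
Finally let us approximate a general function $g$ by the sequence $(g^{(m)})_{m \in \mathbb{N}}$, i.e.
$$g^{(m)} \to g \qquad (m \to +\infty). \tag{18}$$
From (17), (18) we get
$$C_{u^{(n)}}(g) \longrightarrow \exp\Big\{i \int g(x)\,\mu(dx)\Big\} \qquad (n \to +\infty). \qquad \blacksquare$$

Proof of Lemma 4.2. Let us consider the characteristic functional $C_{u^{(n)}}(g)$ of $u^{(n)}$. Firstly we consider the special case $g^{(m)} = \sum_{k=1}^{m} c_k^{(m)} \chi_{B_k^{(m)}}$ with pairwise disjoint subsets $B_1^{(m)}, \ldots, B_m^{(m)}$. Then we have
$$\int g^{(m)}(x)\, du^{(n)}(x) = \sum_{k=1}^{m} c_k^{(m)} u^{(n)}(B_k^{(m)}) = \frac{1}{\sqrt{n}} \sum_{k=1}^{m} c_k^{(m)} \big\{\Phi_1^n(B_k^{(m)}) - \Phi_2^n(B_k^{(m)})\big\}. \tag{19}$$
From (19) we obtain
$$\begin{aligned}
C_{u^{(n)}}(g^{(m)}) &= E \exp\Big(\frac{i}{\sqrt{n}} \sum_{k=1}^{m} c_k^{(m)} \big\{\Phi_1^n(B_k^{(m)}) - \Phi_2^n(B_k^{(m)})\big\}\Big) \\
&= E \exp\Big(\frac{i}{\sqrt{n}} \sum_{k=1}^{m} c_k^{(m)} \Phi_1^n(B_k^{(m)})\Big)\; E \exp\Big(-\frac{i}{\sqrt{n}} \sum_{j=1}^{m} c_j^{(m)} \Phi_2^n(B_j^{(m)})\Big) && \text{(20a)} \\
&= \prod_{k=1}^{m} \exp\Big[\frac{n}{2}\mu(B_k^{(m)}) \Big\{\exp\Big(\frac{i c_k^{(m)}}{\sqrt{n}}\Big) - 1\Big\}\Big] \times \prod_{j=1}^{m} \exp\Big[\frac{n}{2}\mu(B_j^{(m)}) \Big\{\exp\Big(-\frac{i c_j^{(m)}}{\sqrt{n}}\Big) - 1\Big\}\Big] \\
&= \prod_{k=1}^{m} \exp\Big[\frac{n}{2}\mu(B_k^{(m)}) \Big\{\exp\Big(\frac{i c_k^{(m)}}{\sqrt{n}}\Big) + \exp\Big(-\frac{i c_k^{(m)}}{\sqrt{n}}\Big) - 2\Big\}\Big], && \text{(20b)}
\end{aligned}$$
where the equality (20a) is obtained from the independence of $\Phi_1^n$ and $\Phi_2^n$. Using the Maclaurin expansion of the exponential function we have
$$\frac{n}{2}\mu(B_k^{(m)}) \Big\{\exp\Big(\frac{i c_k^{(m)}}{\sqrt{n}}\Big) + \exp\Big(-\frac{i c_k^{(m)}}{\sqrt{n}}\Big) - 2\Big\} = -\frac{(c_k^{(m)})^2}{2}\,\mu(B_k^{(m)}) + O\Big(\frac{1}{n}\Big). \tag{21}$$
From (20b), (21) we get
$$C_{u^{(n)}}(g^{(m)}) = \exp\Big[-\frac{1}{2} \sum_{k=1}^{m} (c_k^{(m)})^2 \mu(B_k^{(m)}) + O\Big(\frac{1}{n}\Big)\Big]. \tag{22}$$
From (22) we obtain
$$C_{u^{(n)}}(g^{(m)}) \longrightarrow \exp\Big[-\frac{1}{2} \int \{g^{(m)}(x)\}^2\, \mu(dx)\Big] \qquad (n \to +\infty). \tag{23}$$
Similarly to (18), let us approximate a general function $g$ by the sequence $(g^{(m)})_{m \in \mathbb{N}}$. Then it follows from (23) that
$$C_{u^{(n)}}(g) \longrightarrow \exp\Big[-\frac{1}{2} \int \{g(x)\}^2\, \mu(dx)\Big] \qquad (n \to +\infty). \qquad \blacksquare$$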
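Proof of Lemma 4.3. Let us consider the characteristic functional $C_{\Phi_t}$ of $\Phi_t$. Firstly we consider the special case $g^{(m)} = \sum_{k=1}^{m} c_k^{(m)} \chi_{B_k^{(m)}}$ with pairwise disjoint subsets $B_1^{(m)}, \ldots, B_m^{(m)}$. Then we have
$$\int g^{(m)}(x)\, d\Phi_t(x) = \sum_{k=1}^{m} c_k^{(m)} \Phi_t(B_k^{(m)}) = \sum_{k=1}^{m} c_k^{(m)} \int \Phi(dx)\, \delta_{x + w_x(t)}(B_k^{(m)}) = \sum_{x \in \Phi} g^{(m)}(x + w_x(t)). \tag{24}$$
From (24) we obtain
$$C_{\Phi_t}(g^{(m)}) = E \exp\Big(i \int g^{(m)}(x)\, d\Phi_t(x)\Big) = E \exp\Big\{\sum_{x \in \Phi} i g^{(m)}(x + w_x(t))\Big\} = E \prod_{x \in \Phi} \exp\big\{i g^{(m)}(x + w_x(t))\big\}. \tag{25}$$
Using the independence of $((w_x(t))_{t \geq 0})_{x \in \mathbb{R}^d}$ and $\Phi$ we have
$$C_{\Phi_t}(g^{(m)}) = \int P_\mu(d\varphi) \prod_{x \in \varphi} E \exp\big\{i g^{(m)}(x + w_x(t))\big\} = \exp\Big(\int \mu(dx) \big[E \exp\{i g^{(m)}(x + w_x(t))\} - 1\big]\Big). \tag{26}$$
Further it holds
$$E \exp\big\{i g^{(m)}(x + w_x(t))\big\} = \int \exp\big\{i g^{(m)}(y)\big\}\, k_t(x, y)\, dy. \tag{27}$$
From (26), (27) we obtain
$$\begin{aligned}
C_{\Phi_t}(g^{(m)}) &= \exp\Big(\int \mu(dx) \Big[\int \exp\big\{i g^{(m)}(y)\big\}\, k_t(x, y)\, dy - 1\Big]\Big) \\
&= \exp\Big(\int \Big\{\int h(x)\, k_t(x, y)\, dx\Big\} \big[\exp\{i g^{(m)}(y)\} - 1\big]\, dy\Big).
\end{aligned} \tag{28}$$
Setting $h_t$ as
$$h_t(y) := \int h(x)\, k_t(x, y)\, dx,$$
we obtain from (28)
$$C_{\Phi_t}(g^{(m)}) = \exp\Big(\int h_t(y) \big[\exp\{i g^{(m)}(y)\} - 1\big]\, dy\Big) = \exp\Big(\int \mu_t(dy) \big\{\exp\{i g^{(m)}(y)\} - 1\big\}\Big). \tag{29}$$
Similarly to (18), let us approximate a general function $g$ by the sequence $(g^{(m)})_{m \in \mathbb{N}}$. Then it follows from (29) that
$$C_{\Phi_t}(g) = \exp\Big(\int \mu_t(dy) \big[\exp\{i g(y)\} - 1\big]\Big).$$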
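That proves Lemma 4.3. $\blacksquare$

Proof of Lemma 4.4. Let us consider the characteristic functional $C_{\Phi_t}$ of $\Phi_t$. Firstly we consider the special case $g^{(m)} = \sum_{k=1}^{m} c_k^{(m)} \chi_{B_k^{(m)}}$ with pairwise disjoint subsets $B_1^{(m)}, \ldots, B_m^{(m)}$. Then we have
$$\int g^{(m)}(x)\, d\Phi_t(x) = \sum_{k=1}^{m} c_k^{(m)} \Phi_t(B_k^{(m)}) = \sum_{k=1}^{m} c_k^{(m)} \int \Phi(d[x, s])\, \chi_{(0,t)}(s)\, \delta_{x + w_x(t-s)}(B_k^{(m)}) = \sum_{[x,s] \in \Phi,\ 0 < s < t} g^{(m)}(x + w_x(t - s)). \tag{30}$$
From (30) we obtain
$$\begin{aligned}
C_{\Phi_t}(g^{(m)}) &= E \exp\Big(i \int g^{(m)}(x)\, d\Phi_t(x)\Big) \\
&= E \exp\Big\{\sum_{[x,s] \in \Phi,\ 0 < s < t} i g^{(m)}(x + w_x(t - s))\Big\} \\
&= E \prod_{[x,s] \in \Phi,\ 0 < s < t} \exp\big\{i g^{(m)}(x + w_x(t - s))\big\} \\
&= \int P_\mu(d\varphi) \prod_{[x,s] \in \varphi,\ 0 < s < t} E \exp\big\{i g^{(m)}(x + w_x(t - s))\big\} \\
&= \exp\Big(\int \mu(d[x, s])\, \chi_{(0,t)}(s) \big[E \exp\{i g^{(m)}(x + w_x(t - s))\} - 1\big]\Big),
\end{aligned} \tag{31}$$
where the equality (31) is obtained from the independence of $((w_x(t))_{t \geq 0})_{x \in \mathbb{R}^d}$ and $\Phi$. Further it holds
$$E \exp\big\{i g^{(m)}(x + w_x(t - s))\big\} = \int \exp\big\{i g^{(m)}(y)\big\}\, k_{t-s}(x, y)\, dy. \tag{32}$$
From (31), (32) we obtain
$$\begin{aligned}
C_{\Phi_t}(g^{(m)}) &= \exp\Big(\int \mu(d[x, s])\, \chi_{(0,t)}(s) \Big[\int \exp\big\{i g^{(m)}(y)\big\}\, k_{t-s}(x, y)\, dy - 1\Big]\Big) \\
&= \exp\Big(\int \Big\{\int\!\!\int h(x, s)\, \chi_{(0,t)}(s)\, k_{t-s}(x, y)\, dx\, ds\Big\} \big[\exp\{i g^{(m)}(y)\} - 1\big]\, dy\Big).
\end{aligned} \tag{33}$$
Setting $h_t$ as
$$h_t(y) := \int\!\!\int h(x, s)\, \chi_{(0,t)}(s)\, k_{t-s}(x, y)\, dx\, ds,$$
we obtain from (33)
$$C_{\Phi_t}(g^{(m)}) = \exp\Big(\int h_t(y) \big[\exp\{i g^{(m)}(y)\} - 1\big]\, dy\Big) = \exp\Big(\int \mu_t(dy) \big[\exp\{i g^{(m)}(y)\} - 1\big]\Big). \tag{34}$$
Similarly to (18), let us approximate a general function $g$ by the sequence $(g^{(m)})_{m \in \mathbb{N}}$. Then it follows from (34) that
$$C_{\Phi_t}(g) = \exp\Big(\int \mu_t(dy) \big[\exp\{i g(y)\} - 1\big]\Big).$$
That proves Lemma 4.4. $\blacksquare$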
References

1. L. Breiman, Probability, Reading, Mass., 1968.
2. J. Feldman, Decomposable processes and continuous products of probability spaces, J. Function. Analysis, 8, 1-51, 1971.
3. K.-H. Fichtner, K. Inoue, M. Ohya, Approximative approaches to a stochastic partial differential equation by point systems, Preprint.
4. K.-H. Fichtner, R. Manthey, Weak approximation of stochastic equations, Stochastics and Stochastics Reports, 43, 139-160, 1993.
5. K.-H. Fichtner, M. Schmidt, Approximation of a continuous system by point systems, SERDICA, 13, 396-402, 1987.
6. K.-H. Fichtner, G. Winkler, Generalized Brownian motion, point processes and stochastic calculus for random fields, Math. Nachr. 161, 291-307, 1993.
7. K. Matthes, J. Kerstan, J. Mecke, Infinitely divisible point processes, J. Wiley, New York, 1978.
8. J. B. Walsh, A stochastic model of neural response, Adv. Appl. Probability, 13, 231-281, 1981.
9. H. Zessin, The method of moments for random measure, Z. Wahrsch. Verw. Gebiete 62, 395-409, 1983.
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 173- 183)
ON QUANTUM ALGORITHM FOR EXPTIME PROBLEM

S. IRIYAMA AND M. OHYA

Department of Information Sciences, Tokyo University of Science, 2641 Yamazaki, Noda City, Chiba, Japan

There exists a quantum algorithm with chaos dynamics, called the OMV SAT algorithm, which solves an NP-complete problem in polynomial time. The language class EXPTIME is a larger class than NP, and there is no classical algorithm that solves it in polynomial time. In this paper we propose a quantum algorithm for one of the problems in EXPTIME, Pebble Game, and compare its computational complexity with that of the classical one. We show that a quantum algorithm with an oracle solves it in polynomial time, while a classical algorithm with the same oracle takes exponential time.
1. Introduction
We have studied quantum algorithms for several years. Ohya and Volovich discovered a quantum algorithm with chaos dynamics, called the OMV quantum algorithm, which can solve an NP-complete problem in polynomial time. We have applied this algorithm to other problems: multiple alignment of amino acid sequences, the Hamilton closed path problem, the protein folding problem and an EXPTIME problem. From these applications we found that the OMV quantum algorithm is useful for search problems, i.e., for finding objects which satisfy given conditions. Many such search problems arise in bio-informatics, and our quantum algorithm can be applied to them. In this paper we present a quantum algorithm for an EXPTIME problem, the Pebble Game, and then discuss its computational complexity.
2. Quantum Algorithm

A quantum algorithm is constructed by the following steps:

(1) Prepare a Hilbert space
(2) Construct an initial state
(3) Construct unitary operators to solve the problem
(4) Apply them to the initial state and obtain a result state
(5) If necessary, amplify the probability of the correct result
(6) Measure an observable with the result state

In the first step, we define the Hilbert space depending on the problem. Let $\mathbb{C}^2$ be a Hilbert space spanned by

$$|0\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad \text{and} \quad |1\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix};$$

a normalized vector $|\psi\rangle = \alpha |0\rangle + \beta |1\rangle$ on this space is called a qubit. Since we can use a superposition of $|0\rangle$ and $|1\rangle$ as an initial state vector, a quantum algorithm can be more effective than a classical one. One can apply the Hadamard transformation

$$H = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}$$

to create a superposition. For $|0\rangle$ and $|1\rangle$, it works as

$$H|0\rangle = \frac{1}{\sqrt{2}} |0\rangle + \frac{1}{\sqrt{2}} |1\rangle, \qquad H|1\rangle = \frac{1}{\sqrt{2}} |0\rangle - \frac{1}{\sqrt{2}} |1\rangle.$$
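The action of the Hadamard transformation on the basis vectors can be checked numerically. The following is a minimal sketch using NumPy; the names `ket0`, `ket1`, `H` and `uniform_superposition` are illustrative choices and do not come from the paper.

```python
import numpy as np

# Computational basis of C^2 (column vectors stored as 1-D arrays).
ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# Hadamard transformation H = (1/sqrt(2)) [[1, 1], [1, -1]].
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)


def uniform_superposition(n):
    """Apply H to each of n qubits prepared in |0> and return the 2**n amplitudes."""
    state = np.array([1.0])
    for _ in range(n):
        state = np.kron(state, H @ ket0)
    return state


if __name__ == "__main__":
    print(H @ ket0)                  # amplitudes of H|0>
    print(H @ ket1)                  # amplitudes of H|1>
    print(uniform_superposition(3))  # eight equal amplitudes
```

Applying `H` to every qubit of $|0 \cdots 0\rangle$ gives equal amplitudes on all $2^n$ basis states, which is the kind of initial superposition the algorithm relies on.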
The Hadamard transformation plays a very important role in quantum algorithms. Here we introduce logical gates: the NOT gate, the C-NOT gate and the CC-NOT gate. We call these gates fundamental gates. We can also construct AND and OR gates by considering products of fundamental gates and some implementations. The NOT gate $U_{NOT}$ is defined on the Hilbert space $\mathbb{C}^2$ as

$$U_{NOT} = |1\rangle\langle 0| + |0\rangle\langle 1|.$$

It works for an arbitrary qubit as

$$U_{NOT} \left( \alpha |0\rangle + \beta |1\rangle \right) = \alpha |1\rangle + \beta |0\rangle.$$

The C-NOT gate $U_{CN}$ and the CC-NOT gate $U_{CCN}$ are given on the two- and three-qubit Hilbert spaces as

$$U_{CN} = |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes U_{NOT},$$

$$U_{CCN} = |0\rangle\langle 0| \otimes I \otimes I + |1\rangle\langle 1| \otimes |0\rangle\langle 0| \otimes I + |1\rangle\langle 1| \otimes |1\rangle\langle 1| \otimes U_{NOT},$$

respectively. The unitary operator to solve the problem is constructed from these fundamental gates.
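The three fundamental gates can be written down as explicit matrices. The sketch below builds them from the projector formulas above with NumPy and shows that CC-NOT acts as an AND on a target qubit prepared in $|0\rangle$; the helper names `kron3` and `basis` are hypothetical and only serve the illustration.

```python
import numpy as np

I2 = np.eye(2)
ket = {"0": np.array([1.0, 0.0]), "1": np.array([0.0, 1.0])}

P0 = np.outer(ket["0"], ket["0"])  # |0><0|
P1 = np.outer(ket["1"], ket["1"])  # |1><1|

# U_NOT = |1><0| + |0><1|
U_NOT = np.outer(ket["1"], ket["0"]) + np.outer(ket["0"], ket["1"])


def kron3(a, b, c):
    """Tensor product of three single-qubit operators."""
    return np.kron(np.kron(a, b), c)


# U_CN = |0><0| (x) I + |1><1| (x) U_NOT
U_CN = np.kron(P0, I2) + np.kron(P1, U_NOT)

# U_CCN = |0><0| (x) I (x) I + |1><1| (x) |0><0| (x) I + |1><1| (x) |1><1| (x) U_NOT
U_CCN = kron3(P0, I2, I2) + kron3(P1, P0, I2) + kron3(P1, P1, U_NOT)


def basis(bits):
    """Basis vector |b1 b2 ...> for a string of '0'/'1' characters."""
    state = np.array([1.0])
    for b in bits:
        state = np.kron(state, ket[b])
    return state


if __name__ == "__main__":
    # With the third qubit in |0>, CC-NOT writes (a AND b) into it.
    for a in "01":
        for b in "01":
            out = U_CCN @ basis(a + b + "0")
            print(a, b, "->", int(np.argmax(out)) & 1)
```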
3. OMV SAT Algorithm

In this section, we explain the OMV (Ohya-Masuda-Volovich) quantum algorithm, which consists of two parts: a unitary computation and a chaos amplification process. It is discussed precisely in the papers 1,2,4,9.

Let $X \equiv \{x_1, \dots, x_n\}$, $n \in \mathbb{N}$, be a set. The element $x_k$ and its negation $\bar{x}_k$ $(k = 1, \dots, n)$ are called literals. The set of all literals is denoted by $X' \equiv X \cup \bar{X} = \{x_1, \dots, x_n, \bar{x}_1, \dots, \bar{x}_n\}$. The set of all subsets of $X'$ is denoted by $\mathcal{F}(X')$, and an element $C \in \mathcal{F}(X')$ is called a clause. We take a truth assignment $t$ to all variables $x_k$. If the assignment gives the truth value 1 to at least one element of $C$, then $C$ is called satisfiable. Let $L = \{0, 1\}$ be a Boolean lattice with the usual join $\vee$ and meet $\wedge$, and let $t(x)$ be the truth value of a literal $x$ in $X'$. Then the truth value of a clause $C$ is written as $t(C) \equiv \bigvee_{x \in C} t(x)$. Moreover, the set $\mathcal{C}$ of all clauses $C_j$ $(j = 1, 2, \dots, m)$ is called satisfiable iff $t(\mathcal{C}) \equiv \bigwedge_{j=1}^{m} t(C_j) = 1$. Thus the SAT problem is written as follows:

[SAT problem] Given a Boolean set $X \equiv \{x_1, \dots, x_n\}$ and a set $\mathcal{C} = \{C_1, \dots, C_m\}$ of clauses, determine whether $\mathcal{C}$ is satisfiable or not.

That is, the problem asks whether there exists a truth assignment which makes $\mathcal{C}$ satisfiable. It is known that satisfiability can be checked in polynomial time when a specific truth assignment is given; however, no polynomial-time method is known for deciding it when an assignment is not specified.

We first calculate the total number of qubits, and show that this number depends on the input data. This calculation is done in time polynomial in the input size. From the total number of qubits required by the quantum algorithm, we determine the Hilbert space and the initial state vector on it. Let $\mathcal{C} = \{C_1, \dots, C_m\}$ be a set of clauses on $X' = \{x_1, \dots, x_n, \bar{x}_1, \dots, \bar{x}_n\}$. The computational basis of this algorithm is on the Hilbert space $\mathcal{H} = (\mathbb{C}^2)^{\otimes n + \mu + 1}$, where $\mu$ is the number of dust qubits; it is shown in 9 that $\mu$ is at most $2mn$. Let
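$$|v_0\rangle = |0^n, 0^\mu, 0\rangle$$

be an initial state vector. For $X = \{x_1, \dots, x_n\}$ and a truth assignment $t$, we put

$$t(x_k) = \varepsilon_k \quad (k = 1, \dots, n),$$

where $\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n \in \{0, 1\}$, and we write $t$ as a sequence of binary symbols:

$$e_t = \varepsilon_1 \varepsilon_2 \cdots \varepsilon_n.$$

A unitary operator $U_{\mathcal{C}} : \mathcal{H} \to \mathcal{H}$ computes $t(\mathcal{C})$ for all truth assignments as follows:

$$U_{\mathcal{C}} |v_0\rangle = \frac{1}{\sqrt{2^n}} \sum_{t} |e_t, d^\mu, t(\mathcal{C})\rangle,$$

where $|d^\mu\rangle$ denotes the dust qubits, a string of $\mu$ binary symbols, and $|e_t\rangle = |\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n\rangle$ is a binary representation of $t$. Accardi and Sabbadini pointed out that the OMV SAT algorithm is combinatorial 6.

Theorem 3.1. (see 9) For a set of clauses $\mathcal{C} = \{C_1, \dots, C_m\}$ on $X' \equiv \{x_1, \dots, x_n, \bar{x}_1, \dots, \bar{x}_n\}$, the number $\mu$ of dust qubits for the algorithm of the SAT problem satisfies $\mu \le 2nm$.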
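The definitions of $t(C)$ and $t(\mathcal{C})$ given in this section can be made concrete with a small classical sketch. It assumes a hypothetical encoding that is not in the paper: the literal $x_k$ is the integer `k` and its negation is `-k`. The exhaustive loop over all $2^n$ assignments is the classical counterpart of what $U_{\mathcal{C}}$ evaluates on a superposition.

```python
from itertools import product


def clause_value(clause, t):
    """t(C): join (OR) of the truth values of the literals in the clause."""
    return int(any(t[abs(lit) - 1] == (1 if lit > 0 else 0) for lit in clause))


def formula_value(clauses, t):
    """t(script C): meet (AND) of the clause values."""
    return int(all(clause_value(c, t) for c in clauses))


def count_satisfying(clauses, n):
    """Number r of truth assignments t in {0,1}^n with t(script C) = 1."""
    return sum(formula_value(clauses, t) for t in product((0, 1), repeat=n))


if __name__ == "__main__":
    # (x1 or x2) and (not x1 or x2) and (x1 or not x2)
    clauses = [{1, 2}, {-1, 2}, {1, -2}]
    r = count_satisfying(clauses, n=2)
    print("satisfying assignments:", r, "of", 2 ** 2)
```

The fraction `r / 2**n` computed this way is the quantity that reappears as the initial value $x_0$ in Theorem 3.5.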
For a set of clauses $\mathcal{C} = \{C_1, \dots, C_m\}$, we can construct the unitary operator $U_{\mathcal{C}}$ to calculate the truth value of $\mathcal{C}$ as

$$U_{\mathcal{C}} \equiv \prod_{i=1}^{m-1} U_{AND}(i) \prod_{j=1}^{m} U_{OR}(j) \, H(n),$$

where $H(k)$ is a unitary operator which applies the Hadamard transformation to the first $k$ qubits, that is,

$$H(k) = H^{\otimes k} \otimes I^{\otimes (n + \mu + 1 - k)}.$$

The computational complexity of a quantum computation depends on the number of unitary operators in the quantum circuit. Let $U$ be a unitary operator written as

$$U = U_n U_{n-1} \cdots U_1,$$

where $U_n, \dots, U_1$ are fundamental gates. The computational complexity $T(U)$ is then defined as $n$.
In practice, we need to combine fundamental gates such as $U_{NOT}$, $U_{CN}$ and $U_{CCN}$ to construct the quantum circuit. $U_{AND}$ and $U_{OR}$ can be written as combinations of fundamental gates. Here we obtain the computational complexity $T(U_{\mathcal{C}})$ of the SAT algorithm by counting the number of $U_{NOT}$, $U_{AND}$ and $U_{OR}$ gates.

Theorem 3.2. (see 9) For a set of clauses $\mathcal{C} = \{C_1, \dots, C_m\}$ and the set of literals $X' = \{x_1, \dots, x_n, \bar{x}_1, \dots, \bar{x}_n\}$, $T(U_{\mathcal{C}})$ is

$$T(U_{\mathcal{C}}) = m - 1 + \sum_{k=1}^{m} \left( |C_k| + 2 i'_k - 1 \right) \le 4mn - 1.$$
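The upper bound in Theorem 3.2 can be checked in one line. The check below rests on two assumptions that are not stated explicitly in the text: every clause contains at most $2n$ literals, so $|C_k| \le 2n$, and the quantity $i'_k$ attached to the clause $C_k$ satisfies $i'_k \le n$.

```latex
\begin{align*}
T(U_{\mathcal{C}})
  &= m - 1 + \sum_{k=1}^{m} \left( |C_k| + 2 i'_k - 1 \right) \\
  &\le m - 1 + \sum_{k=1}^{m} \left( 2n + 2n - 1 \right) \\
  &= m - 1 + m (4n - 1) \\
  &= 4mn - 1 .
\end{align*}
```

Under these assumptions the gate count is bounded by a polynomial in the number of variables $n$ and the number of clauses $m$.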
3.1. Chaos Amplifier
Here we briefly review how chaos can play a constructive role in computation (see 1,2 for the details). Consider the so-called logistic map, which is given by the equation

$$x_{n+1} = a x_n (1 - x_n), \quad x_n \in [0, 1].$$

The properties of the map depend on the parameter $a$. If we take, for example, $a = 3.71$, then the Lyapunov exponent is positive, the trajectory is very sensitive to the initial value, and one has chaotic behavior 2. It is important to notice that if the initial value $x_0 = 0$, then $x_n = 0$ for all $n$. The state $|\psi\rangle$ of the previous subsection is transformed into the density matrix of the form

$$\rho = q^2 P_1 + (1 - q^2) P_0,$$

where $P_1$ and $P_0$ are projectors onto the state vectors $|1\rangle$ and $|0\rangle$. One has to notice that $P_1$ and $P_0$ generate an Abelian algebra which can be considered as a classical system. The following theorems are proven in 1,2,3.

Theorem 3.3. For the logistic map $x_{n+1} = a x_n (1 - x_n)$ with $a \in [0, 4]$ and $x_0 \in [0, 1]$, let $x_0$ be $\frac{1}{2^n}$ and let the set $J$ be $\{0, 1, 2, \dots, n, \dots, 2n\}$. If $a$ is 3.71, then there exists an integer $k$ in $J$ satisfying $x_k > \frac{1}{2}$.

Theorem 3.4. Let $a$ and $n$ be the same as in the above theorem. If there exists $k$ in $J$ such that $x_k > \frac{1}{2}$, then $k > \frac{n - 1}{\log_2 3.71 - 1}$.

Theorem 3.5. Let $|t(\mathcal{C})|$ be the number of truth assignments satisfying $\mathcal{C}$. If $x_0 \equiv \frac{r}{2^n}$ with $r = |t(\mathcal{C})|$ and there exists $k$ in $J$ such that $x_k > \frac{1}{2}$, then, if $\mathcal{C}$ is SAT, there exists $k$ satisfying the following inequality:

$$\left[ \frac{n - 1 - \log_2 r}{\log_2 3.71 - 1} \right] \le k \le \left[ \frac{5}{4} (n - 1) \right].$$

From these theorems, for all $k$, it holds that

$$g_{3.71}^{k}(q^2) \begin{cases} = 0 & \text{iff } \mathcal{C} \text{ is not SAT}, \\ > 0 & \text{iff } \mathcal{C} \text{ is SAT}, \end{cases}$$

where $g_{3.71}^{k}$ denotes the $k$-th iterate of the logistic map with $a = 3.71$.

Ohya and Volovich proposed a quantum algorithm which calculates the truth function by unitary operators, and mentioned that some amplification process is necessary to detect the result when its probability is very small. They then discovered that this process can be constructed by using chaos dynamics. The computational complexity of the OMV SAT algorithm is discussed precisely in the papers 9,10,11.
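The amplification step can be illustrated by iterating the logistic map directly. The following sketch is a classical floating-point experiment rather than the authors' procedure: it starts from $x_0 = 1/2^n$ as in Theorem 3.3 and searches the index set $J = \{0, \dots, 2n\}$ for the first $k$ with $x_k > 1/2$. The function names are hypothetical.

```python
def iterate_logistic(x0, k, a=3.71):
    """Return x_k for the logistic map x_{j+1} = a * x_j * (1 - x_j)."""
    x = x0
    for _ in range(k):
        x = a * x * (1.0 - x)
    return x


def first_exceed(n, a=3.71, threshold=0.5):
    """Smallest k in J = {0, ..., 2n} with x_k > threshold, starting at x_0 = 1/2**n.

    Returns None if no such k exists in J.
    """
    x = 1.0 / 2 ** n
    for k in range(2 * n + 1):
        if x > threshold:
            return k
        x = a * x * (1.0 - x)
    return None


if __name__ == "__main__":
    for n in (5, 10, 20, 30):
        print(n, first_exceed(n))
    # A vanishing initial value is never amplified.
    print(iterate_logistic(0.0, 50))
```

A value that starts at exactly zero stays at zero, while a value as small as $1/2^n$ is pushed above $1/2$ within a number of steps that grows only linearly in $n$; this is the distinction between the "not SAT" and "SAT" cases that the amplifier exploits.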
4. Language Classes
There exist several classical language classes defined by Turing machines.

Definition 4.1. Let $M$ be a deterministic Turing machine which halts for all inputs. For an input of length $n$, let $f(n)$ be the maximum number of tape cells used by $M$. We define the space complexity of $M$ as $f$.

Definition 4.2. PSPACE is the class of languages recognized by a deterministic Turing machine in polynomial space.

Definition 4.3. NPSPACE is the class of languages recognized by a non-deterministic Turing machine in polynomial space.

Definition 4.4. EXPTIME is the class of languages recognized by a deterministic Turing machine in exponential time.

The following relation is known:

$$P \subseteq NP \subseteq PSPACE = NPSPACE \subseteq EXPTIME$$
5. Pebble Game

The Pebble Game is a two-player game played with a board and stones (pebbles). The players alternately move one stone at a time according to a given rule. A player wins when he moves a stone to the winning position on the board specified by the rule, and he loses when he cannot move any stone. We want to know whether there exists a strategy with which the first mover wins every time. The computational complexity of this problem is very large, and in some cases, depending on the size of the board and on the rule, it belongs to EXPTIME. We therefore propose a quantum algorithm for the Pebble Game. First, we explain the representation of a game and the definition of the Pebble Game.
5.1. Representation of a Game

Let $I$ be a set of players, $M$ a set of moves (or strategies) available to those players, and $P$ a set of payments for each combination of moves. A game $G$ is given by a triplet $(I, M, P)$. The set of moves is given by the rules of the game. Players choose their moves alternately, each in their own turn, so these choices are described by a sequence of moves. We call such a sequence a position and denote it by $p_i$, where the player $i \in I$ has the move. The set of ordered pairs $(p_i, p_j)$ of positions such that $p_i \to p_j$ is a feasible move expresses the rules of the game. The game is finished when no move is available to the player whose turn it is. When the game is finished, the payments are given to the players by a function of the position. In this study we assume that the game must finish, so that the length of a position is finite. In a two-player game, we say that a player A wins if the payment of A is greater than that of a player B. Then we consider the following problem:

[Game Problem1] Does there exist a way such that a player A wins independently of how a player B plays?

To answer this, a winning position of A was introduced by König as a position from which A wins in finitely many moves regardless of the moves of B. The set of winning positions of A is constructed inductively from the set of positions where A wins by only one move. Let $p$ be a winning position; then $q$ is also a winning position if there exists a move $r_A$ of A such that for all moves $r_B$ of B it holds that

$$r_B r_A q = p.$$

Therefore, Problem1 is equivalent to the following:

[Game Problem2] Determine whether the initial position of the game is a winning position of A.
5.2. Pebble Game

Let $n$ be a positive finite integer, called the size of the Pebble Game. The $n$-Pebble Game is given by a triplet $G_{Pebble} = (I, M, P)$, where $I = \{A, B\}$, $M = \{r = (r_1, r_2, r_3) \in \{1, \dots, n\}^3 ;\ r \text{ is an available move}\}$ and $P = \{f : M^k \to \{0, 1\}\}$. $M$ is a set of available moves depending on the situation of the board and the positions of the stones. The rules of the Pebble Game are the following:

• Prepare a board of $n$ lattice nodes and put $m$ $(< n)$ stones on the nodes. Each node has a unique number $i = 1, \dots, n$.
• Players alternately move one of the stones, provided that the chosen stone can jump over a neighboring stone.
• If a player puts a stone on the node $n$, he wins. If a player cannot move any stone, he loses.

Let $x = (x_1, x_2, \dots, x_n)$ be a vector of Boolean variables describing the situation of the board, where $x_i$ is in one-to-one correspondence with the node $i$:

$$x_i = \begin{cases} 0 & \text{no stone on node } i, \\ 1 & \text{node } i \text{ has a stone.} \end{cases}$$
We place $m$ stones on the board arbitrarily and call the resulting situation $x_s$ an initial situation. We call a situation $x_f$ a finished situation if there are no available moves. A move $r_i$ $(i \in \{A, B\})$ for a player $i$ is represented by $(r_1, r_2, r_3) \in \{1, \dots, n\}^3$, where $r_1$ indicates the position of the stone to be moved, $r_2$ the neighbor and $r_3$ the destination. If there are stones on $r_1$ and $r_2$ and there is no stone on $r_3$, the move $r_i = (r_1, r_2, r_3)$ is available. After one move, the situation of the board is changed by

$$x_j' = \begin{cases} \bar{x}_j & j = r_1 \text{ or } j = r_3, \\ x_j & \text{otherwise}; \end{cases}$$

we denote by $r_i(x)$ the situation of the board after a move $r_i$. If there are no moves available, or a player can move a stone to the node $n$, the game is finished. Then we have a sequence of moves. For a sequence of moves $p_i = (r^1, r^2, \dots, r^k)$ $(i \in \{A, B\})$, the payment for player A is given by a function $f_A \in P$:

$$f_A(p_i) = \begin{cases} 1 & i = B, \\ 0 & i = A, \end{cases}$$

and that for player B by a function $f_B \in P$:

$$f_B(p_i) = \begin{cases} 0 & i = B, \\ 1 & i = A. \end{cases}$$
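The availability condition and the update rule for a move translate directly into code. In the minimal sketch below a situation is a tuple of 0/1 values with node $i$ stored at index `i - 1`; the function names are illustrative and not taken from the paper.

```python
def is_available(x, move):
    """A move (r1, r2, r3) is available if r1 and r2 hold stones and r3 is empty."""
    r1, r2, r3 = move
    return x[r1 - 1] == 1 and x[r2 - 1] == 1 and x[r3 - 1] == 0


def apply_move(x, move):
    """Return the situation after the move: entries r1 and r3 are negated."""
    if not is_available(x, move):
        raise ValueError("move is not available in this situation")
    r1, _, r3 = move
    y = list(x)
    y[r1 - 1] = 1 - y[r1 - 1]
    y[r3 - 1] = 1 - y[r3 - 1]
    return tuple(y)


if __name__ == "__main__":
    x_s = (1, 1, 0, 0)                 # stones on nodes 1 and 2
    print(apply_move(x_s, (1, 2, 3)))  # stone jumps from node 1 over node 2 to node 3
```

The stone on $r_2$ stays in place, so the number of stones on the board is conserved by every move.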
Let us define the following problem:

[Pebble Problem1] Does there exist a way such that a player A wins independently of how a player B plays in the $n$-Pebble Game?

This problem belongs to EXPTIME if $m$ is not fixed 8. We check whether an initial situation is a winning position of A for every number $m$ of stones. Then Pebble Problem1 is translated into the following problem:

[Pebble Problem2] For all finished situations $x_f$, determine whether there exists $k$ such that

$$\exists r_A^{k} \, \forall r_B^{k-1} \cdots \exists r_A^{3} \, \forall r_B^{2} \, \exists r_A^{1} \, (x_s) = x_f,$$

where $x_s$ is an initial situation.

We propose a quantum algorithm with an Oracle to solve Pebble Problem2, and show that even if we assume an Oracle, a classical algorithm still requires exponential time.
5.3. Computational Complexity of a Classical Algorithm for Pebble Game

In order to compare the computational complexity of the classical and quantum algorithms, we assume the following Oracle $M_O$:

$$M_O : \{x ;\ x \text{ is a situation}\} \to \mathbb{N} \cup \{0\}.$$

For a finished situation $x_f$, $M_O$ outputs its time immediately, in just one step. If a situation $x$ is not a final situation, $M_O$ outputs 0. A classical algorithm to solve Pebble Problem2 is the following.

Step1 For all situations, we do:
Step2 Calculate the time $k(x)$ for a situation $x$.
Step3 If $x$ is a final situation, construct the sets $W_i$ $(i = 1, 2, \dots, k(x))$ of winning situations:

$$W_i = \{y ;\ \exists r_A', \forall r_B, \exists r_A,\ r_A' r_B r_A y \in W_{i-1}\},$$

where $W_0 = \{x\}$.
Step4 Check $x_s \in W_{k(x)}$.
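The exponential cost of the classical search can be seen in a small brute-force experiment. The sketch below does not use the Oracle $M_O$ and is not the authors' Step1 to Step4; it swaps in a standard fixed-point computation of the positions from which A can force a win, which plays the same role as the inductive construction of the sets $W_i$. The rule set `line_rules` (a stone on a line of $n$ nodes jumps over an adjacent stone) is a hypothetical example, since the paper leaves the board geometry to the rule.

```python
from itertools import product


def line_rules(n):
    """Hypothetical rule set: triples (i, i+d, i+2d), d = +1 or -1, on nodes 1..n."""
    rules = []
    for i in range(1, n + 1):
        for d in (1, -1):
            if 1 <= i + 2 * d <= n:
                rules.append((i, i + d, i + 2 * d))
    return rules


def successors(x, rules):
    """All situations reachable from x by one available move."""
    result = []
    for r1, r2, r3 in rules:
        if x[r1 - 1] == 1 and x[r2 - 1] == 1 and x[r3 - 1] == 0:
            y = list(x)
            y[r1 - 1], y[r3 - 1] = 0, 1
            result.append(tuple(y))
    return result


def a_wins(x_s, n, rules):
    """True if A, moving first from x_s, can force a win.

    A position is a pair (situation, mover). The set `win` grows until it
    contains every position from which A forces a win (least fixed point).
    """
    situations = [x for x in product((0, 1), repeat=n) if x[n - 1] == 0]
    win = set()
    changed = True
    while changed:
        changed = False
        for x in situations:          # 2**(n-1) situations: exponential in n
            succ = successors(x, rules)
            for mover in ("A", "B"):
                if (x, mover) in win:
                    continue
                if mover == "A":
                    # A needs one move that reaches node n or a won position.
                    ok = any(y[n - 1] == 1 or (y, "B") in win for y in succ)
                else:
                    # Every B move must avoid node n and lead to a won position;
                    # if B has no move at all, B loses.
                    ok = all(y[n - 1] == 0 and (y, "A") in win for y in succ)
                if ok:
                    win.add((x, mover))
                    changed = True
    return (x_s, "A") in win


if __name__ == "__main__":
    print(a_wins((1, 1, 0), 3, line_rules(3)))
    print(a_wins((1, 1, 0, 0), 4, line_rules(4)))
```

The outer loop visits every situation of the board, so the running time grows like $2^n$ regardless of how cheaply a single situation is processed.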
The number of all situations is $2^n$, and the upper bound of the number of available moves is $4n$, since a player can move one pebble in at most four directions. Therefore the computational complexity $T_C(n)$ of the classical algorithm is

$$T_C(n) \sim (4n)^3 \times 2^n \sim \exp(n).$$

Even if we assume the Oracle, the classical algorithm still requires exponential time.
6. Quantum Algorithm for Pebble Game

As we assume the Oracle $M_O$, we use a quantum Oracle $U_{M_O}$ which works in the same way as $M_O$. Here, we construct the following quantum algorithm:

Step1 Create a superposition of all situations.
Step2 For the superposition, apply the Oracle $U_{M_O}$.
Step3 Construct $W_i$ $(i = 1, 2, \dots, k(x_f))$ for the superposition.
Step4 If $x_s \in W_{k(x)}$, make the final qubit $|1\rangle$.

All situations are represented in binary form, so we can create a superposition of them using the Hadamard transformation. Step3 is achieved by a product of unitary gates. In Step4, the AND operation is constructed from unitary gates 9. If the final qubit of the superposition is $|1\rangle$, there exists a winning strategy. Using the chaos amplifier, we obtain the result.
7. Computational Complexity

Here, we calculate the computational complexity of the quantum algorithm as the total number of fundamental gates. Step1 is constructed from $n$ Hadamard gates. Step2 is done by only one application of the Oracle $U_{M_O}$. The computational complexity of Step3 has the same order as in the classical algorithm. In Step4 the AND operation requires $n$ gates for $n$ qubits. The upper bound of $|W_{k(x)}|$ is $\binom{n}{m}$, where $m$ is the number of pebbles. Therefore, the computational complexity $T_Q(n)$ of the quantum algorithm for Pebble Problem2 is

$$T_Q(n) \sim \left\{ n + 1 + (4n)^3 + \binom{n}{m} \right\} \times \left[ \frac{5}{4} (n - 1) \right] \sim \mathrm{poly}(n),$$

where $\left[ \frac{5}{4} (n - 1) \right]$ accounts for the chaos amplification needed to obtain the correct result.
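The two estimates can be compared numerically. The sketch below simply evaluates the stated expressions for $T_C(n)$ and $T_Q(n)$, reading the bracket $[\,\cdot\,]$ as the integer part, which is an assumption. The number of pebbles $m$ is passed explicitly, because $\binom{n}{m}$ is polynomial in $n$ only when $m$ is held fixed.

```python
from math import comb


def t_classical(n):
    """T_C(n) ~ (4n)^3 * 2^n."""
    return (4 * n) ** 3 * 2 ** n


def t_quantum(n, m):
    """T_Q(n) ~ {n + 1 + (4n)^3 + C(n, m)} * [5/4 (n - 1)], bracket read as integer part."""
    amplification_steps = (5 * (n - 1)) // 4
    return (n + 1 + (4 * n) ** 3 + comb(n, m)) * amplification_steps


if __name__ == "__main__":
    m = 3  # hypothetical fixed number of pebbles
    for n in (10, 20, 40, 80):
        print(n, t_classical(n), t_quantum(n, m))
```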
8. Conclusion
The computational complexity $T_C(n)$ of a classical algorithm for the $n$-Pebble Game with the Oracle is

$$T_C(n) \sim (4n)^3 \times 2^n \sim \exp(n).$$

This is exponential in the input size $n$. We constructed a quantum algorithm for this problem, whose computational complexity $T_Q(n)$ is

$$T_Q(n) \sim \left\{ n + 1 + (4n)^3 + \binom{n}{m} \right\} \times \left[ \frac{5}{4} (n - 1) \right] \sim \mathrm{poly}(n).$$

In this study, we assume the Oracle $M_O$ and the unitary operator $U_{M_O}$ which works in the same way as $M_O$. Even if we use this Oracle, the computational complexity of the classical algorithm for the Pebble Game is exponential, while that of the quantum algorithm is polynomial in $n$.
References

1. M. Ohya and I.V. Volovich, Quantum computing and chaotic amplification, J. Opt. B, 5, No. 6, 639-642, 2003.
2. M. Ohya and I.V. Volovich, New quantum algorithm for studying NP-complete problems, Rep. Math. Phys., 52, No. 1, 25-33, 2003.
3. M. Ohya and I.V. Volovich, Mathematical Foundation of Quantum Computers, Teleportations and Cryptography, to be published.
4. M. Ohya and N. Masuda, NP problem in Quantum Algorithm, Open Systems and Information Dynamics, 7, No. 1, 33-39, 2000.
5. L. Accardi and M. Ohya, A Stochastic limit approach to the SAT problem, Open Systems and Information Dynamics, 11-3, 219-233, 2004.
6. L. Accardi and R. Sabbadini, On the Ohya-Masuda quantum SAT Algorithm, Preprint Volterra, N. 432, 2000.
7. E. Bernstein and U. Vazirani, Quantum Complexity Theory, in Proc. 25th ACM Symp. on Theory of Computation, 11-20, 1993.
8. T. Kasai, A. Adachi, S. Iwata, Classes of Pebble Games and Complete Problems, SIAM J. Comput., Volume 8, Issue 4, 574-586, 1979.
9. S. Iriyama and M. Ohya, Rigorous Estimate for OMV SAT Algorithm, Open Systems & Information Dynamics, 15, 2, 173-187, 2008.
10. S. Iriyama, M. Ohya and I.V. Volovich, Generalized Quantum Turing Machine and its Application to the SAT Chaos Algorithm, QP-PQ: Quantum Prob. White Noise Anal., Quantum Information and Computing, 19, World Sci. Publishing, 204-225, 2006.
11. S. Iriyama and M. Ohya, Language Classes Defined by Generalized Quantum Turing Machine, Open Systems and Information Dynamics, 15:4, 383-396, 2008.
12. S. Iriyama and M. Ohya, The problem to construct Unitary Quantum Turing Machine for compute partial recursive function, TUS preprint, 2009.
13. S. Iriyama and M. Ohya, Quantum Algorithm for Pebble Game and Its Computational Complexity, TUS preprint, 2010.
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 185-197)
ON SUFFICIENT ALGEBRAIC CONDITIONS FOR IDENTIFICATION OF QUANTUM STATES
ANDRZEJ JAMIOLKOWSKI

Institute of Physics, Nicholas Copernicus University, 87-100 Torun, Poland
E-mail: [email protected]

The aim of this paper is to discuss the relationship between some problems of the identification of quantum states by geometric methods used in stroboscopic tomography and, on the other hand, by the algebraic approach typical for the quantum generalization of classical sufficient statistics. Some examples of necessary and sufficient conditions which must be fulfilled by generators of algebras in order to estimate states of quantum systems are discussed.
Keywords: algebras of observables; stroboscopic tomography; generators of subalgebras.
1. Introduction
One of the basic purposes of physical theories, in both the classical and quantum sectors, is to describe events that are observed in Nature or in experiments conducted in laboratories. Usually, we have a physical system under investigation, and we try to obtain information about it by performing experiments. As results, measurement outcomes are registered. In fact, in many situations in classical systems, and in all cases in the quantum regime, we cannot predict the individual measurement outcomes and we obtain only probabilities of results. In other words, we obtain, as the outputs of experiments, only probability distributions on a set of possible measurement outcomes.

The statistical theory of classical systems is based on classical probability theory. However, atoms and molecules obey the statistical laws of quantum mechanics and one has to develop a parallel theory based on quantum probability (noncommutative probability), which is an essential generalization of classical probability theory. In classical statistical physics we consider probability distributions as the natural representation of states and random variables as the representation of observables (physical quantities). In the description of microsystems it is natural to start with the idea of observables as a quantum analogue of random variables and to define states as a derived concept. This means that we introduce quantum states through the concept of an algebra of observables, taking the states to belong to the dual space. The algebra, furnished with the operation of conjugation (*-operation), is postulated to be a C*-algebra with identity. It can be considered as a natural generalization of the algebra of classical functions with the *-operation given by complex conjugation, and whose real random variables are self-adjoint elements. A nice discussion of these issues is given by W. Thirring in his lecture notes 1.

From the physical point of view, by a state we understand the description of the statistical properties of a system when prepared repeatedly in the same way. In other words, one of the basic assumptions of quantum mechanics is based on the observation that determination of a completely unknown state can be achieved by appropriate measurements only if we have at our disposal a set of identically prepared copies of the system in question.

From the mathematical point of view, we say that a state on a C*-algebra of observables is an assignment of a number (the expectation value) to every element of the algebra. This assignment should obey the natural laws of linearity and positivity. Thus, if we denote by $\mathcal{A}$ a C*-algebra with identity (a set of observables), then a state on $\mathcal{A}$ is a linear map

$$\rho : \mathcal{A} \to \mathbb{C} \qquad (1)$$

such that $\rho(a_1 Q_1 + a_2 Q_2) = a_1 \rho(Q_1) + a_2 \rho(Q_2)$ for all $a_1, a_2$ in $\mathbb{C}$ and $Q_1, Q_2$ in $\mathcal{A}$, and $\rho(Q) \ge 0$ for all positive $Q \in \mathcal{A}$. Usually, we also assume that $\rho(\mathbb{I}) = 1$. Moreover, to set up an effective approach to the above problem of state determination, one has to identify a collection of observables, a quorum 2, such that their expectation values contain complete information about the state of the system under consideration.

In the standard formulation of quantum mechanics, we usually introduce a Hilbert space $\mathcal{H}$ associated with a given microsystem and we identify an algebra of observables $\mathcal{A}$ with the set of hermitian elements of the Hilbert-Schmidt space $B(\mathcal{H})$. For any normalized vector $|\psi\rangle \in \mathcal{H}$ the formula
$$\rho_\psi(Q) = \langle \psi | Q | \psi \rangle \qquad (2)$$

defines a state on $\mathcal{A}$. Such a state is called a vector state. In general, the expectation values are given by convex combinations of some vector states and we have the following equality

$$\bar{\rho}(Q) = \operatorname{Tr}(\rho Q), \qquad (3)$$
where $\rho$ denotes a state of the system in question. The problems of state determination have gained new relevance in recent years, following the realization that quantum systems and their evolutions can perform tasks such as teleportation, secure communication or dense coding (cf. e.g. 3,4). It is important to realize that if we identify the quorum of observables, then we also have a possibility to determine the expectation values of physical quantities (observables) for which no measuring apparatuses are available 5.

The idea of stroboscopic tomography for open quantum systems appeared for the first time at the beginning of the 1980s (although it was expressed in different terms 6,7 from the ones used presently). In particular, the question of the minimal number of observables $Q_1, \dots, Q_r$ for which quantum states can be $(Q_1, \dots, Q_r)$-reconstructible was discussed. Simultaneously, it was shown that reconstructibility of states in a finite dimensional system can be achieved if a sequence of the so-called Krylov subspaces, which are defined by

$$\mathcal{K}_k(\mathbb{L}, Q) := \operatorname{span}\{Q, \mathbb{L} Q, \dots, \mathbb{L}^{k-1} Q\}, \qquad (4)$$

where $Q$ is a fixed observable and $\mathbb{L}$ is a generator of the time evolution of the system in question, span the Hilbert-Schmidt space $B(\mathcal{H})$ (cf. Sect. 3 below). That is, more precisely, if the following equality is satisfied:

$$\bigoplus_{i=1}^{r} \mathcal{K}_\mu(\mathbb{L}, Q_i) = B(\mathcal{H}). \qquad (5)$$

In the above equality $\mu$ denotes the degree of the minimal polynomial of the superoperator $\mathbb{L}$ and $Q_1, \dots, Q_r$ represent fixed observables. The symbol $\oplus$ in (5) denotes the Minkowski sum of subspaces (cf. e.g. 8). We recall that for two subspaces $\mathcal{K}_1$ and $\mathcal{K}_2$ of the vector space $\mathcal{H}$, by $\mathcal{K}_1 \oplus \mathcal{K}_2$ one understands the smallest subspace of $\mathcal{H}$ which contains both $\mathcal{K}_1$ and $\mathcal{K}_2$. It is well known that the Krylov subspaces $\mathcal{K}_k(\mathbb{L}, Q)$ for $k = 1, 2, \dots$ form a nested sequence of subspaces of increasing dimensions that eventually become invariant under $\mathbb{L}$. Hence for a given $Q$, there exists an index $\mu = \mu(Q)$, often called the grade of $Q$ with respect to $\mathbb{L}$, for which

$$\mathcal{K}_1(\mathbb{L}, Q) \subset \cdots \subset \mathcal{K}_\mu(\mathbb{L}, Q) = \mathcal{K}_{\mu+1}(\mathbb{L}, Q) = \mathcal{K}_{\mu+2}(\mathbb{L}, Q) = \cdots. \qquad (6)$$

It is easy to see that, for a given operator $Q$, the natural number $\mu(Q)$ is equal to the degree of the minimal polynomial of $\mathbb{L}$ with respect to $Q$.
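The nested structure of the Krylov subspaces and the grade $\mu(Q)$ can be explored numerically for a qubit. The sketch below assumes a purely Hamiltonian generator $\mathbb{L} Q = i[H, Q]$ with $H = \sigma_z$, which is an illustrative choice and not an example from the paper; dimensions are obtained as matrix ranks of the vectorized operators.

```python
import numpy as np

# Pauli matrices and identity for a qubit (d = 2, so dim B(H) = 4).
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
ID = np.eye(2, dtype=complex)

H = Z  # hypothetical Hamiltonian


def L(Q):
    """Heisenberg-picture generator of a closed system: L Q = i [H, Q]."""
    return 1j * (H @ Q - Q @ H)


def krylov_dim(gen, observables, k):
    """Dimension of the sum of the subspaces span{Q, gen Q, ..., gen^(k-1) Q}."""
    vectors = []
    for Q in observables:
        A = Q
        for _ in range(k):
            vectors.append(A.reshape(-1))
            A = gen(A)
    return int(np.linalg.matrix_rank(np.array(vectors)))


def grade(gen, Q, kmax):
    """Smallest k at which the Krylov subspace of Q stops growing."""
    previous = 0
    for k in range(1, kmax + 1):
        dim = krylov_dim(gen, [Q], k)
        if dim == previous:
            return k - 1
        previous = dim
    return kmax


if __name__ == "__main__":
    print("grade of X:", grade(L, X, 4))
    print("dim for {X}:", krylov_dim(L, [X], 4))
    print("dim for {X, Z}:", krylov_dim(L, [X, Z], 4))
    print("dim for {X, Z, I}:", krylov_dim(L, [X, Z, ID], 4))
```

In this example the single observable $\sigma_x$ generates a two-dimensional Krylov subspace, so fewer observables than $d^2 - 1$ already give access to part of the operator space through the dynamics.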
Clearly, $\mu(Q) \le \mu(\mathbb{L})$, where $\mu(\mathbb{L})$ denotes the degree of the minimal polynomial of the superoperator $\mathbb{L}$ (cf. e.g. 9). Now, let us observe that even if the observables $Q_1, \dots, Q_r$ are linearly independent, the Krylov subspaces $\mathcal{K}_k(\mathbb{L}, Q_i)$ for $i = 1, \dots, r$ can have nontrivial intersections.

When during one of the previous QBIC meetings the author presented the main ideas of the stroboscopic tomography, Prof. M. Ohya immediately suggested in response an operator algebra approach to these problems and proposed to study the case of some subalgebras of the set of all observables, instead of Krylov subspaces. The main aim of this paper is to discuss properties of subalgebras which allow for the identification of quantum states and to analyse the minimal number of generators of these algebras.

The organization of the paper is as follows: In Section 2 we formulate some questions connected with problems of estimation and discrimination of quantum states; Section 3 presents the main ideas of the stroboscopic tomography. Then, in Section 4, we discuss an algebraic approach to the problem of identification of quantum states. We conclude the paper in Section 5 by discussing some examples of algebraic methods in low dimensional systems.

2. Identification of quantum states
In the statistical description of physical systems the main role of observables is to statistically identify states, or some of their properties. A typical goal of an experiment can be to decide among various alternatives or hypotheses about states. As a very good reference on this type of problem we recommend the review book 10. The details of a particular identification problem depend on our prior knowledge and the properties we want to discuss. One can say that owing to both the a priori knowledge about states and the knowledge of our technical possibilities we define the alternatives that we should experimentally verify. In general, depending on whether the set of alternatives is finite or not, one makes a distinction between discrimination and estimation problems. One can introduce three different types of problems:

1) State estimation problem. In its most general form, one wants to identify the state of a system assuming that no additional (prior) knowledge is available. In other words, the whole state space of a system constitutes the set of possible hypotheses.

2) Sufficient statistics for families of states. In this case we are interested in considering only a subset of the whole set of states. We encode prior knowledge about the preparation of states in a multiparameter family of states and consider them as a possible set of hypotheses. For example, we can assume that one considers states which are vector states or have a particular block-diagonal form.

3) State discrimination problem. A particular case of problem 2). One assumes that we want to identify a state which belongs to a finite set $\{\rho_1, \dots, \rho_r\}$ and our aim is to distinguish among these $r$ possibilities. It is an obvious observation that in this case the set of observables used for identification can be restricted in an essential way.

All the above problems give rise to very interesting particular questions and we will discuss them in separate publications. Problem 2) is discussed in detail in our paper "Wandering subalgebras, sufficiency and stroboscopic tomography" 11.

3. Stroboscopic tomography of open quantum systems
Quantum theory, as a description of properties of microsystems, was born more than a hundred years ago. But for a long time it was merely a theory of isolated systems. Only around fifty years ago was the theory of quantum systems generalized. The so-called theory of open quantum systems (systems interacting with their environments) was established, and the main sources of inspiration for it were quantum optics and the theory of lasers. This led to the generalization of states (now density operators are considered as the natural representation of quantum states), and to a generalized description of their time evolution. At that time the concept of so-called quantum master equations, which preserve positive semi-definiteness of density operators, and the idea of a quantum communication channel were born, cf. e.g. 4,3,12. On the mathematical level, this approach initiated the study of semigroups of completely positive maps and their generators.

Now, for the convenience of the reader, we summarize the main ideas and methods of description of open quantum systems and the so-called stroboscopic tomography.

The time evolution of a quantum system of finitely many degrees of freedom (a qudit), coupled with an infinite quantum system, usually called a reservoir, can be described, under certain limiting conditions, by a one-parameter semigroup of maps (cf. e.g. 13,14). Let $\mathcal{H}$ be the Hilbert space of the first system ($\dim \mathcal{H} = d$) and let

$$\Phi(t) : B_*(\mathcal{H}) \to B_*(\mathcal{H}), \quad t \in \mathbb{R}_+, \qquad (7)$$

be a dynamical semigroup, where $B_*(\mathcal{H})$ denotes the real vector space of all self-adjoint operators on $\mathcal{H}$. If one introduces the scalar product of operators $A, B$ by the formula $(A, B) = \operatorname{Tr}(A^* B)$, then $B_*(\mathcal{H})$ can be considered as yet another inner product space, namely the so-called Hilbert-Schmidt space with the norm defined by $\|\rho\|^2 = \operatorname{Tr}(\rho^* \rho)$. States of the system are described by density operators $\rho \in S(\mathcal{H})$, where

$$S(\mathcal{H}) := \{\rho \in B_*(\mathcal{H}) ;\ \rho \ge 0,\ \operatorname{Tr} \rho = 1\}. \qquad (8)$$

Usually one assumes that the family of linear superoperators $\Phi(t)$ satisfies

(1) $\Phi(t)$ is trace preserving, $t \in \mathbb{R}_+$,
(2) $\|\Phi(t) \rho\| \le \|\rho\|$ for all $\rho \in B_*(\mathcal{H})$,
(3) $\Phi(t_1) \circ \Phi(t_2) = \Phi(t_1 + t_2)$ for all $t_1, t_2$ in $\mathbb{R}_+$, and if $t \to 0$, then $\lim \Phi(t) = \mathbb{I}$.

Since $\Phi(t)$ so defined is a contraction, it follows from the Hille-Yosida theorem that there exists a linear superoperator $\mathbb{K} : B_*(\mathcal{H}) \to B_*(\mathcal{H})$ such that $\Phi(t) = \exp(t \mathbb{K})$ for all $t \ge 0$ and

$$\frac{d\rho(t)}{dt} = \mathbb{K} \rho(t), \qquad (9)$$

where $\rho(t) = \Phi(t) \rho(0)$. One should stress that the above conditions for the semigroup $\Phi(t)$ imply preservation of positivity of density operators, $\rho(0) \ge 0 \Rightarrow \rho(t) = \Phi(t) \rho(0) \ge 0$ for all $t \in \mathbb{R}_+$. Now, the equation (9) (usually called the master equation) defines an assignment (the trajectory of $\rho(0)$)

$$\mathbb{R}_+ \ni t \mapsto \rho(t) \in S(\mathcal{H}), \qquad (10)$$
provided that we know the initial state of the system p(O) E S(Ji). The fundamental question of the stroboscopic tomography reads: What can we say about the trajectories (initial state p(O)) if the only information about the system in question is given by the mean values (11) of, say, r linearly independent self-adjoint operators Q1, ... , Qr at some instants t1, ... , t p , where r < d2 - 1 and tj E [0, TJ for j = 1, ... ,p, T> O. In other words, the problem of the stroboscopic tomography consists in the reconstruction of the initial state p(O), or a current state p(t), for any t E ~~, from known expectation values (11). To be more precise we introduce the following description. Suppose that we can prepare a quantum system repeatedly in the same initial state and we make a series of experiments such that we know the expectation values EQ(tj) = Tr(Qp(tj)) for a fixed
191
set of observables Q1,'" ,Qr at different time instants h < t2 < ... < tp' The basic question is: Can we find the expectation value of any other operator Q E B*(7t), that is any other observable from B*(7t) , knowing the set of measured outcomes of a given set Q1,"" Qr at t1, ... , t p, i.e. knowing Ej(tk) for j = 1, ... , r and where 0 S t1 < t2 < ... < tp S T, for an interval [0, T]? If the problem under consideration is static, then the state of ad-level open quantum system (a qudit) can be uniquely determined only if r = d2 - 1 expectation values of linearly independent observables are at our disposal. However, if we assume that we know the dynamics of our system i.e. we know the generator lK or lL := (lK)* (in the Heisenberg picture) of the time evolution, then we can use the stroboscopic approach based on a discrete set of times t 1, ... , tp' In general, we use the term "statetomography" to denote any kind of state-reconstruction method. With reference to the terminology used in system theory, we introduce the following definition: A d-level open quantum system § is said to be (Q1, ... ,Qr )-reconstructible on the interval [0, T], if for every two trajectories defined by the equation (9) there exists at least one instant £ E [0, T] and at least one operator Qk E {Q1,"" Qr} such that (12)
The above definition is equivalent to the following statement. A $d$-level open quantum system $\mathcal{S}$ is $(Q_1, \ldots, Q_r)$-reconstructible on the interval $[0, T]$ iff there exists at least one set of time instants $0 < t_1 < \ldots < t_p \leq T$ such that the state trajectory can be uniquely determined by the correspondence
$$[0, T] \ni t_j \mapsto E_i(t_j) = \mathrm{Tr}(Q_i \rho(t_j)) \tag{13}$$
for $i = 1, \ldots, r$ and $j = 1, \ldots, p$. Let us observe that in the above definition of reconstructibility we discuss the problem of verifying whether the accessible information about the system in question is sufficient to determine the state uniquely, and we do not insist on determining it explicitly. The positive dynamical semigroup $\{\Phi(t), t \in \mathbb{R}_+\}$ is determined by the generator $\mathbb{K} : B_*(\mathcal{H}) \to B_*(\mathcal{H})$ (the Schrödinger picture) and it is related to the generator $\mathbb{L}$ of the semigroup in the Heisenberg picture by the duality relation
$$\mathrm{Tr}[Q(\mathbb{K}\rho)] = \mathrm{Tr}[(\mathbb{L}Q)\rho]. \tag{14}$$
For a given set of observables $Q_1, \ldots, Q_r$, the subspace spanned by the operators $Q_i, \mathbb{L}Q_i, \ldots, (\mathbb{L})^{k-1}Q_i$ will be denoted by
$$\mathcal{K}_k(\mathbb{L}, Q_i) := \mathrm{Span}\{Q_i, \mathbb{L}Q_i, \ldots, (\mathbb{L})^{k-1}Q_i\} \tag{15}$$
and referred to as the Krylov subspace in the Hilbert-Schmidt space $B_*(\mathcal{H})$. If $k = \mu$, where $\mu$ is the degree of the minimal polynomial of the generator $\mathbb{L}$, then the subspace $\mathcal{K}_\mu(\mathbb{L}, Q_i)$ is an invariant subspace of the superoperator $\mathbb{L}$ with respect to $Q_i$. It can be easily seen that the subspace $\mathcal{K}_\mu(\mathbb{L}, Q_i)$ is spanned by all operators of the form $(\mathbb{L})^k Q_i$, where $k = 0, 1, \ldots$. Furthermore, it is the smallest invariant subspace of the superoperator $\mathbb{L}$ containing $Q_i$ (i.e. the common part of all invariant subspaces of the operator $\mathbb{L}$ containing $Q_i$). One can now formulate the sufficient conditions for the reconstructibility of a $d$-level open quantum system (cf. 6,7). Let $\mathcal{S}$ be a $d$-level open quantum system with the evolution governed by an equation of the form $\dot{Q}(t) = \mathbb{L}Q(t)$ (the Heisenberg picture), where $\mathbb{L}$ is the generator of the dynamical semigroup $\Psi(t) = \exp(t\mathbb{L})$. Suppose that, by performing measurements, the correspondence
$$[0, T] \ni t_j \mapsto E_i(t_j) = \mathrm{Tr}(Q_i \rho(t_j)) \tag{16}$$
can be established for fixed observables $Q_1, \ldots, Q_r$ at selected time instants $t_1, \ldots, t_p$. The system $\mathcal{S}$ is $(Q_1, \ldots, Q_r)$-reconstructible if
$$\sum_{i=1}^{r} \mathcal{K}_\mu(\mathbb{L}, Q_i) = B_*(\mathcal{H}). \tag{17}$$
The above condition has been obtained by using the polynomial representation of the semigroup $\Psi(t)$. Indeed, if $\mu(\lambda, \mathbb{L})$ denotes the minimal polynomial of the generator $\mathbb{L}$ and $\mu = \deg \mu(\lambda, \mathbb{L})$, then $\Psi(t) = \exp(t\mathbb{L})$ can be represented in the form
$$\Psi(t) = \sum_{k=0}^{\mu-1} \alpha_k(t)\, \mathbb{L}^k, \tag{18}$$
where the functions $\alpha_k(t)$ for $k = 0, \ldots, \mu - 1$ are particular solutions of the scalar linear differential equation with characteristic polynomial $\mu(\lambda, \mathbb{L})$. Since the functions $\alpha_k(t)$ are mutually independent, for arbitrary $T > 0$ there exists at least one set of moments $t_1, \ldots, t_\mu$ ($\mu = \deg \mu(\lambda, \mathbb{L})$) such that
$$0 \leq t_1 < t_2 < \ldots < t_\mu \leq T \tag{19}$$
and $\det[\alpha_k(t_j)] \neq 0$. Taking into account these conditions one finds that the state $\rho(0)$ can be determined uniquely if operators of the form
$$(\mathbb{L})^k Q_l \tag{20}$$
for $l = 1, \ldots, r$ and $k = 0, 1, \ldots$ span the space $B_*(\mathcal{H})$. In other words, we can say that $\rho(0)$ can be determined if the vectors (20) constitute a frame in the Hilbert-Schmidt space $B_*(\mathcal{H})$ or, equivalently, if the Krylov subspaces $\mathcal{K}_\mu(\mathbb{L}, Q_l)$ for $l = 1, \ldots, r$ constitute a fusion frame in $B_*(\mathcal{H})$ 15. The question of an obvious physical interest is to find the minimal number of observables $Q_1, \ldots, Q_\eta$ for which a $d$-level quantum system $\mathcal{S}$ with a fixed generator $\mathbb{L}$ can be $(Q_1, \ldots, Q_\eta)$-reconstructible. It can be shown that for a $d$-level generator there always exists a set of observables $Q_1, \ldots, Q_\eta$, where
$$\eta := \max_{\lambda \in \sigma(\mathbb{L})} \{\dim \mathrm{Ker}(\lambda \mathbb{I} - \mathbb{L})\}, \tag{21}$$
such that the system is $(Q_1, \ldots, Q_\eta)$-reconstructible 9. Moreover, if we have another set of observables $\tilde{Q}_1, \ldots, \tilde{Q}_{\tilde{r}}$ such that the system is $(\tilde{Q}_1, \ldots, \tilde{Q}_{\tilde{r}})$-reconstructible, then $\tilde{r} \geq \eta$. The number $\eta$ defined by (21) is called the index of cyclicity of the quantum open system $\mathcal{S}$ 9. The symbol $\sigma(\mathbb{L})$ in (21) denotes the spectrum of the superoperator $\mathbb{L}$.
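The spanning condition on the Krylov subspaces and the index of cyclicity can be explored numerically. The following is only a minimal sketch, assuming that the generator $\mathbb{L}$ is given as a $d^2 \times d^2$ matrix acting on column-stacked operators; the function names and the choice of a purely Hamiltonian qubit generator $\mathbb{L}Q = i[H, Q]$ are illustrative assumptions, not taken from the text.

```python
import numpy as np

def commutator_generator(H):
    """Hypothetical helper: matrix of Q -> i[H, Q] acting on column-stacked vec(Q)."""
    d = H.shape[0]
    I = np.eye(d)
    # vec(HQ) = (I kron H) vec(Q) and vec(QH) = (H^T kron I) vec(Q)
    return 1j * (np.kron(I, H) - np.kron(H.T, I))

def krylov_span_dim(L, observables, tol=1e-10):
    """Dimension of span{L^k vec(Q_i)}, k = 0..n-1, over all given observables."""
    n = L.shape[0]
    vectors = []
    for Q in observables:
        v = np.asarray(Q, dtype=complex).reshape(-1, order="F")
        for _ in range(n):  # n powers are enough to exhaust each Krylov subspace
            vectors.append(v)
            v = L @ v
    return int(np.linalg.matrix_rank(np.column_stack(vectors), tol=tol))

def cyclicity_index(L, tol=1e-8):
    """max over eigenvalues lambda of dim Ker(lambda*I - L), computed via ranks."""
    n = L.shape[0]
    eigenvalues = np.linalg.eigvals(L)
    return int(max(n - np.linalg.matrix_rank(L - lam * np.eye(n), tol=tol)
                   for lam in eigenvalues))

# Illustrative qubit example (assumed, not from the text): H = sigma_z
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
L = commutator_generator(sz)
```

In this toy example a single observable $\sigma_x$ only reaches a two-dimensional Krylov subspace, whereas the pair $\{\sigma_x + \sigma_z, \mathbb{I}\}$ spans the whole four-dimensional operator space, in line with the computed index of cyclicity.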
4. Algebraic approach to identification problems

In this section we will discuss some problems of estimation of quantum states when the Krylov subspaces, which play such an important role in stroboscopic tomography, are replaced by some subalgebras of the Hilbert-Schmidt space $B_*(\mathcal{H})$. Just as the fundamental theorem of algebra ensures that every linear operator acting on a finite-dimensional complex Hilbert space has a nontrivial invariant subspace, the fundamental theorem of noncommutative algebra asserts the existence of invariant subspaces of $\mathcal{H}$ for some families of operators from $B_*(\mathcal{H})$. It is an obvious observation that an algebra generated by any fixed operator $Q$ and the identity on $\mathcal{H}$ cannot be equal to $B_*(\mathcal{H})$. This statement is based on the Hamilton-Cayley theorem. However, already for two operators $Q_1$, $Q_2$ and the identity we can have $\mathrm{Alg}(\mathbb{I}, Q_1, Q_2) = B(\mathcal{H})$ (for details cf. below and the next section). In general, the famous Burnside's theorem states that an operator algebra on a finite-dimensional vector space with no nontrivial invariant subspaces must be the algebra of all linear operators. In the sequel we will use the following version of this theorem:

Fundamental theorem of noncommutative algebras. If $\mathcal{A}$ is a proper subalgebra of $B_*(\mathcal{H})$ containing the identity, and the dimension of the Hilbert space $\mathcal{H}$ is greater than or equal to 2, then $\mathcal{A}$ has a proper nonzero invariant subspace in $\mathcal{H}$ (i.e., the subspace is invariant for all members $Q$ of $\mathcal{A}$).

We will apply the above theorem to the following problem. Given a set $F = \{Q_1, \ldots, Q_r\}$ of observables, we would like to establish conditions under which the operators $Q_1, \ldots, Q_r$ generate the whole algebra $B(\mathcal{H})$. In other words, we want to determine whether every element in $B(\mathcal{H})$ can be represented in the form $\pi(Q_1, \ldots, Q_r)$, where $\pi$ is a noncommutative polynomial. Let us observe that, according to the fundamental theorem, if $\mathcal{A}$ is a subalgebra of the full complex algebra $B(\mathcal{H})$, then a nontrivial invariant subspace in $\mathcal{H}$ exists if and only if
$$\dim \mathcal{A} < \dim B(\mathcal{H}). \tag{22}$$
If a set of generators of $\mathcal{A}$ is known, then the above inequality can be verified by a finite number of arithmetic operations. Procedures possessing such a property are called effective. A very important example of an effective procedure can be formulated when we discuss the problem of the existence of a common one-dimensional invariant subspace for a pair of operators $Q_1$, $Q_2$. In other words, we ask about a common eigenvector for two operators $Q_1$, $Q_2$. An answer to this question is given by the following procedure. Let the symbol $[Q_1, Q_2]$ denote, as usual, the commutator of the operators $Q_1$, $Q_2$. Then a common eigenvector for $Q_1$ and $Q_2$ exists if and only if the subspace $\mathcal{K}$ of $\mathcal{H}$ defined by
$$\mathcal{K} := \bigcap_{j,k=1}^{d-1} \mathrm{Ker}\,[Q_1^j, Q_2^k], \tag{23}$$
where $d = \dim \mathcal{H}$, satisfies the condition $\dim \mathcal{K} > 0$ (this is the so-called Shemesh criterion). A short proof of this condition is possible. First of all, let us observe that if $|\psi\rangle$ is a common eigenvector of the operators $Q_1$ and $Q_2$, i.e.,
$$Q_1|\psi\rangle = \lambda_1|\psi\rangle, \qquad Q_2|\psi\rangle = \lambda_2|\psi\rangle, \tag{24}$$
then $|\psi\rangle$ belongs to $\mathrm{Ker}\,[Q_1^j, Q_2^k]$ for all $j, k \geq 1$, so that $\dim \mathcal{K} > 0$. For the converse, the gist of the Shemesh condition lies in the observation that the subspace $\mathcal{K}$ is invariant under $Q_1$ and $Q_2$. Indeed, if $|\psi\rangle$ belongs to $\mathcal{K}$, then by the definition of the subspaces $\mathrm{Ker}\,[Q_1^j, Q_2^k]$ one can check that $Q_1|\psi\rangle \in \mathcal{K}$ and $Q_2|\psi\rangle \in \mathcal{K}$. Now, let us choose a basis for $\mathcal{K}$ and
extend it to a basis in $\mathcal{H}$. We then observe that there exists a nonsingular matrix $S$ such that the matrices $SQ_1S^{-1}$ and $SQ_2S^{-1}$ have block-triangular forms and the submatrices which correspond to the subspace $\mathcal{K}$ commute. This means that these submatrices have a common eigenvector and therefore the same is true for $Q_1$ and $Q_2$. D. Shemesh 16 has observed that the condition $\dim \mathcal{K} > 0$ is equivalent to the singularity of the matrix
$$M := \sum_{j,k=1}^{d-1} [Q_1^j, Q_2^k]^* [Q_1^j, Q_2^k], \tag{25}$$
where $*$ denotes the complex conjugate transpose. For our purposes, on the basis of Burnside's theorem, more interesting is the case when the matrices $Q_1$, $Q_2$ do not have common eigenvectors and the algebra $\mathcal{A}(Q_1, Q_2)$ generated by them coincides with $B(\mathcal{H})$. This situation may be expressed by the following inequality
$$\det M > 0, \tag{26}$$
which can be checked by an effective procedure, that is, by a finite number of arithmetic operations. It is obvious that the matrix $M$ is in general positive semidefinite, and the above condition means the strict positivity of $M$.
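The Shemesh matrix $M$ of (25) is straightforward to assemble numerically. The sketch below is illustrative only: the function names and the tolerance are assumptions, and in floating point the singularity test is replaced by a threshold on the smallest eigenvalue of the Hermitian matrix $M$.

```python
import numpy as np

def shemesh_matrix(Q1, Q2):
    """M = sum_{j,k=1}^{d-1} [Q1^j, Q2^k]^* [Q1^j, Q2^k], as in (25)."""
    d = Q1.shape[0]
    M = np.zeros((d, d), dtype=complex)
    for j in range(1, d):
        A = np.linalg.matrix_power(Q1, j)
        for k in range(1, d):
            B = np.linalg.matrix_power(Q2, k)
            comm = A @ B - B @ A
            M += comm.conj().T @ comm
    return M

def have_common_eigenvector(Q1, Q2, tol=1e-10):
    """True if M is (numerically) singular, i.e. dim K > 0."""
    M = shemesh_matrix(Q1, Q2)
    # M is Hermitian and positive semidefinite, so eigvalsh applies
    return bool(np.linalg.eigvalsh(M).min() < tol)
```

For example, the Pauli matrices $\sigma_x$ and $\sigma_z$ give $M = 4\,\mathbb{I}$ and hence share no eigenvector, while two upper-triangular matrices always share the first basis vector.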
5. Examples. Low dimensional cases

Now, in order to illustrate algebraic methods in estimation problems, we will discuss some algebraic procedures in low dimensional cases. For quantum systems of qubits and qutrits one can formulate some conditions explicitly in a matrix form, which is sometimes more transparent than the general operator form. We will use the so-called vec operator procedure, which transforms a matrix into a vector by stacking its columns one underneath the other. It is well known that the tensor product of matrices and the vec operator are intimately connected. If $A$ denotes a $d \times d$ matrix and $a_j$ its $j$-th column, then $\mathrm{vec}\,A$ is the $d^2$-dimensional vector constructed from $a_1, \ldots, a_d$. Moreover, if $A$, $B$, $C$ are three matrices such that the matrix product $ABC$ is well defined, then
$$\mathrm{vec}(ABC) = (C^T \otimes A)\, \mathrm{vec}\,B. \tag{27}$$
In the above formula $C^T$ denotes the transposition of the matrix $C$. In particular we have
$$\mathrm{vec}\,A = (\mathbb{I} \otimes A)\, \mathrm{vec}\,\mathbb{I} = (A^T \otimes \mathbb{I})\, \mathrm{vec}\,\mathbb{I}. \tag{28}$$
Let us agree that when we say that a set of matrices generates the set $B(\mathcal{H})$, we are thinking about $B(\mathcal{H})$ as an algebra, while when we say that a set of matrices forms a basis for $B(\mathcal{H})$, we are talking about $B(\mathcal{H})$ as a vector space. For qubits, that is, for a two-dimensional Hilbert space, one can show by a direct computation that
$$\det(\mathrm{vec}\,\mathbb{I}, \mathrm{vec}\,Q_1, \mathrm{vec}\,Q_2, \mathrm{vec}(Q_1Q_2)) = \det([Q_1, Q_2])$$
and
$$\det(\mathrm{vec}\,\mathbb{I}, \mathrm{vec}\,Q_1, \mathrm{vec}\,Q_2, \mathrm{vec}[Q_1, Q_2]) = 2\det([Q_1, Q_2]), \tag{29}$$
where on the left hand sides we have the determinants of $4 \times 4$ matrices and on the right hand sides $[Q_1, Q_2]$ denotes the commutator of the two $2 \times 2$ matrices. From the last equality it follows that if the matrices $\mathbb{I}$, $Q_1$, $Q_2$ and $[Q_1, Q_2]$ are linearly independent, then the algebra which is spanned by them has dimension 4, so $Q_1$, $Q_2$ and $\mathbb{I}$ generate $B(\mathcal{H})$. In other words, two operators $Q_1$, $Q_2$ and the identity generate $B(\mathcal{H})$ if and only if the matrix $[Q_1, Q_2]$ has a determinant different from zero. In a similar way one can show that matrices $Q_1$, $Q_2$, $Q_3$, such that no two of them generate $B(\mathcal{H})$, can generate $B(\mathcal{H})$ if and only if the double commutator $[Q_1, [Q_2, Q_3]]$ is invertible. In general, the matrices $Q_1, \ldots, Q_r$ generate $B(\mathcal{H})$ iff at least one of the commutators $[Q_i, Q_j]$ or double commutators $[Q_i, [Q_j, Q_k]]$ is invertible 17. In the case of qutrits, that is, for a three-dimensional Hilbert space, one can show by direct calculation that if $[Q_1, Q_2]$ is invertible and $w([Q_1, Q_2]) \neq 0$, where for $Q \in B(\mathcal{H})$ the symbol $w(Q)$ denotes the linear term in the characteristic polynomial of $Q$, then one can construct an explicit basis for $B(\mathcal{H})$. Indeed, if $Q_1$, $Q_2$ belong to $B(\mathcal{H})$ and $\dim \mathcal{H} = 3$, then the determinant of the $9 \times 9$ matrix built from the vec transformations of $\mathbb{I}$, $Q_1$, $Q_2$, $Q_1^2$, $Q_2^2$, $Q_1Q_2$, $Q_2Q_1$, $[Q_1, [Q_1, Q_2]]$, $[Q_2, [Q_2, Q_1]]$ satisfies the equality (30). That is, if $\det([Q_1, Q_2]) \neq 0$ and $w([Q_1, Q_2]) \neq 0$, then the columns of this matrix correspond to a basis for $B(\mathcal{H})$. Of course, one can also use the Shemesh criterion to characterize pairs of generators for $B(\mathcal{H})$, where $\dim \mathcal{H} = 3$.
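The qubit criterion can be tried out with a few lines of code. This is a sketch under stated assumptions: the function names are hypothetical, vec is implemented by column stacking, and with that convention the $4 \times 4$ determinant agrees with $\det([Q_1, Q_2])$ in absolute value in this sketch, which is all the linear-independence argument needs; the sign depends on the stacking and ordering conventions.

```python
import numpy as np

def vec(A):
    """Column-stacking vec operator."""
    return np.asarray(A, dtype=complex).reshape(-1, order="F")

def commutator(A, B):
    return A @ B - B @ A

def det_of_vecs(Q1, Q2):
    """Determinant of the 4x4 matrix with columns vec I, vec Q1, vec Q2, vec(Q1 Q2)."""
    cols = [vec(np.eye(2)), vec(Q1), vec(Q2), vec(Q1 @ Q2)]
    return np.linalg.det(np.column_stack(cols))

def span_dimension(Q1, Q2, tol=1e-10):
    """Dimension of span{I, Q1, Q2, Q1 Q2} inside the 2x2 matrices."""
    cols = [vec(np.eye(2)), vec(Q1), vec(Q2), vec(Q1 @ Q2)]
    return int(np.linalg.matrix_rank(np.column_stack(cols), tol=tol))

def generates_qubit_algebra(Q1, Q2, tol=1e-10):
    """Criterion from the text: the commutator must have nonzero determinant."""
    return bool(abs(np.linalg.det(commutator(Q1, Q2))) > tol)
```

A commuting pair such as $\sigma_x$ and $\sigma_x + \mathbb{I}$ fails the test, while $\sigma_x$ and $\sigma_z$ pass it and span all four dimensions.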
References
1. W. Thirring, Quantum Mathematical Physics (Springer, 2001).
2. W. Band, J. L. Park, Am. J. Phys. 47, 188 (1979).
3. M. A. Nielsen, I. Chuang, Quantum Computation and Quantum Information (Cambridge Univ. Press, 2000).
4. K. Kraus, Ann. Phys. 64, 119 (1971).
5. S. Weigert, in New Insight in Quantum Mechanics, Eds. H.-D. Doebner et al. (Singapore: World Scientific, 2000).
6. A. Jamiołkowski, Rep. Math. Phys. 5, 415 (1975).
7. A. Jamiołkowski, Internat. J. Theoret. Phys. 22, 369 (1983).
8. R. T. Farouki et al., Geometriae Dedicata 85, 283 (2001).
9. A. Jamiołkowski, in Quantum Bio-Informatics III, Eds. L. Accardi et al. (Singapore: World Scientific, 2010).
10. M. Paris, J. Řeháček, eds., Quantum State Estimation, vol. 649 of Lecture Notes in Physics (Springer, 2004).
11. A. Jamiołkowski, M. Ohya, N. Watanabe, T. Matsuoka, in preparation.
12. A. S. Holevo, Statistical Structure of Quantum Theory (Springer, Berlin, 2001).
13. V. Gorini et al., J. Math. Phys. 17, 149 (1976).
14. G. Lindblad, Comm. Math. Phys. 48, 119 (1976).
15. A. Jamiołkowski, J. Phys. Conf. Series 213, 012002 (2010).
16. D. Shemesh, Lin. Algebra Appl. 62, 11 (1984).
17. H. Aslaksen, A. B. Sletsjøe, Lin. Algebra Appl. 430, 1 (2009).
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 199- 208)
CONCURRENCE AND ITS ESTIMATIONS BY ENTANGLEMENT WITNESSES
JACEK JURKOWSKI Institute of Physics, Nicolaus Copernicus University, Grudziądzka 5/7, 87-100 Toruń, Poland
The notion of concurrence and its role as a measure of quantum correlations for both pure and mixed states is recalled. However, for mixed states concurrence is hard to compute and some estimations of it are necessary. It is demonstrated how to use entanglement witnesses in this procedure. In particular, it is shown that each entanglement witness detecting a given bipartite entangled state provides an estimation of its concurrence. The results are illustrated by several examples.
1. Introduction
The phenomenon of entanglement between components of a quantum system remains at the heart of many investigations in quantum information theory 1,2. The notion of entanglement was introduced to characterize non-classical correlations between systems appearing in quantum theory. Although great progress in quantum information theory has been achieved, some problems concerning detecting and quantifying entanglement are still open 3,4,5. In particular, there is still no general algorithm which makes it possible to decide whether a given mixed quantum composite state is entangled or not. Concerning quantifying entanglement, various measures have been introduced 6,7,8,9,10,11,12, but those which fulfill some reasonable conditions are hard to compute for mixed states and multipartite systems. Hence, there is a place for methods which give estimations of entanglement measures 13,14,15,16,17,18. The most welcome are procedures which, for a given family of states parametrized by $x$, allow one to construct estimations as analytical functions of $x$. In this paper, we focus on a particular method of estimation based on the notion of an entanglement witness (EW) 1,2 and we use it to estimate a measure of entanglement called concurrence 8. We show that EWs can be used not only to detect entanglement but to quantify it as well. We illustrate our investigations with several examples.
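2. Detecting and measuring entanglement

In this paper, we focus on bipartite entanglement, which describes a sort of quantum correlations between two quantum systems defined in their $m$-dimensional Hilbert spaces $\mathcal{H}_A$ and $\mathcal{H}_B$, respectively. The pure bipartite state $|\psi\rangle \in \mathcal{H}_A \otimes \mathcal{H}_B$ of the composite system is called separable if it can be written in a product form
$$|\psi\rangle = |\phi_A\rangle \otimes |\phi_B\rangle$$
in terms of two pure states $|\phi_A\rangle$ and $|\phi_B\rangle$ of the subsystems. Every pure state admits the Schmidt decomposition
$$|\psi\rangle = \sum_{k=1}^{m} \sqrt{\mu_k}\, |a_k, b_k\rangle, \qquad \sum_{k=1}^{m} \mu_k = 1, \tag{1}$$
where $\sqrt{\mu_k}$ are Schmidt coefficients and $\{|a_k\rangle\}$, $\{|b_k\rangle\}$ are orthonormal families in $\mathcal{H}_A$ and $\mathcal{H}_B$. Hence, a pure state $|\psi\rangle$ is entangled if and only if at least two Schmidt coefficients are nonzero. Moreover, using decomposition (1) we can introduce the following two measures of entanglement:
• entropy of a reduced system $\rho_A = \mathrm{Tr}_B(|\psi\rangle\langle\psi|)$, defined as
$$E(|\psi\rangle) = -\mathrm{Tr}(\rho_A \log_2 \rho_A),$$
and
• concurrence
$$C(|\psi\rangle) = \sqrt{2\left(1 - \mathrm{Tr}\,\rho_A^2\right)}.$$
$E(|\psi\rangle)$ and its generalization to mixed states, called entanglement of formation (EOF), is considered the most natural measure, but it is hard to compute and even to estimate for most mixed states. Therefore, it is more convenient to use concurrence even if it has no direct connection with entropy. For pure states both measures are simple functions of Schmidt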
coefficients,
$$E(|\psi\rangle) = -\sum_k \mu_k \log_2 \mu_k, \qquad C(|\psi\rangle) = \sqrt{2 \sum_{k, l \neq k} \mu_k \mu_l}.$$
For mixed states the situation is much more complicated. Let us recall that a mixed state $\rho$ is entangled if it cannot be written as a convex combination of product mixed states of the subsystems, i.e.
$$\rho = \sum_k p_k\, \rho_k^{(A)} \otimes \rho_k^{(B)}, \qquad p_k > 0, \quad \sum_k p_k = 1,$$
where $\rho_k^{(A)}$ and $\rho_k^{(B)}$ are some mixed states of subsystems $A$ and $B$, respectively. There is still no universal method of detecting entanglement for any mixed state. In particular, entanglement in so-called positive partially transposed (PPT) states (for $m \geq 3$) is hard to recognize. Entanglement, if detected, should also be quantified. One expects that a measure $M(\rho)$ should fulfill at least the following requirements:
(1) $M(\rho) \geq 0$ and $M(\rho) = 0$ for separable $\rho$ (sometimes $M(\rho_{\max}) = 1$ for a maximally entangled state $\rho_{\max}$),
(2) invariance with respect to local unitary operations,
(3) convexity with respect to convex combinations of mixed states, $M\left(\sum_k p_k \rho_k\right) \leq \sum_k p_k M(\rho_k)$.
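For pure states, the expressions for the entropy and the concurrence in terms of the quantities $\mu_k$ from the Schmidt decomposition can be evaluated directly. The following sketch assumes that the state vector is given in the product basis ordered as $|i\rangle \otimes |j\rangle$, so that reshaping it to an $m \times m$ matrix and taking singular values yields $\sqrt{\mu_k}$; the function names are illustrative.

```python
import numpy as np

def schmidt_probabilities(psi, m):
    """Return mu_k: squared singular values of the m x m coefficient matrix."""
    coeffs = np.asarray(psi, dtype=complex).reshape(m, m)
    s = np.linalg.svd(coeffs, compute_uv=False)
    mu = s ** 2
    return mu / mu.sum()          # guard against an unnormalized input

def entropy_of_entanglement(mu):
    """E = -sum_k mu_k log2 mu_k, with 0 log 0 := 0."""
    mu = mu[mu > 1e-15]
    return float(-(mu * np.log2(mu)).sum())

def pure_state_concurrence(mu):
    """C = sqrt(2 sum_{k != l} mu_k mu_l) = sqrt(2 (1 - sum_k mu_k^2))."""
    return float(np.sqrt(max(0.0, 2.0 * (1.0 - (mu ** 2).sum()))))
```

With this normalization a two-qubit Bell state has $E = 1$ and $C = 1$, whereas for a maximally entangled state in $\mathbb{C}^m \otimes \mathbb{C}^m$ the concurrence equals $\sqrt{2(m-1)/m}$.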
A method extending measures initially defined for pure states to a mixed state setting, and which guarantees fulfilling (1)-(3), is the convex roof construction. The procedure consists in a decomposition of $\rho$ into a convex combination of one-dimensional projectors
$$\rho = \sum_k p_k |\psi_k\rangle\langle\psi_k|, \qquad \sum_k p_k = 1,$$
and then taking the infimum over all such decompositions, i.e. one defines
$$C(\rho) = \inf_{\{p_k, \psi_k\}} \sum_k p_k\, C(|\psi_k\rangle), \qquad E(\rho) = \inf_{\{p_k, \psi_k\}} \sum_k p_k\, E(|\psi_k\rangle) \tag{2}$$
as concurrence and EOF for mixed states, respectively.
The problem arises when carrying out the extremalisation procedure in (2): exact results are known only in very limited cases. The most famous of them are two-qubit systems and states exhibiting some symmetry (like isotropic or Werner states). In particular, for two qubits it was shown long ago 8 that
$$C(\rho) = \max\{0, \lambda_1 - \lambda_2 - \lambda_3 - \lambda_4\}, \tag{3}$$
where $\lambda_1 \geq \lambda_2 \geq \lambda_3 \geq \lambda_4$ are singular values of the matrix $T_{kl} = \langle v_k|\sigma_y \otimes \sigma_y|v_l^*\rangle$ with $|v_k\rangle$ denoting eigenvectors of $\rho$ (subnormalized so that $\langle v_k|v_k\rangle$ equals the corresponding eigenvalue), and $\sigma_y$ stands for the Pauli matrix. Simultaneously, EOF is a function of concurrence,
$$E(\rho) = S_2\!\left(\frac{1 + \sqrt{1 - C(\rho)^2}}{2}\right),$$
where $S_2(x) = -x \log_2 x - (1 - x) \log_2 (1 - x)$ is the binary entropy function. For isotropic states indexed by a fidelity $f \in [0, 1]$ and defined as
$$\rho_f = \frac{1 - f}{m^2 - 1}\left(\mathbb{I} - |\psi^+\rangle\langle\psi^+|\right) + f\, |\psi^+\rangle\langle\psi^+|,$$
where $|\psi^+\rangle$ is the maximally entangled state, it is well known 21 that
$$C(\rho_f) = \begin{cases} \sqrt{\dfrac{2}{m(m-1)}}\,(mf - 1), & f > 1/m, \\[2mm] 0, & f \leq 1/m. \end{cases}$$
EOF is also known 22 but the relation is much more complicated. Unfortunately, a general algorithm for calculating concurrence, EOF and other measures based on the convex roof construction is still missing. Therefore, in what follows we will focus on methods leading to estimations of concurrence.
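The two-qubit formula (3) and the relation between EOF and concurrence can be illustrated numerically. The sketch below uses the equivalent and widely used formulation in which the $\lambda_i$ are the square roots of the eigenvalues of $\rho\tilde{\rho}$, with $\tilde{\rho} = (\sigma_y \otimes \sigma_y)\rho^*(\sigma_y \otimes \sigma_y)$, rather than building the matrix $T_{kl}$ explicitly; this swap, as well as all function names, is an assumption of the example.

```python
import numpy as np

SIGMA_Y = np.array([[0, -1j], [1j, 0]])
YY = np.kron(SIGMA_Y, SIGMA_Y)

def wootters_concurrence(rho):
    """C = max(0, l1 - l2 - l3 - l4) for a two-qubit density matrix rho."""
    rho = np.asarray(rho, dtype=complex)
    rho_tilde = YY @ rho.conj() @ YY
    eigenvalues = np.linalg.eigvals(rho @ rho_tilde)
    # eigenvalues are real and nonnegative up to rounding errors
    lam = np.sort(np.sqrt(np.clip(eigenvalues.real, 0.0, None)))[::-1]
    return float(max(0.0, lam[0] - lam[1] - lam[2] - lam[3]))

def binary_entropy(x):
    """S_2(x) = -x log2 x - (1 - x) log2 (1 - x), with S_2(0) = S_2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return float(-x * np.log2(x) - (1 - x) * np.log2(1 - x))

def eof_from_concurrence(c):
    """Entanglement of formation of a two-qubit state with concurrence c."""
    return binary_entropy((1 + np.sqrt(max(0.0, 1 - c ** 2))) / 2)

def isotropic_state(m, f):
    """rho_f in C^m (x) C^m with fidelity f (standard parametrization assumed)."""
    psi = np.eye(m).reshape(-1) / np.sqrt(m)
    P = np.outer(psi, psi)
    return (1 - f) / (m * m - 1) * (np.eye(m * m) - P) + f * P
```

For the two-qubit isotropic state with $f = 3/4$ this reproduces $C(\rho_f) = \sqrt{2/(m(m-1))}\,(mf - 1) = 1/2$ at $m = 2$.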
3. Estimations of concurrence
There are several methods of estimating concurrence. In fact, almost every criterion of entanglement can be reformulated in such a way that it can serve as a source for an appropriate estimation of concurrence. Let us recall here the methods based on trace norms of a realigned matrix or partially transposed matrix of the state 19, on local uncertainty relations, on geometrical methods 20 and others. We focus here on the following results of Breuer 24:

Proposition 3.1. Let $f(\rho)$ be a convex functional of the state $\rho$ obeying
$$f(|\psi\rangle\langle\psi|) \leq \sum_{k \neq l} \sqrt{\mu_k \mu_l}$$
for every pure state $|\psi\rangle$ described by Schmidt coefficients $\sqrt{\mu_k}$. Then
$$C(\rho) \geq \sqrt{\frac{2}{m(m-1)}}\; f(\rho).$$
In what follows, we are going to define $f(\rho)$ in terms of the EW detecting $\rho$. Let us recall that a hermitian operator $W$ is called an entanglement witness (EW) for a state $\rho$ if $\mathrm{Tr}(\rho W) < 0$ while $\mathrm{Tr}(\sigma W) \geq 0$ for all separable states $\sigma$. There are many examples 25 of entanglement measures $M(\rho)$ (concurrence, negativity, robustness, etc.) which can be related to the expectation value of some entanglement witness. In the case of concurrence one has the following proposition due to Breuer 24.

Proposition 3.2. Let $W$ be an entanglement witness such that
$$-\langle\psi|W|\psi\rangle \leq \sum_{i, j \neq i} \sqrt{\mu_i \mu_j} \tag{4}$$
for every pure state (1). Then for an arbitrary mixed state $\rho$ detected by the witness $W$
$$C(\rho) \geq \sqrt{\frac{2}{m(m-1)}}\; |\mathrm{Tr}(\rho W)|. \tag{5}$$
Proposition 3.2 is a simple consequence of Proposition 3.1 if one chooses $f(\rho) = -\mathrm{Tr}(\rho W)$.

4. Main result
Proposition 3.2 distinguishes a class of witnesses satisfying condition (4). Suppose now that $W$ does not satisfy this condition. Clearly, for any $\alpha > 0$ the rescaled operator $\alpha^{-1} W$ still defines an EW. Does $\alpha^{-1} W$ satisfy (4)? To answer this question let us observe that for $|\psi\rangle = \sum_{i=1}^{m} \sqrt{\mu_i}\, |a_i, b_i\rangle$ the expectation value of $W$ reads as follows
$$\langle\psi|W|\psi\rangle = \sum_{k,l} \sqrt{\mu_k \mu_l}\, A^{(W)}_{kl}(\psi), \tag{6}$$
where the $\psi$-dependent matrix $A^{(W)}_{kl}$ is defined by
$$A^{(W)}_{kl}(\psi) = \mathrm{Re}\, \langle a_k, b_k|W|a_l, b_l\rangle. \tag{7}$$
Note that $A^{(W)}_{kk}(\psi) = \langle a_k, b_k|W|a_k, b_k\rangle$ is a mean value of $W$ on separable states, hence
$$A^{(W)}_{kk}(\psi) \geq 0. \tag{8}$$
It is clear that $A^{(W)}_{kl}(\psi)$ encodes the entire information about $W$. Moreover, the condition (4) is equivalent to
$$\sum_{k,l} \sqrt{\mu_k \mu_l}\, A^{(W)}_{kl}(\psi) + \sum_{k, l \neq k} \sqrt{\mu_k \mu_l} \geq 0. \tag{9}$$
Let us observe that the space of normalized vectors defines a compact set and hence one may define a positive number $\lambda(W)$ by the following procedure
$$-\lambda(W) := \min_{\psi} \min_{k \neq l} A^{(W)}_{kl}(\psi). \tag{10}$$
It provides a new characteristic of $W$. Now comes the main result.

Theorem 4.1. For any $\alpha \geq \lambda(W)$ the rescaled entanglement witness $\alpha^{-1} W$ does satisfy (4).

The proof is almost trivial. One has
$$\sum_{k,l} \sqrt{\mu_k \mu_l}\, A^{(W)}_{kl}(\psi) + \sum_{k, l \neq k} \sqrt{\mu_k \mu_l} = \sum_k \mu_k A^{(W)}_{kk}(\psi) + \sum_{k, l \neq k} \sqrt{\mu_k \mu_l}\left(A^{(W)}_{kl}(\psi) + 1\right) \geq \sum_{k, l \neq k} \sqrt{\mu_k \mu_l}\left(A^{(W)}_{kl}(\psi) + 1\right),$$
where we have used (8). As a consequence, if
$$A^{(W)}_{kl}(\psi) \geq -1 \tag{11}$$
for all $k$, $l$, $k \neq l$, and every normalized $\psi$, then $W$ does satisfy (9), hence (4). Suppose now that the above condition is not satisfied. It is therefore clear that for the rescaled witness $W_\alpha := \alpha^{-1} W$ with $\alpha \geq \lambda(W)$, one has
$$A^{(W_\alpha)}_{kl}(\psi) = \alpha^{-1} A^{(W)}_{kl}(\psi) \geq -1, \tag{12}$$
which proves our theorem. It should be stressed that the best estimation is provided by the witness corresponding to $\alpha = \lambda(W)$, because in this case we achieve equality in (12) at least for some $k$, $l$.
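Summing up, we propose the following simple procedure which provides the estimation of concurrence of $\rho$ if there is some EW $W$ detecting $\rho$:
$$W \longrightarrow A^{(W)}_{kl} \longrightarrow \lambda(W) \longrightarrow W_\lambda \longrightarrow \mathrm{Tr}(W_\lambda \rho) \longrightarrow C(\rho).$$
Here $W_\lambda$ denotes the rescaled witness $W_\alpha$ with $\alpha = \lambda(W)$. $\mathrm{Tr}(W_\lambda \rho)$ gives the estimation for concurrence
$$C(\rho) \geq -\sqrt{\frac{2}{m(m-1)}}\; \mathrm{Tr}(W_\lambda \rho).$$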
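5. Examples

Example 5.1. Let $\mathcal{H}_A = \mathcal{H}_B = \mathbb{C}^m$ and consider the flip operator
$$F = \sum_{i,j=1}^{m} |i\rangle\langle j| \otimes |j\rangle\langle i|,$$
where $\{|i\rangle\}$ is the computational basis in $\mathbb{C}^m$. Simple calculation gives
$$A^{(F)}_{kl} = \mathrm{Re}\, \langle a_k, b_k|F|a_l, b_l\rangle = \mathrm{Re}\left(\langle a_k|b_l\rangle \langle b_k|a_l\rangle\right).$$
Now, evidently $A^{(F)}_{kk} = |\langle a_k|b_k\rangle|^2 \geq 0$ and
$$A^{(F)}_{kl} = \mathrm{Re}\left(\langle a_k|b_l\rangle \langle b_k|a_l\rangle\right) \geq -1 \quad \text{for } k \neq l$$
according to orthonormality of both bases.

Example 5.2. Let us consider isotropic states in $\mathbb{C}^m \otimes \mathbb{C}^m$,
$$\rho_f = \frac{1 - f}{m^2 - 1}\left(\mathbb{I} - |\psi^+\rangle\langle\psi^+|\right) + f\, |\psi^+\rangle\langle\psi^+|, \tag{13}$$
and let us consider an EW 26
$$W_{\mathrm{iso}} = \frac{1}{m}\, \mathbb{I} - |\psi^+\rangle\langle\psi^+| \tag{14}$$
satisfying
$$\mathrm{Tr}[W_{\mathrm{iso}}\, \rho_f] = \frac{1}{m} - f,$$
that is, $W_{\mathrm{iso}}$ detects isotropic states with fidelity $f > 1/m$. Since $|\psi^+\rangle\langle\psi^+| = \frac{1}{m} F^{T_A}$, the previous example implies for $i \neq j$
$$A^{(W_{\mathrm{iso}})}_{ij} \geq -\frac{1}{m},$$
which shows that $\lambda(W_{\mathrm{iso}}) = 1/m$. As a consequence, the optimal witness, in the sense of (12), is $\widetilde{W}_{\mathrm{iso}} = m W_{\mathrm{iso}}$. Now,
$$\mathrm{Tr}[\widetilde{W}_{\mathrm{iso}}\, \rho_f] = 1 - mf, \tag{15}$$
and it turns out that the estimation (5) gives an exact result,
$$\sqrt{\frac{2}{m(m-1)}}\,(mf - 1) = \sqrt{\frac{2m}{m-1}} \left(f - \frac{1}{m}\right) = C(\rho_f).$$

Example 5.3. In 23 we have investigated an $\varepsilon$-family ($\varepsilon > 0$) of states in $\mathbb{C}^3 \otimes \mathbb{C}^3$ (16)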
where $P_3^+$ denotes a maximally entangled state,
$$d_{i,i+1} = \varepsilon, \qquad d_{i,i+2} = \frac{1}{\varepsilon} \pmod 3,$$
and the normalization factor
$$N_\varepsilon = \frac{1}{1 + \varepsilon + \varepsilon^{-1}}.$$
It turns out that $\rho(\varepsilon)$ is entangled if and only if $\varepsilon \neq 1$. Moreover, its entanglement is detected by the entanglement witness
$W_1$ [9 × 9 matrix with entries ±1; zeros shown as dots] (17)
for $\varepsilon < 1$ and
$W_2$ [9 × 9 matrix with entries ±1; zeros shown as dots] (18)
for $\varepsilon > 1$. To make pictures more transparent we replaced all zeros by dots. Interestingly, $W_1$ corresponds to the celebrated Choi positive indecomposable map and $W_2$ to its dual. Numerical calculations show that indeed $A^{(W_i)}_{kl}(\psi) \geq -1$ for $i = 1, 2$. Hence one obtains the following estimation for concurrence based on the above EWs
$$C(\rho(\varepsilon)) \geq -\frac{1}{\sqrt{3}} \begin{cases} \dfrac{\varepsilon(\varepsilon - 1)}{1 + \varepsilon + \varepsilon^2}, & 0 < \varepsilon < 1, \\[3mm] \dfrac{1 - \varepsilon}{1 + \varepsilon + \varepsilon^2}, & \varepsilon > 1. \end{cases} \tag{19}$$
We stress, however, that this estimation is weaker than the one obtained from the trace norm of the realigned matrix (see Fig. 1).

Figure 1. Two estimations of concurrence as a function of $\varepsilon$. The dashed line is for the estimation based on $\|R(\rho(\varepsilon))\|_1$. The solid line is for the estimation based on (19).
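Examples 5.1 and 5.2 lend themselves to a quick numerical check. The sketch below is illustrative and rests on stated assumptions: orthonormal bases are taken as columns of random unitary matrices, the isotropic state and witness are taken in the standard forms $\rho_f = \frac{1-f}{m^2-1}(\mathbb{I} - P^+) + f P^+$ and $W_{\mathrm{iso}} = \frac{1}{m}\mathbb{I} - P^+$, and all function names are hypothetical.

```python
import numpy as np

def flip_operator(m):
    """F = sum_{i,j} |i><j| (x) |j><i|, i.e. F |x, y> = |y, x>."""
    F = np.zeros((m * m, m * m))
    for i in range(m):
        for j in range(m):
            F[i * m + j, j * m + i] = 1.0
    return F

def random_unitary(m, rng):
    """Columns form a random orthonormal basis of C^m (QR of a Gaussian matrix)."""
    Z = rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))
    Q, _ = np.linalg.qr(Z)
    return Q

def a_matrix(W, a, b):
    """A_kl = Re <a_k, b_k| W |a_l, b_l> for bases given as columns of a and b."""
    m = a.shape[1]
    prod = [np.kron(a[:, k], b[:, k]) for k in range(m)]
    return np.array([[np.vdot(prod[k], W @ prod[l]).real for l in range(m)]
                     for k in range(m)])

def max_entangled_projector(m):
    psi = np.eye(m).reshape(-1) / np.sqrt(m)
    return np.outer(psi, psi)

def isotropic_state(m, f):
    P = max_entangled_projector(m)
    return (1 - f) / (m * m - 1) * (np.eye(m * m) - P) + f * P

def witness_bound(m, f):
    """Lower bound from (5) with the rescaled witness m * W_iso (zero if undetected)."""
    W_opt = np.eye(m * m) - m * max_entangled_projector(m)
    expectation = np.trace(W_opt @ isotropic_state(m, f)).real
    return float(np.sqrt(2.0 / (m * (m - 1))) * max(0.0, -expectation))

def isotropic_concurrence(m, f):
    """Known value: sqrt(2m/(m-1)) (f - 1/m) for f > 1/m, zero otherwise."""
    return float(max(0.0, np.sqrt(2.0 * m / (m - 1)) * (f - 1.0 / m)))
```

Comparing `witness_bound(m, f)` with `isotropic_concurrence(m, f)` for several values of $f$ shows the two coincide, which is the exactness statement of Example 5.2.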
References
1. M. A. Nielsen and I. L. Chuang, Quantum computation and quantum information, Cambridge University Press, Cambridge, 2000.
2. R. Horodecki, P. Horodecki, M. Horodecki, K. Horodecki, Rev. Mod. Phys. 81, 865-942 (2009).
3. O. Gühne, G. Tóth, Phys. Reports 474, 1-75 (2009).
4. O. Gühne, M. Reimpell, R. F. Werner, Phys. Rev. Lett. 98, 110502 (2007).
5. J. Eisert, F. G. S. L. Brandão, K. M. R. Audenaert, New J. Phys. 9, 46 (2007).
6. V. Vedral, M. B. Plenio, M. A. Rippin, and P. L. Knight, Phys. Rev. Lett. 78, 2275 (1998).
7. C. H. Bennett, D. P. DiVincenzo, J. A. Smolin, W. K. Wootters, Phys. Rev. A 54, 3824 (1996).
8. W. K. Wootters, Phys. Rev. Lett. 80, 2245 (1998).
9. G. Vidal and R. Tarrach, Phys. Rev. A 59, 141 (1999).
10. M. Steiner, Phys. Rev. A 67, 054305 (2003).
11. H. Barnum and N. Linden, J. Phys. A: Math. Gen. 34, 6787 (2001).
12. T.-C. Wei and P. M. Goldbart, Phys. Rev. A 68, 042307 (2003).
13. A. Datta, S. T. Flammia, A. Shaji, C. M. Caves, Phys. Rev. A 75, 062117 (2007).
14. O. Gühne, M. Reimpell, R. F. Werner, Phys. Rev. A 77, 052317 (2008).
15. R. Augusiak, M. Lewenstein, Quantum Information Processing 8, 493-521 (2009).
16. Zhihao Ma, Fu-Lin Zhang, Dong-Ling Deng, Jing-Ling Chen, Phys. Lett. A 373, 1616-1620 (2009).
17. J. I. de Vicente, Phys. Rev. A 75, 052320 (2007).
18. F. Mintert, M. Kuś, A. Buchleitner, Phys. Rev. Lett. 92, 167902 (2004).
19. Kai Chen, S. Albeverio, Shao-Ming Fei, Phys. Rev. Lett. 95, 040504 (2005).
20. Cheng-Jie Zhang, Yong-Sheng Zhang, Shun Zhang, Guang-Can Guo, Phys. Rev. A 76,
21. P. Rungta, C. M. Caves, Phys. Rev. A 67, 012307 (2003).
22. B. M. Terhal, K. G. H. Vollbrecht, Phys. Rev. Lett. 85, 2625 (2000).
23. J. Jurkowski, D. Chruściński, A. Rutkowski, Open Sys. Information Dyn. 16, 235 (2009).
24. H.-P. Breuer, J. Phys. A: Math. Gen. 39, 11847 (2006).
25. F. G. S. L. Brandão, Phys. Rev. A 72, 022310 (2005).
26. B. M. Terhal, P. Horodecki, Phys. Rev. A 61, 040301 (2000).
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 209- 222)
CLASSICAL WAVE MODEL OF QUANTUM-LIKE PROCESSING IN BRAIN
A. KHRENNIKOV International Center for Mathematical Modelling in Physics and Cognitive Sciences, Linnaeus University, S-35195, Växjö, Sweden
We discuss the conjecture on quantum-like (QL) processing of information in the brain. It is not based on the physical quantum brain (e.g., Penrose), that is, on quantum physical carriers of information. In our approach the brain created the QL representation (QLR) of information in Hilbert space. It uses quantum information rules in decision making. The existence of such QLR was (at least preliminarily) confirmed by experimental data from cognitive psychology. The violation of the law of total probability in these experiments is an important sign of nonclassicality of data. In the so-called "constructive wave function approach" such data can be represented by complex amplitudes. We presented 1,2 the QL model of decision making. In this paper we speculate on a possible physical realization of QLR in the brain: a classical wave model producing QLR. It is based on the variety of time scales in the brain. Each pair of scales (fine: the background fluctuations of the electromagnetic field, and rough: the cognitive image scale) induces the QL representation. The background field plays the crucial role in creation of "superstrong QL correlations" in the brain.
1. Introduction
Many authors (e.g., Roger Penrose and Stuart Hameroff) advertised the model of the quantum brain, i.e., quantumness of the brain is a consequence of its composition of quantum systems. Of course, such a brain is a processor of quantum information. This approach induces numerous complicated questions on the physics of such a brain, e.g., "Is the brain too hot to be quantum?" Therefore, although we do not completely deny this interesting model, we do not couple our mathematical model of quantum information processing in the brain to the "physical quantum brain." In 2001 I pointed out 3,4 the coupling between violation of the law of total probability (LTP) and interference of probabilities in quantum mechanics, e.g., in the fundamental two-slit experiment. Interference (both classical and quantum) implies the violation of LTP; moreover, violation of LTP induces the wave-like representation of probabilistic data by complex (and more general) amplitudes: the constructive wave function approach 4. LTP plays the fundamental role in decision making; its violation implies a new strategy in decision making, nonclassical decision making 5. We point out that LTP is violated in some experiments of cognitive psychology, e.g., games of the Prisoner's Dilemma (PD) type or recognition of ambiguous figures; see 1,6 for a detailed presentation. The violation of LTP in these experiments is an important sign of nonclassicality of cognitive data. In the constructive wave function approach such data can be represented by complex probability amplitudes 6. This is an important motivation to look for QL models of information processing in the brain. In previous works 1,2 we presented quantum information models of decision making in games of the PD-type. This is a model of how the brain using QLR of information might work. Thus, on one hand, we have experimental data which support the hypothesis of QL processing of information in the brain and, on the other hand, we have a theoretical model of such processing. In this paper we are looking for models of physical realization of QLR in the brain. We propose a classical (!) wave model which reproduces probabilistic effects of quantum information theory. Why do we appeal to classical electromagnetic fields in the brain and not to quantum phenomena? In neurophysiological and cognitive studies we see numerous classical electromagnetic waves in the brain. Our conjecture is that these waves are carriers of mental information which is processed in the framework of quantum information theory. In the quantum community there is a general opinion that quantum effects cannot be described by classical wave models (however, cf. Schrödinger).
Even those who agree that the classical and quantum interferences are similar emphasize the role of quantum entanglement and its irreducibility to classical correlations (however, cf. Einstein-Podolsky-Rosen). It is well known that entanglement is crucial in quantum information theory. Although some authors emphasize the role of quantum parallelism in quantum computing, i.e., superposition and interference, experts know well that without entanglement the quantum computer is not able to beat the classical digital computer. Recently I proposed a classical wave model reproducing all probabilistic predictions of quantum mechanics, including correlations of entangled systems, the so-called prequantum classical statistical field theory (PCSFT) 7,8; see paper 9 for the recent model for composite systems. It seems that, in spite of the mentioned common opinion, the classical wave description of quantum phenomena is still possible. In this paper we apply PCSFT to model QL processing of information in the brain on the basis of classical electromagnetic fields. This model is based on the presence of various time scales in the brain. Roughly speaking, each pair of time scales, one of them fine (the background fluctuations of the classical electromagnetic field in the brain) and another rough (the cognitive image scale), can be used for creation of QLR in the brain. The background field (background rhythms in the brain), which is an important part of our model, plays the crucial role in the creation of "superstrong QL correlations" in the brain. These mental correlations are nonlocal due to the background field. These correlations might provide a solution of the binding problem. Each such pair of time scales, (fine, rough), induces QLR of information. As a consequence of the variety of time scales in the brain, we get a variety of QL representations serving various psychological functions. This QL model of the brain's work originated in the author's paper 10. The main improvement of the "old model" is due to a new possibility achieved recently by PCSFT: to represent the quantum correlations for entangled systems as the correlations of a classical random field, so to say, a prequantum field. This recent development has also highlighted the role of the background field, vacuum fluctuations. We now transfer this mathematical construction designed for quantum physics to brain science. Of course, it is a somewhat naive model, since we do not know the correspondence between images and probability distributions of random electromagnetic fields in the brain.

2. The role of the law of total probability
2.1. LTP and classical decision making

We recall the classical LTP: the prior probability of obtaining the result, e.g., $b = +1$ for the random variable $b$ is equal to the prior expected value of the posterior probability of $b = +1$ under the conditions $a = +1$ and $a = -1$:

$$P(b = j) = P(a = +1)P(b = j|a = +1) + P(a = -1)P(b = j|a = -1), \qquad (1)$$

where $j = +1$ or $j = -1$. LTP makes it possible to predict the probabilities for the $b$-variable on the basis of the conditional probabilities and the $a$-probabilities. The main idea behind applications of LTP is to split a, in general complex, condition,
say $C$, preceding the decision making for the $b$-variable into a family of (disjoint) conditions, in our case $C_+ = \{a = +1\}$ and $C_- = \{a = -1\}$, which are less complex. Then one estimates in some way (subjectively or on the basis of available statistical data) the probabilities under these simple conditions, $P(b = \pm 1|a = \pm 1)$, and the probabilities $P(a = \pm 1)$ of realization of the conditions $C_\pm$. On the basis of these data LTP provides the value of the probability $P(b = j)$ for $j = \pm 1$. If, e.g., $P(b = +1)$ is larger than $P(b = -1)$, it is reasonable to make the decision $b = +1$, say yes. Typically decision making is based on two thresholds for probabilities (assigned depending on the problem): $0 \le c_- \le c_+ \le 1$. If $P(b = +1) \ge c_+$, the decision $b = +1$ should be made. If $P(b = +1) \le c_-$, i.e., $P(b = -1) \ge 1 - c_-$, the decision $b = -1$ should be made. If $c_- < P(b = +1) < c_+$, then an additional analysis should be performed. My basic conjecture was that cognitive systems developed the ability to use a nonclassical LTP for decision making:

$$P(b = \pm 1|C) = P(a = +1|C)P(b = \pm 1|a = +1) + P(a = -1|C)P(b = \pm 1|a = -1) + 2\cos\theta_\pm \sqrt{\Pi_\pm},$$

where $\Pi_\pm = P(a = +1|C)P(b = \pm 1|a = +1)P(a = -1|C)P(b = \pm 1|a = -1)$. This formula (the classical LTP perturbed by the so-called interference term) can be easily derived in the formalism of quantum mechanics, where the observables $a$ and $b$ are represented by self-adjoint operators 11. We can also derive it 4 without appealing to the Hilbert space formalism, namely, by controlling the contextual dependence of probabilities. We recall that mathematically contextuality of probabilities is equivalent to non-Kolmogorovness of probabilistic data.
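The classical LTP (1), the two-threshold decision rule and the interference-perturbed formula can be put side by side in a few lines of Python. This is only an illustrative sketch: the function names, the input probabilities, the thresholds and the phase are hypothetical values chosen for the example, not data from the experiments cited in this paper.

```python
import math

def classical_ltp(p_a_plus, p_b_given_a_plus, p_b_given_a_minus):
    """P(b = j) from Eq. (1): average of the two conditional probabilities."""
    p_a_minus = 1.0 - p_a_plus
    return p_a_plus * p_b_given_a_plus + p_a_minus * p_b_given_a_minus

def interference_ltp(p_a_plus, p_b_given_a_plus, p_b_given_a_minus, theta):
    """Classical LTP plus the interference term 2 cos(theta) sqrt(Pi)."""
    p_a_minus = 1.0 - p_a_plus
    pi_term = p_a_plus * p_b_given_a_plus * p_a_minus * p_b_given_a_minus
    classical = classical_ltp(p_a_plus, p_b_given_a_plus, p_b_given_a_minus)
    return classical + 2.0 * math.cos(theta) * math.sqrt(pi_term)

def decide(p_b_plus, c_minus, c_plus):
    """Two-threshold rule: +1, -1, or None when more analysis is needed."""
    if p_b_plus >= c_plus:
        return +1
    if p_b_plus <= c_minus:
        return -1
    return None

# Hypothetical inputs: P(a=+1) = 0.5, P(b=+1|a=+1) = 0.8, P(b=+1|a=-1) = 0.4.
p_classical = classical_ltp(0.5, 0.8, 0.4)

# Hypothetical phase with cos(theta) = 0.25. The phases theta_+ and theta_-
# are not free: the two perturbed probabilities must still sum to one.
p_quantum_like = interference_ltp(0.5, 0.8, 0.4, math.acos(0.25))

decision_classical = decide(p_classical, c_minus=0.3, c_plus=0.7)
decision_quantum_like = decide(p_quantum_like, c_minus=0.3, c_plus=0.7)
```

With these assumed numbers the classical probability falls between the two thresholds, while the interference term pushes the perturbed probability above the upper threshold, so the two rules can lead to different decisions from the same conditional data.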
2.2. Violation of LTP from contextuality of probabilities

In particular, LTP is violated in quantum physics, in the two slit experiment. The $b$-observable gives the position of the photon on the registration screen. To couple the coming considerations to decision making, one can consider the problem of predicting the position of the photon's registration: to predict the probability that the photon hits a selected domain on the registration screen.
To make the $b$-variable discrete, we split the registration screen into two domains, say $B_+$ and $B_-$; if a photon makes a black dot in $B_+$, we set $b = +1$, and in the same way we define the result $b = -1$. The $a$-variable describes the slit which is used by a particle: say $a = +1$ for the upper slit and $a = -1$ for the lower slit. For simplicity, we set $P(a = +1) = P(a = -1) = 1/2$, so the source is placed symmetrically with respect to the slits. Consider three different experimental contexts:

$C$: both slits are open. We can find $P(b = +1)$ and $P(b = -1)$ from the experiment as the frequencies of photons hitting the domains $B_+$ and $B_-$, respectively.

$C_+$: only one slit, labeled by $a = +1$, is open. We can find $P(b = j|a = +1)$, $j = \pm 1$, the frequencies of photons hitting $B_+$ and $B_-$, respectively.

$C_-$: only one slit, labeled by $a = -1$, is open. We can find $P(b = j|a = -1)$, $j = \pm 1$, the frequencies of photons hitting $B_+$ and $B_-$, respectively.

If we put these frequency-probabilities, collected in the three real experiments, into (1), we see that LTP is violated. The classical LTP cannot be used to predict, e.g., the probability $P(b = +1)$ that a photon hits $B_+$ under the context $C$ (both slits are open) on the basis of the probabilities $P(b = +1|a = \pm 1)$ that a photon hits $B_+$ under the contexts $C_\pm$ (only one respective slit is open).
2.3. Violation of LTP in cognitive science

Data obtained in experiments of cognitive psychology (Tversky-Shafir, Croson; see book 1 for details) demonstrated violation of LTP. As in the above analysis of the two slit experiment, incompatible contextual structures can easily be found in all these cognitive experiments. We emphasize that here the violation of LTP is even more general than that described by the Dirac-von Neumann formalism of standard quantum mechanics (as, e.g., in the two slit experiment). Thus processing of information in cognitive systems is even more nonclassical than in quantum physics. One possibility to proceed is to use quantum Markov chains, see L. Accardi, A. Khrennikov, and M. Ohya 12: a concrete quantum Markov chain reproducing data from the Tversky-Shafir experiment was constructed. We also mention experiments on recognition of ambiguous figures 1, which were designed on the basis of the author's paper 6. It seems that their results can also be reproduced on the basis of quantum Markov chains. Recently M. Asano, A. Khrennikov, and M. Ohya 2 proposed a generalized quantum model, based on so-called liftings of density operators, modeling the process of decision making in PD-type games.
2.4. Wave representation of information in the brain?

One may conjecture that decision making with the nonclassical LTP is based on a kind of wave representation of information in the brain. The brain is full of classical electromagnetic radiation. Maybe the brain was able to create a QLR of information via classical electromagnetic signals, cf. K.-H. Fichtner, L. Fichtner, W. Freudenberg and M. Ohya 13. Classical waves produce superposition and violate LTP. However, quantum information processing is based not only on superposition, but also on ENTANGLEMENT. It is the source of superstrong nonlocal correlations. The correlations are really superstrong: they violate Bell's inequality. Can entanglement be produced by classical signals? Can quantum information processing be reproduced by using classical waves? The answer is positive. The crucial element of the coming wave model is the presence of the random background field (in physics, fluctuations of the vacuum; in the cognitive model, background fluctuations of the brain). Such a random background essentially increases correlations between different mental functions and generates a nonlocal representation of information. We might couple this nonlocal representation of information to the binding problem: "How the unity of conscious perception is brought about by the distributed activities of the central nervous system."
3. Prequantum classical statistical field theory: noncomposite systems
Quantum mechanics (QM) is a statistical theory. It cannot tell us anything about an individual quantum system, e.g., an electron or photon. It predicts only probabilities for results of measurements for ensembles of quantum systems. Classical statistical mechanics (CSM) does the same. Why are QM and CSM based on different probability models? In CSM averages are given by integrals with respect to probability measures and in QM by traces. In CSM we have:

$$\langle f \rangle_\mu = \int_M f(\phi)\, d\mu(\phi),$$

where $M$ is the state space. In probabilistic terms: there is given a random vector $\phi(\omega)$ taking values in $M$. Then $\langle f \rangle_\mu = E f(\phi(\omega)) \equiv \langle f \rangle_\phi$. In QM the average is given by the operator trace-formula:

$$\langle A \rangle_\rho = \mathrm{Tr}\, \rho A.$$
This formal mathematical difference induces the prejudice that there is a fundamental difference between the classical and quantum worlds. Our aim is to show that, in spite of the common opinion, quantum averages can be easily represented as classical averages and, moreover, even correlations between entangled systems can be expressed as classical correlations (with respect to fluctuations of classical random fields).
Einstein's dreams: Albert Einstein did not believe in irreducible randomness or in the completeness of QM. He dreamed of a better, so to say "prequantum", model 14: 1) Dream 1. A mathematical model reducing quantum randomness to classical. 2) Dream 2. Renaissance of causal description. 3) Dream 3. Instead of particles, classical fields will provide the complete description of reality, a reality of fields: "But the division into matter and field is, after the recognition of the equivalence of mass and energy, something artificial and not clearly defined. Could we not reject the concept of matter and build a pure field physics? What impresses our senses as matter is really a great concentration of energy into a comparatively small space. We could regard matter as the regions in space where the field is extremely strong. In this way a new philosophical background could be created."

The real trouble of the prequantum wave model (in the spirit of early Schrödinger) is not the various NO-GO theorems (e.g., the Bell inequality 4,11,15), but the problem which was already recognized by Schrödinger. In fact, he gave up his wave quantum mechanics because of this problem: A composite quantum system cannot be described by waves on physical space! Two electrons are described by a wave function on $\mathbf{R}^6$ and not by two waves on $\mathbf{R}^3$. Einstein also recognized this problem 14: "For one elementary particle, electron or photon, we have probability waves in a three-dimensional continuum, characterizing the statistical behavior of the system if the experiments are often repeated. But what about the case of not one but two interacting particles, for instance, two electrons, electron and photon, or electron and nucleus? We cannot treat them separately and describe each of them through a probability wave in three dimensions ..."
PCSFT: Einstein's Dreams 1 and 3 came true in PCSFT (but not Dream 2!) - a version of CSM in which fields play the role of particles. In particular, composite systems can be described by vector random fields,
i.e., by the Cartesian product of the state spaces of the subsystems and not the tensor product. The basic postulate of PCSFT can be formulated in the following way: A quantum particle is the symbolic representation of a "prequantum" classical field fluctuating on a time scale which is essentially finer than the time scale of measurements. The prequantum state space is $M = L_2(\mathbf{R}^3)$; states are fields $\phi : \mathbf{R}^3 \to \mathbf{R}$: the "electronic field", the "neutronic field", the "photonic field" (the classical electromagnetic field). An ensemble of "quantum particles" is represented by an ensemble of classical fields, i.e., by a probability measure $\mu$ on $M = L_2(\mathbf{R}^3)$, or by a random field $\phi(x, \omega)$ taking values in $M = L_2(\mathbf{R}^3)$. For each fixed value of the random parameter $\omega = \omega_0$, $x \to \phi(x, \omega_0)$ is a classical field on physical space.

Density operator = covariance operator: Each measure (or random field) has a covariance operator, say $D$. It describes correlations between various degrees of freedom. The map $\rho \mapsto D = \rho$ is one-to-one between density operators and the covariance operators of the corresponding prequantum random fields in the case of noncomposite quantum systems. In the case of composite systems this correspondence is really tricky. Thus each quantum state (an element of the QM formalism) is represented by a classical random field in PCSFT. The covariance operator of this field is determined by the density operator. We also postulate that the prequantum random field has zero mean value. These two conditions uniquely determine a Gaussian random field. We restrict our model to such fields. Thus by PCSFT quantum systems are Gaussian random fields.

Quantum observable = quadratic form: The map $A \to f_A(\phi) = \langle A\phi, \phi \rangle$ establishes a one-to-one correspondence between quantum observables (self-adjoint operators) and classical physical variables (quadratic functionals of the prequantum field).

Coincidence of averages: It is easy to prove that the following equality holds:

$$\langle f_A \rangle_\mu = \int_M f_A(\phi)\, d\mu(\phi) = \mathrm{Tr}\, \rho A.$$

In particular, for a pure quantum state $\psi$, consider the Gaussian measure with zero mean value and the covariance operator $\rho = \psi \otimes \psi$ (the orthogonal projector on the vector $\psi$); then

$$\int_M f_A(\phi)\, d\mu(\phi) = \langle A\psi, \psi \rangle.$$
This mathematical formula coupling the integral of a quadratic form and the corresponding trace is well known in measure theory. Our main contribution is the coupling of this mathematical formula with quantum physics. This is the end of the story for quantum noncomposite systems, e.g., a single electron or photon 7,8.

Beyond QM: In fact, PCSFT not only reproduces quantum averages, but it also provides a possibility to go beyond QM. Suppose that not all prequantum physical variables are given by quadratic forms, and consider a more general model: all smooth functionals $f(\phi)$ of classical fields. We only have the illusion of representation of all quantum observables by self-adjoint operators. The map $f \mapsto A = f''(0)/2$ projects smooth functionals of the prequantum field (physical variables in PCSFT) onto self-adjoint operators (quantum observables). Then quantum and classical (prequantum) averages do not coincide precisely, but only approximately:

$$\int_M f_A(\phi)\, d\mu(\phi) = \mathrm{Tr}\, \rho A + O(t/T),$$
where $T$ is the time scale of measurements and $t$ the time scale of fluctuations of the prequantum field. The main problem is that PCSFT does not provide a quantitative estimate of the time scale of fluctuations of the prequantum field. If this scale is too fine, e.g., the Planck scale, then QM is "too good an approximation of PCSFT", i.e., it would be really impossible to distinguish them experimentally. However, even the possibility of representing QM as a classical wave mechanics can have important theoretical and practical applications. In the present paper we shall use the mathematical formalism of PCSFT to model the brain's functioning. Although even in this case the choice of the scale of fluctuations is a complicated problem, we know that it is not extremely fine, so the model can be experimentally verified (in contrast to Roger Penrose, we are not looking for cognition at the Planck scale!).
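The coincidence of averages for noncomposite systems can be checked numerically in a finite-dimensional toy setting. The sketch below is only an illustration under stated assumptions: the state space is taken to be the real plane instead of $L_2(\mathbf{R}^3)$, and the matrices `rho` and `A`, the sample size and the seed are hypothetical choices. It samples zero-mean Gaussian vectors with covariance `rho` and compares the sample mean of the quadratic form with the trace.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2x2 density matrix (symmetric, positive, trace one);
# it plays the role of the covariance operator of the prequantum field.
rho = np.array([[0.7, 0.2],
                [0.2, 0.3]])

# Hypothetical observable (real symmetric matrix).
A = np.array([[1.0, 0.5],
              [0.5, -1.0]])

# Samples of the toy "prequantum field": zero-mean Gaussian vectors
# with covariance rho.
phi = rng.multivariate_normal(mean=np.zeros(2), cov=rho, size=200_000)

# Quadratic form f_A(phi) = <A phi, phi>, evaluated for every sample.
f_A = np.einsum("ni,ij,nj->n", phi, A, phi)

classical_average = f_A.mean()          # Monte Carlo estimate of E f_A(phi)
quantum_average = np.trace(rho @ A)     # Tr rho A

print(classical_average, quantum_average)
```

The two printed numbers should agree up to the Monte Carlo sampling error, which shrinks as the number of samples grows.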
4. Composite systems

In CSM a composite system $S = (S_1, S_2)$ is mathematically described by the Cartesian product of the state spaces of its parts $S_1$ and $S_2$. In QM it is described by the tensor product. The majority of researchers working in quantum foundations and, especially, quantum information theory consider this difference in the mathematical representation as crucial. In particular, entanglement, which is a consequence of the tensor space representation, is
considered as a totally nonclassical phenomenon. However, we recall that Einstein considered the EPR-states as exhibitions of classical correlations due to the common preparation. PCSFT will realize Einstein's dream on entanglement. Let $S = (S_1, S_2)$, where $S_i$ has the state space $H_i$, a complex Hilbert space. Then by CSM the state space of $S$ is $H_1 \times H_2$. By extending PCSFT to composite systems we should describe ensembles of composite systems by probability distributions on this Cartesian product, or by a random field

$$\phi(x, \omega) = (\phi_1(x, \omega), \phi_2(x, \omega)) \in H_1 \times H_2.$$
In our approach each quantum system is described by its own random field: $S_i$ by $\phi_i(x, \omega)$, $i = 1, 2$. However, these fields are CORRELATED, in a completely classical sense. Correlation at the initial instant of time $t = t_0$ propagates in time in complete accordance with the laws of QM. There is no action at a distance. It is a purely classical dynamics of two stochastic processes which were correlated at the beginning. (In fact, the situation is more complex: there is also the common random background, vacuum fluctuations; we shall come back to this question a little bit later.)

Operator realization of wave function: Consider now the QM-model and take the pure state case: $\Psi \in H_1 \otimes H_2$. Can one peacefully connect the QM and PCSFT formalisms? Yes! But $\Psi$ should be interpreted in a completely different way than in conventional QM. The main mathematical point: $\Psi$ is not a vector! It is an operator! It is, in fact, the non-diagonal block of the covariance operator of the corresponding prequantum random field $\phi(x, \omega) \in H_1 \times H_2$. The wave function $\Psi(x, y)$ of a composite system determines the integral operator:

$$\hat{\Psi}\phi(x) = \int \Psi(x, y)\phi(y)\, dy.$$
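We keep now to the finite-dimensional case. Any vector $\Psi \in H_1 \otimes H_2$ can be represented in the form $\Psi = \sum_{j=1}^{m} \psi_j \otimes \chi_j$, $\psi_j \in H_1$, $\chi_j \in H_2$, and it determines a linear operator from $H_2$ to $H_1$:

$$\hat{\Psi}\phi = \sum_{j=1}^{m} \langle \phi, \chi_j \rangle \psi_j, \quad \phi \in H_2. \qquad (2)$$

Its adjoint operator $\hat{\Psi}^*$ acts from $H_1$ to $H_2$: $\hat{\Psi}^*\psi = \sum_{j=1}^{m} \langle \psi, \psi_j \rangle \chi_j$, $\psi \in H_1$. Of course, $\hat{\Psi}\hat{\Psi}^* : H_1 \to H_1$ and $\hat{\Psi}^*\hat{\Psi} : H_2 \to H_2$, and these operators are self-adjoint and positively defined. Consider the density operator corresponding to a pure quantum state, $\rho = \Psi \otimes \Psi$. Then the operators of the partial traces are $\rho^{(1)} \equiv \mathrm{Tr}_{H_2}\, \rho = \hat{\Psi}\hat{\Psi}^*$ and $\rho^{(2)} \equiv \mathrm{Tr}_{H_1}\, \rho = \hat{\Psi}^*\hat{\Psi}$.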
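The statement about partial traces can be verified directly for a small example. The sketch below is an illustration under simplifying assumptions: the spaces are real and finite-dimensional, the coefficient matrix `M` is a hypothetical choice, and in an orthonormal basis the operator of Eq. (2) is then simply the matrix `M` itself.

```python
import numpy as np

# Hypothetical real coefficients of Psi = sum_{a,b} M[a, b] e_a (x) f_b,
# normalized so that ||Psi|| = 1.
M = np.array([[0.6, 0.0, 0.3],
              [0.1, 0.7, 0.2]])
M = M / np.linalg.norm(M)
n1, n2 = M.shape

psi = M.reshape(-1)                   # Psi as a vector in H1 (x) H2
rho = np.outer(psi, psi)              # density operator of the pure state
rho4 = rho.reshape(n1, n2, n1, n2)    # indices (a, b, a', b')

rho_1 = np.einsum("abcb->ac", rho4)   # partial trace over H2
rho_2 = np.einsum("abad->bd", rho4)   # partial trace over H1

Psi_hat = M                           # operator H2 -> H1 in this basis

print(np.allclose(rho_1, Psi_hat @ Psi_hat.T))
print(np.allclose(rho_2, Psi_hat.T @ Psi_hat))
```

In the complex case the adjoint involves complex conjugation and the identification of the operator with the coefficient matrix depends on the convention for the scalar product, so the real setting is used here only to keep the check unambiguous.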
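Basic equality: Let $\Psi \in H_1 \otimes H_2$ be normalized by 1. Then, for any pair of linear bounded operators $A_j : H_j \to H_j$, $j = 1, 2$, we have:

$$\mathrm{Tr}\, \hat{\Psi} A_2 \hat{\Psi}^* A_1 = \langle A_1 \otimes A_2 \rangle_\Psi \equiv \langle A_1 \otimes A_2 \Psi, \Psi \rangle. \qquad (3)$$

This is a mathematical theorem 9; it will play a fundamental role in further considerations.

Coupling of classical and quantum correlations: In PCSFT a composite system $S = (S_1, S_2)$ is mathematically represented by the random field $\phi(\omega) = (\phi_1(\omega), \phi_2(\omega)) \in H_1 \times H_2$. Its covariance operator $D$ has the block structure

$$D = \begin{pmatrix} D_{11} & D_{12} \\ D_{21} & D_{22} \end{pmatrix},$$

where $D_{ii} : H_i \to H_i$, $D_{ij} : H_j \to H_i$. The covariance operator is self-adjoint. Hence $D_{ii}^* = D_{ii}$ and $D_{12}^* = D_{21}$. Here by definition: $\langle D_{ij} u_j, v_i \rangle = E \langle u_j, \phi_j(\omega) \rangle \langle v_i, \phi_i(\omega) \rangle$, $u_j \in H_j$, $v_i \in H_i$. For any Gaussian random vector $\phi(\omega) = (\phi_1(\omega), \phi_2(\omega))$ having zero average and any pair of operators $A_i \in L_s(H_i)$, $i = 1, 2$, the following equality takes place:

$$\langle f_{A_1}, f_{A_2} \rangle_\phi \equiv E f_{A_1}(\phi_1(\omega)) f_{A_2}(\phi_2(\omega)) = (\mathrm{Tr}\, D_{11} A_1)(\mathrm{Tr}\, D_{22} A_2) + \mathrm{Tr}\, D_{12} A_2 D_{21} A_1.$$

We remark that $\mathrm{Tr}\, D_{ii} A_i = E f_{A_i}(\phi_i(\omega))$, $i = 1, 2$. Thus we have

$$E f_{A_1} f_{A_2} = E f_{A_1}\, E f_{A_2} + \mathrm{Tr}\, D_{12} A_2 D_{21} A_1.$$

Consider a Gaussian vector random field such that $D_{12} = \hat{\Psi}$. Then, by (3),

$$E f_{A_1} f_{A_2} = E f_{A_1}\, E f_{A_2} + \langle A_1 \otimes A_2 \rangle_\Psi, \qquad (4)$$

or, for the covariance of the two classical random variables $f_{A_1}, f_{A_2}$, we have: $\mathrm{cov}(f_{A_1}, f_{A_2}) = \langle A_1 \otimes A_2 \rangle_\Psi$.

We have the following equality for averages of quadratic forms of coordinates of the prequantum random field describing the state of a composite system: $E f_{A_i}(\phi_i(\omega)) = \mathrm{Tr}\, D_{ii} A_i$. We want to construct a random field such that these averages will match those given by QM. For the latter, we have: $\langle A_1 \rangle_\Psi = \langle A_1 \otimes I\, \Psi, \Psi \rangle = \mathrm{Tr}(\hat{\Psi}\hat{\Psi}^*) A_1$; $\langle A_2 \rangle_\Psi = \langle I \otimes A_2\, \Psi, \Psi \rangle = \mathrm{Tr}(\hat{\Psi}^*\hat{\Psi}) A_2$. Thus it would be natural to take the operator

$$\begin{pmatrix} \hat{\Psi}\hat{\Psi}^* & \hat{\Psi} \\ \hat{\Psi}^* & \hat{\Psi}^*\hat{\Psi} \end{pmatrix}$$

as the covariance operator of the prequantum random field. However, in general this operator is not positively defined! It could not determine any probability distribution on the space of classical fields. We modify it to obtain a positively defined operator. Originally this modification had purely mathematical reasons, but there are deep physical grounds for it.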
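Both the basic equality and the failure of positivity can be observed on a small numerical example. The following sketch makes simplifying assumptions that are not in the text: the spaces are real (dimensions 2 and 3), the observables are real symmetric matrices, and the state coefficients `M` are drawn at random with a fixed, arbitrary seed.

```python
import numpy as np

rng = np.random.default_rng(1)
n1, n2 = 2, 3

# Hypothetical normalized state: Psi = sum_{a,b} M[a, b] e_a (x) f_b.
M = rng.normal(size=(n1, n2))
M /= np.linalg.norm(M)
psi = M.reshape(-1)

def random_symmetric(n):
    """Random real symmetric matrix, standing in for an observable."""
    X = rng.normal(size=(n, n))
    return (X + X.T) / 2

A1, A2 = random_symmetric(n1), random_symmetric(n2)

# Left side: Tr Psi_hat A2 Psi_hat^* A1, with Psi_hat = M in this basis.
lhs = np.trace(M @ A2 @ M.T @ A1)
# Right side: <A1 (x) A2 Psi, Psi>.
rhs = psi @ np.kron(A1, A2) @ psi

# Candidate covariance operator without the additional diagonal term.
D0 = np.block([[M @ M.T, M],
               [M.T, M.T @ M]])
min_eig = np.linalg.eigvalsh(D0).min()

print(lhs, rhs, min_eig)
```

In this setting the two sides of the equality agree to machine precision, and the smallest eigenvalue of the unmodified block operator is negative, which is why it cannot serve as a covariance operator.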
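The operator

$$D_\Psi = \begin{pmatrix} \hat{\Psi}\hat{\Psi}^* + \epsilon I & \hat{\Psi} \\ \hat{\Psi}^* & \hat{\Psi}^*\hat{\Psi} + \epsilon I \end{pmatrix}$$

is positively defined if $\epsilon > 0$ is large enough.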
Hence, it uniquely determines a Gaussian measure on the space of classical fields. Suppose now that $\phi(\omega)$ is a random vector with the covariance operator $D_\Psi$. Then

$$\langle A_i \rangle_\Psi = E f_{A_i}(\phi_i(\omega)) - \epsilon\, \mathrm{Tr}\, A_i, \quad i = 1, 2. \qquad (5)$$

This relation for averages and relation (4) provide the coupling between PCSFT and QM. Quantum statistical quantities can be obtained from the corresponding quantities for the classical random field: "irreducible quantum randomness" is reduced to randomness of classical prequantum fields.

Vacuum fluctuations: The additional term given by the unit operator in the diagonal blocks of the covariance operator of the prequantum vector field corresponds to a field of the white noise type. Such a field can be considered as vacuum fluctuations, a vacuum field. PCSFT induces the following picture of reality: Fluctuations of the vacuum field are combined with random fields representing quantum systems. Since we cannot separate, e.g., an electron from the vacuum field, we cannot totally separate any two quantum systems. Thus all quantum systems are "entangled" via the vacuum field. WHITE NOISE is the basis of everything in Nature - Hida's Dream.
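The effect of the additional diagonal term and the resulting renormalization of averages can be illustrated for a maximally entangled two-level example. This is a sketch under assumptions: real finite-dimensional spaces, the hypothetical state $\Psi = (e_1 \otimes f_1 + e_2 \otimes f_2)/\sqrt{2}$, the hypothetical observable `A1`, and the value `eps = 0.25` chosen by hand for this example.

```python
import numpy as np

# Hypothetical state Psi = (e1 (x) f1 + e2 (x) f2) / sqrt(2); in this basis
# the operator Psi_hat is the matrix M.
M = np.eye(2) / np.sqrt(2)

def covariance(M, eps):
    """Block operator with eps * I added to both diagonal blocks."""
    n1, n2 = M.shape
    return np.block([[M @ M.T + eps * np.eye(n1), M],
                     [M.T, M.T @ M + eps * np.eye(n2)]])

eps = 0.25
min_eig_bare = np.linalg.eigvalsh(covariance(M, 0.0)).min()
min_eig_shifted = np.linalg.eigvalsh(covariance(M, eps)).min()

# Hypothetical observable on the first subsystem (nonzero trace on purpose).
A1 = np.array([[2.0, 0.0],
               [0.0, 1.0]])

D11 = covariance(M, eps)[:2, :2]
classical_avg = np.trace(D11 @ A1)                 # E f_{A1}(phi_1)
quantum_avg = np.trace(M @ M.T @ A1)               # <A1>_Psi
renormalized = classical_avg - eps * np.trace(A1)  # subtract vacuum part

print(min_eig_bare, min_eig_shifted, quantum_avg, renormalized)
```

For this state the unshifted operator has a negative eigenvalue, the shifted one is positive, and subtracting the vacuum contribution from the classical average returns the quantum average. How large the shift must be in general depends on the state; the value used here is only sufficient for this particular example.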
QM as renormalization formalism: Averages given by the mathematical formalism of traditional QM are obtained as renormalizations of classical averages, see (5). Thus the QM-formalism can be considered as a method of renormalization of averages with respect to vacuum fluctuations: it cancels the contribution of the vacuum field. Such a renormalization is especially important in the case of observables which are not of trace class. Here the contribution of the background field is infinite. Thus it should be subtracted from the classical average, cf. the renormalization procedures of QFT.

Superstrong quantum correlations: In PCSFT such correlations (violating Bell's inequality) are due to the presence of the vacuum field. The off-diagonal term $\hat{\Psi}$ can be so large only if the diagonal terms are completed by the contribution of the vacuum field. Mathematics tells us this. Thus the correlations are so strong because the vacuum field really couples any two systems; they are in the same fluctuating space. Space is a huge random wave; quantum systems are spikes on this wave;
they are correlated via this space-wave. Thus quantum correlations have two contributions: 1) initial preparation; 2) coupling via the vacuum field. The picture is purely classical. In this model the vacuum field is the source of additional correlations. It seems that this classical vacuum field is an additional (purely classical) quantum computational resource.

5. QL processing in the brain

Consider two time scales $(t_c, t_{pc})$, $t_{pc} \ll t_c$: cognitive and precognitive. Take a signal in the brain oscillating on the time scale $t_{pc}$. We speculate that the brain has an integration device integrating these fluctuations over the interval $t_c$. The result of such integration is considered as a cognitive image. In this paper we do not present a concrete procedure of integration and, hence, of the creation of cognitive images from random fluctuations.
5.1. Multiplicity of time scales in brain and cognitive QLR

The main lesson from the experimental and theoretical investigations on the temporal structure of processes in the brain is that there are various time scales. They correspond to (or at least they are coupled with) various aspects of cognition. Therefore we are not able to determine once and for ever the cognitive time scale $t_c$ ("psychological time"). There are a few such scales. We shall discuss some evident possibilities. It is well known that there are well established time scales corresponding to the alpha, beta, gamma, delta, and theta waves. Let us consider these time scales as different cognitive scales. For the alpha waves we choose the upper limit frequency, 12 Hz, and hence $t_{c,\alpha} \approx 0.083$ sec. For the beta waves we consider (by taking upper bounds of frequency ranges) three different time scales: 15 Hz, $t_{c,\beta,\mathrm{low}} \approx 0.067$ sec. (low beta waves); 18 Hz, $t_{c,\beta} \approx 0.056$ sec. (beta waves); 23 Hz, $t_{c,\beta,\mathrm{high}} \approx 0.043$ sec. (high beta waves). For gamma waves we take the characteristic frequency 40 Hz and hence the time scale $t_{c,\gamma} \approx 0.025$ sec.
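The cognitive time scales above are simply the periods of the quoted frequencies, and each can be compared with a precognitive unit. The short sketch below recomputes them; the dictionary keys are hypothetical labels, and the value 4.6 ms for the precognitive unit $Q_0$ is taken from Section 5.2, where the ratio $Q_0 / t_c$ is used as the measure of QL-ness.

```python
# Upper-bound (or characteristic) frequencies in Hz, as quoted in the text.
bands_hz = {
    "alpha": 12.0,
    "beta_low": 15.0,
    "beta": 18.0,
    "beta_high": 23.0,
    "gamma": 40.0,
}

Q0 = 4.6e-3  # precognitive time unit in seconds (value from Section 5.2)

# Cognitive time scale: one period of the band frequency.
t_c = {band: 1.0 / f for band, f in bands_hz.items()}

# Ratio of the precognitive unit to the cognitive time scale.
kappa = {band: Q0 / t for band, t in t_c.items()}

for band in bands_hz:
    print(f"{band:10s} t_c = {t_c[band]:.3f} s   Q0/t_c = {kappa[band]:.3f}")
```

The printed periods can be compared with the rounded values given in the text; small differences in the last digit come from rounding the period before or after forming the ratio.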
5.2. Precognitive time scale

Our choice of the precognitive (very fine) time scale $t_{pc}$ is motivated by the so-called Taxonomic Quantum Model for the representation of cognitive processes in the brain, proposed by Geissler et al., see, e.g., paper 16 (which was developed on the basis of extensive experimental research on the time-mind relation). They found that information processing in cognitive
tasks is based on time scales $Q_q = q \times Q_0$, where $t_{pc} = Q_0 = 4.6$ ms. We choose $Q_0$ as the unit of the precognitive time scale. This corresponds to frequencies $\approx 220$ Hz. Under such an assumption about the precognitive scale we can find the measure of QL-ness for different EEG bands. For the alpha scale, we have $\kappa_\alpha = Q_0 / t_{c,\alpha} \approx 0.055$. For the beta scales, we have:

$$\kappa_{\beta,\mathrm{low}} = \frac{Q_0}{t_{c,\beta,\mathrm{low}}} \approx 0.069; \quad \kappa_{\beta} = \frac{Q_0}{t_{c,\beta}} \approx 0.082; \quad \kappa_{\beta,\mathrm{high}} = \frac{Q_0}{t_{c,\beta,\mathrm{high}}} \approx 0.107.$$

For the gamma scale we have: $\kappa_\gamma = Q_0 / t_{c,\gamma} \approx 0.184$. A smaller $\kappa$ corresponds to a larger integration time in the process of creation of cognitive images; fewer images can be created and processed. "Thinking through the alpha waves" is essentially less advanced than, e.g., "thinking through the gamma waves".

I would like to thank M. Ohya, L. Accardi, M. Asano, K.-H. Fichtner and W. Freudenberg for fruitful discussions.

References

1. A. Khrennikov, Ubiquitous quantum structure: from psychology to finance, Springer, Heidelberg-Berlin-New York, 2010.
2. M. Asano, A. Khrennikov, M. Ohya, Foundations of Physics, to be published 2010.
3. A. Khrennikov, J. Phys. A: Math. Gen. 34, 9965-9981 (2001).
4. A. Khrennikov, Contextual approach to quantum formalism (Fundamental Theories of Physics), Springer, Heidelberg-Berlin-New York, 2009.
5. A. Khrennikov, BioSystems 84, 225-241 (2006).
6. A. Khrennikov, Open Systems and Information Dynamics 11 (3), 267-275 (2004).
7. A. Khrennikov, J. Phys. A: Math. Gen. 38, 9051-9073 (2005).
8. A. Khrennikov, Physics Letters A 372, 6588-6592 (2008).
9. A. Khrennikov, Europhysics Letters 88, 40005 (2009).
10. A. Khrennikov, J. Consciousness Studies 15, 10-25 (2009).
11. A. Khrennikov, Interpretations of probability, VSP International Science Publishers, Utrecht (1999); second edition (completed), De Gruyter, Berlin, 2009.
12. L. Accardi, A. Khrennikov, M. Ohya, Open Systems and Information Dynamics 16, 441-443 (2009).
13. K.-H. Fichtner, L. Fichtner, W. Freudenberg and M. Ohya, On a quantum model of the recognition process. QP-PQ: Quantum Prob. White Noise Analysis 21, 64-84 (2008).
14. A. Einstein and L. Infeld, Evolution of Physics: The Growth of Ideas from Early Concepts to Relativity and Quanta, Simon and Schuster, 1961.
15. A. Khrennikov, Theoretical and Mathematical Physics 157, 1448-1460 (2008).
16. B. Schack, N. Vath, H. Petsche, H.-G. Geissler, E. Möller, Int. J. Psychophysiology 44, 143-163 (2002).
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 223-236)
ENTANGLEMENT MAPPING VS. QUANTUM CONDITIONAL PROBABILITY OPERATOR
DARIUSZ CHRUSCINSKI, ANDRZEJ KOSSAKOWSKI
Institute of Physics, Nicolaus Copernicus University
Grudziadzka 5/7, 87-100 Torun, Poland

TAKASHI MATSUOKA

Faculty of Management of Administration and Information, Tokyo University of Science, Suwa
Toyohira 5000-1, Chino City, Nagano 391-0292, Japan

MASANORI OHYA

Department of Information Science, Tokyo University of Science
Yamazaki 2641, Noda City, Chiba 278-8501, Japan

The relation between two methods of constructing the density operator of a composite system is shown. One of them is called an entanglement mapping and the other is called a quantum conditional probability operator. On the basis of this relation we discuss quantum correlation by means of some types of quantum entropy.
1. Introduction
In quantum information theory it is crucial to clarify the following two problems. The first one concerns the classification of quantum states of composite systems. In the classical description of a physical compound system its correlation can be represented by a joint probability measure or a conditional probability measure. In the quantum case a quantum state of a composite system, i.e. a compound state, describes a correlation between its marginal states. However, in a quantum system it is known that the joint probability and the conditional probability do not generally exist, which is an essential difference from a classical system 20,22. The typical example of such a difference is the existence of entangled states in quantum systems. The difference means that one has to deal more carefully not only with the correlation of an entangled state but also with that of a separable one. From
this point of view, giving the proper classification of quantum states is of primary importance. Hence, one needs an appropriate measure of quantum correlations. We start with notions of positivity: the so-called complete positivity (CP) and complete co-positivity (CCP). Let $A$ and $B$ be C*-algebras (with a unit). A linear map $\chi : A \to B$ is CP if and only if

$$\chi_n : M_n(A) \to M_n(B); \quad [a_{ij}] \mapsto [\chi(a_{ij})] \qquad (1)$$

is positive for all $n$. Here $M_n(A)$ stands for $n \times n$ matrices with entries from $A$. A linear map $\chi : A \to B$ is CCP if and only if

$$\chi^n : M_n(A) \to M_n(B); \quad [a_{ij}] \mapsto [\chi(a_{ji})] \qquad (2)$$

is positive for all $n$. Every $n$-co-positive map is positive, but it is not necessarily complete positive even if it is complete co-positive, unless $B$ is Abelian, in which case a positive map is both complete positive and complete co-positive. Belavkin and Ohya gave a rigorous construction of quantum compound states by means of a CCP map called an entanglement mapping 4,5. We can construct a compound state by applying the entanglement mapping to Jamiolkowski's isomorphism. On the other hand, Kossakowski et al. introduced the notion of a quantum conditional probability operator, QCPO for short, by means of a CP map 1. If one of the marginal states is given, then we can also construct the compound state by operating on the marginal state with the QCPO. In the present paper we show the relation between the entanglement mapping and the QCPO, i.e. the relation between the CCP map and the CP map.

This paper is organized as follows. In sections 2 and 3 the definitions of the entanglement mapping and the QCPO are reviewed. In the middle of the 1990s the Horodecki family and Peres gave a classification of compound states by introducing a criterion, the so-called PPT (Partial Positive Transpose) condition 12,21. Here we also review the relation between the entanglement mapping and the PPT condition 15,18, and the relation between the QCPO and the PPT condition 1. Section 4 gives the relation between these two methods. We show that the compound state given by the entanglement mapping is related to the QCPO via an equation similar to the classical Bayesian relation. In section 5 we review two types of quantum entropies, the so-called quantum mutual entropy and quantum conditional entropy, and these entropies are applied to some examples of entanglement mappings. Finally, from the observation
of the above computational examples, the meaning of such entropies is discussed on the basis of the quantum Bayesian relation. The above argument can be formulated in the general C*-algebraic setting 4,5,15,17,18. However, for simplicity, only the finite-dimensional case will be considered here.
2. Entanglement mapping and its classification

Let us consider two quantum systems $(H_k, A_k, S(A_k))_{k=1,2}$, where $H_k$ is a Hilbert space given by $H_k = \mathbb{C}^n$, $A_k = B(H_k)$ stands for the algebra of complex $n \times n$ matrices and $S(A_k)$ stands for the set of all states on $A_k$. Then the composite system $(H, A, S)$ is given by

$$H \equiv H_1 \otimes H_2, \quad A \equiv B(H_1 \otimes H_2) = A_1 \otimes A_2, \quad S \equiv S(A) = S(A_1 \otimes A_2).$$

Notice that

$$S \supset \mathrm{conv}(S(A_1) \otimes S(A_2)).$$

The states in $\mathrm{conv}(S(A_1) \otimes S(A_2))$ are called separable states and the subset of states $S \setminus \mathrm{conv}(S(A_1) \otimes S(A_2))$ is called the set of entangled states, i.e. $S = S_{ent} \cup S_{sep}$. We are interested in the special subset of $S$,

$$S_{PPT} \equiv \{\omega \in S;\ \omega \circ (1 \otimes T) \in S\}, \qquad (3)$$

where $T$ stands for transposition. Such states are called partial positive transposition (PPT) states 12,21. Clearly

$$S \supset S_{PPT} \supset S_{sep}.$$
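In this section we review how to construct the compound state by means of an entanglement mapping, and we show that the entanglement mapping gives the same classification of states as the PPT condition. For a normal state $\omega \in S$ there exists a density matrix $\theta$ such that

$$\omega(a \otimes b) = \mathrm{Tr}\,(a \otimes b)\theta, \quad a \in A_1,\ b \in A_2. \qquad (4)$$

The state $\omega$ can be written as

$$\omega(a \otimes b) = \mathrm{Tr}_1\, a\, \phi(b) = \mathrm{Tr}_2\, \phi^*(a)\, b, \qquad (5)$$

where $\phi$ is a linear map from $A_2$ to $A_1$ given by $\phi(b) \equiv \mathrm{Tr}_2 (1 \otimes b)\theta \in A_1$, and its dual map $\phi^*$ from $A_1$ to $A_2$ is given by $\phi^*(a) \equiv \mathrm{Tr}_1 (a \otimes 1)\theta \in A_2$. In (5) the partial trace with respect to $H_k$ is denoted by $\mathrm{Tr}_k(\cdot)$. It is clear that the marginal densities of $\theta$ are given by

$$\mathrm{Tr}_2\, \theta = \phi(1), \quad \mathrm{Tr}_1\, \theta = \phi^*(1). \qquad (6)$$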
Belavkin and Ohya revealed that such maps can be reconstructed from a Hilbert-Schmidt operator, which is called the entangling operator, and they showed that both $\phi$ and $\phi^*$ are completely co-positive (CCP), but not always completely positive (CP), in the general C*-algebraic setting [4, 5]. For example, if $\omega$ is a pure entangled state, then its entanglement mappings are not CP. Here we recall the necessary and sufficient conditions for CP and CCP in the finite-dimensional case. Let $M_n$ denote the algebra of complex $n \times n$ matrices and let $\mathcal{L}[M_n, M_m]$ be the complex space of linear maps from $M_n$ to $M_m$. We recall the isomorphism $W : \mathcal{L}[M_n, M_m] \to M_n \otimes M_m$ [8, 16, 24]. For each linear map $\chi \in \mathcal{L}[M_n, M_m]$ one defines
$$W_{\mathbb{P}^+}(\chi) = \sum_{k,l=1}^{n} |e_k\rangle\langle e_l| \otimes \chi(|e_k\rangle\langle e_l|) = (1 \otimes \chi)\,\mathbb{P}^+ \tag{7}$$
and
$$W_{\mathbb{J}}(\chi) = \sum_{k,l=1}^{n} |e_k\rangle\langle e_l| \otimes \chi(|e_l\rangle\langle e_k|) = (1 \otimes \chi)\,\mathbb{J}, \tag{8}$$
where $\{|e_k\rangle\langle e_l|\}$ is an orthonormal basis of $M_n$. The map (7) is called Choi's isomorphism [8], with $\mathbb{P}^+ = \sum_{k,l=1}^{n} |e_k\rangle\langle e_l| \otimes |e_k\rangle\langle e_l|$ being the (unnormalized) maximally entangled state, and the map (8) is called the Jamiołkowski isomorphism [14], with $\mathbb{J} = \sum_{k,l=1}^{n} |e_l\rangle\langle e_k| \otimes |e_k\rangle\langle e_l|$.

One has the following

Lemma 2.1 [8, 16, 24]. For $\chi \in \mathcal{L}[M_n, M_m]$ the following statements hold:
(1) $\chi$ is CP iff $W_{\mathbb{P}^+}(\chi) \geq 0$.
(2) $\chi$ is CCP iff $W_{\mathbb{J}}(\chi) \geq 0$.
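Lemma 2.1 turns complete (co-)positivity into an eigenvalue test. The sketch below is an illustration only, assuming a linear map is supplied as a Python function on NumPy matrices; the names `choi_matrix`, `jamiolkowski_matrix` and `is_psd` are hypothetical helpers. It builds the two block matrices of (7) and (8) and applies them to the identity map and to transposition.

```python
import numpy as np

def unit(n, k, l):
    """Matrix unit |e_k><e_l| in M_n."""
    E = np.zeros((n, n))
    E[k, l] = 1.0
    return E

def choi_matrix(chi, n):
    """Eq. (7): sum_{k,l} |e_k><e_l| ⊗ chi(|e_k><e_l|), assembled block by block."""
    return np.block([[chi(unit(n, k, l)) for l in range(n)] for k in range(n)])

def jamiolkowski_matrix(chi, n):
    """Eq. (8): sum_{k,l} |e_k><e_l| ⊗ chi(|e_l><e_k|)."""
    return np.block([[chi(unit(n, l, k)) for l in range(n)] for k in range(n)])

def is_psd(M, tol=1e-12):
    """Positive semidefiniteness via the smallest eigenvalue of the Hermitian part."""
    return np.linalg.eigvalsh((M + M.conj().T) / 2).min() >= -tol

identity = lambda X: X      # CP but not CCP
transpose = lambda X: X.T   # CCP but not CP

n = 2
print(is_psd(choi_matrix(identity, n)), is_psd(jamiolkowski_matrix(identity, n)))    # True False
print(is_psd(choi_matrix(transpose, n)), is_psd(jamiolkowski_matrix(transpose, n)))  # False True
```

For the identity map the Choi matrix is $\mathbb{P}^+$ and the Jamiołkowski matrix is the swap operator, which has eigenvalue $-1$; for transposition the two roles are exchanged.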
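We apply Lemma 2.1 to the entanglement mapping $\phi$ and its dual $\phi^*$. The density matrix $\theta$ in (4) can be represented by matrix elements as
$$\theta = \sum_{i,j,\alpha,\beta=1}^{n} \theta_{ij:\alpha\beta}\, |e_i\rangle\langle e_j| \otimes |f_\alpha\rangle\langle f_\beta|, \tag{9}$$
where $\{|e_i\rangle\langle e_j|\}$ (resp. $\{|f_\alpha\rangle\langle f_\beta|\}$) is a standard basis of $\mathcal{A}_1$ (resp. $\mathcal{A}_2$), and $\theta_{ij:\alpha\beta}$ is given by $\theta_{ij:\alpha\beta} = \langle e_i \otimes f_\alpha, \theta\, e_j \otimes f_\beta\rangle$. Using this representation $\phi$ and $\phi^*$ are given by
$$\phi(b) = \mathrm{Tr}_2\,(1 \otimes b)\,\theta = \sum_{i,j,\alpha,\beta=1}^{n} \theta_{ij:\alpha\beta}\, \langle f_\beta, b f_\alpha\rangle\, |e_i\rangle\langle e_j| \tag{10}$$
and
$$\phi^*(a) = \mathrm{Tr}_1\,(a \otimes 1)\,\theta = \sum_{i,j,\alpha,\beta=1}^{n} \theta_{ij:\alpha\beta}\, \langle e_j, a e_i\rangle\, |f_\alpha\rangle\langle f_\beta|. \tag{11}$$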
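So that we have
$$W_{\mathbb{J}}(\phi^*) = \sum_{k,l=1}^{n} |e_k\rangle\langle e_l| \otimes \phi^*(|e_l\rangle\langle e_k|) = \sum_{i,j,\alpha,\beta=1}^{n} \theta_{ij:\alpha\beta}\, |e_i\rangle\langle e_j| \otimes |f_\alpha\rangle\langle f_\beta| = \theta. \tag{12}$$
Namely, the entanglement mapping $\phi^*$ reconstructs its density matrix via the Jamiołkowski isomorphism. This fact means that $\phi^*$ is always CCP. In a similar way we can also show the complete co-positivity of $\phi$. On the other hand, we apply $\phi$ to Choi's isomorphism. Then we have
$$W_{\mathbb{P}^+}(\phi) = \sum_{i,j,\alpha,\beta=1}^{n} \theta_{ij:\alpha\beta}\, |e_i\rangle\langle e_j| \otimes |f_\beta\rangle\langle f_\alpha| = (1 \otimes T)\,\theta. \tag{13}$$
The positivity of $W_{\mathbb{P}^+}(\phi)$ is equivalent to the complete positivity of $\phi$ and it is also equivalent to the PPT condition of $\theta$.

Theorem 2.1. $\theta \in \mathcal{S}_{PPT}$ if and only if the map $\phi^*$ is CP.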
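The reconstruction (12) and the link between complete positivity and the PPT condition can be checked numerically. This is a sketch under stated assumptions: the density matrix is a randomly generated illustrative input, $\phi^*(a) = \mathrm{Tr}_1 (a \otimes 1)\theta$ is implemented with `numpy.einsum`, and the Choi image of $\phi^*$ is compared with $(1 \otimes T)\theta$ only at the level of spectra, which is all that positivity needs.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Random full-rank density matrix theta on C^n ⊗ C^n (illustrative input only).
G = rng.normal(size=(n * n, n * n)) + 1j * rng.normal(size=(n * n, n * n))
theta = G @ G.conj().T
theta /= np.trace(theta).real

t = theta.reshape(n, n, n, n)  # t[i, a, j, b] = <e_i ⊗ f_a, theta e_j ⊗ f_b>

def phi_star(a):
    """phi*(a) = Tr_1 (a ⊗ 1) theta."""
    return np.einsum('ij,jaib->ab', a, t)

def unit(k, l):
    E = np.zeros((n, n))
    E[k, l] = 1.0
    return E

pairs = [(k, l) for k in range(n) for l in range(n)]

# Jamiolkowski image, eq. (12): sum_{k,l} |e_k><e_l| ⊗ phi*(|e_l><e_k|)
W_J = sum(np.kron(unit(k, l), phi_star(unit(l, k))) for k, l in pairs)
# Choi image: sum_{k,l} |e_k><e_l| ⊗ phi*(|e_k><e_l|)
W_C = sum(np.kron(unit(k, l), phi_star(unit(k, l))) for k, l in pairs)

# Partial transpose (1 ⊗ T) theta.
pt = t.transpose(0, 3, 2, 1).reshape(n * n, n * n)

print(np.allclose(W_J, theta))                                       # True
print(np.allclose(np.linalg.eigvalsh(W_C), np.linalg.eigvalsh(pt)))  # True
```

The second check says that the Choi matrix of $\phi^*$ is positive exactly when the partial transpose of $\theta$ is, which is the content of Theorem 2.1 in this finite-dimensional setting.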
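Remark 2.1. Theorem 2.1 was proved in the general C*-algebraic setting [15, 18]. If an entanglement mapping $\phi^*$, i.e. a normalized CCP map, is given, then one can construct a compound state as
$$\theta_\phi = \sum_{k,l=1}^{n} |e_k\rangle\langle e_l| \otimes \phi^*(|e_l\rangle\langle e_k|) = \sum_{k,l=1}^{n} \phi(|f_k\rangle\langle f_l|) \otimes |f_l\rangle\langle f_k|. \tag{14}$$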
3. Quantum conditional probability operator and its classification

In this section we review another construction of the compound state, due to Kossakowski et al. [1]. For a given $\theta \in \mathcal{S}$ with $\rho = \mathrm{Tr}_2\,\theta$ one can define the operator
$$\pi_\theta = (\rho^{-1/2} \otimes 1)\,\theta\,(\rho^{-1/2} \otimes 1), \tag{15}$$
which has the following properties:
$$\pi_\theta \geq 0, \tag{16}$$
$$\mathrm{Tr}_2\,\pi_\theta = 1, \tag{17}$$
where we assume that $\rho > 0$, that is, $\rho$ represents a faithful state. It follows from (16) and (17) that the operator $\pi_\theta$ is the quantum analogue of a classical conditional probability. Indeed, in the case that $\mathcal{A}$ is an Abelian algebra, $\pi_\theta$ coincides with a classical conditional probability.

Definition 3.1. An operator $\pi \in \mathcal{A}$ is called a quantum conditional probability operator (QCPO) if $\pi$ satisfies conditions (16) and (17).
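One easily finds the following formula
$$\pi = \sum_{k,l=1}^{n} \varphi(|f_k\rangle\langle f_l|) \otimes |f_k\rangle\langle f_l|, \tag{18}$$
where $\varphi$ is a CP unital map from $\mathcal{A}_2$ to $\mathcal{A}_1$, i.e., $\varphi(1) = 1$. Using Lemma 2.1 and unitality of $\varphi$ it is easy to check that $\pi$ satisfies (16) and (17). The corresponding compound state is
$$\theta = \sum_{k,l=1}^{n} \rho^{1/2}\,\varphi(|f_k\rangle\langle f_l|)\,\rho^{1/2} \otimes |f_k\rangle\langle f_l|. \tag{19}$$
Then we apply Lemma 2.1 (2) to the map $\Lambda(\cdot) \equiv \rho^{1/2}\,\varphi(\cdot)\,\rho^{1/2}$:
$$W_{\mathbb{J}}(\Lambda) = \sum_{k,l=1}^{n} \rho^{1/2}\,\varphi(|f_k\rangle\langle f_l|)\,\rho^{1/2} \otimes |f_l\rangle\langle f_k| = (1 \otimes T)\,\theta. \tag{20}$$
Hence

Corollary 3.1. $\theta \in \mathcal{S}_{PPT}$ if and only if the map $\varphi$ is CCP.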
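4. The relation between entanglement mapping and QCPO

Belavkin and Dai [6] gave the following decomposition of an entanglement mapping.

Lemma 4.1. Every entanglement mapping $\phi$ with $\phi(1) = \rho$ has a decomposition
$$\phi(\cdot) = \rho^{1/2}\,\varphi \circ T(\cdot)\,\rho^{1/2}, \tag{21}$$
where $\varphi$ is a CP unital map to be found as a unique solution to
$$\varphi(\cdot) = \rho^{-1/2}\,\phi \circ T(\cdot)\,\rho^{-1/2}. \tag{22}$$

Now we can show the following

Theorem 4.1. If a compound state $\theta_\phi$ given by (14) has a marginal state $\rho = \mathrm{Tr}_2\,\theta_\phi$, then $\theta_\phi$ is represented by
$$\theta_\phi = (\rho^{1/2} \otimes 1)\,\pi\,(\rho^{1/2} \otimes 1). \tag{23}$$

Proof. We apply the decomposition (21) to $\theta_\phi$. Then
$$\theta_\phi = \sum_{k,l=1}^{n} \phi(|f_k\rangle\langle f_l|) \otimes |f_l\rangle\langle f_k| = \sum_{k,l=1}^{n} \rho^{1/2}\,\varphi \circ T(|f_k\rangle\langle f_l|)\,\rho^{1/2} \otimes |f_l\rangle\langle f_k| = \sum_{k,l=1}^{n} \rho^{1/2}\,\varphi(|f_l\rangle\langle f_k|)\,\rho^{1/2} \otimes |f_l\rangle\langle f_k|.$$
In the opposite direction we have
$$(\rho^{1/2} \otimes 1)\,\pi\,(\rho^{1/2} \otimes 1) = \sum_{k,l=1}^{n} \rho^{1/2}\,\varphi(|f_k\rangle\langle f_l|)\,\rho^{1/2} \otimes |f_k\rangle\langle f_l| = \sum_{k,l=1}^{n} \phi(|f_l\rangle\langle f_k|) \otimes |f_k\rangle\langle f_l| = \theta_\phi. \qquad \square$$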
The relation (23) can be regarded as the quantum analogue of the classical Bayesian relation. In the next section we extend classical information theory to quantum systems on the basis of the quantum Bayesian relation (23).
5. Mutual entropy and conditional entropy vs. quantum entanglement

In the classical description of a physical compound system its correlation can be represented by a joint probability measure or a conditional probability measure. In classical information theory we have proper criteria to estimate such correlation, namely the mutual entropy and the conditional entropy introduced by Shannon. The mutual entropy is given as the relative entropy between the correlated joint probability and the non-correlated one. This means that it represents the correlation contained in the joint probability $r_{ij}$ as the distance from the non-correlated product probability $p_i q_j$, where $p_i$ and $q_j$ are the marginal probabilities, in the sense of relative entropy. In other words, the mutual entropy is the common information contained in the marginal random variables $X$ and $Y$. On the other hand, the conditional entropy is given by the average of the entropy of the conditional probability, which measures the uncertainty still remaining in $X$ (resp. $Y$) after observing $Y$ (resp. $X$). We extend the classical entropies to a quantum system.
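Definition 5.1. Let $\phi$ be an entanglement mapping with $\rho = \phi(1)$ and $\sigma = \phi^*(1)$. One defines [4, 5]
$$I_\phi(\rho, \sigma) \equiv S(\theta_\phi, \rho \otimes \sigma) = \mathrm{Tr}\,\theta_\phi\,(\log \theta_\phi - \log \rho \otimes \sigma), \tag{24}$$
where $S(\cdot, \cdot)$ is the Araki-Umegaki relative entropy. One calls $I_\phi(\rho, \sigma)$ the quantum mutual entropy [7, 10]. The quantum mutual entropy can be computed as follows:
$$\begin{aligned} I_\phi(\rho, \sigma) &= \mathrm{Tr}\,\theta_\phi\,(\log \theta_\phi - \log \rho \otimes \sigma) \\ &= \mathrm{Tr}\,\theta_\phi\,(\log \theta_\phi - \log \rho \otimes 1 - \log 1 \otimes \sigma) \\ &= -S(\theta_\phi) - \mathrm{Tr}_1\,[(\mathrm{Tr}_2\,\theta_\phi) \log \rho] - \mathrm{Tr}_2\,[(\mathrm{Tr}_1\,\theta_\phi) \log \sigma] \\ &= S(\rho) + S(\sigma) - S(\theta_\phi). \end{aligned} \tag{25}$$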
By using the quantum mutual entropy one also defines the quantum conditional entropies [4, 5]:
$$S_\phi(\sigma|\rho) \equiv S(\sigma) - I_\phi(\rho, \sigma) = S(\theta_\phi) - S(\rho), \tag{26}$$
$$S_\phi(\rho|\sigma) \equiv S(\rho) - I_\phi(\rho, \sigma) = S(\theta_\phi) - S(\sigma). \tag{27}$$
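Remark 5.1. The above quantities are also discussed by Cerf and Adami [7], the Horodeckis [11], Henderson and Vedral [13], and Groisman et al. [10].

Example 5.1. (Product state). For entanglement mappings $\phi(b) = \rho\,\mathrm{Tr}_2\,\sigma b$, $\phi^*(a) = \sigma\,\mathrm{Tr}_1\,\rho a$, we have $\theta_\phi = \rho \otimes \sigma$. Then
$$I_\phi(\rho, \sigma) = S(\rho) + S(\sigma) - S(\theta_\phi) = S(\rho) + S(\sigma) - (S(\rho) + S(\sigma)) = 0, \tag{28}$$
$$S_\phi(\sigma|\rho) = S(\sigma), \quad S_\phi(\rho|\sigma) = S(\rho). \tag{29}$$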
The above result clearly means that the product state has no correlation between its marginal states. The quantum mutual entropy $I_\phi(\rho, \sigma)$ measures the distance (correlation) between the correlated state and the product state. In a quantum system we have two types of correlated states. The first one is a separable correlated state, and the second one is an entangled correlated state.

Example 5.2. (Separable correlated state). For entanglement mappings $\phi(b) = \sum_i p_i \rho_i\,\mathrm{Tr}\,\sigma_i b$, $\phi^*(a) = \sum_i p_i \sigma_i\,\mathrm{Tr}\,\rho_i a$, where $p_i \geq 0$, $\sum_i p_i = 1$, we have
$$\theta_\phi = \sum_i p_i\,\rho_i \otimes \sigma_i \tag{30}$$
with $\rho = \phi(1) = \sum_i p_i \rho_i$, $\sigma = \phi^*(1) = \sum_i p_i \sigma_i$. Then we have the following inequalities [3, 4, 5]:
$$0 \leq I_\phi(\rho, \sigma) \leq \min\{S(\rho), S(\sigma)\}, \tag{31}$$
$$S_\phi(\sigma|\rho) \geq 0, \quad S_\phi(\rho|\sigma) \geq 0. \tag{32}$$
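Example 5.3. (Separable perfect correlated state). For entanglement mappings $\phi(b) = \sum_i p_i\, |e_i\rangle\langle e_i|\, \langle f_i, b f_i\rangle$, $\phi^*(a) = \sum_i p_i\, |f_i\rangle\langle f_i|\, \langle e_i, a e_i\rangle$ we have
$$\theta_\phi = \sum_i p_i\, |e_i\rangle\langle e_i| \otimes |f_i\rangle\langle f_i|$$
with $\rho = \phi(1) = \sum_i p_i\, |e_i\rangle\langle e_i|$, $\sigma = \phi^*(1) = \sum_i p_i\, |f_i\rangle\langle f_i|$. Then
$$I_\phi(\rho, \sigma) = S(\rho) + S(\sigma) - S(\theta_\phi) = S(\rho), \tag{33}$$
where $S(\rho) = S(\sigma) = S(\theta_\phi) = -\sum_i p_i \log p_i$. This correlation corresponds to a perfect correlation in the classical scheme.

Example 5.4. (Pure entangled correlated state). For entanglement mappings $\phi(b) = \sum_{i,j} \lambda_i \bar{\lambda}_j\, |e_i\rangle\langle e_j|\, \langle f_j, b f_i\rangle$, $\phi^*(a) = \sum_{i,j} \lambda_i \bar{\lambda}_j\, |f_i\rangle\langle f_j|\, \langle e_j, a e_i\rangle$, where $\lambda_i \in \mathbb{C}$, $\sum_i |\lambda_i|^2 = 1$, we have
$$\theta_\phi = \sum_{i,j} \lambda_i \bar{\lambda}_j\, |e_i\rangle\langle e_j| \otimes |f_i\rangle\langle f_j| = |\Psi\rangle\langle\Psi|, \tag{34}$$
where $|\Psi\rangle = \sum_i \lambda_i\, e_i \otimes f_i$ and its marginal states are given by $\rho = \phi(1) = \sum_i |\lambda_i|^2\, |e_i\rangle\langle e_i|$, $\sigma = \phi^*(1) = \sum_i |\lambda_i|^2\, |f_i\rangle\langle f_i|$. Then
$$I_\phi(\rho, \sigma) = S(\rho) + S(\sigma) - S(\theta_\phi) = 2 S(\rho), \quad S_\phi(\sigma|\rho) = S_\phi(\rho|\sigma) = -S(\rho), \tag{35}$$
where $S(\rho) = S(\sigma) = -\sum_i |\lambda_i|^2 \log |\lambda_i|^2$.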
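The values of the quantum mutual entropy in Examples 5.1, 5.3 and 5.4 can be reproduced numerically from $I_\phi(\rho, \sigma) = S(\rho) + S(\sigma) - S(\theta_\phi)$. The sketch below assumes two qubits with the uniform weights $p_i = |\lambda_i|^2 = 1/2$ (a choice made here purely for illustration) and uses natural logarithms; the function names are not from the paper.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy S(rho) = -Tr rho log rho (natural logarithm)."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]                      # drop numerically zero eigenvalues
    return float(-np.sum(w * np.log(w)))

def marginals(theta, n1, n2):
    t = theta.reshape(n1, n2, n1, n2)
    rho = np.einsum('iaja->ij', t)        # Tr_2 theta
    sigma = np.einsum('iaib->ab', t)      # Tr_1 theta
    return rho, sigma

def mutual_entropy(theta, n1, n2):
    rho, sigma = marginals(theta, n1, n2)
    return entropy(rho) + entropy(sigma) - entropy(theta)

def conditional_entropy(theta, n1, n2):
    """S_phi(sigma|rho) = S(theta) - S(rho)."""
    rho, _ = marginals(theta, n1, n2)
    return entropy(theta) - entropy(rho)

p = np.array([0.5, 0.5])
e = np.eye(2)
proj = [np.outer(e[i], e[i]) for i in range(2)]

product = np.kron(np.diag(p), np.diag(p))                           # Example 5.1
perfect = sum(p[i] * np.kron(proj[i], proj[i]) for i in range(2))   # Example 5.3
psi = sum(np.sqrt(p[i]) * np.kron(e[i], e[i]) for i in range(2))
pure = np.outer(psi, psi)                                           # Example 5.4

for name, theta in [("product", product), ("perfect", perfect), ("pure", pure)]:
    print(name, round(mutual_entropy(theta, 2, 2), 6), round(conditional_entropy(theta, 2, 2), 6))
```

With these weights $S(\rho) = \log 2$, so the mutual entropies come out as approximately $0$, $\log 2$ and $2 \log 2$, and the conditional entropy of the pure entangled state is approximately $-\log 2$.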
In a classical system the mutual entropy never exceeds the marginal entropies, and the conditional entropy is always non-negative. Thus the correlation of a pure entangled state has a non-classical property. We introduce another criterion to measure the correlation of compound states.

Definition 5.2. For an entanglement mapping $\phi$ with $\rho = \phi(1)$, $\sigma = \phi^*(1)$, we define
$$D_\phi(\rho, \sigma) \equiv I_\phi(\rho, \sigma) - \tfrac{1}{2}\,(S(\rho) + S(\sigma)) \tag{36}$$
$$= \tfrac{1}{2}\,(S(\rho) + S(\sigma)) - S(\theta_\phi). \tag{37}$$
Remark 5.2. Actually,
$$D_{EN}(\theta; \rho, \sigma) := -D_\phi(\rho, \sigma)$$
was called the degree of entanglement [4, 5, 19].

Suppose now that we have two entanglement mappings $\phi_1$ and $\phi_2$ such that $\rho = \phi_k(1)$ and $\sigma = \phi_k^*(1)$ (for $k = 1, 2$).

Definition 5.3. One says that $\phi_1$ has stronger correlation than $\phi_2$ if
$$D_{\phi_1}(\rho, \sigma) > D_{\phi_2}(\rho, \sigma). \tag{38}$$

One shows

Proposition 5.1 [2, 19]. If $\theta_\phi$ is a pure state, then the following statements hold:
(1) $\theta_\phi$ is a pure entangled state iff $D_\phi(\rho, \sigma) > 0$.
(2) $\theta_\phi$ is a pure separable state iff $D_\phi(\rho, \sigma) = 0$.

It is well known [23] that if $\theta$ is PPT, then
$$S(\theta) - S(\rho) \geq 0, \quad S(\theta) - S(\sigma) \geq 0,$$
where $\rho$ and $\sigma$ are the marginal states of $\theta$.

Proposition 5.2. If $\theta_\phi$ is a mixed PPT state, then
$$D_\phi(\rho, \sigma) \leq 0.$$
Remark 5.3. Interestingly, there exist quantum entangled states with weaker correlations than some separable states in the sense of Definition 5.3 (see Hirota et al. in these proceedings).

Remark 5.4. Let $X = \{x_i\}_{i=1,\ldots,n}$ and $Y = \{y_j\}_{j=1,\ldots,m}$ be random variables with probability distributions $P = (p_i)$ and $Q = (q_j)$, and let $(X \times Y = \{(x_i, y_j)\},\ R = (r_{ij}))$ be their compound system. Then the mutual entropy $I(X, Y)$ is defined by
$$I(X, Y) = \sum_{i,j} r_{ij} \log \frac{r_{ij}}{p_i q_j}. \tag{39}$$
Now the joint probability $r_{ij}$ is represented as
$$r_{ij} = p(j|i)\,p_i, \tag{40}$$
where $p(j|i)$ is a transition probability. Using this classical Bayesian relation the mutual entropy $I(X, Y)$ can be represented by
$$\begin{aligned} I(X, Y) &= \sum_{i,j} r_{ij} \log \frac{r_{ij}}{p_i q_j} \\ &= \sum_{i,j} p(j|i)\,p_i \log \frac{p(j|i)\,p_i}{p_i q_j} \\ &= \sum_{i,j} p(j|i)\,p_i \log \frac{p(j|i)}{\sum_k p(j|k)\,p_k} \\ &= I(P; \Lambda^*), \end{aligned} \tag{41}$$
where we denote the transition probability matrix $(p(j|i))$ by $\Lambda^*$, and we call it a channel. In this scheme we call the probability distribution $P$ an input state, and the probability distribution $Q$ an output state, given by
$$q_j = \sum_i p(j|i)\,p_i, \quad \text{i.e.}\ Q = \Lambda^* P, \tag{42}$$
and the mutual entropy $I(P; \Lambda^*)$ can be regarded as a measure of the information transmitted through the channel $\Lambda^*$ from an input $P$ to an output $Q = \Lambda^* P$. The important property of this measure is the following inequality:
$$0 \leq I(P; \Lambda^*) \leq \min\{S(P), S(\Lambda^* P)\}. \tag{43}$$
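The channel form of the classical mutual entropy and the bound (43) are easy to check numerically. In the sketch below the channel is stored as a column-stochastic matrix with entries `channel[j, i]` $= p(j|i)$; this storage convention, the example numbers and the function names are assumptions made for illustration.

```python
import numpy as np

def shannon(p):
    """Shannon entropy of a probability vector (natural logarithm)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def mutual_entropy(P, channel):
    """I(P; Lambda*) with channel[j, i] = p(j|i); each column sums to 1."""
    R = channel * P              # R[j, i] = p(j|i) p_i = r_ij
    Q = R.sum(axis=1)            # q_j = sum_i p(j|i) p_i
    mask = R > 0                 # terms with r_ij = 0 contribute nothing
    return float(np.sum(R[mask] * np.log(R[mask] / np.outer(Q, P)[mask])))

P = np.array([0.3, 0.7])
noisy = np.array([[0.9, 0.2],
                  [0.1, 0.8]])
Q = noisy @ P

I = mutual_entropy(P, noisy)
print(0 <= I <= min(shannon(P), shannon(Q)))                  # True

# A noiseless channel transmits everything: I(P; id) = S(P).
print(np.isclose(mutual_entropy(P, np.eye(2)), shannon(P)))   # True
```

A channel whose columns are all equal ignores its input, and the same function then returns zero.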
We can also represent the quantum mutual entropy in the channel representation. If a QCPO $\pi$ is given, then we can define a channel $\Lambda_\pi^*$ for any input state $\rho$ by
$$\Lambda_\pi^*(\rho) = \mathrm{Tr}_1\,(\rho^{1/2} \otimes 1)\,\pi\,(\rho^{1/2} \otimes 1). \tag{44}$$
In this scheme we can call $\pi$ the channel density. Then we have
$$\begin{aligned} I_\phi(\rho, \sigma) &= \mathrm{Tr}\,\theta_\phi\,(\log \theta_\phi - \log \rho \otimes \sigma) \\ &= \mathrm{Tr}\,(\rho^{1/2} \otimes 1)\,\pi\,(\rho^{1/2} \otimes 1)\,\big(\log\,(\rho^{1/2} \otimes 1)\,\pi\,(\rho^{1/2} \otimes 1) - \log \rho \otimes \Lambda_\pi^*(\rho)\big) \\ &= I_\phi(\rho; \Lambda_\pi^*), \end{aligned}$$
where $\Lambda_\pi^*(\rho)$ is the output state (44). However, this mutual entropy $I_\phi(\rho; \Lambda_\pi^*)$ does not always satisfy the following inequality (see Example 5.4):
$$0 \leq I_\phi(\rho; \Lambda_\pi^*) \leq \min\{S(\rho), S(\Lambda_\pi^*(\rho))\}. \tag{45}$$
From this point of view the quantum mutual entropy $I_\phi(\rho, \sigma)$ is not a proper measure of the information transmitted through the channel. Hence, a further study of this problem is needed.

6. Summary

In this paper we showed that the quantum Bayesian relation holds between the compound state given by means of a CCP map (an entanglement mapping) and the quantum conditional probability operator (QCPO) given by means of a CP map. We discussed the role of the quantum mutual entropy and the conditional entropy, and their relation to quantum entanglement.
References

1. M. Asorey, A. Kossakowski, G. Marmo, E. C. G. Sudarshan, "Relation between quantum maps and quantum state", Open Syst. Info. Dyn. 12, 319-329 (2006).
2. L. Accardi, T. Matsuoka, M. Ohya, "Entangled Markov chains are indeed entangled", Infin. Dim. Anal. Quantum Probab. Top. 9, 379-390 (2006).
3. L. Accardi, T. Matsuoka, M. Ohya, "Entangled Markov chain satisfying entanglement condition", RIMS 1658, 84-94 (2009).
4. V. P. Belavkin, M. Ohya, "Quantum entropy and information in discrete entangled state", Infin. Dim. Anal. Quantum Probab. Top. 4, 33-59 (2001).
5. V. P. Belavkin, M. Ohya, "Entanglement, quantum entropy and mutual information", Proc. R. Soc. London A 458, 209-231 (2002).
6. V. P. Belavkin, X. Dai, "An operational algebraic approach to quantum channel capacity", Int. J. Quantum Inf. 6, 981 (2008).
7. N. J. Cerf and C. Adami, "Negative entropy and information in quantum mechanics", Phys. Rev. Lett. 79, 5194-5197 (1997).
8. M. D. Choi, "Completely positive maps on complex matrix", Lin. Alg. Appl. 10, 285 (1975).
9. D. Chruściński, Y. Hirota, T. Matsuoka and M. Ohya, "Remarks on the degree of entanglement", in these proceedings.
10. B. Groisman, S. Popescu and A. Winter, "Quantum, classical, and total amount of correlations in a quantum state", Phys. Rev. A 72, 032317 (2005).
11. M. Horodecki and R. Horodecki, "Information-theoretical aspect of quantum inseparability of mixed states", Phys. Rev. A 54, 1838-1843 (1996).
12. M. Horodecki, P. Horodecki and R. Horodecki, Phys. Lett. A 223, 1 (1996).
13. Henderson and V. Vedral, "Classical, quantum and total correlation", J. Phys. A 34, 6913 (2001).
14. A. Jamiołkowski, "Linear transformation which preserve trace and positive semidefiniteness of operators", Rep. Math. Phys. 3, 275 (1972).
15. A. Jamiołkowski, T. Matsuoka and M. Ohya, "Entangling operator and PPT condition", TUS preprint (2007).
16. G. Kimura, A. Kossakowski, "A note on positive maps and classification on states", Open Sys. & Information Dyn. 12, 1 (2005).
17. T. Matsuoka, "On generalized entanglement", QP-PQ Quantum Probab. & White Noise Anal. 21, 170-180 (2007).
18. W. A. Majewski, T. Matsuoka and M. Ohya, "Characterization of partial positive transposition states and measures of entanglement", J. Math. Phys. 50, 113509 (2009).
19. T. Matsuoka, M. Ohya, "Quantum entangled state and its characterization", Found. Probab. Phys. 3, 750, 298-306 (2005).
20. M. Ohya, I. V. Volovich, Mathematical Foundation of Quantum Information and Computation (to be published by Springer, New York).
21. A. Peres, Phys. Rev. Lett. 77, 1413 (1996).
22. K. Urbanik, "Joint probability distribution of observables in quantum mechanics", Stud. Math. T. 21, 317-323 (1961).
23. K. G. H. Vollbrecht, M. M. Wolf, "Conditional entropies and their relation to entanglement criteria", e-print arXiv: quant-ph/0202058v1.
24. S. L. Woronowicz, "Positive maps of low dimensional matrix algebra", Rep. Math. Phys. 10, 165 (1976).
Quantum Bio-Informatics IV, eds. L. Accardi, W. Freudenberg and M. Ohya, © 2011 World Scientific Publishing Co. (pp. 237-254)
CONSTRUCTING MULTIPARTITE ENTANGLEMENT WITNESSES
MIŁOSZ MICHALSKI
Institute of Physics, Nicolaus Copernicus University
Grudziądzka 5, 87-100 Toruń, Poland
E-mail: [email protected]

The method of transforming a Hamiltonian of a composite quantum system into an entanglement witness, explored in an earlier paper [1], is extended to the multipartite case. The witness can be made not only to detect general entanglement but also to discriminate among its various multipartite types. Partial knowledge of the lowest part of the energy spectrum is sufficient for approximate witness construction. As an example, we carry out analytic calculations with the Hamiltonian of a 1-dimensional Heisenberg XXZ model.
Keywords: Multipartite entanglement, entanglement witness, quantum spin chains, Heisenberg models
1. Introduction
Multipartite entanglement is a generic property of quantum states of interacting many-body systems. However, already the simplest examples indicate that it cannot be understood and described solely in terms of more elementary bipartite entanglement. In particular, as counterintuitive as it may seem, in a system of 3 qubits there are mixed states which are separable against any bipartite split, yet they are not fully separable [2, 4]. It means that there exists genuinely 3-partite entanglement which is not the sum of any bipartite constituents. Moreover, there are inequivalent brands of 3-partite entanglement [5], as exemplified, for instance, by the famous $|W\rangle$ and $|GHZ\rangle$ states,
$$|W\rangle = \tfrac{1}{\sqrt{3}}\,(|100\rangle + |010\rangle + |001\rangle), \quad |GHZ\rangle = \tfrac{1}{\sqrt{2}}\,(|000\rangle + |111\rangle). \tag{1}$$
Namely, it is easy to verify that tracing out one of the qubits reduces the GHZ state to a separable mixture, while the W state is more robust: the resulting 2-qubit state remains entangled. Already this distinction suggests that there is no straightforward generalization of the Schmidt decomposition for the 3-partite case,
$$|\psi\rangle = \sum_i \lambda_i\, |\alpha_i\rangle \otimes |\beta_i\rangle \otimes |\gamma_i\rangle.$$
Indeed, any such $|\psi\rangle$, and $|GHZ\rangle$ in particular, becomes a separable mixture under one-party reduction, while the W state does not; hence it cannot be cast in such a "Schmidt form" by a choice of local bases. This picture unfolds into an even more complex one as the number of subsystems increases.

Distinguishing entangled and separable states of bipartite systems remains one of the most challenging open problems in quantum information theory. The discrimination among various genres of multipartite entanglement appears to be a still harder task. In experiments, Bell inequalities are among the commonly used tools to detect entanglement in both bi- and multipartite settings. Yet Bell inequalities prove to be a faulty detector in general: on the one hand, even in the bipartite case, there are entangled states not violating the inequalities; on the other hand, the degree of violation of Bell inequalities is not sufficiently "monotone" when going from "weaker" entangled biseparable states to fully entangled multipartite ones [6, 3]. Consequently, it is not an adequate tool for the classification of various multipartite entanglement patterns.

An alternative technique, recognized to be quite fruitful both theoretically and experimentally, is the use of entanglement witnesses [1, 5, 6, 7, 8, 9, 10]. While originally [7, 9] entanglement witnesses were defined to distinguish bipartite entanglement from separability, they proved to be equally useful for the discrimination among various types of multipartite entanglement [11, 5, 6], or even as simple entanglement measures^a. Actually, there are a number of interesting relations between specific witnesses and standard entanglement measures, the former providing various useful bounds for the latter [12].

In the present paper, we focus on a special method of constructing an entanglement witness using a physical observable whose spectrum is partially known [10]. In [1] we have discussed an application of such a method to construct an entanglement witness out of the Hamiltonian of a 1-dimensional isotropic Heisenberg model. We have demonstrated the usefulness of this approach with an analytic calculation of the respective entanglement indicator for a class of thermal equilibrium states of the model. Presently, we extend the results to the case of an anisotropic XXZ Heisenberg chain immersed in a uniform magnetic field. Varying the control parameters, i.e. the anisotropy constant and the strength of the magnetic field, produces different types of multipartite entanglement in the ground state. This in turn is used as the basis for the construction of witnesses detecting these specific entanglement brands.

The paper is organized as follows. Section 2, for the convenience of the reader, introduces definitions and basic facts on multipartite entanglement. Section 3 outlines the general methods of witness construction and presents our procedure of obtaining multipartite witnesses of a specific type out of an observable. Section 4 describes the results of analytic calculations performed for the Heisenberg model.

^a Strictly speaking, every witness $W$ gives rise to a pseudo-measure of entanglement $E_W(\rho) = \max\{0, -\mathrm{Tr}(W\rho)\}$ rather than a true measure, for in general it yields zero value for some entangled states.

2. Multipartite entanglement
We begin this section with a review of basic facts on multipartite entanglement of pure states. For a more detailed exposition the reader is referred e.g. to [13, 14]. As is well known, in the bipartite case $\mathcal{H} = \mathcal{H}_A \otimes \mathcal{H}_B$ entanglement of pure states can be completely classified using the Schmidt normal form,
$$|\psi\rangle = \sum_{i=1}^{d} \lambda_i\, |\psi_i\rangle \otimes |\eta_i\rangle,$$
where $|\psi_i\rangle \in \mathcal{H}_A$, $|\eta_i\rangle \in \mathcal{H}_B$ are orthonormal systems of vectors and $d \leq \min\{\dim \mathcal{H}_A, \dim \mathcal{H}_B\}$. The numbers $\lambda_1 \geq \lambda_2 \geq \ldots \geq \lambda_d \geq 0$, $\sum_i \lambda_i^2 = 1$, characterize the entanglement of $|\psi\rangle$ entirely, giving rise to a natural entanglement measure,
$$E(\psi) = -\sum_{i=1}^{d} \lambda_i^2 \log \lambda_i^2,$$
called the entropy of entanglement. If $d = 1$ (i.e. $\lambda_1 = 1$), the state $|\psi\rangle$ is separable. Observe that the Schmidt form for $|\psi\rangle$ is obtained by an appropriate choice of bases in $\mathcal{H}_A$ and $\mathcal{H}_B$. In other words, $|\psi\rangle$ is locally unitarily (LU) equivalent to any state with the same Schmidt coefficients $\lambda_i$. In particular, any state of the system of two qubits can be brought by appropriate local unitary transformations to the form $\sin\theta\,|00\rangle + \cos\theta\,|11\rangle$, where the real $0 \leq \theta \leq \frac{\pi}{4}$ parameterizes all distinct entanglement classes.
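In practice the Schmidt coefficients of a bipartite pure state are the singular values of its coefficient matrix. The following sketch assumes the state vector is stored in the product basis, and it takes the entropy of entanglement to be the Shannon entropy of the squared Schmidt coefficients (the usual definition, assumed here); the function names are illustrative.

```python
import numpy as np

def schmidt_coefficients(psi, dA, dB):
    """Schmidt coefficients of a pure state in C^dA ⊗ C^dB, in descending order."""
    return np.linalg.svd(psi.reshape(dA, dB), compute_uv=False)

def entropy_of_entanglement(psi, dA, dB):
    """-sum_i lambda_i^2 log lambda_i^2 (natural logarithm)."""
    lam2 = schmidt_coefficients(psi, dA, dB) ** 2
    lam2 = lam2[lam2 > 1e-12]
    return float(-np.sum(lam2 * np.log(lam2)))

theta = np.pi / 6
psi = np.zeros(4)
psi[0], psi[3] = np.sin(theta), np.cos(theta)   # sin(theta)|00> + cos(theta)|11>

print(schmidt_coefficients(psi, 2, 2))          # cos(theta) and sin(theta), largest first
print(entropy_of_entanglement(psi, 2, 2))

product = np.kron([1.0, 0.0], [0.0, 1.0])       # |01>, a single Schmidt term
print(entropy_of_entanglement(product, 2, 2))   # zero up to rounding
```

Since local unitaries only change the singular vectors, any two states with the same output of `schmidt_coefficients` are LU equivalent.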
The above observation indicates one way to progress with the classification of multipartite states: different entanglement patterns are represented by equivalence classes of pure states related by local unitary actions. However, already in the case of a tripartite system, even in its simplest version consisting of 3 qubits, the classification turns out to be much more complex [15, 16] than the bipartite one: local unitary equivalence yields the following state prototypes
$$|\psi\rangle = \lambda_0 |000\rangle + \lambda_1 e^{i\theta} |100\rangle + \lambda_2 |101\rangle + \lambda_3 |110\rangle + \lambda_4 |111\rangle,$$
with $\lambda_i \geq 0$, $\sum_i \lambda_i^2 = 1$, $0 \leq \theta \leq \pi$. A general scheme yielding similar "normal forms" for arbitrary multipartite systems is known [19], minimizing the number of summands from a product orthonormal basis, yet the number of free parameters grows considerably with the number of parties. This substantial increase of normal form complexity suggests the replacement of local unitary equivalence with another, cruder classification scheme. It can be realized by the so-called stochastic LOCC (SLOCC) equivalence [20] where, roughly speaking, local unitary transformations are replaced with local invertible ones. It is the SLOCC classification that splits the set of 3-qubit pure states into the well-known six categories: separable states represented by $|000\rangle$, 3 types of biseparable ones, A-BC represented up to normalization by $|0\rangle \otimes (|00\rangle + |11\rangle)$ and similarly for AB-C and AC-B, the W-states and, finally, the GHZ-type states represented by (1). However, the passage to a 4-qubit system reveals again the whole continuum of SLOCC equivalence classes, just like in the LU classification scheme: one can distinguish 9 structural types of pure state entanglement, yet the complete SLOCC classification involves also 4 independent complex parameters [21].

The natural convex structure of mixed state sets yields a still different and much simpler classification scheme, based on partitions of the set of constituents of the system. An approach of this type has been developed in detail e.g. in [22]. Suppose that the parties constituting a compound quantum system are labeled by $A_1, \ldots, A_n$, that is $\mathcal{H} = \mathcal{H}_{A_1 \ldots A_n}$. Any partition of this set of labels, say $\{A^1_1, \ldots, A^1_{i_1}\} \cup \ldots \cup \{A^k_1, \ldots, A^k_{i_k}\}$, corresponds to a specific "coarse-grained" way of looking at the entire system,
$$\mathcal{H} = \mathcal{H}_{A^1_1 \ldots A^1_{i_1}} \otimes \cdots \otimes \mathcal{H}_{A^k_1 \ldots A^k_{i_k}},$$
i.e. here a $k$-partite one. A pure state $|\psi\rangle \in \mathcal{H}$ is called $k$-separable iff there exists a partition as above such that
$$|\psi\rangle = |\psi_1\rangle \otimes \cdots \otimes |\psi_k\rangle,$$
with $|\psi_j\rangle \in \mathcal{H}_{A^j_1 \ldots A^j_{i_j}}$, $j = 1, \ldots, k$. It is obvious that $k$-separable states are automatically $l$-separable for $l < k$, with respect to appropriate coarser partitions. Next the notion of $k$-separability is extended to mixed states by the usual convexity construction:

Definition 2.1. A mixed state $\rho$ is $k$-separable, in short $\rho \in \Sigma_k(\mathcal{H})$, iff it can be represented as a convex combination of pure projections on $k$-separable vectors,
$$\rho = \sum_m p_m\, |\psi_m\rangle\langle\psi_m|.$$

Let us stress that $k$-separability of the individual $|\psi_m\rangle$ above refers in general to different partitions of the system constituents. The above notion of $k$-separability yields the convex filtration in the set of states $\Sigma(\mathcal{H})$,
$$\Sigma_n(\mathcal{H}) \subset \Sigma_{n-1}(\mathcal{H}) \subset \cdots \subset \Sigma_2(\mathcal{H}) \subset \Sigma_1(\mathcal{H}) = \Sigma(\mathcal{H}).$$

Figure 1. The structure of mixed states of a 3-qubit system [5].
The simplicity of such a partition-based approach should not overshadow some of its obvious deficiencies, especially the fact that it misses important structural features of states singled out e.g. in the SLOCC equivalence scheme. Going back to the 3-qubit example, one can reveal [5] a finer convex structure distinguishing the W-type states and GHZ-type states among the 3-partite entangled ones, cf. Fig. 1. The W-type states are those which can be expressed as convex sums of projectors on separable, biseparable and W-type vectors, GHZ-type ones being the complement in the set of all states. It should be stressed, however, that in practice one is often interested only in gross entanglement features, e.g. in the distinction between genuinely $n$-partite entangled and biseparable states, or between biseparable and triseparable ones, etc., for which the partition-based approach offers sufficient means. In the following sections we shall describe in detail the construction of an entanglement witness based on the system Hamiltonian which detects genuinely entangled mixed states vs. biseparable ones.

3. Construction of entanglement witnesses

An entanglement witness is a versatile tool which allows one to detect entangled states of a compound quantum system [17]. Let us recall its original definition in the bipartite setting [7, 9].

Definition 3.1. A self-adjoint operator $W \in B(\mathcal{H}_{AB})$ is an entanglement witness iff $\mathrm{Tr}(W\sigma) \geq 0$ for all separable mixed states $\sigma$, while $\mathrm{Tr}(W\varrho) < 0$ for some nonseparable $\varrho$ (equivalently, $\mathrm{Tr}(W\,\cdot\,)$ is nonnegative on separable states, yet the spectrum of $W$ contains a negative eigenvalue). $W$ is then said to detect the entanglement of $\varrho$.

Entanglement witnesses are directly related via the Jamiołkowski isomorphism [18] to positive but not completely positive maps, which are basic for the characterization of entanglement [11]. Although entanglement detection criteria based on positive maps (e.g. the famous Peres-Horodecki partial transpose criterion) are formally stronger than the witness-based ones, globally they are equivalent in the sense that deciding separability of a given state requires in general all possible witness-based tests and likewise all positive map-based ones. The main advantage of witness-based detection is its operational simplicity, namely it is realized by measuring $W$ in the tested state $\varrho$. Therefore, if $W$ has some direct physical meaning, such a procedure becomes experimentally feasible.
Figure 2. The idea of a $\Delta^c$-witness.
It should be noted that the witness technique can be immediately reformulated to detect not the entanglement but any other property of quantum states defining a convex subset of $\Sigma$. Let $\Delta \subset \Sigma$ be such a subset. We will then call an observable $W$ a $\Delta^c$-witness iff $\mathrm{Tr}(W\,\cdot\,)$ is nonnegative on $\Delta$ and $\mathrm{Tr}(W\varrho) < 0$ for some $\varrho \notin \Delta$. After all, the witness technique is just a simple application of the geometric formulation of the Hahn-Banach theorem: the null-space of $\mathrm{Tr}(W\,\cdot\,)$ separates the convex set $\Delta$ and the external point $\varrho$, cf. Fig. 2.

Now, suppose that we work with a multipartite system. One can use the set of $k$-separable states $\Sigma_k$ in place of $\Delta$ to define a witness detecting the related entanglement category of states (i.e. non-$k$-separability). Of basic interest are witnesses detecting the genuine $n$-fold entanglement, i.e. filtering out all biseparable states. We will focus on constructing such witnesses below. Examples of witnesses of similar type exist in the literature [5, 6, 23], and the general method of their construction proceeds as follows. Let $|\psi\rangle$ be a fixed entangled multipartite state. The witness is defined by
$$V = \alpha\,\mathbb{1} - |\psi\rangle\langle\psi|, \tag{2}$$
where $\alpha = \max_{|\phi\rangle \in \Delta} |\langle\psi|\phi\rangle|^2$ makes $V$ a witness. However, in practice such $\alpha$ is difficult, if at all possible, to compute. The above construction is self-explanatory: any $\varrho$ which is a convex combination of projections on elements of $\Delta$ yields $\mathrm{Tr}(V\varrho) \geq 0$, while $\varrho = |\psi\rangle\langle\psi|$ gives the negative expectation of $V$. Some well-known examples for the 3-qubit system using the $|W\rangle$ and $|GHZ\rangle$ states in place of $|\psi\rangle$ are [5, 23] (see Fig. 3):

• $V_1 = \frac{4}{9}\mathbb{1} - |W\rangle\langle W|$ with $\Delta = \Sigma_{Sep}$, detecting nonseparable states;
• $V_2 = \frac{2}{3}\mathbb{1} - |W\rangle\langle W|$ with $\Delta = \Sigma_{Bisep}$, detecting nonbiseparable states, i.e. genuinely tripartite entangled ones;
• $V_3 = \frac{3}{4}\mathbb{1} - |GHZ\rangle\langle GHZ|$ with $\Delta = \Sigma_W$, discriminating between W- and GHZ-type states.

Figure 3. 3-qubit witnesses of different types.

Despite its simplicity, this scheme does not automatically settle the important question of practical detection of entanglement in the laboratory, because the $V$ operators do not have immediate physical meaning. In order to apply them in experiment one first has to express them in terms of physically manageable observables, e.g. spin operators. Such decompositions have in fact been constructed for three- and four-qubit systems [6] and they turn out to be technically quite demanding: first, it is not easy to find them and to assess their optimality; second, it is not easy to actually realize them in the laboratory: in the cases discussed the experimental setup required 5-15 local measurement settings.

We are now going to explore a slightly different approach, where the witness construction begins with a true physical observable that is easy to measure [24, 25, 10]. Let $A$ be such an observable. Then the $\Delta^c$-witness is constructed as
$$W_A = A - \alpha\,\mathbb{1}, \tag{3}$$
where $\alpha = \inf_{|\phi\rangle \in \Delta} \langle\phi|A|\phi\rangle$. Just like in the case of scheme (2), finding the exact value of $\alpha$ may be a difficult task. Note, however, that once $\alpha$ is known, it suffices to measure the observable $A$ in a state $\varrho$, an easily manageable task, and compare the resulting value $\mathrm{Tr}(A\varrho)$ with $\alpha$: if the measurement outcome is less than $\alpha$, $\varrho$ has the desired property $\Delta^c$. To analyze the present method more closely, let us write $A$ in its spectral decomposition form,
$$A = \sum_i \lambda_i\, |\psi_i\rangle\langle\psi_i|,$$
with $\lambda_i$ and $|\psi_i\rangle$ being its eigenvalues and eigenvectors, respectively. Then
$$\alpha = \inf_{|\phi\rangle \in \Delta} \sum_i \lambda_i\, |\langle\psi_i|\phi\rangle|^2. \tag{4}$$
For a fixed $|\phi\rangle$ the sum in (4) is simply a weighted mean of eigenvalues. Let us plot the range of these mean values (i.e. the image of $\Delta$) against the spectrum of $A$, cf. Fig. 4.

Figure 4. The image of $\Delta$ against the spectrum of $A$.

We can see now that in order for $W_A$ to be a witness one must have $\lambda_1 < \alpha$ or, in other words, at least the ground state $|\psi_1\rangle$ of $A$ must not belong to $\Delta$. $W_A$ is then suitable to detect the type of entanglement present in $|\psi_1\rangle$ and possibly also in other lowest eigenvectors $|\psi_2\rangle, \ldots$ as long as they are not of $\Delta$ type.
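The smallest instance of a witness built from an observable is the two-qubit isotropic Heisenberg coupling with $\Delta$ the set of separable states. The sketch below is an illustration under stated assumptions, not the construction analyzed in this paper: it uses the fact that for a product state the expectation of $\sigma^x \otimes \sigma^x + \sigma^y \otimes \sigma^y + \sigma^z \otimes \sigma^z$ equals the scalar product of the two Bloch vectors, so that $\alpha = -1$.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# Observable A: two-qubit isotropic Heisenberg coupling.
A = np.kron(sx, sx) + np.kron(sy, sy) + np.kron(sz, sz)

# Infimum of <A> over product states: scalar product of two unit Bloch vectors, i.e. -1.
alpha = -1.0
W = A - alpha * np.eye(4)

def expectation(op, state):
    """<state|op|state> for a normalized pure state vector."""
    return float(np.real(state.conj() @ op @ state))

up, down = np.eye(2)
singlet = (np.kron(up, down) - np.kron(down, up)) / np.sqrt(2)   # ground state of A
neel = np.kron(up, down)                                         # product state |01>

print(round(expectation(W, singlet), 10))   # about -2: entanglement detected
print(round(expectation(W, neel), 10))      # about 0: product state on the boundary
```

Here $\lambda_1 = -3 < \alpha = -1$, so the ground state lies outside $\Delta$ and $W$ is a genuine witness, in line with the condition $\lambda_1 < \alpha$ discussed above.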
As it has been said above, the main difficulty is usually hidden in the computation of a. It is both the structure of the set of states ~ and symmetry properties of A that determine how hard this task will eventually be. Heuristically, one can expect that determination of a will be less difficult in the cases when ~ is simply the set of separable states and A is sufficiently symmetric with respect to relabeling of the system components. A relatively large class of analytically tractable examples is provided by spin lattice models with regular interaction patterns, where the respective hamiltonian H plays the role of the observable A 10. The problem of finding a may be simplified considerably if instead of its exact value one uses a reasonable lower bound a ::; a in the witness construction. The resulting WA will not be optimal in the sense that in general it will detect less states than the exact witness WA: obviously we will have Tr(WAO) ~ Tr(WAO) for any 0, which is the price to pay for the simplification of determining a. We will outline one such approximation procedure below. Recall that a is given by (4). Let ILl = SUPI¢)E~ 1(7Pll¢)1 2. Since by assumption l7Pl) tf- ~, we have 0 ::; ILl < 1. An immediate lower bound to a is then
(5) Indeed, it is the least possible mean of eigenvalues (4) realized only if for some ¢ E ~ there is a coincidence of null overlaps (7Pi I ¢) = 0 for all i ~ 3. An improvement of this approximation may still be possible and it depends on the actual value of maximal squared overlap of 17P2) with states in ~. Let IL2 = SUPI¢)E~ 1(7P21 ¢) 12. If now IL2 < 1- ILl' the improved lower bound to a is given by
\[ \tilde\alpha = \mu_1 \lambda_1 + \mu_2 \lambda_2 + (1 - \mu_1 - \mu_2)\,\lambda_3. \qquad (6) \]
One can continue this procedure, finding subsequently $\mu_3, \ldots$ etc., as long as the obtained values satisfy $\mu_k < 1 - \mu_1 - \ldots - \mu_{k-1}$ at each step. In [1] we have applied the exact witness construction method based on (3) to a 1-dimensional isotropic Heisenberg ring of $n$ spins immersed in a uniform magnetic field of controllable strength $B$. $\Delta$ was chosen to be the set of fully separable states. The resulting witness was then applied to thermal equilibrium states of the system, detecting correctly their entanglement for a range of field and temperature values. In the next section we will analyze the anisotropic case and we will show how one can find a lower bound of $\alpha$ with the procedure just described when $\Delta$ is the set of biseparable states.
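The stepwise bound described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `lower_bound_alpha` and its argument layout are hypothetical, the eigenvalues are assumed to be sorted in increasing order, and the overlaps $\mu_k$ are assumed to be already known.

```python
from typing import Sequence


def lower_bound_alpha(eigenvalues: Sequence[float], overlaps: Sequence[float]) -> float:
    """Lower bound on alpha built from the lowest eigenvalues.

    eigenvalues: lambda_1 <= lambda_2 <= ... (one more entry than overlaps used)
    overlaps:    mu_1, mu_2, ... maximal squared overlaps of |psi_k> with Delta
    """
    bound = 0.0
    remaining = 1.0  # weight not yet assigned to any eigenvalue
    used = 0
    for k, mu in enumerate(overlaps):
        # A further step is admissible only while mu_k < 1 - mu_1 - ... - mu_{k-1}.
        if k > 0 and not mu < remaining:
            break
        bound += mu * eigenvalues[k]
        remaining -= mu
        used += 1
    # All leftover weight is placed on the next eigenvalue.
    return bound + remaining * eigenvalues[used]
```

With a single overlap the function reduces to the first-step bound; passing further overlaps applies the improvement only while the admissibility condition holds.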
4. Example: Anisotropic Heisenberg spin chains
In the present section we will work with the 1-dimensional Heisenberg XXZ model with hamiltonian given by
\[ H = \sum_{k=1}^{N} \left[ \sigma^x_k \sigma^x_{k+1} + \sigma^y_k \sigma^y_{k+1} + (1 + D)\,\sigma^z_k \sigma^z_{k+1} + B\,\sigma^z_k \right]. \qquad (7) \]
Here $D$ and $B$ denote, respectively, the nonisotropy parameter and the strength of the external magnetic field. The coupling constant $J$ is set equal to 1 for simplicity (antiferromagnetic case). The Hilbert space of the model is $\mathcal{H} = (\mathbb{C}^2)^{\otimes N}$ and we have used the abbreviated notation $\sigma^x_k = \mathbf{1} \otimes \cdots \otimes \sigma^x \otimes \cdots \otimes \mathbf{1}$, where the spin operator $\sigma^x$ appears at the $k$-th slot, and similarly for $\sigma^y_k$ and $\sigma^z_k$. Moreover, the index value $k = N + 1$ is identified with 1, closing the chain in a periodic manner. Suppose first that we want to use $H$ for the construction of an ordinary entanglement witness, that is, we take $\Delta$ to be the set of fully separable $N$-qubit states $|\phi\rangle = |\phi_1\rangle \otimes \cdots \otimes |\phi_N\rangle$. The $k$-th term of $\langle\phi|H|\phi\rangle$ evaluates to
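\[ \langle\phi_k|\sigma^x|\phi_k\rangle \langle\phi_{k+1}|\sigma^x|\phi_{k+1}\rangle + \langle\phi_k|\sigma^y|\phi_k\rangle \langle\phi_{k+1}|\sigma^y|\phi_{k+1}\rangle + (1 + D)\,\langle\phi_k|\sigma^z|\phi_k\rangle \langle\phi_{k+1}|\sigma^z|\phi_{k+1}\rangle + B\,\langle\phi_k|\sigma^z|\phi_k\rangle. \qquad (8) \]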
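Since $|\phi_k\rangle$ vary independently on their respective Bloch spheres, recall [1,10] that we can rewrite the above using independent real variables $x_k$, $y_k$ and $z_k$ in place of the expectations of $\sigma^x$, $\sigma^y$ and $\sigma^z$,
\[ x_k x_{k+1} + y_k y_{k+1} + (1 + D)\,z_k z_{k+1} + B z_k, \qquad (9) \]
with additional constraints
\[ x_k^2 + y_k^2 + z_k^2 = 1, \qquad k = 1, \ldots, N. \qquad (10) \]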
Minimizing $\langle\phi|H|\phi\rangle$ over $\Delta$ amounts to finding the conditional minimum of the polynomial in $3N$ real variables composed of terms (9) subject to the constraints (10). As it was argued in [1, 10], such a minimization can be performed termwise with a slight modification of the "B" part in (9),
\[ h = x_k x_{k+1} + y_k y_{k+1} + (1 + D)\,z_k z_{k+1} + B\,\frac{z_k + z_{k+1}}{2}, \]
using the Lagrange multipliers method. It yields
\[ \min h = \begin{cases} -1 - D & \text{for } B^2 \le 4D(D + 2),\ D \ge 0, \\[4pt] -1 - \dfrac{B^2}{4(2 + D)} & \text{in the intermediate region}, \\[4pt] 1 - |B| + D & \text{for } D < \dfrac{|B|}{2} - 2, \end{cases} \qquad (11) \]
and the corresponding "phase diagram" in the $(B, D)$-plane is shown in Fig. 5. This way, the respective entanglement witness has the form$^{e}$
\[ W = H - (N \min h)\,\mathbf{1}. \qquad (12) \]
Figure 5. The sectors in the $(B, D)$-plane corresponding to distinct $\min h$ values.
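A brief worked step may help to see where the middle branch of (11) comes from. The following is only a sketch under an assumption that is not spelled out in the text, namely that the minimizing configuration has equal $z$-components, $z_k = z_{k+1} = z$, and antiparallel transverse components, so that $x_k x_{k+1} + y_k y_{k+1} = -(1 - z^2)$:

```latex
\begin{align*}
h(z) &= -(1 - z^2) + (1 + D)\,z^2 + B z = -1 + (2 + D)\,z^2 + B z, \\
h'(z^*) &= 0 \;\Longrightarrow\; z^* = -\frac{B}{2(2 + D)}, \\
h(z^*) &= -1 - \frac{B^2}{4(2 + D)}, \qquad
\text{admissible if } |z^*| \le 1,\ \text{i.e. } D \ge \frac{|B|}{2} - 2 \quad (D > -2).
\end{align*}
```

The admissibility boundary $D = |B|/2 - 2$ is the same line that delimits the third branch of (11).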
To complete the analysis one has to look closely at the structure of the spectrum of $H$ to find regions in the parameter plane where the ground state is entangled: only then is $W$ an entanglement witness. We will perform such an analysis for the case of $N = 4$ (for other values of $N$ it is formally similar). The spectrum of the 4-spin model (7), i.e. its lowest part, is the following:
\[ \lambda_1 = -2 - 2D - 2D', \qquad \lambda_2 = -4 - 2B, \qquad \lambda_3 = 4 - 4B + 4D, \]
$^{e}$ $N \min h$ is the exact minimum of $\langle\phi|H|\phi\rangle$ if $N$ is even, otherwise it may be slightly smaller than the minimum [1].
Figure 6. The sectors in the $(B, D)$-plane where the respective $\lambda_i$ are the lowest energy levels, marked with different shades of gray. The dashed line is the boundary between the regions I and II of Fig. 5 and it corresponds to the switch of the $\min h$ value, (11).
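where $D' = \sqrt{D^2 + 2D + 9}$ and the corresponding eigenvectors, up to normalization, are
\[ |\psi_1\rangle = |0011\rangle + \frac{4}{1 + D - D'}\,|0101\rangle + |0110\rangle + |1001\rangle - \frac{1 + D + D'}{2}\,|1010\rangle + |1100\rangle \]
and
\[ |\psi_2\rangle = |1110\rangle - |1101\rangle + |1011\rangle - |0111\rangle. \]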
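The listed eigenvalues can be cross-checked numerically. The sketch below builds the periodic 4-spin hamiltonian (7) from Kronecker products with NumPy; the helper names `site_op` and `xxz_hamiltonian`, the 0-based site index and the convention $\sigma^z|0\rangle = +|0\rangle$ are assumptions made for this illustration only.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def site_op(op, k, n):
    """Operator `op` acting on site k (0-based) of an n-qubit chain."""
    out = np.eye(1, dtype=complex)
    for j in range(n):
        out = np.kron(out, op if j == k else I2)
    return out


def xxz_hamiltonian(n, D, B):
    """Periodic XXZ chain: sum of xx + yy + (1 + D) zz couplings plus B * z."""
    dim = 2 ** n
    H = np.zeros((dim, dim), dtype=complex)
    for k in range(n):
        nxt = (k + 1) % n  # site n is identified with site 0 (closed ring)
        H += site_op(SX, k, n) @ site_op(SX, nxt, n)
        H += site_op(SY, k, n) @ site_op(SY, nxt, n)
        H += (1 + D) * site_op(SZ, k, n) @ site_op(SZ, nxt, n)
        H += B * site_op(SZ, k, n)
    return H
```

Calling `np.linalg.eigvalsh(xxz_hamiltonian(4, D, B))` for chosen values of `D` and `B` returns the full spectrum, in which the closed-form levels quoted in the text can be located.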
The states $|\psi_1\rangle$ and $|\psi_2\rangle$ are entangled. We have restricted here the analysis to the $B \ge 0$ halfplane; the image for negative $B$ is fully analogous, the respective eigenvectors are obtained from the current ones by spin flipping. The corresponding level-crossing diagram, showing regions in the parameter plane where different $\lambda_i$ become the lowest energy levels, is presented in Fig. 6. Thus $|\psi_3\rangle$ is the ground state in the region below the $\lambda_2 = \lambda_3$ line, which is $D = \frac{B}{2} - 2$; it coincides with sector III of Fig. 5. As $|\psi_3\rangle$ is separable, no entanglement detection by means of (12) is possible in this domain. Above the $D = \frac{B}{2} - 2$ line the ground state changes to $|\psi_2\rangle$ and, for still larger values of $D$, to $|\psi_1\rangle$. Both these domains have nonempty intersections with sectors I and II of Fig. 5 (cf. the dashed parabola in Fig. 6 corresponding to the I-II border). This data allows one to complete the detailed construction of the entanglement witness based on $H$ as prescribed in (12). We have conducted the
construction analytically with the use of Mathematica and we have applied the witness $W$ to a family of equilibrium states of (7),
\[ \varrho(\beta, B, D) = \frac{\exp(-\beta H)}{Z}, \qquad \beta = \frac{1}{kT}, \qquad Z = \mathrm{Tr}\,\exp(-\beta H), \qquad (13) \]
for a range of positive temperatures. For $T$ close to 0 the mixture $\varrho$ is dominated by the terms corresponding to the lowest energies, say $\lambda_1 < \lambda_2 < \ldots$,
\[ \varrho = \frac{1}{Z} \left( e^{-\beta\lambda_1} |\psi_1\rangle\langle\psi_1| + e^{-\beta\lambda_2} |\psi_2\rangle\langle\psi_2| + \ldots \right), \qquad (14) \]
and so $\varrho$ inherits mostly the entanglement of $|\psi_1\rangle$. Therefore, it is detected by $W$ which, by construction, is most sensitive precisely to the entanglement of $|\psi_1\rangle$ type. However, one type of entanglement often present in thermal equilibrium states cannot be detected by witnesses of this kind. The separability of the ground state $|\psi_1\rangle$ of $H$, while making our witness construction faulty, does not exclude that the state (13) can nevertheless be entangled for a range of positive temperatures. This phenomenon, called thermally induced entanglement, happens when the first excited eigenstate $|\psi_2\rangle$ is entangled and its admixture shows up in $\varrho$ via the second term in (14) for certain $T > T_{\min}$. In Fig. 7 we collect the results of our tests of $W_H$. The plots (left column) show the value
\[ \max\{ -\mathrm{Tr}(W_H \varrho),\ 0 \} \]
for a range of $B$ and $T$ parameters. Separate plots correspond to different values of $D$, see also Fig. 8 for the location of these values on the eigenvalue level crossing diagram. We shall describe now how $H$ can be used in the construction of a genuine entanglement witness. In this case, $\Delta$ is the set of biseparable states. The argument that has previously led us to the simplification (9)-(10) of the extremalization problem for $H$ is no longer valid, for the local components
Figure 7. Left column: positive values of $-\mathrm{Tr}(W_H \varrho)$ plotted for a range of $B$ and $\beta = \frac{1}{kT}$ for several fixed $D$ values (see also Fig. 8). Right column: similarly, for the genuine entanglement witness $\tilde W_H$.
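The quantity plotted in the left column of Fig. 7 can be sketched numerically as follows. This is a minimal illustration, not the Mathematica computation used for the figure: the helper names (`embed`, `xxz_ring`, `witness_signal`) are hypothetical, the separable bound `alpha` (that is, $N \min h$) is passed in by the caller, and the thermal average is evaluated in the eigenbasis of $H$.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)


def embed(ops, n):
    """Tensor product over n sites; `ops` maps a site index to a 2x2 matrix."""
    out = np.eye(1, dtype=complex)
    for j in range(n):
        out = np.kron(out, ops.get(j, np.eye(2, dtype=complex)))
    return out


def xxz_ring(n, D, B):
    """Periodic XXZ hamiltonian of the form (7) for n >= 3 sites."""
    H = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for k in range(n):
        nxt = (k + 1) % n
        for op, coupling in ((SX, 1.0), (SY, 1.0), (SZ, 1.0 + D)):
            H += coupling * embed({k: op, nxt: op}, n)
        H += B * embed({k: SZ}, n)
    return H


def witness_signal(H, alpha, beta):
    """max{-Tr(W rho), 0} for W = H - alpha * 1 and rho = exp(-beta H) / Z."""
    evals = np.linalg.eigvalsh(H)
    weights = np.exp(-beta * (evals - evals.min()))  # shifted for numerical stability
    weights /= weights.sum()
    tr_w_rho = float(weights @ evals) - alpha
    return max(-tr_w_rho, 0.0)
```

For instance, at $D = 0$, $B = 0$ and $N = 4$ the first branch of (11) gives $\min h = -1$, so one may call `witness_signal(xxz_ring(4, 0.0, 0.0), -4.0, beta)` and scan over `beta`.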
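Figure 8. The refinement of the level crossing diagram of Fig. 6 for the 3 lowest energies of $H$ (cf. (15) and the surrounding text). Horizontal dashed lines correspond to the values of $D$ used in the plots of Fig. 7.
Specifically, we have the following ordering of eigenvalues in the sectors (a)-(e):
\[ \begin{array}{ll} \text{(a)} & \lambda_2 < \lambda_3 < \lambda_1, \\ \text{(b)} & \lambda_2 < \lambda_1 < \lambda_3, \\ \text{(c)} & \lambda_2 < \lambda_1 < \lambda_4, \\ \text{(d)} & \lambda_1 < \lambda_2 < \lambda_4, \\ \text{(d')} & \lambda_1 < \lambda_2 < \lambda_3, \\ \text{(e)} & \lambda_1 < \lambda_4 < \lambda_2, \end{array} \qquad (15) \]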
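where the eigenvalue $\lambda_4 = -4 - 4D$ corresponds to the GHZ-type state $|\psi_4\rangle = |1010\rangle - |0101\rangle$. In order to find $\tilde\alpha$ as in (5), we need to estimate the maximal overlap of the ground state $|\psi_2\rangle$, and respectively $|\psi_1\rangle$ in sectors (d)-(e), with biseparable vectors. This is done by considering all bipartite splits of the entire system: 1-234, 2-134, ..., 12-34, 13-24, ... etc. and, in each case, writing down the Schmidt form of the state with respect to this split. Then the square of the largest among all Schmidt coefficients obtained in this process is the desired maximal squared overlap. For example, the normalized state $|\psi_2\rangle$ can be written as
\[ |\psi_2\rangle = \sum_{i \in \{0,1\}} \ \sum_{j \in \{0,1\}^3} a_{ij}\, |i\rangle \otimes |j\rangle \]
with respect to the 1-234 split. On the other hand, taking the splitting 12-34, we have
\[ |\psi_2\rangle = \sum_{i, j \in \{0,1\}^2} b_{ij}\, |i\rangle \otimes |j\rangle. \]
Now, using the matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ as above,
\[ A = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 & 0 & -\frac12 \\ 0 & 0 & 0 & \frac12 & 0 & -\frac12 & \frac12 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & -\frac12 \\ 0 & 0 & 0 & \frac12 \\ 0 & -\frac12 & \frac12 & 0 \end{bmatrix}, \]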
the respective squared Schmidt coefficients are the nonzero eigenvalues of $A^*A$ and $B^*B$, i.e. $\{\frac14, \frac34\}$ and $\{\frac12, \frac12\}$. It is easy to see that due to the symmetry of $|\psi_2\rangle$ other bipartite splittings of the system space yield precisely the same Schmidt coefficients. Hence, we can take as the first approximation $\tilde\alpha = \frac34\lambda_2 + \frac14\lambda_3$ in the region (a) and $\tilde\alpha = \frac34\lambda_2 + \frac14\lambda_1$ in (b)-(c). Let us analyze whether this estimate can be improved by (6). Since $|\psi_3\rangle$ is separable, its maximal overlap with biseparable states in $\Delta$ is 1, so $\tilde\alpha$ cannot be improved any further in the region (a). Similarly, the maximal squared Schmidt coefficient over all biseparable expansions of $|\psi_1\rangle$ is
\[ \gamma = \frac12 + \frac{D + D'}{D'\,(1 + D + D')}, \]
where $D' = \sqrt{D^2 + 2D + 9}$. Since $\gamma > \frac14$ for $D \ge -2$, also in the region (b)-(c) no improvement of $\tilde\alpha$ is possible. In (d) and (d') our approximation reads $\tilde\alpha = \gamma\lambda_1 + (1 - \gamma)\lambda_2$ and in (e) $\tilde\alpha = \gamma\lambda_1 + (1 - \gamma)\lambda_4$. Again the fact that $\gamma > \frac14$ for $D \ge -2$ excludes any improvement of $\tilde\alpha$ in (d) or (d'). Finally, as the maximal squared overlap of $|\psi_4\rangle$ with states in $\Delta$ is $\frac12$ and $\gamma \ge \frac12$ in (e), no better $\tilde\alpha$ can be found in this case either. Fig. 7 summarizes our computational results. Its left column, as mentioned earlier, contains plots of $-\mathrm{Tr}(W_H \varrho)$ for the entanglement witness $W_H$, and the right column contains plots for the approximate genuine entanglement witness $\tilde W_H$ just constructed. The exemplary $D$ values used are 1.0, 0.25, 0.0, -0.5. The reader may relate these plots with the location of $D$ values on the level crossing diagram of Fig. 8: edges in the plots correspond to transitions between different sectors (a)-(e) with the change of $B$ value.
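The search over bipartite splits described in this section can be sketched with a singular value decomposition. The code below is an illustration under stated assumptions: the function names are hypothetical, qubits are numbered from 0, and the basis index of $|b_1 b_2 b_3 b_4\rangle$ is taken to be the binary number $b_1 b_2 b_3 b_4$.

```python
import itertools

import numpy as np


def squared_schmidt(state, subset, n):
    """Squared Schmidt coefficients of an n-qubit pure state for the split subset | rest."""
    rest = [q for q in range(n) if q not in subset]
    tensor = state.reshape([2] * n)
    # Bring the qubits of `subset` to the front and flatten into a matrix.
    matrix = np.transpose(tensor, list(subset) + rest).reshape(2 ** len(subset), -1)
    return np.linalg.svd(matrix, compute_uv=False) ** 2


def max_biseparable_overlap(state, n):
    """Largest squared Schmidt coefficient over all bipartite splits."""
    state = np.asarray(state, dtype=complex)
    state = state / np.linalg.norm(state)
    best = 0.0
    for size in range(1, n // 2 + 1):
        for subset in itertools.combinations(range(n), size):
            best = max(best, float(squared_schmidt(state, subset, n).max()))
    return best


# |psi_2> = |1110> - |1101> + |1011> - |0111>, up to normalization.
psi2 = np.zeros(16)
psi2[[0b1110, 0b1011]] = 1.0
psi2[[0b1101, 0b0111]] = -1.0
```

Calling `max_biseparable_overlap(psi2, 4)` scans the splits 1-234, ..., 12-34, 13-24, 14-23 in turn, so no symmetry argument is needed to cover the remaining splittings.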
Acknowledgment The author is greatly indebted to Professor Masanori Ohya and the QBIC Centre for support and hospitality during his visit at Tokyo University of Science in March 2010.
References
1. Michalski M., [in:] Quantum Bio-Informatics III, L. Accardi, W. Freudenberg, M. Ohya, eds., Quantum Probability and White Noise Analysis XXVI, p. 217, World Scientific, 2009.
2. Bennett Ch., et al., Phys. Rev. Lett. 82, 5385 (1999).
3. Collins D., et al., Phys. Rev. Lett. 88, 170405 (2002).
4. Dür W., J. I. Cirac, R. Tarrach, Phys. Rev. Lett. 83, 3562 (1999).
5. Acín A., et al., Phys. Rev. Lett. 87, 040401-1 (2001).
6. Bourennane M., et al., Phys. Rev. Lett. 92, 087902-1 (2004).
7. Horodecki M., P. Horodecki, R. Horodecki, Phys. Lett. A 223, 1 (1996).
8. Lewenstein M., B. Kraus, J. I. Cirac, P. Horodecki, Phys. Rev. A 62, 052310 (2000).
9. Terhal B., Phys. Lett. A 271, 319 (2000).
10. Tóth G., Phys. Rev. A 71, 010301(R) (2005).
11. Horodecki M., P. Horodecki, R. Horodecki, Phys. Lett. A 283, 1 (2001).
12. Eisert J., F. G. S. L. Brandao, K. M. Audenaert, New J. Phys. 9, 46 (2007).
13. Eisert J., D. Gross, Multi-partite entanglement, [in:] Lectures on quantum information, D. Bruss and G. Leuchs, eds., Wiley-VCH, Weinheim, 2006.
14. Horodecki R., P. Horodecki, M. Horodecki, K. Horodecki, Rev. Mod. Phys. 81, 865 (2009).
15. Acín A., et al., Phys. Rev. Lett. 85, 1560 (2000); quant-ph/0003050.
16. Acín A., A. Andrianov, E. Jane, R. Tarrach, J. Phys. A: Math. Gen. 34, 6725 (2001); quant-ph/0009107.
17. Michalski M., [in:] Quantum Bio-Informatics III, L. Accardi, W. Freudenberg, M. Ohya, eds., Quantum Probability and White Noise Analysis XXVI, p. 231, World Scientific, 2009.
18. Jamiolkowski A., Rep. Math. Phys. 3, 275 (1972).
19. Carteret H. A., A. Higuchi, A. Sudbery, J. Math. Phys. 41, 7932 (2000).
20. Verstraete F., J. Dehaene, B. De Moor, Phys. Rev. A 68, 012103 (2003).
21. Verstraete F., J. Dehaene, B. De Moor, H. Verschelde, Phys. Rev. A 65, 052112 (2002).
22. Dür W., J. I. Cirac, Phys. Rev. A 61, 042314 (2000).
23. Zhao M.-J., Z.-X. Wang, Rep. Math. Phys. 63, 409 (2009).
24. Brukner C., V. Vedral, arXiv:quant-ph/0406040v1 (2004).
25. Vedral V., Open Sys. Information Dyn. 16, 287 (2009).
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 255-265)
ON KADISON-SCHWARZ PROPERTY OF QUANTUM QUADRATIC OPERATORS ON $M_2(\mathbb{C})$
FARRUKH MUKHAMEDOV*
ABDUAZIZ ABDUGANIEV+
Department of Computational & Theoretical Sciences, Faculty of Science, International Islamic University Malaysia, P.O. Box 141, 25710, Kuantan, Pahang, Malaysia
* E-mail: [email protected]; [email protected]
+ E-mail: [email protected]
In the present paper we first describe quantum quadratic operators (q.q.o.) acting on the algebra of $2 \times 2$ matrices $M_2(\mathbb{C})$. Moreover, we provide necessary conditions for a q.q.o. with a Haar state to satisfy the Kadison-Schwarz condition. By means of such a description we give an example of a q.q.o. which is not a Kadison-Schwarz operator.
Keywords: quantum quadratic operators; Kadison-Schwarz operator.
1. Introduction
It is known that one of the main problems of quantum information is the characterization of positive and completely positive maps on C*-algebras. There are many papers devoted to this problem (see for example [4,10,18,19]). In the literature the most tractable maps, the completely positive ones, have proved to be of great importance in the structure theory of C*-algebras. However, general positive (order-preserving) linear maps are very intractable [10,12]. It is therefore of interest to study conditions stronger than positivity, but weaker than complete positivity. Such a condition is called the Kadison-Schwarz property, i.e. a map $\phi$ satisfies the Kadison-Schwarz property if $\phi(a)^*\phi(a) \le \phi(a^*a)$ holds for every $a$. Note that every unital completely positive map satisfies this inequality, and a famous result of Kadison states that any positive unital map satisfies the inequality for self-adjoint elements $a$. In [17] relations between $n$-positivity of a map $\phi$ and the Kadison-Schwarz property of a certain map are established. Some nice properties of the Kadison-Schwarz maps were investigated in [16]. The present paper is devoted to the Kadison-Schwarz property of a certain class of operators on $M_2(\mathbb{C})$.
It is known that the theory of Markov processes is a rapidly developing field with numerous applications to many branches of mathematics and physics. However, there are physical systems that cannot be described by Markov processes. One such system is given by quadratic stochastic operators (see [2]), which relate to population genetics. The problem of studying the behavior of trajectories of quadratic stochastic operators was stated in [20]. The limit behavior and ergodic properties of trajectories of quadratic stochastic operators were studied in [7,8,9]. However, operators of this kind do not cover the case of quantum systems. Therefore, in [5,6] quantum quadratic operators acting on a von Neumann algebra were defined and studied. Certain ergodic properties of such operators were studied in [13,14]. In the present paper we are going to study quantum quadratic operators (q.q.o.) with the Kadison-Schwarz property. Note that operators with this property need not be completely positive. The aim of the paper is to find some necessary conditions for trace-preserving quadratic operators to be Kadison-Schwarz ones. Trace-preserving maps arise naturally in quantum information theory (see e.g. [15]) and in other situations in which one wishes to restrict attention to a quantum system that should properly be considered a subsystem of a larger system with which it interacts. Therefore, in Section 3 we describe q.q.o. with a Haar state (invariant with respect to the trace); namely, certain characterizations of q.q.o. which are Kadison-Schwarz operators are given. By means of such a description, in Section 4 we shall provide an example of a q.q.o. which is not a Kadison-Schwarz operator. It is worth mentioning that characterizations of positive and completely positive maps defined on $M_2(\mathbb{C})$ were considered in [10,11,19] and [18], respectively.
2. Preliminaries
In what follows, by $M_2(\mathbb{C})$ we denote the algebra of $2 \times 2$ matrices over the complex field $\mathbb{C}$. By $M_2(\mathbb{C}) \otimes M_2(\mathbb{C})$ we mean the tensor product of $M_2(\mathbb{C})$ with itself. We note that such a product can be considered as the algebra of $4 \times 4$ matrices $M_4(\mathbb{C})$ over $\mathbb{C}$. In the sequel $\mathbf{1}$ means the identity matrix, i.e. $\mathbf{1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. By $S(M_2(\mathbb{C}))$ we denote the set of all states (i.e. linear positive functionals which take value 1 at $\mathbf{1}$) defined on $M_2(\mathbb{C})$.
Definition 2.1. A linear operator $\Delta : M_2(\mathbb{C}) \to M_2(\mathbb{C}) \otimes M_2(\mathbb{C})$ is said to be
(a) a quantum quadratic operator (q.q.o.) if it satisfies the following conditions: (i) it is unital, i.e. $\Delta\mathbf{1} = \mathbf{1} \otimes \mathbf{1}$; (ii) $\Delta$ is positive, i.e. $\Delta x \ge 0$ whenever $x \ge 0$;
(b) a Kadison-Schwarz operator (KS) if it satisfies
\[ \Delta(x^*x) \ge \Delta(x^*)\Delta(x) \quad \text{for all } x \in M_2(\mathbb{C}). \qquad (1) \]
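Note that if $\Delta$ is a unital KS operator, then it is a q.q.o. A state $h \in S(M_2(\mathbb{C}))$ is called a Haar state for a q.q.o. $\Delta$ if for every $x \in M_2(\mathbb{C})$ one has
\[ (h \otimes \mathrm{id}) \circ \Delta(x) = (\mathrm{id} \otimes h) \circ \Delta(x) = h(x)\mathbf{1}. \qquad (2) \]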
Remark 2.1. Let $U : M_2(\mathbb{C}) \otimes M_2(\mathbb{C}) \to M_2(\mathbb{C}) \otimes M_2(\mathbb{C})$ be a linear operator such that $U(x \otimes y) = y \otimes x$ for all $x, y \in M_2(\mathbb{C})$. If a q.q.o. $\Delta$ satisfies $U\Delta = \Delta$, then $\Delta$ is called a quantum quadratic stochastic operator. Operators of this kind were studied in [13,14].
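3. Quantum quadratic operators with Kadison-Schwarz property on $M_2(\mathbb{C})$
In this section we are going to describe quantum quadratic operators on $M_2(\mathbb{C})$ as well as find necessary conditions for such operators to satisfy the Kadison-Schwarz property. Recall [3] that the identity and Pauli matrices $\{\mathbf{1}, \sigma_1, \sigma_2, \sigma_3\}$ form a basis for $M_2(\mathbb{C})$, where
\[ \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \]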
In this basis every matrix $x \in M_2(\mathbb{C})$ can be written as $x = w_0\mathbf{1} + w\sigma$ with $w_0 \in \mathbb{C}$, $w = (w_1, w_2, w_3) \in \mathbb{C}^3$; here $w\sigma = w_1\sigma_1 + w_2\sigma_2 + w_3\sigma_3$.
Lemma 3.1 [18]. The following assertions hold true:
(a) $x$ is self-adjoint iff $w_0, w$ are real;
(b) $\mathrm{Tr}(x) = 1$ iff $w_0 = 0.5$, here $\mathrm{Tr}$ is the trace of a matrix $x$;
(c) $x \ge 0$ iff $\|w\| \le w_0$, where $\|w\| = \sqrt{|w_1|^2 + |w_2|^2 + |w_3|^2}$.
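The coordinates $(w_0, w)$ and the positivity criterion of Lemma 3.1 are easy to evaluate numerically. The following sketch is an illustration only; the function names `pauli_coordinates` and `is_positive` and the tolerance parameter are hypothetical, and positivity is tested for self-adjoint matrices, i.e. real coordinates.

```python
import numpy as np

PAULI = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def pauli_coordinates(x):
    """Return (w0, w) such that x = w0 * 1 + w1 * s1 + w2 * s2 + w3 * s3."""
    w0 = np.trace(x) / 2
    w = np.array([np.trace(s @ x) / 2 for s in PAULI])
    return w0, w


def is_positive(x, tol=1e-12):
    """Lemma 3.1 (a), (c): real coordinates and ||w|| <= w0."""
    w0, w = pauli_coordinates(x)
    real = abs(w0.imag) < tol and bool(np.all(np.abs(w.imag) < tol))
    return bool(real and np.linalg.norm(w.real) <= w0.real + tol)
```

The result of `is_positive` can be compared with the sign of the smallest eigenvalue returned by `np.linalg.eigvalsh` as an independent check.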
Note that any state $\varphi \in S(M_2(\mathbb{C}))$ can be represented by
\[ \varphi(w_0\mathbf{1} + w\sigma) = w_0 + \langle w, f \rangle, \qquad (3) \]
where $f = (f_1, f_2, f_3) \in \mathbb{R}^3$ with $\|f\| \le 1$. Here as before $\langle \cdot, \cdot \rangle$ stands for the scalar product in $\mathbb{C}^3$. Therefore, in the sequel we will identify a state $\varphi$ with a vector $f \in \mathbb{R}^3$. In what follows by $\tau$ we denote the normalized trace, i.e. $\tau(x) = \frac12 \mathrm{Tr}(x)$, $x \in M_2(\mathbb{C})$. Let $\Delta : M_2(\mathbb{C}) \to M_2(\mathbb{C}) \otimes M_2(\mathbb{C})$ be a q.q.o. with a Haar state $\tau$. Let us write the operator $\Delta$ in terms of a basis in $M_2(\mathbb{C}) \otimes M_2(\mathbb{C})$ formed by the Pauli matrices. Namely, $\Delta\mathbf{1} = \mathbf{1} \otimes \mathbf{1}$;
\[ \Delta(\sigma_i) = b_i(\mathbf{1} \otimes \mathbf{1}) + \sum_{j=1}^{3} b^{(1)}_{ji}(\mathbf{1} \otimes \sigma_j) + \sum_{j=1}^{3} b^{(2)}_{ji}(\sigma_j \otimes \mathbf{1}) + \sum_{m,l=1}^{3} b_{ml,i}(\sigma_m \otimes \sigma_l), \]
where $i = 1, 2, 3$. By means of the Haar equality with $\tau$ (see (2)) we find that $b_i = 0$ and $b^{(1)}_{ij} = b^{(2)}_{ij} = 0$ for every $i, j \in \{1, 2, 3\}$. Positivity of $\Delta$ implies that all numbers $\{b_{ij,k}\}$ are real. Hence, one can prove the following
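Theorem 3.1. Let $\Delta : M_2(\mathbb{C}) \to M_2(\mathbb{C}) \otimes M_2(\mathbb{C})$ be a q.q.o. with a Haar state $\tau$; then it has the following form:
\[ \Delta(x) = w_0\,\mathbf{1} \otimes \mathbf{1} + \sum_{m,l=1}^{3} \langle b_{ml}, w \rangle\, \sigma_m \otimes \sigma_l, \qquad (4) \]
where $x = w_0 + w\sigma$, $b_{ml} = (b_{ml,1}, b_{ml,2}, b_{ml,3})$.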
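Let us turn to the positivity of $\Delta$. Given a vector $f = (f_1, f_2, f_3) \in \mathbb{R}^3$ put
\[ \beta(f)_{ij} = \sum_{k=1}^{3} b_{ki,j} f_k. \qquad (5) \]
Define a matrix $\mathbb{B}(f) = (\beta(f)_{ij})_{i,j=1}^{3}$. By $\|\mathbb{B}(f)\|$ we denote the norm of the matrix $\mathbb{B}(f)$ associated with the Euclidean norm in $\mathbb{C}^3$. Put
\[ S = \{ p = (p_1, p_2, p_3) \in \mathbb{R}^3 : p_1^2 + p_2^2 + p_3^2 \le 1 \} \]
and denote
\[ |||\mathbb{B}||| = \sup_{f \in S} \|\mathbb{B}(f)\|. \]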
Now given a state cp by E
Proposition 3.1. Let $\Delta$ be a q.q.o. with a Haar state $\tau$; then $|||\mathbb{B}||| \le 1$.
Proof. From (4) we find 3
E
=
Wo
+ (f, w), f = (il, 12, h)
E S, and we have used cp(ai)
= fi
3
L(bi,j, W)fi = (lH\(f)w)j i=l Positivity of x yields that E
~*(cp®'ljJ)(ak)= Lbij,kfipj, i,j=l Thanks to Lemma 3.1 the functional the vector
satisfies Ilf~'(
~ * (cp ®
k=1,2,3. 'ljJ) is a state if and only if
Proposition 3.2. Let $\Delta : M_2(\mathbb{C}) \to M_2(\mathbb{C}) \otimes M_2(\mathbb{C})$ be a linear operator with a Haar state $\tau$. Then the bilinear form $\Delta^*(\cdot \otimes \cdot)$ is positive if and only if the following holds:
\[ \sum_{k=1}^{3} \Big| \sum_{i,j=1}^{3} b_{ij,k} f_i p_j \Big|^2 \le 1 \quad \text{for all } f, p \in S. \qquad (6) \]
From the proof of Proposition 3.1 and the last proposition we get
Corollary 3.1. Let $\mathbb{B}(f)$ be the matrix corresponding to an operator given by (4). Then $|||\mathbb{B}||| \le 1$ if and only if (6) is satisfied.
Remark 3.1. Note that characterizations of positive maps defined on $M_2(\mathbb{C})$
Next we would like to find some conditions for q.q.o. to be the KadisonSchwarz ones. Let ~ : M2 (C) -+ M2 (C) 181 M2 (C) be a linear operator with a Haar state T, then it has a form (4). Now we are going to find some conditions to the coefficients {bml,d when ~ is a Kadison-Schwarz operator. Given x = Wo + wa and state
= ((b m1 , w), (b m2 , w), (b m3 , w)), fm =
Xm
(7)
(8)
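where $m, l = 1, 2, 3$. Note that here the numbers $\alpha_{ml}$ are skew-symmetric, i.e. $\alpha_{ml} = -\alpha_{lm}$. By $\pi$ we shall denote the mapping from $\{1, 2, 3, 4\}$ to $\{1, 2, 3\}$ defined by $\pi(1) = 2$, $\pi(2) = 3$, $\pi(3) = 1$, $\pi(4) = \pi(1)$. Denote
\[ q(f, w) = \big( \langle \beta(f)_1, [w, \bar w] \rangle, \langle \beta(f)_2, [w, \bar w] \rangle, \langle \beta(f)_3, [w, \bar w] \rangle \big), \qquad (9) \]
where $\beta(f)_m = (\beta(f)_{m1}, \beta(f)_{m2}, \beta(f)_{m3})$ (see (5)).
Theorem 3.2. Let $\Delta : M_2(\mathbb{C}) \to M_2(\mathbb{C}) \otimes M_2(\mathbb{C})$ be a Kadison-Schwarz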
where iJ(f)m = (iJ(f)ml, iJ(f)m2, iJ(f)m3) (see (5)) TheoreIll 3.2. Let ~ : M 2(C) -+ M 2(C) 181 M 2(C) be a Kadison-Schwarz operator with a H aar state T, then it has the form (4) and the coefficients {bml ,d satisfy the following conditions 3
IIwl1 2 :'S i
3
L fm a 1f(m),1f(m+l) + L m= 1
m= 1
II x ml1 2
(10)
Ilq(f, w) - ii;/ml'7r(m) '7r Cm+l) - [Xm, xmlll ::; IIwl1 2 - itfka7rCk) ,7rCk+l) 3
- LII XmI1 2. m=l
(11)
for all f E S, wE 1[:3. Here as before Xm = ((b ml , w), (b m2 , w), (b m3 , w)), bml = (bml,l, bml ,2, bml,3) and q(f, w), ami and l'ml are defined in (7) ,(8) .(9) , respectively.
Remark 3.2. The provided characterization, together with [10,18], allows us to construct examples of positive or Kadison-Schwarz operators which are not completely positive (see the next section).
4. An example of q.q.o. which is not a Kadison-Schwarz one
In this section we are going to provide an example of a q.q.o. which does not satisfy the Kadison-Schwarz property. Let us consider the following family $\{b_{ij,k}\}$ given by:
$b_{11,1} = \varepsilon$; $b_{11,2} = 0$; $b_{11,3} = 0$;
$b_{12,1} = 0$; $b_{12,2} = 0$; $b_{12,3} = \varepsilon$;
$b_{13,1} = 0$; $b_{13,2} = \varepsilon$; $b_{13,3} = 0$;
$b_{22,1} = 0$; $b_{22,2} = \varepsilon$; $b_{22,3} = 0$;
$b_{23,1} = \varepsilon$; $b_{23,2} = 0$; $b_{23,3} = 0$;
$b_{33,1} = 0$; $b_{33,2} = 0$; $b_{33,3} = \varepsilon$;
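and $b_{ij,k} = b_{ji,k}$. Via (4) we define a linear operator $\Delta_\varepsilon$, for which $\tau$ is a Haar state. In the sequel we would like to find some conditions on $\varepsilon$ which ensure positivity of $\Delta_\varepsilon$. For the given $\{b_{ij,k}\}$ one can easily find the form of $\Delta_\varepsilon$ as follows:
\[ \begin{aligned} \Delta_\varepsilon(x) = w_0\,\mathbf{1} \otimes \mathbf{1} &+ \varepsilon w_1\,\sigma_1 \otimes \sigma_1 + \varepsilon w_3\,\sigma_1 \otimes \sigma_2 + \varepsilon w_2\,\sigma_1 \otimes \sigma_3 \\ &+ \varepsilon w_3\,\sigma_2 \otimes \sigma_1 + \varepsilon w_2\,\sigma_2 \otimes \sigma_2 + \varepsilon w_1\,\sigma_2 \otimes \sigma_3 \\ &+ \varepsilon w_2\,\sigma_3 \otimes \sigma_1 + \varepsilon w_1\,\sigma_3 \otimes \sigma_2 + \varepsilon w_3\,\sigma_3 \otimes \sigma_3, \end{aligned} \qquad (12) \]
where as before $x = w_0\mathbf{1} + w\sigma$.
Theorem 4.1. A linear operator $\Delta_\varepsilon$ is a q.q.o. if and only if $|\varepsilon| \le \frac13$.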
Proof. Let $x = w_0\mathbf{1} + w\sigma$ be a positive element of $M_2(\mathbb{C})$. Let us show positivity of the matrix $\Delta_\varepsilon(x)$. To do it, we rewrite (12) as $\Delta_\varepsilon(x) = w_0\mathbf{1} + \varepsilon B$, where
\[ B = \begin{pmatrix} w_3 & w_2 - iw_1 & w_2 - iw_1 & w_1 - 2iw_3 - w_2 \\ w_2 + iw_1 & -w_3 & w_1 + w_2 & -w_2 + iw_1 \\ w_2 + iw_1 & w_1 + w_2 & -w_3 & -w_2 + iw_1 \\ w_1 + 2iw_3 - w_2 & -w_2 - iw_1 & -w_2 - iw_1 & w_3 \end{pmatrix}, \]
where positivity of $x$ yields that $w_0, w_1, w_2, w_3$ are real numbers. In what follows, without loss of generality, we may assume that $w_0 = 1$, and therefore $\|w\| \le 1$. It is known that positivity of the matrix $\Delta_\varepsilon(x)$ is equivalent to positivity of the eigenvalues of $\Delta_\varepsilon(x)$. Let us first examine the eigenvalues of $B$. A simple calculation shows that the eigenvalues of $B$ are $\lambda_1(w)$ and $\lambda_2(w)$, given in (13) and (14) below, together with $\lambda_3(w) = \lambda_4(w) = -(w_1 + w_2 + w_3)$.
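Now examine the maximum of the functions $\lambda_1(w)$, $\lambda_2(w)$, $\lambda_3(w)$, $\lambda_4(w)$ on the ball $\|w\| \le 1$. One can see that
\[ |\lambda_3(w)| = |\lambda_4(w)| \le \sum_{k=1}^{3} |w_k| \le \sqrt{3 \sum_{k=1}^{3} |w_k|^2} \le \sqrt{3}. \]
Now let us rewrite $\lambda_1(w)$ and $\lambda_2(w)$ as follows:
\[ \lambda_1(w) = w_1 + w_2 + w_3 + \sqrt{2}\,\sqrt{3(w_1^2 + w_2^2 + w_3^2) - (w_1 + w_2 + w_3)^2}, \qquad (13) \]
\[ \lambda_2(w) = w_1 + w_2 + w_3 - \sqrt{2}\,\sqrt{3(w_1^2 + w_2^2 + w_3^2) - (w_1 + w_2 + w_3)^2}. \qquad (14) \]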
On the other hand, we have
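for any $h \in \mathbb{R}$ ($k = 1, 2$). Therefore, the functions $\lambda_k(w)$, $k = 1, 2$, reach their maximum on the sphere $w_1^2 + w_2^2 + w_3^2 = 1$ (i.e. $\|w\| = 1$). Hence, denoting $t = w_1 + w_2 + w_3$, from (13) and (14) we introduce the following functions
\[ g_1(t) = t + \sqrt{2}\,\sqrt{3 - t^2}, \qquad g_2(t) = t - \sqrt{2}\,\sqrt{3 - t^2}, \]
where $|t| \le \sqrt{3}$. One can find that the critical value of $g_1$ is $t = 1$ and the critical value of $g_2$ is $t = -1$. Consequently, the maxima of $|g_1|$ and $|g_2|$ on $|t| \le \sqrt{3}$ are the following:
\[ \max_{|t| \le \sqrt{3}} |g_1(t)| = 3, \qquad \max_{|t| \le \sqrt{3}} |g_2(t)| = 3. \]
Therefore, we conclude that
\[ \max_{\|w\| \le 1} |\lambda_1(w)| = 3, \qquad \max_{\|w\| \le 1} |\lambda_2(w)| = 3. \]
It is known that for the spectrum of $\mathbf{1} + \varepsilon B$ one has
\[ \mathrm{Sp}(\mathbf{1} + \varepsilon B) = 1 + \varepsilon\,\mathrm{Sp}(B). \]
Therefore,
\[ \mathrm{Sp}(\mathbf{1} + \varepsilon B) = \{ 1 + \varepsilon\lambda_k(w) : k = 1, \ldots, 4 \}. \]
So, if
\[ |\varepsilon| \le \frac{1}{\max_{\|w\| \le 1} |\lambda_k(w)|}, \qquad k = 1, \ldots, 4, \]
then we have $1 + \varepsilon\lambda_k(w) \ge 0$ for all $\|w\| \le 1$, $k = 1, \ldots, 4$. This implies that the matrix $\mathbf{1} + \varepsilon B$ is positive for all $w$ with $\|w\| \le 1$. This yields the required assertion. $\square$
Theorem 4.2. Let $\varepsilon = \frac13$; then the corresponding q.q.o. $\Delta_\varepsilon$ is not a KS-operator.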
Proof. It is enough to show that the inequality (11) fails for some $w$ with $\|w\| \le 1$ and some $f = (f_1, f_2, f_3)$. Assume that $f = (1, 0, 0)$; then a little algebra shows that (11) reduces to the following one:
\[ \sqrt{A + B + C} \le D, \qquad (15) \]
where A = IE(W2W3 - W3W2) - iE2(2w2W3 - 21wll2 - W2Wi
+ WiW2
- WiW3
+ W3WiW
B = IE(WiW2 - W2Wt} - iE2(2wiW2 - 21w312 - WiW3
+ W3Wi
- W3W2
+ W2W3W
C = IE(W3Wi - WiW3) - iE2(2w3Wi - 21w212 - W3W2
+ W2W3
- W2Wi
+ WiW2)1 2
+ IW212 + IW312) W2W3 + W2 Wi - WiW2 + WiW3
D = (1 - 3IEI2)(lWlI2
-iE2(W3W2 -
- W3Wt}
Now choose w as follows:
Wi =
1
- 9;
W3
Then calculations show that A = 9594 . 19131876' 1625 C = 3779136; Hence, we find 9594 19131876
+
19625 86093442
B
=
=
5i 27
19625 . 86093442'
D=~. 17496
+
1625 3779136
>
589 17496
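which means that (15) is not satisfied. Hence, $\Delta_\varepsilon$ is not a KS-operator at $\varepsilon = 1/3$. $\square$
Now we are going to show that condition (6), while necessary, is not sufficient for positivity of $\Delta$. Let us again consider the operator $\Delta_\varepsilon$. If $|\varepsilon| \le \frac{1}{\sqrt{3}}$ is satisfied, then (6) holds. Indeed,
\[ \begin{aligned} \sum_{k=1}^{3} \Big| \sum_{i,j=1}^{3} b_{ij,k} f_i p_j \Big|^2 &= \varepsilon^2 \big( |f_1 p_1 + f_2 p_3 + f_3 p_2|^2 + |f_1 p_3 + f_2 p_2 + f_3 p_1|^2 + |f_1 p_2 + f_2 p_1 + f_3 p_3|^2 \big) \\ &\le \varepsilon^2 \big( (f_1^2 + f_2^2 + f_3^2)(p_1^2 + p_2^2 + p_3^2) + (f_1^2 + f_2^2 + f_3^2)(p_1^2 + p_2^2 + p_3^2) + (p_1^2 + p_2^2 + p_3^2)(f_1^2 + f_2^2 + f_3^2) \big) \\ &\le \varepsilon^2 (1 + 1 + 1) = 3\varepsilon^2 \le 1. \end{aligned} \]
From this and Theorem 4.1 we conclude that if $\varepsilon \in \left( \frac13, \frac{1}{\sqrt{3}} \right]$ then the operator $\Delta_\varepsilon$ is not positive, while (6) is satisfied.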
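The positivity threshold of Theorem 4.1 can be probed numerically. The sketch below assembles $\Delta_\varepsilon(x)$ as a $4 \times 4$ matrix from the coefficient table of this section; the names `IDX`, `delta_eps` and `min_eigenvalue` are hypothetical, and `IDX[m][l]` simply records (0-based) the index $k$ for which $b_{ml,k} = \varepsilon$.

```python
import numpy as np

SIGMA = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

# IDX[m][l] = k (0-based) such that b_{ml,k} = eps; all other coefficients vanish.
IDX = [
    [0, 2, 1],
    [2, 1, 0],
    [1, 0, 2],
]


def delta_eps(w0, w, eps):
    """Matrix of Delta_eps(x) for x = w0 * 1 + w . sigma, following (12)."""
    out = w0 * np.eye(4, dtype=complex)
    for m in range(3):
        for l in range(3):
            out += eps * w[IDX[m][l]] * np.kron(SIGMA[m], SIGMA[l])
    return out


def min_eigenvalue(eps, w):
    """Smallest eigenvalue of Delta_eps(1 + w . sigma) for a real vector w."""
    return float(np.linalg.eigvalsh(delta_eps(1.0, np.asarray(w, dtype=float), eps)).min())
```

Evaluating `min_eigenvalue` at `w = (-1, 0, 0)` for values of `eps` on either side of 1/3 shows where positivity of the image is lost, while the quantity $3\varepsilon^2$ in the estimate above stays below 1 up to $|\varepsilon| = 1/\sqrt{3}$.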
Acknowledgement
The first author (F.M.) would like to thank Professor Noboru Watanabe and Professor Masanori Ohya for their kind hospitality during ICQBIC 2010. This work is partially supported by the Malaysian Ministry of Science, Technology and Innovation Grant 01-01-08-SF0079 and by IIUM Research Endowment Grant B (EDW B 0905-303).
References
1. S. N. Bernstein, Uchen. Zapiski NI Kaf. Ukr. Otd. Mat. 1924, no. 1, 83-115. (Russian)
2. L. Boltzmann, Selected works, Nauka, Moscow 1984. (Russian)
3. O. Bratteli, D. W. Robinson, Operator algebras and quantum statistical mechanics. I, Springer, New York-Heidelberg-Berlin 1979.
4. M.-D. Choi, Lin. Alg. Appl. 10 (1975), 285-290.
5. N. N. Ganikhodzhaev, F. M. Mukhamedov, Uzb. Matem. Zh. 1997, no. 3, 8-20. (Russian)
6. N. N. Ganikhodzhaev, F. M. Mukhamedov, Izv. Math. 65 (2000), 873-890.
7. H. Kesten, Adv. in Appl. Probab. 2 (1970), 1-82, 179-228.
8. Yu. I. Lyubich, Russian Math. Surveys 26 (1971).
9. Yu. I. Lyubich, Mathematical structures in population genetics, Springer, Berlin 1992.
10. W. A. Majewski, M. Marciniak, J. Phys. A: Math. Gen. 34 (2001), 5863-5874.
11. W. A. Majewski, M. Marciniak, arXiv:0705.0798.
12. W. A. Majewski, J. Phys. A: Math. Gen. 40 (2007), 11539-11545.
13. F. M. Mukhamedov, Method of Funct. Anal. and Topology 7 (2001), no. 1, 63-75.
14. F. M. Mukhamedov, Izvestiya Math. 68 (2004), 1009-1024.
15. M. A. Nielsen, I. L. Chuang, Quantum Computation and Quantum Information, Cambridge University Press, Cambridge, 2000.
16. A. G. Robertson, Math. Z. 156 (1977), 205-206.
17. A. G. Robertson, Math. Proc. Camb. Philos. Soc. 94 (1983), 291-296.
18. M. B. Ruskai, S. Szarek, E. Werner, Lin. Alg. Appl. 347 (2002), 159-187.
19. E. Stormer, Acta Math. 110 (1963), 233-278.
20. S. M. Ulam, A collection of mathematical problems, Interscience, New York-London 1960.
Quantum Bio-Informatics IV eds. L. Accardi, W . Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 267-278)
ON PHASE TRANSITIONS IN QUANTUM MARKOV CHAINS ON CAYLEY TREE
LUIGI ACCARDI, FARRUKH MUKHAMEDOV* and MANSOOR SABUROV+
Centro Interdisciplinare Vito Volterra, II Università di Roma "Tor Vergata", Via Columbia 2, 00133 Roma, Italy
E-mail: [email protected]
Department of Computational & Theoretical Sciences, Faculty of Science, International Islamic University Malaysia, P.O. Box 141, 25710, Kuantan, Pahang, Malaysia
* E-mail: [email protected]; farrukh_m@iiu.edu.my
+ E-mail: [email protected]
In the present paper we continue our investigations started in [Accardi L., Ohno H., Mukhamedov F., Quantum Markov fields on graphs, Inf. Dim. Analysis, Quantum Probab. Related Topics (accepted), arXiv:0911.1667]. In [Accardi L., Mukhamedov F., Saburov M., On Quantum Markov chains on Cayley tree and associated chains with XY-model, arXiv:1004.3623] we provided a construction of forward and backward Quantum Markov Chains (QMC) defined on the Cayley tree, and established uniqueness of the QMC associated with the XY-model on a Cayley tree of order 2. In the present paper we study the same model on a Cayley tree of order 3. Surprisingly, in this case we establish a phase transition (i.e. existence of two distinct quantum Markov chains) for the considered model.
Keywords: quantum Markov chain; Cayley tree; phase transition.
1. Introduction
Nowadays, it is known that Markov fields play an important role in classical probability, in physics, in biological and neurological models and in an increasing number of technological problems such as image recognition. Therefore, it is quite natural to forecast that the quantum analogue of these models will also play a relevant role. The quantum analogues of Markov processes were first constructed in [1], where the notion of quantum Markov chain on infinite tensor product algebras was introduced. Nowadays, quantum Markov chains have become a standard computational tool in solid state physics, and several natural applications have emerged in quantum statistical mechanics and quantum information theory. The reader is referred to [3,5,6,16,21] and the references cited therein for recent developments of the theory and the applications. First attempts to construct a quantum analogue of classical Markov fields were made in [4,6,9,17]. These papers extend to fields the notion of quantum Markov state introduced in [8] as a sub-class of the quantum Markov chains introduced in [1]. In [7] a definition of quantum Markov states and chains was proposed, which extends the one proposed in [20] and includes all the presently known examples. Note that in the mentioned papers quantum Markov fields were considered over the multidimensional integer lattice $\mathbb{Z}^d$. This lattice satisfies the so-called amenability condition. Therefore, it is natural to investigate quantum Markov fields over non-amenable lattices. One of the simplest non-amenable lattices is a Cayley tree. First attempts to investigate quantum Markov chains over such trees were made in [12]; these studies were related to the thermodynamic limit of valence-bond-solid models on a Cayley tree [14], where finitely correlated states were constructed as ground states of that model. The mentioned considerations naturally suggest the study of the following problem: the extension to fields of the notion of generalized Markov chain. In [10] we have introduced a hierarchy of notions of Markovianity for states on discrete infinite tensor products of C*-algebras and for each of these notions we constructed some explicit examples. We showed that the construction of [8] can be generalized to trees. It is worth noting that, in a different context and for quite different purposes, the special role of trees was already emphasized in [17].
Note that a noncommutative extensions of classical Markov fields, associated with Ising and Potts models on Cayley tree, were investigated in 18,19. In the classical case, Markov fields on trees are also considered in 22_26. In the present paper we continue our investigations started in 10,11. Using the construction (see 11) of forward and backward Quantum Markov Chains (QMC) defined on the Cayley tree, we investigate QMC associated with XY-model on a Cayley tree order 3. We establish a phase transition for the XY-model on the Cayley tree order 3 in a QMC scheme. Note that classical XY-model have been investigated by many authors on a Cayley tree 13.
2. Preliminaries
Recall that a Cayley tree $\Gamma^k$ of order $k \geq 1$ is an infinite tree in which every vertex has exactly $k+1$ edges. The vertices $x$ and $y$ are called nearest neighbors, denoted by $l = \langle x, y\rangle$, if there exists an edge connecting them. A collection of pairs $\langle x, x_1\rangle, \dots, \langle x_{d-1}, y\rangle$ is called a path from the point $x$ to the point $y$. The distance $d(x,y)$, $x, y \in V$, on the Cayley tree is the length of the shortest path from $x$ to $y$. If we cut away an edge $\{x,y\}$ of the tree $\Gamma^k$, then $\Gamma^k$ splits into two connected components, called semi-infinite trees with roots $x$ and $y$, which will be denoted respectively by $\Gamma^k(x)$ and $\Gamma^k(y)$. If we cut away from $\Gamma^k$ the origin $O$ together with all $k+1$ edges joining it to its nearest-neighbor vertices, we obtain $k+1$ semi-infinite trees $\Gamma^k(x)$ with $x \in S_0 = \{y \in \Gamma^k : d(O,y) = 1\}$. Hence we have
$$\Gamma^k = \bigcup_{x \in S_0} \Gamma^k(x) \cup \{O\}.$$
Therefore, in the sequel we will consider the semi-infinite Cayley tree $\Gamma^k_+ = (L, E)$ with the root $x_0$, where $L$ is the set of vertices and $E$ is the set of edges. Now we are going to introduce a coordinate structure in $\Gamma^k_+$ as follows: every vertex $x$ (except for $x_0$) of $\Gamma^k_+$ has coordinates $(i_1, \dots, i_n)$, where $i_m \in \{1, \dots, k\}$, $1 \leq m \leq n$, and for the vertex $x_0$ we put $(0)$. Namely, the symbol $(0)$ constitutes level 0, and the sites $(i_1, \dots, i_n)$ form level $n$ (i.e. $d(x_0, x) = n$) of the lattice. Let us set
$$W_n = \{x \in L : d(x, x_0) = n\}, \qquad \Lambda_n = \bigcup_{k=0}^{n} W_k, \qquad \Lambda_{[n,m]} = \bigcup_{k=n}^{m} W_k \quad (n < m),$$
$$E_n = \{\langle x, y\rangle \in E : x, y \in \Lambda_n\}, \qquad \Lambda_n^c = \bigcup_{k=n}^{\infty} W_k.$$
For $x \in \Gamma^k_+$, $x = (i_1, \dots, i_n)$, denote
$$S(x) = \{(x, i) : 1 \leq i \leq k\},$$
where $(x, i)$ means $(i_1, \dots, i_n, i)$. This set is called the set of direct successors of $x$. From these one can see that
$$\Lambda_m = \Lambda_{m-2} \cup \Big( \bigcup_{x \in W_{m-1}} \big(\{x\} \cup S(x)\big) \Big), \qquad (1)$$
$$E_m \setminus E_{m-1} = \bigcup_{x \in W_{m-1}} \bigcup_{y \in S(x)} \{\langle x, y\rangle\}. \qquad (2)$$
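The coordinate structure above is easy to experiment with. The following is a minimal sketch, not part of the original construction: it encodes vertices as tuples (the root $x_0$ as the empty tuple rather than the symbol $(0)$, which is an assumption of this sketch), builds the levels $W_n$, the volumes $\Lambda_n$ and the edge sets $E_n$, and checks relations (1) and (2) for one choice of $k$ and $m$. The function names `level`, `successors`, `ball` and `edges` are hypothetical.

```python
from itertools import product

def level(k, n):
    """W_n: vertices at distance n from the root, as coordinate tuples.
    The root x0 is the empty tuple (the text writes it as (0))."""
    return [tuple(c) for c in product(range(1, k + 1), repeat=n)]

def successors(x, k):
    """S(x) = {(x, i) : 1 <= i <= k}, the direct successors of x."""
    return [x + (i,) for i in range(1, k + 1)]

def ball(k, n):
    """Lambda_n: union of the levels W_0, ..., W_n."""
    return {x for m in range(n + 1) for x in level(k, m)}

def edges(k, n):
    """E_n: edges (x, y) with both endpoints in Lambda_n."""
    return {(x, y) for m in range(n) for x in level(k, m)
            for y in successors(x, k)}

k, m = 3, 4
# Relation (1): Lambda_m = Lambda_{m-2} united with {x} and S(x), x in W_{m-1}
rhs = set(ball(k, m - 2))
for x in level(k, m - 1):
    rhs |= {x} | set(successors(x, k))
relation_1_holds = rhs == ball(k, m)
# Relation (2): the edges new in E_m join W_{m-1} to its direct successors
new_edges = {(x, y) for x in level(k, m - 1) for y in successors(x, k)}
relation_2_holds = edges(k, m) - edges(k, m - 1) == new_edges
print(len(level(k, m)), relation_1_holds, relation_2_holds)
```

In this encoding level $n$ has $k^n$ vertices, which is the exponential growth that makes the tree non-amenable.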
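The algebra of observables $\mathcal{B}_x$ for any single site $x \in L$ will be taken as the algebra $M_d$ of the complex $d \times d$ matrices. The algebra of observables localized in the finite volume $\Lambda \subset L$ is then given by $\mathcal{B}_\Lambda = \bigotimes_{x \in \Lambda} \mathcal{B}_x$. As usual, if $\Lambda_1 \subset \Lambda_2 \subset L$, then $\mathcal{B}_{\Lambda_1}$ is identified as a subalgebra of $\mathcal{B}_{\Lambda_2}$ by tensoring with unit matrices on the sites $x \in \Lambda_2 \setminus \Lambda_1$. Note that, in the sequel, by $\mathcal{B}_{\Lambda,+}$ we denote the positive part of $\mathcal{B}_\Lambda$. The full algebra $\mathcal{B}_L$ of the tree is obtained in the usual manner by an inductive limit
$$\mathcal{B}_L = \overline{\bigcup_{\Lambda_n} \mathcal{B}_{\Lambda_n}}^{\,C^*}.$$
One can see that the shift induces a homomorphism on $\mathcal{B}_L$. In what follows, by $S(\mathcal{B}_\Lambda)$ we will denote the set of all states defined on the algebra $\mathcal{B}_\Lambda$. Consider a triplet $\mathcal{C} \subset \mathcal{B} \subset \mathcal{A}$ of unital $C^*$-algebras. Recall that a quasi-conditional expectation with respect to the given triplet is a completely positive (CP), identity-preserving linear map $\mathcal{E} : \mathcal{A} \to \mathcal{B}$ such that
$$\mathcal{E}(ca) = c\,\mathcal{E}(a), \qquad a \in \mathcal{A},\ c \in \mathcal{C}. \qquad (3)$$
Notice that, as the quasi-conditional expectation $\mathcal{E}$ is a real map (i.e. $\mathcal{E}(a^*) = \mathcal{E}(a)^*$), one has
$$\mathcal{E}(ac) = \mathcal{E}(a)\,c, \qquad a \in \mathcal{A},\ c \in \mathcal{C},$$
as well.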
Definition 2.1. Let $\varphi$ be a state on $\mathcal{B}_L$. Then $\varphi$ is called
(i) a forward quantum d-Markov chain (QMC), associated to $\{\Lambda_n\}$, on $\mathcal{B}_L$ if for each $\Lambda_n$ there exists a quasi-conditional expectation $\mathcal{E}_{\Lambda_n^c}$ with respect to the triplet
(4) and a state
such that for any $n \in \mathbb{N}$ one has
(5) and
(6) in the weak-* topology.
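(ii) a backward quantum d-Markov chain, associated to $\{\Lambda_n\}$, if there exist a quasi-conditional expectation $\mathcal{E}_{\Lambda_n}$ with respect to the triplet $\mathcal{B}_{\Lambda_{n-1}} \subset \mathcal{B}_{\Lambda_n} \subset \mathcal{B}_{\Lambda_{n+1}}$ for each $n \in \mathbb{N}$ and an initial state $\rho_0 \in S(\mathcal{B}_{\Lambda_0})$ such that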
in the weak-* topology. In this definition, forward and backward QMC
Remark 2.1. Note that in 10 the forward QMC was called a generalized quantum Markov state. We think the present name is more adequate than "generalized quantum Markov state".

3. Constructions of Quantum d-Markov chains on the Cayley tree

In this section, we provide constructions of forward and backward quantum d-Markov chains. Note that part of the construction has been outlined in 10. Assume that to each edge $\langle x, y\rangle \in E$ of the tree an operator $K_{\langle x,y\rangle} \in \mathcal{B}_{\{x,y\}}$ is assigned. We would like to define a state on $\mathcal{B}_{\Lambda_n}$ with boundary conditions $w_0 \in \mathcal{B}_{(0),+}$ and $h = \{h_x \in \mathcal{B}_{x,+}\}_{x \in L}$. To do this, let us denote
$$K_n = w_0^{1/2} K_{[0,1]} K_{[1,2]} \cdots K_{[n-1,n]} \prod_{x \in W_n} h_x^{1/2}, \qquad (7)$$
where by definition we put
$$K_{[0,1]} := \prod_{i=1}^{k} K_{\langle x_0,(x_0,i)\rangle}, \qquad K_{[m-1,m]} := \prod_{x \in W_{m-1}} \prod_{i=1}^{k} K_{\langle x,(x,i)\rangle}, \quad m \geq 1. \qquad (8)$$
Since, in general, the operators $K_{\langle x,(x,i)\rangle}$ with different $i$ do not commute (they share the vertex $x$), the product over $i$ is an ordered product, i.e. the operators are multiplied from left to right in the order given by the coordinate structure. Define
(9) It is clear that Wnj, Wnj are positive.
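In what follows, by $\mathrm{Tr}_\Lambda : \mathcal{B}_L \to \mathcal{B}_\Lambda$ we mean the normalized partial trace, for any $\Lambda \subseteq_{\mathrm{fin}} L$. For the sake of brevity we put $\mathrm{Tr}_{n]} := \mathrm{Tr}_{\Lambda_n}$. Now we are ready to define positive functionals $\varphi^{(n,b)}_{w_0,h}$ and $\varphi^{(n,f)}_{w_0,h}$ on $\mathcal{B}_{\Lambda_n}$ by
$$\varphi^{(n,b)}_{w_0,h}(a) = \mathrm{Tr}\big(W_{n+1]}(a \otimes \mathbb{1}_{W_{n+1}})\big), \qquad (10)$$
$$\varphi^{(n,f)}_{w_0,h}(a) = \mathrm{Tr}\big(W_{n+1]}(a \otimes \mathbb{1}_{W_{n+1}})\big), \qquad (11)$$
for every $a \in \mathcal{B}_{\Lambda_n}$, where $\mathbb{1}_{W_{n+1}} = \bigotimes_{y \in W_{n+1}} \mathbb{1}$. Note that here $\mathrm{Tr}$ is the normalized trace on $\mathcal{B}_L$. To get a state $\varphi^{(c)}$ on $\mathcal{B}_L$, i.e. on the infinite-volume tree, by means of $\varphi^{(n,c)}_{w_0,h}$ (here $c = b, f$) such that $\varphi^{(c)}|_{\mathcal{B}_{\Lambda_n}} = \varphi^{(n,c)}_{w_0,h}$, we need to impose some constraints on the boundary conditions $\{w_0, h\}$ so that the functionals $\{\varphi^{(n,c)}_{w_0,h}\}$ satisfy the compatibility condition, i.e.
$$\varphi^{(n+1,c)}_{w_0,h}\big|_{\mathcal{B}_{\Lambda_n}} = \varphi^{(n,c)}_{w_0,h}. \qquad (12)$$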
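Theorem 3.1 (see 10). Let the boundary conditions $w_0 \in \mathcal{B}_{(0),+}$ and $h = \{h_x \in \mathcal{B}_{x,+}\}_{x \in L}$ satisfy the following conditions:
$$\mathrm{Tr}(w_0 h_0) = 1, \qquad (13)$$
$$\mathrm{Tr}_{x}\Big( \prod_{i=1}^{k} K_{\langle x,(x,i)\rangle} \prod_{i=1}^{k} h_{(x,i)} \prod_{i=1}^{k} K^*_{\langle x,(x,k+1-i)\rangle} \Big) = h_x \quad \text{for every } x \in L. \qquad (14)$$
Then the functionals $\{\varphi^{(n,b)}_{w_0,h}\}$ satisfy the compatibility condition (12). There is a unique state $\varphi^{(b)}_{w_0,h}$ on $\mathcal{B}_L$ such that $\varphi^{(b)}_{w_0,h} = w\text{-}\lim_{n \to \infty} \varphi^{(n,b)}_{w_0,h}$. If $h_x$ is invertible for all $x \in L$, then $\varphi^{(b)}_{w_0,h}$ is a backward QMC on $\mathcal{B}_L$.

On the other hand, it is known 8 that $\{\varphi^{(n,f)}_{w_0,h}\}$ satisfy the compatibility condition if the sequence $\{W_{n]}\}$ is projective with respect to $\mathrm{Tr}_{n]}$, i.e.
$$\mathrm{Tr}_{n-1]}(W_{n]}) = W_{n-1]}, \qquad \forall n \in \mathbb{N}. \qquad (15)$$
One has the following

Theorem 3.2 (see 11). Let the boundary conditions $w_0 \in \mathcal{B}_{(0),+}$ and $h = \{h_x \in \mathcal{B}_{x,+}\}_{x \in L}$ satisfy (13) and (14). Then $\{W_{n]}\}$ is a projective sequence of density operators, and hence there is a unique forward QMC $\varphi^{(f)}_{w_0,h}$ on $\mathcal{B}_L$ such that $\varphi^{(f)}_{w_0,h} = w\text{-}\lim_{n \to \infty} \varphi^{(n,f)}_{w_0,h}$.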
Definition 3.1. We say that there exists a phase transition for a family of operators $\{K_{\langle x,y\rangle}\}$ if (13) and (14) have at least two solutions $(w_0, \{h_x\}_{x \in L})$ and $(\tilde{w}_0, \{\tilde{h}_x\}_{x \in L})$ such that the corresponding QMC $\varphi_{w_0,h}$ and $\varphi_{\tilde{w}_0,\tilde{h}}$ are disjoint. Otherwise, we say there is no phase transition.

4. Quantum d-Markov chains associated with XY-model
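In this section, we prove the existence of a phase transition of the quantum d-Markov chain associated with the XY-model on a Cayley tree of order three. Let us consider a semi-infinite Cayley tree $\Gamma^3_+ = (L, E)$ of order 3. Our starting $C^*$-algebra is the same $\mathcal{B}_L$ but with $\mathcal{B}_x = M_2(\mathbb{C})$ for $x \in L$. By $\sigma_x^{(u)}, \sigma_y^{(u)}, \sigma_z^{(u)}$ we denote the Pauli spin operators at the site $u \in L$. Here
$$\sigma_x^{(u)} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_y^{(u)} = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_z^{(u)} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \qquad (16)$$
For every edge $\langle u, v\rangle \in E$ put
$$K_{\langle u,v\rangle} = \exp\{\beta H_{\langle u,v\rangle}\}, \qquad \beta > 0, \qquad (17)$$
where
$$H_{\langle u,v\rangle} = \frac{1}{2}\big( \sigma_x^{(u)}\sigma_x^{(v)} + \sigma_y^{(u)}\sigma_y^{(v)} \big). \qquad (18)$$
Now taking into account the following equalities
$$H^{2m}_{\langle u,v\rangle} = H^{2}_{\langle u,v\rangle} = \frac{1}{2}\big( \mathbb{1} - \sigma_z^{(u)}\sigma_z^{(v)} \big), \qquad H^{2m-1}_{\langle u,v\rangle} = H_{\langle u,v\rangle}, \qquad m \in \mathbb{N},$$
one finds
$$K_{\langle u,v\rangle} = \mathbb{1} + \sinh\beta\, H_{\langle u,v\rangle} + (\cosh\beta - 1)\, H^2_{\langle u,v\rangle}.$$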
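The two operator identities and the closed form of $K_{\langle u,v\rangle}$ can be checked numerically on a single edge. The sketch below is only an illustration under stated assumptions: it uses plain Python lists for the $4 \times 4$ matrices, a truncated Taylor series for the exponential, and an arbitrary test value of $\beta$; the helper names (`kron`, `expm`, `max_diff`, and so on) are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def add(a, b, s=1.0):
    """Entrywise a + s * b."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    return [[s * x for x in row] for row in a]

def max_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def expm(a, terms=60):
    """Matrix exponential by a truncated Taylor series (fine for small norms)."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = scale(matmul(term, a), 1.0 / k)   # term = a^k / k!
        result = add(result, term)
    return result

sx = [[0, 1], [1, 0]]
sy = [[0, -1j], [1j, 0]]
sz = [[1, 0], [0, -1]]
I4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# H = (1/2)(sx (x) sx + sy (x) sy) on one edge, as in (18)
H = scale(add(kron(sx, sx), kron(sy, sy)), 0.5)
H2 = matmul(H, H)
H3 = matmul(H2, H)

beta = 0.7  # arbitrary positive test value (assumption of this sketch)
K_exact = expm(scale(H, beta))
K_closed = add(add(I4, scale(H, math.sinh(beta))), scale(H2, math.cosh(beta) - 1.0))

err_H2 = max_diff(H2, scale(add(I4, kron(sz, sz), -1.0), 0.5))  # H^2 = (1 - sz sz)/2
err_H3 = max_diff(H3, H)                                        # H^3 = H
err_K = max_diff(K_exact, K_closed)                             # closed form of K
print(err_H2, err_H3, err_K)
```

Since $H^2$ is a projection and $H^3 = H$, the exponential series splits into its even and odd parts, which is where the $\cosh\beta - 1$ and $\sinh\beta$ coefficients come from.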
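We are going to describe all solutions $h = \{h_x\}$ and $w_0$ of the equations (13), (14). Furthermore, we shall assume that $h_x = h_y$ for every $x, y \in W_n$, $n \in \mathbb{N}$. Hence, we denote $h_x^{(n)} := h_x$ if $x \in W_n$. Now from (17), (18) one can see that $K_{\langle u,v\rangle} = K^*_{\langle u,v\rangle}$; therefore, the equation (14) can be rewritten as follows:
$$\mathrm{Tr}_x\big( K_{\langle x,y\rangle} K_{\langle x,z\rangle} K_{\langle x,v\rangle}\, h_y^{(n)} h_z^{(n)} h_v^{(n)}\, K_{\langle x,v\rangle} K_{\langle x,z\rangle} K_{\langle x,y\rangle} \big) = h_x^{(n-1)}, \qquad (19)$$
for every $x \in L$. After a little algebra the equation (19) reduces to the following system
$$\begin{cases} B_2 \big(a_{11}^{(n)}\big)^3 + A_2\, a_{11}^{(n)} \big|a_{12}^{(n)}\big|^2 = a_{11}^{(n-1)}, \\ B_1 \big|a_{12}^{(n)}\big| \big(a_{11}^{(n)}\big)^2 + A_1 \big|a_{12}^{(n)}\big|^3 = \big|a_{12}^{(n-1)}\big|, \end{cases} \qquad (20)$$
where
$$A_1 = \sinh^3\beta \cosh\beta, \qquad B_1 = \sinh\beta \cosh^2\beta\,(1 + \cosh\beta + \cosh^2\beta), \qquad (21)$$
$$A_2 = \sinh^2\beta \cosh^2\beta\,(1 + 2\cosh\beta), \qquad B_2 = \cosh^6\beta, \qquad (22)$$
$$h_y^{(n)} = h_z^{(n)} = h_v^{(n)} = \begin{pmatrix} a_{11}^{(n)} & a_{12}^{(n)} \\ a_{21}^{(n)} & a_{22}^{(n)} \end{pmatrix}.$$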
Remark 4.1. Note that, according to the positivity of $h_x^{(n)}$ and $a_{11}^{(n)} = a_{22}^{(n)}$, we conclude that $a_{11}^{(n)} > |a_{12}^{(n)}|$ for all $n \in \mathbb{N}$.

Now we are going to investigate the derived system (20). To do this, let us define a mapping $f : (x, y) \in \mathbb{R}^2_+ \to (x', y') \in \mathbb{R}^2_+$ given by
$$\begin{cases} B_2 (x')^3 + A_2\, x' (y')^2 = x, \\ B_1 (x')^2 y' + A_1 (y')^3 = y. \end{cases} \qquad (23)$$
Furthermore, due to Remark 4.1, we restrict the dynamical system (23) to the following domain:
$$\Delta = \{(x, y) \in \mathbb{R}^2_+ : x > y\}.$$
Denote
$$P_9(t) = t^9 - t^8 - t^7 - t^6 + 2t^4 + 2t^3 - t - 1, \qquad (24)$$
$$E := \frac{1}{\sinh^2\beta \cosh^2\beta\,(1 + 2\cosh\beta) + D \cosh^6\beta}, \qquad (25)$$
$$D := \frac{A_2 - A_1}{B_1 - B_2}. \qquad (26)$$
Further, we will need the following auxiliary fact:
Lemma 4.1. Let $A_1, B_1, A_2, B_2, D, E$ be the numbers defined by (21), (22), (25), (26) and let $P_9(t)$ be the polynomial given by (24), where $\beta > 0$. Then the following statements hold true:

(i) The polynomial $P_9(t)$ has only three positive roots $1$, $t_*$ and $t^*$ such that $1.05 < t_* < 1.1$ and $1.5 < t^* < 1.6$. Moreover, if $t \in (1, t_*) \cup (t^*, \infty)$ then $P_9(t) > 0$, and if $t \in (t_*, t^*)$ then $P_9(t) < 0$. Denote $\beta_* = \cosh^{-1} t_*$ and $\beta^* = \cosh^{-1} t^*$;
(ii) For any $\beta \in (0, \infty)$ we have $A_1 < A_2$;
(iii) If $\beta \in (0, \beta_*] \cup [\beta^*, \infty)$ then $B_1 \leq B_2$, and if $\beta \in (\beta_*, \beta^*)$ then $B_1 > B_2$;
(iv) For any $\beta \in (0, \infty)$ we have $A_1 + B_1 < A_2 + B_2$;
(v) If $\beta \in (\beta_*, \beta^*)$ then $D > 1$ and $E > 0$;
(vi) For any $\beta \in (0, \infty)$ we have $A_1 A_2 < B_1 B_2$ and $A_1 B_2 < A_2 B_1$;
(vii) If $\beta \in (\beta_*, \beta^*)$ then $A_2 B_1 < A_1 A_2 + 3 A_1 B_2 + B_1 B_2$ and $2 A_1 A_2 + 3 A_1 B_2 < A_2 B_1$.
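Statements (i) and (iii) of Lemma 4.1 can be illustrated numerically. The sketch below is not the proof of the lemma; it is a plausibility check under stated assumptions: $P_9$ is taken in the form $t^9 - t^8 - t^7 - t^6 + 2t^4 + 2t^3 - t - 1$, the roots are located by plain bisection inside the brackets quoted in (i), and the names `p9`, `bisect` and `coefficients` are hypothetical. With $t = \cosh\beta$ one can verify by expansion that $P_9(t) = (t-1)\big(t^8 - (t^2-1)(1+t+t^2)^2\big)$, so for $t > 1$ the sign of $P_9$ decides whether $B_1 > B_2$.

```python
import math

def p9(t):
    """P_9(t), in the form assumed for (24)."""
    return t**9 - t**8 - t**7 - t**6 + 2 * t**4 + 2 * t**3 - t - 1

def bisect(f, lo, hi, tol=1e-13):
    """Plain bisection; requires f(lo) and f(hi) to have opposite signs."""
    flo = f(lo)
    assert flo * f(hi) < 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if flo * f(mid) <= 0:
            hi = mid
        else:
            lo, flo = mid, f(mid)
    return 0.5 * (lo + hi)

def coefficients(beta):
    """A_1, B_1, A_2, B_2 as in (21), (22)."""
    s, c = math.sinh(beta), math.cosh(beta)
    a1 = s**3 * c
    b1 = s * c**2 * (1 + c + c**2)
    a2 = s**2 * c**2 * (1 + 2 * c)
    b2 = c**6
    return a1, b1, a2, b2

t_low = bisect(p9, 1.05, 1.1)    # t_* of Lemma 4.1 (i)
t_high = bisect(p9, 1.5, 1.6)    # t^* of Lemma 4.1 (i)
beta_low, beta_high = math.acosh(t_low), math.acosh(t_high)

# Between the two thresholds B_1 should exceed B_2 (Lemma 4.1 (iii)).
a1, b1, a2, b2 = coefficients(0.5 * (beta_low + beta_high))
print(t_low, t_high, beta_low, beta_high, b1 > b2)
```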
Let us first find all of the fixed and periodic points of (23).

Theorem 4.1. Let $f$ be the dynamical system given by (23). Then the following assertions hold true:

(i) If $\beta \in (0, \beta_*] \cup [\beta^*, \infty)$ then there is a unique fixed point $\big(\frac{1}{\cosh^3\beta}, 0\big)$ in the domain $\Delta$;
(ii) If $\beta \in (\beta_*, \beta^*)$ then there are two fixed points in the domain $\Delta$, which are $\big(\frac{1}{\cosh^3\beta}, 0\big)$ and $(\sqrt{DE}, \sqrt{E})$;
(iii) For any $\beta \in (0, \infty)$ the dynamical system $f$ does not have any $k$-periodic points, where $k \geq 2$.
Now let us formulate results concerning the limiting behavior of $f$.

Theorem 4.2. Let $f : \Delta \to \mathbb{R}^2_+$ be the dynamical system given by (23) and $\beta \in (0, \beta_*] \cup [\beta^*, \infty)$. Then the following assertions hold true:

(i) if $y^{(0)} > 0$ then the trajectory $\{(x^{(n)}, y^{(n)})\}_{n=0}^{\infty}$ of $f$ starting from the point $(x^{(0)}, y^{(0)})$ is finite;
(ii) if $y^{(0)} = 0$ then the trajectory $\{(x^{(n)}, y^{(n)})\}_{n=0}^{\infty}$ starting from the point $(x^{(0)}, y^{(0)})$ has the following form
$$x^{(n)} = \frac{\sqrt[3^n]{x^{(0)} \cosh^3\beta}}{\cosh^3\beta}, \qquad y^{(n)} = 0,$$
and it converges to the fixed point $\big(\frac{1}{\cosh^3\beta}, 0\big)$.

Theorem 4.3. Let $f : \Delta \to \mathbb{R}^2_+$ be the dynamical system given by (23) and $\beta \in (\beta_*, \beta^*)$. The following assertions hold true:

(i) There are two invariant lines w.r.t. $f$, defined by $l_1 = \{(x, y) \in \Delta : y = 0\}$ and $l_2 = \{(x, y) \in \Delta : y = \frac{x}{\sqrt{D}}\}$;
(ii) if an initial point $(x^{(0)}, y^{(0)})$ belongs to the invariant line $l_k$ of the dynamical system (23), then the trajectory $\{(x^{(n)}, y^{(n)})\}_{n=0}^{\infty}$ starting from the point $(x^{(0)}, y^{(0)})$ converges to the fixed point which belongs to the invariant line $l_k$, where $k = 1, 2$;
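(iii) if an initial point $(x^{(0)}, y^{(0)})$ satisfies the condition
$$\frac{y^{(0)}}{x^{(0)}} \in \Big(0, \frac{1}{\sqrt{D}}\Big),$$
then the trajectory $\{(x^{(n)}, y^{(n)})\}_{n=0}^{\infty}$ starting from the point $(x^{(0)}, y^{(0)})$ converges to the fixed point $\big(\frac{1}{\cosh^3\beta}, 0\big)$, which belongs to the invariant line $l_1$;
(iv) if an initial point $(x^{(0)}, y^{(0)})$ satisfies the condition
$$\frac{y^{(0)}}{x^{(0)}} \in \Big(\frac{1}{\sqrt{D}}, 1\Big),$$
then the trajectory $\{(x^{(n)}, y^{(n)})\}_{n=0}^{\infty}$ starting from the point $(x^{(0)}, y^{(0)})$ is finite.

Let $\beta \in (0, \beta_*] \cup [\beta^*, \infty)$. From Theorem 4.2, we infer that equation (19) has a family of parametric solutions $(w_0(\alpha), \{h_x(\alpha)\})$ given by
$$w_0(\alpha) = \begin{pmatrix} \frac{1}{\alpha} & 0 \\ 0 & \frac{1}{\alpha} \end{pmatrix}, \qquad h_x^{(n)}(\alpha) = \begin{pmatrix} \dfrac{\sqrt[3^n]{\alpha \cosh^3\beta}}{\cosh^3\beta} & 0 \\ 0 & \dfrac{\sqrt[3^n]{\alpha \cosh^3\beta}}{\cosh^3\beta} \end{pmatrix} \qquad (27)$$
for every $x \in V$, where $\alpha$ is any positive real number. The boundary conditions corresponding to the fixed point of (23) are given by (27) at the value $\alpha_0 = \frac{1}{\cosh^3\beta}$. Therefore, in what follows, we denote such operators by $w_0(\alpha_0)$ and $h_x^{(n)}(\alpha_0)$. Let us consider the states corresponding to the solutions $(w_0(\alpha), \{h_x^{(n)}(\alpha)\})$. One can show that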
Theorem 4.4. Let $\beta \in (0, \beta_*] \cup [\beta^*, \infty)$. Then there exist a unique backward and a unique forward quantum d-Markov chain for the model (17) on the Cayley tree of order 3.
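The closed-form trajectory on the line $y = 0$ stated in Theorem 4.2 (ii) can be checked by direct iteration. The sketch below assumes that on $y = 0$ the system (23) reduces to $B_2 (x')^3 = x$ with $B_2 = \cosh^6\beta$; the values of $\beta$ and $x^{(0)}$ and the function names are illustrative choices, with $\cosh\beta$ picked below the lower bound $1.05$ on $t_*$.

```python
import math

def step_on_axis(x, beta):
    """One step of (23) restricted to y = 0: solve B_2 * x'^3 = x for x'."""
    return (x / math.cosh(beta) ** 6) ** (1.0 / 3.0)

def closed_form(x0, beta, n):
    """x^(n) = (x^(0) cosh^3 beta)^(1/3^n) / cosh^3 beta, as in Theorem 4.2 (ii)."""
    c3 = math.cosh(beta) ** 3
    return (x0 * c3) ** (1.0 / 3 ** n) / c3

beta, x0 = 0.3, 5.0   # illustrative values; cosh(0.3) is about 1.045
x = x0
max_gap = 0.0
for n in range(1, 26):
    x = step_on_axis(x, beta)
    max_gap = max(max_gap, abs(x - closed_form(x0, beta, n)))

fixed_point = 1.0 / math.cosh(beta) ** 3
distance_to_fixed_point = abs(x - fixed_point)
print(max_gap, distance_to_fixed_point)
```

Writing $u_n = x^{(n)} \cosh^3\beta$, each step gives $u_{n+1} = u_n^{1/3}$, so $u_n = u_0^{1/3^n} \to 1$ for every positive starting value. This is why every $\alpha > 0$ in (27) leads to the same limiting fixed point.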
Taking into account Theorems 4.1 and 4.3 one can establish the following

Theorem 4.5. Let $\beta \in (\beta_*, \beta^*)$. Then there is a phase transition for the model (17) on the Cayley tree of order 3.
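The second fixed point behind this phase transition can be checked numerically. The following sketch rests on assumptions that should be read as such: $D = (A_2 - A_1)/(B_1 - B_2)$ and $E = 1/(A_2 + D B_2)$ are taken as the forms of (26) and (25), the test value $\cosh\beta = 1.3$ is chosen inside the bracket $(1.1, 1.5) \subset (t_*, t^*)$, and `inverse_map` (a hypothetical name) evaluates the polynomial relations (23), which send $(x', y')$ back to $(x, y)$.

```python
import math

def coefficients(beta):
    """A_1, B_1, A_2, B_2 as in (21), (22)."""
    s, c = math.sinh(beta), math.cosh(beta)
    return (s**3 * c,
            s * c**2 * (1 + c + c**2),
            s**2 * c**2 * (1 + 2 * c),
            c**6)

def inverse_map(xp, yp, beta):
    """Given (x', y'), return the (x, y) determined by (23)."""
    a1, b1, a2, b2 = coefficients(beta)
    return b2 * xp**3 + a2 * xp * yp**2, b1 * xp**2 * yp + a1 * yp**3

beta = math.acosh(1.3)             # cosh(beta) = 1.3, between t_* and t^*
a1, b1, a2, b2 = coefficients(beta)
D = (a2 - a1) / (b1 - b2)          # assumed form of (26)
E = 1.0 / (a2 + D * b2)            # assumed form of (25)

# Candidate fixed point (sqrt(DE), sqrt(E)) of Theorem 4.1 (ii)
x_fix, y_fix = math.sqrt(D * E), math.sqrt(E)
x_img, y_img = inverse_map(x_fix, y_fix, beta)
residual = max(abs(x_img - x_fix), abs(y_img - y_fix))

# A point with y' = x' / sqrt(D) is related by (23) to a point on the same line.
xp = 0.37
x_new, y_new = inverse_map(xp, xp / math.sqrt(D), beta)
line_gap = abs(y_new / x_new - 1.0 / math.sqrt(D))
print(D, E, residual, line_gap)
```

The condition $D > 1$ is what places $(\sqrt{DE}, \sqrt{E})$ inside the domain $x > y$, so for this value of $\beta$ there are two distinct fixed points and hence two candidate boundary conditions.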
We note that in 11 we have considered the model (17) on the Cayley tree of order 2 and established uniqueness of the backward and forward QMC associated with this model.

Acknowledgement

The authors would like to thank Professor Noboru Watanabe and Professor Masanori Ohya for their kind hospitality during ICQBIC 2010. The present study has been done within the Malaysian Ministry of Higher Education Grant FRGS0308-91. The authors (F.M. and M.S.) also acknowledge the Malaysian Ministry of Science, Technology and Innovation Grant 01-01-08SF0079.

References

1. Accardi L., On the noncommutative Markov property, Funct. Anal. Appl. 9 (1975) 1-8.
2. Accardi L., Cecchini's transition expectations and Markov chains, in: Quantum Probability and Applications IV, Springer LNM N. 1396 (1987) 1-6.
3. Accardi L., Fidaleo F., Entangled Markov chains, Annali di Matematica Pura ed Applicata 184 (2005) 327-346.
4. Accardi L., Fidaleo F., Quantum Markov fields, Inf. Dim. Analysis, Quantum Probab. Related Topics 6 (2003) 123-138.
5. Accardi L., Fidaleo F., Non homogeneous quantum Markov states and quantum Markov fields, J. Funct. Anal. 200 (2003) 324-347.
6. Accardi L., Fidaleo F., On the structure of quantum Markov fields, Proceedings Burg Conference 15-20 March 2001, W. Freudenberg (ed.), World Scientific, QP-PQ Series 15 (2003) 1-20.
7. Accardi L., Fidaleo F., Mukhamedov F., Markov states and chains on the CAR algebra, Inf. Dim. Analysis, Quantum Probab. Related Topics 10 (2007) 165-183.
8. Accardi L., Frigerio A., Markovian cocycles, Proc. Royal Irish Acad. 83A (1983) 251-263.
9. Accardi L., Liebscher V., Markovian KMS-states for one-dimensional spin chains, Infin. Dimens. Anal. Quantum Probab. Relat. Top. 2 (1999) 645-661.
10. Accardi L., Ohno H., Mukhamedov F., Quantum Markov fields on graphs, Inf. Dim. Analysis, Quantum Probab. Related Topics (accepted), arXiv:0911.1667.
11. Accardi L., Mukhamedov F., Saburov M., On Quantum Markov chains on Cayley tree and associated chains with XY-model, arXiv:1004.3623.
12. Affleck I., Kennedy T., Lieb E.H., Tasaki H., Valence bond ground states in isotropic quantum antiferromagnets, Commun. Math. Phys. 115 (1988) 477-528.
13. Bernardes A.T., de Oliveira M.J., Field behaviour of the XY chiral model on a Cayley tree, J. Phys. A 25 (1992) 1405-1415.
14. Fannes M., Nachtergaele B., Werner R.F., Ground states of VBS models on Cayley trees, J. Stat. Phys. 66 (1992) 939-973.
15. Fannes M., Nachtergaele B., Werner R.F., Finitely correlated states on quantum spin chains, Commun. Math. Phys. 144 (1992) 443-490.
16. Fidaleo F., Mukhamedov F., Diagonalizability of non homogeneous quantum Markov states and associated von Neumann algebras, Probab. Math. Stat. 24 (2004) 401-418.
17. Liebscher V., Markovianity of quantum random fields, Proceedings Burg Conference 15-20 March 2001, W. Freudenberg (ed.), World Scientific, QP-PQ Series 15 (2003) 151-159.
18. Mukhamedov F.M., On factor associated with the unordered phase of λ-model on a Cayley tree, Rep. Math. Phys. 53 (2004) 1-18.
19. Mukhamedov F.M., Rozikov U.A., On Gibbs measures of models with competing ternary and binary interactions on a Cayley tree and corresponding von Neumann algebras I, II, J. Stat. Phys. 114 (2004) 825-848; 119 (2005) 427-446.
20. Ohno H., Extendability of generalized quantum Markov chains on gauge invariant C*-algebras, Inf. Dim. Analysis, Quantum Probab. Related Topics 8 (2005) 141-152.
21. Ohya M., Petz D., Quantum entropy and its use, Springer, Berlin-Heidelberg-New York, 1993.
22. Preston C., Gibbs States on Countable Sets, Cambridge University Press, London, 1974.
23. Spataru A., Construction of a Markov Field on an infinite Tree, Advances in Math. 81 (1990) 105-116.
24. Spitzer F., Markov random Fields on an infinite tree, Ann. Prob. 3 (1975) 387-398.
25. Zachary S., Countable state space Markov random fields and Markov chains on trees, Ann. Prob. 11 (1983) 894-903.
26. Zachary S., Bounded attractive and repulsive Markov specifications on trees and on the one-dimensional lattice, Stochastic Process. Appl. 20 (1985) 247-256.
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 279-289)
SPACE(-TIME) EMERGENCE AS SYMMETRY BREAKING EFFECT*
IZUMI OJIMA
Research Institute for Mathematical Sciences, Kyoto University, Kyoto 606-8502, Japan

The microscopic origin of space(-time) geometry is explained on the basis of an emergence process associated with the condensation of an infinite number of microscopic quanta responsible for symmetry breakdown, which implements the basic essence of "Quantum-Classical Correspondence" and of the forcing method in physical and mathematical contexts, respectively. From this viewpoint, the space(-time) dependence of physical quantities arises from the "logical extension" 6 to change "constant objects" into "variable objects" by tagging the order parameters associated with the condensation onto "constant objects"; the logical direction here from a value $y$ to a domain variable $x$ (to materialize the basic mechanism behind the Gel'fand isomorphism) is just opposite to that common in the usual definition of a function $f : x \mapsto f(x)$ from its domain variable $x$ to a value $y = f(x)$.
1. Outline of the Problem
Before going into the main context, a comment is necessary on the parentheses in "Space(-Time) Emergence" in the title: in sharp contrast to the case of space, the emergence of a time axis now seems doubtful. To justify this suspicion, we need to re-examine its consistency with the space-time picture essential in the special and general theories of relativity, which is not yet undertaken here. This is the reason for the expression "Space(-Time)".

1.1. "Theory of Everything" vs. Duhem-Quine thesis
* Talk at ICQBIC (10 - 13 March 2010) at Tokyo University of Science, Noda

In search for a new theory to incorporate the old and standard one as a special case, one usually attempts trial-and-error searches in a heuristic way, which seems to be unavoidable. How and to what extent can this be made systematic by the method for solving an "inverse problem"? In this context, we note the existence of an obstruction to this possibility in the form of the "Duhem-Quine thesis". This is just a No-Go theorem asserting the impossibility of uniquely determining a theory from phenomenological data so as to reproduce the latter, because of the unavoidable finiteness of the number of measurable quantities and of their limited accuracy:
non-unique choices of starting "Micro"
    ↓ predictions: not 1-to-1        ↑ inference: not onto
finite data with limited accuracy at "Macro" level
Owing to inevitable errors in the measured data, the agreements between theoretical predictions and experimental data justify the former only as one of the possible candidates to explain the latter:

Theory 1, Theory 2, ... → Experimental data + errors.

Fortunately, our bi-directional method of "Micro-Macro duality" 1 can resolve this universal dilemma in harmony with the necessary and sufficient levels of accuracy determined by the inevitable restrictions on focused aspects and degrees of accuracy inherent in a certain pre-chosen context. Within such a context, a theoretical explanation can be unicified by "Micro-Macro duality" as a context-dependent "matching condition" 2 between the phenomena ("Macro") to be described and the theory to describe them ("Micro"); "Macro" in this mathematical formulation plays the role of a standard reference frame characterized by its "universality" (in the mathematical and categorical sense). This naturally leads us to the idea of a "matching condition" between inductive & deductive aspects for judging the correctness, which demarcates and characterizes the target domain of discourse.
1.2. "Geometrization principle" vs. Physical emergence of space-time

In contrast to the above resolution of the Duhem-Quine thesis by Micro-Macro duality:

Micro ⇄ Macro (deduction from Micro to Macro, induction from Macro to Micro),

the "standard" approach regards the "rigorous" deductions from Micro to Macro as the only possible scientific paths to be followed. In this context, the starting point of Micro Theory consists simply of ad hoc postulates which cannot be justified within a theory itself, to be justified experimentally up to certain limited accuracy. With this point neglected, however, theoretical hypotheses on Micro quantum systems are always absolutized, in combination with the basic principle of "geometrization of physics" prevailing in modern physics. Then, we need to ask what its foundation is: it turns out to be based upon the successes of methods of modern geometry in mathematics and in its physical applications, such as general relativity, gauge theories, etc., which are essentially of macroscopic nature!! Almost all the basic principles governing modern geometries (differential, complex-analytic, algebraic, etc.) have been extracted from and applied to classical & macroscopic levels based on commutative algebras (of observables), and their quantum versions have only started to be sought for, without reaching mature stages yet! In spite of the strong emphasis there on the rigorous derivations of Macro (scopically observable predictions) from Micro, neither the origin of spacetime as Macro, nor the "Macro principle of geometrization" seem to be well founded!? These pitfalls at both the Micro and Macro ends seem to be the two fatal defects of the fashionable trends in modern physics hidden in its blind spots. Therefore, we need to explain the microscopic origins of the macroscopic structures of space-time geometry itself. This is just the problem of searching for the physical origin and emergent processes of spacetime structure in microscopic physics, to be pursued in the following.

2. Universality inherent in Macro-levels

For this latter purpose also, the methods based on "Micro-Macro duality" will turn out to be quite effective, as shown below.
In this context, what plays the most crucial role can be found in the construction of a Micro-Macro composite system consisting of Micro and Macro levels based upon the duality between the two directions, deduction and induction:

i) deduction [Micro $\Longrightarrow$ Macro] = a bundle structure
$$\mathcal{A} \overset{i}{\hookrightarrow} \mathcal{F} \overset{p}{\twoheadrightarrow} \mathcal{F}/\mathcal{A} \cong \widehat{\mathrm{Gal}(\mathcal{F}/\mathcal{A})}$$
(formulated as an exact sequence, $\mathrm{Im}\, i = \ker p$, of "triples" equipped with a tri-linear multiplication, e.g., $\mathcal{A} \times \mathcal{A} \times \mathcal{A} \ni (A, B, C) \mapsto \{A, B, C\} \in \mathcal{A}$, depending linearly on $A$ and $C$ and anti-linearly on $B$) and
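ii) induction [Micro $\Longleftarrow$ Macro] = the corresponding connections defined as its splittings, $\mathcal{A} \overset{m}{\twoheadleftarrow} \mathcal{F} \overset{h}{\hookleftarrow} \mathcal{F}/\mathcal{A}$, characterized by one of the three mutually equivalent conditions,
$$m \circ i = 1_{\mathcal{A}}, \qquad p \circ h = 1_{\mathcal{F}/\mathcal{A}}, \qquad i \circ m + h \circ p = 1_{\mathcal{F}}.$$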
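The three splitting conditions can be made concrete in a deliberately simplified setting. The sketch below is a toy model only: it replaces the "triples" of the text by finite-dimensional vector spaces and linear maps (an assumption of this illustration, not the author's framework), with $\mathcal{A}$, $\mathcal{F}/\mathcal{A}$ and $\mathcal{F}$ modelled by spaces of dimensions 2, 3 and 5. All names and the twisting matrix `t` are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]

def add(a, b, sign=1):
    """Entrywise a + sign * b."""
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

dim_a, dim_q = 2, 3          # toy dimensions of A and of F/A (assumed)
dim_f = dim_a + dim_q        # F is modelled as their direct sum

# i : A -> F (inclusion) and p : F -> F/A (projection), with Im i = ker p
i = [[1 if r == c else 0 for c in range(dim_a)] for r in range(dim_f)]
p = [[1 if c == r + dim_a else 0 for c in range(dim_f)] for r in range(dim_q)]

# The obvious splitting: m0 : F -> A and h0 : F/A -> F
m0 = [[1 if r == c else 0 for c in range(dim_f)] for r in range(dim_a)]
h0 = [[1 if r == c + dim_a else 0 for c in range(dim_q)] for r in range(dim_f)]

# Another splitting, twisted by an arbitrary linear map t : F/A -> A
t = [[2, -1, 0], [5, 3, 7]]
h = add(h0, matmul(i, t))         # h = h0 + i t
m = add(m0, matmul(t, p), -1)     # m = m0 - t p

checks = {
    "p_after_i_is_zero": matmul(p, i) == [[0] * dim_a for _ in range(dim_q)],
    "m_after_i_is_identity": matmul(m, i) == identity(dim_a),
    "p_after_h_is_identity": matmul(p, h) == identity(dim_q),
    "i_m_plus_h_p_is_identity": add(matmul(i, m), matmul(h, p)) == identity(dim_f),
}
print(checks)
```

The exact sequence $(i, p)$ is fixed while the splitting $(m, h)$ depends on the choice of `t`. In this toy picture, that freedom is what the word "connection" refers to.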
Example: The first law of thermodynamics describes a dilation $\Delta E = Q + W$ of heat $Q$ and of work $W$ into a closed dynamical system with conserved energy $\Delta E$. Here, the heat $Q$ symbolically represents the uncontrollable component in the Macro-manifestation of invisible Micro-motions as holonomy in the thermal classifying space (consisting of the basic thermodynamic order parameters), and the work $W$ represents the Macro-aspects of the thermal system directly controllable at the Macro-level; both are unified via dilation into a Micro-Macro composite system as a closed dynamical system with conserved energy. This relation can be expressed concisely in terms of the exact sequence in such a form as

[Micro fibre: $Q$] $\overset{i}{\hookrightarrow}$ [$\Delta E = Q + W$: Micro-Macro composite system] $\overset{p}{\twoheadrightarrow}$ [$W$: Macro "base space"],

where $\mathrm{Im}\, i \subset \ker p$ means that heat $\in \mathrm{Im}\, i$ cannot be transformed into controllable work, as its projection by $p$ equals 0, and, conversely, $\ker p \subset \mathrm{Im}\, i$ means that any energy $\in \ker p$ unchangeable to work should be regarded as heat $\in \mathrm{Im}\, i$. Thus, the bundle structure + exactness can be seen to carry relevant physical or operational meanings. Other interesting examples can also be found, for instance, in the theories of Maxwell and of Einstein in such forms as:

    Micro       Micro-Macro                     Macro
    $J_\mu$     ⇄ Maxwell equation ⇄            $F_{\mu\nu}$
      ↕                                           ↕
    $\psi$      ⇄ electromagnetic forces ⇄      $A_\mu$

and

    Micro          Micro-Macro                  Macro
    $T_{\mu\nu}$   ⇄ Einstein equation ⇄        $R_{\mu\nu}$
      ↕                                           ↕
    $\psi$         ⇄ gravitational force ⇄      $\Gamma^{\lambda}_{\mu\nu}$
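In many cases the above exact sequence takes such a form that $\mathcal{A}$ and $\mathcal{F}$ are ($C^*$-)algebras, and the triple $\mathcal{F}/\mathcal{A}$ can be viewed as an $\mathcal{A}$-module $\mathcal{F}$ controlled by the Galois group $G = \mathrm{Gal}(\mathcal{F}/\mathcal{A})$, defined as the subgroup of the automorphism group $\mathrm{Aut}(\mathcal{F})$ of $\mathcal{F}$ consisting of the elements fixing $\mathcal{A} = \mathcal{F}^G$ pointwise. In this case, $\widehat{\mathrm{Gal}(\mathcal{F}/\mathcal{A})} = \hat{G}$ can be regarded as the totality of irreducible unitary representations of $G$ (if such is meaningful) or the tensor category consisting of unitary representations of $G$, and the map $p : \mathcal{F} \twoheadrightarrow \hat{G}$ extracts the $G$-representation contents of each element in $\mathcal{F}$. If we equip $\mathcal{F}$ with an $\mathcal{A}$-valued inner product, $\mathcal{F} \times \mathcal{F} \ni (F_1, F_2) \mapsto \langle F_1 | F_2 \rangle \in \mathcal{A}$, it becomes a Hilbert module with a right action of $\mathcal{A}$, and a splitting, $\mathcal{A} \overset{m}{\twoheadleftarrow} \mathcal{F} \overset{h}{\hookleftarrow} \mathcal{F}/\mathcal{A} = \hat{G}$, can be specified by the conditional expectation $m : \mathcal{F} \twoheadrightarrow \mathcal{A}$ arising from the $\mathcal{A}$-valued inner product of $\mathcal{F}$ by $m(F) = \langle 1 | F \rangle$ if $1 \in \mathcal{F}$ (or by considering an approximate unit of $\mathcal{F}$). Then, $\mathcal{F}$ can be recovered from $\mathcal{A}$ and $\hat{G}$ as a Galois extension by a crossed product, $\mathcal{F} = \mathcal{A} \rtimes \hat{G}$, which gives a typical example of dilation from Macro to Micro. In this way, the duality between the bundle structure $\mathcal{A} \overset{i}{\hookrightarrow} \mathcal{F} \overset{p}{\twoheadrightarrow} \mathcal{F}/\mathcal{A} \cong \widehat{\mathrm{Gal}(\mathcal{F}/\mathcal{A})}$ and its connection $\mathcal{A} \overset{m}{\twoheadleftarrow} \mathcal{F} \overset{h}{\hookleftarrow} \mathcal{F}/\mathcal{A}$ can be seen to condense the essence of Fourier-Galois duality, especially because the functors $\mathrm{Gal}$ and $G \mapsto \hat{G}$ assign, respectively, a group $\mathrm{Gal}(\mathcal{F}/\mathcal{A})$ to the $\mathcal{A}$-module $\mathcal{F}/\mathcal{A}$ and the representation contents $\hat{G}$ of $G$ to a group $G$. Extending this Fourier-Galois theoretical machinery to a symmetry breaking situation of a $G$-dynamical system $\mathcal{F} \underset{\tau}{\curvearrowleft} G$ with a fixed-point subalgebra $\mathcal{A} = \mathcal{F}^G$, we see below that a process of space(-time) emergence can be formulated as a kind of symmetry breaking in terms of the notions of an augmented algebra and of an associated sector bundle 2.

3. "Sector bundle" associated with broken symmetry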
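Breakdown of a symmetry of $\mathcal{F}$ in a state $\omega \in E_{\mathcal{F}}$ with a group $G$ into a subgroup $H \subset G$ of a remaining symmetry is characterized 2 by the non-invariance of the "central extension" of $\omega$ on the centre $\mathfrak{Z}_{\pi_\omega}(\mathcal{F}) := \pi_\omega(\mathcal{F})'' \cap \pi_\omega(\mathcal{F})'$ under the corresponding $G$-action on $\mathfrak{Z}_{\pi_\omega}(\mathcal{F})$. In this situation, the role of the algebra $\mathcal{A} = \mathcal{F}^G$ of observables is known in algebraic QFT to be replaced by the Haag-dual extension $\mathcal{A}^d$ owing to the breakdown $\mathcal{A} \subsetneq \mathcal{A}^d$ of Haag duality, where the Haag-dual net $\mathcal{A}^d$ is defined with respect to a vacuum representation $\pi_0$ by $\mathcal{A}^d(\mathcal{O}) := \pi_0^{-1}\big(\pi_0(\mathcal{A}(\mathcal{O}'))'\big)$ (for all double cones $\mathcal{O}$ in Minkowski spacetime), so that the sector structure is determined by the factor spectrum $\overset{\frown}{\mathcal{A}^d} = \mathrm{Spec}(\mathfrak{Z}(\mathcal{A}^d)) = \hat{H}$, the group dual of a compact Lie group $H$ consisting of its irreducible unitary representations:
$$\mathcal{A} \Longrightarrow \mathcal{A}^d = \mathcal{F}^H \qquad (\mathcal{F} = \mathcal{A}^d \rtimes \hat{H}).$$
A general and desirable definition of the group $G$ of broken symmetry is not known yet in terms of the above data coming from the Haag-dual net $\mathcal{A}^d$, but such a definition as $G := \mathrm{Gal}(\mathcal{F}/\mathcal{A})$, with the field algebra $\mathcal{F} = \mathcal{A}^d \rtimes \hat{H} \Longleftrightarrow \mathcal{A}^d = \mathcal{F}^H$ and the group of unbroken symmetry $H = \mathrm{Gal}(\mathcal{F}/\mathcal{A}^d) \subset G$, is sufficient for our purposes when the obtained $G$ is a finite-dimensional Lie group. With $\hat{\mathcal{F}} := \mathcal{A}^d \rtimes \hat{G} = \mathcal{F} \rtimes \widehat{(H\backslash G)}$, called an augmented algebra 2, we have a split bundle exact sequence $\mathcal{A}^d \rightleftarrows \hat{\mathcal{F}} \rightleftarrows \hat{\mathcal{F}}/\mathcal{A}^d \cong \hat{G}$ with splitting $\hat{m} : \hat{\mathcal{F}} \twoheadrightarrow \mathcal{A}^d$. In this situation, the minimality of $G$ and $\hat{\mathcal{F}}$ is guaranteed by the $G$-central ergodicity, i.e., $G$-ergodicity of the centre $\mathfrak{Z}_{\hat{\pi}}(\hat{\mathcal{F}})$ in the representation $\hat{\pi}$ given by the GNS representation of $\omega_0 \circ \hat{m}$ induced from the vacuum state $\omega_0$ of $\mathcal{A}^d$ 2, and we have the following commutativity diagram: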
[Commutative diagram: the algebras $\mathcal{F}^H = \hat{\mathcal{F}}^G$ (unbroken algebra of observables), $\mathcal{F}$, $\hat{\mathcal{F}}^H$ (extended observables) and $\hat{\mathcal{F}}$ (augmented algebra), with arrows labelled 1:1 and onto and with the groups $H$, $G$ and the quotient $G/H$.]

whose dual version describes the sector structure:

[Commutative diagram: the factor spectra of $\hat{\mathcal{F}}^G = \mathcal{F}^H$, $\mathcal{F}$, $\hat{\mathcal{F}}^H$ and $\hat{\mathcal{F}}$, with arrows labelled 1:1 and onto, together with $\hat{H}$ (unbroken sectors), $\hat{G}$ (broken), the sector bundle $G \times_H \hat{H}$ and $G/H$ (degenerate vacua).]
where $\overset{\frown}{\mathcal{F}} = \mathrm{Spec}(\mathfrak{Z}(\mathcal{F}))$ denotes the factor spectrum of $\mathcal{F}$, etc.

Remark: The physical essence of the extension $\mathcal{A} \Longrightarrow \mathcal{A}^d$ from the original observable algebra $\mathcal{A}$ to its Haag-dual net algebra $\mathcal{A}^d = \mathcal{F}^H$ can now be interpreted as an "extension of coefficient algebra $\mathcal{A}$" by (the dual of) $G/H$ to parametrize the degenerate vacua:
$$\mathcal{A}^d = \mathcal{F}^H = \hat{\mathcal{F}}^G = \big[\mathcal{F} \rtimes \widehat{(H\backslash G)}\big]^G = \mathcal{F}^G \rtimes \widehat{(H\backslash G)} = \mathcal{A} \rtimes \widehat{(H\backslash G)}.$$
In this extension, a part $G/H$ of the originally invisible $G$ becomes visible through the emergence of degenerate vacua parametrized by $G/H$ due to the condensation of order parameters $\in G/H$ associated with S(pontaneous) S(ymmetry) B(reaking) of $G$ to $H$. As a result, observables $A \in \mathcal{A}$ acquire $G/H$-dependence: $\hat{A} = (G/H \ni \dot{g} \mapsto \hat{A}(\dot{g}) \in \mathcal{A}) \in \mathcal{A} \rtimes \widehat{(H\backslash G)}$, which should just be interpreted as an example case of logical extension 6 transforming a "constant object" ($A \in \mathcal{A}$) into a "variable object" ($\hat{A} \in \mathcal{A} \rtimes \widehat{(H\backslash G)}$) having functional dependence on the universal classifying space $G/H$ for (multi-valued) semantics (as is familiar in the non-standard and Boolean-valued analyses). By replacing $G/H$ with the space(-time), the above consideration can be utilized as a prototype for the origin of the functional dependence of physical quantities on space(-time) coordinates, due to the physical emergence of space(-time) from the microscopic physical world. Along this line, we prescribe the similar logical extension procedure on the observable algebra $\mathcal{A}^d = \mathcal{F}^H$, adding $G/H$-dependence:
$$\mathcal{A}^d \rtimes \widehat{(H\backslash G)} = \mathcal{F}^H \rtimes \widehat{(H\backslash G)} = \big(\mathcal{F} \rtimes \widehat{(H\backslash G)}\big)^H = \hat{\mathcal{F}}^H.$$
Then, the whole sector structure of $\hat{\mathcal{F}}^H$ $(= \mathcal{F}^H \rtimes \widehat{(H\backslash G)})$ can be identified with its factor spectrum $\overset{\frown}{\hat{\mathcal{F}}^H} = G \times_H \hat{H}$; this is seen to constitute a bundle structure, $\hat{H} \hookrightarrow \overset{\frown}{\hat{\mathcal{F}}^H} = G \times_H \hat{H} \twoheadrightarrow G/H$, called a sector bundle, consisting of the classifying space $G/H$ of degenerate vacua, each fibre over which describes the sector structure $\hat{H}$ corresponding to the unbroken remaining symmetry $H$ (or, more precisely, the conjugated group $gHg^{-1}$ for the vacuum parametrized by $\dot{g} = gH \in G/H$). Namely, the sector bundle, $\hat{H} \hookrightarrow \overset{\frown}{\hat{\mathcal{F}}^H} = G \times_H \hat{H} \twoheadrightarrow G/H$, can be understood as the connection = splitting of the dual, $\mathcal{F}^H \hookrightarrow \hat{\mathcal{F}}^H = \mathcal{F}^H \rtimes \widehat{(H\backslash G)} \twoheadrightarrow \widehat{(H\backslash G)}$!

4. Emergence of space(-time) as symmetry breaking

We can now apply the above scenario to the situation with the group $G$ describing both the external (= space-time) and the internal symmetries.
For simplicity, the latter component, described by a subgroup $H$ of $G$, is assumed to be unbroken, and hence the broken symmetry described by $G/H$ represents the space-time symmetry. It would be convenient to take $H$ as a normal subgroup of $G$, though this is not essential. To be precise, $G/H$ may contain such non-commutative components as spatial rotations (and Lorentz boosts) acting on space(-time), but we simply neglect this aspect and identify $G/H$ with the space(-time) itself. Then, by identifying $G/H$ with a space(-time) domain $\mathcal{R}$, we can notice a remarkable parallelism between the commutative diagram in the previous section:
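[Commutative diagram: $\mathcal{F}^H = \hat{\mathcal{F}}^G$, $\mathcal{F}$, $\hat{\mathcal{F}}^H$ and $\hat{\mathcal{F}}$, with arrows labelled by $H$ and $G/H$.]

and the diagram controlling the Doplicher-Roberts reconstruction of the local net $\mathcal{R} \mapsto \mathcal{F}(\mathcal{R})$ from $\mathcal{R} \mapsto \mathcal{A}(\mathcal{R})$:

[Commutative diagram: $\mathcal{O}_\rho$, $\mathcal{O}_d$ and $\mathcal{A}(\mathcal{R})$, with arrows labelled by $G$ and $\mathcal{R}$.]

where $\mathcal{O}_d := C^*(\{\psi_i, \psi_j^*\})$ is a Cuntz algebra generated by $d$ isometries $\psi_i$ ($i = 1, 2, \dots, d$): $\psi_i^* \psi_j = \delta_{ij} 1$, $\sum_{i=1}^{d} \psi_i \psi_i^* = 1$.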
The crucial ingredients for this scenario are as follows: 1) The essence of transitions from invisible Micro with dynamical motions into visible Macro equipped with universal indices can be found physically and typically in the processes of condensation (to form condensed states), whose mathematical expression is nothing but the socalled "B-construction" (or "bar-construction", "basic construction", etc., with variety of names) and/or "heat kernel method" to extract topological and/or homotopical invariants forming a classifying space and playing the universal roles in classifying objects in question. Such a classical
object as $G/H$ to classify degenerate vacua plays universal roles in the sector structure of SSB in such a form as the base space of the sector bundle, $\hat{H} \hookrightarrow \widehat{\mathcal{F}^H} = G \times_H \hat{H} \twoheadrightarrow G/H$.
2) The notions of sectors or pure phases and mixed phases have been introduced to clarify the mutual relations between quantum Micro and classical Macro [2]. For this purpose, we need to classify representations of the algebra of physical variables on the basis of quasi-equivalence, which is just unitary equivalence up to multiplicities. The minimal units in this classification are factor states or factor representations whose centres are trivial, and they are called sectors mathematically or pure phases physically. If a state or its GNS representation is not a factor because of its non-trivial centre, it is called a mixed phase, which can be canonically decomposed into sectors or pure phases.
3) In the context of measurement processes, the above Micro ⟹ Macro transitions take place in the amplification process that magnifies the microscopic changes of quantum states at the contact points between the systems and the microscopic ends (= probe systems) of the measuring apparatus into the macroscopic motions of measuring pointers [1,3]. In these papers, a process of this sort is shown to be formulated as a Lévy process with (or, ideally, without) small deviations from it. What is most important in this process is the transition from a mixed phase, as a virtual probabilistic mixture of many different sectors or phases, into their spatial configurations in the "real space", each subdomain (= pointer position) of which is occupied by a single sector or phase. In the context of measurements, this phase separation is allowed to take place chronologically, as is indicated by the Born statistical rule, whereas it can occur spatially or synchronically in some thermal contexts such as non-equilibrium states with certain degrees of stability.
4) The above problem of phase separation from the physical viewpoint can be viewed logically as the localized selection of a single truth value (in the sense of standard two-valued logic) from a multi-valued logic (in the context of Boolean-valued analysis of a probability space). This process can be controlled by a logical method called "forcing" [7] (famous for P. Cohen's use of it in proving the independence of the continuum hypothesis from the ZF axioms of set theory), resulting in a topos of sheaves on the classifying space $G/H = R$ (consisting of degenerate vacua) whose core member is given by the sheaf $\Gamma(G \times_H \hat{H})$ of sections of the sector bundle $\hat{H} \hookrightarrow \widehat{\mathcal{F}^H} = G \times_H \hat{H} \twoheadrightarrow G/H$ on $G/H = R$, describing the sector structure of the algebra $\mathcal{F}^H$ of extended observables in terms of its factor spectrum $\widehat{\mathcal{F}^H}$. This is to be put in parallelism with the sheaf $R \mapsto E_{\mathcal{A}(R)}$ of local states in DHR sector theory, which means that the notion of local states $E_{\mathcal{A}(R)}$ of a local algebra $\mathcal{A}(R)$ in a spacetime domain $R$ should be understood to correspond to a choice of a family of degenerate vacua in $G/H = R$ arising from SSB, namely, to the context of considering states of extended observables $\mathcal{F}^H$ in reference to each member of the degenerate vacua belonging to $G/H$. This parallelism clearly shows the existence of quantum fluctuations inside each space(-time) point $x \in G/H = R$, to which extent space(-time) points are highly non-trivial physical objects!
5) The differences in the degrees of stability mentioned in 3) may be related in a meaningful way to the corresponding differences in changeability between items that can be moved from one place to another and the stable behaviours of the "names" attached to specific places. Systematizing such degrees of stability may be quite relevant for a satisfactory understanding of the various stabilized domains appearing at different levels in nature, and also of the hierarchical structures of biological organisms, from the viewpoint of Grothendieck's topoi and sites.
Acknowledgment
I have benefited very much from discussions with Mr. H. Saigo and Mr. R. Harada on the problems related to forcing, for which I am very grateful. I would like to express my sincere thanks to Profs. Belavkin, Khrennikov, Smolyanov and Volovich for their valuable comments and kind encouragement at QBIC2010.
References
1. Ojima, I., Micro-macro duality in quantum physics, 143-161, Proc. Intern. Conf. "Stochastic Analysis: Classical and Quantum - Perspectives of White Noise Theory", ed. by T. Hida, World Scientific (2005).
2. Ojima, I., A unified scheme for generalized sectors based on selection criteria - Order parameters of symmetries and of thermality and physical meanings of adjunctions -, Open Systems and Information Dynamics 10 (2003), 235-279; How to formulate non-equilibrium local states in QFT? - General characterization and extension to curved spacetime -, pp. 365-384 in "A Garden of Quanta", World Scientific (2003).
3. Ojima, I., Micro-Macro duality and emergence of macroscopic levels, Quantum Probability and White Noise Analysis 21, 217-228 (2008); Harada, R. and Ojima, I., A unified scheme of measurement and amplification processes based on Micro-Macro Duality - Stern-Gerlach experiment as a typical example -, Open Systems and Information Dynamics 16, 55-74 (2009).
4. Doplicher, S. and Roberts, J.E., Why there is a field algebra with a compact gauge group describing the superselection structure in particle physics, Comm. Math. Phys. 131 (1990), 51-107; Endomorphism of C*-algebras, cross products and duality for compact groups, Ann. Math. 130 (1989), 75-119; A new duality theory for compact groups, Inventiones Math. 98 (1989), 157-218.
5. Haag, R., Local Quantum Physics - Fields, Particles, Algebras - (2nd ed.), Springer-Verlag, 1996.
6. Ojima, I. and Ozawa, M., Unitary representations of the hyperfinite Heisenberg group and the logical extension methods in physics, Open Systems and Information Dynamics 2, 107-128 (1993).
7. Cohen, P.J., Set Theory and the Continuum Hypothesis, W.A. Benjamin, New York, 1966.
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 291-309)
USE OF CRYPTOGRAPHIC IDEAS TO INTERPRET BIOLOGICAL PHENOMENA (AND VICE VERSA)
MASSIMO REGOLI Centro Vito Volterra, Università di Roma "Tor Vergata", Roma I-00133, Italy E-mail: [email protected]
1. Introduction
The RNA-Crypto System (RCS for short) is a symmetric-key algorithm for enciphering data. As shown below, its peculiarity is that it expands the message to be encrypted, hiding the ciphered message within a set of garbage and control information. The idea for this new algorithm comes from the observation of nature, in particular of the behavior of RNA and some of its properties. RNA sequences contain sections called introns. Introns, a term derived from "intragenic regions", are non-coding sections of precursor mRNA (pre-mRNA) or other RNAs that are removed (spliced out of the RNA) before the mature RNA is formed. Once the introns have been spliced out of a pre-mRNA, the resulting mRNA sequence is ready to be translated into a protein. The corresponding parts of a gene are known as introns as well. The nature and the role of introns in the pre-mRNA are not clear and are the subject of extensive research by biologists. In our case, we use the presence of introns in the RNA-Crypto System output as a strong method for adding information that is only apparently chaotic and non-coding, and for making the access to the secret key used to code the messages look erratic. In the RNA-Crypto System algorithm the introns are sections of the ciphered message carrying non-coding information, just as in the precursor mRNA. But the term "non-coding" does not necessarily mean "junk data". In this text a new cryptographic algorithm is described, starting from a mathematical point of view.
1.1. Interpretation
This algorithm can be used to code clear messages, and this application is the main scope of this work. It can also suggest to biologists that redundancy in pre-mRNA sequences may be a mechanism used by nature as a sort of protection against a possible decoding attack from the outside. From another point of view, it can give introns an important role in the mechanism for coding the resulting mRNA, for example that of achieving future functionalities or archiving old ones.
2. Ingredients
To introduce the RCS algorithm, let us start with some important ingredients.
2.1. Spaces
We first introduce the following spaces:
• $\mathcal{K} := \{\kappa_i \mid \kappa_i \in K\}$, the space of all finite sequences in a given space $K$ of length less than or equal to $N_K$, where $\infty > N_K \in \mathbb{N}$;
• $\mathcal{M} := \{\sigma_i \mid \sigma_i \in M\}$, the space of all finite sequences in $M$ of length less than or equal to $N_M$, where $\infty > N_M \in \mathbb{N}$ (possibly $M = K$);
• $S = S_1 \cup S_2$ with $S_1 \cap S_2 = \emptyset$ and $|S| < \infty$.
Namely, $\mathcal{K}$ is the set of Secret Keys of length $\le N_K$ and $\mathcal{M}$ is the set of Clear Messages of length $\le N_M$; $K$ and $M$ are finite spaces of symbols (to fix ideas, one can take $K = M = \{0, 1\}$, the standard binary digits). $S$ is also a finite set of symbols (or functions), and we can consider it as the set of all possible actions on the clear message. Finally we introduce the following space:
• $\mathcal{O} = \mathcal{A} \cup \mathcal{B}$, where $\mathcal{A} = A^{N_A}$ and $\mathcal{B} = \bigcup_i \mathcal{B}_i$ with $\mathcal{B}_i = A^{N_A} \times B_i^{N_{B_i}}$. Here too, $A$ and $B_i$ are finite sets of symbols as above.
From a biological point of view the space $\mathcal{A}$ is the exon space and $\mathcal{B}$ is the intron space.
$\mathcal{O}$ is the base space for Coded Messages (a coded message will be a finite sequence of elements of $\mathcal{O}$). After this preliminary description of the necessary spaces, we can introduce some useful definitions:
Definition 1. Let $\kappa \in \mathcal{K}$. For $m \in \mathbb{N}$ with $0 \le m \le N_K$ and for each $i \in \{1, \dots, N_K\}$ we define
$\kappa_{i,m} := (\kappa_i, \kappa_{i+1}, \dots, \kappa_{i+m-1})$
as a subsequence of $\kappa$ ($m = 0$ represents a subsequence of length 0).
Definition 2. Let $\sigma \in \mathcal{M}$. For $n \in \mathbb{N}$ with $0 \le n \le N_M$ and for each $i \in \{1, \dots, N_M\}$ we define
$\sigma_{i,n} := (\sigma_i, \sigma_{i+1}, \dots, \sigma_{i+n-1})$
as a subsequence of $\sigma$ ($n = 0$ represents a subsequence of length 0).
In the above definitions the operator $+$ is the sum with the appropriate modulus, depending on the length of the sequence. For simplicity, in the following we will use the shorthand $\kappa_{i,m} \equiv \kappa_i$ and $\sigma_{i,n} \equiv \sigma_i$; moreover, $\kappa_i \in \mathcal{K}$ and $\sigma_i \in \mathcal{M}$.
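The subsequences of Definitions 1 and 2 can be illustrated with a short sketch. It assumes that the modulus of the sum is simply the length of the sequence and that positions are counted from 1; the function name `cyclic_subsequence` is hypothetical and is not part of the original description.

```python
def cyclic_subsequence(seq, i, m):
    """Return the m symbols of seq starting at 1-based position i.

    Assumption: indices wrap around modulo len(seq), which is one reading
    of "the sum with the appropriate modulus" in Definitions 1 and 2.
    """
    n = len(seq)
    if not 1 <= i <= n:
        raise ValueError("i must satisfy 1 <= i <= len(seq)")
    if not 0 <= m <= n:
        raise ValueError("m must satisfy 0 <= m <= len(seq)")
    return [seq[(i - 1 + k) % n] for k in range(m)]


key = [1, 0, 1, 1, 0]
print(cyclic_subsequence(key, 4, 3))  # [1, 0, 1]: positions 4, 5, then 1
print(cyclic_subsequence(key, 1, 0))  # []: a subsequence of length 0
```

With this reading, a window that runs past the end of the key continues from its beginning, so a subsequence of any admissible length exists at every position.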
Definition 3. Let $\mathcal{C} = \{\Sigma_i \mid \Sigma_i \in \mathcal{O}\}$ be the space of all finite sequences in $\mathcal{O}$.
$\mathcal{C}$ is the set of Coded Messages.
2.2. Functions
To reach our goal we must introduce two kinds of functions:
• Coding functions, which are used to:
- transform a portion of the clear message into a portion of the coded message (exons);
- insert some apparently redundant information into the coded message (introns).
• Operational functions, which are used to:
- modify the local state of the coding system;
- modify the global state of the variables of the system (see below).
We start with the definition of a family of coding functions $f_\alpha : \mathcal{M} \times \mathcal{K} \to \mathcal{O}$ with $\alpha \in S$:
$f_\alpha(\sigma, \kappa) = \begin{cases} \varepsilon & \text{if } \alpha \in S_1 \\ \iota & \text{if } \alpha \in S_2 \end{cases}$ (1)
where $\varepsilon \in \mathcal{A}$ and $\iota \in \mathcal{B}$. The functions $f_\alpha$ must be chosen in such a way as to guarantee the existence of a function $\bar{f} : \mathcal{O} \times \mathcal{K} \to \mathcal{M}$ such that
$\bar{f}(f_\alpha(\sigma, \kappa), \kappa) = \begin{cases} \sigma & \text{if } \alpha \in S_1 \\ \emptyset & \text{if } \alpha \in S_2 \end{cases}$ (2)
Now we introduce some operational functions: let $g_\alpha : \mathcal{K} \times \mathcal{O} \to \mathcal{K}$ be as follows:
(3)
The existence of a function $g$ that does not depend on $\alpha$ ensures the chaoticity of the system. Finally, as a member of the set of operational functions, we need the trivial characteristic function $\chi_{S_1} : S \to \{0, 1\}$.
2.3. Global variables and further definitions
Now let us introduce some further definitions:
Definition 4. Let $\kappa \in \mathcal{K}$ be a Pre-Shared Key (PSK) of length $N_K$.
In cryptography, a pre-shared key (PSK) is a secret that the two parties have shared over some secure channel before it needs to be used.
Definition 5. Let $\{\alpha\} = \{\alpha_i \mid \alpha_i \in S\}$ be a random sequence of arbitrary length.
A random sequence is a kind of stochastic process: in short, a sequence of random variables. In computer science it comes from a random number generator (RNG), a computational device designed to generate a sequence of numbers or symbols that lack any pattern, i.e. appear random.
Definition 6. Let $\sigma \in \mathcal{M}$ be a message of length $N_M$.
In communications science, a message is information sent from a source to a receiver.
Definition 7. $\Sigma \in \mathcal{C}$ will be the coded message, whose length depends on the system and on the state of the variables.
In cryptography, a coded message is a clear message transformed into an obscured form, which prevents those who do not possess the special information, or key, required to invert the transform from understanding what is actually transmitted.
2.4. Biological parallelism
From the biological point of view, the PSK and the random sequence $\{\alpha\}$ represent the whole mechanism of splicing (the process by which pre-mRNA is modified to remove certain stretches of non-coding sequence, the introns), while the coded message is the pre-mRNA itself and the clear message is the final mRNA.
3. The algorithm
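3.1. Coding
Suppose we have a Pre-Shared Key (PSK) $\kappa \in \mathcal{K}$ of length $N_K$ and a message $\sigma \in \mathcal{M}$ of length $N_M$, and that we have fixed two numbers $n, m \in \mathbb{N}$. (For simplicity and without loss of generality, we can suppose that $N_M$ is a multiple of $n$.) Using the above definitions for $\alpha_i$, $\sigma_{l_i}$ and $\kappa_{j_i}$ we can define the following algorithm:
$\Sigma_i = f_{\alpha_i}(\sigma_{l_i}, \kappa_{j_i}) = \begin{cases} \varepsilon_i & \text{if } \alpha_i \in S_1 \\ \iota_i & \text{if } \alpha_i \in S_2 \end{cases}$
$l_{i+1} = l_i + \chi_{S_1}(\alpha_i)\, n$
$\kappa_{j_{i+1}} = g_{\alpha_i}(\kappa, \Sigma_i)$ (4)
where $\varepsilon_i \in \mathcal{A}$ and $\iota_i \in \mathcal{B}$.
Definition 8. The sequence $\Sigma = \{\Sigma_1, \dots, \Sigma_N\} \in \mathcal{C}$ is the coded message.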
3.2. Decoding
For the decoding phase, suppose we have a Pre-Shared Key (PSK) $\kappa \in \mathcal{K}$ of length $N_K$ and a coded message $\Sigma \in \mathcal{C}$ of arbitrary length, and that we have fixed two numbers $n, m \in \mathbb{N}$. Using the above definitions for $\sigma_{l_i}$ and $\kappa_{j_i}$ we can define the following algorithm:
$\bar{f}(\Sigma_i, \kappa_{j_i}) = \sigma_i$, $\kappa_{j_{i+1}} = g(\kappa, \Sigma_i)$ if $\Sigma_i \in \mathcal{A}$
$\bar{f}(\Sigma_i, \kappa_{j_i}) = \emptyset$, $\kappa_{j_{i+1}} = g(\kappa, \Sigma_i)$ if $\Sigma_i \in \mathcal{B}^* \subset \mathcal{B}$ (5)
Note that, thanks to the definition of $g$, it is not necessary to know the random sequence $\{\alpha_i\}$ in the decoding phase.
3.3. Preliminary observations
The coded message depends on:
• the clear message;
• the secret key (PSK);
• the random sequence $\{\alpha\}$.
It is important to understand that, given the same message and the same PSK, the coding phase can produce different coded messages simply by using different sequences $\{\alpha\}$. It is also important to underline that the decoding phase does not need to know the sequence $\{\alpha\}$ used in the coding phase, because the information about $\{\alpha\}$ is intrinsic in the pair $(\Sigma_i, \kappa_{j_i})$. Here we want to underline the possible role of introns in carrying information that is important for the decoding phase: for example, the function $g(\kappa, \Sigma)$ can use the information present in $\Sigma \in \mathcal{B}$ to produce a change in the PSK. Moreover, the coding phase expands the original message into the coded message. This expansion comes from several factors, for example the dimensions of the $\mathcal{B}_i$ and the number of elements of $S_2$ in the sequence $\{\alpha\}$.
4. Early implementations
4.1. The environment
Now let us introduce the first realization of the above algorithm. Step by step, we substitute each ingredient of the algorithm with its representative object in the implementation. We start from the definitions $K = M = A = B_1 = \{0, 1\}$, with $N_K, N_M \in \mathbb{N}$, $N_A = 3$, $N_{B_1} = 2$, and $S = S_1 \cup S_2$ with $S_1 = \{a, b\}$ and $S_2 = \{c, d\}$, so that $\mathcal{O} = \{0,1\}^{N_A} \cup (\{0,1\}^{N_A} \times B_1^{N_{B_1}})$. Finally, $n = 1$ and $m = 3$.
4.2. Functions and implementation
To introduce the functions $f_\alpha$, the seek table $T : \mathcal{A} \times K^n \to S$ (Table 1) will be useful, together with the following rules:
1) if $\alpha \in S_1$:
- let $c = \kappa_j$;
- if $\sigma_i = 0$, find a binary number $000 \le r \le 111$ such that $T_{r,c} = a$ (or $T_{r,c} = b$);
- if $\sigma_i = 1$, find a binary number $000 \le r \le 111$ such that $T_{r,c} = b$ (or $T_{r,c} = a$);
- now let $\Sigma_i = (\Sigma_{i,1}, \Sigma_{i,2}, \Sigma_{i,3}) = (r_1, r_2, r_3)$ (i.e. an exon $\Sigma_i \in \mathcal{A}$).
2) if $\alpha \in S_2$:
- let $c = \kappa_j$;
- find a binary number $000 \le r \le 111$ such that $T_{r,c} = \alpha$;
- now let $\Sigma_i = ((\Sigma_{i,1}, \Sigma_{i,2}, \Sigma_{i,3}), (\Sigma_{i,4}, \Sigma_{i,5})) = ((r_1, r_2, r_3), (\Sigma_{i,4}, \Sigma_{i,5}))$ (i.e. an intron $\Sigma_i \in \mathcal{B}$);
- the other 2 components of the vector (i.e. $\Sigma_{i,4}, \Sigma_{i,5}$) are chosen at random in the binary range $\{00, \dots, 11\}$.
Table 1. The seek table $T_{r,c}$. Rows: $r = (\Sigma_{i,1}, \Sigma_{i,2}, \Sigma_{i,3})$; columns: $c = \kappa_j$.
r \ c  000  001  010  011  100  101  110  111
000     a    b    c    d    a    b    c    d
001     b    c    d    a    b    c    d    a
010     c    d    a    b    c    d    a    b
011     d    a    b    c    d    a    b    c
100     a    b    c    d    a    b    c    d
101     b    c    d    a    b    c    d    a
110     c    d    a    b    c    d    a    b
111     d    a    b    c    d    a    b    c
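The entries of Table 1 follow a simple cyclic pattern, which the following sketch makes explicit. It assumes that the table is exactly the rule "symbol number $(r + c) \bmod 4$ of a, b, c, d", which reproduces the entries listed above; the name `seek_table` is hypothetical.

```python
SYMBOLS = "abcd"


def seek_table(bits):
    """Build a 2**bits by 2**bits seek table with T[r][c] = SYMBOLS[(r + c) % 4].

    Assumption: this cyclic rule is inferred from the printed entries of the
    table; the text itself does not state it.
    """
    size = 2 ** bits
    return [[SYMBOLS[(r + c) % 4] for c in range(size)] for r in range(size)]


T = seek_table(3)
for r, row in enumerate(T):
    print(format(r, "03b"), " ".join(row))

# For a fixed key window c, every symbol is reachable from two different rows r.
rows_for_a = [r for r in range(8) if T[r][0b010] == "a"]
print(rows_for_a)  # [2, 6]
```

The two admissible rows per symbol are the redundancy that the coder can exploit: either row encodes the same action under the same key window.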
Now for the function $g$ we will use the following rules:
• obtain $r$, $c$ as above;
• if $T_{r,c} \in \{a, b\}$ then $g$ returns $\kappa_{j_i + m}$;
• if $T_{r,c} = c$ then $g$ returns $\kappa_{j_i + (\Sigma_{i,4}\Sigma_{i,5})_{10}}$;
• if $T_{r,c} = d$ then $g$ returns $\kappa_{j_i - (\Sigma_{i,4}\Sigma_{i,5})_{10}}$;
where the subscript 10 denotes the decimal value of the binary number in parentheses. The last function we must define is $\bar{f} : \mathcal{O} \times \mathcal{K} \to \mathcal{M}$. Remember that the conditions in equation (2) must hold. The function $\bar{f}(\Sigma_i, \kappa_j)$ must be able to determine whether $\Sigma_i \in \mathcal{A}$ or $\Sigma_i \in \mathcal{B}$. Since $\Sigma$ (i.e. the entire coded message) is a sequence of 0s and 1s, we can address its elements bit by bit (i.e. $\Sigma_{l_i}, \Sigma_{l_i+1}, \dots$). Following this strategy we can define:
(6)
The new function acts in this way:
• let $r = (\Sigma_{l_i}, \Sigma_{l_i+1}, \Sigma_{l_i+2})$;
• let $c = \kappa_{j_i}$;
• if $T_{r,c} = a$ then $\bar{f}$ returns $\sigma_i = \{0\}$ (or $\sigma_i = \{1\}$);
• if $T_{r,c} = b$ then $\bar{f}$ returns $\sigma_i = \{1\}$ (or $\sigma_i = \{0\}$);
• otherwise $\bar{f}$ returns $\sigma_i = \emptyset$.
The function $g$ acts as above, but using $(\Sigma_{l_i}, \Sigma_{l_i+1}, \Sigma_{l_i+2})$ if $T_{r,c} \in \{a, b\}$, or $(\Sigma_{l_i}, \Sigma_{l_i+1}, \Sigma_{l_i+2}, \Sigma_{l_i+3}, \Sigma_{l_i+4})$ if $T_{r,c} \in \{c, d\}$.
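The rules of this section can be put together in a small round-trip sketch. It is not the author's program: it assumes the convention 0 → a and 1 → b, a cyclic key window of $m = 3$ bits, a key pointer taken modulo the key length, and the cyclic table rule $T_{r,c}$ = symbol $(r + c) \bmod 4$. The names `encode`, `decode`, `key_window` and `to_bits` are hypothetical.

```python
import random

SYMBOLS = "abcd"
M_STEP = 3  # m: key bits read per step, and pointer advance after an exon


def table(r, c):
    """Seek table entry T[r][c] (assumed cyclic rule)."""
    return SYMBOLS[(r + c) % 4]


def key_window(key, j):
    """Integer value of the M_STEP key bits starting at pointer j (cyclic)."""
    n = len(key)
    return sum(key[(j + k) % n] << (M_STEP - 1 - k) for k in range(M_STEP))


def to_bits(value, width):
    """Most-significant-first list of `width` bits of `value`."""
    return [(value >> (width - 1 - k)) & 1 for k in range(width)]


def encode(message, key, rng):
    """Encode a list of bits; rng supplies the random actions alpha_i."""
    coded, j, l = [], 0, 0
    while l < len(message):
        alpha = rng.choice(SYMBOLS)          # random action from S
        c = key_window(key, j)
        is_exon = alpha in "ab"              # S1 = {a, b}
        target = "ab"[message[l]] if is_exon else alpha
        r = rng.choice([x for x in range(8) if table(x, c) == target])
        coded += to_bits(r, 3)
        if is_exon:
            l += 1                           # n = 1 clear bit consumed
            j = (j + M_STEP) % len(key)
        else:
            v = rng.randrange(4)             # two random intron bits
            coded += to_bits(v, 2)
            j = (j + v if alpha == "c" else j - v) % len(key)
    return coded


def decode(coded, key):
    """Recover the clear bits; the random actions are not needed."""
    message, j, p = [], 0, 0
    while p < len(coded):
        r = coded[p] * 4 + coded[p + 1] * 2 + coded[p + 2]
        sym = table(r, key_window(key, j))
        if sym in "ab":                      # exon: 3 bits carry one clear bit
            message.append("ab".index(sym))
            p += 3
            j = (j + M_STEP) % len(key)
        else:                                # intron: 5 bits, moves the pointer
            v = coded[p + 3] * 2 + coded[p + 4]
            p += 5
            j = (j + v if sym == "c" else j - v) % len(key)
    return message


key = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
message = [1, 0, 1, 1, 0, 0, 1, 0]
coded = encode(message, key, random.Random(2011))
assert decode(coded, key) == message
```

The decoder never sees the sequence $\{\alpha_i\}$: the symbol found in the table tells it whether the current block is an exon or an intron, and the two extra intron bits tell it how to move the key pointer. Python's `random.Random` is used here only for reproducibility; a real implementation would need a cryptographically secure source such as `random.SystemRandom`.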
4.3. Observation
In this simple implementation of the RNA algorithm, the redundancy follows from these considerations:
• for each processed bit of the clear message, 3 bits are inserted into the coded one;
• each intron inserts 5 bits into the coded message;
• we use a uniform distribution to generate the sequence $\{\alpha_i\}$;
• the length of the sequence $\{\alpha_i\}$ is then $\|\{\alpha_i\}\| = \|\{\alpha_i \mid \alpha_i \in S_1\}\| + \|\{\alpha_i \mid \alpha_i \in S_2\}\|$;
• of course, in this implementation we must have $\|\{\alpha_i \mid \alpha_i \in S_1\}\| = N_M$.
Under the uniform distribution, introns and exons are equally likely, so on average about $N_M$ introns are inserted. Then, if the original message has a length of $N_M$ bits, the length of the coded one is $l_e \approx 3N_M + 5N_M = 8N_M$. It is important to underline that this implementation uses a redundant table $T_{r,c}$: for each $c$ there are two values of $r$ that satisfy the condition $T_{r,c} = a$. If we want to reduce the expansion of the message, we can replace it with the one in Table 2 (in this version we must have $N_A = 2$ and $m = 2$).
Table 2. An alternative seek table $T_{r,c}$. Rows: $r = (\Sigma_{i,1}, \Sigma_{i,2})$; columns: $c = \kappa_j$.
r \ c  00  01  10  11
00     a   b   c   d
01     b   c   d   a
10     c   d   a   b
11     d   a   b   c
In this case the message expansion will be $l_e \approx 2N_M + 4N_M = 6N_M$. Another way to diminish the expansion of the coded message is to use an asymmetric probability function for the generation of the $\{\alpha_i\}$, for example:
Table 3. An asymmetric distribution for $\alpha$
α   p
a   3/8
b   3/8
c   1/8
d   1/8
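The expansion estimates of this section can be checked with a few lines of arithmetic. The sketch assumes that the actions $\alpha_i$ are drawn independently, that every exon carries exactly one clear bit, and that the expected number of introns per exon is therefore $P(S_2)/P(S_1)$; the function name `expected_expansion` is hypothetical.

```python
from fractions import Fraction


def expected_expansion(exon_bits, intron_bits, p_intron):
    """Expected number of coded bits per clear bit.

    Assumption: actions are independent, so on average p_intron / (1 - p_intron)
    introns are emitted for every exon.
    """
    p_exon = 1 - p_intron
    introns_per_exon = p_intron / p_exon
    return exon_bits + intron_bits * introns_per_exon


half = Fraction(1, 2)
print(expected_expansion(3, 5, half))            # 8    (Table 1, uniform alpha)
print(expected_expansion(2, 4, half))            # 6    (Table 2, uniform alpha)
print(expected_expansion(2, 4, Fraction(1, 4)))  # 10/3 (Table 2 with Table 3)
```

With the probabilities of Table 3 an intron is chosen one time in four, so the two optimizations together give about 3.3 coded bits per clear bit.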
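Using these last two optimizations, the approximate length of the coded message is reduced to $l_e \approx 3N_M$.
4.4. A biological implementation
Now, as an example, we want to create an implementation of our model that is closer to biology. Let us again introduce the ingredients for the new model. We start from the definitions $K = \mathbb{N}$, $M = A = \{a, b, c, d\}$, with $N_K, N_M \in \mathbb{N}$, and $S = S_1 \cup S_2$ with $S_1 = \{c\}$ and $S_2 = \{nc\}$; $\mathcal{O} = \{A^{N_A} \setminus (a,a,a)\} \cup (\{(a,a,a)\} \times (\bigcup_i A^{N_{B_i}}))$, $N_A = n = 3$, $N_{B_i} \in \mathbb{N}$ and $m = 1$. The function $f$ will be:
$\Sigma_i = f_\alpha(\sigma_i, \kappa_{j_i}) = \begin{cases} \sigma_i & \text{if } \alpha \in S_1 \\ (a, a, a, B) \text{ with } B \in A^{\kappa_{j_i}} & \text{if } \alpha \in S_2 \end{cases}$ (7)
In this case the element $(a, a, a) \in A^3$ is the codon that marks the beginning of an intron in the sequence (it never appears in the clear code), and the function $\bar{f}$ will be:
$\bar{f}(\Sigma_i, \kappa_{j_i}) = \begin{cases} (\Sigma_{i,1}, \Sigma_{i,2}, \Sigma_{i,3}) & \text{if } (\Sigma_{i,1}, \Sigma_{i,2}, \Sigma_{i,3}) \neq (a, a, a) \\ \emptyset & \text{otherwise} \end{cases}$ (8)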
5. The role of the Coding Functions
In section 2.2 we introduced the definition of Coding functions. These functions play an important role in the complexity and in the length of the cipher text. It is also very important to understand that the number of functions involved in the process of encryption can be huge. In fact, besides the canonical functions introduced in section 4.1 we can add functions whose role is to:
• Change the public/secret keys
- For example, we can insert in the cipher text a sequence of bits of arbitrary length that replaces the public (private) key used to decode the rest of the message.
• Insert long sequences of random bits
- As before, but the inserted bits are ignored.
• Move the key pointer position forward or backward
- A disruptive effect on the sequential access to the secret key can only add more noise for an attacker.
• Reset global parameters such as the message pointer position, the key pointer position, and secret and public data
- See above.
It is easy to see that the number of Coding functions that can be inserted in the encoding mechanism is large, and that they may become part of the shared secret information. In the same way, some Coding functions can greatly increase the size of the cipher text, and the inclusion of random bits within the text can increase the randomness of the cipher text.
6. Statistical Analysis
6.1. Cryptanalysis
Cryptanalysis (from the Greek kryptós, "hidden", and analýein, "to loosen" or "to break") is the study of methods for obtaining the meaning of encrypted information without access to the secret information that is usually required to do so. Typically this means finding a secret key. Cryptanalysis is the "counterpart" of cryptography, the study of techniques for concealing a message; together they form cryptology, the science of hidden writing. Cryptanalysis also refers to any attempt to circumvent the security of other cryptographic algorithms and protocols. The methods of cryptanalysis usually exclude attacks that are not directed at the inherent weaknesses of the method under attack, such as bribery, physical coercion, theft and social engineering. Such attacks are nevertheless an important concern, and they are often more productive than traditional cryptanalysis. The first cryptanalytic tool used to analyze the RNA-Crypto System is statistical analysis. For this purpose we used a well-known battery of tests called Diehard (or Dieharder in its newer version).
• The Diehard tests are a battery of statistical tests for measuring the quality of a set of random numbers.
• It is cited by NIST as one of the best statistical suites for testing randomness.
• It was developed by George Marsaglia over several years and first published in 1995; the newer Dieharder suite is maintained and improved by Robert Brown at Duke University.
• We focused our attention on the following tests:
(1) Birthday spacings
(2) Overlapping permutations
(3) Ranks of matrices
(4) Monkey tests
(5) Count the 1s
(6) Parking lot test
(7) Minimum distance test
(8) Random spheres test
(9) The squeeze test
(10) Overlapping sums test
(11) Runs test
(12) The craps test
All these tests are well described in the software package and in the literature.
6.2. Our Idea
In nature a nucleotide ribbon is a sequence of (almost always) 4 symbols (a, c, g, t/u). In computer science a binary string is a sequence of 2 symbols (0, 1). Now, suppose we translate the 4 symbols as in Table 4:
Table 4. Translation table
base   binary
a      00
c      01
g      10
t/u    11
We then have a correspondence between bits and nucleotides. With this assumption the Diehard tests can be used to assess the randomness of the binary conversion of a sequence of nucleotides, as well as of the sequence of an encrypted message. Our idea is based on the following questions:
(1) Do RNA ribbons show good random behavior according to the DieHard(er) tests?
(2) Do RNA-Crypto System sequences show good random behavior according to the DieHard(er) tests?
The answers can be given by looking at the test results.
6.3. Bio results
6.3.1. The Experiment
We used only some very long sequences of nucleotides from standard free online databases, because the tests run well on more than 12 Mbytes of data. Since each base is coded with 2 bits, 1 Mbyte = 4 Mbases, so we need sequences of at least 48 Mbases. Our attention was therefore also focused on the human genome, which can give very long sequences.
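The translation of Table 4 and the relation between bytes and bases can be illustrated as follows. The sketch assumes that bases are packed four to a byte with the first base in the most significant bits, and that an incomplete trailing group is dropped; neither choice is stated in the text, and the name `bases_to_bytes` is hypothetical.

```python
BASE_TO_BITS = {"a": 0b00, "c": 0b01, "g": 0b10, "t": 0b11, "u": 0b11}


def bases_to_bytes(sequence):
    """Pack a nucleotide string into bytes, 2 bits per base (Table 4).

    Assumptions: first base goes into the most significant bits, and a
    trailing group of fewer than 4 bases is discarded. Symbols outside
    a, c, g, t, u (for example 'n') raise KeyError.
    """
    seq = sequence.lower()
    usable = len(seq) - len(seq) % 4
    packed = bytearray()
    for start in range(0, usable, 4):
        byte = 0
        for base in seq[start:start + 4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        packed.append(byte)
    return bytes(packed)


print(bases_to_bytes("acgtacgu").hex())  # 1b1b: 8 bases become 2 bytes
```

Eight bases give two bytes, in agreement with 1 Mbyte = 4 Mbases. The resulting byte string could then be written to a file and passed to the test battery.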
6.3.2. The protocol
(1) Get a sequence from the database (longer than 48 Mbases)
(2) Translate it into binary
(3) Run the DH test on it
6.3.3. The Data
For the biological sequences we used the following:
• Caenorhabditis elegans chromosomes I-V, complete sequences
• Wallaby, whole genome
• Human chromosome 14, complete sequence
• Drosophila melanogaster, some chromosomes, complete sequences
all encoded using Table 4.
6.3.4. The Results
Table 5 shows the results of our experiment on the biological data. The bio-sequences fail every test, i.e. they are not random at all. The reason could be found, of course, in the following facts:
• Some parts of a pre-mRNA sequence can be highly repetitive (Satellite, Minisatellite, ...)
• Some parts of a pre-mRNA sequence can be made up of a very long run of the same nucleotide (Polyadenylation tail, ...)
Table 5. Bio Results
n   Test name                             Status
1   Birthday Spacings                     FAIL
2   Overlapping Permutations              FAIL
3   Ranks of 31x31 and 32x32 Matrices     FAIL
4   Ranks of 6x8 Matrices                 FAIL
5   Monkey Tests on 20-bit Words          FAIL
6   Monkey Tests OPSO, OQSO, DNA          FAIL
7   Count the 1's in a Stream of Bytes    FAIL
8   Count the 1's in Specific Bytes       FAIL
9   Parking Lot Test                      FAIL
10  Minimum Distance Test                 FAIL
11  Random Spheres Test                   FAIL
12  The Squeeze Test                      FAIL
13  Overlapping Sums Test                 FAIL
14  Runs Test                             FAIL
15  The Craps Test                        FAIL
6.4. RNA-Crypto System results
6.4.1. The Experiment
In the RNA-Crypto System experiment we used long sequences of binary data encrypted with our protocol (approximately 100 Mbytes of data for each experiment).
6.4.2. The Protocol
The protocol used to estimate the randomness of the RNA-Crypto System is:
(1) Run the DNACrypto program on a message of suitable length
(2) Save the encrypted message
(3) Run the DH test on it, considering it as a random sequence.
6.4.3. The Data
For the cryptographic sequences we used:
• A 10 Mbyte file filled with the ASCII character 'A'
• A 10 Mbyte file filled with random binary numbers
• One of the above biological sequences
each encoded with the Algorithm.
6.4.4. The Results
Table 6 shows the results of our experiment on the RNA-Crypto System data. The RNA-Crypto System sequences pass every test, i.e. they behave as random sequences according to the DH tests.
Table 6. Crypto Results
n   Test name                             Status
1   Birthday Spacings                     PASS
2   Overlapping Permutations              PASS
3   Ranks of 31x31 and 32x32 Matrices     PASS
4   Ranks of 6x8 Matrices                 PASS
5   Monkey Tests on 20-bit Words          PASS
6   Monkey Tests OPSO, OQSO, DNA          PASS
7   Count the 1's in a Stream of Bytes    PASS
8   Count the 1's in Specific Bytes       PASS
9   Parking Lot Test                      PASS
10  Minimum Distance Test                 PASS
11  Random Spheres Test                   PASS
12  The Squeeze Test                      PASS
13  Overlapping Sums Test                 PASS
14  Runs Test                             PASS
15  The Craps Test                        PASS
6.5. First Results - Differences
As expected, the results agree with our ideas about the sequences of nucleotides, and arguably also with those about the Crypto System. Of course, natural phenomena are not really random in the way a cryptographic system is. Some obvious questions are:
• Why do pre-mRNA sequences not pass the statistical tests?
- Of course they are not random (life is not random)
• From the statistical point of view, some of the reasons may be the following:
- Some parts of a pre-mRNA sequence can be highly repetitive (Satellite, Minisatellite, ...)
- Some parts of a pre-mRNA sequence can be made up of a very long run of the same nucleotide (Polyadenylation tail, ...)
• Is it possible to manipulate the RNA-Crypto System protocol to obtain the same results?
6.6. Changes: Informatics emulates biology
6.6.1. Modify Cryptographic protocol - Phase I
Because of its random nature, the cryptographic model has no repeated sequences inside it. We therefore introduce a new coding function $\rho$ (the replicator function) that artificially adds such repeated sequences to the code. These pieces of code can be:
• active (they act in some way on the system)
• passive (just junk!)
Definition 6.1. Be p : M x K x 0 p (a,
K"
---->
0 a junk function:
0) =
~
=
L
(9)
where B :3 i = OU,K and OU,K is a subsequence of o. If a is a part of the encoded message, this mechanism implements the repeated sequences phenomenon.
6.6.3. Alter Cryptographic protocol - Phase II
Because of its random nature, the cryptographic model has no significant all-equal subsequences. We therefore introduce a new coding function $\tau$ (the stutter function) that artificially adds such mono-symbol subsequences. In this case too, the subsequences can be created to be:
• active (they act in some way on the system)
• passive (just junk!)
6.6.4. Phase II - Strategies
The stutter function
Definition 6.2. Let $\tau : \mathcal{M} \times \mathcal{K} \to \mathcal{O}$ be a junk function:
$\tau(\sigma, \kappa) = \Sigma = \iota$ (10)
where $\mathcal{B} \ni \iota = z^{n_{\sigma,\kappa}}$, $z \in A \cup B_i$ and $n_{\sigma,\kappa}$ is an integer. This mechanism implements the polyadenylation tail.
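The payloads produced by the two junk functions can be sketched in a few lines. In Definitions 6.1 and 6.2 the copied subsequence and the run length depend on $(\sigma, \kappa)$; how they are derived is not specified, so the sketch takes them as explicit parameters. The names `replicator` and `stutter` mirror the text, but their signatures are hypothetical.

```python
def replicator(coded, start, length):
    """Intron payload that repeats coded[start:start + length].

    Assumption: start and length stand in for the dependence on (sigma, kappa).
    """
    if start < 0 or length < 0 or start + length > len(coded):
        raise ValueError("the subsequence must lie inside the coded message")
    return list(coded[start:start + length])


def stutter(symbol, count):
    """Intron payload made of `count` copies of one symbol (a mono-symbol run)."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return [symbol] * count


coded = list("gcatgcca")
print(replicator(coded, 2, 4))  # ['a', 't', 'g', 'c']
print(stutter("a", 6))          # ['a', 'a', 'a', 'a', 'a', 'a']
```

Inserted as introns, such payloads add repeated and all-equal runs to the coded sequence while leaving the exons, and hence the decoded message, unchanged.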
Table 7. New Crypto Results
n   Test name                             Status
1   Birthday Spacings                     FAIL
2   Overlapping Permutations              FAIL
3   Ranks of 31x31 and 32x32 Matrices     FAIL
4   Ranks of 6x8 Matrices                 FAIL
5   Monkey Tests on 20-bit Words          FAIL
6   Monkey Tests OPSO, OQSO, DNA          FAIL
7   Count the 1's in a Stream of Bytes    FAIL
8   Count the 1's in Specific Bytes       FAIL
9   Parking Lot Test                      FAIL
10  Minimum Distance Test                 FAIL
11  Random Spheres Test                   FAIL
12  The Squeeze Test                      FAIL
13  Overlapping Sums Test                 FAIL
14  Runs Test                             FAIL
15  The Craps Test                        FAIL
6.7. Results
The new results come from the statistical analysis of the new cryptographic data. Note: the result in Table 7 holds for all the data in the cryptographic set.
6.8. New Results - Differences
We are currently working to prove that the security of the cryptographic system is not affected by the new additions. So far we have not found any attack that can exploit these features, so we are highly confident that the new system has the same level of security as the standard one.
• In fact, even if the spy could isolate the introns, the rest of the message would be exactly equal to the previous version
• Then the safety remains the same (or better)
Moreover, by adding redundancy we obtained some properties as a side effect:
• A certain robustness to errors in transmission
- Indeed, in a very redundant system, the probability that an error in a bit prevents the decoding is very low
- This also suggests a particular interpretation of redundancy in RNA (protection against excessive mutations, as an example)
• Furthermore, the spy, during his attack, does not know whether the piece of code that he is trying to attack is an exon or an intron
- This also suggests another particular interpretation of redundancy in RNA (protection against pathogens?)
7. Conclusion
In conclusion, this work allows us to make some observations. First, by adding some artificial noise to a good random sequence we transformed it into a non-random sequence, but this transformation does not affect the cryptographic security. This means that tests of randomness, while giving a good estimate of safety, are not infallible, and more tests are needed to determine the quality of an encryption system. Finally, the similarity between biology and computer science (cryptography) must be investigated further in order to give further interpretation to biological mechanisms that are still partially obscure today.
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 311-319)
DISCRETE APPROXIMATION TO OPERATORS IN WHITE NOISE ANALYSIS
SI SI Faculty of Information Science and Technology Aichi Prefectural University Aichi Prefecture, Japan
2000 AMS Classification: 60H40
In this paper we discuss how to approximate white noise functionals and the operators in white noise analysis by using variables depending on a discrete parameter. Our aim is to understand the basic idea and the real meaning of the approximation of operators.
1. Introduction
The main aim of this report is to discuss a method of approximating white noise functionals by using a system of variables with a discrete parameter. We then naturally proceed to the approximation of operators. When we discuss the approximation of nonlinear functionals of white noise we meet a crucial difficulty: to arrive at some nonlinear functionals of the $\dot B(t)$'s we naturally require the so-called renormalization. Thus, we shall provide a general theory of renormalization of white noise functionals. First, we take a system $\{X_n, n \in \mathbf{Z}\}$ of standard Gaussian random variables and their functions. We then come to the approximation of white noise functionals $\varphi(\dot B(t), t \in \mathbf{R})$ by functions of the $X_n$'s. In this course, we can see why renormalization is necessary for some white noise functionals. The approximation discussed in this note is therefore very different from the approximation of ordinary functions. The last section is devoted to the approximation of operators acting on the space of white noise functionals.
2. Analysis of white noise functionals
We wish to discuss functionals $\varphi(\dot B(t), t \in \mathbf{R})$, noting that $\{\dot B(t), t \in \mathbf{R}\}$ is a system of idealized elemental random variables. First, we form basic functionals of the $\dot B(t)$'s, that is, polynomials in the $\dot B(t)$'s. Consider the system
$$A = \text{algebra generated by the system } \{1, \dot B(t), t \in \mathbf{R}\}.$$
Proposition 2.1. $A$ forms a graded algebra, that is
i) $A$ is an algebra.
ii) $A = \sum_{n=0}^{\infty} A_n$ (algebraic direct sum), where $A_n = \{\text{homogeneous polynomials of degree } n\}$.
iii) $A_n \cdot A_m = \{f_n(\dot B)\, g_m(\dot B);\ f_n \in A_n,\ g_m \in A_m\}$.
Remark 2.1. $\dot B(t) = \dot B(t, \omega)$, $\omega \in \Omega(\mu)$, where $B$ is Brownian motion. For every $\omega$, $\dot B(t, \omega)$ is defined as a generalized function of $t$. But the smeared variables $\dot B(\xi)$, $\xi \in E$, $E$ being a nuclear space, are well defined as ordinary random variables.
Remark 2.2. The sigma field generated by $\{\dot B(t), t \in \mathbf{R}\}$ is understood to be equal to the sigma field $\mathbf{B}$ generated by $\{\dot B(\xi), \xi \in E\}$.
It is known that
$$(L^2) \equiv L^2(\Omega, \mathbf{B}, \mu) = \bigoplus_{n=0}^{\infty} H_n \quad \text{(Fock space)}.$$
In particular, $H_1$ is spanned by $\dot B(\xi)$, $\xi \in E$, and we have
$$H_1 \cong L^2(\mathbf{R}). \qquad (1)$$
As an extension of the isometry we have
Also it is shown that (1) can be extended to
$$H_1^{(-1)} \cong K^{(-1)}(\mathbf{R}), \qquad (2)$$
where $K^{(-1)}(\mathbf{R})$ is the Sobolev space of degree $-1$ over $\mathbf{R}^1$. Since $K^{(-1)}(\mathbf{R})$ contains the delta function $\delta_t(\cdot)$, we see that $\dot B(t)$ is a well defined member of $H_1^{(-1)}$.
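In this line the orthogonalization of the sub-algebras $A_n$ leads us to define a space $H_n^{(-n)}$ of generalized white noise functionals of degree $n$, and we finally come to the space of generalized white noise functionals:
$$(L^2)^- = \bigoplus_n c_n H_n^{(-n)}.$$
3. Discrete parameter case
We restrict the time parameter to $[0,1]$. The linear space spanned by the $\langle \dot B, \xi_n\rangle$'s, $\{\xi_n\}$ being a base of $L^2([0,1])$, is the same as the space spanned by the $\langle \dot B, \chi_{n,k}\rangle$, where
$$\chi_{n,k} = \chi_{\Delta_k^n}, \qquad \Delta_k^n = \left[\frac{k-1}{2^n}, \frac{k}{2^n}\right].$$
Write
$$X_k^n = 2^{n/2} \langle \dot B, \chi_{n,k}\rangle.$$
Then, for each $n = 0, 1, 2, \dots$, the $X_k^n$, $1 \le k \le 2^n$, are independent and identically $N(0,1)$-distributed. Obviously
$$\mathbf{B}(X_k^n,\ n = 0, 1, 2, \dots;\ 1 \le k \le 2^n) = \mathbf{B} = \bigvee_n \mathbf{B}_n,$$
where $\mathbf{B}_n = \mathbf{B}(X_k^n,\ 1 \le k \le 2^n)$. Writing $(L_n^2) = L^2(\Omega, \mathbf{B}_n, \mu)$, we have the following.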
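The construction above can be illustrated numerically. The following is a minimal sketch, not part of the original text: it samples the increments of a Brownian path over the dyadic intervals $\Delta_k^n$, rescales them by $2^{n/2}$ to obtain the $X_k^n$, and shows how the variables of level $n$ determine those of level $n-1$, which is the reason the sigma fields $\mathbf{B}_n$ increase with $n$. The function names `dyadic_variables`, `coarsen` and `sample_variance` are illustrative assumptions.

```python
import math
import random


def dyadic_variables(n, rng):
    """Sample X_k^n, k = 1..2**n, from Brownian increments on [0, 1].

    The increment of B over an interval of length 2**-n is N(0, 2**-n),
    so multiplying by 2**(n/2) gives a standard Gaussian variable.
    """
    width = 2.0 ** (-n)
    increments = [rng.gauss(0.0, math.sqrt(width)) for _ in range(2 ** n)]
    return [2.0 ** (n / 2) * inc for inc in increments]


def coarsen(level_n):
    """Compute the level n-1 variables from the level n variables.

    Each interval of level n-1 is the union of two intervals of level n,
    hence X_k^{n-1} = (X_{2k-1}^n + X_{2k}^n) / sqrt(2).
    """
    return [
        (level_n[2 * k] + level_n[2 * k + 1]) / math.sqrt(2.0)
        for k in range(len(level_n) // 2)
    ]


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


rng = random.Random(0)  # fixed seed, chosen only for reproducibility
x_fine = dyadic_variables(12, rng)
x_coarse = coarsen(x_fine)
print(len(x_fine), len(x_coarse))
print(sample_variance(x_fine), sample_variance(x_coarse))
```

Both printed variances should be close to one, since the coarse variables are again standard Gaussian; they are functions of the fine ones, so every functional measurable with respect to $\mathbf{B}_{n-1}$ is also measurable with respect to $\mathbf{B}_n$.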
Theorem 3.1. The space $(L_n^2)$ is increasing in $n$ and the inductive limit of $(L_n^2)$ is equal to $(L^2)$.
4. Operators: from discrete to continuous form
We first approximate a Brownian motion by Lévy's construction (see [7]), which is well suited to realizing an approximation. That is, we take an independent system $\{\Delta_k^n B\}$, which approximates Brownian motion as $\{\Delta_k^n\}$ gets finer. Thus, we take $\{\Delta_k^n B / |\Delta_k^n|\}$ as a basic system for the random variables in what follows. We are now going to explain why we take the Fréchet derivative to define $\partial_t = \frac{\partial}{\partial \dot B(t)}$. In fact, we define $\partial_t$ as a limit (in the sense to be prescribed below) of $\frac{\partial}{\partial X_n}$, where $\{X_n\}$ is independent and identically $N(0,1)$-distributed. To see the idea we simply note the following.
1) In the discrete parameter case let $X = (X_1, X_2, \dots)$, where the $X_i$'s are independent and identically $N(0,1)$-distributed. The partial derivative $\frac{\partial}{\partial X_n} f(X)$ is defined to be
$$\frac{\partial}{\partial x_n} f(x_1, x_2, \dots)\Big|_{x_j = X_j}. \qquad (3)$$
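As a concrete instance of (3), the discrete derivative can be checked on Hermite polynomials, which are the natural polynomials of a standard Gaussian variable. The sketch below is an illustration under stated assumptions, not part of the original text: it uses the probabilists' Hermite polynomials $He_n$ with the recurrence $He_{n+1}(x) = x\,He_n(x) - n\,He_{n-1}(x)$ and verifies numerically that $\frac{d}{dx} He_n = n\,He_{n-1}$, so that differentiation lowers the degree by one. The helper names are hypothetical.

```python
def hermite(n, x):
    """Probabilists' Hermite polynomial He_n(x) via the three-term recurrence."""
    if n < 0:
        return 0.0
    prev, curr = 0.0, 1.0  # He_{-1} := 0, He_0 = 1
    for k in range(n):
        prev, curr = curr, x * curr - k * prev
    return curr


def derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def check_lowering(n, x):
    """Return (numerical d/dx He_n(x), n * He_{n-1}(x)); the two should agree."""
    return derivative(lambda y: hermite(n, y), x), n * hermite(n - 1, x)


for n in range(1, 5):
    print(n, check_lowering(n, 0.7))
```

Read as $x\,He_n = n\,He_{n-1} + He_{n+1}$, the same recurrence splits multiplication by $x$ into a degree-lowering part and a degree-raising part.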
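We wish to use an analogous technique.
2) Since
$$\Delta_k^n \to \{t\} \implies \frac{\Delta_k^n B}{|\Delta_k^n|} \to \dot B(t), \qquad (4)$$
and since the S-transform of $\dot B(t)$ is $\xi(t)$, we can use a counterpart of (3). Namely, for $\varphi(\dot B) = \varphi(\dot B(t), t \in \mathbf{R}^1)$ first take $\frac{\delta}{\delta \xi(t)} (S\varphi)(\xi)$ and apply $S^{-1}$. This is expressible as
$$\partial_t \varphi = \frac{\partial \varphi}{\partial \dot B(t)} = S^{-1} \frac{\delta}{\delta \xi(t)} (S\varphi), \qquad (5)$$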
where $\frac{\delta}{\delta \xi(t)}$ is the Fréchet derivative. Formally we follow this; however, we need some interpretation to have (5) understood correctly. Coming back to (3), the partial derivative $\frac{\partial}{\partial X_n}$ means a derivative with respect to $X_n$. This should correspond to a variation of the random variable $X_n$, where the variation is recognized within the world of random functions. This is difficult to justify. Once this difficulty is overcome, the definition (5) is acceptable. In the expression (5), the Fréchet derivative is understood to be a derivative obtained by measuring infinitesimal variations in all possible directions. The idea of understanding the partial derivative with respect to the random variable $\dot B(t)$ is the same as in the discrete case. Concerning the definition of the partial derivative $\partial_t$, we have to note a crucial difference from that in the discrete case. For $\frac{\partial}{\partial X_n}$, the variable $X_n$ is a standard gauge, i.e. $X_n$ has unit variance. On the other hand $\dot B(t)$ is an infinitesimal random variable; formally speaking it has variance $\frac{1}{dt}$. This difference is absorbed by the S-transform. Nevertheless, we must be careful when $\partial_t$ is approximated by $\frac{\partial}{\partial X_n}$ in the discrete case. The examples which now follow illustrate this fact. We note that the Fréchet derivative lowers the degree of homogeneous functionals by 1, which means that $\partial_t$ is an annihilation operator. It
acts in such a way that the kernel function $F(u_1, \dots, u_n)$ associated to $\varphi \in H_n$ becomes $n F(u_1, \dots, u_{n-1}, t)$.
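The S-transform acting on $(L^2)^-$ is expressible as
$$(S\varphi)(\xi) = C(\xi) \langle e^{\langle x, \xi\rangle}, \varphi(x)\rangle,$$
since $e^{\langle \cdot, \xi\rangle} \in (L^2)^+$. Set $\mathbf{F} = \{(S\varphi)(\xi);\ \varphi \in (L^2)^-,\ \xi \in E\}$.
Theorem 4.1. The vector space $\mathbf{F}$ can be topologized to be a Reproducing Kernel Hilbert Space with kernel $C(\xi - \eta)$, $(\xi, \eta) \in E \times E$, and $\mathbf{F} \cong (L^2)^-$.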
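Example 4.1. If $f$ is continuous, then
$$\partial_t \int f(u) \dot B(u)\,du = f(t).$$
In particular, $\partial_t \dot B(s) = \delta(t - s)$, where $\delta$ is the delta function. This is the analogue of $\frac{\partial}{\partial X_j} X_k = \delta_{j,k}$ in the discrete case.
Example 4.2. $\partial_t\, {:}\dot B(s)^n{:} = n\, {:}\dot B(s)^{n-1}{:}\ \delta(t - s)$.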
Note that $\delta(t - s)$ appears in the above expression.
Example 4.3. For the Gauss kernel
$$\varphi = N e^{c \int \dot B(u)^2 du}, \qquad (6)$$
we have
$$\partial_t \varphi = 2c\, {:}\dot B(t) \varphi{:}, \qquad (7)$$
which comes from the variation of the S-transform
$$\frac{\delta}{\delta \xi(t)} U(\xi) = 2c\, \xi(t)\, U(\xi) \qquad (8)$$
(see [6]). Taking the $S^{-1}$-transform, we obtain (7).
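Example 4.1 can be verified directly from definition (5). The following worked computation is a sketch that assumes only the fact quoted above, that the S-transform of $\dot B(u)$ is $\xi(u)$, together with the linearity of $S$; it is not taken from the original text.

```latex
\begin{aligned}
\varphi &= \int f(u)\,\dot B(u)\,du
  &&\Longrightarrow\quad (S\varphi)(\xi) = \int f(u)\,\xi(u)\,du, \\
\frac{\delta}{\delta\xi(t)}(S\varphi)(\xi)
  &= \int f(u)\,\delta(u-t)\,du = f(t), \\
\partial_t\varphi
  &= S^{-1}\bigl(f(t)\bigr) = f(t).
\end{aligned}
```

The last step uses that a constant is its own S-transform. Formally taking $f(u) = \delta(u - s)$ in the same computation gives $\partial_t \dot B(s) = \delta(t - s)$.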
Particular interest is centered on quadratic normal functionals, in the sense of Lévy [10], the S-transforms of which are expressed in the form
$$U(\xi) = \int_0^1 f(u)\, \xi(u)^2\,du + \int_0^1\!\!\int_0^1 F(u,v)\, \xi(u)\, \xi(v)\,du\,dv, \qquad (9)$$
where $f$ is continuous and $F \in L^2(\mathbf{R}^2)$. The action of the partial differential operator $\partial_t$ is seen by its S-transform in the expression
$$U'(\xi, t) = 2 f(t)\, \xi(t) + \int_0^1 \big(F(t,u) + F(u,t)\big)\, \xi(u)\,du. \qquad (10)$$
The second derivative $\partial_t^2$ corresponds to the functional
$$U''(\xi, t) = 2 f(t)\, \delta_t \qquad (11)$$
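to which we wish to correspond $\frac{\partial^2}{\partial X_n^2}$ in the digital case. If we wish to correspond to the trace $\sum_n \frac{\partial^2}{\partial X_n^2}$, then we define
$$\Delta_L = \int \partial_t^2\, (dt)^2. \qquad (12)$$
In view of this, the second term is annihilated. Hence for $\varphi$ with S-transform (9) we have
$$\Delta_L \varphi = 2 \int_0^1 f(t)\,dt. \qquad (13)$$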
If we change the interval $[0,1]$ to any interval (maybe $[0, \infty)$ or $(-\infty, \infty)$), we have the same expression provided $f$ is integrable. Using the formula for Hermite polynomials, the multiplication in the discrete case is obtained as
$$X_j f(X) = \frac{\partial}{\partial X_j} f(X) + \Big(\frac{\partial}{\partial X_j}\Big)^{*} f(X),$$
which approximates the multiplication by $\dot B(t)$, which is denoted by $p_t$, that is
$$p_t = \partial_t + \partial_t^{*}. \qquad (14)$$
We then come to the infinitesimal rotations. In the discrete case the infinitesimal rotation on the $(X_j, X_k)$-plane through the Euler angle is
which approximates
that is, by using (14),
Also we prove that $\Delta_L$ is rotation invariant, that is
(15)
In short, we may say the expression
$$\Delta_L = \int \partial_t^2\, (dt)^2$$
is coordinate free. For the discrete case, we fix $T = [0,1]$ and $X_n = \langle \dot B, \zeta_n\rangle$, $\{\zeta_n\}$ being a complete orthonormal system in $L^2(T)$. We assume that the system is equally dense in the sense of P. Lévy [12]. Then $\Delta_L$ is defined by
(16)
where
$$\bar\Delta_L = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \frac{\partial^2}{\partial X_n^2}. \qquad (17)$$
There is an example. Let
$$X_n = \langle \dot B, \zeta_n\rangle, \qquad \varphi = \sum_n a_n (X_n^2 - 1),$$
where $a_n = 1$ or $-1$. Of course $\varphi$ is a generalized functional if test functionals are of the form
$b_n$ being real and rapidly decreasing. The bilinear form is given by
(18)
the right side is absolutely convergent, i.e.
And
$$\bar\Delta_L \varphi = \lim_{N \to \infty} \frac{2}{N} \sum_{n=1}^{N} a_n$$
exists for periodic $a_n$. The limit changes by rotation. Thus, we see that there is a difference between the discrete case and the continuous case: the Laplacian in the discrete case is not coordinate free, although the Laplacian in the continuous case is coordinate free.
Acknowledgement
The author wishes to express her deep gratitude to the organizers for the invitation to the international conference QBIC 2010, held at Tokyo University of Science.
References
1. L. Accardi and V. Bogachev, The Ornstein Uhlenbeck Process and the Dirichlet Form associated to Levy Laplacian, N. 193 Volterra Center, 2004.
2. T. Hida, Canonical representations of Gaussian processes and their applications. Mem. Coll. Sci. Univ. Kyoto, 34 (1960), 109-155.
3. T. Hida and N. Ikeda, Analysis on Hilbert space with reproducing kernel arising from multiple Wiener integral. Proc. 5th Berkeley Symp. on Math. Statistics and Probability, vol. 2 (1967), 117-143.
4. T. Hida, Analysis of Brownian functionals. Carleton Math. Lecture Notes no. 13, Carleton University, 1975.
5. T. Hida, Theory of Probability. Foundation and Developments. Kyouritsu Pub. Co., in Japanese, 2010.
6. T. Hida and Si Si, An innovation approach to random fields. Application of white noise theory. World Scientific Pub. Co., 2004.
7. T. Hida and Si Si, Lectures on white noise functionals. World Sci. Pub. Co., 2008.
8. T. Hida, Theory of Probability. Foundation and Developments. Kyouritsu Pub. Co., in Japanese, 2008.
9. P. Levy, Processus stochastiques et mouvement brownien. Gauthier-Villars, 1948. 2eme ed. with supplement, 1965.
10. P. Levy, Problemes concrets d'analyse fonctionnelle. Gauthier-Villars, 1951.
11. J. Mikusinski, On the square of the Dirac delta-distribution. Bulletin de l'Academie Polonaise des Sciences, Ser. math., astro. et phys. 14 (1966), 511-513.
12. Si Si and T. Hida, Some aspects of quadratic generalized white noise functionals. Proc. QBIC08 held at Tokyo Univ. of Science, 2008. QP-PQ, Vol. XXIV, 2009, World Scientific Publ. Co., 184-191.
13. Si Si, An aspect of quadratic Hida distributions in the realization of a duality between Gaussian and Poisson noises. IDAQP 11 (2008), 109-118.
14. Si Si, Introduction to Hida distributions. World Sci. Pub. Co., 2010, to appear.
Quantum Bio-Informatics IV eds. L. Accardi, W . Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 321- 337)
BOGOLIUBOV TYPE EQUATIONS VIA INFINITE-DIMENSIONAL EQUATIONS FOR MEASURES V.V. KOZLOV AND O.G. SMOLYANOV
Steklov Mathematical Institute, Russian Academy of Sciences E-mail [email protected] Lomonosov Moscow State University E-mail: [email protected]
Introduction
We introduce a new method of deriving some systems of equations describing the evolution of both quantum and classical systems of statistical mechanics, which does not use the so-called thermodynamical limit [7]. A particular case of one of the systems is the famous Bogoliubov system of equations (this system is also called the Born-Bogoliubov-Green-Kirkwood-Yvon chain of equations). Each of the systems of equations either follows from or is equivalent to an equation with respect to functions of a real variable taking values in the space of some measures on the infinite-dimensional phase space. In the classical case the latter equation is the infinite-dimensional Liouville equation, i.e. the adjoint equation to the equation for first integrals of the (system of) Hamilton equation(s). In the quantum case the Liouville equation is replaced by an equation, with respect to functions taking values in the space of measures on the phase space, which is a generalization of the equation describing the evolution of the Wigner function. In the infinite-dimensional case one needs to consider equations with respect to functions taking values in some spaces of measures, because there does not exist a Lebesgue measure on an infinite-dimensional space, i.e. a measure which is translation invariant, $\sigma$-additive, $\sigma$-finite and locally finite (= finite on some balls), and hence it is impossible to consider, instead of measures, their densities. In particular, we need to define the notion of the Wigner measure. Actually the systems of equations that we consider below are systems of equations describing the evolution of densities of measures on finite-dimensional spaces, which are either finite-dimensional projections of measures on some infinite-dimensional spaces, whose evolution is governed by some infinite-dimensional equations, or the results of a procedure of the so-called disintegration [13] of similar measures. The classical Bogoliubov system of equations is a particular case of one of the latter systems. One can say that, instead of the thermodynamical limit, we use an infinite-dimensional phase space. This allows us to develop not only the systems of equations with respect to densities of particles and of collections of particles but also some systems of equations with respect to finite-dimensional densities of probabilities.
The paper is organized as follows. First we consider the classical case, when the state is described by a nonnegative measure on the phase space. After that we define the notion of a Wigner measure, briefly formulate some properties of that object and introduce the equation describing the evolution of the Wigner measure. Then, having noticed the similarity between the Liouville equation for nonnegative measures and the equation for the Wigner measure, we formulate the quantum analogs of those results of the paper that are related to the classical case. Finally, we discuss a generalization of a Poincaré model of evolution that is irreversible but symmetric with respect to time. Below we want to clarify the main ideas, and in some places we omit some technical assumptions.
1. Symplectic locally convex spaces and Hamilton's equations.
For any locally convex space (LCS) $E$ we denote by $E^*$ the vector space of all continuous linear functionals on $E$ equipped with a topology compatible with the duality between $E^*$ and $E$. We assume that the scalar field is the field of all real numbers and that all LCS are Hausdorff. For any LCS $E$, $G$ we denote by $L(E, G)$ the vector space of all continuous linear mappings from $E$ into $G$; for any $n \in \mathbf{N}$ we denote by $B_n(E)$ the vector space of all $n$-linear functionals on the Cartesian product of $n$ copies of $E$. A symplectic LCS is a pair $(E, I)$ where $E$ is an LCS and $I$ is a linear mapping of $E^*$ into $E$ such that $I^* = -I$. A Hamiltonian system is a triplet $(E, I, \mathcal{H})$ where $(E, I)$ is a symplectic LCS, called (as well as $E$) the phase space of the Hamiltonian system, and $\mathcal{H}$ is a numerical function on $E$, called the Hamiltonian function. The Hamilton equation for this Hamiltonian system is the equation
$$f'(t) = I\mathcal{H}'(f(t))$$
with respect to a function $f$ of a real argument taking values in the phase space $E$; here $\mathcal{H}'$ is the (Gateaux) derivative of the function $\mathcal{H}$. The equation for first integrals of the Hamilton equation, or the Liouville equation w.r.t. functions, is the equation
$$\frac{\partial F}{\partial t}(t) = \mathcal{L}_{\mathcal{H}}(F(t));$$
here $F$ is a function of a real argument taking values in some space $\mathcal{F}(E)$ of functions on the phase space $E$ and $\mathcal{L}_{\mathcal{H}}$ is the linear operator on $\mathcal{F}(E)$, called the Liouville operator, defined by $(\mathcal{L}_{\mathcal{H}} \varphi)(x) = \{\varphi, \mathcal{H}\}(x)$, where $\{\cdot, \cdot\}$ is the Poisson bracket, which is defined for any two functions $\varphi$ and $\Psi$ on $E$ by
$$\{\varphi, \Psi\}(x) = \varphi'(x)(I(\Psi'(x))).$$
2. Liouville's equations with respect to measures.
The Liouville equation with respect to measures is the adjoint of the equation for first integrals of the Hamilton equation. Nevertheless it has the same structure, which follows from an infinite-dimensional version of the Liouville theorem about the conservation of the phase volume (the original Liouville theorem makes no sense for infinite-dimensional spaces because, as has already been mentioned, there does not exist "the Lebesgue measure" on an infinite-dimensional space). If $E$ is an LCS then a set $A \subset E$ is called cylindrical if it is the inverse image of a Borel subset of some finite-dimensional LCS $G$ under a continuous linear mapping $f$ of $E$ onto $G$ (it is sufficient to assume that $G$ is a quotient space of $E$ and that $f$ is the canonical mapping of $E$ onto $G$). For any finite-dimensional quotient space $G$ of $E$ we denote by $\mathcal{A}_G(E)$ the inverse image of the $\sigma$-algebra of Borel subsets of $G$ under the canonical mapping of $E$ onto $G$. Let $\mathcal{G}$ be a family of finite-dimensional quotient spaces of an LCS $E$ satisfying the following condition: for any $G_1, G_2 \in \mathcal{G}$ there exists $G_3 \in \mathcal{G}$ some quotient spaces of which are naturally isomorphic to $G_1$ and $G_2$; and let $\mathcal{A}_{\mathcal{G}}(E) = \bigcup_{G \in \mathcal{G}} \mathcal{A}_G(E)$ ($\mathcal{A}_{\mathcal{G}}(E)$ may be the algebra of all cylindrical subsets of $E$, which is denoted by $\mathcal{A}(E)$). We assume that $\mathcal{A}_{\mathcal{G}}(E)$ generates the Borel $\sigma$-algebra of $E$. A ($\mathcal{G}$-)cylindrical measure on $E$ is a number-valued function $\nu$ on $\mathcal{A}_{\mathcal{G}}(E)$ such that, for any $G \in \mathcal{G}$, the restriction of $\nu$ to $\mathcal{A}_G(E)$ is countably additive. The set of all cylindrical measures on $(E, \mathcal{A}_{\mathcal{G}}(E))$ (resp., on $(E, \mathcal{A}(E))$) is denoted by $\mathcal{M}_{\mathcal{G}}(E)$ (resp., by $\mathcal{M}(E)$). A function on $E$ is called $G$-cylindrical if it is measurable with respect to the $\sigma$-algebra $\mathcal{A}_G(E)$; a function is called ($\mathcal{G}$-)cylindrical if it is $G$-cylindrical for some $G \in \mathcal{G}$. Any bounded cylindrical measure becomes countably additive under a suitable extension of the space [11] (the assumption of boundedness is often included in the definition of a cylindrical measure).
Definition 2.1. If $k$ is a vector field on $E$ (i.e. a mapping from $E$ into $E$) then a measure $\nu \in \mathcal{M}_{\mathcal{G}}(E)$ is called differentiable along $k$ if there exists a function $\beta_k^{\nu}(\cdot)$ on $E$, which is called a logarithmic derivative of $\nu$ along $k$, such that$^{a}$
$$\int_E f(x)\, \beta_k^{\nu}(x)\, \nu(dx) = -\int_E f'(x) k(x)\, \nu(dx)$$
for any cylindrical function $f$ differentiable along $k$ and bounded together with its derivative along $k$. If $k(x) = h \in E$ for any $x \in E$ then we write $\beta^{\nu}(h, \cdot)$ instead of $\beta_k^{\nu}(\cdot)$ and call the measure $\nu$ differentiable along $h$; in this case the function $\beta^{\nu}(h, \cdot)$ is called the logarithmic derivative of $\nu$ along $h$. The measures $\beta_k^{\nu}(\cdot)\nu$ and $\beta^{\nu}(h, \cdot)\nu$ are called, respectively, the derivatives of $\nu$ along $k$ and along $h$ and are denoted by $\nu' k$ and by $\nu' h$. Actually the same definition can be applied to generalized measures like the Feynman type pseudo-measure.
$^{a}$ One of the definitions of the integral with respect to a bounded cylindrical measure is as follows. Consider an extension of the initial space on which the measure is countably additive. If the function whose integral is to be determined admits a natural extension to the extended space, then its integral with respect to the given cylindrical measure is defined as the usual integral of the extended function with respect to the Lebesgue extension of the measure on the extended space generated by the given cylindrical measure. If the given cylindrical measure itself is countably additive, then the integral with respect to this measure is defined as the integral with respect to its Lebesgue extension. For cylindrical functions, the integral is calculated directly, because $G$-cylindrical functions are measurable with respect to the $\sigma$-algebra $\mathcal{A}_G(E)$; in this case we do not need to assume that the cylindrical measure is bounded. It is also worth noticing that both integrals with respect to a $\sigma$-additive measure and integrals with respect to a cylindrical measure are actually calculated as limits of suitable sequences of integrals over finite-dimensional spaces.
Proposition 2.1. If the linear span of $\{k(x) : x \in E\}$ is contained in the collection of all vectors from $E$ along which the measure $\nu$ is differentiable, then the measure $\nu$ is differentiable along $k$ and
$$\beta_k^{\nu}(x) = \beta^{\nu}(k(x), x) + \mathrm{tr}\, k'(x)$$
(see [12]).
Remark 2.1. If the dimension of a space $E$ is finite, then the algebra of sets $\mathcal{A}(E)$ coincides with the $\sigma$-algebra of Borel sets and $\mathcal{M}(E)$ coincides with the set of all countably additive finite Borel measures. If, moreover, the measure $\nu$ has a smooth density $g \in L_1(E)$ with respect to the Lebesgue measure on $E$, then, whatever the vector $h \in E$, the measure $\nu$ is differentiable along $h$ and the derivative of $\nu$ along $h$, $\nu' h$, has the density $g'(\cdot)h$ with respect to the Lebesgue measure; the logarithmic derivative of the measure $\nu$ along $h$ coincides with the logarithmic derivative of its density. However, in the general case, the density of the derivative of such a measure along a vector field $k$, $\nu' k$, does not coincide with the derivative along this field of its density, and the logarithmic derivative of the measure along a vector field does not coincide with the logarithmic derivative of the density of the measure along the same vector field.
Proposition 2.2. If a vector field $k$ is Hamiltonian, then
$$\beta_k^{\nu}(x) = \beta^{\nu}(k(x), x)$$
(one does not assume that $k'(x)$ is a trace class operator).
Corollary 2.1. Under the assumptions of Remark 2.1, the derivative of a measure along a Hamiltonian vector field has a density, which coincides with the derivative of the density along this field, and its logarithmic derivative coincides with the logarithmic derivative of the density.
Proposition 2.3. If $F$ is a canonical transformation of the phase space, $\nu \in \mathcal{M}_{\mathcal{G}}(E)$, and $\nu F$ is the image of the measure $\nu$ with respect to the mapping $F^{-1}$, then
$$\frac{d(\nu F)}{d\nu}(x) = \exp\Big(\int_0^1 \beta^{\nu}(F_1'(t, x), F(t, x))\,dt\Big),$$
where the mapping $F : [0,1] \times E \to E$ is differentiable with respect to the first argument and such that $F(0, x) = x$ and $F(1, x) = F(x)$ (the Radon-Nikodym derivative $\frac{d(\nu F)}{d\nu}$ does not depend on the choice of such a mapping). This proposition follows from Proposition 2.2 due to results of [12].
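In finite dimensions the difference between Propositions 2.1 and 2.2 reduces to the fact that a Hamiltonian vector field has zero divergence, so the term $\mathrm{tr}\, k'(x)$ vanishes. The following minimal sketch, which is an illustration and not part of the original text, checks this numerically on $E = \mathbf{R}^2$ with coordinates $(q, p)$ and the Hamiltonian vector field $k = (\partial \mathcal{H}/\partial p, -\partial \mathcal{H}/\partial q)$. The sample Hamiltonian and all function names are assumptions chosen for the example.

```python
import math


def hamiltonian(q, p):
    """A sample (hypothetical) Hamiltonian with a q-p coupling term."""
    return 0.5 * p * p + math.cos(q) + 0.3 * q * q * p


def vector_field(H, q, p, h=1e-3):
    """Hamiltonian vector field k = (dH/dp, -dH/dq) by central differences."""
    dH_dq = (H(q + h, p) - H(q - h, p)) / (2.0 * h)
    dH_dp = (H(q, p + h) - H(q, p - h)) / (2.0 * h)
    return dH_dp, -dH_dq


def divergence(H, q, p, h=1e-3):
    """tr k'(x) = d(k_q)/dq + d(k_p)/dp, again by central differences."""
    kq_plus, _ = vector_field(H, q + h, p, h)
    kq_minus, _ = vector_field(H, q - h, p, h)
    _, kp_plus = vector_field(H, q, p + h, h)
    _, kp_minus = vector_field(H, q, p - h, h)
    return (kq_plus - kq_minus) / (2.0 * h) + (kp_plus - kp_minus) / (2.0 * h)


for point in [(0.0, 0.0), (0.4, -1.2), (2.0, 0.5)]:
    print(point, divergence(hamiltonian, *point))
```

The printed divergences vanish up to rounding error, because the two mixed second differences cancel; this is the finite-dimensional content of the statement that no trace term appears for Hamiltonian fields.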
Remark 2.2. Proposition 2.3 is a version of the classical Liouville theorem on the conservation of the phase volume; Proposition 2.2 can be regarded as an infinitesimal version of the same Liouville theorem.
Remark 2.3. Derivatives and logarithmic derivatives can also be defined for measures that are not finite, so that analogues of Propositions 2.2 and 2.3 remain valid.
Definition 2.2. The Liouville equation with respect to measures on the phase space $E$ of a Hamiltonian system is the equation
$$\nu'(t) = \mathcal{L}_{\mathcal{H}}^{*}(\nu(t))$$
with respect to functions of a real argument taking values in some space of cylindrical measures on $E$; here, $\mathcal{L}_{\mathcal{H}}^{*}$ is an operator on the space of measures adjoint to the Liouville operator on the space of functions. If we want to be more precise we will speak of measures on $(E, \mathcal{A}_{\mathcal{G}}(E))$.
Theorem 2.1. The Liouville equation with respect to measures on the phase space $E$ has the following form:
$$\nu'(t) = -(\nu(t))'(I\mathcal{H}') \equiv -\beta^{\nu(t)}(I(\mathcal{H}'(\cdot)), \cdot)\, \nu(t).$$
The right hand side of this equation is, up to sign, the derivative of the measure $\nu(t)$ along the vector field $I\mathcal{H}'(\cdot)$; thus, the theorem follows from Proposition 2.2 and the definition of the Liouville operator on the space of functions.
Remark 2.4. The Hamiltonian function $\mathcal{H}$ is usually the sum of a series which may not converge everywhere; in what follows we do not discuss convergence conditions for the corresponding series.
3. Systems of equations with respect to finite-dimensional distributions of probabilities.
In this section we consider a general scheme of the passage from the Liouville equation with respect to measures on an infinite-dimensional space to an equivalent infinite system of equations with respect to functions on finite-dimensional spaces. In this and in the next sections we assume that the symplectic space $E$ has a symplectic basis $\mathcal{B} = \{e_k : k = 1, 2, \dots\}$ and is endowed with the Euclidean structure with respect to which the elements of $\mathcal{B}$ form an orthonormal basis. We also assume that every subspace of $E$ generated by a finite family of elements of $\mathcal{B}$ is equipped with the Lebesgue measure generated by its Euclidean structure (this measure depends only on the initial symplectic structure). For each finite set $\{e_{k_1}, \dots, e_{k_n}\}$ of elements of $\mathcal{B}$ constituting a symplectic basis of a finite-dimensional (symplectic) subspace of $E$, let $F_{k_1, \dots, k_n}$ be this subspace, and let $G_{k_1, \dots, k_n}$ be the symplectic quotient space of $E$ by its closed subspace generated by those elements of $\mathcal{B}$ which are not contained in the linear span of $\{e_{k_1}, \dots, e_{k_n}\}$. Let $\mathcal{G}\ (= \mathcal{G}(\mathcal{B}))$ be the set of all quotient spaces $G_{k_1, \dots, k_n}$. The Liouville equation with respect to measures on $(E, \mathcal{A}_{\mathcal{G}}(E))$ is equivalent to the system of equations with respect to the restrictions of these measures to (some) subalgebras $\mathcal{A}_G(E)$, where $G \in \mathcal{G}(\mathcal{B})$. Under the assumption that the measures on these subalgebras may be determined by densities on the spaces $G_{k_1, \dots, k_n}$ or, equivalently (because of the natural isomorphism between $F_{k_1, \dots, k_n}$ and $G_{k_1, \dots, k_n}$), on $F_{k_1, \dots, k_n}$, one can compare the system of equations with respect to the densities with the Bogolyubov system of equations$^{b}$; but these two systems are equivalent only if the space $E$ is finite-dimensional; in the general case they are different (see below). For any measure $\mu \in \mathcal{M}_{\mathcal{G}}(E)$ and any finite set $\{e_{k_1}, \dots, e_{k_n}\}$ of elements of $\mathcal{B}$ let $f^{\mu}_{k_1, \dots, k_n}$ denote the corresponding density of the marginal measure on the space $G_{k_1, \dots, k_n}$.
$^{b}$ In nonequilibrium statistical mechanics, the Bogolyubov system (or chain) of equations (we do not consider here the Bogolyubov equations describing the equilibrium states [2], which have a similar form but a different meaning) is an infinite sequence of equations with respect to time-dependent functions on $n$-particle phase spaces. In the classical papers some similar but finite systems of equations are derived from the classical Liouville equations describing finite sets of particles; after that a formal operation called the passage to the thermodynamical limit, in which the number of particles and the volume occupied by them tend to infinity while the particle density remains constant, is performed. Although the functions on $n$-particle phase spaces that form solutions to finite systems of Bogolyubov equations are proportional to the densities of the corresponding probability distributions, under the passage to the thermodynamical limit (in which the number of equations in the system tends to infinity) the proportionality coefficients tend to infinity; this is the formal reason for the fact that solutions to Bogolyubov infinite systems of equations consist of functions not proportional to densities of any finite-dimensional probability distributions.
Theorem 3.1. Let $\nu \in \mathcal{M}_{\mathcal{G}}(E)$ and $\{e_{k_1}, \dots, e_{k_n}\}, \{e_{r_1}, \dots, e_{r_m}\} \subset \mathcal{B}$, let the Hamiltonian function $\mathcal{H}$ be $G_{r_1, \dots, r_m}$-cylindrical and let $h_{r_1, \dots, r_m}$ be the Hamiltonian vector field on $E$ generated by this function, that is, $h_{r_1, \dots, r_m} = I(\mathcal{H}'(\cdot))$. Write $h = h_{r_1, \dots, r_m}$. If $\{e_{r_1}, \dots, e_{r_m}\} \subset \{e_{k_1}, \dots, e_{k_n}\}$, then
$$f^{\nu' h}_{k_1, \dots, k_n}(x) = (f^{\nu}_{k_1, \dots, k_n})'(x)\, h_{r_1, \dots, r_m}(x),$$
and otherwise
$$f^{\nu' h}_{k_1, \dots, k_n}(x) = \int (f^{\nu}_{\{r_1, \dots, r_m\} \cup \{k_1, \dots, k_n\}})'(\dots x_{s_1}, \dots, x_{s_p} \dots)\, h_{r_1, \dots, r_m}(\dots x_{s_1}, \dots, x_{s_p} \dots)\, dx_{s_1} \cdots dx_{s_p}$$
(equalities of the first type can be and are considered as special cases of equalities of the second type).
Remark 3.1. If $\{e_{r_1}, \dots, e_{r_m}\} \cap \{e_{k_1}, \dots, e_{k_n}\} = \emptyset$, then $f^{\nu' h}_{k_1, \dots, k_n}(x) \equiv 0$.
Let $\mathcal{H} = \sum \mathcal{H}_{r_1, \dots, r_n}$, where $\mathcal{H}_{r_1, \dots, r_n}$ is a $G_{r_1, \dots, r_n}$-cylindrical function for each finite set $\{r_1, \dots, r_n\}$ of positive integers and the summation is over all such sets of positive integers. Let also $h_{r_1, \dots, r_m} = I(\mathcal{H}'_{r_1, \dots, r_m}(\cdot))$.
Theorem 3.2. A function $\nu(\cdot)$ taking values in $\mathcal{M}_{\mathcal{G}}(E)$ is a solution of the Liouville equation with respect to measures if and only if the functions $t \mapsto g_{r_1, \dots, r_n}(t)$, defined by the equalities $g_{r_1, \dots, r_n}(t) = f^{\nu(t)}_{r_1, \dots, r_n}$, satisfy the following infinite system of differential equations:
$$g'_{k_1, \dots, k_n}(t)(\cdot) = -\sum \int (g_{\{r_1, \dots, r_m\} \cup \{k_1, \dots, k_n\}}(t))'(\dots x_{s_1}, \dots, x_{s_p} \dots)\, h_{r_1, \dots, r_m}(\dots x_{s_1}, \dots, x_{s_p} \dots)\, dx_{s_1} \cdots dx_{s_p},$$
where the summation is over all finite sets $\{r_1, \dots, r_m\}$ of natural numbers for which $\{r_1, \dots, r_m\} \cap \{k_1, \dots, k_n\} \neq \emptyset$, and, in accordance with what has been said above, if $\{r_1, \dots, r_m\} \cap \{k_1, \dots, k_n\} = \{r_1, \dots, r_m\}$, then the integral sign is assumed to be missing.
This follows from Theorem 3.1 and Remark 3.1.
4. Bogolyubov's systems of equations.
In this section we describe a relationship between an infinite system of equations similar to the Bogolyubov system and the Liouville equation with respect to functions taking values in the space of infinite measures. Below the symbol $\mathcal{M}_{\infty}(E)$ denotes a vector space of measures on the phase space $E$ taking values in the extended real line $[-\infty, \infty]$. We assume that the domain of such measures is a ring of Borel subsets, but we will not discuss such domains in detail here; instead, we specify the measures by the integrals, with respect to them, of some Borel functions whose restrictions to each of the subspaces $F_{r_1, \dots, r_n}$ are continuous and compactly supported. Those integrals, in turn, are defined as the limits of sequences of integrals over the subspaces $F_{r_1, \dots, r_n}$. If the measures on the subspaces $F_{r_1, \dots, r_n}$ are given by their densities $\psi_{r_1, \dots, r_n}$, then we say that the family of functions $\{\psi_{r_1, \dots, r_n} : \{r_1, \dots, r_n\} \subset \mathbf{N}\}$ determines a measure $\nu$ on $E$. A family of such functions is said to be compatible if, for any finite set $\{r_1, \dots, r_n\}$ and its proper subset $\{r_1, \dots, r_k\}$, the following holds:
$$\psi_{r_1, \dots, r_k}(x_1, \dots, x_k) = \lim_{\mathrm{vol}(V) \to \infty} \frac{1}{\mathrm{vol}(V)} \int_V \psi_{r_1, \dots, r_n}(x_1, \dots, x_n)\, dx_{k+1} \cdots dx_n;$$
here $V$ is a ball centered at zero in $F_{r_{k+1}, \dots, r_n}$ and $\mathrm{vol}(V)$ is its volume. Suppose that the operator $\mathcal{L}_{\mathcal{H}}^{*}$, which is adjoint to the Liouville operator, is defined on $\mathcal{M}_{\infty}(E)$.
Theorem 4.1. Let, for every finite set $\{r_1, \dots, r_n\}$ of natural numbers, $g_{r_1, \dots, r_n}$ be a function of a real argument taking values in the space of (bounded continuous nonnegative) functions on $F_{r_1, \dots, r_n}$, and let, for each $t$, the family of functions $g_{r_1, \dots, r_n}(t)$ be compatible and determine a unique measure $\nu(t) \in \mathcal{M}_{\infty}(E)$. Then the function $\nu(\cdot)$ is a solution of the Liouville equation with respect to measures if the family of functions $g_{r_1, \dots, r_n}$ is a solution of the system of equations
+
L {j" .. ,jp}={1'" ... ,r",}n{k " ... ,kn}
.
1
N---+oo
Nm-p
hm - - -
{1'" ... ,1'",}\{k" .. ,kn}n{I,... ,N }
Remark 4.1. The function $\nu(\cdot)$ being a solution of the Liouville equation does not imply that the family of functions $g_{r_1, \dots, r_n}(\cdot)$ is a solution of the system of equations from the theorem. For example, this is so if all measures $\nu(t)$ are probabilities (then we need to apply Theorem 3.1).
Remark 4.2. The difference between the system of equations in Theorem 3.2 and that in Theorem 4.1 is that the latter contains a generalized Cesàro mean [15] (instead of one of the sums), which may exist even when the corresponding series diverges. Moreover the functions $g_{r_1, \dots, r_m}$ are not probability densities; they can be considered as densities of collections of points in the phase space.
Remark 4.3. Under the assumption that all functions $g_{r_1, \dots, r_n}(t)$ are symmetric for each $t$, the system of equations in Theorem 4.1 implies a system of equations similar to the classical Bogolyubov system; in particular the original Bogolyubov system can also be obtained.
5. Wigner measures.
In this section we discuss some quantum analogs of the preceding results using an infinite-dimensional version of the Wigner representation of quantum states (the finite-dimensional Wigner representation is introduced in [3] and is developed in [4]; see also [5, 6]). We define infinite-dimensional pseudodifferential operators with Weyl symbols in the spirit of the Hida white noise calculus [16]. Those operators play the role of quantum observables, and hence the passage from symbols (which are classical observables) to operators can be called Schrödinger-Weyl quantization. We set the Planck constant equal to one. We introduce the notion of the Wigner measure, which replaces the Wigner function for infinite-dimensional systems, formulate the equation which describes the evolution of the Wigner measure, and show that the results of the preceding sections can be extended to the case of the Wigner measure. Within this section $E = Q \times P$, where the LCS $Q$ and $P$ are such that $P = Q^*$ and $Q = P^*$; hence $E^* = P \times Q$ and the mapping $J : E \to E^*$, $(q, p) \mapsto (p, q)$, is an isomorphism. Let also the mapping $I : E^* \to E$ be defined by $I(p, q) = (q, -p)$. The LCS $Q$ (resp. $P$) is called the configuration space (resp., the momentum space) of the Hamiltonian system $(E, I, \mathcal{H})$. If $q_1, q_2 \in Q$, $p_1, p_2 \in P$, then the value that the linear functional $(p_1, q_1) = J(q_1, p_1)$ takes at the element $(q_2, p_2)$ is denoted by $p_1 q_2 + q_1 p_2$. We assume that the Hilbert space $H$ of the corresponding quantum system is the complex space $L_2(Q, \mu)$, where $\mu$ is a $P$-cylindrical measure on $Q$; in order to define infinite-dimensional pseudodifferential operators we assume that this is a Gaussian measure and use the ideology of the Hida white noise calculus. Nevertheless this measure does not appear in the final formulae.
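For orientation, the finite-dimensional Wigner function that the Wigner measure is meant to replace can be computed directly in one degree of freedom. The sketch below is an illustration under explicit assumptions and is not taken from the original text: it uses the common convention $W(q, p) = \frac{1}{2\pi} \int \psi(q + y/2)\, \overline{\psi(q - y/2)}\, e^{-ipy}\, dy$ with the Planck constant set to one, a real-valued wave function (so that only the cosine part of the exponential contributes), and a simple trapezoidal rule. The function names are hypothetical.

```python
import math


def ground_state(q):
    """Harmonic-oscillator ground state, psi(q) = pi**(-1/4) * exp(-q**2 / 2)."""
    return math.pi ** (-0.25) * math.exp(-0.5 * q * q)


def wigner(psi, q, p, y_max=20.0, steps=4000):
    """Wigner function of a real wave function psi at the phase point (q, p).

    For real psi the product psi(q + y/2) * psi(q - y/2) is even in y,
    so the Fourier integral reduces to a cosine integral.
    """
    dy = 2.0 * y_max / steps
    total = 0.0
    for i in range(steps + 1):
        y = -y_max + i * dy
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoidal end points
        total += weight * psi(q + 0.5 * y) * psi(q - 0.5 * y) * math.cos(p * y)
    return total * dy / (2.0 * math.pi)


print(wigner(ground_state, 0.0, 0.0), 1.0 / math.pi)
print(wigner(ground_state, 1.0, 0.5), math.exp(-1.25) / math.pi)
```

For this Gaussian state the closed form is $W(q, p) = \pi^{-1} e^{-q^2 - p^2}$, which is the second number printed on each line and against which the numerical values can be compared.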
Let the symbol T denote the von Neumann density operator (= trace class positive operator in H whose trace is equal to one) that defines a state of the system, and let $\rho_T$ denote the integral kernel of the density operator. If $\eta$ is an $(X^*)$-cylindrical measure on an LCS X and $D(\eta)$ is the collection of all vectors along which the measure $\eta$ is differentiable, then the generalized density of $\eta$ is a scalar function $F_\eta$ on $D(\eta)$ whose logarithmic derivative along any $h \in D(\eta)$ is equal to $\beta^\eta(h, \cdot)$. Even for Gaussian measures the generalized density is defined only up to a multiplicative constant. One can show that if $\eta$ is the Gaussian measure whose Fourier transform $\tilde\eta$ is defined by $\tilde\eta(z) = \exp(-\tfrac{1}{2}(zB(z)))$, where B is a linear mapping of $X^*$ into X, then $F_\eta(x) = C\exp(-\tfrac{1}{2}(xB^{-1}(x)))$. This result shows that the Gaussian measure can be defined by its generalized density. Below we use the generalized density of the Gaussian measure in order to define pseudodifferential operators in $L_2(Q, \mu)$. Let the measure $\mu$ be the Gaussian measure defined by its generalized density as follows: $F_\mu(q) = \exp(-\tfrac{1}{2}(qB^{-1}(q)))$, where $B \in L(P, Q)$, and let $\nu$ be a Q-cylindrical Gaussian measure on P defined by $F_\nu(p) = \exp(-\tfrac{1}{2}(q(B^*)^{-1}(q)))$. It is well known that if Q and P are Hilbert spaces then $\mu$ and $\nu$ are $\sigma$-additive if and only if B is a (positive) trace class operator. For each "good enough" scalar function H on $E (= Q \times P)$ (we will not formulate the corresponding analytical assumptions) the symbol $\hat H$ denotes the pseudodifferential operator in $L_2(Q, \mu)$ (which is supposed to be essentially selfadjoint) whose Weyl symbol is H. This means that if $\varphi \in \operatorname{dom}\hat H\ (\subset L_2(Q, \mu))$, then
$$(\hat H\varphi)(q) = \int_P\int_Q H\Bigl(\frac{q_1+q}{2}, p\Bigr)\, e^{-ip(q_1-q)}\,\varphi(q_1)\,(F_\mu(q))^{-\frac{1}{2}}(F_\mu(q_1))^{-\frac{1}{2}}(F_\nu(p))^{-1}\,\mu(dq_1)\,\nu(dp).$$
The integral on the r.h.s. is defined as follows:
where
$$c_n^{-1} = \int_{P_n}\int_{Q_n} e^{-ip(q_1-q)}\,(F_\mu(q))^{-\frac{1}{2}}(F_\mu(q_1))^{-\frac{1}{2}}(F_\nu(p))^{-1}\,\mu(dq_1)\,\nu(dp)$$
and $Q_n \times P_n = F_{k_1,\dots,k_n}$; we assume that for any n the subspace $F_{k_1,\dots,k_n}$ is contained in the domain of the integrands of the latter finite-dimensional
integrals and use the regularisations of finite-dimensional integrals, which are defined in the following way. If $f \in L^1_{loc}(\mathbb{R}^n)$, then we say that the integral $\int_{\mathbb{R}^n} f(x)\,dx$ exists if for any $\varphi \in \mathcal{D}(\mathbb{R}^n)$ for which $\varphi(0) = 1$ the limit $\lim_{a\to\infty}\int_{\mathbb{R}^n}\varphi(ax)f(x)\,dx$ exists, and then by definition $\int_{\mathbb{R}^n} f(x)\,dx = \lim_{a\to\infty}\int_{\mathbb{R}^n}\varphi(ax)f(x)\,dx$ (the definition does not depend on the choice of $\varphi$). For any $h \in E$ the symbol $\hat h$ denotes the pseudodifferential operator in $L_2(Q)$ whose Weyl symbol is $Jh\ (\in E^*)$; in particular, if $h = q + p\ (= (q, p))$ then $\hat h = \hat p + \hat q$. Let us mention that if $q_0 \in Q$ then $\hat q_0$ is the operator of the momentum in the direction of $q_0$ (but not of the coordinate).
Remark 5.1. Let $\hat E = \{\hat h : h \in E\}$; then the mapping $P^- : Jh \mapsto \hat h$, $E^* \to \hat E$ is a linear isomorphism (we assume that $\hat E$ is equipped with the natural structure of a vector space). The extension of $P^-$ to a linear mapping, of the space generated by $E^*$ and the function on E whose value at each point is equal to one, into the space of operators in $L_2(Q, \mu)$, defined by the assumption that the image of this function is the multiplication by i, is called the Schrödinger representation of the canonical commutation relations.
Definition 5.1. The Weyl operator W(h) generated by $h \in E$ is defined by $W(h) = e^{-i\hat h}$. The Weyl function $\mathcal{W}_T$ corresponding to the state (= density operator) T of the quantum system, whose Hilbert space is $L_2(Q, \mu)$, is the function on E defined by
$$\mathcal{W}_T(h) = \operatorname{tr} T\,W(h).$$
Definition 5.2. The Wigner measure $W_T$ generated by the state (= density operator) T of the quantum system, whose Hilbert space is $L_2(Q, \mu)$, is the image, with respect to the mapping $J^{-1} : E^* \to E$, of the E-cylindrical measure on $E^*$ whose Fourier transform is the Weyl function. This means that
$$\int_{Q\times P} e^{i(p_1 q_2 + q_1 p_2)}\,W_T(dq_1\,dp_1) = \mathcal{W}_T(q_2, p_2).$$
Remark 5.2. One can show that the Wigner measure can also be defined as the integral kernel of the linear functional $F \mapsto \operatorname{tr} T\hat F$ on the vector space of the Weyl symbols of (bounded) pseudodifferential operators in $L_2(Q, \mu)$. This means that for any such Weyl symbol F the following identity holds:
$$\operatorname{tr} T\hat F = \int_{Q\times P} F(q, p)\,W_T(dq\,dp) \qquad (1)$$
(cf. [10], where only finite-dimensional spaces are considered, in which case the Wigner measure can be replaced by its density, which is called the Wigner function).
Remark 5.3. The measure $W_T^Q$ on Q defined by $W_T^Q(dq) = \int_P W_T(dq\,dp)$ is the (cylindrical) probability on Q describing the distribution of results of measurements of the coordinates.
To formulate the equation describing the evolution of the Wigner measure we need a definition of what one could call a function of the Poisson bracket. Below we use some topological tensor powers of the phase space, but we do not discuss their topologies. We use the assumptions and definitions of Sections 3 and 4. For any $n \in \mathbb{N}$, let the symbol $I^{\otimes n}$ denote the mapping, of a proper subspace of $B_n(E)$ into $E^{\otimes n}$, generated by the n-th tensor power of I (here $E^{\otimes n}$ is a topological tensor product of n copies of E). Let moreover, for any two scalar functions F and G on E,
and
$\mathcal{L}_G^{(n)}F(x) = \{F, G\}^{(n)}(x)$ (of course, $\mathcal{L}_G^{n} \neq \mathcal{L}_G^{(n)}$). Finally let, for any $a > 0$, the operator $(\sin)a\mathcal{L}_G$ be defined by
$$(\sin)a\mathcal{L}_G = \sum_{n=1}^{\infty}\frac{a^{2n-1}}{(2n-1)!}\,\mathcal{L}_G^{(2n-1)}$$
(we do not discuss now in which sense the series converges) and let $(\sin)a\mathcal{L}_G^*$ be the operator in a space of $E^*$-cylindrical measures on E which is adjoint to $(\sin)a\mathcal{L}_G$.
Theorem 5.1. The dynamics of the Wigner measure is governed by the following equation:
The proof can be obtained by combining techniques of the theory of differentiable measures with some methods for deriving equations describing the evolution of the Wigner function [4], [5], [6].
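Theorem 5.2. Let $\nu \in M_g(E)$, $\{e_{k_1},\dots,e_{k_n}\}, \{e_{r_1},\dots,e_{r_m}\} \subset B$ and let the Hamiltonian function $\mathcal{H}$ be $G_{r_1,\dots,r_m}$-cylindrical. If $\{e_{r_1},\dots,e_{r_m}\} \subset \{e_{k_1},\dots,e_{k_n}\}$, then
$$f^{\,2(\sin)\frac{1}{2}\mathcal{L}^*_{\mathcal{H}}\nu}_{k_1,\dots,k_n}(x) = \int 2(\sin)\tfrac{1}{2}\mathcal{L}_{\mathcal{H}_{r_1,\dots,r_m}}\bigl(f^{\nu}_{\{r_1,\dots,r_m\}\cup\{k_1,\dots,k_n\}}\bigr)(\dots x_{s_1},\dots,x_{s_p}\dots)\,dx_{s_1}\dots dx_{s_p}.$$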
Theorem 5.3. A function $W(\cdot)$ taking values in $M_g(E)$ is a solution of the equation governing the dynamics of the Wigner measure if and only if the functions $t \mapsto g_{r_1,\dots,r_n}(t)$, defined by the equalities $g_{r_1,\dots,r_n}(t) = f^{\,2(\sin)\frac{1}{2}\mathcal{L}_{\mathcal{H}}(W(t))}_{r_1,\dots,r_n}$, satisfy the following infinite system of differential equations:
$$g'_{k_1,\dots,k_n}(t)(\cdot) = \sum\int 2(\sin)\tfrac{1}{2}\mathcal{L}_{\mathcal{H}_{r_1,\dots,r_m}}\bigl(g_{\{r_1,\dots,r_m\}\cup\{k_1,\dots,k_n\}}(t)\bigr),$$
where the summation is over all finite sets $\{r_1,\dots,r_m\}$ of natural numbers for which $\{r_1,\dots,r_m\} \cap \{k_1,\dots,k_n\} \neq \emptyset$, and, in accordance with what has been said above, if $\{r_1,\dots,r_m\} \cap \{k_1,\dots,k_n\} = \{r_1,\dots,r_m\}$, then the integral sign is assumed to be missing.
This follows from Theorems 5.1 and 5.2. Suppose that the operator $(\sin)\mathcal{L}_{\mathcal{H}}$ is defined on $M^{\infty}(E)$.
Theorem 5.4. Let, for every finite set $\{r_1,\dots,r_n\}$ of natural numbers, $g_{r_1,\dots,r_n}$ be a function of a real argument taking values in the space of (bounded continuous) functions on $F_{r_1,\dots,r_n}$, and, for each t, let the family of functions $g_{r_1,\dots,r_n}(t)$ be compatible and determine a unique measure $w(t) \in M^{\infty}(E)$. Then the function $w(\cdot)$ is a solution of the equation governing the dynamics of the Wigner measure if the family of functions $g_{r_1,\dots,r_n}$ is a solution of the system of equations
x (.)
+
1
"" ~
lim - -
N-.oo Nm-p
{jl , ,, .,jp }={ rl ,,,. ,r",} n { kl ,,, . ,k n
}
If it is not assumed that the Planck constant is equal to one, then it is necessary to substitute "$2(\sin)\frac{1}{2}$" by "$\frac{2}{\hbar}(\sin)\frac{\hbar}{2}$", where $\hbar$ is the Planck constant.
Remark 5.4. From Theorem 5.4, which is similar to Theorem 4.1, one can deduce a quantum version of the classical Bogolyubov system of equations and also some other similar systems of equations. It is also worth noticing that the integrals $\int_P w(t)(dq, dp)$ are not probabilities and hence (Remark 5.3) the measures w(t) are not Wigner measures.
6. Generalization of Poincaré's model.
In this section we formulate a proposition (Proposition 6.1 below) that improves the similar proposition related to the Poincaré model ([1], [8], [9]) of an evolution that is irreversible but symmetric with respect to time. In the original model one assumes that the initial probability distribution on the phase space is the product of identical one-dimensional distributions; in Proposition 6.1 we consider a general distribution on the phase space and prove that this distribution tends (both when time tends to $+\infty$ and to $-\infty$) to the same limit as in the case when the initial distribution is the product of identical distributions. A similar improvement is valid for the quantum version [10] of the Poincaré model.
Proposition 6.1. Let $E = Q \times P$, $E^* = P \times Q$, $I(p, q) = (q, -p)$, and let, in natural notation, $\mathcal{H}(q, p) = \sum_j \frac{p_j^2}{2}$ (this Hamiltonian system describes noninteracting particles). Let $\nu(\cdot)$ be a solution of the Liouville equation for probability measures such that the finite-dimensional projections of the measure $\nu(0)$ on E satisfy the conditions of Theorem 4.1 in [10]. If $Q_n$ is a finite-dimensional subspace of the space Q and, for each t, $P(t, \cdot)$ is the density of the projection of the measure $\nu(t)$ on $Q_n$, then, for any compact
subsets $K_1$ and $K_2$ of $Q_n$, the following holds:
$$\frac{\int_{K_1} P(t, q)\,dq}{\int_{K_2} P(t, q)\,dq} \to \frac{\operatorname{mes} K_1}{\operatorname{mes} K_2}, \qquad t \to \pm\infty,$$
where mes denotes the Lebesgue measure on $Q_n$ (so one could say that the probability distribution on the configuration space tends to the uniform distribution). The proof uses Theorem 3.2 and is similar to the proof of Theorem 4.1 from [10]. This proposition somewhat strengthens a result of [8] which goes back to Poincaré [1]. In [8] it was proved in the special case when the initial probability measure $\nu(0)$ is the product of copies of a one-dimensional probability measure. A completely similar proposition is valid for quantum systems, because for systems with quadratic Hamiltonian functions the equations for the Wigner measure coincide with the Liouville equations (cf. [10]).
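The convergence stated in Proposition 6.1 can be illustrated numerically in the simplest special case. The sketch below is not taken from the paper: it assumes a single free particle on the line with Hamiltonian $p^2/2$ and an initial distribution in which q and p are independent standard normal variables (these choices, and all function names, are illustrative assumptions). Then $q(t) = q + tp$ is normal with variance $1 + t^2$, so the probabilities of intervals are available in closed form.

```python
import math

def normal_cdf(x):
    """Distribution function of the standard normal law."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def interval_probability(a, b, t):
    """P(a <= q + t*p <= b) for independent standard normal q and p.

    Under free motion q(t) = q + t*p is normal with variance 1 + t**2.
    """
    s = math.sqrt(1.0 + t * t)
    return normal_cdf(b / s) - normal_cdf(a / s)

def probability_ratio(k1, k2, t):
    """Ratio of the probabilities of the intervals k1 and k2 at time t."""
    return interval_probability(*k1, t) / interval_probability(*k2, t)

if __name__ == "__main__":
    k1, k2 = (0.0, 1.0), (2.0, 5.0)  # lengths 1 and 3
    for t in (0.0, 10.0, 1000.0, -1000.0):
        print(t, probability_ratio(k1, k2, t))
```

In this toy case the ratio starts far from the ratio of lengths and approaches it for both large positive and large negative t, which is the time-symmetric behaviour described above.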
References
1. H. Poincaré. J. de Physique théorique et appliquée, 4 série, 5, (1906), 369-403.
2. N. N. Bogolyubov. Problems of dynamical theory in statistical physics. Moscow, 1946 (in Russian; there exists an English translation).
3. E. Wigner. Phys. Rev., 40, (1932), 749.
4. J. E. Moyal. Quantum mechanics as a statistical theory. Proc. Cambridge Philos. Soc., v. 45, (1949), 99-124.
5. G. B. Folland. Harmonic Analysis in Phase Space. (Princeton Univ. Press, 1989).
6. Y. S. Kim, M. E. Noz. Phase-Space Picture of Quantum Mechanics. Group Theoretical Approach. (World Scientific, 1991).
7. Radu Balescu. Equilibrium and nonequilibrium statistical mechanics, vol. 1 (John Wiley and Sons, 1975).
8. V. V. Kozlov. Thermal Equilibrium in the sense of Gibbs and Poincaré (Moscow-Izhevsk, 2002) [in Russian].
9. V. V. Kozlov. Reg. Chaotic Dyn. 1. (2004), 23-34.
10. V. V. Kozlov, O. G. Smolyanov. Theory Probability and Applications. Vol. 51, 1, (2006), pp. 1-13.
11. O. G. Smolyanov, S. V. Fomin. Soviet Mathematical Surveys, V. 31, 4. (1976), 3-56.
12. O. G. Smolyanov, H. von Weizsäcker. Comptes Rend. Acad. Sci. Paris. T. 321, ser. 1. (1995), 103-108.
13. N. Bourbaki. Integration, Chapitre 6 (Springer, 2007).
14. O. G. Smolyanov, H. v. Weizsäcker. Smooth probability measures and associated differential operators. Inf. Dimens. Anal., Quantum Probab. and Relat. Top. V. 2, 1, (1999), 51-78.
15. L. Accardi and O. G. Smolyanov. Generalized Levy Laplacians and Cesaro Means. Doklady Mathematics, Vol. 79, 1, (2009), 1-4.
16. T. Hida, H. H. Kuo, J. Potthoff, L. Streit. White noise. An infinite dimensional calculus. Kluwer Academic, 1993.
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 339-354)
ANALYSIS OF SEVERAL CATEGORICAL DATA USING MEASURE OF PROPORTIONAL REDUCTION IN VARIATION
KOUJI YAMAMOTO, KOUJI TAHATA, NOBUKO MIYAMOTO AND SADAO TOMIZAWA*
Department of Information Sciences, Tokyo University of Science, Noda City, Chiba 278-8510, Japan
*E-mail: [email protected]
For a two-way contingency table with nominal row and column variables, measures which describe the proportional reduction in variation (PRV) from the marginal distribution of one variable to the conditional distribution given the other variable were proposed by Goodman and Kruskal (1954), Theil (1970), and Freeman (1987, p. 101). Tomizawa, Seo and Ebi (1997), and Miyamoto, Usui and Tomizawa (2005) proposed generalizations of those measures. Tomizawa, Miyamoto and Yajima (2002), and Yamamoto and Tomizawa (2009) proposed PRV measures for a nominal-ordinal contingency table and for an ordinal-ordinal contingency table, respectively. The present paper (1) reviews these PRV measures and (2) analyzes and compares several categorical data sets using these PRV measures.
Keywords: Concentration coefficient; Measure, Proportional reduction in variation; Square contingency table; Total uncertainty coefficient.
1. Introduction
The data in Table 1, taken from the Meteorological Agency in Japan, are obtained from the daily temperatures at Nagasaki City, Japan, in two years, 2001 and 2002, using three levels, (1) below normals, (2) normals and (3) above normals (see Tahata, Takazawa and Tomizawa, 2008). The observation $n_{ij}$ in the (i, j)-th cell indicates that for each of $n_{ij}$ days out of 365 days (i.e., from 1 January to 31 December), the temperature level is i in 2001 and j in 2002. Table 2 is taken from Tallis (1962) and constructed from the cross-classified data of Merino ewes according to the numbers of lambs born in consecutive years, 1952 and 1953 (also see Bishop, Fienberg and Holland, 1975, p. 288; Miyamoto, Niibe and Tomizawa, 2005).
Table 3 is the data on unaided distance vision of 7477 women aged 30-39 employed in Royal Ordnance factories in Britain from 1943 to 1946. The row variable is the right eye grade and the column variable is the left eye grade with the categories ordered from the Best (1) to the Worst (4). The vision data in Table 3 have been analyzed by many statisticians, including Stuart (1955), Bishop et al. (1975, p. 284), McCullagh (1978), Goodman (1979) , Agresti (1983) , Tomizawa (1985 , 1993, 2009), Miyamoto, Ohtsuka and Tomizawa (2004), Tomizawa, Miyamoto and Yamamoto (2006), Tomizawa and Tahata (2007), and Tahata, Yamamoto, Nagatani and Tomizawa (2009). Table 4 is the data on unaided distance vision of 3168 pupils comprising nearly equal number of boys and girls aged 6-12 at elementary schools in Tokyo, Japan, examined in June 1984. The data in Table 4 have also been analyzed by Tomizawa (1985), Miyamoto et al. (2004), and Tahata and Tomizawa (2006). Table 5 is the data on unaided distance vision of 4746 students aged 18 to about 25 including about 10 percent women in Faculty of Science and Technology, Science University of Tokyo in Japan examined in April 1982. The data in Table 5 have been analyzed by Tomizawa (1984, 1985) and Tahata et al. (2009). The data in Table 6 represent the cross-classification of a sample of individuals according to their socioprofessional category in 1954 and in 1962 (see Caussinus, 1965; Bishop et al., 1975, p. 298). Tables 1 through 6 are the data of square contingency tables having the same row and column classifications. In addition, the categories in each of Tables 1 through 6 are ordered. Many observations concentrate on (or near) the main diagonal cells in the table. Therefore the row classification tends to be strongly associated with the column classification, namely, the model of independence (i.e., null association) between the row and column classifications does not hold. 
For those data we are interested in whether or not the row value of an individual is symmetric to the column value. Many models of symmetry and asymmetry have been proposed; see, for instance, Bowker (1948), Caussinus (1965), Bishop et al. (1975, Chap. 8), McCullagh (1978), Goodman (1979), Agresti (1983, 2002), Tomizawa (1993, 2009), and Tomizawa and Tahata (2007). We omit here the details of these models.
For the data in Tables 1 through 6 we are also interested in measuring the relative improvement in predicting the value of one variable when the value of the other variable is known, as opposed to when it is not known. Consider an r × c contingency table with nominal categories for both the explanatory variable X and the response variable Y. Let $p_{ij}$ denote the probability that an observation will fall in the (i, j)-th cell (i = 1, ..., r; j = 1, ..., c). A measure which describes the proportional reduction in variation (PRV) from the marginal distribution of Y to the conditional distribution of Y given the value of X has the form
$$\frac{V(Y) - E[V(Y|X)]}{V(Y)}, \qquad (1.1)$$
where V(Y) is an index of variation for the marginal distribution of Y, and E[V(Y|X)] is the expectation of the conditional variation taken with respect to the distribution of X (Agresti, 2002, p. 56). Tomizawa, Seo and Ebi (1997) proposed the generalized PRV measure defined by
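$$T^{(\lambda)} = \frac{\displaystyle\sum_{i=1}^{r}\sum_{j=1}^{c} p_{ij}^{\lambda+1}/p_{i\cdot}^{\lambda} - \sum_{j=1}^{c} p_{\cdot j}^{\lambda+1}}{\displaystyle 1 - \sum_{j=1}^{c} p_{\cdot j}^{\lambda+1}} \qquad (\lambda > -1),$$
where
$$p_{i\cdot} = \sum_{t=1}^{c} p_{it}, \qquad p_{\cdot j} = \sum_{s=1}^{r} p_{sj},$$
and the value at $\lambda = 0$ is taken to be the continuous limit as $\lambda \to 0$, and where $\lambda$ is a real value that is chosen by the user. Note that Tomizawa and Ebi (1998) and Tomizawa and Machida (1999) extended the measure $T^{(\lambda)}$ to multi-way contingency tables. The variation index used in $T^{(\lambda)}$ is
$$V(Y) = \frac{1}{\lambda}\Bigl(1 - \sum_{j=1}^{c} p_{\cdot j}^{\lambda+1}\Bigr),$$
which includes the Shannon entropy (when $\lambda = 0$) and the Gini concentration (when $\lambda = 1$). In special cases, when $\lambda = 1$, $T^{(1)}$ is identical to Goodman and Kruskal's (1954) measure (called the concentration coefficient) defined by
$$\tau = \frac{\displaystyle\sum_{i=1}^{r}\sum_{j=1}^{c} p_{ij}^{2}/p_{i\cdot} - \sum_{j=1}^{c} p_{\cdot j}^{2}}{\displaystyle 1 - \sum_{j=1}^{c} p_{\cdot j}^{2}},$$
and when $\lambda = 0$, $T^{(0)}$ is identical to Theil's (1970) measure (called the uncertainty coefficient) defined by
$$U = \frac{\displaystyle\sum_{i=1}^{r}\sum_{j=1}^{c} p_{ij}\log\Bigl(\frac{p_{ij}}{p_{i\cdot}\,p_{\cdot j}}\Bigr)}{\displaystyle -\sum_{j=1}^{c} p_{\cdot j}\log p_{\cdot j}}.$$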
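The measure $T^{(\lambda)}$ and its two special cases can be computed directly from a table of cell probabilities. The following is a minimal sketch, assuming the table is given as a list of rows of probabilities summing to one; the function name `prv_T` and the tolerance used to detect $\lambda = 0$ are illustrative choices, not part of the original paper.

```python
import math

def prv_T(p, lam):
    """T^(lambda) for cell probabilities p (rows: X, columns: Y), lam > -1.

    lam = 1 gives the concentration coefficient tau and lam = 0 gives the
    uncertainty coefficient U (taken as the limit lambda -> 0).
    """
    r, c = len(p), len(p[0])
    row = [sum(p[i]) for i in range(r)]
    col = [sum(p[i][j] for i in range(r)) for j in range(c)]
    if abs(lam) < 1e-12:
        # Limit lambda -> 0: mutual information over the entropy of Y.
        num = sum(p[i][j] * math.log(p[i][j] / (row[i] * col[j]))
                  for i in range(r) for j in range(c) if p[i][j] > 0)
        den = -sum(q * math.log(q) for q in col if q > 0)
        return num / den
    num = (sum(p[i][j] ** (lam + 1) / row[i] ** lam
               for i in range(r) for j in range(c) if p[i][j] > 0)
           - sum(q ** (lam + 1) for q in col))
    den = 1.0 - sum(q ** (lam + 1) for q in col)
    return num / den

if __name__ == "__main__":
    table = [[0.4, 0.1], [0.2, 0.3]]  # hypothetical probabilities
    for lam in (0.0, 0.5, 1.0):
        print(lam, prv_T(table, lam))
```

Under independence the numerator vanishes for every $\lambda$, and for a table in which Y is determined by X the ratio equals one, which gives two quick sanity checks for any implementation.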
In a nominal-nominal contingency table, for a situation in which the explanatory and response variables are not defined clearly, a measure which describes the PRV from the marginal distribution of one variable (of X and Y) to the conditional distribution of the variable given the value of the other variable has the general form
$$\frac{V(Y) + V(X) - E[V(Y|X)] - E[V(X|Y)]}{V(Y) + V(X)}. \qquad (1.2)$$
Miyamoto, Usui and Tomizawa (2005) proposed a generalized PRV measure, i.e., a generalized total uncertainty measure $T^{(\lambda)}_{total}$ with $\lambda > -1$ (in a similar manner to $T^{(\lambda)}$). In a special case, when $\lambda = 0$, $T^{(0)}_{total}$ is identical to Freeman's (1987, p. 101) total uncertainty measure defined by
$$U_{total} = \frac{\displaystyle 2\sum_{i=1}^{r}\sum_{j=1}^{c} p_{ij}\log\Bigl(\frac{p_{ij}}{p_{i\cdot}\,p_{\cdot j}}\Bigr)}{\displaystyle -\sum_{i=1}^{r} p_{i\cdot}\log p_{i\cdot} - \sum_{j=1}^{c} p_{\cdot j}\log p_{\cdot j}}.$$
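The symmetric form (1.2) can be sketched in the same way. The code below plugs the variation index $V$ with parameter $\lambda$ (as given for $T^{(\lambda)}$) into (1.2); that this coincides in every detail with the measure $T^{(\lambda)}_{total}$ of Miyamoto, Usui and Tomizawa (2005) is an assumption here, although for $\lambda = 0$ it reduces to $U_{total}$ as stated in the text. The function names are illustrative.

```python
import math

def variation(q, lam):
    """Variation index V of a distribution q (Shannon entropy at lam = 0)."""
    if abs(lam) < 1e-12:
        return -sum(x * math.log(x) for x in q if x > 0)
    return (1.0 - sum(x ** (lam + 1) for x in q if x > 0)) / lam

def prv_total(p, lam):
    """Form (1.2) evaluated with the variation index above (a sketch)."""
    r, c = len(p), len(p[0])
    row = [sum(p[i]) for i in range(r)]
    col = [sum(p[i][j] for i in range(r)) for j in range(c)]
    v_y, v_x = variation(col, lam), variation(row, lam)
    # Expected conditional variation of Y given X, and of X given Y.
    e_y_x = sum(row[i] * variation([p[i][j] / row[i] for j in range(c)], lam)
                for i in range(r) if row[i] > 0)
    e_x_y = sum(col[j] * variation([p[i][j] / col[j] for i in range(r)], lam)
                for j in range(c) if col[j] > 0)
    return (v_y + v_x - e_y_x - e_x_y) / (v_y + v_x)

def u_total(p):
    """Freeman's total uncertainty measure, computed from its definition."""
    r, c = len(p), len(p[0])
    row = [sum(p[i]) for i in range(r)]
    col = [sum(p[i][j] for i in range(r)) for j in range(c)]
    info = sum(p[i][j] * math.log(p[i][j] / (row[i] * col[j]))
               for i in range(r) for j in range(c) if p[i][j] > 0)
    return 2.0 * info / (variation(row, 0.0) + variation(col, 0.0))
```

Comparing `prv_total(p, 0.0)` with `u_total(p)` on any table is a simple consistency check of the reduction to Freeman's measure.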
For a nominal-ordinal table with a nominal variable X and an ordinal variable Y, Tomizawa, Miyamoto and Yajima (2002) proposed a PRV measure. For ordinal-ordinal tables, Tomizawa and Yukawa (2003, 2004) proposed some PRV measures. Also, for an ordinal-ordinal table in which the explanatory and response variables are not defined clearly, Yamamoto and Tomizawa (2009) considered a PRV measure $\Phi^{(\lambda)}_{total}$ with $\lambda > -1$ (see Section 2).
For the data in Tables 1 through 6, we cannot define clearly which of the row and column variables is the explanatory variable and which is the response variable. So, for these data we are interested in applying the Yamamoto-Tomizawa PRV measure $\Phi^{(\lambda)}_{total}$. The purpose of the present paper is (1) to review the PRV measure $\Phi^{(\lambda)}_{total}$ and (2) to analyze and compare the data in Tables 1 through 6 using the measure $\Phi^{(\lambda)}_{total}$.
2. Review of generalized total uncertainty measure
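Consider an r × c contingency table with ordered categories in which the explanatory and response variables are not defined clearly. This section reviews briefly the generalized total uncertainty measure $\Phi^{(\lambda)}_{total}$. The measure is defined as follows: for $\lambda > -1$,
$$\Phi^{(\lambda)}_{total} = \frac{(\lambda+1)\Bigl[\displaystyle\sum_{j=1}^{c-1}\sum_{k=1}^{2}\bigl(F^{(k)}_{\cdot j}\bigr)^{\lambda+1} J^{(\lambda)}_{1(jk)} + \sum_{i=1}^{r-1}\sum_{k=1}^{2}\bigl(G^{(k)}_{i\cdot}\bigr)^{\lambda+1} J^{(\lambda)}_{2(ik)}\Bigr]}{\displaystyle\sum_{j=1}^{c-1} H^{(\lambda)}_{1(j)} + \sum_{i=1}^{r-1} H^{(\lambda)}_{2(i)}},$$
where
$$F^{(1)}_{\cdot j} = \sum_{t=1}^{j} p_{\cdot t}, \qquad F^{(2)}_{\cdot j} = \sum_{t=j+1}^{c} p_{\cdot t}, \qquad G^{(1)}_{i\cdot} = \sum_{s=1}^{i} p_{s\cdot}, \qquad G^{(2)}_{i\cdot} = \sum_{s=i+1}^{r} p_{s\cdot},$$
$$H^{(\lambda)}_{1(j)} = \frac{1}{\lambda}\Bigl(1 - \sum_{k=1}^{2}\bigl(F^{(k)}_{\cdot j}\bigr)^{\lambda+1}\Bigr), \qquad H^{(\lambda)}_{2(i)} = \frac{1}{\lambda}\Bigl(1 - \sum_{k=1}^{2}\bigl(G^{(k)}_{i\cdot}\bigr)^{\lambda+1}\Bigr),$$
$$J^{(\lambda)}_{1(jk)} = \frac{1}{\lambda(\lambda+1)}\sum_{i=1}^{r}\frac{F^{(k)}_{ij}}{F^{(k)}_{\cdot j}}\Bigl[\Bigl(\frac{F^{(k)}_{ij}/F^{(k)}_{\cdot j}}{p_{i\cdot}}\Bigr)^{\lambda} - 1\Bigr], \qquad J^{(\lambda)}_{2(ik)} = \frac{1}{\lambda(\lambda+1)}\sum_{j=1}^{c}\frac{G^{(k)}_{ij}}{G^{(k)}_{i\cdot}}\Bigl[\Bigl(\frac{G^{(k)}_{ij}/G^{(k)}_{i\cdot}}{p_{\cdot j}}\Bigr)^{\lambda} - 1\Bigr],$$
with
$$F^{(1)}_{ij} = \sum_{t=1}^{j} p_{it}, \qquad F^{(2)}_{ij} = \sum_{t=j+1}^{c} p_{it}, \qquad G^{(1)}_{ij} = \sum_{s=1}^{i} p_{sj}, \qquad G^{(2)}_{ij} = \sum_{s=i+1}^{r} p_{sj},$$
and where the value at $\lambda = 0$ is taken to be the continuous limit as $\lambda \to 0$. Note that each of $H^{(\lambda)}_{1(j)}$ and $H^{(\lambda)}_{2(i)}$ is the Patil and Taillie (1982) diversity index, which includes the Shannon entropy (when $\lambda = 0$),
$$H^{(0)}_{1(j)} = -\sum_{k=1}^{2} F^{(k)}_{\cdot j}\log F^{(k)}_{\cdot j},$$
and each of $J^{(\lambda)}_{1(jk)}$ and $J^{(\lambda)}_{2(ik)}$ is the power-divergence (Cressie and Read, 1984) between two distributions, which includes the Kullback-Leibler information (when $\lambda = 0$),
$$J^{(0)}_{1(jk)} = \sum_{i=1}^{r}\frac{F^{(k)}_{ij}}{F^{(k)}_{\cdot j}}\log\Bigl(\frac{F^{(k)}_{ij}/F^{(k)}_{\cdot j}}{p_{i\cdot}}\Bigr).$$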
We see that for each $\lambda$, (1) $0 \le \Phi^{(\lambda)}_{total} \le 1$, (2) $\Phi^{(\lambda)}_{total} = 0$ if and only if X is independent of Y (i.e., there is the null association in the table), and (3) $\Phi^{(\lambda)}_{total} = 1$ if and only if there is no conditional variation in the sense that for each i, $P(Y = t \mid X = i) = 1$ for some t, and for each j, $P(X = s \mid Y = j) = 1$ for some s (i.e., there is the complete association in the table).
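The definition of $\Phi^{(\lambda)}_{total}$ amounts to cutting the ordered columns (and rows) at every possible point, and accumulating a power divergence in the numerator and a diversity index in the denominator. A minimal sketch of that computation follows, assuming the table is given as a list of rows of cell probabilities; the helper names and the cumulative-sum symbols F and G used in the comments are illustrative reading aids rather than the authors' code.

```python
import math

def diversity(parts, lam):
    """Patil-Taillie diversity index of a split (Shannon entropy at lam = 0)."""
    if abs(lam) < 1e-12:
        return -sum(x * math.log(x) for x in parts if x > 0)
    return (1.0 - sum(x ** (lam + 1) for x in parts if x > 0)) / lam

def power_divergence(a, b, lam):
    """Cressie-Read power divergence between distributions a and b."""
    if abs(lam) < 1e-12:
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    return sum(x * ((x / y) ** lam - 1.0)
               for x, y in zip(a, b) if x > 0) / (lam * (lam + 1))

def phi_total(p, lam):
    """Generalized total uncertainty measure for an ordered r x c table."""
    r, c = len(p), len(p[0])
    row = [sum(p[i]) for i in range(r)]
    col = [sum(p[i][j] for i in range(r)) for j in range(c)]
    num = den = 0.0
    for j in range(1, c):  # cut the columns into {1..j} and {j+1..c}
        halves = [[sum(p[i][:j]) for i in range(r)],
                  [sum(p[i][j:]) for i in range(r)]]
        totals = [sum(h) for h in halves]  # F^(1)_.j and F^(2)_.j
        den += diversity(totals, lam)
        for h, tot in zip(halves, totals):
            if tot > 0:
                num += tot ** (lam + 1) * power_divergence(
                    [x / tot for x in h], row, lam)
    for i in range(1, r):  # cut the rows into {1..i} and {i+1..r}
        halves = [[sum(p[s][j] for s in range(i)) for j in range(c)],
                  [sum(p[s][j] for s in range(i, r)) for j in range(c)]]
        totals = [sum(h) for h in halves]  # G^(1)_i. and G^(2)_i.
        den += diversity(totals, lam)
        for h, tot in zip(halves, totals):
            if tot > 0:
                num += tot ** (lam + 1) * power_divergence(
                    [x / tot for x in h], col, lam)
    return (lam + 1) * num / den
```

Properties (2) and (3) give convenient checks: an independent table should return a value near zero, and a table with all mass on the main diagonal should return one for every $\lambda$.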
3. Approximate confidence interval for measure
Let $n_{ij}$ denote the observed frequency in the (i, j)-th cell (i = 1, ..., r; j = 1, ..., c). Assume that a multinomial distribution applies to the table. The sample version of $\Phi^{(\lambda)}_{total}$, i.e., $\hat\Phi^{(\lambda)}_{total}$, is given by $\Phi^{(\lambda)}_{total}$ with $\{p_{ij}\}$ replaced by $\{\hat p_{ij}\}$, where $\hat p_{ij} = n_{ij}/n$ and $n = \sum\sum n_{ij}$. Using the delta method (Bishop et al., 1975, Sec. 14.6), $\sqrt{n}\,(\hat\Phi^{(\lambda)}_{total} - \Phi^{(\lambda)}_{total})$ has asymptotically ($n \to \infty$) a normal distribution with mean zero and variance $\sigma^2[\Phi^{(\lambda)}_{total}]$. For the details of $\sigma^2[\Phi^{(\lambda)}_{total}]$, see Yamamoto and Tomizawa (2009). Therefore we can obtain an approximate confidence interval for $\Phi^{(\lambda)}_{total}$ using the estimated approximate standard error $\hat\sigma[\Phi^{(\lambda)}_{total}]/\sqrt{n}$ for $\hat\Phi^{(\lambda)}_{total}$, where $\hat\sigma^2[\Phi^{(\lambda)}_{total}]$ denotes $\sigma^2[\Phi^{(\lambda)}_{total}]$ with $\{p_{ij}\}$ replaced by $\{\hat p_{ij}\}$.
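The closed-form variance $\sigma^2$ is given in Yamamoto and Tomizawa (2009) and is not reproduced here. As a rough stand-in, the delta method can be applied numerically to any smooth measure of the cell probabilities: under multinomial sampling the asymptotic variance is $\sum p_{ij} g_{ij}^2 - (\sum p_{ij} g_{ij})^2$, where $g_{ij}$ are the partial derivatives of the measure. The sketch below approximates these derivatives by central differences; the function name, the step size and the value 1.96 for a 95% interval are assumptions of this illustration, and it presumes all observed cells are positive.

```python
import math

def delta_method_ci(counts, measure, z=1.96, eps=1e-6):
    """Approximate confidence interval for measure(p) under multinomial sampling.

    counts  : r x c table of observed frequencies (all assumed positive)
    measure : function mapping a table of cell probabilities to a number
    """
    n = sum(sum(row) for row in counts)
    p_hat = [[x / n for x in row] for row in counts]
    estimate = measure(p_hat)
    cells = [(i, j) for i in range(len(counts)) for j in range(len(counts[0]))]
    grad = []
    for i, j in cells:
        up = [row[:] for row in p_hat]
        down = [row[:] for row in p_hat]
        up[i][j] += eps
        down[i][j] -= eps
        # Central difference approximation of the partial derivative.
        grad.append((measure(up) - measure(down)) / (2.0 * eps))
    probs = [p_hat[i][j] for i, j in cells]
    mean_grad = sum(g * q for g, q in zip(grad, probs))
    variance = sum(q * g * g for g, q in zip(grad, probs)) - mean_grad ** 2
    se = math.sqrt(max(variance, 0.0) / n)
    return estimate, (estimate - z * se, estimate + z * se)

if __name__ == "__main__":
    counts = [[30, 20], [10, 40]]  # hypothetical frequencies
    print(delta_method_ci(counts, lambda p: p[0][0]))
```

Passing a function that evaluates a PRV measure in place of the lambda gives a numerical interval that can be compared against one based on the published variance formula.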
4. Analysis of data
We shall analyze the data in Tables 1 through 6 using the total uncertainty measure $\Phi^{(\lambda)}_{total}$. Table 7 gives the values of the estimated measure $\hat\Phi^{(\lambda)}_{total}$ and the approximate 95% confidence intervals for the measure $\Phi^{(\lambda)}_{total}$ applied to the data in Tables 1 through 6.
We see from Table 7 that the confidence interval for $\Phi^{(\lambda)}_{total}$ applied to the data in Table 1 includes zero for all $\lambda$. This would indicate that there is a structure of independence, i.e., null association, between the daily temperature at Nagasaki City in 2001 and in 2002. Namely, when we know the temperature of a day in 2001 (in 2002), the knowledge would not be useful for predicting the temperature of the day in 2002 (in 2001).
We also see from Table 7 that for the data of Merino ewes in Table 2 the values of the estimated measure $\hat\Phi^{(\lambda)}_{total}$ are close to zero; however, the confidence intervals for $\Phi^{(\lambda)}_{total}$ do not include zero. These would indicate that the number of lambs born in 1953 may be somewhat associated with the number of lambs born in 1952. Namely, when we know the number of lambs born in 1952 (in 1953), the knowledge may be somewhat useful for predicting the number of lambs born in 1953 (in 1952).
Moreover we see from Table 7 that for the three kinds of vision data in Tables 3, 4 and 5, the values of the estimated measure $\hat\Phi^{(\lambda)}_{total}$ are greater than zero and the confidence intervals for $\Phi^{(\lambda)}_{total}$ do not include zero. Therefore, for each of the vision data in Tables 3, 4 and 5, the right eye grade for an individual is strongly associated with the left eye grade for the individual. Namely, when we know the grade of one eye (of right eye and left eye) for an individual, the knowledge would be useful for predicting the grade of the other eye for the individual.
In addition, we see from Table 7 that (1) the value of the estimated measure $\hat\Phi^{(\lambda)}_{total}$ and the values in the confidence interval for $\Phi^{(\lambda)}_{total}$ applied to the vision data of pupils in Table 4 are greater than the corresponding values applied to the vision data of women in Table 3, and (2) the values applied to the vision data of students in Table 5 are greater than the corresponding values applied to the vision data of pupils in Table 4. Thus, when we want to predict the grade of one eye for an individual from the knowledge of the grade of the other eye, (1) the knowledge would be more useful for the data of pupils in Table 4 than for the data of women in Table 3, and (2) the knowledge would be more useful for the data of students in Table 5 than for the data of pupils in Table 4.
We see from Table 7 that, for example, when $\lambda = 1$, the value of the estimated measure $\hat\Phi^{(1)}_{total}$ is 0.650 for the data of students in Table 5. Thus, when we predict the grade of one eye for a student from the knowledge of the grade of the other eye, the prediction becomes 65% better when we know the information than when we do not. Similarly, for the vision data of pupils in Table 4, the prediction becomes 53% better when we know the information than when we do not. Also, for the vision data of women in Table 3, the prediction becomes 44% better.
We see further from Table 7 that for the data of socioprofessional status in Table 6, the values of the estimated measure $\hat\Phi^{(\lambda)}_{total}$ are greater than zero and the confidence intervals for $\Phi^{(\lambda)}_{total}$ do not include zero. In addition, the value of $\hat\Phi^{(\lambda)}_{total}$ applied to the data in Table 6 is greater than any other value of $\hat\Phi^{(\lambda)}_{total}$ applied to the data in Tables 1 through 5. Therefore, for the data in Table 6, the socioprofessional status in 1962 for an individual is strongly associated with the status in 1954 for the individual. Namely, when we know the socioprofessional status in 1954 (in 1962) for an individual, the knowledge would be useful for predicting the status in 1962 (in 1954) for the individual. We see from Table 7 that, for example, when $\lambda = 1$, the value of the estimated measure $\hat\Phi^{(1)}_{total}$ is 0.699 for the data of socioprofessional status in Table 6. Thus, when we want to predict the socioprofessional status in one of the years 1954 and 1962 for an individual from the knowledge of the status in the other year, the prediction becomes about 70% better when we know the information than when we do not.
5. Remarks
For a two-way contingency table with the explanatory variable X and the response variable Y, the PRV measures of form (1.1), including, e.g., $T^{(\lambda)}$, $\tau$ and U, would be useful for seeing to what degree the relative improvement in predicting the value of the response variable Y, when we know the value of the explanatory variable X, approaches perfect prediction (i.e., when the measure equals 1). For a two-way contingency table in which the explanatory and response variables are not defined clearly, the PRV measures of form (1.2), including, e.g., $T^{(\lambda)}_{total}$, $U_{total}$ and $\Phi^{(\lambda)}_{total}$, would be useful for seeing to what degree the relative improvement in predicting the value of one variable, when we know the value of the other variable, approaches perfect prediction.
Yamamoto and Tomizawa (2009) applied the total uncertainty measure $\Phi^{(\lambda)}_{total}$ to the data of cross-classification of father's and his son's occupational status in Denmark, in Britain and in Japan (though the details are omitted here). When we want to predict the son's occupational status for a pair of father and son from the knowledge of the father's occupational status, and conversely to predict the father's occupational status from the knowledge of the son's occupational status, we are interested in to what degree the prediction becomes better when we know the information for one of the pair than when we do not. In such a case, a PRV measure such as $\Phi^{(\lambda)}_{total}$ would be useful for measuring the degree of the proportional reduction in variation.
Note that (1) the measures $T^{(\lambda)}$, $\tau$, U, $T^{(\lambda)}_{total}$ and $U_{total}$ are usually used when the row and column classifications both have nominal categories, (2) the measure $\Phi^{(\lambda)}_{total}$ is used when both have ordered categories, and (3) the PRV measure proposed by Tomizawa et al. (2002) is used when one of the row and column classifications has nominal categories and the other has ordered categories.
6. Conclusions
The present paper has analyzed several categorical data sets and compared the degree of PRV between them using the total uncertainty measure $\Phi^{(\lambda)}_{total}$. We have seen that when we want to predict the value of one variable by knowing the value of the other variable, the prediction based on the information would be more useful for the unaided distance vision data and for the data of socioprofessional status than for the data of temperatures in two years and for the data of numbers of lambs born in consecutive years.
7. Discussion
The data in Table 8, taken from Everitt (1992, p. 56), show the frequencies obtained when 284 consecutive admissions to a psychiatric hospital are classified with respect to social class and diagnosis. Everitt examined to what degree the knowledge of a patient's social class is useful for predicting his diagnostic category, using Goodman and Kruskal's (1954) lambda measure. We are now interested in applying the PRV measures $T^{(\lambda)}$, $T^{(\lambda)}_{total}$ and $\Phi^{(\lambda)}_{total}$ to the data in Table 8. However, since the categories in Table 8 seem to be nominal (i.e., not ordinal), it would not be suitable to apply the measure $\Phi^{(\lambda)}_{total}$. Therefore we shall apply $T^{(\lambda)}$ and $T^{(\lambda)}_{total}$ to these data. The estimated values of $T^{(\lambda)}$ ($T^{(\lambda)}_{total}$) are, for instance, 0.041 (0.046) when $\lambda = 0$, 0.043 (0.050) when $\lambda = 0.4$, and 0.039 (0.049) when $\lambda = 1$, and the confidence intervals for $T^{(\lambda)}$ ($T^{(\lambda)}_{total}$) do not include zero, though the details are omitted. Therefore the knowledge of a patient's social class would be useful for predicting his diagnostic category, and conversely the knowledge of a patient's diagnostic category would be useful for predicting his social class. So, using the PRV measure, it would be important to examine to what degree the prediction of his diagnostic category becomes better when one knows the patient's social class than when one does not know it.
References 1. Agresti, A. (1983). A simple diagonals-parameter symmetry and quasi-
symmetry model. Statistics and Probability Letters, 1, 313-316. 2. Agresti, A. (2002). Categorical Data Analysis, second edition. Wiley, New York. 3. Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W. (1975). Discrete Multivariate Analysis: Theory and Practice. The MIT Press, Cambridge, Massachusetts. 4. Bowker, A. H. (1948). A test for symmetry in contingency tables. Journal of the American Statistical Association, 43, 572-574. 5. Caussinus, H. (1965). Contribution a l'analyse statistique des tableaux de correlation. Annales de la Faculte des Sciences de l'Universite de Toulouse, 29, 77-182. 6. Cressie, N. and Read, T. R. C. (1984). Multinomial goodness-of-fit tests. Journal of the Royal Statistical Society, Ser. B, 46, 440-464. 7. Everitt, B. S. (1992). The Analysis of Contingency Tables, second edition. Chapman and Hall, London. 8. Freeman, D. H. (1987). Applied Categorical Data Analysis. Marcel Dekker, New York. 9. Goodman, L. A. (1979). Multiplicative models for square contingency tables with ordered categories. Biometrika, 66, 413-418. 10. Goodman, L. A. and Kruskal, W. H. (1954). Measures of association for cross classifications. Journal of the American Statistical Association, 49, 732-764.
349 11. McCullagh, P. (1978). A class of parametric models for the analysis of square contingency tables with ordered categories. Biometrika, 65, 413-418. 12. Miyamoto, N., Niibe, K. and Tomizawa, S. (2005). Decompositions of marginal homogeneity model using cumulative logistic models for square contingency tables with ordered categories. Austrian Journal of Statistics, 34, 361-373. 13. Miyamoto, N., Ohtsuka, W. and Tomizawa, S. (2004). Linear diagonalsparameter symmetry and quasi-symmetry models for cumulative probabilities in square contingency tables with ordered categories. Biometrical Journal, 46, 664-674. 14. Miyamoto, N., Usui, E. and Tomizawa, S. (2005). Generalized total uncertainty measure for two-way contingency table with nominal categories. The Pacific and Asian Journal of Mathematical Sciences, 1, 23-39. 15. Patil, G. P. and Taillie, C. (1982). Diversity as a concept and its measurement. Journal of the American Statistical Association, 77, 548-561. 16. Stuart, A. (1955). A test for homogeneity of the marginal distributions in a two-way classification. Biometrika, 42, 412-416. 17. Tahata, K. and Tomizawa, S. (2006). Decompositions for extended double symmetry models in square contingency tables with ordered categories. Journal of the Japan Statistical Society, 36, 91-106. 18. Tahata, K. and Tomizawa, S. (2008). Orthogonal decomposition of pointsymmetry for multi-way tables. Advances in Statistical Analysis, 92, 255-269. 19. Tahata, K., Takazawa, A. and Tomizawa, S. (2008). Collapsed symmetry model and its decomposition for multi-way tables with ordered categories. Journal of the Japan Statistical Society, 38, 325-334. 20. Tahata, K., Yamamoto, K., Nagatani, N. and Tomizawa, S. (2009). A measure of departure from average symmetry for square contingency tables with ordered categories. Austrian Journal of Statistics, 38, 101-108. 21. Tallis, G. M. (1962). The maximum likelihood estimation of correlation from contingency tables. Biometrics, 18, 342-353. 
22. Theil, H. (1970). On the estimation of relationships involving qualitative variables. American Journal of Sociology, 76, 103-154. 23. Tomizawa, S. (1984). Three kinds of decompositions for the conditional symmetry model in a square contingency table. Journal of the Japan Statistical Society, 14, 35-42. 24. Tomizawa, S. (1985). Analysis of data in square contingency tables with ordered categories using the conditional symmetry model and its decomposed models. Environmental Health Perspectives, 63, 235-239. 25. Tomizawa, S. (1993). Diagonals-parameter symmetry model for cumulative probabilities in square contingency tables with ordered categories. Biometrics, 49, 883-887. 26. Tomizawa, S. (2009). Analysis of square contingency tables in statistics. American Mathematical Society Translations, 227, 147-174. 27. Tomizawa, S. and Ebi, M. (1998). Generalized proportional reduction in variation measure for multi-way contingency tables. Journal of Statistical Research, 32, 75-84.
28. Tomizawa, S. and Machida, M. (1999). Measure of proportional reduction in variation for multi-way contingency tables with multiple response variables. The Egyptian Statistical Journal, 43, 167-182. 29. Tomizawa, S. and Tahata, K. (2007). The analysis of symmetry and asymmetry: orthogonality of decomposition of symmetry into quasi-symmetry and marginal symmetry for multi-way tables. Journal de la Societe Francaise de Statistique, 148, 3-36. 30. Tomizawa, S. and Yukawa, T. (2003). Proportional reduction in variation measures of departure from cumulative dichotomous independence for square contingency tables with same ordinal classifications. Far East Journal of Theoretical Statistics, 11, 133-165. 31. Tomizawa, S. and Yukawa, T. (2004). Proportional reduction in variation measure for two-way contingency tables with ordered categories. Journal of Statistical Research, 38, 45-59. 32. Tomizawa, S., Miyamoto, N. and Yajima, R. (2002). Proportional reduction in variation measure for nominal-ordinal contingency tables. Calcutta Statistical Association Bulletin, 53, 167-183. 33. Tomizawa, S., Miyamoto, N. and Yamamoto, K. (2006). Decomposition for polynomial cumulative symmetry model in square contingency tables with ordered categories. Metron, 64, 303-314. 34. Tomizawa, S., Seo, T. and Ebi, M. (1997). Generalized proportional reduction in variation measure for two-way contingency tables. Behaviormetrika, 24, 193-201. 35. Yamamoto, K. and Tomizawa, S. (2009). Measure of proportional reduction in variation and measure of agreement for contingency tables with ordered categories. International Journal of Applied Mathematics and Statistics, 14, 3-23.
Table 1. The daily temperatures at Nagasaki City, Japan, in 2001 and 2002; from Tahata et al. (2008).

                      2002
2001                  Below normals (1)   Normals (2)   Above normals (3)   Total
Below normals (1)            11               18               30             59
Normals (2)                  38               79               64            181
Above normals (3)            19               64               42            125
Total                        68              161              136            365
Table 2. Merino ewes according to number of lambs born in consecutive years; from Tallis (1962).

Number of Lambs       Number of Lambs in 1953
in 1952                  0       1       2     Total
0                       58      52       1      111
1                       26      58       3       87
2                        8      12       9       29
Total                   92     122      13      227
Table 3. Unaided distance vision of 7477 women aged 30-39 employed in Royal Ordnance factories in Britain from 1943 to 1946; from Stuart (1955).

                      Left eye grade
Right eye grade       Best (1)   Second (2)   Third (3)   Worst (4)   Total
Best (1)                1520        266          124          66       1976
Second (2)               234       1512          432          78       2256
Third (3)                117        362         1772         205       2456
Worst (4)                 36         82          179         492        789
Total                   1907       2222         2507         841       7477
Table 4. Unaided distance vision of 3168 pupils comprising nearly equal number of boys and girls aged 6-12 at elementary schools in Tokyo, Japan, examined in June 1984; from Tomizawa (1985).

                      Left eye grade
Right eye grade       Best (1)   Second (2)   Third (3)   Worst (4)   Total
Best (1)                2470        126           21          10       2627
Second (2)                96        138           33           5        272
Third (3)                 10         42           75          15        142
Worst (4)                 12          7           16          92        127
Total                   2588        313          145         122       3168
Table 5. Unaided distance vision of 4746 students aged 18 to about 25 including about 10% women in Faculty of Science and Technology, Science University of Tokyo in Japan examined in April 1982; from Tomizawa (1984).

                      Left eye grade
Right eye grade       Best (1)   Second (2)   Third (3)   Worst (4)   Total
Best (1)                1291        130           40          22       1483
Second (2)               149        221          114          23        507
Third (3)                 64        124          660         185       1033
Worst (4)                 20         25          249        1429       1723
Total                   1524        500         1063        1659       4746
Table 6. Cross-classification of individuals according to socioprofessional status; from Caussinus (1965).

Status          Status in 1962
in 1954         (1)    (2)    (3)    (4)    (5)    (6)    Total
(1)             187     13     17     11      3      1     232
(2)               4    191      4      9     22      1     231
(3)              22      8    182     20     14      3     249
(4)               6      6     10    323      7      4     356
(5)               1      3      4      2    126     17     153
(6)               0      2      2      5      1    153     163
Total           220    223    219    370    173    179    1384
Table 7. Estimate of the measure, its approximate standard error, and approximate 95% confidence interval for the measure.

Parameter value   Estimated measure   Standard error   Confidence interval
(a) For Table 1
-0.4                   0.009              0.006          (-0.003, 0.020)
 0.0                   0.011              0.007          (-0.004, 0.025)
 0.4                   0.012              0.008          (-0.004, 0.028)
 1.0                   0.012              0.008          (-0.004, 0.029)
(b) For Table 2
-0.4                   0.085              0.030          (0.027, 0.143)
 0.0                   0.093              0.033          (0.029, 0.157)
 0.4                   0.094              0.034          (0.028, 0.161)
 1.0                   0.095              0.035          (0.026, 0.163)
(c) For Table 3
-0.4                   0.278              0.007          (0.265, 0.292)
 0.0                   0.363              0.008          (0.347, 0.379)
 0.4                   0.408              0.009          (0.390, 0.425)
 1.0                   0.435              0.009          (0.417, 0.453)
(d) For Table 4
-0.4                   0.432              0.019          (0.395, 0.470)
 0.0                   0.503              0.020          (0.463, 0.543)
 0.4                   0.524              0.021          (0.484, 0.565)
 1.0                   0.530              0.021          (0.488, 0.572)
(e) For Table 5
-0.4                   0.470              0.010          (0.450, 0.490)
 0.0                   0.574              0.010          (0.554, 0.593)
 0.4                   0.622              0.010          (0.603, 0.641)
 1.0                   0.650              0.009          (0.631, 0.668)
(f) For Table 6
-0.4                   0.518              0.019          (0.481, 0.554)
 0.0                   0.624              0.019          (0.587, 0.662)
 0.4                   0.673              0.019          (0.636, 0.710)
 1.0                   0.699              0.019          (0.663, 0.736)
Table 8. Social class and diagnostic category for a sample of psychiatric patients; from Everitt (1992, p. 56).

                Diagnosis
Social class    Neurotic   Depressed   Personality disorder   Schizophrenic   Total
(1)                45          25               21                  18          109
(2)                10          45               24                  22          101
(3)                17          21               18                  18           74
Total              72          91               63                  58          284
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 355- 361)
THE ELECTRON RESERVOIR HYPOTHESIS FOR TWO-DIMENSIONAL ELECTRON SYSTEMS
K. YAMADA¹, T. UCHIDA¹, M. FUJITA¹, H. KOIZUMI² AND T. TOYODA¹*
¹ Department of Physics, Tokai University, Kitakaname 1117, Hiratsuka, Kanagawa 259-1292, Japan
² Nippon Gear Co., Ltd., Kirihara-cho 7, Fujisawa, Kanagawa 252-0811, Japan
* Corresponding author. E-mail: [email protected]
The electron reservoir model for the integer quantum Hall effects, the magnetoplasmon dispersion plateaus, and the radiation-induced magnetoresistance oscillations is briefly reviewed.
Keywords: Two-dimensional electron systems; Quantum Hall effects; Magnetoplasmon dispersion ; Magnetoresistance oscillations.
1. Introduction
The quantum statistical theory of many-electron systems is based on the grand canonical ensemble, which requires the existence of an electron reservoir. If the number of electrons of the system under consideration is fixed, then it is necessary to calculate the chemical potential as a function of the electron number $N_{\mathrm{exp}}$ and other thermodynamic variables by solving the equation
(1)
where $\psi_{\alpha}$ is the second quantized field operator in the Heisenberg picture describing the electrons, and $\alpha$ is the spin variable. The notation $\langle \cdots \rangle_G$ stands for the grand canonical ensemble expectation value defined as
\[ \langle \cdots \rangle_G \equiv \frac{1}{Z_G}\,\mathrm{tr}\left\{ e^{-\beta(\hat{H} - \mu \hat{N})} \cdots \right\}, \tag{2} \]
where $Z_G$ is the grand partition function, $\beta = 1/k_B T$, $\mu$ is the chemical potential, and $\hat{H}$ and $\hat{N}$ are the Hamiltonian and number operators for the electrons, respectively. Except for the zero-temperature limit of an ideal electron gas, the above equation (1) cannot be solved to find the chemical potential
(3)
This difficulty is inherent in the grand canonical ensemble formulation of quantum many-body theories. However, if the system is an open system with respect to the electron number, the situation is totally different. In such a system, the chemical potential is one of the independent thermodynamic variables that are directly controlled in the experiment. Then one need not solve the above equation. The aim of this paper is to show that there exist such cases in the two-dimensional electron systems (2DES) in various semiconductors such as MOSFET and GaAs heterostructure FET [1,2,3]. Three prominent cases, i.e., the integer quantum Hall effects, the magnetoplasmon dispersion plateaus, and the radiation-induced magnetoresistance oscillations, are briefly reviewed in the following sections.
2. Quantum Hall Effects
In 1980 von Klitzing [4] discovered that the Hall conductivity of a two-dimensional electron system in the inversion layer of a MOSFET at a very low temperature (T = 1.5 K) is quantized,
\[ \sigma_H = j\,\frac{e^2}{h} \qquad (j = 1, 2, \ldots) \tag{4} \]
when the system is subjected to a strong perpendicular magnetic field (B = 18.9 T). In 1985, Toyoda, Gudmundsson, and Takahashi [3] showed that this phenomenon can be fully explained by introducing the second quantized Schrödinger field operators to describe the electrons and by assuming the Hamiltonian
\[ H = H_A + H_E + H_{\mathrm{spin}} + H_{e\text{-}e} + H_{\mathrm{imp}}. \tag{5} \]
On the right-hand side, the first term $H_A$ is the kinetic energy with minimal coupling to the magnetic field; the second term $H_E$, the energy due to the electric field; the third term $H_{\mathrm{spin}}$, the coupling between the spin and the magnetic field; the fourth term $H_{e\text{-}e}$, the electron-electron interaction; and the fifth term $H_{\mathrm{imp}}$, the effects of impurities or lattice defects. Using this Hamiltonian, the canonical equations of motion for the currents
\[ I_\nu = \int d^2 r \left[ -\frac{i e \hbar}{2m}\, \psi^\dagger_\alpha(x) \overleftrightarrow{\partial}_\nu \psi_\alpha(x) - \frac{e^2}{mc}\, A_\nu(x)\, \psi^\dagger_\alpha(x) \psi_\alpha(x) \right] \tag{6} \]
are calculated. Then, taking the grand canonical ensemble average of the equations and considering the boundary conditions, the Hall conductivity formula
\[ \sigma_H = \frac{N e c}{B} \tag{7} \]
is obtained. If the 2DES is an open system with respect to electrons, then
\[ N = \frac{eB}{hc} \sum_{\alpha} \sum_{n=0}^{\infty} \frac{1}{1 + \exp \beta (E_{n\alpha} - \mu)} = N(\beta, \mu), \tag{8} \]
where $E_{n\alpha}$ is the energy spectrum of the electron asymptotic field, i.e., the Landau levels with renormalized parameters. Equations (7) and (8) yield the quantum Hall conductivity formula
\[ \sigma_H = \frac{e^2}{h} \sum_{\alpha} \sum_{n=0}^{\infty} \frac{1}{1 + \exp \beta (E_{n\alpha} - \mu)}. \tag{9} \]
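The structure of Eq. (9) is easy to explore numerically. The following is a minimal sketch, not the authors' code: it evaluates the double sum of Fermi factors in units of e²/h for a spin-split Landau-level spectrum of the form of Eq. (11). The function names, the truncation `n_max`, and all numerical parameter values are illustrative assumptions.

```python
import numpy as np

def landau_levels(n_max, hbar_omega_c, zeeman_splitting):
    """Spin-split Landau levels in the form of Eq. (11), truncated at n_max.

    hbar_omega_c     : cyclotron energy hbar*e*B/(m* c)            (assumed units)
    zeeman_splitting : g* mu_B B; each orbital level shifts by +/- half of it
    """
    n = np.arange(n_max)
    orbital = hbar_omega_c * (n + 0.5)
    return np.concatenate([orbital + 0.5 * zeeman_splitting,
                           orbital - 0.5 * zeeman_splitting])

def hall_conductivity(mu, levels, kT):
    """Sum of Fermi factors over all levels, i.e. sigma_H in units of e^2/h."""
    x = (levels - mu) / kT
    # 1/(1+exp(x)) rewritten with tanh so that large |x| cannot overflow
    return np.sum(0.5 * (1.0 - np.tanh(0.5 * x)))

if __name__ == "__main__":
    # Hypothetical parameters: energies measured in units of hbar*omega_c.
    levels = landau_levels(n_max=50, hbar_omega_c=1.0, zeeman_splitting=0.1)
    for mu in (0.25, 1.0, 2.0, 3.0):
        print(mu, hall_conductivity(mu, levels, kT=0.01))
```

When the chemical potential is treated as the control variable and lies between Landau levels, the sum stays close to an integer at low temperature, which is the plateau structure the reservoir picture relies on.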
The significance of this result is that the electron-electron interaction and the impurity term in the Hamiltonian do not appear explicitly. Their effects should be contained in the energy spectrum $E_{n\alpha}$. Adopting the linear model for the relation between the chemical potential and the gate voltage,
\[ \mu = a (V_g + V_0), \tag{10} \]
and assuming the Landau level energy spectrum
\[ E_{n\alpha} = \hbar \frac{eB}{m^* c} \left( n + \frac{1}{2} \right) + \mathrm{sgn}(\alpha)\, \frac{g^*}{2}\, \mu_B B, \tag{11} \]
where $\mu_B$ is the Bohr magneton, and $m^*$ and $g^*$ are the effective mass and effective g-factor, respectively, the Hall conductivity can be written as a function of $B$, $V_g$, and temperature. The obtained theoretical results show excellent quantitative agreement with the experimental data [3].
3. Magnetoplasmon Dispersion Plateaus
In 2004, Holland et al. [6] measured the explicit filling factor dependence of the dispersion of the long-wavelength magnetoplasmon in a high-mobility 2DES realized in a GaAs quantum well, using the coupling of the plasmon to THz radiation. The observed dispersion seemed to deviate strongly from the well-established semi-classical dispersion
\[ \omega_{mp}^2 = \omega_c^2 + \frac{2\pi e^2 N_{2DES}}{\epsilon m}\, q, \tag{12} \]
where $\epsilon$ is the dielectric constant of the GaAs semiconductor in which the 2DES is embedded, $m$ is the electron effective mass, $-e$ is the electron charge, $N_{2DES}$ is the electron number density, $q$ is the wave-number vector, and $\omega_c = eB/mc$ is the cyclotron frequency. Using the measured magnetoplasmon frequency $\omega_{mp}^{EXP}$, they defined the renormalized magnetoplasmon frequency
\[ \Omega_{mp}^{EXP} = \frac{\{\omega_{mp}^{EXP}\}^2 - \omega_c^2}{\omega_c}. \tag{13} \]
They plotted this $\Omega_{mp}^{EXP}$ versus the filling factor defined as
\[ \nu = \frac{h c N_{sample}}{eB}, \tag{14} \]
where $N_{sample}$ is the electron number density of the samples, whose explicit values were given for three samples in Ref. 6. By substituting Eqs. (12) and (14) into Eq. (13), and assuming $N_{2DES} = N_{sample}$, one straightforwardly finds
\[ \Omega_{mp}^{EXP} = \frac{2\pi e c N_{sample}}{\epsilon B}\, q = \frac{2\pi e^2}{\epsilon h}\, \nu q. \tag{15} \]
For a fixed value of the wave-number vector $q$, Eq. (15) shows that $\Omega_{mp}^{EXP}$ is simply proportional to the filling factor. After the measurement, however, they found an astonishing deviation from such a simple linear relation. They found a quantized dispersion with plateaus forming around even filling factors. If we follow carefully the quantum statistical mechanical derivation of the dispersion relation (12) [10], then we find that the quantity $N_{2DES}$ in the formula is actually the grand canonical ensemble expectation value for the electron number density in the system and that it should be given by Eq. (8). Then, the renormalized magnetoplasmon frequency defined by Eq. (13) should be replaced by
\[ \Omega_{mp} \equiv \frac{\omega_{mp}^2 - \omega_c^2}{\omega_c} = \frac{2\pi e c N_{2DES}}{\epsilon B}\, q, \tag{16} \]
where $\omega_{mp}$ is given by Eq. (12). The only difference between Eqs. (15) and (16) is the electron number density. In Eq. (15), $N_{sample}$ is a given quantity. On the other hand, in Eq. (16), $N_{2DES}$ is a function of the temperature $T$, the magnetic field strength $B$, and the chemical potential $\mu$, as given by Eq. (8). Substituting Eq. (8) into Eq. (16), we find [7]
\[ \Omega_{mp} = \frac{2\pi e^2 q}{\epsilon h} \sum_{\alpha} \sum_{n=0}^{\infty} \frac{1}{1 + \exp \beta (E_{n\alpha} - \mu)}. \tag{17} \]
Apart from the factor $2\pi q/\epsilon$, the resemblance to the quantum Hall conductivity formula derived in Refs. 2 and 8 is unmistakable. This result (17) shows excellent agreement with the experimental data.
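The step leading to Eq. (15) can be spelled out. The following short worked derivation uses only the semi-classical dispersion (12), the definition (13) with the cyclotron frequency $\omega_c = eB/mc$ in the denominator, the filling factor (14), and the assumption $N_{2DES} = N_{sample}$ stated in the text:

```latex
\begin{align*}
\Omega_{mp}^{EXP}
  &= \frac{\omega_{mp}^2 - \omega_c^2}{\omega_c}
   = \frac{2\pi e^2 N_{sample}}{\epsilon m}\, q \cdot \frac{mc}{eB}
   = \frac{2\pi e c N_{sample}}{\epsilon B}\, q, \\
N_{sample} &= \frac{eB}{hc}\, \nu
  \quad\Longrightarrow\quad
\Omega_{mp}^{EXP}
   = \frac{2\pi e c}{\epsilon B} \cdot \frac{eB}{hc}\, \nu q
   = \frac{2\pi e^2}{\epsilon h}\, \nu q .
\end{align*}
```

The magnetic field cancels in the last step, which is why the fixed-density picture predicts a strictly linear dependence on the filling factor. Replacing $N_{sample}$ by the reservoir expression (8) in the first line leads instead to the sum of Fermi factors in the result (17).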
4. Radiation-induced Magnetoresistance Oscillations
In 2001 and 2002, Zudov et al. [11] and Mani et al. [12] found a new class of magnetoresistance oscillations in a high-mobility two-dimensional electron system (2DES) in a GaAs/Al_xGa_{1-x}As heterostructure subjected to weak magnetic fields and millimeterwave radiation. The period of these new oscillations is governed by the ratio of the millimeterwave to cyclotron frequencies, and the minima of the oscillations are characterized by an exponentially vanishing diagonal resistance. When the millimeterwave radiation is turned off, the magnetoresistance shows the well-established SdHvA oscillation whose period is governed by the ratio of the chemical potential to the cyclotron frequency. Hence this magnetoresistance oscillation is apparently induced by the illumination with millimeterwaves. Although various theoretical models have been proposed, one crucial question remains to be solved. The experiment by Smet et al. [13] shows that the resistance oscillations are notably immune to the polarization of the radiation field. This observation is at odds with these theories and seems to cast doubt on the validity of the theoretical models so far proposed. The Fermi liquid theory of electrical conductivity was originally formulated by Eliashberg. The core of the theory is the analytic continuation of the finite temperature current correlation function with respect to the Matsubara frequency to obtain the retarded real-time current response function, which directly yields the conductivity. Here the formulation given in Ref. 5 is applied to an electron gas that is confined in the xy-plane and subjected to a perpendicular magnetic field B = (0, 0, B). The general expression of
conductivity can be found as -e 2 !'i
O'xx = 4m 2
LL 00
J
dw
M=O
0:
x
1 27f
jPrna x
o
2 f(w) R A dp P ~GMex(p,W)GMa(P,W)
-Prnax
{I + ~ReAII(p,w)} (18)
where $f(\omega) = (1 + \exp(\beta\hbar\omega))^{-1}$, $G^R$ ($G^A$) is the retarded (advanced) Green function, and $\Lambda_\Pi$ is the vertex function. The theories so far proposed consider the effects of the millimeterwave radiation on the 2DES only. However, if there is an electron reservoir, the effects of the radiation on the electrons in the reservoir should also be taken into account. Since the amount of energy that an excited electron can receive from millimeterwave radiation is $\hbar\nu$, the condition under which the electron can join the 2DES should be $\hbar\omega_c/2 + E_{res} < \hbar\nu$. This condition can also be written as $B < (2mc/\hbar e)(\hbar\nu - E_{res}) \equiv B_c$. As the chemical potential is the minimum free energy needed to add an electron to the system, the emergence of such a process may be described by introducing another singularity in the retarded Green function at $\hbar\nu$. This singularity of the Green function may be expressed as an effective chemical potential. Then the conductivity formula (18) yields [14]
O'xx
e2 A
= -m
{WIBI
+ W2B2 + ~B} ==
e2 A
-WON m
(19)
when the radiation is on, and
e2 A O'xx = - {W2 m
+ ~B} ==
e2 A
-WOFF m
(20)
when the radiation is off. Here ~ == (-e/47f 2 mcA) LM AO is assumed to be a constant, and Wi'S are given as Wi
=
L ex
f
M =O
,8 hw e
{1 + e(3(c M ,,-TJ;l} {I + e-(3(C M ,,-TJil} .
(21)
In the measurement by Zudov et al. [11] the current $I_x$ is measured by controlling the electric field $E_x$, while the current $I_y$ as well as the external electric field $E_y$ are kept zero. Therefore, the resistivity $R_{xx}$ observed in their measurement should correspond to $1/\sigma_{xx}$ in this theory. The resistivity corresponding to $R_{xx}$ in Ref. 11 is given as
(22)
The theoretical pattern shows excellent agreement with the experimental curve. The B-dependence of the oscillatory patterns of the millimeterwave-induced magnetoresistance oscillations observed by Zudov et al. [11] is almost perfectly reproduced by our theoretical model based on the FLH and the ERH, including its immunity to the polarization of the radiation field, in perfect accordance with the experimental observation by Smet et al. [13].
5. Concluding Remarks
We have shown that the electron reservoir model can perfectly explain the three prominent phenomena in semiconductor 2DES. Although experimental identification of the microscopic mechanism of the electron reservoir still needs to be carried out, there seems to be no doubt that the electron reservoir should exist in those systems.
References
1. G. A. Baraff and D. C. Tsui, Phys. Rev. B 24, 2274 (1981).
2. T. Toyoda, V. Gudmundsson, and Y. Takahashi, Phys. Lett. 102A, 130 (1984).
3. T. Toyoda, V. Gudmundsson, and Y. Takahashi, Physica 132A, 164 (1985).
4. K. von Klitzing, G. Dorda, and M. Pepper, Phys. Rev. Lett. 45, 494 (1980).
5. T. Toyoda, Phys. Rev. A 39, 2659 (1989).
6. S. Holland, Ch. Heyn, D. Heitmann, E. Batke, R. Hey, K. J. Friedland, and C.-M. Hu, Phys. Rev. Lett. 93, 186804 (2004).
7. T. Toyoda, N. Hiraiwa, T. Fukuda, and H. Koizumi, Phys. Rev. Lett. 100, 036802 (2008).
8. T. Toyoda, V. Gudmundsson, and Y. Takahashi, Physica 132A, 164 (1985).
9. M. P. Greene, H. J. Lee, J. J. Quinn, and S. Rodriguez, Phys. Rev. 177, 1019 (1969).
10. N. Hiraiwa and T. Toyoda, in preparation.
11. M. A. Zudov, R. R. Du, J. A. Simmons, and J. L. Reno, Phys. Rev. B 64, 201311 (2001).
12. R. G. Mani, J. H. Smet, K. von Klitzing, V. Narayanamurti, W. B. Johnson, and V. Umansky, Nature 420, 646 (2002).
13. J. H. Smet, B. Gorshunov, C. Jiang, L. Pfeiffer, K. West, V. Umansky, M. Dressel, R. Meisels, F. Kuchar, and K. von Klitzing, Phys. Rev. Lett. 95, 116804 (2005).
14. T. Toyoda, Modern Physics Letters B 24, 1923 (2010).
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 363-372)
ON THE CORRESPONDENCE BETWEEN NEWTONIAN AND FUNCTIONAL MECHANICS
E.V. PISKOVSKIY, I.V. VOLOVICH
Moscow Institute of Physics and Technology, Institutskiy lane 9, 141700 Dolgoprudny, Moscow Region, Russia
email: [email protected]
Steklov Mathematical Institute, Gubkin St. 8, 119991 Moscow, Russia
email: volovich@mi.ras.ru
The world view underlying traditional science is based on reductionism and determinism: there is an empty space (vacuum) and material points that move along Newtonian trajectories. This approach may be called "mechanistic" or "Newtonian". Quantum mechanics, in its Copenhagen interpretation, also adopts this world view. However, this world view is not satisfactory for at least two reasons. First, there is uncertainty in the determination of the position and velocity of the material point, and second, it cannot solve the time irreversibility problem. Moreover, the Newtonian approach is not well suited for applications of mathematics and physics to life science. Recently a new approach to classical mechanics was proposed in which the basic notion is not the trajectory but a probability distribution. In this functional mechanics approach one deals with the mean trajectories and one has corrections to the Newtonian equation of motion. In this note we consider the correspondence between the Newtonian trajectories for an anharmonic oscillator and the averaged trajectories in functional mechanics, and compute the dependence of the characteristic time on the dispersion.
1. Introduction
Classical mechanics, as first formulated by Newton and further developed by others, was seen, until the early 20th century, as the foundation for science as a whole. Other disciplines, such as physics, biology, or economics, accepted a general mechanistic or Newtonian approach and world view. Even quantum mechanics, according to the Copenhagen interpretation, is based on the classical deterministic Newtonian mechanics. The basic principle behind Newtonian science is reductionism: to understand any complex phenomenon, one needs to reduce it to its individual components. The smallest possible parts are called atoms or "elementary particles". The only properties that fundamentally distinguish particles are their positions in space and their velocities. If one knows the initial positions and velocities of the particles constituting a system, together with the forces acting on those particles, then one can in principle predict the further evolution of the system with complete certainty and accuracy. The evolution will be regular, reversible and predictable. Such categories as life, mind, or organization are to be seen as particular arrangements of particles in space and time.
However, the Newtonian world view is not satisfactory for at least two reasons. First, there is uncertainty in the determination of the position and velocity of the material point. Second, it cannot solve the time irreversibility problem, see Ref. 1; for a discussion of the irreversibility problem see Ref. 2. Moreover, the Newtonian approach is not well suited for applications of mathematics and physics to life science. Recently a new approach to classical mechanics was proposed in which the basic notion is not the trajectory but a probability distribution. In this functional mechanics approach [1], see also [3,4,5,6,7], one deals with the mean trajectories and one has corrections to the Newtonian equation of motion. We emphasize that the exact determination of the coordinate and momentum cannot be done, not only in quantum mechanics, where there is the Heisenberg uncertainty relation, but also in classical mechanics. There are always some errors in setting the coordinates and momenta. There are classical uncertainty relations [1]: Δq > 0, Δp > 0, i.e. the uncertainty (error of observation) in the determination of coordinate and momentum is always positive (nonzero). The concept of arbitrary real numbers, given by infinite decimal series, is a mathematical idealization; such numbers cannot be measured in an experiment.
In this note we consider the correspondence between the Newtonian trajectories for an anharmonic oscillator and the averaged trajectories in functional mechanics, and compute the dependence of the characteristic time on the dispersion. We are motivated by the consideration of the quantum-classical correspondence for the baker's map performed by Inoue, Ohya and one of the present authors [8]. Note that the conventional, widely used concept of the microscopic state of the system at some moment in time as a point in phase space, as well as the notion of trajectory and the microscopic equations of motion, have no direct physical meaning, since arbitrary real numbers are not observable (observable physical quantities are only presented by rational numbers). The fundamental equation of the microscopic dynamics of the proposed functional probabilistic approach is not Newton's equation, but a Liouville equation for the distribution function. It is well known that the Liouville equation is used in statistical mechanics for the description of the motions of gas. Let us stress that we shall use the Liouville equation for the description of a single particle in empty space. A Liouville equation on the manifold $\Gamma$ with coordinates $x = (x^1, \ldots, x^k)$ has the form
\[ \frac{\partial \rho}{\partial t} + \sum_{i=1}^{k} \frac{\partial}{\partial x^i} \left( \rho v^i \right) = 0. \tag{1} \]
Here $\rho = \rho(x, t)$ is the probability density function and $v = v(x) = (v^1, \ldots, v^k)$ is a vector field on $\Gamma$. The solution of the Cauchy problem for the equation (1) with initial data
\[ \rho|_{t=0} = \rho_0(x) \tag{2} \]
might be written in the form
\[ \rho(x, t) = \rho_0(\varphi_{-t}(x)). \tag{3} \]
Here $\varphi_t(x)$ is a phase flow along the solutions of the characteristic equation
\[ \dot{x} = v(x). \tag{4} \]
Let $(q, p)$ be coordinates on the plane $\mathbb{R}^2$ (phase space), and let $t \in \mathbb{R}$ be time. The state of a classical particle at time $t$ is described by the function $\rho = \rho(q, p, t)$, the density of the probability that the particle at time $t$ has the coordinate $q$ and momentum $p$. Therefore, in the functional approach to classical mechanics the concept of a precise trajectory of a particle is absent; the fundamental concept is a distribution function $\rho = \rho(q, p, t)$, and a $\delta$-function as a distribution function is not allowed. We assume that the continuously differentiable and integrable function $\rho = \rho(q, p, t)$ satisfies the conditions:
\[ \rho \ge 0, \qquad \int_{\mathbb{R}^2} \rho(q, p, t)\, dq\, dp = 1, \qquad t \in \mathbb{R}. \tag{5} \]
If $f = f(q, p)$ is a function on phase space, the average value of $f$ at time $t$ is given by the integral
\[ \langle f \rangle(t) = \int f(q, p)\, \rho(q, p, t)\, dq\, dp. \tag{6} \]
In a sense we are dealing with a random process $\xi(t)$ with values in the phase space. The motion of a point body along a straight line in a potential field is described by the equation
\[ \frac{\partial \rho}{\partial t} = -\frac{p}{m} \frac{\partial \rho}{\partial q} + \frac{\partial V(q)}{\partial q} \frac{\partial \rho}{\partial p}. \tag{7} \]
Here $V(q)$ is the potential and the mass $m > 0$. If the distribution $\rho_0(q, p)$ for $t = 0$ is known, we can consider the Cauchy problem for the equation (7):
\[ \rho|_{t=0} = \rho_0(q, p). \tag{8} \]
The mean trajectory is defined as follows:
\[ \langle q \rangle(t) = \int q\, \rho(q, p, t)\, dq\, dp = \int q(t)\, \rho_0(q, p)\, dq\, dp, \tag{9} \]
where $q(t)$ is the classical trajectory of a point mass starting from the initial data $(q, p)$; the function $q(t)$ is governed by the Newton equation
\[ m \ddot{q} = -\frac{\partial V(q)}{\partial q}. \]
Therefore the mean trajectory can be obtained by averaging the classical trajectory over the probability distribution of the initial conditions. This fact is used throughout the present work.
2. Anharmonic oscillator
In the present work an anharmonic oscillator is considered. Namely, a point mass moves in a field described by the potential
\[ V(q) = \frac{1}{2} \omega_0^2 q^2 + \frac{1}{4} \varepsilon q^4, \tag{10} \]
where $\omega_0 > 0$ and $\varepsilon > 0$ is a small coupling constant. The coordinate $q(t)$ evolves according to the Newton equation (with $m = 1$):
\[ \ddot{q} + \omega_0^2 q = -\varepsilon q^3. \tag{11} \]
We set the following initial conditions:
\[ q(0) = b, \qquad \dot{q}(0) = 0. \tag{12} \]
The exact solution to (11) is well known, but for simplicity we shall use the approximate Krylov-Bogolyubov method described, for example, in Ref. 9. The solution reads
\[ q_{KB}(t) = a \cos(t\omega) + \varepsilon \frac{a^3}{32 \omega_0^2} \cos(3 t \omega) + O(\varepsilon^2), \]
where $a$ is an arbitrary constant and the frequency $\omega$ depends on the parameter $a$ as follows:
\[ \omega = \omega(a) = \omega_0 \left( 1 + \frac{3}{8} a^2 \varepsilon + O(\varepsilon^2) \right). \]
From the initial conditions (12) we get
\[ a + \varepsilon \frac{a^3}{32 \omega_0^2} = b. \tag{13} \]
We write
\[ q_{KB}(t, b) = a(b) \cos(t\, \omega(b)) + \varepsilon \frac{a(b)^3}{32 \omega_0^2} \cos(3 t\, \omega(b)), \]
where $a(b)$ is the real root of (13) and $\omega(b) = \omega(a(b))$, and we have $q_{KB}(0, b) = b$. The mean value of the coordinate in functional mechanics is defined by the integral
\[ \langle q \rangle(t, \sigma) = \frac{1}{\sqrt{\sigma \pi}} \int_{\mathbb{R}} q_{KB}(t, b)\, e^{-\frac{(b - b_0)^2}{\sigma}}\, db \tag{14} \]
with the distribution function $\rho_0(b) = \exp\{-(b - b_0)^2/\sigma\}/\sqrt{\pi\sigma}$. Here $\sigma > 0$ is the dispersion. Note that $\langle q \rangle(0, \sigma) = b_0$. We want to compare two time-dependent functions: $\langle q \rangle(t, \sigma)$ and $q_{KB}(t, b_0)$. For small $t$ the difference between them is small, since $\langle q \rangle(0, \sigma) = q_{KB}(0, b_0) = b_0$. The question is: what is the characteristic threshold time $t_c$ such that for $t > t_c$ the difference between the two functions becomes large, and how does $t_c = t_c(\sigma)$ depend on the dispersion $\sigma$? Note that the analogous problem of the dependence of the characteristic time on the Planck constant is considered in Ref. 8.
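3. Newtonian and Averaged Trajectories Comparison
It was proved in Ref. 1 that for any time $t$ one has
\[ \lim_{\sigma \to 0} \langle q \rangle(t, \sigma) = q_{KB}(t, b_0). \]
That is why one should expect $t_c(\sigma)$ to increase as $\sigma \to 0$. One can assume without loss of generality that $\omega_0 = 1$. The real root of the cubic equation (13) is given by the following function [10]:
\[ a = a(b) = 8 \sqrt{\frac{2}{3\varepsilon}}\, \sinh\left( \frac{1}{3} \operatorname{arcsinh}\left( \frac{3}{8} \sqrt{\frac{3\varepsilon}{2}}\, b \right) \right). \]
Let us introduce the following change of variables in the integral (14): $z\sqrt{\sigma} + b_0 = b$. It yields
\[ \langle q \rangle(t, \sigma) = \frac{1}{\sqrt{\pi}} \int_{\mathbb{R}} \left( a(z\sqrt{\sigma} + b_0) \cos(t\, \omega(z\sqrt{\sigma} + b_0)) + \frac{\varepsilon\, (a(z\sqrt{\sigma} + b_0))^3}{32} \cos(3 t\, \omega(z\sqrt{\sigma} + b_0)) \right) e^{-z^2}\, dz. \tag{15} \]
One can make a rough estimate of the threshold time $t_c$ by using the method of steepest descent. In this way we get $t_c = O(1/\sqrt{\sigma})$ as $\sigma \to 0$.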
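The closed-form root quoted above can be checked directly. The following worked verification is a sketch under the assumption that, for $\omega_0 = 1$, the cubic equation (13) has the form $a + \varepsilon a^3/32 = b$, which is what $q_{KB}(0, b) = b$ implies for the Krylov-Bogolyubov solution. The symbols $\kappa$ and $\theta$ are introduced here only for the check.

```latex
\begin{align*}
&\text{Write } a = \kappa \sinh\theta, \qquad
  \kappa = 8\sqrt{\tfrac{2}{3\varepsilon}}, \qquad
  \kappa^2 = \tfrac{128}{3\varepsilon}. \\
&a^3 + \tfrac{32}{\varepsilon}\, a
  = \tfrac{32}{\varepsilon}\,\kappa
    \left( \tfrac{\varepsilon \kappa^2}{32} \sinh^3\theta + \sinh\theta \right)
  = \tfrac{32}{3\varepsilon}\,\kappa
    \left( 4\sinh^3\theta + 3\sinh\theta \right)
  = \tfrac{32}{3\varepsilon}\,\kappa \sinh 3\theta . \\
&\text{Setting this equal to } \tfrac{32}{\varepsilon}\, b \text{ gives }
  \sinh 3\theta = \frac{3b}{\kappa}
  = \frac{3}{8}\sqrt{\frac{3\varepsilon}{2}}\; b,
  \quad\text{i.e.}\quad
  \theta = \frac{1}{3}\operatorname{arcsinh}\!\left(
     \frac{3}{8}\sqrt{\frac{3\varepsilon}{2}}\; b \right).
\end{align*}
```

Because the coefficient of the linear term is positive, the cubic is strictly increasing in $a$, so this is its only real root. For small $\varepsilon$ the expression reduces to $a \approx b$, as expected.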
4. Numerical Approach
One way to compare the functional-mechanical mean value of the coordinate with the trajectory obtained by perturbation theory is to estimate how the threshold time $t_{thr}$ depends on the dispersion $\sigma$ of the initial condition $b$. To define the threshold time we consider the following function
\[ \Delta(t) = |\langle q \rangle(t, \sigma) - q_{KB}(t, b_0)|, \tag{16} \]
so that the moment of time $t_c = t_{thr}(\sigma, C)$ is the minimal value of time $t$ at which the function $\Delta(t)$ equals some positive value $C$:
\[ \Delta(t_c) = C. \tag{17} \]
For instance, the constant $C$ can be set equal to $b_0$. For the numerical estimates we fix the constants $\varepsilon = 0.1$, $b_0 = 0.5$. The classical and averaged trajectories for different time intervals are plotted in Figs. 1-3, where they are represented by the dashed and solid lines, respectively. One can see from Fig. 1 that the divergence between the classical and averaged trajectories is less than $\varepsilon$ for $t \in [0; 15]$. Fig. 2 shows that the amplitude of the averaged trajectory decreases with time; consequently the divergence between the trajectories increases. The phase shift between the oscillations also becomes easily observable (Fig. 2). The amplitude of the averaged trajectory tends to zero with time (Fig. 3), while the phase difference of the oscillations does not seem to have any limiting value.
Figure 1. Numerically estimated classical and averaged trajectories in t ∈ [0; 15]
Figure 2. Numerically estimated classical and averaged trajectories in t ∈ [60; 75]
Figure 3. Numerically estimated classical and averaged trajectories in t ∈ [200; 400]
In order to estimate $t_c$ for different values of $\sigma$, we set the constant $C = 0.1\, q_{KB}(0, b_0)$, so that the following equation for the threshold time is considered:
\[ |\Delta(t_c)| = 0.1\, q_{KB}(0, b_0), \qquad q_{KB}(0, b_0) = b_0, \qquad \varepsilon = 0.1, \quad b_0 = 0.5. \tag{18--20} \]
The values of $t_c$ that satisfy the equation (18) are shown in Fig. 4.
Figure 4. Numerically estimated dependence $t_c = t_c(\sigma)$
The data shown in Fig. 4 are presented below:
With the help of nonlinear regression with parameters $a$ and $b$ one finds the numerical values of the parameters that make the model
\[ \frac{a}{\sqrt{\sigma}} - b \tag{21} \]
give the best fit to the data as a function of $\sigma$. The resulting constants are $a = 25.2086$, $b = 27.6926$:
\[ t_c = \frac{25.2086}{\sqrt{\sigma}} - 27.6926. \tag{22} \]
The $t_c$ and $\sigma$ values together with the fitted function are presented in Fig. 5. As can be seen, the points are close to the curve. This supports the estimate $t_c = O(1/\sqrt{\sigma})$ as $\sigma \to 0$ obtained in Section 3.
Figure 5. The $t_c$ and $\sigma$ values and the function (22)
The numerical estimations and nonlinear regression mentioned in the present work are performed with "Wolfram Mathematica 6.0" program suite licensed to Steklov Mathematical Institute.
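For readers without access to that program, the same procedure can be sketched with NumPy. This is an illustrative reimplementation under stated assumptions, not the authors' code: it takes $\omega_0 = 1$, the first-order Krylov-Bogolyubov trajectory, Gauss-Hermite quadrature for the Gaussian average, a simple grid search for the threshold time, and a linear least-squares fit in $1/\sqrt{\sigma}$ in place of nonlinear regression. The function names, grid step, quadrature order and sample values of $\sigma$ are hypothetical, so the fitted constants need not coincide with those quoted in the text.

```python
import numpy as np

EPS, B0 = 0.1, 0.5   # epsilon and b_0 as quoted in the text; omega_0 = 1 assumed
Z, W = np.polynomial.hermite.hermgauss(120)  # nodes/weights for weight exp(-z^2)

def a_of_b(b, eps=EPS):
    """Real root of a + eps*a**3/32 = b (closed form with sinh/arcsinh)."""
    k = np.sqrt(2.0 / (3.0 * eps))
    return 8.0 * k * np.sinh(np.arcsinh(3.0 * b / (8.0 * k)) / 3.0)

def q_kb(t, b, eps=EPS):
    """First-order Krylov-Bogolyubov trajectory with q(0) = b."""
    a = a_of_b(b, eps)
    w = 1.0 + 0.375 * eps * a**2          # omega(a) = 1 + (3/8) a^2 eps
    return a * np.cos(w * t) + eps * a**3 / 32.0 * np.cos(3.0 * w * t)

def mean_q(t, sigma, b0=B0, eps=EPS):
    """Gaussian average over b = b0 + sqrt(sigma)*z with weight exp(-z^2)."""
    b = b0 + np.sqrt(sigma) * Z
    return np.sum(W * q_kb(t, b, eps)) / np.sqrt(np.pi)

def threshold_time(sigma, frac=0.1, dt=0.05, t_max=2000.0):
    """First grid time at which |<q> - q_KB| reaches frac * q_KB(0, b0)."""
    c = frac * q_kb(0.0, B0)
    for t in np.arange(0.0, t_max, dt):
        if abs(mean_q(t, sigma) - q_kb(t, B0)) >= c:
            return float(t)
    return None  # threshold not reached on the grid

if __name__ == "__main__":
    sigmas = np.array([0.04, 0.02, 0.01, 0.005])   # hypothetical sample values
    tcs = np.array([threshold_time(s) for s in sigmas])
    # Model t_c = a/sqrt(sigma) - b is linear in x = 1/sqrt(sigma).
    slope, intercept = np.polyfit(1.0 / np.sqrt(sigmas), tcs, 1)
    print("a =", slope, " b =", -intercept)
```

Gauss-Hermite quadrature is a natural fit here because the change of variables already puts the average into the form of an integral against $e^{-z^2}$; for much larger times or smaller dispersions the quadrature order and grid step would need to be revisited.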
Acknowledgements
This work was partially supported by grants RFBR-08-01-00727-a and RFBR-09-01-12161-ofi-m, and also by grant NSh-3224.2008.1 and by the Division of Mathematics of RAS.
References
1. I.V. Volovich, Randomness in Classical Mechanics and Quantum Mechanics, Foundations of Physics, DOI 10.1007/s10701-010-9450-2; Time Irreversibility Problem and Functional Formulation of Classical Mechanics, arXiv:0907.2445.
2. V.V. Kozlov, Gibbs ensembles and nonequilibrium statistical mechanics, Moscow-Ijevsk (in Russian), 2008.
3. I.V. Volovich, Functional mechanics and time irreversibility problem. In "Quantum Bio-Informatics III", ed. L. Accardi, W. Freudenberg, M. Ohya. World Scientific, Singapore, 2010, pp. 393-404.
4. A.S. Trushechkin, I.V. Volovich, Functional Classical Mechanics and Rational Numbers, P-Adic Numbers, Ultrametric Analysis and Applications, 1:4 (2009), 365-371; arXiv:0910.1502.
5. A.S. Trushechkin, Irreversibility and the measurement procedure in the functional mechanics, Theor. Math. Phys. 164:3 (2010), 435-440.
6. I.V. Volovich, Bogolyubov equations and functional mechanics, Theor. Math. Phys. 164:3 (2010), 354-362.
7. E.V. Piskovskiy, A study of some model systems of functional mechanics in the functional mechanics framework, Abst., The Second International Conference on Mathematical Physics and Its Applications, Samara, 2010.
8. K. Inoue, M. Ohya, I.V. Volovich, Semiclassical properties and chaos degree for the quantum baker's map, Journal of Mathematical Physics, 43, 734-755 (2002).
9. N.N. Bogolyubov, Y.A. Mitropolski, Asymptotic Methods in the Theory of Nonlinear Oscillations. Gordon and Breach, New York, 1961.
10. G. Birkhoff, S. Mac Lane, A Survey of Modern Algebra, 4th ed., Macmillan Publishing Co Inc., New York, 1977.
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 373-386)
QUANTILE-QUANTILE PLOTS: AN APPROACH FOR THE INTER-SPECIES COMPARISON OF PROMOTER ARCHITECTURE IN EUKARYOTES
KASPAR FELDMEIER, JOACHIM KILIAN, KLAUS HARTER, DIERK WANKE* AND KENNETH W. BERENDZEN*
ZMBP Pflanzenphysiologie, Universität Tübingen, Auf der Morgenstelle 1, D-72076 Tübingen, Germany
* e-mail: [email protected] * e-mail: [email protected]
Regulatory non-coding DNA is important to drive gene transcription and thereby influences mRNA and, consequently, protein abundance. Therefore, biologists and bioinformation scientists aim to extract meaningful information from these sequence regions, in particular the upstream regulatory regions called promoters, and to draw conclusions about regulatory sequence function. While some approaches have been successful for single genes or a single genome, it is an open question whether information on promoter function can readily be transferred between different species. Thus, it is useful for biologists to know more about the general structure and composition of promoters, including the occurrence of cis-regulatory DNA elements (CREs), to be able to compare promoter architecture between organisms. To approach this task, we utilized the fully sequenced genomes of the plant model organisms mouse-ear cress (Arabidopsis thaliana), western balsam poplar (Populus trichocarpa), Sorghum bicolor and rice (Oryza sativa). For the interspecies comparison we made use of quantile-quantile (QQ) plots of the variances of hexanucleotides or known functional CREs of core-promoter regions. Here, we suggest that the differences in promoter architecture correlate with the sizes of the intergenic space, i.e. the regions in which the promoters are located. In contrast, analysis of CREs is hampered by the general lack of well characterized transcription factor-CRE relationships.
Keywords: Promoter code; cis-regulatory elements (CRE); cross-species promoter analysis; promoter evolution; QQ-plot of variances.
1. Introduction
Gene expression in eukaryotes is controlled by several interwoven levels of regulation that involve both pre- and post-transcriptional processes. One level comprises the interplay of proteins, termed transcription factors (TFs), which make direct physical contact with specific short stretches of DNA called cis-regulatory DNA-motif elements (CREs) and subsequently act on the transcription machinery. These CREs are typically positioned upstream of the DNA sequence encoding the RNA transcript of a gene and follow a cryptic regulatory code 1. The interface between the transcription factors and their regulatory DNA sequence companions harbors essential information to control specific gene expression and to integrate information derived from upstream signaling cascades 2. From an evolutionary point of view, it has been noted that the same or highly similar TF-CRE relations exist in closely related species and control the expression of the same or highly similar genes. However, it still remains to be elucidated how TF-CRE relations change or are retained over longer evolutionary time. Previous studies have shown that several DNA-motifs are conserved in the regulatory promoter sequences of many eukaryote genomes, e.g. Arabidopsis, Saccharomyces, Drosophila and Caenorhabditis, and this is especially true for the core-promoter region proximal to the genes 1. To draw conclusions on promoter and CRE evolution from comparative cross-species approaches, it is important to integrate the genomic sequences of closely and more distantly related species in the same analysis. Plant genomes are especially suited for studies on promoter sequences, as most CREs are located upstream of the transcription start site (TSS), close to the gene sequences 1,3. Fortunately, genomic sequences from several plant species are publicly available and can be used for interspecies comparisons. One of the best studied eukaryote model organisms to date is the flowering plant Arabidopsis thaliana. Its physiological responses and its life cycle have been studied for more than 100 years.
With only five chromosomes, the overall complexity of its genome is low and the intergenic regions that harbor the promoters are small. Thus far, Arabidopsis has become one of the best understood eukaryote model organisms, with the best annotated genome sequence of all eukaryotes available 4. While Arabidopsis thaliana is a modern dicot plant of little agricultural importance, the evolutionarily more distantly related grasses and trees are of high economic value worldwide 5. A major drawback for genetic approaches in grasses lies in the complex genomes of cereal crops, which have multimerized and duplicated during the breeding and inbreeding processes of domestication. Rice has a small size and a relatively simple genomic organization and is favored in laboratory research, as it is the second fully sequenced eukaryote plant species 6. The monocot Sorghum bicolor, whose genome sequence is about to be finished, is more closely related to rice than it is to the dicot Arabidopsis 7. Despite their economic importance, trees have a great disadvantage for genetic studies due to their naturally long life cycle compared to cryptogams, which makes laboratory experiments tedious or not feasible 8. Nevertheless, the genomic sequence of the western balsam poplar (Populus trichocarpa) has been submitted to public databases 8 and can readily be analyzed by bioinformatics routines. Here, we conduct an interspecies promoter motif analysis to assess whether the architecture of the core-promoters and the distribution of CREs therein are evolutionarily conserved. To this end, we extracted upstream sequences of annotated genes from Arabidopsis, rice, poplar and Sorghum, which contain the proximal promoters and most essential CREs. The variances of all hexanucleotide motifs and known functional CREs were computed for these four species. For the pair-wise visualization of the interspecies differences in the promoters, we employed quantile-quantile (QQ) plots of these variances. This approach disclosed that a higher information density is contained in the promoters of those species with more compact genomes.
2. Material and Methods
2.1. Plant genome information
The genome sequences of the chromosome pseudomolecules for the plant model organisms Arabidopsis thaliana, western balsam poplar (Populus trichocarpa), Sorghum bicolor and rice (Oryza sativa) were retrieved from GenBank's Plant Genomes Central (http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html):
Arabidopsis thaliana GenBank accessions: NC_003070.9, NC_003071.7, NC_003074.8, NC_003075.7 and NC_003076.8;
Populus trichocarpa GenBank accessions: NC_008467.1, NC_008468.1, NC_008469.1, NC_008470.1, NC_008471.1, NC_008472.1, NC_008473.1, NC_008474.1, NC_008475.1, NC_008476.1, NC_008477.1, NC_008478.1, NC_008479.1, NC_008480.1, NC_008481.1, NC_008482.1, NC_008483.1, NC_008484.1 and NC_008485.1;
Oryza sativa GenBank accessions: NC_008394.1, NC_008395.1, NC_008396.1, NC_008397.1, NC_008398.1, NC_008399.1, NC_008400.1, NC_008401.1, NC_008402.1, NC_008403.1, NC_008404.1 and NC_008405.1.
The genomic sequences of Sorghum bicolor 1v4 were retrieved from the Sorghum bicolor Genome at the Plant Genome Database, PlantGDB (http://www.plantgdb.org/SbGDB/). Promoters were extracted as the 1000 bp 5' of the annotated start of each gene (without redundancy) using Motif Mapper .NET (5.1.1.39) and Python scripts (1.2) (http://www.zmbp.uni-tuebingen.de/PlantPhysiology/ResearchGroups/harter/berendzen/programs.html).
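The extraction step can be sketched in a few lines of Python. This is only an illustration, not the Motif Mapper implementation: the function names, the 0-based coordinate convention and the handling of reverse-strand genes are assumptions made for the example.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def extract_promoter(chromosome: str, gene_start: int, strand: str, length: int = 1000) -> str:
    """Return up to `length` bp located 5' of an annotated gene start.

    Assumed convention (hypothetical, not from the paper): for "+" genes
    `gene_start` is the 0-based index of the first base of the gene; for "-"
    genes it is the 0-based forward-strand index of the gene's 5' base.
    """
    if strand == "+":
        begin = max(0, gene_start - length)
        return chromosome[begin:gene_start].upper()
    if strand == "-":
        end = min(len(chromosome), gene_start + 1 + length)
        # Upstream of a reverse-strand gene lies to the right on the forward
        # strand, so the slice is reverse-complemented to read 5' to 3'.
        return reverse_complement(chromosome[gene_start + 1:end])
    raise ValueError("strand must be '+' or '-'")


# Toy chromosome with a short promoter length so the result is easy to check.
toy_chromosome = "ACGTACGTAACCGGTT"
plus_promoter = extract_promoter(toy_chromosome, gene_start=8, strand="+", length=4)
minus_promoter = extract_promoter(toy_chromosome, gene_start=7, strand="-", length=4)
```

With the default `length=1000` the same call would return the 1000 bp window used in the text, truncated at chromosome ends.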
2.2. Hexanucleotide Motifs and cis-Regulatory Elements
Known CREs were retrieved from the PLACE database (http://www.dna.affrc.go.jp/PLACE/) as has been published previously in 1. Hexamers were generated with the Motif Mapper "All Oligos" function.
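As a minimal sketch of what such an enumeration produces (the function name `all_oligos` is hypothetical and does not refer to the Motif Mapper code), all 4^6 = 4096 hexanucleotides can be generated as follows:

```python
from itertools import product


def all_oligos(k: int = 6, alphabet: str = "ACGT") -> list[str]:
    """Enumerate every DNA word of length k in lexicographic order."""
    return ["".join(letters) for letters in product(alphabet, repeat=k)]


hexamers = all_oligos(6)
```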
2.3. Variance-based Promoter Motif Analysis
To compute the variances for all DNA-motifs, frequency distribution curves were compiled with the Motif Mapper Python scripts. All frequency distribution curves were rooted to the annotated start of the transcriptional unit of the genes, pointing to the right. Subsequently, the variance for each motif was computed from its distribution curve.
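The following sketch illustrates the idea on toy data. It assumes that the distribution curve is a per-position count over equally long promoters aligned at the gene start, and that the population variance is meant; neither the binning nor the variance estimator of the original scripts is stated in the text, so both are assumptions, as are the function names.

```python
from statistics import pvariance


def positional_counts(promoters: list[str], motif: str) -> list[int]:
    """Count, for each start position, how many promoters carry `motif` there.

    All promoters are assumed to have the same length and to be aligned so
    that the annotated gene start lies at the right-hand end.
    """
    length = len(promoters[0])
    counts = [0] * (length - len(motif) + 1)
    for seq in promoters:
        if len(seq) != length:
            raise ValueError("promoters must all have the same length")
        start = seq.find(motif)
        while start != -1:  # also catches overlapping occurrences
            counts[start] += 1
            start = seq.find(motif, start + 1)
    return counts


def motif_variance(promoters: list[str], motif: str) -> float:
    """Population variance of the positional frequency distribution curve."""
    return pvariance(positional_counts(promoters, motif))


# Three toy "promoters" of 10 bp; real input would be 1000 bp sequences.
toy_promoters = ["GGTATAAAGG", "CCTATAAACC", "TATAAAGGGG"]
curve = positional_counts(toy_promoters, "TATAAA")
variance = motif_variance(toy_promoters, "TATAAA")
```

A motif that occurs evenly across all positions yields a flat curve and a variance near zero, whereas a positionally biased motif yields a peaked curve and a high variance.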
2.4. Phylogenetic Analysis
Sequence information from the Arabidopsis, Populus, Sorghum and rice (Oryza) genomes was compared by using ClustalW (http://www.ebi.ac.uk/clustalw/). Default settings were used to calculate the dendrogram files. Tree graphs were visualized using TreeView software version 1.6.6 (http://taxonomy.zoology.gla.ac.uk/rod/treeview.html). The evolutionary distance was computed on RbcL and Cytc sequences. Distance trees for the variances of DNA-motifs in the promoters were computed from the Euclidean distance of the variance vectors by using statistiXL 1.8 for MS Excel. The variances for each of the motifs and all four organisms were used as attribute data to compute a derived distance matrix according to the statistiXL descriptions. Next, a correlation coefficient was calculated to indicate how similar the final hierarchical pattern and the initial distance matrix were. A dendrogram file was provided to graphically summarize the similarity patterns.
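The distance step can be illustrated with a short Python sketch. It only shows the Euclidean distance between variance vectors; it does not reproduce statistiXL, its clustering or its correlation coefficient. The species labels and numbers below are placeholders invented for the example, not values from this study.

```python
from math import dist


def distance_matrix(vectors: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Pairwise Euclidean distances between per-species variance vectors."""
    names = sorted(vectors)
    return {(a, b): dist(vectors[a], vectors[b]) for a in names for b in names}


def closest_pair(matrix: dict[tuple[str, str], float]) -> tuple[str, str]:
    """Return the pair of distinct species with the smallest distance."""
    candidates = {pair: d for pair, d in matrix.items() if pair[0] < pair[1]}
    return min(candidates, key=candidates.get)


# Placeholder vectors: one entry per motif, one vector per species.
toy_vectors = {
    "species_W": [0.0, 0.0, 0.0],
    "species_X": [3.0, 4.0, 0.0],
    "species_Y": [3.0, 4.0, 1.0],
    "species_Z": [10.0, 0.0, 0.0],
}
matrix = distance_matrix(toy_vectors)
nearest = closest_pair(matrix)
```

In the actual analysis each vector would hold 4096 hexanucleotide variances or 426 CRE variances, and the matrix would feed a hierarchical clustering.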
2.5. Interspecies Quantile-Quantile Plots
The α-quantiles of the datasets were computed using MathWorks MATLAB routines 9. For each of the DNA-motifs in each of the organisms' promoter datasets, the α-quantiles were assessed on the basis of the variances of motif occurrence. To draw conclusions on the conservation of motifs in the distantly related plant species, the quantiles of the datasets were plotted against each other. In a QQ-plot, two variance datasets are compared by their quantiles: the same quantiles are calculated for both datasets and used as x- and y-coordinates. The α-quantiles used for QQ-plot comparisons were as follows: 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65, 0.7, 0.75, 0.8, 0.85, 0.9, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98 and 0.99.
3. Results and Discussion
3.1. DNA-motif variance as a measure of information content
Biologists and bioinformation scientists are working hand in hand to gain better insight into how regulatory non-coding DNA, i.e. the promoter sequences, is capable of mediating changes in gene expression. The approaches taken so far to compare gene regulation on a genomic scale are hampered by the varying occurrence of these small DNA-motifs between species due to different genome sizes or DNA-base composition (GC/AT-content) 1. On the other hand, the positional frequency of a motif in the promoters of one species can be graphed as a frequency distribution curve, whereby each motif has its own trajectory (Fig. 1A). Strong deviations from the mean background frequency are characteristic of disequilibria, i.e. regions within a promoter that do not follow the average distribution, and presumably stem from information states of DNA-protein evolution. It has been demonstrated that higher order information is indeed responsible for these disequilibria, since many of these motifs are DNA binding sites for proteins 1. Based on these observations, one can conclude that a positional bias of a motif in promoters from a single genome reflects functional information. A comparative approach based on DNA-motif distribution curves is complicated, as the necessary normalization procedures have a transforming effect on the curves (Fig. 1B) and distort the biological data. Thus, we used the variance of a motif's frequency distribution within the promoter as a measure of its information content. DNA-motifs with high variance have distribution-curve signatures that are distinct from the background mean frequency and, hence, harbor more information than others. By using the variance of a motif as a measure of information content, no additional normalization or correction is needed, thus preserving as much biological information as possible. This approach provides us with several observations. First, two different frequency distribution curves for two different DNA motifs with different means can still have the same variance (Fig. 1A); the variance is invariant under translation and is not altered when the plot is shifted along the y-axis. Second, normalized distribution curves display different variances with different means (Fig. 1B). As a consequence, we can also evaluate rare motifs and omit compensating for DNA-base composition between the different genomes of the organisms. We extracted the -1000 bp upstream of the annotated start sites of genes. This resulted in 33238 promoter sequences for the Arabidopsis genome, 10483 for Populus, 36250 for Sorghum and 18328 for Oryza. Next, the frequency distribution curves for all 4096 hexanucleotides and 426 known CREs were compiled and the variances were calculated.
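The two observations about the variance can be checked numerically with a small Python sketch. The curve values are arbitrary illustrative numbers, and the population variance is used as an assumption; the argument holds for the sample variance as well.

```python
from statistics import mean, pvariance

# Arbitrary toy frequency distribution curve (counts per position).
curve = [2, 4, 4, 6, 9]

# The same curve shifted along the y-axis, i.e. a more frequent motif
# with an otherwise identical positional profile.
shifted = [value + 10 for value in curve]

# Normalization by division through the mean, as in Fig. 1B.
normalized = [value / mean(curve) for value in curve]
normalized_shifted = [value / mean(shifted) for value in shifted]

var_curve = pvariance(curve)
var_shifted = pvariance(shifted)
var_normalized = pvariance(normalized)
var_normalized_shifted = pvariance(normalized_shifted)
```

Here `var_curve` and `var_shifted` coincide, while the two normalized curves differ in variance by the squared ratio of the means, which is the distortion the text attributes to normalization.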
3.2. Quantile-quantile (QQ) plots for interspecies promoter comparison
To compare the variances of the motifs of different species, we made use of QQ-plots 9. In a QQ-plot, two datasets are compared by their quantiles. The same quantiles were calculated for each dataset and used as coordinates for the QQ-plot. If, for example, two normal distributions with different variances are plotted against each other, the resulting QQ-plot is a straight line whose slope depends on the variances; if they also have different means, the line is shifted and the relation is affine. If the variances of the hexanucleotide or CRE maps show a similar distribution, the QQ-plots could have an affine shape. The slope would correlate with the variance of the variances. A higher consensus (positional bias) in one species could hint at a higher average information content in the underlying promoter sequences. First, we examine the QQ-plots for hexanucleotides for all species. Arabidopsis and Populus are both dicot species and are considered to be more closely related to each other than to the two monocots Oryza and Sorghum. When the quantiles of the variances of Arabidopsis and Populus were QQ-plotted against each other (Fig. 2A), the result was a graph with only a small slope. This can be explained by a higher information content in the promoters of Arabidopsis compared to Populus. However, there was a strong increase in the last quantile [0.99] in Populus, which can be attributed to 41 hexanucleotides with the highest variance in both organisms. Among these motifs, we identified several motifs of putative functionality, e.g. the TATA-box-like motifs 1.
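The construction of the QQ coordinates, and the statement about two normal distributions, can be sketched in Python. The quantile levels are those listed in Section 2.5; the linear-interpolation quantile rule, the function names and the toy parameters are assumptions for illustration and need not match the MATLAB routine used by the authors.

```python
from statistics import NormalDist

# The 27 alpha levels used for the QQ-plots (0.05 ... 0.9, then 0.91 ... 0.99).
ALPHAS = [i / 100 for i in range(5, 91, 5)] + [i / 100 for i in range(91, 100)]


def quantile(sorted_values: list[float], alpha: float) -> float:
    """Empirical alpha-quantile by linear interpolation between order statistics."""
    position = alpha * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


def qq_points(x_values, y_values, alphas=ALPHAS):
    """Pair the same alpha-quantiles of two datasets as (x, y) coordinates."""
    xs, ys = sorted(x_values), sorted(y_values)
    return [(quantile(xs, a), quantile(ys, a)) for a in alphas]


# Deterministic toy datasets: two normal-shaped samples built from the same
# standard-normal grid, with different means and standard deviations.
n = 1000
z_grid = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
mu_x, sigma_x = 10.0, 1.5
mu_y, sigma_y = 4.0, 3.0
x_data = [mu_x + sigma_x * z for z in z_grid]
y_data = [mu_y + sigma_y * z for z in z_grid]

points = qq_points(x_data, y_data)
slope = sigma_y / sigma_x          # ratio of standard deviations
intercept = mu_y - slope * mu_x    # non-zero offset: the relation is affine
```

All points lie on the line y = intercept + slope * x, so the slope reflects the ratio of the spreads and the offset reflects the difference in means.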
Figure 1. Assessing the variances of DNA motifs from frequency distribution curves. (A) Two different frequency distribution curves for two different DNA motifs with different means [dashed lines] can have the same variance. The variance is invariant under translation and is not altered when the plot is shifted along the y-axis. (B) Interspecies comparison is hampered by different motif frequencies. Normalized distribution curves, which were obtained by dividing each curve by its mean for comparative reasons, display different variances when two otherwise identical frequency distribution curves have different means.
[Figure 2: six QQ-plot panels (A-F); axes labelled by species, with ticks from 0 to 2500.]
Figure 2. Quantile-quantile plots of the variance of hexanucleotides in promoters. Pairwise comparison of the variance of hexanucleotide motifs in the regulatory promoter sequences of different species by using QQ-plots. The variances of all 4096 hexanucleotides were assessed from the respective frequency distribution curves for Arabidopsis, Populus, Sorghum and Oryza promoters [-1000 bp sequence upstream of the annotated start of the gene]. Shown are the plots for all six different QQ-combinations (A-F) between the four species. The bisecting line is included and indicates a slope of 1, i.e. the variances of the motifs in the two species are of the same value.
Figure 2B shows the quantiles of the variances of Arabidopsis and Oryza. Although the QQ-plot was located below the expected linear function with a slope of 1, the quantiles were highly similar, which is indicative of high similarity in the promoter architecture of these two species. Similarly, the QQ-plot of the Arabidopsis and Sorghum hexanucleotide variances (Fig. 2C) was also close to the bisecting line, but with slightly higher variances in Sorghum. Here again, the majority of the difference was found in the quantiles between 0.90 and 0.96. This cannot readily be explained by, e.g., different sizes of the promoter datasets, as both species had a similar number of annotated promoters. However, it might be possible that the Sorghum genome data or the dataset contains a certain degree of redundancy and, thus, that more highly conserved motifs with a higher variance were found, which leads to higher quantiles. Therefore, further research on the nature of these elements and on the Sorghum dataset is needed to clarify this matter.
The comparison of the hexanucleotides for Populus and Oryza (Fig. 2D) displays overall similarities and trends to the Populus and Arabidopsis comparison (Fig. 2A). Hence, the Populus promoter dataset contains unique features that account for these differences. This observation cannot be explained by differences in the sizes of the datasets, as the numbers of promoters contained in the Oryza and Populus datasets are of a similar magnitude, while the Arabidopsis dataset is larger. Nevertheless, the same trends were observed for all comparisons with the Populus promoters.
Figure 2E displays the comparison of hexanucleotide variances of Sorghum and Oryza. Interestingly, the quantiles of Sorghum were much higher than those of Oryza. As noted above, an explanation would be that the underlying promoter regions in Sorghum have a higher amount of high-consensus hexanucleotides, which probably hints at a higher number of regulatory active sequences or at redundancy within the dataset. In Figure 2F, Sorghum is compared to the Populus dataset. While Populus displayed the lowest values in its quantiles, the Sorghum dataset contains the highest quantile values of our organisms and, hence, the resulting plot has a very steep slope due to the high amount of high-consensus hexanucleotides (those with a specific positional bias) in the Sorghum promoters.
While we consider the variance in hexanucleotides a measure of overall promoter architecture and motif composition, we expected that the variances of CREs would provide us with information on conserved core-promoters and retained CRE function in gene regulation. Figure 3 gives QQ-plots of the CRE motifs for the four plant species under investigation. For comparative reasons, we also included the data of the hexanucleotide variances [in grey] from the previous analysis in the CRE-plots. This shows whether the variances of the CREs differ from the general promoter architecture. Since the CRE motifs harbor known regulatory information, we could expect them to have high variances in the promoter regions and, thus, to harbor a higher order of information.
[Figure 3: six QQ-plot panels (A-F); axes labelled by species, with ticks from 0 to 6000.]
Figure 3. Quantile-quantile plots of the variance of known cis-regulatory elements (CREs) in promoters. Pairwise comparison of the variance of known cis-regulatory elements (CREs) in the regulatory promoter sequences of different species by using QQ-plots. The variances of 426 known CREs from the PLACE (http://www.dna.affrc.go.jp/PLACE/) database were assessed from the respective frequency distribution curves for Arabidopsis, Populus, Sorghum and Oryza promoters [-1000 bp sequence upstream of the annotated start of the gene]. Shown are the plots for all six different QQ-combinations (A-F) between the four species. QQ-plots for the variances of CRE motifs are displayed in black; for comparison, the QQ-plots for the variance of the hexanucleotides shown in Figure 2 [in grey] have been incorporated into this figure. The solid line is the bisecting line and indicates a slope of 1, i.e. the variances of the motifs in the two species are of the same value.
In Figure 3A the quantiles of Populus were plotted against Arabidopsis. Similar to the hexanucleotide comparison, the quantiles of Populus were much lower than those of Arabidopsis. On the one hand, this finding could be explained by the on average smaller sizes of promoters with higher information content in Arabidopsis. On the other hand, the Populus dataset could be of poor quality owing to mis-annotation or problems in curating the genome information. Next, the quantiles of the CRE variances of Oryza were compared to Arabidopsis (Fig. 3B). The variances were of approximately the same values for the promoters of both species and, thus, close to the bisecting line. Here, the slope of the QQ-plot for the CRE variances was also similar to that of the QQ-plot of the hexanucleotides, which indicates that the amount of relative information between two species is similar for hexanucleotides and CRE motifs. Figure 3C shows the QQ-plot of the CRE variances of Sorghum and Arabidopsis. The quantiles of the variances were very similar for both species, which is indicative of similar frequency distribution curves for the CREs in the promoters of the two species. Especially for the lower quantiles, the graph follows the bisecting line. Hence, one can conclude that the motif composition in the promoters of the two species and their architectures are highly similar in general. However, this similarity was highest for the known functional CREs. Only for the quantiles 0.97 to 0.99 did the variances diverge. The simplest explanation for this observation is the presence of longer CRE consensus sequences or naturally rare motifs, which occur slightly more often in one genome than in the other and, thus, have an exorbitantly high variance over a very low mean frequency. The QQ-plots of the CRE variances of Populus against the Oryza (Fig. 3D) or Sorghum (Fig. 3F) datasets revealed a very low information content for Populus.
Again, the values in the quantiles of Populus were the smallest in the whole dataset and, therefore, Populus also displayed a low CRE consensus. The same trend was seen for the Sorghum promoters. In Figure 3E, the quantiles of CRE variance in Sorghum were plotted against the Oryza dataset; these quantiles displayed similarities for both species and, thus, the distribution of the known functional elements in the promoters likely follows a similar trend in frequency. Since the quantiles of Sorghum were also similar to those of Arabidopsis (Fig. 3C), we could conclude that CRE distribution in the promoters might indeed be conserved between the three species. Moreover, these similarities could be detected with QQ-plots of the variance as a measure, irrespective of the different dataset sizes or DNA-base composition (GC/AT-content).
Figure 4. Distance matrix tree graphs for evolutionary distance of the species and Euclidean distance of the DNA-motifs. Tree graphs display the relative relatedness of the organisms on the basis of their protein sequence similarities (A) or the similarities between the DNA-motif variances of all hexanucleotides (B) and known CREs (C). Similarities and dissimilarities between the tree topologies in A-C are highlighted by lines that interconnect the four organisms between the graphs.
Our presumption was that information content within promoters is retained as positional disequilibria in promoter sequences when observed at a genomic scale. As a simple but effective measurement, we took the variance of a DNA-motif's frequency distribution to indicate the degree of information a motif carries, and compared these variances between species to capture global similarities in promoter architecture. To draw conclusions from these interspecies comparisons, we established distance matrices for protein sequence similarity and for the variance of hexanucleotides or CREs; these results were displayed as tree graphs (Fig. 4). The last common ancestor of dicot (Arabidopsis and Populus) and monocot (Sorghum and Oryza) species lived approx. 140 million years before present 10. The phylogenetic tree in Figure 4A captures this evolutionary time and serves as our reference. The Oryza dataset appears to be more closely related to Arabidopsis than to its evolutionarily close relative Sorghum when looking at the hexanucleotide variances (Fig. 4B). One possible explanation for this similarity in hexanucleotide composition is that both species have relatively small, compressed genomes 11. As a consequence, the intergenic space, i.e. the DNA-region between the genes, is relatively small and, as such, more information must be packed into shorter promoter regions 3. The Arabidopsis thaliana genome is one of the smallest eukaryote genomes 4 and, thus, regulatory promoter sequences must be short and enriched for motifs with a high order of information. This notion might be true for Oryza as well, as its genome size is one of the smallest amongst the Poaceae crop plants. The CRE variance is shown in Figure 4C. Here, the overall tree topology follows our reference tree (Fig. 4A), with the exception of the Populus
dataset, which could likely be considered an outlier. The reasons for the aberrant Populus set might be manifold, but most likely originate from a low quality in the annotation of the gene starts.
4. Conclusions
We have shown that promoter architecture is strongly dependent on the sizes of intergenic regions. The observation that the CRE variances for Arabidopsis, Oryza and Sorghum were highly similar supports the idea that the changes in protein sequence and in cis-regulatory DNA over time might be proportional. Thus, known functional elements display the same evolutionary concepts as are considered for genome or protein evolution. To further support and refine our findings, well annotated genome information of more species is needed and has to be integrated into our analysis. The sequence information of species with different genome sizes and larger evolutionary distances would be of special interest. Moreover, the analysis of the CRE variances is hampered by too little information on transcription factor (TF)-CRE relationships. Hence, our approach is well worth repeating with updated information on both TF-CRE relationships and a larger number of promoter datasets.
Acknowledgements
We thank Jochen Supper for his constant input and critical discussions. This work was supported by the DFG (HA2146/11-1).
References
1. K. W. Berendzen, K. Stueber, K. Harter and D. Wanke, BMC Bioinformatics 7 (Nov 30, 2006).
2. J. L. Riechmann, J. Heard, G. Martin, L. Reuber, C. Jiang, J. Keddie, L. Adam, O. Pineda, O. J. Ratcliffe, R. R. Samaha, R. Creelman, M. Pilgrim, P. Broun, J. Z. Zhang, D. Ghandehari, B. K. Sherman and G. Yu, Science 290, 2105 (Dec 2000).
3. D. Walther, R. Brunnemann and J. Selbig, PLoS Genet 3, p. e11 (Feb 2007).
4. Arabidopsis Genome Initiative, Nature 408, 796 (Dec 2000).
5. Y. Yamazaki and P. Jaiswal, Plant Cell Physiol 46, 63 (Jan 2005).
6. International Rice Genome Sequencing Project, Nature 436, 793 (Aug 2005).
7. X. Wang, G. Haberer and K. F. X. Mayer, BMC Genomics 10, p. 284 (2009).
8. G. A. Tuskan, S. Difazio, S. Jansson, J. Bohlmann, I. Grigoriev, U. Hellsten, N. Putnam, S. Ralph, S. Rombauts, A. Salamov, J. Schein, L. Sterck, A. Aerts, R. R. Bhalerao, R. P. Bhalerao, D. Blaudez, W. Boerjan, A. Brun, A. Brunner, V. Busov, M. Campbell, J. Carlson, M. Chalot, J. Chapman, G.-L. Chen, D. Cooper, P. M. Coutinho, J. Couturier, S. Covert, Q. Cronk, R. Cunningham, J. Davis, S. Degroeve, A. Dejardin, C. Depamphilis, J. Detter, B. Dirks, I. Dubchak, S. Duplessis, J. Ehlting, B. Ellis, K. Gendler, D. Goodstein, M. Gribskov, J. Grimwood, A. Groover, L. Gunter, B. Hamberger, B. Heinze, Y. Helariutta, B. Henrissat, D. Holligan, R. Holt, W. Huang, N. Islam-Faridi, S. Jones, M. Jones-Rhoades, R. Jorgensen, C. Joshi, J. Kangasjarvi, J. Karlsson, C. Kelleher, R. Kirkpatrick, M. Kirst, A. Kohler, U. Kalluri, F. Larimer, J. Leebens-Mack, J.-C. Leple, P. Locascio, Y. Lou, S. Lucas, F. Martin, B. Montanini, C. Napoli, D. R. Nelson, C. Nelson, K. Nieminen, O. Nilsson, V. Pereda, G. Peter, R. Philippe, G. Pilate, A. Poliakov, J. Razumovskaya, P. Richardson, C. Rinaldi, K. Ritland, P. Rouze, D. Ryaboy, J. Schmutz, J. Schrader, B. Segerman, H. Shin, A. Siddiqui, F. Sterky, A. Terry, C.-J. Tsai, E. Uberbacher, P. Unneberg, J. Vahala, K. Wall, S. Wessler, G. Yang, T. Yin, C. Douglas, M. Marra, G. Sandberg, Y. Van de Peer and D. Rokhsar, Science 313, 1596 (Sep 2006).
9. Y.-Y. Ho, L. Cope, M. Dettling and G. Parmigiani, Methods Mol Biol 408, 171 (2007).
10. S.-M. Chaw, C.-C. Chang, H.-L. Chen and W.-H. Li, J Mol Evol 58, 424 (Apr 2004).
11. M. Spannagl, O. Noubibou, D. Haase, L. Yang, H. Gundlach, T. Hindemitt, K. Klee, G. Haberer, H. Schoof and K. F. X. Mayer, Nucleic Acids Res 35, D834 (Jan 2007).
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 387-401)
ENTROPY TYPE COMPLEXITIES IN QUANTUM DYNAMICAL PROCESSES
NOBORU WATANABE Department of Information Sciences, Tokyo University of Science, Noda City, Chiba 278-8510, Japan E-mail: [email protected]
In the study of various systems, we are interested in examining (1) the dynamics of state change and (2) the complexity of states. In 1991, Ohya introduced (ref. [21]) a general concept called Information Dynamics (ID), which builds a theory by synthesizing the investigations of (1) and (2) within a single framework. There are two kinds of complexity in ID: one is the complexity of a state describing the system itself, and the other is the transmitted complexity between two systems. The entropies of classical and quantum systems are examples of these complexities. In order to treat the flow of a dynamical process, dynamical entropies were introduced for not only classical but also quantum systems. The main purpose of this paper is to compare these mean entropies with the complexities in ID; we calculate the mean entropies for some simple models to discuss the complexity of information transmission for OOK and PSK modulations.
1. Introduction
In the study of complicated systems, it is important to examine (1) the dynamics of state change and (2) the complexity of the states of systems. Information Dynamics (ID), introduced by Ohya, is a general concept that constructs a theory by synthesizing the research schemes of (1) and (2) within a single framework. In ID, there are two types of complexity, namely (a) the complexity of a state describing the system itself and (b) the transmitted complexity between two systems. The entropies of classical and quantum information theory are examples of the complexities (a) and (b). In quantum information theory, Ohya introduced the compound state and defined the Ohya mutual entropy 18 based on the quantum relative entropy of Umegaki 31 in 1983, and he extended it 19 to general quantum systems by using the relative entropy of Araki 5 and Uhlmann 32. Based on the quantum mutual entropy, the quantum capacity is discussed in 24,25,28. One can discuss the coding theorems by means of the mean entropy and the mean mutual entropy defined by the dynamical entropy. The KS entropy 14 was introduced for classical systems. Several quantum dynamical entropies were studied by Emch 10, Connes-Størmer 9, Connes, Narnhofer and Thirring 8, Park 29, Alicki and Fannes 4, Hudetz 12, Ohya 21, Voiculescu 33, Accardi, Ohya and Watanabe 2,3, Kossakowski, Ohya and Watanabe 15,34, Ohya and Petz 23, Choda 7, Benatti 6, Muraki and Ohya 16 and so on. In this paper, we briefly explain the mean entropy and the mean mutual entropy defined by Ohya 21. We calculate these mean entropies for some simple models to discuss the complexity of information transmission for subsets of the initial state space.
2. Quantum Channels
The concept of a channel has played an important role in the progress of quantum communication theory. Here we briefly review the notion of quantum channels. Let $\mathcal{H}_k$ ($k = 1,2$) be complex separable Hilbert spaces. We denote the set of all bounded linear operators on $\mathcal{H}_k$ by $B(\mathcal{H}_k)$ and the set of all density operators on $\mathcal{H}_k$ by $\mathfrak{S}(\mathcal{H}_k)$ ($k = 1,2$). Let $(B(\mathcal{H}_k), \mathfrak{S}(\mathcal{H}_k))$ ($k = 1,2$) be the input ($k = 1$) and output ($k = 2$) quantum systems, respectively.
(1) A mapping $\Lambda^*$ from $\mathfrak{S}(\mathcal{H}_1)$ to $\mathfrak{S}(\mathcal{H}_2)$ is called a quantum channel.
(2) $\Lambda^*$ is called a linear channel if it satisfies the affine property $\Lambda^*(\sum_k \lambda_k \rho_k) = \sum_k \lambda_k \Lambda^*(\rho_k)$ for any $\rho_k \in \mathfrak{S}(\mathcal{H}_1)$ and any nonnegative numbers $\lambda_k \in [0,1]$ with $\sum_k \lambda_k = 1$.
For the quantum channel $\Lambda^*$, the dual map $\Lambda$ of $\Lambda^*$ is defined by

$$\mathrm{tr}\,\Lambda^*(\rho) B = \mathrm{tr}\,\rho\,\Lambda(B), \quad \forall \rho \in \mathfrak{S}(\mathcal{H}_1),\ \forall B \in B(\mathcal{H}_2).$$

(3) $\Lambda^*$ is called a completely positive (CP) channel if $\Lambda^*$ is a linear channel and its dual map $\Lambda : B(\mathcal{H}_2) \to B(\mathcal{H}_1)$ satisfies

$$\Big\langle x, \sum_{i,j=1}^{n} A_i^* \Lambda(\bar{A}_i^* \bar{A}_j) A_j x \Big\rangle \geq 0 \quad (\forall x \in \mathcal{H}_1)$$

for any $n \in \mathbb{N}$, any $\{\bar{A}_i\} \subset B(\mathcal{H}_2)$ and any $\{A_i\} \subset B(\mathcal{H}_1)$.
Almost all physical transformations of states can be described by CP channels [18,19,13,23,27].
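The duality relation between $\Lambda^*$ and $\Lambda$ can be checked numerically on a small example. The sketch below is only an illustration: it assumes a channel written in Kraus form, $\Lambda^*(\rho) = \sum_k K_k \rho K_k^*$, and uses a qubit amplitude-damping channel with decay parameter `g`; neither the Kraus form nor this particular channel is taken from the text above.

```python
import numpy as np

def channel(rho, kraus):
    """State picture: Lambda*(rho) = sum_k K_k rho K_k^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus)

def dual(B, kraus):
    """Observable picture: Lambda(B) = sum_k K_k^dagger B K_k."""
    return sum(K.conj().T @ B @ K for K in kraus)

# Hypothetical example channel: qubit amplitude damping (assumption).
g = 0.3
kraus = [
    np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex),
    np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex),
]

rho = np.array([[0.25, 0.1 + 0.2j], [0.1 - 0.2j, 0.75]])  # density operator
B = np.array([[1.0, 2 - 1j], [2 + 1j, -0.5]])               # bounded observable

out = channel(rho, kraus)
lhs = np.trace(out @ B)               # tr Lambda*(rho) B
rhs = np.trace(rho @ dual(B, kraus))  # tr rho Lambda(B)
unit_image = dual(np.eye(2), kraus)   # a unital dual map sends I to I
```

With a trace-preserving set of Kraus operators, `lhs` and `rhs` agree and the dual map is unital, which is the property used for the c.p. unital maps later in the paper.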
2.1. Noisy optical channel
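Now we explain the noisy optical channel as an example of a quantum communication channel [19,26]. Let $\mathcal{K}_1$ and $\mathcal{K}_2$ be complex separable Hilbert spaces of the noise and loss systems, respectively. For an input state $\rho$ in $\mathfrak{S}(\mathcal{H}_1)$ and a noise state $\xi \in \mathfrak{S}(\mathcal{K}_1)$, we defined in [26] a mapping $\Pi^*$ by

$$\Pi^*(\rho \otimes \xi) \equiv V(\rho \otimes \xi)V^*,$$

which is called a generalized beam splitting, where $V$ is a linear mapping from $\mathcal{H}_1 \otimes \mathcal{K}_1$ to $\mathcal{H}_2 \otimes \mathcal{K}_2$ given by

$$V(|n_1\rangle \otimes |m_1\rangle) = \sum_{j=0}^{n_1+m_1} C_j^{n_1,m_1}\, |j\rangle \otimes |n_1 + m_1 - j\rangle$$

for the $n_1$, $m_1$, $j$, $(n_1 + m_1 - j)$ photon number state vectors $|n_1\rangle \in \mathcal{H}_1$, $|m_1\rangle \in \mathcal{K}_1$, $|j\rangle \in \mathcal{H}_2$, $|n_1 + m_1 - j\rangle \in \mathcal{K}_2$, and

$$C_j^{n_1,m_1} = \sum_{r=L}^{K} (-1)^{n_1+j-r}\, \frac{\sqrt{n_1!\, m_1!\, j!\, (n_1+m_1-j)!}}{r!\,(n_1-j)!\,(j-r)!\,(m_1-j+r)!}\, \alpha^{m_1-j+2r} (-\bar{\beta})^{n_1+j-2r}. \qquad (1)$$

$\alpha$ and $\beta$ are complex numbers satisfying $|\alpha|^2 + |\beta|^2 = 1$. $K$ and $L$ are constants given by $K = \min\{n_1, j\}$, $L = \max\{m_1 - j, 0\}$. For the coherent input state $\rho \otimes \xi = |\theta\rangle\langle\theta| \otimes |\kappa\rangle\langle\kappa| \in \mathfrak{S}(\mathcal{H}_1 \otimes \mathcal{K}_1)$, the output state of $\Pi^*$ is obtained by

$$\Pi^*(|\theta\rangle\langle\theta| \otimes |\kappa\rangle\langle\kappa|) = |\alpha\theta + \beta\kappa\rangle\langle\alpha\theta + \beta\kappa| \otimes |-\bar{\beta}\theta + \bar{\alpha}\kappa\rangle\langle-\bar{\beta}\theta + \bar{\alpha}\kappa|.$$

$\Pi^*$ with the vacuum noise state $\xi_0 = |0\rangle\langle 0|$ is called the beam splitter $\Pi_0^*$, given by

$$\Pi_0^*(|\theta\rangle\langle\theta| \otimes \xi_0) = |\alpha\theta\rangle\langle\alpha\theta| \otimes |-\bar{\beta}\theta\rangle\langle-\bar{\beta}\theta|$$

for the coherent input state $\rho \otimes \xi_0 = |\theta\rangle\langle\theta| \otimes |0\rangle\langle 0| \in \mathfrak{S}(\mathcal{H}_1 \otimes \mathcal{K}_1)$. $\Pi_0^*$ was described by means of the lifting $\mathcal{E}_0^*$ from $\mathfrak{S}(\mathcal{H})$ to $\mathfrak{S}(\mathcal{H} \otimes \mathcal{K})$ in the sense of Accardi and Ohya [1] as follows:

$$\mathcal{E}_0^*(|\theta\rangle\langle\theta|) = |\alpha\theta\rangle\langle\alpha\theta| \otimes |\beta\theta\rangle\langle\beta\theta|.$$

Based on liftings, the beam splitting was studied by Accardi-Ohya and Fichtner-Freudenberg-Liebscher [11]. The noisy quantum channel $\Lambda^*$ introduced in [26] with a fixed noise state $\xi \in \mathfrak{S}(\mathcal{K}_1)$ was defined by

$$\Lambda^*(\rho) \equiv \mathrm{tr}_{\mathcal{K}_2} \Pi^*(\rho \otimes \xi) = \mathrm{tr}_{\mathcal{K}_2} V(\rho \otimes \xi)V^*. \qquad (2)$$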
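Moreover, the noisy quantum channel with the vacuum noise state $|0\rangle\langle 0|$ is called the attenuation channel, given by Ohya [19] as

$$\Lambda_0^*(\rho) \equiv \mathrm{tr}_{\mathcal{K}_2} \Pi_0^*(\rho \otimes \xi_0) = \mathrm{tr}_{\mathcal{K}_2} V(\rho \otimes |0\rangle\langle 0|)V^*. \qquad (3)$$

These channels are important for discussing quantum communication processes.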
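For coherent inputs, the generalized beam splitting acts on the amplitudes alone, so its effect can be followed with a few complex numbers. The following minimal sketch assumes the amplitude map $(\theta, \kappa) \mapsto (\alpha\theta + \beta\kappa,\ -\bar{\beta}\theta + \bar{\alpha}\kappa)$ read off from the output state above; the function name and the numerical values are illustrative choices, not part of the original model description.

```python
import cmath
import math

def beam_split(theta, kappa, alpha, beta):
    """Coherent amplitudes of the signal and loss outputs of the beam splitting."""
    if abs(abs(alpha) ** 2 + abs(beta) ** 2 - 1) > 1e-12:
        raise ValueError("alpha and beta must satisfy |alpha|^2 + |beta|^2 = 1")
    signal = alpha * theta + beta * kappa
    loss = -beta.conjugate() * theta + alpha.conjugate() * kappa
    return signal, loss

# Illustrative parameters (assumptions): transmissivity 0.7, arbitrary phases.
alpha = complex(math.sqrt(0.7))
beta = math.sqrt(0.3) * cmath.exp(0.4j)
theta = 1.5 + 0.5j    # input signal amplitude
kappa = -0.2 + 0.8j   # noise amplitude

signal, loss = beam_split(theta, kappa, alpha, beta)
energy_in = abs(theta) ** 2 + abs(kappa) ** 2
energy_out = abs(signal) ** 2 + abs(loss) ** 2

# Vacuum noise (kappa = 0) gives the attenuation channel: theta -> alpha * theta.
attenuated, _ = beam_split(theta, 0j, alpha, beta)
```

The mean photon number is conserved by the splitting, and with vacuum noise the signal output is simply the input amplitude scaled by $\alpha$, which is why $\Lambda_0^*$ is called the attenuation channel.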
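3. Complexities
In ID [22], two kinds of complexities, $C^{\mathcal{S}}(\rho)$ and $T^{\mathcal{S}}(\rho; \Lambda^*)$, are used for studying complex systems. $C^{\mathcal{S}}(\rho)$ is the complexity of a state $\rho$ measured from a subset $\mathcal{S}$, and $T^{\mathcal{S}}(\rho; \Lambda^*)$ is the transmitted complexity associated with the state change from $\rho$ to $\Lambda^*\rho$. These complexities should satisfy the following conditions. Let $\mathcal{S}$, $\bar{\mathcal{S}}$, $\mathcal{S}_t$ be subsets of $\mathfrak{S}(\mathcal{H}_1)$, $\mathfrak{S}(\mathcal{H}_2)$, $\mathfrak{S}(\mathcal{H}_1 \otimes \mathcal{H}_2)$, respectively.
(1) For any $\rho \in \mathcal{S}$, $C^{\mathcal{S}}(\rho)$ and $T^{\mathcal{S}}(\rho; \Lambda^*)$ are nonnegative.
(2) For a bijection $j$ from $\mathrm{ex}\,\mathfrak{S}(\mathcal{H}_1)$ to $\mathrm{ex}\,\mathfrak{S}(\mathcal{H}_1)$, $C^{\mathcal{S}}(\rho)$ is equal to $C^{\mathcal{S}}(j(\rho))$, where $\mathrm{ex}\,\mathfrak{S}(\mathcal{H}_1)$ is the set of extremal points of $\mathfrak{S}(\mathcal{H}_1)$.
(3) For $\rho \otimes \sigma \in \mathfrak{S}(\mathcal{H}_1 \otimes \mathcal{H}_2)$, $\rho \in \mathfrak{S}(\mathcal{H}_1)$, $\sigma \in \mathfrak{S}(\mathcal{H}_2)$, the complexity $C^{\mathcal{S}_t}(\rho \otimes \sigma)$ of the state $\rho \otimes \sigma$ of totally independent systems is equal to the sum $C^{\mathcal{S}}(\rho) + C^{\bar{\mathcal{S}}}(\sigma)$ of the complexities of the states $\rho$ and $\sigma$.
(4) The transmitted complexity is bounded by the complexity of the state: $0 \leq T^{\mathcal{S}}(\rho; \Lambda^*) \leq C^{\mathcal{S}}(\rho)$.
(5) If the channel $\Lambda^*$ is the identity map $\mathrm{id}$, then $T^{\mathcal{S}}(\rho; \mathrm{id})$ is equal to $C^{\mathcal{S}}(\rho)$.
Examples of the above complexities are the Shannon entropy $S(p)$ for $C^{\mathcal{S}}(p)$ and the classical mutual entropy $I(p; \Lambda^*)$ for $T^{\mathcal{S}}(p; \Lambda^*)$. Let us consider these complexities for quantum systems.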
3.1. Example of Complexity $C^{\mathcal{S}}(\rho)$
3.1.1. (1) von Neumann entropy
One example of the complexity $C^{\mathcal{S}}(\rho)$ of ID in quantum systems is the von Neumann entropy [17] $S(\rho)$, described by

$$C^{\mathcal{S}}(\rho) \Leftrightarrow S(\rho) = -\mathrm{tr}\,\rho \log \rho$$

for any density operator $\rho \in \mathfrak{S}(\mathcal{H}_1)$, which satisfies the above conditions (1), (2), (3).
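Conditions (1) and (3) are easy to check numerically for the von Neumann entropy in finite dimensions. The sketch below is only illustrative: the matrices are arbitrary examples and the helper name `von_neumann_entropy` is an assumption of this sketch.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -tr rho log rho, computed from the eigenvalues (0 log 0 = 0)."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))

rho = np.array([[0.7, 0.2], [0.2, 0.3]])   # example state on H_1
sigma = np.diag([0.5, 0.25, 0.25])          # example state on H_2
pure = np.array([[1.0, 0.0], [0.0, 0.0]])   # a pure (extremal) state

s_rho = von_neumann_entropy(rho)
s_sigma = von_neumann_entropy(sigma)
s_product = von_neumann_entropy(np.kron(rho, sigma))  # totally independent systems
s_pure = von_neumann_entropy(pure)
```

Here `s_product` equals `s_rho + s_sigma` up to rounding (additivity, condition (3)), and the entropy of the pure state vanishes.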
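3.1.2. (2) S-mixing entropy
Let $(\mathcal{A}, \mathfrak{S}(\mathcal{A}), \alpha(G))$ be a C*-dynamical system and $\mathcal{S}$ be a weak* compact and convex subset of $\mathfrak{S}(\mathcal{A})$. For example, $\mathcal{S}$ is given by $\mathfrak{S}(\mathcal{A})$ (the set of all states on $\mathcal{A}$), $I(\alpha)$ (the set of all invariant states for $\alpha$), $K(\alpha)$ (the set of all KMS states), and so on. Every state $\varphi \in \mathcal{S}$ has a maximal measure $\mu$ pseudosupported on $\mathrm{ex}\,\mathcal{S}$ such that

$$\varphi = \int_{\mathcal{S}} \omega\, d\mu, \qquad (4)$$

where $\mathrm{ex}\,\mathcal{S}$ is the set of all extreme points of $\mathcal{S}$. The measure $\mu$ giving the above decomposition is not unique unless $\mathcal{S}$ is a Choquet simplex. We denote the set of all such measures by $M_\varphi(\mathcal{S})$, and define

$$D_\varphi(\mathcal{S}) = \Big\{ \mu \in M_\varphi(\mathcal{S});\ \exists \{\mu_k\} \subset \mathbb{R}^+ \text{ and } \{\varphi_k\} \subset \mathrm{ex}\,\mathcal{S} \text{ s.t. } \sum_k \mu_k = 1,\ \mu = \sum_k \mu_k \delta(\varphi_k) \Big\}, \qquad (5)$$

where $\delta(\varphi)$ is the Dirac measure concentrated on an initial state $\varphi$. For a measure $\mu \in D_\varphi(\mathcal{S})$, we put

$$H(\mu) = -\sum_k \mu_k \log \mu_k. \qquad (6)$$

The C*-entropy of a state $\varphi \in \mathcal{S}$ with respect to $\mathcal{S}$ (S-mixing entropy) is defined in [21] by

$$S^{\mathcal{S}}(\varphi) = \begin{cases} \inf\{H(\mu);\ \mu \in D_\varphi(\mathcal{S})\} \\ +\infty \quad \text{if } D_\varphi(\mathcal{S}) = \emptyset. \end{cases} \qquad (7)$$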
It describes the amount of information of the state $\varphi$ measured from the subsystem $\mathcal{S}$. We denote $S^{\mathfrak{S}(\mathcal{A})}(\varphi)$ by $S(\varphi)$ if $\mathcal{S} = \mathfrak{S}(\mathcal{A})$. It is an extension of von Neumann's entropy. This entropy (mixing S-entropy) of a general state $\varphi$ satisfies the following properties [21].

Theorem 3.1. When $\mathcal{A} = B(\mathcal{H})$ and $\alpha_t = \mathrm{Ad}(U_t)$ (i.e., $\alpha_t(A) = U_t^* A U_t$ for any $A \in \mathcal{A}$) with a unitary operator $U_t$, for any state $\varphi$ given by $\varphi(\cdot) = \mathrm{tr}\,\rho\,\cdot$ with a density operator $\rho$, the following facts hold:
(1) $S(\varphi) = -\mathrm{tr}\,\rho \log \rho$.
(2) If $\varphi$ is an $\alpha$-invariant faithful state and every eigenvalue of $\rho$ is non-degenerate, then $S^{I(\alpha)}(\varphi) = S(\varphi)$, where $I(\alpha)$ is the set of all $\alpha$-invariant faithful states.
(3) If $\varphi \in K(\alpha)$, then $S^{K(\alpha)}(\varphi) = 0$, where $K(\alpha)$ is the set of all KMS states.

Theorem 3.2. For any $\varphi \in K(\alpha)$, one can obtain
(1) $S^{K(\alpha)}(\varphi) \leq S^{I(\alpha)}(\varphi)$.
(2) $S^{K(\alpha)}(\varphi) \leq S(\varphi)$.

3.2. Example of Transmitted Complexity $T^{\mathcal{S}}(\rho; \Lambda^*)$
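3.2.1. (1) Ohya mutual entropy for density operator
An example of the transmitted complexity $T^{\mathcal{S}}(\rho; \Lambda^*)$ of ID in quantum systems is the Ohya mutual entropy with respect to the initial state $\rho$ and the quantum channel $\Lambda^*$, defined in [19] by

$$T^{\mathcal{S}}(\rho; \Lambda^*) \Leftrightarrow I(\rho; \Lambda^*) \equiv \sup\Big\{ S(\sigma_E, \rho \otimes \Lambda^*\rho);\ \rho = \sum_n \lambda_n E_n \Big\}, \qquad (8)$$

where $\sigma_E$ is the compound state given by $\sigma_E = \sum_n \lambda_n E_n \otimes \Lambda^* E_n$ associated with the Schatten-von Neumann (one-dimensional spectral) decomposition [30] $\rho = \sum_n \lambda_n E_n$ of the input state $\rho$, and $S(\cdot, \cdot)$ is Umegaki's relative entropy given by

$$S(\rho, \sigma) = \begin{cases} \mathrm{tr}\,\rho(\log \rho - \log \sigma) & (\text{when } \mathrm{ran}\,\rho \subset \mathrm{ran}\,\sigma) \\ \infty & (\text{otherwise}), \end{cases} \qquad (9)$$

which was extended to more general quantum systems by Araki and Uhlmann [5,20,23,32]. The Ohya mutual entropy satisfies the above condition (4):

$$0 \leq I(\rho; \Lambda^*) \leq S(\rho).$$

3.2.2. (2) Ohya mutual entropy for general C*-system
Let $(\mathcal{A}, \mathfrak{S}(\mathcal{A}), \alpha(G))$ be a unital C*-system and $\mathcal{S}$ be a weak* compact convex subset of $\mathfrak{S}(\mathcal{A})$. For an initial state $\varphi \in \mathcal{S}$ and a channel $\Lambda^* : \mathfrak{S}(\mathcal{A}) \to \mathfrak{S}(\mathcal{B})$, two compound states are

$$\Phi_\mu^{\mathcal{S}} = \int_{\mathcal{S}} \omega \otimes \Lambda^*\omega\, d\mu, \qquad (10)$$

$$\Phi_0 = \varphi \otimes \Lambda^*\varphi. \qquad (11)$$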
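The compound state $\Phi_\mu^{\mathcal{S}}$ expresses the correlation between the input state $\varphi$ and the output state $\Lambda^*\varphi$. The mutual entropy [21] with respect to $\mathcal{S}$ and $\mu$ is given by

$$I_\mu^{\mathcal{S}}(\varphi; \Lambda^*) = S(\Phi_\mu^{\mathcal{S}}, \Phi_0) \qquad (12)$$

and the mutual entropy with respect to $\mathcal{S}$ is defined by Ohya [21] as

$$I^{\mathcal{S}}(\varphi; \Lambda^*) = \sup\{ I_\mu^{\mathcal{S}}(\varphi; \Lambda^*);\ \mu \in M_\varphi(\mathcal{S}) \}. \qquad (13)$$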
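For density operators, the mutual entropy of Section 3.2.1 can be evaluated directly in small dimensions. The sketch below is an illustration under stated assumptions: it takes a qubit state with non-degenerate eigenvalues (so the Schatten decomposition is unique and no supremum is needed), uses a depolarizing channel with parameter `q` as a hypothetical example channel, and assumes that the second argument of the relative entropy has full rank.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy -tr rho log rho."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log(p)))

def relative_entropy(rho, sigma):
    """Umegaki S(rho, sigma) = tr rho (log rho - log sigma); sigma assumed full rank."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    w, v = np.linalg.eigh(sigma)
    log_sigma = (v * np.log(w)) @ v.conj().T
    return float(np.sum(p * np.log(p)) - np.trace(rho @ log_sigma).real)

def mutual_entropy(rho, channel):
    """S(sigma_E, rho (x) Lambda* rho) for the spectral decomposition of rho."""
    lam, vec = np.linalg.eigh(rho)
    sigma_e = np.zeros((rho.shape[0] ** 2,) * 2, dtype=complex)
    for l, v in zip(lam, vec.T):
        proj = np.outer(v, v.conj())                # one-dimensional projection E_n
        sigma_e += l * np.kron(proj, channel(proj))  # lambda_n E_n (x) Lambda* E_n
    return relative_entropy(sigma_e, np.kron(rho, channel(rho)))

def identity(rho):
    return rho

def depolarize(rho, q=0.4):
    """Hypothetical example channel: mixes rho with the maximally mixed state."""
    d = rho.shape[0]
    return (1 - q) * rho + q * np.eye(d) / d

rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)
s_rho = entropy(rho)
i_id = mutual_entropy(rho, identity)
i_dep = mutual_entropy(rho, depolarize)
```

In this example the identity channel transmits the full complexity, `i_id` equals `s_rho`, in line with condition (5), while the noisy channel gives a value between 0 and `s_rho`, in line with condition (4).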
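4. Quantum Mean Mutual Entropy of K-S type
In this section, we briefly review the quantum mean entropy and the quantum mean mutual entropy introduced by Ohya [21]. A stationary information source in quantum information theory is described by a C*-triple $(\mathcal{A}, \mathfrak{S}(\mathcal{A}), \theta_{\mathcal{A}})$ with a stationary state $\varphi$ with respect to $\theta_{\mathcal{A}}$; that is, $\mathcal{A}$ is a unital C*-algebra, $\mathfrak{S}(\mathcal{A})$ is the set of all states on $\mathcal{A}$, $\theta_{\mathcal{A}}$ is an automorphism of $\mathcal{A}$, and $\varphi \in \mathfrak{S}(\mathcal{A})$ is a state over $\mathcal{A}$ with

$$\varphi \circ \theta_{\mathcal{A}} = \varphi.$$

Let an output C*-dynamical system be the triple $(\mathcal{B}, \mathfrak{S}(\mathcal{B}), \theta_{\mathcal{B}})$, and $\Lambda^* : \mathfrak{S}(\mathcal{A}) \to \mathfrak{S}(\mathcal{B})$ be a covariant c.p. channel: $\Lambda : \mathcal{B} \to \mathcal{A}$ such that $\Lambda \circ \theta_{\mathcal{B}} = \theta_{\mathcal{A}} \circ \Lambda$. In this section we explain the functionals $S_\mu^{\mathcal{S}}(\varphi; \alpha^M)$, $S^{\mathcal{S}}(\varphi; \alpha^M)$, $I_\mu^{\mathcal{S}}(\varphi; \alpha^M, \beta^N)$ and $I^{\mathcal{S}}(\varphi; \alpha^M, \beta^N)$ introduced in [21,16] for a pair of finite sequences $\alpha^M = (\alpha_1, \alpha_2, \ldots, \alpha_M)$, $\beta^N = (\beta_1, \beta_2, \ldots, \beta_N)$ of completely positive unital maps $\alpha_m : \mathcal{A}_m \to \mathcal{A}$, $\beta_n : \mathcal{B}_n \to \mathcal{B}$, where $\mathcal{A}_m$ and $\mathcal{B}_n$ ($m = 1, \ldots, M$, $n = 1, \ldots, N$) are finite dimensional unital C*-algebras. For a given finite sequence of completely positive unital maps $\alpha_m : \mathcal{A}_m \to \mathcal{A}$ from finite dimensional unital C*-algebras $\mathcal{A}_m$ ($m = 1, \ldots, M$) and a given measure $\mu \in M_\varphi(\mathcal{S})$ of $\varphi$, the compound state of $\alpha_1^*\varphi, \alpha_2^*\varphi, \ldots, \alpha_M^*\varphi$ on the tensor product algebra $\bigotimes_{m=1}^{M} \mathcal{A}_m$ is given by [21,16]

$$\Phi_\mu^{\mathcal{S}}(\alpha^M) = \int_{\mathfrak{S}(\mathcal{A})} \bigotimes_{m=1}^{M} \alpha_m^*\omega\, d\mu(\omega). \qquad (14)$$

Furthermore, $\Phi_\mu^{\mathcal{S}}(\alpha^M \cup \beta^N)$ is a compound state of $\Phi_\mu^{\mathcal{S}}(\alpha^M)$ and $\Phi_\mu^{\mathcal{S}}(\beta^N)$ with $\alpha^M \cup \beta^N \equiv (\alpha_1, \alpha_2, \ldots, \alpha_M, \beta_1, \beta_2, \ldots, \beta_N)$, constructed as

$$\Phi_\mu^{\mathcal{S}}(\alpha^M \cup \beta^N) = \int_{\mathfrak{S}(\mathcal{A})} \Big( \bigotimes_{m=1}^{M} \alpha_m^*\omega \Big) \otimes \Big( \bigotimes_{n=1}^{N} \beta_n^*\omega \Big) d\mu. \qquad (15)$$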
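For any pair $(\alpha^M, \beta^N)$ of finite sequences $\alpha^M = (\alpha_1, \ldots, \alpha_M)$ and $\beta^N = (\beta_1, \ldots, \beta_N)$ of completely positive unital maps (c.p.u. maps for short) $\alpha_m : \mathcal{A}_m \to \mathcal{A}$, $\beta_n : \mathcal{B}_n \to \mathcal{A}$ from finite dimensional unital C*-algebras and any extremal decomposition measure $\mu$ of $\varphi$, the entropy functional $S_\mu$ and the mutual entropy functional $I_\mu$ are defined by [21,16]

$$S_\mu^{\mathcal{S}}(\varphi; \alpha^M) = \int_{\mathfrak{S}(\mathcal{A})} S\Big( \bigotimes_{m=1}^{M} \alpha_m^*\omega,\ \Phi_\mu^{\mathcal{S}}(\alpha^M) \Big) d\mu(\omega), \qquad (16)$$

$$I_\mu^{\mathcal{S}}(\varphi; \alpha^M, \beta^N) = S\big( \Phi_\mu^{\mathcal{S}}(\alpha^M \cup \beta^N),\ \Phi_\mu^{\mathcal{S}}(\alpha^M) \otimes \Phi_\mu^{\mathcal{S}}(\beta^N) \big), \qquad (17)$$

where $S(\cdot, \cdot)$ is the relative entropy for a finite algebra. The relative entropy of two states was introduced in [31] for $\sigma$-finite and semifinite von Neumann algebras. Araki [5] and Uhlmann [32] extended this relative entropy to more general quantum systems [23]. For a given pair of finite sequences of completely positive unital maps $\alpha^M = (\alpha_1, \ldots, \alpha_M)$, $\beta^N = (\beta_1, \ldots, \beta_N)$, the functional $S^{\mathcal{S}}(\varphi; \alpha^M)$ (resp. $I^{\mathcal{S}}(\varphi; \alpha^M, \beta^N)$) is given by taking the supremum of $S_\mu^{\mathcal{S}}(\varphi; \alpha^M)$ (resp. $I_\mu^{\mathcal{S}}(\varphi; \alpha^M, \beta^N)$) over all possible extremal decompositions $\mu$ of $\varphi$:

$$S^{\mathcal{S}}(\varphi; \alpha^M) = \sup\{ S_\mu^{\mathcal{S}}(\varphi; \alpha^M);\ \mu \in M_\varphi(\mathcal{S}) \}, \qquad (18)$$

$$I^{\mathcal{S}}(\varphi; \alpha^M, \beta^N) = \sup\{ I_\mu^{\mathcal{S}}(\varphi; \alpha^M, \beta^N);\ \mu \in M_\varphi(\mathcal{S}) \}. \qquad (19)$$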
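Let $\mathcal{A}$ (resp. $\mathcal{B}$) be a unital C*-algebra with a fixed automorphism $\theta_{\mathcal{A}}$ (resp. $\theta_{\mathcal{B}}$), $\Lambda$ be a covariant c.p.u. map from $\mathcal{B}$ to $\mathcal{A}$, and $\varphi$ be an invariant state over $\mathcal{A}$, i.e., $\varphi \circ \theta_{\mathcal{A}} = \varphi$. We put

$$\alpha^N \equiv (\alpha,\ \theta_{\mathcal{A}} \circ \alpha,\ \ldots,\ \theta_{\mathcal{A}}^{N-1} \circ \alpha), \qquad (20)$$

$$\beta_\Lambda^N \equiv (\Lambda \circ \beta,\ \Lambda \circ \theta_{\mathcal{B}} \circ \beta,\ \ldots,\ \Lambda \circ \theta_{\mathcal{B}}^{N-1} \circ \beta). \qquad (21)$$

For each c.p.u. map $\alpha : \mathcal{A}_0 \to \mathcal{A}$ (resp. $\beta : \mathcal{B}_0 \to \mathcal{B}$) from a finite dimensional unital C*-algebra $\mathcal{A}_0$ (resp. $\mathcal{B}_0$) to $\mathcal{A}$ (resp. $\mathcal{B}$), $\tilde{S}^{\mathcal{S}}(\varphi; \theta_{\mathcal{A}}, \alpha)$ and $\tilde{I}^{\mathcal{S}}(\varphi; \Lambda^*, \theta_{\mathcal{A}}, \theta_{\mathcal{B}}, \alpha, \beta)$ are given by

$$\tilde{S}^{\mathcal{S}}(\varphi; \theta_{\mathcal{A}}, \alpha) = \limsup_{N \to \infty} \frac{1}{N} S^{\mathcal{S}}(\varphi; \alpha^N), \qquad (22)$$

$$\tilde{I}^{\mathcal{S}}(\varphi; \Lambda^*, \theta_{\mathcal{A}}, \theta_{\mathcal{B}}, \alpha, \beta) = \limsup_{N \to \infty} \frac{1}{N} I^{\mathcal{S}}(\varphi; \alpha^N, \beta_\Lambda^N). \qquad (23)$$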
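The functionals $\tilde{S}^{\mathcal{S}}(\varphi; \theta_{\mathcal{A}})$ and $\tilde{I}^{\mathcal{S}}(\varphi; \Lambda^*, \theta_{\mathcal{A}}, \theta_{\mathcal{B}})$ are defined by taking the supremum over all possible $\mathcal{A}_0$'s, $\alpha$'s, $\mathcal{B}_0$'s, and $\beta$'s:

$$\tilde{S}^{\mathcal{S}}(\varphi; \theta_{\mathcal{A}}) = \sup_{\alpha} \tilde{S}^{\mathcal{S}}(\varphi; \theta_{\mathcal{A}}, \alpha), \qquad (24)$$

$$\tilde{I}^{\mathcal{S}}(\varphi; \Lambda^*, \theta_{\mathcal{A}}, \theta_{\mathcal{B}}) = \sup_{\alpha, \beta} \tilde{I}^{\mathcal{S}}(\varphi; \Lambda^*, \theta_{\mathcal{A}}, \theta_{\mathcal{B}}, \alpha, \beta). \qquad (25)$$

The next theorem [21,16] shows that the fundamental inequality of information theory holds for $\tilde{S}^{\mathcal{S}}(\varphi; \theta_{\mathcal{A}})$ and $\tilde{I}^{\mathcal{S}}(\varphi; \Lambda^*, \theta_{\mathcal{A}}, \theta_{\mathcal{B}})$.

Theorem 4.1.

$$0 \leq \tilde{I}^{\mathcal{S}}(\varphi; \Lambda^*, \theta_{\mathcal{A}}, \theta_{\mathcal{B}}) \leq \min\{ \tilde{S}^{\mathcal{S}}(\varphi; \theta_{\mathcal{A}}),\ \tilde{S}^{\mathcal{S}}(\Lambda^*\varphi; \theta_{\mathcal{B}}) \}.$$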
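4.1. Computation of mean mutual entropy for modulated states of OOK and PSK
Let $B(\mathcal{H}_0)$ (resp. $B(\bar{\mathcal{H}}_0)$) be the set of all bounded linear operators on a Hilbert space $\mathcal{H}_0$ (resp. $\bar{\mathcal{H}}_0$) and $\mathcal{A}_0$ (resp. $\mathcal{B}_0$) be a finite subset of $B(\mathcal{H}_0)$ (resp. $B(\bar{\mathcal{H}}_0)$). Let $\mathcal{A}$ (resp. $\mathcal{B}$) be the infinite tensor product space of $B(\mathcal{H}_0)$ (resp. $B(\bar{\mathcal{H}}_0)$) represented by

$$\mathcal{A} \equiv \bigotimes_{i=-\infty}^{\infty} B(\mathcal{H}_0), \qquad \mathcal{B} \equiv \bigotimes_{i=-\infty}^{\infty} B(\bar{\mathcal{H}}_0).$$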
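Moreover, let $\theta_{\mathcal{A}}$ (resp. $\theta_{\mathcal{B}}$) be the shift transformation on $\mathcal{A}$ (resp. $\mathcal{B}$), which moves each factor of an infinite tensor product $\bigotimes_{i=-\infty}^{\infty} A_i \in \mathcal{A}$ (resp. $\bigotimes_{j=-\infty}^{\infty} B_j \in \mathcal{B}$) to the neighbouring site.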
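Let $\alpha$ (resp. $\beta$) be the embedding map from $\mathcal{A}_0$ to $\mathcal{A}$ (resp. $\mathcal{B}_0$ to $\mathcal{B}$) given by

$$\alpha(A) = \cdots I \otimes I \otimes A \otimes I \otimes \cdots \in \mathcal{A} \quad \text{for any } A \in \mathcal{A}_0,$$

$$\beta(B) = \cdots I \otimes I \otimes B \otimes I \otimes \cdots \in \mathcal{B} \quad \text{for any } B \in \mathcal{B}_0.$$

We denote the set of all density operators on $\mathcal{H}_0$ (resp. $\bar{\mathcal{H}}_0$) by $\mathfrak{S}_0$ (resp. $\bar{\mathfrak{S}}_0$), and let $\mathfrak{S}$ (resp. $\bar{\mathfrak{S}}$) be the set of all states $\rho$ on $\mathcal{A}$ (resp. $\bar{\rho}$ on $\mathcal{B}$). The maps $\alpha_{(M)}^N$, $\beta_{\Lambda(M)}^N$ are given by

$$\alpha_{(M)}^N \equiv \big( \alpha \circ \gamma_{(M)},\ \theta_{\mathcal{A}} \circ \alpha \circ \gamma_{(M)},\ \ldots,\ \theta_{\mathcal{A}}^{N-1} \circ \alpha \circ \gamma_{(M)} \big),$$

$$\beta_{\Lambda(M)}^N \equiv \big( \tilde{\gamma}_{(M)} \circ \tilde{\Lambda} \circ \beta,\ \tilde{\gamma}_{(M)} \circ \tilde{\Lambda} \circ \theta_{\mathcal{B}} \circ \beta,\ \ldots,\ \tilde{\gamma}_{(M)} \circ \tilde{\Lambda} \circ \theta_{\mathcal{B}}^{N-1} \circ \beta \big),$$

where we took a special channel and modulators $M$ such that

$$\tilde{\Lambda} \equiv \bigotimes_{i=-\infty}^{\infty} \Lambda \quad \text{and} \quad \tilde{\gamma}_{(M)} \equiv \bigotimes_{i=-\infty}^{\infty} \gamma_{(M)}.$$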
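4.1.1. Mean mutual entropy for modulated state of OOK and PSK
For an initial state

$$\rho^{(M)} = \bigotimes_{i=-\infty}^{\infty} \rho_i^{(M)} \in \bigotimes_{i=-\infty}^{\infty} \mathfrak{S}_i \quad (M = \mathrm{OOK}, \mathrm{PSK}),$$

$\rho_i^{(\mathrm{OOK})}$ and $\rho_i^{(\mathrm{PSK})}$ are given by

$$\rho_i^{(\mathrm{OOK})} = \nu\, |0\rangle\langle 0| + (1 - \nu)\, |\kappa\rangle\langle\kappa|,$$

$$\rho_i^{(\mathrm{PSK})} = \nu\, |-\kappa\rangle\langle-\kappa| + (1 - \nu)\, |\kappa\rangle\langle\kappa| \quad (0 \leq \nu \leq 1).$$

The Schatten decomposition of $\rho_i^{(M)}$ is obtained as

$$\rho_i^{(M)} = \sum_{n_i=1}^{2} \lambda_{n_i}^{(M)} E_{n_i}^{(M)},$$

where the eigenvalues $\lambda_{n_i}^{(M)}$ of $\rho_i^{(M)}$ are

$$\lambda_k^{(\mathrm{OOK})} = \frac{1}{2}\Big\{ 1 + (-1)^{k-1} \sqrt{1 - 4\nu(1 - \nu)\big(1 - \exp(-|\kappa|^2)\big)} \Big\},$$

$$\lambda_k^{(\mathrm{PSK})} = \frac{1}{2}\Big\{ 1 + (-1)^{k-1} \sqrt{1 - 4\nu(1 - \nu)\big(1 - \exp(-2|\kappa|^2)\big)} \Big\} \quad (k = 1, 2).$$

The two projections $E_{n_i}^{(M)}$ ($n_i = 1, 2$) are the one-dimensional projections $|e_{n_i}^{(M)}\rangle\langle e_{n_i}^{(M)}|$ onto the eigenvectors $|e_{n_i}^{(M)}\rangle$ belonging to $\lambda_{n_i}^{(M)}$ ($n_i = 1, 2$).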
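The closed form of the OOK eigenvalues can be cross-checked with a two-dimensional calculation, since a mixture of two pure states lives in the plane they span. The sketch below assumes the standard coherent-state overlap $|\langle 0|\kappa\rangle|^2 = \exp(-|\kappa|^2)$ and represents $|0\rangle$ and $|\kappa\rangle$ by two unit vectors with that overlap; the function names and the parameter values are illustrative.

```python
import numpy as np

def ook_eigenvalues_closed_form(nu, kappa):
    """lambda_k^(OOK), k = 1, 2, from the closed-form expression."""
    root = np.sqrt(1 - 4 * nu * (1 - nu) * (1 - np.exp(-abs(kappa) ** 2)))
    return np.array([(1 + root) / 2, (1 - root) / 2])

def ook_eigenvalues_numeric(nu, kappa):
    """Eigenvalues of nu|0><0| + (1-nu)|kappa><kappa| in the span of the two vectors."""
    c = np.exp(-abs(kappa) ** 2 / 2)       # assumed overlap <0|kappa> (up to a phase)
    vac = np.array([1.0, 0.0])
    coh = np.array([c, np.sqrt(1 - c ** 2)])
    rho = nu * np.outer(vac, vac) + (1 - nu) * np.outer(coh, coh)
    return np.sort(np.linalg.eigvalsh(rho))[::-1]   # largest first, matching k = 1, 2

nu, kappa = 0.35, 0.9 + 0.4j
closed = ook_eigenvalues_closed_form(nu, kappa)
numeric = ook_eigenvalues_numeric(nu, kappa)
```

The two results agree to rounding error, and both eigenvalue pairs sum to one, as required for a density operator.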
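For the above $E_{n_i}^{(M)}$, one can obtain the output state for the attenuation channel $\Lambda_0^*$ as follows:

$$\Lambda_0^* E_{n_i}^{(M)} = \sum_{n_i'=1}^{2} \tilde{\lambda}_{n_i, n_i'}^{(M)} \tilde{E}_{n_i, n_i'}^{(M)} \quad (n_i = 1, 2),$$

where $\tilde{\lambda}_{n_i, n_i'}^{(M)}$ ($n_i = 1, 2$; $n_i' = 1, 2$) are the eigenvalues of $\Lambda_0^* E_{n_i}^{(M)}$. When $\Lambda^*$ is given by the attenuation channel $\Lambda_0^*$, the compound states through $\Lambda_0^*$ become

$$\Phi_E(\alpha_{(M)}^N) \otimes \Phi_E(\beta_{\Lambda_0(M)}^N) = \sum_{n_1, \ldots, n_N = 1}^{2} \Big( \prod_{k=1}^{N} \lambda_{n_k}^{(M)} \Big) \sum_{m_1, \ldots, m_N = 1}^{2} \Big( \prod_{k=1}^{N} \lambda_{m_k}^{(M)} \Big) \sum_{m_1', \ldots, m_N' = 1}^{2} \Big( \prod_{k=1}^{N} \tilde{\lambda}_{m_k, m_k'}^{(M)} \Big) \Big( \bigotimes_{k=1}^{N} E_{n_k}^{(M)} \Big) \otimes \Big( \bigotimes_{k=1}^{N} \tilde{E}_{m_k, m_k'}^{(M)} \Big).$$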
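Lemma 4.1. For an initial state $\rho^{(M)} = \bigotimes_{i=-\infty}^{\infty} \rho_i^{(M)} \in \bigotimes_{i=-\infty}^{\infty} \mathfrak{S}_i$, we have

$$I_E(\rho^{(M)}; \alpha_{(M)}^N, \beta_{\Lambda_0(M)}^N) = \sum_{n_1', \ldots, n_N' = 1}^{2}\ \sum_{n_1, \ldots, n_N = 1}^{2} \Big( \prod_{k=1}^{N} \lambda_{n_k}^{(M)} \tilde{\lambda}_{n_k, n_k'}^{(M)} \Big) \log \frac{\prod_{k=1}^{N} \tilde{\lambda}_{n_k, n_k'}^{(M)}}{\prod_{k=1}^{N} \sum_{m_k=1}^{2} \lambda_{m_k}^{(M)} \tilde{\lambda}_{m_k, n_k'}^{(M)}}.$$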
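By using the above lemma, we have the following theorems.

Theorem 4.2. For an initial state $\rho^{(M)} = \bigotimes_{i=-\infty}^{\infty} \rho_i^{(M)} \in \bigotimes_{i=-\infty}^{\infty} \mathfrak{S}_i$ ($M = \mathrm{OOK}, \mathrm{PSK}$), we have

$$\tilde{S}(\rho^{(M)}; \theta_{\mathcal{A}}, \alpha_{(M)}^N) = \lim_{N \to \infty} \frac{1}{N} S(\rho^{(M)}; \alpha_{(M)}^N) = -\sum_{n=1}^{2} \lambda_n^{(M)} \log \lambda_n^{(M)}$$

and

$$\tilde{I}(\rho^{(M)}; \Lambda_0^*, \theta_{\mathcal{A}}, \theta_{\mathcal{B}}, \alpha_{(M)}^N, \beta_{\Lambda_0(M)}^N) = \sum_{n'=1}^{2} \sum_{n=1}^{2} \lambda_n^{(M)} \tilde{\lambda}_{n, n'}^{(M)} \log \frac{\tilde{\lambda}_{n, n'}^{(M)}}{\sum_{m=1}^{2} \lambda_m^{(M)} \tilde{\lambda}_{m, n'}^{(M)}}.$$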
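Theorem 4.3. For an initial state $\rho^{(M)} = \bigotimes_{i=-\infty}^{\infty} \rho_i^{(M)} \in \bigotimes_{i=-\infty}^{\infty} \mathfrak{S}_i$ ($M = \mathrm{OOK}, \mathrm{PSK}$), one has the following inequalities:

(1) $\tilde{S}(\rho^{(\mathrm{PSK})}; \theta_{\mathcal{A}}, \alpha_{(\mathrm{PSK})}^N) \geq \tilde{S}(\rho^{(\mathrm{OOK})}; \theta_{\mathcal{A}}, \alpha_{(\mathrm{OOK})}^N)$,

(2) $\tilde{I}(\rho^{(\mathrm{PSK})}; \Lambda^*, \theta_{\mathcal{A}}, \theta_{\mathcal{B}}, \alpha_{(\mathrm{PSK})}^N, \beta_{(\mathrm{PSK})}^N) \geq \tilde{I}(\rho^{(\mathrm{OOK})}; \Lambda^*, \theta_{\mathcal{A}}, \theta_{\mathcal{B}}, \alpha_{(\mathrm{OOK})}^N, \beta_{(\mathrm{OOK})}^N)$.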
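Inequality (1) of Theorem 4.3 can be illustrated numerically without using any closed form. The sketch below builds the single-site OOK and PSK states in a truncated Fock space and compares their entropies $-\sum_n \lambda_n \log \lambda_n$, which by Theorem 4.2 are the mean entropies. The truncation dimension `dim`, the amplitude `kappa`, and the grid of $\nu$ values are assumptions of this sketch.

```python
import numpy as np

def coherent_state(kappa, dim=40):
    """Fock-basis amplitudes exp(-|kappa|^2/2) kappa^n / sqrt(n!), truncated at dim."""
    amps = np.empty(dim, dtype=complex)
    amps[0] = 1.0
    for n in range(1, dim):
        amps[n] = amps[n - 1] * kappa / np.sqrt(n)
    return amps * np.exp(-abs(kappa) ** 2 / 2)

def entropy(rho):
    """-sum lambda log lambda over the nonzero eigenvalues of rho."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > 1e-12]
    return float(-np.sum(p * np.log(p)))

def mixture(nu, a, b):
    """nu |a><a| + (1 - nu) |b><b|."""
    return nu * np.outer(a, a.conj()) + (1 - nu) * np.outer(b, b.conj())

kappa = 1.0
vacuum = coherent_state(0.0)
plus = coherent_state(kappa)
minus = coherent_state(-kappa)

nus = np.linspace(0.05, 0.95, 10)
s_ook = np.array([entropy(mixture(nu, vacuum, plus)) for nu in nus])
s_psk = np.array([entropy(mixture(nu, minus, plus)) for nu in nus])
```

For every $\nu$ on the grid the PSK entropy is at least as large as the OOK entropy, because the two PSK signals $|\pm\kappa\rangle$ overlap less than the OOK signals $|0\rangle$ and $|\kappa\rangle$.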
References
1. Accardi, L., and Ohya, M., Compound channels, transition expectation and liftings, Appl. Math. Optim., 39, 33-59 (1999).
2. Accardi, L., Ohya, M., and Watanabe, N., Dynamical entropy through quantum Markov chain, Open Systems and Information Dynamics, 4, 71-87 (1997).
3. Accardi, L., Ohya, M., and Watanabe, N., Note on quantum dynamical entropies, Rep. Math. Phys., 38, 457-469 (1996).
4. Alicki, R., and Fannes, M., Defining quantum dynamical entropy, Lett. Math. Phys., 32, 75-82 (1994).
5. Araki, H., Relative entropy for states of von Neumann algebras, Publ. RIMS Kyoto Univ., 11, 809-833 (1976).
6. Benatti, F., Deterministic Chaos in Infinite Quantum Systems, Springer, Berlin (1993).
7. Choda, M., Entropy for extensions of Bernoulli shifts, Ergodic Theory Dynam. Systems, 16, No. 6, 1197-1206 (1996).
8. Connes, A., Narnhofer, H., and Thirring, W., Dynamical entropy of C*-algebras and von Neumann algebras, Commun. Math. Phys., 112, 691-719 (1987).
9. Connes, A., and Størmer, E., Entropy for automorphisms of von Neumann algebras, Acta Math., 134, 289-306 (1975).
10. Emch, G.G., Positivity of the K-entropy on non-abelian K-flows, Z. Wahrscheinlichkeitstheorie verw. Gebiete, 29, 241 (1974).
11. Fichtner, K.H., Freudenberg, W., and Liebscher, V., Beam splittings and time evolutions of Boson systems, Fakultät für Mathematik und Informatik, Math/Inf/96/39, Jena, 105 (1996).
12. Hudetz, T., Topological entropy for appropriately approximated C*-algebras, J. Math. Phys., 35, No. 8, 4303-4333 (1994).
13. Ingarden, R.S., Kossakowski, A., and Ohya, M., Information Dynamics and Open Systems, Kluwer (1997).
14. Kolmogorov, A.N., Theory of transmission of information, Amer. Math. Soc. Translation, Ser. 2, 33, 291 (1963).
15. Kossakowski, A., Ohya, M., and Watanabe, N., Quantum dynamical entropy for completely positive map, Infinite Dimensional Analysis, Quantum Probability and Related Topics, 2, No. 2, 267-282 (1999).
16. Muraki, N., and Ohya, M., Entropy functionals of Kolmogorov Sinai type and their limit theorems, Letters in Mathematical Physics, 36, 327-335 (1996).
17. von Neumann, J., Die Mathematischen Grundlagen der Quantenmechanik, Springer, Berlin (1932).
18. Ohya, M., Quantum ergodic channels in operator algebras, J. Math. Anal. Appl., 84, 318-328 (1981).
19. Ohya, M., On compound state and mutual information in quantum information theory, IEEE Trans. Information Theory, 29, 770-774 (1983).
20. Ohya, M., Note on quantum probability, Lett. Nuovo Cimento, 38, 402-404 (1983).
21. Ohya, M., Some aspects of quantum information theory and their applications to irreversible processes, Rep. Math. Phys., 27, 19-47 (1989).
22. Ohya, M., Information dynamics and its applications to optical communication processes, Springer Lecture Notes in Physics, 378, 81-92 (1991).
23. Ohya, M., and Petz, D., Quantum Entropy and its Use, Springer, Berlin (1993).
24. Ohya, M., Petz, D., and Watanabe, N., On capacity of quantum channels, Probability and Mathematical Statistics, 17, 179-196 (1997).
25. Ohya, M., Petz, D., and Watanabe, N., Numerical computation of quantum capacity, International Journal of Theoretical Physics, 37, No. 1, 507-510 (1998).
26. Ohya, M., and Watanabe, N., Construction and analysis of a mathematical model in quantum communication processes, Electronics and Communications in Japan, Part 1, 68, No. 2, 29-34 (1985).
27. Ohya, M., and Watanabe, N., Foundations of Quantum Communication Theory (in Japanese), Makino Pub. Co. (1998).
28. Ohya, M., and Watanabe, N., Quantum capacity of noisy quantum channel, Quantum Communication and Measurement, 3, 213-220 (1997).
29. Park, Y.M., Dynamical entropy of generalized quantum Markov chains, Lett. Math. Phys., 32, 63-74 (1994).
30. Schatten, R., Norm Ideals of Completely Continuous Operators, Springer-Verlag (1970).
31. Umegaki, H., Conditional expectations in an operator algebra IV (entropy and information), Kodai Math. Sem. Rep., 14, 59-85 (1962).
32. Uhlmann, A., Relative entropy and the Wigner-Yanase-Dyson-Lieb concavity in interpolation theory, Commun. Math. Phys., 54, 21-32 (1977).
33. Voiculescu, D., Dynamical approximation entropies and topological entropy in operator algebras, Comm. Math. Phys., 170, 249 (1995).
34. Watanabe, N., Some Aspects of Complexities for Quantum Processes, Open Systems and Information Dynamics, 16, No. 2&3, 293-304 (2009).
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 403-412)
A FAIR SAMPLING TEST FOR EKERT PROTOCOL
GUILLAUME ADENIER* and NOBORU WATANABE
Tokyo University of Science, 2641 Yamazaki, Noda, Chiba 278-8510, Japan
* E-mail: [email protected]
Andrei Yu. Khrennikov
Linnaeus University, Vejdes plats 7, SE-351 95 Växjö, Sweden
We propose a local scheme to enhance the security of quantum key distribution in the Ekert protocol (E91). Our proposal is a fair sampling test meant to detect an eavesdropping attempt that would use a biased sample to mimic an apparent violation of Bell inequalities. The test is local and non-disruptive: it can be performed unilaterally at any time by either Alice or Bob during the production of the key, alongside the Bell inequality test.
Keywords: Ekert protocol, Entangled states, Fair sampling.
1. Introduction
The Ekert protocol [1-3] uses entangled states to guarantee the secrecy of a key distributed to two parties (Alice and Bob) who wish to communicate secretly through a public channel. Identical measurements performed on a maximally entangled state yield perfect correlation, which can be used to produce a shared key. The secrecy of the key can be guaranteed by the violation of Bell inequalities measured for non-identical measurements. An unconditional violation of Bell inequalities would guarantee that no local (hidden) variables exist that an eavesdropper (Eve) could exploit. It would mean unconditional privacy: even if Eve had full control of the detectors and the source, and had access to more advanced theory and technology, the key would still be secure [2]. However, actual implementations of the Ekert protocol presently require the use of photons, because a key distribution protocol is useful only if Alice and Bob can be separated by macroscopic distances [4], which makes photons the only practical solution. The downside is that the type of photons is limited in practice, by the pair creation process (parametric down-conversion) and by the optical components that are used (fiber optics, polarizing beamsplitters), to wavelengths at which standard photon counters have a poor detection efficiency [5,6]. It means that optical implementations of the Ekert protocol cannot avoid a rather heavy postselection: Alice and Bob must discard all measurements for which either of them failed to register a click. The trouble is that the validation of a violation of Bell inequalities observed in such a postselection protocol requires an extra assumption: the sample of detected pairs must fairly represent the population of emitted pairs (fair sampling assumption). It has long been known that a breakdown of this assumption allows local hidden-variable models to reproduce exactly the predictions of Quantum Mechanics on the subset of detected pairs [7], unless the detection efficiency is higher than 83% [8-11]. In the context of experiments on the foundations of Quantum Mechanics, the fair sampling assumption is usually considered reasonable, with the idea that Nature is not conspiratory. In Quantum Key Distribution, however, Eve is expected to conspire [12], so that Alice and Bob must assume that Eve is actually attempting to bias their sample.ᵃ Naturally, this issue becomes critical if Eve manufactured the detectors, which means that Alice and Bob should thoroughly check that their detectors are functioning according to specifications [2]. We will argue here that photomultipliers or avalanche photodiodes can in principle be subjected to such a biased-sampling attack, by exploiting the thresholds of these detectors, in particular if they exhibit nonlinearities.
ᵃ This possibility should not be underestimated. Quantum hacking has already been successfully implemented experimentally with a time-shifting attack [13]. The attack essentially introduced a hidden variable in the protocol (the time shift) to influence the probability for Alice and Bob to get either a 0 or a 1.

2. A biased-sampling attack on Ekert protocol
The motivation for the possibility of a biased-sampling attack is that avalanche photodiodes and photomultipliers are fundamentally threshold detectors. They are sometimes referred to as such because they cannot distinguish between the absorption of one or of several photons, but it should also be pointed out that the principle of detection itself relies on thresholds, and that these thresholds are essential to the proper functioning of these detectors. At the input, the energy must be higher than the band gap to trigger an avalanche or a photoelectron, while at the output, the current must be higher than a discriminator value to be counted as a click [14]. This combined threshold could be exploited by Eve to obtain an apparent violation of Bell inequalities on the detected sample using only local states rather than entangled states [15]. The idea of the attack would be to replace the genuine source of entangled photons with a source of near-threshold pulses correlated in polarization. These pulses would lead to a probability of generating a click in either output channel that would depend on the characteristics of the pulse and the measurement settings. Consider a near-threshold pulse linearly polarized along $\lambda$ impinging on a polarizer set at a measurement angle $\varphi$. It would have a significantly greater chance to produce a click when the angle difference $\alpha$ between these two variables is $\alpha = |\lambda - \varphi| \approx k\pi/2$ (parallel or orthogonal) than when $\alpha \approx k\pi/2 + \pi/4$ (diagonal). Indeed, in the first case most of the energy of the pulse would go into one specific channel, retaining enough energy in this channel to remain above the threshold, while in the other case the energy of the pulse would be split between both channels, in principle below the threshold. In the context of low detection efficiency, it would bias the sampling of the pulses with the effect of artificially increasing the visibility of the coincidence counts, thus leading to an apparent violation of Bell inequalities [15].

Figure 1. Standard Ekert protocol. Alice and Bob randomly switch their measurement settings. Pairs associated with identical measurement settings ($\varphi_A = \varphi_B$) are used to produce a correlated key, while those associated with non-identical measurement settings ($\varphi_A \neq \varphi_B$) are used to check the violation of Bell inequality (and the security of the key).
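The threshold mechanism described in this section can be sketched with Malus' law and an ideal threshold. In the sketch below, the pulse energy of 1.2 (in units of the threshold) and the function names are illustrative assumptions; the detector is taken to click whenever the energy in its channel exceeds the threshold.

```python
import numpy as np

def clicks(lam, phi, pulse_energy, threshold=1.0):
    """Clicks (channel 1, channel 0) for a pulse polarized along lam and a PBS at phi."""
    alpha = lam - phi
    energy_1 = pulse_energy * np.cos(alpha) ** 2  # Malus' law, transmitted channel
    energy_0 = pulse_energy * np.sin(alpha) ** 2  # remaining energy, other channel
    return energy_1 > threshold, energy_0 > threshold

def detection_fraction(phi, pulse_energy, threshold=1.0, n=100_000):
    """Fraction of pulses, uniform in polarization, that produce any click."""
    lam = (np.arange(n) + 0.5) * np.pi / n
    c1, c0 = clicks(lam, phi, pulse_energy, threshold)
    return float(np.mean(c1 | c0))

parallel = clicks(0.0, 0.0, 1.2)          # alpha = 0
diagonal = clicks(np.pi / 4, 0.0, 1.2)    # alpha = pi/4
orthogonal = clicks(np.pi / 2, 0.0, 1.2)  # alpha = pi/2
fraction = detection_fraction(0.3, 1.2)
```

A near-threshold pulse clicks in channel 1 when parallel, in channel 0 when orthogonal, and in neither channel when diagonal, so only part of the pulses is detected and the detected sample depends on $\lambda - \varphi$.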
In principle, Eve could even obtain an apparent violation of Bell inequalities reproducing exactly the predictions of Quantum Mechanics on the detected sample, by reproducing the asymmetrical detection pattern of a Larsson-Gisin model [10,11]. For this purpose, Eve would have to send pairs of correlated pulses with energy $E_0 = 2$ and polarization $\lambda$, with $\lambda$ being a random variable uniformly distributed on the interval $[-\pi/2, \pi/2]$. On one side, Eve would have to send pulses to which the detectors react with a very steeply rising detection probability at the threshold (ideal threshold), while on the other side, Eve would have to send pulses to which the detectors react with a detection probability rising linearly at the threshold (linear threshold). With the condition $E_0 = 2$, Malus' law is cut from below precisely at the intersection of the two channels (at $|\varphi - \lambda| = \pi/4$). Consequently, Alice would always record a click in exactly one channel: channel 1 if $|\varphi_A - \lambda| < \pi/4$, channel 0 otherwise; whereas on Bob's side the probability to get a click in channel 1 would vary with $\cos^2(\varphi_B - \lambda)$ when $|\varphi_B - \lambda| < \pi/4$, and the probability to get a click in channel 0 would vary with $\sin^2(\varphi_B - \lambda)$ when $|\varphi_B - \lambda| > \pi/4$. The crucial feature of the resulting detection pattern is that the probability to obtain a click in either channel on Bob's side depends explicitly on $\lambda$: it is maximum for $|\varphi_B - \lambda| = 0$, and decreases down to zero for $|\varphi_B - \lambda| = \pi/4$. The sampling is thus unfair, or biased, and leads to an apparent violation of Bell inequalities on the detected sample [10,11,15].
Note that if, instead of scanning the full correlation, Alice and Bob only check a few points of the correlation predicted by Quantum Mechanics, as is the case in the Ekert protocol, Eve would not need to aim at reproducing the full correlation and could therefore implement her attack with a symmetrical design, inducing the same detection pattern for Alice and Bob. However, if each pulse contains at most one photon, the biased sampling described here would be ineffective, because the energy seen at a detector would always be the same regardless of the measurement settings (when this photon does reach a detector). Eve would therefore have to produce near-threshold pulses consisting of several photons of lower frequencies. In order for her attack to work, however, she needs the probability that a pulse produces a click in either channel to be close to zero for angle differences close to the diagonals, that is, for angle differences close to $\alpha = |\varphi - \lambda| = \pi/4 + k\pi/2$, while it should be non-zero, and significantly greater than the probability of a dark count, for other angle differences, in particular for $\alpha = |\varphi - \lambda| = k\pi/2$.
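The asymmetrical detection pattern described above can be turned into a small numerical experiment. The sketch below is one possible reading of that model, not the authors' implementation: the threshold is set to 1 in the same units as $E_0 = 2$, Alice's ideal threshold gives exactly one click per pulse, and Bob's linear threshold is modelled by a click probability equal to the excess energy $E_0\cos^2 - 1$ (or $E_0\sin^2 - 1$) in the channel that lies above threshold. The CHSH settings are the usual ones for a $\cos 2(\varphi_A - \varphi_B)$ correlation.

```python
import numpy as np

def correlation(phi_a, phi_b, n=200_000):
    """Correlation on the detected sample, averaged over a uniform grid of lambda."""
    lam = -np.pi / 2 + (np.arange(n) + 0.5) * np.pi / n
    # Alice (ideal threshold): +1 in channel 1 if |phi_a - lam| < pi/4, else -1.
    a = np.sign(np.cos(2 * (phi_a - lam)))
    # Bob (linear threshold, assumed): excess energy 2cos^2 - 1 = cos(2x) in
    # channel 1, or 2sin^2 - 1 = -cos(2x) in channel 0, is the click probability.
    c = np.cos(2 * (phi_b - lam))
    b = np.sign(c)
    p_detect = np.abs(c)
    return float(np.sum(a * b * p_detect) / np.sum(p_detect))

def bob_efficiency(phi_b, n=200_000):
    """Fraction of pulses that Bob detects."""
    lam = -np.pi / 2 + (np.arange(n) + 0.5) * np.pi / n
    return float(np.mean(np.abs(np.cos(2 * (phi_b - lam)))))

a1, a2 = 0.0, np.pi / 4
b1, b2 = np.pi / 8, 3 * np.pi / 8
chsh = (correlation(a1, b1) - correlation(a1, b2)
        + correlation(a2, b1) + correlation(a2, b2))
efficiency = bob_efficiency(b1)
```

Under these assumptions the detected-sample correlation follows $\cos 2(\varphi_A - \varphi_B)$, so `chsh` comes out close to $2\sqrt{2}$ although the model is purely local, while Bob detects only a fraction $2/\pi$ of the pulses.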
3. Countermeasures
In order to prevent Eve from using this biased-sampling attack, Alice and Bob can in principle use several countermeasures. The first countermeasure consists in increasing the efficiency to reach 83%. However, this proves difficult with threshold detectors. Decreasing the band gap threshold does increase the efficiency of the detectors, but only at the cost of higher dark count rates. Unless special detectors operating near absolute zero temperature are used, such as Transition-Edge Sensors (which are too cumbersome and slow to be a practical solution for QKD), this can be considered a general rule that applies to any detector and fundamentally limits its efficiency, so that this desirable solution can be considered unrealistic in a quantum key distribution framework. The second countermeasure would be to guarantee that Alice's and Bob's detectors are not susceptible to nonlinearities at any frequency. The probability that a photon produces a click in a detector should remain the same regardless of the circumstances; in particular, it should not depend on the number of other photons reaching the detector simultaneously. If this can be guaranteed, then Eve cannot exploit the threshold, because each photon sees the threshold independently of the presence of other photons, and is therefore either above the threshold or below it, regardless of the instrument settings. In practice, it might be difficult to guarantee that detectors do not exhibit nonlinearities at any frequency, and as far as this question has been studied experimentally, the result is that detectors do exhibit nonlinearities [16]. A third countermeasure would consist in using filters to prevent Eve from using lower-frequency photons. This would however come at the expense of a lower overall quantum efficiency. The narrower the bandwidth of the filter, the lower the quantum efficiency. With photon detectors that have a significant amount of dark counts, this would mean an increase of the quantum bit error rate. The imperfections of the filters could also become the target of Eve's attack. If, for instance, the filter does not provide complete extinction at a frequency usable for an attack, Eve would only have to send brighter pulses to allow enough of these photons to go through and perform the biased-sampling attack. In principle, any of the three countermeasures above would be enough to prevent Eve's attack, but they can be very difficult or impractical to implement. This leads us to suggest a fourth possibility, which is to test the fairness of the sampling.
Figure 2. Fair sampling test on Alice's side. The detector $A_1$ is replaced by a polarimeter with two detectors $A_1^+$ and $A_1^-$, whereas the detector $A_0$ is replaced by a polarimeter with two detectors $A_0^+$ and $A_0^-$. The Ekert protocol is unaltered by our test if all detectors have the same efficiency $\eta$: the light green area is equivalent to detector $A_1$ with efficiency $\eta$, whereas the light red area is equivalent to detector $A_0$ with efficiency $\eta$.
For this purpose, we propose to analyze the output channels of the polarizing beamsplitters (PBS), instead of simply feeding detectors with them. We keep the standard design of Ekert protocol, with two polarizing beamsplitters on each side (Alice and Bob) projecting the incoming pulses on random bases $\varphi_A$ and $\varphi_B$, as depicted in Fig. 1, but we replace each detector by a polarimeter: a polarizing beamsplitter followed by a detector at each output. Consider Alice's side (see Fig. 2). We label the additional PBS in channel 1 by its orientation $\theta_{A_1}$, and the one in channel 0 by its orientation $\theta_{A_0}$. A click in either of the two detectors following $\theta_{A_1}$ is counted as $a_1$, whereas a click in either of the two detectors following $\theta_{A_0}$ is counted as $a_0$. Bob proceeds similarly with two polarimeters labeled $\theta_{B_1}$ and $\theta_{B_0}$.

From the point of view of Ekert protocol and of a genuine source of entangled photons, nothing is changed. Consider Alice's channel 1. Provided that the detectors at the two outputs have balanced quantum efficiencies $\eta$, the setup constituted by the PBS $\theta_{A_1}$ and its corresponding detectors can be seen as one single detector in channel 1, in which the orientation $\theta_{A_1}$ has no influence on the result: a photon exiting the PBS oriented along $\varphi_A$ through channel 1 will be detected in either output channel of polarimeter $\theta_{A_1}$ with a probability $\eta$. Similarly, polarimeter $\theta_{A_0}$ can be seen as one single detector in channel 0, where the orientation $\theta_{A_0}$ plays no role whatsoever, and the same goes for Bob's setup. As long as all the detectors have the same efficiency $\eta$, each polarimeter can thus be considered as one detector with quantum efficiency $\eta$. The production of the key and the verification of the violation of Bell inequalities are therefore unaltered by our fair sampling test setup in case of a genuine source of entangled photons, because the additional measurement settings $\theta_{A_1}$, $\theta_{A_0}$, $\theta_{B_1}$ and $\theta_{B_0}$ controlled by Alice and Bob have no influence on the measured results.

However, they have a strong influence on the result in the case of a biased-sampling attack by Eve. Let us consider the simpler case of an ideal threshold detector. By ideal threshold detector, we mean a detector that produces a click with certainty if and only if the pulse impinging on the detector carries an energy greater than the detector threshold $\Phi$. Eve sends pairs of correlated pulses with energy $E_0$ and polarization $\lambda$, where $\lambda$ is a random variable uniformly distributed on the interval $[0, 2\pi[$. By Malus' law, the energy of the pulse reaching Alice's detector $A_1^+$ is
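$$E_{A_1^+} = E_0 \cos^2(\lambda - \varphi_A)\,\cos^2(\varphi_A - \theta_{A_1}). \tag{1}$$

Starting from a uniform distribution of the polarization $\lambda$ of pulses on the circle, we write that $|\rho_\lambda\, d\lambda| = |\rho_{A_1^+}(E_{A_1^+})\, dE_{A_1^+}|$, summed over the four values of $\lambda$ that give the same energy, so that the probability to get an energy between $E_{A_1^+}$ and $E_{A_1^+} + dE_{A_1^+}$ in the $A_1^+$ channel is given by

$$\rho_{A_1^+}(E_{A_1^+})\, dE_{A_1^+} = \frac{dE_{A_1^+}}{\pi \sqrt{E_{A_1^+}\,(E_{\max} - E_{A_1^+})}}, \tag{2}$$

where $E_{\max} = E_0 \cos^2(\varphi_A - \theta_{A_1})$ is the maximum energy reaching the detector (by Malus' law).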
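The probability to obtain a click in an ideal threshold detector placed at the transmitted output (+) of polarimeter $A_1$ is then simply the integral of this density distribution over the energy reaching the detector, from the threshold $\Phi$ to $E_{\max}$:

$$P_{A_1^+} = \int_{\Phi}^{E_{\max}} \frac{dE}{\pi \sqrt{E\,(E_{\max} - E)}} \tag{3}$$

$$P_{A_1^+} = \frac{2}{\pi} \arccos \sqrt{\frac{\Phi}{E_0 \cos^2(\varphi_A - \theta_{A_1})}}, \tag{4}$$

for $E_{\max} \geq \Phi$, and zero otherwise. Similarly, the probability to obtain a click in an ideal threshold detector positioned at the reflected output (-) of polarimeter $A_1$ is

$$P_{A_1^-} = \frac{2}{\pi} \arccos \sqrt{\frac{\Phi}{E_0 \sin^2(\varphi_A - \theta_{A_1})}}, \tag{5}$$

for $E_0 \sin^2(\varphi_A - \theta_{A_1}) \geq \Phi$, and zero otherwise.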
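The ideal threshold model can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes Malus' law at both beamsplitters, takes $\varphi_A = 0$ without loss of generality, measures energies in units of the threshold $\Phi$, and uses hypothetical function names (`click_probability`, `click_probability_mc`). The closed form is the arcsine-law integral that follows from a uniformly distributed polarization; the Monte Carlo estimate does not rely on it.

```python
import numpy as np


def click_probability(delta, e0_over_threshold=2.0):
    """Closed-form click probabilities (A1+, A1-) for ideal threshold detectors.

    delta is phi_A - theta_A1 in radians; energies are in units of the threshold.
    """
    def p(e_max):
        # No click is possible if even the largest pulse stays below threshold.
        if e_max <= 1.0:
            return 0.0
        return (2.0 / np.pi) * np.arccos(np.sqrt(1.0 / e_max))

    e_max_plus = e0_over_threshold * np.cos(delta) ** 2
    e_max_minus = e0_over_threshold * np.sin(delta) ** 2
    return p(e_max_plus), p(e_max_minus)


def click_probability_mc(delta, e0_over_threshold=2.0, n=200_000, seed=0):
    """Monte Carlo estimate of the same probabilities (phi_A = 0 assumed)."""
    rng = np.random.default_rng(seed)
    lam = rng.uniform(0.0, 2.0 * np.pi, n)            # Eve's random polarization
    e_channel1 = e0_over_threshold * np.cos(lam) ** 2  # after the PBS phi_A
    e_plus = e_channel1 * np.cos(delta) ** 2           # transmitted output of the polarimeter
    e_minus = e_channel1 * np.sin(delta) ** 2          # reflected output of the polarimeter
    return float(np.mean(e_plus > 1.0)), float(np.mean(e_minus > 1.0))


if __name__ == "__main__":
    for delta in (0.0, np.pi / 8, np.pi / 4):
        print(delta, click_probability(delta), click_probability_mc(delta))
```

Under these assumptions and with $E_0 = 2\Phi$, the single-count probability in the polarimeter is 1/2 when $|\varphi_A - \theta_{A_1}| = 0$ and vanishes at $\pi/4$, which is the dip the fair sampling test looks for.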
We can also consider a less ideal threshold detector, which would more closely resemble the characteristics of a real detector. In order to keep the calculation of the integrals simple enough, we considered the case of a threshold detector with linear rise, that is, a threshold detector for which the probability to generate a click increases linearly starting from the threshold $\Phi$, possibly with a saturation value after which increasing the impinging energy no longer increases the probability to generate a click. In these cases, the analytical results are more complicated since the probability to get a click for an energy between $E$ and $E + dE$ is not always equal to 1, but the principle of calculation remains the same: integrate the product of the probability density distribution by the probability of obtaining a click for a given energy. The analytical results are qualitatively similar to those of ideal threshold detectors.

The results in the case $E_0 = 2\Phi$, which leads to a violation of Bell inequalities exactly reproducing the predictions of Quantum Mechanics, are displayed in Fig. 3: the probability to get a click in the polarimeter oriented along $\theta_{A_1}$ depends on $|\varphi_A - \theta_{A_1}|$. It is maximum for $|\varphi_A - \theta_{A_1}| = 0 + k\pi/2$, and reaches zero for $|\varphi_A - \theta_{A_1}| = \pi/4 + k\pi/2$. Similar results would be obtained for Alice's $\theta_{A_0}$ polarimeter, and for Bob's polarimeters.

This fair sampling test can be implemented very simply on Alice's side by fixing $\theta_{A_1} = \theta_{A_0} = 0$. The random switching in Ekert protocol (Fig. 1 and Fig. 2) ensures that the points at 0 and $\pi/4$ are both scanned automatically. Any significant difference in the number of single counts recorded when $\varphi_A = 0$ and $\varphi_A = \pi/4$ would betray Eve's attempt to bias the sample through a biased-sampling attack on the threshold detectors. Similarly, Bob would choose $\theta_{B_1} = \theta_{B_0} = \pi/8$, and compare the number of singles when $\varphi_B = -\pi/8$ and $\varphi_B = \pi/8$.

Figure 3. Analytical results in case of a biased-sampling attack on threshold detectors in Alice's $A_1$ channel, with $E_0 = 2\Phi$ (horizontal axis from 0 to $3\pi/4$; a reference level is labeled "QM predictions"). The blue plots represent the probability to get a click in detector $A_1^+$, whereas the purple plots represent the probability to get a click in detector $A_1^-$. The probability to get a click in polarimeter $\theta_{A_1}$ thus depends on $|\varphi_A - \theta_{A_1}|$. By contrast, in case of a genuine source of entangled photons, Quantum Mechanics predicts independence.
4. Conclusion

This fair sampling test can be implemented during the production of the key and together with the Bell inequality violation check, so that it seems hard to fool it without reducing the visibility of the violation. For instance, increasing the energy of the pulses with respect to the threshold would tend to reduce the dip in the fair sampling test, but it would reduce the visibility of the correlation (weaker violation of Bell inequalities) at the same time, and it would also give rise to double counts. The combination of a Bell inequality test with a monitoring of the double counts and our local fair sampling test therefore constitutes a solid scheme against eavesdropping on an E91 protocol using a biased-sampling attack. In principle, the same idea could be implemented in other QKD protocols, by replacing the passive detectors in each channel by a device with the same efficiency that would further analyze whichever degree of freedom is used to encode the key, instead of simply feeding detectors with it.^b
Acknowledgments

We are grateful to Hoi-Kwong Lo, Jan-Åke Larsson, Takashi Matsuoka and Masanori Ohya for useful discussions on Quantum Key Distribution.

References

1. A. K. Ekert, Phys. Rev. Lett. 67, 661 (Aug 1991).
2. N. Gisin, G. Ribordy, W. Tittel and H. Zbinden, Rev. Mod. Phys. 74, 145 (Mar 2002).
3. G. Jaeger, Quantum Information (Springer, New York, 2007).
4. V. Scarani, H. Bechmann-Pasquinucci, N. J. Cerf, M. Dušek, N. Lütkenhaus and M. Peev, Rev. Mod. Phys. 81, 1301 (Sep 2009).
5. H.-K. Lo and Y. Zhao, Quantum cryptography (2008).
6. D. Stucki, G. Ribordy, A. Stefanov, H. Zbinden, J. G. Rarity and T. Wall, Journal of Modern Optics 48, 1967 (2001).
7. P. M. Pearle, Phys. Rev. D 2, 1418 (Oct 1970).
8. A. Garg and N. D. Mermin, Phys. Rev. D 35, 3831 (Jun 1987).
9. P. H. Eberhard, Phys. Rev. A 47, R747 (Feb 1993).
10. J.-Å. Larsson, Physics Letters A 256, 245 (1999).
11. N. Gisin and B. Gisin, Physics Letters A 260, 323 (1999).
12. A. Ekert, Less reality, more security (September 2009).
13. Y. Zhao, C.-H. F. Fung, B. Qi, C. Chen and H.-K. Lo, Phys. Rev. A 78, 042333 (Oct 2008).
14. G. F. Knoll, Radiation Detection and Measurement (Wiley & Sons, 1999).
15. G. Adenier, Violation of Bell inequalities as a violation of fair sampling in threshold detectors, in Foundations of Probability and Physics 5, Vol. 1101 (AIP, 2009).
16. K. J. Resch, J. S. Lundeen and A. M. Steinberg, Phys. Rev. A 63, 020102 (Jan 2001).
17. B. Qi, L. Qian and H.-K. Lo, A brief introduction of quantum cryptography for engineers (2010).
^b The use of four detectors on each side can also serve other purposes, such as shielding Alice and Bob from a time-shift attack (Ref. 17).
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 413-426)
BROWNIAN DYNAMICS SIMULATION OF MACROMOLECULE DIFFUSION IN A PROTOCELL

TADASHI ANDO
Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318-5304, USA

JEFFREY SKOLNICK
Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318-5304, USA

The interiors of all living cells are highly crowded with macromolecules, which makes the thermodynamics and kinetics of biological reactions in vivo differ considerably from those in vitro. For example, the diffusion of green fluorescent protein (GFP) in E. coli is ~10-fold slower than in dilute conditions. In this study, we performed Brownian dynamics (BD) simulations of rigid macromolecules in a crowded environment mimicking the cytosol of E. coli to study the motions of macromolecules. The simulation systems contained 35 70S ribosomes, 750 glycolytic enzymes, 75 GFPs, and 392 tRNAs in a 100 nm x 100 nm x 100 nm simulation box, where the macromolecules were represented by rigid objects built from one bead per amino acid or four beads per nucleotide. Diffusion tensors of these molecules in dilute solutions were estimated using a hydrodynamic theory to take into account the diffusion anisotropy of arbitrarily shaped objects in the BD simulations. BD simulations of the system where each macromolecule is represented by a sphere with its Stokes radius were also performed for comparison. Excluded volume effects greatly reduce the mobility of molecules in crowded environments for both molecular-shaped and equivalent sphere systems. Additionally, there were no significant differences in the reduction of diffusivity over the entire range of molecular size between the two systems. However, the ratio of the diffusion coefficient of GFP in these systems to its dilute-solution value was still 4-5 times larger than that measured in vivo. We will discuss other plausible factors that might cause the large reduction in diffusion in vivo.
1. Introduction

One of the most characteristic features of the interiors of all living cells is the extremely high total concentration of biological macromolecules. Typically, 20-30% of the total volume of cytoplasm is occupied by a variety of proteins, nucleic acids and other macromolecules. Under these conditions, the distance between neighboring proteins is comparable to the protein size itself, though the molar concentration of each protein ranges from nM to μM. In this crowded, heterogeneous environment, biomolecules work to maintain living systems, and they have evolved over several billion years. Therefore, modeling the crowded cellular environment is not only an important first step toward whole cell simulation but also a crucial factor in understanding the nature of living systems. In this study, we performed Brownian dynamics (BD) simulations of rigid macromolecules in a crowded environment mimicking the cytosol of E. coli to study the motions of macromolecules. BD simulations using an equivalent sphere system, where macromolecules were represented by spheres with their Stokes radii, were also performed. It has been reported that the diffusion of green fluorescent protein (GFP) in E. coli is ~10-fold slower than in dilute conditions (1, 2). Our aim is to investigate the mechanism(s) that cause this large reduction in diffusion in vivo.

2. Methods
2.1. Estimation of diffusion tensor of a macromolecule from atomic structure

To account for the diffusion anisotropy of macromolecules in our simulation, the diffusion tensors of macromolecules were calculated by using the rigid-particle formalism method (3-5). Here, we describe this approach briefly. The diffusion of an arbitrarily shaped object undergoing Brownian motion is expressed by a 6 x 6 diffusion tensor, $\mathbf{D}$, which is related to a frictional or resistance tensor, $\mathbf{\Xi}$, through the generalized Einstein relationship, $\mathbf{D} = k_B T\, \mathbf{\Xi}^{-1}$. Both $\mathbf{D}$ and $\mathbf{\Xi}$ can be partitioned into 3 x 3 sub-matrices, which correspond to translation (tt), rotation (rr), and translation-rotation coupling (tr and rt) tensors:

$$\mathbf{D} = \begin{pmatrix} \mathbf{D}_{tt} & \mathbf{D}_{rt} \\ \mathbf{D}_{tr} & \mathbf{D}_{rr} \end{pmatrix} = k_B T \begin{pmatrix} \mathbf{\Xi}_{tt} & \mathbf{\Xi}_{rt} \\ \mathbf{\Xi}_{tr} & \mathbf{\Xi}_{rr} \end{pmatrix}^{-1}, \tag{1}$$

where

$$\mathbf{D}_{rt} = \mathbf{D}_{tr}^{T}, \qquad \mathbf{\Xi}_{rt} = \mathbf{\Xi}_{tr}^{T}. \tag{2}$$

Here, the superscript T indicates transposition. Translational and rotational diffusion coefficients in a dilute solution are given by

$$D_0^{tt} = \tfrac{1}{3}\, \mathrm{Tr}(\mathbf{D}_{tt}), \tag{3}$$

$$D_0^{rr} = \tfrac{1}{3}\, \mathrm{Tr}(\mathbf{D}_{rr}), \tag{4}$$

where Tr is the trace of the tensor.
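The components of $\mathbf{\Xi}$ can be obtained by the following procedure. From the Cartesian coordinates of the object consisting of N beads with the same radius, a, the 3 x 3 hydrodynamic interaction tensors between beads i and j, $\mathbf{T}_{ij}$ (i, j = 1, ..., N), are calculated using the expression formulated by Rotne and Prager (6) and Yamakawa (7), the so-called RPY tensor,

$$\mathbf{T}_{ij} = \begin{cases} \dfrac{1}{8\pi\eta r_{ij}} \left[ \left(1 + \dfrac{2a^2}{3 r_{ij}^2}\right) \mathbf{I} + \left(1 - \dfrac{2a^2}{r_{ij}^2}\right) \dfrac{\mathbf{r}_{ij}\mathbf{r}_{ij}^{T}}{r_{ij}^2} \right], & r_{ij} \geq 2a, \\[2ex] \dfrac{1}{6\pi\eta a} \left[ \left(1 - \dfrac{9 r_{ij}}{32 a}\right) \mathbf{I} + \dfrac{3}{32 a} \dfrac{\mathbf{r}_{ij}\mathbf{r}_{ij}^{T}}{r_{ij}} \right], & r_{ij} < 2a. \end{cases} \tag{5}$$

Here, $\eta$ is the viscosity of the solvent, $\mathbf{r}_{ij}$ is the distance vector between beads i and j, $r_{ij}$ is its length, and $\mathbf{I}$ is the 3 x 3 identity matrix. It is important to note that the bead radius is the only parameter to be optimized to reproduce hydrodynamic properties in dilute conditions. In what follows, we ignore intermolecular hydrodynamic interactions. Now, consider a 3N x 3N supermatrix, $\mathbf{B}$, consisting of N x N blocks $\mathbf{B}_{ij}$ at an arbitrary origin O,

$$\mathbf{B} = \begin{pmatrix} \mathbf{B}_{11} & \cdots & \mathbf{B}_{1N} \\ \vdots & \ddots & \vdots \\ \mathbf{B}_{N1} & \cdots & \mathbf{B}_{NN} \end{pmatrix}, \qquad \mathbf{B}_{ij} = \delta_{ij} \frac{1}{6\pi\eta a} \mathbf{I} + (1 - \delta_{ij})\, \mathbf{T}_{ij}. \tag{6}$$

Here, $\delta_{ij}$ is the Kronecker delta function. This supermatrix is then inverted to obtain a 3N x 3N supermatrix, $\mathbf{C}$,

$$\mathbf{C} = \mathbf{B}^{-1} = \begin{pmatrix} \mathbf{C}_{11} & \cdots & \mathbf{C}_{1N} \\ \vdots & \ddots & \vdots \\ \mathbf{C}_{N1} & \cdots & \mathbf{C}_{NN} \end{pmatrix}, \tag{7}$$

in which each $\mathbf{C}_{ij}$ block is a 3 x 3 matrix. Now, the elements of $\mathbf{\Xi}$ can be written as

$$\mathbf{\Xi}_{tt} = \sum_i \sum_j \mathbf{C}_{ij}, \qquad \mathbf{\Xi}_{tr} = \sum_i \sum_j \mathbf{U}_i \cdot \mathbf{C}_{ij}, \qquad \mathbf{\Xi}_{rr} = -\sum_i \sum_j \mathbf{U}_i \cdot \mathbf{C}_{ij} \cdot \mathbf{U}_j, \tag{8}$$

where

$$\mathbf{U}_i = \begin{pmatrix} 0 & -z_i & y_i \\ z_i & 0 & -x_i \\ -y_i & x_i & 0 \end{pmatrix}. \tag{9}$$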
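Here, $x_i$, $y_i$, and $z_i$ are the components of the position vector of bead i with respect to origin O. So far the choice of the origin of the coordinates has been left arbitrary. However, the diffusion tensor, $\mathbf{D}$, especially its translational and translation-rotation coupling tensors, depends on the origin. At a certain origin, the so-called center of diffusion Q, the translational diffusion coefficient reaches a minimum. The position of Q with respect to the arbitrary origin O is calculated with the diffusion tensor obtained at O, $\mathbf{D}_O$, as

$$\mathbf{r}_{OQ} = \begin{pmatrix} x_{OQ} \\ y_{OQ} \\ z_{OQ} \end{pmatrix} = \begin{pmatrix} D_{rr,O}^{yy} + D_{rr,O}^{zz} & -D_{rr,O}^{xy} & -D_{rr,O}^{xz} \\ -D_{rr,O}^{xy} & D_{rr,O}^{xx} + D_{rr,O}^{zz} & -D_{rr,O}^{yz} \\ -D_{rr,O}^{xz} & -D_{rr,O}^{yz} & D_{rr,O}^{xx} + D_{rr,O}^{yy} \end{pmatrix}^{-1} \begin{pmatrix} D_{tr,O}^{yz} - D_{tr,O}^{zy} \\ D_{tr,O}^{zx} - D_{tr,O}^{xz} \\ D_{tr,O}^{xy} - D_{tr,O}^{yx} \end{pmatrix}. \tag{10}$$

The components of $\mathbf{D}$ at the center of diffusion Q are calculated by

$$\begin{aligned} \mathbf{D}_{tt,Q} &= \mathbf{D}_{tt,O} - \mathbf{U}_{OQ} \cdot \mathbf{D}_{rr,O} \cdot \mathbf{U}_{OQ} + \mathbf{D}_{rt,O} \cdot \mathbf{U}_{OQ} - \mathbf{U}_{OQ} \cdot \mathbf{D}_{tr,O}, \\ \mathbf{D}_{rt,Q} &= \mathbf{D}_{rt,O} - \mathbf{U}_{OQ} \cdot \mathbf{D}_{rr,O}, \\ \mathbf{D}_{rr,Q} &= \mathbf{D}_{rr,O}, \end{aligned} \tag{11}$$

where

$$\mathbf{U}_{OQ} = \begin{pmatrix} 0 & -z_{OQ} & y_{OQ} \\ z_{OQ} & 0 & -x_{OQ} \\ -y_{OQ} & x_{OQ} & 0 \end{pmatrix}. \tag{12}$$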
A volume correction term for rotational and intrinsic viscosity estimation is applied in Eq. 8 in some studies (3, 4). However, significant deviations of calculated diffusion properties from experimental values were not observed even without the correction.
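The bead-model procedure of this section can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the function names are hypothetical, the RPY tensor is taken in its standard form for equal beads (including the regularized branch for overlapping beads), no volume correction is applied, the water viscosity of 0.89 mPa s at 298 K is an assumed value, and the tensor is evaluated at the coordinate origin rather than at the center of diffusion. Lengths are in Å so that the 6 x 6 matrix stays well scaled.

```python
import numpy as np

KB = 1.380649e-23  # Boltzmann constant (J/K)


def rpy_times_eta(rij, a):
    """RPY interaction tensor multiplied by the viscosity (units: 1/length)."""
    r = np.linalg.norm(rij)
    outer = np.outer(rij, rij)
    eye = np.eye(3)
    if r >= 2.0 * a:
        # Non-overlapping beads.
        return ((1.0 + 2.0 * a**2 / (3.0 * r**2)) * eye
                + (1.0 - 2.0 * a**2 / r**2) * outer / r**2) / (8.0 * np.pi * r)
    # Overlapping beads: regularized branch, continuous at r = 2a.
    return ((1.0 - 9.0 * r / (32.0 * a)) * eye
            + 3.0 * outer / (32.0 * a * r)) / (6.0 * np.pi * a)


def cross_matrix(p):
    """Antisymmetric matrix U built from a position vector (x, y, z)."""
    x, y, z = p
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])


def diffusion_tensor(coords, a, temperature=298.0, eta=0.89e-3):
    """6 x 6 diffusion tensor at the coordinate origin.

    coords: (N, 3) bead positions in angstrom; a: bead radius in angstrom.
    The tt block is in A^2/ns and the rr block in rad^2/ns.
    Needs at least three non-collinear beads, otherwise Xi is singular.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    B = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(n):
            if i == j:
                block = np.eye(3) / (6.0 * np.pi * a)
            else:
                block = rpy_times_eta(coords[i] - coords[j], a)
            B[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
    C = np.linalg.inv(B)

    xi = np.zeros((6, 6))  # friction tensor divided by the viscosity
    for i in range(n):
        Ui = cross_matrix(coords[i])
        for j in range(n):
            Uj = cross_matrix(coords[j])
            Cij = C[3 * i:3 * i + 3, 3 * j:3 * j + 3]
            xi[:3, :3] += Cij            # tt block
            xi[3:, :3] += Ui @ Cij       # tr block
            xi[3:, 3:] -= Ui @ Cij @ Uj  # rr block
    xi[:3, 3:] = xi[3:, :3].T            # rt block is the transpose of tr

    kT_over_eta = KB * temperature / eta * 1e21  # m^3/s converted to A^3/ns
    return kT_over_eta * np.linalg.inv(xi)


if __name__ == "__main__":
    a = 6.1
    tetra = (a / np.sqrt(2.0)) * np.array(
        [[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float)
    D = diffusion_tensor(tetra, a)
    print("D0_tt (A^2/ns):", np.trace(D[:3, :3]) / 3.0)
    print("D0_rr (1/ns):", np.trace(D[3:, 3:]) / 3.0)
```

The usage example builds a regular tetrahedron of four touching beads centered on the origin, a geometry for which the coupling blocks vanish by symmetry; for a real structure the result would still have to be shifted to the center of diffusion.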
2.2. Brownian dynamics for arbitrarily shaped objects

BD is one of the most important simulation approaches to investigate the Brownian motion of arbitrarily shaped objects, in which solvent molecules are treated implicitly and the influence of solvent on solute particles is incorporated through frictional and stochastic forces (8). In the high-friction limit, where it is assumed that momentum relaxation is much faster than position relaxation, and when we treat the diffusion tensor as a constant, a BD propagation scheme for an arbitrarily shaped object can be written as (9)

$$\mathbf{X}_i = \mathbf{X}_i^0 + \frac{\Delta t}{k_B T}\, \mathbf{D}_i \cdot \mathbf{F}_i^P + \mathbf{g}_i(\Delta t), \tag{13}$$

where $\Delta t$ is the time step and $\mathbf{X}_i$ is the vector describing the position of the center of diffusion and the orientation of the i-th object,

$$\mathbf{X} = (r_1, r_2, r_3, \varphi_1, \varphi_2, \varphi_3)^{T}. \tag{14}$$

Here, $r_1$, $r_2$, and $r_3$ are the position of the center of diffusion, and $\varphi_1$, $\varphi_2$, $\varphi_3$ describe its orientation. $\mathbf{F}^P$ is a generalized system force having two components, the force acting on the center of diffusion ($\mathbf{f}$) and the torque ($\boldsymbol{\tau}$):

$$\mathbf{F}^P = (f_1, f_2, f_3, \tau_1, \tau_2, \tau_3)^{T}. \tag{15}$$

$\mathbf{g}(\Delta t)$ is a 6 x 1 random displacement vector during time step $\Delta t$ due to the Brownian noise, which satisfies the following relations:

$$\langle \mathbf{g}(\Delta t) \rangle = 0, \qquad \langle \mathbf{g}(\Delta t)\, \mathbf{g}(\Delta t)^{T} \rangle = 2 \mathbf{D}\, \Delta t. \tag{16}$$

Here $\mathbf{D}_i$ is the 6 x 6 diffusion tensor of object i at the center of diffusion. Once this diffusion tensor is calculated as described above, we can compute the random displacement vector using a Cholesky decomposition technique (9). The Cholesky decomposition of the diffusion tensor $\mathbf{D}$ is determined as

$$\mathbf{D} = \mathbf{S} \cdot \mathbf{S}^{T}, \tag{17}$$

where $\mathbf{S}$ is a lower triangular matrix. The desired vector $\mathbf{g}(\Delta t)$ is then obtained by the following:

$$\mathbf{g}(\Delta t) = \sqrt{2 \Delta t}\; \mathbf{S} \cdot \mathbf{Z}, \tag{18}$$

where $\mathbf{Z}$ is a 6 x 1 vector, which has elements chosen from a Gaussian distribution so that

$$\langle \mathbf{Z} \rangle = 0, \qquad \langle \mathbf{Z}\, \mathbf{Z}^{T} \rangle = \mathbf{I}. \tag{19}$$

In BD simulations, quaternions, $q = (q_0, q_1, q_2, q_3)$, were used for handling rotations of rigid objects (8). Diffusion tensors of objects were evaluated in body-fixed frames only once at the start of the simulation. The force and torque on each object calculated in the laboratory or space-fixed frame ($\mathbf{f}^s$ or $\boldsymbol{\tau}^s$) were converted to their body-fixed frame ($\mathbf{f}^b$ or $\boldsymbol{\tau}^b$) using the rotation matrix $\mathbf{Q}$ obtained with quaternions,

$$\mathbf{f}^b = \mathbf{Q} \cdot \mathbf{f}^s, \qquad \boldsymbol{\tau}^b = \mathbf{Q} \cdot \boldsymbol{\tau}^s. \tag{20}$$

For each step, quaternions were scaled using Lagrange's method of undetermined multipliers to satisfy $q^2 = 1$ (10).
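A single propagation step of this scheme can be sketched as below. This is a simplified illustration with hypothetical function names, not the authors' implementation: it assumes the random displacement is $\sqrt{2\Delta t}\,\mathbf{S}\cdot\mathbf{Z}$ with $\mathbf{D} = \mathbf{S}\cdot\mathbf{S}^T$ and unit-variance Gaussian $\mathbf{Z}$, and it adds the three rotational components directly to the state vector, which is only meaningful for small rotations. The quaternion bookkeeping and frame conversion described in the text are left out.

```python
import numpy as np


def random_displacement(D, dt, rng):
    """Draw g(dt) with zero mean and covariance 2 * D * dt via Cholesky factorization."""
    S = np.linalg.cholesky(D)            # lower triangular, D = S @ S.T
    Z = rng.standard_normal(D.shape[0])  # independent unit-variance Gaussian numbers
    return np.sqrt(2.0 * dt) * (S @ Z)


def bd_step(X, D, F, dt, kT, rng):
    """One high-friction BD step for a rigid object with constant diffusion tensor.

    X: 6-vector (position of the center of diffusion, small rotation angles)
    D: 6 x 6 diffusion tensor, F: 6-vector (force, torque), kT: thermal energy.
    """
    drift = (dt / kT) * (D @ F)
    return X + drift + random_displacement(D, dt, rng)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = np.diag([10.0, 10.0, 10.0, 0.1, 0.1, 0.1])  # illustrative values only
    X = np.zeros(6)
    F = np.zeros(6)
    for _ in range(5):
        X = bd_step(X, D, F, dt=0.5e-3, kT=1.0, rng=rng)
    print(X)
```

Because the diffusion tensor is constant in the body-fixed frame, the Cholesky factor could be computed once per object and reused; it is recomputed here only to keep the sketch short.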
2.3. Potential function

In this study, we considered only repulsive interactions between particles belonging to different molecules in BD simulations, using a soft-sphere potential described by

$$V_{ss} = \begin{cases} k_{ss}\,(r_{ij} - r_m)^2, & r_{ij} < r_m, \\ 0, & r_{ij} \geq r_m, \end{cases} \tag{21}$$

where $r_{ij}$ is the distance between particles i and j, and $k_{ss}$ is a force constant. $r_m$ is $a_i + a_j + \Delta$, in which $a_i$ and $a_j$ are the radii of particles i and j and $\Delta$ is an arbitrary parameter representing a buffer distance between particles. In this study, $\Delta$ of 2 Å and $k_{ss}$ of $5 k_B T/\Delta^2$ were used, which means $V_{ss} = 5 k_B T$ at the distance $r_{ij} = a_i + a_j$.
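The stated parameters can be checked with a few lines of code. The sketch below assumes a purely harmonic repulsion $V_{ss} = k_{ss}(r_{ij} - r_m)^2$ for $r_{ij} < r_m$ and zero otherwise, which is the form consistent with $V_{ss} = 5 k_B T$ at contact; the function names are hypothetical, and energies are expressed in units of $k_B T$ and distances in Å.

```python
def soft_sphere_energy(r_ij, a_i, a_j, delta=2.0, k_ss=None):
    """Soft-sphere repulsion in units of kT (distances in angstrom)."""
    if k_ss is None:
        k_ss = 5.0 / delta**2        # 5 kT / delta^2, as stated in the text
    r_m = a_i + a_j + delta          # range of the repulsion
    if r_ij >= r_m:
        return 0.0
    return k_ss * (r_ij - r_m) ** 2


def soft_sphere_force(r_ij, a_i, a_j, delta=2.0, k_ss=None):
    """Magnitude of the repulsive force along r_ij, -dV/dr, in kT per angstrom."""
    if k_ss is None:
        k_ss = 5.0 / delta**2
    r_m = a_i + a_j + delta
    if r_ij >= r_m:
        return 0.0
    return 2.0 * k_ss * (r_m - r_ij)


if __name__ == "__main__":
    # Two beads of radius 6.1 angstrom at contact.
    print(soft_sphere_energy(12.2, 6.1, 6.1), soft_sphere_force(12.2, 6.1, 6.1))
```

With these assumptions, two beads of radius 6.1 Å feel no interaction beyond 14.2 Å, and the energy rises to $5 k_B T$ when their surfaces touch at 12.2 Å.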
2.4. Simulation conditions and analysis
All simulations were performed at 298 K with periodic boundary conditions. For all simulation systems, ten independent simulations were run with different initial configurations. Simulations of 35 μs were performed with a time step of 0.5 ps. Configurations of the systems were sampled every 1 ns. Trajectories for the first 5 μs were discarded from the analysis. The translational diffusion coefficient of a particle in three dimensions is estimated by

$$\langle [\mathbf{r}(t + \tau) - \mathbf{r}(t)]^2 \rangle = 6 D \tau, \tag{22}$$

where $\mathbf{r}(t)$ is the position of the particle at time t and $\tau$ is the time interval. $\langle \cdot \rangle$ indicates the ensemble average over particles of the same type and over time t.
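The analysis can be illustrated with a short script. This is a minimal sketch under stated assumptions, not the analysis code used in the study: the function names are hypothetical, the trajectory is assumed to be unwrapped (periodic images removed), and the diffusion coefficient is taken from a straight-line fit of the mean square displacement over a chosen range of time intervals.

```python
import numpy as np


def mean_square_displacement(traj, lag):
    """MSD at one lag, averaged over time origins and particles.

    traj: array of shape (n_frames, n_particles, 3) with unwrapped coordinates.
    """
    disp = traj[lag:] - traj[:-lag]
    return float(np.mean(np.sum(disp**2, axis=-1)))


def diffusion_coefficient(traj, dt, lags):
    """Estimate D from the slope of MSD versus time interval (MSD = 6 D tau)."""
    lags = list(lags)
    taus = np.asarray(lags, dtype=float) * dt
    msd = np.array([mean_square_displacement(traj, lag) for lag in lags])
    slope, _intercept = np.polyfit(taus, msd, 1)
    return slope / 6.0


if __name__ == "__main__":
    # Synthetic free random walk with a known diffusion coefficient.
    rng = np.random.default_rng(0)
    d_true, dt = 10.0, 1.0  # A^2/ns and ns, illustrative values
    steps = rng.normal(0.0, np.sqrt(2.0 * d_true * dt), size=(2000, 50, 3))
    traj = np.cumsum(steps, axis=0)
    print(diffusion_coefficient(traj, dt, range(1, 21)))
```

Fitting with a free intercept, rather than forcing the line through the origin, keeps the estimate meaningful when the short-time regime is anomalous and only the long-time slope is of interest.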
3. Results

3.1. Estimation of diffusion tensor of a macromolecule from atomic structure

Figure 1. Errors in translational and rotational diffusion coefficients calculated by the rigid-particle formalism as a function of bead radius (horizontal axis: Radius (Å), from 5.0 to 7.0). $D_0^{cal}$ and $D_0^{exp}$ represent the translational or rotational diffusion constant in a dilute condition calculated by theory and estimated by experiment, respectively. Values for translational (open squares) and rotational (open circles) diffusion coefficients were averaged over the twelve proteins listed in Table 2 of Ref. (4) except for lactose dehydrogenase. Lines are drawn to guide the eye.
To estimate diffusion tensors of macromolecules from their atomic structures, the rigid-particle formalism was used (3-5). As described in Methods, the bead radius is the only parameter optimized to reproduce the experimental translational and rotational diffusion coefficients of macromolecules at infinite dilution. The bead radius was optimized using twelve different proteins whose molecular weights range from 6 kDa to 230 kDa, which are the same as those used in Ref. (4) except for lactose dehydrogenase. Proteins were represented by Cα beads. Figure 1 shows the difference between experimental and calculated diffusion coefficients, averaged over the twelve proteins of Ref. (4), as a function of bead radius. A radius of 6.1 Å gave the minimal error in both translational and rotational diffusion coefficients. For example, the method provided a translational diffusion coefficient for GFP of 8.9 Å²/ns at 293 K, which is in excellent agreement with the experimental value of 8.7 Å²/ns for this protein in dilute conditions at room temperature (11, 12). In order to simulate the inside of cells, we treat not only proteins but also nucleic acids. Thus, diffusion coefficients of a Phe-tRNA (PDB ID: 1EHZ) were also evaluated using the same method. Nucleic acids were represented by P, C4', N1, and N9 beads. The bead density of the tRNA was 3.7 beads/nm³, close to the value for proteins averaged over the twelve proteins, 3.5 beads/nm³. Using the same bead radius as for proteins, the translational diffusion coefficient of the tRNA was 7.6 Å²/ns at 293 K, which was also in good agreement with the experimental value of 7.8 Å²/ns at the same temperature (13).
3.2. Construction of the intracellular environment

Table 1. Macromolecules in the simulation system

| Name | PDB ID | Molecular weight (Da) | No. of molecules in the system | Translational D0 at 298 K (Å²/ns) | Stokes radius (Å)† |
|---|---|---|---|---|---|
| 70S ribosome | 3I1Q & 3I1R | 2,155,152 | 35 | 2.13 | 115.2 |
| Hexokinase | 1Q18 | 72,504 | 75 | 7.02 | 34.9 |
| Glucose-6-phosphate isomerase | 1GZV | 126,649 | 75 | 6.12 | 40.1 |
| 6-phosphofructokinase isozyme I | 1PFK | 144,514 | 75 | 5.78 | 42.4 |
| Fructose 1-6 bisphosphate aldolase | 1DOS | 78,235 | 75 | 6.67 | 36.8 |
| Triosephosphate isomerase | 1TRE | 54,006 | 75 | 7.73 | 31.7 |
| Glyceraldehyde-3-P dehydrogenase | 1S7C | 145,013 | 75 | 5.69 | 43.1 |
| Phosphoglycerate kinase | 1ZMR | 41,287 | 75 | 8.41 | 29.2 |
| Phosphoglycerate mutase | 1E58 | 57,544 | 75 | 7.59 | 32.3 |
| Enolase | 1E9I | 91,267 | 75 | 6.82 | 35.9 |
| Pyruvate kinase | 1PKY | 203,184 | 75 | 4.63 | 52.9 |
| GFP | 1W7S | 26,936 | 75 | 10.2 | 24.0 |
| Phe-tRNA | 1EHZ | 25,203 | 196 | 8.69 | 28.2 |
| Initial tRNA | 3CW5 | 24,833 | 196 | 8.83 | 27.8 |

†Stokes radii were calculated by $D_0 = k_B T / 6\pi\eta a$, where a is the Stokes radius.
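The footnote relation can be applied directly to the tabulated diffusion coefficients. The sketch below is illustrative only: the function name is hypothetical, and the water viscosity of 0.89 mPa s at 298 K is an assumed value that is not stated in the text, although it reproduces the tabulated radii closely.

```python
import math

KB = 1.380649e-23  # Boltzmann constant (J/K)


def stokes_radius_angstrom(d0_a2_per_ns, temperature=298.0, eta=0.89e-3):
    """Stokes radius a = kT / (6 pi eta D0), with D0 given in A^2/ns."""
    d0_si = d0_a2_per_ns * 1e-11  # 1 A^2/ns = 1e-11 m^2/s
    a_m = KB * temperature / (6.0 * math.pi * eta * d0_si)
    return a_m * 1e10             # metres converted to angstrom


if __name__ == "__main__":
    for name, d0 in (("GFP", 10.2), ("70S ribosome", 2.13), ("Hexokinase", 7.02)):
        print(name, round(stokes_radius_angstrom(d0), 1))
```

Running the relation in the other direction gives the dilute-solution diffusion coefficient of the equivalent sphere used for each molecule in the sphere system.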
The size and molecular contents of E. coli have been estimated experimentally. To construct a virtual E. coli cytoplasm, we used the statistical data summarized in the CyberCell Database (14). The total cell volume of E. coli is about 1 fL and about 70% of this total volume is cytoplasm, in which the macromolecular concentration reaches 300-400 mg/ml (15). In terms of dry weight, half of that mass is protein and 20% of the protein complement of the cell is ribosomal protein. The number of cytoplasmic proteins excluding ribosomal proteins is roughly 1,000,000. The number of ribosomes, which consist of 1/3 protein and 2/3 rRNA by mass, is about 18,000, and they occupy 10% of the total cell volume. tRNAs are also abundant, and about 200,000 molecules exist in the cell. Based on these data, a protocell containing 35 70S ribosomes, 750 enzymes involved in glycolysis, which are the most abundant proteins in E. coli cells (16), 75 GFPs as a tracer protein that we can compare with experimental results, and 392 tRNAs in a 100 nm x 100 nm x 100 nm simulation box was constructed (Table 1 and Figure 2, left panel). The total concentration is 271 mg/ml and the volume occupancy reaches 41% when the radii of all beads are set to 6.1 Å. The simulation box contains 1,252 molecules and a total of 1,279,871 beads. For comparison, simulation systems where each macromolecule was represented by an equivalent sphere with its Stokes radius were also constructed (Figure 2, right panel). The volume occupancy in this system reaches 45%.
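The quoted molecule count and total concentration follow from the composition in Table 1 by simple arithmetic, as the sketch below shows. The molecular weights and copy numbers are transcribed from the table; the value 144,514 Da for 6-phosphofructokinase is a reading of a partly illegible entry, and the variable and function names are hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

# (name, molecular weight in Da, number of copies), as read from Table 1
COMPOSITION = [
    ("70S ribosome", 2_155_152, 35),
    ("Hexokinase", 72_504, 75),
    ("Glucose-6-phosphate isomerase", 126_649, 75),
    ("6-phosphofructokinase isozyme I", 144_514, 75),
    ("Fructose 1-6 bisphosphate aldolase", 78_235, 75),
    ("Triosephosphate isomerase", 54_006, 75),
    ("Glyceraldehyde-3-P dehydrogenase", 145_013, 75),
    ("Phosphoglycerate kinase", 41_287, 75),
    ("Phosphoglycerate mutase", 57_544, 75),
    ("Enolase", 91_267, 75),
    ("Pyruvate kinase", 203_184, 75),
    ("GFP", 26_936, 75),
    ("Phe-tRNA", 25_203, 196),
    ("Initial tRNA", 24_833, 196),
]


def total_molecules():
    """Total number of macromolecules in the simulation box."""
    return sum(copies for _, _, copies in COMPOSITION)


def concentration_mg_per_ml(box_edge_nm=100.0):
    """Total macromolecular concentration in a cubic box of the given edge length."""
    mass_g = sum(mw * copies for _, mw, copies in COMPOSITION) / AVOGADRO
    volume_ml = (box_edge_nm * 1e-7) ** 3  # 1 nm = 1e-7 cm and 1 cm^3 = 1 ml
    return mass_g / volume_ml * 1e3


if __name__ == "__main__":
    print(total_molecules(), round(concentration_mg_per_ml(), 1))
```

The same bookkeeping can be used to test how the concentration changes if the box size or the copy numbers are varied.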
Figure 2. Molecular-shaped (left) and equivalent sphere (right) systems. Macromolecules are represented in different colors. These figures were generated by VMD (17).
3.3. Effect of molecular shapes on diffusion

Next, we performed BD simulations using two systems that resemble the E. coli cytoplasm: molecular-shaped and equivalent sphere systems. In order to compare the diffusivity of macromolecules between the two systems as well as with experiments, we concentrate on the analysis of the translational diffusion of molecules. Hereafter, diffusion refers to translational diffusion. Mean square displacements (MSD) for several macromolecules in both systems as a function of time are shown in Figure 3. For both systems, as in other simulation studies on diffusion in cytosol-like systems, a crossover from anomalous diffusion to normal diffusion was observed for all molecules at short times (18-20). Subsequently, a linear relationship between time and MSD is clearly observed at long times for all molecules, and the diffusion coefficients quickly converged. In Figure 4, the ratio of the long-time translational diffusion coefficient observed in the virtual cytoplasm system to that estimated in dilute solution, $D/D_0$, is shown as a function of Stokes radius. For both systems, the ratio $D/D_0$ decreased with increasing molecular radius, which is qualitatively consistent with other simulation studies (18-20) and experimental results on eukaryotic cells (21). The results of the explicit molecular-shaped and the equivalent sphere systems are very close over the entire range of radii. These results suggest that effects of macromolecular shape on molecular diffusion in crowded environments are small and that the one sphere per molecule model is a reasonable approximation for the analysis of in vivo diffusion. In experiments, the ratio $D/D_0$ of GFP is about 0.06-0.09 (11, 12) (see also Figure 4). On the other hand, the $D/D_0$ values of GFP obtained in the molecular-shaped and sphere systems are 0.4 and 0.42, 5-7 times larger than experiment. This result indicates that (consistent with other studies (18-20)) although excluded volume effects greatly reduce the diffusion rate of macromolecules in intracellular environments, they cannot explain the factor of ~10-16 reduction in diffusion constant observed in vivo.

Figure 3. MSDs of three macromolecules in the sphere system (a, c) and the molecular-shaped system (b, d) as a function of time interval, τ (horizontal axis: Time interval (ns)). MSDs up to 25 μs are shown in the upper graphs (a, b). MSDs over the short-time range are shown in the lower graphs (c, d). Solid lines in the lower graphs are fitted lines in the time range from 10 to 25 μs. GAPDH represents glyceraldehyde-3-P dehydrogenase.
Figure 4. The reduction in long-time diffusion coefficient, $D/D_0$, as a function of radius (horizontal axis: Radius (Å), from 20 to 120) in simulations of the sphere and molecular-shaped systems with steric repulsion. Open circles and filled circles are the values for the sphere system and the molecular-shaped system, respectively. Three values of the reduction in diffusion of GFP measured in vivo in DH5α (1, 2), BL21(DE3) (1, 2), and K-12 (1, 2) E. coli are also shown for comparison. The values corresponding to GFP diffusivity are surrounded by the dashed line to guide the eye.
4. Discussion

To investigate a possible cause of the large reduction in diffusion in vivo, we performed BD simulations on two different types of systems: a molecular-shaped system and an equivalent sphere system. Our simulation results provide two important conclusions. First, excluded volume effects are insufficient to explain the large reduction in diffusion of macromolecules observed in vivo. Second, the one sphere per molecule model is a good approximation to describe macromolecular diffusion in intracellular environments. In addition to the excluded volume effects, a number of other factors can affect the nature and magnitude of intracellular diffusion.

1) Hydrodynamic interactions (HI). Effects of HI on the dynamics of particles have been well studied in the field of colloidal suspensions. Simulation studies showed that HI greatly reduce the diffusivity of monodisperse colloids, especially in dense systems (22, 23). Evaluating HI for an N-particle system is computationally expensive, scaling as $O(N^3)$ (22). Therefore, it is very difficult to consider HI in BD of molecular-shaped systems. However, our demonstration of the relatively minor shape effect enables HI to be calculated in a computationally tractable manner for the equivalent sphere system.

2) Non-specific interactions. One fourth of the surface residues of proteins are hydrophobic (24), which could give rise to attractive interactions between molecules. In addition, electrostatic interactions might be important even though they are well screened under physiological conditions (the Debye length is ~8 Å).

3) Viscosity of the cytoplasm. In our simulations, the viscosity of water was used as a parameter in calculating diffusion tensors of molecules and in the simulations. The in vivo viscosity has been measured by various methods, which indicated that the viscosity of the cytoplasmic medium is not significantly larger than that of bulk water (21).

4) GFP dimerization. It is well known that GFP tends to dimerize in solutions of low (< 100 mM) ionic strength (25).

All of these physical factors will decrease macromolecular diffusivity in vivo. These factors should be examined in further work.

5. Conclusions

Until now, little attention has been paid to the biophysical properties of crowded environments, which have a great impact on biological reactions taking place inside cells. Therefore, modeling these crowding effects is an important first step towards whole cell simulation. In that spirit, by conducting a series of Brownian dynamics simulations, the following conclusions were obtained: First, although excluded volume effects can significantly reduce the diffusivity of macromolecules, this effect is insufficient to explain the large reduction that is observed in vivo. Second, representing a macromolecule by a single equivalent sphere is a reasonable approximation for analyzing diffusion of macromolecules in vivo. Thus, in future work, using our protocell, we shall explore the role of hydrodynamic and electrostatic interactions in the reduction of in vivo diffusivity.

Acknowledgements

This work was supported in part by grant No. GM-37408 of the Division of General Medical Sciences of the National Institutes of Health.

References
1. Elowitz MB, Surette MG, Wolf PE, Stock JB, & Leibler S (1999) Protein mobility in the cytoplasm of Escherichia coli. J Bacteriol 181(1):197-203.
2. Konopka MC, Shkel IA, Cayley S, Record MT, & Weisshaar JC (2006) Crowding and confinement effects on protein diffusion in vivo. J Bacteriol 188(17):6115-6123.
3. Carrasco B & Garcia de la Torre J (1999) Hydrodynamic properties of rigid particles: comparison of different modeling and computational procedures. Biophys J 76(6):3044-3057.
4. Garcia De La Torre J, Huertas ML, & Carrasco B (2000) Calculation of hydrodynamic properties of globular proteins from their atomic-level structure. Biophys J 78(2):719-730.
5. Garcia De La Torre J, Jimenez A, & Freire JJ (1982) Monte-Carlo Calculation of Hydrodynamic Properties of Freely Jointed, Freely Rotating, and Real Polymethylene Chains. Macromolecules 15(1):148-154.
6. Rotne J & Prager S (1969) Variational Treatment of Hydrodynamic Interaction in Polymers. J Chem Phys 50(11):4831-4837.
7. Yamakawa H (1970) Transport Properties of Polymer Chains in Dilute Solution - Hydrodynamic Interaction. J Chem Phys 53(1):436-443.
8. Allen MP & Tildesley DJ (1987) Computer simulation of liquids (Oxford University Press, Oxford New York).
9. Ermak DL & McCammon JA (1978) Brownian Dynamics with Hydrodynamic Interactions. J Chem Phys 69(4):1352-1360.
10. Hiyama M, Kinjo T, & Hyodo S (2008) Angular momentum form of Verlet algorithm for rigid molecules. J Phys Soc Jpn 77(6):064001.
11. Swaminathan R, Hoang CP, & Verkman AS (1997) Photobleaching recovery and anisotropy decay of green fluorescent protein GFP-S65T in solution and cells: Cytoplasmic viscosity probed by green fluorescent protein translational and rotational diffusion. Biophysical Journal 72(4):1900-1907.
12. Terry BR, Matthews EK, & Haseloff J (1995) Molecular Characterization of Recombinant Green Fluorescent Protein by Fluorescence Correlation Microscopy. Biochem Bioph Res Co 217(1):21-27.
13. Potts R, Fournier MJ, & Ford NC (1977) Effect of Aminoacylation on Conformation of Yeast Phenylalanine Transfer-RNA. Nature 268(5620):563-564.
14. Sundararaj S, et al. (2004) The CyberCell Database (CCDB): a comprehensive, self-updating, relational database to coordinate and facilitate in silico modeling of Escherichia coli. Nucleic Acids Res 32:D293-D295.
15. Zimmerman SB & Trach SO (1991) Estimation of Macromolecule Concentrations and Excluded Volume Effects for the Cytoplasm of Escherichia-Coli. J Mol Biol 222(3):599-620.
16. Ishihama Y, et al. (2008) Protein abundance profiling of the Escherichia coli cytosol. BMC Genomics 9:102.
17. Humphrey W, Dalke A, & Schulten K (1996) VMD: Visual molecular dynamics. J Mol Graphics 14(1):33-38.
18. McGuffee SR & Elcock AH (2010) Diffusion, crowding & protein stability in a dynamic molecular model of the bacterial cytoplasm. PLoS Comput Biol 6(3):e1000694.
19. Ridgway D, et al. (2008) Coarse-grained molecular simulation of diffusion and reaction kinetics in a crowded virtual cytoplasm. Biophysical Journal 94(10):3748-3759.
20. Roberts E, Stone JE, Sepulveda L, Hwu W-MW, & Luthey-Schulten Z (2009) Long time-scale simulations of in vivo diffusion using GPU hardware. in Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing (IEEE Computer Society), pp 1-8.
21. Luby-Phelps K (2000) Cytoarchitecture and physical properties of cytoplasm: Volume, viscosity, diffusion, intracellular surface area. Int Rev Cytol 192:189-221.
22. Brady JF & Bossis G (1988) Stokesian Dynamics. Annu Rev Fluid Mech 20:111-157.
23. Phillips RJ, Brady JF, & Bossis G (1988) Hydrodynamic Transport-Properties of Hard-Sphere Dispersions. 1. Suspensions of Freely Mobile Particles. Phys Fluids 31(12):3462-3472.
24. Lu H, Lu L, & Skolnick J (2003) Development of unified statistical potentials describing protein-protein interactions. Biophys J 84(3):1895-1901.
25. Yang F, Moss LG, & Phillips GN (1996) The molecular structure of green fluorescent protein. Nat Biotechnol 14(10):1246-1251.
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 427-436)
SIGNALING NETWORK OF ENVIRONMENTAL SENSING AND ADAPTATION IN PLANTS: KEY ROLES OF CALCIUM ION
TAKAMITSU KURUSU 1 AND KAZUYUKI KUCHITSU 1,2,3
1 Research Institute for Science and Technology, 2 Department of Applied Biological Science, 3 Quantum Bio-Informatics Center, Tokyo University of Science, 2641 Yamazaki, Noda, Chiba 278-8510, Japan; E-mail: [email protected]
Humans depend heavily on plants in addressing the major issues of food, environment, and energy that we face in the 21st century. Unlike animals, which can move away from an unfavorable environment, plants are rooted in place; they rapidly sense diverse environmental changes or invasion by other organisms such as pathogens and insects, and adapt by changing their own bodies, which is how they have developed their adaptability. The whole genetic information corresponding to the blueprints of many biological systems has recently been analyzed, and comparative genomic studies have made it possible to trace the strategies of each organism in its evolutionary process. Comparison of factors involved in intracellular signal transduction between animals and plants indicates diversification of different gene sets. Reversible binding of Ca2+ to sensor proteins plays a key role as a molecular switch in both animals and plants. Molecular mechanisms of the signaling network for environmental sensing and adaptation in plants are discussed, with special reference to Ca2+ as a key element in information processing.
Key words: Calcium ion; Molecular switch; Plant immunity; Signal transduction
1. Calcium ion as a key element in information processing [1-3]
Living organisms perceive external information through proteins called receptors and control gene expression and other cellular functions through a series of signal transduction mechanisms. In these signal transmission processes, low-molecular-weight substances and ions, collectively called second messengers, play important roles. Calcium ions (Ca2+) are involved in the transmission of diverse stimuli, such as environmental stresses including drought and low temperature, mechanical stimulation, infection by pathogens and symbiosis with microorganisms, and responses to plant hormones. In the biological processes for sensing and adapting to various environmental changes and stresses, changes in intracellular Ca2+ concentration are induced and act as a second messenger in information processing. For an intracellular substance to transmit information, its concentration should remain low in the resting state, rise in response to stimulation, and then return to the initial level after the stimulus has been transmitted. The cytoplasmic Ca2+ level is maintained at about 10-100 nM in a
normal state, whereas the extracellular level is maintained in the mM range, forming a Ca2+ concentration gradient of several tens of thousands-fold between the inside and outside of the cell. The vacuole, which occupies most of the volume of plant cells, as well as other intracellular organelles such as the endoplasmic reticulum (ER), contains Ca2+ at a level similar to the extracellular level and functions as an intracellular Ca2+ storage site. In addition, Ca2+ is also present at a high level in mitochondria and chloroplasts. Accordingly, an influx of a small amount of Ca2+ into the cytosol, from either outside the cell or intracellular organelles and mediated by transmembrane Ca2+ channels, markedly alters the cytosolic free Ca2+ concentration ([Ca2+]cyt), which enables Ca2+ to act as an element of information processing in molecular switch mechanisms. Characteristic temporal and spatial patterns of intracellular Ca2+ concentration changes in response to various stimuli are controlled by diverse Ca2+-permeable channels. How are these spatio-temporal patterns of Ca2+ levels distinguished in cells? Sequencing of the entire genomes of model plants such as Arabidopsis and rice revealed the presence of many types of Ca2+-binding proteins in plants. Many of them are considered to play important roles as Ca2+ sensors in the interpretation of information carried by spatio-temporal patterns of [Ca2+]cyt.

2. Ca2+-mediated signaling and plant immunity [4-6]
Plant cells recognize pathogenic infection using specific receptors for pathogen-derived signaling molecules called microbe/pathogen-associated molecular patterns (MAMPs/PAMPs) or elicitors [4,5]. Within a few minutes after the recognition of these signaling molecules from pathogens, characteristic early signaling events occur, including an influx of Ca2+ and H+, an efflux of K+ and Cl-, membrane potential changes (typically transient membrane depolarization), production of reactive oxygen species (ROS), and activation of mitogen-activated protein kinases (MAPKs) [7-11]. These initial responses are followed by the production of antimicrobial substances and the induction of programmed cell death, which is a crucial event in preventing the spread of biotrophic pathogens [12] (Fig. 1). These downstream events are often prevented when Ca2+ influx is compromised by either Ca2+ chelators, such as ethylene glycol tetraacetic acid (EGTA), or Ca2+-channel blockers, such as La3+, suggesting the importance of Ca2+ influx in the induction of these responses [13,14]. Here we focus on the formation of spatio-temporal patterns of changes in [Ca2+]cyt and on the decoding of Ca2+-mediated signals by Ca2+ sensors, and discuss the relationship between [Ca2+]cyt changes and innate immunity.
Figure 1. Plant immunity and defense responses against pathogens in plants. Plants respond to pathogenic attack by activating a variety of defense mechanisms including the accumulation of antimicrobial compounds, cell wall crosslinking and localized programmed cell death (PCD), which restricts pathogens at the site of infection.
Figure 2. Rice cell culture. (A) A fluorescence micrograph of cultured rice cells expressing green fluorescent protein (GFP). (B) Rice cells were suspension-cultured in a liquid medium.
3. Regulation of spatio-temporal patterns of cytosolic Ca2+ concentration triggered by signal molecules from pathogens
In 1961 Dr. Osamu Shimomura, a 2008 Nobel laureate in Chemistry, isolated a Ca2+-sensitive chemiluminescent protein named aequorin from the jellyfish Aequorea victoria. The blue chemiluminescence of aequorin is transferred to the green fluorescent protein (GFP; Fig. 2) by fluorescence resonance energy transfer. Both aequorin and GFP are important tools in current biological research. We have established transgenic cells expressing aequorin and developed non-invasive Ca2+ monitoring systems in many plant species (Fig. 3) [7,8].
Figure 3. A non-invasive monitoring system for intracellular Ca2+ concentration in plant cells by expressing a Ca2+-sensitive chemiluminescent protein, aequorin. (A) Aequorin is composed of an apoprotein, apoaequorin, with an approximate molecular mass of 22 kDa, and a chromophore, coelenterazine. Aequorin has three EF-hand motifs that function as binding sites for Ca2+. When Ca2+ occupies these sites, the protein undergoes a conformational change and converts coelenterazine into excited coelenteramide through oxidation, followed by emission of blue light through relaxation to the ground state. (B) A real-time non-invasive Ca2+ monitoring system for cultured plant cells using a luminometer.
In cultured tobacco BY-2 cells, cell cycle-dependent synchronous defense responses, including programmed cell death (PCD), are induced in response to cryptogein, a proteinaceous elicitor derived from a pathogenic oomycete, Phytophthora cryptogea. Immediately after recognition of cryptogein, transient and biphasic changes in [Ca2+]cyt are induced (Fig. 4A) [7], followed by ROS production. Pharmacological analyses indicated that the two phases of the [Ca2+]cyt changes correspond to Ca2+ influx through the plasma membrane and an inositol 1,4,5-trisphosphate-mediated release of Ca2+ from intracellular Ca2+ stores, respectively [7]. Chitin oligosaccharides and sphingolipid MAMPs induced Ca2+ transients mainly due to the plasma membrane Ca2+ influx in rice cells [8] (Fig. 4B). Considering that the pattern of changes in [Ca2+]cyt depends on the trigger molecule from the pathogen, Ca2+ mobilization in defense responses may be controlled by a series of regulatory mechanisms composed of Ca2+ influx through the plasma membrane and release of Ca2+ from intracellular Ca2+ stores.
Figure 4. Two typical temporal patterns of changes in cytosolic Ca2+ concentration triggered by signal molecules from pathogens. Transformed tobacco and rice cells stably expressing aequorin were used. (A) Cryptogein-induced changes in [Ca2+]cyt in tobacco BY-2 cells. (B) N-acetylchitooligosaccharide-induced changes in [Ca2+]cyt in rice cells. rlu: relative luminescence unit.
4. Ca2+-permeable cation channels and plant immunity [1-3]
Activated Ca2+-permeable channels localized in the plasma membrane and in endomembranes such as the vacuolar membrane mobilize Ca2+ into the cytosol in an electrochemical potential-dependent manner. Consequently, spatio-temporal patterns of [Ca2+]cyt are formed. Little is known about the molecular bases of these Ca2+ channels [1-3]. In the early stage of defense responses, membrane potential changes [9] and Ca2+ influx [7,8] occur at the same time, suggesting that voltage-dependent Ca2+ channels (VDCCs) may be involved in these responses. Vertebrate VDCCs are found in excitable cells of skeletal muscle, cardiac tissue, and neurons. No homologues of VDCCs have been found in plants. In contrast, homologues of mammalian two-pore channels (TPCs) have been identified in several plant species, such as Arabidopsis, rice, and tobacco [15-18]. Plant TPC family proteins have two EF-hand motifs, putative Ca2+-binding domains, in the intracellular domain, implying that the protein could be under feedback regulation by binding of Ca2+. AtTPC1 from Arabidopsis has been reported to encode a vacuolar membrane-localized cation channel and to be involved in sucrose-induced Ca2+ influx and the stomatal Ca2+ response [15,16]. In contrast, rice and wheat TPC1 are thought to localize in the plasma membrane [17,19]. Reverse genetics studies suggest that OsTPC1 from rice [14] and NtTPC1A/B from tobacco [18] are involved in plant immunity, at least in suspension-cultured cells.
5. Regulation of Ca2+-permeable channels [2,3]
How are Ca2+-permeable channels activated or inactivated in plant defense responses? In rice, Ca2+ transients induced by oligosaccharide and sphingolipid MAMPs were dramatically suppressed by calyculin A (CA), a protein phosphatase 1/2A inhibitor [8]. When transient [Ca2+]cyt changes are induced, Ca2+ channels need to be negatively regulated immediately after they are activated. These results suggest that protein phosphorylation could be involved in the downregulation of Ca2+ channels after transducing the signals in plant innate immunity. However, the MAMP-activated plasma membrane Ca2+-permeable channel has not been identified, and the molecular mechanisms of the CA-induced negative regulation of the Ca2+ rise remain to be elucidated. Searches for the phosphoproteins involved in the regulation of Ca2+ mobilization are currently underway to further understand the relationship between Ca2+ signaling and the regulation of plant innate immunity.
Figure 5. Reversible binding of Ca2+ to sensor proteins plays a key role as a molecular switch (OFF: inactive; ON: active).
6. Decoding of Ca2+-mediated signals by Ca2+ sensor proteins [1,20-22]
How are the signals mediated by spatio-temporal patterns of [Ca2+]cyt transmitted to downstream events? Many Ca2+-binding proteins containing EF-hand motifs [23,24], including calmodulin (CaM), calcineurin B-like protein (CBL), and Ca2+-dependent protein kinase (CDPK), have been identified in plants (Figs. 5 and 6) and play pivotal roles as sensors for Ca2+. These proteins bind Ca2+ and change their conformation or three-dimensional structure to control the activity or function of target proteins as "molecular switches". Ca2+/CaM-dependent kinase (CCaMK) has also been shown to be involved in decoding the specific pattern of [Ca2+]cyt changes (Ca2+ spiking) in response to Nod factors, signal molecules from symbiotic bacteria, during nodulation in legumes.
Figure 6. The structure of representative Ca2+ sensor proteins in plants: (1) CaM, (2) CBL-CIPK, (3) CDPK. Many genes for Ca2+-binding proteins containing Ca2+-binding EF-hand motifs, including calmodulins (CaM), calcineurin B-like proteins (CBL), and Ca2+-dependent protein kinases (CDPK), have been identified. These proteins bind Ca2+ and change their conformation to control the activity or function of target proteins; therefore, each acts as a "Ca2+ sensor".
Figure 7. Molecular mechanisms for decoding of Ca2+-mediated signals by the Ca2+ sensor protein, CBL, and the protein kinase, CIPK. Note that both CBL and CIPK act as reversible molecular switches.
The mammalian calcineurin consists of a type 2B protein phosphatase (subunit A) and a Ca2+ sensor (subunit B). Many genes encoding calcineurin B-like proteins (CBLs), highly homologous to calcineurin B, exist in plant genomes. However, neither homologues of calcineurin A nor type 2B protein phosphatases have been found in plants. The CBLs bind to and activate a group of protein kinases, CBL-interacting protein kinases (CIPKs), in a Ca2+-dependent manner (Fig. 7). The model plant Arabidopsis thaliana has 10 CBLs and 25 CIPKs. Their combinations for intermolecular interaction are presumably involved in various Ca2+-mediated signal transduction pathways. Various CBL-CIPK systems play important roles in signal transduction pathways triggered by abiotic stresses, such as wounding, low temperature, drought, and salt [20-22]. We recently identified CIPK proteins (OsCIPK14/15) that are induced by multiple MAMPs and function in innate immunity in rice. Functional characterization of the OsCIPK14/15-suppressed lines established by the RNAi method, as well as of the overexpressing lines, suggested that these CIPKs are involved in the regulation of various immune responses, including mitochondrial dysfunction, PCD, biosynthesis of phytoalexins and pathogenesis-related gene expression, as well as ROS generation triggered by a protein from a fungal pathogen [12,25]. The protein kinase activities of these CIPKs were enhanced by combination with a specific CBL, OsCBL4. Some OsCBLs, including OsCBL4, or other unidentified factors that interact with OsCIPK15 via the FISL/NAF motif may regulate the activity and localization of OsCIPK14/15 in the MAMP-triggered signal transduction pathway [12]. Searches for the in vivo substrates of these CIPKs are currently underway to further elucidate the Ca2+ signaling pathways regulating hypersensitive cell death and innate immunity. These findings should shed further light on our understanding of defense signaling pathways (Fig. 8).
Figure 8. A hypothetical model for Ca2+-mediated immune responses in plants.
7. Concluding remarks
The Ca2+-mediated signal transmission system, composed of the formation of spatio-temporal patterns of the intracellular Ca2+ concentration by Ca2+-permeable channel proteins in the membranes and their decoding by Ca2+ sensor proteins, is a universal information-processing system common to animals and plants. However, the molecules involved and the control mechanisms show marked diversity. Ca2+ also plays a pivotal role as an information-processing element in plant immunity. Identification of the molecules involved in these intracellular signaling networks should be a key step toward understanding the mechanisms of information processing in plants, which have not evolved brains. Understanding and utilizing the latent potential of plants, and coexisting effectively with them, are essential tasks for humans in the future. Attempts to reduce the use of chemical pesticides, including bactericidal agents and insecticides, by enhancing plant immune responses have recently been attracting attention. The genetic strengthening of disease resistance by breeding and genetic engineering, as well as the exploration and use of physiologically active substances that enhance plant immune responses, should markedly alter next-generation agriculture. We recently discovered novel chemical substances that potentially activate plant immune responses. Elucidation of their mechanism of action and evaluation for practical application are underway. The basis of these studies is an understanding, at the molecular level, of the uniquely developed immune system of plants, which is completely different from that of animals. Understanding the signal transmission network in plant immunity is an important research subject in many contexts. Collaborative study between molecular biologists and information scientists should be a challenging but promising field in the near future.
Acknowledgement
This work was supported in part by Grants-in-Aid for Scientific Research on Innovative Areas (21200067) to T.K., for Exploratory Research (21658118) to K.K. and for Young Scientists (B) (21780041) to T.K., by grants from the Japan Science and Technology Agency for the Adaptable and Seamless Technology transfer Program through target-driven R&D (AS221Z03504E) to T.K., and by a grant from the Ministry of Agriculture, Forestry and Fisheries of Japan (Integrated research project for plants, insects, and animals using genome technology, MP-2134) to K.K.
References
1. A.N. Dodd, J. Kudla, D. Sanders, Annu Rev Plant Biol 61, 593 (2010).
2. T.H. Kim, M. Bohmer, H. Hu, N. Nishimura, J.I. Schroeder, Annu Rev Plant Biol 61, 561 (2010).
3. M.R. Roelfsema, R. Hedrich, Plant Cell Environ 33, 305 (2010).
4. C. Zipfel, Curr Opin Plant Biol 12, 414 (2009).
5. J. Zhang, J.M. Zhou, Mol Plant 3, 783 (2010).
6. T. Nürnberger, D. Scheel, Trends Plant Sci 6, 372 (2001).
7. Y. Kadota, T. Goh, H. Tomatsu, R. Tamauchi, K. Katsumi, S. Muto, et al., Plant Cell Physiol 45, 160 (2004).
8. T. Kurusu, H. Hamada, Y. Sugiyama, T. Yagala, Y. Kadota, T. Furuichi, et al., J Plant Res DOI: 10.1007/s10265-010-0388-4 (2010).
9. K. Kuchitsu, M. Kikuyama, N. Shibuya, Protoplasma 174, 79 (1993).
10. K. Kuchitsu, H. Kosaka, T. Shiga, N. Shibuya, Protoplasma 188, 138 (1995).
11. K. Kuchitsu, Y. Yazaki, K. Sakano, N. Shibuya, Plant Cell Physiol 38, 1012 (1997).
12. T. Kurusu, J. Hamada, H. Nokajima, Y. Kitagawa, M. Kiyoduka, A. Takahashi, et al., Plant Physiol 153, 678 (2010).
13. D. Lecourieux, C. Mazars, N. Pauly, R. Ranjeva, A. Pugin, Plant Cell 14, 2627 (2002).
14. T. Kurusu, T. Yagala, A. Miyao, H. Hirochika, K. Kuchitsu, Plant J 42, 798 (2005).
15. T. Furuichi, K.W. Cunningham, S. Muto, Plant Cell Physiol 42, 900 (2001).
16. E. Peiter, F.J. Maathuis, L.N. Mills, H. Knight, J. Pelloux, A.M. Hetherington, et al., Nature 434, 404 (2005).
17. T. Kurusu, Y. Sakurai, A. Miyao, H. Hirochika, K. Kuchitsu, Plant Cell Physiol 45, 693 (2004).
18. Y. Kadota, T. Furuichi, T. Sano, H. Kaya, Y. Murakami, W. Gunji, et al., Biochem Biophys Res Comm 336, 1259 (2005).
19. Y.J. Wang, J.N. Yu, T. Chen, Z.G. Zhang, Y.J. Hao, J.S. Zhang, et al., J Exp Bot 56, 3051 (2005).
20. S. Luan, Trends Plant Sci 14, 37 (2009).
21. J. Kudla, O. Batistic, K. Hashimoto, Plant Cell 22, 541 (2010).
22. T.A. DeFalco, K.W. Bender, W.A. Snedden, Biochem J 425, 27 (2009).
23. Y. Ogasawara, H. Kaya, G. Hiraoka, F. Yumoto, S. Kimura, Y. Kadota, et al., J Biol Chem 283, 8885 (2008).
24. S. Takeda, C. Gapper, H. Kaya, E. Bell, K. Kuchitsu, L. Dolan, Science 319, 1241 (2008).
25. T. Kurusu, J. Hamada, H. Hamada, S. Hanamata, K. Kuchitsu, Plant Signal Behav 5, 1045 (2010).
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 437-450)
NETZCOPE: A TOOL FOR DISPLAYING AND ANALYZING COMPLEX NETWORKS
MICHAEL J. BARBER Department of Foresight and Policy Development, Austrian Institute of Technology
LUDWIG STREIT BiBoS, Univ. Bielefeld and CCM, Univ. da Madeira
OLEG STROGAN Kyiv, NaUKMA, Faculty of Informatics

Networks are a natural and popular mechanism for the representation and investigation of a broad class of systems. But extracting information from a network can present significant challenges. We present NetzCope, a software application for the display and analysis of networks. Its key features include the visualization of networks in two or three dimensions, the organization of vertices to reveal structural similarity, and the detection and visualization of network communities by modularity maximization.
1. Introduction
Networks describe the structure and dynamics of relations between objects or agents: in short, networks are everywhere. In biology they have become a prominent subject of research; in engineering, network concepts have become ever more important, from Kirchhoff's laws in the 19th century to optimizing the design of a microchip today; and in the socio-economic field "globalization" is paradigmatic for their dominance. A short list of examples may serve to underscore this point:
Neural networks: the well-studied nematode Caenorhabditis elegans has a neural network of some 300 neurons with some 7000 connections between them.
Metabolic networks: metabolic processes in the cell.
Protein interaction networks: physical interactions between an organism's proteins.
Transcriptional networks: regulatory interactions between different genes.
Food webs: who eats whom?
Sexual relations and infections: AIDS epidemiological models
Pollination networks: plants and their pollinators
Electric networks: stability of power grids
Electronic networks, computer chips: computing speed
Airline networks: service efficiency
Internet: search engines
Linguistics: words linked by co-occurrence, language families
Social networks: identification of central players, gatekeepers, ...
Collaborations: actors, authors, research labs, ...

Typically these networks exhibit considerable complexity and more often than not their structure is far from transparent. In this work, we present NetzCope, a software application for the display and analysis of complex networks. NetzCope is a general purpose tool for investigating networks, allowing the user to interactively explore networks, especially with regard to visualizing the most important relationships in the network.

1.1. Bipartite Networks
The NetzCope software was originally developed to find and display the structure hidden in long lists of tens of thousands of collaborative research projects sponsored by the European Union. These networks are bipartite, with links always connecting members of two different sets. Some examples:
Regulatory networks: transcription factors and target genes
Economic networks: financial centers and multinationals, firms and board members
Collaboration networks: actors and films in which they appeared together, laboratories and joint research projects, scientists and joint publications, ...
1.2. Modularity
Of particular interest to the exploration of the network of EU-funded projects (and, indeed, to networks in general) is any possible modular structure of the network. Quoting from an article on network biology:

Cellular functions are likely to be carried out in a highly modular manner. In general, modularity refers to a group of physically or functionally linked molecules (nodes) that work together to achieve a (relatively) distinct function. Modules are seen in many systems, for example, circles of friends in social networks or websites that are devoted to similar topics on the World Wide Web. Similarly, in many complex engineered systems, from a modern aircraft to a computer chip, a highly modular structure is a fundamental design attribute. Biology is full of examples of modularity ... (Barabasi and Oltvai [1])

To reach a more precise understanding, a few words are in order on the basic concepts of graph theory, the mathematical formulation of networks.
2. A Few Words on Graphs
Let
• V be a set (vertices)
• E be a set of vertex pairs from V x V (edges).
The pair G = (V, E) is called a graph. Given a partition $V = V_1 + V_2$, if there are no edges between pairs of points within either $V_i$, then G is called bipartite. The number of edges of a vertex v we call the degree:

$$d(v) = \#\{ e \in E : v \in e \}.$$

A simple graph G = (V, E) is described by an adjacency matrix indicating whether vertices i and k are connected by an edge:

$$a_{ik} = \begin{cases} 1 & i \sim k \\ 0 & \text{otherwise} \end{cases}$$

The degree of vertex k is then

$$d_k = \sum_i a_{ik},$$

and we shall set

$$D = \operatorname{diag}(d_k).$$

The Laplacian

$$L = D - A$$

and the normalized Laplacian

$$\mathcal{L} = 1 - D^{-1/2} A D^{-1/2}$$

play a central role. In particular, $-\mathcal{L}$ is (up to a similarity transformation) the generator of a continuous time random walk on the graph, with equal probability $1/d_k$ along each edge.
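As a hedged illustration of these definitions, the following Python sketch builds the adjacency matrix, the vertex degrees, the Laplacian L = D - A and the normalized Laplacian for a small example. The function names and the four-vertex path graph are assumptions chosen for illustration and are not taken from NetzCope; the sketch also assumes a simple undirected graph without isolated vertices, so that every degree is positive.

```python
import math


def adjacency_matrix(n, edges):
    """Symmetric 0/1 adjacency matrix of a simple undirected graph."""
    a = [[0] * n for _ in range(n)]
    for i, k in edges:
        a[i][k] = 1
        a[k][i] = 1
    return a


def degrees(a):
    """Degree d_k of each vertex, i.e. the row sums of the adjacency matrix."""
    return [sum(row) for row in a]


def laplacian(a):
    """L = D - A, with D the diagonal matrix of degrees."""
    d = degrees(a)
    n = len(a)
    return [[(d[i] if i == k else 0) - a[i][k] for k in range(n)]
            for i in range(n)]


def normalized_laplacian(a):
    """1 - D^{-1/2} A D^{-1/2}; requires all degrees to be positive."""
    d = degrees(a)
    n = len(a)
    return [[(1.0 if i == k else 0.0) - a[i][k] / math.sqrt(d[i] * d[k])
             for k in range(n)]
            for i in range(n)]


# Hypothetical example: the path graph 0 - 1 - 2 - 3
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
L = laplacian(A)
N = normalized_laplacian(A)

print(degrees(A))
for row in L:
    print(row)
```

Every row of L sums to zero, which is the matrix form of the statement that the constant vector is an eigenvector of the Laplacian with eigenvalue zero.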
2.1. How to plot a graph
Typically a graph or network will be given simply as a list of agents (i.e., vertices) and their relations (i.e., edges). How would one translate such a list into a graphical display? To begin with, try to arrange the vertices on a straight line: put each vertex k at position $x_k$ such that those connected by an edge will be as close as possible, as if connected by elastic springs. Mechanics tells us that such an arrangement would minimize the expression

$$E = \frac{1}{2} \sum_{i,k} a_{ik} (x_i - x_k)^2.$$

Neurons in the nematode C. elegans are said to be distributed in this fashion! We can also write this as the scalar product

$$E = (x, (D - A)x).$$

Of course the minimum E = 0 would arise for the vector $x = e_0$ with $x_1 = x_2 = \dots = x_k = \dots = 1$; indeed

$$(D - A) e_0 = (D - A) \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = 0.$$

Here all the vertices are at the same place, $x_k = 1$. Excluding this, we are led to the next eigenvector of L = D - A, with

$$L x = \lambda_1 x, \qquad \lambda_1 \text{ the smallest positive eigenvalue.}$$

In practice a better ordering is achieved using $D^{1/2} h$, where h is the Fiedler vector, the eigenvector of the normalized Laplacian corresponding to the smallest positive eigenvalue.
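The ordering idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not NetzCope's implementation: the function `fiedler_vector`, the power iteration with deflation used to approximate the eigenvector, and the six-vertex example are all illustrative choices, and the graph is assumed to be connected. The sketch sorts the vertices by the components of the Fiedler vector h itself and omits the additional degree rescaling mentioned in the text.

```python
import math


def fiedler_vector(adj, iterations=500):
    """Approximate the eigenvector of the normalized Laplacian
    1 - D^{-1/2} A D^{-1/2} for its smallest positive eigenvalue.

    Uses power iteration on (2*1 - normalized Laplacian), projecting out
    the known eigenvector D^{1/2} e that belongs to eigenvalue 0.
    Assumes a connected simple graph given as a symmetric 0/1 matrix.
    """
    n = len(adj)
    deg = [sum(row) for row in adj]
    v0 = [math.sqrt(d) for d in deg]
    scale = math.sqrt(sum(x * x for x in v0))
    v0 = [x / scale for x in v0]

    def apply_shifted(x):
        # y = x + D^{-1/2} A D^{-1/2} x
        return [x[i] + sum(adj[i][k] * x[k] / math.sqrt(deg[i] * deg[k])
                           for k in range(n))
                for i in range(n)]

    def project_out_and_normalize(x):
        overlap = sum(a * b for a, b in zip(x, v0))
        x = [a - overlap * b for a, b in zip(x, v0)]
        norm = math.sqrt(sum(a * a for a in x))
        return [a / norm for a in x]

    # Deterministic starting vector; any vector with a component along
    # the Fiedler vector would do.
    h = project_out_and_normalize([float(i) for i in range(n)])
    for _ in range(iterations):
        h = project_out_and_normalize(apply_shifted(h))
    return h


# Hypothetical example with a scrambled listing: triangles {0, 3, 4}
# and {1, 2, 5}, joined by the single edge 4 - 1.
edges = [(0, 3), (3, 4), (0, 4), (1, 2), (2, 5), (1, 5), (4, 1)]
n = 6
adj = [[0] * n for _ in range(n)]
for i, k in edges:
    adj[i][k] = adj[k][i] = 1

h = fiedler_vector(adj)
order = sorted(range(n), key=lambda i: h[i])
reordered = [[adj[i][k] for k in order] for i in order]
print(order)
for row in reordered:
    print(row)
```

In the reordered adjacency matrix the two triangles appear as two dense diagonal blocks connected by a single pair of off-diagonal entries, which is the effect the "Fiedler ordering" aims for. For large networks a sparse eigensolver would be the more usual choice; the plain power iteration here is only meant to keep the sketch self-contained.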
2.2. Modularity of Graphs
A good division of a network into communities is not merely one in which there are few edges between communities; it is one in which there are fewer than expected edges between communities. (M. E. J. Newman [2])

A popular measure of the quality of such a division or decomposition is the modularity [3]. Modularity is, up to a normalization constant, the number of edges within communities c minus those for a null model:

$$Q = \frac{1}{2|E|} \sum_c \sum_{i,j \in c} (A_{ij} - P_{ij}),$$

where |E| is the number of edges or links, and

$$P_{ij} = \frac{d_i d_j}{2|E|}$$

corresponds to a random graph model with a fixed set of vertices and the constraint that on average they should reproduce a given distribution of vertex degrees $d_i$ [4]. In empirical investigations, modularity values above roughly 0.3 are indicative of a partitioning of the network vertices showing significant modular structure. Modularity close to one would correspond to a near perfect decomposition of the network into loosely interconnected communities.

The goal now is to find a division of the vertices into communities such that the modularity Q is maximal. An exhaustive search for such a decomposition is out of the question: even for moderately large graphs there are far too many ways to decompose them into communities. But fast approximate algorithms do exist [5], many based on the idea of greedily merging small communities into larger ones with a higher modularity [6,7].

For bipartite graphs the null model must be modified, to reproduce the characteristic form of bipartite adjacency matrices

$$A = \begin{pmatrix} 0 & \tilde{A} \\ \tilde{A}^T & 0 \end{pmatrix}$$

also for the null model [8]. This gives a bipartite modularity $Q_B$. Comparatively few algorithms have been proposed for maximizing $Q_B$, but methods for unipartite networks can often be adapted with little trouble. Recently, Barber [8] proposed an appropriate algorithm (BRIM: bipartite, recursively induced modules) to find communities for bipartite networks. Starting from a (more or less) ad hoc partition of the vertices of type 1, it is straightforward to optimize a corresponding decomposition of the vertices of type 2. From there, optimize the decomposition of vertices of type 1, and iterate. In this fashion, modularity will increase until a (local) maximum is reached. NetzCope allows a suitable greedy algorithm to be combined with BRIM, significantly enhancing the performance of the former.

Modularity has some limitations that should be kept in mind. The measure has a resolution limit dependent on the number of edges in the network, so that small communities cannot be found in large networks by simply finding a maximum in the modularity [9]. Further, modularity maximization is an NP-complete problem [10]; as is typical of the class, there are exponentially many local maxima in Q, so some ambiguity of decompositions is inevitable [11]. NetzCope provides a number of visualization tools to vary and compare them. For a quantitative comparison, NetzCope will compute the mutual information [12] of different decompositions.
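To make the two quantities of this section concrete, the sketch below evaluates the unipartite modularity Q of a given decomposition, with the null model P_ij = d_i d_j / 2|E|, and the mutual information between two decompositions. It is a minimal illustration under assumptions: the function names and the example graph are hypothetical, the code only evaluates Q for a partition supplied by hand and does not maximize it (neither a greedy algorithm nor BRIM is implemented), and the mutual information is the plain, unnormalized form in natural-logarithm units, which may differ from the variant NetzCope computes.

```python
import math
from collections import Counter


def modularity(adj, community):
    """Q = (1 / 2|E|) * sum over same-community pairs (i, j) of
    (A_ij - d_i d_j / 2|E|), for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    two_m = sum(deg)  # equals 2|E| for a simple undirected graph
    total = 0.0
    for i in range(n):
        for j in range(n):
            if community[i] == community[j]:
                total += adj[i][j] - deg[i] * deg[j] / two_m
    return total / two_m


def mutual_information(part_a, part_b):
    """Mutual information (in nats) between two community assignments
    of the same vertices, treating labels as random variables."""
    n = len(part_a)
    joint = Counter(zip(part_a, part_b))
    count_a = Counter(part_a)
    count_b = Counter(part_b)
    return sum((n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
               for (a, b), n_ab in joint.items())


# Hypothetical example: two triangles joined by the single edge 2 - 3
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6
adj = [[0] * n for _ in range(n)]
for i, k in edges:
    adj[i][k] = adj[k][i] = 1

triangles = [0, 0, 0, 1, 1, 1]   # one community per triangle
single = [0, 0, 0, 0, 0, 0]      # everything in one community

q_triangles = modularity(adj, triangles)
q_single = modularity(adj, single)
print(q_triangles, q_single)
print(mutual_information(triangles, triangles))
print(mutual_information(triangles, single))
```

For this example the split into the two triangles gives Q = 5/14, slightly above the rough 0.3 threshold quoted in the text, while placing all vertices in one community gives Q = 0. The mutual information of a decomposition with itself equals its entropy (here ln 2), and it vanishes when one of the decompositions is trivial.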
3. What NetzCope does For moderately large networks of some 104 vertices, say, there is not only the challenge of finding a display which exhibits as much as possible of the network structure. There are simply too many vertices and edges to fit distinguishably into a plot of any reasonable size. As a consequence, a central part of our strategy will be to identify, display, and analyze communities within the overall network. These (interconnected) communities do admit a graphical representation, and so NetzCope first displays a network of communities. For more detailed analysis the software then allows the user to "zoom into" communities and explore their inner structure. In contrast to well established network analysis software tools such as UCINET or Pajek, Netzcope implements new methods to analyze and visualize network structures, with a special emphasis on using recent methods from statistical physics to identify and visualize community groups and to analyze and visualize the adjacency matrix. Overall, the principal functionalities of NetzCope are: (1) Identification of disjoint components within the network, if any such exist. For any such component it will perform the following tasks: (2) Display of the adjacency matrix which encodes the connectivity (3) Display the same for the matrix after "Fiedler ordering" which reorders the original (e.g., alphabetical) network listing in such a way that interacting partners are grouped together
(4) Generate plots of the nodes and the links between them in two or three dimensions, and rotate the plots about their axes
(5) Identify communities of close collaboration inside the network
(6) Plot the network of these communities. This achieves one of the main goals, namely a suitable complexity reduction, so as to arrive at a feasible and meaningful graphical display of networks of a size where plotting of the full network becomes meaningless. The following items are tools for the further analysis of these communities:
(7) Topical profile: nodes may carry one or more labels. Their frequency within a community is represented by colored segments in the aforementioned plot. A more precise representation of this as a histogram can be called up for each community, comparing their occurrence within the community to overall occurrence in the network.
(8) A scatter plot displays the number of internal versus external links for the leading players in the community, thus identifying central players and gatekeepers. Centrality is further analyzed by scatter plots which compare high linkage (to many partners) and important linkage (to important partners) for leading players.
(9) An important functionality of NetzCope is the possibility to repeat the above tasks for each of the communities. The user may thus analyze the network iteratively, "zooming into" the community structure until a desired level of structural detail is obtained.
(10) Finally, "network portraits" as proposed by Bagrow et al. [13] are included, mainly to facilitate the comparison of empirical network structure with that of simulations stemming, e.g., from random graph or multi-agent models.
NetzCope was originally designed to analyze and display bipartite networks of organizations collaborating through projects; we have since lifted this restriction to make NetzCope more versatile. NetzCope can now also analyze general, unipartite networks. NetzCope distributions, together with sample network data files, are available for Linux as well as Windows users. NetzCope loads network data in the widely used Pajek format or from an edge list in plain text format. NetzCope can be downloaded at http://www.physik.uni-bielefeld.de/-strogan/.
Figure 1. Connected Components
4. Some NetzCope Screenshots
4.1. The Connected Components
After a network has been loaded, NetzCope first extracts and lists the connected components. For a bipartite network, each line in Fig. 1 shows the number of edges ("relations"), the number of nodes of type O ("organizations"), and the number of nodes of type P ("projects"). Clicking on one of the boxes displayed will open a menu of options for the further analysis of the chosen component.
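Extracting connected components is a standard graph traversal. The following is a minimal sketch using breadth-first search on an edge list; the function name and input format are assumptions made for illustration and do not describe NetzCope's internals. Vertices that appear in no edge are not seen by this sketch.

```python
from collections import deque


def connected_components(edges):
    """Return the connected components of an undirected graph, largest first."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:  # breadth-first search from `start`
            node = queue.popleft()
            component.append(node)
            for neighbour in adjacency[node] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
        components.append(sorted(component))
    return sorted(components, key=len, reverse=True)
```

For a bipartite network, each edge would pair an organization with a project, and the per-component counts of edges and of each vertex type follow directly from the returned vertex lists.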
4.2. The Adjacency Matrix
For large networks, NetzCope displays the non-zero entries (edges) of the adjacency matrix as dots. Figure 2(a) shows part of such a matrix for some 5600 vertices. They will in general be more or less randomly distributed, as long as the list of vertices does not group together vertices which are connected by many links. Fiedler ordering, which reorders the graph vertices so that the components of the Fiedler vector are sorted, does just this. Figure 2(b) shows the same segment of the adjacency matrix after Fiedler ordering, with its characteristic concentration of dots near the diagonal, i.e., of short links in the listing.
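As a rough illustration of Fiedler ordering, the sketch below approximates the Fiedler vector of the combinatorial graph Laplacian by power iteration and sorts the vertices by its components. It assumes a connected graph with at least two vertices labeled 0..n-1; the choice of Laplacian, the iteration count, and the start vector are illustrative assumptions, and a production implementation would more likely use a sparse eigensolver.

```python
def fiedler_ordering(n, edges, iterations=2000):
    """Order vertices 0..n-1 of a connected graph by their Fiedler vector components."""
    degree = [0] * n
    neighbours = [[] for _ in range(n)]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
        degree[u] += 1
        degree[v] += 1

    shift = 2.0 * max(degree)  # upper bound on the largest Laplacian eigenvalue
    vector = [float(i) for i in range(n)]  # arbitrary deterministic start vector
    for _ in range(iterations):
        # Remove the constant component (eigenvalue 0 of the Laplacian).
        mean = sum(vector) / n
        vector = [x - mean for x in vector]
        # Apply (shift * I - L); its dominant remaining eigenvector is the Fiedler vector.
        vector = [
            (shift - degree[i]) * vector[i] + sum(vector[j] for j in neighbours[i])
            for i in range(n)
        ]
        norm = sum(x * x for x in vector) ** 0.5
        vector = [x / norm for x in vector]
    return sorted(range(n), key=lambda i: vector[i])
```

Applied to a path whose vertices are listed in scrambled order, the ordering recovers the path (possibly reversed), which is the "short links in the listing" effect described above.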
Figure 2. Adjacency Matrix. (a) Ad Hoc Ordering. (b) Fiedler Ordering.
4.3. Plotting the Graph
The menu item "2D plot" will produce a two-dimensional representation of the graph. When networks have more than a thousand vertices, the display will become more and more opaque (Fig. 3(a)), and even the NetzCope zoom function will be of limited use (Fig. 3(b)).
Figure 3. Plotting the Graph. (a) The 2-d Plot. (b) Zooming Into the Plot.
Figure 4. Modularities vs. Number of Communities
4.4. Plotting the Network of Communities
Before NetzCope outputs a graph of communities, various settings can be specified, in particular the number(s) of communities for which NetzCope then computes the decomposition. The respective modularities are displayed as in Fig. 4. For the collaborative research networks one observes a typical leveling off of the modularity Q as the number of communities increases. A click on the chosen value point on the display opens a window to prescribe some graphical settings, and then produces a display of the corresponding network of communities, such as in Fig. 5.
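To make the plotted quantity concrete, here is a minimal sketch of the modularity Q of a given decomposition. It assumes the unipartite Newman-Girvan definition [3] for an unweighted, undirected graph without self-loops; the bipartite variant uses a different null model, and the function signature is purely illustrative.

```python
def modularity(edges, community):
    """Newman-Girvan modularity Q of a decomposition of an undirected graph.

    edges: list of (u, v) pairs; community: dict mapping vertex -> community label.
    """
    m = len(edges)
    internal = {}    # number of edges inside each community
    degree_sum = {}  # summed vertex degrees of each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    # Q = sum over communities of (fraction of internal edges)
    #     minus (expected fraction under the degree-preserving null model).
    return sum(internal.get(c, 0) / m - (d / (2.0 * m)) ** 2
               for c, d in degree_sum.items())
```

For two triangles joined by a single edge, splitting the graph into the two triangles gives Q = 5/14, while placing all vertices in one community gives Q = 0.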
4.5. The Network Portrait
The "network portraits" proposed by Bagrow et al. [13] display (Fig. 6) not only the degree distribution but more generally, row by row, the distribution of all shell sizes, where the n-th shell of a node in the network consists of all the nodes at distance n. The first row gives the usual degree distribution, while the highest n is equal to the diameter of the network. These portraits display structural features, including the network diameter (d = 7 in our example), which, particularly for large networks, are not accessible to direct visual inspection of the latter. They have proven useful to show how real world networks differ from certain random or multi-agent models.
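The shell-size distributions behind a network portrait can be sketched with one breadth-first search per node. The data layout below (a dictionary of counters indexed by distance) is an illustrative assumption, and, unlike the full portrait, this sketch does not record nodes whose shell at a given distance is empty.

```python
from collections import Counter, deque


def network_portrait(adjacency):
    """Return {distance: Counter({shell_size: number_of_nodes})} for a graph.

    adjacency maps each node to an iterable of its neighbours.
    """
    portrait = {}
    for source in adjacency:
        # Breadth-first search gives the distance of every reachable node.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
        # Size of each shell around `source` (shell 0 is the node itself).
        shell_sizes = Counter(d for d in distance.values() if d > 0)
        for d, size in shell_sizes.items():
            portrait.setdefault(d, Counter())[size] += 1
    return portrait
```

The row for distance 1 is the degree distribution, and for a connected graph the largest distance present equals the diameter.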
Figure 5. Network of Communities
Figure 6. Network Portrait
5. Auxiliary Features
NetzCope provides numerous auxiliary features and settings, some of which have already been mentioned in their proper context. As mentioned above, NetzCope allows interactive exploration of the network: the user can point into the display and read out the corresponding information at the side. The display has a zooming and rotating capability, both in two and three dimensions. The latter is useful to show possible congruences; network graphs occasionally become alike after a suitable rotation. Graphical details such as colorings and line widths can be chosen. The search for communities can be done by maximizing modularity with a greedy algorithm [6, 7] and/or BRIM [8] (for bipartite networks), and with varying degrees of randomness and preferred numbers of communities. Their display is controlled by the aforementioned Fiedler vector, which provides one coordinate of the network nodes, and typically by the subsequent eigenvector of the normalized Laplacian, which provides the second one. NetzCope allows the user to choose other eigenvectors for this purpose, generating different distributions of the nodes within the display. Starting from a bipartite network of "organizations" and "projects," NetzCope can generate, e.g., the projected network of organizations only, which are linked if they share projects. The number of projects may additionally be taken into account to produce a weighted graph. One should also keep in mind that the community search may involve random elements. In that case the output will vary from one run to another. NetzCope offers a quantitative control of these variants by calculating their mutual information.
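The projection of a bipartite network onto organizations can be sketched as follows. The sketch assumes that the weight of a link is the number of projects two organizations share, which is one natural reading of the description above; the function name and input format are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project_onto_organizations(memberships):
    """Project a bipartite organization-project network onto organizations.

    memberships: iterable of (organization, project) pairs.
    Returns {(org_a, org_b): number_of_shared_projects} with org_a < org_b.
    """
    participants = defaultdict(set)
    for organization, project in memberships:
        participants[project].add(organization)

    weights = defaultdict(int)
    for organizations in participants.values():
        # Every pair of organizations in the same project shares one more project.
        for pair in combinations(sorted(organizations), 2):
            weights[pair] += 1
    return dict(weights)
```

Dropping the weights and keeping only the keys yields the unweighted projected network in which organizations are linked if they share at least one project.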
6. Conclusion
We have presented a tour of the features of NetzCope, a software application for the display and analysis of complex networks. Originally created to support investigating specific collaboration networks, NetzCope has become a general purpose tool for network study. NetzCope allows the user to interactively explore networks, especially with regard to organizing the vertices so that the most important relationships in the network can be observed. Despite its lengthy list of features, NetzCope remains in its infancy, offering much potential for future extension.
Acknowledgments
Work supported by FCT and POCI 2010 (project MAT/58321/2004) with participation of FEDER, and by the Austrian Science Fund (FWF) under project P21450.
References
1. A.-L. Barabasi and Z. N. Oltvai, Nature Reviews Genetics 5, 100 (February 2004).
2. M. E. J. Newman, Proceedings of the National Academy of Sciences 103, 8577 (2006).
3. M. E. J. Newman and M. Girvan, Physical Review E (Statistical, Nonlinear, and Soft Matter Physics) 69, 026113 (2004).
4. F. Chung and L. Lu, Annals of Combinatorics 6, 125 (2002).
5. S. Fortunato, Physics Reports 486, 75 (February 2010).
6. M. E. J. Newman, Physical Review E (Statistical, Nonlinear, and Soft Matter Physics) 69, 066133 (2004).
7. A. Clauset, M. E. J. Newman and C. Moore, Physical Review E (Statistical, Nonlinear, and Soft Matter Physics) 70, 066111 (2004).
8. M. J. Barber, Physical Review E (Statistical, Nonlinear, and Soft Matter Physics) 76, 066102 (2007).
9. S. Fortunato and M. Barthelemy, PNAS 104, 36 (2007).
10. U. Brandes, D. Delling, M. Gaertler, R. Gorke, M. Hoefer, Z. Nikoloski and D. Wagner, IEEE Transactions on Knowledge and Data Engineering 20, 172 (December 2007).
11. B. H. Good, Y. A. de Montjoye and A. Clauset, Physical Review E 81, 046106 (April 2010).
12. L. Danon, A. Diaz-Guilera, J. Duch and A. Arenas, J. Stat. Mech., P09008 (2005).
13. J. P. Bagrow, E. M. Bollt, J. D. Skufca and D. ben Avraham, EPL (Europhysics Letters) 81, 68004 (March 2008).
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 451-460)
STUDY OF HIV-1 EVOLUTION BY CODING THEORY AND ENTROPIC CHAOS DEGREE
KEIKO SATO
Department of Information Science, Tokyo University of Science, Noda, 278-8510, Japan
We studied the evolution of HIV-1 (Human Immunodeficiency Virus Type 1) by means of coding theory and information dynamics. More precisely, (1) we applied various artificial codes to look for similarities between these codes and the code of HIV-1; (2) the entropic chaos degree was used to describe the evolution of HIV-1.
1. Introduction
Information of life is stored as a sequence of nucleotides, and the sequence composed of four bases seems to be a sort of code [1, 2]. Therefore, we can consider the DNA or gene in each organism to be a code showing its inherent structure. Thus, we ask what kind of structure each code has. More precisely, we ask what role the code structure plays in the emergence of life and how it relates to changes in the living body. To address such questions, using the sequences encoded by various artificial codes from information transmission, we explored the code structure of the sequences of the HIV-1 envelope V3 region obtained longitudinally for each patient, and then studied the evolutionary changes during the course of HIV-1 infection from the viewpoint of the code structure. We next examined the evolutionary changes in the V3 region for each patient with HIV-1 infection by using the entropic chaos degree introduced in information dynamics [3, 4], and then studied whether these evolutionary changes are related to the disease progression of HIV-1 infection. It is very interesting to find the dynamics causing the changes of viruses from one type to another. If one could obtain a rule describing the dynamics, such as the Schrödinger equation of motion, and trace the changes in viruses, it would be a great step toward studying viral genomes and, moreover, the diseases caused by viruses. However, it is very difficult to find such an equation of motion, and more difficult still to solve it, because the micro-dynamics of virus change is a many-body problem with very complicated interactions. In any case, we have to look for some rule representing the dynamics, even if not at the complete micro level, that is, at a certain macro level. Most biologists discuss the changes of viruses by looking at each site of the genome or by counting the substitution rate of the sites. We use probability theory and the chaos degree to study such changes, which provide us with somewhat more information than mere counting.
2. Methods
2.1. The Code Structure of HIV-1
First, we select artificial error-correcting codes satisfying the hypothesis below, and then we investigate which artificial code characterizes the nucleotide sequence of the HIV-1 envelope V3 region. Our hypothesis is the following. Each codon determines an amino acid, and the third nucleotide of the codon does not have much influence on the amino acid, so the third nucleotide is supposed to play the role of a check symbol in an error-correcting code. That is, an error-correcting code that a gene has is considered to be a code whose code length does not destroy the codon unit and which changes the third nucleotide. Under this hypothesis, we consider how the code structure of a gene is analyzed. Since the Galois field GF(4) consists of four elements, $0$, $1$, $\alpha$ and $\alpha^2$, the four bases can be expressed as
$$A \leftrightarrow 0, \quad T \leftrightarrow 1, \quad C \leftrightarrow \alpha, \quad G \leftrightarrow \alpha^2.$$
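The rewriting of a nucleotide sequence over GF(4) can be sketched in a few lines. The integer representation of the field elements (0, 1, 2, 3 for 0, 1, α, α²) and the function names are assumptions made for illustration; the arithmetic uses the standard relations α² = α + 1 and α³ = 1.

```python
# GF(4) elements 0, 1, alpha, alpha^2 are represented by the integers 0, 1, 2, 3.
BASE_TO_GF4 = {"A": 0, "T": 1, "C": 2, "G": 3}


def gf4_add(x, y):
    """Addition in GF(4): characteristic 2, so it is a bitwise XOR."""
    return x ^ y


def gf4_mul(x, y):
    """Multiplication in GF(4), using alpha^3 = 1 for the nonzero elements."""
    if x == 0 or y == 0:
        return 0
    # Nonzero elements are alpha^0, alpha^1, alpha^2, stored as 1, 2, 3.
    return (x - 1 + y - 1) % 3 + 1


def encode_bases(sequence):
    """Rewrite a nucleotide string as a list of GF(4) elements."""
    return [BASE_TO_GF4[base] for base in sequence.upper()]
```

With this representation, the check symbols of a cyclic code over GF(4) could be computed by polynomial division using `gf4_add` and `gf4_mul`; that step is not shown here.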
We rewrite an important part of the sequence in a gene in terms of these four elements, and we construct the error-correcting code by using an artificial code. The total length of such a code is a multiple of 3 and the length of the information symbols is a multiple of 2. The artificial error-correcting codes used in our study are BCH codes, self-orthogonal codes and Iwadare codes. We encode the $n$ nucleotide sequences of each patient at a specific point in time, and then obtain the encoded amino acid sequences $X_j^C$ $(j = 1, 2, \ldots, n)$ by code $C$. Here, a degree measuring the similarity between the artificial code of $X_j^C$ and the code of the amino acid sequences $X_j$ $(j = 1, 2, \ldots, n)$ before coding is defined by
$$D_C(j) = \rho(X_j, X_j^C), \qquad 0 \le \rho(X_j, X_j^C) \le 1,$$
which is called the "Entropy Evolution Rate" [5] and indicates the difference between $X_j$ and $X_j^C$. When the code structure of $X_j$ gets closer to an artificial code $C$ obtained in usual coding theory, the value of $D_C(j)$ gets closer to 0. Moreover, to look for a common code of a group of several V3 sequences, we use the degree $D_C$ defined by
$$D_C = \frac{\sum_{j=1}^{n} \rho(X_j, X_j^C)}{n}.$$
By calculating $D_C$ for various codes $C$, we can find a common code structure of the V3 sequences obtained at each point in time after HIV-1 infection. If $D_C$ for a code is smaller than that of any other code, then we can infer that the group has the property that code $C$ possesses. To study the code structure of the sequences of the HIV-1 envelope V3 region during the course of HIV-1 infection, we encoded, by means of various artificial codes $C$, the sequences obtained from patients infected with HIV-1 at several points in time after infection or seroconversion, and then calculated the index $D_C$ mentioned above.
2.2. The Evolutionary Changes of HIV-1 by Entropic Chaos Degree
Entropic Chaos Degree (ECD for short) has been used to characterize the chaotic aspects of the dynamics leading to sequence changes [6, 7]. We briefly explain the ECD. The ECD for the amino acid sequence $X^n$ of the HIV-1 envelope V3 region obtained at $n$ years post-HIV-1 seroconversion and the one $X^{n+1}$ obtained at $n+1$ years post-HIV-1 seroconversion from a patient is given as follows [8, 9]. Take the two amino acid sequences $X^n$ and $X^{n+1}$. We would like to find a rule by which $X^n$ changes to $X^{n+1}$, supposing that $X^n$ precedes $X^{n+1}$. As explained above, it is almost impossible to write down the equation of motion at the very micro level, say the level of quantum mechanics. However, some micro-dynamics causing the change from $X^n$ to $X^{n+1}$ must exist, even if we cannot know its exact form. We denote this dynamics by $\Lambda^*_{\mathrm{micro}}$ and its extension to a macro scale, properly considered, by $\Lambda^*$. Even when the micro-dynamics $\Lambda^*_{\mathrm{micro}}$ is hidden, we can somehow find the macro-dynamics $\Lambda^*$. The macro-dynamics we consider here is one in usual discrete probability theory, to which we can apply the usual Shannon information theory. So the macro-dynamics $\Lambda^*$ is nothing but a channel from the probability space of $X^n$ to the probability space of $X^{n+1}$. After the two amino acid sequences are aligned, they are denoted by the same symbols $X^n$ and $X^{n+1}$. The complete event system of $X^n$ is determined by the occurrence probability $p_i$ of each amino acid $a_i$ $(i = 1, 2, \ldots, 20)$ and $p_0$ of the gap $*$:
$$(X^n, p) = \begin{pmatrix} * & a_1 & \cdots & a_{20} \\ p_0 & p_1 & \cdots & p_{20} \end{pmatrix},$$
where $\sum_{i=0}^{20} p_i = 1$. In the same way, the complete event system of $X^{n+1}$ is denoted by
$$(X^{n+1}, \bar{p}) = \begin{pmatrix} * & a_1 & \cdots & a_{20} \\ \bar{p}_0 & \bar{p}_1 & \cdots & \bar{p}_{20} \end{pmatrix}.$$
The compound event system of $X^n$ and $X^{n+1}$ is
$$(X^n \times X^{n+1}, r) = \begin{pmatrix} ** & *a_1 & \cdots & a_{20}a_{20} \\ r_{00} & r_{01} & \cdots & r_{20\,20} \end{pmatrix},$$
where $r_{ij}$ represents the joint probability for the event $a_i$ of $X^n$ and the event $a_j$ of $X^{n+1}$, satisfying
$$\sum_{j=0}^{20} r_{ij} = p_i, \qquad \sum_{i=0}^{20} r_{ij} = \bar{p}_j.$$
So the dynamics $\Lambda^*$ describing the change of sequence from $X^n$ to $X^{n+1}$ is given by a certain mapping, called a channel, sending the probability distribution $p$ to $\bar{p} \equiv \Lambda^* p$. It is difficult to know the details of this dynamics in the course of sequence changes. The ECD can be used to measure the complexity, which is one aspect of the sequence change, without knowing the exact dynamics [7]. The ECD for the amino acid sequences is given by the following formula:
$$\mathrm{ECD}(X^n, X^{n+1}) \equiv \sum_i p_i S(\Lambda^* \delta_i),$$
where $S(\cdot)$ is the Shannon entropy and
$$p = \sum_i p_i \delta_i, \qquad \delta_i(j) = \begin{cases} 1 & (i = j) \\ 0 & (i \ne j). \end{cases}$$
Note that $\mathrm{ECD}(X^n, X^{n+1})$ is also written as $\mathrm{ECD}(p, \Lambda^*)$ to indicate $p$ and $\Lambda^*$ explicitly. This chaos degree was originally introduced to measure how much chaos is produced by the dynamics $\Lambda^*$ [10, 11]. Therefore it is interpreted as follows: (1) $\Lambda^*$ produces chaos iff ECD > 0; (2) $\Lambda^*$ does not produce chaos iff ECD = 0. Moreover, the chaos degree $\mathrm{ECD}(X^n, X^{n+1})$ provides a certain difference between $X^n$ and $X^{n+1}$ through the change from $X^n$ to $X^{n+1}$, so that the chaos degree characterizes the dynamics changing $X^n$ to $X^{n+1}$. We calculated the ECD of the dynamics leading to sequence changes of the V3 region, using sequences obtained from patients infected with HIV-1 at several points in time after infection or seroconversion.
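A minimal sketch of the ECD computation for two aligned sequences follows. It rests on two assumptions that the text does not spell out: the joint probabilities $r_{ij}$ are estimated as relative frequencies of aligned symbol pairs, and $\Lambda^* \delta_i$ is taken to be the conditional distribution $r_{ij}/p_i$, so that the ECD reduces to $\sum_{i,j} r_{ij} \log (p_i / r_{ij})$. The natural logarithm and the function name are likewise illustrative choices.

```python
from collections import Counter
from math import log


def entropic_chaos_degree(seq_n, seq_n1):
    """ECD of two aligned sequences of equal length ('*' marks a gap)."""
    if len(seq_n) != len(seq_n1) or not seq_n:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    length = len(seq_n)
    joint = Counter(zip(seq_n, seq_n1))  # counts behind r_ij
    marginal = Counter(seq_n)            # counts behind p_i
    # ECD = sum_i p_i S(Lambda* delta_i) = sum_ij r_ij log(p_i / r_ij),
    # since Lambda* delta_i is taken to be the conditional distribution r_ij / p_i.
    return sum(count / length * log(marginal[a] / count)
               for (a, _), count in joint.items())
```

Under these assumptions the ECD vanishes when each symbol of the earlier sequence is always aligned with a single symbol of the later one, and grows as the same symbol is replaced by several different ones.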
2.3. Longitudinal Sequence Data
We obtained envelope V3 sequence data from several longitudinal studies of HIV-1 infection [12-16]. Some patients had progressed to AIDS and died of AIDS-related complications, while others have been asymptomatic during the period of follow-up. This paper presents the results for four patients as representatives of many patients.
3. Results and Discussion
The values of $D_C$ for various codes $C$ and of the ECD for the V3 sequences are shown in Figure 1 for Patient A and Patient B, who have been asymptomatic during the follow-up period, and in Figure 2 for Patient C and Patient D, who had progressed to AIDS and died of AIDS-related complications during the follow-up period.
Figure 1. The value of $D_C$ and the value of the ECD for the V3 sequences obtained at each point in time from two patients (Patient A and Patient B) who have been asymptomatic during the follow-up period. For each patient, one panel plots $D_C$ and one panel plots the ECD against months post seroconversion. The legend lists, with their generator polynomials and error-correcting capacities, (15, 10)-cyclic codes, (21, 14)-cyclic codes, and self-orthogonal codes (constraint lengths 69 and 120).
The values of $D_C$ for the V3 region obtained from asymptomatic patients were low for the artificial codes represented by dark blue, pink and light blue at all points in time. In addition, all codes had a constant value of $D_C$. Although Patient B is a long-term non-progressor, CD4+ T-lymphocyte counts gradually decrease from around 100 months post-HIV-1 seroconversion. We observed that the code structure of the V3 region showed a change prior to the decrease of CD4+ T-lymphocyte counts. The ECD of patients who were not diagnosed with AIDS during follow-up, like Patient A, maintained a stable and low value. For Patient B, the value of the ECD for the V3 sequences obtained at 70 months and 100 months post-HIV-1 seroconversion was larger than that at other points in time.
Figure 2. The value of $D_C$ and the value of the ECD for the V3 sequences obtained at each point in time from two patients (Patient C and Patient D) who had progressed to AIDS and died of AIDS-related complications during the follow-up period. The color scheme of the various codes is the same as that used in Figure 1. The points in time after AIDS diagnosis (CD4+ T-lymphocyte counts of less than 200 cells/μL) are represented in red. Patient C and Patient D died at 97 and 109 months post-HIV-1 seroconversion, respectively.
The code structure of the V3 region obtained from patients who had progressed to AIDS and died of AIDS-related complications during the follow-up period changed throughout the entire course of HIV-1 infection. The values of $D_C$ for Patient C and Patient D were low for the artificial codes represented by dark blue, pink and light blue during the early stage of HIV-1 infection. However, as HIV-1 disease progressed, the structure of the V3 region became completely different from the structure of such codes. In contrast, the V3 region was close to the structure of the code represented by orange or pale purple during the late stage of HIV-1 infection. The code structure of the V3 region changed prior to the decrease of CD4+ T-lymphocyte counts. We consider that the changes of the code structure are related to the stage of HIV-1 disease. The values of the ECD for asymptomatic patients with HIV-1 infection were comparatively low. The value of the ECD gradually increases from asymptomatic HIV-1 infection to around the time of AIDS diagnosis, and then continues to decrease until death during the AIDS period. Therefore we can infer a patient's stage of disease progression from the variation pattern of the ECD.
References
1. M. Ohya and K. Sato, Use of information theory to study genome sequences, Rep. Math. Phys., vol. 46, 419-428 (2000).
2. M. Ohya and K. Sato, The Code Structure of HIV-1, Open Sys. & Information Dyn., vol. 14, 295-306 (2007).
3. R. S. Ingarden, A. Kossakowski and M. Ohya, Information Dynamics and Open Systems, Kluwer Academic Publishers (1997).
4. I. V. Volovich and M. Ohya, Mathematical Foundation of Quantum Information and Computation with applications to nano- and bio-sciences, to be published by Springer-Verlag.
5. M. Ohya, Information theoretical treatment of genes, IEICE E72, No. 5, 556-560 (1989).
6. M. Ohya, Information Dynamics and Its Applications to Optical Communication Processes, Lect. Note Phys., vol. 378, 81-92 (1991).
7. M. Ohya, Complexities and their application to characterization of chaos, Int. J. Theor. Phys., vol. 37, 495-505 (1998).
8. K. Sato, T. Tanabe and M. Ohya, How to Classify Influenza A Viruses and Understand their Severity, Open Sys. & Information Dyn., vol. 17, 297-310 (2010).
9. K. Sato and M. Ohya, Analysis of the disease course of HIV-1 by entropic chaos degree, Amino Acids, vol. 20, 155-162 (2001).
10. M. Ohya, Adaptive Dynamics and its Applications to Chaos and NPC Problem, QP-PQ: Quantum Probability and White Noise Analysis, Quantum Bio-Informatics, vol. 21, 181-216 (2007).
11. K. Inoue, M. Ohya and K. Sato, Application of chaos degree to some dynamical systems, Chaos, Solitons and Fractals, vol. 11, 1377-1385 (2000).
12. R. Shankarappa, J. B. Margolick, S. J. Gange, A. G. Rodrigo, D. Upchurch, H. Farzadegan, P. Gupta, C. R. Rinaldo, G. H. Learn, X. He, X. L. Huang and J. I. Mullins, Consistent viral evolutionary changes associated with the progression of human immunodeficiency virus type 1 infection, J. Virol., vol. 73, 10489-10502 (1999).
13. A. Cleland, H. G. Watson, P. Robertson, C. A. Ludlam and A. J. L. Brown, Evolution of zidovudine resistance-associated genotypes in human immunodeficiency virus type 1-infected patients, J. Acquir. Immune Defic. Syndr. Hum. Retrovirol., vol. 12, 6-18 (1996).
14. T. F. Wolfs, G. Zwart, M. Bakker, M. Valk, C. L. Kuiken and J. Goudsmit, Naturally occurring mutations within HIV-1 V3 genomic RNA lead to antigenic variation dependent on a single amino acid substitution, Virology, vol. 185, 195-205 (1991).
15. L. M. Frenkel, Y. Wang, G. H. Learn, J. L. McKernan, G. M. Ellis, K. M. Mohan, S. E. Holte, S. M. De Vange, D. M. Pawluk, A. J. Melvin, P. F. Lewis, L. M. Heath, I. A. Beck, M. Mahalanabis, W. E. Naugler, N. H. Tobin and J. I. Mullins, Multiple viral genetic analyses detect low-level human immunodeficiency virus type 1 replication during effective highly active antiretroviral therapy, J. Virol., vol. 77, 5721-5730 (2003).
16. H. Imamichi, K. A. Crandall, V. Natarajan, M. K. Jiang, R. L. Dewar, S. Berg, A. Gaddam, M. Bosche, J. A. Metcalf, R. T. Davey Jr and H. C. Lane, Human immunodeficiency virus type 1 quasispecies that rebound after discontinuation of highly active antiretroviral therapy are similar to the viral quasispecies present before initiation of therapy, J. Infect. Dis., vol. 183, 36-50 (2001).
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 461-467)
THE PREDICTION OF BOTULINUM TOXIN STRUCTURE BASED ON IN SILICO AND IN VITRO ANALYSIS
TOMONORI SUZUKI AND SATORU MIYAZAKI
Department of Medicinal and Life Science, Faculty of Pharmaceutical Sciences, Tokyo University of Science, 2641 Yamazaki, Noda 278-8510, Japan
Many biological systems are mediated through protein-protein interactions. Knowledge of the protein-protein complex structure is required for understanding the function. The determination of large and flexible protein-protein complex structures by experimental studies remains difficult, costly and time-consuming; therefore computational prediction of protein structures by homology modeling and docking studies is a valuable method. In addition, MD simulation is one of the most powerful methods, allowing one to see the real dynamics of proteins. Here, we predict the protein-protein complex structure of botulinum toxin to analyze its properties. These bioinformatics methods are useful for relating the flexibility of the backbone structure to the activity.
1. Introduction
In the post-genome era, since it is generally assumed that the function of a protein is closely linked to its three-dimensional structure, many research groups work on protein structure determination by X-ray crystallography or solution nuclear magnetic resonance (NMR). There are more than 68,000 structures registered in the Protein Data Bank (PDB) as of 2010. However, since multidomain proteins and complex proteins are often difficult to crystallize and many are too large for NMR structure determination, the rate of structure determination is still low compared with sequence determination. On the other hand, a number of computational methods have been developed for the prediction of protein structures and interactions from genomic and structural information. We have been studying the structures of botulinum toxin, which is a large multimeric protein complex consisting of several different components. Some of the component and complex structures have not yet been determined. Here, we describe the prediction of botulinum toxin and related protein structures by computational methods. Clostridium botulinum is an anaerobic bacterium which produces seven distinct serotypes (A-G) of neurotoxins (BoNT; 150 kDa) [1]. BoNT is synthesized as a single chain and associates with non-toxic proteins in culture supernatants. These include the non-toxic non-hemagglutinin (NTNHA; 130 kDa) and three types of hemagglutinin (HA) subcomponents, HA-70, HA-33 and HA-17 (70, 33 and 17 kDa, respectively), forming a large toxin complex (TC) held together by non-covalent binding. We analyzed the 3D structure and function of TCs of serotypes C and D. In the case of serotypes C and D, the 3D structures of HA-33 (C and D), HA-17 (D) and HA-70 (C) have been determined by X-ray crystal diffraction. However, other structures are still unknown. So far, we have reported the model structure of the TC based on in vitro experimental data (Fig. 1) [2, 3], some interaction regions [2], and the predicted 3D structures of BoNT, trypsin-like protease (TLP) and the TLP/BoNT complex (Fig. 2) [4]. Thus, determination or prediction of the individual components of the TC has been almost accomplished. In this work, we predicted the 3D structures of NTNHA and HA-70 produced by C. botulinum serotype D.
Figure 1. Schematic model of Botulinum toxin complex. The toxin complex consists of BoNT (red), NTNHA (green), HA-70 (yellow), HA-33 (blue) and HA-17 (cyan).
Figure 2. Predicted structures of serotype C BoNT (A), TLP (B) and BoNT/TLP complex (C).
2. Modeling of NTNHA structures
No structures of NTNHA have been determined by X-ray diffraction crystallography as yet. The NTNHA sequence has about 40% sequence homology to BoNT and tetanus neurotoxin (TeNT). The structures of BoNT and the C-terminal region of TeNT have been determined [5, 6, 7]. Thus, we predicted the structure of the C-terminal region of NTNHA. The modeled structure of NTNHA produced by serotype D was constructed using the program Discovery Studio v1.7 (Accelrys Software Inc., San Diego, CA, USA). For modeling of serotype D NTNHA, serotype A (PDB ID: 3BTA) and B (1EPW) BoNTs and TeNT (1DIW) were used as template structures. The commonly conserved regions present in the NTNHA and template structures were determined by sequence homology and secondary structure alignment (Fig. 3). Secondary structure prediction of the NTNHA was carried out using PSI-PRED [8]. Validation of the models was carried out using a Ramachandran plot computed with the Swiss-PdbViewer (available at http://swissmodel.expasy.org/spdbv/) after structure minimization. In Ramachandran plot space, over 80% of residues were in the favored region. As a result of modeling, the C-terminal region of NTNHA was predicted (Leu837 to Ser1191) (Fig. 4A). There are gaps in the sequence alignment of NTNHA and the neurotoxins, and most of these gap regions correspond to long loop regions of the neurotoxins (Fig. 4B). This result suggests that the NTNHA structure is more compact than the neurotoxin structures.
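A Ramachandran plot is built from the backbone dihedral angles φ and ψ of each residue. The sketch below shows one common way to compute a dihedral angle from four atomic coordinates; it is an illustrative stand-alone function, not the procedure used by Swiss-PdbViewer, and it does not itself classify residues into favored regions.

```python
from math import atan2, degrees


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points (e.g. C, N, CA, C for phi)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    scale = _dot(b1, b1) ** 0.5
    b1 = (b1[0] / scale, b1[1] / scale, b1[2] / scale)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return degrees(atan2(y, x))
```

For residue i, φ would be computed from the atoms C(i-1), N(i), CA(i), C(i) and ψ from N(i), CA(i), C(i), N(i+1); the fraction of (φ, ψ) pairs falling inside a chosen favored region then gives a validation figure of the kind quoted above.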
Figure 3. Sequence alignment of NTNHA and template structures (C-terminal domain of neurotoxins).
Figure 4. (A) Predicted C-terminal region of NTNHA structure. (B) Superimposed structure with template structures. Yellow; NTNHA, white; BoNT/A, blue; BoNTlB and pink; TeNT.
3. Modeling of HA-70 structure
For modeling of the serotype D HA-70 structure, the serotype C HA-70 structure (PDB ID: 2ZS6) was used as the template structure [9]. Sequence homology between serotype C and D HA-70 is over 90%; however, the serotype C HA-70 structure contains a deleted region. To fill the deleted region, we used a partial structure of integrin as a template (PDB ID: 3FCS) [10]. In the Ramachandran plot, over 80% of the residues of the predicted HA-70 were in the favored region. The single-chain HA-70 is nicked by TLP at a specific site and is thereby split into a dichain structure [11]. In the predicted intact HA-70 structure, the nicking site was exposed on the molecular surface, where it is likely to be specifically attacked by TLP (Fig. 5).
We have predicted the structures of BoNT, NTNHA (C-terminal region only) and HA-70 of serotype D. The structure of HA-33/HA-17 has been determined by X-ray crystallography [3], and some interaction regions between TC components have been estimated by in vitro experiments [2, 12]. To clarify the botulinum TC structure, we are attempting to determine the complete structures of the subcomponents and their interaction sites.
Figure 5. (A) Predicted serotype D HA-70 structure. (B) Close-up view of nicking site.
Acknowledgments
The authors thank Drs. T. Ohyama, T. Watanabe and K. Niwa, Tokyo University of Agriculture, for their in vitro experiments.
References
1. H. Sugiyama, Microbiol. Rev. 44, 419 (1980).
2. T. Suzuki, T. Watanabe, S. Mutoh, K. Hasegawa, H. Kouguchi, Y. Sagane, Y. Fujinaga, K. Oguma and T. Ohyama, Microbiol-UK 151, 1475 (2005).
3. K. Hasegawa, T. Watanabe, T. Suzuki, A. Yamano, T. Oikawa, Y. Sato, H. Kouguchi, T. Yoneyama, K. Niwa, T. Ikeda and T. Ohyama, J. Biol. Chem. 282, 24777 (2008).
4. T. Suzuki, T. Yoneyama, K. Miyata, A. Mikami, T. Chikai, K. Inui, H. Kouguchi, K. Niwa, T. Watanabe, S. Miyazaki and T. Ohyama, Biochem. Biophys. Res. Commun. 379, 309 (2009).
5. D. B. Lacy, W. Tepp, A. C. Cohen, E. R. DasGupta and R. C. Stevens, Nat. Struct. Biol. 5, 898 (1998).
6. S. Swaminathan and S. Eswaramoorthy, Nat. Struct. Biol. 7, 693 (2000).
7. P. Emsley, C. Fotinou, I. Black, N. F. Fairweather, I. G. Charles, C. Watts, E. Hewitt and N. W. Isaacs, J. Biol. Chem. 275, 8889 (2000).
8. D. T. Jones, J. Mol. Biol. 292, 195 (1999).
9. T. Nakamura, M. Kotani, T. Tonozuka, A. Ide, K. Oguma and A. Nishikawa, J. Mol. Biol. 385, 1193 (2008).
10. J. Zhu, B. H. Luo, T. Xiao, C. Zhang, N. Nishida and T. A. Springer, Mol. Cell 32, 849 (2008).
11. K. Oguma, K. Inoue, Y. Fujinaga, K. Yokota, T. Watanabe, T. Ohyama, K. Takeshi and K. Inoue, J. Toxicol. Toxin Rev. 18, 17 (1999).
12. T. Suzuki, H. Kouguchi, T. Watanabe, H. Hasegawa, T. Yoneyama, K. Niwa, A. Nishikawa, J. C. Lee, K. Oguma and T. Ohyama, Protein J. 26, 173 (2007).
Quantum Bio-Informatics IV eds. L. Accardi, W. Freudenberg and M. Ohya © 2011 World Scientific Publishing Co. (pp. 469-490)
ON THE MECHANISM OF D-WAVE HIGH Tc SUPERCONDUCTIVITY BY THE INTERPLAY OF JAHN-TELLER PHYSICS AND MOTT PHYSICS

H. USHIO
Tokyo National College of Technology, 1220-2 Kunugida-chou, Hachioji 193-0997, Japan
E-mail: [email protected]

S. MATSUNO
Shimizu General Education Center, Tokai University, 3-20-1 Orido Shimizu-ku, Shizuoka 424-8610, Japan
E-mail: [email protected]

H. KAMIMURA
Department of Applied Physics, Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-ku, Tokyo 162-8601, Japan
E-mail: [email protected]

In the present paper we will discuss two important roles of the interplay of Jahn-Teller physics and Mott physics. One is the small Fermi surface. The "Fermi arcs" observed in ARPES should be one of the edges of small Fermi pockets, based on the Kamimura-Suwa model (K-S model). This prediction is consistent with ARPES results by Tanaka et al. Another is the mechanism of superconductivity in cuprates. This can be explained by the interplay of strong electron-phonon interactions and local AF order. It is shown that the characteristic phase difference of wave functions between up- and down-spin carriers in the presence of the local AF order leads to the superconducting gap of $d_{x^2-y^2}$ symmetry even in the phonon-involved mechanism.
1. Introduction
In 1986 Bednorz and Müller discovered high temperature superconductivity in copper oxides, motivated by the idea that a higher Tc could be achieved by combining Jahn-Teller (JT) active Cu ions with the structural complexity of layer-type perovskite oxides [1]. Their idea captures an essential point in considering the mechanism of high Tc superconductivity.
Figure 1. Spatial extension of the $|a_{1g}^*\rangle$ antibonding orbital and the $|b_{1g}\rangle$ bonding orbital.
Undoped copper oxide La2CuO4 is an antiferromagnetic Mott insulator, in which electron correlation plays an important role. Thus we may say that undoped cuprates are governed by Mott physics. Most models of high temperature superconductivity assume that the doped holes itinerate through an orbital of $b_{1g}$ symmetry extended over a CuO2 plane in systems consisting of CuO6 octahedrons elongated by the JT effect. Those models are called "single-component theory", because they consider only hole-carrier orbitals that extend over a CuO2 plane. This single-component theory, however, met a serious difficulty: in the presence of antiferromagnetic (AF) order constructed from the localized spins, the hole carriers cannot move smoothly. In order to remove this difficulty, the RVB model [2], the t-J model [3], etc. were proposed. When Sr2+ ions are substituted for La3+ ions in LSCO, first-principles calculations showed that the apical oxygens in the CuO6 octahedrons tend to approach the Cu2+ ions [4,5]. As a result, the CuO6 octahedrons elongated by the JT effect shrink upon hole doping. This deformation against the JT distortion is called the "anti-Jahn-Teller effect" [6].
By this effect, the energy separation between the two kinds of orbital states, the $a_{1g}^*$ antibonding orbital state and the $b_{1g}$ bonding orbital state, becomes smaller [7,8]. The spatial extension of the $a_{1g}^*$ antibonding orbital and the $b_{1g}$ bonding orbital is shown in figure 1. By taking account of the anti-Jahn-Teller effect, Kamimura and Suwa proposed that one must consider the $a_{1g}^*$ and $b_{1g}$ states equally in forming the metallic state of cuprates, and they constructed a metallic state coexisting with the local AF order. This model is now called the "Kamimura-Suwa model" (K-S model), after the authors of the paper by H. Kamimura and Y. Suwa [9]. In this paper we discuss two important results obtained from the interplay of Jahn-Teller physics and Mott physics. One concerns the Fermi surface. On the basis of the K-S model we show that the "Fermi arcs" observed in ARPES should be one of the edges of real small Fermi pockets in the nodal region. This prediction is shown to be consistent with recent ARPES results reported by Tanaka et al. [11] and by Meng et al. [22]. As the second important result obtained from the interplay of Jahn-Teller physics and Mott physics, we show that superconductivity is caused by the interplay of strong electron-phonon interactions and an AF order.
It is shown that (1) the characteristic phase difference of the wave functions between up- and down-spin carriers in the presence of the AF order leads to a superconducting gap of $d_{x^2-y^2}$ symmetry even in the phonon-involved mechanism, (2) the out-of-plane phonon modes in LSCO with tetragonal symmetry contribute to the formation of Cooper pairs while the in-plane modes do not, (3) Tc is higher in a cuprate with CuO5 pyramids than in one with CuO6 octahedrons because the number of out-of-plane modes that contribute to d-wave superconductivity is larger in the former than in the latter, and (4) the calculated hole-concentration dependences of Tc and of the isotope effect for LSCO are consistent with recent experimental results. The present paper is organized in three parts. In the first part (Section 2) we briefly review the K-S model. In the second part (Section 3) we give the key features of the energy bands and Fermi surfaces including many-body effects. In the final part (Section 4) we show how the d-symmetry superconducting gap appears from the K-S model. Section 5 is devoted to the conclusion and concluding remarks.
2. Brief review of the Kamimura-Suwa (K-S) model
2.1. Anti-Jahn-Teller Effect
When Sr2+ ions are substituted for La3+ ions in LSCO, one may think intuitively that the apical oxygens in the CuO6 octahedrons tend to approach the central Cu2+ ions in order to gain attractive electrostatic energy. Theoretically, first-principles variational calculations in the spin-density-functional approach [4,5] showed that the optimized distance between apical O and Cu, which minimizes the total energy of LSCO, decreases with increasing Sr concentration. As a result, the CuO6 octahedrons elongated by the Jahn-Teller (JT) interaction shrink upon hole doping. We have called this shrinking effect against the Jahn-Teller distortion the "anti-Jahn-Teller effect" [6].
2.2. The energy landscape of CuO6 octahedron
In order to clarify the character of the hole carriers, we first describe the many-electron states of a CuO6 octahedron. Figure 2 shows the energy-level landscape starting from the $e_g$ and $t_{2g}$ orbitals of a Cu2+ ion in a regular CuO6 octahedron with octahedral symmetry, shown in the left column, for the case of LSCO. By the JT effect the Cu $e_g$ orbital state splits into $a_{1g}$ and $b_{1g}$ orbital states, which form antibonding and bonding molecular orbitals with oxygen p orbitals. The molecular orbitals are denoted by $a_{1g}^*$, $a_{1g}$, $b_{1g}^*$ and $b_{1g}$, as shown in the middle column. In the undoped case, 7 electrons occupy these cluster orbitals, so that the highest occupied $b_{1g}^*$ state is half filled, resulting in an S = 1/2 state, where the $b_{1g}^*$ orbital has Cu $d_{x^2-y^2}$ character. Following Mott physics, we introduce the Hubbard U interaction (U = 10 eV) as a strong electron-correlation effect. Then the $b_{1g}^*$ state splits into the lower and upper Hubbard bands, denoted by L.H. and U.H. in the figure, and this gives rise to a localized spin around a Cu site. These localized spins form the antiferromagnetic (AF) order by the superexchange interaction via an intervening O2- ion in undoped La2CuO4. Now let us remove one electron from the present system. In other words, let us consider the case in which one hole is introduced into this CuO6 cluster, which has an up-spin electron as the localized spin. Due to the strong electron-electron interaction, it is not the $b_{1g}^*$ electron that is removed from the original electron configuration, as one would expect from the one-electron energy diagram. Instead, there are two possibilities for removing an electron:
Figure 2. Energy diagram showing the interplay of Mott physics (electron correlation and the exchange interaction between the spins of a doped hole and a localized hole) and Jahn-Teller physics (anti-JT effect). Results of the first-principles calculations by Kamimura and Eto [7,8].
the $b_{1g}$ and $a_{1g}^*$ states. Since the multiplet energy of the six-electron state with the $b_{1g}^*$ electron absent becomes very large due to the Hubbard repulsion of $b_{1g}$, other multiplet states become more favorable: the many-electron states with one electron occupying the $b_{1g}^*$ state and the other in the $b_{1g}$ or $a_{1g}^*$ orbital, with the other orbitals doubly occupied. The former state must have total spin S = 0 due to the exchange interaction between the $b_{1g}$ and $b_{1g}^*$ states, while the latter is a spin triplet because of Hund's coupling between the $b_{1g}^*$ and $a_{1g}^*$ states. In the right column of figure 2, the present situation is described by assigning an effective "one-electron energy" to each $e_g$ electron in a CuO6 octahedron. For example, removing an up-spin electron from an $a_{1g}^*$ orbital costs 0.5 eV more than removing a down-spin electron from the same orbital, as shown in the figure. Next, we show the energy landscape in the "hole picture" in figure 3. As is
Figure 3. Energy diagram in the hole picture when one down-spin hole is doped into a CuO6 cluster embedded in LSCO.
shown in figure 3, when one down-spin hole is doped, it occupies the $b_{1g}$ orbital at the A-site. This hole can then transfer from the $b_{1g}$ orbital at the A-site to the $a_{1g}^*$ orbital at the B-site. In the hole picture, the localized up spin at the A-site and the localized down spin at the B-site occupy the U.H. band. Consequently, the dopant down-spin hole in the $a_{1g}^*$ orbital at the B-site is parallel to the localized spin in the localized $b_{1g}^*$ orbital by Hund's coupling. This spin-triplet multiplet is called the Hund's coupling triplet $^3B_{1g}$. On the other hand, the dopant down-spin hole in the $b_{1g}$ orbital at the A-site is antiparallel to the localized spin in the localized $b_{1g}^*$ orbital. This spin-singlet multiplet is called the Zhang-Rice singlet $^1A_{1g}$. The spatial distributions of these two kinds of multiplets are shown in the upper part of figure 3. By first-principles cluster calculations that take into account the Madelung potential due to all the ions surrounding a CuO6 cluster in LSCO, Kamimura and Eto showed that the lowest-state energies of these two multiplets are nearly equal when both the Hund's coupling and the spin-singlet
exchange interaction are taken into account [7,8]. Consequently the energy difference between the highest occupied orbital states, $a_{1g}^*$ in the $^3B_{1g}$ multiplet and $b_{1g}$ in the $^1A_{1g}$ multiplet, becomes only 0.1 eV for the optimum doping (x = 0.15) in La2-xSrxCuO4. Thus, when two CuO6 clusters with localized up and down spins are placed at nearest-neighbor sites, these states are easily mixed by the transfer interaction between the $a_{1g}^*$ and $b_{1g}$ orbitals, which is about 0.3 eV for LSCO.
2.3. Kamimura-Suwa model (K-S model)
By using the calculated results of Kamimura and Eto [7,8] and assuming that the localized spins form an AF order in a spin-correlated region, Kamimura and Suwa [9] constructed a metallic state of LSCO for its underdoped regime. The feature of the K-S model is that the hole carriers in the underdoped regime of LSCO form a metallic state by taking the Hund's coupling triplet and the Zhang-Rice singlet alternately in the presence of the local AF order, without destroying the AF order. To illustrate the coexistence of the AF order and the metallic state, figure 4 shows a cartoon in which the localized hole states and the carrier hole states are likened to the first and second stories of a house. In figure 4 the first story of a Cu house with a (yellow) roof is occupied by the Cu localized spins, which form the AF order in the spin-correlated region by the superexchange interaction. The second story of a Cu house consists of two floors, a lower $a_{1g}^*$ floor and an upper $b_{1g}$ floor. The second stories of neighboring Cu houses are connected by oxygen rooms with (blue) roofs. In the second story, a hole carrier with up spin enters the $a_{1g}^*$ floor of the left-hand Cu house because of Hund's coupling with the Cu localized up spin in the first story (Hund's coupling triplet), as shown in the extreme left column of the figure. By the transfer interaction, the hole is transferred through the oxygen room into the $b_{1g}$ floor of the neighboring Cu house (the second from the left), where the hole with up spin forms a spin-singlet state with the localized down spin (Zhang-Rice singlet). Thus the AF order is preserved when a hole carrier itinerates. The important feature of the K-S model is the coexistence of the AF order and a metallic (superconducting) state in the underdoped regime. This feature is different from that of the single-component theory.
Figure 4. An extended two-story house model (K-S model).
2.4. Effective Hamiltonian for the Kamimura-Suwa (K-S) model
The following effective Hamiltonian was introduced by Kamimura and Suwa [9] in order to describe the extended two-story house (K-S) model. It consists of four parts: the effective one-electron Hamiltonian $H_{\rm eff}$ for the $a_{1g}^*$ ($a_1^*$) and $b_{1g}$ ($b_1$) orbital states; the transfer interaction between neighboring CuO6 octahedrons (CuO5 pyramids), $H_{\rm tr}$; the superexchange interaction between the Cu $d_{x^2-y^2}$ localized spins, $H_{\rm AF}$; and the exchange interactions between the spins of dopant holes and of $d_{x^2-y^2}$ localized holes within the same CuO6 octahedron (CuO5 pyramid), $H_{\rm ex}$. Thus we have

$$
H = H_{\rm eff} + H_{\rm tr} + H_{\rm AF} + H_{\rm ex}
  = \sum_{i,m,\sigma} \varepsilon_m C^\dagger_{im\sigma} C_{im\sigma}
  + \sum_{\langle i,j \rangle,m,n,\sigma} t_{mn} \left( C^\dagger_{im\sigma} C_{jn\sigma} + {\rm h.c.} \right)
  + J \sum_{\langle i,j \rangle} \mathbf{S}_i \cdot \mathbf{S}_j
  + \sum_{i,m} K_m \, \mathbf{s}_{i,m} \cdot \mathbf{S}_i , \qquad (1)
$$
where $\varepsilon_m$ ($m = a_{1g}^*$ ($a_1^*$) or $b_{1g}$ ($b_1$)) represents the effective one-electron energy of the $a_{1g}^*$ ($a_1^*$) and $b_{1g}$ ($b_1$) orbital states, $C^\dagger_{im\sigma}$ and $C_{im\sigma}$ are the creation and annihilation operators of a dopant hole with spin $\sigma$ in the i-th CuO6 octahedron (i-th CuO5 pyramid), respectively, $t_{mn}$ are the effective transfer integrals of a dopant hole between m-type and n-type orbitals of neighboring CuO6 octahedrons (CuO5 pyramids), $J$ is the superexchange interaction between the spins $\mathbf{S}_i$ and $\mathbf{S}_j$ of $d_{x^2-y^2}$ localized holes in the $b_{1g}^*$ ($b_1^*$) orbital at the nearest-neighbor Cu sites i and j ($J > 0$ for AF interaction), and $K_m$ is the exchange integral for the exchange interaction between the spin of a dopant hole $\mathbf{s}_{i,m}$ and the $d_{x^2-y^2}$ localized spin $\mathbf{S}_i$ in the i-th CuO6 octahedron (i-th CuO5 pyramid). The values of the parameters in Eq. (1) for the case of LSCO are: $J = 0.1$, $K_{a_{1g}^*} = -2.0$, $K_{b_{1g}} = 4.0$, $t_{a_{1g}^* a_{1g}^*} = 0.2$, $t_{b_{1g} b_{1g}} = 0.4$, $t_{a_{1g}^* b_{1g}} = \sqrt{t_{a_{1g}^* a_{1g}^*} \, t_{b_{1g} b_{1g}}} \sim 0.28$, $\varepsilon_{a_{1g}^*} = 0$, $\varepsilon_{b_{1g}} = 2.6$ in units of eV, where the values of the Hund's coupling exchange constant $K_{a_{1g}^*}$ and the Zhang-Rice exchange constant $K_{b_{1g}}$ are taken from the first-principles cluster calculations for a CuO6 octahedron in LSCO [7,8], and the difference of the effective one-electron energies between the $a_{1g}^*$ and $b_{1g}$ orbital states, $\varepsilon_{a_{1g}^*} - \varepsilon_{b_{1g}} = -2.6$, is determined so as to reproduce the energy difference between the $^3B_{1g}$ and $^1A_{1g}$ multiplets in LSCO in the MCSCF cluster calculations [8], while the values of the $t_{mn}$ are taken from band structure calculations [4,5].
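As a quick arithmetic check of the parameter set quoted above, the following Python sketch recomputes the inter-orbital transfer integral as the geometric mean of the two intra-orbital ones and the difference of the effective one-electron energies. The variable names are illustrative choices made here and are not notation from the original calculations.

```python
import math

# Parameter values for LSCO quoted in the text, in eV.
t_aa = 0.2   # transfer integral between a1g* orbitals on neighboring sites
t_bb = 0.4   # transfer integral between b1g orbitals on neighboring sites
eps_a = 0.0  # effective one-electron energy of the a1g* state
eps_b = 2.6  # effective one-electron energy of the b1g state

# Inter-orbital transfer integral as the geometric mean of the two above.
t_ab = math.sqrt(t_aa * t_bb)

# Difference of the effective one-electron energies.
delta_eps = eps_a - eps_b

print(f"t_ab = {t_ab:.2f} eV")
print(f"eps_a - eps_b = {delta_eps:.1f} eV")
```

The geometric mean evaluates to about 0.28 eV, in line with the value listed with the other parameters.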
3. Effective energy band and the shape of Fermi surface
Ushio and Kamimura [13,12] started from a tight-binding Hamiltonian. They then obtained the effective Hamiltonian (1), which includes many-body effects, by treating the exchange term as a molecular field, determined so that the energies of the $^1A_{1g}$ and $^3B_{1g}$ multiplets coincide with those calculated by Eto and Kamimura [8].
3.1. Effective energy band
In figure 5 (b), the calculated energy band structure, including many-body effects, for up-spin (or down-spin) dopant holes in LSCO is shown for various values of the wave vector k and symmetry points in the AF Brillouin zone. Note that the energy in this figure is the electron energy, not the hole energy, and that the Hubbard bands for localized $b_{1g}^*$ holes do not appear in this figure. In undoped La2CuO4, all the energy bands in figure 5 (b) are fully occupied by electrons, so that La2CuO4 is an insulator, consistent with experimental results. In this respect the present effective energy band structure is completely different from the ordinary LDA energy bands [16,17]. When Sr is doped, holes begin to occupy the top of the highest band in figure 5 (b), marked by #1, at the $\Delta$ point, which corresponds to $(\pi/2a, \pi/2a, 0)$ in the AF Brillouin zone. At the onset concentration of superconductivity, the Fermi level is located at the energy E = 9.04 eV just below the top of the #1 band at $\Delta$, which is a little higher than that of the $G_1$ point. Here the $G_1$ point in the AF Brillouin zone lies at $(\pi/a, 0, 0)$ and corresponds to a saddle point of the van Hove singularity.

Figure 5. (a) The Fermi surface of hole carriers for x = 0.15 calculated for the #1 band. Here the $k_x$ axis is taken along $\Gamma G_1$, corresponding to the x-axis (the Cu-O-Cu direction) in real space. (b) The band structure including many-body effects for up-spin dopant holes, obtained by Ushio and Kamimura [13,12]. In this figure the Cu-O-Cu distance a is taken to be unity. Upon doping, a hole carrier begins to occupy the band from the $\Delta$ point.

3.2. The shape of Fermi surface
Based on the calculated band structure shown in figure 5 (b), Ushio and Kamimura [13,6] calculated the Fermi surfaces for the underdoped regime of LSCO. This Fermi surface structure is also completely different from that
of an ordinary Fermi liquid picture, in which the Fermi surface is large. In figure 5 (a) the Fermi surface structure of hole carriers calculated for x = 0.15 is shown as an example, where the Fermi surface (FS) consists of two pairs of extremely flat tubes. Thus the FS in the underdoped regime is small, and its volume is proportional to the concentration of dopant holes.
Figure 6. Doping dependence of the Fermi arc (the inner section of the FS) of La2-xSrxCuO4 from x = 0.03 to 0.15, observed in ARPES by T. Yoshida et al. [20,21]. The calculated results based on the K-S model for x = 0.05 and x = 0.15, shown by thin curves, are superimposed on the experimental results obtained by Yoshida et al.
Since the Bloch function of the top band has Fourier components mainly in the AF BZ, only the inner section of the FS might be observed. ARPES researchers have called it the "Fermi arc". Recently the Fermi arcs of hole carriers have been observed for the underdoped regime of LSCO by Yoshida et al. [20,21]. In figure 6 the calculated FS for x = 0.05 and x = 0.15 are compared with the experimental results for La2-xSrxCuO4 with x = 0.03, x = 0.05, x = 0.07 and x = 0.15. As seen in the figure, the agreement between theory and experiment on the doping dependence of the Fermi arcs is fairly good. Recently, Meng et al. [22] reported ARPES measurements of Bi2Sr2-xLaxCuO6+δ (La-Bi2201) that reveal Fermi pockets, supporting our calculated Fermi surface. We extend our calculations to the overdoped regime of LSCO. Since
the superexchange interaction is destroyed with increasing hole concentration in the overdoped regime, the K-S model is considered not to hold beyond a certain hole concentration $x_0$. As a result the FS will change from small ones to a large one beyond $x_0$. In the case of LSCO we think that $x_0$ is 0.2, from the experimental results of Loram et al. [23] and of Nakano et al. [24].
4. High temperature superconductivity
4.1. Spin-dependent electron-phonon interaction
In this subsection we describe how the electron-phonon interaction in the K-S model depends on the spin direction of a hole carrier. We give only an outline of the theoretical derivation; readers interested in the derivation of the equations are referred to chapters 13 and 14 of our Springer textbook entitled "Theory of Copper Oxide Superconductors" [6]. On the basis of the K-S model we have shown that the interplay between the electron-phonon interaction and the AF order leads to a d-wave phonon-involved mechanism in LSCO [26,27,6]. As seen in figure 7, the atomic wave function for up spin coincides with that for down spin if it is displaced by the vector a (a is the Cu-O-Cu distance). Thus, the wave functions of a hole carrier with up and down spin in the K-S model have the following phase relation:
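(2)
From the relation (2), the electron-phonon interaction matrix element from state k to state k' with up spin, scattered by a phonon with wave vector q, $V_\uparrow(\mathbf{k}, \mathbf{k}', \mathbf{q})$, has the following spin-dependent property:

$$
V_\uparrow(\mathbf{k}, \mathbf{k}', \mathbf{q}) = \exp(i\mathbf{K} \cdot \mathbf{a}) \, V_\downarrow(\mathbf{k}, \mathbf{k}', \mathbf{q}), \qquad (3)
$$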
where $\mathbf{K} = \mathbf{k} - \mathbf{k}' - \mathbf{q}$ is a reciprocal lattice vector in the AF Brillouin zone and a is the Cu-O-Cu distance. Since a reciprocal lattice vector in the AF Brillouin zone is expressed as $\mathbf{K} = (n\pi/a, m\pi/a, 0)$ with n + m even, the electron-phonon interactions for up spin and down spin may have different signs depending on the value of $\mathbf{K}$.
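The sign rule contained in the phase factor $\exp(i\mathbf{K} \cdot \mathbf{a})$ can be checked numerically. The sketch below is only an illustration: it assumes that the displacement vector a points along the x-axis (the Cu-O-Cu direction), and the function name `phase_factor` is a hypothetical helper introduced here.

```python
import cmath
import math

def phase_factor(n, m, a=1.0):
    """Return exp(i K . a) for K = (n*pi/a, m*pi/a, 0).

    Assumption made for this sketch: the displacement vector is (a, 0, 0).
    """
    if (n + m) % 2 != 0:
        raise ValueError("K is an AF reciprocal lattice vector only if n + m is even")
    k_vec = (n * math.pi / a, m * math.pi / a, 0.0)
    a_vec = (a, 0.0, 0.0)
    dot = sum(k_i * a_i for k_i, a_i in zip(k_vec, a_vec))
    return cmath.exp(1j * dot)

# The factor is +1 for even n and -1 for odd n.
for n, m in [(0, 0), (2, 0), (2, 2), (1, 1), (1, -1)]:
    print(n, m, round(phase_factor(n, m).real))
```

Because n + m is even, n and m have the same parity, so taking the displacement along the y-axis instead would give the same sign.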
4.2. Effective inter-hole interaction via phonon
From the relation (3) and $\mathbf{K} = (n\pi/a, m\pi/a, 0)$ with n + m even, the effective interaction of a pair of holes from $(\mathbf{k}\uparrow, -\mathbf{k}\downarrow)$ to $(\mathbf{k}'\uparrow, -\mathbf{k}'\downarrow)$,
Figure 7. Schematic picture showing the phase relation between the wave functions of a hole carrier with up and down spin in the extended two-story house model for La2-xSrxCuO4.
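scattered by a phonon with wave vector q, is expressed as

$$
V_\uparrow(\mathbf{k}, \mathbf{k}', \mathbf{q}) \, V_\downarrow(-\mathbf{k}, -\mathbf{k}', \mathbf{q}) = \exp(i\mathbf{K} \cdot \mathbf{a}) \, |V_\uparrow(\mathbf{k}, \mathbf{k}', \mathbf{q})|^2 . \qquad (4)
$$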
Since $\exp(i\mathbf{K} \cdot \mathbf{a}) = +1$ for even n and $\exp(i\mathbf{K} \cdot \mathbf{a}) = -1$ for odd n, the effective interaction for forming a Cooper pair is attractive for even n and repulsive for odd n. This remarkable result in the K-S model leads to a superconducting gap of $d_{x^2-y^2}$ symmetry. In figure 8 we show how attractive and repulsive subprocesses compete with each other in a scattering process of a conduction hole by a phonon. Suppose a hole occupies state A in figure 8. A phonon with wave vector q scatters the hole from A into the shaded region, because q lies in the normal BZ. The scattering process from state A to state B therefore consists of two kinds of subprocesses. In one subprocess the hole is scattered by a phonon with wave vector q from A to state C' on the FS. Since state C' is transferred to an equivalent state B by the translation of a reciprocal lattice vector $(-\mathbf{Q}_1 - \mathbf{Q}_2)$, the effective interaction for this subprocess is attractive
Figure 8. Competition of attractive and repulsive subprocesses in the same scattering process of a carrier hole by a phonon from state A to B in the AF Brillouin zone. A hole is first scattered by a phonon to a point in the shaded area, and then to the point equivalent to it. Legend: q, phonon wave vector; $\mathbf{Q}_1$, $\mathbf{Q}_2$, AF reciprocal lattice vectors; umklapp sub-process (repulsive); normal sub-process (attractive).
according to equation (4) with n = 2. In the other subprocess the hole at state A on the FS is scattered from A to state C by a phonon with wave vector q. Since state C is equivalent to state B by the translation of a reciprocal lattice vector $(-\mathbf{Q}_1)$, the effective interaction is repulsive by equation (4) with n = m = 1. Since state C' is inside the AF Brillouin zone while state C is outside it, one may say that the scattering process from A to B via C' is Normal scattering, while the scattering process from A to B via C is Umklapp scattering. Normal and Umklapp scattering have different signs, so that the effective interaction between holes at states A and B becomes attractive or repulsive depending on the relative strengths of the two scatterings. These strengths vary with the wave vector q. As an example of calculated results, we present the k and k' dependence of the electron-phonon spectral function $\alpha^2 F_{\uparrow\downarrow}(\Omega, \mathbf{k}, \mathbf{k}')$ for a spin-singlet pair for one of the out-of-plane modes, the $A_{1g}$ mode (see the lower left corner of Fig. 9(a)), in LSCO with tetragonal symmetry in figure 9, where $F_{\uparrow\downarrow}(\Omega, \mathbf{k}, \mathbf{k}')$ is the momentum-dependent density of phonon states and $\alpha^2$ is the square of the electron-phonon coupling constant
Figure 9. Calculated result of $\alpha^2 F_{\uparrow\downarrow}(\Omega, \mathbf{k}, \mathbf{k}')$ and a gap function for one of the out-of-plane phonon modes, the $A_{1g}$ mode, in LSCO as a function of k' (regions are labeled attractive and repulsive).
[26,6]. From Fig. 9(c) we can see that the momentum-dependent spectral function takes both positive and negative values when the wave vector k' moves between the numbered sections of the small FS while k is fixed. This is clearly d-wave behavior. By using this spectral function we obtain the result in Fig. 9(c): the d-component gap function $\Delta(\mathbf{k})$ varies as $\cos(k_x a) - \cos(k_y a)$, and the wave function of a Cooper pair has a spatial extension of $d_{x^2-y^2}$ symmetry. Kamimura et al. calculated the electron-phonon spectral function $\alpha^2 F(\Omega)$ for all phonon modes of LSCO with tetragonal symmetry, and their calculated result for the d-wave component of the spectral function, $\alpha^2 F^{(2)}_{\uparrow\downarrow}(\Omega)$, is shown in figure 10 as a function of the phonon frequency $\Omega$. The phonon modes of LSCO with tetragonal symmetry are classified into in-plane modes and out-of-plane modes with regard to a CuO2 plane. Figure 10 shows that the out-of-plane modes
Figure 10. The d-wave component of $\alpha^2 F(\Omega)$ calculated for tetragonal LSCO with optimum doping, as a function of the phonon energy $\Omega$ (meV). Panels (b) and (c) show examples of phonon modes that contribute to positive and negative $\alpha^2 F$, respectively.
such as the $A_{1g}$ mode in figure 9 yield a positive spectral function, so that they contribute to the formation of Cooper pairs by acting as an attractive force, while the in-plane modes yield a negative spectral function, so that they act as a repulsive force and do not contribute to the formation of Cooper pairs.
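The $d_{x^2-y^2}$ form of the gap discussed in this subsection, $\cos(k_x a) - \cos(k_y a)$, can be illustrated with a few lines of Python. This is a minimal sketch of the angular form factor only: the gap amplitude is left out, and the function name and the chosen k points are illustrative assumptions.

```python
import math

def d_wave_form_factor(kx, ky, a=1.0):
    """Return cos(kx*a) - cos(ky*a), the d_{x^2-y^2} form factor of the gap."""
    return math.cos(kx * a) - math.cos(ky * a)

# Two points related by a 90-degree rotation, and one point on the zone diagonal.
along_x = d_wave_form_factor(math.pi, 0.0)
along_y = d_wave_form_factor(0.0, math.pi)
diagonal = d_wave_form_factor(math.pi / 2, math.pi / 2)

print("along kx:", along_x)
print("along ky:", along_y)
print("on the diagonal:", diagonal)
```

The form factor changes sign under a 90-degree rotation and vanishes on the diagonal $k_x = k_y$, which is the nodal direction of a d-wave gap.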
4.3. Calculated results of the hole-concentration dependence of Tc and of isotope effect
Before calculating Tc, we would like to point out that the mean free path $\ell_0$ in a metallic region, where the K-S model holds, is much longer than the spin-correlation length $\lambda_s$ because of the spin fluctuation effect. As is shown
in figure 11, localized spins at the boundary of a spin-correlated region are frustrated, and the spin-flip time, $\tau_s \sim 10^{-14}$ sec, is shorter than the traveling time of a hole over the spin-correlated region, $\tau_F \sim 10^{-13}$ sec. Therefore the spin-correlated region moves with the dopant hole, and the mean free path becomes much longer than the correlation length.
Figure 11. Model of expansion of the metallic region. When a dopant hole moves, localized spins flip at the boundary and the metallic region moves with the dopant hole.
As mentioned in the previous section, we have shown that even though the electron-phonon interaction is involved in the mechanism of superconductivity, the interplay of the AF order and the phonon mechanism in the K-S model creates superconductivity of "d-wave symmetry" when the system is infinite and homogeneous. Now we present the calculated results of Tc and the isotope effect for LSCO with tetragonal symmetry, considering that the metallic region is finite. Kamimura et al. [26,27,6] calculated the hole-concentration dependence of Tc and the isotope effect for LSCO using a slightly modified McMillan Tc equation [28], and their calculated results of Tc
Figure 12. (a) The calculated hole-concentration dependence of Tc(x) for LSCO and the mean free path $\ell_0$, with $\ell_0$(x = 0.15) = 300 Å; experimental points are by Takagi et al. (b) The calculated isotope effect $\alpha$ and the Tc curve (thick line) for tetragonal LSCO as a function of x. The horizontal axes show the hole concentration x.
and the isotope effect $\alpha$ are shown in figure 12. The result for the isotope effect is compared with the experimental results of Crawford et al. [29] and Ronay et al. [30]. Here the isotope effect $\alpha$ is defined by

$$
\alpha = - \frac{d \log T_c}{d \log M},
$$

where M denotes the mass of the constituent atoms. In their calculation a well-known relation between the Debye frequency and the mass, $\Omega_D \propto M^{-0.5}$, is used. Further, in their calculation, the masses of all constituent atoms were changed by the mass ratio of $^{18}$O to $^{16}$O. Thus their calculated results may be overestimated. Here we should make a remark on the calculated results of Tc. In the K-S model the size of a metallic state is finite. In order to obtain the observed value of 40 K for the highest Tc of LSCO at the optimum doping (x = 0.15), Kamimura et al. [26,27,6] chose the size of the metallic region $\ell_0$ as shown in figure 12.
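The definition of the isotope exponent given above can be evaluated numerically for any model Tc(M). The following Python sketch is an illustration only and is not the modified McMillan calculation of Kamimura et al.: it assumes a hypothetical Tc that is simply proportional to the Debye frequency, that is, to $M^{-0.5}$, and the function names and the prefactor are introduced here for the example.

```python
import math

def isotope_exponent(tc_of_mass, mass, rel_step=1e-4):
    """Estimate alpha = -d(log Tc)/d(log M) by a central finite difference."""
    m_plus = mass * (1.0 + rel_step)
    m_minus = mass * (1.0 - rel_step)
    d_log_tc = math.log(tc_of_mass(m_plus)) - math.log(tc_of_mass(m_minus))
    d_log_m = math.log(m_plus) - math.log(m_minus)
    return -d_log_tc / d_log_m

def tc_simple(mass, prefactor=160.0):
    """Hypothetical model: Tc proportional to the Debye frequency, i.e. to M**-0.5."""
    return prefactor * mass ** -0.5

alpha = isotope_exponent(tc_simple, mass=16.0)
print(f"alpha = {alpha:.3f}")
```

For this pure power law the exponent is 0.5 independent of the prefactor and of the mass. A doping-dependent exponent such as the one in figure 12 requires a Tc formula in which the phonon frequency enters in a less simple way.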
From the calculated results of the isotope effect shown in figure 12, the following conclusions emerge: the isotope effect on Tc in LSCO depends strongly on the hole concentration, and the isotope constant $\alpha$ is remarkably large near the onset concentration of superconductivity, while it is small around the optimum concentration. Recently Bishop et al 32 published a review article on recent and earlier results on isotope effects in cuprates, pointing out that the isotope effect on Tc vanishes at optimum doping but increases with decreasing doping level, becoming substantially larger than the BCS value ($\alpha$ = 0.5) at the border to the AF state. This experimental result is consistent with the theoretical result obtained from the K-S model shown in figure 12. In this section we have calculated the appearance of d-wave superconductivity. However, because of the finite-size effect, we consider that the superconductivity is not pure d-wave superconductivity but is mixed with s-wave superconductivity, which is consistent with the argument by Professor Müller 31.
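When Tc is known for only two isotope masses, as in an oxygen-isotope substitution, the isotope exponent is commonly estimated from a finite difference of logarithms. The sketch below illustrates that estimate; the function names and the Tc value used in the example are hypothetical and are not data from the paper or from the cited experiments.

```python
import math


def isotope_exponent(tc_light, tc_heavy, m_light=16.0, m_heavy=18.0):
    """Finite-difference estimate alpha = -ln(Tc_heavy/Tc_light) / ln(M_heavy/M_light)."""
    if min(tc_light, tc_heavy, m_light, m_heavy) <= 0.0:
        raise ValueError("temperatures and masses must be positive")
    return -math.log(tc_heavy / tc_light) / math.log(m_heavy / m_light)


def shifted_tc(tc_light, alpha, m_light=16.0, m_heavy=18.0):
    """Tc expected after substitution if Tc scales as M**(-alpha)."""
    return tc_light * (m_heavy / m_light) ** (-alpha)


if __name__ == "__main__":
    # Hypothetical example: a 30 K sample obeying the BCS exponent alpha = 0.5.
    tc16 = 30.0
    tc18 = shifted_tc(tc16, alpha=0.5)
    print(f"Tc(18) = {tc18:.2f} K, alpha = {isotope_exponent(tc16, tc18):.3f}")
```

An exponent near zero corresponds to almost no shift in Tc on substitution, while an exponent above 0.5 corresponds to a shift larger than the BCS expectation, which is the trend the text describes towards the border of the AF state.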
4.4. Comment on why Tc is higher in the CuO5 pyramid than in the CuO6 octahedron

It is well known that the transition temperature is higher in the CuO5 pyramid than in the CuO6 octahedron. We briefly discuss this feature. As for phonon modes, we consider only out-of-plane modes. For example, the Cu out-of-plane mode shown in figure 13 contributes to d-wave superconductivity only in the CuO5 pyramid, if we consider the q = 0 case. It does not contribute to d-wave superconductivity in the CuO6 octahedron, because the CuO6 octahedron is symmetric with respect to the CuO2 plane and the electron-phonon interaction cancels. Thus Tc is higher in the CuO5 pyramid than in the CuO6 octahedron because the number of out-of-plane modes that contribute to d-wave superconductivity is larger.

Figure 13. Example of a Cu out-of-plane mode, which contributes to d-wave superconductivity only in the CuO5 pyramid: (a) CuO6 octahedron, (b) CuO5 pyramid.

5. Conclusion and concluding remarks

The extended two-story house model (the Kamimura-Suwa (K-S) model) has clarified how the interplay of Mott physics and Jahn-Teller physics plays an important role in determining the superconducting as well as the metallic state of underdoped cuprates. In this paper it was first pointed out for underdoped cuprates that Mott physics gives rise to local antiferromagnetic order due to the localized spins, while the anti-Jahn-Teller effect, a central issue of Jahn-Teller physics, produces two kinds of orbitals, parallel and perpendicular to a CuO2 plane, which are energetically nearby. As a result of the interplay of both kinds of physics, the K-S model has shown that exchange interactions appear between the spins of a localized hole and of a carrier hole in underdoped cuprates, and that these exchange interactions play an important role in producing the coexistence of superconductivity and antiferromagnetism and the appearance of d-wave superconductivity even in the phonon-involved mechanism. A brief review of these facts as well as of the K-S model has been given in this paper.
Acknowledgments

We would also like to thank Dr. Wei-Sheng Lee of Stanford University for his valuable discussions and suggestions regarding the recent ARPES experimental results on the presence of two distinct energy gaps exhibiting different doping dependence, reported in ref. 11.
References

1. J. G. Bednorz and K. A. Müller, Z. Phys. B 64 189 (1986).
2. P. W. Anderson, Science 235 1196 (1987).
3. F. C. Zhang and T. M. Rice, Phys. Rev. B 37 3759 (1988).
4. N. Shima, K. Shiraishi, T. Nakayama, A. Oshiyama and H. Kamimura, Proc. JSAP-MRS Int'l Conf. on Electronic Materials eds. Sugano T et al (Materials Research Society) p 51 (1989).
5. A. Oshiyama, N. Shima, T. Nakayama, K. Shiraishi and H. Kamimura, Mechanism of High Temperature Superconductivity (Springer Series in Materials Science Vol. 11) eds Kamimura H and Oshiyama A (Springer) p 111 (1989).
6. H. Kamimura, H. Ushio, S. Matsuno and T. Hamada, Theory of Copper Oxide Superconductors (Heidelberg: Springer) (2005).
7. H. Kamimura and M. Eto, J. Phys. Soc. Jpn. 59 3053 (1990).
8. M. Eto and H. Kamimura, J. Phys. Soc. Jpn. 60 2311 (1991).
9. H. Kamimura and Y. Suwa, J. Phys. Soc. Jpn. 62 3368-3371 (1993).
10. H. Kamimura, T. Hamada and H. Ushio, Phys. Rev. B 66 054504 (2002) (see section II).
11. K. Tanaka, W. S. Lee, D. H. Lu, A. Fujimori, T. Fujii, Risdiana, Terasaki, D. J. Scalapino, T. P. Devereaux, Z. Hussain and Z.-X. Shen, Science 314 1910 (2006).
12. H. Kamimura and H. Ushio, Solid State Commun. 91 97 (1994).
13. H. Ushio and H. Kamimura, J. Phys. Soc. Jpn. 64 2585 (1995).
14. T. Mason, A. Schroder, G. Aeppli, H. A. Mook and S. M. Hayden, Phys. Rev. Lett. 77 1604 (1996).
15. K. Yamada, C. H. Lee, J. Wada, K. Kurahashi, H. Kimura, Y. Endoh, S. Hosoya, G. Shirane, R. J. Birgeneau and M. A. Kastner, J. Supercond. 10 343 (1997).
16. See also J. Yu, A. J. Freeman and J.-H. Xu, Phys. Rev. Lett. 58 1028 (1987).
17. See also J. Yu, A. J. Freeman and J.-H. Xu, Phys. Rev. Lett. 58 1035 (1987).
18. C. T. Chen, L. H. Tjeng, J. Kwo, P. Rudolf, F. Sette and R. M. Fleming, Phys. Rev. Lett. 68 2543 (1992).
19. V. I. Anisimov, S. Yu. Ezhov and T. M. Rice, Phys. Rev. B 55 12829 (1997).
20. T. Yoshida, X. J. Zhou, M. Nakamura, S. A. Keller, P. V. Bogdanov, E. D. Lu, A. Lanzara, Z. Hussain, A. Ino, A. Fujimori, H. Eisaki, Z.-X. Shen, T. Kakeshita and S. Uchida, Phys. Rev. Lett. 91 027001 (2003).
21. T. Yoshida, X. J. Zhou, K. Tanaka, W. L. Yang, Z. Hussain, Z.-X. Shen, A. Fujimori, S. Sahrakorpi, M. Lindroos, R. S. Markiewicz, A. Bansil, Seiki Komiya, Yoichi Ando, H. Eisaki, T. Kakeshita and S. Uchida, Phys. Rev. B 74 224510 (2006).
22. J. Meng et al., Nature 462 335 (2009).
23. J. W. Loram, K. A. Mirza, J. R. Cooper and J. L. Tallon, J. Phys. Chem. Solids 59 2091 (1998).
24. T. Nakano, M. Oda, C. Manabe, N. Momono, Y. Miura and M. Ido, Phys. Rev. B 49 16000 (1994).
25. M. R. Norman, M. Randeria, H. Ding and J. C. Campuzano, Phys. Rev. B 57 R11093 (1998).
26. H. Kamimura, S. Matsuno, Y. Suwa and H. Ushio, Phys. Rev. Lett. 77 723 (1996).
27. H. Kamimura, T. Hamada, S. Matsuno and H. Ushio, J. Supercond. 15 379 (2002).
28. W. L. McMillan, Phys. Rev. 167 331 (1968).
29. M. K. Crawford, W. E. Farneth, E. M. McCarron III, R. L. Harlow and A. H. Moudden, Science 250 1390 (1990); M. K. Crawford, M. N. Kunchur, W. E. Farneth, E. M. McCarron III and S. J. Poon, Phys. Rev. B 41 282 (1990).
30. Maria Ronay, M. A. Frisch and T. R. McGuire, Phys. Rev. B 45 355 (1992).
31. K. A. Müller, Phil. Mag. Lett. 82 279 (2002).
32. A. R. Bishop, A. Bussmann-Holder, M. Cardona, O. V. Dolgov, A. Furrer, H. Kamimura, H. Keller, R. Khasanov, R. K. Kremer, D. Manske, K. A. Müller and A. Simon, J. Supercond. Novel Magnetism 20 393 (2007).
33. N. Doiron-Leyraud, C. Proust, D. LeBoeuf, J. Levallois, J.-B. Bonnemaison, R. Liang, D. A. Bonn, W. N. Hardy and L. Taillefer, Nature 447 565 (2007).
34. X.-G. Wen and P. A. Lee, Phys. Rev. Lett. 76 503 (1996).
International Conference QBIC 2010
Quantum Bio-Informatics Center, Tokyo University of Science
March 10-13, 2010
Venue: Canal Hall, Noda Campus, Tokyo University of Science, Noda City, Chiba, Japan
The QBIC (Quantum Bio-Informatics Center) project from 2006 to 2011 is supported by the Japanese Ministry of Education and Tokyo University of Science.