Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
6154
Alessandro Aldini Marco Bernardo Alessandra Di Pierro Herbert Wiklicky (Eds.)
Formal Methods for Quantitative Aspects of Programming Languages
10th International School on Formal Methods for the Design of Computer, Communication and Software Systems, SFM 2010
Bertinoro, Italy, June 21-26, 2010
Advanced Lectures
Volume Editors

Alessandro Aldini
Università di Urbino “Carlo Bo”, Dipartimento di Matematica, Fisica e Informatica
Piazza della Repubblica 13, 61029 Urbino, Italy
E-mail: [email protected]

Marco Bernardo
Università di Urbino “Carlo Bo”, Dipartimento di Matematica, Fisica e Informatica
Piazza della Repubblica 13, 61029 Urbino, Italy
E-mail: [email protected]

Alessandra Di Pierro
Università di Verona, Dipartimento di Informatica
Ca’ Vignal 2, Strada le Grazie 15, 37134 Verona, Italy
E-mail: [email protected]

Herbert Wiklicky
Imperial College London, Department of Computing
Huxley Building, 180 Queen’s Gate, London SW7 2BZ, UK
E-mail: [email protected]
Library of Congress Control Number: 2010928129
CR Subject Classification (1998): D.2.4, D.3.1, F.3-4, C.3
LNCS Sublibrary: SL 2 – Programming and Software Engineering
ISSN: 0302-9743
ISBN-10: 3-642-13677-X Springer Berlin Heidelberg New York
ISBN-13: 978-3-642-13677-1 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2010 Printed in Germany Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper 06/3180
Preface
This volume presents the set of papers accompanying some of the lectures of the 10th International School on Formal Methods for the Design of Computer, Communication and Software Systems (SFM). This series of schools addresses the use of formal methods in computer science as a prominent approach to the rigorous design of the above-mentioned systems. The main aim of the SFM series is to offer a good spectrum of current research in foundations as well as applications of formal methods, which can be of help for graduate students and young researchers who intend to approach the field.

SFM 2010 was devoted to formal methods for quantitative aspects of programming languages and covered several topics including probabilistic and timed models, model checking, static analysis, quantum computing, real-time and embedded systems, and security.

This volume comprises four articles. The paper by Di Pierro, Hankin, and Wiklicky investigates the relation between the operational semantics of probabilistic programming languages and discrete-time Markov chains and presents a framework for probabilistic program analysis inspired by classical abstract interpretation. Broadbent, Fitzsimons, and Kashefi review the mathematical model underlying measurement-based quantum computation, a novel approach to quantum computation where measurement is the main driving force of computation instead of the unitary operations of the more traditional quantum circuit model. The paper by Malacaria and Heusser illustrates the information-theoretical basis of quantitative information flow by showing the relationship between lattices, partitions, and information-theoretical concepts, as well as their applicability to quantify leakage of confidential information in programs. Finally, Wolter and Reinecke discuss the trade-off between performance and security by formulating metrics that explicitly express the trade-off and by showing how to find system parameters that optimize those metrics.

We believe that this book offers a useful view of what has been done and what is going on worldwide in the field of formal methods for quantitative aspects of programming languages. We wish to thank all the speakers and all the participants for a lively and fruitful school. We also wish to thank the entire staff of the University Residential Center of Bertinoro for the organizational and administrative support.

June 2010
Alessandro Aldini Marco Bernardo Alessandra Di Pierro Herbert Wiklicky
Table of Contents
Probabilistic Semantics and Program Analysis . . . . . . . . . . . . . . . . . . . . . . 1
Alessandra Di Pierro, Chris Hankin, and Herbert Wiklicky

Measurement-Based and Universal Blind Quantum Computation . . . . . . 43
Anne Broadbent, Joseph Fitzsimons, and Elham Kashefi

Information Theory and Security: Quantitative Information Flow . . . . . . 87
Pasquale Malacaria and Jonathan Heusser

Performance and Security Tradeoff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
Katinka Wolter and Philipp Reinecke

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Probabilistic Semantics and Program Analysis

Alessandra Di Pierro¹, Chris Hankin², and Herbert Wiklicky²

¹ University of Verona, Ca’ Vignal 2 - Strada le Grazie 15, 37134 Verona, Italy
[email protected]
² Imperial College London, 180 Queen’s Gate, London SW7 2AZ, United Kingdom
{clh,herbert}@doc.ic.ac.uk
Abstract. The aims of these lecture notes are two-fold: (i) we investigate the relation between the operational semantics of probabilistic programming languages and Discrete Time Markov Chains (DTMCs), and (ii) we present a framework for probabilistic program analysis which is inspired by the classical Abstract Interpretation framework by Cousot & Cousot and which we introduced as Probabilistic Abstract Interpretation (PAI) in [1]. The link between programming languages and DTMCs is the construction of a so-called Linear Operator Semantics (LOS) in a syntax-directed or compositional way. The main element in this construction is the use of the tensor product to combine information about different aspects of a program. Although this inevitably results in a combinatorial explosion of the size of the semantics of a program, the PAI approach allows us to keep some control and to obtain reasonably sized abstract models.
1 Introduction
These lecture notes aim at establishing a formal link between the semantics of deterministic and probabilistic programming languages and Markov chains. We will consider only discrete time models but, as we have shown in [2], it is possible to use similar constructions also to model continuous time systems.

Our motivation is based on concrete systems rather than specifications of systems as we find them, for example, in the area of process algebras; we therefore eliminate any non-probabilistic or pure non-determinism. To a certain degree non-deterministic models can be simulated by using “unknown” probability variables rather than constants to express choice probabilities. However, this leads to slightly different outcomes as even “unknown” probabilities, for example, are able to express correlations between different choices.

A further (didactic) restriction we will use throughout these notes is the finiteness of our state and configuration spaces. Although it is possible to develop a similar framework also for infinite spaces, this requires certain mathematical tools from Functional Analysis and Operator Theory (e.g. C*-algebras, Hilbert and Banach spaces) which are beyond what a short introduction can provide. We will therefore consider only a finite-dimensional algebraic theory for which a basic knowledge of linear algebra is sufficient.
In the following we will use a simple but intriguing example to illustrate our approach.

Example 1 (Monty Hall). The origins of this example are legendary. Allegedly, it goes back to some TV show in which the contestant was given the chance to win a car or other prizes by picking the right door behind which the desired prize could be found. The game proceeds as follows: First the contestant is invited to pick one of three doors (behind one is the prize) but the door is not yet opened. Instead, the host – the legendary Monty Hall – opens one of the other doors which is empty. After that the contestant is given a last chance to stick with his/her door or to switch to the other closed one. Note that the host (knowing where the prize is) always has at least one door he can open.

The problem is whether it is better to stay stubborn or to switch the chosen door. Assuming that there is an equal chance for all doors to hide the prize, it is a favourite exercise in basic probability theory to demonstrate that it is better to switch to a new door. We will analyse this example using probabilistic techniques from program analysis rather than more or less informal mathematical arguments. An extensive discussion of the problem can be found in [3], where it is also observed that a bias in hiding the car (e.g. because the architecture of the TV studio does not allow for enough room behind a door to put the prize there) changes the analysis dramatically.

Note that it is pointless to investigate a non-deterministic version of the Monty Hall problem: If we are only interested in a possibilistic analysis then both strategies have exactly the same possible outcomes: The contestant might win or lose – everything is possible. As in many walks of life it is not what is possible that determines success, but the chances of achieving one’s aim.
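Before any formal analysis, the claim that switching is better can be checked empirically. The following small simulation is our own illustration (in Python, with function names of our choosing; it is not part of the formal development of these notes) and estimates the winning probability of both strategies:

    import random

    def monty_hall(switch, trials=100_000):
        wins = 0
        for _ in range(trials):
            prize = random.randrange(3)   # door hiding the prize
            guess = random.randrange(3)   # contestant's first pick
            # the host opens some door that is neither the guess nor the prize
            opened = next(d for d in range(3) if d != guess and d != prize)
            if switch:
                # switch to the remaining closed door
                guess = next(d for d in range(3) if d != guess and d != opened)
            wins += (guess == prize)
        return wins / trials

    print(monty_hall(switch=False))   # approximately 1/3
    print(monty_hall(switch=True))    # approximately 2/3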
2 Mathematical Preliminaries
We assume that the reader of these lecture notes is well acquainted with basic ideas from linear algebra and probability theory. We will consider here only finite dimensional spaces and thus avoid a detailed consideration of infinite dimensional spaces, as in functional analysis, and of general measure theoretic concepts. However, it is often possible to generalise the concepts to such an infinite dimensional setting and we may occasionally mention this or give hints in this direction. We need to introduce a few basic mathematical concepts – readers already acquainted with them may skip immediately to Section 3. The aim of this section is to sketch the basic constructions and to provide some motivation and intuition for the mathematical framework we use. A more detailed discussion of the notions and concepts we need can be found in the appropriate textbooks on probability and linear algebra.
2.1 Vector Spaces
In all generality, the real vector space V(S, R) = V(S) over a set S is defined as the set of formal¹ linear combinations of elements in S, which we can also see as tuples of real numbers x_s indexed by elements in S:

V(S) = { Σ_{s∈S} x_s · s | x_s ∈ R } = { (x_s)_{s∈S} },
with the usual point-wise algebraic operations, i.e. scalar multiplication for λ ∈ R: λ · (x_s)_s = (λ · x_s)_s and vector addition (x_s)_s + (y_s)_s = (x_s + y_s)_s. We denote tuples like (x_s)_s or (y_s)_s as vectors x and y. We consider in the following only finite dimensional vector spaces, i.e. V(S) over finite sets S, as they possess a unique topological structure, see e.g. [4, 1.22]. By imposing additional constraints one could equip V(S) with an appropriate topological structure even for infinite sets S, e.g. by considering Banach or Hilbert spaces like ℓ¹(S), ℓ²(S), etc. (see for example [5]). The importance of vector spaces in the context of these notes comes from the fact that we can use them to represent probability distributions ρ, i.e. normalised functions which associate to elements in S some probability in the interval [0, 1]:

ρ : S → [0, 1]  such that  Σ_{s∈S} ρ(s) = 1.

¹ We allow for any – also infinite – linear combinations. For the related notion of a free vector space one allows only finite linear combinations.
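For instance, such a distribution can be stored directly as a normalised non-negative vector; a tiny Python/numpy illustration of ours (not from the original notes):

    import numpy as np

    # a distribution rho on a three-element set S, stored as a vector in V(S)
    rho = np.array([0.5, 0.25, 0.25])
    assert rho.sum() == 1.0 and (rho >= 0).all()   # normalised, non-negative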
The set of all distributions Dist(S) on S is isomorphic to a sub-set (however, not a sub-space) of V(S). This helps to transfer the algebraic structures of V like, for example, the tensor product (see below) immediately into the context of distributions. The important class of structure preserving maps between vector spaces V and W are linear maps T : V → W which fulfil: T(λ · v) = λ · T(v) and T(v₁ + v₂) = T(v₁) + T(v₂). For linear maps T : V → V we usually use the term operator. Vectors in any vector space can be represented – as in the above definition of V(S) – as a linear combination of elements of a certain basis, or even simpler as a tuple, i.e. a row, of coordinates. Usually, we will use here the defining basis {s | s ∈ S} so that we do not need to consider the problem of base changes. As with vectors we can also represent linear maps in a standardised way as matrices. We will treat here the terms linear map, operator, and
matrix as synonymous. The standard representation of a linear map T : V → W simply records the images of all basis vectors of the basis in V and collects them as row vectors of a matrix. It is sufficient to just specify what happens to the (finitely many) basis vectors to completely determine T as by linearity this can be extended to all (uncountably infinitely many) vectors in V. Given a (row) vector x = (x_s)_s and the matrix (T_{st})_{st}, with the first index indicating the row and the second the column of the matrix entry, representing a linear map T we can implement the application of T to x as a matrix multiplication:

T(x) = x · T = (x_s)_s · (T_{st})_{st} = ( Σ_s x_s T_{st} )_t.
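As a concrete illustration of this convention (our own numpy sketch, not part of the original text):

    import numpy as np

    # a probabilistic state on a three-element set S, written as a row vector
    x = np.array([1/3, 0.0, 2/3])

    # a linear map T on V(S), given by the images of the basis vectors as rows
    T = np.array([[0.0, 1.0, 0.0],    # first basis vector is mapped to the second
                  [1.0, 0.0, 0.0],    # second is mapped to the first
                  [0.0, 0.0, 1.0]])   # third is left unchanged

    # applying T means post-multiplying the row vector: T(x) = x . T
    print(x @ T)   # [0, 1/3, 2/3]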
2.2 Discrete Time Markov Chains
The standard and most popular model for probabilistic processes are Markov chains. We assume a basic knowledge as presented for example in [6,7,8,9,10], to mention just a few of the many monographs on this topic. Markov chains have the important property that they are memory-less in the sense that the “next state” does not depend on anything else but the current state. Markov chains come in two important versions: as Discrete Time Markov Chains (DTMC) and Continuous Time Markov Chains (CTMC). We will deal here only with DTMCs, i.e. probabilistic descriptions of a system only at discrete time steps. This allows us to talk about the next state in the obvious way (for CTMCs this concept is a bit more complicated). The DTMCs we will use to model the semantics of a programming language will be based on finitely many states S². For such a system a description at a given point in time is represented by a distribution over the finite state space S; we will refer to the elements in S also as classical states and to the elements in Dist(S) as probabilistic states. In general, we would need measures or vectors in Banach or Hilbert spaces to describe probabilistic states. Once we have an enumeration of states in S we can represent probabilistic states, i.e. distributions on S, as normalised tuples or simply as vectors in V(S). The fact that DTMCs are memory-less means that we only need to specify how the description of a system changes into the one at the next step, i.e. how to transform one probabilistic state d_t into the next one d_{t+1}. Intuitively, we need to describe how much of the probability of an s_i ∈ S is “distributed” to the other s_j in the next moment. Again, we can use matrices to do this. More precisely, we need to consider stochastic matrices M, where all rows must sum up to 1, i.e. Σ_t M_{st} = 1 for all s,
so that for a distribution represented by d the image d · M is again a (normalised) distribution. Note that we follow in these notes the convention of post-multiplying M and that vectors are implemented as row vectors.
² Unfortunately, the term “state” is used differently in probability theory and semantics: The (probabilistic) state space for the semantics we represent is made up of so-called configurations which are pairs of (semantical) states and statements.
We will consider here only homogeneous DTMCs where the way the system changes does not itself change over time, i.e. d₀ is transformed into d₁ in the same way as d_t becomes d_{t+1} at any time t. The transition matrix M thus does not depend on t. In fact, we can define a DTMC as we use it here just by specifying its state space S and its generator matrix M, which has to be stochastic.
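As a tiny example (ours, with made-up transition probabilities), a two-state homogeneous DTMC and two steps of its evolution:

    import numpy as np

    # generator matrix of a two-state DTMC; every row sums up to 1
    M = np.array([[0.9, 0.1],
                  [0.5, 0.5]])
    assert np.allclose(M.sum(axis=1), 1.0)   # M is stochastic

    d0 = np.array([1.0, 0.0])   # initial probabilistic state: surely in state 1
    d1 = d0 @ M                 # after one step:  [0.9, 0.1]
    d2 = d1 @ M                 # after two steps: [0.86, 0.14]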
2.3 Kronecker and Tensor Product
For the definition of our semantics we will use the tensor product construction. The tensor product U ⊗ V of two vector spaces U and V can be defined in a purely abstract way via the following universal property: For each bi-linear function f : U × V → W there exists a unique linear function f⊗ : U ⊗ V → W such that f(u, v) = f⊗(u ⊗ v), see e.g. [11, Ch 14]. In the case of infinite dimensional topological vector spaces one usually imposes additional requirements on the tensor product ensuring, for example, that the tensor product of two Hilbert spaces is again a Hilbert space, see e.g. [12, 2.6]. Product measures on the Cartesian product of measure spaces as characterised by Fubini’s Theorem, see e.g. [13, 4.5], can also be seen as tensor products. For finite dimensional vector spaces we can realise U ⊗ V as the space of the tensor product of vectors in U and V. More concretely, we can construct the tensor product of two finite dimensional matrices or vectors – seen as 1 × n or n × 1 matrices – via the so-called Kronecker product: Given an n × m matrix A and a k × l matrix B then A ⊗ B is the nk × ml matrix

\[
A \otimes B =
\begin{pmatrix} a_{1,1} & \ldots & a_{1,m} \\ \vdots & \ddots & \vdots \\ a_{n,1} & \ldots & a_{n,m} \end{pmatrix}
\otimes
\begin{pmatrix} b_{1,1} & \ldots & b_{1,l} \\ \vdots & \ddots & \vdots \\ b_{k,1} & \ldots & b_{k,l} \end{pmatrix}
=
\begin{pmatrix} a_{1,1}B & \ldots & a_{1,m}B \\ \vdots & \ddots & \vdots \\ a_{n,1}B & \ldots & a_{n,m}B \end{pmatrix}
\]
For a d₁ dimensional vector u and a d₂ dimensional vector v we get a d₁ · d₂ dimensional vector u ⊗ v. The i-th entry in u ⊗ v is the product of the i₁-th coordinate of u with the i₂-th coordinate of v. The relation between the index i and the indices i₁ and i₂ is as follows:

i = (i₁ − 1) · d₂ + (i₂ − 1) + 1
i₁ = (i − 1) div d₂ + 1
i₂ = (i − 1) mod d₂ + 1

Note that the concrete realisation of the tensor product via the Kronecker product is not base independent, i.e. if we use a different basis to represent A and B then it is non-trivial to see how the coordinates of A ⊗ B change. Thus many texts prefer the abstract definition of tensor products. However, our discussions will not involve base changes and we thus can work with Kronecker and tensor products as synonyms.
The binary tensor/Kronecker product can easily be generalised to an n-ary version which is associative but not commutative. Among the important algebraic properties of the tensor/Kronecker product (of matrices and vectors with matching dimensions) we have for example, see e.g. [11,12]:

(λA) ⊗ B = λ(A ⊗ B) = A ⊗ (λB)
(A₁ + A₂) ⊗ B = (A₁ ⊗ B) + (A₂ ⊗ B)
A ⊗ (B₁ + B₂) = (A ⊗ B₁) + (A ⊗ B₂)
(A₁ ⊗ B₁)(A₂ ⊗ B₂) = (A₁A₂) ⊗ (B₁B₂)

If we consider the tensor product of vector spaces V(X) and V(Y) over some (finite) sets X and Y then we get the following important isomorphism which relates the Cartesian product and the tensor product:

V(X × Y) = V(X) ⊗ V(Y)

This follows directly from the universal property of the tensor product. In terms of distributions this provides a way to construct and understand the space of distributions over product spaces.
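These properties are easy to check numerically; the following sketch (ours) uses numpy's built-in Kronecker product and also verifies the index formulas above in their 0-based form:

    import numpy as np

    u = np.array([1.0, 0.0])    # a point distribution on a 2-element set X
    v = np.array([0.5, 0.5])    # the uniform distribution on a 2-element set Y

    w = np.kron(u, v)           # a distribution on X x Y: [0.5, 0.5, 0.0, 0.0]

    # index formula (0-based): entry i of the product equals u[i1] * v[i2]
    d1, d2 = len(u), len(v)
    for i in range(d1 * d2):
        i1, i2 = i // d2, i % d2
        assert w[i] == u[i1] * v[i2]

    # the mixed-product property: (A1 (x) B1)(A2 (x) B2) = (A1 A2) (x) (B1 B2)
    A1, B1, A2, B2 = (np.random.rand(2, 2) for _ in range(4))
    assert np.allclose(np.kron(A1, B1) @ np.kron(A2, B2),
                       np.kron(A1 @ A2, B1 @ B2))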
3 Probabilistic While
We now introduce a simple imperative language, pWhile, with constructs for probabilistic choice and random assignment, which is based on the well-known While language one can find for example in [14,15]. We will use this language to investigate static program analysis techniques based on its semantics. We first present the syntax and operational semantics (in an SOS style) of pWhile; then we develop a syntax-directed semantics which immediately gives the generator of the corresponding DTMC.
3.1 Syntax
The overall structure of a pWhile program is made up from a possibly empty declaration part D of variables and a single statement S which represents the actual program:

P ::= begin S end | var D begin S end

The declarations D of variables v associate to them a certain basic type, e.g. int or bool, or a simple value range r, which determines the possible values of the variable v. Each variable can have only one type, i.e. possible values are in the disjoint union of Z, representing integers, and B = {true, false} for booleans.

r ::= bool | int | { c₁, . . . , cₙ } | { c₁ .. cₙ }
D ::= v : r | v : r ; D
The syntax of statements S is as follows:

S ::= stop | skip | v := a | v ?= r
    | S₁ ; S₂
    | choose p₁ : S₁ or p₂ : S₂ ro
    | if b then S₁ else S₂ fi
    | while b do S od

We have in pWhile two types of “empty” statements, namely stop and the usual skip statement. We can use both as final statements in a program, but while skip represents actual termination, the meaning of stop is an infinite loop which replicates the terminal configuration forever – this is a behaviour we need in order to avoid “probability leaks” and to obtain proper DTMCs. The meaning of the assignment “:=”, sequential composition “;”, “if” and “while” are as usual – we only change the syntax slightly to allow for an easier implementation of a pWhile parser in ocaml. We have two additional probabilistic statements: a random assignment “?=” which assigns a random value to a variable using a uniform distribution over the possible values in the range r; and a probabilistic choice “choose”, which executes either S₁ or S₂ with probabilities p₁ and p₂, respectively. Here p₁ and p₂ are constants and we assume without loss of generality that they are normalised, i.e. that p₁ + p₂ = 1; if this is not the case, we can also require that at compile time these values are normalised to obtain p̃ᵢ = pᵢ/(p₁ + p₂). It is obvious how to generalise the “choose” construct from a binary to an n-ary version. We will also use brackets, indentation and comment lines “#” to improve the readability of programs. Expressions e in pWhile are either boolean expressions b or arithmetic expressions a. Arithmetic expressions are of the form

a ::= n | a₁ ⊙ a₂

with n ∈ Z a constant and ‘⊙’ representing one of the usual arithmetic operations like ‘+’, ‘−’, ‘×’, ‘/’ or ‘%’ (representing the remainder of an integer division). The syntax of boolean expressions b is defined by

b ::= true | false | not b | b₁ && b₂ | b₁ || b₂ | a₁ <> a₂

The symbol ‘<>’ denotes one of the standard comparison operators for arithmetic expressions, i.e. <, ≤, =, ≠, ≥, >.
3.2 Operational Semantics
The semantics of pWhile follows essentially the standard one for While as presented, e.g., in [15]. The only two differences concern (i) the probabilistic choice and (ii) random assignments. The structured operational semantics (SOS) is given as usual via a transition system on configurations ⟨S, σ⟩, i.e. pairs of statements and (classical) states. To allow for probabilistic choices we label these transitions with probabilities; except for the choose construct and the random assignment these probabilities will always be 1 as all other statements in pWhile are deterministic. A state σ ∈ State describes how variables in Var are associated to values in Value = Z + B (with ‘+’ denoting the disjoint union). The value of a variable can be either an integer or a boolean constant, i.e.

State = Var → Z + B

The expressions a and b evaluate to values of type Z and B in the usual way. The value represented by an arithmetic expression can be computed by:

E(n)σ = n
E(v)σ = σ(v)
E(a₁ ⊙ a₂)σ = E(a₁)σ ⊙ E(a₂)σ

The result is always an integer (i.e. E(a)σ ∈ Z). Boolean expressions are handled in a similar way; their semantics is given by an element in B = {true, false}:

E(true)σ = true
E(false)σ = false
E(not b)σ = ¬E(b)σ
E(b₁ || b₂)σ = E(b₁)σ ∨ E(b₂)σ
E(b₁ && b₂)σ = E(b₁)σ ∧ E(b₂)σ
E(a₁ <> a₂)σ = E(a₁)σ <> E(a₂)σ
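These equations translate almost literally into an interpreter. The following sketch is our own encoding (expressions as nested Python tuples; the tags and helper names are of our own devising, not part of pWhile itself):

    import operator

    ARITH = {'+': operator.add, '-': operator.sub, '*': operator.mul,
             '/': operator.floordiv, '%': operator.mod}
    COMP = {'<': operator.lt, '<=': operator.le, '==': operator.eq,
            '!=': operator.ne, '>=': operator.ge, '>': operator.gt}

    def E(e, sigma):
        """Evaluate an expression e in a state sigma (a dict from variables to values)."""
        tag = e[0]
        if tag == 'const':     # E(n)sigma = n (also covers true and false)
            return e[1]
        if tag == 'var':       # E(v)sigma = sigma(v)
            return sigma[e[1]]
        if tag == 'arith':     # E(a1 (.) a2)sigma = E(a1)sigma (.) E(a2)sigma
            return ARITH[e[1]](E(e[2], sigma), E(e[3], sigma))
        if tag == 'cmp':       # comparisons a1 <> a2
            return COMP[e[1]](E(e[2], sigma), E(e[3], sigma))
        if tag == 'not':
            return not E(e[1], sigma)
        if tag == 'and':
            return E(e[1], sigma) and E(e[2], sigma)
        if tag == 'or':
            return E(e[1], sigma) or E(e[2], sigma)
        raise ValueError(tag)

    # example: evaluate n > 1 in a state where n = 2
    print(E(('cmp', '>', ('var', 'n'), ('const', 1)), {'n': 2}))   # True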
If we denote by Expr the set of all expressions e then the evaluation function E(·)· is a function from Expr × State into Z + B. Based on this function the semantics of an assignment is given, for example, by:

⟨v := e, σ⟩ −→₁ ⟨stop, σ[v ↦ E(e)σ]⟩.

The state σ stays unchanged except for the variable v. The value of this variable is changed so that it now contains the value represented by the expression e. The formal definition of the transition rules defining the operational semantics of pWhile in the SOS style is given in Table 1.
Table 1. The rules of the SOS semantics of pWhile

R0  ⟨skip, σ⟩ −→₁ ⟨stop, σ⟩
R1  ⟨stop, σ⟩ −→₁ ⟨stop, σ⟩
R2  ⟨v := e, σ⟩ −→₁ ⟨stop, σ[v ↦ E(e)σ]⟩
R3  ⟨v ?= r, σ⟩ −→_{1/|r|} ⟨stop, σ[v ↦ rᵢ ∈ r]⟩

R4₁  if ⟨S₁, σ⟩ −→_p ⟨S₁′, σ′⟩ then ⟨S₁; S₂, σ⟩ −→_p ⟨S₁′; S₂, σ′⟩
R4₂  if ⟨S₁, σ⟩ −→_p ⟨stop, σ′⟩ then ⟨S₁; S₂, σ⟩ −→_p ⟨S₂, σ′⟩

R5₁  ⟨choose p₁ : S₁ or p₂ : S₂ ro, σ⟩ −→_{p₁} ⟨S₁, σ⟩
R5₂  ⟨choose p₁ : S₁ or p₂ : S₂ ro, σ⟩ −→_{p₂} ⟨S₂, σ⟩

R6₁  ⟨if b then S₁ else S₂ fi, σ⟩ −→₁ ⟨S₁, σ⟩   if E(b)σ = true
R6₂  ⟨if b then S₁ else S₂ fi, σ⟩ −→₁ ⟨S₂, σ⟩   if E(b)σ = false

R7₁  ⟨while b do S od, σ⟩ −→₁ ⟨S; while b do S od, σ⟩   if E(b)σ = true
R7₂  ⟨while b do S od, σ⟩ −→₁ ⟨stop, σ⟩   if E(b)σ = false
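Read operationally, Table 1 defines a one-step transition function which, for a given configuration, returns all successor configurations with their probabilities. The following sketch is our own transcription (statements as Python tuples, reusing the evaluator E from the sketch above; the encoding is of our own devising):

    def sos_step(S, sigma):
        """One SOS step: a list of triples (probability, S', sigma')."""
        tag = S[0]
        if tag in ('skip', 'stop'):                          # R0, R1
            return [(1.0, ('stop',), sigma)]
        if tag == 'assign':                                  # R2: v := e
            _, v, e = S
            return [(1.0, ('stop',), {**sigma, v: E(e, sigma)})]
        if tag == 'rnd':                                     # R3: v ?= r
            _, v, r = S
            return [(1.0 / len(r), ('stop',), {**sigma, v: c}) for c in r]
        if tag == 'seq':                                     # R4.1 and R4.2
            _, S1, S2 = S
            return [(p, S2 if S1p == ('stop',) else ('seq', S1p, S2), sp)
                    for p, S1p, sp in sos_step(S1, sigma)]
        if tag == 'choose':                                  # R5.1 and R5.2
            _, p1, S1, p2, S2 = S
            return [(p1, S1, sigma), (p2, S2, sigma)]
        if tag == 'if':                                      # R6.1 and R6.2
            _, b, S1, S2 = S
            return [(1.0, S1 if E(b, sigma) else S2, sigma)]
        if tag == 'while':                                   # R7.1 and R7.2
            _, b, body = S
            return [(1.0, ('seq', body, S) if E(b, sigma) else ('stop',), sigma)]
        raise ValueError(tag)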
3.3 Examples
To illustrate the use of pWhile to formulate probabilistic programs we present two small examples which we will use throughout these lecture notes.

Example 2 (Factorial). This example concerns the factorial of a natural number, i.e. n! = 1 · 2 · 3 · . . . · n (with 0! = 1). The two programs below compute the usual factorial n! and the “double factorial” 2 · n!.

var m : {0..2};
    n : {0..2};
begin
  m := 1;
  while (n>1) do
    m := m*n;
    n := n-1;
  od;
  stop; # looping
end

var m : {0..2};
    n : {0..2};
begin
  m := 2;
  while (n>1) do
    m := m*n;
    n := n-1;
  od;
  stop; # looping
end
Though these two programs are deterministic, we will still analyse them using probabilistic techniques.

Example 3 (Monty Hall). Let us consider again Example 1 in Section 1. We can implement the two possible strategies of the contestant: either to stick to his/her initial choice no matter what the show host is doing, or to switch doors once one of the empty doors has been opened.

var d :{0,1,2};
    g :{0,1,2};
    o :{0,1,2};
begin
  # Pick winning door
  d ?= {0,1,2};
  # Pick guessed door
  g ?= {0,1,2};
  # Open empty door
  o ?= {0,1,2};
  while ((o == g) || (o == d)) do
    o := (o+1)%3;
  od;
  # Stick with guess
  stop; # looping
end

var d :{0,1,2};
    g :{0,1,2};
    o :{0,1,2};
begin
  # Pick winning door
  d ?= {0,1,2};
  # Pick guessed door
  g ?= {0,1,2};
  # Open empty door
  o ?= {0,1,2};
  while ((o == g) || (o == d)) do
    o := (o+1)%3;
  od;
  # Switch guess
  g := (g+1)%3;
  while (g == o) do
    g := (g+1)%3;
  od;
  stop; # looping
end
3.4 Linear Operator Semantics
In order to study the semantic properties of a pWhile program we will investigate the stochastic process which corresponds to the program executions. More precisely, we will construct the generator of a Discrete Time Markov Chain (DTMC) which represents the operational semantics of the program in question. The generator matrix of the DTMC which we will construct for any given pWhile program defines a linear operator – thus we refer to it as a Linear Operator Semantics (LOS) – on a vector space based on the labelled blocks and classical states of the program in question. The SOS transition relation – and in particular its restriction to the reachable configurations of a given program – can be directly encoded in a linear operator (cf. [16]), i.e. a matrix T defined for all configurations cᵢ, cⱼ by

(T)_{cᵢ,cⱼ} = p if ⟨Sᵢ, σᵢ⟩ −→_p ⟨Sⱼ, σⱼ⟩, and 0 otherwise.
However, this approach is in fact only a matrix representation of the SOS semantics and requires the construction of all possible execution trees. This is in itself not compositional, i.e. if we know already the DTMC of a part of the program (e.g. a while loop) it is impossible or at least extremely difficult to describe the operational semantics of a program which contains this part. Instead we present here a different construction which has the advantage of being compositional and therefore provides a more suitable basis for the compositional analysis in Section 4.2. In order to be able to refer to particular program points in an unambiguous way we introduce a standard labelling (cf. [15])

S ::= [stop]ℓ | [skip]ℓ | [v := a]ℓ | [v ?= r]ℓ
    | S₁ ; S₂
    | [choose]ℓ p₁ : S₁ or p₂ : S₂ ro
    | if [b]ℓ then S₁ else S₂ fi
    | while [b]ℓ do S od

where ℓ is a label in Lab – typically just a unique number.

Classical and Probabilistic States. The probabilistic state of the computation is described via a probability measure over the space of (classical) states State = (Var → Z + B). In order to keep the mathematical treatment as simple as possible we will exploit the fact that Var is finite for any given program. We furthermore restrict the actual range of integer variables to a finite sub-set Z of the integers. Although such a finite restriction is somewhat unsatisfactory from a purely theoretical point of view, it appears to be justified in the context of static program analysis (one could argue that any “real world” program has to be executed on a computer with certain memory limitations). As a result we can restrict our construction to probability distributions on State, i.e. Dist(State) ⊆ V(State), rather than referring to the more general notion of probability measures on states. While in discrete, i.e. finite, probability spaces every measure can be defined via a distribution, the same does not hold any more for infinite state spaces, even for countable ones: it is, for example, impossible to define on the set of rationals in the interval [0, 1] a kind of “uniform distribution” which would correspond to the Lebesgue measure. As we consider only finitely many variables, v = |Var|, we can represent the space of all possible states Var → Z + B as the Cartesian product (Z + B)^v, i.e. for every variable vᵢ ∈ Var we specify its associated value in (a separate copy of) Z + B. As the declarations of variables fix their types – in effect their possible range – we can exploit this information by presenting the state in a slightly more effective way:

State = Value₁ × Value₂ × . . . × Valueᵥ
with Valueᵢ = Z or B. We will use the convention that, given v variables, we enumerate them according to the sequence in which they are declared in D.

Probabilistic Control Flow. We base the compositional construction of our LOS semantics on a probabilistic version of the control flow [15] or abstract syntax [17] of pWhile programs. The flow F = flow is a set of triples ⟨ℓᵢ, pᵢⱼ, ℓⱼ⟩ which record the fact that control passes with probability pᵢⱼ from block Bᵢ to block Bⱼ, where a block is of the form Bᵢ = [. . .]^{ℓᵢ}. We assume label consistency, i.e. the labels on blocks are unique. We denote by Block(P) the set of all blocks and by Lab(P) the set of all labels in a program P. Except for the choose statement and the random assignment the probability pᵢⱼ is always equal to 1. For the if statement we indicate the control step into the then branch by underlining the target label; the same is the case for while statements. The formal definition of the control flow of a program following the presentation in [15] is based on two auxiliary operations init and final

init : Stmt → Lab
final : Stmt → P(Lab)

which return the initial label and the final labels of a statement (whereas a sequence of statements has a single entry, it may have multiple exits, as for example in the conditional):

init([skip]ℓ) = ℓ
init([stop]ℓ) = ℓ
init([v := e]ℓ) = ℓ
init([v ?= e]ℓ) = ℓ
init(S₁; S₂) = init(S₁)
init([choose]ℓ p₁ : S₁ or p₂ : S₂ ro) = ℓ
init(if [b]ℓ then S₁ else S₂ fi) = ℓ
init(while [b]ℓ do S od) = ℓ

and

final([skip]ℓ) = {ℓ}
final([stop]ℓ) = {ℓ}
final([v := e]ℓ) = {ℓ}
final([v ?= e]ℓ) = {ℓ}
final(S₁; S₂) = final(S₂)
final([choose]ℓ p₁ : S₁ or p₂ : S₂ ro) = final(S₁) ∪ final(S₂)
final(if [b]ℓ then S₁ else S₂ fi) = final(S₁) ∪ final(S₂)
final(while [b]ℓ do S od) = {ℓ}
The probabilistic control flow F(S) = flow(S) is then defined via a function flow

flow : Stmt → P(Lab × [0, 1] × Lab)

which maps statements to sets of triples which represent the probabilistic control flow graph:

flow([skip]ℓ) = ∅
flow([stop]ℓ) = {⟨ℓ, 1, ℓ⟩}
flow([v := e]ℓ) = ∅
flow([v ?= e]ℓ) = ∅
flow(S₁; S₂) = flow(S₁) ∪ flow(S₂) ∪ {⟨ℓ, 1, init(S₂)⟩ | ℓ ∈ final(S₁)}
flow([choose]ℓ p₁ : S₁ or p₂ : S₂ ro) = flow(S₁) ∪ flow(S₂) ∪ {⟨ℓ, p₁, init(S₁)⟩, ⟨ℓ, p₂, init(S₂)⟩}
flow(if [b]ℓ then S₁ else S₂ fi) = flow(S₁) ∪ flow(S₂) ∪ {⟨ℓ, 1, init(S₁)⟩, ⟨ℓ, 1, init(S₂)⟩}
flow(while [b]ℓ do S od) = flow(S) ∪ {⟨ℓ, 1, init(S)⟩} ∪ {⟨ℓ′, 1, ℓ⟩ | ℓ′ ∈ final(S)}

Example 4. Consider the following labelled program P:

var z : {0..200};
begin
  while [z<100]1 do
    [choose]2 1/3:[x:=3]3 or 2/3:[x:=1]4 ro;
  od;
  [stop]5;
end

The flow of this program is given by:

flow(P) = {⟨1, 1, 2⟩, ⟨1, 1, 5⟩, ⟨2, 1/3, 3⟩, ⟨2, 2/3, 4⟩, ⟨3, 1, 1⟩, ⟨4, 1, 1⟩}.

Probabilistic Configurations. The construction of the DTMC representing the probabilistic semantics of a pWhile program will – as in the SOS case – represent a transition relation on configurations in Conf. In the classical case configurations are pairs formed by the (remaining) program S which is to be executed and the computational state σ representing the current values of all the variables. For pWhile it is in fact enough to just record the initial statement init(S) of the program S. In other words, a classical configuration is an element in Stmt × State or just in Block × State.
For probabilistic programs there is in general no unique configuration which describes the current situation when we execute a program. Instead we need to use distributions (in general measures) over classical configurations, i.e. configurations are elements in Dist(State × Block) ⊆ V(State × Block). In order to distinguish between classical and probabilistic states we reverse the order between the value part and the syntactic part of configurations. Exploiting the fact that states can also be described as tuples in the Cartesian products of values of each variable and that blocks are uniquely labelled we identify as the space of configurations describing the computational situation in the probabilistic case

Dist(Value₁ × . . . × Valueᵥ × Lab) ⊆ V(Value₁ × . . . × Valueᵥ × Lab).

Finally we observe the important isomorphism between the vector space over Cartesian products and the tensor product of vector spaces. This allows us to construct probabilistic configurations as elements in

V(Value₁ × . . . × Valueᵥ × Lab) = V(Value₁) ⊗ . . . ⊗ V(Valueᵥ) ⊗ V(Lab).

This decomposition of the space of configurations is the basis for a compositional description of the DTMC generator which defines the semantics of a program. In particular, by the finiteness condition for Valueᵢ and the fact that Lab is always finite we know immediately the finite set of (potentially) reachable states and thus the space of probabilistic configurations. Furthermore, we will exploit the tensor product to describe the DTMC generator as a linear combination of local updates. These factors themselves are given as tensor products of operators which describe the computational dynamics of individual statements (blocks).

Basic Operators. In order to construct the concrete semantics we need to identify those states which satisfy certain conditions, e.g. all those states where a variable has a value larger than 5. This is achieved by “filtering” states which fulfil some conditions via projection operators, which are concretely represented by diagonal matrices. Consider a variable x together with the set of its possible values Value = {v₁, v₂, . . .}, and the vector space V(Value). The probabilistic state of the variable x can be described by a distribution over its possible values, i.e. a vector in V(Value). For example, if we know that x holds the value v₁ or v₃ with probabilities 1/3 and 2/3 respectively (and no other values) then this situation is represented by the vector (1/3, 0, 2/3, 0, . . .). As we represent distributions by row vectors x the application of a linear map corresponds to a post-multiplication by the corresponding matrix T, i.e. T(x) = x · T.
We might need to apply a transformation T to the probabilistic state of the variable xᵢ only when a certain condition is fulfilled. We can express such a condition by a predicate q on Valueᵢ. Defining a diagonal matrix P with

(P)ᵢᵢ = 1 if q(vᵢ) holds, and 0 otherwise,

allows us to “filter out” only those states which fulfil the condition q, i.e. P · T applies T only to those states. The Linear Operator Semantics (LOS) of pWhile is built using a number of basic operators which can be represented by the (sparse) square matrices specified in Table 2. The matrix unit E(m, n) contains only one non-zero entry, and I is the identity operator.

Table 2. Basic Operators for pWhile

(E(m, n))ᵢⱼ = 1 if m = i ∧ n = j, and 0 otherwise
(I)ᵢⱼ = 1 if i = j, and 0 otherwise

Using these basic building blocks we can define a number of “filters” P as depicted in Table 3.

Table 3. Test or Filter Operators for pWhile

(P(c))ᵢⱼ = 1 if i = c = j, and 0 otherwise
P(σ) = ⊗ᵢ₌₁ᵛ P(σ(xᵢ))
P(e = c) = Σ_{E(e)σ=c} P(σ)

Table 4. Update Operators for pWhile

(U(c))ᵢⱼ = 1 if j = c, and 0 otherwise
U(x_k ← c) = (⊗ᵢ₌₁^{k−1} I) ⊗ U(c) ⊗ (⊗ᵢ₌ₖ₊₁ᵛ I)
U(x_k ← e) = Σ_c P(e = c) U(x_k ← c)

Table 5. Linear Operator Semantics for pWhile

T(ℓ₁, ℓ₂) = I ⊗ E(ℓ₁, ℓ₂)   for [skip]^{ℓ₁}
T(ℓ, ℓ) = I ⊗ E(ℓ, ℓ)   for [stop]^ℓ
T(ℓ₁, ℓ₂) = U(v ← e) ⊗ E(ℓ₁, ℓ₂)   for [v := e]^{ℓ₁}
T(ℓ₁, ℓ₂) = 1/|r| Σ_{c∈r} U(v ← c) ⊗ E(ℓ₁, ℓ₂)   for [v ?= r]^{ℓ₁}
T(ℓ, ℓ_k) = I ⊗ E(ℓ, ℓ_k)   for [choose]^ℓ
T(ℓ, ℓ_t) = P(b = true) ⊗ E(ℓ, ℓ_t)   for [b]^ℓ
T(ℓ, ℓ_f) = P(b = false) ⊗ E(ℓ, ℓ_f)   for [b]^ℓ

The operator P(c) has only one non-zero entry: the diagonal element (P(c))_{cc} = 1, i.e. P(c) = E(c, c). This operator extracts the probability corresponding to the c-th coordinate of a vector, i.e. for x = (xᵢ)ᵢ the multiplication with P(c) results in a vector x′ = x · P(c) with only one non-zero coordinate, namely x′_c = x_c. The operator P(σ) performs a similar test for a vector representing the probabilistic state of the computation. It filters the probability that the computation is in a classical state σ. This is achieved by checking whether each variable xᵢ has the value specified by σ, namely σ(xᵢ). Finally, the operator
P(e = c) filters those states where the values of the variables xᵢ are such that the evaluation of the expression e results in c. The number of (diagonal) non-zero entries of this operator is exactly the number of states σ for which E(e)σ = c.

LOS Semantics. The update operators (see Table 4) implement state changes. From an initial probabilistic state σ, i.e. a distribution over classical states, we get a new probabilistic state σ′ via σ′ = σ · U. The simple operator U(c) implements the deterministic update of a variable xᵢ: Whatever the value(s) of xᵢ are, after applying U(c) to the state vector describing xᵢ we get a point distribution expressing the fact that the value of xᵢ is now certainly c. The operator U(x_k ← c) puts U(c) into the context of other variables: Most factors in the tensor product are identities, i.e. most variables keep their previous values, only x_k is deterministically updated to its new
value c using the previously defined U(c) operator. The operator U(x_k ← e) updates a variable not to a constant but to the value of an expression e. This update is realised using the filter operator P(e = c): For all possible values c of e we select those states where e evaluates to c and then update x_k to this c. The full LOS semantics of a pWhile program P is defined as the operator T = T(P) on V(State × Block(P)). This concrete semantics of a program P is given by:

T(P) = Σ_{⟨i, pᵢⱼ, j⟩ ∈ F(P)} pᵢⱼ · T(i, j).
The meaning of T(P) is to collect for every triple in the probabilistic flow F(P) of P its effects, weighted according to the probability associated to this triple. The operators T(ℓᵢ, ℓⱼ) which implement the local state updates and control transfers from ℓᵢ to ℓⱼ are presented in Table 5. Each local operator T(ℓᵢ, ℓⱼ) is of the form N ⊗ E(ℓᵢ, ℓⱼ) where the first factor N represents a state update or, in the case of tests, a filter operator while the second factor realises the transfer of control from label ℓᵢ to label ℓⱼ. For skip and stop no changes to the state happen, we only transfer control (deterministically) to the next statement or loop on the current (terminal) statement using matrix units E. Also in the case of a choose there is no change to the state but only a transfer of control, however the probabilities pᵢⱼ will in general be different from 1, unlike for skip. With assignments we have both a state update, implemented using U(v ← e), as well as a control flow step. For tests b we use the filter operators P(b = true), selecting those states which pass the test, and P(b = false), selecting those which fail it, to determine to which label control will pass.

Proposition 1. The operator T(P) is a stochastic matrix for any pWhile program P, i.e. the elements in each row add up to one. Thus, T is indeed the generator of a DTMC.

Furthermore, by the construction of T it also follows immediately that the SOS and LOS semantics are equivalent in the following sense.

Proposition 2. For any pWhile program P and any classical state σ ∈ State, we have: ⟨S, σ⟩ −→_p ⟨S′, σ′⟩ iff (T(P))_{⟨σ,ℓ⟩,⟨σ′,ℓ′⟩} = p, where ℓ and ℓ′ label the first block in the statement S and S′, respectively.

It is an easy exercise to introduce additional language features, e.g. pointers (see [18]) or (probabilistic) jumps, i.e. gotos, or sub-routines.
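All ingredients of this construction are elementary matrix manipulations. The following numpy sketch is our own (with helper names of our choosing; indices are 0-based): it builds the basic operators of Tables 2 and 4 and assembles T(P) for the tiny program var x : {0,1}; [x ?= {0,1}]1; [stop]2, checking Proposition 1 on the result:

    import numpy as np
    from functools import reduce

    def E_unit(n, i, j):              # matrix unit E(i, j)
        M = np.zeros((n, n)); M[i, j] = 1.0; return M

    def U(n, c):                      # update operator U(c): column c is all ones
        M = np.zeros((n, n)); M[:, c] = 1.0; return M

    def tensor(*Ms):                  # n-ary Kronecker/tensor product
        return reduce(np.kron, Ms)

    values = [0, 1]                   # range of the single variable x
    n, labs = len(values), 2          # 2 values; labels 1 and 2 (0-based: 0, 1)
    I = np.eye(n)

    T = (0.5 * tensor(U(n, 0), E_unit(labs, 0, 1))     # [x ?= {0,1}]1, value 0
         + 0.5 * tensor(U(n, 1), E_unit(labs, 0, 1))   # [x ?= {0,1}]1, value 1
         + tensor(I, E_unit(labs, 1, 1)))              # [stop]2 loops on itself

    assert np.allclose(T.sum(axis=1), 1.0)   # Proposition 1: T(P) is stochastic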
3.5 Example
Example 5 (Monty Hall). Consider again our running Example 1. The labelled version of the program Hw (with switching doors) is:
var d :{0,1,2};
    g :{0,1,2};
    o :{0,1,2};
begin
  [d ?= {0,1,2}]1;
  [g ?= {0,1,2}]2;
  [o ?= {0,1,2}]3;
  while [((o == g)||(o == d))]4 do
    [o := (o+1)%3]5;
  od;
  [g := (g+1)%3]6;
  while [(g == o)]7 do
    [g := (g+1)%3]8;
  od;
  [stop]9;
end

The blocks for this program Hw are thus

Block(Hw) = {[d ?= {0,1,2}]1, [g ?= {0,1,2}]2, [o ?= {0,1,2}]3, [((o == g)||(o == d))]4, [o := ((o+1) % 3)]5, [g := ((g+1) % 3)]6, [(g == o)]7, [g := ((g+1) % 3)]8, [stop]9}

and the flow

Flow(Hw) = {(1, 1, 2), (2, 1, 3), (3, 1, 4), (4, 1, 5), (5, 1, 4), (4, 1, 6), (6, 1, 7), (7, 1, 8), (8, 1, 7), (7, 1, 9), (9, 1, 9)}

The elements describing the version Ht where we stick to the original door are identical to this one, except that it involves only labels 1 to 5 and then a final [stop]6 (instead of [stop]9). Note that this program does not use probabilistic choices and so the second element of all entries in flow is 1 (though we use random assignments here). Following the definition of the LOS semantics we can construct the transition operators, i.e. the generators of the corresponding DTMCs, in a straightforward way.
T(Ht) = 1/3 (U(d ← 0) + U(d ← 1) + U(d ← 2)) ⊗ E(1, 2)
      + 1/3 (U(g ← 0) + U(g ← 1) + U(g ← 2)) ⊗ E(2, 3)
      + 1/3 (U(o ← 0) + U(o ← 1) + U(o ← 2)) ⊗ E(3, 4)
      + P((o == g)||(o == d) = true) ⊗ E(4, 5)
      + P((o == g)||(o == d) = false) ⊗ E(4, 6)
      + U(o ← (o+1)%3) ⊗ E(5, 4)
      + I ⊗ E(6, 6)
and

T(Hw) = 1/3 (U(d ← 0) + U(d ← 1) + U(d ← 2)) ⊗ E(1, 2)
      + 1/3 (U(g ← 0) + U(g ← 1) + U(g ← 2)) ⊗ E(2, 3)
      + 1/3 (U(o ← 0) + U(o ← 1) + U(o ← 2)) ⊗ E(3, 4)
      + P((o == g)||(o == d) = true) ⊗ E(4, 5)
      + P((o == g)||(o == d) = false) ⊗ E(4, 6)
      + U(o ← (o+1)%3) ⊗ E(5, 4)
      + U(g ← (g+1)%3) ⊗ E(6, 7)
      + P((g == o) = true) ⊗ E(7, 8)
      + U(g ← (g+1)%3) ⊗ E(8, 7)
      + P((g == o) = false) ⊗ E(7, 9)
      + I ⊗ E(9, 9)
The individual update operators T(i, j) are then given by the following matrices, where the elements of State are enumerated as follows:

 1 . . . (d ↦ 0, g ↦ 0, o ↦ 0)    10 . . . (d ↦ 1, g ↦ 0, o ↦ 0)    19 . . . (d ↦ 2, g ↦ 0, o ↦ 0)
 2 . . . (d ↦ 0, g ↦ 0, o ↦ 1)    11 . . . (d ↦ 1, g ↦ 0, o ↦ 1)    20 . . . (d ↦ 2, g ↦ 0, o ↦ 1)
 3 . . . (d ↦ 0, g ↦ 0, o ↦ 2)    12 . . . (d ↦ 1, g ↦ 0, o ↦ 2)    21 . . . (d ↦ 2, g ↦ 0, o ↦ 2)
 4 . . . (d ↦ 0, g ↦ 1, o ↦ 0)    13 . . . (d ↦ 1, g ↦ 1, o ↦ 0)    22 . . . (d ↦ 2, g ↦ 1, o ↦ 0)
 5 . . . (d ↦ 0, g ↦ 1, o ↦ 1)    14 . . . (d ↦ 1, g ↦ 1, o ↦ 1)    23 . . . (d ↦ 2, g ↦ 1, o ↦ 1)
 6 . . . (d ↦ 0, g ↦ 1, o ↦ 2)    15 . . . (d ↦ 1, g ↦ 1, o ↦ 2)    24 . . . (d ↦ 2, g ↦ 1, o ↦ 2)
 7 . . . (d ↦ 0, g ↦ 2, o ↦ 0)    16 . . . (d ↦ 1, g ↦ 2, o ↦ 0)    25 . . . (d ↦ 2, g ↦ 2, o ↦ 0)
 8 . . . (d ↦ 0, g ↦ 2, o ↦ 1)    17 . . . (d ↦ 1, g ↦ 2, o ↦ 1)    26 . . . (d ↦ 2, g ↦ 2, o ↦ 1)
 9 . . . (d ↦ 0, g ↦ 2, o ↦ 2)    18 . . . (d ↦ 1, g ↦ 2, o ↦ 2)    27 . . . (d ↦ 2, g ↦ 2, o ↦ 2)
For the first three random assignments we have
T(1, 2) = [1/3 (U(d ← 0) + U(d ← 1) + U(d ← 2))] ⊗ E(1, 2)

where the first factor is the 27 × 27 stochastic matrix which, from every state, moves to each of the three possible values of d with probability 1/3 while leaving g and o unchanged; its explicit array of entries 0 and 1/3 is omitted here.
T(2, 3) = [1/3 (U(g ← 0) + U(g ← 1) + U(g ← 2))] ⊗ E(2, 3)

T(3, 4) = [1/3 (U(o ← 0) + U(o ← 1) + U(o ← 2))] ⊗ E(3, 4)

where again the first factors are 27 × 27 stochastic matrices with entries 0 and 1/3, randomising g and o respectively; their explicit arrays are omitted here.
In each of these cases we have an update of the values of one of the three variables d, g and o together with a (deterministic) transfer of control from one label to the next, implemented via the matrix units E(1, 2), E(2, 3) and E(3, 4). After that we have to consider the projections which implement the guard of the first while loop. If the current state of the variables d, g and o fulfils the condition (o == g) || (o == d) then we transfer control to label 5, otherwise to label 6. The filters or projections which determine which control transfer will be executed are diagonal matrices, such that the transfer operators are given by:
T(4, 5) = diag(1 0 0 1 1 0 1 0 1 1 1 0 0 1 0 0 1 1 1 0 1 0 1 1 0 0 1) ⊗ E(4, 5)

and

T(4, 6) = diag(0 1 1 0 0 1 0 1 0 0 0 1 1 0 1 1 0 0 0 1 0 1 0 0 1 1 0) ⊗ E(4, 6).

The single statement labelled with 5 is a deterministic update and is combined with a return to the guard of the loop labelled with 4, i.e.
T(5, 4) = [the 27 × 27 permutation (0/1) matrix implementing the deterministic update o := (o+1)%3, i.e. mapping each state (d, g, o) to (d, g, (o+1)%3); its explicit array is omitted here] ⊗ E(5, 4)
If we are not switching doors then we can construct essentially with these matrices the full LOS semantics of Ht. For the switching strategy we also need to execute the switch. This starts with the statement at label 6, with its associated control step from label 6 to 7:
T(6, 7) = [the 27 × 27 permutation matrix implementing g := (g+1)%3, i.e. mapping each state (d, g, o) to (d, (g+1)%3, o); its explicit array is omitted here] ⊗ E(6, 7)
After this we have another loop which is guarded by the test g == o labelled with 7. To implement this we need again two diagonal matrices which allow us to construct

T(7, 8) = diag(1 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 0 1) ⊗ E(7, 8)
and

T(7, 9) = diag(0 1 1 1 0 1 1 1 0 0 1 1 1 0 1 1 1 0 0 1 1 1 0 1 1 1 0) ⊗ E(7, 9).

Finally, we only need the LOS semantics of the single statement which makes up the loop in which we avoid changing our guess to the already open door. The matrix for this is given by
$$\mathbf{T}(8,7) = \big( I_3 \otimes P_{+1} \otimes I_3 \big) \otimes \mathbf{E}(8,7).$$
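Guard matrices of this kind need not be written out by hand; the following small numpy sketch (our own illustration) generates the two branch projections for the test g == o under the variable order d, g, o used here:

import numpy as np

states = [(d, g, o) for d in range(3) for g in range(3) for o in range(3)]

# diagonal projections for the two branches of the guard g == o
P_true = np.diag([1.0 if g == o else 0.0 for (d, g, o) in states])
P_false = np.eye(27) - P_true

# the two branches always partition the identity
assert np.allclose(P_true + P_false, np.eye(27))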
Note that the update part of T(8,7) is exactly the same as for T(6,7). This reflects the fact that we have the same statement at labels 6 and 8: g := (g+1)%3. The LOS semantics is compositional and syntax directed; it treats syntactically identical parts in exactly the same way. The only difference between T(8,7) and T(6,7) is the different "continuation" after the update. The final [stop] statement just introduces an infinite loop on the terminal state we have reached (in order to "preserve" the final result). It is implemented, both for the program Ht and for the program Hw, as the matrix $\mathbf{I} \otimes \mathbf{E}(\ell, \ell)$ for the respective final label $\ell$. Putting everything together we get the LOS semantics for both programs as:

$$\mathbf{T}(H_t) = \mathbf{T}(1,2) + \mathbf{T}(2,3) + \mathbf{T}(3,4) + \mathbf{T}(4,5) + \mathbf{T}(5,4) + \mathbf{T}(4,6) + \mathbf{I} \otimes \mathbf{E}(6,6)$$

and

$$\mathbf{T}(H_w) = \mathbf{T}(1,2) + \mathbf{T}(2,3) + \mathbf{T}(3,4) + \mathbf{T}(4,5) + \mathbf{T}(5,4) + \mathbf{T}(4,6) + \mathbf{T}(6,7) + \mathbf{T}(7,8) + \mathbf{T}(8,7) + \mathbf{T}(7,9) + \mathbf{I} \otimes \mathbf{E}(9,9)$$

This is for Ht a (27·5)×(27·5) = 162×162 matrix and for Hw a (27·9)×(27·9) = 243×243 matrix, which we omit to present in full. These matrices encode all possible transitions between all (potentially) reachable configurations. For any probabilistic state, i.e. any distribution over State, we can construct the successor configuration using just these matrices.
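To make the tensor-product construction concrete, the following small numpy sketch (our own illustration; the matrices in these notes were generated with the authors' ocaml/octave tooling mentioned in the acknowledgements) builds T(6,7) for Hw and performs a single DTMC step:

import numpy as np

I3 = np.eye(3)
P_inc = np.roll(np.eye(3), 1, axis=1)      # cyclic shift: g -> (g+1) % 3

def E(i, j, k=9):
    # control-flow step from label i to label j (Hw has labels 1..9)
    M = np.zeros((k, k)); M[i - 1, j - 1] = 1.0
    return M

# update part for g := (g+1)%3 (variable order d, g, o), 27 x 27
U_switch = np.kron(np.kron(I3, P_inc), I3)
T67 = np.kron(U_switch, E(6, 7))           # 243 x 243

# one step on the point distribution for (d=0, g=0, o=2, label 6);
# a configuration (d, g, o, l) has index (9d + 3g + o)*9 + (l - 1)
x = np.zeros(243)
x[(9*0 + 3*0 + 2)*9 + (6 - 1)] = 1.0
y = x @ T67
assert y[(9*0 + 3*1 + 2)*9 + (7 - 1)] == 1.0   # now g = 1 and label = 7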
4 Probabilistic Static Analysis
Program analysis is a collection of techniques to predict in advance what will happen when a program is executed. Classically, such information could be used to optimise the code produced by a compiler; more recently this has formed the basis for the automatic debugging, verification and certification of code. Since well-known undecidability results tell us that it is impossible to know everything about the behaviour of every program, we can only aim for partial answers to some of the questions. Program analysis techniques allow us to obtain such partial answers via the use of approximation and abstraction, aiming at reducing or simplifying the problem under consideration. It is now widely recognised that considering probabilistic information allows one to obtain more precise and practically useful results from a static analysis. Statistical or other types of information can be encoded in the form of probabilities in the program semantics and used to weight the execution paths according to the likelihood that they are actually executed. We will show here how one of the basic techniques of static analysis, namely Abstract Interpretation, can be extended so as to include quantitative information in the form of both probabilities associated with the analysed programs, and estimates of the precision of the resulting analyses.
4.1 Classical Abstract Interpretation
The basic idea behind abstract interpretation is to analyse a program not in terms of its standard or concrete semantics, but rather in terms of an appropriately simplified approximated or abstract semantics, which only registers those aspects of the program which are of relevance with respect to a specific analysis (cf. [19,20,15]). Typically these aspects are encoded in the definition of an abstract domain, which is usually structured, like the concrete domain, as a complete partially ordered set. The idea is to execute the program using abstract instead of concrete values to describe the current state or configuration. For an informal introduction let us consider the factorial programs.

Example 6 (Factorial). Considering the two programs in Example 2, the "double factorial" and the plain "factorial", we get the following labelling:

var m : {0..2}; n : {0..2};
begin
  [m := 2]1;
  while [(n>1)]2 do
    [m := m*n]3;
    [n := n-1]4;
  od;
  [stop]5;
end

and

var m : {0..2}; n : {0..2};
begin
  [m := 1]1;
  while [(n>1)]2 do
    [m := m*n]3;
    [n := n-1]4;
  od;
  [stop]5;
end
The idea is now to analyse the properties of the states during the execution of the program rather than the actual or concrete values of the variables. To demonstrate this idea let us look at the parity of the variables, i.e. whether they are even or odd. The abstract property we are interested in is the description of the possible parities of the variables m and n: If we can guarantee that a variable is always 'even' when we reach a certain program point then we associate to it the abstract value or property even; if on the other hand we are certain that a variable is always 'odd', then we use odd as its abstract value. However, we also have to take care of the case when we are not sure about the parity of a variable: it could be sometimes even or sometimes odd. We use the value ⊤ to indicate this ambiguous situation. We can distinguish this situation from another kind of unknown value ⊥, which we use to handle non-initialised variables, which are neither even nor odd. This situation can be formalised using the notion of a lattice L, cf. [21]:

        ⊤
       /  \
   even    odd
       \  /
        ⊥

which expresses the relation between abstract values as an order relation: e.g. ⊤ is more general than even and odd, i.e. if we know that a variable could be even and odd, then the general statement which describes its (abstract) value is to say that its parity is ambiguous, or ⊤. We can interpret this property lattice also as the power-set of {even, odd}, i.e. L = P({even, odd}), identifying ⊤ = {even, odd} and ⊥ = ∅, ordered by inclusion "⊆". We now consider the abstract execution of the "double factorial" program (the first program above). Two cases are possible: one where the guard in label 2 fails, and one where we enter the loop. The abstract values we can associate in these two cases (assuming that we start with unknown ⊤ rather than non-initialised values) are:

  1 : m → ⊤,    n → ⊤        1 : m → ⊤,    n → ⊤
  2 : m → even, n → ⊤        2 : m → even, n → ⊤
  3 :                        3 : m → even, n → ⊤
  4 :                        4 : m → even, n → ⊤
  5 : m → even, n → ⊤        5 : m → even, n → ⊤
We observe that the parity of n remains ambiguous throughout the execution of the program. However, whether or not the loop is executed, the parity of m will always be even when we reach the final label 5: If we omit the loop then the even value 2 we assigned to m is directly used; if we execute the loop, then m enters the loop at the first iteration with an even value and remains even despite the fact that in label 3 it is multiplied with an unknown n because we know that the
product of an even number with any number results again in an even number. In any subsequent iteration the same argument holds. Thus, whenever the loop terminates, we will always be certain that m is even when we reach label 5. The "double factorial" always produces an even result. If we consider the second program above, which implements the plain "factorial", then our argument breaks down. The abstract executions in this case give us:

  1 : m → ⊤,   n → ⊤        1 : m → ⊤, n → ⊤
  2 : m → odd, n → ⊤        2 : m → odd, n → ⊤
  3 :                       3 : m → ⊤, n → ⊤
  4 :                       4 : m → ⊤, n → ⊤
  5 : m → odd, n → ⊤        5 : m → ⊤, n → ⊤
If the loop is not executed we can guarantee that m is odd; but if we execute the loop then we have to multiply (in the first iteration) an odd m with an unknown n, and we cannot guarantee any particular parity for m from then on. As a result the analysis will return ⊤ for the parity of m at label 5. The factorial indeed may give an odd value (for 0 and 1), but it is obvious that for "most" values of n it will be an even number. The classical analysis is conservative and unable to extract this information. The remainder of these notes aims at developing a framework which allows for a formal analysis that captures such a "probabilistic" intuition. A detailed formal discussion of the relation between the concrete values of m and n as sub-sets of Z, i.e. as elements of the power-set P(Z) (which also forms a lattice in a canonical way via the sub-set relation), and their abstract values in L is beyond the scope of these notes. For our purposes, it is sufficient to say that there exists an abstraction function α between the concrete and abstract values of m and n, and a formal way to define an abstract semantics describing our factorial programs in terms of these abstract values, by constructing the "right" concretisation function γ. In the standard theory of abstract interpretation, which was introduced by Cousot & Cousot 30 years ago [22,23], the correctness of an abstract semantics is guaranteed by ensuring that we have a pair of functions α and γ which form a Galois connection between two lattices C and D representing concrete and abstract properties.

Definition 1. Let C = (C, ≤C) and D = (D, ≤D) be two partially ordered sets (e.g. lattices). If there are two functions α : C → D and γ : D → C such that for all c ∈ C and all d ∈ D:

$$c \leq_C \gamma(d) \quad\text{iff}\quad \alpha(c) \leq_D d,$$

then (C, α, γ, D) forms a Galois connection.

The intended meaning is that an abstract element d approximates a concrete one c if $c \leq_C \gamma(d)$ or, equivalently (by adjunction), if $\alpha(c) \leq_D d$. Therefore,
the concrete value corresponding to an abstract denotation d is γ(d), while the adjunction guarantees that α(c) is the best possible approximation of c in D (because whenever d is a correct approximation of c, then $\alpha(c) \leq_D d$). An abstract function $f^\# : D \to D$ is a correct approximation of a concrete function $f : C \to C$ if

$$\alpha \circ f \leq_D f^\# \circ \alpha.$$

If α and γ form a Galois connection then correctness is automatically guaranteed. The important case is when f describes the (concrete) semantics of a program. An easy way to define a correct abstract function (e.g. a semantics) $f^\#$ is to induce it simply via $f^\# = \alpha \circ f \circ \gamma$. An alternative characterisation of a Galois connection is as follows:

Theorem 1. Let C = (C, ≤C) and D = (D, ≤D) be two partially ordered sets together with two functions α : C → D and γ : D → C. Then (C, α, γ, D) forms a Galois connection iff

1. α and γ are order-preserving,
2. α ◦ γ is reductive (i.e. for any d ∈ D, α ◦ γ(d) ≤D d),
3. γ ◦ α is extensive (i.e. for any c ∈ C, c ≤C γ ◦ α(c)).

A further important property of Galois connections guarantees that the approximation of a concrete semantics by means of two functions α and γ related by a Galois connection is not only safe but also conservative, in so far as repeating the abstraction or the concretisation gives the same result as a single application of these functions. Formally, this property is expressed by the following proposition: Let (C, α, γ, D) be a Galois connection; then α and γ are quasi-inverse, i.e. α ◦ γ ◦ α = α, and γ ◦ α ◦ γ = γ.
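Since both domains are finite here, the adjunction of Definition 1 can be checked exhaustively; the following Python sketch does so for the parity abstraction, using a finite window of Z (our own simplifying assumption) to stand in for the concrete values:

from itertools import chain, combinations

window = list(range(-4, 5))          # finite window of Z (an assumption)

def parity(v):
    return 'even' if v % 2 == 0 else 'odd'

def alpha(s):                        # best abstract description of a set
    return frozenset(parity(v) for v in s)

def gamma(d):                        # all values whose parity belongs to d
    return frozenset(v for v in window if parity(v) in d)

# the four-point lattice: bottom, even, odd, top
abstract = [frozenset(), frozenset({'even'}), frozenset({'odd'}),
            frozenset({'even', 'odd'})]
subsets = chain.from_iterable(combinations(window, r)
                              for r in range(len(window) + 1))
for s in map(frozenset, subsets):
    for d in abstract:
        # c <= gamma(d) iff alpha(c) <= d, for all c and d
        assert (s <= gamma(d)) == (alpha(s) <= d)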
4.2 Probabilistic Abstract Interpretation
The general approach for constructing simplified versions of a concrete (collecting) semantics via abstract interpretation is based on order-theoretic and not on linear structures. One can define a number of orderings (lexicographic, etc.) as additional structure on a given vector space, and then use this order to compute over- or under-approximations using classical Abstract Interpretation. Though such approximations will always be safe, they might also be quite unrealistic, addressing a worst-case scenario rather than the average case [24]. Furthermore, there is no canonical order on a vector space (e.g. the lexicographic order depends on the basis). In order to provide probabilistic estimates we have previously introduced, cf. [1,25], a quantitative version of the Cousot & Cousot framework, which we have called Probabilistic Abstract Interpretation (PAI). The PAI approach is based, as in the classical case, on a concrete and an abstract domain C and D, except that C and D are now vector spaces (or, in general, Hilbert spaces) instead of lattices. We assume that the pair of abstraction and concretisation functions α : C → D and γ : D → C are again structure preserving, i.e. in our setting they are (bounded) linear maps represented by matrices A and G. Finally, we replace the notion of a Galois connection by the notion of a Moore-Penrose pseudo-inverse.
Definition 2. Let C and D be two finite dimensional vector spaces, and let A : C → D be a linear map between them. The linear map A† = G : D → C is the Moore-Penrose pseudo-inverse of A iff

$$\mathbf{A} \circ \mathbf{G} = \mathbf{P}_A \quad\text{and}\quad \mathbf{G} \circ \mathbf{A} = \mathbf{P}_G$$

where $\mathbf{P}_A$ and $\mathbf{P}_G$ denote orthogonal projections (i.e. $\mathbf{P}_A^* = \mathbf{P}_A = \mathbf{P}_A^2$ and $\mathbf{P}_G^* = \mathbf{P}_G = \mathbf{P}_G^2$, where $*$ denotes the adjoint [11, Ch 10]) onto the ranges of A and G. Alternatively, if A is Moore-Penrose invertible, its Moore-Penrose pseudo-inverse A† satisfies the following: (i) $\mathbf{A}\mathbf{A}^\dagger\mathbf{A} = \mathbf{A}$, (ii) $\mathbf{A}^\dagger\mathbf{A}\mathbf{A}^\dagger = \mathbf{A}^\dagger$, (iii) $(\mathbf{A}\mathbf{A}^\dagger)^* = \mathbf{A}\mathbf{A}^\dagger$, (iv) $(\mathbf{A}^\dagger\mathbf{A})^* = \mathbf{A}^\dagger\mathbf{A}$.

It is instructive to compare these equations with the classical setting. For example, if (α, γ) is a Galois connection we similarly have α ◦ γ ◦ α = α and γ ◦ α ◦ γ = γ. This allows us to construct the closest (i.e. least square, see for example [26,27]) approximation $\mathbf{T}^\# : D \to D$ of the concrete semantics $\mathbf{T} : C \to C$ as:

$$\mathbf{T}^\# = \mathbf{G} \cdot \mathbf{T} \cdot \mathbf{A} = \mathbf{A}^\dagger \cdot \mathbf{T} \cdot \mathbf{A} = \mathbf{A} \circ \mathbf{T} \circ \mathbf{G}.$$

As our concrete semantics is constructed using tensor products, it is important that the Moore-Penrose pseudo-inverse of a tensor product can easily be computed as follows [27, 2.1, Ex 3]:

$$(\mathbf{A}_1 \otimes \mathbf{A}_2 \otimes \ldots \otimes \mathbf{A}_n)^\dagger = \mathbf{A}_1^\dagger \otimes \mathbf{A}_2^\dagger \otimes \ldots \otimes \mathbf{A}_n^\dagger.$$

Example 7 (Parity). Let us consider as concrete and abstract domains C = V({−n, . . . , n}) and D = V({even, odd}). The abstraction operator $\mathbf{A}_p$ and its concretisation operator $\mathbf{G}_p = \mathbf{A}_p^\dagger$ corresponding to a parity analysis are represented by the following (2n+1) × 2 and 2 × (2n+1) matrices (assuming w.l.o.g. that n is even, and with T denoting the matrix transpose, $(\mathbf{A}^T)_{ij} = (\mathbf{A})_{ji}$):

$$\mathbf{A}_p = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ \vdots & \vdots \\ 1 & 0 \end{pmatrix} \qquad \mathbf{A}_p^\dagger = \begin{pmatrix} \frac{1}{n+1} & 0 & \frac{1}{n+1} & 0 & \cdots & \frac{1}{n+1} \\ 0 & \frac{1}{n} & 0 & \frac{1}{n} & \cdots & 0 \end{pmatrix}$$

The concretisation operator $\mathbf{A}_p^\dagger$ represents uniform distributions over the n+1 even numbers in the range −n, . . . , n (as the first row) and the n odd numbers in the same range (as the second row).
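The defining equations and the induced operator T# = A†TA can be explored directly in numpy; the following sketch instantiates Example 7 for n = 4 and, as our own added illustration, induces the abstract operator of a hypothetical increment statement x := x + 1:

import numpy as np

vals = np.arange(-4, 5)                       # concrete domain {-4, ..., 4}
A = np.array([[1.0, 0.0] if v % 2 == 0 else [0.0, 1.0] for v in vals])
G = np.linalg.pinv(A)                         # Moore-Penrose pseudo-inverse
# the even row of G is the uniform distribution over the 5 even values
assert np.allclose(G[0], [0.2, 0, 0.2, 0, 0.2, 0, 0.2, 0, 0.2])

# the defining equations of Definition 2
assert np.allclose(A @ G @ A, A) and np.allclose(G @ A @ G, G)
assert np.allclose((A @ G).T, A @ G) and np.allclose((G @ A).T, G @ A)

# inducing an abstract operator for x := x + 1 (our hypothetical update;
# the top of the range overflows, giving an empty row as discussed later)
U = np.diag(np.ones(8), 1)                    # U[i, i+1] = 1
T_sharp = G @ U @ A
# parity flips; the 0.8 in the even row reflects the mass lost to overflow
print(T_sharp)                                # approx [[0, 0.8], [1, 0]]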
Example 8 (Sign). With C = V({−n, . . . , 0, . . . , n}) and D = V({−, 0, +}) we can represent the usual sign abstraction by the following matrices:

$$\mathbf{A}_s = \begin{pmatrix} 1 & 0 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 0 & 0 & 1 \end{pmatrix} \qquad \mathbf{A}_s^\dagger = \begin{pmatrix} \frac{1}{n} & \cdots & \frac{1}{n} & 0 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & \frac{1}{n} & \cdots & \frac{1}{n} \end{pmatrix}$$

Example 9 (Forget). We can also abstract away all details of the concrete semantics. Although this is in general a rather unusual abstraction, it is quite useful in the context of a tensor product state and/or abstraction. Let the concrete domain be the vector space over any range, i.e. C = V({n, . . . , 0, . . . , m}), and the abstract domain a one-dimensional space D = V({⋆}). Then the forgetful abstraction and concretisation can be defined by:

$$\mathbf{A}_f^T = \begin{pmatrix} 1 & 1 & 1 & \ldots & 1 \end{pmatrix} \qquad \mathbf{A}_f^\dagger = \begin{pmatrix} \frac{1}{m-n+1} & \frac{1}{m-n+1} & \ldots & \frac{1}{m-n+1} \end{pmatrix}$$

For any matrix M operating on C = V({n, . . . , 0, . . . , m}) the abstraction $\mathbf{A}_f^\dagger \cdot \mathbf{M} \cdot \mathbf{A}_f$ gives a one-dimensional matrix, i.e. a single scalar μ. For stochastic matrices, such as our T generating the DTMC representing the concrete semantics, we have μ = 1. If we consider a tensor product of two matrices M ⊗ N, then the abstraction $\mathbf{A}_f \otimes \mathbf{I}$ extracts (essentially) N:

$$(\mathbf{A}_f \otimes \mathbf{I})^\dagger \cdot (\mathbf{M} \otimes \mathbf{N}) \cdot (\mathbf{A}_f \otimes \mathbf{I}) = (\mathbf{A}_f^\dagger \otimes \mathbf{I}^\dagger) \cdot (\mathbf{M} \otimes \mathbf{N}) \cdot (\mathbf{A}_f \otimes \mathbf{I}) = (\mathbf{A}_f^\dagger \cdot \mathbf{M} \cdot \mathbf{A}_f) \otimes (\mathbf{I} \cdot \mathbf{N} \cdot \mathbf{I}) = \mu \otimes \mathbf{N} = \mu \mathbf{N}.$$
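This last calculation can be verified mechanically; a sketch with two randomly generated stochastic matrices standing in for M and N:

import numpy as np

rng = np.random.default_rng(0)
M = rng.random((3, 3)); M /= M.sum(axis=1, keepdims=True)   # row-stochastic
N = rng.random((4, 4)); N /= N.sum(axis=1, keepdims=True)

Af = np.ones((3, 1))                       # forgetful abstraction on the M part
lift = np.kron(Af, np.eye(4))              # Af (x) I
extracted = np.linalg.pinv(lift) @ np.kron(M, N) @ lift
assert np.allclose(extracted, N)           # = mu * N with mu = 1 for stochastic M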
4.3 Abstract LOS Semantics
The abstract semantics T#(P) of a program P is constructed exactly like the concrete one, except that we will use abstract test and update operators. This is possible as abstractions and concretisations distribute over sums and tensor products. More precisely, we can construct T# for a program P as:

$$\mathbf{T}^\#(P) = \sum_{\langle i,\, p_{ij},\, j \rangle \in \mathcal{F}(P)} p_{ij} \cdot \mathbf{T}^\#(\ell_i, \ell_j)$$

where the transfer operator along a computational step from label $\ell_i$ to $\ell_j$ can be abstracted "locally": abstracting each variable separately and using the concrete control flow, we get the operator

$$\mathbf{A} = \Big( \bigotimes_{i=1}^{v} \mathbf{A}_i \Big) \otimes \mathbf{I} = \mathbf{A}_1 \otimes \mathbf{A}_2 \otimes \ldots \otimes \mathbf{A}_v \otimes \mathbf{I}.$$
Then the abstract transfer operator $\mathbf{T}^\#(\ell_i, \ell_j)$ can be defined as:

$$\mathbf{T}^\#(\ell_i, \ell_j) = (\mathbf{A}_1^\dagger \mathbf{N}_{i1} \mathbf{A}_1) \otimes (\mathbf{A}_2^\dagger \mathbf{N}_{i2} \mathbf{A}_2) \otimes \ldots \otimes (\mathbf{A}_v^\dagger \mathbf{N}_{iv} \mathbf{A}_v) \otimes \mathbf{E}(\ell_i, \ell_j).$$

This operator implements the (abstract) effect on each of the variables of the individual statement at $\ell_i$ and combines it with the concrete control flow. This follows directly from a short calculation:

$$\mathbf{T}^\# = \mathbf{A}^\dagger \mathbf{T} \mathbf{A} = \mathbf{A}^\dagger \Big( \sum_{i,j} p_{ij} \cdot \mathbf{T}(\ell_i, \ell_j) \Big) \mathbf{A} = \sum_{i,j} p_{ij} \cdot \big( \mathbf{A}^\dagger\, \mathbf{T}(\ell_i, \ell_j)\, \mathbf{A} \big)$$
$$= \sum_{i,j} p_{ij} \cdot \Big( \bigotimes_k \mathbf{A}_k \otimes \mathbf{I} \Big)^\dagger \mathbf{T}(\ell_i, \ell_j) \Big( \bigotimes_k \mathbf{A}_k \otimes \mathbf{I} \Big)$$
$$= \sum_{i,j} p_{ij} \cdot \Big( \bigotimes_k \mathbf{A}_k^\dagger \otimes \mathbf{I}^\dagger \Big) \Big( \bigotimes_k \mathbf{N}_{ik} \otimes \mathbf{E}(\ell_i, \ell_j) \Big) \Big( \bigotimes_k \mathbf{A}_k \otimes \mathbf{I} \Big)$$
$$= \sum_{i,j} p_{ij} \cdot \bigotimes_k \big( \mathbf{A}_k^\dagger \mathbf{N}_{ik} \mathbf{A}_k \big) \otimes \mathbf{E}(\ell_i, \ell_j).$$
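The step to the third display uses the tensor-product property of the pseudo-inverse quoted earlier; a quick two-factor numerical check:

import numpy as np

rng = np.random.default_rng(1)
A1, A2 = rng.random((5, 2)), rng.random((4, 3))
lhs = np.linalg.pinv(np.kron(A1, A2))
rhs = np.kron(np.linalg.pinv(A1), np.linalg.pinv(A2))
assert np.allclose(lhs, rhs)     # (A1 (x) A2)^dagger = A1^dagger (x) A2^dagger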
It is of course also possible to abstract the control flow, or to use abstractions which abstract several variables at the same time, e.g. by specifying the abstract state via the difference of two variables. The dramatic reduction in size, i.e. in dimensions, achieved via PAI, also illustrated by the examples in these notes, lets us hope that our approach could ultimately lead to scalable analyses, despite the fact that the concrete semantics is non-feasibly large. As many people have observed, the use of tensor products or similar constructions in probabilistic models leads to a combinatorial explosion of the size of the formal model. However, the PAI approach allows us to keep some control and to obtain reasonably sized abstract models. Further work in the form of practical implementations and experiments is needed to decide whether this is indeed the case. The LOS represents the SOS via the generator of a DTMC. It describes the stepwise evolution of the state of a computation and does not provide a fixed-point semantics. Therefore, neither in the concrete nor in the abstract case can we guarantee that $\lim_{n\to\infty} (\mathbf{T}(P))^n$ or $\lim_{n\to\infty} (\mathbf{T}^\#(P))^n$ always exists. The analysis of a program P based on the abstract operator $\mathbf{T}^\#(P)$ is considerably simpler than one based on the concrete operator, but still not entirely trivial. Various properties of $\mathbf{T}^\#(P)$ can be extracted by iterative methods (e.g. computing $\lim_{n\to\infty} (\mathbf{T}^\#(P))^n$ or certain averages). As often in numerical computation, these methods converge only for $n \to \infty$, and any result obtained after a finite number of steps will only be an approximation. However, one can study stopping criteria which guarantee a certain quality of this approximation. The development or adaptation of iterative methods and the formulation of appropriate stopping criteria might be seen as the numerical analogue of widening and narrowing techniques in the classical setting.
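A minimal sketch of such an iterative method, with a naive tolerance-based stopping criterion standing in for the more refined criteria alluded to above:

import numpy as np

def limit_distribution(T, x0, tol=1e-12, max_steps=1_000_000):
    # iterate x -> x T until successive iterates differ by less than tol
    # (the existence of the limit is assumed, cf. the discussion above)
    x = np.asarray(x0, dtype=float)
    for _ in range(max_steps):
        y = x @ T
        if np.linalg.norm(y - x, 1) < tol:
            return y
        x = y
    raise RuntimeError("no (approximate) fix-point reached within max_steps")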
4.4 Classical vs. Probabilistic Abstract Interpretation
Classical abstract interpretation and probabilistic abstract interpretation provide "approximations" for different mathematical structures, namely partial orders vs. vector spaces. In order to illustrate and compare their features we therefore need a setting where the domain in question in some way naturally provides both structures. One such situation is the context of classical function interpolation or approximation. The set of real-valued functions on a real interval [a, b] obviously comes with a canonical partial order, namely the point-wise ordering, and at the same time is equipped with a vector space structure, where again addition and scalar multiplication are defined point-wise. Some care has to be taken in order to define an inner product, which we need to obtain a Hilbert space structure; e.g. one could consider only the square integrable functions $L^2([a,b])$. In order to avoid mathematical (e.g. measure-theoretic) details we simplify the situation by just considering the step functions on the interval [a, b]. For a (closed) real interval $[a,b] \subseteq \mathbb{R}$ we call the set of subintervals $[a_i, b_i]$ with $i = 1, \ldots, n$ the n-subdivision of [a, b] if $\bigcup_{i=1}^{n} [a_i, b_i] = [a,b]$ and $b_i - a_i = \frac{b-a}{n}$ for all $i = 1, \ldots, n$. We assume that the sub-intervals are enumerated in the obvious way, i.e. $a_i < b_i = a_{i+1} < b_{i+1}$ for all i, and in particular that $a = a_1$ and $b_n = b$.

Definition 3. The set of n-step functions $T_n([a,b])$ on [a, b] is the set of real-valued functions $f : [a,b] \to \mathbb{R}$ such that f is constant on each subinterval $(a_i, b_i)$ in the n-subdivision of [a, b]. We define a partial order on $T_n([a,b])$ in the obvious way for $f, g \in T_n([a,b])$:

$$f \sqsubseteq g \quad\text{iff}\quad f\Big(\frac{a_i + b_i}{2}\Big) \leq g\Big(\frac{a_i + b_i}{2}\Big), \quad\text{for all } 1 \leq i \leq n,$$
i.e. iff the value of f (which we obtain by evaluating it on the mid-point of $(a_i, b_i)$) on all subintervals $(a_i, b_i)$ is less than or equal to the value of g. It is also easy to see that $T_n([a,b])$ has a vector space structure isomorphic to $\mathbb{R}^n$ and thus also comes with an inner product. More concretely, we define the vector space operations $\cdot : \mathbb{R} \times T_n([a,b]) \to T_n([a,b])$ and $+ : T_n([a,b]) \times T_n([a,b]) \to T_n([a,b])$ pointwise as follows:

$$(\alpha \cdot f)(x) = \alpha f(x) \qquad (f + g)(x) = f(x) + g(x)$$

for all $\alpha \in \mathbb{R}$, $f, g \in T_n([a,b])$ and $x \in [a,b]$. The inner product is given by:

$$\langle f, g \rangle = \sum_{i=1}^{n} f\Big(\frac{a_i + b_i}{2}\Big)\, g\Big(\frac{a_i + b_i}{2}\Big).$$
In this setting we can now apply and compare both the classical and the quantitative version of abstract interpretation, as in the following example.
Example 10. Let us consider a step function f in $T_{16}$ (the concrete values of a and b don't really play a role in our setting), which can be depicted as a bar chart over [a, b] with values ranging between 0 and 10 [figure omitted]. We can also represent f by the vector in $\mathbb{R}^{16}$:

$$(5\ \ 5\ \ 6\ \ 7\ \ 8\ \ 4\ \ 3\ \ 2\ \ 8\ \ 6\ \ 6\ \ 7\ \ 9\ \ 8\ \ 8\ \ 7)$$
We then construct a series of abstractions which correspond to coarser and coarser sub-divisions of the interval [a, b], e.g. considering 8, 4, etc. subintervals instead of the original 16. These abstractions are from $T_{16}([a,b])$ to $T_8([a,b])$, $T_4([a,b])$, etc. and can be represented by 16 × 8, 16 × 4, etc. matrices. For example, the abstraction which joins two sub-intervals, corresponding to $\alpha_8 : T_{16}([a,b]) \to T_8([a,b])$, together with its Moore-Penrose pseudo-inverse, is represented by:

$$\mathbf{A}_8 = \begin{pmatrix} 1&0&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&0&1&0&0&0&0 \\ 0&0&0&0&1&0&0&0 \\ 0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&0&1&0 \\ 0&0&0&0&0&0&1&0 \\ 0&0&0&0&0&0&0&1 \\ 0&0&0&0&0&0&0&1 \end{pmatrix} \qquad \mathbf{G}_8 = \mathbf{A}_8^\dagger = \frac{1}{2}\,\mathbf{A}_8^T$$
With the help of $\mathbf{A}_j$, $j \in \{1, 2, 4, 8\}$, we can easily compute the abstraction of f as $f\mathbf{A}_j$ which, in order to compare it with the original f, we can then again concretise using $\mathbf{G}_j$, i.e. computing $f\mathbf{A}_j\mathbf{G}_j$. In a similar way we can also compute the over- and under-approximation of f in $T_i$ based on the above pointwise ordering and its reverse ordering. The result of these abstractions is depicted geometrically in Figure 1. The individual diagrams in this figure depict the original, i.e. concrete, step function $f \in T_{16}$ together with its approximations in $T_8$, $T_4$, etc. On the left-hand side the PAI abstractions show how coarser and coarser interval subdivisions result in a series of approximations which try to interpolate the given function as closely as possible, sometimes below, sometimes above the concrete values. The diagrams on the right-hand side depict the classical over- and under-approximations: in each case the function f is entirely below or above these approximations, i.e. we have safe but not necessarily close approximations. Additionally, one can also see from these figures not only that the PAI interpolation is in general closer to the original function than the classical abstractions (in fact it is the closest possible), but also that the PAI interpolation always lies between the classical over- and under-approximations. The vector space framework also allows us to judge the quality of an abstraction or approximation via the Euclidean distance between the concrete and abstract version of a function. We can compute the least square error as $\|f - f\mathbf{A}\mathbf{G}\|$. In our case we get, for example:

$$\|f - f\mathbf{A}_8\mathbf{G}_8\| = 3.5355 \qquad \|f - f\mathbf{A}_4\mathbf{G}_4\| = 5.3151$$
$$\|f - f\mathbf{A}_2\mathbf{G}_2\| = 5.9896 \qquad \|f - f\mathbf{A}_1\mathbf{G}_1\| = 7.6444$$

which illustrates, as expected, that the coarser our abstraction, the larger the error.
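These four error values can be reproduced in a few lines; a numpy sketch:

import numpy as np

f = np.array([5, 5, 6, 7, 8, 4, 3, 2, 8, 6, 6, 7, 9, 8, 8, 7], dtype=float)

def A(coarse, fine=16):
    # abstraction joining fine/coarse adjacent sub-intervals
    k = fine // coarse
    M = np.zeros((fine, coarse))
    for i in range(fine):
        M[i, i // k] = 1.0
    return M

for m in (8, 4, 2, 1):
    Am = A(m)
    Gm = np.linalg.pinv(Am)            # averages each block of sub-intervals
    print(m, np.linalg.norm(f - f @ Am @ Gm))
# prints 3.5355, 5.3151, 5.9896, 7.6444 (up to rounding), as above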
4.5 Examples
We conclude by discussing in detail how probabilistic abstraction allows us to analyse the properties of programs. In the first example we are going to present, the aim is to reduce the size (dimension) of the concrete semantics so as to allow for an immediate understanding of the results of a computation. The second example will look more closely at the efficiency of an analysis, i.e. how PAI
[Fig. 1. Average, Over- and Under-Approximation: approximations of f in T8, T4, T2 and T1, with Probabilistic Abstract Interpretation on the left and Classical Abstract Interpretation on the right.]
can be deployed in order to beat the combinatorial explosion or the curse of dimensionality. Example 11 (Monty Hall). We have already investigated the LOS semantics of the Monty Hall program in Example 5. We still have to analyse whether it is Ht or Hw that implements the better strategy. In principle, we can do this using the concrete semantics we constructed above. However, it is rather cumbersome
to work with "relatively large" 162 × 162 or 243 × 243 matrices, even when they are sparse, i.e. contain almost only zeros (in fact only about 1.2% of the entries in Ht and 0.7% of the entries in Hw are non-zero). If we want to analyse the final states, i.e. which of the two programs has a better chance of getting the right door, we need to start with an initial configuration and then iterate T(H) until we reach the final configuration. For our programs it is sufficient to indicate that we start in label 1, while the state is irrelevant as we initialise all three variables at the beginning of the program; we could take, for example, a state with d = o = g = 0. The vector or distribution which describes this initial configuration is a 162- or 243-dimensional vector. We can describe it in rather compact form as:

$$x_0 = (1\ 0\ 0) \otimes (1\ 0\ 0) \otimes (1\ 0\ 0) \otimes (1\ 0\ 0\ \ldots\ 0),$$

where the last factor is 6- or 9-dimensional, depending on whether we deal with Ht or Hw. This represents a point distribution over the 162 or 243 possible configurations. Assuming that our program terminates for all initial states, as is the case here, there exists a certain number of iterations t such that $x_0\mathbf{T}(H)^t = x_0\mathbf{T}(H)^{t+1}$, i.e. we will eventually reach a fix-point which gives us a distribution over configurations. In general, as in our case here, this will not be just a point distribution. Again we get vectors of dimension 162 or 243, respectively. For Ht and for Hw there are 12 configurations each with non-zero probability:

For Ht:
  x12 = 0.074074   x18 = 0.037037   x36 = 0.11111    x48 = 0.11111
  x72 = 0.11111    x78 = 0.037037   x90 = 0.074074   x96 = 0.11111
  x120 = 0.11111   x132 = 0.11111   x150 = 0.074074  x156 = 0.037037

For Hw:
  x18 = 0.11111    x27 = 0.11111    x54 = 0.037037   x72 = 0.074074
  x108 = 0.074074  x117 = 0.11111   x135 = 0.11111   x144 = 0.037037
  x180 = 0.037037  x198 = 0.074074  x225 = 0.11111   x234 = 0.11111
$\mathbf{A}_f$ to simplify the information contained in the terminal state. Regarding d and g we want to know everything, and thus use the trivial abstraction A = I, i.e. the identity. The result for Ht, with x the terminal configuration distribution, is:

$$x \cdot (\mathbf{I} \otimes \mathbf{I} \otimes \mathbf{A}_f \otimes \mathbf{A}_f) = (0.11\ \ 0.11\ \ 0.11\ \ 0.11\ \ 0.11\ \ 0.11\ \ 0.11\ \ 0.11\ \ 0.11)$$

and for Hw we get:

$$x \cdot (\mathbf{I} \otimes \mathbf{I} \otimes \mathbf{A}_f \otimes \mathbf{A}_f) = (0.22\ \ 0.04\ \ 0.07\ \ 0.07\ \ 0.22\ \ 0.04\ \ 0.04\ \ 0.07\ \ 0.22)$$

The nine coordinates of these vectors correspond to (d → 0, g → 0), (d → 0, g → 1), (d → 0, g → 2), (d → 1, g → 0), . . . , (d → 2, g → 2). This is in principle enough to conclude that Hw is the better strategy. However, we can go a step further and abstract not the values of d and g but their relation, i.e. whether they are equal or different. For this we need the abstraction:

$$\mathbf{A}_w = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}$$

where the first column corresponds to a winning situation (i.e. d and g are equal), and the second to unequal d and g. With this we get for Ht:

$$x \cdot (\mathbf{A}_w \otimes \mathbf{A}_f \otimes \mathbf{A}_f) = (0.33333\ \ 0.66667)$$
x · (Aw ⊗ Af ⊗ Af ) = 0.66667 0.33333
It is now obvious that Ht has just a 13 chance of winning, while Hw has a probability of picking the winning door.
2 3
This example illustrates how abstraction can be used in order to obtain useful information from a large collection of data, so to say, how to use abstractions to do statistics. We did not utilise PAI here to simplify the semantics itself but only the final results. We will now consider this issue in our second running example. Example 12 (Factorial). Classical abstraction allows us to determine the parity properties of the "double factorial" in Example 2. However, we cannot use it to justify our intuition that even the plain factorial itself almost always produces an even result. In order to do this, let us first consider the concrete semantics of our program using the following labelling:
var m : {0..2}; n : {0..2};
begin
  [m := 1]1;
  while [(n>1)]2 do
    [m := m*n]3;
    [n := n-1]4;
  od;
  [stop]5;
end

The flow of this program F is given as follows:

$$\mathcal{F}low(F) = \{(1, 1, 2),\ (2, 1, 3),\ (3, 1, 4),\ (4, 1, 2),\ (2, 1, 5),\ (5, 1, 5)\}$$

The operator T(F) is then constructed as

$$\mathbf{T}(F) = \mathbf{U}(\mathtt{m} \leftarrow 1) \otimes \mathbf{E}(1,2) + \mathbf{P}((\mathtt{n>1})) \otimes \mathbf{E}(2,3) + \mathbf{U}(\mathtt{m} \leftarrow \mathtt{m*n}) \otimes \mathbf{E}(3,4)$$
$$+\ \mathbf{U}(\mathtt{n} \leftarrow \mathtt{n-1}) \otimes \mathbf{E}(4,2) + \mathbf{P}((\mathtt{n<=1})) \otimes \mathbf{E}(2,5) + \mathbf{I} \otimes \mathbf{E}(5,5)$$

using the matrices $\mathbf{T}(\ell, \ell') = \mathbf{S}(\ell) \otimes \mathbf{E}(\ell, \ell')$ (where we indicate the then and else branches of the test at label 2 by $\mathbf{S}(2)$ and $\mathbf{S}(\underline{2})$, respectively):

$$\mathbf{S}(1) = \begin{pmatrix} 0&0&0&1&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&1&0&0&0 \\ 0&0&0&1&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&1&0&0&0 \\ 0&0&0&1&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&1&0&0&0 \end{pmatrix} \qquad \mathbf{S}(2) = \begin{pmatrix} 0&0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0&1 \end{pmatrix}$$

$$\mathbf{S}(3) = \begin{pmatrix} 1&0&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&0&0&0&1 \\ 1&0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&1&0 \\ 0&0&0&0&0&0&0&0&0 \end{pmatrix} \qquad \mathbf{S}(4) = \begin{pmatrix} 0&0&0&0&0&0&0&0&0 \\ 1&0&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&0&0&1&0 \end{pmatrix}$$
$$\mathbf{S}(\underline{2}) = \begin{pmatrix} 1&0&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&0&0&0&0 \\ 0&0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&0&0&1&0 \\ 0&0&0&0&0&0&0&0&0 \end{pmatrix} \qquad \mathbf{S}(5) = \begin{pmatrix} 1&0&0&0&0&0&0&0&0 \\ 0&1&0&0&0&0&0&0&0 \\ 0&0&1&0&0&0&0&0&0 \\ 0&0&0&1&0&0&0&0&0 \\ 0&0&0&0&1&0&0&0&0 \\ 0&0&0&0&0&1&0&0&0 \\ 0&0&0&0&0&0&1&0&0 \\ 0&0&0&0&0&0&0&1&0 \\ 0&0&0&0&0&0&0&0&1 \end{pmatrix}$$

Note that for the updates at labels 3 and 4 we have "empty rows", i.e. rows with no non-zero entries. These correspond to over- and under-flows, as we are dealing only with finite ranges of values in Z. We could clarify the situation in various ways, e.g. by introducing an additional value ⊥ for undefined (concrete) values of variables, or by introducing an error configuration, etc. In the analysis we present in this example these over- and under-flows do not play any relevant role and we therefore leave things as they are; one should however keep in mind that this violates the observation that T(F) is a stochastic matrix. The full operator representing the LOS semantics of this (small) factorial program is a (3 · 3 · 5) × (3 · 3 · 5) = 45 × 45 matrix.
(This 45 × 45 matrix T(F) is extremely sparse: each row contains at most a single 1, the empty rows being exactly the over- and under-flow rows discussed above, and its non-zero pattern follows directly from the six summands. We therefore do not reproduce it legibly here.)
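All the S(ℓ) matrices follow a single recipe and need not be written by hand; the following sketch (our own illustration, not the authors' ocaml/octave generator) derives them from the statements themselves:

import numpy as np

VALS = range(3)
STATES = [(m, n) for m in VALS for n in VALS]    # state index = 3*m + n

def U(update):
    # concrete update operator; rows whose result falls outside the range
    # stay empty, which models the over-/under-flows discussed above
    M = np.zeros((9, 9))
    for i, (m, n) in enumerate(STATES):
        t = update(m, n)
        if t in STATES:
            M[i, STATES.index(t)] = 1.0
    return M

def P(test):
    # diagonal projection selecting the states satisfying the test
    return np.diag([1.0 if test(m, n) else 0.0 for (m, n) in STATES])

S1 = U(lambda m, n: (1, n))          # m := 1
S3 = U(lambda m, n: (m * n, n))      # m := m*n  (empty row for m = n = 2)
S4 = U(lambda m, n: (m, n - 1))      # n := n-1  (empty rows for n = 0)
S2_then, S2_else = P(lambda m, n: n > 1), P(lambda m, n: n <= 1)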
We can now construct an abstract version T#(F) of T(F) by recording only the parity of m, as even or odd. We will abstract neither n nor the labels defining the current configuration during the execution. We thus get
$$\mathbf{T}^\#(F) = (\mathbf{A}_p \otimes \mathbf{I} \otimes \mathbf{I})^\dagger\, \mathbf{T}(F)\, (\mathbf{A}_p \otimes \mathbf{I} \otimes \mathbf{I}),$$

a (2 · 3 · 5) × (2 · 3 · 5) = 30 × 30 matrix. Though this abstract semantics does have some interesting properties, it appears to be only a minor improvement with regard to the concrete semantics: we managed to reduce the dimension only from 45 to 30. However, the simplification becomes substantially more dramatic once we increase the possible values of m and n, and combinatorial explosion really takes hold. If we allow n to take values between 0 and n, then we must allow for m values between 0 and n!. Concrete values of the dimensions of T(F) and T#(F) are given in the following table:

  n    dim(T(F))    dim(T#(F))
  2    45           30
  3    140          40
  4    625          50
  5    3630         60
  6    25235        70
  7    201640       80
  8    1814445      90
  9    18144050     100

The problem is that the size of T(F) explodes so quickly that it is impossible to simulate it for values of n much larger than 5 on a normal PC. If we want to analyse the abstract semantics instead, things remain much smaller. Importantly, we can construct the abstract semantics in the same way as the concrete one, just using "smaller" matrices:

$$\mathbf{T}^\#(F) = \mathbf{U}^\#(\mathtt{m} \leftarrow 1) \otimes \mathbf{E}(1,2) + \mathbf{P}^\#((\mathtt{n>1})) \otimes \mathbf{E}(2,3) + \mathbf{U}^\#(\mathtt{m} \leftarrow \mathtt{m*n}) \otimes \mathbf{E}(3,4)$$
$$+\ \mathbf{U}^\#(\mathtt{n} \leftarrow \mathtt{n-1}) \otimes \mathbf{E}(4,2) + \mathbf{P}^\#((\mathtt{n<=1})) \otimes \mathbf{E}(2,5) + \mathbf{I}^\# \otimes \mathbf{E}(5,5)$$

Fortunately, most of the operators $\mathbf{T}^\#(\ell, \ell')$ are very easy to construct. These matrices are of size 2(n+1)·5 × 2(n+1)·5 = 10(n+1) × 10(n+1) if we include the control transfer, and only 2(n+1) × 2(n+1) if we deal only with the update of the current state. In principle we could obtain the $\mathbf{T}^\#(\ell, \ell')$ from their concrete versions $\mathbf{T}(\ell, \ell')$ using our abstraction and its Moore-Penrose pseudo-inverse as concretisation. However, by considering the matrices $\mathbf{T}^\#(\ell, \ell')$ in detail it is possible to come up with an even more direct construction. Except for label 3, only either m or n (but not both) is involved in each statement: we can thus express the $\mathbf{T}^\#(\ell, \ell')$'s as tensor products of a 2 × 2 and an (n+1) × (n+1) matrix.
With the parity basis ordered (even, odd) in the first factor and the values of n in the second, we have:

$$\mathbf{U}^\#(\mathtt{m} \leftarrow 1) = \begin{pmatrix} 0 & 1 \\ 0 & 1 \end{pmatrix} \otimes I_{n+1} \qquad \mathbf{U}^\#(\mathtt{n} \leftarrow \mathtt{n-1}) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \otimes \begin{pmatrix} 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}$$

$$\mathbf{P}^\#((\mathtt{n>1})) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \otimes \mathrm{diag}(0, 0, 1, 1, \ldots, 1) \qquad \mathbf{P}^\#((\mathtt{n<=1})) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \otimes \mathrm{diag}(1, 1, 0, 0, \ldots, 0)$$

Finally, we just need to construct the update for label 3. It is easy to see that for even m the result is again even, while for odd m the parity of n determines the parity of the resulting m. We can thus write this update as:

$$\mathbf{U}^\#(\mathtt{m} \leftarrow \mathtt{m*n}) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \otimes I_{n+1} + \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \otimes \mathrm{diag}(0, 1, 0, 1, \ldots) + \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \otimes \mathrm{diag}(1, 0, 1, 0, \ldots)$$
With this we can now approximate the probabilistic properties of the factorial function. In particular, if we look at the terminal configurations starting with the initial abstract configuration

$$x_0 = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} \end{pmatrix} \otimes \begin{pmatrix} \frac{1}{n+1} & \ldots & \frac{1}{n+1} \end{pmatrix} \otimes \begin{pmatrix} 1 & 0 & 0 & 0 & 0 \end{pmatrix}$$

which corresponds to a uniform distribution over all possible abstract values of our variables m and n (in fact, the part describing m could be any other distribution), then we get as final probabilistic configuration:

$$x = \begin{pmatrix} \frac{n-1}{n+1} & \frac{2}{n+1} \end{pmatrix} \otimes \begin{pmatrix} \frac{1}{n+1} & \frac{n}{n+1} & 0 & \ldots & 0 \end{pmatrix} \otimes \begin{pmatrix} 0 & 0 & 0 & 0 & 1 \end{pmatrix}$$

This expresses the fact that indeed in most cases (with probability $\frac{n-1}{n+1}$) we get an even factorial; in only two cases out of n+1 (for 0 and 1) do we get an odd result (namely 1). The final value of n is nearly always 1, except when we start with 0, and we always reach the final statement with label 5. If we start with the abstract initial state $x_0$ above and execute $\mathbf{T}^\#(F)$ until we reach a fix-point x, we can use (as before in the Monty Hall example) abstractions not to simplify the semantics but to extract the relevant information. Concretely we can use:

$$\mathbf{A} = \mathbf{I} \otimes \mathbf{A}_f \otimes \mathbf{A}_f$$

i.e. once we have reached the terminal configuration (of the abstract execution) we ignore the value of n and the final label, and concentrate only on the abstract, i.e. parity, value of m. Concretely we have to compute:

$$\Big( \lim_{i \to \infty} x_0 \cdot (\mathbf{T}^\#(F))^i \Big) \cdot \mathbf{A}$$
Note that we always reach the fix-point after a finite number of iterations (namely at most n), so this can be computed in finite time. The concrete probabilities we get for various n are:

  n        even       odd
  10       0.81818    0.18182
  100      0.98019    0.019802
  1000     0.99800    0.0019980
  10000    0.99980    0.00019998
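The whole abstract analysis fits in a few lines of numpy; the following sketch (again our own illustration, not the authors' tool) builds T#(F) from the operators above and reproduces the first row of the table for n = 10:

import numpy as np

def E(i, j, k=5):
    M = np.zeros((k, k)); M[i - 1, j - 1] = 1.0
    return M

def abstract_factorial(N):
    # T#(F) for n in {0..N}; ordering parity(m) (x) value(n) (x) label
    I2, In = np.eye(2), np.eye(N + 1)
    to_odd = np.array([[0.0, 1.0], [0.0, 1.0]])        # U#(m := 1)
    n_odd = np.diag([float(v % 2) for v in range(N + 1)])
    n_even = In - n_odd
    shift = np.diag(np.ones(N), -1)                    # n := n-1, row n=0 empty
    gt1 = np.diag([float(v > 1) for v in range(N + 1)])
    mul = (np.kron([[1.0, 0.0], [0.0, 0.0]], In)       # even m stays even
           + np.kron([[0.0, 0.0], [0.0, 1.0]], n_odd)  # odd m, odd n -> odd
           + np.kron([[0.0, 0.0], [1.0, 0.0]], n_even))  # odd m, even n -> even
    return (np.kron(np.kron(to_odd, In), E(1, 2))
            + np.kron(np.kron(I2, gt1), E(2, 3))
            + np.kron(mul, E(3, 4))
            + np.kron(np.kron(I2, shift), E(4, 2))
            + np.kron(np.kron(I2, In - gt1), E(2, 5))
            + np.kron(np.kron(I2, In), E(5, 5)))

N = 10
T = abstract_factorial(N)
x = np.kron(np.kron([0.5, 0.5], np.full(N + 1, 1.0 / (N + 1))),
            [1.0, 0.0, 0.0, 0.0, 0.0])                 # start at label 1
for _ in range(5 * N + 10):                            # enough steps to terminate
    x = x @ T
parity = x.reshape(2, N + 1, 5).sum(axis=(1, 2))
print(parity)          # [0.81818 0.18182] = ((N-1)/(N+1), 2/(N+1))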
We see that we can easily compute the final distribution on {even, odd} for quite large n, despite the fact that, as noted, it is virtually impossible to compute the explicit representation of the concrete semantics T(F) already for n = 6. Acknowledgements. These lecture notes are partly based on previously published material in [18] and [24]. The example matrices in these notes were generated using a parser written in ocaml and using octave [28].
References

1. Di Pierro, A., Wiklicky, H.: Concurrent Constraint Programming: Towards Probabilistic Abstract Interpretation. In: PPDP 2000, pp. 127–138 (2000)
2. Di Pierro, A., Hankin, C., Wiklicky, H.: Continuous-time probabilistic KLAIM. In: Focardi, R., Zavattaro, G. (eds.) SecCo 2004 – CONCUR Workshop on Security Issues in Coordination Models, Languages and Systems. ENTCS, vol. 128(5). Elsevier, Amsterdam (2005)
3. Stirzaker, D.: Probability and Random Variables – A Beginners Guide. Cambridge University Press, Cambridge (1999)
4. Greub, W.: Linear Algebra. Grundlehren der mathematischen Wissenschaften, 3rd edn., vol. 97. Springer, New York (1967)
5. Conway, J.: A Course in Functional Analysis. Graduate Texts in Mathematics, 2nd edn., vol. 96. Springer, New York (1990)
6. Seneta, E.: Non-negative Matrices and Markov Chains, 2nd edn. Springer, Heidelberg (1981)
7. Grimmett, G., Stirzaker, D.: Probability and Random Processes, 2nd edn. Clarendon Press, Oxford (1992)
8. Tijms, H.: Stochastic Models – An Algorithmic Approach. John Wiley & Sons, Chichester (1994)
9. Grinstead, C., Snell, J.: Introduction to Probability, 2nd revised edn. American Mathematical Society, Providence (1997)
10. Norris, J.: Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge (1997)
11. Roman, S.: Advanced Linear Algebra, 2nd edn. Springer, Heidelberg (2005)
12. Kadison, R., Ringrose, J.: Fundamentals of the Theory of Operator Algebras: Volume I – Elementary Theory. Graduate Studies in Mathematics, vol. 15. American Mathematical Society, Providence (1997); reprint of the Academic Press edition (1983)
13. Filter, W., Weber, K.: Integration Theory. Chapman and Hall, Boca Raton (1997)
14. Nielson, F., Nielson, H.R.: Flow logics and operational semantics. ENTCS 10 (1998)
15. Nielson, F., Nielson, H.R., Hankin, C.: Principles of Program Analysis. Springer, Heidelberg (1999)
16. Di Pierro, A., Hankin, C., Wiklicky, H.: Measuring the confinement of probabilistic systems. Theoretical Computer Science 340(1), 3–56 (2005)
17. Cousot, P., Cousot, R.: Systematic design of program transformation frameworks by abstract interpretation. In: POPL 2002, pp. 178–190 (2002)
18. Di Pierro, A., Hankin, C., Wiklicky, H.: A systematic approach to probabilistic pointer analysis. In: Shao, Z. (ed.) APLAS 2007. LNCS, vol. 4807, pp. 335–350. Springer, Heidelberg (2007)
19. Cousot, P., Cousot, R.: Abstract Interpretation and Applications to Logic Programs. Journal of Logic Programming 13(2–3), 103–180 (1992)
20. Abramsky, S., Hankin, C. (eds.): Abstract Interpretation of Declarative Languages. Ellis-Horwood, Chichester (1987)
21. Davey, B., Priestley, H.: Introduction to Lattices and Order. Cambridge University Press, Cambridge (1990)
22. Cousot, P., Cousot, R.: Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints. In: POPL 1977, pp. 238–252 (1977)
23. Cousot, P., Cousot, R.: Systematic Design of Program Analysis Frameworks. In: Proceedings of POPL 1979, San Antonio, Texas, pp. 269–282 (1979)
24. Di Pierro, A., Hankin, C., Wiklicky, H.: Abstract interpretation for worst and average case analysis. In: Reps, T., Sagiv, M., Bauer, J. (eds.) Wilhelm Festschrift. LNCS, vol. 4444, pp. 160–174. Springer, Heidelberg (2007)
25. Di Pierro, A., Wiklicky, H.: Measuring the precision of abstract interpretations. In: Lau, K.-K. (ed.) LOPSTR 2000. LNCS, vol. 2042, pp. 147–164. Springer, Heidelberg (2001)
26. Campbell, S., Meyer, D.: Generalized Inverse of Linear Transformations. Constable and Company, London (1979)
27. Ben-Israel, A., Greville, T.: Generalised Inverses, 2nd edn. Springer, Heidelberg (2003)
28. Eaton, J.: GNU Octave Manual (2002), http://www.octave.org
Measurement-Based and Universal Blind Quantum Computation

Anne Broadbent¹, Joseph Fitzsimons¹,², and Elham Kashefi³

¹ Institute for Quantum Computing, University of Waterloo, Canada
² Materials Department, University of Oxford, United Kingdom
³ School of Informatics, University of Edinburgh, United Kingdom
Abstract. Measurement-based quantum computation (MBQC) is a novel approach to quantum computation where the notion of measurement is the main driving force of computation. This is in contrast with the more traditional circuit model, which is based on unitary operations. We review here the mathematical model underlying MBQC and the first quantum cryptographic protocol designed using the unique features of MBQC.
1 Introduction
Traditionally, the main framework used to explore quantum computation has been the circuit model [Deu89], based on unitary evolution. Though this model is very useful for complexity analysis [BV97], there are other models such as quantum Turing machines [Deu85] and quantum cellular automata [Wat95, vD96, DS96, SW04]. Although they have all been proved equivalent from the point of view of expressive power, there is no agreement on what is the canonical model for exposing the key aspects of quantum computation. Recently, distinctly different models have emerged, namely adiabatic and topological quantum computing. Both suggest different architectures and fault-tolerant schemes, and provide specific approaches to new applications and algorithms, as well as specific means to compare classical and quantum computation. Another family of models, collectively called measurement-based quantum computing (MBQC), has also received wide attention. MBQC is very different from the circuit model, where measurement is done only at the end to extract classical output. In measurement-based quantum computing the main operation to manipulate information and control computation is measurement [GC99, RB01, RBB03, Nie03]. This is surprising because measurement creates indeterminacy, yet it is used to express deterministic computation defined by a unitary evolution. More precisely, a computation consists of a phase in which a collection of qubits are set up in a standard entangled state. Then measurements are applied to individual qubits and the outcomes of the measurements may be used to determine further adaptive measurements. Finally, again depending on measurement outcomes, local adaptive unitary operators, called corrections, are applied to some qubits; this allows the elimination of the indeterminacy introduced
by measurements. Conceptually MBQC highlights the role of entanglement and separates the quantum and classical aspects of computation; thus it clarifies, in particular, the interplay between classical control and the quantum evolution process. The first structural feature of MBQC is a key factorisation property, namely that entanglement can be done in the initial phase of the computation, followed by local measurements. This can be reduced to confluence properties of a simple algebraic rewriting system, which is captured by the Measurement Calculus [DKP07]; one can think of it as an "assembly language" for MBQC with a notation for such classically correlated sequences of entanglements, measurements, and local corrections. Computations are organised in patterns. Here we use the word "pattern" rather than "program" as this corresponds to the commonly used terminology in the physics literature. The Measurement Calculus consists of local equations over patterns that exploit some special algebraic properties of the entanglement, measurement and correction operators. More precisely, it uses the fact that 1-qubit measurements are closed under conjugation by Pauli operators and that the entanglement command belongs to the normaliser of the Pauli group. This calculus is sound in that it preserves the interpretation of patterns. Most importantly, one can derive from it a simple algorithm (called standardisation) by which any general pattern can be put into a standard form where entanglement is done first, then measurements, then corrections. The consequences of the existence of such a procedure are far-reaching. Since entangling comes first, one can prepare the entire entangled state needed during the computation right at the start: one never has to perform "on the fly" entanglement. Furthermore, the rewriting of a pattern into standard form reveals parallelism in the target computation. In a general pattern, one is forced to compute sequentially and to strictly obey the command sequence, whereas, after standardisation, the dependency structure is relaxed, resulting in lower computational depth complexity [BK09]. The existence of a standard form for any pattern also has interesting corollaries beyond implementation and complexity matters: it follows that patterns using no dependencies, or using only the restricted class of Pauli measurements, can only realise a unitary belonging to the Clifford group, and hence can be efficiently simulated by a classical computer [DKP07]. The second structural feature of MBQC is captured by the notion of Flow [DK06]. Although quantum measurements are inherently non-deterministic, one can sometimes ensure the global determinism of the computation using suitable dependencies between measurements. Flow asserts that, under a graph-theoretic condition on the entanglement underlying a given computation, it is possible to construct such dependencies. This is significant progress in the direct understanding of the specifics of measurement-based information processing. Building on this criterion, and on the well-known stabiliser formalism, a full characterisation of determinism has been obtained [BKMP07]. Having presented the rigorous mathematical model underlying MBQC, we then demonstrate that this model suggests new techniques for designing quantum protocols. We present a protocol, called Universal Blind Quantum Computation
(UBQC), which allows a client to have a server carry out a quantum computation for her such that the client's inputs, outputs and computation remain perfectly private, and where she does not require any quantum computational power or memory. The client only needs to be able to prepare single qubits randomly chosen from a finite set and send them to the server, who has the balance of the required quantum computational resources. The UBQC protocol is the first universal scheme which detects a cheating server, as well as the first protocol which does not require any quantum computation whatsoever on the client's side. The novelty of the UBQC protocol lies in using the unique features of MBQC, which allow one to clearly distinguish between the quantum and classical aspects of a quantum computation.
2 Preliminaries
We give a brief summary of quantum mechanics and quantum computing. We develop some of the algebra, define notation, and prove a couple of relations which are used in this chapter. The reader will find the expository book of Nielsen and Chuang [NC00] useful for quantum computation, or the excellent book by Peres [Per95] for general background on quantum mechanics. The vector spaces that arise in quantum mechanics are Hilbert spaces and are thus usually written $\mathcal{H}$; that is, they have an inner product, usually written $\langle u \mid v \rangle$, where u and v are vectors. Following Dirac, it is customary to call elements of $\mathcal{H}$ kets and write them in the form $|u\rangle$, or with whatever symbol is appropriate inside the half-bracket. The dual vectors are called bras and are written $\langle v|$; the pairing can thus naturally be identified, conceptually and notationally, with the inner product. A hermitian operator A is one such that $A = A^\dagger$ and a unitary operator U is one such that $U^{-1} = U^\dagger$. A projection P is a linear operator such that $P^2 = P$ and $P = P^\dagger$. A projection operator can be identified with a subspace, namely its range. The eigenvalues of a hermitian operator are always real. Suppose U is a unitary and P a projection; then $UPU^\dagger$ is also a projection. The spectral theorem for hermitian operators states that if M is a hermitian operator, $\lambda_i$ are its eigenvalues and $P_i$ are the projection operators onto the corresponding eigenspaces, then one can write

$$M = \sum_i \lambda_i P_i.$$
If we have $|i\rangle$ as the normalised eigenvectors for the eigenvalues $\lambda_i$, then we can write this in Dirac notation as:

$$M = \sum_i \lambda_i |i\rangle\langle i|.$$
Finally we need a method to combine Hilbert spaces. Given two Hilbert spaces $\mathcal{H}$ with basis vectors $\{|a_i\rangle \mid 1 \leq i \leq n\}$ and $\mathcal{H}'$ with basis $\{|b_j\rangle \mid 1 \leq j \leq m\}$, we define the tensor product, written $\mathcal{H} \otimes \mathcal{H}'$, as the vector space of dimension $n \cdot m$ with basis $|a_i\rangle \otimes |b_j\rangle$. In practice we almost never write the symbol ⊗ between the vectors,
and in the Dirac notation this is almost always omitted: one writes, for example, |uv⟩ instead of |u⟩ ⊗ |v⟩. The important point is that there are vectors that cannot be written as the tensor product of vectors. This means that given a general element of H ⊗ H′ one cannot produce elements of H and H′; this is very different from the cartesian product of sets. This is the mathematical manifestation of entanglement. A very important function on square matrices is the trace. The usual trace – i.e. the sum of the diagonal entries – is basis independent and is actually equal to the sum of the eigenvalues, counted with multiplicity. The trace of A is denoted tr(A) and satisfies the cyclicity property tr(AB) = tr(BA); applying this repeatedly one gets tr(A_1 . . . A_n) = tr(A_σ(1) . . . A_σ(n)) where σ is a cyclic permutation. The explicit formula for the trace of A : V → V is tr(A) = Σ_i ⟨i|A|i⟩ where |i⟩ is a basis for V. In quantum mechanics, one often needs to compute a partial trace. Consider a linear map L : V ⊗ W → V ⊗ W. Suppose that |v_i⟩ is a basis for V and |w_i⟩ is a basis for W; then |v_i w_j⟩ is a basis for V ⊗ W. Now we can define the partial trace over V as the map tr_V(A) : W → W given by
tr_V(A) = Σ_i ⟨v_i|A|v_i⟩.
This corresponds to removing the V dependency; often we use the phrase "tracing out the V component" to refer to this operation. We can now state the basic facts of quantum mechanics; we will not discuss the experimental basis for this framework, though the interested reader is referred to [NC00] for further discussion. The key aspects of quantum mechanics are:
– the states of a quantum system form a Hilbert space,
– when two quantum systems are combined, the state space of the composite system is obtained as the tensor product of the state spaces of the individual systems,
– the evolution of a quantum system is given by a unitary operator, and
– the effect of a measurement is in general indeterminate.
The first says that one can form superpositions of the states. This is one of the most striking features of quantum mechanics. Thus states are not completely distinct as they are in classical systems. The inner product measures the extent to which states are distinct. The fact that systems are combined by tensor product implies that there are states of composite systems that cannot be decomposed into individual pieces. This is the phenomenon of entanglement or non-locality. Measurement is what gives quantum mechanics its indeterminate character. The usual case, called projective measurement, is when the quantity being measured is described by a hermitian operator M. The possible outcomes are the eigenvalues of M. If M is an observable (hermitian operator) with eigenvalues λ_i and eigenvectors |φ_i⟩, and we have a generic state |ψ⟩ = Σ_i c_i |φ_i⟩, then the probabilities and expectation values of the measurement outcomes are given by:
– Prob(λ_i | |ψ⟩) = |c_i|²
– E[M | |ψ⟩] = Σ_i |c_i|² λ_i = Σ_i c_i c̄_i ⟨φ_i|M|φ_i⟩ = ⟨ψ|M|ψ⟩.
It is important to note that the effect of the measurement is that the projection operator P_i is applied when the result λ_i is observed. The operator M does not describe the effect of the measurement. Quantum computation is carried out with qubits, the quantum analogues of bits. Just as a bit has two possible values, a qubit is a two-dimensional complex Hilbert space; in other words it is (isomorphic to) the two-dimensional complex vector space C². Generally, one works with a preferred basis; physically this corresponds to two distinguishable states, like "spin up" and "spin down". One writes |0⟩ and |1⟩ for this canonical basis, so that any vector |ψ⟩ can be written as α|0⟩ + β|1⟩ with α, β in C. Furthermore, C² can be turned into a Hilbert space by defining the inner product between two vectors |ψ⟩ = α|0⟩ + β|1⟩ and |ψ′⟩ = α′|0⟩ + β′|1⟩ as follows:
⟨ψ | ψ′⟩ := ᾱα′ + β̄β′
where the bar denotes complex conjugation. One then obtains the norm of a vector as:
‖ψ‖ := ⟨ψ | ψ⟩^{1/2} = (ᾱα + β̄β)^{1/2}.
Given V a finite set, one writes H_V for the Hilbert space ⊗_{u∈V} C²; the notation means an n-fold tensor product of C² where n is the size of V. A vector in H_V is said to be decomposable if it can be written as ⊗_{u∈V} ψ_u for some ψ_u ∈ C². Decomposable vectors can be represented by a map from V to C², and we will use both notations depending on which is more convenient. As we have noted before, there are some vectors that are not decomposable. As in the case of C², there is a canonical basis for H_V, sometimes also called the computational basis, containing the decomposable vectors φ such that for all v ∈ V, φ(v) = |0⟩ or φ(v) = |1⟩. The inner product on H_V, according to the general definition given above, is defined on decomposable vectors φ, ψ as:
⟨φ | ψ⟩ := Π_{v∈V} ⟨φ(v) | ψ(v)⟩.
Note that all vectors in the computational basis are orthogonal and of norm 1. The vectors of norm 1 are usually called unit vectors; we always assume that states are described by unit vectors to ensure that the probabilities of distinct measurement outcomes sum to unity. Here are some common states that arise in quantum computation:
|0⟩ = |↑⟩ = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, |1⟩ = |↓⟩ = \begin{pmatrix} 0 \\ 1 \end{pmatrix}, |+⟩ = (1/√2) \begin{pmatrix} 1 \\ 1 \end{pmatrix}, |−⟩ = (1/√2) \begin{pmatrix} 1 \\ −1 \end{pmatrix}.
It is easy to see that a linear operator is unitary if it preserves the inner product and hence the norm. Thus unitaries can be viewed as maps from quantum states to quantum states.
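To make the measurement rule above concrete, here is a minimal numpy sketch (the observable and state are illustrative choices of ours, not taken from the text) that computes the outcome probabilities |c_i|² and checks the two expressions for the expectation value against each other:

```python
import numpy as np

# Observable M = Z, with eigenvalues +1, -1 and eigenvectors |0>, |1>.
Z = np.array([[1, 0], [0, -1]], dtype=complex)
eigvals, eigvecs = np.linalg.eigh(Z)

# A generic unit state |psi> = alpha|0> + beta|1>.
psi = np.array([3 / 5, 4j / 5], dtype=complex)

# Born rule: Prob(lambda_i) = |<phi_i|psi>|^2.
probs = [abs(eigvecs[:, i].conj() @ psi) ** 2 for i in range(2)]

# Expectation value two ways: sum_i |c_i|^2 lambda_i and <psi|M|psi>.
expect_sum = sum(p * l for p, l in zip(probs, eigvals))
expect_direct = (psi.conj() @ Z @ psi).real
assert np.isclose(expect_sum, expect_direct)
print(probs, expect_direct)
```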
Some particularly useful unitaries are the Pauli operators, given by the following matrices in the canonical basis of C²:
X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, Y = \begin{pmatrix} 0 & −i \\ i & 0 \end{pmatrix}, Z = \begin{pmatrix} 1 & 0 \\ 0 & −1 \end{pmatrix}.
We note that all these operators are involutive and self-adjoint, and therefore unitary. All these matrices have determinant −1 and trace 0. Some basic algebra of these matrices is given below. First, they all square to the identity:
X² = Y² = Z² = I.
The Pauli operators do not commute and we have the following relations:
XY = iZ, YX = −iZ, [X, Y] = 2iZ, {X, Y} = 0
ZX = iY, XZ = −iY, [Z, X] = 2iY, {Z, X} = 0
YZ = iX, ZY = −iX, [Y, Z] = 2iX, {Y, Z} = 0
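As a quick sanity check, the following small numpy sketch (our own illustration, not part of the original text) verifies the algebra just listed:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

for P in (X, Y, Z):
    assert np.allclose(P @ P, I2)                    # each squares to I
    assert np.isclose(np.linalg.det(P), -1)          # determinant -1
    assert np.isclose(np.trace(P), 0)                # trace 0

assert np.allclose(X @ Y, 1j * Z) and np.allclose(Y @ X, -1j * Z)
assert np.allclose(X @ Y - Y @ X, 2j * Z)            # [X, Y] = 2iZ
assert np.allclose(X @ Y + Y @ X, np.zeros((2, 2)))  # {X, Y} = 0
```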
Definition 1. Define the Pauli group, P_n, as the group consisting of tensor products of I, X, Y, and Z on n qubits, with an overall phase of ±1 or ±i.
A very important related group is called the Clifford group.
Definition 2. The Clifford group, C_n, is the group of unitary operators that leave the Pauli group invariant under conjugation, i.e. it is the normaliser of the Pauli group viewed as a subgroup of the unitary group.
The Clifford group on n qubits can be generated by the Hadamard transform, the controlled-X (CNOT) or controlled-Z (∧Z), and the single-qubit phase rotation:
H = (1/√2) \begin{pmatrix} 1 & 1 \\ 1 & −1 \end{pmatrix}, CNOT = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, ∧Z = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & −1 \end{pmatrix}, P = \begin{pmatrix} 1 & 0 \\ 0 & i \end{pmatrix}.
In part the importance of the Clifford group for quantum computation is due to the fact that any computation consisting of only Clifford operations on the computational basis followed by final Pauli measurements can be efficiently simulated by a classical computer, a result known as the Gottesman–Knill theorem [Got97, NC00]. Furthermore, the Clifford group exhibits many of the key features of quantum mechanics, including the ability to produce superpositions and entanglement, making this result particularly surprising.
In order to capture the notion of partial information about quantum systems one uses density matrices. Before we describe density matrices we review some linear algebra in the bra-ket notation. Given a ket |ψ⟩, the notation |ψ⟩⟨ψ| denotes the projection operator onto the one-dimensional subspace spanned by |ψ⟩. If |ψ_i⟩ is an orthonormal basis for H, the identity matrix is written Σ_i |ψ_i⟩⟨ψ_i|. If Q is a linear operator with eigenvalues q_i and eigenvectors |q_i⟩, which form an orthonormal basis for H, we can represent Q as Σ_i q_i |q_i⟩⟨q_i|. A state |ψ⟩ in H is
called a pure state. If a and b are distinct eigenvalues of some observable A with corresponding eigenvectors |a⟩ and |b⟩, it is perfectly possible to prepare a state of the form (1/√2)(|a⟩ + |b⟩). A measurement of A on such a state will yield either a or b, each with probability 1/2. However, it is also possible that a mixture is prepared. That is to say, instead of a quantum superposition a classical stochastic mixture is prepared. In order to describe these we will use density matrices. For a system in a pure state |ψ⟩, the density matrix is just the projection operator |ψ⟩⟨ψ|. What if the state is not known completely? Suppose that we only know that a system is in one of several possible states |ψ_1⟩, . . . , |ψ_k⟩ with probabilities p_1, . . . , p_k respectively. We define the density matrix for such a state to be
ρ = Σ_{i=1}^{k} p_i |ψ_i⟩⟨ψ_i|.
The same formulas apply for the probability of observing a value q_i, i.e. Tr(P_i ρ), and for the expectation value of Q, i.e. Tr(Qρ). One can check directly that a density matrix has the following two properties.
Proposition 3. An operator ρ on H is a valid density matrix if and only if
– ρ has trace 1 and
– ρ is a positive operator, which means that it has only non-negative eigenvalues or, equivalently, that for any x ∈ H we have ⟨x|ρ|x⟩ ≥ 0.
Furthermore, if ρ is a density operator, Tr(ρ²) ≤ 1 with equality if and only if ρ is a pure state (i.e. a projection operator).
The axioms of quantum mechanics are easily stated in the language of density matrices. For example, if evolution from time t_1 to time t_2 is described by the unitary transformation U and ρ is the density matrix for time t_1, then the evolved density matrix ρ′ for time t_2 is given by the formula ρ′ = UρU†. Similarly, one can describe measurements represented by projective operators in terms of density matrices [NC00, Pre98]. Thus if a projector P acts on a state |ψ⟩ then the result is P|ψ⟩; the resulting transformation of density matrices is |ψ⟩⟨ψ| → P|ψ⟩⟨ψ|P. For a general density matrix ρ we have ρ → PρP; note that since P is self-adjoint we do not have to write P†. What are the legitimate "physical" transformations on density matrices? The legitimate transformations obviously take density matrices to density matrices. They have to be positive maps considered as maps between the appropriate ordered vector spaces, namely the vector spaces of linear operators on H, the Hilbert space of pure states. Unfortunately the tensor product of two positive maps is not positive in general. The remedy is to require the appropriate condition by fiat.
Definition 4. A completely positive map K is a positive map such that for every identity map I_n : Cⁿ → Cⁿ the tensor product K ⊗ I_n is positive.
It is not hard to show that the tensor of completely positive maps is always a completely positive map. The important result in this regard is the Kraus representation theorem [Cho75].
Theorem 1 (Kraus). The general form for a completely positive map E : B(H_1) → B(H_2) is
E(ρ) = Σ_m A_m ρ A_m†
where the A_m : H_1 → H_2. Here B(H) is the Banach space of bounded linear operators on H. If, in addition, we require that the trace of E(ρ) be at most 1, then the A_m will satisfy
Σ_m A_m† A_m ≤ I.
Lastly, the following term is common in the quantum computation literature.
Definition 5. A superoperator T is a linear map from B(H_V) to B(H_U) that is completely positive and trace preserving.
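To illustrate Theorem 1 and Definition 5 concretely, here is a hedged numpy sketch of a completely positive trace-preserving map in Kraus form; the dephasing channel and the probability p = 0.25 are illustrative choices of ours:

```python
import numpy as np

# Illustrative Kraus pair: dephase with probability p (names and p are ours).
p = 0.25
A0 = np.sqrt(1 - p) * np.eye(2, dtype=complex)
A1 = np.sqrt(p) * np.array([[1, 0], [0, -1]], dtype=complex)  # the Pauli Z
kraus = [A0, A1]

# Trace preservation: sum_m A_m^dagger A_m = I (the equality case of Theorem 1).
assert np.allclose(sum(A.conj().T @ A for A in kraus), np.eye(2))

def apply_channel(rho, ops):
    """E(rho) = sum_m A_m rho A_m^dagger, the Kraus form of a CPTP map."""
    return sum(A @ rho @ A.conj().T for A in ops)

# On |+><+| the off-diagonal terms shrink by a factor 1 - 2p; the trace stays 1.
plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
rho_out = apply_channel(np.outer(plus, plus.conj()), kraus)
assert np.isclose(np.trace(rho_out).real, 1)
```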
3 MBQC - Syntax
We first develop a notation for 1-qubit measurement-based computations. The basic commands one can use in a pattern are:
– 1-qubit auxiliary preparation N_i
– 2-qubit entanglement operators E_ij
– 1-qubit measurements M_i^α
– 1-qubit Pauli corrections X_i and Z_i
The indices i, j represent the qubits on which each of these operations applies, and α is a parameter in [0, 2π]. Expressions involving angles are always evaluated modulo 2π. These types of command will be referred to as N, E, M and C. Sequences of such commands, together with two distinguished – possibly overlapping – sets of qubits corresponding to inputs and outputs, will be called measurement patterns, or simply patterns. These patterns can be combined by composition and tensor product. Importantly, corrections and measurements are allowed to depend on previous measurement outcomes. It is known that patterns without these classical dependencies can only realise unitaries that are in the Clifford group [DKP07]. Thus, dependencies are crucial if one wants to define a universal computing model; that is to say, a model where all unitaries over ⊗^n C² can be realised. It is also crucial to develop a notation that will handle these dependencies, which we do now.
Preparation N_i prepares qubit i in state |+⟩_i. The entanglement commands are defined as E_ij := ∧Z_ij (controlled-Z), while the correction commands are the Pauli operators X_i and Z_i. Measurement M_i^α is defined by orthogonal projections on
|+_α⟩ := (1/√2)(|0⟩ + e^{iα} |1⟩)
|−_α⟩ := (1/√2)(|0⟩ − e^{iα} |1⟩)
followed by a trace-out operator. The parameter α ∈ [0, 2π] is called the angle of the measurement. For α = 0 and α = π/2 one obtains the X and Y Pauli measurements, respectively. Operationally, measurements will be understood as destructive measurements, consuming their qubit. The outcome of a measurement done at qubit i will be denoted by s_i ∈ Z_2. Since one only deals here with patterns where qubits are measured at most once (see condition (D1) below), this is unambiguous. We take the specific convention that s_i = 0 if under the corresponding measurement the state collapses to |+_α⟩, and s_i = 1 if to |−_α⟩. Outcomes can be summed together, resulting in expressions of the form s = Σ_{i∈I} s_i, which we call signals, and where the summation is understood as being done in Z_2. We define the domain of a signal as the set of qubits on which it depends. As we have said before, both corrections and measurements may depend on signals. Dependent corrections will be written X_i^s and Z_i^t, and dependent measurements will be written ^t[M_i^α]^s, where s, t ∈ Z_2 and α ∈ [0, 2π]. The meaning of dependencies for corrections is straightforward: X_i^0 = Z_i^0 = I (no correction is applied), while X_i^1 = X_i and Z_i^1 = Z_i. In the case of dependent measurements, the measurement angle will depend on s, t and α as follows:
^t[M_i^α]^s := M_i^{(−1)^s α + tπ}     (1)
so that, depending on the parities of s and t, one may have to modify the α to one of −α, α + π and −α + π. These modifications correspond to conjugations of measurements under X and Z:
X_i M_i^α X_i = M_i^{−α}     (2)
Z_i M_i^α Z_i = M_i^{α+π}     (3)
Accordingly, we will refer to them as the X and Z-actions. Note that these two actions commute, since −α + π = −α − π up to 2π, and hence the order in which one applies them does not matter. As we will see later, relations (2) and (3) are key to the propagation of dependent corrections, and to obtaining patterns in the standard entanglement, measurement and correction form. Since the measurements considered here are destructive, the above equations actually simplify to
M_i^α X_i = M_i^{−α}     (4)
M_i^α Z_i = M_i^{α−π}     (5)
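The signal mechanics of equation (1) are easy to mirror in code; the following tiny Python sketch (the function name is ours) evaluates the actual measurement angle of ^t[M_i^α]^s:

```python
from math import pi

def dependent_angle(alpha, s, t):
    """Angle actually used by t[M_i^alpha]^s, per equation (1): the X-action
    flips the sign of alpha, the Z-action adds pi; evaluated modulo 2*pi."""
    return ((-1) ** s * alpha + t * pi) % (2 * pi)

alpha = pi / 3
# The four cases alpha, -alpha, alpha + pi, -alpha + pi (all modulo 2*pi):
print([dependent_angle(alpha, s, t) for t in (0, 1) for s in (0, 1)])
```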
Another point worth noticing is that the domain of the signals of a dependent command, be it a measurement or a correction, represents the set of measurements which one has to do before one can determine the actual value of the command. We have completed our catalog of basic commands, including dependent ones, and we turn now to the definition of measurement patterns. For convenient reference, the language syntax is summarised in Figure 1. We proceed now with the formal definition of a measurement pattern.
S := 0, 1, s_i, S + S          Signals
A := N_i                       Preparations
     E_ij                      Entanglements
     ^t[M_i^α]^s               Measurements
     X_i^s, Z_i^s              Corrections
Fig. 1. 1-qubit based measurement language syntax
Definition 6. A pattern consists of three finite sets V, I, O, together with two injective maps ι : I → V and o : O → V and a finite sequence of commands A_n . . . A_1, read from right to left, applying to qubits in V in that order, i.e. A_1 first and A_n last, such that:
(D0) no command depends on an outcome not yet measured;
(D1) no command acts on a qubit already measured;
(D2) no command acts on a qubit not yet prepared, unless it is an input qubit;
(D3) a qubit i is measured if and only if i is not an output.
The set V is called the pattern computation space, and we write H_V for the associated quantum state space ⊗_{i∈V} C². To ease notation, we will omit the maps ι and o, and write simply I, O instead of ι(I) and o(O). Note, however, that these maps are useful to define classical manipulations of the quantum states, such as permutations of the qubits. The sets I, O are called respectively the pattern inputs and outputs, and we write H_I and H_O for the associated quantum state spaces. The sequence A_n . . . A_1 is called the pattern command sequence, while the triple (V, I, O) is called the pattern type. To run a pattern, one prepares the input qubits in some input state |ψ⟩ ∈ H_I, while the non-input qubits are all set to the |+⟩ state; then the commands are executed in sequence, and finally the result of the pattern computation is read back from the outputs as some |φ⟩ ∈ H_O. Clearly, for this procedure to succeed, we had to impose the (D0), (D1), (D2) and (D3) conditions. Indeed if (D0) fails, then at some point of the computation, one will want to execute a command which depends on outcomes that are not known yet. Likewise, if (D1) fails, one will try to apply a command on a qubit that has been consumed by a measurement (recall that we use destructive measurements). Similarly, if (D2) fails, one will try to apply a command on a non-existent qubit. Condition (D3) is there to make sure that the final state belongs to the output space H_O, i.e., that all non-output qubits, and only non-output qubits, will have been consumed by a measurement when the computation ends. We write (D) for the conjunction of our definiteness conditions (D0), (D1), (D2) and (D3). Whether a given pattern satisfies (D) or not is statically verifiable on the pattern command sequence. We could have imposed a simple type system to enforce these constraints but, in the interests of notational simplicity, we chose not to do so. Here is a concrete example:
H := ({1, 2}, {1}, {2}, X_2^{s_1} M_1^0 E_12 N_2)
with computation space {1, 2}, inputs {1}, and outputs {2}. To run H, one first prepares the first qubit in some input state ψ, and the second qubit in state |+⟩; then these are entangled to obtain ∧Z_12(ψ_1 ⊗ |+⟩_2). Once this is done, the first qubit is measured in the |+⟩, |−⟩ basis. Finally an X correction is applied on the output qubit if the measurement outcome was s_1 = 1. We will do this calculation in detail later, and prove that this pattern implements the Hadamard operator H.
In general, a given pattern may use auxiliary qubits that are neither input nor output qubits. Usually one tries to use as few such qubits as possible, since these contribute to the space complexity of the computation. A last thing to note is that one does not require inputs and outputs to be disjoint subsets of V. This, seemingly innocuous, additional flexibility is actually quite useful for giving parsimonious implementations of unitaries [DKP05]. Next we describe how one can combine patterns in order to obtain bigger ones.
The first way to combine patterns is by composing them. Two patterns P_1 and P_2 may be composed if V_1 ∩ V_2 = O_1 = I_2. Provided that P_1 has as many outputs as P_2 has inputs, by renaming the pattern qubits one can always make them composable.
Definition 7. The composite pattern P_2 P_1 is defined as:
— V := V_1 ∪ V_2, I = I_1, O = O_2,
— commands are concatenated.
The other way of combining patterns is to tensor them. Two patterns P_1 and P_2 may be tensored if V_1 ∩ V_2 = ∅. Again one can always meet this condition by renaming qubits in such a way that these sets are made disjoint.
Definition 8. The tensor pattern P_1 ⊗ P_2 is defined as:
— V = V_1 ∪ V_2, I = I_1 ∪ I_2, and O = O_1 ∪ O_2,
— commands are concatenated.
In contrast to the composition case, all the unions involved here are disjoint. Therefore commands from distinct patterns freely commute, since they apply to disjoint qubits, and when we say that commands have to be concatenated, this is only for definiteness. It is routine to verify that the definiteness conditions (D) are preserved under composition and tensor product. Before turning to this matter, we need a clean definition of what it means for a pattern to implement or to realise a unitary operator, together with a proof that the way one can combine patterns is reflected in their interpretations.
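Before moving to the semantics, it may help to see one possible concrete encoding of patterns; the representation below (tagged tuples and a Pattern record, stored in execution order) is our own illustrative choice, not a notation fixed by the text:

```python
from dataclasses import dataclass

@dataclass
class Pattern:
    V: frozenset    # computation space
    I: frozenset    # inputs
    O: frozenset    # outputs
    commands: list  # stored in execution order: commands[0] runs first

# The example H = X_2^{s1} M_1^0 E_12 N_2, written here in execution order;
# a measurement carries its angle, a correction the domain of its signal.
H = Pattern(frozenset({1, 2}), frozenset({1}), frozenset({2}),
            [("N", 2), ("E", 1, 2), ("M", 1, 0.0), ("X", 2, frozenset({1}))])

def compose(p1, p2):
    """The composite p2 p1 of Definition 7; assumes O1 = I2 after renaming."""
    assert p1.O == p2.I
    return Pattern(p1.V | p2.V, p1.I, p2.O, p1.commands + p2.commands)

def tensor(p1, p2):
    """The tensor p1 (x) p2 of Definition 8; assumes disjoint qubit sets."""
    assert not (p1.V & p2.V)
    return Pattern(p1.V | p2.V, p1.I | p2.I, p1.O | p2.O,
                   p1.commands + p2.commands)
```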
4 MBQC - Semantics
In this section we give a formal operational semantics for the pattern language as a probabilistic labeled transition system. We define deterministic patterns and thereafter concentrate on them. We show that deterministic patterns compose. We give a denotational semantics of deterministic patterns; from the construction it will be clear that these two semantics are equivalent.
Besides quantum states, which are non-zero vectors in some Hilbert space H_V, one needs a classical state recording the outcomes of the successive measurements one does in a pattern. If we let V stand for the finite set of qubits that are still active (i.e. not yet measured) and W stand for the set of qubits that have been measured (i.e. they are now just classical bits recording the measurement outcomes), it is natural to define the computation state space as:
S := Σ_{V,W} H_V × Z_2^W.
In other words the computation states form a V, W-indexed family of pairs q, Γ, where q is a quantum state from H_V and Γ is a map from some W to the outcome space Z_2. We call this classical component Γ an outcome map, and denote by ∅ the empty outcome map in Z_2^∅. We will treat these states as pairs unless it becomes important to show how V and W are altered during a computation, as happens during a measurement.

Operational Semantics

We need some further notation. For any signal s and classical state Γ ∈ Z_2^W, such that the domain of s is included in W, we take s_Γ to be the value of s given by the outcome map Γ. That is to say, if s = Σ_I s_i, then s_Γ := Σ_I Γ(i), where the sum is taken in Z_2. Also if Γ ∈ Z_2^W, and x ∈ Z_2, we define
Γ[x/i](i) = x,    Γ[x/i](j) = Γ(j) for j ≠ i,
which is a map in Z_2^{W∪{i}}. We may now view each of our commands as acting on the state space S; we have suppressed V and W in the first four commands:
N_i:          q, Γ ⟶ q ⊗ |+⟩_i, Γ
E_ij:         q, Γ ⟶ ∧Z_ij q, Γ
X_i^s:        q, Γ ⟶ X_i^{s_Γ} q, Γ
Z_i^s:        q, Γ ⟶ Z_i^{s_Γ} q, Γ
^t[M_i^α]^s:  V ∪ {i}, W, q, Γ ⟶ V, W ∪ {i}, ⟨+_{α_Γ}|_i q, Γ[0/i]
^t[M_i^α]^s:  V ∪ {i}, W, q, Γ ⟶ V, W ∪ {i}, ⟨−_{α_Γ}|_i q, Γ[1/i]
where α_Γ = (−1)^{s_Γ} α + t_Γ π, following equation (1). Note how the measurement moves an index from V to W; a qubit once measured cannot be measured again. Suppose q ∈ H_V; for the above relations to be defined, one needs the indices i, j on which the various commands apply to be in V. One also needs Γ to contain the domains of s and t, so that s_Γ and t_Γ are well-defined. This will always be the case during the run of a pattern because of condition (D). All commands except measurements are deterministic and only modify the quantum part of the state. The measurement actions on S are not deterministic, so that these are actually binary relations on S, and modify both the quantum
and classical parts of the state. The usual convention has it that when one does a measurement the resulting state is renormalised and the probabilities are associated with the transition. We do not adhere to this convention here; instead we leave the states unnormalised. The reason for this choice of convention is that this way, the probability of reaching a given state can be read off its norm, and the overall treatment is simpler. As we will show later, all the patterns implementing unitary operators will have the same probability for all the branches and hence we will not need to carry these probabilities explicitly.

Denotational Semantics

Let P be a pattern with computation space V, inputs I, outputs O and command sequence A_n . . . A_1. To execute a pattern, one starts with some input state q in H_I, together with the empty outcome map ∅. The input state q is then tensored with as many |+⟩s as there are non-inputs in V (the N commands), so as to obtain a state in the full space H_V. Then the E, M and C commands in P are applied in sequence from right to left. We can summarise the situation as follows, where the pattern induces a map from H_I to H_O:
H_I × Z_2^∅  →_{prep}  H_V × Z_2^∅  →_{A_1 . . . A_n}  H_O × Z_2^{V∖O}
If m is the number of measurements, which is also the number of non-outputs, then the run may follow 2^m different branches. Each branch is associated with a unique binary string s of length m, representing the classical outcomes of the measurements along that branch, and a unique branch map A_s representing the linear transformation from H_I to H_O along that branch. This map is obtained from the operational semantics via the sequence (q_i, Γ_i) with 1 ≤ i ≤ n + 1, such that:
q_1, Γ_1 = q ⊗ |+ . . . +⟩, ∅
and for all i ≤ n: q_i, Γ_i ⟶^{A_i} q_{i+1}, Γ_{i+1}; the branch map is then given by A_s(q) = q_{n+1}.
Definition 9. A pattern P realises the map on density matrices given by ρ → Σ_s A_s ρ A_s†. We write [[P]] for the map realised by P.
Proposition 10. Each pattern realises a completely positive trace-preserving map.
Proof. Later on we will show that every pattern can be put in a semantically equivalent form where all the preparations and entanglements appear first, followed by a sequence of measurements and finally local Pauli corrections. Hence branch maps decompose as A_s = C_s Π_s U, where C_s is a unitary map over H_O collecting all corrections on outputs, Π_s is a projection from H_V to H_O representing the particular measurements performed along the branch, and U is
a unitary embedding from H_I to H_V collecting the branch preparations and entanglements. Note that U is the same on all branches. Therefore,
Σ_s A_s† A_s = Σ_s U† Π_s† C_s† C_s Π_s U = U† (Σ_s Π_s† Π_s) U = U† U = I
where we have used the fact that C_s is unitary, the Π_s are projections summing to the identity, and U is independent of the branches and is also unitary. Therefore the map T(ρ) := Σ_s A_s ρ A_s† is a trace-preserving completely-positive map (CPTP-map), explicitly given as a Kraus decomposition. Hence the denotational semantics of a pattern is a CPTP-map.
In our denotational semantics we view the pattern as defining a map from the input qubits to the output qubits. We do not explicitly represent the result of measuring the final qubits; these may be of interest in some cases. Techniques for dealing with classical output explicitly are given by Selinger [Sel04] and Unruh [Unr05]. With our definitions in place, we will show that the denotational semantics are composable.
Theorem 2. For two patterns P_1 and P_2 we have [[P_2 P_1]] = [[P_2]][[P_1]] and [[P_1 ⊗ P_2]] = [[P_1]] ⊗ [[P_2]].
Proof. Recall that two patterns P_1, P_2 may be combined by composition provided P_1 has as many outputs as P_2 has inputs. Suppose this is the case, and suppose further that P_1 and P_2 respectively realise some CPTP-maps T_1 and T_2. We need to show that the composite pattern P_2 P_1 realises T_2 T_1. Indeed, the two diagrams representing branches in P_1 and P_2:
H_{I_1} × Z_2^∅  →_{p_1}  H_{V_1} × Z_2^∅  →  H_{O_1} × Z_2^{V_1∖O_1}
H_{I_2} × Z_2^∅  →_{p_2}  H_{V_2} × Z_2^∅  →  H_{O_2} × Z_2^{V_2∖O_2}
can be pasted together, since O_1 = I_2 and H_{O_1} = H_{I_2}. But then, it is enough to notice 1) that the preparation steps p_2 in P_2 commute with all actions in P_1 since they apply to disjoint sets of qubits, and 2) that no action taken in P_2 depends on the measurement outcomes in P_1. It follows that the pasted diagram describes the same branches as does the one associated to the composite P_2 P_1. A similar argument applies to the case of a tensor combination, and one has that P_2 ⊗ P_1 realises T_2 ⊗ T_1.
5 MBQC - Universality
In this section we first introduce a simple parameterised family J(α) that generates all unitaries over C². By adding the unitary operator controlled-Z (∧Z)
defined over C² ⊗ C², one then obtains a set of generators for all unitary maps over ⊗^n C². Both J(α) and ∧Z have simple realisations in MBQC, using only two qubits. As a consequence, one obtains an implementation of the controlled-U (∧U) family of unitaries, using only 14 qubits. Combining these as building blocks, any general unitary can be obtained by using relatively few auxiliary qubits [DKP05]. Furthermore, these building blocks have an interesting property, namely that their underlying entanglement graphs have no odd-length cycles, and such states have been shown to be robust against decoherence [DAB03].
Consider the following one-parameter family J(α):
J(α) := (1/√2) \begin{pmatrix} 1 & e^{iα} \\ 1 & −e^{iα} \end{pmatrix}.
We can see already that the Pauli spin matrices, phase and Hadamard operators can be described using only J(α):
X = J(π)J(0)    Z(α) = J(0)J(α)    Z = J(0)J(π)    H = J(0)
We will also use the following equations:
J(0)² = I
J(α)J(0)J(β) = J(α + β)
J(α)J(π)J(β) = e^{iα} Z J(β − α)
The second and third equations are referred to as the additivity and subtractivity relations. Additivity gives another useful pair of equations:
X J(α) = J(α + π) = J(α) Z     (6)
Any unitary operator U on C² can be written:
U = e^{iα} J(0)J(β)J(γ)J(δ)
for some α, β, γ and δ in R. We will refer to this as a J-decomposition of U. To prove this, note that all three Pauli rotations are expressible in terms of J(α):
R_x(α) = e^{−iα/2} J(α)J(0)     (7)
R_y(α) = e^{−iα/2} J(0)J(π/2)J(α)J(−π/2)     (8)
R_z(α) = e^{−iα/2} J(0)J(α)     (9)
From the Z–X decomposition, we know that every 1-qubit unitary operator U can be written as:
U = e^{iα} R_z(β) R_x(γ) R_z(δ)
and using equations (9) and (7) we get:
U = e^{iα} e^{−i(β+γ+δ)/2} J(0)J(β)J(γ)J(δ).
We conclude, in particular, that J(α) generates all 1-qubit unitary operators.
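These identities are easy to check numerically; the sketch below (the test angles are arbitrary choices of ours) verifies additivity, subtractivity and equation (9):

```python
import numpy as np

def J(a):
    """The one-parameter family J(alpha) defined above."""
    return np.array([[1, np.exp(1j * a)], [1, -np.exp(1j * a)]]) / np.sqrt(2)

Z = np.diag([1.0, -1.0]).astype(complex)
a, b = 0.7, 1.9  # arbitrary test angles

assert np.allclose(J(a) @ J(0) @ J(b), J(a + b))           # additivity
assert np.allclose(J(a) @ J(np.pi) @ J(b),
                   np.exp(1j * a) * (Z @ J(b - a)))        # subtractivity
Rz = np.diag([np.exp(-1j * a / 2), np.exp(1j * a / 2)])
assert np.allclose(Rz, np.exp(-1j * a / 2) * (J(0) @ J(a)))  # equation (9)
```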
Next, we turn to the decomposition of ∧U in terms of J(α) and ∧Z. Subscripts on operators indicate the qubit to which they apply, and we sometimes abbreviate J_i(α) as J_i^α. Suppose U has J-decomposition e^{iα} J(0)J(β)J(γ)J(δ); then ∧U can also be decomposed as follows:
∧U_12 = J_1^0 J_1^{α′} J_2^0 J_2^{β+π} J_2^{−γ/2} J_2^{−π/2} J_2^0 ∧Z_12 J_2^{π/2} J_2^{γ/2} J_2^{(−π−δ−β)/2} J_2^0 ∧Z_12 J_2^{(−β+δ−π)/2}
with α′ = α + (β + γ + δ)/2. To prove the above decomposition, we first define auxiliary unitary operators:
A = J(0)J(β + π)J(−γ/2)J(−π/2)
B = J(0)J(π/2)J(γ/2)J((−π − δ − β)/2)
C = J(0)J((−β + δ − π)/2)
Then, using the additivity relation we obtain ABC = I. On the other hand, using both the subtractivity relation and equations (6), we get:
AXBXC = J(0)J(β + π)J(−γ/2)J(−π/2)J(π)J(π/2)J(γ/2)J((−π − δ − β)/2)J(π)J((−β + δ − π)/2)
      = e^{−i(δ+β+γ)/2} J(0)J(β)J(γ)J(δ)
Therefore one also has e^{i(2α+β+γ+δ)/2} AXBXC = U. Combining our two equations in A, B, C, we obtain
∧U_12 = P_1(α′) A_2 ∧X_12 B_2 ∧X_12 C_2
with α′ = α + (β + γ + δ)/2; a decomposition which we can rewrite using our generating set:
Z(α)_1 = J_1^0 J_1^α
∧X_12 = H_2 ∧Z_12 H_2 = J_2^0 ∧Z_12 J_2^0
to obtain the above decomposition of ∧U. Having all unitaries U over C² and all unitaries of the form ∧U over C² ⊗ C², we can conclude that:
Theorem 3 (Universality). The set {J(α), ∧Z} generates all unitaries.
The following unitaries, H = J(0), Z(π/4) = J(0)J(π/4), and ∧X = J(0) ∧Z J(0), are known to be approximately universal, in the sense that any unitary can be approximated within any precision by combining these [NC00]. Therefore the set J(0), J(π/4) and ∧Z is also approximately universal. It is easy to verify that the following patterns implement our generators:
J(α) := X_2^{s_1} M_1^{−α} E_12
∧Z := E_12
where in the first pattern 1 is the only input and 2 is the only output, while in the second both 1 and 2 are inputs and outputs (note that we are allowing patterns to have overlapping inputs and outputs). Combining these two patterns, by composition and tensoring, will therefore generate patterns realising all unitaries over ⊗^n C². These patterns are indeed among the simplest possible. Remarkably, there is only one single dependency overall, which occurs in the correction phase of J(α). No set of patterns without any measurement could be a generating set, since those can only implement unitaries in an abelian subgroup of the Clifford group.
6 Measurement Calculus
We turn now to the structural result on MBQC asserting that the key factorisation property, namely that entanglement can be done first, followed by local measurements, can be reduced to confluence properties of a simple algebraic rewriting system [DKP07].
The expressions appearing as commands are all linear operators on Hilbert space. At first glance, the appropriate equality between commands is equality as operators. For the deterministic commands, the equality that we consider is indeed equality as operators. This equality implies equality in the denotational semantics. However, for measurement commands one needs a stricter definition for equality in order to be able to apply them as rewriting rules. Essentially we have to take into account the effect of the different branches that might result from the measurement process. The precise definition is below.
Definition 11. Given two patterns P and P′, we define P = P′ if and only if for any branch s, we have A_s^P = A_s^{P′}, where A_s^P and A_s^{P′} are the branch maps defined in Section 4.
The first set of equations gives the means to propagate local Pauli corrections through the entangling operator E_ij:
E_ij X_i^s = X_i^s Z_j^s E_ij     (10)
E_ij X_j^s = X_j^s Z_i^s E_ij     (11)
E_ij Z_i^s = Z_i^s E_ij     (12)
E_ij Z_j^s = Z_j^s E_ij     (13)
These equations are easy to verify and are natural since E_ij belongs to the Clifford group, and therefore maps the Pauli group to itself under conjugation. Note that, despite the symmetry of the E_ij operator qua operator, we have to consider all the cases, since the rewrite system defined below does not allow one to rewrite E_ij to E_ji. If we did allow this, the rewrite process could loop forever. A second set of equations allows one to push corrections through measurements acting on the same qubit. Again there are two cases:
^t[M_i^α]^s X_i^r = ^t[M_i^α]^{s+r}     (14)
^t[M_i^α]^s Z_i^r = ^{t+r}[M_i^α]^s     (15)
These equations follow easily from equations (4) and (5). They express the fact that the measurements M_i^α are closed under conjugation by the Pauli group, very much like equations (10), (11), (12) and (13) express the fact that the Pauli group is closed under conjugation by the entanglements E_ij. Define the following convenient abbreviations:
[M_i^α]^s := ^0[M_i^α]^s,   ^t[M_i^α] := ^t[M_i^α]^0,   M_i^α := ^0[M_i^α]^0,   M_i^x := M_i^0,   M_i^y := M_i^{π/2}
Particular cases of the equations above are:
M_i^x X_i^s = M_i^x
M_i^y X_i^s = [M_i^y]^s = ^s[M_i^y] = M_i^y Z_i^s
The first equation follows from the fact that −0 = 0, so the X action on M_i^x is trivial; the second equation holds because −π/2 is equal to π/2 + π modulo 2π, and therefore the X and Z actions coincide on M_i^y. So we obtain the following:
^t[M_i^x]^s = ^t[M_i^x]     (16)
^t[M_i^y]^s = ^{s+t}[M_i^y]     (17)
which we will use later to prove that patterns with measurements of the form M^x and M^y may only realise unitaries in the Clifford group.
We now define a set of rewrite rules, obtained by orienting the equations above. Recall that patterns are executed from right to left:
E_ij X_i^s ⇒ X_i^s Z_j^s E_ij     (EX)
E_ij X_j^s ⇒ X_j^s Z_i^s E_ij     (EX)
E_ij Z_i^s ⇒ Z_i^s E_ij     (EZ)
E_ij Z_j^s ⇒ Z_j^s E_ij     (EZ)
^t[M_i^α]^s X_i^r ⇒ ^t[M_i^α]^{s+r}     (MX)
^t[M_i^α]^s Z_i^r ⇒ ^{r+t}[M_i^α]^s     (MZ)
to which we need to add the free commutation rules, obtained when commands operate on disjoint sets of qubits:
E_ij A_{k⃗} ⇒ A_{k⃗} E_ij     where A is not an entanglement
A_{k⃗} X_i^s ⇒ X_i^s A_{k⃗}     where A is not a correction
A_{k⃗} Z_i^s ⇒ Z_i^s A_{k⃗}     where A is not a correction
where k⃗ represents the qubits acted upon by command A, and these are supposed to be distinct from i and j. Clearly these rules could be reversed, since they hold as equations, but we are orienting them this way in order to obtain termination. Condition (D) is easily seen to be preserved under rewriting. Under rewriting, the computation space, inputs and outputs remain the same, and so do the entanglement commands. Measurements might be modified, but there is still the same number of them, and they still act on the same individual qubits. The only induced modifications concern local corrections and dependencies. If there was no dependency at the start, none will be created in the rewriting process.
In order to obtain such rewrite rules, it was essential that the entangling command (∧Z) belongs to the normaliser of the Pauli group. The point is that the Pauli operators are the correction operators and they can be dependent; thus we can commute the entangling commands to the beginning without inheriting any dependency. Therefore the entanglement resource can indeed be prepared at the outset of the computation.
Write P ⇒ P′ if one obtains the command sequence of P′ from the command sequence of P by applying one of the rewrite rules above. Similarly we will write P ⇒* P′ if one obtains the command sequence of P′ from the command sequence of P by applying multiple rewrite rules. We say that P is
standard if for no P′, P ⇒ P′; the procedure of rewriting a pattern to standard form is called standardisation. We use the word "standardisation" instead of the more usual "normalisation" in order not to cause terminological confusion with the physicists' notion of normalisation. One of the most important results about the rewrite system is that it has the desirable properties of determinacy (confluence) and termination (standardisation). In other words, we will show that for all P, there exists a unique standard P′ such that P ⇒* P′. It is, of course, crucial that the standardisation process leaves the semantics of patterns invariant. This is the subject of the next simple, but important, proposition.
Proposition 12. Whenever P ⇒* P′, [[P]] = [[P′]].
Proof. It is enough to prove the case where P ⇒ P′, as the more general case P ⇒* P′ follows by induction. The first group of rewrites has been proved to be sound in the preceding subsections, while the free commutation rules are obviously sound.
We now begin the main proof of this section. First, we prove termination.
Theorem 4 (Termination). All rewriting sequences beginning with a pattern P terminate after finitely many steps. For our rewrite system, this implies that for all P there exist finitely many P′ such that P ⇒* P′ where the P′ are standard.
Proof. Suppose P has command sequence A_n . . . A_1, so the number of commands is n. Let e ≤ n be the number of E commands in P. This number is invariant under ⇒. Moreover, the E commands in P can be ordered by increasing depth, read from right to left, and this order, written <_E, is also invariant, since EE commutations are explicitly forbidden in the free commutation rules. Define the following depth function d on E and C commands in P:
d(A_i) = i if A_i = E_jk;   d(A_i) = n − i if A_i = C_j.
Define further the following sequence of length e: d_E(P)(i) is the depth of the E-command of rank i according to <_E. By construction this sequence is strictly increasing. Finally, we define the measure m(P) := (d_E(P), d_C(P)) with:
d_C(P) = Σ_{C∈P} d(C)
We claim the measure we just defined decreases lexicographically under rewriting; in other words, P ⇒ P′ implies m(P) > m(P′), where < is the lexicographic ordering on N^{e+1}.
To clarify these definitions, consider the following example. Suppose P's command sequence is of the form EXZE; then e = 2, d_E(P) = (1, 4), and m(P) = (1, 4, 3). For the command sequence EEX we get that e = 2, d_E(P) = (2, 3) and m(P) = (2, 3, 2). Now, if one considers the rewrite EEX ⇒ EXZE, the measure of the left hand side is (2, 3, 2), while the measure of the right hand
side, as said, is (1, 4, 3), and indeed (2, 3, 2) > (1, 4, 3). Intuitively the reason is clear: the Cs are being pushed to the left, thus decreasing the depths of the Es, and concomitantly, the value of d_E.
Let us now consider all cases, starting with an EC rewrite. Suppose the E command under rewrite has depth d and rank i in the order <_E. Then all Es of smaller rank have the same depth in the right hand side, while E now has depth d − 1 and still rank i. So the right hand side has a strictly smaller measure. Note that when C = X, because of the creation of a Z (see the example above), the last element of m(P) may increase, and for the same reason all elements of index j > i in d_E(P) may increase. This is why we are working with a lexicographical ordering.
Suppose now one does an MC rewrite; then d_C(P) strictly decreases, since one correction is absorbed, while all E commands have equal or smaller depths. Again the measure strictly decreases.
Next, suppose one does an EA rewrite, where the E command under rewrite has depth d and rank i. Then it has depth d − 1 in the right hand side, and all other E commands have invariant depths, since we forbade the case when A is itself an E. It follows that the measure strictly decreases.
Finally, upon an AC rewrite, all E commands have invariant depth, except possibly one which has smaller depth in the case A = E, and d_C(P) decreases strictly because we forbade the case where A = C. Again the claim follows.
So all rewrites decrease our ordinal measure, and therefore all sequences of rewrites are finite; since the system is finitely branching (there are no more than n possible single-step rewrites on a given sequence of length n), we get the statement of the theorem.
The next theorem establishes the important determinacy property and furthermore shows that the standard patterns have a certain canonical form which we call the NEMC form. The precise definition is:
Definition 13. A pattern has NEMC form if its commands occur in the order of Ns first, then Es, then Ms, and finally Cs.
We will usually just say "EMC" form, since we can assume that all the auxiliary qubits are prepared in the |+⟩ state and we usually just elide these N commands.
Theorem 5 (Confluence). For all P, there exists a unique standard P′ such that P ⇒* P′, and P′ is in EMC form.
Proof. Since the rewriting system is terminating, confluence follows from local confluence by Newman's lemma; see, for example, [Bar84]. This means that whenever two rewrite rules can be applied to a term t yielding t_1 and t_2, one can rewrite both t_1 and t_2 to a common third term t_3, possibly in many steps. The uniqueness of the standard form is then an immediate consequence.
In order to prove local confluence we look for critical pairs, that is, occurrences of three successive commands where two rules can be applied simultaneously. One finds that there are only five types of critical pairs; of these, three involve the N command – these are of the form NMC, NEC and NEM – and
the remaining two are E_ij M_k C_k with i, j and k all distinct, and E_ij M_k C_l with k and l distinct. In all cases local confluence is easily verified.
Suppose now P does not satisfy the EMC form conditions. Then either there is a subsequence EA with A not of type E, or there is a subsequence AC with A not of type C. In the former case, E and A must operate on overlapping qubits, else one may apply a free commutation rule; and A may not be a C, since in this case one may apply an EC rewrite. The only remaining case is when A is of type M, overlapping E's qubits, but this is what condition (D1) forbids, and since (D1) is preserved under rewriting, this contradicts the assumption. The latter case is even simpler.
We have shown that under rewriting any pattern can be put in EMC form, which is what we wanted. We actually proved more, namely that the standard form obtained is unique. However, one has to be a bit careful about the significance of this additional piece of information. Note first that uniqueness is obtained because we dropped the CC and EE free commutations, thus having a rigid notion of command sequence. One cannot put them back as rewrite rules, since they obviously ruin termination and uniqueness of standard forms. A reasonable thing to do would be to take this set of equations as generating an equivalence relation on command sequences, call it ≡, and hope to strengthen the results obtained so far by proving that all reachable standard forms are equivalent. But this is too naive a strategy, since E_12 X_1 X_2 ≡ E_12 X_2 X_1, and:
E_12 X_1^s X_2^t ⇒ X_1^s Z_2^s X_2^t Z_1^t E_12 ≡ X_1^s Z_1^t Z_2^s X_2^t E_12
obtaining an expression which is not symmetric in 1 and 2. To conclude, one has to extend ≡ to include the additional equivalence X_1^s Z_1^t ≡ Z_1^t X_1^s, which fortunately is sound since these two operators are equal up to a global phase. Thus, these are all equivalent in our semantics of patterns. We summarise this discussion as follows.
Definition 14. We define an equivalence relation ≡ on patterns by taking all the rewrite rules as equations, adding the equation X_1^s Z_1^t ≡ Z_1^t X_1^s, and generating the smallest equivalence relation.
With this definition we can state the following proposition.
Proposition 15. All patterns that are equivalent by ≡ are equal in the denotational semantics.
This ≡ relation preserves both the type (the (V, I, O) triple) and the underlying entanglement graph. So clearly semantic equality does not entail equality up to ≡. In fact, by composing teleportation patterns one obtains infinitely many patterns for the identity which are all different up to ≡. One may wonder whether two patterns with the same semantics, type and underlying entanglement graph are necessarily equal up to ≡. This is not true either. One has J(α)J(0)J(β) = J(α + β) = J(β)J(0)J(α) (where J(α) is defined in Section 5),
and this readily provides a counter-example.
We can now formally describe a simple standardisation algorithm.
Algorithm. Input: a pattern P on |V| = N qubits with command sequence A_M · · · A_1. Output: an equivalent pattern P′ in NEMC form.
1. Commute all the preparation commands (new qubits) to the right side.
2. Commute all the correction commands to the left side using the EC and MC rewriting rules.
3. Commute all the entanglement commands to the right side, after the preparation commands.
Note that since each qubit can be entangled with at most N − 1 other qubits, and can be measured or corrected only once, we have O(N²) entanglement commands and O(N) measurement commands. According to the definiteness condition, no command acts on a qubit not yet prepared; hence the first step of the above algorithm is based on trivial commuting rules. The same is true for the last step, as no entanglement command can act on a qubit that has been measured. Both steps can be done in O(N²) time. The real complexity of the algorithm comes from the second step and the EX commuting rule. In the worst case scenario, commuting an X correction to the left might create O(N²) other Z corrections, each of which has to be commuted to the left itself. Thus one can have at most O(N³) new corrections, each of which has to be commuted past O(N²) measurement or entanglement commands. Therefore the second step, and hence the algorithm, has a worst case complexity of O(N⁵) time.
We conclude this subsection by emphasising the importance of the EMC form. Since the entanglement can always be done first, we can always derive the entanglement resource needed for the whole computation right at the beginning. After that, only local operations will be performed. This separates the analysis of entanglement resource requirements from the classical control. Furthermore, this makes it possible to extract the maximal parallelism for the execution of the pattern, since the necessary dependencies are explicitly expressed; see [BK09] for further discussion. The EMC form also provides us with tools to prove general theorems about patterns, such as the fact that they always compute CPTP-maps, and the expressiveness theorems [DKP07]. Finally, we present later the first MBQC protocol designed using the EMC form, which allows one to clearly distinguish between the quantum and classical aspects of a quantum computation.
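For illustration, here is a naive Python sketch of standardisation over a tuple encoding of commands (as in the earlier sketch, but with measurements carrying their X- and Z-signal domains). It applies the EX/EZ/MX/MZ rules plus free commutation until a fixpoint; this suffices for small examples but makes no attempt at the O(N⁵) bound discussed above, and all names are ours:

```python
# Commands are listed in execution order; signals are frozensets of qubit
# labels, added mod 2 via symmetric difference.

def qubits(cmd):
    return {cmd[1], cmd[2]} if cmd[0] == "E" else {cmd[1]}

ORDER = {"N": 0, "E": 1, "M": 2, "X": 3, "Z": 3}  # target NEMC layering

def step(cmds):
    for k in range(len(cmds) - 1):
        a, b = cmds[k], cmds[k + 1]  # a is executed just before b
        # MX / MZ: correction just before a measurement on the same qubit.
        if b[0] == "M" and a[0] in "XZ" and a[1] == b[1]:
            _, i, alpha, s, t = b
            s, t = (s ^ a[2], t) if a[0] == "X" else (s, t ^ a[2])
            return cmds[:k] + [("M", i, alpha, s, t)] + cmds[k + 2:]
        # EX / EZ: correction just before an entanglement touching its qubit.
        if b[0] == "E" and a[0] in "XZ" and a[1] in (b[1], b[2]):
            if a[0] == "X":  # X picks up a Z on the other qubit of the edge
                other = b[2] if a[1] == b[1] else b[1]
                return cmds[:k] + [b, ("Z", other, a[2]), a] + cmds[k + 2:]
            return cmds[:k] + [b, a] + cmds[k + 2:]
        # Free commutation: disjoint neighbours violating the NEMC layering.
        if not (qubits(a) & qubits(b)) and ORDER[a[0]] > ORDER[b[0]]:
            return cmds[:k] + [b, a] + cmds[k + 2:]
    return None

def standardise(cmds):
    while (nxt := step(cmds)) is not None:
        cmds = nxt
    return cmds

# Two composed Hadamard patterns; standardisation yields NEMC form with the
# dependent measurement [M_2^0]^{s_1} and final corrections Z_3^{s_1} X_3^{s_2}.
s = frozenset
HH = [("N", 2), ("E", 1, 2), ("M", 1, 0.0, s(), s()), ("X", 2, s({1})),
      ("N", 3), ("E", 2, 3), ("M", 2, 0.0, s(), s()), ("X", 3, s({2}))]
print(standardise(HH))
```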
7 Determinism
An important aspect of MBQC is the way the inherent randomness of the measurement outcomes can be accounted for, so that the overall computation remains deterministic. This is accomplished by conditioning the basis of certain measurements upon the outcome of others, introducing a measurement order. We first introduce various notions of determinism. A pattern is said to be
deterministic if it realises a CPTP-map that sends pure states to pure states. This is equivalent to saying that for a deterministic pattern the branch maps are proportional, that is to say, for all q ∈ H_I and all s_1, s_2 ∈ Z_2^n, A_{s_1}(q) and A_{s_2}(q) differ only by a scalar. The class of deterministic patterns includes projections. A more restricted class contains all the unitary and unitary embedding operators: a pattern is said to be strongly deterministic when the branch maps are equal up to a global phase, i.e. for all s_1, s_2 ∈ Z_2^n, A_{s_1} = e^{iφ_{s_1,s_2}} A_{s_2}. These are the patterns implementing quantum algorithms, and hence understanding their structural properties is of particular interest.
Proposition 16. If a pattern is strongly deterministic, then it realises a unitary embedding.
Proof. Define T to be the map realised by the pattern. We have T(ρ) = Σ_s A_s ρ A_s†. Since the pattern is strongly deterministic, all the branch maps are the same. Define A to be 2^{n/2} A_s; then A must be a unitary embedding, because A† A = I.
An important sub-class of deterministic patterns are those robust under changes of the angles: a pattern is said to be uniformly deterministic if it is deterministic for all values of its measurement angles. In other words, a uniformly deterministic pattern defines a class of quantum operators that can be performed given the same initial entanglement resources. On the other hand, it is known that if we fix the angles of measurement to be Pauli, the obtained operator is in the Clifford group [DKP07]. That means uniform determinism allows us to associate to a family of quantum operators a canonical pattern implementing a Clifford operator, a potentially valuable abstract reduction for the study of quantum operators. Finally, a pattern is said to be stepwise deterministic if it is deterministic after performing each single measurement together with all the corrections depending on the result of that measurement. In other words, a pattern is stepwise deterministic if after each single measurement there exists a set of local corrections, depending only on the result of this measurement, to be performed on some or all of the non-measured qubits, that will make the two branches equal (up to a global phase).
A variety of methods for constructing measurement patterns have already been proposed that guarantee determinism by construction [RBB03, HEB04, CLN05a]. We introduce a direct condition on graph states which guarantees a strong form of deterministic behaviour for a class of MBQC patterns defined over them [DK06]. Remarkably, this condition bears only on the geometric structure of the entangled graph states. Let us define an open graph state (G, I, O) to be a state associated with an undirected graph G together with two subsets of nodes I and O, called inputs and outputs. We write V for the set of nodes in G, I^c and O^c for the complements of I and O in V, N_G(i) for the set of neighbours of i in G, i ∼ j for (i, j) ∈ G, and E_G := Π_{i∼j} E_ij for the global entanglement operator associated to G. Thus i ∼ j denotes that i is adjacent to j in G. N_{I^c} denotes the sequence of preparation commands Π_{i∈I^c} N_i.
Definition 17. A flow (f, ≼) for an open graph state (G, I, O) consists of a map f : O^c → I^c and a partial order ≼ over V such that for all x ∈ O^c:
(i) x ∼ f(x);
(ii) x ≼ f(x);
(iii) for all y ∼ f(x), x ≼ y.
As one can see, a flow consists of two structures: a function f over vertices and a matching partial order ≼ over vertices. In order to obtain a deterministic pattern for an open graph state with flow, dependent corrections will be defined based on the function f. The order of execution of the commands is given by the partial order induced by the flow. The matching properties between the function f and the partial order ≼ will make the obtained pattern runnable. Figure 2 shows an open graph state together with a flow, where f is represented by arcs from O^c (measured qubits, black vertices) to I^c (prepared qubits, non-boxed vertices). The associated partial order is given by the labelled sets of vertices. The coarsest order for which (f, ≼) is a flow is called the dependency order induced by f, and its depth (4 in Figure 2) is called the flow depth. The existence of a causal flow is a sufficient condition for determinism. Before we can prove this, however, we need the following simple lemma, which describes an essential property of graph states.
Lemma 18. For any open graph (G, I, O) and any i ∈ I^c,
E_G N_{I^c} = X_i Z_{N_G(i)} E_G N_{I^c}.
Proof. The proof is based on equations (10) and (12) of the Measurement Calculus, and the additional equation X_i N_i = N_i, which follows from the fact that N_i produces a qubit in the |+⟩ state, which is a fixed point of X.
Fig. 2. An open graph state with flow. The boxed vertices are the input qubits and the white vertices are the output qubits. All the non-output qubits, black vertices, are measured during the run of the pattern. The flow function is represented as arcs and the partial order on the vertices is given by the 4 partition sets.
E_G N_{I^c} = E_G X_i N_{I^c}
= (Π_{(k,l)∈G, k≠i, l≠i} E_kl) (Π_{j∈N_G(i)} E_ij) X_i N_{I^c}
= (Π_{(k,l)∈G, k≠i, l≠i} E_kl) X_i (Π_{j∈N_G(i)} Z_j) (Π_{j∈N_G(i)} E_ij) N_{I^c}
= X_i (Π_{j∈N_G(i)} Z_j) E_G N_{I^c}
= X_i Z_{N_G(i)} E_G N_{I^c}
The operator K_i := X_i (Π_{j∈N_G(i)} Z_j) is called the graph stabiliser [HEB04] at qubit i, and the above lemma proves K_i E_G N_{I^c} = E_G N_{I^c}. Note that this equation is slightly more general than the common graph stabiliser [HEB04], as it can be applied to open graph states where input qubits are prepared in arbitrary states.
Theorem 6. Suppose the open graph state (G, I, O) has flow f. Then the pattern
P_{f,G,α⃗} := Π_{i∈O^c}^{≼} ( X_{f(i)}^{s_i} Π_{k∼f(i), k≠i} Z_k^{s_i} M_i^{α_i} ) E_G
where the product follows the dependency order ≼ of f, is uniformly and strongly deterministic, and realises the unitary embedding
U_{G,I,O,α⃗} := 2^{|O^c|/2} ( Π_{i∈O^c} ⟨+_{α_i}|_i ) E_G.
Proof. The proof is based on anachronical patterns, i.e. patterns which do not satisfy the (D0) condition (see Section 3) saying that no command depends on an outcome not yet measured. Indeed, in the anachronical pattern M_i^α Z_i^{s_i}, the command Z_i^{s_i} depends on the outcome s_i whereas the qubit i is not yet measured. However, by relaxing the (D0) condition, we have the following equation:
⟨+_α|_i = M_i^α Z_i^{s_i}
Indeed, if s_i = 0 the measurement realises the projection ⟨+_α|_i, and if s_i = 1 the measurement realises the projection ⟨−_α|_i = ⟨+_α|_i Z_i. Thus, any correction-free pattern Π_{i∈O^c} M_i^{α_i} E_G N_{I^c} can be turned into an anachronical strongly deterministic pattern Π_{i∈O^c} M_i^{α_i} Z_i^{s_i} E_G N_{I^c} which realises U_G. The rest of the proof consists in transforming this anachronical pattern into a pattern which satisfies the (D0) condition:
Π_{i∈O^c} M_i^{α_i} Z_i^{s_i} E_G N_{I^c} = Π_{i∈O^c} M_i^{α_i} Z_i^{s_i} ( X_{f(i)}^{s_i} Π_{j∈N_G(f(i))} Z_j^{s_i} ) E_G N_{I^c}
= Π_{i∈O^c}^{≼} ( X_{f(i)}^{s_i} Π_{j∈N_G(f(i))∖{i}} Z_j^{s_i} ) M_i^{α_i} E_G N_{I^c}
Lemma 18 and condition (iii) of the causal flow are used in the previous equation for eliminating the command Z_i^{s_i}, whereas conditions (i) and (ii) ensure that the pattern satisfies the (D0) condition.
The intuition of the proof is that entanglement between two qubits i and j converts an anachronical Z correction at i, given in the term M_i^α Z_i^{s_i}, into a
pair of 'future' corrections: an X correction on qubit j = f(i), together with Z corrections on the other neighbours of j. The existence of a flow is only a sufficient condition for determinism; it assigns to every single measured qubit a unique correcting vertex f(i). A natural generalisation is to consider a set of vertices as a correcting set, which leads to a full characterisation of determinism [BKMP07].
Having obtained the rigorous mathematical model underlying MBQC, we can now demonstrate how this model suggests new techniques for designing quantum protocols.
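The flow conditions of Definition 17 are purely graph-theoretic and can be checked mechanically; the following sketch (the encoding is ours: the strict part of the partial order is given as a set of pairs) tests conditions (i)–(iii):

```python
# A sketch checking the flow conditions of Definition 17. `before` holds the
# strict part of the partial order as (earlier, later) pairs.
def is_flow(edges, V, I, O, f, before):
    adj = {v: set() for v in V}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    preceq = lambda x, y: x == y or (x, y) in before
    for x in V - O:                                    # x ranges over O^c
        if f[x] not in V - I or f[x] not in adj[x]:    # f: O^c -> I^c, (i) x ~ f(x)
            return False
        if (x, f[x]) not in before:                    # (ii) x precedes f(x)
            return False
        if not all(preceq(x, y) for y in adj[f[x]]):   # (iii) x <= neighbours of f(x)
            return False
    return True

# Path graph 1-2-3, input {1}, output {3}, f(1)=2, f(2)=3, order 1 < 2 < 3.
V, I, O = {1, 2, 3}, {1}, {3}
before = {(1, 2), (1, 3), (2, 3)}
print(is_flow({(1, 2), (2, 3)}, V, I, O, {1: 2, 2: 3}, before))  # True
```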
8 Universal Blind Quantum Computing
When the technology to build quantum computers becomes available, it is likely that initially it will only be accessible to a handful of centres around the world. Much like today’s rental system of supercomputers, users will probably be granted access to the computers in a limited way. How will a user interface with such a quantum computer? Here, we consider the scenario where a user is unwilling to reveal the computation that the remote computer is to perform, but still wishes to exploit this quantum resource. The solution is the Universal Blind Quantum Computing (UBQC) protocol [BFK09] that allows a client Alice (who does not have any quantum computational resources or quantum memory) to interact with a server Bob (who has a quantum computer) in order for Alice to obtain the outcome of her target computation such that privacy is preserved. This means that Bob learns nothing about Alice’s inputs, outputs, or desired computation. The privacy is perfect, does not rely on any computational assumptions, and holds no matter what actions a cheating Bob undertakes. Alice only needs to be able to prepare single qubits randomly chosen from a finite set and send them to the server, who has the balance of the required quantum computational resources. After this initial preparation, Alice and Bob use two-way classical communication which enables Alice to drive the computation by giving single-qubit measurement instructions to Bob, depending on previous measurement outcomes. Note that if Alice wanted to compute the solution to a classical problem in NP, she could efficiently verify the outcome. An interfering Bob is not so obviously detected in other cases. UBQC uses an authentication technique which performs this detection. The UBQC protocol is constructed using the unique feature of MBQC that separates the classical and quantum parts of a computation, leading to a generic scheme for blind computation of any circuit without requiring any quantum memory for Alice. This is fundamentally different from previously known classical or quantum schemes. UBQC can be viewed as a distributed version of an MBQC computation (where Alice prepares the individual qubits, Bob does the entanglement and measurements, and Alice computes the classical feedforward mechanism), on top of which randomness is added in order to obscure the computation from Bob’s point of view. This is the first time that a new functionality has been achieved thanks to MBQC (though other theoretical advances due to MBQC appear in [RHG06, MS08]). From a conceptual point of view, this shows that MBQC has tremendous potential for the development of new protocols, and
maybe even of algorithms. UBQC can be used for any quantum circuit and also works for quantum inputs or outputs. We now give some applications.

– Factoring. Factoring is a prime application of UBQC: by implementing Shor's factoring algorithm [Sho97] as a blind quantum computation, Alice can use Bob to help her factor a product of large primes which is associated with an RSA public key [RSA78]. Thanks to the properties of UBQC, Bob will not only be unable to determine Alice's input, but will be completely oblivious to the fact that he is helping her factor.
– BQP-complete problems. UBQC could be used to help Alice solve a BQP-complete problem, for instance approximating the Jones polynomial [AJL06]. There is no known classical method to efficiently verify the solution; this motivates the need for authentication of Bob's computation, even in the case that the output is classical.
– Processing quantum information. Alice may wish to use Bob as a remote device to manipulate quantum information. Consider the case where Alice is participating in a quantum protocol such as a quantum interactive proof. She can use UBQC to prepare a quantum state, to perform a measurement on a quantum system, or to process quantum inputs into quantum outputs.
– Quantum prover interactive proofs. UBQC can be used to accomplish an interactive proof for any language in BQP, with a quantum prover and a nearly-classical verifier, where the verifier requires only the power to generate random qubits chosen from a fixed set. Moreover, UBQC can be adapted to provide a two-prover interactive proof for any problem in BQP with a purely classical verifier. The modification requires that the provers share entanglement but otherwise be unable to communicate. Guided by the verifier, the first prover measures his part of the entanglement in order to create a shared resource between the verifier and the second prover. The remainder of the interaction involves the verifier and the second prover, who essentially run the main protocol.

In the classical world, Feigenbaum introduced the notion of computing with encrypted data [Fei86], according to which a function f is encryptable if Alice can easily transform an instance x into an instance x′, obtain f(x′) from Bob, and efficiently compute f(x) from f(x′), in such a way that Bob cannot infer x from x′. Following this, Abadi, Feigenbaum and Kilian [AFK89] gave an impossibility result: no NP-hard function can be computed with encrypted data (even probabilistically and with polynomial interaction), unless the polynomial hierarchy collapses at the third level. Ignoring the blindness requirement of UBQC yields an interactive proof with a BQP prover and a nearly-classical verifier. This scenario was first proposed in the work of [ABE10], using very different techniques based on authentication schemes. Their protocol can also be used for blind quantum computation. However, their scheme requires that Alice have quantum computational resources and memory to act on a constant-sized register. A related classical protocol for the scenario involving a P prover and a nearly-linear-time verifier was given in [GKR08].
Returning to the cryptographic scenario, still in the model where the function is classical and public, Arrighi and Salvail [AS06] gave an approach using quantum resources. The idea of their protocol is that Alice gives Bob multiple quantum inputs, most of which are decoys. Bob applies the target function on all inputs, and then Alice verifies his behaviour on the decoys. There are two important points to make here. First, the protocol only works for a restricted set of classical functions called random verifiable: it must be possible for Alice to efficiently generate random input–output pairs. Second, the protocol does not prevent Bob from learning Alice's private input; it provides only cheat sensitivity. The case of a blind quantum computation was first considered by Childs [Chi05], based on the idea of encrypting input qubits with a quantum one-time pad [AMTW00, BR03]. At each step, Alice sends the encrypted qubits to Bob, who applies a known quantum gate (some gates requiring further interaction with Alice). Bob returns the quantum state, which Alice decrypts using her key. Cycling through a fixed set of universal gates ensures that Bob learns nothing about the circuit. The protocol requires fault-tolerant quantum memory and the ability to apply local Pauli operators at each step, and does not provide any method for the detection of malicious errors. The UBQC protocol [BFK09] is the first protocol for universal blind quantum computation where Alice has no quantum memory; it works for any quantum circuit and assumes only that Alice has a classical computer, augmented with the power to prepare single qubits randomly chosen in
$$\{\tfrac{1}{\sqrt{2}}(|0\rangle + e^{i\theta}|1\rangle) \mid \theta = 0, \pi/4, 2\pi/4, \ldots, 7\pi/4\}.$$
The required quantum and classical communication between Alice and Bob is linear in the size of Alice's desired quantum circuit. Interestingly, it is sufficient for our purposes to restrict Alice's classical feedforward computation to modulo 8 arithmetic! Similar observations in a non-cryptographic context have been made in [AB09]. Except for an unavoidable leakage of the size of Alice's data [AFK89], Alice's privacy is perfect. We provide an authentication technique to detect an interfering Bob with overwhelming probability; this is optimal, since there is always an exponentially small probability that Bob can guess a path that will make Alice accept. All previous protocols for blind quantum computation require technology for Alice that is today unavailable: Arrighi and Salvail's protocol requires multi-qubit preparations and measurements, Childs' protocol requires fault-tolerant quantum memory and the ability to apply local Pauli operators at each step, while Aharonov, Ben-Or and Eban's protocol requires a constant-sized quantum computer with memory. In sharp contrast to this, from Alice's point of view, UBQC can be implemented with physical systems that are already available and well-developed. The required apparatus can be achieved by making only minor modifications to equipment used in the BB84 key exchange protocol [BB84].
9
The Brickwork States
The family of graph states called cluster states [RB01] is universal for MBQC; however, the method that allows arbitrary computation on the cluster state consists in first tailoring the cluster state to the specific computation by performing some computational basis measurements. If one were to use this principle, or any arbitrary graph states, for blind quantum computing, Alice would have to reveal information about the structure of the underlying graph. Instead, UBQC uses a new family of states called the brickwork states (Figure 3), which are universal for X–Y plane measurements and thus do not require the initial computational basis measurements. Other universal graph states that do not require initial computational basis measurements have appeared in [CLN05b].

Definition 19. A brickwork state $G_{n \times m}$, where $m \equiv 1$ or $5 \pmod 8$, is an entangled state of $n \times m$ qubits constructed as follows (see also Figure 3):
1. Prepare all qubits in state $|+\rangle$ and assign to each qubit an index $(i, j)$, $i$ being a column ($i \in [n]$) and $j$ being a row ($j \in [m]$).
2. For each row, apply the operator $\wedge Z$ on qubits $(i, j)$ and $(i, j+1)$ where $1 \leq j \leq m-1$.
3. For each column $j \equiv 3 \pmod 8$ and each odd row $i$, apply the operator $\wedge Z$ on qubits $(i, j)$ and $(i+1, j)$ and also on qubits $(i, j+2)$ and $(i+1, j+2)$.
4. For each column $j \equiv 7 \pmod 8$ and each even row $i$, apply the operator $\wedge Z$ on qubits $(i, j)$ and $(i+1, j)$ and also on qubits $(i, j+2)$ and $(i+1, j+2)$.
Fig. 3. The brickwork state $G_{n \times m}$. Qubits $|\psi_{x,y}\rangle$ ($x = 1, \ldots, n$, $y = 1, \ldots, m$) are arranged according to layer $x$ and row $y$, corresponding to the vertices in the above graph, and are originally in the $|+\rangle = \frac{1}{\sqrt{2}}|0\rangle + \frac{1}{\sqrt{2}}|1\rangle$ state. Controlled-$Z$ gates are then performed between qubits which are joined by an edge.
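To make Definition 19 concrete, the following sketch (our illustration, not from the source; the definition's row/column labels are used somewhat loosely there, so we simply follow steps 2–4 literally, with $i \in [n]$ and $j \in [m]$) enumerates the pairs of qubits on which $\wedge Z$ is applied:

```python
def brickwork_edges(n, m):
    """Enumerate the pairs of qubits joined by a ctrl-Z gate in G_{n x m},
    following steps 2-4 of Definition 19 literally (i in [n], j in [m])."""
    edges = set()
    # Step 2: join each qubit to its successor along the j direction.
    for i in range(1, n + 1):
        for j in range(1, m):
            edges.add(((i, j), (i, j + 1)))
    # Steps 3 and 4: the "brick" edges at j = 3 and j = 7 (mod 8).
    for j in range(1, m + 1):
        for i in range(1, n):
            if (j % 8 == 3 and i % 2 == 1) or (j % 8 == 7 and i % 2 == 0):
                edges.add(((i, j), (i + 1, j)))
                if j + 2 <= m:
                    edges.add(((i, j + 2), (i + 1, j + 2)))
    return edges

print(len(brickwork_edges(4, 9)))   # the G_{9,4}-sized state used in Figure 9
```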
Theorem 7 (Universality). The brickwork state $G_{n \times m}$ is universal for quantum computation. Furthermore, we only require single-qubit measurements under the angles $\{0, \pm\pi/4, \pm\pi/2\}$, and measurements can be done layer-by-layer.
Proof. It is well-known that the set $U = \{\wedge X, H, Z(\pi/4)\}$ is a universal set of gates, where $\wedge X$ denotes the controlled-$X$ operator; we will show how the brickwork state can be used to compute any gate in $U$. Recall the rotation transformations: $X(\theta) = e^{\frac{i\theta X}{2}}$ and $Z(\theta) = e^{\frac{i\theta Z}{2}}$. Consider the measurement pattern and underlying graph state given in Figure 4. The implicit required corrections are implemented according to the flow condition [DK06], which guarantees determinism and allows measurements to be performed layer-by-layer. The action of the measurement of the first three qubits on each wire is given by the rotations in the right-hand part of Figure 4 [BB06]. The circuit identity follows since $\wedge Z$ commutes with $Z(\alpha)$ and is self-inverse. By assigning specific values to the angles, we get the Hadamard gate (Figure 5), the $Z(\pi/4)$ gate (Figure 6) and the identity (Figure 7). By symmetry, we can get $H$ or $Z(\pi/4)$ acting on logical qubit 2 instead of logical qubit 1. In Figure 8, we give a pattern and show using circuit identities that it implements a $\wedge X$. The verification of the circuit identities is straightforward. Again by symmetry, we can reverse the control and target qubits. Note that as long as we can implement $\wedge X$ between any pair of neighbouring qubits, we can implement $\wedge X$ between qubits that are further apart. We now show how we can tile the patterns given in Figures 4 through 8 (the underlying graph states are the same) to implement any circuit using $U$ as a universal set of gates. In Figure 9, we show how a 4-qubit circuit with three gates, $U_1$, $U_2$ and $U_3$ (each gate acting on a maximum of two adjacent qubits), can be implemented on the brickwork state $G_{9,4}$. We have completed the top and bottom logical wires with a pattern that implements the identity. Generalising this technique, we get the family of brickwork states as given in Figure 3 and Definition 19. Here we only consider approximate universality. This allows us to restrict the angles of preparation and measurement to a finite set and hence simplify the description of the protocol. However, one can easily extend UBQC to achieve exact universality as well, provided Alice can communicate real numbers to Bob.
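The two facts about $\wedge Z$ used in this proof are easy to confirm numerically; a small check (ours, for illustration):

```python
import numpy as np

def Z_rot(a):
    """Z(a) = e^{iaZ/2} as a 2x2 matrix."""
    return np.diag([np.exp(1j * a / 2), np.exp(-1j * a / 2)])

CZ = np.diag([1, 1, 1, -1]).astype(complex)      # the two-qubit ctrl-Z gate
ZA = np.kron(Z_rot(0.7), np.eye(2))              # Z(alpha) on the first qubit

assert np.allclose(CZ @ ZA, ZA @ CZ)             # ctrl-Z commutes with Z(alpha)
assert np.allclose(CZ @ CZ, np.eye(4))           # ctrl-Z is self-inverse
```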
Fig. 4. Pattern with arbitrary rotations. Squares indicate output qubits.
Fig. 5. Implementation of a Hadamard gate
Fig. 6. Implementation of a Z(π/4) gate
Fig. 7. Implementation of the identity
Fig. 8. Implementation of a ∧X
Fig. 9. Tiling for a 4-qubit circuit with three gates
10
The UBQC Protocol
Suppose Alice has in mind a unitary operator $U$ that is implemented with a pattern on a brickwork state $G_{n \times m}$ (Figure 3), with measurements given as multiples of $\pi/4$. This pattern could have been designed either directly in MBQC
or from a circuit construction. Each qubit $|\psi_{x,y}\rangle \in G_{n \times m}$ is indexed by a column $x \in \{1, \ldots, n\}$ and a row $y \in \{1, \ldots, m\}$. Thus each qubit is assigned: a measurement angle $\phi_{x,y}$, a set of $X$-dependencies $D_{x,y} \subseteq [x-1] \times [m]$, and a set of $Z$-dependencies $D'_{x,y} \subseteq [x-1] \times [m]$. Here, we assume that the dependency sets $X_{x,y}$ and $Z_{x,y}$ are obtained via the flow construction [DK06]. During the execution of the pattern, the actual measurement angle $\phi'_{x,y}$ is a modification of $\phi_{x,y}$ that depends on previous measurement outcomes in the following way: let $s^X_{x,y} = \bigoplus_{i \in D_{x,y}} s_i$ be the parity of all measurement outcomes for qubits in $X_{x,y}$ and, similarly, let $s^Z_{x,y} = \bigoplus_{i \in D'_{x,y}} s_i$ be the parity of all measurement outcomes for qubits in $Z_{x,y}$. Then
$$\phi'_{x,y} = (-1)^{s^X_{x,y}} \phi_{x,y} + s^Z_{x,y}\,\pi\,.$$
Protocol 1 implements a blind quantum computation for $U$. Note that we assume that Alice's input to the computation is built into $U$. In other words, Alice wishes to compute $U|0\rangle$; her input is classical and the first layers of $U$ may depend on it.
Protocol 1. Universal Blind Quantum Computation

1. Alice's preparation
   For each column $x = 1, \ldots, n$ and each row $y = 1, \ldots, m$:
   1.1 Alice prepares $|\psi_{x,y}\rangle \in_R \{|+_{\theta_{x,y}}\rangle = \frac{1}{\sqrt{2}}(|0\rangle + e^{i\theta_{x,y}}|1\rangle) \mid \theta_{x,y} = 0, \pi/4, \ldots, 7\pi/4\}$ and sends the qubits to Bob.
2. Bob's preparation
   2.1 Bob creates an entangled state from all received qubits, according to their indices, by applying $\wedge Z$ gates between the qubits in order to create a brickwork state $G_{n \times m}$ (see Definition 19).
3. Interaction and measurement
   For each column $x = 1, \ldots, n$ and each row $y = 1, \ldots, m$:
   3.1 Alice computes $\phi'_{x,y}$, where $s^X_{0,y} = s^Z_{0,y} = 0$.
   3.2 Alice chooses $r_{x,y} \in_R \{0, 1\}$ and computes $\delta_{x,y} = \phi'_{x,y} + \theta_{x,y} + \pi r_{x,y}$.
   3.3 Alice transmits $\delta_{x,y}$ to Bob. Bob measures in the basis $\{|+_{\delta_{x,y}}\rangle, |-_{\delta_{x,y}}\rangle\}$.
   3.4 Bob transmits the result $s_{x,y} \in \{0, 1\}$ to Alice.
   3.5 If $r_{x,y} = 1$ above, Alice flips $s_{x,y}$; otherwise she does nothing.
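To illustrate how little classical power Alice needs, here is a sketch (ours; the names are illustrative, not from the source) of her side of steps 3.2–3.5 for a single qubit — all arithmetic is modulo $2\pi$, and in fact only multiples of $\pi/4$ ever occur:

```python
import math
import random

def alice_round(phi_prime, theta):
    """Steps 3.2-3.5 for one qubit: compute the instruction delta for Bob
    and return, with it, the correction Alice applies to his outcome."""
    r = random.randint(0, 1)                          # r_{x,y} chosen in {0, 1}
    delta = (phi_prime + theta + math.pi * r) % (2 * math.pi)
    return delta, (lambda s: s ^ r)                   # flip s_{x,y} iff r = 1
```

The bookkeeping that produces $\phi'_{x,y}$ from $\phi_{x,y}$ and earlier outcomes is the flow construction of [DK06] and is likewise purely classical.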
The universality of Protocol 1 follows from the universality of the brickwork state for measurement-based quantum computing. Correctness refers to the fact that the outcome of the protocol is the same as the outcome if Alice had run the pattern herself. The fact that Protocol 1 correctly computes $U|0\rangle$ follows from the commutativity of Alice's rotations and Bob's measurements in the rotated bases. This is formalised below.

Theorem 8 (Correctness). Assume Alice and Bob follow the steps of Protocol 1. Then the outcome is correct.

Proof. Firstly, since $\wedge Z$ commutes with $Z$-rotations, steps 1 and 2 do not change the underlying graph state; only the phase of each qubit is locally changed, and it
is as if Bob had done the $Z$-rotation after the $\wedge Z$. Secondly, since a measurement in the $\{|+_\phi\rangle, |-_\phi\rangle\}$ basis on a state $|\psi\rangle$ is the same as a measurement in the $\{|+_{\phi+\theta}\rangle, |-_{\phi+\theta}\rangle\}$ basis on $Z(\theta)|\psi\rangle$, and since $\delta = \phi' + \theta + \pi r$, if $r = 0$, Bob's measurement has the same effect as Alice's target measurement; if $r = 1$, all Alice needs to do is flip the outcome.

We now define and prove the security of the protocol. Intuitively, we wish to prove that whatever Bob chooses to do (including arbitrary deviations from the protocol), his knowledge about Alice's quantum computation does not increase. Note, however, that Bob does learn the dimensions of the brickwork state, giving an upper bound on the size of Alice's computation. This is unavoidable: a simple adaptation of the proof of Theorem 2 from [AFK89] confirms this. We incorporate this notion of leakage in our definition of blindness. A quantum delegated computation protocol is a protocol by which Alice interacts quantum mechanically with Bob in order to obtain the result of a computation, $U(x)$, where $X = (\tilde{U}, x)$ is Alice's input, with $\tilde{U}$ being a description of $U$.

Definition 20. Let P be a quantum delegated computation on input $X$ and let $L(X)$ be any function of the input. We say that a quantum delegated computation protocol is blind while leaking at most $L(X)$ if, on Alice's input $X$, for any fixed $Y = L(X)$, the following two hold when given $Y$:
1. The distribution of the classical information obtained by Bob in P is independent of $X$.
2. Given the distribution of classical information described in 1, the state of the quantum system obtained by Bob in P is fixed and independent of $X$.

Definition 20 captures the intuitive notion that Bob's view of the protocol should not depend on $X$ (when given $Y$); since his view consists of classical and quantum information, this means that the distribution of the classical information should not depend on $X$ (given $Y$) and that, for any fixed choice of the classical information, the state of the quantum system should be uniquely determined and not depend on $X$ (given $Y$). We are now ready to state and prove our main theorem. Recall that in Protocol 1, $(n, m)$ is the dimension of the brickwork state.

Theorem 9 (Blindness). Protocol 1 is blind while leaking at most $(n, m)$.

Proof. Let $(n, m)$ (the dimension of the brickwork state) be given. Note that the universality of the brickwork state guarantees that Bob's creation of the graph state does not reveal anything about the underlying computation (except $n$ and $m$). Alice's input consists of $\phi = (\phi_{x,y} \mid x \in [n], y \in [m])$, with the actual measurement angles $\phi' = (\phi'_{x,y} \mid x \in [n], y \in [m])$
being a modification of $\phi$ that depends on previous measurement outcomes. Let the classical information that Bob gets during the protocol be $\delta = (\delta_{x,y} \mid x \in [n], y \in [m])$ and let $A$ be the quantum system initially sent from Alice to Bob. To show independence of Bob's classical information, let $\theta'_{x,y} = \theta_{x,y} + \pi r_{x,y}$ (for a uniformly random chosen $\theta_{x,y}$) and $\theta' = (\theta'_{x,y} \mid x \in [n], y \in [m])$. We have $\delta = \phi' + \theta'$, with $\theta'$ being uniformly random (and independent of $\phi$ and/or $\phi'$), which implies the independence of $\delta$ and $\phi$. As for Bob's quantum information, first fix an arbitrary choice of $\delta$. Because $r_{x,y}$ is uniformly random, for each qubit of $A$, one of the following two has occurred:
1. $r_{x,y} = 0$, so $\delta_{x,y} = \phi'_{x,y} + \theta_{x,y}$ and $|\psi_{x,y}\rangle = \frac{1}{\sqrt{2}}(|0\rangle + e^{i(\delta_{x,y} - \phi'_{x,y})}|1\rangle)$.
2. $r_{x,y} = 1$, so $\delta_{x,y} = \phi'_{x,y} + \theta_{x,y} + \pi$ and $|\psi_{x,y}\rangle = \frac{1}{\sqrt{2}}(|0\rangle - e^{i(\delta_{x,y} - \phi'_{x,y})}|1\rangle)$.
Since $\delta$ is fixed, $\theta$ depends on $\phi'$ (and thus on $\phi$), but since $r_{x,y}$ is independent of everything else, without knowledge of $r_{x,y}$ (i.e. taking the partial trace of the system over Alice's secret), $A$ consists of copies of the two-dimensional completely mixed state, which is fixed and independent of $\phi$.

There are two malicious scenarios that are covered by Definition 20 and that we explicitly mention here. Suppose Bob has some prior knowledge, given as some a priori distribution on Alice's input $X$. Since Definition 20 applies to any distribution of $X$, we can simply apply it to the conditional distribution representing the distribution of $X$ given Bob's a priori knowledge; we conclude that Bob does not learn any information on $X$ beyond what he already knows, as well as what is leaked. The second scenario concerns a Bob whose goal is to find Alice's output. Definition 20 forbids this: learning information on the output would imply learning information on Alice's input. Note that the protocol does not allow Alice to reveal to Bob whether or not she accepts the result of the computation, as this bit of information could be exploited by Bob to learn some information about the actual computation. In this scenario, Protocol 4 can be used instead.
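The key step of the proof — that $\delta = \phi' + \theta'$ is uniformly distributed whatever $\phi'$ is — can also be checked exhaustively; a sketch (ours), working in units of $\pi/4$ so that angles are integers modulo 8:

```python
from collections import Counter

def delta_distribution(phi_prime):
    """Distribution of delta = phi' + theta + 4r (mod 8) over uniform theta, r."""
    return Counter((phi_prime + theta + 4 * r) % 8
                   for theta in range(8) for r in (0, 1))

# For every phi', delta is uniform over the 8 angles: Bob learns nothing.
assert all(delta_distribution(p) == delta_distribution(0) for p in range(8))
```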
11
Quantum Inputs and Outputs
We can slightly modify Protocol 1 to deal with both quantum inputs and outputs. In the former case, no extra channel resources are required, while the latter case requires a quantum channel from Bob to Alice in order for him to return the output qubits. Alice will also need to be able to apply $X$ and $Z$ Pauli operators in order to undo the quantum one-time pad. Note that these protocols can be combined to obtain a protocol for quantum inputs and outputs. Consider the scenario where Alice's input is in the form of $m$ physical qubits and she has no efficient classical description of the inputs to be able to incorporate
it into Protocol 1. In this case, she needs to be able to apply local Pauli-$X$ and Pauli-$Z$ operators to implement a full one-time pad over the input qubits. The first layer of measurements is adapted to undo the Pauli-$X$ operation if necessary. By the quantum one-time pad, Theorem 8 and Theorem 9, this modified protocol, given in Protocol 2, is still correct and private. Here we assume that Alice already has the quantum inputs in her hands: unless she receives the inputs one-by-one, she requires some quantum memory for this initial step. She also needs to be able to apply the single-qubit gates as described above. Note that this is only asking slightly more than Alice choosing between four single-qubit gates, which would be the minimum required in any blind quantum computation protocol with quantum inputs.

Protocol 2. Universal Blind Quantum Computation with Quantum Inputs

1. Alice's input preparation
   For the input column ($x = 0$, $y = 1, \ldots, m$) corresponding to Alice's input:
   1.1 Alice applies $Z_{0,y}(\theta_{0,y})$ for $\theta_{0,y} \in_R \{0, \pi/4, 2\pi/4, \ldots, 7\pi/4\}$.
   1.2 Alice chooses $i_{0,y} \in_R \{0, 1\}$ and applies $X_{0,y}^{i_{0,y}}$. She sends the qubits to Bob.
2. Alice's auxiliary preparation
   For each column $x = 1, \ldots, n$ and each row $y = 1, \ldots, m$:
   2.1 Alice prepares $|\psi_{x,y}\rangle \in_R \{|+_{\theta_{x,y}}\rangle \mid \theta_{x,y} = 0, \pi/4, 2\pi/4, \ldots, 7\pi/4\}$ and sends the qubits to Bob.
3. Bob's preparation
   3.1 Bob creates an entangled state from all received qubits, according to their indices, by applying $\wedge Z$ gates between the qubits in order to create a brickwork state $G_{(n+1) \times m}$.
4. Interaction and measurement
   For each column $x = 0, \ldots, n$ and each row $y = 1, \ldots, m$:
   4.1 Alice computes $\phi'_{x,y}$, with the special case $\phi'_{0,y} = (-1)^{i_{0,y}} \phi_{0,y}$.
   4.2 Alice chooses $r_{x,y} \in_R \{0, 1\}$ and computes $\delta_{x,y} = \phi'_{x,y} + \theta_{x,y} + \pi r_{x,y}$.
   4.3 Alice transmits $\delta_{x,y}$ to Bob.
   4.4 Bob measures in the basis $\{|+_{\delta_{x,y}}\rangle, |-_{\delta_{x,y}}\rangle\}$.
   4.5 Bob transmits the result $s_{x,y} \in \{0, 1\}$ to Alice.
   4.6 If $r_{x,y} = 1$ above, Alice flips $s_{x,y}$; otherwise she does nothing.
Suppose Alice now requires a quantum output, for example in the case of blind quantum state preparation. In this scenario, instead of measuring the last layer of qubits, Bob returns it to Alice, who performs the final layer of Pauli corrections. The following theorem shows a privacy property on the quantum states that Bob manipulates.

Theorem 10. At every step of Protocol 1, Bob's quantum state is one-time padded.

Proof. During the execution of the protocol, the values of $s^X$ and $s^Z$ are unknown to Bob since they have been one-time padded using the random key $r$ at each
layer. Due to the flow construction [DK06], each qubit (starting at the third column) receives independent Pauli operators, which act as the full quantum one-time pad over Bob's state. Since our initial state is $|+\rangle$, and since the first layer performs a hidden $Z$-rotation, it follows that the qubits in the second layer are also completely encrypted during the computation.

This result, together with Theorems 8 and 9, proves the correctness and privacy of Protocol 3, which deals with quantum outputs.
Protocol 3. Universal Blind Quantum Computation with Quantum Outputs

1. Alice's auxiliary preparation
   For each column $x = 1, \ldots, n-1$ and each row $y = 1, \ldots, m$:
   1.1 Alice prepares $|\psi_{x,y}\rangle \in_R \{|+_{\theta_{x,y}}\rangle \mid \theta_{x,y} = 0, \pi/4, 2\pi/4, \ldots, 7\pi/4\}$ and sends the qubits to Bob.
2. Alice's output preparation
   2.1 Alice prepares the last column of qubits $|\psi_{n,y}\rangle = |+\rangle$ ($y = 1, \ldots, m$) and sends the qubits to Bob.
3. Bob's preparation
   3.1 Bob creates an entangled state from all received qubits, according to their indices, by applying $\wedge Z$ gates between the qubits in order to create a brickwork state $G_{n \times m}$.
4. Interaction and measurement
   For each column $x = 1, \ldots, n-1$ and each row $y = 1, \ldots, m$:
   4.1 Alice computes $\phi'_{x,y}$, where $s^X_{0,y} = s^Z_{0,y} = 0$ for the first column.
   4.2 Alice chooses $r_{x,y} \in_R \{0, 1\}$ and computes $\delta_{x,y} = \phi'_{x,y} + \theta_{x,y} + \pi r_{x,y}$.
   4.3 Alice transmits $\delta_{x,y}$ to Bob.
   4.4 Bob measures in the basis $\{|+_{\delta_{x,y}}\rangle, |-_{\delta_{x,y}}\rangle\}$.
   4.5 Bob transmits the result $s_{x,y} \in \{0, 1\}$ to Alice.
   4.6 If $r_{x,y} = 1$ above, Alice flips $s_{x,y}$; otherwise she does nothing.
5. Output correction
   5.1 Bob sends to Alice all qubits in the last layer.
   5.2 Alice performs the final Pauli corrections $Z^{s^Z_{n,y}} X^{s^X_{n,y}}$.
12
Authentication and Fault-Tolerance
We now focus on Alice’s ability to detect if Bob is not cooperating. There are two possible ways in which Bob can be uncooperative: he can refuse to perform the computation (this is immediately apparent to Alice), or he can actively interfere with the computation, while pretending to follow the protocol. It is this latter case that we focus on detecting. The authentication technique enables Alice to detect an interfering Bob with overwhelming probability (strictly speaking, either Bob’s interference is corrected and he is not detected, or his interference is detected with overwhelming probability). Note that this is the best that we can
hope for, since nothing prevents Bob from refusing to perform the computation. Bob could also be lucky and guess a path that Alice will accept. This happens with exponentially small probability, hence our technique is optimal.

In the case that Alice's computation has a classical output and that she does not require fault-tolerance, a simple protocol for blind quantum computing with authentication exists: execute Protocol 1 on a modification of Alice's target circuit: she adds $N$ randomly placed trap wires that are randomly in state $|0\rangle$ or $|1\rangle$ ($N$ is the number of qubits in the computation). If Bob interferes, either his interference has no effect on the classical output, or he will get caught with probability at least $\frac{1}{2}$ (he gets caught if Alice finds that the output of at least one trap wire is incorrect). The protocol is repeated $s$ times (the traps are randomly re-positioned each time); if Bob is not caught cheating, Alice accepts if all outputs are identical; otherwise she rejects. The probability of an incorrect output being accepted is at most $2^{-s}$. Protocol 4 is more general than this scheme since it works for quantum outputs and is fault-tolerant. If the above scheme is used for quantum inputs, they must be given to Alice as multiple copies. Similarly (but more realistically), if Protocol 4 is to be used on quantum inputs, these must already be given to Alice in an encoded form as in step 2 of Protocol 4 (because Alice has no quantum computational power). In the case of a quantum output, it will be given to Alice in a known encoded form, which she can pass on to a third party for verification.

The theory of quantum error correction provides a natural mechanism for detecting unintended changes to a computation, whereas the theory of fault-tolerant computation provides a way to process information even using error-prone gates. Unfortunately, error correction, even when combined with fault-tolerant gate constructions, is insufficient to detect malicious tampering if the error correction code is known. As evidenced by the quantum authentication protocol [BCG+02], error correction encodings can, however, be adapted for this purpose. UBQC proceeds along the following lines. Alice chooses an $n_C$-qubit error correction code $C$ with distance $d_C$. (The values of $n_C$ and $d_C$ are taken as security parameters.) If the original computation involves $N$ logical qubits, the authenticated version involves $N(n_C + 3n_T)$ (with $n_T = n_C$) logical qubits: throughout the computation, each logical qubit is encoded with $C$, while the remaining $3N n_T$ qubits are used as traps to detect an interfering Bob. The trap qubits are prepared as a first step of the computation in eigenstates of the Pauli operators $X$, $Y$ and $Z$, with an equal number of qubits in each state. The protocol also involves fault-tolerant gates, for some of which it is necessary to have Bob periodically measure qubits [ZCC07]. In order to accomplish this, the blind computation protocol is extended by allowing Alice to instruct Bob to measure specific qubits within the brickwork state in the computational basis at regular intervals. These qubits are chosen at regular spatial intervals so that no information about the structure of the computation is revealed. It should be noted that in Protocol 4, we allow Alice to reveal to Bob whether or not she accepts the final result.
Protocol 4. Blind Quantum Computing with Authentication (classical input and output)

1. Alice chooses $C$, where $C$ is some $n_C$-qubit error-correcting code with distance $d_C$. The security parameter is $d_C$.
2. In the circuit model, starting from the circuit for $U$, Alice converts the target circuit to a fault-tolerant circuit:
   2.1 Use the error-correcting code $C$. The encoding appears in the initial layers of the circuit.
   2.2 Perform all gates and measurements fault-tolerantly.
   2.3 Some computational basis measurements are required for the fault-tolerant implementation (for verification of ancillae and non-transversal gates). Each measurement is accomplished by making and measuring a pseudo-copy of the target qubit: a $\wedge X$ is performed from the target to an ancilla qubit initially set to $|0\rangle$, which is then measured in the $Z$-basis.
   2.4 Ancilla qubit wires are evenly spaced through the circuit.
   2.5 The ancillae are re-used. All ancillae are measured at the same time, at regular intervals, after each fault-tolerant gate (some outputs may be meaningless).
3. Within each encoded qubit, permute all wires, keeping these permutations secret from Bob.
4. Within each encoded qubit, add $3n_T$ randomly interspersed trap wires, each trap being a random eigenstate of $X$, $Y$ or $Z$ ($n_T$ of each). For security, we must have $n_T \propto n_C$; for convenience, we choose $n_T = n_C$. The trap qubit wire (at this point) does not interact with the rest of the circuit. The wire is initially $|0\rangle$, and then single-qubit gates are used to create the trap state. These single-qubit gates appear in the initial layers of the circuit.
5. Trap qubits are verified using the same ancillae as above: they are rotated into the computational basis, measured using the pseudo-copy technique above, and then returned to their initial basis.
6. Any fault-tolerant measurement is randomly interspersed with verification of $3n_T$ random trap wires. For this, identity gates are added as required.
7. For classical output, the trap wires are rotated as a last step, so that the following measurement in the computational basis is used for a final verification.
8. Convert the whole circuit above to a measurement-based computation on the brickwork state, with the addition of regular $Z$-basis measurements corresponding to the measurements on ancilla qubits above. Swap and identity gates are added as required, and trap qubits are left untouched.
9. Perform the blind quantum computation:
   9.1 Execute Protocol 1, to which we add that Alice periodically instructs Bob to measure in the $Z$-basis as indicated above.
   9.2 Alice uses the results of the trap qubit measurements to estimate the error rate; if it is below the threshold (see discussion in the main text), she accepts, otherwise she rejects.
UBQC can also be used in the scenario of non-malicious faults: because it already uses a fault-tolerant construction, the measurement of trap qubits in Protocol 4 allows for the estimation of the error rate (whether caused by the environment or by an adversary); if this error rate is below a certain threshold
(this threshold is chosen below the fault-tolerance threshold to take into account sampling errors), Alice accepts the computation. As long as this is below the fault-tolerance threshold, an adversary would still have to guess which qubits are part of the code and which are traps, so Theorem 13 also holds in the fault-tolerant version. The only difference is that the adversary can set off a few traps without being detected, but he must still be able to correctly guess which qubits are in the encoded qubit and which are traps. Increasing the security parameters will make up for the fact that Bob can set off a few traps without making the protocol abort. This yields a linear trade-off between the error rate and the security parameter. Note that the brickwork state (Figure 3) can be extended to multiple dimensions, which may be useful for obtaining better fault-tolerance thresholds [Got00]. While the quantum Singleton bound [KL00] allows error correction codes for which $d_C \propto n_C$, it may be more convenient to use the toric code [Kit97], for which $d_C \propto \sqrt{n_C}$, as this represents a rather simple encoding while retaining a high ratio of $d_C$ to $n_C$. For the special case of deterministic classical output, a classical repetition code is sufficient and preferable, as such an encoding maximises $n_C$.

Theorem 11 (Fault Tolerance). Protocol 4 is fault-tolerant.

Proof. By construction, the circuit created in step 2 is fault-tolerant. Furthermore, the permutation of the circuit wires and insertion of trap qubits (steps 3 and 4) preserves the fault tolerance. This is due to the fact that qubits are permuted only within blocks of constant size. The fault-tolerant circuit given in step 2 can be written as a sequence of local gates and $\wedge X$ gates between neighbours. Clearly, permutation does not affect the fidelity of local operations. As qubits which are neighbours in the initial fault-tolerant circuit become separated by less than twice the number of qubits in a single block, the maximum number of nearest-neighbour $\wedge X$ gates required to implement a $\wedge X$ from the original circuit is in $O(n_C + 3n_T)$ (the size of a block). (If required, the multi-dimensional analogue of the two-dimensional brickwork state can be used in order to substantially reduce this distance.) As this upper bound is constant for a given implementation, a lower bound for the fault-tolerance threshold can be obtained simply by scaling the threshold such that the error rate for this worst-case $\wedge X$ is never more than the threshold for the original circuit. Thus, while the threshold is reduced, it remains non-zero. Step 8 converts the fault-tolerant circuit to a measurement pattern; it is known that this transformation retains the fault-tolerance property [ND05, AL06]. Finally, in step 9, distributing the fault-tolerant measurement pattern between Alice and Bob does not disturb the fault tolerance, since the communication between them is only classical.

Theorem 12 (Blindness). Protocol 4 is blind while leaking at most $(n, m)$.

Proof. Protocol 4 differs from Protocol 1 in the following two ways: Alice instructs Bob to perform regular $Z$-basis measurements, and she reveals whether she accepts or rejects the computation. It is known that $Z$ measurements
change the underlying graph state into a new graph state [HEB04]. The $Z$ measurements in the protocol are inserted at regular intervals, and their number is independent of the underlying circuit computation. Therefore their action transforms the generic brickwork state into another generic resource, still independent of Alice's input, and the blindness property is obtained via the same proof as Theorem 9. Finally, from Alice's decision to accept or reject, only information relating to the trap qubits is revealed to Bob, since Alice rejects if and only if the estimated error rate is too high. The trap qubits are uncorrelated with the underlying computation (in the circuit picture, they do not interact with the rest of the circuit) and hence they reveal no information about Alice's input.

In the following theorem, for simplicity, we consider the scenario with zero error rate; a proof for the full fault-tolerant version is similar.

Theorem 13 (Authentication). For the zero-error case of Protocol 4, if Bob interferes with an authenticated computation, then either he is detected except with exponentially small probability (in the security parameter), or his actions fail to alter the computation.

Proof. If Bob interferes with the computation, then in order for his actions to affect the outcome of the computation without being detected, he must perform a non-trivial operation (i.e. an operation other than the identity) on the subspace in which the logical qubits are encoded. Due to the fault-tolerant construction of Alice's computation (Theorem 11), Bob's operation must have weight at least $d_C$. Due to discretisation of errors, we can treat Bob's action as introducing a Pauli error with some probability $p$. If a Pauli error acts non-trivially on a trap qubit, then the probability of this going undetected is $1/3$. Pauli operators which remain within the code space must act on at least $d_C$ qubits. As Bob has no knowledge about the roles of qubits (Theorem 12), the probability of him acting on any qubit is equal. As the probability of acting on a trap is at least $3n_T/(n_C + 3n_T)$, for each qubit upon which he acts non-trivially, the probability of Bob being detected is more than $2n_T/(n_C + 3n_T)$. Thus the probability of an $M$-qubit Pauli operator going undetected is below $(1 - 2n_T/(n_C + 3n_T))^M$. Since $n_T = n_C$, the probability of Bob affecting the computation and going undetected is at most $2^{-d_C}$.
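Plugging in the choice $n_T = n_C$ makes the per-qubit detection probability exactly $1/2$, so the bound collapses to $2^{-M} \le 2^{-d_C}$; a quick numerical check (ours, with illustrative parameter values):

```python
n_C = n_T = 32           # code length; the protocol takes n_T = n_C traps of each type
d_C = 16                 # code distance: an undetected attack has weight >= d_C

bound = (1 - 2 * n_T / (n_C + 3 * n_T)) ** d_C
assert bound == 2.0 ** -d_C   # with n_T = n_C the bound is exactly 2^(-d_C)
```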
13
Entangled Servers
We will close with a discussion of UBQC in the context of multi-prover interactive proofs. As stated before, one can view UBQC as an interactive proof system where Alice acts as the verifier and Bob as the prover. An important open problem is to find an interactive proof for any problem in BQP with a BQP prover, but with a purely classical verifier. Protocol 4 makes progress towards finding a solution by providing an interactive proof for any language in BQP, with a quantum prover and a BPP verifier that also has the power to generate random
qubits chosen from a fixed set and send them to the prover. This perspective was first proposed by Aharonov, Ben-Or and Eban [ABE10]; however, their scheme demands a more powerful verifier. Protocol 5 is a solution to another closely related problem, namely the case of a purely classical verifier interacting with two non-communicating entangled provers. The idea is to adapt Protocol 1 so that one prover (that we now call a server) is used to prepare the random qubits that would have been generated by Alice in the original protocol, while the other server is used for universal blind quantum computation. Using the authenticated protocol (Protocol 4) between Alice and the second server, Alice will detect any cheating server: any cheating by Server 1 is equivalent to a deviation from the protocol by Server 2, which is detected in step 2 of the protocol (the proof is directly obtained from Theorem 13). On the other hand, since Server 2 has access to only half of each entangled state, from his point of view his sub-system remains in a completely mixed state independently of Server 1's actions, and the blindness of the protocol is obtained directly from Theorem 12. This protocol acts as an interactive proof system for BQP, as an authenticated blind computation guarantees the correctness of the result except with exponentially small probability. Thus UBQC provides a means for a completely classical party to interactively verify the correctness of any quantum computation. While the UBQC protocol was the first to demonstrate new functionality via measurement-based computation, we expect that this novel model will provide fertile ground for future research.
Protocol 5. Universal Blind Quantum Computation with Entangled Servers

Initially, Servers 1 and 2 share $|\Phi^+_{x,y}\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ ($x = 1, \ldots, n$, $y = 1, \ldots, m$).

1. Alice's preparation with Server 1
   For each column $x = 1, \ldots, n$ and each row $y = 1, \ldots, m$:
   1.1 Alice chooses $\tilde{\theta}_{x,y} \in_R \{0, \pi/4, 2\pi/4, \ldots, 7\pi/4\}$ and sends it to Server 1, who measures his part of $|\Phi^+_{x,y}\rangle$ in $\{|\pm_{\tilde{\theta}_{x,y}}\rangle\}$.
   1.2 Server 1 sends $m_{x,y}$, the outcome of his measurement, to Alice.
2. Alice's computation with Server 2
   2.1 Alice runs the authenticated blind quantum computing protocol (Protocol 4) with Server 2, taking $\theta_{x,y} = \tilde{\theta}_{x,y} + m_{x,y}\pi$.
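The blindness argument against Server 2 rests on the fact that his half of each $|\Phi^+\rangle$ pair is completely mixed no matter what Server 1 does; a numpy check of the reduced state (our sketch):

```python
import numpy as np

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)  # (|00>+|11>)/sqrt(2)
rho = np.outer(phi_plus, phi_plus.conj())                      # joint state

# Server 2's view: trace out Server 1's qubit (the first tensor factor).
rho_2 = rho.reshape(2, 2, 2, 2).trace(axis1=0, axis2=2)
assert np.allclose(rho_2, np.eye(2) / 2)   # completely mixed, whatever Server 1 does
```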
Acknowledgments

We would like to thank our collaborators and co-authors in the series of papers that this chapter is based on: Vincent Danos and Prakash Panangaden.
References

[AB09] Anders, J., Browne, D.E.: Computational power of correlations. Physical Review Letters 102, 050502 (2009)
[ABE10] Aharonov, D., Ben-Or, M., Eban, E.: Interactive proofs for quantum computations. In: Proceedings of Innovations in Computer Science (ICS 2010), pp. 453–469 (2010)
[AFK89] Abadi, M., Feigenbaum, J., Kilian, J.: On hiding information from an oracle. Journal of Computer and System Sciences 39, 21–50 (1989)
[AJL06] Aharonov, D., Jones, V., Landau, Z.: A polynomial quantum algorithm for approximating the Jones polynomial. In: Proceedings of the 38th Annual ACM Symposium on Theory of Computing (STOC 2006), pp. 427–436 (2006)
[AL06] Aliferis, P., Leung, D.W.: Simple proof of fault tolerance in the graph-state model. Physical Review A 73 (2006)
[AMTW00] Ambainis, A., Mosca, M., Tapp, A., de Wolf, R.: Private quantum channels. In: Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS 2000), pp. 547–553 (2000)
[AS06] Arrighi, P., Salvail, L.: Blind quantum computation. International Journal of Quantum Information 4, 883–898 (2006)
[Bar84] Barendregt, H.P.: The Lambda Calculus, Its Syntax and Semantics. Studies in Logic. North-Holland, Amsterdam (1984)
[BB84] Brassard, G., Bennett, C.H.: Public key distribution and coin tossing. In: Proceedings of the IEEE International Conference on Computers, Systems and Signal Processing (1984)
[BB06] Browne, D.E., Briegel, H.J.: One-way quantum computation. In: Lectures on Quantum Information, pp. 359–380. Wiley-VCH, Berlin (2006)
[BCG+02] Barnum, H., Crépeau, C., Gottesman, D., Smith, A., Tapp, A.: Authentication of quantum messages. In: Proceedings of the 43rd Annual IEEE Symposium on Foundations of Computer Science (FOCS 2002), p. 449 (2002)
[BFK09] Broadbent, A., Fitzsimons, J., Kashefi, E.: Universal blind quantum computation. In: Proceedings of the 50th Annual Symposium on Foundations of Computer Science (FOCS 2009), pp. 517–527 (2009)
[BK09] Broadbent, A., Kashefi, E.: Parallelizing quantum circuits. Theoretical Computer Science 410, 2489–2510 (2009)
[BKMP07] Browne, D., Kashefi, E., Mhalla, M., Perdrix, S.: Generalized flow and determinism in measurement-based quantum computation. New Journal of Physics 9 (2007)
[BR03] Boykin, P.O., Roychowdhury, V.: Optimal encryption of quantum bits. Physical Review A 67, 042317 (2003)
[BV97] Bernstein, E., Vazirani, U.: Quantum complexity theory. SIAM Journal on Computing 26(5) (1997)
[Chi05] Childs, A.M.: Secure assisted quantum computation. Quantum Information and Computation 5, 456–466 (2005); initial version appeared online in 2001
[Cho75] Choi, M.D.: Completely positive linear maps on complex matrices. Linear Algebra and its Applications 10 (1975)
[CLN05a] Childs, A.M., Leung, D.W., Nielsen, M.A.: Unified derivations of measurement-based schemes for quantum computation. Physical Review A 71 (2005), quant-ph/0404132
[CLN05b] Childs, A.M., Leung, D.W., Nielsen, M.A.: Unified derivations of measurement-based schemes for quantum computation. Physical Review A 71, 032318 (2005)
[DAB03] Dür, W., Aschauer, H., Briegel, H.J.: Multiparticle entanglement purification for graph states. Physical Review Letters 91 (2003), quant-ph/0303087
[Deu85] Deutsch, D.: Quantum theory, the Church–Turing principle and the universal quantum computer. Proceedings of the Royal Society of London A 400 (1985)
[Deu89] Deutsch, D.: Quantum computational networks. Proceedings of the Royal Society of London A 425 (1989)
[DK06] Danos, V., Kashefi, E.: Determinism in the one-way model. Physical Review A 74, 052310 (2006)
[DKP05] Danos, V., Kashefi, E., Panangaden, P.: Parsimonious and robust realizations of unitary maps in the one-way model. Physical Review A 72 (2005)
[DKP07] Danos, V., Kashefi, E., Panangaden, P.: The measurement calculus. Journal of the ACM 54, article 8 (2007)
[DS96] Dürr, C., Santha, M.: A decision procedure for unitary linear quantum cellular automata. In: Proceedings of FOCS 1996 – Symposium on Foundations of Computer Science (1996), quant-ph/9604007
[Fei86] Feigenbaum, J.: Encrypting problem instances: Or... can you take advantage of someone without having to trust him? In: Williams, H.C. (ed.) CRYPTO 1985. LNCS, vol. 218, pp. 477–488. Springer, Heidelberg (1986)
[GC99] Gottesman, D., Chuang, I.L.: Quantum teleportation is a universal computational primitive. Nature 402 (1999)
[GKR08] Goldwasser, S., Kalai, Y.T., Rothblum, G.N.: Delegating computation: interactive proofs for muggles. In: Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC 2008), pp. 113–122 (2008)
[GOD+06] Greentree, A.D., Olivero, P., Draganski, M., Trajkov, E., Rabeau, J.R., Reichart, P., Gibson, B.C., Rubanov, S., Huntington, S.T., Jamieson, D.N., Prawer, S.: Critical components for diamond-based quantum coherent devices. Journal of Physics: Condensed Matter 18, 825–842 (2006)
[Got97] Gottesman, D.: Stabilizer codes and quantum error correction. PhD thesis, California Institute of Technology (1997)
[Got00] Gottesman, D.: Fault-tolerant quantum computation with local gates. Journal of Modern Optics 47, 333–345 (2000)
[HEB04] Hein, M., Eisert, J., Briegel, H.J.: Multi-party entanglement in graph states. Physical Review A 69 (2004)
[Kit97] Kitaev, A.Y.: Quantum computations: algorithms and error correction. Russian Mathematical Surveys 52, 1191–1249 (1997)
[KL00] Knill, E., Laflamme, R.: A theory of quantum error-correcting codes. Physical Review Letters 84, 2525 (2000)
[MS08] Markham, D., Sanders, B.C.: Graph states for quantum secret sharing. Physical Review A 78, 042309 (2008)
[NC00] Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2000)
[ND05] Nielsen, M.A., Dawson, C.M.: Fault-tolerant quantum computation with cluster states. Physical Review A 71 (2005)
[Nie03] Nielsen, M.A.: Universal quantum computation using only projective measurement, quantum memory and preparation of the 0 state. Physics Letters A 308 (2003)
[Per95] Peres, A.: Quantum Theory: Concepts and Methods. Kluwer Academic Publishers, Dordrecht (1995)
[Pre98] Preskill, J.: Fault-tolerant quantum computation. In: Lo, H.K., Popescu, S., Spiller, T.P. (eds.) Introduction to Quantum Computation and Information. World Scientific, Singapore (1998)
[RB01] Raussendorf, R., Briegel, H.J.: A one-way quantum computer. Physical Review Letters 86 (2001)
[RBB03] Raussendorf, R., Browne, D.E., Briegel, H.J.: Measurement-based quantum computation on cluster states. Physical Review A 68 (2003)
[RHG06] Raussendorf, R., Harrington, J., Goyal, K.: A fault-tolerant one-way quantum computer. Annals of Physics 321, 2242–2270 (2006)
[RSA78] Rivest, R.L., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM 21, 120–126 (1978)
[Sel04] Selinger, P.: Towards a quantum programming language. Mathematical Structures in Computer Science 14(4) (2004)
[Sel05] Selinger, P. (ed.): Proceedings of the 3rd International Workshop on Quantum Programming Languages. ENTCS (2005)
[Sho97] Shor, P.W.: Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Journal on Computing 26, 1484–1509 (1997); first published in 1995
[SW04] Schumacher, B., Werner, R.F.: Reversible quantum cellular automata (2004), quant-ph/0405174
[Unr05] Unruh, D.: Quantum programs with classical output streams. In: Selinger [Sel05] (2005)
[vD96] van Dam, W.: Quantum cellular automata. Master's thesis, Computer Science, Nijmegen (1996)
[Wat95] Watrous, J.: On one-dimensional quantum cellular automata. In: Proceedings of FOCS 1995 – Symposium on Foundations of Computer Science (1995)
[ZCC07] Zeng, B., Cross, A., Chuang, I.L.: Transversality versus universality for additive quantum codes (2007), arXiv:0706.1382v3 (quant-ph)
Information Theory and Security: Quantitative Information Flow

Pasquale Malacaria and Jonathan Heusser

School of Electronic Engineering and Computer Science, Queen Mary University of London
Abstract. We present the information theoretical basis of Quantitative Information Flow. We show the relationship between lattices, partitions and information theoretical concepts and their applicability to quantify leakage of confidential information in programs, including looping programs. We also report on recent works that use these ideas to build tools for the automatic quantitative analysis of programs. The applicability of this information theoretical framework to the wider context of network protocols and the use of Lagrange multipliers in this setting is also demonstrated.
1
Introduction
Computational systems (a general term) have two basic properties: first, they process information; second, they allow for observations of this processing to be made. For example, a program will typically process the inputs and allow its output to be observed, for example on a screen. In a distributed system each unit processes information and will allow some observation to be made by the environment or other units, for example by message passing. In an election, voters cast their vote, officials count the votes and the public can observe the election result. The broad goal of our research is to develop theories and techniques to quantify the information leaked by components of a computational system through these observations. The information we are interested in quantifying is the one coming from some designated sources; for example, in a system processing secret data we may want to quantify how much is leaked by observations available to the general public. In an election we may want to quantify how much about the choice of individual voters is leaked by the result; consider for example the extreme case when a candidate gets all votes: then the whole secret is revealed, a total loss of anonymity. Several terms can be used for essentially the same concept: quantification of leakage, quantitative information flow, quantification of interference, quantitative analysis of dependencies. The basic idea for this quantitative analysis has been proposed in various forms by a number of researchers. Given components A and B we can say that

Interference from A to B can be measured by the number of states of A we can distinguish given observations over B
Although this is a somewhat crude model, it does capture the essence of quantitative information flow. The model is crude because, while the number of distinguishable states is the essential quantity, their probabilities are also to be taken into account: the measure can be refined by information-theoretical notions like entropy. To see why probabilities are essential, consider a simple pin-checking program of a cash machine:

if (pin==guess) x=access else x=deny

The leakage of information about the secret pin by observing the output value of x greatly depends on the probability of the pin having a particular value. If the possible values of a 4-digit pin are equally likely then the leakage will be very low, whereas if the pin is very likely to take the same value as guess then the leakage will be unacceptable. Hence probabilities are an important component of a quantitative information flow analysis.
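Anticipating the entropy-based measure developed in Section 2.4, the following sketch (ours; the numbers and names are illustrative) quantifies the leakage of this check as the information the output x reveals about the pin, for a uniform prior and for a heavily skewed one:

```python
import math

def H(dist):
    """Shannon entropy (in bits) of a distribution given as {value: prob}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def leakage(pin_dist, guess):
    """H(pin) - H(pin | x): what observing x = (pin == guess) reveals."""
    p_hit = pin_dist.get(guess, 0.0)
    rest = {k: v / (1 - p_hit) for k, v in pin_dist.items() if k != guess}
    return H(pin_dist) - (1 - p_hit) * H(rest)   # H(pin | x = access) is zero

uniform = {pin: 1 / 10000 for pin in range(10000)}
skewed = {pin: 0.9 if pin == 1234 else 0.1 / 9999 for pin in range(10000)}

print(leakage(uniform, 1234))   # ~0.0015 bits of the 13.3 bits of the pin
print(leakage(skewed, 1234))    # ~0.47 bits: the guess is very likely right
```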
1.1
Structure of This Work
We will start by relating observations over a system with sets of distinguishable states and partitions. It will be shown how random variables in our context can be seen as partitions and that partitions over a set of states form a complete lattice. The concept of measure on lattice points is then discussed, and it will be argued that Shannon's entropy provides the best measure. This measure induces a pseudometric on the lattice points, and the derived equivalence classes (points of distance 0) can be seen, following Shannon, as the information tokens of the space. The second part of the work is devoted to applying this lattice-information theoretical framework to quantify leakage of sequential programs. Leakage of programs is defined in two steps: first, interpret programs as random variables, that is, partitions in the lattice of information, and then measure this partition using information theory. Splitting the definition in two steps is useful in that it allows us to use a unique framework for several measures of leakage, like Shannon's and those based on guessability. We will then see in detail how to quantify leakage of loops, using both an analytic approach and the partition approach. These approaches will be shown to be equivalent. In the third part we will describe recent work towards the automation of these ideas. These ideas, their relation to current verification and abstract interpretation tools and techniques, and the challenges in the implementation are discussed. We conclude with a short review of more advanced techniques, like Lagrange multipliers, in the general setting of probabilistic systems.
2
Basics

2.1
Observations and the Lattice of Information
Information Theory aims to measure the amount of information of random variables or of some sort of processes (mainly stochastic processes): what the information is about is not a concern of the theory; the measure is based on the number of distinctions available in an information context. As an example, consider the information-wise very different processes "flipping a coin" and "presidential election between two candidates". While the first is a rather inconsequential process and the second may have important consequences, they are both contexts allowing for two choices, hence they both have an information measure of (at most) 1 bit. In a context where $n$ choices are possible (a process with $n$ outcomes) the associated information is measured in terms of the number of bits needed to encode those possible choices, so it is at most $\log_2(n)$.

We can see an observation over a context (or a process) as partial information, that is, an observation reveals some information about a context. As an analogy, a witness observing a crime may notice that the criminal was a tall male. He may not be able to identify the criminal, but his observation will split the population into two groups: the possible criminals (tall males) and the non-suspects (all not-tall males). Hence the concept of observation ties in nicely with the one of information:

observation = partial information = sets of indistinguishable items

Notice that we can always define a minimal, least informative observation (nothing is distinguished) and a maximal, most informative observation (everything is distinguished). We will make an important determinacy assumption about observations, i.e. that an observation can be defined as a partition on the set of all possible states: a block in this partition is the set of states that are indistinguishable by that observation. This assumption is satisfied, for example, in the setting of sequential languages when we take the program outputs as observations. If we consider a more general probabilistic setting, like anonymity protocols, a more general framework should be considered [11] (see section 8).

2.2
Partitions and Equivalence Relations as Lattice Points
Given a system with a set of possible states $\Sigma$, the set of all possible partitions over $\Sigma$ is a complete lattice: the Lattice of Information (LoI) [24]. The order on partitions is given by refinement: a partition is above another if it is more informative, i.e. each block in the above partition is included in a block of the lower partition. An alternative view of the same structure is in terms of equivalence relations. Notice first that there is a simple translation between an equivalence relation and a partition: given an equivalence relation, define the partition whose blocks are its equivalence classes.
Let us define the set LoI, which stands for the set of all possible equivalence relations on a set $\Sigma$. The ordering of LoI is now defined as
$$\approx \;\sqsubseteq\; \sim \;\;\leftrightarrow\;\; \forall \sigma_1, \sigma_2\, (\sigma_1 \sim \sigma_2 \Rightarrow \sigma_1 \approx \sigma_2) \qquad (1)$$
where $\approx, \sim \in$ LoI and $\sigma_1, \sigma_2 \in \Sigma$. Furthermore, the join and meet lattice operations stand for the intersection of relations and the transitive closure of the union of relations, respectively. Thus, higher elements in the lattice can distinguish more states, while lower elements in the lattice can distinguish fewer. It easily follows from (1) that LoI is a complete lattice. We will assume this lattice to be finite; this is motivated by considering the information storable in program variables: such information is bounded by $2^k$, where $k$ is the number of bits of the secret variable.

We give a typical example of how these equivalence relations can be used in an information flow setting. Let us assume the set of states $\Sigma$ consists of tuples $(l, h)$ where $l$ is an observable, usually called low, variable and $h$ is a confidential, usually called high, variable. One possible observer can be described by the equivalence relation
$$(l_1, h_1) \approx (l_2, h_2) \leftrightarrow l_1 = l_2$$
That is, the observer can only distinguish two states whenever they differ on the low variable part. Clearly, a more powerful attacker is the one who can distinguish any two states from one another, or
$$(l_1, h_1) \sim (l_2, h_2) \leftrightarrow l_1 = l_2 \wedge h_1 = h_2$$
The $\sim$-observer gains more information than the $\approx$-observer by comparing states, therefore $\approx \sqsubseteq \sim$.
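Representing lattice points as partitions (sets of blocks), the order (1) and the join are a few lines of code; a sketch (ours) replaying the two observers above on one low and one high bit:

```python
def leq(P, Q):
    """P below Q: Q is the more informative partition, i.e. every block of Q
    is contained in some block of P (this is (1), read on partitions)."""
    return all(any(b <= a for a in P) for b in Q)

def join(P, Q):
    """The least upper bound: pairwise intersections of blocks."""
    return {a & b for a in P for b in Q} - {frozenset()}

states = [(l, h) for l in (0, 1) for h in (0, 1)]
low_view = {frozenset(s for s in states if s[0] == l) for l in (0, 1)}
full_view = {frozenset([s]) for s in states}

assert leq(low_view, full_view)                  # the low observer is below
assert join(low_view, full_view) == full_view    # full view already refines it
```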
2.3 Lattice of Information as a Lattice of Random Variables
A random variable (noted r.v.) is usually defined as a map $X : D \to \mathbb{R}$, where D is a finite set with a probability distribution and the real numbers $\mathbb{R}$ are the range of X. For each element $d \in D$, its probability will be denoted p(d). For every element x in the range we write $p(X = x)$ (or often in short $p(x)$) to mean the probability that X takes on the value x, i.e. $p(x) \stackrel{\mathrm{def}}{=} \sum_{d \in X^{-1}(x)} p(d)$. In other words, what we observe by X = x is that the input to X in D belongs to the set $X^{-1}(x)$. From that perspective, X partitions the space D into sets which are indistinguishable to an observer who sees the value that X takes on¹. This can be stated relationally by taking the kernel of X, which defines the following equivalence relation ker(X):

$d \;\ker(X)\; d' \iff X(d) = X(d')$  (2)

¹ We define an event of the random variable to be a block in the partition.
Equivalently we write $X \simeq Y$ whenever the following holds:

$X \simeq Y \iff \{X^{-1}(x) : x \in \mathbb{R}\} = \{Y^{-1}(y) : y \in \mathbb{R}\}$

and thus if $X \simeq Y$ then H(X) = H(Y). This shows that each element of the lattice LoI can be seen as a random variable. Given two r.v. X, Y in LoI we define the joint random variable (X, Y) as their least upper bound in LoI, i.e. $X \sqcup Y$. It is easy to verify that $X \sqcup Y$ is the partition obtained by all possible intersections of blocks of X with blocks of Y.
2.4 Basic Concepts of Information Theory
This section contains a very short review of some basic definitions of Information Theory; additional background is readily available both in textbooks (the standard being the Cover and Thomas textbook [17]) and on the web. Given a space of events with probabilities $P = (p_i)_{i \in N}$ (N is a set of indices), Shannon entropy is defined as

$H(X) = -\sum_{i \in N} p_i \log p_i$  (3)
It is usually said that this number measures the average information content of the set of events: if there is an event with probability 1 then the entropy will be 0, and if the distribution is uniform, i.e. no event is more likely than any other, the entropy is maximal, i.e. $\log |N|$. In the literature the terms information content and uncertainty are in this context often used interchangeably: both terms refer to the number of possible distinctions on the set of events in the sense we discussed before. The entropy of a r.v. X is just the entropy of its probability distribution, i.e.

$-\sum_{x} p(X = x) \log p(X = x)$
Given two random variables X and Y, the joint entropy H(X, Y) measures the uncertainty of the joint r.v. (X, Y). It is defined as

$-\sum_{x, y} p(X = x, Y = y) \log p(X = x, Y = y)$
Conditional entropy H(X|Y) measures the uncertainty about X given knowledge of Y. It is defined as H(X, Y) − H(Y). The higher H(X|Y) is, the lower is the correlation between X and Y. It is easy to see that if X is a function of Y then H(X|Y) = 0 (there is no uncertainty about X knowing Y if X is a function of Y), and if X and Y are independent then H(X|Y) = H(X) (knowledge of Y doesn't change the uncertainty about X if they are independent).

Mutual information I(X; Y) is a measure of how much information X and Y share. It can be defined as

$I(X; Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$
Thus the information shared between X and Y is the information of X (resp. Y) from which the uncertainty about X given Y has been subtracted. This quantity measures the correlation between X and Y: for example, X and Y are independent iff I(X; Y) = 0. Mutual information is a measure of binary interaction. Conditional mutual information, a form of ternary interaction, will be used to quantify leakage. Conditional mutual information measures the correlation between two random variables conditioned on a third random variable; it is defined as:

$I(X; Y|Z) = H(X|Z) - H(X|Y, Z) = H(Y|Z) - H(Y|X, Z)$
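To make these definitions concrete, the following small Python sketch (ours, not part of the original paper) computes H, H(X|Y) and I(X;Y) from an explicit joint distribution; the example variables (parity of a fair 2-bit secret and its full value) are illustrative assumptions.

from collections import defaultdict
from math import log2

def entropy(dist):
    # H = -sum p log2 p over a dict mapping value -> probability
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    # marginalise a dict mapping tuples -> probability onto component idx
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[outcome[idx]] += p
    return dict(out)

# Joint distribution of (X, Y): X = parity of a fair 2-bit secret, Y = its value.
joint = {(v % 2, v): 0.25 for v in range(4)}
H_X = entropy(marginal(joint, 0))   # 1.0
H_Y = entropy(marginal(joint, 1))   # 2.0
H_XY = entropy(joint)               # 2.0
print("H(X|Y) =", H_XY - H_Y)       # 0.0: X is a function of Y
print("I(X;Y) =", H_X - (H_XY - H_Y))  # 1.0 bit shared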
2.5 Measures on the Lattice of Information
Suppose we want to quantify the amount of information provided by a point in the lattice of information. We could for example associate to a partition P the measure |P| = "number of blocks in P". This measure would be 1 for the least informative partition and its maximal value would be reached by the top partition. It is also true that if $A \sqsubseteq B$ then $|A| \le |B|$, so the measure reflects the order of the lattice.

An important "additivity" property for measures is the inclusion-exclusion principle: roughly speaking this principle says that things should not be counted twice. In terms of sets, the inclusion-exclusion principle says that the number of elements in a union of two sets is the sum of the numbers of elements of the two sets minus the number of elements in the intersection²; in our case the inclusion-exclusion principle reads

$|A \sqcup B| = |A| + |B| - |A \sqcap B|$

Unfortunately this property does not hold. As an example, take the two partitions A = {{1, 2}{3, 4}} and B = {{1, 3}{2, 4}}; their join and meet are $A \sqcup B$ = {{1}{2}{3}{4}} and $A \sqcap B$ = {{1, 2, 3, 4}}. The counting principle from above is in this case not satisfied:

$|A \sqcup B| = 4 \neq 3 = |A| + |B| - |A \sqcap B|$

Another problem with the map | | is that when we consider LoI as a lattice of random variables the above measure may end up being too crude; in fact, all probabilities are disregarded by | |. To address this problem we introduce more abstract lattice-theoretic notions.
² The principle is universal, e.g. in propositional logic the truth value of A ∨ B is given by the truth value of A plus the truth value of B minus the truth value of A ∧ B.
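The failed counting principle above can be checked mechanically. Here is a sketch (ours): join as pairwise block intersection, meet as the transitive closure of overlapping blocks, applied to the example partitions A and B.

from itertools import product

def join(A, B):
    # join = all non-empty pairwise intersections of blocks
    return {frozenset(a & b) for a, b in product(A, B) if a & b}

def meet(A, B):
    # meet = merge blocks connected through any chain of overlaps
    blocks = [set(b) for b in list(A) + list(B)]
    changed = True
    while changed:
        changed = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                if blocks[i] & blocks[j]:
                    blocks[i] |= blocks.pop(j)
                    changed = True
                    break
            if changed:
                break
    return {frozenset(b) for b in blocks}

A = {frozenset({1, 2}), frozenset({3, 4})}
B = {frozenset({1, 3}), frozenset({2, 4})}
print(len(join(A, B)), len(meet(A, B)))   # 4 1
print(len(A) + len(B) - len(meet(A, B)))  # 3 != 4: inclusion-exclusion fails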
A valuation on LoI is a real valued map $\nu : LoI \to \mathbb{R}$ that satisfies the following properties:

$\nu(X \sqcup Y) = \nu(X) + \nu(Y) - \nu(X \sqcap Y)$  (4)

$X \sqsubseteq Y \text{ implies } \nu(X) \le \nu(Y)$  (5)

A join semivaluation is a weak valuation, i.e. a real valued map satisfying

$\nu(X \sqcup Y) \le \nu(X) + \nu(Y) - \nu(X \sqcap Y)$  (6)

$X \sqsubseteq Y \text{ implies } \nu(X) \le \nu(Y)$  (7)
for every element X and Y in a lattice [35]. The property (5) is order-preserving: a higher element in the lattice has a larger valuation than elements below it. The property (6) is a weakened inclusion-exclusion principle.

Proposition 1. The map ν defined by

$\nu(X \sqcup Y) = H(X, Y)$  (8)

is a join semivaluation on LoI.

Proof: The tricky part is to prove that inequality (6) is satisfied. Since it is true that H(X, Y) = H(X) + H(Y) − I(X; Y), it will be enough to prove that

$H(X \sqcap Y) \le I(X; Y)$

This can be proved by noticing that:

1. $H(X \sqcap Y) = I(X \sqcap Y; X)$: this is clear because $I(X \sqcap Y; X)$ measures the information shared between $X \sqcap Y$ and X, and because $X \sqcap Y \sqsubseteq X$ such a measure has to be $H(X \sqcap Y)$;
2. $I(X \sqcap Y; X) \le I(Y; X)$: this is clear because $X \sqcap Y \sqsubseteq Y$, hence there is more information shareable between Y and X than between $X \sqcap Y$ and X.

Combining the two, we have $H(X \sqcap Y) = I(X \sqcap Y; X) \le I(Y; X)$.

An important result proved by Nakamura [35] gives a particular importance to Shannon entropy as a measure on LoI: he proved that the only probability-based join semivaluation on the lattice of information is Shannon's entropy. It is easy to show that a valuation itself is not definable on this lattice, thus Shannon's entropy is the best approximation to a probability-based valuation on this lattice. Other measures can be used, which are however less mathematically appealing. We will also consider Min-Entropy, used recently by Smith in an information flow context [41], which seems like a good complementing measure. While
Shannon entropy intuitively results in an "averaging" measure over a probability distribution, the Min-Entropy $H_\infty$ takes on a "worst-case" view: only the maximal probability of a random variable X is considered:

$H_\infty(X) = -\log \max_{x} p(x)$
where it is always the case that $H_\infty(X) \le H(X)$.
2.6 Shannon's "Lattice Theory of Information"
Shannon's original work on Information Theory [38] did not use the term Information but the term Communication. This was not a coincidence. In a little known note from 1953 [39] Shannon explained that while the entropy H(X) is a reasonable measure of the amount of information contained in the random variable (or process) X, it can hardly be said to represent the actual information of X: as we already observed, two random variables might have the same entropy and yet not, in general, the "same information" (flipping a coin, electing a US president).

The point here is to agree on what we mean by "same information": consider two tables in a spreadsheet expressing distances between cities, one table measuring distance in kilometers and the other in miles. Then knowledge of one table reveals all information contained in the other table, by converting kilometers to miles and vice versa. In general consider a random variable (or a stochastic process): a random variable could be described in many different possible ways, and all those descriptions are reversible translations, in the same way as a newspaper article could be translated into another language without losing any information; an information element is then an equivalence class of objects under invertible translations.

More formally, given random variables X and Y we can define a distance

$d(X, Y) = H(X|Y) + H(Y|X) = 2H(X, Y) - H(X) - H(Y)$

We will see in a moment that d is a pseudometric, but let's first understand what it means for X, Y to have distance zero. First, they have the same entropy; in fact H(X|Y) + H(Y|X) = 0 implies H(X) = H(Y). Second and more important, all information about X can be derived by knowing Y and vice versa, so X and Y contain the same information; reciprocally, if they contain the same information then full knowledge of one completely describes the other, i.e. H(X|Y) = 0 = H(Y|X), and thus they have distance 0. For example, if X and Y are "flipping a coin" and "electing the US president" then knowing the next president will not help in determining the outcome of flipping a coin (i.e. H(X|Y) > 0), and similarly flipping a coin will not help in knowing who won the election, so H(Y|X) > 0. We can state that two elements having distance 0 contain the same information, i.e. they are reversible translations of the same process.
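The kilometre/mile intuition can be replayed numerically. The sketch below (ours; the unit-conversion example is an assumption for illustration) computes d(X, Y) = 2H(X,Y) − H(X) − H(Y) for random variables seen as maps on a common probability space.

from collections import defaultdict
from math import log2

def H(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def dist_of(f, space):
    out = defaultdict(float)
    for omega, p in space.items():
        out[f(omega)] += p
    return out

def d(f, g, space):
    joint = dist_of(lambda w: (f(w), g(w)), space)
    return 2 * H(joint) - H(dist_of(f, space)) - H(dist_of(g, space))

# Uniform space of distances in kilometres; miles is a reversible translation.
space = {k: 0.25 for k in (100, 200, 300, 400)}
km = lambda w: w
miles = lambda w: round(w * 0.621371, 3)
coin = lambda w: w >= 300   # a genuinely coarser observation

print(d(km, miles, space))  # 0.0: same information element
print(d(km, coin, space))   # 1.0: not mutually derivable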
The argument can be pushed even further; knowing that d is a pseudometric implies that the relation

$X \equiv Y \iff d(X, Y) = 0$

is an equivalence relation; we can then state that

An information element is an equivalence class $[X]_\equiv$

This means that nothing outside $[X]_\equiv$ contains the same information as X, and anything that contains the same information as X is inside $[X]_\equiv$.

Theorem 1. d is a pseudometric (or a metric if we take the equivalence classes $[X]_\equiv$).

d(X, X) = 0 is trivial and the symmetry of d is also trivial, hence the only non-trivial property to prove is the triangular inequality $d(X, Z) \le d(X, Y) + d(Y, Z)$, i.e., unfolding the definition,

$H(X|Z) + H(Z|X) \le H(X|Y) + H(Y|X) + H(Y|Z) + H(Z|Y)$

Let us prove one half (the other half is the same argument):

$H(X|Z) \le H(X|Y) + H(Y|Z)$

Indeed,

H(X|Y) + H(Y|Z) ≥ H(X|Z)
⇔ H(X, Y) − H(Y) + H(Y, Z) − H(Z) ≥ H(X, Z) − H(Z)
⇔ H(X, Y) + H(Y, Z) ≥ H(X, Z) + H(Y)

We now show that by adding a positive quantity to the right hand side we get the left hand side, hence proving the inequality. We have

H(X, Z) + H(Y) + H(Y|X, Z) + I(X; Z|Y)
= H(X, Z) + H(Y) + H(Y, X, Z) − H(X, Z) + I(X; Z|Y)
= H(Y) + H(Y, X, Z) + I(X; Z|Y)
= H(Y) + H(Y, X, Z) + H(X|Y) − H(X|Z, Y)
= H(Y) + H(Y, X, Z) + H(X|Y) − H(X, Z, Y) + H(Y, Z)
= H(Y) + H(X|Y) + H(Y, Z)
= H(Y) + H(X, Y) − H(Y) + H(Y, Z)
= H(X, Y) + H(Y, Z)

The quantity H(Y|X, Z) + I(X; Z|Y) ≥ 0 we added to prove the inequality can be found by using Venn diagrams, a powerful source of intuition when reasoning
Fig. 1. Reasoning with Venn diagrams (three overlapping circles for X, Y, Z; the regions a, b, c, d, e, f, g label the atoms of the diagram)
in Information Theory. Figure 1 shows the r.v. X, Y, Z as the three main circles. H(X, Y) corresponds to the union of X, Y, i.e. the regions a + b + c + d + e + f; similarly H(Y, Z) is made up by the regions b + c + d + e + f + g. The right hand side of the inequality gives H(X, Z), corresponding to a + b + d + e + f + g, and H(Y), corresponding to b + c + d + f. By subtracting the right hand side from the left hand side we are left with the regions c and e: c is the region corresponding to taking out X, Z from Y, i.e. Y − (X ∪ Z), which is the term H(Y|X, Z), and e is the intersection of X and Z minus Y, X ∩ Z − Y, which corresponds to I(X; Z|Y). As pointed out by Yeung [43], Venn diagram reasoning can be used by seeing entropy as a measure μ on sets corresponding to random variables and then using the following interpretation:

1. μ(X ∪ Y) = H(X, Y)
2. μ(X − Y) = H(X|Y)
3. μ(X ∩ Y) = I(X; Y)
4. μ(X ∩ Y − Z) = I(X; Y|Z)
A related notion³ is an order on random variables defined as

$X \ge_d Y \iff H(Y|X) = 0$

The intuition here is that X provides complete information about Y, or equivalently Y has less information than X, so Y is an abstraction of X (some information is forgotten). Let us now relate this order with the lattice of information. We can show that when we consider the lattice of information as a lattice of random variables, then

³ Notice that d induces a metric, hence a topology. This topology can be completed by adding Cauchy converging sequences, i.e. sequences such that $\lim_{n, m \to \infty} d(X_n, X_m) = 0$. We will ignore these completions.
the order defined above is the same as the order in LoI, hence they define the same lattice.

Theorem 2. $X \sqsubseteq Y \iff X \le_d Y$

To prove the result let us first define, given two partitions X and Y, the conditional partition Y|X = x (where x is a block in X) as the intersection of all blocks in Y with x; given Y|X = x, a probability distribution is obtained by normalising the probabilities (the normalisation factor being p(x)). The notation Y|X = x is justified because H(Y|X = x) is the usual notion of information-theoretical entropy of the variable Y given the event X = x. Formally,

$Y|X = x \;\equiv\; \{\, y \cap x \mid y \in Y \,\}$

and the probability distribution associated to Y|X = x is

$\left\{ \frac{p(y \cap x)}{p(x)} \,\middle|\, y \in Y \right\}$

Proof: Let's start with the direction $X \sqsupseteq Y \Rightarrow X \ge_d Y$ (which, swapping the roles of X and Y, is the left-to-right direction of the theorem). Suppose that X refines Y; then Y|X = x consists of at most one block (the block of which x is a subset). Therefore H(Y|X = x) = 0 for all x, and it follows that H(Y|X) = 0.

For the reverse implication suppose X does not refine Y, so there exists a block x in X whose elements intersect two blocks in Y; for such x we have Y|X = x ≡ {y, y′, . . .} and hence H(Y|X = x) > 0, so H(Y|X) > 0, which proves the result.
3 Measuring Leakage of Programs

3.1 Observations over Programs
An observation over a program P is an equivalence relation on the states of P. A particular equivalence class will be called an observable. Hence an observable is a set of states indistinguishable by an attacker making that observation. The above intuition can be formalized in terms of several program semantics. We will concentrate here on a specific observation: the output observation [25]. For this observation the random variable associated to a program P is the equivalence relation on any two states σ, σ′ from the universe of states Σ defined by

$\sigma \simeq \sigma' \iff [\![P]\!](\sigma) = [\![P]\!](\sigma')$  (9)
where $[\![P]\!]$ represents the denotational semantics of P. Hence the equivalence relation amounts to "have the same observable output". We denote by Π(P) the interpretation of a program P in LoI as defined by the equivalence relation (9). According to denotational semantics, commands are considered as state transformers, informally maps which change the values of variables in the memory; similarly, language expressions are interpreted as maps from the memory to values. The relation Π(P) is nothing else than the kernel of the denotational semantics of P.

3.2 LoI Interpretation of Programs and Basic Properties
For the example programs used, we are referring to a simple imperative language with assignments, sequencing, conditionals and loops. Syntax and semantics for the language are standard, as in e.g. [47]. The expressions of the language are arithmetic expressions, with constants 0, 1, . . ., and boolean expressions with constants tt, ff. To see a concrete example, let P be the program

if h==0 then x=0 else x=1

where the variable h ranges over {0, 1, 2, 3}. The equivalence relation (i.e. partition) Π(P) associated to the above program is then

O = {{0}, {1, 2, 3}}

where the block {0} corresponds to the output x=0 and the block {1, 2, 3} to the output x=1.
O effectively partitions the domain of the variable h, where each disjoint subset represents an output. The partition reflects the idea of what a passive attacker can learn of secret inputs by backwards analysis of the program, from the outputs to the inputs. The quantitative evaluation of the partition O measures such knowledge gains of an attacker, depending solely on the partition of states and the probability distribution of the input. The next proposition shows how algebraic operations in LoI can be expressed using programs.

Proposition 2. Given programs P1, P2 there exists a program P12 such that

$\Pi(P_{12}) = \Pi(P_1) \sqcup \Pi(P_2)$

Given programs P1, P2, we define P12 = P1′; P2′ where the primed programs P1′, P2′ are P1, P2 with variables renamed so as to have disjoint variable sets. If the two programs are syntactically equivalent, then this results in self-composition [3]. For example, consider the two programs

P1 ≡ if (h == 0) x = 0 else x = 1,   P2 ≡ if (h == 1) x = 0 else x = 1

with their partitions Π(P1) = {{0}, {h ≠ 0}} and Π(P2) = {{1}, {h ≠ 1}}. The program P12 is the concatenation of the previous programs with variable renaming:
P12 ≡ h′ = h; if (h′ == 0) x′ = 0 else x′ = 1; h′′ = h; if (h′′ == 1) x′′ = 0 else x′′ = 1

The corresponding lattice element is the join, i.e. the intersection of blocks, of the individual programs P1,2:

$\Pi(P_{12}) = \{\{0\}, \{1\}, \{h \neq 0, 1\}\} = \{\{0\}, \{h \neq 0\}\} \sqcup \{\{1\}, \{h \neq 1\}\}$

The above result can be extended to expressions of the language: we can associate to an expression e the program consisting of the assignment x = e and use Proposition 2 to compute the l.u.b. in LoI of a set of expressions; as a small illustration, the sketch below computes these partitions by brute force.
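This sketch (ours) computes Π(P) as the kernel of a program's denotation and checks Π(P12) = Π(P1) ⊔ Π(P2) on the example programs; the programs are encoded as plain Python functions for illustration.

from collections import defaultdict

def kernel(f, domain):
    # partition of the domain induced by "same output"
    blocks = defaultdict(set)
    for h in domain:
        blocks[f(h)].add(h)
    return {frozenset(b) for b in blocks.values()}

H_DOMAIN = range(4)                    # h ranges over {0,1,2,3}
P1 = lambda h: 0 if h == 0 else 1
P2 = lambda h: 0 if h == 1 else 1
P12 = lambda h: (P1(h), P2(h))         # concatenation with disjoint outputs

print(kernel(P1, H_DOMAIN))    # {{0}, {1,2,3}}
print(kernel(P2, H_DOMAIN))    # {{1}, {0,2,3}}
print(kernel(P12, H_DOMAIN))   # {{0}, {1}, {2,3}} = the join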
3.3 Definition of Measuring Leakage
Let us take the following intuition:

The leakage of confidential information of a program is defined as the difference between an attacker's uncertainty about the secret before and after her available observations about the program.

For a Shannon-based measure, the above intuition can be expressed in terms of conditional mutual information. In fact, if we start by observing that the attacker's uncertainty about the secret before observations is H(h|l) and the attacker's uncertainty about the secret after observations is H(h|l, Π(P)), then using the definition of conditional mutual information we define leakage as

$H(h|l) - H(h|l, \Pi(P)) = I(h; \Pi(P)|l)$

We can now simplify the above definition as follows:

$I(\Pi(P); h|l) = H(\Pi(P)|l) - H(\Pi(P)|l, h) \stackrel{A}{=} H(\Pi(P)|l) - 0 = H(\Pi(P)|l) \stackrel{B}{=} H(\Pi(P))$  (10)

where equality A holds because the program is deterministic, and B holds when the program only depends on the high inputs, for example when all low variables are initialised in the code of the program. Thus, for such programs:

Leakage: (Shannon-based) leakage of a program P is defined as the (Shannon) entropy of the partition Π(P).

Notice that the above definition can easily be adapted to other real valued maps from the lattice of information, providing possibly different definitions of leakage: Π(P) provides a very general representation that can be used as the basis for several quantitative measures like Shannon entropy, Rényi entropies or guessability measures. We can relate the order in LoI and the amount of leakage by the following result:

Proposition 3. Let P1, P2 be two programs depending only on the high inputs. Then $\Pi(P_1) \sqsubseteq \Pi(P_2)$ iff for all probability distributions on states in LoI, $H(\Pi(P_1)) \le H(\Pi(P_2))$.
4 Foundational Issues about Measuring Leakage
Let us revisit the idea that Shannon's entropy measures the information content of a random variable. Consider a horse race including four horses and the random variable W for "the winner is". W can take four values, value i standing for "the winner is the i-th horse". The information content of a random variable can also be interpreted as the minimum space needed to store and transmit the possible outcomes of the random variable.

1. Suppose the r.v. W takes one of its 4 possible values with probability 1, so the other values have probability 0. Then there is only one possible outcome for the variable, which is known: it is the value with probability 1. Hence no space is needed to store or transmit the information content of W, i.e. W has 0 information content.
2. Suppose, at the other extreme, that all 4 values are equally likely. In that case the information content of W is 2, because with 2 bits it is possible to store 4 values.
3. If there were only two possible values and they were equally likely then the information content of W would be 1, because 1 bit suffices to store 2 values.

Accordingly the entropy of W, H(W), will take on the values 0, 2, 1 respectively when W follows the distributions p1 = (0, 0, 0, 1) (first case), p2 = (1/4, 1/4, 1/4, 1/4) (second case) and p3 = (1/2, 1/2, 0, 0) (third case).
4.1 Guessability 1: Dictionary Attack
Let us now consider a different idea. Instead of measuring the information content of W we now measure its guessability G(W), i.e. the number of attempts that on average we need to guess the winner by choosing at each stage the most likely element not yet chosen. In security terms this method is called a dictionary attack.

1. Suppose the r.v. W takes one of its 4 possible values with probability 1, so the others have probability 0. Then there is only one possible outcome for the variable, which is known, so we need 0 guesses to guess the winner.
2. The other extreme assumes that all 4 values are equally likely. In that case with one guess we will guess the right horse 1/4th of the times, with 2 guesses we will be right 1/4th of the times, with 3 guesses 1/4th of the times and with 4 guesses the remaining 1/4th; so on average we will need (1/4) + (2/4) + (3/4) + (4/4) = 2.5 guesses to guess the winner.
3. If there were only two possible values and they were equally likely then we would need 1 guess half of the times and 2 guesses the other half, i.e. 1.5 guesses on average.
The general definition of guessability for a random variable whose distribution is written in decreasing order ($x_i \ge x_{i+1}$) is

$G(W) = \sum_i i\, x_i$

In general, if there are n elements that are equally likely then

$G(W) = \sum_i i\, p(x_i) = \frac{1}{n} \sum_{1 \le i \le n} i = \frac{1}{n} \cdot \frac{n(n+1)}{2} = \frac{n+1}{2}$

whereas Shannon entropy results in

$H(W) = H(1/n, \ldots, 1/n) = \log_2(n)$

For example, when n = 100 then G(W) = 101/2 = 50.5 while H(W) = log2(100) = 6.6438. So there is a significant difference between the average number of guesses and entropy; notice that entropy is always lower than guessability. So what does entropy really measure?
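The gap between guessability and entropy is easy to reproduce. A minimal sketch (ours):

from math import log2

def G(probs):
    ordered = sorted(probs, reverse=True)       # guess most likely first
    return sum(i * p for i, p in enumerate(ordered, start=1))

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

n = 100
uniform = [1 / n] * n
print(G(uniform))   # 50.5   = (n + 1) / 2
print(H(uniform))   # 6.6438 = log2(100)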
4.2 Guessability 2: The 20 Questions Game
In the 20 questions game a player thinks of an object and his opponent can ask yes/no questions with the aim of guessing the object with the minimum number of questions; usually fewer than 20 are needed to succeed. Using a dictionary attack for asking questions is not a clever strategy because it only eliminates one object at each round. A better strategy is to ask questions about sets of elements, i.e. whether the object is or isn't an element of a set. If the set is chosen carefully, a large number of objects can be eliminated at each round. Assuming a uniform distribution, with 20 yes/no questions there are $2^{20} = 1048576$ possible items that can be identified. This strategy is played as follows:

1. split the universe of all possible items into two sets of equal size A, B. Then ask if the item is in set A.
2. if the answer is yes set the universe to be the set A, if the answer is no then set the universe to be the set B. Go to step 1.

Suppose now we believe that the player has chosen one item with a higher probability than the other items. What is the best way to act? We could ignore our belief; we could combine it in, creating a set with probability 1/2 containing that item; or we could just try to guess that item. As an example, suppose we have 8 possible items with probabilities

1/4, 1/8, 5/48, 5/48, 5/48, 5/48, 5/48, 5/48

We would have the choices:

– ignore: define A = {1/4, 1/8, 5/48, 5/48}, B = {5/48, 5/48, 5/48, 5/48} and ask "is it in A (or B)?"
– brute force: guess the item with probability 1/4
– combine: set A = {1/4, 1/8, 5/48}, B = {5/48, 5/48, 5/48, 5/48, 5/48} and ask "is it in A?"
Information theory tells us that the best strategy is to combine. This can be proven as follows: encode the universe with a Huffman code, then ask questions about the leftmost unknown bits of the code.

The Huffman code of a set of events E is defined by building a binary tree as follows. Initialise the set P as the set of probabilities of the events in E and T as the empty set. Step: given P and a set of trees T, pick from P two elements a, b with the lowest probability (if several have the same lowest probability, randomly pick two of them). Add to T the new tree consisting of a new parent node c with children a and b. Add to P the element c, where its probability is the sum of the probabilities of a and b.

The Huffman code of the previous example is built as follows:

1. join x3, x4 with probabilities 5/48, 5/48; get a new element y1 with probability 10/48 = 5/24
2. join x5, x6 with probabilities 5/48, 5/48; get a new element y2 with probability 10/48 = 5/24
3. join x7, x8 with probabilities 5/48, 5/48; get a new element y3 with probability 10/48 = 5/24
4. join x2, y1 with probabilities 1/8, 5/24; get a new element y4 with probability 8/24 = 1/3
5. . . .

This results in the following code:

x1 = 00, x2 = 010, x3 = 0110, x4 = 1111, x5 = 100, x6 = 101, x7 = 110, x8 = 1110

Now the question about the leftmost unknown bit corresponds to partitioning the universe into A = {1/4, 1/8, 5/48}, B = {5/48, 5/48, 5/48, 5/48, 5/48} and asking if the object is in A. The average length of the codewords (computed as the sum of all lengths weighted by their probability) is

1/4 · 2 + 1/8 · 3 + 2 · 5/48 · 4 + 4 · 5/48 · 3 = 2.95833333

We can see the word 00 as identifying the element x1 by the following sequence of questions/answers: "is the leftmost bit 0? Yes. Is the next leftmost bit 0? Yes. Then it is x1." In general, by seeing a word as the sequence of questions/answers that have the encoded element as the outcome, we can see the average codeword length as the average length of the sequence of questions/answers needed to guess elements of that universe. Compare the average length above with the entropy of the same probability space:

H(1/4, 1/8, 5/48, 5/48, 5/48, 5/48, 5/48, 5/48) = 2.9143965
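The average-length computation can be checked mechanically. This sketch (ours) builds a Huffman code for the running distribution with a heap and compares the resulting average codeword length with the entropy.

import heapq
from fractions import Fraction
from math import log2

def huffman_lengths(probs):
    # Return the codeword length assigned to each probability.
    heap = [(p, i, [i]) for i, p in enumerate(probs)]  # (prob, tiebreak, leaves)
    lengths = [0] * len(probs)
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, l1 = heapq.heappop(heap)
        p2, i2, l2 = heapq.heappop(heap)
        for leaf in l1 + l2:
            lengths[leaf] += 1      # leaves under the merged node gain one bit
        heapq.heappush(heap, (p1 + p2, i2, l1 + l2))
    return lengths

probs = [Fraction(1, 4), Fraction(1, 8)] + [Fraction(5, 48)] * 6
lengths = huffman_lengths(probs)
print(float(sum(p * l for p, l in zip(probs, lengths))))  # 2.9583...
print(-sum(float(p) * log2(p) for p in probs))            # 2.9143...: the lower bound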
This is not a coincidence: in fact, Shannon's entropy measures the average length of the sequence of questions/answers in an optimal guessing strategy. Notice the tiny discrepancy of 0.04 between the Huffman code and the entropy. Entropy is a lower limit on coding, and although the Huffman algorithm gets pretty close to that limit it is still above it. The important remark is that the Huffman algorithm is optimal, i.e. there is no other feasible code that performs better⁴. Suppose now we could find a more efficient strategy to play the 20 questions game. Then this could easily be turned into an algorithm that, given any finite probability space, would give us on average a shorter sequence of binary codes for elements of the probability space than the one given by Huffman coding. This contradicts the optimality of Huffman coding, so it is not possible.
4.3 Leakage and Guessability: Smith's Example
Let us see how this investigation on guessing strategies relates to some recent foundational debate on quantitative information flow [41]. Consider the two programs below and assume the secret is an 8k-bit variable under uniform distribution, i.e. H(h) = 8k.

1. if (h % 8 == 0) l=h else l=1

The leakage of this program, which consists of a conditional statement, can be computed as the leakage of the guard plus the weighted leakage of the branches, i.e.

$H(p(h\%8 = 0)) + p(h\%8 = 0)\,H(l = h \mid h\%8 = 0) + p(h\%8 \neq 0)\,H(l = 1 \mid h\%8 \neq 0)$

that is

$H\!\left(\tfrac{1}{8}, \tfrac{7}{8}\right) + \tfrac{1}{8}\,H(l = h \mid h\%8 = 0) + \tfrac{7}{8}\,H(l = 1 \mid h\%8 \neq 0) = H\!\left(\tfrac{1}{8}, \tfrac{7}{8}\right) + \tfrac{1}{8}\log(2^{8k-3}) + 0 = k + 0.169$

Smith computes the leakage using mutual information as I(h; l) = H(h) − H(h|l). We have already seen that the two definitions of leakage are equivalent; in fact H(h|l) is 7k − 0.169, so the leakage is H(h) − H(h|l) = 8k − (7k − 0.169) = k + 0.169.

2. l = h & 0^{7k−1}1^{k+1}

This program copies the last k + 1 bits of h into l, hence its leakage is k + 1. Alternatively, using mutual information, we have that H(h|l) is 7k − 1, so the leakage is H(h) − H(h|l) = 8k − (7k − 1) = k + 1.

The two programs hence leak similar amounts.

⁴ Some minor improvements can be achieved in some contexts.
Smith's point is that program 1 is a much bigger threat than program 2, because after running program 1 the attacker has one chance in 8 to guess the secret, whereas after running program 2 the probability of guessing the secret is much lower, at $1/2^{7k-1}$. On this basis Smith proposes a measure (based on Min-Entropy [36]) according to which program 1 has a much bigger measure than program 2. So what is wrong with Shannon's entropy for those examples?
4.4 Meaning of Shannon's Measure
Smith's observation assumes an attacker who attempts a single guess of the secret after running the program just once. While this is often a reasonable assumption about the real world, this kind of attack (like the dictionary attacks we saw before) is not the most powerful guessing strategy and hence may underestimate the power of an attacker. Suppose the attacker has, after running the program, an optimal guessing strategy: he can play a 20 questions game using the outcome of the two programs. Then with program 2 from before around k + 1 bits are leaked, i.e. 8k − (k + 1) = 7k − 1 bits are left, so we would need 7k − 1 questions using an optimal strategy to guess the secret. With program 1, in 1/8th of the cases the attacker will need 0 questions, whereas in 7/8 of the cases he will face a set of size $7 \cdot 2^{8k-3}$ where the secret could be; we can approximate this to $2^3 \cdot 2^{8k-3} = 2^{8k}$, i.e. in 7/8 of the cases we need around 8k questions to guess the secret. The expected number of questions is around

$\tfrac{1}{8} \cdot 0 + \tfrac{7}{8} \cdot 8k = 7k$

This argument justifies why Shannon's leakage of the two programs above is similar. Hence Shannon's measure indicates the threat level of a program when attacked by (in some respects) the most powerful attacker, and hence provides a good lower bound on the security threat of programs for most security scenarios. However, as shown by Smith's work, other measures may be more appropriate in particular contexts, and guessability in n tries or within a confidence interval is sometimes a better indication of the threat level of code.
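The contrast between the two views can be computed directly. In this sketch (ours), we use the fact that for a deterministic program under a uniform prior Smith's min-entropy leakage reduces to log2 of the number of output classes [41]; the instantiation k = 1 (an 8-bit secret) is our assumption for illustration.

from collections import defaultdict
from math import log2

def block_sizes(f, bits=8):
    blocks = defaultdict(int)
    for h in range(2 ** bits):
        blocks[f(h)] += 1
    return list(blocks.values())

def shannon_leakage(sizes):
    total = sum(sizes)
    return -sum(s / total * log2(s / total) for s in sizes)

p1 = lambda h: h if h % 8 == 0 else 1   # if (h % 8 == 0) l=h else l=1
p2 = lambda h: h & 0b11                 # copy the last k+1 = 2 bits

for name, prog in [("P1", p1), ("P2", p2)]:
    sizes = block_sizes(prog)
    print(name, shannon_leakage(sizes), log2(len(sizes)))
# P1: Shannon ~1.169 (= k + 0.169), min-entropy ~5.04 -> the bigger threat for Smith
# P2: Shannon  2.0   (= k + 1),     min-entropy  2.0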
5 Reasoning about Programs: Looping Constructs
The generality of the definition of leakage we gave in Section 3 may present a problem. In fact it abstracts over all programming constructs, and so it doesn't tell much about how to go on to reason about the leakage of specific program constructs. In this section we introduce reasoning techniques for a very challenging program construct: loops. Looping constructs are one of the most challenging aspects of programming languages; most kinds of program analyses would be much simpler if it wasn't for loops. The main complication of loops is that they introduce "circular" dependencies between program points. Circular dependencies, if taken literally, usually
result in a poor analysis where either nothing or everything is leaked. Hence any useful analysis needs to provide general reasoning tools to cleverly break down this apparent circularity. We present two approaches to the analysis of loops: the first approach, not based on the lattice of information, follows [25,26] and provides an analysis based on the sources of leakage and the number of iterations. The second approach, based on the lattice of information, interprets loops in terms of chains in the lattice of information and their leakage as the entropy of the least upper bound of the chain. The two approaches are shown to be equivalent.
5.1 Loops: Analytical Approach
A possible way to analyse the leakage of loops is an analysis of the possible sources of leakage. Both the guard and the body of a loop can be sources of leaks. In fact it has been shown [25] that those are two of the three components needed to provide a precise quantitative analysis. The three components are:

Guard: the information about the number of iterations of the loop
Body: the information about the output given knowledge of the number of iterations
Collisions: the information about the number of iterations given knowledge of the output

The idea is that the leakage of a looping program (noted L(P)) is given by the information leaked by the guard plus the information leaked by the body minus the ambiguity given by the collisions. In terms of random variables this can be expressed as follows [26] (the random variables involved will be formally defined later on):

$L(P) = \underbrace{H(\mathit{NIterations}(P))}_{\text{guard}} + \underbrace{H(P \mid \mathit{NIterations}(P))}_{\text{body}} - \underbrace{H(\mathit{NIterations}(P) \mid P)}_{\text{collisions}}$
Consider this example program

l=0; while(l < h) { if (h==2) l=3 else l++ }

and suppose h, l are two-bit variables with range {0, 1, 2, 3} and all values of h are equally likely. Then the loop will terminate in 0 iterations with probability 0.25 (i.e. only when h=0); it will terminate in 1 iteration with probability 0.5 (i.e. only when h=1 or h=2); it will terminate in 2 iterations with probability 0; and it will terminate in 3 iterations with probability 0.25 (i.e. only when h=3). Now we have the first ingredient of our formula:

$\underbrace{H(\mathit{NIterations}(P))}_{\text{guard}} = H(0.25, 0.5, 0.25)$
Considering the leakage in the body, we have that in the cases of zero and three iterations there is no uncertainty left about the secret (0 bits of information), while in the case of one iteration the body leaks the information distinguishing h=1 from h=2 (1 bit of information). This amounts to:

$\underbrace{H(P \mid \mathit{NIterations}(P))}_{\text{body}} = 0.25 \cdot 0 + 0.25 \cdot 0 + 0.5 \cdot 1$

For the collisions, notice that the output l=3 can be the result of one or three iterations; hence the output l=3, happening with probability 0.5, generates 1 bit of uncertainty about the number of iterations (it could be one or three iterations). This gives the last element of the leakage formula:

$\underbrace{H(\mathit{NIterations}(P) \mid P)}_{\text{collisions}} = 0.5 \cdot 1$

For this particular program the leakage is then

H(0.25, 0.5, 0.25) + 0.25 · 0 + 0.25 · 0 + 0.5 · 1 − 0.5 · 1 = H(0.25, 0.5, 0.25) = 1.5

The fact that 1.5 is the correct amount leaked can be checked against intuition. An attacker observing the output of the program may observe l=0, in which case he knows that h=0; he may observe l=1, in which case he knows that h=1; or he may observe l=3, in which case he knows that h=2 or h=3. These three observations have probability 0.25, 0.25, 0.5 respectively, and so the leakage given the observations is H(0.25, 0.25, 0.5) = 1.5; the sketch below checks this decomposition by brute force. We are now going to make this argument formal, following [25].
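A brute-force check of the guard/body/collision decomposition for this loop (our sketch; the loop is simulated directly in Python):

from collections import defaultdict
from math import log2

def run(h):
    l, iters = 0, 0
    while l < h:
        l = 3 if h == 2 else l + 1
        iters += 1
    return l, iters

def H(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

p = 1 / 4
out_dist, it_dist, joint = defaultdict(float), defaultdict(float), defaultdict(float)
for h in range(4):
    l, n = run(h)
    out_dist[l] += p
    it_dist[n] += p
    joint[(l, n)] += p

guard = H(it_dist)                   # H(NIterations) = H(.25,.5,.25)
body = H(joint) - H(it_dist)         # H(P | NIterations)
collisions = H(joint) - H(out_dist)  # H(NIterations | P)
print(guard + body - collisions)     # 1.5 = H(Π(P))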
5.2 Loops as Disjoint Union of Functions
Given a looping program P ≡ while e M that depends only on a high input variable h, let us associate to P the following random variable: NItP is the random variable "number of iterations the loop terminates in". The associated distribution p(NItP = n) is the sum of the probabilities of all values of h such that for those values P terminates in n iterations:

$p(\mathit{NIt}_P = n) = \sum \{\, p(h = v) \mid P(v) \text{ terminates in } n \text{ iterations} \,\}$

We can then show that this analytical approach gives the same leakage as definition (10):

Proposition 4.

$H(\Pi(P)) = H(\mathit{NIt}_P) + H(\Pi(P) \mid \mathit{NIt}_P) - H(\mathit{NIt}_P \mid \Pi(P))$  (11)

Proof: We use the information theoretical equality

$H(Y) = H(X) + H(Y|X) - H(X|Y)$

which is true because, by definition of conditional entropy,

H(X) + H(Y|X) − H(X|Y)
= H(X) + H(Y, X) − H(X) − H(X|Y)
= H(X) + H(Y, X) − H(X) − H(X, Y) + H(Y)
= H(Y)

The result then follows by replacing X = NItP, Y = Π(P).
This proposition states that the leakage of a looping program is equivalent to the uncertainty about the number of iterations it takes for the loop to terminate, plus the uncertainty about the output of the program knowing how many iterations it took to terminate, minus the uncertainty about the number of iterations it took to terminate knowing the output of the program. We interpret the elements in equation (11) as follows:

1. H(NItP) is the leakage of the guard
2. H(Π(P)|NItP) is the leakage of the body
3. H(NItP|Π(P)) is the measure of the collisions of the loop

A collision is an observable value that could be generated in different numbers of iterations of the loop.

We can "approximate" the r.v. NItP by the r.v. NItPⁿ, which is the "number of iterations ≤ n the loop terminates in". The possible values for NItPⁿ are 0, . . . , n, where the last value n also covers "the loop terminates in more than n iterations". The probabilities associated to NItPⁿ are accordingly an approximation of the probabilities of NItP. They are defined by

$p(\mathit{NIt}^n_P = m) = \begin{cases} p(\mathit{NIt}_P = m) & \text{if } m < n \\ 1 - \sum \{\, p(\mathit{NIt}_P = s) \mid s < n \,\} & \text{otherwise} \end{cases}$
5.3 Basic Definitions
Definition 1. Define the leakage of a collision-free loop while e M up to n iterations by

$W(e, M)_n = H(\mathit{NIt}^n_P) + H(\Pi(P) \mid \mathit{NIt}^n_P)$

Proposition 5. $\forall n \ge 0,\; W(e, M)_n \le W(e, M)_{n+1}$

Proof: The proof can be decomposed into showing that $H(\mathit{NIt}^n_P) \le H(\mathit{NIt}^{n+1}_P)$, which is true because $\mathit{NIt}^{n+1}_P$ refines the distribution of $\mathit{NIt}^n_P$. To prove the other component of the inequality, i.e. $H(\Pi(P) \mid \mathit{NIt}^n_P) \le H(\Pi(P) \mid \mathit{NIt}^{n+1}_P)$, consider the event e′ "the loop terminates in n + 1 iterations". Using the definition of conditional entropy we then have

$H(\Pi(P) \mid \mathit{NIt}^n_P) = \sum_{\mathit{NIt}^n_P = e} p(e)\, H(\Pi(P) \mid \mathit{NIt}^n_P = e)$
$\le \sum_{\mathit{NIt}^n_P = e} p(e)\, H(\Pi(P) \mid \mathit{NIt}^n_P = e) + p(e')\, H(\Pi(P) \mid e')$
$= \sum_{\mathit{NIt}^{n+1}_P = e} p(e)\, H(\Pi(P) \mid \mathit{NIt}^{n+1}_P = e)$
$= H(\Pi(P) \mid \mathit{NIt}^{n+1}_P)$
Using Proposition 4 we can hence define the leakage of a loop as

$\lim_{n \to \infty} W(e, M)_n - H(\mathit{NIt}_P \mid \Pi(P))$  (12)

which simplifies, when there are no collisions, to

$\lim_{n \to \infty} W(e, M)_n$  (13)

Using this simplified definition we now formalize some important concepts. The rate of leakage is

$\lim_{n \to \infty,\; p(\mathit{NIt}^n_P = n) = 0} \frac{W(e, M)_n}{n}$
Thus in the case of terminating loops the rate will be the total leakage divided by the number of iterations. This can be considered a rough measure of rate: for example, if the first iteration were to leak the whole secret and the following billion iterations nothing, the rate would still be one billionth of the secret size. However, as in our model the attacker can only perform observations on the output and not on intermediate states of the program, the chosen definition of rate will give an indication of the timing behavior of the channel in that context. A fundamental concept in Information Theory is channel capacity, i.e. the maximum amount of leakage over all possible input distributions, i.e.

$\max_p \lim_{n \to \infty} W(e, M)_n$  (14)
In our setting we will look for the distribution which maximizes leakage. Informally such a distribution will provide the setting for the most devastating attack: we will refer to it as the channel distribution. We will also use the term channel rate for the rate of leakage of the channel distribution; again this should be thought of as the average maximal amount of leakage per iteration. To define rate and channel capacity in the case of collisions, the above definitions should be applied to the definition of leakage for loops with collisions.

The previous definitions can be used to give a simple classification of the leakage behaviour of loops: for example a bounded loop is one where even if we were able to increase arbitrarily the size of the secret we would not be able to increase arbitrarily the amount leaked. Similarly we can define the rate of leakage as increasing (or decreasing, or constant) if increasing the size of the secret increases (or decreases, or keeps constant) the rate. Notice also that the rate of leakage is loosely related to timing behaviour: in loops with decreasing rate, if the size of the secret is doubled each iteration will (on average) reveal less information than each iteration with the original size. We will discuss timing behaviour in one example shortly.

In most cases a separation property of the definition of leakage for loops can be exploited. As shown, the definition neatly separates information flows in the
guard and body of a loop. If there is no leakage in the body – e.g. no high variable appears in the body of the loop – then (13) reduces to

$\lim_{n \to \infty} H(\mathit{NIt}^n_P)$  (15)

On the other hand, if there is no indirect flow from the guard – e.g. e doesn't contain any variable affected by high variables – then (13) reduces to

$\lim_{n \to \infty} H(\Pi(P) \mid \mathit{NIt}^n_P)$  (16)
5.4 Examples
Let us apply the previous theory to the analysis of two looping programs. Unless stated otherwise we assume a uniform distribution for all input random variables and that the high input is a k-bit variable with possible values $0, \ldots, 2^k - 1$ (i.e. no negative numbers).

An unbounded covert channel with decreasing rate. Consider the following simple loop with an increasing counter l:

l=0; while (l != h) { l=l+1 }

No high variable appears in the body of the loop, so there is no leakage in the body, i.e.

$\lim_{n \to \infty} H(\Pi(P) \mid \mathit{NIt}^n_P) = 0$
Therefore we only need to study the behaviour of $\lim_{n \to \infty} H(\mathit{NIt}^n_P)$. The events associated to the random variable $\mathit{NIt}^n_P$ are:

$(\mathit{NIt}^n_P = i) = \begin{cases} 0 = h & \text{if } i = 0 \\ 0 \neq h \wedge \cdots \wedge i - 1 \neq h \wedge i = h & \text{if } i > 0 \end{cases}$

hence every event is equally likely, i.e. $p(\mathit{NIt}^n_P = i) = \frac{1}{2^k}$. The entropy over all possible guards is then

$\lim_{n \to \infty} H(\mathit{NIt}^n_P) = H\!\left(\frac{1}{2^k}, \ldots, \frac{1}{2^k}\right) = \log(2^k) = k$
As expected, all k bits of the variable are leaked in this loop, for all possible k; however, to reveal k bits $2^k$ iterations are required. We conclude that this is an unbounded covert channel with decreasing rate $\frac{k}{2^k}$. To attach a concrete timing meaning to this rate, let t1, t2 be the times taken by the system to evaluate the expression l != h and to execute the command l = l+1 respectively. Then the above program leaks $\frac{k}{2^k}$ bits per $t_1 + t_2$ milliseconds.

Notice that the uniform distribution maximizes leakage, i.e. it achieves channel capacity. Consider for example the following input distribution for a 3-bit variable:

$p(0) = \frac{7}{8}, \quad p(1) = p(2) = \cdots = p(7) = \frac{1}{56}$

In this case the attacker knows, before the run of the program, that 0 is much more likely than any other number to be the secret, so the amount of information revealed by running the program is below 3 bits (below capacity). In fact, we have

$H\!\left(\frac{7}{8}, \frac{1}{56}, \ldots, \frac{1}{56}\right) = 0.8944838$

Notice that, whatever the distribution, the security of this program is 0 and its leakage ratio is 1.

A bounded covert channel with constant rate. The next example is a loop with a decreasing counter and a slightly different guard expression:

l=20; while (h < l) { l=l-1 }

Again, since the body of the loop does not contain any high variable, the body part of the leakage is 0:

$\lim_{n \to \infty} H(\Pi(P) \mid \mathit{NIt}^n_P) = 0$
Thus we only need to study the leakage of the guard. After executing the program, l will be 20 if h ≥ 20 and will be h if 0 ≤ h < 20, i.e. h will be revealed if its value is in the interval 0 . . . 19. The events associated to $\mathit{NIt}^n_P$ are:

$(\mathit{NIt}^n_P = i) = \begin{cases} h \ge 20 & \text{if } i = 0 \\ h < 20 - (i - 1) \wedge h \ge 20 - i \;\equiv\; h = 20 - i & \text{if } i > 0 \end{cases}$

and

$p(\mathit{NIt}^n_P = i) = \begin{cases} \frac{2^k - 20}{2^k} & \text{if } i = 0 \\ \frac{1}{2^k} & \text{if } 0 < i \le 20 \\ 0 & \text{if } i > 20 \end{cases}$
The leakage is then given by

$\lim_{n \to \infty} H(\mathit{NIt}^n_P) = H\!\left(\frac{2^k - 20}{2^k}, \frac{1}{2^k}, \ldots, \frac{1}{2^k}, 0, \ldots, 0\right) = -\frac{2^k - 20}{2^k}\log\!\left(\frac{2^k - 20}{2^k}\right) - 20\left(\frac{1}{2^k}\log\!\left(\frac{1}{2^k}\right)\right)$
This function is plotted in Figure 2 for k = 6 . . . 16. The interesting element in the graph is how it shows that for k around 6 bits the program is unsafe (more than 2.2 bits of leakage), whereas for k from 14 upwards the program is safe (around 0 bits of leakage).
Fig. 2. Leakage in l=20; while (h < l) {l=l-1}
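The plotted function is easy to reproduce; a minimal sketch (ours) evaluating the closed-form leakage for the same range of k:

from math import log2

def leakage(k):
    n = 2 ** k
    p0 = (n - 20) / n   # probability of 0 iterations (h >= 20)
    return -p0 * log2(p0) - 20 * (1 / n) * log2(1 / n)

for k in range(6, 17):
    print(k, round(leakage(k), 4))
# k = 6 gives ~2.25 bits (unsafe), k = 16 gives ~0.005 bits (safe)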
However, the uniform distribution is not the channel distribution. The capacity of this channel is 4.3923 and is achieved by the distribution where the only values with non-zero probability for h are in the range {0 . . . 20}, uniformly distributed⁵. The channel distribution ignores values of h higher than 20, so the channel rate is constant: 4.3923/21 = 0.2091. We conclude that this is a bounded covert channel with constant rate.

⁵ We are ignoring the case k < 5, where the capacity is less than 4.3923.
6 Loops in the Lattice of Information
We now show how the previous analysis of loops is naturally interpreted in the lattice of information. In informal terms, the key result is that the leakage of a loop is the semivaluation of the l.u.b. of a chain of points in the lattice of
information, where the chain is the interpretation of the different iterations of the loop. To understand the ideas let's consider again the program

l=0; while(l < h) { if (h==2) l=3 else l++ }

and let us now study the partitions it generates. The loop terminating in 0 iterations will reveal that h=0, i.e. the partition W0 = {{0}{1, 2, 3}}; termination in 1 iteration will reveal h=1 if the output is 1 and h=2 if the output is 3, i.e. W1 = {{1}{2}{0, 3}}; the loop will never terminate in 2 iterations, i.e. W2 = {{0, 1, 2, 3}}; and termination in 3 iterations will reveal that h=3 given the output 3, i.e. W3 = {{3}{0, 1, 2}}. Let's define $W_{\le n}$ as $\bigsqcup_{n \ge i \ge 0} W_i$; we then have

$W_{\le 1} = W_{\le 2} = W_{\le 3} = \{\{0\}\{1\}\{2\}\{3\}\}$

We also introduce an additional partition C to cater for the collisions in the loop: the collision partition is C = {{0}{1}{2, 3}}, because for h=2 the loop terminates with output 3 in 1 iteration and for h=3 the loop terminates with output 3 in 3 iterations. Hence

$H\big(\textstyle\bigsqcup_{n \ge 0} W_{\le n} \sqcap C\big) = H(\{\{0\}\{1\}\{2, 3\}\})$

Notice now that the analytic and the lattice interpretation give the same result: assuming uniform distribution we get

$\underbrace{H(0.25, 0.5, 0.25)}_{\text{guard}} + \underbrace{0.5\, H(0.5, 0.5)}_{\text{body}} - \underbrace{0.5\, H(0.5, 0.5)}_{\text{collisions}} = 1.5 = H(\{\{0\}\{1\}\{2, 3\}\})$

The above is not a coincidence; using the lattice of information we can relate this analytic formula to the join semivaluation of lattice chains. We can interpret looping programs in the lattice of information as least upper bounds of increasing sequences; for some loops (those with collisions) this is not immediately true: we will show however that all loops can be interpreted as the meet of the l.u.b. of an increasing sequence and a point in the lattice representing the collisions. The sketch below replays this small example with explicit partition operations.
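A sketch (ours) of the lattice reading of the same loop: join the per-iteration partitions W_i, meet the result with the collision partition C, and take the entropy of the meet.

from itertools import product
from math import log2

def join(A, B):
    return {frozenset(a & b) for a, b in product(A, B) if a & b}

def meet(A, B):
    blocks = [set(b) for b in list(A) + list(B)]
    changed = True
    while changed:
        changed = False
        for i in range(len(blocks)):
            for j in range(i + 1, len(blocks)):
                if blocks[i] & blocks[j]:
                    blocks[i] |= blocks.pop(j)
                    changed = True
                    break
            if changed:
                break
    return {frozenset(b) for b in blocks}

def entropy(partition, total=4):
    return -sum(len(b) / total * log2(len(b) / total) for b in partition)

W0 = {frozenset({0}), frozenset({1, 2, 3})}
W1 = {frozenset({1}), frozenset({2}), frozenset({0, 3})}
W2 = {frozenset({0, 1, 2, 3})}
W3 = {frozenset({3}), frozenset({0, 1, 2})}
C = {frozenset({0}), frozenset({1}), frozenset({2, 3})}

lub = join(join(join(W0, W1), W2), W3)   # {{0},{1},{2},{3}}
print(entropy(meet(lub, C)))             # 1.5, matching the analytic result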
6.1 Algebraic Interpretation
Given a loop W, let Wn be the program W up to the n-th iteration. The random variable associated to Wn is hence a partition where only the outputs of W up to the n-th iteration are distinguished. Hence Wn+1 will refine Wn by introducing additional blocks. As a simple example of a collision-free program consider the "linear search" program P below:
l=0; while (l != h) { l=l+1 }

The associated chain of partitions (Wi)i≥0 is increasing, and its least upper bound is reached at some finite n, because Wi+1 destructively refines or "splits" a finite block of Wi into smaller equivalence classes.
6.2 Loops with Collisions
Let us look at the colliding program shown in Figure 3. It consists of two iterations, represented by functions f1 and f2. The exact partition for this program is

P = {{a, a′}, {x, x′, y}, {c}}

The chain of partitions associated to the program is the following:

W1 = {{a, a′}, {x, x′}, {y, c}}
W2 = {{a, a′}, {x, x′, y}, {c}}
Fig. 3. Two iterations with one collision at b′ (f1 maps a, a′ to b and x, x′ to b′; f2 maps y to b′ and c to b′′)
We see that W2 extends the block containing x, x′ with y because all three of them have the same image b′. This reflects the idea of collisions, namely that two (or more) elements of the codomains of two different iteration functions, here f1 and f2, coincide. The result is that their inverse images are indistinguishable from one another and therefore end up being in the same block, here {x, x′, y}. Then W2 is equal to P. However, because W2 extends a block in W1, this is not an ascending chain anymore; in fact, by choosing a distribution assigning probability 0 to c, we can see that H(W1) > H(W2), and therefore Theorem 3 fails in the case of collisions.

To address this problem we first introduce a trick to transform a sequence of partitions into an ascending chain of partitions: given a sequence of partitions (Wi)i≥0, define the sequence (W≤i)i≥0 by

$W_{\le i} = \bigsqcup_{j \le i} W_j$

It is easy to see that (W≤i)i≥0 is an increasing chain. Define now the collision equivalence of a loop W as the reflexive and transitive closure of the relation σ C σ′ iff σ, σ′ generate the same output from different iterations. We are now ready to relate the leakage of arbitrary loops with semivaluations on LoI.

Theorem 4. The leakage of an arbitrary loop as in definition (12) is equivalent to semivaluating the meet of the least upper bound of its increasing chain W≤n and its collision partition C, i.e.

$\lim_{n \to \infty} W(e, M)_n - H(\mathit{NIt}_P \mid \Pi(P)) = H\big(\textstyle\bigsqcup_{n \ge 0} W_{\le n} \sqcap C\big)$
Proof: Notice first that increasing chains $x_n$ with a maximal element in a lattice do distribute, i.e.:

$\big(\textstyle\bigsqcup_{n \ge 0} x_n\big) \sqcap y = \textstyle\bigsqcup_{n \ge 0} (x_n \sqcap y)$

Assuming distributivity the argument is then easy to show:

$\big(\textstyle\bigsqcup_{n \ge 0} W_{\le n}\big) \sqcap C = \textstyle\bigsqcup_{n \ge 0} (W_{\le n} \sqcap C)$
Notice now that $(W_{\le n} \sqcap C)_{n \ge 0}$ is a chain cofinal to the sequence $(W_n)_{n \ge 0}$, and so we can conclude that $\bigsqcup_{n \ge 0} (W_{\le n} \sqcap C)$ is the partition whose semivaluation corresponds to W(e, M).

Notice the generality of the lattice approach: we can replace Shannon entropy H with any real valued map F from the lattice of information, and we get a definition of leakage for loops as follows:

$F\big(\textstyle\bigsqcup_{n \ge 0} (W_n \sqcap C)\big)$
7 Automation
By now it is clear that a central ingredient to quantifying information flows in programs is the partitioning of the secret space into indistinguishable subsets, i.e. equivalence classes. One equivalence class contains all inputs which lead to the output described by the equivalence class. Terauchi and Aiken [44] describe the crucial insight into automatically quantifying information flows by stating that a program with secure information flows satisfies the 2-safety property. This means that insecure information flows in a program can be detected by observing two finite traces of the program which lead to a distinction in the outputs from related inputs. Figure 4 describes this situation, where each oval describes an equivalence class and the four dots inside the top figure are elements of the secret space. Let us take the top partition as an initial partition of the secret and the bottom partition as the "output" partition generated by a program. Under this setup, the arrow to B from the first equivalence class represents a violation of the 2-safety property: two initially indistinguishable secret elements are now in distinct equivalence classes A and B. Checking the initial partition for every such violation is equivalent to describing the "output" partition. Given that partition, the quantification is simply achieved by applying different entropy measures to it, as described in previous sections. Thus, the question any automatic technique has to address in one way or another is how to find the "output" partition given a program and an initial secret partition (usually the ⊥ partition with only one equivalence class). The next
Fig. 4. Distinction in class B as Non-Interference violation (the "output" partition has classes A, B, C; two initially indistinguishable secrets end up in A and B)
sections describe different approaches to solving this problem, starting with a more thorough description of our own tool AQuA (which is partially inspired by the tool described in Section 7.2) and then reviewing other existing techniques.
7.1 SAT Solving and Model Counting
The computationally intensive task of AQuA is to automatically calculate the output partition given C program code. Given a program P, its partition is denoted Π(P) as defined in Section 3. Applying any measure to it, e.g. F(Π(P)), is cheap and easy in comparison to finding the partition (if the probability distribution is known). The idea behind the partition discovery is best explained using the recurring password example, with 4-bit variable width and the secret input variable pwd:

if(pwd == 4) { return 1 } else { return 0 }

The first step of the method is to find a representative input for each possible output. In our case, AQuA could find the set {4, 5}, for outputs 1 and 0 respectively. This is accomplished using a SAT-based fixed point computation. The next step runs on that set of representative inputs: for each input in the set, the number of inputs which lead to the same output is counted. This step is accomplished using model counting. The next section describes these two steps in more detail.

Method. The method consists of two reachability analyses, which can be run either one after another or interleaved. The first analysis finds a set of inputs for which the original program produces pairwise distinct outputs. That set has the cardinality of the number of possible outputs of the program. The second analysis counts, for each member of that set, the set of all inputs which lead to the same output. Together, these two analyses allow us to discover the partition of the input space according to the program's outputs.
Input: P≠   Output: S_input
S_input ← ∅
h ← random
S_input ← S_input ∪ {h}
while P≠(h) not unsat do
    (l′, h′) ← Run SAT solver on P≠(h)
    S_input ← S_input ∪ {h′}
    h ← h′
    P≠ ← P≠ ∧ l ≠ l′
end

Algorithm 1. Calculation of S_input using P≠
To a program P we associate two modified programs, P≠ and P=, representing the two reachability questions. The two programs are defined as follows:

P≠(i) ≡ h = i; P; P′; assert(l ≠ l′)
P=(i) ≡ h = i; P; P′; assert(l = l′)

The program P is self-composed [3,44], and either low-inequality or low-equality is asserted on the output variable and its copy. The argument i is the initialisation value for the input variable. The method works on any number of input variables, but we simplify it here to a single variable. The programs P≠ and P= are unwound into propositional formulas and then translated into Conjunctive Normal Form (CNF) in a standard fashion.

P≠ is solved using a number of SAT solver calls in a standard reachability algorithm (a SAT-based fixed point calculation). Algorithm 1 describes this input discovery. In each iteration it discovers a new input h′ which does not lead to the same output as the previous input h. The new input h′ is added to the set S_input. The observable output l is added to the formula as a blocking clause, to avoid finding the same solution again in a different iteration. This process is repeated until P≠ is unsatisfiable, which signifies that the search for S_input elements is exhausted.

Given S_input (or a subset of it) as the result of Algorithm 1, we can use P= to count the sizes of the equivalence classes represented by S_input using model counting. This process is displayed in Algorithm 2 and is straightforward: the algorithm calculates the size of the equivalence class [h]_{P=} for every h in S_input by counting the satisfying models of P=(h). The output M of Algorithm 2 is the partition Π(P) of the original program P.

Proposition 8 (Correctness). The set S_input of Algorithm 1 contains a representative element for each possible equivalence class of Π(P). Algorithm 2 calculates {[s1]_{P=}, . . . , [sn]_{P=}} which, according to (9), is Π(P).

Implementation. The implementation builds on a toolchain of existing tools, together with some interfacing, language translations, and optimisations. See Figure 5 for an overview.
Input: P=, S_input   Output: M
M ← ∅
while S_input ≠ ∅ do
    h ← some s ∈ S_input
    #models ← Run allSAT solver on P=(h)
    M ← M ∪ {#models}
    S_input ← S_input \ {s}
end

Algorithm 2. Model counting of equivalence classes in S_input
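To convey the shape of the two algorithms without a SAT solver, here is a toy sketch (ours): exhaustive enumeration plays the role of both the SAT queries of Algorithm 1 and the model counting of Algorithm 2. The real tool queries Spear on the self-composed programs instead.

from collections import defaultdict

def discover_partition(program, domain):
    reps = {}                  # Algorithm 1: one representative per output
    for h in domain:
        out = program(h)
        if out not in reps:
            reps[out] = h      # a "satisfying model" for P_neq
    counts = defaultdict(int)  # Algorithm 2: count models per class
    for h in domain:
        counts[program(h)] += 1
    return [counts[out] for out in reps]

password_check = lambda pwd: 1 if pwd == 4 else 0   # the 4-bit example
print(discover_partition(password_check, range(16)))  # [15, 1] -> Π(P)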
Fig. 5. Translation steps (C code → CBMC constraints → optimisations → self-composition → language translation to Spear format → SAT on P≠, yielding S_input, and #SAT on P=, yielding the partition)
AQuA has the following main features:

– runs on a subset of ANSI C without memory allocation and with integer secret variables
– no user interaction or code annotations needed except command line options
– supports non-linear arithmetic and integer overflows

AQuA works on the equational intermediate representation of the CBMC bounded model checker [15]. C code is translated by CBMC into a program of constraints which in turn gets optimised through standard program analysis techniques into cleaned up constraints⁶. This program then gets self-composed, and user-provided source and sink variables get automatically annotated. In a next step, the program gets translated into the bit-vector arithmetic Spear format of the Spear theorem prover [1]. At this point, AQuA will spawn the two instances, P= and P≠, from the input program P.

Algorithms 1 and 2 get executed sequentially on those two program versions. However, depending on the application and the cost of the SAT queries, one could also choose to execute them interleaved, by first calculating one input to the program P≠ and then model counting that equivalence class. For Algorithm 1, Spear will SAT solve P≠ directly and report the satisfying model to the tool. The newly found inputs are stored until P≠ is reported to be unsat. For Algorithm 2, Spear will bit-blast P= down to CNF, which in turn gets model counted by either RelSat [4] or C2D. C2D is only used in case the user specifies fast model counting through command line options. While the counting is much faster on difficult problems than RelSat, the CNF instances have to be transformed into a d-DNNF tree, which is very costly in memory. This is a trade-off between time and space.

⁶ CBMC adds some constraints which distort the model counting.
Table 1. Performance examples. * 30 loop unrollings; † from [2]; counted with C2D. Machine: Linux, Intel Core 2 Duo 2GHz.

Program       #h   range    Σh bits        P≠ Time   P≠+P= Time   Spear LOC
CRC8 1h.c      1   8 bit    8              17.36s    32.68s       370
CRC8 2h.c      2   8 bit    16             34.93s    1m18.74s     763
sum3.c†        3   0...9    9.96 (10³)     0.19s     0.95s        16
sum10.c       10   0...5    25.84 (6¹⁰)    1.59s     3m30.76s     51
nonlinear.c    1   16 bit   16             0.04s     13.46s       20
search30.c*    1   8 bit    8              0.84s     2.56s        186
auction.c†     3   20 bit   60             0.06s     16.90s       42
In most instances, RelSat is fast enough, except in cases with multiple constraints on more than two secret input variables. Once the partition Π(P) is calculated, the user can choose which measure to apply.

Loops. The first step of the program transformations treats loops in an unsound way, i.e. a user needs to define a fixed number of loop unwindings. This is an inherent property of the choice of tools, as CBMC is a bounded model checker, which limits the number of loop iterations explored and thus the counterexamples that can be found. While this is a real restriction in program verification, as bugs can be missed that way, it is not as crucial for our quantification purposes: at some point Algorithm 1 discovers an input whose equivalence class contains all inputs beyond the iteration bound. Using the principle of maximum entropy, this "sink state" can be used to always safely over-approximate entropy.

Let us assume we analyse a binary search example with 15 unwindings of the loop and 8-bit variables. AQuA reports the partition

Partition: {241}{1}{1}{1}{1}{1}{1}{1}{1}{1}{1}{1}{1}{1}{1}{1}: 256

where the numbers in the brackets are the model counts. We have 15 singleton blocks and one sink block with a model count of the remaining 241 unprocessed inputs. When applying a measure, the 241 inputs can be distributed into singleton blocks as well, which over-approximates (and in this case actually exactly finds) the leakage of the input program.

Proposition 9 (Sound loop leakage). Let us assume partition Π(P)n is the result of n unwindings of P, and Π(P)m of m unwindings of P, where m ≥ n. If every element of the "sink state" block b ∈ Π(P)n is distributed into individual blocks, yielding the partition denoted Π̂(P)n, then Π(P)m ⊑ Π̂(P)n. From Proposition 3 it follows that H(Π(P)m) ≤ H(Π̂(P)n).

Experiences. Table 1 provides a small benchmark to give an idea of the programs AQuA has been tested on. The running times are split between Algorithm 1, which calculates P≠, and the total run time; the table also provides the lines of code (LOC) of each program in Spear format.
The biggest example is a full CRC8 checksum implementation where the input consists of two char variables (16 bits); it has over 700 LOC in Spear format. The run time depends on the number of secrets and their ranges, and as a result on the cardinality of the partition. The programs are available from the second author's website.
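The entropy comparison behind Proposition 9 is easy to reproduce. The following sketch is our own illustration (not part of AQuA): it computes the Shannon entropy of a partition under a uniform input distribution, both for the binary-search partition reported above and for its refinement in which the sink block is split into singletons.

from math import log2

def entropy_of_partition(blocks):
    # H(Pi(P)) under uniform inputs: a block of size b in a space of
    # size N contributes (b/N) * log2(N/b).
    n = sum(blocks)
    return sum(b / n * log2(n / b) for b in blocks)

reported = [241] + [1] * 15     # partition reported for 15 unwindings
refined  = [1] * 256            # sink block distributed into singletons

print(entropy_of_partition(reported))   # about 0.55 bits
print(entropy_of_partition(refined))    # 8.0 bits, the sound upper bound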
7.2 Model Checking and Constraint Solving
Recently, Backes, Köpf, and Rybalchenko published an elegant method to calculate and quantify an equivalence relation given a C-like program [2]. Two algorithms are described to discover and quantify the required equivalence relation. The procedure Disco starts with an equivalence relation equivalent to the ⊥ element in the lattice of information, and iteratively refines the relation by discovering pairs of execution paths which lead to a distinction in the outputs. The high inputs of those two paths are then split into two different equivalence classes. This process is repeated until no more counterexamples are discovered. The procedure Quant calculates the sizes of the equivalence classes generated by the output of the previous procedure. The result can be normalised to a probability distribution, and any probabilistic measure can be applied to it.

Disco is implemented by turning the information flow check into a reachability problem, as shown by [44]. The program P is self-composed by creating a copy P' of the code with disjoint variable sets (indicated by the primes) and an added low-inequality check at the end of the newly created program, where R is the relation to be refined:

if (l = l' && (h,h') in R)
  P(h,l)
  P'(h',l')
  if (l != l') error

If the error state is reachable, then there exist two paths of the program P with related low and high inputs which produce distinguishable outputs l and l'. This is a violation of the non-interference property and thus a leak of information. The model checker ARMC is applied to this reachability problem and outputs a path to the error label, if reachable. Besides the path, the model checker also returns a formula in linear arithmetic which characterises all initial states from which the error state is reachable. From this formula, the two previously related secrets h and h' can be extracted, which are then split into two different equivalence classes.

Given the formula from the last step, Quant calculates the number and sizes of those equivalence classes using a combination of the Omega calculator and the Lattice Point Enumeration Tool. Omega calculates for each equivalence class a linear arithmetic proposition in disjunctive normal form. The enumeration tool
then solves these systems of linear inequalities for each class, which amounts to counting the number of elements in the equivalence class. The equivalence classes so generated can then be plugged into various entropy formulas. The paper shows as an example, among others, a sum query over three secrets. The precision and scalability of the tool depend entirely on the choice of underlying tools. The runtime depends on the number of execution paths of the program under analysis and on the number of variables involved.
7.3 Abstract Interpretation
Mu and Clark use probabilistic semantics in an abstract interpretation framework to build an automatic analyser [34]. They borrow Kozen's semantics for probabilistic programs, which interprets programs as partial measurable functions on a measurable space; these semantics can be seen as a way to map an input probability distribution to an output probability distribution through the execution of the program under analysis. The entropy measure used is Shannon's entropy, extended to work on "incomplete" random variables, where the entropy is normalised by the coverage of the probability distribution.

To make the analysis tractable, they employ abstract interpretation as the abstraction technique. The interval abstract domain is used to partition the concrete measure space into blocks. Additionally, Monniaux's abstract probabilistic semantics are used to replace the concrete semantics. The abstraction overestimates the leakage through uniformalization, which provides safe upper bounds on the leakage. The concrete space X is abstracted to a set of interval-based partitions for each program variable, together with a weighting factor αi, which is the sum of the probabilities of the interval value-range. The abstract domain is described by a Galois connection X ⟨α, γ⟩ X#, where the measure space X is abstracted by X#. The abstraction function α maps X to sets of interval-based partitions X# = {⟨αi, [Ei]⟩}0<i≤n, where [Ei] is a block of the interval-based partition of X. The concretisation function γ maps X# to {x | x ∈ [Ei/η]}, where Ei is a block of the abstract object X# and η is a sub-partition of each block under the uniform distribution.

The corresponding abstract semantics function [[·]]# transforms the abstract spaces described by X#. We skip the description of the abstract operations and instead explain the effect of a conditional on the abstract domain as an example: the test splits the abstract space into two parts according to the two outcomes of the test. The if statement returns the sum of the statements in the two branches, where the new intervals of variable values are calculated using interval arithmetic. Take an example from their paper:

if (x==0) then y=0 else y=1

The analysis starts off with an initial probability distribution for x as a 3-bit variable,

x: 0 → 0.1, 1 → 0.1, 2 → 0.1, 3 → 0.1, 4 → 0.2, 5 → 0.2, 6 → 0.1, 7 → 0.1
and y as a low security variable under any distribution. An initial partition is then, for example,

E1 ⟨[0,3]x, [0,7]y⟩  α1 = 0.4
E2 ⟨[4,7]x, [0,7]y⟩  α2 = 0.6

After applying the abstract operation for the if statement, the abstract domain is transformed to

E1 ⟨[0,3]x, [0,1]y⟩  α1 = 0.4
E2 ⟨[4,7]x, [1,1]y⟩  α2 = 0.6

The working of the interval arithmetic is clearly visible in the restriction of the intervals for the y variable. Leakage of loops is handled in a standard fashion using least fixpoints and interval widening; additionally, the weight on each abstract element is the maximum between the current and the previous iteration.

Once the final abstract space has been calculated, the uniformalization transformation guarantees a conservative leakage analysis; this is a process that maximises the entropy for a given abstract space. Again, let us explain this using an example. After some computation, we are given the abstract object on the left; the uniformalized probability distribution on variable l is given on the right.

E1 ⟨[2,2]y⟩  α1 = 0.3        2 → 0.3/1
E2 ⟨[3,3]y⟩  α2 = 0.4        3 → 0.4/1
E3 ⟨[4,6]y⟩  α3 = 0.3        4 → 0.3/3
                              5 → 0.3/3
                              6 → 0.3/3
Thus, the weight of each interval is divided by the size of the interval. Finally, the leakage upper bound of this space is calculated as H(0.3, 0.4, 0.3) + 0.3 · log(3).

This work is the first description of an abstract domain for quantitative information flow. The precision of the analysis is clearly limited by the precision of the interval arithmetic and of the uniformalization. The scalability of the analysis has not been discussed by the authors.
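The uniformalization bound just described is mechanical to compute; the sketch below is our own illustration of the calculation, with the abstract blocks passed as (weight, interval size) pairs.

from math import log2

def uniformalized_bound(blocks):
    # Leakage upper bound: the block weights contribute H(alpha_1,...,alpha_n),
    # and each block's weight is spread uniformly over its interval.
    h_blocks = -sum(w * log2(w) for w, _ in blocks)   # H(0.3, 0.4, 0.3) part
    spread = sum(w * log2(s) for w, s in blocks)      # intra-block uniform entropy
    return h_blocks + spread

# The example above: weights 0.3, 0.4, 0.3 over intervals of sizes 1, 1, 3.
print(uniformalized_bound([(0.3, 1), (0.4, 1), (0.3, 3)]))  # about 2.05 bits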
7.4 Network Flow Capacity and Dynamic Analysis
McCamant and Ernst [30] present a technique to analyse information leakage in large (for this field) C, C++ and Objective C programs. They released the tool Flowcheck, which computes leakage as the maximum flow between inputs and outputs, using a combination of network flow capacities and dynamic binary analysis. The basic tool to model programs is a network flow graph, which represents the execution of a program in a form similar to a circuit. Edges represent values and have their secret bitwidths (how many secret bits can be transferred by that edge) as capacities. Nodes represent basic operations on those values. Implicit flows, generated by branches and pointers, are integrated into this model through what the authors call an enclosed region. Such a region is an annotation in the code which abstracts away a block of the program into a single
node with given inputs and outputs. Those annotations can be partly inferred; some need manual editing of the source code. The calculation of leakage then reduces to checking the maximum flow in this network. The tool was applied to a number of large programs, such as the OpenSSH client, an X server, and the ImageMagick tool. Their approach is interesting as it is not based on reachability, like most other analyses, and because it reaches a high level of scalability by making use of dynamic instrumentation tools, such as Valgrind.

Chatzikokolakis, Chothia, and Guha [7] automatically quantify information flows using mutual information; however, their knowledge about the distribution stems from sampled data instead of predetermined assumptions. They also prove error bounds for their channel capacity estimate. The system under analysis is treated as a black box, and the leakage is estimated purely from trial runs and their outputs.
8 Advanced Techniques: Channel Capacity Using Lagrange Multipliers
This section, based on [11,27], introduces into the quantitative information flow context a useful technique for answering questions of the following kind: what is the maximum or the minimum of a function? In our case the relevance of this question is that once we see information leakage as a function, we can answer questions like "what is the maximum leakage in a system?". The technique we use to answer the question is the Lagrange multipliers method. This method is very general, and so we use a more general framework than the one we have seen so far, so as to include probabilistic systems and thus to model, for example, anonymity protocols. One consequence of this framework is that we no longer rely on the lattice of information, because observations can no longer be considered as partitions.

We define Leakage Information Channels as triples ⟨A, O, φ⟩, where A is a set of secrets (or anonymous events) and O is a set of observations. To introduce probabilities we associate the random variables h and O with A and O respectively. Then, φ expresses the conditional probabilities between the two random variables. The "secret" in this model is the information of which event in A (i.e. which input) caused the observed observation in O. We denote members of A as hi ∈ A.

The above triple can be represented by the protocol matrix shown in Table 2. Rows describe elements of A, columns describe elements of O, and the value at position (hi, ok) is the conditional probability φk,i. This is the chance of observing ok given hi as input.
Table 2. Protocol matrix

        o1      o2      ...   on
h1      φ1,1    φ2,1    ...   φn,1
h2      φ1,2    φ2,2    ...   φn,2
...     ...     ...     ...   ...
hm      φ1,m    φ2,m    ...   φn,m
This model, based on [8], generalizes the previous language-based models and allows, for example, quantifying the loss of anonymity in protocols. Deterministic programs are modeled by systems where each φk,i is either 0 or 1 (a secret uniquely determines an observation). Notice that the system is given by the conditional probabilities; hence maximum leakage will only be determined by the choice of the distribution on the secret that makes the system leak the most.

As usual, leakage is defined as the reduction in uncertainty about the secret given the observations; for simplicity we now ignore the low inputs, hence we can define leakage as the mutual information between secrets and observations, I(h; O). The channel capacity problem is hence to find the distribution on h which maximizes the above function, i.e.

max I(h; O)

which is a constrained maximization problem. In fact, implicit behind information-theoretical functions there are constraints, the usual one being that the distribution is a set of numbers with sum 1, i.e. we have

max I(h; O)  subject to  Σi hi = 1

Our problem in general will hence be to maximize a function f subject to a family of constraints (gi)i∈I:

max f(x)  subject to  (gi(x) = bi)i∈I
8.1 Lagrange Method
We will illustrate the use of the Lagrange method by a simple example below. For formal definitions and a tutorial, we refer the reader to the literature [46].

A simple example. Suppose we want to maximize the following function:

10 − (x − 5)² − (y − 3)²

It is easy to see that the maximum is achieved by x = 5, y = 3.
Now a constraint x + y = 1 is added to the above problem, and the above solution is no longer valid. The Lagrange multiplier method combines the original function and the constraint in a new function

10 − (x − 5)² − (y − 3)² + λ(x + y − 1)

where λ is a number indicating the weight associated with the constraint; for example, ignoring the constraint is equivalent to setting λ = 0. The term λ is the Lagrange multiplier, and the Lagrange technique consists in finding the maximum of the function

10 − (x − 5)² − (y − 3)² + λ(x + y − 1)

by differentiating in x, y and λ. In this example the derivatives generate the equations:

−2x + 10 + λ = 0,  −2y + 6 + λ = 0,  x + y − 1 = 0

The first two equations imply x = y + 2, and by replacing this in the last equation we get

y + 2 + y = 1, i.e. y = −1/2

It is then easy to derive the values of the other variables, i.e.

x = 3/2,  λ = −7
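The same stationary point can be found mechanically. A short sketch, assuming the SymPy library is available, differentiates the Lagrangian and solves the resulting system:

import sympy as sp

x, y, lam = sp.symbols('x y lam')
f = 10 - (x - 5)**2 - (y - 3)**2       # objective function
L = f + lam * (x + y - 1)              # Lagrangian with constraint x + y = 1

# Stationarity: all partial derivatives of L vanish.
sol = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)[0]
print(sol)            # {x: 3/2, y: -1/2, lam: -7}
print(f.subs(sol))    # -29/2, i.e. -14.5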
Now the values x = 3/2, y = −1/2 do satisfy the constraint. They are also the values that maximize the original function 10 − (x − 5)² − (y − 3)² among all values satisfying the constraint. The function evaluated at this point has value −14.5. If we take other values satisfying x + y = 1 we can only get lower results, e.g. (0.5, 0.5) results in −16.5 and (1, 0) results in −15.

Lagrange Theorem. In a general setting, let L(x, λ) be the Lagrangian of a function f subject to a family of constraints C1≤i≤m (where Ci ≡ gi(x) = bi), i.e.

L(x, λ) = f(x) + Σ1≤i≤m λi (gi(x) − bi)

The basic result justifying Lagrange multipliers is the following theorem:

Theorem 5. Assume the vector x* = (x*1, ..., x*n) maximizes (or minimizes) the function f(x) subject to the constraints (gi(x) = bi)1≤i≤m. Then either
1. the vectors (∇gi(x*))1≤i≤m are linearly dependent, or
2. there exists a vector λ* = (λ*1, ..., λ*m) such that ∇L(x*, λ*) = 0, i.e.

(∂L/∂xi (x*, λ*) = 0)1≤i≤n,  (∂L/∂λi (x*, λ*) = 0)1≤i≤m,

where ∇ is the gradient.

The reverse implication of the theorem is valid when some additional properties are satisfied; overall, a maximum is obtained when f is concave and a minimum when f is convex. The previous example is obtained by the following instantiations:

f(x1, x2) = 10 − (x1 − 5)² − (x2 − 3)²,  C ≡ x1 + x2 = 1
Channel Capacity for Leakage Information Channels. The following result, from [11], solves the maximization problem for Leakage Information Channels. We assume that the constraints are "statistics" or expectations⁷, i.e. linear expressions Ci of the form

Σj hj fj,i = Fi

For example, the constraint Σj hj = 1 is given by choosing all fj,i = 1 and Fi = 1. We denote by Ôi the set of observations associated with hi, i.e.

Ôi = {os | φs,i ≠ 0}

We have then:

Theorem 6. The probabilities hi maximizing I(h; O) subject to the family of constraints (Ck)k∈K are given by solving in hi the equations

Σ_{os∈Ôi} φs,i ln(φs,i / os) − 1 + Σk λk fi,k = 0

and the constraints (Ck)k∈K.

Proposition 10. The channel capacity is given by

d · Σi hi (1 − Σk λk fi,k)

where the hi's are given by Theorem 6. Moreover, in the case of the single constraint Σi hi = 1 the above can be simplified to d(1 − λ0), where d = 1/ln 2.

Proofs and further details about the above results are in [11].

⁷ More generally, non-linear constraints need to satisfy additional properties of concavity (convexity) for the theory to work.
As an example of application of the above results, let us study the following well-known channel.

Example: binary symmetric channel. Consider the classic binary symmetric channel ([17], p. 186) where there are two values 0, 1 for the secret and two possible observations 0, 1; the probability of the secret being equal to the observation is 1 − p, while the probability of the secret being different from the value observed is p:

φ0,0 = φ1,1 = 1 − p,  φ0,1 = φ1,0 = p

Using Σi hi φk,i = ok we get

o0 = (1 − p)h0 + p h1,  o1 = p h0 + (1 − p)h1

Then using Theorem 6 we have the equation system:

−(1 − p) ln(o0 / (1 − p)) − p ln(o1 / p) − 1 + λ0 = 0
−p ln(o0 / p) − (1 − p) ln(o1 / (1 − p)) − 1 + λ0 = 0
By solving it we end up with

h0 = h1 = 1/2,  λ0 = ln(1/2) − p ln(p) − (1 − p) ln(1 − p) + 1
The channel capacity is then d(1 − λ0) = 1 − H(p), which coincides with the classical results on binary symmetric channels [17].

A further application of the theory is in generalizing previous results by [8], which characterized anonymity protocols as noisy channels.

Deriving Chatzikokolakis, Palamidessi and Panangaden's Theorem. We are now going to show how the following result from [8] is a special case of Theorem 6.

Theorem 7. Given a protocol described by a weakly symmetric matrix, its channel capacity is given by

C = ps log(|Os| / ps) − H(rs)

where Os is the set of symmetric output values, rs the symmetric part of a row of the matrix, and ps is the sum of rs.
By Theorem 6 the probabilities for the channel capacity are given by solving in hi the equations

Σ_{os∈Ôi} φs,i ln(φs,i / os) − 1 + Σk λk fi,k = 0        (17)
In our setting, a weakly symmetric matrix in the sense of [8] means that there exists a subset of indices K such that for any k ∈ K and for all i, j, φk,i = φk,j (this set is denoted by On in [8]), and such that for all other indices s ∉ K and for all i, j, (φs,i)s∉K is a permutation (with no 0 element) of (φs,j)s∉K: these are the "symmetric output values". To use the same notation as [8], we write rs for (φs,i)s∉K and ps for Σs∉K φs,i. The above conditions also imply that for all i, j, Ôi = Ôj; we denote this (unique) set as Ô. As in [8], assuming that there are no additional constraints apart from Σi hi = 1, equation (17) becomes

Σ_{os∈Ô} φs,i ln(φs,i / os) − 1 + λ0 = 0        (18)
Using the fact that s ∈ K ⇒ φs,i = p(os | hi) = os, it is easy to show that

Σ_{os∈Ô} φs,i ln(φs,i / os) − 1 + λ0 = −Σs∉K φs,i ln(os) − ln(2) H(rs) − 1 + λ0
where ln(2) converts the log in the entropy formula into the natural logarithm ln. We hence derive the system of equations

(−Σs∉K φs,i ln(os) = ln(2) H(rs) + 1 − λ0)i∈N

Noticing that the right-hand side is a constant, and that for all i, j, (φs,i)s∉K is a permutation of (φs,j)s∉K, we deduce that

∀i, j ∉ K, oi = oj

and since ps = Σs∉K φs,j we derive

∀s ∉ K, os = ps / k,  where k = |{s ∉ K}|
We have hence the equation

Σs∉K φs,i ln(k / ps) = ln(2) H(rs) + 1 − λ0

i.e.

ps ln(k / ps) − ln(2) H(rs) = 1 − λ0        (19)
Using Proposition 10 and replacing 1 − λ0 in d(1 − λ0) with the left-hand side of equation (19), we finally arrive at

(1 / ln(2)) (ps ln(k / ps) − ln(2) H(rs)) = ps log(k / ps) − H(rs)

which is Theorem 7 (with k = |Os|).

However, if we consider protocols which can be represented by weakly symmetric matrices but whose inputs have some constraints in addition to Σi hi = 1, then Theorem 7 is no longer valid. Recall the derivation of the system of equations

(−Σs∉K φs,i ln(os) = ln(2) H(rs) + 1 − Σk λk fk,i)i∈N

The right-hand side of the equation is no longer a constant; in particular we cannot derive that

∀i, j ∉ K, oi = oj

Therefore Theorem 7 collapses.
8.2 Dining Cryptographers
To show how Theorem 6 handles anonymity protocols, let us consider the Dining Cryptographers protocol (a more extensive treatment of anonymity protocols is provided in [11]). It is shown in [11] that the channel capacity is given by solving the following equations:

−a ln(oNYY/a) − b ln(oYYN/b) − B − b ln(oYNY/b) − 1 + λ0 = 0
−b ln(oNYY/b) − b ln(oYYN/b) − B − a ln(oYNY/a) − 1 + λ0 = 0
−b ln(oNYY/b) − a ln(oYYN/a) − B − b ln(oYNY/b) − 1 + λ0 = 0

where

– oNYY is the probability of observing the first cryptographer saying "disagree" and the second and the third saying "agree" (similarly for the others, e.g. oYYN is "agree, agree, disagree"),
– a = p³ + (1 − p)³ and b = p²(1 − p) + (1 − p)²p (with p the probability of the coin flip giving outcome head),
– the term B = −b ln(oNNN/b) = −b ln((b h1 + b h2 + b h3)/b) = −b ln(1) = 0 (because h1 + h2 + h3 = 1) and can be eliminated.

Notice that there is only one λ-term in these equations, λ0, corresponding to the fact that only one constraint is considered, namely Σi hi = 1.

Let us start with an example where the protocol provides perfect anonymity. This is the case if the coin toss is fair, i.e. p = 1/2 and therefore a = b = 1/4. As a result, the three equations reduce to one:

ln(h1 + h2 + h3) − 1 + λ0 = 0

and because of the constraint mentioned above, h1 + h2 + h3 = 1, we necessarily get λ0 = 1. Now we have the means to calculate the channel capacity by Proposition 10: plugging in the values of λ0 and hi, we conclude that the channel capacity is 0. Hence there is no loss of anonymity.

Let us look at another example, where one of the extremal cases is considered, i.e. p = 0 or p = 1; this results in a = 1 and b = 0. The three equations above reduce to

−ln(hi) − 1 + λ0 = 0

This system has only one solution, namely

h1 = h2 = h3 = 1/3

which is the channel distribution resulting in the maximal loss of anonymity:

I(h; O) = log 3 bits

In this case the system reveals which of the three cryptographers pays the bill. In general, given p, the above channel distribution and the only constraint Σi hi = 1, the channel capacity of the Dining Cryptographers protocol can be calculated as

(1 − p + p²) log 3 − (1 − 3p + 3p²) log((1 − p + p²)/(1 − 3p + 3p²)) − 2(p − p²) log((1 − p + p²)/(p − p²))
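This closed form is easy to plot or spot-check numerically. The sketch below is our own illustration (assuming NumPy); it evaluates the capacity as a function of p and confirms the two cases discussed above: zero capacity for a fair coin and log 3 bits for a deterministic one.

import numpy as np

def dc_capacity(p):
    # Capacity (bits) of the three-cryptographer Dining Cryptographers
    # channel with coin bias p, using the closed form derived above.
    a = 1 - 3*p + 3*p**2          # = p^3 + (1-p)^3, always > 0
    b = p - p**2                  # = p^2 (1-p) + (1-p)^2 p
    s = 1 - p + p**2              # = a + 2b
    cap = s * np.log2(3) - a * np.log2(s / a)
    if b > 0:                     # the 2b term vanishes in the limit b -> 0
        cap -= 2 * b * np.log2(s / b)
    return cap

print(dc_capacity(0.5))    # 0.0      : fair coin, perfect anonymity
print(dc_capacity(0.0))    # 1.585... : log 3, payer fully revealed
print(dc_capacity(0.25))   # a value between the two extremes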
9 Literature Review
A growing number of papers are being published on the topic of quantitative information flow (from now on QIF). This section tries to summarise most of
the important works, focussing on language-related approaches using information theory. It is structured in a bottom-up and chronological fashion. For a more general information flow review, readers can refer to [37]; also notice that cryptography and information theory is a large topic not covered here; a review of that literature can be found in [29].
9.1 QIF Fundamentals/Foundation
In [20], Denning first suggested a definition of information flow for programs based on information theory. Given two program variables x and y in a program P and two states s, s' of P, Denning connects the flow of information from the variable x at s, denoted xs, to the variable y at s' with the conditional entropy as follows:

H(xs | ys') < H(xs | ys)

Consequently, the flow of information is the difference in uncertainty between the two entropies, i.e.

H(xs | ys) − H(xs | ys')

A major merit of Denning's work has been to explore the use of information theory as the basis for a quantitative analysis of information flow in programs.

Millen [33] established a formal correspondence between noninterference and mutual information. He proves that the notion of non-interference in the state machine model is equivalent to the mutual information between random variables representing certain inputs and outputs being equal to zero. He uses this equivalence to measure interference in state machine systems, in particular to study the channel capacity of covert channels.

McLean introduced a Flow Model in [32] with a notion of time included, to better track correlations between high and low inputs. In this model, a system is only secure if a low-level input at time t is independent of all previous high-level inputs, given all previous low-level inputs. This research is inspired by a more restrictive non-deducibility model by Sutherland [42].

In [31] McIver and Morgan put forward an information-theoretic security condition based on measurement of information flow for a sequential programming language enriched with probabilities. The context of their discussion is program refinement, and the paper establishes a number of equivalent conditions for a sequential-language program to meet their security condition. Their security condition seeks to prevent any change in the knowledge low has of high through the operation of the program.

In a series of works, Clark, Hunt, and Malacaria quantify the interference between high and low program variables. In [12] the authors define leakage as the information a passive attacker can learn about a high input by observing the low outputs of a program. They adopt mutual information as the measure of leakage, as proposed by Gray [21], and show how in their deterministic setting this reduces to the conditional entropy of the output random variable given the low inputs. In a further paper [14], the static analysis on a deterministic language is further refined with the goal of providing leakage upper bounds on
every observable variable. Leakage is calculated in a compositional way through inference rules. Malacaria provided a precise formula [25] for calculating loop leakage, as thoroughly described in this chapter. Chen and Malacaria extended the analysis to multithreaded programs [10].

Clarkson, Myers, and Schneider [16] propose a new perspective for a quantitative definition of interference. The idea is to model the attacker's belief about the secret input as a probability distribution. This belief is then revised using Bayesian techniques as the program is run. The attacker can be seen as a gambler and his belief as the amount he would be prepared to bet that the secret is such and such. This belief-revision point of view unveils a new notion of uncertainty, depending on how strongly the attacker believes something to be true. The authors suggest that this belief-based system is a more adequate way to quantify information flows between variables in programs than the reduction-of-uncertainty view of the previously described models.

Smith [41] proved that if the adversary has only one try to attack the system, then mutual information is not a good measure to assess the leakage in a system. The issue of which measure to use when has been addressed in Section 4. Motivated by this research, Braun, Chatzikokolakis, and Palamidessi [6] considered two notions of leakage related to Bayes risk.

Mu and Clark [34] provide an abstract interpretation framework based on probabilistic semantics for a simple deterministic language, with the aim of automatically calculating the leakage of programs as described in [25]. To scale to larger state spaces they use an interval abstract domain.
9.2 Information Theory
All of this research is based on information theory. We pay tribute at least to the following people (and many more) for providing the basis of our research. The field of information theory started with the groundbreaking work of Shannon [38]. Rényi [36] generalised Shannon's measure, and Nakamura [35] proved the connection between the lattice of partitions and Shannon's entropy. The introduction of guessing entropy is attributed to Massey [28]. Yeung [43] describes the analogy between set theory and information theory.
References

1. Babić, D., Hutter, F.: Spear Theorem Prover. In: Proc. of the SAT 2008 Race (2008)
2. Backes, M., Köpf, B., Rybalchenko, A.: Automatic Discovery and Quantification of Information Leaks. In: Proc. 30th IEEE Symposium on Security and Privacy, S&P 2009 (2009) (to appear)
3. Barthe, G., D'Argenio, P.R., Rezk, T.: Secure Information Flow by Self-Composition. In: CSFW 2004: Proceedings of the 17th IEEE Workshop on Computer Security Foundations (2004)
4. Bayardo, R., Schrag, R.: Using CSP look-back techniques to solve real-world SAT instances. In: Proc. of AAAI 1997, pp. 203–208. AAAI Press/The MIT Press (1997)
5. Birkhoff, G.: Lattice Theory. Amer. Math. Soc. Colloq. Publ. 25 (1948)
6. Braun, C., Chatzikokolakis, K., Palamidessi, C.: Quantitative notions of leakage for one-try attacks. In: Proceedings of MFPS 2009. ENTCS, vol. 248, pp. 75–91. Elsevier, Amsterdam (2009)
7. Chatzikokolakis, K., Chothia, T., Guha, A.: Statistical Measurement of Information Leakage. In: Esparza, J., Majumdar, R. (eds.) TACAS 2010. LNCS, vol. 6015, pp. 390–404. Springer, Heidelberg (2010)
8. Chatzikokolakis, K., Palamidessi, C., Panangaden, P.: Anonymity protocols as noisy channels. Information and Computation 206(2-4), 378–401 (2008)
9. Cachin, C.: Entropy Measures and Unconditional Security in Cryptography. PhD thesis, Swiss Federal Institute of Technology (1997)
10. Chen, H., Malacaria, P.: Quantitative analysis of leakage for multi-threaded programs. In: PLAS 2007: Proceedings of the 2007 Workshop on Programming Languages and Analysis for Security (2007)
11. Chen, H., Malacaria, P.: Quantifying Maximal Loss of Anonymity in Protocols. In: Proceedings of the ACM Symposium on Information, Computer and Communications Security (2009)
12. Clark, D., Hunt, S., Malacaria, P.: Quantitative Analysis of the Leakage of Confidential Data. In: QAPL 2001, Quantitative Aspects of Programming Languages. ENTCS, pp. 238–251 (2002)
13. Clark, D., Hunt, S., Malacaria, P.: A static analysis for quantifying information flow in a simple imperative language. Journal of Computer Security 15(3) (2007)
14. Clark, D., Hunt, S., Malacaria, P.: Quantitative information flow, relations and polymorphic types. Journal of Logic and Computation, Special Issue on Lambda-calculus, Type Theory and Natural Language 18(2), 181–199 (2005)
15. Clarke, E., Kroening, D., Lerda, F.: A Tool for Checking ANSI-C Programs. In: Jensen, K., Podelski, A. (eds.) TACAS 2004. LNCS, vol. 2988, pp. 168–176. Springer, Heidelberg (2004)
16. Clarkson, M.R., Myers, A.C., Schneider, F.B.: Quantifying information flow with beliefs. J. Comput. Secur. 17(5), 655–701 (2009)
17. Cover, T.M., Thomas, J.A.: Elements of Information Theory. John Wiley, Chichester (1991)
18. Köpf, B., Basin, D.: An information-theoretic model for adaptive side-channel attacks. In: CCS 2007: Proceedings of the 14th ACM Conference on Computer and Communications Security, pp. 286–296 (2007)
19. Denning, D.E.: A lattice model of secure information flow. Commun. ACM 19(5), 236–243 (1976)
20. Denning, D.: Cryptography and Data Security. Addison-Wesley Longman Publishing Co., Inc., Boston (1982)
21. Gray III, J.W.: Toward a mathematical foundation for information flow security. In: Proc. 1991 IEEE Symposium on Security and Privacy, Oakland, CA, pp. 21–34 (1991)
22. Heusser, J., Malacaria, P.: Applied Quantitative Information Flow and Statistical Databases. In: Proceedings of the Workshop on Formal Aspects in Security and Trust, FAST 2009 (2009)
23. Goguen, J.A., Meseguer, J.: Security policies and security models. In: Proceedings of the 1982 IEEE Computer Society Symposium on Security and Privacy (1982)
24. Landauer, J., Redmond, T.: A Lattice of Information. In: Proc. of the IEEE Computer Security Foundations Workshop. IEEE Computer Society Press, Los Alamitos (1993)
25. Malacaria, P.: Assessing security threats of looping constructs. In: Proc. ACM Symposium on Principles of Programming Languages (2007)
26. Malacaria, P.: Risk Assessment of Security Threats for Looping Constructs. Journal of Computer Security 18(2) (2010)
27. Malacaria, P., Chen, H.: Lagrange Multipliers and Maximum Information Leakage in Different Observational Models. In: ACM SIGPLAN Third Workshop on Programming Languages and Analysis for Security (June 2008)
28. Massey, J.L.: Guessing and entropy. In: Proceedings of the 1994 IEEE International Symposium on Information Theory, p. 204 (1994)
29. Maurer, U.M.: The Role of Information Theory in Cryptography. In: Fourth IMA Conference on Cryptography and Coding, pp. 49–71 (1993)
30. McCamant, S., Ernst, M.D.: Quantitative information flow as network flow capacity. In: PLDI 2008: Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, Tucson, AZ, USA (2008)
31. McIver, A., Morgan, C.: A probabilistic approach to information hiding. In: Programming Methodology, pp. 441–460. Springer, New York (2003)
32. McLean, J.: Security Models and Information Flow. In: Proc. IEEE Symposium on Security and Privacy, pp. 180–187. IEEE Computer Society Press, Los Alamitos (1990)
33. Millen, J.K.: Covert Channel Capacity. In: IEEE Symposium on Security and Privacy, p. 60. IEEE Computer Society, Los Alamitos (1987)
34. Mu, C., Clark, D.: An Abstraction Quantifying Information Flow over Probabilistic Semantics. In: Workshop on Quantitative Aspects of Programming Languages (QAPL), ETAPS (2009)
35. Nakamura, Y.: Entropy and Semivaluations on Semilattices. Kodai Math. Sem. Rep. 22, 443–468 (1970)
36. Rényi, A.: On measures of information and entropy. In: Proceedings of the 4th Berkeley Symposium on Mathematics, Statistics and Probability, pp. 547–561 (1960)
37. Sabelfeld, A., Myers, A.C.: Language-based information-flow security. IEEE Journal on Selected Areas in Communications 21(1), 5–19 (2003)
38. Shannon, C.E.: A mathematical theory of communication. Bell Systems Technical Journal 27(3), 379–423 (1948)
39. Shannon, C.E.: The lattice theory of information. IEEE Transactions on Information Theory 1, 105–107 (1953)
40. Simovici, D.: Metric-Entropy Pairs on Lattices. Journal of Universal Computer Science 13(11), 1767–1778 (2007)
41. Smith, G.: On the Foundations of Quantitative Information Flow. In: de Alfaro, L. (ed.) FOSSACS 2009. LNCS, vol. 5504, pp. 288–302. Springer, Heidelberg (2009)
42. Sutherland, D.: A model of information. In: Proc. 9th National Security Conference, Gaithersburg, MD, pp. 175–183 (1986)
43. Yeung, R.W.: A new outlook on Shannon's information measures. IEEE Transactions on Information Theory 37(3), 466–474 (1991)
44. Terauchi, T., Aiken, A.: Secure information flow as a safety problem. In: Hankin, C., Siveroni, I. (eds.) SAS 2005. LNCS, vol. 3672, pp. 352–367. Springer, Heidelberg (2005)
45. Tschantz, M.C., Nori, A.V.: Measuring the Loss of Privacy from Statistics. In: QA 2009: Workshop on Quantitative Analysis of Software (June 2009)
46. Vapnyarskii, I.B.: Lagrange multipliers. In: Hazewinkel, M. (ed.) Encyclopaedia of Mathematics. Springer, Heidelberg (2001)
47. Winskel, G.: The Formal Semantics of Programming Languages. MIT Press, Cambridge (1993)
Performance and Security Tradeoff

Katinka Wolter and Philipp Reinecke

Freie Universität Berlin, Institut für Informatik
Takustr. 9, 14195 Berlin, Germany
{katinka.wolter,philipp.reinecke}@fu-berlin.de
Abstract. A tradeoff is a situation that involves losing one quality or aspect of something in return for gaining another quality or aspect. Speaking about the tradeoff between performance and security indicates that both performance and security can be measured, and that to increase one we have to pay in terms of the other. While established metrics for the performance of systems exist, this is not quite the case for security. In this chapter we present standard performance metrics and discuss proposed security metrics that are suitable for quantification. The dilemma of inferior metrics can be resolved by considering indirect metrics such as the computation cost of security mechanisms: security mechanisms such as encryption or security protocols come at a cost in terms of computing resources. Quantification of performance has long been done by means of stochastic models. With growing interest in the quantification of security, stochastic modelling has been applied to security issues as well. This chapter reviews existing approaches to the combined analysis and evaluation of performance and security. We find that most existing approaches take either security or performance as given and investigate the respective other. For instance, [34] investigates the performance of a server running a security protocol, while [21] quantifies security without considering the cost of increased security. For special applications, mobile ad-hoc networks in [5] and an email system in [32], we will see that models exist which can be used to explore the performance-security tradeoff. To illustrate general aspects of the performance-security tradeoff we set up a simple Generalised Stochastic Petri Net (GSPN) model that allows us to study both performance and security, and especially the tradeoff between the two. We formulate metrics, such as cost and an abstract combined performance and security measure, that explicitly express the tradeoff, and we show that system parameters can be found that optimise those metrics. These parameters are optimal neither for performance nor for security, but for the combination of both.
1 Introduction
Performance of computing systems has for many years been considered an important characteristic in the evaluation of these systems. Good metrics should
reflect customer or process requirements, be measurable, and should define a target (or optimal) value against which the observed value can be judged (cf. [33, 17, 16]). Typical metrics are the throughput and response time of a system. For most communication or production systems, as well as services, both constitute meaningful metrics in the evaluation of the protocol, process, or system. Usually both time and the number of processed items can be measured, and there exists a notion of optimal performance. The anticipated number of processed items is process-dependent and can only be defined using existing knowledge of the system. The response time of a system is in the theoretical optimal case zero, and can be compared with zero.

Security, similar to reliability, suffers from the fact that in the optimal case no observations can be made. Many different security metrics exist, but they are rarely defined in a strict mathematical sense. The most tangible metrics are, in analogy with reliability metrics [20, 9], the time between attacks (time between failures), the time to recover from an attack (time to repair), the number of security breaches (system failures) until a certain point in time, etc. The scientific community does not yet agree whether or not security must be placed under the umbrella of dependability. Recently, Avižienis included security in his definition of dependability [3], but for the sake of simplicity we use the short-list definition of dependability, relate it to security, and trade both against performance.

We wish to be able to formulate optimisation problems for the tradeoff between security and performance as they exist between dependability and performance. In the latter case, e.g., software rejuvenation can be used to increase dependability, but this comes at a cost. At the same time, system failures incur a cost as well. The optimisation problem then is to tune the rejuvenation interval so as to minimise the total cost, which consists of the sum of the rejuvenation cost and the downtime cost. Analogously, one may formulate an optimisation problem for encryption: the longer the encryption key, the higher the encryption cost. At the same time, a security incident has associated costs as well. The encryption key must have an optimal length so as to keep encryption costs and security incident costs together as low as possible. Vice versa, operating a system generates revenue. This revenue is reduced by the encryption cost and, to a much higher degree, by the cost of recovery from a security incident. In order to maximise revenue, the encryption key must be chosen such that the encryption cost is reasonably low and security incidents occur only very rarely.

Our study of existing work reveals that the idea of investigating security in a similar way as dependability has been around for a long time [20, 27, 3], but pragmatic studies are rare. We discuss in this chapter existing related approaches that primarily focus on investigating the performance of security mechanisms, the degree of security achieved, or the performance of different security algorithms. The related approaches cover the following areas. Lamprecht et al. [18] investigate the performance of different implementations of different encryption
algorithms by taking time measurements. This is an empirical study that creates relevant knowledge for the considered systems, algorithms and configurations, but it does not generalise or provide a methodology for evaluation. Furthermore, security is only an indirect input parameter, through the different encryption algorithms and key lengths, and merely performance is measured. No combined performance-security evaluation is performed that would allow the formulation of a tradeoff.

The second related approach is the performance study of a key distribution centre by Zhao and Thomas [34]. The authors investigate the effect of the rekeying interval on the performance of the server. This is a pure performance study of a security application, where security is encoded in the rekeying interval.

The third approach, by Madan et al. [21], formulates a stochastic model for security similar to earlier performability models. The system is subject to security attacks which transform it first to a degraded and then to a failed state. The model allows the computation of availability and mean time to security failure, two relevant security metrics. The model is not tailored to any particular application or system, and performance is not considered.

The fourth model, presented by Cho et al. in [5], addresses security and performance in mobile ad-hoc networks with intrusion detection systems. Performance is measured by the response times for processed messages, while security is measured as the time until a security failure takes place. Cho et al. analyse the metrics separately and show that both have an optimum. These two metrics could also be combined into a metric to optimise the tradeoff.

The fifth and last model investigates the security of an email system under various types of attacks by formulating a queueing model that represents a quasi-birth-death process [32]. The considered measures are availability of the system, queue length, and information leakage probability. Queue length can be seen as a performance measure, while availability and information leakage probability certainly constitute security measures. Although this model could be used to evaluate the performance-security tradeoff, this is not done in the paper.

Since there seem to be no established models to explore the performance-security tradeoff in general, we set up a simple and reasonably generic model that illustrates how this could be done. We aim on the one hand at explicitly modelling performance and security aspects of a system, and on the other hand at formulating metrics and measures that combine both. The model has two separate sub-models, one for security and the other for performance. They can be linked or kept completely separate; we have analysed both cases. We show how to formulate measures that refer to the tradeoff between security and performance and allow us to optimise the model parameters with respect to this tradeoff. Although we assume that such a model can be formulated in all situations where a system has distinct performance as well as security states, it remains open whether this abstraction applies to all security methods. Security mechanisms are too diverse for a simple classification to cover the whole field.

The model we set up assumes an encryption algorithm that has a different degree of security and a different performance impact depending on the key length. In building our model, we apply the general idea of performability analysis
[22, 23] to the joint evaluation of performance and security: we combine two sub-models, one modelling performance aspects and the other modelling security levels of the system. Then, we can parametrise performance or security of the model appropriately to explore the tradeoff between both. Such a model could certainly also be formulated for access control mechanisms. Whether it applies to security protocols cannot be judged at an abstract level.

This chapter is organised as follows: in the first two sections we take a step back from the general approach just described and consider its individual parts, viz. the performance sub-model and the security sub-model. This requires a discussion of performance and security metrics and models, and of approaches to measure the metrics using the models. Then we discuss existing approaches to combined performance-security evaluation, with the aim of illustrating the type of questions that can be answered with specific approaches and the difficulties one encounters. In Section 5 we complement this discussion by describing and analysing an example model consisting of a performance and a security model. We conclude the chapter with a section illustrating practical issues of model-based evaluation.
2 Performance Metrics and Models
The performance metrics we are interested in describe the system in terms of its throughput, completion times, or response times, as defined e.g. in queueing theory or networking. For the sake of simplicity we assume that the considered system can be modelled using a discrete-event state-space formalism. Then, the continuous-time stochastic process {X(t) | t ≥ 0} describes the discrete state of the system at time t. Even though general stochastic processes can be formulated, due to their complexity, the lack of generalised solution algorithms, and the difficulty of defining the necessary parameters they are not widely used in practice. Commonly used is a specialised class of stochastic processes, the continuous-time Markov chains (CTMC), specified by a generator matrix Q and a vector of state probabilities π. The model (Q, π) has the transient solution

π(t) = π(0) · e^{Qt}        (1)

and steady-state solution

0 = πQ.        (2)
and steady-state solution As modelling formalism we employ in this chapter the widely used generalised stochastic Petri nets (GSPNs) [1]. GSPNs have been extended in various ways by adding advanced reward structures [29] or generally distributed times of activities [10, 19] but for the models discussed in this chapter the basic formalism is sufficient. Next to stochastic Petri nets also stochastic Process algebras (SPA) enjoy great popularity [13]. One implementation is the PEPA tool (Performance Engineering Process Algebra) that has been used to analyse the Kerberos protocol discussed in Section 4.2. A GSPN is a bipartite directed graph. Formally, a GSPN is defined as a fivetuple (P, T, A, M, m0 ), where P is a set of places, drawn as circles, T is a set
of transitions, drawn as rectangles, and A is a set of arcs. Places can hold an integer number of tokens. Transitions represent activities and have either an exponentially distributed firing time with rate λ, where λ is the parameter of the associated exponential distribution, or an immediate firing time, in which case they are drawn as thin bars. The expected delay of a transition with exponentially distributed firing time is the expectation of the exponential distribution, 1/λ. In the models in this chapter only timed transitions with exponentially distributed firing times are used; therefore we do not distinguish types of transitions further on. The set of arcs is divided into input and output arcs: the former connect places to transitions, the latter transitions to places. A transition is enabled if all its input places hold at least the number of tokens that will be removed by the firing of the transition. Upon firing, the transition removes tokens from its input places and deposits tokens in its output places. The number of tokens is indicated by the labelling of the arcs; this varies across implementations.

The state of a Petri net is defined by its marking mi, a vector of length |P| whose entry i indicates the number of tokens in place Pi. The set of all markings is denoted M. The initial marking of the net is m0. A GSPN can be mapped onto a CTMC, where the possible markings M = m1, ..., mn are numbered and entry i of the vector of state probabilities π(t) = (π1(t), ..., πn(t)) corresponds to the probability of the model being in state mi at time t. Several software tools exist that perform transient as well as stationary analysis of the underlying CTMC, or (stationary or transient) simulation of the Petri net itself. We use the Petri net tool TimeNET [35] for the specification and analysis of our models. PEPA models may be similarly mapped onto CTMCs. For the definition and discussion of PEPA models the reader is referred to [13, 11].

To represent system metrics, a reward rate ri can be associated with each state i of the CTMC. Let Z(t) = r_X(t) be the instantaneous reward rate of the process at time t. A simple measure is then the expected instantaneous reward rate, the weighted sum of the transient state probabilities
E[Z(t)] = Σ_{i=0}^{n} ri · πi(t)        (3)

or the accumulated reward over an interval (0, t]:

E[Y(t)] = E[∫_0^t Z(τ) dτ] = Σ_{i=0}^{n} ri ∫_0^t πi(τ) dτ        (4)

Here

Lj(t) = ∫_0^t πj(τ) dτ        (5)

is the expected time spent in state j during (0, t]. Moments of the reward can be computed in a straightforward way. For computing the distribution of Y see [28].
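The reward measures (3)-(5) then reduce to a weighted sum and one-dimensional integrals. Continuing the same hypothetical two-state example (our own illustration, assuming NumPy and SciPy), a sketch might read:

import numpy as np
from scipy.linalg import expm
from scipy.integrate import quad

Q = np.array([[-0.1, 0.1], [2.0, -2.0]])   # same two-state chain as above
pi0 = np.array([1.0, 0.0])
r = np.array([1.0, 0.0])                   # reward rate 1 in the up state

pi = lambda t: pi0 @ expm(Q * t)           # transient probabilities pi(t)

t = 5.0
EZ = r @ pi(t)                             # equation (3) at time t
L = [quad(lambda u, i=i: pi(u)[i], 0.0, t)[0]   # L_i(t), equation (5)
     for i in range(2)]
EY = float(r @ np.array(L))                # equation (4)
print(EZ, EY)   # instantaneous reward rate at t, accumulated reward on (0, t]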
Fig. 1. Security metrics, based on [4]: a timeline with incident times t1, t2, detection times td1, td2, and recovery times tr1, tr2, marking the intervals TBI, TTID, TBDR and TTIR
The expected steady-state reward rate is

E[Z] = Σ_{i=0}^{n} ri · πi        (6)
To compute system throughput, the reward rate must be defined such that it represents the production rate of the system in each state. Then the accumulated reward over a time interval gives the system throughput within that time, while the expected steady-state reward rate gives the expected system throughput per time unit in steady state.

The computation of the completion time of a task is rather challenging, even more so if the system is modelled as a Petri net or a stochastic process algebra. The fact that tokens in a Petri net cannot be distinguished causes difficulties and is the subject of ongoing research [8]. To compute the average first passage time T_{i,S} from state i to a state in the set S one can formulate a recursion as given in (7) [25]:

T_{i,S} = 1/qi + Σ_{j∉S} (qij/qi) T_{j,S},  i ∉ S.        (7)
For a closed-form solution the model must be transformed such that the set of final states of the considered passage is absorbing. Then the first passage time is Phase-Type (PH) distributed [24], and the average first passage time is [26]

E[T] = α(−Q̂)^{-1} 1l        (8)

where α is the initial probability vector of the resulting CTMC, that is, αi = 1 and αj = 0 for j ≠ i, Q̂ is the generator matrix restricted to the transient states, and 1l is a vector of ones.

The only performance metric that can be computed from a stochastic Petri net model using most tools is the throughput of transitions or sub-models. Indirect performance metrics such as cost or gain can be formulated as rewards and can be computed for a stochastic Petri net model. Measures related to completion times, waiting times, or passage times are very difficult to derive, but efforts towards doing so exist [7, 8].
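Equation (8) amounts to a single linear solve. A minimal sketch, for a hypothetical chain with two transient states (assuming NumPy):

import numpy as np

# Generator restricted to the transient states of a hypothetical chain:
# state 0 moves to state 1 at rate 2, state 1 is absorbed at rate 1.
Q_hat = np.array([[-2.0,  2.0],
                  [ 0.0, -1.0]])
alpha = np.array([1.0, 0.0])        # start in transient state 0

# E[T] = alpha (-Q_hat)^{-1} 1l, computed via a linear solve.
ET = alpha @ np.linalg.solve(-Q_hat, np.ones(2))
print(ET)                           # 1/2 + 1/1 = 1.5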
3 Security Metrics and Models
The quantitative study of security requires that we can measure it. The previous section provided a discussion of performance metrics and quantitative models
to evaluate systems with respect to these performance metrics. Quantitative analysis of system performance has been the focus of attention for decades, and consequently a large body of work and a mature theory exist [12]. The same cannot be said for security metrics and models. In fact, in his thorough survey of existing work Verendel concludes that 'quantified security is a weak hypothesis' [31]. That is, for most of the surveyed work, Verendel found that it is unclear whether the methods and underlying assumptions are valid. Verendel identifies a lack of empirical validation as well as a lack of comparison between methods as reasons for this.

An intuitive approach to quantitative security metrics and models is the application of concepts from dependability evaluation, as proposed in e.g. [20, 27, 3]. Just as with dependability, where the system state is assumed (in the simplest case) to be either working or failed, one describes the system as either secure or insecure and specifies transitions between both states either by probabilities, by continuous random variables, or by stochastic processes. We will employ this approach throughout this chapter.

Let us start by first considering how this concept enables the derivation of quantitative security metrics. We will assume that our system is either secure, insecure, or in a recovery state between insecure and secure. The system state may change from secure to insecure, from insecure to recovery, and from recovery back to secure. The secure state models the normal operating state of the system. The insecure state is reached when a security incident occurs (this is equivalent to a failure in a dependable system [3]). The recovery state reflects necessary recovery actions before the system may be considered secure again.

We base our discussion on three security metrics defined by the Centre for Internet Security in 2009 [4]. Figure 1 illustrates the view underlying these metrics: security incidents occur at times t_1, t_2, ..., t_n. Each security incident i, occurring at time t_i, is followed by its detection at time t_{d_i} and recovery from this incident at time t_{r_i}. The metrics are defined as follows:

– Mean Time Between Security Incidents (MTBSI):

    MTBSI = (1/n) Σ_{i=2}^{n} (t_i − t_{i−1}),

– Mean Time To Incident Discovery (MTTID):

    MTTID = (1/n) Σ_{i=1}^{n} (t_{d_i} − t_i), and

– Mean Time to Incident Recovery (MTTIR):

    MTTIR = (1/n) Σ_{i=1}^{n} (t_{r_i} − t_i).
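These definitions translate directly into code. The following sketch computes the three CIS metrics from a log of incident, detection, and recovery timestamps; the sample log is invented.

```python
# Hypothetical incident log: (t_i, td_i, tr_i) = occurrence, detection,
# and recovery time of incident i, in hours (invented sample data).
incidents = [(10.0, 12.5, 14.0), (50.0, 50.5, 60.0), (90.0, 95.0, 99.0)]

t  = [i[0] for i in incidents]
td = [i[1] for i in incidents]
tr = [i[2] for i in incidents]
n  = len(incidents)

mtbsi = sum(t[i] - t[i - 1] for i in range(1, n)) / n   # time between incidents
mttid = sum(td[i] - t[i] for i in range(n)) / n          # time to discovery
mttir = sum(tr[i] - t[i] for i in range(n)) / n          # time to recovery

print(f"MTBSI = {mtbsi:.2f} h, MTTID = {mttid:.2f} h, MTTIR = {mttir:.2f} h")
```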
The above metrics have corresponding reliability metrics: the Mean Time To Incident Discovery corresponds to the Mean Time To Failure, the Mean Time Between Security Incidents corresponds to MTBF (Mean Time Between Failures), and the Mean Time To Incident Recovery corresponds to MTTR (Mean Time To Repair). Note that Figure 1 also shows the time between detection and recovery. This metric is not defined in [4], but it is required if we assume recovery as a distinct state. It is defined as follows:

– Mean Time Between Detection and Recovery (MTBDR):

    MTBDR = (1/n) Σ_{i=1}^{n} (t_{r_i} − t_{d_i}).

Figure 2 shows a GSPN version of the generic security model: the security state of the system is reflected in the location of the single token in the model. The firing of the transition fail corresponds to the occurrence of a security incident, which leaves the system in the insecure state. Upon detection of the incident (firing of detect) the system moves to the recovery phase, in which the system and/or its operators try to undo the damage from the security breach and bring the system back into the secure state. Note that this model may be modified in a number of ways. For instance, with some systems recovery may not be possible. Then no transitions back to the secure place exist and the insecure state is absorbing; the approach by Cho et al. [5] discussed in Section 4 is an example. Furthermore, in our model we used transitions with exponentially distributed firing delays and described the transition from secure to insecure as direct. More elaborate models may use more realistic firing-delay distributions and model the transition from secure to insecure as a multi-step process. Madan et al. [21] provide an example of both.

We can fully parametrise the simple model in Figure 2 using the metrics we just discussed: the inverse of the MTBSI is the rate of the fail transition, the inverse of the MTTID is the rate of the detect transition, and the recover transition is parametrised by the inverse of the MTBDR. The model can then be analysed using e.g. the TimeNET tool [35] to yield various security metrics. In analogy to the availability metric from the dependable computing area we can compute the probability Pr{secure} of the system being in the secure state. In the GSPN model this is simply the expected number of tokens on the secure place:

    Pr{secure} = E[#secure].

Likewise, we can compute the mean time to security failure (MTTSF), which corresponds to the mean time to failure (MTTF) in dependable computing, by removing the transitions back to the secure state and computing the time to absorption of the resulting model. This is done in the work by Madan et al., discussed in Section 4. Most importantly, however, this security model can be used in exploring the tradeoff between performance and security. In order to do this, we follow another well-developed approach, viz. that of performability modelling [22, 23].
Fig. 2. GSPN of the security model with recovery phase
In performability modelling one combines a performance model and a dependability model of the system under study with the goal of jointly evaluating performance and dependability, and the dependencies between parameters controlling either. In the following sections we consider using the security model in place of the dependability model in a performability analysis. Considering some areas that contribute to computer system security, one may wonder whether the security model above applies to e.g. access control, security protocols, or cryptography. For the last of these we formulate and analyse a GSPN model in Section 5. A security protocol that aims at providing secure access to resources as well as access control is targeted at securing a system. This system may consequently be in secure and insecure states. If we can model the security mechanism we may also combine it with a system security model and study their interaction.
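As a minimal illustration of this parametrisation, the sketch below builds the generator of the three-state security model from the inverses of MTBSI, MTTID, and MTBDR and computes Pr{secure} as a steady-state probability; the metric values are invented.

```python
import numpy as np

# Three-state security CTMC (secure, insecure, recovery), parametrised by
# the inverses of MTBSI, MTTID, and MTBDR (invented values).
mtbsi, mttid, mtbdr = 1000.0, 120.0, 360.0

Q = np.array([
    [-1/mtbsi,  1/mtbsi,  0.0     ],   # secure   -> insecure (fail)
    [ 0.0,     -1/mttid,  1/mttid ],   # insecure -> recovery (detect)
    [ 1/mtbdr,  0.0,     -1/mtbdr ],   # recovery -> secure   (recover)
])

# Steady state: pi Q = 0 with sum(pi) = 1; replace one balance
# equation by the normalisation condition.
A = np.vstack([Q.T[:-1], np.ones(3)])
pi = np.linalg.solve(A, np.array([0.0, 0.0, 1.0]))
print(f"Pr{{secure}} = {pi[0]:.4f}")   # = MTBSI / (MTBSI + MTTID + MTBDR)
```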
4 Existing Approaches
We will now discuss several approaches to evaluating performance and security as well as the tradeoff between performance and security. We do not aim at providing an exhaustive overview. For such an overview, we refer the reader to the work of Nicol et al. [27], Verendel [31], and the reports of the EU AMBER research project [30]. Our goal in this section is to illustrate the concepts discussed in the previous section, and to give a basic impression of the methods, problems, and achievements in the field. In particular, we will illustrate which questions can be answered using specific approaches. We start the discussion with three approaches that focus on only one side of the tradeoff: Lamprecht et al. [18] and Zhao and Thomas [34] study performance, while Madan et al. [21] consider only security. We then continue with the approaches by Cho et al. [5] and Wang et al. [32] that take into account both security and performance. This section is concluded by a discussion of the presented work; the description and analysis of a general performance-security model follow in Section 5.
4.1 Measurement-Based Evaluation of the Performance-Costs of Encryption
In [18] Lamprecht et al. study the performance of encryption algorithms. Among the studies considered here, this one is unique in that it is not model-based,
but instead presents measurements of implementations. The focus here lies on performance. Lamprecht et al. consider Java implementations of common cryptographic routines. They measure the time required by basic operations, such as symmetric and asymmetric encryption and decryption, hashing, and key generation. They show that runtimes may depend on the implementation, on the algorithm, on the length of the input data, and on the key length. This last point is the most relevant for our purposes, since in general a larger key length means higher security, and thus the relation between key length and runtime describes the tradeoff between security and performance.

In their evaluation, Lamprecht et al. consider two operations: signing a 2 KB message, and verifying the signature on this message. Signing a message here works as follows: First, a digest of the message is computed using the SHA-1 algorithm. The digest is a 160-bit number that depends strongly on the original message (the process is also known as 'hashing'). Then, employing the asymmetric encryption scheme RSA, the digest is encrypted using the private key of the sender. The result is called the signature of the message and allows anyone in possession of the sender's public key to verify message authenticity and integrity: Only the owner of the private/public key pair belonging to this signature can have encrypted the digest using this private key, and two different messages are unlikely to have the same digest. Security of signing/verification relies on the security of the public/private key pair. If an attacker can obtain the private key, she is able to generate arbitrary message signatures. With RSA, computing the private key from the public key is essentially a problem of factoring a product of large prime numbers. The effort required then depends mainly on the length of the key: longer keys are much more secure than shorter keys. Lamprecht et al. study the performance impact of longer keys on both signing and verification. They note that larger keys drastically increase signing times, but have little effect on verification. This illustrates the tradeoff between security and performance: the longer the key, the more secure the system (i.e. the longer it will take for an attacker to break the key), but also the higher the performance impact of the cryptographic operations on the system.

Obtainable measures. Lamprecht et al.'s study is focused on performance. In their work, security is a parameter that determines performance. Security is not explicitly measured, and it is also not thoroughly formalised. Instead, they follow common knowledge in assuming that longer key lengths correspond to higher security. We can thus conclude that their study helps to quantify the costs of security in performance terms for the specific scenario of public-key encryption.
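Measurements of this kind are easy to reproduce in outline. The sketch below times RSA signing and verification of a 2 KB message for several key lengths; it assumes the third-party pyca/cryptography package, and it uses SHA-256 instead of the SHA-1 digest of the original experiments, since current library releases discourage SHA-1. It is a rough illustration, not a re-run of the study in [18].

```python
import time
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

message = b"x" * 2048  # a 2 KB message, as in the experiment of [18]

for bits in (1024, 2048, 4096):
    key = rsa.generate_private_key(public_exponent=65537, key_size=bits)
    start = time.perf_counter()
    signature = key.sign(message, padding.PKCS1v15(), hashes.SHA256())
    t_sign = time.perf_counter() - start
    start = time.perf_counter()
    # verify() raises InvalidSignature if verification fails
    key.public_key().verify(signature, message, padding.PKCS1v15(), hashes.SHA256())
    t_verify = time.perf_counter() - start
    print(f"{bits}-bit key: sign {t_sign*1e3:.2f} ms, verify {t_verify*1e3:.2f} ms")
```

Signing time grows steeply with the key length, while verification stays cheap, which matches the observation reported above.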
4.2 Performance-Evaluation of a Key-Distribution Centre
A key distribution centre for Kerberos-like key exchange schemes is studied by Zhao and Thomas in [34]. Zhao and Thomas propose a PEPA model and obtain
utilisation and response time as performance measures. As with Lamprecht et al.'s approach, security is a parameter and is not measured. The Kerberos key distribution centre (KDC) provides authentication and shared session keys for each pair of users wishing to communicate. Each user trusts the KDC, but does not trust the identity claimed by the other user. By performing a successful interaction with the KDC, upon which the user obtains a shared session key, each user can prove its identity to the counterpart. Figure 3 displays this system for N pairs of users.

Fig. 3. Key Distribution Centre model employed in [34]

Zhao and Thomas are interested in the performance of the key distribution centre in various scenarios. They define two performance measures: First, the average utilisation at the KDC, and second the average response time of the KDC when serving key distribution requests. On the other hand, Zhao and Thomas consider security only as a parameter, but not as a measure that can be evaluated. Their definition of the security parameter proceeds from the following intuition: Each session key is used to encrypt a certain amount of data, resulting in a certain amount of ciphertext that is then transmitted. Assuming a constant data rate, the amount of ciphertext encrypted with the same session key depends on the length of time the key is used. As it is a well-known fact that cryptanalysis becomes easier the more ciphertext is available, one way to improve security is to limit the amount of data encrypted with the same session key. Consequently, more frequent key changes render the system more secure. Zhao and Thomas therefore parametrise security by setting the duration for which the session key is used (or equivalently the session key rekeying rate).

The Model. Zhao and Thomas model the key distribution centre and its users with the stochastic process algebra PEPA [14]. Each request/response between the users in the user pairs and between a user and a KDC is encoded as a PEPA component. The complete model can then be analysed or simulated as a Continuous-Time Markov Chain (CTMC). Alternatively, an approximate solution can be obtained in a simpler and faster way.

Parametrisation and obtainable measures. The important parameters with respect to the performance-security tradeoff are the rate rp at which the KDC issues new keys upon a request, and the rate ru at which the user requests
new keys from the key distribution centre. Note that due to the limitation of the PEPA formalism to CTMCs, all times are assumed to have an exponential distribution, each of which can be parametrised by its rate. Zhao and Thomas vary rp and ru and obtain utilisation and average response time of the key distribution centre. That is, they provide results on the performance of the KDC at different levels of security. Their results illustrate that utilisation and response time grow monotonically with growing key-issuing rate rp and rekeying rate ru. That is, increasing security reduces performance, and vice versa. Note that there is no discernible optimum in either performance measure. We may attribute this to the fact that the model does not include performance penalties for low security. In fact, such penalties are out of scope for this model, as lower security does not affect the performance characteristics of the key distribution centre.
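The monotone behaviour can be mimicked with a far cruder abstraction than the PEPA model: if the KDC is approximated as an M/M/1 queue whose arrival rate grows with the rekeying rate ru, utilisation and response time rise monotonically with security. The mapping of ru to an aggregate arrival rate below is our own simplification, not the model of [34], and all numbers are invented.

```python
# Crude M/M/1 approximation of the KDC (not the PEPA model of [34]):
# each of n_pairs user pairs generates key requests at the rekeying rate ru,
# and the KDC serves requests at rate mu.
n_pairs, mu = 20, 50.0

for ru in (0.5, 1.0, 1.5, 2.0):
    lam = n_pairs * ru           # aggregate request rate at the KDC
    rho = lam / mu               # utilisation (must stay below 1)
    resp = 1.0 / (mu - lam)      # mean response time of an M/M/1 queue
    print(f"ru = {ru:.1f}: utilisation = {rho:.2f}, response time = {resp:.4f}")
```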
4.3 Models for Software-System Security
In contrast to the previous sections, the work by Madan et al., presented in [21], and the similar approach by Almasizadeh and Azgomi [2] focus on evaluating security. These generic models allow the analysis of the security of systems capable of detecting and responding to attacks. In the following we discuss the approach by Madan et al. in detail, as it appears more general than that in [2].
Fig. 4. GSPN model for the systems considered in [21]
Model. Madan et al. study a class of systems that is able to respond to attacks by changing into different degraded modes, similarly to the types of systems studied in performability and dependability analysis [22]. Note that their approach starts from a model describing a class of systems, which is in contrast to the other approaches, where a specific system is modelled. In Figure 4 we have depicted Madan et al.’s model as a GSPN, with one token denoting the system’s current state. The system starts in a good, secure state Good and may then move to a vulnerable state Vulnerable. If the vulnerability is detected in time, the system may return to the good state without any impact on
the system itself. Otherwise, it enters the active attack state Attack, in which an attack takes place. Then, the attack may either go undetected and compromise the system (state Compromised (undetected)), or be detected and responded to. In the latter case, the system may be able to mask the attack without any detrimental effects (state Compromised (detected)). If this is not possible (e.g. due to the nature of the attack, or because of insufficient resources), the system must either enter a state with degraded performance or functionality (Graceful degradation), a fail-secure state (Fail-secure), or signal a security failure (Failure). Which of these states is entered depends on the system and the attack type. In the model, this is reflected in the transition times between the states. In general, the system may return to the good state by appropriate measures, but these backward arcs may be removed for certain types of attacks, and for analysis (see below).

Parametrisation and obtainable results. The generic model can be parametrised to model specific attacks and to answer specific questions about the system. Let us start by discussing the measures that can be derived from the model. First, one can compute the steady-state availability of the system, which Madan et al. define as the probability that the model is in one of the states corresponding to correct function of the system. In the GSPN version, this is the sum of the expected number of tokens in all places referring to correct states. The second measure that can be computed from the model is the mean time to security failure (MTTSF), i.e. the mean time until a failure state is reached. Madan et al. propose to compute it by converting failure states to absorbing states (i.e. precluding recovery), and then computing the time to absorption of the resulting model. Suitable parametrisation of the generic model enables the computation of these security metrics for different systems and different attack types. Different systems are modelled by choosing appropriate transition times for the model. For each attack type, specific metrics can be defined, simply by removing irrelevant states from the model. For instance, Madan et al. give the example of a denial-of-service (DoS) attack, where the fail-secure state is equivalent to a success of the attack. Furthermore, they argue that a DoS attack cannot be masked by redundancy, and thus remove both the Fail-secure and the Compromised (detected) state from their model. Then, availability under DoS attacks is given by (cf. [21]):

    A_DoS = 1 − (E[#Failure] + E[#Compromised (undetected)]).

Note that the same might be achieved without modification of the model, by defining an appropriate set of 'correct' states; e.g. in the case of a DoS attack the set of correct states would be {Good, Vulnerable, Graceful degradation}. Note further that one might argue whether the attack state and the graceful degradation state should rather be considered failure states, since in these states the DoS attack might already impact performance of the service and make it at least partially unavailable. Equivalently, the MTTSF may be computed in the same way, by defining a set of failure states and converting them to absorbing states.
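The MTTSF computation is again an instance of equation (8). The sketch below applies it to a coarse four-state abstraction (Good, Vulnerable, Attack, Failure) with invented rates; the full model of [21] has more states but is handled in exactly the same way once the failure states are made absorbing.

```python
import numpy as np

# States: 0 = Good, 1 = Vulnerable, 2 = Attack, 3 = Failure (absorbing).
# Rates are invented for illustration, not taken from [21].
Q = np.array([
    [-0.02,  0.02,  0.00,  0.00],
    [ 0.05, -0.15,  0.10,  0.00],   # vulnerability fixed in time, or attacked
    [ 0.20,  0.00, -0.30,  0.10],   # attack handled, or security failure
    [ 0.00,  0.00,  0.00,  0.00],   # failure state made absorbing
])

Q_hat = Q[:3, :3]                   # transient part of the generator
alpha = np.array([1.0, 0.0, 0.0])   # system starts in Good

mttsf = alpha @ np.linalg.solve(-Q_hat, np.ones(3))  # E[T] = alpha(-Q_hat)^{-1} 1l
print(f"MTTSF = {mttsf:.1f} time units")
```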
In contrast to the above two papers, which focused on performance, this approach is concerned only with security. While we can quantify the security of the system under consideration using Madan et al.'s model, we cannot quantify performance. However, one might introduce performance as a parameter influencing security as follows: if the impact of performance parameters on security is known, transitions in the model can be parametrised accordingly. For instance, with a virus scanner we would expect that scanning intervals are related to both performance and security, as shorter intervals reduce performance but also increase the probability of identifying vulnerabilities. If we used performance as a parameter, we would then parametrise the transition from the Vulnerable state back to the Good state accordingly.
4.4 The Performance-Security Tradeoff in MANETs
We now continue with approaches that study both performance and security and thus allow us to explore the tradeoff between both. We begin with a model-based evaluation of the performance-security tradeoff in Mobile Ad-Hoc Networks (MANETs) presented by Cho et al. in [5]. Let us start with a short summary of the type of system Cho et al. consider: they study group communication between a group of nodes in a mobile ad-hoc network. Nodes may enter or leave the group at any time. Furthermore, nodes may be evicted from the group if an Intrusion-Detection System (IDS) detects that they are compromised, and evicted nodes cannot re-enter the group. The IDS may fail to detect a compromised host (false negative), and may erroneously flag a correct host as compromised (false positive). Communication is only allowed between group members. A shared group key is used to encrypt all communication between the members of the group. In fact, group membership is equivalent to knowledge of the shared group key. Membership changes can then be enforced by generating a new group key. In particular, such a rekeying operation is required when a node is evicted, since compromised nodes must not be able to access group communication. Thus, on one hand, rekeying operations should be performed as soon as a membership change occurs. On the other hand, frequent rekeying increases response times, since rekeying operations consume bandwidth. Batch rekeying aims to alleviate this performance impact by rekeying only after thresholds in the number of membership changes have been reached. In exploring the performance-security tradeoff, Cho et al. first define two security failure scenarios: first, a compromised node accesses group communication, and second, group communication becomes impossible since too few group members are left. The first of these two security failures, access to group communication by a compromised node, may occur only if a compromised node has not been evicted from the group, either because it has not been detected by the intrusion-detection system (constituting a false negative), or because it has been detected, but has not been evicted yet. The second security failure, impossibility of group communication, happens when too many members of the group have been evicted.
This may be the case if too many nodes are compromised or if too many correct nodes have been detected as being compromised (false positives). Security of the system is thus influenced by the choices of thresholds in the rekeying algorithm and by the quality of the IDS. Higher rekeying thresholds result in less frequent rekeying operations. Less frequent rekeying operations translate to longer periods until compromised nodes are evicted, which make it more likely that a compromised node gains access to a message sent within the group. Likewise, a higher false negative rate of the IDS may enable a compromised host to go undetected for a longer period of time, and thus increase its chances of accessing group communication. Lower false-negative and higher false-positive rates of the IDS, on the other hand, increase the likelihood of nodes being evicted, and in turn of system failure of the second type. In particular, high false-positive rates may lead to an untimely failure of the system. After identifying security failures, Cho et al. define two metrics to quantify performance and security. Performance is simply defined as the response time R for messages transmitted within the group, averaged over the system’s total lifetime. Security is measured as the Mean Time To Security Failure (MTTSF), that is, as the mean time until either an attacker gains access to group communication or until the system becomes unavailable.
Fig. 5. Stochastic Petri-Net model for performance and security in MANETs used in [5]
Model. Cho et al. model the systems they are interested in with a Stochastic Petri-Net (SPN) (shown in Figure 5). Initially, all N nodes are correct (all N tokens are on place Correct Nodes). Nodes may then either become compromised or the IDS may incorrectly detect that they have been compromised (false positive). We first consider the second case: After the node has been flagged compromised, it will be evicted from the system by the next rekeying operation. This path is modelled by transitions False Detect and Rekeying and intermediate place Falsely Detected. Firing of transition False Detect corresponds to an erroneous detection and moves the token to the intermediary state Falsely Detected. In this state, the node is still available for group communication. Firing of transition Rekeying models the next rekeying operation, which evicts the node from the group.
Alternatively, a node may become compromised, then be detected by the IDS, and eventually be evicted by rekeying. This sequence is modelled by transitions Compromise, Detect, Rekeying and places Compromised, undetected and Compromised, detected. Both before and after detection by the IDS the node is still able to participate in communication. Thus, both states may lead to security failure due to the node gaining access to a message within the group. This is modelled by transitions Data Access 1 and Data Access 2, which move the token to the absorbing place Unauthorised Data Access. The remaining places and transitions, namely Enter Requests, Leave Requests, Enter and Leave, serve to model join and leave requests by the nodes. These do not affect security, but have an impact on performance, as these requests require rekeying operations. Note that the transition Rekeying has non-standard semantics, in that its firing is not governed by its input places, but instead by a set of triggering conditions that reflect the thresholds for rekeying. The transition fires if the numbers of join, leave and evict requests exceed the threshold value of the particular batch rekeying strategy. The model describes security and performance characteristics as follows: if there is a token in place Unauthorised Data Access, a security failure of the first type has occurred. If the total number of tokens in places Compromised, undetected and Compromised, detected is larger than one third of all nodes in the group (which is equal to the sum of the tokens in Correct Nodes, Compromised, undetected, Compromised, detected and Falsely Detected), too many nodes have been compromised and a security failure of the second type occurs. Performance (response time) is measured based on the time the medium is left idle between rekeying operations.

Parametrisation and obtainable results. We now discuss important parameters for the SPN model. We focus on those parameters that are important for exploration of the performance-security tradeoff. Join and leave rates as well as the rekeying rate influence performance, while security is determined by the rate of nodes becoming compromised and detected and the quality of the IDS (specified by false-positive and false-negative probabilities). The rekeying rate depends on the communication time for broadcasting the rekeying message, but the actual throughput of the rekeying transition Rekeying is also controlled by the trigger conditions reflecting the rekeying threshold. The rates at which nodes are compromised and compromised nodes are detected are given as functions of the number of compromised and detected nodes, respectively. Cho et al. choose linear functions, i.e. both rates increase linearly. Probabilities of false negatives and false positives depend on the number of uncompromised and compromised, but undetected nodes. From this model, detailed results on the tradeoff between performance and security can be obtained by varying the eviction thresholds and the IDS intervals. The results by Cho et al. show distinct optima both in the MTTSF and the response times of the system. These results provide a guideline to selecting optimal parameters for the system.
4.5 Security of the E-Mail System
The last model we consider is presented by Wang et al. in [32]. They study performance and security of an E-Mail system, using a queueing model. They consider two types of attacks and obtain queue length, mail system availability, and information leakage probability as measures for performance and security, respectively. Let us again start by describing the system under study. While Wang et al. do not explicitly state this, from the description of attack types we can deduce that the system is the incoming mailbox of a user, additional filtering mechanisms, and the user itself. To understand this, we need to take a look at the attacks. First, Wang et al. consider attacks that aim to gather information from the system, in particular by gaining access to the mails in the inbox. They split this attack type into attacks that work by cracking the password, and attacks that require that the user clicks on a link inside a malicious mail. Second, they consider denial of service (DoS) attacks that render the mail system unusable. The example given is mail bombs that flood the mail system.

The Model. Wang et al. model this system with a four-queue model. Ordinary mail traffic arrives in the first queue, which has limited capacity. Password cracking attacks and malicious mails are modelled as jobs that arrive in the second and third queues, while mail bombs arrive in the fourth queue. Note that Wang et al.'s model is somewhat misleading, since only the first queue is actually served by the server, while the remaining queues only model security and are not served by the same server as the mail queue. Instead, each of the attack queues has its own server. In Wang et al.'s model the mail queue is an ordinary M|M|1|N queue, that is, mail is assumed to have exponential inter-arrival and service-time distributions with rates λ and μ, respectively, and the mailbox size is limited to N entries. The attack queues, on the other hand, each have capacity one. Both information gathering attacks (password cracking and malicious mails) are jobs in M|PH|1|1 queues. That is, inter-arrival times are still exponential, but service times have a phase-type (PH) distribution. Mail bombs are modelled as an M|M|1|1 queue. Service times in the attack queues model security by affecting the system in different ways. When a job enters one of the first two attack queues, there is a certain probability that the attacker gains access to mails in the queue. This probability is weighted by the time the job stays in the server of the respective queue, i.e. the PH-distributed service time. Jobs in the DoS-attack queue, on the other hand, render the mail system unavailable for the duration of their service. Note, however, that in Wang et al.'s model 'unavailable' means that all other processes are suspended. In particular, the mail queue does not change while a DoS attack is in progress. To illustrate this better, we have converted Wang et al.'s model into an equivalent GSPN model (Figure 6). The mail queue is modelled by the sub-net in the upper left corner, while the remaining sub-nets model the different attacks. Each attack is in progress if there is a token on the DoS Attack, Password Cracked or Trojan Horse place, respectively.
Fig. 6. GSPN for the queueing model proposed in [32]
The inhibitor arcs from the DoS Attack place suspend all other sub-models while a token is present in this place.

Parametrisation and obtainable results. The model may be parametrised in various ways. Performance is determined by the relation between the arrival rate λ and the service rate μ of the mail queue. Security is parametrised by the configuration of the remaining three queues. If we assume that the arrival rates of attacks are out of our control, then the effect of the attacks depends on their respective service times and, in the case of information gathering attacks, on their success probabilities. Let us consider a few examples and how these might be represented in the model parameters: In the case of password cracking, security might be improved by using better passwords and intrusion detection systems. Better passwords would then be modelled as a lower attack success probability, while a better IDS could be modelled with a phase-type distribution with lower mean. Likewise, the impact of malicious mails might be reduced by educating users (lower success probability) or by employing faster mail virus scanners (shorter service times). The effect of DoS attacks could be reduced by shortening the recovery time of the system after such an attack. Wang et al. propose three measures, viz. system availability, average queue length, and information leakage probability. All three of these are of interest with respect to the tradeoff between security and performance. While the average queue length is obviously a measure of performance, and information leakage probability is a measure for security, at first glance system availability appears to be a dependability measure, rather than one for security or performance. However, since the DoS attack type specifically aims to disrupt the service, availability is also a security metric in the scenario studied here. Wang et al. derive these measures in steady state using a quasi birth-death (QBD) structure. The definition of the measures and the way to compute them merit some further discussion: First, availability is defined as the steady-state probability that the mail system is available. From the discussion of the model, we observe that this is simply the probability that there is no job in the M|M|1|1 queue modelling DoS attacks (although Wang et al. compute it based on their QBD approach). Consequently, this measure is directly determined by the choice
of arrival and recovery parameters for this attack type and does not provide new insight into the system behaviour. Similarly, the average queue length is only determined by the arrival and service processes for the mail queue. The information leakage probability (ILP) is derived from the number of jobs in the mail queue when an information gathering attack occurs, weighted by the success probability of the attack and the average queue length. That is, ILP gives the steady-state probability of a mail being affected by an information gathering attack. The same measures can be defined on our GSPN version of the model. For instance, average queue length is simply the expected number of tokens on place Mail Queue. Unfortunately, Wang et al. do not use their model to explore the performance-security tradeoff in the same way that Cho et al. [5] do. Furthermore, it appears that this cannot be done easily with the current model, since the model does not contain a performance impact of security measures, and vice versa. However, the model may serve to explore the efficacy of different security-enhancing approaches. For instance, one may be interested in whether a better IDS or better passwords are more effective at increasing security against password-cracking attacks. This would be particularly interesting if there were a performance impact of either method involved. Although the model does not allow the latter question to be answered, exploring different parameter settings would show which method is more effective. These results may then be used to determine the optimal choice of parameters with respect to both security and performance.
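Two of the three measures follow from elementary queueing formulas, as the sketch below illustrates for invented parameters. It treats the mail queue and the DoS queue as independent, i.e. it ignores the suspension of the mail queue during DoS attacks, and it omits the information leakage probability, which requires the PH service-time structure of [32].

```python
# Invented parameters: mail queue M|M|1|N and DoS queue M|M|1|1.
lam, mu, N = 4.0, 5.0, 20          # mail arrival/service rates, mailbox size
lam_dos, mu_dos = 0.01, 1.0        # DoS arrival rate and recovery rate

# Mean queue length of an M|M|1|N queue in steady state.
rho = lam / mu
p = [(1 - rho) * rho**k / (1 - rho**(N + 1)) for k in range(N + 1)]
mean_queue_length = sum(k * pk for k, pk in enumerate(p))

# Availability: probability that no DoS job occupies its M|M|1|1 queue.
availability = mu_dos / (lam_dos + mu_dos)

print(f"mean queue length = {mean_queue_length:.3f}")
print(f"availability      = {availability:.4f}")
```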
5 Combined Performance-Security Model
Let us now illustrate modelling and analysis of the performance-security tradeoff using a combined performance-security model. Our objective in this section is to give the reader an impression of the process of exploring the tradeoff between performance and security. Thus our focus is not on analysing a real system, but rather on showing methods that may be employed in the analysis. 5.1
The Model
Although our model is not tailored to a specific system, in practice one would usually start with one, and derive model structure and parameter choices from the application. Therefore, we first describe the abstract problem we have in mind. Our model is intended to address the problem of choosing an appropriate key length for the encryption of documents or messages. In accordance with Lamprecht et al. [18] we assume that the longer the encryption key, the higher the security of the encrypted messages, and thus the higher the security of the system. At the same time, the computational effort incurred by encryption increases with key length (as illustrated in [18]). Depending on the amount of available computing power and on the security requirements there exists an optimal key length that solves the tradeoff between performance and security. In some systems that constitute extremes with respect to either performance or security the solution to the tradeoff may be obvious: e.g. in embedded systems
with low computational power, strict energy constraints, and little outside interaction, security requirements may be quite low. Here, the focus typically lies on performance and energy efficiency. Another extreme are enterprise systems (in areas like finance or insurance) with extremely high security requirements and sufficient computation facilities. In these systems usually the highest security level is chosen and its cost in terms of performance is readily accepted. For systems between these extremes, however, the optimum is not apparent, and these are the systems where a model-based evaluation can be useful.

The model is shown in Figure 7. We employ the Generalised Stochastic Petri Net (GSPN) formalism [1] and use the TimeNET tool [35] in the modelling and analysis stages. This choice was made primarily for reasons of simplicity. Note that for more realistic models, which may be much more complex, other formalisms and tools, e.g. the PEPA approach used in [34] or the Möbius toolkit [6], may be more appropriate. Our model consists of a performance sub-model, shown on the left, and the security sub-model we described in Section 3 on the right. The performance model represents encryption within a processing system. Entities are generated, encrypted, and then sent out. This model could for example represent a communication system, encrypting and transmitting messages or data packets. Within this context, the security model reflects the security of the key used for encryption. If this key is broken, the system becomes insecure. Recovery is achieved when a new key is available.
Fig. 7. Petri net model for combined performance and security analysis
We want to evaluate the model with respect to performance and security. Looking only at the performance part of the model, we are interested in the throughput of certain transitions. Taking into account only the security part, our interest is in the probability of the system being secure. Considering the combination of both models, we want to study combined performance-security measures such as revenue depending on performance gain and security cost. Let us consider the model in more detail: In the performance part there are N tokens representing the entities to be processed. Each firing of the generate transition injects one token into the processing part. The entity must then be encrypted (firing of encrypt), before it can be sent out, reflected by a firing of
the send transition. As described in Section 3, the single token circulating in the security sub-model models the security state of the system: initially, the token occupies the secure state, which corresponds to the system being secure. Then a security incident may happen (firing of fail), which leaves the system in an insecure state. Upon detection of the insecure state (detect) recovery is initiated, which eventually results in a secure system again (firing of transition recover). Note that both models are connected by an inhibitor arc between the restoring place and the encrypt transition. This arc means that a key known to be insecure will not be used for encryption.
5.2 Parametrisation
The model may be parametrised such that it represents various systems of interest. In particular, we may choose the firing delay distributions of the transitions and the parameters of these distributions. For the sake of simplicity, we assume firing delays with exponential distribution, since these may be parametrised simply by their firing rate. Considering now the choice of firing rates, we should distinguish between those parameters that determine the behaviour of the processing system only, and those that are related to the security mechanism. We make this distinction because we are interested in the performance implications of the security mechanism and can therefore treat the non-security-related parameters of the processing system as constants. Clearly, all rates of transitions in the security sub-model are related to security. Since we assume that longer keys translate to longer encryption times, the rate of the encrypt transition is also connected to the security of the system. The generate and send transitions, on the other hand, only describe the processing system. Table 1 lists the parameters we chose for the subsequent analysis. Choosing the right parameters is, of course, difficult, since they must represent the system under study, must be relevant for the questions posed to the model, and must also result in models that are still feasible for analysis. In real applications performance and security parameters differ by many orders of magnitude, as do performance and reliability parameters. Performance of the system in typical applications is in the order of one completed job per second or at least one per minute; this translates to the firing rate of a transition. The frequency of security incidents, on the other hand, would rather be one per month or one per year. Even a system with low performance and a low security standard that processes one request per minute would have a rate of security incidents of 2.3 · 10^-5 versus a processing rate of 1. This difference in orders of magnitude imposes great difficulties on analytical and simulation-based solutions of models, as we will show. In our model, this problem affects primarily the rates of the encrypt and fail transitions. The relation between these rates reflects the relation between the time required to encrypt messages and the time required to break the encryption key. Obviously, for an encryption scheme to be useful, the time for breaking the key must be much longer than that for encryption. We need to include this relation in our model; however, we still want to keep analysis feasible. For
simplicity, we circumvent the problem by choosing rates that differ by not much more than two orders of magnitude. While these choices are not realistic, they allow us to still obtain an analytical solution for the stationary case using the TimeNET tool [35].

Table 1. Parameters of the Petri net model

    Parameter Name   Value
    generate         2.0
    send             0.1
    N                150
    encrypt          0.1, ..., 3.4 in steps of 0.1
    TSI              12.5, 25, 50, 100, ..., 15100 in steps of 500
    detect           120
    recover          360

5.3 Measures
After defining the model and its parameters, we must now define the measures we want to investigate. All measures are defined as rewards that are functions of the token occupancy and the state probabilities of the GSPN model. The measures are summarised in Table 2. Our first measure of interest is the throughput achieved by the processing system. Throughput is a classical performance measure, and we evaluate it by computing the throughput of the send transition. As can be seen from the model (Figure 7), throughput is influenced by the actual firing rate of the encrypt transition. The actual firing rate, in turn, depends both on the encryption delay and on the probability of the system being in the restoring state. We thus expect security to have an impact on throughput. Our second measure is the probability of the system being secure. This is a pure security measure, which we expect to increase with the time to security incident. In our security model this is simply the probability that the place secure is not empty. Note that in our security sub-model only a single token circulates among the places. In such a model, this probability is just the expected number of tokens in place secure. We combine these two metrics in the CPSM (combined performance and security) measure, which is simply the sum of performance and security. This gives us a measure with a performance as well as a security contribution. Both contributing measures are 'high-better' [15] measures and therefore their sum again is a 'high-better' measure that hopefully has a maximum. The next two metrics, Gain and Loss, are indirect performance and security measures that can be combined into a revenue measure. The revenue metrics connect the performance and security parts by weighting the expected number of messages that have been encrypted and are ready to be processed by a reward rate. Only if the system is secure does the revenue metric translate to gain made by operating the system. If the system is insecure and not recovering, the security incident has not been detected yet, and the revenue metric turns into a loss.
Table 2. Metrics for the model

    Throughput(send)   10 · Pr{#processing > 0}
    Pr{secure}         E[#secure] = Pr{#secure > 0}
    CPSM               Throughput(send) + Pr{secure}
    Gain               2 · E[#processing IF #secure = 1]
    Loss               −E[#processing IF #insecure = 1]
    lowCostRevenue     2 · E[#processing IF #secure = 1] − E[#processing IF #insecure = 1]
    highCostRevenue    E[#processing] · (2 · E[#secure] − 5 · E[#insecure])
In order to illustrate the impact of the chosen revenue metrics on the results we consider two revenue scenarios, a low-cost one and a high-cost one. Comparing both will show how the choice of revenue metric, which is largely influenced by the modeller, affects the results. In the low-cost scenario every securely processed item gains twice the revenue an insecurely processed one loses. In the high-cost scenario, on the other hand, the gain is the same, but the cost of processing a message in an insecure situation is 5.
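For readers who want to reproduce measures of this kind without a Petri net tool, the following sketch builds the CTMC underlying our reading of the model in Figure 7 (places source, queueing, and processing; three security states; the inhibitor arc disabling encrypt during restoring) and evaluates Throughput(send), Pr{secure}, and CPSM. The function solve_model is ours, the Table 1 values are interpreted as mean firing delays (rate = 1/delay), and a small capacity n keeps the state space manageable.

```python
import numpy as np
from itertools import product

def solve_model(enc_delay, tsi, n=10, inhibitor=True,
                gen=2.0, send=0.1, det=120.0, rec=360.0):
    """Steady-state measures of the combined model (our reconstruction).

    States are (q, p, s): tokens on queueing and processing, security state.
    """
    sec = ("secure", "insecure", "restoring")
    states = [(q, p, s) for q, p in product(range(n + 1), repeat=2)
              if q + p <= n for s in sec]
    idx = {st: i for i, st in enumerate(states)}
    Q = np.zeros((len(states), len(states)))
    for (q, p, s) in states:
        i = idx[(q, p, s)]
        if q + p < n:                                       # generate
            Q[i, idx[(q + 1, p, s)]] += 1 / gen
        if q > 0 and not (inhibitor and s == "restoring"):  # encrypt
            Q[i, idx[(q - 1, p + 1, s)]] += 1 / enc_delay
        if p > 0:                                           # send
            Q[i, idx[(q, p - 1, s)]] += 1 / send
        nxt, rate = {"secure": ("insecure", 1 / tsi),       # fail
                     "insecure": ("restoring", 1 / det),    # detect
                     "restoring": ("secure", 1 / rec)}[s]   # recover
        Q[i, idx[(q, p, nxt)]] += rate
    Q -= np.diag(Q.sum(axis=1))                        # rows sum to zero
    A = np.vstack([Q.T[:-1], np.ones(len(states))])    # pi Q = 0, sum pi = 1
    pi = np.linalg.solve(A, np.r_[np.zeros(len(states) - 1), 1.0])
    thr = (1 / send) * sum(pi[idx[st]] for st in states if st[1] > 0)
    p_sec = sum(pi[idx[st]] for st in states if st[2] == "secure")
    return thr, p_sec, thr + p_sec   # Throughput(send), Pr{secure}, CPSM

print(solve_model(enc_delay=1.4, tsi=6100.0))
```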
5.4 Analysis
Fig. 8. Throughput of the processing system and probability of being a secure system (Pr{secure}, throughput, and Pr{secure} + throughput over the encryption time)

The analysis aims at investigating the effects of different key lengths on performance and security of the system. The key length has an impact on encryption
time as well as on the time needed to break the key. Therefore, with the key length the firing delays of the transitions encrypt and fail change. We consider key length to be reflected in these firing times. As can be seen in Table 1, encryption with the shortest key is assumed to take 0.1 time units, and the time to break this key is assumed to be 12.5. We assume that the encryption time increases in steps of length 0.1, while the time to break the key increases first by a factor of 2 until 100 and then by linear steps of 500. As both parameters increase simultaneously, we only use the encryption time in the plots.

First, we consider throughput, the probability of the system being secure, and CPSM, the combination of both. Figure 8 shows these measures for increasing key length (reflected by increasing encryption times). Considering that the relation between the two varying parameters (encrypt and TSI) changes after the first three solution runs, the probability of the system being in the secure state increases almost linearly with the time between security incidents. This is as expected. Reasoning naively, the throughput should decrease linearly with increasing encryption time. This is not the case, for a simple reason: for very short encryption times the throughput is limited by the long delay (low firing rate) of transition generate. The interplay of the two transitions (generate and encrypt) is blurred by the effects of the inhibitor arc blocking the encryption while the system is recovering from a security incident. This inhibitor arc injects side effects of the security model into the performance measure throughput. Short keys suffer more security incidents, and therefore more time is spent in the recovery state; this effect is thus visible in the throughput only for short keys (and encryption times). From Figure 8 we observe that the CPSM metric, which is the sum of the throughput and the probability of the system being in the secure state, is a simple and straightforward measure for the performance and security tradeoff. It has a clear maximum, which is at encryption time 1.4 and TSI (time to security incident) 6100.

Consider now the revenue measures (Figure 9). Both revenue measures show very clear optimal parameter settings for the encryption time at 1.9 and hence for the key length and the expected time between security incidents (TSI). Note that the optimum encryption time lies just below the firing delay of the generate transition. For longer encryption times the generate delay is no longer the limiting factor and a queue may build up in place queueing. For short encryption times many more messages are being processed, and therefore the difference between the two cost models is more pronounced. In the limit, for very long encryption times and extremely long times between security incidents, the total revenue decreases for both cost models and they both approach the same limit, zero. Figure 10 shows the same metrics in a new presentation. The lowCostRevenue is the same as shown in Figure 9, and gain is its positive contribution. The difference between the two curves, which shows more clearly in the zoomed plot on the right side of Figure 10, illustrates the security cost, which is higher the shorter the times between security incidents are.
Fig. 9. Revenue with two different cost models (lowCostRevenue and highCostRevenue over the encryption time)
Fig. 10. Comparison of security cost and total revenue (zoomed in on the right); the plots show lowCostRevenue and gain over the encryption time
5.5 The Modified Model
In order to more distinctly split the model into a performance and a security model we have removed the inhibitor arc blocking the encryption of messages during recovery. Henceforth, the performance and the security model are only intertwined by the metrics defined on them and by the simultaneous increase of encryption time and time between security incidents.
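In terms of the solver sketched in Section 5.3, this change amounts to disabling a single flag (assuming our hypothetical solve_model function from that sketch):

```python
# Hypothetical reuse of the solve_model sketch from Section 5.3:
# removing the inhibitor arc corresponds to inhibitor=False.
thr, p_sec, cpsm = solve_model(enc_delay=1.4, tsi=6100.0, inhibitor=False)
print(f"throughput = {thr:.3f}, Pr{{secure}} = {p_sec:.3f}, CPSM = {cpsm:.3f}")
```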
Fig. 11. Simplified Petri net model for combined performance and security analysis
Fig. 12. Throughput of the simplified processing system (Pr{secure}, throughput, and Pr{secure} + throughput over the encryption time)
We expect to see more clearly the characteristics of performance and security and how they conflict. The simplified model is shown in Figure 11. While the probability of being in the secure state is not affected by the change in the model, throughput clearly increases for short encryption times and remains almost the same for long encryption times. Without the inhibitor arc, the performance of the system benefits from more processing time. The constant throughput for small encryption times is caused by the limit imposed by the generate transition. This holds true until the encryption time equals
the delay of the generate transition. As the encryption time increases further it becomes decisive for the throughput, which then decreases with increasing encryption time. The probability of being in the secure state and the throughput are both monotonic functions (increasing and decreasing, respectively) of the encryption time and the time between security incidents. While neither of them has an optimum over the encryption time, the combination of both, e.g. our CPSM metric, has a clear maximum when the encryption time equals the delay of the generate transition.

Fig. 13. Revenue with two different cost models for the simplified processing model (lowCostRevenue and highCostRevenue over the encryption time)
If encryption is not blocked during recovery from a security incident, the most insecure system, with short encryption times and a high probability of the key being broken, achieves no revenue, because cost dominates gain in both the high-cost and the low-cost scenario. Indirectly, the system throughput influences the revenue, and the parameter set that achieves the highest throughput also obtains the highest revenue. In this model throughput in the insecure state is treated the same as throughput in the secure state. The higher throughput as compared to the earlier model comes at the expense of insecurely processed data, which is not considered in the measures. Blocking the processing system during recovery is a wise action, as it reduces the amount of wasted work considerably and therefore increases the revenue. This applies to both the low-cost and the high-cost scenario. The cost is most dominant while the encryption key is short and the system insecure.
Fig. 14. Comparison of security cost and total revenue (zoomed in on the right) for the simplified processing model
In Figure 13 the cost is the difference between the two curves, which diminishes with increasing encryption time and key length. Figure 14 shows the relationship between gain and loss in the low-cost scenario by displaying gain and total revenue; the cost is then the distance between the two curves. Obviously, the gain is smaller, as the expected number of items being processed is smaller when the inhibitor arc is removed. The cost also decreases but, as seen above, the total revenue still decreases. If gain and loss are proportional to the number of items in the processing state, this result is as expected. When optimising revenue, the best encryption time is 0.13.
6 Analysis Issues
So far we have assumed that we can obtain solutions for the models we consider. However, analysis of combined performance and security models suffers from the same numerical difficulties known in performability models. We will now discuss a few common problems. First, one often encounters the problem that the size of the state space increases rapidly. This increases computation time and memory requirements and may render models unsolvable. Furthermore, as Table 3 and Figure 15 illustrate, a large state space may translate into inaccuracy of the solution. Table 3 lists the number of states the model has when increasing the capacity of the processing system. We have computed the probability of being in the secure state for different capacities in both models. Of course, this probability should be constant for all capacities. However, it turned out that the solutions suffered severely from numerical inaccuracy and differed significantly. We then found that we had used the default setting of TimeNET, which limits the number of iterations in the steady-state solver to 1000, and this was hardly ever enough. This illustrates that the size of the state space can have a drastic impact on the convergence properties of the solution algorithms, and consequently on the accuracy of the results.
Table 3. Size of the state space for different capacity n of the processing system

    n      number of states
    0      3
    1      9
    2      18
    3      30
    5      63
    10     198
    50     3 978
    100    15 453
    150    34 428
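Under the state-space structure described in Section 5, these numbers follow a closed form: there are (n+1)(n+2)/2 distributions of tokens over the queueing and processing places, each combined with one of three security states. The snippet below reproduces the table:

```python
for n in (0, 1, 2, 3, 5, 10, 50, 100, 150):
    # 3 security states times the number of (queueing, processing) pairs
    print(n, 3 * (n + 1) * (n + 2) // 2)
# prints 3, 9, 18, 30, 63, 198, 3978, 15453, 34428 -- matching Table 3
```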
Fig. 15. Probability of being in the secure state, plotted over the time between security incidents, for the security model alone and for the combined models with and without inhibitor arc at various capacities n
Figure 16 shows the number of iterations needed in both models for the parameter set indicated by the encryption time. Only in very few parameter configurations do the solutions converge within 1000 iterations, and some even need up to 16 000 iterations. Interestingly, the parameter sets around an encryption time of 2 require the most iterations, while encryption time 2 itself, which is identical to the delay of the generate transition, needs far fewer iterations than parameter values slightly higher and lower. Figure 16 also illustrates that, for all parameter configurations, the solution algorithm for the model without inhibitor arc requires many more iterations to converge than for the model with inhibitor arc.
Fig. 16. Iterations needed to obtain 10^-7 accuracy (models with and without inhibitor arc, over the encryption time)
Fig. 17. Deviation of the probability of being in the secure state for different iteration limits (maxIter = 1000, 2000, 5000, and 1000000, for the models with and without inhibitor arc)
The consequences of poor convergence are shown in Figure 17, where the probability of being in the secure state is plotted for different limits on the number of iterations. It becomes clear that the solutions for the model with inhibitor arc are reasonably good even if the algorithm does not converge. Accuracy of the solution (which we do not show for all runs) is never worse than 10^-5. The same holds for the model without inhibitor arc using at most 2000 iterations. Using only 1000 iterations, the precision of the solution goes down to 10^-4 for high parameter values, while the worst accuracy of the probability of being in the secure state occurs for intermediate parameter values. This illustrates that an accuracy of 10^-7 is sometimes, but not always, necessary for reasonably precise results for the measures. Even worse, no rule exists for when high accuracy is essential.
7 Conclusions
In this chapter we have investigated the relationship of performance and security in model-based evaluation. The approach we illustrated is based on the premise that there are significant similarities between security and dependability. In consequence, security may be evaluated using stochastic processes, and in particular CTMCs, with stochastic Petri nets or stochastic process algebras as specification languages. The combination of security and performance poses interesting tradeoffs and inspires models similar to those arising from the combination of performance and dependability, known as performability. Quantification of security has only recently attracted more attention; while some initial conceptual work was published decades ago, serious model-based evaluation of security mechanisms has emerged only in recent years. The tradeoff between performance and security has been investigated only for very specific scenarios. This tradeoff is of high relevance especially in modern systems that are subject to requirements in both areas, performance and security. In order to proceed to a more general treatment and understanding of the performance-security tradeoff we have proposed a rather simple model which distinctly consists of a security part and a performance part. We have shown how to formulate measures that include both performance and security aspects and that optimise the tradeoff between the two. While previous work has investigated either the performance of security mechanisms or the security of a processing system, we want to initiate a more explicit treatment of both properties together. We have used our model to discuss typical issues of parametrisation, reward formulation, and analysis frequently encountered with models of this type. Many challenges and open problems remain that will hopefully be addressed in the future. In particular, it is as yet unclear whether all existing security mechanisms can be traded for performance of the respective system, whether it will be possible to study realistic parameter sets, and whether combined measures exist for arbitrary systems.
Author Index
Broadbent, Anne 43
Di Pierro, Alessandra 1
Fitzsimons, Joseph 43
Hankin, Chris 1
Heusser, Jonathan 87
Kashefi, Elham 43
Malacaria, Pasquale 87
Reinecke, Philipp 135
Wiklicky, Herbert 1
Wolter, Katinka 135