Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis, and J. van Leeuwen
2828
Berlin Heidelberg New York Hong Kong London Milan Paris Tokyo
Antonio Lioy Daniele Mazzocchi (Eds.)
Communications and Multimedia Security Advanced Techniques for Network and Data Protection 7th IFIP-TC6 TC11 International Conference, CMS 2003 Torino, Italy, October 2-3, 2003 Proceedings
Series Editors

Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands

Volume Editors

Antonio Lioy
Politecnico di Torino, Dip. di Automatica e Informatica
corso Duca degli Abruzzi, 24, 10129 Torino, Italy
E-mail: [email protected]

Daniele Mazzocchi
Istituto Superiore Mario Boella
corso Trento, 21, 10129 Torino, Italy
E-mail: [email protected]
Cataloging-in-Publication Data applied for

A catalog record for this book is available from the Library of Congress.

Bibliographic information published by Die Deutsche Bibliothek: Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet.

CR Subject Classification (1998): C.2, E.3, D.4.6, H.5.1, K.4.1, K.6.5, H.4

ISSN 0302-9743
ISBN 3-540-20185-8 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

Springer-Verlag Berlin Heidelberg New York, a member of BertelsmannSpringer Science+Business Media GmbH
http://www.springer.de

© IFIP International Federation for Information Processing, Hofstraße 3, A-2361 Laxenburg, Austria 2003
Printed in Germany

Typesetting: Camera-ready by author, data conversion by PTP-Berlin GmbH
Printed on acid-free paper   SPIN: 10959107   06/3142 5 4 3 2 1 0
Preface
The Communications and Multimedia Security conference (CMS 2003) was organized in Torino, Italy, on October 2-3, 2003. CMS 2003 was the seventh IFIP working conference on communications and multimedia security since 1995. Research issues and practical experiences were the topics of interest, with a special focus on the security of advanced technologies, such as wireless and multimedia communications. The book “Advanced Communications and Multimedia Security” contains the 21 articles that were selected by the conference program committee for presentation at CMS 2003. The articles address new ideas and experimental evaluation in several fields related to communications and multimedia security, such as cryptography, network security, multimedia data protection, application security, trust management and user privacy. We think that they will be of interest not only to the conference attendees but also to the general public of researchers in the security field. We wish to thank all the participants, organizers, and contributors of the CMS 2003 conference for having made it a success.
October 2003
Antonio Lioy (General Chair of CMS 2003)
Daniele Mazzocchi (Program Chair of CMS 2003)
Organization
CMS 2003 was organized by the TORSEC Computer and Network Security Group of the Dipartimento di Automatica ed Informatica at the Politecnico di Torino, in cooperation with the Istituto Superiore Mario Boella.
Conference Committee

General Chair: Antonio Lioy (Politecnico di Torino, Italy)
Program Chair: Daniele Mazzocchi (Istituto Superiore Mario Boella, Italy)
Organizing Chair: Andrea S. Atzeni (Politecnico di Torino, Italy)

Program Committee

F. Bergadano, Università di Torino
E. Bertino, Università di Milano
L. Breveglieri, Politecnico di Milano
A. Casaca, INESC, chairman IFIP TC6
M. Cremonini, Università di Milano
Y. Deswarte, LAAS-CNRS
M. G. Fugini, Politecnico di Milano
S. Furnell, University of Plymouth
R. Grimm, Technische Universität Ilmenau
B. Jerman-Blažič, Institut Jožef Stefan
S. Kent, BBN
T. Klobučar, Institut Jožef Stefan
A. Lioy, Politecnico di Torino
P. Lipp, IAIK
J. Lopez, Universidad de Málaga
F. Maino, CISCO
D. Mazzocchi, ISMB
S. Muftic, KTH
F. Piessens, Katholieke Universiteit Leuven
P. A. Samarati, Università di Milano
A. F. G. Skarmeta, Universidad de Murcia
L. Strous, De Nederlandsche Bank, chairman IFIP TC11
G. Tsudik, University of California at Irvine
Table of Contents

Cryptography

Computation of Cryptographic Keys from Face Biometrics ..... 1
   Alwyn Goh, David C.L. Ngo
AUTHMAC_DH: A New Protocol for Authentication and Key Distribution ..... 14
   Heba K. Aslan
Multipoint-to-Multipoint Secure-Messaging with Threshold-Regulated Authorisation and Sabotage Detection ..... 27
   Alwyn Goh, David C.L. Ngo

Network Security

Securing the Border Gateway Protocol: A Status Update ..... 40
   Stephen T. Kent
Towards an IPv6-Based Security Framework for Distributed Storage Resources ..... 54
   Alessandro Bassi, Julien Laganier
Operational Characteristics of an Automated Intrusion Response System ..... 65
   Maria Papadaki, Steven Furnell, Benn Lines, Paul Reynolds

Mobile and Wireless Network Security

A Secure Multimedia System in Emerging Wireless Home Networks ..... 76
   Nut Taesombut, Richard Huang, Venkat P. Rangan
Java Obfuscation with a Theoretical Basis for Building Secure Mobile Agents ..... 89
   Yusuke Sakabe, Masakazu Soshi, Atsuko Miyaji
A Security Scheme for Mobile Agent Platforms in Large-Scale Systems ..... 104
   Michelle S. Wangham, Joni da Silva Fraga, Rafael R. Obelheiro

Trust and Privacy

Privacy and Trust in Distributed Networks ..... 117
   Thomas Rössler, Arno Hollosi
Extending the SDSI / SPKI Model through Federation Webs ..... 132
   Altair Olivo Santin, Joni da Silva Fraga, Carlos Maziero
Trust-X: An XML Framework for Trust Negotiations ..... 146
   Elisa Bertino, Elena Ferrari, Anna C. Squicciarini

Application Security

How to Specify Security Services: A Practical Approach ..... 158
   Javier Lopez, Juan J. Ortega, Jose Vivas, Jose M. Troya
Application Level Smart Card Support through Networked Mobile Devices ..... 172
   Pierpaolo Baglietto, Francesco Moggia, Nicola Zingirian, Massimo Maresca
Flexibly-Configurable and Computation-Efficient Digital Cash with Polynomial-Thresholded Coinage ..... 181
   Alwyn Goh, Kuan W. Yip, David C.L. Ngo

Multimedia Security

Selective Encryption of the JPEG2000 Bitstream ..... 194
   Roland Norcen, Andreas Uhl
Robust Spatial Data Hiding for Color Images ..... 205
   Xiaoqiang Li, Xiangyang Xue, Wei Li
Watermark Security via Secret Wavelet Packet Subband Structures ..... 214
   Werner Dietl, Andreas Uhl
A Robust Audio Watermarking Scheme Based on MPEG 1 Layer 3 Compression ..... 226
   David Megías, Jordi Herrera-Joancomartí, Julià Minguillón
Loss-Tolerant Stream Authentication via Configurable Integration of One-Time Signatures and Hash-Graphs ..... 239
   Alwyn Goh, G.S. Poh, David C.L. Ngo
Confidential Transmission of Lossless Visual Data: Experimental Modelling and Optimization ..... 252
   Bubi G. Flepp-Stars, Herbert Stögner, Andreas Uhl

Author Index ..... 265
Computation of Cryptographic Keys from Face Biometrics

Alwyn Goh¹ and David C.L. Ngo²

¹ Corentix Laboratories, B-19-02 Cameron Towers, Jln 5/58B, 46000 Petaling Jaya, Malaysia
[email protected]
² Faculty of Information Science & Technology, Multimedia University, 75450 Melaka, Malaysia
Abstract. We outline cryptographic key-computation from biometric data based on error-tolerant transformation of continuous-valued face eigenprojections to zero-error bitstrings suitable for cryptographic applicability. Bio-hashing is based on iterated inner-products between pseudorandom and user-specific eigenprojections, each of which extracts a single bit from the face data. This discretisation is highly tolerant of data capture offsets, with same-user face data resulting in highly correlated bitstrings. The resultant user identification in terms of a small bitstring-set is then securely reduced to a single cryptographic key via Shamir secret-sharing. Generation of the pseudorandom eigenprojection sequence can be securely parameterised via incorporation of physical tokens. Tokenised bio-hashing is rigorously protective of the face data, with security comparable to cryptographic hashing of token and knowledge key-factors. Our methodology has several major advantages over conventional biometric analysis ie elimination of false accepts (FA) without unacceptable compromise in terms of more probable false rejects (FR), straightforward key-management, and cryptographically rigorous commitment of biometric data in conjunction with verification thereof.
1 Introduction

Biometric ergonomics and cryptographic security are highly complementary attributes, hence the motivation for the presented research. Computation of cryptographic keys from biometric data was first proposed in the Bodo patent [1], and is technically challenging from both signal processing and information security viewpoints. The representation problem is that biometric data (ie linear time-series or planar bitmaps) is continuous and high-uncertainty, while cryptographic parameters are discrete and zero-uncertainty. Biometric consistency—ie the difference between reference and test data, which are (at best) similar but never equal—is hence inadequate for cryptographic purposes, which require exact reproduction. This motivates the formulation of offset-tolerant discretisation methodologies, the end result of which is also required to protect against adversarial recovery of user-specific biometrics.
2 Review of Previous Work

The earliest publications in this domain are by Soutar et al [2, 3], whose research outlines cryptographic key-recovery from the integral correlation of freshly captured fingerprint data and previously registered bioscrypts. Bioscrypts result from the mixing of random and user-specific data—thereby preventing recovery of the original fingerprint data—with data capture uncertainties addressed via multiply-redundant majority-result table lookups. This ensures representation tolerance against offsets in same-user test fingerprints, but does not satisfactorily handle the issue of discrimination against different-user data.

The Davida et al [4, 5] formulation outlines cryptographic signature verification of iris data without stored references. This is accomplished via open token-based storage of user-specific Hamming codes necessary to rectify offsets in the test data, thereby allowing verification of the corrected biometrics. Such self-correcting biometric representations are applicable towards key-computation, with recovery of iris data prevented by complexity theory. Resolution of biometric uncertainty via Hamming error correction is rigorous from the security viewpoint, and improves on the somewhat heuristic Soutar et al lookups.

Monrose et al key-computation from user-specific keystroke [6] and voice [7] data is based on the deterministic concatenation of single-bit outputs based on logical characterisations of the biometric data, in particular whether user-specific features are below (0) or above (1) some population-generic threshold. These feature-derived bitstrings are used in conjunction with randomised lookup tables formulated via Shamir [8] secret-sharing. Error correction in this case is also rigorous, with Shamir polynomial thresholding and Hamming error correction considered to be equivalent mechanisms [5]. The inherent scalability of the bitstrings is another major advantage over the Soutar et al methodology.

Direct mixing of random and biometric data (as in Soutar et al) allows incorporation of serialised physical tokens, thereby resulting in token+biometric cryptographic keys. There are also advantages from the operations security viewpoint, arising from the permanent association of biometrics with their owners. Tokenised randomisation protects against biometric fabrication—as demonstrated by Matsumoto et al [9] for fingerprints, which is considered one of the more secure form factors—without adversarial knowledge of the randomisation, or equivalently possession of the corresponding token.
3 Bio-Hash Methodology

This paper outlines cryptographic key-computation from face bitmaps, or specifically from Sirovich-Kirby [10, 11] eigenprojections thereof. The proposed bio-hashing is based on: (1) biometric eigenanalysis: resulting in user-specific eigenprojections with a moderate degree of offset tolerance, (2) biometric discretisation: via iterated inner-product mixing of tokenised and biometric data, with enhanced offset tolerance, and (3) cryptographic interpolation: of Shamir secret-shares corresponding to token and biometric data, culminating in a zero-error key. Bio-hashing has the following advantages: (1) tokenised random mixing: in common with Soutar et al, (2) discretisation scalability: in common with Monrose et al, and (3) rigorous error correction: in common with Davida et al and Monrose et al. The proposed formulation is furthermore highly generic, arising from the proposed discretisation in terms of inner-products, ie s = a·b for a, b ∈ IR^n.

We believe our work to be the first demonstration of key-computation from face data, which seems difficult to handle (in common with other planar representations) using the Monrose et al procedure. Bio-hashing is essentially a transformation from representations which are high-dimension and high-uncertainty (the face bitmaps) to those which are low-dimension and zero-uncertainty (the derived keys). The successive representations are: (1) raw bitmap: x ∈ S in domain IR^N, with N the pixelisation dimension, (2) eigenprojection: a ∈ S′ in domain IR^n, with n << N the eigenbasis dimension, (3) discretisation: x ∈ S″ in domain 2^m, with m the bitstring length, and (4) interpolation: a in domain 2^m; as illustrated below:

Fig. 1. Bio-hash representations and transformations

with enhanced stability at each step. Note this abstracted outlook does not take into account bitmap pre-processing prior to step (2), which is in actual fact extremely important due to the obvious correlation between the offset tolerances of (2) and (3). Enhancements in the former can be effected via application of Hambridge feature location [12] and eigenanalysis as reported in Ngo-Goh [13]. Our methodology is still straightforwardly applicable, with a and x in this case a concatenation of feature-specific contributions.

The primary concern from the security viewpoint centres on protection of information during the representational transformations, and in particular whether these transformations can be inverted to recover the input information. The above-listed parameters are said to be zero-knowledge (ZK) representations of their inputs if the transformations are non-invertible, as in the case of cryptographic hash h(i, j): 2^m × 2^m′ → 2^m for token serialisation i and secret knowledge j. This motivates an equivalent level of protection for biometric a; which is accomplished via token-specification of the (3) and (4) representations, such that bio-hash H(i, a): 2^m × IR^n → 2^m does not jeopardise 〈i, a〉. ZK representation a = H(i, a) is
subsequently useful for standard cryptographic operations ie signature generation and message decryption. Note H has an important (and challenging) additional requirement over h, namely offset tolerance so that H(i, a) is stable for ∀a ∈ S′. This requirement essentially addresses the fundamental gap between biometric similarity and cryptographic equality. Our methodology is outlined in the above-discussed stages, as follows:

3.1 Biometric Eigenanalysis

Sirovich-Kirby principal components analysis (PCA) presumes that IR^N face bitmaps are more effectively represented as IR^n eigenprojections, with interim dimensionality M << N corresponding to the number of distinct users in the bitmap database. Eigenface characterisation requires computation of eigenbasis e_k (ranked by eigenvalue c_k significance) for k = 1…M. The n << M principal eigenfaces enable descriptive accuracy up to an externally specified level, with user-specific eigenprojections then computed as a_k = e_k†·α.

Conventional biometrics requires storage of user-specific a so as to provide a reference against freshly captured test data. This is not satisfactory from the security viewpoint, as an intercepted a opens up the possibility of transaction fraud. Revocation of a (analogous to password refreshment or token replacement) is also highly problematic for all biometric forms, and impossible for face data. This dilemma is a major motivation for our work, particularly in its emphasis that stored references are fundamentally insecure and that bio-hashing should operate in a one-way manner on fresh data, analogous to password hashing.

3.2 Biometric Discretisation

The most offset-tolerant transformation on face data a ∈ IR^n is reduction down to a single bit. This is accomplished via:

1. Compute s(a, b) = a·b = Σ_k c_k(a_k b_k) with random normalised b ∈ IR^n
2. Assign b(s) = 0 if s < µ−σ; 1 if s > µ+σ; ∅ if s ∈ [µ−σ, µ+σ]

for empirical µ and σ, the former of which should theoretically vanish due to the above specification of a relative to the population average. Extracted b(a·b) is a broad measure of whether 〈a, b〉 are inline or opposed, with σ applied to exclude the perpendicular case. This exclusion mitigates against data capture uncertainties in a, which might otherwise result in bit-inversion for numerically small s. Repetition of this procedure to obtain multiple bits raises the issue of inter-bit correlations, which is addressed via orthonormal set β = {b_k : k = 1…ν} with ν < n.
Each bit x_k = b(a·b_k) is hence rendered independent of all others, so that legitimate (and unavoidable) variations in ∀a ∈ S′ that invert x_k would not necessarily have the same effect on x_k′. Inter-bit correlations and observations thereof are also important from the security viewpoint, the latter of which is prevented via cryptographic hashing of the concatenated bits. Indeterminate bits x_k = ∅ are handled via replacement of near-perpendicular b_k with alternative b′_k, the net effect of which is bit-extraction via adjusted set β − ∀_{k∈⊥} b_k + ∀_{k∈⊥} b′_k. This reformulation is facilitated by the original stipulation on ν, which allows up to n−ν replacements for unsuitable b_k. The proposed discretisation via repeated inner-products then proceeds as follows (see the sketch after this list):

1. Generate random β + ∀b′ for k = 1…ν…n
2. Orthonormalise β + ∀b′ via the Gram-Schmidt procedure
3. For each k = 1…ν:
   1. Compute s_k = a·b_k
   2. While s_k ∈ [µ−σ, µ+σ]:
      1. Get next unused b′
      2. Reassign b_k = b′ in β
      3. Recompute s_k
   3. Assign x_k = b(s_k)
4. Concatenate α = ∀_k x_k
5. Compute x = h(α)

Note the easy adaptability to the previously discussed multi-feature biometrics, and also the inherent scalability (with respect to the α ∈ 2^ν bitlength) equivalent to the Monrose et al methodology. The experimental data in the next section is designed to address signal processing issues, hence the omission of step (5) there. Step (3.2) is critical for representational stability ie the confinement of x(a) for ∀a ∈ S′ to a small set S″, so as to facilitate mapping down to a single cryptographic key. This requires the generic stability of random x(a·b_k); and is a fundamental motivation for the presented error correction at two stages, the first of which uses σ valuation to mitigate against continuous-valued uncertainties in a. The second stage addresses the discretisation of these uncertainties in x(a) ∈ 2^m.

Recall the stipulation that a be protected equivalent to other cryptographic key-factors, which is accomplished via the use of tokenised cryptographic mechanisms—ie X9.17 pseudorandom generators [14] constructed from ciphers or hashes—in step (1). Resultant sequence β(i) and output x(i, a) are hence ZK representations of i, and consequently protective of a as subsequently outlined; which is reminiscent of the Soutar et al methodology. Note the effect of different token i′ on the β sequence, resulting in x(i, a) ≠ x(i′, a) to a high degree of certainty. The proposed tokenised discretisation can therefore be said to combine the best attributes of the Soutar et al and Monrose et al approaches.
3.3 Cryptographic Interpolation

The limited uncertainty of x ∈ S″ is addressed via Shamir secret-sharing; which uses modular polynomial f(x): ZZ_q → ZZ_q for secret encoding f(0) = a, which is the 2^m ≅ ZZ_q cryptographic key in our context. In the simplest linear case, this allows secret recovery via

a = x·f(x′)/(x − x′) + x′·f(x)/(x′ − x) (mod q)

with x = x(i, a) and x′ = h(i). Coordinate pair 〈x, f(x)〉 constitutes a secret-share, any two of which can be combined to recover a. The operational concept is to match one of the biometric-associated shares with the token-associated one, so as to be consistent with the above-outlined discussion on token-specific discretisations. This is a rigorous 2-of-µ threshold system, with µ = |S″| the number of possible discretisation outcomes corresponding to a particular user. The still approximate nature of the above-presented discretisation is addressed via prior specification and token-side insertion of X = 〈∀〈χ, y〉, c〉, with: (1) χ = h(x) and y = x′·f(x)/(x′ − x) mod q for ∀a ∈ S′, and (2) c = f(x′) mod q; corresponding to some random key-encoding polynomial f. Key-computation then commences as follows:

1. Retrieve X from token
2. Compute x = x(i, a) as previously outlined
3. Select y such that χ = h(x), else stop
4. Compute a = (c·x + y)/(x − h(i)) (mod q)

provided ∀x ∈ S″ have been properly identified. Note that a(i, a) in step (4) cannot be computed without one of the correct discretisations x or token i, and that neither of these can be recovered from the ZK representations in X. The latter can in fact be stored completely in the open, which is illustrative of the protocol-level security comparable to the Davida et al and Monrose et al formulations. This is in complete contrast to the highly sensitive handling of biometric references, and the serious consequences arising from failure thereof. It is furthermore possible to encode a password-associated key-share via prior specification in X of y″ = x′·f(x″)/(x′ − x″) mod q with x″ = h(j) from password j. This enables subsequent token+knowledge computation a = (c·x″ + y″)/(x″ − h(i)) (mod q) via y″ from token-side X as a backup option when the usual token+biometric computation is inapplicable ie in low-light conditions.

Polynomial thresholding is rigorous and versatile, but is on the other hand restrictive in that (small) µ has to be known a priori. This requires a high degree of error tolerance in the above-discussed discretisation x(i, a), as would result from suitable adjustment of σ. Key-interpolation is interpreted as a final error-correcting step in this context, supplementing the basic robustness of random bit-extraction and the replacement of bits over-sensitive to legitimate variations in a. End result a = H(i, a) is hence: (1) sensitively dependent on i: so that exact correctness is required for β(i) and x′(i), the former of which contributes sensitively towards x(i, a) ∈ S″, (2) robustly dependent on a; commensurate with the discrete i and continuous a key-factors.
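The 2-of-µ linear case can be made concrete with a few lines of Python. The prime q, the share abscissae and the helper names below are illustrative assumptions; only the degree-1 Lagrange recovery of f(0) reflects the scheme itself.

```python
# Minimal sketch of 2-of-mu linear secret-sharing recovery (Section 3.3).
q = 2**127 - 1                                # a Mersenne prime as modulus

def make_shares(secret, xs, c1=987654321):
    """Shares <x, f(x)> of f(x) = secret + c1*x (mod q), so f(0) = secret."""
    return [(x, (secret + c1 * x) % q) for x in xs]

def recover(s1, s2):
    """f(0) = (y1*x2 - y2*x1) / (x2 - x1) mod q, from any two shares."""
    (x1, y1), (x2, y2) = s1, s2
    inv = pow((x2 - x1) % q, -1, q)           # modular inverse
    return ((y1 * x2 - y2 * x1) % q) * inv % q

shares = make_shares(secret=42, xs=[5, 11, 17])  # e.g. token/biometric-derived x
assert recover(shares[0], shares[1]) == 42
assert recover(shares[1], shares[2]) == 42
```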
4 Experimental Data

The proposed methodology is tested on Spacek's Faces94 dataset [15] posted on the University of Essex Website. This dataset contains frontal face photos taken from a fixed camera distance, with the subjects asked to speak throughout the process; resulting in biometric data with the following characteristics: (1) database size: 153 individuals, 3060 images, (2) bitmap dimension: 180×200 pixels, 256-level grayscale, (3) photo illumination: relatively uniform, with dark background, (4) face scale in image: relatively uniform, (5) face position in image: minor variations, (6) face aspect: very minor variations in turn, tilt and slant, and (7) face expression: significant variation due to speech.

Faces94 is considered to be somewhat less challenging in comparison to other widely analysed datasets (ie Faces95 and Faces96) from the viewpoint of scale, aspect and illumination offsets; but is excellent for our purposes as it simulates our anticipated operational scenario ie individual users in desktop or kiosk environments. These scenarios allow biometric capture under relatively controlled conditions, with users safely presumed to be facing forward in adequately illuminated surroundings. Recall the focus of this paper on the effects of post-eigenanalytic discretisation and error correction; hence our omission of image-preprocessing, which is acceptable for Faces94 but far less so for the other datasets. We look forward to presenting a more comprehensive analysis—with more challenging data, and incorporating image-preprocessing—in a subsequent publication. Faces94 is furthermore quite large with 20 distinct images per person; so that half can be used for establishment of the population eigenbasis, and the rest for testing.

The featured experimental configurations are as follows: (1) pca-n: denoting IR^n eigenanalysis, (2) pca+d-n: denoting 2^n σ = 0 discretisation without exclusion of weak inner-products, and (3) pca+de-n: denoting 2^n discretisation with σ error-correction based on analysis of inner-products computed from random and user-specific eigenprojections. The last configuration amounts to exclusion parameter σ selecting the n most significant inner-products from a random sample of size n′ > n. This necessitates an IR^n′ eigenbasis, with n′−n corresponding to the Hamming distances between same-user discretisations. Our methodology requires relatively small Hamming distances in the pca+d-n configurations, which are then further reduced via error-correction for pca+de-n. We acquired experimental data for n = 20, 30, 40, 50, 60, 70 and 80 in all cases.
4.1 Same and Different User Histogrammes

Population-wide histogrammes for: (1) Euclidean distance between same and different user eigenprojections, (2) Hamming distance between same and different user discretisations; are presented below:
Fig. 2. Same and different user histogrammes for pca-n, pca+d-n and pca+de-n; for n = (a) 20, (b) 40, (c) 60 and (d) 80
on a normalised scale, with measure ∆x′ derived from: (1) ∆x/2x_p for Euclidean distances, with x_p the peak of the different user histogrammes for pca-n, and (2) ∆x/n for Hamming distances arising from pca+d-n and pca+de-n. Note the occurrence of histogramme peaks—for pca+d-n (red-highlighted) and pca+de-n (blue-highlighted)—at Hamming distances of 0 (same user) and n/2 (different users), both of which are strong vindications of the proposed methodology. Clear separation of the same and different user histogrammes is extremely important from the security viewpoint, hence the attractiveness of the pca+de-n same user histogrammes with their much steeper peak-to-plateau drop-offs compared to the corresponding pca+d-n profiles. The above-outlined Euclidean normalisation allows for qualitative comparison of pca-n characteristics, which also emphasises the advantages of the pca+de-n configurations. These sharp drop-offs are clearly apparent in the n = 40 and 60 cases, but less so for n = 20 and 80. This can be attributed to descriptive insufficiency for low n, and over-sensitivity to noise for high n configurations; not just for the proposed α ∈ 2^n bitstrings but also for the basic a ∈ IR^n.

The form of the pca+de-n = 40 and 60 histogrammes allows for specification of zero FAs without overly jeopardising the FR performance. FR (FA = 0) is, in fact, an important merit criterion in the proposed framework, which anticipates H(i, a) parameterised cryptographic functionality. It is important to be able to preclude the occurrence of FAs in this context.

4.2 FA and FR Characteristics

Establishment of FR (FA = 0) and the more commonly cited crossover error (CE) rate (at which point FA = FR) for a particular configuration requires analysis of the FA-FR operational characteristics ie:
Fig. 3. Operational characteristics for pca-n, pca+d-n and pca+de-n; for n = (a) 20, (b) 40, (c) 60 and (d) 80
Note the higher CE rates of pca+d-n (red-highlighted) compared to the corresponding pca-n configuration; the former of which eventually drops under the latter, corresponding to lower FR (FA = 0) rates. The pca+d-n configuration is hence more secure than the corresponding pca-n, but on the other hand somewhat less robust in terms of recognition. Error-correction can certainly be expected to improve recognition, as can be seen from the consistent location of pca+de-n (blue-highlighted) inside the corresponding pca+d-n profile. This reduces the CE point dramatically for the n = 20, 40 and 60 cases; but (in common with the Fig. 2 histogrammes) less so for n = 80. The CE points for pca+de-n are in fact a significant improvement over the corresponding pca-n, again with the notable exception of the n = 80 case.

4.3 General Characteristics

The general characteristics of pca-n, pca+d-n and pca+de-n are as follows:

Table 1. Characteristics of (a) pca-n, (b) pca+d-n and pca+de-n

(a) pca-n
a eigenbasis n | Same-user diff (Euclidean) | FR % (FA = 0) | CE %
          20   |           0.030            |      4.47     | 0.56
          30   |           0.035            |      2.80     | 0.57
          40   |           0.041            |      2.49     | 0.49
          50   |           0.046            |      2.37     | 0.49
          60   |           0.049            |      2.26     | 0.41
          70   |           0.053            |      1.69     | 0.55
          80   |           0.055            |      1.60     | 0.37

(b) pca+d-n and pca+de-n
α bitlength n | Same-user diff (Hamming)  | FR % (FA = 0)     | CE %
              |   pca+d   |   pca+de      | pca+d  | pca+de   | pca+d | pca+de
         20   |   1.15    |    0.03       | 29.70  |  3.37    | 2.07  |  0.02
         30   |   1.72    |    0.09       |  8.42  |  0.01    | 1.02  |  0.01
         40   |   2.18    |    0.15       |  4.04  |  0.01    | 0.77  |  0.01
         50   |   2.85    |    0.47       |  1.98  |  0.27    | 0.57  |  0.07
         60   |   3.53    |    0.86       |  1.57  |  0.22    | 0.49  |  0.10
         70   |   4.17    |    1.79       |  1.17  |  0.35    | 0.55  |  0.15
         80   |   4.48    |    4.51       |  1.31  |  0.93    | 0.37  |  0.34
and clearly illustrate the functional shortcomings of under and over-sized n representations. Choice of operational n in the (30, 45) range appears most suitable, so as to simultaneously avoid degraded recognition and frequent occurence of bit-errors. Note the relatively small Hamming distances between same user pca+d-n bitstrings, which vindicates the Section 3.2 discretisation and error-correction. This implies the sufficiency of relatively small n′–n margins. The even smaller Hamming differences—less than a single-bit for most of the above-tabulated operational range—in the augmented pca+de-n case is also encouraging as it suggests a relatively small number of x(i, a) outcomes per user, which is important for the Section 3.3 interpolation.
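For reference, the Table 1 error rates can be reproduced from the raw Hamming-distance populations by sweeping an acceptance threshold; the sketch below shows one plausible way to do so. The array names and the threshold sweep are assumptions, not the authors' evaluation code.

```python
# Hedged sketch: FA/FR operational characteristics from Hamming distances.
import numpy as np

def fa_fr_curve(same, diff, n_bits):
    """same/diff: arrays of same-user and different-user Hamming distances."""
    rows = []
    for t in range(n_bits + 1):               # acceptance threshold: d <= t
        fr = float(np.mean(same > t))         # genuine pairs rejected
        fa = float(np.mean(diff <= t))        # impostor pairs accepted
        rows.append((t, fa, fr))
    return rows

def fr_at_zero_fa(rows):
    """FR (FA = 0): smallest FR among thresholds with no false accepts."""
    return min(fr for _, fa, fr in rows if fa == 0.0)

def crossover(rows):
    """Approximate CE rate: the point where FA and FR are closest."""
    _, fa, fr = min(rows, key=lambda r: abs(r[1] - r[2]))
    return (fa + fr) / 2
```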
5 Security Analysis

The security of H should be evaluated in terms of key-factor: (1) independence: ie evaluation of a = H(i, a) in the absence of i or a, and (2) non-recovery: of i or a given a specific value of a(i, a) and the other factor; with the benchmark being cryptographic hashing of i and secret knowledge j. Recall that a = h(i, j) cannot be computed without both 〈i, j〉 factors, so that adversarial deduction is no more probable than random guessing of order 2^−m. Factorisation 〈i, j〉 is also protected by the target-collision resistance of h, so that deduction of i or j—from output a(i, j) and one of the factors—is equally improbable.

5.1 Key-Factor Independence

Non-possession of i means that tokenised β(i) is unavailable to an adversary, so that previously intercepted (or fabricated) a is simply not useful. This prevents meaningful deduction of a(i, a), with random guessing being of probability q^−1 in this case. Possession of i is more useful as it divulges ∀χ = h(x) from the token-inserted X(i, a), which suggests an analytic strategy whereby random α ∈ 2^ν bitstrings are tested for suitability with respect to condition h(h(α)) = χ. The collision probability is µ·2^−ν in this case, hence the motivation to minimise µ and to maximise ν. This is accomplished via suitable choice of inner-product exclusion parameter σ (which serves no useful purpose if over-large); and also by adoption of the previously discussed multi-feature eigenanalysis [13], so that lengthier α can be concatenated from feature-specific bitstrings. Note α with arbitrarily large ν are straightforwardly obtained from integral transform representations, which do not restrict the length of the β(i) sequence. Recall this issue of discretisation scalability also arises in the Monrose et al formulation.

The operational security of our scheme is enhanced via token-side access control and encryption of X, with respective parameterisation 〈k, k′〉 = h(i′, i) for domain or platform serialisation i′. This necessitates prior token-side insertion of Ψ = E_k′(X), with the following operational sequence:

1. Compute 〈k, k′〉 from token i
2. Transmit k to retrieve Ψ from token
3. Recover X = D_k′(Ψ)

prior to the computations of Section 3, successful completion of which is restricted to domain/platform i′.
5.2 Key-Factor Non-recovery

Knowledge of a(i, a) and a does not in any way jeopardise i, due to non-recovery of: (1) any x ∈ S″ from a, (2) any b_k ∈ β(i) from x and a, and (3) i from β(i) or any subset thereof; thereby resulting in i deduction being no more probable than the 2^−m of random guessing. The other scenario of a and i compromise allows testing of random a ∈ IR^n eigenprojections for suitability with respect to condition h(x(i, a)) = χ. Probability of a recovery in this case is µ·p^ν, with p < 2^−1 due to exclusion of numerically small inner-products.
Key-factor protection is enhanced via reasonable operational measures: (1) minimisation of µ and maximisation of ν, and (2) access control and encryption of X; in addition to incorporation of i′ dependence in the β sequence. This β(i, i′) specification is straightforwardly accomplished ie via initialisation b_0(i′) for the proposed X9.17 pseudorandom generator.

5.3 Cryptographic Applicability

The above-outlined a = H(i, a) computation facilitates the application of asymmetric cryptographic protocols, ie for (1) online verification over a priori insecure environments, or (2) offline commitment and subsequent verification in relation to specific data; without presumptions that might be operationally inconvenient or unrealistic ie the establishment of communications security prior to biometric verification. Secure channel establishment in any case requires cryptographic support—ie the Diffie-Hellman (DH) [14] protocol—hence the motivation for the integrated handling of biometric and communications security, as subsequently outlined.

Bio-hash H allows for cryptography predicated on 〈i, a〉 possession, which is more secure (due to simplicity of the key-computation conditions) and furthermore supportive of greater functional sophistication. Cryptographic operations are straightforwardly parameterised via discrete logarithmic (DL) [14] or elliptic curve (EC) [16] key-pairs of form 〈a(i, a), A(a)〉 with public-key A = a·g for basepoint g in some scalar-multiplicative subgroup G_q ⊆ E of the specified curve. User-specific key-pair 〈a, A〉 is hence a ZK representation of 〈i, a〉 via the H and g(a): ZZ_q → G_q transformations, with remote identification in terms of A(i, a) ∈ G_q. This is qualitatively superior compared to the insecure and functionally limited a ∈ IR^n of conventional biometrics.
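As an illustration of the DL case, the sketch below reduces the bio-hash output to an exponent and publishes A = g^a mod p. The toy modulus and the hash-based reduction are assumptions for illustration; a deployment would use a standardised DL group or an elliptic curve, as the paper suggests.

```python
# Hedged sketch: a discrete-logarithm key-pair <a, A> from a = H(i, a).
import hashlib

p = 0xFFFFFFFFFFFFFFC5   # 2**64 - 59, a small prime: toy group for illustration
g = 5

def keypair_from_biohash(biohash_output: bytes):
    a = int.from_bytes(hashlib.sha256(biohash_output).digest(), "big") % (p - 1)
    A = pow(g, a, p)     # public key A = g^a mod p
    return a, A

a, A = keypair_from_biohash(b"example bio-hash output")
assert pow(g, a, p) == A
```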
6 Concluding Remarks

This paper outlines error-tolerant discretisation and cryptographic key-computation from user-specific face images and uniquely serialised tokens. Our bio-hash methodology has significant functional advantages over conventional biometrics ie extremely clean separation of the same and different user histogrammes and a near-zero CE point, thereby allowing elimination of FAs without suffering from increased occurrence of FRs. H(i, a) is furthermore highly secure with respect to independence and non-
recovery of the 〈i, a〉 key-factorisation, with tokenised immunity against biometric interception or fabrication. Use of token+biometric key a(i, a) within the context of asymmetric cryptography is also attractive in that it enables secure and versatile functionality.
References

1. A Bodo (1994). Method for Producing a Digital Signature with Aid of a Biometric Feature. German Patent DE 42-43-908-A1
2. C Soutar & GJ Tomko (1996). Secure Private Key Generation Using a Fingerprint. Cardtech/Securetech Conf 1: pp 245–252
3. C Soutar, D Roberge, A Stoianov, R Gilroy & BVK Vijaya Kumar (1998). Biometric Encryption Using Image Processing. SPIE 3314: pp 178–188
4. GI Davida, Y Frankel & BJ Matt (1998). On Enabling Secure Applications Through Off-Line Biometric Identification. IEEE Symp on Security & Privacy: pp 148–157
5. GI Davida, Y Frankel, BJ Matt & R Peralta (1999). On the Relation of Error Correction and Cryptography to an Off-Line Biometric-Based Identification Scheme. Wkshop Coding & Cryptography: Paris, France
6. F Monrose, MK Reiter & S Wetzel (1999). Password Hardening Based on Keystroke Dynamics. 6th ACM Conf on Comp & Comms Security: pp 73–82
7. F Monrose, MK Reiter, Q Li & S Wetzel (2001). Cryptographic Key Generation from Voice. IEEE Symp on Security & Privacy: pp 202–213
8. A Shamir (1979). How to Share a Secret. ACM Comms 22 (11): pp 612–613
9. T Matsumoto, H Matsumoto, K Yamada & H Hoshino (2002). Impact of Artificial "Gummy" Fingers on Fingerprint Systems. SPIE 4677
10. L Sirovich & M Kirby (1987). A Low-Dimensional Procedure for Characterisation of Human Faces. J Optical Soc 4 (3): pp 519–524
11. M Turk & A Pentland (1991). Face Recognition Using Eigenfaces. IEEE Conf Comp Vision & Pattern Recognition: pp 586–591
12. J Hambridge (1926). The Elements of Dynamic Symmetry. Yale Univ Press, New Haven, USA
13. DCL Ngo & A Goh (2003). Facial Feature Extraction via Dynamic Symmetry Modelling for User Identification. Pattern Recognition Letters
14. AJ Menezes, P van Oorschot & S Vanstone (1996). Handbook of Applied Cryptography. CRC Press, Boca Raton, USA
15. L Spacek (2000). Face Recognition Data. http://cswww.essex.ac.uk/allfaces/index.html
16. AJ Menezes (1993). Elliptic-Curve Public-Key Cryptosystems. Kluwer Academic Press, Boston, USA
AUTHMAC_DH: A New Protocol for Authentication and Key Distribution

Heba K. Aslan

The Electronics Research Institute, El-Tahrir St., Dokki, Cairo, Egypt
[email protected]
Abstract. In the present paper, a new protocol for authentication and key distribution is proposed. The new protocol aims to achieve performance comparable to the Kerberos protocol while overcoming its drawbacks. To authenticate the messages exchanged during authentication and key distribution, the new protocol uses Message Authentication Codes (MAC) to exchange the Diffie-Hellman components. On the other hand, the new protocol uses nonces to ensure the freshness of the exchanged messages; subsequently, there is no need for clock synchronization, which simplifies the system requirements. The new protocol is analyzed using a queuing model; the performance analysis shows that the new protocol has a performance comparable to the Kerberos protocol for short messages and outperforms it for large messages.
1 Introduction

In recent years, the number of applications that require the exchange of sensitive information over public networks has increased considerably. This introduces two requirements: firstly, the need to provide authenticity of the communicating parties and to ensure the freshness of the exchanged messages; secondly, the need to ensure the security of the exchanged information. In the literature, many protocols for authentication and key distribution have been given. While some of them use symmetric key techniques, the others use public key techniques to distribute a symmetric key. Examples of the first method are the Kerberos protocol [1, 2] and the KryptoKnight protocol [3]. On the other hand, examples of the second method are the SPX protocol [4, 5] and the authenticated Diffie-Hellman protocol [6]. Among the above protocols, the Kerberos protocol achieves widespread use, especially in UNIX environments. This is due to the fact that it uses symmetric techniques; therefore, it leads to a better performance than the other protocols.

In the present paper, a new protocol for authentication and key distribution is proposed. The new protocol, which is named AUTHMAC_DH, is based on work done in [7, 8] and aims to achieve performance comparable to the Kerberos protocol while overcoming its drawbacks [9]. The paper is organized as follows: in Section 2, a description of the new protocol is detailed. In Section 3, expressions for the performance analysis of both the Kerberos protocol and the new protocol are derived. In Section 4, numerical results of the new protocol are discussed. Finally, the paper concludes in Section 5.
2 AUTHMAC_DH: A New Protocol for Authentication and Key Distribution
The Kerberos protocol is based on the exchange of a symmetric key between the communicating parties using symmetric key encryption. Therefore, in case of compromise of the messages exchanged during authentication and key distribution, all subsequent communication will be susceptible to disclosure. Besides, to ensure the freshness of the exchanged messages, timestamps are used; consequently, the need for synchronizing all clocks of the system, which is a difficult problem, becomes a vital requirement.

To overcome the abovementioned drawbacks, the new protocol uses Message Authentication Codes (MAC), as in the KryptoKnight protocol, to exchange the Diffie-Hellman components, as in the authenticated Diffie-Hellman protocol. The compromise of messages exchanged during authentication and key distribution leads only to the disclosure of the Diffie-Hellman public components; therefore, the attacker cannot calculate the symmetric key used between the communicating parties (the difficulty of computing discrete logarithms for a large modulus is a well-known problem in number theory). The use of MACs speeds up the proposed protocol. On the other hand, the new protocol uses nonces to ensure the freshness of the exchanged messages; subsequently, there is no need for clock synchronization, which simplifies the system requirements.

In order to achieve the goals of authentication and key distribution, a third party called the Security Manager (SM) is incorporated into the system. The SM stores the Diffie-Hellman components of all registered users and shares a symmetric key with all users. Each station needs to store only its Diffie-Hellman secret and the symmetric key shared between it and the SM. Fig. 1 shows the steps required to perform authentication and key distribution. The steps of the protocol are described in the following paragraphs.

Step 1: When client A wants to communicate with server B, it sends to the SM a message consisting of: A's identity 'A', B's identity 'B', and a nonce Na.

Step 2: After receiving A's request, the SM calculates the MAC of the following messages:
- the symmetric key shared between SM and A 'KSM,A', A, B, Na, the Diffie-Hellman component of A 'a^x mod p', and the Diffie-Hellman component of B 'a^y mod p';
- the symmetric key shared between SM and B 'KSM,B', A, B, Na, a^x mod p, and a^y mod p.
Then, it sends to both A and B a message consisting of: A, B, Na, a^x mod p, a^y mod p, MAC(KSM,A, A, B, Na, a^x mod p, a^y mod p), and MAC(KSM,B, A, B, Na, a^x mod p, a^y mod p).

Step 3: Upon receiving the reply of the SM, both A and B authenticate the received message using the MAC appended to the reply. A ensures the freshness of the received message using Na. Then, A and B calculate the symmetric key a^xy mod p, which will be used to encrypt subsequent communication between them. Next, B generates a nonce Nb and sends to A a message consisting of Na and Nb, both encrypted using a^xy mod p. After receiving B's reply, A decrypts the message of B and compares the nonce included in the message with the nonce it generated 'Na'. If they are equal, this implies that B can calculate a^xy mod p; therefore, A authenticates B.
Step 4: A sends to B a message consisting of Nb encrypted using a^xy mod p. Upon receiving the reply of A, B authenticates A by decrypting the reply and comparing the nonce included in the message with the nonce it generated 'Nb'.

After completion of the abovementioned steps, both A and B have authenticated each other and share a symmetric key a^xy mod p. The proposed protocol has the following advantages over the Kerberos protocol:
- It does not rely on timestamps, which simplifies the system requirements.
- The Diffie-Hellman components are exchanged during authentication and key distribution, and not the symmetric key itself as in the Kerberos protocol; therefore, the compromise of messages exchanged during authentication and key distribution does not lead to the disclosure of the symmetric key used between A and B.

In the next section, expressions for the performance analysis of both the AUTHMAC_DH and the Kerberos protocols will be derived.
Step 1: A sends to the SM a request to communicate with B
Transmitted message of Step 1: [A, B, Na]

Step 2: The SM broadcasts its reply to both A and B
Transmitted message of Step 2: [A, B, Na, a^x mod p, a^y mod p, MAC(KSM,A, A, B, Na, a^x mod p, a^y mod p), MAC(KSM,B, A, B, Na, a^x mod p, a^y mod p)]
x : Diffie-Hellman secret of A
y : Diffie-Hellman secret of B
a, p : Diffie-Hellman parameters

Step 3: A authenticates B
Transmitted message of Step 3: [Na, Nb] encrypted under a^xy mod p

Step 4: B authenticates A
Transmitted message of Step 4: [Nb] encrypted under a^xy mod p

Fig. 1. Steps required during authentication and key distribution for the AUTHMAC_DH protocol
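The message flow of Fig. 1 can be exercised end-to-end with standard-library primitives, as sketched below. HMAC-SHA256 stands in for the UMAC used later in the paper, the Diffie-Hellman group is a toy one, and all sizes are illustrative assumptions.

```python
# Runnable sketch of the AUTHMAC_DH exchange (Fig. 1), under stated assumptions.
import hmac, hashlib, os

p, g = 0xFFFFFFFFFFFFFFC5, 5                     # toy Diffie-Hellman parameters
K_sm_a, K_sm_b = os.urandom(16), os.urandom(16)  # keys shared with the SM
x, y = 17, 29                                    # DH secrets of A and B
gx, gy = pow(g, x, p), pow(g, y, p)              # a^x mod p, a^y mod p (held by SM)

Na = os.urandom(8)                               # Step 1: A -> SM : [A, B, Na]

# Step 2: SM -> A, B : components plus one MAC per recipient
body = b"A" + b"B" + Na + gx.to_bytes(8, "big") + gy.to_bytes(8, "big")
mac_a = hmac.new(K_sm_a, body, hashlib.sha256).digest()
mac_b = hmac.new(K_sm_b, body, hashlib.sha256).digest()

# Steps 3-4: each side verifies its MAC, then derives the session key for the
# encrypted [Na, Nb] / [Nb] nonce handshake.
assert hmac.compare_digest(mac_a, hmac.new(K_sm_a, body, hashlib.sha256).digest())
assert hmac.compare_digest(mac_b, hmac.new(K_sm_b, body, hashlib.sha256).digest())
key_a = pow(gy, x, p)                            # A computes (a^y)^x mod p
key_b = pow(gx, y, p)                            # B computes (a^x)^y mod p
assert key_a == key_b                            # shared secret a^xy mod p
```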
3 Performance Analysis of the AUTHMAC_DH and the Kerberos Protocols

In order to derive the performance expressions, the following assumptions are made: the number of clients = m and the number of servers = n. All clients are identical and have the same statistics; the same is true for the servers. The client requests a service from the server at rate λ requests/sec. The rate of message exchange between the client and the server is v messages/sec. Messages exchanged between clients and servers are of exponential distribution with mean length equal to L bits. Symmetric encryption is done using the Advanced Encryption Standard (AES) [10] with rate TAES bits/sec. Calculation of the Diffie-Hellman components is done using the Chinese remainder theorem [11] with rate TDH bits/sec. Message authentication codes are calculated using the UMAC algorithm [12] with rate TMAC bits/sec. The network capacity = C bits/sec. In the following subsections, expressions for the performance analysis of the AUTHMAC_DH and the Kerberos protocols will be derived.

3.1 Performance Analysis of the AUTHMAC_DH Protocol

Fig. 2 illustrates the time sequence diagram of the AUTHMAC_DH protocol. The steps of the protocol are given in Section 2. According to Fig. 2, the mean message time Tmes for transmitting a message from a client to a server is given by:
Tmes = (1/r){E[Wc1]+Sc1 + Wn1+tn1 + E[Wsm]+Ssm + Wn2+tn2 + E[Ws1]+Ss1 + Wn3+tn3 + E[Wc3]+Sc3 + Wn4+tn4 + E[Ws2]+Ss2} + E[Wc4]+Sc4 + Wn5+tn5 + E[Ws3]+Ss3    (1)

where r is the number of transmitted messages between the client and the server using the key calculated in the key distribution phase, Wc1 is the waiting time in the client queue (A) before processing the request for the communication with the server (B), Sc1 is the time required for processing the request of A to communicate with B, Wn1 is the waiting time in the network queue, tn1 is the time required for transmitting the request of A to the SM, Wsm is the waiting time in the Security Manager (SM) queue before processing the request of A, Ssm is the time required to calculate the UMAC of (KSM,A, A, B, Na, a^x mod p, a^y mod p) and the UMAC of (KSM,B, A, B, Na, a^x mod p, a^y mod p), Wn2 is the waiting time in the network queue, tn2 is the time required for transmitting the SM reply to both A and B, Ws1 is the waiting time in the server's queue before processing the reply of the SM, Ss1 is the time required to calculate the UMAC of (KSM,B, A, B, Na, a^x mod p, a^y mod p), to calculate a^xy mod p, and to encrypt (Na, Nb) using the AES cipher, Wn3 is the waiting time in the network queue, tn3 is the time required for transmitting the server's reply to A, Wc3 is the waiting time in the client queue before processing B's reply, Sc3 is the time required to decrypt (Na, Nb) and to encrypt (Nb) using the AES cipher, Wn4 is the waiting time in the network queue, tn4 is the time required for transmitting A's reply to B, Ws2 is the waiting time in the server's queue before processing the reply of A, Ss2 is the time required to decrypt (Nb), Wc4 is the waiting time in the client queue before encryption of the message transmitted to B, Sc4 is the time required to encrypt a message of length L using the AES cipher, Wn5 is the waiting time in the network queue, tn5 is the time required for transmitting the message from A to B, Ws3 is the waiting time in the server's queue before processing the message of A, and Ss3 is the time required to decrypt A's message.
In order to calculate the mean message time, the following assumptions are made: A's identity = B's identity = 8 bits, Na = Nb = 16 bits, a^x mod p = a^y mod p = a^xy mod p = 512 bits, KSM,A = KSM,B = 128 bits, the encryption block of the AES = 128 bits, the output block of the UMAC algorithm = 32 bits, and TAES = 20TDH = TMAC/5. According to the previous assumptions, the following parameters are calculated: Sc1 = 0 (since it involves no encryption), Ssm = 476/TAES, Sc2 = 10478/TAES, Ss1 = 10510/TAES, Sc3 = 48/TAES, Ss2 = 16/TAES, and Sc4 = Ss3 = L/TAES. For simplicity, the following assumptions are made: Wc1 = Wc2 = Wc3 = Wc4 = E[Wc], Ws1 = Ws2 = Ws3 = E[Ws], and Wn1 = Wn2 = Wn3 = Wn4 = Wn5 = E[Wn].
Fig. 2. The time sequence diagram for the AUTHMAC_DH protocol
For the client queue: The client is modeled as an M/G/1 queue. It has to be noted that the key calculated in the authentication and key distribution phase 'a^xy mod p' has a length of 512 bits; therefore, four keys, each of 128 bits, could be extracted from these 512 bits. Each key could be used r′ times between the client and the server (i.e. r = 4r′). The client queue has the following parameters: the arrival rate

λclient = (number of arrivals per ticket / lifetime of the ticket) × number of servers = (3 + 4r′)nv / 4r′

the mean service time

Tclient = [1/(4r′+3)](Sc1 + Sc2 + Sc3) + [4r′/(4r′+3)]Sc4 = (10497 + 4r′L) / ((4r′+3)TAES)

and the traffic intensity ρclient = nv(2624 + r′L)/(r′TAES). For M/G/1 queues, the waiting time is given by [13, Eq. (2.65)]:

E[W] = λE[τ²] / (2(1 − ρ))    (2)

where ρ is the traffic intensity, λ is the mean arrival rate, and E[τ²] is the second moment of the service time, which is equal to:

E[τ²] = (109790788 + 8r′L²) / ((4r′+3)TAES²)

Substituting into Eq. (2), the average waiting time E[Wc] is equal to:

E[Wc] = nv(13723849 + r′L²) / (r′TAES²(1 − nv(2624 + r′L)/(r′TAES)))    (3)
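Numerically, Eq. (3) is just the Pollaczek-Khinchine formula of Eq. (2) evaluated at the client queue's parameters, as the sketch below shows; the operating point (TAES, n, v, r′, L) is an assumed example, not a value from the paper.

```python
# Sketch of Eqs. (2)-(3): M/G/1 mean waiting time for the client queue.
def mg1_wait(lam, e_tau2, rho):
    """E[W] = lam * E[tau^2] / (2 * (1 - rho)), valid for rho < 1."""
    assert rho < 1, "queue must be stable"
    return lam * e_tau2 / (2 * (1 - rho))

TAES, v, n, r_, L = 1e9, 0.1, 10, 25, 1024        # assumed operating point
lam    = (3 + 4 * r_) * n * v / (4 * r_)          # client arrival rate
e_tau2 = (109790788 + 8 * r_ * L**2) / ((4 * r_ + 3) * TAES**2)
rho    = n * v * (2624 + r_ * L) / (r_ * TAES)
print(mg1_wait(lam, e_tau2, rho))                 # E[Wc] of Eq. (3)
```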
For the server queue: The server queue is modeled as an M/G/1 queue with the following parameters:

λserver = mv(1 + 2r′) / 2r′

Tserver = [1/(4r′+2)](Ss1 + Ss2) + [4r′/(4r′+2)]Ss3 = (5263 + 2r′L) / ((2r′+1)TAES)

ρserver = mv(2632 + r′L)/(r′TAES), and E[τ²] = (55230178 + 2r′L²)/((2r′+1)TAES²). Substituting into Eq. (2), the average waiting time E[Ws] is equal to:

E[Ws] = mv(27615089 + r′L²) / (2r′TAES²(1 − mv(2632 + r′L)/(r′TAES)))    (4)
For the Security Manager queue: The SM queue is modeled as an M/D/1 queue with the following parameters: λsm = mnv/4r′, Tsm = 476/TAES, and ρsm = 120mnv/(r′TAES). For an M/D/1 queue, the average waiting and service time is given by [13, Eq. (2.63)]:

E[w+s] = (1 − ρ/2) / (µ(1 − ρ))    (5)

where ρ is the traffic intensity of the queue and µ is its service rate. Substituting into Eq. (5), the average waiting and service time E[Wsm + Ssm] is equal to:

E[Wsm + Ssm] = (476/TAES)(1 − 60mnv/(r′TAES)) / (1 − 120mnv/(r′TAES))    (6)
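The corresponding M/D/1 evaluation for the Security Manager is equally direct; again the numeric operating point is an assumed example.

```python
# Sketch of Eqs. (5)-(6): M/D/1 waiting-plus-service time at the SM.
def md1_wait_plus_service(rho, mu):
    """E[w+s] = (1 - rho/2) / (mu * (1 - rho)), valid for rho < 1."""
    assert rho < 1, "queue must be stable"
    return (1 - rho / 2) / (mu * (1 - rho))

TAES, m, n, v, r_ = 1e9, 50, 10, 0.1, 25          # assumed operating point
rho_sm = 120 * m * n * v / (r_ * TAES)            # SM traffic intensity
mu_sm  = TAES / 476                               # SM service rate (1 / Tsm)
print(md1_wait_plus_service(rho_sm, mu_sm))       # E[Wsm + Ssm] of Eq. (6)
```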
For the network queue: The network queue is modeled as an M/G/1 queue with the following parameters: λnetwork = mnv(r′+1)/r′, the mean packet length = (300 + r′L)/(r′+1), Tnetwork = (300 + r′L)/(C(r′+1)), ρnetwork = mnv(300 + r′L)/(r′C), and E[τ²] = (314176 + 2r′L²)/((r′+1)C²). Substituting into Eq. (2), the average waiting time E[Wn] is equal to:

E[Wn] = mnv(157088 + r′L²) / (r′C²(1 − mnv(300 + r′L)/(r′C)))    (7)
The mean message time Tmes given in Eq. (1) can now be calculated using Eqs. (3), (4), (6) and (7):

\[
\begin{aligned}
T_{mes} ={}& \frac{120}{r'T_{AES}}\cdot\frac{1-\dfrac{60mnv}{r'T_{AES}}}{1-\dfrac{120mnv}{r'T_{AES}}}
+ \frac{300+r'L}{r'C} + \frac{2644+2r'L}{r'T_{AES}} \\
&+ \frac{nv(2r'+1)(13723849+r'L^2)}{2r'^2T_{AES}^2\left(1-\dfrac{nv(2624+r'L)}{r'T_{AES}}\right)}
+ \frac{mv(2r'+1)(27615089+r'L^2)}{4r'^2T_{AES}^2\left(1-\dfrac{mv(2632+r'L)}{r'T_{AES}}\right)} \\
&+ \frac{mnv(r'+1)(157088+r'L^2)}{r'^2C^2\left(1-\dfrac{mnv(300+r'L)}{r'C}\right)}
\end{aligned}
\tag{8}
\]
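Eq. (8) can be checked numerically by direct transcription. The sketch below is illustrative only (the function and argument names are assumptions; rp stands for r'); the default parameter values follow Section 4:

```python
def t_mes_authmac_dh(v, m=150, n=15, rp=10, L=1000, T_AES=1e9, C=10e9):
    """Mean message time for AUTHMAC_DH, a direct transcription of Eq. (8)."""
    sm = (120 / (rp * T_AES)) * (1 - 60 * m * n * v / (rp * T_AES)) \
         / (1 - 120 * m * n * v / (rp * T_AES))                 # SM (M/D/1) term
    tx = (300 + rp * L) / (rp * C) + (2644 + 2 * rp * L) / (rp * T_AES)
    cl = n * v * (2 * rp + 1) * (13723849 + rp * L ** 2) / (
        2 * rp ** 2 * T_AES ** 2 * (1 - n * v * (2624 + rp * L) / (rp * T_AES)))
    sv = m * v * (2 * rp + 1) * (27615089 + rp * L ** 2) / (
        4 * rp ** 2 * T_AES ** 2 * (1 - m * v * (2632 + rp * L) / (rp * T_AES)))
    nw = m * n * v * (rp + 1) * (157088 + rp * L ** 2) / (
        rp ** 2 * C ** 2 * (1 - m * n * v * (300 + rp * L) / (rp * C)))
    return sm + tx + cl + sv + nw

print(t_mes_authmac_dh(v=100) * 1e6, "microseconds")  # e.g. v = 100 messages/sec
```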
3.2 Performance Analysis of the Kerberos Protocol

Fig. 3 shows the time sequence diagram of the Kerberos protocol. For more information about the Kerberos protocol, the reader may refer to [1]. The steps of the protocol are summarized below:

Step 1: A sends the AS a request to communicate with B.
Transmitted message of Step 1: [A, B]
  A : A's identity
  B : B's identity

Step 2: The AS sends its reply to A.
Transmitted message of Step 2: [[Tas, L1, Ka,b, B]Ka,as, [Tas, L1, Ka,b, A]Kb,as]
  Tas : timestamp generated by the AS
  Ka,b : symmetric key between A and B
  L1 : lifetime of Ka,b
  Ka,as : symmetric key between A and the AS
  Kb,as : symmetric key between B and the AS

Step 3: A forwards the AS's reply to B.
Transmitted message of Step 3: [A, [Tas, L1, Ka,b, A]Kb,as, [A, Ta]Ka,b]
  Ta : timestamp generated by A

Step 4: B authenticates itself to A.
Transmitted message of Step 4: [B, Ta+1]Ka,b

According to Fig. 3, the mean message time Tmes for transmitting a message from a client to a server is given by:
\[
\begin{aligned}
T_{mes} ={}& \frac{1}{r'}\bigl\{E[W_{c1}]+S_{c1}+E[W_{n1}]+t_{n1}+E[W_{as}]+S_{as}+E[W_{n2}]+t_{n2}+E[W_{c2}]+S_{c2} \\
&\quad +E[W_{n3}]+t_{n3}+E[W_{s1}]+S_{s1}+E[W_{n4}]+t_{n4}+E[W_{c3}]+S_{c3}\bigr\} \\
&+E[W_{c4}]+S_{c4}+E[W_{n5}]+t_{n5}+E[W_{s2}]+S_{s2}
\end{aligned}
\tag{9}
\]

where r' is the number of transmitted messages between the client and the server using the key transmitted in the key distribution phase, Wc1 is the waiting time in the client queue (A) before processing the request for the communication with the server (B),
Sc1 is the time required for processing the request of A to communicate with B, Wn1 is the waiting time in the network queue, tn1 is the time required for transmitting the request of A to the AS, Was is the waiting time in the Authentication Server (AS) queue before processing the request of A, Sas is the time required to encrypt (Tas, L1, Ka,b, A) and (Tas, L1, Ka,b, B) using the AES cipher, Wn2 is the waiting time in the network queue, tn2 is the time required for transmitting the AS reply to A, Wc2 is the waiting time in A's queue before processing the reply from the AS, Sc2 is the time required to decrypt (Tas, L1, Ka,b, B) and to encrypt (A, Ta) using the AES cipher, Wn3 is the waiting time in the network queue, tn3 is the time required for transmitting A's reply to B, Ws1 is the waiting time in the server's queue before processing the reply of A, Ss1 is the time required to decrypt (Tas, L1, Ka,b, A), to decrypt (A, Ta), and to encrypt (B, Ta+1) using the AES cipher, Wn4 is the waiting time in the network queue, tn4 is the time required for transmitting B's reply to A, Wc3 is the waiting time in the client queue before processing B's reply, Sc3 is the time required to decrypt (B, Ta+1) using the AES cipher, Wc4 is the waiting time in the client queue before encryption of the message transmitted to B, Sc4 is the time required to encrypt a message of length L using the AES cipher, Wn5 is the waiting time in the network queue, tn5 is the time required for transmitting the message from A to B, Ws2 is the waiting time in the server's queue before processing the message of A, and Ss2 is the time required to decrypt A's message.
Fig. 3. The time sequence diagram for the Kerberos protocol
In order to calculate the mean message time, the following assumptions are made: A's identity = B's identity = 8 bits, Tas = 24 bits, L1 = 8 bits, Ka,b = 128 bits, and the encryption block of the AES = 128 bits. According to these assumptions, the following parameters are calculated: Sc1 = 0 (since it involves no encryption), Sas = 336/T_AES, Sc2 = 200/T_AES, Ss1 = 232/T_AES, Sc3 = 32/T_AES, and Sc4 = Ss2 = L/T_AES. For simplicity, the following assumptions are made: Wc1 = Wc2 = Wc3 = Wc4 = E[Wc], Ws1 = Ws2 = E[Ws], and Wn1 = Wn2 = Wn3 = Wn4 = Wn5 = E[Wn].
For the client queue: The client queue is modeled as an M/G/1 queue with the following parameters:

\[
\lambda_{client} = \frac{(3+r')nv}{r'}, \qquad T_{client} = \frac{232+r'L}{(r'+3)T_{AES}}, \qquad \rho_{client} = \frac{nv(232+r'L)}{r'T_{AES}},
\]
\[
E[\tau^2] = \frac{41024+2r'L^2}{(r'+3)T_{AES}^2}.
\]

Substituting into Eq. (2), the average waiting time E[Wc] is equal to:

\[
E[W_c] = \frac{nv(20512+r'L^2)}{r'T_{AES}^2\left(1-\dfrac{nv(232+r'L)}{r'T_{AES}}\right)} \tag{10}
\]
For the server queue: The server queue is modeled as an M/G/1 queue with the following parameters:

\[
\lambda_{server} = \frac{(1+r')mv}{r'}, \qquad T_{server} = \frac{232+r'L}{(r'+1)T_{AES}}, \qquad \rho_{server} = \frac{mv(232+r'L)}{r'T_{AES}},
\]
\[
E[\tau^2] = \frac{53824+2r'L^2}{(r'+1)T_{AES}^2}.
\]

Substituting into Eq. (2), the average waiting time E[Ws] is equal to:

\[
E[W_s] = \frac{mv(26912+r'L^2)}{r'T_{AES}^2\left(1-\dfrac{mv(232+r'L)}{r'T_{AES}}\right)} \tag{11}
\]
For the Authentication Server queue: The AS queue is modeled as an M/D/1 queue with the following parameters:

\[
\lambda_{as} = \frac{mnv}{r'}, \qquad T_{as} = \frac{336}{T_{AES}}, \qquad \rho_{as} = \frac{336mnv}{r'T_{AES}}.
\]

Substituting into Eq. (5), the average waiting-plus-service time E[Was + Sas] is equal to:

\[
E[W_{as}+S_{as}] = \frac{336}{T_{AES}}\cdot\frac{1-\dfrac{168mnv}{r'T_{AES}}}{1-\dfrac{336mnv}{r'T_{AES}}} \tag{12}
\]
For the network queue: The network queue is modeled as an M/G/1 queue with the following parameters:

\[
\lambda_{network} = \frac{(r'+4)mnv}{r'}, \qquad \text{mean packet length} = \frac{584+r'L}{r'+4}, \qquad T_{network} = \frac{584+r'L}{(r'+4)C},
\]
\[
\rho_{network} = \frac{mnv(584+r'L)}{r'C}, \qquad E[\tau^2] = \frac{154176+2r'L^2}{(r'+4)C^2}.
\]

Substituting into Eq. (2), the average waiting time E[Wn] is equal to:

\[
E[W_n] = \frac{mnv(77088+r'L^2)}{r'C^2\left(1-\dfrac{mnv(584+r'L)}{r'C}\right)} \tag{13}
\]
The mean message time Tmes given in Eq. (9) can now be calculated using Eqs. (10)–(13):

\[
\begin{aligned}
T_{mes} ={}& \frac{336}{r'T_{AES}}\cdot\frac{1-\dfrac{168mnv}{r'T_{AES}}}{1-\dfrac{336mnv}{r'T_{AES}}}
+ \frac{584+r'L}{r'C} + \frac{464+2r'L}{r'T_{AES}} \\
&+ \frac{nv(r'+3)(20512+r'L^2)}{r'^2T_{AES}^2\left(1-\dfrac{nv(232+r'L)}{r'T_{AES}}\right)}
+ \frac{mv(r'+1)(26912+r'L^2)}{r'^2T_{AES}^2\left(1-\dfrac{mv(232+r'L)}{r'T_{AES}}\right)} \\
&+ \frac{mnv(r'+4)(77088+r'L^2)}{r'^2C^2\left(1-\dfrac{mnv(584+r'L)}{r'C}\right)}
\end{aligned}
\tag{14}
\]
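Eq. (14) can be transcribed in the same way; sweeping v through both this function and the Eq. (8) sketch above, under the Section 4 assumptions, reproduces comparisons of the kind plotted in Figs. 4–7 (again an illustrative sketch with assumed names, not code from the paper):

```python
def t_mes_kerberos(v, m=150, n=15, rp=10, L=1000, T_AES=1e9, C=10e9):
    """Mean message time for Kerberos, a direct transcription of Eq. (14)."""
    as_q = (336 / (rp * T_AES)) * (1 - 168 * m * n * v / (rp * T_AES)) \
           / (1 - 336 * m * n * v / (rp * T_AES))               # AS (M/D/1) term
    tx = (584 + rp * L) / (rp * C) + (464 + 2 * rp * L) / (rp * T_AES)
    cl = n * v * (rp + 3) * (20512 + rp * L ** 2) / (
        rp ** 2 * T_AES ** 2 * (1 - n * v * (232 + rp * L) / (rp * T_AES)))
    sv = m * v * (rp + 1) * (26912 + rp * L ** 2) / (
        rp ** 2 * T_AES ** 2 * (1 - m * v * (232 + rp * L) / (rp * T_AES)))
    nw = m * n * v * (rp + 4) * (77088 + rp * L ** 2) / (
        rp ** 2 * C ** 2 * (1 - m * n * v * (584 + rp * L) / (rp * C)))
    return as_q + tx + cl + sv + nw
```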
4 Numerical Results and Discussions

In the previous section, formulas for the mean message time for both the AUTHMAC_DH and the Kerberos protocols were derived. In order to obtain performance curves, the following assumptions are made: the number of clients m = 150, the number of servers n = 15, and the number of transmitted messages between the client and the server r' = 10. In the present paper, the performance is analyzed for the following cases:

- Two message lengths are assumed: one for short messages, where L = 1000 bits, and the other for large messages, where L = 1 Mbit.
- Two encryption speeds are assumed: one for low encryption speed, where T_AES = 1 Mbps, and the other for high encryption speed, where T_AES = 1 Gbps.
- Two network rates are assumed: one for low network rate, where C = 1 Gbps, and the other for high network rate, where C = 10 Gbps.

In the following paragraphs, a performance comparison between the proposed protocol and the Kerberos protocol is given. Figs. 4–7 depict Tmes versus v for the Kerberos and the proposed protocols. Figs. 4 and 5 are plotted for L = 1000 bits: Fig. 4 for T_AES = 1 Gbps and Fig. 5 for T_AES = 1 Mbps. Figs. 6 and 7 are plotted for L = 1 Mbit: Fig. 6 for T_AES = 1 Gbps and Fig. 7 for T_AES = 1 Mbps. All (a) figures are plotted for C = 10 Gbps, while all (b) figures are plotted for C = 1 Gbps. From the figures, the following remarks can be deduced:

1. For short messages (L = 1000 bits) and high encryption speed (T_AES = 1 Gbps), two cases are examined: one for a transmission rate C = 10 Gbps and the other for C = 1 Gbps.
   - For C = 10 Gbps, the Kerberos protocol has a better performance than the new protocol, as shown in Fig. 4.a. For short messages and high encryption speed, it can be concluded from all the queues of both protocols that the Kerberos protocol outperforms the new protocol.
24
H.K. Aslan
   - For C = 1 Gbps, the new protocol outperforms the Kerberos protocol, as illustrated in Fig. 4.b. This is due to the fact that, as the network speed decreases, the performance becomes network dependent. Moreover, it can be concluded from the network queues (Eqs. 7 and 13) that the network time of the new protocol is less than that of the Kerberos protocol.

2. For short messages and low encryption speed (T_AES = 1 Mbps), the Kerberos protocol has a better performance than the new protocol, as illustrated in Fig. 5. This results from the fact that for low encryption rates the performance becomes dependent on the server's queue. From the server queues (Eqs. 4 and 11), it is clear that for short messages the server's waiting time in the Kerberos protocol is less than that of the new protocol.

3. For large messages (L = 1 Mbit), the new protocol outperforms the Kerberos protocol (Figs. 6–7). This results since, for large messages, both the network and the server queues of the new protocol have a better performance than those of the Kerberos protocol, for both low and high encryption speeds. The same conclusion applies for both high and low network speeds.
Fig. 4. Tmes versus v for L = 1000 bits and TAES = 1 Gbps: a. C = 10 Gbps b. C = 1 Gbps
Fig. 5. Tmes versus v for L=1000 bits and TAES=1 Mbps: a. C = 10 Gbps b. C = 1 Gbps
Fig. 6. Tmes versus v for L = 1 Mbits and TAES = 1 Gbps: a. C = 10 Gbps b. C = 1 Gbps
Fig. 7. Tmes versus v for L = 1 Mbits and TAES = 1 Mbps: a. C = 10 Gbps b. C = 1 Gbps
5 Conclusions

In the present paper, a new protocol for authentication and key distribution, called AUTHMAC_DH, is proposed. In order to provide authentication of the messages exchanged during authentication and key distribution, the AUTHMAC_DH protocol uses Message Authentication Codes (MACs) to exchange the Diffie-Hellman components. The compromise of messages exchanged during authentication and key distribution will therefore lead to the disclosure of the Diffie-Hellman components and not of the symmetric key itself, so the attacker cannot calculate the symmetric key used between the communicating parties. This feature is an advantage over the Kerberos protocol, in which the disclosure of messages exchanged during authentication and key distribution leads to the disclosure of the symmetric key used between the communicating parties. It has to be noted that the use of MACs also speeds up the proposed protocol. In addition, the AUTHMAC_DH protocol uses nonces to ensure the freshness of the exchanged messages; consequently, there is no need for clock synchronization, which simplifies the system requirements and is considered a second advantage over the Kerberos protocol. Performance expressions for both the AUTHMAC_DH and the Kerberos protocols are derived using queuing model analysis. The performance is evaluated under several conditions: both short and large messages are examined, the evaluation considers high and low encryption speeds, and the analysis is undertaken for low and high network speeds. The performance analysis shows that the AUTHMAC_DH protocol has a comparable performance with the Kerberos protocol for short messages and outperforms it for large messages. In conclusion, besides having comparable performance with the Kerberos protocol, the AUTHMAC_DH protocol overcomes its drawbacks.
References

1. Kohl, J. T., and Neuman, B. C.: The Kerberos Network Authentication Service (V5). RFC 1510 (1993)
2. Kohl, J. T., Neuman, B. C., and Ts'o, T. Y.: The Evolution of the Kerberos Authentication Service. IEEE Computer Society Press (1994)
3. Molva, R., Tsudik, G., Herreweghen, E. V., and Zatti, S.: KryptoKnight Authentication and Key Distribution System. Proceedings of ESORICS 92 (1992)
4. Tardo, J. J., and Alagappan, K.: SPX: Global Authentication Using Public Key Certificates. IEEE Privacy and Security Conference (1991)
5. Tardo, J. J., and Alagappan, K.: SPX Guide: A Prototype Public Key Authentication Service. Digital Equipment Corporation (1991)
6. Ford, W.: Computer Communications Security: Principles, Standard Protocols and Techniques. Prentice Hall (1994)
7. Aslan, H. K.: Logic-Based Analysis and Performance Evaluation of a New Protocol for Authentication and Key Distribution in Distributed Environments. Ph.D. Thesis, Electronics and Communications Dept., Faculty of Engineering, Cairo University (1998)
8. El-Hadidi, M. T., Hegazi, N. H., and Aslan, H. K.: Logic-Based Analysis of a New Hybrid Encryption Protocol for Authentication and Key Distribution. IFIP SEC '98 Conference (1998)
9. Bellovin, S. M., and Merritt, M.: Limitations of the Kerberos Authentication System. Computer Communication Review (1990)
10. Daemen, J., and Rijmen, V.: AES Proposal: Rijndael. Available at http://www.nist.gov/aes
11. http://www.iaik.tugraz.at/aboutus/people/groszschaedl/papers/acsac2000.pdf
12. Black, J., Halevi, S., Krawczyk, H., Krovetz, T., and Rogaway, P.: UMAC: Fast and Secure Message Authentication. Advances in Cryptology, Crypto '99, Lecture Notes in Computer Science, Vol. 1666, Springer-Verlag (1999)
13. Schwartz, M.: Telecommunication Networks: Protocols, Modeling and Analysis. Addison-Wesley (1987)
Multipoint-to-Multipoint Secure-Messaging with Threshold-Regulated Authorisation and Sabotage Detection

Alwyn Goh¹ and David C.L. Ngo²

¹ Corentix Laboratories, B-19-02 Cameron Towers, Jln 5/58B, 46000 Petaling Jaya, Malaysia. [email protected]
² Faculty of Information Science & Technology, Multimedia University, 75450 Melaka, Malaysia
Abstract. This paper presents multi-user protocol-extensions for Schnorr/Nyberg-Rueppel (NR) signatures and Zheng signcryption, both of which are elliptic curve (EC)/discrete logarithmic (DL) formulations. Our extension methodology is based on k-of-n threshold cryptography—with Shamir polynomial parameterisation and Feldman-Pedersen verification—resulting in multi-sender Schnorr-NR (SNR) and multi-sender/receiver Zheng-NR (ZNR) protocols, all of which are interoperable with their single-user base formulations. The ZNR protocol-extensions are compared with the earlier Takaragi et al multi-user sign-encryption, which is extended from a base-protocol with two random key-pairs, following the usual specification of one each for signing and encryption. Both single and double-pair formulations are analysed from the viewpoint of EC equivalence (EQ) establishment, which is required for rigorous multi-sender functionality. We outline a rectification to the original Takaragi et al formulation, thereby enabling parameter-share verification, but at significantly increased overheads. This enables comprehensive equivalent-functionality comparisons with the various multi-user ZNR protocol-extensions. The single-pair ZNR approach is shown to be significantly more efficient, in some cases demonstrating a two/three-fold advantage.
1 Introduction

The emergence of various technologies ie peer-to-peer computing and ad hoc communications motivates the development of transactional models beyond the presently dominant presumption of single-user functionality and point-to-point connectivity. This in turn motivates the development of cryptographic protocols to support network-mediated collaboration and workgroup transactions, the multi-user nature of which is not accommodated naturally by the conventional presumption of user-specific key-parameterisation. External transaction-to-workgroup association is a far better solution—from the viewpoint of transactional logic and liability—which also reduces the receiver-side storage overhead to a single public-key. The cryptographic specification is therefore to rigorously associate multiple user-specific key-shares with a common workgroup public-key, so that a configurable user-subset is able to exercise workgroup-representative authority. This can be
elegantly implemented via the polynomial-based k-of-n threshold methodology of Shamir [1] and Feldman-Pedersen [2-4], which is applicable to EC [5, 6]/DL protocols. k-of-n thresholding is therefore a useful multi-user specification methodology, as demonstrated by the Park-Kurosawa [7] and Takaragi et al [8] extensions of ElGamal [9] and NR [10] signatures respectively, the latter of which was presented to the Institute of Electrical and Electronics Engineers (IEEE) study-group for public-key cryptography standards. Takaragi et al also specify a sign-encryption protocol able to incorporate multiple senders and receivers. This paper departs from earlier work in its emphasis on secure-messaging rather than signatures, with its focus on integration of message authentication/encryption and multi-user functionality. We outline k-of-n threshold extensions for Schnorr [11], Zheng [12] and NR constructions, with the characteristic property of message-level parameterisation based on a single EC/DL key-pair of the initial sender-determined randomisation. This approach was motivated by the use of the single pair in Zheng signcryption for both authentication and encryption, which is a departure from the more frequently encountered specification of distinct key-pairs for each message-related functionality, as exemplified by the Takaragi et al NR-derived (TNR) sign-encryption. Single (rather than double) key-pair secure-messaging is significantly more compute-efficient on a point-to-point basis, and is shown in this paper to be similarly advantageous for multi-user extensions. This applies to both fast and rigorous multi-sender modes, the latter of which necessitates detection of malformed parameter-shares via ECEQ establishment. The original multi-sender TNR sign-encryption is, in fact, not rigorous due to non-establishment of ECEQ, which can be rectified via application of the Chaum-Pedersen [13] and Fiat-Shamir [14] protocols.
2 Review of Base Protocols and Mechanisms

2.1 Schnorr-Zheng, NR and Takaragi et al Cryptography

All signature and secure-messaging protocols in this section presume prior specification of an EC/DL finite-field. We adopt the former description, denoted F with basepoint g ∈ F and multiplicative-group G = {k g : k ∈ Z_q} ⊂ F. Schnorr signatures are inherently bandwidth-efficient, with signature bit-length of |h| + |q| (for h some cryptographic hash) independent of the underlying finite-field. Zheng secure-messaging extends the Schnorr formulation to enable receiver-designation, so that the sender-side signcryption and receiver-side unsigncryption operations respectively incorporate symmetric cipher operations 〈E, D〉. Both protocols require prior specification of sender (A) key-pair 〈a, A (= a g)〉, with Zheng additionally necessitating receiver (B) key-pair 〈b, B (= b g)〉. Sender and receiver-side computations then proceed as outlined in Table 1.
Table 1. (a) Schnorr and (b) Zheng protocols

      (a) Schnorr                          (b) Zheng
A     Generate 〈k, K (= k g)〉             Generate 〈k, E (= k B)〉
      Compute ν = F(K)                     Compute 〈µ, ν〉 = F(E)
      Compute r = h_ν(m)                   Compute c = E_µ(m)
      Compute s = k − ar (mod q)           Compute r = h_ν(m)
                                           Compute s = k − ar (mod q)
      ↓ 〈m, r, s〉                          ↓ 〈c, r, s〉
B     Recover K = sg + rA                  Recover E = b(sg + rA)
      Recover ν = F(K)                     Recover 〈µ, ν〉 = F(E)
      Confirm h_ν(m) = r                   Recover m = D_µ(c)
                                           Confirm h_ν(m) = r

Here F is some key-formatting function, most conveniently implemented with hash h. Note the use of basepoint g and receiver public-key B as the expansion point for initial randomisation k, resulting in random message-specific public-keys K and E. This prescription is entirely consistent with NR cryptography, with the only difference being specification of r = ν − h(m) instead of the above-outlined r = h_ν(m).
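As a concrete illustration of Table 1(a), the following Python sketch instantiates the Schnorr flow in a toy multiplicative group mod p, where the EC operation k g corresponds to g^k. The group parameters, helper names, and the use of SHA-256/HMAC for F and h_ν are assumptions of this sketch, far too small and simplified for real use:

```python
import hashlib, hmac, secrets

# Toy discrete-log group (assumption): p = 2q + 1, g of prime order q.
p, q, g = 23, 11, 2

def F(point):                       # key-formatting function F, here a plain hash
    return hashlib.sha256(str(point).encode()).digest()

def h_keyed(key, msg):              # h_nu(m): keyed hash, reduced mod q
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big") % q

def schnorr_sign(a, msg):
    k = secrets.randbelow(q - 1) + 1
    K = pow(g, k, p)                # K = k g in the paper's additive notation
    r = h_keyed(F(K), msg)
    s = (k - a * r) % q
    return r, s

def schnorr_verify(A, msg, r, s):
    K = (pow(g, s, p) * pow(A, r, p)) % p   # recover K = s g + r A
    return h_keyed(F(K), msg) == r

a = secrets.randbelow(q - 1) + 1    # sender key-pair <a, A>
A = pow(g, a, p)
r, s = schnorr_sign(a, b"hello")
assert schnorr_verify(A, b"hello", r, s)
```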
The computation-overheads of SNR and ZNR are essentially equal from the viewpoint of EC scalar-multiplication (M) operations, each of which is far more expensive than EC point-addition (A) or number-field/symmetric computations. Leading-order analysis then yields sender and receiver-side overheads of M and 2M. ZNR is therefore significantly more compute-efficient compared to the usual superposition of signing and encryption operations. S/ZNR is also more efficient than ElGamal and the USA National Institute of Standards and Technologies (NIST) Digital Signature Standard (DSS), both of which require sender/receiver-side number-field multiplicative inversion. K and E have different functional roles, the latter of which enforces receiver-side demonstration of private-key b as a precondition for message-recovery and verification. This is beyond the scope of pure multisignature formulations, but is important for collaborative protocols with receiver-designation. The Takaragi et al NR-extended (TNR) sign-encryption—with explicit use of K for authentication and E for encryption—takes an alternative approach, as outlined below:

Table 2. TNR sign-encryption

A     Generate 〈k, K, E〉
      Compute ν = F(K) and µ = F(E)
      Compute r = ν − h(m)
      Compute s = k − ar (mod q)
      Compute c = E_µ(m)
      ↓ 〈c, r, s〉
B     Recover K = sg + rA and ν
      Recover E = b K and µ
      Recover m = D_µ(c)
      Confirm ν = r + h(m)
This formulation costs 2M on the sender-side and 3M on the receiver-side, the latter of which arises from the necessity to sequentially compute K and then E. Both are significantly more compute-intensive than the corresponding ZNR operations.

2.2 k-of-n Polynomial Thresholding

k-of-n threshold cryptography as formulated by Shamir allows workgroup (set of all users U) key-parameterisation via the (k−1)-degree polynomial
\[
e(x) = \sum_{\mu=0}^{k-1} e_\mu x^\mu \pmod q,
\]

with a = e(0) mod q interpreted as the workgroup private-key. Individual users—of which there are n, indexed i ∈ U—would then be assigned polynomial-associated private key-shares a_i = e(i) mod q, which are essentially a k-th share of a if e is secret. This arises from the necessity of at least k datapoints of form 〈i, e(i)〉 for finite-field Lagrange interpolation, ie

\[
e(x) = \sum_{i\in S} e(i) \prod_{j\in S-\{i\}} \frac{x-j}{i-j} \pmod q.
\]

Evaluation of this expression results in

\[
e(0) = a = \sum_{i\in S} \varepsilon_i a_i \pmod q
\quad\text{with index-coefficient}\quad
\varepsilon_i = \prod_{j\in S-\{i\}} \frac{j}{j-i} \pmod q
\]

for any k-sized subset S ⊂ U. Knowledge of e should be restricted to a trusted key-generator (T), whose role will be subsequently outlined.
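The share-generation and Lagrange-recovery arithmetic can be made concrete with a short Python sketch; the modulus, function names and the 3-of-5 example are illustrative assumptions:

```python
import secrets

q = 2**127 - 1                      # illustrative prime modulus (an assumption)

def make_shares(a, k, n):
    """Split private-key a via a random (k-1)-degree polynomial with e(0) = a."""
    coeffs = [a] + [secrets.randbelow(q) for _ in range(k - 1)]
    return {i: sum(c * pow(i, mu, q) for mu, c in enumerate(coeffs)) % q
            for i in range(1, n + 1)}

def epsilon(i, S):
    """Index-coefficient eps_i = prod_{j in S - {i}} j / (j - i) mod q."""
    num = den = 1
    for j in S:
        if j != i:
            num = num * j % q
            den = den * (j - i) % q
    return num * pow(den, -1, q) % q

def recover(shares):
    """a = e(0) = sum_{i in S} eps_i a_i mod q, for any k-sized subset S."""
    S = list(shares)
    return sum(epsilon(i, S) * shares[i] for i in S) % q

a = secrets.randbelow(q)
shares = make_shares(a, k=3, n=5)
assert recover({i: shares[i] for i in (1, 3, 5)}) == a   # any 3 of 5 suffice
```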
Pedersen verification allows individual key-shares a_i to be verified as a k-th portion of workgroup private-key a without divulging polynomial e. This operation can be executed with [3] or without [4] a centralised T. Presumption of T allows an efficient non-interactive implementation, with individual EC key-pairs 〈a_i, A_i (= a_i g)〉 and polynomial parameterisation 〈e_µ, ê_µ (= e_µ g)〉, the latter of which includes workgroup key-pair 〈a, A (= a g)〉. Key-share generation, distribution and verification between T and all users i ∈ U then proceeds as follows:

Table 3. Key-share generation, distribution and verification
T       Generate polynomial 〈e_µ, ê_µ〉
        Generate key-share 〈a_i, A_i〉 for ∀i
        Authenticated ch: ⇓ 〈ê_µ, i, A_i〉
        Secure ch: ↓ a_i
i ∈ U   Confirm a_i g = Σ_{µ=0}^{k−1} (i^µ mod q) ê_µ = A_i
the last step of which is a zero knowledge (ZK) verification of key-share possession by user i, thereby enabling engagement in the subsequently outlined protocols. Note the non-interactive nature of the above-described one-time procedure, with authenticated communication essentially equivalent to signed postings on a bulletin-board.
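The confirmation step of Table 3 can be illustrated in a toy discrete-log setting, where the published commitments ê_µ = e_µ g of the paper correspond to g^(e_µ); all parameters are assumptions of this sketch:

```python
import secrets

# Toy DL group (assumption): p = 2q + 1, generator g of prime order q.
p, q, g = 23, 11, 2
k, n = 3, 5

e = [secrets.randbelow(q) for _ in range(k)]       # polynomial e(x); e[0] = a
e_pub = [pow(g, c, p) for c in e]                  # published e-hat_mu

def share(i):                                      # a_i = e(i) mod q
    return sum(c * pow(i, mu, q) for mu, c in enumerate(e)) % q

def verify_share(i, a_i):
    """Last step of Table 3: check a_i g = sum_mu (i^mu mod q) e-hat_mu = A_i."""
    lhs = pow(g, a_i, p)
    rhs = 1
    for mu, c_pub in enumerate(e_pub):
        rhs = rhs * pow(c_pub, pow(i, mu, q), p) % p
    return lhs == rhs

for i in range(1, n + 1):
    assert verify_share(i, share(i))
assert not verify_share(1, (share(1) + 1) % q)     # a corrupted share fails
```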
3 Basic Multi-sender Protocol-Extensions

3.1 Individual and Workgroup Parameterisations

The most straightforward extension methodology would be via the SNR K and ZNR E public-keys as the starting point. The protocol parameters are outlined below:

Table 4. Sender-specific and workgroup-combined parameters
            SNR                                 ZNR
i ∈ S       k_i, K_i (= k_i g)                  k_i, E_i (= k_i B)
S ⊂ U       K = Σ_{i∈S} K_i; ν, r               E = Σ_{i∈S} E_i; 〈µ, ν〉, 〈c, r〉
i ∈ S       s_i = k_i − ε_i a_i r (mod q)
S ⊂ U       s = Σ_{i∈S} s_i (mod q)
with Schnorr-Zheng/NR differentiation via the specification for r. The end result would be SNR signature 〈r, s〉 or ZNR signcryption 〈c, r, s〉, as would be computed by an entity with private-key a = Σ_{i∈S} ε_i a_i (mod q). This approach has been demonstrated by Takaragi et al to be advantageous compared to the earlier Park-Kurosawa formulation with individual random polynomials. The Takaragi et al description of multisignature formation specifies broadcast of all individual 〈K_i, s_i〉 and repeated computation of the common K and 〈r, s〉 by each i ∈ S. We outline an alternative presentation with a centralised combiner (C) of workgroup parameters—the details of which can be logged and straightforwardly verified—which is also applicable towards TNR multisignatures, as demonstrated in Table 5,
Table 5. TNR multisignature formation

      i ∈ S                               C
1     Generate 〈k_i, K_i〉; K_i →
2                                         Compute K, ν and r; ⇐ 〈∀i, r〉
3     Compute ε_i and s_i; s_i →
4                                         Compute ∀ε_i
                                          Confirm s_i g + r ε_i A_i = K_i
                                          Compute s

resulting in NR signature 〈r, s〉. Such an implementation clearly and efficiently separates security-critical sender-specific and verifiable workgroup-aggregated operations, the latter of which does not result in an externally (to the workgroup) visible contribution.
3.2 Multi-sender Extended Cryptography

Recall that TNR multisignatures are an extension of the NR base-formulation, hence the applicability of Table 5 to Schnorr multisignatures via the definition r = h_ν(m). The equivalent ZNR extension is as follows:

Table 6. ZNR multi-signcryption
      i ∈ S                               C
1     Generate 〈k_i, E_i〉; E_i →
2                                         Compute E, 〈µ, ν〉 and 〈c, r〉
                                          Compute ∀ε_i; ⇐ 〈∀i, r〉
3     Compute ε_i and s_i; s_i →
4                                         Compute s
Both T/SNR and ZNR formulations have sender-side overheads of M (computation) and |p| + |q| (communication), which is slightly higher (with respect to bandwidth) compared with the single-sender base-protocols in Table 1. The 2kM computation required for T/SNR signature-share verification in step (4) of Table 5 is noteworthy, as is the modest (k/2 + 1)|h| broadcast overhead after step (2) in both protocol-extensions. Note that submission of K_i after step (1), and its subsequent verification in step (4) as in Table 5, does not preclude protocol-sabotage by individual users. This can be executed via submission of E_i = k_i B and s′_i = k′_i − ε_i a_i r (mod q) with different initial randomisations, resulting in receiver-side inability to recover the signcrypted message. Detection and mitigation of malformed parameter-shares motivates our subsequent analysis of TNR sign-encryption, and formulation of a ZNR extension with verified combination.
4 Multi-sender Protocol-Extension with Verified Combination

4.1 Analysis of Randomised Key-Pairs

Verification of the ZNR-shares in Table 6 essentially requires establishment that the public-keys 〈K, E〉 are ECEQ. This is not demonstrated in TNR multi-sender sign-encryption—which simply uses one key-pair each for parameter-share authentication and encryption—as outlined below:-
Table 7. TNR multi-sender sign-encryption

      i ∈ S                               C
1     Generate 〈k_i, K_i, E_i〉; K_i, E_i →
2                                         Compute K, ν and r; ⇐ 〈∀i, r〉
3     Compute ε_i and s_i; s_i →
4                                         Compute ∀ε_i
                                          Confirm s_i g + r ε_i A_i = K_i
                                          Compute s
                                          Compute E, µ and c
Note the pair-related computations are essentially independent signing and encryption operations—with increased sender-side overheads of 2M and 2|p| + |q|—which is problematic due to individual senders being able to sabotage the protocol through submission of a non-ECEQ pair 〈K, E′〉. Such a malformed submission enables successful verification (internal to the workgroup), but prevents proper receiver-side recovery (typically outside the workgroup). Saboteurs can therefore remain undetected in TNR multi-sender sign-encryption. This inability to detect non-ECEQ pairs prior to combination is unfortunate, since typical operations might result in submission of more than k parameter-shares. Combiner-side detection of sabotaged parameter-shares under such circumstances would therefore allow for their straightforward replacement with well-formed ones, so that the resultant 〈c, r, s〉 is also well-formed. Lack of such a capability, on the other hand, is problematic in any number of realistic operational scenarios.

4.2 Rectification via ECEQ Establishment

A pair P = 〈K, E〉 can be proven ECEQ with respect basepoint pair 〈g, B〉 via the Chaum-Pedersen [13] protocol, which can be made non-interactive via Fiat-Shamir [14] heuristics. Prover (P) knowledge of common randomisation k allows Verifier (V) side confirmation of ZK proof 〈e, z〉 as follows:-
Table 8. ECEQ of P with respect 〈g, B〉

P     Generate random r
      Compute P′ = 〈r g, r B〉
      Compute e = h(g, B, P, P′)
      Compute z = r − ek (mod q)
      ↓ 〈e, z〉
V     Compute K′ = eK + zg
      Compute E′ = eE + zB
      Confirm e = h(g, B, P, 〈K′, E′〉)
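A runnable sketch of Table 8 in the same toy discrete-log setting (parameters and helper names are illustrative assumptions; EC point operations map to modular exponentiations, and the Fiat-Shamir challenge is a SHA-256 hash):

```python
import hashlib, secrets

p, q, g = 23, 11, 2                                # toy group (assumption)
b = secrets.randbelow(q - 1) + 1
B = pow(g, b, p)                                   # second basepoint

def H(*vals):                                      # Fiat-Shamir challenge hash
    return int.from_bytes(hashlib.sha256(
        "|".join(str(v) for v in vals).encode()).digest(), "big") % q

def dleq_prove(k):
    """Prove that P = <g^k, B^k> shares the exponent k, per Table 8."""
    P = (pow(g, k, p), pow(B, k, p))
    r = secrets.randbelow(q - 1) + 1               # prover randomisation
    P_prime = (pow(g, r, p), pow(B, r, p))
    e = H(g, B, P, P_prime)
    return P, (e, (r - e * k) % q)

def dleq_verify(P, proof):
    e, z = proof
    K_prime = pow(P[0], e, p) * pow(g, z, p) % p   # recompute K' = eK + zg
    E_prime = pow(P[1], e, p) * pow(B, z, p) % p   # recompute E' = eE + zB
    return H(g, B, P, (K_prime, E_prime)) == e

P, proof = dleq_prove(secrets.randbelow(q - 1) + 1)
assert dleq_verify(P, proof)
```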
The Table 8 protocol requires prover and verifier-side computation overheads of 2M and 4M respectively, in addition to bandwidth |h| + |q|. ECEQ establishment allows rectification of the TNR formulation in Table 7 as follows:-
Table 9. TNR multi-sender sign-encryption with ECEQ
      i ∈ S                               C
1     Generate 〈k_i, K_i, E_i〉; K_i, E_i →
2                                         Compute K, ν and r
                                          Compute ∀ε_i; ⇐ 〈∀i, r〉
3     Compute 〈e_i, z_i〉
      Compute ε_i and s_i; 〈e_i, z_i, s_i〉 →
4                                         Establish ECEQ of 〈K_i, E_i〉
                                          Confirm s_i g + r ε_i A_i = K_i
                                          Compute s
                                          Compute E, µ and c
resulting in a well-formed 〈c, r, s〉, but at significantly higher overheads, particularly combiner-side for large k.

4.3 Homomorphic ECEQ Establishment

ECEQ establishment for multi-sender signcryption is far more straightforward via re-expression of the EC verification condition (V): sg + rA = K (ref Table 1), specifically its RHS(V): K = E + G with G = k d and d = g − B. Individual senders would therefore need to compute and transmit ECEQ pair 〈E, G〉, the latter of which essentially constitutes a homomorphic commitment on the former. This results in the following ZNR extension:
Table 10. ZNR multi-signcryption with verified combination
      i ∈ S                               C
1     Generate 〈k_i, E_i, G_i〉; E_i, G_i →
2                                         Compute E, 〈µ, ν〉 and 〈c, r〉
                                          Compute ∀ε_i; ⇐ 〈∀i, r〉
3     Compute ε_i and s_i; s_i →
4                                         Recover K_i = E_i + G_i
                                          Confirm s_i g + r ε_i A_i = K_i
                                          Compute s
with parameter-share verification in step (4) prior to computation of the workgroup s. The single key-pair computation results in sender-side overheads essentially equal to weak TNR sign-encryption without ECEQ (Table 7), but only half that of the rigorous variant with ECEQ (Table 9). The combiner-side overhead is essentially equal to that of the T/SNR multisignature scheme in Table 5, and also only a third of TNR sign-encryption with ECEQ. Note the differences in the ECEQ establishment mechanisms, with independent use of 〈K, E〉 resulting in the necessity for specification of another pair 〈K′, E′〉. ZNR predication on the single public-key E, on the other hand, allows for a much simpler homomorphic establishment of ECEQ, which leverages the EC verification (in any case required) of the individually submitted s_i. This illustrates the efficacy of the ZNR signcryption approach, which integrates signature and encryption operations.
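The combiner-side logic of Table 10's step (4), homomorphic recovery of K_i followed by the share check, can likewise be sketched in the toy discrete-log setting (all values, including the fixed ε_i and r, are illustrative assumptions):

```python
import secrets

p, q, g = 23, 11, 2                            # toy group (assumption)
b = secrets.randbelow(q - 1) + 1
B = pow(g, b, p)                               # receiver public-key
d = g * pow(B, -1, p) % p                      # d = g - B in additive notation

def combiner_check(E_i, G_i, s_i, r, eps_i, A_i):
    """Recover K_i = E_i + G_i homomorphically, then confirm
    s_i g + r eps_i A_i = K_i, as in step (4) of Table 10."""
    K_i = E_i * G_i % p
    return pow(g, s_i, p) * pow(A_i, r * eps_i % q, p) % p == K_i

# One sender i with key-share a_i; eps_i and r fixed arbitrarily for illustration.
a_i, eps_i, r = secrets.randbelow(q), 3, 5
A_i = pow(g, a_i, p)
k_i = secrets.randbelow(q - 1) + 1
E_i, G_i = pow(B, k_i, p), pow(d, k_i, p)      # sender submits <E_i, G_i>
s_i = (k_i - eps_i * a_i * r) % q
assert combiner_check(E_i, G_i, s_i, r, eps_i, A_i)
```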
5 Multi-receiver Protocol-Extension

5.1 Individual and Workgroup Parameterisations

ZNR multi-receiver extensibility is predicated on receiver-specific (i ∈ R) knowledge of key-share b_i, applied to compute parameter-share E_i = b_i (sg + rA). Sufficient quantities of the latter can be summed to obtain the workgroup-common (R ⊂ U) E = Σ_{i∈R} ε_i E_i. This parameterisation also applies to the TNR decrypt-verify protocol, but is beyond the functional scope of the TNR and SNR multisignature formulations. Following the sender-side analysis, we adopt a presentation with centralised C so as to separate security-critical receiver-specific (predicated on key-share knowledge) and verifiable workgroup-aggregated operations. This is straightforward for ZNR recovery of E, but more complicated for the equivalent TNR operation predicated on both K and E. The most efficient approach is to independently compute
receiver-specific E_i—departing from the single-receiver case in Table 2—and workgroup-common K = sg + rA, as illustrated below:

Table 11. TNR multi-receiver decrypt-verification
      i ∈ R                               C
      Compute E_i; ↓ E_i
                                          Compute ∀ε_i, E and µ
                                          Recover m = D_µ(c)
                                          Compute K = sg + rA and ν
                                          Confirm ν = r + h(m)

with an overhead of 2M per receiver (ref Section 2.1), and an additional 2M at C. This is less efficient than multi-receiver ZNR, as will be subsequently demonstrated. Successful message recovery/verification presumes proper sender-side formation of 〈c, r, s〉, which places a premium on parameter-share verification.

5.2 Multi-receiver Extended Cryptography

ZNR unsigncryption as outlined in Section 2.1 can be extended to incorporate multiple receivers, as follows:

Table 12. ZNR multi-unsigncryption
      i ∈ R                               C
      Compute E_i; ↓ E_i
                                          Compute ∀ε_i and E
                                          Recover 〈µ, ν〉 = F(E)
                                          Recover m = D_µ(c)
                                          Use ν to confirm r
with Schnorr-Zheng/NR differentiation only in the final confirmation, ie h_ν(m) = r and ν = r + h(m) respectively. This formulation can be used in conjunction with the single/multi-sender signcryption protocols of Tables 1(b), 6 and 10, the last of which prevents protocol-sabotage via malformed signcryption-shares. This ZNR extension is also more compute-efficient on the combiner-side—by 2M, due to non-computation of K—compared with the equivalent TNR operation.
6 Comparison with TNR Protocols

The computation and communications overheads of the featured multi-user extensions are as follows:-
Table 13. Comparison of (a) single/multi-sender signature/signcryption protocols, and (b) single/multi-receiver verification/unsigncryption protocols. #, * and + denote receiver-designation, parameter-share verification and receiver-confirmation

(a)
Protocol                     Table   Sender overhead       Combiner overhead
SNR sgn                      1(a)    M, |h|+|q|            n/a
ZNR sgncpt #                 1(b)    M, |h|+|q|            n/a
TNR sgn/enc #                2       2M, |h|+|q|           n/a
T/SNR multisgn *             5       M, |p|+|q|            2kM, (k/2+1)|h|
TNR multisgn/enc #           7       2M, 2|p|+|q|          (k/2+1)|h|
TNR multisgn/enc ECEQ #*     9       4M, 2|p|+2|q|+|h|     6kM, (k/2+1)|h|
ZNR unverif multi-sgncpt #   6       M, |p|+|q|            kA, (k/2+1)|h|
ZNR verif multi-sgncpt #*    10      2M, 2|p|+|q|          2kM, (k/2+1)|h|

(b)
Protocol                     Table   Receiver overhead   Combiner overhead   Receiver-confirmation
T/SNR verif                  1(a)    2M                  n/a                 no
ZNR unsgncpt +               1(b)    2M                  n/a                 yes
TNR dec/verif +              2       3M                  n/a                 yes
TNR multidec/verif +         11      2M, |p|             2M + kA             yes
ZNR multiunsgncpt +          12      2M, |p|             kA                  yes
Note the presentation of two ZNR multi-sender extensions, the more rigorous of which (Table 10) facilitates parameter-share verification in addition to receiver-designation. This is achieved efficiently via homomorphic ECEQ, resulting in overheads only marginally greater than T/SNR multisignature formation (Table 5). Rigorous multi-sender TNR sign-encryption (Table 9) requires significantly higher (doubled/tripled) overheads due to the necessity to establish ECEQ of the 〈K, E〉 public-keys with respect a challenge (r-dependent) pair 〈K′, E′〉. Both ZNR and TNR
multi-sender extensions can be operated in unverified modes, ie Tables 6 and 7 respectively, with dispensation of the combiner-side overhead for the latter. ZNR multi-signcryption is also significantly more efficient sender-side when operated in fast mode. The multi-receiver ZNR (Table 12) and TNR (Table 11) formulations differ through their respective use of the single E and the double 〈K, E〉, the former of which is more efficient. Both protocol-extensions are vulnerable to sender-side sabotage resulting in malformed secure-messages, which emphasises the importance of parameter-share verification. Multi-receiver ZNR, in conjunction with the verifying multi-sender and single-sender ZNR variants, can therefore be characterised as rigorous and efficient multipoint-to-multipoint secure-messaging.
7 Concluding Remarks

The outlined multi-user S/ZNR protocols are functionally comprehensive, compute/bandwidth-efficient and transparently interoperable with respect their single-user base-formulations. This allows for straightforward implementation of both within typical workgroup environments; with verified combination by designated users or centralised servers, and externally-visible S/ZNR parameters structurally identical to their single-user base-formulations. Combiners can therefore be regarded as workgroup gateways, the efficiency of which is enhanced by the near-similarity of the S/ZNR formulations. Note the receiver-side operation can be concluded after a single cryptographic computation, and is therefore inherently efficient independent of k. Sender-side collaboration can also be simplified to a single pass for the (k = 2) case, with only initiating (i ∈ S) and responding (j ∈ S) users. This versatility and efficiency stem from the featured multi-user extension methodology on single key-pair base-protocols, which in the case of ZNR departs from the usual prescription (adopted for TNR sign-encryption) of distinct pairs for message-authentication and encryption. The proposed formulation integrates authentication and encryption functionalities, and enables efficient detection of sabotaged parameter-shares in multi-sender ZNR. This capacity for sabotage-detection is also present in the T/SNR multisignature protocol, which is a single-pair authentication-only formulation. Sabotage-detection can also be incorporated into the double-pair multi-sender TNR sign-encryption, but only at the cost of significantly higher overheads compared to multi-sender ZNR signcryption. It is interesting to speculate whether other double-pair secure-messaging formulations can be efficiently extended to incorporate this attribute. Parameter-share verification is a significant functional advantage, the lack of which jeopardises multi-receiver message-recovery/verification. This can be seen from transaction scenarios featuring long-term escrow of inadvertently malformed secure-messages—so that existence of the original message, sender-side key-shares or even the sending-workgroup cannot be presumed—resulting in permanent information loss. Efficiency with respect to sabotage-detection is also important, especially in consideration of the two/three-fold differences in the ZNR and TNR overheads. The presented ZNR extension can therefore be safely characterised as rigorous yet efficient multipoint-to-multipoint secure-messaging.
References

1. A Shamir (1979). How to Share a Secret. Communications of the ACM, vol 22, no 11: pp 612–613
2. P Feldman (1987). A Practical Scheme for Non-Interactive Verifiable Secret-Sharing. 28th IEEE Symp on the Foundations of Comp Sc: pp 427–437
3. TP Pedersen (1991). Distributed Provers with Applications to Undeniable Signatures. Eurocrypt-91, Springer-Verlag Lecture Notes in Computer Science (LNCS) 547: pp 221–238
4. TP Pedersen (1991). A Threshold Cryptosystem without a Trusted Party. Eurocrypt-91, Springer-Verlag LNCS 547: pp 522–526
5. AJ Menezes (1993). Elliptic Curve Public-Key Cryptosystems. Kluwer Acad Press
6. IF Blake, G Seroussi & NP Smart (1999). Elliptic Curves in Cryptography. Cambridge Univ Press
7. C Park & K Kurosawa (1996). New ElGamal-Type Threshold Digital Signature Scheme. Inst Electrical, Info & Comms Engineers (IEICE) Trans, Vol E79-A, no 1: pp 86–93
8. K Takaragi, K Miyazaki & M Takahashi (1998). A Threshold Digital Signature Issuing Scheme without Secret Communication. Presentation to the IEEE P1363 Study Group for Public-Key Crypto Stds
9. T ElGamal (1985). A Public-Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms. IEEE Trans Info Theory
10. K Nyberg & R Rueppel (1993). A New Signature Scheme Based on DSA Giving Message Recovery. 1st ACM Conf on Comp & Comms Security, ACM Press: pp 58–61
11. CP Schnorr (1989). Efficient Identification and Signatures for Smartcards. Crypto-89, Springer-Verlag LNCS 435: pp 239–252
12. Y Zheng (1997). Digital Signcryption or How to Achieve Cost(Signature & Encryption) << Cost(Signature) + Cost(Encryption). Crypto-97, Springer-Verlag LNCS 1294: pp 291–312
13. DL Chaum & TP Pedersen (1992). Wallet Databases with Observers. Crypto-92, Springer-Verlag LNCS 740: pp 89–105
14. A Fiat & A Shamir (1986). How to Prove Yourself: Practical Solutions to Identification and Signature Problems. Crypto-86, Springer-Verlag LNCS 263: pp 186–194
Securing the Border Gateway Protocol: A Status Update

Stephen T. Kent

BBN Technologies, 10 Moulton Street, Cambridge, MA, U.S. 02138
[email protected]
Abstract. The Border Gateway Protocol (BGP) is a critical component of the Internet routing infrastructure, used to distribute routing information between autonomous systems (ASes). It is highly vulnerable to a variety of malicious attacks and benign operator errors. Under DARPA sponsorship, BBN has developed a secure version of BGP (S-BGP) that addresses most of BGP’s architectural security problems. This paper reviews BGP vulnerabilities and their implications, derives security requirements based on the semantics of the protocol, and describes the S-BGP architecture. Refinements to the original S-BGP design, based on interactions with ISP operations personnel and further experience with a prototype implementation are presented, including a heuristic for significantly improving performance. The paper concludes with a comparison of S-BGP to other proposed approaches.
1 Problem Description
Routing in the public Internet is based on a distributed system composed of many routers, grouped into management domains called Autonomous Systems (ASes). Routing information is exchanged between ASes using the Border Gateway Protocol (BGP) [1], via UPDATE messages. BGP is highly vulnerable to a variety of attacks [2], due to the lack of a secure means of verifying the authorization of BGP control traffic. In April 1997, we began work on the security architecture described in this paper. We begin by reviewing the problem—a model for correct operation of BGP, BGP vulnerabilities and a threat model, and the goals, constraints and assumptions that underlie S-BGP. The reader is assumed to be familiar with the fundamentals of BGP. BGP is used in two different contexts. External use of BGP (eBGP) propagates routes between ISPs, or between ISPs and subscriber networks that are connected to more than one ISP, i.e., multi-homed subscribers. BGP also is used internally, within an AS, to propagate routes acquired from other ASes. This use is referred to as internal BGP (iBGP). eBGP is the primary focus of this work, because failures of eBGP adversely affect subscribers outside the administrative boundary of the source of the failure. Nonetheless, some ISPs have expressed interest in using S-BGP to protect the distribution of routes within an ISP. If route servers are employed for iBGP (see
section 5.5), or if the number of iBGP peers is small, S-BGP may be a viable approach for iBGP security. We use "BGP" to refer to eBGP, unless otherwise noted.

A route is defined as an address prefix and a set of path attributes, one of which is an AS path. The AS path specifies the sequence of ASes that subscriber traffic will traverse if forwarded via this route. When propagating an UPDATE to a neighboring AS, the BGP router prepends its AS number to the sequence, and may update certain other path attributes. Each BGP router maintains a full routing table, and sends its best route for each prefix to each neighbor. In BGP, "best" is very locally defined. The BGP route selection algorithm has few criteria that are universal, which limits the extent to which any security mechanism can detect and reject "bad" routes emitted by a neighbor. Each ISP makes use of local policies that it need not disclose, and this gives BGP route selection a "black box" flavor, which has significant adverse implications for security.

1.1 Correct Operation of BGP

As we noted in [2], security for BGP is defined as the correct operation of BGP routers. This definition is based on the observation that any successful attack against BGP will result in other than correct operation, presumably yielding degraded routing. Correct operation of BGP depends upon the integrity, authenticity, and timeliness of the routing information it distributes, as well as each BGP router's processing, storing, and distribution of this information in accordance with both the BGP specification and with local routing policies. Many statements could be made in an effort to characterize correct operation, but they rest on two simple assumptions:

• Communication between peer BGP routers is authenticity- and integrity-secure
• BGP routers execute the route selection algorithm correctly and communicate the results

The first assumption is easily realized through the use of a suitable point-to-point security protocol, e.g., IPsec. The second assumption is divisible into two cases: processing received UPDATEs, and generation and transmission of UPDATEs. From the perspective of an AS trying to protect itself against external attacks, correct operation of its own BGP routers is an implementation, not an architectural, security issue. However, an AS ought not rely on other ASes to operate properly, since such reliance leads to cascade failures. It is desirable for a BGP router to be able to verify that each UPDATE it receives from a peer is valid and timely. Validity of an UPDATE encompasses four primary criteria:

1. The router that sent the UPDATE was authorized to act on behalf of the AS it claims to represent (by virtue of placing that AS number in the AS path).
2. The AS from which the UPDATE emanates was authorized by the preceding AS in the AS path to advertise the prefixes contained within the UPDATE.
3. The first AS in the AS path was authorized, by the "owner" of the set of prefixes (NLRI), to advertise those prefixes.
4. If the UPDATE withdraws one or more routes, then the sender must have advertised the route(s) prior to withdrawing it (them).
There are limitations to the ability of any security mechanism to detect attacks. The local policy feature of BGP allows considerable latitude in UPDATE processing, so S-BGP cannot detect erroneous behavior that could be attributed to local policies not visible outside an AS. To address such attacks, the semantics of BGP itself would have to change. Moreover, because UPDATEs do not carry sequence numbers, a BGP router can generate an UPDATE based on authentic, but old, information, e.g., withdrawing or reasserting a route based on outdated information. Thus the temporal accuracy of UPDATEs, in the face of Byzantine failures, is enforced only very coarsely by these countermeasures.

1.2 Threat Model and BGP Vulnerabilities

BGP has many vulnerabilities that can be exploited to cause improper routing or non-delivery of subscriber traffic, network congestion, and traffic delays. Misrouting attacks facilitate passive and active wiretapping of subscriber traffic, and thus an attack against BGP may be part of a larger attack against subscriber computers. Routers are vulnerable in both the architectural and implementation domains. Implementation vulnerabilities may allow an attacker to assume control of a router, to cause it to operate maliciously, or to cause the router to crash, and thus deny service. Architectural vulnerabilities permit various forms of attack, independent of implementation details, and thus are potentially more damaging, as they persist across all implementations. To make Internet routing robust, both forms of vulnerability must be addressed. S-BGP does not directly address implementation vulnerabilities, but it does limit the impact of such vulnerabilities. Use of S-BGP by an AS protects that AS against many attacks that result from security failures suffered by other ASes.

BGP can be attacked in many ways. Communication between BGP peers can be subjected to active and passive wiretapping. A router's BGP software, configuration information, or routing databases may be modified or replaced illicitly via unauthorized access to a router, or to a server from which router software is downloaded, or via an attack on the distribution channel. Most of these attacks transform routers into hostile insiders. Effective security measures must address such Byzantine failures. Many countermeasures could be employed in an attempt to address these vulnerabilities. Better physical and procedural security for network management facilities and routers would help. Cryptographic protection of BGP traffic between routers and of network management traffic would also reduce some of these vulnerabilities. However, improved physical and procedural security is expensive and imperfect, and these countermeasures would not protect the Internet against accidental or malicious misconfiguration by operators, nor against attacks that mimic such errors. Misconfiguration of this sort has proved to be a source of several significant Internet outages in the past and seems likely to persist. Any security approach that relies on ISPs to act properly, that relies on "trust" among ISPs, violates the "principle of least privilege" and leaves the Internet routing system vulnerable at its weakest link. In contrast, the security approach described here satisfies this principle, so that any attack on any component of the routing system is limited in its impact on the Internet as a whole.
1.3 Goals, Constraints, and Assumptions

In order to create countermeasures that are both effective and practical, the S-BGP architecture is based on the following goals, constraints, and assumptions. Any proposed security architecture must exhibit dynamics consistent with the existing system, e.g., responding automatically to topology changes, including the addition of new networks, routers and ASes. Solutions also must scale in a manner consistent with the growth of the Internet. The countermeasures must be consistent with the BGP protocol standards and with the likely evolution of these standards. This includes packet size limits and features such as path aggregation, communities, and multi-protocol support (e.g., MPLS). The S-BGP architecture must be incrementally deployable; there cannot be a "flag day" when all BGP routers suddenly begin executing S-BGP. It is desirable not to create new organizational entities that must be accepted as authorities by ISPs and subscribers in order to make routing secure.
2 S-BGP Architecture

S-BGP consists of four major elements:

• a Public Key Infrastructure (PKI) that represents the ownership and delegation of address prefixes and AS numbers
• "address attestations" that the owner of a prefix uses to authorize an AS to originate routes to the prefix
• "route attestations" that an AS creates to authorize a neighbor to advertise prefixes
• IPsec for point-to-point security of BGP traffic transmitted between routers

These elements are used by an S-BGP router to secure communication with neighbors, and to generate and validate UPDATE messages relative to the authorization model represented by the PKI and address attestations. Together, the combination of these security mechanisms provides a "firebreak" that prevents a compromised AS from propagating erroneous routing data to other (secured) ASes.

2.1 S-BGP Public Key Infrastructure (PKI)

S-BGP uses a PKI based on X.509 (v3) certificates to enable BGP routers to validate the authorization of BGP routers to represent ASes and prefixes. This PKI was described in [24] and the reader is referred to that paper for additional details. The S-BGP PKI parallels the existing IP address and AS number assignment/delegation system and takes advantage of this infrastructure. Because the PKI mirrors existing infrastructure, it avoids many of the "trust" issues that often complicate the creation of a PKI. This PKI is unusual in that it focuses on authorization, not authentication; the names used in most of the certificates in this PKI are not meaningful outside of S-BGP.
S-BGP calls for a certificate to be issued to each organization that is granted "right to use" of a portion of the IP address space. This certificate is issued through the same chain of entities that today is responsible for address allocation, starting with the IANA. If an ISP or subscriber owns multiple prefixes, we issue a single certificate containing a list of prefixes, to minimize the number of certificates needed to validate an UPDATE. This PKI represents the assignment of prefixes by binding prefixes to a public key belonging to the organization to which the prefixes have been assigned. These certificates are used to prove "ownership" of one or more prefixes.² Each certificate in this PKI contains a private extension that specifies the set of prefixes that has been allocated to the organization. We use the Domain Component (RFC 2247) construct in the subject name in each certificate to represent a DNS-style name for an organization. Certificates issued under this PKI also represent the binding between an organization and the AS numbers allocated to it. The PKI allows an organization to certify that a router represents the organization's AS(es). Here too, the PKI parallels existing "trust relationships," i.e., the IANA assigns AS numbers to RIRs, which in turn assign AS numbers to ISPs or subscribers that run BGP, and they certify their routers.

2.2 Attestations

An attestation is a digitally signed datum asserting that its target (an AS) is authorized by the signer (an organization) to advertise a path to the specified prefix(es). There are two types of attestations, address and route. They share a single format.

• Address attestation (AA)—the signer of an AA is the organization that "owns" the prefix(es) in the AA, and the target is a set of ASes that the organization authorizes to originate a route to the prefix(es), i.e., the ISPs with which the issuer has a traffic carriage arrangement. AAs are relatively static data items, since relationships between address space owners and ISPs change relatively slowly.
• Route attestation (RA)—the signer of an RA is an S-BGP router in an ISP. The target is a set of ASes, representing the neighbors to which UPDATEs containing the RA will be sent. Note that the router signing an RA might sign a separate RA for each neighbor, or it may sign a single RA directed to all of its neighbors. The latter option permits a router to reduce its digital signature burden, so long as the same parameters appear in the UPDATEs sent to each neighbor. RAs, unlike AAs, are very dynamic, possibly changing for each transmitted UPDATE.

² One could use X.509 attribute certificates to represent this authorization, but they offer little benefit in this context and would increase the certificate processing burden.
2.3 UPDATE Validation

Attestations and certificates are used by BGP routers to validate routes asserted in UPDATE messages, i.e., to verify that the first AS in the route has been authorized to advertise the prefixes by the prefix owner(s), and that each subsequent AS has been authorized to advertise the route for the prefixes by the preceding AS in the route. To validate a route received from ASn, ASn+1 requires:

• an AA for each organization owning a prefix represented in the NLRI portion of the UPDATE
• a valid public key for each organization owning a prefix in the NLRI
• an RA corresponding to each AS along the path (ASn to AS1), where the RA generated and signed by a router in ASn encompasses the NLRI and the path from ASn+1 through AS1
• a certified public key for each S-BGP router that signed an RA along the path (ASn to AS1), to check the signatures on the corresponding RAs

An S-BGP router verifies that the advertised prefixes and the origin AS are consistent with the AA information. The router verifies the signature on each RA and verifies the correspondence between the signer of the RA and the authorization to represent the AS in question. There also must be a correspondence between each AS in the path and an appropriate RA. If all of these checks pass, the UPDATE is valid. Address attestations are not used to check withdrawn routes in an UPDATE. Use of IPsec to secure communication between each pair of S-BGP routers, plus the use of a separate adjacency routing information base (Adj-RIB-In) for each neighbor, ensures that only the advertiser of a route can withdraw it.
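The validation procedure can be summarised in a short sketch. The following Python code is a simplified illustration only: real S-BGP uses X.509 certificates and public-key signatures on RAs, whereas here each AS's RA "signature" is an HMAC under a per-AS key known to the validator, and all identifiers and data structures are hypothetical:

```python
import hashlib, hmac

AS_KEYS = {65001: b"key-AS65001", 65002: b"key-AS65002"}   # hypothetical ASes
AAS = {"192.0.2.0/24": {65001}}        # prefix owner authorizes origin AS(es)

def sign_ra(as_num, path_so_far, target_as, nlri):
    """RA by as_num covering the path so far plus the intended next-hop AS."""
    data = repr((path_so_far, target_as, nlri)).encode()
    return hmac.new(AS_KEYS[as_num], data, hashlib.sha256).digest()

def validate_update(my_as, nlri, as_path, ras):
    """as_path lists ASes most-recent first, [ASn, ..., AS1] (origin last)."""
    if as_path[-1] not in AAS.get(nlri, set()):   # origin authorized by owner?
        return False
    path = list(reversed(as_path))                # AS1 ... ASn
    for idx, as_num in enumerate(path):           # each AS authorized its successor
        target = path[idx + 1] if idx + 1 < len(path) else my_as
        expected = sign_ra(as_num, path[:idx + 1], target, nlri)
        if not hmac.compare_digest(expected, ras.get(as_num, b"")):
            return False
    return True

nlri = "192.0.2.0/24"
ras = {65001: sign_ra(65001, [65001], 65002, nlri),        # origin AS's RA
       65002: sign_ra(65002, [65001, 65002], 65003, nlri)} # transit AS's RA
assert validate_update(65003, nlri, [65002, 65001], ras)
assert not validate_update(65003, nlri, [65002, 65002], ras)  # bad origin fails
```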
3 Only the public key, subject name, and selected extensions need be retained.
need for each router to perform this processing, saving both bandwidth and storage space. S-BGP uses repositories for distribution of this data. We initially described a model in which a few replicated, loosely synchronized repositories were operated by the RIRs. Discussions with ISPs suggest a model in which major ISPs and Internet exchanges operate repositories, and smaller ISPs and subscribers make use of these repositories. In either model, ISPs periodically (e.g., daily) upload and download new/changed certificates, CRLs, and AAs. The repositories periodically transfer new data to one another to maintain loose synchronization. ISPs process the repository information to create "extracted files" and transfer them to their routers. Since certificates, AAs, and CRLs are signed and carry validity interval information, they require minimal additional security. Nonetheless, S-BGP employs SSL, with both client and server certificates, to protect access to the repositories, as a countermeasure to denial of service attacks. The simple, hierarchic structure of the PKI allows repositories to automatically effect access control checks on the uploaded data.
2.5 Distribution of Route Attestations
Route attestations (RAs) are distributed with BGP UPDATEs in a newly defined, optional, transitive path attribute. Because RAs may change quickly, it is important that they accompany the UPDATEs that are validated using the RAs. When an S-BGP router opens a BGP session with a peer, transmitting a portion of its routing information database via UPDATEs, relevant RAs are sent with each UPDATE, and with subsequent UPDATEs sent in response to route changes. These attestations employ a compact encoding scheme to help ensure that they fit within the BGP packet size limits, even when route or address aggregation is employed. (S-BGP accommodates aggregation by explicitly including signed attribute data that otherwise would be lost when aggregation occurs.) An S-BGP router receiving an UPDATE from a peer caches the RAs with the route in the Adj-RIB for the peer, and in the Loc-RIB (if the route is selected). As noted below in Section 4, the bandwidth required to support in-band distribution of route attestations is negligible (compared to user traffic). Although the RA mechanism was designed to protect AS path data, it can also accommodate other new path attributes, e.g., communities [13] and confederations [14]. Specifically, there is a provision to indicate what data, in addition to the AS path, is covered by the digital signature that is part of the RA.
2.6 IPsec and Router Authentication
S-BGP uses IPsec [8,9,10], specifically the Encapsulating Security Payload (ESP) protocol, to provide authentication, data integrity, and anti-replay for all BGP traffic between neighboring routers. The Internet Key Exchange protocol (IKE) [11,12] is used for key management services in support of ESP. The PKI established for S-BGP includes certificates for IKE, separate from those used for RA processing.
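Before turning to how these mechanisms counter BGP's vulnerabilities, the validation steps of Section 2.3 can be summarized in a compact sketch. The helper functions (aa_lookup, verify_sig) are hypothetical stand-ins for lookups against the extracted certificate and AA files described above; they are not part of S-BGP itself.

def validate_update(nlri, as_path, ras, receiver_as, aa_lookup, verify_sig):
    """Sketch of S-BGP UPDATE validation (Section 2.3).

    as_path is [AS1, ..., ASn] with AS1 the origin; ras maps an AS
    number to the RouteAttestation signed by that AS's router.
    """
    origin = as_path[0]
    # Every prefix owner must have authorized the origin AS via an AA.
    for prefix in nlri:
        aa = aa_lookup(prefix)
        if aa is None or origin not in aa.authorized_origin_ases:
            return False
    # Each AS along the path must have signed an RA covering the NLRI
    # and naming the next AS (finally, the receiver) as its target.
    for i, asn in enumerate(as_path):
        ra = ras.get(asn)
        if ra is None or not verify_sig(ra):
            return False
        if not set(nlri) <= set(ra.nlri):
            return False
        next_as = as_path[i + 1] if i + 1 < len(as_path) else receiver_as
        if next_as not in ra.target_ases:
            return False
    return True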
3 How S-BGP Addresses BGP Vulnerabilities
Together, the S-BGP PKI and AAs support validation of router assertions about:
• the ASes the router is authorized to represent
• the prefixes an AS is authorized to originate
• the prefixes an AS has been authorized to advertise by other ASes
The UPDATE validation procedure described earlier ensures that every AS along the path has been authorized by the preceding AS to advertise the prefixes in the UPDATE, and that the origin AS was authorized by the prefix owner. AAs allow a router to detect any attempt by an AS to advertise itself as an origin for a prefix unless the prefix owner has authorized the AS to do so. The use of RAs in UPDATEs allows an S-BGP router to detect any tampering with a path by any intermediate router. This includes attempts to add to the set of prefixes in the NLRI, to add or remove AS numbers from the AS path, or to synthesize a bogus UPDATE. The use of IPsec protects all S-BGP traffic between routers against active wiretap attacks. This is necessary to prevent a wiretapper from sending UPDATEs that only withdraw routes (and thus would not contain any RAs to be validated) and to prevent such an attacker from replaying valid UPDATEs. IPsec also protects the router against various TCP-based attacks, including SYN flooding and spoofed RSTs (resets). Despite the extensive security offered by S-BGP, there exist architectural vulnerabilities that are not eliminated by its use. For example, an S-BGP router may reassert a route that was withdrawn earlier, even if the route has not been re-advertised. The router also may suppress UPDATEs, including ones that withdraw routes. These vulnerabilities exist because BGP UPDATEs do not carry sequence numbers or timestamps that could be used to determine their timeliness. However, RAs do carry an expiration date and time, so there is a limit on how long an attestation can be misused this way. S-BGP restricts malicious behavior to the set of actions for which a router or AS is authorized, based on externally verifiable, authoritative constraints.
4 Performance and Operational Issues
In developing the S-BGP architecture, we paid close attention to the performance and operational impact of the proposed countermeasures, and reported our analysis in earlier papers. In preparing this paper, we updated our data, utilizing a variety of sources, e.g., the Route Views project. Although much data about BGP and associated infrastructure is available, other data is difficult to acquire in a fashion that is representative of a "typical" BGP router. This is because each AS in the Internet embodies a slightly different view of connectivity, as a result of local policy filters applied by other ASes.
4.1 Some BGP and S-BGP Parameters
The backbone routers of the major ISPs have a route to every reachable IP address. As of 2003, the routing information databases (Loc-RIBs) in these routers contain about 125,000 IPv4 address prefixes. Each route contains an average of about 3.7 ASes, and typically there would be one route attestation per AS, which provides a basis for calculating how much space is devoted to RAs in UPDATE messages and in RIBs. Over a 24-hour period, a typical BGP router receives an average of about one UPDATE per minute per peer. Thus a router at an Internet exchange with 30 peers receives about 0.5 UPDATEs per second, on average. This rate is affected somewhat by Internet growth, but it is primarily a function of link, component, or congestion failures and recoveries. We originally estimated the peak, per-minute rate for UPDATEs at about 10 times the average. However, more recent data suggests that, in times of extreme stress, the peak UPDATE rate might be as much as 200 times the average. Analysis shows that about 50% of all UPDATEs are sent as a result of route "flaps," i.e., transient communication failures that, when remedied, result in a return to the former route. This sort of routing behavior has long been characteristic of the Internet4 [3]. The X.509 certificates used in S-BGP are about 600 bytes long. The certificate database will grow each year as more prefixes, ASes, and S-BGP routers are added. We estimate the current database size at about 75-85 Mbytes. The CRL database associated with these certificates adds to this total, but since most of these certificates are issued to organizations and devices (vs. people), the expected revocation rate should be relatively low and CRLs ought not to grow large.
4.2 S-BGP Processing
The computation burden for signature generation and validation in S-BGP has attracted considerable attention, as well it should. After all, routers today do not process digital signatures, and this new burden must be considered carefully. Under normal conditions, UPDATE processing represents a minimal burden for most BGP routers5. However, when routes are changing rapidly, the BGP processing load can rise dramatically, and when a BGP router reboots, it receives complete routing tables (via UPDATEs) from each of its neighbors. The time required by BGP to process all of these UPDATEs represents a significant processing burden. Better algorithms and heuristics are needed to allow routers to better cope with UPDATE surges. Such algorithms should be developed irrespective of the use of S-BGP, but S-BGP would allow these algorithms to operate with confidence about the source and integrity of UPDATEs.
4 In a discussion with David Mills, an architect of the NSFNET, he confirmed that route flapping has been a characteristic of the Internet since the mid-80s.
5 Most subscriber traffic traverses a router via a "fast path" which often uses hardware for path selection. Management traffic, such as BGP, is directed to a general-purpose processor and associated memory, which processes the traffic and executes routing algorithms.
In previous analysis, we assumed that each received UPDATE would contain about 3.6 RAs (now updated to 3.7), and would result in transmission of an UPDATE with one new signature. This was an oversimplification; a router generates and transmits an UPDATE only if the newly received route is "better" than the current best route, or if that route is withdrawn by the UPDATE. When a router has many peers, most of the UPDATEs it receives will not trigger a change in its view of the best route. On the other hand, when a router does select a new route, an UPDATE may be constructed and sent to each neighbor, requiring one signature per neighbor. This is because an RA specifies the AS number of the neighbor to which it is directed. It is possible to construct an RA that identifies the next hop as a set of AS numbers, corresponding to all the neighbors to which an UPDATE is authorized to be sent. The downside of this strategy is that it makes the RAs, and thus UPDATEs, larger. This observation suggests a heuristic for UPDATE processing to mitigate signature validation costs. A router can defer validation of the RAs in any UPDATE that it receives, if the UPDATE would not represent a new best route. This optimization could be especially helpful for routers that receive the greatest number of UPDATEs, i.e., routers with many neighbors. One might worry that this strategy allows an attacker to force processing, by sending what would be considered "very good" routes, but an S-BGP router will detect such fraudulent UPDATEs and could choose to drop its connection to a peer that behaved this way. Our initial analysis yielded a peak signature verification rate of about 9 per second, for a router with 30 peers, taking advantage of a depth-1 cache. Given the more thoughtful analysis above, and the more realistic surge UPDATE rates, it is no longer clear what constitutes a good estimate for typical and surge signature validation/generation rates. One could argue for use of a crypto processor to accommodate worst-case (200-fold surge) UPDATE rates at a router with many peers. One also could argue that deferring validation unless a received UPDATE would trigger transmission of an UPDATE would reduce the crypto burden to a level that is well within the capabilities of modern, general-purpose CPUs. We have not constructed a new analytic model or simulation to evaluate the heuristic. Initialization/reboot of a BGP router also results in a surge in UPDATE processing, and the deferred processing heuristic is applicable here too, even though reboots are relatively infrequent. Saving RIBs in non-volatile storage also addresses this problem.
4.3 Transmission Bandwidth
Transmission of RAs in UPDATEs increases the average size of these messages to about 600 bytes. This is a significant percentage increase (over 800%), but UPDATEs represent a very, very small amount of data vs. subscriber traffic. Downloading the certificate, CRL, and AA databases contributes an insignificant increment to this overhead. A full database download from a repository to an ISP might entail a 75-85 Mbyte file transfer by each ISP. Even if performed more than once a day, these transfers would be swamped by subscriber traffic. Thus the impact on utilization of Internet bandwidth due to transmission of all of the countermeasures data is minimal.
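Returning to processing costs, the deferred-validation heuristic described in Section 4.2 is easy to state in code. The sketch below is illustrative pseudocode over hypothetical helpers (is_better, validate, sign_ra, send); no particular S-BGP implementation is implied.

def process_update(update, peer, adj_rib_in, loc_rib, neighbors,
                   is_better, validate, sign_ra, send):
    # Cache the route, together with its RAs, in the per-peer Adj-RIB-In.
    adj_rib_in[peer] = update
    key = tuple(update["nlri"])
    best = loc_rib.get(key)
    # Defer all signature checks if this route cannot become the best route.
    if best is not None and not is_better(update, best):
        return
    # Full AA/RA validation, as in Section 2.3, only for candidate routes.
    if not validate(update):
        del adj_rib_in[peer]   # fraudulent "very good" route; optionally
        return                 # drop the session with a misbehaving peer
    loc_rib[key] = update
    for n in neighbors:
        if n != peer:
            # One new signature per neighbor, since each RA names its target.
            send(n, update, sign_ra(update, target=n))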
4.4 RIB Size
UPDATEs received from neighbors are held in Adj-RIBs and in the Loc-RIB. The space required for RAs is estimated at about 30-35 Mbytes per peer today. This is a modest amount of memory for a typical router with a few peers, but a significant amount of storage for routers at Internet exchanges, where a router may have tens of peers. Thus the management CPU in a router might need up to a gigabyte of RAM under some conditions, a modest amount by current workstation standards. Unfortunately, most currently deployed BGP routers cannot be configured with more than 128 or 256 Mbytes of RAM; additional RAM would be needed in these routers to support full deployment of S-BGP. Over time it is reasonable to assume that routers could be configured with enough RAM, but this analysis shows that full deployment is not feasible with the currently deployed router base. To add RAM, and possibly to add non-volatile storage, router vendors will have to upgrade the processor boards where network management processing takes place. That suggests that addition of a crypto accelerator chip would be prudent as part of the board redesign process.
4.5 Deployment and Transition Issues
Adoption of S-BGP requires cooperation among several groups. ISPs and subscribers running BGP must cooperate to generate and distribute AAs. Major ISPs must implement the S-BGP security mechanisms in order to offer significant benefit to the Internet community. The IANA and RIRs must expand operational procedures to support generation of prefix and AS number allocation certificates. Router vendors need to offer additional storage in next-generation products, or offer ancillary devices for use with existing router products, and revise BGP software to support S-BGP. There is some good news: S-BGP can be deployed incrementally, subject to the constraint that only neighboring ASes will benefit directly from such deployment. Although we chose a transitive path attribute syntax to carry RAs, and thus it might be possible for non-neighbor ASes to exchange RAs, it seems likely that intervening ASes would not have sufficient storage for the RAs in their RIBs. Also, the controls needed in routers to take advantage of non-contiguous deployment of S-BGP are quite complex, hence our comment that only contiguous deployment is a viable strategy. External routes received from S-BGP peers need to be redistributed within the AS, both to interior routers and to other border routers, in order to maintain a consistent and stable view of the exterior routes across the AS. Thus an AS must switch to using S-BGP for all its border routers, to avoid route loops within the AS.
5 Related Work
Any discussion of routing security must include a reference to the first significant treatment of the topic, Radia Perlman's thesis [21]. Several papers on routing security have been published over the last decade, but most deal with "toy" protocols, not with BGP specifically. A number of these papers
made suggestions that the techniques they developed would be applicable to BGP, but the assertions proved to be incorrect. Fast signatures based on hash chains have been proposed for this purpose, most recently in [20], but these proposals also have failed to make a solid case for their applicability to BGP. Some ISPs do make use of a keyed MD5 integrity check with TCP for BGP transport [16]. This mechanism is less desirable than the use of IPsec in S-BGP, due to its lack of automated key management and its operation above the IP layer. It has been suggested [17] that one could use the DNS and DNSSEC [18] to distribute the information contained in AAs. This mechanism does not address route authorization, nor does the proposal describe in detail how this data would be distributed to BGP routers, and thus it is at best a part of a solution. Several papers have proposed using Internet Routing Registries [19] or servers operated by ISPs [22] as a basis for distributing data for use in detecting unauthorized route advertisements. The proposals do not address how the accuracy of the information placed in these registries would be verified. The latter proposal suggests that servers operated by ISPs would communicate to verify routes, when routers detect suspicious UPDATEs, but this merely creates another path for propagating erroneous data. Any approach that relies on repositories to propagate routing (vs. origin AS authorization) data will be less dynamic than routing changes, creating problems when route authorizations change quickly, a not uncommon occurrence in response to major outages. Finally, the Internet routing registries (as opposed to RIRs) are "artificial" entities from an authorization perspective, which creates additional concerns. The most recent proposal in the BGP security arena is soBGP, described in a set of individual Internet Drafts submitted by a team of engineers from Cisco. The name suggests that soBGP focuses on securing origin AS data, but the proposal has evolved to encompass security for AS paths (routes). At this stage, soBGP is not a security architecture for BGP. It is a "Chinese menu" set of components that cannot be analyzed as a system, because it allows a variety of options for various aspects of the protocol, and mandates no choices among these options. Absent such choices, interoperability cannot be assured among ASes, nor can the impact of the system be evaluated. For example, soBGP allows distribution of signed route data via repositories, or in-band (via new BGP protocol extensions). It allows the computation of authorized routes by routers, or by a NOC that distributes the results to the routers in its AS at some unspecified interval. One cannot meaningfully compare soBGP to S-BGP at this time, because the former does not yet reflect choices that permit such comparisons.
6 Status
As of early 2003, an implementation of S-BGP has been developed and demonstrated on small numbers of workstations representing small numbers of ASes. We also developed software for a simple repository, and for NOC tools that support secure upload and download of certificates, CRLs, and AAs to and from repositories, and for certificate management for NOC personnel and routers. This suite of software, plus
CA software from another DARPA program, provides all of the elements needed to represent a full S-BGP system. All of this software is available in open source form.
7 Summary
S-BGP represents a comprehensive approach to addressing a wide range of security concerns associated with BGP. It is currently the only complete proposal for addressing BGP security problems. It detects and rejects unauthorized UPDATE messages, irrespective of the means by which they arise, e.g., misconfiguration, active wiretapping, compromise of routers or management systems, etc. S-BGP addresses the timeliness of UPDATE messages only in a limited fashion. S-BGP also does not address an existing, significant problem for BGP routers, i.e., rapid demuxing of management traffic to avoid processor overload. The former problem is a side effect of the lack of such capabilities in BGP itself; the latter is a problem not unique to BGP. The S-BGP design is based on a top-down security analysis, starting with the semantics of BGP and factoring in the wide range of attacks that have been or could be launched against the existing infrastructure.
Acknowledgements. Many individuals contributed to the design and development of S-BGP. Initial funding was provided by NSA, in April of 1997, yielding a first-cut design. DARPA provided continued funding, under Dr. Hilarie Orman and Dr. Douglas Maughan, that enabled us to refine, implement and test the design, and to create the current prototype. The author would also like to thank Christine Jones, Charlie Lynn, Joanne Mikkelson, and Karen Seo for their efforts on this project.
References
1. Y. Rekhter, T. Li, "A Border Gateway Protocol 4 (BGP-4)," RFC 1771, March 1995.
2. S. Kent, C. Lynn, K. Seo, "Secure Border Gateway Protocol (S-BGP)," IEEE Journal on Selected Areas in Communications, vol. 18, no. 4, April 2000.
3. C. Villamizar, R. Chandra, R. Govindan, "BGP Route Flap Damping," RFC 2439, November 1998.
4. B.R. Smith, J.J. Garcia-Luna-Aceves, "Securing the Border Gateway Routing Protocol," Proceedings of Global Internet '96, November 1996.
5. B.R. Smith, S. Murphy, J.J. Garcia-Luna-Aceves, "Securing Distance-Vector Routing Protocols," Symposium on Network and Distributed System Security, February 1997.
6. B. Kumar, "Integration of Security in Network Routing Protocols," ACM SIGSAC Review, vol. 11, no. 2, Spring 1993.
7. S. Murphy, panel presentation on "Security Architecture for the Internet Infrastructure," Symposium on Network and Distributed System Security, April 1995.
8. S. Kent, R. Atkinson, "Security Architecture for the Internet Protocol," RFC 2401, November 1998.
9. R. Glenn, S. Kent, "The NULL Encryption Algorithm and Its Use with IPsec," RFC 2410, November 1998.
10. S. Kent, R. Atkinson, "IP Encapsulating Security Payload (ESP)," RFC 2406, November 1998.
11. D. Maughan, M. Schertler, M. Schneider, J. Turner, "Internet Security Association and Key Management Protocol (ISAKMP)," RFC 2408, November 1998.
12. D. Harkins, D. Carrel, "The Internet Key Exchange (IKE)," RFC 2409, November 1998.
13. R. Chandra, P. Traina, T. Li, "BGP Communities Attribute," RFC 1997, August 1996.
14. P. Traina, "Autonomous System Confederations for BGP," RFC 1965, June 1996.
15. T. Bates, R. Chandra, D. Katz, Y. Rekhter, "Multiprotocol Extensions for BGP-4," RFC 2283, February 1998.
16. A. Heffernan, "Protection of BGP Sessions via the TCP MD5 Signature Option," RFC 2385, August 1998.
17. T. Bates, R. Bush, T. Li, Y. Rekhter, "DNS-based NLRI origin AS verification in BGP," presentation at NANOG 12, February 1998, http://www.nanog.org/mtg-9802.
18. D. Eastlake 3rd, C. Kaufman, "Domain Name System Security Extensions," RFC 2065, January 1997.
19. C. Alaettinoglu, T. Bates, E. Gerich, D. Karrenberg, D. Meyer, M. Terpstra, C. Villamizar, "Routing Policy Specification Language (RPSL)," RFC 2280, January 1998.
20. Yih-Chun Hu, A. Perrig, D. Johnson, "Efficient Security Mechanisms for Routing Protocols," Network and Distributed System Security Symposium, February 2003.
21. R. Perlman, "Network Layer Protocols with Byzantine Robustness," MIT/LCS/TR-429, October 1988.
22. G. Goodell, W. Aiello, T. Griffin, J. Ioannidis, P. McDaniel, A. Rubin, "Working Around BGP: An Incremental Approach to Improving Security and Accuracy for Interdomain Routing," Network and Distributed System Security Symposium, February 2003.
23. J. Ng, "Extensions to BGP to Support Secure Origin BGP (soBGP)," www.ietf.org/internet-drafts/draft-ng-sobgp-bgp-extensions-00.txt.
24. K. Seo, C. Lynn, S. Kent, "Public-Key Infrastructure for the Secure Border Gateway Protocol (S-BGP)," DARPA Information Survivability Conference and Exposition, June 2001.
Towards an IPv6-Based Security Framework for Distributed Storage Resources
Alessandro Bassi1 and Julien Laganier2,3
1 LoCI Laboratory – University of Tennessee, 203 Claxton Building, 37996-3450 Knoxville, TN, USA, [email protected]
2 Sun Microsystems Laboratories Europe, 180, avenue de l'Europe, 38334 Saint-Ismier Cedex, France, [email protected]
3 INRIA Action RESO / Laboratoire de l'Informatique du Parallélisme, École Normale Supérieure de Lyon, 46, allée d'Italie, 69364 Lyon Cedex 07, France, [email protected]
Abstract. Some security problems can often be solved through authorization rather than authentication. Furthermore, a certificate-based authorization approach can alleviate the usual drawbacks of centralized systems, such as bottlenecks or single points of failure. In this paper, we propose a solution that could bring an appropriate security architecture to the Internet Backplane Protocol (IBP), a distributed shared storage protocol. The three basic building blocks are IPsec, Simple Public Key Infrastructure (SPKI) certificates and Crypto-Based Identifiers (CBID). CBID allows entities to prove ownership of their identifiers, SPKI allows entities to prove that they have been authorized to perform specific actions, while IPsec provides data origin authentication and confidentiality. We propose to use them to bring some level of 'opportunistic' security in the absence of any trusted central authority. This is particularly well suited to ad-hoc environments where collaborations might be very short-term.
Keywords: IBP, IPv6, IPsec, authorization certificates, SPKI, CBID, CGA
1 Introduction
In many security approaches the issue of authorization is often overlooked, as the main research focus lies on authentication issues. This is unfortunate, because many security problems have stringent authorization issues rather than
This work is supported by the National Science Foundation Next Generation Software Program under grant # 0204007, the Department of Energy Scientific Discovery through Advanced Computing Program under grant # DE-FC02-01ER25465, and by the National Science Foundation Internet Technologies Program under grant # ANI-9980203.
authentication problems. Furthermore, an authorization approach avoids the usual drawbacks of centralised systems, such as bottlenecks or single points of failure, and is much better suited to high-performance distributed systems. In this paper, we would like to introduce a solution that could bring an appropriate security architecture to the Internet Backplane Protocol (IBP), a protocol for managing distributed shared storage. The three basic building blocks we are using to provide an acceptable level of security are IPsec, Simple Public Key Infrastructure (SPKI) certificates and Crypto-Based Identifiers (CBID). At present, IBP provides a certain level of security, using an interesting authorization scheme based on cryptographically secure URLs for loading and storing data on a server, but unfortunately not for the most sensitive call of the protocol, the one that allows remote space to be reserved by an end user. Of those three basic blocks, CBID allows entities to prove ownership of their identifiers, SPKI allows entities to prove that they have been authorized to perform specific actions, while IPsec provides data origin authentication and possibly confidentiality. We propose to use them to bring some level of 'opportunistic' security in the absence of any trusted central authority, as the IBP architecture is designed around the concept of using untrusted data depots for holding data for a limited amount of time. As IBP itself does not provide any mechanism to guarantee the integrity and confidentiality of data, these matters have to be taken care of by the applications willing to use the IBP infrastructure, and are therefore outside the scope of this work. We concentrate especially on potential Denial-of-Service attacks that might occur if an attacker tries to reserve all the available space. The approach we are following is also particularly well suited to ad-hoc environments where collaborations might be very short-term. The paper is organized as follows: in section 2 we describe the Internet Backplane Protocol. Then, in section 3, we analyse its security features, and in section 4 we discuss the basic security building blocks we use. Section 5 focuses on how those blocks work together, and, after discussing related work in section 6, we illustrate the direction of our research in section 7.
2 The Internet Backplane Protocol
The Internet Backplane Protocol [9] (IBP) is a protocol developed by the Logistical Computing and Internetworking (LoCI) Lab of the University of Tennessee to allow easy sharing of distributed storage resources. The singularity of this protocol is its way of considering those resources completely exposed: any application can allocate some amount of space for a limited amount of time on any server. This key aspect of the IBP storage model, the capacity to allocate space on a shared network resource, can be seen as doing a C-like malloc on an Internet resource, with some outstanding differences, such as, for instance, time limitation. IBP servers, also called depots to underline the similarity with industrial and military logistics, are therefore equipment that allows the sharing
of space, either disk or RAM, giving any application the possibility to manage a certain amount of space for a limited time, and therefore allowing end users and applications to explicitly schedule the movement and the position of data. IBP allocations have to be considered "best-effort", as the server does not guarantee the presence of stored data. Therefore, if reliability of the storage is requested, data replication is necessary, and it must be carried out either through the LoRS tools provided by the same LoCI lab, or directly by the application itself. Because of this particular characteristic, an analogy can be seen between IBP and the Internet Protocol: as IP is a more abstract service based on link-layer datagram delivery, providing an unreliable, connectionless network service, IBP is a more abstract service based on blocks of data, managed as "byte arrays", providing an unreliable and stateless storage service. The IBP protocol has not been standardized yet, but efforts in this sense have been made in the realm of the Global Grid Forum, and a final protocol specification is likely to appear towards the end of this year. IBP servers have been deployed at around 160 sites, mainly in the United States, providing a publicly available total storage of more than 10 Terabytes. Because of their general-purpose nature, they are well adapted to many different applications, from data staging for scientific calculation, to overlay routing, to multimedia stream caching. In this last field we can notice several initiatives, the latest one being IBPvo, a mechanism similar to TiVo, a set-top box for recording TV shows on a hard disk, but based on the IBP infrastructure. Therefore, we can forecast that this protocol will be broadly used in the future, as its peer-to-peer nature makes it the perfect candidate for the next generation of multimedia sharing software.
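The exposed, time-limited allocation model can be illustrated with a hypothetical client call. The wire format and function below are invented for illustration only; the real IBP API and protocol differ in detail, although an allocation does return opaque capabilities for subsequent use (see Section 3.2).

import socket

def ibp_allocate(depot_host, depot_port, size_bytes, duration_s):
    """Hypothetical sketch of IBP's 'network malloc': ask a depot for
    size_bytes of best-effort storage for duration_s seconds.

    Only the model (time-limited allocation returning opaque
    capabilities) follows IBP; the textual request format is invented.
    """
    with socket.create_connection((depot_host, depot_port)) as s:
        s.sendall(f"ALLOCATE {size_bytes} {duration_s}\n".encode())
        reply = s.makefile().readline().split()
    if not reply or reply[0] != "OK":
        raise RuntimeError("allocation refused")
    # Capability strings act as both handle and credential for later
    # get/put/manage calls; the depot may reclaim the space once the
    # requested duration expires.
    read_cap, write_cap, manage_cap = reply[1:4]
    return read_cap, write_cap, manage_cap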
3 Security Analysis for IBP
3.1 Threat on Allocate Method
The only mechanism currently implemented to protect depots from Denial-of-Service (DoS) attacks on allocation is the Access Control List (ACL). The owner of a depot may choose to define an ACL listing the IP addresses of clients authorized to perform an allocation. Since this verification is based on a longest-prefix match of an identifier that has been obtained in an untrusted manner against some ACL entries, it does not provide a very high level of security, and is particularly vulnerable to IP spoofing and hijacking attacks. An attacker that can snoop a link carrying legitimate and authorized IBP traffic towards a given depot can easily attack this depot by generating either fake read/write/free queries with valid capabilities, to destroy other users' data, or fake alloc queries with an authorized source IP address, to mount a DoS attack that exhausts the available storage. However, since applications can retain storage only for a certain amount of time, the risk of having a resource completely taken over for a long period is practically non-existent. Our belief is that this call (i.e. allocate) is the most important, as it allows applications to commit part of a public resource for their private use, and at the
same time the more vulnerable one, as apart from the ACL no other mechanism is implemented to prevent unauthorized use of the resource. This work is focused on how to secure this vital phase of the protocol in a manner that both demonstrates scalability and retains fine-grained resource control.
3.2 Threat on get and put Methods
IBP has been designed around the concept of a capability, an opaque string returned to a client by a server after a successful allocation, which is the functional equivalent of both a plain-text password and a handle. By providing this capability in his subsequent queries (to put or get data on a depot), the client can prove either that he is the allocator of the storage area, or that the original allocator has authorized him to use this resource by unveiling to him the associated capability. Semantically, those two situations are the same for an IBP server, as the focus is on authorization rather than authentication. This is a very practical means of delegating rights to share resources; however, since with the current version of the code no strong cryptographic mechanism (namely authentication and encryption) protects these exchanges, they could be subject to a wide range of attacks originating anywhere from the network layer to the application layer. Although the simple adoption of a publicly available SSL library for any exchange where capabilities have to be passed would provide a sufficient level of security, it would require legacy IBP applications to be modified to benefit from it. This implies that whenever an IBP-based application establishes a new communication channel intended to carry IBP capabilities, it also needs to perform an SSL/TLS handshake, independently of the fact that it may already have performed such a handshake before to protect a different socket instance. So we decided not to concentrate our attention on this aspect in this paper. Since our scheme uses IPsec (at the IP packet level), it is almost transparent to applications and allows the security context establishment between two peers to be factored over multiple socket instances.
4 Secure Building Blocks: IPsec, SPKI, and CBID
4.1 IPsec
IPsec [2] is the security architecture for IP. It can provide data-origin authentication, replay protection, non-repudiation and confidentiality to IP datagram delivery. IPsec processing takes place at the bottom of the IP stack. Each outgoing or incoming packet is matched against the Security Policy Database (SPD) to see which policy must be applied. The selection of the policy is based on the inbound or outbound network interface used, the source and destination IP addresses, and the IP protocol carried (e.g. TCP, UDP). The policy specifies whether a packet should be dropped, bypass IPsec processing, or be secured by an appropriate Security Association. When the appropriate Security Policy has been selected, the
kernel looks at the Security Association Database (SAD) to find which algorithms and parameters should be applied to the packet (a two-peer agreement on such parameters is called a Security Association). After the algorithms are applied, the packet continues to flow within the stack. SAs can be established through manual keying or automatic key exchange with the Internet Key Exchange (IKE) protocol. The main issue with automatic key exchange is the authentication of the so-called "IKE peers". IKE specifies three means of authentication:
– Pre-Shared Keys
– Digital Signatures
– Public Key Encryption
The problem here is that in the absence of a trusted infrastructure, these methods do not allow two previously unknown nodes to successfully authenticate each other: Pre-Shared Keys require prior agreement between the peers, while Digital Signatures and Public Key Encryption require both peers to know each other's public key. This knowledge could be achieved either by prior agreement, or by relying on a trusted infrastructure like a Public Key Infrastructure (PKI), a Trusted Third Party (TTP), or a Key Distribution Center (KDC). The approach described in this paper supersedes current limitations of IKE by allowing any two previously unknown nodes to exchange keys.
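The per-packet SPD/SAD interplay just described can be summarized as follows. The structures are deliberately simplified (an ordered policy list and a dictionary of established SAs); a real IPsec stack is considerably more involved.

def outbound_ipsec(packet, spd, sad, ike_negotiate):
    """Simplified sketch of outbound IPsec processing.

    spd: ordered list of (selector, action) pairs; sad: dict mapping a
    selector to an established Security Association. All structures
    here are illustrative, not a real kernel implementation.
    """
    for selector, action in spd:
        if selector.matches(packet):
            if action == "discard":
                return None              # policy: drop the packet
            if action == "bypass":
                return packet            # policy: no IPsec processing
            sa = sad.get(selector)       # action == "protect"
            if sa is None:
                # No SA yet: trigger IKE, where the peers authenticate
                # each other (the hard part absent a PKI, TTP or KDC).
                sa = ike_negotiate(packet.dst)
                sad[selector] = sa
            return sa.apply(packet)      # e.g. ESP encapsulation
    return packet                        # no matching policy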
SPKI
SPKI [6] stands for Simple Public Key Infrastructure. The vast majority of today’s security mechanism relied on ACLs and PKIs. ACLs define who (or which entity) is authorized to perform a given action (like having read access to a file), while PKIs allows any subject of a common trust domain to bind his name (so called Distinguished Names or DN in X509 nomenclature) to a given public/private key pair. SPKI specifies a framework which includes the definition of authorization certificates allowing a given public-private key pair to authorize another entity (subject) to perform some actions through the delegation of rights. SPKI authorization certificates differs from standard X509 attributes certificates by the fact that both the issuer (or Certificate Authority or CA in X509 terms) and the subject are identified by either their public key or a hash of their public key (Whereas with X509, the issuer must be identified by its DN). A subject may be authorized to do something by an issuer through a chain of several delegatable SPKI authorization certificates, each of them having a subject field corresponding to the issuer of the next certificate in the chain. The final subject gains the intersection of the rights granted by each certificate of the chain. SPKI authorization certificates are very useful objects in a distributed and open system threatened by possible malicious attacks. They can be seen as an ACL entry packed with an in-line PKI certificate. They allow an entity to prove
to the controller of a remote resource that he has the right to perform some actions on it, without the intervention of any trusted third party, and possibly without direct contact between the requester and the controller. The delegation feature reduces the management overhead in many different manners, from standard PKI-oriented hierarchical delegation, through small flat groups, to Web-of-Trust peer-to-peer communities. This delegation property is particularly convenient for securing distributed systems because it does not require centralized management of credentials, thus alleviating some of the potential scalability issues encountered in such large-scale systems (e.g. bottlenecks, single points of failure). An SPKI authorization certificate has the following general structure:
(sequence
  (public-key object)
  (cert object)
  (signature object)
)
public-key, signature and cert are objects defined by the SPKI framework. In our experiments, we only use SPKI authorization certificates in which the cert object contains an application-dependent tag specifying the attributes of the granted authorization.
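The chain rule just described (each subject must issue the next certificate, non-final certificates must carry propagate, and the final subject obtains the intersection of the granted rights) can be sketched as follows, with certificates modelled as dictionaries and signature checking left to a hypothetical helper.

def check_spki_chain(chain, root_issuer, requester, verify_sig):
    """Sketch of SPKI delegation-chain validation.

    chain: list of certificate dicts with 'issuer', 'subject', 'tag'
    (modelled as a set of rights) and 'propagate'; verify_sig is a
    hypothetical signature-checking helper.
    """
    if not chain or chain[0]["issuer"] != root_issuer:
        return None                       # chain must start at the controller
    granted = None
    for i, cert in enumerate(chain):
        if not verify_sig(cert):
            return None
        last = (i == len(chain) - 1)
        if not last:
            if not cert.get("propagate"):
                return None               # further delegation not allowed
            if cert["subject"] != chain[i + 1]["issuer"]:
                return None               # subject must issue the next cert
        granted = cert["tag"] if granted is None else granted & cert["tag"]
    if chain[-1]["subject"] != requester:
        return None
    return granted                        # intersection of granted rights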
4.3 Crypto-Based Identifiers (CBID)
CBIDs were used by Montenegro and Castelluccia ([7]) to help solve the identifier ownership problem in Mobile IPv6. Since then, they have also been proposed to secure the group management problem for IPv6 multicast/anycast with the help of authorization certificates ([8]), to secure IPv6 Neighbor Discovery, to secure the JXTA peer-to-peer infrastructure ([10]), etc. The concept of Crypto-Based IDentifiers is quite simple: starting from a public/private key pair, one uses the public key as one of the input parameters of a secure hash function. The truncated output of the hash is the CBID itself:
CBID = h128(PK | Imprint)    (1)
where PK is the Public Key of the owner of the CBID and Imprint is an input parameter to the secure hash function that allows the scope of a given CBID to be restricted. Imprint is usually the 64 bits of the node's IPv6 Network Prefix. h128 denotes the truncated output (128 leftmost bits) of the secure hash function h. By doing so, one is able to prove ownership of one's CBID by computing the digital signature (using the private key) of the message carrying it:
msg = CBID | text | PK | SIG_SK{CBID | text | PK}    (2)
where SK is the Private Key associated with PK, text is some data that the owner wants to protect, and SIG_SK is the digital signature function (usually RSA). In this scheme, we choose to use IPv6 addresses that are CBIDs, sometimes also called Cryptographically Generated Addresses (CGAs) or Statistically Unique and Cryptographically Verifiable Addresses (SUCV addresses). We construct them by concatenating the IPv6 Network Prefix NP with the 64 leftmost bits of the secure hash output, used as an Interface IDentifier (IID):
CGIID = h64(PK | Imprint)    (3)
CGA = NP | h64(PK | Imprint)    (4)
Thus, one loses half the bits of entropy, but gains a routable CBID (i.e. a CGA, or SUCV address) that can be embedded in existing IP protocols like Neighbor Discovery, or, in our case, the Internet Key Exchange and the Internet Backplane Protocol. Two previously unknown peer nodes using CGAs are now able to authenticate themselves and exchange keys to negotiate IPsec Security Associations that can subsequently be used by any Upper Layer Protocol (ULP) like IBP.
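Equations (3) and (4) translate almost directly into code. In the sketch below, SHA-1 is assumed as the secure hash h and the imprint is taken to be the raw 64-bit prefix; the paper does not fix these encoding details.

import hashlib, ipaddress

def make_cga(public_key: bytes, network_prefix: str) -> str:
    """Build a Cryptographically Generated Address per Eqs. (3)-(4).

    The hash function (SHA-1 here) and the exact input encoding are
    assumptions; the scheme only requires a secure hash h.
    """
    prefix = ipaddress.IPv6Network(network_prefix)  # e.g. "3ffe:b89a:10c0:a120::/64"
    imprint = prefix.network_address.packed[:8]     # 64-bit prefix as Imprint
    digest = hashlib.sha1(public_key + imprint).digest()
    iid = digest[:8]                                # 64 leftmost bits -> IID
    addr_bytes = prefix.network_address.packed[:8] + iid
    return str(ipaddress.IPv6Address(addr_bytes))

# A peer verifies address ownership by recomputing the address from the
# claimed public key and checking a signature made with the matching
# private key, as in Eq. (2).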
5 Putting Together the Building Blocks
In this paper we propose the use of existing IPsec security mechanisms in conjunction with SPKI certificates and CBID at the network layer to secure the whole stack, from the IP network layer to IBP, used as a ULP for distributed shared storage systems for the Grid. We assume that the network is no more than a collection of untrusted nodes, and that no trusted infrastructure (e.g. PKI, TTP, KDC) is available. The owner of an IBP depot wants to control who is authorized to allocate storage in his depot, and moreover, does not want to be involved in each authorization process, because that does not scale with the number of users. We combine together CBIDs, used as IPv6 addresses (i.e. CGAs), with SPKI authorization certificates and IPsec to allow us to verify address ownership, bootstrap a subsequent IPsec Security Association, and verify that a given endpoint is indeed authorized to request a given service (like a long-term IBP storage allocation). We propose to authorize the subsequent IBP allocation only if the requesting end-point has transmitted in-line an appropriate SPKI certificate chain. Such a chain must begin with an SPKI certificate signed with the private key of the storage depot, whose issuer is the CBID of the depot. If this certificate is not delegatable (propagate), then the subject must be the entity trying to allocate space. If the certificate can be delegated, then it can be followed by other certificates, which have to be delegatable as well (except the last). The subject field of each certificate in the chain must be the issuer of the next certificate. These certificate chains allow maximum flexibility in representing the effective trust relationships among a large number of individuals and other entities.
(cert
  (issuer (cbid <issuer-cbid>))
  (subject (cbid <subject-cbid>))
  (tag (ibp-alloc <size> <duration>))
  (propagate)
  (online-test <uri>)
  (not-before <date>)
  (not-after <date>)
)
(cert
  (issuer (cbid <3ffe:b89a:10c0:a120:2c48:54ff:fec0:de93>))
  (subject (cbid <2c48:54ff:1ae3:01bb:0ab4:b89a:4f0e:389a>))
  (tag (ibp-alloc <5GB> <*>))
  (propagate)
  (not-before <10/1/2002>)
  (not-after <10/31/2002>)
  (online-test <uri>)
)
Fig. 1. IBP SPKI Authorization Certificate
The IBP SPKI authorization certificate (figure 1) includes a tag which defines the maximum space and duration of an allocation on a given depot. The signer can also include a Uniform Resource Identifier (URI) indicating a location at which an online revocation check of the certificate can be performed. Such revocation checks can conveniently be performed against the issuer of an authorization certificate, as it is probably the most capable entity for doing that. Thus revocation can be performed at the point of authorization, and not on a centralized CRL server. Another solution to avoid the need for a centralized CRL server may be for the issuer to only issue short-lived certificates, thus requiring certificate re-validation from time to time. Each time an entity wants to perform an action toward a depot, it opens a connection, and packets begin to flow. Provided the IPsec security policy has been appropriately defined, it will specify that IP packets with protocol TCP and port number IBP_PORT_NUMBER MUST be authenticated and encrypted using ESP transport mode. When the first packet is sent, it matches an SPD entry, so the stack will search for an appropriate SA to apply to the packet. If such an SA is not yet in place, IKE will be requested to perform the appropriate key exchange and SA negotiation using CBIDs as IPv6 addresses. IKE will then prove address ownership and bootstrap an ESP transport mode Security Association
with the generated Diffie-Hellman shared secret. This brings both data-origin integrity and confidentiality to the ULP, thus avoiding compromising the IBP capability. If the security policy is appropriate, the underlying verifiable identifier was already verified when the TCP connection completed its three-way handshake. However, it could be safer to verify a second time in the ULP that the host public key matches its CBID or CGA (this can be done through a single call to the CBID library). In the case of an allocation request, the depot verifies that the provided SPKI authorization certificate chain indeed authorizes the requester to allocate the desired space. In the case of get and put, the depot does not need to verify certificates, because the capability proves that authorization has been given by the owner of the resource (indeed, the owner grants authorization by disclosing a capability). Since the TCP connection is protected by IPsec encapsulation from IP spoofing, hijacking and snooping, an attacker can neither claim to be someone who is authorized to allocate, nor learn capabilities that would authorize him to read/write/manage the associated data buffers.
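At the depot, the final authorization decision then amounts to combining the (ibp-alloc <size> <duration>) tags of a chain whose signatures and linkage have already been checked (as in the Section 4.2 sketch), and comparing the result with the request. The tag layout below is our illustrative reading of Fig. 1, not a normative encoding.

def authorize_allocation(chain_tags, req_size_gb, req_duration_days, now):
    """Sketch: intersect the tags of a verified chain, then check the
    allocate request against the effective limits.

    chain_tags: one dict per certificate, e.g. {"size_gb": 5,
    "duration": "*", "not_before": d1, "not_after": d2}, where "*"
    means unrestricted. All field names are illustrative.
    """
    max_size = min(t["size_gb"] for t in chain_tags)
    durations = [t["duration"] for t in chain_tags if t["duration"] != "*"]
    max_duration = min(durations) if durations else None
    not_before = max(t["not_before"] for t in chain_tags)
    not_after = min(t["not_after"] for t in chain_tags)
    if not (not_before <= now <= not_after):
        return False               # outside the intersected validity window
    if req_size_gb > max_size:
        return False               # more space than any delegation allows
    if max_duration is not None and req_duration_days > max_duration:
        return False
    return True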
6 Related Work
The approach presented here differs from other classical security approaches targeted at large-scale distributed systems by the fact that it does not rely on the existence of a global trusted infrastructure to authenticate users and other entities. Some of these frameworks try to federate multiple security frameworks (such as Kerberos, DCE, PKI) under a global PKI, as in the Globus Security Infrastructure [11]. Others propose to lay on top of existing remote untrusted storage (e.g. NFS, CIFS, P2P, HTTP) a layer that brings security, as in SiRiUS [12]. Both require an always-online PKI to work. These approaches have a lot of advantages in terms of re-usability of existing components, but they do not address the major problem in distributed systems, which is scalability. Actually, there is no Internet-wide deployed PKI, and some people think that the nearest thing we could achieve is DNSSEC. Assuming that this is the case, it severely limits the real scalability of such solutions. Scalability concerns may push users to use their own trusted Certificate Authority, thus fragmenting the unified grid environment. From another point of view, the fact that these architectures allow local administrators to enforce their locally implemented security policy while joining an existing grid computing domain is extremely valuable in terms of scalability, because it gives much more flexibility in deployment (no modification of locally implemented security mechanisms is required). Apparently the scalability of a grid security architecture is a tradeoff between aggregation/centralization (reducing management overhead and heterogeneity) and distribution (avoiding possible bottlenecks, single points of failure, etc.). We choose to address the second point.
Meanwhile, some researchers have already built completely distributed security systems using SPKI or KeyNote authorization certificates. The most closely related solutions have been described to implement distributed firewalls [14] and trust management in IPsec [13]. Our contribution brings to these existing approaches an interesting way to verify the identifiers used within network protocols, because we use the public key (in fact, the hash of the public key) as an identifier in both the architecture and the protocols we want to secure. This is particularly convenient for securing existing protocols because, as long as a protocol's identifiers are long enough (about 128 bits), one can replace them by CBIDs and gain a means of verifying identifiers embedded in the protocol itself (without requiring any modification to it). This is also convenient because protocol identifiers are usually generated with the address as an input parameter, and the address is a CBID in our solution; so, in addition to being able to verify the identity of your correspondent in the ULP (using CBIDs and SPKI) and at the network layer (using IPsec and CGA), you benefit from a cryptographic layering enforcement of the complete protocol stack. The ULP is then securely layered on top of TCP/IP. Hence, an attacker cannot succeed by launching an attack towards the ULP alone, because IPsec prevents IP spoofing and hijacking, and the ULP's identifiers are equally tightly bound to both the public key and the IPv6 Cryptographically Generated Address.
7 Future Work
In this paper, we have presented new solutions to secure the most sensitive part of a distributed and shared storage infrastructure, the resource allocation phase. The Internet Backplane Protocol provides a complete storage infrastructure inside the network in the form of distributed data depots which allow users to deploy and temporarily store their data. Currently more than 160 depots are available worldwide (mainly at Internet2 and PlanetLab nodes, but also at many independent locations), with an aggregate storage capacity of around 10 Terabytes. Depots must rely on replicated authorization mechanisms to move data between them in a secure way. We showed how to combine security protocols (IPsec, SPKI and CBID) for allocation authorization in IBP depots. We are currently implementing this model inside the Internet Backplane Protocol software suite. This implementation will be completely transparent to the protocol itself, and it will be easily usable by IBP servers and clients. A further step will be to rethink the capability mechanism that IBP uses for read/write authorization rights and see whether our framework could be applicable, and at what price in terms of structural modification to the current scheme. Because of its elegant architecture, this security model can easily be extended to other peer-to-peer or content delivery infrastructures which need fully distributed authorization solutions.
References
1. S. Deering, R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification," RFC 2460, December 1998.
2. S. Kent, R. Atkinson, "Security Architecture for the Internet Protocol," RFC 2401, November 1998.
3. S. Kent, R. Atkinson, "IP Authentication Header," RFC 2402, November 1998.
4. S. Kent, R. Atkinson, "IP Encapsulating Security Payload (ESP)," RFC 2406, November 1998.
5. T. Dierks, C. Allen, "The TLS Protocol Version 1.0," RFC 2246, January 1999.
6. C. Ellison et al., "SPKI Certificate Theory," RFC 2693, September 1999.
7. G. Montenegro, C. Castelluccia, "Statistically Unique and Cryptographically Verifiable (SUCV) Identifiers and Addresses," 9th Network and Distributed System Security Symposium (NDSS), February 2002.
8. G. Montenegro, C. Castelluccia, "Securing Group Management," ACM Transactions on Security (T-SEC) 2002, February 2001.
9. J. Plank, A. Bassi, M. Beck et al., "Managing Data Storage in the Network," IEEE Internet Computing, September-October 2001.
10. G. Montenegro, D. Bailly, The Crypto-ID JXTA project web site, http://cryptoid.jxta.org
11. The Globus project web site, http://www.globus.org
12. E. Goh, H. Shacham, N. Modadugu, D. Boneh, "SiRiUS: Securing Remote Untrusted Storage," Proc. Network and Distributed System Security Symposium (NDSS), February 2003.
13. J. Ioannidis, A. Keromytis et al., "Trust Management for IPsec," Proc. Network and Distributed System Security Symposium (NDSS), February 2001.
14. J. Ioannidis, A. Keromytis et al., "Implementing a Distributed Firewall," Proc. ACM Conference on Computer and Communications Security, 2000.
Operational Characteristics of an Automated Intrusion Response System
Maria Papadaki1, Steven Furnell1, Benn Lines1, and Paul Reynolds2
1 Network Research Group, University of Plymouth, Drake Circus, Plymouth, United Kingdom, [email protected]
2 Orange Personal Communications Services Ltd, St James Court, Great Park Road, Bradley Stoke, Bristol, United Kingdom
Abstract. Continuing organisational dependence upon computing and networked systems, in conjunction with the mounting problems of security breaches and attacks, has served to make intrusion detection systems an increasingly common, and even essential, security countermeasure. However, whereas detection technologies have received extensive research focus for over fifteen years, the issue of intrusion response has received relatively little attention - particularly in the context of automated and active response systems. This paper considers the importance of intrusion response, and discusses the operational characteristics required of a flexible, automated responder agent within an intrusion monitoring architecture. This discussion is supported by details of a prototype implementation, based on the architecture described, which demonstrates how response policies and alerts can be managed in a practical context.
1 Introduction
Ever since the commercialisation of the Internet, there has been a substantial growth in the problem of intrusions, such as Denial of Service attacks, website defacements and virus infections [1]. Such intrusions cost organisations significant amounts of money each year; for example, the 2003 CSI/FBI Computer Crime and Security Survey [2] reported annual losses of $201,797,340 from the 530 companies questioned. Although these results suggest that the cost of attacks has decreased for the first time since 1999, it is still a significant amount, representing a 101.55% increase compared to 1997 [3]. As a defence against such attacks, intrusion detection technologies have been employed to monitor events occurring in computer systems and networks. Intrusion detection has been an active research area for more than 15 years [4,5], and enjoys wide acceptance within the IT community [6,3]. However, detecting intrusions is only the first step in combating computer attacks. The next step involves the counteraction of an incident and has so far been largely overlooked [7,8]. The CSI/FBI survey suggests a declining trend amongst organisations to address vulnerabilities, or report incidents to law enforcement, since 1999 [2]. Although the percentage of respondents
who patched vulnerabilities after an incident was reasonably high, it had still decreased by 2% when compared to the respective figure for 1999, while about 50% of the respondents chose not to report the incident at all. Even if vulnerability patching and incident reporting are only two aspects of responding to intrusions, the lower percentages suggest a lack of effective response policies and mechanisms within organisations. A principal reason for this problem is likely to be the administrative overhead posed by response procedures. At the moment, the detection of a suspected intrusion typically triggers a manual intervention by a system administrator, after he or she has received an alert message from the intrusion detection system. The IDS can additionally assist the incident response process by providing the details of the attack, saved in a log file [9]. However, responding manually to intrusions is not necessarily an easy task, as it may involve dealing with a high number of alerts and notifications from the IDS [10], ensuring awareness of security bulletins and advisories from incident response teams, and taking appropriate actions to resolve each of the alerts reported. From the system administrator's perspective, the main requirement is to ensure that the system remains operational and available. Thus, unless resolving a detected incident is explicitly required to ensure that this is the case, the task of responding is likely to be given a lower priority. The importance of timely response has been demonstrated by Cohen [11] in his simulation of attacks, defences and their consequences in complex 'cyber' systems. This showed that, if skilled attackers are given 10 hours between being detected and a response being generated, they have an 80% chance of a successful attack. When that time interval increases to 20 hours, the rate of success rises to 95%. After 30 hours the skill of the system administrator makes no difference, as the attacker will always succeed. However, if the response is instant, the probability of a successful attack against a skilled system administrator becomes almost zero. This shows not only the importance of response, but also the relationship between its effectiveness and the time at which it is initiated. At the time of writing, the degree of automation in current IDS is very low, offering mostly passive responses (i.e. actions that aim to notify other parties about the occurrence of an incident, relying on them to take further action). In contrast, active responses (actions taken to counter the incident that has occurred) either have to be initiated manually or may not be offered at all. Lee [12] found that even when IDS products offer active responses, they are not trusted by administrators, mainly due to the likely adverse effects in the event of them being falsely initiated. In spite of the potential problems, practical factors suggest that automated response methods will become increasingly important. For example, the widespread use of automated scripts to generate distributed attacks [13] can leave very limited opportunity to respond, and further diminishes the feasibility of doing so manually. Thus, there is a need for the adoption of automated response mechanisms, which will be able to protect system resources in real time and, if possible, without requiring explicit administrator involvement at the time.
In an effort to enhance the effectiveness of automated response and to reduce its adverse effects in false alarm scenarios, an automated response framework has been devised. The aim is to enable accurate response decisions to be made autonomously, based on the nature of the attack and the context in which it is occurring (e.g. what applications are running, what account is being used, etc.).
The remainder of this paper describes the concept of the Responder, followed by details of a prototype implementation that demonstrates the approach in practice.
2 The Intrusion Monitoring System (IMS)
IMS has been the focus of research within the authors' research group for several years. It is a conceptual architecture for intrusion monitoring and activity supervision, based around the concept of a centralised host handling the monitoring of a number of networked client systems. Intrusion detection is based upon the comparison of current user activity against both historical profiles of normal behaviour for legitimate users and intrusion specifications of recognised attack patterns. The architecture addresses data collection and response on the client side, and data analysis and recording at the host. The elements of the architecture that are relevant to the discussion presented in this paper are illustrated in Figure 1. The main modules of IMS have already been defined in earlier publications [14], and interested readers are referred to these for associated details. In this paper, specific focus will be given to the modules related to intrusion response. The Responder is responsible for monitoring the Alerts sent from the Detection Engine (note: this module was referred to as the Anomaly Detector in previous papers) and, after considering them in conjunction with other contextual factors, taking appropriate actions where necessary. If the actions selected by the Responder need to be performed on the client side, a local Responder Agent is responsible for initiating and managing the process. Without providing an exhaustive list, examples of actions that could be performed at the client side include correcting vulnerabilities, updating software, issuing authentication challenges, limiting access rights and increasing the monitoring level. The Responder utilises a variety of information in order to make an appropriate decision. This is acquired from several other elements of IMS, including the Detection Engine, the Collector, the Profiles, and the Intrusion Specifications. The possible contributions from each of these sources are described below. As well as indicating the type of suspected incident, the Detection Engine is also able to directly inform the Responder about the intrusion confidence, the current alert status of the IDS, the source of the alert that triggered the detection, and information about the perceived perpetrator(s) and the target involved. The Collector is able to provide information about current activity on the target system (e.g. applications currently running, network connections currently active, applications installed, etc.). This information can be used to minimise the disruption of legitimate activity, by making sure that no important work at the target gets lost, and that no important applications are terminated unnecessarily, as a result of selected response actions. It can also be used in cases of compromised targets, when information about them needs to be reassessed. For example, the determination of whether unauthorised software (sniffing software / malware) has been installed will be vital information for the response decision process. In this way, the negative impacts of responses can be minimised and the response capability enhanced as much as possible.
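To make the preceding description concrete, the following Java sketch models the kind of alert and context data that the Detection Engine and the Collector could supply to the Responder. It is purely illustrative: the class names, fields and value conventions are our own assumptions and are not taken from the IMS implementation.

```java
// Hypothetical data carriers for the alert and context information that the
// Detection Engine and Collector supply to the Responder, as described above.
// All names and fields are illustrative assumptions, not IMS code.
import java.util.List;

enum AlertStatus { LOW, ELEVATED, HIGH }

// Information the Detection Engine attaches to an alert.
record Alert(String incidentType,     // e.g. "authentication failure"
             double confidence,       // detection confidence in [0,1]
             AlertStatus idsStatus,   // overall alert status of the IDS
             String sensorSource,     // sensor/source that raised the alert
             String suspectedUser,    // perceived perpetrator, if known
             String targetHost) {}

// Snapshot of current activity on the target, as gathered by the Collector.
record TargetContext(boolean businessCritical,
                     double currentLoad,
                     List<String> runningApplications,
                     List<String> activeConnections) {}

public class AlertDemo {
    public static void main(String[] args) {
        Alert alert = new Alert("authentication failure", 0.35,
                AlertStatus.LOW, "host-sensor-7", "unknown", "srv-mail");
        TargetContext ctx = new TargetContext(false, 0.2,
                List.of("sshd", "postfix"), List.of("10.0.0.5:22"));
        System.out.println(alert + "\n" + ctx);
    }
}
```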
[Figure 1 (architecture diagram): on the client side, the Responder Agent, Collector and Communicator handle challenges, user responses and activity data; on the host side, the Sensors feed the Detection Engine, whose Alerts reach the Responder, which consults the Profiles, Intrusion Specifications and Response Policy and issues Response Actions and policy refinements via the Communicator.]
Fig. 1. The Intrusion Monitoring System (IMS)
The Profiles contain information about users and systems, both of which can provide some information in the context of response decisions:
− User profiles: if the incident involves the utilisation of a user account, then the corresponding user profile can indicate aspects such as the privileges and access rights associated with it.
− System profiles: these relate to system characteristics, such as the versions of operating systems and installed services, the expected load at given hours/periods, the importance of the system within the organisation (e.g. whether it holds sensitive information or offers critical services), its location on the network, etc.
Finally, the Intrusion Specifications contain information about specific types of intrusions and their characteristics, such as an incident severity rating, ratings of likely impacts (e.g. in terms of confidentiality, integrity and availability), and the speed with which the attack is likely to evolve [15]. Once the Detection Engine has indicated the type of incident that it believes to have occurred, additional information can be retrieved from the specifications to obtain a comprehensive view of the incident (all of which would again influence the response selection). Having gathered all of the available information, the actions that should be initiated in different contexts are then specified in the Response Policy. In the first instance, the Response Policy would need to be explicitly defined by the system administrator; however, it could also be refined over time to reflect practical experience. For example, if a particular response is found to be ineffective in a particular situation, then the policy could be updated to account for this. It is envisaged that this refinement could be initiated manually by the system administrator, as well as automatically by the system itself. Further information about this process is given in the next section.
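As an illustration of how the Intrusion Specifications and the Response Policy could interact, the following Java sketch combines a specification record with a minimal policy lookup driven by detection confidence and incident severity. The rating scale, thresholds and action names are hypothetical assumptions, not part of IMS.

```java
// Hypothetical model of an Intrusion Specification and a minimal Response
// Policy lookup. Rating scales (1..5) and all names are illustrative.
import java.util.List;
import java.util.Map;

record IntrusionSpec(String incidentType,
                     int severity,            // 1 (minor) .. 5 (critical)
                     int confidentialityImpact,
                     int integrityImpact,
                     int availabilityImpact,
                     int evolutionSpeed) {}   // how fast the attack unfolds

public class PolicyLookup {
    // Policy: incident type -> candidate responses, mildest first.
    static final Map<String, List<String>> POLICY = Map.of(
        "authentication failure",
        List.of("log alert", "non-intrusive re-authentication", "lock account"),
        "denial of service",
        List.of("log alert", "rate-limit source", "block source address"));

    // Low confidence or low severity -> mildest action; high confidence on a
    // severe incident -> strongest action; otherwise a middle-ground action.
    static String select(IntrusionSpec spec, double confidence) {
        List<String> actions = POLICY.get(spec.incidentType());
        if (actions == null) return "log alert";        // unknown incident
        if (confidence < 0.5 || spec.severity() <= 2) return actions.get(0);
        if (confidence > 0.8 && spec.severity() >= 4)
            return actions.get(actions.size() - 1);
        return actions.get(1);
    }

    public static void main(String[] args) {
        IntrusionSpec spec =
            new IntrusionSpec("authentication failure", 3, 4, 2, 1, 2);
        System.out.println(select(spec, 0.4)); // -> log alert
        System.out.println(select(spec, 0.7)); // -> non-intrusive re-authentication
    }
}
```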
3 Operational Characteristics of the Responder
In order to enable increasingly automated responses, and to reduce the risks associated with using active response methods, the architecture incorporates techniques to improve the flexibility of the response process when compared to the approaches in current IDS. Specifically, the proposed Responder includes the ability to:
− adapt decisions according to the current context; and
− assess the appropriateness of response actions before and after initiating them.
The concept of adaptive decision-making relates to the requirement for flexibility in the response process. A fundamental principle of the proposed approach is that response decisions should vary depending upon the context in which an incident has occurred (i.e. a response that is appropriate to a particular type of incident on one occasion will not necessarily be appropriate if the same incident were to occur again under different circumstances). The previous section described how the Responder draws upon information from a number of other sources within the IMS framework. This enables the system to determine the overall context in which an incident has occurred, including considerations such as:
− the overall alert status of the IDS at the time of the new incident;
− whether the incident is part of an ongoing series of attacks (e.g. how many targets have already been affected? Which responses have already been issued?);
− the perpetrator of the attack (is there enough information to suggest a specific attacker? Is he/she an insider/outsider? Has he/she initiated an attack before? How dangerous is he/she? What attacks is he/she likely to attempt?);
− the current status of the target (e.g. is it a business-critical system? What is its load at the moment? Is there any information or service that needs to be protected? What software/hardware can be used for response?);
− the privileges of the user account involved (e.g. what is the risk of damage to the system?);
− the probability of a false alarm (how reliable has the sensor/source that detected the incident been in the past? What is the level of confidence indicated by the Detection Engine about the occurrence of an intrusion?);
− the probability of a wrong decision (how effective has the Responder been so far? Have these responses been applied before in similar circumstances?).
Having assessed the above factors, response decisions must then be adapted to the context accordingly. For example, if the incident has been detected on a business-critical system, and the Detection Engine has indicated a low confidence, then the selection of a response with minimal impact upon the system would represent the most sensible course of action. That decision minimises the chance of critical operations being disrupted in the case of a false alarm. However, if the same scenario occurred in conjunction with previous alerts having already been raised (i.e. indicating that the current incident was part of a series of attacks), or if the overall alert status of the IDS was already high, then a more severe response would be warranted.
More comprehensive information about this decision process, and the information that would be assessed, is presented in earlier publications [15,16]. The other novel feature of the Responder is its ability to assess the appropriateness of response actions. This can be achieved in two ways: firstly, by considering the potential side effects of a response action, and secondly, by determining its practical effectiveness in containing or combating attacks. As identified in the introduction, the problem of side effects is a particular concern in the context of using active responses, because they have the potential to adversely affect legitimate users of the system. As a result, this needs to be considered before the Responder chooses to initiate a given action. There are a number of characteristics that would be relevant in this context:
− the transparency of the response action. In some cases it might be preferable to issue responses that do not alert the attacker to the fact that he/she has been noticed, whereas in others it could be preferable to issue a response that is very explicit.
− the degree to which the action would disrupt the user to whom it is issued. This is especially relevant in the context of a response action having been mistakenly issued against a legitimate user instead of an attacker. In situations where the Detection Engine has flagged an incident but expressed low confidence, it would be desirable to begin by issuing responses that a legitimate user would be able to overcome easily.
− the degree to which the action would disrupt other users, or the operation of the system in general. Certain types of response (e.g. termination of a process, restriction of network connectivity) would have the potential to affect more than just the perceived attacker, and could cause reduced availability for other people as well. As such, the Response Policy may wish to reserve such responses for only the most extreme conditions.
Each of these factors would need to be rated independently, and the information would be held in the database of available response actions (previously illustrated in Figure 1). The consideration of the ratings could then be incorporated into the response selection process as appropriate, and indeed into the formulation of the Response Policy by the system administrator. In addition to assessing the side effects, each response could also usefully be given an associated rating to indicate its perceived strength (which could inform the Responder and the administrator about its likely 'stopping power' in relation to an attacker). The second factor that would influence the appropriateness of a response in a particular context would be whether it had been used in the same context before. If the Responder keeps track of its previous response decisions, then these can subsequently be used as the basis for assessing whether the response actions were actually effective or not. This requires some form of feedback mechanism, which can then be used to refine the Response Policy. It is envisaged that feedback could be provided in two ways: explicitly by a system administrator, and implicitly by the Responder itself. In the former case, the administrator would inspect the alert history and manually provide feedback in relation to the responses that had been selected, to indicate whether or not they had been effective or appropriate to the incident. By contrast, the latter case would require the Responder itself to infer whether previous responses had been effective.
A simplified example of how it might do this would be to determine whether it had been required to issue repeated responses in relation to the same detected incident. If this was the case, then it could potentially infer that (a) the initial response actions were not effective against that type of incident, and (b) the last response action issued might form a better starting point on future occasions (i.e. upgrading and downgrading the perceived effectiveness of the responses when used in that context). Having obtained such feedback, it would be desirable for the system to automatically incorporate it into a refined version of the Response Policy. This, however, would be a non-trivial undertaking, and it is anticipated that a full implementation of the system would need to incorporate machine-learning mechanisms to facilitate a fully automated process. An alternative would be to collate the feedback and present it to the system administrator for later consideration when performing a manual overhaul of the Response Policy.
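A minimal Java sketch of the side-effect ratings and the implicit feedback rule described above is given below. The 1-to-5 scales and the upgrade/downgrade logic are illustrative assumptions only, not part of the prototype.

```java
// Hypothetical response-action record with the side-effect ratings discussed
// above, plus a naive implicit-feedback rule: if the same incident needed a
// repeated response, downgrade the first action and upgrade the last one.
import java.util.List;

class ResponseAction {
    final String name;
    final int transparency;      // 1 = covert .. 5 = very explicit
    final int userDisruption;    // disruption to the targeted user
    final int systemDisruption;  // disruption to other users / the system
    final int strength;          // perceived 'stopping power'
    int effectiveness = 3;       // learned rating, 1 (poor) .. 5 (good)

    ResponseAction(String name, int t, int u, int s, int strength) {
        this.name = name; transparency = t; userDisruption = u;
        systemDisruption = s; this.strength = strength;
    }
}

public class FeedbackDemo {
    // 'issued' lists the actions tried, in order, for one incident.
    // More than one entry implies the earlier ones were not effective.
    static void applyImplicitFeedback(List<ResponseAction> issued) {
        if (issued.size() < 2) return;            // a single response sufficed
        ResponseAction first = issued.get(0);
        ResponseAction last = issued.get(issued.size() - 1);
        first.effectiveness = Math.max(1, first.effectiveness - 1);
        last.effectiveness = Math.min(5, last.effectiveness + 1);
    }

    public static void main(String[] args) {
        ResponseAction challenge =
            new ResponseAction("authentication challenge", 4, 2, 1, 2);
        ResponseAction lock =
            new ResponseAction("lock account", 5, 5, 2, 4);
        applyImplicitFeedback(List.of(challenge, lock));
        System.out.println(challenge.name + " -> " + challenge.effectiveness); // 2
        System.out.println(lock.name + " -> " + lock.effectiveness);           // 4
    }
}
```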
4 A Prototype Responder System
As an initial step towards the development of the Responder, a prototype system has been implemented that demonstrates the main response features of IMS, including the ability to make decisions based on the information from IDS alerts and other contextual factors. The first element of the prototype is a console used to simulate intrusion conditions. In the absence of a full Detection Engine, or indeed genuine incidents, this is necessary to enable incident conditions to be configured before generating an alert to trigger the Responder's involvement. The parameters that can be adjusted from the console interface include those that are meant to be provided by the Detection Engine in the alert message, and are illustrated in Figure 2. The Responder can form a decision by monitoring (or determining) an additional set of contextual parameters, and then using these in conjunction with the ones included in the alert message. The second component of the prototype is the Responder itself, which is responsible for receiving the alerts and making response decisions according to the given context. The Responder largely bases its decisions upon the Response Policy, which can be accessed from the Responder module by selecting the Response Policy Manager tool. A user-friendly interface is provided for the review of Policy rules, which are represented via a hierarchical tree, where the incidents are at the highest level and the response actions lie at the lowest levels. At the most basic level, there will be a one-to-one correspondence between a type of incident and an associated type of response. However, a more likely situation is that the desired response(s) to an incident will vary depending upon other contextual factors, and the Policy Manager allows these alternative paths to be specified via intermediate branches in the tree. Between them, these intermediate branches comprise the conditions under which specific response actions are initiated for particular incidents.
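The hierarchical policy structure can be pictured with the following Java sketch, in which incidents map to condition branches and each matching branch contributes its leaf response actions. The tree contents and condition predicates are invented for illustration and do not reproduce the prototype's actual rules.

```java
// Minimal illustration of a hierarchical response-policy tree: an incident
// maps to condition branches; each branch that matches the current context
// contributes its leaf response actions. Contents are assumptions.
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

record Context(double confidence, boolean privilegedAccount,
               boolean businessCritical) {}

record Branch(Predicate<Context> condition, List<String> actions) {}

public class PolicyTree {
    static final Map<String, List<Branch>> TREE = Map.of(
        "authentication failure", List.of(
            // Low confidence, unprivileged account: gather more evidence.
            new Branch(c -> c.confidence() < 0.5 && !c.privilegedAccount(),
                       List.of("log alert", "start keystroke analysis")),
            // Privileged account or high confidence: act more intrusively.
            new Branch(c -> c.confidence() >= 0.5 || c.privilegedAccount(),
                       List.of("issue authentication challenge",
                               "increase monitoring level"))));

    static List<String> decide(String incident, Context ctx) {
        List<String> selected = new ArrayList<>();
        for (Branch b : TREE.getOrDefault(incident, List.of()))
            if (b.condition().test(ctx)) selected.addAll(b.actions());
        return selected;
    }

    public static void main(String[] args) {
        System.out.println(decide("authentication failure",
                new Context(0.3, false, false)));
        // -> [log alert, start keystroke analysis]
    }
}
```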
Fig. 2. Prototype Console Interface
The IMS Response Policy Manager is illustrated in Figure 3, with an example of response rules that could be specified in relation to an 'authentication failure' incident. In this case, had there been an alarm from the Detection Engine describing the successful login of a suspected masquerader, the Responder would check for the most recent update of related software, to ensure that it is not vulnerable, and initiate keystroke analysis and facial recognition (if available) to authenticate the user in a non-intrusive manner. Of course, the occurrence of the incident alone would not be sufficient for the latter to happen; in that case, the alarm would merely be added to a log file. For the previously mentioned responses to be issued, the intrusion confidence would need to be low (hence the Responder would need to collect more information about the incident), and the overall threat and the importance of the target would need to be at low levels as well, not justifying the issuing of more severe responses. Also, the account involved would need to be unprivileged, with a login time outside the normal pattern, in order for non-intrusive authentication to be issued. Had a privileged account been logged in at an abnormal time, then the urgency to collect more information about the incident would be greater, and thus more intrusive countermeasures could be allowed. Further authentication challenges, such as continuous keystroke analysis [17], the use of cognitive questions [18], and fingerprint recognition, could also be used. Other methods that could be utilised include session logging (for future reference or forensic purposes) and alerting the user himself/herself about the occurrence of the suspicious behaviour (aiming to provoke a reaction and possibly discourage him/her from any further unauthorised activity). Finally, another option would be redirection to a decoy system, in order to protect the integrity of the original target. Although this option would be better suited to the case of a server being compromised, it could still be an option for very sensitive environments, where a maximum level of security is required and minimal levels of risk are allowed. In any case, Figure 3 depicts an example of a security policy, which may or may not be optimal.
Fig. 3. IMS Response Policy Manager
Having determined the Response Policy, the Responder can make decisions about the alerts it receives. During normal operation, the Responder logs the details of responses that have been issued, so that they can be tracked and reviewed by a system administrator. This is achieved via the Alert Manager interface (see Figure 4), which contains a list of suspected incidents, allowing them to be selected to reveal the response action(s) initiated for them. Each alert contains information about the incident itself, and the reasoning for the associated response decision. When viewing the alerts, it is also possible for the administrator to review the response decision that was made by the system, and provide feedback about the effectiveness of the actions selected. A full implementation of the Responder would use this feedback as the basis for automatic refinement of the response policy over time.
Fig. 4. IMS Responder: Alert Manager
5 Conclusions and Future Work
This paper has presented the requirements for enhanced intrusion response and the operational characteristics of an automated response architecture that enables flexible, escalating response strategies. The prototype system developed provides a proof-of-concept, and demonstrates the process of creating and managing a flexible response policy, as well as allowing intrusion scenarios to be simulated in order to test the response actions that would be initiated. Although the IMS approach as a whole would not necessarily be suited to all computing environments, it is considered that the automated response concept could still be more generally applicable. Future work could usefully include the integration of machine learning algorithms into the Responder implementation, in order to enable it to learn from the effectiveness (or otherwise) of previous response decisions and automatically refine the response policy accordingly. With feedback from experience, and the ability to learn and to assess its own decision-making capability, the Responder could eventually attain a sufficient level of confidence to operate autonomously.
Acknowledgments. The research presented in this paper has been supported by funding from the State Scholarships Foundation (SSF) of Greece.
References
1. CERT Coordination Center: Security of the Internet, Vol. 15, The Froehlich/Kent Encyclopedia of Telecommunications, Marcel Dekker, New York (1997) 231–255
2. Richardson, R.: 2003 CSI/FBI Computer Crime and Security Survey (2003) http://www.gocsi.com/
3. Power, R.: 2002 CSI/FBI Computer Crime and Security Survey, Vol. VIII, No. 1, Computer Security Issues and Trends (2002) 10–11, 20–21
4. Denning, D.E.: An Intrusion-Detection Model, Vol. SE-13, No. 2, IEEE Transactions on Software Engineering (1987) 222–232
5. Allen, J., Christie, A., et al.: State of the Practice of Intrusion Detection Technologies, Technical Report CMU/SEI-99-TR-028, Carnegie Mellon University (2000) http://www.sei.cmu.edu/publications/documents/99.reports/99tr028/99tr028abstract.html
6. Mukherjee, B., Heberlein, L.T., Levitt, K.N.: Network Intrusion Detection, IEEE Networks 8, No. 3 (1994) 26–41
7. Schneier, B.: Secrets and Lies: Digital Security in a Networked World, John Wiley & Sons (2000)
8. Amoroso, E.: Intrusion Detection: An Introduction to Internet Surveillance, Correlation, Traps, Trace Back, and Response, Second Printing, Intrusion.Net Books, New Jersey (1999)
9. Bace, R., and Mell, P.: NIST Special Publication on Intrusion Detection Systems, National Institute of Standards and Technology (NIST), http://csrc.nist.gov/publications/drafts/idsdraft.pdf (2001)
10. Newman, D., Snyder, J., and Thayer, R.: Crying Wolf: False Alarms hide attacks, Network World Fusion Magazine, http://www.nwfusion.com/techinsider/2002/0624security1.html/ (2002)
11. Cohen, F.B.: Simulating Cyber Attacks, Defences, and Consequences, The Infosec Technical Baseline studies, http://all.net/journal/ntb/simulate/simulate.html (1999)
12. Lee, S.Y.J.: Methods of response to IT system intrusions, MSc thesis, University of Plymouth, Plymouth (2001)
13. Cheung, S., and Levitt, K.N.: Protecting Routing Infrastructures from Denial of Service Using Cooperative Intrusion Detection, Proceedings of the New Security Paradigms Workshop, Langdale, Cumbria, UK (1997) http://riss.keris.or.kr:8080/pubs/contents/proceedings/commsec/283699/
14. Furnell, S.M., and Dowland, P.S.: A conceptual architecture for real-time intrusion monitoring, Vol. 8, No. 2, Information Management & Computer Security (2000) 65–74
15. Papadaki, M., Furnell, S.M., Lines, B.M., and Reynolds, P.L.: A Response-Oriented Taxonomy of IT System Intrusions, Proceedings of Euromedia 2002, Modena, Italy (2002) 87–95
16. Papadaki, M., Furnell, S.M., Lee, S.J., Lines, B.M., and Reynolds, P.L.: Enhancing response in intrusion detection systems, Vol. 2, No. 1, Journal of Information Warfare (2002) 90–102
17. Dowland, P., Furnell, S., and Papadaki, M.: Keystroke Analysis as a Method of Advanced User Authentication and Response, Proceedings of IFIP/SEC 2002 - 17th International Conference on Information Security, Cairo, Egypt (2002) 215–226
18. Irakleous, I., Furnell, S., Dowland, P., and Papadaki, M.: An experimental comparison of secret-based user authentication technologies, Vol. 10, No. 3, Journal of Information Management & Computer Security (2002) 100–108
A Secure Multimedia System in Emerging Wireless Home Networks
Nut Taesombut, Richard Huang, and Venkat P. Rangan
Department of Computer Science and Engineering, University of California, San Diego
La Jolla, CA 92093-0114, USA
{ntaesomb,ryhuang,venkat}@cs.ucsd.edu
Abstract. With their high bandwidth and extensive transmission range, Wireless Local Area Networks (WLANs) have become a compelling technology for developing next-generation home entertainment systems. In the near future, home multimedia systems will rely on wireless networks in lieu of typical wired networks for interconnecting media appliances. Wireless connectivity will enable many novel and advanced features to entertain and comfort a home user. However, before such systems are viable and widely acceptable, there are security and copyright protection problems that need to be addressed. Since the system relies on wireless communication, the privacy of streaming media content becomes a great concern among media content owners and legitimate consumers, who wish to protect their intellectual property against illegal use. In this paper, we present a gateway-based architecture for secure wireless home multimedia systems. We have developed a protocol to ensure secure bootstrap registration of new media devices and protect communication within a wireless home network. We have built a system prototype to evaluate the timing overhead of the device registration process and identify the bottleneck.
1 Introduction
In the past, the lack of high speed and broad transmission range was a main obstacle that hindered the widespread deployment of wireless communication technologies. With an advanced "third generation" (3G) mobile phone, a user can exchange only non-bursty-traffic information, such as voice, small images and short video clips. Nonetheless, recent advancements in the IEEE 802.11 standard [1] yield a new paradigm for considerably higher-bandwidth communication over a wireless medium. As a consequence, multimedia streaming, which generally requires a bandwidth of up to several megabits per second, becomes feasible in a wireless network. Such advanced wireless technology has catalyzed the development of digital multimedia systems for homes and small enterprises today. It allows media appliances and conventional computers to communicate with each other using a radio signal rather than through a wired cable.
One key advantage of adopting WLAN for implementing multimedia systems over its wired counterpart is that all the wiring difficulties during system installation and adjustment can be eliminated. Furthermore, a device can be relocated unrestrictedly and seamlessly within the wireless network, thus allowing more flexibility and convenience for a user of the system. In the near future, WLANs are expected to increasingly replace wired networks in interconnecting media appliances in homes and small workplaces. We envision next-generation home entertainment systems built from off-the-shelf media devices that can be automatically recognized by the system (plug and play) and communicate with each other wirelessly. The Internet has revolutionized the way digital media content is produced, distributed and consumed. Varying kinds of media content, such as movies and songs, are now available online and can be delivered to home users at low cost. Nevertheless, not all media content is available for public consumption; some is for sale or controlled by other restriction rules. In a wireless network, which relies on a broadcast medium, one can easily intercept streaming media content from a network neighborhood and illegally forward it to a display device. Due to the digital nature of today's media content, once it is compromised, an unlimited number of copies can be made and redistributed. This has posed a great concern to media content owners who wish to protect their intellectual property against online piracy. Therefore, to make such wireless home multimedia systems viable, there are significant digital media content protection and security challenges that need to be addressed. It has been shown that a wireless network based on the IEEE 802.11 standard is untrustworthy. Even though there are a few techniques that provide authentication, privacy and access control in wireless LANs [1,2], in practice these security mechanisms are ineffective and usually not enabled by default. In addition, several flaws have been found in the design and implementation of WLAN [3,4,5]. A smart attacker can eavesdrop on an ongoing communication in, obtain unauthorized access from, and inject a bogus message into the wireless network. For this reason, the existing security mechanisms should not be counted on to provide reliable communication in WLAN. In this paper, we present a gateway-based architecture for the wireless home media network and develop a secure registration protocol for it. The primary goal of our proposed architecture is to secure a wireless home network as well as to facilitate digital rights management (DRM) for media content protection. The protocol aims to ensure secure bootstrap registration of new media appliances. It provides mechanisms for mutual authentication and trust establishment between media appliances and a home gateway. At the end of the protocol, session keys will be generated and distributed to the participating gateway and media appliance. We demonstrate an application of these keys with a lightweight video encryption algorithm to encrypt video streams. The major challenge in the design and implementation of the system is the resource constraints of a lightweight media appliance. A media appliance, such as a TV or DVD player, generally has limited processing capabilities and memory resources. We overcome such limitations by avoiding computationally expensive operations at the media appliance.
We expect our full-fledged system to provide strong security and content protection guarantees for wireless home multimedia networks. The remainder of this paper is organized as follows: in Section 2, we describe the gateway-based architecture of the wireless multimedia system, followed by its components and advantages. In Section 3, we present the secure registration protocol and its security analysis. In Section 4, we illustrate an application of the generated session keys to protect the privacy of digital video. In Section 5, we present the system prototype built to evaluate the performance of the device registration process. In Section 6, we discuss future work and conclude the paper.
2 Wireless Home Multimedia System
This section presents the gateway-based architecture of the wireless home media network. The proposed architecture is illustrated in Figure 1. The secure multimedia system can be viewed as two connected networks: (1) a wireless home network and (2) a wired global network. All communication across these two networks is managed through a master gateway. The wireless home network is based on WLAN technology. It provides a hosting environment to media appliances (e.g. TVs, DVD players and speakers). As communication within the WLAN is untrustworthy, when a media appliance first emerges in the network, it does not trust any other device. On the other hand, the wired global network is the well-known Internet, in which an authentication server and media content providers reside. In contrast to the WLAN, any communication between two parties in the Internet is assumed to be trusted, being secured using SSL. The rest of this section will describe the system components and the advantages of the gateway-based approach.
Fig. 1. Architecture of Wireless Home Multimedia System
2.1 System Components
The wireless home multimedia system comprises four main components: namely, (1) media device; (2) active gateway; (3) media content provider; and (4) authentication server. The detail of each component is given in this section.
Media Device. Each media appliance connects to the home WLAN through a lightweight interface called a Buddy [6]. The buddy is equipped with an encoder and/or decoder that corresponds to the media types that the device is capable of playing. However, as the buddy is designed to be a simple and inexpensive attachment, it does not include any user interface and cannot perform processing-intensive cryptographic operations. For convenience, a media appliance together with its buddy will be referred to as a media device.
Active Gateway. A master gateway called the Actiway (Active gateway) is a resourceful gateway located between the wireless home network and the Internet. Connectivity to the Internet via the gateway allows media content providers to deliver media content to remote devices. The gateway maintains a record of all the devices owned by the user, enforces access control and copyright policy, and mediates communication among media devices. Since the system aims to support varying kinds of media devices, many of which may have their own specific data formats, the gateway also functions as a media switch, capable of performing necessary media type conversions and streaming media content from multiple sources to multiple playback devices.
Media Content Provider. A media content provider is a storage server that contains multimedia presentations, such as movies, songs, etc. These documents can be accessed from a media device in the wireless home network. If the media content is available for public consumption, a home user can download the content via the gateway to his display device and replay it many times. On the other hand, a user may be required to pay for the content before he can get access to it.
Authentication Server. An authentication server (or AS) is a trusted server and may be specific to a device manufacturer. It maintains a secured database that contains the unique ID, embedded keys and access key of each genuinely manufactured device. When a device first emerges in the network, the device and gateway do not trust each other. The AS mediates the establishment of trust and a secure channel between them.
2.2 System Advantages
We have chosen to use the gateway-based architecture for the wireless home multimedia system. In this architecture, the gateway is a master controller over all media devices in the wireless home network. This section summarizes the benefits resulting from using the gateway-based approach.
Media Switching. In a multimedia-based network, the communicated information is inherently media content. Multiple media sources can simultaneously deliver media content to multiple media sinks. With the gateway as a central media switch, media content can be appropriately forwarded to media sinks, and conversion between different media formats can be efficiently managed.
Media Synchronization. When a media source streams media content to more than one media sink, the emergence of asynchrony among the different media sinks is imminent, due to the unreliable nature of a wireless network. The gateway can perform as a conductor to ensure synchrony over all media sinks. Using adaptive feedback techniques for media synchronization and continuity, as described in [7], the media sinks can periodically send synchronization information back to the gateway, and the gateway can use this information to dynamically adapt the media streaming rate appropriately for each individual media sink.
Centralized Access Control. As an access controller, the gateway controls which media devices can enter the wireless network and participate in the home multimedia system. All communication between media devices and other machines in the Internet, and among the devices themselves in the home network, must go through the gateway. A device will not be allowed to communicate with any other device in the system unless it can properly identify and prove itself to be a trusted (authentic) one.
Digital Rights Management. The gateway-based architecture of the wireless home multimedia system facilitates the concept of Digital Rights Management (DRM). Instead of streaming media content to (possibly dishonest) media devices directly, the media content provider delivers the content through the gateway. The gateway can enforce media rights policies provided by a media content provider. The protected media content is buffered at the gateway and then forwarded to the media device in a manner corresponding to the restriction rules. The gateway ensures that the content will never be copied or distributed illegally.
3 Secure Registration Protocol
In this section, we present the Secure Registration Protocol (SRP) for the wireless home multimedia system. Here, we provide only an overview of the protocol; the detailed description can be found in our previous paper [8]. The primary goals of the protocol are to provide a secure bootstrap registration for a new media device and to establish a secure communication channel between the device and the home gateway.
3.1 Protocol Overview
We now present a high-level description of the protocol. Figure 2 illustrates all the messages exchanged in the protocol (see the Appendix for the detailed message formats and the notation we use). There are three main stages in the protocol, as summarized below.
[Figure 2 (message sequence chart): the Device, Gateway and AS exchange, in order, the messages DevReq, DevAut, DevAutRes, GatAut, GatAutRes, DevReqRes, DevKeyVer, GatKeyVer and DevFin.]
Fig. 2. Secure Registration Protocol
Authentication of Media Device to Authentication Server. The device needs to identify and authenticate itself to the authentication server in order to show that it is genuine. When manufactured, the device is associated with a globally unique ID (DeviceID) and embedded with two secret keys (KE, KMAC); the former is used as a key for encryption functions and the latter is used as a key for message authentication code (MAC) generation functions. These keys are known only to the device itself and the authentication server. Therefore, to authenticate, the device creates a new request message, called DevReq, with an HMAC record keyed with KMAC, and broadcasts it over the WLAN to the gateway. The gateway then copies all information in the received message, together with its GatewayID, into a new message, called DevAut, and forwards it to the authentication server via a secure channel. Since the authentication server also knows KMAC, it can check whether the device is authentic by re-computing the HMAC record. If both HMACs match, the authentication server accepts and trusts the device. The server then notifies the gateway, with DevAutRes, that the device can be trusted.
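As a concrete (and heavily simplified) illustration of this step, the following Java sketch builds a DevReq-style payload and its HMAC with the standard javax.crypto API, and shows the check the authentication server would perform. The field encoding and the choice of HmacSHA256 are our own assumptions; the actual prototype (Section 5) was written in GNU C with OpenSSL and HMAC [9].

```java
// Illustrative construction and verification of a DevReq-style message:
// payload = DeviceID, Dcount; tag = HMAC over the payload keyed with KMAC.
// Encoding, algorithm choice (HmacSHA256) and key material are assumptions.
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class DevReqDemo {
    static byte[] hmac(byte[] key, byte[] data) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        return mac.doFinal(data);
    }

    public static void main(String[] args) throws Exception {
        byte[] kmac = new byte[32];           // embedded KMAC (placeholder)
        String deviceId = "DEV-0001";
        long dcount = 42;                     // device counter (replay defence)

        byte[] payload = (deviceId + "|" + dcount)
                .getBytes(StandardCharsets.UTF_8);
        byte[] tag = hmac(kmac, payload);     // DevReq = payload || tag

        // Verification at the AS: recompute the HMAC with its own copy of
        // KMAC and compare in constant time to decide device authenticity.
        boolean genuine = MessageDigest.isEqual(tag, hmac(kmac, payload));
        System.out.println("device genuine: " + genuine);
    }
}
```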
Authentication of Gateway to Authentication Server. The gateway also has to prove to the authentication server that it is authorized to control the device. At the time of purchase, the user is given a device together with its access key (a credential whose possessor can claim to be a rightful controller of the device). The user can delegate his access rights over the device to the gateway by entering the access key into the gateway. The gateway sends the hashed value of the access key as a certificate in GatAut to the authentication server.
Session Key Distribution. If the access key is valid and the corresponding device has never been associated with any other gateway, the authentication server associates the device with the requesting gateway. The server then grants a ticket to the gateway in the GatAutRes message (possession of the ticket proves to the device that the gateway is authorized to control it; see M2 in Table 4 for the detail of the ticket). Next, the authentication server generates a master key and securely distributes it to the device and gateway (via the GatAutRes and DevReqRes messages). At this point, both the device and gateway use the master key to generate two session keys (SKE, SKMAC), which they will use to establish a secure communication channel. However, before they start the secure session, they need to verify that their derived keys match (with the DevKeyVer, GatKeyVer and DevFin messages).
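One possible shape for the key-generation and verification steps is sketched below in Java. The paper does not specify how the session keys are derived from the master key, so deriving SKE and SKMAC as HMACs over distinct labels is purely an assumption; the challenge/response mirrors the E_SKE(N), E_SKE(N − 1) exchange detailed in Table 4.

```java
// Sketch of session-key generation and the DevKeyVer/GatKeyVer check.
// ASSUMPTION: SKE and SKMAC are derived by keying HMAC-SHA256 with the
// master key over distinct labels; the paper leaves the derivation open.
import java.nio.ByteBuffer;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class SessionKeyDemo {
    static byte[] derive(byte[] masterKey, String label) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(masterKey, "HmacSHA256"));
        return Arrays.copyOf(mac.doFinal(label.getBytes()), 16); // 128-bit key
    }

    static byte[] aes(int mode, byte[] key, byte[] data) throws Exception {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding"); // demo only
        c.init(mode, new SecretKeySpec(key, "AES"));
        return c.doFinal(data);
    }

    public static void main(String[] args) throws Exception {
        byte[] master = new byte[32];
        new SecureRandom().nextBytes(master);   // stands in for MasterKey

        // Both sides derive the same session keys independently.
        byte[] ske = derive(master, "enc");     // session encryption key
        byte[] skmac = derive(master, "mac");   // session HMAC key (unused here)

        // DevKeyVer: device sends E_SKE(N); GatKeyVer: gateway replies with
        // E_SKE(N - 1), proving that it holds the same SKE.
        long n = new SecureRandom().nextLong();
        byte[] challenge = aes(Cipher.ENCRYPT_MODE, ske,
                ByteBuffer.allocate(8).putLong(n).array());
        long received = ByteBuffer.wrap(
                aes(Cipher.DECRYPT_MODE, ske, challenge)).getLong();
        byte[] reply = aes(Cipher.ENCRYPT_MODE, ske,
                ByteBuffer.allocate(8).putLong(received - 1).array());
        long check = ByteBuffer.wrap(
                aes(Cipher.DECRYPT_MODE, ske, reply)).getLong();
        System.out.println("keys match: " + (check == n - 1));
    }
}
```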
3.2 Protocol Analysis and Security Considerations
The goals of the protocol and the secure system are the security of the wireless network and copyright protection of the communicated media content. This section shows how these security properties are achieved.
Preventing Unauthorized Devices from Entering a Home Network. If an unauthorized device attempts to connect to a home network, it needs to supply the authentication server with its private KMAC key, which must not have been previously associated with any other gateway. Since this key is securely embedded in the device and is never exposed, it cannot be compromised. In addition, the user needs to supply the device's access key in the authentication process. If an unauthorized device requests to join the network, the user can observe the emergence of this device from the gateway's user interface. Either of these processes will fail for an unauthorized device.
Preventing Eavesdropping and Message Forgery over Device-Gateway Communication. Following a successful registration, all communication between device and gateway is either encrypted or noise-modulated with SKE, which is generated from the confidential master key known only to the gateway, the participating device and the authentication server. Without knowledge of this key, an attacker cannot derive any intelligible information from the obscured communication in the wireless home network.
To verify the integrity of messages transmitted over the wireless home network and the identity of their sender, every message has attached an HMAC digest generated using the secret key SKMAC. Unless this digest is valid, the received message is rejected and discarded. Therefore, an attacker cannot impersonate a trusted entity or inject a counterfeit message into the home network.
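The per-message protection described here might be realised along the lines of the following Java sketch, which applies encrypt-then-MAC framing with the two session keys and discards any frame whose digest does not verify. The framing layout and the use of ECB mode are simplifying assumptions for illustration only.

```java
// Illustrative encrypt-then-MAC framing for device<->gateway messages:
// frame = AES_SKE(payload) || HMAC_SKMAC(ciphertext). A frame with an
// invalid digest is rejected before decryption. Choices are assumptions.
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class SecureFrameDemo {
    static final int TAG_LEN = 32;                 // HMAC-SHA256 output size

    static byte[] seal(byte[] ske, byte[] skmac, byte[] payload)
            throws Exception {
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding"); // demo only
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(ske, "AES"));
        byte[] ct = c.doFinal(payload);
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(skmac, "HmacSHA256"));
        byte[] frame = Arrays.copyOf(ct, ct.length + TAG_LEN);
        System.arraycopy(mac.doFinal(ct), 0, frame, ct.length, TAG_LEN);
        return frame;
    }

    static byte[] open(byte[] ske, byte[] skmac, byte[] frame)
            throws Exception {
        byte[] ct = Arrays.copyOf(frame, frame.length - TAG_LEN);
        byte[] tag = Arrays.copyOfRange(frame, ct.length, frame.length);
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(skmac, "HmacSHA256"));
        if (!MessageDigest.isEqual(tag, mac.doFinal(ct)))
            throw new SecurityException("invalid HMAC: message discarded");
        Cipher c = Cipher.getInstance("AES/ECB/PKCS5Padding");
        c.init(Cipher.DECRYPT_MODE, new SecretKeySpec(ske, "AES"));
        return c.doFinal(ct);
    }

    public static void main(String[] args) throws Exception {
        byte[] ske = new byte[16], skmac = new byte[32]; // placeholder keys
        byte[] frame = seal(ske, skmac, "stream chunk".getBytes());
        System.out.println(new String(open(ske, skmac, frame)));
    }
}
```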
4 Lightweight Video Encryption
At the end of the SRP protocol, session keys (SKE, SKMAC) are generated and distributed to the device and the gateway. These session keys will be used to establish a secure communication path between the two parties, which guarantees communication privacy and message authentication. This section presents an application of the session keys: a lightweight video encryption scheme appropriate for resource-constrained devices.
4.1 Background
MPEG (Moving Picture Experts Group) standards, including MPEG-1, MPEG-2, MPEG-4, and MPEG-7, are widely used in multimedia applications. MPEG standards consist of three parts: audio, video, and the interleaving of the two streams. In this paper, we consider MPEG-1 video, although similar operations can be applied to the other MPEG models. An MPEG sequence consists of units called Groups Of Pictures (GOPs). Each GOP contains I (intracoded) frames, P (predicted) frames, and B (bidirectionally predicted) frames. Motion estimation in MPEG operates on macroblocks. We are interested in the intracoded macroblocks; they can occur in P and B frames in addition to I frames. This is important because some selective methods encrypt only the I frames, but leave I blocks in P and B frames unencrypted. This is not sufficiently effective, because I blocks usually contain the most important aspects of a frame.
4.2 Lightweight Video Encryption
Making the observation that P and B frames are interpolated from I frames, we decided to encrypt only I frames with the 128-bit session keys. We experimented with encrypting combinations of Y, U, and V blocks. Encrypting any of the Y, U, or V blocks by itself does not fully make the frames unrecognizable, whereas encrypting Y and one of U or V is sufficient to result in an unrecognizable picture. Our lightweight video encryption works by applying DES to the AC components of the Y and V blocks of the I frames. However, to provide a higher level of security, we can optionally encrypt the I blocks within the P and B frames. Another possibility would be to encrypt only I blocks in all frames. Figure 3 shows the results of the encryption.
Fig. 3. From upper left to bottom right: unencrypted I frame, encrypted I frame, P frame from encrypted I frame and unencrypted I blocks, and P frame from encrypted I frame and encrypted I blocks
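The selective scheme can be pictured with the following greatly simplified Java sketch, which encrypts only the Y and V data of I frames under a DES key, leaving P and B frames untouched. The frame model is invented for illustration; real MPEG bitstream parsing and the exact handling of AC coefficients are omitted.

```java
// Grossly simplified illustration of selective I-frame encryption: only the
// Y and V block data of I frames are enciphered; P and B frames pass through.
// Frame model and cipher details are assumptions; MPEG parsing is omitted.
import java.util.List;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

class Frame {
    enum Type { I, P, B }
    final Type type;
    byte[] yBlocks, uBlocks, vBlocks;   // per-component coefficient data
    Frame(Type t, byte[] y, byte[] u, byte[] v) {
        type = t; yBlocks = y; uBlocks = u; vBlocks = v;
    }
}

public class SelectiveEncryptDemo {
    static byte[] encrypt(byte[] key, byte[] data) throws Exception {
        Cipher c = Cipher.getInstance("DES/ECB/PKCS5Padding"); // demo only
        c.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "DES"));
        return c.doFinal(data);
    }

    // Encrypt Y and V data of I frames only, as in the scheme described above.
    static void protect(List<Frame> gop, byte[] desKey) throws Exception {
        for (Frame f : gop) {
            if (f.type != Frame.Type.I) continue;  // leave P/B frames alone
            f.yBlocks = encrypt(desKey, f.yBlocks);
            f.vBlocks = encrypt(desKey, f.vBlocks);
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] desKey = "8bytekey".getBytes();     // 64-bit DES key material
        Frame i = new Frame(Frame.Type.I, new byte[16], new byte[8], new byte[8]);
        Frame p = new Frame(Frame.Type.P, new byte[16], new byte[8], new byte[8]);
        protect(List.of(i, p), desKey);
        System.out.println("I-frame Y bytes after encryption: " + i.yBlocks.length);
    }
}
```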
5 System Prototype
This section focuses on the system prototype we have built to demonstrate the potential of the proposed architecture. We give an overview of the prototype and present its implementation. Here, we also describe the experiment we performed to evaluate the performance of the device registration process.
5.1 Implementation
The main objective of implementing the prototype is to illustrate and evaluate the secure device registration process of the system. Figure 4 shows the physical structure of the prototype. As can be seen, the prototype consists of a media device, a gateway, an authentication server and a wireless access point. The gateway, the authentication server and the access point are connected through a 10 Mbits/sec LAN, while the media device connects to the access point via an 11 Mbits/sec WLAN. The system specification is as follows: the media device is a 500 MHz Pentium III Dell Inspiron 7500 with 192 MB RAM and a Lucent WaveLan 10/100 PC card; the gateway is a 1.8 GHz Pentium 4 Sony PCG-GRS100P with 512 MB RAM and an Intel(R) PRO/100 VE network card; and the authentication server is an 800 MHz Celeron HP desktop with 128 MB RAM and a Linksys NC100 Fast Ethernet adapter. All machines ran RedHat 8.0 with kernel version 2.4.18-14, and all processes were written in GNU C. The authentication server had a database management system (DBMS) running on it; the DBMS we used is MySQL version 3.23 (www.mysql.com). The database stored a record of each genuinely manufactured device, comprising the DeviceID, the embedded secret keys and the current value of the device counter. The SSL and cryptographic library is the OpenSSL library version 0.9.7 (www.openssl.org). The keyed-hashing function and the symmetric key encryption we used are HMAC [9] and AES [10], respectively.
Fig. 4. System Prototype
5.2 Performance of Secure Registration Process
We used the setup described above to perform an experiment. The main goal of the experiment was to measure the timing overhead of the secure registration process of the system and to identify the bottleneck. The overhead includes the processing time of the process at each device and the communication time among them. The experiment we performed is as described below: initially, the gateway had established a secure communication channel with the authentication server using SSL and was listening for a new request from the device. We then started running the device process. We measured the total elapsed time in the secure registration process, starting from the time the device sent the DevReq message until the gateway received the DevFin message. Note that during the time the experiment was performed, no other nodes were active; therefore, we assume that there was no interference with our experimental results. Table 1 shows the average elapsed time for the secure registration process to complete. The total elapsed time for the entire process is approximately 303.6 msec. As can be seen, the communication time dominates the processing time, accounting for 96.5 percent of the total elapsed time. Specifically, most of the communication time was spent in the conversation between the gateway and the authentication server (230.6 msec), as compared to that between the device and the gateway (62.5 msec). Next, we consider the processing-time overhead of the secure registration process. Unfortunately, due to our resource limitations, the machines we used to perform the experiment had different processing capabilities.
Table 1. Average elapsed time for the SRP protocol
                                        Time (msec)
Total elapsed time                            303.6
Processing time                                10.5
  - at device                                   1.0
  - at gateway                                  2.0
  - at auth server                              7.5
Communication time                            293.1
  - between device and gateway                 62.5
  - between gateway and auth server           230.6
Table 2. Processing time for each category of operations at the device
                                        Time (msec)
Key-hashing function (HMAC)                     0.5
Encryption function                             0.1
Other operations                                0.4
The gateway process, which was run on the fastest machine (1.8 GHz), would have taken longer to finish had it been run on the other machines in the experiment. Nonetheless, as can be observed in Table 1, the process at the authentication server incurs the highest overhead. From our detailed analysis, this was a consequence of the time taken for the AS process to query the database system. In our experiment, we were specifically interested in the processing-time overhead at the media device, since in practice only the device would have limited processing resources. Table 2 shows the processing time of each category of operations executed at the device. As we expected, the operations that dominated the others were cryptographic operations, including keyed-hashing functions and symmetric key encryptions. Although such operations are computationally expensive, they are required for our system to meet its security goals. In fact, the processing-time overhead would have been much higher had we instead used public key encryption and digital signatures. Since the communication time accounts for the largest portion of the total elapsed time, minimizing the number of exchanged messages would greatly reduce the timing overhead. It is possible to combine the device authentication process and the gateway authentication process by merging DevAut and GatAut and sending them together. Another possible optimization is to start the session key verification process at the gateway rather than at the device: the gateway can generate a random number, encrypt it and send the outcome with the DevReqRes message to challenge the device. Only two additional messages (DevKeyVer and GatKeyVer) are then required to complete the entire verification process and the protocol. As a result, one round-trip time between the gateway and the authentication server and one one-way trip time between the device and the gateway can be eliminated.
6 Conclusion and Future Work
We have presented the design and implementation of our gateway-based architecture, which aims to secure communication in wireless home networks and protect digital media content from online piracy. A major service provided by the system is secure registration, whereby new media devices can be securely registered at the master gateway. In addition, we have developed a system prototype and performed an experiment to investigate the timing overhead of the registration process. The results we obtained are promising: it takes approximately 0.3 seconds to complete the entire registration process, and the processing resources used at the device are minimized. The next step of our project is to design and implement a lightweight communication protocol suitable for other multimedia-based applications and to apply Digital Rights Management (DRM) technology to our prototype.
References
1. L.M.S.C. of the IEEE Computer Society: Wireless LAN Medium Access Control. IEEE Standard 802.11, 1999.
2. Lucent Orinoco: User's Guide for the ORiNOCO Manager's Suite, November 2000.
3. W. A. Arbaugh, N. Shankar, and Y. J. Wan: Your 802.11 Wireless Network Has No Clothes, http://www.cs.umd.edu/waa/wireless.pdf.
4. J. Walker: Unsafe at Any Key Size: Analysis of the WEP Encapsulation. Technical Report 03628E, 802.11 Committee, March 2002.
5. N. Borisov, I. Goldberg, and D. Wagner: Intercepting Mobile Communications: The Insecurity of 802.11. In Proceedings of the 7th Conference on Mobile Computing and Networking, March 2002.
6. I. Ramani, R. Bharadwaja, and P. V. Rangan: Location Tracking for Media Appliances in Wireless Home Networks. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME'03), July 2003.
7. S. Ramanathan and P. V. Rangan: Adaptive Feedback Techniques for Synchronized Multimedia Retrieval over Integrated Networks. In IEEE/ACM Transactions on Networking, 1993.
8. N. Taesombut, V. Kumar, R. Dubey, and P. V. Rangan: A Secure Registration Protocol for Media Appliances in Wireless Home Networks. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME'03), July 2003.
9. M. Bellare, R. Canetti, and H. Krawczyk: Message Authentication Code Using Hash Functions: The HMAC Construction. In Proceedings of the 16th International Cryptology Conference (CRYPTO'96), 1996.
10. J. Daemen and V. Rijmen: AES Proposal: Rijndael. In Proceedings of the 1st AES Conference, August 1998.
Appendix
Table 3 shows all notations and terms used in the paper and Table 4 illustrates the detailed format of the messages involved in the secure registration protocol.
Table 3. Notations and terms used in the paper.
M1, M2: concatenation of two messages M1 and M2
DeviceID: device's identification
GatewayID: gateway's identification
N: random number for key verification
Dcount: counter maintained by device
Scount1, Scount2: two separate counters maintained by authentication server
AccKey: device's access key
MasterKey: master key
KE: embedded key for encryption function, shared between device and authentication server
KMAC: embedded key for HMAC function, shared between device and authentication server
SKE: session key for encryption function, shared between device and gateway
SKMAC: session key for HMAC function, shared between device and gateway
HMAC_k(): keyed-hashing function for message authentication code using key k
H(): one-way hashing function
E_k(): encryption function using key k
DevAcc: pre-determined number indicating that the device authentication is successful
GateAcc: pre-determined number indicating that the gateway authentication is successful
Finish: pre-determined number indicating that the protocol is complete

Table 4. Detail of messages exchanged in the SRP protocol.
DevReq (Device → Gateway): M1, HMAC_KMAC(M1), where M1 = DeviceID, Dcount
DevAut (Gateway → AS): GatewayID, M1, HMAC_KMAC(M1)
DevAutRes (AS → Gateway): DevAcc, GatewayID, DeviceID, Scount1
GatAut (Gateway → AS): GatewayID, DeviceID, Scount1, H(AccKey, Scount1)
GatAutRes (AS → Gateway): GateAcc, M2, MasterKey, HMAC_KMAC(M2), where M2 = GatewayID, DeviceID, E_KE(MasterKey), Scount2, Dcount
DevReqRes (Gateway → Device): M2, HMAC_KMAC(M2)
DevKeyVer (Device → Gateway): M3, HMAC_SKMAC(M3), where M3 = GatewayID, DeviceID, E_SKE(N)
GatKeyVer (Gateway → Device): M4, HMAC_SKMAC(M4), where M4 = GatewayID, DeviceID, E_SKE(N − 1)
DevFin (Device → Gateway): M5, HMAC_SKMAC(M5), where M5 = Finish, DeviceID, GatewayID
Java Obfuscation with a Theoretical Basis for Building Secure Mobile Agents
Yusuke Sakabe, Masakazu Soshi, and Atsuko Miyaji
School of Information Science, Japan Advanced Institute of Science and Technology
1-1 Asahidai, Tatsunokuchi, Nomi, Ishikawa 923-1292, JAPAN
{y-sakabe,soshi,miyaji}@jaist.ac.jp
Abstract. In this paper we propose novel techniques to obfuscate Java programs for developing secure mobile agent systems. Our obfuscation techniques take advantage of the polymorphism and exception mechanisms of object-oriented languages and can drastically reduce the precision of points-to analysis of the programs. We show that determining precise points-to analysis in obfuscated programs is NP-hard, and this fact provides a theoretical basis for our obfuscation techniques. Furthermore, in this paper we present some empirical experiments, whereby we demonstrate the effectiveness of our approaches.
Keywords: mobile agents, security, obfuscation, static analysis, computational complexity
1 Introduction
In recent years, mobile agent systems have received much attention as a new computing paradigm. Here a mobile agent is a program that migrates through several hosts and performs a specific job on behalf of users. Mobile agents can not only move to other hosts, but can also offer advanced computing services such as information retrieval and cooperative computation with other agents. In addition, because a mobile agent is self-contained, i.e., it is accompanied by the resources necessary for its execution, it can run on a host offline. Consequently, mobile agent systems can realize a promising computing environment for the next generation, in particular an infrastructure suitable for electronic commerce. In such an application area, security is of critical importance, since the mobile agents process sensitive data of users on their behalf. In order to attain security for mobile agent systems, we must provide protection against two kinds of attacks: (i) attacks by malicious agents and (ii) attacks by malicious hosts. The problem of the former attacks has been well understood and thus various countermeasures are available, e.g., sandbox security technologies, and cryptographic techniques such as encryption, digital signatures, or authentication protocols. On the other hand, few attempts have been made so far on solutions to the latter problem.
The author is currently with Sony Corporation, Network Application & Content Service Sector.
A. Lioy and D. Mazzocchi (Eds.): CMS 2003, LNCS 2828, pp. 89–103, 2003.
© IFIP International Federation for Information Processing 2003
far on solutions to the latter problem. Hence we can hardly find any established techniques to solve it, except for those requiring dedicated hardware. For instance, encryption cannot be used to solve the problem, since encrypted mobile agents must eventually be decrypted into executable forms and then become vulnerable to the attacks mounted by malicious hosts1.

In order to solve the problem, obfuscation of agent programs is very promising [2,1]. Therefore in this paper we propose novel methods to protect mobile agents via software obfuscation. The proposed methods obfuscate Java programs, since Java is one of the best object-oriented languages for developing mobile agents. We believe that our obfuscation techniques are easily applicable to other object-oriented languages such as C++.

Software obfuscation has been vigorously studied so far [3,2,1,4,5,6]. Unfortunately, previous software obfuscation techniques share a major drawback: they do not have a theoretical basis, and thus it is unclear how effective they are. In order to mitigate this situation, Wang et al. proposed a software obfuscation technique based on the fact that aliases in a program drastically reduce the precision of static analysis of the program [6]. However, their approach is limited to intraprocedural analysis. Since a program in general consists of many procedures, we must conduct interprocedural analysis whether or not the program is obfuscated. Hence Ogiso et al. proposed obfuscation techniques based on function pointers and arrays, which greatly hinder interprocedural analysis [5]. Their work is very promising since it succeeds in providing theoretical bases for the effect of obfuscation techniques. Unfortunately, however, the techniques in [5,6] cannot be straightforwardly applied to object-oriented languages, especially to Java.

Our obfuscation techniques take advantage of the polymorphism and exception handling mechanisms of Java, which can drastically reduce the precision of static analysis of the obfuscated programs and thus make them significantly harder to understand and to modify. Hence, in mobile agent systems developed in Java, the techniques can provide protection against attacks by malicious hosts. More technically, our obfuscation techniques are based on the difficulty of points-to analysis2, which can be proved to be NP-hard [8]. Therefore, we can also provide a theoretical basis for the techniques. This is one of the great advantages of our approaches over other related work. We plan to build secure mobile agent systems with our proposed methods and are now conducting various experiments and implementations. In this paper we present some empirical evaluations, whereby we demonstrate the effectiveness of our approaches.

The rest of the paper is structured as follows. In Sect. 2, we introduce polymorphism and exceptions in Java in order to describe our approaches. Next, in Sect. 3, we discuss work related to ours and point out its drawbacks. In order to solve
1 In this paper, due to space limitations, we cannot go into the details of the problem of attacks by malicious hosts. Refer to [1] for further details.
2 See also [7] for the difficulty of conducting static analysis in Java in cases other than ours.
them, we propose new obfuscation techniques in Sect. 4, and we theoretically evaluate our techniques in Sect. 5. In Sect. 6 we show empirical experiments of the techniques. Finally we conclude this paper in Sect. 7.
2
Java
Our obfuscation techniques take advantage of features of Java as an object-oriented language, such as polymorphism and exceptions. Therefore, before going into the details of the techniques, in this section we explain the features that we use.
2.1
Object-Oriented Languages
Object-orientation is a framework for describing a program in terms of objects and messages. Object-oriented languages have advantages over traditional languages such as C from the viewpoint of the cost of reusing and maintaining programs. Object-oriented languages mainly rest on the following three foundations:

1. encapsulation: integrating data and routines,
2. inheritance: defining hierarchical relationships among objects, and
3. polymorphism: handling different functions by a unique name.

While these features often make it easier to implement programs for large-scale or advanced applications, the behavior of such programs is likely to be more complex. As a result, the analysis of object-oriented programs often becomes more difficult. Our proposed obfuscation techniques exploit this fact. In the rest of this section, we describe polymorphism in Java and exceptions with subtyping, which are used ingeniously in our proposed obfuscation techniques.
2.2
Polymorphism on Java
Polymorphism is one of the fundamental mechanisms of object-oriented languages, which can “handle different functions by a unique name.” In Java in particular, we can implement polymorphism by the following features:

1. method overriding with class subtyping,
2. interfaces, or
3. method overloading.

We explain how to implement polymorphism with interfaces and method overloading, which are used in our obfuscation techniques.
interface F {
    public void m();
}
class A implements F {
    public void m() { System.out.println("I’m A"); }
}
class B implements F {
    public void m() { System.out.println("I’m B"); }
}

{
    F obj;
    obj = new A();
c1: obj.m();
    obj = new B();
c2: obj.m();
}
Fig. 1. Example of Interface
Interface. Fig. 1 is an example of the use of an interface. The variable obj is declared with the type of interface F; therefore obj can be an instance of any class that implements interface F. When this code is executed, the string “I’m A” is printed at program site c1, because obj is an instance of class A there. At site c2, the program prints “I’m B” because obj is an instance of class B there. Notice that the code obj.m() at c1 is identical to the one at c2, although different methods are called according to the class type of obj. This behavior is not decided at compile time, but dynamically, when the program is executed.
static void m(int arg) {
    System.out.println("int");
}
static void m(char arg) {
    System.out.println("char");
}

{
    int i=0;
    char c=0;
c1: m(i);
c2: m(c);
}
Fig. 2. Example of Overloading
Method Overloading. Next, in Fig. 2, we show Java code that performs method overloading. At sites c1 and c2, methods of the same name m are called. The difference between them is the type of the argument, which is int at site c1 and char at c2. Consequently the string printed on the terminal is
“int” and “char”, respectively. If several methods share the same name, the types and the number of the arguments determine which method is called.

class E1 extends Exception {}
class E2 extends Exception {}

{
    int d, u;
1:  d=1;
    try {
2:      method(); // may throw an exception E1 or E2
    } catch (E1 e1) {
3:      d=4;
    } catch (E2 e2) {
4:      d=3;
    }
5:  u = d;
}
Fig. 3. Example of Exception
2.3
Exception
Java uses exceptions to provide error-handling capabilities. An exception is an event that occurs during program execution and disrupts the normal flow of instructions. In Java programs, throw and catch (and finally) statements are used to handle exceptions. In order to utilize exceptions, we first define a class for each error type and then throw/catch instances of that class. Fig. 3 is a simple example of how to use exceptions in Java code. If the method() called at site 2 may throw an exception E1 or E2, the value of d assigned to u at site 5 depends on which exception was thrown, or on whether an exception was thrown at all. Thus, the operation of a program containing exceptions exhibits a non-deterministic property.
3
Related Work
In this section, we discuss some existing approaches to solving the problem of attacks by malicious hosts. Sander and Tschudin proposed mobile cryptography for the problem [9]. In mobile cryptography, we can develop programs that perform operations on encrypted data. It has the advantage that its security is provable; however, it is applicable not to general agent programs but only to those of a rather specific form. Hence it is of very limited use in practical situations. For other cryptographic approaches, see also [10]. Protection of mobile agents with software obfuscation is now attracting much attention. Therefore, below we present obfuscation techniques proposed so far, some of which are not limited to security for mobile agent systems.
Hohl proposed the concept of ‘time-limited blackbox security’, which provides tamper-resistance for mobile agents by software obfuscation techniques until a prescribed time limit [1]. As another obfuscation approach, Mambo et al. proposed obfuscation techniques in which the frequency distributions of instructions in obfuscated programs are made as uniform as possible by limiting the instructions available for obfuscation [4]. Unfortunately, previous software obfuscation techniques share a major drawback: they do not have a theoretical basis, and thus it is unclear how effective they are. In order to mitigate this situation, Wang et al. proposed a software obfuscation technique based on the fact that aliases in a program drastically reduce the precision of static analysis of the program [6]. However, their approach is limited to intraprocedural analysis. Since a program in general consists of many procedures, whether or not it is obfuscated, we must conduct interprocedural analysis. Hence Ogiso et al. proposed obfuscation techniques based on function pointers and arrays, which greatly hinder interprocedural analysis [5]. Their work is very promising since it succeeds in providing a theoretical basis for the effect of the obfuscation techniques. Unfortunately, the techniques in [5,6] cannot be straightforwardly applied to object-oriented languages, especially to Java, because they require the use of pointers or goto statements, which are not supported in Java3. Therefore we need new obfuscation techniques that are applicable to Java.
4
Proposed Obfuscation Techniques
Based on the discussion so far, in this section we propose new software obfuscation techniques that use the object-oriented features of Java. 4.1
Use of Polymorphism
As described in Sect. 2.2, polymorphism is one of the fundamental mechanisms of object-oriented languages, which can handle different functions by a unique name. While polymorphism makes it easy to implement complicated algorithms, it also makes the behavior of the resulting programs more complicated. Consequently it is difficult to analyze such programs. We apply this feature to our obfuscation techniques. Our obfuscation procedures with respect to polymorphism in Java are given below. They consist of three phases: (1) introduction of method overloading, (2) introduction of interfaces and dummy classes, and (3) change of types and new statements. Below, the procedures are described concisely because of space limitations. Also notice that although the example programs below that result from
3 Collberg et al. proposed some obfuscation techniques using object-oriented features [2]; however, their techniques are limited to rather simple ones, e.g., disturbing class hierarchies. Furthermore, they do not provide any theoretical basis for how effective their techniques are.
Before obfuscation:
class A {
    int display();
    int calc(int, int);
    void run();
    ...
}
class B {
    float show();
    float move(float, float);
    ...
}

After obfuscation:
class Rt { ... }
class Ag { ... }
class Ag1 extends Ag { ... }
...
class Ag3 extends Ag { ... }

class A {
    Rt m(Ag1); // display
    Rt m(Ag2); // calc
    Rt m(Ag3); // run
    ...
}
class B {
    Rt m(Ag1); // show
    Rt m(Ag2); // move
    Rt m(Ag3); // *dummy
    ...
}
Fig. 4. Change definitions of methods
obfuscation are intentionally not very obfuscated for the purpose of explanation, it is not difficult to transform a program into a much more obfuscated form.

(1) Introduction of Method Overloading. At first, we introduce new classes Ag and Rt as preparation. Instances of Ag are used to preserve method4 arguments, and Rt is used to preserve return values. Then we pick some classes randomly, and we create new classes Ag1∼Agn derived from Ag, where n is the maximum number of methods contained in the picked classes. Next, we change the names of all methods contained in those classes into the same name. Then we change the return type of every method into type Rt, and change the types of the arguments into subclasses of Ag. In this step, the numbers of arguments of the methods are made the same. Class Rt manages information on return types and values, and class Ag and its subclasses manage argument types and values. Moreover, when two or more classes are chosen, some dummy methods are added so that the number of methods in each class becomes the same.

Fig. 4 is an example of the definition changes described above. In this case, classes A and B are chosen, the name of every method in the two classes is changed into ‘m’, and a dummy method is added to class B. Finally all return types are changed into Rt, and the arguments into Ag1∼Ag3. Here, notice the method calc of class A. Although the method originally requires two arguments of type int, it changes into ‘m(Ag2)’, which requires one argument of type Ag2. Moreover, the return type of calc is changed from int into Rt. Therefore, to maintain the semantics of the original program, we need to modify the method call and the method itself. This process is illustrated in Fig. 5. The constructor of Ag2 requires two int arguments, and getRetValue(int z) returns an int value. They correspond to the arguments and the return value of the original method calc, respectively. We apply this transformation to each method call.
4 In Java there are two types of methods, instance methods and static (class) methods; our procedure does not handle static methods. Hereinafter, a ‘method’ means an instance method.
Before obfuscation:
// in class A
public int calc(int x, int y) {
    int z;
    ...
    return z;
}

{
    int x, y, z;
    A the_a = new A();
    ...
    z = the_a.calc(x, y);
    ...
}

After obfuscation:
// in class A
public Rt m(Ag2 p) { // calc
    int x, y, z;
    x = p.getArg(0);
    y = p.getArg(1);
    ...
    Rt r = new Rt(z);
    return r;
}

{
    int x, y, z;
    A the_a = new A();
    Rt r;
    Ag2 p;
    ...
    p = new Ag2(x, y);
    r = the_a.m(p);
    z = r.getRetValue(z);
    ...
}
Fig. 5. Modify method and method call
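For concreteness, the wrapper classes assumed by Fig. 5 could look as follows. This is only a minimal sketch inferred from the calls in the figure (the two-int constructor and the getArg/getRetValue accessors); the field layout and the unused parameter of getRetValue are our assumptions rather than part of the paper:

    // Hypothetical implementation of the argument/return wrappers of Fig. 5.
    class Ag2 extends Ag {
        private int[] args;                   // the two int arguments of calc()
        Ag2(int x, int y) { args = new int[] { x, y }; }
        int getArg(int i) { return args[i]; } // i = 0 or 1
    }
    class Rt {
        private int value;                    // the int return value of calc()
        Rt(int z) { value = z; }
        int getRetValue(int dummy) { return value; } // parameter kept to match the call site
    }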
interface I {
    Rt m(Ag1);
    Rt m(Ag2);
    Rt m(Ag3);
}
class A implements I {
    Rt m(Ag1);
    Rt m(Ag2);
    Rt m(Ag3);
    ...
}
class B implements I {
    Rt m(Ag1);
    Rt m(Ag2);
    Rt m(Ag3);
    ...
}
// *dummy
class C implements I {
    Rt m(Ag1);
    Rt m(Ag2);
    Rt m(Ag3);
    ...
}
Fig. 6. Definition of new Interfaces and Classes
(2) Introduction of Interfaces and Dummy Classes. In this step we introduce a new interface, and dummy classes if needed. The interface defines the methods transformed in step (1), and we make the targeted classes implement this interface. Moreover, we create new classes that play no role (i.e., dummies). These dummy classes also need to implement the interface just defined. If dummy classes are not needed for some reason (for example, due to the performance the program requires), we can skip introducing them.
Before obfuscation:
{
    A the_a = new A();
    ...
    the_a.run();
    ...
}

After obfuscation:
{
    I ins;
    if(EXP_TRUE) ins = new A();
    else if(EXP_FALSE) ins = new B();
    else ins = new C();
    ...
    Ag3 p = new Ag3();
    ins.m(p);
    ...
}
Fig. 7. Change Types and New Statements
As a continuation of the example given in (1), we show an example in Fig. 6. The interface I defines three methods that have the same name ‘m’ and return type Rt, and the arguments of the methods are Ag1, Ag2, and Ag3, respectively. Furthermore, we add the declaration of implementation to classes A and B. Class C has the same methods as classes A and B.

(3) Change Types and New Statements. Finally, we change the types of instance variables of the targeted classes into the type of the interface introduced in step (2). And we wrap every new statement that creates a reference to a targeted class in an if statement together with other new statements. Fig. 7 is an example of this conversion. EXP_TRUE is a condition expression that is always true, such as x*(x+1)%2==0 or y*(y+1)*(y+2)%6==0. Hence the semantics of the original program is maintained. However, generally speaking, in static analysis it is very difficult to evaluate such expressions, and this results in difficulty in determining the execution paths in the presence of if statements5. Needless to say, such condition expressions can be made arbitrarily complicated as long as the original semantics is retained. Therefore the if statements make it difficult to determine which object the reference variable ins points to. 4.2
Use of Exception
In this section, we propose another obfuscation technique using exceptions, which is independent of the technique in Sect. 4.1. Although exceptions are meant to provide error handling, as explained in Sect. 2.3, they can of course be inserted at any site of a program. Our technique converts if statements into try/catch statements. Consider the if statement shown in Fig. 8 (before obfuscation), where EXP1, EXP2, ..., EXPn−1 are appropriate condition expressions, and S1, S2, ..., Sn are sequences of statements. Now we introduce exception classes exp_base, exp_1, exp_2, ..., exp_n,
5 Here static analysis of a program is conducted under the assumption that all execution paths within procedures, without regard to interprocedural paths, are executable. This assumption is commonly found in the literature and is often called ‘meet over all paths’ [11].
class exp_base extends Exception { }
class exp_1 extends exp_base { }
...
class exp_n extends exp_base { }

Before obfuscation:
{
    ...
    if( EXP1 ) { S1 }
    else if( EXP2 ) { S2 }
    ...
    else if( EXPn-1 ) { Sn-1 }
    else { Sn }
    ...
}

After obfuscation:
{
    exp_base e;
    ...
    try { throw e; }
    catch( exp_1 e1 ) { S1 }
    catch( exp_2 e2 ) { S2 }
    ...
    catch( exp_n en ) { Sn }
    ...
}
Fig. 8. Use of Exceptions
where exp_1∼exp_n are subclasses of exp_base. Then we convert the if statement into the try/catch statement shown in Fig. 8 (after obfuscation). The variable e should be an instance of an appropriate exception class so that the obfuscated statements execute equivalently to the original if statement. 4.3
Example of Obfuscation
At the end of Sect. 4, for completeness of the description, we show in Fig. 9 an example of obfuscation in which all the techniques above have been applied.
5
Complexity Evaluation
Our obfuscation techniques described in Sect. 4 substantially impede precise points-to analysis. In this section, we support this claim by presenting a proof that statically determining precise points-to information is NP-hard.

Theorem 1: In the presence of classes which implement interfaces, method calls through instances of those classes, and, at the same time, method overloadings, the problem of precisely determining whether there exists an execution path in a program on which a given instance points to a given method at a given point of the program is NP-hard6.

Proof: The proof of Theorem 1 is by reduction from the 3-SAT problem [8] for $\bigwedge_{i=1}^{n}(l_{i,1} \vee l_{i,2} \vee l_{i,3})$ with propositional variables $\{v_1, v_2, \ldots, v_m\}$, where $l_{i,j}$ is a literal and is either $v_k$ or $\overline{v_k}$ for some $k$ $(1 \leq k \leq m)$. The reduction is specified by the program in Fig. 10, which is polynomial in the size of the 3-SAT
6 For further background on the approach taken in this proof, see [11].
Before obfuscation:
class A {
    int display();
    int calc(int, int);
    void run();
}
class B {
    float show();
    float move(float, float);
}

{
    int x, y, z;
    float s, t;
    ...
    A the_a = new A();
    ...
    z = the_a.calc(x, y);
    ...
    B the_b = new B();
    s = the_b.move(t);
    ...
}

After obfuscation:
class Rt { ... }
class Ag { ... }
class Ag1 extends Ag { ... }
class Ag2 extends Ag { ... }
class Ag3 extends Ag { ... }
interface I { Rt m(Ag1); Rt m(Ag2); Rt m(Ag3); }
class A implements I { Rt m(Ag1); Rt m(Ag2); Rt m(Ag3); }
class B implements I { Rt m(Ag1); Rt m(Ag2); Rt m(Ag3); }
class C implements I { Rt m(Ag1); Rt m(Ag2); Rt m(Ag3); }

{
    int x, y, z;
    float s, t;
    I ins1, ins2;
    Ag p;
    Rt r;
    ex_base e;
    ...
    if(x*(x+1)%2==0) e = new ex_1();
    else if(x/y==z) e = new ex_2();
    else e = new ex_3();
    ...
    try { throw e; }
    catch(ex_1 e1) { ins1 = new A(); }
    catch(ex_2 e2) { ins1 = new B(); }
    catch(ex_3 e3) { ins1 = new C(); }
    ...
    p = new Ag2(x, y);
    r = ins1.m(p);
    z = r.getRetValue(z);
    ...
    if(x*x<0) e = new ex_1();
    else if(y*(y+1)*(y+2)%6==0) e = new ex_2();
    else e = new ex_3();
    ...
    try { throw e; }
    catch(ex_1 e1) { ins2 = new A(); }
    catch(ex_2 e2) { ins2 = new B(); }
    catch(ex_3 e3) { ins2 = new C(); }
    p = new Ag2(t);
    r = ins2.m(p);
    s = r.getRetValue(s);
    ...
}
Fig. 9. Example of Obfuscation
problem. The conditionals are not specified in the program since we assume that all paths are executable. As will be seen below, paths through the code between L1 and L2 represent truth assignments for the propositional variables. The truth assignment on a particular path is encoded in the points-to relationships of certain variables in the program. Paths between L2 and L3 then encode in the points-to relationship whether or not the truth assignment resulting from the path to L2 satisfies $\bigwedge_{i=1}^{n}(l_{i,1} \vee l_{i,2} \vee l_{i,3})$. If we interpret $v_i$ pointing to b_true as the propositional variable $v_i$ being true, then any path from L1 to L2 uniquely determines one truth assignment. Furthermore, the converse is also true: every truth assignment corresponds to exactly one execution path, as just mentioned. Now consider the paths from L2 to L3. If the truth assignment for a path from L1 to L2 satisfies the formula, then every clause has at least one literal which is true. Pick the path from L2 on which each clause assigns b_true to c. Then each assignment corresponds to c = b_true.and(b_true), and c must point to b_true at L3. However, if the formula is unsatisfiable, then every truth assignment has a clause, say $(l_{i,1} \vee l_{i,2} \vee l_{i,3})$, in which all three literals are false. This implies that $l_{i,1}$, $l_{i,2}$, and $l_{i,3}$ all point to b_false. Because every path from L2 to L3 must go through the statement
if(-) c = c.and(li,1); else if(-) c = c.and(li,2); else c = c.and(li,3);

c must point to b_false on all paths to L3, and thus c never points to b_true. Therefore 3-SAT is polynomial-time reducible to the problem of Theorem 1, and this completes the proof.
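As a small worked instance of the reduction (our illustration, following the conventions of Fig. 10): for the satisfiable formula (v1 ∨ v2 ∨ v3) ∧ (v̄1 ∨ v̄2 ∨ v3), with m = 3 and n = 2, the code generated between L2 and L3 is

    if(-) c = v1; else if(-) c = v2; else c = v3;
    if(-) c = c.and(v̄1); else if(-) c = c.and(v̄2); else c = c.and(v3);

Under the truth assignment v1 = v2 = v3 = true, the recursive var calls bind v1, v2, v3 to b_true and v̄1, v̄2, v̄3 to b_false; taking the first branch of the first statement and the last branch of the second yields c = v1 followed by c = c.and(v3), so c points to b_true at L3, witnessing satisfiability.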
6
Empirical Evaluation
In this section we present the application of our obfuscation procedures to four programs: RC6, compress, MD5, and FFT. Table 1 shows the difference between the hierarchy/call graphs of the original programs and those of the obfuscated programs. In the rows for a hierarchy graph, ‘#nodes’ represents the sum of the classes and the interfaces, and ‘#edges’ represents the sum of the subclassing and implements edges. Furthermore, in the rows for a call graph, ‘#nodes’ represents the number of call sites, and ‘#edges’ represents the number of edges to method nodes. We can readily see from the table that the increase in the number of edges is greater than that of nodes, whereas the number of edges usually increases in proportion to the number of nodes. In the original call graphs, the numbers of nodes and edges are almost the same, which means that all call sites and target methods correspond one to one. On the other hand, in the call graphs of the obfuscated programs, the number of edges is 3.4 times the number of nodes on average. Therefore, some call sites have two or more candidate methods, and these results give good evidence that the precision of analysis is much reduced by our obfuscation techniques.

We have also evaluated the performance degradation due to the obfuscation, as indicated in Table 2. The experiments were conducted on a Sun Ultra 5 (UltraSPARC-II 400 MHz) with Solaris 8 (SunOS 5.8). Programs were compiled and executed with Java version 1.3.1. Each execution time is the average over 1000 executions. The average ratio of the execution times of the obfuscated programs to those of the original programs is 1.3, which is not as great as the increase in source code or class file size. Therefore, our obfuscation techniques do not degrade performance very much. Note that these are the results of applying our obfuscation procedures just once. If needed, we can apply the procedures two or more times, which will yield further obfuscated programs.
7
Conclusion
In recent years mobile agent systems have attracted much attention as a new computing paradigm. However, for advanced applications such as electronic commerce, agent systems are of little value unless their security is ensured. In particular, it is significantly important to provide protection for mobile agents against attacks by malicious hosts.
interface Bool { public Bool and(Bool arg); }
class True implements Bool {
    public Bool and(Bool arg) { return arg; }
}
class False implements Bool {
    public Bool and(Bool arg) { return this; }
}
class theorem {
    Bool b_true, b_false;
    Bool c;
    void var(Bool v1, Bool v̄1) {
        if(-) var(v1, v̄1, b_true, b_false);
        else var(v1, v̄1, b_false, b_true);
    }
    void var(Bool v1, Bool v̄1, Bool v2, Bool v̄2) {
        if(-) var(v1, v̄1, v2, v̄2, b_true, b_false);
        else var(v1, v̄1, v2, v̄2, b_false, b_true);
    }
    ...
    void var(Bool v1, Bool v̄1, Bool v2, Bool v̄2, ..., Bool vm, Bool v̄m) {
L2:     if(-) c = l1,1; else if(-) c = l1,2; else c = l1,3;
        if(-) c = c.and(l2,1); else if(-) c = c.and(l2,2); else c = c.and(l2,3);
        ...
        if(-) c = c.and(ln,1); else if(-) c = c.and(ln,2); else c = c.and(ln,3);
L3: }
    public theorem() {
        b_true = new True();
        b_false = new False();
L1:     if(-) var(b_true, b_false);
        else var(b_false, b_true);
    }
    public static void main(String[] args) { new theorem(); }
}
Fig. 10. 3-SAT solution iff (c, b_true) in Interprocedural Points-to Analysis
Table 1. Change of hierarchy and call graphs

program                               Before Obfuscation  After Obfuscation  ratio
compress  Hierarchy Graph  #nodes            3                   21            7.0
                           #edges            2                   30           15.0
          Call Graph       #nodes           30                   74            2.5
                           #edges           30                  274            9.1
RC6       Hierarchy Graph  #nodes            5                   23            4.6
                           #edges            4                   33            8.3
          Call Graph       #nodes           18                   77            4.3
                           #edges           18                  297           16.5
MD5       Hierarchy Graph  #nodes           10                   33            3.3
                           #edges            9                   42            4.7
          Call Graph       #nodes          194                  667            3.4
                           #edges          202                 1775            8.8
FFT       Hierarchy Graph  #nodes            7                   25            3.6
                           #edges            6                   41            6.8
          Call Graph       #nodes           86                  318            3.7
                           #edges           86                 1057           12.3
Table 2. Change of program size and execution time

program                               Before Obfuscation  After Obfuscation  ratio
compress  source [#lines]                   126                 332            2.6
          class file [bytes]               2906               12056            4.2
          execution time [sec]             0.57                0.69            1.2
RC6       source [#lines]                   561                 853            1.5
          class file [bytes]               6039               18414            3.0
          execution time [sec]             0.70                0.78            1.1
MD5       source [#lines]                   762                2142            2.8
          class file [bytes]              11257               43462            3.9
          execution time [sec]             0.61                0.92            1.5
FFT       source [#lines]                   874                2185            2.5
          class file [bytes]              11260               37158            3.3
          execution time [sec]             0.66                0.99            1.5
In order to solve the problem, obfuscation of agent programs is very promising. Unfortunately, previous software obfuscation techniques share a major drawback: they do not have a theoretical basis, and thus it is unclear how effective they are. Therefore, in this paper we have proposed novel obfuscation techniques for Java. We believe it is fairly easy to apply the techniques to other object-oriented languages such as C++. Our obfuscation techniques take advantage of the polymorphism and exception mechanisms and can drastically reduce the precision of points-to analysis of the programs. We have shown that determining precise points-to analysis in obfuscated
programs is NP-hard, and this fact provides a theoretical basis for our obfuscation techniques. Furthermore, in this paper we have presented some empirical experiments. The results show the effectiveness of our obfuscation approaches.
References
1. Hohl, F.: Time limited blackbox security: Protecting mobile agents from malicious hosts. In Vigna, G., ed.: Mobile Agents and Security. Volume 1419 of Lecture Notes in Computer Science. Springer-Verlag (1998) 92–113
2. Collberg, C., Thomborson, C., Low, D.: A taxonomy of obfuscating transformations. Technical Report 148, Department of Computer Science, the University of Auckland, Auckland, New Zealand (1997)
3. Aucsmith, D.: Tamper resistant software: An implementation. In Anderson, R.J., ed.: Information Hiding: First International Workshop. Volume 1174 of Lecture Notes in Computer Science. Springer-Verlag (1996) 317–333
4. Mambo, M., Murayama, T., Okamoto, E.: A tentative approach to constructing tamper-resistant software. In: New Security Paradigms Workshop. (1997) 23–33
5. Ogiso, T., Sakabe, Y., Soshi, M., Miyaji, A.: Software obfuscation on a theoretical basis and its implementation. IEICE Transactions on Fundamentals E86-A (2003) 176–186
6. Wang, C., Hill, J., Knight, J., Davidson, J.: Software tamper resistance: Obstructing static analysis of programs. Technical Report CS-2000-12, Department of Computer Science, University of Virginia (2000)
7. Chatterjee, R., Ryder, B.G., Landi, W.: Complexity of points-to analysis of Java in the presence of exceptions. IEEE Transactions on Software Engineering 27 (2001) 481–512
8. Garey, M.R., Johnson, D.S.: Computers and Intractability – A Guide to the Theory of NP-Completeness. W. H. Freeman and Co. (1979)
9. Sander, T., Tschudin, C.F.: Protecting mobile agents against malicious hosts. In Vigna, G., ed.: Mobile Agents and Security. Volume 1419 of Lecture Notes in Computer Science. Springer-Verlag (1998) 44–60
10. Kotzanikolaou, P., Burmester, M., Chrissikopoulos, V.: Secure transactions with mobile agents in hostile environments. In: Information Security and Privacy: Fifth Australasian Conference on Information Security and Privacy, ACISP 2000. Volume 1841 of Lecture Notes in Computer Science. Springer-Verlag (2000) 289–297
11. Myers, E.W.: A precise inter-procedural data flow algorithm. In: Conference Record of the 8th ACM Symposium on Principles of Programming Languages (POPL). (1981) 219–230
A Security Scheme for Mobile Agent Platforms in Large-Scale Systems

Michelle S. Wangham, Joni da Silva Fraga, and Rafael R. Obelheiro

Department of Automation and Systems
Federal University of Santa Catarina
C. P. 476 – 88040-900 – Florianópolis – SC – Brazil
{wangham,fraga,rro}@das.ufsc.br
Abstract. Mobile agents have recently started being deployed in largescale distributed systems. However, this new technology brings some security concerns of its own. In this work, we propose a security scheme for protecting mobile agent platforms in large-scale systems. This scheme comprises a mutual authentication protocol for the platforms involved, a mobile agent authenticator, and a method for generation of protection domains. It is based on SPKI/SDSI chains of trust, and takes advantage of the flexibility of the SPKI/SDSI certificate delegation infrastructure to provide decentralized authorization and authentication control.
1
Introduction
A mobile agent in a large-scale network can be defined as a software agent that is able to autonomously migrate from one host to another in a heterogeneous network, crossing various security domains. In order for these agents to exist within a system, or to form a system themselves, they require a computing environment—an agent platform—for deployment and execution. The ability to move agents (code + state) allows the deployment of services and applications in a more flexible, dynamic, and customizable way than the client-server paradigm [1]. Despite its many benefits, the mobile agent paradigm introduces new security threats from malicious agents and platforms [2]. Due to these threats, security mechanisms should be designed to protect the communications infrastructure, agent platforms, and the agents themselves. This paper concentrates on mechanisms for protecting agent platforms against malicious agents in large-scale distributed systems. One of the main concerns with an agent platform implementation is ensuring that agents are not able to interfere with one another or with the underlying agent platform [3]. A common approach for accomplishing this is to establish isolated execution domains (protection domains) for each incoming mobile agent and for the platform, and to control all inter-domain access. Protection against malicious agents is not restricted to confining their execution to their own execution domains in agent platforms; other issues need to be considered when large-scale distributed systems are the focus. For instance, the generation of these protection
A. Lioy and D. Mazzocchi (Eds.): CMS 2003, LNCS 2828, pp. 104–116, 2003.
© IFIP International Federation for Information Processing 2003
domains depends on distributed authentication and authorization mechanisms, which makes it a difficult task. The Java platform is quickly becoming the language of choice for implementing mobile agent systems. Besides being considered a de facto standard for programming distributed applications, Java has several properties that make it a good fit for this task. However, although Java is very convenient for creating mobile agents, its static and centralized access control model poses some limitations with regard to security policy definition. These considerations have led us to design a security scheme for protecting agent platforms against malicious mobile agents in large-scale systems. This scheme takes advantage of the flexibility of the SPKI/SDSI certificate delegation mechanisms to accomplish decentralized authentication and authorization. It includes a protocol for establishing secure channels, an algorithm for authenticating incoming mobile agents, and a scheme for generating isolated protection domains for agents that aims to overcome some limitations of the Java 2 access control model.
2
Security in Mobile Agent Platforms
A mobile agent platform provides an environment where agents can execute and interact with other agents. A malicious agent can attack the platform it is currently visiting or other agents on the same platform, thus posing a significant threat to the platform. Possible security threats facing mobile agent platforms include [3]: masquerading, when an agent poses as an authorized agent in an effort to gain access to services and resources to which it is not entitled, or to deceive the agent with which it is communicating; denial of service, when an agent attempts to consume an excessive amount of the agent platform's computing resources or to repeatedly send messages to another agent; unauthorized access, for example when an agent obtains read or write access to data for which it has no authorization, including access to another agent's state or code; and repudiation, when an agent participates in a transaction or communication and later claims that it did not happen.

Establishing isolated domains for agents is the most common technique for protecting agent platform resources and functionalities against malicious agents. In addition to this approach, other techniques based on conventional security mechanisms have been proposed. Some of these techniques are: safe code interpretation [4][5], digital signatures [4][6][7], path histories [8], State Appraisal [2], and Proof-Carrying Code (PCC) [9]. Many of these mechanisms offer effective security for agent platforms and their resources in some classes of applications, particularly when techniques are combined. For instance, the domain isolation technique combined with code signing (as provided in the Java 2 platform [4]) makes it possible to implement run-time access control based on the authority of an agent (its owner)1. However, these
1 Limitations of the Java 2 access control model are described in section 6.
techniques are not really suitable for large-scale applications. Moreover, access control based solely on the owner of the agent does not seem to be appropriate when multi-hop agents with free destinations are taken into consideration, since the trust in an agent depends not only on the owner and the forwarding platform but also on all the platforms visited by the agent [8]. Therefore, an effective authentication and authorization scheme for large-scale mobile agent systems should be based on agent credentials, on the identities of agent owners, and on the lists of platforms visited by agents. Identities and credentials are usually represented by, and stored in, digital certificates, such as those used in SPKI/SDSI.
3
SPKI/SDSI Infrastructure
The Simple Public Key Infrastructure/Simple Distributed Security Infrastructure (SPKI/SDSI) specification defines a simple and flexible distributed authentication and authorization infrastructure based on digital certificates and local name spaces [10]. SPKI/SDSI uses an egalitarian model of trust. The subjects (or principals) are public keys, and each public key is a certificate authority [11]. There is no hierarchical global infrastructure as in X.509. Two types of certificates are defined by SPKI/SDSI: name and authorization certificates. An authorization certificate grants specific authorizations from the issuer to the subject of the certificate; these authorizations are bound to a name, a group of names, or a key. The issuer of the certificate can also allow a principal to delegate the received permissions to other principals. The SPKI/SDSI delegation model allows the construction of chains of trust that begin at a service guardian and end at principals' keys. When a given subject desires to obtain access to some resource, it must present a signed request and a chain of authorization certificates so that the needed permissions can be checked [11], as sketched below. Since SPKI/SDSI follows a decentralized approach to authentication and authorization, it is suitable for large-scale systems. Access rights can be delegated to form a chain of certificates (controlled distribution of authorization). Authorizations and permissions can be freely defined and are not restricted to any predefined set [10]. The issuer of a certificate can specify certain conditions under which the certificate is valid; this provides for finer-grained control of delegation. Since certificates are bound to keys instead of names, they eliminate the need for finding the public key corresponding to a given name [12], and can also be used in situations where anonymity is desired. Due to these advantages, SPKI/SDSI certificates were chosen to represent agent credentials in our security scheme.
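The following sketch shows the gist of such a check. The Cert and Tag types and their methods are illustrative assumptions for this paper's setting (the SPKI/SDSI specification defines the actual certificate-reduction rules), not an existing library API:

    import java.security.PublicKey;

    // Sketch of SPKI/SDSI authorization chain checking: the chain must start
    // at the resource guardian, each certificate must be validly signed by its
    // issuer, each issuer after the first must be the previous certificate's
    // subject (and that certificate must allow delegation), and the requested
    // right must be implied by the intersection of all authorization tags.
    static boolean chainGrants(Cert[] chain, PublicKey guardian,
                               PublicKey requester, Tag requested) {
        if (chain.length == 0 || !chain[0].issuer().equals(guardian))
            return false;
        Tag granted = chain[0].tag();
        for (int i = 0; i < chain.length; i++) {
            if (!chain[i].signatureValid()) return false;
            if (i > 0) {
                if (!chain[i].issuer().equals(chain[i - 1].subject())) return false;
                if (!chain[i - 1].mayDelegate()) return false;
                granted = granted.intersect(chain[i].tag()); // rights can only narrow
            }
        }
        return chain[chain.length - 1].subject().equals(requester)
                && granted.implies(requested);
    }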
4
A Proposal for Authentication and Authorization Based on Chains of Trust
We now present an authentication and authorization scheme for large-scale mobile agent systems. We assume that agents have free itineraries and are multi-hop, that is, the number of platforms they traverse in a given itinerary is not
fixed. SPKI/SDSI chains of trust and the concept of federations provide scalability and flexibility to this scheme. 4.1
Platform Federations
In this scheme, we use the concept of federation introduced in [13], which emphasizes the grouping of principals with common interests. With it, mobile agent platforms can group themselves according to their service classes, constituting service federations. For example, the Mold Enterprise Federation may group the mobile agent platforms of enterprises that manufacture molds and matrices. The purpose of a federation is to assist its members in reducing principal names and in building new chains of trust through its Certificate Manager (CM) [13]. These chains of trust between client and service are quickly and efficiently established from the name and authorization certificates available at the certificate repository of the service federation. By storing name and authorization certificates in these repositories, the services available on the associated platforms can be announced. The inclusion of a platform in one of these federations should be negotiated with the association that controls this certificate storage service. The CM offers a certificate search facility, used either for name reduction or for creating new authorization chains. Besides, a member of a federation can join other federations, and different federations can establish trust relationships. The certificate managers can be associated with each other, linking those who, by affinity, can best represent the needs of their members, creating webs of federations with global scope. The main function of a web of federations is to help a client, through its agents, in the search for access privileges that link it to the guardian of a service (another platform). Further details on the concept of federations can be found in [13]. 4.2
Authentication and Authorization Scheme
Fig. 1 shows the procedures defined in the security scheme, which is composed of prevention and detection techniques that emphasize the protection of agent platforms and their resources. In this scheme, after the mobile agent creation process, the source and destination platforms (the ones which send and receive agents, respectively) first go through a mutual authentication protocol so that a secure channel between them can be established. After that, the agent is sent with its credentials, to be authenticated by the destination platform, and then its execution domain can be created. In other words, when an agent arrives at a platform it presents its public key and a chain of SPKI/SDSI certificates to a verifier that performs the authorization checks. From this information, the verifier must generate the permissions required for the agent to be run in a protection domain on the destination platform. This dynamic generation of permissions provides flexibility and follows the principle of least privilege. Chains of trust also help to achieve the scalability necessary for Internet-based applications. The following section analyzes some aspects of the mechanisms in the proposed scheme.
Fig. 1. Security Scheme for Agent Platform Protection
Creation of Mobile Agents. During the mobile agent creation process (see Fig. 1, procedure 1), the owner, being the authority that the agent represents, provides a set of SPKI/SDSI authorization certificates defining the agent's credentials. It should be noted that this initial set of authorization certificates may not be sufficient to grant access to certain resources on a given platform, so new certificates can be provided to the agent during its visits to other agent platforms. For example, suppose that an agent needs to visit a transport enterprise associated with a mold enterprise. The agent may not have the certificates needed to be received by this platform. Thus, the platform that represents the mold enterprise can delegate certificates to this agent, enabling it to access the associated enterprise. The trust model proposed in SPKI/SDSI determines that a client is responsible for finding the certificates that enable it to access a given service. Therefore, an agent can search for certificates on the webs of federations and negotiate them when they are found. The owner of the agent has to place a signature indicating its identity and the identity of the first platform to be visited into an object that will contain the list of previously visited platforms (called the path register); this object is attached to the agent. A list that indicates the service federations whose member platforms are authorized to execute the agent can optionally be defined and attached to the agent. The path register and the list of federations are used to analyze the history of the agent's travels in the agent authentication process. Finally, the agent's owner must sign the code of the agent and its read-only data, and then create the agent in its home platform to send it through the network, as sketched below.
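A minimal sketch of the owner's signing step, using the standard java.security API; the choice of signature algorithm and the way code and read-only data are fed into the signature are our assumptions, not a prescription of the paper:

    import java.security.GeneralSecurityException;
    import java.security.PrivateKey;
    import java.security.Signature;

    // Sign only the agent's immutable parts; mutable state is deliberately
    // left out so that any visited platform can later re-verify the signature.
    static byte[] signAgent(byte[] code, byte[] readOnlyData, PrivateKey owner)
            throws GeneralSecurityException {
        Signature sig = Signature.getInstance("SHA1withDSA");
        sig.initSign(owner);
        sig.update(code);
        sig.update(readOnlyData);
        return sig.sign();
    }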
Secure Channel Establishment. In the proposed scheme, mutual authentication between the platforms involved must be established before agents can be transferred. This authentication creates a secure channel in the communications infrastructure between the authenticated parties that is used for agent transfers. In accordance with the SPKI/SDSI model, identification is done not with names but with public keys, with digital signatures as the authentication mechanism. Thus, in platform authentication, for a digital signature to be verified, a public key and a signed request must be present at the receiver. Mutual authentication is performed during the secure channel establishment between agent platforms. Fig. 2 shows the mutual authentication performed with a challenge-response protocol, based on the SPKI/SDSI certificates of the owners (managers) of the platforms. The basis for authentication in SPKI/SDSI is chains of authorization certificates [10]. In step 1 of Fig. 2, the source platform sends a signed message containing a request (establish trust) and a nonce (nonce_SP), without any certificates. From this request, the destination platform builds a signed challenge and sends it to the source platform so that it can prove it has the required permissions (step 2). The challenge is composed of information from the resource's ACL, nonce_SP, and a nonce generated by the destination platform (nonce_DP). In step 3, the source platform verifies the signature of the challenge to confirm the authenticity of the destination platform. Then, it sends a signed response with the request, nonce_DP, and the authorization certificates for the desired operation. From the chain of authorization certificates, the destination platform can check the requester's signature, finishing the authentication process (step 4).
Fig. 2. Protocol for Mutual Authentication of Platforms
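For reference, the message flow just described can be summarized as follows (our notation; SP and DP denote the source and destination platforms, sigX(...) denotes a message signed by X, and the exact field layout is an assumption based on the prose):

    1. SP → DP : sigSP(establish_trust, nonce_SP)
    2. DP → SP : sigDP(challenge(ACL information), nonce_SP, nonce_DP)
    3. SP → DP : sigSP(request, nonce_DP, authorization certificate chain)
    4. DP      : checks the chain and the requester's signature; the secure
                 channel (SSL) is then used for all agent transfers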
It is important to note that the process of mutual authentication of platforms is concluded with the establishment of a secure channel. This channel will be used for all agents that are transferred between the two platforms, without need for subsequent platform authentication. For secure channel establishment, an underlying security technology—Secure Sockets Layer (SSL)—is used to ensure the confidentiality and integrity of communications between agent platforms. When a secure channel is established, the source platform sends the agent to the destination platform along with its credentials, used for building an execution domain that is appropriate for the agent.
Mobile Agents Authentication. Before instantiating a thread for an agent, the destination platform must authenticate the received agent. In order to protect itself against an agent, a platform depends not only on the verification of the authenticity of the agent's owner, but also on the degree of trust in the platforms already visited by the agent, since a mobile agent can become malicious by virtue of its state having been corrupted by previously visited platforms [2]. One of the
contributions of this paper is the definition of a multi-hop authenticator that establishes trust in an agent based on the authenticity of the owner of the agent, on the authenticity of the platforms visited by the agent, and on the federations defined by the owner of the agent. Consider the authenticator shown in Fig. 3; upon receiving a mobile agent, a platform must first check, through verification of the agent's signature (code and read-only data), that the agent has not been corrupted, and confirm its association with a principal, its owner (step 1). Thus, modifications introduced by malicious platforms can be easily detected by any platform visited by the agent.
Fig. 3. Multi-hop Agents Authenticator
For one-hop agents, the technique proposed in step 1 ensures the integrity of an agent, but for multi-hop agents this technique is insufficient. For detecting possible modifications and checking the agent's traveling history, the destination agent platform must analyze the agent's path register (step 2). For that purpose, each platform visited by the agent should add to the agent's path register a signed entry containing its identity (public key) and the identity (public key) of the next platform to be visited, forming a history of the path followed by the agent. In step 2 of Fig. 3, the platform administrator has to define how the agent's path register is analyzed and how the trust level is established. The possibilities shown in Fig. 3 include only step 2.1, only step 2.2, or both steps2. Moreover, we suggest that platform-generated sensitive data (write-once data) should be stored in a container to be carried by the agents (as proposed by Karnik in [7]). These sensitive data should be signed by the generating platform so that possible modifications can be detected. This approach is vulnerable to some attacks, however [14][15][16]. For instance, Roth [16] describes an attack where a malicious platform, which is visited a second time by an agent or which colludes with another platform, deletes all items that were added to the agent's container since one of its previous visits or since the agent's departure from the first platform. In this paper we focus on protecting agent platforms, but it is our intention to address the protection of agents in future work.
2 An inconvenience is that analyzing an agent's path becomes costlier as the path register grows.
(a result of the previous procedures). They are based on the agent's SPKI/SDSI authorization certificates and on trust and risk levels. The platform guardian verifies the agent's certificates in order to define the set of permissions. This decouples the privileges granted to agents (their credentials) from the privileges required to access resources (the access control policy), which provides flexibility and scalability to the security scheme. Some extensions to the Java 2 security model are needed for generating the protection domain where an agent will run. These extensions, represented by grey boxes in Fig. 4, are: SPKISecureClassLoader, required for extracting certificates from the incoming agent and for creating the protection domain of a thread; SPKIPolicy, an object that represents a policy defining, from the certificates carried by an agent, which Java permissions will be associated with the application domain; and SPKIVerifier, required for verifying the authenticity of SPKI certificates. Following the dynamics depicted in Fig. 4, the platform administrator describes the security policy of the agent platform by mapping the authorizations granted by SPKI/SDSI certificates to Java permissions, defining a policy file for that purpose. When an agent is received by a platform, its credentials are forwarded by SPKISecureClassLoader to the SPKIPolicy object, which interprets them. When the SPKI permissions are mapped to Java permissions, the Java support generates the corresponding protection domain for the thread that runs the agent; the Java permissions are made available through a PermissionCollection.
Fig. 4. Dynamics for Protection Domain Generation
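The mapping performed by SPKIPolicy can be pictured with the sketch below. The reduced "action:target" tag encoding is invented here for illustration; a real implementation would operate on the SPKI tag algebra rather than on strings:

    import java.io.FilePermission;
    import java.net.SocketPermission;
    import java.security.PermissionCollection;
    import java.security.Permissions;

    // Sketch: translate already-verified SPKI authorizations into the Java
    // permissions that will populate the agent's protection domain.
    public class SPKIPolicySketch {
        public PermissionCollection toJavaPermissions(String[] spkiTags) {
            Permissions perms = new Permissions();
            for (int i = 0; i < spkiTags.length; i++) {
                int sep = spkiTags[i].indexOf(':');   // "action:target"
                if (sep < 0) continue;                // ignore malformed tags
                String action = spkiTags[i].substring(0, sep);
                String target = spkiTags[i].substring(sep + 1);
                if (action.equals("read"))
                    perms.add(new FilePermission(target, "read"));
                else if (action.equals("connect"))
                    perms.add(new SocketPermission(target, "connect"));
                // unknown authorizations grant nothing (least privilege)
            }
            return perms;
        }
    }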
If a thread (agent) makes an access request to a resource outside of its application domain, that is, in a system domain, AccessController is activated to check whether the access should be allowed. It verifies, in the created protection domain, whether the principal has a corresponding Permission object in its collection of permissions. If it does, the thread can exchange domains, entering the system domain.
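At run time this check surfaces through the standard Java 2 API; for example, a guarded service can demand a permission explicitly (the file name below is, of course, just an example):

    // Throws java.security.AccessControlException unless the calling agent's
    // protection domain was granted a matching permission by SPKIPolicy.
    java.security.AccessController.checkPermission(
            new java.io.FilePermission("/agents/data/catalog", "read"));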
5
Implementation
A prototype of the security scheme for protection of agent platforms has been defined and implemented in order to demonstrate its suitability. The architecture of this prototype is shown in Fig. 5.
Fig. 5. Architecture of the Prototype
For the mobile agent support layer we have chosen IBM Aglets3, an open-source platform that uses the Java platform. Aglets provides mechanisms for code and state information mobility, and an environment (called Tahiti) which supports the creation, cloning, dispatching, and retraction of agents. Aglets supports both the Agent Transfer Protocol (ATP) and Java Remote Method Invocation (RMI) as communication infrastructures. In our work, we have chosen to use only RMI because it is better suited to purely Java-based distributed systems. The SPKI/SDSI Infrastructure component of the prototype architecture is responsible for the creation of SPKI/SDSI certificates for agent platforms and mobile agents. For this component, we have adapted an existing Java library that implements SDSI 2.0 [17] and provides the necessary functionality. A tool for assisting the agent creation process was implemented. This GUI-based application allows an owner to define the credentials and the list of federations for an agent, to sign the code and read-only data, and to initialize the path register and the write-once container. The protocol for secure channel establishment (see Fig. 2) was implemented with the SDSI 2.0 library and with the Java 2 cryptographic tools. SSL support is provided by the iSaSiLk toolkit4. The Aglets platform was adapted to optionally use RMI over SSL. The multi-hop authenticator described in section 4.2 is being implemented with the Java 2 cryptographic tools and the SDSI 2.0 library. Presently, all entries in an agent's path register are analyzed, considering only step 2.1 (Fig. 3). Only two levels of trust were defined according to the list of federations: authorized or non-authorized platforms. That is, the platform either is or is not a member
3 http://aglets.sourceforge.net/
4 http://jce.iaik.tugraz.at/products/02_isasilk/
of a federation present in the list of federations previously defined for the agent mission. The multi-hop algorithm as currently defined is shown in Fig. 6.
Fig. 6. Multi-hop Authenticator
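A sketch of the path-register analysis of step 2.1, under the entry layout described in section 4.2 (each entry is signed by the platform that added it and names the next platform to be visited); the PathEntry type and the two-level federation check are illustrative assumptions:

    // Walk the path register: every entry must verify, every visited platform
    // must be a member of an authorized federation (its key appears in
    // authorizedKeys), and the platform named as "next" in entry i must be
    // the one that signed entry i+1. PathEntry is an assumed type.
    static boolean verifyPathRegister(PathEntry[] reg, java.util.Set authorizedKeys) {
        for (int i = 0; i < reg.length; i++) {
            if (!reg[i].signatureValid()) return false;            // tampered entry
            if (!authorizedKeys.contains(reg[i].platformKey())) return false;
            if (i + 1 < reg.length
                    && !reg[i].nextPlatformKey().equals(reg[i + 1].platformKey()))
                return false;                                      // broken chain
        }
        return true; // trust established; proceed to protection domain generation
    }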
The scheme for generation of protection domains (see Fig. 4) has been fully designed but only partially implemented in the Aglets platform. When an agent is dispatched, the Aglets platform attaches the certificates defined at agent creation to the serialized agent through SPKIAgletWriter. When the agent is received at the destination platform, SPKISecureClassLoader calls SPKIAgletReader to extract these certificates.
6
Related Work
Most Java-based agent platforms take advantage of the Java security models (especially the Java 2 version) to implement part of their security mechanisms. Among these platforms there are commercial ones, such as IBM Aglets and GMD Fokus Grasshopper, and academic ones, such as SOMA (University of Bologna) and Ajanta (University of Minnesota). These mobile agent platforms extend the Java Security Manager to provide a more flexible solution, adequate for agent platforms, and to implement protection domains that isolate mobile agents, preventing malicious attacks from them. The difference between the authorization schemes in these platforms lies in the information used to determine the set of access rights for an incoming agent. Aglets and Ajanta use only the agent owner's identity. Grasshopper uses access control policies based on the owner's identity or on the name of its group (group membership). In SOMA the principals are associated with roles that identify the operations allowed on the resources of the system. None of these approaches is really suitable for large-scale systems. Besides, it is important to note that the Java 2 access control model has some limitations that need to be analyzed. Instead of following the distributed
nature of its execution model, the Java 2 security model uses a centralized authorization scheme5. When running, each piece of code is labeled as belonging to one or more protection domains. Each domain has a set of permissions associated with it, taken from a policy configuration file. Therefore, this file defines a static mapping between each mobile component and the permissions granted to it for execution in the local environment. In addition to a number of difficulties related to programming, the development of a distributed and dynamic environment is constrained by limitations that stem from the concentration of trust in a single configuration file, which demands an up-front static definition of all distributed components in the system and their corresponding security attributes.

Agent authentication is essential for implementing an effective authorization scheme in mobile agent systems. The Aglets and Grasshopper platforms do not have mechanisms for mobile agent authentication. SOMA authenticates agents based on several data items contained in their credentials: domain and place of origin, the class which implements the agent, and the user responsible for the agent. Before migration, this information, the initial state of the agent, and its code are digitally signed by the user that creates the agent. When an agent arrives at a remote site, the authenticity of its credentials is verified by checking the signature of the agent's owner. The Ajanta platform uses a challenge-response authentication protocol with random nonce generation to prevent replay attacks, based on the signature of the agent's owner.

In comparison to the static model of Java 2 and to the platforms discussed above, our scheme has the advantage of decoupling privilege attributes (credentials) from control attributes (policies), its use of some Java security features notwithstanding. This means that, although a policy configuration file still needs to be statically defined, the proposed mechanisms add the flexibility offered by SPKI certificates to domain generation. That is, domains are dynamically defined when an agent aggregates to its credentials the delegated certificates received during its itinerary. Besides, in the agent authentication process described in section 5, the information used to determine an agent's set of access rights is based not only on the identity of the agent's owner, but also on the public keys of the owner and of the visited platforms, which avoids global name resolution in large-scale systems.

Two related proposals use SPKI/SDSI certificates to improve access control in Java 2. The first, developed by Nikander and Partanen [12], uses SPKI authorization certificates to delegate Java permissions that directly describe the permissions associated with a protection domain. In this work, the authorization tag of the SPKI certificate was extended to express Java permissions. This solution has the disadvantage of using modified SPKI certificates. The second work [18] proposes two improvements to access control in Java 2: the association of access control information with each mobile code segment (applet) as attributes, and the introduction of intermediate elements in the access control scheme for assisting
This centralization refers to the fact that all access control is done from a single configuration file that defines the whole security policy of a machine. Thus, there is only one ACL relating all subjects and objects in the machine.
A Security Scheme for Mobile Agent Platforms in Large-Scale Systems
115
the configuration of access control attributes of incoming mobile programs and of access control information located in the runtime environments. The SPKI/SDSI group mechanism, implemented through name certificates, makes these improvements possible. Molva and Roudier's work [18] provides details neither on how their proposal can be implemented nor on how to combine it with the current Java 2 security model. Neither of the works discussed above deals with mutual authentication between source and destination platforms, nor do they analyze the history of the visited platforms to establish trust in mobile code. Only the first proposal has flexibility characteristics similar to the ones proposed in the present work, in which the domains are formed according to the certificates delegated to an agent at its creation and throughout its itinerary. Nikander and Partanen propose that the search for forming new chains should be the responsibility of the server. However, as mentioned before, this is not in accordance with the SPKI/SDSI model.
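The dynamic domain formation described above can be illustrated with a minimal sketch. The names, the certificate structure, and the union semantics below are illustrative assumptions, not the authors' implementation: the point is only that the protection domain is derived from the certificates the agent has accumulated, rather than from a static policy file.

```python
# Hypothetical sketch: an agent's protection domain grows with the SPKI
# certificates delegated to it along its itinerary (union of granted rights).
def protection_domain(agent_key: str, delegated_certs: list[dict]) -> set[str]:
    perms: set[str] = set()
    for cert in delegated_certs:
        if cert["subject"] == agent_key:   # certificate delegated to this agent
            perms |= set(cert["rights"])   # the domain grows with the itinerary
    return perms

# Example: two certificates collected during the itinerary
certs = [
    {"subject": "pk-agent", "rights": ["read"]},
    {"subject": "pk-agent", "rights": ["read", "write"]},
]
print(protection_domain("pk-agent", certs))  # {'read', 'write'}
```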
7 Concluding Remarks
Security issues still hamper the development of applications with mobile systems. Current security mechanisms do not present satisfactory results for protecting mobile agent platforms. There are even more limitations when we consider large-scale systems, which impose stronger requirements with regard to flexibility and scalability. Our security scheme was motivated by the perception of these limitations and by a concern with security aspects specific to large-scale applications. Its purpose is to prevent mobile agent attacks against platforms, defining a procedure that employs a combination of prevention and detection techniques. This scheme is based on decentralized authorization and authentication controls that are suitable for large-scale systems due to the use of SPKI/SDSI authorization certificates. The mechanism of authorization certificate delegation allows a separation between agent credentials and security policy definition. The scheme for the generation of protection domains is more flexible than those of the related works. The work described in this paper, although not fully implemented yet, already presents satisfactory results. As soon as it is concluded, its performance will be properly evaluated. The prototype is currently being integrated into a distributed Internet-based application in order to demonstrate its usefulness [19]. Considering the protection of platforms from agents and of the communication channel, the proposed security scheme effectively mitigates the perceived security threats, although further work is still needed to define mechanisms for the protection of mobile agents against malicious agent platforms. Acknowledgments. The authors thank the IFM (Instituto Fábrica do Milênio) and "Chains of Trust" project (CNPq 552175/01-3) members for their contributions, especially Elizabeth Fernandes, Galeno Jung, Ricardo Schmidt, and Rafael Deitos. We also thank the anonymous reviewers for their helpful
comments. The first and second authors are supported by CNPq (Brazil). The third author is supported by CAPES (Brazil).
References
1. Vigna, G. (ed.): Mobile Agents and Security. LNCS 1419. Springer-Verlag (1998)
2. Farmer, W., Guttman, J., Swarup, V.: Security for mobile agents: Issues and requirements. In: Proc. 19th National Information System Security Conference (1996)
3. Jansen, W., Karygiannis, T.: Mobile agent security. NIST Special Publication 800-19, National Institute of Standards and Technology (1999)
4. Sun Microsystems: Java 2 SDK security documentation (2003). http://java.sun.com/security/
5. Levy, J., Ousterhout, J., Welch, B.: The Safe-Tcl security model. Technical Report SMLI TR-97-60, Sun Microsystems (1997)
6. Gray, R., Kotz, D., Cybenko, G., Rus, D.: D'Agents: Security in a multiple-language, mobile-agent system. In: Vigna, G. (ed.): Mobile Agents and Security. LNCS 1419. Springer-Verlag (1998) 154–187
7. Karnik, N.: Security in Mobile Agent Systems. PhD thesis, University of Minnesota (1998)
8. Ordille, J.: When agents roam, who can you trust? In: Proc. 1st Conference on Emerging Technologies and Applications in Communications (1996)
9. Necula, G., Lee, P.: Safe, untrusted agents using proof-carrying code. In: Vigna, G. (ed.): Mobile Agents and Security. LNCS 1419. Springer-Verlag (1998) 61–91
10. Ellison, C.M., Frantz, B., Lampson, B., Rivest, R., Thomas, B., Ylönen, T.: SPKI requirements. RFC 2693, Internet Engineering Task Force (1999)
11. Clarke, D.E.: SPKI/SDSI HTTP server / certificate chain discovery in SPKI/SDSI. Master's thesis, Massachusetts Institute of Technology (2001)
12. Nikander, P., Partanen, J.: Distributed policy management for JDK 1.2. In: Proc. 1999 Network and Distributed Systems Security Symposium (1999)
13. Santin, A., Fraga, J., Mello, E., Siqueira, F.: Extending the SPKI/SDSI model through federation webs. In: Proc. 7th IFIP Conference on Communications and Multimedia Security (2003)
14. Yee, B.: A sanctuary for mobile agents. In: Secure Internet Programming. LNCS 1603. Springer-Verlag (1997) 261–273
15. Karjoth, G., Asokan, N., Gulcu, C.: Protecting the computing results of free-roaming agents. In: Proc. 2nd International Workshop on Mobile Agents (1998)
16. Roth, V.: On the robustness of some cryptographic protocols for mobile agent protection. In: Picco, G.P. (ed.): Mobile Agents. LNCS 2240. Springer-Verlag (2001) 1–14
17. Morcos, A.: A Java implementation of Simple Distributed Security Infrastructure. Master's thesis, Massachusetts Institute of Technology (1998)
18. Molva, R., Roudier, Y.: A distributed access control model for Java. In: Proc. European Symposium on Research in Computer Security (ESORICS) (2000)
19. Rabelo, R., Wangham, M., Schmidt, R., Fraga, J.: Trust building in the creation of virtual enterprises in mobile agent-based architectures. In: Proc. 4th IFIP Working Conference on Virtual Enterprises (2003)
Privacy and Trust in Distributed Networks

Thomas Rössler and Arno Hollosi

Institute for Applied Information Processing and Communications (IAIK), Graz University of Technology, Inffeldgasse 16a, 8010 Graz, Austria
{thomas.roessler,arno.hollosi}@iaik.tugraz.at
Abstract. Today, distributed service frameworks play an ever more important role. Transitive trust is of great importance in such frameworks and is well researched. Although there are many solutions for building and transmitting trust in distributed networks, the impacts on privacy are often neglected. Based on a trust metric, it will be shown why insufficient trust is eventually inevitable when a request or message passes through a chain of services. Depending on the reaction of the service, privacy-critical information may leak to other entities in the chain. It is shown that even simple error messages pose a privacy threat and that proper re-authentication methods should be used instead. Several methods of re-authentication and their impacts on privacy are discussed.
1 Introductory Example
The following example illustrates the problems concerning privacy and trust relationships in a distributed network of services, in which the cooperation of several autonomous services is needed to process a particular task. Assume the Following Situation. A client would like to buy something offered by an online shop where the client is registered in the customer database. The buyer therefore signs a purchase order through the shop's online portal. The process involves several services of other instances, such as a service for checking the inventory of the chosen product or a service for handling the payment. Figure 1 depicts the constellation of services used in this example. After the user has been authenticated to the portal of the online shop (service A), he has to fill in some forms and enter some personal data. When this first step is completed, service A contacts the service which processes a new order (service B). This service validates and verifies the content of the order. First, it checks whether the desired product is still available for sale. This is done by sending a request to the inventory service (service F). Next, service B initiates the payment service (service C) to process the payment for the product. The payment is preferably made from the client's customer account, which allows the client to buy products within a certain credit line. Thus, service C contacts a service of the internal account system (service D). Assuming that the user does not have enough money in his account, the payment service tries to make the payment by a direct debit to the client's credit card institution by requesting
the corresponding service E. At this point, some problems concerning privacy and transitive trust arise:
– Because of an insufficient security level in the chain of services, service E may not accept the request.
– Depending on how service E reacts to this situation, private data about the client may be disclosed to other services and the client's privacy may be harmed.
Fig. 1. Example of distributed services. Legend: public service (request need not be personalized with the client's data); non-public service (request must be personalized with the client's data); optional path (service used in special cases).
If service E does not trust the request from service C, it is likely that an error message is sent back to the client. Passing this message back to the user through the service chain harms the user's privacy because each service learns about the error. In the worst case, the error message contains the information of its origin, service E. Thus, every service in the chain learns that the client does not have enough money and needs to use his credit card; otherwise it would not have been necessary to involve service E in the process. Moreover, by recording a client's habits, it would be possible, for instance, to recognize that his customer account is overdrawn at the end of every month. Instead of sending an error message in response to insufficient trust in the authentication, service E can request a re-authentication of the client. Again, similar problems and privacy threats arise in this case. In the next section, a trust model and an applicable trust algebra are described, based on the work of Audun Jøsang ([7],[8],[9],[10]). Based on this, a metric for determining the trustworthiness of a request inside a chain of services is discussed. By introducing and adapting trust values ([1],[2]) based on established criteria ([6],[4]), it is possible to decide either to trust or to distrust an incoming request. This leads to the necessity of re-authenticating requests if a service does not trust an incoming request. In the last section, privacy and threats to privacy are described. Error messages and re-authentication requests are considered critical to privacy, and so they are discussed in detail at the end of this paper.
Fig. 2. An example of chained trust
2 The Opinion Triangle

2.1 Definitions
Initially, the trustworthiness of each service has to be determined. For this purpose, every entity can be considered in two parts. On the one hand, the connection between two services has to be evaluated under the aspect of security. For example, a plain TCP/IP connection will result in a lower level of security than an SSL connection with client certificates. On the other hand, the service itself has to be evaluated. For this purpose, some established criteria already exist, e.g. the Common Criteria [6]. Such criteria not only consider the technical infrastructure and the system itself; they also take the technical and non-technical environment into account. After evaluating each service, the level of trust must be expressed in an applicable metric. To this end, in [10] and [9] the term opinion (ω) was defined as:

t + d + u = 1,  {t, d, u} ∈ [0, 1]^3    (1)
Definition (Opinion): Let ω = {t, d, u} be a triplet satisfying (1), where the first, second, and third components correspond to trust, distrust, and uncertainty, respectively. Then ω is called an opinion.

Corresponding to this definition, several levels of trustworthiness are mapped to different opinions. Equation (1) defines a triangle, which is depicted in figure 3. Every opinion ω can be described as a point {t, d, u} in the triangle. For example, there are five trust levels to distinguish (according to table 1), and each trust level can be found in the opinion triangle (fig. 3).

Table 1. Example of mapping between trust levels and opinions ω = {t, d, u}

t     d     u     trust level
0.00  0.95  0.05  distrust (-1)
0.10  0.10  0.80  ignorance (0)
0.40  0.10  0.50  minimum trust (1)
0.70  0.15  0.15  medium trust (2)
0.95  0.00  0.05  maximum trust (3)
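A minimal sketch of this definition, assuming nothing beyond equation (1) and Table 1, is given below; the class name and structure are illustrative, not part of the paper.

```python
# An opinion is a triplet constrained by t + d + u = 1 (equation 1); the five
# example trust levels of Table 1 are encoded as opinions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Opinion:
    t: float  # trust
    d: float  # distrust
    u: float  # uncertainty

    def __post_init__(self):
        assert all(0.0 <= x <= 1.0 for x in (self.t, self.d, self.u))
        assert abs(self.t + self.d + self.u - 1.0) < 1e-9  # equation (1)

TRUST_LEVELS = {                    # the mapping of Table 1
    -1: Opinion(0.00, 0.95, 0.05),  # distrust
     0: Opinion(0.10, 0.10, 0.80),  # ignorance
     1: Opinion(0.40, 0.10, 0.50),  # minimum trust
     2: Opinion(0.70, 0.15, 0.15),  # medium trust
     3: Opinion(0.95, 0.00, 0.05),  # maximum trust
}
```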
Fig. 3. Trust levels inserted into the opinion triangle
The advantage of using the opinion-based trust model instead of a simple trust-level-based model is that there are three parameters expressing trust instead of only one value. As will become apparent in the next section, these three separate values are not treated equally when different opinions have to be combined. This results in a model of distributed trust relationships that is adequate for the real world.
3 Subjective Logic
The algebra for determining trust is based on a framework for artificial reasoning called Subjective Logic, which has been described in Audun Jøsang's papers [8] and [9]. It defines various logical operators for combining opinions. In this section, only the most important definitions are quoted, e.g. the recommendation and consensus operators. Together, the operators of subjective logic allow joined entities to be examined under the aspect of trust.

3.1 Definition: Conjunction
If some entity has two different opinions about another entity, then the conjunction (∧) of these opinions may be useful.

Let ω_p^A = {t_p^A, d_p^A, u_p^A} and ω_q^A = {t_q^A, d_q^A, u_q^A} be entity A's opinions about two distinct binary statements p and q. Then the conjunction of ω_p^A and ω_q^A, representing A's opinion about both p and q being true, is defined by [9]:

ω_{p∧q}^A = ω_p^A ∧ ω_q^A = {t_{p∧q}^A, d_{p∧q}^A, u_{p∧q}^A}    (2)

where

t_{p∧q}^A = t_p^A t_q^A,
d_{p∧q}^A = d_p^A + d_q^A − d_p^A d_q^A,
u_{p∧q}^A = t_p^A u_q^A + u_p^A t_q^A + u_p^A u_q^A.    (3)
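A direct transcription of equations (2)-(3), building on the hypothetical Opinion class sketched above:

```python
# The conjunction operator of equations (2)-(3); the Opinion constructor
# re-checks that the result satisfies t + d + u = 1.
def conjunction(wp: Opinion, wq: Opinion) -> Opinion:
    return Opinion(
        t=wp.t * wq.t,
        d=wp.d + wq.d - wp.d * wq.d,
        u=wp.t * wq.u + wp.u * wq.t + wp.u * wq.u,
    )
```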
3.2 Definition: Recommendation
Recommendation (⊗) is needed if an entity A decides about the trustworthiness of something (p) based on trust recommendations given by a third party B. More formally:

Let A and B be two entities, where ω_B^A = {t_B^A, d_B^A, u_B^A} is A's opinion about B's recommendation, and let p be a binary statement where ω_p^B = {t_p^B, d_p^B, u_p^B} is B's opinion about p expressed in a recommendation to A. Then A's opinion about p as a result of the recommendation from B is defined by [9]:
ω_p^{AB} = ω_B^A ⊗ ω_p^B = {t_p^{AB}, d_p^{AB}, u_p^{AB}}    (4)

where

t_p^{AB} = t_B^A t_p^B,
d_p^{AB} = t_B^A d_p^B,
u_p^{AB} = d_B^A + u_B^A + t_B^A u_p^B.    (5)
It must be mentioned that ω_p^B is actually only the opinion that B recommends to A; it is not necessarily B's real opinion. The opinion about an entity's recommendation, e.g. ω_B^A, results from the conjunction of two separate opinions. On the one hand, there is entity A's opinion about the trustworthiness of entity B itself, called ω_{KA(B)}^A. On the other hand, entity A's opinion about the trustworthiness of the recommendations (recommendation trust) made by entity B has to be considered, given as ω_{RT(B)}^A. Applying the conjunction operator as defined above results in:

ω_B^A = (ω_{KA(B)}^A ∧ ω_{RT(B)}^A)    (6)
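Sketches of the recommendation operator (equations 4-5) and of the conjunctive recommendation term (equation 6), again building on the operators above:

```python
# The recommendation operator of equations (4)-(5).
def recommendation(wB: Opinion, wp: Opinion) -> Opinion:
    # wB: A's opinion about B's recommendation; wp: the opinion B recommends.
    return Opinion(
        t=wB.t * wp.t,
        d=wB.t * wp.d,
        u=wB.d + wB.u + wB.t * wp.u,
    )

# The conjunctive recommendation term of equation (6): the opinion about B
# itself combined with the opinion about B's recommendations.
def conjunctive_recommendation(w_ka: Opinion, w_rt: Opinion) -> Opinion:
    return conjunction(w_ka, w_rt)
```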
This term is also known as the conjunctive recommendation term [9].

3.3 Definition: Consensus
The consensus (⊕) operator is used to combine several independent opinions about the same statement; as a result, the certainty should increase.

Let ω_p^A = {t_p^A, d_p^A, u_p^A} and ω_p^B = {t_p^B, d_p^B, u_p^B} be opinions held by entities A and B, respectively, about the same statement p. Then the consensus opinion held by an imaginary entity [A,B], representing both A and B, is defined by [9]:
ω_p^{A,B} = ω_p^A ⊕ ω_p^B = {t_p^{A,B}, d_p^{A,B}, u_p^{A,B}}    (7)

where

t_p^{A,B} = (t_p^A u_p^B + t_p^B u_p^A) / (u_p^A + u_p^B − u_p^A u_p^B),
d_p^{A,B} = (d_p^A u_p^B + d_p^B u_p^A) / (u_p^A + u_p^B − u_p^A u_p^B),
u_p^{A,B} = (u_p^A u_p^B) / (u_p^A + u_p^B − u_p^A u_p^B).    (8)
The effect of the consensus operator is to reduce the uncertainty. Opinions containing zero uncertainty cannot be combined. Equipped with these three basic operations, it is possible to form a model for determining distributed trust in the web service scenario.
4 Chained Trust
As a basis for any calculation, each service must already have been assigned an opinion ω about its security level, preferably by an independent authority. This opinion is determined initially, when setting up the service, and has to be kept up to date. With some precautions, for example wrapping the opinion value into a signed certificate, the trust value can be sent within the requests. In any case, trust values (i.e. opinions about the trustworthiness of an entity) have to be propagated in the network.
Fig. 4. The problem of chained trust
Figure 4 depicts the stated problem: a request originating from service A is propagated through services B and C to D. Because of the security requirements of service D, there must be a mechanism to decide whether the request is trustworthy or not. This question is equivalent to determining service D's opinion about the trustworthiness of service A. Because of the indirect relationship between services A and D, the principle of recommendation is used. Let us consider the chained situation step by step. At first, service B has to decide about the trustworthiness of service A. This can easily be done by evaluating the opinion about service A's trustworthiness, ω_t^A, which preferably was attached to the request. Because of the direct trust relationship between A and B, ω_t^A is the value which enables
a decision. In the next step, assuming that service B considers A's request trustworthy, service B sends a request to service C. At this point, service C has to decide whether to trust or distrust the whole chain. For this, the subjective logic is needed. A direct trust relationship exists between B and C, and the opinion ω_t^B is received by service C through the request. However, between services A and C there is no such direct relationship. In order to decide about the trustworthiness of the chain, and hence about the trustability of service A, the recommendation operator is applied. In this case, the opinion ω_t^A about service A's trustworthiness is recommended to service C by the preceding service B. In the definitions stated by Audun Jøsang [9], there is a difference between the opinion ω_t^A about the trustability of an entity A and the opinion about the recommendations of an entity. Therefore, the recommendation operator as introduced in section 3.2 requires the so-called conjunctive recommendation term (equation 6), e.g. ω_B^A, which combines the opinion about the trustworthiness of a service itself and the opinion about its recommendations by applying the conjunction operator (as stated in 3.2). In this work, these two opinions are considered equal. This assumption is legitimate in this context because, in this scenario, if a service is not trustworthy and its trust value is correspondingly low, then the recommendations of this service should also be considered untrustworthy, and vice versa. Thus, the conjunctive recommendation term is built using only one opinion, and the term can be reduced to (eq. 9):

ω_B^A = (ω_{KA(B)}^A ∧ ω_{RT(B)}^A) = (ω_t^B ∧ ω_t^B) = ω_{t∧t}^B    (9)
Therefore, it is not necessary to define a separate opinion for recommendations, which simplifies the application of the recommendation operator. With this assumption, the trust relationship at service C can be calculated as follows:

ω_{t(A)}^{CB} = ω_B^C ⊗ ω_t^A = (ω_t^B ∧ ω_t^B) ⊗ ω_t^A = ω_{t∧t}^B ⊗ ω_t^A    (10)
In (10), C's opinion about the trustworthiness of service A consists of:
– the conjunction (∧) of C's opinion about B's recommendations and C's opinion about B's authenticity; in this setting, they are the same, namely ω_t^B;
– B's opinion about the trustworthiness of service A (ω_t^A).
With this result, service C is able to decide about the trustability of the chain. The same problem arises in the next step, when service D receives the request from the preceding service and has to decide whether to trust or to distrust the chain of services. Based on recommendations as mentioned above, service D calculates the opinion ω_{t(A)}^{DCB} in order to determine the trustability of service A through the chain of recommendations:

ω_{t(A)}^{DCB} = ω_C^D ⊗ ω_B^C ⊗ ω_t^A = (ω_t^C ∧ ω_t^C) ⊗ (ω_t^B ∧ ω_t^B) ⊗ ω_t^A = (ω_t^C ∧ ω_t^C) ⊗ ω_{t(A)}^{CB} = ω_{t∧t}^C ⊗ ω_{t(A)}^{CB}    (11)
The pattern of calculation is always the same. Moreover, it can be shown that this determination is recursive: with the opinion about the trustability of the chain so far, which was determined at the preceding service, and with the opinion about the trust relationship between the current and the previous service, the chain can be evaluated. We summarize this recursive approach to the calculation in the following lemma (12):

Lemma (Recursive Trust): Let A_1 … A_n be a chain of services, where service A_{n−1} makes some request to service A_n. A_{n−1}'s opinion about the trustworthiness of the chain so far is given by ω_{t(A_1)}^{A_{n−1}…A_2} and is attached to the request. A_n's opinion about the trustworthiness of the whole chain is:

ω_{t(A_1)}^{A_n…A_2} = ω_{A_{n−1}}^{A_n} ⊗ ω_{t(A_1)}^{A_{n−1}…A_2} = ω_{t∧t}^{A_{n−1}} ⊗ ω_{t(A_1)}^{A_{n−1}…A_2}    (12)
As a consequence of this recursive calculation, it is possible to evaluate the trustworthiness of the whole chain of services without having a complete list of all involved services; such a list would be a threat to privacy as well. Tracing the components of the opinion about the trustworthiness of the whole chain during its propagation (based on recommendation) leads to the conclusion that the trust component never increases, while the uncertainty generally becomes higher. Furthermore, at the end of the chain, depending on the particular opinions during the propagation, the uncertainty about a request may be too high for a service with strict security restrictions. This is the reason why such services either decline to act on the request and return an error message, or need the possibility to re-authenticate the client. In the next section we look at the privacy threats that arise from this situation.
5 Privacy and Re-authentication
Insufficient trust leads either to declining a request or to forcing a re-authentication. Webster's dictionary defines privacy as "freedom from unauthorized intrusion", which is also an adequate definition for this situation. Here, privacy means that the involved services should not gain more knowledge than is necessary.
Fig. 5. Errors in distributed services
In the example given in the introduction, the error message disclosed enough information to draw conclusions about the customer's financial situation. Error messages can therefore act as side channels. By analyzing similar processes initiated by different people, it is possible to establish the standard workflow. Some client's request, however, causes an error message or a re-authentication request due to a too low security level in the chain of services (service E in our example). With this information and with enough knowledge about the process, it is possible to draw conclusions about the involved services. In addition, it is possible to gain information about the user's request and about the user himself. This is why it is crucial to react carefully in such a situation. There are two possible reactions:
– replying with an error message
– starting a re-authentication procedure

Chain History. The question arises how and when information is propagated in order to reach previous services or the original client directly. One possibility is adding a chain history of preceding services to every request. The benefits of this approach are obvious: the original client is known to every service, and each service can decide to trust the request based on the history of the request instead of calculating a level of trust. But adding a history of all involved services to every message not only increases the header of such messages, it also harms the user's privacy: every involved service learns about all preceding services. In our example, service E (the credit card institution) would learn about the user's contact with the online shop portal represented by service A, and that the user is going to buy something but does not have enough money (services B and C, respectively). It seems that harming privacy is too high a price for the benefits of a chain history. Thus, a request or message should contain information about the sending and the receiving service only. No service should get more information about the process automatically; more generally, a service should get as much information as needed but not more than absolutely necessary.

5.1 Error Messages
The minimum reaction is to return an error message to the preceding service. In the situation depicted in fig. 5, service E rejects the request from service C due to security considerations. Therefore, service C receives a corresponding message. Depending on how detailed the error message is, the receiving service will react. In the worst case, the error is reported backwards through the whole chain to the original client. On the one hand, in order to give the user as much help as possible, the error message should be very detailed. On the other hand, a detailed error message, which in the worst case passes through every entity in the chain, gives away all desirable information about the whole process and the user himself. For this reason, such messages should be encrypted with the client's public key. This prevents disclosing detailed information to any third party. But already the mere occurrence of an error message conveys enough information. Nevertheless, there must be at least a message about the unsuccessfully terminated process that has to be sent to every involved service in order to stop the process. It is preferable to use a solution where services try to fulfill the request without returning an
error message. The usage of re-authentication is an attempt to do so; in some cases it may be the only practical way in a distributed service framework.

5.2 Re-authentication Requests
A re-authentication request is sent to the user in order to authenticate the request for a dedicated service. It is also possible that, instead of the user himself, a trusted third party is allowed to sign the request on behalf of the user. A re-authentication request has to contain at least the following data (see the sketch below):
– a pseudo-random stream or a digital fingerprint of the sending service (hash value)
– a time-stamp or nonce to prevent replay attacks
– an explanation of the receiving service and the purpose of this service in plain text (readable by the user who has to sign it)
– a signature over the request with the private key of the sending service, in order to prove the origin of this request
The time-stamp or nonce is needed to prevent the manipulation of a service with a replayed re-authentication request aimed at gaining confidential information about a client. This component is essential for security and is common practice in security technologies. The additional text of the request has to contain detailed information about the service which wants the user to re-authenticate himself. The user must be able to recognize the circumstances of this request in order to decide correctly. Furthermore, the explanation in the request must point out the consequences and results of signing and executing the request. Finally, the whole re-authentication request has to be signed by the sending service in order to prove the origin of the message. With such a detailed request, the client will be well informed and able to decide whether or not to grant the permission by re-authenticating. Re-authentication requests can be split up into synchronous and asynchronous requests. In a synchronous request, the re-authentication request has to be fulfilled in time; the requesting service waits until the request is sent back. This is a viable option only if it can be assumed that this will happen within a certain time frame. In contrast, an asynchronous request leads to a temporary interruption of the process, because it is not predictable when the re-authentication request will be sent back. If too much time elapses between starting the process and answering the re-authentication request, problems may arise concerning time restrictions for some outstanding requests in the chain. Therefore, a synchronous re-authentication should be the first choice. The re-authentication can be carried out by four means, as follows.
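A hedged sketch of the minimum request contents listed above; all names are hypothetical, and 'sign' stands for the sending service's private-key signature routine (not specified by the paper):

```python
# Minimum contents of a re-authentication request, signed by the sender.
import hashlib, json, os, time

def build_reauth_request(service_key: bytes, explanation: str, sign) -> dict:
    body = {
        "service_fingerprint": hashlib.sha256(service_key).hexdigest(),
        "nonce": os.urandom(16).hex(),   # prevents replay attacks
        "timestamp": int(time.time()),
        "explanation": explanation,      # plain text, readable by the user
    }
    # sign over a canonical serialization to prove the origin of the request
    body["signature"] = sign(json.dumps(body, sort_keys=True).encode())
    return body
```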
Out-of-Band Re-authentication. When using this method, every service contacts the user directly. This implies the requirement that every service has to get information about how to reach the user in an out-of-band way. Therefore, it is necessary to add some information, like the client's e-mail address, to the request.

Fig. 6. An example of an out-of-band re-authentication
From the point of view of privacy, this is the best solution, as services do not gain any additional information by adding (possibly temporary) contact information to the requests. And in case of errors or re-authentication, no information is disclosed to other services in the chain (see figure 6). Assuming that the client signs the request immediately, no other service in the chain will notice the re-authentication. The drawback is that a re-authentication in this way will most likely be asynchronous (for example, using e-mail); it is not very likely that all users have their own server running which services can contact for re-authentication.

Roll-Back Re-authentication. Using this method, the re-authentication request is passed step by step back to the client. Beginning from the last service, e.g. service D, a re-authentication request goes through the whole chain backwards until a trustworthy service or the user itself is reached. The request is then signed and sent back (fig. 7). Each entity in the chain will learn that something is going on, and depending on the content of the request, private information may be disclosed.

Fig. 7. Roll-back re-authentication
Using this method, it is very important to encrypt the content of the re-authentication request; otherwise, every involved service that transmits the request gains additional information about the process. But even if the content of the request is encrypted and the identity of the requesting service is masked through some session-id, privacy is threatened. In our introductory example, when service E issues service A an encrypted re-authentication request, service A can reason that the request originated from service E, as no other
service in the workflow would issue such a request. Thus, service A learns that the user must have some problems with his financial situation. From the aspect of privacy, roll-back re-authentication is not the best solution. In contrast to the out-of-band mechanism, this re-authentication is a synchronous way to get in contact with the user, because the client is already logged in to service A and the re-authentication request is eventually presented to the user through service A.

Ticket-Server Solution. Similar to MS Passport or the Kerberos authentication system ([12],[11]), this solution uses an additional ticket server (TS). As depicted in figure 8, in parallel to accessing the shop's online portal (service A), the user signs in at the ticket server. Whenever an authentication is needed, the service in question contacts the ticket server.
Fig. 8. Re-authentication by using a ticket server
After the user has successfully signed in at the ticket server, the ticket server is allowed to perform re-authentication requests by signing these requests on behalf of the user. For this purpose, the server replies with tickets to the requesting services. The ticket itself is signed by the ticket server using its private key. In order to prevent unrelated services from requesting an authentication ticket from the server, the server has to generate a session-id which the user ties to his request to service A by using cryptographic techniques. Otherwise, the ticket server would have to make plausibility checks on the re-authentication requests, which is impractical in all but the trivial cases. Furthermore, the ticket server would need more information than a session-id; this, in turn, may create new privacy problems. The main advantage of this solution is that, in the case of a required re-authentication, no other service will learn about it. Also, the ticket server itself has no idea about the other involved services which do not require re-authentication, or about the whole process as such. Moreover, this solution is very comfortable for the user, because he is not burdened with the re-authentication; hence, there will be no additional time delay caused by the client while answering the request, and the re-authentication can be made synchronous. The drawback is that the user has to have absolute trust in the ticket server itself: after all, the ticket server acts on his behalf. Therefore, this server must be maintained by a trustworthy, independent party. Of course, such a server will be a prime target for attacks.
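An illustrative sketch of the ticket-server idea just described; this is not the paper's protocol, and 'sign' is an assumed private-key signing callback. The session-id gate corresponds to the plausibility concern above: tickets are issued only for sessions established at sign-in.

```python
# A ticket server that signs re-authentication requests on the user's behalf,
# but only for known session-ids established at sign-in.
import os

class TicketServer:
    def __init__(self, sign):
        self.sign = sign
        self.sessions: dict[str, str] = {}   # session-id -> user

    def sign_in(self, user: str) -> str:
        sid = os.urandom(16).hex()
        self.sessions[sid] = user            # the user ties this id to his request
        return sid

    def issue_ticket(self, sid: str, reauth_request: bytes) -> bytes:
        if sid not in self.sessions:         # rejects unrelated services
            raise PermissionError("unknown session-id")
        return self.sign(reauth_request)     # ticket, signed on the user's behalf
```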
Communication Server Solution. This solution is similar to the ticket server solution. However, here the so-called communication server (CS) does not act on behalf of the user, but is only used as a contact and communication point (fig. 9). This scenario does not suffer from the drawbacks of the previous solution.
Fig. 9. Re-authentication through a web server
The user contacts the CS and is reachable through the CS for the duration of the process. For example, if the services have web interfaces, the communication server can be a special website. If a service wants to communicate with the client, e.g. because a re-authentication request is needed, it sends a request to the communication server. This implies that the services need to be aware of the IP address of this web server; thus, its address has to be propagated within the requests. Besides the IP address, the requests may also contain some session-id in order to make it easier for the communication server to classify incoming requests. Neither piece of information provides additional private information about the user, and thus they are not privacy-critical. The communication server passes the request on to the user, who then signs the re-authentication request. The signed response is sent back directly to the requesting service. This solution is similar to out-of-band re-authentication; here, the request is sent to a communication server instead of the client's mailbox. The request is displayed directly, for example through a web page, and the client can sign the request immediately. Therefore, the re-authentication is synchronous, which is the difference to a common out-of-band solution. Apart from this difference, the content of the re-authentication request itself is quite similar to the other solutions. The communication server could also be used to inform the client about the current status of the process or to send him error messages. Generally, the communication server allows communication with the client without harming his privacy. The only precondition is that the communication server is trustworthy and is run by some reliable party.

5.3 Practical Aspects
The models and problems described in this section are crucial for applications in an e-government environment. Here too, distributed services are used to process transactions initiated by a client. Furthermore, because of the sensitive data involved in governmental transactions, protecting the client's privacy is of paramount importance.
Of the solutions discussed in the previous section, out-of-band re-authentication cannot be used, as some services require a synchronous re-authentication possibility. Roll-back re-authentication discloses too much information and should not be used in a privacy-sensitive environment. That leaves the ticket server and communication server methods as options. However, such a server acts as a central authentication authority for whichever governmental service the user accesses; thus, such a server could be used to profile the user's actions. In addition, data protection laws may forbid running such a service in the context of governmental processes. How can this situation be resolved? Depending on the circumstances, three solutions exist:
1. using a private communication server
2. using different authentication authorities
3. minimizing the need for re-authentication
Ad 1). If it can be assumed that the user has access to a private communication server (possibly run by a third party), this server can be used to solve the problem. Again, this is a central approach, but the point is that this server is unrelated to the accessed services, and it is a conscious decision on the user's part to use and trust that server.
Ad 2). The second solution is to use many different authentication authorities instead of only one. For example, the first service in a process could act as an authority. Furthermore, the user might choose different authorities for accessing different services, as intended by the Liberty Alliance specifications [5] and the federated Single Sign-On approach [3]. This clearly stands out against a central approach, but has the drawback of the possible privacy violations mentioned earlier.
Ad 3). The third solution is to minimize the need for re-authentication. This can be achieved by digitally signing the request and binding it to the current session. The signature can be verified by every service in the chain, and thus the user can reliably be authenticated at each service. To prevent sending the complete request and disclosing too much information to each service, the request could be split into separately signed parts. Alternatively, the parts could be partially encrypted with the targeted service's public key. However, signing and splitting the request is not possible in all cases.
To sum up, every solution has its benefits and drawbacks and should be applied according to the situation at hand. Based on our experience, a sensible combination of the proposed solutions may solve most problems.
6 Conclusion
In this paper, we analyzed privacy threats in distributed service frameworks. First, we introduced a metric to calculate security in such frameworks: the term opinion was defined, and an algebra based on recommendation
relationships was given. With this, it is possible to determine the trustworthiness of a request without having information about all involved services. Determining trust stepwise through a chain of services led to the conclusion that the trustworthiness of a request decreases while the uncertainty becomes higher. Thus, a request may eventually be considered untrustworthy. In order to complete the request successfully, the necessity for some re-authentication mechanism arises. Furthermore, we have shown how the client's privacy can be harmed by simple error messages. The mere existence of error messages, combined with knowledge of the workflow, may disclose private information to others. Next, several methods of re-authentication and their impact on privacy were discussed. A consequence of this analysis is that messages should contain only the absolute minimum of information necessary for the services to function correctly; any more data may harm the client's privacy. Furthermore, encryption should be used wherever sensible, so that information can be passed to services further down the chain without disclosing it to intermediate nodes. It has also been shown that introducing a trusted third party may have substantial benefits from the point of view of privacy. We hope that other researchers follow suit and consider privacy when designing methods for distributed service networks.
References
1. A. Abdul-Rahman and S. Hailes: A distributed trust model. In: Proceedings of the New Security Paradigms Workshop '97, 1997.
2. A. Abdul-Rahman and S. Hailes: Supporting trust in virtual communities. In: Proceedings of the 33rd Hawaii International Conference on System Sciences, Maui, Hawaii, 2000.
3. J.D. Beatty et al.: Liberty Protocols and Schemas Specification 1.0. Liberty Alliance, 2002.
4. ECSC-EEC-EAEC: Information Technology Security Evaluation Criteria (ITSEC), 1991.
5. J. Hodges et al.: Liberty Architecture Overview 1.0. Liberty Alliance, 2002.
6. International Organization for Standardization (ISO): Evaluation criteria for IT security (ISO/IEC 15408:1999), 1999.
7. A. Jøsang: The right type of trust for distributed systems. In: C. Meadows (ed.), Proceedings of the 1996 New Security Paradigms Workshop, 1996.
8. A. Jøsang: Artificial reasoning with subjective logic. In: Abhaya Nayak (ed.), Proceedings of the Second Australian Workshop on Commonsense Reasoning, 1997.
9. A. Jøsang: An algebra for assessing trust in certification chains. In: J. Kochmar (ed.), Proceedings of the Network and Distributed Systems Security (NDSS'99) Symposium, 1999.
10. A. Jøsang: Trust-based decision making for electronic transactions. In: L. Yngström and T. Svensson (eds.), Proceedings of the Fourth Nordic Workshop on Secure IT Systems (NORDSEC'99), Stockholm, Sweden, 1999.
11. J. Kohl and C. Neuman: The Kerberos Network Authentication Service (V5). RFC 1510, 1993.
12. Microsoft Corporation: Microsoft .NET Passport – Technical Overview, 2001.
Extending the SDSI / SPKI Model through Federation Webs*

Altair Olivo Santin(1,2), Joni da Silva Fraga(1), and Carlos Maziero(2)

(1) Federal University of Santa Catarina – DAS/CTC/UFSC, C.P. 476, CEP 88040-900, Florianópolis, Brazil
{santin,fraga}@das.ufsc.br
(2) Pontifical Catholic University of Paraná – PPGIA/CCET/PUCPR, R. Imaculada Conceição 1155, CEP 80215-901, Curitiba, Brazil
{santin,maziero}@ppgia.pucpr.br

* This project has been partially supported by the Brazilian Research Council (CNPq), under grant 552175/2001-3.
Abstract. Classic security systems use a trust model centered on the authentication procedure, which depends on a naming service. Even when using a Public Key Infrastructure such as X.509, such systems are not easily scalable and can become single points of failure or performance bottlenecks. Newer systems, with a trust paradigm focused on the client and based on authorization chains, such as SDSI/SPKI, are much more scalable. However, they make it difficult to locate the chain linking the client to a given server. This paper defines extensions to the SDSI/SPKI authorization and authentication model, which allow the client to build new chains linking it to a server when the corresponding path does not exist.
1 Introduction

In the classic view of authentication and authorization in distributed systems, the naming service centralizes the authentication procedure, restricting its action to the local naming domain. On the other hand, authorization mechanisms are generally implemented in a distributed way. This model, usually adopted in corporate networks, grows in complexity when applied to the whole Internet. In order to overcome the scalability limitations, it is necessary to define inter-domain trust relationships, allowing the coverage of a global naming space. Under such circumstances, the management of these relationships may become a difficult task. Public Key Infrastructures (PKI) offer means to carry out authentication in a global context. The X.509 PKI, for example, adopts a global naming system (X.500), which is based on a hierarchical trust model formed by Certification Authorities (CA). In this model, the authentication chains start from a root CA and lead to a principal (a user, for example). Although the X.509 PKI is widely used, its global model faces
difficulties in adjusting to each country's legislation, and can also be difficult to use due to its complex and inflexible scheme. In other words, trust models based on a centralized entity (a naming/authentication service), besides representing critical points regarding faults and vulnerability, may impose restrictions on performance and system scalability in large-scale environments [1]. Internet applications must be developed taking into account authentication and authorization models in which trust relationships can be established in a flexible, scalable, and distributed way. The Pretty Good Privacy (PGP) mechanism, employed to encrypt and authenticate computer files and e-mail [2], adopts a structure for key and certificate management based on a web of trust. When compared to the X.509 hierarchies, the PGP web of trust, built up in an arbitrary way, is quite flexible and very well adapted to the features of the World Wide Web. However, choosing such weight-based models leads to difficulties in making trust decisions, as multiple signatures can be demanded in a single certificate to assure credibility. In egalitarian trust models, whose main purpose is to adapt authentication and authorization to the distributed worldwide network environment, i.e. the Internet, the trust management concept has been proposed mainly as a paradigm focused on authorization [3]. Trust management unifies the concepts of security policies, identification, access control, and authorization. Two different approaches following this concept are found in the technical literature. In the first one, trust management is set up using a language for describing authorizations and credentials, and an engine that defines the compliance checker module; PolicyMaker and KeyNote [4] are systems that use this approach. The concept of trust management can also be implemented using a standardized information structure that allows the description of both credentials stating authorizations and security policies; the Simple Distributed Security Infrastructure / Simple Public Key Infrastructure (SDSI/SPKI) is a good example of this approach. The SDSI/SPKI infrastructure was motivated by the complexity of the X.509 global naming scheme. SDSI [5] is a security infrastructure whose main purpose is to make the building of secure distributed systems easier. SPKI [6] is the final result of concentrated efforts on the design of a simple and well-defined authorization model. As they have complementary purposes, the SPKI and SDSI proposals were combined, resulting in a unified base for authentication and authorization in distributed applications. The main difficulty in SDSI/SPKI is to find an authorization chain that certifies that a given principal (client) is granted permission to access an object or a service in the distributed system. Several architectures and algorithms have been proposed to help a client search for a certificate chain. However, none of these proposals offers alternatives to the client when a search for a certificate chain is unsuccessful (i.e. no certificate chain is found between the client and the server). This work presents a new approach to using trust chains for authentication and authorization in large-scale distributed systems. The SDSI/SPKI trust model is extended through the notion of federations, in order to simplify certificate management, as well as to establish new trust relationships in large-scale systems.
Federations define domains in which trust relationships among principals exist, providing
mechanisms that allow principals to compose global trust relationships. Therefore, in the absence of a given authorization chain, principals can locate certificates in the federation web and then negotiate the concession of privileges in order to establish a new authorization chain. This paper is structured as follows: section 2 briefly summarizes SDSI/SPKI; section 3 introduces the proposed extensions to the SDSI/SPKI model; section 4 explains how new authorization chains can be established; section 5 details the prototype implementation; section 6 summarizes related work; and finally section 7 outlines some conclusions.
2 Overview of SDSI/SPKI

SDSI/SPKI defines an egalitarian trust model: principals are public keys that can sign and issue certificates, similarly to a Certification Authority (CA) in an X.509 PKI environment. In the current version of SDSI/SPKI, two different types of certificates are defined: one for names and the other for authorization. A name certificate links names to public keys, or even to other names. The names described in a name certificate are meaningful only within the naming space of the certificate issuer. The concatenation of the public key of the certificate issuer with a local name represents a SDSI/SPKI unique global identifier. A certificate is always signed with the private key of the issuer. The SDSI/SPKI names and naming chains are used only to ease the search for the real principal identifiers: the public keys. When names need to be resolved, the naming chain must be examined in order to reach the corresponding public key. The procedure of resolving the naming chain to reach the real certificate name is called "naming chain reduction". SDSI/SPKI authorization certificates link authorizations to a name, to a special group of principals (called threshold subjects), or to a public key. Through these certificates, the issuer can delegate access permissions to other principals (other public keys) in the system. SDSI/SPKI authorization certificates can be used for two different purposes. If the delegation bit is off (delegation not allowed), the received privileges cannot be delegated (forwarded); in this case, the subject (principal) should keep the authorization certificate as "private", i.e. only that principal can use it. If the delegation bit is on (delegation allowed), the subject holds a "public" authorization certificate, enabling it to delegate (grant) access privileges: keeping them for private use, passing them on to a third party, either as a whole or partially, or both [7]. For the access control procedures, the rights granted through consecutive delegations (authorization chains) must be reduced/summarized into a single certificate containing the intersection of all the privileges granted to that subject, in a procedure called "authorization chain reduction". Fig. 1 shows the authorization flow in the SDSI/SPKI trust model. Through the delegation of privileges from the application server, authorization chains are built, ensuring trust paths between the server and the clients. In Fig. 1, clients A and B,
after receiving the certificates, have authorization chains allowing them to access the server. Authorization chains are usually built arbitrarily. The privilege owner must keep the corresponding certificate and present it to the server when accessing the protected resource. Based on this, one can state that the trust model adopted by SDSI/SPKI is focused on the client.
Fig. 1. SDSI/SPKI authorization flow (trust model focused on the client). Legend: SELF is a reserved word, used only in ACLs issued by the authorization chain checker; PK_x denotes the public key of x.
3 A Trust Model Proposal Based on SDSI/SPKI Extensions

This section presents the proposal of an extension to the SDSI/SPKI trust model which allows building new authorization chains. The proposed trust model is based on the concept of a federation, which emphasizes the grouping of principals with common interests. The purpose of a federation is to assist its members in reducing principal names and in building new authorization chains, through the federation's Certificate Manager (CM). By joining a federation, principals get access to the federation facilities, and new trust relationships among these principals can be established. In this sense, the SDSI/SPKI trust model is extended by adding a Certificate Manager. The CM offers a certificate search alternative, either for name reduction or for creating new authorization chains.
Fig. 2. SDSI/SPKI extended trust model
Fig. 2 shows the CM integrated into the classic SDSI/SPKI model. In this figure, client B stores its public certificates in the federation certificate repository. Through a search in the CM repository, client A, which has no access to server S, can identify a principal (client B) in the federation holding such a privilege. Client A can then negotiate with client B in order to receive this privilege. The presence of a client in distinct federations allows this client to easily access the public authorization certificates held by members of those federations. However, the number of federations a client must join in order to have an acceptable visibility in the worldwide network can also be considered a scalability problem. The scalability requirements are achieved in the proposed model by associating federations. The certificate managers can be associated with each other, linking those that can best represent the needs of their members. Such associations are made through trust relationships, constituting federation webs (Fig. 3). This approach frees clients and servers from joining a considerable number of federations to achieve global scope. Fig. 3 illustrates how the entities constituting a federation web are organized. Client authorization certificates (private and public) are stored in a local repository under the responsibility of an agent that represents the principal in its local domain. Clients make the name certificates issued by their corresponding principals, as well as their public authorization certificates, available in the CMs of the federations they belong to. The certificates available through CMs are used in the search for potential issuers of delegable permissions.

Fig. 3. Federation web overview
One can notice that there is no centralization or hierarchical arrangement in the proposal. The federation webs are arbitrarily formed and do not play any active role in the authorization chains; in other words, they just carry out support roles in the authorization procedures. A federation is basically composed of three entities: clients, servers, and the certificate manager, which are explained in the following topics.
3.1 Certificate Manager
The main purpose of the certificate manager is to facilitate the interaction between clients and servers. A certificate manager only serves the principals that belong to its own federation; the public keys of its members constitute a SDSI group. As the CM does not actively participate in any authorization chain, it is not seen as a principal: it is basically a repository of public certificates. For an ordinary principal to participate in a federation, an endorsement in the form of a threshold certificate is demanded from it [8]. The threshold certificate signature depends on k-out-of-n federation members; each federation defines the number of members (k) that must sign the endorsement request. When joining a federation, the principal's name certificate is included in the federation repository. The federation's certificate manager stores name certificates in order to ease principal identification (Sect. 3.3). Every new member joining the federation is issued a name certificate stating SDSI group inclusion, for membership-proving purposes. The creation of associations among federations (federation webs) is also interpreted as membership inclusion in the SDSI groups of each federation involved. In this case, the new member (the other federation) is recognized as a group defined and administered within another naming space, according to the definition of SDSI groups [5]. The certificate manager should therefore manage the information related to the members and associations of its own federation. This manager has the ability to include or exclude members and associations with other federations, watching for any conflicts of interest. Procedures for storing and retrieving name and authorization certificates are made available to federation members through standard interfaces offered by the federation's CM.

3.2 Clients and Application Servers
The client represents the principal who creates name certificates, propagates authorization certificates by delegation, takes part in threshold certificates, requests access, and composes new chains. The storage and retrieval of certificates in the client naming space is the responsibility of the client's agent (Fig. 3). This agent is a program that manages the certificates available in the local repository. Its tasks include checking and producing signatures, searching for certificate chains, negotiating permission grants, issuing new authorization certificates, and maintaining local name consistency. The agent must be instantiated during the client's lifetime; it interacts with the client through a binding to its operational interface. The application server implements the service objects, which are protected by SPKI ACLs kept by a guardian. In order to perform delegations and negotiations to propagate permissions, the server can also make use of an agent. In the certificate reduction procedure, the server can issue authorization certificates to clients that
present new delegation chains, and/or include the public keys of these clients in the guardian's ACLs.

3.3 Authentication, Authorization, and Auditing in the Model
In SDSI/SPKI, principals are identified not by names but by public keys, and authentication is done through digital signatures. In order to check a digital signature at the destination, the principal's public key must arrive there securely. As there is no public key distribution entity in the SDSI/SPKI infrastructure, the public keys demanded by an authentication procedure are made available through authorization certificate chains. Mutual authentication is achieved in SDSI/SPKI on an authorization chain basis. The client making a request to a server must sign it and send it along with the authorization chain that grants the required access privileges. The authorization chain associated with a request is checked by the resource guardian upon arrival. When this verification succeeds, the guardian uses the last key in the authorization chain (the client's key) to check the digital signature on the request. If this check also succeeds, the client's authenticity is verified. Every authorization certificate carries, in its issuer field, the public key of the principal signing that certificate. Therefore, to authenticate a server (always expressed as a public key starting an authorization chain), the client should obtain the server's name certificate, retrieved from a federation web. After that, the client uses the certificate's public key to validate the server's signature in the authorization chain. When all the mentioned procedures complete successfully, the server identity is assured. All accesses to the server, identified by public keys, are locally logged, and these log records can be used for auditing purposes. If needed, a search for the corresponding name certificate can be performed on the federation web to identify the principal behind the public key that performed a given access. The entire authentication and authorization scheme described in this section complies with the SDSI/SPKI specifications.
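The checks just described can be summarized in the following sketch. It is illustrative only: the certificate fields and the verify_signature helper are hypothetical placeholders, and a faithful implementation would apply the tuple-reduction rules of RFC 2693 rather than this simplified pairwise walk.

# Illustrative sketch of the guardian's checks of Sect. 3.3 (all names
# hypothetical). chain is the ordered list of authorization certificates
# sent along with the signed request.
def guardian_accepts(request, chain, acl, verify_signature):
    # 1. The chain must start at a key that the ACL authorizes.
    if not chain or chain[0]["issuer"] not in acl:
        return False
    # 2. Each certificate must be validly signed by its issuer and must
    #    delegate to the issuer of the next certificate in the chain.
    for cert, nxt in zip(chain, chain[1:]):
        if not verify_signature(cert, cert["issuer"]):
            return False
        if cert["subject"] != nxt["issuer"] or not cert["delegable"]:
            return False
    if not verify_signature(chain[-1], chain[-1]["issuer"]):
        return False
    # 3. The last key in the chain (the client's key) must have signed
    #    the request itself, which establishes the client's authenticity.
    client_key = chain[-1]["subject"]
    return verify_signature(request, client_key)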
4 Creation of New Authorization Chains in the Proposed Model
There are several related experiences regarding procedures for searching SDSI/SPKI certificates [9,10,11,12]. However, in all these approaches, if a certificate chain is not found, the search finishes by reporting an exception (fault), and the client is unable to perform the desired access. This work, through the federation notion, proposes a scheme that enables a client to locate a certificate holding the needed authorization in a federation web. The client can then negotiate this grant with the privilege holder, in order to build an authorization chain that makes the desired access possible. The scenario detailed below considers the situation depicted in Fig. 3. At first, an authorization certificate is stored in the CM of federation X, after being propagated from server S to client A (A is a member of federation X).
Figure 4 depicts the messages exchanged in the case in which a chain between client B and server S does not exist. Client B, a member of federation Y, starts by requesting access to server S (message m1 in Fig. 4). Server S replies by sending a challenge message back to B. In this message (m2), server S indicates the ACL protecting the requested object and asks client B to prove its authorization for the requested access. In this case, the SDSI/SPKI ACL data is effective in accelerating the searching process.
Fig. 4. Message exchanges in the authorization chain compounding
Having the ACL, B's agent performs a local search for an authorization chain that links client B to server S and allows the requested access. This search must retrieve all the authorization chains that include the required permission and have the requested server as the issuer. Supposing that the local search turns out to be unsuccessful, B's agent asks the CM of the federation it belongs to (Y) to search for authorization certificates holding the required rights for accessing server S (m3). The attributes considered in the search are the required permissions and the public key of server S. In the case considered in Fig. 4, the search does not return any authorization chain. In this situation, the CM of federation Y returns to client B, as the result of the search, the certificates belonging to members of the associated federations (federation X, in Fig. 3), so that it can keep on searching (message m4). Having the associated certificates, the client extends its search to the CMs of the other federations belonging to the federation web. Message m5 corresponds to the query on federation X in the considered example. In message m6, client B receives from X's CM a chain: the authorization certificate with the access permission between client A and server S. Client B then sends the delegation right request to the rights holder, A (message m7). The delegation of permissions can be carried out in a simple way, for example because both the client and the rights holder share the same federation. However, depending on the application semantics, more complex negotiations may be demanded. Fig. 4 represents this situation: the requested rights holder notifies client B about a set of requirements for the permission concession (message m8). The client gathers the demanded requirements and sends them to client A (message m9). Once the application requirements are satisfied, the rights holder issues a certificate granting permissions to client B and sends it in
message m10. By this last message, the chain compounding process is concluded, and client B can answer the challenge proposed by server S in the response message m11.

4.1 Example: Internet Commerce Application
This section depicts an example scenario that illustrates the usage of federation webs and synthesizes the proposed scheme. The scenario is built upon a Web-based e-commerce application and involves the location and negotiation of access privileges. One should notice that the proposed scheme is quite general and can be applied to distinct situations. In order to simplify the presentation, let us consider a credit card institution (CC) and a banking institution (BK) that have a business agreement facilitating financial transactions between each other. Based on this agreement, the CC representative grants to the BK representative the right to “allow purchase”: the bank representative can allow purchases whose payments are to be charged to credit cards issued by the credit card institution. The BK representative, whenever it receives an authorization certificate with the delegation flag on, stores it in the CM of the federation FB.
Fig. 5. Scenario for purchases on the Internet using the federation web
Table 1 describes the messages (numbered in Fig. 5) exchanged between the entities in order to implement the purchase transaction at the e-commerce site.
Table 1. Messages exchanged in the Fig. 5 scenario

Step  Actions
1     Client B navigates through the web pages offered by the e-commerce server S. After selecting some items to purchase, client B proceeds to checkout.
2     Server S sends back to the client a message containing the purchase bill and a challenge: the “allow purchase” privilege holder is the CC representative.
3     Client B queries its local repository and finds no chains linking it to the CC representative. Then, client B sends the chain query to the CM of federation FB.
4     The CM of federation FB searches its repository and finds the required chain. It sends back to client B the chain between server S and the BK representative.
5     Client B requests from the BK representative the delegation of the privilege “allow purchase”.
6     The BK representative notifies client B that delegating the requested privilege requires paying the bill using one of the payment options offered by BK.
7     Client B pays the bill using one of the options offered by BK.
8     The BK representative delegates the “allow purchase” privilege to client B.
9     Client B sends the authorization chain to server S, along with the request (in a response message), and the server concludes the purchase transaction.
In order to monitor the “allow purchase” privilege delegations, the CC representative receives a copy of all paid purchase bills from the e-commerce site. In the scenario described here, no authorization chain existed linking the CC representative to client B. However, the mechanisms proposed by the federation web model allowed the requested authorization chain to be created dynamically and automatically, so as to complete the purchase operation on the e-commerce site. Of course, if the chain holding the requested authorization were not found in the CM of federation FB, the search would continue on the associated federations until an appropriate chain was found. For the scenario described in Fig. 5, it should be noticed that the ACL of the server does not have an entry for client B allowing it to access the services. It is therefore no longer required to register clients in the server ACL to allow their access to the services. Consequently, all clients' private information is stored only in those institutions with which they have strict relationships. In the example above, the client can pay for the purchase even without being a credit card customer, as an ordinary bank customer. By doing so, no credit card numbers or other client-related information is transmitted through the network, and client information is stored only by the client's banking institution.
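The message flow of Fig. 4, instantiated by the nine steps of Table 1, can be summarized in the following sketch, which reuses the hypothetical CM interface sketched after Sect. 3. It is a simplified illustration: negotiate, satisfy, and receive_delegation are hypothetical placeholders for the negotiation interactions, not interfaces defined by the model.

# Illustrative sketch of chain building over a federation web
# (messages m3-m10 of Fig. 4; all names hypothetical).
def build_chain(client, server_key, permission):
    # m3/m4: query the client's own federation CM first.
    certs, associated = client.cm.search(server_key, permission)
    # m5/m6: on a miss, extend the search over the federation web.
    for cm in associated:
        if certs:
            break
        certs, _ = cm.search(server_key, permission)
    if not certs:
        return None                    # no delegable privilege located
    holder_cert = certs[0]
    # m7-m9: negotiate the delegation with the privilege holder.
    requirements = client.negotiate(holder_cert["subject"], permission)
    if not client.satisfy(requirements):
        return None
    # m10: the holder issues a new certificate, completing the chain
    # that the client then presents in its response (m11).
    new_cert = client.receive_delegation(holder_cert["subject"], permission)
    return [holder_cert, new_cert]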
5 Architecture Implementation Aspects
The SDSI/SPKI infrastructure and the policies applied in the model (described in Sects. 3 and 4) are totally independent of the adopted technology. The tools used in the prototype (Fig. 6) were, however, highly influenced by the model's intended usage on the Internet, the environment assumed as the context of this work.
The motivation for adopting CORBA as middleware is to take advantage of the services provided by that platform, mainly in aspects related to object lookup (name resolution) and secure remote invocation. SSL (Secure Sockets Layer) was adopted for remote communications. In order to establish a secure communication channel between a client and a server (providing the SSL integrity and confidentiality properties), mutual authentication of the principals (client and server) is required. However, SPKI uses keys as principals, instead of names. To solve this, a function was developed to translate SDSI/SPKI name certificates into SSL certificates. The SDSI/SPKI integration with the distributed object middleware (shown in Fig. 7) was done using CORBASec at the application level (CORBASec Level 2) [13].
Fig. 6. Prototype model architecture
Security Level 2 by itself does not help in structuring security functions at the application level. However, in order to make use of the CORBA security model, a minimum set of objects at the ORB level has been kept. The following session objects were maintained: PrincipalAuthenticator, SecurityManager, and Credentials (Fig. 7).
Fig. 7. CORBA-SPKI integration prototype overview
Fig. 7 shows further implementation details. The CM public certificate repository is implemented using Apache Xindice, which stores native XML data [14]. The CM is
implemented as an extension module of the Apache server [15]. All messages exchanged between members and the CM are written in XML. The SDSI/SPKI certificates, originally coded as S-expressions, are translated into XML in our prototype for portability and standardization reasons [16]. The SDSI/SPKI resolver object shown in Fig. 7 is a partial implementation of the client's agent, covering chain searching and digital signature management. Finally, the reference monitor (guardian) is implemented by the SDSI/SPKI Access Decision object. The integration of clients and servers into the prototype environment was greatly facilitated by the use of plug-ins and applets in the application deployment.
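As a rough illustration of this translation step, the sketch below converts an S-expression, represented as a nested Python list, into XML. The element names it produces simply mirror the S-expression heads; the actual SPKI-XML structure is the one defined in [16].

# Illustrative sketch only: naive S-expression to XML conversion.
# An S-expression such as (cert (issuer k1) (subject k2)) is modeled
# here as the nested list ["cert", ["issuer", "k1"], ["subject", "k2"]].
def sexp_to_xml(sexp):
    head, *rest = sexp
    body = "".join(sexp_to_xml(item) if isinstance(item, list) else str(item)
                   for item in rest)
    return "<%s>%s</%s>" % (head, body, head)

print(sexp_to_xml(["cert", ["issuer", "k1"], ["subject", "k2"]]))
# -> <cert><issuer>k1</issuer><subject>k2</subject></cert>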
6 Related Work
In [9], the DNS service was used for storing and retrieving SDSI/SPKI certificates. In that proposal, the DNS extensions added by RFC 2065 were used to allow the storage of certificate records, by means of entities that store identification and authorization certificates in DNS databases. In addition, the search algorithms include some filtering of the certificates being retrieved.
The work in [10] views the network built by the propagation of SDSI/SPKI authorization certificates as an oriented graph. It also assumes that, in typical corporate environments, such a graph is hourglass-shaped, since there are many more client and server keys than intermediary keys. Starting from these premises, the author uses the DFS-forward and DFS-backward algorithms, and their combination, to perform fast searches in a database having only one intermediary. Experiments using the distributed search algorithms proposed in [10] are reported in [11], which also analyzes some improvements to the DFS-forward algorithm.
One can notice that the works described above were conceived for preliminary versions of SDSI/SPKI, in which some aspects of the model had not yet been settled. Some premises assumed at that time are now considered obsolete, no longer complying with the RFC 2693 specifications. These papers nevertheless offer valuable contributions in terms of system architecture.
According to [17], SDSI/SPKI local names can be viewed as distributed groups of principals for name resolution. Based on this assumption, the author proposes algorithms based on logic programming that are expected to be more efficient in chain search than conventional implementations. As the main purpose of that paper was to define search algorithms based on logic programming, a new architecture was not proposed. Nevertheless, the interpretation of local names as distributed groups can be considered a significant contribution.
The chain search algorithms suggested in [12], and the aspects considered there, are deeper refinements of the RFC 2693 recommendations. That work also presents an implementation of the current SPKI version, quite rich in content, although no architectural propositions for distributed systems were developed.
7 Conclusion
This paper proposed architectural extensions to the SDSI/SPKI authorization and authentication model, allowing a client to build new chains linking it to a server when the corresponding path does not exist. The proposal is centered on the notion of federations and on entities called certificate managers. The role of certificate managers is to help in the construction of authorization chains, through a repository search facility for locating the privileges needed by an access. As the certificate manager does not participate in the authorization chains, the proposed model can be considered fully decentralized: the manager neither centralizes nor makes hierarchical the relationship between clients and servers, and it does not constitute a critical point with regard to faults, vulnerability, or performance. Adopting the federation web model frees the server from user account management. It also frees the client from the traditional account creation procedures otherwise needed to gain access to a server, even in a global context. The proposed model provides certificate management support that allows the creation of new authorization chains, a facility not found in any other proposal in the technical literature. The proposed scheme is quite flexible and dynamic, even though in some cases the number of messages exchanged to create a new chain can be significant.
References
1. Horst, F. W., Lischka, M.: Modular Authorization. Proceedings of ACM SACMAT (2001)
2. Garfinkel, S.: PGP: Pretty Good Privacy. O'Reilly & Associates, Inc. (1995)
3. Blaze, M., Feigenbaum, J., Lacy, J.: Decentralized Trust Management. Proceedings of the 17th IEEE Symposium on Security and Privacy (1996)
4. Blaze, M., Feigenbaum, J., Lacy, J.: The KeyNote Trust Management System, Version 2. IETF RFC 2704 (1999)
5. Lampson, B., Rivest, R. L.: A Simple Distributed Security Infrastructure (1996). URL http://theory.lcs.mit.edu/~cis/sdsi.html. Last access on Jun. 2003.
6. Ellison, C., Frantz, B., Lampson, B., Rivest, R., Thomas, B., Ylonen, T.: SPKI Certificate Theory. IETF RFC 2693 (1999)
7. Gasser, M., McDermott, E.: An Architecture for Practical Delegation in a Distributed System. Proceedings of the IEEE Symposium on Security and Privacy (1990)
8. Aura, T.: On the Structure of Delegation Networks. Proceedings of IEEE CSFW (1998)
9. Nikander, P., Viljanen, L.: Storing and Retrieving Internet Certificates. Proceedings of the 3rd Nordic Workshop on Secure IT Systems (1998)
10. Aura, T.: Fast Access Control Decisions from Delegation Certificate Databases. Proceedings of the 3rd Australasian Conference on Information Security and Privacy (1998)
11. Ajmani, S.: A Trusted Execution Platform for Multiparty Computation. Master's thesis, Dept. of Electrical Engineering and Computer Science, MIT (2000)
12. Clarke, D. E.: SPKI/SDSI HTTP Server / Certificate Chain Discovery in SPKI/SDSI. Master's thesis, Dept. of Electrical Engineering and Computer Science, MIT (2001)
13. OMG - Object Management Group: Security Service Specification, v1.8 (2002). URL http://www.omg.org/cgi-bin/doc?formal/02-03-11.pdf. Last access on Jun. 2003.
14. Staken, K.: Xindice Developers Guide 0.7.1 (2002). URL http://xml.apache.org/xindice/guide-developer.html. Last access on Jun. 2003.
15. Thau, R.: Design Considerations for the Apache API (2002). URL http://modules.apache.org/reference. Last access on Jun. 2003.
16. Terreros, X. S. L., Ribes, J-M. M.: SPKI-XML Certificate Structure (2002). URL http://www.oasis-open.org/cover/xml-spki.html. Last access on Jun. 2003.
17. Li, N.: Local Names in SPKI/SDSI. Proceedings of the IEEE CSFW (2000)
Trust-X: An XML Framework for Trust Negotiations
Elisa Bertino1, Elena Ferrari2, and Anna Cinzia Squicciarini1
1 Dipartimento di Informatica e Comunicazione, Università degli Studi di Milano, Via Comelico 39/41, 20135 Milano, Italy. Fax +39-0250316253. {bertino,squicciarini}@dico.unimi.it
2 Dipartimento di Scienze Chimiche, Fisiche e Matematiche, Università dell'Insubria, Como, Via Valleggio 11, 22100 Como, Italy. Fax +39-0312386119. [email protected]
Abstract. In this paper we present Trust-X, a comprehensive XML-based [9] framework for trust negotiations. The framework we propose takes into account all aspects related to negotiations, from the specification of the profiles and policies of the involved parties to the determination of the strategy to succeed in the negotiation. In the paper we present the system architecture and describe the phases according to which negotiations can take place.
1 Introduction
The extensive use of the web for exchanging information and for requesting or offering services requires a deep redesign of the way access control is usually performed. In a conventional system, the identity of subjects is known in advance and can be used for performing access control. This simple paradigm is not suitable for an environment like the web, where the involved parties need to establish mutual trust on first contact, even if they are total strangers. A promising approach is represented by trust negotiation [7], according to which mutual trust is established through an exchange of digital credentials. Disclosure of credentials, in turn, must be protected through the use of policies that specify which credentials must be received before the requested credential can be disclosed. A number of approaches to trust negotiation have recently been proposed [1], [5], [8], [6], [4]. However, all these proposals mainly focus on one of the aspects of trust negotiation, such as policy and credential specification [1] or the selection of the negotiation strategy; none of them provides a comprehensive solution to trust negotiation, able to take into account all the phases of the negotiation process. For this reason, we propose Trust-X. Trust-X provides an
XML-based language, named X-TNL, for specifying Trust-X certificates. Trust-X certificates convey information about the profile of the parties involved in the negotiation. The formalism we propose allows the specification of both credentials and declarations, where a credential is a set of properties of a party certified by a Certification Authority, whereas declarations contain information that may help the negotiation process (such as specific preferences of one of the parties) but does not need to be certified. All the certificates associated with a party are collected into its X-Profile. In addition, to better structure credentials and declarations, each X-Profile is organized into data sets. Each data set collects a class of credentials and declarations referring to a particular aspect of the life of their owner, and can be used to facilitate certificate exchange. We provide the definition of disclosure policies, encoded using XML. By disclosure policy we mean rules expressing the requirements for the release of a resource. Such rules regulate the disclosure of a resource by imposing conditions on the certificates the requesting party should possess, and they can be organized into groups of policies, ordered by sensitivity level, to better protect the flow of sensitive information that policies contain. A resource can be either a service, a credential, or any kind of data that needs to be protected. In this paper we focus on the approach used in Trust-X for policy disclosure during negotiation, and we present the negotiation system architecture. The paper is structured as follows. In Section 2 we present an overview of the trust negotiation language we have developed; we refer the reader to [2] for details on the X-TNL language. In Section 3 we introduce Trust-X negotiation, whereas Section 4 discusses some additional features of the Trust-X system. Finally, Section 5 concludes the paper.
2 Overview of X-TNL, the Trust Negotiation Language
In this section, we summarize the key elements of X-TNL, the XML [9] language we have developed for specifying Trust-X certificates and policies. Then, we present X-TNL disclosure policies, that is, policies regulating the disclosure of resources by imposing conditions on the certificates the requesting party should possess. A detailed presentation of X-TNL can be found in [2].

2.1 X-TNL Certificates
The constructs of X-TNL include the notion of certificate, which is the means to convey information about the profile of the parties involved in the negotiation. A certificate can be either a credential or a declaration.
Credential. A credential is a set of properties of a party certified by a Certification Authority. X-TNL simplifies the task of credential specification because it provides a set of templates, called credential types, for the specification of credentials with a similar structure. In X-TNL, a credential type is modeled as a DTD and a credential as a document valid with respect to the corresponding credential type. Each credential is digitally signed by the issuing Certification Authority, according to the standard defined by the W3C for XML Signatures [9]. A credential is an instance of a credential type, and specifies the list of property values characterizing a given subject. A Trust-X credential is thus a valid XML document conforming to the DTD modeling the corresponding credential type. Figure 1 shows an example of a credential, containing the basic information of a library badge issued by the digital library Library Badge.

Fig. 1. Example of a Trust-X credential

Declaration. Declarations are sets of data without any certification; they are therefore stated by their owner. Declarations can be considered structured objects like credentials, collecting personal information about the owner. This kind of certificate thus provides auxiliary information that can help the negotiation process. For instance, a declaration named book preferences describes the literary preferences of a given subject. In X-TNL, we simply define a declaration as a valid XML document. Like credentials, declarations are also structured into declaration types, that is, DTDs to which the corresponding declarations conform. Figure 2 shows the Trust-X representation of the book preferences declaration. The declaration describes the genre of books Olivia prefers to read and lists some of her favourite authors. This declaration can be used to communicate Olivia's personal preferences during a negotiation with an online library.

Fig. 2. Example of an X-TNL declaration
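For illustration, a credential instance with the fields visible in Fig. 1 can be assembled with any XML toolkit; the short Python sketch below does so with the standard library. The element and attribute names are guessed from the figure's residual content and are not the actual library badge credential type, and the XML Signature step is omitted.

# Illustrative sketch only: building a credential instance with fields
# guessed from Fig. 1. Element/attribute names are hypothetical.
import xml.etree.ElementTree as ET

cred = ET.Element("library_badge", CredID="...", Issuer="Library Badge")
ET.SubElement(cred, "name").text = "Olivia White"
ET.SubElement(cred, "address").text = "Grange Wood 69, Dublin"
ET.SubElement(cred, "e_mail").text = "[email protected]"
ET.SubElement(cred, "position").text = "Student"
print(ET.tostring(cred, encoding="unicode"))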
2.2 Data Sets and X-Profiles
All the certificates associated with a party are collected into its X-Profile. To better structure credentials and declarations, each X-Profile is organized into data sets. Each data set collects a class of credentials and declarations referring to a particular aspect of the life of their owner. For instance, Demographic Data, Education, and Working Experience are examples of possible data sets.1 For example, Alice's certificates concerning working experiences can be collected in the Working Experience data set. In this group of digital documents we can find Alice's work license number, a digital copy of her last job contract, and some uncertified information about her previous job experiences. Organizing certificates into data sets facilitates their retrieval during negotiation. Indeed, all the certificates collected in the same data set are logically related. Data sets can thus be used to refer to a set of homogeneous declarations or credentials as a whole, and this can facilitate their evaluation and exchange during negotiation.
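The following sketch gives one possible in-memory view of an X-Profile organized into data sets. The structure and names are hypothetical, since Trust-X actually represents profiles and certificates as XML documents.

# Illustrative sketch only: an X-Profile as a mapping from data set
# names to logically related certificates.
class XProfile:
    def __init__(self):
        self.data_sets = {}      # data set name -> list of certificates

    def add(self, data_set, certificate):
        self.data_sets.setdefault(data_set, []).append(certificate)

    def data_set(self, name):
        # Retrieve a whole class of related certificates at once, which
        # is what eases their evaluation and exchange during negotiation.
        return self.data_sets.get(name, [])

alice = XProfile()
alice.add("Working Experience", {"type": "work_license", "certified": True})
alice.add("Working Experience", {"type": "job_contract", "certified": True})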
2.3 Disclosure Policies
Trust-X disclosure policies are specified by each party involved in a negotiation, and state the conditions under which a resource can be released during a negotiation. Conditions are expressed as constraints on the certificates possessed by the involved parties and on the certificate attributes. Each party adopts its own policies to regulate the release of local information and the access to services. Similarly to certificates, disclosure policies are encoded using XML [2]. Trust-X policies can thus be defined for protecting any kind of sensitive resource. Additionally, a resource can be characterized by a set of attributes, specifying relevant characteristics of the resource that can be used when specifying disclosure policies. To make resource management and protection easier, resources can be further classified into simple and composite resources; the Trust-X language includes a specific formalism to define resource relationships.2 Intuitively, a simple resource is an atomic service or piece of information, whereas a composite resource R can be thought of as the composition of several resources R1, R2, ..., Rk. Resources R1, R2, ..., Rk, in turn, can be either simple or composite, and therefore may be released singly or all together as R. Each resource Ri, i ∈ [1, k], can have its own disclosure policies, and the resource R itself can also have its own. When R is requested, the related disclosure policy is obtained by processing all the policies for R, if any, and those of R1, R2, ..., Rk. An example of a composite resource is a theatre package, including a theatre ticket for a performance, the corresponding script, and a dinner at the restaurant of the theatre. The resource may also be characterized by a set of attributes giving information related to the performance, that is, the name of the show and the exact location and time where it is performed. Different policies for the same sensitive resource denote alternative policies, equally valid to obtain it: a resource R can be disclosed if and only if one of the corresponding policies is satisfied. In addition, the disclosure policy language can be used to specify prerequisite information. Such policies denote conditions that must be satisfied for a resource request to be taken into consideration, and are therefore used at the beginning of the negotiation process, as explained in Section 3. Figure 3 shows the template of an X-TNL policy base. According to our XML-compliant formalism, the template corresponds to a DTD, whereas a policy base is a corresponding valid XML document.

1 Like for credentials, we assume that data set names are unique and are registered through some central organization.
2 We omit the complete specification for lack of space.

<!DOCTYPE policyBase [
<!ELEMENT policyBase (policySpec)+>
<!ELEMENT policySpec (properties, resource, type)>
<!ELEMENT properties (DELIV | certificate+)>
<!ELEMENT resource EMPTY>
<!ELEMENT type EMPTY>
<!ELEMENT certificate (certCond*)>
<!ELEMENT DELIV EMPTY>
<!ELEMENT certCond (#PCDATA)>
<!ATTLIST certificate targetCertType CDATA #REQUIRED>
<!ATTLIST DELIV value CDATA #FIXED 'DELIV'>
<!ATTLIST resource target CDATA #REQUIRED>
<!ATTLIST type value (CERT|SERVICE) 'SERVICE'>
]>

Fig. 3. Trust-X policy base template
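The release rule described in this section (a resource is disclosed if and only if at least one of its alternative policies is satisfied, with composite resources also processing the policies of their components) can be summarized as follows. The dictionary-and-set representation is a hypothetical simplification of the XML encoding.

# Illustrative sketch only: evaluating alternative disclosure policies.
# A policy is modeled as the set of certificate types the counterpart
# must possess; a resource as {"policies": [...], "components": [...]}.
def policy_satisfied(policy, counterpart_certs):
    return policy.issubset(counterpart_certs)

def releasable(resource, counterpart_certs):
    own_ok = (not resource["policies"] or
              any(policy_satisfied(p, counterpart_certs)
                  for p in resource["policies"]))
    parts_ok = all(releasable(r, counterpart_certs)
                   for r in resource.get("components", []))
    return own_ok and parts_ok

ticket = {"policies": [{"theatre_card"}, {"credit_card", "id_card"}]}
package = {"policies": [], "components": [ticket]}
print(releasable(package, {"theatre_card"}))   # -> True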
3 Trust-X Negotiation
The reference scenario for Trust-X negotiations is a network composed of different entities that interact with each other by exchanging sensitive resources controlled by the entities themselves. The notion of resource comprises both sensitive information and services, whereas the notion of entity includes users, processes, and servers. Entities are characterized by a set of credentials, issued by CAs. Each credential describes attributes characterizing the owner, and credentials are used as a means to certify properties of the parties. A negotiation involves two entities (also named parties); each of them has a Trust-X profile of certificates, conforming to the syntax introduced in the previous section. During a negotiation, mutual trust might be established between the controller and the requester: the requester has to show its certificates to obtain the resource, and the controller, whose honesty is not always assured, submits certificates to the counterpart in order to establish trust before receiving sensitive information. The release of information is regulated by the disclosure policies introduced in Section 2.3 and by appropriate policies that govern access to protected resources by specifying the credential combinations that must be submitted to obtain authorization. Disclosure policies are exchanged to inform the other party of the trust requirements that need to be satisfied to advance the state of the negotiation. The main components of the Trust-X architecture are shown in Figure 4. Note that the architecture is peer-to-peer.

Fig. 4. Architecture for Trust-X negotiation

The goals of the system components are essentially the following: supporting policy exchange, testing whether a policy can be satisfied, supporting certificate and trust ticket exchange, and caching sequences. Each of these functions is executed by a specific module of the Trust-X system. Further modules may be added to make the negotiation easier and faster, but we omit them to focus on the most relevant components. The system is composed of a Policy Base, storing disclosure policies, the X-Profile associated with the party, a Tree Manager, storing the state of the negotiation, and a Compliance Checker, which tests policy satisfaction and determines request replies. In the following, we assume that both parties are Trust-X compliant, but it is possible to carry out negotiations even between parties that do not adopt the same negotiation language, simply by adding a translation mechanism
to guarantee the semantic conversion of possibly heterogeneous certificates. In the following we illustrate the negotiation phases in more detail.
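The interplay of these modules on the receiving side of a policy exchange can be pictured with the following sketch; all interfaces are hypothetical and greatly simplified with respect to the actual system.

# Illustrative sketch only: wiring the modules of Fig. 4. Every class,
# method, and attribute name here is a hypothetical placeholder.
class Negotiator:
    def __init__(self, policy_base, profile, compliance_checker,
                 tree_manager, sequence_cache):
        self.policy_base = policy_base     # local disclosure policies
        self.profile = profile             # the party's X-Profile
        self.checker = compliance_checker  # tests policy satisfaction
        self.tree = tree_manager           # state of the negotiation
        self.cache = sequence_cache        # previously used sequences

    def on_policies(self, remote_policies):
        satisfiable = [p for p in remote_policies
                       if self.checker.satisfies(self.profile, p)]
        self.tree.record(remote_policies, satisfiable)
        if not satisfiable:
            return "no-matching-certificates"
        # Reply with the local policies protecting the certificates that
        # the satisfiable remote policies ask for.
        return [self.policy_base.protecting(cert)
                for p in satisfiable for cert in p.certificates]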
3.1 Negotiation Phases
A Trust-X negotiation is structured according to the following phases: the introductory phase, the policy evaluation phase, and the certificate exchange phase. The policy evaluation phase is the core of the negotiation process. Certificates and services are disclosed only after a complete evaluation of the counterpart's policies, that is, only when the parties have found a trust sequence of certificate disclosures that makes the release of the requested resource possible, according to the disclosure policies of both parties. Even the disclosure of sensitive policies may be protected, by disclosing sensitive policies gradually according to the degree of trust established. The concluding step of a successful Trust-X negotiation is an analysis of the certificates and information exchanged, which may result in caching the generated sequence, as illustrated in Section 4.2. In what follows we illustrate each of the above-mentioned phases in more detail. Note that articulating the negotiation into distinct phases results in a multilevel protection mechanism that avoids the release of unnecessary or unwanted information. Each phase is executed only if the previous ones succeed, and the sensitivity of the exchanged information increases during negotiation. Indeed, the disclosure of certificates, which can be regarded as more sensitive than policies, is postponed to the end of policy evaluation: only if there is a possibility of succeeding in the negotiation are certificates effectively disclosed.
Introductory Phase. The introductory phase begins when a requester contacts a controller asking for a resource R. This initial phase of a negotiation is carried out to exchange preliminary information whose conditions must be satisfied in order to start the actual processing of the resource request. Such an exchange of information is regulated by the introductory policies of both parties. Introductory policies are used to verify properties of the counterpart that are mandatory for continuing the negotiation. For instance, a server providing services only to registered clients can, before evaluating the requirements for the requested service, first ask the counterpart for its login name. If the requester is not registered, there is no reason to proceed further. Prerequisite policies are therefore essentially used to establish whether to enter the core of the negotiation process or not, but they can also help in driving the following phases of the process. Introductory policies may also be used to collect information about the requester's preferences and/or needs. For instance, when books are purchased online, a book store may ask the requester to submit the book preferences declaration, if any, in order to satisfy customer preferences. If the requester does not assume the honesty of the controller, it can, in turn, send its own introductory policies. This phase is therefore composed of a small number of simple messages exchanged between the two parties.
Example 1. Consider a rental car service. When Alice asks to rent a car, the negotiation starts. Possible prerequisites that the agency may require during the introductory phase are modeled by the following introductory policies3:
• Car Rental_P ← Preferred_customer()
• Car Rental_P ← Car_Preferences().
The first policy checks whether the client has the preferred customer credential, denoting a previous business relationship between the parties, whereas with the second policy the agency asks the requester to submit the car preferences declaration, if any, in order to satisfy requester preferences.
Policy Evaluation Phase. During this phase, both client and server communicate the disclosure policies adopted for the involved resources. The goal is to determine a sequence of client and server certificates that, when disclosed, enables the release of the requested resource, in accordance with the disclosure policies of both parties. This phase is carried out as an interplay between the client and the server. During each interaction, one of the two parties sends a set of disclosure policies to the other. The receiving party verifies whether its X-Profile satisfies the conditions stated by the policies, and determines a policy counter-request regulating the disclosure of the requested certificates. If the X-Profile of the receiving party satisfies the conditions stated by at least one of the received policies, the receiver can adopt one of two alternative strategies. It can choose to maximize the protection of its local resources, replying to only one policy at a time and hiding the real availability of the other requested resources; alternatively, it can reply to all the policies, to maximize the number of potential solutions for the negotiation. Additionally, when selecting a policy, each party determines whether the policy's preconditions are verified by the policies disclosed up to that point; only in this case is the policy selected. By contrast, if the X-Profile of the receiving party does not satisfy the conditions stated by the received policies, the receiver informs the other party that it does not possess the requested certificates. The counterpart then sends an alternative policy, if any, or halts the process if no other policies can be found. The interplay goes on until one or more potential solutions are determined, that is, until both parties determine one or more sets of policies that can be satisfied for all the involved resources. The policy evaluation phase is mostly executed by the Compliance Checker, whose goal is the evaluation of remote policies with respect to local policies and certificates (certificates can be locally available in the X-Profile or can be retrieved through certificate chains), and the selection of the strategy for carrying out the remainder of the negotiation. To simplify the process, a tree structure is used, which is managed and updated by the Tree Manager. Note that no certificates are disclosed during the policy evaluation phase. The satisfaction of the policies is only checked in order to communicate to the other party the possibility of going on with the process and how this can be done.
3 Policies are expressed in terms of logical expressions to simplify the comprehension of the corresponding semantics.
A detailed description of Trust-X negotiation, with the associated algorithms for all the negotiation strategies, can be found in [3].

Certificate Exchange. This phase begins when the previous phase ends successfully, having determined one or more trust sequences. Several sequences of credentials can be determined to succeed in the same negotiation. Once the parties choose the sequence of certificates to disclose, the certificate exchange starts. Each party discloses its certificates, observing the order defined in the sequence. Upon receiving a certificate, the counterpart verifies the satisfaction of the associated policies, checks for revocation, checks validity dates, and authenticates ownership (for credentials). If further information is needed for establishing trust, it is the receiver's responsibility to check for new certificates using credential chains. For example, if a medical certificate was requested and the issuer is an unknown hospital, the receiving party has to check the validity of the issuer's certificate by collecting new certificates from the issuer's repository. The receiver then replies with an acknowledgment, expressed with an ack message, and asks for the following certificate in the sequence; or, if it has received all the certificates of the set, it sends a certificate belonging to the subsequent set of certificates in the trust sequence. If no unforeseen event happens, the exchange ends with the disclosure of the requested resource. Figure 5 shows an example of the messages exchanged by two parties performing a car rental negotiation. Suppose that the parties have successfully completed the policy evaluation phase and have determined the following sequence (each name denotes a certificate): [{Corrier Affiliation}, {Corrier Employee}, {Rental Car}]. The disclosure of certificates begins with the submission of the Corrier Affiliation credential, which should satisfy the conditions specified during the policy evaluation phase. The subsequent certificate (Corrier Employee) is disclosed by the counterpart after checking the remote certificate received. The rental service is finally provided and the negotiation succeeds.

Fig. 5. An example of the certificate exchange phase
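The exchange loop just described can be summarized as follows. The sketch is illustrative only: send, receive, and the verification checks are hypothetical placeholders for the transport and for the revocation, validity, and ownership tests.

# Illustrative sketch only: the certificate exchange phase. sequence is
# the ordered list of (owner, certificate) pairs agreed on by both
# parties at the end of the policy evaluation phase.
def exchange(sequence, send, receive, checks):
    for owner, certificate in sequence:
        if owner == "local":
            send(certificate)
            if receive() != "ack":
                return False
        else:
            cert = receive()
            # Policy satisfaction, revocation, validity dates, and
            # ownership are all verified before acknowledging.
            if not all(check(cert) for check in checks):
                return False
            send("ack")
    return True   # the exchange ends with the release of the resource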
4 Additional Features of Trust-X
In this section we discuss two additional aspects of the Trust-X system, namely the negotiation of multiple resources and the possibility of caching sequences of certificates as a mechanism to speed up negotiation processes.

4.1 Negotiation of Composite Resources
As previously introduced, Trust-X resources can be either simple or composite, that is, obtained as the composition of several resources. The negotiation of composite resources can be considered an extension of the negotiation of simple ones, and it is supported as follows. Each component of the main resource is negotiated sequentially. When a party asks for a composite resource, the negotiation is executed by first evaluating the policies for the root resource4 and then negotiating one component at a time. Each time the policy evaluation phase of a resource component succeeds, the corresponding trust sequence is determined. Note that, although it is possible to determine more than one sequence for the same resource, we refer here to the sequence on which the parties agree. Instead of immediately executing it, both parties store it and start evaluating the requirements for negotiating the next component resource. Intuitively, the information obtained from the previous process can be used to make this phase simpler and faster, by avoiding asking for the same certificates or properties again. This simplification consists in not asking again for the satisfaction of disclosure policies that were already satisfied in a previous policy exchange. This simple strategy reduces the number of party requests and optimizes the number of exchanges. In addition, the controller can choose whether or not to allow partial disclosure of some of the components. Indeed, suppose that during the negotiation of a composite resource (composed, say, of three atomic resources), a party turns out to be entitled to obtain only two of the three components of the resource originally requested. What should the controller do? It can choose to disclose the two resources anyway, or deny the entire access. Intuitively, this decision strongly depends on the context and on the kind of resources the parties are negotiating. If the parties are part of a collaborative environment and the resources are not logically dependent on each other, the disclosure can be granted; otherwise, it is denied.

4 Policies may be specified both for the main resource and for the resource components.
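A sketch of the component-by-component negotiation described above, including the reuse of already satisfied policies and the partial-disclosure decision, follows; negotiate_component stands for an ordinary Trust-X negotiation of a simple resource and, like the rest of the names, is hypothetical.

# Illustrative sketch only: sequential negotiation of a composite
# resource, reusing policies satisfied in earlier rounds.
def negotiate_composite(components, negotiate_component, allow_partial):
    satisfied = set()   # disclosure policies already satisfied earlier
    sequences = []      # (component, trust sequence) pairs, run later
    for resource in components:
        seq = negotiate_component(resource, already_satisfied=satisfied)
        if seq is None:
            if not allow_partial:
                return None    # deny the entire composite resource
            continue           # partial disclosure: skip this component
        sequences.append((resource, seq))
        satisfied.update(seq.policies_satisfied)  # hypothetical field
    return sequences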
4.2 Sequence Caching
It is quite likely that some negotiations will be performed by an entity with different counterparts having similar profiles. In such cases it might be useful to keep track of the sequences of certificates most frequently exchanged, instead of recalculating them for each negotiation. For instance, some books of a digital library might be asked for often during exam sessions. The trust negotiation process will be very similar for different students; indeed, the requested information will be exactly the same, and the only differences will concern the type of certificates disclosed. As an example, consider a student attending a certain university Y. Students might have a card issued by the main secretary office of the university, whereas students of a branch department might have only a badge issued by the departmental secretary office. Suppose that the properties required to obtain the authorization to access the library can be proved by presenting either the card issued by the main secretary office, or both the badge issued by the departmental secretary office and the student library badge. Suppose, moreover, that students usually require privacy guarantees before disclosing certificates, and that the library proves its honesty by a set of proper certificates. Alternative sequences can therefore be generated to disclose the same resource, but all of them are quite intuitive and easy to determine. The controller can cache them and suggest them upon receiving a request from a student. The student, like the library, can cache the most widely used sequences for negotiating access to digital libraries, and suggest them when sending a request to a library. Intuitively, this approach does not ensure complete protection of policies. However, in many contexts, protection of policies and associated certificates is not the main goal of the parties. Moreover, we expect that in many scenarios there will be standard, off-the-shelf policies available for widely used resources (e.g., VISA cards, passports). In negotiations involving such common resources, the sequences of certificates to be disclosed will be regulated only by such standard and predictable policies. In this case, certificates represent only a means to easily prove the parties' properties, and it is not unsafe to suggest the sequences at the beginning of the process. If the counterpart cannot satisfy the proposed sequence, the negotiation can continue by executing the policy evaluation phase or by suggesting another sequence. The module of Trust-X in charge of caching and suggesting the sequences is the sequence prediction module.
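A minimal sketch of such a sequence prediction module follows; the frequency-count structure is a hypothetical choice, as the paper does not describe the module's internals.

# Illustrative sketch only: caching and suggesting frequent sequences.
from collections import Counter

class SequencePredictionModule:
    def __init__(self, top=3):
        self.counts = Counter()
        self.top = top

    def record(self, sequence):
        # Called after a successful negotiation (Sect. 3.1).
        self.counts[tuple(sequence)] += 1

    def suggest(self):
        # Sequences most frequently exchanged with similar counterparts,
        # proposed on first contact before full policy evaluation.
        return [list(s) for s, _ in self.counts.most_common(self.top)]

cache = SequencePredictionModule()
cache.record(["dept_badge", "library_badge", "privacy_policy"])
cache.record(["student_card", "privacy_policy"])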
5 Conclusion
In this paper we have introduced Trust-X, a comprehensive XML-based framework for trust negotiations. We have focused on the various phases in which a Trust-X negotiation is articulated. Future work includes the extension of X-TNL in several directions, such as the possibility of specifying the credential submitter. Another extension we are currently working on is the possibility of disclosing only portions of a credential during the negotiation process; this will allow us to protect the elements of a credential in a selective and differentiated way. Finally, we are developing techniques and algorithms for credential chain discovery and for recovering from a negotiation failure, and we are optimizing the caching strategy to increase the efficiency of negotiations.
References
1. E. Bertino, S. Castano, E. Ferrari: "On Specifying Security Policies for Web Documents with an XML-based Language". Proc. of SACMAT 2001, ACM Symposium on Access Control Models and Technologies, Fairfax, VA, May 2001.
2. E. Bertino, E. Ferrari, A. Squicciarini: "X-TNL: An XML Based Language for Trust Negotiations". Proceedings of the Fourth International Workshop on Policies for Distributed Systems and Networks, Como, Italy, June 2003.
3. E. Bertino, E. Ferrari, A. Squicciarini: "Trust-X: A Peer to Peer Framework for Trust Establishment". Submitted for publication.
4. M. Blaze, J. Feigenbaum, J. Ioannidis, A. Keromytis: The KeyNote Trust-Management System. RFC 2704, September 1999.
5. T. Yu, M. Winslett, K. Seamons: "Supporting Structured Credentials and Sensitive Policies through Interoperable Strategies for Automated Trust Negotiation". ACM Transactions on Information and System Security, volume 6, number 1, February 2003.
6. P. Bonatti, P. Samarati: "Regulating Service Access and Information Release on the Web". 7th ACM Conference on Computer and Communications Security, Athens, Greece, November 2000.
7. K. E. Seamons, M. Winslett, T. Yu, B. Smith, E. Child, J. Jacobson, H. Mills, L. Yu: "Requirements for Policy Languages for Trust Negotiation". International Workshop on Policies for Distributed Systems and Networks (POLICY 2002), Monterey, CA, June 2002.
8. A. Herzberg, Y. Mihaeli, Y. Mass, D. Naor, Y. Ravid: "Access Control Meets Public Key Infrastructure, Or: Assigning Roles to Strangers". IEEE Symposium on Security and Privacy, Oakland, CA, May 2000.
9. World Wide Web Consortium. Available at http://www.w3.org/
How to Specify Security Services: A Practical Approach
Javier Lopez1, Juan J. Ortega1, Jose Vivas2, and Jose M. Troya1
1 Computer Science Department, E.T.S. Ingeniería Informática, University of Malaga, Spain. {jlm,juanjose,troya}@lcc.uma.es
2 Hewlett-Packard Labs, Bristol, UK. [email protected]
Abstract. Security services are essential for ensuring secure communications. Typically no consideration is given to security requirements during the initial stages of system development; security is only added later as an afterthought, as a function of other factors such as the environment into which the system is to be inserted, legal requirements, and other kinds of constraints. In this work we introduce a methodology for the specification of security requirements intended to assist developers in the design, analysis, and implementation phases of protocol development. The methodology consists of an extension of the ITU-T standard requirements languages MSC and HMSC, called SRSL, defined as a high-level language for the specification of security protocols. In order to illustrate it and evaluate its power, we apply the new methodology to a real-world example, the integration of an electronic notary system into a web-based multi-user service platform.
1 Introduction
Many problems with security-critical systems arise from the fact that developers seldom have a strong background in computer security. However, it is nowadays widely accepted that an adequate specification of a system is required in order to obtain a robust implementation. There is currently an increased need to consider security aspects in the early stages of system development. This need is not always met by adequate knowledge on the part of the developer. This is problematic, since security is most often compromised not by breaking the dedicated mechanisms, but by exploiting weaknesses in the way those mechanisms are used. Therefore security mechanisms cannot simply be inserted into the system as an afterthought; security aspects should be considered already at an early stage of the software development life cycle. Results obtained using formal specification techniques are not readily applicable in the context of a real-world development environment. First of all there is a requirements engineering problem: how to capture the intended security requirements. Then
we have an implementation problem. Thus, it is not obvious how to reconcile the mathematical notion of a perfect public key with the reality of a stored file representing a pair of numbers n and e encoded according to the Basic Encoding Rules (BER). Also, although we often talk about secure channels, in reality what we have are things such as HTTPS connections.
Security requirements are commonly expressed as system constraints, but often they are in fact a kind of service that must be provided by a variety of mechanisms. In this sense, security requirements are not different from e.g. real-time requirements, and should be treated in an analogous way. Accordingly, we refer to these services as security services.
The rest of this paper is organized as follows. In Sect. 2 we present some common security concepts. In Sect. 3 we give an overview of a couple of representative specification languages. Sect. 4 is dedicated to a brief introduction of the communication requirements language Message Sequence Charts (MSC) and High-Level MSC (HMSC). In Sect. 5 we describe a new specification language, SRSL, which is an extension of MSC and HMSC. Sect. 6 is dedicated to the description of an application of SRSL to a real world example, and in Sect. 7 we present some conclusions.
2 Specification of Security Properties Paradigm

A security protocol [7] is a general template describing a sequence of communications and making use of cryptographic techniques to meet one or more particular security related goals. The basic security services [11] provided by security mechanisms (cryptographic algorithms and secure protocols) are authentication, access control, data confidentiality, data integrity, and non-repudiation.
The notion of authentication includes authentication of origin and entity authentication. Authentication of origin can be defined as the certainty that a message that is claimed to proceed from a certain party actually originated from it. As an illustration, if during the execution of a protocol Bob receives a message supposed to come from Anne, then the protocol is said to guarantee authentication of origin for Bob if it is always the case that, whenever Bob's node accepts the message as being from Anne, Anne indeed sent exactly this message earlier. Thus, authentication of origin must be established for the whole message. Moreover, it is often the case that certain time constraints concerning the freshness of the received message must also be met. Entity authentication protocols, in turn, guarantee that the claimed identity of an agent participating in the protocol is identical to the real one.
The access control service ensures that only authorized principals can gain access to protected resources. Usually the identity of the principal must be established; hence entity authentication is also required here. Confidentiality may be defined as the prevention of unauthorized disclosure of information. Data integrity means that data cannot be corrupted, or at least that corruption will not remain undetected. If it were possible for a corrupted message to be accepted, then
this would show up as a violation of integrity and the protocol must be regarded as flawed. Non-repudiation provides evidence to the parties involved in a communication that certain steps of the protocol have occurred. This property appears to be very similar to authentication, but here even the participants themselves are assumed to be capable of attempting to fake messages, up to the usual cryptographic constraints; it therefore uses signature mechanisms and a trusted notary.
These services are enforced using cryptographic protocols or similar mechanisms, and it is essential to determine which ones are needed. In order to specify a security system it is not necessary to know how the analysis of the system will be carried out; however, it is absolutely indispensable to identify the security services required.
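As a toy illustration of why non-repudiation needs more than the mechanisms used for authentication of origin, the following Python sketch (keys and messages are purely illustrative) shows that a shared-key MAC convinces Bob that a message came from Anne, but is worthless as evidence towards a third party, since Bob could have computed the same tag himself; this is why non-repudiation relies on signature mechanisms and a trusted notary.

import hmac, hashlib

k_ab = b"key shared by Anne and Bob"   # illustrative shared key

def tag(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

msg = b"transfer 100 EUR"
t = tag(k_ab, msg)                                 # produced by Anne
assert hmac.compare_digest(t, tag(k_ab, msg))      # Bob: origin and integrity hold

# But Bob holds the same key, so he can fabricate an equally valid pair
# (msg', t'); hence (msg, t) proves nothing to a third party: no non-repudiation.
forged = (b"transfer 1000 EUR", tag(k_ab, b"transfer 1000 EUR"))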
3 Overview of Security Specification Languages

We focus here on two kinds of specification languages applied to security systems: languages for software engineering and languages for the design and analysis of cryptographic protocols [6].
A representative example of the first kind is UML, the Unified Modeling Language [10]. UML is a language for the specification, visualisation, development, and maintenance of software systems. The notation consists basically of graphical symbols, including a set of diagrams giving different views of a system. J. Jürjens [3] has defined an extension of UML, UMLsec, to specify standard security requirements on security-critical systems. The aim of this work is to use UML to encapsulate knowledge on prudent security engineering and to make it available to developers not specialised in security. It is based on the most important kinds of diagrams for describing object-oriented software (class diagrams, state-chart diagrams, and interaction diagrams), and uses the basic elements offered by UML to extend the language, i.e. stereotypes, tagged values, and constraints.
An example of the second kind of language is CAPSL [8]. This is a high-level formal language intended to support security analysis of cryptographic authentication and key distribution protocols. A protocol specification in CAPSL can be translated into an intermediate language based on multiset rewriting (MSR) rules, such as CIL, and the result can be used as input to different security analysis tools. A CAPSL specification has three sections: protocol specification, type specification, and environment specification; the type and environment specifications are optional. A protocol specification is a description of behaviour and consists of three parts: declarations, messages, and goals. A type specification defines cryptographic operators, whereas an environment specification provides scenarios used by model-checking tools to verify the protocol. A CAPSL extension called MuCAPSL, intended to support the specification of protocols for secure multicast, is also under development [9].
We believe it is important to develop a specification language integrating both kinds of languages. As a first approach to this aim we propose here a new language, the Security Requirements Specification Language (SRSL), based on Message
Sequence Charts. In the next section we give an overview of the latter, and in the subsequent one we introduce SRSL.
4 A Requirements Language for Communication Protocols: MSC

The ITU Telecommunication Standardization Sector (ITU-T) specifies Message Sequence Charts [2] (ITU-T Z.120) as the requirements language for the visualization of system runs or traces within communication systems. MSCs can be defined as a trace language for describing message interchanges among communicating entities. The language is endowed with a graphical layout that gives a description of system behaviour in terms of message flow diagrams that is both clear and perspicuous. MSCs focus on the communication behaviour of system components and their environments, and are widely used for requirements definition, for the specification of process communication and interfaces, as a basis for the automatic generation of Specification and Description Language (SDL) [1] skeletons, for the selection and specification of test cases, and for documentation. MSC is used most frequently together with SDL.
The basic language constructs of MSCs are instances and messages. Instances are graphically represented by an axis, i.e. a vertical line or a column. An entity name and an instance name can be specified within an instance heading in the graph. A total ordering of the communication events is specified along each instance axis. Actions describing an internal activity of an instance, in addition to message exchange, may also be specified. The system environment is represented by a frame symbol forming the boundary of the diagram. Instances may also be created from a parent instance; instance creation is described by a special symbol in the shape of a dashed arrow that can be associated with textual parameters. Termination of instances is also possible, and is represented by a stop symbol in the form of a cross at the end of an instance axis. It is also possible to specify conditions describing a state associated with a non-empty set of instances. Conditions can also be used for the sequential composition of MSCs.
MSCs can be used to describe the behavior of a subsystem or component intended to be combined in different ways into a more complex system. In addition, MSCs may be combined with the help of expressions consisting of composition operators and references to the MSCs. MSC references can be used either to reference a single MSC or a number of MSCs using a textual MSC expression. MSC expressions are constructed from the operators alt, par, loop, opt and exc, described below.
The keyword alt denotes alternative executions of several MSCs; only one of the alternatives is applicable in an instantiation of the actual sequence. The par operator denotes parallel executions of several MSCs: all events within the MSCs involved are executed, with the sole restriction that the event order within each MSC is preserved. An MSC reference with a loop construct is used for iterations and can have several forms; the most general construct, loop, parameterised by natural numbers n and m, denotes iteration at least n and at most m times.
The opt construct is a unary operator, interpreted in the same way as an alt operation where the second operand is an empty MSC. An MSC reference whose text starts with exc followed by the name of an MSC indicates that the MSC can be aborted at the position of the MSC reference symbol and instead continued with the referenced MSC. MSC references with exceptions are used frequently.
High-level MSCs [2] provide a means to graphically define how a set of MSCs can be combined. An HMSC is a directed graph where each node is a start symbol, an end symbol, an MSC reference, a condition, a connection point, or a parallel frame. The flow lines are used to connect the nodes in the HMSC and indicate the sequencing that is possible among the nodes. The incoming flow lines are always connected to the top edge of the node symbols, whereas the outgoing flow lines are connected to the bottom edge. More than one outgoing flow line from a node indicates an alternative. The conditions in HMSCs can be used to indicate global system states or guards and impose restrictions on the MSCs that are referenced in the HMSC. The parallel frames contain one or more small HMSCs and indicate that these small HMSCs are the operands of a parallel operator, i.e. the events in the different small HMSCs can be interleaved. The connection points are introduced to simplify the layout of HMSCs and have no semantic meaning.
High-level MSCs can be constrained and measured with time intervals for MSC expressions. In addition, the execution time of a parallel frame of an HMSC can be constrained or measured. The interpretation is similar to that of timed MSC expressions.
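As an illustrative aid, the following Python sketch models an MSC as a finite set of event traces and the composition operators as operations on such sets. This is a simplification of the Z.120 semantics (e.g. weak sequencing is reduced to plain trace concatenation), intended only to make the informal descriptions above concrete.

def alt(*mscs):
    # alternative composition: any trace of any operand may occur
    out = set()
    for m in mscs:
        out |= m
    return out

def seq(m1, m2):
    # sequencing, simplified here to trace concatenation
    return {t1 + t2 for t1 in m1 for t2 in m2}

def shuffles(t1, t2):
    # all interleavings of two traces, preserving the order within each
    if not t1:
        return {t2}
    if not t2:
        return {t1}
    return ({(t1[0],) + rest for rest in shuffles(t1[1:], t2)} |
            {(t2[0],) + rest for rest in shuffles(t1, t2[1:])})

def par(m1, m2):
    # parallel composition: all events occur, per-operand order is preserved
    out = set()
    for t1 in m1:
        for t2 in m2:
            out |= shuffles(t1, t2)
    return out

def loop(m, n, k):
    # iteration at least n and at most k times
    out, rep = set(), {()}
    for i in range(k + 1):
        if i >= n:
            out |= rep
        rep = seq(rep, m)
    return out

def opt(m):
    # opt is an alt whose second operand is the empty MSC
    return alt(m, {()})

request = {("data_request",)}          # two toy single-event MSCs
reply = {("show_user_data",)}
print(par(request, reply))             # both interleavings
print(loop(request, 1, 2))             # one or two iterations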
5 The SRSL Language

The specification language proposed here is an extension of the ITU standard requirements languages MSC and HMSC. SRSL [4] is a high level language intended to specify cryptographic protocols and secure systems. Such a language must be modular, easy to learn, and able to express security notions. The main design criteria we have used during the development of SRSL are the following:
• The language should provide visualization capabilities. A graphic representation offers perspicuous views of a system and clarifies the communication among customers, users and developers.
• It should be possible to produce specifications of a system at several levels of abstraction. Different stakeholders may have different views, and accordingly we require that it be possible to describe a system at several levels of abstraction.
• The language should make available mechanisms for modularisation. Standard implementations of a part of the system are often used as components of the whole system. Standard modules may be defined, implemented and reused. For instance, it is common to implement a new system using well-defined standard protocols: if we need a server-authenticated and encrypted
connection, we can make use of the standard TLS protocol. In this way it also becomes easier to use a previously implemented library in any programming language.
• Standard languages should be reused as much as possible. Using standard languages has many benefits. It is usually difficult to learn a new language or methodology: in the beginning there may not be enough tools to support it, or the available tools may be inefficient. Standard languages typically have tools that support them, and many users may have previous experience with them.
• The language should be suitable for validation purposes. When specifying a system we usually describe what the system is supposed to do. However, we also need to be able to validate the system, i.e. to show that it is defined according to the specification.
• It should be possible to express security requirements in the language. Unfortunately, there is currently no standard notation to represent those requirements.
The SRSL has two levels of representation: the multi-layer module scheme and the security scenario description. The multi-layer module scheme describes security systems in terms of a multi-layered structure. The first layer is the communication medium, also used to study attack strategies (i.e. the intruder's behaviour) in the security analysis phase. The other layers depend on the security mechanisms defined during system development. This description is translated into a security scenario representation using standard package definitions. As an example, we consider a system that uses SSL security mechanisms in order to achieve server authentication and confidential communication. Its design must ensure that an application server (the responder) is given evidence of the fact that a sender (the initiator) has previously sent some message, i.e. the protocol must be able to ensure non-repudiation of the origin of messages sent to the server. The module specification is depicted in Figure 1.
Fig. 1. SRSL Module Specification (User Protocol: INITIATOR–RESPONDER, running over an SSL Protocol layer with SSL client and server sides, on top of the medium layer)
The SSL layer is described in a standard security communications package; therefore, part of the medium layer is generated automatically. Consequently, we only have to specify the Initiator-Responder protocol, which is composed of simple scenarios described in MSC.
The SRSL security scenario description is divided into three parts: the specification of protocol elements, the message exchange flow, and the security services requirements. The extensions proposed concern the definition of entities, the definition of data types related to security aspects, and the security services that are going to be used or analysed. These are described in comment text boxes, and are intended to be examined during the security analysis phase. The elements required to define a security protocol can be divided into several categories, explained as follows (keywords are in italics):
• Entities: Agent (Initiator/Responder), the principal's identification; Key_Server, which provides cryptographic keys; Time_Server, which provides time tokens; Notary, which registers the transaction; Server Certification Authority (SCA), which validates a certificate.
• Messages: Text, of type clear text; Random_Number, an integer; Timestamp, giving the actual time; Sequence, the count number.
• Keys: Public_key, e.g. in PKCS#12 format (an apostrophe symbol (') references the corresponding private key); Certificate, a public key signed by a CA; Private_key, used to sign documents; Shared_key, a secret key shared by more than one entity; Session_key, a secret key used to encrypt transmitted data.
In addition, SRSL may operate on previously defined data types. These operations are: Concatenate, composition of complex data (operator ','); Cipher (operators '{' '}'), which produces cipher data resp. cleartext data (e.g. RSAcipher/RSADecipher, PKCS#1 format); Hash, the result of a one-way algorithm; Sign (operators '[' ']'), a message hash encrypted with the signer's private key (e.g. RSAsign, PKCS#7). Furthermore, user-defined functions are also considered.
The message exchange is defined in MSC, the requirements language most widely utilized in telecommunications, and its extension HMSC, explained in Sect. 4. MSC is known for its high degree of flexibility and is universally accepted in protocol engineering.
The security services requirements section is also described in comment text boxes. We use three different security statements: Authenticated(A,B), stating that B is certain of the identity of A; conf(X), stating that the data X cannot be deduced (confidentiality); and NRO(A,X) (non-repudiation of origin), stating that the data X (the evidence) must have originated in A. These statements have been formally defined in [5]. Furthermore, an automatic translator program [7] is used to produce the SDL version of the system from its SRSL specification; the SDL system produced is subsequently used to analyse the security requirements of the system during the analysis phase.
In the specification stage we must consider two different kinds of tasks. The first concerns the specification of a system yet to be developed, in which case we have more freedom to choose which security mechanisms to use. The second concerns the specification of an already implemented system.
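To make the three security statements concrete, the following Python sketch encodes them as plain data objects and states the requirements of the SSL example introduced above (server authentication, confidentiality, and non-repudiation of origin towards the responder). The encoding itself is our own illustrative assumption; in SRSL these statements live in MSC comment text boxes.

from dataclasses import dataclass

@dataclass(frozen=True)
class Authenticated:      # B is certain of the identity of A
    A: str
    B: str

@dataclass(frozen=True)
class Conf:               # the data X cannot be deduced
    X: str

@dataclass(frozen=True)
class NRO:                # the data X (the evidence) must have originated in A
    A: str
    X: str

requirements = [
    Authenticated("Responder", "Initiator"),  # SSL server authentication
    Conf("message"),                          # confidential transmission
    NRO("Initiator", "message"),              # evidence of origin for the server
]
print(requirements)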
In order to specify a system, we have to identify its different functional parts, which are represented as compositions of MSCs using HMSC. Each part is considered separately. Next, we specify the possible scenarios without regard to security requirements, i.e. taking into consideration only the purely functional aspects. As an example, if we want to specify the access to a bank account's data via the Internet, we only describe the data request and the bank's reply.
Figure 2 depicts a scenario consisting of a sequence of message exchanges, described in MSC. As we can see, we have two entities: the User_browser and the Bank_portal. The user, by way of a browser, asks for access to the bank's portal in order to request his bank data. We obtain two alternative scenarios according to whether the access request is accepted or rejected. First the user sends his account number and data request. If the request is accepted, the Bank_portal sends the User_data_bank; otherwise a rejection message is sent instead. The transmitted data is finally displayed in the user's browser.
Fig. 2. MSC scenario of user's access to data bank (MSC bank_access: after Web_access, User_Browser sends data_request(acc_number, data_req) to Bank_Portal; the alt operator covers the reply show_user_data(User_data) and the rejection show_req_rejected(data_rejected))
We can now analyse the specification in order to verify and validate the functional requirements, whereupon we may proceed to analyse the security requirements. These are defined in a comment text box. If a new system is being designed, we include here the security mechanisms needed to meet these requirements. Otherwise, we include the existing security mechanisms instead. In Figure 3, we describe the process of integration of the security requirements into the specification. The security requirements are the following:
1. Authenticated(Bank_Portal, User_Browser);
2. Authenticated(User, Bank_Portal);
3. conf(account_number);
4. conf(data_req);
5. conf(user_data);
The first requirement means that the bank's portal must authenticate itself to the user's browser; this is achieved with the help of an HTTPS connection with server authentication. The second requirement means that the user must authenticate itself to the bank's portal; this is accomplished by a mechanism that asks for the user's identification and password, and subsequently validates them. The last three requirements mean that the transmitted data is confidential; this goal is accomplished by making use of the session key established during the HTTPS connection.
Fig. 3. SRSL security scenario of user's access to data bank (MSC bank_security_access: a definitions box declares User_Browser and Bank_Portal as Agents, the exchanged Text items and httpskey as a session_key; a Security Service box lists the five requirements above; the message flow covers the server-authenticated HTTPS connection, login/password validation, the data request and the accepted/rejected alternatives, all encrypted under httpskey)
We could equally well have considered alternative security mechanisms to meet the five requirements. The important point to note here is that we have chosen a form of specification that does not bind the developer to any particular security mechanism, thus enhancing separation of concerns and modularity. This is accomplished by allowing the security requirements to be defined at a high level of abstraction and independently of the definition of the functional requirements. In the case that we have a system that is already implemented, i.e. a legacy system, and we want to analyse or document it, we describe instead the security mechanisms that have been implemented.
6 A Case Study: Online Contracting Processes

We have applied our methodology to a system currently being developed by an IT company that plays the role of user partner in the EU project where this work has been performed. The company is working on a virtual enterprise business scenario implementing online contracting processes by integration of Trusted Third Party (TTP) services, such as electronic notary systems, into a web-based multi-user services platform. The current online contracting process is rather complex and supports several activities such as contract creation, negotiation, signing and final archiving.
To start with, we focus on the contract signing process (managing the contract signing and notarisation process control). This procedure is part of the business-to-business scenario for setting up a virtual enterprise platform integrating technology components such as e-contracting, e-notary and role-based authorization engines. This section describes the existing electronic notary process within an e-business scenario. The central core of this set-up is the MESA platform, developed by the IT company. MESA provides web-based user interfaces and role-based control mechanisms for accessing functions made available by the TTPs.
Fig. 4. Contract signing process
The following diagram (depicted in figure 5) describes the contract signing process implemented by an e-Notary reference application and used within the IT company scenario. A user intending to access a web-based user interface provided by the MESA
platform manually triggers the contract signing process within the following business scenario. In the sequel we describe the contract signing process, including the security requirements and the relationships among the users, the MESA platform, and the e-notary service. Our methodology has been used to examine this process in terms of communication security issues. The intended goals have been to validate the model and to evaluate both the current reference implementation and a proposed extension of the reference implementation to an agent-based scenario.
Fig. 5. Application structure in SRSL module description
This implementation is being used within the current business scenario. However, the current client/server implementation, based on traditional PKC technology, has inherent problems in terms of flexibility and scalability. While the reference scenario requires a certain infrastructure, compliance with the European directives concerning digital signatures, with alternative PKC technologies and with certificate infrastructures might be more suitable when adopting the e-notary process within other business scenarios (with a different context of actors, contents, legal requirements and liability issues). In fact, changing the context of a recent e-notary deployment scenario and identifying the implications in terms of security are the most interesting challenges we face.
We have to pay attention to the fact that what we have specified here is a newly implemented system. Therefore the task has been to describe the behavior of the current application in order to analyze and improve the current implementation. We started by emphasizing for the developers the usefulness of elaborating a system specification intended to clarify the different scenarios, in order to increase our understanding of them and to avoid certain ambiguities. We show now the definition of the system module description representing the different layers structured according to the security services.
The protocols described in each layer are specified in terms of MSC/HMSC diagrams. Many are standard protocols and are therefore instances of generic specifications. The system (see Figure 6) is divided into three parts: the contract creation process, the signing process, and the notarisation process. Each MSC reference is described by a diagram at a lower abstraction level.
Fig. 6. HMSC application description (HMSC Netunion: contract_creation, signing_process, notarisation_control)
A representative part of the specification is the create_contract scenario (Figure 7).
Fig. 7. SRSL create_contract security scenario (MSC create_contract between CL and MESA, starting from the condition client_authenticated; a Security Service box requires conf(contract), conf(session_ID), conf(template_ID), conf(template) and conf(list_of_signers); the alternatives cover contract creation from a template, negotiation, contract upload, choice of signers and final confirmation, with all messages encrypted under httpskey; the scenario ends in the condition created_contract)
The contract leader (CL) triggers the contract creation process. Previously, the contract leader and the MESA platform had to be authenticated, and an HTTPS session key exchanged; this is represented by the initial state client_authenticated. The scenario is divided into four independent alternatives (alt operator). In the third sub-scenario we use the task MSC operator to specify the possibility of an external negotiation agreement that is not part of our system. The fourth sub-scenario ends the process by accepting the uploaded contract and starting the next scenario in the state created_contract.
The developers considered this methodology very useful for their purposes, especially with regard to the specification of the contract signing process (Figure 8). The notification was initially implemented by letting the e-notary service send an e-mail to each signer. However, this procedure is unreliable since it lacks any kind of security guarantee. When this fact was drawn to the attention of the developers, they decided to modify the system in order to provide security services, such as the signing of the e-mail by the e-notary service to ensure non-repudiation of origin (NRO). This has been appended to the security services section, and a signing mechanism has been included in the definition of the e-notary. This mechanism is checked later in the analysis phase.
Fig. 8. SRSL contract_signing security scenario & non-repudiation improvement (left: MSC contract_signing between Signer and e_notary, looping over request_for_contract_signing(contract_ID); right: MSC contract_sign_NRO, whose Security service box adds NRO(e_notary, contract_ID) and Integrity(contract_ID) and whose definitions add the public key PKen, with the e-notary returning contract_ID_signed({contract_ID}PKen))
The developers deemed this methodology easy to learn and to apply in real environments. They believed that it was of great help for understanding the implementation and that it provided a method to improve the application with regard to the required security services and mechanisms. Furthermore, the methodology made available a formal method of analysis that increased the developers' and users' reliance on the system.
7 Conclusions

We have studied different methods for designing and analyzing a system containing communication protocols. We observed that in order to define a secure system, developers need a unified framework that allows them to integrate the security aspects into both the software system itself, or at least relevant parts of it, and the communication protocols that constitute a part of the total system. The methodology presented consists of an extension of the ITU standard requirements language MSC, called SRSL, a high level language for the specification of cryptographic protocols. In order to illustrate the methodology, we have shown an application consisting of an electronic notary process scenario that the developers wanted to validate and improve. Moreover, we have described how this electronic notary process can be inserted into a different scenario, given different input parameters. In this way, we were able to offer a framework within which it became possible to define and evaluate different deployment options for rolling out the security services. We conclude by noting that the solutions proposed were very well received by the developers, who considered them easy to learn and to apply.
Acknowledgements. The work described in this paper has been supported by the European Commission through the IST Programme under Contract IST-2001-32446 (CASENET).
References
1. ITU-T Recommendation Z.100 (11/99), Specification and Description Language (SDL), Geneva, 1999.
2. ITU-T Recommendation Z.120 (11/99), Message Sequence Charts (MSC-2000), Geneva, 1999.
3. Jürjens, J., Towards Development of Secure Systems Using UMLsec, Lecture Notes in Computer Science 2029, 2001.
4. Lopez, J., Ortega, J.J., Troya, J.M., Protocol Engineering Applied to Formal Analysis of Security Systems, InfraSec'02, LNCS 2437, Bristol, UK, October 2002.
5. Lopez, J., Ortega, J.J., Troya, J.M., Verification of Authentication Protocols Using the SDL Method, Workshop of Information Security, Ciudad Real, Spain, April 2002.
6. Meadows, C., Open Issues in Formal Methods for Cryptographic Protocol Analysis, Proceedings of DISCEX 2000, pages 237–250, IEEE Computer Society Press, 2000.
7. Menezes, A., van Oorschot, P.C., Vanstone, S., Handbook of Applied Cryptography, CRC Press, 1996.
8. Millen, J., Denker, G., CAPSL Integrated Protocol Environment, DARPA Information Survivability Conference (DISCEX 2000), IEEE Computer Society, 2000.
9. Millen, J., Denker, G., CAPSL and MuCAPSL, Journal of Telecommunications and Information Technology, 2002.
10. Object Management Group. http://www.omg.org/
11. Ryan, P., Schneider, S., The Modelling and Analysis of Security Protocols: the CSP Approach, Addison-Wesley, 2001.
Application Level Smart Card Support through Networked Mobile Devices

Pierpaolo Baglietto¹, Francesco Moggia¹, Nicola Zingirian², and Massimo Maresca²

¹ DIST, University of Genoa, via Opera Pia 13, 16145 Genova, Italy
{p.baglietto, fram}@dist.unige.it
² DEI, University of Padua, via Gradenigo, 16145 Padova, Italy
{mm, panico}@dei.unipd.it
Abstract. This paper addresses the problem of using networked mobile devices as providers of cryptographic functions. More specifically, the paper describes a system developed to allow the usage of portable devices, such as PDAs and mobile phones, as remote smart card readers when connected to a TCP/IP network. The system is completely transparent to desktop applications. The digital signature technology, at its highest level of security, requires the use of smart cards and smart card readers, which are not yet widely deployed. This requirement limits mobility and may turn out to be an obstacle to the wide adoption of the digital signature technology. Our work aims precisely at facilitating the adoption of the smart card technology by means of PDAs and mobile phones.
1 Introduction

The main challenge for a successful and wide deployment of a new technology is the implementation of new user friendly services, possibly characterized by a minimal impact on the existing infrastructure. The use of smart cards to execute cryptographic operations, such as the electronic signature, is presently possible only by means of smart card readers directly connected to the PC. The goal of our work has been to allow access to the services of a microprocessor card located on a remote device without the introduction of a new API; on the contrary, at the application level we adopt standard interfaces that are already widely deployed.
In past projects, such as the CASTING project [1], a GSM phone has been proposed as a tool for authentication and for the storage of confidential information. In that case, private keys were securely stored on a SIM card that was able to communicate with a desktop PC using the SECTUS protocol, an ad hoc wireless protocol. In other works [2][3][4] it has been proposed to make smart cards accessible in a more general distributed environment. The work described in this paper, in contrast, is focused on the development of a system which simply allows remote access to the services provided by smart cards
through the standard PKCS#11 [5] or CSP (Cryptographic Service Provider) [6] application interfaces in TCP/IP networks. More specifically, the architecture presented in this paper is based on a client-server model and consists of two components, both compliant with either the PKCS#11 or the CryptoAPI specification. The client module establishes a secure TLS [7] channel with its corresponding server application and forwards the calls made to the PKCS#11 or CSP module to the server module. The server module receives the information about the requested cryptographic operations and executes them on the smart card inserted in a local smart card reader by means of the usual PKCS#11 or CSP modules. The whole process is transparent to the user.
In section 2 we briefly present the application level cryptographic standards we have used in the implementation of the system; in section 3 we describe the usage scenario we considered; and in section 4 we review related work in the remote smart card application field. The system architecture is described in section 5, and the development of a prototype system in section 6. Finally, we give some concluding remarks in section 7.
2 Smart Card Application Programming Interfaces

The digital signature devices are used by applications by means of two standards, called respectively CryptoAPI and PKCS#11, that define a high level application programming interface (API) layer. This API isolates an application from the details of the cryptographic device. The application does not have to change its interface for a different type of device or to run in a different environment; thus, the application is portable.

2.1 Cryptographic APIs
Cryptographic APIs (CryptoAPIs) are sets of functions that allow applications to encrypt and digitally sign data in a flexible manner, granting privacy, secrecy and protection for the user's sensitive private-key data. CryptoAPIs provide services that enable developers to add cryptography and certificate management functionality to their applications. Applications can use CryptoAPI functions ignoring the underlying implementation, in the same way that an application can use a graphics library without paying attention to the particular graphics hardware configuration. All cryptographic operations, furthermore, are performed by independent modules known as cryptographic service providers (CSPs). Each CSP provides a different implementation of the CryptoAPI. Some CSPs interact directly with hardware components such as smart cards.
2.2 PKCS#11
This standard specifies an application programming interface (API), called "Cryptoki", for devices which hold cryptographic information and perform cryptographic functions. Cryptoki addresses the goals of technology independence (any kind of device) and resource sharing (multiple applications accessing multiple devices), presenting to applications a common, logical view of the device called a "cryptographic token".
3 Scenario

The basic usage scenario is the signature of an e-mail, a document, or anything that has to be signed for authentication, integrity and non-repudiation purposes, without using a local smart card reader. In particular, let us consider the task of signing a document. If a user wants to digitally sign a document in the traditional way, he has to use a smart card reader attached to his personal computer together with the related software. This restricts the spread and utilization of digital signatures, making them feasible only with a fully configured and fully equipped PC. Our system, on the contrary, allows the user to perform a digital signature operation everywhere, by means of a mobile device. In everyday life few people carry a PC smart card reader with them, but everyone possesses a cellular phone or, certainly in the near future, a more intelligent device such as a PDA or a smartphone. Our idea is to use these mobile devices as remote smart card readers.
Fig. 1. The system's basic elements
Another usage scenario may be the following. Consider the trust relationship usually existing between a company secretary and her sales manager in normal business transactions. The company secretary needs some documents, not particularly significant for the enterprise, to be signed by the sales manager. The manager
trusts the secretary and puts his signature without spending time to read them accurately. This is a widespread interaction model, and we want to reproduce it using digital signatures through networked mobile devices.
The proposed schema is based on a client-server model that involves three main components: a client application on a desktop computer, a server application on a mobile device, and a secure channel between them. The mobile device is equipped with a smart card that contains the secret keys of the signer. We suppose that the mobile device has a PC/SC compliant smart card reader. The environment is a TCP/IP network, so that our device can be identified by a unique IP address. This approach is therefore valid both in a wireless LAN and in a WAN across the Internet.
4 Related Work in Remote Smart Card Applications

The problem of making the services offered by a smart card accessible from the Internet has been addressed in the literature. In the WebSIM approach [3] a GSM SIM is made transparently visible from the Internet via HTTP. This is possible by means of a small Web server placed on the SIM, whose sole function is to provide a Web-like interface to the SIM. All the services are implemented as server-side scripts on the card and can therefore be accessed via HTTP requests. The Web server is an applet on top of a GSM SIM Toolkit platform that allows interaction with the SIM and provides a set of APIs for I/O operations. The problem of connecting the Web server to the Internet is solved by introducing a proxy server into the WebSIM architecture. The proxy has a public IP address to which an Internet host can send its HTTP requests. The proxy encapsulates the HTTP request into a specially tagged SMS and sends it to the user's mobile phone. Once decapsulated, the HTTP request is processed by the Web server and the response is sent back to the proxy using the same SMS tunneling mechanism. Finally, the proxy extracts the HTTP response from the SMS and sends it to the Internet host. The main purpose of WebSIM is to univocally authenticate and identify a user on the Web.
In the CASTING project [1] a mobile phone, containing the user certificate and the corresponding private key, is used for authentication purposes, allowing each user to securely access his personal data, placed on a Web server, from a pool of fixed PCs (terminals). The link between the mobile phone and the user terminal is secured by an ad hoc wireless protocol named SECTUS, while the long-distance Internet connection is protected with SSL/TLS. The idea behind this work is to install a custom PKCS#11 Netscape token on the user terminal, which asks the mobile phone to execute on the card all the cryptographic functions involving the user private key in the SSL handshake with the Web server, and leaves to the terminal the task of performing the remaining operations. The goal of the SECTUS protocol is to establish an authentic channel between the terminal and the mobile phone by sharing a secret between them. The secret is necessary to compute the MAC of all messages that are sent across the wireless link, assuring that they have not been altered. The SECTUS protocol consists of the following steps (a sketch is given at the end of this section): the mobile phone generates a temporary secret, called r, that is
inserted by the user into the terminal. The mobile phone sends its certificate to the terminal, which generates an authentication key, called s. This key is encrypted with the user public key and sent back to the mobile phone together with a MAC calculated with r. The mobile phone verifies the MAC, decrypts s and, finally, sends the MAC of its certificate calculated with s to the terminal, which can then verify it. In the TLS handshake with the Web server the client proves its identity by signing a hash value; the authenticity of the transmission of the hash and the signature is assured by a MAC computed with the authentication key s.
A different approach, based on a distributed object model, is adopted by Chan and Tse [4], whose aim is to give a Corba interface to Java Card applications. The proposed architecture defines an OrbCard adaptor that works as a proxy gateway, translating the client's Corba-specific requests into their APDU representations and sending them to the on-card applet. This applet generates the responses in the APDU format, which the adaptor maps into Corba responses and sends back to the client. The OrbCard adaptor thus gives a Corba interface to the on-card services, so that they appear as typical Corba objects.
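The following Python sketch replays the SECTUS steps just described at toy scale. The public-key encryption of s is replaced by an XOR stand-in (clearly not public-key cryptography), and we assume the MAC sent with the encrypted key is computed over the ciphertext; both are our own illustrative assumptions.

import hmac, hashlib, secrets

def mac(key: bytes, msg: bytes) -> bytes:
    return hmac.new(key, msg, hashlib.sha256).digest()

def pk_encrypt(key: bytes, msg: bytes) -> bytes:
    # stand-in for encryption under the user's public key (XOR pad,
    # NOT secure and NOT asymmetric; illustration only)
    return bytes(a ^ b for a, b in zip(msg, key))

pk_decrypt = pk_encrypt                 # the XOR pad is its own inverse
user_key = secrets.token_bytes(16)      # stand-in for the user's key material
certificate = b"phone-certificate"

r = secrets.token_bytes(16)             # 1. phone: temporary secret, typed into the terminal
                                        # 2. phone -> terminal: its certificate
s = secrets.token_bytes(16)             # 3. terminal: authentication key s
ct = pk_encrypt(user_key, s)            #    encrypted s...
t = mac(r, ct)                          #    ...plus a MAC keyed with r
assert hmac.compare_digest(t, mac(r, ct))        # 4. phone verifies the MAC...
s_on_phone = pk_decrypt(user_key, ct)            #    ...decrypts s...
reply = mac(s_on_phone, certificate)             #    ...and MACs its certificate with s
assert hmac.compare_digest(reply, mac(s, certificate))  # terminal verifies
print("authentic channel established (toy run)")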
5 Architecture

Our starting point is that the system must be transparent to the user, who can use standard applications to perform a signing task. The digital signature devices are accessed in our application by means of two standards, called respectively Crypto Service Provider (CSP) and PKCS#11. Both of them define a high level API layer that is independent of the specific cryptographic device. For example, Internet Explorer and Outlook Express use CSPs (through the CryptoAPI interface) to sign a document or an e-mail, while Netscape Navigator uses a PKCS#11 module. Our system relies on these standard interfaces, assuring a good level of portability. We have developed both PKCS#11 and CSP modules that expose all the functions essential for digital signature, which are a subset of the whole set of CryptoAPI/PKCS#11 functions.
The idea is to transmit through a socket channel the data necessary to perform a signing task on the mobile device. The use of TCP/IP and the Internet as communication protocol and communication channel between the application and the smart card, in place of RS232 and a serial cable, entails the use of techniques to make the channel as secure as if we had a local smart card reader. We must be sure about the identity of the mobile device to which we are sending information, we have to be sure about the identity of the person that executes the signing request, and we want to assure communication secrecy.

5.1 Smart Card Reader Authentication
Our solution for smart card reader authentication and communication secrecy is to use an SSL channel between the client and the server application, so that the PDA can authenticate itself by presenting its certificate to the client and the communication
channel between client and server can be encrypted for information-hiding purposes. Smart card authentication is necessary to prevent someone from impersonating the mobile device. Otherwise, an ill-intentioned party could easily place himself between the user and the mobile device, read the whole communication and modify it in order to have his own document signed instead of the user's.
Fig. 2. The modules we have developed
5.2 User Authentication
For user authentication, instead, we use an ad hoc solution. When the user wants to sign a document or an e-mail, the application that performs the operation asks the user to insert a PIN. The PIN generally protects sensitive data on the card and allows only authenticated access to the on-board functions. In our application, the PIN takes on the additional meaning of a user identification code. The user identification code serves exclusively to allow the user to be sure that the signature request appearing on his PDA is the request started on the personal computer. Once the user has verified that the request is correct by means of the user identification code on the PDA, he must enter the PIN code to perform the signature. Inserting the PIN code on a PDA is more secure than inserting it on a personal computer: altering the software or the operating system on a PDA is more difficult than modifying them on a PC, simply because the PDA is less accessible than a personal computer. On the personal
computer, in fact, an ill-intentioned party could install software that, while the user inserts a PIN, captures the keys pressed on the keyboard and recovers the PIN.
The user identification code is necessary, for example, to avoid man-in-the-middle attacks. If we sent a request to the mobile device to sign a document without the user identification code, a hacker could block our request and take our place, and we would not have the means to detect this attack. The user, interpreting the request on his PDA as trusted, would then insert the PIN code to finalize the signature task, but the owner of the signed document would be the ill-intentioned party and not the user.
Fig. 3. System architecture
Fig. 4. A possible attack on the system
6 Implementation

At the implementation level, we have developed a system based on a client-server paradigm. The client side application includes a PKCS#11 module, while the server side application includes an SSL server. Moreover, the client side application communicates with the server side application by means of an ad hoc communication protocol.

6.1 Ad Hoc Communication Protocol
Our ad hoc communication protocol consists of a simple serialization mechanism. In particular, we have implemented a common base class that holds a function identifier and a member variable containing the server side error code generated by the invoked function. Each PKCS#11/CSP function is mapped onto a class, derived from the base, that contains the function-specific parameters. The necessary steps to invoke a function are the following (a sketch is given at the end of this section):
- Instantiate the class corresponding to the function (the function id is automatically assigned in the constructor of the class)
- Insert the values of the function parameters into the instantiated class
- Send the class instance to the server
- Receive the server response
- Parse the response and return the appropriate values to the client application

6.2 Client Side Application
A PKCS#11 module is the main part of the client side application. For simplicity, we have not implemented all the functions defined by the standard but only those necessary for performing digital signature operations, and only some of them are executed remotely. The functions performed locally are those concerning information about the PKCS#11 module and the functionalities it supports, and all that is not used directly by the smart card. The client module opens an SSL channel with the mobile device when the application requests the execution of the first remote function. In order to verify whether the mobile device to which we are sending confidential information is a trusted device, the PC screen displays the certificate sent by the mobile device in the SSL handshake and asks the user whether its content is correct. At this point the SSL channel is established and all remote functions can be invoked by means of the ad hoc protocol mentioned above. Finally, the client closes the SSL session when the digital signature task is finished or when a timeout expires.

6.3 Server Side Application
The server side application waits for an incoming client request and tries to establish a secure SSL channel with it. The server identity is guaranteed by the SSL certificate and by the corresponding private key generated with OpenSSL.
If the SSL handshake succeeds, the server is ready to execute the function requested by the client. The server performs all operations, parsing the client request and identifying the function called; this is accomplished by checking the first byte received, which specifies the identifier of the function invoked by the client application. This process continues until the client requests execution of the C_Login function. At this point the server shows on the display of the PDA the one-time user identification code, so that the user can verify it and, if it is correct, insert the PIN code to perform the digital signature. For the management of the SSL channel we have used the OpenSSL libraries compiled for Windows CE.
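The following Python sketch illustrates the mechanism of Sects. 6.1–6.3. The function identifiers, the single-byte wire layout and the C_Login/C_Sign parameter choices are our own illustrative assumptions (the actual wire format is not fixed by the text), and the smart card operations and SSL transport are stubbed out.

import struct

class Request:
    # common base class: function identifier plus server-side error code
    FID = 0
    def __init__(self):
        self.error = 0
    def payload(self) -> bytes:
        return b""
    def serialize(self) -> bytes:
        # first byte on the wire is the function identifier (cf. Sect. 6.3)
        return struct.pack("B", self.FID) + self.payload()

class C_Login(Request):
    FID = 1
    def __init__(self, pin: bytes):
        super().__init__()
        self.pin = pin
    def payload(self) -> bytes:
        return self.pin

class C_Sign(Request):
    FID = 2
    def __init__(self, data: bytes):
        super().__init__()
        self.data = data
    def payload(self) -> bytes:
        return self.data

def server_dispatch(msg: bytes) -> bytes:
    # the server identifies the invoked function from the first byte and
    # would hand the operation to the card via its local PKCS#11/CSP module
    fid, body = msg[0], msg[1:]
    if fid == C_Login.FID:
        return b"login-ok"              # after PIN confirmation on the PDA
    if fid == C_Sign.FID:
        return b"signature-of:" + body  # stub for the card-computed signature
    return b"error: unknown function"

# client side: instantiate, fill in parameters, send over the SSL channel
# (transport omitted here), then parse the response
print(server_dispatch(C_Sign(b"document-hash").serialize()))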
7 Conclusions and Future Work

We have presented an architecture model, based on well-defined standards, that allows a document to be signed remotely and in a transparent way. We have built a prototype that uses a handheld PC with a wireless LAN card, and we are working to extend this work to cellular phones. The SIM Application Toolkit provides the SIM card with an embedded Java virtual machine on which it is possible to run applets. Our idea is to use a proxy that terminates the SSL connection of the client and works as an SMS gateway for the mobile device. Moreover, a direct porting of our application to mobile phones that adopt Windows CE as their operating environment is straightforward.
References
1. Rohs, M., Vogt, H., Smart Card Applications and Mobility in a World of Short Distance Communication, CASTING Project Tech Report, ETH Zurich, January 2001.
2. Kehr, R., Rohs, M., Vogt, H., Mobile Code as an Enabling Technology for Service-oriented Smartcard Middleware, Proc. 2nd International Symposium on Distributed Objects and Applications (DOA'2000), Antwerp, Belgium, IEEE Computer Society, pp. 119–130, September 2000.
3. Guthery, S., Kehr, R., Posegga, J., Vogt, H., GSM SIMs as Web Servers, 7th International Conference on Intelligence in Services and Networks (IS&N), Athens, Greece, February 2000.
4. Chan, A., Tse, F., Cao, J., Leong, H., Enabling Distributed Corba Access to Smart Card Applications, IEEE Internet Computing, Vol. 6, No. 3, pp. 27–36, May 2002.
5. RSA Laboratories, PKCS#11 v2.11: Cryptographic Token Interface Standard, November 2001.
6. Coleridge, R., The Cryptography API, or How to Keep a Secret, Microsoft Developer Network Technology Group, August 1996.
7. Allen, C., Dierks, T., The TLS Protocol, Version 1.0, Internet RFC 2246, January 1999.
Flexibly-Configurable and Computation-Efficient Digital Cash with Polynomial-Thresholded Coinage

Alwyn Goh¹, Kuan W. Yip², and David C.L. Ngo³

¹ Corentix Laboratories, B-19-02 Cameron Towers, Jln 5/58B, 46000 Petaling Jaya, Malaysia. [email protected]
² Help Institute, BZ-2 Pusat Bandar Damansara, 50490 Kuala Lumpur, Malaysia
³ Faculty of Information Science & Technology, Multimedia University, 75450 Melaka, Malaysia

Abstract. This paper describes an extension of the Brands protocol to incorporate flexibly-divisible k-term Coins via application of Shamir polynomial parameterisation and Feldman-Pedersen zero-knowledge (ZK) verification. User anonymity is preserved for up to k sub-Coin Payments per k-term Coin, but revoked for over-Payments with (k+1) or more sub-Coins. Poly-cash construction using only discrete logarithm (DL) or elliptic curve (EC) operations enables efficient implementation in terms of the latter, which constitutes an advantage over previous divisible Coin formulations based on quadratic residue (QR) binary-trees, integer factorisation (IF) cryptography or hybrid DL/IF. Comparative analysis of Poly-cash and previous protocols illustrates the advantages of the former for operationally realistic Coin sub-denominations. The advantage of Poly-cash in terms of computational overhead is particularly significant, and facilitates implementation on lightweight User Purses and Merchant Payment-terminals. Configurable k-divisibility is also an important consideration for real-world applicability with decimal currency denominations, which is not well addressed by the binarised values of QR-tree divisible Coins.
1 Introduction

Digital cash protocols—specifying interactions between distinct User, Bank and Merchant entities—typically emphasise anonymous Payment transactions and feature Coin data-structures which can be recognised as authentic. These attributes are characteristic of physical currency, the goodness of which can be established without User identity being an operational issue. The notion of User anonymity in a digital cash context is usually modified to result in preservation only in the event of legitimate Coin usage. Conditional anonymity allows for offline Coin verification—with respect to its having been issued by a particular Bank, without compromising User identity and without the necessity for Merchant-to-Bank connectivity—during a User-to-Merchant Payment. Merchants and Banks are, however, protected via detection of User fraud during subsequent Merchant-to-Bank Deposits. The conceptual framework for digital cash protocols was defined by the groundbreaking research of Chaum [1] and Brands [2, 3], the latter of which constitutes the base-formulation (with single-term nondivisible Coins) for the featured protocol.
The concept of Coin-divisibility (with multiple sub-Coins per Coin) was first formulated by Okamoto-Ohta [4, 5], and is motivated by the reduced overheads arising from amortisation of computationally-expensive Withdrawals. Okamoto
bility is based on QR/IF cryptography, with the Coin a binary-tree of successively computed QRs. Such a data-structure—subsequently also used in the Chan-FrankelTsiounnis [6] protocol—has 2k leaf-nodes for a tree of height k, allowing sub-Coins of binarised fractional value 1 2
k
for k ∈ [1, n]. QR-tree Coinage is extremely effi-
cient for large k values, but would result in relatively high overheads for smaller denominations due to the necessity for long QR parameter-lengths. The Ferguson protocol [7] also features Coin divisibility based on the Rivest-Shamir-Adleman (RSA) formulation, and would therefore require similarly high overheads. We outline a divisible cash protocol with flexibly configurable k-term Coins— with sub-Coins of relative fractional value 1 —specified via Shamir [8] polynomials k
and Feldman-Pedersen [9, 10] verification. The proposed divisibility mechanism can be expressed in terms of DL or EC finite-field operations, the latter of which results in significantly reduced overheads. This is a major advantage over the QR/IF basis of previous divisible cash protocols. The resultant k-term Coins and extended Brands operational framework are both efficient and versatile, for instance being straightforwardly supportive of practical (k = 10) dime or (k = 20) nickel denominations.
2 Review of Single-Term Brands Protocol

2.1 Bank/User Setup

The Brands protocol can be described in terms of the following processes:
(1) Bank Setup: establishment of common computational environment
(2) User Setup: account establishment and individual licensing for Coin-generation
(3) Withdrawal: generation of anonymised Coin, resulting in User account-debit
(4) Payment: transfer of transaction-committed Coin from User to Merchant
(5) Deposit: submission of Coin to Bank, resulting in Merchant account-credit
as illustrated below:
[Figure: the Withdraw/Pay/Deposit cycle between User, Merchant and Bank, with edge labels "Blinded Unsigned Coin"/"Blinded Signed Coin" (Withdrawal), "Payment Transaction Parameters"/"Signed Committed Sub-Coin" (Payment) and "Signed Committed Sub-Coin" (Deposit)]

Fig. 1. Digital Cash Withdraw-Pay-Deposit Cycle
Brands digital cash—in common with other DL formulations—can be transcribed in terms of EC finite-field operations, thereby leveraging the latter's order-of-magnitude advantage in terms of computation, storage and communications for equivalent-security parameterisations. The Brands scheme in particular requires publication of the following environmental parameters during Bank Setup: (1) primes (p, q) with q | p−1 as previously defined, (2) base-points (g, g′, g″) ∈ E^b_{a,p} on the elliptic curve defined over Z_p with number of points divisible by large prime q, and (3) the public component of Bank key-pair (x, y (= xg)). Individual User Setup episodes can subsequently proceed with the generation of individual key-pairs (µ, I (= µg′)); with the public components submitted to the Bank, associated with specific User accounts and signed (using Bank private-key x) to create Coin-generation licenses of form z = x(I + g″). In the Brands framework the different base-points serve various functions ie g for blind-signature generation and (g′, g″) for representation of the Coin data-structure.

2.2 User-to-Bank Withdrawal

The simultaneous requirements for verifiable Coin-authenticity and conditional User-anonymity are ensured by the Bank generating a blinded Schnorr signature for each Coin, as outlined below:

Table 1. User-to-Bank Withdrawal
1. Bank B: generate random w; compute Schnorr (a, b); send (a, b) → User.
2. User U: generate blinding (s, u, v); compute Coin A and (z′, a′, b′); generate secret-splitting (x′, x″); compute Coin B = x′g′ + x″g″; compute Coin challenge (c′, c); send c → Bank.
3. Bank B: compute r = cx + w mod q; send r → User.
4. User U: verify rg = cy + a and r(I + g″) = cz + b; compute r′ = ru + v mod q.
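To make the algebra of Table 1 concrete, the following toy sketch (our illustration, not part of the original specification) exercises the two verification conditions in multiplicative notation, with m standing in for (I + g″); the checks rg = cy + a and r(I + g″) = cz + b then read g^r = y^c·a and m^r = z^c·b:

```python
import random

# Toy Schnorr group: p = 2q + 1 with q prime, so squares mod p form a
# subgroup of prime order q. Parameters are far too small for real use.
p, q = 2039, 1019
g = 4                                   # 4 = 2^2 is a square mod p, hence of order q

x = random.randrange(1, q)              # Bank private key
y = pow(g, x, p)                        # Bank public key
m = pow(g, random.randrange(1, q), p)   # stand-in for the Schnorr message (I + g'')
z = pow(m, x, p)                        # Coin-generation license z = x(I + g'')

w = random.randrange(1, q)              # Bank's secret per-Coin randomisation
a, b = pow(g, w, p), pow(m, w, p)       # Schnorr pair (a, b) sent to the User
c = random.randrange(1, q)              # blinded challenge returned by the User
r = (c * x + w) % q                     # Bank response r = cx + w mod q

assert pow(g, r, p) == pow(y, c, p) * a % p   # rg = cy + a
assert pow(m, r, p) == pow(z, c, p) * b % p   # r(I + g'') = cz + b
```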
The Schnorr signature protocol (in common with other EC/DL formulations) requires a secret signer-determined randomisation w; which is subsequently used to compute (a, b) = (wg, w(I + g″)), the second component of which is User-specific. The subsequent blinding—by the signature verifier (User)—usually requires four parameters, two each for message blinding and manipulation of the signer (Bank) response to a verifier-issued challenge. The choice of (I + g″) as the Schnorr message necessitates the application of User private-key µ as one of the blinding parameters, along with random Coin-specific (s, u, v). This allows the computation of (A, z′) = (s(I + g″), sz)
and (a′, b′) = (ua + vg, sub + vA). Parameter A can be regarded as the User-specific part of the Coin, while (z′, a′, b′) are components of the (as yet incomplete) signature. Encoding of the User secret into the Coin is established via randomly generated factors (x′, x″) and B. The latter constitutes a User-to-Bank commitment on the specific Coin, and is later used for the verification of the challenge-response pair during Payment. This is followed by computation of the unblinded c′ = h(A, B, z′, a′, b′) and blinded c = c′u⁻¹ mod q challenge parameters, both of which are Coin-specific and the latter of which is then presented to the Bank for blind-signature affixation.

All computations thus far constitute pre-transaction operations which can, in fact, be executed (and the generated parameters stored) prior to an actual Withdrawal episode resulting in User account-debit. This speeds up Coin-generation to a significant extent, which consequently only requires the User-to-Bank challenge-response sequence in the last two steps of Table 1 for satisfactory conclusion. A proper response r to blinded challenge c would require knowledge—presumed as restricted to the Bank—of private-key x and transaction parameter w. Such a response for blinded message (z, a, b, c) can be verified using Bank public-key y and enables construction of unblinded signature σ(A, B) = (z′, a′, b′, r′), following which the Bank can be safely authorised to execute the account-debit. The interpretation of σ(A, B) as a blind-signature can be seen from the Bank not knowing (A, B), in conjunction with the dependence of the third-party (Merchant) verification condition (r′g, r′A) = (c′y + a′, c′z′ + b′) on Bank public-key y. Coin χ = (A, B, σ(A, B)) is hence verifiably authentic, while also entirely protective of User anonymity.

2.3 User-to-Merchant Payment

Coin χ can subsequently be used for Payment as follows:

Table 2. User-to-Merchant Payment
User U: compute challenge c = h(A, B, I_m, T); compute response (r′, r″) for c; send χ(A, B), T, (r′, r″) → Merchant.
Merchant M: verify Bank signature σ(A, B); compute challenge c; verify response r′g′ + r″g″ = cA + B.
Note the Payment-specific non-interactive challenge c—which incorporates Merchant ID I_m and external specifier (ie timestamp) T—the correct response for which is (r′, r″) = (c(µs) + x′, cs + x″). The response: (1) encodes µ and Coin-specific blinding factor s, while (2) concealing s; thereby facilitating recovery of the User secret (and consequently identity I) with two or more of the challenge-response pairs submitted during the subsequently described Deposit. User anonymity is therefore conditional on proper use ie not more than one Payment per Coin.
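As a sanity check of the Merchant's verification, here is our sketch in multiplicative notation over the same toy group; g1 and g2 play the roles of g′ and g″, and the challenge is a random stand-in for the hash value:

```python
import random

p, q = 2039, 1019                      # toy Schnorr group, p = 2q + 1 (not secure)
g1, g2 = 4, 9                          # squares mod p, hence of prime order q;
                                       # stand-ins for the base-points g' and g''

mu = random.randrange(1, q)            # User private key; identity I = g'^mu
I = pow(g1, mu, p)
s = random.randrange(1, q)             # Coin-specific blinding factor
A = pow(I * g2 % p, s, p)              # Coin component A = s(I + g'')
x1, x2 = random.randrange(q), random.randrange(q)   # secret-splitting (x', x'')
B = pow(g1, x1, p) * pow(g2, x2, p) % p             # commitment B = x'g' + x''g''

c = random.randrange(1, q)             # stand-in for c = h(A, B, I_m, T)
r1 = (c * mu * s + x1) % q             # r'  = c(mu*s) + x'
r2 = (c * s + x2) % q                  # r'' = cs + x''

# Merchant-side verification: r'g' + r''g'' = cA + B
assert pow(g1, r1, p) * pow(g2, r2, p) % p == pow(A, c, p) * B % p
```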
2.4 Merchant-to-Bank Deposit

Deposit is simply Merchant-to-Bank submission of the Coin and Payment-specific commitment χ′(A, B) = (χ(A, B), (c, r′, r″)), as outlined below:

Table 3. Merchant-to-Bank Deposit
Merchant M: send χ′(A, B) → Bank.
Bank B: verify own signature σ(A, B); verify response r′g′ + r″g″ = cA + B; check for over-use of σ(A, B). If over-use is detected, then:
• compute µ = (r′ − ρ′)/(r″ − ρ″)
• compute I = µg′ for identification,
with (ρ′, ρ″) some previously catalogued response in the Bank database, thereby enabling detection of double-spending. There are three types of fraud which can be attempted on a particular Coin, with multiple responses that are: (1) identical, (2) non-identical but non-verifiable, and (3) non-identical and verifiable. The first two cases can probably be attributed to Merchant error or fraud, with User non-liability in (2) due to malformation of the commitments. Case (3) is User double-spending, since valid commitments can only be computed by the User, whose anonymity is then revoked via recovery of key-pair (µ, I) from multiple Payment-specific responses.
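A minimal sketch of this identity-revocation step (our toy numbers; responses are formed exactly as in Section 2.3):

```python
# Two Payments on the same Coin, with distinct challenges c1 != c2, leak mu:
# r1' - r2' = (c1 - c2)*mu*s and r1'' - r2'' = (c1 - c2)*s, so the ratio is mu.
q = 1019                          # toy group order (illustrative only)
mu, s = 345, 678                  # User secret and Coin blinding factor
x1, x2 = 111, 222                 # secret-splitting pair (x', x'')

def respond(c):                   # response to Payment challenge c
    return ((c * mu * s + x1) % q, (c * s + x2) % q)

(r1p, r1pp), (r2p, r2pp) = respond(41), respond(97)
mu_rec = (r1p - r2p) * pow(r1pp - r2pp, -1, q) % q   # modular inverse, Python >= 3.8
assert mu_rec == mu               # Bank recovers mu, hence identity I = mu*g'
```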
3 Review of Polynomial Secret-Sharing

Our scheme is based on the division of the Brands Coin into multiple sub-Coins via polynomial secret-sharing (SS). Initial setup requires selection of a configurable threshold (k) so that the secret—in our particular case the User-identity—can be split into an arbitrary number (n) of shares, with secret-reconstruction necessitating possession of k+1 (or more) shares. On the other hand, possession of k (or fewer) shares is cryptographically equivalent to knowing nothing about the divided secret, hence preserving User anonymity.

There are various encoding schemes for such divided secrets, with the most straightforward (due to Shamir) based on polynomial interpolation over a finite field. In such a scheme a k-th degree polynomial $f(x) = \sum_{i=0}^{k} b_i x^i \bmod q$ is used to encode secret f(0) = b_0, with the remaining k coefficients b_i ∈ Z_q (for i ≠ 0 and q prime) randomly generated and kept secret by the User. The Coin-specific polynomial is then recoverable via Lagrange interpolation given k+1 distinct (x, f(x)) coordinate pairs on the polynomial-defined curve, enabling polynomial reconstruction

$$f(x) = \sum_{i=0}^{k} f(x_i) \prod_{j=0,\, j \neq i}^{k} \frac{x - x_j}{x_i - x_j} \bmod q$$

and subsequent recovery of secret f(0).
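A compact sketch of Shamir splitting and Lagrange recovery at x = 0 (our illustration; the Mersenne prime modulus is an arbitrary choice):

```python
import random

q = 2**127 - 1   # a Mersenne prime used as the field modulus (illustrative)

def make_shares(secret, k, xs):
    """Degree-k polynomial with f(0) = secret; returns shares (x, f(x))."""
    coeffs = [secret] + [random.randrange(q) for _ in range(k)]
    f = lambda x: sum(b * pow(x, i, q) for i, b in enumerate(coeffs)) % q
    return [(x, f(x)) for x in xs]

def recover_secret(shares):
    """Lagrange interpolation at x = 0 from k+1 distinct shares."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if j != i:
                num = num * (-xj) % q          # numerator:   prod (0 - x_j)
                den = den * (xi - xj) % q      # denominator: prod (x_i - x_j)
        total = (total + yi * num * pow(den, -1, q)) % q
    return total

shares = make_shares(secret=123456789, k=3, xs=[1, 2, 3, 4, 5])
assert recover_secret(shares[:4]) == 123456789   # any k+1 shares suffice
```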
Polynomial-based SS is attractive due to the inherent threshold property preventing the manipulation of k (or fewer) (x, f(x)) shares for any useful information on the encoding polynomial. Shamir SS does, however, assume honest share allocation by the User, which is operationally unrealistic. The disclosed shares must therefore be verifiable, thereby allowing receivers to validate share association with the secret polynomial. This can be established via Pedersen verified SS (VSS), which represents both the secret and its shares in terms of EC/DL images, thereby resulting in ZK share-verification. The Pedersen formalism—expressed in terms of EC operations with respect to g ∈ E^b_{a,p}—requires prior disclosure of polynomial commitments E_i = b_i g, thereby allowing ZK polynomial evaluation via

$$f(x) \cdot g = \sum_{i=0}^{k} x^i E_i = \left( \sum_{i=0}^{k} b_i x^i \right) \cdot g$$

This enables a Merchant to verify the legitimacy of an individual share (sub-Coin) with respect to a Coin-specific secret, without acquiring cryptographically useful information on the Coin and consequently the User. This results in User anonymity preservation unless there are k+1 (or more) distinct sub-Coins (Payments) corresponding to a k-term Coin.

Polynomial SS is an elegant mechanism with which to incorporate Coin divisibility into the Brands digital cash protocol. Note this results in both divisibility and conditional-anonymity mechanisms being based on EC operations, in contrast to previous formulations with Coin divisibility based on QR/IF cryptography. Our approach also enables interpretation of Brands conditional-anonymity—which is revoked from two distinct challenge-response pairs—as a special case with linear (k=1) polynomials.
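The share-verification step can be sketched as follows (our illustration in a multiplicative toy group, using Feldman-style commitments E_i = b_i·g as referenced above; Pedersen VSS proper adds a second generator and blinding terms, omitted here). In the Poly-cash Payment of Section 4.2 the share index x is the non-interactive challenge c, so this is exactly the Merchant's sub-Coin check:

```python
# Share verification against public commitments E_i = g^{b_i}: the receiver
# checks g^{f(x)} == prod_i E_i^{x^i} without learning the coefficients b_i.
import random

p, q = 2039, 1019                       # toy safe-prime group, p = 2q + 1
g = 4                                   # a square mod p, hence of prime order q

k = 3
coeffs = [random.randrange(q) for _ in range(k + 1)]   # b_0 = secret, rest random
E = [pow(g, b, p) for b in coeffs]                     # public commitments E_i

def f(x):                                              # User-side share computation
    return sum(b * pow(x, i, q) for i, b in enumerate(coeffs)) % q

x = 7                                                  # share index
share = f(x)

# Receiver-side ZK check, using only public values (E, x, share):
rhs = 1
for i, Ei in enumerate(E):
    rhs = rhs * pow(Ei, pow(x, i, q), p) % p           # exponents reduced mod q
assert pow(g, share, p) == rhs
```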
4 Poly-Cash: k-Divisible Polynomial-Based Coins

Incorporation of polynomial divisibility requires extending the above-outlined Withdrawal, Payment and Deposit operations—for single-term Brands Coins—as follows:

4.1 Coin Withdrawal

Coin-specific polynomials f(x) (of degree k) can be used to encode User identity ie b_0 = µ, with the remaining coefficients randomly generated and kept secret by the User. k-term Coin-generation therefore entails generation of polynomial coefficients and their EC images ie (b_i, E_i (= b_i g″)) for i = 0…k, with integration into the Brands formulation via specification of Coin-parameter $B = h\left(\prod_{\forall i} E_i\right)$. This constitutes a User-to-Bank commitment on the Coin-specific polynomial which (if over-spent) can subsequently be used for secret-recovery. Withdrawal and Coin-generation then proceed as follows:
Table 4. Coin Withdrawal
1. Bank B: generate random w; compute Schnorr (a, b); send (a, b) → User.
2. User U: generate blinding (s, u, v); compute Coin A and (z′, a′, b′); generate secret-splitting (x′, x″); compute Coin B from E_i; compute Coin challenge (c′, c); send c → Bank.
3. Bank B: compute r = cx + w mod q; send r → User.
4. User U: verify rg = cy + a and r(I + g″) = cz + b; compute r′ = ru + v mod q,
with a heavier computation overhead (by a factor of k/2) for the computation of B. The first two steps are—in common with the base-formulation of Table 1—precomputable, the result of which is improved efficiency compared to Withdrawal of k distinct single-term Coins.

4.2 Sub-coin(s) Payment

Note the essential similarity of Tables 1 and 4, which differ only in the construction of the B component of Coin χ. The subsequently outlined Payment and Deposit, on the other hand, must now deal with sub-Coins—of value 1/k in relation to k-term Coin χ—as the fundamental transactional unit. The basic concept is to commit each sub-Coin Payment via disclosure of a Shamir polynomial share, which can then be ZK-verified (by the Merchant) using the Pedersen formalism. This necessitates User disclosure of Coin χ and polynomial-commitments E, both of which are verifiable via σ(A, B). A single share of the encoded secret is hence (c, r (= f(c))), which is verifiable via $r \cdot g'' = \sum_{i=0}^{k} c^i E_i$. Note that knowledge of polynomial-coefficients b is required for proper construction of verifiable responses, thereby restricting Payment to the legitimate User. Payment then proceeds as follows:
Table 5. Sub-Coin Payment
User U: compute sub-Coins (c, r)_j; send χ, E_i, (T, r)_j → Merchant.
Merchant M: verify Bank signature σ(A, B); verify Coin polynomial E_i; verify sub-Coin shares (c, r)_j,
and can be repeated to transfer n sub-Coins—for n = 1…k and with a relative value of n/k—via computation of n challenge-response pairs of form (c, r)_j for j = 1…n. Note the incorporation of non-interactive challenges c_j = h(A, B, I_m, T_j)—with I_m and T_j as previously specified—resulting in a simplified transactional framework. Payment using multiple sub-Coins divided from a k-term Coin (with reuse of all steps involving χ and E) is also more efficient than the alternative of several equally-denominated single-term Coins. Successful Payment results in Merchant possession of User-commitment on n sub-Coins of form χ′(A, B) = (χ, E_i, (c, r)_j), the Deposit of which is subsequently described. The computation complexity for a single sub-Coin only involves generation of the challenge-response pair, which requires k(k+1)/2 modular multiplications with k a small (by cryptographic standards) integer. An average case of k/2 sub-Coins per Payment would therefore result in an overhead of k²(k+1)/4 modular multiplications.
4.3 Sub-coin(s) Deposit

Digital cash protocols are specifically designed to preserve User-anonymity in the event of legitimate Coin usage, which in our formulation is specified by the Bank not having a priori knowledge of Coin details (A, B, E_i) and additionally by the number of sub-Coins being below the recovery threshold. Over-Payment associated with a k-term Coin (by a particular User) is detectable by the Bank via scrutiny of χ′(A, B) Deposit data for possible accumulation of k+1 (or more) distinct and verifiable shares. These (c, r)_j shares can then be combined (using Lagrange interpolation) by the Bank, resulting in knowledge of the Coin-specific f(x) and subsequently identification of the over-spending User via (µ, I) computation. The following Deposit process:-
Table 6. Sub-Coin Deposit
Merchant M: send χ′(A, B) → Bank.
Bank B: verify own signature σ(A, B); verify Coin commitments E_i; verify sub-Coins (c, r)_j; check for over-use of (A, B). If over-use is detected, then:
• recover Coin f(x)
• compute (µ, I) for identification
leads (barring detection of fraud) to Merchant account-credit by n/k of the value of Coin χ. Note the emphasis on a posteriori detection of User fraud; which contrasts with the in situ prevention of Merchant or third-party fraud, both of which are effectively ruled out by the necessity for verifiable sub-Coin commitment. The Brands protocol can actually be extended to include User-side Observer modules for User fraud prevention (as opposed to detection) at the expense of additional computation. The incorporation of VSS-based divisibility as outlined in this document is designed to ensure Coin-structural compatibility with respect to the Brands framework, thereby allowing for Observer-based fraud preventive measures.

The featured protocol does, however, allow the Bank to establish a posteriori sub-Coin (same Withdrawal, different Payments) level linkages, even when User anonymity is not revoked. This trait is also present in the Brands and Okamoto formulations, but not in the Chan et al protocol due to its elimination and downward-movement of the User Setup procedure into the Withdrawal operation. Such a strategy is somewhat at odds with one of the important motivations for divisible cash ie to reduce per-Payment overheads related to Withdrawal, which is particularly heavy due to the complexity of conditional anonymity enablement and User identity encoding into individual Coins. Note that our scheme can also be rendered unlinkable via use of Coin-specific identities, as in the Chan et al formulation. The most practical resolution of this issue is, however, operational in nature via periodic refreshment (via User Setup repetition) of identity vector (µ, I, z).
5 Comparative Analysis

We compare the performance of the above-outlined EC-based k-term scheme with equivalent-security [11] parameterisations of previously published protocols. This amounts to 160-bit moduli-lengths for our EC formulation being equivalent (at currently accepted equivalent security levels) to 1024-bit moduli-lengths for DL-based schemes ie Brands, Okamoto and Chan et al. Equivalent-security considerations also require parameter-length adjustments ie standardisation of Okamoto's (P, Q, N, n, b) to (512, 512, 1024, 1024, 128) and an upgrade of Chan-Frankel-Tsiounis' (P, Q, N, p, q) to (512, 512, 1024, 1024, 512).
Comparison of DL and EC overheads requires cross-calibration of the two most significant operations for each formalism ie DL multiplication (Mult) vs EC point addition (PA) and DL exponentiation (Exp) vs EC scalar multiplication (SM). The computational costs of these operations are analysed in [11, 12] and can be summarised as follows:
• Mult: lg²p
• Exp (via Repeated Squaring): lg q · lg²p
• PA: 3.5 Mults and one modular-inversion, which costs lg²p
• SM (via Addition-Subtraction): lg p doublings/PAs and (lg p)/2 PAs
Note the EC operations presume a shorter moduli-length. This leads to evaluation of the relative Mult-to-PA overhead [12] as approximately 9 at the equivalent setting of (DL, EC) = (1024, 160)-bit, with an increase to 23 for the more rigorous (2048, 200)-bit [11] setting. Similar analysis establishes an Exp-to-SM ratio [12] of approximately 6 at the regular-security setting, with an increase to 15 at the high-security setting. The relatively small increase in the EC moduli-lengths can be attributed to its exponential effect on computational security, as opposed to increases in DL/IF moduli-lengths resulting in much less dramatic sub-exponential enhancements.

Our analysis examines the computation, storage and communications overheads in terms of the leading-order significant operations (mostly Exp and SM); and presumes pre-computation whenever allowed by the respective protocols. We also adjust for the continuous divisibility of QR-tree based Coins via assignment of an average height h (= lg k). The real-time overheads of the (1) Okamoto, (2) Chan et al, (3) Brands and (4) Poly-cash schemes are summarised as follows:
Table 7. Computation, Storage and Bandwidth Overheads of Various Schemes

              Scheme   Computation           Storage          Communications
  User Setup  1, 2     (lg P · P/4) Exp(P)   2|N|+|P|+|Q|     lg P · P/4
              3        2 Exp(p)              |p|              |p|
              4        2 SM(p)               |p|              |p|
  Withdrawal  1        2 Exp(N)              2|N|+|P|+|Q|     2|N|
              2        12 Exp(N)             2|p|+|q|+2|P|    3|p|+2|q|
              3        6 Exp(p)              5|p|+|q|         2|p|+2|q|
              4        6 SM(p)               6|p|             4|p|
  Payment     1        (h+1) Exp(N)          |N|·(h+1)        |N|·(h+1)
              2        (h+1) Exp(N)          |N|·(h+1)        |N|·(h+1)
              3        2 Mult(q)             3|q|             3|q|
              4        k(k+1) Mult(p)        k|p|             2k|p|
with respective Coin bit-lengths of:
1. 3|N|+|b|+|P|+|Q| (= 4224)
2. 2|p|+|q|+2|P| (= 4636)
3. 5|p|+|q| (= 960)
4. 6|p| (= 960) for decimal (k = 10) divisibility
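As a cross-check, the DL-to-EC calibration ratios quoted above can be reproduced from the listed per-operation costs (our sketch; it assumes 160-bit and 200-bit exponents for the DL exponentiations, matching the respective subgroup sizes):

```python
# Mult-to-PA and Exp-to-SM ratios from the per-operation bit-complexities:
# PA = 3.5 Mults plus one inversion of cost lg^2 p; SM = lg p doublings/PAs
# plus (lg p)/2 PAs, i.e. 1.5 * lg p PAs in total.
def mult(p_bits): return p_bits ** 2                    # modular multiplication
def exp_(q_bits, p_bits): return q_bits * mult(p_bits)  # repeated squaring
def pa(p_bits): return 4.5 * mult(p_bits)               # 3.5 Mults + inversion
def sm(p_bits): return 1.5 * p_bits * pa(p_bits)        # addition-subtraction chain

for dl, ec in [(1024, 160), (2048, 200)]:
    print(f"(DL, EC) = ({dl}, {ec}):",
          f"Mult/PA = {mult(dl) / pa(ec):.1f},",   # ~9 and ~23
          f"Exp/SM = {exp_(ec, dl) / sm(ec):.1f}") # ~6 and ~15
```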
Poly-cash can be seen to have the most efficient Setup and Withdrawal processes, with the most striking comparison being with respect to Okamoto and Chan et al. This illustrates the usefulness of an EC-compatible divisibility mechanism, as opposed to the alternative of tree-node sub-Coins based on QR/IF cryptography. The Setup and Withdrawal processes are, in fact, essentially as efficient as an EC transcription of the Brands protocol, with negligible additional overheads arising from Coin divisibility. Note also the relatively modest sub-kbit size of the k-term Coin, thereby facilitating implementation of Coin-carrying Purses on lightweight handheld platforms.

Divisible Coins are specifically intended to amortise a particular Withdrawal overhead over multiple Payments, hence the most interesting analysis is from that viewpoint. Such schemes are predicated on the efficiency of sub-Coin verification, which was the motivation for divisibility based on QR-trees. The intrinsic logarithmic efficiency and continuous divisibility of formulations based on QR-trees will eventually prevail for large divisibility (k) configurations. EC-based Poly-cash is linearly efficient with respect to k; and is therefore expected to be advantageous for small values of k due to the previously discussed DL-to-EC operational ratios. We found our formulation to result in more efficient Payments—at least for practical divisibilities ie k~10—at the moderate (DL, EC) = (1024, 160)-bit equivalency setting. The computation and communications overheads for increasing k are illustrated in Fig. 2.
Fig. 2. (a) Computation and (b) communications overheads at moderate equivalent-security configurations
Note from Fig. 2(a) that the equal-efficiency (crossover) point is not even on the graph, hence the lower computational overheads of Poly-cash even for fairly extreme divisibility settings. The crossover point in Fig. 2(b), on the other hand, occurs around k~16, which is still desirable as it exceeds the practical k=10 configuration with dime-sized sub-Coins. Comparison of communications overheads at the (2048, 200)-bit high-security equivalency setting (Fig. 3) right-shifts the cross-over point to k~31, thereby demonstrating the relative efficiency of k=20 divisibility with nickel-sized sub-Coins.

Fig. 3. Communications Overheads at High Equivalent-Security Settings
6 Concluding Remarks

Polynomial thresholding is an elegant Coin-divisibility mechanism, and enables an efficient EC realisation impossible with the earlier QR-tree methodology. The combination of polynomial divisibility, an extended Brands operational framework and an EC implementation results in significant performance advantages (at equivalent-security parameterisations) compared to previous protocols. The flexible k-term divisibility is also preferable to the more rigid binarised denominations of the QR-tree formalism, especially with respect to straightforward integration into existing trading and financial frameworks. This and other digital cash protocols constitute viable alternatives to existing electronic payment systems, which typically require expensive Merchant-Bank connectivity during Payment. Offline Coin verifiability also lowers the operational overhead of Bank-side account management. User conditional anonymity also facilitates consumer privacy (without jeopardising fraud detectability) to an extent impossible with typical payment solutions.
References
1. D Chaum, A Fiat & M Naor. "Untraceable Electronic Cash". Crypto 88, Springer-Verlag Lecture Notes in Comp Sc (LNCS) (1988)
2. S Brands. "Untraceable Off-Line Cash in Wallets with Observers". Crypto 93, Springer-Verlag LNCS (1993)
3. S Brands. "An Efficient Off-line Electronic Cash System based on the Representation Problem". Tech Rep CS-R9323, CWI (1993)
4. T Okamoto & K Ohta. "Universal Electronic Cash". Crypto 91, Springer-Verlag LNCS (1991)
5. T Okamoto. "An Efficient Divisible Electronic Cash Scheme". Crypto 95, Springer-Verlag LNCS (1995)
6. A Chan, Y Frankel & Y Tsiounis. "Easy Come - Easy Go Divisible Cash". EuroCrypt 98, Springer-Verlag LNCS (1998)
7. N Ferguson. "Extensions of Single-Term Coins". Crypto 93, Springer-Verlag LNCS 773, pp 292-301 (1993)
8. A Shamir. "How to Share a Secret". ACM Comms (1979)
9. P Feldman. "A Practical Scheme for Non-Interactive VSS". IEEE Symp Foundations Comp Sc (1987)
10. TP Pedersen. "Non-Interactive and Information-Theoretic Secure VSS". Crypto 91, Springer-Verlag LNCS (1991)
11. A Menezes. "Comparing the Security of ECC and RSA", at www.certicom.com, 11 Jan (2000)
12. IEEE P1363/D13, Draft Version 13, Standard Specification for PKC (1999)
13. WK Yip. "Divisible Digital Cash via Secret Sharing Schemes". Masters in Comp Sc Thesis, Universiti Sains Malaysia (2001)
Selective Encryption of the JPEG2000 Bitstream Roland Norcen and Andreas Uhl Department of Scientific Computing, Salzburg University, Jakob-Haringer-Str. 2, A-5020 Salzburg, Austria
Abstract. In this paper, we propose partial encryption of JPEG2000 coded image data using an AES symmetric block cipher. We find that encrypting 20% of the visual data is sufficient to provide a high level of confidentiality. This percentage of encryption also provides security against replacement attacks, which is discussed at length.
Introduction

Images and videos (often denoted as visual data) are data types which require enormous storage capacity or transmission bandwidth due to the large amount of data involved. In order to provide reasonable execution performance for encrypting such large amounts of data, only symmetric encryption (as opposed to public key cryptography) can be used. The Advanced Encryption Standard AES [3] is a recent symmetric block cipher potentially used in such applications. Nevertheless, real-time encryption of an entire video stream using such symmetric ciphers requires much computation time due to the large amounts of data involved. Since many multimedia applications require security on a low level (e.g. TV broadcasting [6]) or should protect their data just for a short period of time (e.g. news broadcast), faster encryption procedures specifically tailored to the target environment should be designed for such multimedia security applications.

A good step in this direction is the introduction of a "soft" encryption scheme. Such a scheme does not strive for maximum security and trades off security for computational complexity. Selective or partial encryption (SE) of visual data is an example for such an approach. Here, application specific data structures are exploited to create more efficient encryption systems (see e.g. SE of MPEG video streams [1,5,10,11,13,19], of wavelet-based encoded imagery [2,4,8,9,18,19], and of quadtree decomposed images [2]). Consequently, SE only protects (i.e. encrypts) the visually most important parts of an image or video representation, relying on a secure but slow "classical" cipher.

In this work we discuss the selective encryption of JPEG2000 coded image data using a symmetric AES cipher. Section 1 introduces the JPEG2000 algorithm with a special focus on the bitstream assembling part, since this is the starting point for our selective encryption approach. Then, in section 2, our partial encryption scheme of the JPEG2000 bitstream is presented and the obtained results are discussed in detail with emphasis on the achieved security.
Fig. 1. Testimages used in the experiments: (a) Lena (256 × 256 and 512 × 512 pixels); (b) Angiogram (512 × 512 pixels)
In Fig. 1 we display the testimages which we use in our experiments. We have decided to discuss the encryption efficiency of our proposed partial encryption scheme with respect to two different classes of visual image data. The first class of visual data discussed is typical still image data and the testimage representing this class is the lena image (see Fig. 1.a) at different resolutions including 256 × 256 and 512 × 512 pixels. Since this special type of visual data is usually encoded in lossy mode, the lena image is lossy coded in our experiments (at a fixed rate of 2 bpp). The second type of digital visual data should represent an application class where lossless coding is important. We have therefore decided to use an angiogram as testimage in this case (see figure 1.b), since angiograms represent an important class of medical image data where lossless coding is usually a mandatory requirement. In order to make the explanations and experiments of the proposed techniques simpler, we assume the testimages to be given in 8 bit/pixel (bpp) precision and in a squared format. Extensions to images of different acquisition types (e.g. [17]), higher bitdepth or non-squared format are straightforward.
1 JPEG2000
Image compression methods that use wavelet transforms [16] (which are based on multiresolution analysis – MRA) have been successful in providing high compression ratios while maintaining good image quality. Therefore, they have replaced
DCT based techniques in recent standards for still image coding: JPEG2000 [15] and VTC (visual texture coding in MPEG-4 [12]). The JPEG2000 image coding standard is based on a scheme originally proposed by Taubman and known as EBCOT ("Embedded Block Coding with Optimized Truncation" [14]). The major difference from previously proposed wavelet-based image compression algorithms such as EZW or SPIHT (see [16]) is that, after performing a global wavelet transform, EBCOT as well as JPEG2000 operate on independent, non-overlapping blocks of transform coefficients which are coded in several bit layers to create an embedded, scalable bitstream. Instead of zerotrees, the JPEG2000 scheme depends on a per-block quad-tree structure since the strictly independent block coding strategy precludes structures across subbands or even code-blocks. These independent code-blocks are passed down the "coding pipeline" shown in Fig. 2 and generate separate bitstreams. The wavelet coefficients inside a code-block are processed from the most significant bitplane towards the least significant. Furthermore, in each bitplane the bits are scanned in a maximum of three passes called coding passes. Finally, during each coding pass, the scanned bits with their context value are sent to a context-based adaptive arithmetic encoder that generates the code-block's bitstream. This procedure is called Tier-1 encoding.
[Figure: source image → wavelet transform → entropy coding pipeline in several stages (quantization, ROI scaling, arithmetic coding, ...) → rate allocation → coded image; the entropy coding stage is inherently parallel on independent code-blocks]
Fig. 2. JPEG2000 coding pipeline
The rate-distortion optimal merging of these bitstreams into the final one is based on a sophisticated optimization strategy and is called Tier-2 encoding. This last procedure carries out the creation of the so-called layers which roughly stand for successive qualities at which a compressed image can be optimally reconstructed. These layers are built in a rate-allocation process that collects, in each code block, a certain number of coding-passes codewords. Hence, in a code-block, the bitstream is distributed into a certain number of layers. The final JPEG2000 bitstream is organized as follows: A set of different main headers (including a main header (SIZ), a coding style header (COD), a quantization header (QCD), a comments header (COM), a start of a tile parts header (SOT)) is followed by packets of data which are all preceded by a packet header. In each packet appear the codewords of the code-blocks that belong to the same image resolution and layer; the header identifies the data. Depending on the arrangement of the packets, different progression orders may be specified. Among others, resolution and layer progressive are most important for grayscale
images. In layer progression order, the packets corresponding to the first layer are arranged first and cover data contained in all resolutions, followed by packets corresponding to the second layer and so on. Vice versa, in resolution progression order the packets corresponding to the first resolution level are arranged first (these contain data of all layers).
2 Selective Encryption of JPEG2000 Coded Data
For selectively encrypting the JPEG2000 bitstream we have two general options. First, we do not care about the structure of the bitstream and simply encrypt a part, e.g. the first 10%, of the bitstream. In this case, the main header and a couple of packets including packet header and packet data are encrypted. Since basic information necessary for reconstruction (usually located in the main header) is not available at the decoder, encrypted data of this type cannot be reconstructed using a JPEG2000 decoder. Although this seems to be desirable at first sight, an attacker could reconstruct the missing header data using the unencrypted parts, and, additionally, no control over the quality of the remaining unencrypted data is possible. Therefore, the second option is to design a JPEG2000 bitstream format compliant encryption scheme which does not encrypt main and packet headers but only packet data. This is what we propose in the following.

In order to achieve format compliance, we need to access and encrypt data of single packets. Since the aim is to operate directly on the bitstream without any decoding, we need to discriminate packet data from packet headers in the bitstream. This can be achieved by using two special JPEG2000 optional markers which were originally defined to achieve transcoding capability, i.e. manipulation of the bitstream to a certain extent without the need to decode data. Additionally, these markers of course increase error resilience of the bitstream. These markers are "start of packet marker" (SOP - 0xFF91) and "end of packet marker" (EPH - 0xFF92). The packet header is located between SOP and EPH; packet data finally may be found between EPH and the subsequent SOP. For example, using the official JAVA JPEG2000 reference implementation (JJ2000 available at http://jj2000.epfl.ch) the usage of these markers may be easily invoked by the options -Peph on -Psop on.

Having identified the bitstream segments which should be subjected to encryption, we note that packet data is of variable size and does not at all adhere to multiples of a block cipher's block-size. We have to employ AES in CFB mode for encryption, since in this mode an arbitrary number of data bits can be encrypted, which is not offered by the ECB and CBC encryption modes. Information about the exact specification of the cryptographic techniques used (e.g. key exchange) may be inserted into the JPEG2000 bitstream taking advantage of so-called termination markers. Parts of the bitstream bounded by termination markers are automatically ignored during bitstream processing and do not interfere with the decoding process. Note that a JPEG2000 bitstream which is selectively encrypted in the described way is fully compliant to the standard
and can therefore be decoded by any codec which adheres to the JPEG2000 specification.

We want to investigate whether resolution progressive order or layer progressive order is more appropriate for selective JPEG2000 bitstream encryption. We therefore arrange the packet data in either of the two progression orders, encrypt an increasing number of packet data bytes, reconstruct the images and measure the corresponding quality.
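The marker-guided, format-compliant packet-data encryption described above can be sketched as follows (our illustration, assuming the pycryptodome package; it presumes a codestream generated with -Psop on -Peph on, and ignores practicalities such as marker-emulating ciphertext bytes and tile boundaries):

```python
from Crypto.Cipher import AES   # pycryptodome

SOP, EPH, EOC = b"\xff\x91", b"\xff\x92", b"\xff\xd9"

def encrypt_packet_data(codestream: bytes, key: bytes, iv: bytes) -> bytes:
    """Encrypt only packet bodies: the bytes between each EPH marker and the
    next SOP marker (or the end-of-codestream marker EOC). All headers stay
    intact, so the result remains parseable by a standard JPEG2000 decoder."""
    # One continuous CFB keystream across all packet bodies; segment_size=8
    # gives byte granularity, matching the variable packet-data lengths.
    cipher = AES.new(key, AES.MODE_CFB, iv=iv, segment_size=8)
    out = bytearray(codestream)
    pos = codestream.find(EPH)
    while pos != -1:
        start = pos + len(EPH)                 # first packet-data byte
        end = codestream.find(SOP, start)
        if end == -1:
            end = codestream.find(EOC, start)  # last packet runs up to EOC
        if end == -1:
            end = len(codestream)
        out[start:end] = cipher.encrypt(codestream[start:end])
        pos = codestream.find(EPH, end)
    return bytes(out)
```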
[Plot: PSNR vs no. first bytes coded, for res and layer progression]
Fig. 3. Angiogram: Comparison of selective encryption (PSNR of reconstructed images) using resolution or layer progressive encoding - part 1.
[Plot: PSNR vs first bytes replaced, for res and layer progression]
Fig. 4. Comparison of selective encryption (PSNR of reconstructed images) using resolution or layer progressive encoding - part 2: (a) Lena 256 × 256 pixels; (b) Lena 512 × 512 pixels.
Resolution progression is more suited for selectively encrypting the angiogram image at higher rates of encrypted data (see Fig. 3, where a lower PSNR means that it is more suited for selective encryption). In order to relate the obtained numerical values to visual appearance, two reconstructed versions of the angiogram, corresponding to the two progression orders, are displayed in Fig. 5. In both cases, 1% of the entire packet data has been encrypted. Whereas no details are visible using layer progression (Fig. 5.a at 8.79 dB), only very high frequency visual information (right lower corner) is visible using resolution progression (Fig. 5.b at 7.45 dB).

When considering the Lena image in Fig. 4, we observe that resolution progression shows superior PSNR results for both tested image dimensions as compared to layer progression. Two reconstructed versions of the Lena image with 512 × 512 pixels, corresponding to the two progression orders, are displayed in Fig. 6. In each case, 1% of the entire packet data has been encrypted. Whereas only very high frequency information is visible in the reconstructed image using layer progression (Fig. 6.a at 8.51 dB), important visual features are visible using resolution progression (Fig. 6.b at 10.11 dB). In this case, the visible high frequency information is enough to reveal sensitive data. At 2 percent encrypted packet data, this information is fully destroyed in the resolution progressive case. The lena image at lower resolution (256 × 256 pixels) behaves similarly, and the results are therefore only given for the 512 × 512 pixels version.
Fig. 5. Comparison of selective encryption (visual quality of reconstructed Angiogram where 1% of the bitstream data have been encrypted) using resolution or layer progressive encoding: (a) layer progressive, 8.79 dB; (b) resolution progressive, 7.45 dB.
Please note also the difference in coarseness of the noise pattern resulting from encryption between resolution and layer progression.
Fig. 6. Comparison of selective encryption (visual quality of reconstructed Lena (512 pixels) where 1% of the bitstream data have been encrypted) using resolution or layer progressive encoding: (a) layer progressive, 8.51 dB; (b) resolution progressive, 10.11 dB.
Since in resolution progression data corresponding to the higher levels of the wavelet transform is encrypted, the noise introduced by the cipher is propagated by the repeated inverse transform and thereby magnified, resulting in a much coarser pattern as compared to layer progression. When summarizing the obtained numerical and visual results, it seems that encrypting 1-2% of the packet data in layer progressive mode is sufficient to provide confidentiality for the JPEG2000 bitstream. This is of course a very surprising result.
3 Security Evaluation
We want to assess the security of the presented selective encryption scheme by conducting a simple ciphertext-only attack. To this end, an attacker would replace the encrypted parts of the bitstream by artificial data mimicking typical images ("replacement attack", see also [7]). This attack is usually performed by replacing encrypted data by some constant bits (i.e. in selective bitplane encryption). In encrypting the JPEG2000-bitstream, this attack does not have the desired effect, since bitstream values are arithmetically decoded and the corresponding model depends on earlier results, so the replacement corrupts the subsequently required states. Therefore, the reconstruction result is a noise-like pattern similar to that obtained by directly reconstructing the encrypted bitstream.

We exploit a built-in error resilience functionality in JJ2000 to simulate a bitstream-based replacement attack. An error resilience segmentation symbol can be inserted in the codewords at the end of each bit-plane. Decoders can use this information to detect and conceal errors. This method is invoked in JJ2000 encoding using the option -Cseg_symbol on. If an error is detected during decoding (which is of course the case if data is encrypted) it means that the bit stream contains some erroneous bits that have led to the decoding of incorrect data. This data affects the whole last decoded bit-plane. Subsequently, the affected data is concealed and no more passes are decoded for this code-block's bit stream. The concealment resets the state of the decoded data to what it was before the decoding of the affected bit-plane started. Therefore, the encrypted packets are simply ignored during decoding.

Using this technique, we again compare selective JPEG2000 encryption using resolution and layer progressive mode by reconstructing images with a different amount of encrypted packets. Decoding is done using error concealment. In Figs. 7 and 8 we immediately recognize that the PSNR values are significantly higher as compared to directly reconstructed images (see Figs. 3 and 4). Layer progression is more suited for selectively encrypting the angiogram image. For the lena test images, the situation differs slightly: when encrypting only minor parts of the overall bitstream, layer progression is superior; at higher rates of encryption, the resolution progression scheme shows superior results.
[Plot: PSNR vs first bytes replaced, for res and layer progression]
Fig. 7. Angiogram: PSNR of reconstructed images after replacement attack using resolution or layer progressive encoding - part 1.
Again, the numerical values have to be related to visual inspection. Fig. 9.a shows a reconstruction of the selectively compressed angiogram image, where the first 1% of the packets in resolution progressive mode have been encrypted and the reconstruction is done using the error concealment technique. In this case, this leads to a PSNR value of 10.51 dB, whereas the directly reconstructed image has a value of 7.45 dB (see Fig. 5.b). The text in the right corner is clearly readable and even the structure of the blood vessels is exhibited. The Lena image performs similarly (see Fig. 10.a); all important visual features are reconstructed
at 1% encrypted. Here, we have a resulting PSNR of about 11.31 dB, whereas the directly reconstructed image has a value of 10.11 dB (see Fig. 6.b). When increasing the percentage of encrypted packet data steadily, we finally arrive at 20% of the packet data encrypted, where neither useful visual nor textual information remains in the image (see Figs. 9.b and 10.b). This result is also confirmed with other images, including other angiograms and other still
[Plot: PSNR vs first bytes replaced, for res and layer progression]
Fig. 8. PSNR of reconstructed images after replacement attack using resolution or layer progressive encoding - part 2: (a) Lena 256 × 256 pixels; (b) Lena 512 × 512 pixels.
Fig. 9. Visual quality of reconstructed Angiogram after replacement attack using resolution encoding: (a) 1% encrypted, 10.51 dB; (b) 20% encrypted, 9.90 dB.
Fig. 10. Visual quality of reconstructed Lena (512 pixels) after replacement attack using resolution encoding: (a) 1% encrypted, 11.31 dB; (b) 20% encrypted, 9.77 dB.
images and can be used as a rule of thumb for a secure use of selective encryption of the JPEG2000 bitstream.
4 Conclusion
A computationally efficient technique for the confidential storage and transmission of digital image data has been discussed. In detail, we propose a partial encryption technique based on AES where parts of the JPEG2000 bitstream are encrypted. The percentage of data subjected to encryption while maintaining high confidentiality is significantly reduced as compared to full encryption; the encryption of 20% of the data already delivers a satisfactory secure result. This is due to the fact that important visual features are concentrated at the beginning of the embedded JPEG2000 bitstream and may therefore be protected effectively.
References
1. A. M. Alattar, G. I. Al-Regib, and S. A. Al-Semari. Improved selective encryption techniques for secure transmission of MPEG video bit-streams. In Proceedings of the 1999 IEEE International Conference on Image Processing (ICIP'99). IEEE Signal Processing Society, 1999.
2. H. Cheng and X. Li. Partial encryption of compressed images and videos. IEEE Transactions on Signal Processing, 48(8):2439–2451, 2000.
3. J. Daemen and V. Rijmen. The Design of Rijndael: AES - the advanced encryption standard. Springer Verlag, 2002.
4. Raphaël Grosbois, Pierre Gerbelot, and Touradj Ebrahimi. Authentication and access control in the JPEG 2000 compressed domain. In A.G. Tescher, editor, Applications of Digital Image Processing XXIV, volume 4472 of Proceedings of SPIE, San Diego, CA, USA, July 2001.
5. Thomas Kunkelmann. Applying encryption to video communication. In Proceedings of the Multimedia and Security Workshop at ACM Multimedia '98, pages 41–47, Bristol, England, September 1998.
6. Benoit M. Macq and Jean-Jacques Quisquater. Cryptology for digital TV broadcasting. Proceedings of the IEEE, 83(6):944–957, June 1995.
7. M. Podesser, H.-P. Schmidt, and A. Uhl. Selective bitplane encryption for secure transmission of image data in mobile environments. In CD-ROM Proceedings of the 5th IEEE Nordic Signal Processing Symposium (NORSIG 2002), Tromso-Trondheim, Norway, October 2002. IEEE Norway Section. file cr1037.pdf.
8. A. Pommer and A. Uhl. Wavelet packet methods for multimedia compression and encryption. In Proceedings of the 2001 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, pages 1–4, Victoria, Canada, August 2001. IEEE Signal Processing Society.
9. A. Pommer and A. Uhl. Selective encryption of wavelet packet subband structures for obscured transmission of visual data. In Proceedings of the 3rd IEEE Benelux Signal Processing Symposium (SPS 2002), pages 25–28, Leuven, Belgium, March 2002. IEEE Benelux Signal Processing Chapter.
10. Lintian Qiao and Klara Nahrstedt. Comparison of MPEG encryption algorithms. International Journal on Computers and Graphics (Special Issue on Data Security in Image Communication and Networks), 22(3):437–444, 1998.
11. C. Shi and B. Bhargava. A fast MPEG video encryption algorithm. In Proceedings of the ACM Multimedia 1998, pages 81–88, Boston, USA, 1998.
12. Iraj Sodagar, H. J. Lee, P. Hatrack, and Ya-Qin Zhang. Scalable wavelet coding for synthetic/natural hybrid coding. IEEE Transactions on Circuits and Systems for Video Technology, 9(2):244–254, 1999.
13. L. Tang. Methods for encrypting and decrypting MPEG video data efficiently. In Proceedings of the ACM Multimedia 1996, pages 219–229, Boston, USA, November 1996.
14. D. Taubman. High performance scalable image compression with EBCOT. IEEE Transactions on Image Processing, 9(7):1158–1170, 2000.
15. D. Taubman and M.W. Marcellin. JPEG2000 — Image Compression Fundamentals, Standards and Practice. Kluwer Academic Publishers, 2002.
16. P.N. Topiwala, editor. Wavelet Image and Video Compression. Kluwer Academic Publishers Group, Boston, 1998.
17. P. Trunk, B. Gersak, and R. Trobec. Topical cardiac cooling - computer simulation of myocardial temperature changes. Computers in Biology and Medicine, 2003. To appear in this issue.
18. T. Uehara, R. Safavi-Naini, and P. Ogunbona. Securing wavelet compression with random permutations. In Proceedings of the 2000 IEEE Pacific Rim Conference on Multimedia, pages 332–335, Sydney, December 2000. IEEE Signal Processing Society.
19. Wenjun Zeng and Shawmin Lei. Efficient frequency domain video scrambling for content access control. In Proceedings of ACM Multimedia 1999, pages 285–293, Orlando, FL, USA, November 1999.
Robust Spatial Data Hiding for Color Images* Xiaoqiang Li, Xiangyang Xue, and Wei Li Department of Computer Science and Engineering, Fudan University, Shanghai 200433, China [email protected]
Abstract. In paper [1], we presented a new watermarking technique in the spatial domain expressly devised for color images, which is based on transmit diversity techniques. We also showed that the robustness of our new algorithm is better than that of a single color channel. However, we did not compare the robustness of our algorithm with the luminance channel. In this paper, we describe the revised scheme of algorithm [1] in detail, and give experimental results that demonstrate the unmistakable advantage of our new approach over algorithms operating on not only a single color channel, but also the luminance channel. All experimental results prove that our scheme is more robust than any other single channel scheme in the spatial domain for color images.
1 Introduction

The Internet and the World Wide Web have revolutionized the way in which digital data is distributed. The wide spread and easy access to multimedia content has motivated development of technologies for data hiding, with emphasis on access control, authentication, and copyright protection. Much of the recent work in data hiding is about copyright protection of multimedia data. This is also referred to as digital watermarking.

In the field of image watermarking, a great deal of research has been carried out for mainly two targets. The first is to enlarge the maximum number of information bits that can be hidden in a host image invisibly. The second is to improve the robustness of the watermark. However, there is a trade-off between the capacity of the information embedded in a given image and the robustness of the watermarking system.

Previous research was mostly focused on grayscale image watermarking. For color images, researchers usually embed information in the luminance channel [2,3], or in a single color channel [4,5]. Alternative approaches to color image watermarking have been advanced by Fleet and Heeger [6], who suggest embedding the watermark into the yellow-blue channel of the opponent-color representation of color images. The aim of their scheme is resisting printing and rescanning. Kutter et al in [7,8] embed the watermark by modifying a selected set of pixel values in the blue channel, since the human eye is less sensitive to changes in this band. However, the

* This work was supported in part by NSF of China under contract number 60003017, China 863 Projects under contract numbers 2001AA114120 and 2002AA103065.
They didn't take this phenomenon into consideration. Barni et al [9] proposed a robust watermarking scheme for color images; its detection is based on a global correlation measure which is computed by taking into account the information conveyed by the three color channels as well as their interdependency, but it is only used to decide whether the watermark exists or not. Multi-bit robust watermarking for images remains a topic of research.

In [1], we developed a new multi-bit watermarking algorithm for color images based on transmit diversity techniques. In that scheme, a color image is considered as three independent communication channels to meet the assumption of the transmit diversity technique. We detect the transmitted information from the combination of three channels. A single color channel may suffer severe fading in many cases. However, by using transmit diversity we can provide redundant replicas of information to the extractor. Compared with single channel systems, our scheme has better performance on capacity and on robustness to unintentional attacks, such as JPEG lossy compression.

In this paper, we revise the watermarking algorithm of [1] in two aspects. The interleaver is replaced by shuffling in the embedding algorithm. In the extraction algorithm a simple method accumulating the energy of three channels is used. By this method, we can achieve a capacity much higher than that of the scheme in [5]. Extensive experiments show that the watermarking scheme we present excels not only the single color channel algorithm, but also the luminance channel algorithm.

This paper is organized as follows. In the next section we give a representation of the transmit diversity technique and combine this technique with watermarking. The embedding and extraction algorithms are described in Section 3. In Section 4, we prove by plentiful experiments that transmit diversity techniques in our scheme can make notable gains in both capacity and robustness. In Section 5, we conclude that all these properties indicate that our scheme can be employed in video watermarking and covert communication [11].
2 Transmit Diversity and Watermarking

Any single communication channel suffers fading caused by various channel noises. As a result, the receiver must have a redundant replica of the transmitted signal for reliable communication. In wireless communications, transmit diversity techniques have been proposed for providing diversity gain for higher voice and data rates. By jointly designing the forward error correction coding, modulation, transmit diversity, and optional receiver diversity scheme, one can boost the throughput of band-limited wireless channels [12,13].

In a spatial spread spectrum watermarking scheme, a narrow-band signal (the watermark) has to be transmitted in a wide-band channel with noise (the image or video signal). Conventional algorithms just embed information into a single channel: one of the three color channels (always the blue one) or the luminance channel. According to the assumption in transmit diversity techniques, the RGB tones of color images should be considered as independent and slowly fading channels. Modulated by different pseudo-random sequences and shuffled respectively, the same watermark in different forms is embedded into the three channels. We employ
three receivers that can receive replicas of the same transmitted information through independent fading paths. Even if a particular path is severely faded, we are still able to recover a reliable estimate of the transmitted symbols through other propagation paths.
Fig. 1. Watermarking scheme based on the theory of transmit diversity techniques
Transmit diversity techniques fall into three main categories: information feedback-assisted schemes, feedforward or training assisted arrangements, and blind schemes [13]. As an image is a non-stationary pseudo-signal, it is hard to find an appropriate channel model to describe its characteristics. We also find that the performance of the blue channel is not always better than that of the other two for all color images. So we consider the images as unknown channels with unstructured interference. After demodulating the watermark from the three paths respectively, we decide the information bits by simply accumulating the energy of the three channels. Figure 1 shows the spatial watermarking in conjunction with three propagation paths.
3 Data Hiding in Spatial Domain

Based on the transmit diversity principle, the new color image watermarking scheme, embedding the same watermark in three color channels, improves the decoding performance. To better hide the watermark from the human observer, we use a human visual system (HVS) model in the spatial domain to ensure the watermark is invisible.
Fig. 2. Watermark embedding algorithm illustration
3.1 Watermark Embedding Algorithm

The embedding algorithm is shown in Figure 2. Given a color image I, the R, G, B channels are separated and denoted as IR, IG, IB. A Hilbert scan is adopted and the two-dimensional pixel sequence is reordered into a one-dimensional sequence for each channel. The watermark w is represented by the binary antipodal vector b. It is shuffled in different ways to obtain two new antipodal vectors, denoted by cR and cG respectively. Including the unshuffled information sequence itself, we get three antipodal vectors: cR, cG, cB. These vectors are then expanded into three new vectors: c'R, c'G, c'B. Then c'R, c'G, c'B are modulated into the vectors pR·c'R, pG·c'G, pB·c'B by three different pseudo-random antipodal sequences pR, pG, pB produced by a pseudo-random-sequence (PRS) generator, which are controlled by three secret keys KR, KG, KB for security purposes. To preserve invisibility, we use a human visual system (HVS) analyzer to control the watermark strength. It calculates the Just Noticeable Distortion (JND) masks JR, JG, JB for each color channel of the host image respectively. Multiplied by the JND, each modulated vector is embedded into the corresponding color channel according to the following rule:

I'i = Ii + Ji·pi·c'i,   (1)

where i ∈ {R, G, B}. The watermarked color image I' is obtained by reassembling I'R, I'G, I'B. To maximize the admissible amplitude of the unperceivable embedded information, we use the HVS model in [5] and calculate the JND mask of each color channel directly in the spatial domain. This algorithm comprises three aspects: texture and edge analysis, edge separation and reclassification, and luminance sensitivity analysis. First, each channel of the original image is divided into blocks of 8×8 pixels. The JND matrix is then computed for each 8×8 block as follows:

Ji(x, y) = li(x, y) + difi(x, y),

where i ∈ {R, G, B}. For each channel, li(x, y) represents the additional noise threshold and difi(x, y) represents the basic noise threshold of the block it belongs to. Each JND mask has the same size as the original image, and is denoted as JR, JG, JB.
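As an illustration of embedding rule (1), the following minimal sketch applies the rule to a single channel that has already been reordered by the Hilbert scan. It assumes numpy, and all function and variable names are ours rather than taken from the authors' implementation.

```python
import numpy as np

def embed_channel(I, J, p, c):
    # Embedding rule (1) for one channel: I'_i = I_i + J_i * p_i * c'_i.
    # I: 1-D pixel sequence (Hilbert order), J: JND mask in the same order,
    # p: pseudo-random antipodal sequence (+1/-1), c: expanded antipodal
    # watermark vector (+1/-1). All arrays must have equal length.
    return I + J * p * c

# Toy usage: 16 pixels, flat JND of strength 2, key-seeded PRS.
rng = np.random.default_rng(seed=7)   # the seed plays the role of a secret key
I = rng.uniform(0.0, 255.0, 16)
J = np.full(16, 2.0)
p = rng.choice([-1.0, 1.0], 16)
c = np.tile([1.0, -1.0], 8)
I_marked = embed_channel(I, J, p, c)
```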
Fig. 3. Watermark extraction algorithm illustration (the watermarked channels I'R, I'G, I'B are window-accumulated into s'R, s'G, s'B, deshuffled into sR, sG, sB, and decoded into the extracted information b')
3.2 Shuffling Algorithm

There are three advantages to using shuffling in our scheme. First, shuffling the watermark in different ways better meets the assumption of the transmit diversity technique than the original scheme [1]. Second, after shuffling, each information bit is scattered over different areas of the different color channels. The same information bit may thus be embedded both in flat regions and in texture regions, and we can make use of the texture regions to recover more bits correctly in the extraction process. Third, shuffling also enhances security, since the shuffling table or a key for generating the table is needed to correctly extract the hidden data. A key k = (k0, k1) is chosen by the copyright owner, where k0 is an arbitrary integer, and k1 is an integer within the interval [N/3, 2N/3] that is prime to N. Define

f(i) = (k0 + i·k1) mod N,  i = 0, 1, ..., N − 1.   (2)

Clearly, a one-to-one mapping between i and f(i) exists. In the extraction procedure, we can derive i from f(i) by applying the inverse mapping to f(i). Define

i = (f(i) − k0)·k2 mod N,   (3)

where k2 satisfies the equation

k2·k1 ≡ 1 (mod N).   (4)
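The shuffling can be illustrated with a short sketch. It assumes the reconstructed form of (2), f(i) = (k0 + i·k1) mod N, which is consistent with the inverse mapping (3)–(4); all names are illustrative.

```python
def shuffle_index(i, k0, k1, N):
    # Forward mapping (2): f(i) = (k0 + i*k1) mod N.
    # One-to-one on 0..N-1 because k1 is prime to N.
    return (k0 + i * k1) % N

def unshuffle_index(fi, k0, k1, N):
    # Inverse mapping (3): i = (f(i) - k0) * k2 mod N,
    # where k2 * k1 = 1 (mod N) as in equation (4).
    k2 = pow(k1, -1, N)  # modular inverse (Python 3.8+)
    return ((fi - k0) * k2) % N

# Sanity check on a toy size; k1 lies in [N/3, 2N/3] and is prime to N.
N, k0, k1 = 100, 17, 37
assert all(unshuffle_index(shuffle_index(i, k0, k1, N), k0, k1, N) == i
           for i in range(N))
```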
3.3 Watermark Extraction Algorithm
The extraction algorithm is shown in Figure 3. For a given watermarked color image I', the channels I'R, I'G, I'B are first separated. The Hilbert scan is used again to transfer the two-dimensional sequences into one-dimensional sequences. Then we calculate the window energy s'i,k [1] for each channel. Before deciding the extracted information bits, we synchronize the three channels by deshuffling s'R, s'G, s'B into sR, sG, sB. For computational simplicity, we use the following simple method ("Meth") to decide the information bits: the values sR,k, sG,k, sB,k are accumulated for each bit, and the sum is denoted by sk. After getting sk, we use a hard decision to estimate the message according to the sign of sk, namely:
– if the sign of sk is positive, the extracted information bit is 1;
– if the sign of sk is negative, the extracted information bit is 0.
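A minimal sketch of this hard-decision step, assuming numpy and per-bit window-energy statistics that are already deshuffled; names are illustrative.

```python
import numpy as np

def decide_bits(sR, sG, sB):
    # "Meth": accumulate the per-bit statistics of the three channels and
    # take a hard decision on the sign of the sum.
    s = np.asarray(sR) + np.asarray(sG) + np.asarray(sB)
    return (s > 0).astype(int)  # positive sum -> bit 1, negative sum -> bit 0

# Toy usage: the blue channel is reliable while red/green are noisy.
print(decide_bits([0.1, -0.2], [-0.05, 0.1], [0.9, -1.1]))  # [1 0]
```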
4 Experimental Results

For brevity, we give only some of the experimental results here. Please note that the watermarks in the five schemes are identical in all aspects except for the channels in which they are embedded. We have tested the proposed algorithm on images with various content complexity, e.g., "Lena", "Peppers", "Baboon", and "F16". All these images are 512×512×24-bit color images. The image editing and attacking tools adopted in the experiments are StirMark 4.0 and Photoshop 6.0.

4.1 Capacity Analysis
Watermarking based on direct-sequence spread spectrum in the spatial domain is a high-payload technique, but in practice we have to trade off bandwidth against capacity. As transmit diversity can improve the throughput of a band-limited wireless channel, it may operate similarly on the watermark channel. Experiments show that no single color channel performs consistently well. So, following the transmit diversity principle, we model the three color tones of an RGB image as three independent fading channels and embed the watermark in the three channels simultaneously. Extensive experiments show that this method best integrates the merits of each channel. When embedding the same amount of information, the BER of the new algorithm is always lower than that of the single-channel schemes (see Figure 4) for all four images. In Huang's [10] experiment, 192 information bits are embedded and watermark extraction needs the original unwatermarked image. Our experimental results show that the blind watermarking algorithm proposed in this paper is able to hide 512 bits without error. Du [5] mainly studied four kinds of convolutional code with different constraint lengths by comparing their BER performance with that of the uncoded scheme in a single channel. Combining convolutional codes with the proposed algorithm, the capacity can be improved further.

4.2 Robustness Test
To test the robustness against noise attacks, we distorted the watermarked "Peppers" with different amounts of Gaussian noise; the peak signal-to-noise ratio (PSNR) corresponding to each noise amount is listed in Table 1. If the noise amount is 2%, the BER when using the red, green, and blue channels separately is 2.53%, 4.88%, and 1.17%, respectively. From Figure 5, we can see that the BER of our algorithm is only 2% when the noise amount equals 20%, but it is more than 7% for any single channel. Especially when the noise amount is below 10%, our scheme keeps the BER at almost zero. The experimental results show that our algorithm is considerably more robust than any single channel.
Table 1. The amount of Gaussian noise (in percentage) vs. PSNR (dB) in each channel for the watermarked image "Peppers"

Noise (%)    2      4      6      8      10     12     14     16     18     20
PSNR R      38.73  32.77  29.13  26.87  24.52  22.76  21.78  20.88  19.97  19.05
PSNR G      38.69  32.74  29.16  26.96  24.66  22.94  22.06  21.13  20.25  19.30
PSNR B      38.68  32.79  29.21  27.03  24.72  22.98  22.07  21.16  20.15  19.23
Using transmit diversity techniques, we greatly improve the robustness to unintentional distortions such as JPEG compression. In our experiments, we fix the hiding rate at 1 bit per 1024 pixels. It can be seen in Figure 6 that our scheme is more robust to JPEG compression than the single-channel schemes. We test the resistance to cropping by cutting off the borders of the watermarked images, and the experimental results in Table 2 show that the new scheme performs much better than the single-channel schemes. When the watermark is transmitted through a single channel, information bits in the regions that are cut off cannot be extracted correctly, but we can recover these bits from the replicas transmitted in the other channels. In addition, due to shuffling, the information bits of the cropped region in a certain channel survive in the other channels, because there they are embedded in other regions which in most cases are not cropped. Our scheme can also resist low-pass filtering to a certain degree. We have tested the robustness against a median filter with a 3×3 window. We find that our scheme shows a great advantage in robustness to low-pass filtering when only one color channel is filtered. Experimental results are shown in Table 3, where Bf means that the blue channel is filtered.

Table 2. BER (%) of five schemes for the cropped watermarked image "Peppers" with 512 information bits

Embedded channel   Cut-off borders (% of image cropped)
                   5%     10%    15%    20%    25%
R                  5.85   9.76   13.47  13.86  21.87
G                  6.05   9.17   13.67  13.47  24.20
B                  2.14   5.27   9.17   9.17   16.70
Lum                5.85   8.65   11.58  14.04  16.79
Meth               0      0.19   0.39   0.58   0.97
5 Conclusion

In this paper, we have described the revised version of the watermarking algorithm [1] in detail and given experimental results that demonstrate the clear advantage of our new approach over algorithms operating not only on a single color channel, but also on the
luminance channel. The experimental results also show that our scheme is more robust than any other single-channel spatial scheme for color images. The algorithm needs little extraction time, so it can be applied to video watermarking. Its robustness to various attacks also makes it suitable for broadcast monitoring and covert communication. The proposed algorithm is a blind data hiding scheme: information bits can be extracted without the original image. However, it is fragile to geometric attacks, so our further work focuses on designing a method to successfully resist geometric-based attacks.
Fig. 4. Capacity test for images "Baboon", "F16", "Lena", and "Peppers": detection BER (%) for watermarked images with different amounts of information bits.
Fig. 5. Empirical BER with respect to the amount of noise (in percentage), with PSNR listed in Table 1, for "Peppers"
Fig. 6. Empirical BER with respect to JPEG compression with different quality factors for "Peppers"
Table 3. BER (%) of five schemes after 3×3 median filtering for "Peppers" (Rf/Gf/Bf denotes the filtered channel)

Watermark (bits)   Meth (Bf)   Lum (Bf)   R (Rf)   G (Gf)   B (Bf)
256                0           0.21       5.85     3.15     3.52
512                0.39        3.13       23.04    22.41    16.99
1024               0.66        4.63       22.55    22.41    22.24
References

1. X.Q. Li, Z.Y. Du, et al., "Multi-channel Data Hiding Scheme for Color Images", in Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC 2003), Apr. 28, 2003.
2. W. Bender, D. Gruhl, et al., "Techniques for Data Hiding", IBM Systems Journal, vol. 35, no. 3&4, pp. 313–336, 1996.
3. G.C. Langelaar, J.C.A. van der Lubbe, and R.L. Lagendijk, "Robust Labeling Methods for Copy Protection of Images", in Storage and Retrieval for Image and Video Databases V, I.K. Sethin and R.C. Jain, Eds., San Jose, CA: IST and SPIE, Feb. 1997, vol. 3022, pp. 298–309.
4. N. Nikolaidis and I. Pitas, "Robust Image Watermarking in the Spatial Domain", Signal Processing, vol. 66, no. 3, pp. 385–403, May 1998.
5. Z.Y. Du, Y. Zhou, and P.Z. Lu, "An Optimized Spatial Data Hiding Scheme Combined with Convolutional Codes and Hilbert Scan", PCM 2002.
6. D. Fleet and D. Heeger, "Embedding Invisible Information in Color Images", in Proc. IEEE Int. Conf. Image Processing '97, Santa Barbara, CA, October 26–29, 1997, vol. 1, pp. 532–535.
7. M. Kutter, F. Jordan, and F. Bossen, "Digital Signature of Color Images Using Amplitude Modulation", in Storage and Retrieval for Image and Video Databases V, SPIE vol. 3022, San Jose, CA, February 8–14, 1997, pp. 518–526.
8. M. Kutter and S. Winkler, "A Vision-based Masking Model for Spread-spectrum Image Watermarking", IEEE Trans. on Image Processing, vol. 11, no. 1, Jan. 2002.
9. M. Barni, F. Bartolini, and A. Piva, "Multichannel Watermarking of Color Images", IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 3, pp. 142–156, Mar. 2002.
10. J. Huang and Y.Q. Shi, "Reliable Information Bit Hiding", IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 10, pp. 916–920, Oct. 2002.
11. I.J. Cox, M.L. Miller, and J.A. Bloom, "Watermarking Applications and Their Properties", in Proceedings of Int. Conf. on Information Technology 2000, Las Vegas, 2000.
12. V. Tarokh, N. Seshadri, and A.R. Calderbank, "Space-time Codes for High Data Rate Wireless Communication: Performance Criterion and Code Construction", IEEE Transactions on Information Theory, vol. 44, no. 2, Mar. 1998.
13. T.H. Liew and L. Hanzo, "Space-time Block Codes and Concatenated Channel Codes for Wireless Communications", Proceedings of the IEEE, vol. 90, no. 2, pp. 744–765, Feb. 2002.
Watermark Security via Secret Wavelet Packet Subband Structures

Werner Dietl and Andreas Uhl

Dept. of Scientific Computing, University of Salzburg, Jakob-Haringer-Str. 2, A-5020 Salzburg, Austria
{wdietl,uhl}@cosy.sbg.ac.at
Abstract. Wavelet packet decompositions generalize the classical pyramidal wavelet structure. We use the vast amount of possible wavelet packet decomposition structures to create a secret wavelet domain and discuss how this idea can be used to improve the security of watermarking schemes. Two methods to create random wavelet packet trees are discussed and analyzed. The security against unauthorized detection is investigated. Using JPEG and JPEG2000 compression we assess the normalized correlation and Peak Signal to Noise Ratio (PSNR) behavior of the watermarks. We conclude that the proposed systems show improved robustness against compression and provide around 2^1046 possible keys. The security against unauthorized detection is greatly improved.
1 Introduction
Fast and easy distribution of content over the Internet is a serious threat to the revenue stream of content owners. Watermarking has gained high popularity as a method to protect intellectual property rights on the Internet. For introductions to this topic see [1,2,3,4,5]. Over the last several years wavelet analysis has been developed as a new method to analyze signals [6,7,8]. Wavelet analysis is also used in image compression, where better energy compaction, the multi-resolution analysis and many other features make it superior to the existing discrete-cosine based systems like JPEG. The new JPEG2000 compression standard [9,10] uses the wavelet transformation and achieves higher compression rates with less perceptible artifacts and other advanced features. With the rising interest in wavelets, the watermarking community also started to use them. Many watermarking algorithms have been developed that embed the watermark in the wavelet transform domain; Meerwald [11] compiled an overview. The pyramidal wavelet decomposition recursively decomposes only the approximation subband. The wavelet packet decomposition does not have this limitation and allows further decomposition of all subbands. There are special algorithms to select the best decomposition for a specific input. For an introduction to wavelet packets and the best basis algorithm see [7].
Wavelet packets have not received much attention in the watermarking community yet. Wang [12] uses one non-standard decomposition to embed a watermark sequence in the middle frequencies of an image. The algorithm by Tsai [13] uses wavelet packets, but the selection is not specified and no experimental results are provided. One interesting approach is used by Vehel in [14], where the wavelet decomposition structure itself is used as the watermark sequence. In previous work, the following techniques to enhance the security of watermarks have been proposed. Pseudo-random skipping of coefficients has been proposed by Wang [15] and Kundur [16], but skipping significant coefficients reduces the capacity of the systems. Fridrich [17] introduced the concept of key-dependent basis functions in order to protect a watermark from hostile attacks. By embedding the watermark information in a secret transform domain, Fridrich's algorithm can better withstand attacks such as those described by Kalker [18] employing a public watermark detector device. However, Fridrich's approach suffers from the computational complexity and the storage requirements for generating numerous orthogonal patterns of the size of the host image. In a later paper Fridrich reduced the computational complexity of the system [19]. Parametrized wavelet filters were proposed in [20,21] to improve the security of wavelet-based watermarking systems. In this paper we propose to embed the watermark sequence using a secret wavelet decomposition and to use the decomposition structure as the embedding key. This protects wavelet-based watermarks against unauthorized detection: the watermark should only be detectable using the specific wavelet decomposition that was used for embedding. Section 2 describes the details of our proposed system. Then, in Sections 3 and 4, we assess the security against unauthorized detection and the behavior under JPEG and JPEG2000 compression. We finish the paper with the conclusions in Section 5.
2 Proposed Method
Our system is based on the Wang algorithm proposed in [15]. In that paper the authors already suggest keeping the wavelet decomposition structure secret, but no further details or experimental results are provided. The basic system design is shown in Fig. 1. For the forward wavelet transformation we use a secret wavelet packet tree and then embed the watermark in the generated wavelet coefficients. After embedding the watermark we apply the inverse transformation using the same wavelet packet tree to generate the watermarked image. The Wang algorithm embeds the watermark sequence based on Successive Subband Quantization (SSQ), which has been developed by the same authors [22,23] and is used in the Multi-Threshold Wavelet Codec (MTWC). Within a selected subband, all unselected coefficients Cs(x, y) that are larger than a threshold Ts are used to embed a watermark element Wk according to

Cs,k(x, y) = Cs(x, y) + αs βs Ts Wk.   (1)
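A minimal sketch of rule (1) for one subband; the coefficient selection by SSQ is assumed to have been done already, and all names are ours rather than taken from the MTWC code.

```python
def wang_embed(coeffs, T, W, alpha, beta):
    # Rule (1): C_{s,k}(x, y) = C_s(x, y) + alpha_s * beta_s * T_s * W_k,
    # applied to the significant coefficients of one subband (|C| > T_s).
    return [c + alpha * beta * T * w for c, w in zip(coeffs, W)]

# Toy usage: three selected coefficients, antipodal watermark elements.
marked = wang_embed([12.0, -9.5, 14.2], T=8.0, W=[1, -1, 1],
                    alpha=0.1, beta=0.5)
```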
Fig. 1. Basic system design (forward DWT of the host image with a secret wavelet packet tree, watermark embedding, inverse DWT with the same tree yielding the watermarked image)
The two factors αs and βs are used to determine the embedding strength of the algorithm. The wavelet packet tree is generated by a random process that depends on a secret seed number. In the following we will also call this seed number simply the key or the tree number. Two tree numbers that are close together do not necessarily generate similar trees. We select a tree number and create a random wavelet packet tree using that number. This tree number is kept secret and is later needed to extract the watermark. When it is time to detect the watermark we use the secret tree number to generate the same wavelet packet tree again and extract the watermark sequence. Then we apply a normalized correlation calculation to the embedded and the extracted sequences and determine the likelihood that a watermark was embedded. There is a vast number of possible wavelet packet trees. According to [24], for a decomposition with n+1 levels there are

f(n) = Σ_{i=0}^{4} C(4, i) · (f(n − 1))^i   (2)

possible trees (f(0) = 1). For 4 decomposition levels this results in around 2^65 trees and for 7 levels around 2^4185 trees are possible. For all decompositions we use the standard Biorthogonal 7/9 filter.
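The recursion (2) is easy to evaluate exactly with arbitrary-precision integers; the following sketch (plain Python, names ours) reproduces the tree counts quoted above.

```python
from math import comb

def num_trees(n):
    # Recursion (2): f(n) = sum_{i=0}^{4} C(4, i) * f(n-1)^i, f(0) = 1,
    # which collapses to f(n) = (1 + f(n-1))^4 by the binomial theorem.
    f = 1
    for _ in range(n):
        f = sum(comb(4, i) * f**i for i in range(5))
    return f

# A decomposition with n+1 levels allows f(n) trees.
print(num_trees(3).bit_length())  # 66, i.e. around 2^65 trees for 4 levels
print(num_trees(6).bit_length())  # 4185, i.e. around 2^4185 trees for 7 levels
```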
2.1 Generation of Tree Decompositions
We have three parameters that influence the tree decomposition: the tree number, the number of decomposition levels and the decomposition method. We implemented two methods to randomly construct a tree. The first method randomly decomposes the subbands and the second one puts more focus on decomposing middle frequencies.
Random Tree Decomposition 1. For this method we initialise a random number generator with the tree number as seed and then use a 50% probability for each subband to decide whether it should be further decomposed or not. This decomposition strategy gives us the full range of possible decomposition trees, but could also result in trees that are generally not good for watermark embedding. For example, if the generated tree only applies decompositions to the detail subbands on all levels, then we are likely to embed the watermark in a high-frequency domain which is more sensitive to image compression.

Random Tree Decomposition 2. We developed this decomposition specifically for our watermarking system. The focus is on building a wavelet tree that has a good resolution of the middle and low frequencies, which are best suited for watermark embedding. No decomposition of the three detail subbands on the first level (HL1, LH1 and HH1) is performed; only the first approximation subband is further decomposed. Therefore, more emphasis is put on decomposing the middle frequencies. Using this decomposition strategy we basically lose all the trees that are below the three top-level detail subbands. Therefore we have only around 83521 trees for 4 levels, but still around 2^1046 trees for 7 levels. Both strategies are sketched in the code below.
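A sketch of the two strategies, assuming Python's random module; the paper does not specify the exact probability distribution used below the first level in method 2, so this sketch simply reuses the 50% rule there, and all names are illustrative.

```python
import random

def random_wp_tree(seed, levels, method=1):
    # A leaf is None; an internal node is a list [LL, HL, LH, HH].
    # The tree number acts as seed, so the same number always
    # regenerates the same wavelet packet tree.
    rng = random.Random(seed)

    def build(depth):
        # 50% probability of leaving this subband undecomposed (method 1)
        if depth == 0 or rng.random() < 0.5:
            return None
        return [build(depth - 1) for _ in range(4)]

    if method == 1:
        return build(levels)
    # Method 2: always split the root, keep HL1/LH1/HH1 as leaves and
    # recurse only into the approximation subband LL1.
    return [build(levels - 1), None, None, None]

tree = random_wp_tree(150000, levels=7, method=2)
```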
2.2 Embedding Variations
From the security analysis we learn that common subtrees can occur and can result in high correlation even for wrong tree numbers (see Section 3, Figs. 2 and 3). To protect against this problem we implemented two embedding variations that add another dependency on the tree number. Two trees can then have a common subtree, but through the embedding variation there are still enough differences between the two watermarks to make the system secure. Both embedding variations can also be used with the standard watermarking system. Instead of using the tree number again as seed for the embedding variation we could use another number and use it as an additional key element, but to limit the complexity of our analysis we simply reuse the tree number for the embedding variations.

Variation 1 — Tree-Dependent Coefficient Skipping. This first variation skips a part of the selected significant coefficients, as proposed by Wang [15]. We use 95% of the coefficients that are selected. The disadvantage of coefficient skipping could be reduced robustness to compression and reduced capacity. We expect that using 95 percent of the coefficients results in very good robustness and does not limit the capacity too severely.

Variation 2 — Tree-Dependent Watermark Shuffling. The second variation creates a permutation of the watermark sequence before we embed it into the wavelet coefficients. Depending on the tree number we shuffle the elements
of the watermark and then embed them into the selected wavelet coefficients. This variation should not have an influence on the robustness or capacity of the watermark, because we select the same coefficients for embedding.
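Both variations are deterministic functions of the tree number, as in this sketch (Python's random module assumed; the exact generator used by the authors is not specified, and all names are ours).

```python
import random

def variation_1_skip(coeff_indices, tree_number, keep=0.95):
    # Tree-dependent coefficient skipping: keep 95% of the selected
    # significant coefficients, chosen deterministically from the tree number.
    rng = random.Random(tree_number)
    return [i for i in coeff_indices if rng.random() < keep]

def variation_2_shuffle(watermark, tree_number):
    # Tree-dependent watermark shuffling: permute the watermark sequence
    # before embedding; the same coefficients are used, so robustness and
    # capacity are unchanged.
    rng = random.Random(tree_number)
    w = list(watermark)
    rng.shuffle(w)
    return w
```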
3 Security Assessment
For the security assessment we embed a 1000-element watermark with 40 dB PSNR into the Lena image. We use the tree that is generated by the tree number 150000 for embedding and then use a set of keys with which we try to detect the watermark. The set starts at 100000 and goes up to 200000 in increments of 50, giving 2001 measurements. We also compare the behavior for 4 and 7 decomposition levels. Besides showing the effect of using the wrong key to extract the watermark, we also look at the effect of using the wrong variation. When we embed the watermark with one of the variations we only want to be able to successfully extract the watermark with that variation. The detection statistic, sketched below, is the normalized correlation between the embedded and the extracted sequences.
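A sketch of the normalized correlation statistic (numpy assumed, names ours):

```python
import numpy as np

def normalized_correlation(w_embedded, w_extracted):
    # Correlation is close to 1 only when the correct tree number (and,
    # if used, the correct embedding variation) regenerates the sequence.
    a = np.asarray(w_embedded, dtype=float)
    b = np.asarray(w_extracted, dtype=float)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

# Toy usage with antipodal 1000-element sequences.
w_right = np.random.default_rng(150000).choice([-1.0, 1.0], 1000)
w_wrong = np.random.default_rng(100000).choice([-1.0, 1.0], 1000)
print(normalized_correlation(w_right, w_right))  # 1.0
print(normalized_correlation(w_right, w_wrong))  # close to 0
```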
Fig. 2. Security assessment with 7 levels and without embedding variation: (a) decomposition 1, (b) decomposition 2 (correlation vs. tree number)
Fig. 2 compares the response for decompositions 1 and 2 with 7 decomposition levels without an embedding variation. For decomposition 1 there is one clear peak at 150000 and low correlation for all other tree numbers. For decomposition 2, however, there is one peak at 150000 and also three other tree numbers with more than 60 percent correlation. There are also many other tree numbers with a correlation of around 0.20. Because for decomposition 2 we do not allow decompositions at the top level and also have a different probability distribution at the lower levels, we have more common subtrees than in decomposition 1. This leads to more common sequences in different trees and therefore to higher correlation for the wrong
Fig. 3. Security assessment with 4 levels and without embedding variation: (a) decomposition 1, (b) decomposition 2 (correlation vs. tree number)
extraction parameters. If two trees are very similar this will lead to the high correlation we see at the three additional peaks in Fig. 2(b). In Fig. 3 we compare the behavior if only 4 decomposition levels are used. Here we see that decomposition 1 can also create many common subtrees that lead to high correlation for wrong tree numbers. For decomposition 2 there are many trees with a correlation of around 0.20. The reason for this exact behavior is unknown at the moment.
Fig. 4. Effect of skipping some coefficients with decomposition 1 and 7 levels: (a) correct extraction, (b) incorrect extraction (correlation vs. tree number)
These results were the reason why we introduced the two embedding variations. Different coefficients should be modified, or the same coefficients should be modified in a different way, even if common subtrees occur. We implemented the two tree-number-dependent embedding variations described earlier to add
this feature. Decomposition 1 with a 7-level decomposition has a lower likelihood of common subtrees and can be used without a variation. But decomposition 2 with 7 levels, and both decompositions with only 4 levels, need one or both of the variations to protect against common subtrees. In Fig. 4 we see the effect of embedding variation 1 (skipping some coefficients) in combination with decomposition 1 with 7 levels. There is again one clear peak and the correlation for wrong tree numbers is further decreased. In Fig. 4(b) we see what happens when we do not use variation 1 for watermark extraction. There is low correlation for all tree numbers and the watermark is not found.
Fig. 5. Effect of variation 1 on decomposition 2 with 7 levels: (a) correct extraction, (b) incorrect extraction (correlation vs. tree number)
In Fig. 5 we see the effect of variation 1 on decomposition strategy 2 with 7 levels. There is only one peak for the correct tree number and the correlation for the wrong trees is reduced. The introduction of the embedding variations makes decomposition 2 a usable system. Fig. 5(b) shows the result when the wrong extraction variation is used. Again there is very low correlation for all tree numbers and the correct tree number cannot be found. This shows that the embedding variations can also be used to further increase the security of the watermark.
4 Quality Assessment
For the quality assessment we embed a watermark with 40 dB PSNR and compress the watermarked image with JPEG and JPEG2000. We then try to detect the watermark in the compressed image and measure the correlation. As a measure of the distortion we use the PSNR of the compressed image.
To get a good variation of different trees we use tree numbers from 100000 to 200000 with increments of 400. From all those 251 measurements we calculate the average, maximum and minimum correlation and PSNR.
Fig. 6. Comparison of the proposed systems with the standard systems (pyramidal Biorthogonal 7/9 and pyramidal Daubechies 6) under JPEG2000 compression: (a) correlation, (b) PSNR, both vs. JPEG2000 rate
Fig. 7. Comparison of the proposed systems with the standard systems (pyramidal Biorthogonal 7/9 and pyramidal Daubechies 6) under JPEG compression: (a) correlation, (b) PSNR, both vs. JPEG quality
We compare tree decompositions 1 and 2 and expect that tree decomposition 2 will have better results for higher compression rates. The results of the wavelet packet methods are compared with the standard Wang watermarking system using the Daubechies 6 and the Biorthogonal 7/9 filters.
With more subband decompositions we expect that it will be possible to embed longer watermark sequences compared to the pyramidal decomposition. To see whether this is true we analyze the image quality with watermark lengths 1000, 5000 and 20000. Figs. 6 and 7 show the compression behavior of the two different wavelet packet decompositions in comparison with the standard system with the Biorthogonal 7/9 and the Daubechies 6 filters. In Fig. 6(a) we see the correlation behavior under JPEG2000 compression. The performance of decomposition 2 is superior to all other systems and the wavelet packet system with decomposition 1 is also better than the two standard systems.
Fig. 8. Details for tree decomposition 2: (a) JPEG2000 correlation, (b) JPEG correlation (average, minimum and maximum)
Fig. 7(a) shows the JPEG compression results. In this case the difference is smaller than under JPEG2000 compression. The decomposition 1 system performs slightly better than decomposition 2, and both wavelet packet systems are above the standard decompositions. The results for the PSNR performance are shown in Figs. 6(b) and 7(b). All systems behave very similarly without a significant difference; therefore we will not show further PSNR results. Fig. 8 gives a closer look at the results for decomposition 2. This diagram shows the average, minimum and maximum correlation behavior under compression. The minimum correlation behavior is still very good and the average is closer to the maximum. Figs. 9 and 10 compare the performance of decompositions 1 and 2 with 7 decomposition levels for watermarks of length 1000, 5000 and 20000. We see that under JPEG2000 compression decomposition 2 is the better system for all watermark lengths. The advantage of decomposition 2 gets bigger for longer watermarks. Fig. 10 shows the results under JPEG compression. For a watermark
of length 1000 decomposition 1 has a slight advantage, but for lengths 5000 and 20000 decomposition 2 is clearly the better system. In comparison to the pyramidal decomposition the wavelet packet systems clearly have a higher robustness to compression.
Fig. 9. Correlation comparison under JPEG2000 compression (watermark lengths 1000, 5000 and 20000 for decompositions 1 and 2 and the pyramidal Biorthogonal 7/9)
Fig. 10. Correlation comparison under JPEG compression (watermark lengths 1000, 5000 and 20000 for decompositions 1 and 2 and the pyramidal Biorthogonal 7/9)
Decomposition 1 with 7 levels shows good security properties even without an embedding variation and good robustness for a 1000-element watermark. For longer watermark lengths decomposition 2 has significantly better robustness results, but one of the embedding variations should be used to guard against common subtrees.
5 Conclusions
In this paper we described how the wavelet packet decomposition can be used to enhance the security of wavelet-based watermarking systems. We use the Wang coefficient selection method and propose two methods for generating random trees. The first method uses a 50% probability of decomposition for all subbands. The second method does not decompose the detail subbands on the first level and puts more emphasis on decomposing in the low and middle frequency range. The seed for the random number generator is used as key and is kept secret. For 7 decomposition levels we have 2^4185 or 2^1046 possible tree decompositions for the first and the second decomposition method, respectively. We also introduced two methods to protect against common subtrees, which can result in higher correlation even for the wrong tree number. Both the security and the quality assessment show that the wavelet packet systems perform better than the standard pyramidal system. Decomposition 1 can be used without an embedding variation for 1000-element watermarks. Decomposition 2 has higher robustness for long watermarks and clearly outperforms the pyramidal decomposition. Overall we conclude that a random tree decomposition that focuses on the low and middle frequency range and uses either coefficient skipping or watermark shuffling results in a robust and secure watermarking system.

Acknowledgments. Parts of this work were funded by the Austrian Science Fund FWF projects P15170 "Sicherheit für Bilddaten in Waveletdarstellung" and P13732 "Objekt-basierte Bild- und Videokompression mit Wavelets".
References

1. Katzenbeisser, S., Petitcolas, F.A.P.: Information Hiding Techniques for Steganography and Digital Watermarking. Artech House (1999)
2. Dittmann, J., ed.: Digitale Wasserzeichen: Grundlagen, Verfahren, Anwendungsgebiete. Springer Verlag (2000)
3. Johnson, N.F., Duric, Z., Jajodia, S.: Information Hiding: Steganography and Watermarking - Attacks and Countermeasures. Kluwer Academic Publishers (2000)
4. Cox, I.J., Miller, M.L., Bloom, J.A.: Digital Watermarking. Morgan Kaufmann (2002)
5. Eggers, J.J., Girod, B.: Informed Watermarking. Kluwer Academic Publishers (2002)
6. Daubechies, I.: Ten Lectures on Wavelets. Number 61 in CBMS-NSF Series in Applied Mathematics. SIAM Press, Philadelphia, PA, USA (1992)
7. Wickerhauser, M.: Adapted Wavelet Analysis from Theory to Software. A.K. Peters, Wellesley, Mass. (1994)
8. Mallat, S.: A Wavelet Tour of Signal Processing. Academic Press (1997)
9. ISO/IEC JPEG committee: JPEG 2000 image coding system — ISO/IEC 15444-1:2000 (2000)
10. Taubman, D., Marcellin, M.: JPEG2000 — Image Compression Fundamentals, Standards and Practice. Kluwer Academic Publishers (2002)
11. Meerwald, P., Uhl, A.: A survey of wavelet-domain watermarking algorithms. In Wong, P.W., Delp, E.J., eds.: Proceedings of SPIE, Electronic Imaging, Security and Watermarking of Multimedia Contents III. Volume 4314., San Jose, CA, USA, SPIE (2001)
12. Wang, Y., Doherty, J.F., Van Dyck, R.E.: A wavelet-based watermarking algorithm for copyright protection of digital images. IEEE Transactions on Image Processing 11 (2002) 77–88
13. Tsai, M.J., Yu, K.Y., Chen, Y.Z.: Wavelet packet and adaptive spatial transformation of watermark for digital image authentication. In: Proceedings of the IEEE International Conference on Image Processing, ICIP '00, Vancouver, Canada (2000)
14. Levy-Vehel, J., Manoury, A.: Wavelet packet based digital watermarking. In: Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain (2000)
15. Wang, H.J., Kuo, C.C.J.: Watermark design for embedded wavelet image codec. In: Proceedings of the SPIE's 43rd Annual Meeting, Applications of Digital Image Processing. Volume 3460., San Diego, CA, USA (1998) 388–398
16. Kundur, D.: Improved digital watermarking through diversity and attack characterization. In: Proceedings of the ACM Workshop on Multimedia Security '99, Orlando, FL, USA (1999) 53–58
17. Fridrich, J., Baldoza, A.C., Simard, R.J.: Robust digital watermarking based on key-dependent basis functions. In Aucsmith, D., ed.: Information hiding: second international workshop. Volume 1525 of Lecture Notes in Computer Science., Portland, OR, USA, Springer Verlag, Berlin, Germany (1998) 143–157
18. Kalker, T., Linnartz, J.P., Depovere, G., Maes, M.: On the reliability of detecting electronic watermarks in digital images. In: Proceedings of the 9th European Signal Processing Conference, EUSIPCO '98, Island of Rhodes, Greece (1998) 13–16
19. Fridrich, J.: Key-dependent random image transforms and their applications in image watermarking. In: Proceedings of the 1999 International Conference on Imaging Science, Systems, and Technology, CISST '99, Las Vegas, NV, USA (1999) 237–243
20. Meerwald, P., Uhl, A.: Watermark security via wavelet filter parametrization. In: Proceedings of the IEEE International Conference on Image Processing (ICIP'01). Volume 3., Thessaloniki, Greece, IEEE Signal Processing Society (2001) 1027–1030
21. Dietl, W., Meerwald, P., Uhl, A.: Key-dependent pyramidal wavelet domains for secure watermark embedding. In Delp, E.J., Wong, P.W., eds.: Proceedings of SPIE, Electronic Imaging, Security and Watermarking of Multimedia Contents V. Volume 5020., Santa Clara, CA, USA, SPIE (2003)
22. Wang, H.J., Bao, Y.L., Kuo, C.C.J., Chen, H.: Multi-threshold wavelet codec (MTWC). Technical report, Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA, Geneva, Switzerland (1998)
23. Wang, H.J., Kuo, C.C.J.: High fidelity image compression with multithreshold wavelet coding (MTWC). In: SPIE's Annual Meeting - Application of Digital Image Processing XX, San Diego, CA, USA (1997)
24. Pommer, A., Uhl, A.: Selective encryption of wavelet packet subband structures for secure transmission of visual data. In Dittmann, J., Fridrich, J., Wohlmacher, P., eds.: Multimedia and Security Workshop, ACM Multimedia, Juan-les-Pins, France (2002) 67–70
A Robust Audio Watermarking Scheme Based on MPEG 1 Layer 3 Compression

David Megías, Jordi Herrera-Joancomartí, and Julià Minguillón

Estudis d'Informàtica i Multimèdia, Universitat Oberta de Catalunya, Av. Tibidabo 39–43, 08035 Barcelona
Tel. (+34) 93 253 7523, Fax (+34) 93 417 6495
{dmegias,jordiherrera,jminguillona}@uoc.edu
Abstract. This paper describes an audio watermarking scheme based on lossy compression. The main idea is taken from an image watermarking approach where the JPEG compression algorithm is used to determine where and how the mark should be placed. Similarly, in the audio scheme suggested in this paper, an MPEG 1 Layer 3 algorithm is chosen for compression to determine the position of the mark bits and, thus, the psychoacoustic masking of the MPEG 1 Layer 3 compression is implicitly used. This methodology provides a high degree of robustness against compression attacks. The suggested scheme is also shown to succeed against most of the StirMark benchmark attacks for audio.

Keywords: Copyright protection, Audio watermarking, Frequency domain methods.
1 Introduction
Electronic copyright protection schemes based on the principle of copy prevention have proven ineffective or insufficient in the last few years (see [1,2], for example). Pragmatic approaches, like the one adopted for protecting DVDs [3], combine copy prevention with copy detection. Watermarking is a well-known technique for copy detection, whereby the merchant selling the piece of information (e.g. an audio file) embeds a mark in the copy sold. From a construction point of view, a watermarking scheme can be described in two stages: mark embedding and mark reconstruction. Since the former determines the mark reconstruction process, the real problem is where and how the marks should be placed into the product. Watermarking schemes should provide some basic properties depending on the specific application. Different properties are pointed out in the literature [4,2,5,6] but the most relevant are imperceptibility, capacity and robustness. Imperceptibility, sometimes referred to as perceptual quality, guarantees that the introduced mark is imperceptible and thus the marked version of the product is not distinguishable from the original one. Capacity measures the amount of information that can be embedded. Such a property is also known as bit rate. Robustness determines the resistance to accidental removal
of the embedded mark. All those properties interact, in the sense that an increase in capacity usually improves robustness but reduces imperceptibility and, reciprocally, an increase in imperceptibility reduces robustness. Hence a trade-off between them must be achieved. In audio watermarking schemes, the mark embedding process can be performed in different ways, since audio allows multiple manipulations without affecting the perceptual quality. But, since robustness is the most important watermarking property, questions like where and how to place the mark are important issues. In order to maximise imperceptibility, some proposals [7,8,9] exploit the frequency characteristics of the audio signal to determine the place where the mark should be embedded. Other proposals [10] use echo coding techniques where the mark is encoded by using different delays between the original signal and the echo. Such a technique increases robustness against MPEG 1 Layer 3 audio compression and D/A conversion, but is not suitable for speech signals with frequent silence intervals. Robustness against various signal processing operations is also increased in [11] by dividing the set of the original samples into embedding segments. A more detailed state of the art in audio watermarking can be found in [5]. In this paper we present a novel watermarking scheme for audio. The scheme is based, in some sense, on the ideas of [12], where a lossy compression algorithm determines where the mark bits are placed. This paper is organised as follows. Section 2 presents the new watermarking scheme. Section 3 analyses the properties of the resulting watermarking scheme: imperceptibility, capacity and robustness. Finally, in Section 4, conclusions and some guidelines for further research are outlined.
2 Audio Watermarking Scheme

The audio watermarking scheme suggested in this paper is inspired by the image watermarking algorithm of [12], in the sense that lossy compression is used in the mark embedding process in order to identify which samples are suitable for marking. Let the signal S to be watermarked be a collection of Pulse Code Modulation (PCM) samples (for example a RIFF-WAVE¹ file). The aim of the watermarking scheme is to embed a mark into this file in such a way that the imperceptibility and robustness of the mark are preserved.

2.1 Mark Embedding
Without loss of generality, let S be codified in RIFF-WAVE format. It is well known that the Human Auditory System (HAS) is sensitive to information in the frequency rather than the time domain. Because of this, the first step of this method is to obtain SF, the spectrum of S, by applying a Fast Fourier Transform (FFT) algorithm. In order to determine where the mark bits should be placed, the signal S is compressed using an MPEG 1 Layer 3 algorithm with a rate of R Kbps (tuning parameter) and, then, decompressed again to RIFF-WAVE format. The modified signal, after this compression/decompression operation, is called S', and its spectrum S'F is obtained.
¹ RIFF-WAVE stands for Resource Interchange File Format-WAVEform audio file format.
Throughout this paper, the Blade codec (compressor/decompressor) for the MPEG 1 Layer 3 algorithm has been chosen and, thus, the psychoacoustic model of this codec is implicitly used. Note that audio quality is not an objective of the codec used for this step, since we only need the compression/decompression operation to produce a signal S' which is slightly different from the original S. Hence, any other codec might have been used. Now, the set of frequencies Fmark = {fmark} suitable for marking is chosen according to the following criteria:
1. All fmark ∈ Fmark must belong to the relevant frequencies Frel of the original signal SF. This means that the magnitude (or modulus) |SF(fmark)| must not be lower than a given percentage (for example 2%) of the maximum magnitude of SF. Therefore, a first set of frequencies Frel = {frel} is chosen as:

Frel = { f ∈ [0, fmax/2] : |SF(f)| ≥ (p/100)·|SF|max },

where fmax is the maximum frequency of the spectrum, which depends on the sampling rate and the sampling theorem², p ∈ [0, 100] is a percentage and |SF|max is the maximum magnitude of the spectrum SF. Note that the spectrum values in the interval [fmax/2, fmax] are the complex-conjugate of those in [0, fmax/2]. The marking frequencies are a subset of these relevant frequencies, i.e. Fmark ⊆ Frel.
2. Now, the frequencies to be marked are those which remain "unchanged" after the compression/decompression phase, where "unchanged" means a relative error below a given threshold ε (for example ε = 0.05):

Fmark = {f1, f2, ..., fn} = { f ∈ Frel : |SF(f) − S'F(f)| / |SF(f)| < ε }.

Similarly to the image watermarking scheme of [12], a 70-bit stream mark, W (|W| = 70), is first extended to a 434-bit stream WECC (|WECC| = 434) using a dual Hamming Error Correcting Code (ECC). Using dual Hamming binary codes allows us to apply the watermarking scheme as a fingerprinting scheme robust against collusion of two buyers [13]. Finally, a pseudo-random binary stream (PRBS), generated with a cryptographic key k, is added to the extended mark as it is embedded into the original signal. Once the frequencies in Fmark have been chosen, the mark embedding method consists of increasing or decreasing the magnitude of SF(fmark) in order to embed a '1' or a '0', respectively. The increase or decrease in the magnitude of SF must be small enough not to be perceptible, but large enough that the mark can be reconstructed from an attacked signal. The approach of the suggested scheme is to increase or decrease the signal amplitude by d dB to embed a '1' or a '0', i.e., if fmark is the frequency at which a bit must be marked, the watermarked signal spectrum will be:

ŜF(fmark) = SF(fmark) · 10^(d/20) to embed '1',
ŜF(fmark) = SF(fmark) · 10^(−d/20) to embed '0',
² fmax = 1/Ts, where Ts is the sampling time.
where the parameter d dB can be tuned. This process is performed for all the frequencies fmark ∈ Fmark. Note, also, that n (the number of elements in Fmark) is required to be greater than or equal to the length |WECC| of the extended mark (434 in our example). In a typical situation, the mark is embedded tens or hundreds of times all over the spectrum ŜF. In addition, it must be taken into account that the spectrum components in SF are paired (pairs of complex-conjugate values) and thus the same transformation (adding or subtracting d dB) must be performed on the magnitude SF(fmark) and on the magnitude of its conjugate. For f ∉ Fmark the spectrum of ŜF is the same as that of S:

ŜF(f) = SF(f),           if f ∉ Fmark,
ŜF(f) = SF(f) ± d dB,    if f ∈ Fmark.
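A minimal sketch of the frequency selection and magnitude modification (numpy assumed; the "unchanged" test below compares magnitudes, a slight simplification of the complex-valued relative error above, and all function and variable names are ours):

```python
import numpy as np

def select_and_mark(S, S_compressed, bits, p=2.0, eps=0.05, d=1.0):
    # S, S_compressed: PCM samples of the original signal and of its
    # MPEG 1 Layer 3 compressed/decompressed version.
    # bits: extended, PRBS-masked mark bits, repeated over the spectrum.
    SF = np.fft.fft(S)
    SFc = np.fft.fft(S_compressed)
    half = len(SF) // 2
    mag = np.abs(SF[:half])

    relevant = mag >= (p / 100.0) * mag.max()                  # F_rel
    unchanged = np.abs(mag - np.abs(SFc[:half])) < eps * mag   # relative error < eps
    f_mark = np.flatnonzero(relevant & unchanged)              # F_mark
    # A full implementation would also check len(f_mark) >= |W_ECC|.

    marked = SF.copy()
    for k, f in enumerate(f_mark):
        b = bits[k % len(bits)]
        marked[f] *= 10 ** (d / 20.0) if b else 10 ** (-d / 20.0)
        marked[-f] = np.conj(marked[f]) if f else marked[f]    # conjugate symmetry
    return np.real(np.fft.ifft(marked)), f_mark
```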
Fig. 1. Mark embedding process (FFT of S; MPEG 1 Layer 3 compression/decompression and FFT of S'; selection of the relevant frequencies Frel and, by the relative-error test, of Fmark; ECC extension of the 70-bit mark W into the 434-bit WECC with PRBS keyed by k; magnitude modification and IFFT yielding the watermarked signal Ŝ)
Finally, the marked audio signal is converted to the time-domain signal Ŝ by applying an inverse FFT (IFFT) algorithm. The whole mark embedding process is depicted in the block diagram of Fig. 1. Note that this scheme has been designed to provide "natural" robustness against compression attacks, since only the frequencies for which the magnitude remains unaltered after compression/decompression, within some specified tolerance (the parameter ε), are chosen for marking. The mark embedding algorithm can be denoted in terms of the following expression:

Embed(S, W, parameters = {R, p, ε, d, k}) → Ŝ, Fmark
2.2 Mark Reconstruction
The objective of the mark reconstruction algorithm is to detect whether an audio test signal T is a (possibly attacked) version of the marked signal Ŝ. It is assumed that T is in RIFF-WAVE format. If it were not the case, a format conversion step (for example MPEG 1 Layer 3 decompression) should be performed prior to the application of the reconstruction process. First of all, the spectrum TF is obtained applying the FFT algorithm and, then, the magnitude at the potentially marked frequencies, |TF(fmark)| for all fmark ∈ Fmark, is computed. Note that this method is strictly positional and, because of this, it is required that the number of samples in Ŝ and T is the same. If there is only a little difference in the number of samples, it is possible to complete the sequences with zeroes. Thus, this methodology cannot be directly applied when resampling attacks occur. In such a case, sampling rate conversion must be performed before the mark reconstruction algorithm can be applied. When the |TF(fmark)| are available, a scaling step is undertaken in order to minimise the distance between the sequences |TF(fmark)| and |ŜF(fmark)|. This scaling is performed to suppress the effect of attacks which modify only a range of frequencies or which scale the PCM signal Ŝ. The following least squares problem is solved:
min_λ Σ_{f ∈ Fmark} ( |ŜF(f)| − λ|TF(f)| )².

This problem can be solved analytically as follows. Given the vectors

s = [ |SF(f1)| |SF(f2)| ... |SF(fn)| ]^T,
ŝ = [ |ŜF(f1)| |ŜF(f2)| ... |ŜF(fn)| ]^T,
t = [ |TF(f1)| |TF(f2)| ... |TF(fn)| ]^T,

where T stands for the transposition operator, it is possible to write the least squares problem in vector form as

min_λ (ŝ − λt)^T (ŝ − λt),

which yields the minimum for:

λ = ŝ^T t / t^T t.

Now, each component of λt is divided by the corresponding component of s and the value obtained is compared with 10^(d/20) to decide whether a '0', a '1' or a '*' (not identified) might be embedded in this component of λt. Let ri = λti/si:

ri ∈ [ 10^(d/20)·(100 − q)/100, 10^(d/20)·(100 + q)/100 ] ⇒ b̂i := '1',
1/ri ∈ [ 10^(d/20)·(100 − q)/100, 10^(d/20)·(100 + q)/100 ] ⇒ b̂i := '0'.
If none of these two conditions is satisfied, then b̂i := '*'. Here q ∈ [0, 100] is a percentage (e.g. q = 10) and b̂i is the i-th component of the vector b̂ which contains a sequence of "detected bits". Finally, the PRBS signal is subtracted from the bits b̂ to recover the true embedded bits b. This operation must preserve the '*' marks unaltered. Once b has been obtained, it must be taken into account that its length n is (much) greater than the length of the extended mark. Hence, each bit of the mark appears at different positions in b. For example, if the length of the extended mark is 434, the first bit should appear at b1, b435, b869, ..., b(1+434j), ... Some of these bits will be identified as '1', others as '0' and the rest as '*'. Now a voting scheme is applied to decide whether the i-th bit of the mark is '1', '0' or unidentified ('*'). Let n0, n1 and n∗ be the number of '0's, '1's and '*'s identified for the same mark bit. The simplest approach is to assign to each bit the sign which appears most. For example, if a given mark bit had been identified 100 times with n0 = 2, n1 = 47 and n∗ = 51, this simple approach would assign a '*' mark to this bit. But, taking into account that any value outside the interval defined above is identified as '*', it is clear that near-'1's are identified as '*' although they are much closer to '1' than to '0'. In the reported example, the big difference between the number of '1's and '0's (47 ≫ 2) can reasonably lead to the conclusion that the corresponding bit can be assigned a '1' with very little error probability, since most of the '*'s will probably be near-'1's. As a result of this consideration, the voting scheme used in this method ignores the '*'s if n∗ is not more than twice the difference |n1 − n0|:

bit := '*' if n∗ > 2|n1 − n0|,
bit := '1' if n∗ ≤ 2|n1 − n0| and n1 > n0,
bit := '0' if n∗ ≤ 2|n1 − n0| and n0 > n1.

A more sophisticated method using statistics might be applied instead of this voting scheme. For instance, an analysis of the distribution of ri for each bit might be performed. However, the voting procedure described here is simple to implement and fast to execute, which makes it very convenient for real applications.
Fig. 2. Mark reconstruction process (FFT of the test signal T, least-squares scaling of TF, bit identification against ŜF and Fmark with the PRBS key k, voting scheme, ECC decoding to W')

As a result of this voting scheme an identified extended mark W'ECC will be available. Finally, W'ECC and the error correcting algorithm are used to recover an identified
70-bit stream mark, W', which will be compared with the true mark W. The whole reconstruction process is depicted in Fig. 2. The mark reconstruction algorithm can be described in terms of the following expression:

Reconstruct(T, S, Ŝ, Fmark, parameters = {q, d, k}) → {W', b},

where b is a byproduct of the algorithm which might be used to perform statistical tests. The proposed scheme is not blind, in the sense that the original signal is needed by the mark reconstruction process. On the other hand, the bit sequence which forms the embedded mark is not needed for reconstruction, which makes this method suitable also for fingerprinting once the mark is properly coded [14].
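The scaling, bit identification and voting steps can be sketched as follows (numpy assumed; the PRBS unmasking and ECC decoding are omitted, all names are ours, and ties with n1 = n0 and no '*'s, which the voting rule above leaves undefined, fall through to '0' here):

```python
import numpy as np

def identify_bits(s, s_hat, t, d=1.0, q=10.0):
    # s: |S_F| (original), s_hat: |S^_F| (marked), t: |T_F| (test),
    # all sampled at the marked frequencies F_mark.
    s, s_hat, t = (np.asarray(v, dtype=float) for v in (s, s_hat, t))
    lam = (s_hat @ t) / (t @ t)          # analytic least-squares solution
    lo = 10 ** (d / 20.0) * (100.0 - q) / 100.0
    hi = 10 ** (d / 20.0) * (100.0 + q) / 100.0
    bits = []
    for r in lam * t / s:                # r_i = lambda * t_i / s_i
        if lo <= r <= hi:
            bits.append(1)
        elif r > 0 and lo <= 1.0 / r <= hi:
            bits.append(0)
        else:
            bits.append(None)            # '*': not identified
    return bits

def vote(detected, mark_len):
    # Bit i of the extended mark appears at positions i, i+mark_len, ...;
    # '*'s are ignored when they number at most twice |n1 - n0|.
    result = []
    for i in range(mark_len):
        group = detected[i::mark_len]
        n1, n0 = group.count(1), group.count(0)
        n_star = group.count(None)
        if n_star > 2 * abs(n1 - n0):
            result.append(None)
        else:
            result.append(1 if n1 > n0 else 0)
    return result
```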
3 Performance Evaluation
As pointed out in Section 1, three main measures are commonly used to assess the performance of watermarking schemes:
– Imperceptibility: the extent to which the embedding process leaves undamaged the perceptual quality of the marked object.
– Capacity: the amount of information that may be embedded and recovered.
– Robustness: the resistance to accidental removal of the embedded bits.
In this section, we test these properties for the scheme presented in Section 2. The scheme was implemented using a dual binary Hamming code DH(31, 5) as ECC, and the pseudo-random generator is a DES cryptosystem implemented in OFB mode. A 70-bit mark W (resulting in an encoded WECC with |WECC| = 434) was embedded. In order to test the watermarking scheme we have chosen the following parameters for embedding and reconstruction:
– R = 128 Kbps, which is the most widely used bit rate in MPEG 1 Layer 3 files.
– p = 2, meaning that we only consider relevant those frequencies for which the magnitude of SF is, at least, 2% of the maximum magnitude.
– ε = 0.05, which implies that a frequency is considered unchanged after compression/decompression if the magnitude varies less than 5% (relative error).
– d = 1 dB; if higher imperceptibility is required, a lower value can be chosen.
– q = 10, i.e. a ±10% band is defined about d in order to reconstruct '1's and '0's. This choice is quite conservative, since the '0' and the '1' bands are quite far from each other.
It is worth pointing out that these parameters have been chosen without performing a deep tuning analysis. Basically, R, p and ε affect the place where the mark bits are embedded; d is related to the imperceptibility of the hidden mark, since it describes how much the spectrum of each marked frequency is disturbed; and, finally, q affects the robustness of the method, since it avoids misidentification of the embedded bits. To test the performance of the suggested audio watermarking scheme, some of the audio files provided in the Sound Quality Assessment Material (SQAM) page [15] have
been used. The following files have been tested: violoncello (m.p.³), trumpet (m.p.), horn (m.p.), glockenspiel (m.p.), harpsichord (arp.⁴), soprano (voice), bass (voice), quartet (voice), English female speech (voice) and English male speech (voice). We have taken only the first ten seconds of each of these files, i.e., 441000 samples, and the mark has been embedded in the left channel only. The glockenspiel file is a special case, since it has about 5 blank seconds out of 10.

3.1 Imperceptibility
The imperceptibility property determines how much the marked signal $\hat{S}$ differs from the original one S. That is, imperceptibility is concerned with the distortion added by the inclusion of the mark or, in other words, with the audio quality of the marked signal $\hat{S}$ with respect to S. There are several ways to measure audio quality. Here, the signal-to-noise ratio (SNR) and an average SNR (ASNR) are used. The SNR measure determines the power of the noise added by the watermark relative to the original signal, and is defined by

$$\mathrm{SNR} = \frac{\sum_{i=1}^{N} S_i^2}{\sum_{i=1}^{N} \left(S_i - \hat{S}_i\right)^2}$$

where N is the number of samples and $S_i$ ($\hat{S}_i$) denotes the i-th sample of S ($\hat{S}$). Usually, this value is given in dB by performing the operation $10 \log_{10}(\mathrm{SNR})$. Another measure usual in audio quality assessment is an average of the SNR computed over sample blocks of some length. A typical choice is to consider pieces of 4 ms which, at a sampling rate of 44100 Hz, means 176 samples. The SNR of each of these pieces is computed, and the average over all the sample blocks is obtained. The ASNR measure is often given in dB. The measure used in this paper does not take into account the Human Auditory System (HAS) and, thus, all frequencies are equally weighted. In Table 1, the SNR and ASNR measures obtained for the ten benchmark files are shown. The SNR measures are about 19 dB whereas the ASNR measures are about 20 dB. This means that the power of the noise introduced by watermarking is roughly 0.01 times the power of the original signal, which is quite satisfactory and might even be improved (reduced) by choosing proper tuning parameters. Obviously, the parameter d only affects the imperceptibility of the watermark, since it determines to which extent the spectrum of the marked signal $\hat{S}$ is modified with respect to the original signal S. Hence, by reducing d to, say, 0.5 dB, the imperceptibility of the mark would increase, though the mark would be more easily removed. The parameters R, p and ε determine how many frequencies are chosen for watermarking and, thus, they also affect the imperceptibility of the mark. The larger the number of marked frequencies, the more perceptible the mark becomes. This establishes a link between the imperceptibility and the capacity of the watermarking system.
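As an illustration, a sketch of the two measures as defined above (ours, not the authors' code; averaging the linear block ratios before converting to dB is our assumption, and s and s_hat are NumPy arrays of original and marked samples):

```python
import numpy as np

def snr_db(s, s_hat):
    return 10 * np.log10(np.sum(s**2) / np.sum((s - s_hat)**2))

def asnr_db(s, s_hat, fs=44100, block_ms=4):
    n = int(fs * block_ms / 1000)        # 4 ms blocks: 176 samples at 44.1 kHz
    ratios = []
    for i in range(0, len(s) - n + 1, n):
        den = np.sum((s[i:i+n] - s_hat[i:i+n])**2)
        if den > 0:                      # skip blocks where no noise was added
            ratios.append(np.sum(s[i:i+n]**2) / den)
    return 10 * np.log10(np.mean(ratios))
```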
³ “m.p.” stands for “melodious phase”.
⁴ “arp.” stands for “arpeggio”.
Table 1. Capacity and imperceptibility

SQAM file      Marked bits   Capacity (bits)   SNR (dB)   ASNR (dB)
violoncello    4477          722               18.92      20.91
trumpet        3829          617               18.83      19.84
horn           1573          253               18.96      21.10
glockenspiel   1258          202               25.78      29.75
harpsichord    3874          624               21.25      22.84
soprano        5042          813               19.47      21.59
bass           15763         2542              19.02      20.08
quartet        13548         2185              19.22      20.36
female speech  10677         1722              19.57      21.84
male speech    9359          1509              19.44      21.49
Hence a tradeoff between imperceptibility and capacity must be achieved. Note, also, that capacity is related to robustness, since increasing the number of times the mark is embedded into the signal decreases the probability of losing the mark.

3.2 Capacity
The capacity of the watermarking scheme is determined by the parameters R, p and ε used in the embedding process. Since the marked frequencies are chosen according to the difference between S and its compressed/decompressed version, it is obvious that the rate R is a key parameter in this process. The percentage p determines which frequencies are significant enough to be taken into account and, thus, is a relevant parameter as far as capacity is concerned. Finally, the relative error ε is used to measure whether two spectral values of S and its compressed/decompressed version are equal, which also affects the number of marked frequencies. In Table 1, the capacity of the suggested scheme for the ten benchmark files is displayed. We have considered that the true capacity is not the number of marked bits (the second column), since the extended watermark is highly redundant: 70 bits of information plus 364 bits of redundancy. Hence only 70/434 of the marked bits count towards the true capacity (third column). However, this redundancy is relevant to the robustness of the method, as it allows errors to be corrected once the extended mark $W_{ECC}$ is recovered. Note, also, that 10 seconds of music are enough to embed the mark at least 3 times. If 3-minute files were marked using this method, the capacity of the method would range from 3652 bits (or 52 times a 70-bit mark) plus the redundancy for the glockenspiel file to 45763 bits (or 653 times a 70-bit mark) plus the redundancy for the bass file. It must be taken into account that the glockenspiel file is a special case, since it only contains 5 seconds of music.
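The reported figures can be reproduced with a short arithmetic sketch (our own, purely illustrative; 70/434 is the information-to-encoded ratio of the extended mark, and 18 is the number of ten-second segments in 3 minutes):

```python
marked_bits = {"glockenspiel": 1258, "bass": 15763}
for name, bits in marked_bits.items():
    capacity = bits * 70 // 434            # true capacity of the 10 s excerpt
    three_minutes = bits * 18 * 70 // 434  # scaled to a 3-minute file
    print(name, capacity, three_minutes)
# glockenspiel: 202 and 3652; bass: 2542 and 45763, matching the text
```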
3.3 Robustness Assessment
The robustness of the resulting scheme has been tested using the StirMark benchmark for audio [16], version 0.2. Some of the attacks in this benchmark cannot be evaluated
for the watermarking scheme presented in this paper, since the current version of our watermarking scheme does not allow for a large difference between the number of samples of the marked ($\hat{S}$) and the attacked (T) signals. In addition, only the left channel has been marked in the experiments, thus stereo attacks do not apply here either. The attacks considered for this test are summarised in Table 2.

Table 2. Attacks described in the StirMark benchmark for audio
Name        Number   Name             Number   Name          Number
AddBrumm    1–11     AddDynNoise      12       Addnoise      13–17
AddSinus    18       Amplify          19       Compressor    20
Echo        21       Exchange         22       FFT HLPass    23
FFT Invert  24       FFT RealReverse  25       FFT Stat1     26
FFT Test    27       FlippSample      28       Invert        29
LSBZero     30       Normalise        31       RC-HighPass   32
RC-LowPass  33       Smooth           34       Smooth2       35
Stat1       36       Stat2            37       ZeroCross     38
According to this table, thirty-eight different attacks are performed. The attack AddFFTNoise with default parameters destroys the audio file (it produces no sound) and, thus, no results are available for this attack. Future versions of the watermarking scheme should cope with stereo attacks (ExtraStereo and VoiceRemove) and attacks which modify the number of samples in a significant way (CutSamples, ZeroLength, ZeroRemove, CopySample and Resampling), but the current version of the watermarking scheme proposed here cannot cope with them. In order to test the robustness of the suggested watermarking scheme against these 38 attacks, a correlation measure between the embedded mark W and the identified mark $\hat{W}$ is used. Let $W_i$ and $\hat{W}_i$ be, respectively, the i-th bit of W and $\hat{W}$; hence

$$\beta_i = \begin{cases} 1, & \text{if } W_i = \hat{W}_i \\ -1, & \text{if } W_i \neq \hat{W}_i \end{cases}$$

is defined. Now, the correlation is computed, taking into account the $\beta_i$ for all the |W| bits (70 in our case) of the mark, as follows:

$$\mathrm{Correlation} = \frac{1}{|W|} \sum_{i=1}^{|W|} \beta_i .$$

This measure is 1 when all the |W| bits are correctly recovered ($\hat{W} = W$) and it is −1 when all the |W| bits are misidentified. A value of about 0 is expected when 50% of the bits are correctly recovered, as would occur if the mark bits were reconstructed randomly. In the StirMark benchmark test, we have considered that the watermarking scheme survives an attack only if the correlation is exactly 1, i.e. only if all the 70 bits of the mark are correctly recovered.
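A minimal sketch of this measure (ours, not the authors' code), with w and w_hat as 70-element 0/1 sequences:

```python
def correlation(w, w_hat):
    betas = [1 if a == b else -1 for a, b in zip(w, w_hat)]
    return sum(betas) / len(w)

# Survival criterion used in the StirMark test below:
# survives = correlation(w, w_hat) == 1.0
```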
Table 3. Survival of the mark to the StirMark test

Number  Survival ratio    Number  Survival ratio    Number  Survival ratio
1       10/10             2       10/10             3       10/10
4       10/10             5       10/10             6       10/10
7       10/10             8       10/10             9       10/10
10      10/10             11      10/10             12      6/10
13      10/10             14      10/10             15      10/10
16      10/10             17      8/10              18      6/10
19      10/10             20      10/10             21      0/10
22      10/10             23      3/10              24      10/10
25      10/10             26      0/10              27      0/10
28      0/10              29      10/10             30      10/10
31      10/10             32      1/10              33      10/10
34      9/10              35      9/10              36      10/10
37      10/10             38      1/10
In Table 3 the survival of the mark against the StirMark benchmark attacks is displayed. The relation between the attack number and the name given in the StirMark benchmark is given in Table 2. Each attack has been performed on the ten files of the SQAM corpus reported above. Hence, the results are shown as an x/10 ratio since the total number of files is 10. As remarked above, the mark is considered to be recovered only if all 70 bits are correctly reconstructed. The results of Table 3 show that only 7 of the 38 StirMark benchmark attacks performed in this paper cause serious damage to the embedded mark. The attacks with survival ratios of 6/10 or above produce good correlation values in the non-survived cases, which suggests that better results might arise with an appropriate tuning of the watermarking scheme. The non-survived attacks are the following: 21 (Exchange), 23 (FFT HLPass), 26 (FFT Stat1), 27 (FFT Test), 28 (FlippSample), 32 (RC-HighPass) and 38 (ZeroCross). It must be remarked that most of these attacks produce significant audible damage to the signal and would not be considered acceptable in most usual situations, especially for music files. Finally, a set of MPEG 1 Layer 3 compression attacks (using the Blade codec) has been carried out on the marked soprano SQAM file in order to test the robustness of the suggested watermarking scheme against compression. Since the rate used for watermarking is R = 128 Kbps, it was expected that the scheme would be able to overcome compression attacks with bit rates of 128 Kbps and higher. Table 4 displays the correlation values obtained for the MPEG 1 Layer 3 compression attacks for several bit rates, from 320 Kbps to 32 Kbps. This table shows that the watermarking scheme suggested here is robust not only for all bit rates greater than or equal to 128 Kbps, as expected, but also at rates of 112 and 96 Kbps, which are more compressed
Table 4. MPEG 1 Layer 3 compression attacks
Bit rate (Kbps)     320     256     224     192     160     128      112
Compression ratio   4.41:1  5.51:1  6.30:1  7.35:1  8.82:1  11.03:1  12.60:1
Correlation         1       1       1       1       1       1        1

Bit rate (Kbps)     96       80       64       56       48       40       32
Compression ratio   14.70:1  17.64:1  22.05:1  25.20:1  29.30:1  35.28:1  44.10:1
Correlation         1        0.97     0.97     0.94     0.83     0.80     0.49
than the rate used for watermarking (128 Kbps). In addition, the correlation value is very close to 1 even for rates 80, 64 and 56 Kbps. Of course, better robustness against compression attacks might be achieved by choosing a different rate for watermarking, for example R = 64 Kbps.
4 Conclusions and Further Research
This paper presents a watermarking method which uses MPEG 1 Layer 3 compression to determine the position of the embedded mark. The main idea of the method, borrowed from the image watermarking scheme of [12], is to find the frequencies for which the spectrum of the original signal is not modified after compression. These frequencies are used to embed the mark bits by adding a given parameter to, or subtracting it from, the magnitude of the spectrum. The method is complemented with an error correcting code and a pseudo-random binary signal to increase robustness and to avoid collusion of two buyers. Thus, this watermarking approach is also suitable for fingerprinting. The performance of the suggested scheme has been evaluated for the SQAM file corpus using three measures: imperceptibility, capacity and robustness. We have shown that (without tuning) the power of the embedded watermark is about 0.01 times that of the original signal. As far as capacity is concerned, for typical 3-minute music files, the mark can be repeated hundreds of times within the marked signal. Finally, robustness has been tested by performing the applicable attacks in the StirMark benchmark and also MPEG 1 Layer 3 compression attacks. The suggested scheme has been shown to be robust against most of the StirMark attacks and against compression attacks with compression ratios larger than the one used for watermarking.

There are several directions in which to further the research presented in this paper. Part of the future research will be focused on the parameters of the scheme, since some guidelines to tune these parameters must be suggested. In addition, the watermarking scheme must be adapted to stereo files by marking both the left and the right channels appropriately. It is also required that the watermarking scheme be able to cope with attacks which modify the number of samples in the attacked signal in a significant way. The use of filters which model the HAS to measure imperceptibility is another research topic. Finally, the possibility of working with blocks of samples instead of using the whole file should be addressed.
Acknowledgements. This work is partially supported by the Spanish MCYT and the FEDER funds under grant no. TIC2001-0633-C03-03 STREAMOBILE.
References

1. Petitcolas, F., Anderson, R., Kuhn, M.: Attacks on copyright marking systems. In: 2nd Workshop on Information Hiding, LNCS 1525, Springer-Verlag (1998) 219–239
2. Petitcolas, F., Anderson, R.: Evaluation of copyright marking systems. In: Proceedings of IEEE Multimedia Systems '99 (1999) 574–579
3. Bell, A.: The dynamic digital disk. IEEE Spectrum 36 (1999) 28–35
4. Swanson, M., Kobayashi, M., Tewfik, A.: Multimedia data-embedding and watermarking technologies. Proceedings of the IEEE 86(6) (1998) 1064–1087
5. Swanson, M.D., Zhu, B., Tewfik, A.: Current state of the art, challenges and future directions for audio watermarking. In: Proceedings of IEEE International Conference on Multimedia Computing and Systems, Volume 1, IEEE Computer Society (1999) 19–24
6. Voyatzis, G., Pitas, I.: Protecting digital image copyrights: a framework. IEEE Computer Graphics and Applications 19 (1999) 18–24
7. Cox, I.J., Kilian, J., Leighton, T., Shamoon, T.: Secure spread spectrum watermarking for multimedia. IEEE Transactions on Image Processing 6 (1997) 1673–1687
8. Swanson, M.D., Zhu, B., Tewfik, A., Boney, L.: Robust audio watermarking using perceptual masking. Signal Processing, Special Issue on Copyright Protection and Access Control 66 (1998) 337–355
9. Kim, W., Lee, J., Lee, W.: An audio watermarking scheme robust to MPEG audio compression. In: Proc. NSIP, Volume 1, Antalya, Turkey (1999) 326–330
10. Gruhl, D., Lu, A., Bender, W.: Echo hiding. In: Proceedings of the 1st Workshop on Information Hiding, LNCS 1174, Cambridge, England, Springer-Verlag (1996) 295–316
11. Bassia, P., Pitas, I., Nikolaidis, N.: Robust audio watermarking in the time domain. IEEE Transactions on Multimedia 3 (2001) 232–241
12. Domingo-Ferrer, J., Herrera-Joancomartí, J.: Simple collusion-secure fingerprinting schemes for images. In: Proceedings of the Information Technology: Coding and Computing ITCC 2000, IEEE Computer Society (2000) 128–132
13. Domingo-Ferrer, J., Herrera-Joancomartí, J.: Short collusion-secure fingerprinting based on dual binary Hamming codes. Electronics Letters 36 (2000) 1697–1699
14. Boneh, D., Shaw, J.: Collusion-secure fingerprinting for digital data. In: Advances in Cryptology – CRYPTO '95, LNCS 963, Springer-Verlag (1995) 452–465
15. Purnhagen, H.: SQAM – Sound Quality Assessment Material (2001). http://www.tnt.uni-hannover.de/project/mpeg/audio/sqam/
16. Steinebach, M., Petitcolas, F., Raynal, F., Dittmann, J., Fontaine, C., Seibel, S., Fates, N., Ferri, L.: StirMark benchmark: audio watermarking attacks. In: Proceedings of the Information Technology: Coding and Computing ITCC 2001, IEEE Computer Society (2001) 49–54
Loss-Tolerant Stream Authentication via Configurable Integration of One-Time Signatures and Hash-Graphs

Alwyn Goh¹, G.S. Poh², and David C.L. Ngo³

¹ Corentix Laboratories, B-19-02 Cameron Towers, Jln 5/58B, 46000 Petaling Jaya, Malaysia
[email protected]
² Mimos, Technology Park Malaysia, 57000 Kuala Lumpur, Malaysia
³ Faculty of Information Science & Technology, Multimedia University, 75450 Melaka, Malaysia

Abstract. We present a stream authentication framework featuring preemptive one-time signatures and reactive hash-graphs, thereby enabling simultaneous realisation of near-online performance and packet-loss tolerance. Stream authentication is executed on packet aggregations at three levels ie: (1) GM chaining of packets within groups, (2) WL star connectivity of GM authenticator nodes within meta-groups, and (3) signature m-chaining between meta-groups. The proposed framework leverages the most attractive functional attributes of the constituent mechanisms ie: (1) immediate verifiability of one-time signatures and WL star nodes, (2) robust loss-tolerance of WL stars, and (3) efficient loss-tolerance of GM chains; while compensating for various structural characteristics ie: (1) high overhead of one-time signatures and WL stars, and (2) loss-intolerance of the GM chain authenticators. The resultant scheme can be operated in various configurations based on: (1) ratio of GM chain to WL star occurrence, (2) frequency of one-time signature affixation, and (3) redundancy and spacing of the signature-chain.
1 Introduction

Lossy streaming results in the received stream being a subset of the transmitted one, which is problematic from the data authentication viewpoint, especially in comparison to the well-established authentication and verification of block-oriented data. Such signature protocols allow receiver-side establishment of: (1) absolute data integrity during transit, and (2) association with a specified sender; thereby regarding data loss (even of a single bit) in transit as equivalent to fraudulent manipulation. Block-oriented signature protocols are therefore essentially inapplicable to lossy datastreams.

1.1 Overview of Previous Research

Previous research in stream authentication [1-7] has focussed on the integrated use of signatures and hash-chaining, with the latter essentially an amortisation mechanism to compensate for the heavy overheads of the former. This basic concept is established
in the seminal work of Gennaro-Rohatgi (GR) [1], who also introduced the notion of reactive and preemptive mechanisms. The former is applicable when the entire datastream is available a priori to the sender, thereby enabling attachment to every packet of the hash-value of the preceding packet. Earlier sections of the datastream are hence reactively verified by subsequently recovered packets. This contrasts with the preemptive one-time signatures applicable when the datastream is not entirely available a priori. In this case previously recovered one-time public-keys are used to verify subsequent one-time signatures. Note that both formulations as originally presented are intolerant of packet-loss. Reactive authentication was extended by the hash-trees of Wong-Lam (WL) [2], in which the hash-values of child-nodes are aggregated and used for parent-node computation. WL hash-trees require: (1) sender-side buffering of leaf and intermediate hashes, and (2) affixation of highly redundant authenticative information to the data packets; both of which can constitute significant overheads. This formulation does nevertheless enable immediate receiver-side verification and is also maximally loss-tolerant. This tolerance against worst-case packet-loss contrasts with the more economical presumption of random-bursty loss adopted by Golle-Modadugu (GM) [3]. GM augmented chains are far more efficient to compute than WL trees, but are not loss-tolerant to the same extreme degree.

1.2 Proposed Solution

The major issue in stream authentication is the difficulty of simultaneously enabling: (1) online (ie immediate or with minimal delay) functionality, and (2) packet-loss robustness. The GR formulation satisfies (1) but not (2), with subsequent research [2–6] focussing on incorporation of loss-tolerance via hash-graph topology. Over-emphasis on either of (1) or (2), as respectively exemplified by the GR and WL/GM approaches, results in stream authentication with a narrow functional emphasis, and therefore limited usefulness. This paper describes stream authentication in a broad functional context, where online (or near-online) performance and packet-loss tolerance are both important. We outline a composite solution featuring both preemptive and reactive mechanisms, with authentication operations executed at three packet-aggregation layers ie: (1) packet-level GM chaining, thereby ensuring efficient authentication on the bulk of the datastream, (2) group-level WL star connectivity to protect the functionally important chain authenticators, and (3) meta-group-level one-time signature [10-12] chaining to establish data-to-sender association.
2 Basic Mechanisms

A digital stream D differs significantly from conventional block-oriented data in several important respects ie: (1) a priori undefined (ie infinite) length, (2) online generation in terms of finite L-packet substreams $D_k = \{d_1, \ldots, d_L\} \subset D$, (3) online consumption upon receipt of one or more substreams $D'_k$, and (4) probable loss of
packets during transit, so that $D'_k \subseteq D_k$ and $\bigcup_{\forall k} D'_k \subseteq D$.
Attributes (2, 3)
necessitate high-throughputs for sender-side authentication and receiver-side verification, thereby motivating the use of collision-resistant hash functions H rather than the (significantly slower) number-theoretic constructions. These hashes are used as the fundamental building blocks for the subsequently described H-graphs and sender-irrefutable signatures, the latter of which are functionally equivalent to block-oriented signatures. Such schemes are denoted $\sigma: \langle G_{x,y}, S_x, V_y \rangle$ [13] with key-(G)eneration, (S)igning and (V)erification parameterised by key-pair (x, y) of private and public portions; as would be familiar from number-theoretic cryptography. H-graphs and one-time signatures are respectively reactive and preemptive authentication mechanisms, with computations in the former case necessitating forward-buffering of data. Reactive authentication enables receiver-side verification to be loss-tolerant to a certain degree, but is not genuinely online, ie only nearly so if the buffer-size is relatively small compared to characteristic stream-lengths. Preemptive authentication, in contrast, enables online performance, but requires lossless recovery of the transmitted stream. The inherently dichotomous requirements of online signing/verification (2, 3) and loss-tolerance (4) are an important motivation for the featured research, and will be subsequently discussed in more detail.

2.1 H-Chains

H-graphs result from the conceptualisation of authenticated message packets as vertices and H-based computation as directed edges, the latter of which establishes one-way connectivity among the former. Multi-packet authentication via H-graphs can be (depending on the topology) highly efficient due to overhead amortisation over the multiple packets in a particular graph. This is achieved through appending a particular packet hash (immediately or otherwise) ahead or astern of its location in the packet-graph. H-graph authentication can also be robust (to some degree, depending again on the topology) against packet-loss, usually at the expense of signing/verification delays arising from the necessity for packet buffering during graph construction. Various buffered H-graph schemes [2-7] are therefore loss-tolerant, while others [1] are genuinely online-computable but loss-intolerant. The simplest H-graph construction is a linear chain on finite-stream D ie:
$$\pi_0 = \left(H(d_1),\, S(H(d_1))\right) \quad\text{and}\quad \pi_i = \left(d_i,\, H(d_i, H(d_{i+1}))\right) \qquad (1)$$

for i ∈ [1, L−1]. The number-theoretic signature on the initial packet $\pi_0$ is required to establish sender-association, which bootstraps the authentication process. There is then the necessity for $d_{i+1}$ prior to computation of $\pi_i$, which is characteristic of reactive schemes.
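A minimal sketch of such a reactive linear H-chain (our simplified illustration of the idea, not the authors' code; sign() stands for any block-oriented signature primitive):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha1(data).digest()

def build_chain(packets, sign):
    # Reactive: the whole finite stream must be buffered first, since the
    # tag attached to packet i depends on the hash of packet i+1.
    tagged = [None] * len(packets)
    nxt_hash = b""                        # the last packet has no successor
    for i in range(len(packets) - 1, -1, -1):
        tagged[i] = packets[i] + nxt_hash
        nxt_hash = h(tagged[i])
    pi0 = (nxt_hash, sign(nxt_hash))      # signed head hash bootstraps the chain
    return pi0, tagged
```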
2.2 One-Time Signatures

H-based signatures [10-12] are significantly faster than the corresponding number-theoretic operations, but functionally limited in that a key-pair can only be used to sign and verify a single message. This results in a linear key-to-data overhead, as opposed to the constant overhead of number-theoretic formulations with long-term reusable key-pairs. The signatures themselves also tend to be quite large (ie in the kbit-range for the Even-Goldreich-Micali (EGM) [10] formulation), thereby rendering impractical signature affixation on every stream packet. One-time signatures do not (in contrast to H-graphs) require packet-buffering and can therefore be operated online. The basic operational concept is for the i-th signing $S_i$ and verification $V_i$ components (as parameterised by the i-th key-pair) to be applied on packet $d_i \in D$. Note that one-time schemes also require bootstrapping with a number-theoretic signature on the initial one-time public-key $y_0$. This certified public-key can subsequently be used for receiver-side verification of a subsequent packet, the logic of which extends down the one-time signature chain as follows:
$$\pi_0 = \left(y_0,\, S(y_0)\right) \quad\text{and}\quad \pi_i = \left(d_i,\, y_i,\, S_{i-1}\!\left(H(d_i, y_i)\right)\right) \qquad (2)$$
This specific formulation is not loss-tolerant, and cannot recover from dropped packets. It can, however, be extended via association of multiple public-keys with a particular packet $\pi_i$. Such an m-time scheme would be able to tolerate lost packets, but at an increased key-to-data overhead. Note that the signature in the i-th packet (on data and public-keys in $\pi_i$) is computed sender-side using the (i−1)-th public-key transmitted in $\pi_{i-1}$, which is characteristic of preemptive authentication formulations. This contrasts with the reactive authenticative logic of Eqn 1, where sender-side computation of H-chain node $\pi_i$ presumes prior availability of (buffered) $\pi_{i+1}$.

2.3 Wong-Lam H-Star

The H-star is the simplest case of the WL hierarchical authentication scheme [2], the basic idea of which is to bind consecutive packets into groups defined by a common signature (number-theoretic or hash-based) on the packet-group. Each packet is subsequently appended with the authenticative information (including the packet-group signature) necessary to confirm membership in the arbitrary n-sized group. This is explicitly designed to enable verification of single packets independent of any other within the same group. The result is an extremely high degree of loss-tolerance, allowing for group-level authentication even if n−1 (out of n) packets are lost during transmission. Formation of WL H-graphs, on the other hand, requires high buffering and communications overheads. The attributes of packet-loss robustness and high authenticative overhead can be seen from the packet structure ie:
$$\pi_i = \left(d_i,\; \left\{H(d_j)\right\}_{\forall j \neq i},\; S\!\left(H\!\left(\left\{H(d_j)\right\}_{\forall j}\right)\right)\right) \qquad (3)$$
with the root-node S corresponding to the common packet-group signature. Note S in this case denotes the signature-based binding together of all packets within a particular group, rather than a distinct node. Eqn 3 facilitates packet-group verification via any packet $\pi_i \in \Pi$ within a star-authenticated sub-stream, but at the expense of O(n) hashes and one signature per packet-group. This constitutes an extremely high authenticative overhead per packet, and is also manifestly reactive, ie necessitating buffering of all $d_i \in D$ prior to computation of any $\pi_i$. Packet-group size n is therefore indicative of both packet-loss robustness and authenticative overheads, with small (large) values appropriate for relatively low (high) loss communications environments.

2.4 Golle-Modadugu Augmented H-Chain

GM H-chaining [3] (in common with the WL constructions) is designed to facilitate verification within the context of lossy transmissions, but with a substantively different presumption of packet-loss. Note the packet-level self-similarity in Eqn 3, which renders WL stars robust against worst-case packet-loss, but at the expense of an extremely high authenticative overhead per packet. The GM approach adopts the less strenuous presumption that packets are lost in random-bursts, so that some packets in an n-sized packet-group would be successfully retrieved. There is some evidence [8, 9] that the latter presumption is more realistic, which is fortunate because mitigation of worst-case loss (as addressed by WL stars) should intuitively require heavier overheads than alternative packet-loss models. GM chains in fact enable a significant reduction in the per-packet authenticative overhead, via adoption of: (1) a basic chain structure, with far fewer H-connections compared to the above-discussed WL configuration; and (2) non-uniform distribution of authenticative overheads over the packet-group. Attribute (2) results in a more complex definition ie:
$$\pi_i = \begin{cases} \left(\psi_i,\, H(\psi_i)\right) & i \in [\alpha, n-1] \\ d_n & i = n \\ \left(\psi_\beta,\, S(H(\psi_\beta))\right) & i = \beta \end{cases} \quad\text{with}\quad \psi_i = \begin{cases} \left(d_i,\, H(d_{i+1})\right) & i = \alpha,\, n-1 \\ \left(d_i,\, H(d_{i+1}),\, H(d_{i+2})\right) & i \in [1, n-2] \\ \left(d_\beta,\, H(d_\alpha),\, H(d_1),\, H(d_2)\right) & i = \beta \end{cases} \qquad (4)$$
for i ∈ {α, 1, …, n, β}, with head node α and tail β. This is justified by the resultant constant communications overhead per packet, as opposed to O(n) (ie scaling with group-size) for WL stars. The group-level communications overheads are therefore of O(n), rather than O(n²) for WL stars. Such a reduction is of major practical significance, particularly for operational scenarios with high-volume data transmission in bandwidth-constrained environments.
GM augmented chaining allows for any packet $\pi_i \in \Pi$ to be verified so long as there is a path to the signed $\pi_\beta$, which must be retrieved. This position-dependent authenticative prioritisation is diametrically opposed to the node-level uniformity of WL stars, the latter of which results in loss-tolerance irrespective of position. Note also that GM chain formation at both sending and receiving endpoints requires O(n) packet-buffering, with verification predicated on recovery of the β node. This contrasts with the online verifiability of WL star-connected packets, any of which can be verified independently of all others in the packet-group.
3 Proposed Composite Solution

The above-outlined mechanisms have various attractive attributes ie: (1) structural simplicity of H-chains with single/multiple connections, (2) online performance of m-time signature chaining, (3) maximal packet-loss robustness of WL stars, and (4) efficient packet-loss robustness of GM augmented chains; as illustrated below:
Fig 1. (a) Linear H-chain, (b) linear one-time signature chain, (c) WL H-star, and (d) GM augmented H-chain
with the arrows indicative of the authentication direction. This section describes such a framework featuring: (1) GM chain connectivity of individual packets in the datastream, (2) WL star connectivity of the GM β nodes, and (3) m-time signature affixation on the WL groups. The basic idea is to use GM chaining—with its simultaneous realisation of loss tolerance and structural efficiency—for the bulk of the datastream. This still necessitates recovery of the β nodes, which are then protected strongly via WL star connectivity. What remains is therefore to establish an association between any given WL star-group and the sender identity, which is efficiently done via a chained sequence of H-based signatures. Note this results in a tiered authentication framework addressing (1) packet, (2) packet-group and (3) stream-level datastructures.
3.1 Packet-Level GM Chain-Connectivity

Packets within groups can be characterised as $d_i^k \in D_k$, with (k, i) the respective group and packet indices. GM chain-authenticated packets $\pi_i^k$ are straightforwardly obtained via Eqn 4, with the only difference being the handling of the β nodes ie:
$$\beta^k = \left(\psi_\beta^k,\; H(\psi_\beta^k)\right) \quad\text{with}\quad \psi_\beta^k = \left(d_\beta^k,\; H(d_\alpha^k),\; H(d_1^k),\; H(d_2^k)\right) \qquad (5)$$
Note these group-wise anchor nodes are not signed as in Eqn 4, but rather used (as subsequently outlined) as leaf nodes within a larger-scale WL star encompassing multiple GM chains. This is denoted by i ∈ {α, 1, …, n, β} and k ∈ {1, …, N}, applicable in Eqns 4 and 5. Each data packet in the GM chain then requires a communications overhead of 3 H-words (less for the i ∈ {α, n−1, n} packets), resulting in a total of 3n H-words per packet-group. This is comparable to the overhead of a single WL node, hence the obvious attraction of GM chains. Computation of the H-chain is also relatively efficient if the $H(d_i^k)$ values in Eqn 4 are buffered, resulting in a total requirement of 2n H-computations per group.
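To make this group-level construction concrete, the following Python sketch (our own illustration, under one plausible reading of Eqns 4 and 5; the hash h() over a serialised tuple and the payload keys are stand-in assumptions) assembles the chain packets and the unsigned β anchor:

```python
import hashlib

def h(x):
    # stand-in hash over a canonical serialisation (illustrative only)
    return hashlib.sha1(repr(x).encode()).hexdigest()

def gm_group(d):
    # d: payloads keyed 'a' (alpha), 1..n, 'b' (beta) for one packet-group
    n = max(k for k in d if isinstance(k, int))
    psi = {'a': (d['a'], h(d[1])), n - 1: (d[n - 1], h(d[n])),
           'b': (d['b'], h(d['a']), h(d[1]), h(d[2]))}
    for i in range(1, n - 1):
        psi[i] = (d[i], h(d[i + 1]), h(d[i + 2]))   # two forward hashes
    pi = {i: (psi[i], h(psi[i])) for i in ['a'] + list(range(1, n))}
    pi[n] = d[n]                                    # tail carries no hashes
    beta = (psi['b'], h(psi['b']))                  # unsigned WL leaf (Eqn 5)
    return pi, beta
```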
3.2 Group-Level WL Star-Connectivity
d ik (for i ≠ β) are dropped, but can itself be loss in transit. Loss of a particular β packet must therefore be mitigated against, so that the consequences do not extend beyond the relatively small n-sized packet-group. This is addressed in our framework via inter-β WL star-connectivity, with one-time signatures on the root-nodes. The resultant structural form is modified from Eqn 3 ie:
( )
µ µ µ Bk = βk , y k , y k +σ, H β k , Σ µ−1 with ∀k ′ ≠ k
µ Σµ−1 = Sµ−1 H H H βk , yµ, yµ+σ ∀k
( )
(6)
for k ∈ {1, …, N}, with µ the meta-group index and σ the inter-group spacing between the affixed one-time public-keys. These star-connected β nodes encompass N packet-groups, and are representative of a meta-group containing Nn data-packets. The WL star configuration ensures maximal robustness against loss of the groupspecific βµk , so that each node can be verified independently of all others. One β node out of the N is therefore sufficient to establish associativity within the largerscale meta-group context, so long as public-key yµ−1 (necessary for verification of signature Σ µ−1 ) is previously recovered and verified. Note the inclusion of two public-keys per WL leaf in Eqn 6, thereby mitigating against discontinuities in the sequence of one-time public-keys and signatures. Note the communications overhead of NH + mY + S per packet-group, with key/signature-lengths Y and S further expressible in terms of H-words. The featured
246
A. Goh, G.S. Poh, and D.C.L. Ngo
EGM protocol—in common with other one-time signature schemes ie DiffieLamport, Merkle and Merkle-Winternitz—has short Y = H key-lengths, but is particularly attractive in that the signature-lengths are both configurable and comparatively short. Typical settings then result in S in the kbit-range ie comparable to commonly encountered number-theoretic implementations. Efficient computation is facilitated—similar to the previously discussed packet-level GM chaining—by
( )
buffering of the H βµk values; resulting in an overhead of NH + S per Nn-sized meta-group, with EGM signature-generation S (or verification V, both of which are equal) also configurable. 3.3 Stream-Level m-Time Signatures The double signature-chaining of the previous section requires equivalently connected initialisation:
(
)
B0 = y 0, y σ,S H (y 0, y σ )
(7)
with Eqns 6 and 7 essentially a straightforward extension of the linear chaining of [1]. Incorporation of these preemptive signatures facilitate immediate verification—upon recovery of B node as specified in Eqn 6—with multiple connectivity allowing resumption of stream verification σ meta-groups astern of any completely dropped meta-group. Characteristic stream-dimension σ is described in [1] as chain-strength, in the sense that such signature-chains would tolerate the loss of σ–1 WL stars between Bµk and Bµ+σ k ′ , as illustrated below: µ
B0
y 0, y σ
S
n-1
n
β Bk
y µ, y µ +σ
Σ µ−1
µ+σ
µ
n-1
n
β B k +1
n-1
n
β Bk
y µ+σ, y µ+ 2 σ
Σ µ+σ−1
Fig 2. Layered framework featuring packet, group and meta-group mechanisms
This facilitates loss-tolerance between meta-groups; and is therefore complementary to the previously discussed WL star-aggregation of β nodes, which addresses packetloss within specified meta-groups. The m = 2 configuration does nevertheless result in a doubled key overhead per meta-group compared to Eqn 2, hence our incorporation of the EGM formalism with short H-sized public-keys. EGM (in common with other one-time protocols) also requires sender-side generation of the one-time key-pairs—with one required for each transmitted meta-groups—which can be pre-computed for enhanced operational efficiency. The framework as outlined features configurable parameters: (1) m public-keys per meta-group, (2) σ signature-chain meta-group spacing, (3) N groups per metagroup, and (4) n data-packets per group. Note the group/packet-level settings (N, n) facilitates more immediate verification compared to a long GM chain of length Nn, in addition to allowing more flexibility in response to different operational conditions.
Loss-Tolerant Stream Authentication via Configurable Integration
247
4 Analysis of Framework The proposed scheme can be implemented with any number-theoretic and one-time protocol. Our choice of the Rabin [14] and EGM formalisms is based primarily on performance considerations. Rabin verification (necessitating only a single modularsquaring) is, for instance, significantly more computation-efficient compared with other number-theoretic and even H-based formulations [15]. EGM, on the other hand, is significantly more communications-efficient compared with other one-time schemes, but in fact necessitates a higher computation-overhead. We therefore presume the relative preeminence of bandwidth and latency constraints over those related to endpoint computations. EGM signature generation and verification is also more computation-efficient (by up to two orders of magnitude) compared with number-theoretic protocols [15]. The constituent H-operations in EGM can be executed using block ciphers—ie the Data Encryption Standard (DES) as originally suggested—or one-way collision-resistent compressions ie Message Digest (MD) 5 or the Secure Hash Algorithm (SHA). Use of MD5 or SHA hashing results in superior computation-efficiency, and is therefore adopted in our implementation. These hashes are also used to construct the above-discussed WL stars and GM augmented chains. Practical stream authentication must be both effective and efficient, with evaluation of both dependant on the manner in which packets are dropped in transit. Mechanisms designed to tolerate packet-loss in random-bursts (ie GM chains) can therefore be expected to be significantly more efficient that those designed for worstcase loss (ie WL stars). Our incorporation of both WL stars and GM chains addresses the fact that the β packets of the latter cannot be lost without major functional consequence. The objective is therefore to demonstrate that the featured specification of loss-tolerance effectiveness does not significantly degrade computation and communications efficiency. Our analysis is presented as follows: 4.1 Correctness The above-outlined layered scheme can be demonstrated to be correct by considering system-wide compromise to be equivalent to compromise of the underlying cryptography protocols ie the signatures and H-graphs. We follow the GR methodology [1], in which presumption of secure signatures—thereby establishing the initial signed packet—is subsequently extended down a linear H-chains. This established security of linear H-chaining is based on random oracles, and is therefore itself extensible to non-linear constructions (ie the proposed layer framework) via demonstration that node-level compromise is as difficult as the equivalent effort on the underlying one-time signature and H-function. The proposed framework is therefore as secure as the underlying mechanisms ie: (1) number-theoretic signature, (2) one-time signature, (3) H-graphs and (4) H-function. 4.2 Signing and Verification Delay Delay Ω is defined as the number of packets which must be buffered prior to signing or verification operations. These operations should ideally be executed on a particular
packet without delay, ie Ω = 0, which denotes genuine online transmission and consumption. Recall that delay-free operations are rendered impossible by our use of H-graphs as an amortisation mechanism against the high overhead of signature operations. Our scheme results in: (1) Ω(send) = Nn from meta-group level buffering prior to signature generation on the WL root-node, and (2) Ω(recv) = n from group-level buffering prior to verification with respect to the signed β node; the latter of which presumes recovery of the required one-time public-keys.

4.3 Communications Overhead

The communications overhead per packet:

$$\Omega_i = \begin{cases} S' + mH & i = 0 \\ 2H & i = \alpha,\, n-1 \\ 3H & i \in [1, n-2] \\ 0 & i = n \\ S + (m + N + 3)H & i = \beta \end{cases} \qquad (8)$$
can be surmised from Eqns 4-7, with: (1) S′ the number-theoretic signature-length, (2) S the one-time signature-length, and (3) H the hash-length. We compare the proposed scheme, in three (N, n) configurations with m = 2 signature-chaining, against other stream authentication protocols over a 20-packet stream. Table 1 presents the delay and communication overheads associated with the various protocols:

Table 1. Buffering delays and communications overheads for 20-packet stream
Scheme                   Ω: (send, recv)   Ω_i                                            Loss
GR                       0, 0              (0) 128, (i) 10                                None
GR (one-time)            0, 0              (0) 128, (i) 146                               None
WL star                  20, 0             (i) 318                                        Worst-case
GM chain                 20, 20            (α, n−1) 20, (i) 30, (n) 0, (β) 158            Random
Proposed scheme (4, 5)   20, 5             (0) 148, (α, n−1) 20, (i) 30, (n) 0, (β) 226   Random
Proposed scheme (2, 10)  20, 10            (0) 148, (α, n−1) 20, (i) 30, (n) 0, (β) 206   Random
Proposed scheme (1, 20)  20, 20            (0) 148, (α, n−1) 20, (i) 30, (n) 0, (β) 196   Random
given 10-byte H-words, 128-byte number-theoretic signatures and 136-byte one-time signatures. These hashes are relatively short compared to those used on blocked data, but are sufficient in the context of interest, ie to ensure target collision-resistance with respect to a fixed message, rather than more generalised collision-resistance.
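As a quick consistency check (our own hypothetical sketch, using the byte-lengths just quoted), the per-packet overheads of Eqn 8 reproduce the β-node entries of Table 1:

```python
H, S_NT, S_OT, M = 10, 128, 136, 2     # hash, signature lengths (bytes), m

def omega(i, n, N):
    if i == 0:
        return S_NT + M * H            # bootstrap packet
    if i == 'alpha' or i == n - 1:
        return 2 * H
    if i == n:
        return 0                       # tail packet carries no hashes
    if i == 'beta':
        return S_OT + (M + N + 3) * H  # signed WL-leaf anchor
    return 3 * H                       # interior chain packets

for N, n in [(4, 5), (2, 10), (1, 20)]:
    print(N, n, omega('beta', n, N))   # 226, 206 and 196, as in Table 1
```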
Note the lessened verification delay compared to unassisted GM chaining, while retaining the general efficiency of the augmented chain structure. This configurable reduction of the receiver-side delay is attained while simultaneously incorporating (random) loss-tolerance, the latter of which cannot be addressed by the GR and GR one-time formulations. The trade-off between Ω(recv) and Ωβ is also interesting, as is the significantly lessened communications overhead compared with the unassisted WL star configuration. Our scheme can therefore be said to possess functional advantages compared with previously reported formulations.
4.4 Computation Overhead

The signing overhead can also be expressed in terms of the constituent operations ie: (1) S′ computations on a stream basis, (2) S computations on a meta-group basis, and (3) H computations on a group/packet basis. We presume the necessity of only a single S′ per streaming session (thereafter represented as [S′] to denote amortisation over multiple meta-groups) and also prior generation of the one-time key-pair sequence. Each meta-group then requires N(n+2) H-computations to account for all the data packets, with N(n+1) required for GM chain construction; as can be seen from Eqns 4 and 5. Meta-group formation also requires Ω(WL) = S + (N+2)H associated with WL star computation from Eqn 6. Buffering of packet-level hashes as previously discussed allows for significant efficiency gains, thereby allowing analysis in terms of incremental overheads during GM chaining, ie $\Delta\Omega_i = H$ (for i ≠ n) and $\Delta\Omega_n = 0$. This results in a total meta-group overhead of:
$$\Omega^k = [S'] + S + \left(N(n+2) + 2\right)H \qquad (9)$$
associated with sender-side signature generation. The verification overhead is likewise expressible in terms of: (1) V′ computations on a stream basis, (2) V computations on a meta-group basis, and (3) H computations on a group/packet basis; with [V′] denoting amortisation over multiple meta-groups. Presumption of packet-level hash-buffering then allows for incremental overheads $\Delta\Psi_i = H$ (for i ≠ n), $\Delta\Psi_n = 0$ and $\Delta\Psi(\mathrm{WL}) = V + 3H$. This results in:
$$\Psi^k = [V'] + V + \left(N(n+1) + 3\right)H \qquad (10)$$
associated with receiver-side signature verification. It should be emphasised that H retrieval rather than recomputation is especially significant for (N, n) configurations with relatively sparse WL star connections of long GM chains. Our framework—once again in three (N, n) configurations—is compared with previously published protocols over multiple µ repetitions of a 20-packet meta-group, resulting in Table 2 ie:
Table 2. Signing and verification overheads for µ repetitions of a 20-packet meta-group

Scheme                   Ω (sign)                          Ψ (verify)                        Loss
GR                       S′ + 20µH                         V′ + 20µH                         None
GR (one-time)            S′ + 20µS                         V′ + 20µV                         None
WL star                  µ·(S′ + 21H) / S′ + µ·(S + 21H)   µ·(V′ + 40H) / V′ + µ·(V + 40H)   Worst-case
GM chain                 µ·(S′ + 24H) / S′ + µ·(S + 24H)   µ·(V′ + 19H) / V′ + µ·(V + 19H)   Random
Proposed scheme (4, 5)   S′ + µ·(S + 30H)                  V′ + µ·(V + 27H)                  Random
Proposed scheme (2, 10)  S′ + µ·(S + 26H)                  V′ + µ·(V + 25H)                  Random
Proposed scheme (1, 20)  S′ + µ·(S + 24H)                  V′ + µ·(V + 24H)                  Random
We present the WL star and GM chain overheads in terms of: (1) the originally reported two-layer (number-theoretic signatures and H-graphs) frameworks, and (2) a modified two-layer framework incorporating one-time signatures; the latter of which is clearly more efficient. Note the progressively higher overheads corresponding to more frequent WL star computations. This nevertheless still results in receiver-side verification significantly more efficient than the unassisted WL star configuration. Comparison with unassisted GM chaining is obviously unfavourable, with the overhead difference interpretable as the cost of ensuring that β node verification is loss-tolerant. As with Table 1, the GR and GR one-time formulations (both of which are loss-intolerant) are the most computation-efficient. The presented framework is by contrast robustly loss-tolerant, while featuring configurable overheads not significantly greater than the optimal GM chain configuration.
5 Conclusions

The presented multi-layer stream authentication framework effectively leverages the desirable characteristics of the underlying mechanisms ie: (1) preemptive one-time signature-chains, (2) robustly loss-tolerant WL stars, and (3) efficiently loss-tolerant GM chains. This enables flexible configurability, ie emphasising: (1) buffering or communications efficiency (ref Table 1), and (2) packet-loss tolerance or computation efficiency (ref Table 2); the relative importance of which is variably dependent on the operational scenario. Our integrated framework also compensates for the functional shortcomings of the constituent mechanisms ie: (1) high overhead of WL stars, and (2) inherent loss-sensitivity of the GM β nodes. Note the relatively heavy authenticative overhead of the β nodes (resulting from signature and WL star related parameter affixation) compared to the bulk data packets. This authenticative non-uniformity is a consequence of the GM chain structure, and is entirely appropriate for various real-life datastreams, ie as specified
by the Moving Picture Experts Group (MPEG) standard. MPEG streams are, in fact, constituted from functionally distinct data constructs ie: (1) (I)ntra frames, which are essentially static Joint Photographic Experts Group (JPEG) images, (2) (P)redictive frames derived from previous reference data, and (3) (B)idirectional frames derived from previous/subsequent reference data; the first of which is by far the largest. The functional importance and large size of the I-frames allows them to carry the β authenticative overhead without resulting in unreasonably low data-to-(data+authenticator) ratios. We look forward to exploring the applicability of the proposed formulation in an upcoming publication.
References

1. R Gennaro & P Rohatgi, "How to Sign Digital Streams", Adv in Cryptology – CRYPTO '97, Springer-Verlag, Berlin, pp 180–197, 1997.
2. CK Wong & SS Lam, "Digital Signatures for Flows and Multicasts", Comp Sc Tech Rep TR-98-15, U Texas at Austin. Also in IEEE ICNP '98, 1998.
3. P Golle & N Modadugu, "Authenticating Streamed Data in the Presence of Random Packet Loss", ISOC Network and Distributed System Security Symp, pp 13–22, 2001.
4. P Rohatgi, "A Compact and Fast Hybrid Signature Scheme for Multicast Packet Authentication and Other Protocols", 6th ACM Conf on Comp and Comms Security, pp 93–100, 1999.
5. A Perrig, R Canetti, JD Tygar & D Song, "Efficient Authentication and Signing of Multicast Streams over Lossy Channels", IEEE Symp on Security and Privacy, pp 56–73, 2000.
6. S Miner & J Staddon, "Graph-based Authentication of Digital Streams", IEEE Symp on Security and Privacy, 2001.
7. A Perrig, "The BiBa One-Time Signature and Broadcast Authentication Protocol", 8th ACM Conf on Comp and Comms Security, pp 28–37, 2001.
8. V Paxson, "End-to-end Internet Packet Dynamics", IEEE/ACM Trans on Networking, 7, pp 277–292, 1999.
9. M Borella, D Swider, S Uludag & G Brewster, "Internet Packet Loss: Measurement and Implications for End-to-end QoS", Intl Conf Parallel Processing, 1998.
10. S Even, O Goldreich & S Micali, "On-line/Off-line Digital Signatures", J Cryptology, 9(1), pp 35–67, 1996.
11. RC Merkle, "A Digital Signature based on a Conventional Encryption Function", Adv in Cryptology – CRYPTO '87, LNCS 293, pp 369–378, 1987.
12. RC Merkle, "A Certified Digital Signature", Adv in Cryptology – CRYPTO '89, LNCS 435, pp 218–238, 1989.
13. S Goldwasser, S Micali & R Rivest, "A Digital Signature Scheme Secure Against Adaptive Chosen Message Attack", SIAM J Comp, 17(2), pp 281–308, 1988.
14. MO Rabin, "Digital Signatures and Public-Key Functions as Intractable as Factorization", Comp Sc Tech Rep MIT/LCS/TR-212, MIT, 1979.
15. GS Poh, "Loss-Tolerant Stream Authentication Based on One-Time Signatures and Hash-Graphs", Comp Sc Masters Thesis, Universiti Sains Malaysia.
Confidential Transmission of Lossless Visual Data: Experimental Modelling and Optimization

Bubi G. Flepp-Stars¹, Herbert Stögner¹, and Andreas Uhl¹,²
¹ Carinthia Tech Institute, School of Telematics & Network Engineering, Primoschgasse 8, A-9020 Klagenfurt, Austria
² Salzburg University, Department of Scientific Computing, Jakob-Haringerstr. 2, A-5020 Salzburg, Austria
Abstract. We discuss confidential transmission of visual data in lossless format. Based on exemplary experimental data, we model the costs of the three main steps of such a scheme, i.e. compression, encryption, and transmission. Subsequently, we derive a cost optimal strategy for employment of confidential visual data transmission. Our approach may be applied to any environment provided that the costs of the single main steps in the target environment are known.
1 Introduction
The organization of large amounts of visual content in database infrastructures which are distributed worldwide shows very clearly that there is urgent need to provide and protect the confidentiality of sensitive visual data (e.g. patient-related medical image data) when transmitted over networks of any kind. The storage and transmission of visual image data in lossless formats differs significantly from storage and transmission of common visual data for multimedia applications. It is constrained by the fact that in lossless environments the amount of compression is typically limited to a factor of about 2-3, in contrast to factors of 100 or more achievable in lossy schemes [4]. There exist several reasons why a lossy representation may not be acceptable or a lossless one may be preferable:

– Due to requirements of the application, a loss of image data is not acceptable (e.g., in medical applications because of reasons related to legal aspects and diagnosis accuracy [17], in geographical information systems (GIS) due to the required integrity of the data).
– Due to the low processing power or limited energy resources of the involved hardware, compression and decompression of visual data may not be possible or desired at all (e.g. mobile and wireless clients in the context of pervasive/ubiquitous computing).
⋆ This artificial name represents the following group of students working on this project in the framework of the Multimedia 1 laboratory (winter term 2003/2004): G. Biebl, J. Burger, T. Freidl, J. Gruber, P. Leidner, R. Pfarrhofer, J. Podritschig, S. Rebernig, T. Salbrechter, and G. Stanossek.
– Due to the high bandwidth available at the communication channel, lossy compression is not necessary and lossless formats are therefore preferred.

A possible solution to the first issue is to use selective compression, where parts of the image that contain crucial information (e.g. microcalcifications in mammograms) are compressed in a lossless way whereas regions containing unimportant information are compressed in a lossy manner [2]. However, legal questions and problems related to the usability of such a scheme remain unsolved. Therefore, we restrict the discussion to lossless data formats. In the context of applications involving visual data transmission, we may distinguish whether the image data is given as plain image data (e.g. after being captured by a digitizer or CCD array) or in form of a bitstream resulting from prior compression. The first application scenario has been denoted as "on-line" (e.g. video conferencing, surveillance applications) and the latter "off-line" (since the respective applications like video on demand or photo CD-ROM are purely retrieval based) [9,14]. Note that the involvement of compression technology might not be mandatory in the on-line scenario, especially when focusing on lossless techniques, due to the limited compression gain in this case. In this work we focus on computationally efficient schemes to provide confidentiality for the transmission of visual data in a lossless on-line scenario. In particular, we seek to optimize the interplay of the three main steps of such a scheme, i.e. compression, encryption, and transmission, in the sense of minimal computational effort and energy consumption. Based on exemplary experimental data, we model the costs of the three involved processing steps and subsequently derive a cost-optimal strategy for employment of confidential visual data transmission in the target environment. Specifically, we address the question whether the compression stage is required in any case to result in an overall cost-optimal scheme or not. Additionally, we consider "selective encryption" (SE) for trading off computational complexity for security, where application-specific data structures are exploited to create more efficient encryption systems (see e.g. SE of MPEG video streams [1,6,11,13,15,18], of wavelet-based encoded imagery [3,7,10,18], and of quadtree decomposed images [3]). The basic idea of SE is to protect (i.e. encrypt) the visually most important parts of an image or video representation only, relying on a secure but slow "classical" cipher. Section 2 provides an introduction to principles of confidential transmission of visual data. In Section 3 we present the building blocks of our target environment and model the respective computational costs based on experimental data. Cost-optimal employment of several different configurations of confidential visual data transmission using full encryption and SE is derived in Section 4. In the conclusion we summarize the main results and give an outlook to further work in this direction.
2 Principles of Confidential Transmission of Visual Data
Images and videos (often denoted as visual data) are data types which require enormous storage capacity or transmission bandwidth due to the large amount of data involved. In order to provide reasonable execution performance for encrypting such large amounts of data, usually symmetric encryption is used in practical applications. On the other hand, key management is a difficult issue in symmetric systems. As done in most current applications with a demand for confidentiality, public-key techniques like RSA are used for key exchange or digital signature generation only (such schemes are usually denoted as "hybrid"). The Advanced Encryption Standard AES [5] is a recent symmetric block cipher which is going to replace the Data Encryption Standard DES in all applications where confidentiality is really the aim. AES operates on 128-bit blocks of data and uses 128, 192, or 256 bit keys. The RSA algorithm [12] is the most popular public-key algorithm nowadays and operates with key sizes of 512 bit upwards. For achieving confidentiality, the data is encrypted using the public key of the receiver, who may decrypt the message using his private key. Vice versa, for generating a digital signature, the data is signed (i.e. encrypted) using the private key of the signer. The signature may be verified with the corresponding public key. We use AES and RSA as the basic cryptographic building blocks in Section 3.

There are two ways to provide confidentiality to a transmission application. First, confidentiality is based on mechanisms provided by the underlying computational infrastructure. The advantage is complete transparency, i.e. the user or a specific application does not have to take care about confidentiality. The obvious disadvantage is that confidentiality is provided for all applications, no matter if required or not, and that it is not possible to exploit specific properties of certain applications. To give a concrete example, consider the distributed database infrastructure mentioned in the introduction. If the connections among the components are based on TCP/IP internet-connections (which are not confidential by themselves, of course), confidentiality can be provided by creating a Virtual Private Network (VPN) using IPSec (which extends the IP protocol by adding confidentiality and integrity features). In this case, the entire visual data is encrypted for each transmission, which puts a severe load on the encryption system. The second possibility is to provide confidentiality on the application layer. Here, only applications and services are secured which have a demand for confidentiality. The disadvantage is that each application needs to take care of confidentiality on its own; the advantage is that specific properties of certain applications may be exploited to create more efficient encryption schemes, or that encryption is omitted if not required. For example, all approaches involving selective encryption are classified into the second category, since SE takes advantage of the redundancy in visual data and therefore takes place at the application level.
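For illustration only, a sketch of the hybrid approach described above (ours, not the paper's implementation; it assumes the third-party Python 'cryptography' package and uses the modern AES-GCM mode, whereas the paper's own measurements use plain AES and RSA implementations):

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def hybrid_encrypt(image_bytes, receiver_public_key):
    aes_key = AESGCM.generate_key(bit_length=128)    # fresh per-transmission key
    nonce = os.urandom(12)
    ciphertext = AESGCM(aes_key).encrypt(nonce, image_bytes, None)
    wrapped_key = receiver_public_key.encrypt(       # RSA only wraps the key
        aes_key,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None))
    return wrapped_key, nonce, ciphertext
```

The receiver unwraps the AES key with the private RSA key and then decrypts the payload; only the short key, never the bulk visual data, passes through the slow public-key primitive.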
3 Basic Building Blocks: Compression, Encryption, and Transmission
In any (storage or) transmission application, no matter whether lossy or lossless, compression always has to be performed prior to encryption, since the statistical properties of encrypted data prevent compression from being applied successfully. Moreover, the reduced amount of data after compression decreases the computational demand of the subsequent encryption stage. Therefore, the processing chain has a fixed order (compression – encryption – transmission). In the following subsections, we introduce the basic technology and model the costs, in terms of computational demand, of the stages in the processing chain in the target environment. The hardware platform used is a 996 MHz Intel Pentium III with 128 MB RAM; the network is 100 Mbit/s Ethernet.
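The reason for this fixed order can be illustrated with a small experiment; the following sketch uses Python's zlib, with random bytes standing in for ciphertext:

```python
import os
import zlib

# structured data (a stand-in for raw image bitplanes) compresses well ...
plain = (b"\x00" * 500 + bytes(range(256))) * 100
print(len(plain), "->", len(zlib.compress(plain)))

# ... whereas high-entropy data, which is what a good cipher produces,
# does not shrink at all (it may even grow slightly)
noise = os.urandom(len(plain))
print(len(noise), "->", len(zlib.compress(noise)))
```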
3.1 Lossless Compression
In order to model a compression scheme which trades off compression efficiency against computational demand, we use the JBIG reference implementation in a selective mode where a varying number of bitplanes of 8 bpp grayscale images is compressed. Our scheme thus ranges from applying no compression at all to compressing a certain number of bitplanes with JBIG (starting from the MSB bitplane, since the achievable compression ratio is highest there). Finally, instead of applying JBIG to all bitplanes, JPEG 2000 (the Jasper C implementation) in lossless mode is used, since its compression results are better than those of full JBIG coding. A set of 20 test images in two sizes is subjected to all compression settings, and the obtained file sizes and compression timings are averaged for the 512 × 512 and 1280 × 1024 pixels images, respectively. The resulting measurement points as depicted in Fig. 1 clearly show the decreasing compression time for increasing resulting data size. In order to obtain a closed expression that can be handled easily in an analytical way for the optimization, we approximately interpolate the measurement points by polynomials (of 6th and 5th order, respectively) and obtain the following formulas (also visualized in Fig. 1) for the compression behaviour of our test system (where x is the resulting data size in 100 KByte after compression and t is the compression time in seconds):
t = 7.32x^6 − 90.89x^5 + 463.02x^4 − 1237.72x^3 + 1829.28x^2 − 1417.22x + 450.73  (1)

t = −0.07x^5 + 1.73x^4 − 21.99x^3 + 153.94x^2 − 562.09x + 841.21  (2)

Equations (1) and (2) correspond to Figs. 1.a and 1.b, respectively.
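Such closed-form models can be obtained by an ordinary least-squares fit. The following sketch mirrors the fit behind Equation (2); the measurement pairs are hypothetical stand-ins, not our actual averaged data:

```python
import numpy as np

# hypothetical (size, time) measurement points for the 1280x1024 images:
# resulting size in 100 KByte after compressing 0..8 bitplanes, and the
# averaged compression time in seconds (illustrative values only)
size = np.array([12.9, 12.1, 11.2, 10.3, 9.4, 8.6, 7.9, 7.3, 6.8])
time = np.array([0.0, 0.4, 0.9, 1.4, 1.9, 2.4, 2.9, 3.3, 3.6])

# least-squares fit of a 5th-order polynomial, as in Equation (2)
model = np.poly1d(np.polyfit(size, time, deg=5))

print(model(10.0))  # predicted compression time for a 1000 KByte result
```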
Fig. 1. Tradeoff between compression timings and the resulting data amount after compression. (a) 512×512 pixels images; (b) 1280×1024 pixels images. (Plots of compression time in seconds vs. resulting size in 100 KByte, with measurement points for the original image, JBIG applied to 1 up to 7 bitplanes, and Jasper, together with the polynomial approximation.)
3.2 Encryption and Transmission
In order to model encryption behaviour, we use the C++ RSA implementation given in [16] and the C++ AES implementation available on the web (http://fp.gladman.plus.com/cryptography_technology/rijndael/). Equations (3) to (6) relate the amount of data encrypted, in 100 KByte, to the processing time in seconds; the corresponding curves are visualized in Fig. 2.a. In contrast to the compression case, we notice a purely linear behaviour. Note that RSA is employed in order to obtain a rich variety in the overall behaviour of the processing chain; in practice, one would hardly use a public-key system to encrypt visual data. Of course, the time demand of RSA is several orders of magnitude higher than that of AES. Performance differences among encryption schemes of the exhibited magnitude could, however, also result from applying hardware- or software-based approaches in real-life systems.

t = 5.81x   (RSA 512 bit)   (3)
t = 19.72x  (RSA 2048 bit)  (4)
t = 0.01x   (AES 128 bit)   (5)
t = 0.02x   (AES 256 bit)   (6)
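The linear behaviour of the symmetric cipher is easily reproduced by a micro-benchmark; the sketch below assumes the Python cryptography package rather than the C++ implementations we actually measured:

```python
import os
import time
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key, nonce = os.urandom(16), os.urandom(16)

# encrypt growing amounts of data with AES-128 in CTR mode and time it;
# the measured time grows roughly linearly with the data size, as in
# Equations (5) and (6)
for kb in (100, 500, 1000, 2000):
    data = os.urandom(kb * 1024)
    enc = Cipher(algorithms.AES(key), modes.CTR(nonce)).encryptor()
    t0 = time.perf_counter()
    ciphertext = enc.update(data) + enc.finalize()
    print(kb, "KByte:", time.perf_counter() - t0, "s")
```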
Finally, the transmission stage is modeled by using the message passing library PVM for sending data between two computers over the Ethernet connection. We use four different modes. The routine pvm_send sends a message stored in the active send buffer to the PVM process identified by tid. On the other hand, the routine pvm_psend takes a pointer to a buffer buf, its length len, and its data type datatype, and sends this data directly to the PVM task identified by tid. Data is sent as a whole block in mode "ganz", whereas it is sent in pieces of 1 KByte in mode "teil". Again, the data size is varied, and the time required to transmit the data is measured and fitted by a polynomial (see Fig. 2.b and Equations (7)-(10)). We notice small differences among the transmission schemes, with pvm_psend in "teil" mode being the most expensive one. All schemes are dominated by the linear term; higher order terms vanish with the precision considered.

t = 0.01x    (pvm_send, teil)    (7)
t = 0.02x    (pvm_psend, teil)   (8)
t = 0.007x   (pvm_send, ganz)    (9)
t = 0.01x    (pvm_psend, ganz)   (10)

Fig. 2. Processing timings for varying data amount. (a) Encryption: key generation and encryption with RSA (512 and 2048 bits keylength) and with AES (128 and 256 bits keylength); (b) Transmission: sending time for send/psend in the "ganz" and "teil" variants. (Plots of duration in seconds vs. size in 100 KByte.)
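The "ganz" vs. "teil" distinction can be mimicked without PVM over an ordinary local socket; the following is a hypothetical stand-in for, not a reproduction of, our measurement setup:

```python
import socket
import threading
import time

def drain(conn, n):
    # receive exactly n bytes so the sender is not blocked by full buffers
    got = 0
    while got < n:
        got += len(conn.recv(65536))

data = b"\x00" * (1000 * 1024)          # a 1000 KByte test message
sender, receiver = socket.socketpair()

for chunk in (None, 1024):              # None: "ganz"; 1024: 1 KByte "teil"
    t = threading.Thread(target=drain, args=(receiver, len(data)))
    t.start()
    t0 = time.perf_counter()
    if chunk is None:
        sender.sendall(data)            # whole block
    else:
        for i in range(0, len(data), chunk):
            sender.sendall(data[i:i + chunk])
    t.join()
    print(chunk, time.perf_counter() - t0)  # chunked sending costs more
```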
Summarizing, we notice that AES encryption and transmission operate on a similar level of time demand, whereas RSA is much more expensive. As expected, both processing stages exhibit linear behaviour.
4 Cost Optimal Configuration of Confidential Visual Data Transmission
The processing chain for confidential transmission of visual data as defined in the last section (compression – encryption – transmission) may have a fixed order, but it does not necessarily involve all stages to their full extent in every case. For example, in SE only a subset of the data is subjected to encryption.
However, the entire system might have to maintain a certain level of security, which puts a lower bound on the amount of data subjected to encryption. Similarly, compression need not necessarily be applied to the full amount of data, or might be omitted altogether for complexity reasons. Again, limited transmission bandwidth may impose a constraint on the possible data rate and therefore enforce the employment of compression to a certain extent. The aim of this section is to identify a cost optimal way (in terms of processing time) to operate our processing chain. Concerning constraints imposed by the target environment, we assume the channel to be of infinite bandwidth and SE to be secure enough for the target application.

The first configuration considered is to process 1280×1024 images and to use AES as cipher. In Fig. 3 we clearly see that the overall shape of the resulting time curves closely matches the curve modeling compression behaviour (see Fig. 1.b). In particular, the curves are almost monotonically decreasing and attain their minima at the right edge of the x-axis, corresponding to a data size of about 1290 KB. Matching this result with Fig. 1.b, we identify the optimal operation mode as performing no compression at all. This is a surprising result at first sight, of course; however, considering the low processing cost of encryption and transmission as compared to compression, it becomes obvious.
Fig. 3. Overall timings for varying data amount processing 1280×1024 pixels images using AES. (a) AES(256) with psend "teil"; (b) AES(256) with send "ganz". (Plots of overall time in seconds vs. size in 100 KBytes.)
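The reasoning of this section can be mimicked in a few lines. The sketch below combines the linear models of Equations (3)-(10) with a hypothetical, well-behaved compression-time model (the printed polynomial coefficients are rounded too coarsely for direct numerical reuse) and recovers the qualitative result for AES, as well as — anticipating the configuration discussed next — for RSA:

```python
import numpy as np

# hypothetical compression-time model for 1280x1024 images: ~0 s when no
# compression is applied (x ~ 12.9, in 100 KByte) and ~3.4 s for maximal
# compression (x ~ 6.8); monotonically decreasing in the resulting size x
t_comp = lambda x: 0.09 * (12.9 - x) ** 2

t_aes  = lambda x: 0.02 * x            # AES 256 bit, Equation (6)
t_rsa  = lambda x: 5.81 * x            # RSA 512 bit, Equation (3)
t_send = lambda x: 0.02 * x            # pvm_psend "teil", Equation (8)

xs = np.linspace(6.8, 12.9, 1000)      # attainable result sizes, 100 KByte

# grid search for the cost minimal operating point of each configuration
aes_opt = xs[np.argmin(t_comp(xs) + t_aes(xs) + t_send(xs))]
rsa_opt = xs[np.argmin(t_comp(xs) + t_rsa(xs) + t_send(xs))]
print(aes_opt)  # near the right edge: (almost) no compression with AES
print(rsa_opt)  # at the left edge: maximal compression with RSA
```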
The only difference between Figs. 3.a and 3.b is the employment of the most and least expensive transmission modes. As expected, this results in almost no difference in the final overall time curves. The same is true whether 128 or 256 bit AES is used (figures are not shown). As a consequence, the formulas describing
the overall behaviour are identical to Equation (2), with the linear terms replaced by −562.05x and −562.07x, respectively.

The second configuration investigated is to process 1280×1024 images and to use RSA as cipher. Fig. 4 shows entirely different shapes from those seen before. Almost linearly increasing curves describe the time behaviour of the system, attaining their respective minima at the left edge of the x-axis, corresponding to a data size of 660 KB. The fact that these curves are monotonically increasing may easily be verified by showing the first derivative of the corresponding formulas to be larger than zero over the entire range considered. Relating this finding to Fig. 1.b, the optimal operation mode is to perform maximal compression, in this case lossless JPEG 2000.
Fig. 4. Overall timings for varying data amount processing 1280×1024 pixels images using RSA. (a) RSA(512) with psend "teil"; (b) RSA(2048) with psend "teil". (Plots of overall time in seconds vs. size in 100 KBytes.)
The employment of RSA with differently sized keys (i.e. 512 bit for Fig. 4.a and 2048 bit for Fig. 4.b) changes neither the shape of the curve nor the position of the minimum; the curve is simply shifted upwards by a certain amount (which dramatically shows the dominance of encryption in this configuration). The formulas describing the overall behaviour are again identical to Equation (2), with the linear terms replaced by −556.26x and −542.35x, respectively.

Having identified the optimal operation modes of our test system with full encryption, we now focus on selective encryption (SE). In this context, SE encrypts only a subset of the bitplanes of the binary representation of the visual data. For results concerning the security of this approach see [8,14]. Besides the interesting properties of this technique itself, we are interested in whether there exist optimal operation modes where partial compression is required (instead of no compression with AES and full compression with RSA). Selective encryption
is modeled by reducing the processing time of encryption to the corresponding fraction, i.e. by modifying Equations (3)-(6) accordingly (a sketch of this modification is given below). Fig. 5 shows the shapes of the resulting overall curves when processing 1280 × 1024 pixels images using selective encryption with RSA (512 bit key).
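Selective encryption enters the cost model simply as a factor on the encryption term; the sketch below (reusing the hypothetical compression-time model from the earlier sketch) shows how reducing the encrypted fraction p moves the optimum into the interior of the size range:

```python
import numpy as np

# hypothetical compression-time model for 1280x1024 images, as before
t_comp = lambda x: 0.09 * (12.9 - x) ** 2
t_rsa  = lambda x: 5.81 * x            # RSA 512 bit, Equation (3)
t_send = lambda x: 0.02 * x            # pvm_psend "teil", Equation (8)

xs = np.linspace(6.8, 12.9, 1000)      # attainable result sizes, 100 KByte
for p in (1.0, 0.2, 0.125):            # fraction of the data encrypted
    total = t_comp(xs) + p * t_rsa(xs) + t_send(xs)
    # for p = 1.0 and p = 0.2 the optimum stays at the left edge (maximal
    # compression); for p = 0.125 it moves into the interior of the range
    print(p, xs[np.argmin(total)])
```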
Fig. 5. Overall timings for varying data amount processing 1280×1024 pixels images using 512 bit RSA with selective encryption. (a) 20 % encrypted with psend "teil"; (b) 12.5 % encrypted with psend "teil". (Plots of overall time in seconds vs. size in 100 KBytes.)
Whereas encrypting 20 % of the visual data still results in a monotonically increasing curve (Fig. 5.a), the reduction of encryption to 1/8 (i.e. 12.5 %) of the data results in a curve showing a local minimum close to the middle of the range of the x-axis (Fig. 5.b). Equations (11) and (12) show the corresponding formulas for the overall time behaviour.

t1(x) = 0.71x^5 + 1.73x^4 − 21.99x^3 + 153.94x^2 − 560.91x + 841.21  (11)
t2(x) = 0.71x^5 + 1.73x^4 − 21.99x^3 + 153.94x^2 − 561.34x + 841.21  (12)
In order to analytically determine the point of optimal configuration for SE restricted to 12.5 % encryption, we need to determine the local minimum of Equation (12). To do so, we set the first derivative to zero,

t2'(x) = 3.55x^4 + 6.91x^3 − 65.96x^2 + 307.88x − 561.41 = 0,  (13)
and obtain two roots in the area of interest [6.6, 13] (compare Fig. 1.b): x = 7.72 and x = 9.71. Obviously, the local minimum we are looking for is attained at x = 9.71 (which can be confirmed by verifying t2''(9.71) > 0, and which is also shown in Fig. 5.b). Relating this result to Fig. 1.b, we may deduce that the optimal configuration of the entire system, when compressing 1280 × 1024 pixels images and applying selective encryption with 512 bit RSA to 12.5 % of the data, is to compress 3 out of 8 bitplanes with JBIG.

Finally, we cover the case of processing 512 × 512 pixels images with 512 bit RSA. Fig. 6 shows overall time curves of similar shape to those for the larger images; Equations (14) and (15) show the corresponding formulas. Note that only relatively small differences in the linear term cause the significantly different shapes as compared to the "compression" Equation (1). Again we focus on the interesting case of encrypting 12.5 % of the data (Fig. 6.b).
Fig. 6. Overall timings for varying data amount processing 512×512 pixels images using 512 bit RSA with selective encryption. (a) 20 % encrypted with psend "teil"; (b) 12.5 % encrypted with psend "teil". (Plots of overall time in seconds vs. size in 100 KBytes.)
t3(x) = 7.32x^6 − 90.89x^5 + 463.02x^4 − 1237.72x^3 + 1829.28x^2 − 1416.04x + 450.73  (14)
t4(x) = 7.32x^6 − 90.89x^5 + 463.02x^4 − 1237.72x^3 + 1829.28x^2 − 1416.47x + 450.73  (15)
Similarly, in order to determine the point of optimal configuration, we need to determine the local minimum of Equation (15). To do so, we set the first derivative to zero,

t4'(x) = 43.93x^5 − 454.43x^4 + 1852.1x^3 − 3713.16x^2 + 3658.56x − 1416.47 = 0,  (16)
and obtain two roots in the area of interest [1.4, 2.6] (compare Fig. 1.a): x = 1.64 and x = 2.10. Obviously, the local minimum we are looking for is attained at x = 2.10 (which can be confirmed by verifying t4''(2.10) > 0, and which is also shown in Fig. 6.b). As before, we identify the optimal operation mode of the entire system as applying selective compression to 2 out of 8 bitplanes (as seen from Fig. 1.a).
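The root-finding step behind Equations (13) and (16) is easily automated. Since the printed coefficients are rounded too coarsely to reproduce the roots numerically, the sketch below uses a hypothetical cubic cost curve whose critical points are constructed to lie at x = 7.72 and x = 9.71, and keeps the one with positive second derivative:

```python
import numpy as np

# hypothetical overall-cost polynomial whose derivative vanishes exactly
# at x = 7.72 and x = 9.71, mimicking the structure of Equation (12)
t = np.poly1d([1.0, -26.145, 224.88, 0.0])

dt, ddt = t.deriv(), t.deriv(2)   # first derivative (cf. Eq. (13)), second

# real critical points inside the attainable size range [6.6, 13]
crit = [r.real for r in dt.roots
        if abs(r.imag) < 1e-9 and 6.6 <= r.real <= 13]

# the local minimum is the critical point with positive second derivative
x_min = min(r for r in crit if ddt(r) > 0)
print(sorted(crit), x_min)        # -> roots near 7.72 and 9.71; min at 9.71
```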
5 Conclusion
In the context of confidential transmission of visual data in lossless formats, we have modeled the costs of the three main steps of such a scheme, i.e. compression, encryption, and transmission. Based on the behaviour of our exemplary system, we have shown that, depending on the type of encryption involved, the optimal configuration of the entire system may be to operate without compression, with full compression, or even with partial compression. In future work we will additionally include constraints coming from the target environment (e.g. limited channel bandwidth, a required level of security in SE) in our optimization. Additionally, we will model the dependency between selective compression and selective encryption more explicitly.

Acknowledgements. This work has been partially supported by the Austrian Science Fund FWF, project 15170.
References

[1] A.M. Alattar, G.I. Al-Regib, and S.A. Al-Semari. Improved selective encryption techniques for secure transmission of MPEG video bit-streams. In Proceedings of the 1999 IEEE International Conference on Image Processing (ICIP'99), volume 4, pages 256–260, Kobe, Japan, October 1999. IEEE Signal Processing Society.
[2] A. Bruckmann and A. Uhl. Selective medical image compression techniques for telemedical and archiving applications. Computers in Biology and Medicine, 30(3):153–169, 2000.
[3] H. Cheng and X. Li. Partial encryption of compressed images and videos. IEEE Transactions on Signal Processing, 48(8):2439–2451, 2000.
[4] P.C. Cosman, R.M. Gray, and R.A. Olshen. Evaluating quality of compressed medical images: SNR, subjective rating, and diagnostic accuracy. Proceedings of the IEEE, 82(6):919–932, 1994.
[5] J. Daemen and V. Rijmen. The Design of Rijndael: AES — The Advanced Encryption Standard. Springer-Verlag, 2002.
[6] T. Kunkelmann. Applying encryption to video communication. In Proceedings of the Multimedia and Security Workshop at ACM Multimedia '98, pages 41–47, Bristol, England, September 1998.
[7] R. Norcen, M. Podesser, A. Pommer, H.-P. Schmidt, and A. Uhl. Confidential storage and transmission of medical image data. Computers in Biology and Medicine, 33(3):277–292, 2003.
[8] M. Podesser, H.-P. Schmidt, and A. Uhl. Selective bitplane encryption for secure transmission of image data in mobile environments. In CD-ROM Proceedings of the 5th IEEE Nordic Signal Processing Symposium (NORSIG 2002), Tromso-Trondheim, Norway, October 2002. IEEE Norway Section. File cr1037.pdf.
[9] A. Pommer and A. Uhl. Application scenarios for selective encryption of visual data. In J. Dittmann, J. Fridrich, and P. Wohlmacher, editors, Multimedia and Security Workshop, ACM Multimedia, pages 71–74, Juan-les-Pins, France, December 2002.
[10] A. Pommer and A. Uhl. Selective encryption of wavelet packet subband structures for secure transmission of visual data. In J. Dittmann, J. Fridrich, and P. Wohlmacher, editors, Multimedia and Security Workshop, ACM Multimedia, pages 67–70, Juan-les-Pins, France, December 2002.
[11] L. Qiao and K. Nahrstedt. Comparison of MPEG encryption algorithms. International Journal on Computers and Graphics (Special Issue on Data Security in Image Communication and Networks), 22(3):437–444, 1998.
[12] B. Schneier. Applied Cryptography (2nd edition): Protocols, Algorithms and Source Code in C. Wiley, 1996.
[13] C. Shi and B. Bhargava. A fast MPEG video encryption algorithm. In Proceedings of the Sixth ACM International Multimedia Conference, pages 81–88, Bristol, UK, September 1998.
[14] C.J. Skrepth and A. Uhl. Selective encryption of visual data: Classification of application scenarios and comparison of techniques for lossless environments. In B. Jerman-Blazic and T. Klobucar, editors, Advanced Communications and Multimedia Security, IFIP TC6/TC11 Sixth Joint Working Conference on Communications and Multimedia Security, CMS '02, pages 213–226, Portoroz, Slovenia, September 2002. Kluwer Academic Publishers.
[15] L. Tang. Methods for encrypting and decrypting MPEG video data efficiently. In Proceedings of ACM Multimedia 1996, pages 219–229, Boston, USA, November 1996.
[16] M. Welschenbach. Kryptographie in C und C++: Zahlentheoretische Grundlagen, Computer-Arithmetik mit großen Zahlen, kryptographische Tools. Springer-Verlag, 1998.
[17] S. Wong, L. Zaremba, D. Gooden, and H.K. Huang. Radiologic image compression – a review. Proceedings of the IEEE, 83(2):194–219, 1995.
[18] W. Zeng and S. Lei. Efficient frequency domain video scrambling for content access control. In Proceedings of the Seventh ACM International Multimedia Conference 1999, pages 285–293, Orlando, FL, USA, November 1999.