Lecture Notes in Computer Science Edited by G. Goos, J. Hartmanis, and J. van Leeuwen
2587
3
Berlin Heidelberg New York Barcelona Hong Kong London Milan Paris Tokyo
Pil Joong Lee
Chae Hoon Lim (Eds.)
Information Security and Cryptology – ICISC 2002 5th International Conference Seoul, Korea, November 28-29, 2002 Revised Papers
13
Series Editors Gerhard Goos, Karlsruhe University, Germany Juris Hartmanis, Cornell University, NY, USA Jan van Leeuwen, Utrecht University, The Netherlands Volume Editors Pil Joong Lee Pohnag University of Science and Technology San 31, Hyoja-dong, Nam-gu, Pohang, Kyungbuk, 790-784, Korea E-mail:
[email protected] Chae Hoon Lim Sejong University 98, Gunja-dong, Gwangjin-gu, Seoul, 143-747, Korea E-mail:
[email protected]
Cataloging-in-Publication Data applied for A catalog record for this book is available from the Library of Congress Bibliographic information published by Die Deutsche Bibliothek Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available in the Internet at
.
CR Subject Classification (1998): E.3, G.2.1, D.4.6, K.6.5, F.2.1, C.2, J.1 ISSN 0302-9743 ISBN 3-540-00716-4 Springer-Verlag Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law. Springer-Verlag Berlin Heidelberg New York a member of BertelsmannSpringer Science+Business Media GmbH http://www.springer.de © Springer-Verlag Berlin Heidelberg 2003 Printed in Germany Typesetting: Camera-ready by author, data conversion by DA-TeX Gerd Blumenstein Printed on acid-free paper SPIN 10872441 06/3142 543210
Preface
Organized by KIISC (the Korea Institute of Information Security and Cryptology) and sponsored by MIC (Ministry of Information and Communication, Korea), the Fifth International Conference on Information Security and Cryptology (ICISC 2002) was held at the Seoul Olympic Parktel in Seoul, Korea, November 28–29, 2002. This conference aims at providing a forum for the presentation of new results in research, development, and application in information security and cryptology. This is also intended to be a place where research information can be exchanged. The program committee received 142 submissions from 23 countries and regions (Australia, Austria, Belgium, Canada, China, Czech Republic, France, Finland, Germany, India, Iran, Ireland, Israel, Japan, Korea, Malaysia, Norway, Singapore, Spain, Sweden, Taiwan, UK, and USA), of which 35 were selected for presentation in 9 sessions. All submissions were anonymously reviewed by at least 3 experts in the relevant areas. There was one invited talk by David Naccache (Gemplus, France) on “Cut-&-Paste Attack with Java.” We are very grateful to all the program committee members who devoted much effort and valuable time to reading and selecting the papers. These proceedings contain the final version of each paper revised after the conference. Since the revised versions were not checked by the program committee rigorously, the authors must bear full responsibility for the contents of their papers. We also thank the external experts and the committee’s advisory members who assisted the program committee in evaluating various papers and apologize for not including their names here. Special thanks also go to all members of the Information Security Laboratory (http://oberon.postech.ac.kr) for their skillful and professional assistance in supporting the various tasks of the program chairs. We are also grateful to all the organizing committee members for their volunteer work. Finally, we would like to thank all the authors who submitted their papers to ICISC 2002 (including those whose submissions were not successful), as well as the conference partipants from around the world, for their support, which made this conference a big success.
December 2002
Pil Joong Lee, Chae Hoon Lim
ICISC 2002 2002 International Conference on Information Security and Cryptology Seoul Olympic Parktel, Seoul, Korea November 28–29, 2002
Organized by Korea Institute of Information Security and Cryptology (KIISC) (http://www.kiisc.or.kr)
Sponsored by MIC (Ministry of Information and Communication), Korea (http://www.mic.go.kr)
Organization
VII
Organization
General Chair Dongho Won
Sungkyunkwan University, Korea
Program Co-chairs Pil Joong Lee Chae Hoon Lim
Pohang University of Science & Technology, Korea Sejong University, Korea
Program Committee Zongduo Dai Ed Dawson Markus Jakobsson Kwangjo Kim Kwok-Yan Lam Arjen K. Lenstra Jongin Lim Atsuko Miyaji Sang Jae Moon David Naccache Christof Paar Choonsik Park Dingyi Pei Josef Pieprzyk David Pointcheval Bart Preneel Bimal Roy Kouichi Sakurai Tsuyoshi Takagi Serge Vaudenay Sung-Ming Yen
Academia Sinica, China Queensland University of Technology, Australia RSA Laboratories, USA ICU, Korea PrivyLink International Limited, Singapore Citibank, USA & Technische Universiteit Eindhoven, The Netherlands Korea University, Korea JAIST, Japan Kyungpook National University, Korea Gemplus Card International, France Ruhr-Universit¨ at Bochum, Germany ETRI, Korea Chinese Academy of Sciences, China Macquarie University, Australia ´ Ecole Normale Sup´erieure, France Katholieke Universiteit Leuven, Belgium Indian Statistical Institute, India Kyushu University, Japan Technische Universt¨at Darmstadt, Germany EPFL, Switzerland National Central University, Taiwan
Organizing Committee Chair Jong-Seon No
Seoul National University, Korea
VIII
Organization
Organizing Committee Jae-Cheol Ha Souhwan Jung Hyung-Woo Lee Sang Jin Lee Dong-Joon Shin Youjin Song
Korea Nazarene University, Korea Soongsil University, Korea Cheonan University, Korea Korea University, Korea Hanyang University, Korea Dongguk University, Korea
Table of Contents
Invited Talk Cut-&-Paste Attacks with JAVA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 Serge Lefranc and David Naccache
Digital Signatures Provably Secure Encrypt-then-Sign Composition in Hybrid Signcryption . . . 16 Ik Rae Jeong, Hee Yun Jeong, Hyun Sook Rhee, Dong Hoon Lee, and Jong In Lim New DSA-Verifiable Signcryption Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 Jun-Bum Shin, Kwangsu Lee, and Kyungah Shim Convertible Group Undeniable Signatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 Yuh-Dauh Lyuu and Ming-Luen Wu An Efficient Fail-Stop Signature Scheme Based on Factorization . . . . . . . . . . . . 62 Willy Susilo and Rei Safavi-Naini On the Security of the Li-Hwang-Lee-Tsai Threshold Group Signature Scheme . . . . . . . . . . . . . . 75 Guilin Wang
Internet Security System Specification Based Network Modeling for Survivability Testing Simulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 HyungJong Kim A Risk-Sensitive Intrusion Detection Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 Hai Jin, Jianhua Sun, Hao Chen, and Zongfen Han Applet Verification Strategies for RAM-Constrained Devices . . . . . . . . . . . . . . 118 Nils Maltesson, David Naccache, Elena Trichina, and Christophe Tymen
Block/Stream Ciphers Sliding Properties of the DES Key Schedule and Potential Extensions to the Slide Attacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 Raphael Chung-Wei Phan and Soichi Furuya
X
Table of Contents
Consistent Differential Patterns of Rijndael . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149 Beomsik Song and Jennifer Seberry Hardware Design and Analysis of Block Cipher Components . . . . . . . . . . . . . . 164 Lu Xiao and Howard M. Heys Higher Order Correlation Attacks, XL Algorithm and Cryptanalysis of Toyocrypt . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182 Nicolas T. Courtois
Stream Ciphers & Other Primitives On the Efficiency of the Clock Control Guessing Attack . . . . . . . . . . . . . . . . . . . 200 Erik Zenner Balanced Shrinking Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 Se Ah Choi and Kyeongcheol Yang On the Universal Hash Functions in Luby-Rackoff Cipher . . . . . . . . . . . . . . . . . 226 Tetsu Iwata and Kaoru Kurosawa Threshold MACs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237 Keith M. Martin, Josef Pieprzyk, Rei Safavi-Naini, Huaxiong Wang, and Peter R. Wild Ideal Threshold Schemes from MDS Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .253 Josef Pieprzyk and Xian-Mo Zhang
Efficient Implementations New Frobenius Expansions for Elliptic Curves with Efficient Endomorphisms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 264 Tae-Jun Park, Mun-Kyu Lee, and Kunsoo Park Efficient Computations of the Tate Pairing for the Large MOV Degrees . . . 283 Tetsuya Izu and Tsuyoshi Takagi Improved Techniques for Fast Exponentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298 Bodo M¨ oller Efficient Hardware Multiplicative Inverters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .313 Hyun-Gyu Kim and Hyeong-Cheol Oh
Side-Channel Attacks Ways to Enhance Differential Power Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 327 R´egis Bevan and Erik Knudsen
Table of Contents
XI
A Simple Power-Analysis (SPA) Attack on Implementations of the AES Key Expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343 Stefan Mangard A Reject Timing Attack on an IND-CCA2 Public-Key Cryptosystem . . . . . 359 Kouichi Sakurai and Tsuyoshi Takagi Hardware Fault Attack on RSA with CRT Revisited . . . . . . . . . . . . . . . . . . . . . . 374 Sung-Ming Yen, Sangjae Moon, and Jae-Cheol Ha Cryptographic Protocols I Receipt-Free Electronic Voting Scheme with a Tamper-Resistant Randomizer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389 Byoungcheon Lee and Kwangjo Kim Non-interactive Auction Scheme with Strong Privacy . . . . . . . . . . . . . . . . . . . . . 407 Kun Peng, Colin Boyd, Ed Dawson, and Kapali Viswanathan An Anonymous Buyer-Seller Watermarking Protocol with Anonymity Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 421 Hak Soo Ju, Hyun Jeong Kim, Dong Hoon Lee, and Jong In Lim Speeding Up Secure Sessions Establishment on the Internet . . . . . . . . . . . . . . . 433 Yaron Sella Cryptographic Protocols II On Fairness in Exchange Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 451 Olivier Markowitch, Dieter Gollmann, and Steve Kremer A Model for Embedding and Authorizing Digital Signatures in Printed Documents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 465 Jae-il Lee, Taekyoung Kwon, Sanghoon Song, and Jooseok Song A Dynamic Group Key Distribution Scheme with Flexible User Join . . . . . . 478 Hartono Kurnio, Luke McAven, Rei Safavi-Naini, and Huaxiong Wang Efficient Multicast Key Management for Stateless Receivers . . . . . . . . . . . . . . . 497 Ju Hee Ki, Hyun Jeong Kim, Dong Hoon Lee, and Chang Seop Park Biometrics Fingerprint Verification System Involving Smart Card . . . . . . . . . . . . . . . . . . . . 510 Younhee Gil, Daesung Moon, Sungbum Pan, and Yongwha Chung A Fast Fingerprint Matching Algorithm Using Parzen Density Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525 Choonwoo Ryu and Hakil Kim Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .535
Cut-&-Paste Attacks with JAVA Serge Lefranc1 and David Naccache2 1
´ Ecole Nationale Sup´erieure des Techniques Avanc´ees 32 Boulevard Victor Paris cedex 15, F-75739, France [email protected] http://www.ensta.fr/~lefranc 2 Gemplus Card International 34 rue Guynemer, Issy-les-Moulineaux, F-92447, France [email protected] http://www.gemplus.com/smart
Abstract. This paper describes malicious applets that use Java’s sophisticated graphic features to rectify the browser’s padlock area and cover the address bar with a false https domain name. The attack was successfully tested on Netscape’s Navigator and Microsoft’s Internet Explorer; we consequently recommend to neutralize Java whenever funds or private data transit via these browsers and patch the flaw in the coming releases. The degree of novelty of our attack is unclear since similar (yet nonidentical) results can be achieved by spoofing as described in [6]; however our scenario is much simpler to mount as it only demands the inclusion of an applet in the attacker’s web page. In any case, we believe that the technical dissection of our malicious Java code has an illustrative value in itself.
1
Introduction
In the past years, ssl [1] has become increasingly popular for protecting information exchanged between web stores and Internet users. ssl features public-key encryption and signature, two cryptographic functions that require the prior exchange of public keys between the sender and the receiver. Assuming the security of the underlying algorithms, one must still make sure that the received public keys actually belong to the entity claiming to possess them. In other words, after receiving a public key from a site claiming to be http://www.amazon.com, it still remains to check that the public key indeed belongs to Amazon; this is ascertained using certificates. A certificate is a signature of the user’s public-keys, issued by a trusted third party (authority). Besides the public-key, the certificate’s signed field frequently contains additional data such as the user’s identity (e.g. amazon.com), an algorithm ID (e.g. rsa, dsa, ecdsa etc.), the key-size and an expiry date. The P.J. Lee and C.H. Lim (Eds.): ICISC 2002, LNCS 2587, pp. 1–15, 2003. c Springer-Verlag Berlin Heidelberg 2003
2
Serge Lefranc and David Naccache
authority’s public-keys, used for verifying the certificates, are assumed to be known to everybody. Besides the site-specific information displayed by a website to a user (contents that one can trust or not), secure sessions has two visual tell-tale signs : • The image of a closed padlock appears in the browser (at the lower left corner of the browser for Netscape’s Navigator and at the lower right part of the window for Microsoft’s Internet Explorer). • A slight change appears in the address bar, where instead of the usual : http://www.domain-name.com an additional s (standing for the word secure) can be seen : https://www.domain-name.com Figures 1.a, 1.b (pages 10 and 11) for Microsoft Internet Explorer and 2.a, 2.b for Netscape Navigator (pages 12 and 13) illustrate these visual differences. In essence, the main indications guaranteeing the session’s security to the user are visual.
2
The Flaw
To make navigation attractive and user-friendly, browsers progressively evolved to enable the on-the-fly delivery of images, movies, sounds and music. This is made possible by the programming language Java. When a user loads an html page containing an applet (a Java program used in a web page), the browser starts executing the byte-code of this applet. Unlike most other procedural languages, the compilation of a Java program does not yield a machine-code executable but a byte-code file that can be interpreted by any browser implementing a Java Virtual Machine. This approach allows to reach an unprecedented level of compatibility between different operating systems (which is, in turn, the reason why Java has become so popular [4, 5, 2]). A very intriguing feature of applets is their ability to display images beyond the browser’s bounds, a feature largely exploited by the attacks described in this paper. In a nutshell, our malicious applet will cover the browser’s padlock area with the image of a closed padlock and, using the same trick, rectify the address bar’s http to an https). Several variants can also be imagined: cover and mimic the genuine navigator menus, modify the title banners of open windows, display false password entry windows etc. 2.1
Scenario and Novelty
The scenario is easy to imagine: a user, misled by a fake padlock, can, for instance, feed confidential banking details into a hostile site. The degree of novelty
Cut-&-Paste Attacks with JAVA
3
of our attack is unclear since similar (yet non-identical) results can be achieved by spoofing as described in [6]; however our scenario is much simpler to mount as it only demands the inclusion of an applet in the attacker’s web page. In any case, we believe that the technical dissection of our malicious Java code has an illustrative value in itself.
3
The Code
This section will explain in detail the structure of applets tailored for two popular browsers : Netscape’s Navigator et Microsoft’s Internet Explorer (our experiments were conducted with version 4.0, at least, of each of these browsers, in order to take advantage of Java. Previous versions of these browsers represent less then 10% of the browsers in the field). For the sake of clarity we separately analyze the display and positioning parts of the applets. Explanations refer to Netscape’s applet testN.java; minor modifications suffice to convert testN.java into a code (testE.java) targeting the Explorer. 3.1
Displaying the Fake Padlock
Image files downloaded from the Internet are usually displayed line after line, at a relatively slow pace. Such a gradual display is by orders of magnitude slower then the speed at which the microprocessor updates pixels. The closed padlock must therefore appear as suddenly as possible so as not to attract the user’s attention. Luckily, there is a class in Java (MediaTracker) that avoids progressive display. To do so, we add the image of the padlock to a tracker object with the following command: MediaTracker tracker = new MediaTracker(this); image = getImage(getCodeBase(),"PadlockN47.gif"); tracker.addImage(image,0); We can add as many images as we please to a single media tracker but one must assign ID numbers to these images. Here we have only one image (PadlockN47.gif shown in figure 3) which ID is zero by default.
Fig. 3. The fake padlock for Netscape’s Navigator (image file PadlockN47.gif)
4
Serge Lefranc and David Naccache
To wait for an image to be loaded completely, we use the following code : try {tracker.waitForID(0);} catch(Exception e) {} This means that if the picture is not fully loaded, the program will throw an exception. To display the picture we use Java’s standard function: window1.setBounds(X,Y,imgWidth,imgHeight); which means that the frame containing the picture should appear at coordinates {X, Y}, be imgWidth pixels wide and imgHeight pixels high. window1.show(); window1.toFront(); The show() method makes a window visible and the toFront() method makes sure that the window will be displayed at the top of the visualization stack. public void start() { thread.start(); } As we want to continuously display the padlock, we instanciate a Thread object that creates an independent thread. The start() method creates the thread and begins the display process by invoking the start() method of Thread. The call of start() causes the call of the applet’s run() method that in turn displays the padlock : public void run() { ... window1.getGraphics().drawImage(image,0,0,this); window1.validate(); } These lines of code finally make sure that the drawImage() method draws the picture at the right place, and validate it. To make the applet fully functional, one can add a function that will check if the victim has moved the browser and if so redraw the padlock at the right position. We do not detail this feature here. 3.2
The Padlock’s Position
To paste the padlock at the right position we use Javascript [3] functions which are distinct for the Navigator and the Explorer. The positioning calculations are done in Javascript and involve constants representing the coordinates of the padlock area and the dimensions of the fake padlock. This explains the existence of two different html pages that we analyze separately. Both can be easily merged into a code that adapts itself to the attacked browser, but this was avoided to keep the description as simple as possible.
Cut-&-Paste Attacks with JAVA
5
Netscape’s Navigator Two functions of the window method are very useful for correctly displaying the padlock. The following Javascript code calculates its exact position: sX = window.screenX; sY = window.screenY + window.outerHeight - 23; By default, {0, 0} is the screen’s upper left corner, which is why we subtract the height of the padlock (23 pixels) from the sum of window.screenY and window.outerHeight. It remains to hand over the Javascript variables sX and sY to the applet. The strategy for doing so is the following: we define a one pixel applet so as to remain quasi-invisible and avoid attracting the user’s attention. The pixel can be hidden completely by assigning to it a color identical to the background but again, this was avoided to keep the code simpler. We hand-over the position data using: document.write("<APPLET CODE =’testN.class’ HEIGHT=1 WIDTH=1>") document.write(" ") document.write(" ") document.write("") Back in the Java code, these parameters are received as Strings and converted to integers as follows: String x = getParameter("winPosX"); int X = Integer.parseInt(x); String y = getParameter("winPosY"); int Y = Integer.parseInt(y); As illustrated in figure 4 (page 14), our applet works perfectly when called from the Navigator. Unless the user purposely dig information in the Navigator’s security menu (Communicator/Security Info) the illusion is perfect. We intentionally omitted the https part of the applet to avoid publishing an off-the-shelf malicious code. Microsoft’s Internet Explorer The Explorer’s behavior is slightly different. When an applet is displayed, a warning banner is systematically added to its window. To overcome this, we design an applet that appears to be behind the browser while actually being in front of it. This is better understood by having a look at figures 5 (next page) and 6 (page 15). A second (more aggressive) approach consists in adding to the html code an instruction that expands the browser to the entire screen (the warning banner will then disappear). It is even possible to neutralize the function that allows the user to reduce the browser’s size.
6
Serge Lefranc and David Naccache
Fig. 5. The fake padlock for Microsoft Explorer (image file EvaPeronPadlock.gif)
4
Solutions
As our experiments prove, patching and upgrading seems in order. Here are some solutions one can think of (the list is, of course, far from being exhaustive). Random Icons During installation, the program picks an icon at random (e.g. from a database of one million icons) and customizes the padlock area with it. The selected icon, that the user learns to recognize, can be displayed in green (secure) or red (insecure). This should be enough to solve the problem, assuming that hostile applets can not read the selected icon. Warning Messages Have the system display a warning message whenever the padlock area is partially or completely covered by another window (e.g. A window has just covered a security indicator, would you like to proceed?). Note that warnings are necessary only when open padlocks are covered; warnings due to intentional user actions such as dragging or resizing can be automatically recognized and avoided.
Cut-&-Paste Attacks with JAVA
7
Display in Priority Whenever a window covers an open padlock, have the open padlock (handled by the operating system as a privileged icon) systematically appear in the foreplan. Note that such a radical solution paves the screen with holes and might be difficult to live with. Restricted Graphic Functions Allow display only within the browser’s bounds. Selective Tolerance Determine which application covered the padlock area and activate any of the previous protections only if the covering application is cataloged by the system as a priori insecure (e.g. unsigned by a trusted authority, failure to complete an ssl session etc.). Cockpit Area Finally, one can completely dissociate the padlocks from the browsers and display the padlocks, application names and address bars in a special (cockpit) area. By design, the operating system will then make sure that no application can access pixels in the cockpit area.
Acknowledgments The authors are grateful to Florent Coste, Fabrice Delhoste, Pierre Girard and Hampus Jakobsson for their valuable comments.
References [1] K. Hickman, The SSL Protocol, December 1995. Available electronically at : http://www.netscape.com/newsref/std/ssl.html 1 [2] C. Horstmann and G. Cornell, Core Java, volumes 1 and 2, Sun Microsystems Press, Prentice Hall, 2000. 2 [3] N. McFarlane,Professionnal Javascript, Wrox Press, 1999. 4 [4] G. McGraw and E. Felten, Securing Java : getting down to business with mobile code , 2-nd edition, Wiley, 1999. 2 [5] S. Oaks, Java security, O’Reilly, 1998. 2 [6] E. Felten & al., Web Spoofing : An Internet Con Game, Technical Report 540-96, Princeton University, 1997. 1, 3
8
Serge Lefranc and David Naccache
Appendices A
The html Page testN.html
THIS SITE IS INSECURE
(DESPITE THE CLOSED PADLOCK)
<SCRIPT> sX = window.screenX; sY = window.screenY + window.outerHeight - 23; document.write("<APPLET CODE =’testN.class’ HEIGHT=1 WIDTH=1>") document.write(" ") document.write(" ") document.write("")
The html page testE.html is obtained by changing the definitions of sX and sY to: sX = window.screenLeft + document.body.offsetWidth - 198; sY = window.screenTop + document.body.offsetHeight;
and replacing the applet’s name in: document.write("<APPLET CODE =’testIE.class’ HEIGHT=1 WIDTH=1>")
B
The Applet testN.java
import java.awt.*; import java.awt.image.*; import java.applet.*; public class testN extends Applet implements Runnable { Window window1; Image image ; Thread thread = new Thread(this); int imgWidth = 24; int imgHeight = 23; public void init() { // We use the MediaTracker function to be sure that // the padlock will be fully loaded before being displayed MediaTracker tracker = new MediaTracker(this); image = getImage(getCodeBase(),"PadlockN47.gif"); tracker.addImage(image,0); try {tracker.waitForID(0);} catch(Exception e) {} String x = getParameter("winPosX"); int X = Integer.parseInt(x); String y = getParameter("winPosY"); int Y = Integer.parseInt(y); window1 = new Window(new Frame()); window1.setBounds(X,Y,imgWidth,imgHeight);
Cut-&-Paste Attacks with JAVA
9
window1.show(); window1.toFront(); } public void start() { thread.start(); } public void run() { // winPosX,Y are parameters that define the position // of the padlock in the screen String x = getParameter("winPosX"); int X = Integer.parseInt(x); String y = getParameter("winPosY"); int Y = Integer.parseInt(y); window1.setBounds(X,Y,imgWidth,imgHeight); window1.getGraphics().drawImage(image,0,0,this); window1.validate(); } }
The applet testE.java is obtained by replacing the definitions of imgWidth and imgHeight by: int imgWidth
= 251; int imgHeight = 357;
and changing the fake padlock file’s name to: image = getImage(getCodeBase(),"EvaPeronPadlock.gif");
10
Serge Lefranc and David Naccache
Fig. 1.a. Potentially insecure session (Netscape’s Navigator)
Cut-&-Paste Attacks with JAVA
Fig. 1.b. Secure session (Netscape’s Navigator)
11
12
Serge Lefranc and David Naccache
Fig. 2.a. Potentially insecure session (Microsoft Explorer)
Cut-&-Paste Attacks with JAVA
Fig. 2.b. Secure session (Microsoft Explorer).
13
14
Serge Lefranc and David Naccache
Fig. 4. Fake padlock applet on a Netscape Navigator
Cut-&-Paste Attacks with JAVA
Fig. 6. Fake padlock applet on a Microsoft Explorer
15
Provably Secure Encrypt-then-Sign Composition in Hybrid Signcryption Ik Rae Jeong, Hee Yun Jeong, Hyun Sook Rhee, Dong Hoon Lee, and Jong In Lim Center for Information Security Technologies (CIST) Korea University, Seoul, Korea {jir,hyun,math33}@cist.korea.ac.kr {donghlee,jilim}@koera.ac.kr
Abstract. To make authenticated encryption which provides confidentiality and authenticity of a message simultaneously, a signcryption scheme uses asymmetric primitives, such as an asymmetric encryption scheme for confidentiality and a signature scheme for authentication. Among the signcryption schemes, the hybrid signcryption schemes are the signcryption schemes that use a key agreement scheme to exchange a symmetric encryption key, and then encrypt a plaintext using a symmetric encryption scheme. The hybrid signcryption schemes are specially efficient for signcrypting a bulk data because of its use of a symmetric encryption. Hence to achieve the joint goals of confidentiality and authenticity in most practical implementation, hybrid signcryption schemes are commonly used. In the paper, we study the properties of signcryption and propose a new generic hybrid signcryption scheme called DHEtS using encrypt-thensign composition method. DHEtS uses a symmetric encryption scheme, a signature scheme, and the DH key agreement scheme. We analyze DHEtS with respect to the properties of signcryption, and show that DHEtS provides non-repudiation and public verifiability. DHEtS is the first provable secure signcryption schemes with public verifiability. If encrypting and signing components of DHEtS can use the same random coins, the computational cost and the size of a signcryption would be greatly reduced. We show the conditions of signing component to achieve randomness-efficiency. Keywords: authenticated encryption, signcryption, encrypt-then-sign, confidentiality, authenticity, non-repudiation, public verifiability.
1
Introduction
Confidentiality and authenticity have been important goals in cryptography. To provide confidentiality and authenticity simultaneously, authenticated encryption schemes have been intensively investigated in the literature [9, 14, 12, 8, 6, 1]. In the symmetric setting, an authenticated encryption scheme uses a symmetric encryption scheme for confidentiality and a MAC scheme for authentication. P.J. Lee and C.H. Lim (Eds.): ICISC 2002, LNCS 2587, pp. 16–34, 2003. c Springer-Verlag Berlin Heidelberg 2003
Provably Secure Encrypt-then-Sign Composition in Hybrid Signcryption
17
In the asymmetric setting, an authenticated encryption scheme, called a signcryption scheme, uses asymmetric primitives, such as an asymmetric encryption scheme for confidentiality and a signature scheme for authentication. Signcryption schemes are convenient because the sender can encrypt the message only using the receiver’s public key and its own private key without pre-sharing the common secret key, but very expensive in computation cost compared to those in the symmetric setting. In a hybrid signcryption scheme a sender uses asymmetric primitives to exchange a symmetric encryption key, and then encrypts a plaintext using a symmetric encryption scheme. So hybrid signcryption schemes are more efficient than signcryption schemes without using symmetric encryption, especially in case of encrypting data in bulk. In an authenticated encryption scheme there are three general composition methods, autheticate-then-encrypt (AtE), authenticate-and-encrypt (A&E), and encrypt-then-authenticate (EtA). These three composition methods for symmetric setting and for signcryption are formally treated in recent works [6] and [1, 3] respectively. Signcryption can provide several additional properties, i.e., non-repudiation and public verifiability. There are two kinds of non-repudiation. Non-repudiation of the sender means that only the sender can make a valid signcryption, so he can not deny the fact that he made the signcryption. Non-repudiation of both the sender and the receiver means that only the sender can make such a signcryption that only the designated receiver can insist that the signcryption is made for the receiver, so the sender can not deny the fact that he made the signcryption for that receiver. There are three kinds of public verifiability. Public verifiability of validity means that anyone can verify whether the signcryption is valid or not only with the public information. Public verifiability of the sender means that anyone can know who is the sender of the signcryption only with the public information. Public verifiability of the receiver means that anyone can know who is the receiver of the signcryption only with the public information. Schemes in the symmetric setting do not provide non-repudation and public verifiability which are important functionalities in e-commerce. Signcryption schemes can provide those functionalities, but signcryption schemes without using symmetric encryption are not used to encrypt a bulk data because of inefficiency. Thus in most practical implementation, hybrid signcryption schemes are commonly used to achieve the joint goal of confidentiality and authenticity. Another important issue is randomness-efficiency. Each encrypting and signing components in signcryption need to use random coins respectively. If we can use the same random coins in both encrypting and signing components, we can reduce the computational cost and the size of a signcryption. In the paper, we propose new encrypt-then-sign composition method in signcryption called DHEtS, and show that DHEtS provides non-repudiation and public verifiability. DHEtS is the first provably secure signcryption scheme which provides public verifiability as long as we know. So this scheme is use-
18
Ik Rae Jeong et al.
ful for applications which need public verifiability. And we show the conditions of the signing component of DHEtS to achieve randomness-efficiency. 1.1
Related Work and Our Contribution
The security notions and their relations for confidentiality and authentication in the symmetric setting are shown in [6], where three general methods to combine a symmetric encryption scheme and a MAC scheme are also analyzed. A signcryption scheme is proposed in [9]. It is based on a Nyberg-Rueppel type encryption scheme and a Schnorr type signature scheme, but its security is not proved. The security notions and their relations of signcryption are shown in [1], where three general methods to combine an asymmetric encryption scheme and a signature scheme are also analyzed. In [1] the author proposed a signcryption scheme called ESSR and a hybrid scheme called DHET M , and proved their security. DHET M consists of the Diffie-Hellman key agreement scheme, a symmetric encryption scheme, and a MAC scheme. ESSR in [1] follows encrypt-then-sign composition mehtod, and satisfies semantic security against adaptive chosen ciphertext attacks (IND-CCA2) for confidentiality. And it is also strongly unforgeable and unchangeable of the receiver’s public key against adaptive chosen message attacks (SRUF-CMA) for authenticity. So it provides non-repudiation, but does not provide public verifiability nor randomness-efficiency. ESSR uses an IND-CCA2 secure asymmetric encryption scheme and a strongly unforgeable signature scheme against chosen message attacks (SUF-CMA) to make a signcryption scheme. CtE&S in [3] follows encrypt-and-sign composition method, and is an INDgCCA2 (which is a variant of IND-CCA2 and was defined in [3]) and SRUF-CMA secure signcryption scheme. So it provides non-repudiation, but does not provide public verifiability nor randomness-efficiency. CtE&S uses an IND-gCCA2 secure asymmetric encryption scheme, an weakly unforgeable signature scheme against chosen message attacks (WUF-CMA), and a secure commitment scheme. To make a hybrid signcryption scheme, we can follow two different approach. One approach is to make a secure hybrid asymmetric encryption scheme which is made using a symmetric encryption scheme and asymmetric primitives, and then combine a secure signature scheme and a secure hybrid asymmetric encryption scheme using ESSR or CtE&S. The other approach is to combine a secure signature scheme and a secure symmetric encryption scheme without making a secure hybrid asymmetric encryption scheme. A hybrid signcryption following the latter approach is proposed in [14]. It uses the Diffie-Hellman key agreement scheme, a symmetric encryption scheme, and a variant of the DSS signature scheme. It follows encrypt-and-sign composition method, and its security is formally proved in the random oracle model in [7]. The scheme in [14] is random-efficient and provides non-repudiation, but does not provide public verifiability. The name ”signcryption” is borrowed from [14], but the schemes in [14] are actually the hybrid signcryption schemes.
Provably Secure Encrypt-then-Sign Composition in Hybrid Signcryption
19
We propose a new hybrid signcryption scheme called DHEtS. DHEtS uses the Diffie-Hellman key agreement scheme, a symmetric encryption scheme, and a signature scheme. DHEtS follows encrypt-then-sign composition method. DHEtS is generic, i.e., it uses a symmetric encryption scheme and a signature scheme as a black-box manner. DHEtS provides non-repudiation and public verifiability. And we also show that under which conditions DHEtS can be randomness-efficient. The outline of the paper is as follows. In Section 2, we describe assumptions on which DHEtS depends. In Section 3, we describe the properties and security notions of signcryption. In Section 4 we construct DHEtS and prove its security. Section 5 concludes the paper.
2
Assumptions
To provide confidentiality of a message we use an encryption scheme. The security notions and attack models for a symmetric encryption scheme are well researched in the literature [4, 6, 11]. Our scheme uses a symmetric encryption scheme which is semantically secure against chosen plaintext attacks (INDCPA). Under the IND-CPA secure encryption scheme any adversary gains no information about the plaintext of the given ciphertext with access to the encryption oracle. To provide authenticity of the message we use a signature scheme. Unforgeability of a signature scheme means any adversary can not make a valid signature. We define two kinds of unforgeability, weak unforgeability (WUF) and strong unforgeability (SUF). Under a WUF-CMA secure signature scheme any adversary can not forge any valid signature for a new message M though he can make another valid signature for old messages with access to the signing oracle, the restriction being that the adversary can not query the signing oracle on the message M . Under an SUF-CMA secure signature scheme any adversary can not forge any new valid message-signature pair (M, σ), the restriction being that the adversary can not receive the signature σ for the message M from the signing oracle. To make a symmetric encryption key in DHEtS, we use the variant of the Diffie-Hellman key agreement scheme. This key agreement scheme is based on the variant of the Diffie-Hellman assumption. The HDH0 (Hash Diffie-Hellman) based on the DDH (Decisional Diffie-Hellman) problem is introduced in [2]. The DDH problem is to determine a given triple (U, V, W ) is a Diffie-Hellman triple of the form (g u , g v , g uv ) or not. The HDH0 problem is to determine a given triple (U, V, W ) is a hash Diffie-Hellman triple of the form (g u , g v , H(g uv )) or not. We define the HDH1 problem which is to determine a given quadruple (u, X, V, W ) is a hash Diffie-Hellman quadruple of the form (u, g x , g v , H(g u ||g xv )) or not. The HODH (Hash Oracle Diffie-Hellman) problem [2] is a variant of the HDH problem where access to the hash oracle is allowed. In the HODH0 problem, given a triple (U, V, W ), an adversary can query any value X except U to the hash oracle O − HDH0 (X) = H(X v ). Note that if access to O − DDH (X) = X v
20
Ik Rae Jeong et al.
is allowed in the DDH problem, we can easily solve the DDH problem by querying g x · U and dividing the response (g x · U )v by V x . But in the HODH0 problem querying any value except U to the hash oracle seems to give no information about H(g uv ) if H looks random. In the HODH1 problem the hash oracle is O − HDH1 (X1 , X2 ) = H(X1 ||X2v ) and an adversary, given a quadruple (u, X, V, W ), can query any pair (X1 , X2 ) except (g u , X) to the hash oracle. Definition 1 (HODH). Let GG be a group generator which generates a generator g and a group G whose order is |G|. Let k1 , k2 ∈ N be polynomially related security parameters. Let H : {0, 1}∗ → {0, 1}k2 be a hash function. Consider the following experiment. ExpHODH0 H,AHODH0 (k1 , k2 ) (g, |G|) ← GG(k1 ) R u, v ← {1, ..., |G|} U ← gu; V ← gv R
ExpHODH1 H,AHODH1 (k1 , k2 ) (g, |G|) ← GG(k1 ) R x, u, v ← {1, ..., |G|} X ← gx; V ← gv
b ← {0, 1} if b = 1 then W ← H(g uv ) else W ← {0, 1}k2 O−HDH0 (·) return AHODH0 (U, V, W )
R
b ← {0, 1} if b = 1 then W ← H(g u ||X v ) else W ← {0, 1}k2 O−HDH1 (·,·) return AHODH1 (u, X, V, W )
The advantage of an adversary AHODHX (k1 , k2 ) (for X = 0, 1) is defined as follows: HODHX AdvH,A (k1 , k2 ) = |P r[ExpHODHX H,AHODHX (k1 , k2 ) = 1|b = 1] HODHX
−P r[ExpHODHX H,AHODHX (k1 , k2 ) = 1|b = 0]|
The advantage function of the scheme is defined as follows: HODHX HODHX AdvH (k1 , k2 , t, qh , µh ) = max A {AdvH,AHODHX (k1 , k2 )},
where AHODHX is any adversary with time complexity t, making at most qh hash queries and at most µh total hash query bits. The HODHX assumption is that there exists a hash function H such that the advantage of any adversary AHODHX with time complexity polynomial in (k1 , k2 ) is negligible.
3
Signcryption
In a signcryption scheme confidentiality of a message is provided by encrypting component and authentication of a message is provided by signing component. A signcryption scheme consists of SC = (SC.keys ,SC.keyr ,SC.enc, SC.dec). SC.keys and SC.keyr generate a private-public key pair for the sender and the receiver, respectively. SC.enc signcrypts a message with the sender’s private key and the receiver’s public key, and outputs a signcryption. SC.dec designcrypts a signcryption with the receiver’s private key, and outputs the identity (public key) of the sender and the plaintext if the signcryption is valid, or ⊥ otherwise.
Provably Secure Encrypt-then-Sign Composition in Hybrid Signcryption
21
Definition 2 (IND-ATK of SC). Let k1 , k2 ∈ N be polynomially related security parameters. k2 is for a symmetric encryption scheme. Let SC be a signcryption scheme. Consider the following experiment: ExpIND-ATK SC,AIND-ATK (k1 , k2 ) (xs , ys ) ← SC.keys (k1 ) (xr , yr ) ← SC.keyr (k1 ) (m0 , m1 , s) ← AO,O1 (f ind, ys , yr ) R b ← {0, 1} C ← SC.enc<xs > (mb , yr ) return AO,O2 (guess, s, C)
- O = SC.enc<xs > (·, ·) - If ATK=CPA then O1 = and O2 = - If ATK=CCA1 then O1 = SC.dec<xr > (·) and O2 = - If ATK=CCA2 then O1 = SC.dec<xr > (·) and O2 = SC.dec<xr > (·)
In the above experiments the signcryption oracle, given a query (M, y ), signcrypts a plaintext M with its secret key xs and the receiver’s public key y , and returns C = SC.enc<xs > (M, y ). y may be different from yr . The designcryption oracle, given a query C , designcrypts a signcryption C with his secret key xr , and returns a pair (y, M ) = SC.dec<xr > (C ) of the identity (public key) of the sender and the plaintext if the signcryption is valid, or ⊥ otherwise. y may be different from ys . The power of adversaries depends on whether or not they are able to access to the signcryption and/or designcryption oracles, before and/or after the signcryption is given. means an adversary can not use the oracle. The advantage of an adversary AIND-ATK (k1 , k2 ) is defined as follows: IND-ATK AdvSC,A (k1 , k2 ) = |P r[ExpIND-ATK SC,AIND-ATK (k1 , k2 ) = 1|b = 1] IND-ATK
−P r[ExpIND-ATK SC,AIND-ATK (k1 , k2 ) = 1|b = 0]|
The advantage function of the scheme is defined as follows: IND-ATK IND-ATK AdvSC (k1 , k2 , t, qe , µe , qd , µd , lm ) = max A {AdvSC,AIND-ATK (k1 , k2 )},
where AIND-ATK is any adversary with time complexity t, making at most qe signcryption queries, at most µe total signcryption query bits, at most qd designcryption queries, at most µd total designcryption query bits, and outputting (m0 , m1 ) of the maximum length lm . The scheme SC is IND-ATK secure if the advantage of any adversary AIND-ATK with time complexity polynomial in (k1 , k2 ) is negligible. Unforgeability of SC means any adversary can not make a valid signcryption. There are two kinds of unforgeability, SUF and SRUF. SUF unforgeability means any adversary can not forge any new valid signcryption C though he can insist that the receiver of a signcryption is any other one than the originally intended receiver, the restriction being that the adversary can not receive it from the signcryption oracle. SRUF unforgeability means any adversary can not forge any new valid signcryption C nor insist that the receiver of a signcryption is any other one than the originally intended receiver, the restriction being that the adversary can not receive it from the signcryption oracle.
22
Ik Rae Jeong et al.
In SC an adversary for unforgeability can be not only a third party but also the receiver. In a symmetric authenticated encryption scheme the receiver can easily forge a valid signcryption because the sender and the receiver share and use the same secret value for signcryption and designcryption. But in SC the sender and the receiver have and use the different secret values for signcryption and designcryption. So it is not always possible for the receiver to forge a valid signcryption. In SC the receiver has more information than a third party in forging a valid signcryption. Moreover we allow the receiver to change its identity (public key) in forging a signcryption (public key changing attacks). Definition 3 (SUF-CMA,SRUF-CMA of SC). Let k1 , k2 ∈ N be polynomially related security parameters. k2 is for a symmetric encryption scheme. Let SC be a signcryption scheme. Consider the following experiment: ExpSUF-CMA ExpSRUF-CMA SC,ASUF-CMA (k1 , k2 ) SC,ASRUF-CMA (k1 , k2 ) (xs , ys ) ← SC.keys (k1 ) (xs , ys ) ← SC.keys (k1 ) (C , x , y ) ← AO (ys ) (C , x , y ) ← AO (ys ) if C
= ⊥ then if C
= ⊥ then κ ← SC.dec<x > (C ) κ ← SC.dec<x > (C ) if κ
= ⊥ then if κ
= ⊥ then parse κ as (y, M ) parse κ as (y, M ) if y = ys and O never if y = ys and {O never returned C as a response returned C or (M, y ) was then return 1 never a query to O} else return 0 then return 1 else return 0
In the above experiments the signcryption oracle O = SC.enc<xs > (·, ·), given a query (M, y ), signcrypts a plaintext M with its secret key xs and the receiver’s public key y , and returns a signcryption C = SC.enc<xs > (M, y ). The advantage of an adversary AS(R)UF-CMA (k1 , k2 ) is defined as follows: S(R)UF-CMA
S(R)UF-CMA
AdvSC,AS(R)UF-CMA (k1 , k2 ) = P r[ExpSC,AS(R)UF-CMA (k1 , k2 ) = 1].
The advantage function of the scheme is defined as follows: S(R)UF-CMA
AdvSC
S(R)UF-CMA (k1 , k2 , t, qe , µe ) = max (k1 , k2 )}, A {AdvSC,A S(R)UF-CMA
where AS(R)UF-CMA is any adversary with time complexity t, making at most qe signcryption queries and at most µe total signcryption query bits. The scheme SC is S(R)UF-CMA secure if the advantage of any adversary AS(R)UF-CMA with time complexity polynomial in (k1 , k2 ) is negligible.
4
DHEtS
DHEtS follows encrypt-then-sign composition method. If we follow encryptthen-sign composition method carelessly, the constructed scheme may be insecure. Consider the following simple signcryption. Let a pair (xs , ys ) be the
Provably Secure Encrypt-then-Sign Composition in Hybrid Signcryption Algorithm DHEtS.keys (k1 ) begin (xs , ys ) ← SIG.key(k1 )
23
Algorithm DHEtS.keyr (k1 ) begin (g, |G|) ← GG(k1 ) R
xr ← {1, ..., |G|} yr ← g xr return (xr , yr ) end Algorithm DHEtS.enc<xs > (M, yr ) Algorithm DHEtS.dec<xr > (C) begin begin return (xs , ys ) end
R
x ← {1, ..., |G|} X ← gx Ke ← H(ys ||yrx ) c ← SY M.enc (M ) σ ← SIG.gen<xs > (X||c) C ← ys ||X||c||σ end
parse C as ys ||X||c||σ if SIG.ver (X||c, σ) = 1 then Ke ← H(ys ||X xr ) M ← SY M.dec (c) return (ys , M ) else return ⊥ end
Fig. 1. Signcryption and Designcryption algorithms in DHEtS
sender’s private and public keys, and (xr , yr ) be the receiver’s. Let ASY M be an asymmetric encryption scheme and SIG be a signature scheme. Suppose that a plaintext M is signcrypted as follows: c ← ASY M.enc (M ); σ ← SIG.gen<xs > (c); C ← ys ||c||σ. This signcryption scheme is insecure, if adversaries can use the designcryption oracle. That is, if an adversary A is given a signcryption C = ys ||c||σ, it can recover the plaintext by querying C = yA ||c||σ to the designcryption oracle, where σ is a signature made by the adversary with his private key corresponding to the public key yA . This attack is possible, since encrypting component is not affected by signing component. To construct a signcyption scheme, DHEtS uses a variant of the DiffieHellman key agreement scheme on a cyclic group G based on the HODH1 assumption. Let GG be a group generator which generates a generator g and a group G whose order is |G|. The sender’s private and public keys are for signing component, and the receiver’s private and public keys are for encrypting component. The receiver’s public key is used in the Diffie-Hellman key agreement, so the receiver’s public key has to be selected randomly from the group G. A symmetric encryption key H(ys ||yrx ) is made using the sender’s public key ys , the ephemeral public key g x and the receiver’s public key yr . So the symmetric encryption key depends both on the sender’s public key and the receiver’s public key, and varies in each signcryption. A plaintext is encrypted with this symmetric encryption key using a symmetric encryption scheme. Then the ciphertext is signed with the sender’s private key. In DHEtS the sender uses his private key only to sign the ciphertext, so all kinds of signature schemes can be used. Let’s reconsider the adversary A attacking DHEtS. When given a signcryption C = ys ||X||c||σ, A replaces the signature part of a given signcryption by signing with its private key and queries C = yA ||X ||c||σ to the designcryption
24
Ik Rae Jeong et al.
oracle, then the reconstructed symmetric encryption key by the designcryption oracle looks random. And the symmetric encryption part c of the signcryption is decrypted with this reconstructed key. Then the designcrypted message which looks random is returned to the adversary. Thus this attack fails. We concretely analyze the securities of the scheme in the next subsection. 4.1
Security of DHEtS
Theorem 1. Let GG be a group generator which generates a generator g and a group G whose order is |G|. Let k1 , k2 ∈ N be polynomially related security parameters. k2 is for a symmetric encryption scheme and a hash function. Let SY M be an IND-CPA secure symmetric encryption scheme and SIG a SUFCMA secure signature scheme. Let H : {0, 1}∗ → {0, 1}k2 be a hash function satisfying the HODH1 assumption. Then DHEtS is IND-CCA2 secure. Concretely, IND-CCA2 AdvDHEtS (k1 , k2 , t, qe , µe , qd , µd , lm ) HODH1 IND-CPA (k1 , k2 , t, qd , qd · 2 · ly ) + AdvSY (k2 , t, 0, 0, lm ) 2 · AdvH M SUF-CMA (k1 , t, qe + 1, µe + (qe + 1) · (2 · ly + lb ) + lm ), +2 · AdvSIG
where t is the maximum total experiment time including an adversary’s execution time, ly is the maximum length of a group element. We assume that the size of a plaintext M and the size of a ciphertext SY M.enc(M ) differ at most lb . Theorem 2. Let GG be a group generator which generates a generator g and a group G whose order is |G|. Let k1 , k2 ∈ N be polynomially related security parameters. k2 is for a symmetric encryption scheme and a hash function. Let SY M be an IND-CPA secure symmetric encryption scheme and SIG a SUF-CMA secure signature scheme. Let H : {0, 1}∗ → {0, 1}k2 be a hash function satisfying the HODH1 assumption. Then DHEtS is SUF-CMA secure. Concretely, SUF-CMA SUF-CMA AdvDHEtS (k1 , k2 , t, qe , µe ) AdvSIG (k1 , t, qe , µe + qe · (2 · ly + lb )),
where t is the maximum total experiment time including an adversary’s execution time, ly is the maximum length of a group element. We assume that the size of a plaintext M and the size of a ciphertext SY M.enc(M ) differ at most lb . The proofs of the above theorems are in Appendix B. DHEtS does not contain the receiver’s information explicitly. Although a signcryption is unforgeable (SUF-CMA), anyone can designcrypt a signcryption with his secret key and insist that the sender signcrypts a plaintext M . But only the implicitly intended receiver can designcrypt the correct plaintext M . If unchangeability of the receiver’s identity of a signcryption and uniqueness of the designcrypted plaintext from the signcryption are important, DHEtS can easily accomplish those functionalities by signing the receiver’s public key together. Lemma 1. DHEtS is SRUF-CMA secure, if the following operations are added to DHEtS’s algorithms:
Provably Secure Encrypt-then-Sign Composition in Hybrid Signcryption
25
– When signcrypting a plaintext, include the receiver’s public key in the signature. – When designcrypting a signcryption, check that the included public key is the receiver’s. If not, return ⊥. Proof of Lemma 1: If DHEtS includes the receiver’s public key in the signature, the receiver’s attack of changing his public key can be prevented, then DHEtS is SRUF-CMA secure. The formal proof is similar to Theorem 2, so we omit it here. Non-repudiation: DHEtS is SUF-CMA secure, so the sender can not deny the fact that he made the signcryption (non-repudiation of the sender). If DHEtS includes the receiver’s public key in the signature, it is SRUF-CMA secure. Then the sender can not deny the fact that he made the signcryption for the receiver (non-repudiation of the sender and the receiver). Public Verifiability: DHEtS provides public verifiability. Anyone can verify whether or not a signcryption is valid (verifiability of validity). I.e., anyone can check the validity of a signcryption by checking whether the signcryption is a valid signature or not. And anyone can verify the sender of a signcryption by checking the signer of a signcryption (verifiability of the sender). If DHEtS includes the receiver’s public key in the signature, anyone can verify the receiver of a signcryption (verifiability of the receiver). Randomness-Efficiency: DHEtS uses the Diffie-Hellman key agreement on a cyclic group G with the receiver’s public key. So the receiver’s public key is also from group G, but the sender’s public key may be based on another group. If the sender’s public key is from G and signing component uses the same group G, DHEtS can use the same random coins in both encrypting and signing components. A signcryption of DHEtS consists of (ys , X, c, σ), where ys is the sender’s public key and c is the output of a symmetric encryption scheme. X = g x is for the Diffie-Hellman key agreement using the random coin x. If x is reused in signing component, the commitment of x, X = g x , may be reconstructed from the signature σ. In this case DHEtS can omit X, and then a signcryption is (ys , c, σ). If X is necessary later, anyone can reconstruct X from the σ. For example a Schnorr signature [13] for a message M is σ = (c, s), where c ← H(g x , M ); s ← x − c · xs with the signer’s private and public keys (xs , ys ). Anyone can verify the validity of the message-signature pair (M, σ) by checking ? the equation c = H(g s · ysc , M ). So anyone can reconstruct g x = g s · ysc from σ. The security proofs for the randomness-efficient version of DHEtS are similar to those of Theorem 2 and Theorem 3, so we omit. Non-repudiation and public verifiability of the randomness-efficient version of DHEtS are obvious.
26
5
Ik Rae Jeong et al.
Conclusion
We have presented a generic signcryption scheme DHEtS. DHEtS uses an IND-CPA secure symmetric encryption scheme, a SUF-CMA secure signature scheme, and the variants of the Diffie-Hellman key agreement scheme based on the variants of the Diffie-Hellman assumption. DHEtS obtains IND-CCA2 security for confidentiality and SUF-CMA security for authentication, and can be easily converted into SRUF-CMA secure scheme. DHEtS provides additional properties, non-repudiation and public-verifiability. DHEtS can be also converted into randomness-efficient version, if encrypting component and signing component use the same group.
References

[1] J. H. An. Authenticated Encryption in the Public-Key Setting: Security Notions and Analyses. Report 2001/079, Cryptology ePrint Archive, http://eprint.iacr.org/, 2001.
[2] M. Abdalla, M. Bellare, and P. Rogaway. The Oracle Diffie-Hellman assumptions and an analysis of DHIES. CT-RSA 2001, volume 2020 of Lecture Notes in Computer Science, pages 143-158, Springer-Verlag, 2001.
[3] J. H. An, Y. Dodis, and T. Rabin. On the Security of Joint Signature and Encryption. Advances in Cryptology - EUROCRYPT 2002, volume 2332 of Lecture Notes in Computer Science, pages 83-107, Springer-Verlag, 2002.
[4] M. Bellare, A. Desai, E. Jokipii, and P. Rogaway. A Concrete Security Treatment of Symmetric Encryption: Analysis of the DES Modes of Operation. Proceedings of the 38th Symposium on Foundations of Computer Science, IEEE, 1997.
[5] M. Bellare, A. Desai, D. Pointcheval, and P. Rogaway. Relations among notions of security for public-key encryption schemes. Advances in Cryptology - Crypto '98, volume 1462 of Lecture Notes in Computer Science, Springer-Verlag, 1998.
[6] M. Bellare and C. Namprempre. Authenticated encryption: Relations among notions and analysis of the generic composition paradigm. Advances in Cryptology - Asiacrypt 2000, volume 1976 of Lecture Notes in Computer Science, pages 531-545, Springer-Verlag, 2000.
[7] J. Baek, R. Steinfeld, and Y. Zheng. Formal Proofs for the Security of Signcryption. Public Key Cryptography 2002, volume 2274 of Lecture Notes in Computer Science, pages 80-98, Springer-Verlag, 2002.
[8] W.-H. He and T.-C. Wu. Cryptanalysis and improvement of Petersen-Michels signcryption schemes. IEE Proc. - Computers and Digital Techniques, 146(2):123-124, 1999.
[9] P. Horster, M. Michels, and H. Petersen. Authenticated encryption schemes with low communication costs. Technical Report TR-94-2-R, University of Technology Chemnitz-Zwickau, 1994. Also in Electronics Letters, 30(15), 1994.
[10] H. Krawczyk. The order of encryption and authentication for protecting communications (Or: how secure is SSL?). Advances in Cryptology - Crypto 2001, volume 2139 of Lecture Notes in Computer Science, Springer-Verlag, 2001.
[11] J. Katz and M. Yung. Complete Characterization of Security Notions for Probabilistic Private-Key Encryption. Proceedings of the 32nd Annual Symposium on the Theory of Computing, ACM, 2000.
[12] H. Petersen and M. Michels. Cryptanalysis and improvement of signcryption schemes. IEE Proc. - Computers and Digital Techniques, 145(2):149-151, 1998.
[13] D. Pointcheval and J. Stern. Security Arguments for Digital Signatures and Blind Signatures. Journal of Cryptology, 13(3):361-396, 2000.
[14] Y. Zheng. Digital signcryption or how to achieve cost(signature & encryption) << cost(signature) + cost(encryption). Advances in Cryptology - Crypto '97, volume 1294 of Lecture Notes in Computer Science, pages 165-179, Springer-Verlag, 1997.
A Definitions

A.1 IND-CPA of SYM
A symmetric encryption scheme consists of SYM = (SYM.key, SYM.enc, SYM.dec). SYM.key generates a symmetric encryption key from a key space. SYM.enc encrypts a plaintext using an encryption key. SYM.dec decrypts a ciphertext using a decryption key and outputs a plaintext if the ciphertext is valid, or ⊥ otherwise. Let k2 ∈ N be a security parameter. Let SYM be a symmetric encryption scheme. Consider the following experiment:

Exp^{IND-CPA}_{SYM,A_{IND-CPA}}(k2)
  Ke ← SYM.key(k2)
  (m0, m1, s) ← A^{O-SYM.enc_{Ke}(·)}(find)
  b ←R {0, 1}
  c ← SYM.enc_{Ke}(mb)
  return A^{O-SYM.enc_{Ke}(·)}(guess, c, s)
The advantage of an adversary A_{IND-CPA}(k2) is defined as follows:

Adv^{IND-CPA}_{SYM,A_{IND-CPA}}(k2) = |Pr[Exp^{IND-CPA}_{SYM,A_{IND-CPA}}(k2) = 1 | b = 1] − Pr[Exp^{IND-CPA}_{SYM,A_{IND-CPA}}(k2) = 1 | b = 0]|

The advantage function of the scheme is defined as follows:

Adv^{IND-CPA}_{SYM}(k2, t, qe, μe, lm) = max_A {Adv^{IND-CPA}_{SYM,A_{IND-CPA}}(k2)},
where A_{IND-CPA} is any adversary with time complexity t, making at most qe encryption queries and at most μe total encryption query bits, and outputting (m0, m1) of maximum length lm. The scheme SYM is IND-CPA secure if the advantage of any adversary A_{IND-CPA} with time complexity polynomial in k2 is negligible.
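The experiment above can be rendered as a small game harness. The following Python sketch is illustrative only: the toy cipher standing in for SYM and the adversary interface are our own assumptions, and the advantage is estimated empirically in the |Pr[· | b = 1] − Pr[· | b = 0]| form used above.

```python
import hashlib, os, random

# Toy stand-in for SYM so the experiment runs: keystream = SHA-256(key || nonce).
# Illustrative only; not a secure cipher.
def sym_key(k2=16):
    return os.urandom(k2)

def sym_enc(key, m):
    nonce = os.urandom(8)
    stream = hashlib.sha256(key + nonce).digest()[:len(m)]
    return nonce + bytes(x ^ y for x, y in zip(m, stream))

def exp_ind_cpa(adversary, b):
    """Exp^{IND-CPA}_{SYM,A} with the challenge bit b fixed; returns A's output."""
    key = sym_key()
    oracle = lambda m: sym_enc(key, m)            # O-SYM.enc_{Ke}(.)
    m0, m1, state = adversary("find", oracle)
    c = sym_enc(key, (m0, m1)[b])                 # c <- SYM.enc_{Ke}(m_b)
    return adversary("guess", oracle, c, state)

def guessing_adversary(phase, oracle, c=None, state=None):
    if phase == "find":
        return b"messageA", b"messageB", None     # two equal-length messages
    return random.randrange(2)                    # blind guess: advantage ~ 0

# Adv = |Pr[Exp = 1 | b = 1] - Pr[Exp = 1 | b = 0]|, estimated empirically.
p1 = sum(exp_ind_cpa(guessing_adversary, 1) for _ in range(2000)) / 2000
p0 = sum(exp_ind_cpa(guessing_adversary, 0) for _ in range(2000)) / 2000
print(f"estimated advantage ~ {abs(p1 - p0):.3f}")
```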
A.2 SUF-CMA of SIG
A signature scheme consists of SIG = (SIG.key, SIG.gen, SIG.ver). SIG.key generates a private-public key pair for the user. SIG.gen produces a signature on a message with the private key. SIG.ver verifies a message-signature pair with the public key and returns 1 if it is valid, or 0 otherwise. Let GG be a group generator which generates a generator g and a group G of order |G|. Let k1 ∈ N be a security parameter. Let SIG be a signature scheme. Consider the following experiment:

Exp^{SUF-CMA}_{SIG,A_{SUF-CMA}}(k1)
  (g, |G|) ← GG(k1)
  (sk, pk) ← SIG.key(k1)
  τ ← A^{O-SIG.gen_{sk}(·)}(pk)
  if τ = ⊥ then return 0
  else parse τ as (M, σ)
  if SIG.ver_{pk}(M, σ) = 1 and O-SIG.gen_{sk}(·) never returned σ on input M
  then return 1 else return 0
The advantage of an adversary A_{SUF-CMA}(k1) is defined as follows:

Adv^{SUF-CMA}_{SIG,A_{SUF-CMA}}(k1) = Pr[Exp^{SUF-CMA}_{SIG,A_{SUF-CMA}}(k1) = 1]

The advantage function of the scheme is defined as follows:

Adv^{SUF-CMA}_{SIG}(k1, t, qs, μs) = max_A {Adv^{SUF-CMA}_{SIG,A_{SUF-CMA}}(k1)},
where A_{SUF-CMA} is any adversary with time complexity t, making at most qs signing queries and at most μs total signing query bits. The scheme SIG is SUF-CMA secure if the advantage of any adversary A_{SUF-CMA} with time complexity polynomial in k1 is negligible.
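A minimal executable rendering of the SUF-CMA experiment follows. It is a sketch under stated assumptions: HMAC stands in for SIG purely so the game runs (a real SIG is a public-key scheme), and the oracle bookkeeping mirrors the "never returned σ on input M" condition above.

```python
import hmac, hashlib, os

def sig_key():
    sk = os.urandom(16)
    return sk, sk                                 # (sk, pk) -- symmetric stand-in

def sig_gen(sk, m):
    return hmac.new(sk, m, hashlib.sha256).digest()

def sig_ver(pk, m, s):
    return hmac.compare_digest(sig_gen(pk, m), s)

def exp_suf_cma(adversary):
    sk, pk = sig_key()
    queried = []                                  # (M, sigma) pairs the oracle returned
    def sign_oracle(m):
        s = sig_gen(sk, m)
        queried.append((m, s))
        return s
    tau = adversary(pk, sign_oracle)
    if tau is None:                               # tau = bottom
        return 0
    m, s = tau
    # Strong unforgeability: even a fresh signature on a queried message counts.
    return int(sig_ver(pk, m, s) and (m, s) not in queried)

def replay_adversary(pk, sign):
    s = sign(b"hello")
    return (b"hello", s)                          # replays an oracle answer: loses

print("forged:", exp_suf_cma(replay_adversary))   # 0 -- a replay is not a forgery
```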
A.3 HDH
Let GG be a group generator which generates a generator g and a group G of order |G|. Let k1, k2 ∈ N be polynomially related security parameters. Let H : {0,1}* → {0,1}^{k2} be a hash function. Consider the following two experiments:

Exp^{HDH0}_{H,A_{HDH0}}(k1, k2)
  (g, |G|) ← GG(k1)
  u, v ←R {1, ..., |G|}
  U ← g^u; V ← g^v
  b ←R {0, 1}
  if b = 1 then W ← H(g^{uv}) else W ←R {0,1}^{k2}
  return A_{HDH0}(U, V, W)

Exp^{HDH1}_{H,A_{HDH1}}(k1, k2)
  (g, |G|) ← GG(k1)
  x, u, v ←R {1, ..., |G|}
  X ← g^x; V ← g^v
  b ←R {0, 1}
  if b = 1 then W ← H(g^u || X^v) else W ←R {0,1}^{k2}
  return A_{HDH1}(u, X, V, W)
The advantage of an adversary A_{HDHX}(k1, k2) (for X = 0, 1) is defined as follows:

Adv^{HDHX}_{H,A_{HDHX}}(k1, k2) = |Pr[Exp^{HDHX}_{H,A_{HDHX}}(k1, k2) = 1 | b = 1] − Pr[Exp^{HDHX}_{H,A_{HDHX}}(k1, k2) = 1 | b = 0]|
The advantage function of the scheme is defined as follows:

Adv^{HDHX}_{H}(k1, k2, t) = max_A {Adv^{HDHX}_{H,A_{HDHX}}(k1, k2)},
where A_{HDHX} is any adversary with time complexity t. The HDHX assumption is that there exists a hash function H such that the advantage of any adversary A_{HDHX} with time complexity polynomial in (k1, k2) is negligible.
B Proofs of Theorems

B.1 Proof of Theorem 1
In the experiment we assume that an adversary A_{IND-CCA2} never queries the decryption oracle on ciphertexts received from the encryption oracle. This assumption is reasonable because the adversary already knows the plaintexts of those ciphertexts. Let an SS query be a decryption oracle query on a valid ciphertext which carries the sender's signature. A boolean variable ASK is true if at least one SS query exists. The advantage of an adversary comes from the following three cases:

Case (1) The hash function H does not look random.
Case (2) The hash function H looks random and no SS query exists.
Case (3) The hash function H looks random and at least one SS query exists.

If the advantage of the adversary is not negligible, we can use the adversary A to attack at least one building block with non-negligible advantage. We construct the attacking algorithms against the basic building blocks from the adversary in the following claims. Let O = O-HDH1(·).

Claim (1). Pr[x, u, v ←R {1, ..., |G|}; W ← H(g^u || g^{xv}) : C^O(u, g^x, g^v, W) = 1]
= 1/2 + Adv^{IND-CCA2}_{DHEtS,A}(k1, k2, t, qe, μe, qd, μd, lm) / 2
(Proof of Claim) We construct an algorithm C shown in Fig. 2 which distinguishes whether an input quadruple (u, X, V, W ) is a hash Diffie-Hellman quadruple or not with the adversary A’s advantage from Case (1). If an input quadruple is a hash Diffie-Hellman quadruple, the simulators of the encryption and decryption oracles in C are exactly those in the real experiment for A.
Algorithm C^{O-HDH1(·,·)}(u, X, V, W)
begin
  xs ← u; ys ← g^u; yr ← V; Ke ← W
  run A(find, ys, yr)
  - for each encryption query (M, y'), return Enc-Sim(M, y')
  - for each decryption query C', return Dec-Sim(C')
  - let (m0, m1, s) be the output of A
  b ←R {0, 1}
  c ← SYM.enc_{Ke}(mb)
  σ ← SIG.gen_{xs}(X||c)
  C ← ys||X||c||σ
  run A(guess, s, C)
  - for each encryption query (M, y'), return Enc-Sim(M, y')
  - for each decryption query C', return Dec-Sim(C')
  - let b' be the output of A
  if b' = b then return 1 else return 0
end

Subroutine Enc-Sim(M, y')
begin
  x ←R {1, ..., |G|}
  X ← g^x
  Ke ← H(ys || y'^x)
  c ← SYM.enc_{Ke}(M)
  σ ← SIG.gen_{xs}(X||c)
  C ← ys||X||c||σ
  return C
end

Subroutine Dec-Sim(C')
begin
  parse C' as y'||X'||c'||σ'
  if SIG.ver_{y'}(X'||c', σ') = 1 then
    if y' = ys ∧ X' = X then M ← SYM.dec_{Ke}(c')
    else Ke' ← O-HDH1(y', X'); M ← SYM.dec_{Ke'}(c')
    return M
  else return ⊥
end

Fig. 2. Algorithm C for attacking the HODH1 assumption using the IND-CCA2 adversary of DHEtS
Because C outputs 1 if and only if A guesses correctly, the probability that C outputs 1 on input of a hash Diffie-Hellman quadruple (u, g^x, g^v, H(g^u || g^{xv})) is identical with the advantage of A.

Claim (2). Pr[x, u, v ←R {1, ..., |G|}; W ←R {0,1}^{k2} : C^O(u, g^x, g^v, W) = 1 ∧ ¬ASK]
= 1/2 + Adv^{IND-CPA}_{SYM,B}(k2, t, 0, 0, lm) / 2
(Proof of Claim) We construct an algorithm B, shown in Fig. 3, which is an IND-CPA attacker against SYM with the adversary A's advantage from Case (2). If the input quadruple is not a hash Diffie-Hellman quadruple and no SS query exists, the simulators of the encryption and decryption oracles in C are exactly those in B. Because C outputs 1 if and only if A guesses correctly and B outputs whatever A outputs, the probability that C outputs 1 on the input of a random quadruple is identical with the advantage of B under the condition that A asks
Algorithm B(find)
begin
  xs, xr ←R {1, ..., |G|}
  ys ← g^{xs}; yr ← g^{xr}
  run A(find, ys, yr)
  - for each encryption query (M, y'), return Enc-Sim(M, y')
  - for each decryption query C', return Dec-Sim(C')
  - let (m0, m1, s) be the output of A
  s' ← (m0, m1, s, xs, ys, xr, yr)
  return (m0, m1, s')
end

Algorithm B(guess, c, s')
begin
  x ←R {1, ..., |G|}
  X ← g^x
  parse s' as (m0, m1, s, xs, ys, xr, yr)
  ASK ← false
  σ ← SIG.gen_{xs}(X||c)
  C ← ys||X||c||σ
  run A(guess, s, C)
  - for each encryption query (M, y'), return Enc-Sim(M, y')
  - for each decryption query C', return Dec-Sim(C')
  - if ASK = true then b' ←R {0, 1} else let b' be the output of A
  return b'
end

Subroutine Enc-Sim(M, y')
begin
  x ←R {1, ..., |G|}
  X ← g^x
  Ke ← H(ys || y'^x)
  c ← SYM.enc_{Ke}(M)
  σ ← SIG.gen_{xs}(X||c)
  C ← ys||X||c||σ
  return C
end

Subroutine Dec-Sim(C')
begin
  parse C' as y'||X'||c'||σ'
  if SIG.ver_{y'}(X'||c', σ') = 1 then
    if y' = ys then ASK ← true; return ⊥
    else Ke ← H(y' || X'^{xr}); M ← SYM.dec_{Ke}(c'); return M
  else return ⊥
end

Fig. 3. Algorithm B for attacking the IND-CPA security of SYM using the IND-CCA2 adversary of DHEtS
no SS query.

Pr[x, u, v ←R {1, ..., |G|}; W ←R {0,1}^{k2} : C^O(u, g^x, g^v, W) = 1 ∧ ¬ASK]
= 1/2 + Adv^{IND-CPA}_{SYM,B}(k2, t, 0, 0, lm) / 2
Claim (3). Pr[x, u, v ←R {1, ..., |G|}; W ←R {0,1}^{k2} : C^O(u, g^x, g^v, W) = 1 ∧ ASK]
≤ Adv^{SUF-CMA}_{SIG,F1}(k1, t, qe + 1, μe + (qe + 1)·(2·ly + lb) + lm)
(Proof of Claim) The probability that C outputs 1 on the input of a random quadruple is the adversary A’s advantage from Case (3) under the condition
Algorithm F1^{O-SIG.gen_{xs}(·)}(ys)
begin
  xr, x ←R {1, ..., |G|}
  yr ← g^{xr}; X ← g^x
  Ke ← H(ys || yr^x)
  run A(find, ys, yr)
  - for each encryption query (M, y'), return Enc-Sim(M, y')
  - for each decryption query C', return Dec-Sim(C')
  - let (m0, m1, s) be the output of A
  b ←R {0, 1}
  c ← SYM.enc_{Ke}(mb)
  σ ← O-SIG.gen_{xs}(X||c)
  C ← ys||X||c||σ
  run A(guess, s, C)
  - for each encryption query (M, y'), return Enc-Sim(M, y')
  - for each decryption query C', return Dec-Sim(C')
  - let b' be the output of A
  if no forgery occurred, return ⊥
end

Subroutine Enc-Sim(M, y')
begin
  x ←R {1, ..., |G|}
  X ← g^x
  Ke ← H(ys || y'^x)
  c ← SYM.enc_{Ke}(M)
  σ ← O-SIG.gen_{xs}(X||c)
  C ← ys||X||c||σ
  return C
end

Subroutine Dec-Sim(C')
begin
  parse C' as y'||X'||c'||σ'
  if SIG.ver_{y'}(X'||c', σ') = 1 then
    if y' = ys then output (X'||c', σ') as a forgery; halt
    else Ke ← H(y' || X'^{xr}); M ← SYM.dec_{Ke}(c'); return M
  else return ⊥
end

Fig. 4. Algorithm F1 for attacking the SUF-CMA security of SIG using the IND-CCA2 adversary of DHEtS
that there exists at least one SS query. We construct an algorithm F1, shown in Fig. 4, which is a SUF-CMA attacker against SIG. The probability that C outputs 1 under this condition is bounded by the advantage of F1:

Pr[x, u, v ←R {1, ..., |G|}; W ←R {0,1}^{k2} : C^O(u, g^x, g^v, W) = 1 ∧ ASK]
≤ Pr[ASK] ≤ Adv^{SUF-CMA}_{SIG,F1}(k1, t, qe + 1, μe + (qe + 1)·(2·ly + lb) + lm)
Now we are ready to prove the theorem.

Adv^{HODH1}_{H,C}(k1, k2, t, qd, qd·2·ly)
= |Pr[Exp^{HODH1}_{H,C}(k1, k2, t, qd, qd·2·ly) = 1 | b = 1] − Pr[Exp^{HODH1}_{H,C}(k1, k2, t, qd, qd·2·ly) = 1 | b = 0]|

Pr[Exp^{HODH1}_{H,C}(k1, k2, t, qd, qd·2·ly) = 1 | b = 1]
= Pr[x, u, v ←R {1, ..., |G|}; W ← H(g^u || g^{xv}) : C^O(u, g^x, g^v, W) = 1]

Pr[Exp^{HODH1}_{H,C}(k1, k2, t, qd, qd·2·ly) = 1 | b = 0]
= Pr[x, u, v ←R {1, ..., |G|}; W ←R {0,1}^{k2} : C^O(u, g^x, g^v, W) = 1]
= Pr[x, u, v ←R {1, ..., |G|}; W ←R {0,1}^{k2} : C^O(u, g^x, g^v, W) = 1 ∧ ¬ASK]
  + Pr[x, u, v ←R {1, ..., |G|}; W ←R {0,1}^{k2} : C^O(u, g^x, g^v, W) = 1 ∧ ASK]

By Claims (1), (2) and (3),

Adv^{HODH1}_{H,C}(k1, k2, t, qd, qd·2·ly)
≥ Adv^{IND-CCA2}_{DHEtS,A}(k1, k2, t, qe, μe, qd, μd, lm) / 2
  − Adv^{IND-CPA}_{SYM,B}(k2, t, 0, 0, lm) / 2
  − Adv^{SUF-CMA}_{SIG,F1}(k1, t, qe + 1, μe + (qe + 1)·(2·ly + lb) + lm)

Adv^{IND-CCA2}_{DHEtS,A}(k1, k2, t, qe, μe, qd, μd, lm)
≤ Adv^{IND-CPA}_{SYM,B}(k2, t, 0, 0, lm) + 2·Adv^{HODH1}_{H,C}(k1, k2, t, qd, qd·2·ly)
  + 2·Adv^{SUF-CMA}_{SIG,F1}(k1, t, qe + 1, μe + (qe + 1)·(2·ly + lb) + lm)

So the following inequality holds:

Adv^{IND-CCA2}_{DHEtS}(k1, k2, t, qe, μe, qd, μd, lm)
≤ 2·Adv^{HODH1}_{H}(k1, k2, t, qd, qd·2·ly) + Adv^{IND-CPA}_{SYM}(k2, t, 0, 0, lm)
  + 2·Adv^{SUF-CMA}_{SIG}(k1, t, qe + 1, μe + (qe + 1)·(2·ly + lb) + lm).
This proves the theorem. Note that for this theorem to be correct, k1 has to be polynomially related to k2. If not, there exists an algorithm B* whose time complexity is polynomial in k1 such that the advantage of B* attacking the IND-CPA security of SYM is non-negligible.
Algorithm F2^{O-SIG.gen_{xs}(·)}(ys)
begin
  run A(ys)
  - for each encryption query (M, y'), return Enc-Sim(M, y')
  - let (C', x', y') be the output of A
  if C' ≠ ⊥ then
    parse C' as ys||X'||c'||σ'
    return (X'||c', σ') as a forgery
  else return ⊥
end

Subroutine Enc-Sim(M, y')
begin
  x ←R {1, ..., |G|}
  X ← g^x
  Ke ← H(ys || y'^x)
  c ← SYM.enc_{Ke}(M)
  σ ← O-SIG.gen_{xs}(X||c)
  C ← ys||X||c||σ
  return C
end

Fig. 5. Algorithm F2 for attacking the SUF-CMA security of SIG using the SUF-CMA adversary of DHEtS
B.2 Proof of Theorem 2

We prove that DHEtS is SUF-CMA secure. If the advantage of an adversary A_{SUF-CMA} attacking SUF-CMA security is not negligible, we can construct an algorithm F2, shown in Fig. 5, attacking the SUF-CMA security of SIG with the same advantage. F2 makes a signing oracle query if and only if A_{SUF-CMA} makes an encryption oracle query. So if A_{SUF-CMA} forges a new valid ciphertext C' = ys||X'||c'||σ', then (X'||c', σ') is a new valid message-signature pair under ys.

Adv^{SUF-CMA}_{DHEtS,A}(k1, k2, t, qe, μe) = Adv^{SUF-CMA}_{SIG,F2}(k1, t, qe, μe + qe·(2·ly + lb))
Adv^{SUF-CMA}_{DHEtS}(k1, k2, t, qe, μe) ≤ Adv^{SUF-CMA}_{SIG}(k1, t, qe, μe + qe·(2·ly + lb))
New DSA-Verifiable Signcryption Schemes

Jun-Bum Shin^{1,2}, Kwangsu Lee^1, and Kyungah Shim^3

1 Softforum Co. LTD., Mirae-Building, 9th Floor, 1306-6, Seocho-dong, Seocho-gu, Seoul, 137-070, Korea
  {jbshin,kslee}@softforum.com
2 Department of EECS, KAIST, 373-1, Kusong-dong, Yusong-gu, Taejon 305-701, Korea
  [email protected]
3 KISA (Korea Information Security Agency), 78, Garak-Dong, Songpa-gu, Seoul 138-803, Korea
  [email protected]
Abstract. In this paper, we propose new DSA-verifiable signcryption schemes. At ICISC '01, Yum and Lee first introduced the need for public verifiability via standardized signature schemes and proposed a KCDSA-verifiable scheme [1]. However, no DSA-verifiable signcryption scheme has been proposed. Additionally, we show some potential weaknesses of the previous schemes proposed in [1, 2].
1 Introduction
Traditionally, the signature-then-encryption and signature-and-encryption approaches are used to achieve unforgeability, non-repudiation, and confidentiality. Signcryption, first proposed by Zheng [3], is a cryptographic primitive that performs signature generation and encryption at a lower cost than signature-then-encryption and signature-and-encryption. One of the shortcomings of Zheng's original schemes is that their non-repudiation procedure is more complex than that of signature-then-encryption or signature-and-encryption. Furthermore, Petersen and Michels showed that Zheng's idea for achieving non-repudiation seriously weakens confidentiality [4]. To achieve a simple and safe non-repudiation procedure, several lines of research have been pursued [1, 2]. In Bao and Deng's signcryption scheme [2], the signature can be verified with the sender's public key alone, without weakening the confidentiality of other messages. Furthermore, in Yum and Lee's scheme [1], the signature can be verified using a standardized signature scheme, KCDSA (Korea Certificate-based Digital Signature Algorithm, [5]). Our work is motivated by Yum and Lee's observation that standardization is one of the crucial factors for practical use of cryptosystems. We propose new signcryption schemes whose signatures can be verified using one of the most widely used standardized signature schemes, DSA (Digital Signature Algorithm, [6]). As shown in previous work [1, 2, 3], there are several technical difficulties in
constructing DSA-verifiable signcryption schemes. For example, it is non-trivial to apply Bao and Deng's technique [2] to DSA. To construct our DSA-verifiable signcryption schemes (SC-DSA and SC-DSA+), we propose a modified version of DSA (MDSA). Two important characteristics of MDSA are as follows: (1) MDSA is suitable for applying Bao and Deng's technique [2]; and (2) a recipient can easily recover the DSA signature from an MDSA signature. The same idea is used in Yum and Lee's construction of the KCDSA-verifiable signcryption scheme [1]. Our construction of SC-DSA and SC-DSA+ from MDSA is straightforward because of (1), and both are DSA-verifiable because of (2). Another contribution of this paper is the security analysis of previous schemes as well as ours. Bao and Deng [2], and Yum and Lee [1], claimed without proof that their signcryption schemes with public verifiability are semantically secure. However, their claims do not hold, because their signcryption schemes leak a small amount of information about a plaintext m. Since that information is essentially a hash value of m, their schemes are not semantically secure. We mention that the same problem can be found in SC-DSA, but not in the security-enhanced version of SC-DSA (SC-DSA+). The paper is structured as follows: notations and previous schemes are reviewed in Section 2. Our schemes, MDSA, SC-DSA, and SC-DSA+, are proposed and analyzed in Sections 3-5. We conclude in Section 6.
2 Preliminaries

2.1 Notations
The following notations will be used in this paper.
– a||b: the concatenation of two strings a and b.
– (p, q, g): a domain parameter, where p is a large prime, q is a large prime factor of p − 1, and g ∈ Z*p is a base element of order q.
– (xA, yA = g^{xA} mod p): the private and public keys of Alice.
– (xB, yB = g^{xB} mod p): the private and public keys of Bob.
– hash: a one-way hash function.
– (E, D): the encryption and decryption algorithms of some symmetric key cipher (secure against chosen plaintext attacks).
Throughout this paper, we will assume that Alice A is a sender and Bob B is a recipient.
2.2 DSA and SDSS1
We review two signature schemes, DSA [6] and Zheng's scheme SDSS1 [3].

DSA [6]
– Signing: Alice chooses x ∈ Z*q at random, sets
  1. k ← g^x mod p,
  2. r ← k mod q,
  3. h ← hash(m),
  4. s ← (h + xA·r)/x mod q,
  and sends (m, r, s) to Bob.
– Verifying: Bob sets
  1. h ← hash(m),
  2. e1 ← h/s mod q,
  3. e2 ← r/s mod q,
  4. k ← g^{e1}·yA^{e2} mod p,
  and checks whether r = k mod q.

SDSS1 [3]
– Signing: Alice chooses x ∈ Z*q at random, sets
  1. k ← g^x mod p,
  2. r ← hash(k||m),
  3. s ← x/(r + xA) mod q,
  and sends (m, r, s) to Bob.
– Verifying: Bob sets
  1. k ← (yA·g^r)^s mod p,
  and checks whether r = hash(k||m).
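For concreteness, here is a runnable sketch of the DSA signing and verifying equations above over a toy group; the parameters (p, q, g) and keys are illustrative assumptions only and offer no security.

```python
import hashlib, random

# Textbook DSA over a toy group (p, q, g) -- parameters for illustration only.
p, q = 283, 47                        # q = 47 divides p - 1 = 282 = 6 * 47
g = pow(2, (p - 1) // q, p)           # an element of order q

def Hq(m: bytes) -> int:
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % q

def dsa_sign(xA, m):
    while True:
        x = random.randrange(1, q)    # per-signature secret
        r = pow(g, x, p) % q          # r = (g^x mod p) mod q
        if r == 0:
            continue
        s = (Hq(m) + xA * r) * pow(x, -1, q) % q   # s = (h + xA*r)/x mod q
        if s != 0:
            return r, s

def dsa_verify(yA, m, r, s):
    h = Hq(m)
    e1 = h * pow(s, -1, q) % q        # e1 = h/s mod q
    e2 = r * pow(s, -1, q) % q        # e2 = r/s mod q
    k = pow(g, e1, p) * pow(yA, e2, p) % p
    return r == k % q

xA = 13
yA = pow(g, xA, p)
r, s = dsa_sign(xA, b"price list")
assert dsa_verify(yA, b"price list", r, s)
```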
2.3 Bao and Deng's Signcryption Scheme

Bao and Deng's scheme based on SDSS1 [2]
– Signcrypting: Alice chooses x ∈ Z*q at random, sets
  1. k ← g^x mod p,
  2. K ← yB^x mod p,
  3. Kenc ← hash(K),
  4. c ← E_{Kenc}(m),
  5. r ← hash(m||k),
  6. s ← x/(r + xA) mod q,
  and sends (c, r, s) to Bob.
– Unsigncrypting: Bob sets
  1. k ← (yA·g^r)^s mod p,
  2. K ← k^{xB} mod p,
  3. Kenc ← hash(K),
  4. m ← D_{Kenc}(c),
  and checks whether r = hash(m||k).
Bao and Deng’s scheme is an extension to Zheng’s scheme [3]. When Bao and Deng’s scheme is used, a third party can verify Alice’s signature on a message m if Bob publishes a message m and a signature (r, s). Generalization of Bao and Deng’s technique When Bao and Deng’s scheme is used, a verifier cannot verify Alice’s signature using SDSS1 because the order of m and k in hash(m||k) is different from that of SDSS1. However, their scheme can be easily modified so that the verifier can verify the signature using SDSS1 if we change hash(m||k) to hash(k||m). Definition 1 (SIG-verifiability). For a given signature scheme SIG, we say that a signcryption scheme is SIG-verifiable if a recipient Bob can recover Alice’s signature on a message m that can be verified using SIG. Bao and Deng’s technique can be generalized so that any signature scheme SIG satisfying Req 1 can be transformed to SIG-verifiable signcryption scheme, where Req 1. A recipient can recover k = g x mod p without the knowledge of a message m For example, SDSS1 satisfies Req 1, because k = (yA g r )s mod p = (g xA g r )x/(r+xA ) mod p = g x mod p Below, we assume that SIG satisfies Req 1. Then the following shows how SIG can be transformed to SIG-verifiable signcryption scheme. – Signcrypting: (S-Step 1) Choose x ∈ Zq∗ at random, x mod p) (S-Step 2) Compute c = EKenc (m), where Kenc = hash(yB (S-Step 3) Compute sig = SignA,x (m), where SignA,x(m) represents Alice’s signature on a message m with random x. and sends (c, sig) to Bob – Unsigncrypting: (U-Step 1) Compute k = g x mod p as defined in SIG1 . (U-Step 2) Compute m = DKenc (c), where Kenc = hash(k xB mod p) (U-Step 3) Verify sig as defined in SIG except the use of value k = g x mod p computed in (U-Step 1). – Dispute handling: Bob sends m and sig to the verifier. Note that, if Bob opens m, then the verifier can check whether sig is a valid signature on a message m or not, using the verification algorithm defined in SIG. Remark 1. We cannot apply this technique directly to DSA, because DSA does e2 mod p, but Bob cannot not satisfy Req 1. In DSA, k = g x mod p = g e1 yA compute e1 = hash(m)/s mod q without the knowledge of a message m. 1
^1 (U-Step 1) cannot be defined if SIG does not satisfy Req 1.
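The transform can be exercised end to end with SDSS1, which satisfies Req 1. The sketch below is an illustration under our own assumptions: a toy group, and a hash-stream cipher standing in for (E, D); it follows S-Steps 1-3 and U-Steps 1-3 literally.

```python
import hashlib, random

p, q = 283, 47
g = pow(2, (p - 1) // q, p)

def Hq(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def stream_xor(key: int, m: bytes) -> bytes:          # stands in for both E and D
    ks = hashlib.sha256(str(key).encode()).digest()
    return bytes(a ^ b for a, b in zip(m, ks))

def signcrypt(xA, yB, m: bytes):
    while True:
        x = random.randrange(1, q)                    # S-Step 1
        k = pow(g, x, p)
        r = Hq(str(k).encode() + m)
        if (r + xA) % q == 0:
            continue                                  # avoid a non-invertible r + xA
        s = x * pow((r + xA) % q, -1, q) % q          # S-Step 3: SDSS1 signature
        c = stream_xor(pow(yB, x, p), m)              # S-Step 2: key from yB^x
        return c, (r, s)

def unsigncrypt(xB, yA, c, sig):
    r, s = sig
    k = pow(yA * pow(g, r, p) % p, s, p)              # U-Step 1: recover g^x (Req 1)
    m = stream_xor(pow(k, xB, p), c)                  # U-Step 2: key from k^xB
    assert r == Hq(str(k).encode() + m)               # U-Step 3: verify as in SDSS1
    return m, sig                                     # (m, sig) goes to any verifier

xA, xB = 9, 21
yA, yB = pow(g, xA, p), pow(g, xB, p)
c, sig = signcrypt(xA, yB, b"deal at 42")
m, _ = unsigncrypt(xB, yA, c, sig)
assert m == b"deal at 42"
```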
3 Modified DSA
We propose and analyze a modified version of DSA (MDSA).

3.1 MDSA Specification
The goals of our modifications to DSA are twofold: (1) to apply Bao and Deng's technique, MDSA should satisfy Req 1; and (2) to achieve DSA verifiability, MDSA should satisfy Req 2, where

Req 2. A recipient can easily recover the DSA signature from an MDSA signature.

MDSA is defined as follows:

– Signing: Alice chooses x ∈ Z*q at random, sets
  1. k ← g^x mod p,
  2. r ← k mod q,
  3. h ← hash(m),
  4. s ← (h + xA·r)/x mod q,
  5. e1 ← h/s mod q and e2 ← r/s mod q,
  and sends (m, e1, e2) to Bob.
– Verifying: Bob sets
  1. k ← g^{e1}·yA^{e2} mod p,
  2. r ← k mod q,
  3. s ← r/e2 mod q,
  4. h ← hash(m),
  and checks whether e1·s ≡ h (mod q).

MDSA is very similar to DSA except that the sender also computes e1 and e2.
3.2 Analysis
From our construction, it is easy to see that MDSA satisfies Req 1, because

k = g^{e1}·yA^{e2} mod p = g^{h/s}·g^{xA·r/s} mod p = g^{(h+xA·r)/((h+xA·r)/x)} mod p = g^x mod p.
For Req 2, we prove a stronger result, namely that MDSA is strongly equivalent to DSA, similar to Yum and Lee's work [1].

Definition 2 (Strong equivalence, [7]). Two signature schemes are called strongly equivalent if a signature of the first scheme can be transformed efficiently into a signature of the second scheme and vice versa, without knowledge of the private key.

By definition, if MDSA is strongly equivalent to DSA, then we get the following two facts: (1) Req 2 is satisfied, and (2) the security of MDSA is the same as that of DSA.
Theorem 1. MDSA is strongly equivalent to DSA.

Proof: Consider the following two directions:

– MDSA ⇒ DSA: We suppose that (m, e1, e2) is a valid MDSA signature, i.e.,

  hash(m) ≡ (e1/e2)·((g^{e1}·yA^{e2} mod p) mod q) (mod q).   (1)

  Then (m, α = (g^{e1}·yA^{e2} mod p) mod q, β = α/e2 mod q) is a valid DSA signature, i.e.,

  α = (g^{hash(m)/β}·yA^{α/β} mod p) mod q   (2)

  because
  (1) hash(m) ≡ (e1/e2)·α (mod q) from Equation 1;
  (2) hash(m)/β ≡ hash(m)/(α/e2) ≡ e1 (mod q) because of (1);
  (3) g^{hash(m)/β} = g^{e1} mod p because of (2);
  (4) yA^{α/β} ≡ yA^{e2} (mod p) because β = α/e2 mod q;
  (5) g^{e1}·yA^{e2} mod p = g^{hash(m)/β}·yA^{α/β} mod p because of (3) and (4); so
  (6) α = (g^{e1}·yA^{e2} mod p) mod q = (g^{hash(m)/β}·yA^{α/β} mod p) mod q because of (5).
– DSA ⇒ MDSA: By similar arguments as for MDSA ⇒ DSA, if (m, r, s) is a valid DSA signature, then (m, e1 = hash(m)/s mod q, e2 = r/s mod q) is a valid MDSA signature. ✷
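Since the conversion in both directions uses only public values, it can be checked mechanically. The small Python sketch below (toy parameters, our own illustrative choices) verifies the round trip DSA → MDSA → DSA.

```python
import hashlib, random

# Strong equivalence: hop between DSA and MDSA signatures using public data.
p, q = 283, 47
g = pow(2, (p - 1) // q, p)
Hq = lambda m: int.from_bytes(hashlib.sha256(m).digest(), "big") % q

def dsa_to_mdsa(m, r, s):
    return Hq(m) * pow(s, -1, q) % q, r * pow(s, -1, q) % q   # (e1, e2)

def mdsa_to_dsa(yA, m, e1, e2):
    alpha = pow(g, e1, p) * pow(yA, e2, p) % p % q            # recovered r
    beta = alpha * pow(e2, -1, q) % q                         # s = r / e2 mod q
    return alpha, beta

xA = 13
yA = pow(g, xA, p)
m = b"press release"
while True:                                   # make one toy DSA signature
    x = random.randrange(1, q)
    r = pow(g, x, p) % q
    if r == 0:
        continue
    s = (Hq(m) + xA * r) * pow(x, -1, q) % q
    if s:
        break

e1, e2 = dsa_to_mdsa(m, r, s)
assert (r, s) == mdsa_to_dsa(yA, m, e1, e2)   # round trip lands on the same (r, s)
```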
4 SC-DSA

We propose a DSA-verifiable signcryption scheme, SC-DSA.

4.1 SC-DSA Specification
MDSA satisfies Req 1, so we can apply Bao and Deng's technique to obtain an MDSA-verifiable signcryption scheme. Since MDSA also satisfies Req 2, changing only the dispute handling phase yields a signcryption scheme that achieves DSA verifiability. SC-DSA is defined as follows:

– Signcrypting: Alice chooses x ∈ Z*q at random, sets
  1. K ← yB^x mod p,
  2. Kenc ← hash(K),
  3. c ← E_{Kenc}(m),
  4. k ← g^x mod p,
  5. r ← k mod q,
  6. h ← hash(m),
  7. s ← (h + xA·r)/x mod q,
  8. e1 ← h/s mod q and e2 ← r/s mod q,
  and sends (c, e1, e2) to Bob.
– Unsigncrypting: Bob sets
  1. k ← g^{e1}·yA^{e2} mod p,
  2. K ← k^{xB} mod p,
  3. Kenc ← hash(K),
  4. m ← D_{Kenc}(c),
  5. r ← k mod q,
  6. s ← r/e2 mod q,
  7. h ← hash(m),
  and checks whether e1·s ≡ h (mod q).
– Dispute handling: Bob sets
  1. r ← (g^{e1}·yA^{e2} mod p) mod q,
  2. s ← r/e2 mod q,
  and sends (m, r, s) to the verifier^2.

Note that anyone can verify whether (r, s) is a valid signature on the message m using the verification algorithm defined in DSA, as shown in the proof of Theorem 1.
4.2 Analysis
We consider the three goals of signcryption schemes proposed by Zheng [3]: unforgeability, non-repudiation, and confidentiality.

Unforgeability and non-repudiation. Since Bao and Deng's technique is used in our construction of SC-DSA, both unforgeability and non-repudiation are satisfied if MDSA satisfies them. So if DSA is secure, then SC-DSA achieves both of them, because MDSA is strongly equivalent to DSA.

Remark 2. The security of DSA has not been proven yet, but proving the security of DSA is beyond the scope of this paper.

Confidentiality. When the sender Alice's key is not exposed, the confidentiality of SC-DSA is similar to that of CIPHER1:

CIPHER1_B(m) = (k = g^x mod p, c = E_{Kenc}(m), h = hash(m) mod q)

where x is a random number chosen by the sender Alice, Kenc = hash(yB^x mod p), and yB is the recipient Bob's public key.
Theorem 2. If Alice’s key is not exposed, then the confidentiality of SC-DSA is the same as that of CIPHER1, in the sense of semantic security. Proof: Our proof is very similar to Zheng’s proof of confidentiality of his signcryption scheme [3]. We suppose that AS and AC are attackers for SC-DSA and CIPHER1 respectively. Then we will show that one attacker is converted to other attacker with a little modification. 2
^2 The computation of r and s is not required if Bob stores the r and s computed in Steps 5 and 6 during the unsigncryption phase.
First, suppose that there exists a probabilistic polynomial-time attacker AC(p, q, g, yB, k = g^x mod p, c = E_{Kenc}(m), h = hash(m) mod q) against CIPHER1, where Kenc = hash(yB^x mod p). Then a probabilistic polynomial-time attacker against SC-DSA is constructed as follows:

Algorithm AS(p, q, g, yB, yA, c, e1, e2)
  k ← g^{e1}·yA^{e2} mod p,
  r ← k mod q,
  s ← r/e2 mod q,
  h ← e1·s mod q,
  return AC(p, q, g, yB, k, c, h).

Since the input of AC is a valid one, if AC gets some partial information about m (INFO_C(m)), then AS also gets INFO_C(m). Now suppose that there exists a probabilistic polynomial-time attacker AS(p, q, g, yB, yA, c, e1, e2) against SC-DSA. Then a probabilistic polynomial-time attacker against CIPHER1 is constructed as follows:

Algorithm AC(p, q, g, yB, k, c, h)
  r ← k mod q,
  choose s ∈ Z*q at random,
  y'A ← (k^s / g^h)^{1/r} mod p,
  e1 ← h/s mod q,
  e2 ← r/s mod q,
  return AS(p, q, g, yB, y'A, c, e1, e2).

Since the input of AS is a valid one, if AS gets some partial information about m (INFO_S(m)), then AC also gets INFO_S(m). ✷

Remark 3. Theoretically,
1. SC-DSA does not provide semantic security, because CIPHER1 is not semantically secure. Note that an attacker can distinguish two messages m1 and m2 using the value h = hash(m) mod q in the ciphertext of CIPHER1.
2. Bao and Deng [2], and Yum and Lee [1], claimed that their signcryption schemes with public verifiability are semantically secure. However, their schemes are not semantically secure, just like our SC-DSA. Note that, when Bao and Deng's technique is used, an attacker can see the signature part sig of a message m. Since anyone can verify whether sig is a valid signature on m, the attacker can distinguish m1 and m2 using sig^3.

Remark 4. Practically,
1. SC-DSA provides sufficient confidentiality of a message m when m is not guessable. Note that CIPHER1 does not leak any information about m
^3 The same problem can be found in the traditional signature-and-encryption approach [8].
except hash(m) mod q, because the following cipher is semantically secure against chosen plaintext attacks [9]:

CIPHER2_B(m) = (k = g^x mod p, c = E_{Kenc}(m))

where Kenc = hash(yB^x mod p).
2. Bao and Deng's scheme [2] and Yum and Lee's scheme [1] also provide sufficient confidentiality of a message m when m is not guessable, by the same argument as for SC-DSA.
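To see why the leaked component breaks semantic security in the theoretical sense of Remark 3, consider this tiny distinguisher; the modulus and messages are our own illustrative choices.

```python
import hashlib

# Why CIPHER1 (and hence SC-DSA) is not semantically secure: the component
# h = hash(m) mod q lets anyone test a candidate plaintext.
q = 47
Hq = lambda m: int.from_bytes(hashlib.sha256(m).digest(), "big") % q

def distinguish(m0: bytes, m1: bytes, h_leaked: int) -> int:
    """Given the leaked h for one of two known messages, name which one."""
    return 0 if Hq(m0) == h_leaked else 1

m0, m1 = b"buy", b"sell"
if Hq(m0) != Hq(m1):                          # works except on a rare mod-q collision
    assert distinguish(m0, m1, Hq(m1)) == 1   # the challenge bit is recovered
    assert distinguish(m0, m1, Hq(m0)) == 0
```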
5 SC-DSA+

We propose a security-enhanced version of SC-DSA (SC-DSA+).

5.1 SC-DSA+ Specification

Note that SC-DSA leaks a small amount of information about a message m, namely a value similar to hash(m). The problem can be fixed if (1) we obtain a value Kmac that is hidden from the attacker, and (2) we change hash(m) to hash(m||Kmac). Additionally, to apply the proof technique used in Baek et al.'s work [10], we hash them together with bindA,B, which represents the concatenation of the two identities of the sender Alice and the recipient Bob. SC-DSA+ is defined as follows:

– Signcrypting: Alice chooses x ∈ Z*q at random, sets
  1. K ← yB^x mod p,
  2. (Kenc, Kmac) ← hash(K),
  3. c ← E_{Kenc}(m),
  4. k ← g^x mod p,
  5. r ← k mod q,
  6. h ← hash(m||bindA,B||Kmac),
  7. s ← (h + xA·r)/x mod q,
  8. e1 ← h/s mod q and e2 ← r/s mod q,
  and sends (c, e1, e2) to Bob.
– Unsigncrypting: Bob sets
  1. k ← g^{e1}·yA^{e2} mod p,
  2. K ← k^{xB} mod p,
  3. (Kenc, Kmac) ← hash(K),
  4. m ← D_{Kenc}(c),
  5. r ← k mod q,
  6. s ← r/e2 mod q,
  7. h ← hash(m||bindA,B||Kmac),
  and checks whether e1·s ≡ h (mod q).
– Dispute handling: Bob sets
  1. r ← (g^{e1}·yA^{e2} mod p) mod q,
  2. s ← r/e2 mod q,
  and sends (m||bindA,B||Kmac, r, s) to the verifier.

Remark 5. When SC-DSA+ is used, the DSA signature is on m||bindA,B||Kmac instead of on m as in SC-DSA. However, this is not a burden for SC-DSA+, because the verifier can simply regard the last |bindA,B||Kmac| bits of m||bindA,B||Kmac as garbage after verifying the signature, so the meaning of the message m is not changed.
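A short sketch of the key-splitting idea follows; the digest sizes, encodings, and the way hash(K) is split into (Kenc, Kmac) are our own illustrative assumptions rather than the paper's exact choices.

```python
import hashlib

# How SC-DSA+ plugs the leak: split hash(K) into (Kenc, Kmac) and bind Kmac
# into the signed hash, so h reveals nothing without the Diffie-Hellman key K.
def derive_keys(K: int):
    d = hashlib.sha256(str(K).encode()).digest()
    return d[:16], d[16:]                      # (Kenc, Kmac)

def tag_hash(m: bytes, bind_ab: bytes, kmac: bytes, q: int = 47) -> int:
    h = hashlib.sha256(m + b"|" + bind_ab + b"|" + kmac).digest()
    return int.from_bytes(h, "big") % q        # h = hash(m || bind_AB || Kmac) mod q

kenc, kmac = derive_keys(12345)                # K = yB^x mod p in the real scheme
h = tag_hash(b"deal", b"Alice|Bob", kmac)
# Without kmac, an attacker cannot recompute h to test candidate plaintexts.
```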
5.2 Analysis
SC-DSA+ satisfies both unforgeability and non-repudiation because SC-DSA does. So we consider only confidentiality. We adopt as the confidentiality notion for signcryption IND-CCA2 in the Flexible Unsigncryption Oracle model:

Definition 3 (FUO-IND-CCA2, [10]). Let SC = (GK, SC, USC) be a signcryption scheme and let A be an adversary that conducts adaptive chosen ciphertext attacks. Then SC is FUO-IND-CCA2 secure if, for every polynomial-time adversary A,

Pr[(pkA, skA) ←R GK(1^k); (pkB, skB) ←R GK(1^k);
   (m0, m1, z) ← A^{SCO,USCO}(find, pkA, pkB);
   b ←R {0, 1}; c ← SC_{skA,pkB}(mb) :
   A^{SCO,USCO}(guess, c, z) = b] ≤ 1/2 + neg(k)

where SCO is a signcryption oracle that signcrypts a message with access to skA and pkB; USCO is an unsigncryption oracle that unsigncrypts a ciphertext with access to skB; and neg(·) is a negligible function.

Baek, Steinfeld and Zheng showed that if the Gap Diffie-Hellman (GDH) problem is hard and the symmetric encryption scheme in Zheng's scheme is IND-CPA secure, then a modified Zheng's scheme is FUO-IND-CCA2 secure [10]. Using the same proof technique, we can show that SC-DSA+ is FUO-IND-CCA2 secure.

Theorem 3. If the GDH problem is hard, DSA is unforgeable against chosen message attacks, and the symmetric encryption scheme is IND-CPA secure, then SC-DSA+ is FUO-IND-CCA2 secure in the random oracle model.

Proof Sketch: The proof is almost the same as Baek, Steinfeld and Zheng's [10]. They proved the security of the modified Zheng's scheme, which is defined as follows:

Modified Zheng's Signcrypt(xA, yB, m)
  x ←R Z*q
  K ← yB^x mod p
  Kenc ← hash(K)
  c ← E_{Kenc}(m)
  r ← hash(m||bindA,B||K)
  s ← x/(r + xA) mod q
  return (c, r, s)

First, we show that the output of SC-DSA+ can be converted into an output similar to that of the modified Zheng's scheme. Let (c, e1, e2) be the output of SC-DSA+; then we can derive a new output (c, h, s) using algorithm T:
Algorithm T(yA, yB, c, e1, e2)
  k ← g^{e1}·yA^{e2} mod p
  r ← k mod q
  s ← r/e2 mod q
  h ← e1·s mod q
  return (c, h, s)

Then h = hash(m||bindA,B||Kmac) and s = (h + xA·r)/x mod q from our definition of SC-DSA+. Note that we derived this output using only publicly known information, and the derived output is similar to the output of Zheng's scheme. Moreover, there exists an algorithm T^{-1} such that (c, e1, e2) = T^{-1}(yA, yB, c, h, s). Two differences remain: the input of the message hash and the value of s in SC-DSA+ differ from those in Zheng's scheme. The difference in the value of s does not matter in the proof. So we only have to make the input of the hash in SC-DSA+ similar to that of the hash used in Zheng's scheme. To resolve the difference in the message hash's input, we define a new message hash as follows:

Algorithm hash'(m, bindA,B, K)
  (Kenc, Kmac) ← hash(K)
  h ← hash(m||bindA,B||Kmac)
  return h

Note that hash' is just another representation of the original hash used in SC-DSA+, because each operation of hash' already exists in SC-DSA+'s signcrypt and unsigncrypt algorithms. Therefore, we can consider hash' to be used in SC-DSA+ instead of hash, and this implies that we can use the proof method of Baek, Steinfeld and Zheng. ✷
6 Conclusion

In this paper, we proposed two signcryption schemes, SC-DSA and SC-DSA+, whose signatures can be verified using DSA. We believe that standardization is one of the crucial factors for practical use of cryptosystems, so the result of this paper is of practical importance, because DSA is one of the most widely used standardized signature schemes. We analyzed our schemes together with some previous ones and showed that (1) for unforgeability and non-repudiation, both SC-DSA and SC-DSA+ achieve them if and only if DSA does so; and (2) for confidentiality,
– SC-DSA+ satisfies Baek, Steinfeld and Zheng's recent security definition (FUO-IND-CCA2, [10]); but
– Bao and Deng's scheme [2], Yum and Lee's KCDSA-verifiable scheme [1], and our SC-DSA are not semantically secure, whereas all of them provide sufficient security when a message m is not guessable.
Table 1. Comparison of signcryption schemes

                              Zheng   Bao&Deng   KCDSA(II)   SC-DSA      SC-DSA+     StE*        SaE*
                              ([3])   ([2])      ([1])
# of modular exponentiations  3       5          5           5           5           6           6
# of modular divisions        1       1          0           2           2           3           3
Public verifiability          No      SDSS1**    KCDSA       DSA         DSA         DSA         DSA
                                                 (Standard)  (Standard)  (Standard)  (Standard)  (Standard)
Conf. of guessable m          secure  insecure   insecure    insecure    secure      secure      insecure
Conf. of unguessable m        secure  secure     secure      secure      secure      secure      secure

* We consider the composition of DSA and DHIES [11].
** Bao and Deng's scheme achieves SDSS1-verifiability with small modifications, as mentioned in Section 2.3.
The comparison results are summarized in Table 1. We mention that both SC-DSA and SC-DSA+ do not satisfy forward secrecy with respect to the sender, because, when the sender Alice's private key xA is exposed, an attacker can recover x from e1 and e2. But they can achieve forward secrecy with small modifications: (1) we change the signcryption text from (c, e1, e2) to (k, c'), where k = g^x mod p, c' = E_{Kenc}(m, e1, e2), and Kenc is the hash value of k^{xB} mod p; and (2) the recipient Bob checks whether k = g^{e1}·yA^{e2} mod p during the unsigncryption phase. From the security of ElGamal-like encryption schemes, an attacker cannot see e1 and e2 even if the attacker knows the sender Alice's private key xA. We claim that these variants of SC-DSA and SC-DSA+ satisfy both forward secrecy and DSA verifiability. As further research, an analysis of our signcryption schemes in An, Dodis, and Rabin's recent model [8] will be considered.
Acknowledgements

We would like to thank Byungkuk Seo, Dae Hyun Yum, Pil Joong Lee, and the anonymous referees for helpful discussions. This work was supported by the Korea Information Security Agency (KISA) under contract R&D project 2001S-092.
References

[1] D. Yum and P. Lee, "New signcryption schemes based on KCDSA," in The 4th International Conference on Information Security and Cryptology, pp. 341-354, Springer-Verlag, LNCS 2288, 2001.
[2] F. Bao and R. H. Deng, "A signcryption scheme with signature directly verifiable by public key," in PKC'98, pp. 55-59, Springer-Verlag, LNCS 1431, 1998.
[3] Y. Zheng, "Digital signcryption or how to achieve cost(signature & encryption) << cost(signature) + cost(encryption)," in Crypto'97, pp. 165-179, Springer-Verlag, LNCS 1294, 1997.
[4] H. Petersen and M. Michels, "Cryptanalysis and improvement of signcryption schemes," IEE Proc. - Computers and Digital Techniques, vol. 145, no. 2, pp. 149-151, 1998.
[5] TTAS, "Digital Signature Mechanism with Appendix - Part 2: Certificate-Based Digital Signature Algorithm (KCDSA)," TTAS.KO-12.0001/R1, 1998.
[6] NIST (National Institute of Standards and Technology), "Digital Signature Standard (DSS)," FIPS PUB 186, 1994.
[7] K. Nyberg and R. A. Rueppel, "Message recovery for signature schemes based on the discrete logarithm problem," in Eurocrypt'94, pp. 182-193, Springer-Verlag, LNCS 950, 1994.
[8] J. H. An, Y. Dodis, and T. Rabin, "On the Security of Joint Signature and Encryption," in Advances in Cryptology - EUROCRYPT 2002 Proceedings, Springer-Verlag, LNCS 2332, 2002.
[9] M. Bellare and P. Rogaway, "Random oracles are practical: A paradigm for designing efficient protocols," in Proceedings of the 1st ACM Conference on Computer and Communications Security, pp. 62-73, 1993.
[10] J. Baek, R. Steinfeld, and Y. Zheng, "Formal Proofs for the Security of Signcryption," in Proceedings of Public Key Cryptography 2002, Springer-Verlag, LNCS 2274, pp. 80-98, 2002.
[11] M. Abdalla, M. Bellare, and P. Rogaway, "DHIES: An Encryption Scheme Based on the Diffie-Hellman Problem," in IEEE P1363a, ANSI X9.63EC, and SECG. Available at http://www-cse.ucsd.edu/users/mihir/papers/dhies.html.
Convertible Group Undeniable Signatures

Yuh-Dauh Lyuu^1 and Ming-Luen Wu^2

1 Dept. of Computer Science & Information Engineering and Dept. of Finance, National Taiwan University, Taiwan
  [email protected]
2 Dept. of Computer Science & Information Engineering, National Taiwan University, Taiwan
  [email protected]
Abstract. Group undeniable signatures are like ordinary group signatures except that verifying signatures needs the help of the group manager. In this paper, we propose a convertible group undeniable signature scheme in which the group manager can turn all or selective signatures, which are originally group undeniable signatures, into ordinary group signatures without compromising security of the secret key needed to generate signatures. The proposed scheme also allows the group manager to delegate the ability to confirm and deny to a limited set of parties without providing them the capability of generating signatures. For business applications, convertible group undeniable signatures can be widely used to validate price lists, press release or digital contracts when the signatures are commercially sensitive or valuable to a competitor. Our scheme is unforgeable, signature-simulatable and coalition-resistant. The confirmation and denial protocols are also zero-knowledge. Furthermore, the time, space and communication complexity are independent of the group size.
1 Introduction

In electronic life, digital signatures are used to verify whether a message really comes from the alleged signer. Like human signatures, standard digital signatures must be non-repudiatable and universally verifiable. However, universal verifiability might not suit circumstances under which verifying a signature is a valuable action. Chaum and van Antwerpen [5] initiate an undeniable signature scheme in which anyone must interact with the signer to verify a valid signature, and the signer can disavow an invalid signature through a denial protocol. The important property of non-repudiation still holds, because the signer cannot disavow a signature through the denial protocol unless the signature is indeed invalid. With undeniable signatures, anyone needs the cooperation of the signer to verify signatures. This is not satisfactory, because the signer might pass away or be occupied. Boyar et al. [2] first introduce the concept of convertible undeniable signatures: by releasing appropriate verification keys, the signer can
turn all or selective signatures, which are originally undeniable signatures, into ordinary digital signatures without compromising the security of the secret key needed to generate signatures. The convertible schemes in [2, 7] consider converting valid undeniable signatures into universally verifiable ones. Michels and Stadler [9] present a convertible undeniable signature scheme in which the signer can not only convert valid undeniable signatures into ordinary signatures, but also convert invalid undeniable signatures into universally verifiable statements of that fact.

A group signature scheme allows a group member to sign messages on behalf of the group without revealing his or her identity. Nevertheless, in case of a later dispute, a designated group manager can open the signature, thus tracing the signer. At the same time, no one, including the group manager, can misattribute a valid signature. The concept of group signature schemes was initiated by Chaum and van Heyst [6], while Camenisch and Stadler [3] present the first scheme in which the size of the public key and signatures is independent of the group size. Analogous to standard digital signatures, group signatures are both non-repudiatable and universally verifiable.

A group undeniable signature is like an ordinary group signature except that verifying signatures needs the help of the group manager. In this paper, we propose a convertible group undeniable signature scheme in which the group manager can convert all or selective signatures, which are originally group undeniable signatures, into universally verifiable ones without compromising the security of the signing key. The notion of convertible group undeniable signatures combines those of group signatures [6] and convertible undeniable signatures [9]. The proposed scheme also allows the group manager to delegate the ability to confirm and deny to a limited set of parties without providing them the capability of generating signatures. Our scheme is based on signatures of knowledge [3] and undeniable signature schemes [4]. We can show that the present scheme is existentially unforgeable against adaptive chosen message attacks and is both signature-simulatable and coalition-resistant, under reasonable number-theoretical complexity assumptions and in the random oracle model [1]. The signature confirmation and denial protocols can be made zero-knowledge by applying commitment techniques.

This paper is organized as follows. In Section 2, the convertible group undeniable signature model is introduced. Then, in Section 3, useful facts and assumptions in number theory are presented. Section 4 defines basic signatures of knowledge. Section 5 describes our scheme and discusses its security. Conclusions are given in Section 6.
2 Model

In this section we give the definition of a convertible group undeniable signature scheme, the related security requirements, and the significant efficiency considerations. First, we define group undeniable signature schemes. A group undeniable signature scheme consists of the following six components:
System Setup: The group secret and group public keys are generated for the group manager.
Join: To become a group member, a user generates his secret key and membership key, and registers the membership key with the group manager. Then the group manager sends him the membership certificate.
Sign: A group member can sign messages using his secret key, his membership certificate, and the group public key.
Signature Confirmation Protocol: Verifying a signature requires interacting with the group manager.
Signature Denial Protocol: The group manager can prove to anyone that an invalid signature is invalid through a signature denial protocol.
Open: The group manager can trace the identity of the member who actually signed a given message.

A convertible group undeniable signature scheme is a group undeniable signature scheme with the following additional components:

Individual Receipt Generation: Given a message, an alleged signature and the group secret key, the group manager can generate an individual receipt by which anyone can verify whether the alleged signature is valid or not. A group undeniable signature can be converted into an ordinary group signature by releasing its individual receipt.
Individual Verification: Given a message, an alleged signature, an individual receipt, and the group public key, one can check whether the receipt is valid with respect to the alleged signature; if it is, the alleged signature can be verified using the receipt.
Universal Receipt Generation: With the group secret key, the group manager can generate a universal receipt by which anyone can verify whether signatures are valid or not. A group undeniable signature scheme can be totally converted into an ordinary group signature scheme by releasing the universal receipt.
Universal Verification: With the group public key, one can check whether a given universal receipt is valid. Suppose the receipt is valid; given a message and an alleged signature, anyone can then verify the signature using the receipt.

In general, a group undeniable signature scheme has the following security considerations:

Unforgeability: Only group members can sign on behalf of the group.
Unlinkability: No one except the group manager can recognize whether two different signatures were generated by the same group member.
Anonymity: No one except the group manager can identify the signer.
Non-transferability: No one can prove the validity or invalidity of signatures except the group manager.
Zero Knowledge: The confirmation and denial protocols reveal no extra information beyond the validity or invalidity of signatures.
Exculpability: Neither the group manager nor a group member can sign on behalf of another group member.
Traceability: The group manager can identify the signer of a valid signature.
Coalition-Resistance: A colluding subset of group members cannot generate valid signatures that cannot be traced by the group manager.

The efficiency of a group undeniable signature scheme involves the following parameters of interest:

– The size of the group signature.
– The size of the group public key.
– The efficiency of System Setup, Join and Open.
– The efficiency of Sign and Verify (including the confirmation and denial protocols).

3 Number-Theoretic Preliminaries
We present some number-theoretic results and assumptions. See [11, 12] for additional information.

Notations. For an integer n, Zn denotes the ring of integers modulo n, and Z*n denotes the multiplicative group modulo n. Let φ(n) denote Euler's phi function, which gives the number of positive integers m ∈ {1, 2, ..., n − 1} such that gcd(m, n) = 1. Let r ∈R I represent that r is chosen randomly from a set I. The least positive integer d such that g^d ≡ 1 (mod n) is called the order of g modulo n, and is denoted by ord_n(g) or ord(g). A universal exponent of n is a positive integer u such that g^u ≡ 1 (mod n) for all g relatively prime to n. The minimal universal exponent of n is denoted by λ(n).

Fact 1. If q and p = 2q + 1 are both primes and a is a positive integer with 1 < a < p − 1, then −a^2 is a quadratic nonresidue and a primitive root modulo p.

Fact 2. Let M be a positive integer with odd prime factorization M = p1·p2···pn. (1) λ(M) = lcm(φ(p1), φ(p2), ..., φ(pn)). (2) There exists an integer g such that ord_M(g) = λ(M), the largest possible order of an integer modulo M. (3) Let ri be a primitive root modulo pi. The solution of the simultaneous congruences x ≡ ri (mod pi), i = 1, 2, ..., n, produces such an integer g.

Fact 3. Let G = ⟨g⟩ be a cyclic group generated by g. If ord(g) = n and r is a positive integer, then ord(g^r) = n/gcd(n, r). Thus, if we choose a positive integer a such that gcd(a, n) = 1, then g^a has the same order as g.
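The three facts are easy to check numerically. The following sketch verifies them with tiny primes; all concrete values are our own illustrative choices.

```python
from math import gcd, lcm

# Fact 1: with q = 11, p = 2q + 1 = 23 prime, -a^2 is a primitive root mod p.
q, p = 11, 23
a = 3
elem = (-a * a) % p                  # -a^2 mod p = 14
order = next(d for d in range(1, p) if pow(elem, d, p) == 1)
assert order == p - 1                # primitive root: order p - 1 = 22

# Fact 2: lambda(M) = lcm(phi(p1), phi(p2)) for M = 23 * 47, and a CRT solution
# of x = 14 (mod 23), x = 5 (mod 47) (primitive roots) attains that order.
p1, p2 = 23, 47
M = p1 * p2
lam = lcm(p1 - 1, p2 - 1)            # 506
g0 = 428
assert pow(g0, lam, M) == 1
assert all(pow(g0, lam // f, M) != 1 for f in (2, 11, 23))   # ord(g0) = lambda(M)

# Fact 3: ord(g^r) = n / gcd(n, r).
g = 5
n = next(d for d in range(1, p) if pow(g, d, p) == 1)        # order of 5 mod 23
r = 4
ord_gr = next(d for d in range(1, p) if pow(pow(g, r, p), d, p) == 1)
assert ord_gr == n // gcd(n, r)
```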
Let G = ⟨g⟩ be a cyclic group generated by g with order n. Next, we present some number-theoretic problems. These problems are assumed to be intractable whether n is known or not.

Discrete Logarithm (DL): Given y ∈R G and the base g, find the discrete logarithm x of y = g^x to the base g.
Representation (Rep): Given y ∈R G and the bases gi for i = 1, ..., k, find a representation (x1, x2, ..., xk) of y = g1^{x1}·g2^{x2}···gk^{xk} to the bases g1, ..., gk.
Equality of Discrete Logarithms (EDL): Given x, y ∈R G and the bases f, g, determine the equality of log_f(x) and log_g(y) over Zn.
Root of Discrete Logarithm (RDL): Given y ∈R G, an exponent e and the base g, find the e-th root x of y = g^{(x^e)} to the base g.

The above intractable problems are used for the signatures of knowledge described in the next section. The security of our signature scheme is also based on them.
4 Signatures of Knowledge
Signatures of knowledge allow a prover to prove knowledge of a secret with respect to some public information noninteractively. This cryptographic tool has been used in many group signature schemes. In this section, we review the important signatures of knowledge which are employed as building blocks in our signature scheme.

We now explain the notation used in the following signatures of knowledge. Let G be a cyclic group generated by g with order M, where M is an RSA modulus. We denote by Greek letters the elements whose knowledge is proven, and by all other letters the elements that are publicly known. Denote by || the concatenation of two binary strings and by ∧ the conjunction symbol. Assume H is a collision-resistant hash function which maps a binary string of arbitrary length to a hash value of fixed length.

Knowledge of a Discrete Logarithm. A signature of knowledge of the discrete logarithm of y = g^x ∈ G to the base g on the message m is a pair (c, s), which can be generated as follows. Choose r ∈ Z. Compute

c = H(m||y||g||g^r),  s = r − c·x.

Such a signature can be computed by a signer who knows the secret x. We denote the signature by

SKDL[α : y = g^α](m).
Anyone can verify (c, s) by testing c =? H(m||y||g||g^s·y^c).

Knowledge of a Representation. Let y1 = ∏_{j=1}^{l1} g_{b_{1j}}^{x_{e_{1j}}}, ..., yw = ∏_{j=1}^{lw} g_{b_{wj}}^{x_{e_{wj}}}, where e_{ij} ∈ {1, ..., u} and b_{ij} ∈ {1, ..., v}. A signature of knowledge of a representation (x1, ..., xu) of y1, ..., yw with respect to the bases g1, ..., gv on the message m is (c, s1, s2, ..., su), which can be generated as follows. Choose ri ∈ Z for i = 1, ..., u. Compute

c = H(m||y1||...||yw||g1||...||gv||{{e_{ij}, b_{ij}}_{j=1}^{li}}_{i=1}^{w}||∏_{j=1}^{l1} g_{b_{1j}}^{r_{e_{1j}}}||...||∏_{j=1}^{lw} g_{b_{wj}}^{r_{e_{wj}}}),
si = ri − c·xi, for i = 1, ..., u.

Such a signature can be computed by a signer who knows a representation (x1, ..., xu). We denote this signature by

SKREP[(α1, ..., αu) : (y1 = ∏_{j=1}^{l1} g_{b_{1j}}^{α_{e_{1j}}}) ∧ ··· ∧ (yw = ∏_{j=1}^{lw} g_{b_{wj}}^{α_{e_{wj}}})](m).

Anyone can verify the signature by testing

c =? H(m||y1||...||yw||g1||...||gv||{{e_{ij}, b_{ij}}_{j=1}^{li}}_{i=1}^{w}||∏_{j=1}^{l1} g_{b_{1j}}^{s_{e_{1j}}}·y1^c||...||∏_{j=1}^{lw} g_{b_{wj}}^{s_{e_{wj}}}·yw^c).

Knowledge of Roots of Representations. Such a signature is used to prove that one knows the e-th root x of the g-part of a representation of v = f^w·g^{x^e} ∈ G to the bases f and g. A signature of knowledge of the pair (w, x) of v = f^w·g^{x^e} on the message m consists of two components:

– (v1, ..., v_{e−1}), where vi = f^{ri}·g^{x^i} for i = 1, ..., e − 1 and ri ∈ Z,
– SKREP[(γ1, γ2, ..., γe, δ) : v1 = f^{γ1}·g^δ ∧ v2 = f^{γ2}·v1^δ ∧ ··· ∧ v_{e−1} = f^{γ_{e−1}}·v_{e−2}^δ ∧ v = f^{γe}·v_{e−1}^δ](m).

To generate the signature efficiently, a small integer e is chosen. A signer who knows (w, x) can generate such a signature. The first component is computed directly. Because ri ∈R Z, we know vi ∈R G. Furthermore, according to the equations vi = f^{ri}·g^{x^i} and v = f^w·g^{x^e}, we actually have γ1 = r1, γi = ri − x·r_{i−1} for i = 2, ..., e − 1, γe = w − x·r_{e−1}, and δ = x. Hence, the second component can be obtained. We denote this whole signature by

SKRREP[(α, β) : v = f^α·g^{β^e}](m).

Knowledge of Roots of Discrete Logarithms. Let e be a small integer. Assume f is also a generator of G and log_g(f) is not known. A signature of knowledge of the e-th root x of the discrete logarithm of y = g^{x^e} to the base g on the message m comprises two components:

– SKRREP[(α, β) : y = f^α·g^{β^e}](m),
– SKDL[γ : y = g^γ](m).

With the secret x, the signer knows a representation (0, x^e) of y = f^0·g^{x^e} to the bases f and g. This is the only representation the signer knows; otherwise, he would be able to compute log_g(f). Therefore, we have α = 0, β = x, and γ = x^e; the two underlying signatures can be computed. To verify such a signature, one must check the correctness of both components. We denote the signature by

SKRDL[α : y = g^{α^e}](m).
According to further results in [10, Section 3], in the random oracle model the signatures SKDL and SKREP are simulatable, and they are existentially unforgeable against adaptive chosen message attacks under the related number-theoretic complexity assumptions. Thus SKRREP and SKRDL clearly have the same properties.
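As an illustration of SKDL, here is a minimal Fiat-Shamir-style sketch over the toy modulus n = 23·47 used earlier; the hash encoding and the range for r are our own assumptions, and reducing exponents modulo the group order is an arithmetic shortcut for this demo (in the scheme the order is hidden from verifiers).

```python
import hashlib, random

p = 1081                  # toy modulus n = 23 * 47
g = 495                   # 428^2 mod 1081: order q1*q2 = 253 by Fact 3
ORDER = 253

def H(*parts) -> int:
    data = b"|".join(str(x).encode() for x in parts)
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def skdl_sign(x, y, m):
    r = random.randrange(1, 50 * ORDER)          # r in Z, drawn from a wide range
    c = H(m, y, g, pow(g, r, p))                 # c = H(m||y||g||g^r)
    s = r - c * x                                # s = r - c*x over the integers
    return c, s

def skdl_verify(y, m, c, s):
    t = pow(g, s % ORDER, p) * pow(y, c % ORDER, p) % p   # g^s * y^c = g^r
    return c == H(m, y, g, t)

x = 123                                          # the secret discrete logarithm
y = pow(g, x, p)
c, s = skdl_sign(x, y, b"msg")
assert skdl_verify(y, b"msg", c, s)
```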
5 The Scheme

Now we present our scheme and discuss its security.

5.1 System Setup
To derive the group secret and group public keys, the group manager computes the following values:
– n = p1p2, where both pi = 2qi + 1 and qi are primes for i = 1, 2,
– an RSA public key (q1q2, eR) and secret key dR,
– an integer g ∈ Z*_n such that ord_n g = q1q2,
– f = g^a, Sf = f^d, Sg = g^b, u = g^h, t = u^ρ, where a, d, b, h, ρ ∈R Z*_{q1q2}, and all arithmetic is modulo n,
– (e, d) for e, d ∈R Z*_{q1q2} such that ed ≡ 1 (mod q1q2).
It is noteworthy that n must be chosen such that factoring n and solving DL in Z*_n are intractable. By Facts 1 and 2, we can obtain g0 with order λ(n) = 2q1q2, and then take g = g0^2 with order q1q2 by Fact 3. Moreover, the order of f, Sf, Sg, u, and t is also q1q2. The group manager keeps (b, d, dR, e, ρ^{−1}, p1, p2) as the group secret key and opens (n, eR, f, g, Sf, Sg, u, t) as the group public key.
5.2 Join
When a user, say Alice, wants to join the group, she chooses the secret key y ∈R Z*_n and computes her membership key z = g^y mod n. We can assume that gcd(y, q1q2) = 1. Alice sends z to the group manager, and proves to the group manager that she knows the discrete logarithm of z without revealing y. Next, the group manager chooses c ∈ Z*_{q1q2} such that (zg^c)^{q1} ≠ 1 (mod n) and (zg^c)^{q2} ≠ 1 (mod n) (this can be done by testing at most three consecutive integers). Note that gcd(y + c, q1q2) = 1. Then the group manager computes Alice's membership certificate (x = g^c mod n, v = (c + b)^{dR} mod q1q2, w = (zx)^d mod n), and sends (x, v, w) to Alice. Such a (y, x, v, w) is called a valid signing key. It is important to note that the group manager must choose distinct c's for different registrants and prevent anyone from learning the c's. In addition, by Fact 3, we have ord(z) = ord(x) = ord(w) = q1q2.
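As a sanity check of the Join procedure, the following toy Python sketch (all tiny primes and constants are our own illustrative choices; the zero-knowledge proof of log_g z is omitted) has the manager issue Alice's certificate (x, v, w):

import random
from math import gcd

q1, q2 = 5, 11                  # toy primes with p_i = 2*q_i + 1 also prime
p1, p2 = 2 * q1 + 1, 2 * q2 + 1
n, Q = p1 * p2, q1 * q2         # n = 253, Q = q1*q2 = 55

def order(x, mod):
    o, t = 1, x % mod
    while t != 1:
        t, o = t * x % mod, o + 1
    return o

# g of order Q in Z_n^*: square an element of maximal order lambda(n) = 2Q
g0 = next(x for x in range(2, n) if gcd(x, n) == 1 and order(x, n) == 2 * Q)
g = pow(g0, 2, n)

eR, dR = 3, 27                  # toy RSA pair over Q: eR*dR = 1 mod phi(Q) = 40
b, d = 13, 17                   # pieces of the group secret key, in Z_Q^*

y = 8                           # Alice's secret; she sends z = g^y
z = pow(g, y, n)
c = next(c for c in range(1, Q)
         if pow(z * pow(g, c, n) % n, q1, n) != 1
         and pow(z * pow(g, c, n) % n, q2, n) != 1)
x = pow(g, c, n)
v = pow((c + b) % Q, dR, Q)     # RSA signature on c + b
w = pow(z * x % n, d, n)
print("certificate:", (x, v, w))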
5.3 Sign
Given a message m, Alice can generate the signature S by computing the following nine values (all arithmetic is modulo n):
– ĝ = g^r for r ∈R Z*_n,
– Z0 = Sg^r,
– Z1 = ĝ^y,
– Z2 = x^r,
– A1 = g^y u^r,
– A2 = t^r,
– S0 = SKREP[(α, β) : ĝ = g^β ∧ Z0 = Sg^β ∧ Z1 = ĝ^α ∧ A1 = g^α u^β ∧ A2 = t^β](m),
– S1 = SKRDL[γ : Z2 Z0 = ĝ^{γ^{eR}}](m),
– S2 = w^r.
Alice's group undeniable signature on m is S = (ĝ, Z0, Z1, Z2, A1, A2, S0, S1, S2). We call S a valid group undeniable signature if S is generated using a valid signing key. The correctness of S is the conjunction of the correctness of S0, S1, and S2. Now we explain the roles of the elements in S. First, S0 proves that the same random number r is used in the computation of ĝ, Z0, A1, and A2, and that the same exponent y is used in Z1 = ĝ^y and A1 = g^y u^r for some y ∈R Z*_n. If S0 is correct, (A1, A2) is an ElGamal encryption of z = g^y with respect to the public key (u, t). The element S1 proves that Alice knows an eR-th root of the discrete logarithm of Z2 Z0 to the base ĝ. Finally, for S2, the verifier must interact with the group manager to check whether S2 = (Z1 Z2)^d or not.
5.4 Signature Confirmation Protocol
A signature confirmation protocol is an interactive protocol between the group manager and a verifier, in which the group manager can convince the verifier that a signature is valid. However, the group manager cannot cheat the verifier into accepting an invalid signature as valid except with a very small probability. In the sequel, we denote by P the group manager and by V the verifier. Let X → Y : Z represent that X sends Z to Y. In the confirmation protocol, common inputs to P and V include the message m, the group public key, and the alleged signature S. The secret input to P is the group secret key. Now we present how V can be convinced that S is valid. First, V checks S0 and S1. If either is incorrect, then V recognizes that S is invalid. Otherwise, P and V do the following steps:
1. V → P : A. V chooses e1, e2 ∈R Z*_n, and computes A = S2^{e1} Sf^{e2} mod n.
2. P → V : B. P computes B = A^e mod n.
3. V verifies that (Z1 Z2)^{e1} f^{e2} ?= B mod n. If equality holds, then V accepts S as a valid signature for m. Otherwise S is undetermined.
Our confirmation protocol is based on Chaum's method [4]. To illustrate the protocol clearly, the above steps omit the zero-knowledge part. We can make the protocol zero-knowledge by modifying Step 2 as follows: P commits B to V using a commitment scheme such that V cannot learn what B is unless V sends the correct e1 and e2 to P. Because B = (Z1 Z2)^{e1} f^{e2} mod n can be computed using the correct e1 and e2, P reveals no extra information to V. Accordingly, the whole protocol is zero-knowledge.
We prove that the verifier will accept a valid signature.
Theorem 1. If S is a valid group undeniable signature, then the verifier will accept S as a valid signature for m.
Proof. Obviously, S0 and S1 must be correct. Furthermore, because w = (g^{y+c})^d mod n, we have
S2 ≡ w^r ≡ ((g^{y+c})^d)^r ≡ (ĝ^{y+c})^d ≡ (Z1 Z2)^d (mod n).
So B ≡ A^e ≡ (S2^{e1} (Sf)^{e2})^e ≡ (Z1 Z2)^{e1} f^{e2} (mod n).
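The three confirmation steps can be played out with a toy Python sketch (our own miniature parameters; the real protocol works in the order-q1q2 subgroup of Z_n^* and adds the commitment for zero knowledge):

import random

P_, rho = 23, 11       # toy prime and subgroup order: <2> has order 11 in Z_23^*
f = 2
d = 7                   # group manager's secret exponent
e = pow(d, -1, rho)     # ed = 1 (mod rho)
Sf = pow(f, d, P_)
Z1Z2 = pow(f, 5, P_)            # stands in for Z1*Z2 of a signature
S2 = pow(Z1Z2, d, P_)           # valid component: S2 = (Z1Z2)^d

# Step 1: V blinds the values so P cannot tell which signature is checked
e1, e2 = random.randrange(1, rho), random.randrange(1, rho)
A = pow(S2, e1, P_) * pow(Sf, e2, P_) % P_
# Step 2: P answers with the e-th power (only P knows e)
B = pow(A, e, P_)
# Step 3: V accepts iff the blinding opens correctly
assert B == pow(Z1Z2, e1, P_) * pow(f, e2, P_) % P_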
Next, we prove that the group manager cannot cheat a verifier into accepting an invalid signature as valid except with a very small probability.
Theorem 2. If S is not a valid group undeniable signature, then a verifier will accept S as a valid signature for m with probability at most 1/q1q2.
Proof. If S0 or S1 is incorrect, a verifier recognizes S as invalid. Now suppose S0 and S1 are correct. Because S is generated without a valid signing key, S2 ≠ (Z1 Z2)^d mod n. P can make V accept the signature only if P can find B = (Z1 Z2)^{e1} f^{e2} mod n such that (e1, e2) satisfies A ≡ S2^{e1} (Sf)^{e2} (mod n). That is, (e1, e2) satisfies the following two equations:
A = S2^{e1} Sf^{e2} mod n,   (1)
B = (Z1 Z2)^{e1} f^{e2} mod n,   (2)
where S2 ≠ (Z1 Z2)^d mod n. Assume A = f^i, B = f^j, S2 = f^k, and Z1 Z2 = f^ℓ, where i, j, k, ℓ ∈ Z_{q1q2}, and all arithmetic is modulo n. Recall Sf = f^d mod n. From (1) and (2), we have
i = ke1 + de2 mod q1q2,   (3)
j = ℓe1 + e2 mod q1q2.   (4)
Because f^k ≠ f^{ℓd} (mod n), we have k ≠ ℓd (mod q1q2). As a result, there is only one solution (e1, e2) to (3) and (4). By Fact 3, the order of S2, Sf, and Z1 Z2 is q1q2. Hence, there are at least q1q2 ordered pairs (e1, e2) corresponding to A. P cannot identify which of them has been used by V to compute A. In addition, every B is the correct response for exactly one of the possible q1q2 ordered pairs (e1, e2) with e1, e2 < q1q2. Consequently, the probability that P gives V a response B that passes verification is at most 1/q1q2. The theorem is proven.
5.5 Signature Denial Protocol
A signature denial protocol is an interactive protocol between P and V which allows P to convince V that an alleged signature is invalid. However, P cannot make V believe that a valid signature is invalid except with a very small probability. In the denial protocol, common inputs to P and V include two constants c1 and c2, the message m, the group public key, and the alleged signature S. The secret input to P is the group secret key. Now we present how P can make V recognize an invalid signature S as invalid. First, V checks S0 and S1. If either is incorrect, then V recognizes that S is invalid. Otherwise, P and V repeat the following steps at most c2 times; whenever V finds S undetermined, the protocol stops.
1. V → P : A1, A2. V chooses e1 ∈R Z_{c1}, e2 ∈R Z_n and computes A1 = (Z1 Z2)^{e1} f^{e2} mod n and A2 = S2^{e1} Sf^{e2} mod n.
2. P → V : B. P computes A1/A2^e ≡ (Z1 Z2 / S2^e)^{e1} (mod n), finds e1, and then sends B = e1 to V.
3. V checks whether B ?= e1. If equality holds, then V is convinced once more that S is invalid. Otherwise S is undetermined.
If convinced of S's invalidity c2 times, V will accept S as invalid. It is noteworthy that P performs at most c1c2 operations to find the correct e1's. The denial protocol is based on Chaum's method [4]. To illustrate this protocol clearly, we omit the zero-knowledge part. Applying a commitment scheme, we can make the protocol zero-knowledge by modifying Step 2 as follows: P commits B to V such that V cannot learn what B is unless V sends the correct e2 to P. The correct e2 means that e2 satisfies A1 = (Z1 Z2)^{e1} f^{e2} mod n and A2 = S2^{e1} Sf^{e2} mod n, where e1 is the value found by P; this can be checked by P. Because the correct e2 ensures that P and V have the same e1, P reveals no extra information to V. Accordingly, the whole protocol is zero-knowledge. In the following theorem, we prove that P can convince V that an alleged signature is invalid.
Theorem 3. If S is not a valid group undeniable signature, then a verifier will accept S as an invalid signature for m.
Proof. If S0 or S1 is incorrect, a verifier will recognize S as an invalid signature. Suppose S0 and S1 are correct.
Because S is generated without a valid signing key, S2 ≠ (Z1 Z2)^d mod n. Therefore S2^e ≠ Z1 Z2. We have A1/A2^e ≡ (Z1 Z2 / S2^e)^{e1} (mod n). Consequently, P can always find e1 and give the correct response. This implies that V will accept S as an invalid signature for m.
Next, we prove that P cannot fool V into accepting a valid signature as invalid except with a small probability.
Theorem 4. If S is a valid group undeniable signature, then a verifier will accept S as an invalid signature for m with probability 1/c1^{c2}.
Proof. Because S is valid, S0 and S1 are correct, and S2 = (Z1 Z2)^d mod n. Therefore S2^e ≡ Z1 Z2 (mod n). We have A1/A2^e ≡ (Z1 Z2 / S2^e)^{e1} ≡ 1 (mod n). In this case P can only randomly choose e1 from Z_{c1}. Consequently, V will accept S as an invalid signature for m with probability 1/c1^{c2}.
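Continuing the same toy group as in the confirmation sketch (again with illustrative values of our own), the following shows the heart of the denial protocol: for an invalid S2 the ratio A1/A2^e is a nontrivial power, so P can recover e1 by brute force over Z_{c1}:

import random

P_, rho, f = 23, 11, 2
d = 7
e = pow(d, -1, rho)             # ed = 1 (mod rho)
Sf = pow(f, d, P_)
Z1Z2 = pow(f, 5, P_)
S2 = pow(f, 9, P_)              # invalid: S2 != (Z1Z2)^d
c1 = 4

e1 = random.randrange(c1)
e2 = random.randrange(1, rho)
A1 = pow(Z1Z2, e1, P_) * pow(f, e2, P_) % P_
A2 = pow(S2, e1, P_) * pow(Sf, e2, P_) % P_

# P computes A1 / A2^e = (Z1Z2 / S2^e)^{e1} and searches Z_{c1} for e1
ratio = A1 * pow(pow(A2, e, P_), -1, P_) % P_
base = Z1Z2 * pow(pow(S2, e, P_), -1, P_) % P_
B = next(t for t in range(c1) if pow(base, t, P_) == ratio)
assert B == e1                   # since S2^e != Z1Z2, base != 1 and e1 is unique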
5.6 Open
Given a valid signature S, the group manager can compute z_P = A1 · A2^{−ρ^{−1}} mod n. The signer with the membership key z = z_P can be traced directly. We notice that z_P is an ElGamal decryption of (A1, A2) with respect to the secret key ρ^{−1}.
5.7 Convertibility
We describe the four components for convertibility.
Individual Receipt Generation. Let S be a signature for the message m. We show how to generate its individual receipt. The group manager chooses r ∈R Z*_{q1q2}, and computes the receipt R = (f̃, R1, R2, R3) as follows: f̃ = f^r mod n, R1 = (Z1 Z2)^r mod n, H = H(m ‖ f̃ ‖ R1), R2 = SKREP[α : R1 = (Z1 Z2)^α ∧ f̃ = f^α](m), R3 = r − Hd mod q1q2. Obviously, releasing the individual receipt does not compromise the security of the secret key d needed to generate signatures.
Individual Verification. To check R, one checks the correctness of R2 and tests whether f̃ = f^{R3} Sf^H mod n. If both succeed, then the receipt R with respect to S is valid. Otherwise the receipt is invalid. If R is valid, then the alleged signature S can be verified by checking the correctness of S0 and S1, and testing whether R1 = (Z1 Z2)^{R3} S2^H mod n. Hence, with the individual receipt R, the alleged signature S can be universally verified (see the sketch at the end of this subsection).
Universal Receipt Generation. To make all signatures universally verifiable, the group manager releases e as the universal receipt. According to the basic assumption behind regular RSA, this does not compromise the security of the secret key d needed to generate signatures.
Universal Verification. To check e, one can test whether f = Sf^e mod n. If the equality holds, then e is valid. Otherwise e is invalid. If e is valid, then all alleged signatures can be verified by checking the correctness of S0 and S1, and testing whether Z1 Z2 ≡ S2^e (mod n). Consequently, the group undeniable signature scheme can be totally converted into an ordinary group signature scheme by releasing the universal receipt e. In addition, our scheme allows the group manager to delegate the ability to confirm and deny to a limited set of parties by issuing e only to them.
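The individual receipt is essentially a Schnorr-style proof about d. The toy Python sketch below (our own miniature values; R2, the SKREP component, is omitted) generates R3 and checks both receipt equations:

import hashlib
import random

P_, rho, f = 23, 11, 2
d = 7
Sf = pow(f, d, P_)
Z1Z2 = pow(f, 5, P_)             # stands in for Z1*Z2 of a valid signature
S2 = pow(Z1Z2, d, P_)

def H(*parts):
    digest = hashlib.sha256("|".join(map(str, parts)).encode()).digest()
    return int.from_bytes(digest, "big") % rho

# receipt generation
r = random.randrange(1, rho)
f_t = pow(f, r, P_)              # f~
R1 = pow(Z1Z2, r, P_)
Hv = H("m", f_t, R1)             # H = H(m || f~ || R1)
R3 = (r - Hv * d) % rho

# individual verification: both equations open to f^r and (Z1Z2)^r
assert f_t == pow(f, R3, P_) * pow(Sf, Hv, P_) % P_
assert R1 == pow(Z1Z2, R3, P_) * pow(S2, Hv, P_) % P_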
5.8 Security Analysis
The security notions below are considered under reasonable number-theoretic complexity assumptions and in the random oracle model.
Exculpability. Because the DL problem is intractable, neither the group manager nor a group member can compute the secret key of another group member. Thus, it is infeasible to frame another member. However, this does not prevent the group manager from generating any valid signatures.
Unforgeability. We prove that our signature is existentially unforgeable against adaptive chosen message attacks. Recall that any valid signature S̄ must contain correct S0, S1, and S2. Considering S2, an attacker must obtain S2 = ξ^d mod n, where ξ = ξ1ξ2 with ξ1 = ḡ^ȳ mod n and ξ2 Z̄0 = ḡ^{v̄^{eR}} mod n. Using adaptive chosen message attacks, the attacker can compute many pairs (ξ, ξ^d) with random ξ's, but he cannot learn d. From a random ξ, the two values ξ1 and ξ2 must be computed such that S0 and S1 are correct. Here S0 = SKREP[(α, β) : ḡ = g^β ∧ Z̄0 = Sg^β ∧ ξ1 = ḡ^α ∧ Ā1 = g^α u^β ∧ Ā2 = t^β](m) and S1 = SKRDL[γ : ξ2 Z̄0 = ḡ^{γ^{eR}}](m). Next, we show that the attacker cannot simultaneously obtain correct S0, S1 and S2. Suppose α = ȳ and γ = v̄. Note that the attacker cannot compute S0 and S1 without knowing ȳ and v̄, respectively. Now, to obtain S0 from a pair (ξ, ξ^d), the attacker chooses ȳ and has ξ1 = ḡ^ȳ mod n, so ξ2 = ξ ξ1^{−1} mod n. Assume ξ2 = ḡ^c̄ mod n. Because the value v̄ = (c̄ + b)^{dR} satisfying ξ2 Z̄0 ≡ ḡ^{v̄^{eR}} mod n cannot be obtained, S1 is existentially unforgeable against adaptive chosen message attacks. Consequently, we have the following theorem:
Theorem 5. Our signature scheme is existentially unforgeable against adaptive chosen message attacks.
Unlinkability, Anonymity, Non-transferability. These properties hold if the signatures are simulatable. Now we show that the signatures can be simulated. Let S be a valid signature. Assume the signer's membership key z equals u^{rz} mod n for some rz ∈ Z*_n, so A1 = u^{rz + r} mod n. To generate an indistinguishable signature S̃, the simulator randomly chooses r̄, r̃, ỹ, c̃, d̃, and then computes g̃ = g^{r̃}, Z̃0 = Sg^{r̃}, Z̃1 = g̃^{ỹ}, Z̃2 = g̃^{c̃}, Ã1 = u^{r̄}, Ã2 = t^{r̃}, S̃2 = (Z̃1 Z̃2)^{d̃}, where all arithmetic is modulo n. Obviously, g̃, Z̃0, Ã1, and Ã2 are indistinguishable from ĝ, Z0, A1, and A2, respectively. Because the EDL problem is intractable, Z̃1, Z̃2, and S̃2 are indistinguishable from Z1, Z2, and S2, respectively. Recall that S0 and S1 are simulatable in the random oracle model. Consequently, the whole signature is simulatable. Hence, we have the following theorem:
Theorem 6. Our signature scheme is signature-simulatable. Thus the properties of unlinkability, anonymity, and non-transferability hold.
Zero Knowledge. By applying the commitment techniques, the confirmation and denial protocols reveal no extra information except for the validity or invalidity of a signature. As a result, our scheme can be made zero-knowledge.
Coalition-Resistance. We show that a colluding subset of group members cannot generate a valid signature that cannot be traced by the group manager. A valid signature S̄ must contain correct S0, S1, and S2. Considering S2, colluding members must obtain S2 = ξ^d mod n, where ξ = ξ1ξ2 with ξ1 = ḡ^ȳ mod n and ξ2 Z̄0 = ḡ^{v̄^{eR}} mod n. However, even using their signing keys, the colluding members cannot derive d; they can only obtain ξ = g^r mod n and ξ^d mod n for any r. In addition, the two values ξ1 and ξ2 must be computed such that S0 and S1 are correct. Here S0 = SKREP[(α, β) : ḡ = g^β ∧ Z̄0 = Sg^β ∧ ξ1 = ḡ^α ∧ Ā1 = g^α u^β ∧ Ā2 = t^β](m) and S1 = SKRDL[γ : ξ2 Z̄0 = ḡ^{γ^{eR}}](m). Next, we show that the colluding members cannot simultaneously obtain correct S0, S1 and S2. Suppose α = ȳ and γ = v̄. We know that the colluding members cannot compute S0 and S1 without knowing ȳ and v̄, respectively. Now, to obtain the correct S0, S1 and S2, the colluding members must choose ȳ and c̄ such that ȳ + c̄ and v̄ = (c̄ + b)^{dR} can be computed. Note that ξ1 = ḡ^ȳ mod n, ξ2 = ḡ^c̄ mod n, and ξ ≡ ξ1ξ2 ≡ ḡ^{ȳ+c̄} (mod n). In the following we show that obtaining such a c̄ is infeasible. Suppose a group member i has the signing key (yi, xi = g^{ci} mod n, vi = (ci + b)^{dR} mod q1q2, wi). Because the colluding members cannot compute the ci's, solving for b is infeasible. Therefore c̄ cannot be derived from (c̄ + b), where (c̄ + b) is any value such that (c̄ + b)^{dR} can be obtained by the colluding members. As a result, ȳ + c̄ cannot be computed. This implies that it is infeasible to choose ȳ and c̄ such that ȳ + c̄ and v̄ = (c̄ + b)^{dR} are derived simultaneously. Now we have the following theorem:
Theorem 7. Our signature scheme is coalition-resistant.
6 Conclusions
In this paper, we employ signatures of knowledge and RSA-based undeniable signature techniques to construct a convertible group undeniable signature scheme. Our scheme also allows the group manager to delegate the ability to confirm and deny to a limited set of parties without providing them the capability of generating signatures. Under reasonable number-theoretic complexity assumptions and the random oracle model, we can prove the group undeniable signature scheme is unforgeable, unlinkable, anonymous, non-transferable, and exculpable. The signature confirmation and denial protocols are zero-knowledge. Even a colluding subset of group members cannot generate valid signatures that cannot be traced.
References
[1] M. Bellare and P. Rogaway. Random oracles are practical: A paradigm for designing efficient protocols. In Proc. 1st ACM Conference on Computer and Communications Security, pages 62–73, 1993.
[2] J. Boyar, D. Chaum, I. Damgård, and T. Pedersen. Convertible undeniable signatures. In Advances in Cryptology – CRYPTO '90, pages 189–205, 1990.
[3] J. Camenisch and M. Stadler. Efficient group signature schemes for large groups (extended abstract). In Advances in Cryptology – CRYPTO '97, pages 410–424, 1997.
[4] D. Chaum. Zero-knowledge undeniable signatures (extended abstract). In Advances in Cryptology – EUROCRYPT '90, pages 458–464, 1990.
[5] D. Chaum and H. van Antwerpen. Undeniable signatures. In Advances in Cryptology – CRYPTO '89, pages 212–216, 1989.
[6] D. Chaum and E. van Heyst. Group signatures. In Advances in Cryptology – EUROCRYPT '91, pages 257–265, 1991.
[7] I. Damgård and T. Pedersen. New convertible undeniable signature schemes. In Advances in Cryptology – EUROCRYPT '96, pages 372–386, 1996.
[8] S. J. Kim, S. J. Park, and D. H. Won. Convertible group signatures. In Advances in Cryptology – ASIACRYPT '96, pages 311–321, 1996.
[9] M. Michels and M. Stadler. Efficient convertible undeniable signature schemes. In Proc. 4th Workshop on Selected Areas in Cryptography (SAC '97), pages 231–244, 1997.
[10] D. Pointcheval and J. Stern. Security arguments for digital signatures and blind signatures. Journal of Cryptology, 13(3):361–396, 2000.
[11] K. H. Rosen. Elementary Number Theory and Its Applications (Third Edition). Addison-Wesley, 1993.
[12] H. N. Shapiro. Introduction to the Theory of Numbers. John Wiley & Sons, 1983.
An Efficient Fail-Stop Signature Scheme Based on Factorization
Willy Susilo and Rei Safavi-Naini
Centre for Computer Security Research
School of Information Technology and Computer Science
University of Wollongong, Wollongong 2522, Australia
{wsusilo,rei}@uow.edu.au
Abstract. Fail-stop signature (FSS) schemes protect a signer against a forger with unlimited computational power by enabling the signer to provide a proof of forgery if one occurs. In this paper, we show a flaw in a previously proposed fail-stop signature scheme based on the difficulty of factorization, and then describe a secure scheme based on the same assumption.
1 Introduction
Security of an ordinary digital signature scheme relies on a computational assumption, that is, the assumption that there is no efficient algorithm to solve the hard problem that underlies the security of the scheme. This means that if an enemy can solve the underlying problem, he can successfully forge a signature and there is no way for the signer to prove that a forgery has occurred. To provide protection against such an enemy, fail-stop signature (FSS) schemes have been proposed [15, 4]. Loosely speaking, an FSS is a signature scheme augmented such that the signer can prove that a forged signature was not generated by him/her. To achieve this property, the signature scheme is constructed such that there are many secret keys that correspond to the same public key and the sender knows only one of the keys. An unbounded enemy can find all the secret keys but cannot determine which secret key is actually used by the sender. In the case of a forgery, that is, signing a message with a randomly chosen secret key, the sender can use his secret key to generate a second signature for the same message. This signature will, with overwhelming probability, be different from the forged one. The two signatures on the same message can be used as a proof that the underlying computational assumption is broken and the system must be stopped, hence the name fail-stop. FSS schemes provide unconditional security for the signer; however, security for the receiver is computational and relies on the difficulty of the underlying hard problem. FSS schemes in their basic form are one-time primitives, and so the key can be used for signing a single message. FSS schemes and their variants have been studied by numerous authors (see, for example, [13, 14, 9, 12, 8, 11]). The schemes can be broadly divided into two
categories: those based on the hardness of the discrete logarithm problem and those based on the difficulty of factorization. The first scheme that uses factorization as its underlying hard problem was proposed in [4, 14]. However, the signing algorithm in this scheme is very inefficient. In [12], an RSA-based FSS scheme was proposed. The scheme is attractive because of the way the proof of forgery works, i.e., by revealing a non-trivial factor of the modulus.
Our Contributions. In this paper, we show that the scheme proposed in [12] does not have provable security, and then propose a new FSS scheme based on factorization which is provably secure. We evaluate the efficiency of the scheme and show that it is as efficient as the most efficient discrete-logarithm-based FSS scheme, due to van Heijst and Pedersen [13]. We provide a complete security proof for our scheme.
The paper is organized as follows. In Section 2, we present the basic concepts and definitions of FSS, and briefly review the general construction and its relevant security properties. In Section 3, we review the FSS construction in [12] and show its security flaw. In Section 4, we present a new FSS construction based on the same assumption, and show that it is an instance of the general construction [4] and hence has provable security. Finally, Section 5 concludes the paper.
2 Preliminaries
In this section, we briefly recall relevant notions, definitions and requirements of fail-stop signatures, and refer the reader to [7, 6, 4] for a more complete account.
2.1 Notations
The length of a number n is the length of its binary representation and is denoted by |n|_2. p|q means p divides q. The ring of integers modulo a number n is denoted by Z_n, and its multiplicative group, which contains only the integers relatively prime to n, by Z*_n. Let N denote the natural numbers.
2.2 Review of Fail-Stop Signature Schemes
Similar to an ordinary digital signature scheme, a fail-stop signature scheme consists of a polynomial-time protocol and two polynomial-time algorithms.
1. Key generation: a two-party protocol between the signer and the center to generate a pair of secret key, sk, and public key, pk. This is different from ordinary signature schemes, where key generation is performed by the signer individually and without the involvement of the receiver.
2. Sign: the algorithm used for signature generation. For a message m and using the secret key sk, the signature is given by y = sign(sk, m).
3. Test: the algorithm for testing acceptability of a signature. For a message m, a signature y, and a given public key pk, the algorithm produces a true response if the signature is acceptable under pk, that is, test(pk, m, y) ?= true.
An FSS also includes two more polynomial-time algorithms:
4. Proof: an algorithm for proving a forgery;
5. Proof-test: an algorithm for verifying that a proof of forgery is valid.
A secure fail-stop signature scheme must satisfy the following properties [14, 6, 4].
1. If the signer signs a message, the recipient must be able to verify the signature (correctness).
2. A polynomially bounded forger cannot create forged signatures that successfully pass the verification test (recipient's security).
3. When a forger with unlimited computational power succeeds in forging a signature that passes the verification test, the presumed signer can construct a proof of forgery and convince a third party that a forgery has occurred (signer's security).
4. A polynomially bounded signer cannot create a signature that he can later prove to be a forgery (non-repudiability).
To achieve the above properties, for each public key there exist many matching secret keys such that different secret keys create different signatures on the same message. The real signer knows only one of the secret keys, and can construct one of the many possible signatures. An enemy with unlimited computing power, although he can generate all the signatures, cannot determine which one is generated by the true signer. Thus, it is possible for the signer to provide a proof of forgery by generating a second signature on the message with a forged signature, and to use the two signatures to show that the underlying computational assumption of the system is broken, hence proving the forgery.
Security of an FSS can be broken if 1) a signer can construct a signature that he can later prove to be a forgery, or 2) an unbounded forger succeeds in constructing a signature that the signer cannot prove to be forged. These two types of forgeries are completely independent, and so two different security parameters, k and σ, are used to show the level of security against the two types of attacks. More specifically, k is the security level of the recipient and σ is that of the signer. It is proved [4] that a secure FSS is secure against adaptive chosen message attack and, for all c > 0 and large enough k, the success probability of a polynomially bounded forger is bounded by k^{−c}. For an FSS with security level σ for the signer, the success probability of an unbounded forger is limited by 2^{−σ}. In the following we briefly recall the general construction given in [4] and outline its security properties.
2.3 The General Construction
The construction is for a single-message fail-stop signature and uses bundling homomorphisms. Bundling homomorphisms can be seen as a special kind of hash function.
Definition 1 ([4]). A bundling homomorphism h is a homomorphism h : G → H between two Abelian groups (G, +, 0) and (H, ×, 1) that satisfies the following.
1. Every image h(x) has at least 2^τ preimages; 2^τ is called the bundling degree of the homomorphism.
2. It is infeasible to find collisions, i.e., two different elements that are mapped to the same value by h.
To give a more precise definition, we need to consider two families of groups, G = (G_K, +, 0) and H = (H_K, ×, 1), and a family of polynomial-time functions indexed by a key K. The key is obtained by applying a key generation algorithm g(k, τ) on two input parameters k and τ. The two parameters determine the difficulty of finding collisions and the bundling degree of the homomorphism, respectively. Given a pair of input parameters k, τ ∈ N, first a key K is calculated using the key generation algorithm, and then G_K, H_K and h_K are determined. For a formal definition of bundling homomorphisms see Definition 4.1 of [4].
A bundling homomorphism can be used to construct an FSS scheme as follows. Let the security parameters of the FSS be given as k and σ. The bundling degree of the homomorphism, τ, will be obtained as a function of σ as shown below.
1. Prekey generation: The center computes K = g(k, τ) and so determines a homomorphism h_K and two groups G_K and H_K. Let G = G_K, H = H_K and h = h_K.
2. Prekey verification: The signer must be assured that K is a possible output of the algorithm g(k, τ). This can be done through a zero-knowledge proof given by the center, or by the signer testing the key. In any case the chance of accepting a bad key must be at most 2^{−σ}.
3. Main key generation gen_A: The signer generates her secret key sk := (sk_1, sk_2) by choosing sk_1 and sk_2 randomly in G, and computes pk := (pk_1, pk_2), where pk_i := h(sk_i) for i = 1, 2.
4. The message space M is a subset of Z.
5. Signing: The signature on a message m ∈ M is s = sign(sk, m) = sk_1 + m × sk_2, where multiplying by m denotes m-fold addition in G.
6. Testing the signature: can be performed by checking
pk_1 × pk_2^m ?= h(s).
This test works because h is a homomorphism: h(s) = h(sk_1 + m × sk_2) = h(sk_1) × h(sk_2)^m = pk_1 × pk_2^m.
7. Proof of forgery: Given an acceptable signature s′ ∈ G on m such that s′ ≠ sign(sk, m), the signer computes s := sign(sk, m) and proof := (s, s′).
8. Verifying proof of forgery: Given a pair (x, x′) ∈ G × G, verify that x ≠ x′ and h(x) = h(x′).
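A minimal Python sketch of this generic flow, with a stand-in homomorphism h(x) = α^x over a toy group of our own choosing; it only demonstrates the algebra of steps 5 and 6, since a real bundling homomorphism must also have bundling degree 2^τ and be collision-resistant:

import random

P_, rho, alpha = 23, 11, 2       # toy group: <2> has order 11 in Z_23^*

def h(x):
    # stand-in homomorphism from (Z_rho, +) to (<alpha>, *)
    return pow(alpha, x, P_)

sk = (random.randrange(rho), random.randrange(rho))
pk = (h(sk[0]), h(sk[1]))

def sign(m):
    return (sk[0] + m * sk[1]) % rho          # s = sk1 + m * sk2 in G

def test(m, s):
    # pk1 * pk2^m = h(sk1) * h(sk2)^m = h(sk1 + m*sk2) = h(s)
    return pk[0] * pow(pk[1], m, P_) % P_ == h(s)

m = 7
assert test(m, sign(m))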
Theorem 4.1 of [4] proves that, for any family of bundling homomorphisms and any choice of parameters, the general construction:
1. produces correct signatures;
2. ensures that a polynomially bounded signer cannot construct both a valid signature and a proof of forgery;
3. ensures that if an acceptable signature s* ≠ sign(sk, m*) is found, the signer can construct a proof of forgery.
Moreover, for two chosen parameters k and σ, a good prekey K, and two messages m, m* ∈ M with m ≠ m*, let
T := {d ∈ G | h(d) = 1 ∧ (m* − m)d = 0}.   (1)
Theorem 4.2 of [4] shows that, given s = sign(sk, m) and a forged signature s* ∈ G such that test(pk, m*, s*) = ok, the probability that s* = sign(sk, m*) is at most |T|/2^τ, and so the best chance of success for an unrestricted forger to construct an undetectable forgery is bounded by |T|/2^τ. Thus, to provide the required level of security σ, we must choose |T|/2^τ ≤ 2^{−σ}. This general construction is the basis of all known provably secure constructions of FSS. It provides a powerful framework in which proving the security of a scheme is reduced to specifying the underlying homomorphism and determining the bundling degree and the set T. Hence, to prove the security of a scheme two steps are required:
1. showing that the scheme is in fact an instance of the general construction;
2. determining the bundling parameter and the size of the set T.
3 FSS Schemes Based on Factorization Assumption
There are two constructions of FSS schemes based on factorization, namely FSS based on quadratic residues modulo n [14, 4] and FSS based on RSA [12]. In this section, we briefly review the FSS construction based on RSA [12] and show that it is not secure; in the next section, we present a provably secure scheme based on the same assumption. The scheme in [12] consists of five algorithms: the dealer's initialization, the sender's key generation, signature generation, signature verification, and proof of forgery. In the dealer's initialization step, the dealer D chooses two large safe primes p and q, where p = 2p′ + 1 and q = 2q′ + 1 with p′ and q′ prime, and computes n = pq and φ(n) = (p − 1)(q − 1). He also chooses an element α ∈ Z*_n, chooses his RSA secret key dD such that gcd(dD, φ(n)) = 1, and computes the corresponding RSA public key eD = dD^{−1} (mod φ(n)). Finally, he calculates the public key β = α^{dD} (mod n). The value (α, n) is published and (eD, β) is sent to the signer S via an authenticated channel.
In the signer's key generation, S chooses his secret key as four integers k1, k2, k3, k4 ∈ Z*_n and computes his public key (β1, α1, α2) as follows:
β1 = α^{k4} β^{k3} (mod n),
α1 = α^{k3} β^{k1} (mod n),
α2 = α^{k4} β^{k2} (mod n).
To sign a message m ∈ Z*_n, S computes y1 = k1 m + k2 and y2 = k3 m + k4, and publishes (y1, y2) as his signature on m. Everyone can verify the signature by testing whether α^{y2} β^{y1} ?= α1^m α2 (mod n) holds. We omit the proof of forgery phase since it is not relevant to our discussion; we refer the reader to [12] for more detail.
3.1 Proof of Security
To prove security of the scheme we must exhibit the groups G_K, H_K and the bundling homomorphism. The mapping h that determines the public key is defined as
h_{(p,q,α,β)} : Z_{φ(n)} × Z_{φ(n)} → Z*_n;  h_{(p,q,α,β)}(k_i, k_j) = α^{k_i} β^{k_j} (mod n),  k_i, k_j ∈ Z_{φ(n)}.
However, the signature is defined over Z, which is not a finite group. That is,
k_i, k_j ∈ Z_{φ(n)};  y_m = k_i m + k_j.
Hence, it does not follow the general construction of [4]. One may modify the signature generation as follows:
k_i, k_j ∈ Z_{φ(n)};  y_m = k_i m + k_j (mod φ(n)).
This would result in the bundling homomorphism
– Families of groups: Let n = pq. Define G_K = Z_{φ(n)} and H_K = Z_n.
– The homomorphism: h_{(p,q,α,β)} is defined as h_{(p,q,α,β)} : Z_{φ(n)} × Z_{φ(n)} → Z*_n;  k_i, k_j ∈ Z_{φ(n)};  h_{(p,q,α,β)}(k_i, k_j) = α^{k_i} β^{k_j} (mod n).
The revised scheme would follow the general construction of [4], but requires the value of φ(n) to be known by the signer to be able to generate a signature. However, the knowledge of n and φ(n) allows the sender to factorize n [10] and so be able to deny his signature. In the next section we give a new FSS scheme based on factorization. The only other FSS scheme based on factorization with provable security is the scheme in [4], which is not practical.
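The factorization from n and φ(n) mentioned above is elementary: for n = pq we have p + q = n − φ(n) + 1, so p and q are the roots of a quadratic. A short sketch, with toy primes of our own choosing:

from math import isqrt

def factor_from_phi(n, phi):
    # p + q = n - phi(n) + 1, and p, q are the roots of x^2 - (p+q)x + n = 0
    s = n - phi + 1
    d = isqrt(s * s - 4 * n)
    assert d * d == s * s - 4 * n
    return (s - d) // 2, (s + d) // 2

p, q = 1009, 1013
assert factor_from_phi(p * q, (p - 1) * (q - 1)) == (p, q)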
4 A New FSS Scheme Based on Factorization
In this section we propose a new FSS scheme based on factorization and show that it is an instance of the general construction. Proof of forgery is by revealing the secret key kept by the dealer, and so verifying the proof is very efficient. For simplicity, we describe our scheme in a single-recipient model. As in [13], the scheme can be extended to multiple recipients by employing a coin-flipping protocol. As with the other FSS schemes, the basic scheme is one-time and can only be used once; however, it is possible to extend the scheme to sign multiple messages [2, 13, 5, 1].
Model. There is only a single recipient, R, who also plays the role of the trusted center and performs the prekey generation of the scheme.
Prekey Generation. Given the two security parameters k and σ, R chooses two large primes p and q, where p = c1 β p′ + 1, q = c2 β q′ + 1, p′, q′, β are also prime, (c1, c2) ∈ Z and gcd(c1, c2) = 2 (which means that both c1 = 2c̃1 and c2 = 2c̃2 for c̃1, c̃2 ∈ Z). For simplicity, assume c1 = 2 and c2 = 4. To guarantee security, |β|_2 must be chosen such that the subgroup discrete logarithm problem for the multiplicative subgroup of order β in Z*_n is intractable (for example, |n|_2 ≈ 1881 bits and |β|_2 ≈ 151 bits [3]). R computes n = pq and selects an element α such that the multiplicative order of α modulo n is β. Let N_β denote the subgroup of Z*_n generated by α. R also chooses a secret random number a ∈R Z_β and computes γ = α^a (mod n). (α, β, γ, n) is published and (p, q, a) is kept secret. We note that although the factors of n are of a particular form, to our knowledge there is no known efficient factorization algorithm that can exploit this form.
Proposition 1. For α ∈ Z*_n, knowing φ(n) (or λ(n)) and its factorization, it is easy to determine ord_n(α); but without knowledge of the factors of φ(n) (or λ(n)), it is hard to find ord_n(α).
Lemma 1. It is easy for R to find an element α with ord_n(α) = β, for p = c1 β p′ + 1 and q = c2 β q′ + 1 and gcd(c1, c2) = 2, when R knows the factorization of n.
Proof (sketch). To find an element α with ord_n(α) = β, R performs the following.
1. Compute λ(n) = 2 c̃1 c̃2 β p′ q′, where c̃1 = c1/2 and c̃2 = c2/2.
2. Find an element g ∈ Z*_n of order λ(n). Based on Proposition 1, R can randomly choose an element g ∈ Z*_n, find its order and, if it is not equal to λ(n), choose another value. The algorithm is efficient because ord_n(g) | λ(n) and λ(n) has a small number of factors.
3. Set α = g^{2 c̃1 c̃2 p′ q′} (mod n).
It is easy to see that ord_n(α) = β. ✷
Prekey Verification. Prekey verification is done by the signer S by verifying
α^β ?= 1 (mod n) and α ≠ 1 (mod n).
A prekey is good if the above equations hold.
Key Generation. S selects a1, a2, b1, b2 ∈ Z_β as his secret key and computes
η1 = α^{a1} γ^{a2} (mod n) and η2 = α^{b1} γ^{b2} (mod n).
The public key is (η1, η2).
Signing a Message m. To sign a message m ∈ Z_β, S computes
s1 = a1 + b1 m (mod β) and s2 = a2 + b2 m (mod β),
and publishes (s1, s2) as his signature on m.
Verifying a Signature. A signature (s1, s2) on a message m passes the verification test if
η1 η2^m ?= α^{s1} γ^{s2} (mod n)
holds. The verification algorithm works because
η1 η2^m = α^{a1} γ^{a2} (α^{b1} γ^{b2})^m (mod n) = α^{a1 + b1 m} γ^{a2 + b2 m} (mod n) = α^{s1} γ^{s2} (mod n). ✷
Proof of Forgery. If there is a forged signature (s1, s2) which passes the verification test, then the presumed signer can generate his own signature, namely (s1′, s2′), on the same message, and the following equations will hold:
α^{s1} γ^{s2} = α^{s1′} γ^{s2′} (mod n)
α^{s1 + a s2} = α^{s1′ + a s2′} (mod n)
α^{s1 − s1′} = α^{a(s2′ − s2)} (mod n)
s1 − s1′ = a(s2′ − s2) (mod β)
a = (s1 − s1′)(s2′ − s2)^{−1} (mod β)
By evaluating a, S can show that he can solve an instance of the discrete logarithm problem, which was assumed to be hard.
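As a sanity check, the whole scheme runs end-to-end with toy parameters. The Python sketch below uses p = 31, q = 41, β = 5 (our own illustrative values, far below the recommended |n|_2 ≈ 1881 and |β|_2 ≈ 151 bits); the prekey generation follows Lemma 1 and the prekey verification follows the equations above:

import random
from math import gcd

p, q, beta, pp, qp = 31, 41, 5, 3, 2   # p = 2*beta*pp + 1, q = 4*beta*qp + 1
n = p * q
lam = 120                               # lambda(n) = lcm(p - 1, q - 1)

def order(x):
    o, t = 1, x % n
    while t != 1:
        t, o = t * x % n, o + 1
    return o

# prekey generation: alpha of order beta, gamma = alpha^a for a secret a
g = next(x for x in range(2, n) if gcd(x, n) == 1 and order(x) == lam)
alpha = pow(g, lam // beta, n)          # lam // beta = 2*c1'*c2'*p'*q' here
a = random.randrange(1, beta)
gamma = pow(alpha, a, n)
assert pow(alpha, beta, n) == 1 and alpha != 1   # prekey verification

# key generation
a1, a2, b1, b2 = (random.randrange(beta) for _ in range(4))
eta1 = pow(alpha, a1, n) * pow(gamma, a2, n) % n
eta2 = pow(alpha, b1, n) * pow(gamma, b2, n) % n

def sign(m):
    return (a1 + b1 * m) % beta, (a2 + b2 * m) % beta

def verify(m, s1, s2):
    return eta1 * pow(eta2, m, n) % n == pow(alpha, s1, n) * pow(gamma, s2, n) % n

m = 3
s1, s2 = sign(m)
assert verify(m, s1, s2)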
Proof (sketch). From the proof-of-forgery steps above, it is true that
α^{s1 − s1′} = α^{a(s2′ − s2)} (mod n) implies s1 − s1′ = a(s2′ − s2) (mod β),
because ord_n(α) = β. ✷
4.1 Security Proof
Firstly, we show that the scheme is an instance of the general construction proposed in [4], with the following underlying family of bundling homomorphisms.
Bundling Homomorphism.
– Key Generation: On input the security parameters k and σ, two primes p and q with |q|_2 = σ and |p|_2 ≈ |q|_2, p = c1 β p′ + 1, q = c2 β q′ + 1, gcd(c1, c2) = 2, (c1, c2) ∈ Z, and an element α with ord_n(α) = β are chosen. Let γ = α^a (mod n). The key is (p, q, α, β, γ).
– Families of Groups: Let n = pq. Define G_K = Z_β × Z_β and H_K = Z*_n. The homomorphism h_{(p,q,α,β,γ)} is
h_{(p,q,α,β,γ)} : Z_β × Z_β → Z*_n;  a1, a2 ∈ Z_β;  h_{(p,q,α,β,γ)}(a1, a2) = α^{a1} γ^{a2} (mod n).
Discrete Logarithm (DL) Assumption [10]. Given I = (p, α, β), where p is prime, α ∈ Z*_p is a primitive element and β ∈ Z*_p with α^a ≡ β (mod p), it is hard to find a = log_α β.
Theorem 1. Under the DL assumption, the above construction is a family of bundling homomorphisms.
Proof. To show that the above definition gives a bundling homomorphism, we have to show (Definition 4.1 of [4]) that:
1. For any µ ∈ Z*_n with µ = α^{a1} γ^{a2} (mod n), (a1, a2) ∈ Z_β × Z_β, there are β preimages in Z_β × Z_β.
2. For a given µ ∈ Z*_n with µ = α^{a1} γ^{a2} (mod n), (a1, a2) ∈ Z_β × Z_β, it is difficult to find a pair (ã1, ã2) such that α^{ã1} γ^{ã2} = µ (mod n).
3. It is hard to find two pairs (a1, a2), (ã1, ã2) ∈ Z_β × Z_β that map to the same value.
To prove property 1, we note that, writing µ = α^k (mod n) = α^{a1} γ^{a2} (mod n) for γ = α^a (mod n) and ord_n(α) = β, there exist exactly β different pairs (ã1, ã2) in Z_β × Z_β that satisfy k = ã1 + a ã2 (mod β). Hence there are β preimages of µ.
Now, given µ = α^{a1 + a a2} (mod n), finding a1 + a a2 is equivalent to solving an instance of the DL problem, which is hard (property 2).
Property 3 means that it is difficult to find (a1, a2) and (ã1, ã2) such that α^{a1} γ^{a2} = α^{ã1} γ^{ã2} (mod n). Suppose that there is a probabilistic polynomial-time algorithm Ã that could compute such a collision. Then we construct an algorithm D̃ that, on input (n, α, β, γ), where γ = α^a (mod n), outputs the secret value a as follows. First, D̃ runs Ã, and if Ã outputs a collision, i.e. (s1, s2) and (s̃1, s̃2) such that α^{s1} γ^{s2} = α^{s̃1} γ^{s̃2} (mod n), then D̃ computes:
α^{s1} γ^{s2} = α^{s̃1} γ^{s̃2} (mod n)
α^{s1 + a s2} = α^{s̃1 + a s̃2} (mod n)
α^{s1 − s̃1} = α^{a(s̃2 − s2)} (mod n)
s1 − s̃1 = a(s̃2 − s2) (mod β)
a = (s1 − s̃1)(s̃2 − s2)^{−1} (mod β)
D̃ is successful with the same probability as Ã and is almost equally efficient. Hence, this contradicts the DL assumption. ✷
Theorem 2. The FSS scheme described above is secure for the signer.
According to Theorem 4.2 of [4], we must find the size of the set
T := {(c1, c2) ∈ Z_β × Z_β | α^{c1} γ^{c2} = 1 (mod n) ∧ (m(c1 + a c2) = 0)}
for all values of m between 1 and β − 1, given that the prekey is good. Since (0, 0) is the only element of this set, the size of the set T is 1. Together with Theorem 4.2 of [4], this implies that it suffices to choose τ = σ in the proposed scheme. ✷
4.2 Efficiency Comparison
In this section we compare the efficiency of the proposed scheme with the best known FSS schemes. Efficiency of an FSS scheme is measured in terms of three length parameters, namely the lengths of the secret key, the public key and the signature, and the amount of computation required in each case. To compare two FSS schemes we fix the level of security provided by the two schemes and find the sizes of the three length parameters and the number of operations (for example, multiplications) required for signing and testing. Table 1 gives the results of a comparison of four FSS schemes when the security levels of the receiver and the sender are given by k and σ, respectively. In this comparison, the first two schemes (first and second columns of the table) are chosen because they have provable security. The first scheme, proposed by van Heijst and Pedersen [13], is the most efficient provably secure scheme; it is based on the discrete logarithm assumption. We refer to this scheme as the DL scheme in
this paper. The second scheme is a factorization-based FSS proposed in [14, 4]. The third scheme is the RSA-based FSS scheme [12]. This scheme, although insecure, is included for completeness. Column four corresponds to our proposed scheme. We use the same values of σ and k for all the systems and determine the sizes of the three length parameters. The underlying hard problems in these schemes are the Discrete Logarithm (DL) problem, the Subgroup DL problem [3], and/or the Factorization problem. This means that the same level of receiver's security (given by the value of the parameter k) translates into different sizes of primes and moduli. In particular, the security level of a 151-bit subgroup discrete logarithm with basic primes of at least 1881 bits is the same as that of factoring a 1881-bit RSA modulus [3].
To find the required size of primes in the DL scheme, assuming security parameters (k, σ) are given, first K = max(k, σ) is found and then the prime q is chosen such that |q|_2 ≥ K. The bundling degree in this scheme is q, and the value of p is chosen such that q | p − 1 and (p − 1)/q is upper-bounded by a polynomial in K (pages 237 and 238 of [6]). The size |p|_2 must be chosen according to the standard discrete logarithm problem, which for adequate security must be at least 1881 bits [3]. However, the size |q|_2 can be chosen as low as 151 bits [3]. Since |p|_2 and |q|_2 are to some extent independent, we use K̂ to denote |p|_2.
In the factorization scheme of [4], the security level of the sender σ satisfies τ = ρ + σ, where τ is the bundling degree and 2^ρ is the size of the message space. The security parameter of the receiver, k, is determined by the difficulty of factoring the modulus n. Now, for a given pair of security parameters (k, σ), the size of the modulus N_k is determined by k, but determining τ requires knowledge of the size of the message space. Assume ρ = |p|_2 ≈ |q|_2 = N_k/2. This means that τ = σ + N_k/2. Now the efficiency parameters of the system can be given as shown in the table; in particular, the sizes of the secret and public keys are 2(τ + N_k) and 2N_k, respectively.
In the RSA-based FSS scheme [12], τ = |φ(n)|_2, and the security of the receiver is determined by the difficulty of factoring n. This means that τ ≈ |n|_2. To design a system with security parameters (k, σ), first N_k, the modulus size that provides security level k for the receiver, is determined, and then K = max(σ, |N_k|_2). The modulus n is chosen such that |n|_2 = K. With this choice, the system provides adequate security for the sender and the receiver.
In our proposed scheme the bundling degree, and hence the security level of the sender, is σ = τ = |β|_2. The security of the receiver is determined by the difficulty of factoring n and of the discrete logarithm in a subgroup of size β in Z*_n. Assume |p|_2 ≈ |q|_2 ≈ |n|_2/2 and |n|_2 ≈ c × |β|_2. Then we first find N_k, which is the modulus size for which factorization has difficulty k. Next, we find F_{k,N_k}, which is the minimum size of a multiplicative subgroup of Z*_n for which the subgroup discrete logarithm has hardness k. Finally, we choose K = max(F_{k,N_k}, σ) and set |β|_2 = K. With these choices, the sender's and receiver's levels of security are at least σ and k, respectively. We use K̂ to represent |n|_2.
The proposed scheme is more efficient than the factorization schemes of [12] and [4] and is as efficient as the DL scheme.
Table 1. Comparison of efficiency parameters

                              DL [13]   Fact [14, 4]   RSA-based [12]   Our FSS
PK (mult)                     4K        2K             4K               4K
Sign (mult)                   2         K              2                2
Test (mult)                   3K        2K + σ         3K               3K
Length of SK (bits)           4K        4K + 2σ        4K               4K
Length of PK (bits)           2K̂        2K             2K               2K̂
Length of a signature (bits)  2K        2K + σ         4K               2K
Underlying hard problem       DL        Factorization  Factorization    Factorization
In the DL scheme, to achieve adequate security, K must be chosen to be at least 151 bits, and K̂ must be at least 1881 bits [3]. These are also the values required by our scheme.
5 Conclusions
We constructed a new FSS scheme based on factorization which is provably secure (cf. [12]). The scheme is as efficient as the most efficient FSS scheme, due to van Heijst and Pedersen [13], which is based on the discrete logarithm problem. We proved the security of the proposed scheme.
References
[1] N. Barić and B. Pfitzmann. Collision-Free Accumulators and Fail-Stop Signature Schemes without Trees. Advances in Cryptology – EUROCRYPT '97, Lecture Notes in Computer Science 1233, pages 480–494, 1997.
[2] D. Chaum, E. van Heijst, and B. Pfitzmann. Cryptographically strong undeniable signatures, unconditionally secure for the signer. Interner Bericht, Fakultät für Informatik, 1/91, 1990.
[3] A. Lenstra and E. Verheul. Selecting cryptographic key sizes. Online: http://www.cryptosavvy.com/. Extended abstract appeared in Commercial Applications, Price Waterhouse Coopers, CCE Quarterly Journals, 3:3–9, 1999.
[4] T. P. Pedersen and B. Pfitzmann. Fail-stop signatures. SIAM Journal on Computing, 26/2:291–330, 1997.
[5] B. Pfitzmann. Fail-stop signatures without trees. Hildesheimer Informatik-Berichte, Institut für Informatik, 16/94, 1994.
[6] B. Pfitzmann. Digital Signature Schemes – General Framework and Fail-Stop Signatures. Lecture Notes in Computer Science 1100, Springer-Verlag, 1996.
[7] B. Pfitzmann and M. Waidner. Formal aspects of fail-stop signatures. Interner Bericht, Fakultät für Informatik, 22/90, 1990.
[8] R. Safavi-Naini and W. Susilo. A general construction for Fail-Stop Signatures using Authentication Codes. Proceedings of Workshop on Cryptography and Combinatorial Number Theory (CCNT '99), Birkhäuser, pages 343–356, 2001.
[9] R. Safavi-Naini, W. Susilo, and H. Wang. An efficient construction for fail-stop signatures for long messages. Journal of Information Science and Engineering (JISE) – Special Issue on Cryptology and Information Security, 17:879–898, 2001.
[10] D. R. Stinson. Cryptography: Theory and Practice. CRC Press, Boca Raton, New York, 1995.
[11] W. Susilo, R. Safavi-Naini, M. Gysin, and J. Seberry. A New and Efficient Fail-Stop Signature Scheme. The Computer Journal, 43(5):430–437, 2000.
[12] W. Susilo, R. Safavi-Naini, and J. Pieprzyk. RSA-based Fail-Stop Signature schemes. International Workshop on Security (IWSEC '99), IEEE Computer Society Press, pages 161–166, 1999.
[13] E. van Heijst and T. Pedersen. How to make efficient fail-stop signatures. Advances in Cryptology – EUROCRYPT '92, pages 337–346, 1992.
[14] E. van Heijst, T. Pedersen, and B. Pfitzmann. New constructions of fail-stop signatures and lower bounds. Advances in Cryptology – CRYPTO '92, Lecture Notes in Computer Science 740, pages 15–30, 1993.
[15] M. Waidner and B. Pfitzmann. The dining cryptographers in the disco: Unconditional sender and recipient untraceability with computationally secure serviceability. Advances in Cryptology – EUROCRYPT '89, Lecture Notes in Computer Science 434, 1990.
On the Security of the Li-Hwang-Lee-Tsai Threshold Group Signature Scheme
Guilin Wang
Institute for Infocomm Research
21 Heng Mui Keng Terrace, Singapore 119613
[email protected]
Abstract. A (t, n) threshold group signature scheme is a generalization of a group signature scheme, in which only t or more members from a given group with n members can represent the group to generate signatures anonymously, and the identities of the signers of a signature can be revealed in case of a later dispute. In this paper, we first present a definition of threshold group signatures, and propose several requirements to evaluate whether a threshold group signature scheme is secure and efficient. Then we investigate the security and efficiency of a threshold group signature scheme proposed by Li, Hwang, Lee and Tsai, and point out eight weaknesses in their scheme. The most serious weakness is that there is a framing attack on their scheme. In this framing attack, once the group private key is controlled, (n − t + 1) colluding group members can forge a valid threshold group signature on any given message, which looks as if it was signed by (t − 1) honest group members and one cheating member. At the same time, none of these (t − 1) honest members can detect this cheating behavior, because they can use the system to generate group signatures normally.
Keywords: digital signatures, group signatures, threshold group signatures, threshold-multisignatures.
1 Introduction
As a relatively new concept, group signatures were introduced and first realized by Chaum and van Heyst in [10]. In a group signature scheme, each member of a given group is able to sign messages anonymously on behalf of the group. However, in case of a later dispute, a group signature can be opened by the group manager and the actual identity of the signer revealed. From the verifiers' point of view, they only need to know a single group public key to verify group signatures. On the other hand, from the point of view of the signing group, the group can conceal its internal organizational structure, but can still trace the signer's identity if necessary. By virtue of these advantages, it is believed that group signatures have many potentially practical applications, such as authenticating price lists, press releases, digital contracts, e-voting, e-bidding, and e-cash.
Inspired by the pioneering work of Chaum and van Heyst, a number of improvements and new group signature schemes have been proposed [11, 12, 31,
5, 6, 7, 3, 2, 1]. In [11], Chen and Pedersen constructed the first scheme which allows new members to join the group dynamically. They also pointed out the idea of sharing the group public key to realize a t-out-of-n threshold scheme, but did not provide concrete schemes. Camenisch presented an efficient group signature scheme with the ability to add (or remove) group members after the initialization, and then extended his scheme to a generalized group signature scheme such that an authorized subset of group members can sign messages on the group's behalf collectively [5]. As an example of his generalized group signature scheme, Camenisch presented the first threshold group signature scheme. But in [5], both the length of the group public key and that of a group signature are proportional to the group size. In [6], Camenisch and Stadler proposed the first group signature scheme whose group public key and signatures have length independent of the group size. Thus their scheme can be used for large groups. Camenisch and Michels [8] aimed to design generic group signature schemes with separability, i.e., schemes in which the group members can choose their keys independently of each other. Ateniese et al. focused on some bothersome issues that stand in the way of real-world applications and deployments of group signatures, such as coalition attacks and member deletion [3, 2]. Based on their observation of an unsuitable number-theoretic assumption in [6], Ateniese and Tsudik [3] presented some quasi-attacks on the basic scheme of [6] and then proposed some simple ways to prevent them. In [1], Ateniese et al. proposed a provably secure coalition-resistant group signature scheme. Kim, Lim and Lee [23] proposed a new group signature scheme with a member deletion procedure. Based on the notion of dynamic accumulators, Camenisch and Lysyanskaya proposed a new efficient method for the member deletion problem in group signature schemes. At the same time, they pointed out that the scheme proposed by Kim et al. in [23] is broken, i.e., deleted group members can still prove membership.
In [24], Langford pointed out attacks on the group public key generation protocols in several threshold cryptosystems [19, 17, 25, 30]; that is, a group member can control the group private key. Michels and Horster [27] discovered attacks against several multiparty signature schemes [17, 18, 25, 20]. Their attacks have in common that the attacker is an insider, i.e., a dishonest group member, and that the protocol is disrupted. Joye et al. [22, 21] showed that several newly designed group signature schemes are universally forgeable, that is, anyone (not necessarily a group member) is able to generate a valid group signature on an arbitrary message, which cannot be traced by the group manager.
By combining the idea of (t, n) threshold signatures [13, 14, 17, 16] with multisignatures [28, 4, 29, 15], Li, Hwang, Lee and Tsai [26] proposed a new type of signature called (t, n) threshold-multisignatures, with three properties: (1) Threshold characteristic: only t or more members of a given group can generate valid group signatures; (2) Anonymity: the group members generate group signatures anonymously, and they use pseudonyms as their identities in the public directory; (3) Traceability: the identities of the signers can be revealed in exceptional cases, such as a legal dispute. At the same time, they presented two
concrete such schemes [26]: one needs a trusted share distribution center (SDC)^1, while the other does not. Furthermore, they extended their proposed schemes to realize generalized multisignatures such that group signatures can only be generated by some specified subsets of group members rather than by any subset of t members. We notice that in a multisignature scheme the identities of the signers are often public, and the public keys of the signers are needed to verify a signature. At the same time, anonymity and traceability are two essential properties of a group signature scheme [10]. So, we believe that it is more accurate to call the (t, n) threshold-multisignature schemes in [25, 26] (t, n) threshold group signature schemes.
In this paper, we first present a definition of (t, n) threshold group signature schemes, because such a definition has not been given previously. Then, we list several requirements to evaluate whether a threshold group signature scheme is secure and efficient. After that, we investigate the security and efficiency of the second scheme proposed by Li, Hwang, Lee and Tsai in [26]. For convenience, we will hereafter refer to this scheme as the LHLT scheme. According to these evaluation criteria, we point out eight weaknesses in the LHLT scheme. The most serious weakness is that there is a framing attack^2. The reason is that we find Langford's attack can also be applied to the LHLT scheme. Based on this weakness in the group public key generation protocol, we present the detailed procedure of this framing attack on the LHLT scheme by demonstrating how (n − t + 1) colluding group members can forge a valid threshold group signature on any given message, which looks as if it was signed by (t − 1) honest group members and one cheating member.
The rest of this paper is organized as follows. Section 2 proposes a definition of (t, n) threshold group signature schemes, and addresses the security and efficiency of these schemes. Section 3 reviews the LHLT scheme briefly, and Section 4 points out some weaknesses of it. After that, Section 5 demonstrates how (n − t + 1) colluding group members can forge a valid threshold group signature on any given message to frame (t − 1) honest group members. Section 6 gives an example to illustrate the harm done by this framing attack, and gives remarks comparing our framing attack with Michels and Horster's attack [27].
2 Definition
Based on the formal definitions of group signatures given in [12, 5, 6, 7, 8, 3, 2, 1] and our understanding of threshold group signatures, we present the following formal definition.

Definition 1. A (t, n) threshold group signature scheme is a digital signature scheme comprised of the following six procedures:
¹ An SDC can also be called a group manager, authority, or dealer.
² In a framing attack, one or several group members are held responsible for a signature actually generated by several other group members and/or non-group members [11].
– SETUP: A protocol among group managers for setting system parameters and generating the initial group public key and group private key.
– JOIN: A protocol between group managers and a user that results in the user becoming a new group member.
– SIGN: A protocol among t or more group members for producing a group signature of a given message.
– VERIFY: An algorithm for establishing the validity of a group signature when the group public key and a signed message are given.
– OPEN: A protocol among group managers that reveals the actual identities of the t signers when a signed message and the group public key are given.
– QUIT: A protocol between a group member and group managers for removing the group member from the system.

A secure threshold group signature scheme must satisfy the following properties:

1. Correctness: All signatures on any message generated by any honest authorized subset of group members using SIGN will be accepted by VERIFY.
2. Unforgeability: Only group members are able to generate valid partial signatures for given messages.
3. Threshold Characteristic: Only t or more group members are able to generate valid threshold group signatures for given messages.
4. Anonymity: Given a threshold group signature, identifying the real signers is computationally hard for everyone but the group managers.
5. Unlinkability: Deciding whether two different signatures were generated by the same subset of group members is computationally hard.
6. Exculpability: No subset of group members or group managers can sign a message on behalf of another subset³, i.e., there are no framing attacks.
7. Traceability: In case of dispute, a group signature can be opened and the real identities of the signers revealed; moreover, the subset of signers cannot prevent the opening of a valid group signature.
8. Coalition-Resistance: A colluding subset of group members cannot generate a valid group signature that cannot be traced.

The efficiency of a threshold group signature scheme is typically evaluated by the following parameters:

– Whether the size of the group public key is independent of the size of the group.
– Whether the size of a group signature is independent of the size of the group.
– The computational complexity and communication cost of SIGN, VERIFY and OPEN.
– The efficiency of SETUP, JOIN and QUIT.
³ But this property does not preclude group managers from creating nonexistent members and then generating valid group signatures.
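To make the interface of Definition 1 concrete, the following minimal Python skeleton (our own illustrative sketch; the class and method names are assumptions, not part of any cited scheme) organizes the six procedures:

```python
from dataclasses import dataclass, field


@dataclass
class ThresholdGroupSignatureScheme:
    """Skeleton of the six procedures of Definition 1 (illustrative only)."""
    t: int                               # threshold
    n: int                               # group size
    members: list = field(default_factory=list)

    def setup(self):
        """SETUP: fix system parameters; generate group public/private key."""
        raise NotImplementedError

    def join(self, user):
        """JOIN: admit a user as a new group member."""
        raise NotImplementedError

    def sign(self, subset, message):
        """SIGN: t or more members produce a group signature on message."""
        raise NotImplementedError

    def verify(self, group_pk, message, signature):
        """VERIFY: check a signature against the group public key."""
        raise NotImplementedError

    def open(self, message, signature):
        """OPEN: group managers reveal the identities of the t signers."""
        raise NotImplementedError

    def quit(self, member):
        """QUIT: remove a member from the group."""
        raise NotImplementedError
```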
3 Review of LHLT Scheme
In the LHLT (t, n) threshold group signature scheme [26], it is assumed that all communication channels among group members are secure and reliable. The whole scheme consists of four stages: system initialization, group public key and secret shares generation, partial signature generation and verification, and group signature generation and verification.

Stage 1. System Initialization
Some or all members collectively agree on the public system parameters {p, q, α, H}, where:

– p: a prime modulus such that 2^511 < p < 2^512;
– q: a prime such that q | (p − 1) and 2^159 < q < 2^160;
– α: a random generator of order q in GF(p);
– H: a collision-free one-way hash function.
Stage 2. Group Public Key and Secret Shares Generation
Each member i in the group A = {1, 2, · · · , n} randomly selects a polynomial f_i(x) over Z_q, of degree at most (t − 1), and a number x_i ∈_R {1, 2, · · · , q − 1} denoting his pseudonym; then he computes y_i = α^{f_i(0)} mod p. (x_i, y_i) is the public key of member i, i ∈ A, and the polynomial f_i(x) (especially f_i(0)) is kept secret. When all members have released (x_i, y_i) through a broadcast channel, the group public key y can be determined as:

    y = ∏_{i∈A} y_i mod p = α^{∑_{i∈A} f_i(0) mod q} mod p.    (1)

Then, member i generates the following values for each member j (j ∈ A, j ≠ i):

    u_ij = g_ij + f_i(x_j) mod q, where g_ij ∈_R {1, 2, · · · , q − 1};
    y_ij = α^{u_ij} mod p = α^{g_ij + f_i(x_j) mod q} mod p;    (2)
    z_ij = α^{g_ij} mod p.
u_ij is sent privately to member j as his secret share, while y_ij and z_ij are published as public information⁴.

Stage 3. Partial Signature Generation and Verification
When t members of group A want to generate a signature for a message m, each member i, i ∈ B (B ⊂ A and |B| = t), selects a random number k_i ∈_R [1, q − 1], and computes and broadcasts a public value r_i = α^{k_i} mod p.

⁴ Member j can use y_ij and z_ij to check whether he received a correct secret share from member i. For details, please consult [26].
Once all r_j (j ∈ B) are available, each member i computes the values R and E, and then his partial signature s_i, as follows:

    R = ∏_{j∈B} r_j mod p = α^{∑_{j∈B} k_j mod q} mod p,
    E = H(m, R),    (3)
    s_i = f_i(0) + ∑_{j∈A\B} u_ji · C_Bi + k_i · E mod q,

where C_Bi is the Lagrange interpolating coefficient given by

    C_Bi = ∏_{j∈B\{i}} x_j / (x_j − x_i) mod q.    (4)

Then, each member i (i ∈ B) sends his partial signature (m, i, r_i, s_i) to the designated combiner DC (any member in group A or the verifier of a signature can play this role). After computing the values R and E given by equation (3), DC uses the public information (x_i, y_i) and y_ji (j ∈ A \ B) to verify the validity of (m, i, r_i, s_i):

    α^{s_i} ≟ y_i · (∏_{j∈A\B} y_ji)^{C_Bi} · r_i^E mod p,  ∀i ∈ B.    (5)
Stage 4. Group Signature Generation and Verification
If all partial signatures (m, i, r_i, s_i), i ∈ B, are valid, then DC produces the group signature (m, B, R, S) by the following two equations:

    R = ∏_{i∈B} r_i mod p,  S = ∑_{i∈B} s_i mod q.    (6)
When a verifier wants to verify the validity of a group signature (m, B, R, S), he first computes the values E and T as follows:

    E = H(m, R);
    T = ∏_{i∈B} (∏_{j∈A\B} z_ji)^{C_Bi} mod p.    (7)

Then, the verifier uses the group public key y to check whether the following equality holds:

    α^S ≟ y · T · R^E mod p.    (8)
If it does, he accepts (m, B, R, S) as a valid group signature. Li et al. did not provide a proof of the correctness of this scheme, so we give the following theorem to guarantee the correctness of the LHLT scheme⁵.

Theorem 1. If all members i ∈ B and DC are honest, then the group signature (m, B, R, S) generated by them is valid, i.e., it satisfies equation (8).

⁵ Theorems 1, 2 and 4 in [26] do not express the correctness of the three schemes but repeat the definitions of valid group signatures.
Proof. First, from the definitions of S and s_i, we have

    S = ∑_{i∈B} s_i = ∑_{i∈B} f_i(0) + ∑_{i∈B} ∑_{j∈A\B} C_Bi · u_ji + ∑_{i∈B} k_i · E mod q.

If we replace each u_ji in the above equation by f_j(x_i) and g_ji according to the first equation of (2), and sum the two parts separately, we get the following equation:

    S = ∑_{i∈B} f_i(0) + ∑_{j∈A\B} ∑_{i∈B} C_Bi · f_j(x_i) + ∑_{i∈B} C_Bi · ∑_{j∈A\B} g_ji + E · ∑_{i∈B} k_i mod q.

Furthermore, we replace the terms in the second expression of the above equation by the following Lagrange interpolating equation:

    f_j(0) = ∑_{i∈B} C_Bi · f_j(x_i) mod q.    (9)

Then, we get

    S = ∑_{i∈A} f_i(0) + ∑_{i∈B} C_Bi · ∑_{j∈A\B} g_ji + E · ∑_{i∈B} k_i mod q.

Finally, raising α to both sides of the above equation shows that equation (8) holds. □
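To make the algebra of Stages 1–4 and the proof above concrete, here is a minimal Python sketch that runs the scheme with toy parameters (p = 23, q = 11, α = 2 — far too small for any security, chosen only so that q | p − 1 and α has order q mod p) and checks that equation (8) holds. All variable names are ours; this illustrates the equations, not the original implementation.

```python
import hashlib
import random

p, q, alpha = 23, 11, 2          # toy parameters: q | p - 1, alpha has order q mod p
n, t = 5, 3
A = list(range(1, n + 1))

def H(m, R):                     # E = H(m, R), reduced mod q
    return int(hashlib.sha256(f"{m}|{R}".encode()).hexdigest(), 16) % q

def ev(coeffs, x):               # evaluate a polynomial over Z_q
    return sum(c * pow(x, e, q) for e, c in enumerate(coeffs)) % q

# Stages 1-2: member i picks f_i of degree <= t-1 and pseudonym x_i = i
f = {i: [random.randrange(q) for _ in range(t)] for i in A}
x = {i: i for i in A}
y = 1
for i in A:
    y = y * pow(alpha, ev(f[i], 0), p) % p                     # group key, eq. (1)
g = {(i, j): random.randrange(1, q) for i in A for j in A if i != j}
u = {(i, j): (g[i, j] + ev(f[i], x[j])) % q for (i, j) in g}   # eq. (2)
z = {(i, j): pow(alpha, g[i, j], p) for (i, j) in g}

def C(B, i):                     # Lagrange coefficient C_Bi, eq. (4)
    c = 1
    for j in B:
        if j != i:
            c = c * x[j] * pow(x[j] - x[i], -1, q) % q
    return c

# Stages 3-4: members in B sign message m
B, m = [1, 2, 3], "toy message"
k = {i: random.randrange(1, q) for i in B}
R = 1
for i in B:
    R = R * pow(alpha, k[i], p) % p
E = H(m, R)
S = sum(ev(f[i], 0) + C(B, i) * sum(u[j, i] for j in A if j not in B)
        + k[i] * E for i in B) % q                             # eqs. (3), (6)

# Verification, eqs. (7)-(8)
T = 1
for i in B:
    prod = 1
    for j in A:
        if j not in B:
            prod = prod * z[j, i] % p
    T = T * pow(prod, C(B, i), p) % p
assert pow(alpha, S, p) == y * T * pow(R, E, p) % p
print("equation (8) holds")
```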
4 Weaknesses in LHLT Scheme

In [26], Li et al. presented an elaborate security analysis of their schemes. However, from the above description of the LHLT scheme, it is not difficult to see that this threshold signature scheme has the following eight weaknesses: the first four concern efficiency, the others security.

(1) Public Key Length. In fact, the public key of the LHLT scheme not only consists of y, but also includes (i, x_i, y_i) and (y_ij, z_ij), ∀ i, j ∈ [1, n]: the DC needs y_i and y_ij to check the validity of each partial signature s_i according to equation (5), verifiers need z_ij to calculate the value T in equation (7), and both need x_i to calculate C_Bi (recall equation (4)). So, the public key length depends on the size n of the group.

(2) The Size of Signatures. In a signature tuple (m, B, R, S), the size of B depends on the threshold t. If n and t are big integers, then the size of signatures becomes big, too.

(3) Member Deletion. This is an open problem [3] in the design of a practical group signature scheme. The LHLT scheme does not provide any solution to it, i.e., this scheme lacks a QUIT procedure.

(4) Member Addition. The LHLT scheme mentions that a new member n + 1 can be dynamically added without affecting the shares of old members. But in fact, in addition to publishing his public key pair (x_{n+1}, y_{n+1}), many things have to be done before the new member n + 1 becomes a group member. For example, new member n + 1 has to distribute u_{n+1,j} to the old members and publish y_{n+1,j} and z_{n+1,j} as public information; old member j has to send u_{j,n+1} to new member n + 1 and publish y_{j,n+1} and z_{j,n+1} as public information. Moreover, in some cases this procedure will reveal the real identity of the new member n + 1, because it is possible that the real identities of all members in the group are publicly known (while the correspondence between identities and public key pairs is secret). An example of such a case is the directorate of a corporation, where the public key pair (x_j, y_j) of each old member is not changed. By comparing the identities and public key pairs of the old group and the new group, everyone can recognize the real identity of the new member and his public key pair. So there is no anonymity for the new member. In this scenario, perhaps the only choice is to reset the system by updating the group public key, and all parameters and secret shares of all members.

(5) Anonymity. From the subset B of a valid signature tuple (m, B, R, S), each verifier can learn the pseudonyms of all signers, so the LHLT scheme can only provide weak anonymity.

(6) Unlinkability. Using the information revealed by B, verifiers can link all signatures signed by the same subset or the same member. Therefore, the LHLT scheme does not possess unlinkability.

(7) Traceability. The LHLT scheme does not provide any method to bind the real identity of a member to his pseudonym, so the tracing procedure is not described in detail. However, in distributed environments, recording members' real identities and maintaining the relationship between real identities and pseudonyms is really not easy.

(8) Exculpability. In [26], Li et al. claimed that the signing set of a group signature cannot be impersonated by any other set of group members, i.e., that there are no framing attacks. But in fact there is a framing attack on the LHLT scheme, so this threshold group signature scheme does not have exculpability. Details of the framing attack are given in the next section.

5 A Framing Attack on LHLT Scheme
In this section, we present the details of how (n − t + 1) colluding members can forge a valid group signature on any message. This forged signature looks as if it were signed by (t − 1) honest members and one of the corrupted members. At the same time, some of the group members, including all honest members, can still generate group signatures properly. So, honest members feel that the system works normally and cannot detect the existence of any deceit. But in case of dispute, such forged signatures are opened, and these honest members have to take responsibility for them. For convenience, we assume that the first (t − 1) members, i.e., members 1, · · ·, t − 1, are honest: each of them honestly selects parameters, distributes secret shares, receives and checks the secret shares sent by other members to meet the
requirements of Section 2, and does not reveal to anybody any u_ji sent by other members or any g_ij selected by himself. But all other members collude with member n: they also select parameters and distribute secret shares meeting the requirements described in Section 2; however, some of them reveal the values g_ij selected by themselves to member n, and the others intentionally ignore the fact that member n does not send the values u_ni to them. The whole procedure includes three steps: member n controlling the group private key, member n distributing secret shares, and forging valid group signatures.

5.1 Member n Controlling the Group Private Key
In the LHLT scheme, it is not required that all public keys y_i be published simultaneously when generating the group public key y according to equation (1). So member n can publish his public key last, after he has learned all other y_i, i ∈ {1, · · · , n − 1}, even though he has already prepared his public key y_n = α^{f_n(0)} mod p. Member n then computes and broadcasts the following value ȳ_n as his public key, using all published values y_i, i ∈ {1, · · · , n − 1}:

    ȳ_n = y_n · ∏_{i=1}^{n−1} y_i^{−1} mod p.

Hence, all members in group A will take y_n as the group public key y, and member n knows the group private key f_n(0) corresponding to y, because the following equation holds:

    y = ȳ_n · ∏_{i=1}^{n−1} y_i = y_n = α^{f_n(0)} mod p.

Of course, member n does not know the private key f̄_n(0) corresponding to ȳ_n unless he can solve the following discrete logarithm problem:

    ȳ_n = α^{f̄_n(0)} mod p.

Once member n controls the group private key, he can collude with (n − t) other members to forge a valid group signature.
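The key-control step can be illustrated numerically. In the following Python fragment (a sketch using the same toy parameters as before; illustrative only), member n waits for y_1, · · · , y_{n−1} and then publishes ȳ_n, so that the product of all published keys collapses to a value whose discrete logarithm only he knows:

```python
import random

p, q, alpha = 23, 11, 2                    # same toy parameters as before

# Public keys y_1 .. y_{n-1} broadcast by the honest members.
honest_keys = [pow(alpha, random.randrange(q), p) for _ in range(4)]

f_n0 = random.randrange(q)                 # member n's chosen "group private key"
y_n = pow(alpha, f_n0, p)

# Member n publishes ybar_n = y_n * prod(y_i)^(-1) mod p instead of y_n.
inv = 1
for y_i in honest_keys:
    inv = inv * pow(y_i, -1, p) % p
ybar_n = y_n * inv % p

# The group public key computed by everyone is now alpha^{f_n(0)}.
y = ybar_n
for y_i in honest_keys:
    y = y * y_i % p
assert y == pow(alpha, f_n0, p)            # member n alone knows log_alpha(y)
```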
5.2 Member n Distributing Secret Shares
Imagine a polynomial f̄_n(x) ∈ Z_q[x] of degree less than t whose free term is f̄_n(0). Member n can successfully share his private key f̄_n(0) with the other members, although he does not know its value. The basic idea is this: member n selects random numbers as the secret shares for the first t − 1 (honest) members, but computes the other shares for the remaining members (his accomplices). The concrete method is described as follows.
1. Member n selects 2(t−1) random numbers a_nj, b_nj ∈_R [1, q−1] (1 ≤ j ≤ t−1) as the corresponding ḡ_nj and f̄_n(x_j), respectively, and computes:

    u_nj = ḡ_nj + f̄_n(x_j) mod q (= a_nj + b_nj mod q),
    y_nj = α^{u_nj} mod p = α^{ḡ_nj + f̄_n(x_j) mod q} mod p,    (10)
    z_nj = α^{ḡ_nj} mod p = α^{a_nj} mod p.

Then, for every j ∈ {1, · · · , t − 1}, member n sends u_nj to member j secretly, and publishes y_nj and z_nj as public information.

2. Because t values of the function f̄_n(x), i.e., f̄_n(0), f̄_n(x_1), · · · , f̄_n(x_{t−1}), have been fixed (although member n does not know the exact value of f̄_n(0)), the function f̄_n(x) is determined. For every l ∈ [t, n − 1], if we let B_l = {1, 2, · · · , t − 1} ∪ {l}, then the following equation holds:

    ȳ_n = α^{f̄_n(0)} = α^{C_{B_l l} · f̄_n(x_l)} · ∏_{j=1}^{t−1} α^{C_{B_l j} · f̄_n(x_j)} mod p,  ∀l ∈ [t, n − 1].

From this equation, member n can compute the value of α^{f̄_n(x_l)} as follows:

    α^{f̄_n(x_l)} = (ȳ_n · ∏_{j=1}^{t−1} α^{−C_{B_l j} · f̄_n(x_j)})^{C_{B_l l}^{−1} mod q} mod p,  ∀l ∈ [t, n − 1].    (11)

3. For the next k (1 ≤ k ≤ n−t) members (i.e., members t, · · ·, t+k−1) after the first (t − 1) members, member n selects k random numbers u_nl ∈_R [1, q − 1], and computes

    y_nl = α^{u_nl} mod p,
    z_nl = y_nl · α^{−f̄_n(x_l)} mod p,

where α^{−f̄_n(x_l)} is the inverse of the α^{f̄_n(x_l)} determined by equation (11). In this case, member n does not know the value of ḡ_nl, for each l ∈ [t, t + k − 1].

4. For the last (n − t − k) members (i.e., members t + k, · · ·, n − 1), member n selects ḡ_nl ∈_R [1, q − 1], and computes z_nl and y_nl as follows:

    z_nl = α^{ḡ_nl} mod p,
    y_nl = α^{ḡ_nl} · α^{f̄_n(x_l)} mod p = z_nl · α^{f̄_n(x_l)} mod p,

where α^{f̄_n(x_l)} is determined by equation (11). In this case, member n does not know the value of u_nl, for each l ∈ [t + k, n − 1].

5. Up to now, the knowledge of member n is as shown in Table 1. Let C = {1, 2, · · · , t − 1, t, · · · , t + k − 1}. Table 1 shows that each member i ∈ C knows u_ni, so any t members in C can generate valid group signatures normally using equations (3) and (6). But member n does not know f̄_n(0), and member l (l ∈ [t + k, n − 1]) does not know u_nl, so they cannot take part in the normal generation of threshold group signatures. Moreover, the situation is worse than this, because there is a framing attack on the LHLT scheme.
Table 1. The Knowledge of Member n

Index l             | Member n knows                        | Member n does not know
l ∈ [1, t − 1]      | ḡ_nl, f̄_n(x_l), u_nl, y_nl, z_nl     | f̄_n(0)
l ∈ [t, t + k − 1]  | u_nl, y_nl, z_nl, α^{f̄_n(x_l)}       | f̄_n(0), f̄_n(x_l), ḡ_nl
l ∈ [t + k, n − 1]  | ḡ_nl, y_nl, z_nl, α^{f̄_n(x_l)}       | f̄_n(0), f̄_n(x_l), u_nl

5.3 Forging Valid Group Signatures
After member n has distributed the secret shares, he can collude with his (n − t) conspirators (i.e., all members j, j ∈ [t, n − 1]) to forge a valid group signature for any message m. But the (t − 1) honest members and one cheating member have to take responsibility for this forged signature, because it includes their pseudonyms, and all pseudonyms can be opened if necessary. The whole procedure is as follows.

1. Member n first selects t random numbers k_i ∈_R [1, q − 1] (i ∈ B_l = {1, 2, · · · , t − 1, l} and l ∈ [t + k, n]), then computes the values R and E as follows:

    R = α^{∑_{i∈B_l} k_i mod q} mod p = ∏_{i∈B_l} r_i mod p,  E = H(m, R).

2. If l ∈ [t + k, n − 1], each conspirator j (j ∈ A \ B_l \ {n}) sends his secrets g_ji (for all i ∈ B_l) to member n. According to Table 1, member n knows all ḡ_ni (i = 1, · · · , t − 1) and ḡ_nl (because l ∈ [t + k, n − 1]), so he can compute a signature S_l as follows:

    S_l = f_n(0) + ∑_{i∈B_l} C_{B_l i} · ḡ_ni + ∑_{j∈A\B_l\{n}} ∑_{i∈B_l} C_{B_l i} · g_ji + ∑_{i∈B_l} k_i · E mod q.    (12)

3. If l = n, each conspirator j (j ∈ A \ B_n) sends g_ji (for all i ∈ B_n) to member n, so member n can compute a signature S_n as follows:

    S_n = f_n(0) + ∑_{j∈A\B_n} ∑_{i∈B_n} C_{B_n i} · g_ji + ∑_{i∈B_n} k_i · E mod q.    (13)
4. Thus, all (n − t + 1) corrupted members, including member n, have forged a group signature (m, B_l, R, S_l) for message m, such that a verifier believes it was signed collectively by members 1, · · ·, t − 1, and l.

The following theorem guarantees the validity of the forged group signature (m, B_l, R, S_l) obtained by the above procedure.

Theorem 2. The above forgery attacks are successful, i.e.:

(1) If l ∈ [t + k, n − 1], then the forged signature (m, B_l, R, S_l) computed from equation (12) is a valid threshold group signature for message m;
(2) If l = n, then the forged signature (m, B_n, R, S_n) computed from equation (13) is also a valid threshold group signature for message m.

Proof. (1) In the case of l ∈ [t + k, n − 1], if the t members in subset B_l select the same t numbers k_i as in the first step of the above procedure, then their valid signature for message m is given by the following S:

    S = ∑_{i∈B_l} s_i = ∑_{i∈B_l} f_i(0) + ∑_{i∈B_l} C_{B_l i} · u_ni + ∑_{j∈A\B_l\{n}} ∑_{i∈B_l} C_{B_l i} · u_ji + ∑_{i∈B_l} k_i · E mod q.

By replacing the u_ni and all u_ji by the right-hand sides of the first equations of (10) and (2), exploiting the Lagrange interpolating equation (9), and using the fact that f_n(0) = f̄_n(0) + f_{n−1}(0) + · · · + f_1(0), the above equation can be rewritten as

    S = f_n(0) + ∑_{i∈B_l} C_{B_l i} · ḡ_ni + ∑_{j∈A\B_l\{n}} ∑_{i∈B_l} C_{B_l i} · g_ji + ∑_{i∈B_l} k_i · E mod q.

Comparing the right-hand sides of the above equation and (12) shows that S = S_l. So, according to Theorem 1, the forged tuple (m, B_l, R, S_l) computed from equation (12) is a valid threshold group signature for message m.

(2) When l = n, the validity of the signature (m, B_n, R, S_n) can be proved similarly. □
6 An Example and Remarks

In this section, we first give a simple example illustrating the damage caused by the above framing attack. Then, we compare our framing attack with Michels and Horster's attack [27]. Finally, several simple methods to avoid these attacks are given.

As an example, we assume that the ten members of the directorate of a corporation use a (7, 10) threshold group signature scheme to vote on a proposal m, by setting t = 7 and n = 10. By regulation of this corporation's directorate, proposal m is passed if and only if a valid threshold group signature for m is produced, i.e., at least seven members agree on the proposal and produce valid partial signatures for it. But in fact, the first six members of the directorate disagree with m, while the other four members agree with it. If a secure threshold group signature scheme were used, it would be impossible to generate a valid group signature in this scenario. But now assume that the LHLT threshold group signature scheme is used and member 10 has controlled the group private key. Then, the last four members can forge a valid group signature for m under the pseudonyms of {1, 2, 3, 4, 5, 6, 9} or {1, 2, 3, 4, 5, 6, 10} (let k = 2). The result is that proposal m is passed by the directorate of this corporation, although most members disagree with it. The honest members do not detect the existence of deceit, because any 7 members of the set {1, 2, 3, 4, 5, 6, 7, 8} can produce group signatures normally. But members 9 and 10 cannot generate valid partial signatures.
In [27], Michels and Horster also pointed out a framing attack on two schemes in [25]⁶. Their attack can also be applied to the LHLT scheme, since Li et al. [26] did not take any countermeasure to prevent it. In Michels and Horster's attack (see §4.3 of [27]), it is assumed that member 1 colludes with members t, · · · , n and the DC (also called the clerk) to cheat members 2, · · · , t − 1. The result is that when the members in B = {1, 2, · · · , t − 1, t} generate a signature (m, B, R, S) on a message m, member 1 (and the other cheating members) can generate a valid threshold group signature (m, B̃, R̃, S̃) on the same message m under the name of B̃ = {1, 2, · · · , t − 1, t + 1}. Our attack is stronger than Michels and Horster's in the following senses:

– In Michels and Horster's attack, dishonest members can only forge valid signatures on messages that the honest members 2, · · · , t − 1 agree to sign. In our attack, however, dishonest members can forge valid signatures on any messages selected by themselves.

– In order to generate the signature pair (m, B̃, R̃, S̃), member 1 has to disrupt the signing protocol once, so this abnormal action can be detected by honest members. In our attack, dishonest members do not need to interact with any honest member at all. Therefore, honest members only find something wrong when confronted with a valid group signature signed under their names that they did not sign at all.

– When a threshold group signature is opened, the true signers are identified, and they deserve awards or punishments according to whether their decision to sign message m was good or bad. In Michels and Horster's attack, only one member is exchanged between B and B̃, i.e., members t and t + 1, and both of them are dishonest. So their attack merely means that one dishonest member, t + 1, substitutes for another dishonest member, t, in taking awards or bearing punishments. In our attack, all the honest members are involved in the dispute.

– To overcome their attack, Michels and Horster proposed an improvement to the schemes in [25]: compute E = H(m, R, B) instead of E = H(m, R), and use a simultaneous channel for the distribution of the r_i, or require that all signers prove knowledge of the discrete logarithm of r_i without revealing its value. Even if the LHLT scheme is modified according to this improvement, our attack still works, because our attack is rooted in the public key generation protocol rather than in the distribution of the values r_i.

To prevent the above framing attack, the submissions of each member's public key y_i must be made synchronous in the public key generation protocol. To achieve this, we can require that all members commit to their public keys y_i before any of these values are revealed, or that each member submits
⁶ Note that there are two typos on page 343 of [27]: the symbol r̃_1 appearing in lines 8 and 10 should be replaced by a new symbol, for example r̄_1. Since r̃_1 has been defined as r̃_1 := r_1^{b_1^{−1}·b̃_1} mod p, a new symbol r̄_1 := R̃ · R^{−1} · r_1 mod p should be used, such that when member 1 reveals r̄_1 to all co-signers in B̃, all signers in B̃ compute R̃ instead of R, where R̃ ≡ r̄_1 · ∏_{i=2}^{t} r_i mod p.
x_i signed under his private key f_i(0) when he submits his public key y_i. At the same time, to avoid Michels and Horster's attack, their improvement for the distribution of the r_i should also be adopted. However, there is no straightforward way to improve the LHLT scheme to get rid of the other weaknesses described in §4. In fact, to the best of our knowledge, no existing threshold group signature scheme satisfies all the security and efficiency requirements proposed in this paper.
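A hash-based commit-then-reveal round is one simple way to enforce such synchronous submission. The sketch below is our own illustration of the idea (the function names and the 64-byte encoding are assumptions, not a construction from [26]):

```python
import hashlib
import os

def commit(y_i: int) -> tuple:
    """Round 1: return (commitment, opening) for a public key y_i."""
    nonce = os.urandom(16)                       # hiding randomness
    opening = nonce + y_i.to_bytes(64, "big")
    return hashlib.sha256(opening).digest(), opening

def check_opening(commitment: bytes, opening: bytes) -> int:
    """Round 2: verify an opening and return the revealed y_i."""
    assert hashlib.sha256(opening).digest() == commitment
    return int.from_bytes(opening[16:], "big")

# Every member broadcasts commit(y_i) first; openings are revealed only
# after all commitments have been received, so no member can choose his
# key as a function of the others' keys.
```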
Acknowledgements

The author would like to thank Dr. Jianying Zhou, Dr. Feng Bao, and Dr. Yongdong Wu, as well as the anonymous referees, for their helpful comments.
References

[1] G. Ateniese, J. Camenisch, M. Joye, and G. Tsudik. A practical and provably secure coalition-resistant group signature scheme. In: Crypto 2000, LNCS 1880, pp. 255-270. Springer-Verlag, 2000.
[2] G. Ateniese, M. Joye, and G. Tsudik. On the difficulty of coalition-resistance in group signature schemes. In: Second Workshop on Security in Communication Networks (SCN'99), September 1999.
[3] G. Ateniese and G. Tsudik. Some open issues and new directions in group signature schemes. In: Financial Cryptography (FC'99), LNCS 1648, pp. 196-211. Springer-Verlag, 1999.
[4] C. Boyd. Digital multisignatures. In: Cryptography and Coding, pp. 241-246. Oxford University Press, 1989.
[5] J. Camenisch. Efficient and generalized group signatures. In: Eurocrypt'97, LNCS 1233, pp. 465-479. Springer-Verlag, 1997.
[6] J. Camenisch and M. Stadler. Efficient group signature schemes for large groups. In: Crypto'97, LNCS 1294, pp. 410-424. Springer-Verlag, 1997.
[7] J. Camenisch. Group signature schemes and payment systems based on the discrete logarithm problem. Vol. 2 of ETH Series in Information Security and Cryptography, ISBN 3-89649-286-1, Hartung-Gorre Verlag, Konstanz, 1998.
[8] J. Camenisch and M. Michels. Separability and efficiency for generic group signature schemes. In: Crypto'99, LNCS 1666, pp. 413-430. Springer-Verlag, 1999.
[9] J. Camenisch and A. Lysyanskaya. Dynamic accumulators and application to efficient revocation of anonymous credentials. In: Crypto 2002, LNCS 2442, pp. 61-76. Springer-Verlag, 2002.
[10] D. Chaum and E. van Heyst. Group signatures. In: Eurocrypt'91, LNCS 547, pp. 257-265. Springer-Verlag, 1991.
[11] L. Chen and T. P. Pedersen. New group signature schemes. In: Eurocrypt'94, LNCS 950, pp. 171-181. Springer-Verlag, 1995.
[12] L. Chen and T. P. Pedersen. On the efficiency of group signatures providing information-theoretic anonymity. In: Eurocrypt'95, LNCS 921, pp. 39-49. Springer-Verlag, 1995.
[13] Y. Desmedt. Society and group oriented cryptography: a new concept. In: Crypto'87, LNCS 293, pp. 120-127. Springer-Verlag, 1988.
[14] Y. Desmedt and Y. Frankel. Threshold cryptosystems. In: Crypto'89, LNCS 435, pp. 307-315. Springer-Verlag, 1990.
[15] A. Fujioka, T. Okamoto, and K. Ohta. A practical digital multisignature scheme based on discrete logarithms. In: Auscrypt'92, LNCS 718, pp. 244-251. Springer-Verlag, 1992.
[16] R. Gennaro, S. Jarecki, H. Krawczyk, and T. Rabin. Robust threshold DSS signatures. In: Eurocrypt'96, LNCS 1070, pp. 354-371. Springer-Verlag, 1996.
[17] L. Harn. Group-oriented (t, n) threshold digital signature scheme and multisignature. IEE Proceedings - Computers and Digital Techniques, 1994, 141(5): 307-313.
[18] L. Harn. New digital signature scheme based on discrete logarithm. Electronics Letters, 1994, 30(5): 396-398.
[19] L. Harn and S. Yang. Group-oriented undeniable signature schemes without the assistance of a mutually trusted party. In: Auscrypt'92, LNCS 718, pp. 133-142. Springer-Verlag, 1993.
[20] P. Horster, M. Michels, and H. Petersen. Meta-multisignature schemes based on the discrete logarithm problem. In: Proc. of IFIP/SEC'95, pp. 128-141. Chapman & Hall, 1995.
[21] M. Joye, S. Kim, and N-Y. Lee. Cryptanalysis of two group signature schemes. In: Information Security (ISW'99), LNCS 1729, pp. 271-275. Springer-Verlag, 1999.
[22] M. Joye, N-Y. Lee, and T. Hwang. On the security of the Lee-Chang group signature scheme and its derivatives. In: Information Security (ISW'99), LNCS 1729, pp. 47-51. Springer-Verlag, 1999.
[23] H-J. Kim, J. I. Lim, and D. H. Lee. Efficient and secure member deletion in group signature schemes. In: Information Security and Cryptology (ICISC 2000), LNCS 2015, pp. 150-161. Springer-Verlag, 2001.
[24] S. K. Langford. Weaknesses in some threshold cryptosystems. In: Crypto'96, LNCS 1109, pp. 74-82. Springer-Verlag, 1996.
[25] C-M. Li, T. Hwang, and N-Y. Lee. Threshold-multisignature schemes where suspected forgery implies traceability of adversarial shareholders. In: Eurocrypt'94, LNCS 950, pp. 194-204. Springer-Verlag, 1995.
[26] C-M. Li, T. Hwang, N-Y. Lee, and J-J. Tsai. (t, n) threshold-multisignature schemes and generalized-multisignature scheme where suspected forgery implies traceability of adversarial shareholders. Cryptologia, July 2000, 24(3): 250-268.
[27] M. Michels and P. Horster. On the risk of disruption in several multiparty signature schemes. In: Asiacrypt'96, LNCS 1163, pp. 334-345. Springer-Verlag, 1996.
[28] T. Okamoto. A digital multisignature scheme using bijective public-key cryptosystems. ACM Transactions on Computer Systems, 1988, 6(8): 432-441.
[29] K. Ohta and T. Okamoto. A digital multisignature scheme based on the Fiat-Shamir scheme. In: Asiacrypt'91, LNCS 739, pp. 75-79. Springer-Verlag, 1991.
[30] C. Park and K. Kurosawa. New ElGamal type threshold digital signature scheme. IEICE Trans. Fundamentals, January 1996, E79-A(1): 86-93.
[31] H. Petersen. How to convert any digital signature scheme into a group signature scheme. In: Proc. of Security Protocols Workshop'97, LNCS 1361, pp. 67-78. Springer-Verlag, 1997.
[32] A. Shamir. How to share a secret. Communications of the ACM, 1979, 22(11): 612-613.
System Specification Based Network Modeling for Survivability Testing Simulation

HyungJong Kim

Korea Information Security Agency, 78, Garak-Dong, Songpa-Gu, Seoul, Korea
[email protected]
Abstract. As the structure and behavior of computer networks become complicated and unpredictable, it becomes difficult to test the survivability of a network. Modeling and simulation is a widely used approach to predicting the behavior of a system or set of systems. In this research, we apply a modeling methodology to construct a valid model of a computer network, focusing on its vulnerabilities, for survivability testing. To accomplish this aim, an appropriate modeling method should be defined. In particular, we take advantage of a system specification based modeling approach to construct a valid network model.

Keywords: Survivability, Vulnerability Analysis, System Specification, Simulation Model, DEVS-Formalism
1 Introduction
Modeling and simulation is a widely accepted tool for predicting near-future events and for understanding the current behavior of a complex system or environment [5,6]. In the computer and network security research area, there are related works that take advantage of modeling and simulation to solve security-related problems [1,2,7]. Survivability is a characteristic of a computer network that represents its degree of endurance against external attacks, internal faults, and unexpected accidents [3]. In order to increase the survivability of a computer network, the appropriate selection and management of security systems is essential, and an understanding of the computer network is a prerequisite for their sound selection and management. In testing the survivability of a network, the most direct method is to test the real network. But such a method has several shortcomings that make it unsuitable for testing the survivability of the information infrastructure. The representative shortcomings are as follows. First, it can cause damage or performance degradation in the information infrastructure, because the testing activities run on real networks. In particular, when
some essential services or systems in the information infrastructure are stopped, the infrastructure can be badly damaged. Second, it is impossible to test the survivability of networks that do not exist yet; there is also a need to test networks that will be constructed in the near future and networks whose designs will be altered. Third, since we can test the survivability of the network only in its current security-related state, it is difficult to assess the influence of a new defensive mechanism on the network. In particular, although the results differ according to the manager's experience and management period, this method cannot take such attributes into account. Simulation technology is appropriate for overcoming these limits. With simulation, it is possible to define precedence relations among events in the model, and to select various attacks based on the results of previous attacks. Also, there is no effect on the function and performance of the information infrastructure, and it becomes possible to evaluate networks that do not exist yet. Additionally, since attacks in a simulation can be generated along various time axes, it is possible to test survivability at various security levels and management periods. This paper shows the modeling of a computer network, focusing especially on its vulnerabilities, and presents models of the computer network for each system specification level. Through this research, we present a network modeling method that supports the construction of a valid computer network model and the testing of the survivability of a computer network. In the next chapter, the background knowledge related to this work is presented; in the third chapter, we show the main research content, explaining computer network modeling for survivability testing; in the fourth chapter, we show model execution examples with figures. In the last chapter, we draw our conclusions and discuss issues related to the research.
2 Background Knowledge

2.1 DEVS Formalism [5]
The DEVS formalism developed by Zeigler is a theoretical, well-grounded means of expressing hierarchical, modular discrete-event models. In DEVS, a system has a time base, inputs, states, outputs, and functions; the system functions determine next states and outputs based on the current states and input. In the formalism, a basic (atomic) model is defined by the structure M = <X, S, Y, δint, δext, λ, ta>, where X is an external input set, S is a sequential state set, Y is an external output set, δint is an internal transition function, δext is an external transition function, λ is an output function, and ta is a time advance function. A coupled model is defined by the structure DN = <D, {Mi}, {Ii}, {Zi,j}, select>, where D is a set of component names, Mi is a component basic model, Ii is the set of influencees of component i, Zi,j is an output translation function, and select is a tie-breaking function. Several atomic models can be coupled to build a more complex model, called a coupled model; a coupled model tells how to couple several models together to form a new model, and such a coupled model can itself be employed as a component in a larger coupled model.
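As a concrete rendering of the atomic-model structure M = <X, S, Y, δint, δext, λ, ta>, the following Python sketch (our own minimal illustration; it is not the Modsim III code used later in this paper) expresses the seven elements as plain callables:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class AtomicModel:
    """M = <X, S, Y, delta_int, delta_ext, lambda, ta> as plain callables."""
    state: Any
    delta_int: Callable[[Any], Any]                # internal transition function
    delta_ext: Callable[[Any, float, Any], Any]    # external transition (s, elapsed, x)
    out: Callable[[Any], Any]                      # output function (lambda)
    ta: Callable[[Any], float]                     # time advance function

    def internal_event(self):
        y = self.out(self.state)                   # output is emitted on internal events
        self.state = self.delta_int(self.state)
        return y

    def external_event(self, elapsed: float, x: Any):
        self.state = self.delta_ext(self.state, elapsed, x)
```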
2.2 Security Simulation Related Work
Fred Cohen [1] constructed a cause-effect model to simulate cyber attack and defense efforts. In that research, a network model is constructed and evaluated; when the simulation starts, the cause-effect model is applied to the network model and the process of attack and defense is simulated. The cause-effect model was designed for simulation and analysis, and it is based on a set of 37 classes of threats, 94 classes of attack mechanisms, and about 140 classes of protective mechanisms, inter-linked by a database that associates threats with attacks and attacks with defenses. In this simulation, users specify defender strength, strategy, and a number of simulation runs, and the success of an attack is determined by attacker luck and defender quality. At CMU/SEI [2], an attack-modeling method was suggested for information security and survivability. In that research, an attack tree is used to represent an event that could significantly harm the enterprise's mission, and attack patterns are defined to represent deliberate, malicious attacks that commonly occur in specific contexts. An attack pattern consists of an overall goal, a list of preconditions, the steps for carrying out the attack, and a list of postconditions that are true if the attack is successful. Related attack patterns are organized into an attack profile that contains a common reference model, a set of variants, a set of attack patterns, and a glossary of defined terms and phrases. When the attack patterns that have a set of variants are properly instantiated, such patterns are said to be applicable to the enterprise attack tree; instantiation substitutes domain-specific values for the variants.
3 Vulnerability Modeling Methodology for Survivability Testing

3.1 The Overall Structure of Simulation System
Fig. 1 shows the overall structure of the simulation system. It consists of a simulation component and a database component. The simulation component is composed of the EF (Experimental Frame), which is the survivability testing environment, and the target network model. The database component contains data used at simulation execution time; it consists of AttackDB and VulDB, which hold attack and vulnerability information, respectively. The database component is named VDBFS (Vulnerability DataBase For Simulator), and the ADBI (Abstracted DataBase Interface) is designed to interface the simulation component with VDBFS. One of the most important things in modeling and simulation research is to construct a valid model that reflects the characteristics of the real system well: when we execute a model whose validity is not guaranteed, the validity of the simulation results cannot be guaranteed either. There are several ways to construct a valid simulation model, and making use of a well-founded modeling methodology is widely recommended. In our simulation modeling, we take advantage of the system specification concept and the DEVS formalism to aid the construction of a valid model. Fig. 2 shows the whole scope of the process related to survivability testing simulation.
Fig. 1. The overall structure of simulation model

Fig. 2. Modeling and Simulation Process
The left side of Fig. 2 shows the simulation process, which consists of model design, model execution, and model analysis. In the model design process, models are extracted from the model base and the overall simulation model is constructed; the data in VDBFS is used to execute the model. During the execution process, data related to survivability testing is gathered, and it is used in the model analysis process. The processes on the right side of the cross shape are for the end user of the simulation system, while the other processes are the research scope of this paper. We have two main research topics: one is the construction of VDBFS, and the other is the design and implementation of the models in the model base. As already mentioned, the focus of this paper is how to construct a model of the target network that contains vulnerabilities based on its system configuration. The next section shows the network modeling for survivability testing using the system specification.

3.2 System Specification Based Network Modeling Focusing on Vulnerability
In this section, the network modeling for survivability testing is presented. To construct a valid network model, it is helpful to make use of a well-founded modeling methodology in the modeling process. We constructed our simulation model based on the DEVS formalism, a well-founded modeling methodology, and we consider the hierarchy of system specification in the modeling process. In order to utilize the system specification, we should analyze the computer network at multiple abstraction levels. Fig. 3 shows the hierarchy of system specifications. In the specification hierarchy, lower specifications represent behavior and higher-level specifications represent structure. In the hierarchy of Fig. 3, as the specification level increases, the amount of knowledge represented in the model increases, and the abstraction level decreases. In the following sections, we define a Node model and show a model example at each specification level.
Fig. 3. Hierarchy of the system specification
O Network Description
1. A network consists of nodes and links.
O Link Description
1. Links have a processing capacity, expressed as an amount of data per time unit.
O Nodes Description
(nodes)
1. Nodes can be categorized into network devices and hosts.
2. Nodes have a processing capacity, expressed as an amount of data per time unit.
(network devices and hosts)
3. Network devices have several services that should be offered for network operation.
4. Hosts have several services that should be offered to humans or other hosts.
(protocol)
5. Each protocol has its own communication mechanism that is used by services.
6. Each protocol may have vulnerabilities that can be exploited by malicious users.
(service)
7. Services that offer their facilities from remote places are bound to a specific network protocol and port.
8. Each service has facilities that should be performed to accomplish its purpose.
9. Each service has a processing capacity, which can be overflowed by excessive service requests.
10. Each service may have vulnerabilities that can be exploited by malicious users.
(vulnerability)
11. Remediations or fixes may exist for vulnerabilities.
12. A vulnerability is divisible into several vulnerability-units that cannot be divided further (this will be explained in section 3).
13. There are conditions that enable the exploitation of a vulnerability, such as host or service configurations, the network protocol, and so on.
14. When the exploitation of a vulnerability succeeds, the attacker obtains the consequence that he or she aimed at.

Fig. 4. Sixteen constraints for network modeling
3.2.1 System Description
Constraints for model construction are presented in Fig. 4. These constraints are collected from the viewpoint of vulnerability, and a more specific description of the system is given in the subsequent sections. As shown in Fig. 4, the constraints consist of three parts. The first part has one constraint that describes the elements of a network, and the second part describes the capacity of the links of the network. The constraints in these first two parts may seem too simplified to capture the features of a network. But we do not have to construct models of all components of the target system; rather, we consider the aim of the simulation when selecting the components to be modeled. Though these constraints are insufficient to describe all features of a network, they are sufficient to construct a network model focusing on vulnerability.
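Read as a whole, the constraints of Fig. 4 translate naturally into a configuration data model. A possible Python rendering (a sketch only; all class and field names are our own, not the simulator's) is:

```python
from dataclasses import dataclass, field

@dataclass
class Vulnerability:
    name: str
    conditions: list          # constraint 13: configurations enabling exploitation
    consequence: str          # constraint 14: what the attacker obtains
    remediations: list = field(default_factory=list)      # constraint 11

@dataclass
class Service:
    name: str
    protocol: str             # constraint 7: the protocol the service is bound to
    port: int                 # constraint 7: the port the service is bound to
    capacity: float           # constraint 9: requests per time unit
    vulnerabilities: list = field(default_factory=list)   # constraint 10

@dataclass
class Node:
    name: str
    kind: str                 # "host" or "network device" (nodes constraint 1)
    capacity: float           # nodes constraint 2: data per time unit
    services: list = field(default_factory=list)          # constraints 3-4
```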
The third part of Fig. 4 shows the constraints related to the nodes in the network. These constraints reveal the focus of the model construction: the vulnerabilities of the nodes originate from the protocols and services in the nodes, and the vulnerabilities are composed of atomic vulnerabilities that cannot be divided further. These features of vulnerability are described later. In the following sections, the modeling of the network is presented for each level of system specification, from low to high. As we show the network modeling method at each level, our computer network modeling method for survivability testing simulation emerges. These model definitions can also be used to construct DEVS models directly; the relation between system specification and the DEVS formalism is described well in [5].

3.2.2 I/O Frame Specification Level
At this specification level, just the input and output interface of a system are defined, as an input set and an output set. Considering the constraints of Fig. 4, we see that there are attack inputs inserted by the actions of an attacker and reaction outputs of the system. At this level, we only observe the input and output of the system; we do not consider the relation, function, state, or structure of the system. The node model definition at this level is as follows.

Nodes = {T, X, Y}, where
  T : the time base;
  X = {x1, x2, x3, ..., xn}, xi : an attacker's action input;
  Y = {y1, y2, y3, ..., ym}, yi : a reaction output of the nodes.

At this specification level, the Nodes model is just a black box that has attack inputs and reaction outputs. The input set X and output set Y are abstract sets; since no relation between input and output is specified, an element xi of X is not associated with any particular output yj of Y.

Example
X = {SynFloodingInput, BufferOverflowInput, FormatStringInput, RaceConditionExploitInput, UnexpectedFileAccessInput, RootShellCreationInput}
T = attack generation times and output generation times.
Y = {Denial of Service, Gain User Access, Gain Root Access, Gain Information}

3.2.3 I/O Relation Specification Level
In the I/O relation specification, the relation between input and output is considered in the model construction. The definition of the node model is as follows.

Nodes = {T, X, Ω, Y, R}, where
  Ω : the set of allowable attack input segments, where Ω ⊆ (X, T) and a segment ω ∈ (X, T) is an input segment of Nodes;
  R : the I/O relation, where R ⊆ Ω × (Y, T).

In the definition, R defines the relation between input and output. An example of an I/O relation level model is as follows.

Example
X, T, Y as above.
Ω = {Ω1, Ω2, Ω3, ..., Ωn} : the attack input segment set.
Ω1 = {(SynFloodingInput, t0), (SynFloodingInput, t1), (SynFloodingInput, t2), ..., (SynFloodingInput, tn)}
Ω2 = {(BufferOverflowInput, tj)}
R = { Ω1 × (Denial of Service, tn), Ω2 × (Gain User Access, tj), Ω2 × (Gain Root Access, tj) }

3.2.4 I/O Function Specification Level
At the I/O relation specification level, there are ambiguous cases in which the system can generate different outputs even though the same input is inserted, because only the relation between input and output is considered. When we instead consider a set of functions that represent the relation, this ambiguity is resolved. The definition that considers the I/O function is as follows.

Nodes = {T, X, Ω, Y, F}, where
  F : the set of I/O functions, where ƒ ∈ F ⇒ ƒ ⊆ Ω × (Y, T).

Example
X, T, Y, Ω1, Ω2 as above.
ƒ1(Ω1) ⇒ (Denial of Service, tn)
ƒ2(Ω2) ⇒ (Gain User Access, tj)
ƒ3(Ω2) ⇒ (Gain Root Access, tj)

The selection of the proper function is based on the initial state of the nodes. The initial state is the configuration information of the nodes: when the node model receives an input segment, it considers its initial state and selects the proper function. Our example
also shows this: if the service of the target system is executed with user privilege, then ƒ2 will be selected; if the service of the target system is executed with root privilege, ƒ3 will be selected.

3.2.5 I/O System Specification Level
At the I/O function level, we only consider the initial state of the system; the intermediate and final states of the system cannot be expressed. At this level, we consider the state set and the state transition function in order to model the interior of the Nodes. The node definition at the I/O system specification level is as follows.

Nodes = {T, X, Ω, Y, Q, ∆, Λ}, where
  Q is the set of Node states;
  ∆ : Q × Ω → Q is the state transition function;
  Λ : Q × X → Y (or Λ : Q → Y) is the output function.

Example
X, T, Y as above.
Q = {Normal, Warning, Consequence}

Case 1.
∆ : Normal × (SynFloodingInput, t0) → Normal
    Normal × (SynFloodingInput, t1) → Normal
    ...
    Normal × (SynFloodingInput, tj) → Warning
    ...
    Warning × (SynFloodingInput, tn) → Consequence
Λ : Consequence → Denial of Service

Case 2.
∆ : Normal × (UnexpectedFileAccessInput, t0) → Warning
    Warning × (RootShellCreationInput, t1) → Consequence
Λ : Consequence → Gain Root Access

The two cases above show the three states representing the security-related status of the Nodes. First, Normal represents that the node has suffered no bad effect from malicious outside input. Second, Warning represents that subsequent malicious inputs may cause a bad effect on the node. Third, Consequence represents that the Nodes has been exploited by malicious inputs from the outside world. Case 1 shows a DoS (denial of service) consequence caused by flooding input, and Case 2 is a system-parameter based vulnerability exploitation. In the DoS example, there is an input sequence during a specific time quantum, causing the state transitions from Normal to Warning and from Warning to Consequence; at the Consequence state, the output is generated by the Nodes model. In Case 2, as the UnexpectedFileAccessInput is inserted, the state transition from Normal to
Warning occurs, and the insertion of the RootShellCreationInput causes the state transition from Warning to Consequence; at the Consequence state, the Gain Root Access output is generated.

3.2.6 Structured and Multi-component System Specification Level
At the structured system specification level, we should consider how the state set and transition function are realized. For example, at the I/O system level we defined three states, Normal, Warning, and Consequence, but each state can be defined using more primitive states, and the state transitions related to the primitive states can be defined. When we consider the states and state transitions at this lower level, we can define a model at the structured system specification level. At the multi-component system specification level, we should additionally consider the system's components, each of which has its own I/O, states, and state transitions. In this research, to construct a model that reflects the interior status related to vulnerability, the vulnerability model is designed and inserted as a component of the Nodes, and the states and state transitions are defined based on the vulnerability component. In this design of the vulnerability model, we define two concepts: AV (Atomic Vulnerability) and CV (Compound Vulnerability) [9]. An AV is a vulnerability that cannot be divided further, and a CV consists of a set of AVs. We can define our Nodes model using the AV and CV concepts as follows.

Nodes = {T, X, Y, Ω, D, Λ}, where D is a set of references to CVs.

The Nodes model has the component D, which contains a set of CV references. The component D represents all vulnerabilities in the node, and it determines the dynamics of the Nodes. The Nodes definition has the output function Λ, and the Nodes model generates its output at the Consequence state. The important difference between the I/O system specification level and this level is that the state and its transitions are determined by the components referenced by D in the Nodes definition. In our case, the component D of Nodes refers to the CVs, which are represented by AVs and by the operators used to define the relations among AVs. The definitions of CVs and AVs are as follows:

Compound Vulnerability: CV = {Icv, Qcv, δcv, WSX, VX}, where
  Icv = {Icv1, Icv2, …, Icvn};
  Qcv = {Normal, Intermediate, Warning, Consequence};
  δcv : Icv × Qcv → Qcv;
  WSX : warning state vulnerability expression;
  VX : vulnerability expression.
Fig. 5. State and state transition of CV
Table 1. Logical operators for VX

AND Relation (AND): represents a vulnerability exploited if both AVs are true.
OR Relation (OR): represents a vulnerability exploited if either or both AVs are true.
Probabilistic OR Relation (POR): represents a vulnerability exploited if either or both AVs are true, where each AV has a weight value (from 0 to 1) that accounts for the vulnerability of the target system to that AV.
Sequential AND Relation (SAND): represents a vulnerability exploited if the AV at the front is true and then the other AV becomes true, sequentially.
In the definition of CV, Icv is a set of attack input sequences; it corresponds to the external inputs (X) of the Nodes model. Qcv has four essential states that are meaningful in the simulation. The Normal state is the state in which the target system is waiting for input packets. When the target system is under attack, the system's state is Intermediate. The Warning state means that the probability of an exploitation occurring is beyond a specific level, and the system can transit to an abnormal state by a single further attack input. The Consequence state is a goal state, which means the target system has been exploited by the attacker. δcv is the state transition function, and each state transition is defined as shown in Fig. 5. A CV is represented by a logical composition of AVs; VX holds this expression. An expression is composed of AVs and the four binary logical operators, and we evaluate the expression by calling the AV objects. If the expression is evaluated as TRUE, the vulnerability has been exploited by the attack action sequence, and a state transition to the compromised state occurs in the model. WSX is the warning state vulnerability expression; its syntax is the same as VX's. If this expression is TRUE, a state transition to the Warning state occurs.
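The four operators of Table 1 can be realized as a small expression evaluator over AV exploit records. The sketch below is our own illustration (the simulator's actual representation is not shown in the paper); for simplicity, SAND operands are assumed to be AV leaves:

```python
import random

def eval_vx(expr, exploited):
    """Evaluate a VX expression tree.
    `exploited` maps an AV name to its exploit time (absent if not exploited).
    `expr` is an AV name or a tuple (op, left, right) / (op, left, right, weights)."""
    if isinstance(expr, str):                      # leaf: an atomic vulnerability
        return expr in exploited
    op = expr[0]
    if op == "AND":                                # both AVs must hold
        return eval_vx(expr[1], exploited) and eval_vx(expr[2], exploited)
    if op == "OR":                                 # either or both
        return eval_vx(expr[1], exploited) or eval_vx(expr[2], exploited)
    if op == "POR":                                # weighted (probabilistic) OR
        w1, w2 = expr[3]
        return (eval_vx(expr[1], exploited) and random.random() < w1) or \
               (eval_vx(expr[2], exploited) and random.random() < w2)
    if op == "SAND":                               # left exploited before right
        t1, t2 = exploited.get(expr[1]), exploited.get(expr[2])
        return t1 is not None and t2 is not None and t1 < t2
    raise ValueError(f"unknown operator {op}")

# Case-2 style CV of section 3.2.5: file access, then root shell creation.
vx = ("SAND", "UnexpectedFileAccess", "RootShellCreation")
print(eval_vx(vx, {"UnexpectedFileAccess": 1.0, "RootShellCreation": 2.5}))  # True
```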
Atomic Vulnerability: AV = {Iav, Qav, δav, Type, Category} where
    Iav = {Iav1, Iav2, …, Iavn}
    Qav = Q(initial state) ∪ Q(final state)
    δav : Iav × Q(initial state) → Q(final state)
    Type : {Fact, NonProb, Prob}
    Category : {Generic, Application-Specific, System-Specific}

In the definition of AV, Qav is a set of states. Q(initial state) is a subset of Qav and contains the special states NONE and ANY; Q(final state) is a subset of Qav and contains the special state NONE. Iav is a set of attack input sequences to the AV, playing the same role as Icv. δav is a state transition function. The identification of the states and attack inputs of an AV depends on the abstraction level required by the intended application; the abstraction level of this research follows the aim of the modeling and simulation. We construct our simulation model at the conceptual level of Ye's process control approach [7], which describes the security-critical states and state transitions of an entity. Type and Category are the bases for classifying an AV. An AV has one of three types: Fact, NonProb, or Prob. A Fact AV has no input (NONE) and no state (NONE); therefore a Fact AV's transition function is δav(NONE, NONE) = NONE. This type captures the corresponding AV's origin. The NonProb and Prob types indicate whether the AV is exploited deterministically or probabilistically. Category is Generic, Application-Specific (for a specific application), or System-Specific (for a specific OS or hardware).
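To make these definitions concrete, the following is a minimal Java sketch of the AV/CV machinery; the paper itself gives no implementation, so all class and method names (AV, CV, Expr, onAttackInput) are hypothetical, and only the AND and POR operators of Table 1 are shown.

    import java.util.Random;

    // Hypothetical sketch: an AV is a leaf that is either exploited or not; a CV
    // combines AVs through the operators of Table 1 and follows Fig. 5.
    class AV {
        boolean exploited;   // set by the simulator when a matching attack input arrives
        double weight;       // POR weight in [0, 1]
    }

    class CV {
        enum State { NORMAL, INTERMEDIATE, WARNING, CONSEQUENCE }
        State state = State.NORMAL;

        interface Expr { boolean eval(); }                      // VX / WSX trees
        static Expr av(AV a) { return () -> a.exploited; }
        static Expr and(Expr l, Expr r) { return () -> l.eval() && r.eval(); }
        static Expr por(AV a, AV b, Random rnd) {               // probabilistic OR
            return () -> (a.exploited && rnd.nextDouble() < a.weight)
                      || (b.exploited && rnd.nextDouble() < b.weight);
        }

        Expr wsx, vx;

        // A simplified delta_cv: attack inputs drive Normal -> Intermediate,
        // WSX raises Warning, and VX reaches Consequence (Gain Root Access).
        void onAttackInput() {
            if (state == State.NORMAL) state = State.INTERMEDIATE;
            if (state == State.INTERMEDIATE && wsx.eval()) state = State.WARNING;
            if (vx.eval()) state = State.CONSEQUENCE;
        }
    }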
Fig. 6. Sample Network model in Network of System specification Level
3.2.7 Network of System Specification Level
As mentioned in 3.2.1, the Network consists of Nodes and Links. The network of systems specification level defines the relations between Nodes and Links. Each relation is a coupling between a Node and a Link, and the couplings create the paths from Nodes to Links and from Links to Nodes. Such coupling relations can also exist between models expressed at this level and models expressed at higher levels, which makes it possible to construct hierarchical simulation models. Fig. 6 shows an example of a network model that is constructed hierarchically; a sketch of the coupling idea follows.
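As an illustration only (the simulator itself is written in Modsim III), the coupling relation can be sketched in Java as a table that maps a component's output port to another component's input port; all names here are hypothetical.

    import java.util.HashMap;
    import java.util.Map;

    class CoupledNetwork {
        record Port(String component, String port) {}
        private final Map<Port, Port> coupling = new HashMap<>();

        // Register a path such as Node1.out1 -> LinkA.in1 or LinkA.outA -> Hub.inA.
        void couple(String srcComp, String srcPort, String dstComp, String dstPort) {
            coupling.put(new Port(srcComp, srcPort), new Port(dstComp, dstPort));
        }

        // Follow one hop of the coupling; a sub-network can register its own
        // external ports here, which is what makes the model hierarchical.
        Port route(String component, String outputPort) {
            return coupling.get(new Port(component, outputPort));
        }
    }

For instance, following Fig. 6, couple("Node1", "out1", "LinkA", "in1") and couple("LinkA", "outA", "Hub", "inA") chain a node to the hub through a link.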
4
Simulation Execution
To experiment with the models described above, we constructed a simulation system using Modsim III, a general-purpose simulation language, on the Windows platform. The system enables users to edit the network topology in a drag-and-drop manner and to configure the information of each node. Each node's vulnerabilities are extracted from the VDBFS (Vulnerability DataBase For Simulator) according to its system information, and users can apply packet-filter and proxy models in the firewall.
Fig. 7. Simulation run for testing of network model
Fig. 8. Attack path in sample network
Fig. 7 shows an execution of the system with a sample network model loaded. When network editing and configuration are finished, the system extracts the vulnerabilities from the VDBFS based on the configuration and creates the AV and CV components in the node models. After the vulnerabilities are extracted, the attack simulation can be executed. While the simulation is running, the simulator gathers information from the models to test the survivability of the network. In Fig. 7, the attacker system and the attack generation appear as a red-circled moving icon. After the attack simulation ends, the attack paths appear as red arrows in Fig. 8, and a right mouse click shows the vulnerability information that caused the creation of each attack path, as shown in Fig. 9 (in Fig. 9, the node icons have been moved so that the arrows do not overlap). Through the simulation execution we can find the attack paths created by the exploitation of vulnerabilities; this attack path information is essential for survivability testing. As shown in Fig. 9, each attack path lists the exploited vulnerabilities, and the user can see the details of each vulnerability. In particular, Fig. 10 shows the atomic vulnerabilities of each vulnerability, with the AV name, the external input, and the initial and final states; through this information we can trace the state transitions of the victim system. The execution result can also be illustrated as a timing diagram showing the event sequences and state transitions of the victim system model, as in Fig. 11. The states of the Node model are represented by the set of the CVs' transited states, and each state transition is annotated with its Vulnerability Expression (VX); when the VX evaluates to TRUE, the state transition occurs. Also, as shown in Fig. 11,
two of the attack inputs have no effect on the state transitions of the Node model, because their VX does not evaluate to TRUE. Fig. 11 thus shows that the state of the Node model is expressed by the set of states of the exploited CVs, and that each state transition is determined by the evaluation result of the VX.
Fig. 9. Vulnerability information in attack path
Fig. 10. Atomic vulnerability information for selected vulnerability
5
Discussion and Conclusion
This paper presents a network modeling approach that makes it possible to test the survivability of a computer network using simulation technology, together with network-modeling approaches for each system specification level. In particular, the atomic vulnerability (AV), compound vulnerability (CV), and vulnerability expression (VX) concepts are applied at the structured and multi-component system specification levels. In modeling and simulation studies the validity of the model is very important. In
order to construct valid models, it is helpful to take advantage of the system specification based modeling approach, because it enables users to consider the constraints of the domain at multiple abstraction levels. Future work will extend the simulation system into a survivability evaluation tool for information infrastructures. To achieve this, several tasks remain. The first is to gather a sufficient amount of vulnerability information in the VDBFS. The second is to test the system in diverse network environments and to compare the simulation results with real-world phenomena. The last is to develop survivability indexes and a methodology for extracting them.
Fig. 11. Timing Diagram of Node Model
Acknowledgements
We thank KyungHee Koh, DongHoon Shin, DongHyun Kim, Dr. HongGeun Kim, and Professor TaeHo Cho for their technical advice, comments, and suggestions. This work was supported in part by the IITA (Institute of Information Technology Assessment) of Korea.
References
[1] F. Cohen, "Simulating Cyber Attacks, Defences, and Consequences," Computers & Security, Vol. 18, pp. 479-518, 1999.
[2] A. P. Moore, R. J. Ellison and R. C. Linger, "Attack Modeling for Information Security and Survivability," Technical Report No. CMU/SEI-2001-TR-001, Pittsburgh, PA: Software Engineering Institute, Carnegie Mellon University, March 2001.
[3] N. R. Mead, R. J. Ellison, R. C. Linger, T. Longstaff, and J. McHugh, "Survivable Network Analysis Method," Technical Report No. CMU/SEI-2000-TR-013, Pittsburgh, PA: Software Engineering Institute, Carnegie Mellon University, March 2000.
[4] M. Bishop, "Vulnerabilities Analysis," Proceedings of the Recent Advances in Intrusion Detection, pp. 125-136, September 1999.
[5] B. P. Zeigler, H. Praehofer and T. G. Kim, Theory of Modeling and Simulation, Second Edition, Academic Press, 2000.
[6] A. M. Law and W. D. Kelton, Simulation Modeling and Analysis, Third Edition, McGraw-Hill, 2000.
[7] N. Ye and J. Giordano, "CACA - A Process Control Approach to Cyber Attack Detection," Communications of the ACM, Vol. 44(8), pp. 76-82, 2001.
[8] TaeHo Cho and HyungJong Kim, "DEVS Simulation of Distributed Intrusion Detection System," Transactions of the Society for Computer Simulation International, Vol. 18, No. 3, pp. 133-146, September 2001.
[9] HyungJong Kim, KyoungHee Ko, DongHoon Shin and HongGeun Kim, "Vulnerability Assessment Simulation for Information Infrastructure Protection," Proceedings of the Infrastructure Security Conference 2002, LNCS Vol. 2437, pp. 145-161, October 1-3, 2002.
[10] HyungJong Kim, HongGeun Kim and TaeHo Cho, "Simulation Model Design of Computer Network for Vulnerability Assessment," Proceedings of the International Workshop on Information Security Applications, pp. 203-217, September 13-14, 2001.
A Risk-Sensitive Intrusion Detection Model

Hai Jin, Jianhua Sun, Hao Chen, and Zongfen Han

Internet and Cluster Computing Center, Huazhong University of Science and Technology, Wuhan, 430074, China
[email protected]
Abstract. Intrusion detection systems (IDSs) must meet the security goals while minimizing the risks of wrong detections. In this paper, we study the issue of building a risk-sensitive intrusion detection model. To determine whether a system call sequence is normal or not, we consider not only the probability of the sequence belonging to the set of normal sequences or to the set of intrusion sequences, but also the risk of a false detection. We define a risk model that formulates the expected risk of an intrusion detection decision, and present risk-sensitive machine learning techniques that produce a detection model minimizing the risks of false negatives and false positives. This model is a hybrid that combines misuse intrusion detection and anomaly intrusion detection, and several techniques are applied to extend it to achieve satisfactory performance.
1
Introduction
There are two well-known kinds of intrusion detection systems: misuse intrusion detection systems and anomaly intrusion detection systems. A misuse intrusion detection (also called knowledge-based intrusion detection) system stores signature patterns of known intrusions, compares a behavior with these signature patterns, and signals an intrusion when there is a match. An anomaly intrusion detection system maintains profiles of users' normal behavior and signals an intrusion when an observed behavior differs greatly from the normal profiles. A misuse intrusion detection system is efficient and accurate in detecting known intrusions, but cannot detect novel intrusions whose signature patterns are unknown [25]. An anomaly intrusion detection system can detect both novel and known attacks, but its false alarm rate is high. Hence, misuse intrusion detection and anomaly intrusion detection are often used together to complement each other. Many different approaches and techniques have been applied to anomaly intrusion detection. [9] uses a neural network to model normal data. Lee et al. apply data mining programs to system audit data to learn rules that accurately capture the behaviors of intrusions and normal activities [19, 20]. Fuzzy theory has also been applied to intrusion detection: [7] generates fuzzy association rules from new audit data to detect whether an intrusion occurs or not. In [5], the fuzzy intrusion
This paper is supported by Key Nature Science Foundation of Hubei Province under grant 2001ABA001.
recognition engine (FIRE) uses fuzzy logic to assess whether malicious activity is taking place on a network. Bridges et al. apply fuzzy data mining techniques to the anomaly-based components [3]. Based on the belief that legitimate users can be classified into categories according to the percentage of commands they use, [21] applies a variety of techniques, such as K-means and learning vector quantization, to develop a hybrid anomaly detection methodology for intrusion detection. [22] records system calls from daemon processes and setuid programs for anomaly detection. In [4], a network-based intrusion detection system named Macroscope uses bottleneck verification (BV) to detect user-to-superuser attacks. [24] gives a comparison of anomaly detection techniques and draws the conclusion that using short sequences of system calls matters more than the particular method of analysis; attention should be paid to which data streams are the most effective to monitor. In [25], various probabilistic techniques are applied to intrusion detection. These studies show that the frequency property of multiple audit event types in a sequence of events, rather than a single audit event, is necessary for intrusion detection. The ordering property of multiple audit events provides an additional advantage over the frequency property, but because of the scalability problem of the ordering property, the frequency property is a viable solution for intrusion detection. Stephanie Forrest presents an approach for modeling normal sequences using look-ahead pairs [8] and contiguous sequences [11]. Lane [13, 14, 15] examines unlabeled data for anomaly detection by comparing the sequences of a user's actions during an intrusion to the user's normal profile. [23] states that Bayesian methods can present evidence of intrusion as probabilities, which are easy for human fraud investigators to interpret. Bayesian decision making gives us a method to obtain the least risk when classifying system call sequences into a normal data set or an anomalous data set, and is therefore used in our intrusion detection model. The rest of this paper is organized as follows. In Section 2, we discuss the research background, introduce the risk-sensitive intrusion detection model based on Bayesian decision theory, and extend this model by using a similarity measure. In Section 3, we describe how the system call sequence databases are generated and evaluate our intrusion detection model using them. Section 4 ends with a conclusion.
2
A Risk-Sensitive Intrusion Detection Model
The goal of an IDS is to detect an intrusion when it happens, respond to it, and keep security staff from being disturbed by false alarms. A false negative occurs when an intrusion really happens but the IDS does not catch it. A false positive is a situation where an abnormality defined by the IDS happens, but it does not turn out to be a real intrusion. Hence, low false negatives and low false positives are the goals of an IDS. General IDSs often ignore the risks of false negatives and false positives. To minimize these risks,
we build an intrusion detection model based on Bayesian decision theorem and a similarity measure.
2.1
Research Background
In the Linux operating system, a program consists of a number of system calls, and different processes have different system call sequences. Because processes are coded differently, there are differences in the order and in the frequency of the system calls they invoke [22]. The specificity of the order and frequency of system calls therefore provides a clear separation between different kinds of processes. Experiments in [8] show that short sequences of system calls generate a stable signature for normal behavior and that the short-range ordering of system calls is remarkably consistent, which suggests a simple model of normal behavior. The basic idea of our model is similar to [8]. During an intrusion, system calls are invoked in a manner different from normal usage, and the resulting intrusion system call sequences have their own special characteristics in order and frequency. We build two profile databases: one, called the NSCS database, contains normal system call sequences, and the other, called the ISCS database, contains intrusion system call sequences. Misuse intrusion detection can be achieved on the basis of the ISCS database, and, relying on the NSCS database, anomaly intrusion detection can be realized. These two detection submodels can work independently; through Bayesian decision theorem, misuse intrusion detection and anomaly intrusion detection are combined. The details of this hybrid model are shown in the following sections.
2.2
Bayesian Decision Theorem
Risks of misrecognition occur everywhere in our lives. For example, different mistakes in diagnosis can cause distinct risks. In medical treatment, false positives and false negatives occur in physical examinations. A positive test for AIDS or cancer, when the person is disease free, is a false positive: the person suffers psychologically from the outcome that he has a disease when he actually does not. A false negative is when there actually is a disease but the results come back as negative: a finding of no cancer, when there actually is cancer, will devastate the patient, because he does not get the timely treatment that he needs. Obviously, the results produced by these two erroneous diagnostic decisions cause distinctly different harm. Such false results cannot be completely eliminated, but they can be reduced. Bayesian decision theorem [2] is applied to our risk-sensitive model to account for the different losses caused by the various kinds of mistakes and to offer an outcome that minimizes losses and risks. A decision is more commonly called an action in the literature. A particular action is denoted by α, while the set of all possible actions under consideration is denoted by Φ. Φ is defined as:

    Φ = {α1, α2, ..., αc}
(1)
Table 1. General Form of a Decision Table

          w1           w2           ...   wj           ...   wc
    α1    λ(α1, w1)    λ(α1, w2)    ...   λ(α1, wj)    ...   λ(α1, wc)
    α2    λ(α2, w1)    λ(α2, w2)    ...   λ(α2, wj)    ...   λ(α2, wc)
    ...   ...          ...          ...   ...          ...   ...
    αi    λ(αi, w1)    λ(αi, w2)    ...   λ(αi, wj)    ...   λ(αi, wc)
    ...   ...          ...          ...   ...          ...   ...
    αc    λ(αc, w1)    λ(αc, w2)    ...   λ(αc, wj)    ...   λ(αc, wc)
Each element in Φ can incur some loss, which is often a function of the decision and the state of nature; the decision table denotes this relationship, and Table 1 gives its general form. In Table 1, wi is the i-th state of nature, αj is the j-th action, and λ(αj, wi) is the risk function associated with αj and wi. The quantity w which affects the decision process is commonly called the state of nature, and in making a decision it is clearly important to consider what the possible states of nature are. The symbol Ω denotes the set of all possible states of nature:

    Ω = {w1, w2, ..., wc}    (2)

In this model, c equals 2: w1 denotes normal and w2 denotes intrusion. Accordingly, α1 means that the sequence is normal and can be passed over, and α2 means that a signal of intrusion is emitted and an action responds to the signal. A random variable is denoted by X, and a particular realization of X is denoted by x, where x = x1, x2, ..., xn (xi ∈ system calls) means a sequence of system calls x1 → x2 → ... → xn, such as fstat64→mmap2→read→close→munmap→rt_sigprocmask. Each x is classified into a normal sequences set or an intrusion sequences set. In decision theory, a key element is the risk function. If a particular action αi is taken and wj (i, j = 1, 2, ..., c) turns out to be the true state of nature, then a risk λ(αi, wj) is incurred. λ(α1, w2) is the risk incurred because the sequence is ignored while it turns out to be an intrusion; λ(α2, w1) is the risk incurred because a signal of intrusion is emitted while the sequence turns out to be normal. Clearly, λ(α1, w2) is much larger than λ(α2, w1). The expected conditional risk R(αi | x) can be obtained from the following formula:

    R(αi | x) = E[λ(αi, wj)] = Σ_{j=1}^{c} λ(αi, wj) P(wj | x), i = 1, 2, ..., c    (3)

where P(wj | x) is the conditional probability of wj for a given x, obtained through Bayes' theorem:

    P(wi | x) = p(x | wi) P(wi) / Σ_{j=1}^{c} p(x | wj) P(wj), i = 1, ..., c    (4)
where the prior probabilities P(wi) are assumed known. Among R(α1 | x), R(α2 | x), ..., R(αc | x), the optimal decision is αk, obtained from:

    R(αk | x) = min_{i=1,...,c} R(αi | x)    (5)
In our model we simply compare R(α1 | x) and R(α2 | x) and choose the action that brings less risk to the system. That is the formalized Bayesian view of optimal decision making. System call sequences of the normal state and of the intrusion state have their own special patterns and can be used as signatures for normality determination and intrusion detection. The signature definition must be accurate enough to distinguish the normal state from the intrusion state, but too narrow a definition will result in many false positives. How do we set the thresholds so that we can detect real intrusions and avoid false alarms? Setting the thresholds too low will obviously lead to a flood of nuisance false alarms; setting them too high may cause some attacks to be missed, which endangers the system. One solution is to set a narrow definition for signatures and apply a similarity measure to the model to avoid the demerits incurred by a narrow definition.
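The decision rule of formulas (3)-(5) is mechanical enough to state as code. The following Java sketch (our variable names, not the paper's) computes the posterior with (4), the expected conditional risks with (3), and returns the index of the minimum-risk action as in (5):

    class RiskDecision {
        // lambda[i][j] = loss of taking action alpha_i when w_j is the true state
        // (w_1 = normal, w_2 = intrusion); typically lambda[0][1] >> lambda[1][0].
        static int decide(double[][] lambda, double[] prior, double[] pxGivenW) {
            int c = prior.length;
            double evidence = 0;                       // denominator of (4)
            for (int j = 0; j < c; j++) evidence += pxGivenW[j] * prior[j];

            int best = 0;
            double bestRisk = Double.MAX_VALUE;
            for (int i = 0; i < c; i++) {
                double risk = 0;                       // formula (3)
                for (int j = 0; j < c; j++) {
                    double posterior = pxGivenW[j] * prior[j] / evidence;  // (4)
                    risk += lambda[i][j] * posterior;
                }
                if (risk < bestRisk) { bestRisk = risk; best = i; }        // (5)
            }
            return best;   // 0: treat x as normal, 1: raise an intrusion alarm
        }
    }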
2.3
Extended Model
System calls of normal processes are regular, can be modeled effectively, and are significantly different from those of intrusion traces. For example, an intrusion may exploit bugs in a program to obtain a root shell; since this never happens in normal processes, the system call traces are significantly different [6]. Intrusions belonging to the same intrusion category have identical or similar attack principles and intrusion techniques, so they have identical or similar system call sequences that differ significantly from normal system call sequences. To avoid too narrow a definition of normal signatures and intrusion signatures, we apply a similarity measure to this model. The similarity measure we use is similar to [13]; it differs in that we compare system call sequences while [13] compares command sequences. The set of normal system call sequences is denoted by Ψ1, and the set of intrusion system call sequences is denoted by Ψ2. Once Ψ1 and Ψ2 are formed, we compare an incoming sequence to the sequences in Ψ1 and Ψ2 and calculate its similarity value with each of the two sets. If the two similarity values have a wide gap, we directly classify the sequence into Ψ1 or Ψ2. For example, if an observed sequence x has a similarity value of 0.8 with Ψ1 and 0.2 with Ψ2, x is classified into Ψ1. Otherwise, if the two similarity values differ little, we use the Bayesian decision theorem to decide whether the sequence is normal or not. The similarity measure simply assigns a score equal to the number of identical tokens found in the same locations of the two sequences, and assigns a higher score to adjacent identical tokens than to separated identical tokens.
We define the similarity of an observed sequence x to a set of sequences Ψi as:

    Sim(x, Ψi) = max_{seq∈Ψi} Sim(x, seq), i = 1, ..., c
(6)
and we denote by y the sequence in Ψi that is most similar to x:

    Sim(x, y) = max_{seq∈Ψi} Sim(x, seq), i = 1, ..., c
(7)
We add this factor Sim to our risk-sensitive model and obtain the "artificial" conditional probability of x, p(x | wi), for a given wi via p(y | wi) and Sim(x, y):

    p(x | wi) = Sim(x, y) · p(y | wi) · R, i = 1, ..., c
(8)
where R is an adjustment factor that keeps p(x | wi) between 0 and 1. In experiments it is easy to obtain complete normal traces; however, due to our limited knowledge of intrusions, we can only obtain the traces of known intrusions. In order to detect novel intrusions, we use this similarity measure to extend the known intrusion traces toward novel intrusion traces. This solution eliminates the flaws of a narrow definition of intrusion signatures and enhances detection performance. Experiments show that the performance of the extended model is better than that of the original model.
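The paper does not spell the scoring function out, so the following Java sketch is only one plausible reading of it: positionally identical tokens score, consecutive matches accumulate a run bonus so that adjacent identical tokens outweigh separated ones, and the similarity to a set is the maximum over its members as in (6). All names are ours.

    import java.util.List;

    class Similarity {
        static double sim(String[] x, String[] y) {
            int n = Math.min(x.length, y.length);
            if (n == 0) return 0;
            double score = 0, run = 0;
            for (int k = 0; k < n; k++) {
                if (x[k].equals(y[k])) { run += 1; score += run; }  // runs weigh more
                else run = 0;
            }
            return score / (n * (n + 1) / 2.0);   // normalize into [0, 1]
        }

        static double simToSet(String[] x, List<String[]> psi) {    // formula (6)
            double best = 0;
            for (String[] seq : psi) best = Math.max(best, sim(x, seq));
            return best;
        }
    }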
3
Experiment
We conducted this experiment on the privileged process Sendmail for three reasons. The first is that Sendmail is widely used and often becomes the target of hackers. The second is that Sendmail provides various services, has relatively many vulnerabilities, and tends to be easy to subvert. The last is that Sendmail runs with root privilege: because root processes have access to more parts of the system, attackers aim at Sendmail to gain root privilege. Privileged processes obviously deserve more attention, so we conducted this experiment on Sendmail. Sendmail was running on a cluster with the Linux operating system at the Internet and Cluster Computing Center (ICCC) at Huazhong University of Science and Technology (HUST), and Strace 4.0 was used to trace processes.
3.1
Construction of Sequences Databases
In this experiment, two kinds of databases had to be built: the NSCS database and the ISCS database. The implementation of the NSCS database followed the method of Forrest, as described in [8]; it differed in that we added the frequency of each sequence into the database. A sketch of the construction follows.
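Under the assumption (ours) that the databases are simple frequency maps, the construction can be sketched in Java: slide a window of length k over the strace-derived call stream and count each distinct window.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    class SequenceDb {
        // trace: system call names in invocation order; k: sequence length (6, 9, 12)
        static Map<String, Integer> build(String[] trace, int k) {
            Map<String, Integer> counts = new HashMap<>();
            for (int i = 0; i + k <= trace.length; i++) {
                String seq = String.join("→", Arrays.copyOfRange(trace, i, i + k));
                counts.merge(seq, 1, Integer::sum);
            }
            return counts;   // frequency of a sequence = its count / total windows
        }
    }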
Table 2. Sequences Samples of length 6 with the Total Number and Frequency of Each

    sequence sample                                    total number    frequency
    fcntl64→fcntl64→fcntl64→fcntl64→fcntl64→fcntl64    742951          13.51%
    flock→fstat64→flock→flock→fstat64→flock            111113          2.02%
    time→getpid→getpid→stat64→lstat64→geteuid32        92456           1.68%
Table 3. Sequences Samples of length 9 with the Total Number and Frequency of Each

    sequence sample                                                            total number    frequency
    fcntl64→fcntl64→fcntl64→fcntl64→fcntl64→fcntl64→fcntl64→fcntl64→fcntl64    734558          13.35%
    flock→fstat64→flock→flock→fstat64→flock→flock→fstat64→flock                99528           1.81%
    time→getpid→getpid→stat64→lstat64→geteuid32→lstat64→geteuid32→open         73744           1.34%
We traced Sendmail running for two months and, by selecting typical data, obtained traces with a total of 5.5 million system calls. For sequences of length 6, there were 1348 unique system call sequences in the total of 5.5 million system calls; for sequences of length 9, there were 1622 unique sequences; for sequences of length 12, there were 1938 unique sequences. In addition, we calculated the frequency of each sequence in the NSCS database; given a large set of sequence samples, the percentage of a sequence can be viewed as its probability. Table 2, Table 3 and Table 4 list some sequence samples with the total number and frequency of each one. From these tables we can see that the system calls of Sendmail mainly concern operations on files, such as locking and opening files. The total number of each sequence changes with the sequence length: the longer the sequence, the smaller the total number and frequency of each sequence. Afterwards, we constructed the ISCS database. We generated traces of three types of intrusion behaviors that attack Sendmail effectively: U2R, buffer overflow, and forwarding loop. The sunsendmailcp script, representing U2R, uses a special command line option to cause sendmail to append an email message to a file; by using this script, a local user might obtain root access. The syslog attack, representing buffer overflow, uses the syslog interface to overflow a buffer in sendmail and leaves a port open for later intrusion. A forwarding loop uses specially written email addresses and forward files to form a logical circle that sends letters from machine to machine [8]. During the intrusions, the intrusion system call sequences were captured. In order to get the frequency of these intrusion system call sequences, Strace ran on Sendmail for two months
Table 4. Sequences Samples of length 12 with the Total Number and Frequency of Each

    sequence sample                                                                             total number    frequency
    fcntl64→fcntl64→fcntl64→fcntl64→fcntl64→fcntl64→fcntl64→fcntl64→fcntl64→fcntl64→fcntl64→fcntl64    725927    13.19%
    flock→fstat64→flock→flock→fstat64→flock→flock→fstat64→flock→flock→fstat64→flock                    90746     1.65%
    time→getpid→getpid→stat64→lstat64→geteuid32→lstat64→geteuid32→open→fstat64→flock→open              64939     1.18%
to trace intrusion traces. The total number of intrusion system calls turned out to be 300K, and the number of unique intrusion system call sequences was about 342 for length 6, about 420 for length 9, and about 513 for length 12.
3.2
Detect Known and Novel Intrusion
In this section, we illustrate the intrusion detection process and test the performance of our model. To determine whether a system call sequence x is normal or not, we compare x with the sequences in the ISCS database and the NSCS database. If Sim(x, ISCS) is not less than λI, x is an intrusion system call sequence; in the same way, if Sim(x, NSCS) is not less than λN, x is a normal system call sequence. Otherwise, we use the Bayesian decision theorem to make the decision. λN is a threshold value above which a behavior is regarded as normal, and λI is a threshold value above which it is deemed an intrusion. To detect intrusions effectively, we assign large values to λN and λI, such as 0.95 and 0.9. In the experiment, we need to assign values to the parameters in formula (3) and Table 1. The prior probability of an intrusion is called P(w2); its value may change over different periods. For example, it may increase with increasing levels of DEFCON, or if there is an increase in the number of hackers operating [10]. The estimation of the prior probability of an intrusion has been addressed by Axelsson [1]. P(w1) can be obtained from the following formula:

    P(w1) = 1 − P(w2)
(9)
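Putting the pieces together, the detection procedure just described can be sketched as follows (threshold values taken from the text; RiskDecision is the earlier sketch of formulas (3)-(5), and all names are ours):

    class HybridDetector {
        double lambdaI = 0.9, lambdaN = 0.95;   // example thresholds from the text

        boolean isIntrusion(double simToIscs, double simToNscs,
                            double[][] loss, double[] prior, double[] pxGivenW) {
            if (simToIscs >= lambdaI) return true;    // clear match to a known intrusion
            if (simToNscs >= lambdaN) return false;   // clear match to normal behavior
            // ambiguous case: fall back on the minimum-risk Bayesian rule
            return RiskDecision.decide(loss, prior, pxGivenW) == 1;
        }
    }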
p(x | wi) is the probability of a certain sequence x given a system state wi. The cost of responding to an alarm when there is no intrusion is denoted by λ(α2, w1); the cost of failing to respond to an intrusion is denoted by λ(α1, w2). We assume that the costs of correct responses are zero, that is, λ(α1, w1) and λ(α2, w2) are zero, and we introduce a cost ratio C = λ(α1, w2)/λ(α2, w1) [10]. Table 5 compares the detection rates for old intrusions and new intrusions with sequences of length 12 and with different cost ratios C. Here, new intrusions refer to those that did not have corresponding instances in the training data. From the table we can see that the detection rates of old intrusions have nothing to
Table 5. Detection Rates with Different Cost Ratio C

    C                     1             10            20            40
    Category           old    new    old    new    old    new    old    new
    U2R                87.7   35.5   91.3   80.1   89.6   82.3   88.3   50.1
    Buffer Overflow    90.2   26.7   88.5   73.6   92.1   80.4   92.2   56.3
    Forwarding Loop    92.5   56.3   90.2   80.7   93.4   77.5   91.5   43.9
do with C: because the system call sequences of these old intrusions are stored in the ISCS database, it is easy to detect them. The detection rates of new intrusions, however, depend on C, and high detection rates can be obtained for C between 10 and 20. ROC curves for intrusion detection indicate how the detection rate changes with the false alarm rate, reflecting detection accuracy against analyst workload [18]. Fig. 1 shows the ROC curves of the detection models with different sequence lengths. In Fig. 1, the x-axis is the false alarm rate, calculated as the percentage of normal sequences classified as intrusions; the y-axis is the detection rate, calculated as the percentage of intrusions detected. From the curves for the different sequence lengths, we can see that the detection model with sequence length 12 has the best performance.
Fig. 1. ROC Curves on Detection Rates and False Alarm Rates
4
Conclusions
In this paper, we propose a risk-sensitive intrusion detection model based on Bayesian decision theorem and a similarity measure to minimize the risks of false negatives and false positives. To achieve the goal of detection, the NSCS database and the ISCS database are established first. Using the similarity measure, misuse intrusion detection based on the ISCS database and anomaly intrusion detection based on the NSCS database can work well independently; by applying Bayesian decision theorem to our model, the two are combined, and the model can minimize the risks of wrong decisions. Empirical experiments show that our risk-sensitive model and deployment techniques are effective in reducing the overall intrusion detection risk. The results show that the detection rates of new intrusions depend on the cost ratio C, that high detection rates can be obtained for C between 10 and 20, and that the detection model with sequence length 12 has the best performance. The model proposed in this paper provides an alternative approach to intrusion detection. An intrusion detection model is a composite model that needs various theories and techniques, one or two of which can hardly offer satisfying results by themselves. Although the proposed method works well in intrusion detection, it is just a beginning; there is still much work to be done in this field, and we will attempt to apply other theories and techniques to intrusion detection in future work.
References
[1] S. Axelsson, "The base-rate fallacy and the difficulty of intrusion detection", ACM Trans. on Information and System Security, 3(3), 2000, pp. 186-205
[2] J. O. Berger, Statistical Decision Theory: Foundations, Concepts, and Methods, Springer, New York, 1980, pp. 94-96
[3] S. M. Bridges and R. B. Vaughn, "Fuzzy data mining and genetic algorithms applied to intrusion detection", Proc. of the Twenty-third National Information Systems Security Conference, Baltimore, MD, October 2000
[4] R. K. Cunningham, R. P. Lippmann, and S. E. Webster, "Detecting and displaying novel computer attacks with Macroscope", IEEE Trans. on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 31, no. 4, July 2001, pp. 275-281
[5] J. E. Dickerson and J. A. Dickerson, "Fuzzy network profiling for intrusion detection", Proc. of the 19th International Conference of the North American Fuzzy Information Processing Society (NAFIPS), 2000, pp. 301-306
[6] E. Eskin, "Anomaly detection over noisy data using learned probability distributions", Proc. of ICML00, Palo Alto, CA, July 2000
[7] G. Florez, S. M. Bridges, and R. B. Vaughn, "An improved algorithm for fuzzy data mining for intrusion detection", Proc. of the NAFIPS Annual Meeting, 2002, pp. 457-462
[8] S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff, "A sense of self for Unix processes", Proc. IEEE Symposium on Security and Privacy, Los Alamitos, CA, 1996, pp. 120-128
[9] A. Ghosh and A. Schwartzbard, "A study in using neural networks for anomaly and misuse detection", Proc. of the Eighth USENIX Security Symposium, 1999
[10] J. E. Gaffney and J. W. Ulvila, "Evaluation of intrusion detectors: a decision theory approach", IEEE Symposium on Security and Privacy, 2001, pp. 50-61
[11] S. A. Hofmeyr, S. Forrest, and A. Somayaji, "Intrusion detection using sequences of system calls", Journal of Computer Security, 6, 1998, pp. 151-180
[12] S. Forrest, S. Hofmeyr, and A. Somayaji, "Computer immunology", Communications of the ACM, vol. 40, no. 10, 1997, pp. 88-96
[13] T. Lane and C. E. Brodley, "Sequence matching and learning in anomaly detection for computer security", Proc. of the AAAI-97 Workshop on AI Approaches to Fraud Detection and Risk Management, Menlo Park, CA, AAAI Press, 1997, pp. 43-49
[14] T. Lane and C. E. Brodley, "Temporal sequence learning and data reduction for anomaly detection", Proc. of the Fifth ACM Conference on Computer and Communications Security, 1998, pp. 150-158
[15] T. Lane and C. E. Brodley, "Temporal sequence learning and data reduction for anomaly detection", ACM Trans. on Information and System Security, 2, 1999, pp. 295-331
[16] L. Portnoy, E. Eskin, and S. J. Stolfo, "Intrusion detection with unlabeled data using clustering", Proc. of ACM CSS Workshop on Data Mining Applied to Security (DMSA-2001), Philadelphia, PA, November 5-8, 2001
[17] W. Lee, W. Fan, M. Miller, S. Stolfo, and E. Zadok, "Toward cost-sensitive modeling for intrusion detection and response", to appear in Journal of Computer Security, 2001
[18] R. P. Lippmann, D. J. Fried, I. Graf, J. W. Haines, K. R. Kendall, D. McClung, D. Weber, S. E. Webster, D. Wyschogrod, R. K. Cunningham, and M. A. Zissman, "Evaluating intrusion detection systems: the 1998 DARPA off-line intrusion detection evaluation", Proc. of the DARPA Information Survivability Conference and Exposition, January 25-27, 2000, vol. 2, pp. 12-26
[19] W. Lee and S. Stolfo, "Data mining approaches for intrusion detection", Proc. of the Seventh USENIX Security Symposium (SECURITY '98), San Antonio, TX, January 1998
[20] W. Lee, S. Stolfo, and P. Chan, "Learning patterns from Unix process execution traces for intrusion detection", Proc. of the AAAI Workshop: AI Approaches to Fraud Detection and Risk Management, July 1997
[21] J. Marin, D. Ragsdale, and J. Surdu, "A hybrid approach to the profile creation and intrusion detection", Proc. of the DARPA Information Survivability Conference & Exposition II (DISCEX '01), vol. 1, 2001, pp. 69-76
[22] Y. Okazaki, I. Sato, and S. Goto, "A new intrusion detection method based on process profiling", Proc. of the 2002 Symposium on Applications and the Internet (SAINT '02)
[23] S. L. Scott, "A Bayesian paradigm for designing intrusion detection systems", to appear in Computational Statistics and Data Analysis, 2002
[24] C. Warrender, S. Forrest, and B. Pearlmutter, "Detecting intrusions using system calls: alternative data models", Proc. IEEE Symposium on Security and Privacy, 1999, pp. 133-145
[25] N. Ye, X. Li, Q. Chen, S. M. Emran, and M. Xu, "Probabilistic techniques for intrusion detection based on computer audit data", IEEE Trans. on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 31, no. 4, July 2001, pp. 266-274
Applet Verification Strategies for RAM-Constrained Devices

Nils Maltesson¹, David Naccache², Elena Trichina³, and Christophe Tymen²

¹ Lund Institute of Technology, Magistratsvägen 27A, Lund, 226 43, Sweden
[email protected] [email protected]
² Gemplus Card International, 34 rue Guynemer, Issy-les-Moulineaux, 92447, France
{david.naccache,christophe.tymen}@gemplus.com
³ University of Kuopio, Department of Computer Science and Applied Mathematics, P.O.B. 1627, FIN-70211, Kuopio, Finland
[email protected]
Abstract. While bringing considerable flexibility and extending the horizons of mobile computing, mobile code raises major security issues. Hence, mobile code, such as Java applets, needs to be analyzed before execution. The byte-code verifier checks low-level security properties that ensure that the downloaded code cannot bypass the virtual machine's security mechanisms. One of the statically ensured properties is type safety, and the type-inference phase is the overwhelmingly resource-consuming part of the verification process. This paper addresses the RAM bottleneck met while verifying mobile code in memory-constrained environments such as smart cards. We propose to modify classic type inference in a way that significantly reduces memory consumption. Our algorithm is inspired by bit-slice data processing and consists in running the verifier on each variable in turn: instead of running the fix-point calculation algorithm once on M variables, we re-launch the algorithm M/ℓ times, verifying each time only ℓ variables. The parameter ℓ can then be tuned to suit the RAM resources available on board, whereas M/ℓ upper-bounds the computational effort (expressed in re-runs of the usual fix-point calculation algorithm). The resulting RAM economy, as experimented on a number of popular applets, is around 40%.
1
Introduction
The Java Card architecture for smart cards [2] allows new applications, called applets, to be downloaded into smart cards. While bringing considerable flexibility and extending the horizons of smart-card usage, this post-issuance feature raises major security issues. Upon their loading, malicious applets can try to subvert the JVM's security in a variety of ways. For example, they might try to overflow the stack, hoping to modify memory locations which they are not
allowed to access, cast objects inappropriately to corrupt arbitrary memory areas, or even modify other programs (Trojan horse attacks). While the general security issues raised by applet download are well known [9], transferring Java's safety model to resource-constrained devices such as smart cards appears to require the devising of delicate security-performance trade-offs. When a Java class comes from a distrusted source, there are two basic ways to ensure that no harm will be done by running it. The first is to interpret the code defensively [3]. A defensive interpreter is a virtual machine with built-in dynamic runtime verification capabilities. Defensive interpreters have the advantage of being able to run standard class files resulting from any Java compilation chain, but they are slow: the security tests performed during interpretation slow down each and every execution of the downloaded code and, as will be seen later, the memory complexity of these tests is not negligible either. This makes defensive interpreters unattractive for smart cards, where resources are severely constrained and where, in general, applets are downloaded rarely and run frequently. The other method consists in running the newly downloaded code in a completely protected environment (sandbox), thereby ensuring that even hostile code will remain harmless. Java's security model is based on sandboxes. The sandbox is a neutralization layer preventing direct access to hardware resources. In this model, applets are not compiled to machine language, but rather to a virtual-machine assembly language called byte-code. Upon download, the applet's byte-code is subjected to a static analysis called byte-code verification whose purpose is to make sure that the applet's code is well-typed. This is necessary to ascertain that the code will not attempt to violate Java's security policy by performing ill-typed operations at runtime (e.g. forging object references from integers or directly calling private API methods). Today's de facto verification standard is Sun's algorithm [8], which has the advantage of being able to verify any class file resulting from any standard compilation chain. While the time and space complexities of Sun's algorithm suit personal computers, the memory complexity of this algorithm appears prohibitive for smart cards, where RAM is a significant cost factor. This limitation gave birth to a number of innovative workarounds. Leroy [6, 7] devised a verification scheme whose memory complexity equals the amount of RAM necessary to run the verified applet; Leroy's solution relies on off-card code transformations whose purpose is to facilitate on-card verification by eliminating the memory-consuming fix-point calculations of Sun's original algorithm. Proof-carrying code (PCC) [11] is a technique by which a side product of the full verification, namely the final type information inferred at the end of the verification process (the fix-point), is sent along with the byte-code to allow a straight-line verification of the applet. This extra information causes some transmission overhead, but the memory needed to verify a code becomes essentially equal to the RAM necessary to run it. A PCC off-card proof generator is, however, a rather complex piece of software.
The work reported in this paper describes two new memory optimization techniques. The rest of the paper is organized as follows: the next section recalls Java's security model and Sun's verification algorithm, with a specific focus on its data-flow analysis part; the subsequent sections describe our algorithms in detail, and benchmarks are given in the last section.
2
Java Security
The Java Virtual Machine (JVM) Specification [8] defines the executable file structure, called the class file format, to which all Java programs are compiled. In a class file, the executable code of methods (Java methods are the equivalent of C functions) is found in code-array structures. The executable code and some method-specific runtime information (namely, the maximal operand stack size Smax and the number of local variables Lmax claimed by the method) constitute a code-attribute. We briefly overview the general stages that Java code goes through upon download. To begin with, the classes of a Java program are translated into independent class files at compile time. Upon a load request, a class file is transferred over the network to its recipient where, at link time, symbolic references are resolved. Finally, upon method invocation, the relevant method code is interpreted (run) by the JVM. Java's security model is enforced by the class loader, restricting what can be loaded; the class file verifier, guaranteeing the safety of the loaded code; and the security manager and access controller, restricting library method calls so as to comply with the security policy. Class loading and security management are essentially an association of lookup tables and digital signatures and hence do not pose particular implementation problems. Byte-code verification, on which this paper focuses, aims at predicting the runtime behavior of a method precisely enough to guarantee its safety without actually having to run it.
2.1
Byte-Code Verification
Byte-code verification [5] is a link-time phase in which the method's run-time behavior is proved to be semantically correct. The byte-code is the executable sequence of bytes of the code-array of a method's code-attribute. The byte-code verifier processes units of method code stored as class file attributes. An initial byte-code verification pass breaks the byte sequence into successive instructions, recording the offset (program point) of each instruction. Some static constraints are checked to ensure that the byte-code sequence can be interpreted as a valid sequence of instructions taking the right number of arguments. If this pass ends normally, the receiver assumes that the analyzed file complies with the general syntactical description of the class file format.
Then, a second verification step ascertains that the code will only manipulate values whose types are compatible with Java's safety rules. This is achieved by a type-based data-flow analysis which abstractly executes the method's byte-code, modeling the effect of the successive byte-codes on the types of the variables read or written by the code. The next section explains the semantics of type checking, i.e., the process of verifying that a given pre-constructed type is correct with respect to a given class file; we explain why and how such a type can always be constructed and describe the basic idea behind the data-flow analysis.

The Semantics of Type Checking A natural way to analyze the behavior of a program is to study its effect on the machine's memory. At runtime, each program point can be looked upon as a memory frame describing the set of all the runtime values possibly taken by the JVM's stack and local variables. Since run-time information, such as actual input data, is unknown before execution starts, the best an analysis may do is reason about sets of possible computations. An essential notion used for doing so is the collecting semantics defined in [4] where, instead of computing on a full semantic domain (values), one computes on a restricted abstract domain (types).
    ↑ stack growth      values        types
                        12711         int
                        @346          Ljava/lang/String;
                        127.55        FH
                                      FL
                        1113          int
For reasoning with types, one must precisely classify the information expressed by types. A natural way to determine how (in)comparable types are is to rank all types in a lattice L. A brief look at the toy lattice depicted below suffices to find out that animal is more general than fly, that int and spider are not comparable, and that cat is a specific animal. Hence, knowing that a variable is designed to safely contain an animal, one can infer that no harm can occur if during execution this variable successively contains a cat, a fly and an insect. However, should the opposite be detected (e.g. an instruction attempting to use a variable supposed to contain an animal as if it were a cat), the program should be rejected as unsafe. The most general type is called top and denoted ⊤; ⊤ represents the potential simultaneous presence of all types, i.e. the absence of (specific) information. By definition, a special null-pointer type (denoted null) terminates the inheritance chain of all object descendants. Formally, this defines a pointed complete partial order (CPO) on the lattice L.
    int          Object
                   ↓
                 animal
              ↙    ↓     ↘
           cat   spider   insect
            ↓      ↓      ↙    ↘
          null   null   bee    fly
                         ↓      ↓
                       null   null
Stack elements and local variable types are hence tuples of elements of L, to which one can apply point-wise ordering. The lattice L has ⊤ at its top, covering int and the hierarchy of all object types τ1, ..., τk, ..., with every reference chain terminating in null:

    L =        ⊤
             ↙   ↘
           int    Object
                    ↓
               τ1  ···  τk  ···
                ↓        ↓
              null ··· null
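The unification used later on relies on computing the least common ancestor of two points of L. As a sketch (ours, not the paper's), Java's own class hierarchy can stand in for the object part of the lattice:

    class TypeLattice {
        // Deepest common ancestor of two reference types; null (the null-pointer
        // type) unifies with anything, and every chain meets at Object.
        static Class<?> lca(Class<?> a, Class<?> b) {
            if (a == null) return b;
            if (b == null) return a;
            Class<?> t = a;
            while (t != null && !t.isAssignableFrom(b)) t = t.getSuperclass();
            return t != null ? t : Object.class;
        }
    }
    // lca(Integer.class, Long.class) is Number; lca(String.class, Long.class) is
    // Object: exactly the "lesser information" outcome discussed below.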
Abstract Interpretation The verification process described in [8] §4.9 is an iterative data-flow analysis algorithm that attempts to build an abstract description of the JVM's memory for each program point. A byte-code is safe if the construction of such an abstract description succeeds. Assume, for example, that an iadd is present at some program point. The i in iadd hints that this instruction operates on integers. iadd's effect on the JVM is indeed very simple: the two topmost stack elements are popped, added, and the sum is pushed back onto the stack. An abstract interpreter disregards the arithmetic meaning of iadd and reasons with types: iadd pops two int elements from the stack and pushes back an int. From an abstract perspective, iadd and isub have identical effects on the JVM. As an immediate corollary, a valid stack for executing an iadd must have a value which can be abstracted as int.int.S, where S may contain any sequence of types (which are irrelevant for the interpretation of our iadd). After executing iadd the stack becomes int.S. Denoting by L the JVM's local variable area (irrelevant to iadd), the total effect of iadd's abstract interpretation on the JVM's memory can be described by the transition rule Φ:

    iadd : (int.int.S, L) → (int.S, L)
The following table defines the transition rules of seven representative JVM instructions¹.

¹ Note that the test n ∈ L is equivalent to ascertaining that 0 ≤ n ≤ Lmax.
    Instruction       Transition rule Φ                    Security test
    iconst[n]         (S, L) → (int.S, L)                  |S| < Smax
    iload[n]          (S, L) → (int.S, L)                  n ∈ L, L[n] == int, |S| < Smax
    istore[n]         (int.S, L) → (S, L{n → int})         n ∈ L
    aload[n]          (S, L) → (L[n].S, L)                 n ∈ L, L[n] ⊑ Object, |S| < Smax
    astore[n]         (τ.S, L) → (S, L{n → τ})             n ∈ L, τ ⊑ Object
    dup               (τ.S, L) → (τ.τ.S, L)                |S| < Smax
    getfield C.f.τ    (ref(D).S, L) → (τ.S, L)             D ⊑ C
(this, τ1 , . . . , τn−1 , , . . . , ))
For other instructions, no information regarding the stack or the local variables is available. Verifying a method whose body is a straight-line code (no branches), is easy: we simply iterate the abstract interpreter’ transition function Φ over the successive instructions, taking the stack and register types after any given instruction as the stack and register types before the next instruction. The types describing the successive JVM memory-states produced by the successive instructions are called frames. Denoting by in(i) the frame before instruction i and by out(i) the frame after instruction i, we get the following data-flow equation where evaluation starts from the right: in(i + 1) ← out(i) ← Φi (in(i)) Branches introduce forks and joins into the method’s flowchart. Let us illustrate these with the following example: program point Java code int m (int q) { p1 → int x; int y; if (q == 0) p2 → { x = 1; ... } p3 → else { y = 2; ... } p4 → ... }
After program point p1 one can infer that variable q has type int. This is denoted as out(p1 ) = {q = int, x = , y = }. After the if’s then branch, we infer the type of variable x, i.e., out(p2 ) = {q = int, x = int, y = }. After the else, we learn that out(p3 ) = {q = int, x = , y = int}. However, at p4 , nothing can be said about neither x nor y. We hence prudently assume that in(p4 ) = {q = int, x = , y = } by virtue of the principle that if two execution paths yield different types for a given variable, only the
lesser-information type can serve for further calculations. In other words, we assume the worst and check that, still, type violations will not occur. Thus, if an instruction i has several predecessors with different exit frames, i's frame is computed as the least common ancestor² (LCA) of all the predecessors' exit frames: in(i) = LCA{out(j) | j ∈ Predecessor(i)}. In our example:

    in(p4) = {q = int, x = ⊤ = LCA(int, ⊤), y = ⊤ = LCA(⊤, int)}

Finding an assignment of frames to program points which is sufficiently conservative for all execution paths requires testing them all; this is what the verification algorithm does. Whenever some in(i) is adjusted, all frames in(j) that depend on in(i) have to be adjusted too, causing additional iterations until a fix-point is reached (i.e., no more adjustments are required). The final set of frames is a proof that the verification terminated with success; in other words, the byte-code is well-typed.
2.2
Sun’s Type-Inference Algorithm
The algorithm below, which summarizes the verification process, is taken from [8]. The treatment of exceptions (straightforward) is purposely omitted for the sake of clarity. The initialization phase of the algorithm consists of the following steps:

1. Initialize in(0) ← (ε, (this, τ1, . . . , τn−1, ⊤, . . . , ⊤)), where (τ1, . . . , τn−1) is the method's signature.
2. A 'changed' bit is associated to each instruction; all 'changed' bits are set to zero except the first.

Execute the following loop until no more instructions are marked as 'changed' (i.e., a fix-point is reached):

1. Choose a marked instruction i. If there is none, the method is safe (exit). Otherwise, reset the 'changed' bit of the selected instruction.
2. Model the effect of the instruction on in(i) by doing the following:
   – If the instruction uses values from the stack, ensure that:
     • there are sufficiently many values on the stack, and that
     • the topmost stack elements are of types that suit the executed instruction.
     Otherwise, verification fails.
   – If the instruction uses local variables:
     • ascertain that these local variables are of types that suit the executed instruction.
² The LCA operation is frequently called unification.
     Otherwise, verification fails.
   – If the instruction pushes values onto the stack:
     • ascertain that there is enough room on the stack for the new values; if the new stack height exceeds Smax, verification fails;
     • add the types produced by the instruction to the top of the stack.
   – If the instruction modifies local variables, record these new types in out(i).
3. Determine the instructions that can potentially follow instruction i. A successor instruction can be one of the following:
   – for most instructions, the successor instruction is just the next instruction;
   – for a goto, the successor instruction is the goto's jump target;
   – for an if, both the if's remote jump target and the next instruction are successors;
   – return has no successors.
   Verification fails if it is possible to 'fall off' the last instruction of the method.
4. Unify out(i) with the in(k)-frame of each successor instruction k.
   – If the successor instruction k is visited for the first time:
     • record that out(i) calculated in step 2 is now the in(k)-frame of the successor instruction;
     • mark the successor instruction by setting its 'changed' bit.
   – If the successor instruction has been visited before:
     • unify out(i) with the successor instruction's (already present) in(k)-frame and update: in(k) ← LCA(in(k), out(i));
     • if the unification caused modifications in in(k), mark the successor instruction k by setting its 'changed' bit.
5. Go to step 1.

If the code is safe, the algorithm must exit without reporting a failure.
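For concreteness, here is a compact Java rendering of this fix-point loop (a sketch with simplified data structures, not Sun's code): frames are arrays of types unified point-wise with the lca sketch given earlier, and an instruction re-enters the worklist whenever its entry frame becomes more general.

    import java.util.ArrayDeque;
    import java.util.Deque;

    class FixPointVerifier {
        interface Instruction {
            Class<?>[] apply(Class<?>[] in);   // abstract effect; throws VerifyError if ill-typed
            int[] successors();
        }

        static void verify(Instruction[] code, Class<?>[] entryFrame) {
            Class<?>[][] in = new Class<?>[code.length][];
            in[0] = entryFrame;
            Deque<Integer> changed = new ArrayDeque<>();   // the 'changed' set
            changed.push(0);
            while (!changed.isEmpty()) {
                int i = changed.pop();
                Class<?>[] out = code[i].apply(in[i]);
                for (int k : code[i].successors()) {
                    if (in[k] == null) {                   // first visit: adopt out(i)
                        in[k] = out.clone();
                        changed.push(k);
                    } else {
                        boolean grew = false;              // in(k) <- LCA(in(k), out(i))
                        for (int v = 0; v < in[k].length; v++) {
                            Class<?> u = TypeLattice.lca(in[k][v], out[v]);
                            if (u != in[k][v]) { in[k][v] = u; grew = true; }
                        }
                        if (grew) changed.push(k);
                    }
                }
            }
        }
    }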
2.3 Basic Blocks and Memory Complexity
As explained above, the data-flow type analysis of straight-line code consists of simply applying the transition function to the sequence of instructions i1, i2, . . . , it, taking in(ik) ← out(ik−1). This property can be used to optimize the algorithm. Following [1, 10], we call a basic block (B) a straight-line sequence of instructions that can be entered only at its beginning and exited only at its end. For instance, we identify four basic blocks, denoted B0, B1, B2 and B3, in the example below:
    public class Example {               Method int cmpz(int,int)
      public int cmpz (int a, int b)       B0   0 iload_1
      {                                    B0   1 iload_2
        int c;                             B0   2 if_icmpne 12
        if (a==b)            compile       B1   5 iload_1
          c = a+b;             −→          B1   6 iload_2
        else                               B1   7 iadd
          c = a*a;                         B1   8 istore_3
        return c;                          B1   9 goto 16
      }                                    B2  12 iload_1
    }                                      B2  13 iload_1
                                           B2  14 imul
                                           B2  15 istore_3
                                           B3  16 iload_3
                                           B3  17 ireturn
In several implementations of Sun's algorithm, the data-flow equations evolve at the basic-block level rather than at the instruction level. In other words, it suffices to keep track in permanent memory of only the frames in(i) where i is the first instruction of a B (i.e., a branch target). All other frames within a basic block can be recomputed on the fly. By extension, we denote by in(B) and out(B) the frames before and after the execution of B. The entire program is denoted by P. Denoting by Nblocks the number of Bs in a method, a straightforward implementation of Sun's algorithm allocates Nblocks frames, each of size Lmax + Smax. Lmax and Smax are determined by the compiler and appear in the method's header. This results in an O((Lmax + Smax) × Nblocks) memory-complexity. In practice, the verification of moderately complex methods frequently requires a few thousand bytes.
2.4 The Stack's Behavior
A property of Java code is that a unique stack height is associated to each program point. This property is actually verified on the fly during type-inference, although it could perfectly well be checked independently of type-inference. In other words, the computation of stack heights does not require the modeling of the instructions' effect on types, but only on the stack-pointer. Denoting by σi the stack height associated to program point i, this section presents a simple algorithm for computing {σ0, σ1, . . .} from P. The algorithm uses a table ∆ associating to each instruction a signed integer indicating the effect of this instruction on the stack's size:
    Instruction  ∆  | Instruction  ∆  | Instruction  ∆  | Instruction   ∆
    iconst[n]    2  | sconst[n]    1  | bspush       1  | bipush        2
    aload        1  | sload        1  | aload[n]     1  | iload[n]      2
    aaload      -1  | iaload       0  | astore      -1  | istore       -2
    astore[n]   -1  | istore[n]   -2  | pop         -1  | dup           1
    sadd,smul   -1  | iadd,imul   -2  | getfield_a   0  | getfield_i    1
    iinc         0  | icmp        -3  | ifne        -1  | if_acmpne    -2
    goto         0  | return       0  | athrow       0  | arraylength   0
The information we are looking for is easily obtained by running Sun's algorithm with the modeling effect on types turned off, monitoring only the code's effect on the stack pointer:

Algorithm PredictStack(P)
– Associate to each program point i a bit changed[i] indicating if this program point needs to be re-examined; initialize all the changed[i]-bits to zero.
– Set σ0 ← 0; changed[0] ← 1;
– For all exception code entry points³ j, set changed[j] ← 1 and σj ← 1;
– While ∃ i such that changed[i] == 1:
• Set changed[i] ← 0;
• α ← σi + ∆(i)
• If α > Smax or α < 0 then report a failure.
• If i is the program's last instruction and it is possible to fall off the program's code then report a failure.
• For each successor instruction k of i:
∗ If k is visited for the first time then set σk ← α; changed[k] ← 1
∗ If k was visited before and σk ≠ α, then report a failure.
– Return {σ0, σ1, . . .}

³ These can be found in the method component.exception handlers[j].handler offset fields of Java Card *.cap files.
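A direct Java sketch of PredictStack follows; delta(i) (the ∆ table above), successors(i) and exceptionEntryPoints() are assumed helpers, and the 'fall off the code' test is folded into the successor enumeration for brevity.

    // Sketch of PredictStack: heights are propagated until stable; -1 = unvisited.
    static int[] predictStack(int length, int sMax) {
        int[] sigma = new int[length];
        boolean[] changed = new boolean[length];
        java.util.Arrays.fill(sigma, -1);
        sigma[0] = 0; changed[0] = true;
        for (int j : exceptionEntryPoints()) { sigma[j] = 1; changed[j] = true; }
        boolean more = true;
        while (more) {
            more = false;
            for (int i = 0; i < length; i++) {
                if (!changed[i]) continue;
                changed[i] = false; more = true;
                int alpha = sigma[i] + delta(i);
                if (alpha > sMax || alpha < 0) return null;     // overflow or underflow
                for (int k : successors(i)) {
                    if (sigma[k] == -1) { sigma[k] = alpha; changed[k] = true; }
                    else if (sigma[k] != alpha) return null;    // two heights for one point
                }
            }
        }
        return sigma;                                           // {sigma_0, sigma_1, ...}
    }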
3 A Simplified Defensive Virtual Machine Model
We model the JVM by a very basic state-machine. Although over-simplified, our model suffices for presenting the verification strategies described in this paper.
3.1 Memory Elements
Variables and stack elements will be denoted:

L = {L[0], . . . , L[Lmax − 1]} and S = {S[0], . . . , S[Smax − 1]}

Since in Java a precise stack height σj is associated with each program point j, we can safely use a unique memory-space M to identify all memory elements: albeit a stack machine, the JVM can very easily be converted into a full register machine by computing {σ0, σ1, . . .} ← PredictStack(P) and replacing stack accesses S[σj] by register accesses L[Lmax + σj]. We thus denote Mmax = Lmax + Smax and:

M = {M[0], . . . , M[Mmax − 1]} = {L[0], . . . , L[Lmax − 1], S[0], . . . , S[Smax − 1]}.
3.2 Operational Semantics
We assume that each instruction reads and re-writes the entire memory M. In other words, although in reality the contents of only very few variables will change after the execution of each byte-code, we regard the byte-code at program point j as a collection of Mmax functions:

M[i] ← φj,i(M) for 0 ≤ i < Mmax

whose collective effect can be modeled as:

M ← {φj,0(M), . . . , φj,Mmax−1(M)} = Φj(M)

Based upon the instruction (j) and the data (M), the machine selects a new j (the current instruction's successor) using an additional "next instruction" function θj(M). Execution halts when θj(M) outputs a special value denoted stop. Using the above notation, the method's execution boils down to setting j ← 0 and iterating

{j, M} ← {θj(M), Φj(M)} while j ∉ {stop, error_runtime},

where error_runtime signals an error encountered during the course of execution (such as a division by zero, for instance).
3.3 Defensive Interpretation
A defensive JVM associates to each value M[i] a type denoted M̄[i] ∈ L. In general, functions and variables operating on types will be distinguished by upper bars (V̄ represents the type of the value contained in V). Given an instruction j, Java's typing rules express the effect of j on M̄ through a function:

Φ̄j(M̄) : L^Mmax → {L ∪ error_type}^Mmax

where error_type is an error resulting from a violation of Java's typing rules. By definition, whenever error_type occurs, execution stops. The effect of Φ̄j simply shadows that of Φj:

M̄ ← {φ̄j,0(M̄), . . . , φ̄j,Mmax−1(M̄)} = Φ̄j(M̄)
The complete defensive Java Virtual Machine, DJVM(P, input data), can hence be modeled as follows:

– {j, M, M̄} ← {0, input data, signature(P)}
– while (j ∉ {stop, error_runtime} and error_type ∉ M̄)
• {j, M, M̄} ← {θj(M), Φj(M), Φ̄j(M̄)}
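As an illustration, the defensive loop can be written as below; phi, phiBar, theta, the Type enum and the STOP/ERROR_RUNTIME sentinels are hypothetical stand-ins for Φj, Φ̄j, θj, stop and error_runtime.

    // Sketch of DJVM(P, input data): values and shadow types evolve in lockstep.
    static int runDefensive(int[] m, Type[] mBar) {
        int j = 0;                                    // initial program point
        while (j != STOP && j != ERROR_RUNTIME && !hasErrorType(mBar)) {
            int nextJ     = theta(j, m);              // successor selection
            int[] nextM   = phi(j, m);                // effect on values
            Type[] nextMB = phiBar(j, mBar);          // shadow effect on types
            j = nextJ; m = nextM; mBar = nextMB;      // simultaneous update
        }
        return j;
    }

    static boolean hasErrorType(Type[] t) {
        for (Type x : t) if (x == Type.ERROR_TYPE) return true;
        return false;
    }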
4 Variable-Wise Verification
Variable-wise verification is inspired by bit-slice data processing and consists in running the verifier on each variable in turn. In other words, instead of calculating at once the fix-points of Mmax variables, we launch the algorithm Mmax/ℓ times, verifying each time only ℓ variables. The parameter ℓ can then be tuned to suit the RAM resources available on board, whereas Mmax/ℓ will upper-bound the computational effort expressed in re-runs of [8]. The advantage of this approach is the possibility to re-use the same tiny RAM space for the sequential verification of different variables.
4.1 A Toy-Example
Consider the following example where ℓ = 1, and the operation

M[13] ← M[4] + M[7]    (1)

is to be verified. The operator + (sadd) requires two arguments of type short; we launch the complete verification process for i ← 0, . . . , Mmax − 1:

– When i ∉ {4, 7, 13} nothing is done.
– When i = 4 (i.e. we are verifying M[4]), the algorithm meets expression (1) and only checks that M̄[4] is short, assuming that M̄[7] is short. The operator's effect on M[13] is ignored.
– When i reaches 7, the algorithm meets expression (1) again and checks only that M̄[7] is short; this time the algorithm assumes that M̄[4] is short. The operator's effect on M[13] is ignored again.
– When i reaches 13, the algorithm meets expression (1) and models its effect only on M̄[13] by assigning M̄[13] ← short.

Hence, in runs 4 and 7 we successively ascertained that no type violations occurred in the first (M[4], run 4) or the second (M[7], run 7) argument of the operator +, while the 13th run modeled the effect of sadd on M̄[13]. Note that the same RAM variable could be used to host, in turn, the type information associated to M[4], M[7] and M[13].
4.2 The Required Properties
For this to work, each instruction (j) must comply with the following two properties:

1. There exist Mmax sets of types Tj,0, . . . , Tj,Mmax−1 such that:

∀M̄ ∈ Tj,0 × Tj,1 × . . . × Tj,Mmax−1,  error_type ∉ Φ̄j(M̄)

2. ∀M̄, M̄′ ∈ Tj,0 × Tj,1 × . . . × Tj,Mmax−1, ∀i, 0 ≤ i < Mmax:

M̄[i] = M̄′[i] ⇒ φ̄j,i(M̄) = φ̄j,i(M̄′)
The first requirement expresses the independence between the types of the variables read by the instruction; this is necessary to verify each variable independently, regardless of the types of its neighbors. The second requirement (self-sufficiency) guarantees that the type of each variable before executing the instruction suffices to precisely determine its type after the execution of the instruction.
4.3 Byte-Code Compliance
We now turn to examine the compliance of a few concrete Java Card [2] byte-codes with these definitions. The stack elements that our examples will operate on are:

{S[σj], S[σj + 1], S[σj + 2], . . .} = {M[Lmax + σj], M[Lmax + σj + 1], M[Lmax + σj + 2], . . .}.

Example 1: icmp. icmp transforms the types of the four topmost stack elements from {intH, intL, intH, intL} to {short, undef, undef, undef}. (1) is fulfilled: the sets from which variable types can be chosen are:

Tj,Lmax+σj+i = L for i ∉ {0, 1, 2, 3}
Tj,Lmax+σj = {intH}      Tj,Lmax+σj+1 = {intL}
Tj,Lmax+σj+2 = {intH}    Tj,Lmax+σj+3 = {intL}

(2) is also fulfilled: the type of each variable after the execution of icmp can be precisely determined from the variable's type before executing icmp:

φ̄j,Lmax+σj+i(M̄) = M̄[Lmax + σj + i] for i ∉ {0, 1, 2, 3}
φ̄j,Lmax+σj(M̄) = short       φ̄j,Lmax+σj+1(M̄) = undef
φ̄j,Lmax+σj+2(M̄) = undef     φ̄j,Lmax+σj+3(M̄) = undef
Example 2: pop. pop acts only on the topmost stack element (i.e. S[σj] = M[Lmax + σj]) and transforms its type from any type different from intL to undef.

property (1): Tj,x = L − {intL} for x = Lmax + σj, and Tj,x = L for x ≠ Lmax + σj.

property (2): φ̄j,Lmax+σj+i(M̄) = undef for i = 0, and φ̄j,Lmax+σj+i(M̄) = M̄[Lmax + σj + i] for i ≠ 0.
Example 3: dup. dup duplicates the topmost stack element S[σj] = M[Lmax + σj]. Property (1) is satisfied (dup can duplicate any type):

Tj,0 × Tj,1 × . . . × Tj,Mmax−1 = L^Mmax

However, property (2) is clearly violated for Lmax + σj + 1; indeed, an M̄ and an M̄′ such that M̄[Lmax + σj] ≠ M̄′[Lmax + σj] and M̄[Lmax + σj + 1] = M̄′[Lmax + σj + 1] = undef yield:

φ̄Lmax+σj+1(M̄) = M̄[Lmax + σj] ≠ M̄′[Lmax + σj] = φ̄Lmax+σj+1(M̄′)

Hence, unlike the previous examples, dup does not lend itself to variable-wise verification. dup belongs to a small family of byte-codes (namely: dup, dup2, dup_x, swap_x, aload, astore and athrow) that 'mix' or 'cross-contaminate' the types of the variables they operate on. The workaround is simple: before starting verification, parse P; whenever one of these problematic instructions is encountered, group all the variables processed by the instruction into one, bigger, 'extended' variable. The algorithm performing this packing operation, Group(P), is described in the next section.
4.4 Grouping Variables
Grouping transforms the list M = {0, 1, 2, · · · , Mmax − 1} into a list G with fewer distinct symbols. All G-elements containing equal symbols are to be interpreted as M[i] cells that must be verified together, as their types are inter-dependent. The algorithm below describes the grouping process. Although in our practical implementation PredictStack(P) was merged into Group(P)'s main loop (this spares the need to save σ0, σ1, . . .), PredictStack(P) was moved here into the initialization phase for the sake of clarity.
Algorithm Group(P)
– Initialize M ← {0, 1, 2, · · · , Mmax − 1}. For the sake of simplicity, we denote by S[i] the elements of M that shadow stack cells and by L[i] the elements of M that shadow local variables⁴.
– An 'unseen' bit is associated to each instruction. All 'unseen' bits are reset.
– Run PredictStack(P) to compute σ0, σ1, . . .

Iterate the following until all the method's byte-codes have been processed exactly once:
– Choose an 'unseen' instruction j. If there aren't any, return the list G ← M and exit. Otherwise, set the 'unseen' bit of the selected instruction.
• If the j-th instruction is a dup, dup2, dup_x or swap_x, then look up the row ℓ corresponding to instruction j in the table below. For every non-empty entry ℓ(k), replace all occurrences of max{S[σj + ℓ(k)], S[σj + k]} in M by min{S[σj + ℓ(k)], S[σj + k]}.

[Table: one row per variant of dup_x {m, n}, dup, dup2 and swap_x {m, n}; for each stack offset k ∈ {4, 3, 2, 1, 0, −1, . . . , −7}, the row gives the offset ℓ(k) of the cell whose group must be merged with that of S[σj + k].]
• If the j-th instruction is an aload <n>, astore <n>, aload_<n> or astore_<n>, then replace all occurrences of max{L[n], S[σj]} in M by min{L[n], S[σj]}.
• If the j-th instruction is an athrow, then replace all occurrences of max{S[0], S[σj]} in M by min{S[0], S[σj]}.

The process is illustrated below by a toy-example.
⁴ i.e. L[i] = M[i] and S[i] = M[i + Lmax].
                              L0 L1 L2 L3 L4 L5 S0 S1 S2 S3 S4 S5
     0  sconst_3         M =   0  1  2  3  4  5  6  7  8  9 10 11
     1  sconst_5         M =   0  1  2  3  4  5  6  7  8  9 10 11
     2  sdiv             M =   0  1  2  3  4  5  6  7  8  9 10 11
     3  pop              M =   0  1  2  3  4  5  6  7  8  9 10 11
     4  aload <1>        M =   0  1  2  3  4  5  6  7  8  9 10 11
     5  aconst_null      M =   0  1  2  3  4  5  1  7  8  9 10 11
     6  aload <2>        M =   0  1  2  3  4  5  1  7  8  9 10 11
     7  aload <3>        M =   0  1  2  3  4  5  1  7  2  9 10 11
     8  aconst_null      M =   0  1  2  3  4  5  1  7  2  3 10 11
     9  dup              M =   0  1  2  3  4  5  1  7  2  3 10 11
    10  swap_x m=2,n=1   M =   0  1  2  3  4  5  1  7  2  3 10 10
    11  if_acmpeq 14     M =   0  1  2  3  4  5  1  7  2  3  3  3
    12  sconst_2         M =   0  1  2  3  4  5  1  7  2  3  3  3
    13  sstore <3>       M =   0  1  2  3  4  5  1  7  2  3  3  3
    14  pop2             M =   0  1  2  3  4  5  1  7  2  3  3  3
    15  pop2             M =   0  1  2  3  4  5  1  7  2  3  3  3
    16  sconst_2         M =   0  1  2  3  4  5  1  7  2  3  3  3
    17  sstore 4         M =   0  1  2  3  4  5  1  7  2  3  3  3
    18  sconst_3         M =   0  1  2  3  4  5  1  7  2  3  3  3
    19  sstore 5         M =   0  1  2  3  4  5  1  7  2  3  3  3
    20  return           M =   0  1  2  3  4  5  1  7  2  3  3  3
                         G =   0  1  2  3  4  5  1  7  2  3  3  3
Given that the largest group of variables (those tagged by 3) has four elements (namely L3, S3, S4 and S5), it appears that the code can be verified with 4-cell frames (instead of 12-cell ones). Having reduced the memory complexity as much as we could, it remains to determine how many passes are required to verify the code. At first glance, seven passes will do, namely:

pass 1: L3 S3 S4 S5
pass 2: L2 S2
pass 3: L1 S0
pass 4: L0
pass 5: L4
pass 6: L5
pass 7: S1
However, given that we anyway pay the price of a 4-cell memory complexity, it would be a pity to re-launch the entire verification process without packing passes 2, 3, 4, 5, 6 and 7 into two additional 4-cell passes. For instance:

pass 1: L3 S3 S4 S5
pass 2: L2 S2 L4 L5
pass 3: L1 S0 L0 S1
This is realized by the algorithm described in the next section.
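Before moving on, note that the repeated 'replace max by min' merges performed by Group(P) above are exactly a union-find whose representative is always the smallest cell index. The following minimal Java sketch (with hypothetical cell numbering L(i) = i and S(i) = Lmax + i) makes this explicit.

    // Union-find over the Mmax cells; the representative is the minimum index,
    // matching "replace all occurrences of max{...} by min{...}".
    static int[] group;

    static void initGroups(int mMax) {
        group = new int[mMax];
        for (int i = 0; i < mMax; i++) group[i] = i;   // each cell is its own group
    }

    static int find(int i) {                           // path-compressed lookup
        return group[i] == i ? i : (group[i] = find(group[i]));
    }

    static void merge(int a, int b) {                  // e.g. merge(L(n), S(sigma_j))
        int ra = find(a), rb = find(b);
        if (ra != rb) group[Math.max(ra, rb)] = Math.min(ra, rb);
    }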
4.5 Bin-Packing
Bin-packing is the following NP-complete problem: given a set of n positive integers U = {u1, u2, . . . , un} and a positive bound B, divide U into k disjoint subsets U = U1 ∪ U2 ∪ . . . ∪ Uk such that:

– the sum of all elements in each subset Ui, denoted ΣUi, is smaller than B;
– the number of subsets k is minimal.

Although no efficient algorithm can solve this problem exactly, a number of efficient algorithms that find very good approximate solutions (i.e. k′ close to k) exist [15, 16, 17]. Bin-packing (approximation) algorithms come in two flavors: on-line and off-line ones. On-line algorithms receive the ui's one after another and place ui in a subset before getting ui+1. Although the on-line constraint is irrelevant to our case (we dispose of the entire set U as Group(P) ends), very simple on-line algorithms [14] computing approximations tighter than k ≤ k′ ≤ (17/10)k + 2 exist.

First-Fit: places ui in the leftmost Uj that has enough space to accommodate ui. If no such Uj is found, then a new Uj is opened.

Best-Fit: places ui in the Uj that ui fills up the best. In other words, ui is added to the Uj that minimizes B − ΣUj − ui. In case of a tie, the lowest index j is chosen. If no such Uj is found, then a new Uj is opened.

Refined versions of these algorithms (e.g. Yao's Refined First-Fit) even find approximations tighter than k ≤ k′ ≤ (5/3)k + 5.

Off-line algorithms perform much better. Best-Fit and First-Fit can be improved by operating on a sorted U. In other words, the biggest ui is placed first, then the second-biggest ui, etc. The resulting algorithms are called First-Fit-Decreasing and Best-Fit-Decreasing and yield approximations tighter than k ≤ k′ ≤ (11/9)k + 4. Note that the implementations of both Best-Fit-Decreasing and First-Fit-Decreasing on 8-bit micro-controllers are trivial.

We denote by {v1, . . . , vk} ← BinPacking(G) the following algorithm:
– Let ui be the number of occurrences of symbol i in G. Let B = max{ui}. Initialize N ← G.
– Solve {U1, . . . , Uk} ← BestFitDecreasing(B; {u1, . . . , un}).
– For i ← 1 to k: if uj was placed in Ui, then replace all occurrences of j in N by βi = min{j : uj ∈ Ui}.
– Let {v1, v2, · · · , vk} be a set of Mmax-element vectors initialized to zero.
– For i ← 1 to k:
• w ← 1
• For ℓ ← 0 to Mmax − 1: if N[ℓ] == βi, set vi[ℓ] ← w and w ← w + 1.
– Return {v1, v2, · · · , vk}
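For concreteness, here is a minimal Java sketch of the First-Fit-Decreasing strategy (Best-Fit-Decreasing, used above, differs only in how the destination bin is chosen); sizes holds the group sizes ui and cap is the bound B.

    // First-Fit-Decreasing: sort items by decreasing size, place each in the
    // leftmost bin with room, opening a new bin when none fits.
    static int[] firstFitDecreasing(int[] sizes, int cap) {
        Integer[] order = new Integer[sizes.length];
        for (int i = 0; i < sizes.length; i++) order[i] = i;
        java.util.Arrays.sort(order, (a, b) -> sizes[b] - sizes[a]);  // biggest first
        java.util.List<Integer> load = new java.util.ArrayList<>();   // bin fill levels
        int[] bin = new int[sizes.length];                            // item -> bin index
        for (int idx : order) {
            int chosen = -1;
            for (int j = 0; j < load.size(); j++)
                if (load.get(j) + sizes[idx] <= cap) { chosen = j; break; }
            if (chosen == -1) { load.add(0); chosen = load.size() - 1; }
            load.set(chosen, load.get(chosen) + sizes[idx]);
            bin[idx] = chosen;
        }
        return bin;
    }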
Hence, the effect of BinPacking on the previous example's output would be:

    M = 0 1 2 3 4 5 6 7 8 9 10 11
            ↓ Group (a collection of 7 variable groups)
    G = 0 1 2 3 4 5 1 7 2 3  3  3
            ↓ BinPacking (a collection of 3 ≤ 7 variable groups)
    N = 0 0 2 3 2 2 0 0 2 3  3  3

    U1 → v1 = 1 2 0 0 0 0 3 4 0 0 0 0
    U2 → v2 = 0 0 1 0 2 3 0 0 4 0 0 0
    U3 → v3 = 0 0 0 1 0 0 0 0 0 2 3 4
4.6 Putting the Pieces Together
The group-wise verification process GWVer(P, v) mimics Sun's original algorithm very closely. There are only two fundamental differences between the two algorithms:

– In GWVer(P, v) each frame contains only µ = max(v[i]) memory cells (denoted in(i) = {T[0], . . . , T[µ − 1]}) instead of Mmax cells.
– Whenever Sun's verifier reads or writes a variable M[i] in some in(·), GWVer(P, v) substitutes this operation by a read or write of the memory cell T[v[i] − 1] in in(i).

Hence, we built a memory interface to Sun's algorithm so that execution requires an O(µ × Nblocks) memory-complexity instead of an O(Mmax × Nblocks) one. The entire process is summarized in the following schematic:

                                   P
                                   ↓
                                 Group
                                   ↓
                 a collection of n variable groups G
                                   ↓
                               BinPacking
                                   ↓
       a collection of k ≤ n variable groups U = U1 ∪ U2 ∪ . . . ∪ Uk
          ↓                        ↓                        ↓
          v1          · · ·        vi          · · ·        vk
          ↓                        ↓                        ↓
    P → GWVer                P → GWVer                P → GWVer
          ↓                        ↓                        ↓
    accept/reject            accept/reject            accept/reject
          ↓                        ↓                        ↓
                              logical and
                                   ↓
                            accept/reject P
To evaluate the above process experimentally, we wrote a simple program that splits variables into categories for a given *.jca file and counts the number of RAM cells necessary to verify its most greedy method. We used for our estimates the representative Java Card applets from [13]. The detailed outputs of our program are available upon request from the authors. The results are rather encouraging: the new verification strategy seems to save roughly 40% of the memory claimed by [8]. The increase in workload is a rough doubling of verification time (due to more complex bookkeeping and the few inherent extra passes traded-off against memory consumption).

    Applet             Sun [8]   Group-Wise
    NullApp.jca             6       4 =    6 × 66%
    HelloWorld.jca         40      12 =   40 × 30%
    JavaLoyalty.jca        48      45 =   48 × 93%
    Wallet.jca             99      55 =   99 × 55%
    JavaPurse.jca         480     200 =  480 × 41%
    Purse.jca             550     350 =  550 × 63%
    CryptoApplet.jca     4237    2230 = 4237 × 52%
Acknowledgments The authors would like to thank Jacques Stern for his help concerning a number of technical points.
References

[1] A. Aho, R. Sethi, J. Ullman, Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1986. 125
[2] Z. Chen, Java Card Technology for Smart Cards: Architecture and Programmer's Guide, The Java Series, Addison-Wesley, 2000. 118, 130
[3] R. Cohen, The defensive Java virtual machine specification, Technical Report, Computational Logic Inc., 1997. 119
[4] P. Cousot, R. Cousot, Abstract Interpretation: a Unified Lattice Model for Static Analysis by Construction or Approximation of Fixpoints, Proceedings of POPL'77, ACM Press, Los Angeles, California, pp. 238-252. 121
[5] X. Leroy, Java Byte-Code Verification: an Overview, In G. Berry, H. Comon, and A. Finkel, editors, Computer Aided Verification, CAV 2001, volume 2102 of Lecture Notes in Computer Science, pp. 265-285, Springer-Verlag, 2001. 120
[6] X. Leroy, On-Card Byte-code Verification for Java Card, In I. Attali and T. Jensen, editors, Smart Card Programming and Security, proceedings E-Smart 2001, volume 2140 of Lecture Notes in Computer Science, pp. 150-164, Springer-Verlag, 2001. 119
[7] X. Leroy, Bytecode Verification for Java Smart Card, Software Practice & Experience, 32:319-340, 2002. 119
[8] T. Lindholm, F. Yellin, The Java Virtual Machine Specification, The Java Series, Addison-Wesley, 1999. 119, 120, 122, 124, 129, 136
[9] G. McGraw, E. Felten, Securing Java, John Wiley & Sons, 1999. 119
[10] S. Muchnick, Advanced Compiler Design and Implementation, Morgan Kaufmann, 1997. 125
[11] G. Necula, Proof-carrying code, Proceedings of POPL'97, pp. 106-119, ACM Press, 1997. 119
[12] D. Schmidt, Denotational Semantics, a Methodology for Language Development, Allyn and Bacon, Boston, 1986.
[13] P. Bieber, J. Cazin, A. El-Marouani, P. Girard, J.-L. Lanet, V. Wiels, G. Zanon, The PACAP prototype: a tool for detecting Java Card illegal flows, In I. Attali and T. Jensen, editors, Java on Smart Cards: Programming and Security, vol. 2041 of Lecture Notes in Computer Science, pp. 25-37, Springer-Verlag, 2001. 136
[14] A. Yao, New algorithms for bin packing, Journal of the ACM, 27(2):207-227, April 1980. 134
[15] W. de la Vega, G. Lueker, Bin packing can be solved within 1+ε in linear time, Combinatorica, 1(4):349-355, 1981. 134
[16] D. Johnson, A. Demers, J. Ullman, M. Garey, R. Graham, Worst-case performance bounds for simple one-dimensional packing algorithms, SIAM Journal on Computing, 3(4):299-325, December 1974. 134
[17] B. Baker, A new proof for the first-fit decreasing bin-packing algorithm, SIAM Journal Alg. Disc. Meth., 2(2):147-152, June 1981. 134
Sliding Properties of the DES Key Schedule and Potential Extensions to the Slide Attacks

Raphael Chung-Wei Phan¹ and Soichi Furuya²

¹ Swinburne Sarawak Institute of Technology, 1st Floor, State Complex, 93576 Kuching, Sarawak, Malaysia. [email protected]
² Systems Development Lab, Hitachi, Ltd., Japan. [email protected]
Abstract. The DES key schedule is linear and yet defeats related-key cryptanalysis and other attacks that exploit weaknesses in key schedules, for example the slide attacks. In this paper we present new interesting key-pairs of the DES that can be used in related-key attacks to produce sliding properties of the full-round DES key schedule. This is a sort of key schedule weakness from a slide attack point of view. Our results demonstrate the first known sliding property of the original DES key schedule for its full 16 rounds. Finally, we consider potential applications of these findings in extending the slide attacks. Keywords: Block ciphers, cryptanalysis, DES, key schedule, extended slide attacks, sliding property
1 Introduction
The key schedule of the Data Encryption Standard (DES) is one of the most analyzed key schedules among block ciphers. Previous researchers have noted that though linear, it is resistant against related-key attacks [1, 6]: "As an open question, we note that the DES key schedule is linear, and wonder why it appears to resist related-key attacks." The first intent of our paper is to present new potentially weak key-pairs of the DES. We show that for every key of the DES, there is another key (which is a simple bit permutation of the original key) such that many of the round keys generated by this key-pair are the same. More interestingly, the equal round keys occur in the same sequence, which is often advantageous in slide attacks [2, 3]. Since each round is fully dependent on the round key, this means that the encryptions by the key-pair would have many similar rounds, a phenomenon that we call the sliding property of the key schedule. This is a sort of key-schedule weakness from the slide attacks' point of view. What is more intriguing about this result is that it demonstrates the first-known sliding property of the original unmodified DES key schedule, for the full 16 rounds. The best-known previous
result was on a much weakened variant of the DES key schedule with a constant number of shifts [1]. The reader is also referred to the Appendix for a summary of previous slide attacks on DES variants. Secondly, we wish to kick-start more active research into the study of linear key schedules, especially that of the DES. Such an effort is not without its motivations: the DES key schedule, besides being the most analyzed key schedule since 1977, is linear and yet remains strong, defying the popular belief among cryptographers that strong key schedules should be nonlinear. As a head start in this direction, we consider potential ways to exploit the DES key-pairs in extended versions of the slide attacks. So far, we have not found any obvious way to mount a practical attack based on the sliding property we describe. Nevertheless, we believe that the property has potential weaknesses, and further investigation based on our observations is needed. Of independent interest in this work is the detailed study of more advanced ways to extend the slide attacks. Previous slide attacks [2, 3] considered sliding encryptions with self-similar round sequences of only one type. We extend on this work by suggesting ways to slide encryptions with two types of similar round sequences. This paper is organized as follows: Section 2 describes the DES key schedule. We present our new key-pairs of the DES in Section 3, and show how they can be used in a simple related-key attack to produce a sliding property for the full 16 rounds of DES. In Section 4, we consider potential ways to exploit this sliding property in extending the slide attacks on the DES. In particular, we suggest how to slide DES encryptions with two types of self-similar rounds. We conclude in Section 5 and motivate areas for further research.
2 The DES Key Schedule
The key schedule of DES takes a 64-bit secret key which is passed through a permutation PC1 that removes the parity bits, resulting in a 56-bit key K. Since this permutation is of no cryptographic importance, the secret key of DES is normally assumed to be 56 bits in length. The 56-bit key K is divided into two halves, C0 and D0, each of 28 bits; hence we have K = C0||D0, where || denotes concatenation. The round keys Ki, where i ∈ {1, . . . , 16}, are defined as Ki = PC2(Ci||Di), where Ci = LSi(Ci−1), Di = LSi(Di−1), PC2 is a permutation, and LSi is a left circular shift by the number of positions given in Table 1.
Table 1. Circular shifts in the key schedule of DES

      i    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
     LSi   1  1  2  2  2  2  2  2  1  2  2  2  2  2  2  1
     a[i]  1  2  4  6  8 10 12 14 15 17 19 21 23 25 27 28
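The shift schedule is easy to implement directly from the accumulated values a[i]. The sketch below is an illustration, not reference code: it represents the 56-bit post-PC1 key as a Java long whose upper and lower 28-bit halves are C0 and D0, and omits PC2 itself.

    // Accumulated shifts a[i] of Table 1; preRoundKey(k, i) is C_i||D_i, i.e. the
    // 56-bit state from which round key K_i = PC2(C_i||D_i) is extracted.
    static final int[] A = {1, 2, 4, 6, 8, 10, 12, 14, 15, 17, 19, 21, 23, 25, 27, 28};

    static long rot28(long half, int s) {          // 28-bit circular left shift
        return ((half << s) | (half >>> (28 - s))) & 0x0FFFFFFFL;
    }

    static long La(long k, int s) {                // L_s: rotate both halves by s
        long c = (k >>> 28) & 0x0FFFFFFFL, d = k & 0x0FFFFFFFL;
        return (rot28(c, s) << 28) | rot28(d, s);
    }

    static long preRoundKey(long k, int round) {   // round = 1..16
        return La(k, A[round - 1]);
    }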
In this paper, we will use the alternative representation introduced in [7], and define La[i] (C0 ||D0 ) = LSa[i] (C0 )||LSa[i] (D0 ) where a[i] is the accumulated number of shifts given in Table 1. Hence, the round keys are then Ki = P C2(La[i] (C0 ||D0 )) = P C2(La[i] (K)).
3 Sliding Properties of the DES Key Schedule
The key schedule of the DES is linear in nature; hence certain pairs of keys (potentially weak key-pairs) generate common round keys for some of the rounds, as was demonstrated by Knudsen in [7].

Theorem 1 [7]. For every key K, there exists a key K′ such that

Ki+1 = K′i;   i ∈ {2, . . . , 7} ∪ {9, . . . , 14}    (1)

i.e., K and K′ have 12 common round keys.

Proof. Given a key K, there is another key K′ = L2(K) such that K3 = PC2(L4(K)) = PC2(L2(L2(K))) = PC2(L2(K′)) = K′2. Similarly, it can be shown that Ki+1 = K′i holds for i ∈ {2, . . . , 7}. Furthermore, K9 = PC2(L15(K)) whereas K′8 = PC2(L14(K′)) = PC2(L14(L2(K))) = PC2(L16(K)). Hereafter, the round keys get 're-synchronized' since K10 = PC2(L17(K)) = PC2(L15(L2(K))) = PC2(L15(K′)) = K′9. And similarly Ki+1 = K′i holds for i ∈ {9, . . . , 14}.
Theorem 1 illustrates that if we have a pair of keys K and K′ = L2(K), then they would generate 12 round keys in common. Nevertheless, such a property still resists slide and related-key attacks since there is a single 'unsynchronized round' in the middle of the key schedule with different round key values (in this case round key 9 of K and round key 8 of K′). Interestingly, we have discovered key-pairs that do not have any unsynchronized round keys in the middle, making them more desirable in terms of mounting slide and related-key attacks:

Lemma 1. For every key K, there exists a key K′ such that

Ki+7 = K′i;   i ∈ {1, . . . , 9}    (2)

i.e., K and K′ have 9 common round keys.

Proof. Given a key K, there is another key K′ = L13(K) such that K8 = PC2(L14(K)) = PC2(L1(L13(K))) = PC2(L1(K′)) = K′1.
Similarly, it can be shown that Ki+7 = K′i holds for i ∈ {1, . . . , 9}.
The key-pairs presented in Lemma 1 cause 9 rounds of the DES encryptions to have common (equal) round keys. A naive consequence of this is to apply it in a simple related-key attack on DES. Request the encryption of a plaintext P under the key K, denoted by C = EK(P), and the encryption of P′ under another related key, K′ = L13(K), denoted by C′ = EK′(P′). Then rounds 8 to 16 of the first encryption would be equal to rounds 1 to 9 of the second encryption, since they share common round keys. We then have the relationships

G7(P) = P′    (3)
H7(C) = C′    (4)

where G7 denotes the first 7 rounds of DES encryption and H7 denotes the last 7 rounds of DES encryption. Nevertheless, mounting a key analysis phase that exploits the above relationships is hard since G7 and H7 are not weak¹. But we can do better than that. It is obvious that the key-pairs given in Lemma 1 have 9 round keys in common, but there are 6 other round keys that are implicitly in common, though less obvious.

Theorem 2. For every key-pair K and K′ = L13(K),

Ki+7 = K′i;   i ∈ {1, . . . , 9}    (5)
Ki−8 = K′i;   i ∈ {10, . . . , 15}    (6)
i.e., K and K′ have 9 + 6 = 15 common round keys.

Proof. Equation (5) was proven in Lemma 1, hence it suffices to prove only (6). Given a key-pair K and K′ = L13(K), then

K′10 = PC2(L17(K′)) = PC2(L17(L13(K))) = PC2(L30(K)) = PC2(L2(K)) = K2.

Note that L30(K) = L2(K), since La[i](K) is a left-shift of the two 28-bit halves of K, so it recycles back after consecutive shifts of 28 bits. Similarly, it can be shown that Ki−8 = K′i holds for i ∈ {10, . . . , 15}.

Theorem 2 presents an interesting sliding property of the DES key schedule. In particular, given a key-pair K and K′ = L13(K), the DES encryptions keyed by this key-pair have 15 rounds in common, and they occur in the same sequence.
¹ Following the definition in [2], a function F is weak if, given two equations, it is easy to extract the key K. Consult [2] for further details.
More formally, let:
– Ea(·) denote the DES variant reduced to 6 rounds, keyed by K2, K3, . . . , K7 (or equivalently K′10, K′11, . . . , K′15);
– Eb(·) denote the DES variant reduced to 9 rounds, keyed by K8, K9, . . . , K16 (or equivalently K′1, K′2, . . . , K′9);
– E1(·) denote the first round of DES encryption keyed by K;
– E16(·) denote the last round of DES encryption keyed by K′.
Then, our two encryptions (one keyed by K and the other keyed by K′) satisfy

C = EK(P) = Eb(Ea(E1(P))),    (7)

and

C′ = EK′(P′) = E16(Ea(Eb(P′))).    (8)

Illustrated pictorially²:

P  → E1 ◦ Ea ◦ Eb  → C
P′ → Eb ◦ Ea ◦ E16 → C′

Notice the same sequence of rounds, Ea and Eb, between the two encryptions. By exploiting this sliding property such that all the similar rounds are aligned together, we will only have 1 round that is unaligned. More importantly, the only unaligned round is located at the edge of the similar sequences, which is sometimes eliminated in many cryptanalytic attacks. Even though conventional slide attacks and related-key attacks do not seem to work on such a property, it is possible to extend the slide attacks to exploit this. This is discussed in the next section.
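The sliding property of Theorem 2 can also be confirmed mechanically: since PC2 is a fixed permutation, equality of the pre-PC2 states Ci||Di implies equality of the corresponding round keys. A sketch reusing the La/preRoundKey helpers from the key-schedule sketch in Section 2:

    // Checks K_{i+7} = K'_i for i = 1..9 and K_{i-8} = K'_i for i = 10..15,
    // where K' = L13(K); this holds for every 56-bit key k.
    static boolean hasSlidingProperty(long k) {
        long kPrime = La(k, 13);
        for (int i = 1; i <= 9; i++)
            if (preRoundKey(k, i + 7) != preRoundKey(kPrime, i)) return false;
        for (int i = 10; i <= 15; i++)
            if (preRoundKey(k, i - 8) != preRoundKey(kPrime, i)) return false;
        return true;
    }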
4 Extended Slide Attacks Based on the DES Key-Pairs
In this section, we consider how the slide attacks can potentially be extended and applied to the DES by exploiting its key-pairs given in Theorem 2. The DES key-pairs cause their corresponding encryptions to have two types of similar round sequences, Ea and Eb. The conventional slide attacks (including sliding with a twist and the complementation slide [3]) are unsuccessful in this scenario, since they only work on self-similar round sequences of one type, but here we show two possible ways to overcome this limitation. The first extends a concept first outlined in [3]. We then propose an extended related-key slide attack called the double slide attack.

² For ease of illustration, the composition of functions f and g is denoted by f ◦ g, where f is applied first.
4.1 Extending the Domino Effect
We briefly recap the ideas of the conventional slide attacks [2]. In a typical slide attack, we consider the encryption C = EK(P) = F ◦ F ◦ . . . ◦ F(P) = F^r(P) as a composition of r round functions F, where each round function is similar to the others. Such a cipher is one with one-round self-similarity. Two such encryptions are then slid against each other such that they are one round out of phase, namely:

P  → F ◦ F ◦ F ◦ . . . ◦ F → C
P′ → F ◦ F ◦ . . . ◦ F ◦ F → C′

Since all the slid rounds are similar to each other, we would have:

P′ = F(P)    (9)
C′ = F(C).    (10)
These are called the slid equations, and the pair (P, P′) with their corresponding ciphertexts (C, C′) is called a slid pair if it satisfies the slid equations. We also recall an important observation in Section 3.5 of [3], where it was mentioned that given a slid pair (P, C) and (P′, C′), where C = EK(P) = F^r(P) and C′ = EK(P′) = F^r(P′), if we request the ciphertexts Cj and C′j after encryption of P and P′ a multiple of j times (1 ≤ j ≤ M), we will get M extra slid pairs 'for free'. We formalize this under the term domino effect.

Theorem 3 [3] (The Domino Effect). Consider a plaintext P. Request, a multiple of j times, the encryption of P, to obtain the ciphertexts Cj (1 ≤ j ≤ M). Then if P′ = F(P), we will get M pairs for free that also satisfy the relation C′j = F(Cj).

Proof. Denote by F the round function and by F^r the full encryption, where r is the total number of rounds of the cipher. Then P′ = F(P) implies that C′1 = F^r(P′) = F^r(F(P)) = F(F^r(P)) = F(C1). Also, since C′1 = F(C1), then C′2 = F^r(C′1) = F^r(F(C1)) = F(F^r(C1)) = F(C2). Hence, C′j = F^r(C′j−1) = F^r(F(Cj−1)) = F(F^r(Cj−1)) = F(Cj).
The domino effect can be used to generate free slid pairs from a detected slid pair. Supposing that we have a slid pair (P, P′), if we request the encryption of P and P′ j multiple times (1 ≤ j ≤ M), we get the ciphertext pairs (C1, C′1), (C2, C′2), . . . , (CM, C′M), or depicted pictorially:

P  → F ◦ F ◦ . . . ◦ F → C1  → F ◦ F ◦ . . . ◦ F → C2  → . . . → F ◦ F ◦ . . . ◦ F → CM
P′ → F ◦ . . . ◦ F ◦ F → C′1 → F ◦ . . . ◦ F ◦ F → C′2 → . . . → F ◦ . . . ◦ F ◦ F → C′M
Here, each ciphertext-pair (Cj, C′j) would form a slid pair. The domino effect propagates all the way to the last ciphertext-pair (CM, C′M), since all the slid rounds are similar to each other. Notice that if we have two encryption sequences with alternating round functions of the form:

P  → F ◦ G ◦ . . . ◦ G → C1  → F ◦ G ◦ . . . ◦ G → C2  → . . . → F ◦ G ◦ . . . ◦ G → CM
P′ → G ◦ . . . ◦ G ◦ F → C′1 → G ◦ . . . ◦ G ◦ F → C′2 → . . . → G ◦ . . . ◦ G ◦ F → C′M

where F and G denote different round functions, the domino effect still applies. Returning to our situation with the DES, we have two such alternating encryption sequences:

P  → E1 ◦ Ea ◦ Eb → C
P′ → Eb ◦ Ea ◦ E16 → C′

except for the initial (respectively final) round function E1 (respectively E16). A potential approach to extend the slide attacks to apply to alternating encryption sequences is as follows:
1. (DESK Oracle Calls): Obtain 2^32 plaintexts P^i (i ∈ {1, 2, . . . , 2^32}). For each P^i, request the ciphertexts C_j^i after multiple encryptions C_1^i, C_2^i, . . . , C_M^i such that

C_1^i = EK(E1^−1(P^i))    (11)

and

C_{j+1}^i = EK(E1^−1(C_j^i)).    (12)

2. (DESK′ Oracle Calls): Meanwhile, for another 2^32 plaintexts P′^i (i ∈ {1, 2, . . . , 2^32}), obtain related-key queries for the ciphertexts C′_j^i after multiple encryptions C′_1^i, C′_2^i, . . . , C′_M^i such that

C′_1^i = E16^−1(EK′(P′^i))    (13)

and

C′_{j+1}^i = E16^−1(EK′(C′_j^i)).    (14)
Clearly, our objective is to push E1 and E16 out of the way so that we have a continuous alternating sequence of Ea and Eb in between consecutive ciphertexts. As an illustration, sliding the two encryptions by this approach, we would then have:
P^i  → Ea ◦ Eb ◦ Ea ◦ Eb ◦ . . . ◦ Ea ◦ Eb → C_M^i
P′^i → Eb ◦ Ea ◦ Eb ◦ . . . ◦ Ea ◦ Eb ◦ Ea → C′_M^i

Nevertheless, the text requirements are high. Consider the first sequence. To compute E1^−1, we need to guess all 2^48 values of K1 before asking for adaptively
chosen plaintexts to be encrypted. This means we will need 2^48 × 2^32 adaptively chosen plaintexts. For the second sequence, we need another 2^48 × 2^32 related-key adaptively chosen plaintexts. This amount of text requirements makes this approach currently impractical.
4.2 Double Slide Attack
In this section, we describe a novel related-key technique for extending the slide attacks, which we term the double slide attack. Looking at the interesting structure of the two encryptions as given in (7) and (8), we consider extending the slide attack by sliding the two encryptions two times instead of just once (the double slide). A first slide causes the encryptions to be 7 rounds out of phase, so that the last 9 rounds of the first encryption and the first 9 rounds of the second are aligned, and we have the situation where:

P  → E1 ◦ Ea ◦ Eb  → C
P′ → Eb ◦ Ea ◦ E16 → C′

The plaintext and ciphertext pairs (P, P′) and (C, C′) would then satisfy the slid equation

Ea(E1(P)) = P′    (15)

and we get 'for free' another slid equation of the form:

E16(Ea(C)) = C′.    (16)
Now, consider the first 7 rounds of the first encryption, denoted by E1 ◦ Ea, and the last 7 rounds of the second, denoted by Ea ◦ E16. If we slide these rounds a second time such that they are one round out of phase, we have:

P → E1 ◦ Ea  → P′
C → Ea ◦ E16 → C′
so another slid equation emerges, of the form

E1(P) = C    (17)

and we immediately obtain another slid equation 'for free':

E16(P′) = C′.    (18)
This presents a very interesting result. For one thing, we have four slid equations, instead of two as in the conventional slide attacks. This would allow one to impose more conditions on the possible pairs and hence aid in the check for the double-slid pairs. The slid equations in (17) and (18) also relate the plaintext and ciphertext pairs by only one DES round. Furthermore, since DES is a Feistel cipher, we automatically obtain a 64-bit filtering condition on the pairs from both the slid equations in (17) and (18). In particular, we denote PL and PR
(respectively CL and CR) as the left and right halves of the plaintext P (respectively ciphertext C). Then for DES, the slid equations in (17) and (18) mean that PR = CL and, similarly, P′R = C′L. A possible attack begins by obtaining the encryptions of a pool of 2^64 known texts P keyed by both the unknown key K and a related key K′ = L13(K). We will use C and C′ to denote the two encryptions respectively. From this pool of plaintexts, we can form 2^128 pairs P and P′ ≠ P, among which we expect by the birthday paradox that one double-slid pair would exist that satisfies the slid equations in (17) and (18). In fact, we can discard most of the plaintexts on the fly, picking only those where the right halves of P equal the left halves of C, and similarly for P′ and C′. This leaves us with 2^32 known texts (P, C) and another 2^32 known texts (P′, C′). Since E1 and E16 are weak, then given (P, C) and (P′, C′), it is easy to extract the related keys K1 and K′16 respectively, with an effort roughly equal to one round of DES encryption. Each of the 2^32 known texts (P, C) suggests 2^16 possible candidates for the 48-bit value K1, so in total 2^48 values of K1 would be suggested. Similarly, the 2^32 known texts (P′, C′) suggest 2^48 possible values of K′16. Now, we have 2^48 × 2^48 = 2^96 possible combinations of (K1, K′16), where it can be shown that K1 and K′16 in fact share 42 bits in common. Nevertheless, we are unable to significantly narrow down the list of possible values of K1 and K′16. This is mainly due to the fact that the round function of the DES contains S-boxes that are not invertible, so given two known texts (P, C) there is no way to work backwards through the S-boxes and compute all 48 bits of a candidate round key. Instead, we can only compute 32 bits of a round key.
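The on-the-fly filtering is a one-line test per text; the sketch below assumes 64-bit texts packed into a Java long with the left half in the high 32 bits (DES's initial and final permutations are ignored here, as in the discussion above).

    // Keep a pair (P, C) only if P_R = C_L, as required by slid equation (17)
    // for a one-round Feistel relation; the same test applies to (P', C')
    // for equation (18).
    static boolean passesSlideFilter(long p, long c) {
        long pRight = p & 0xFFFFFFFFL;            // P_R
        long cLeft  = (c >>> 32) & 0xFFFFFFFFL;   // C_L
        return pRight == cLeft;
    }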
5 Conclusion and Open Problems
We have presented new potentially weak key-pairs of the DES that cause 15 out of 16 rounds of the DES to have common round keys. More interestingly, the equal round keys occur in the same sequence, which is often advantageous in slide attacks, plus the fact that the only unequal round is located at the edge of the sequences, which is sometimes eliminated in many cryptanalytic attacks. Our results also demonstrate the first known sliding property of the original unmodified DES key schedule for its full 16 rounds. This is an interesting property and is a sort of key schedule weakness from the slide attacks’ point of view. We strongly believe that a more detailed study of exploiting the sliding property of the DES key schedule is required. We also considered potential ways to extend the conventional slide attacks on the DES based on these key-pairs. So far, the conventional slide attacks are applicable to encryptions with only one type of self-similar round sequences. We suggested two possible ways in which we can overcome this limitation by extending the consideration to encryptions with two types of similar round sequences. It remains an open problem whether further enhancements can be made to the extensions that we have discussed in Section 4. It would also be interesting
to see if the key schedules of other block ciphers possess similar properties that could be exploited in extended slide attacks.
Acknowledgements We would like to thank Alex Biryukov and David Wagner for their comments and interest in this work. We are also grateful to the anonymous referees whose comments and suggestions helped to improve this paper.
References [1] Biham, E.: New Types of Cryptanalytic Attacks Using Related Keys. Journal of Cryptology, Vol.7, Springer-Verlag (1994) 229–246 138, 139, 147, 148 [2] Biryukov, A., Wagner, D.: Slide Attacks. Proceedings of Fast Software Encryption’99, LNCS 1636, Springer-Verlag (1999) 245–259 138, 139, 141, 143, 147 [3] Biryukov, A., Wagner, D.: Advanced Slide Attacks. Proceedings of Eurocrypt 2000, LNCS 1807, Springer-Verlag (2000) 589–606 138, 139, 142, 143, 147, 148 [4] Brown, L., Seberry, J.: Key Scheduling in DES Type Cryptosystems. Proceedings of AUSCRYPT’90, LNCS 453, Springer-Verlag (1990) 221–228 148 [5] Furuya, S.: Slide Attacks with a Known-Plaintext Cryptanalysis. Proceedings of ICISC 2001, LNCS 2288, Springer-Verlag (2002) 214–225 147, 148 [6] Kelsey, J., Schneier, B., Wagner, D.: Key-Schedule Cryptanalysis of IDEA, GDES, GOST, SAFER, and Triple-DES. Proceedings of Crypto’96, LNCS 1109, Springer-Verlag (1996) 237–251 138, 147 [7] Knudsen, L. R.: New Potentially ’Weak’ Keys for DES and LOKI (Extended abstract). Proceedings of Eurocrypt’94, LNCS 950, Springer-Verlag (1994) 419–424 140
Appendix: A Summary of Previous Slide Attacks on DES Variants

The DES key schedule has received considerable attention from cryptanalysts. Among the notable cryptanalysis results are numerous slide attacks [1, 2, 3, 5] on its variants. As the slide attacks typically exploit weaknesses in key schedules, these results shed some light on the design and security of the DES key schedule. The first known slide attack, presented in 1994 [1], was actually a related-key slide attack³ that was applied to a DES variant with constant left shifts in the key schedule, a much weakened variant by today's standards. It requires a complexity of 2^17 related-key chosen plaintexts or 2^33 related-key known plaintexts. Later, in 1999 [2], the slide attacks were applied to 2K-DES, a DES variant with 64 rounds which alternately uses two independent 48-bit round keys, K1 and K2, in the odd and even rounds respectively. Note that the number

³ Also referred to as rotational related-key cryptanalysis [6].
of rounds was limited to 64 for ease of illustration; in fact, the attack is equally applicable to 2K-DES with an infinite number of rounds. The slide attack requires 2^33 adaptively chosen plaintexts and 2^33 work, or 2^32 known plaintexts and 2^50 work. Notice that the scenario here is a known- or chosen-plaintext slide attack, and does not require related-key queries as was the case in the related-key slide attack presented in the preceding paragraph [1]. A year later [3], more advanced slide attacks, namely the complementation slide and sliding with a twist, were introduced and again mounted on 2K-DES, reducing the effort to 2^32 known plaintexts and 2^33 work, or 2^17 chosen plaintexts/ciphertexts and 2^17 work. Furthermore, a slightly more complex DES variant, 4K-DES, which is alternately keyed by 4 independent 48-bit round keys K1, . . . , K4, was also considered. It was demonstrated that for a fraction of 1/2^16 of all keys, the advanced slide attacks were applicable to 4K-DES, requiring 2^17 chosen plaintexts/ciphertexts and 2^17 work. Also, for the same fraction of all keys, a similar attack was presented on another DES variant with a key schedule proposed by Brown-Seberry [4], requiring just 128 chosen plaintexts/ciphertexts and 2^7 work. In 2001, an extended slide attack [5], which we feel should rightly be called the slide-linear attack, was presented and mounted on 4K-DES. The slide-linear attack is basically a fusion of the concepts of linear cryptanalysis into the slide attacks. In essence, a conventional slide attack is first applied, and then the unslid rounds, being 4-round DES in the case of 4K-DES and obviously not weak, are attacked with linear cryptanalysis. The attack requires 2^44 chosen plaintexts and 2^62 work.
Consistent Differential Patterns of Rijndael Beomsik Song and Jennifer Seberry Centre for Computer Security Research School of Information Technology and Computer Science University of Wollongong Wollongong 2522, Australia {bs81,jennifer seberry}@uow.edu.au
Abstract. Rijndael is an SPN (Substitution Permutation Network) structure block cipher, which was recently selected as the AES (Advanced Encryption Standard) algorithm. In this paper, we describe some algebraic properties of the basic functions used in Rijndael, and introduce consistent differential patterns of this cipher. We then describe how these properties can be applied to the cryptanalysis of this cipher. Keywords: Consistent Differential Patterns, Differential Characteristics, Cryptanalysis, Rijndael.
1 Introduction
An SPN (Substitution Permutation Network) structure block cipher, Rijndael [1, 4], was recently selected as the AES (Advanced Encryption Standard) algorithm. This cipher has been reputed to be secure against DC (Differential Cryptanalysis) and LC (Linear Cryptanalysis) [4, 7], but some cryptanalysts have presented several cryptanalytic methods using algebraic properties of the functions used in this cipher, or the key schedule of this cipher, to attack reduced variants of Rijndael [2, 3, 4, 8, 9]. Among these cryptanalytic methods, the Square attack [4, 5] has presented a method to attack six-round Rijndael using a third-round distinctive output property of this cipher (a consistent output property led to by some chosen plaintexts), and impossible differential cryptanalysis [2] has introduced an impossible differential property, which never appears on the fourth round of this cipher, for an attack on five-round Rijndael. Also, the collision attack [3] has shown a method to attack seven-round Rijndael using a fourth-round distinctive output property of this cipher. In this paper, we present some algebraic properties of the basic functions used in Rijndael and introduce consistent differential patterns of this cipher, which will be useful for its cryptanalysis. We then show how these properties can be practically applied to the cryptanalysis of Rijndael. In particular, in terms of the consistent differential patterns of this cipher, we have found that
• if two plaintexts of this cipher differ by only one byte, then there are always four pairs of bytes in the second-round output difference with each pair
having the same value (this pattern is consistent as long as the two plaintexts differ by only one byte);
• if two plaintexts of this cipher differ by up to four bytes in certain positions, then the above pattern appears in the second-round output difference as well;
• for any 2^8n plaintexts, which vary in n bytes (any positions) and the other bytes are all the same, if we pair one of these plaintexts with each of the other plaintexts, then any of the output differences is equal to the XOR of the other output differences after the third round;
• for any 2^32 plaintexts, which vary in four certain bytes and the other bytes are all the same, if we pair one of these plaintexts with each of the other plaintexts, then any of the output differences is equal to the XOR of the other output differences after the fourth round.

The complexities of the cryptanalytic methods based on our observations on 128-bit Rijndael are summarised in Table 1, in comparison with some well-known methods which use distinctive output properties of this cipher.

Table 1. Complexities of some cryptanalytic methods against Rijndael

    Attack                   Number of Rounds   Chosen Plaintexts   Time Complexity               Source
    Square                          4           2^9                 2^9                           [4]
                                    5           2^11                2^40                          [4]
                                    6           2^32                2^72                          [4]
    Partial sum                     6           6 × 2^32            2^44                          [8]
                                    7           2^128 − 2^119       2^120                         [8]
    Impossible Differential         5           2^29.5              2^31                          [2]
                                    6           2^91.5              2^122                         [6]
    New method                      3           10                  2^8                           this paper
                                    4           2^6                 2^32                          this paper
                                    5           2^32                2^34 + 2^16 < 2^35            this paper
                                    6           2 × 2^32            1/6 × (2^68 + 2^65) < 2^66    this paper

The main part of this paper is organised as follows: the description of Rijndael is given in Section 2; some significant properties of Rijndael are observed in Section 3; and the applications of the properties to the cryptanalysis of Rijndael are described in Section 4, on three, four, five and six rounds.
2 Description of Rijndael
Rijndael is an SPN structure block cipher, which has variable block lengths and key lengths (128, 192, and 256 respectively). In the standard case, it processes
data blocks of 128 bits with a 128-bit Cipher Key [1, 4]. In this paper we discuss the standard case, because the results of the observations will be similar in the other cases.

[Fig. 1. Functions of Rijndael: byte-grid illustrations of SubBytes (bytewise substitution through the S-box), ShiftRows (cyclic shift in each row) and MixColumns (mix of the four bytes in each column).]

As Figure 1 shows, each byte in the block is substituted bytewise by the SubBytes transformation (using a 256-byte S-box), and then every byte, in each row, is cyclically shifted by a certain value (row #0: 0, row #1: 1, row #2: 2, row #3: 3) by the ShiftRows transformation. After this, all four bytes in each column are mixed through the MixColumns transformation by the matrix formula in Figure 2. Here, each column is considered as a polynomial over GF(2^8) and multiplied with a fixed polynomial 03·x^3 + 01·x^2 + 01·x + 02 (modulo x^4 + 1). After these operations, a 128-bit round key extended from the Cipher Key is XORed in the last part of the round. The MixColumns transformation is omitted in the last round (10th round), but before the first round a 128-bit initial round key is XORed through the initial round key addition routine.
    ⎡ O0c ⎤   ⎡ 02 03 01 01 ⎤ ⎡ i0c ⎤
    ⎢ O1c ⎥ = ⎢ 01 02 03 01 ⎥ ⎢ i1c ⎥
    ⎢ O2c ⎥   ⎢ 01 01 02 03 ⎥ ⎢ i2c ⎥
    ⎣ O3c ⎦   ⎣ 03 01 01 02 ⎦ ⎣ i3c ⎦

Fig. 2. Mixing of four bytes in a column (MixColumn)
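For concreteness, a minimal Java sketch of MixColumn is given below; multiplication by 02 over GF(2^8) is the usual xtime with the AES reduction polynomial x^8 + x^4 + x^3 + x + 1, and 03·b = xtime(b) ⊕ b. This is an illustration of Figure 2, not the designers' reference code.

    // MixColumn: o_r = 02*a_r ^ 03*a_{r+1} ^ a_{r+2} ^ a_{r+3} (indices mod 4).
    static int xtime(int b) {                     // multiply by 02 in GF(2^8)
        return ((b << 1) ^ (((b & 0x80) != 0) ? 0x1B : 0x00)) & 0xFF;
    }

    static int[] mixColumn(int[] a) {             // a = (i0c, i1c, i2c, i3c)
        int[] o = new int[4];
        for (int r = 0; r < 4; r++) {
            int x1 = a[(r + 1) % 4];
            o[r] = xtime(a[r]) ^ (xtime(x1) ^ x1) ^ a[(r + 2) % 4] ^ a[(r + 3) % 4];
        }
        return o;
    }

Property 2 below can be replayed with this sketch: mixColumn applied to (α, β, β, β) indeed returns (γ, α, α, δ) with γ ⊕ δ = α ⊕ β.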
3 Some Properties of Rijndael

3.1 Algebraic Properties of the Basic Functions

We refer here to some mathematical properties of the basic functions used in this cipher: some algebraic properties of the MixColumns transformation and of the SubBytes transformation are discussed.

Properties of the MixColumns Transformation. We first obtain Property 1 from the fact that the function MixColumn described in Figure 2 is a linear function. Although this property looks very simple, it is very useful for reducing the number of round keys that must be guessed to find the key actually used. The advantage of this property is explained in more detail in Section 4.4.

Property 1. Let X = (i0, i1, i2, i3) and X′ = (i0′, i1′, i2′, i3′) be any two inputs of MixColumn, and let ∆X be the input difference between these two inputs. Then the output difference between the two corresponding outputs is equal to MixColumn of the input difference. That is,

MixColumn(X) ⊕ MixColumn(X′) = MixColumn(∆X = X ⊕ X′).

From the matrix multiplication of MixColumn we have found another property of this function. Property 2 will be used to find consistent differential patterns of the second round.

Property 2. For any four-byte input of MixColumn described in Figure 2, if the value (α) of one byte is different from the other three bytes, which all have the same value (β), then the value α appears in two bytes of the output of MixColumn. In other words, if the inputs of MixColumn are I = (α, β, β, β), I′ = (β, α, β, β), I′′ = (β, β, α, β), or I′′′ = (β, β, β, α), then

MixColumn(I) = (γ, α, α, δ),
MixColumn(I′) = (δ, γ, α, α),
MixColumn(I′′) = (α, δ, γ, α),
MixColumn(I′′′) = (α, α, δ, γ),

where γ ⊕ δ = α ⊕ β.

Differential Characteristics of the SubBytes Transformation (S-Box). Here, we discuss differential characteristics of the S-box used in Rijndael. As mentioned in the previous section, the S-box, which is a non-linear function, consists of 256 substitution paths; each input byte is replaced with a new value through the SubBytes transformation. In terms of the S-box used in Rijndael, we have found some unusual differential characteristics from a complete computer search.
We refer here to some mathematical properties of the basic functions used in this cipher. Some algebraic properties of the MixColumns transformation and the SubBytes transformation are discussed. Properties of the MixColumns Transformation We first obtain Property 1 from the fact that the function M ixColumn described in Figure 2 is a linear function. Although this property looks very simple, this property is very useful for reducing the number of round keys that should be guessed to find the key actually used. The advantage of this property will be explained in more detail in Section 4.4. Property 1 Let X = (i0 , i1 , i2 , i3 ) and X = (i0 , i1 , i2 , i3 ) be any two inputs of M ixColumn, and ∆X be the input difference between these two inputs. Then the output difference between the two corresponding outputs is equal to M ixColumn of the input difference. That is M ixColumn(X) ⊕ M ixColumn(X ) = M ixColumn(∆X = X ⊕ X ). From the matrix multiplication of M ixColumn we have found another property of this function. Property 2 will be used to find consistent differential patterns of the second round. Property 2 For any four-byte input of M ixColumn described in Figure 2, if the value (α) of one byte is different from the other three bytes, which have the same value (β), then the value α appears in two bytes of the output of M ixColumn. In other words, if the inputs of M ixColumn are I = (α, β, β, β), I = (β, α, β, β), I = (β, β, α, β), or I = (β, β, β, α), then M ixColumn(I) = (γ, α, α, δ), M ixColumn(I ) = (δ, γ, α, α), M ixColumn(I ) = (α, δ, γ, α), M ixColumn(I ) = (α, α, δ, γ), γ ⊕ δ = α ⊕ β. Differential Characteristics of the SubBytes Transformation (S-Box) Here, we discuss differential characteristics of the S-box used in Rijndael. As mentioned in the previous section, the S-box, which is a non-linear function, consists of 256 substitution paths. Then each input byte is replaced with a new value through the SubBytes transformation. In terms of the S-box used in Rijndael, we have found some unusual differential characteristics from a complete
Differential Characteristics of the SubBytes Transformation (S-Box). Here, we discuss differential characteristics of the S-box used in Rijndael. As mentioned in the previous section, the S-box, which is a non-linear function, consists of 256 substitution paths, and each input byte is replaced with a new value through the SubBytes transformation. In terms of the S-box used in Rijndael, we have found some unusual differential characteristics from a complete computer search. We have observed that for any input difference (∆X) of the S-box, the number of possible output differences (∆Y) is always 127. That is, over the 128 input pairs of the S-box having the same input difference, a certain value always appears twice in the output differences while the other values appear just once (very well distributed, but why does one value always appear twice?). We have also observed that if five inputs, which are different from each other, are input to the S-box XORed with a key, as shown in Figure 3, then the combination of the four output differences (pairing one of the outputs with each of the other outputs) is unique to the key. In other words, each key generates a unique combination of output differences for any five inputs.

[Figure 3 appears here: two inputs I1 and I2, each XORed with the same key, pass through the 256-entry S-box lookup (0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, ..., 0x16); the XOR of the outputs O1 and O2 is the output difference ∆Y for the input difference ∆X = I1 ⊕ I2.]
Fig. 3. Substitution in the S-box
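The complete computer search mentioned above is cheap to reproduce. The sketch below (ours; it rebuilds the Rijndael S-box from the multiplicative inverse and affine map rather than hard-coding the table) verifies that every nonzero input difference admits exactly 127 output differences, one of which arises from two of the 128 unordered input pairs:

```python
from collections import Counter
from functools import reduce

def gmul(a, b):                 # GF(2^8) multiplication, P(x) = x^8 + x^4 + x^3 + x + 1
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1B
        b >>= 1
    return r

def ginv(a):                    # a^254 = a^(-1); maps 0 to 0 by convention
    return reduce(gmul, [a] * 254, 1) if a else 0

def affine(a):                  # the bitwise affine map of SubBytes
    r = 0x63
    for i in range(8):
        for s in (0, 4, 5, 6, 7):
            r ^= ((a >> ((i + s) % 8)) & 1) << i
    return r

SBOX = [affine(ginv(x)) for x in range(256)]
assert SBOX[:4] == [0x63, 0x7C, 0x77, 0x7B]      # first entries shown in Figure 3

for dx in range(1, 256):
    ddt = Counter(SBOX[x] ^ SBOX[x ^ dx] for x in range(256))
    assert len(ddt) == 127                       # 127 possible output differences
    assert list(ddt.values()).count(4) == 1      # one value from 2 unordered pairs
```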
3.2 Consistent Differential Patterns
Consistent Differential Pattern 1 (Second Round). Consider two plaintexts in which only the values of the first bytes (byte#0) are different from each other. Then the input difference is row #0: (p, 00, 00, 00), row #1: (00, 00, 00, 00), row #2: (00, 00, 00, 00), row #3: (00, 00, 00, 00). After the initial round key addition and the SubBytes transformation of the first round, this property still remains in the difference (only the value of the first byte is changed: p → q, where p and q are any hexadecimal values). After the ShiftRows transformation of the first round, each byte maintains the same value in the difference. However, the MixColumns transformation of the first round changes the value of the first byte in each row. By Property 2 the result is

(r, 00, 00, 00),
(q, 00, 00, 00),
(q, 00, 00, 00),
(s, 00, 00, 00).
(Since MixColumn is a linear function, Property 2 is applicable to the difference.) As the round key addition does not affect the difference, the data is unchanged after the first round key addition. However, after the SubBytes transformation of the second round the value of the first byte in each row is changed to (α, 00, 00, 00), (δ, 00, 00, 00), (γ, 00, 00, 00), (β, 00, 00, 00). After this, the result of the ShiftRows transformation of the second round is (α, 00, 00, 00), (00, 00, 00, δ), (00, 00, γ, 00), (00, β, 00, 00). Here, the MixColumns transformation of the second round causes, by Property 2, a particular differential pattern in the output difference, such as (a) in Figure 4. This pattern remains after the second round key addition because the round key addition does not affect the difference. Therefore, we finally find a particular output pattern in the second-round output difference:

byte#1 = byte#2 = byte#0 ⊕ byte#3,
byte#4 = byte#5 = byte#6 ⊕ byte#7,
byte#8 = byte#11 = byte#9 ⊕ byte#10,
byte#14 = byte#15 = byte#12 ⊕ byte#13.
This pattern is consistent provided only the values of the first bytes in the two plaintexts are different from each other, so we call this property the consistent differential pattern of the second round. If the byte having the different values between the two plaintexts is moved, the consistent differential pattern appears in other positions, such as (b), (c), and (d) in Figure 4.

Consistent Differential Pattern 2 (Second Round). Any pair of plaintexts having one of the input differences in Figure 5 also has the consistent differential pattern in the output difference after the second round. Here, the circled bytes in the input differences indicate the bytes whose values need not be equal to 00. Figure 5 shows the relations between the input differences and the consistent differential patterns after the second round. The reasoning is the same as described for Consistent Differential Pattern 1.
[Figure 4 appears here: four input differences, each with a single nonzero byte p in a different position of the state, and the corresponding second-round output-difference patterns (a)-(d) in bytes α, β, γ, δ with γ ⊕ δ = α ⊕ β.]

Fig. 4. Consistent Differential Pattern 1
[Figure 5 appears here: four input differences, each with four circled active bytes, namely (byte#0, byte#5, byte#10, byte#15), (byte#3, byte#4, byte#9, byte#14), (byte#2, byte#7, byte#8, byte#13), or (byte#1, byte#6, byte#11, byte#12), and the corresponding second-round output-difference patterns (a)-(d).]

Fig. 5. Consistent Differential Pattern 2
Consistent Differential Pattern 3 (Second Round). Let us consider n plaintexts which are different from each other only in a certain byte. If we pair one of these plaintexts with each of the other plaintexts, then we can obtain n − 1 input differences. In this case, all n − 1 intermediate differences of the second round are different from each other in every byte, by the following reasoning:

• after the SubBytes transformation of the first round, the output differences are different from each other only in a certain byte;
• after the ShiftRows transformation and the MixColumns transformation of the first round, the output differences are different from each other in one column (four bytes), by Property 2 referred to in Section 3.1;
• considering the SubBytes transformation, the ShiftRows transformation, and the MixColumns transformation of the second round, we can find that the second-round intermediate differences are different from each other in every byte;
• because the round key addition does not affect the differences, this property remains after the second round key addition.

Consistent Differential Pattern 4 (Third Round). Let us consider 2^{8n} plaintexts which vary in n bytes (any positions) while the other bytes are all the same. If we pair one of these plaintexts with each of the other plaintexts, then we can obtain 2^{8n} − 1 input differences. In this case, any of the output differences is equal to the XOR of the other output differences after the third round, by the following reasoning:

• every group of 2^{8n} plaintexts which vary in n bytes consists of 2^{8(n−1)} sets of 2^8 plaintexts that vary only in one byte. In other words, the 2^{8n} plaintexts can be regrouped into 2^{8(n−1)} sets of 2^8 plaintexts which are different from each other only in one byte;
• with the help of the fact that the XOR of all the third round ciphertexts for 2^8 plaintexts which differ in one byte is 00 in all bytes [4], we can find that the XOR of all the third round ciphertexts for the above 2^{8n} plaintexts is also 00 in all bytes [8];
• so, if we pair one of these 2^{8n} ciphertexts with each of the other ciphertexts, then any of the differences is equal to the XOR of the other differences.

The concept of the "Difference" is efficient, together with Property 1, in reducing the number of round keys that must be tested to find the key actually used. We will give an explanation in more detail in Section 4.4.

Consistent Differential Pattern 5 (Fourth Round). We now find a fourth-round consistent differential pattern from the property of Consistent Differential Pattern 4. In Consistent Differential Pattern 4, let us consider 2^32 plaintexts which vary in the first column (four bytes). If we decrypt these plaintexts by one round with any round key, then the decrypted texts vary in (byte#0, byte#5, byte#10, byte#15). That is, the 2^32 decrypted texts are different from each other in the circled bytes described in Figure 5(a). This means that for any 2^32 plaintexts which vary in (byte#0, byte#5, byte#10, byte#15), if we pair one of these plaintexts with each of the other plaintexts, then any of the output differences is equal to the XOR of the other output differences after the fourth round. This idea can be applied to the other three columns, and we can find Consistent Differential Pattern 5 for the fourth round. That is, for any 2^32 plaintexts which vary in (byte#0, byte#5, byte#10, byte#15), (byte#3, byte#4, byte#9, byte#14), (byte#2, byte#7, byte#8, byte#13), or (byte#1, byte#6, byte#11, byte#12), as described in Figure 5, if we pair one of these plaintexts with each of the other plaintexts, then any of the output differences is equal to the XOR of the other output differences after the fourth round.
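The fact cited from [4], that the third-round outputs of 2^8 plaintexts differing in one byte XOR to 00 in every byte, can be checked directly. The sketch below (ours) implements three full Rijndael rounds with arbitrary round keys; the GF(2^8) and S-box helpers repeat the earlier sketches so that this block runs standalone:

```python
import os
from functools import reduce

def xtime(a): return ((a << 1) ^ 0x1B) & 0xFF if a & 0x80 else a << 1
def gmul(a, b):
    r = 0
    for _ in range(8):
        if b & 1: r ^= a
        a, b = xtime(a), b >> 1
    return r
def ginv(a): return reduce(gmul, [a] * 254, 1) if a else 0
def affine(a):
    r = 0x63
    for i in range(8):
        for s in (0, 4, 5, 6, 7):
            r ^= ((a >> ((i + s) % 8)) & 1) << i
    return r
SBOX = [affine(ginv(x)) for x in range(256)]

def round_fn(state, rk):
    # state: 16 bytes in AES column-major order (byte n -> row n mod 4, column n div 4)
    s = [SBOX[b] for b in state]                          # SubBytes
    s = [s[(i + 4 * (i % 4)) % 16] for i in range(16)]    # ShiftRows
    out = []
    for c in range(4):                                    # MixColumns, column by column
        col = s[4 * c:4 * c + 4]
        for r in range(4):
            out.append(gmul(2, col[r]) ^ gmul(3, col[(r + 1) % 4])
                       ^ col[(r + 2) % 4] ^ col[(r + 3) % 4])
    return [o ^ k for o, k in zip(out, rk)]               # round key addition

rks = [list(os.urandom(16)) for _ in range(3)]            # any round keys work
base = list(os.urandom(16))
acc = [0] * 16
for v in range(256):                 # 2^8 plaintexts varying only in byte#0
    st = base[:]
    st[0] = v
    for rk in rks:
        st = round_fn(st, rk)
    acc = [a ^ b for a, b in zip(acc, st)]
assert acc == [0] * 16               # the XOR of all third-round outputs is 00 everywhere
```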
4 Application to the Cryptanalysis of Rijndael
In this section, we describe how the properties we have observed can be applied to the cryptanalysis of Rijndael.
4.1 Three Rounds
We use two sets of five chosen plaintexts for this method. One set (Set #1) consists of five plaintexts which are different from each other only in the first byte (byte#0), as illustrated in Figure 4(a). The other set (Set #2) consists of five plaintexts which are different from each other only in the ninth byte (byte#8), as illustrated in Figure 4(c).

1. We first use Set #1. Let us pair any of the ciphertexts with each of the other ciphertexts, so that we obtain four pairs of ciphertexts. If we decrypt these pairs with the correct third-round key, then the second-round intermediate differences must have Consistent Differential Pattern 1(a), as illustrated in Figure 4(a), with byte#1 = byte#2, byte#4 = byte#5, byte#8 = byte#11, byte#14 = byte#15. We first find byte#13 and byte#10 of the third round key from the relation byte#1 = byte#2 in the second-round intermediate differences.

2. Let us decrypt each byte#13 in a pair of the ciphertexts with all 2^8 possible values for byte#13 of the third round key, considering ShiftRows^{-1} and SubBytes^{-1} (there is no MixColumns in the last round). Then we can obtain all possible values for byte#1 of the second-round intermediate difference (the number of these values will be 127 by the differential characteristics of the SubBytes transformation referred to in Section 3.1). Now, let us decrypt each byte#10 in the pair of the ciphertexts with all 2^8 possible values for byte#10 of the third round key. Then we can obtain all possible values for byte#2 of the second-round intermediate difference. Here, we select the values for byte#13 and byte#10 of the third round key which make byte#1 = byte#2 in the second-round intermediate difference. These values are candidates for byte#13 and byte#10 of the third round key.

3. Now, if we repeat step 2 with the other three pairs of the ciphertexts, then at the end the number of overlapping values will be one for each byte (byte#13, byte#10) by the differential characteristics of the S-box referred to in Section 3.1, unless byte#13 and byte#10 of the five ciphertexts are always the same as each other. This pair of byte#13 and byte#10 is a component of the third round key.

4. With the same method, if we consider the other relations byte#4 = byte#5, byte#8 = byte#11, and byte#14 = byte#15 in the second-round intermediate difference, then we can obtain (byte#4, byte#1), (byte#8, byte#15), and (byte#6, byte#3) of the third round key.
5. Now, using the other five plaintexts (Set #2), which are different from each other only in the ninth byte (byte#8), we can find (byte#0, byte#7), (byte#2, byte#5), (byte#9, byte#12), and (byte#11, byte#14) of the third round key with the same method.

In summary, if we have 10 chosen plaintexts, we can find the Cipher Key for three-round Rijndael. The time complexity of this method is about 2^8.
4.2 Four Rounds
Suppose we have 64 chosen plaintexts which are different from each other only in the first byte (byte#0). If we pair one of these ciphertexts with each of the other ciphertexts, then we can obtain 63 pairs of ciphertexts. We show how to find the fourth round key with these pairs of ciphertexts using the property of Consistent Differential Pattern 3 and the property of Consistent Differential Pattern 1(a). We follow the decryption procedures.

1. We first guess a combination of (byte#0, byte#7, byte#10, byte#13) in the fourth round key. Then, with this combination, we decrypt the corresponding bytes in the 64 ciphertexts until before Round Key Addition^{-1} of the third round. Now, if we pair one of the intermediate texts with each of the other intermediate texts, then we can obtain 63 first columns (byte#0, byte#1, byte#2, byte#3) of the third-round intermediate differences.

2. These values of the first columns are maintained after Round Key Addition^{-1} of the third round because Round Key Addition^{-1} does not affect the differences (we do not care about the third round key). After this, by Property 1 we can obtain the values for (byte#0, byte#1, byte#2, byte#3) in the intermediate differences after MixColumns^{-1}. After ShiftRows^{-1} of the third round we can find the values for (byte#0, byte#5, byte#10, byte#15) in the 63 intermediate differences.

3. If the above values satisfy Consistent Differential Pattern 3, then the four bytes guessed in step 1 could be eligible as components of the fourth round key. If not, the combination of the four bytes is not eligible. This is because the pairs of plaintexts which we are using lead to Consistent Differential Pattern 3 in the second-round intermediate differences, as mentioned in Section 3.2, and this property is maintained until after SubBytes (before ShiftRows) of the third round. By 2^32 repetitions of the above steps (1 to 3) we can obtain all eligible combinations of (byte#0, byte#7, byte#10, byte#13) in the fourth round key.

4. If we apply the above steps to the other bytes in the fourth round key, we can find all eligible values for (byte#0, byte#7, byte#10, byte#13), (byte#1, byte#4, byte#11, byte#14), (byte#2, byte#5, byte#8, byte#15), and (byte#3, byte#6, byte#9, byte#12). By combining these sets of four bytes, we can obtain all the possible fourth round keys. Then from these possible fourth round keys we can also find the corresponding third round keys.
5. Now, if we randomly choose five ciphertexts from the above 64 ciphertexts and then decrypt these five ciphertexts, we can obtain four second-round intermediate differences (pairing one of the five intermediate texts with each of the other four intermediate texts). If the fourth round key and the third round key are correct, then all four second-round intermediate differences have Consistent Differential Pattern 1(a), because we use plaintexts which are different from each other only in byte#0. In this way we can select the fourth round key actually used, and obtain the Cipher Key from this fourth round key. The probability that Consistent Differential Pattern 1(a) accidentally appears in one second-round intermediate difference is 1/2^64, and so the probability that wrong keys satisfy the property of Consistent Differential Pattern 1(a) for four intermediate differences is 1/2^256 at this stage. This means that this event does not happen unless the possible key is the key actually used.

Here, we note that we could use 32 chosen plaintexts or fewer with this method if we did not need to consider time complexity. The reason why we use 64 chosen plaintexts is to make the number of eligible four-byte keys in step 3 as small as possible. If we use a small number of chosen plaintexts, the number of eligible four-byte keys in step 3 may increase. Our careful analysis shows that if we use 64 chosen plaintexts, then the number of eligible four-byte keys in step 3 is one in almost all cases, and so the complexity of steps 4 and 5 is negligibly small.
4.3 Five Rounds
We now present the method using Consistent Differential Pattern 2(a) and Consistent Differential Pattern 5 for the cryptanalysis of five-round Rijndael. We use one set of 2^32 chosen plaintexts which vary in (byte#0, byte#5, byte#10, byte#15), as described in Figure 5(a).

Finding Possible Keys. We first find all possible fifth round keys. We recall that the XOR of all the fourth round ciphertexts for the above plaintexts is 00 in all bytes, as referred to in Consistent Differential Pattern 5. We first find the first byte (byte#0) of the fifth round key, considering the decryption procedures. If we guess a value for the first byte of the fifth round key, then we can obtain the 2^32 first bytes of the fourth round ciphertexts considering (S-box)^{-1}. We check whether the XOR of these bytes is 00. If the result is 00, then the value guessed for byte#0 is eligible as the first byte of the fifth round key. If not, the guess is wrong. The probability that the value of the XOR is equal to 00 for a wrong guess is 1/2^8, and so one wrong key may be chosen. If we apply this step to finding the other bytes of the fifth round key (considering the shift of bytes by the ShiftRows transformation), then we can obtain all eligible values for each byte of the fifth round key. This step is similar to the Square attack against four-round Rijndael. But we are
attacking five-round Rijndael using the four-round distinctive output property, while the Square attack uses the three-round distinctive output property for the attack on four-round Rijndael.

Selecting the Key Actually Used. Now we select the fifth round key actually used from the above possible keys. Instead of using one more set of 2^32 chosen plaintexts, we use the property of Consistent Differential Pattern 2(a). The number of possible fifth round keys will be about 2^16 even in the worst case, because the number of possible values for each byte of the fifth round key is at most two after the step of Finding Possible Keys. So we will obtain at most 2^16 possible sets of fifth, fourth, and third round keys from the possible fifth round keys. We now randomly choose five plaintexts from the 2^32 chosen plaintexts used in the step of Finding Possible Keys. If we decrypt the corresponding five ciphertexts with a possible set of a fifth, a fourth, and a third round key, and then pair one of the resulting second-round intermediate texts with each of the other intermediate texts, we can obtain four second-round intermediate differences. Here, we check the four intermediate differences for Consistent Differential Pattern 2(a). In this step only one set of round keys will satisfy Consistent Differential Pattern 2(a) for all four intermediate differences, for the same reason described in the four-round cryptanalysis. (As has been shown here and in the four-round cryptanalysis, Consistent Differential Patterns 1 and 2 are very efficient in selecting the key actually used.)

For this method we do not actually have to do encryption or decryption in the step of Finding Possible Keys. All we have to do is guess 256 values for each byte of the fifth round key and apply (S-box)^{-1} to the 2^32 ciphertexts, considering the shift of bytes. So we must look up the S-box 16 × 2^32 × 2^8 = 2^44 times. But in the step of selecting the key actually used we have to decrypt five ciphertexts with about 2^16 possible keys. For this reason, considering that 2^44 S-box lookups are comparable to 2^36 one-round decryptions of a full text, we can say that the complexity of this method is about 1/5 × (2^36 + 5 × 2^16) ≈ 2^34 + 2^16.
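The byte-by-byte filtering in Finding Possible Keys has a simple structure, sketched below (our illustration, not the authors' code; `cbytes` is assumed to hold the relevant ciphertext byte from each chosen ciphertext, already re-indexed for the ShiftRows byte movement, and `inv_sbox` is assumed to be the 256-entry inverse S-box table):

```python
def surviving_key_bytes(cbytes, inv_sbox):
    # A guessed key byte survives if the partially decrypted fourth-round bytes
    # XOR to 00, as Consistent Differential Pattern 5 requires; about one wrong
    # guess survives per byte position (probability 1/2^8 per wrong guess).
    survivors = []
    for guess in range(256):
        acc = 0
        for c in cbytes:
            acc ^= inv_sbox[c ^ guess]   # undo key addition, then (S-box)^-1
        if acc == 0:
            survivors.append(guess)
    return survivors
```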
4.4 Six Rounds
We here describe the method for the cryptanalysis of six-round Rijndael using Property 1 of Section 3.1 and Consistent Differential Pattern 5 of Section 3.2. We use two sets of 2^32 chosen plaintexts: one set consists of 2^32 plaintexts which vary in (byte#0, byte#5, byte#10, byte#15), as described in Figure 5(a), and the other set consists of 2^32 plaintexts which vary in (byte#2, byte#7, byte#8, byte#13), as described in Figure 5(c). In this method, we only guess four key bytes (32 bits) together, which means that this method has a big advantage in the number of keys guessed.
Finding Possible Key Components. We first use the set of 2^32 chosen plaintexts which vary in (byte#0, byte#5, byte#10, byte#15) for finding possible sixth round keys. For this method we need to carefully follow the decryption procedures.

1. We first guess a combination of (byte#0, byte#7, byte#10, byte#13) in the sixth round key. Then, with this combination, we decrypt the corresponding bytes in the 2^32 ciphertexts until before Round Key Addition^{-1} of the fifth round. Now, if we pair one of the decrypted texts (we briefly call this text 'JG') with each of the other decrypted texts, then we can obtain 2^32 − 1 first columns (byte#0, byte#1, byte#2, byte#3) of the fifth-round intermediate differences.

2. These values of the first columns are maintained even in the intermediate differences after Round Key Addition^{-1} of the fifth round, because Round Key Addition^{-1} does not affect the differences (so we do not care about the fifth round key). Although we have not considered the fifth round key, we can obtain, by Property 1 of Section 3.1, the first columns of the intermediate differences after MixColumns^{-1} of the fifth round. (Here we note that if we did not apply Property 1 and the concept of the "Difference", we could not find the first columns of the intermediate differences after MixColumns^{-1} of the fifth round without considering the corresponding four bytes of the fifth round key.) After ShiftRows^{-1} of the fifth round, we can obtain the values for (byte#0, byte#5, byte#10, byte#15) in the intermediate differences before SubBytes^{-1} of the fifth round.

3. If the four bytes initially guessed in step 1 are correct, the above values for each byte must satisfy Consistent Differential Pattern 5 after SubBytes^{-1} of the fifth round (any intermediate difference is equal to the XOR of the other intermediate differences in every byte). This can be checked by assuming the substitution paths which byte#0, byte#5, byte#10, and byte#15 in the intermediate text of JG pass through at this stage, because we know (S-box)^{-1} and the input differences of (S-box)^{-1}. If any four substitution paths (s(p) → p for byte#0, s(q) → q for byte#5, s(r) → r for byte#10, and s(t) → t for byte#15) satisfy Consistent Differential Pattern 5, then we keep the values of s(p), s(q), s(r), and s(t). If not, then we throw away the four bytes initially guessed and go back to step 1, because the four bytes initially guessed will never be components of the sixth round key.

4. Now, we can obtain Key5^{(#0,#1,#2,#3)} = (byte#0, byte#1, byte#2, byte#3) of the fifth round key from (s(p), s(q), s(r), s(t)) and JG5^{(#0,#1,#2,#3)} = (byte#0, byte#1, byte#2, byte#3) of the fifth-round intermediate text of JG (this value has already been obtained in step 1), by the fact that

MixColumn(s(p), s(q), s(r), s(t)) ⊕ Key5^{(#0,#1,#2,#3)} = JG5^{(#0,#1,#2,#3)}.
Checking the Key Components. Now we check whether the above (byte#0, byte#7, byte#10, byte#13) for the sixth round key and (byte#0, byte#1, byte#2, byte#3) for the fifth round key are really eligible. To do this, we use the other set of plaintexts (ciphertexts). For this set, if these (byte#0, byte#7, byte#10, byte#13) for the sixth round key and (byte#0, byte#1, byte#2, byte#3) for the fifth round key satisfy Consistent Differential Pattern 5 in the corresponding bytes, then these bytes are really eligible as components of the sixth round key and the fifth round key. This is because the probability of this event is 1/2^64 (the probability that Consistent Differential Pattern 5 appears in four bytes) unless the keys are correct.

Selecting the Key Actually Used. With the same method, we can find the other components of the sixth round key, namely (byte#1, byte#4, byte#11, byte#14), (byte#2, byte#5, byte#8, byte#15), and (byte#3, byte#6, byte#9, byte#12), and the corresponding components of the fifth round key. Now, if we select the sixth round key and the fifth round key which both satisfy the key schedule, then these round keys are the keys actually used for the sixth round and the fifth round. We can also check Consistent Differential Pattern 2(a) for five plaintexts.

For this method, we must consider four key bytes together and decrypt the 2^32 corresponding four bytes. Then we must look up the S-box 4 × 2^40 times for each combination of four key bytes. After this, we decrypt 2^32 four-byte values by one round to check whether the four-byte key components are correct. These operations must be done four times (because one block consists of 16 bytes). For this reason, this method requires 2 × 2^32 × 2^32 decryptions of one round, and 16 × (2^32 × 2^40) = 2^76 S-box lookups. Considering that 2^76 S-box lookups are comparable to 2^68 one-round decryptions of a full text, we can say that the complexity of this method is about 1/6 × (2^68 + 2^65) < 2^66.
5 Conclusions
We have introduced our observations on Rijndael, which will be useful for the cryptanalysis of this cipher. Specifically, we have introduced some significant properties of the MixColumns transformation and examined the differential characteristics of the SubBytes transformation (S-box). In addition to these properties, we have found some consistent differential patterns of Rijndael for two, three, and four rounds. We have described how these properties can be applied to the cryptanalysis of this cipher. We do not think that the method we have presented poses a serious threat to Rijndael right now. However, we expect that our observations will be useful for future studies of the cryptanalysis of Rijndael.
References

[1] "Advanced Encryption Standard (AES)", FIPS-Pub. 197, NIST, http://csrc.nist.gov/publications/drafts, 2001.
[2] E. Biham and N. Keller, "Cryptanalysis of Reduced Variants of Rijndael", http://csrc.nist.gov/encryption/aes/round2/conf3/aes3papers.html, 2000.
[3] H. Gilbert and M. Minier, "A Collision Attack on 7 Rounds of Rijndael", Proceedings of the Third Advanced Encryption Standard Candidate Conference, NIST, pp. 230-241, 2000.
[4] J. Daemen and V. Rijmen, "AES Proposal: Rijndael", http://csrc.nist.gov/encryption/aes/rijndael/Rijndael.pdf, 1999.
[5] J. Daemen, L. Knudsen, and V. Rijmen, "The Block Cipher Square", Proceedings of FSE'97, Lecture Notes in Computer Science Vol. 1267, pp. 149-165, 1997.
[6] J. Cheon, M. Kim, K. Kim, J. Lee, and S. Kang, "Improved Impossible Differential Cryptanalysis of Rijndael and Crypton", Proceedings of ICISC 2001, Lecture Notes in Computer Science Vol. 2288, pp. 39-49, 2001.
[7] M. Sugita, K. Kobara, K. Uehara, S. Kubota, and H. Imai, "Relationships among Differential, Truncated Differential, Impossible Differential Cryptanalyses against Word-oriented Block Ciphers like Rijndael, E2", Proceedings of the Third AES Candidate Conference, 2000.
[8] N. Ferguson, J. Kelsey, S. Lucks, B. Schneier, M. Stay, D. Wagner, and D. Whiting, "Improved Cryptanalysis of Rijndael", Fast Software Encryption Workshop 2000, Preproceedings, 2000.
[9] S. Lucks, "Attacking Seven Rounds of Rijndael under 192-Bit and 256-Bit Keys", Proceedings of the Third Advanced Encryption Standard Candidate Conference, NIST, pp. 215-229, 2000.
Hardware Design and Analysis of Block Cipher Components

Lu Xiao and Howard M. Heys

Electrical and Computer Engineering, Faculty of Engineering and Applied Science, Memorial University of Newfoundland, St. John's, NF, Canada A1B 3X5
{xiao,howard}@engr.mun.ca
Abstract. This paper describes the efficient implementation of Maximum Distance Separable (MDS) mappings and Substitution-boxes (S-boxes) in gate-level hardware for application to Substitution-Permutation Network (SPN) block cipher design. Different implementations of parameterized MDS mappings and S-boxes are evaluated using gate count as the space complexity measure and gate levels traversed as the time complexity measure. On this basis, a method to optimize MDS codes for hardware is introduced by considering the complexity analysis of bit-parallel multipliers. We also provide a general architecture to implement any invertible S-box which has low space and time complexities. As an example, two efficient implementations of Rijndael, the Advanced Encryption Standard (AES), are considered to examine the different tradeoffs between speed and area.
1 Introduction
In a product cipher, confusion and diffusion are both important to the security [1]. One architecture to achieve this is the Substitution-Permutation Network (SPN). In such a cipher, a Substitution-box (S-box) achieves confusion by performing substitution on a small sub-block. An n×m S-box refers to a mapping from an input of n bits to an output of m bits. An S-box is expected to be nonlinear and resistant to cryptanalyses such as differential attacks [2] and linear attacks [3]. In recently proposed SPN-based block ciphers (e.g., Rijndael [4], Hierocrypt [5], Anubis [6], and Khazad [7]), permutations between layers of S-boxes have been replaced by linear transformations in the form of mappings based on Maximum Distance Separable (MDS) codes to achieve diffusion. During encryption, as Figure 1 illustrates, the input data of each round is typically mixed with round key bits before entering the S-boxes. Key mixing typically consists of the Exclusive-OR (XOR) of key and data bits. The decryption is composed of the inverse S-boxes, the inverse MDS mappings, and the key mixtures in reverse order. To maintain similar dataflow in encryption and decryption, SPNs omit the linear transformation in the last round of encryption. Instead, one additional key mixture is appended at the end of the cipher for security considerations. If the S-box and the MDS mappings are both involutions [8]
(i.e., for any input x, f(f(x)) = x, where f(·) represents a layer of S-boxes or the MDS layer), both the encryption and decryption operations can be performed by the same SPN, except for small changes in the round key schedule in the case of XOR key mixing. We refer to such a cipher as an involution SPN, of which Anubis and Khazad are examples.

[Figure 1 appears here: an n-round SPN in which each round applies a key mixture, a layer of parallel S-boxes, and an MDS mapping as the linear transformation; the MDS mapping is omitted in round n, and one additional key mixture follows the last S-box layer.]
Fig. 1. An SPN with MDS Mappings as Linear Transformation

An MDS mapping can be performed through multiplications and additions over a finite field. In Galois field arithmetic, additions over a finite field are bit-wise XORs, and multiplications can be calculated as polynomial multiplications modulo an irreducible polynomial. The MDS mapping used in Rijndael is implemented efficiently by several applications of "xtime" [4] (i.e., one-bit left shifting followed by addition with the irreducible polynomial). However, this method only suits the case in which all entries in the generation matrix have both low Hamming weights and small magnitudes.

As typically the only nonlinear components in a block cipher, S-boxes must be designed to promote high security. As a result, each bit of an S-box output is a complicated Boolean function of the input bits with a high algebraic order, which makes it difficult to optimize or evaluate the complexity of S-boxes generally in hardware (some special cases with algebraic structure, such as the Rijndael S-box, can be efficiently optimized). In Section 4, we propose an efficient hardware model of invertible S-boxes through the logic minimization of a decoder-switch-encoder circuit. By use of this model, a good upper bound on the minimum hardware complexity can be deduced for the S-boxes used in SPNs and some Feistel networks (e.g.,
Camellia [9]). The model can be used as a technique for the construction of S-boxes in hardware so that the space and time complexities are low. In our work, we take the conventional approach that the space complexity of a hardware implementation is evaluated by the number of 2-input gates and bit-wise inverters, and the time complexity is evaluated by the gate delay as measured by the number of traversed layers in the gate network. These measures are not exactly proportional to the real area and delay in a synthesized VLSI design because logic synthesis involves technology-dependent optimization and maps a general design to different sets of cells based on targeted technologies. For example, a 2-input XOR gate is typically larger in area and delay than a 2-input AND gate in most technologies. As well, it is assumed in this paper that the overhead caused by routing after logic minimization can be ignored. Although routing affects the performance in a place-and-routed implementation, it is difficult to estimate its complexity accurately before synthesis into the targeted technology. From previous FPGA and ASIC implementations of block ciphers such as those listed in [10], it is well established that S-boxes normally account for most of a cipher's area requirement and delay. Although linear components such as MDS mappings are known to be much more efficient than S-boxes, it is important for cipher designers to characterize the hardware properties of both S-boxes and MDS mappings on the same basis, as is done through the analysis in this paper.
2 Background

2.1 MDS Mappings
A linear code over the Galois field GF(2^n) is denoted as an (l, k, d)-code, where l is the symbol length of the encoded message, k is the symbol length of the original message, and d is the minimal symbol distance between any two encoded messages. An (l, k, d)-code is MDS if d = l − k + 1. A (2k, k, k+1)-code with generation matrix G = [I|C], where C is a k×k matrix and I is an identity matrix, determines an MDS mapping from the input X to the output Y through matrix multiplication over a Galois field as follows:

f_M : X → Y = C · X    (1)

where

X = (X_{k−1}, ..., X_0)^T,  Y = (Y_{k−1}, ..., Y_0)^T,

        [ C_{k−1,k−1} ... C_{k−1,0} ]
C =     [     ...            ...    ]
        [ C_{0,k−1}   ...  C_{0,0}  ]
Each entry in X, Y, and C is an element in GF(2^n). For a linear transformation, the branch number is defined as the minimum number of nonzero elements in the input and output when the input elements are not all zero [11]. It is desirable that a linear transformation has a high branch
number when it is used after a layer of S-boxes in a block cipher, in order to ensure low probabilities for differential and linear characteristics [2, 3]. A mapping based on a (2k, k, k+1)-code has an optimal branch number of k + 1.
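For small parameters, the branch number can be computed exhaustively. A sketch (ours), taking GF(2^4) with P(x) = x^4 + x + 1 as an example field and a 2×2 candidate matrix:

```python
def mul4(a, b):                # GF(2^4) multiplication, P(x) = x^4 + x + 1 (13H)
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:
            a ^= 0x13
    return r

def branch_number(C, k, q=16):
    # min over nonzero inputs of (#nonzero input symbols + #nonzero output symbols)
    best = 2 * k
    for v in range(1, q ** k):
        X = [(v // q ** j) % q for j in range(k)]
        Y = [0] * k
        for i in range(k):
            for j in range(k):
                Y[i] ^= mul4(C[i][j], X[j])
        best = min(best, sum(x != 0 for x in X) + sum(y != 0 for y in Y))
    return best

print(branch_number([[1, 2], [2, 1]], k=2))   # 3 = k + 1, optimal for a (4, 2, 3)-code
```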
2.2 Bit-Parallel Multipliers
An MDS mapping can be regarded as matrix multiplication in a Galois field. Since the generation matrix is constant, each element in the encoded message is the XOR of several outputs of constant multipliers. As basic operators, bit-parallel multipliers given in standard base [12, 13] are selected in this paper. A constant multiplier can be written as a function from element A to element B over GF(2^n) as follows:

f_C : A → B = C · A    (2)

where C is the constant element in GF(2^n). The expression in binary polynomial form is given as

b_{n−1}x^{n−1} + ··· + b_0 = (c_{n−1}x^{n−1} + ··· + c_0)(a_{n−1}x^{n−1} + ··· + a_0) mod P(x)    (3)

where P(x) denotes the irreducible polynomial of degree n. An n×n binary matrix F_C is associated with this constant multiplier such that

(b_{n−1}, b_{n−2}, ..., b_0)^T = F_C × (a_{n−1}, a_{n−2}, ..., a_0)^T    (4)

where

        [ f_{n−1,n−1} ... f_{n−1,0} ]
F_C =   [     ...            ...    ]
        [ f_{0,n−1}   ...  f_{0,0}  ]

and f_{i,j} ∈ {0, 1}, 0 ≤ i, j ≤ n−1. The entries in each column of F_C are determined by

f_{n−1,j}x^{n−1} + ··· + f_{0,j} = x^j(c_{n−1}x^{n−1} + ··· + c_0) mod P(x).    (5)

Since F_C is constant, it is trivial to implement a constant bit-parallel multiplier by bit-wise XOR operations. For example, considering a constant multiplier performing B = 19H × A over GF(2^8), where "H" indicates hexadecimal format and P(x) = x^8 + x^4 + x^3 + x + 1, we get the binary product matrix F_19H and the corresponding Boolean expressions for all output bits as follows:

         [ 0 0 0 1 1 0 0 0 ]        b7 = a4 ⊕ a3
         [ 0 0 0 0 1 1 0 0 ]        b6 = a3 ⊕ a2
         [ 1 0 0 0 0 1 1 0 ]        b5 = a7 ⊕ a2 ⊕ a1
F_19H =  [ 1 1 0 0 0 0 1 1 ]   ⇒    b4 = a7 ⊕ a6 ⊕ a1 ⊕ a0
         [ 0 1 1 1 1 0 0 1 ]        b3 = a6 ⊕ a5 ⊕ a4 ⊕ a3 ⊕ a0
         [ 1 0 1 0 0 1 0 0 ]        b2 = a7 ⊕ a5 ⊕ a2
         [ 0 1 0 1 0 0 1 0 ]        b1 = a6 ⊕ a4 ⊕ a1
         [ 0 0 1 1 0 0 0 1 ]        b0 = a5 ⊕ a4 ⊕ a0
If we define w(F_C) as the count of nonzero entries in F_C and w_i(F_C) as the count of nonzero entries in the i-th row of F_C, the number of 2-input XOR gates used for the multiplier is upper bounded by w(F_C) − n, and the delay in gate levels is max_i{⌈log2 w_i(F_C)⌉}.
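Generating F_C and these two bounds is mechanical; the sketch below (ours) reproduces the C = 19H example:

```python
from math import ceil, log2

N, P = 8, 0x11B                  # GF(2^8), P(x) = x^8 + x^4 + x^3 + x + 1

def product_matrix(c):
    # column j of F_C holds the bits of x^j * C(x) mod P(x), per Eq. (5)
    cols, col = [], c
    for _ in range(N):
        cols.append(col)
        col <<= 1
        if col >> N:
            col ^= P
    # row i collects bit i of every column: b_i is the XOR of a_j with f_{i,j} = 1
    return [sum(((cols[j] >> i) & 1) << j for j in range(N)) for i in range(N)]

F = product_matrix(0x19)
weights = [bin(row).count("1") for row in F]
print("XOR gates <=", sum(weights) - N)                      # w(F_C) - n = 17
print("gate levels <=", max(ceil(log2(w)) for w in weights)) # 3
```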
2.3 Three Types of Matrices
In the search for optimized MDS mappings in the next section, we will use three types of matrices which suit different applications. When an exhaustive matrix search is impractical, we will limit the search scope to one of the following three matrix types.

– Circulant matrices: Given k elements α0, ..., αk−1, a circulant matrix A is constructed with each entry A_{i,j} = α_{(i+j) mod k}. The probability that a circulant matrix is suitable for an MDS mapping C is much higher than that of a normal square matrix [14].
– Hadamard matrices: Given k elements α0, ..., αk−1, a Hadamard matrix A is constructed with each entry A_{i,j} = α_{i⊕j}. Each Hadamard matrix A over a finite field satisfies A² = γ·I, where γ is a constant. When γ = 1, A is an involution matrix. An involution MDS mapping is required by an involution SPN.
– Cauchy matrices: Given 2k elements α0, ..., αk−1, β0, ..., βk−1, a Cauchy matrix A is constructed with each entry A_{i,j} = 1/(αi ⊕ βj). Any Cauchy matrix is MDS when α0, ..., αk−1 are distinct, β0, ..., βk−1 are distinct, and αi ≠ βj for all i, j [15]. Although a Cauchy matrix can be conveniently used as matrix C for an MDS mapping, the relation between the selected coefficients (i.e., α0, ..., αk−1, β0, ..., βk−1) and the corresponding MDS complexity is not as straightforward as in the former two matrix types. Hence, it is difficult to select coefficients to construct a Cauchy matrix that can be efficiently implemented in hardware.
2.4 A Method to Simplify S-box Circuits
In [16], a method of generating a Boolean function through nested multiplexing is introduced to optimize gate circuits for the 6×4 S-boxes in DES implementations. Consider that a Boolean function f(a, b, c) with three input bits a, b, and c can be written as f(a, b, c) = f1(a, b)·c + f2(a, b)·c̄, where f1(a, b) and f2(a, b) are two Boolean functions, "+" denotes OR, and c̄ denotes the complement of c. If f3(a, b) = f1(a, b) ⊕ f2(a, b), then f(a, b, c) = f2(a, b) ⊕ (f3(a, b)·c). Similarly, a Boolean function with an input of 4 bits can be regarded as a multiplexor using one input bit to select two Boolean functions determined by the other three input bits. This procedure is repeated until a Boolean function has
6 input bits. A 6×4 DES S-box contains four of these 6-bit Boolean functions. This general approach can be taken for any size S-box and works well for optimization of small S-boxes such as the 4×4 S-boxes in Serpent [17]. However, in the case of general invertible 8×8 S-boxes used by many ciphers, this method can be improved upon, as we shall see.
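A small illustration (ours) of the nested-multiplexing identity, storing truth tables as integers and splitting a 3-input function on input c:

```python
def bit(tt, x):                 # evaluate a truth table stored as an integer
    return (tt >> x) & 1

f = 0b10010110                  # an arbitrary f(a, b, c); input index bits are (c b a)
f2 = f & 0x0F                   # cofactor f(a, b, 0): low half of the table
f1 = f >> 4                     # cofactor f(a, b, 1): high half
f3 = f1 ^ f2
for x in range(8):
    ab, c = x & 3, x >> 2
    assert bit(f, x) == bit(f2, ab) ^ (bit(f3, ab) & c)   # f = f2 xor (f3 . c)
```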
3 Optimized MDS Mappings for Hardware

3.1 Complexity of MDS Mappings
An MDS mapping has been defined in (1), where each entry C_{i,j} of matrix C is associated with a product matrix F_{C_{i,j}}. Replacing each C_{i,j} in matrix C with F_{C_{i,j}} as a submatrix, we get an nk×nk binary matrix F_C as the following:

        [ F_{C_{k−1,k−1}} ... F_{C_{k−1,0}} ]
F_C =   [       ...               ...       ]
        [ F_{C_{0,k−1}}   ...  F_{C_{0,0}}  ]

Because Y is the matrix product of F_C and X, the MDS mapping can be straightforwardly implemented by a number of XOR gates. The gate count of 2-input XORs is upper bounded by

G_MDS = w(F_C) − nk    (6)

and the delay is upper bounded by

D_MDS = max_i{⌈log2 w_i(F_C)⌉}    (7)

where 0 ≤ i ≤ nk−1.

3.2 The Optimization Method
The hardware complexity of an MDS mapping is determined directly by matrix C. In order to improve hardware performance, matrix C should be designed to produce low hardware complexity. However, not every matrix with low complexity is suitable as an MDS mapping. The mapping associated with matrix C can be tested using the following theorem:

Theorem 1 [15]. An (l, k, d)-code with generation matrix G = [I|C] is MDS if, and only if, every square submatrix of C is nonsingular.

To minimize gate count and delay in hardware, we want to find an MDS mapping based on a (2k, k, k+1)-code over GF(2^n) with low Hamming weights w(F_C) and w_i(F_C). Theorem 1 provides us a way to determine whether a matrix candidate is MDS. Theoretically, the optimal MDS mapping can always be determined through an exhaustive search of all matrix candidates of C. However, such a search is computationally impractical when k and n get large.
Table 1. Four Choices for MDS Search

Search Options       # of Candidates   Applicable Cases
Exhaustive           2^(k²·n)          small k, n
Circulant Matrices   2^(kn)            large k, n
Hadamard Matrices    2^(kn)            large k, n, as well as involution
Cauchy Matrices      2^(2kn)           no MDS mappings found in other matrix categories
In this case, it is reasonable to focus the search on some subsets of candidates which are likely to yield MDS mappings. The search scope can thus be limited to circulant, Hadamard, and Cauchy matrices. Table 1 describes four choices for the MDS search. We adopt an appropriate searching method based on the number of candidates to be tested and the required MDS features (involution or not). If computation permits, exhaustive search is preferred. When an exhaustive search is impractical, a search in circulant matrices may be performed for non-involution MDS mappings, or a search in Hadamard matrices may be performed for MDS mappings which are involutions. Since only a subset of MDS mappings is derived from circulant, Hadamard, or Cauchy matrices, only exhaustive search over all possible matrices (and therefore all MDS mappings) is guaranteed to find a truly optimized MDS mapping; for large k and n, however, searching over a subset of MDS mappings is the best that can be achieved. The objective is to find the candidate with the MDS property and a low hardware cost. The hardware "cost" could be gate count, delay, or both. Sometimes, no candidates in the sets of circulant and Hadamard matrices pass the MDS test; in this case, the optimal mapping will be determined through a search of Cauchy matrices, where each candidate is deterministically MDS. Once a candidate is proved to be MDS (or involution MDS), the remaining candidates with higher hardware cost can be ignored, narrowing the search space. The results generated by this searching method can be used for the hardware characterization of ciphers with MDS mappings of a specified size. It is noted that w(F_C) − nk just indicates the upper bound of XORs in the circuit. Two greedy methods introduced in [13] can be applied to the MDS matrix multiplication in order to further reduce redundancy in the circuit. However, the improvement from using greedy methods is not significant when w(F_C) is already low.
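Theorem 1 translates directly into a submatrix test. A sketch (ours) over GF(2^4); for the field and matrix sizes in this paper the test is computationally heavy, which is why it is preceded by cheaper filters:

```python
from itertools import combinations

N, P = 4, 0x13                        # GF(2^4) with P(x) = x^4 + x + 1 (13H)

def mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N:
            a ^= P
    return r

def inv(a):                           # brute-force inverse in the 16-element field
    return next(x for x in range(1, 1 << N) if mul(a, x) == 1)

def nonsingular(M):                   # Gaussian elimination over GF(2^N)
    M, k = [row[:] for row in M], len(M)
    for c in range(k):
        p = next((r for r in range(c, k) if M[r][c]), None)
        if p is None:
            return False
        M[c], M[p] = M[p], M[c]
        pi = inv(M[c][c])
        for r in range(c + 1, k):
            f = mul(M[r][c], pi)
            M[r] = [M[r][j] ^ mul(f, M[c][j]) for j in range(k)]
    return True

def is_mds(C):                        # Theorem 1: every square submatrix nonsingular
    k = len(C)
    return all(nonsingular([[C[r][c] for c in cs] for r in rs])
               for s in range(1, k + 1)
               for rs in combinations(range(k), s)
               for cs in combinations(range(k), s))

print(is_mds([[1, 2], [2, 1]]))       # True: a small (4, 2, 3) candidate over GF(2^4)
```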
3.3 MDS Search Results
We have implemented a search for the best MDS mappings of various sizes. During the search, gate reduction is given higher priority than delay reduction because the delay difference among mappings is generally not evident. The optimal non-involution MDS mappings for bit-parallel implementations of various sizes are given in Table 2. (Here "optimal" means "locally optimal" when the MDS mapping is constrained to a particular matrix category.)
Table 2. MDS Search Results

MDS code    Galois Field, P(x)  Average w(FC)  Optimal Non-involution MDS        Optimal Involution MDS
                                               w(FC) / Delay / Matrix Type       w(FC) / Delay / Matrix Type
(4, 2, 3)   GF(2^2), 7H         8              9 / 2 / exhaustive                11 / 2 / exhaustive
(4, 2, 3)   GF(2^4), 13H        32             17 / 2 / exhaustive               21 / 2 / exhaustive
(4, 2, 3)   GF(2^8), 11DH       128            35 / 3 / exhaustive               48 / 3 / exhaustive
(8, 4, 5)   GF(2^4), 13H        128            76 / 3 / circulant                88 / 3 / Hadamard
(8, 4, 5)   GF(2^8), 11DH       512            164 / 3 / circulant               200 / 4 / Hadamard
(16, 8, 9)  GF(2^4), 13H        512            464 / 4 / Cauchy                  544 / 5 / Cauchy
(16, 8, 9)  GF(2^8), 11DH       2048           784 / 4 / circulant               928 / 5 / Hadamard

(Delay is measured in gate levels.)
As in Rijndael, SPNs using these optimal MDS mappings are more efficient in encryption than decryption. In Table 2, the average w(FC) is determined by computing the number of matrix entries and dividing by two. These average w(FC) values are included to show how effective the optimization work is for each MDS category. The optimal involution MDS mappings in terms of our complexity analysis are also given in Table 2. Since the MDS test of Theorem 1 is computationally intensive, an involution test is performed first to eliminate wrong candidates. In [8], an algebraic construction of an involution MDS mapping based on Cauchy matrices is described. This known MDS mapping is used to prune remaining candidates that produce higher complexity before a better mapping is found. These two steps reduce the candidate space dynamically. The categories in Table 2 correspond to many MDS mappings in real ciphers (although there are minor differences in Galois field selection). For example, Square, Rijndael, and Hierocrypt at the lower level have non-involution MDS mappings based on (8, 4, 5)-codes over GF(2^8) [14, 4, 5]. SHARK has a non-involution MDS mapping based on a (16, 8, 9)-code over GF(2^8) [11]. Hierocrypt at the higher level has two choices of non-involution MDS mappings, based on (8, 4, 5)-codes over GF(2^4) and GF(2^32), respectively [5]. Anubis has an involution MDS mapping based on an (8, 4, 5)-code over GF(2^8) [6]. Khazad has an involution MDS mapping based on a (16, 8, 9)-code over GF(2^8) [7]. None of these ciphers has MDS mappings with complexity as low as the corresponding cases listed in the tables. The mappings of Rijndael, Anubis, and Khazad are close to the optimal cases in terms of gate counts (i.e., w(FC) = 184, 216, and 1296, respectively), while Hierocrypt's MDS mappings have high complexity, similar to the average gate counts. As Table 2 indicates, the involution MDS mappings are not as efficient as non-involution MDS mappings after optimization. However, the performance difference between them is quite small. When used in an SPN, an involution MDS mapping gives equally optimized performance for both encryption and decryption. When an SPN uses a non-involution MDS mapping optimized only for encryption, the inverse MDS mapping used in decryption has a higher complexity. For example, the MDS mapping used in Rijndael decryption has w(FC) = 472
and, hence, needs more gates in hardware than the MDS mapping used for encryption, which has w(FC) = 184. When a non-involution MDS mapping is optimized for both encryption and decryption, the overall hardware cost is similar to an optimized involution MDS mapping. The real hardware circuits of these MDS mappings produce complexities with the same trends as shown in Table 2. For example, using Synopsys Design Compiler (with default optimization strategy) and TSMC's 0.18 µm CMOS cell library, we get the area sizes of the optimal non-involution MDS mappings of the bottom four rows of Table 2 as 1549.0, 3659.0, 8863.0, and 17376.4 µm², respectively. Their critical time delays are 1.30, 1.33, 2.01, and 2.01 ns, respectively.

[Figure 2 appears here: an n-bit input I0, ..., In−1 enters an n×2^n decoder producing minterm lines X0, ..., X(2^n − 1); a gate-free switch (wiring) permutes these to Y0, ..., Y(2^n − 1); and a 2^n×n encoder produces the output O0, ..., On−1.]
Fig. 2. A General Hardware Structure of Invertible S-boxes
4 General Hardware Model of Invertible S-boxes

4.1 Decoder-Switch-Encoder Structure
In this section, we derive a general hardware model of n×n invertible S-boxes by simplification of a decoder-switch-encoder structure. As shown in Figure 2, the n×2^n decoder outputs 2^n distinct minterms from the n-bit S-box input. The switch is a wiring area composed of 2^n wires. Each wire connects an input port X_i to an output port Y_j, 0 ≤ i, j ≤ 2^n − 1. Since the S-box is invertible, each output port is connected to exactly one input port. Although the wiring scheme embodies the S-box mapping, the switch does not cost any gates. The output of the switch is encoded through a 2^n×n encoder, which produces the n-bit output of the S-box.
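A functional model of the structure (ours; it mirrors Figure 2 with a toy 4-bit S-box) shows how the wiring alone encodes the mapping:

```python
def sbox_dse(x, S, n):
    # functional model of Figure 2: decoder -> wired switch -> encoder
    size = 1 << n
    X = [int(i == x) for i in range(size)]     # decoder: one-hot minterm lines
    inv = [0] * size
    for i, s in enumerate(S):
        inv[s] = i                             # wiring connects X_i to Y_{S(i)}
    Y = [X[inv[j]] for j in range(size)]       # the gate-free switch
    out = 0
    for b in range(n):                         # encoder: O_b = OR of Y_j with bit b of j set
        out |= any(Y[j] for j in range(size) if (j >> b) & 1) << b
    return out

S = [(7 * i + 5) % 16 for i in range(16)]      # a toy invertible 4-bit S-box
assert all(sbox_dse(x, S, 4) == S[x] for x in range(16))
```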
4.2 Decoder
The n×2^n decoder is implemented by n NOT gates and a number of AND gates. The NOT gates generate the complements of the n inputs. The AND gates produce all 2^n minterms from the n binary inputs and their complements. The most straightforward approach is to generate every minterm separately, which costs 2^n·(n − 1) 2-input AND gates plus n bit-wise NOT gates, with a delay of ⌈log2 n⌉ + 1 gate levels. This approach can be improved by eliminating redundant AND gates in the circuit. The optimized circuit can be generated using a dynamic programming method.
for i ← 0 to n−1 do
    D(i, i) ← 0
for step ← 1 to n−1 do
    for i ← 0 to n−1−step do
        j ← i + step
        D(i, j) ← ∞
        for k ← i to j−1 do
            temp ← D(i, k) + D(k+1, j) + 2^(j−i+1)
            if temp < D(i, j) then D(i, j) ← temp
return D(0, n−1)
Fig. 3. Algorithm to Determine Decoder AND-Gate Count
Consider the dynamic programming algorithm in Figure 3, used to compute the minimum number of AND gates in the decoder. Let D(i, j) be the minimal number of 2-input AND gates used for generating all possible minterms composed of literals I_i, ..., I_j and their complements. Thus, D(i, j) = 0 when i = j. If we know two optimal results of subproblems, say D(i, k) and D(k+1, j) where i ≤ k < j, all minterms for I_i, ..., I_j can be obtained by using AND gates to combine pairs of minterms from the two subproblems. Since the number of these pairs is 2^(j−i+1), this solution needs D(i, k) + D(k+1, j) + 2^(j−i+1) AND gates in total. The algorithm of Figure 3 can be easily modified to determine the actual gate network used for the decoder. When n = 2^k, it can be shown that the number of 2-input AND gates and bit-wise NOT gates in the decoder is given by

G_Dec(n) = n · Σ_{i=1}^{k} 2^(2^i − i) + n.    (8)
The delay, in terms of the number of gate levels, of the decoder is D_Dec(n) = ⌈log2 n⌉ + 1.
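The dynamic program of Figure 3 is easily run to obtain the AND-gate counts; a Python transcription (ours):

```python
def decoder_and_gates(n):
    # D[i][j]: minimal 2-input AND gates generating all minterms of inputs i..j (Fig. 3)
    D = [[0] * n for _ in range(n)]
    for step in range(1, n):
        for i in range(n - step):
            D[i][i + step] = min(D[i][k] + D[k + 1][i + step] + 2 ** (step + 1)
                                 for k in range(i, i + step))
    return D[0][n - 1]

for n in (4, 6, 8, 16):
    print(n, decoder_and_gates(n))   # 24, 88, 304, 66144: the AND counts in Table 4
```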
4.3 Encoder
The 2^n×n binary encoder can be implemented using a number of 2-input OR gates. Table 3 gives the truth table of a 16×4 binary encoder. Each output signal O_i is the OR of the 2^(n−1) input signals that produce "1" in column O_i of the truth table; this is denoted as O_i = ∨ Y_k, the OR taken over all k whose i-th bit equals 1. If we separately constructed circuits for these output signals, it would cost n·(2^(n−1) − 1) 2-input OR gates and a delay of n−1 gate levels. Fortunately, most OR gates can be saved if the same intermediate ORed signals are reused. Considering that the OR is computed in a dynamic programming manner, some subproblems used in calculating O_i are also used in calculating O_j if i > j > 0. For example, as shown in Table 3, the task of calculating O_{n−1} includes the subproblems of calculating the OR from Y_(5·2^(n−3)) to Y_(6·2^(n−3) − 1) and calculating the OR from Y_(6·2^(n−3)) to Y_(2^n − 1). These two subproblems are also included in the calculation of O_{n−3} and O_{n−2}, respectively.
Table 3. Truth Table of a 2^n×n Encoder

(a) n = 4

Input Yk  O3 O2 O1 O0
Y0        0  0  0  0
Y1        0  0  0  1
Y2        0  0  1  0
Y3        0  0  1  1
Y4        0  1  0  0
Y5        0  1  0  1
Y6        0  1  1  0
Y7        0  1  1  1
Y8        1  0  0  0
Y9        1  0  0  1
Y10       1  0  1  0
Y11       1  0  1  1
Y12       1  1  0  0
Y13       1  1  0  1
Y14       1  1  1  0
Y15       1  1  1  1

(b) n ≥ 4

Input Yk                                 On−1  On−2  On−3  ···
Y0, ..., Y(2^(n−3) − 1)                  0     0     0     ···
Y(2^(n−3)), ..., Y(2^(n−2) − 1)          0     0     1     ···
Y(2^(n−2)), ..., Y(3·2^(n−3) − 1)        0     1     0     ···
Y(3·2^(n−3)), ..., Y(2^(n−1) − 1)        0     1     1     ···
Y(2^(n−1)), ..., Y(5·2^(n−3) − 1)        1     0     0     ···
Y(5·2^(n−3)), ..., Y(6·2^(n−3) − 1)      1     0     1     ···
Y(6·2^(n−3)), ..., Y(7·2^(n−3) − 1)      1     1     0     ···
Y(7·2^(n−3)), ..., Y(2^n − 1)            1     1     1     ···
As a result, the OR gates needed to solve the recurrent subproblems can be saved. In fact, in the procedure of calculating O_i, only the subproblem of calculating the OR from Y_(2^i) to Y_(2^(i+1) − 1) has to be solved, because all other 2^(n−i−1) − 1 subproblems have been solved in the procedures of calculating O_{n−1}, ..., O_{i+1}. In this sense, we need 2^i − 1 OR gates for the subproblem that has not been solved and 2^(n−i−1) − 1 OR gates to OR the results of all 2^(n−i−1) subproblems. In total, the count of OR gates for the encoder is

G_Enc(n) = Σ_{i=0}^{n−1} [(2^i − 1) + (2^(n−i−1) − 1)] = 2^(n+1) − 2n − 2    (9)

and the gate delay is D_Enc(n) = n − 1.
4.4 S-box Complexity
Based on the analysis of the decoder-switch-encoder structure, the hardware complexity of invertible S-boxes is estimated. Since 8×8 S-boxes are very popular in current block ciphers (e.g., Rijndael [4], Hierocrypt [5], and Camellia [9]), let us examine the usability of this model in this case. According to (8) and (9), the upper bound of the optimal gate count for an 8×8 invertible S-box is 806, while the gate count before logic minimization is 2816. Through experimental simplifications using the Synopsys logic synthesis tool [18], we can realize 8×8
invertible S-boxes with a count of area units close to 800 when the target library is lsi_10k.db. Since a small portion of cells in the library have more than 2 inputs, the cell count is around 550. Such a result is quite close to the upper bound when n = 8. When considering the implementation of an S-box in hardware, the upper bound of the gate count increases exponentially with the S-box size n, as shown in Figure 4, while the upper bound of the delay increases linearly, as shown in Figure 5. In these two figures, the S-box optimization model described in [16] and presented in Section 2 is used as a reference, and the decoder-switch-encoder model is labelled DSE. When the size of an S-box is less than 6, the delays of the two models are similar and the gate count of the reference model is slightly lower. As the size of the S-box increases, the decoder-switch-encoder model costs less in both gate count and delay. The details of gate counts and delays are listed in Table 4 and Table 5. Given the fact that about half the gates used in the reference model are XOR gates, which are typically more expensive in hardware than NOT, AND, and OR gates, the decoder-switch-encoder model would appear to be more useful for hardware design, both as an indication of the upper bound on the optimal S-box complexity and as a general methodology for implementing an invertible S-box.

Fig. 4. Gate Count Upper Bounds of S-boxes (gate count versus size n, DSE model and reference model)
Fig. 5. Delay Upper Bounds of S-boxes (gate levels versus size n, DSE model and reference model)
Table 4. Gate Counts of Invertible S-boxes in the Decoder-Switch-Encoder Model

S-box Size  NOT #  AND #  OR #    Gate Count  Reference Count
4×4         4      24     22      50          36
6×6         6      88     114     208         192
8×8         8      304    494     806         1020
10×10       10     1120   2026    3156        5112
12×12       12     4272   8166    12450       24564
14×14       14     16712  32738   49464       114672
16×16       16     66144  131038  197198      524268
Table 5. Gate Delays of Invertible S-boxes in the Decoder-Switch-Encoder Model

S-box Size       4×4  6×6  8×8  10×10  12×12  14×14  16×16
NOT              1    1    1    1      1      1      1
AND              2    3    3    4      4      4      4
OR               3    5    7    9      11     13     15
Delay            6    9    11   14     16     18     20
Reference Delay  6    10   14   18     22     26     30
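Tables 4 and 5 can be regenerated from the closed forms (8) and (9); a sketch (ours), with Eq. (8) assuming n = 2^k:

```python
from math import ceil, log2

def dse_upper_bounds(n):
    k = round(log2(n))                       # Eq. (8) assumes n = 2^k
    gates = n * sum(2 ** (2 ** i - i) for i in range(1, k + 1)) + n   # decoder
    gates += 2 ** (n + 1) - 2 * n - 2                                 # encoder, Eq. (9)
    delay = (ceil(log2(n)) + 1) + (n - 1)    # decoder plus encoder gate levels
    return gates, delay

for n in (4, 8, 16):
    print(n, dse_upper_bounds(n))   # (50, 6), (806, 11), (197198, 20) as in Tables 4 and 5
```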
5 Efficient Rijndael Encryption Implementations
Since Rijndael was selected as the AES, it is of great significance to characterize its implementation in hardware. Each round of Rijndael contains the following operations on the state (i.e., the intermediate data stored in a two-dimensional array) [4]: (1) a layer of 8×8 S-boxes called ByteSub, (2) a byte-wise cyclic shift per row called ShiftRow, (3) an MDS mapping based on an (8, 4, 5)-code per column called MixColumn, and (4) the round key mixing through XORs. The MDS mapping is defined over GF(2^8), and the S-box performs a multiplicative inverse over GF(2^8) followed by a bitwise affine operation. With parallel S-boxes implemented through table lookups, a hardware design is proposed in [19]. Adhering to the structure of the algorithm specification of [4], as in Figure 6(a), this design achieves a throughput of 1.82 Gbits/sec in 0.18 µm CMOS technology, where each S-box costs about 2200 gates. Since some operations over the composite field GF((2^4)^2) are more compact than over GF(2^8), an efficient Rijndael design in composite field arithmetic is proposed in [20]. A cryptographic core (i.e., essentially one round, mainly consisting of 16 S-boxes and the MDS mapping layer) in [20] costs only about 4000 gates, and a delay of 240 gate levels [21] is expected in theory. Following the normal encryption dataflow, labelled as Design I in Figure 6(a), we apply the discussed S-box model and MDS bit-parallel implementation method to ByteSub and MixColumn, respectively. After the first round key K0 is added to the plaintext, the state goes through an iterative round structure. Regardless of its mathematical definition, ByteSub is implemented as a layer of 16 parallel 8×8 S-boxes using the decoder-switch-encoder model. Then, the state iteratively proceeds through ShiftRow, MixColumn, and the addition with round key Kr. ShiftRow is implemented through wiring without any gates needed. Four bit-parallel MDS mappings perform MixColumn for the 4 columns. As listed in Table 6, we get an iterative core circuit of one round which costs 13456 gates and produces a delay of 15 gate levels per round. Because the MDS mappings are omitted in the last round, the Rijndael encryption of 10 rounds produces a delay of 148 gate levels, a significant improvement over the delay of 240 gate levels in the design of [20]. The design needs far fewer gates than in [19]. As shown in Figure 6(b), labelled as Design II, we get a more compact circuit through hybrid operations over GF(2^8) and its equivalent composite field GF((2^4)^2).
Hardware Design and Analysis of Block Cipher Components Plaintext
Plaintext Kr-1
K0 ByteSub
T(⋅)
T(⋅) Inversion over GF((24)2)
ShiftRow no
177
no r <10 yes MixColumn
r <10 yes LT1
Kr
LT2 r =10 yes Ciphertext (a) Design I
no K10 Ciphertext (b) Design II
Fig. 6. Rijndael Encryption Implementations GF((24 )2 ). The polynomial P1 (y) = y 4 + y + 1 is used to define GF(24 ) and the polynomial P2 (x) = x2 + x + 09H is used to define GF((24 )2 ). Such a composite field is the same as in the implementation proposed in [20] for ease of comparison. The conversion from GF(28 ) to GF((24 )2 ) is denoted as T (·), and its inverse is T −1 (·). It has been recognized that the multiplicative inverse over GF((2m )n ) can have a much lower complexity than the equivalent inverse over GF(2mn ) [13, 22]. As an example, the equivalent ByteSub over GF((24 )2 ) costs less than one fifth of the gate count of a general invertible S-box based on the upper bound of 806 in the decoder-switch-encoder S-box model. However, the subfield-based operation is normally slow. In the implementation of Figure 6(b), the inverse over the composite field costs a gate delay of 14 (as deduced from [12, 13, 20, 21]). Given additional overhead for field conversion and ByteSub’s affine function, the ByteSub instance has a much longer delay path than in the implementation of Design I. To mitigate this problem, we can incorporate all linear operations into LT1 in the first nine rounds and LT2 in the last round as shown in Figure 7, resulting in a delay of 202 gate levels for encryption. The number of gates used in the iterative core circuit is slightly (about 3%) less than in [20]. The detailed gate counts and delays for Design II components are listed in Table 7. The Appendix describes the detailed implementation of LT1 and LT2.
Table 6. Gate Counts and Delays of Operations in Design I Operations ByteSub MixColumn Key Addition Total Per Round Gate Count 12896 432 128 13456 Delay (gate levels) 11 3 1 15
178
Lu Xiao and Howard M. Heys
Table 7. Gate Counts and Delays of Operations in Design II 16×Inversion over LT1 LT2 T (·) Key Total GF((24 )2 ) [12, 13, 20, 21] Addition Per Round Gate Count 2384 792 304 208 128 3816 Delay (gate levels) 14 5 3 3 1 20 Operations
Figure 8 compares the estimated performance of the two designs of Figure 6. Design I uses the MDS mapping implementation method and S-box model discussed in Sections 2 and 4 directly (while “Design I (Ref.)” uses the reference model in [16] for the S-boxes). In Design II, the method discussed in the Appendix is used to deduce the linear transformations LT1 and LT2. As Figure 8 shows, Design II gains a delay reduction of 16% and a slight reduction in the number of gates compared with the implementation of [20]. Design I is a much faster implementation with about three times as many gates. The round structures of the two Rijndael designs have been coded in VHDL and synthesized by using Synopsys Design Compiler and TSMC’s 0.18 µm CMOS cell library. Setting constraints to tradeoff area and delay during synthesis, we get the characteristic curves shown in Figure 9. The two end points of each curve represent the synthesis results with smallest delay and area. In line with our performance evaluation, Design I can lead to an iterative cipher architecture with a throughput up to 4 Gbits/sec (i.e., the smallest round critical path is 3.04 ns). On the other hand, Design II is useful for an area-restricted or pipelined application because of its small area requirement.
6
Conclusions
We have presented a mechanism to select the MDS mappings for optimal hardware implementation of a block cipher. The optimized MDS mapping straightforwardly leads to a compact and fast implementation at the gate level. As well, a general model of invertible S-boxes is proposed and the upper bounds of the
500%
-1
T (⋅)
400%
Affine A(Q Function ij)
T -1(⋅)
300%
ShiftRow
Affine A(Q Function ij)
200%
MixColumn
ShiftRow
100%
T(⋅) (a) LT1
0%
(b) LT2
Fig. 7. Linear Transformations in Design II
Design in [20]
Design I (Ref.)
Design I
Design II
Gate count
100%
423%
337%
97%
Delay
100%
74%
62%
84%
Fig. 8. Performance Comparison
Hardware Design and Analysis of Block Cipher Components
179
400000 350000 300000 250000 Area 200000 (µm2) 150000 100000 50000 0 0
5
10
15
20
Delay (ns) Design I
Design II
Fig. 9. Synthesis of Round Structure minimal hardware complexity are deduced through systematic logic minimization. Since S-boxes and MDS mappings are both widely used cipher components, the discussed design, optimization and hardware complexity evaluation provides an analytical basis for studying the hardware performance of block ciphers. As an example, two efficient hardware designs of Rijndael encryption are considered with regards to different tradeoffs between gate count and delay, and their synthesis results are presented.
References [1] C. E. Shannon, “Communication Theory of Secrecy Systems”, Bell System Technical Journal, vol. 28, pp. 656-715, 1949. 164 [2] E. Biham and A. Shamir, “Differential cryptanalysis of DES-like cryptosystems”, Advances in Cryptology - CRYPTO ’90 , Lecture Notes in Computer Science 537, pp. 2-21. Springer-Verlag, 1991. 164, 167 [3] M. Matsui, “Linear Cryptanalysis Method for DES Cipher”, Advances in Cryptology - Eurocrypt ’93, Lecture Notes in Computer Science 765, Springer-Verlag, pp. 386-397, 1993. 164, 167 [4] J. Daemen and V. Rijmen, “AES Proposal: Rijndael”, Advanced Encryption Standard, available on: csrc.nist.gov/encryption/aes/rijndael. 164, 165, 171, 174, 176 [5] K. Ohkuma, H. Muratani, F. Sano, and S. Kawamura, “The Block Cipher Hierocrypt”, Workshop on Selected Areas in Cryptography - SAC 2000, Lecture Notes in Computer Science 2012, Springer-Verlag, pp. 72-88, 2001. 164, 171, 174 [6] P. Barreto and V. Rijmen, “The Anubis Block Cipher”, NESSIE Algorithm Submission, 2000, available on: www.cosic.esat.kuleuven.ac.be/nessie. 164, 171 [7] P. Barreto and V. Rijmen, “The Khazad Legacy-Level Block Cipher”, NESSIE Algorithm Submission, 2000, available on: www.cosic.esat.kuleuven. ac.be/nessie. 164, 171 [8] A. Youssef, S. Mister, and S. Tavares, “On the Design of Linear Transformations for Substitution-Permutation Encryption Networks”, Workshop on Selected Areas in Cryptography - SAC ’97, Ottawa, 1997. 164, 171 [9] K. Aoki, T. Ichikawa, M. Kanda, M. Matsui, S. Moriai, J. Nakajima, and T. Tokita, “Camellia: a 128-bit Block Cipher Suitable for Multiple Platforms”, NESSIE Algorithm Submission, 2000, available on: www.cosic.esat.kuleuven. ac.be/nessie. 166, 174
180
Lu Xiao and Howard M. Heys
[10] J. Nechvatal, E. Barker, L. Bassham, W. Burr, M. Dworkin, J. Foti, and E. Roback, “Report on the Development of the Advanced Encryption Standard (AES)”, Report on the AES Selection from U. S. National Institute of Standards and Technology (NIST), available on: csrc.nist.gov/encryption/aes. 166 [11] V. Rijmen, J. Daemen, B. Preneel, A. Bosselaers, and E. De Win, “The Cipher SHARK”, Workshop on Fast Software Encryption - FSE ’96, Lecture Notes in Computer Science 1039, Springer-Verlag, pp. 99-112, 1997. 166, 171 [12] E. D. Mastrovito, “VLSI Design for Multiplication over Finite Fields GF(2m )”, Applied Algebra, Algebraic Algorithms and Error-Correcting Codes - AAECC-6, Lecture Notes in Computer Science 357, pp. 297-309, 1989. 167, 177, 178 [13] C. Paar, “Efficient VLSI Architectures for Bit-Parallel Computation in Galois Fields”, PhD Thesis, Institute for Experimental Mathematics, University of Essen, Germany, 1994. 167, 170, 177, 178, 181 [14] J. Daemen, L. R. Knudsen, and V. Rijmen, “The Block Cipher Square”, Workshop on Fast Software Encryption - FSE ’97, Lecture Notes in Computer Science 1267, Springer-Verlag, pp. 54-68, 1997. 168, 171 [15] F. J. MacWilliams and N. J. A. Sloane, The Theory of Error-Correcting Codes, North-Holland, Amsterdam, 1977. 168, 169 [16] E. Biham, “A Fast New DES Implementation in Software”, Workshop on Fast Software Encryption - FSE ’97, Lecture Notes in Computer Science 1267, Springer-Verlag, pp. 260-272, 1997. 168, 175, 178 [17] R. Anderson, E. Biham, and L. Knudsen, “Serpent: a Proposal for the Advanced Encryption Standard”, AES Algorithm Submission, available on: www.cl.cam.ac.uk/~rja14/serpent.html. 169 [18] Synopsys, Online Documentation on Synopsys Design Analyzer, 2000. 174 [19] H. Kuo and I. Verbauwhede, “Architectural Optimization for a 1.82Gbits/sec VLSI Implementation of the AES Rijndael algorithm”, Workshop on Cryptographic Hardware and Embedded Systems - CHES 2001, Lecture Notes in Computer Science 2162, Springer-Verlag, pp. 51-64, 2001. 176 [20] A. Rudra, P. K. Dubey, C. S. Jutla, V. Kumar, J. R. Rao, and P. Rohatgi, “Efficient Rijndael Encryption Implementation with Composite Field Arithmetic”, Cryptographic Hardware and Embedded Systems - CHES 2001, Lecture Notes in Computer Science 2162, Springer-Verlag, pp. 171-184, 2001. 176, 177, 178 [21] A. Rudra, Personal Communication. 176, 177, 178 [22] V. Rijmen, “Efficient Implementation of the Rijndael S-box”, available on: www.esat.kuleuven.ac.be/~rijmen/rijndael. 177
Appendix: Implementation of LT1 and LT2 in Rijndael Design II In order to mathematically represent LT1 and LT2, we denote the input state as {Ui,j } and the output state as {Vi,j }, where i denotes the row index and j denotes the column index of an element in the state. The binary coefficients of Ui,j and Vi,j in their polynomial expressions can be written as two tuples Ui,j and Vi,j , respectively. LT1 can be expressed as V0,j FL02 FL03 FL01 FL01 U0,j T (63H) V1,j FL01 FL02 FL03 FL01 U1,j−1 T (63H) (10) V2,j = FL01 FL01 FL02 FL03 U2,j−2 + T (63H) . V3,j FL03 FL01 FL01 FL02 U3,j−3 T (63H)
Hardware Design and Analysis of Block Cipher Components
181
In above equation, FL01 , FL02 , and FL03 are 8×8 submatrices derived from the following expression: FL0i = FT · F0i · FA · FT−1 , i = 1, 2, 3
(11)
where F0i is the product matrix associated with 01H, 02H, or 03H in GF(28 ) and matrix FA is associated with the affine function A(·) inside ByteSub (i.e., A(X ) = FA · X + 63H). FT is the 8×8 transformation matrix associated with T (·)(i.e., T (Ui,j ) = FT · Ui,j ). Its inverse is FT−1 . Similarly, LT2 is a function defined as V0,j U0,j 63H V1,j 63H −1 U1,j−1 (12) V2,j = (FA · FT ) U2,j−2 + 63H . V3,j U3,j−3 63H Once we know the matrices FT , FL0i , and the result of FA · FT−1 (as listed in the following), the gate networks consisting of XORs can be straightforwardly derived for LT1 and LT2. The greedy method I described in [13] is used to reduce redundancy in the gate network, where small modifications are made in order to avoid the increase of delay.
10100000 1 0 1 0 1 1 0 0 1 1 0 1 0 0 1 0 0 1 1 1 0 0 0 0 FT = 1 1 0 0 0 1 1 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 0 11011101 10101011 1 1 1 1 1 0 1 1 0 0 1 1 1 1 1 0 0 1 1 1 0 1 1 0 FL02 = 0 0 1 1 0 0 1 0 1 1 1 0 1 1 1 0 0 0 1 1 1 1 0 0 00001011 10000110 1 1 0 1 0 0 0 0 1 0 0 0 1 1 1 0 0 1 1 1 1 0 1 1 −1 FA · FT = 0 0 0 0 0 1 0 1 0 1 0 1 1 0 0 1 1 0 0 0 1 1 1 1 01100101
FL01
FL03
00001000 0 1 0 1 0 1 0 0 1 0 1 0 0 0 1 0 0 0 1 0 0 1 0 1 = 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 1 0 1 0 00010100 10100011 1 0 1 0 1 1 1 1 1 0 0 1 1 1 0 0 0 1 0 1 0 0 1 1 = 1 0 1 1 0 0 1 0 1 1 0 0 1 0 1 0 1 0 1 1 0 1 1 0 00011111
Higher Order Correlation Attacks, XL Algorithm and Cryptanalysis of Toyocrypt Nicolas T. Courtois CP8 Crypto Lab, SchlumbergerSema 36-38 rue de la Princesse, BP 45, 78430 Louveciennes Cedex, France [email protected] http://www.nicolascourtois.net
Abstract. Many stream ciphers are built of a linear sequence generator and a non-linear output function f . There is an abundant literature on (fast) correlation attacks, that use linear approximations of f to attack the cipher. In this paper we explore higher degree approximations, much less studied. We reduce the cryptanalysis of a stream cipher to solving a system of multivariate equations that is overdefined (much more equations than unknowns). We adapt the XL method, introduced at Eurocrypt 2000 for overdefined quadratic systems, to solving equations of higher degree. Though the exact complexity of XL remains an open problem, there is no doubt that it works perfectly well for such largely overdefined systems as ours, and we confirm this by computer simulations. We show that using XL, it is possible to break stream ciphers that were known to be immune to all previously known attacks. For example, we cryptanalyse the stream cipher Toyocrypt accepted to the second phase of the Japanese government Cryptrec program. Our best attack on Toyocrypt takes 292 CPU clocks for a 128-bit cipher. The interesting feature of our XL-based higher order correlation attacks is, their very loose requirements on the known keystream needed. For example they may work knowing ONLY that the ciphertext is in English. Keywords: Multivariate cryptography, overdefined systems of multivariate equations, MQ problem, XL algorithm, Gr¨ obner bases, stream ciphers, pseudo-random generators, nonlinear filtering, ciphertext-only attacks, Toyocrypt-HR1, Toyocrypt-HS1, Cryptrec.
1
Introduction
The security of most cryptographic schemes is usually based on impossibility to extract some secret information, given access to some encryption, signature oracles or other derived information. In most useful cases, there is no security in information-theoretic setting: the adversary has usually enough information to uniquely determine the secret (or the ability) he wants to acquire. Moreover the basic problem is always (in a sense) overdefined: the adversary is assumed to have at his disposal, for example, great many plaintext and cipher text pairs, message and signature pairs, etc. He usually has available, much more than the P.J. Lee and C.H. Lim (Eds.): ICISC 2002, LNCS 2587, pp. 182–199, 2003. c Springer-Verlag Berlin Heidelberg 2003
Higher Order Correlation Attacks
183
information needed to just determine the secret key. Thus, one might say, most cryptographic security relies on the hardness of largely overdefined problems. In public key cryptography, the problem is addressed by provable security, that will assure that each utilization of the cryptographic scheme does not leak useful information. The security is guaranteed by a hardness of a single difficult problem, and will not degrade with the repetitive use of the scheme. However unfortunately, there is yet very little provable security in secret key cryptography. It is also in secret key cryptography that the problems become most overdefined, due to the amounts of data that are usually encrypted with one single session key. This is especially true for stream ciphers: designed to be extremely fast in hardware, they can encrypt astronomic quantities of data, for example on an optical fiber. In this paper we point out that many constructions of stream ciphers directly give an overdefined system of multivariate equations of low degree. The fact that solving such overdefined systems of equations, is much easier than expected, has been demonstrated at Eurocrypt 2000 by Courtois, Klimov, Patarin and Shamir. Later, the possibility to use multivariate polynomial equations to attack secret key cryptosystems such as AES has been proposed by Courtois and Pieprzyk [7]. Unfortunately, these attacks are, to say the least, heuristic. In this paper we apply similar techniques to stream ciphers. Unlike in the work of Courtois and Pieprzyk, our systems of equations will be much more overdefined. We show that in this case it is possible to predict the behaviour of the XL method with precision and confidence. We attack a large class of stream ciphers in which there is a linear part, producing a sequence with a large period, and a nonlinear part that produces the output, given the state of the linear part. The security of such stream ciphers have been studied by many authors. In [11], Golic gives a set of criteria that should be satisfied in order to resist to the known attacks on stream ciphers. For example, a stream cipher should resist to the fast correlation attack [15], the conditional correlation attack [1] and the inversion attack [11]. In this paper we show that correlation immunity of order one is not sufficient, and show that it is really possible to use (at least in theory) any correlation of any order to mount an attack. Moreover we demonstrate that such attacks can be much faster than exhaustive search for some real stream ciphers, for example for Toyocrypt. The paper is organized as follows: In Section 2 and in Appendix A we study the XL algorithm from [24] for solving multivariate quadratic equations, and extend it to equations of higher degree. In Section 3 we apply XL to the cryptanalysis of stream ciphers. In Section 4 we discuss the opportunity to use bent functions in stream ciphers. Then in Section 5 we apply our attack to Toyocrypt stream cipher.
2
The XL Algorithm
In this paper we describe a rather obvious extension of the XL algorithm proposed by Courtois, Klimov, Patarin and Shamir at Eurocrypt 2000 [24]. In-
184
Nicolas T. Courtois
stead of solving a system of m multivariate quadratic equations with n variables of degree K = 2 as in [24], we consider also higher degree equations, i.e. study the general case K ≥ 2. Let D be the parameter of the XL algorithm. Let li (x0 , . . . , xn−1 ) = 0 be the initial m equations, i = 1 . . . m with n variables xi ∈ GF (2). The XL algorithm consists of multiplying both sides of these equations by products of variables: Definition 2.0.1 (The XL Algorithm). Execute the following steps: k 1. Multiply: Generate all the products j=1 xij · li with k ≤ D − K, so that the total degree in the xi of these equations is ≤ D. 2. Linearize: Consider each monomial in the xi of degree ≤ D as a new variable and perform Gaussian elimination on the equations obtained in 1. The ordering on the monomials must be such that all the terms containing one variable (say x1 ) are eliminated last. 3. Get a Simpler Equation: Assume1 that step 2 yields at least one univariate equation in the powers of x1 . Solve this equation over the finite field (e.g., with Berlekamp’s algorithm). 4. Final Step: It should not be necessary to repeat the whole process. Once the value of x1 is known, we expect that all the other variables will be obtained from the same linear system. We expect that to find one solution to the system, the complexity of XL will be essentially the complexity of one single Gaussian reduction in the step 2. 2.1
The Necessary Condition for XL to Work
The XL algorithm consists of multiplying the initial m equations li by all possible monomials of degree up to D − K, so that the total degree of resulting equations is D. Let R be the number of equations generated in XL, and T be the number of all monomials. We have, (the first term is dominant): D−K D n n n n ≈m· , T = ≈ R=m· i D − K i D i=0 i=0 The main problem in the XL algorithm is that in practice not all the equations generated are independent. Let F ree be the exact number of equations that are linearly independent in XL. We have F ree ≤ R. We also have necessarily F ree ≤ T . The main heuristics behind XL is the following: it can be seen that for some D we have always R ≥ T . Then we expect that F ree ≈ T , as obviously it cannot be bigger than T . More precisely, following [24], when F ree ≥ T − D, it is possible by Gaussian elimination, to obtain one equation in only one variable, and XL will work. Otherwise, we need a bigger D, or an improved algorithm2 . 1
2
Improved versions of the XL algorithm exist in which the system can still be solved even if this condition is not satisfied, see the FXL algorithm [24], and XL’ and XL2 methods described in [6]. We do not need these improvements here. Improved versions of XL exist [24, 6], see the footnote 1 on the previous page.
Higher Order Correlation Attacks
185
The Saturation Problem in XL The exact value of F ree in XL is somewhat complex to predict. In [24] authors demonstrate that XL works with a series of computer simulations for K = 2 and over GF (127). In [6] authors show that it also works very well for K = 2 and over GF (2). Moreover they explain how to predict the exact number F ree of linearly independent equations in XL. In this paper we extend the study of XL for higher degree equations K > 2 (still over GF (2)), and will also give a formula that allows to compute F ree (see Conjecture A.3.1). This will allow us to say that our (cryptanalytic) applications of XL should work exactly as predicted. In order to XL algorithm to work, it is sufficient that for some D, the number F ree of linearly independent equations satisfies F ree ≥ T −D. In [19], Moh states that ”From the theory of Hilbert-Serre, we may deduce that the XL program will work for many interesting cases for D large enough”. In Section 4 Moh shows a very special example on which the basic version2 of XL always fails, for any D [19]. This example is very interesting, however it seems that such counter-example does not exist when XL is done over a small finite field, see [13] and [6]. Remark: In Section 3 of [19] Moh gives another misleading argument. He assumes D n in a formula in which D = O( √nm ). He shows that, apparently 1 F ree/R ≈ (n+D)(n+D−1) = w, and it is obvious that w → m when D → ∞. D(D−1)m However in XL, D is never as big as n, if we assume that we have D ≈ √nm as in the previous section, we get w ≈ 1. The conclusion of Moh is inappropriate, not to say incorrect. According to [13], when D is sufficiently big, we will always have F ree = T − α, with α being the number of solutions to the system3 . Therefore for systems over GF(2) that have one and unique solution we expect to always achieve F ree = T − 1 > T − D, which is called saturation4 . In all our simulations we observed that this saturation, necessary to solve the system, is achieved very quickly, and in fact as soon as R > T .
2.2
Asymptotic Analysis of XL for Equations of Degree K
We assume that D n. XL algorithm is expected to succeed when R ≥ T , i.e. when n n (n − D + K) · · · (n − D + 1) m ≥ ⇒ m≥ D(D − 1) · · · (D − K + 1) D−K D 3 4
Here however one should include also the solutions at infinity. Such solutions do not exist when the equations of the field x2i = xi are included in XL, see [6] and [13]. It is easy to show that F ree = T is impossible for a system that has a solution, and more generally if α is the number of solutions (including points at infinity, see the footnote 3), one always has F ree ≤ T − α in XL, cf. [13].
186
Nicolas T. Courtois
Thus (assuming that D n) we get: D≈
n m1/K
, and T
ω
ω n ≈ D
≈
n n m1/K
ω
Asymptotically this is expected to be a good evaluation, when m = εnK with a constant ε > 0. The Complexity of XL and Gaussian Reduction Let ω be the exponent of the Gaussian reduction. In theory it is at most ω ≤ 2.376, see [4]. However the (neglected) constant factor in this algorithm is expected to be very big. The fastest practical algorithm we are aware of, is Strassen’s algorithm that requires about 7 · T log2 7 operations. Since our basic operations are over GF (2), we expect that a careful bitslice implementation of this algorithm on a modern CPU can handle 64 such operations in one single CPU clock. To summarize, we evaluate the complexity of the Gaussian reduction to be 7/64 · T log2 7 CPU clocks. The Exact Behaviour of XL for Interesting Cases K ≥ 2. In this paper we do not use any of the above approximations. We study the exact behaviour of XL, and compute the exact values of F ree for the interesting values of K and D. This part is in Appendix A.
3
Application of XL to Stream Ciphers
In this part we outline a general strategy to apply XL in cryptanalysis of a general class of stream ciphers. Later we will apply it to Toyocrypt. 3.1
The Stream Ciphers that May Be Attacked
We consider only synchronous stream ciphers, in which each state is generated from the previous state independently of the plaintext, see for example [17]. We consider regularly clocked stream ciphers, and also (it makes no difference) stream ciphers that are clocked in a known way5 . For simplicity we restrict to binary stream ciphers in which the state and keystream are composed of a sequence of bits bi , generating one bit at a time. Let L be the ”connection function” that computes the next state. We restrict to the (very popular) case of cipher with linear feedback, i.e. when L is linear over GF (2). We assume that L is public, and only the state is secret. We also assume that the function f that computes the output bit from the state is public and does not depend on the secret key of the cipher. The only no-linear component of the cipher is f and this way of building stream ciphers is sometimes called 5
This condition can sometimes be relaxed, see the attacks on LILI-128 in [5].
Higher Order Correlation Attacks
187
”nonlinear filtering”. It includes the very popular filter generator, in which the state of a single LFSR6 is transformed by a boolean function, and also not less popular scenarios, in which outputs of several LFSR are combined by a boolean function (combinatorial function generators or nonlinear function generators). The problem of cryptanalysis of such a stream cipher can be described as follows. Let (k0 , . . . , kn−1 ) be the initial state, then the output of the cipher (i.e. the keystream) is given by: b0 = f (k0 , . . . , kn−1 ) b1 = f (L (k0 , . . . , kn−1 ))
b2 = f L2 (k0 , . . . , kn−1 ) .. . The problem we consider7 is to recover (k0 , . . . , kn−1 ) given some bi . 3.2
The Attack Scenario
We are going to design a partially known plaintext attack, i.e. we know some bits of the plaintext, and the corresponding ciphertext bits. These bits does not need to be consecutive. For example if the plaintext is written with latin alphabet and does not use too much special characters, it is very likely that all the characters have their most significant bit equal to 0. This will be enough for us, if the text is sufficiently long. In our later attacks we just assume that we have some m bits of the keystream at some known positions: {(t1 , bt1 ), . . . , (tm , btm )}. Remark: Even if no bit of plaintext is known, there are many cases in which our attack can be extended. For example if the plaintext contains parity bits. 3.3
Criteria on the Function f
Let f be the boolean function8 that is used to combine the bits of the linear part of a stream cipher (the entries of the function are for example some bits of the state of some LFSR’s). There are many design criteria known on boolean functions. Some of them are clearly justified, for example a function should be balanced in order to avoid statistical attacks. Some criteria are not well justified, no practical attacks are known when the function does not satisfy them, and they are used rather to prevent some new attacks. It is obvious that for stream ciphers such as described above, the function f should be non-linear. The abundant 6
7 8
A Linear Feedback Shift Register, see for example [17]. It is also possible to use a Modular LFSR, i.e. a MLFSR, which is equivalent in theory, see, [18], but may be better in practice. A MLFSR is used in the Toyocrypt cipher that we study later. We do not consider attacks in which one can predict the future keystream, given some information on the current keystream, and without computing the key. We describe an attack with a single boolean function f , still it is easy to extend it to stream ciphers using several different boolean functions.
188
Nicolas T. Courtois
literature on fast correlation attacks implies also that it should be highly nonlinear9 and also correlation immune at order 1. Similarly, f should have high order (i.e. an algebraic normal form of high degree), to prevent algebraic attacks and finally, a ”good” boolean function should also be correlation immune at high order, as pointed out in [3, 12]. However up till now, no practical and non-trivial attacks on stream ciphers were published, when a function is of high degree, but not higher-order correlation immune. In this paper we design such a general attack based on the XL algorithm, and show that it can be successfully applied to Toyocrypt. Our attack works in two cases: S1 When the boolean function f has a low algebraic degree K. S2 When f can be approximated10 with good probability, by a function g that has a low algebraic degree K. More precisely, we assume that: 1. with probability ≥ 1 − ε f (s0 , .., sn−1 ) = g(s0 , .., sn−1 ) holds: 2. and with g of degree K. Note: In the first scenario S1, when f has just a low
algebraic degree, it is known n keystream bits. A successful that the system can be easily broken given K example of this attack is described for example in [2]. In this paper we show that, since in S2, we do not need for the function to have a low algebraic degree (S1), successful attacks can be mounted given much less keystream bits, and with much smaller complexities. For example in Toyocrypt the degree of f is 63, but in our attacks it will be approximated by a function of degree 2 or 4. 3.4
The Actual Attack
Given m bits of the keystream, we have the following m equations to solve:
∀i = 1 . . . m, bti = f Lti (k0 , . . . , kn−1 ) We recall that f , and all the Lti are public, and only the kj are secret11 . Each of the keystream bits gives one multivariate equation of degree K, with n variables (k0 , .., kn−1 ) and being true with probability (1 − ε):
∀i = 1 . . . m, bti = g Lti (k0 , . . . , kn−1 ) with probability ≥ 1 − ε If we choose m such that (1 − ε)m ≥ 12 , we may assume that all these equations are true and we have to find a solution to our system of m multivariate equations 9 10
11
But maybe not perfectly non-linear, see Section 4. If such a (sufficiently good) approximation exists, there are efficient algorithms to find it. This problem is also known as ”learning polynomials in the presence of noise”, or as ”decoding Reed-Muller codes”. See for example [3, 12, 9]. Important: If L is not public, as it is may be the case in Toyocrypt, our later attacks will not work. Nevertheless they show that Toyocrypt is cryptographically weak.
Higher Order Correlation Attacks
189
of degree K with n variables. More generally, even if (1 − ε)m < 12 , the attack still works, if we repeat it about (1 − ε)−m times, each time for a different subset of m keystream bits, and until it succeeds. The complexity of this attack will be the complexity of generalized XL obtained in Section 2.2, multiplied by the number of repetitions necessary to succeed: ω n ω −m W F = T (1 − ε) ≈ (1 − ε)−m n m1/K The above attack requires about m keystream bits, out of which we choose m at each iteration of the attack. We also need to choose m that minimizes the complexity given above. In practice, since the XL algorithm complexity increases increases by big leaps, with the value of D, we will in fact choose D and determine a minimal m for which the attack works.
4
Non-linear Filtering Using Bent Functions
In order to prevent the numerous known fast correlation attacks, ciphers such as we described above (for example filter generators) should use a function f that is highly non-linear. For this, Meier and Staffelbach suggested at Eurocrypt’89 to use so called perfect non-linear functions, also known as ”bent functions” [16, 22]. These functions achieve optimal resistance to the correlation attacks, because they have a minimum (possible) correlation to all affine functions, see Theorem 3.5. in [16]. It is therefore tempting to use a bent function as a combiner in a stream cipher. And indeed many cryptographic designs (e.g. Toyocrypt, and not only in stream ciphers) use such functions, or modified versions of such functions12 . Unfortunately optimality against one attack does not guarantee the security against other attacks. Following Anderson [1], any criteria on f itself cannot be sufficient. The author of [1] claims that ”attacking a filter generator using a bent or almost bent function would be easy” and shows why on small examples. He considers ”an augmented function” that consists of α copies of the function f applied to consecutive windows of n consecutive bits, among the n + α consecutive bits of an LFSR output stream. He shows explicit examples in which even if f : GF (2)n → GF (2) is a bent function, still the augmented function GF (2)n+α → GF (2)α will have very poor statistic properties, and thus will be cryptographically weak. For real ciphers, it is difficult to see if Anderson’s remark is really dangerous. For example in Toyocrypt, an MLFSR is used instead of an LFSR, which greatly decreases the number of common bits between two consecutive states, and more importantly, only a carefully selected subset of state bits is used in 12
In general the authors of [16] did not advocate to use pure bent functions, because it is known that these functions are not balanced and cannot have a very high degree. They advise to use modified bent functions, for which it is still possible to guarantee a high non-linearity, see [16].
190
Nicolas T. Courtois
each application of f . Thus it seems that Toyocrypt makes any version of the attacks described by Anderson in [1] completely impractical. Bent Function Used in Toyocrypt The combining function f of Toyocrypt is built according to: Theorem 4.0.1 (Rothaus 1976 [22]). Let g be any boolean function g : GF (2)k → GF (2). All the functions f : GF (2)2k → GF (2) of the following form are bent: f (x1 , x2 , . . . , x2k )
= x1 x2 + x3 x4 + . . . , +x2k−1 x2k + g (x1 , x3 , . . . , x2k−1 )
Remark: More precisely, the function of Toyocrypt is a XOR of s127 and a function built according to the above theorem. We must say that using such a function as a non-linear filter is not a very good idea. It is easy to see that if we use a single LFSR or MLFSR, there will be always a ”guess and find” attack on such a cipher. This is due to the fact that if we guess and fix k state bits, here it will be the odd-numbered bits, then the expression of the output becomes linear in the other state bits. This can be used to recover the whole state of the cipher given 3k/2 bits of it, i.e. the effective key length in such a scheme is only 3k/2 instead of 2k bits. This attack is explained in details (on the example of Toyocrypt) in [18]. In this paper we do not use this property of f , and design a different attack, based on the low number of higher degree monomials, and thus being potentially able to break variants of Toyocrypt that are not based on the above theorem and for which there is no ”guess and find” attacks.
5
Application of XL to the Cryptanalysis of Toyocrypt
In this section we present a general attack on Toyocrypt [18], a cipher that was, at the time of the design, believed to resist to all known attacks on stream ciphers. In Toyocrypt, we have one 128-bit LFSR, and thus n = 128. The boolean function is as follows: f (s0 , .., s127 ) = s127 +
62
si sαi + s10 s23 s32 s42 +
i=0
+s1 s2 s9 s12 s18 s20 s23 s25 s26 s28 s33 s38 s41 s42 s51 s53 s59 +
62
si .
i=0
with {α0 , . . . , α62 } being some permutation of the set {63, . . . , 125}. This system is quite vulnerable to the XL higher order correlation attack we described above: there are only a few higher-order monomials: one of degree 4, one of degree 17 and one of degree 63. Everything else is quadratic.
Higher Order Correlation Attacks
191
A Quadratic Approximation Most of the time, the system is quadratic. We put: g(s0 , .., s127 ) =
62 i=0
si sαi .
Then f (s) = g(s) holds with probability about 1 − 2−4 . With the notations of the Section 3.4 we have K = 2 and ε = 2−4 . Currently, it is an open problem if this approximation allows any efficient attacks on Toyocrypt. An Approximation of Degree K = 4 One can also see that if we put: g(s0 , .., s127 ) =
62
si sαi + s10 s23 s32 s42 .
i=0
Then f (s) = g(s) holds with probability very close to 1 − 2−17 . We have K = 4 and we have approximatively ε = 2−17 . 5.1
Our Higher Order Correlation Attack on Toyocrypt
The equation (1−ε)m ≈ 12 gives m ≈ 216 . This is simply to say that if we consider some 216 , not necessarily consecutive bits of the keystream, the probability that for all of them we have f (s) = g(s) will be about 1/2. A more precise evaluation shows that if we put m = 1.3 · 216 , we still have (1 − ε)m = 0.52. This is the value we are going to use. Thus, given some m keystream bits, m = 1.3 · 216 , one can write from Toyocrypt m equations of degree 4 and with 128 variables ki . To this system of equations we apply generalized XL as described in Section 2. We have n = 128 and let D ∈ IN. We multiply each of the m equations by all products of up to D−4 n We D−4 variables ki . The number of generated equations is: R = m i=0 i D n also have T = . We observe that for D = 9 we get R/T = 1.1401. i=0 i Following our simulations and their analysis given in Section A.3, and since D < 3K, we expect that
the exact number of linearly independent equations is m F ree = min(T, R − 2 − m) − with a very small m.
This F ree is sufficient: m we − m))/T = 1.13998, and thus R − − m > T and R − have (R − m 2 2 2 −m is not very close to T . From this, following Conjecture A.3.1 and our simulation results, we expect that F ree = T − with = 1. XL works for D=9. The complexity of the attack is basically the complexity of solving a linear system T×T (we don’t need to take more than T equations). With Strassen’s algorithm, we get: 7 · T log2 7 = 2122 . WF = 64
192
6
Nicolas T. Courtois
Improved XL Higher Correlation Attacks
We will now explore the tradeoff described in Section 3.4. The basic idea is that, if we diminish a little bit a success probability of the attack, we may use a higher m, the system will be more overdefined and we will be able to use a lower value of D. This in turn greatly diminishes the value of T that may compensate for the necessity to repeat the attack. Improved Attacks Exploring the Tradeoff m In the attack above m we saw that F ree = min(T, R− 2 −m)− and that we may in fact neglect 2 − m. Moreover if D becomes smaller, and when D < 2K = 8, following Section A.3 we expect to have F ree = min(T, R) − 1. Thus we may say that for D < 9, and R > 1.1 · T the attack does certainly work. It gives the following condition on m: D D−4 n n > 1.1 · m i i i=0 i=0 From this, given D, we put m = 1.1
D n D−4 n / . The probability i=0 i i=0 i
that our approximation of degree 4 holds for all m equations is (1 − Finally, the complexity of the whole attack is: 1 7 1 · W F = (1 − 17 )−m · 7 · T log2 7 /64 = (1 − 17 )−m · 2 2 64
1 m 217 ) .
D log2 7 n i=0
i
The number of keystream bits required in the attack is about m, and the memory is T 2 bits. In the following table we show possible tradeoffs: D 4 Data 223 Memory 289 Complexity 2200
5 221 256 2102
6 219 265 296
7 218 273 2102
8 217 281 2112
9 216 288 2122
Now, our best attack is in 296 , requires 265 bits of memory and only 82 kilobytes of keystream. Better Attacks with an Iterated Variant of XL It is possible to improve this attack slightly by iterating the XL algorithm. Here is one possible way to do this. We start with m = 1.6 · 218 keystream bits. The probability that all the corresponding m approximations of degree 4 are true is (1 − 2117 )m ≈ 2−4.62 . This means that the whole attack should be repeated on average 24.62 times. Now we apply the XL algorithm with D = 5, i.e. we multiply
Higher Order Correlation Attacks
193
each equation by nothing or one of the variables. We have R = 129 · 1.6 · 218. The goal is however not to eliminate most of the terms, but only all the terms that contain one variable k0 . Let T be the of terms in does
not contain number TD that D . The number the first variable k0 . We have T = i=0 ni and T = i=0 n−1 i of remaining equations of degree K = 5 that contain only n = 127 variables 5 127 25.37 + = 2 . We have is R − (T − T ) = 129 · 1.6 · 218 − 5i=0 128 i=0 i i R /(T − T ) = 5.06 and the elimination takes the time of 7 · T log2 7 /64 = 275.5 . Then we re-apply XL for K = 5, n = 127, m = R − (T − T ) = 225.37 and 87.59 D = 6. We have R /T = 1.021 and XL works with the complexity
of 292.2 . 4.62 75.5 87.6 2 = 2 The complexity of the whole attack is: 2 +2 CPU clocks. Our best attack is now in 292 , it requires still 265 bits of memory, and now only 51 kilobytes of keystream. Comparison with Previously Known Attacks Our new attack is much better than the generic purpose time/memory/data tradeoff attack described by Shamir and Biryukov in [23], that given the same number of keystream bits, about 219 , will require about 2109 computations (in pre-computation phase). Our attack is sometimes better, and sometimes worse than the Mihaljevic and Imai attack from [18]. In [18], given much more data, for example 248 bits, and in particular at least some 32 consecutive bits of the keystream, and given the same quantity of memory 264 , the key can be recovered with a pre-computation of 280 and processing time 232 . However if the keystream available does not contain 32 consecutive bits, only our attack will work. Similarly, if the keystream available is limited to 219 bits, both the Mihaljevic and Imai attack [18] and the generic tradeoff attack from [23] will require a pre-computation of about 2109 . In this case our attack in 292 is better.
7
Extensions and Generalizations
Improved Elimination Methods. A careful implementation of our attack could be substantially faster. It is possible that there are more careful elimination algorithms, that generate the equations in a specific order and eliminate monomials progressively, so that they are not generated anymore. We also expect that fast Gr¨ obner bases algorithms such as Faug`ere’s F5/2 [8] would improve our attack, at least in practice. Variants of Toyocrypt. Our XL-based attacks can cryptanalyse not only Toyocrypt but also many variants of Toyocrypt that resist to all known attacks. For example, if in Toyocrypt we replace the bilinear part of f by a random quadratic form, such ”guess-and-find” attacks as in [18] are not possible anymore, still our XL-based higher degree correlation attack works all the same. The same is true when we leave the quadratic part unchanged and add to f some terms
194
Nicolas T. Courtois
of degree 3 and 4 in variables x2 , x4 , . . .. It is also possible to see that, if the positions of the known bits of the keystream are sparsely distributed, and we do not have any known 32 consecutive bits, the attacks from [18] will not work anymore, and our attack still works. New Attack Scenarios S3 and S4. Since this paper was written, there was substantial progress in algebraic attacks on stream ciphers. Generalizing the attack scenarios S1 and S2 described in this paper, two new attack scenarios S3 and S4 have been introduced by Courtois and Meier [5]. The principle of these new attacks is (roughly) to generate new multivariate equations of substantially lower degree than the original ones, by multiplying the equations by well-chosen multivariate polynomials. Thus, the authors are able to break Toyocrypt in 249 CPU clocks instead of 292 , and also present an attack in 257 for LILI-128.
8
Conclusion
In this paper we studied higher order correlation attacks on stream ciphers. Our approach is to reduce the problem of recovering the (initial) state of a cipher, to solving an overdefined system of multivariate equations. We studied the Toyocrypt stream cipher, accepted to the second phase of the Japanese government Cryptrec call for cryptographic primitives. It is a 128-bit stream cipher, and at the time of submission of Toyocrypt, it was claimed to resist to all known attacks on stream ciphers. The weakness of Toyocrypt we exploited here is the presence of only a few higher degree monomials. It has already been identified as dangerous in Rueppel’s book [21], page 79, back in 1986, however the designers of Toyocrypt ignored this warning. Having little higher degree monomials, it is possible to approximate the filtering function, by a function of a much lower degree with a good probability. From this we were able to reduce the cryptanalysis of Toyocrypt to solving a system of multivariate equations of degree 4. In order to solve it, we studied an extension of the XL algorithm proposed at Eurocrypt 2000 for the case of quadratic equations [24]. The problem about XL is that it is heuristic, not all equations that appear in XL are linearly independent, and thus it is somewhat difficult to say to what extent is works. In this paper we showed that we are always able to explain the origin of the linear dependencies that appear in XL and to predict the exact number of non-redundant equations in XL. Our best higher order correlation attack on Toyocrypt requires 292 CPU clocks for a 128-bit cipher. This is achieved using only 51 kilobytes of the keystream, that does not have to be consecutive, and using 265 bits of memory. This attack will work in many scenarios in which all known attacks fail, for example when the plaintext in only partially known. We conclude that higher order correlation immunity, should be taken more seriously than previously thought, in the design of stream ciphers.
Higher Order Correlation Attacks
195
Acknowledgements This paper has been written following the initial idea suggested by David Wagner. I wish also to thank Willi Meier, Josef Pieprzyk and Greg Rose for helpful remarks, and I’m grateful to Mehdi-Laurent Akkar for writing some useful code for the simulations.
References [1] Ross Anderson: Searching for the Optimum Correlation Attack, FSE’94, LNCS 1008, Springer, pp 137-143. 183, 189, 190 [2] Steve Babbage: Cryptanalysis of LILI-128; Nessie project internal report, available at https://www.cosic.esat.kuleuven.ac.be/nessie/reports/. 188 [3] Paul Camion, Claude Carlet, Pascale Charpin and Nicolas Sendrier, On Correlation-immune Functions; In Crypto’91, LNCS 576, Springer, pp. 86-100. 188 [4] Don Coppersmith, Shmuel Winograd: ”Matrix multiplication via arithmetic progressions”; J. Symbolic Computation (1990), 9, pp. 251-280. 186 [5] Nicolas Courtois and Willi Meier: Algebraic Attacks on Stream Ciphers with Linear Feedback, preprint, available on demand from [email protected]. 186, 194 [6] Nicolas Courtois and Jacques Patarin, About the XL Algorithm over GF (2); Cryptographers’ Track RSA 2003, San Francisco, April 13-17 2003, LNCS, Springer. 184, 185, 199 [7] Nicolas Courtois and Josef Pieprzyk, Cryptanalysis of Block Ciphers with Overdefined Systems of Equations, to be presented at Asiacrypt 2002, a preprint with a different version of the attack is available at http://eprint.iacr.org/ 2002/044/. 183 [8] Jean-Charles Faug`ere: Computing Gr¨ obner basis without reduction to 0, Workshop on Applications of Commutative Algebra, Catania, Italy, 3-6 April 2002. 193 [9] Oded Goldreich, Ronitt Rubinfeld and Madhu Sudan: Learning polynomials with queries: The highly noisy case, preprint September 13, 1998. A preliminary version appeared in 36th Annual Symposium on Foundations of Computer Science, pages 294-303, Milwaukee, Wisconsin, 23-25 October 1995. IEEE. 188 [10] Michael Garey, David Johnson: Computers and Intractability, a guide to the theory of NP-completeness, Freeman, p. 251. [11] Jovan Dj. Golic: On the Security of Nonlinear Filter Generators, FSE’96, LNCS 1039, Springer, pp. 173-188. 183 [12] Jovan Dj. Golic: Fast low order approximation of cryptographic functions, Eurocrypt’96, LNCS 1070, Springer, pp. 268-282. 188 [13] Mireille Martin-Deschamps, private communication. 185 [14] James L. Massey, Rainer A. Rueppel: Linear ciphers and random sequence generators with multiple clocks, in Eurocrypt’84, LNCS 209, Springer. [15] Willi Meier and Othmar Staffelbach: Fast correlation attacks on certain stream ciphers; Journal of Cryptology, 1(3):159-176, 1989. 183 [16] Willi Meier and Othmar Staffelbach: : Nonlinearity Criteria for Cryptographic Functions; Eurocrypt’89, LNCS 4234, Springer, pp.549-562. 189
196
Nicolas T. Courtois
[17] Alfred J. Menezes, Paul C. van Oorshot, Scott A. Vanstone: Handbook of Applied Cryptography; CRC Press. 186, 187 [18] M. Mihaljevic, H. Imai: Cryptanalysis of Toyocrypt-HS1 stream cipher, IEICE Transactions on Fundamentals, vol. E85-A, pp. 66-73, Jan. 2002. Available at http://www.csl.sony.co.jp/ATL/papers/IEICEjan02.pdf. 187, 190, 193, 194 [19] T. T. Moh: On The Method of XL and Its Inefficiency Against TTM, available at http://eprint.iacr.org/2001/047/. 185 [20] Jacques Patarin: Hidden Fields Equations (HFE) and Isomorphisms of Polynomials (IP): two new families of Asymmetric Algorithms; Eurocrypt’96, pp. 33-48. [21] Rainer A. Rueppel: Analysis and Design of Stream Ciphers, Springer Verlag, New York, 1986. 194 [22] O. S. Rothaus: On ”bent” functions; Journal of Combinatorial Theory, Ser. A, Vol. 20, pp. 300-305, 1976. 189, 190 [23] Adi Shamir, Alex Biryukov: Cryptanalytic Time/Memory/Data Tradeoffs for Stream Ciphers; Asiacrypt 2000, LNCS 2248, Springer, pp. 1-13. 193 [24] Adi Shamir, Jacques Patarin, Nicolas Courtois, Alexander Klimov, Efficient Algorithms for solving Overdefined Systems of Multivariate Polynomial Equations, Eurocrypt’2000, LNCS 1807, Springer, pp. 392-407. 183, 184, 185, 194 [25] Volker Strassen: Gaussian Elimination is Not Optimal; Numerische Mathematik, vol 13, pp 354-356, 1969.
A
The Exact Behaviour of XL for K ≥ 2
Let F ree be the maximum number of equations that are linearly independent in XL algorithm. We will show how to compute F ree exactly and compare the results with computer simulations. In all the simulations that follow, we pick a random system of linearly independent equations yi = fi (x0 , . . . , xn−1 ) of degree ≤ K (non-homogenous). Then we pick a random input x = (x0 , . . . , xn−1 ) and we modify the constants in the system in order to have a system that gives 0 in x, i.e. we write a system to solve as li (x0 , . . . , xn−1 ) = 0, for i = 1, . . . m. A.1
The Behaviour of XL for K = 2 and D = 3
By definition, F ree is smaller than R and cannot exceed T , see Section 2.1. Therefore: F ree ≤ min(T, R) We have done various computer simulations with K = 2 and D = 3. In the following table we fix n and try XL on a random system of m linearly independent equations with growing m and with a fixed D.
Higher Order Correlation Attacks
K 2 2 2 2 2 n 10 10 10 10 10 m 10 14 16 17 18 D 3 3 3 3 3 R 110 154 176 187 198 T 176 176 176 176 176 F ree 110 154 174 175 175
2 2 2 2 2 20 20 20 20 20 20 40 50 60 65 3 3 3 3 3 420 840 1050 1260 1365 1351 1351 1351 1351 1351 420 840 1050 1260 1350
197
2 2 64 64 512 1024 3 3 33280 66560 43745 43745 33280 43744
Fig. 1. XL simulations for K = 2 and D = 3 n m D R T F ree
number of variables. number of equations. we generate equations of total degree ≤ D in the xi . number of equations generated (independent or not). number of monomials of degree ≤ D. number of linearly independent equations among the R equations. XL will work when F ree ≥ T − D.
Results: For K = 2 and D = 3 we observe that most of the time13 F ree = min(T, R) and at any rate, we always have F ree = min(T, R) − with = 0, 1, 2 or 3. A.2
The Behaviour of XL for K = 2 and D = 4
When D = 4 we do not have F ree = min(T, R) anymore. However most of the equations are still linearly independent. K 2 2 2 n 10 10 10 m 5 10 11 D 4 4 4 R 280 560 616 T 386 386 386 F ree 265 385 385
2 2 2 2 2 2 20 20 20 20 20 20 20 24 28 30 32 36 4 4 4 4 4 4 4220 5064 5908 6330 6752 7596 6196 6196 6196 6196 6196 6196 4010 4764 5502 5865 6195 6195
2 40 128 4 105088 102091 96832
Fig. 2. XL with K = 2 and D = 4 (notations as on Fig. 1) 13
F ree is bounded by two functions and most of the time it is just the minimum of their values. However around the point where the two graphics meet, we sometimes have a ”smooth transition”: we observe that F ree = min(T, R) − with = 0, 1, 2 or 3. Here the smooth transition is visible for K = 2, n = 10, m = 16, D = 3.
198
Nicolas T. Courtois
Results: From these simulations, it can be seen that K = 2 and D = 4 we have always: m F ree = min T, R − − m − with = 0, 1, 2 or 3. 2 m The fact that F ree = R − 2 − m − when R − m 2 − m ≤ T , suggests m that, in all cases, there are 2 + m linear dependencies between the equations in R. We are able to explain the origin (and the exact number) of these linear dependencies. Let li be the equations taken formally (not expanded), and let [li ] denote the expanded expression of the left side of these equations as quadratic polynomials. Then we have: li [lj ] = [li ]lj For each i = j, the above equation a linear dependency between the
defines dependencies. equations of XL. This explains the m 2 Example: For example if l1 = x1 x3 + x4 (which means that the equation l1 is x1 x3 + x4 = 0) and l5 = x2 x1 + x4 x7 then the notation l1 [l5 ] = [l1 ]l5 denotes the following linear dependency between the li xj xk : l1 x2 x1 + l1 x4 x7 = l5 x1 x3 + l5 x4 . There also other dependencies. They come from the fact that we have: li [li ] = li This explains the remaining m dependencies. For example if l1 = x1 x3 + x4 we obtain that: l1 = l1 x1 x3 + l1 x4 . A.3
Tentative Conclusion on XL and More Simulations for K ≥ 2 and D ≥ 4
From the above simulations, we see that, at least for simple cases, we are always able to predict the exact number of linearly independent equations that will be obtained. From the above simulations we conjecture that: Conjecture A.3.1 (Behaviour of XL for D < 3K). 1. For D = K . . . 2K − 1 there are no linear dependencies when R ≥ T and we have F ree = min(T, R) − with = 0, 1, 2 or 3. 2. For D = 2K . . . 3K −1 there arelinear dependencies and we have D−2K
n m − with = 0, 1, 2 or 3. F ree = min T, R − i 2 +m m
i=0 The factor 2 + m is due to the linear dependencies of type li [lj ] = [li ]lj and li [li ] = li as explained above. Moreover when D > 2K there are other linear dependencies that are products of these by monomials in xi of degree up to D−2K,and to count these we have multiplied their number by a factor D−2K n . i=0 i 3. It is also possible to anticipate what happens for D ≥ 3K. However, it is more complex, and in this paper we do not need to know this.
Higher Order Correlation Attacks
199
Theory vs. Practice Here is a series of simulations with different K > 2 and different values of D to see if our conjecture is verified in practice. K 3 3 3 3 3 3 n 10 10 10 10 10 10 m 10 10 10 10 10 10 D 3 4 5 6 7 8 R 10 110 560 1760 3860 6380 T 176 386 638 848 968 1013 F ree 10 110 560 846 966 1011
3 3 3 3 3 16 16 16 16 16 16 16 16 16 16 3 4 5 6 7 16 272 2192 11152 40272 697 2517 6885 14893 26333 16 272 2192 11016 26330
Fig. 3. XL with K = 3 (notations as on Fig. 1) K 4 4 4 4 4 4 4 n 10 10 10 10 10 10 10 m 10 10 10 10 10 10 10 D 4 5 6 7 8 9 10 R 10 110 560 1760 3860 6380 8480 T 386 638 848 968 1013 1023 1024 F ree 10 110 560 966 1011 1021 1022
4 4 4 4 4 16 16 16 16 16 16 16 16 16 16 4 5 6 7 8 16 272 2192 11152 40272 2517 6885 14893 26333 39202 16 272 2192 11152 39200
Fig. 4. XL with K = 4 (notations as on Fig. 1) By inspection we see that these results, all our previous simulations, as well as those done in [6], always do confirm the Conjecture A.3.1.
On the Efficiency of the Clock Control Guessing Attack Erik Zenner Theoretische Informatik, University of Mannheim (Germany) [email protected]
Abstract. Many bitstream generators are based on linear feedback shift registers. A widespread technique for the cryptanalysis of those generators is the linear consistency test (LCT). In this paper, we consider an application of the LCT in cryptanalysis of clock-controlled bitstream generators, called clock control guessing. We give a general and very simple method for estimating the efficiency of clock control guessing, yielding an upper bound on the effective key length of a whole group of bitstream generators. Finally, we apply the technique against a number of clock-controlled generators, such as the A5/1, alternating step generator, step1-step2 generator, cascade generator, and others. Keywords: Block/Stream Ciphers, Cryptographic Primitives.
1
Introduction
Pseudorandom bitstream generators are an important building block in modern cryptography. The design goal is to expand a short key into a long bitstream that is indistinguishable from a true random sequence by all computational means. In most cryptographic applications, the resulting pseudorandom bit sequence is added modulo 2 to the plaintext bit sequence. In this paper, we consider the typical cryptanalytic situation for bitstream generators. The cryptanalyst is assumed to know the complete algorithmic description of the generator, with the exception of the inner state. Given a piece of bitstream, his goal is to reconstruct an initial state of the generator such that the generator’s output is identical to the known bitstream. Many bitstream generators are based on linear feedback shift registers (LFSR). An LFSR implements a linear recursion, transforming a short initial state into a long bit sequence. If the feedback recursion is chosen such that the corresponding feedback polynomial is primitive, the resulting sequence displays good statistical properties. In particular, all short substrings of length l occur with probability of almost exactly 2−l . Throughout the rest of the paper, we assume that sequences generated by LFSR have this property.1 1
This work was partially supported by the LGF Baden-W¨ urttemberg. For more details on LFSR, refer to [9].
P.J. Lee and C.H. Lim (Eds.): ICISC 2002, LNCS 2587, pp. 200–212, 2003. c Springer-Verlag Berlin Heidelberg 2003
On the Efficiency of the Clock Control Guessing Attack
201
Nonetheless, a simple LFSR is an easy target for a cryptanalyst. Since the sequence generated by the LFSR is linear, recovering the initial state is only a matter of solving a system of linear equations. Thus, LFSR must be employed in a more involved fashion, adding some non-linearity to the bitstream sequence. One way of achieving this goal is clock-control. Clock-controlled generators do not clock all of their LFSR once per master clock, but rather use some irregular clocking instead. This way, the linearity of the resulting bit sequences is destroyed. Purpose of the Paper: In practical cipher design, the most widespread technique is what Rueppel [15] denoted by the system-theoretic approach. Under this design paradigm, a cipher is considered secure for practical purposes if (1) it meets all previously known cryptanalytic design principles and (2) passes a public evaluation phase. Since failure in step (2) is very costly, it is paramount for a cipher designer to take step (1) very seriously. In order to do so, a toolbox of generic design principles and attacks would be helpful. However, in the case of stream ciphers, this toolbox contains only a few universal techniques, such as pseudorandomness and nonlinearity tests, correlation attacks or time-memory tradeoffs. Actually, there is a wealth of research on stream cipher cryptanalysis available. However, more often than not, those attacks target concrete stream ciphers, with the generalisation being left to the cipher designer. Thus, it is the aim of this paper to provide a generalised technique for stream cipher cryptanalysis. We provide a description of the attack, give a universal running time estimate and set up rules how to protect a cipher against such an attack. Indeed, by this example, we hope to motivate the search for more generic attacks in the field of stream cipher cryptography. Organisation of the Paper: The techique considered in this paper is called clock control guessing. It is a generalisation of the linear consistency test (LCT) presented in [18], which will be reviewed in section 2. Although clock control guessing has been used in prior publications (e.g. in cryptanalysis of the A5/1 stream cipher in [19, 7, 14]), its potential for stream cipher cryptanalysis has not been fully analysed so far. Thus, our purpose is to generalise the technique so that it can be added to the “cryptanalyst’s toolbox”. A general description, along with a set of criteria for the attack to work, is given in section 3. The efficiency of clock control guessing is examined in section 4, yielding our main result: Without any further knowledge about cipher details like LFSR lengths or feedback polynomials, a surprisingly simple upper bound on the efficient key length of all involved generators can be derived. In section 5, we review the clock control guessing attack against A5/1 and give some experimental results, showing that the practical security of the generator against clock control guessing almost exactly coincides with the theoretical upper
202
Erik Zenner
bound. Section 6 applies the technique to a number of well-known generators. Finally, in section 7, some design recommendations and conclusions are given. On Notation: Throughout the paper, the length of the inner state of a generator will be denoted by L. The initial state S(0) will sometimes be called “key” for simplicity. Each inner state S(t) determines uniquely a clock control behaviour ξt (sometimes referred to as “clocking”) that leads to the inner state S(t + 1). From the inner states S(0), S(1), . . ., the generator derives a bitstream that is denoted by y = (y0 , y1 , . . .). ξ0
ξ1
ξ2
S(0) −→ S(1) −→ S(2) −→ . . . When LFSR are used, they are denoted by A, B and C. LFSR A has length |A| and generates a sequence a = (a0 , a1 , . . .); similarly for LFSR B and C. Finally, by log(x) we denote the base-2 logarithm log2 (x).
2
LCT and Adaptive Bit Guessing
Linear Consistency Tests: In [18], the linear consistency test (LCT) was formally introduced. The basic technique as given in figure 1 has been employed against many bitstream generators, such as the Shrinking Generator [4] or the Summation Generator [5]. However, the term “linear consistency test” is hardly used, many authors preferring the more general notion of a “divide and conquer attack”. Usually, the equation system will be in at most L variables. Since in most practical applications, the linear equations can be read from a small precomputed table, each loop of the algorithm in figure 1 takes O(L3 ) computational steps for solving a system of linear equations. Thus, the total running time of the algorithm is in the order of O(L3 · 2|K1 | ) computational steps. Example: As a simple example, consider the alternating step generator [10]. The generator consists of three LFSR C, A and B. For each clock t, the output ct of LFSR C is determined. If ct = 0, clock LFSR A, else clock LFSR B. Finally,
Linear Consistency Test: 1. Choose a particularly useful subkey K1 with |K1 | < L. 2. For all assignments κ for the subkey K1 : 3. Derive the system of linear equations implied by κ. 4. If the equation system is consistent: 5. Output κ as subkey candidate. 6. Else: 7. Discard κ.
Fig. 1. Linear consistency test
On the Efficiency of the Clock Control Guessing Attack
203
add the current output bit of LFSR A and B (modulo 2) and append it to the bitstream. This generator can be attacked by a simple LCT attack. The cryptanalyst guesses the inner state of LFSR C. Now, he can compute the behaviour of the clock control and can form one equation of the form ai ⊕ bj = yt per known output bit. Using the feedback recurrence, he can transform each such equation such that only variables from the starting state of LFSR A and B are being used. Finally, he checks the set of resulting equations for consistency. Thus, the number of linear consistency tests equals 2|C| , taking less than O(L3 ) steps each (while the number of wrong key candidates should be negligibly small, see [18]). This might tempt a cipher designer to choose a large length for LFSR C at the cost of the length of LFSR A and B. However, in sections 3 and 4, we shall see that this is not helpful in building a safer bistream generator. Adaptive Bit Guessing: A variant of the plain LCT technique presented above can be denoted as adaptive bit guessing. It was used, e.g., by Goli´c in [6] in order to break the A5/1 stream cipher, or by Zenner, Krause, and Lucks in [20] for an attack against the self-shrinking generator. The general idea is as follows. Instead of guessing all of the subkey in one go, the subkey bits are guessed one by one, allowing for instant verification of the linear equation system. This yields a backtracking attack on a clearly defined search tree. In many cases, this procedure has the advantage that (if an early contradiction occurs) a whole group of subkey candidates can be discarded at once, severely improving the running time. However, the running time of this attack is determined by the number of search tree nodes that are visited, and this number is often hard to determine in practice.
3
Clock Control Guessing
Clock Control Guessing: In connection with clock-controlled bitstream generators, the LCT technique may be used in a slightly different way, yielding a very simple method of proving an upper bound on the running time. We consider clock control generators that have the following properties: 1. The output bit depends on the inner state of the generator in some linear way. For each clock cycle t and each assignment to the output bit yt , a linear equation q can be given such that the inner state S(t) generates output bit yt iff S(t) is a solution to q. 2. The behaviour of the clock control depends on the inner state of the generator in some linear way. For each clock cycle t and each assignment to the clock control behaviour ξt , a set Q of linear equations can be given such that the inner state S(t) generates the clock control value ξt iff S(t) is a solution to Q. 3. The number of possible behaviours of the internal clock is small.
204 clock 1. 2. 3. 4. 5. 6. 7. 8. 9. 10.
Erik Zenner guess(equ system, clock ctrl, t) Build all linear equations from properties 1 and 2. Add equations to equ system. If (LCT(equ system)=false): Start backtracking. t←t+1 If (t = L): Do exhaustive search on remaining key space. Start backtracking. For all possible clockings ξt : clock guess(equ system, ξt , t).
Fig. 2. Recursive method clock guess
Given a generator that has properties 1-3, we can modify the adaptive bit guessing attack as follows. Instead of guessing individual bits, for each clock cycle t = 0, 1, . . ., we guess the associated clocking ξt . We add all linear equations that follow from output bit yt and clock control ξt to the linear equation system and check for consistency. The recursive method clock guess in figure 2 gives the general idea of the attack. Observation: Note that clock guess implements a depth search on a tree, where each node of the tree contains a system of linear equations. Due to properties 1 and 2, all solutions to the equation system are keys that produce the bitstream y0 , . . . , yt−1 . Consequently, steps 7-8 are only executed for keys that produce the bitstream y0 , . . . , yL−1 . Since this property is only rarely met by random keys, the number of calls to steps 7-8 amongst all calls to clock guess should be a very small integer. Thus, the average effort for steps 7-8 on a single call to clock guess is negligible. Considering that step 1 can be executed by a table lookup on a small precomputed table, it becomes obvious that the running time of one execution of clock guess is dominated by steps 2 and 3. Here, the Gaussian algorithm for linear equation systems can be deployed, yielding an overall effort in O(L3 ) steps per call to clock guess. Alternating Step Generator, Revisited: Applying the clock guessing attack against the alternating step generator, we would first guess c0 , then c1 , c2 and so on2 . Thus, we obtain two linear equations in each round (one for the clock control and one for the output bit) and wait for contradictions to occur. Note that - if LFSR A and B are much shorter than LFSR C - the first linear inconsistencies will occur long before the bit c|C| has been guessed, making clock control guessing much more efficient than a plain LCT attack. 2
Note that for the alternating step generator, the clock control guessing attack is identical to the adaptive bit guessing attack.
On the Efficiency of the Clock Control Guessing Attack
4
205
On the Efficiency of Clock Control Guessing
Estimating the Running Time: As stated above, the running time of backtracking attacks is not easily determined. An important role plays the depth d of the nodes where the first inconsistent linear equation systems occur, and the probability of this event. For more involved bitstream generators, these values are not easily determined. This is also true for the clock control guessing attack. A precise estimate of the running time (i.e., the number of calls to clock guess) is not possible without paying close attention to the details of the cipher considered. The length of the registers, the sparseness of the feedback polynomials, the positions of the output and clock control bits and the choice of the output and clock control function all determine the efficiency of the attack. We can, however, prove a general upper bound for the size of the search tree considered. In order to do this, we assume that the generator meets the following condition: 4. The number of initial states S(0) that are consistent with the first d output bit (d ≤ L) is approximately 2L−d . Note that this condition is met by all properly designed bitstream generators, since otherwise, correlation attacks are easily implemented. Now we can estimate the maximum width of the search tree, using an elegant technique proposed by Krause in [12]. First, we make some simple observations. Observation 1: Consider a node v in the search tree at depth d. Such a node is reached by a sequence c0 , c1 , . . . , cd−1 of guesses for the clock control behaviour. It contains a system V of linear equations derived on the path from the root to the node by using properties 1 and 2 of the generator. The set of solutions to V has the following properties: a) All solutions to V produce the clock control sequence c0 , c1 , . . . , cd−1 . b) All solutions to V produce the bitstream sequence y0 , y1 , . . . , yd−1 . c) If V is consistent, there is at least one solution to V . We say that the node v represents all inner states that are solutions to V , and that v is consistent if V is consistent. As a consequence of property a, no two nodes at depth d represent the same inner state, since different nodes imply different behaviours of the clock control. On the other hand, no node v represents an inner state that is inconsistent with the output bits y0 , . . . , yd−1 . From property 4 of the generator, we know that there are approximately 2L−d solutions in all of the nodes. Since by property c, there are no empty consistent nodes, there can be at most 2L−d consistent nodes at depth d. For low values of d, however, the number of consistent nodes is going to be a lot smaller since each node represents a huge number of inner states.
206
Erik Zenner
Observation 2: On the other hand, the number of nodes in the tree at depth d can never be larger than k d , where k is the number of possible behaviours of the clock control. For small values of d, this estimate will usually be exact, while for larger values of d, the actual tree contains a lot less nodes than indicated by this number. Width of the Search Tree: Observe that the function 2L−d is constantly decreasing in d, while k d is constantly increasing. Since the number of consistent nodes in the tree is indeed upper bounded by both of these functions, the maximum number of nodes at a given depth is upper bounded by min{2L−d, k d }. If we write k d = 2log(k)·d for convenience, the maximum number of nodes must be smaller than 2w with w = L − d, yielding 2w = 2log(k)·(L−w) w = log(k) · (L − w) log(k) L w= log(k) + 1 Thus, the number of consistent nodes in the widest part of the search tree can log(k) not exceed 2λL with λ = log(k)+1 . Note that this is not an asymptotical result; it is perfectly valid to use concrete values for k and L and to calculate the upper bound. Total Running Time: Now that we have obtained an upper bound on the width of the search tree, the total running time is easily determined. Observing that – there are at most two layers with width 2w , that – all layers above those two have at most 2w consistent nodes amongst them, and that – all layers below those two have at most 2w consistent nodes amongst them, we see that the tree has at most 4 · 2w consistent nodes. Observing further that there must be less than k non-consistent nodes for each consistent node, we obtain a maximum of 4·(k+1)·2w ∈ O(2w ) recursive calls to method clock guess. Thus, remembering our observation from section 3, the overall running time must log(k) . be in the order of O(L3 · 2λL ) with λ = log(k)+1 Alternating Step Generator, Concluded: Let us use our new result on the alternating step generator. There are only two options for the clock control, yielding log(k) = log(2) = 1 and thus w = L/2. Consequently, quite independent of the choice of the individual parameters, any implementation of the alternating step generator can be broken by a clock control guessing attack in O(L3 · 20.5L ) steps, yielding an absolute upper bound of 0.5L bit on the efficient key size of this kind of generator. In particular, increasing the length of LFSR C while
On the Efficiency of the Clock Control Guessing Attack
207
Table 1. Clock control and linear equations C (011) (101) (110) (111)
Equation u1 = u2 = u3 u1 = u2 = u3 u1 = u2 = u3 u1 = u2 = u3
decreasing the lengths of LFSR A and B (as proposed in section 2) can not possibly increase security beyond this point. Also note that depending on the choice of the individual parameters, the attack may even be much more efficient.
5
Application: Attacking A5/1
Description of the Cipher: A5/1 is the encryption algorithm used by the GSM standard for mobile phones; it was described in [3]. The core building block is a bitstream generator, consisting of three LFSR with a total length of 64 bit. First, the output is generated as the sum (mod 2) of the least significant bits of the three registers. Then the registers are clocked in a stop-and-go fashion according to the following rule: – Each register delivers one bit to the clock control. The position of the clock control tap is fixed for each register. – A register is clocked iff its clock control bit agrees with the majority of all clock control bits. Clock Control Guessing: As mentioned before, the clock control guessing attack on A5/1 was discussed earlier by Zenner [19], Goli´c [7], and Pornin and Stern [14]. First observe that the A5/1 generator produces 1 output bit per master clock cycle, and that there are 4 different behaviours of the clock control. Let u1 , u2 and u3 denote the contents of the clock control bits for a given clock cycle. Table 1 gives the dependency between u1 , u2 , u3 and the behaviour C of the clock control. Note that equivalent linear equations are easily constructed. Thus, we see that the A5/1 algorithm meets all prerequisites for a successful clock control guessing attack. We simply guess the behaviour of the clock control for each output bit, derive the linear equations and check for consistency. Upper Bounding the Running Time: Applying our estimate technique to the A5/1, we have to observe two facts: 1. The initial state is generated in such a way that only 58 · 264 states are in fact possible. The impossible states can be excluded by a number of simple linear equations (for details, see [6]). Thus, the efficient key length of the inner state is only 64 + log( 58 ) ≈ 63.32 bit.
208
Erik Zenner
Table 2. 40-bit version of the A5/1 generator LFSR A B C
length 11 14 15
feedback polynomial x11 + x2 + 1 x14 + x5 + x3 + x + 1 x15 + x12 + x4 + x2 + 1
clock control tap a6 (in a0 , . . . , a10 ) b7 (in b0 , . . . , b13 ) c8 (in c0 , . . . , c14 )
2. Furthermore, the first output bit is not yet dependent on the clock control. Thus, the efficient key length of the inner state prior to any clock control guessing is further reduced by 1 bit, yielding L ≈ 62.32. For each master clock cycle, 4 possible behaviours of the clock control are possible. Thus, k = 4 and log(k) = 2. Using the estimate from section 4, we conclude that the search tree has a maximum width of 2(2/3)·62.32 ≈ 241.547 nodes. This result coincides with the maximum number of end nodes as given by Goli´c in [7], derived from a more involved analysis. Also note that in the same work, the average number of end nodes was estimated to be 240.1 , as was to be expected: By paying close attention to important details of the generator such as the position of the feedback taps or the length of the registers, an estimate for the tree size can be derived that in most cases will be lower than the general upper bound. Nonetheless, this upper bound gives a first indication of a cipher’s strength by ruling out some weak ciphers without further effort. Test Run on a Small Version: In order to demonstrate the difference between the proven upper bound and the actual running time, we have implemented a 40bit version of the A5/1, featuring the details given in table 2. Again, we observe that the first output bit is not yet dependent on the clock control, yielding 239 candidates for the initial state or an efficient key length of L = 39 bit.3 Thus, we would expect the bounding functions to be 4d and 239−d , yielding a maximum search tree width of 226 . An overall of 120 experiments was conducted, and the results are shown in figure 3. The figure shows the average width of the search trees that were found in the experiments. It also gives the bounding functions 4d and 239−d for convenience. The following observations can be made: – The actual tree width at depth d matches the predicted value of min(4d , 239−d ) surprisingly well. – In the widest part of the tree (d = 14), the actual number of nodes is smaller than the predicted upper bound, which was to be expected. – In the lowest part of the tree (d > 34), the actual number of nodes is larger than predicted by the function 239−d . This is due to the fact that for the A5/1 generator, there is a chance that several inner states map onto the same output sequence, i.e., assumption 4 does not hold for high values of d. 3
For simplicity’s sake, we ignore the fact that only possible.
5 8
· 240 inner states are actually
On the Efficiency of the Clock Control Guessing Attack
209
width(d) 2 39−d
30
2
4d
20
2
2
10
depth d 5
10
15
20
25
30
35
40
Fig. 3. Width of search tree for small A5/1 generator
This, however, does not affect the performance of the algorithm, since the running time is almost exclusively determined by the widest part of the tree. In our experiments, we found an average of 1.758 inner states that produce the same output. Judging from the empirical data as given in table 3, it seems that the probability of a bitstream (generated from a random seed) having z generating keys is approximately 2−z for small values of z. Whether or not this assumption is correct and whether or not it also holds for the full version of A5/1 remains an open problem.
6
Other Generators
In this section, we will review some generators from literature, pointing out some dos and don’ts when using the above attack and the associated technique for upper bounding the efficient key length. Stop-and-Go Generator: The stop-and-go generator [2] consists of two LFSR C and A, where the output bit is taken as the least significant bit of LFSR A. While LFSR C is clocked regularly and outputs c1 , c2 , . . ., LFSR A is clocked iff ct = 1. As a consequence, the output sequence y has a probability of 3/4 that the condition yt = yt−1 holds. Thus, certain output sequence prefixes are much more likely than others, contradicting property 4. Thus, even though the clock control guessing attack can be implemented against the stop-and-go generator, the estimate can not be used without further thought.
Table 3. Frequency of equivalent keys equivalent keys frequency
1 64
2 33
3 17
4 2
5 3
6 -
7 1
210
Erik Zenner
Step1-Step2 Generator: The step1-step2 generator [8] modifies the stop-andgo generator in that depending on bit ct , the LFSR A is stepped once (ct = 0) or twice (ct = 1). In this case, the resulting bit sequence does not display the anomaly of the stop-and-go generator and meets property 4. Since the behaviour of the clock control can be described as for the alternating step generator and since there are only 2 possible behaviours of the clock control, we obtain an upper bound of 0.5L for the efficient key length of the step1-step2 generator, independent of the individual parameters. [1..D] Decimating Generator: More generally, a generator might pick some bits from LFSR C and interpret them as a positive number ξ ∈ {1, . . . , D}. Then, register A is clocked ξ times before delivering the next output bit. Such a generator is called [1..D] decimating generator [8]. If it meets conditions 1-4, a clock control guessing attack is possible and has an efficient key length of at log(D) L bit. most log(D)+1 Cascade Generator: A [1..D] decimating generator can be further generalised by turning it into a cascade, using s LFSR A1 , . . . , As instead of just 2. In [8], Gollmann and Chambers describe some possible constructions for cascade generators obtaining good statistical bitstream properties. A typical example is a cascade of stop-and-go generators where the output bit of LFSR Ai controls the clocking of LFSR Ai+1 and is also added to the output of LFSR Ai+1 . Since the basic clock-control mechanism (stop-and-go) meets conditions 1-3, the cascade generator can be attacked using clock control guessing. Since the cascade (as opposed to the simple stop-and-go generator) meets assumption 4, we can use the above technique to derive an upper bound on the effective key length. We see that there are k = 2s−1 possible behaviours for the clock control, yielding log(k) = s − 1 and an efficient key length of at most s−1 s L. Note that this is not identical to the na¨ıve LCT attack of guessing the contents of the uppermost s − 1 registers and deriving the content of the lowest LFSR from the bitstream. This na¨ıve attack has computational cost in the order of O(2L−l ), where l is the length of the final LFSR. If l < Ls , the clock control guessing attack will usually be more efficient than the simple LCT attack. Shrinking Generator: The shrinking generator was proposed in [4]. It consists of two LFSR C and A that are clocked simultaneously. For each clock t , if the output bit ct of LFSR C equals 1, the output bit at of LFSR A is used as output bit. Otherwise, at is discarded. Note that this generator can be viewed as a clock-controlled generator, where register A is clocked once with probability 1/2, twice with probability 1/4 a.s.o. before producing one bit of output. Thus, the number of possible clock control behaviours is rather large (up to |C| different possibilities), the property 3 is
On the Efficiency of the Clock Control Guessing Attack
211
violated and the attack is not applicable in a straightforward manner. In this case, the adaptive bit guessing attack seems to obtain better results4 .
7
Conclusions
We have presented the cryptanalytic technique of clock control guessing which is applicable against a large number of clock-controlled bitstream generators. We have also given a general technique for upper bounding the efficiency of our log(k) L bit, where k is the attack, yielding an efficient key length of at most log(k)+1 number of possible behaviours for the clock control. Most clock-controlled generators proposed in the literature have rather simplistic clock control rules, often yielding k = 2 and thus cutting the efficient key length down to L/2 even without more detailed analysis. If this is not acceptable, any of the following design changes increases resistance against our attack: – Increase the number of possible behaviours for the clock control. This way, the search tree expands rather rapidly, making the search more difficult. – Choose a non-linear function for the clock control. – Choose a non-linear function for the keybit extraction. A generic example of a clock-controlled bitstream generator that can be designed to follow all of those design criteria is the LILI generator [17]. The generator consists of two LFSR C and A, where C determines the clock control and A the output. The clock control ct is determined from the inner state of LFSR C by a bijective function fc : {0, 1}m → {1, . . . , 2m }, and the output bit yt is computed from the inner state of LFSR A using a Boolean function fd : {0, 1}n → {0, 1}. If the values m and n are chosen large enough and if the functions fc and fd are non-linear, the generator should be safe from clock control guessing attacks5 . Note, however, that security against clock control guessing is a necessary, but by no means sufficient condition for cryptographic security. In the case of the LILI generator, correlation attacks proved to be fatal [11], as did timememory trade-off attacks [1, 16]. Good cipher designs have to resist all known cryptanalytic techniques - clock control guessing is just one of them.
Acknowledgements The author would like to thank Stefan Lucks and Matthias Krause for helpful discussions and advice. 4 5
The same observation holds for the self-shrinking generator, presented in [13] and cryptanalysed in [20]. The mapping fc (x1 , . . . , xk ) = 1 + x1 + 2x2 + . . . + 2k−1 xk that was proposed by the authors is easily modelled using linear equations. This should not be a problem, as long as the other design criteria are met. For paranoia’s sake, however, a non-linear permutation might be considered instead.
212
Erik Zenner
References [1] S. Babbage. Cryptanalysis of LILI-128. Technical report, Nessie project, 2001. https://www.cosic.esat.kuleuven.ac.be/nessie/reports/. 211 [2] T. Beth and F. Piper. The stop-and-go generator. In T. Beth, N. Cot, and I. Ingemarsson, editors, Advances in Cryptology - Eurocrypt ’84, volume 209 of LNCS, pages 88–92. Springer, 1985. 209 [3] M. Briceno, I. Goldberg, and D. Wagner. A pedagogical implementation of A5/1. http://www.scard.org/gsm/a51.html. 207 [4] D. Coppersmith, H. Krawczyk, and Y. Mansour. The shrinking generator. In D. R. Stinson, editor, Advances in Cryptology - Eurocrypt ’93, volume 773 of LNCS, pages 22–39, Berlin, 1993. Springer. 202, 210 [5] E. Dawson and A. Clark. Divide and conquer attacks on certain classes of stream ciphers. Cryptologia, 18(4):25–40, 1994. 202 [6] J. D. Goli´c. Cryptanalysis of alleged A5 stream cipher. In W. Fumy, editor, Advances in Cryptology - Eurocrypt ’97, volume 1233 of LNCS, pages 239–255, Berlin, 1997. Springer. 203, 207 [7] J. D. Goli´c. Cryptanalysis of three mutually clock-controlled stop/go shift registers. IEEE Trans. Inf. Theory, 46(3):1081–1090, May 2000. 201, 207, 208 [8] D. Gollmann and W. Chambers. Clock-controlled shift registers: A review. IEEE J. Selected Areas Comm., 7(4):525–533, May 1989. 210 [9] S. Golomb. Shift Register Sequences. Aegean Park Press, Laguna Hills (CA), revised edition, 1982. 200 [10] C. G¨ unther. Alternating step generators controlled by de Bruijn sequences. In D. Chaum and W. Price, editors, Advances in Cryptology - Eurocrypt ’87, volume 304 of LNCS, pages 88–92. Springer, 1988. 202 [11] F. J¨ onsson and T. Johansson. A fast correlation attack on LILI-128. Technical report, Lund University, Sweden, 2001. 211 [12] M. Krause. BDD-based cryptanalysis of keystream generators. In L. Knudsen, editor, Advances in Cryptology - Eurocrypt ’02, LNCS. Springer, 2002. 205 [13] W. Meier and O. Staffelbach. The self-shrinking generator. In A. De Santis, editor, Advances in Cryptology - Eurocrypt ’94, volume 950 of LNCS, pages 205– 214, Berlin, 1995. Springer. 211 [14] T. Pornin and J. Stern. Software-hardware trade-offs: Application to A5/1 cryptanalysis. In C ¸ . Ko¸c and C. Paar, editors, Proc. CHES 2000, volume 1965 of LNCS, pages 318–327. Springer, 2000. 201, 207 [15] R. Rueppel. Stream ciphers. In G. Simmons, editor, Contemporary Cryptology The Science of Information Integrity, pages 65–134. IEEE Press, 1992. 201 [16] M.-J. Saarinen. A time-memory tradeoff attack against LILI-128. In J. Daemen and V. Rijmen, editors, Proc. FSE 2002, volume 2365 of LNCS, pages 231–236. Springer, 2002. 211 [17] L. Simpson, E. Dawson, J. Goli`c, and W. Millan. LILI keystream generator. In D. Stinson and S. Tavares, editors, Proc. SAC 2000, volume 2012 of LNCS, pages 248–261. Springer, 2001. 211 [18] K. Zeng, C. Yang, and Y. Rao. On the linear consistency test (LCT) in cryptanalysis with applications. In G. Brassard, editor, Advances in Cryptology - Crypto ’89, volume 435 of LNCS, pages 164–174. Springer, 1990. 201, 202, 203 [19] E. Zenner. Kryptographische Protokolle im GSM-Standard - Beschreibung und Kryptanalyse. Master’s thesis, University of Mannheim, 1999. 201, 207 [20] E. Zenner, M. Krause, and S. Lucks. Improved cryptanalysis of the self-shrinking generator. In V. Varadharajan and Y. Mu, editors, Proc. ACISP ’01, volume 2119 of LNCS, pages 21–35. Springer, 2001. 203, 211
Balanced Shrinking Generators Se Ah Choi and Kyeongcheol Yang Dept. of Electronic and Electrical Engineering Pohang University of Science and Technology (POSTECH) Pohang, Gyungbuk 790-784, Korea {sea78,kcyang}@postech.ac.kr http://www.postech.ac.kr/ee/ccl
Abstract. The shrinking generator is a keystream generator which is good for stream ciphers in wireless mobile communications, because it has simple structure and generates a keystream faster than other generators. Nevertheless, it has a serious disadvantage that its keystream is not balanced if they use primitive polynomials as their feedback polynomials. In this paper, we present a method to construct balanced shrinking generators by modifying the structure of the shrinking generator and analyze their cryptographical properties including period, balancedness, linear complexity, and probability distribution. Experimental results show that the keystreams of these generators have larger linear complexity than that of the shrinking generator, provided that the sizes of LFSRs are fixed. Keywords: Shrinking Generator, Self-Shrinking Generator, Balancedness, Period, Linear Complexity, Statistical Properties.
1
Introduction
Stream ciphers are a secret-key cryptosystem used to encrypt large amounts of data very fast. Their keystreams are usually generated using linear feedback shift registers (LFSRs) [8], [7]. They are considered secure if partial knowledge on them can not be used to recover them completely. It is generally required that the keystream of a stream cipher be balanced and have long period and large linear complexity in order to make it secure. In [3], Coppersmith, Krawczyk and Mansour proposed a new clock-controlled generator called the shrinking generator, which consists of two LFSRs, as shown in Figure 1. Later, Meier and Staffelbach proposed another shrinking generator called the self-shrinking generator [6], shown in Figure 2. It is made up of only one LFSR, where the even and odd bits of the sequence in the LFSR play the same role as the sequences of LFSR 1 and LFSR 2 in the shrinking generator, respectively. Let B = {bi }∞ i=0 be a binary sequence. The sequence B has period PB if PB is the smallest positive integer such that bi+PB = bi for any integer i. It is said to be balanced if the difference between the number of 0’s and the number of 1’s in one period is less than or equal to one. P.J. Lee and C.H. Lim (Eds.): ICISC 2002, LNCS 2587, pp. 213–226, 2003. c Springer-Verlag Berlin Heidelberg 2003
214
Se Ah Choi and Kyeongcheol Yang
LFSR 1 Feedback Polynomial ai s i=1
LFSR 2
output z j discard z j
si
s i=0
Feedback Polynomial
Fig. 1. The Shrinking Generator
LFSR Feedback Polynomial
x 2i+1
x 2i =1
x 2i
output y
discard y j x 2i =0
Fig. 2. The Self-shrinking Generator For our convenience, we define the shift operator x by xbi bi+1 . More generally, it can be extended to a binary polynomial f (x) = xm + fm−1 xm−1 + · · · + f0 in a natural way as f (x)bi bi+m + fm−1 bi+m−1 + · · · + f0 bi . The characteristic polynomial fB (x) of B is a polynomial of minimum degree such that f (x)bi = 0 for any i. The degree of fB (x) is called the linear complexity (or linear span) of B, denoted by LB . It is easily checked that g(x)bi = 0 for all i if and only if fB (x)|g(x), i.e., fB (x) divides g(x). The sequence B is called a maximal-length sequence (or an msequence, for short) if PB = 2dB − 1, where dB denotes the degree of fB (x). It is well-known that the sequence B is an m-sequence if and only if its characteristic polynomial is primitive [4]. The keystream of the shrinking generator in Figure 1 consists of the bits in the sequence of LFSR 1 where the corresponding bits in the sequence of LFSR ˆ si } ∞ 2 are ‘1’. In other words, let Aˆ = {ˆ ai } ∞ i=0 and S = {ˆ i=0 be the sequences of LFSR 1 and LFSR 2 in Figure 1, respectively. The keystream Zˆ = {ˆ zj } ∞ j=0 is th given by zˆj = a ˆwj , where wj is the position of the j 1 in Sˆ for each j ≥ 0. In the self-shrinking generator in Figure 2, let X = {xi }∞ i=0 denote the output is given by yj = x2wj +1 sequence of the LFSR. Then the keystream Y = {yj }∞ j=0 th where w is the position in the j 1 in the sequence {x }∞ for each j ≥ 0. j
2i i=0
j
Balanced Shrinking Generators
215
The shrinking generator may be implemented as a special case of the selfshrinking generator. Consider an arbitrary shrinking generator defined by two linear shift register LFSR 1 and LFSR 2 with the feedback polynomials fAˆ (x) and fSˆ (x), respectively. Then the keysteam of the shrinking generator can be obtained from the self-shrinking generator by setting X = (ˆ s0 , a ˆ0 , sˆ1 , a ˆ1 , · · ·), which is generated by using an LFSR with the feecback polynomial fAˆ (x2 )fSˆ (x2 ) [6]. To compare the shrinking generator with the self-shrinking generator, we assume that both use primitive polynomials as their feedback polynomials and the size of LFSR in the self-shrinking generator is equal to the sum of the sizes of two LFSRs in the shrinking generator. The keystream of the shrinking generator is not balanced, while the keystream of the self-shrinking generator is balanced. The keystream of the self-shrinking generator has longer period and larger linear complexity than the keystream of the shrinking generator. The shrinking generator needs two clock pulses to generate one bit of keystream on the average [9], while the self-shrinking generator needs four clock pulses. Hence, the shrinking generator is twice as fast as the self-shrinking generator. In this paper, we propose another keystream generator (called a “balanced shrinking generator”) by modifying the shrinking generator and analyze its cryptographical properties including period, balancedness, linear complexity, and probability distribution. In addition to balancedness, experimental results show that its keystream has larger linear complexity than the keystream in the shrinking generator, provided that the size of LFSRs are fixed. Furthermore, it can generate a keystream as fast as the shrinking generator. The paper is organized as follows. Section 2 presents a method to construct balanced shrinking generators and analyze balancedness and periods of their keystreams. In Section 3, we discuss the linear complexity of their keystreams and derive lower and upper bounds on it. Experimental results on their linear complexity are also presented. Statistical properties of the keystream in the balanced shrinking generator are discussed in Section 4. We give concluding remarks in Section 5.
2
Balanced Shrinking Generators
Consider a generator in Figure 3, obtained by modifying the shrinking generator ∞ in Figure 1. Let A = {ai }∞ i=0 and S = {si }i=0 be the sequences in LFSR 1 and LFSR 2, respectively. Then the keystream Z = {zj }∞ j=0 in our costruction is defined by z = a + s , where w is the position of the j th 1 in S for each j
wj
wj −1
j
j ≥ 0. This generator will be referred to as a balanced shrinking generator, since the keystream Z is balanced, as will be shown later. The balanced shrinking generator in Figure 3 may be implemented as a special case of the shrinking generator. Let fA (x) and fS (x) be the feedback polynomials of LFSR 1 and LFSR 2 in Figure 3, respectively. The keystream of the balanced shrinking generator can be obtained from the shrinking generator, if ∞ ˆ we choose Aˆ = {ai + si−1 }∞ i=0 and S = {si }i=0 in the shrinking generator in
216
Se Ah Choi and Kyeongcheol Yang
LFSR 1 Feedback Polynomial ai
si
LFSR 2 s i-1
s i=1 s i=0
output z j discard z j
Feedback Polynomial
Fig. 3. The Balanced Shrinking Generator Figure 1. Furthermore, the sequences Aˆ and Sˆ are produced by the LFSRs with the feedback polynomials fA (x)fS (x) and fS (x), respectively. Conversely, any shrinking generator may also be considered as a special case of the balanced shrinking generator. This is because the keystream of the shrinking generator can be obtained from the balanced shrinking generator in si } ∞ Figure 3 if we choose A = {ˆ ai + sˆi−1 }∞ ˆ (x) = fA (x)fS (x) i=0 , S = {ˆ i=0 , fA and fSˆ (x) = fS (x). In a similar way, the balanced shrinking generator may be implemented as the special case of the self-shrinking generator by choosing X = (s0 , a0 +s−1 , s1 , a1 + s0 , s2 , a2 + s1 , · · ·) in Figure 2. Conversely, the self-shrinking generator may also be considered as a special case of the balanced shrinking generator by choosing A = (x−2 + x1 , x0 + x3 , x2 + x5 , x4 + x7 , · · ·) and S = (x0 , x2 , x4 , · · ·) in Figure 3. From now on, we sometimes use the notation a(i), s(i), a ˆ(i) and z(i) instead ˆi , and zi , respectively, for the brevity of natation. of ai , si , a The following lemma will be useful in analyzing the period of the keystream Z = {zi }∞ i=0 of the balanced shrinking generator. ∞ Lemma 1. Let B = {bi }∞ i=0 and C = {ci }i=0 be binary m-sequences with relatively prime characteristic polynomials fB (x) and fC (x), respectively. Let D = {bi + ci }∞ i=0 where the addition is performed modulo-2. Let PB , PC and PD be the periods of B, C and D, respectively. Then
(i) fD (x) = fB (x)fC (x); (ii) PD = lcm(PB , PC ). Proof. (i) Clearly, fD (x) | fB (x)fC (x) since fB (x)fC (x)(bi + ci ) = fC (x)fB (x)bi + fB (x)fC (x)ci = 0. It suffices to show that fD (x) = 1, fB (x) or fC (x) since fB (x) and fC (x) are relatively prime. If fD (x) = 1, then D should be the all-zero sequence, which is a contradiction. Suppose fD (x) = fB (x) without loss of generality. Since fB (x)ci = fB (x)(di − bi ) = fB (x)di − fB (x)bi = 0
Balanced Shrinking Generators
217
for any i, we have fC (x) | fB (x), which is a contradiction. (ii) It suffices to show that PB | PD and PC | PD . Since fB (x) | xPD − 1 by (i), we have (xPD − 1)bi = 0 for any i, that is, bi+PD = bi for any i. Therefore, ✷ we have PB | PD . Similarly, PC | PD . The period of the keystream in the balanced shrinking generator may be derived in the same way as in the case of the shrinking generator (c.f. Theorem 1 [3]). Theorem 2. Let dA and dS be the degrees of the feedback polynomials of LFSR 1 and LFSR 2 in Figure 3, respectively. If the feedback polynomials of LFSR 1 and LFSR 2 are primitive polynomials with (dA , dS ) = 1, then the period PZ of the keystream Z in the balanced shrinking generator is given by PZ = (2dA − 1)2dS −1 . Proof. Let A and S be the output sequences of LFSR 1 and LFSR 2 in Figure 3, respectively. Let PA and PS be the periods of A and S, respectively. Let Aˆ = {ai + si−1 }∞ ˆ its period. Then PA ˆ = PA PS by Lemma 1. Let wi be the position i=0 and PA th of i 1 in S and WS the number of 1’s in one period of S. Then it is easily checked that z(i + jWS ) = a ˆ(wi + jPS ), and z(i + jPA WS ) = a ˆ(wi + jPA PS ) = a ˆ(wi ) = z(i) for any i and j. Hence, PZ | PA WS . ˆ(wi+PZ + jPS ) for any i and j, On the other hand, note that a ˆ (wi + jPS ) = a since z(i + jWS ) = z(i + jWS + PZ ). Therefore, PAˆ | (wi+PZ − wi ) ,
∀i
(1)
Putting i + 1 instead of i in (1), we get PAˆ | (wi+1+PZ − wi+1 ) ,
∀i
(2)
Combining (1) with (2), we get wi+PZ +1 − wi+PZ = wi+1 − wi + (ji+1 − ji )PAˆ
(3)
for some integers ji and ji+1 . The left side of (3) is the distance between (i + PZ + 1)th and (i + PZ )th positions of 1’s in S, while wi+1 − wi is the distance between (i + 1)th and ith positions of 1’s. If ji+1 − ji is not zero, there exist at least PAˆ consecutive zeros in S, which is impossible because PAˆ > dS . Therefore, we have wi+PZ +1 − wi+PZ = wi+1 − wi , and PS | wi+PZ − wi for all i which implies that the number of 1’s in S in one period of S divides the number of 1’s in S from wi to wi+PZ . Hence, PZ = lWS for an integer l. Using the relation z(i) = a ˆ(wi ) and z(i) = z(i + PZ ), we have a ˆ(wi ) = a ˆ (wi + jlPS ) for any i and j. Therefore, PAˆ | lPS . Since (PA , PS ) = 1 and PAˆ = PA PS , we get PA | l and so PA WS |PZ . ✷ In the following theorem, we show that the keystream of the balanced shrinking generator is balanced.
218
Se Ah Choi and Kyeongcheol Yang
Theorem 3. Let dA and dS be the degrees of the feedback polynomials of LFSR 1 and LFSR 2 in Figure 3, respectively. If the feedback polynomials of LFSR 1 and LFSR 2 are primitive polynomials with (dA , dS ) = 1, the keystream Z of the balanced shrinking generator is balanced. Proof. Let A and S be the output sequences of LFSR 1 and LFSR 2 in Figure 3, respectively. Let PA be the period of A. Clearly, the sequence {z(i+jWS )}∞ j=0 for any i is either an m-sequence of degree dA or the complement of an m-sequence depending on the value s(wi − 1) where wi and WS denote the position of ith 1 in S and the number of 1’s in one period of S, respectively. Consider a pair (s(i − 1), s(i)) of two consecutive bits in S. From the run-distribution property of m-sequences [4], the pairs (0, 1) and (1, 1) appear exactly WS /2 times in one period of S, respectively. Hence, the half of the sequences {z(i + jWS )}∞ j=0 for 0 ≤ i < WS is m-sequences and another half is its complements. Furthermore, ‘1’ appears exactly PA2+1 times and ‘0’ appears exactly PA2−1 times in one period of an m-sequence, while ‘1’ appears exactly PA2−1 times and ‘0’ appears exactly PA +1 times in one period of the complement of an m-sequence. Therefore both 2 ‘1’ and ‘0’ appear exactly PA WS /2 times in one period of Z, respectively. ✷
3
Linear Complexity of Balanced Shrinking Generators
Another important measure of security for the keystream is its linear complexity. In order to get lower and upper bounds on the linear complexity of the keystream in the balanced shrinking generator, the following lemma may be very useful. Lemma 4. ([2]) Let n be a positive integer and α ∈ F2n a primitive element. Let T : F2n → F2 be a nontrivial F2 -linear map. Let V = {vi }∞ i=0 be the sequence of period 2n−1 over F2n by letting vi be the (i + 1)th element x in the sequence {αi }∞ i=0 having the property that T (x) = 1. Then the linear complexity of V is at most 2n−1 − (n − 2), in other words, 2n−1 −(n−2) n−1
2
i=0
− (n − 2) vi+e = 0. i
Theorem 5. Let dA and dS be the degrees of the feedback polynomials of LFSR 1 and LFSR 2 in Figure 3, respectively. If the feedback polynomials of LFSR 1 and LFSR 2 are primitive polynomials with (dA , dS ) = 1, then the linear complexity LZ of the keystream Z satisfies dA 2dS −2 < LZ ≤ (dA + 1) 2dS −1 − (dS − 2). ∞ Proof. Let A = {ai }∞ i=0 and S = {si }i=0 be the sequences of LFSR 1 and LFSR 2 in Figure 3, respectively. Let PA and PS be the periods of A and S, respectively. Let α ∈ F2dA be a root of the characteristic polynomial fA (x) of A. Let f (x)
Balanced Shrinking Generators
219
be the minimal polynomial of αPS over F2 . Then f (x) is a primitive polynomial of degree dA , since (PS , PA ) = 1. For each i, 0 ≤ i < WS , the characteristic polynomial gi (x) of the sequence {z(i + jWS )}∞ j=0 is given by gi (x) =
f (xWS ), (xWS + 1)f (xWS ),
if s(wi − 1) = 0 if s(wi − 1) = 1
Therefore, fZ (x) is a divisor of (xWS + 1)f (xWS ). Since WS = 2dS −1 , we may rewrite fZ (x) by fZ (x) = (x + 1)l1 f (x)l2 , (Upper Bound) Let f (x) =
dA j=0
2dS −1
l1
(x + 1) f (x)
0 ≤ l1 , l2 ≤ 2dS −1 .
(4)
fj xj . From (4), fZ (x) divides
l1 l1 i+2dS −1 j = fj x i j=0 i=0 dA
Therefore, for any integer e, l1 l1 F (e) fj z(i + 2dS −1 j + e) i j=0 i=0 dA
=
dA
fj
j=0
=
i
i=0
dA l1 l1 i=0
=
l1 l1
dA j=0
fj
i
(a(wi+e + jPS ) + s(wi+e + jPS − 1))
fj a(wi+e + jPS ) +
j=0
l1 i=0
l1 s(wi+e + jPS − 1) i
dA j=0
fj
l1 l1 s(wi+e + jPS − 1) i i=0
dA since j=0 fj a(wi+e + jPS ) = 0 for any i. From the theory of m-sequences, there exist a primitive element β ∈ F2dS and an element c ∈ F2dS such that si = dS −1 2i Tr cβ i for any integer i where Tr(x) = i=0 x . Let T : F2dS → F2 be the F2 -linear map defined by T (x) = Tr (cβx). Then swi −1 corresponds to the (i + 1)th element in the sequence {β i }∞ i=0 having the property that T (x) = 1. If we choose l1 = 2dS −1 − (dS − 2), then 2dS −1 −(dS −2) d −1 S
2
i=0
− (dS − 2) s(wi+e + jPS − 1) = 0 i
by Lemma 4, so F (e) = 0 for any integer e. Hence, LZ ≤ (dA + 1) 2dS −1 −(dS −2). (Lower Bound) If max(l1 , l2 ) ≤ 2dS −2 , then PZ < PA WS /2, which is a con✷ tradiction. Therefore, the degree of fZ (x) exceeds dA WS /2.
220
Se Ah Choi and Kyeongcheol Yang
Table 1. Range of linear complexities of the shrinking generator and the balanced shrinking generator when the feedback polynomials run over all primitive polynomials of given degree Linear complexity deg A deg S Period Shrinking Balanced Shrinking Generator Generator 3 2 14 6 8 5 2 62 10 12 7 2 254 14 16 9 2 1022 18 20 4 3 60 16 19 5 3 124 20 23 7 3 508 28 31 8 3 1020 32 35 10 3 4092 40 43 3 4 56 24 30 5 4 248 35 ∼ 40 41 ∼ 46 7 4 1016 56 62 9 4 4088 72 78 3 5 112 45 ∼ 48 58 ∼ 61 4 5 240 60 ∼ 64 72 ∼ 76 6 5 1008 90 ∼ 96 100 ∼ 109 7 5 2032 105 ∼ 112 115 ∼ 125 8 5 4080 128 138 ∼ 141 9 5 8176 144 154 ∼ 157 5 6 992 160 187 ∼ 188 7 6 4064 224 251 ∼ 252 3 7 448 189 ∼ 192 221 ∼ 226 4 7 960 256 313 ∼ 315 5 7 1984 320 377 ∼ 379 6 7 4032 378 ∼ 384 437 ∼ 443 8 7 16320 512 569 ∼ 571 10 7 65472 640 697 ∼ 699 5 8 3968 640 755 ∼ 762 7 8 16256 896 1011 ∼ 1018 4 9 3840 1020 ∼ 1024 1266 ∼ 1273 5 9 7936 1270 ∼ 1280 1516 ∼ 1529 7 10 65024 3577 ∼ 3584 4076 ∼ 4088
Under the conditions in Theorem 2, the keystream of the shrinking generator in Figure 1 is not balanced and has the same period as the balanced shrinking generator in Figure 3. In addition, it is well-known in [3] that its linear complexity is between dA 2dS −2 and dA 2dS −1 . Table 1 shows the actual range of the linear complexities of both generators when the characteristic polynomials for LFSR 1 and LFSR 2 in Figures 1 and 3 run over all primitive polynomials of given degrees. In fact, experimental results show that the balanced shrinking generator
Balanced Shrinking Generators
221
has larger linear complexity than the shrinking generator. Hence, the balanced shrinking generator may have more preferable cryptographical properties than the shrinking generator. Furthermore, the linear complexity of the keystream in the balanced shrinking generators is larger than 2dS −1 (dA + 1) − (dS + dA ) as shown in Table 1. This is much larger than the lower bound dA 2dS −2 + 1. For these reasons, more work may be needed to improve its lower bound.
4
Statistical Properties
Before we discuss statistical properties of the keystreams in the balanced shrinking generators, we review some backgrounds on the notions of Fourier transform and (-bias distributions [3] for our analysis. Boolean functions of n variables may be considered as real-valued functions f : {0, 1, }n −→ {−1, 1}. Any real valued function f (x) for x ∈ F2n can be uniquely expressed as a linear combination of (−1)s·x for s ∈ F2n , i.e. fˆ(s)(−1)s·x f (x) = s∈F2n
where fˆ(s) is the Fourier transform of f (x) is given by 1 fˆ(s) = n f (x)(−1)s·x 2 n x∈F2
for s ∈ F2n . It is easily check that fˆ(s) = Pr{f (x) = (−1)s·x } − Pr{f (x) = (−1)s·x } where x = (x1 , x2 , · · · , xn ) is chosen uniformly at random. A probability distribution is (-biased if it is “close” to the uniform distribution in the following sense. Definition 6. ([3]) A distribution µ over F2n is called an (-bias distribution if |ˆ µ(s)| ≤ (2−n for every s ∈ F2n . The following lemma shows a connection between LFSRs and (-bias distributions. Lemma 7. ([1]) Let B = {bi }∞ i=0 be an m-sequence based on a LFSR where feedback is chosen with uniform probability among all primitive polynomials of degree dB over F2 and the seed for B is chosen uniformly over all non-zero B , n) be the distribution of B of length n. Then elements in F2dB . Let DIS(d n−1 DIS(dB , n) is an 2dB -bias distribution. Applying Lemma 7 to the balanced shrinking generators, we get the following theorem.
222
Se Ah Choi and Kyeongcheol Yang
∞ Theorem 8. Let B = {bi }∞ i=0 and C = {ci }i=0 be m-sequences based on two LFSRs where feedback polynomials are chosen with uniform probability among all primitive polynomials of degree dB and dC over F2 and the seeds for B and C are chosen uniformly over all non-zero element in F2dB and F2dC , respectively. Let D = {bi + ci }∞ i=0 where the addition is performed modulo-2, and DIS(dB , dC , n) 2 -bias disthe distribution of D of length n. Then DIS(dB , dC , n) is an 2d(n−1) B +dC +n tribution.
B , n)∗ DIS(d C , n) where ∗ denotes the Proof. Note that DIS(dB , dC , n) = DIS(d B , n) and DIS(d C , n) convolution. By Lemma 7, the Fourier coefficients of DIS(d n−1 n−1 have magnitude less than or equal to 2dB +n and 2dC +n , respectively. Therefore, n−1 the Fourier coefficients of DIS(dB , dC , n) have magnitude ≤ 2dn−1 . ✷ B +n 2dC +n Definition 9. ([3]) Let f be a function from F2n to the real numbers. The L1 norm of f is defined L1 (f ) = fˆ(s) . s∈F2n
Lemma 10. ([3]) Let f and g be functions from F2n to the real numbers. Then, L1 (f g) ≤ L1 (f )L1 (g) and L1 (f + g) ≤ L1 (f ) + L1 (g). The following lemma relates (-bias distributions to the norm L1 (f ) and gives an upper bound on the difference between the average Eµ [f ] of a real valued function f over a distribution µ and the average EU [f ] of f over the uniform distribution U . It may be useful for tests of pseudo-randomness of a function. Lemma 11. ([5]) For a real-valued function f , |EU [f ] − Eµ [f ]| ≤ (L1 (f ) where U is the uniform distribution and µ is an (-bias distribution. Lemma 12. ([3]) i) Let SUM(x) = ni=0 xi , then L1 (SUM) = n. ii) Let AND(x) = i xi , then L1 (AND) = 1. iii) For R ∈ {0, 1, ×}n we define a template templateR (x) = 1 iff x and R = × then ri = xi . For any agree on each 0 or 1 in R, i.e., for each ri R ∈ {0, 1, ×}n then L1 ( templateR ) = 1. Before we apply the above results to the balanced shrinking generator, we first discuss the sequence obtained by summing two m-sequences in the following. ∞ Theorem 13. Let B = {bi }∞ i=0 and C = {ci }i=0 be m-sequences based on LFSRs where the feedback polynomials are chosen with uniform probability among Let D = {bi + all primitive polynomials of degree dB and dC over F2 , respectively. n where the addition is performed modulo-2. Let SUM = ci } ∞ D i=0 j=1 dij and
Balanced Shrinking Generators
223
SUMY = ni=1 yi where yi are independent and identically distributed (i.i.d.) {0, 1}-random variables with Pr[yi = 1] = 1/2 and 1 ≤ i1 < i2 < · · · < in ≤ PD , where PD denotes the period of D. Then k k E (SUMD ) − E (SUMY ) ≤
nk+2 2dB +dC +n
.
Proof. By Theorem 8 and Lemmas 11 and 12, it follows that (n − 1)2 k k E (SUMD ) − E (SUMY ) ≤ (L1 (SUM))k dB +dC +n 2 ✷ The following theorem shows that each template in the sum of two msequences is distributed in a similar way as a random string. ∞ Theorem 14. Let B = {bi }∞ i=0 and C = {ci }i=0 be m-sequences based on two LFSRs where feedback polynomials are chosen with uniform probability among all primitive polynomials of degree dB and dC over F2 , respectively. Let D = {bi + ci }n−1 i=0 where the addition is performed modulo-2 and Y a random string of n bits. Let R ∈ {0, 1, ×}n be a template. Then
|E[templateR (D)] − E[templateR (Y )]| ≤
n2 2dB +dC +n
Proof. Combining Theorem 8 with Lemmas 11 and 12, we get |E[templateR (D)] − E[templateR (Y )]| ≤ L1 (templateR )
(n − 1)2 2dB +dC +n . ✷
Now we are in a position to apply the above results to the balanced shrinking generator. The following corollary shows that the moments of the keystream of the balanced shrinking generator are very close to those of a random string. Corollary 15. Let A and S be the sequences of LFSR 1 and LFSR 2 in Figure 3, where A and S are m-sequences of degree dA and dS , respectively, and Z the output sequence of the balanced shrinking generator. Let SUMZ i+l−1 be the sum i l−1 of l consecutive bits in the sequence Z, where l ≤ PZ . Let SUMY = i=0 yi where yi ∈ F2 are i.i.d. {0, 1}-random variables with Pr[yi = 1] = 1/2. Then E[(SUMZ i+l−1 )k ] − E[(SUMYl )k ] ≤ i
lk+2 2dA +dS +l
.
224
Se Ah Choi and Kyeongcheol Yang
i+l−1 Proof. Let Aˆ = (ai + si−1 )∞ a ˆwi+j where wi+j i=0 . Then SUMZii+l−1 = j=i th means (i + j) position of 1 in S. By Theorem 13, the inequality holds. The following theorem shows that each template is distributed similarly in the keystream of the balanced shrinking generator and a random string. Theorem 16. Let Zn be the first n bits in the keystream Z of the balanced shrinking generator. Let R ∈ {0, 1, ×}n be a template. Then |E[templateR (Z)] − E[templateR (Y )]| ≤
9 2dA +dS +3
Proof. Let A and S be the sequences of LFSR 1 and LFSR 2 in Figure 3. Let th ˆ Aˆ = {ai + si−1 }∞ i=0 . Zn consists of the first wn bits of A, where wn is the n position of 1 in the S. Create a template RS of length wn by modifying the template R (put ∗ in ith location corresponding to si = 0, and copy R in the location corresponding to si = 1). |E[templateR (Z)] − E[templateR (Y )]| ˆ − E[templateR (Y )] = Pr[S] E[templateRS (A)] S
≤
Pr[S]
S
= ≤ =
wn 2 2dA +dS +wn
1 2dA +dS 2dA +dS
wn 2 2wn
Pr(S)
32 23
S
1
Pr(S)
S
9 2dA +dS +3 ✷
The correlation between two bits is the difference between the probability that the two bits are equal and the probability that they differ.

Corollary 17. Let A and S be the sequences of LFSR 1 and LFSR 2 in Figure 3, where A and S are m-sequences of degree d_A and d_S, respectively. The correlation between z_i and z_{i+l−1} in the keystream Z of the balanced shrinking generator is bounded by 9 / 2^{d_A + d_S + 3}.

Proof. Use the four templates of length l, R_1 = (0, ×, ···, ×, 0), R_2 = (0, ×, ···, ×, 1), R_3 = (1, ×, ···, ×, 0) and R_4 = (1, ×, ···, ×, 1), and apply Theorem 16. ✷
Corollary 18. Let A and S be the sequences of LFSR 1 and LFSR 2 in Figure 3, where A and S are m-sequences of degree d_A and d_S, respectively. Let Z_i^{i+l−1} be an arbitrary string of l consecutive bits in the keystream Z of the balanced shrinking generator. The probability that Z_i^{i+l−1} = p is in the range 2^{−l} ± 9 / 2^{d_A + d_S + 3} for all p ∈ F_2^l.

Proof. It follows from Theorem 16, since E[template_p(Z_i^{i+l−1})] = Pr[Z_i^{i+l−1} = p] and E[template_p(Y)] = 1/2^l. ✷
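Corollary 18 is easy to probe empirically. The sketch below is only illustrative: the LFSR tap positions, the state/output convention, and the keystream rule z = a_i ⊕ s_{i−1} kept at positions with s_i = 1 (our reading of Â in the proofs above) are all assumptions. It tabulates the frequencies of all l-grams in a simulated keystream; each should be close to 2^{−l}.

```python
# Empirical l-gram frequencies of a (balanced-)shrinking-style keystream.
from itertools import product

def lfsr(state, taps, nbits):
    """Fibonacci LFSR: output the last bit, feed back the XOR of tapped bits."""
    s, out = list(state), []
    for _ in range(nbits):
        out.append(s[-1])
        fb = 0
        for t in taps:
            fb ^= s[t]
        s = [fb] + s[:-1]
    return out

N = 1 << 15
A = lfsr([1, 0, 0, 1, 0, 1, 1], [0, 6], N)   # degree-7 register (taps illustrative)
S = lfsr([1, 1, 0, 1, 0], [0, 2], N)         # degree-5 register (taps illustrative)
Z = [A[i] ^ S[i - 1] for i in range(1, N) if S[i] == 1]

l = 3
counts = {p: 0 for p in product((0, 1), repeat=l)}
for i in range(len(Z) - l + 1):
    counts[tuple(Z[i:i + l])] += 1
total = len(Z) - l + 1
for p in sorted(counts):                     # each ratio should be near 1/8
    print(p, round(counts[p] / total, 4))
```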
5 Conclusion
We have provided a construction method for the balanced shrinking generator and shown that it outperforms the shrinking generator in terms of cryptographic properties. Above all, the keystream of the balanced shrinking generator is balanced. Experimental results show that it has larger linear complexity than the shrinking generator. Furthermore, our construction method may find more applications than the shrinking generator.
Acknowledgments

This work was supported in part by the MSRC at Kyungpook National University as an ITRC, and by the BK21 program of the Ministry of Education of Korea.
References

[1] N. Alon, O. Goldreich, J. Håstad, and R. Peralta, "Simple constructions of almost k-wise independent random variables," 31st Annual Symposium on Foundations of Computer Science, St. Louis, Missouri, pp. 544–553, 1990.
[2] S. R. Blackburn, "The linear complexity of the self-shrinking generator," IEEE Trans. Inform. Theory, vol. IT-45, no. 6, pp. 2073–2077, September 1999.
[3] D. Coppersmith, H. Krawczyk, and Y. Mansour, "The shrinking generator," Advances in Cryptology — CRYPTO '93, Lecture Notes in Computer Science, vol. 773, pp. 22–39, 1993.
[4] S. W. Golomb, Shift Register Sequences, Aegean Park Press, 1982.
[5] E. Kushilevitz and Y. Mansour, "Learning decision trees using the Fourier spectrum," Proceedings of the 23rd Annual ACM Symposium on Theory of Computing, pp. 455–464, May 1991.
[6] W. Meier and O. Staffelbach, "The self-shrinking generator," Advances in Cryptology — EUROCRYPT '94, Lecture Notes in Computer Science, vol. 950, pp. 205–214, 1995.
[7] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, 1997.
[8] R. A. Rueppel, Analysis and Design of Stream Ciphers, Springer-Verlag, 1986.
[9] I. Shparlinski, "On some properties of the shrinking generator," Designs, Codes and Cryptography, vol. 23, pp. 147–156, 2001.
[10] T. Siegenthaler, "Correlation-immunity of nonlinear combining functions for cryptographic applications," IEEE Transactions on Information Theory, vol. IT-30, pp. 776–780, September 1984.
On the Universal Hash Functions in Luby-Rackoff Cipher

Tetsu Iwata and Kaoru Kurosawa

Department of Computer and Information Sciences, Ibaraki University
4-12-1 Nakanarusawa, Hitachi, Ibaraki 316-8511, Japan
{iwata,kurosawa}@cis.ibaraki.ac.jp
Abstract. It is known that a super-pseudorandom permutation on 2n bits can be obtained from a random function f on n bits and two bi-symmetric and AXU hash functions h1 and h2 on n bits. It has a Feistel-type structure which is usually denoted by φ(h1, f, f, h2). Bi-symmetric and AXU hash functions h1, h2 are much weaker primitives than a random function f, and they can be computed much faster than random functions. This paper shows that we can further weaken the condition on h1.
Keywords: Block/Stream Ciphers, Provable Security, Cryptographic Primitives.
1 Introduction

1.1 Background
It is ideal that a block cipher looks like a random permutation. Luby and Rackoff proved the pseudorandomness and the super-pseudorandomness of Feistel permutations [2]. A block cipher is called pseudorandom if it looks like a random permutation against a chosen plaintext attack. It is called super-pseudorandom if it looks like a random permutation against chosen plaintext and ciphertext attacks. Let φ(f1, f2, f3) denote the three round Feistel permutation such that the i-th round function is fi. Similarly, let φ(f1, f2, f3, f4) denote the four round Feistel permutation.

Luby-Rackoff Construction [2]. Suppose that each fi is a random function. Then Luby and Rackoff proved that φ(f1, f2, f3) is pseudorandom and φ(f1, f2, f3, f4) is super-pseudorandom [2]. We call them Luby-Rackoff constructions. Since then a considerable amount of research has been done, mainly focusing on the following question: can we obtain a more efficient construction of a super-pseudorandom permutation than Luby and Rackoff's? [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 19]
Single Random Function Constructions. Pieprzyk showed that φ(f², f, f, f) is pseudorandom [11]. Patarin showed that φ(f ∘ ξ ∘ f, f, f) is pseudorandom and φ(f ∘ ξ ∘ f, f, f, f) is super-pseudorandom, where ξ is, for example, a rotation by one bit [8]. These results show that we can obtain a super-pseudorandom permutation on 2n bits from a single random function f on n bits.

Lucks Construction [3]. Among others, Lucks was the first to notice that a random function fi can be replaced with a universal hash function h. Universal hash functions have been studied by many researchers; see, for example, [17]. Lucks showed that φ(h, f2, f3) is pseudorandom if h is an almost XOR universal (AXU) hash function [3]. We call it the Lucks construction. Note that a universal hash function is a much weaker primitive than a random function. Also, hash functions are much more efficient primitives than a random function from the standpoint of computation.

PRS Construction [9]. Patel, Ramzan and Sundaram [9] next introduced a notion of bi-symmetric and AXU hash functions. Using it, they showed that φ(h1, f, f, h2) is super-pseudorandom if each hi is a bi-symmetric and AXU hash function. We call it the PRS construction.

1.2 Our Contribution
The PRS construction implies that a super-pseudorandom permutation on 2n bits can be obtained from a random function f on n bits and two bi-symmetric and AXU hash functions h1 and h2 on n bits. Bi-symmetric and AXU hash functions h1 and h2 are much weaker primitives than a random function f, and they can be computed much faster than random functions. Then we ask: can we weaken the conditions on h1 and h2 so that φ(h1, f, f, h2) is still super-pseudorandom?

In this paper, we give a positive answer to this problem. We first prove that the notion of ε-bi-symmetric hash functions, which was introduced by [9], is almost equivalent to the well-known notion of ε-uniform hash functions. More precisely, we show that ε-uniformity implies ε-bi-symmetry, and that ε-bi-symmetry implies √ε-uniformity. We next show that AXU hash functions are strictly weaker primitives than bi-symmetric and AXU hash functions. See Table 1.

We finally show that the bi-symmetry condition on h1 in the PRS construction is redundant. More precisely, we show that φ(h1, f, f, h2) is super-pseudorandom even if h1 is
Table 1. Our first result

    Bi-symmetric hash function           ≈  Uniform hash function
    Bi-symmetric and AXU hash function   >  AXU hash function
Table 2. Bi-symmetry is redundant

         Assumption of PRS [9]                 Our assumption
    h1   Bi-symmetric and AXU hash function    AXU hash function
    h2   Bi-symmetric and AXU hash function    Uniform and AXU hash function
just an AXU hash function, where we assume that h2 is a uniform and AXU hash function. See Table 2. Note that from our first result, the assumption on h1 is strictly weaker, while the assumption on h2 is almost identical.
2 Preliminaries

2.1 Notation
For x ∈ {0,1}^{2n}, x_L denotes the first (left) n bits of x and x_R denotes the last (right) n bits of x. That is, x = (x_L, x_R). We denote by F_n the set of all functions from {0,1}^n to {0,1}^n. Similarly, we denote by P_{2n} the set of all permutations over {0,1}^{2n}. For two functions f and g, g ∘ f denotes the function x ↦ g(f(x)). For a set S, s ←_R S denotes the process of picking an element s from S uniformly at random.

2.2 Feistel Permutation
Definition 2.1 (The Basic Feistel Permutation). For any function f ∈ F_n, define the basic Feistel permutation φ_f ∈ P_{2n} as φ_f(x_L, x_R) = (f(x_L) ⊕ x_R, x_L). Note that it is a permutation, since φ_f^{−1}(x_L, x_R) = (x_R, f(x_R) ⊕ x_L).

Definition 2.2 (The r Round Feistel Permutation). Let r ≥ 1 be an integer. For f1, ..., fr ∈ F_n, define the r round Feistel permutation φ(f1, ..., fr) ∈ P_{2n} as φ(f1, ..., fr) = φ_{fr} ∘ ··· ∘ φ_{f1}.

The four round Feistel permutation is illustrated in Figure 1. For simplicity, the left and right swaps are omitted.
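Definitions 2.1 and 2.2 translate directly into code. The sketch below (halves handled as n-bit integers; the toy round functions are our own placeholders, not part of the paper) applies φ(f1, ..., fr) and its inverse, using the identity φ_f^{−1}(x_L, x_R) = (x_R, f(x_R) ⊕ x_L) noted above.

```python
# phi(f1, ..., fr) over {0,1}^{2n}, with halves as n-bit integers.
def feistel(round_fns, xl, xr):
    """Each round maps (L, R) -> (f(L) XOR R, L)."""
    for f in round_fns:
        xl, xr = f(xl) ^ xr, xl
    return xl, xr

def feistel_inv(round_fns, xl, xr):
    """Undo the rounds in reverse using phi_f^{-1}(L, R) = (R, f(R) XOR L)."""
    for f in reversed(round_fns):
        xl, xr = xr, f(xr) ^ xl
    return xl, xr

n = 16
fs = [lambda v, c=c: (v * 2654435761 + c) % (1 << n) for c in range(4)]  # toy rounds
L, R = 0x1234, 0xBEEF
assert feistel_inv(fs, *feistel(fs, L, R)) == (L, R)
```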
2.3 Super-Pseudorandomness
Super-pseudorandomness measures the security of a block cipher against adaptive chosen plaintext and chosen ciphertext attacks. Let Φ be a subset of P_{2n}. We say that Φ is super-pseudorandom if it is indistinguishable from P_{2n}, where the adversary is given access to both directions of the permutation.

Our adaptive adversary A is modeled as a Turing machine that has black-box access to two oracles, the forward direction of the permutation and the backward direction of the permutation. The notation A^{φ,φ^{−1}} indicates A with an oracle which, in response to a query (+, x), returns y ← φ(x), and in response to a query (−, y), returns x ← φ^{−1}(y). The notation A^{R,R^{−1}} indicates A with an oracle which, in response to a query (+, x), returns y ← R(x), and in response to a query (−, y), returns x ← R^{−1}(y). The computational power of A is unlimited, but the total number of oracle calls is limited to a parameter q. After making at most q queries to the oracles, A outputs a bit.

[Fig. 1. The four round Feistel permutation (left and right swaps omitted)]

Definition 2.3 (Advantage, sprp). Let Φ be a family of permutations over {0,1}^{2n}. For an adversary A, we define the advantage of A as

    Adv_Φ^{sprp}(A) = | Pr[φ ←_R Φ : A^{φ,φ^{−1}} = 1] − Pr[R ←_R P_{2n} : A^{R,R^{−1}} = 1] |.

Definition 2.4 (Super-Pseudorandom Permutation). We say that Φ is super-pseudorandom if Adv_Φ^{sprp}(A) is negligible (as a function of n) for any adversary A that makes at most q queries in total.

2.4 Hash Functions
Let H_n be a subset of F_n. We denote by #H_n the cardinality of H_n. The following definition follows from those given in [1, 18, 9, 12].

Definition 2.5.
1. H_n is an ε-uniform (ε-U) family of hash functions if for any element x ∈ {0,1}^n and any element y ∈ {0,1}^n, there exist at most ε·#H_n hash functions h ∈ H_n such that h(x) = y.
2. H_n is an ε-almost XOR universal (ε-AXU) family of hash functions if for any two distinct elements x, x′ ∈ {0,1}^n and any element y ∈ {0,1}^n, there exist at most ε·#H_n hash functions h ∈ H_n such that h(x) ⊕ h(x′) = y.
3. H_n is an ε-bi-symmetric (ε-B) family of hash functions if for any elements x, x′ ∈ {0,1}^n (not necessarily distinct) and any element y ∈ {0,1}^n, there exist at most ε·(#H_n)² pairs of hash functions (h, h′) ∈ H_n × H_n such that h(x) ⊕ h′(x′) = y.
We show some examples.
1. Let H_n^1 = {h_a(x) = a·x over GF(2^n)}. Then H_n^1 is a 1/2^n-AXU family of hash functions.
2. Let H_n^2 = {h_{a,b}(x) = a·x + b over GF(2^n)}. Then H_n^2 is a 1/2^n-U and 1/2^n-AXU family of hash functions.
3. Ramzan and Reyzin showed [12] that H_n^3 = {h_{A,v}(x) = Ax ⊕ v | A is an n × n random matrix and v ∈ {0,1}^n} is a 1/2^n-B, 1/2^n-U and 1/2^n-AXU family of hash functions.

Note that #H_n^1 = 2^n, #H_n^2 = 2^{2n}, and #H_n^3 = 2^{n²+n}.
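As a concrete rendering of the first two example families, the sketch below realises GF(2^8) with carry-less multiplication modulo x^8 + x^4 + x^3 + x + 1 (our choice of field size and modulus, for illustration only) and counts, for one triple (x, x′, y), how many keys of H^1 satisfy h(x) ⊕ h(x′) = y; the 1/2^n-AXU property says exactly one.

```python
# Realisation of the example families H^1 and H^2 over GF(2^8).
N = 8
MOD = 0x11B                     # x^8 + x^4 + x^3 + x + 1 (the AES polynomial)

def gf_mul(a, b):
    """Carry-less multiplication in GF(2^8) modulo MOD."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> N:
            a ^= MOD
        b >>= 1
    return r

def h1(a, x):                   # member of H^1: x -> a*x
    return gf_mul(a, x)

def h2(a, b, x):                # member of H^2: x -> a*x + b
    return gf_mul(a, x) ^ b

# AXU spot-check for H^1: for distinct x, x' the equation
# h_a(x) XOR h_a(x') = a*(x XOR x') = y has exactly one root a,
# i.e. exactly (1/2^n) * #H^1 keys.
x, xp, y = 3, 5, 7
print(sum(1 for a in range(1 << N) if h1(a, x) ^ h1(a, xp) == y))   # prints 1
```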
Definition 2.6. We say that
1. h is an ε-uniform (ε-U) hash function if h ∈ H_n, where H_n is an ε-U family of hash functions.
2. h is an ε-almost XOR universal (ε-AXU) hash function if h ∈ H_n, where H_n is an ε-AXU family of hash functions.
3. h is an ε-bi-symmetric (ε-B) hash function if h ∈ H_n, where H_n is an ε-B family of hash functions.

We sometimes omit ε if it is negligible. For example, we say that h is a bi-symmetric and AXU hash function if it is an ε1-bi-symmetric and ε2-AXU hash function for some negligible ε1 and ε2.
3 Bi-Symmetry Is Almost Equivalent to Uniformity
Patel, Ramzan and Sundaram [9] introduced a notion of bi-symmetric hash functions and showed that φ(h1, f, f, h2) is super-pseudorandom if each hi is a bi-symmetric and AXU hash function. In this section, we prove that the notion of bi-symmetric hash functions is almost equivalent to the well-known notion of uniform hash functions.

Theorem 3.1. If H_n is an ε-U family of hash functions, then it is an ε-B family of hash functions.

Proof. Let H_n be an ε-U family of hash functions. Let x, x′ ∈ {0,1}^n (not necessarily distinct) and y ∈ {0,1}^n be any elements. Let h′ be any function in H_n. Then we have at most ε·#H_n hash functions h ∈ H_n such that h(x) = y ⊕ h′(x′), since H_n is an ε-U family of hash functions. Therefore, we have at most ε·(#H_n)² pairs of hash functions (h, h′) ∈ H_n × H_n such that h(x) ⊕ h′(x′) = y. ✷

Theorem 3.2. If H_n is an ε-B family of hash functions, then it is a √ε-U family of hash functions.
Proof. Assume that H_n is not a √ε-U family of hash functions. We show that it is not an ε-B family of hash functions. Since H_n is not a √ε-U family of hash functions, there exist x and y such that

    #{h | h ∈ H_n, h(x) = y} > √ε · #H_n.

Then we have

    #{(h, h′) | (h, h′) ∈ H_n × H_n, h(x) ⊕ h′(x) = 0} > ε · (#H_n)²,

since we have at least √ε·#H_n hash functions h ∈ H_n such that h(x) = y, and at least √ε·#H_n hash functions h′ ∈ H_n such that h′(x) = y. Therefore, H_n is not an ε-B family of hash functions. ✷

Suppose that h ∈ H_n is ε-B and ε′-U. Then ε ≤ ε′ ≤ √ε. Note that ε′ is negligible if ε is negligible. Therefore, h ∈ H_n is a uniform hash function if and only if it is a bi-symmetric hash function.
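The relations of Theorems 3.1 and 3.2 can be verified exhaustively on a tiny family. The sketch below (GF(2^3) with modulus x^3 + x + 1, and the family H^2 of Section 2.4; all sizes are our choices, small enough to enumerate) computes the best uniformity bound ε_U and the best bi-symmetry bound ε_B, and checks ε_B ≤ ε_U ≤ √ε_B.

```python
# Exhaustive epsilon-U / epsilon-B bounds for H^2 over GF(2^3).
import math

N = 3
MOD = 0b1011                    # x^3 + x + 1, irreducible over GF(2)

def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> N:
            a ^= MOD
        b >>= 1
    return r

D = 1 << N
# value table: one row per hash function h_{a,b}, indexed by input x
tab = [[gf_mul(a, x) ^ b for x in range(D)] for a in range(D) for b in range(D)]
size = len(tab)

eps_U = max(sum(1 for t in tab if t[x] == y)
            for x in range(D) for y in range(D)) / size
eps_B = max(sum(1 for t in tab for tp in tab if t[x] ^ tp[xp] == y)
            for x in range(D) for xp in range(D) for y in range(D)) / size ** 2
print(eps_U, eps_B, math.sqrt(eps_B))   # 0.125 0.125 0.3535...
```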
4 Bi-Symmetry Is Redundant
In this section, we first prove that AXU hash functions are strictly weaker primitives than bi-symmetric and AXU hash functions. We next show that bi-symmetry is redundant in h1 of the PRS construction.

4.1 AXU Is Strictly Weaker than Bi-Symmetric and AXU
We show that there exists an ε1-AXU family of hash functions which is not an ε2-B family of hash functions for any negligible ε2. Consider H_n^1 of Section 2.4. It is easy to see that #H_n^1 = 2^n and that H_n^1 is a 1/2^n-AXU family of hash functions. Now, let x = x′ = y = 0. Then the number of pairs (a, a′) such that a·x ⊕ a′·x′ = y, which is equivalent to a·0 ⊕ a′·0 = 0, is (2^n)². Therefore, it is an ε-B family of hash functions only with ε = 1.

4.2 Bi-Symmetry Is Redundant in h1 of PRS Construction
We next show that bi-symmetry is redundant in h1 of the PRS construction. Let h1 ∈ H_n be an ε1-AXU hash function, h2 ∈ H_n be an ε2-U and ε3-AXU hash function, f ∈ F_n be a random function, and φ = φ(h1, f, f, h2).

Lemma 4.1. Fix x^{(i)} ∈ {0,1}^{2n} and y^{(i)} ∈ {0,1}^{2n} for 1 ≤ i ≤ q arbitrarily, in such a way that {x^{(i)}}_{1≤i≤q} are all distinct and {y^{(i)}}_{1≤i≤q} are all distinct. Then the number of (h1, f, h2) such that

    φ(x^{(i)}) = y^{(i)} for 1 ≤ ∀i ≤ q    (1)

is at least
    ((#H_n)(#F_n)(#H_n) / 2^{2qn}) · (1 − ε1·q(q−1)/2 − ε2·q² − ε3·q(q−1)/2).
A proof is given in the next section. Let Φ = {φ | φ = φ(h1, f, f, h2), h1 ∈ H_n, f ∈ F_n, h2 ∈ H_n} and let R ∈ P_{2n} be a random permutation.

Theorem 4.1. For any adversary A that makes at most q queries in total,

    Adv_Φ^{sprp}(A) ≤ ε1·q(q−1)/2 + ε2·q² + ε3·q(q−1)/2 + q(q−1)/2^{2n+1}.

Therefore, φ(h1, f, f, h2) is a super-pseudorandom permutation if ε1, ε2 and ε3 are negligible.

Proof. Let O = R or φ. The adversary A has oracle access to O and O^{−1}. There are two types of queries A can make: either (+, x), which denotes the query "what is O(x)?", or (−, y), which denotes the query "what is O^{−1}(y)?" For the i-th query A makes to O or O^{−1}, define the query-answer pair (x^{(i)}, y^{(i)}) ∈ {0,1}^{2n} × {0,1}^{2n}, where either A's query was (+, x^{(i)}) and the answer it got was y^{(i)}, or A's query was (−, y^{(i)}) and the answer it got was x^{(i)}. Define the view v of A as v = ((x^{(1)}, y^{(1)}), ..., (x^{(q)}, y^{(q)})).

Without loss of generality, we assume that {x^{(i)}}_{1≤i≤q} are all distinct and {y^{(i)}}_{1≤i≤q} are all distinct. Since A has unbounded computational power, A can be assumed to be deterministic. This implies that for every 1 ≤ i ≤ q the i-th query of A is fully determined by the first i − 1 query-answer pairs. Similarly, the final output of A (0 or 1) depends only on v. Hence denote by C_A(v) the final output of A as a function of v. Let v_one = {v | C_A(v) = 1} and N_one = #v_one.

Evaluation of p_R. Define p_R = Pr[R ←_R P_{2n} : A^{R,R^{−1}} = 1]. Then we have

    p_R = #{R | A^{R,R^{−1}} = 1} / (2^{2n})!.

For each v ∈ v_one, the number of R such that

    R(x^{(i)}) = y^{(i)} for 1 ≤ ∀i ≤ q    (2)

is exactly (2^{2n} − q)!. Therefore, we have

    p_R = Σ_{v ∈ v_one} #{R | R satisfying (2)} / (2^{2n})! = N_one · (2^{2n} − q)! / (2^{2n})!.

Evaluation of p_φ. Define p_φ as

    p_φ = Pr[h1 ←_R H_n, f ←_R F_n, h2 ←_R H_n : A^{φ,φ^{−1}} = 1].
Then we have p_φ = #{(h1, f, h2) | A^{φ,φ^{−1}} = 1} / ((#H_n)(#F_n)(#H_n)). Similarly to p_R, we have

    p_φ = Σ_{v ∈ v_one} #{(h1, f, h2) | (h1, f, h2) satisfying (1)} / ((#H_n)(#F_n)(#H_n)).

Then from Lemma 4.1, we obtain

    p_φ ≥ Σ_{v ∈ v_one} (1 − ε1·q(q−1)/2 − ε2·q² − ε3·q(q−1)/2) / 2^{2qn}
        = (N_one / 2^{2qn}) · (1 − ε1·q(q−1)/2 − ε2·q² − ε3·q(q−1)/2)
        = p_R · (2^{2n})! / (2^{2qn} · (2^{2n} − q)!) · (1 − ε1·q(q−1)/2 − ε2·q² − ε3·q(q−1)/2).

Since (2^{2n})! / (2^{2qn} · (2^{2n} − q)!) ≥ 1 − q(q−1)/2^{2n+1} (this can be shown easily by induction on q), we have

    p_φ ≥ p_R · (1 − q(q−1)/2^{2n+1}) · (1 − ε1·q(q−1)/2 − ε2·q² − ε3·q(q−1)/2)
        ≥ p_R · (1 − ε1·q(q−1)/2 − ε2·q² − ε3·q(q−1)/2 − q(q−1)/2^{2n+1})
        ≥ p_R − ε1·q(q−1)/2 − ε2·q² − ε3·q(q−1)/2 − q(q−1)/2^{2n+1}.    (3)

Applying the same argument to 1 − p_φ and 1 − p_R yields

    1 − p_φ ≥ 1 − p_R − ε1·q(q−1)/2 − ε2·q² − ε3·q(q−1)/2 − q(q−1)/2^{2n+1}.    (4)

Finally, (3) and (4) give

    |p_φ − p_R| ≤ ε1·q(q−1)/2 + ε2·q² + ε3·q(q−1)/2 + q(q−1)/2^{2n+1}. ✷
5 Proof of Lemma 4.1
For (x^{(i)}, y^{(i)}), we denote by I_2^{(i)} ∈ {0,1}^n the input to f in the second round of φ, and by O_2^{(i)} ∈ {0,1}^n its output. Similarly, for (x^{(i)}, y^{(i)}), I_3^{(i)}, O_3^{(i)} ∈ {0,1}^n are the input and output of f in the third round, respectively. See Figure 2.

[Fig. 2. The labeling convention used in the proof of Lemma 4.1]
Number of h1. For any fixed i and j such that 1 ≤ i < j ≤ q:
- if x_L^{(i)} = x_L^{(j)}, then there exists no h1 ∈ H_n such that

    h1(x_L^{(i)}) ⊕ x_R^{(i)} = h1(x_L^{(j)}) ⊕ x_R^{(j)},    (5)

  since x_L^{(i)} = x_L^{(j)} implies x_R^{(i)} ≠ x_R^{(j)};
- if x_L^{(i)} ≠ x_L^{(j)}, then the number of h1 ∈ H_n which satisfy (5) is at most ε1·#H_n, since h1 is an ε1-AXU hash function.

Therefore, the number of h1 ∈ H_n such that

    h1(x_L^{(i)}) ⊕ x_R^{(i)} = h1(x_L^{(j)}) ⊕ x_R^{(j)} for some 1 ≤ i < j ≤ q

is at most ε1·(q(q−1)/2)·#H_n. Then, the number of h1 ∈ H_n such that

    h1(x_L^{(i)}) ⊕ x_R^{(i)} ≠ h1(x_L^{(j)}) ⊕ x_R^{(j)} for all 1 ≤ i < j ≤ q    (6)

is at least #H_n − ε1·(q(q−1)/2)·#H_n. Fix an h1 which satisfies (6) arbitrarily. This implies that I_2^{(1)}, ..., I_2^{(q)} are fixed in such a way that I_2^{(i)} ≠ I_2^{(j)} for all 1 ≤ i < j ≤ q.

Number of h2. For any fixed i and j such that 1 ≤ i < j ≤ q:
- if y_R^{(i)} = y_R^{(j)}, then there exists no h2 ∈ H_n such that

    h2(y_R^{(i)}) ⊕ y_L^{(i)} = h2(y_R^{(j)}) ⊕ y_L^{(j)},    (7)

  since y_R^{(i)} = y_R^{(j)} implies y_L^{(i)} ≠ y_L^{(j)};
- if y_R^{(i)} ≠ y_R^{(j)}, then the number of h2 ∈ H_n which satisfy (7) is at most ε3·#H_n, since h2 is an ε3-AXU hash function.

Therefore the number of h2 ∈ H_n such that

    h2(y_R^{(i)}) ⊕ y_L^{(i)} = h2(y_R^{(j)}) ⊕ y_L^{(j)} for some 1 ≤ i < j ≤ q    (8)

is at most ε3·(q(q−1)/2)·#H_n. Next, for any fixed i and j such that 1 ≤ i, j ≤ q (not necessarily distinct), the number of h2 ∈ H_n such that h2(y_R^{(i)}) ⊕ y_L^{(i)} = I_2^{(j)} is at most ε2·#H_n, since h2 is an ε2-U hash function. Therefore, the number of h2 ∈ H_n such that

    h2(y_R^{(i)}) ⊕ y_L^{(i)} = I_2^{(j)} for some 1 ≤ i, j ≤ q    (9)

is at most ε2·q²·#H_n. Then from (8) and (9), the number of h2 ∈ H_n such that

    h2(y_R^{(i)}) ⊕ y_L^{(i)} ≠ h2(y_R^{(j)}) ⊕ y_L^{(j)} for all 1 ≤ i < j ≤ q, and
    h2(y_R^{(i)}) ⊕ y_L^{(i)} ≠ I_2^{(j)} for all 1 ≤ i, j ≤ q,    (10)

is at least #H_n − ε3·(q(q−1)/2)·#H_n − ε2·q²·#H_n. Fix an h2 which satisfies (10) arbitrarily. This implies that I_3^{(1)}, ..., I_3^{(q)} are fixed in such a way that I_3^{(i)} ≠ I_3^{(j)} for all 1 ≤ i < j ≤ q, and I_3^{(i)} ≠ I_2^{(j)} for all 1 ≤ i, j ≤ q.

Number of f. Now h1 and h2 are fixed in such a way that

    I_2^{(1)}, ..., I_2^{(q)} and I_3^{(1)}, ..., I_3^{(q)}

(which are inputs to f) are all distinct, and the corresponding outputs

    x_L^{(1)} ⊕ I_3^{(1)}, ..., x_L^{(q)} ⊕ I_3^{(q)} and I_2^{(1)} ⊕ y_R^{(1)}, ..., I_2^{(q)} ⊕ y_R^{(q)}

are fixed. In other words, for f, the above 2q input-output pairs are determined. The other 2^n − 2q input-output pairs are undetermined. Therefore we have (2^n)^{2^n − 2q} = #F_n / 2^{2qn} possible choices of f for any such fixed h1 and h2.

Completing the Proof. To summarize, we have:
- at least #H_n − ε1·(q(q−1)/2)·#H_n choices of h1,
- at least #H_n − ε3·(q(q−1)/2)·#H_n − ε2·q²·#H_n choices of h2 when h1 is fixed, and
- #F_n / 2^{2qn} choices of f when h1 and h2 are fixed.

Then the number of (h1, f, h2) which satisfy (1) is at least

    ((#H_n)(#F_n)(#H_n) / 2^{2qn}) · (1 − ε1·q(q−1)/2) · (1 − ε3·q(q−1)/2 − ε2·q²)
    ≥ ((#H_n)(#F_n)(#H_n) / 2^{2qn}) · (1 − ε1·q(q−1)/2 − ε2·q² − ε3·q(q−1)/2).

This concludes the proof of the lemma. ✷
References

[1] J. L. Carter and M. N. Wegman. Universal classes of hash functions. J. Comput. Syst. Sci., vol. 18, no. 2, pp. 143–154, 1979.
[2] M. Luby and C. Rackoff. How to construct pseudorandom permutations from pseudorandom functions. SIAM J. Comput., vol. 17, no. 2, pp. 373–386, April 1988.
[3] S. Lucks. Faster Luby-Rackoff ciphers. Fast Software Encryption, FSE '96, LNCS 1039, pp. 189–203, Springer-Verlag.
[4] M. Naor and O. Reingold. On the construction of pseudorandom permutations: Luby-Rackoff revisited. J. Cryptology, vol. 12, no. 1, pp. 29–66, Springer-Verlag, 1999.
[5] Y. Ohnishi. A study on data security. Master's Thesis (in Japanese), Tohoku University, 1988.
[6] J. Patarin. Pseudorandom permutations based on the DES scheme. Proceedings of EUROCODE '90, LNCS 514, pp. 193–204, Springer-Verlag, 1990.
[7] J. Patarin. New results on pseudorandom permutation generators based on the DES scheme. Advances in Cryptology — CRYPTO '91, LNCS 576, pp. 301–312, Springer-Verlag, 1991.
[8] J. Patarin. How to construct pseudorandom and super pseudorandom permutations from one single pseudorandom function. Advances in Cryptology — EUROCRYPT '92, LNCS 658, pp. 256–266, Springer-Verlag, 1992.
[9] S. Patel, Z. Ramzan, and G. Sundaram. Towards making Luby-Rackoff ciphers optimal and practical. Fast Software Encryption, FSE '99, LNCS 1636, pp. 171–185, Springer-Verlag, 1999.
[10] S. Patel, Z. Ramzan, and G. Sundaram. Luby-Rackoff ciphers: Why XOR is not so exclusive. Preproceedings of Selected Areas in Cryptography, SAC 2002, 2002.
[11] J. Pieprzyk. How to construct pseudorandom permutations from single pseudorandom functions. Advances in Cryptology — EUROCRYPT '90, LNCS 473, pp. 140–150, Springer-Verlag, 1990.
[12] Z. Ramzan and L. Reyzin. On the round security of symmetric-key cryptographic primitives. Advances in Cryptology — CRYPTO 2000, LNCS 1880, pp. 376–393, Springer-Verlag, 2000.
[13] R. A. Rueppel. On the security of Schnorr's pseudorandom generator. Advances in Cryptology — EUROCRYPT '89, LNCS 434, pp. 423–428, Springer-Verlag, 1989.
[14] B. Sadeghiyan and J. Pieprzyk. On necessary and sufficient conditions for the construction of super pseudorandom permutations. Advances in Cryptology — ASIACRYPT '91, LNCS 739, pp. 194–209, Springer-Verlag, 1991.
[15] B. Sadeghiyan and J. Pieprzyk. A construction of super pseudorandom permutations from a single pseudorandom function. Advances in Cryptology — EUROCRYPT '92, LNCS 658, pp. 267–284, Springer-Verlag, 1992.
[16] C. P. Schnorr. On the construction of random number generators and random function generators. Advances in Cryptology — EUROCRYPT '88, LNCS 330, pp. 225–232, Springer-Verlag, 1988.
[17] D. R. Stinson. On the connections between universal hashing, combinatorial designs and error-correcting codes. Congressus Numerantium, vol. 114, pp. 7–27, 1996.
[18] M. N. Wegman and J. L. Carter. New hash functions and their use in authentication and set equality. J. Comput. Syst. Sci., vol. 22, no. 3, pp. 265–279, 1981.
[19] Y. Zheng, T. Matsumoto, and H. Imai. Impossibility and optimality results on constructing pseudorandom permutations. Advances in Cryptology — EUROCRYPT '89, LNCS 434, pp. 412–422, Springer-Verlag, 1990.
Threshold MACs

Keith M. Martin¹, Josef Pieprzyk², Rei Safavi-Naini³, Huaxiong Wang², and Peter R. Wild¹

¹ Information Security Group, Royal Holloway, University of London, Egham, Surrey TW20 0EX, U.K.
² Centre for Advanced Computing – Algorithms and Cryptography, Department of Computing, Macquarie University, North Ryde, NSW 2109, Australia
³ Centre for Computer Security Research, School of Information Technology and Computer Science, University of Wollongong, Northfields Avenue, Wollongong 2522, Australia
Abstract. The power of sharing computation in a cryptosystem is crucial in several real-life applications of cryptography. Cryptographic primitives and tasks to which threshold cryptosystems have been applied include variants of digital signatures, identification, public-key encryption, and block ciphers. It is desirable to extend the domain of cryptographic primitives to which threshold cryptography can be applied. This paper studies threshold message authentication codes (threshold MACs). Threshold cryptosystems usually use algebraically homomorphic properties of the underlying cryptographic primitives: a typical approach to constructing a threshold cryptographic scheme is to combine a (linear) secret sharing scheme with an algebraically homomorphic cryptographic primitive. The lack of algebraic properties of MACs rules out such an approach to sharing MACs. In this paper, we propose a method of obtaining a threshold MAC using a combinatorial approach. Our method is generic in the sense that it is applicable to any secure conventional MAC, by making use of certain combinatorial objects such as cover-free families and their variants. We discuss the issue of anonymity in threshold cryptography, a subject that has not previously been addressed in the literature, and we show that there are trade-offs between the anonymity and efficiency of threshold MACs.
1 Introduction
Providing the integrity and authenticity of information is a major task in computer systems and networks. Message integrity is typically achieved by sharing a secret key k between the sender and the receiver. When sending a message m, the sender computes a keyed hash value σ = F_k(m), called a MAC or authentication tag, and transmits the string σ along with the message. At reception, the receiver recomputes the authentication tag σ′ on the received message using the shared key and checks the authenticity of the message by comparing the values of the tags σ and σ′.
Threshold cryptography has been extensively studied in the past decade. Its main goal is to replace a single entity by a group of entities. The power of sharing computation in a cryptosystem is crucial in many real-life applications of cryptography. Cryptographic primitives to which threshold cryptosystems have been applied include digital signatures, identification schemes, public-key encryption, and block ciphers. Algebraically homomorphic properties have played a crucial role in threshold cryptography by allowing several parties to jointly perform a cryptographic task. However, there are cryptographic primitives for which algebraic properties should be avoided, or otherwise their security is questionable. Examples of such primitives include block ciphers, pseudo-random functions, pseudo-random permutations, and MACs. It is desirable to extend the domain of cryptographic primitives to which threshold cryptography can be applied.

In this paper, we generalise MACs to the threshold setting. We study how to share a MAC among a group of senders in such a way that only authorised subsets of the senders can generate valid authenticated messages. In particular, we consider (t, n) threshold MACs, in which any t out of n senders can produce valid authentication tags while no collection of t − 1 or fewer senders is able to do so. This means that in a (t, n) threshold MAC, receiving a valid authenticated message implies authorisation by at least t senders. Due to the lack of algebraic properties, our approach to threshold MACs is combinatorial: it implements conventional MACs using combinatorial objects, such as a cover-free family and its variants.

Related Work. Previous work on threshold authentication follows two tracks: (1) the computational model, based on digital signatures, and (2) the information-theoretic model, based on unconditionally secure authentication codes. Both approaches rely heavily on homomorphic properties of the underlying authentication functions. Suppose F : K × M → T is an authentication function (e.g. an RSA signature or Simmons' authentication code), where K, M and T are the sets of keys, messages and authentication tags/signatures, respectively. If F possesses the homomorphic property

    F(k1 + k2, m) = F(k1, m) ⊕ F(k2, m),

where + and ⊕ are algebraic operations defined on K and T, respectively, then sharing F can be achieved as follows. We first share the secret key k among the parties using a linear secret sharing scheme; each party computes a partial tag (or partial signature) of the form F(ki, m), and the valid authentication tag (or signature) F(k, m) can then be computed as a (linear) combination of the partial tags/signatures generated by an authorised set of parties.

In the computational model, threshold signature schemes were independently introduced by Desmedt [8] and Boyd [6]. They have been extensively studied over the past decade (see, for example, [12, 17]). In a (t, n) threshold signature scheme, signature generation requires the collaboration of at least t members out of n participants.
In the information-theoretic model, threshold message authentication codes were introduced by Desmedt, Frankel and Yung [9]. They gave two constructions for unconditionally secure (t, n) threshold authentication under Simmons' authentication model. Bounds and constructions for their model have been extensively studied (see, for example, [10, 13]). Although the construction of threshold authentication schemes generally uses a combination of secret sharing schemes and signature schemes or authentication codes, it is a well-known fact that a simplistic combination of the two primitives can result in a completely insecure system that allows the members of an unauthorised group to recover the secret key of the underlying authentication scheme. In a secure threshold authentication scheme, such as a threshold signature, the power of signature generation must be shared among n participants in such a way that any t or more participants can collaborate to produce a valid signature for any given message, whilst no subset of fewer than t participants can produce a signature, even if many signatures on different messages are known.

While most previous threshold cryptosystems were based on cryptographic primitives with homomorphic properties, the work of Micali and Sidney [15] was perhaps the first to deal with systems based on non-homomorphic primitives. They proposed a method for generating a pseudo-random function f(·) that is shared among n users in such a way that for all inputs x, any u users can jointly compute f(x) while fewer than t users fail to do so, where 0 < t ≤ u ≤ n. The idea behind the Micali-Sidney scheme is to generate and distribute secret seeds S = {s1, ..., sd} of a random collection of functions among the n users, each user receiving a subset of S, in such a way that any u players together hold all the secret seeds in S while fewer than t users lack at least one element of S. The pseudo-random function is then computed as

    f(x) = ⊕_{i=1}^{d} f_{si}(x),

where f_{si}(·), i = 1, ..., d, are poly-random functions. Since a MAC can be constructed from a pseudo-random function [1], the techniques of [15] can be effectively adapted to sharing MACs in a straightforward manner, with the secret seeds replaced by the secret keys of the MACs. Brickell, Di Crescenzo and Frankel [7] discussed similar techniques for sharing the computation of block ciphers. However, as shown in [15, 7], the number of secret seeds or keys, d, of the system is exponential in n in the worst case. Recently, Martin et al [14] extended the work of [15, 7] and presented methods and techniques for sharing the computation of block ciphers that significantly reduce the number of keys, from 2^{O(n)} to O(log n) in the optimal form. They also suggested methods for sharing MACs that are further developed in this paper.

Our Work. In this paper, we continue the work of [14] and study methods of sharing MACs. We give a model of threshold MACs and present constructions for them. The basic idea of our constructions for a (t, n) threshold MAC can be summarised as follows. Let F : {0,1}^κ × M → {0,1}^ℓ be a MAC. We first construct a new v-fold XOR MAC F^{(v)} : ({0,1}^κ)^v × M → {0,1}^ℓ defined by

    F^{(v)}((k1, ..., kv), m) = ⊕_{i=1}^{v} F(ki, m),

which, in turn, is a secure MAC provided F is secure. We next share F^{(v)} by distributing the keys X = {k1, ..., kv}
among the n senders P1, ..., Pn in such a way that each sender Pi is given a subset Bi of X, and any t senders together can recover all the keys in X while fewer than t senders miss at least one key from X. Such a construction indeed results in a (t, n) threshold MAC; it is, however, very inefficient, as it requires the receiver to hold C(n, t−1) keys and each sender to store C(n−1, t−1) keys (here C(n, t) denotes the binomial coefficient "n choose t"), so the number of keys for both senders and receiver is exponential in n.

We improve the above approach using a combinatorial object, called a cover-free family, to assign subsets of keys to the senders. Informally, a set system (X, B), where B = {B1, ..., Bn} is a family of subsets of a set X with |X| = v, is called an (n, v, t) cover-free family if no block Bi is a subset of the union of any other t − 1 blocks in B. If the keys are the elements of X, allocated to the n senders using the sets B1, ..., Bn of an (n, v, t) cover-free family, then obviously any t senders can construct an r-fold XOR MAC F^{(r)} based on their key components, where r is the number of keys the t senders hold altogether. An adversary who corrupts up to t − 1 senders is unable to forge a valid authenticated message for F^{(r)}, due to the cover-free property. Such a construction reduces the numbers of keys for both senders and receiver to O(log n).

We show that, although promising, the cover-free family construction for threshold MACs in general does not provide anonymity for the senders who generate the MAC. That is, in order to enable the receiver to verify valid authentication tags, the identities of the senders who generate the MAC need to be revealed to the receiver. To hide the group structure of the MAC operations of the senders, it is important to provide a certain level of anonymity for the senders who generated the MAC. A threshold MAC may be used to show that the required minimum number of users have authorised a message; they provide this authorisation on behalf of a larger group of users. An application requiring such a cryptographic primitive might be one in which users are not always active or available. It may be important to protect the identities of the active users from an adversary who may want to launch a denial-of-service attack or to eliminate those users altogether.

Thus, we further improve the cover-free based constructions. In particular, we design a special class of cover-free families, called generalised cumulative arrays (or GCAs), that provide a certain level of anonymity for the group while maintaining the key efficiency of the cover-free approach. The idea is to construct cover-free families in which many distinct groups of t senders come up with the same keys, so that the receiver cannot distinguish which group actually generated the MAC. We show that there is a trade-off between key efficiency and anonymity in our cover-free approach.

We stress that our construction methods are generic: (1) they use the combination of any secure MAC and some special cover-free set systems; (2) they are suitable for sharing other non-homomorphic cryptographic primitives, such as block ciphers, pseudo-random functions, and pseudo-random permutations.

The paper is organised as follows. In Section 2 we briefly review the basic definition of MACs. In Section 3 we introduce the model of threshold MACs and
the concept of anonymity. Sections 5 to 7 are devoted to constructions of threshold MACs. We conclude the paper in Section 8.
2 Message Authentication Codes
A message authentication code is a 3-tuple M = (KGEN, MAC, VF), consisting of the key generation algorithm, the MAC generation (or signing) algorithm, and the MAC verification algorithm. For a given security parameter, the key generation algorithm KGEN returns keys for the sender and receiver, which are typically random strings of appropriate length. For a message m and a key k, the algorithm MAC generates a short tag, or MAC, which is transmitted along with the message m in order to authenticate the message. For a message m, a key k, and a tag σ, the verification algorithm VF returns 0 or 1, with 1 accepting the message m as authentic and 0 otherwise. If there is no confusion, we will sometimes denote a MAC M = (KGEN, MAC, VF) simply by the generation algorithm MAC. Namely, a message authentication code is a family of functions F : {0,1}^κ × M → {0,1}^ℓ, where K = {0,1}^κ and M are the key space and the message space, respectively, and ℓ is the length of the tag.

We consider the security of the MAC under a chosen message attack. The adversary sees a sequence (m1, σ1), (m2, σ2), ..., (mq, σq) of pairs of messages and their corresponding tags (that is, σi = F(k, mi)) transmitted from the sender to the receiver for some chosen messages m1, ..., mq. The adversary breaks the MAC if she can find a message m, not included among m1, ..., mq, together with its corresponding valid authentication tag σ = F(k, m). The success probability of the adversary is the probability that she breaks F. Following [1], we can formally measure the success of an adversary O by the following experiment:

Experiment Forge(F, O)
    k ← K; (m, σ) ← O^{F(k,·)}
    If F(k, m) = σ and m was not a query of O then return 1 else return 0

The security of a MAC is measured by its resistance to existential forgery under chosen-message attack, which is captured by giving the adversary O access to an oracle F(k, ·). The experiment returns 1 when O is successful and 0 otherwise.

Definition 1 ([1]). Let F : {0,1}^κ × M → {0,1}^ℓ be a MAC, and O a forger. The success probability of O is defined as

    Adv^{mac}_F(O) = Pr[Experiment Forge(F, O) returns 1].

We associate with F an insecurity function Adv^{mac}_F(·,·) defined for any integers q, s ≥ 0 via

    Adv^{mac}_F(q, s) = max_O { Adv^{mac}_F(O) }.

The maximum is over all adversaries O such that the oracle in Experiment Forge(F, O) is invoked at most q times, and the "running time" is at most s.
Definition 2. Let F : {0,1}^κ × M → {0,1}^ℓ be a MAC, and O a forger. We say that O (s, q, ε)-breaks F if

    Adv^{mac}_F(q, s) ≥ ε.

We say F is (s, q, ε)-secure if no forger (s, q, ε)-breaks it. If F is (s, q, ε)-secure for all values s and q, we say F is ε-secure.
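The forging experiment above is easy to exercise mechanically. The sketch below uses HMAC-SHA256 as a stand-in for F and a naive blind-guessing adversary; both are our assumptions for illustration. Its success rate should be negligible, matching what Definition 2 demands of a secure MAC.

```python
# Toy run of Experiment Forge with a blind-guessing adversary.
import hmac, hashlib, os

def F(key, msg):
    """HMAC-SHA256 as an illustrative stand-in for the MAC F."""
    return hmac.new(key, msg, hashlib.sha256).digest()

def forge_experiment(adversary):
    key = os.urandom(32)                    # k <- K
    queried = set()
    def oracle(m):                          # the oracle F(k, .)
        queried.add(m)
        return F(key, m)
    m, sigma = adversary(oracle)
    ok = m not in queried and hmac.compare_digest(F(key, m), sigma)
    return int(ok)                          # 1 iff a fresh (m, sigma) verifies

def guessing_adversary(oracle):
    oracle(b"probe")                        # one chosen-message query
    return b"target", os.urandom(32)        # then a blind guess at the tag

trials = 1000
wins = sum(forge_experiment(guessing_adversary) for _ in range(trials))
print(wins, "forgeries in", trials, "trials")   # expect 0
```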
3 Threshold MAC
Assume there are n senders P = {P1, P2, ..., Pn} and a receiver R. In a (t, n) threshold MAC, any t-subset of P is able to jointly generate valid tags for messages, while fewer than t senders should not be able to do so. Formally, a (t, n) threshold MAC is a 3-tuple M[n,t] = (KGEN, MAC, VF) consisting of the following algorithms:
1. KGEN: For a security parameter, the key generation algorithm KGEN returns a key kR ∈ K for the receiver R and keys ki ∈ Ki for each sender Pi, where K and Ki are the sets of possible keys for the receiver and sender Pi, for 1 ≤ i ≤ n.
2. MAC: The MAC generation is a multi-party protocol in which any t senders collaboratively generate tags (MACs) for messages. Namely, for a message m and the keys of a set of t senders, it returns a valid authentication tag σ.
3. VF: For a message, the key kR of the receiver, and a (purported) tag, the verification algorithm returns 0 or 1, with 1 indicating that the tag was generated by t senders from P, and 0 otherwise.

We make a few remarks on the above definition.
- Key Generation: We assume that all keys of the system are generated and securely distributed to the senders by the receiver. Although key generation and distribution could be done by a trusted authority (TA), our model is conceptually different from the model with a TA. In our model, the receiver not only knows all the key information, but also the identity of the sender associated with each key. However, if keys are generated and distributed by a TA, a sender can hide his identity from the key content. This has an impact on sender anonymity in the threshold authentication scheme, which we discuss later in this paper. Moreover, any threshold MAC without a TA can easily be adapted to the setting with a TA.
- MAC Generation: The generation of a MAC is done in two phases. First, assuming t senders A = {P_{i1}, ..., P_{it}} want to authenticate a message m, each sender P_{ij} securely computes his partial tag σ_i^A. Second, each sender in A transmits his partial tag to a combiner. The combiner computes the tag for the receiver as σ = C_A(σ_i^A ; Pi ∈ A) using some publicly known combination function C, and then transmits (m, σ) to the receiver.
- Communication Model: We assume that all communications during the generation and transmission of the partial MACs generated by the senders and of the MAC sent to the receiver are carried out over public broadcast channels.
We will assume that an adversary can corrupt up to t − 1 of the n senders. We consider the attack in which the adversary learns all the information held by the corrupted senders and listens to the broadcast messages and tags for messages of his choice. We will denote a (t, n) threshold MAC by M[n,t]. Suppose the adversary corrupts t − 1 senders P_{i1}, ..., P_{i(t−1)} and sees a sequence (m1, σ1), (m2, σ2), ..., (mq, σq) of pairs of messages and their corresponding tags transmitted to the receiver, where (mi, σi) is generated by t senders Ai = {P_{i1}, ..., P_{it}} for chosen messages m1, ..., mq. Moreover, the adversary also sees the partial authentication tags broadcast from P_{is} when Ai was generating σi, for all 1 ≤ s ≤ t and 1 ≤ i ≤ q. The adversary breaks the threshold MAC if he can find a message m, not included among m1, ..., mq, together with its corresponding valid authentication tag σ that is deemed to be generated collectively by t senders. The success probability of the adversary is the probability that he breaks the threshold MAC M[n,t]. As for conventional MACs, we can formally measure the success of an adversary O who corrupts t − 1 senders P_{i1}, ..., P_{i(t−1)} by the following experiment:

Experiment Forge(M[n,t], O[i1, i2, ..., i(t−1)])
    k ← K; ki ← Ki, 1 ≤ i ≤ n
    (m, σ) ← O^{M[n,t]((k, k1, ..., kn), ·)}
    If VF(k, (m, σ)) = 1 and m was not a query of O then return 1 else return 0

Definition 3. Let M[n,t] = (KGEN, MAC, VF) be a (t, n) threshold MAC, and O a forger. The success probability of O is defined as

    Adv^{mac}_{M[n,t]}(O) = Pr[Experiment Forge(M[n,t], O[i1, i2, ..., i(t−1)]) returns 1].

We associate to M[n,t] an insecurity function Adv^{mac}_{M[n,t]}(·,·) defined for any integers q, s ≥ 0 via

    Adv^{mac}_{M[n,t]}(q, s) = max_O { Adv^{mac}_{M[n,t]}(O) }.

The maximum is over all forgers O with respect to any t − 1 senders i1, ..., i(t−1) such that the oracle in Experiment Forge(M[n,t], O[i1, i2, ..., i(t−1)]) is invoked at most q times, and the "running time" is at most s.

Definition 4. Let M[n,t] be a (t, n) threshold MAC, and O a forger. We say that O (s, q, ε)-breaks M[n,t] if

    Adv^{mac}_{M[n,t]}(q, s) ≥ ε.

We say M[n,t] is (s, q, ε)-secure if no forger (s, q, ε)-breaks it. If M[n,t] is (s, q, ε)-secure for all values s and q, we say M[n,t] is ε-secure.
Anonymity in Threshold MAC. An issue that has not been explicitly addressed in the field of threshold cryptography is the anonymity of the parties who actually carry out the computation of the cryptographic tasks. In many applications of threshold cryptography, the parties who carry out the cryptographic tasks (e.g. signing/encryption) on behalf of the whole group may very well want to hide their individual identities or their group structure. We believe that anonymity is an important property for threshold cryptographic schemes. In the following, we introduce the concept of an anonymous threshold MAC.

Definition 5. Let M[n,t] be a (t, n) threshold MAC over n participants P. Let A be a t-subset of P. For any valid authenticated message (m, σ) generated by A, we denote by Pr_m(A) the probability that the receiver, on seeing (m, σ), can guess A correctly. We define the degree of anonymity for A by

    d(A) = 1 − max_{m ∈ {0,1}^L} Pr_m(A).

We define the overall degree of anonymity for M[n,t] by

    µ = ( Σ_{A ∈ Γ_{t,n}} d(A) ) / C(n, t),

where Γ_{t,n} = {A | A ⊆ P, |A| = t}. We say M[n,t] is µ-anonymous.

Since there are C(n, t) possible t-subsets of P, the degree of anonymity for any t-subset A satisfies d(A) ≤ 1 − 1/C(n, t). If equality holds for all A ∈ Γ_{t,n}, we say M[n,t] is perfectly anonymous. It is easy to see that M[n,t] is perfectly anonymous if and only if it is (1 − 1/C(n, t))-anonymous.
4 The Fundamental Lemma
In this section, we give a lemma that is fundamental to constructing threshold MACs in the rest of the paper. The construction was suggested in [15] and [14]. Let F : {0,1}^κ × {0,1}^L → {0,1}^ℓ be a MAC. We define a new MAC F^{(r)} : {0,1}^{κr} × {0,1}^L → {0,1}^ℓ by

    F^{(r)}((k1, ..., kr), x) = ⊕_{i=1}^{r} F(ki, x),

where (k1, ..., kr) ∈ {0,1}^{κr} and x ∈ {0,1}^L. We say F^{(r)} is the r-fold XOR MAC of F.

Lemma 1. If F is a secure MAC, then F^{(r)} is a secure MAC as well. Moreover, an adversary can generate a forged MAC for F^{(r)} only if she knows all key components in (k1, ..., kr). Namely, even if all but one of the key components are revealed to the adversary, she cannot generate a forged MAC.
Proof. Assume that an adversary O can be used to break F^{(r)}. We show that O can then be used to break F as well. We choose r − 1 keys k1, ..., k_{r−1} from K and construct the (r − 1)-fold XOR MAC F^{(r−1)} = F^{(r−1)}((k1, ..., k_{r−1}), ·). Given a sequence of valid authenticated messages (m1, σ1), ..., (mq, σq) for the MAC F, i.e., F(k, mi) = σi for some secret key k, our goal is to find a new pair (m, σ) such that σ = F(k, m) and m ≠ mi, i = 1, ..., q. To this end, we compute δi = F^{(r−1)}(mi) for i = 1, ..., q and feed O with (m1, δ1 ⊕ σ1), ..., (mq, δq ⊕ σq), which constitute q valid authenticated messages of F^{(r)}. The adversary O will then output a new valid pair (m, σ*) for F^{(r)}. We extract σ from σ* by σ = σ* ⊕ F^{(r−1)}(m), and it is clear that (m, σ) is a new valid authenticated message for F. Moreover, it is reasonable to assume that the computation cost (time) of computing F^{(r−1)}(mi) above is negligible.
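A direct rendering of the r-fold XOR MAC follows; HMAC-SHA256 stands in for the underlying secure MAC F, an assumption for illustration only.

```python
# Sketch of the r-fold XOR MAC F^{(r)} of Lemma 1.
import hmac, hashlib, os
from functools import reduce

def F(key, msg):
    """Underlying MAC; HMAC-SHA256 is a stand-in, not prescribed by the paper."""
    return hmac.new(key, msg, hashlib.sha256).digest()

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def F_r(keys, msg):
    """F^{(r)}((k1, ..., kr), m) = F(k1, m) XOR ... XOR F(kr, m)."""
    return reduce(xor_bytes, (F(k, msg) for k in keys))

keys = [os.urandom(32) for _ in range(4)]
tag = F_r(keys, b"hello")
# With any one key withheld, the remaining shares leave the tag masked by
# the missing F(ki, m), which is the content of Lemma 1.
partial = F_r(keys[:-1], b"hello")
assert xor_bytes(partial, F(keys[-1], b"hello")) == tag
```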
5 A Simple 0-Anonymous Threshold MAC
Based on Lemma 1, we start with a very simple (t, n) threshold MAC. It combines the t-fold XOR MAC of any secure MAC with Shamir's secret sharing scheme [16] in a straightforward manner, ensuring that the key of the receiver is only t times the size of the key held by any sender. However, it is 0-anonymous, so it does not provide any group anonymity.

0-Anonymous Threshold MAC. Let F : {0,1}^κ × {0,1}^L → {0,1}^ℓ be a MAC. The three phases of the scheme are as follows.
1. KGEN: Let {0,1}^κ correspond to the finite field GF(2^κ). The receiver randomly chooses t elements a0, a1, ..., a_{t−1} ∈ GF(2^κ) as his secret key and constructs the polynomial g(y) = Σ_{i=0}^{t−1} ai·y^i. The receiver then securely sends g(yi) to sender Pi, where y1, ..., yn are distinct public values in GF(2^κ).
2. MAC: Assume t senders A = {P_{i1}, ..., P_{it}} want to generate a MAC for a message m. Each P_{ij} computes F(g(y_{ij}), m) and broadcasts it to the other members of A. The final tag is computed as σ = F_A^{(t)}(m) = ⊕_{j=1}^{t} F(g(y_{ij}), m), and the message (m, σ, {i1, ..., it}) is sent to the receiver.
3. VF: The receiver, upon receiving a message (m, σ, {i1, ..., it}), uses the index set {i1, ..., it} to compute the keys for A, and verifies the equality σ = ⊕_{j=1}^{t} F(g(y_{ij}), m) to check authenticity.

Theorem 1. The above scheme is a secure (t, n) threshold MAC, provided that F is a secure MAC.
246
Keith M. Martin et al.
This scheme is very efficient in terms of the key lengths for the senders and the receiver, as each sender has the same key length as the underlying MAC and for the receiver, who only needs to hold the coefficients a0 , . . . , at−1 of the polynomial g, it only increase t times. The MAC length has only been increased by t log n bits compared to the underlying MAC. However, the identities of the group who generated the MAC has to be revealed to the receiver as their identities are appended to the message as part of the tag. It means that d(A) = 0 for all A, i.e., the scheme is 0-anonymous.
6
Threshold CFF MAC
In this section, we present an approach to construct threshold MACs that combines a secure MAC and a combinatorial object, called a cover-free family (CFF). Definition 6 ([11]). A set system (X, B) with X = {x1 , . . . , xv } and B = {Bi ⊆ X | i = 1, . . . , n} is called an (n, v, t)-cover-free family (or (n, v, t)-CFF for short) if for any subset ∆ ⊆ {1, . . . , n} with |∆| = t and any i ∈ ∆, |Bi \∪ j ∈ ∆ Bj | ≥ 1. j = i
The elements of X are called points and elements of B are called blocks. In other words, in a (n, v, t)-CFF (X, B) the union of any t − 1 blocks in B can not cover any other remaining one. Cover-free families were introduced by Erd¨ os, Frankl and Furedi [11]. They have found many applications in information theory, combinatorics, communication and cryptography. Threshold CFF MAC. Suppose (X, B) is an (n, v, t)-CFF and F : {0, 1}κ × {0, 1}L → {0, 1} is a secure MAC, we construct a (t, n) threshold MAC M[nt ] = (KGEN, M AC, V F ) as follows. 1. KGEN: The receiver randomly chooses v keys in {0, 1}κ, X = {k1 , . . . , kv }, and securely sends a subset Bi ⊆ X of keys to sender Pi , for 1 ≤ i ≤ n, such that (X, B) is an (n, v, t)-CFF, where B = {B1 , . . . , Bn }. 2. MAC: Suppose t senders A = {Pi1 , . . . , Pit } want to authenticate message m. The senders in A first compute the set of indices for their keys, that is, they compute I = {j | kj ∈ Bi1 ∪ · · · ∪ Bit }. Then the senders in A jointly (|I|) compute σ = FI (m) = ⊕j∈I Fkj (m) and send (m, σ, I) to the receiver. (|I|) 3. VF: Upon receiving a message (m, σ, I), the receiver recomputes FI (m), using the keys {kj | j ∈ I}, to verify the authenticity of the message. Theorem 2. Let F : {0, 1}κ ×{0, 1}L → {0, 1} be a secure MAC. If there exists an (n, v, t)-CFF, then there exists a secure (t, n) threshold MAC M[nt ]. Proof. The completeness is straightforward. We show the soundness of the scheme. Suppose an adversary corrupts t − 1 senders, he cannot find all the keys for any t senders since the collection of the key subsets of senders forms an (n, v, t)-CFF. Security for the scheme follows from Lemma 1 directly.
Note that in the scheme above, the key sizes of the receiver and of sender Pi are v times and |Bi| times the key size of the underlying MAC, respectively. The tag length of M[n,t] increases over that of the underlying MAC by only v·log n bits. Thus, to increase the storage and communication efficiency of the resulting threshold MAC, we want v to be as small as possible for given n and t. Note that reducing the value of v naturally reduces each |Bi|.

Constructions and bounds for (n, v, t)-CFFs have been studied by numerous authors (see, for example, [11, 19, 20]). It is shown in [20] that for an (n, v, t)-CFF with t ≥ 2, v ≥ c·(t²/log t)·log n for some constant c ≈ 1/8. On the other hand, Erdős et al [11] showed that for any n > 0, there exists an (n, v, t)-CFF with v = O(t² log n) and |Bi| = O(t log n).

Next we consider the anonymity of the above scheme. For a t-subset A = {P_{i1}, ..., P_{it}} of P, we denote

    w(A) = |{(j1, ..., jt) | B_{j1} ∪ ··· ∪ B_{jt} = B_{i1} ∪ ··· ∪ B_{it}}|.

That is, there are w(A) possible t-subsets whose keys are the same as those of A, and thus from the receiver's point of view he is not able to distinguish A from any of these w(A) subsets; the degree of anonymity for A is d(A) = 1 − 1/w(A). Therefore, to increase the anonymity for A we would like w(A) to be as large as possible. Its optimal value is w(A) = C(n, t), in which case the scheme achieves perfect anonymity. Unfortunately, to have perfect anonymity in the CFF approach, the value v needs to be extremely large, as the following theorem shows.

Theorem 3. A (t, n) threshold CFF MAC from an (n, v, t)-CFF provides perfect anonymity if and only if v ≥ C(n, t−1).

Due to the space limit, the proof of the theorem is omitted. Thus, threshold CFF MACs with perfect anonymity are not key efficient: in the worst case, the numbers of keys for each sender and for the receiver are both exponential in n, where n is the number of senders. Consider the example of a (6, 10) threshold CFF MAC: it requires the receiver to store 252 keys and each sender to hold 126 keys of the underlying MAC.

On the other hand, a threshold CFF MAC based on an (n, v, t)-CFF with small v may result in poor anonymity. We give an example to illustrate this. Consider a finite field GF(q), where q is a prime power and q ≥ d + 1. We define a CFF (X, B) as follows. X consists of pairs of elements of GF(q), i.e., X = GF(q) × GF(q) = {(x, y) | x, y ∈ GF(q)}. To each polynomial f of degree at most d, we associate a block B_f = {(x, f(x)) | x ∈ GF(q)}, and let B = {B_f | f a polynomial of degree at most d}. It is easy to see that |B_f| = q. Furthermore, |B| = q^{d+1}, since there are q^{d+1} different polynomials of degree at most d. Now, if f ≠ g, then |B_f ∩ B_g| ≤ d, because h(x) = f(x) − g(x) is a polynomial with at most d solutions of the equation h(x) = 0, i.e., f(x) − g(x) = 0. Now, for all integers t and d, (X, B) is a (q^{d+1}, q², t)-cover-free family provided q ≥ (t − 1)d + 1. Indeed, for any t blocks Bi, B_{i1}, ..., B_{i(t−1)}, we have
    |Bi \ ∪_{j=1}^{t−1} B_{ij}| = |Bi \ ∪_{j=1}^{t−1} (Bi ∩ B_{ij})| ≥ |Bi| − Σ_{j=1}^{t−1} |Bi ∩ B_{ij}| ≥ q − (t − 1)d ≥ 1,

and the claim follows. Note that if q is slightly larger than the above minimal value, i.e., q ≥ td + 1, then the resulting threshold CFF MAC is 0-anonymous. Indeed, it is easy to see that in this case w(A) = 1 for any t-subset A of P, since the union of any t blocks in (X, B) does not cover any other block, and so is unique.
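The polynomial family above is easy to build and check by brute force. The following sketch uses a small prime q (our choice, so that ordinary modular arithmetic realises GF(q)) to construct the (q^{d+1}, q², t)-CFF and verify the cover-free property exhaustively.

```python
# Build the polynomial (q^{d+1}, q^2, t)-CFF and verify it exhaustively.
from itertools import product, combinations

q, d, t = 5, 2, 3                     # q = 5 >= (t-1)d + 1, as required
polys = list(product(range(q), repeat=d + 1))      # coefficient vectors
blocks = [frozenset((x, sum(c * x ** i for i, c in enumerate(f)) % q)
                    for x in range(q)) for f in polys]

def is_cover_free(blocks, t):
    """No block may lie in the union of any other t-1 blocks."""
    for others in combinations(range(len(blocks)), t - 1):
        covered = frozenset().union(*(blocks[j] for j in others))
        for i in range(len(blocks)):
            if i not in others and blocks[i] <= covered:
                return False
    return True

print(len(blocks), "blocks on", q * q, "points; cover-free:",
      is_cover_free(blocks, t))       # 125 blocks on 25 points; True
```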
7 Threshold GCA MAC
We have shown the trade-off between key efficiency and degree of anonymity in a threshold CFF MAC. The challenge is to construct cover-free families that give a high degree of anonymity while the resulting schemes simultaneously remain key efficient at both the sender and receiver ends. In the following, we give solutions that significantly improve the efficiency stated in Theorem 3 while maintaining a reasonable degree of anonymity. Our approach is based on the concept of a generalised cumulative array (GCA for short), first introduced in [14].

Definition 7. Let X1, ..., Xℓ be disjoint subsets of a set X such that X = ∪_{j=1}^{ℓ} Xj. Let B = {Bi | 1 ≤ i ≤ n} be a family of subsets of X. We call (X1, X2, ..., Xℓ; B) an (n, t) generalised cumulative array (GCA) if the following conditions are satisfied:
1. For any t blocks B_{i1}, ..., B_{it} in B, there exists a j such that Xj ⊆ ∪_{s=1}^{t} B_{is}.
2. For any t − 1 blocks B_{i1}, ..., B_{i(t−1)} and for any j, 1 ≤ j ≤ ℓ, Xj ⊄ ∪_{s=1}^{t−1} B_{is}.

If |X1| = ··· = |Xℓ| = α for some integer α, we say (X1, X2, ..., Xℓ; B) is an (n, α, ℓ, t)-GCA.

It is easy to see that a GCA is a CFF. We now slightly modify the previous threshold CFF MAC scheme for the case where the underlying CFF is a GCA.

Threshold GCA MAC. Let (X1, X2, ..., Xℓ; B) be an (n, α, ℓ, t)-GCA and F : {0,1}^κ × {0,1}^L → {0,1}^ℓ be a MAC. We construct a threshold MAC, called a threshold GCA MAC, as follows.
1. KGEN: The receiver randomly chooses a set of ℓα keys from {0,1}^κ, X = {k1, ..., k_{ℓα}}, and partitions X into disjoint subsets X1, ..., Xℓ with |Xi| = α for all i. The receiver then securely gives to sender Pi a subset of keys Bi ⊆ X in such a way that (X1, ..., Xℓ; B) is an (n, α, ℓ, t)-GCA, where B = {B1, ..., Bn}.
2. MAC: Suppose a t-subset of P, A = {P_{i1}, ..., P_{it}}, wants to authenticate a message m. For each index j, 1 ≤ j ≤ ℓ, they determine the set Ij of indices of their keys in Xj, and put J equal to the smallest index j such that {ki | i ∈ Ij} = Xj. Note that since (X1, ..., Xℓ; B) is a GCA, such a J exists. They then compute

    σ = ⊕_{k ∈ X_J} F(k, m),
Threshold MACs
249
and send (m, σ, J) to the receiver. 3. VF: The receiver uses keys from Xj to verify the authenticity of (m, σ) by checking the equality σ = ⊕k∈Xj F (k, m). Example. We give an example to illustrate how a threshold GCA MAC works. Suppose that we want to construct a (2, 8) threshold MAC from a secure MAC F : {0, 1}κ × {0, 1}L → {0, 1}. First, the receiver randomly chooses 6 keys k1 , k1 , k2 , k2 , k3 , k3 ∈ {0, 1}κ and partitions them by X1 = {k1 , k1 }, X2 = {k2 , k2 } and X3 = {k3 , k3 }. The receiver then securely sends key subsets to the 8 senders P1 , . . . , P8 , each Pi receives a subset Bi of keys, as follows. P1 P3 P5 P7
: B1 : B3 : B5 : B7
= {k1 , k2 , k3 }; = {k1 , k2 , k3 }; = {k1 , k2 , k3 }; = {k1 , k2 , k3 };
P2 P4 P6 P8
: B2 : B4 : B6 : B8
= {k1 , k2 , k3 }; = {k1 , k2 , k3 }; = {k1 , k2 , k3 }; = {k1 , k2 , k3 }.
Let B = {B1 , . . . , B8 }. Then it is easy to verify that (X1 , X2 , X3 ; B) is a (8, 2, 3, 2)-GCA. Suppose that P1 and P4 want to authenticate a message m. ⊆ B1 ∪ B4 . Now X2 = {k2 , k2 } ⊆ B1 ∪ B4 and X3 = {k3 , k3 } ⊆ B1 ∪ B4 but X1 So P1 and P4 will use keys from X2 to authenticate m. P1 and P4 compute F (k2 , (m)), σ = F (k2 , m) and send (m, σ, {2}) to the receiver who uses the keys in X2 to recompute the authentication tag σ to verify the validity of the message. Clearly, any single sender will not be able to find two keys from any Xi , so cannot forge the authentication tag for any message. Moreover, since there are 16 pairs of senders, {P1 , P3 }, {P1 , P4 }, {P1 , P7 }, {P1 , P8 }, {P2 , P3 }, {P2 , P4 }, {P2 , P7 }, {P2 , P8 }, {P5 , P3 }, {P5 , P4 }, {P5 , P7 }, {P5 , P8 }, {P6 , P3 }, {P6 , P4 }, {P6 , P7 }, {P6 , P8 } such that the keys from each of these 16 pairs cover X2 as well, it follows that the degree of anonymity for {P1 , P4 } is 1 − 1/16 = 15/16. Theorem 4. Let F be a secure MAC and (X1 , . . . , X ; B) be an (n, α, , t) GCA. Then the construction described above results in a secure (t, n) GCA MAC in which the key lengths of the receiver and each sender Pi are α times and |Bi | times of the key length of the underlying MAC F , respectively. Moreover, the resulting threshold MAC is µ-anonymous, where µ ≥ (1 − / nt ). Proof. The completeness is straightforward. The soundness follows from Lemma 1 and the property of GCA. Moreover, the parameters for the key lengths of sendersand receiver are obvious. We are left to show that the threshold MAC is (1 − / nt )-anonymous. For each Xi , 1 ≤ i ≤ l, let c(Xi ) be the number of possible t blocks in B that cover Xi . Then we have i=1 c(Xi ) ≥ nt , since any t blocks cover at least one Xi by the condition of GCA. Let Γ = {A | A ⊆ P, |A| = t}, we partition Γ into subsets Γ1 , . . . , Γ in the following way
250
Keith M. Martin et al.
Γ1 is the subset Γ whose elements cover X1 , Γ2 is the subset of Γ \ Γ1 whose elements cover X2 , .. . Γi is the subset of Γ \ ∪i−1 j=1 Γj whose elements cover Xi .. . Γ is the subset of Γ \ ∪−1 j=1 Γj whose elements cover X It is clear that for each A ∈ Γi , the degree of anonymity for A is d(A) ≥ 1 − 1/|Γi |, and there are at least |Γi | elements that cover Xi . It follows that the overall degree of the anonymity of the threshold MAC is d(A) i µ = i=1 A∈Γ n ≥
i=1
=
t A∈Γi (1 n t
| i=1 (|Γ ni t
− 1/|Γi |)
− 1)
n − = tn t
= 1 − n , t
hence the result follows.
Theorem 4 shows that there is a trade-off between the degree of anonymity and the efficiency of the key lengths. To have high degree of anonymity, we wish the value to be as small as possible. The optimal value of is 1, in this case the threshold MAC is perfectly anonymous. However, the value α which corresponds to the number of keys for the receiver is nt as shown in Theorem 3. A natural question is: can we increase the value of (i.e decrease the degree of the anonymity) and then hope to reduce α from nt . In [14] it was shown that a GCA can be constructed from a perfect hash family [4]. An existence result for perfect hash families with parameters (; n, t, t) (see [4] gives the following lemma which shows we may take α = t and, for fixed t, = O(log n) Lemma 2. There exists an (n, t, , t)-GCA, (X1 , . . . , X , B), in which |Bi | = for all Bi ∈ B provided ≥ tet log n. Combining Theorem 4 and Lemma 2, we obtain the following result. Theorem 5. If there exists a secure MAC F , then for any fixed t, there exists a secure (t, n) threshold MAC in which both the key lengths of the senders and receiver increase O(log n) times from the key length of F , and it provides (1 − O(log n)/ nt )-anonymity.
Threshold MACs
251
We remark that this result is an existence result but that an efficient construction of a (t, n) threshold MAC results whenever there is an efficient construction of a perfect hash family with parameters (; n, t, t).
8
Conclusion
Threshold cryptosystems allow several parties to jointly perform cryptographic operations. A typical approach to construct a threshold cryptographic scheme is to combine a linear secret sharing scheme and a homomorphic cryptographic primitive. The lack of algebraically homomorphic property of MACs rules out such an approach for threshold MACs. In this paper, we proposed method of threshold MACs using the combinatorial approach. Our method is generic in the sense that it combines any secure conventional MACs with certain combinatorial objects, such as cover-free families. We considered the issues of anonymity in the threshold cryptography and showed that there is a trade-off between the anonymity and efficiency of our proposed threshold MACs.
References [1] M. Bellare, J. Kilian and P. Rogaway, The security of cipher block chaining: message authentication codes, Advances in Cryptology – Crypto’94, LNCs, 839 (1994), 340-358. (Also appeared in Journal of Computer and System Sciences Vol.61, No. 3, 2000, 362–399) 239, 241 [2] M. Bellare, R. Guerin and P. Rogaway, XOR MACs: New methods for message authentication, Advances in Cryptology – Crypto’95, LNCs, 963 (1995), 15-28. [3] J. Black, Message authentication codes, PhD thesis, University of California, Davis, 2000. [4] S. R. Blackburn. Combinatorics and Threshold Cryptography, in Combinatorial Designs and their Applications, Chapman and Hall/CRC Research Notes in Mathematics, 403, F. C. Holroyd, K. A. S. Quinn, C. Rowley and B. S. Web (Eds.), CRC Press, London (1999) 49–70. 250 [5] D. Boneh, G. Durfee and M. Franklin, Lower Bounds for Multicast Message Authentication, Advances in Cryptology–Eurocrypt’01, Lecture Notes in Comput. Sci. [6] C. Boyd, Digital multisignatures, Cryptography and coding (Beker and Piper eds.), Clarendon Press, 1989, 241-246. 238 [7] E. Brickell, G. Di Crescenzo and Y. Frankel, Sharing Block Ciphers, Information Security and Privacy, Lecture Notes in Computer Science, ACISP2000, 2000. 239 [8] Y. Desmedt, Society and group oriented cryptology: a new concept, Advances in Cryptography–CRYPTO ’87, Lecture Notes in Comput. Sci. 293, 1988, 120-127. 238 [9] Y. Desmedt, Y. Frankel and M. Yung, Multi-receiver/Multi-sender network security: efficient authenticated multicast/feedback, IEEE Infocom’92, (1992) 20452054. 239 [10] M. van Dijk, C. Gehrmann and B. Smeets, Unconditionally Secure Group Authentication, Designs, Codes and Cryptography, 14 ( 1998), 281-296. 239
252
Keith M. Martin et al.
[11] P. Erd¨ os, P. Frankl, and Z. Furedi, Families of finite sets in which no set is covered by the union of r others, Israel Journal of Mathematics, 51(1985), 79-89. 246, 247 [12] Y. Frankel, P. MacKenzie and M. Yung, Robust efficient distributed RSA-key generation, in Proc. 30th STOC, 663-672, ACM, 1998. 238 [13] K. Martin and R. Safavi-Naini, Multisender Authentication Schemes with Unconditional Security, Information and Communications Security, LNCS, 1334 (1997), 130-143. 239 [14] K. Martin, R. Safavi-Naini, H. Wang and P. Wild, Distributing the Encryption and Decryption of a Block Cipher. Preprint, 2002. 239, 244, 248, 250 [15] S. Micali and R. Sidney. A Simple Method for Generating and Sharing PseudoRandom Functions, with Applications to Clipper-like Escrow Systems. Advances in Cryptology: CRYPTO ’95, Lecture Notes in Computer Science, 963(1995), 185– 195. 239, 244 [16] A. Shamir, How to Share a Secret, Communications of the ACM, 22 (1976), 612– 613. 245 [17] V. Shoup, Practical Threshold Signature, Advances in Cryptology – Eurocrypt’99, LNCS, 1807(2000), 207-222. 238 [18] G. J. Simmons, W.-A. Jackson and K. Martin. The Geometry of Shared Secret Schemes, Bulletin of the ICA, 1 (1991), 71–88. [19] D. R. Stinson, T. van Trung and R. Wei, Secure frameproof codes, key distribution patterns, group testing algorithms and related structures, J. Statist. Plan. Infer., 86(2000), 595–617. 247 [20] D. S. Stinson, R. Wei and L. Zhu, Some new bounds for cover-free families, Journal of Combinatorial Theory, A, 90(2000), 224-234. 247
Ideal Threshold Schemes from MDS Codes Josef Pieprzyk and Xian-Mo Zhang Centre for Advanced Computing – Algorithms and Cryptography Department of Computing, Macquarie University Sydney, NSW 2109, Australia {josef,xianmo}@ics.mq.edu.au
Abstract. We observe that MDS codes have interesting properties that can be used to construct ideal threshold schemes. These schemes permit the combiner to detect cheating, identify cheaters and recover the correct secret. The construction is later generalised so the resulting secret sharing is resistant against the Tompa-Woll cheating. Keywords: Secret Sharing, Threshold Schemes, Cheating Detection and Identification.
1
Introduction
In this paper we use MDS codes, i.e., maximum distance separable codes, to construct ideal threshold schemes. Based on the properties of MDS codes, in these ideal threshold schemes, cheating can be detected, cheaters can be identified and the correct secret can be recovered. The work is structured as follows. The basic concepts of perfect and ideal secret sharing schemes/threshold schemes are introduced in Section 2, In Section 3, we briefly introduce MDS codes. In Section 4, we use MDS codes to construct ideal threshold schemes. We then propose a general construction of ideal threshold schemes in Section 5. The construction not only provides ideal threshold schemes but also protects secret sharing against the Tompa-Woll cheating. In Section 6, we prove that all these ideal threshold schemes, constructed from MDS codes, have an ability to detect incorrect shares, recover correct shares and identify the cheaters. We illustrate our considerations in Section 7. We compare this work with previous works in Section 8. Conclusions close the work.
2
Access Structures and Threshold Structures
A secret sharing scheme is a method to share a secret among a set of participants P = {P1 , . . . , Pn }. Let K denote the set of secrets and S denote the set of shares. The secret sharing has two algorithms: the distribution algorithm (dealer) and the recovery algorithm (combiner). The dealer assigns shares s1 , . . . , sn ∈ S to all the participants P1 , . . . , Pn respectively. Assume that participants Pj1 , . . . , Pj are active, i.e., they currently have trading, then P.J. Lee and C.H. Lim (Eds.): ICISC 2002, LNCS 2587, pp. 253–263, 2003. c Springer-Verlag Berlin Heidelberg 2003
254
Josef Pieprzyk and Xian-Mo Zhang
they submit their shares sj1 , . . . , sj to the combiner so as to recover a secret while other participants have no trading. Shares sj1 , . . . , sj together can determine a secret K ∈ K if and only if {Pj1 , . . . , Pj } is a qualified subset of P. The collection of all qualified sets is called the access structure Γ . The access structure should be monotone: if A ∈ Γ and A ⊆ B ⊆ P then B ∈ Γ . An access structure Γ = {A | #A ≥ t}, where #X denotes the cardinality of the set X, i.e., the number of elements in the set X and t is an integer with 0 < t ≤ n, is called a (t, n)-threshold access structure. A secret sharing scheme with a (t, n)-threshold access structure is called a (t, n)-threshold scheme. The parameter t is called the threshold. We say that secret sharing based on an access structure Γ is perfect if the following two conditions are satisfied [3]: (1) if A ∈ Γ , then the participants in A can determine a secret, (2) if A
∈ Γ , then the participants in A can determine nothing about a secret (in the information theoretic sense). Alternatively, we say that a (t, n)-threshold scheme is perfect if the following two conditions are satisfied: (1’) if #A ≥ t then the participants in A can determine a secret, (2’) if #A < t then the participants in A can determine nothing about a secret (in the information theoretic sense). It is known [3] that for perfect secret sharing, the size of the shares has to be no smaller than the size of the secrets or #K ≤ #S. In particular, secret sharing is said to be ideal if it is perfect and the size of the shares is equal to the size of the secrets or #K = #S. Thus ideal secret sharing is a special case of perfect secret sharing. Without loss of generality, we can assume that S = K for ideal secret sharing. Threshold schemes were first introduced by Blakley [1] and Shamir [9]. Ito et al. [4] generalised secret sharing for arbitrary monotone access structure.
3
MDS Codes
In this section we introduce MDS codes, that will be used to construct ideal threshold schemes. Let q = pv where p is a prime number and v is a positive integer. We write GF (q) or GF (pv ) to denote the finite field of q = pv elements, and GF (q)n or GF (pv )n to denote the vector space of n tuples of elements from GF (q). Then each vector α ∈ GF (q)n can be expressed as α = (a1 , . . . , an ) where a1 , . . . , an ∈ GF (q). We write HW (α) to denote the Hamming weight of α, i.e., the number of nonzero coordinates of α. The Hamming distance of two vectors α and β in GF (q)n , denoted by dist(α, β), is the Hamming weight of α − β. A set of R vectors in GF (q)n is called an (n, R, d)q code if min{dist(α, β) | α, β ∈ , α
= β} = d. The parameter n is called the length of the code. Each vector in is called a codeword of . In particular, if is a t-dimensional
Ideal Threshold Schemes from MDS Codes
255
subspace of GF (q)n , then the (n, q t , d)q code is called linear and it is denoted by [n, t, d]q . Since an [n, t, d]q code is a subspace of GF (q)n , a linear [n, t, d]q code can be equivalently defined as a t-dimensional subspace of GF (q)n such that min{HW (α) | α ∈ , α
= 0} = d. In this work we focus our attention on linear codes. Let be an [n, t, d]q code. Set ⊥ = {β | β, α = 0 for all α ∈ } where β, α denotes the inner product between two vectors β = (b1 , . . . , bn ) and α = (a1 , . . . , an ), i.e., β, α = b1 a1 + · · · + bn an . The set ⊥ is an (n − t)dimensional linear subspace of GF (q)n and it is called the dual code of . There are two methods to determine a linear code : a generator matrix and a parity check matrix. A generator matrix of a linear code is any t × n matrix G whose rows form a basis for . A generator matrix H of ⊥ is called a parity check matrix of . Clearly, the matrix H is of the size (n − t) × n. Hence α = (a1 , . . . , an ) ∈ if and only if HαT = 0. For any [n, t, d]q code, the following inequality holds and it is known as the Singleton bound [7], [8], [10], t + d ≤ n + 1. In particular, if t + d = n + 1 then the [n, t, d]q code is called maximum distance separable (MDS) [7], [10]. Clearly we can rewrite an [n, t, d]q MDS code as [n, t, n − t + 1]q . MDS codes have interesting properties, that will be used in this work. From [7], [10], we assert the validity of the lemma given below. Lemma 1. Let be an [n, t, d]q code. Then the following statements are equivalent: (i) is an [n, t, n − t + 1]q MDS code, (ii) any t columns of a generator matrix of are linearly independent, (iii) ⊥ is an [n, n − t, t + 1]q MDS code. The following property of MDS codes is known [7], [8], [10]. Lemma 2. Let be an [n, t, n − t + 1]q MDS code. Then n − q + 1 ≤ t ≤ q − 1.
4
Ideal Threshold Schemes from MDS Codes
Construction 1 Let D be a generator matrix of an [n + 1, t, n − t + 2]q MDS code. Thus D is a t × (n + 1) matrix over GF (q) satisfying (ii) of Lemma 1. Set (K, s1 , . . . , sn ) = (r1 , . . . , rt )D
(1)
where each rj ∈ GF (q). For any fixed r1 , . . . , rt ∈ GF (q), K, s1 , . . . , sn can be calculated from (1). We define s1 , . . . , sn to be the shares for participants P1 , . . . , Pn respectively, and define K to be the secret corresponding to the shares s1 , . . . , sn . Lemma 3. The secrets and shares, defined in Construction 1, satisfy Conditions (1’) and (2’) so the resulting secret sharing is a perfect (t, n)-threshold scheme.
256
Josef Pieprzyk and Xian-Mo Zhang
Proof. Index n + 1 columns of D by 0, 1, . . . , n, and write D = [η0 , η1 , . . . , ηn ], where ηj is the jth column of D. Let P1 , . . . , Pn be all the participants and Pj1 , . . . , Pj be all the currently active participants, where 1 ≤ j1 < · · · < j ≤ n. We first verify Condition (1’). Let ≥ t. Assume that the dealer sends shares s1 , . . . , sn to P1 , . . . , Pn respectively, where (s1 , . . . , sn ) is created according to (1). Thus Pj1 , . . . , Pj have their shares sj1 , . . . , sj respectively. Consider a t × submatrix D1 = [ηj1 , . . . , ηj ]. From (1), we get (sj1 , . . . , sj ) = (r1 , . . . , rt )D1
(2)
Recall that D is a generator matrix of an [n + 1, t, n − t + 2]q . Due to the statement (ii) of Lemma 1, when ≥ t, the rank of D1 is t and then according to the properties of linear equations, (r1 , . . . , rt ) is uniquely identified by (sj1 , . . . , sj ). It follows that K is uniquely determined by K = (r1 , . . . , rt )η0 . This proves (1’). We next verify Condition (2’). Let 0 < < t. Consider a t × (1 + ) submatrix D0 = [η0 , ηj1 , . . . , ηj ]. For any arbitrary K, sj1 , . . . , sj ∈ GF (q), consider the system of equations on r1 , . . . , rt : (K, sj1 , . . . , sj ) = (r1 , . . . , rt )D0
(3)
Due to (ii) of Lemma 1, when < t, the rank of D0 is 1 + (≤ t). Thus, using the properties of linear equations, we conclude that (3) has solutions on (r1 , . . . , rt ) and the number of solutions is q t−−1 . This number is independent to the choice of K. Thus the secret K can take any element in GF (q) at an equal probability and thus there is no information on the secret. We then have proved that the scheme satisfies Condition (2’). Summarising Conditions (1’) and (2’), we have proved that the secret and shares, defined in Construction 1, form a perfect (t, n)-threshold scheme. Corollary 1. The secrets and shares, defined in Construction 1, form an ideal (t, n)-threshold scheme. Proof. According to Lemma 3, the (t, n)-threshold scheme, defined in Construction 1, is perfect. Note that each column vector ηj (0 ≤ j ≤ n) of matrix D is nonzero. Thus (r1 , . . . , rt )η0 takes all elements in GF (q) when (r1 , . . . , rt ) takes all vectors in GF (q)t . This implies that K = GF (q). On the other hand, for each j with 1 ≤ j ≤ n, (r1 , . . . , rt )ηj , takes all elements in GF (q) when (r1 , . . . , rt ) takes all vectors in GF (q)t . This implies that S = GF (q). By definition, we know that the scheme is ideal. We now explain how the scheme works. The matrix D is public but (r1 , . . . , rt ) is chosen secretly by the dealer. From (r1 , . . . , rt ), the dealer (distribution algorithm) computes (s1 , . . . , sn ) based on (1). The dealer sends the shares s1 , . . . , sn to participants P1 , . . . , Pn respectively via secure channels. Assume that Pj1 , . . . , Pj are the currently active participants, where 1 ≤ j1 < · · · < j ≤ n. Pj1 , . . . , Pj submit their shares to the combiner (recovery algorithm). The combiner recovers the secret. There are two cases: ≥ t and < t. According to
Ideal Threshold Schemes from MDS Codes
257
Lemma 3 and its proof, if ≥ t, then the combiner can uniquely determine (r1 , . . . , rt ) and then identify the secret K = (r1 , . . . , rt )η0 , while in the case of < t, the secret can be any element in GF (q) with the same probability so the combiner knows nothing about the secret.
5
More General Constructions of Ideal Threshold Schemes
In this section, we generalise Construction 1. Construction 2 Let D be a generator matrix of an [n + 1, t, n − t + 2]q MDS code. Thus D is a t × (n + 1) matrix over GF (q) satisfying (ii) of Lemma 1. Let π0 , π1 , . . . , πn be permutations on GF (q). Set (K, s1 , . . . , sn ) = (r1 , . . . , rt )D
(4)
(K ∗ , s∗1 , . . . , s∗n ) = (π0 (K), π1 (s1 ), . . . , πn (sn ))
(5)
and
where each rj ∈ GF (q). For any fixed r1 , . . . , rt ∈ GF (q), K ∗ , s∗1 , . . . , s∗n can be calculated from (4) and (5). We define s∗1 , . . . , s∗n to be the shares for participants P1 , . . . , Pn respectively, and define K ∗ to be the secret corresponding to the shares s∗1 , . . . , s∗n . Theorem 1. The secrets and shares, defined in Construction 2, form not only a perfect but also an ideal (t, n)-threshold scheme. Proof. Let P1 , . . . , Pn be all the participants and Pj1 , . . . , Pj be all the currently active participants, where 1 ≤ j1 < · · · < j ≤ n. We first verify Condition (1’). Let ≥ t. Assume that the dealer sends the shares s∗1 , . . . , s∗n to P1 , . . . , Pn respectively where (s∗1 , . . . , s∗n ) is created according to (5). Then Pj1 , . . . , Pj have their shares s∗j1 , . . . , s∗j respectively. Clearly, there uniquely exists a (sj1 , . . . , sj ) such that s∗j1 = πj1 (sj1 ), . . . , s∗j = πj (sj ). Due to the same reasoning as in the proof of Lemma 3, (r1 , . . . , rt ) is uniquely identified by (sj1 , . . . , sj ). It follows that K is uniquely determined by (r1 , . . . , rt ). Thus K ∗ = π(K) is uniquely determined. This proves (1’). We next verify Condition (2’). Let 0 < < t. For any arbitrary K ∗ , s∗j1 , . . . , s∗j ∈ GF (q), there uniquely exists a (sj1 , . . . , sj ) such that s∗j1 = πj1 (sj1 ), . . . , s∗j = πj (sj ). Due to the same reasoning as in the proof of lemma 3, for these sj1 , . . . , sj , (3) has solutions on (r1 , . . . , rt ), and the number of solutions is q t−−1 . This number is independent to the choice of K, and thus K can take any element in GF (q) at an equal probability. It follows that K ∗ can take any element in GF (q) at an equal probability, and then there exists no information on the key. We have proved that the scheme satisfies Condition (2’). Summarising Conditions (1’) and (2’), we have proved that the secret and shares, defined in Construction 2, form a perfect (t, n)-threshold scheme. Due to Corollary 1, we know that this scheme is ideal.
258
Josef Pieprzyk and Xian-Mo Zhang
Clearly the schemes in Construction 1 are special schemes in Construction 2 when π0 , π1 , . . . , πn are all the identity permutation on GF (q). We now explain how the scheme works. The matrix D and the n + 1 permutations π0 , π1 , . . . , πn are public but (r1 , . . . , rt ) is chosen secretly by the dealer. From (r1 , . . . , rt ), the dealer (distribution algorithm) computes (s1 , . . . , sn ) based on (4), then (s∗1 , . . . , s∗n ) based on (5). After that, the dealer sends the shares s∗1 , . . . , s∗n to participants P1 , . . . , Pn respectively, via the secure channels. Assume that Pj1 , . . . , Pj are the currently active participants, where 1 ≤ j1 < · · · < j ≤ n, and they wish to recover the secret. They submit their shares to the combiner (recovery algorithm). There are two cases: ≥ t and < t. According to Theorem 1, if ≥ t, then the combiner can uniquely determine (r1 , . . . , rt ) from (4), identify K from (4), and finally identify the secret K ∗ = π0 (K) from (5). In the case when < t, the secret may take any element in GF (q) with uniform probability so the secret cannot be determined. In contrast to Construction 1, Construction 2 not only provides ideal threshold schemes but also improves the schemes in Construction 1. In fact, all the possible share vectors (s1 , . . . , sn ) in a (t, n)-threshold scheme by Construction 1 form a linear subspace of GF (q)n as MDS codes are linear codes. Usually, this is not a desirable property from a point of information security as this case gives a chance to the Tompa-Woll attack [11]. To remove this drawback, we consider schemes in Construction 2. For example, we choose π0 , π1 , . . . , πt−1 to be the identity permutation on GF (q) but we require the permutations πt , . . . , πn on GF (q) to satisfy πt (0)
= 0, . . . , πn (0)
= 0. It is easy to verify that all the possible share vectors (s∗1 , . . . , s∗n ) in the (t, n)-threshold scheme by Construction 2 do not form a linear subspace of GF (q)n , as (s∗1 , . . . , s∗n ) cannot take (0, . . . , 0) ∈ GF (q)n .
6
Cheating Detection and Cheater Identification
In this section, we show that the ideal threshold schemes constructed in Construction 2 have an ability to find whether the shares, submitted by participants to the combiner, are correct, or in other words, the modified shares can be detected. The (t, n)-threshold schemes, defined in Construction 2, have the following property. Theorem 2. Let K ∗ , s∗1 , . . . , s∗n , K, s1 , . . . , sn and r1 , . . . , rt satisfy (4) and (5), ∗ and K ∗ , s∗ 1 , . . . , sn , K , s1 , . . . , sn and r1 , . . . , rt also satisfy (4) and (5). If (r1 , . . . , rt )
= (r1 , . . . , rt ) then the Hamming distance between (K ∗ , s∗1 , . . . , s∗n ) ∗ ∗ and (K , s1 , . . . , s∗ n ) is at least n − t + 2. Proof. Recall that K ∗ = π0 (K), s∗1 = π1 (s1 ), . . ., s∗n = πn (sn ), and K ∗ = ∗ π0 (K ), s∗ 1 = π1 (s1 ), . . ., sn = πn (sn ). Thus we know that K ∗ = K ∗ if and only if K = K ,
(6)
s∗j
(7)
=
s∗ j
if and only if sj =
sj
(j = 1, . . . , n)
Ideal Threshold Schemes from MDS Codes
259
Since (r1 , . . . , rt )
= (r1 , . . . , rt ) and the rank of the matrix D in (4) or (1) is equal to t, we know that (K, s1 , . . . , sn ) and (K , s1 , . . . , sn ) are two distinct codewords of an [n + 1, t, n − t + 2]q MDS code. Thus the Hamming distance between (K, s1 , . . . , sn ) and (K , s1 , . . . , sn ) is at least n − t + 2. On the other hand, according to (6) and (7), we know that the Hamming distance between ∗ (K ∗ , s∗1 , . . . , s∗n ) and (K ∗ , s∗ 1 , . . . , sn ) is equal to the Hamming distance between (K, s1 , . . . , sn ) and (K , s1 , . . . , sn ). This proves the theorem. The following property [10] of codes will be used in this work: Lemma 4. Let be an (n, R, d)q code. For any j with 1 ≤ j ≤ n, the code 0 , obtained by removing the jth coordinate from all codewords of , is a code (n − 1, R, d − 1)q or (n − 1, R, d)q . Given an [n+1, t, n−t+2]q MDS code with a generator matrix D and n+1 permutations π0 , π1 , . . . , πn . According to Theorem 1, we have an ideal threshold scheme defined in Construction 2. Let P1 , . . . , Pn be the participants. We keep using all the notations in Sections 4 and 5. The dealer selects r1 , . . . , rt ∈ GF (q) then computes s1 , . . . , sn ∈ GF (q) by (4), and then s∗1 , . . . , s∗n ∈ GF (q) by (5). The dealer sends the shares s∗1 , . . . , s∗n to P1 , . . . , Pn respectively. Let Pj1 , . . . , Pj be all the currently active participants, where 1 ≤ j1 < · · · < j ≤ n. Consider a t × submatrix D1 consisting of columns of D, indexed by j1 , . . . , j . Set W0 = {(s∗j1 , . . . , s∗j ) = (πj1 (sj1 ), . . . , πj (sj )) | (sj1 , . . . , sj ) = (r1 , . . . , rt )D1 , r1 , . . . , rt ∈ GF (q)}
(8)
According to Theorem 2 and Lemma 4, we state Lemma 5. Any two distinct vectors in W0 , defined in (8), have a Hamming distance at least − t + 1. 6.1
Cheating Detection
Assume that Pj1 , . . . , Pj submit their modified shares s∗j1 + δ1 , . . . , s∗j + δ to the combiner (recovery algorithm) where each δj ∈ GF (q). Thus Pji is honest if δi = 0, otherwise he cheats. We write β = (s∗j1 , . . . , s∗j ), δ = (δ1 , . . . , δ ) and β˜ = β + δ
(9)
Assume that HW (δ1 , . . . , δ ) ≤ − t. Clearly ˜ β) = HW (δ) ≤ − t dist(β,
(10)
260
Josef Pieprzyk and Xian-Mo Zhang
Theorem 3. Given an [n + 1, t, n − t + 2]q MDS code with a generator matrix D and n + 1 permutations π0 , π1 , . . . , πn . According to Theorem 1, we have an ideal (t, n)-threshold scheme defined in Construction 2. Let P1 , . . . , Pn be all the participants and Pj1 , . . . , Pj (t < ≤ n) be all the participants who are currently active. Assume that no more than − t cheaters who submit incorrect ˜ where β˜ has been defined in (9), is correct if and only if β˜ ∈ W0 , shares. Then β, where W0 has been defined in (8), or in other words, the combiner can find that β˜ is correct or incorrect according to β˜ ∈ W0 or β˜
∈ W0 . Proof. Assume that β˜ is correct, or in other words, δ = (δ1 , . . . , δk ) = (0, . . . , 0) where δ has been defined in (9). Thus β˜ is identical with the β. In this case β˜ = β ∈ W0 . Conversely, assume that β˜ ∈ W0 . We now prove by contradiction that β˜ = β. Assume that β˜
= β. According to Lemma 5, β˜ and β have a Hamming distance at least − t + 1. This contradicts (10). The contradiction proves that β˜ must be identical with β and thus β˜ = β is correct. Thus we have proved that β˜ is correct if and only if β˜ ∈ W0 . 6.2
Cheater Identification
In Section 6.1 the combiner can detect incorrect shares sent by participants, however there is no guarantee that it can identify the cheaters or reconstruct the correct shares (and the secret). In this section we consider how to identify the cheaters and how to recover the correct shares. We keep using all the assumptions and the notations in Section 6.1. We additionally suppose that δ = (δ1 , . . . , δ ) satisfies 1 0 < HW (δ) ≤ ( − t) 2
(11)
where r denotes the maximum integer no larger than r. Due to (11) and Theorem 3, the combiner knows that β˜ is incorrect by the fact β˜
∈ W0 . The combiner further determines a vector γ0 ∈ W0 such that ˜ γ0 ) = min{dist(β, ˜ γ) | γ ∈ W0 } dist(β,
(12)
We now prove by contradiction that γ0 is identical with β. Assume that γ0 =
β. Since γ0 , β ∈ W0 , due to Lemma 5, we know that dist(γ0 , β) ≥ − t + 1
(13)
˜ β) = HW (δ) ≤ 1 ( − t), we have Recall that dist(β, 2 ˜ γ) | γ ∈ W0 } ≤ dist(β, ˜ β) ≤ 1 ( − t) ˜ γ0 ) = min{dist(β, (14) dist(β, 2 ˜ +dist(β, ˜ β). Thus dist(γ0 , β) ≤ dist(γ0 , β)+ ˜ Clearly dist(γ0 , β) ≤ dist(γ0 , β) HW (δ). Due to (14), we have 1 1 dist(γ0 , β) ≤ ( − t) + ( − t) ≤ − t < − t + 1 2 2
(15)
Ideal Threshold Schemes from MDS Codes
261
Obviously, (15) contradicts (13). The contradiction disproves the assumption that γ0
= β. Therefore γ0 and β must be identical. After knowing γ0 , i.e., β, the combiner can identify the δ as he has received the vector of β˜ = β + δ. So we can formulate the following theorem. Theorem 4. Given an [n + 1, t, n − t + 2]q MDS code with a generator matrix D and n + 1 permutations π0 , π1 , . . . , πn . According to Theorem 1, we have an ideal (t, n)-threshold scheme defined in Construction 2. Let P1 , . . . , Pn be all the participants and Pj1 , . . . , Pj (t < ≤ n) be all the participants who are currently active. If the number of cheaters is less than or equal to 12 ( − t) then the combiner can identify the cheaters who submitted incorrect shares also recover the correct shares by determining the vector γ0 ∈ W0 where W0 has been defined in (8) and γ0 satisfies (12). Summarising Theorems 3 and 4, the combiner first checks whether the share ˜ that he received from the active participants, is correct. If β˜ is incorrect, vector β, the combiner further determines who are cheaters and reconstructs the correct shares. We notice that both Theorems 3 and 4 require the parameter to be greater than t.
7
Examples
Example 1. There exists an MDS code [18, 9, 10]25, that is also a quadratic residue code (Chapter 4 of [8]). Let D denote a general matrix of this code. For any permutations π0 , π1 , . . . , π17 on GF (25), according to Theorem 1, we can construct an ideal (9, 17)-threshold scheme over GF (25) in Construction 2. Let (9 < ≤ 17) denote the number of currently active participants. Due to Theorems 3 and 4, this scheme has the ability to detect cheating and identify cheaters. More precisely, if there are no more than − 9 participants who submit incorrect shares then the incorrect shares can be detected. Furthermore, if there are no more than 12 ( − 9) participants submitting incorrect shares then all the cheaters can be identified and the correct shares can be recovered. Example 2. Let GF (q) = {0, λ1 , . . . , λq−1 } and t be an integer with 2 ≤ t ≤ q−1. Set 1 1 ··· 1 1 0 λ1 λ2 · · · λq−1 0 0 .. .. .. .. .. .. . . . . . . E= (16) λ21 λ22 · · · λ2q−1 0 0 .. .. .. .. .. .. . . . . . . t−1 t−1 t−1 λ1 λ2 · · · λq−1 0 1 From [7], [10], E is a generator matrix of a [q + 1, t, q − t + 2]q MDS code. For any permutations π0 , π1 , . . . , πq on GF (q), according to Theorem 1, we can construct an ideal (t, q)-threshold scheme over GF (q) in Construction 2. Let
262
Josef Pieprzyk and Xian-Mo Zhang
(t < ≤ n) denote the number of currently active participants. Due to Theorems 3 and 4, this scheme has the ability to detect cheating and identify cheaters. More precisely, if there are no more than − t participants who submit incorrect shares then the incorrect shares can be detected. Furthermore, if there are no more than 12 ( − t) participants submitting incorrect shares then all the cheaters can be identified and the correct shares can be recovered.
8
Comparing This Work with Previous Results
Comparing Shamir scheme [9] with the ideal threshold scheme in Example 2, we can find: (a) k in Shamir scheme is corresponding to t in Example 2, (b) the coefficients a0 , a1 , . . . , ak−1 of the polynomial q(x) in Shamir scheme are corresponding to r1 , . . . , rt in Example 2 respectively, (c) the shares D1 = q(1), . . . , Dn = q(n) in Shamir scheme are corresponding to s1 , . . . , sn in Example 2 respectively, (d) if we remove the last two columns of E in Example 2 and change the entries of E, then we obtain 1 1 ··· 1 1 2 ··· n (17) .. .. .. .. . . . . 1 2t−1 · · · nt−1
where the entries are elements in the residue modulo class of prime p (t ≤ n ≤ p−1), then we regain Shamir scheme. This shows that the Lagrange interpolation suggested in [9] can be re-obtained from Example 2. McEliece and Sarwate [6] generalised Shamir’s construction as they allowed the elements in the Lagrange interpolation to be from a finite field, instead of only elements in a prime filed. They also indicated that the share vectors form Reed-Solomon codes and then their schemes can correct modified shares. As known, Reed-Solomon codes are special MDS codes and MDS codes are not necessarily Reed-Solomon codes. Thus Constructions 1 and 2 are more general. Karnin, Greene and Hellman obtained a similar result (Theorem 2 of [5]) to Construction 1. There is, however, a basic difference between this work and their work. The difference is in the definitions of (t, n) threshold schemes. In our definition, we allow t or more participants to collaborate in recovery of the secret. In fact, the cheating detection relies on the existence of redundant shares so they can be used to identify incorrect ones (then identify cheaters) and to recover the correct secret. Karnin et al considered threshold schemes in which the number of active participants is precisely equal to t. However, as mentioned in Theorem 6 of [5], cheating detection is impossible in this case. Summarising the above discussions, the above previous schemes are all special cases in Construction 1. However Construction 1 is a special case of Construction 2. In addition, according to Theorem 1, we are sure that all the threshold schemes in Constructions 1 and 2 are ideal. However this property was not mentioned in the above papers.
Ideal Threshold Schemes from MDS Codes
9
263
Conclusions
Using interesting properties of MDS codes, we have constructed ideal threshold schemes and indicated that incorrect shares can be detected and the cheaters can be identified, furthermore the correct secret can be recovered. We have further suggested a general construction that not only provides more ideal threshold schemes but also prevents Tompa-Woll attack.
Acknowledgement The work was partially supported by Australian Research Council grant A00103078.
References [1] G. R. Blakley. Safeguarding cryptographic keys. In Proc. AFIPS 1979 National Computer Conference, pages 313–317. AFIPS, 1979. 254 E. F. Brickell and D. R. Stinson. [2] E. F. Brickell and D. M. Davenport. On the Classification of Ideal Secret Sharing Schemes. J. Cryptology, 4: 123 - 134, 1991. [3] E. F. Brickell and D. R. Stinson. Some Improved Bounds on Information Rate of Perfect Sharing Schemes J. Cryptology, 5: 153 - 166, 1992. 254 [4] M. Ito, A. Saito, and T. Nishizeki. Secret sharing scheme realizing general access structure. In Proceedings IEEE Globecom ’87, pages 99–102. IEEE, 1987. 254 [5] E. D. Karnin, J. W. Greene, and M. E. Hellman. On secret sharing systems. IEEE Transactions on Information Theory, IT-29:35–41, 1983. 262 [6] R. J. McEliece and D. V. Sarwate. On Sharing Secrets and Reed-Solomon Codes. Communications of the ACM, Vol. 24, 1981, pp 583-584. 262 [7] F. J. MacWilliams and N. J. A. Sloane. The theory of error-correcting codes. North-Holland, Amsterdam, Seventh Impression 1992. 255, 261 [8] V. C. Pless and W. C. Huffman, Editors. Handbook of Coding Theory, Elsevier Science B. V., 1998. 255, 261 [9] A. Shamir. How to share a secret. Communications of the ACM, 22:612–613, November 1979. 254, 262 [10] S. Roman. Coding and Information Theory. Springer-Verlag, Berlin, Heidelberg, New York, 1992. 255, 259, 261 [11] M. Tompa and H. Woll. How to share a secret with cheaters. Journal of Cryptology, 1(2):133–138, 1988. 258
New Frobenius Expansions for Elliptic Curves with Efficient Endomorphisms Tae-Jun Park, Mun-Kyu Lee, and Kunsoo Park School of Computer Science and Engineering, Seoul National University Seoul, 151-742, Korea {tjpark,mklee,kpark}@theory.snu.ac.kr
Abstract. The Frobenius expansion is a method to speed up scalar multiplication on elliptic curves. Nigel Smart gave a Frobenius expansion method for elliptic curves defined over odd prime fields. Gallant, Lambert and Vanstone suggested that efficiently computable endomorphisms other than Frobenius endomorphisms can be used for fast scalar multiplication. In this paper we show that these two kinds of endomorphisms can be used together for a certain class of curves, and we present a new expansion method for elliptic curves over odd prime fields. Our experimental results show that the throughputs of the known scalar multiplication algorithms are improved by 7.6 ∼ 17.3% using the new expansion method. Keywords: Elliptic Curve Cryptosystem, Scalar Multiplication, Frobenius Expansion, Endomorphism.
1
Introduction
The use of elliptic curves in cryptography was first suggested by Koblitz [8] and Miller [14], and an extensive research on elliptic curve cryptosystems has been done in recent years. The most time-consuming operation in a cryptographic protocol such as ECDSA [22] is a scalar multiplication of an elliptic curve point. To speed up scalar multiplication, various methods that use special curves have been studied. The use of anomalous elliptic curves was suggested in [9] and [12]. M¨ uller [17] and Cheon et al. [3] extended these ideas to give the Frobenius expansion over small fields of characteristic two, and Solinas [20, 21] combined the nonadjacent form (NAF) [16] and the Frobenius expansion for faster computation. Smart [19] generalized M¨ uller’s result to elliptic curves over small odd prime fields. Kobayashi et al. [7, 6] proposed an efficient scalar multiplication algorithm on elliptic curves over optimal extension fields [1, 2] combining Frobenius map and table reference. Lim and Hwang [10] proposed to use the LL algorithm [11] in conjunction with the Frobenius expansion. Gallant, Lambert and Vanstone [4] suggested that efficiently computable endomorphisms other than Frobenius maps can be used for fast multiplication.
This work was supported by the Brain Korea 21 Project and the MOST grant M60203-00-0039.
P.J. Lee and C.H. Lim (Eds.): ICISC 2002, LNCS 2587, pp. 264–282, 2003. c Springer-Verlag Berlin Heidelberg 2003
New Frobenius Expansions for Elliptic Curves with Efficient Endomorphisms
265
In this paper we propose a new Frobenius expansion method for elliptic curves defined over odd prime fields. To compute mP for an integer m and a point P , we expand the integer m by the Frobenius endomorphism ϕ: m=
k
ri ϕi ,
i=0
where the coefficients ri are of the form ri = ri1 + ri2 λ or ri = ri1 + ri2 ρ (ri1 , ri2 ∈ Z), and λ, ρ are efficiently computable endomorphisms used in [4]. Our method can be used to improve the known scalar multiplication algorithms that use Frobenius expansion, such as Kobayashi et al.’s algorithm [7, 6] and Lim and Hwang’s algorithm [10]. When our method is applied to these algorithms, the number of point doublings in a scalar multiplication is reduced to about a half. Our experimental results show that the overall throughputs of scalar multiplications are increased by 7.6 ∼ 17.3% compared to those of the original algorithms, when these algorithms are implemented over several optimal extension fields [1, 2].
2 2.1
Preliminaries Frobenius Expansion
Let E be an elliptic curve defined over the finite field Fq . An endomorphism φ on E(Fqn ) is a homomorphism φ : E −→ E, i.e., φ(P + Q) = φ(P ) + φ(Q) for all P , Q ∈ E(Fqn ). Well-known endomorphisms on E(Fqn ) are the multiplication-by-m map m [m] : P −→ P + P + · · · + P and the Frobenius map ϕ : (x, y) −→ (xq , y q )
and
O −→ O.
The Frobenius map ϕ satisfies the minimal polynomial ϕ2 − τ ϕ + q = 0, √ where τ is the trace of ϕ and |τ | ≤ 2 q. It is well known that #E(Fq ) = q + 1 − τ [13]. Let End(E) denote the set of all endomorphisms over E. The set End(E) is a ring with two binary operations (+, ◦), where the multiplication is given by composition: (φ + ψ)(P ) = φ(P ) + ψ(P ) ψ ◦ φ(P ) = ψ(φ(P )). If E is non-supersingular, End(E) is an order in a quadratic imaginary field Q( τ 2 − 4q) [18].
266
Tae-Jun Park et al.
Smart showed that the multiplication-by-m map on E(Fqn ) can be expanded in terms of a polynomial in ϕ if q is odd [19]: m = a0 + a1 ϕ + · · · + ak ϕk , where ai ∈ {−(q + 1)/2, . . . , (q + 1)/2}. Thus we can compute mP as follows: mP =
k
ai ϕi (P )
i=0
= ϕ(· · · ϕ(ϕ(ϕ(ak P ) + ak−1 P ) + ak−2 P ) · · · + a1 P ) + a0 P. 2.2
Other Efficient Endomorphisms
Gallant, Lambert and Vanstone introduced a new scalar multiplication method that uses the following endomorphisms [4]. Example 1. Let p ≡ 1 (mod 4) be a prime. Consider the elliptic curve E1 : y 2 = x3 + ax
(1)
defined over Fp . Let α ∈ Fp be an element of order 4. Then we get an efficiently computable map on E1 (Fpn ) λ : (x, y) −→ (−x, αy)
and
O −→ O.
(2)
If P ∈ E1 (Fpn ) is a point of prime order N , then λ acts on P as a multiplication map, i.e., λ(Q) = lQ for all Q ∈ P , where l is an integer satisfying l2 ≡ −1 (mod N ). Note that λ(Q) can be computed using only n multiplications on Fp . Example 2. Let p ≡ 1 (mod 3) be a prime. Consider the elliptic curve E2 : y 2 = x3 + b
(3)
defined over Fp . Let β ∈ Fp be an element of order 3. Then we get an efficiently computable map on E2 (Fpn ) ρ : (x, y) −→ (βx, y)
and
O −→ O.
(4)
If P ∈ E2 (Fpn ) is a point of prime order N , then ρ acts on P as a multiplication map, i.e., ρ(Q) = kQ for all Q ∈ P , where k is an integer satisfying k 2 +k ≡ −1 (mod N ). Note that ρ(Q) can be computed using only n multiplications on Fp . Note that endomorphism λ is equivalent to multiplication by a root of unity of order 4, i.e., λ = ±i, and that√ρ is equivalent to multiplication by a root of unity of order 3, i.e., ρ = (−1 ± 3i)/2.
New Frobenius Expansions for Elliptic Curves with Efficient Endomorphisms
3 3.1
267
New Method for Frobenius Expansions Fourth Roots of Unity
In this section we show that if a prime p satisfies p ≡ 1 (mod 4), then the coefficients of a Frobenius expansion can be represented using 4th roots of unity λ = ±i. We begin by proving that for the curve in (1), ϕ ∈ Z[λ] and division by ϕ in Z[λ] is well defined. Lemma 1. Let p ≡ 1 (mod 4). On curve E1 in (1), Frobenius map ϕ satisfies ϕ ∈ Z[λ] for λ in (2). Proof. Without loss of generality, let λ = i. Let Q(i) = {u + vi | u, v ∈ Q}. Since Q(i) is a quadratic imaginary field, α ∈ Q(i) is called an algebraic integer if it satisfies the monic quadratic equation x2 + a x + b = 0 (a, b ∈ Z). It is well-known that the set of all algebraic integers in Q(i) is Z[i] [5]. over a finite field, Since E1 is a nonsupersingular elliptic curve defined √ ) is an order in a quadratic imaginary field Q( m) (i.e., End(E1 ) ⊗ Q ∼ End(E = 1 √ Q( m)), where m is a square-free rational integer and m < 0 [18]. Since √ λ ∈ End(E1 ) and λ = i, we get i ∈ Q( m). Hence m = −1. Since Frobenius map ϕ is in End(E1 ) and satisfies the minimal equation ϕ2 − τ ϕ + p = 0, ϕ is also an algebraic integer. Therefore, ϕ ∈ Z[i]. ✷ Lemma 2. Let p ≡ 1(mod 4) and s ∈ Z[λ]. There exist r, t ∈ Z[λ] such that s = tϕ + r and r ≤ p2 , where || · || = NZ[λ]/Z (·). Proof. By Lemma 1, ϕ can be written as a + bλ for a, b ∈ Z. By the minimal equation ϕ2 − τ ϕ + p = 0, we get ¯ = a2 + b 2 , p = ϕ · ϕ¯ = (a + bλ) · (a + bλ) (5) τ = ϕ + ϕ¯ = 2a, (6) since λ = ±i. Hence, a = τ2 and b = ± p − a2 . (We can determine which of 2 ϕ = τ2 ± p − τ4 λ holds by applying them to some points on the curve.) Let s = s1 + s2 λ for s1 , s2 ∈ Z. Then there exists a quotient x = x1 + x2 λ (x1 , x2 ∈ Q) such thats = ϕ · x, i.e., s1 + s2 λ = (a + bλ) · (x1 + x2 λ). If we s1 , we get represent s1 + s2 λ as s2
a −b x1 s1 = s2 x2 b a and
x1 x2
1 = p
a b −b a
by (5). To find a quotient in Z[λ], set
x1 , t= x2
s1 s2
(7)
(8)
268
Tae-Jun Park et al.
where z means the nearest integer to z. Then
s1 a −b x1 r = s − tϕ = . − x2 s2 b a
(9)
Without loss of generality, let λ = i. See Fig. 1. Since s, t, ϕ ∈ Z[λ], s and tϕ are in the two-dimensional integer lattice L1 generated by 1 and λ. Note that tϕ is also in the integer lattice L2 generated by ϕ and λϕ, but s is not. Thus, computing r by (7), (8) and (9) is equivalent to finding a point in L2 nearest to s. It is easy to see that r ≤ p2 in the figure. ✷ It can be shown that the number of possible r’s in Lemma 2 is exactly p = a2 + b2 . The following lemma gives a precise set of possible r’s for the special case that p = 5. Lemma 3. Let p = 5 and s ∈ Z[λ]. There exist r ∈ {0, 1, −1, λ, −λ} and t ∈ Z[λ] such that s = tϕ + r. Proof. We first decide the relation between ϕ and λ. For each curve E1 defined by the value of a in (1), we obtain the relations as shown in Table 1. (Note that in some cases there are two valid relations.) Now we find possible values of r. We consider only the case of ϕ = 1 − 2λ, since the other cases are similar. As in the proof of Lemma 2, define x = x1 + x2 λ (x1 , x2 ∈ Q) such that s = ϕ · x, and let t = x1 + x2 λ. Then it is easy to see that r = s − tϕ satisfies r ∈ {0, 1, −1, λ, −λ} as shown in Fig. 2. ✷ We are now ready for the main part of this section. The following theorem shows that a 4th root of unity λ can be used for a Frobenius expansion and that the length of the expansion is reasonably bounded.
0
1
Fig. 1. Computing t and r given s: the case of λ = i
New Frobenius Expansions for Elliptic Curves with Efficient Endomorphisms
269
Table 1. Relation between ϕ and λ for p = 5 curve minimal polynomial relation y 2 = x3 + x ϕ2 − 2ϕ + 5 = 0 ϕ = 1 − 2λ or ϕ = 1 + 2λ ϕ = 2 − λ or ϕ = 2 + λ y 2 = x3 + 2x ϕ2 − 4ϕ + 5 = 0 ϕ = −2 + λ y 2 = x3 + 3x ϕ2 + 4ϕ + 5 = 0 y 2 = x3 + 4x ϕ2 + 2ϕ + 5 = 0 ϕ = −1 − 2λ or ϕ = −1 + 2λ
Theorem 1. Let p ≡ 1 (mod 4) and s ∈ Z[λ]. Then we can write s= where ri ∈ Z[λ], ||ri || ≤
p 2
k
ri ϕi ,
(10)
i=0
and k ≤ 2 logp ||s||.
Proof. There are two cases to consider, i.e., p ≥ 13 and p < 13. We first give a proof for the case p ≥ 13. By Lemma 2, we can obtain an expansion of the form j
i ri ϕ + sj+1 ϕj+1 (11) s = s0 = s1 ϕ + r0 = (s2 ϕ + r1 )ϕ + r0 = · · · = with ||ri || ≤
p
2.
i=0
Using the triangular inequality, we get ||sj+1 || ≤
||sj || + ||rj || ||ϕ||
0
1
1
Fig. 2. Computing t and r given s: the case of ϕ = 1 − 2λ, λ = i
270
Tae-Jun Park et al.
||sj || + ≤ √ p .. .
p 2
sj 1 = √ +√ p 2 j
||s0 || 1 ≤ √ j+1 + √ 2 i=0 p
1 √ p
i
√ p ||s0 || 1 . ≤ √ j+1 + √ · √ p−1 2 p
(12)
Now if j ≥ 2 logp ||s0 || − 1, then ||s0 || √ j+1 ≤ 1. p
(13)
Since p ≥ 13, we see
√ p 1 p < . 1+ √ · √ p − 1 2 2 By (12), (13) and (14), we get p . ||sj+1 || < 2
(14)
Setting sj+1 = rj+1 in (11), we get the expansion (10) with k ≤ 2 logp ||s||. Next we consider the case p < 13. Note that the only prime p such that p ≡ 1 (mod 4) and p < 13 is p = 5. By Lemma 3, we obtain an expansion (11) with ri ∈ {0, 1, −1, λ, −λ}. Using the triangular inequality, we get ||sj+1 || ≤
||sj || + 1 ||sj || + ||rj || 1 ||s0 || √ ≤ ≤ √ j+1 + √ . ||ϕ|| 5 5 −1 5
If j ≥ 2 log5 ||s0 || − 1, then 1 < 2. sj+1 ≤ 1 + √ 5−1 Since sj+1 ∈ Z[λ], sj+1 is in {0, 1, −1, λ, −λ, 1 + λ, 1 − λ, −1 + λ, −1 − λ}. Hence, we obtain √ sj+1 ≤ 2. 5 Setting sj+1 = rj+1 in (11), we get the expansion (10) with ||ri || ≤ 2 and k ≤ 2 log5 ||s||. ✷ Now we give an example. Let P be an F5n -rational point on y 2 = x3 + x defined over F5 . We can expand 17 as follows: 17 = (3 + 7λ)ϕ − λ = ((−2 + 3λ)ϕ − 1)ϕ − λ = ((−2ϕ − λ)ϕ − 1)ϕ − λ = (((−λϕ + λ)ϕ − λ)ϕ − 1)ϕ − λ.
New Frobenius Expansions for Elliptic Curves with Efficient Endomorphisms
271
Hence we compute 17P as follows: 17P = ϕ(ϕ(ϕ(ϕ(−λ(P )) + λ(P )) − λ(P )) − P ) − λ(P ). 3.2
Third Roots of Unity
In this section we show that if a prime p satisfies p ≡ 1 (mod 3), then the coefficients√of a Frobenius expansion can be represented using 3rd roots of unity ρ = (−1 ± 3i)/2. (All the proofs appear in the Appendix since they are similar to those of the previous section.) First, we show that for the curve in (3), ϕ ∈ Z[ρ] and division by ϕ in Z[ρ] is well defined. Lemma 4. Let p ≡ 1 (mod 3). On curve E2 in (3), Frobenius map ϕ satisfies ϕ ∈ Z[ρ] for ρ in (4). Lemma 5. Let p ≡ 1√(mod 3) and s ∈ Z[ρ]. There exist r, t ∈ Z[ρ] such that s = tϕ + r and r ≤ 23p , where || · || = NZ[ρ]/Z (·). It can be shown that the number of possible r’s in Lemma 5 is p. The following lemma gives a precise set of possible r’s for the special case that p = 7. Lemma 6. Let p = 7 and s ∈ Z[ρ]. There exist r ∈ {0, 1, −1, ρ, −ρ, ρ2, −ρ2 } and t ∈ Z[ρ] such that s = tϕ + r. The following theorem shows that a 3rd root of unity ρ can be used for a Frobenius expansion. Theorem 2. Let p ≡ 1 (mod 3) and s ∈ Z[ρ]. Then we can write s=
k
ri ϕi ,
(15)
i=0
where ri ∈ Z[ρ], ri ≤
4
√
3p 2
and k ≤ 2 logp ||s||.
Algorithms
In this section, we present practical algorithms that perform scalar multiplication using our new expansion method. First, we explain two well-known algorithms that use the Frobenius map over Fpn , i.e., the Kobayashi-Morita-KobayashiHoshino algorithm [7, 6] and the LL algorithm with Frobenius expansion [10], which we call hereafter KMKH and FLL, respectively. Then we show how these algorithms can be adapted to use our new expansion method. (Note that Gallant et al.’s algorithm [4], which uses efficient endomorphisms over prime fields Fp , can also be modified to be applicable to Fpn . According to our analysis, however, it does not seem to be as efficient as Frobenius expansion methods in the case of Fpn .)
272
4.1
Tae-Jun Park et al.
The Original Algorithms
The first algorithm that we describe is KMKH [7, 6]. The first and second steps of this algorithm deal with the Frobenius expansion of m and its optimization. In the second step, one reduces the length of the expansion using ϕn (P ) = P . The expansion length 2 logp 2m + 3 [19] is reduced to n, i.e., to about a half if m ≈ pn . The same technique will be applied to our expansion method. Note that we eliminated Step 2-2 in the original algorithm (a step for Hamming weight optimization), since it almost does not affect the overall performance. From now on, subscripts are used to denote array indices, and superscripts with parentheses are used to denote bit positions, where the least significant bit is regarded as the 0-th bit. Algorithm KMKH Input: m, P Output: Q = mP Step 1: Frobenius expansion of m Step 1-1: i ← 0, x ← m, y ← 0, uj ← 0 for 0 ≤ j < 3n. Step 1-2: if (x = 0 and y = 0) then go to Step 2. Step 1-3: ui ← x mod p. Step 1-4: v ← (x − ui )/p, x ← τ v + y, y ← −v, i ← i + 1. Step 1-5: go to Step 1-2. Step 2: optimization of the Frobenius expansion using ϕn (P ) = P Step 2-1: ri ← ui + ui+n + ui+2n for 0 ≤ i < n. Step 3: scalar multiplication Step 3-1: Pi ← ϕi (P ) for 0 ≤ i < n. Step 3-2: Q ← O, j ← log2 p + 1. Step 3-3: Q ← 2Q. Step 3-4: for i = 0 to n − 1 do (j) if (ri = 1) then Q ← Q + Pi . Step 3-5: j ← j − 1. Step 3-6: if (j ≥ 0) then go to Step 3-3.
rn−1
...
rn−2
r1
r0
a columns ra−1 r2a−1
... ...
...
r0 ra
...
...
... rha−1
r1 ra+1
h rows
r(h−1)a+1 r(h−1)a
Fig. 3. Partition of coefficients ri ’s into h groups (a = n/h)
New Frobenius Expansions for Elliptic Curves with Efficient Endomorphisms
273
a =2 r1 r3 r5
r0 r2 r4 r6
h=4
Fig. 4. Example partition for n = 7, h = 4 Now, we illustrate another well-known algorithm, i.e., the FLL algorithm [10] which is a Frobenius expansion version of the LL algorithm [11]. In this algo i rithm, coefficients ri ’s in m = n−1 i=0 ri ϕ are partitioned into h groups as in Fig.3, where h is a parameter of one’s own choice. (There is another parameter v in the original LL algorithm. For FLL, however, we use only v = 1, since the other values of v are always less efficient than v = 1 in practical settings.) Then there are a = n/h columns, and the leftmost columns of the last row can be empty. For example, the case of n = 7, h = 4 is as Fig.4. In the (on-line) precomputation stage, one computes and stores point h−1 ia P P(eh−1 ,eh−2 ,...,e1 ,e0 ) = i=0 ei ϕ (P ) for each possible combination of ei ’s, where ei ∈ {0, 1}. For the example of Fig. 4, one computes the following values: P P(0,0,0,0) = O, P P(0,0,0,1) = P, P P(0,0,1,0) = ϕ2 (P ), .. . P P(1,1,1,1) = ϕ6 (P ) + ϕ4 (P ) + ϕ2 (P ) + P. These precomputed values are used to deal with h bits of the coefficients in parallel for scalar multiplication. The complete algorithm is as follows: Algorithm FLL Input: m, P Output: Q = mP Step 1: Frobenius expansion of m (the same as that of KMKH) Step 2: optimization of the Frobenius expansion (the same that of as KMKH) Step 3: scalar multiplication Step 3-1: compute P P(eh−1 ,...,e0 ) for each possible combination of ei ’s. Step 3-2: Q ← O, j ← log2 p + 1. Step 3-3: Q ← 2Q. (j) (j) (j) (j) Step 3-4: R ← P PIa−1,j , where Iij = (r(h−1)a+i , r(h−2)a+i , . . . , ra+i , ri ). Step 3-5: for i = a − 2 to 0 do R ← ϕ(R). R ← R + P PIij . Step 3-6: Q ← Q + R. Step 3-7: j ← j − 1. Step 3-8: if (j ≥ 0) then go to Step 3-3.
274
Tae-Jun Park et al.
For n = 7, h = 4, Steps 3-4 and 3-5 can be simplified as follows: Step 3-4: R ← P P(0,r(j) ,r(j) ,r(j) ) . 5 3 1 Step 3-5: R ← ϕ(R). R ← R + P P(r(j) ,r(j) ,r(j) ,r(j) ) . 6
4.2
4
2
0
The Improved Algorithms
Algorithms KMKH and FLL can be modified to use our new expansion method. We can construct 4 algorithms, i.e., λKMKH, ρKMKH, λFLL and ρFLL. First we present the λKMKH algorithm that is a modified KMKH so that endomorphism λ is used as well as the Frobenius map. Note that in Step 3, we have to check the signs of ri1 ’s and ri2 ’s, since they can have negative values. Algorithm λKMKH Input: m, P Output: Q = mP Step 1: expansion of m using the Frobenius map and endomorphism λ (See equations (7), (8) and (9) in Lemma 2.) Step 1-1: i ← 0, s1 ← m, s2 ← 0, uj,1 ← 0, uj,2 ← 0 for 0 ≤ j < 3n. Step 1-2:
if (s1= 0 and
s2 = 0) thengo to Step 2. 1 a b x1 s1 Step 1-3: ← , where ϕ = a + bλ. x −b a p 2
s2
ui,1 s1 a −b x1 Step 1-4: . ← − u s b a x2
2
i,2 x1 s1 . ← Step 1-5: x2 s2 Step 1-6: i ← i + 1. Step 1-7: go to Step 1-2. Step 2: optimization of the expansion using ϕn (P ) = P Step 2-1: ri1 ← ui,1 + ui+n,1 + ui+2n,1 for 0 ≤ i < n. ri2 ← ui,2 + ui+n,2 + ui+2n,2 for 0 ≤ i < n. Step 3: scalar multiplication Step 3-1: Pi1 ← ϕi (P ), Pi2 ← λ(ϕi (P )) for 0 ≤ i < n. Step 3-2: Q ← O, j ← (log2 p)/2 + 1. Step 3-3: Q ← 2Q. Step 3-4: for i = 0 to n − 1 do (j) if (ri1 > 0 and ri1 = 1) then Q ← Q + Pi1 . else if (ri1 < 0 and (−ri1 )(j) = 1) then Q ← Q − Pi1 . (j) if (ri2 > 0 and ri2 = 1) then Q ← Q + Pi2 . else if (ri2 < 0 and (−ri2 )(j) = 1) then Q ← Q − Pi2 . Step 3-5: j ← j − 1. Step 3-6: if (j ≥ 0) then go to Step 3-3.
New Frobenius Expansions for Elliptic Curves with Efficient Endomorphisms
275
We omit the description of the ρKMKH algorithm, since it can be constructed similarly. (See the proof of Lemma 5 in Appendix A for precise equations.) Next we give the λFLL algorithm. This algorithm uses the same precomputed table as that of FLL. Hence, because negative coefficients cannot be considered in this table, first we have to transform the negative values to positive ones. This i is done easily using the property that n−1 i=0 ϕ (P ) = O [6] in Step 3. The ρFLL algorithm can be constructed similarly. Algorithm λFLL Input: m, P Output: Q = mP Step 1: expansion of m (the same as that of λKMKH) Step 2: optimization of the expansion (the same as that of λKMKH) Step 3: transformation of the coefficients to positive values Step 3-1: ri1 ← ri1 − mini (ri1 ) for 0 ≤ i < n. ri2 ← ri2 − mini (ri2 ) for 0 ≤ i < n. Step 4: scalar multiplication Step 4-1: compute P P(eh−1 ,...,e0 ) for each possible combination of ei ’s. Step 4-2: Q ← O, j ← (log2 p/2) + 2. Step 4-3: Q ← 2Q. (j) (j) (j) (j) Step 4-4: R←P PIa−1,j,1 , where Iij1=(r(h−1)a+i,1 , r(h−2)a+i,1 , . . . , ra+i,1 , ri,1 ). (j)
(j)
(j)
(j)
R←R+λ(P PIa−1,j,2 ), where Iij2=(r(h−1)a+i,2 , r(h−2)a+i,2 , . . . , ra+i,2 , ri,2 ). Step 4-5: for i = a − 2 to 0 do R ← ϕ(R). R ← R + P PIij1 . R ← R + λ(P PIij2 ). Step 4-6: Q ← Q + R. Step 4-7: j ← j − 1. Step 4-8: if (j ≥ 0) then go to Step 4-3.
5
Performance Analysis
In this section, we compare the performance of scalar multiplication algorithms described in the previous section. For the underlying fields, we consider only optimal extension fields Fpn (OEFs, for short), where p is selected to fit into a CPU word [1, 2]. Note that other fields of the form Fpn are not practical, since they cannot compete in speed with prime fields Fp or even characteristic fields. For example, using Fpn with small p is between ten and one hundred times slower than using even characteristic fields of the same order [19]. The fields and curves that we have implemented are shown in Table 2. We used affine coordinates, because the ratio of field inversion to field multiplication is relatively small in OEFs. Table 3 presents the timings for scalar multiplications on a 866MHz Pentium III CPU using gcc-2.96. For reference, we have also shown the results for the NAF
276
Tae-Jun Park et al.
Table 2. Implemented fields and curves curve curve curve curve curve
1 2 3 4 5
p 231 − 1 229 − 3 216 − 15 214 − 3 214 − 3
n irreducible binomial 7 f (x) = x7 − 3 7 f (x) = x7 − 2 13 f (x) = x13 − 2 13 f (x) = x13 − 2 13 f (x) = x13 − 2
curve order (bits) endomorphism y 2 = x3 + 5 187 ρ y 2 = x3 + 2x 162 λ y 2 = x3 + 37x 193 λ y 2 = x3 + 2x 163 λ y 2 = x3 + 3 169 ρ
Table 3. Timings for scalar multiplications on various curves (µsec) curve curve 1 curve 2 curve 3 curve 4 curve 5
NAF 3895.03 4731.45 6804.44 5381.96 5567.75
KMKH 2018.57 2593.59 2831.96 2302.71 2300.47
λKMKH ρKMKH gaina FLL λFLL ρFLL · 1803.81 11.9% 1472.95 · 1310.24 2233.53 · 16.1% 1919.18 1636.56 · 2542.32 · 11.4% 2155.60 2003.33c · 2046.68 · 12.5% 1809.14 1617.76c · · 2081.00 10.5% 1811.49 · 1627.77c
gainb 12.4% 17.3% 7.6% 11.8% 11.3%
a
throughput increase of λKMKH or ρKMKH over KMKH throughput increase of λFLL or ρFLL over FLL c Algorithms λFLL and ρFLL can be improved further for curves 3, 4 and 5. The values shown are the results using these improved algorithms. (See Appendix B.) b
Table 4. Average number of point operations in one scalar multiplication (additions/doublings) curve curve 1 curve 2 curve 3 curve 4 curve 5
NAF 61.39/184.77 53.13/159.67 63.36/190.68 53.44/160.68 55.50/166.68
KMKH λKMKH or ρKMKH 103.68/30.58 101.10/15.84 96.80/28.58 92.53/14.54 96.47/15.95 89.89/8.12 83.31/13.93 76.12/7.17 83.81/13.95 77.32/7.16
FLL λFLL or ρFLL 63.81/30.33 65.16/15.48 60.20/28.34 60.50/14.10 66.88/15.84 64.50/8.00 59.86/13.82 55.19/7.00 59.92/13.85 55.40/7.00
scalar multiplication algorithm. According to Table 3, our method improves the throughput by 10.5 ∼ 16.1% for the KMKH algorithm, and by 7.6 ∼ 17.3% for the FLL algorithm. We remark that the time required for an expansion is equivalent to only a few point additions. Table 4 shows the average number of point additions and doublings needed for a scalar multiplication. (Note that other operations require negligible time: for example, one ϕ map can be computed using n−1 multiplications on Fp [2, 7], and a λ or ρ map can be computed using n multiplications on Fp .) We see that the results in Table 4 coincide with the timing results in Table 3. (Note that a scalar multiplication on curve 1 is the fastest, although its number of required point operations is relatively large. It is because the underlying field of curve 1 is a Type I OEF.)
New Frobenius Expansions for Elliptic Curves with Efficient Endomorphisms
277
As justified in Table 4, the improvements are mainly obtained from the fact that the number of point doublings is reduced to about a half, since the coefficients in our expansion are smaller than that of the original Frobenius method by a square root order. (See the ranges of indices j in scalar multiplication steps of the algorithms given in the previous section.) Therefore, the gain is expected to be increased further if we use OEFs with larger p, for example, p ≈ 264 for a 64-bit CPU. Finally, we remark that our method does not seem to give much improvement in the point-known-in-advance case, since the off-line precomputation will reduce significantly the required number of on-line doublings in both of the Frobenius method and our method.
6
Concluding Remarks
We proposed a new method of Frobenius expansion for scalar multiplication in an elliptic curve defined over an odd prime field. By reducing the required number of point doublings in a scalar multiplication, our method improves the scalar multiplication algorithms that use Frobenius expansion. According to our experiments, the throughputs of the KMKH algorithm and the FLL algorithm are increased by 7.6 ∼ 17.3% using our method. Our method requires that p satisfy a specific condition p ≡ 1 (mod 4) or p ≡ 1 ( mod 3) and it uses special curves. Hence the number of suitable curves is smaller than those of the original Frobenius expansion methods. It is not a problem, however, since there exist many curves that are suitable for cryptographic use, i.e., that have a large prime factor in their group orders. (Some example curves are given in Section 5.) Note also that instead of λ or ρ, another endomorphism γ can be used in the coefficients of the Frobenius expansion if it is efficiently computable and it satisfies ϕ = a + bγ for some a, b ∈ Z, even if γ is not a root of unity. (Examples of this endomorphism are given in [4], though they are not very efficient, i.e., they are a little harder than a point doubling.) It would also be interesting to apply multi-exponentiation techniques [15] to our methods, since mP = r0,1 (P ) + r0,2 (λ(P )) + · · · + rn−1,1 (ϕn−1 (P )) + rn−1,2 (ϕn−1 (λ(P ))) can be regarded as a sum of 2n scalar multiplications. Finally, we remark that there is no known attack that significantly reduces the time required to compute elliptic curve discrete logarithms on curves such as ones used in this paper [4].
Acknowledgements We thank the anonymous reviewers for their helpful comments and references.
278
Tae-Jun Park et al.
References [1] D. V. Bailey and C. Paar. Optimal extension fields for fast arithmetic in public key algorithms. In Advances in Cryptology-CRYPTO 98, volume 1462 of LNCS, pages 472–485. Springer-Verlag, 1998. 264, 265, 275 [2] D. V. Bailey and C. Paar. Efficient arithmetic in finite field extensions with application in elliptic curve cryptography. Journal of Cryptology, 14(3):153–176, 2001. 264, 265, 275, 276 [3] J. H. Cheon, S. Park, S. Park, and D. Kim. Two efficient algorithms for arithmetic of elliptic curves using Frobenius map. In Public Key Cryptography 98, volume 1431 of LNCS, pages 195–202. Springer-Verlag, 1998. 264 [4] R. Gallant, R. Lambert, and S. Vanstone. Faster point multiplication on elliptic curves with efficient endomorphisms. In Advances in Cryptology-CRYPTO 2001, volume 2139 of LNCS, pages 190–200. Springer-Verlag, 2001. 264, 265, 266, 271, 277 [5] G. H. Hardy and E. M. Wright. An Introduction to the Theory of Numbers. Oxford University Press, 3rd edition, 1954. 267, 279 [6] T. Kobayashi. Base-φ method for elliptic curves over OEF. IEICE Trans. Fundamentals, E83-A(4):679–686, 2000. 264, 265, 271, 272, 275 [7] T. Kobayashi, H. Morita, K. Kobayashi, and F. Hoshino. Fast elliptic curve algorithm combining Frobenius map and table reference to adapt to higher characteristic. In Advances in Cryptology-EUROCRYPT 99, volume 1592 of LNCS, pages 176–189. Springer-Verlag, 1999. 264, 265, 271, 272, 276 [8] N. Koblitz. Elliptic curve cryptosystems. Mathematics of Computation, 48:203– 209, 1987. 264 [9] N. Koblitz. CM-curves with good cryptographic properties. In Advances in Cryptology-CRYPTO 91, volume 576 of LNCS, pages 279–287. Springer-Verlag, 1991. 264 [10] C. H. Lim and H. S. Hwang. Speeding up elliptic scalar multiplication with precomputation. In Information Security and Cryptology-ICISC 99, volume 1787 of LNCS, pages 102–119. Springer-Verlag, 1999. 264, 265, 271, 273 [11] C. H. Lim and P. J. Lee. More flexible exponentiation with precomputation. In Advances in Cryptology-CRYPTO 94, volume 839 of LNCS, pages 95–107. SpringerVerlag, 1994. 264, 273, 281 [12] W. Meier and O. Staffelbach. Efficient multiplication on certain non-supersingular elliptic curves. In Advances in Cryptology-CRYPTO 92, volume 740 of LNCS, pages 333–344. Springer-Verlag, 1992. 264 [13] A. Menezes. Elliptic Curve Public Key Cryptosystems. Kluwer Academic Publishers, 1993. 265 [14] V. Miller. Use of elliptic curves in cryptography. In Advances in CryptologyCRYPTO 85, volume 218 of LNCS, pages 417–428. Springer-Verlag, 1986. 264 [15] B. M¨ oller. Algorithms for multi-exponentiation. In Selected Areas in Cryptography – SAC 2001, volume 2259 of LNCS, pages 165–180. Springer-Verlag, 2001. 277 [16] F. Morain and J. Olivos. Speeding up the computations on an elliptic curve using addition-subtraction chains. Theoretical Informatics and Applications, 24:531– 543, 1990. 264 [17] V. M¨ uller. Fast multiplication on elliptic curves over small fields of characteristic two. Journal of Cryptology, 11:219–234, 1998. 264 [18] J. R. Silverman. The Arithmetic of Elliptic Curves. Springer-Verlag, 1986. 265, 267
New Frobenius Expansions for Elliptic Curves with Efficient Endomorphisms
279
[19] N. P. Smart. Elliptic curve cryptosystems over small fields of odd characteristic. Journal of Cryptology, 12:141–151, 1999. 264, 266, 272, 275 [20] J. A. Solinas. An improved algorithm for arithmetic on a family of elliptic curves. In Advances in Cryptology-CRYPTO 97, volume 1294 of LNCS, pages 357–371. Springer-Verlag, 1997. 264 [21] J. A. Solinas. Efficient arithmetic on Koblitz curves. Designs, Codes and Cryptography, 19:195–249, 2000. 264 [22] ANSI X9.62. Public key cryptography for the financial services industry: the elliptic curve digital signature algorithm (ECDSA), 1999. 264
A
Proofs of Lemmas and Theorem
√ √ Proof of Lemma 4 Let Q( 3i) = {u + √ v 3i | u, v ∈√Q}. It is well-known that the set of all algebraic integers in Q( 3i) is {(a + b 3i)/2 | a, b ∈ Z, a ≡ b (mod 2)} [5]. It is easy to show that this set is equal to Z[ρ]. As in the proof of Lemma 1, ϕ is an algebraic integer, and thus ϕ ∈ Z[ρ]. ✷ Proof of Lemma 5 By Lemma 4, ϕ can be written as a + bρ for a, b ∈ Z. By the minimal equation ϕ2 − τ ϕ + p = 0, we get p = ϕ · ϕ¯ = (a + bρ) · (a + bρ¯) = a2 − ab + b2 ,
(16)
τ = ϕ + ϕ¯ = 2a − b,
(17)
since ρ2 + ρ + 1 = 0. Hence we can decide a and b. Let s = s1 + s2 ρ for s1 , s2 ∈ Z. Then there exists a quotient x = x1 + x2 ρ (x1 , x2 ∈ Q) such that
s= ϕ · x, i.e., s1 + s2 ρ = (a + bρ) · (x1 + x2 ρ). s1 , we get Representing s1 + s2 ρ as s2
1 a−b b x1 s1 = (18) x2 s2 −b a p by (16). Setting
t=
we get
x1 x2
,
(19)
a −b x1 r = s − tϕ = . (20) − x2 b a−b √ Without loss of generality, let ρ = (−1 + 3i)/2. See Fig. 5. Since s, t, ϕ ∈ Z[ρ], s and tϕ are in the integer lattice generated by 1 and ρ. Note that tϕ is also in the integer lattice generated by ϕ and ρϕ, but s is not. It is easy to see that if √ 3p we compute r by (18), (19) and (20), then the largest value of r is 2 . ✷ s1 s2
Proof of Lemma 6 As in the proof of Lemma 3, we can decide the relation between ϕ and ρ as shown in Table 5. We consider only the case of ϕ = −1 + 2ρ, since the other cases are similar. Define x = x1 + x2 ρ (x1 , x2 ∈ Q) such that s = ϕ · x, and let t = x1 + x2 ρ. Then it is easy to see that r = s − tϕ satisfies ✷ r ∈ {0, 1, −1, ρ, −ρ, ρ2, −ρ2 } as shown in Fig. 6.
280
Tae-Jun Park et al.
0
1
Fig. 5. Computing t and r given s: the case of ρ = (−1 +
√ 3i)/2
Proof of Theorem 2 There are two cases: p ≥ 13 and p < 13. We first give a proof for the case p ≥ 13. By Lemma 5, we can obtain an expansion of the form j
s= ri ϕi + sj+1 ϕj+1 (21) with ri ≤
√ 3p 2 .
i=0
As in the proof of Theorem 1, we get √ √ p 3 ||s0 || ·√ . ||sj+1 || ≤ √ j+1 + 2 p−1 p
If j ≥ 2 logp ||s0 || − 1, then
Since p ≥ 13, we see 1+
||s0 || √ j+1 ≤ 1. p √ √ √ p 3 3p ·√ < . 2 p−1 2
Table 5. Relation between ϕ and ρ for p = 7 curve minimal polynomial relation y 2 = x3 + 1 ϕ2 + 4ϕ + 7 = 0 ϕ = −1 + 2ρ or ϕ = −3 − 2ρ ϕ2 + ϕ + 7 = 0 ϕ = 1 + 3ρ or ϕ = −2 − 3ρ y 2 = x3 + 2 2 3 ϕ = −2 + ρ y = x + 3 ϕ2 + 5ϕ + 7 = 0 ϕ = 3 + ρ or ϕ = 2 − ρ y 2 = x3 + 4 ϕ2 − 5ϕ + 7 = 0 ϕ2 − ϕ + 7 = 0 ϕ = −1 − 3ρ y 2 = x3 + 5 2 3 ϕ = 3 + 2ρ or ϕ = 1 − 2ρ y = x + 6 ϕ2 − 4ϕ + 7 = 0
(22)
(23)
(24)
New Frobenius Expansions for Elliptic Curves with Efficient Endomorphisms
281
1 0
1
Fig. 6. Computing t and r given s: the case of ϕ = −1 + 2ρ, ρ = (−1 +
√
3i)/2
By (22), (23) and (24), we get √ 3p . ||sj+1 || < 2 Setting sj+1 = rj+1 in (21), we get the expansion (15) with k ≤ 2 logp ||s||. Next we consider the case p < 13. Note that the only prime p such that p ≡ 1 (mod 3) and p < 13 is p = 7. By Lemma 6, we obtain an expansion (21) with ri ∈ {0, 1, −1, ρ, −ρ, ρ2, −ρ2 }. Using the triangular inequality, we get ||sj+1 || ≤
||s0 || ||sj || + 1 1 √ ≤ √ j+1 + √ . 7 7−1 7
If j ≥ 2 log7 ||s0 || − 1, then 1 sj+1 ≤ 1 + √ . 7−1 The elements in Z[ρ] that satisfies this inequality are {0, 1, −1, ρ, −ρ, ρ√2, −ρ2 }. Setting sj+1 = rj+1 in (21), we get the expansion (15) with ri ≤ 1 < 23·7 and k ≤ 2 log7 ||s||. ✷
B
Modification of λFLL and ρFLL
There are two ways to apply the LL precomputation algorithm [11] to the Frobenius expansion methods. The first one is to partition coefficients into h groups
282
Tae-Jun Park et al.
as described in Section 4. The second one is to slice each of the coefficients into h pieces, i.e., to apply the original LL algorithm to each of the coefficients in parallel. According to our experiments, the latter is more efficient for curves 3, 4 and 5 in Table 2.
Efficient Computations of the Tate Pairing for the Large MOV Degrees Tetsuya Izu1 and Tsuyoshi Takagi2 1
FUJITSU LABORATORIES Ltd. 4-1-1, Kamikodanaka, Nakahara-ku, Kawasaki, 211-8588, Japan [email protected] 2 Technische Universit¨ at Darmstadt, Fachbereich Informatik Alexanderstr.10, D-64283 Darmstadt, Germany [email protected]
Abstract. The Tate pairing has plenty of attractive applications, e.g., ID-based cryptosystems, short signatures, etc. Recently several fast implementations of the Tate pairing has been reported, which make it appear that the Tate pairing is capable to be used in practical applications. The computation time of the Tate pairing strongly depends on underlying elliptic curves and definition fields. However these fast implementation are restricted to supersingular curves with small MOV degrees. In this paper we propose several improvements of computing the Tate pairing over general elliptic curves over finite fields IFq (q = pm , p > 3) — some of them can be efficiently applied to supersingular curves. The proposed methods can be combined with previous techniques. The proposed algorithm is specially effective upon the curves that has a large MOV degree k. We develop several formulas that compute the Tate pairing using the small number of multiplications over IFqk . For k = 6, the proposed algorithm is about 20% faster than previously fastest algorithm. Keywords: Elliptic curve cryptosystem, Tate pairing, Jacobian coordinate, MOV degree, efficient computation.
1
Introduction
After the proposal of the cryptosystems by Koblitz and Miller [Kob87, Mil86], elliptic curves have attracted a lot of cryptographic interests. Menezes-OkamotoVanstone found some weak curves, the supersingular curves, on which the Weil pairing transforms the cyclic group on the elliptic curve into a finite filed with small extension degree (MOV degree) [MOV93]. Then Frey-M¨ ullerR¨ uck extended their attack and found more weak curves by using the Tate pairing [FMR99]. These curves are avoided for cryptographic use. Now, elliptic curves for cryptography have been standardized by many organizations [IEEE, NIST, SEC], on which the transformation just produces a finite filed with huge extension. Recently, Okamoto-Pointcheval found a new class of problems in which the Decision Diffie-Hellman (DDH) problems are easy but the Diffie-Hellman (DH) P.J. Lee and C.H. Lim (Eds.): ICISC 2002, LNCS 2587, pp. 283–297, 2003. c Springer-Verlag Berlin Heidelberg 2003
284
Tetsuya Izu and Tsuyoshi Takagi
problems are still hard [OP01]. The Weil and the Tate pairings exactly provide this problem and they are used as a constructive tool for cryptography. Indeed, an enormous number of primitives have been proposed [Jou02], such as the tripartite Diffie-Hellman scheme [Jou00], the key agreement scheme [Sma01], the encryption scheme [BF01], the signature scheme [BLS01, Hes02, Pat02, SOK00], and the self-blindable credential certificates [Ver01]. These primitives require the specific elliptic curves, rather than standardized curves in [IEEE, NIST, SEC], on which the transformation by the pairings produces a small extension degree. Supersingular elliptic curves are suitable for this purpose [Gal01]. Recent results [MNT01, DEM02], which construct ordinary elliptic curves with given (small) extension degree, enable us to use ordinary curves in these primitives and provide more freedom for cryptographic applications. On the other hand, fast implementation of the pairings have been studied. Elliptic curves over characteristics 2 and 3 fields are attractive, because point doubling in characteristic 2 (tripling in characteristic 3) can be performed very efficiently [BKLS02, GHS02, SW02]. However, when we implement the pairing based cryptosystems in software or hardware, it is natural to use existing familiar technologies, namely characteristic p fields. They are easy to be implemented because there are massive results of researches since the RSA. Recently, it is reported that characteristic 3 fields are less efficient for hardware implementation [PS02]. Moreover, the discrete logarithm problem over IF2m can be solved faster than that over other finite fields, and thus the size of finite field IF2m must be chosen larger [Cop83]. Contribution of this Paper In this paper, we pursuit the fast computation of the Tate pairing over characteristic p fields. Previous studies, such as [BKLS02, GHS02], were dedicated to the other characteristics or supersingular curves. Our main target is a general (not necessarily supersingular) elliptic curve with arbitrary extension degree. We propose several efficient computations of the Tate pairing. They are specially effective upon large extension degrees e.g., k > 2 — some of them can be efficiently applicable to k = 2 as well. Our algorithms are independent from the previously proposed methods and we can combine our proposed method with them. The computation of the Tate pairing consists of three stages: (1)To compute the elliptic curve addition or doubling, (2)To compute the coefficients of two lines l1 and l2 , (3)To evaluate the values l1 (Q), l2 (Q) and update the value of the Tate pairing. In this paper, we improve/optimize the computation time of each step. For the first step, we develop a new coordinate, called the simplified Chudonovsky-Jacobian coordinate J s . We also proposed an addition formula that directly computes (2w )Q instead of repeatedly applying an elliptic curve doubling. These modifications can save several multiplications. For the second step, we optimize the generation of the coefficients of lines l1 , l2 . For the third step, we encapsulate the lines l1 , l2 into one quadratic equation, and thus we can avoid the computation over extension fields IFqk . This modification can reduce the number of multiplication of IFqk . The efficiency of all the proposed methods are estimated. We also demonstrate how the proposed methods improve the
Efficient Computations of the Tate Pairing for the Large MOV Degrees
285
whole computation of the Tate pairing. For k = 6, the proposed methods are about 20% faster than the previously known algorithms. This paper is organized as follows: In Section 2, we review the arithmetic of elliptic curves and the definition of the Tate pairing. In Section 3, the computation of the Tate pairing using the Miller’s algorithm is described. In Section 4, we present our proposed algorithms. In Section 4.5 the application of our algorithm to supersingular curves is discussed. In Section 5 we compare our proposed algorithm with the previously known algorithms. In Section 6 we state our occlusion of this paper.
2
Preliminaries
In this section we explain the basic arithmetic of elliptic curves and Tate pairing. We assume that base field K = IFq (q = pm , p > 3) is a finite field with q elements in this paper, where p is called the characteristic of K. 2.1
Elliptic Curves
Elliptic curves over K can be represented by the Weierstrass-form equation E(K) := {(x, y) ∈ K ×K | y 2 = x3 +ax+b (a, b ∈ K, 4a3 +27b2 = 0)}∪O, (1) where O is the point of infinity. An elliptic curve E(K) has an additive group structure. Let P1 = (x1 , y1 ), P2 = (x2 , y2 ) be two elements of E(K) that are different from O and satisfy P2 = ±P1 . Then the addition P3 = P1 + P2 = (x3 , y3 ) is defined by x3 = λ2 − x1 − x2 , y3 = λ(x1 − x3 ) − y1 , where λ = = P2 , and λ = (3x21 + a)/(2y1 ) for P1 = P2 . For two (y2 − y1 )/(x2 − x1 ) for P1 points P1 , P2 of E(K), we call P1 + P2 (P1 = P2 ) the elliptic curve addition (ECADD) and P1 + P2 (P1 = P2 ), that is 2 ∗ P1 , the elliptic curve doubling (ECDBL). For a given integer d and a point P on the elliptic curve E(K), compute the point d ∗ P is called the scalar multiplication. 2.2
Coordinate System
There are several ways to represent a point on an elliptic curve. The costs of computing an ECADD and an ECDBL depend on the representation of the coordinate system. The detailed description of the coordinate systems is given in [CMO98]. The major coordinate systems are as follows: the affine coordinate (A), the projective coordinate (P), the Jacobian coordinate (J ), the Chudonovsky-Jacobian coordinate (J C ), and the modified Jacobian coordinate (J m ). We summarize the costs in Table 1, where M, S, I denotes the computation time of a multiplication, a squaring, and an inverse in the definition field K, respectively. The speed of ECADD or ECDBL can be enhanced when the third coordinate is Z = 1 or the coefficient of the definition equation is a = −3. The Jacobian coordinate offers a faster ECDBL (but a slower ECADD). The equation of the curve is given by EJ : Y 2 = X 3 + a XZ 4 + bZ 6 by setting
286
Tetsuya Izu and Tsuyoshi Takagi
Table 1. Computing times of an addition (ECADD) and a doubling (ECDBL) Coordinate ECADD System Z = 1 Z=1 A 2M + 1S + 1I — P 12M + 2S 9M + 2S J 12M + 4S 8M + 3S JC 11M + 3S 8M + 3S Jm 13M + 6S 9M + 5S
ECDBL a = −3 a = −3 2M + 2S + 1I 7M + 5S 7M + 3S 4M + 6S 4M + 4S 5M + 6S 5M + 4S 4M + 4S
x = X/Z 2 , y = Y /Z 3 in (1), and a point on the curve is represented by (X, Y, Z) = 0) are identified as same. where two points (X, Y, Z) and (λ2 X, λ3 Y, λZ) (λ The addition formulas for Jacobian coordinate are given in Table 2. Chudonovsky-Chudonovsky proposed the Chudonovsky-Jacobian coordinate J C , in which a point is represented by a quintuplet (X, Y, Z, Z 2 , Z 3 ) where (X, Y, Z) represents a point in the Jacobian coordinate. In the ChudonovskyJacobian coordinate, there is no need to compute a squaring and a tripling of Zcoordinates of inputs because they are in the coordinate, but need to compute a squaring and a tripling of Z-coordinates of an output. 2.3
Addition Chain
Let d be an n-bit integer and P be a point of the elliptic curve E. A standard way for computing a scalar multiplication d ∗ P is to use the binary expression d = d[n − 1]2n−1 + d[n − 2]2n−2 + . . . + d[1]2 + d[0], where d[n − 1] = 1 and d[i] = 0, 1 (i = 0, 1, ..., n − 2). The binary method computes an ECDBL for every d[i] and an ECADD if d[i] = 0. In average it requires (n − 1) ECDBLs and (n − 1)/2 ECADDs. Because computing the inverse −P of P is essentially free, i we can relax the condition ”binary” to ”signed binary” d = n−1 i=0 d[i]2 , where d[i] = −1, 0, 1. It is called the signed binary method (or the addition-subtraction method). NAF offers a way to construct the addition-subtraction chain, which requires (n − 1) ECDBLs and (n − 1)/3 ECADDs in average [IEEE] for an n-bit
Table 2. Addition formulas in Jacobian coordinate ECADD (8M + 3S) Input: P1 = (X1 , Y1 , Z1 ), P2 = (X2 , Y2 , 1) Output: P3 = P1 + P2 = (X3 , Y3 , Z3 ) U1 ← X1 , U2 ← X2 Z12 S1 ← Y1 , S2 ← Y2 Z13 H ← U2 − U1 , R ← S2 − S1 X3 ← −H 3 − 2U1 H 2 + R2 Y3 ← −S1 H 3 + R(U1 H 2 − X3 ) Z3 ← Z1 H
ECDBL (4M + 6S) Input: P1 = (X1 , Y1 , Z1 ), a Output: P4 = 2 ∗ P1 = (X4 , Y4 , Z4 ) M ← 3X12 + a Z14 S ← 4X1 Y12 X4 ← M 2 − 2S Y4 ← M (S − X4 ) − 8Y14 Z4 ← 2Y1 Z1
Efficient Computations of the Tate Pairing for the Large MOV Degrees
287
scalar multiplication. We denote the signed binary expression obtained by NAF as d = N AF (d)[i]2i . In the binary methods, points P and −P are constant that we can set Z-coordinates of them to 1 for an efficiency reason. 2.4
Tate Pairing
Let be a positive integer coprime to q (In most cryptographic primitives, is set to a prime such that |#E(IFq )). Let k be the smallest positive integer such that the -th root of unity is in IFqk , namely |(q k − 1). k is called the MOV degree [MOV93]. Let E(IFqk )[] be the subgroup of points in E(IFqk ) of order . Then the Tate pairing ·, · is defined by ·, · : E(IFqk )[] × E(IFqk )/E(IFqk ) → IF∗qk /(IF∗qk )
where the right hand value is modulo -th powers. The Tate pairing is computed via the following function fP . Here P is a point of order . There is a function fP whose divisor div(f ) is equal to (P ) − (O). Then we have P, Q = f (Q) where Q denotes a divisor in the same class as Q such that the support of Q is disjoint with the support of (f ). This is done by setting Q = (Q + S) − (S) where (Q) − (O) is the divisor and S ∈ E(IFqk ). For cryptographic applications, values of the Tate pairing are expected to be unique. Thus the Tate pairing is computed by P, Q = (fP (Q + S)/fP (S))(q
k
−1)/
.
(2)
The properties of the Tate pairing are as follows [GHS02]: 1. (Well-defined) O, Q ∈ (IF∗qk ) for all Q ∈ E(IFqk )[] and P, Q ∈ (IF∗qk )
for all P ∈ E(IFqk )[], Q ∈ E(IFqk ). 2. (Non-degeneracy) For each point P ∈ E(IFqk ) − {O}, there exist a point Q ∈ E(IFqk ) such that P, Q ∈ (IF∗qk ) . 3. (Bilinearity) For any integer n, nP, Q ≡ P, nQ ≡ P, Qn modulo -th power. We describe the standard key sizes of q, k, l in the following. q k is at least larger than 1024 bits in order to make the discrete logarithm problem over IFqk intractable. l is at least larger than 160 bits in order to resist the baby-step-giantstep algorithm or Pollard’s λ algorithm for solving the short discrete logarithm k problem of (fP (Q + S)/fP (S))(q −1)/ ∈ IFqk .
3
Computing the Tate Pairing
In this section we estimate the computing time of the Tate pairing via the Miller’s algorithm.
288
Tetsuya Izu and Tsuyoshi Takagi
Table 3. Miller’s Algorithm Input: , P ∈ E(IFq ), Q, S ∈ E(IFqk ) Output: fP (Q + S)/fP (S) ∈ IFqk 1: T = P , f = 1 2: For i = n − 1 down to 0 3: Compute T = ECDBL(T ) and lines l1 , l2 for T + T 2 (S) 4: f ← f 2 × ll11 (Q+S)×l (S)×l2 (Q+S) 5: If [i] = 1 then 6: Compute T = ECADD(T, P ) and lines l1 , l2 for T + P 2 (S) 7: f ← f × ll11 (Q+S)×l (S)×l2 (Q+S) 8: return f
3.1
Miller’s Algorithm
A standard algorithm for computing the fP (Q + S)/fP (S) (in the Tate pairing) is the Miller’s algorithm [Men93]. Let [i] be the bit representation of an n-bit prime where [n − 1] = 1. We review the Miller’s algorithm in Table 3. = P , and is the tangent at Here the line l1 passes two points T and P if T point T if T = P . The line l2 is the vertical line to the x-axis through T + P . We call the procedures in Step 3 and Step 4 as TDBL, and in Step 5 and Step 6 as TADD. A computation of TADD/TDBL is divided into three stages, the computation of ECADD/ECDBL and coefficients for l1 , l2 , the evaluation of l1 (Q + S), l1 (S), l2 (Q + S), l2 (S), and the update of f . 3.2
Straightforward Implementation
Let us estimate the computing time of the Tate pairing when points are represented by the Jacobian coordinate J . Suppose Z-coordinates of the points P are chosen to 1 (this is done by converting to the affine coordinate). We set T = (X1 , Y1 , Z1 ) and P = (X2 , Y2 , 1). We also set S = (xS , yS , 1), Q + S = (xQ+S , yQ+S , 1) as the affine points over E(IFqk ). An element of IFqk is represented as a bold character in the following. We denote a computing time of a multiplication and a squaring in IFqk as Mk and Sk , respectively. As the extension field IFqk is represented as a k-dimensional vector space over IFq , a naive implementation provides Mk = O(k 2 M ), where M is the computation time of a multiplication of IFq . The multiplication of elements between IFqk and IFq requires kM . Computation of TADD: The lines l1add and l2add for P + T (T = ±P ) are given by l1add (x, y) = Z3 (y − Y2 ) − (Y2 Z13 − Y1 )(x − X2 ), l2add (x, y) = Z32 x − X3 ,
Efficient Computations of the Tate Pairing for the Large MOV Degrees
289
where (X3 , Y3 , Z3 ) = P + T . A computation of P + T requires 8M + 3S, and during the computation, the coefficient R = Y2 Z13 − Y1 has been computed. Thus we only need to compute Z3 Y2 , RX2 , Z32 , which requires 2M + 1S. For the evaluation of l1add (Q) and l2add (Q) for Q = (xQ , yQ , 1) ∈ E(IFqk ), we require 3k multiplications in IFq . Then, at last, we update f = f by a = a × l1add (Q + S) × l2add (S), b = b × l1add (S) × l2add (Q + S), where f = a/b is the quotient of two values a, b of IFqk . It requires 4 multiplications in IFqk . Thus a TADD requires TADD = (8M + 3S) + (2M + 1S) + 2(3kM ) + 4Mk = 4Mk + (6k + 10)M + 4S. Computation of TDBL: Similarly, the lines for T + T are given by l1dbl (x, y) = (Z4 Z12 y − 2Y12 ) − (3X12 + a Z14 )(Z12 x − X1 ) l2dbl (x, y) = Z42 x − X4 , where (X4 , Y4 , Z4 ) = T + T . A computation of T + T requires 4M + 6S, and computation of coefficients requires 3M + 1S. For an evaluation, we require 3k multiplications in IFq . An update is computed by a = a2 × l1dbl (Q + S) × l2dbl (S), b = b2 × l1dbl(S) × l2dbl (Q + S), which requires 4 multiplications and 2 squiring in IFqk . Thus a TDBL requires TDBL = (4M + 6S) + (3M + 1S) + 2(3kM ) + 4Mk + 2Sk = 4Mk + 2Sk + (6k + 7)M + 7S.
4
Improvements
In this section we describe how to improve the computation time of the Tate k pairing P, Q = (fP (Q + S)/fP (S))(q −1)/ . In the Miller’s algorithm, we need three stages to update f ; (1) computation of ECDBL(ECADD) and coefficients of l1 , l2 , (2) evaluation of l1 (Q+S), l1 (S), l2 (Q+S), l2 (S), and (3) update of f . All computation in (1) are in IFq , while in (2),(3) are in IFqk . We investigate complete formulas of TADD and TDBL assembled by arithmetics of the definition field IFq and its extension field IFqk . 4.1
Coordinate System
In the computations of ECADD, ECDBL and coefficients of the lines l1 , l2 , we need many squaring of Z-coordinates. This implies that a new coordinate (X, Y, Z, Z 2 ) matches the computation. We call this representation by the simplified Chudonovsky-Jacobian coordinate J s , in which (X, Y, Z) represents a point in the Jacobian coordinate.
290
Tetsuya Izu and Tsuyoshi Takagi
Table 4. Comparison of computing times of ECADD(ECDBL) and coefficients ECADD Coeff. Total J 8M + 3S 2M + 1S 10M + 4S J C 8M + 3S 2M 10M + 3S J s 8M + 3S 2M 10M + 3S
ECDBL Coeff. Total 4M + 6S 3M + 1S 7M + 7S 5M + 6S 3M 8M + 6S 4M + 6S 3M 7M + 6S
In ECADD, our coordinate J s requires 8M + 3S as same as the Jacobian coordinate J and the Chudonovsky-Jacobian coordinate J C . However, coefficients of l1 , l2 are computed in 2M because we have Z32 in the corrdinate. Thus we require (8M + 3S) + 2M = 10M + 3S for ECADD and coefficient computations. Similarily, we require (4M + 6S) + 3M = 7M + 6S for ECDBL and coefficient computations with our coordinate. A comparison is in Table 4. 4.2
Direct Computation of l1 (Q + S) × l2 (S)
In TADD, lines l1add , l2add are given by l1add (x, y) = Z3 (y − Y2 ) − R(x − X2 ) = a x + by + c, l2add (x, y) = Z32 x − X3 = dx + e, where we requires 2M + 1S for coefficients computation as before. Here we have l1add (Q+S)×l2add(S) = ad(xQ+S xS )+bd(yQ+S xS )+aexQ+S +beyQ+S +cdxS +ce, and we need 6M for coefficients and 5kM for evaluation if xQ+S xS and yQ+S xS are pre-computed. For l1add (S) × l1add (Q + S), we compute l1add (S)× l2add (Q + S) = ad(xS xQ+S )+ bd(yS xQ+S )+ aexS + beyS + cdxQ+S + ce, which requires more 4kM , because we have already computed all coefficients and xQ+S xS . After computing l1add (Q + S) × l2add (S) and l1add (S) × l2add (Q + S), the update requires only 2Mk . Thus we need TADDs = (8M + 2S) + (2M + 1S) + 6M + 5kM + 4kM + 2Mk = 2Mk + (9k + 16)M + 3S. Similar avoidance can be possible for TDBL, which requires TDBLs = (4M + 5S)+(3M +1S)+6M +5kM +4kM +2Mk +2Sk = 2Mk +2Sk +(9k+13)M +6S. We summarize these results in the following table.
Table 5. Computing times of a TADD and a TDBL TADD TDBL Straightforward Method 4Mk + (6k + 10)M + 4S 4Mk + 2Sk + (6k + 7)M + 7S Improved Method 2Mk + (9k + 16)M + 3S 2Mk + 2Sk + (9k + 13)M + 6S
Efficient Computations of the Tate Pairing for the Large MOV Degrees
291
Table 6. Iterated ECDBL in the Jacobian coordinate iECDBL (4wM + (4w + 2)S) Input: P0 = (X0 , Y0 , Z0 ), w Output: Pw = 2w P1 = (Xw , Yw , Zw ) W0 ← a Z04 M0 ← 3X02 + W0 S0 ← 4X0 Y02 X1 ← M02 − 2S0 Y1 ← M0 (S0 − X1 ) − 8Y04 Z1 ← 2Y0 Z0 for(i = 1 to w − 1){ 4 )Wi−1 Wi ← 2(8Yi−1 Mi ← 3Xi2 + Wi Si ← 4Xi Yi2 Xi+1 ← Mi2 − 2Si Yi+1 ← Mi (Si − Xi+1 ) − 8Yi4 Zi+1 ← 2Yi Zi }
4.3
Iterated TDBL
For a point P ∈ E(IFq ), computing 2w P is called the w-iterated ECDBL. A witerated ECDBL can be computed by applying ECDBL w times successively, but it may be faster by sharing intermediate results if we call it as one function. Indeed, Itoh et al. proposed a fast algorithm (Table 6) for a w-iterated ECDBL in the Jacobian coordinate [ITTTK99], which requires 4wM + (4w + 2)S. This idea can be easily applied to our situation. We show a fast algorithm for the w-iterated TDBL in the following. We represent f = a/b as the quotient of two elements a, b ∈ IFqk . Suppose 2i P = (Xi , Yi , Zi ) are computed by the (i) (i) iterated ECDBL for i ≥ 1, The lines l1 and l2 are recursively defined by the equations: (i)
2 2 2 l1 (x, y) = (Zi Zi−1 y − 2Yi−1 ) − Mi−1 (Zi−1 x − Xi−1 ), (i)
l2 (x, y) = Zi2 x − Xi , 2 4 where Zi = 2Yi−1 Zi−1 , Mi−1 = 3Xi−1 + a Zi−1 . Here we require 3M + 1S for 2 2 2 2 . coefficients Zi Zi−1 , Mi−1 Zi−1 , Mi−1 Xi−1 , Zi , because we have computed Zi−1 The update of the iterated TDBL is similarly computed by the direct computation technique. Thus we have
iTDBLs (w) = 2wMk + 2wSk + (9k + 13)wM + (5w + 1)S 4.4
Combining with Previous Methods
In this section, we discuss the combination of our techniques with previous two techniques: the first is to use the random element S ∈ E(IFq ) ([GHS02]) and the
292
Tetsuya Izu and Tsuyoshi Takagi
other one is to use the condition |(q − 1) ([BKLS02]). They aim at enhancing the computation time of the Tate pairing for supersingular curves. However, these choices of the parameters can improve the efficiency of computing the Tate pairing for genera curves. We estimate how the choices can make the Tate pairing faster combining with the methods from the previous sections. S ∈ E(IFq ): We can choose S ∈ E(Fq ) instead of S ∈ E(Fqk ) [GHS02]. Let DQ be the divisor from the class (Q) − (O). Then DQ ∼ (Q + S) − (S) for any S ∈ E(IFqk ), and we can choose S ∈ E(IFq ). A problem might occur during the computation of the Tate pairing. The calculation of TDBL and TADD for points T, P should be equal to neither of ±S, ±(S + Q). If S ∈ E(IFq ) is randomly chosen, the error probability is negligible, because we compute P using the addition chain. The number of the intermediate points arisen from the addition chain of are bounded in the polynomial order of log , and the possible choice of S is in the exponential order of q > . We denote by TADDS∈E(IFq ) and TDBLS∈E(IFq ) the computation time of TADD and TDBL for S ∈ E(IFq ), respectively. We first consider TADDS∈E(IFq ) . If we choose S ∈ E(IFq ), the values of l1 (S), l2 (S) are in subfield IFq . l1add (S) = Z3 (yS − Y2 ) − R(xS − X2 ), l2add (S) = Z32 xS − X3 , where R = Y1 −Y2 Z13 . The coefficient computation requires 2M +1S and the evaluation requires only 3M . We estimated that the evaluation of l1 (Q+S), l2 (Q+S) for Q + S ∈ E(IFqk ) requires 3kM . a = a × l1add (Q + S) × l2add (S) b = b × l1add (S) × l2add (Q + S), The updating of a, b requires 2Mk + 2kM due to l2add (S), l1add (S) ∈ IFq . In total we need 2Mk +(5k+13)M +3S for computing TADDS∈E(IFq ) . Similarly computing a TDBL requires TDBLS∈E(IFq ) = 2Mk + 2Sk + (5k + 10)M + 6S, and w-iterated TDBL requires 2wMk + 2wSk + (5k + 10)wM + (5w + 1)S. We summarize the results in the following table.
Table 7. Computing times of TADD/TDBL/iterated TDBL (S ∈ E(IFq )) Computing times (S ∈ E(Fq )) TADD 2Mk + (5k + 13)M + 3S TDBL 2Mk + 2Sk + (5k + 10)M + 6S w-iterated TDBL 2wMk + 2wSk + (5k + 10)wM + (5w + 1)S
Efficient Computations of the Tate Pairing for the Large MOV Degrees
293
q − 1 is Not Divisible by : The prime q must satisfies |(q k − 1), where k is the divisor of #E(IFq ). The Tate pairing computes α(q −1)/ in the final step, k where α ∈ IFqk . If we choose with |(q − 1), then we have a(q −1)/ = 1 for a ∈ IFq . This observation was pointed out for supersingular curves in the paper [BKLS02]. The condition of |(q − 1) can be checked in the parameter generation stage. When we combine this condition with S ∈ E(IFq ), the computation of the Tate
|(q−1)
|(q−1) pairing can be speeded up. We denote by TADDS∈E(IFq ) and TDBLS∈E(IFq ) the computation time of TADD and TDBL for S ∈ E(IFq ) with condition |(q − 1), respectively.
|(q−1) We first consider TADDS∈E(IFq ) . We assume S ∈ E(IFq ). If we choose S ∈ E(IFq ), the values of l1 (S), l2 (S) are in subfield IFq . Thus these values can be discarded of the evaluation. We can update f = a/b as follows: a = a × l1add (Q + S) b = b × l2add (Q + S). The evaluation of l1 (Q+S), l2 (Q+S) for Q+S ∈ E(IFqk ) requires (3k+2)M +1S. The updating of a, b require 2Mk . Consequently we need 2Mk + (3k + 10)M + 3S for computing a TADD under assumptions S ∈ E(IFq ) and |(q − 1). Similarly,
|(q−1) computing a TDBL requires TDBLS∈E(IFq ) = 2Mk + 2Sk + (3k + 7)M + 6S, and a w-iterated TDBL requires 2wMk + 2wSk + (3k + 7)wM + (5w + 1)S. 4.5
Application to Supersingular Curves
In this section, we discuss the improvements for supersingular elliptic curves. We combine the methods proposed in reference [BKLS02] with our methods in the previous section. According to [Men93], the trace of the supersingular curve over IFp (p > 3) equals to 0, that is we have the extension degree k = 2. In the following we consider a supersingular curve defined by the equation y 2 = x3 + ax over IFp (p ≡ 3 (mod 4)), which has a distortion map Φ : (x, y) → (−x, iy) ∈ IFp2 , where i2 = −1. In this case, the computation time of a multiplication M2 and a squaring S2 in the extension field IFp2 can be achieved M2 = 4M and S2 = 2M , where M and S are the computation time of a multiplication and a squaring in the prime field IFq , respectively.
Table 8. Computing times of TADD/TDBL/iterated TDBL (S ∈ E(IFq ), |(q − 1)) Computing times (S ∈ E(IFq ) and |(q − 1)) TADD 2Mk + (3k + 10)M + 3S TDBL 2Mk + 2Sk + (3k + 7)M + 6S w-iterated TDBL 2wMk + 2wSk + (3k + 7)wM + (5w + 1)S
294
Tetsuya Izu and Tsuyoshi Takagi
Table 9. Computing times of TADD/TDBL/iterated TDBL for y 2 = x3 + ax
TADD TDBL iterated TDBL
Computing times (S ∈ E(IFp ) and Φ) 18M + 3S 16M + 6S 16wM + (5w + 1)S
When a point Q is computed using the distortion map [BKLS02, Jou02], we can make the computation of the Tate pairing much faster. One reason is that we can choose Q = (x, y), where one of x, y is the element of subgroup IFq . The other reason is that we do not have to generate a point Q ∈ E(IFq2 ) — the point Q is easily converted from a point in E(IFq ). We estimate the running time of TADD, TDBL and w-iTDBL under the assumption S ∈ E(IFp ). If we use the distortion map, condition |(p − 1) Φ is automatically satisfied. We denote by TADDΦ S∈E(IFp ) , TDBLS∈E(IFp ) , and wiTDBLΦ S∈E(IFp ) , the computation time of TADD, TDBL, and w-iTDBL for S ∈ E(IFp ) with torsion map Φ, respectively. Because the x-coordinate of Φ(Q + S) is an element of IFp , l2add (Q + S) ∈ IFp holds. Thus we do not have to compute l2add (Q + S) due to |(p − 1). We can update f = a/b as follows: a = a × l1add (Q + S). Here we have a representation l1add (Q + S) = gy + h for some g, h ∈ IFp . The evaluation of l1add (Q+S) requires 4M . The updating of a requires 1M2 = 4M . In total we need 1M2 + (2k + 10)M + 3S = 18M + 3S for computing TADDΦ S∈E(IFp ) . Φ Φ Similarly, computing TDBLS∈E(IFp ) requires 16M + 6S, and w-iTDBLS∈E(IFp ) requires 16wM + (5w + 1)S, respectively. If we implement the Tate pairing over supersingular curve y 2 = x3 + ax.
5
Comparison
In this section, we compare the computing times of the Tate pairing. In order to assure the security of the pairing based primitives, || ≥ 160 and |q k | ≥ 1024 [GHS02], where |x| denotes the bit length of x. So we used 5 pairs of parameters (k, |q|) = (2, 512), (3, 342), (4, 256), (5, 205), (6, 171). For each parameter, we randomly generate 1000 s and compute the Tate pairing with the NAF representation [IEEE]. Algorithms are as follows: (0) Straight-forward implementation, (1) Direct computation in J s , (1i) Direct computation in J s with iterated TDBL, (2) Direct computation in J s with S ∈ E(IFq ), (2i) Direct computation in J s with S ∈ E(IFq ) with iterated TDBL, (3) Direct computation in J s with S ∈ E(IFq ) and |(q − 1), (3i) Direct computation in J s with S ∈ E(IFq ) and |(q − 1) with iterated TDBL. Timing data are summarized in Table 10. We assume that Mk = k 2 M, Sk = k 2 S, 1S = 0.8M in the table,
Efficient Computations of the Tate Pairing for the Large MOV Degrees
295
Table 10. Comparison of computing times
(0) (1) (1i) (2) (2i) (3) (3i) (4) (4i)
Computing times of the Tate pairing (Estimation) k = 2, |q| = 512 k = 3, |q| = 342 k = 4, |q| = 256 k = 5, |q| = 205 k 31122.1M 35413.0M 41002.6M 47290.9M 33308.4M 33677.9M 35944.5M 39135.9M 33035.4M 33495.7M 35808.1M 39026.5M 25793.0M 26828.8M 29451.0M 32841.7M 25520.1M 26646.6M 29314.6M 32732.4M 21010.5M 22719.3M 25691.7M 29284.2M 20737.6M 22537.1M 25555.2M 29174.8M 14143.2M ——– ——– ——– 13870.3M ——– ——– ——–
= 6, |q| = 171 53905.7M 42761.1M 42670.0M 36595.0M 36503.9M 33169.4M 33078.3M ——– ——–
here M denotes the computing time of a multiplication in IFq (So it is of no use to compare two computing times in different column). As k become larger, the direct computation become more efficient. If k = 6, the direct computation (1) is about 20.7% faster than the straight-forward implementation (0), and the direct computation with other techniques (3i) is about 38.6% faster. When k = 2, the direct computation looks inefficient, namely it makes the computation slower. Still our coordinate J s and the iterated TDBL work well. Indeed, if we use J s and the iterated TDBL (but not the direct computation), the estimation is 30292.6M for k = 2 which is about 2.6% faster than (0). We also give an estimated computation time (4) for a supersingular curve y 2 = x3 + ax discussed in section 5.1. In this case, the distortion map works very significantly and the computing time is very short. Still our iterated TDBL makes it about 2.0% faster.
6
Concluding Remarks
We proposed several improvements of computing the Tate pairing of the elliptic curves over finite fields IFq with (q = pm , p > 3). The proposed algorithms can be applicable not only the supersingular curves but also general elliptic curves. The proposed methods are specially effective upon the elliptic curves that has large MOV degree (k > 2). For k = 6, the proposed scheme is about 20% faster than the previously fastest algorithm.
Acknowledgement The authors would like to thank anonymous refrees for their helpful comments and suggestions.
296
Tetsuya Izu and Tsuyoshi Takagi
References [BF01]
D. Boneh, and M. Franklin, ”Identity-based encryption from the Weil pairing”, CRYPTO 2001, LNCS 2139, pp.213-229, Springer-Verlag, 2001. 284 [BKLS02] P. Barreto, H. Kim, B. Lynn, and M. Scott, ”Efficient Algorithms for Pairing-Based Cryptosystems”, CRYPTO 2002, LNCS 2442, pp.354-368, Springer-Verlag, 2002. 284, 292, 293, 294 [BLS01] D. Boneh, B. Lynn, and H. Shacham, ”Short Signatures from the Weil Pairing”, ASIACRYPT 2001, LNCS 2248, pp.514-532, Springer-Verlag, 2001. 284 [Cop83] D. Coppersmith, ”Evaluating Logarithms in GF (2n )”, STOC 1984, pp.201207, 1983. 284 [CMO98] H. Cohen, A. Miyaji and T. Ono, ”Efficient elliptic curve exponentiation using mixed coordinates”, Asiacrypt’98, LNCS 1514, pp.51-65, SpringerVerlag, 1998. 285 [DEM02] R. Dunport, A. Enge, and F. Morain, ”Building curves with arbitrary small MOV degree over finite prime fields”, Cryptology ePrint Archive, Report 2002/094, 2002. 284 [FMR99] G. Frey, M. M¨ uller, and H. R¨ uck, ”The Tate pairing and the discrete logarithm applied to elliptic curve cryptosystems”, IEEE Trans. on Information Theory, vol.45, pp.1717-1718, 1999. 283 [Gal01] S. D. Galbraith, ”Supersingular Curves in Cryptography”, Asiacrypt 2001, LNCS 2248, pp.495-513, Springer-Verlag, 2001. 284 [GHS02] S. D. Galbraith, K. Harrison, and D. Soldera, ”Implementing the Tate pairing”, ANTS V, LNCS 2369, pp.324-337, Springer-Verlag, 2002. 284, 287, 291, 292, 294 [Hes02] F. Hess, ”Exponent Group Signature Schemes and Efficient Identity Based Signature Schemes Based on Pairings”, Cryptology ePrint Archive, Report 2002/012, 2002. 284 [IEEE] IEEE P1363, Standard Specifications for Public-Key Cryptography, 2000. 283, 284, 286, 294 [ITTTK99] K. Itoh, M. Takenaka, N. Torii, S. Temma, and Y. Kurihara, ”Fast Implementation of Public-Key Cryptography on DSP TMS320C6201”, CHES’99, LNCS 1717, pp.61-72, 1999. 291 [Jou00] A. Joux, ”A One Round Protocol for Tripartite Diffie-Hellman”, ANTS IV, LNCS 1838, pp.385-393, Springer-Verlag, 2000. 284 [Jou02] A. Joux, ”The Weil and Tate Pairings as Building Blocks for Public Key Cryptosystems (survey)”, ANTS V, LNCS 2369, pp.20-32, Springer-Verlag, 2002. 284, 294 [Kob87] N. Koblitz, ”Elliptic curve cryptosystems”, Math. of Comp., vol.48, pp.203209, 1987. 283 [Men93] A. Menezes, ”Elliptic Curve Public Key Cryptosystems”, Kluwer Academic Publishers, 1993. 288, 293 [Mil86] V. Miller, ”Use of elliptic curves in cryptography”, CRYPTO’85, LNCS 218. p.417-426, Springer-Verlag, 1986. 283 [MNT01] A. Miyaji, M. Nakabayashi, and S. Takano, ”New explicit conditions of elliptic curve traces for FR-reduction”, IEICE Trans. Fundamentals, E84A(5), May, 2001. 284 [MOV93] A. Menezes, T. Okamoto, and S. Vanstone, ”Reducing Elliptic Curve Logarithms to Logarithms in a Finite Field”, IEEE Trans. on Information Theory, vol.39, pp.1639-1646, 1993. 283, 287
Efficient Computations of the Tate Pairing for the Large MOV Degrees [NIST]
[OP01]
[Pat02] [PS02] [Sma01]
[SEC] [SOK00]
[SW02]
[Ver01]
297
National Institute of Standards and Technology, Recommended Elliptic Curves for Federal Government Use, in the appendix of FIPS 186-2. 283, 284 T. Okamoto, P. Pointcheval, ”The Gap Problems: a new class of problems for the security of cryptographic primitives”, PKC 2001, LNCS 1992, pp.104-118, Springer-Verlag, 2001. 284 K. G. Paterson, ”ID-based Signatures from Pairings on Elliptic Curves”, Cryptology ePrint Archive, Report 2002/004, 2002. 284 D. Page, and N. Smart, ”Hardware Implementation of Finite Fields of Characteristic Three”, to appear in the proceedings of CHES 2002. 284 N. P. Smart, ”An Identity Based Authentificated Key Agreement Protocol Based on the Weil Pairing”, Cryptology ePrint Archive, Report 2001/111, 2001. 284 Standards for Efficient Cryptography Group (SECG), Specification of Standards for Efficient Cryptography. http://www.secg.org. 283, 284 R. Sakai, K. Ohgishi, and M. Kasahara, ”Cryptosystems Based on Pairing”, 2000 Symposium on Cryptography and Information Security (SCIS 2000), Okinawa, Japan, Jan. 26-28, 2000. 284 N. P. Smart, and J. Westwood, ”Point Multiplication on Ordinary Elliptic Curves over Fields of Characteristic Three”, Cryptology ePrint Archive, Report 2002/114, 2002. 284 E. R. Verheul, ”Self-Blindable Credential Certificates from the Weil pairing”, ASIACRYPT 2001, LNCS 2248, pp.533-551, Springer-Verlag, 2001. 284
Improved Techniques for Fast Exponentiation Bodo M¨ oller Technische Universit¨ at Darmstadt, Fachbereich Informatik [email protected]
Abstract. We present improvements to algorithms for efficient exponentiation. The fractional window technique is a generalization of the sliding window and window NAF approach; it can be used to improve performance in devices with limited storage. Window NAF splitting is an efficient technique for exponentiation with precomputation for fixed bases in groups where inversion is easy (e.g. elliptic curves).
1
Introduction
Many schemes in public key cryptography require computing powers ge (exponentiation) or power products
e
gj j
1≤j≤k
(multi-exponentiation) in a commutative semigroup G with neutral element 1G , e.g. in the group (Z/nZ)∗ or more generally in the multiplicative semigroup (Z/nZ) for some integer n, or in the group of rational points on an elliptic curve over a finite field. The exponents e, ej are positive integers with a typical length of a few hundred or a few thousand bits. Bases g, gj ∈ G sometimes are fixed between many computations. With fixed bases, it is often advantageous to perform a single time a possibly relatively expensive precomputation in order to prepare a table that can be used to speed up exponentiations involving those bases. (For multi-exponentiation, some of the bases may be fixed while others are variable: for example, verifying a DSA [11] or ECDSA [1] signature involves computing the product of two powers where one of the bases is part of domain parameters that can be shared between a large number of signers while the other base is specific to a single signer.) In this paper, we look at efficient algorithms for exponentiation and multiexponentiation based on either just multiplication in the given semigroup or optionally, in the case of a group, on multiplication and division. This amounts to constructing addition chains or addition-subtraction chains for the exponent e for exponentiation, and to constructing vector addition chains or vector addition-subtraction chains for the vector of exponents (e1 , . . ., ek ) for multiexponentiation (see e.g. the survey [4]). P.J. Lee and C.H. Lim (Eds.): ICISC 2002, LNCS 2587, pp. 298–312, 2003. c Springer-Verlag Berlin Heidelberg 2003
Improved Techniques for Fast Exponentiation
299
For purposes of performance analysis, we distinguish between squarings and general multiplications, as the former can often be implemented more efficiently. If we allow division, our performance analysis does not distinguish between divisions and multiplications; this is reasonable e.g. for point groups of elliptic curves, where inversion is almost immediate. If inversion is expensive, the group should be treated as a semigroup, i.e. inversion should be avoided. Section 2 gives a framework for exponentiation algorithms. In section 3, we show how it can be adapted to multi-exponentiation by using interleaved exponentiation. In section 4, we describe within the framework the sliding window exponentiation method and the window NAF exponentiation method. We then present improvements to the state of the art: section 5 describes fractional windows, a technique that closes a gap in the sliding window and window NAF methods and is useful for devices with limited storage; section 6 describes window NAF splitting, a technique for exponentiation with precomputation for fixed bases in groups where inversion is easy. Then, in section 7, we discuss how the exponent representations employed by our techniques can be implemented with small memory overhead. Finally, section 8 gives our conclusions. 1.1
Notation
If c is a non-negative integer, LSBm (c) = c mod 2m is the integer formed by the m least significant bits of c, and LSB(c) = LSB1 (c). When writing digits, we use the convention that b denotes a digit of value −b where b is understood to be a positive integer; for example, 1012 = 22 − 20 = 3.
2
A Framework for Exponentiation
Many algorithms for computing g e for arbitrary large integers e fit into one of two variants of a common framework, which we describe in this section. Exponents e are represented in base 2 as e= bi · 2i , 0≤i≤
using digits bi ∈ B ∪ {0} where B is some set of integers with 1 ∈ B. We call this a B-representation of e. Details of it are intrinsic to the specific exponentiation method. (Note that for given B, B-representations are usually not canonical.) The elements of B must be non-negative unless G is a group where inversion is possible in reasonable time. Given a B-representation, left-to-right or right-to-left methods can be used. Left-to-right methods look at the elements of bi starting at b and proceed down to b0 ; right-to-left methods start at b0 and proceed up to b . Depending on how the values bi can be obtained from an input value e, it may be easy to compute them on the fly instead of storing the B-representation beforehand. Left-to-right methods and right-to-left methods can be considered dual to each other (cf. the duality observation for representations of arbitrary addition chains as directed multi-graphs in [6, p. 481]); both involve two stages.
300
2.1
Bodo M¨ oller
Left-to-Right Methods
For left-to-right methods, first, in the precomputation stage, powers g b for all b ∈ B are computed and stored; if division in G is permissible and |b| ∈ B for each b ∈ B, then it suffices to precompute g b for those b ∈ B that are positive. We refer to this collection of precomputed powers g b as the precomputed table. How to implement the precomputation stage efficiently depends on the specific choice of B. In certain semigroups, in order to accelerate the evaluation stage, precomputed elements can be represented in a special way such that multiplications with these elements take less time (for example, precomputed points on an elliptic curve may be converted from projective into affine representation [3]). Note that if both the base element g and the digit set B are fixed, then the precomputation stage need not be repeated for multiple exponentiations if the precomputed table is kept in memory. In cases without such fixed precomputation, B is usually a set consisting of small integers such that the precomputation stage requires only a moderate amount of time. If B = 1, 3, . . ., β or B = ± 1, ±3, . . ., ±β with β ≥ 3 odd, the precomputation stage can be implemented with one squaring and (β − 1)/2 multiplications as follows: first compute g 2 ; then iteratively compute g 3 = g · g 2 , . . . , g β = g β−2 · g 2 . This applies to all encoding techniques we will present in later sections. In the evaluation stage (or left-to-right stage) of a left-to-right method, given the precomputed table and the representation of e as digits bi , the following algorithm is executed to compute the desired power from the precomputed elements g b . A ← 1G for i = down to 0 do A ← A2 if bi = 0 then A ← A · g bi return A If division is permissible, the following modified algorithm can be used: A ← 1G for i = down to 0 do A ← A2 if bi = 0 then if bi > 0 then A ← A · g bi else A ← A /g |bi | return A Note that in these algorithms squarings can be omitted while A is 1G ; similarly, the first multiplication or division can be replaced by an assignment or an assignment followed by inversion of A.
Improved Techniques for Fast Exponentiation
2.2
301
Right-to-Left Methods
For right-to-left methods, no precomputed elements are used. Instead, first the right-to-left stage yields values in a number of accumulators Ab , one for each positive element b ∈ B. If division is permissible, B may contain negative digits; we require that |b| ∈ B for each b ∈ B. Second, the result stage combines the accumulator values to obtain the final result. The following algorithm description comprises both stages, but the result stage is condensed into just the “return” line: how to implement it efficiently depends on the specific choice of B. For brevity, we show just the algorithm with division (if B does not contain negative digits, the “else”-branch will never be taken and can be left out). {right-to-left stage} for b ∈ B do if b > 0 then Ab ← 1G A←g for i = 0 to do = 0 then if bi if bi > 0 then Abi ← Abi · A else A|bi | ← A|bi | /A A ← A2 {result stage} return b∈B Ab b b>0
The squaring operation may be omitted in the final iteration as the resulting value of A will not be used. For each Ab , the first multiplication or division can be replaced by an assignment or an assignment followed by inversion (implementations can use flags to keep track which of the Ab still contain the values 1G ). If B = 1, 3, . . ., β or B = ± 1, ±3, . . ., ±β with β odd (as in all encoding techniques we will present in later sections), the result stage can be implemented as follows ([19], [6, exercise 4.6.3-9]): for b = β to 3 step −2 do Ab−3 ← Ab−3 · Ab A1 ← A1 · A2b return A1 This algorithm requires (β − 1)/2 squarings and β − 1 multiplications.
3 Multi-exponentiation by Interleaved Exponentiation
Let a B_j-representation

\[ e_j = \sum_{0 \le i \le \ell_j} b_{j,i} \cdot 2^i \]

be given for each of the exponents in a power product

\[ \prod_{1 \le j \le k} g_j^{e_j}, \]

where each B_j is a digit set as in section 2. Then the multi-exponentiation can be performed by interleaving the left-to-right algorithms for the individual exponentiations g_j^{e_j} [10]. For each j, precomputed elements g_j^b are needed as in section 2.1. Let ℓ be the maximum of the ℓ_j. We may assume that ℓ = ℓ_1 = ... = ℓ_k (pad with leading zeros where necessary). If division is permissible, interleaved exponentiation to compute ∏_{1≤j≤k} g_j^{e_j} can be performed as follows:

  A ← 1_G
  for i = ℓ down to 0 do
    A ← A^2
    for j = 1 to k do
      if b_{j,i} ≠ 0 then
        if b_{j,i} > 0 then A ← A · g_j^{b_{j,i}}
        else A ← A / g_j^{|b_{j,i}|}
  return A

As in section 2.1, initial squarings can be omitted while A is 1_G, and the first multiplication or division can be replaced by an assignment possibly followed by inversion. The algorithm variant without division is obvious.
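A Python sketch of the interleaving idea for non-negative digits (our illustration; the division branch is omitted):

```python
def interleaved_multi_exp(digit_rows, tables, modulus):
    """Computes prod_j g_j^{e_j} mod `modulus` by interleaved
    exponentiation: `digit_rows[j]` holds the digits b_{j,l}, ..., b_{j,0}
    of e_j (all rows padded to the same length, most significant first)
    and `tables[j][b]` holds the precomputed power g_j^b.  Only one
    squaring per digit position is shared among all k exponents."""
    A = 1
    for column in zip(*digit_rows):       # i = l down to 0
        A = (A * A) % modulus             # single shared squaring
        for j, b in enumerate(column):
            if b != 0:
                A = (A * tables[j][b]) % modulus
    return A

# g1^3 * g2^2 with g1 = 2, g2 = 3 in plain binary digits
assert interleaved_multi_exp([[1, 1], [1, 0]], [{1: 2}, {1: 3}], 1000003) == 72
```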
4 Sliding Window Exponentiation and Window NAF Exponentiation
A well-known method for exponentiation in semigroups is the sliding window technique (cf. [18, p. 912] and [4, section 3]). The encoding is based on a parameter w, a small positive integer called the window size. The digit set is B = {1, 3, ..., 2^w − 1}. Encodings using these digits can be computed on the fly by scanning the ordinary binary representation of the exponent either in left-to-right or in right-to-left direction: in the respective direction, repeatedly look out for the first non-zero bit and then examine the sequence of w bits starting at this bit position; one of the odd digits in B suffices to cover these w bits. For example, given e = 88 = 1011000₂, left-to-right scanning using window size w = 3 yields

  (101)1000₂ → 51000₂,
and right-to-left scanning, also using window size w = 3, yields

  1(011)000₂ → 1003000₂.
The average density of non-zero digits in the resulting representation

\[ e = \sum_{0 \le i \le \ell} b_i \cdot 2^i \]

is 1/(w + 1) for e → ∞. The length is at most that of the binary representation, i.e. a maximum index ℓ suffices to represent any (ℓ + 1)-bit exponent.

Including negative digits into B allows decreasing the average density: a {±1}-representation such that no two adjacent digits are non-zero ("property M" from [13]) is called a non-adjacent form or NAF. More generally, let

  B = {±1, ±3, ..., ±(2^w − 1)};

then the following algorithm (from [17]) generates a B-representation of e such that at most one of any w + 1 consecutive digits is non-zero. There is a unique representation with this property, the width-(w + 1) NAF of e. We use the term window NAF (wNAF) if w is understood. This idea is also known as the signed window approach; w + 1 can be considered the window size.

  c ← e
  i ← 0
  while c > 0 do
    if LSB(c) = 1 then
      b ← LSB_{w+1}(c)
      if b ≥ 2^w then b ← b − 2^{w+1}
      c ← c − b
    else
      b ← 0
    b_i ← b; i ← i + 1
    c ← c/2
  return b_{i−1}, ..., b_0

Width-(w + 1) NAFs have an average density of 1/(w + 2) for e → ∞ ([15], [16], [9], [17]). Compared with the binary representation, the length can grow by one at most, so a maximum index ℓ is sufficient to represent any ℓ-bit exponent.

For left-to-right exponentiation using the sliding window or window NAF technique, the precomputation stage has to compute g^b for b ∈ {1, 3, ..., 2^w − 1}, which for w > 1 can be achieved with one squaring and 2^{w−1} − 1 multiplications (see section 2.1). For right-to-left exponentiation using the sliding window or window NAF technique, the result stage has to compute

\[ \prod_{b \in \{1, 3, \ldots, 2^w - 1\}} A_b^b \]
given accumulator values A_b resulting from the right-to-left stage. This can be done in 2^{w−1} − 1 squarings and 2^w − 2 multiplications (see section 2.2).
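The digit-generation loop above translates directly into Python; the following sketch (ours) returns the digits least significant first:

```python
def wnaf(e, w):
    """Width-(w+1) NAF of e >= 0: at most one of any w+1 consecutive
    digits is non-zero, and every non-zero digit is odd with
    |b| <= 2^w - 1.  Digits are returned least significant first."""
    digits, c = [], e
    while c > 0:
        if c & 1:                         # LSB(c) = 1
            b = c % (1 << (w + 1))        # b <- LSB_{w+1}(c)
            if b >= 1 << w:               # if b >= 2^w
                b -= 1 << (w + 1)         #   b <- b - 2^{w+1}
            c -= b
        else:
            b = 0
        digits.append(b)
        c >>= 1                           # c <- c/2
    return digits

# the width-2 NAF of 7 is 100(-1)_2
assert wnaf(7, 1) == [-1, 0, 0, 1]
```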
4.1 Modified Window NAFs
The efficiency of exponentiation given a B-representation depends on the number of non-zero digits and the length of the representation (i.e. the minimum index I such that b_i = 0 for i ≥ I). Window NAFs may have increased length compared with the ordinary binary representation: e.g., the (width-2) NAF for 3 = 11₂ is 10(−1)₂, and the NAF for 7 = 111₂ is 100(−1)₂. Such length expansion can easily be avoided in about half of the cases, and thus exponentiation made more efficient, by weakening the non-adjacency property (cf. [2]). A modified window NAF is a B-representation obtained from a window NAF as follows: if the w + 2 most significant digits (ignoring any leading zeros) have the form

  1 0 0 ... 0 (−b)   (w zeros, b > 0),

then substitute

  0 1 0 ... 0 β   (w − 1 zeros),

where β = 2^w − b. In the above example, we obtain that the modified (width-2) NAF for 3 is 11₂. However, the modified NAF for 7 is still 100(−1)₂: in this case, length expansion cannot be avoided without increasing the number of non-zero digits.
5 Fractional Windows
In small devices, the choice of w for exponentiation using the sliding window or window NAF technique described in section 4 may be dictated by memory limitations. The exponentiation algorithms given in section 2 need storage for 1 + 2^{w−1} elements of G, and thus memory may be wasted: e.g., if sufficient storage is available for up to four elements, only three elements can actually be used (w = 2). In this section, we show how the efficiency of exponentiation can be improved by using fractional windows, a generalization of the sliding window and window NAF techniques. We describe this new encoding technique first for the case that negative digits are allowed (signed fractional windows). We then describe a simpler variant for the case that only non-negative digits are permissible (unsigned fractional windows).

5.1 Signed Fractional Windows
Let w ≥ 2 be an integer and m an odd integer such that 1 ≤ m ≤ 2^w − 3. The digit set for the signed fractional window representation with these parameters is

  B = {±1, ±3, ..., ±(2^w + m)}.
Let the mapping digit: {0, 1, ..., 2^{w+2}} → B ∪ {0} be defined as follows:

  – If x is even, then digit(x) = 0;
  – otherwise, if 0 < x ≤ 2^w + m, then digit(x) = x;
  – otherwise, if 2^w + m < x < 3 · 2^w − m, then digit(x) = x − 2^{w+1};
  – otherwise, we have 3 · 2^w − m ≤ x < 2^{w+2} and let digit(x) = x − 2^{w+2}.
Observe that if x is odd, then x − digit(x) ∈ {0, 2^{w+1}, 2^{w+2}}. The following algorithm encodes e into signed fractional window representation:

  d ← LSB_{w+2}(e)
  c ← ⌊e/2^{w+2}⌋
  i ← 0
  while d ≠ 0 ∨ c ≠ 0 do
    b ← digit(d)
    b_i ← b; i ← i + 1
    d ← d − b
    d ← LSB(c) · 2^{w+1} + ⌊d/2⌋
    c ← ⌊c/2⌋
  return b_{i−1}, ..., b_0

This algorithm is a direct variant of the window NAF generation algorithm shown in section 4, but based on the new mapping digit. Here we have expressed the algorithm in a way that shows that the loop is essentially a finite state machine (with 2^{w+1} + 1 states for storing, after b has been subtracted from the previous value of d, the even number d with 0 ≤ d ≤ 2^{w+2}); new bits taken from c are considered input symbols and the generated digits b_i are considered output symbols.

The average density achieved by the signed fractional window representation with parameters w and m is

\[ \frac{1}{w + \frac{m+1}{2^w} + 2} \]

for e → ∞. (Assume that an endless sequence of random bits is the input to the finite state machine described above: whenever it outputs a non-zero digit, the intermediate value d mod 2^{w+2} consists of w + 1 independent random bits plus the least significant bit, which is necessarily set. Thus with probability p = 1/2 − (m+1)/2^{w+1}, we have d − digit(d) = 2^{w+1}, which implies that the next non-zero output digit will follow after exactly w intermediate zeros; and with probability 1 − p, we have d − digit(d) ∈ {0, 2^{w+2}}, which implies that the next non-zero output digit will follow after w + 2 intermediate zeros on average. Thus the total average for the number of intermediate zeros is p · w + (1 − p) · (w + 2) = w + (m+1)/2^w + 1, which yields the above expression for the density.) Comparing this with the 1/(w + 2) density for width-(w + 1) NAFs, we see that the effective window size has been increased by (m + 1)/2^w, which is why we speak of "fractional windows".
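A Python sketch of the mapping digit and the encoding loop (our illustration of the algorithm above):

```python
def digit_signed(x, w, m):
    # the mapping digit: {0, 1, ..., 2^{w+2}} -> B u {0} defined above
    if x % 2 == 0:
        return 0
    if x <= (1 << w) + m:
        return x
    if x < 3 * (1 << w) - m:
        return x - (1 << (w + 1))
    return x - (1 << (w + 2))

def signed_fractional(e, w, m):
    """Signed fractional window representation of e >= 0 with
    parameters w and m; digits are returned least significant first."""
    mod = 1 << (w + 2)
    d, c, digits = e % mod, e // mod, []  # d <- LSB_{w+2}(e)
    while d != 0 or c != 0:
        b = digit_signed(d, w, m)
        digits.append(b)
        d -= b
        d = (c & 1) * (1 << (w + 1)) + d // 2
        c >>= 1
    return digits

# the digits must re-sum to e and lie in {0} u +-{1, 3, ..., 2^w + m}
e, w, m = 88, 2, 1
digs = signed_fractional(e, w, m)
assert sum(b << i for i, b in enumerate(digs)) == e
```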
As in section 4.1, length expansion can be avoided in many cases by modifying the representation. The modified signed fractional window representation is obtained as follows: if the w + 2 most significant digits are of the form

  1 0 0 ... 0 (−b)   (w zeros, b > 0),

then substitute

  0 1 0 ... 0 β   (w − 1 zeros),

where β = 2^w − b; if the w + 3 most significant digits are of the form

  1 0 0 ... 0 (−b)   (w + 1 zeros)

with b > 2^w, then substitute

  0 1 0 ... 0 β   (w zeros),

where β = 2^{w+1} − b; and if the w + 3 most significant digits are of the form

  1 0 0 ... 0 (−b)   (w + 1 zeros)

with b < 2^w, then substitute

  0 0 3 0 ... 0 β   (w − 1 zeros),

where β = 2^w − b.

Precomputation for left-to-right exponentiation can be done in one squaring and 2^{w−1} + (m − 1)/2 multiplications (see section 2.1), and the result stage for right-to-left exponentiation can be implemented in 2^{w−1} + (m − 1)/2 squarings and 2^w + m − 1 multiplications (see section 2.2).

Table 1 shows expected performance figures for left-to-right exponentiation using the signed fractional window method in comparison with the usual window NAF method for 160-bit scalars; a typical application is elliptic curve cryptography. The signed fractional window method with w = 2, m = 1 achieves an evaluation stage speed-up of about 2.3 % compared with the window NAF method with w = 2, assuming that squarings take as much time as general multiplications. (When projective coordinates are used for representing points on elliptic curves, squarings are in fact usually faster, which will increase the relative speed-up.) Table 2 is for right-to-left exponentiation; it takes into account the optimizations to the right-to-left stage noted in section 2.2. The table shows that at this exponent bit length, for w = 3 fractional windows bring hardly any advantage for right-to-left exponentiation, due to the relatively high computational cost of the result stage.
Table 1. Left-to-right exponentiation with window NAFs or signed fractional windows, ℓ = 160

                           w = 2              w = 3                              w = 4
                     wNAF   s. fract.  wNAF   s. fract.  s. fract.  s. fract.   wNAF
                             m = 1              m = 1      m = 3      m = 5
  precomputation stage:
    table entries       2       3         4       5          6          7          8
    squarings           1       1         1       1          1          1          1
    multiplications     1       2         3       4          5          6          7
  evaluation stage:
    squarings       ≤ 160   ≤ 160     ≤ 160   ≤ 160      ≤ 160      ≤ 160      ≤ 160
    multiplications ≈ 40.0  ≈ 35.6    ≈ 32.0  ≈ 30.5     ≈ 29.1     ≈ 27.8     ≈ 26.7
Table 2. Right-to-left exponentiation with window NAFs or signed fractional windows, ℓ = 160

                           w = 2              w = 3                              w = 4
                     wNAF   s. fract.  wNAF   s. fract.  s. fract.  s. fract.   wNAF
                             m = 1              m = 1      m = 3      m = 5
  right-to-left stage:
    squarings       ≤ 160   ≤ 160     ≤ 160   ≤ 160      ≤ 160      ≤ 160      ≤ 160
    multiplications ≈ 39.0  ≈ 33.6    ≈ 29.0  ≈ 26.5     ≈ 24.1     ≈ 21.8     ≈ 19.7
  result stage:
    input variables     2       3         4       5          6          7          8
    squarings           1       2         3       4          5          6          7
    multiplications     2       4         6       8         10         12         14
5.2 Unsigned Fractional Windows
The unsigned fractional window representation uses the digit set B = {1, 3, ..., 2^w + m} and can be obtained by a variant of the technique from section 5.1. Here, let the mapping digit: {0, 1, ..., 2^{w+1}} → B ∪ {0} be defined as follows:

  – If x is even, then digit(x) = 0;
  – otherwise, if 0 < x ≤ 2^w + m, then digit(x) = x;
  – otherwise, let digit(x) = x − 2^w.

If x is odd, then x − digit(x) ∈ {0, 2^w}. The following algorithm encodes e into unsigned fractional window representation:
  d ← LSB_{w+1}(e)
  c ← ⌊e/2^{w+1}⌋
  i ← 0
  while d ≠ 0 ∨ c ≠ 0 do
    b ← digit(d)
    b_i ← b; i ← i + 1
    d ← d − b
    d ← LSB(c) · 2^w + ⌊d/2⌋
    c ← ⌊c/2⌋
  return b_{i−1}, ..., b_0

Similarly to the signed case, it can be seen that the average density of the unsigned fractional window representation is

\[ \frac{1}{w + \frac{m+1}{2^w} + 1} \]

for e → ∞. The precomputation and result stages are as before.

Table 3 shows expected performance figures for left-to-right exponentiation using the unsigned fractional window method in comparison with the usual sliding window method for 1024-bit exponents; a typical application is exponentiation in the multiplicative semigroup (Z/nZ) for an integer n. If squarings take as much time as general multiplications, the unsigned fractional window method with w = 2, m = 1 is approximately 3.7 % faster than the sliding window method with w = 2. Table 4 shows the figures for right-to-left exponentiation, taking into account the optimizations to the right-to-left stage noted in section 2.2.
5.3 Example: Application to Multi-exponentiation
Assume we have to compute a power product g_1^{e_1} g_2^{e_2} with random ℓ-bit exponents e_1, e_2 in a group where inversion is easy, and that we have storage for five precomputed elements. Using interleaved exponentiation as described in section 3, we can represent e_1 as a width-3 NAF and e_2 in signed fractional window representation with w = 2, m = 1. This means we use the precomputed elements g_1, g_1^3, g_2, g_2^3, g_2^5. The evaluation stage needs at most ℓ squarings and approximately

\[ \frac{\ell}{4} + \frac{\ell}{4 + 1/2} = \frac{17}{36}\,\ell \]

multiplications on average, compared with (1/2)ℓ multiplications for interleaved exponentiation with width-3 NAFs for both exponents (precomputed elements g_1, g_1^3, g_2, g_2^3). (A similar scenario is considered in [14], using a different multi-exponentiation algorithm; for groups where inversion is easy, that technique using the same amount of storage as needed in our above example runs slightly slower according to the heuristic results in [14, table 12].)
Table 3. Left-to-right exponentiation with sliding windows or unsigned fractional windows, ℓ = 1023

                            w = 2               w = 3                                  w = 4
                   slid. w.  u. fract.  slid. w.  u. fract.  u. fract.  u. fract.   slid. w.
                              m = 1                 m = 1      m = 3      m = 5
  precomputation stage:
    table entries      2        3           4         5          6          7           8
    squarings          1        1           1         1          1          1           1
    multiplications    1        2           3         4          5          6           7
  evaluation stage:
    squarings      ≤ 1023   ≤ 1023      ≤ 1023    ≤ 1023     ≤ 1023     ≤ 1023      ≤ 1023
    multiplications ≈ 341.0 ≈ 292.3     ≈ 255.8   ≈ 240.7    ≈ 227.3    ≈ 215.4     ≈ 204.6
Table 4. Right-to-left exponentiation with sliding windows or unsigned fractional windows, ℓ = 1023

                            w = 2               w = 3                                  w = 4
                   slid. w.  u. fract.  slid. w.  u. fract.  u. fract.  u. fract.   slid. w.
                              m = 1                 m = 1      m = 3      m = 5
  right-to-left stage:
    squarings      ≤ 1023   ≤ 1023      ≤ 1023    ≤ 1023     ≤ 1023     ≤ 1023      ≤ 1023
    multiplications ≈ 340.0 ≈ 290.3     ≈ 252.8   ≈ 236.7    ≈ 222.3    ≈ 209.4     ≈ 197.6
  result stage:
    input variables     2        3          4         5          6          7           8
    squarings           1        2          3         4          5          6           7
    multiplications     2        4          6         8         10         12          14
6 Window NAF Splitting
One approach for efficient exponentiation with precomputation for fixed bases, given an upper bound ℓ + 1 for exponent bit lengths and a positive integer parameter v, is to turn exponentiations into multi-exponentiations by using exponent splitting as follows [12]:

\[ g^e = \prod_{0 \le i < \lceil (\ell+1)/v \rceil} \left( g^{2^{iv}} \right)^{e[iv+v-1 \,\ldots\, iv]} \]
Here e[j ... j′] denotes the integer whose binary representation is the concatenation of bits j down to j′ of e (i.e. ⌊e/2^{j′}⌋ mod 2^{j−j′+1}). For groups where inversion is easy, [10] proposes to use this approach with window NAF based interleaved exponentiation: that is, each of the length-v exponent parts is encoded as a window NAF as described in section 4, and then an interleaved exponentiation using these window NAFs is performed as described in section 3. With width-(w + 1) NAFs, this computation should take about v squarings and ℓ/(w + 2) multiplications, using ⌈(ℓ + 1)/v⌉ · 2^{w−1} precomputed elements. However, if v is very small, the expected number of multiplications will be noticeably higher, because the estimate that the density of window NAFs is approximately 1/(w + 2) becomes accurate only if the encoded number is sufficiently long. (Window NAFs usually waste part of one window; the more individual integers must be encoded into window NAFs, the more is wasted in total.)
An improved technique that avoids this drawback is window NAF splitting. Instead of splitting the binary representation of exponent e into partial exponents of length v and determining window NAFs for these, we first determine the window NAF of e and then split this new representation into parts of length v (a sketch follows below). The computation continues as above, using the interleaved exponentiation algorithm shown in section 3. To avoid length expansion where possible, this technique should be used with modified window NAFs (section 4.1). The leftmost part can be made larger than the others if one more part would have to be added otherwise; e.g. for integers up to 160 bits with v = 8:

  b160 b159 ··· b152 | b151 ··· b144 | ··· | b7 ··· b0
     (9 digits)         (8 digits)          (8 digits)
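A Python sketch of the splitting step (ours; the enlarged leftmost part and the switch to modified window NAFs are omitted for brevity):

```python
def wnaf(e, w):
    # width-(w+1) NAF encoder, least significant digit first (see section 4)
    digits, c = [], e
    while c > 0:
        if c & 1:
            b = c % (1 << (w + 1))
            if b >= 1 << w:
                b -= 1 << (w + 1)
            c -= b
        else:
            b = 0
        digits.append(b)
        c >>= 1
    return digits

def wnaf_split(e, w, v):
    """Window NAF splitting: first encode e as a window NAF, then cut
    the digit string into parts of v digits.  Part j holds digits
    b_{jv}, ..., b_{jv+v-1} and serves as the digit string of the
    fixed base g^(2^(jv)) in the interleaved algorithm of section 3."""
    digits = wnaf(e, w)
    digits += [0] * (-len(digits) % v)    # pad to a multiple of v
    return [digits[j:j + v] for j in range(0, len(digits), v)]
```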
Most of the time, the additional digit of the leftmost part will be zero, since length expansion is relatively rare (for modified window NAFs of positive integers up to a length of ℓ bits with w = 4, only about one out of five cases has a non-zero digit at maximum index ℓ). With window NAF splitting, exponentiations for ℓ-bit exponents can be performed in v − 1 squarings and on average about ℓ/(w + 2) multiplications, using ⌈(ℓ + 1)/v⌉ · 2^{w−1} precomputed elements. If the leftmost part gets an extra digit as described above, ⌈ℓ/v⌉ · 2^{w−1} precomputed elements are sufficient, and the number of squarings goes up to v in some cases.

This method can compete with Lim and Lee's algorithm for exponentiation with precomputation described in [8] and [7] even when much space is available for precomputed elements (whereas exponent splitting with window NAF based interleaved exponentiation is better than the Lim-Lee algorithm only for comparatively small precomputed tables). For example, if ℓ = 160, then with v = 8 and w = 4 (160 precomputed elements if we allow an extra digit in the leftmost window NAF part), our exponentiation method with window NAF splitting needs about 7.2 squarings and 26.7 multiplications. The Lim-Lee algorithm can perform such 160-bit exponentiations in 13 squarings and about 26.6 multiplications using 128 precomputed elements, or in 11 squarings and about 22.8 multiplications using 256 precomputed elements.

It is possible to use window NAF splitting with a flexible window size: while generating digits using the algorithm described in section 4, the parameter w can be changed. This should be done only at the beginning of a new part of the window NAF (i.e., when the number of digits generated so far is a multiple of v). For example, if in the ℓ = 160 setting we are using v = 8 and allowing an extra digit in the leftmost part, the (modified) window NAF will be split into 20 parts; we can start with w = 5 for the first 12 of these, then switch to w = 4 for the remaining 8. Then we need 12 · 2^4 + 8 · 2^3 = 256 precomputed elements and can perform exponentiations in about 7.2 squarings and

  12·8/(5 + 2) + 8·8/(4 + 2) ≈ 24.4

multiplications, which is usually (depending on the relative performance of squarings and general multiplications) better than the performance of the Lim-Lee algorithm with 256 precomputed elements.
7 Compact Encodings
When storing a window NAF or fractional window representation where a single digit may take w + 1 bits of memory (this is the case for width-(w + 1) NAFs if we take into account that the digit may be zero, and it is the case for signed fractional window representations), it is not necessary to store the digits separately in w + 1 bits each. If memory is scarce, it is possible to exploit the properties of the representation to obtain a more compact encoding into bit strings (cf. [5]). We can encode a zero digit as a single zero bit, and a non-zero digit as a one bit followed by a representation of the respective digit, which together takes w + 1 bits in the case of window NAFs and w + 2 bits in the case of signed fractional window representations. After each non-zero digit, there will be w zero digits (unless conversion into a modified window NAF has taken place), and these can be omitted from the encoding. Thus, compared with the usual binary representation of the number, in the case of window NAFs we only have growth by a small constant; in the case of signed fractional window representations (and similarly in the case of unsigned fractional window representations), we additionally have growth by one bit for each non-zero digit of the representation. This bit string encoding can easily be adapted to the case that the bit string will be read in the reverse of the direction in which it was written (for example, non-zero digits should be encoded as a representation of the respective digit followed by a one bit, rather than the other way around).
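One possible realization of such a compact encoding, as a Python sketch (the exact bit layout here is our own choice, for an unmodified width-(w+1) NAF):

```python
def compact_encode(digits, w):
    """Encodes a width-(w+1) NAF (digits least significant first) into
    a list of bits: a zero digit costs one 0 bit; a non-zero digit b
    costs w+1 bits (a 1 marker bit, a sign bit, and w-1 bits for
    (|b|-1)/2), after which the w guaranteed zero digits are skipped."""
    bits, i = [], 0
    while i < len(digits):
        b = digits[i]
        if b == 0:
            bits.append(0)
            i += 1
        else:
            bits.append(1)
            bits.append(1 if b < 0 else 0)            # sign bit
            val = (abs(b) - 1) // 2                   # in [0, 2^{w-1})
            bits.extend((val >> k) & 1 for k in range(w - 1))
            i += w + 1                                # skip the w zeros
    return bits
```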
8 Conclusions
We have closed a gap in the sliding window and window NAF methods for efficient exponentiation: our fractional window techniques can improve the performance by a few percent in devices with limited memory, by making use of memory that would have to remain unused with the previously known methods. With window NAF splitting, we have shown an efficient technique for exponentiation with precomputation in groups where inversion is easy, which provides a convenient alternative to the patented Lim-Lee method.
References

[1] American National Standards Institute (ANSI). Public key cryptography for the financial services industry: The elliptic curve digital signature algorithm (ECDSA). ANSI X9.62, 1998.
[2] Bosma, W. Signed bits and fast exponentiation. Department of Mathematics, University of Nijmegen, Report No. 9935, 1999.
[3] Cohen, H., Ono, T., and Miyaji, A. Efficient elliptic curve exponentiation using mixed coordinates. In Advances in Cryptology – ASIACRYPT '98 (1998), K. Ohta and D. Pei, Eds., vol. 1514 of Lecture Notes in Computer Science, pp. 51–65.
[4] Gordon, D. M. A survey of fast exponentiation methods. Journal of Algorithms 27 (1998), 129–146.
[5] Joye, M., and Tymen, C. Compact encoding of non-adjacent forms with applications to elliptic curve cryptography. In Public Key Cryptography – PKC 2001 (2001), K. Kim, Ed., vol. 1992 of Lecture Notes in Computer Science, pp. 353–364.
[6] Knuth, D. E. The Art of Computer Programming – Vol. 2: Seminumerical Algorithms (3rd ed.). Addison-Wesley, 1998.
[7] Lee, P.-J., and Lim, C.-H. Method for exponentiation in a public-key cryptosystem. United States Patent 5,999,627, 1999.
[8] Lim, C. H., and Lee, P. J. More flexible exponentiation with precomputation. In Advances in Cryptology – CRYPTO '94 (1994), Y. G. Desmedt, Ed., vol. 839 of Lecture Notes in Computer Science, pp. 95–107.
[9] Miyaji, A., Ono, T., and Cohen, H. Efficient elliptic curve exponentiation. In International Conference on Information and Communications Security – ICICS '97 (1997), Y. Han, T. Okamoto, and S. Qing, Eds., vol. 1334 of Lecture Notes in Computer Science, pp. 282–290.
[10] Möller, B. Algorithms for multi-exponentiation. In Selected Areas in Cryptography – SAC 2001 (2001), S. Vaudenay and A. M. Youssef, Eds., vol. 2259 of Lecture Notes in Computer Science, pp. 165–180.
[11] National Institute of Standards and Technology (NIST). Digital Signature Standard (DSS). FIPS PUB 186-2, 2000.
[12] Pippenger, N. On the evaluation of powers and related problems (preliminary version). In 17th Annual Symposium on Foundations of Computer Science (1976), IEEE Computer Society, pp. 258–263.
[13] Reitwiesner, G. W. Binary arithmetic. Advances in Computers 1 (1960), 231–308.
[14] Sakai, Y., and Sakurai, K. Algorithms for efficient simultaneous elliptic scalar multiplication with reduced joint Hamming weight representation of scalars. In Information Security – ISC 2002 (2002), A. H. Chan and V. Gligor, Eds., vol. 2433 of Lecture Notes in Computer Science, pp. 484–499.
[15] Schroeppel, R., Orman, H., O'Malley, S., and Spatscheck, O. Fast key exchange with elliptic curve systems. In Advances in Cryptology – CRYPTO '95 (1995), D. Coppersmith, Ed., vol. 963 of Lecture Notes in Computer Science, pp. 43–56.
[16] Solinas, J. A. An improved algorithm for arithmetic on a family of elliptic curves. In Advances in Cryptology – CRYPTO '97 (1997), B. S. Kaliski, Jr., Ed., vol. 1294 of Lecture Notes in Computer Science, pp. 357–371.
[17] Solinas, J. A. Efficient arithmetic on Koblitz curves. Designs, Codes and Cryptography 19 (2000), 195–249.
[18] Thurber, E. G. On addition chains l(mn) ≤ l(m) + l(n) − b and lower bounds for c(r). Duke Mathematical Journal 40 (1973), 907–913.
[19] Yao, A. C.-C. On the evaluation of powers. SIAM Journal on Computing 5 (1976), 100–103.
Efficient Hardware Multiplicative Inverters

Hyun-Gyu Kim¹ and Hyeong-Cheol Oh²

¹ Lab. of Parallel Computation, Bio-science Bldg. #231-B, Korea University, Seoul 136-701, Korea
  [email protected]
² School of Engineering, Korea University at Seo-Chang, Cho-Chi-Won, Chung-Nam 339-700, Korea
  [email protected]
Abstract. We propose two hardware inverters for calculating multiplicative inverses in finite fields GF(2^m): one produces a result in every O(m) time using O(m) area; the other produces a result in every O(1) time using O(m²) area. While existing O(m)-time inverters require at least two shift registers in the datapath, the proposed O(m)-time implementation uses only one, thus costing less hardware. By exploiting the idea used in the O(m)-time inverter and developing a new way of controlling the dataflow, we also design a new O(1)-time inverter that works faster but costs less hardware than the best previously proposed O(1)-time implementation with the same area-time complexity.
1 Introduction
The computation of multiplicative inverses in Galois fields is an important operation in various digital systems such as elliptic curve cryptosystems [1] and error-control codecs [2]. Since inversion is a very time-consuming operation, many researchers have reported special-purpose hardware implementations of the inversion operation, most of which adopt structures with high area complexity, such as systolic arrays, to boost the performance (e.g., [3, 4, 5] and the references therein). As the size of the field used in related applications, including public-key cryptosystems, grows larger and larger, the hardware costs of these implementations become more and more crucial.

In this paper, we investigate efficient schemes, based on the extended Euclidean algorithm over GF(2), for finding the inverse of an element of GF(2^m). We only consider the polynomial basis representation and assume that the representation of the elements is defined by a primitive polynomial F(x) = x^m + F_{m−1}x^{m−1} + ··· + F_0 of degree m over GF(2). Given a polynomial basis representation of an element A(x) = A_{m−1}x^{m−1} + ··· + A_0 in GF(2^m), we need to find an element I(x) = I_{m−1}x^{m−1} + ··· + I_0 such that I(x) = 1/A(x) mod F(x). A polynomial A of order k over GF(2) can be represented as the k-dimensional vector A = [A_{k−1}, ···, A_0] and stored in a k-bit register. As the order of the polynomials of interest becomes large, the hardware cost for the registers becomes a major factor in the implementation cost of the inverter.
The hardware cost for the registers becomes even more crucial when the registers need to be shift registers: the hardware cost (in the number of equivalent gates used) of a 163-bit basic register is about 40% less than that of a 163-bit ordinary shift register, and about 62.9% less than that of a 163-bit shift register with 4-bit barrel shifters which can shift bits in blocks of up to 4 bits. (The gate counts were estimated using Synopsys's synthesis and optimization tool [6] with a 0.5 µm CMOS standard cell library.)

In order to calculate I(x) = 1/A(x) mod F(x), we can implement the extended Euclidean algorithm using four polynomials R⁻, R⁰, U⁻, and U⁰, as follows [3]:

  initial conditions: R⁻ ← F(x), R⁰ ← A(x), U⁻ ← 0, U⁰ ← 1;
  iteration: R⁻ ← R⁻ − (R⁻ div R⁰)R⁰, U⁻ ← U⁻ − (R⁻ div R⁰)U⁰;
             R⁻ ↔ R⁰, U⁻ ↔ U⁰;

The iteration stops when R⁰ = 0, at which point U⁻ = I(x). (This is the case when the condition is tested at the end of each iteration stage; in our O(m)-time implementation, we test whether R⁻ = 0, and then U⁰ = I(x), before the swap operations begin.) In this paper, we use the semicolon to distinguish operations that cannot be performed simultaneously, while we use the comma to separate other operations.

The modulo operation included in the above algorithm can be performed using polynomial division. Brunner et al. [3] proposed to implement the polynomial division by finding the leading ones in R⁰ with shift-left operations. Based on this idea, they presented an inverter with basically four shift registers in the datapath, achieving the best known area-time (AT) product of O(m²), with time complexity O(m) and area complexity O(m). Hasan [4] observed that the degree of R⁰ decreases while the degree of U⁰ increases during the iteration process, so that the inequality deg(U⁰(x)) + deg(R⁰(x)) < m holds. Based on this observation, Hasan proposed an architecture that uses two shift registers in the datapath. There have also been efforts to provide the maximum throughput while keeping the AT product unchanged. The best result among such works is the inverter proposed by Guo and Wang [5], which can compute an inverse or a division in every O(1) time using O(m²) area.

In this paper, we propose two hardware inverters: one produces a result in every O(m) time using O(m) area; the other produces a result in every O(1) time using O(m²) area. In the next section, we present our O(m)-time implementation, which uses only one shift register in the datapath, whereas all the previous proposals use at least two. We also analyze a tradeoff between the latency and the implementation cost of the proposed inverter. In Section 3, we present our O(1)-time implementation, which is obtained by applying the idea used in Section 2 and developing a new way of controlling the dataflow in the hardware. Our estimation results show that the new architecture can reduce the latency, the maximum cell delay, and the gate usage of the inverter proposed in [5]. A few concluding remarks are made in Section 4.
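For reference, a software model of this inversion by the extended Euclidean algorithm, written in Python with polynomials over GF(2) packed into integers (bit i is the coefficient of x^i); this models the arithmetic only, not the hardware datapath:

```python
def gf2m_inverse(a, f, m):
    """Returns I(x) = 1/a(x) mod f(x) in GF(2^m), where f is an
    irreducible polynomial of degree m and a != 0.  Maintains the
    invariant u * a = r (mod f) for both (r, u) pairs."""
    assert f.bit_length() - 1 == m        # f must have degree m

    def deg(p):
        return p.bit_length() - 1

    r_prev, r_cur = f, a          # R^-, R^0
    u_prev, u_cur = 0, 1          # U^-, U^0
    while r_cur != 0:
        # polynomial division step: reduce r_prev modulo r_cur
        while r_prev != 0 and deg(r_prev) >= deg(r_cur):
            shift = deg(r_prev) - deg(r_cur)
            r_prev ^= r_cur << shift      # subtraction is XOR over GF(2)
            u_prev ^= u_cur << shift
        r_prev, r_cur = r_cur, r_prev     # swap R^- and R^0
        u_prev, u_cur = u_cur, u_prev
    return u_prev                         # gcd is 1, so u_prev * a = 1

# GF(2^3) with F(x) = x^3 + x + 1 (0b1011): 1/x = x^2 + 1
assert gf2m_inverse(0b010, 0b1011, 3) == 0b101
```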
2 O(m)-Time Implementation
In this section, we present a hardware inverter that produces a result in O(m) time using O(m) area. The proposed inverter is based on Hasan's algorithm [4], but it differs from the implementation proposed in [4] in that it uses only one shift register in the datapath while the one in [4] uses two. After we present our O(m)-time architecture, we loosen the time constraint and analyze a tradeoff between the latency and the implementation cost of the proposed inverter.

Figure 1 shows the architecture of the proposed O(m)-time inverter. In the datapath, there are three (m + 2)-bit registers L = [L_{m+1}, ···, L_0], R = [R_{m+1}, ···, R_0], and B = [B_{m+1}, ···, B_0], where L is a shift register. As Hasan [4] suggested, the coefficients of the four polynomials R⁻, R⁰, U⁻, and U⁰ are stored in the two registers L and R so that L = R⁰|U⁰ and R = R⁻|U⁻, where | represents the concatenation of two polynomials. The authors do not know, however, how to operate Hasan's architecture to obtain the result in O(m) time unless we significantly increase the size of the shift registers or adopt extra temporary storage to hold the intermediate result during each iteration stage of the modulo operation. We chose to use a temporary storage, B. In the figure, ∼ and & denote the bitwise inversion and the bitwise AND operation, respectively. We also found that R does not have to be a shift register when we use the algorithm described in the following section, whereas both L and R are shift registers in the architecture of [4]. The (m + 2)-bit shift register M = [M_{m+1}, ···, M_0] in the control unit holds the information about the moving boundary. We could dispense with M; it would then be necessary, however, to control the operations on the bits in the registers in a bit-by-bit manner. Consequently, the layout of control wires would be extremely complicated and would occupy an even larger area without M. We believe that the structure proposed in [4] would also suffer from the same difficulty without M.
O (m)-Time Algorithm
Figure 2 pictorially describes the first iteration stage that the proposed O(m)time inverter goes through. The registers L and R are initialized as shown in Fig. 2(a). All four polynomials are stored in such a way that the most significant bit (MSB) of each polynomial is placed in the uppermost (leftmost) position of the corresponding part in the register.
Fig. 1. New O(m)-time hardware architecture. Filled arrows represent the connections for the swap operation, while blank arrows represent the connections for the add operations
After the initialization stage, the algorithm enters the leading-one detection phase (LODP), in which a leading-one detection (LOD) operation is performed: R⁰ is shifted up (left), while zeros fill the space left behind, until the MSB of R⁰ becomes 1. The result of this process is depicted in Fig. 2(b), where it is assumed that there were d leading zeros before the LOD operation. In the figure, 0^d represents d zeros. Then, as shown in Fig. 2(c), R⁰ is swapped with R⁻, while U⁻ is stored into B, U⁻ is set to U⁰, and U⁰ is cleared. Before the algorithm enters the computation phase (CP), one addition (subtraction) operation, shown in Fig. 2(d), is performed. In the computation phase, LOD and ADD operations are repeated until the degree of R⁰ becomes equal to that of R⁻, as shown in Fig. 2(e). Finally, as shown in Fig. 2(f), the intermediate result held in B is added to U⁰, which completes the iteration stage.
Efficient Hardware Multiplicative Inverters L
R
MSB
R0 = A(x)
R0 = x A(x) d
R = F(x)
R=F(X)
0d LSB U0 = 1
U- = 0
U0 = 1
(a) Initial Condition
R0 = R-
(b) LOD operation (in LODP)
R= R0
R0 = R0 + R-
0d
U0=0
0
U- = U0
(c) SWAP operation
R0
R-
U- = 0
R-
0d
B = U-
U0 = U0 + U-
U-
(d) ADD operation
R0
R-
MSB 0d U0 LSB
U0 = U0 + B
B
U-
(e) LOD and ADD (in CP)
(f) ADD w/ backup Operation
Fig. 2. The first iteration stage of the O(m)-time algorithm
317
318
Hyun-Gyu Kim and Hyeong-Cheol Oh
The whole algorithm for operating the proposed O(m)-time architecture is defined in Algorithm 1.

Algorithm 1. O(m)-Time Algorithm

  R⁰ ← 0A(x), U⁰ ← 1,              * L = [0, A_{m−1}, ···, A_0, 1] *
  R⁻ ← F(x), U⁻ ← 0,               * R = [F_m, F_{m−1}, ···, F_0, 0] *
  M ← 0^{m+1}1, cnt ← 0;           * 0^k : k 0's *
  while (∼M & L) ≠ 0^{m+2}
    repeat                          * Phase 1 - Leading-one Detection *
      if L_{m+1} = 1 then
        swap(); add();
        shl(M), shl(L), cnt ← cnt − 1;
        goto phase 2
      else
        L ← shl(R⁰)|U⁰, cnt ← cnt + 1;
    end repeat
                                    * Phase 2 - Computation *
    if L_{m+1} = 1 then add();
    if cnt > 0 then
      shl(M), shl(L), cnt ← cnt − 1;
    else
      add-w-Backup();
      L ← shl(R⁰)|U⁰, cnt ← cnt + 1;
      goto phase 1
    end
  endwhile

  swap()
    for all i do in parallel
      if M_i = 1 then B_i ← R_i, R_i ← L_i, L_i ← 0;
      else B_i ← 0, R_i ↔ L_i;
    end

  add()
    for all i do in parallel
      L_i ← L_i + R_i;
    endfor

  add-w-Backup()
    for all i do in parallel
      L_i ← L_i + B_i;
    endfor

In Algorithm 1, shl() denotes a one-bit shift-left operation. The operation L ← shl(R⁰)|U⁰ shifts the R⁰ part only, while zeros fill the space left behind. The masking register M in the control unit is used to hold the information about the moving boundary. As we mentioned in the previous section, we could dispense with M. The operation shl(M) shifts the contents of M, while a one fills the space left behind.
2.2 Implementation
We sought an optimal implementation of the O(m)-time inverter: we loosened the time constraint and analyzed a tradeoff between the latency and the implementation cost of the proposed inverter. Except for the use of one shift register in the datapath and the use of the registers M and B, we followed the inverter model defined in [4]: it uses a g-bit leading-one detector; it shifts the bits in blocks of up to g bits, thus requiring g-bit barrel shifters; and it processes (adds) the data in blocks of size r. Our O(m)-time algorithm shown before, which is described for the specific case of g = 1 and r = m, can easily be extended to this generalized model.

Since the probability that four consecutive bits are all zero is 1/16 for random input sequences, we consider events that shift more than 4 bits at a time to be rare. Thus, we decided to limit the size of the leading-one detector to 4 bits. We also considered implementations with g = 1 (without barrel shifters) for reducing the implementation cost, and with r = m for speeding up the inverter.

We estimated the implementation cost (N_G) in the number of equivalent gates, based on data from a 0.5 µm CMOS standard cell library. The control unit, except for the register M, was excluded from our estimation. Table 1 shows the estimates we obtained for three selected configurations. The values of m were selected from the ones recommended by NIST FIPS 186-2 [7]. Note that the configuration with r = 32 is not an O(m)-time inverter. The gate-count increase due to the use of m XOR gates is much less than the implementation cost needed for barrel shifters and multiplexers. The area occupied by the XOR gates should also be negligible, since the XOR gates could be placed and routed locally between the registers.

Table 1 also summarizes, for random instances, the average value over 10,000 runs of the latency (T_L) in clock cycles. The constant factor of O(m) is about 3 for the configuration of g = 4 and r = m. For the configuration of g = 1 and r = m, the constant factor is about 3.5, but the implementation cost increases much less quickly as m increases.

Table 1 shows that we can improve the performance significantly, with even less hardware usage, by parallelizing the add (XOR) operations using m XORs (r = m). When the hardware resource is limited, we can eliminate the leading-one detector (and the barrel shifters) to reduce the hardware cost significantly, with a relatively small performance loss.
3 O(1)-Time Implementation

In this section, we propose an inverter that produces a result in every O(1) time using O(m²) area. The proposed O(1)-time inverter has a systolic array structure of 2m × (m + 2) cells. Like the O(m)-time implementation presented in Section 2, the inverter presented in this section is also based on Hasan's algorithm [4]. The algorithm used in this section, however, differs from the O(m)-time algorithm in two ways.
Table 1. Comparison of the latency (T_L) in clock cycles and the implementation cost (N_G) in the number of equivalent gates for three selected configurations. Each value of T_L represents an average of 10,000 experiments

          g = 4, r = 32        g = 4, r = m        g = 1, r = m
    m      T_L      N_G        T_L      N_G        T_L      N_G
   163    10,198    9,583       479    8,256        567    5,596
   233    20,866   13,757       686   11,784        812    7,976
   283    30,459   16,544       834   14,304        987    9,676
   409    63,642   23,966     1,206   20,654      1,428   13,960
   571   123,039   33,250     1,684   28,819      1,994   19,468
First, in the O(1)-time algorithm, the coefficients of U⁰ and U⁻ are kept in the reverse order. Second, our O(1)-time algorithm avoids using the backup register for holding intermediate results, since the use of the backup register prevents the regular computation that is crucial for systolic operation. Instead, it uses two (m + 2)-bit registers LM and RM to hold the information about the moving boundaries within the registers L and R, respectively, where LM is a shift register. The use of LM and RM again simplifies the interconnections as well as the hardware for controlling the dataflow. Even though the use of two extra registers seems to increase the area occupied by the systolic array, we have found that our implementation not only works faster, but also uses fewer gates, than the O(1)-time inverter proposed in [5].

3.1 O(1)-Time Algorithm
Since the idea behind our O(1)-time algorithm is similar to the one explained in Section 2.1, we only explain the main differences between the algorithms. During initialization, the four polynomials are stored in L and R so that L = R⁰|U⁻ and R = R⁻|U⁰. Note that the positions of U⁻ and U⁰ are swapped. The order in which the coefficients of the U's are stored is also different: the most significant bit (MSB) of U⁻ (or U⁰) is placed in the lowermost (rightmost) position, while the MSB of R⁰ (or R⁻) is placed in the uppermost (leftmost) position. As a result, when R⁰ is shifted up (left) during the leading-one detection phase, U⁻ is also shifted up, so that zeros fill the space left on the MSB side of U⁻, whereas zeros fill in on the least significant bit (LSB) side in our O(m)-time algorithm as well as in the algorithms proposed in [3, 4, 5]. The algorithm can be rewritten for our O(1)-time inverter as follows:
Algorithm 2. O(1)-Time Algorithm

  R⁰ ← 0A(x), U⁻ ← 0,              * L = [0, A_{m−1}, ···, A_0, 0] *
  R⁻ ← F(x), U⁰ ← 1,               * R = [F_m, F_{m−1}, ···, F_0, 1] *
  LM ← 0^{m+1}1, RM ← 0^{m+2}, cnt ← 0;
  for i = 0 to 2m − 1 do
    repeat                          * Phase 1 - Leading-one Detection *
      if L_MSB = 1 then
        swap(); add();
        goto phase 2
      else
        cnt ← cnt + 1;
        shl(L), shl(LM);
    end repeat
                                    * Phase 2 - Computation *
    if cnt ≤ 1 then goto phase 1
    else
      cnt ← cnt − 1;
      if L_MSB = 1 then add();
      shl(L);
    end
  endfor

  swap()
    for all i do in parallel
      LM_i ↔ RM_i, L_i ↔ R_i;
    endfor

  add()
    for i = 1 to m − 1 do in parallel
      if LM_i ∧ RM_i = 1 then L_i ← L_i + R_i;
      else if LM_i ∧ ¬RM_i = 1 then L_i ← L_i;
      else if ¬LM_i ∧ RM_i = 1 then R_i ← L_i + R_i;
    endfor

After 2m iterations, the O(1)-time algorithm generates the inverse I(x). However, I(x) can appear in L or R, depending on the input operands. Thus, we need to make I(x) always appear in L, so that L = [0, I_0, ..., I_{m−1}, 0], by adding a final swap operation controlled by the result value kept in LM; this is not shown in the algorithm.

3.2 Implementation
Since the O(1)-time algorithm defined in Section 3.1 consists of 2m simple iterations, it can be performed in a 2m-stage pipelined fashion. From the O(1)-time algorithm, we constructed two smaller algorithms: the first one (called the Type-2 algorithm) performs one iteration; the second one (called the Type-1 algorithm) generates the control signals needed to operate the first one. The Type-2 algorithm could be further partitioned into unit algorithms, each of which describes how to calculate one bit. We then designed two hardware modules: the Type-1 cell, which performs the Type-1 algorithm, and the Type-2 cell, which performs the unit part of the Type-2 algorithm. The O(1)-time inverter can be constructed as a systolic array consisting of 2m Type-1 cells and 2m × (m + 1) Type-2 cells.

Figure 3 shows our architecture for multiplicative inversion in GF(2³). In the figure, the big rectangles crossed with "X" and the big blank rectangles represent the Type-1 cells and the Type-2 cells, respectively; a small black rectangle represents a delay element. The internal block diagrams of the Type-1 cell and the Type-2 cell are shown in Fig. 4 and Fig. 5.

Fig. 3. New O(1)-time hardware architecture for multiplicative inversion in GF(2³). Big rectangles crossed with "X" represent the Type-1 cells. A small black rectangle represents a delay element

Fig. 4. Type-1 cell

Fig. 5. Type-2 cell

Using the L_MSB signal, the Type-1 cells generate the control signals for the swap and add operations to be performed in the Type-2 cells. As mentioned earlier, the states of LM and RM of each cell are used to control all the other operations performed in the cell. In order to handle the initial process, the two Type-1 cells on the top side do not include the comparator module. The leftmost Type-2 cell in the bottom row includes an extra swap module to guarantee that the result appears in L. In Fig. 4, the LODP signal informs the cell which phase the inverter is in. The module labelled Counter computes the value of the ⌈log₂(m + 1)⌉-bit signal CNT and changes the LODP value when necessary.

In Table 2, we compare our implementation with the inverter proposed by Guo and Wang [5]. Even though we increased the number of columns by one, we reduced the number of rows from 3m to 2m. As a result, our implementation significantly reduces the latency, the maximum cell delay, and the gate usage of the inverter proposed in [5].
4 Conclusion
In this paper, we have presented two hardware inverters for calculating multiplicative inverses in finite fields GF(2^m): one calculates a result in every O(m) time using O(m) area; the other calculates a result in every O(1) time using O(m²) area. Starting from the best previous proposal in [4], we sought an efficient implementation of the inverter and designed a new O(m)-time inverter that uses only one shift register in the datapath, while all the existing architectures require at least two. A tradeoff between the latency and the implementation cost of the O(m)-time inverter has been analyzed, showing that we can improve the performance significantly, with even less hardware usage, by parallelizing the add (XOR) operations using m XORs (r = m). When the hardware resource is limited, we can eliminate the leading-one detector (and the barrel shifters) to reduce the hardware cost significantly, with a relatively small performance loss.

By exploiting the idea used in the O(m)-time inverter and developing a new way of controlling the dataflow in the hardware, we also designed a new O(1)-time inverter that has a systolic array structure of 2m × (m + 2) cells. An analysis shows that our O(1)-time implementation works faster but costs less hardware than the best previously proposed O(1)-time implementation with the same area-time complexity.
Table 2. Comparison of hardware architectures with O(1)-time complexity for computing inverses in GF(2^m)

                          Guo and Wang [5]              Proposed
  Throughput (1/cycles)   1                             1
  Latency (cycles)        8m − 1                        5m − 1
  Maximum cell delay      T_AND2 + T_XOR3 + 2T_MUX2     2T_MUX + T_XOR2

  Number of basic components:
    Guo and Wang [5]:
      inverter: 2m
      2-in AND: 7m² + 10m
      2-in XOR: 5m² + 4m
      3-in XOR: 2m²
      2-to-1 MUX: 16m² + 10m
      1-bit latch: 34m² + 4m + 2 + 4m⌈lg(m + 1)⌉
      ⌈lg(m + 1)⌉-bit zero-checker: 2m
      ⌈lg(m + 1)⌉-bit adder: 2m
    Proposed:
      inverter: 2m² + 4m
      2-in AND/OR: 4m
      3-in AND/OR: 4m² + 4m
      2-in XOR: 2m² + 2m
      2-to-1 MUX: 12m² + 18m
      1-bit latch: 16m² + 21m + 5 + 4m⌈lg(m + 1)⌉
      ⌈lg(m + 1)⌉-bit zero-checker: 2m
      ⌈lg(m + 1)⌉-bit adder: 2m

  Area complexity          O(m²)                        O(m²)
  AT product               O(m²)                        O(m²)

  T_ANDi: the propagation delay through an i-input AND gate.
  T_XORi: the propagation delay through an i-input XOR gate.
  T_MUXi: the propagation delay through an i-to-1 multiplexer.
Acknowledgements The authors wish to acknowledge the financial support of a Korea University Grant and the CAD tool support of IDEC (IC Design Education Center). The authors would also like to thank the anonymous reviewers for their valuable comments.
References

[1] Koblitz, N.: A course in number theory and cryptography. 2nd ed. Springer-Verlag New York, Inc. (1994)
[2] Paar, C., Rosner, M.: Comparison of arithmetic architectures for Reed-Solomon decoders in reconfigurable hardware. Proc. IEEE FCCM'97 (1997) 219–225
[3] Brunner, H., Curiger, A., Hofstetter, M.: On Computing Multiplicative Inverses in GF(2^m). IEEE Trans. Comput. 42(8) (1993) 1010–1015
[4] Hasan, M. A.: Efficient computation of multiplicative inverses for cryptographic applications. Proc. IEEE ARITH-15 (2001) 66–72
[5] Guo, J.-H., Wang, C.-L.: Systolic Array Implementation of Euclid's Algorithm for Inversion and Division in GF(2^m). IEEE Trans. Comput. 47(10) (1998) 1161–1167
[6] Synopsys, Version 2000.11. Synopsys Inc., Mountain View, CA (2000)
[7] National Institute of Standards and Technology: Digital signature standard. FIPS Publication 186-2 (2000)
Ways to Enhance Differential Power Analysis

Régis Bevan and Erik Knudsen

Oberthur Card Systems SA
25 rue Auguste Blanche, BP 133, 92800 Puteaux, France
{r.bevan,e.knudsen}@oberthurcs.com

Abstract. In [1] P. Kocher et al. introduced Differential Power Analysis (DPA), a statistical test (the difference of means) to retrieve secret keys from smart card power consumption. For the correct hypothesis on the key, the difference of means is significantly different from zero; hence a large peak is observed in the trace of the difference of means for the correct hypothesis. In the first part of this paper we explain why, even with an arbitrarily large number of experiments, the difference of means is not always null for incorrect hypotheses on the key. We show further that the peaks observed in the traces of the difference of means for incorrect hypotheses are inherent to the attacked algorithm, and that this knowledge can be used to enhance power analysis attacks. Finally, we propose another test that under some conditions efficiently detects the correct hypothesis even if incorrect hypotheses show larger peaks in the curves representing the difference of means. The combination of these methods can reduce the number of messages necessary to retrieve a key from a device by a factor greater than 2.
1 Introduction
In [1] P. Kocher et al. introduced Differential Power Analysis (DPA), a statistical test (the difference of means) to retrieve secret keys from smart card power consumption. This attack exploits the fact that the power consumption of cryptographic hardware depends on its activity, and in particular on the values of temporary variables in cryptographic algorithms. In [3], T. Messerges used a simple model of power consumption based on Hamming weights to show the soundness of DPA and of a second-order DPA attack. In this paper, we use a slightly more sophisticated model that can be seen as a simplification of the model presented by Chari et al. in [2]. The description and justification of the model are presented in section 2.

Following the notation of [1], let D(M_i, K_j) be the selection function, with M_i the plaintext (or the ciphertext) and K_j the supposition on the value of a part of the key; D returns the value of one bit of a temporary variable of the attacked cryptographic algorithm. Let ΔD_{K_j} be the difference between the average of the traces C_i(t) for which D(M_i, K_j) is one and the average of the traces C_i(t) for which D(M_i, K_j) is zero:

\[ \Delta D_{K_j}(t) = \frac{\sum_{i=1}^{N} D(M_i, K_j)\,C_i(t)}{\sum_{i=1}^{N} D(M_i, K_j)} - \frac{\sum_{i=1}^{N} \left(1 - D(M_i, K_j)\right) C_i(t)}{\sum_{i=1}^{N} \left(1 - D(M_i, K_j)\right)} \]
P. Kocher assumed that if K_j is incorrect, ΔD_{K_j} should approach zero, but noticed (like T. Messerges in [5]) that "the actual trace may not be completely flat, as D with K_j incorrect may have a weak correlation to D with K_j correct". In section 3, we show with our power consumption model that this correlation can even be strong, and that it depends on the attacked algorithm and the selection function D. In section 4, we deduce some consequences that enhance power analysis. Finally, in section 5, we present a new statistical test that under some conditions detects the correct K_j even if incorrect K_j show larger peaks in the differential trace, due to strong correlation or counter-measures.
2 Model of Power Consumption
In this section we present a simple model of the power consumption of cryptographic hardware in CMOS. This model is made to quickly simulate the contribution of one variable of the attacked algorithm to the power consumption of the chip, and it also allows us to derive theoretical results. As in [2] we ignore coupling effects and use a linear model: the power consumption of the chip is the sum of the power consumption of its components. Hence we can isolate the power consumption of the component (register, memory cells, or bus lines) storing the temporary variable predicted by the selection function D from the power consumption of the rest of the device:

\[ C(t) = C_{\text{register}}(t) + C_{\text{rest of the chip}}(t) \]
CMOS devices only consume power when logic states change after a clock edge; static power dissipation is negligible (see [2] and [6]). Let B = (β_m, β_{m−1}, ..., β_1) be the value of the relevant variable to store in the register, and let A = (α_m, α_{m−1}, ..., α_1) be the previous content of the register. Let c_{01} (resp. c_{10}) be the amount of power needed to flip a bit of the register from 0 to 1 (resp. from 1 to 0). If after the clock edge the value B is stored in the register, the power consumption of the chip after the clock edge is:

\[ C(t) = \sum_{i=1}^{m} \left[ (1 - \alpha_i)\beta_i\, c_{01}(t) + \alpha_i (1 - \beta_i)\, c_{10}(t) \right] + C_{\text{rest of the chip}}(t) \]
We assume that the sequence of instructions executed by the chip does not depend on the data (a common counter-measure against SPA). So from one execution of the cryptographic algorithm to another, the chip states differ only by the values of the temporary variables. Moreover, as most processors cannot change more than one variable per cycle, the power consumption between two executions of the attacked algorithm differs in a given cycle only by the contribution of the register where the relevant variable is stored and by the noise. The power consumption of the rest of the chip can thus be modelled by Gaussian noise w(t), whose mean represents the constant part of the power consumption between two different executions of the algorithm:

\[ C(t) = \sum_{i=1}^{m} \left[ (1 - \alpha_i)\beta_i\, c_{01}(t) + \alpha_i (1 - \beta_i)\, c_{10}(t) \right] + w(t) \qquad (1) \]
This model can easily be simulated with Matlab to verify the soundness of a new attack.
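As an illustration (our Python stand-in for such a simulation; the constants c01, c10 and the noise level are arbitrary):

```python
import random

def power_sample(prev, new, m, c01=1.0, c10=0.8, sigma=0.5):
    """One sample of the model of equation (1): the register switches
    from `prev` to `new`; each 0->1 bit flip costs c01, each 1->0 flip
    costs c10, and w(t) is modelled as Gaussian noise."""
    cost = 0.0
    for i in range(m):
        a, b = (prev >> i) & 1, (new >> i) & 1
        if a == 0 and b == 1:
            cost += c01
        elif a == 1 and b == 0:
            cost += c10
    return cost + random.gauss(0.0, sigma)

# e.g. random data written over a precharged register (all alpha_i = 1)
m = 8
traces = [power_sample((1 << m) - 1, random.randrange(1 << m), m)
          for _ in range(1000)]
```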
3 First Order DPA
We attacked a software DES without counter-measures on a smart card with a Von Neumann architecture. The selection function is one bit of the right buffer after the first round. The power consumption is measured at the end of the first round (the S-boxes, the permutation P, and the XOR between the left and right parts of the buffer). We observe that at time t₁ = 5.916 µs, spikes are present in the ΔD_{K_j}(t) traces (the difference of means for the correct hypothesis is indicated by the solid line). If we analyse the evolution of ΔD_{K_j}(t₁) for all K_j, these values do not always converge to zero (see figure 1). In this section, we explain why these values are not always null and how to calculate them in a particular case.
Fig. 1. Evolution of the difference of means ΔD_{K_j}(t₁) (t₁ = 5.916 µs) with the number of messages increasing
When the number of traces approaches infinity, the empirical mean converges to the statistical expectation so: lim ∆DKj (t) = E{Ck (t)|D(Mk , Kj ) = 1} − E{Ck (t)|D(Mk , Kj ) = 0}
N →+∞
E{Ck (t)|D(Mk , Kj ) = 1} = E{(1 − αmk )βmk c01 (t) + αmk (1 − βmk )c10 (t)|D(Mk , Kj ) = 1} + E{
m−1
(1 − αik )βik c01 (t) + αik (1 − βik )c10 (t)|D(Mk , Kj ) = 1}
i=1
+ E{w(t)|D(Mk , Kj ) = 1} E{Ck (t)|D(Mk , Kj ) = 0} = E{(1 − αmk )βmk c01 (t) + αmk (1 − βmk )c10 (t)|D(Mk , Kj ) = 0} + E{
m−1
(1 − αik )βik c01 (t) + αik (1 − βik )c10 (t)|D(Mk , Kj ) = 0}
i=1
+ E{w(t)|D(Mk , Kj ) = 0} Let βmk be the target value of the selection function D(Mk , Kj ), i.e. the function D computes the supposed value of the bit βmk knowing Mk and guessing Kj . Let K∗ be the actual key. Therefore ∀k, βmk = D(Mk , K∗ ) To calculate E{Ck (t)|D(Mk , Kj ) = 1} and E{Ck (t)|D(Mk , Kj ) = 0} we need a simplifying hypothesis: Hypothesis: αmk is independent of βmk and D(Mk , Kj ). So E{αmk βmk } = E{αmk }E{βmk } This hypothesis is not only a simplifying hypothesis but also a realistic one. For example are systems with precharge often used in memory cells, i.e. when updating a memory cell the cell is first charged with a constant value (111..111b) then charged with the new value. Also in a Von Neumann architecture, the bus lines are first charged with the address of the data (and so remains constant between two executions of the algorithm), then with the value of the data. In these two cases, αmk is a constant and thus independent of βmk and D(Mk , Kj ). Moreover the implementation of permutations in cryptographic algorithms often implies that the previous value of a variable is not correlated to the next value in the first or last round of the algorithm which are the rounds targeted in DPA. m−1 If we denote E{αmk } = αm and Zk (t) = i=1 (1 − αi )βik c01 (t) + αi (1 − βik )c10 (t) then: E{Ck (t)|D(Mk , Kj ) = 1} = c01 (t)(1 − αm )E{βmk |D(Mk , Kj ) = 1} + c10 (t)αm (1 − E{βmk |D(Mk , Kj ) = 1}) + E{Zk (t)|D(Mk , Kj ) = 1}
E{Ck(t) | D(Mk,Kj) = 0} = c01(t)(1 − αm)E{βmk | D(Mk,Kj) = 0}
    + c10(t)αm(1 − E{βmk | D(Mk,Kj) = 0})
    + E{Zk(t) | D(Mk,Kj) = 0}
    + E{w(t) | D(Mk,Kj) = 0}

Let ρj be the probability that D(Mk,Kj) = D(Mk,K∗) for Mk random (obviously ρ∗ = 1). Then:

P(βmk = 1 | D(Mk,Kj) = 1) = P(D(Mk,K∗) = 1 | D(Mk,Kj) = 1)
    = P(D(Mk,K∗) = D(Mk,Kj) | D(Mk,Kj) = 1)
    = P(D(Mk,K∗) = D(Mk,Kj), D(Mk,Kj) = 1) / P(D(Mk,Kj) = 1)
    = P(D(Mk,K∗) = D(Mk,Kj)) P(D(Mk,Kj) = 1) / P(D(Mk,Kj) = 1)
    = P(D(Mk,K∗) = D(Mk,Kj))

So P(βmk = 1 | D(Mk,Kj) = 1) = ρj, and

E{βmk | D(Mk,Kj) = 1} = P(βmk = 1 | D(Mk,Kj) = 1) × 1 + P(βmk = 0 | D(Mk,Kj) = 1) × 0 = ρj

Now we have to find E{βmk | D(Mk,Kj) = 0} in order to calculate E{Ck(t) | D(Mk,Kj) = 0}. Let G and F be two events, and Ḡ the complement of G:

P(F | Ḡ) = P(FḠ)/P(Ḡ) = (P(F) − P(FG))/(1 − P(G)) = P(F)(1 − P(G|F))/(1 − P(G))

So:

P(F | Ḡ) = P(F)/(1 − P(G)) − P(F|G) · P(G)/(1 − P(G))    (2)

If F: "βmk = 1" and G: "D(Mk,Kj) = 1", then P(F) = P(G) = 1/2 (the probability of a bit being equal to 1); thus, according to equation (2), P(βmk = 1 | D(Mk,Kj) = 0) = 1 − ρj. So E{βmk | D(Mk,Kj) = 0} = 1 − ρj. Since the noise is not correlated to the data, and thus not to D(Mk,Kj), we have E{w(t) | D(Mk,Kj) = 1} = E{w(t) | D(Mk,Kj) = 0} = E{w(t)}.
Finally:

E{Ck(t) | D(Mk,Kj) = 1} = c01(t)(1 − αm)ρj + c10(t)αm(1 − ρj) + E{Zk(t) | D(Mk,Kj) = 1} + E{w(t)}
E{Ck(t) | D(Mk,Kj) = 0} = c01(t)(1 − αm)(1 − ρj) + c10(t)αm ρj + E{Zk(t) | D(Mk,Kj) = 0} + E{w(t)}

lim_{N→+∞} ∆DKj(t) = (2ρj − 1)(c01(t)(1 − αm) − c10(t)αm)
    + E{Zk(t) | D(Mk,Kj) = 1} − E{Zk(t) | D(Mk,Kj) = 0}    (3)

3.1 Case E(Zk(t)|D(Mk,Kj) = 1) = E(Zk(t)|D(Mk,Kj) = 0)
This is the ideal case in the sense that the noise produced by the other bits of the register cancels out. If E(Zk(t)|D(Mk,Kj) = 1) = E(Zk(t)|D(Mk,Kj) = 0), then:

lim_{N→+∞} ∆DKj(t) = (c01(t)(1 − αm) − c10(t)αm)(2ρj − 1)

Thus for the correct guess of the key:

lim_{N→+∞} ∆DK∗(t) = c01(t)(1 − αm) − c10(t)αm

but for the incorrect guesses, as 0 < ρj < 1:

lim_{N→+∞} |∆DKj(t)| = |c01(t)(1 − αm) − c10(t)αm| · |2ρj − 1|
lim_{N→+∞} |∆DKj(t)| < |c01(t)(1 − αm) − c10(t)αm|
lim_{N→+∞} |∆DKj(t)| < lim_{N→+∞} |∆DK∗(t)|
So in the ideal case, the peaks for the incorrect guesses are not always null (ρj can be different from 1/2; it depends on the algorithm), but they are always smaller than the peak observed for the correct guess. This explains why increasing the number of measurements for a DPA does not decrease the peaks for the incorrect guesses: these peaks are inherent to the attacked algorithm. The ideal case is reached when Zk(t) is independent of D(Mk,Kj). Zk(t) is in particular independent of D(Mk,Kj) when the βjk are independent of βmk. This condition is verified if we attack the DES with messages chosen uniformly at random and take as selection function one of the 32 bits of the right buffer after the first round. The XOR between the left and right parts makes the bits independent, since the left part comes directly from the initial message. Calculating the ρj is an easy task, since one bit of an S-box output depends only on 6 bits of the initial message. In figure 2, the observed differences of means (dotted line, from the same experiment as in figure 1 at time t1 = 5.916 µs) are close to the theoretical results (solid line; we took αm = 0 and c01(t1) = ∆DK∗(t1)).
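As an illustration of how the ρj can be enumerated, the following sketch (Python; S1 is the standard first DES S-box, and the bit-indexing convention is one plausible choice, not taken from the paper) counts, over the 64 possible 6-bit inputs, how often a key guess k_j predicts the same selection bit as the actual key k_star:

S1 = [14,4,13,1,2,15,11,8,3,10,6,12,5,9,0,7,
      0,15,7,4,14,2,13,1,10,6,12,11,9,5,3,8,
      4,1,14,8,13,6,2,11,15,12,9,7,3,10,5,0,
      15,12,8,2,4,9,1,7,5,11,3,14,10,0,6,13]

def sbox_bit(x, bit):
    # Standard DES convention: the two outer bits of the 6-bit input select the row
    row = ((x >> 5) & 1) << 1 | (x & 1)
    col = (x >> 1) & 0xF
    return (S1[16 * row + col] >> (3 - bit)) & 1

def rho(k_star, k_j, bit=0):
    # Fraction of inputs for which guess k_j agrees with the actual key k_star;
    # rho(k, k) = 1 for every k, matching rho_* = 1 in the text
    agree = sum(sbox_bit(x ^ k_star, bit) == sbox_bit(x ^ k_j, bit)
                for x in range(64))
    return agree / 64.0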
[Plot: observed (dotted) and theoretical (solid) ∆DKj (×10^−3) over the 64 hypotheses on the key]
Fig. 2. Observed and theoretical results after 1900 messages
3.2 Case E(Zk(t)|D(Mk,Kj) = 1) ≠ E(Zk(t)|D(Mk,Kj) = 0)

If E(Zk(t)|D(Mk,Kj) = 1) ≠ E(Zk(t)|D(Mk,Kj) = 0), then nothing proves that the difference of means for the correct guess will be greater than the other differences of means. It depends on the correlation between the βjk and βmk. We computed a DPA with the same set of messages and power consumption traces as in figure 1, but with a different selection function: one output bit of the eighth S-box. The evolution of the maxima of the difference of means is shown in figure 3; the difference of means for the correct hypothesis (solid line with stars) does not present the largest peak (compare figure 3 with figure 1). So with this selection function, DPA failed on this implementation. The choice of the selection function is thus very important to perform an efficient DPA. In the next section we propose some guidelines to decrease the number of messages necessary to retrieve the key with DPA.
4 Consequences for the Attacks

4.1 Choice of the Selection Function
The common criterion for DPA is to choose the guess which has the largest peak in absolute value. According to section 3, the first guideline for DPA is to choose (if possible) a selection function where the βj are mutually independent, in order to be sure that the largest peak belongs to the difference-of-means trace of the correct hypothesis.
[Plot: ∆DKj at t1 = 5.916 µs versus the number of messages (0 to 2500)]
Fig. 3. Evolution of the difference of means where D is one output bit of S-box 8

The second guideline is to choose a selection function which presents small peaks for the incorrect guesses. According to the Central Limit Theorem, the value of the difference of means after N messages can be seen as a Gaussian whose mean and standard deviation are equal to (c01(t)(1 − αm) − c10(t)αm)(2ρj − 1) and σ/√N respectively, where σ is the standard deviation of the noise w(t) in equation (1). It is thus easy to see that, to minimize the probability that an incorrect hypothesis has a difference of means greater in absolute value than the difference of means of the correct hypothesis, the attacker has to maximize the distance between |(c01(t)(1 − αm) − c10(t)αm)(2ρj − 1)| and |c01(t)(1 − αm) − c10(t)αm|. For instance, to attack the 6 bits of the key used before the first S-box of the first round of the DES, the best choice is the second bit of the S-box output: see figure 4, where the |2ρj − 1| are smaller for bit 2 than for the other bits (|2ρj − 1| = 1 for the correct guess).
4.2 Combination of Selection Functions
Another way to increase the difference between the correct guess and the incorrect guesses is to combine the results from several selection functions. For instance, there are 4 selection functions to attack 6 bits of a DES key. Hence, instead of deciding the value of the key when the four selection functions give the same result, the result will be reached faster if the four differences of means are summed.
[Four plots: |2ρj − 1| in ascending order over the 64 key hypotheses, one panel per output bit (bit 1 to bit 4) of the first S-box]
Fig. 4. |2ρj − 1| in ascending order for the 4 output bits of the first S-box

However, this method is efficient if and only if the values of the four bits influence the power consumption at the same time and in the same way. If this condition is fulfilled:

lim_{N→+∞} sum(∆DK∗(t))/4 = c01(t)(1 − αm) − c10(t)αm
lim_{N→+∞} sum(∆DKj(t))/4 = (c01(t)(1 − αm) − c10(t)αm)(2(ρ1j + ρ2j + ρ3j + ρ4j)/4 − 1)

As (ρ1j + ρ2j + ρ3j + ρ4j)/4 is closer to 1/2 than ρkj, the peaks in the traces of the mean of the difference of means for incorrect hypotheses are smaller than the ones in the traces of the difference of means taken separately. Figure 5 shows the evolution with the number of messages of the |∆DKj| for an attack on the first S-box of the DES. The mean of the difference of means points out the correct guess (solid line with stars) faster and more explicitly than the difference of means of each selection function separately: the mean of ∆DK∗ is greater in absolute value than the means of the ∆DKj after 5000 messages, whereas ∆DK∗ is greater in absolute value than the ∆DKj only after 10000 messages. This method was efficient in decreasing the number of messages necessary to retrieve the key of a hardware DES.
5 Likelihood Test
DPA is often blind, i.e. the attacker does not know when the bit used for the selection function is actually manipulated by the card.
[Two plots (×10^−3): |∆DKj| with bit 1 as selection function, and the mean of the difference of means, versus the number of messages (0 to 15000)]
Fig. 5. Evolution of the difference of means with an increasing number of samples
Sometimes peaks occur before or after the bit of the selection function is used, because previous or subsequent bits of the algorithm are correlated with the bit of the selection function, or because noise is more present in the power consumption samples before or after the time the bit of the selection function is used (this noise can also be created by a bad alignment of the measurements). Finally, the traces of the difference of means can show different peaks, and the peak corresponding to the correct hypothesis can be the largest at the time when the bit of the selection function is manipulated without being the largest over the complete duration of the measurements. P. Kocher et al. in [6] presented an enhancement of DPA with the computation of the standard deviation to normalize the difference of means. If C0 (resp. C1) denotes the set of elements for which D(Mi,Kj) = 0 (resp. D(Mi,Kj) = 1), and s[U] and n[U] denote the standard deviation and the number of elements of the set U, then the test presented in [6] is:

∆DKj = (C̄1 − C̄0) / √( s[C1]²/n[C1] + s[C0]²/n[C0] )

where C̄1 and C̄0 denote the means of the sets C1 and C0.
Since noisy parts of the power consumption have a greater standard deviation, the peaks observed in these parts are decreased. Sometimes this is not sufficient to point out the correct key, because noisy peaks can remain larger than the relevant one.
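A sketch of this normalized test (Python; the same data layout as in the earlier sketch is assumed):

import numpy as np

def normalized_diff_of_means(traces, bits):
    # traces: (N, T) array of power samples; bits: (N,) selection outputs in {0, 1}
    c1, c0 = traces[bits == 1], traces[bits == 0]
    num = c1.mean(axis=0) - c0.mean(axis=0)
    # Estimated standard deviation of the difference of the two sample means
    den = np.sqrt(c1.var(axis=0) / len(c1) + c0.var(axis=0) / len(c0))
    return num / den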
In this section we introduce a test that detects the time when the bit of the selection function influences the power consumption.

5.1 Chosen Text DPA
To present our test, we use a concrete case: the attack of a hardware DES with chosen messages. As T. Messerges et al. noticed in [5], attackers can reduce the algorithmic noise by sending chosen messages to the attacked device. However, a chosen text attack implies that the bits inside the algorithm are not distributed uniformly, so peaks not stemming from the bit of the selection function can be observed. One bit of the right buffer of the DES after the first round comes from only seven bits of the input message. Hence only 128 messages are necessary to attack 6 bits of the key. Figure 6 shows the DPA traces for the 64 possibilities of the subkey before the fourth S-box. The selection function was one bit of the right-hand value of the input of the second round of the DES (see 3.1). The 128 messages were sent 156 times to the smart card; only the maximum value per cycle was kept for each sample. It is clear that the biggest peak corresponds neither to the correct guess (solid line with stars) nor to the right time. A closer look at cycles 145 to 200 shows that the peaks reveal the structure of the DES: one round per cycle (16 peaks) separated by one cycle per key shift. Thus the time where the peak of the correct hypothesis is the largest (cycle 147) corresponds to the end of the first round. This assumption is verified by our test in the following subsection.
[Plot: ∆DKj (×10^−3) after 19968 chosen messages, versus clock cycles (0 to 250)]
Fig. 6. DPA traces with chosen messages
[Plot: max_t(L(Kj,t)) versus the number of messages (0 to 2·10^4)]
Fig. 7. Evolution of the maximum of likelihood for each hypothesis
5.2 Likelihood Test
In figure 6, peaks are present in rounds 2 to 16 because the messages are not uniformly distributed. If the structure of the DES had not been so easily deduced, we would have needed a test to choose which cycle corresponds to the end of the first round. To this end, we use the fact that the differences of means converge to a known distribution according to section 3. If r is the number of bits of Kj, the ∆DKj(t) at time t can be represented as a vector of length 2^r. According to the Central Limit Theorem, this vector is a sample of a Gaussian distribution with mean vector holding (c01(t)(1 − αm) − c10(t)αm)(2ρj − 1) in the j-th entry and covariance matrix Γ∆. If we denote c(t) = c01(t)(1 − αm) − c10(t)αm and corrj = 2ρj − 1, then:

∆D(t) = (∆DK1(t), ..., ∆DK2^r(t))^T ∼ N(c(t)·corr, Γ∆), with corr = (corr1, ..., corr2^r)^T

We also have: ∆D(t)/c(t) ∼ N(corr, Γ∆/c²(t))
[Plot: evolution of the maxima of |∆DKj| versus the number of messages (0 to 2.5·10^4)]
Fig. 8. Classical DPA attack
The likelihood of each hypothesis Kj is thus:

L(Kj, t) = 1/√((2π)^{2^r} det(Γ∆)) · exp( −(c²(t)/2) (∆D(t)/c(t) − corrj)^T Γ∆^{−1} (∆D(t)/c(t) − corrj) )

We do not know the exact value of c(t), but ∆DKj(t) is an estimator of c(t) for each hypothesis Kj. The calculation of Γ∆ is hard, so as a first approximation we suppose Γ∆ is equal to the identity matrix (Least Square Method). Thus the test is to choose as the correct key the Kj that maximizes the quantity:

L(Kj, t) = exp( −(1/2) (∆D(t)/∆DKj(t) − corrj)^T (∆D(t)/∆DKj(t) − corrj) )

We applied this method with success to our attack described in the previous subsection 5.1 (chosen text and right-hand value of the DES). Figure 7 shows the evolution of the maximum of likelihood for each hypothesis with an increasing number of messages. With 2048 messages, the correct hypothesis (solid line with stars) is clearly distinguishable, as is the time when the bit of the selection function is manipulated (see figure 9), whereas the peak of the correct key is smaller than noisy peaks in the traces of the difference of means (figure 10). We have also processed a classical DPA against the same device (random messages and right-hand value). The evolution of the maximum of the absolute value of the difference of means for each hypothesis on the subkey is shown in figure 8.
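A sketch of this test with Γ∆ taken as the identity matrix (Python; delta_d holds the measured ∆DKj at one instant, and corr[j] is the theoretical correlation vector expected when hypothesis j is correct; both are assumed to be precomputed):

import numpy as np

def likelihood_test(delta_d, corr):
    # delta_d: length-H vector of differences of means at time t (H = 2^r hypotheses)
    # corr: (H, H) array; corr[j] is the theoretical (2*rho - 1) vector
    #       expected if hypothesis j is the correct key
    scores = np.empty(len(delta_d))
    for j in range(len(delta_d)):
        # delta_d[j] serves as the estimator of c(t) under hypothesis j
        resid = delta_d / delta_d[j] - corr[j]
        scores[j] = np.exp(-0.5 * resid @ resid)  # Gamma_Delta assumed identity
    return int(np.argmax(scores)), scores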
[Plot: L(Kj,t) versus cycles t (0 to 250), after 2048 messages]
Fig. 9. Likelihood for each hypothesis after the processing of 2048 messages
The correct key (solid line with stars) is retrieved with 25000 messages. This example shows the efficiency of our new test as soon as the temporary result used by the selection function is actually manipulated by the attacked device. In further work the knowledge of Γ∆ may give better results.
6 Conclusion
The peaks observed in the traces of the difference of means for the incorrect hypotheses have been explained. They are inherent to the attacked algorithm, and the knowledge of their distribution can be used to enhance DPA. A new test based on this knowledge (a kind of Least Square Method) has been presented and proved efficient in detecting the correct hypothesis on the key even if the correct hypothesis does not present the largest peak of all the traces. Combining several methods (for methods against hardware counter-measures, see the paper of Clavier et al. [7]) can reduce the number of samples necessary to retrieve keys dramatically. Theoretical counter-measures (as presented in [2], [8], [9], [10] and [11]) that guarantee the nullity of the difference of means should be preferred to counter-measures hiding the signal in noise.
[Plot: difference of means for each hypothesis versus cycles t, after 2048 messages]
Fig. 10. Difference of means for each hypothesis after the processing of 2048 messages
References

[1] Kocher P., Jaffe J., Jun B., "Differential power analysis", Advances in Cryptology – Proceedings of CRYPTO '99
[2] Chari S., Jutla C. S., Rao J. R., Rohatgi P., "Towards sound approaches to counteract power-analysis attacks", Advances in Cryptology – Proceedings of CRYPTO '99
[3] Messerges T. S., "Using second-order power analysis to attack DPA resistant software", Cryptographic Hardware and Embedded Systems – Proceedings of CHES 2000
[4] Messerges T. S., Dabbish E. A., Sloan R. H., "Investigations of power analysis attacks on smart cards", Proceedings of the USENIX Workshop on Smartcard Technology, 1999
[5] Messerges T. S., Dabbish E. A., Sloan R. H., "Examining smart-card security under the threat of power analysis attacks", IEEE Transactions on Computers, Vol. 51, No. 5, May 2002
[6] Coron J. S., Kocher P., Naccache D., "Statistics and secret leakage", Proceedings of Financial Cryptography 2000
[7] Clavier C., Coron J. S., Dabbous N., "Differential power analysis in the presence of hardware countermeasures", Cryptographic Hardware and Embedded Systems – Proceedings of CHES 2000
[8] Goubin L., Patarin J., "DES and differential power analysis, the duplication method", Cryptographic Hardware and Embedded Systems – Proceedings of CHES 1999
[9] Messerges T. S., "Securing the AES finalists against power analysis attacks", Fast Software Encryption – Proceedings of FSE 2000
[10] Coron J. S., Goubin L., "On boolean and arithmetic masking against differential power analysis", Cryptographic Hardware and Embedded Systems – Proceedings of CHES 2000
[11] Akkar M., Giraud C., "An implementation of DES and AES, secure against some attacks", Cryptographic Hardware and Embedded Systems – Proceedings of CHES 2001
A Simple Power-Analysis (SPA) Attack on Implementations of the AES Key Expansion
Stefan Mangard
Institute for Applied Information Processing and Communications, Graz University of Technology, Inffeldgasse 16a, A-8010 Graz, Austria
[email protected]
Abstract. This article presents a simple power-analysis (SPA) attack on implementations of the AES key expansion. The attack reveals the secret key of AES software implementations on smart cards by exploiting the fact that the power consumption of most smart-card processors leaks information during the AES key expansion. The presented attack efficiently utilizes this information leakage to substantially reduce the key space that needs to be considered in a brute-force search for the secret key. The details of the attack are described on the basis of smart cards that leak the Hamming weight of intermediate results occurring during the AES key expansion. Keywords: Smart Cards, Power Analysis, SPA, AES, Key Expansion, Key Scheduling.
1 Introduction
The symmetric block cipher Rijndael [6] has been standardized by NIST1 as Advanced Encryption Standard (AES) [19] in November 2001. Being the successor of the Data Encryption Standard (DES) [20], the AES is used in a wide range of applications—in particular, the AES is also used on smart cards. Implementations of cryptographic algorithms on smart cards, however, are susceptible to simple and differential power-analysis (SPA and DPA) attacks. These attacks have been introduced by P. Kocher in [13] based on implementations of the DES. Power attacks on implementations of the AES have been discussed for the first time at the second AES conference in Rome (see [4], [5] and [8]). Due to the fact that these attacks are relatively cheap and easy to conduct on implementations of the AES, appropriate countermeasures need to be implemented on smart cards and similar devices. There exist two basic approaches to protect a device: The first one is to implement the AES based on a logic style [23] that is resistant against power
The work described originates from the European Commission funded project Crypto Module with USB Interface (USB CRYPT) established under contract number IST2000-25169 in the Information Society Technologies (IST) program.
1 National Institute of Standards and Technology
attacks. This is a very effective countermeasure, yet the drawback is that it is quite expensive to implement. Usually, a full-custom design is necessary to make the power consumption of a CMOS circuit independent of the data it processes. The alternative to this approach is the randomization of the intermediate results that occur during AES en- and decryptions. This so-called masking of the intermediate results counteracts first-order DPA attacks and is usually used when the AES is implemented in software on a standard smart-card processor. There are several recent publications (see [2], [11], [12] and [24]) that describe various masking strategies for the AES. However, it is important to point out that the currently proposed masking strategies do not provide any protection to the implementation of the AES key expansion. This article shows that such unprotected implementations of the AES key expansion on 8-bit and in certain cases even on 32-bit processors are susceptible to an SPA attack, which can be conducted efficiently in practice. This SPA attack on the AES key expansion exploits the fact that the power consumption of standard smart-card processors leaks information about the intermediate results occurring during the key expansion. The basic idea of the attack is to use this information leakage to substantially reduce the number of keys that need to be considered in a brute-force search for the secret key. The attack is described on the basis of devices that leak the 8-bit Hamming weight of the data they process. Devices with this behavior have previously been analyzed in [4], [15], [17] and [18]. The attack, however, is not limited to such devices—it is also applicable to devices leaking other information about the data they process. Explanations describing to what extent the attack needs to be modified for these devices are provided in the corresponding sections of this article. The only publication so far that deals with power attacks on the key scheduling/expansion of symmetric ciphers (in particular the DES) is [4]. Yet in [4], no statement is made on how attacks on the AES key expansion can be done or dealt with, nor on whether power attacks on the AES key expansion are practically feasible at all. The present work shows that such attacks can be conducted efficiently and that they consequently pose a serious practical threat to software implementations of the AES on smart cards. Today, in most applications the AES is used with a 128-bit secret key (this variant is called AES-128), and so the discussion of the attack is restricted to this particular key size. However, in principle it is also extendable to 192-bit and 256-bit secret keys. The remainder of this article is organized as follows. Section 2 presents the basics of the AES-128 key expansion. The requirements that need to be fulfilled by a device for an SPA attack on the AES key expansion are described in detail in Section 3. These requirements are also compared with those necessary for a classical first-order DPA attack. On the basis of an 8-bit smart-card processor leaking Hamming weight information, Section 4 provides the details of the SPA attack on implementations of the AES key expansion. Variants of the
attack, in particular one that is applicable to 32-bit processors, are presented in Section 5. In this section, the effectiveness of the attack and its variants is also analyzed. Proposals on how to counteract this attack are presented in Section 6. Concluding remarks can be found in Section 7.
2 AES-128 Key Expansion
For the computation of the AES-128, the 128-bit secret key is expanded to eleven 128-bit round keys. The basic idea of this key expansion is that the first round key, RoundKey0, corresponds to the secret key. All subsequent round keys are derived by a function f from their respective predecessor. So, RoundKeyi = f(RoundKeyi−1) for all 0 < i < 11. Figure 1 shows a pseudo code for the AES-128 key expansion. This pseudo code is based on 32-bit key words—the eleven 128-bit round keys are stored one after the other in the word array W[0..43]. The RotWord function, used in the pseudo code, simply rotates the input word by one byte to the left. The SubWord function applies the AES S-Box function to each byte of the input word. FIPS 197 [19] provides a detailed description of the AES key expansion and a definition of the AES S-Box function. For the attack described in this article, the following observations about the AES-128 key expansion are important: Given an arbitrary round key, RoundKeyi where 0 ≤ i < 11, all other round keys (this includes the secret key) can be calculated. Consequently, knowing a round key and its round number is equivalent to knowing the secret key. This is due to the fact that there exists an inverse to the function f, which can be used to calculate RoundKeyi−1 based on RoundKeyi. Figure 2 illustrates how to calculate RoundKey1 based on RoundKey0 and the corresponding inverse operation.
RC[1..10] = ('01','02','04','08','10','20','40','80','1B','36')
Rcon[i] = (RC[i],'00','00','00')

for (i = 0; i < 4; i++)
    W[i] = (key[4*i], key[4*i + 1], key[4*i + 2], key[4*i + 3])

for (i = 4; i < 44; i++) {
    temp = W[i - 1]
    if (i mod 4 == 0)
        temp = SubWord(RotWord(temp)) xor Rcon[i / 4]
    W[i] = W[i - 4] xor temp
}
Fig. 1. Pseudo code for the AES-128 key expansion
W4,j = S-Box(W3,j+1 ) ⊕ Rcon1,j ⊕ W0,j
W0,j = S-Box(W3,j+1 ) ⊕ Rcon1,j ⊕ W4,j
W5,j = W4,j ⊕ W1,j
W1,j = W4,j ⊕ W5,j
W6,j = W5,j ⊕ W2,j
W2,j = W5,j ⊕ W6,j
W7,j = W6,j ⊕ W3,j
W3,j = W6,j ⊕ W7,j
Fig. 2. The equations on the left side show the calculation of the 16 bytes of RoundKey1 (W4...7,0...3) based on RoundKey0 (W0...3,0...3). The equations on the right side show the corresponding inverse operation: the calculation of RoundKey0 based on RoundKey1

The notation used in this figure is used throughout the remainder of this article: the byte Wi,j refers to the (j mod 4)-th byte of the i-th key word. In Figure 2, it can be observed that, in order to calculate one byte of an arbitrary round key RoundKeyi, not all bytes of RoundKeyi−1 or RoundKeyi+1 need to be known. This is vital for the presented attack. Indeed, the entire attack is based upon the fact that a few bytes of one round key already suffice to calculate many bytes of the previous and following round keys. This property of the AES key expansion will be discussed in detail in Section 4.
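To make the inverse step concrete, here is a small sketch (Python; the S-box table is the standard one from FIPS 197, and treating a round key as a list of 16 byte values in word order is an assumption of this illustration):

SBOX = [
    0x63,0x7c,0x77,0x7b,0xf2,0x6b,0x6f,0xc5,0x30,0x01,0x67,0x2b,0xfe,0xd7,0xab,0x76,
    0xca,0x82,0xc9,0x7d,0xfa,0x59,0x47,0xf0,0xad,0xd4,0xa2,0xaf,0x9c,0xa4,0x72,0xc0,
    0xb7,0xfd,0x93,0x26,0x36,0x3f,0xf7,0xcc,0x34,0xa5,0xe5,0xf1,0x71,0xd8,0x31,0x15,
    0x04,0xc7,0x23,0xc3,0x18,0x96,0x05,0x9a,0x07,0x12,0x80,0xe2,0xeb,0x27,0xb2,0x75,
    0x09,0x83,0x2c,0x1a,0x1b,0x6e,0x5a,0xa0,0x52,0x3b,0xd6,0xb3,0x29,0xe3,0x2f,0x84,
    0x53,0xd1,0x00,0xed,0x20,0xfc,0xb1,0x5b,0x6a,0xcb,0xbe,0x39,0x4a,0x4c,0x58,0xcf,
    0xd0,0xef,0xaa,0xfb,0x43,0x4d,0x33,0x85,0x45,0xf9,0x02,0x7f,0x50,0x3c,0x9f,0xa8,
    0x51,0xa3,0x40,0x8f,0x92,0x9d,0x38,0xf5,0xbc,0xb6,0xda,0x21,0x10,0xff,0xf3,0xd2,
    0xcd,0x0c,0x13,0xec,0x5f,0x97,0x44,0x17,0xc4,0xa7,0x7e,0x3d,0x64,0x5d,0x19,0x73,
    0x60,0x81,0x4f,0xdc,0x22,0x2a,0x90,0x88,0x46,0xee,0xb8,0x14,0xde,0x5e,0x0b,0xdb,
    0xe0,0x32,0x3a,0x0a,0x49,0x06,0x24,0x5c,0xc2,0xd3,0xac,0x62,0x91,0x95,0xe4,0x79,
    0xe7,0xc8,0x37,0x6d,0x8d,0xd5,0x4e,0xa9,0x6c,0x56,0xf4,0xea,0x65,0x7a,0xae,0x08,
    0xba,0x78,0x25,0x2e,0x1c,0xa6,0xb4,0xc6,0xe8,0xdd,0x74,0x1f,0x4b,0xbd,0x8b,0x8a,
    0x70,0x3e,0xb5,0x66,0x48,0x03,0xf6,0x0e,0x61,0x35,0x57,0xb9,0x86,0xc1,0x1d,0x9e,
    0xe1,0xf8,0x98,0x11,0x69,0xd9,0x8e,0x94,0x9b,0x1e,0x87,0xe9,0xce,0x55,0x28,0xdf,
    0x8c,0xa1,0x89,0x0d,0xbf,0xe6,0x42,0x68,0x41,0x99,0x2d,0x0f,0xb0,0x54,0xbb,0x16]

RC = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36]

def previous_round_key(rk, i):
    # rk: 16 bytes of RoundKey_i (1 <= i <= 10); returns the 16 bytes of RoundKey_{i-1}
    prev = [0] * 16
    # Words 1..3 of the previous key: W_k = W_{k+4} xor W_{k+5} (right side of Fig. 2)
    for w in (1, 2, 3):
        for j in range(4):
            prev[4 * w + j] = rk[4 * (w - 1) + j] ^ rk[4 * w + j]
    # Word 0: W_0,j = S-Box(W_3,(j+1) mod 4) xor Rcon xor W_4,j
    for j in range(4):
        rcon = RC[i - 1] if j == 0 else 0
        prev[j] = SBOX[prev[12 + (j + 1) % 4]] ^ rcon ^ rk[j]
    return prev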
3 Attack Scenario
Power attacks exploit the fact that the power consumption of most devices implemented in CMOS depends on the data that is being processed. However, this dependency on its own is not enough to carry out a power attack—contingent on the type of power attack, several other requirements need to be fulfilled. In this section, the requirements for first-order DPA attacks are listed and those for the SPA attack on the AES key expansion are presented in detail. This way, the additional effort that is necessary to conduct an SPA attack on the AES key expansion becomes clear.

A first-order DPA attack [13] on a device implementing a symmetric cipher, like the AES, is based on the following four requirements:

1. The attacker has the opportunity to measure the power consumption of the device while it performs several en-/decryptions with the same key, but with different data blocks.
2. Either the corresponding cipher- or plaintexts need to be known by the attacker.
3. One intermediate result that occurs during the en-/decryption needs to be a function of the cipher-/plaintext and a small number of key bits.
4. The power consumption of the device for the processing of this intermediate result has to be different for at least two different values.

In practice, the first two requirements are fulfilled by almost all devices performing cryptographic operations. This is why the goal of all countermeasures
proposed so far is to make devices not fulfill either requirement three or four. For example, devices implementing a masked AES do not fulfill requirement three. Devices based on a logic style that is resistant against power attacks do not fulfill requirement four. Since first-order DPA attacks have been known for years, no current smart card should fulfill all listed requirements any more.

The requirements for an SPA attack on the AES key expansion are different, and the important point is that they are fulfilled by most devices protecting the AES only by one of the currently proposed masking schemes. The requirements for an SPA attack on the AES key expansion are:

1. The attacker has the opportunity to measure the power consumption of the device while it performs at least one full AES key expansion (RoundKey0 → RoundKey10 or RoundKey10 → RoundKey0). The exact number of required measurements depends on the amount of noise in the measurements. This noise can be reduced by using the average power trace of multiple measurements for the attack.
2. The attacker needs to be able to assign each intermediate result occurring during the AES key expansion to a part of the (average) power trace that is mostly caused by the processing of this intermediate result.
3. The attacker needs to be able to obtain sufficient information from each part of the (average) power trace that is caused by the processing of an intermediate result to conduct a brute-force search, if all the leaking information is utilized as described in Section 4.
4. One ciphertext and sufficient properties of the corresponding plaintext need to be known by the attacker to conduct a brute-force search of a relatively small set of possible keys.

The first requirement is similar to the one of a first-order DPA attack—in fact it is even weaker, because the data input of the device does not matter. Requirements two to four are quite different. The following subsections discuss them in detail and describe under which circumstances these requirements are fulfilled in practice.
3.1 Assignment of Intermediate Results to Specific Parts of the Power Trace
For an SPA attack on the AES key expansion, it is required that the attacker can assign each intermediate result to a part of the power trace that is mostly caused by the processing of this value. For this purpose, the parts of the power trace influenced by the intermediate results of the AES key expansion are usually first separated from the rest of the power trace. A strategy to identify the parts of a power trace that directly depend on the key or the intermediate results of a key expansion is described in [4] and [9]. The basic idea of this strategy is to first make power measurements of a device that en-/decrypts different data blocks with the same key. Second, the power consumption of identical devices with different keys is measured while the same
input block is en-/decrypted. By analyzing the variance of these two sets of measurements, the parts of the power trace that are influenced by the key or by intermediate results of the key expansion can be identified successfully. However, in many cases a simpler strategy is applicable. Devices implementing the AES usually offer a function to perform a full AES key expansion (RoundKey10 is calculated based on RoundKey0). In devices with sufficient memory this function is used to pre-calculate and store all round keys. This allows fast data en- and decryption. Devices with a small memory typically implement a function for a full key expansion to switch from en- to decryption mode. The reason for this is the fact that in an encryption, RoundKey0 is used as the first key, while a decryption uses RoundKey10 as the initial key. The ten expansion steps that are performed during a full AES key expansion (RoundKey0 → RoundKey1 → ... → RoundKey10) are usually clearly visible in the power trace of a device. By subtracting the power consumption of these expansion steps from each other, an attacker can determine the parts of the power trace that depend on an intermediate result of the key expansion. The difficult task in practice is to find out which part of the power trace is caused by which intermediate result. Obviously, this task is manageable for someone knowing the assembler code that runs on the device. However, an attacker without access to such insider knowledge can also do this assignment. The bytes/words of the round keys cannot be calculated in arbitrary order during the key expansion. Assuming the key expansion is implemented in a more or less standard way (for example implementations, see [7] and [21]; for 32-bit implementations, also see [3]), an attacker has a good chance to do this assignment—in particular because the attack described in Section 4 leads to contradictions if this assignment is not done correctly.
3.2 Extraction of Information about the Intermediate Results from the Power Trace
A successful SPA attack on the AES key expansion requires that the attacker knows the power consumption characteristic of the attacked device, i.e. it is necessary that the attacker can reduce the number of possible values for certain intermediate results of the key expansion based on a measured (average) power trace. For example, if a power trace reveals that a specific 8-bit intermediate result has the Hamming weight three, the number of possible values for this intermediate result is reduced from 2^8 = 256 to (8 choose 3) = 56. Provided that a device actually leaks information about the data it processes, the challenging task for an attacker is to learn the power consumption characteristic of the device. An attacker not having access to insider information has to characterize the device on his/her own. Devices with a power consumption that is proportional to the Hamming weight they process are relatively easy to characterize. In this case, the attacker can identify the power levels that correspond to the processing of data with a certain Hamming weight by visual inspection of a measured (average) power
trace and by building differences of power traces. E.g. for 8-bit devices, nine power levels need to be identified this way. Devices with other power consumption characteristics, as for example those analyzed by Akkar et al. in [1], can be characterized in the following ways:

– An attacker having access to a developer version of the device can run test programs with different data and record the power consumption. This way an exact characterization is possible.
– An attacker having only access to the attacked device can exploit the fact that he/she usually knows certain data which is processed by the device. For example, at least the ciphertext is usually known. By locating the part of the power trace that is caused by the processing of the ciphertext and by running en-/decryptions with different data blocks, the attacker can learn how the power consumption of the device depends on the processed data. This characterization leads to good results because the last instruction (the instruction leading to the ciphertext) of the AES and of masked AES variants is an xor(). This is also the most frequently used instruction during the AES key expansion.
3.3 Knowledge of the Ciphertext and of Sufficient Properties of the Corresponding Plaintext
As already mentioned in the introduction, the presented SPA attack is based on substantially reducing the number of keys that need to be considered in a brute-force search. In order to perform a brute-force search of the remaining keys successfully, besides one ciphertext, information about the corresponding plaintext also needs to be known. In practice, there are many scenarios in which the attacker has information about the plaintext. For example, in the case of an electronic purse the plaintext contains information about the amount of money that needs to be transferred. Information about the plaintext may also be extracted from the power trace using the same strategy as for the intermediate results of the AES key expansion.
4 Description of the Attack Based on Hamming Weight Leakage
In this section, the SPA attack on the AES key expansion is presented on the basis of a device leaking the 8-bit Hamming weight of the data it processes, i.e. the 8-bit Hamming weight of all intermediate results occurring during the AES key expansion can be extracted from the power trace of the device. Leaking information, such as the Hamming weight of intermediate results, is extremely valuable to an attacker. The following observation makes this fact clear: Assume an attacker knows the Hamming weight of the bytes of all AES round keys—these are 16 · 11 = 176 Hamming weights.
[Diagram: the 16 bytes W12,0 ... W15,3 of RoundKey3 arranged as a 4×4 matrix, with the four overlapping 5-byte parts part0 to part3 marked]
Fig. 3. RoundKey3 is split into four overlapping parts

For each Hamming weight there are 9 possible values (0...8). So, in principle there exist 9^176 ≈ 2^558 different combinations of Hamming weights. This is much more than the 2^128 different keys that can be used in the AES. It is obvious that not all combinations of Hamming weights can occur in practice. It is hard, though, to determine the exact total number of different combinations of Hamming weights that can occur during an AES key expansion. For this purpose one would need to analyze whether there exist two or more different secret keys that lead to the same 176 Hamming weights. The high diffusion of the AES key expansion suggests that there are only very few keys of this kind, if there are such keys at all. The crucial point is: the Hamming weight of the bytes of the round keys gives the attacker a lot of information about the key. In practice it is not so important whether it is even possible to uniquely identify a key based on the knowledge of certain Hamming weights or not. If an attacker can reduce the number of possible keys to a level that allows a brute-force search in reasonable time, that is sufficient to find the secret key.

An ideal attack based on the Hamming weight leakage of a device during the AES key expansion would be to have a function or a program where all measured/known Hamming weights are entered and the attacker gets a list of possible keys (the same is true for all other kinds of leaking information). Yet, the problem is that the diffusion of the key expansion makes it hard to determine the concrete list of keys in a reasonable time based on the Hamming weight of arbitrary intermediate results. The attack presented in the following paragraphs is a very reasonable compromise between the freedom to use any combination of measured Hamming weights (or other leaking information) for an attack on the one side, and the computational effort needed for the attack on the other side.

The attack is based on the observation that an AES round key can be split into four overlapping parts which can be attacked independently to a certain degree. Figure 3, on the basis of RoundKey3, shows how an attacked round key is split to exploit the 8-bit Hamming weight leakage efficiently.
Table 1. Visualization of which byte values depend on part0 of RoundKey3
[Table content omitted: rows W0 to W19, columns B0...B3, S0...S3, R0; bullets (•) mark the byte values that can be calculated from the bytes of part0 (marked ◦)]
The motivation for this splitting is the following: based on the five bytes of one such part of a round key, many bytes of other round keys and of various intermediate results can be calculated. These dependencies are presented in Table 1 on the basis of part0 of RoundKey3. Each cell of Table 1 represents one byte of a round key or an intermediate byte value that occurs during the computation of the round keys. The byte values marked with a • can be calculated based on the bytes of part0 of RoundKey3 (W12...15), which are marked with a ◦. This is a property of the AES key expansion presented in Section 2. For the parts part1...3 the bullets and circles are shifted accordingly to the right, modulo four. The following notation is used for the byte values shown in Table 1: B0...3 correspond to the bytes of the key words Wi; Sx is the output of S-Box(Bx); and finally R0 = xor(S1, Rcon i/4,0). The values R1...3 are not listed, because these values do not differ from the corresponding Sx values. Notice that R0 is calculated based on S1 because of the RotWord operation.

In order to exploit these dependencies, an attacker proceeds as follows: First, he/she decides which round key to attack and determines the Hamming weights of the bytes of this round key. Second, the Hamming weights of all bytes that solely depend either on part0, part1, part2 or part3 are determined. Now the attacker runs through all possible values of each 5-byte part. Since the 8-bit Hamming weights of the attacked round key are known, trying all
possible values means running through about 321 million values² (≈ 2^28.26) per part on average. For each value, the attacker calculates all intermediate results and bytes of other round keys that solely depend on the currently attacked part. The attacker then verifies whether the calculated intermediate results and the calculated bytes of the other round keys fulfill the measured Hamming weight properties or not. If all Hamming weight properties are fulfilled, the 5-byte value of the attacked part may be a part of the attacked round key; otherwise this value is not possible. The values which are not possible are removed from a list of possible 5-byte values. This is done for each part of the attacked round key, and so the attacker gets four lists: one list of possible values for each 5-byte part of the attacked round key. These lists are then merged into one list—the list of possible values for the attacked round key. Due to the Hamming weight mapping properties of the S-Box function and the xor() operation, only very few keys remain in this list of possible round keys. Similar results are achieved if comparable leakage information is used to shorten the lists of possible 5-byte values. The effectiveness of the attack and of variations of the attack is analyzed more closely in the following section. Figure 4 summarizes the described attack.
5 Effectiveness of the Attack and of Its Variants
The attack described in the previous section is highly effective. After the determination of the necessary Hamming weights by power analysis, this attack can be conducted on almost every PC or workstation. For this article, experiments were conducted using a standard PC based on an Intel Pentium III processor with 866 MHz and 128 MB of RAM. 1000 AES keys were randomly generated and the corresponding Hamming weights of these keys were attacked as described in the previous section. The average running time to determine the list of possible values for the attacked round key was less than six minutes, and the average number of remaining keys for the brute-force search was about 11. The described attack is very robust, because not all Hamming weights mentioned in the previous section are needed for a successful attack. The attack summarized in Figure 4 is based on 81 Hamming weights. If some of these Hamming weights cannot be determined by power analysis, the attack results do not get significantly worse. For example, if the Hamming weights of the values xor(S-Box(Wt+y,1), Rcon (t+1+y)/4,0) cannot be determined, the number of keys that remain to be searched by brute force only increases to about 16 (see column two of Table 2). In this case 76 Hamming weights are used.
² Assuming that a byte value is uniformly distributed, the expected number of values that need to be considered, given its Hamming weight, can be calculated as Σ_{h=0..8} p(H = h)·C(8,h) ≈ 50.27, where p(H = h) = C(8,h)/256 and C(8,h) denotes the binomial coefficient. So, for a 5-byte part, the expected value is 50.27^5.
Preparation:
1. Decide which round key to attack. The round keys starting with the words Wt where t ∈ {12, 16, 20, 24, 28, 32} are possible.
2. By power analysis, determine the Hamming weights of the bytes Wt+x,0...3 ∀x ∈ {−9, −6, −5, −3, −2, −1, 0, +1, +2, +3, +4, +5, +6, +7}. Additionally, determine in the same manner the Hamming weights of the bytes S-Box(Wt+y,0...3) and xor(S-Box(Wt+y,1), Rcon (t+1+y)/4,0) ∀y ∈ {−9, −5, −1, +3, +7}.
3. Find one ciphertext and sufficient information about the corresponding plaintext.

Execution:
1. For part = 0...3 do
   – Determine the list, Subkeypart, of all 5-byte values whose Hamming weights are equal to the measured Hamming weights of Wt,part, Wt+1,part, Wt+2,part, Wt+3,part and Wt+3,(part+1) mod 4.
   – Run through all elements of Subkeypart and for each 5-byte value of this list calculate the bytes Wt+x,part ∀x ∈ {−9, −6, −5, −3, −2, −1, 0, +1, +2, +3, +4, +5, +6, +7}. Additionally, calculate the bytes S-Box(Wt+y,0...3) and xor(S-Box(Wt+y,1), Rcon (t+1+y)/4,0) with y ∈ {−9, −5, −1, +3, +7}. Delete all 5-byte combinations from the list Subkeypart that do not lead to the Hamming weights determined by power analysis.
   end for
2. Based on the four lists, Subkey0...3, determine a list of 128-bit values that are possible for the attacked round key.
3. Perform a brute-force search of this list to identify the correct key.

Fig. 4. Summary of the attack on the AES key expansion
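A sketch of the per-part filtering in the Execution phase (Python; the function that derives the dependent intermediate bytes from a 5-byte candidate is passed in as predict, a hypothetical helper whose exact form depends on the attacked round key, and the measured Hamming weights are assumed to come from the power trace):

from itertools import product

HW = [bin(b).count("1") for b in range(256)]

def candidates(hw):
    # All byte values consistent with one measured 8-bit Hamming weight
    return [b for b in range(256) if HW[b] == hw]

def filter_part(part_hws, predict, derived_hws):
    # part_hws: measured Hamming weights of the five bytes of one part
    # predict(part): returns the intermediate bytes computable from the candidate part
    # derived_hws: measured Hamming weights of those intermediate bytes
    survivors = []
    for part in product(*(candidates(h) for h in part_hws)):
        if all(HW[v] == h for v, h in zip(predict(part), derived_hws)):
            survivors.append(part)
    return survivors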
However, an attack based on even fewer Hamming weights is feasible in practice. For example, the attack also works if only the Hamming weights of those 40 intermediate results are used that are either the input or the output of an S-Box lookup. Using the notation of Figure 4, these are the Hamming weights of the values Wt+y,0...3 and S-Box(Wt+y,0...3) ∀y ∈ {−9, −5, −1, +3, +7}, whereby t ∈ {12, 16, 20, 24, 28, 32}. An attack which is just based on knowing the Hamming weights of the S-Box inputs and outputs works more or less the same way as the one described in the previous section. The attacked round key is split into the same four 5-byte parts (see Figure 3) as in the original attack. However, the difference is that in this case the list of possible 5-byte values that needs to be considered per part (called Subkeypart in Figure 4) is significantly longer. This is due to the fact that the Hamming weight of only two bytes (Wt+3,part and Wt+3,(part+1) mod 4) per part is known.
Table 2. Comparison of the effectiveness of the described attack and its variations. The numbers are based on 1000 experiments, except for the last column, where only 100 experiments were made due to the relatively long computation time.

Number of used Hamming weights: 81 | 76 | 40
Number of keys that remain for brute-force search: mean 11.18, max 280.0, min 1.0 | mean 16.41, max 1152.0, min 1.0 | mean 1.68·10^12, max 81.71·10^12, min 11.22·10^6
Time (hours:min:sec) needed for the selection of the keys that need to be searched by brute force: mean 00:05:27, max 00:15:35, min 00:00:12 | mean 00:04:52, max 00:13:49, min 00:00:10 | mean 05:13:43, max 10:41:24, min 00:36:37
An additional difference is that only the Hamming weights of bytes that are either an S-Box input or output can be used to shorten the lists of possible 5-byte values. The consequence of these differences is an increased execution time of the attack. However, it is still feasible in practice. Table 2 shows the effectiveness of this attack in column three. Independent of the register and bus size of a processor, an S-Box lookup takes one byte as input and returns one byte as output—there only exist 256 different S-Box lookups. Knowing the power consumption characteristic of these S-Box lookups on a 32-bit processor, an attacker can exploit them in the same way as the S-Box lookups on an 8-bit processor. This is particularly true for 32-bit processors that offer instructions to process byte values. If sufficient information is leaking about the inputs and outputs of the S-Box lookups, the previously described variant of the attack can also be deployed against 32-bit processors. The results shown in Table 2 document that the presented attack and its variants are highly effective. The number of keys that need to be searched by brute force is manageable in all listed cases. In fact, in most cases the brute-force search does not even require dedicated hardware and can be executed on a standard PC. Table 2 intentionally does not provide a timing analysis for the final brute-force search, because there are many publications on how fast the AES can be performed in hardware (see [10], [14], [16] and [22]) and software [3]. Based on the provided number of keys that need to be searched by brute force, the overall time for the attack can easily be determined for any desired platform. The numbers presented in Table 2 have a quite high variance. This is due to the fact that the execution time of the attack and the number of remaining keys highly depend on the 8-bit Hamming weights used for the attack. The presented attack and its variants are also feasible if other information than the Hamming weight is leaking from a device. In general, the feasibility
of an SPA attack on the AES key expansion is determined by two factors. The first one is the amount of information that is leaking from a device about each intermediate result the device processes. This is mainly determined by the device itself and by the quality of the power consumption characterization that is available to the attacker. The second factor is the amount of leaking information that can be used in the attack. In practice it is not possible to utilize leakage information of arbitrary intermediate results efficiently—there needs to be an exploitable relation between them. In the presented attack and its variants, the exploited relation is that a relatively big number of 8-bit intermediate results can be calculated based on a 5-byte part of an attacked round key. The amount of leaking information that can be used in an attack is mainly determined by the AES key expansion function. For the exploitation of a Hamming weight leakage, the splitting of the attacked round key as presented in this article is the most efficient one. However, the number of intermediate results used in an attack can be increased compared to the described attack by splitting the attacked round key into parts of six instead of five bytes. This at first increases the list of possible keys per part, yet it also allows utilizing the leakage information of more intermediate results (more intermediate results can be calculated based on six bytes than based on five bytes). The bottom line is that this makes the attack stronger (fewer keys remain for the brute-force search) at the cost of a significantly increased execution time.
6 Countermeasures
The presented attack exploits the information leakage of devices about the 8-bit intermediate results they process. A protection against the attack can be achieved in the following ways:

– A logic style that is resistant against power attacks can be used to stop the information leakage of the device. This way most kinds of power attacks become very hard. Yet, the implementation of the device becomes much more expensive.
– The AES can be implemented in hardware. This way, the key expansion can be done in a highly parallel way. For example, the calculation of one round key based on its predecessor can be done within one clock cycle. In such a parallel design, significantly less information is leaking during the key expansion. The number of keys that need to be considered in a brute-force search cannot be reduced sufficiently to make such a search practically feasible. Hardware implementations as such are, however, not resistant against DPA attacks.
– If multiple power measurements need to be made by the attacker to reduce the noise of the power measurement sufficiently, a randomization of the key expansion counteracts the presented attack. Existing masking schemes (see [2], [11], [12] and [24]) can be extended to protect also the AES
key expansion. The basic idea is that instead of the round keys, masked round keys and corresponding random masks are processed by the device. Thereby the condition RoundKeyi = MaskedRoundKeyi ⊕ Maski needs to be fulfilled. The calculation of MaskedRoundKeyi+1 and Maski+1 based on MaskedRoundKeyi and Maski can be done almost in the same way as the one of RoundKeyi+1 based on RoundKeyi. The RotWord function can be applied separately to MaskedRoundKeyi and Maski. The xor() function is simply applied to the corresponding MaskedRoundKeyi values. The S-Box operation is not linear, but strategies on how to mask this function are extensively discussed in [11] and [24]. The mask protecting a round key can be removed after the key has been added to the masked data, i.e. after the AddRoundKey operation. Yet, the mask protecting the data that is processed (the conventional masking) may of course not be removed.
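As an illustration of the linear part of such a masked key expansion, the following sketch xors two masked 4-byte words and refreshes the mask (Python; masking the SubWord step needs a masked S-Box as discussed in [11] and [24] and is not shown; the data layout is an assumption of this illustration):

import os

def masked_xor(masked_x, mask_x, masked_y, mask_y):
    # Inputs satisfy masked_x = x xor mask_x and masked_y = y xor mask_y.
    # A fresh mask r keeps the output mask independent of the input masks.
    r = os.urandom(4)
    masked = bytes(a ^ b ^ c ^ d ^ e for a, b, c, d, e
                   in zip(masked_x, masked_y, mask_x, mask_y, r))
    return masked, r  # masked = (x xor y) xor r

Note that on real hardware the order in which the xors are combined matters for leakage; a careful implementation fixes this order explicitly.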
7 Conclusions
This article shows that SPA attacks on implementations of the AES key expansion on 8-bit and in certain cases even on 32-bit processors are practically feasible. Devices using currently proposed masking schemes are equally susceptible to this attack as completely unprotected AES implementations. Leaking information about the inputs and the outputs of S-Box lookups occurring during the AES key expansion suffices to reveal the 128-bit secret key of a device. The main strength of the presented attack is the efficient utilization of leaking information to substantially reduce the number of keys that need to be considered in a brute-force search for the secret key. The key expansion is an integral part of the AES and consequently the presented attack poses a serious practical threat to unprotected implementations. Different strategies on how to counteract this attack have been described in this article.
References

[1] M.-L. Akkar, R. Bevan, P. Dischamp, and D. Moyart. Power Analysis, What Is Now Possible... In Advances in Cryptology – ASIACRYPT 2000, volume 1976 of Lecture Notes in Computer Science (LNCS), pages 489–502. Springer-Verlag, 2000.
[2] M.-L. Akkar and C. Giraud. An implementation of DES and AES, secure against some attacks. In Cryptographic Hardware and Embedded Systems – CHES 2001, volume 2162 of Lecture Notes in Computer Science (LNCS), pages 309–318. Springer-Verlag, 2001.
[3] G. Bertoni, L. Breveglieri, P. Fragneto, M. Macchetti, and S. Marchesin. Efficient Software Implementation of AES on 32-bits Platforms. In Cryptographic Hardware and Embedded Systems – CHES 2002, Lecture Notes in Computer Science (LNCS). Springer-Verlag, 2002.
[4] E. Biham and A. Shamir. Power Analysis of the Key Scheduling of the AES Candidates. In Second Advanced Encryption Standard (AES) Candidate Conference, Rome, Italy, 1999.
[5] S. Chari, C. Jutla, J. R. Rao, and P. Rohatgi. A Cautionary Note Regarding Evaluation of AES Candidates on Smart-Cards. In Second Advanced Encryption Standard (AES) Candidate Conference, Rome, Italy, 1999.
[6] J. Daemen and V. Rijmen. The Design of Rijndael. Springer-Verlag, 2002, ISBN 3-540-42580-2.
[7] J. Daemen and V. Rijmen. The Rijndael Page. Available at http://www.esat.kuleuven.ac.be/~rijmen/rijndael/.
[8] J. Daemen and V. Rijmen. Resistance Against Implementation Attacks. A Comparative Study of the AES Proposals. In Second Advanced Encryption Standard (AES) Candidate Conference, Rome, Italy, 1999.
[9] P. Fahn and P. Pearson. IPA: A New Class of Power Attacks. In Workshop on Cryptographic Hardware and Embedded Systems – CHES 1999, volume 1717 of Lecture Notes in Computer Science (LNCS), pages 173–186. Springer-Verlag, 1999.
[10] V. Fischer and M. Drutarovský. Two Methods of Rijndael Implementation in Reconfigurable Hardware. In Workshop on Cryptographic Hardware and Embedded Systems – CHES 2001, volume 2162 of Lecture Notes in Computer Science (LNCS), pages 77–92. Springer-Verlag, 2001.
[11] J. Dj. Golic and C. Tymen. Multiplicative Masking and Power Analysis of AES. In Cryptographic Hardware and Embedded Systems – CHES 2002, Lecture Notes in Computer Science (LNCS). Springer-Verlag, 2002.
[12] K. Itoh, M. Takenaka, and N. Torii. DPA Countermeasure Based on the "Masking Method". In Information Security and Cryptology – ICISC 2001, volume 2288 of Lecture Notes in Computer Science (LNCS), pages 440–456. Springer-Verlag, 2002.
[13] P. C. Kocher, J. Jaffe, and B. Jun. Differential Power Analysis. In Advances in Cryptology – CRYPTO 1999, volume 1666 of Lecture Notes in Computer Science (LNCS), pages 388–397. Springer-Verlag, 1999.
[14] H. Kuo and I. Verbauwhede. Architectural Optimization for a 1.82Gbits/sec VLSI Implementation of the AES Rijndael Algorithm. In Workshop on Cryptographic Hardware and Embedded Systems – CHES 2001, volume 2162 of Lecture Notes in Computer Science (LNCS), pages 51–64. Springer-Verlag, 2001.
[15] R. Mayer-Sommer. Smartly Analyzing the Simplicity and the Power of Simple Power Analysis on Smartcards. In Cryptographic Hardware and Embedded Systems – CHES 2000, volume 1965 of Lecture Notes in Computer Science (LNCS), pages 78–92. Springer-Verlag, 2000.
[16] M. McLoone and J. V. McCanny. High Performance Single-Chip FPGA Rijndael Algorithm Implementations. In Workshop on Cryptographic Hardware and Embedded Systems – CHES 2001, volume 2162 of Lecture Notes in Computer Science (LNCS), pages 65–76. Springer-Verlag, 2001.
[17] T. S. Messerges. Using Second-Order Power Analysis to Attack DPA Resistant Software. In Cryptographic Hardware and Embedded Systems – CHES 2000, volume 1965 of Lecture Notes in Computer Science (LNCS), pages 238–251. Springer-Verlag, 2000.
[18] T. S. Messerges, E. A. Dabbish, and R. H. Sloan. Investigations of Power Analysis Attacks on Smartcards. In Proceedings of USENIX Workshop on Smartcard Technology, pages 151–162, 1999.
[19] National Institute of Standards and Technology. FIPS 197 Advanced Encryption Standard (AES). Available at http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf.
358
Stefan Mangard
[20] National Institute of Standards and Technology. FIPS 46-2 Data Encryption Standard (DES). Available at http://csrc.nist.gov/publications/fips/. 343 [21] National Institute of Standards and Technology. The AES Home Page. Available at http://csrc.nist.gov/encryption/aes/. 348 [22] A. Satoh, S. Morioka, K. Takano, and S. Munetoh. A Compact Rijndael Hardware Architecture with S-Box Optimization. In Advances in Cryptology – ASIACRYPT 2001, volume 2248 of Lecture Notes in Computer Science (LNCS), pages 239–254. Springer-Verlag, 2001. 354 [23] K. Tiri, M. Akmal, and I. Verbauwhede. A Dynamic and Differential CMOS Logic with Signal Independent Power Consumption to Withstand Differential Power Analysis on Smart Cards. In 28th European Solid-State Circuits Conference – ESSCIRC 2002, Florence, Italy, 2002. 343 [24] E. Trichina, D. De Seta, and L. Germani. Simplified Adaptive Multiplicative Masking for AES and its Secure Implementation. In Cryptographic Hardware and Embedded Systems – CHES 2002, Lecture Notes in Computer Science (LNCS). Springer-Verlag, 2002. 344, 355, 356
A Reject Timing Attack on an IND-CCA2 Public-Key Cryptosystem Kouichi Sakurai1 and Tsuyoshi Takagi2 1
Kyushu University Department of Computer Science and Communication Engineering Hakozaki, Fukuoka 812-81, Japan [email protected] 2 Technical Universit¨ at Darmstadt, Fachbereich Informatik Alexanderstr.10, D-64283 Darmstadt, Germany [email protected]
Abstract. EPOC-2 is a public-key cryptosystem that can be proved IND-CCA2 under the factoring assumption in the random oracle model. It has been written into the IEEE P1363 standard specification, and it has been a candidate public-key cryptosystem for several international standards (or portfolios) on cryptography, e.g., NESSIE, CRYPTREC, ISO, etc. In this paper we propose a chosen ciphertext attack against EPOC-2 from NESSIE that observes the timing of the reject signs from the decryption oracle. We construct an algorithm which can factor the public modulus using the difference between the reject symbols. For random 384-bit primes, the modulus can be factored with probability at least 1/2 by making about 385 queries to the decryption oracle.
Keywords: EPOC-2, chosen ciphertext attack, reject function, timing attack, factoring, Manger's attack.
1 Introduction
The security criterion for general-purpose public-key cryptosystems is semantic security against the adaptive chosen ciphertext attack (IND-CCA2) [BDPR98]. Chosen ciphertext security is achieved by rejecting the invalid ciphertexts asked to the decryption oracle. Some public-key cryptosystems deploy several different reject functions in the decryption process, triggered by, e.g., invalid parameter sizes, invalid control padding values, or invalid decoding formats. If the implementation of the decryption is careless or naive, the attacker can detect the difference between the reject symbols. Manger proposed a chosen ciphertext attack against RSA-OAEP using the difference of reject symbols [Man01]. The message of the target ciphertext can be recovered by adaptively asking the decryption oracle and learning the difference between two reject signs. After computing the RSA primitive (C^d mod n), the decryption process of RSA-OAEP checks two different functions: (1) the integer-to-octet conversion is smaller than a bound, (2) the OAEP padding is correct. Manger observed that the two rejections can be distinguished in some implementations, e.g., by differences in the timing, the log file
of the decryption, etc. Several attacks related to this scenario can be found in [KCJLMWY01], [Koc96], [KR02], [KJJ99], [Nov02].

EPOC-2 is a public-key cryptosystem that can be proved semantically secure against the adaptive chosen ciphertext attack (IND-CCA2) under the factoring assumption in the random oracle model [EPOC]. EPOC-2 was standardized as a public-key encryption scheme in IEEE P1363, and it has been a candidate in several international standardizations (or portfolios), e.g., NESSIE, CRYPTREC, ISO, etc. The specifications of EPOC-2 in these standards are not unique (in Section 2.1 we give a short history of the EPOC family). The old specification (ver. D6) from IEEE is based on the paper [FO99b], while those of the current version of IEEE, NESSIE, and CRYPTREC are based on the paper [FO01]. One of the main differences between these papers is the treatment of the chosen ciphertext attack proposed by Joye, Quisquater, and Yung (the JQY attack) [JQY01]. The JQY attack is effective only against the old EPOC-2 from IEEE (ver. D6, [FO99b]); it is no longer effective against the current versions of IEEE, NESSIE, or CRYPTREC.

In this paper we propose a chosen ciphertext attack against the current version of EPOC-2 that observes the timings of two different reject functions. We construct an algorithm which can factor the modulus by asking the decryption oracle and analyzing the rejection symbols. EPOC-2 has two main rejection functions: (Reject 1) The first arises from the size of the ephemeral integer decrypted by the cryptographic primitive. If its size is too large, the decryption process is stopped and the rejection symbol is returned. (Reject 2) The second one comes from the integrity check by re-encryption of the decrypted message. It returns the reject sign if the integrity of the re-encryption is invalid. The calculation of the re-encryption requires two modular exponentiations, which is as slow as the cryptographic primitive itself. Therefore, in some computation environments, the attacker can tell the difference between these rejection symbols by a timing attack. The proposed attack recovers an approximation of an ephemeral integer from the most significant bits downwards. The attacker can guess each bit by submitting a manipulated ciphertext to the decryption oracle. Once the attacker knows the ephemeral integer for a given ciphertext, the modulus can be factored. For a random 384-bit secret prime, the attack can factor the modulus with probability at least 1/2 with about 385 invocations of the decryption oracle.

Dent has independently proposed a similar reject timing attack against the EPOC-2 cryptosystem [Den02a, Den02b]. He also assumed that the attacker can distinguish the reject symbols of manipulated ciphertexts, but he constructed a different algorithm for factoring the public modulus. It directly detects the secret prime from the most significant bits instead of recovering the ephemeral integer as in our proposed construction.

This paper is organized as follows: In Section 2, we review the EPOC-2 encryption scheme from NESSIE and the history of the EPOC family. In Section 3, we present our proposed attack. We also show an example of our attack against a test vector by the designer. In Section 4 the relation of the proposed attack to other provably secure cryptosystems is considered.
Key Generation
  pLen, the bit length of the prime p
  n = p^2 · q, the modulus; g ∈ Z/nZ s.t. p | ord_{p^2}(g)
  g_p = g mod p^2, h = g^n mod n
  Public key: (n, g, h, pLen); Secret key: (p, q, g_p)
Encryption of m
  m ∈ {0,1}^*, a message; σ ∈ {0,1}^{pLen−1}, a random integer
  c2 = m ⊕ G(σ), c1 = g^σ · h^{H(m,σ,c2)} mod n
  The ciphertext: (c1, c2)
Decryption of c
  σ* = L(c1^{p−1} mod p^2) · L(g_p^{p−1} mod p^2)^{−1} mod p  (= [[c1]]_g)
  If |σ*| ≤ pLen − 1, then go to the next step; otherwise return Reject.
  m* = c2 ⊕ G(σ*); if c1 = g^{σ*} · h^{H(m*,σ*,c2)} mod q holds,
  then output m* as the decryption of (c1, c2); otherwise return Reject.
Fig. 1. EPOC-2 Cryptosystem
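To make the primitive of Fig. 1 concrete, the following Python sketch implements the encryption and the primitive decryption with toy parameters. The parameter sizes, the choice g = 2, and the SHA-256-based stand-ins for the functions G and H are our illustrative assumptions, not part of any EPOC specification (Python 3.8+ is assumed for pow(x, -1, m)).

import hashlib

# Toy EPOC-2 parameters (far too small to be secure; for illustration only).
p, q = 1019, 1031                  # secret primes
n = p * p * q                      # public modulus n = p^2 * q
pLen = p.bit_length()              # pLen = 10 here

# g must satisfy p | ord_{p^2}(g), i.e. g^(p-1) != 1 (mod p^2).
g = 2
assert pow(g, p - 1, p * p) != 1
gp = g % (p * p)                   # g_p = g mod p^2
h = pow(g, n, n)                   # h = g^n mod n

def L(x):
    # The logarithm function of the primitive, defined for x = 1 (mod p).
    return (x - 1) // p

def mask(*args):
    # Stand-in for the mask generation function G and the hash H
    # (our simplification; the real functions are fixed by the standard).
    data = b"|".join(str(a).encode() for a in args)
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def encrypt(m, sigma):
    c2 = m ^ mask("G", sigma)
    c1 = (pow(g, sigma, n) * pow(h, mask("H", m, sigma, c2), n)) % n
    return c1, c2

def primitive_decrypt(c1):
    # sigma* = L(c1^(p-1) mod p^2) * L(g_p^(p-1) mod p^2)^(-1) mod p = [[c1]]_g
    a = L(pow(c1, p - 1, p * p))
    b = L(pow(gp, p - 1, p * p))
    return (a * pow(b, -1, p)) % p

sigma = 123                        # an element of {0,1}^(pLen-1)
c1, c2 = encrypt(42, sigma)
assert primitive_decrypt(c1) == sigma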
2 EPOC-2 Cryptosystem
In this section we review the EPOC-2 encryption scheme. There are several different versions of EPOC-2, appearing as scientific papers [FO99b], [FO01] or as specifications of international standards (or portfolios) [IEEE], [NESSIE], [CRYPTREC], etc. In this paper we consider the current specification and the notation of the self-evaluation report that was submitted to the 2nd phase of the NESSIE project [NESSIE]. The specifications of EPOC-2 from IEEE and CRYPTREC are similar to that of NESSIE.

EPOC-2 is a probabilistic encryption scheme based on the hardness of the factoring problem for n = p^2 · q, where p, q are distinct prime numbers. Let pLen be the bit length of the prime p. In the key generation, we additionally generate an integer g of Z/nZ such that p | ord_{p^2}(g) (the order of g mod p^2 in the group Z/p^2 Z is divisible by p). Moreover, we compute g_p = g mod p^2 and h = g^n mod n. Then the public key and the secret key of EPOC-2 are (n, g, h, pLen) and (p, q, g_p), respectively. Let G be a mask generation function {0,1}^{pLen−1} → {0,1}^* and let H be a hash function {0,1}^* × {0,1}^{pLen−1} × {0,1}^* → {0,1}^{rLen}, where rLen is the bit length of the output of the hash function H, defined by the security parameter for the primes p, q. There are several variations of EPOC-2 in the key generation (e.g., h of CRYPTREC is chosen differently), but the proposed attack is not affected by these variations.

The encryption of EPOC-2 is computed as follows: m ∈ {0,1}^* is a message of arbitrary bit length. For a random integer σ ∈ {0,1}^{pLen−1}, we encrypt the message m as c2 = m ⊕ G(σ), c1 = g^σ · h^{H(m,σ,c2)} mod n. The ciphertext of m is C = (c1, c2).

The decryption of EPOC-2 is as follows: At first the first component c1 of the ciphertext C is decrypted by computing σ* = L(c1^{p−1} mod p^2) · L(g_p^{p−1} mod p^2)^{−1} mod p, where L(x) = (x − 1)/p. In this paper we also denote [[c1]]_g = L(c1^{p−1} mod p^2) · L(g_p^{p−1} mod p^2)^{−1} mod p. Here we have the first reject function, based on the size of σ*. Let |σ*| be the bit length of σ*. If |σ*| > pLen − 1, we stop the decryption procedure and return Reject; otherwise we go to the next step. This rejection function is necessary in order to prevent the attack proposed by Joye, Quisquater, and Yung [JQY01]. In this paper we denote this reject symbol by Reject 1. Note that a ciphertext C = (c1, c2) with c1 = g^r mod n for an integer r < 2^{pLen−1} and a random integer c2 ∈ Z/nZ is not rejected by this test and goes to the next step, although C is an invalid ciphertext (it is rejected in the next step). The message m* is decrypted by computing m* = c2 ⊕ G(σ*). Here we have the second rejection function. If c1 = g^{σ*} h^{H(m*,σ*,c2)} mod q holds, then m* is output as the decryption of (c1, c2); otherwise Reject is returned.

2.1 History of EPOC
In this section we briefly review the history of the specifications of the EPOC family. We mainly discuss how the reject symbol returned by the decryption oracle has changed. The cryptographic primitive of EPOC was proposed by Okamoto and Uchiyama at EUROCRYPT '98 [OU98]. The one-wayness and the semantic security (IND-CPA) of the primitive are as secure as the factoring and p-subgroup problems in the standard model. The EPOC primitive has no reject symbol in the decryption oracle, so it is insecure against the chosen ciphertext attack. Indeed, Joye, Quisquater, and Yung proposed a chosen ciphertext attack against the EPOC primitive at the rump session of Eurocrypt '98 [JQY98]. Let c be the ciphertext of a message m that is larger than the secret key p. If the attacker obtains the decrypted message m′ of the ciphertext c, the modulus n of the EPOC primitive can be factored by computing gcd(m − m′, n) = p. At CRYPTO '99 Fujisaki and Okamoto proposed a conversion technique that enhances the EPOC primitive to be IND-CCA2 under the factoring assumption in the random oracle model [FO99b]. In the decryption process the conversion checks the integrity of the ciphertext by re-encrypting the message. This version of EPOC was submitted to IEEE P1363a in October 1998 [IEEE]. Joye et al. proposed a chosen ciphertext attack against the submission (ver. D6 of EPOC-2 in IEEE) [JQY01]. We call it the JQY attack. The JQY attack is based on the chosen ciphertext attack against the EPOC primitive [JQY98]; it tries to find an approximation of the secret prime p by adaptively asking ciphertexts (whose messages are as large as p) to the decryption oracle. In the paper [JQY01] they suggested that if the decryption oracle checks the size of the integer decrypted by the EPOC primitive, the JQY attack is no longer successful. The reject symbol arising from this rejection function is called Reject 1 in Section 3. The current version of EPOC-2 from IEEE supports this reject function, and the JQY attack does not work against it. The security reduction from [FO99b] was evaluated for general cryptographic primitives, and the advantage of the reduction was not very tight. Fujisaki and Okamoto proved a better security reduction in the paper [FO01]. In that paper they included the reject treatment proposed by Joye et al. (Reject 1).
EPOC-2 has been proposed to the NESSIE 1st/2nd phases [NESSIE] and to CRYPTREC 2000/2001 [CRYPTREC]. These versions support the rejection function (Reject 1). We notice that the specification of EPOC-2 from the NESSIE 1st phase is different — the decryption oracle returns only one reject symbol, after completing all steps of the decryption process. Although EPOC has not been incorporated into the draft of the ISO standard, EPOC-2 will be included in the standard [Sho01]. We summarize this history of EPOC with respect to the reject function in the next table.

EPOC Version       Based Paper   Reject 1   JQY Attack
EPOC Primitive     [OU98]        NO         YES
IEEE (ver. D6)     [FO99b]       NO         YES
IEEE Version       [FO01]        YES        NO
CRYPTREC Version   [FO01]        YES        NO
NESSIE Version     [FO01]        YES        NO
3 Proposed Attack
In this section we describe the proposed attack against the current version of EPOC-2. The attack is based on the JQY attack [JQY01]. Although the current version of EPOC-2 is secure against the JQY attack, our proposed attack can break it using the timing of the two different rejection symbols.¹

At first we make an observation on the decryption algorithm of EPOC-2. In the decryption process, the calculation of the integrity check c1 = g^{σ*} h^{H(m*,σ*,c2)} mod q is executed if and only if |σ*| ≤ pLen − 1 holds. It consists of two modular exponentiations modulo q, and its running time is relatively slow — several milliseconds in standard computation environments. A timing attack, which measures the time until Reject is received from the decryption oracle, can observe this calculation. Therefore we use the following assumption: For any ciphertext C = (c1, c2), the attacker can learn whether σ* = [[c1]]_g satisfies σ* ∈ {0,1}^{pLen−1} or not by asking the ciphertext C to the decryption oracle. Under this assumption, the attacker can tell the difference between two reject symbols: the error of the primitive decryption (Reject 1) and the error of the integrity check (Reject 2) in the decryption oracle. If the ephemeral integer σ* decrypted by the EPOC primitive is larger than 2^{pLen−1}, then Reject 1 is returned. The reject symbol Reject 2 is returned if both |σ*| ≤ pLen − 1 and c1 ≠ g^{σ*} h^{H(m*,σ*,c2)} mod q for m* = c2 ⊕ G(σ*) hold.

¹ Recently, Dent has proposed a similar reject timing attack against the EPOC-2 cryptosystem [Den02a, Den02b]. His attack is described in Section 3.6.
Decryption of c
  σ* = L(c1^{p−1} mod p^2) · L(g_p^{p−1} mod p^2)^{−1} mod p  (= [[c1]]_g)
  If |σ*| ≤ k − 1 (k = pLen), then go to the next step; otherwise return Reject 1.
  m* = c2 ⊕ G(σ*); if c1 = g^{σ*} h^{H(m*,σ*,c2)} mod q holds,
  then output m* as the decryption of (c1, c2); otherwise return Reject 2.
We state this observation as the following lemma.

Lemma 1. Let C = (c1, c2) be a ciphertext of EPOC. Let σ* = [[c1]]_g be the ephemeral integer decrypted by the EPOC primitive. We have the following conditions:
(1) σ* > 2^{pLen−1} ⇒ Reject 1,
(2) c1 ≠ g^{σ*} h^{H(m*,σ*,c2)} mod q for m* = c2 ⊕ G(σ*) ⇒ Reject 2.
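Continuing the toy sketch above, Lemma 1 can be modelled as an oracle that returns which reject occurred; in a real attack the two labels would be inferred from the decryption timing. For a manipulated ciphertext the re-encryption check fails with overwhelming probability, so only the size test matters here.

# Reject oracle modelling Lemma 1 (continuing the previous sketch).
# In a real attack the two labels are distinguished by timing: Reject 1
# is returned before, Reject 2 after, the two slow modular exponentiations.
REJECT1, REJECT2 = "Reject 1", "Reject 2"

def reject_oracle(c1):
    sigma_star = primitive_decrypt(c1)
    if sigma_star >= 1 << (pLen - 1):   # |sigma*| > pLen - 1: size test fails
        return REJECT1
    # For a manipulated ciphertext the re-encryption check fails, so the
    # slow path ends in Reject 2.
    return REJECT2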
3.1 Main Idea
We describe the main idea of our attack. Let C = (c1, c2) be a valid ciphertext of EPOC-2 and let σ* = [[c1]]_g. The attacker manipulates the ciphertext C by dividing its first component by an integer D = g^α mod n, namely C′ = (c1/D mod n, c2). The ciphertext C′ is rejected by the decryption oracle with overwhelming probability, because the second integrity check fails (c1/D ≠ g^{σ′} h^{H(m′,σ′,c2)} mod q for σ′ = [[c1/D]]_g and m′ = c2 ⊕ G(σ′)). However, the attacker learns a relation between σ* and α from the rejection symbol: Reject 1 or Reject 2. Indeed we have the following lemma.

Lemma 2. Assume p > 2^{pLen−1} + α for a positive integer α < 2^{pLen−1}. Let C = (c1, c2) be a ciphertext of EPOC-2 and let [[c1]]_g = σ*. The reject symbol against the ciphertext C′ = (c1/D, c2) with D = g^α mod n is equal to Reject 2 if and only if σ* > α holds.

Proof. Note that [[c1/D mod n]]_g = σ* − α mod p. If σ* > α holds, then we have [[c1/D mod n]]_g = σ* − α < 2^{pLen−1} and the reject symbol is Reject 2. If σ* < α holds, then we have [[c1/D mod n]]_g = σ* − α + p. Because σ* − α + p > σ* + 2^{pLen−1} > 2^{pLen−1}, the ciphertext C′ is rejected with Reject 1.

Therefore the difference between the reject symbols yields an oracle, which answers whether the condition σ* > α holds for a given ciphertext C = (c1, c2) and an integer α, where [[c1]]_g = σ*. By asking the ciphertext C with many different α to the decryption oracle, the attacker can find an approximation of σ*. Once we have an algorithm which answers σ* = [[c1]]_g for a given ciphertext C = (c1, c2), we can factor the modulus n. We have the following lemma.

Lemma 3. Let c1 = g^σ mod n with σ > p. If we know the decryption σ* = L(c1^{p−1} mod p^2) · L(g_p^{p−1} mod p^2)^{−1} mod p = [[c1]]_g, then we can factor the modulus by computing gcd(σ − σ*, n) = p.

Proof. Because σ* = [[c1]]_g = σ mod p holds, we have p | (σ − σ*).
This lemma is used in the security proof of the EPOC primitive [OU98] and in the chosen ciphertext attack on the EPOC primitive (the JQY attack) [JQY01]. In the following we construct an algorithm that finds σ* for a given ciphertext c1 and an integer σ using the oracle above. We show the high-level description of the attack as follows.

1. Choose an integer σ such that σ > 2^{pLen} > p. Compute c1 = g^σ mod n. Let C = (c1, c2) be a ciphertext for random c2 ∈ {0,1}^*.
2. The attacker asks the manipulated ciphertexts C′ = (c1/D, c2) to the decryption oracle, where D = g^α mod n for some integers 0 < α < 2^{pLen−1}. He/she analyzes the reject symbols for the ciphertexts C′.
3. The attacker outputs σ* (= σ mod p) and factors n by gcd(σ − σ*, n).
3.2 Initialization
At the beginning of the attack, we require a ciphertext c1 = g^σ mod n with σ > p and σ* = σ mod p < 2^{pLen−1}. This condition is easily tested by asking the ciphertext C = (c1, c2) to the decryption oracle. If we choose σ from the interval [2^{pLen}, 2^{pLen+1}], then σ mod p < 2^{pLen−1} is satisfied with probability at least 1/2. Thus we have the following initialization for our attack.

Initialization
Input: n, g, pLen
Output: C = (c1, c2) with σ > p, σ* = σ mod p < 2^{pLen−1}
1. Generate σ ∈_R [2^{pLen}, 2^{pLen+1}]
2. Compute C = (c1, c2), where c1 = g^σ mod n, c2 ∈_R {0,1}^*
3. Ask C to the decryption oracle. If we receive Reject 1, go to step 1
4. Return C
3.3 Outline of Attack
We explain the outline of the proposed attack. The attack guesses the bits of σ* = σ mod p from the most significant bit downwards. By Lemma 2, the attack can decide whether σ* is larger or smaller than a given bound. Let UB and LB denote the upper and lower bounds on σ* known from the oracle calls, respectively; UB and LB are stored as temporary values. The attacker tries to shrink the distance UB − LB by asking the oracle. From the initialization, we have LB = 0 and UB = 2^{pLen−1} at the beginning. Moreover we assume that p > 2^{pLen−1} + 2^{pLen−2}, which is satisfied with probability at least 1/2 for a randomly chosen pLen-bit prime.

We explain how to decide whether σ* > 2^{pLen−2} or not. We assume that the ciphertext has already been initialized. Let D = g^α mod n for α = 2^{pLen−2}. If we ask the ciphertext C′ = (c1/D mod n, c2) to the decryption oracle, from Lemma 2 we have the following relationship:

(1) σ* > 2^{pLen−2} ⇔ Reject 2
(2) σ* < 2^{pLen−2} ⇔ Reject 1
Therefore we know that σ* lies in the interval [0, 2^{pLen−2}] or [2^{pLen−2}, 2^{pLen−1}]. Indeed, we assign LB = Av if Reject 2 is received, and UB = Av otherwise, where Av = (LB + UB)/2. In order to guess the next most significant bit, the following normalization of the ciphertext is executed. If σ* is in the upper interval [2^{pLen−2}, 2^{pLen−1}], then the ciphertext is normalized by calculating c1/D mod n with D = g^α mod n for α = 2^{pLen−2}; here c1/D mod n was already computed in the previous step, and we just assign c1 = c1/D mod n if σ* lies in the upper interval. Then we manipulate the new ciphertext with D = g^α mod n for α = 2^{pLen−3}. From p > 2^{pLen−1} + 2^{pLen−2}, the prime p satisfies the assumption of Lemma 2 for α = 2^{pLen−3}, namely p > 2^{pLen−1} + 2^{pLen−3}. By asking C′ = (c1/D, c2) to the oracle, we learn whether σ* is in the interval [0, 2^{pLen−3}] or [2^{pLen−3}, 2^{pLen−2}], and thus σ* lies in one of the intervals [(i − 1)·2^{pLen−3}, i·2^{pLen−3}] for i = 1, 2. Consequently we assign the new upper/lower bound of σ*: LB = Av if Reject 2, or UB = Av otherwise, where Av = (LB + UB)/2. Iterating these steps, the lower bits of the integer σ* can be found. We eventually obtain an approximation of σ* with a small error bound.
3.4 Details of Algorithm
We describe the algorithm that factors the modulus n using the reject timing attack.

RTA_EPOC
Input: n, g, pLen (public key)
Output: p, q (secret primes)
1. σ, C = (c1, c2) ← Initialization(n, g, pLen)
2. LB = 0, UB = 2^{pLen−1}
3. For i = 2 to pLen:
4.   α = 2^{pLen−i}, A = c1/g^α mod n
5.   Av = (LB + UB)/2
6.   Ask C′ = (A, c2) to the decryption oracle for random c2
7.   If Reject 2, then c1 = A, LB = Av
8.   If Reject 1, then UB = Av
9. If n > p = gcd(σ − σ*, n) > 1 for some σ* ∈ [LB, UB], then compute q = n/p^2
10. Return p, q
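A toy transcription of RTA_EPOC in Python, continuing the previous sketches (it reuses g, n, pLen, and reject_oracle). The initialization loop and the final gcd step follow the algorithm above; the variable names are ours.

import math
import random

def rta_epoc():
    # Initialization: find sigma > p with sigma* = sigma mod p < 2^(pLen-1).
    while True:
        sigma = random.randrange(1 << pLen, 1 << (pLen + 1))
        c1 = pow(g, sigma, n)
        if reject_oracle(c1) != REJECT1:
            break
    lb, ub = 0, 1 << (pLen - 1)
    for i in range(2, pLen + 1):
        alpha = 1 << (pLen - i)
        A = (c1 * pow(pow(g, alpha, n), -1, n)) % n   # A = c1 / g^alpha mod n
        av = (lb + ub) // 2
        if reject_oracle(A) == REJECT2:               # sigma* > alpha
            c1, lb = A, av                            # normalize and raise LB
        else:                                         # Reject 1: sigma* < alpha
            ub = av
    for s_star in (lb, ub):                           # UB - LB = 1 at the end
        f = math.gcd(sigma - s_star, n)
        if 1 < f < n:
            return f
    return None

assert rta_epoc() == p            # recovers the secret prime of the toy key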
In step 1, the first component c1 of the ciphertext satisfies σ* = [[c1]]_g < 2^{pLen−1}. The difference UB − LB in step 9 is at most 1, because we iterate the approximation-finding loop pLen − 1 times; hence the gcd computation in step 9 is performed at most twice. If gcd(σ − σ*, n) = 1 or n holds, the algorithm fails to factor the modulus. If the prime p satisfies the condition p > 2^{pLen−1} + 2^{pLen−2}, the algorithm always outputs the prime p due to Lemma 2. If we choose the prime randomly from 2^{pLen−1} < p < 2^{pLen}, this requirement is satisfied with probability at least 1/2. Thus we have the following theorem.
Theorem 1. Algorithm RTA_EPOC can factor the modulus n with probability at least 1/2 if the secret prime p is randomly chosen from the pLen-bit primes.
Note that our attack is not restricted to the above conditions. The algorithm works in general situations, although the success probability may change.

3.5 An Example
We demonstrate an example of our proposed attack against EPOC-2. A key from the test vector distributed by NTT [EPOC] is examined; namely, the public key we tested is as follows:

g = 2
n = 4152082246314238505355867044990543688751999781554451624701106598380392
    1542404818130493308730652602259005592361720580572637999435883733867663
    8939981704437437451639350210369269495068539708532435959993658412592819
    4115043204081322843398774201030468222769615766429364969134206293259707
    9108707252040308702094410062749766137657427879520751496889474301533
The initial integer σ should satisfy both σ > 2^{pLen} and σ* = σ mod p < 2^{pLen−1}. The criterion σ* < 2^{pLen−1} is examined by asking C = (c1, c2) to the decryption oracle, where c1 = g^σ mod n and c2 is a random integer. We chose the following value:

σ = 459673101604635995219856896542867619161831215705589851585465126859
    600945206135700890840822595308528953765266714945265
Then we compute the main loop of our proposed algorithm. At step i the ciphertext C = (c1, c2) is manipulated by computing c1/D mod n with D = g^α mod n for α = 2^{pLen−i}. The manipulated ciphertext is asked to the decryption oracle, and the attacker learns the lower bound LB and the upper bound UB of the approximation of the integer σ*. The difference UB − LB shrinks with each iteration. We list the first and last few values of the lower bound LB and the upper bound UB of the integer σ* from our experiment.

i = 2, Reject 2:
  LB[2] = 0
  UB[2] = 9850501549098619803069760025035903451269934817616361666987073351061430442874302652853566563721228910201656997576704
i = 3, Reject 1:
  LB[3] = 4925250774549309901534880012517951725634967408808180833493536675530715221437151326426783281860614455100828498788352
  UB[3] = UB[2]
i = 4, Reject 2:
  LB[4] = LB[3]
  UB[4] = 7387876161823964852302320018776927588452451113212271250240305013296072832155726989640174922790921682651242748182528
...
i = 382, Reject 1:
  LB[382] = 5067002360797088795877355807121865756660222766459313556096647551492187643373325124918522171908865342977731283270068
  UB[382] = LB[381]
i = 383, Reject 2:
  LB[383] = LB[382]
  UB[383] = 5067002360797088795877355807121865756660222766459313556096647551492187643373325124918522171908865342977731283270070
i = 384, Reject 1:
  LB[384] = 5067002360797088795877355807121865756660222766459313556096647551492187643373325124918522171908865342977731283270069
  UB[384] = LB[383]
At the end of the main loop, we know UB − LB = 1. Finally we compute gcd(σ − σ*, n) for the integers σ* ∈ [LB[384], UB[384]]. If 0 < gcd(σ − σ*, n) < n holds, we obtain the secret prime p = gcd(σ − σ*, n) and the other factor by computing q = n/p^2. In our example, we successfully obtained the secret prime p:

gcd(σ − σ*, n) = 3788384160365324220199829506131214611709758274492754483578
                 0706609009063130230197980493525035283305300898961285972933
3.6 Dent's Attack
Dent has proposed a similar reject timing attack against the EPOC-2 cryptosystem [Den02a, Den02b]. His construction of the factoring algorithm is slightly different from ours. We explain his idea in the following (a toy sketch is given at the end of this subsection). Instead of using Lemma 2, the attacker directly finds the bit representation of the prime p, from the most significant bit down to the lower bits. In order to learn the second most significant bit, the attacker asks c1 = g^{2^{k−1} + 2^{k−2}} to the decryption oracle, where k = pLen. Note that the second most significant bit is one if and only if Reject 1 is received from the decryption oracle, due to p > 2^{k−1} + 2^{k−2}. After learning the i most significant bits a2, ..., ai, the attacker asks the ciphertext

c1 = g^{2^{k−1} + a2·2^{k−2} + ... + ai·2^{k−i} + 2^{k−i−1}}    (1)

to the decryption oracle. He/she then knows that the (i + 1)-th most significant bit is one if and only if Reject 1 is received. His construction directly detects the secret prime p from the most significant bits instead of recovering the ephemeral integer σ* as in our proposed construction.

The paper [Den02a] only describes the mathematical outline of his attack. Although the attack is strongly based on the previous papers [JQY98, JQY01], no comparison with these attacks is given in [Den02a]. In the following, we describe the additional contributions of our paper. Section 2.1 gives a comprehensive historical review of the attacks related to EPOC-2, considering previous references and other standards, namely IEEE, CRYPTREC, and ISO. In Section 3.5 we show an example of the attack using the test vector from the designer. Section 3.7 describes precisely how to repair EPOC-2. In Section 4 we discuss how the proposed attack can be extended to other provably secure cryptosystems, and we present comprehensive references related to this topic.
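For comparison, the following toy run of Dent's bit-by-bit variant reuses the earlier sketch (pLen, g, n, and reject_oracle). The handling of the last candidate is our reading of the attack, since the oracle answers "p > e" rather than "p ≥ e".

# Toy run of Dent's variant: recover the bits of the secret prime p
# directly from the reject oracle (k = pLen; the leading bit a_1 is 1).
P = 1 << (pLen - 1)
for i in range(2, pLen + 1):
    e = P + (1 << (pLen - i))              # guess that bit a_i of p is one
    if reject_oracle(pow(g, e, n)) == REJECT1:
        P = e                              # Reject 1 <=> p > e, so a_i = 1
# The oracle answers "p > e", so p is now P or P + 1 (our reading).
p_found = P if n % P == 0 else P + 1
assert p_found == p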
3.7 How to Repair EPOC-2
The proposed attack against EPOC-2 is effective because there are two different rejection processes. One possibility to resist the attack is to use only one rejection function.
Modified Decryption
  σ* = L(c1^{p−1} mod p^2) · L(g_p^{p−1} mod p^2)^{−1} mod p,
  m* = c2 ⊕ G(σ*), c1* = g^{σ*} h^{H(m*,σ*,c2)} mod q.
  Event 1 = {|σ*| ≤ k − 1}, Event 2 = {c1 = c1*}.
  Set Γ = {Event 1 ∧ Event 2}.
  If Γ = 1, output m* as the decryption of (c1, c2); otherwise return Reject.
The decryption oracle always computes both σ* and c1*. Then the Boolean functions Event 1 = {|σ*| ≤ k − 1} and Event 2 = {c1 = c1*} are evaluated, and the control bit Γ = {Event 1 ∧ Event 2} is assigned. If Γ = 1 holds, m* is output as the decryption of (c1, c2); otherwise Reject is returned. Because the timings for computing the value of Event 1 and the control bit Γ are negligible, the attacker cannot learn the value of Event 1. On the other hand, the implementer also has to take care with the treatment of Event 1 and Event 2. If the history of Event 1 is stored in a log file, then the attacker can perform the proposed attack by reading the log file. This was discussed by Manger [Man01] and was extended to the memory dump attack [KCJLMWY01]. As described in the current version of PKCS #1, the implementer should make efforts not to let Event 1 be correlated with the decrypted ciphertexts.
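A sketch of the modified decryption in Python, continuing the earlier toy setup. Note that a real implementation must additionally ensure that evaluating Event 1, Event 2, and Γ leaks nothing through timing or logs; the sketch only illustrates that both checks are always computed before any reject is returned.

# Single-reject decryption (continuing the toy sketch): both events are
# always evaluated and only one reject symbol exists.
def modified_decrypt(c1, c2):
    sigma_star = primitive_decrypt(c1)
    m_star = c2 ^ mask("G", sigma_star)                    # always computed
    r = mask("H", m_star, sigma_star, c2)
    c1_star = (pow(g, sigma_star, q) * pow(h, r, q)) % q   # always computed
    event1 = sigma_star < (1 << (pLen - 1))                # |sigma*| <= k - 1
    event2 = (c1 % q) == c1_star
    if event1 and event2:
        return m_star
    return "Reject"                                        # single symbol

assert modified_decrypt(*encrypt(42, 123)) == 42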
4 Relation to Other Cryptosystems
In this section we discuss how the proposed attack can be extended to other provably secure cryptosystems. EPOC-2 consists of the encryption primitive of Okamoto and Uchiyama [OU98] and the conversion technique of Fujisaki and Okamoto [FO99b] that makes the encryption primitive semantically secure against the chosen ciphertext attack. We can consider two possible variations of EPOC-2: (1) replacing the conversion technique with another one; (2) replacing the encryption primitive with another one. We discuss how these variations resist the proposed attack.
4.1 Other Conversion Techniques
We can make the EPOC primitive secure against the chosen ciphertext attack using different conversions. Fujisaki and Okamoto proposed a conversion technique that converts an IND-CPA scheme into an IND-CCA one [FO99a]. The Fujisaki-Okamoto conversion of [FO99a] applied to the EPOC primitive is called EPOC-1. The EPOC primitive is IND-CPA under a non-standard assumption, namely the p-subgroup assumption [OU98], and there is no significant advantage for EPOC in using this conversion. Pointcheval proposed a general conversion technique that can convert a one-way function into an IND-CCA2 scheme [Poi00]; however, the security reduction is not very tight. A conversion technique that has a tight security reduction from the encryption primitive is the REACT conversion [OP01], which is based on the conversion proposed by Bellare and Rogaway [BR93]. The REACT conversion with the EPOC primitive is called EPOC-3. In Figure 2, we show
Encryption of m
  m ∈ {0,1}^*, a message; σ ∈ {0,1}^{pLen−1}, a random integer
  c2 = m ⊕ G(σ), c1 = g^σ h^r mod n for random r ∈ {0,1}^{rLen}
  c3 = H(m, σ, c1, c2)
  The ciphertext: (c1, c2, c3)
Decryption of c
  σ* = L(c1^{p−1} mod p^2) · L(g_p^{p−1} mod p^2)^{−1} mod p
  If |σ*| ≤ pLen − 1, then go to the next step; otherwise return Reject.
  m* = c2 ⊕ G(σ*); if c3 = H(m*, σ*, c1, c2) holds,
  then output m* as the decryption of (c1, c2, c3); otherwise return Reject.
Fig. 2. EPOC using REACT Conversion
a construction of EPOC using the REACT conversion, modified — the original description in [OP01] does not support two different rejection symbols — in order to compare the security of the converted scheme against the proposed attack with that of EPOC-2. Here H is the hash function used for the integrity check in the decryption oracle. In this construction there are two different reject functions. If the calculation of m* = c2 ⊕ G(σ*) and c3 = H(m*, σ*, c1, c2) is relatively slow, then the attacker has a chance to tell the difference between the two reject symbols. However, the computation time of hash functions is generally very fast, whereas the integrity check of EPOC-2 computes two modular exponentiations. The attacker therefore has a much larger chance of breaking EPOC-2 using the reject timing attack. Similarly, EPOC-1 using the Fujisaki-Okamoto conversion is vulnerable to the reject timing attack, because it utilizes the re-encryption technique. Recently Coron et al. proposed the GEM family [CHJPPT02a, CHJPPT02b]. Their conversion technique is based on hash functions and a symmetric key cryptosystem — the invalid ciphertexts are rejected by an integrity test using the hash functions and the symmetric key cryptosystem. The computation time of these integrity tests is much faster than that of the re-encryption test of EPOC-2, and the reject timing attack on the GEM family is thus more difficult.
4.2 Other Encryption Primitives
The conversion technique by Fujisaki and Okamoto is designed to convert any one-way function into an IND-CCA2 scheme [FO99b]. The Fujisaki-Okamoto conversion is applicable to other cryptographic primitives, and we discuss the possibility of adapting our attack to them. We briefly describe their conversion technique in the following; we do not describe the hybrid version using a symmetric key system, but the scheme using hash functions. Let (pk, sk) be the key pair for a given security parameter k. Let MSP be the message space and let k1 be the size of the message space. E_pk is the
Encryption of m
  m ∈ {0,1}^*, a message; σ ∈_R MSP
  c2 = m ⊕ G(σ), c1 = E_pk(σ, H(m, σ))
  The ciphertext: c = (c1, c2)
Decryption of c
  σ* = D_sk(c1)
  If σ* ∈ MSP, then go to the next step; otherwise return Reject.
  m* = c2 ⊕ G(σ*); if c1 = E_pk(σ*, H(m*, σ*)) holds,
  then output m* as the decryption of (c1, c2); otherwise return Reject.
Fig. 3. Fujisaki-Okamoto Conversion
encryption function that encrypts a message in MSP together with a k2-bit random integer. D_sk is the decryption function, which satisfies D_sk(E_pk(σ, r)) = σ for σ ∈ MSP and a random k2-bit integer r. We use a mask generation function G : {0,1}^{k1} → {0,1}^* and a hash function H : {0,1}^* → {0,1}^{k2}. In Figure 3 we describe the Fujisaki-Okamoto conversion technique. Here we have two different rejection functions. The first one arises from checking D_sk(c1) ∈ MSP. In the case of EPOC-2, the message space MSP is equal to {0,1}^{pLen−1}, which is strictly smaller than the range of D_sk(c1). However, most standard cryptographic primitives, like RSA and ElGamal-type encryption, are permutations and satisfy the following condition:

MSP = {D_sk(c1) | c1 = E_pk(σ, r) for σ ∈ MSP, r ∈ {0,1}^{k2}}.    (2)

The message space MSP is then not smaller than the space of the decrypted messages, and therefore no ciphertext is rejected by the first test. However, when we design a new cryptographic primitive, we have to take care with the treatment of the reject function. Cryptographic primitives that have a degenerated MSP are the Rabin-type cryptosystems [Bon01], [NSS01] and the NICE cryptosystem [BST01]; it is an interesting problem to investigate their security against the reject timing attack. Note that Manger's attack [Man01] is not effective on the Rabin-type cryptosystem, because the Rabin primitive has no reject function based on the size of the integer-to-octet conversion. On the other hand, the Paillier primitive is known as an extension of EPOC to the ring Z/n^2 Z, where n is an RSA modulus [Pai99]. The message space of the Paillier primitive is Z/nZ, which is equal to the space of the decrypted messages, and thus the Paillier primitive has no reject function based on checking D_sk(c1) ∈ MSP. We cannot break a cryptosystem based on the Paillier primitive using the reject timing attack.
5 Conclusion
We proposed a reject timing attack against EPOC-2 that observes the timing of two different reject functions. The proposed attack can factor the public modulus, and EPOC-2 is thereby totally broken. For random 384-bit primes, a public modulus can be factored with probability at least 1/2 by submitting about 385 manipulated ciphertexts. In order to repair this flaw of EPOC-2, a unique rejection function should be used throughout the decryption process, and the reject symbol should be returned only at the end of the whole decryption process. The countermeasures against Manger's attack or the memory dump attack can also effectively resist our proposed attack.
Acknowledgements

We would like to thank the anonymous referees and Alex Dent for their valuable comments.
References

[BR93] M. Bellare and P. Rogaway, "Random oracles are practical: a paradigm for designing efficient protocols," First ACM Conference on Computer and Communications Security, pp. 62-73, 1993.
[BDPR98] M. Bellare, A. Desai, D. Pointcheval, and P. Rogaway, "Relations among notions of security for public-key encryption schemes," Advances in Cryptology - CRYPTO '98, LNCS 1462, pp. 26-45, 1998.
[Bon01] D. Boneh, "Simplified OAEP for the RSA and Rabin Functions," Advances in Cryptology - CRYPTO 2001, LNCS 2139, pp. 275-291, 2001.
[BST01] J. Buchmann, K. Sakurai, and T. Takagi, "An IND-CCA2 Public-Key Cryptosystem with Fast Decryption," Information Security and Cryptology - ICISC 2001, LNCS 2288, pp. 51-71, 2001.
[CHJPPT02a] J.-S. Coron, H. Handschuh, M. Joye, P. Paillier, D. Pointcheval, and C. Tymen, "Optimal Chosen-Ciphertext Secure Encryption of Arbitrary-Length Messages," Public Key Cryptography 2002, LNCS 2274, pp. 17-33, 2002.
[CHJPPT02b] J.-S. Coron, H. Handschuh, M. Joye, P. Paillier, D. Pointcheval, and C. Tymen, "GEM: A Generic Chosen-Ciphertext Secure Encryption Method," Topics in Cryptology - CT-RSA 2002, LNCS 2271, pp. 263-276, 2002.
[CRYPTREC] CRYPTREC, Evaluation of Cryptographic Techniques, IPA. http://www.ipa.go.jp/security/enc/CRYPTREC/.
[Den02a] A. Dent, "An implementation attack against the EPOC-2 public-key cryptosystem," Electronics Letters, 38(9), p. 412, 2002.
[Den02b] A. Dent, "An evaluation of EPOC-2," New European Schemes for Signatures, Integrity, and Encryption (NESSIE), http://www.cryptonessie.org/.
[EPOC] EPOC, Efficient Probabilistic Public-Key Encryption. http://info.isl.ntt.co.jp/epoc/.
[FO99a] E. Fujisaki and T. Okamoto, "How to Enhance the Security of Public-Key Encryption at Minimum Cost," 1999 International Workshop on Practice and Theory in Public Key Cryptography, LNCS 1560, pp. 53-68, 1999.
[FO99b] E. Fujisaki and T. Okamoto, "Secure Integration of Asymmetric and Symmetric Encryption Schemes," Advances in Cryptology - CRYPTO '99, LNCS 1666, pp. 537-554, 1999.
[FO01] E. Fujisaki and T. Okamoto, "A Chosen-Cipher Secure Encryption Scheme Tightly as Secure as Factoring," IEICE Trans. Fundamentals, Vol. E84-A, No. 1, pp. 179-187, 2001.
[IEEE] IEEE P1363, Standard Specifications for Public-Key Cryptography, 2000. Available from http://grouper.ieee.org/groups/1363/.
[JQY98] M. Joye, J.-J. Quisquater, and M. Yung, "The Policeman in the Middle Attack," presented at the rump session of Eurocrypt '98, 1998.
[JQY01] M. Joye, J.-J. Quisquater, and M. Yung, "On the Power of Misbehaving Adversaries and Security Analysis of the Original EPOC," Topics in Cryptology - CT-RSA 2001, LNCS 2020, pp. 208-222, 2001.
[KCJLMWY01] S. Kim, J. Cheon, M. Joye, S. Lim, M. Mambo, D. Won, and Y. Zheng, "Strong Adaptive Chosen-Ciphertext Attacks with Memory Dump (or: The Importance of the Order of Decryption and Validation)," Cryptography and Coding, 8th IMA International Conference, LNCS 2260, pp. 114-127, 2001.
[KR02] V. Klima and T. Rosa, "Further Results and Considerations on Side Channel Attacks on RSA," Cryptology ePrint Archive, Report 2002/071, 2002. http://eprint.iacr.org/2002/071/.
[Koc96] C. Kocher, "Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems," Advances in Cryptology - CRYPTO '96, LNCS 1109, pp. 104-113, 1996.
[KJJ99] C. Kocher, J. Jaffe, and B. Jun, "Differential Power Analysis," Advances in Cryptology - CRYPTO '99, LNCS 1666, pp. 388-397, 1999.
[Man01] J. Manger, "A Chosen Ciphertext Attack on RSA Optimal Asymmetric Encryption Padding (OAEP) as Standardized in PKCS #1 v2.0," Advances in Cryptology - CRYPTO 2001, LNCS 2139, pp. 230-238, 2001.
[NSS01] M. Nishioka, H. Satoh, and K. Sakurai, "Design and Analysis of Fast Provably Secure Public-Key Cryptosystems Based on a Modular Squaring," Information Security and Cryptology - ICISC 2001, LNCS 2288, pp. 81-102, 2001.
[NESSIE] NESSIE, New European Schemes for Signatures, Integrity, and Encryption, IST-1999-12324. https://www.cosic.esat.kuleuven.ac.be/nessie/.
[Nov02] R. Novak, "SPA-Based Adaptive Chosen-Ciphertext Attack on RSA Implementation," Public Key Cryptography 2002, LNCS 2274, pp. 252-262, 2002.
[OP01] T. Okamoto and D. Pointcheval, "REACT: Rapid Enhanced-security Asymmetric Cryptosystem Transform," Proceedings of the Cryptographers' Track at RSA Conference 2001, LNCS 2020, pp. 159-175, 2001.
[OU98] T. Okamoto and S. Uchiyama, "A New Public-Key Cryptosystem as Secure as Factoring," Eurocrypt '98, LNCS 1403, pp. 308-318, 1998.
[Pai99] P. Paillier, "Public-Key Cryptosystems based on Composite Degree Residuosity Classes," Eurocrypt '99, LNCS 1592, pp. 223-238, 1999.
[Poi00] D. Pointcheval, "Chosen-ciphertext security for any one-way cryptosystem," Public Key Cryptography 2000, LNCS 1751, pp. 129-146, 2000.
[Sho01] V. Shoup, "A Proposal for an ISO Standard for Public-Key Encryption (version 2.1)," http://www.shoup.net.
Hardware Fault Attack on RSA with CRT Revisited

Sung-Ming Yen¹*, Sangjae Moon², and Jae-Cheol Ha³

¹ Laboratory of Cryptography and Information Security (LCIS), Dept. of Computer Science and Information Engineering, National Central University, Chung-Li, Taiwan 320, R.O.C. [email protected], http://www.csie.ncu.edu.tw/~yensm/
² School of Electronic and Electrical Engineering, Kyungpook National University, Taegu, Korea 702-701. [email protected]
³ Dept. of Computer and Information, Korea Nazarene University, Choong Nam, Korea 330-718. [email protected]
Abstract. In this paper, some powerful fault attacks are pointed out which can be used to factorize the RSA modulus if CRT is employed to speed up the RSA computation. These attacks are generic and are applicable to Shamir's countermeasure as well as to a recently published enhanced countermeasure (which tries to improve Shamir's method) for RSA with CRT. These two countermeasures share a similar structure in their designs, and both suffer from some of the proposed attacks. The first kind of attack proposed in this paper induces a fault (which can be either a computational fault or any fault when data is being accessed) into an important modulo reduction operation of the above two countermeasures. Note that this hardware fault attack can be detected neither by Shamir's countermeasure nor by the recently announced enhancement. The second kind of attack proposed in this paper considers permanent faults on some stored parameters of the above two countermeasures. The result shows that some permanent faults cannot be detected, and hence the CRT-based factorization attack still works. The proposed CRT-based fault attacks once again reveal the importance of developing a sound countermeasure for RSA with CRT.
Keywords: Chinese remainder theorem (CRT), Computational fault, Cryptography, Factorization, Hardware fault cryptanalysis, Memory access fault, Permanent fault, Physical cryptanalysis, Side channel attack, Transient fault.
* This work was supported by the Mobile Network Security Research Center, School of Electronic and Electrical Engineering, Kyungpook National University, Korea. The first author was also supported in part by the Computer & Communication Research Laboratories (CCL), Industrial Technology Research Institute (ITRI), Republic of China.
1 Introduction
In order to provide better support for data protection under strong cryptographic schemes (e.g., the RSA [1] or ElGamal [2] systems), a variety of implementations based on tamper-proof devices (e.g., smart IC cards) have been proposed. Due to this popular usage of tamper resistance, much attention has recently been paid to the security issues of cryptosystems implemented on tamper-proof devices [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18] from the viewpoint of hardware faults. This category of cryptanalysis is called fault-based cryptanalysis.

In the fault-based cryptanalysis model, it is assumed that when an adversary has physical access to a tamper-proof device she may purposely induce a certain type of fault into the device. Based on a set of incorrect responses or outputs from the device, due to the presence of faults, the adversary can then extract the secrets embedded in the tamper-proof device. These attacks exploit the presence of many kinds of transient or permanent faults; they are of a very general nature and remain valid for a large variety of cryptosystems, e.g., the LUC public-key cryptosystem [19] and elliptic curve cryptography (ECC).

In this paper, we focus our attention on public-key cryptosystems whose computation can be sped up using the Chinese remainder theorem (CRT) [20, 21]. These cryptosystems may be vulnerable to hardware fault cryptanalysis revealing the secret key if the following three conditions are met: (1) the message m to sign (or to decrypt) is known, or the correct signature on the message m is available; (2) a random fault occurs during the computation in the residue number system; (3) the device outputs the faulty signature on the message m. This kind of attack is called CRT-based hardware fault cryptanalysis [6, 13, 14].

Our main objective is to emphasize the importance of a careful implementation of cryptosystems with CRT-based speedup. Suppose you are in a context involving trusted third parties (e.g., banks) where thousands of signatures are produced each day. If, by accident, a single signature is faulty, then the security of the whole system may be compromised. Leakage of the secret key can be avoided by making sure that at least one of the above three conditions fails to hold.

Shamir developed a countermeasure [16, 17] against the CRT-based hardware fault attack which has become widely known and employed. However, a very recent research result by a group of researchers from Infineon pointed out a serious security flaw [22] in Shamir's countermeasure. A countermeasure was also provided [22] in order to fix the security flaw. This article was recently published again [23] at the CHES 2002 conference with only minor modifications.

Both Shamir's [16, 17] and Infineon's [22, 23] countermeasures were designed in a heuristic manner; no rigorous security proof, or even a detailed security analysis, was given. The main contribution of this paper is that two new CRT-based fault attacks are proposed which are generic and can be
applicable to both Shamir’s and Infineon’s1 countermeasures. Most importantly, it will be shown that these new attacks are much more powerful and feasible than the recent Infineon’s attack on Shamir’s countermeasure. The proposed CRT based attacks are nontrivial since the CRT speedup technique has already been widely employed for almost all popular implementations. Many application scenarios should especially be taken care of under this physical attack model. One of them is the attack on smart IC card since it is mostly easy to expose under a dangerous environment of physical attack. Another example is the attack on some large servers for the purpose of signature generation since a large amount of signatures may be generated within a short time. Banking servers or certification server are two typical examples of such attack targets.
2 Preliminary Background of CRT-Based Cryptanalysis

2.1 Chinese Remainder Theorem
The Chinese remainder theorem [21] (CRT) says that, given a set of integers (moduli) n1, n2, ..., nk that are pairwise relatively prime, the system of simultaneous congruences

s ≡ s1 (mod n1), s ≡ s2 (mod n2), ..., s ≡ sk (mod nk)

has a unique solution modulo n = n1 · n2 ··· nk. For the case of the RSA secret computation, there are two moduli p and q, and the unique integer solution s can be computed by the following two well-known CRT recombination algorithms. Given sp and sq, one possible CRT recombination computes

s = (sp · q · (q^{−1} mod p) + sq · p · (p^{−1} mod q)) mod n = (sp · Xp + sq · Xq) mod n    (1)
where both Xp and Xq can be precomputed and stored in advance. The above method is often called Gauss's algorithm [21, p. 68]. There is a well-known improved CRT recombination, called Garner's algorithm [21, pp. 612-613], which computes

s = sq + ((sp − sq) · (q^{−1} mod p) mod p) · q.    (2)
¹ In fact, the first attack is only applicable to the original Infineon countermeasure [22] published in the ePrint Archive of IACR. The slightly modified version [23] published at CHES 2002 is free from the first attack; however, no explanation of why their original design was modified can be found there. In this paper, a detailed analysis of the attack and of how to counteract it will be provided.
This CRT-based method was proposed to speed up the RSA signature and decryption computation [20]. The CRT-based speedup for the RSA computation has been widely adopted as an implementation standard, with a performance about four times faster than direct computation in terms of bit operations. The applications range from large servers to very tiny smart IC cards. On servers, a huge amount of very frequent RSA computations has to be performed. On smart IC cards, the card processors are often not powerful enough to perform complicated cryptographic computations, e.g., RSA signature generation and decryption; the CRT speedup technique therefore provides substantial assistance.
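The following self-contained Python sketch illustrates RSA-CRT signing with Garner's recombination, equation (2); the toy primes and the public exponent e = 17 are our illustrative choices (far too small to be secure).

# Toy RSA-with-CRT signing via Garner's algorithm, equation (2).
p, q = 1009, 1013                     # toy secret primes
n = p * q
e = 17                                # gcd(e, (p-1)(q-1)) = 1
d = pow(e, -1, (p - 1) * (q - 1))
q_inv = pow(q, -1, p)                 # precomputed q^-1 mod p

def sign_crt(m):
    sp = pow(m % p, d % (p - 1), p)   # half-length exponentiation mod p
    sq = pow(m % q, d % (q - 1), q)   # half-length exponentiation mod q
    # Garner's recombination, equation (2):
    return sq + (((sp - sq) * q_inv) % p) * q

m = 1234567 % n
assert sign_crt(m) == pow(m, d, n)    # agrees with the direct computation
assert pow(sign_crt(m), e, n) == m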
2.2 The CRT-Based Cryptanalysis
Let p and q be two primes and n = p · q. In the RSA cryptosystem, a message m is signed with a secret exponent d as s = m^d mod n. Using the CRT-based approach, the value of s can be evaluated more efficiently by computing both sp = m^d mod p and sq = m^d mod q, and using the Chinese remainder theorem to reconstruct s.

Suppose that an error (any random error) occurs during the computation of sp (let ŝp denote the erroneous result), but the computation of sq is error free. Applying the CRT to both ŝp and sq will produce a faulty signature ŝ. The CRT-based hardware fault cryptanalysis [6, 13, 14] enables the factorization of n by computing

q = gcd((ŝ − s) mod n, n)    (3)

or

q = gcd((ŝ^e − m) mod n, n).    (4)

Notice that in the above computation gcd((ŝ^e − m) mod n, n) = gcd(ŝ^e − m, n). Similarly, p can be derived from a faulty signature computed by applying the CRT to a faulty ŝq and a correct sp.
3 The Proposed First Attack on Shamir's Countermeasure
Both Shamir’s and the following Infineon’s countermeasures were designed in heuristic approaches that no rigorous security proof nor detailed analysis was given. In their countermeasures, schemes were developed in order to be immune from the previously reviewed CRT attack. However, it will be shown in the following that their schemes themselves introduce a new possible attack. 3.1
Shamir’s Countermeasure
Shamir presented a simple countermeasure at the rump session of Eurocrypt '97 [16] and applied for a patent [17]; the countermeasure was announced to be secure against the factorization attack. In Shamir's countermeasure, it is assumed that the smart IC card stores the prime numbers p and q, the secret exponent d, and the precomputed parameter q^{−1} mod p (suppose that Garner's algorithm will be employed). For each RSA secret computation, a random prime r is chosen, and then p′ = p · r and dp′ = d mod ((p − 1) · (r − 1)) are computed. The intermediate value s′p = (m mod p′)^{dp′} mod p′ is computed, and then the partial signature sp = s′p mod p is derived. A value sq = s′q mod q is computed in a similar way, where s′q = (m mod q′)^{dq′} mod q′. The IC card checks whether

s′p ≡ s′q (mod r).    (5)
If the above check succeeds, then both sp and sq are assumed to be error free, and Garner's algorithm is employed to obtain the RSA signature as s = sq + ((sp − sq) · (q^{−1} mod p) mod p) · q. A toy sketch of this countermeasure is given below.
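A minimal Python sketch of Shamir's countermeasure on the toy RSA-CRT parameters of Section 2.1. The fixed checking prime r and the fault switch are our illustrative simplifications; in the real scheme a fresh random prime r is chosen for every signature.

# Shamir's countermeasure on the toy parameters.
r = 1019                                   # a small prime, r != p, q
p_, q_ = p * r, q * r                      # p' = p*r, q' = q*r
dp_ = d % ((p - 1) * (r - 1))              # d_p' = d mod (p-1)(r-1)
dq_ = d % ((q - 1) * (r - 1))

def sign_shamir(m, fault_in_reduction=False):
    sp_ = pow(m % p_, dp_, p_)             # s'_p = (m mod p')^{d_p'} mod p'
    sq_ = pow(m % q_, dq_, q_)
    if sp_ % r != sq_ % r:                 # Shamir's check, equation (5)
        return "error detected"
    sp = sp_ % p                           # the reduction targeted by our attack
    if fault_in_reduction:
        sp = (sp + 1) % p                  # simulated fault in s_p = s'_p mod p
    sq = sq_ % q
    return sq + (((sp - sq) * q_inv) % p) * q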
3.2 The Proposed Attack
Suppose that, by accident, a faulty result ŝp is obtained when s′p is reduced modulo p, and all other intermediate results are correct. Then the RSA modulus n can be factorized by the following two approaches.

In the first case, if the attacker can obtain both the correct signature s and the erroneous signature ŝ (obtained by recombining ŝp and sq), then the following computation factorizes n:

q = gcd(ŝ − s, n),

where the erroneous signature is ŝ = sq + ((ŝp − sq) · (q^{−1} mod p) mod p) · q if Garner's algorithm is employed. The attack works as well if Gauss's algorithm is employed. Notice that in this attack no knowledge of the related message m is required.

In the second case, if the attacker can only obtain the erroneous signature ŝ and its related message m, then the following computation factorizes n:

q = gcd(ŝ^e − m, n).

Again, the attack works as well if Gauss's algorithm is employed.

In Shamir's countermeasure, the scheme checks whether s′p ≡ s′q (mod r), and the checking result is employed to indicate the correctness of both s′p and s′q in a probabilistic manner. The correctness status of s′p and s′q is then implicitly used to infer the correctness of both sp and sq. Unfortunately, this check cannot detect any computational fault occurring during the computation of sp = s′p mod p, or any fault occurring when the operands s′p or p are accessed from memory. The checking result will always be true, since both s′p and s′q are error free. Therefore, the countermeasure is in vain in this situation.
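Continuing the sketch of Shamir's countermeasure, the following toy run shows that a fault injected into the reduction sp = s′p mod p passes check (5) and nevertheless factors n by both approaches.

m = 424242 % n
s = sign_shamir(m)                              # correct signature
s_hat = sign_shamir(m, fault_in_reduction=True)
assert s_hat != "error detected"                # check (5) does not see the fault
assert math.gcd(s_hat - s, n) == q              # first approach: s and s^
assert math.gcd(pow(s_hat, e, n) - m, n) == q   # second approach: m and s^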
3.3 Feasibility of the Proposed Attack
We now consider the feasibility of the proposed attack. Note especially that the hardware fault model assumed in this paper is a very weak assumption, which makes the attack much more feasible than almost all existing attacks.
Most previously published fault-based attacks assume the induced fault to be at a single bit or at a few bits, either at specific locations or at random locations. Some attacks assume that one or more bits are flipped during the fault attack. In our proposed attack, any kind of random computational fault when computing the modulo reduction operation sp = s′p mod p, or any kind of fault when either s′p or p is accessed prior to the reduction operation, is sufficient to conduct the attack. Such faults induced when accessing an operand are widely assumed; e.g., an attack developed by inducing a fault when accessing p in order to compute p′ = p · r was reported and experimented with in [22]. Note however that in our proposed attack a fault induced when accessing any one of the two operands of sp = s′p mod p is sufficient, and, furthermore, any computational fault induced during the modular reduction s′p mod p is also sufficient. In contrast, in the Infineon attack on Shamir's countermeasure [22], a random computational fault during the computation of p · r cannot be used to conduct their fault attack. Overall, the proposed attack is much more feasible compared with most existing hardware fault based attacks.
3.4 The Proposed Attack on Other Similar Variants
Conventionally, it is assumed that only dp and dq (instead of d) are stored inside the smart card when the CRT technique is employed. For this purpose, a variant of Shamir's countermeasure was described [24] in which s′p = (m mod p′)^{dp mod ((p−1)·(r−1))} mod p′ and spr = (m mod r)^{dp mod (r−1)} mod r are computed, and the IC card checks whether s′p ≡ spr (mod r). If this check succeeds, then s′p is assumed to be error free and sp = s′p mod p is computed. The case of sq is handled in a similar way.

The proposed fault-based attack is generic and applies to this variant of Shamir's countermeasure as well, by inducing a computational fault or memory access fault into the operation sp = s′p mod p. The check of whether s′p ≡ spr (mod r) cannot detect this induced fault, no matter whether the check is performed before or after the operation sp = s′p mod p.
4 The Proposed First Attack on Infineon's Countermeasure

4.1 Infineon's Countermeasure
A countermeasure developed by a group of researchers at Infineon was recently published [22]; it tried to overcome a security flaw of Shamir's countermeasure and to meet the practical constraint that an implementation of RSA with CRT often does not store the parameter $d$ directly (refer to [24]). Infineon's Attack. In [22], an attack on Shamir's countermeasure was briefly reported, which is summarized in the following. In Shamir's countermeasure, if a fault is induced when accessing $p$, then an erroneous $\hat{p}' = \hat{p} \cdot r$ will be obtained. By
this approach, an erroneous $\hat{s}'_p$ will eventually be computed, but the relationship $\hat{s}'_p \equiv s'_q \pmod{r}$ still holds. Therefore, the CRT attack still applies to Shamir's countermeasure under this attack. Possible Extensions of the Original Infineon's Attack. In fact, Infineon's attack depends on the details of the implementation. In Shamir's countermeasure, if the prime factor $p$ is accessed from memory and stored in a specific register, then the faulty $\hat{p}$ (due to an access fault) will still be used when computing $d'_p = d \bmod ((\hat{p}-1) \cdot (r-1))$. Therefore the attack succeeds since $(m^{d \bmod ((\hat{p}-1)\cdot(r-1))} \bmod \hat{p} \cdot r) \equiv (m^{d \bmod ((q-1)\cdot(r-1))} \bmod q \cdot r) \pmod{r}$.
However, there is often only a very limited number of registers in an IC card processor, so the prime factor $p$ will be accessed again (this time it is assumed to be error-free) when required to compute $d'_p$. Nevertheless, the attack still succeeds in this situation since $(m^{d \bmod ((p-1)\cdot(r-1))} \bmod \hat{p} \cdot r) \equiv (m^{d \bmod ((q-1)\cdot(r-1))} \bmod q \cdot r) \pmod{r}$. This observation of the re-access of the prime factor $p$ (and likewise $q$) extends Infineon's attack in the following scenario. If the attacker can induce a memory access fault to obtain $\hat{p}$ when computing $d'_p = d \bmod ((\hat{p}-1) \cdot (r-1))$, then another possible fault attack succeeds since $(m^{d \bmod ((\hat{p}-1)\cdot(r-1))} \bmod p \cdot r) \equiv (m^{d \bmod ((q-1)\cdot(r-1))} \bmod q \cdot r) \pmod{r}$.
Therefore, appropriate protection/detection should be employed when accessing $p$ or $q$ in order to be immune to the above extended attacks. Infineon's Proposal. In Infineon's countermeasure [22], it is assumed that the smart IC card stores the prime numbers $p$ and $q$, two partial secret exponents $d_p = d \bmod (p-1)$ and $d_q = d \bmod (q-1)$, and a precomputed parameter $q^{-1} \bmod p$. For each RSA secret computation, a random prime $r$ is chosen, then $p' = p \cdot r$ and $d'_p = d_p + \mathrm{random}_1 \cdot (p-1)$ are computed, where $\mathrm{random}_1$ is a random integer. $s'_p = m^{d'_p} \bmod p'$ is computed, and then a partial signature $s_p = s'_p \bmod p$ is computed. A value $s_q$ is computed in a similar way. The IC card first performs some simple checkings on the parameters $p'$ and $d'_p$ (likewise for $q'$ and $d'_q$) in order to avoid the previously mentioned security flaw in Shamir's method, say by checking whether $p' \bmod p = 0$ and $d'_p \bmod (p-1) = d_p$. Then, Garner's algorithm is employed to obtain the RSA signature as $s = s_q + ((s_p - s_q) \cdot (q^{-1} \bmod p) \bmod p) \cdot q$. The IC card checks whether

$$s \bmod p = s_p \quad \text{and} \quad s \bmod q = s_q \tag{6}$$
prior to releasing the final result $s$. In their countermeasure, the idea of Diffie-Hellman key distribution is employed to develop an additional checking by verifying the equality $(s'_p \bmod r)^{d'_q \bmod (r-1)} \equiv (s'_q \bmod r)^{d'_p \bmod (r-1)} \pmod{r}$.
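As a concrete reference point for the analysis that follows, here is a minimal sketch of this countermeasure on a toy RSA instance. The prime $r$ is fixed for brevity (the card draws a fresh random prime per signature), and all helper names are illustrative assumptions:

```python
import random

p, q = 10007, 10009          # toy primes
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))
dp, dq = d % (p - 1), d % (q - 1)
q_inv = pow(q, -1, p)
r = 1009                     # fixed small prime; fresh and random in practice

def protected_sign(m):
    p_, q_ = p * r, q * r
    dp_ = dp + random.randrange(1, r) * (p - 1)
    dq_ = dq + random.randrange(1, r) * (q - 1)
    # parameter checks against the flaw in Shamir's method
    assert p_ % p == 0 and dp_ % (p - 1) == dp
    assert q_ % q == 0 and dq_ % (q - 1) == dq
    sp_, sq_ = pow(m, dp_, p_), pow(m, dq_, q_)
    sp, sq = sp_ % p, sq_ % q
    # Garner's recombination
    s = sq + (((sp - sq) * q_inv) % p) * q
    # checking scheme of Eq. 6
    assert s % p == sp and s % q == sq
    # Diffie-Hellman style cross-check modulo r
    lhs = pow(sp_ % r, dq_ % (r - 1), r)
    rhs = pow(sq_ % r, dp_ % (r - 1), r)
    assert lhs == rhs
    return s

m = 123456
assert pow(protected_sign(m), e, n) == m   # the signature verifies
```

On a fault-free run, all checks pass; the next subsection explains why a fault injected into $s_p = s'_p \bmod p$ passes every one of them as well.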
4.2 The Proposed Attack
As in the proposed attack on Shamir's countermeasure, suppose that a computational fault or memory access fault is induced when performing the modular reduction $s_p = s'_p \bmod p$, so that a faulty result $\hat{s}_p$ is obtained. Then the RSA modulus $n$ can be factorized by the following two possible approaches. In the first case, given both the correct signature $s$ and the erroneous signature $\hat{s}$ (by recombining $\hat{s}_p$ and $s_q$ with Garner's algorithm), $n$ can be factorized by computing $q = \gcd(\hat{s} - s, n)$, where $\hat{s} = s_q + ((\hat{s}_p - s_q) \cdot (q^{-1} \bmod p) \bmod p) \cdot q$. This works as well if Gauss's algorithm is employed. In the second case, given the erroneous signature $\hat{s}$ and its related message $m$, $n$ can be factorized by computing $q = \gcd(\hat{s}^e - m, n)$. This works when either Garner's or Gauss's algorithm is employed. The two checking schemes in Infineon's countermeasure [22] are analyzed in the following. Firstly, the countermeasure checks whether $s \bmod p = s_p$ and whether $s \bmod q = s_q$. However, the signature $s$ is either correct, by recombining both $s_p$ and $s_q$, or incorrect, by recombining both $\hat{s}_p$ and $s_q$. This checking can only be used to verify the correctness of Garner's or Gauss's computation result. In the proposed attack, however, the computational fault or memory access fault is induced prior to the recombination computation. Therefore, the checking result will always be true if the checking procedure itself is error-free. Secondly, the countermeasure checks whether $(s'_p \bmod r)^{d'_q \bmod (r-1)} \equiv (s'_q \bmod r)^{d'_p \bmod (r-1)} \pmod{r}$, which depends only on $s'_p$, $s'_q$, and $r$. Since we assume that both $s'_p$ and $s'_q$ are error-free, the fault induced when computing $s_p = s'_p \bmod p$ cannot be detected by this checking scheme either. The above proves the validity of the proposed CRT attack.
5 Overall Consideration of the First Proposed Attack
All the above-mentioned countermeasures [16, 17, 22, 24] share the same technique, which includes the following components.
(1) Expanded computation: The original secret computations are performed in an expanded form. For example, in Shamir's countermeasure, the computation of $(m \bmod p)^{d_p} \bmod p$ is replaced by $(m \bmod p')^{d'_p} \bmod p'$, where $p' = p \cdot r$, $d'_p = d \bmod ((p-1) \cdot (r-1))$, and $r$ is a randomly chosen prime.
(2) Auxiliary computation: Additional computations are required in some countermeasures for the purpose of fault detection. For example, in [24] an auxiliary computation of $(m \bmod r)^{d_p} \bmod r$ is performed.
(3) Checking scheme: Some mathematical relationships between expanded computation results, or between expanded and auxiliary computation results, are checked in order to detect any possible fault occurring within the expanded computation. Different checking schemes are employed in different countermeasures. Some countermeasures use other additional checking
schemes that try to detect possible faults induced within other operations of the countermeasure. For example, Infineon's countermeasure [22] is rather complex since checkings for the CRT recombination and for some parameter manipulations are also provided.
(4) Transformation: The expanded computation results are transformed (usually by a modular reduction) into the necessary result. For example, in Shamir's countermeasure, $s'_p = (m \bmod p')^{d'_p} \bmod p'$ is transformed into the necessary partial signature $s_p = (m \bmod p)^{d_p} \bmod p$ by performing $s_p = s'_p \bmod p$.
All the attacks proposed above exploit a possible computational fault or memory access fault occurring during the transformation step, which the checking scheme cannot detect.
5.1 Possible Countermeasure
In the slightly modified version² of Infineon's countermeasure [23], the original checking procedure (given in [22]) listed in Eq. 6 was replaced by

$$s \equiv s'_p \pmod{p} \quad \text{and} \quad s \equiv s'_q \pmod{q}. \tag{7}$$
However, no explanation was given as to why this modification was made. We analyze this checking procedure in detail in the following. The checking of whether $s \equiv s'_p \pmod{p}$ is a combination of the following two verifications

$$s \equiv s_p \pmod{p} \tag{8}$$

and also

$$s_p \equiv s'_p \pmod{p}. \tag{9}$$
Checking whether $s \equiv s_p \pmod{p}$ serves to avoid some potential attacks induced within the CRT recombination operation, reported in [25]. On the other hand, checking whether $s_p \equiv s'_p \pmod{p}$ serves to avoid the attack proposed in this paper.
5.2 Enhanced Version of Shamir's Countermeasure
In the following, a similar idea is applied to develop an enhanced version of Shamir's original countermeasure.
(1) A random prime $r$ is chosen, then $p' = p \cdot r$, $q' = q \cdot r$, $d'_p = d \bmod ((p-1) \cdot (r-1))$, and $d'_q = d \bmod ((q-1) \cdot (r-1))$ are computed. The IC card checks whether $p' \bmod p = 0$ and $(d - d'_p) \bmod (p-1) = 0$. The checkings for the prime factor $q$ are similar to those for $p$.
² In fact, the primary modification was the replacement of Eq. 6.
(2) The intermediate values $s'_p = (m \bmod p')^{d'_p} \bmod p'$ and $s'_q = (m \bmod q')^{d'_q} \bmod q'$ are computed, then the partial signatures $s_p = s'_p \bmod p$ and $s_q = s'_q \bmod q$ are computed.
(3) The IC card then checks whether $s'_p \equiv s'_q \pmod{r}$. If the checking is correct, Garner's or Gauss's algorithm is then employed to obtain the RSA signature $s = \mathrm{CRT}(s_p, s_q)$.
(4) The IC card checks whether both $s \equiv s'_p \pmod{p}$ and $s \equiv s'_q \pmod{q}$ prior to sending out the computed signature $s$.
Any failure of a checking in the above countermeasure stops the computation. Note that the checking performed in step (4) can detect a computational fault in the modular reduction, a memory access fault when reading either $s'_p$ or $p$ (likewise $s'_q$ or $q$), and also any fault occurring during the CRT recombination. However, it cannot detect a potential attack that corrupts the value of $s'_p$ or $s'_q$ before $s_p$ or $s_q$ is computed. Interestingly, this potential attack can be detected by the checking $s'_p \equiv s'_q \pmod{r}$ performed in step (3).
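The following is a minimal sketch of the enhanced countermeasure on a toy RSA instance, with $r$ fixed for brevity (it should be a fresh random prime per signature); the names are illustrative assumptions rather than the paper's notation:

```python
p, q = 10007, 10009          # toy primes
n, e = p * q, 65537
d = pow(e, -1, (p - 1) * (q - 1))
q_inv = pow(q, -1, p)
r = 1009                     # fixed small prime; fresh and random in practice

def enhanced_sign(m):
    # step (1): expanded parameters and their checks
    p_, q_ = p * r, q * r
    dp_ = d % ((p - 1) * (r - 1))
    dq_ = d % ((q - 1) * (r - 1))
    assert p_ % p == 0 and (d - dp_) % (p - 1) == 0
    assert q_ % q == 0 and (d - dq_) % (q - 1) == 0
    # step (2): expanded computation, then transformation by reduction
    sp_, sq_ = pow(m, dp_, p_), pow(m, dq_, q_)
    sp, sq = sp_ % p, sq_ % q
    # step (3): cross-check modulo r, then Garner's recombination
    assert sp_ % r == sq_ % r
    s = sq + (((sp - sq) * q_inv) % p) * q
    # step (4): s = s'_p (mod p) and s = s'_q (mod q), recomputed from s'_p, s'_q
    assert s % p == sp_ % p and s % q == sq_ % q
    return s

m = 123456
assert pow(enhanced_sign(m), e, n) == m
```

Note that step (4) compares $s$ against $s'_p \bmod p$ recomputed at check time; comparing against the stored $s_p$ instead would miss the very fault this paper exploits.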
6 The Second Proposed Attack – Permanent Fault Attack on RSA with CRT
None of the previously existing countermeasures for RSA with CRT considered a potential and important fault attack, namely the permanent fault attack. In the permanent fault attack, some parameters of the RSA-with-CRT countermeasure may be permanently corrupted by the attacker (by some means) or damaged because of serious hardware malfunction or environmental factors. In the following, an analysis of the permanent fault attack on the enhanced Shamir countermeasure and on Infineon's countermeasure is given.
6.1 Permanent Fault on p or q
We first consider the case where a permanent fault is induced on the storage of $p$ and the value becomes $\hat{p}$. In this situation, an erroneous value $\hat{s}_p$ will eventually be obtained. Without a proper countermeasure, the faulty signature $\hat{s}$ can be used to factorize $n$. It can be verified that in both the enhanced Shamir and Infineon countermeasures, all checkings except $s \equiv s'_p \pmod{p}$ cannot detect this permanent fault. Suppose that Garner's CRT recombination algorithm is employed; then the checking procedure $s \equiv s'_p \pmod{p}$ can detect the permanent fault because of the following property.

Theorem 1. In Garner's CRT recombination algorithm, given $s_p$, $s_q$, $(q^{-1} \bmod p)$, $q$, and a faulty $\hat{p}$, it follows that $s \not\equiv s_p \pmod{\hat{p}}$.
Proof. In Garner's algorithm, for the above given parameters, the signature $s$ is computed as $s = s_q + ((s_p - s_q) \cdot (q^{-1} \bmod p) \bmod \hat{p}) \cdot q$. Therefore,

$$s \equiv s_q + ((s_p - s_q) \cdot (q^{-1} \bmod p) \bmod \hat{p}) \cdot q \equiv s_q + (s_p - s_q) \cdot (q^{-1} \bmod p) \cdot q \not\equiv s_p \pmod{\hat{p}},$$

since $(q^{-1} \bmod p) \cdot q \not\equiv 1 \pmod{\hat{p}}$.
Because of the result $s_p \equiv s'_p \pmod{\hat{p}}$ and Theorem 1 showing $s \not\equiv s_p \pmod{\hat{p}}$, we have $s \not\equiv s'_p \pmod{\hat{p}}$. This proves that the checking procedure $s \equiv s'_p \pmod{p}$ can detect a permanent fault on $p$. On the other hand, suppose that a permanent fault has occurred on the storage of $q$ and the value becomes $\hat{q}$. The checking procedure $s \equiv s'_p \pmod{p}$ can also detect the permanent fault on $q$, based on the following Theorem 2 and the result $s_p \equiv s'_p \pmod{p}$.

Theorem 2. In Garner's CRT recombination algorithm, given $s_p$, $s_q$, $(q^{-1} \bmod p)$, $p$, and a faulty $\hat{q}$, it follows that $s \not\equiv s_p \pmod{p}$.

Proof. In Garner's algorithm, for the above given parameters, the signature $s$ is computed as $s = s_q + ((s_p - s_q) \cdot (q^{-1} \bmod p) \bmod p) \cdot \hat{q}$. Therefore,

$$s \equiv s_q + ((s_p - s_q) \cdot (q^{-1} \bmod p) \bmod p) \cdot \hat{q} \equiv s_q + (s_p - s_q) \cdot (q^{-1} \bmod p) \cdot \hat{q} \not\equiv s_p \pmod{p},$$

since $(q^{-1} \bmod p) \cdot \hat{q} \not\equiv 1 \pmod{p}$.
If Gauss's CRT recombination algorithm is employed, it can be verified that the checking procedure $s \equiv s'_p \pmod{p}$ still works. Suppose a permanent fault has occurred on $p$. Then the following Theorem 3 implies that the checking can detect the fault.

Theorem 3. In Gauss's CRT recombination algorithm, given $s_p$, $s_q$, $X_p = q \cdot (q^{-1} \bmod p)$, $X_q = p \cdot (p^{-1} \bmod q)$, $q$, and a faulty $\hat{p}$, it follows that $s \not\equiv s_p \pmod{\hat{p}}$.

Proof. In Gauss's algorithm, for the above given parameters, the signature $s$ is computed as $s = (s_p \cdot X_p + s_q \cdot X_q) \bmod (\hat{p} \cdot q)$. Therefore,

$$s \equiv (s_p \cdot X_p + s_q \cdot X_q) \not\equiv s_p \pmod{\hat{p}},$$

since $X_p \not\equiv 1 \pmod{\hat{p}}$ and $X_q \not\equiv 0 \pmod{\hat{p}}$.

The situation for detecting a permanent fault on $q$ is similar.
6.2 Permanent Fault on $d_p$ or $d_q$
In Shamir's original countermeasure and in the enhanced version proposed in Subsection 5.2, only the secret exponent $d$ is stored in the IC card. So, if a permanent fault occurs on $d$, both $s_p$ and $s_q$ will be erroneous, and the CRT-based hardware fault attack is no longer applicable. However, in Infineon's countermeasure [22, 23], a permanent fault on $d_p$ leads to a faulty $\hat{s}_p$ while the result $s_q$ remains correct. Therefore, the recombined signature $\hat{s} = \mathrm{CRT}(\hat{s}_p, s_q)$ enables the factorization of $n$. Unfortunately, no checking procedure in their countermeasure can detect this permanent fault. Given the faulty $\hat{d}_p$, Infineon's countermeasure computes $\hat{d}'_p = \hat{d}_p + \mathrm{random}_1 \cdot (p-1)$ and then checks whether $\hat{d}'_p \bmod (p-1) = \hat{d}_p$. Evidently, this checking is totally in vain for detecting the permanent fault $\hat{d}_p$. The IC card computes $\hat{s}'_p = m^{\hat{d}'_p} \bmod (p \cdot r)$ and the partial signature $\hat{s}_p = \hat{s}'_p \bmod p$. After that, the signature is computed as $\hat{s} = s_q + ((\hat{s}_p - s_q) \cdot (q^{-1} \bmod p) \bmod p) \cdot q$ and the checking procedure $\hat{s} \equiv \hat{s}'_p \pmod{p}$ is performed. However, this checking does not help to detect the existence of the permanent fault $\hat{d}_p$, since both $\hat{s} \equiv \hat{s}_p \pmod{p}$ and $\hat{s}_p \equiv \hat{s}'_p \pmod{p}$ are true. Finally, in their countermeasure, the checking $(\hat{s}'_p \bmod r)^{d'_q \bmod (r-1)} \equiv (s'_q \bmod r)^{\hat{d}'_p \bmod (r-1)} \pmod{r}$ is performed. Unfortunately, this checking still fails to detect the permanent fault $\hat{d}_p$ because of the following fact:

$$(\hat{s}'_p \bmod r)^{d'_q} \equiv ((m^{\hat{d}'_p} \bmod (p \cdot r)) \bmod r)^{d'_q} \equiv (m^{\hat{d}'_p})^{d'_q} \equiv (m^{d'_q})^{\hat{d}'_p} \equiv ((m^{d'_q} \bmod (q \cdot r)) \bmod r)^{\hat{d}'_p} \equiv (s'_q \bmod r)^{\hat{d}'_p} \pmod{r}.$$
7 Concluding Remarks
In this paper, we have considered two kinds of CRT-based hardware fault attacks. In the first attack, a computational fault or memory access fault is exploited; in the second, a permanent fault on some stored parameters of existing countermeasures is exploited. Both attacks are generic and applicable to many existing countermeasures. As far as we know, the enhanced version of Shamir's countermeasure proposed in Subsection 5.2 may be a good countermeasure (with checking procedures in it) that is immune to all published CRT-based fault attacks at the same time, including both previously reported results and the new results proposed in this paper.
One important thing to notice is that a countermeasure becomes less reliable and less secure as more checking procedures are employed, as in Infineon's countermeasure [22, 23] and in the proposed enhanced version of Shamir's countermeasure in Subsection 5.2. This is an especially serious problem when compound conditional statements (refer to the above two countermeasures) are employed, since they behave in a more complicated way and leave more possible weaknesses. The countermeasure may fail if two or more spike attacks (or a combination of one spike attack and one glitch attack) can be conducted, such that the first spike attack introduces a computational fault or memory access fault on Shamir's method (as Infineon's attack [22, 23] does) and the second spike attack (or a glitch attack) tampers with (or simply skips) the checking procedure. Notice that in any processor a checking procedure (often a conditional JUMP machine instruction) always relies on a single flag bit, and it is often the ZERO flag. Therefore, tampering with this checking procedure can defeat the countermeasure easily. So, it would be better to develop a countermeasure without any checking procedure in it. A final remark is to question whether it is really necessary to assume that only $d_p$ and $d_q$ (instead of only $d$) are stored inside the smart card when the CRT technique is employed. Evidently, it is this assumption that makes Infineon's solution [23] so complex (even though it may not always be secure, as analyzed in Section 6): it brings a large memory overhead to store many additional parameters, a much more complex program to encode the countermeasure algorithm, and also a computation time overhead to carry out the countermeasure.
Acknowledgments. The authors wish to acknowledge the four anonymous reviewers for their valuable suggestions and comments, especially their agreement with the proposed attacks. One reviewer pointed out that the permanent fault attack using errors on ROM is not practical. This comment motivated us to note that in a real scenario the effect of a permanent fault can be achieved by a temporary memory access fault when a parameter is retrieved from ROM (or even a hard disk) into RAM or a CPU register for later use. Evidently, the proposed permanent fault attack is still practical in this scenario.
References
[1] R. L. Rivest, A. Shamir, and L. Adleman, "A method for obtaining digital signatures and public-key cryptosystems," Commun. of the ACM, vol. 21, no. 2, pp. 120–126, 1978.
[2] T. ElGamal, "A public key cryptosystem and a signature scheme based on discrete logarithms," IEEE Trans. Inf. Theory, vol. 31, no. 4, pp. 469–472, 1985.
[3] R. Anderson and M. Kuhn, "Tamper resistance – a cautionary note," In Proceedings of the 2nd USENIX Workshop on Electronic Commerce, pp. 1–11, 1996.
[4] R. Anderson and M. Kuhn, "Low cost attacks on tamper resistant devices," In Pre-proceedings of the 1997 Security Protocols Workshop, Paris, France, 7–9th April 1997.
[5] Bellcore Press Release, "New threat model breaks crypto codes," Sept. 1996, available at URL.
[6] D. Boneh, R. A. DeMillo, and R. J. Lipton, "On the importance of checking cryptographic protocols for faults," In Advances in Cryptology – EUROCRYPT '97, LNCS 1233, pp. 37–51, Springer-Verlag, 1997.
[7] F. Bao, R. H. Deng, Y. Han, A. Jeng, A. D. Narasimbalu, and T. Ngair, "Breaking public key cryptosystems on tamper resistant devices in the presence of transient faults," In Pre-proceedings of the 1997 Security Protocols Workshop, Paris, France, 1997.
[8] Y. Zheng and T. Matsumoto, "Breaking real-world implementations of cryptosystems by manipulating their random number generation," In Pre-proceedings of the 1997 Symposium on Cryptography and Information Security, Fukuoka, Japan, 29th January–1st February 1997. An earlier version was presented at the rump session of ASIACRYPT '96.
[9] I. Peterson, "Chinks in digital armor – Exploiting faults to break smart-card cryptosystems," Science News, vol. 151, no. 5, pp. 78–79, 1997.
[10] M. Joye, J.-J. Quisquater, F. Bao, and R. H. Deng, "RSA-type signatures in the presence of transient faults," In Cryptography and Coding, LNCS 1355, pp. 155–160, Springer-Verlag, 1997.
[11] D. P. Maher, "Fault induction attacks, tamper resistance, and hostile reverse engineering in perspective," In Financial Cryptography, LNCS 1318, pp. 109–121, Springer-Verlag, Berlin, 1997.
[12] E. Biham and A. Shamir, "Differential fault analysis of secret key cryptosystems," In Advances in Cryptology – CRYPTO '97, LNCS 1294, pp. 513–525, Springer-Verlag, Berlin, 1997.
[13] A. K. Lenstra, "Memo on RSA signature generation in the presence of faults," September 1996.
[14] M. Joye, A. K. Lenstra, and J.-J. Quisquater, "Chinese remaindering based cryptosystems in the presence of faults," Journal of Cryptology, vol. 12, no. 4, pp. 241–245, 1999.
[15] M. Joye, F. Koeune, and J.-J. Quisquater, "Further results on Chinese remaindering," Tech. Report CG-1997/1, UCL Crypto Group, Louvain-la-Neuve, March 1997.
[16] A. Shamir, "How to check modular exponentiation," presented at the rump session of EUROCRYPT '97, Konstanz, Germany, 11–15th May 1997.
[17] A. Shamir, "Method and apparatus for protecting public key schemes from timing and fault attacks," United States Patent 5991415, November 23, 1999.
[18] S. M. Yen and M. Joye, "Checking before output may not be enough against fault-based cryptanalysis," IEEE Trans. on Computers, vol. 49, no. 9, pp. 967–970, Sept. 2000.
[19] P. J. Smith and M. J. J. Lennon, "LUC: A new public key system," In Ninth IFIP Symposium on Computer Security, Elsevier Science Publishers, pp. 103–117, 1993.
[20] J.-J. Quisquater and C. Couvreur, "Fast decipherment algorithm for RSA public-key cryptosystem," Electronics Letters, vol. 18, no. 21, pp. 905–907, 1982.
[21] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone, Handbook of Applied Cryptography, CRC Press, 1997.
[22] C. Aumüller, P. Bier, W. Fischer, P. Hofreiter, and J.-P. Seifert, "Fault attacks on RSA with CRT: Concrete results and practical countermeasures," posted at the ePrint Archive of IACR with paper number 073.
[23] C. Aumüller, P. Bier, W. Fischer, P. Hofreiter, and J.-P. Seifert, "Fault attacks on RSA with CRT: Concrete results and practical countermeasures," In Pre-proceedings of Cryptographic Hardware and Embedded Systems – CHES 2002, pp. 261–276, August 13–15, 2002, California, USA.
[24] M. Joye, P. Pailler, and S. M. Yen, "Secure evaluation of modular functions," In Proc. of 2001 International Workshop on Cryptology and Network Security – CNS 2001, pp. 227–229, September 26–28, 2001.
[25] S. M. Yen, S. J. Kim, S. G. Lim, and S. J. Moon, "RSA speedup with residue number system immune against hardware fault cryptanalysis," In Information Security and Cryptology – ICISC 2001, LNCS 2288, pp. 397–413, Springer-Verlag, 2002.
Receipt-Free Electronic Voting Scheme with a Tamper-Resistant Randomizer

Byoungcheon Lee¹ and Kwangjo Kim²

¹ Joongbu University, San 2-25, Majon-Ri, Chuboo-Meon, Kumsan-Gun, Chungnam, 312-702, Korea, [email protected]
² Information and Communications University, 58-4, Hwaam-dong, Yusong-gu, Daejeon, 305-732, Korea, [email protected]
Abstract. We investigate the receipt-freeness issue of electronic voting protocols. Receipt-freeness means that a voter neither obtains nor is able to construct a receipt proving the content of his vote. [Hirt01] proposed a receipt-free voting scheme by introducing a third-party randomizer and by using a divertible zero-knowledge proof of validity and a designated-verifier re-encryption proof. This scheme satisfies receipt-freeness under the assumptions that the randomizer does not collude with a buyer and that a two-way untappable channel exists between the voters and the randomizer. But an untappable channel is hard to implement in the real world and causes inconvenience to voters even when it is provided. In this paper we extend [Hirt01] such that a tamper-resistant randomizer (TRR), a secure hardware device such as a smart card or Java card, replaces the role of the third-party randomizer and the untappable channel. Moreover, K-out-of-L receipt-free voting is provided in a more efficient manner by introducing a divertible proof of difference.

Keywords: Electronic voting, receipt-freeness, tamper-resistant randomizer, divertible zero-knowledge proof.

1 Introduction
Research on electronic voting is very important for the progress of democracy. It is expected that in the near future electronic voting will be used more frequently to collect people's opinions for many kinds of political and social decisions through cyber space. From a cryptographic point of view, it is one of the most significant applications of cryptographic protocols.
1.1 Security Requirements and Approaches
Much extensive research on electronic voting has been conducted, and an extensive list of security requirements for electronic voting is now available. Generally, we can classify the security requirements of electronic voting into the following two categories [BT94, FOO92, MH96, NR94, LK00]:
Basic Requirements
– Privacy: All votes should be kept secret.
– Completeness: All valid votes should be counted correctly.
– Soundness: Any invalid vote should not be counted.
– Unreusability (prevention of double voting): No voter can vote twice.
– Eligibility: No one who is not allowed to vote can vote.
– Fairness: Nothing can affect the voting.
Extended Requirements
– Robustness: The voting system should succeed regardless of partial failure of the system.
– Universal verifiability: Anyone can verify that the election is fair and that the published tally is correctly computed from the ballots that were correctly cast.
– Receipt-freeness: A voter neither obtains nor is able to construct a receipt proving the content of his vote.
– Incoercibility: A voter cannot be coerced into casting a particular vote by a coercer. This is a stronger requirement than receipt-freeness. If we assume that the coercer cannot observe the voter during the very moment of voting, receipt-freeness gives incoercibility and vote buying is prevented.
The basic requirements are satisfied in most electronic voting systems and their implementation is relatively easy. But the extended requirements are hard to implement, and in many cases they require a large amount of computation and communication. Especially, universal verifiability and receipt-freeness seem to be contradictory: exchanged messages or user-chosen randomness are useful to verify the correctness of a vote, but there is a possibility that these data are used as a receipt. Current research on electronic voting is focused on receipt-free schemes that also satisfy universal verifiability. Electronic voting schemes found in the literature can be classified by their approaches into the following three categories:
– Schemes using blind signatures: [Cha88, FOO92, OMAFO99].
– Schemes using mix-nets: [PIK93, SK95, Pfi94, MH96, Abe98, Jak98, HS00, Hirt01, MBC01].
– Schemes using homomorphic encryption: [Ben87, SK94, CFSY96, CGS97, LK00, Hirt01, BFPPS01, Cha02, Po00].
Voting schemes based on the blind signature technique are simple, efficient, and flexible, but they cannot provide receipt-freeness: the voter's blinding factor can be used as a receipt for his vote, so a voter can prove his vote to a buyer. Voting schemes based on mix-nets are generally not efficient because they require a huge amount of computation by the multiple mixers (mixing and proving the correctness of their jobs). Voting schemes based on homomorphic encryption use zero-knowledge proof techniques to prove the validity of a ballot. In this approach there has been extensive research to provide receipt-freeness.
1.2 Approaches to Achieve Receipt-Freeness
The concept of receipt-freeness was first introduced by Benaloh and Tuinstra [BT94]. Considering the threat of vote-buyers (coercers), a voting scheme should ensure not only that a voter can keep his vote private, but also that he must keep it private. The voter should not be able to prove to a third party that he cast a particular vote. He must neither obtain nor be able to construct a receipt proving the content of his vote. Recently, [HS00] has shown that the voting protocol of [BT94] does not provide receipt-freeness. In this study we assume that the coercer does not observe the voter during the very moment of voting. Obviously, if voters use a personal computer to vote over the Internet, the coercer can manage to observe the voter and coerce him into casting a particular vote. But this threat is possible in any voting system using a personal computer over the Internet and is beyond the scope of cryptographic research. Our goal in this paper is to prevent a voter from getting, or being able to construct, a receipt. To achieve receipt-freeness, voting schemes in the literature make some physical assumption about the communication channel between the voter and the authority:
1. A one-way untappable channel from the voter to the authority [Oka97].
2. A one-way untappable channel from the authority to the voter [SK95, HS00].
3. A two-way untappable channel (voting booth) between the voter and the authority [BT94, Hirt01].
Note that the existence of an untappable channel from the authority to the voter is the weakest physical assumption for receipt-freeness [HS00].
1.3 Related Works
In this section, we briefly review [LK00, Hirt01], and [MBC01], because our study is based on their results. [LK00] tried to provide receipt-freeness by extending [CGS97]. They assumed a trusted third party called an honest verifier (HV) who verifies the validity of the voter's first ballot and generates the final ballot and the proof of validity of the ballot cooperatively with the voter, such that the voter cannot get any receipt. This is an efficient solution because a single entity can provide receipt-freeness. But [Hirt01] pointed out that in this protocol a malicious HV can help a voter to cast an invalid vote and thereby falsify the outcome of the whole vote. Moreover, the voter can construct a receipt by choosing his challenge as a hash value of his first ballot; this is the same attack applied to [BT94]. To resist this attack, the voter should not be allowed to choose any challenge. [Hirt01] proposed a receipt-free voting scheme based on a third-party randomizer. The role of the randomizer is similar to the HV of [LK00] (it generates the final ballot by randomizing the first ballot and generates the proof of validity interactively with the voter), but the randomizer generates the re-encryption proof in a
designated-verifier way and uses a divertible zero-knowledge proof technique to generate the proof of validity. Recently, [BFPPS01] proposed an efficient multi-candidate electronic voting scheme based on the Paillier cryptosystem [Pai99], in which the tallying stage is more efficient. [MBC01] proposed a receipt-free electronic voting protocol using a tamper-resistant smartcard. They pointed out the difficulty of implementing an untappable channel and introduced the necessity of a tamper-resistant device. In their voting protocol the smartcard plays the role of the mixer. But in their protocol the re-encryption proof is given in an interactive way, so the same attack applied to [BT94] and [LK00] is possible. The re-encryption proof should be given in a non-interactive and designated-verifier way, such that it cannot be transferred to third parties and the voter cannot construct a receipt.
1.4 Tamper-Resistant Hardware Device
[HS00] stated that the existence of an untappable channel from the authority to the voter is the weakest physical assumption for receipt-freeness. But in the real world, implementing an untappable channel in a distributed environment is very difficult. If a physically isolated voting booth in a dedicated computer network is used to achieve receipt-freeness, it will cost a lot and will cause inconvenience to voters, since they have to go to a particular voting booth. If the overall voting system is inconvenient, participation in electronic voting will suffer. To increase the participation rate, Internet voting will be the best solution, in which voters can participate in electronic voting in any place over the Internet. But achieving receipt-freeness is a hard task in Internet voting, since the Internet is a tappable channel. As suggested in [MBC01], a tamper-resistant hardware device can replace the role of the untappable channel and the trusted third party. Since tamper-resistant hardware devices are designed with a secure architecture, they are thought to be the ultimate place to store a user's secret information, such as a secret signing key. As the technology of tamper-resistant hardware devices advances in computational power, such devices can perform complicated computations. Recently this technology has advanced quickly, and the usage of smart cards and Java cards is increasing. Therefore a tamper-resistant hardware device seems to be a more practical assumption than an untappable channel or a trusted third party. It is expected that tamper-resistant hardware devices can be applied to a wide range of advanced applications in the near future; electronic voting is a good example.
1.5 Our Contribution
In this paper we extend the scheme of [Hirt01] such that a tamper-resistant randomizer (TRR), a secure hardware device such as a smart card or Java card, replaces the role of the third-party randomizer and the untappable channel. Moreover, K-out-of-L voting (choosing K candidates among L candidates) is provided receipt-free in a
more efficient manner by introducing a divertible proof of difference. In this scheme TRR is locally connected to the voter's system (it does not use any network facility) and plays the role of the randomizer. The scheme requires neither an untappable communication channel nor a trusted third party. Assuming the tamper-resistance of TRR, it provides receipt-freeness together with efficiency. Furthermore, we consider an efficient variant in which the voter just inputs his choice, TRR generates the encrypted ballot and proof of validity, and finally the voter approves the result.
1.6 Outline of the Paper
The paper is organized as follows. In Section 2, we briefly overview the proposed voting scheme and describe the model of electronic voting. Cryptographic primitives are described in Section 3 and the complete voting protocol is described in Section 4. Security and efficiency analyses follow in Sections 5 and 6. Finally, we conclude in Section 7.
2 Model of Electronic Voting
In this section we briefly overview the proposed voting scheme and describe the model of electronic voting. The zero-knowledge proof techniques that first appear in this section are described in the following section.
2.1 Overview of the Proposed Voting Protocol
The proposed voting protocol runs as follows. The voter generates an encrypted first ballot and gives it to the tamper-resistant randomizer (TRR). TRR then randomizes it to generate a final ballot and proves its correctness to the voter using the designated-verifier re-encryption proof. If this is valid, the voter and TRR jointly generate a proof of validity of the final ballot using the divertible proof of validity protocol and the divertible proof of difference protocol. The final ballot and the proofs are first digitally signed by the voter's TRR during the protocol run, and then they are signed by the voter to represent the voter's approval. The voter posts the final ballot, the proof of validity, and the proof of difference on the bulletin board. Only valid ballots are counted by the authority.
2.2 Entities
The main entities involved in the voting protocol are an administrator A, M voters $V_i$ ($i = 1, \ldots, M$), and N talliers $T_j$ ($j = 1, \ldots, N$). To participate in the voting, each voter should have his own tamper-resistant randomizer (TRR) issued by A. The roles of the entities are as follows:
– Administrator A verifies the identities and eligibility of the M voters and then issues TRR devices to the voters in the registration stage. She manages the whole voting process (announces the list of candidates, collects valid ballots, and announces the final result).
[Figure: message flow among Voter, TRR, and the bulletin board (BBS). The voter sends an encrypted first ballot to TRR; TRR returns the signed re-encrypted final ballot together with a designated-verifier re-encryption proof; the voter and TRR jointly run the divertible proof of validity and the divertible proof of difference (signed); the voter then posts the final ballot, proof of validity, and proof of difference (first signed by TRR and then signed by the voter) on the bulletin board.]
Fig. 1. Overview of the proposed voting protocol
– There are M voters $V_i$ ($i = 1, \ldots, M$). They have their own digital signature keys certified by a certification authority (CA). To participate in the voting, each voter needs to register with A and get his own TRR issued by A.
– There are N talliers $T_j$ ($j = 1, \ldots, N$) who cooperatively decrypt the collected ballots to open the result of the voting. A threshold $t$ denotes the lower bound on the number of authorities guaranteed to remain honest during the protocol.
Here we assume that the administrator A does not collude with a buyer to issue an illegal TRR to a voter. This assumption is equivalent to the assumption of [Hirt01] that the third-party randomizer does not collude with a buyer.
2.3 Tamper-Resistant Randomizer
TRR is a tamper-resistant hardware device issued by the administrator A (or any trusted third party) to a specific qualified voter. It is not an independent entity in our model, but a piece of hardware owned by the voter. It is directly connected to the voter's system and has a restricted set of interfaces for communication. The communication channel between the voter and his TRR is assumed to be untappable. TRR has its own randomness source and is securely equipped with its own digital signature key certified by the administrator, as well as with the talliers' public key and the voter's public key. Because it is a tamper-resistant device, neither the administrator nor the voter can access its randomness or any internal information.
TRR helps the voter to generate an encrypted ballot and a proof of validity such that the voter can be convinced of the validity of his vote but cannot get a receipt for it. More specifically, TRR produces the final ballot by randomizing the voter's first ballot, provides the designated-verifier re-encryption proof, and produces the proof of validity jointly with the voter. All messages that TRR provides are digitally signed with its signature key. Only encrypted ballots and proofs of validity signed by TRR are accepted as valid.
2.4 Communication Model
The communication channel between the voter and the administrator is a public broadcast channel with memory, i.e., a bulletin board. Voters post their encrypted ballots and proofs of validity on the bulletin board with their signatures, so double voting is prevented. No one except the voter can post a ballot in the voter's name. Anyone can read and verify the posted ballots, which provides universal verifiability. The communication between the voter and his TRR is internal and does not use any network functions. We assume that the coercer does not observe the voter during the very moment of voting. Obviously, if voters use a personal computer to vote over the Internet, the coercer can manage to observe the voter. But this threat is possible in any voting system using a personal computer over the Internet and is beyond the scope of cryptographic research. Our goal in this paper is to prevent a voter from getting, or being able to construct, a receipt.
2.5 Encoding of Ballots
First, we consider a 1-out-of-L voting scheme in which voters choose one candidate out of L candidates. Let $g$ be a generator of a multiplicative subgroup of $Z_p^*$ of order $q$ and let $h$ be the public key of the talliers. To achieve simple decryption using the homomorphic property of ElGamal encryption, a vote for the $i$-th candidate ($1 \le i \le L$) is represented as $g^{M^{i-1}}$, where $M$ is the maximum number of voters. The ElGamal encryption of the vote is then given by $(x, y) = (g^\alpha, h^\alpha g^{M^{i-1}})$, where $\alpha$ is the voter's random number. This encoding allows easy decoding of the sum by simple remaindering. Next, we consider a K-out-of-L voting scheme in which voters can make K choices out of L candidates. In this case the total ballot is composed of K independent ballots of 1-out-of-L voting, with additional proofs that the K choices are all different.
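To illustrate the encoding and the decoding by remaindering, the following is a minimal sketch of a full homomorphic tally in a toy order-$q$ subgroup (the parameter sizes and the single-key decryption are illustrative simplifications; the scheme itself uses threshold decryption):

```python
import random

p, q = 2039, 1019        # toy subgroup: p = 2q + 1, both prime
g = 4                    # generator of the order-q subgroup of Z_p^*
s = random.randrange(1, q)
h = pow(g, s, p)         # talliers' public key (single key in this sketch)
M, L = 10, 3             # max number of voters, number of candidates

def encrypt_vote(i):
    """ElGamal ballot for candidate i (1 <= i <= L): (g^a, h^a * g^(M^(i-1)))."""
    a = random.randrange(1, q)
    return pow(g, a, p), pow(h, a, p) * pow(g, M ** (i - 1), p) % p

votes = [1, 2, 2, 3, 1, 1]                 # six voters' choices
X = Y = 1
for x, y in (encrypt_vote(i) for i in votes):
    X, Y = X * x % p, Y * y % p            # homomorphic product
W = Y * pow(X, q - s, p) % p               # W = Y / X^s = g^(tally exponent)

t = next(t for t in range(q) if pow(g, t, p) == W)   # small discrete log
result = [(t // M ** (i - 1)) % M for i in range(1, L + 1)]
assert result == [votes.count(i) for i in range(1, L + 1)]   # [3, 2, 1]
```

The exponent decodes uniquely as base-$M$ digits as long as the tally exponent stays below $q$, which matches the grouping of ballots into subgroups of reasonable size described in Stage 4 below.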
3 Cryptographic Primitives

3.1 Threshold ElGamal Encryption
To generate encrypted ballots, homomorphic ElGamal encryption and threshold ElGamal decryption are used.
Consider the ElGamal encryption system [ElG85] over a multiplicative subgroup of $Z_p^*$ of order $q$, where $p$ and $q$ are large primes such that $q \mid p-1$. If a receiver chooses a private key $s$, the corresponding public key is $h = g^s$, where $g$ is a generator of the subgroup. Given a message $m \in Z_p$, the encryption of $m$ is given by $(x, y) = (g^\alpha, h^\alpha m)$ for a randomly chosen $\alpha \in_R Z_q$. To decrypt the ciphertext $(x, y)$, the receiver recovers the plaintext as $m = y/x^s$ using the private key $s$. In our proposed voting scheme, we consider K-out-of-L voting, where K is the number of the voter's choices and L is the number of candidates. We implement it as K independent ballots of 1-out-of-L voting. If we choose a special encoding of the message such that the homomorphic property is preserved, the final tally can be computed by a single decryption of the product of all valid ballots. A threshold public-key encryption scheme is used to share a secret key among N talliers such that messages can be decrypted only when a substantial subset of the talliers cooperate. A more detailed description can be found in [CGS97] and [Ped91]. It consists of a key generation protocol, an encryption algorithm, and a decryption protocol. Consider a $(t, N)$-threshold encryption scheme where the secret key is shared among N talliers $T_j$ ($1 \le j \le N$) and decryption is possible only when more than $t$ talliers cooperate. Through the key generation protocol, each tallier $T_j$ comes to possess a share $s_j \in Z_q$ of a secret $s$. Each tallier publishes the value $h_j = g^{s_j}$ as a commitment to the share $s_j$. The shares $s_j$ are chosen such that the secret $s$ can be reconstructed from any subset $\Lambda$ of $t$ shares using the appropriate Lagrange coefficients,

$$s = \sum_{j \in \Lambda} s_j \lambda_{j,\Lambda}, \qquad \lambda_{j,\Lambda} = \prod_{l \in \Lambda \setminus \{j\}} \frac{l}{l-j}.$$

The public key $h = g^s$ is announced to all participants in the system. Encryption of a message $m$ using the public key $h$ is given by $(x, y) = (g^\alpha, h^\alpha m)$, which is the same as ordinary ElGamal encryption. To decrypt a ciphertext $(x, y) = (g^\alpha, h^\alpha m)$ without reconstructing the secret $s$, the talliers execute the following protocol:
1. Each tallier $T_j$ broadcasts $w_j = x^{s_j}$ and proves the equality of the discrete logarithms $\log_g h_j = \log_x w_j$ in zero-knowledge using a proof-of-knowledge protocol.
2. Let $\Lambda$ denote any subset of $t$ talliers who passed the zero-knowledge proof. Then the plaintext can be recovered as

$$m = y \Big/ \prod_{j \in \Lambda} w_j^{\lambda_{j,\Lambda}}.$$
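A minimal sketch of this decryption protocol, with a trusted dealer standing in for the dealerless key generation of [Ped91] and with the zero-knowledge proofs of step 1 omitted (both simplifying assumptions):

```python
import random

p, q, g = 2039, 1019, 4      # toy subgroup: p = 2q + 1, g generates order q
N, t = 5, 3

s = random.randrange(1, q)                       # talliers' shared secret
poly = [s] + [random.randrange(q) for _ in range(t - 1)]
shares = {j: sum(c * j ** k for k, c in enumerate(poly)) % q
          for j in range(1, N + 1)}
h = pow(g, s, p)                                 # public key h = g^s

a = random.randrange(1, q)
m = pow(g, 77, p)                                # a message in the subgroup
x, y = pow(g, a, p), pow(h, a, p) * m % p        # ElGamal ciphertext

subset = [1, 3, 5]                               # any t cooperating talliers
w = {j: pow(x, shares[j], p) for j in subset}    # partial decryptions w_j
dec = y
for j in subset:
    lam = 1                                      # Lagrange coefficient mod q
    for l in subset:
        if l != j:
            lam = lam * l * pow(l - j, -1, q) % q
    dec = dec * pow(w[j], q - lam, p) % p        # multiply by w_j^{-lambda_j}
assert dec == m                                  # y / x^s recovers m
```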
3.2 Designated-Verifier Re-encryption Proofs
A designated-verifier proof is a proof which is convincing only to the designated verifier; it is completely useless when transferred to any other entity [JSI96].
The basic idea is to prove knowledge either of the witness in question or of the secret key of the designated verifier. Such a proof convinces the designated verifier because he assumes that the prover does not know his secret key. But if the proof is transferred to another entity, it loses its persuasiveness completely. We now consider designated-verifier re-encryption proofs. Let $(x, y) = (g^l, h^l m)$ be an original ElGamal ciphertext of some message $m$ under a public key $h = g^s$. Let $(x_f, y_f) = (x g^w, y h^w)$ be a re-encrypted ElGamal ciphertext generated by the prover P (TRR). Let $h_V = g^{s_V}$ be the public key of the verifier V (the voter) corresponding to the private key $s_V$. P wants to prove to V that his re-encryption was generated correctly, in such a way that the proof cannot be transferred to others. He proves that $x_f/x$ and $y_f/y$ have the same discrete logarithm under the bases $g$ and $h$, respectively.

Designated-Verifier Re-encryption Proof:
Prover (TRR):
1. Chooses $k, r, t \in_R Z_q$.
2. Computes $(a, b) = (g^k, h^k)$ and $d = g^r h_V^t$.
3. Computes $c = H(a, b, d, x_f, y_f)$ and $u = k - w(c + r)$.
4. Sends $(c, r, t, u)$ to V.
Verifier (Voter):
1. Verifies $c \stackrel{?}{=} H(g^u (x_f/x)^{c+r}, h^u (y_f/y)^{c+r}, g^r h_V^t, x_f, y_f)$.

In this protocol, $d = g^r h_V^t$ is a trapdoor commitment (or chameleon commitment) to $r$ and $t$. Because V knows his private key $s_V$, he can open $d$ to arbitrary values $r'$ and $t'$ such that $r + s_V t = r' + s_V t'$ holds. V can generate a re-encryption proof for any $(\tilde{x}, \tilde{y})$ of his choice using his knowledge of $s_V$: selecting $(\alpha, \beta, \tilde{u})$ at random, V computes

$$\tilde{c} = H(g^{\tilde{u}} (x_f/\tilde{x})^{\alpha}, h^{\tilde{u}} (y_f/\tilde{y})^{\alpha}, g^{\beta}, x_f, y_f),$$

and also computes $\tilde{r} = \alpha - \tilde{c}$ and $\tilde{t} = (\beta - \tilde{r})/s_V$. Then $(\tilde{c}, \tilde{r}, \tilde{t}, \tilde{u})$ is an accepting proof. Therefore the designated-verifier re-encryption proof cannot be transferred to others.
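A minimal sketch of the honest prover and the designated verifier's check, in the toy subgroup used earlier (the hash-to-$Z_q$ construction is an illustrative assumption):

```python
import hashlib, random

p, q, g = 2039, 1019, 4

def H(*vals):                         # hash to Z_q (illustrative construction)
    data = b"|".join(str(v).encode() for v in vals)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

s = random.randrange(1, q); h = pow(g, s, p)       # talliers' key
sV = random.randrange(1, q); hV = pow(g, sV, p)    # voter's key pair

# first ballot (x, y) and its re-encryption (xf, yf) by TRR
l, w = random.randrange(1, q), random.randrange(1, q)
m = pow(g, 5, p)
x, y = pow(g, l, p), pow(h, l, p) * m % p
xf, yf = x * pow(g, w, p) % p, y * pow(h, w, p) % p

# prover (TRR): proves log_g(xf/x) = log_h(yf/y) = w, designated to V
k, r, t = (random.randrange(q) for _ in range(3))
a, b = pow(g, k, p), pow(h, k, p)
d = pow(g, r, p) * pow(hV, t, p) % p               # trapdoor commitment
c = H(a, b, d, xf, yf)
u = (k - w * (c + r)) % q

# verifier (the designated voter)
e1 = pow(g, u, p) * pow(xf * pow(x, -1, p) % p, (c + r) % q, p) % p
e2 = pow(h, u, p) * pow(yf * pow(y, -1, p) % p, (c + r) % q, p) % p
assert c == H(e1, e2, pow(g, r, p) * pow(hV, t, p) % p, xf, yf)
```

Since V's secret $s_V$ lets him open $d$ to any $(r', t')$ with $r + s_V t = r' + s_V t'$, he could have simulated this transcript himself, which is exactly why it convinces no one else.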
3.3 Divertible Proof of Validity
In the proposed receipt-free voting scheme, the voter gives his first encrypted ballot to TRR, and TRR re-encrypts it to generate the final ballot. The divertible proof of validity is an interactive modification of the non-interactive proof of validity of a ballot: TRR adds its own randomness to the commitment of the voter and then adjusts the response of the voter, such that the non-interactive proof of validity holds for the final ballot but the voter cannot construct any receipt. Let $(x, y) = (g^\alpha, h^\alpha m_i)$ be the voter's first ballot for his vote $m_i$, where $\alpha$ is the voter's random number, and let $(x_f, y_f) = (x g^\beta, y h^\beta)$ be the final ballot re-encrypted by TRR, where $\beta$ is TRR's internal random number. The voter and TRR can jointly compute a non-interactive proof of validity for the final ballot as follows:
Divertible Proof of Validity:
1. Voter → TRR (commitment):
– The voter chooses a random number $w \in_R Z_q$ and computes $a'_i = g^w$, $b'_i = h^w$.
– For $j = 1, \ldots, i-1, i+1, \ldots, L$, the voter chooses $r'_j, d'_j \in_R Z_q$ and computes $a'_j = g^{r'_j} x^{d'_j}$ and $b'_j = h^{r'_j} (y/m_j)^{d'_j}$.
– The voter sends $(A', B') = (a'_1, b'_1, \ldots, a'_L, b'_L)$ to TRR.
2. Voter ← TRR (randomized commitment):
– For $j = 1, \ldots, L$, TRR chooses $r''_j, d''_j \in_R Z_q$ and computes $a_j = a'_j g^{r''_j} x^{d''_j}$ and $b_j = b'_j h^{r''_j} (y/m_j)^{d''_j}$. Here $\sum_j d''_j = 0$ should hold.
– TRR sends $(A, B) = (a_1, b_1, \ldots, a_L, b_L)$ to the voter.
3. Voter → TRR (response):
– The voter computes $c = H(a_1, b_1, \ldots, a_L, b_L)$.
– The voter computes $d'_i = c - \sum_{j \neq i} d'_j$ and $r'_i = w - \alpha d'_i$.
– The voter sends $(D', R') = (d'_1, r'_1, \ldots, d'_L, r'_L)$ to TRR.
4. Voter ← TRR (adjusted response):
– For $j = 1, \ldots, L$, TRR computes $d_j = d'_j + d''_j$ and $r_j = r'_j + r''_j - d_j \beta$.
– TRR sends $(D, R) = (d_1, r_1, \ldots, d_L, r_L)$ to the voter.
5. Voter (any verifier):
– The voter checks

$$d_1 + \cdots + d_L \stackrel{?}{=} H(g^{r_1} x_f^{d_1}, h^{r_1} (y_f/m_1)^{d_1}, \ldots, g^{r_L} x_f^{d_L}, h^{r_L} (y_f/m_L)^{d_L}).$$

The final verification equation holds because of the following relations:

$$c = \sum_j d_j,$$
$$a_j = a'_j g^{r''_j} x^{d''_j} = g^{r'_j + r''_j} x^{d'_j + d''_j} = g^{r_j + \beta d_j} x^{d_j} = g^{r_j} x_f^{d_j},$$
$$b_j = b'_j h^{r''_j} (y/m_j)^{d''_j} = h^{r'_j + r''_j} (y/m_j)^{d'_j + d''_j} = h^{r_j + \beta d_j} (y/m_j)^{d_j} = h^{r_j} (y_f/m_j)^{d_j}.$$

Through this interactive protocol between the voter and TRR, the voter gets a proof of validity $(A, B, D, R)$ for the final ballot $(x_f, y_f)$. In this protocol, the messages from TRR should be authentic, i.e., the messages $(A, B)$ and $(D, R)$ should be digitally signed with TRR's private key and verified by the voter. Signed proofs attest that they were generated by TRR. The original interactive proof of validity protocol is honest-verifier zero-knowledge, i.e., it is zero-knowledge with an honest verifier who selects the challenge independently of the commitment message. The non-interactive variant of the proof of validity is zero-knowledge in the random oracle model, since the hash value of the commitment message is used as the challenge. Since the modified commitment and the adjusted response are fully randomized by TRR, the voter cannot prove any correspondence between the proof of validity of the final ballot and that of his first ballot. Therefore this protocol is receipt-free.
3.4 Divertible Proof of Difference
When the voter participates in K-out-of-L voting, he prepares K independent encrypted ballots and provides proofs that they are all different. Using the same method, the proof of difference can be made divertible and receipt-free. Let $(x_1, y_1)$ and $(x_2, y_2)$ be two independent first ballots of the voter and $(x_{f1}, y_{f1})$ and $(x_{f2}, y_{f2})$ the corresponding final ballots re-encrypted by TRR:

$$(x_1, y_1) = (g^{\alpha_1}, h^{\alpha_1} m_1), \quad (x_2, y_2) = (g^{\alpha_2}, h^{\alpha_2} m_2),$$
$$(x_{f1}, y_{f1}) = (x_1 g^{\beta_1}, y_1 h^{\beta_1}), \quad (x_{f2}, y_{f2}) = (x_2 g^{\beta_2}, y_2 h^{\beta_2}).$$

Now consider their differences as follows:

$$(x, y) \equiv (x_1/x_2, y_1/y_2) = (g^{\alpha_1 - \alpha_2}, h^{\alpha_1 - \alpha_2} m_1/m_2) \equiv (g^{\alpha}, h^{\alpha} m_1/m_2),$$
$$(x_f, y_f) \equiv (x_{f1}/x_{f2}, y_{f1}/y_{f2}) = (x g^{\beta_1 - \beta_2}, y h^{\beta_1 - \beta_2}) \equiv (x g^{\beta}, y h^{\beta}).$$

The voter and TRR jointly generate the proof of difference as follows.

Divertible Proof of Difference:
1. Voter → TRR (commitment):
– The voter chooses random numbers $k'_1, k'_2 \in_R Z_q$ and computes $a'_1 = g^{k'_1}$, $b'_1 = h^{k'_1}$, $a'_2 = g^{k'_2}$, $b'_2 = h^{k'_2}$.
– The voter sends $(a'_1, b'_1, a'_2, b'_2)$ to TRR.
2. Voter ← TRR (randomized commitment):
– TRR chooses random numbers $k''_1, k''_2 \in_R Z_q$ and computes $a_1 = a'_1 g^{k''_1}$, $b_1 = b'_1 h^{k''_1}$, $a_2 = a'_2 g^{k''_2}$, $b_2 = b'_2 h^{k''_2}$.
– TRR sends $(a_1, b_1, a_2, b_2)$ to the voter.
3. Voter → TRR (response):
– The voter computes $c = H(a_1, b_1, a_2, b_2)$.
– The voter computes $s'_1 = k'_1 - c\alpha$ and $s'_2 = k'_2 - c k'_1$.
– The voter sends $(s'_1, s'_2)$ to TRR.
4. Voter ← TRR (adjusted response):
– TRR computes $c = H(a_1, b_1, a_2, b_2)$.
– TRR computes $s_1 = s'_1 + k''_1 - c\beta = k'_1 + k''_1 - c(\alpha + \beta)$ and $s_2 = s'_2 + k''_2 - c k''_1 = k'_2 + k''_2 - c(k'_1 + k''_1)$.
– TRR sends $(s_1, s_2)$ to the voter.
5. Voter (any verifier):
– The voter verifies the validity of the proof as $a_1 \stackrel{?}{=} g^{s_1} x_f^c$, $a_2 \stackrel{?}{=} g^{s_2} a_1^c$, $b_2 \stackrel{?}{=} h^{s_2} b_1^c$.
– The voter verifies the difference: $b_1 \stackrel{?}{=} h^{s_1} y_f^c$. If these are equal, the two final ballots $(x_{f1}, y_{f1})$ and $(x_{f2}, y_{f2})$ are votes for the same candidate, and therefore they are not valid. If they are not equal, the two final ballots are valid.
The final verification equations hold because of the following relations:

$$a_1 = g^{k'_1 + k''_1} = g^{s_1 + c(\alpha + \beta)} = g^{s_1} x_f^c,$$
$$a_2 = g^{k'_2 + k''_2} = g^{s_2 + c(k'_1 + k''_1)} = g^{s_2} a_1^c,$$
$$b_2 = h^{k'_2 + k''_2} = h^{s_2 + c(k'_1 + k''_1)} = h^{s_2} b_1^c,$$
$$b_1 = h^{k'_1 + k''_1} = h^{s_1 + c(\alpha + \beta)},$$

while $h^{s_1} y_f^c = h^{s_1 + c(\alpha + \beta)} (m_1/m_2)^c$, so $b_1 = h^{s_1} y_f^c$ holds if and only if $m_1 = m_2$. Through this interactive protocol between the voter and TRR, the voter gets a proof of difference $(a_1, b_1, a_2, b_2, s_1, s_2)$ for the two final ballots $(x_{f1}, y_{f1})$ and $(x_{f2}, y_{f2})$. In this protocol, the messages from TRR should be authentic, i.e., the messages $(a_1, b_1, a_2, b_2)$ and $(s_1, s_2)$ should be digitally signed with TRR's private key and verified by the voter. Like the previous protocol, this protocol is zero-knowledge in the random oracle model and is receipt-free.
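As with the proof of validity, the joint protocol collapses to a single non-interactive proof on the difference ciphertext. A minimal sketch with the diversion flattened into one prover who knows $\gamma = \alpha + \beta$ (toy parameters; the names are illustrative):

```python
import hashlib, random

p, q, g = 2039, 1019, 4
s = random.randrange(1, q); h = pow(g, s, p)

def H(*vals):                                     # hash to Z_q (illustrative)
    data = b"|".join(str(v).encode() for v in vals)
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def prove_difference(gamma, xf, yf):
    k1, k2 = random.randrange(q), random.randrange(q)
    a1, b1 = pow(g, k1, p), pow(h, k1, p)
    a2, b2 = pow(g, k2, p), pow(h, k2, p)
    c = H(a1, b1, a2, b2)
    return a1, b1, a2, b2, (k1 - c * gamma) % q, (k2 - c * k1) % q

def ballots_differ(xf, yf, a1, b1, a2, b2, s1, s2):
    c = H(a1, b1, a2, b2)
    assert a1 == pow(g, s1, p) * pow(xf, c, p) % p   # well-formedness checks
    assert a2 == pow(g, s2, p) * pow(a1, c, p) % p
    assert b2 == pow(h, s2, p) * pow(b1, c, p) % p
    return b1 != pow(h, s1, p) * pow(yf, c, p) % p   # True iff m1 != m2

# two final ballots for different candidates and their "difference"
M = 10
m1, m2 = pow(g, M ** 0, p), pow(g, M ** 1, p)
g1, g2 = random.randrange(1, q), random.randrange(1, q)
xf1, yf1 = pow(g, g1, p), pow(h, g1, p) * m1 % p
xf2, yf2 = pow(g, g2, p), pow(h, g2, p) * m2 % p
gamma = (g1 - g2) % q
xf = xf1 * pow(xf2, -1, p) % p
yf = yf1 * pow(yf2, -1, p) % p
assert ballots_differ(xf, yf, *prove_difference(gamma, xf, yf))
```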
4 Proposed Receipt-Free Electronic Voting Scheme
The proposed receipt-free electronic voting scheme consists of the following four stages: system set-up, registration, voting, and tallying.

Stage 1. System Set-Up. The N talliers $(T_1, \ldots, T_N)$ execute the key generation protocol of the $(t, N)$-threshold ElGamal encryption scheme, and as a result each tallier $T_i$ possesses a share $s_i \in Z_q$ of a secret $s$. The resulting public key of the voting system, $h = g^s$, is announced to the voters. Any cooperation of more than $t$ talliers can decrypt an encrypted ballot. The administrator A publishes the list of L candidates on the bulletin board.

Stage 2. Registration. We assume that every voter $V_i$ has a certificate $Cert_i$ issued by a certification authority (CA). Voter $V_i$ connects to A and requests registration for voting with his certificate; A then verifies $V_i$'s identity and qualification for voting. If $V_i$ is a legitimate voter, A issues a tamper-resistant randomizer $TRR_i$ to $V_i$, in which a digital signature key is securely embedded, and also issues a certificate $Cert_{TRR_i}$ corresponding to $TRR_i$'s digital signature key. $TRR_i$ is equipped with the talliers' public key $h$ and the voter's certificate $Cert_i$. A publishes $(V_i, Cert_i, Cert_{TRR_i})$ on the bulletin board.

Stage 3. Voting. In this stage, voter $V_i$ and his $TRR_i$ jointly generate encrypted ballots and proofs of validity as follows. First we consider the 1-out-of-L voting scheme.
1. $V_i$ chooses a candidate among the L candidates. Assume that he has chosen the $j$-th candidate. He computes his first ballot as $(x, y) = (g^\alpha, h^\alpha g^{M^{j-1}})$, where $\alpha$ is $V_i$'s random number. He sends it to $TRR_i$ with his signature.
2. $TRR_i$ verifies $V_i$'s signature on the first ballot and computes the final ballot as $(x_f, y_f) = (x g^\beta, y h^\beta)$, where $\beta$ is $TRR_i$'s random number. It also computes the designated-verifier re-encryption proof. It digitally signs the final ballot and the designated-verifier re-encryption proof and sends them to $V_i$.
3. $V_i$ verifies the digital signature on the final ballot and also verifies its correctness with the designated-verifier re-encryption proof.
4. If the final ballot is generated correctly, $V_i$ and $TRR_i$ jointly compute the proof of validity of the final ballot using the divertible proof of validity protocol. As a result of this protocol, $V_i$ gets the proof of validity, $(A, B)$ and $(D, R)$, digitally signed by $TRR_i$.
5. $V_i$ signs the final ballot and the proof of validity with the private key corresponding to his certificate $Cert_i$, and posts these messages on the bulletin board. The posted messages $(x_f, y_f)$, $(A, B)$, $(D, R)$ are therefore first signed by $TRR_i$ and then signed by $V_i$. Anyone can verify that these messages were generated by $TRR_i$ and approved by $V_i$.
In the case of the K-out-of-L voting scheme, $V_i$ and $TRR_i$ compute K independent final ballots and proofs of validity in the same way. In addition, $V_i$ and $TRR_i$ compute $K-1$ proofs of difference using the divertible proof of difference protocol, which represent that the K final ballots are votes for different candidates.

Stage 4. Tallying. When the deadline of the voting is reached, the administrator A collects all the valid ballots, computes the product $(X, Y) = (\prod_{i=1}^{l} x_{f,i}, \prod_{i=1}^{l} y_{f,i})$, where $l$ is the total number of valid ballots, and posts it on the bulletin board. Anybody can check the validity of the product because all the final ballots are posted on the bulletin board and their validity can be verified publicly. Then the N talliers jointly execute the $(t, N)$-threshold decryption protocol on $(X, Y)$ to obtain $W = Y/X^s$. Because the secret key $s$ is shared among the N talliers, any subset of $t$ talliers can decrypt $(X, Y)$ to obtain $W$. Note that the secret key $s$ is not reconstructed; only $X^s$ is computed in the decryption process. Now we get $W = g^{r_1 M^0 + r_2 M^1 + \cdots + r_L M^{L-1}}$, where $(r_1, \ldots, r_L)$ is the result of the election. Computing $(r_1, \ldots, r_L)$ requires solving a discrete logarithm problem, which is generally considered computationally hard. In this case, it requires $O(\sqrt{l}^{\,L-1})$ time to get the result [CGS97]. This is feasible only for reasonable sizes of $l$ and $L$. Therefore, if this scheme is applied to a large-scale election, A can group the valid ballots into several subgroups of reasonable size $l$, and the N talliers can then decrypt the subproducts easily, one by one. Note that this kind of local tallying is common practice in the real world. Now we consider two simple variants of the proposed voting protocol.

Non-interactive Variant: If we assume that TRR is tamper-resistant and is constructed correctly by the administrator A, then the first ballot need not be
encrypted by the voter. In this case, we can consider a variant of the voting protocol in which the voter just sends his choices to TRR, and TRR then computes by itself (non-interactively) the final encrypted ballots, the designated-verifier re-encryption proofs, the proofs of validity, and the proofs of difference, with its digital signature. After receiving the results from TRR, the voter approves them with his digital signature and then posts them. The ballot generation protocol can thus be executed in a non-interactive way, and the overall voting protocol becomes much more efficient.

Multiple-Choice Variant: Another simple variant is that the proposed scheme can be used to allow duplicated selection of the same candidate, by omitting the proof of difference. In this case the voter can make K choices out of L candidates without any requirement of difference.
5 Security Analysis
The proposed electronic voting protocol satisfies the basic and extended requirements of electronic voting.
– Privacy: The tallying procedure is executed only on the product of multiple valid ballots. Assuming the honesty of at least $N - t$ talliers (who do not open a single voter's ballot), the privacy of the individual voter is preserved. Since the proof of validity is zero-knowledge, no partial information on the voter's choice is exposed.
– Completeness: The final ballot and the proof of validity are posted on the public bulletin board. Anyone can verify the validity of the final ballots, the correctness of the ballot collection, and the final result. Therefore valid ballots are counted correctly.
– Soundness: Any invalid ballot is detected from the public bulletin board, so it cannot be counted.
– Unreusability (prevention of double voting): Each voter posts his encrypted ballot and proofs on the bulletin board with his signature and TRR's signature. Therefore he can vote only once, and double voting is detected easily.
– Eligibility: Legitimate voters registered with the administrator A are published on the bulletin board together with their certificates. Therefore only legitimate voters can participate in the voting.
– Fairness: Because the privacy of the voter is kept by the N talliers and the voting protocol is zero-knowledge, nothing can affect the voting process.
– Robustness: The $(t, N)$-threshold ElGamal encryption scheme can tolerate the failure of at most $N - t$ talliers.
– Universal verifiability: Because the final ballot and proof messages are posted on the bulletin board together with the voter information, the validity of each ballot is publicly verifiable. The product of the valid ballots and the tallying result are also publicly verifiable.
– Receipt-freeness: Since the designated-verifier re-encryption proof given by the TRR cannot be transferred to others, the voter cannot prove any relation between his first ballot and the final ballot. Since the proof of validity and the proof of difference are fully randomized by the TRR, these proof messages are independent of the voter's commitment messages, so the voter cannot prove any correlation between the proof messages and his first ballot. Assuming the tamper-resistance of the TRR, the voter cannot obtain any information on the TRR's internal randomness. Therefore the voter cannot construct any receipt from the protocol messages.
– Incoercibility: Since we have assumed that the coercer cannot observe the voter at the very moment of voting, a receipt is the only way for the coercer to check the voter's vote. Since the proposed voting scheme satisfies receipt-freeness, incoercibility is also satisfied and vote buying is prevented.
6 Efficiency Analysis
Consider the message size transferred and the number of modular exponentiations performed in the voting stage. Let |p| be the bit size of a group element of Zp, |q| the bit size of Zq, and |s| the bit size of a digital signature. In the K-out-of-L voting scheme, the exchanged messages are as follows.
– (2LK + 6K − 4)|p| + (2LK + 2K − 2)|q| + 5|s| (from the voter to the TRR).
– (2LK + 6K − 4)|p| + (2LK + 6K − 2)|q| + 5|s| (from the TRR to the voter).
– (2LK + 6K − 4)|p| + (2LK + 2K − 2)|q| + 6|s| (posted on the bulletin board).
On the other hand, the total number of modular exponentiations, excluding the digital signature operations, is as follows.
– Exponentiations by the voter: 8LK + 18K − 12.
– Exponentiations by the TRR: 4LK + 10K − 4.
Therefore the overall protocol requires O(LK) message transfer and modular exponentiations (a sketch evaluating these formulas for sample parameters appears at the end of this section). This is much more efficient than [Hirt01], which requires $O(\binom{L}{K}) \approx O(2^L)$ message transfer and modular exponentiations. [Hirt01] also introduced a variant using a binary encoding of the ballot and a proof of summation, which requires O(2L) message transfer and modular exponentiations. In our scheme the valid ballot and its proof of validity are generated within the voter's system without any network communication; therefore our scheme is more efficient than [Hirt01] in terms of network communication. The non-interactive variant of the proposed scheme is simpler and more efficient still, in the sense that even the inner communication between the voter and the TRR is non-interactive.
Using a TRR may be considered costly in a large-scale election, but it is much more practical than the untappable-channel assumption. Moreover, tamper-resistant hardware devices are thought to be the ultimate place to store
a user's secret information, such as a secret signing key. As tamper-resistant hardware advances in computational power and cost, it is expected that in the near future everybody will be able to store their signing key in their ID card. In that case, the proposed electronic voting scheme can be deployed easily over a public network such as the Internet without any extra cost.
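As a quick sanity check, the cost formulas of this section can be evaluated mechanically. In the sketch below, the sizes |p| = 1024, |q| = 160 and |s| = 320 bits are illustrative assumptions, not values fixed by the paper.

```python
# Evaluate the voting-stage cost formulas for a sample K-out-of-L election.
P, Q, S = 1024, 160, 320   # assumed bit sizes of |p|, |q|, |s|

def costs(L, K):
    return {
        "voter -> TRR (bits)":   (2*L*K + 6*K - 4)*P + (2*L*K + 2*K - 2)*Q + 5*S,
        "TRR -> voter (bits)":   (2*L*K + 6*K - 4)*P + (2*L*K + 6*K - 2)*Q + 5*S,
        "bulletin board (bits)": (2*L*K + 6*K - 4)*P + (2*L*K + 2*K - 2)*Q + 6*S,
        "voter exponentiations": 8*L*K + 18*K - 12,
        "TRR exponentiations":   4*L*K + 10*K - 4,
    }

for name, value in costs(L=10, K=2).items():
    print(f"{name:24} {value}")
```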
7 Conclusion
In this paper we have proposed an efficient receipt-free electronic voting scheme using a TRR. Because the TRR is locally connected to the voter's system and no network communication is used during the voting stage, the untappable-channel assumption is not required and the voting scheme is more secure and efficient. The TRR can be considered a secure implementation of both the untappable channel and the trusted third party. For an efficient implementation of K-out-of-L voting, we have extended [Hirt01] using the divertible proof of difference. Our scheme requires O(LK) message transfer and modular exponentiations, while [Hirt01] requires $O(\binom{L}{K}) \approx O(2^L)$. Furthermore, we have considered a non-interactive variant in which the voter just sends his choices to the TRR, and the TRR computes by itself the final encrypted ballots, designated-verifier re-encryption proofs, proofs of validity, and proofs of difference, with its digital signature. Finally, the voter approves the results with his digital signature and posts them. The ballot generation protocol can then be executed in a non-interactive way and the overall voting protocol becomes much more efficient.
Because of the rapid advance of hardware technology, tamper-resistant hardware devices tend to gain ever more powerful computation and communication functionality. Moreover, they are considered the ultimate place to store a user's secret information, such as a secret signing key, and are expected to serve a wide range of advanced applications in the near future. Therefore the TRR seems to be a more practical assumption than an untappable channel or a trusted third party. If the Internet can be used for electronic voting, voters can participate from any place they like, and electronic voting can then play an important role in increasing participation rates and realizing participatory democracy.
Acknowledgements
We would like to thank the many anonymous reviewers for their valuable comments, which helped to make this paper more readable. One reviewer commented (and we agree) that the proof of difference may leak some information about the voter's vote beyond the mere fact of difference. Further work needs to be done to design a more efficient ballot encoding and to improve the proof of difference.
References
[Abe98] M. Abe, “Universally verifiable mix-net with verification work independent of the number of mix-servers”, Advances in Cryptology – Eurocrypt’98, LNCS Vol. 1403, pages 437–447, Springer-Verlag, 1998.
[Ben87] J. Benaloh, “Verifiable secret-ballot elections”, PhD Thesis, Yale University, Department of Computer Science, New Haven, September 1987.
[BFPPS01] O. Baudron, P.-A. Fouque, D. Pointcheval, G. Poupard and J. Stern, “Practical Multi-Candidate Election System”, Proc. of the 20th ACM Symposium on Principles of Distributed Computing, pages 274–283, ACM Press, 2001.
[BT94] J. Benaloh and D. Tuinstra, “Receipt-free secret-ballot elections”, Proc. of the 26th Symp. on Theory of Computing (STOC’94), pages 544–553, New York, 1994.
[CFSY96] R. Cramer, M. Franklin, B. Schoenmakers, and M. Yung, “Multi-authority secret ballot elections with linear work”, Advances in Cryptology – Eurocrypt’96, LNCS Vol. 1070, pages 72–83, Springer-Verlag, 1996.
[CGS97] R. Cramer, R. Gennaro, and B. Schoenmakers, “A secure and optimally efficient multi-authority election scheme”, Advances in Cryptology – Eurocrypt’97, LNCS Vol. 1233, pages 103–118, Springer-Verlag, 1997.
[Cha88] D. Chaum, “Elections with unconditionally-secret ballots and disruption equivalent to breaking RSA”, Advances in Cryptology – Eurocrypt’88, LNCS Vol. 330, pages 177–182, Springer-Verlag, 1988.
[Cha02] D. Chaum, “Privacy Technology: A survey of security without identification”, IACR Distinguished Lecture at Crypto 2002, 2002.
[ElG85] T. ElGamal, “A public key cryptosystem and a signature scheme based on discrete logarithms”, IEEE Transactions on Information Theory, Vol. 31, No. 4, pages 467–472, 1985.
[FOO92] A. Fujioka, T. Okamoto, and K. Ohta, “A practical secret voting scheme for large scale elections”, Advances in Cryptology – Auscrypt’92, LNCS Vol. 718, pages 244–260, Springer-Verlag, 1992.
[Hirt01] M. Hirt, “Multi-party computation: efficient protocols, general adversaries, and voting”, Ph.D. Thesis, ETH Zurich; reprinted as Vol. 3 of ETH Series in Information Security and Cryptography, ISBN 3-89649-747-2, Hartung-Gorre Verlag, Konstanz, 2001.
[HS00] M. Hirt and K. Sako, “Efficient receipt-free voting based on homomorphic encryption”, Advances in Cryptology – Eurocrypt 2000, LNCS Vol. 1807, pages 539–556, Springer-Verlag, 2000.
[Jak98] M. Jakobsson, “A practical mix”, Advances in Cryptology – Eurocrypt’98, LNCS Vol. 1403, pages 449–461, Springer-Verlag, 1998.
[JSI96] M. Jakobsson, K. Sako, and R. Impagliazzo, “Designated verifier proofs and their applications”, Advances in Cryptology – Eurocrypt’96, LNCS Vol. 1070, pages 143–154, Springer-Verlag, 1996.
[LK00] B. Lee and K. Kim, “Receipt-free electronic voting through collaboration of voter and honest verifier”, Proceedings of JW-ISC 2000, pages 101–108, Jan. 25–26, 2000, Okinawa, Japan.
[MBC01] E. Magkos, M. Burmester, V. Chrissikopoulos, “Receipt-freeness in large-scale elections without untappable channels”, 1st IFIP Conference on E-Commerce/E-Business/E-Government, Zurich, October 2001, Kluwer Academic Publishers, pages 683–693, 2001.
[MH96] M. Michels and P. Horster, “Some remarks on a receipt-free and universally verifiable mix-type voting scheme”, Advances in Cryptology – Asiacrypt’96, LNCS Vol. 1163, pages 125–132, Springer-Verlag, 1996.
[NR94] V. Niemi and A. Renvall, “How to prevent buying of votes in computer elections”, Advances in Cryptology – Asiacrypt’94, LNCS Vol. 917, pages 141–148, Springer-Verlag, 1994.
[Oka97] T. Okamoto, “Receipt-free electronic voting schemes for large scale elections”, Proc. of Workshop on Security Protocols’97, LNCS Vol. 1361, pages 25–35, Springer-Verlag, 1997.
[OMAFO99] M. Ohkubo, F. Miura, M. Abe, A. Fujioka and T. Okamoto, “An improvement on a practical secret voting scheme”, Information Security’99, LNCS Vol. 1729, pages 225–234, Springer-Verlag, 1999.
[Pai99] P. Paillier, “Public-key cryptosystems based on composite degree residuosity classes”, Advances in Cryptology – Eurocrypt’99, LNCS Vol. 1592, pages 223–238, Springer-Verlag, 1999.
[Pfi94] B. Pfitzmann, “Breaking an efficient anonymous channel”, Advances in Cryptology – Eurocrypt’94, LNCS Vol. 950, pages 332–340, Springer-Verlag, 1994.
[PIK93] C. Park, K. Itoh, and K. Kurosawa, “Efficient anonymous channel and all/nothing election scheme”, Advances in Cryptology – Eurocrypt’93, LNCS Vol. 765, pages 248–259, Springer-Verlag, 1994.
[Po00] D. Pointcheval, “Self-scrambling anonymizers”, Proceedings of Financial Cryptography 2000, pages 259–275, LNCS Vol. 1962, Springer-Verlag, 2001.
[SK94] K. Sako and J. Kilian, “Secure voting using partially compatible homomorphisms”, Advances in Cryptology – Crypto’94, LNCS Vol. 839, pages 411–424, Springer-Verlag, 1994.
[SK95] K. Sako and J. Kilian, “Receipt-free mix-type voting scheme – a practical solution to the implementation of a voting booth”, Advances in Cryptology – Eurocrypt’95, LNCS Vol. 921, pages 393–403, Springer-Verlag, 1995.
Non-interactive Auction Scheme with Strong Privacy
Kun Peng, Colin Boyd, Ed Dawson, and Kapali Viswanathan
Information Security Research Centre, Faculty of Information Technology, Queensland University of Technology, 2 George Street, Brisbane, QLD 4001, Australia
{k.peng,c.boyd,e.dawson,k.viswanathan}@qut.edu.au
Abstract. The key chain, an effective tool for achieving strong bid privacy non-interactively, was employed by Watanabe and Imai in an auction scheme. But in their scheme bid privacy cannot be achieved unconditionally, and losing bidders must trust the bidders with higher bids for the privacy of their own bids. Moreover, their scheme is not efficient. In this paper the key chain in the scheme by Watanabe and Imai is optimised to achieve unconditional bid privacy. In the new scheme, every losing bidder controls the privacy of his own bid, and no trust is needed. The computational cost is also reduced by avoiding the costly verifiable encryption technique used in their scheme.
1 Introduction
A sealed-bid auction is an ideal method to distribute merchandise. In sealed-bid auctions each bidder seals his bid (by encryption or a hash function) and submits it before a set time. After that time the bids are opened, and the winning price and winner are determined according to a pre-defined auction rule. Compared to other types of auction, such as open-cry auctions, sealed-bid auctions are more suitable in a network environment; therefore sealed-bid auctions have attracted the most attention in e-auction research. In many auction applications it is desirable to keep the losing bids private even at the end of the auction. This requirement is called bid privacy and is discussed in many papers.
Watanabe and Imai presented a non-interactive sealed-bid auction scheme [15] which provides privacy for the losing bids. The essential idea in this scheme is a technique called a key chain. The advantage of that scheme is that bid privacy is obtained non-interactively (the bidders need not participate in opening the bids after they submit them). The authors claimed that they provided satisfactory bid privacy (“. . . prevent even an auctioneer from getting any useful information of bids of losers . . . ”). However, bid privacy in this scheme rests on strong trust (either a fraction of the bidders, the auctioneer, or a third party must be trusted): a losing bid can be revealed by the cooperation of the auctioneer and all the bidders with higher bids. This kind of bid privacy may not be satisfactory. Moreover, the scheme is not efficient.
In this paper a new scheme is presented. The idea of the key chain is inherited, but the key chain is constructed in a different way, so that bid privacy for a losing bidder is achieved without any trust in other parties: without the cooperation of a losing bidder, his bid is private. Additionally, the new scheme is simpler, as the third party T and the auctioneer A are removed. As a result, communication in the proposed scheme is more efficient than in [15].
1.1 Desired Properties in Sealed-Bid Auction
There are several properties that are usually desired in e-auction schemes [2, 14, 13]. Their definitions are as follows.
1. Correctness: If every party acts honestly, the correct winning price and winner(s) are determined according to the auction rules.
2. Soundness: If an auction result is declared, it is a correct result even if some parties are dishonest.
3. Fairness: No bidder can take advantage of other bidders. This includes:
– No bidder knows anything about other bidders’ bids before he submits his own bid.
– After a bidder submits his bid, the bid cannot be modified.
– No bidder can deny his bid after he submits it. This is sometimes called non-repudiation of bids.
4. Bid Privacy: The losing bids remain confidential up to, and after, the end of the auction, even to the auctioneers.
5. Public Verifiability: The validity of the auction result is publicly verifiable by anyone.
6. High Efficiency: Computation and communication must be efficient enough for applications.
1.2 Symbols and Outline
G is a cyclic group with a generator g. There are n bidders B1, B2, . . . , Bn and w biddable prices p1, p2, . . . , pw, from highest to lowest. Ea(b) denotes the encryption of b under a public key a, and Da(b) the decryption of b with a private key a. Siga(b) denotes a's signature on b together with the message b. VEa(b) is a verifiable encryption of b under a's key. In Section 2, related auction schemes are introduced. In Section 3, the scheme by Watanabe and Imai is reviewed and analysed. In Section 4, our new scheme is presented. In Section 5, the security of our scheme is analysed.
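For concreteness, the Ea(b)/Da(b) notation can be instantiated with ElGamal as in the following minimal sketch (toy 61-bit prime and assumed generator; any semantically secure scheme fits the role).

```python
# Minimal ElGamal instantiation of the E_a(b) / D_a(b) notation.
import random

p = 2**61 - 1          # toy prime; a real deployment uses a 1024-bit group
g = 3                  # assumed generator

def keygen():
    a = random.randrange(2, p - 1)
    return a, pow(g, a, p)          # (private key, public key)

def E(pk, b):
    k = random.randrange(2, p - 1)
    return pow(g, k, p), b * pow(pk, k, p) % p

def D(sk, ct):
    c1, c2 = ct
    return c2 * pow(c1, p - 1 - sk, p) % p   # c2 / c1^sk

sk, pk = keygen()
assert D(sk, E(pk, 42)) == 42
```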
2 Related Work
Bid privacy is a frequently desired property in auction schemes. It refers to the confidentiality of the losing bids from anybody, even after the auction ends. In current auction schemes, two methods are often applied to implement bid privacy.
The first method is to trust some parties to conceal the losing bids. To strengthen bid privacy, the trust is often shared among a few auctioneers, so that bid privacy is achieved as long as the number of honest parties is over a threshold. This mechanism is usually realized by sharing the capability of bid-opening among several auctioneers and requiring the cooperation of a portion of them to open the bids. Several published schemes fall into this category [5, 7, 4, 6, 1, 9]. [5, 7] employ the standard threshold secret sharing technique (a sketch of the underlying (t, n) sharing appears at the end of this section). [4] employs a special 2-out-of-2 secret sharing. [6] also employs threshold secret sharing, but uses the degree of polynomials to represent a bid. [1, 9] employ distributed decryption: [1] uses standard threshold distributed decryption, while [9] employs only two auctioneers and is in fact 2-out-of-2 distributed decryption, if bid decryption is defined as interpreting the meaning of bids in auction schemes. The disadvantage of this method is that the bid privacy obtained is not strong enough.
In some applications stronger bid privacy is required. The strongest is unconditional bid privacy: without the cooperation of a losing bidder, his bid is confidential. A mechanism called Dutch-style bid opening can be employed to achieve unconditional bid privacy. In this mechanism the bids are opened downwards from the highest biddable price, much like the strategy in a Dutch auction. After the winning bid is found in a downward search, cooperation from the bidders is no longer available, so any losing bidder's bid is kept private without trust in anybody else. Therefore very strong, absolute privacy is achieved. The disadvantage of this method is low efficiency: the schemes are interactive and computationally inefficient. Prominent schemes in this category include [12], [11] and [13].
A scheme by Watanabe and Imai [15] was claimed to achieve strong bid privacy non-interactively. A cryptographic tool, the key chain, is employed in this scheme. The bids are opened in a downward direction from the highest biddable price until the winning bid is found. Bid opening is non-interactive, which is an advantage over [12], [11] and [13]. However, the bid privacy is not very strong. Further details are given below.
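The threshold machinery behind the first method is standard (t, n) secret sharing; the following sketch shows a plain Shamir construction over a toy prime field (all parameters illustrative, not the exact sharing of any particular scheme cited here), in which any t shares reconstruct the bid-opening key and fewer reveal nothing.

```python
# (t, n) Shamir sharing of a bid-opening key (toy prime field).
import random

P = 2**61 - 1

def share(secret, t, n):
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at 0 recovers the polynomial's constant term.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

key = random.randrange(P)
shares = share(key, t=3, n=5)
assert reconstruct(shares[:3]) == key    # any 3 of 5 shares suffice
```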
3 Auction Scheme Based on Key Chain
3.1 Key Chain
In [15] only a finite set of prices is biddable, and a key chain is constructed for these prices. The principle of the key chain is as follows.
1. At each price all the bids are encrypted with the same public key, which is generated by all the bidders.
2. The corresponding decryption key is shared among the bidders. Only when all the bidders put their shares together can the bids at that price be opened.
3. If a bidder is not willing to pay a price, his bidding value at that price contains his share of the decryption key needed to open the bids at the next lower price. So if none of the bidders is willing to pay a price, the decryption
key to open the bids at the next lower price can be constructed from their opened bids at that price.
4. If a bidder is willing to pay a price, his share of the decryption key needed to open the bids at the next lower price is not contained in his bid for the current price. In this case the key chain is broken and the decryption key to open the bids at the next lower price cannot be constructed; thus the confidentiality of the losing bids is protected.
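The key-chain principle can be simulated in a few lines. The toy sketch below uses additive shares over a toy field and omits the encryption layer and the auctioneer's share entirely: a "No" bid simply reveals the bidder's share of the next key, and the first "Yes" bid breaks the chain (all names and values are illustrative).

```python
# Toy key-chain walk: "No" bids at price j carry shares of key X_{j+1};
# one "Yes" bid withholds a share and breaks the chain.
import random

P = 2**61 - 1
N_BIDDERS, N_PRICES = 3, 6
evaluations = [2, 3, 5]     # price index each bidder is willing to pay

# shares[i][j]: bidder i's additive share of the key for price j
shares = [[random.randrange(P) for _ in range(N_PRICES)]
          for _ in range(N_BIDDERS)]
keys = [sum(col) % P for col in zip(*shares)]    # X_j = sum_i x_{i,j}

def bid(i, j):
    # Below his evaluation a bidder would submit a random bid; the chain
    # has already broken by then, so that case never surfaces here.
    return "Yes" if j == evaluations[i] else shares[i][j + 1]

for j in range(N_PRICES - 1):
    opened = [bid(i, j) for i in range(N_BIDDERS)]
    if "Yes" in opened:
        print(f"winner: bidder {opened.index('Yes')} at price index {j}")
        break                                   # X_{j+1} not recoverable
    assert sum(opened) % P == keys[j + 1]       # next key reconstructed
```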
3.2 The Scheme by Watanabe and Imai
There is an active auctioneer in the scheme by Watanabe and Imai. The auctioneer is responsible for constructing the public keys in the chain. To weaken the trust in the bidders, a share of each decryption key is provided by the auctioneer. Moreover, verifiable encryption is employed so that an off-line third party can interfere if a bidder is dishonest when constructing the key chain (i.e., a correct share of the next decryption key is missing from one of his bids). In this case the third party can recover the concealed correct share to help construct the next decryption key. Their protocol is as follows.
1. Registration phase
– Bidder Bi chooses his secret share xi,j for each price pj. The corresponding public key share is yi,j = g^{xi,j}. Additionally, xi,j is encrypted as βi,j = VE_T(xi,j) under a third party T's public key; Watanabe and Imai adopted the Naccache-Stern encryption algorithm [8]. βi,j is recoverable by T and can be verified as a correct encryption of the secret committed in yi,j by a zero-knowledge proof of equality of logarithms [3]. Bi signs and sends yi,j and βi,j for j = 1, 2, . . . , w to the auctioneer A.
– A verifies Bi's signature on yi,j and βi,j for j = 1, 2, . . . , w and the correctness of the encryption. If the verification is successful, A sends a certificate certi = (zi,1, zi,2, . . . , zi,w) to Bi, where zi,j = Sig_A(Bi, yi,j). Then A chooses his own secret shares x_{A,j} and generates the public keys in the chain $Y_j = g^{x_{A,j}} \prod_{i=1}^{n} y_{i,j}$ for j = 1, 2, . . . , w. Finally A publishes Yj for j = 1, 2, . . . , w and the registration information of the bidders. Key generation is illustrated in Table 1 for the case of 3 bidders and 6 biddable prices (n = 3, w = 6).
2. Bidding phase
– Bi publishes his bid Vi,j = E_{Yj}(Ii,j, yi,j, zi,j) for j = 1, 2, . . . , w. If he is not willing to pay pj, Ii,j = (No, xi,j+1). If he is willing to pay pj, Ii,j = (Yes, proof(xi,j+1)), where proof(xi,j+1) is a transcript of a zero-knowledge proof of knowledge of xi,j+1. Ii,j can be checked against yi,j and zi,j to show that Bi provides a valid xi,j+1 (in a “No” bid) or knows its value (in a “Yes” bid). The bid format is illustrated in Table 2 (again with 3 bidders and 6 biddable prices); for simplicity, only xi,j, the basic element of bid Vi,j, is shown in the table.
3. Opening phase
– Bi publishes xi,1, yi,1 and zi,1.
– A calculates and publishes $X_1 = x_{A,1} + \sum_{i=1}^{n} x_{i,1}$, the decryption key for the bids at p1.
– If no “Yes” bid is found at this price, the decryption key for p2 can be constructed and the opening continues. In the same way the opening proceeds along the key chain until a “Yes” bid is found as the winning bid and the key chain is broken.
3.3 Problems in the Scheme by Watanabe and Imai
Among the desired properties introduced in Section 1.1, bid privacy and high efficiency cannot be achieved satisfactorily. Since A provides a share of each decryption key, the trust for bid privacy is shared not only among the bidders but also with A. Namely, the trust needed for the
Table 1. Key generation in the scheme by Watanabe and Imai

      A         B1               B2               B3               encryption key
p1    g^xA1     y1,1 = g^x1,1    y2,1 = g^x2,1    y3,1 = g^x3,1    Y1 = g^xA1 × y1,1 × y2,1 × y3,1
p2    g^xA2     y1,2 = g^x1,2    y2,2 = g^x2,2    y3,2 = g^x3,2    Y2 = g^xA2 × y1,2 × y2,2 × y3,2
p3    g^xA3     y1,3 = g^x1,3    y2,3 = g^x2,3    y3,3 = g^x3,3    Y3 = g^xA3 × y1,3 × y2,3 × y3,3
p4    g^xA4     y1,4 = g^x1,4    y2,4 = g^x2,4    y3,4 = g^x3,4    Y4 = g^xA4 × y1,4 × y2,4 × y3,4
p5    g^xA5     y1,5 = g^x1,5    y2,5 = g^x2,5    y3,5 = g^x3,5    Y5 = g^xA5 × y1,5 × y2,5 × y3,5
p6    g^xA6     y1,6 = g^x1,6    y2,6 = g^x2,6    y3,6 = g^x3,6    Y6 = g^xA6 × y1,6 × y2,6 × y3,6
Table 2. Bids in the scheme by Watanabe and Imai

      B1                   B2                   B3                   decryption key
p1    E_Y1(x1,2)           E_Y1(x2,2)           E_Y1(x3,2)           X1 = xA1 + x1,1 + x2,1 + x3,1
p2    E_Y2(proof(x1,3))    E_Y2(x2,3)           E_Y2(x3,3)           X2 = xA2 + x1,2 + x2,2 + x3,2
p3    E_Y3(x1,4)           E_Y3(proof(x2,4))    E_Y3(x3,4)           B1 and A must collude to recover X3
p4    E_Y4(x1,5)           E_Y4(x2,5)           E_Y4(x3,5)           B1, B2 and A must collude to recover X4
p5    E_Y5(x1,6)           E_Y5(x2,6)           E_Y5(proof(x3,6))    B1, B2 and A must collude to recover X5
p6    E_Y6(x1,1)           E_Y6(x2,1)           E_Y6(x3,1)           B1, B2, B3 and A must collude to recover X6
privacy of the (i + 1)-th highest bid is shared among the bidders submitting the highest i bids and A. As a result, weaker trust is required; however, bid privacy is still conditional and the scheme is still unfair to bidders with lower bids. Because verifiable encryption enables T to recover a secret share once he obtains its encrypted value, registration information from bidders must be transmitted through a confidential channel (this was not stated by Watanabe and Imai). Even though the registration information is encrypted, collusion of A and T can still reveal all decryption keys and thus all losing bids. That means bid privacy rests on the following two assumptions:
1. A and the winner do not conspire;
2. A and T do not conspire.
These are still strong assumptions and require strong trust.
Inefficiency is also a problem. Because an active auctioneer is involved in the key chain construction and verifiable encryption is employed, computation and communication in the registration phase are costly. Another issue affecting efficiency is bid padding. Every bidder's highest positive bid (a transcript of a non-interactive zero-knowledge proof of knowledge) is in a different format from his other bids (the encryption of an integer less than the order of G, which lies in G when the ElGamal encryption algorithm is employed). As the highest positive bids are much longer, the other bids must be padded to the same length to make the encrypted bids indistinguishable from one another, although padding was not mentioned in the paper by Watanabe and Imai. This increases the communication burden of the scheme.
4 New Scheme
We want unconditional bid privacy, namely that no trust in any other party is needed for the confidentiality of a losing bidder's bid.
Fig. 1. Modified key chain: bids v1, v2, v3 are encrypted under the chained keys Y1, Y2, Y3; X1 is recovered and the three valid shares for X2 are revealed, so X2 is recovered; only two valid shares for X3 are revealed and the winner does not know his own share for X3, so X3 is not recoverable and no shares for X4 are revealed.
In the new scheme, when there is a winning bid, the key chain is broken completely. One solution is to construct the key chain according to the following rule: if a bidder makes a positive bid at a price, he does not hold a share of the decryption key for the next lower price; his share is instead shared again among all the bidders. So the public keys are generated in a special way, such that the winner's share of the decryption key at the winning price can only be extracted by the cooperation of all the bidders. Therefore no decryption key at a price lower than the winning price can be reconstructed without the cooperation of all bidders. The modified key chain is illustrated in Figure 1 in an example where the fourth highest bid is the winning bid.
To obtain a simpler, more effective, and more efficient scheme, no active auctioneer is employed and no registration phase is needed; neither a third party nor verifiable encryption is required. Bidders performing malicious behaviour (e.g., failing to reveal the correct share in a “No” bid) can be publicly identified. Our scheme includes four phases: initial phase, pre-bidding phase, bidding phase and opening phase.
1. Initial phase:
Each bidder Bi chooses a secret xi and publishes the commitment Com1i = (Bi, yi, SigBi(Bi, yi)), where yi = g^{xi}, for i = 1, 2, . . . , n on a bulletin board.
2. Pre-bidding phase:
Every bidder publishes a public key for every biddable price. If a bidder Bi is not willing to pay pj, his public key for pj+1 is yi,j+1 = g^{xi,j+1}, where the corresponding secret key xi,j+1 is kept secret. If bidder Bi's bidding price is pj, his public key for pj+1 is $y_{i,j+1} = g^{r_i} \prod_{k=1,k\neq i}^{n} y_k$, where ri is kept secret, and he chooses the public keys yi,j+2, yi,j+3, . . . , yi,w randomly for pj+2, pj+3, . . . , pw. Bi publishes Com2i = (Bi, yi,1, yi,2, . . . , yi,w, SigBi(Bi, yi,1, yi,2, . . . , yi,w)) on the bulletin board. Key generation is illustrated in Table 3 (supposing there are 3 bidders and 6 biddable prices, so that n = 3, w = 6). The public key for price pj is $Y_j = \prod_{k=1}^{n} y_{k,j}$ and can be calculated by anybody from the public values on the bulletin board.
3. Bidding phase:
Every bidder submits a bid for each biddable price.
Table 3. Key generation in our scheme (evaluations: B1 bids p2, B2 bids p3, B3 bids p5)

      B1                      B2                      B3                      encryption key
p1    y1,1 = g^x1,1           y2,1 = g^x2,1           y3,1 = g^x3,1           Y1 = y1,1 × y2,1 × y3,1
p2    y1,2 = g^x1,2           y2,2 = g^x2,2           y3,2 = g^x3,2           Y2 = y1,2 × y2,2 × y3,2
p3    y1,3 = g^r1 × y2 × y3   y2,3 = g^x2,3           y3,3 = g^x3,3           Y3 = y1,3 × y2,3 × y3,3
p4    any y1,4 in G           y2,4 = g^r2 × y1 × y3   y3,4 = g^x3,4           Y4 = y1,4 × y2,4 × y3,4
p5    any y1,5 in G           any y2,5 in G           y3,5 = g^x3,5           Y5 = y1,5 × y2,5 × y3,5
p6    any y1,6 in G           any y2,6 in G           y3,6 = g^r3 × y1 × y2   Y6 = y1,6 × y2,6 × y3,6
If a bidder Bi is not willing to pay pj, his bid at pj is Vi,j = E_{Yj}(xi,j+1). If Bi is willing to pay pj, Vi,j = E_{Yj}(ri). At each price pj lower than his evaluation, Vi,j is randomly chosen. Bi publishes Vi = (Bi, Vi,1, Vi,2, . . . , Vi,w, SigBi(Bi, Vi,1, Vi,2, . . . , Vi,w)) on the bulletin board. The bid format is illustrated in Table 4 (supposing there are 3 bidders and 6 biddable prices).
4. Opening phase:
The bidders publish Com3i = (xi,1, SigBi(xi,1)) for i = 1, 2, . . . , n. Anybody can verify the validity of the shares against yi,1 for i = 1, 2, . . . , n, construct the decryption key for the first price $X_1 = \sum_{k=1}^{n} x_{k,1}$ and decrypt all the bids at p1. The meaning of Bi's decrypted bid vi,1 can be determined by testing whether $y_{i,2} = g^{v_{i,1}}$ (vi,1 is a negative bid) or $y_{i,2} = g^{v_{i,1}} \prod_{k=1,k\neq i}^{n} y_k$ (vi,1 is a positive bid). If there is no bid showing willingness to pay at p1, all the shares xi,2 = vi,1 for i = 1, 2, . . . , n are obtained and $X_2 = \sum_{k=1}^{n} x_{k,2}$ can be recovered. Then all the bids at p2 are opened. The opening continues until some vi,j with $g^{v_{i,j}} \neq y_{i,j+1}$ is met and the key chain breaks at pj+1. If $y_{i,j+1} = g^{v_{i,j}} \prod_{k=1,k\neq i}^{n} y_k$, then pj and Bi are declared as the winning price and the winner; otherwise Bi is identified as a cheater. Figure 2 illustrates the auction procedure.
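The opening-phase classification of a decrypted bid can be sketched as follows (toy group; the ElGamal layer is omitted and the sketch starts from an already-decrypted value v; all names illustrative).

```python
# Classifying a decrypted bid v in the new scheme's opening phase.
import random

p = 2**61 - 1
g = 3                                       # assumed generator
n = 3
x = [random.randrange(2, p - 1) for _ in range(n)]   # long-term secrets x_k
y = [pow(g, xk, p) for xk in x]                      # y_k = g^{x_k}

def prod_others(i):
    r = 1
    for k in range(n):
        if k != i:
            r = r * y[k] % p
    return r

def classify(i, v, y_next):
    if pow(g, v, p) == y_next:                        # y_{i,j+1} = g^v
        return "No"
    if pow(g, v, p) * prod_others(i) % p == y_next:   # g^v * prod_{k!=i} y_k
        return "Yes"
    return "cheater"

r0 = random.randrange(2, p - 1)
positive_key = pow(g, r0, p) * prod_others(0) % p     # g^{r_0} * y_1 * y_2
assert classify(0, r0, positive_key) == "Yes"

share = random.randrange(2, p - 1)
assert classify(0, share, pow(g, share, p)) == "No"   # y_{0,j+1} = g^{x'}
```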
5 Analysis
The new auction scheme is analysed in this section in relation to the properties from section 1.1. It will be shown that the scheme is correct, sound, fair, publicly verifiable and achieves unconditional privacy for losing bids.
Table 4. Bids in our scheme (evaluations: B1 bids p2, B2 bids p3, B3 bids p5)

      B1                             B2                             B3                             decryption key
p1    E_Y1(x1,2)                     E_Y1(x2,2)                     E_Y1(x3,2)                     X1 = x1,1 + x2,1 + x3,1
p2    E_Y2(r1)                       E_Y2(x2,3)                     E_Y2(x3,3)                     X2 = x1,2 + x2,2 + x3,2
p3    random bid in correct format   E_Y3(r2)                       E_Y3(x3,4)                     B2 and B3 must collude to recover X3
p4    random bid in correct format   random bid in correct format   E_Y4(x3,5)                     all the bidders must collude to recover X4
p5    random bid in correct format   random bid in correct format   E_Y5(r3)                       all the bidders must collude to recover X5
p6    random bid in correct format   random bid in correct format   random bid in correct format   all the bidders must collude to recover X6
Fig. 2. Optimistic auction procedure (BB denotes the bulletin board):
1. Initial phase. Bi → BB: Com1i = SigBi(Bi, yi), where yi = g^{xi}.
2. Pre-bidding phase. Bi → BB: Com2i = SigBi(Bi, yi,1, yi,2, . . . , yi,w); for a negative bid yi,j+1 = g^{xi,j+1}, for a positive bid $y_{i,j+1} = g^{r_i} \prod_{k=1,k\neq i}^{n} y_k$.
3. Bidding phase. Bi → BB: Vi = SigBi(Bi, Vi,1, Vi,2, . . . , Vi,w); for a negative bid Vi,j = E_{Yj}(xi,j+1), for a positive bid Vi,j = E_{Yj}(ri).
4. Opening phase. Bi → BB: Com3i = SigBi(xi,1). At price pj, construct $X_j = \sum_{i=1}^{n} x_{i,j}$ and decrypt vi,j = D_{Xj}(Vi,j). If $g^{v_{i,j}} = y_{i,j+1}$, Vi,j is a negative bid; if $g^{v_{i,j}} = y_{i,j+1} / \prod_{k=1,k\neq i}^{n} y_k$, Vi,j is a positive bid and the opening stops. If vi,j = xi,j+1 for i = 1, 2, . . . , n are recovered, $X_{j+1} = \sum_{i=1}^{n} v_{i,j}$ is constructed and the opening continues.
1. Correctness: An honest bidder Bi publishes $x_{i,1} = \log_g y_{i,1}$, so $X_1 = \sum_{k=1}^{n} x_{k,1} = \log_g Y_1$ can be reconstructed. Therefore the key chain starts correctly and the bids at p1 can be opened. An honest bidder Bi's bids at the biddable prices are as follows:
(a) At a price pj no lower than his evaluation, his bid is xi,j+1 satisfying $y_{i,j+1} = g^{x_{i,j+1}}$.
(b) At the price pj equal to his evaluation, his bid is ri satisfying $y_{i,j+1} = g^{r_i} \prod_{k=1,k\neq i}^{n} y_k$.
(c) At a price pj lower than his evaluation, his bid is a random value.
If the bids at a price pj higher than every bidder's evaluation are opened, the decrypted bids are $v_{i,j} = x_{i,j+1} = \log_g y_{i,j+1}$ for i = 1, 2, . . . , n; thus $X_{j+1} = \sum_{k=1}^{n} x_{k,j+1} = \log_g Y_{j+1}$ can be reconstructed. So the key chain extends correctly one step downwards and the bids at pj+1 can be opened. Namely, as long as all the opened bids are as described in (a) above, the key chain extends further. Therefore, if all bidders have an evaluation lower than the lowest biddable price, the key chain extends all the way to pw and the item on sale is not sold. Otherwise, $v_{i,j} = r_i$ satisfying $y_{i,j+1} = g^{r_i} \prod_{k=1,k\neq i}^{n} y_k$ must be met for some i and j; in this case pj is the winning price and Bi is the winner.
2. Soundness: As the number of biddable prices is finite, the extension of the key chain must stop somewhere.
(a) If the key chain extends to pw and no winner is found, then $y_{i,j+1} = g^{D_{X_j}(V_{i,j})}$ for i = 1, 2, . . . , n and j = 1, 2, . . . , w − 1. Since yi,j+1 and Vi,j for i = 1, 2, . . . , n and j = 1, 2, . . . , w − 1 are signed by Bi, they were generated by Bi if the signature algorithm is secure. So no bidder submitted a positive bid at or above the lowest biddable price.
(b) If pu and Bv are declared as the winning price and the winner, then $y_{i,j+1} = g^{D_{X_j}(V_{i,j})}$ for i = 1, 2, . . . , n, j = 1, 2, . . . , u − 1, and for all i ≠ v at j = u. Since yi,j+1 and Vi,j for i = 1, 2, . . . , n and j = 1, 2, . . . , u are signed by Bi, they were generated by Bi if the signature algorithm is secure. So pu and Bv are indeed the winning price and the winner.
(c) If Bi is declared a cheater, the key chain must be broken at a price pu with $y_{i,j+1} = g^{D_{X_j}(V_{i,j})}$ for j = 1, 2, . . . , u − 2, while $y_{i,u} \neq g^{D_{X_{u-1}}(V_{i,u-1})}$ and $y_{i,u} \neq g^{D_{X_{u-1}}(V_{i,u-1})} \prod_{k=1,k\neq i}^{n} y_k$. Since yi,j+1 and Vi,j for j = 1, 2, . . . , u − 1 are signed by Bi, they were generated by Bi if the signature algorithm is secure. So Bi is indeed a cheater.
3. Fairness:
– First we show that no bids are revealed before the opening phase. Before the opening phase, only every bidder's public keys and bids for each price are published. The public keys are generated in two ways. In the first, a bidder Bi chooses a secret key xi,j randomly for pj and the public key is $y_{i,j} = g^{x_{i,j}}$. Since xi,j is chosen randomly from {1, 2, 3, . . . , ord(G)}, yi,j is uniformly distributed over G. In the second, Bi chooses a random value ri for pj and the public key
is $y_{i,j} = g^{r_i} \prod_{k=1,k\neq i}^{n} y_k$. Since ri is chosen randomly from {1, 2, 3, . . . , ord(G)}, yi,j is uniformly distributed over G too. In both cases all the public keys have the same (uniform) distribution over G, so no information about any bidder's bids is revealed by the public keys. All the submitted bids are encryptions of random integers less than ord(G), and thus have a uniform distribution in the ciphertext space (G in the case of ElGamal encryption) if a semantically secure encryption algorithm (e.g., ElGamal or Paillier's [10]) is employed. So no information about the bids is revealed by the encrypted bids, even though no padding operation is employed. Therefore, before the opening phase all bids are confidential under the assumption that the encryption algorithm is semantically secure.¹ The only way to open any bid is to construct the key chain, which requires the cooperation of all bidders and does not happen until the opening phase.
– No bidder can change or deny his bid after the bidding phase. A bidder Bi's bidding value at a price pj is determined by whether $y_{i,j+1} = g^{D_{X_j}(V_{i,j})}$ or $y_{i,j+1} = g^{D_{X_j}(V_{i,j})} \prod_{k=1,k\neq i}^{n} y_k$. Since yi,j+1 and Vi,j are published in the pre-bidding phase and the bidding phase respectively, they cannot be changed, so bidding values cannot be changed. Moreover, yi,j+1 and Vi,j are signed by Bi, so Bi cannot deny his bids.
4. Public Verifiability: All the information necessary to decide the auction result is published on the bulletin board, so anybody can verify the auction result using the contents of the bulletin board.
5. Bid Privacy: The bidders with higher bids (e.g., the winner) cannot take advantage of other bidders even after the auction result is known, because opening any losing bid requires the cooperation of all the losing bidders. When Bv is the winner and pu is the winning price, Bv's bid at pu is opened as rv satisfying $y_{v,u+1} = g^{r_v} \prod_{k=1,k\neq v}^{n} y_k$, while the other bidders' bids are opened as x1,u+1, . . . , xv−1,u+1, xv+1,u+1, . . . , xn,u+1. If an attacker could decrypt any losing bid at pu+1, he would have to know the decryption key
$X_{u+1} = r_v + \sum_{k=1}^{v-1} x_k + \sum_{k=v+1}^{n} x_k + \sum_{k=1}^{v-1} x_{k,u+1} + \sum_{k=v+1}^{n} x_{k,u+1}$
on the condition that the applied encryption algorithm (e.g., ElGamal or Paillier's) is secure. So he must know
$\sum_{k=1}^{v-1} x_k + \sum_{k=v+1}^{n} x_k = X_{u+1} - \sum_{k=1}^{v-1} x_{k,u+1} - \sum_{k=v+1}^{n} x_{k,u+1} - r_v.$
¹ An encryption algorithm is said to be semantically secure if, given that c_k is the encryption of message m0 or m1, it is computationally difficult to determine which is the correct message corresponding to c_k.
Table 5. Efficiency comparison

                                                        Scheme by Watanabe and Imai    Our scheme
Computational cost of a bidder (exponentiations)        8w + 1                         1.5w + 2
Computational cost of an auctioneer (exponentiations)   5.5nw + w + 4n                 nw/2 + 2n + 1
Bid length                                              at least 5120 bits             1024 bits
Communication                                           at least 1024(8n + 1)w bits    2048nw bits
But to know
$\sum_{k=1}^{v-1} x_k + \sum_{k=v+1}^{n} x_k = \sum_{k=1}^{v-1} \log_g y_k + \sum_{k=v+1}^{n} \log_g y_k,$
the attacker needs the cooperation of all the losing bidders, provided the Diffie-Hellman assumption holds. So without the cooperation of all the losing bidders, all losing bids at pu+1 are confidential. That also means no share of Xu+2 is published, so without the cooperation of all the losing bidders all losing bids at pu+2 are confidential too. Similarly, no lower bid can be opened without the cooperation of the losing bidders. In this fashion, stronger bid privacy is achieved in our scheme than in the scheme by Watanabe and Imai [15].
6 Efficiency Comparison
As stated before, [15] is not efficient in computation and communication. Our scheme greatly improves communication efficiency, as the bid length is much shorter and communication with an active auctioneer is avoided. Table 5 compares the computation and communication efficiency of the scheme by Watanabe and Imai and our scheme. For the comparison of computation efficiency, the parameters n and w denote the number of bidders and the number of biddable prices respectively, as before. For the comparison of communication efficiency, an integer length of 1024 bits is assumed for all the cryptographic primitives. Table 5 demonstrates the significant improvement in computation and communication efficiency for both the bidders and the auctioneers in the new scheme.
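Table 5 can be reproduced for sample parameters with the small sketch below (n bidders, w prices; the "at least" entries are taken at their stated lower bounds).

```python
# Evaluate the Table 5 formulas for sample n (bidders) and w (prices).
def compare(n, w):
    wi   = {"bidder exps":          8*w + 1,
            "auctioneer exps":      5.5*n*w + w + 4*n,
            "bid length (bits)":    5120,                  # "at least"
            "communication (bits)": 1024*(8*n + 1)*w}      # "at least"
    ours = {"bidder exps":          1.5*w + 2,
            "auctioneer exps":      n*w/2 + 2*n + 1,
            "bid length (bits)":    1024,
            "communication (bits)": 2048*n*w}
    for k in wi:
        print(f"{k:22} WI: {wi[k]:>12}  ours: {ours[k]:>12}")

compare(n=100, w=50)
```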
7 Conclusion
The key chain in the scheme by Watanabe and Imai [15] is modified so that stronger bid privacy can be achieved in the proposed auction scheme. So far, this is the only scheme that achieves non-interaction, public verifiability and
unconditional privacy for losing bids at the same time. Efficiency is also improved. However, their scheme is more robust, as it is able to continue the auction even if bidders disrupt it. In our new scheme, dishonest bidders can be identified and disenrolled, but it is necessary to rewind the auction.
References
[1] Masayuki Abe and Koutarou Suzuki. M+1-st price auction using homomorphic encryption. In Public Key Cryptography 2002, pages 115–124, Berlin, 2002. Springer-Verlag. Lecture Notes in Computer Science Volume 2288.
[2] Colin Boyd and Wenbo Mao. Security issues for electronic auctions. Technical report, 2000. Available at www.hpl.hl.com/techreports/2000/HP-2000-90.html.
[3] D. Chaum and T. P. Pedersen. Wallet databases with observers. In Ernest F. Brickell, editor, Advances in Cryptology – Crypto ’92, pages 89–105, Berlin, 1992. Springer-Verlag. Lecture Notes in Computer Science Volume 740.
[4] Koji Chida, Kunio Kobayashi, and Hikaru Morita. Efficient sealed-bid auctions for massive numbers of bidders with lump comparison. In Information Security, 4th International Conference, ISC 2001, pages 408–419, Berlin, 2001. Springer-Verlag. Lecture Notes in Computer Science Volume 2200.
[5] H. Kikuchi, Michael Harkavy, and J. D. Tygar. Multi-round anonymous auction. In Proceedings of the First IEEE Workshop on Dependable and Real-Time E-Commerce Systems, pages 62–69, June 1998.
[6] Hiroaki Kikuchi. (M+1)st-price auction. In The Fifth International Conference on Financial Cryptography 2001, pages 291–298, Berlin, February 2001. Springer-Verlag. Lecture Notes in Computer Science Volume 2339.
[7] Hiroaki Kikuchi, Shinji Hotta, Kensuke Abe, and Shohachiro Nakanishi. Distributed auction servers resolving winner and winning bid without revealing privacy of bids. In Proc. of the International Workshop on Next Generation Internet (NGITA 2000), IEEE, pages 307–312, July 2000.
[8] David Naccache and Jacques Stern. A new public key cryptosystem based on higher residues. In ACM Conference on Computer and Communications Security 1998, pages 160–174, 1998.
[9] Kazumasa Omote and Atsuko Miyaji. A second-price sealed-bid auction with the discriminant of the p-th root. In Financial Cryptography 2002, Berlin, 2002. Springer-Verlag.
[10] P. Paillier. Public-key cryptosystems based on composite degree residuosity classes. In Eurocrypt ’99, pages 223–238, Berlin, 1999. Springer-Verlag. Lecture Notes in Computer Science Volume 1592.
[11] K. Sako. An auction scheme which hides the bids of losers. In Public Key Cryptography 2000, pages 422–432, Berlin, 2000. Springer-Verlag. Lecture Notes in Computer Science Volume 1880.
[12] Kouichi Sakurai and S. Miyazaki. A bulletin-board based digital auction scheme with bidding down strategy – towards anonymous electronic bidding without anonymous channels nor trusted centers. In Proc. International Workshop on Cryptographic Techniques and E-Commerce, pages 180–187, Hong Kong, 1999. City University of Hong Kong Press.
[13] Koutarou Suzuki, Kunio Kobayashi, and Hikaru Morita. Efficient sealed-bid auction using hash chain. In International Conference on Information Security and
Cryptology 2000, pages 183–191, Berlin, 2000. Springer-Verlag. Lecture Notes in Computer Science Volume 2015.
[14] Kapali Viswanathan, Colin Boyd, and Ed Dawson. A three phased schema for sealed bid auction system design. In Information Security and Privacy, 5th Australasian Conference, ACISP 2000, pages 412–426, Berlin, 2000. Springer-Verlag. Lecture Notes in Computer Science Volume 1841.
[15] Yuji Watanabe and Hideki Imai. Reducing the round complexity of a sealed-bid auction protocol with an off-line TTP. In STOC 2000, pages 80–86. ACM, 2000.
An Anonymous Buyer-Seller Watermarking Protocol with Anonymity Control
Hak Soo Ju 1, Hyun Jeong Kim 2, Dong Hoon Lee 2, and Jong In Lim 2
1 Korea Information Security Agency (KISA), Korea. [email protected]
2 Center for Information and Security Technologies (CIST), Korea. [email protected], {donghlee, jilim}@korea.ac.kr
Abstract. Anonymity is one of the important requirements for electronic marketplaces if they are to offer privacy similar to that of current marketplaces. Unfortunately, most watermarking protocols in the literature have been developed without considering the buyer's anonymity, and it would be unsatisfactory for a buyer to have to reveal his/her identity to purchase multimedia contents. In this paper we propose an anonymous buyer-seller watermarking protocol, where a buyer can purchase contents anonymously, but anonymity control is provided, i.e., the seller can trace the identity of a copyright violator. The proposed scheme also provides unlinkability of the contents purchased by a buyer. Another distinct feature of our scheme is that the identification of an illegal buyer is accomplished in such a way that the seller can directly convince a judge that the buyer has redistributed contents; this removes any possibility of dispute. Furthermore, when an innocent buyer is falsely accused by a malicious seller, he/she can prove innocence without attending court or revealing any private information.
1 Introduction
In electronic commerce, copyright protection of the multimedia contents being sold is a key problem to be solved. Two techniques for protecting the copyright of contents are fingerprinting and watermarking. Fingerprinting allows redistributors of contents to be traced by obtaining the original buyer's information from the redistributed contents; watermarking allows the rights to the contents to be proved by obtaining the seller's information from the redistributed contents. Classical fingerprinting [4, 5] and watermarking [8, 9, 21] schemes are symmetric in the sense that both the seller and the buyer know the fingerprinted or watermarked copy. One problem with these symmetric techniques is that the watermark/fingerprint is inserted solely by the seller, which permits a framing attack by the seller or causes unsettled disputes: the seller (or a reselling agent) may benefit from making unauthorized copies, or
This work was supported by grant No. R01-2001-000-00537-0 (2002) from the Korea Science & Engineering Foundation.
the buyer accused of reselling an unauthorized copy may claim that the copy originated from the seller or that there was a security breach in the seller's system.
Two non-symmetric techniques have been introduced: asymmetric fingerprinting by Pfitzmann et al. [16] and the owner-customer watermarking protocol by Qian and Nahrstedt [20]. Since the first scheme is based on secure two-party computation, its complexity is too high for practical implementation; moreover, the scheme in [16] does not offer buyer privacy, and the scheme in [20] does not solve the problem of a copyright violator denying his/her guilt. Pfitzmann et al. proposed an anonymous fingerprinting scheme [17] which satisfies both anonymity and unlinkability: no one can find the purchase information of honest buyers unless the registration center reveals the buyers' private information. However, this scheme is still inefficient, because it too is based on secure two-party computation. As an efficient method without secure two-party computation, Pfitzmann et al. suggested the coin-based construction [18]. But these methods use the scheme of Boneh and Shaw [5] as a building block for collusion resistance, and the Boneh-Shaw code needed for embedding is so long that the overall system cannot be practical.
Recently, Memon and Wong proposed the so-called interactive buyer-seller watermarking protocol, which is asymmetric and can prevent a copyright violator from denying his/her guilt using invisible watermarking [15]. They used Cox's invisible watermarking algorithm [8] as a building block; Cox's algorithm uses normally distributed random values as watermarks, which are highly resistant to collusion attacks [12]. The Memon and Wong scheme (MW scheme) satisfies neither anonymity nor unlinkability: a buyer must give the seller important private information, such as purchase information and his/her public key and identity, in order to buy contents. To resolve a dispute in the MW scheme, the accused (and possibly innocent) buyer has to take part in the dispute resolution protocol, revealing his/her secret key or watermark to the judge. The MW scheme also cannot operate properly if the encryption algorithm used by the buyer is probabilistic, because the data encrypted by the judge (TTP) may not equal the data submitted by the seller; therefore the encryption scheme in the MW scheme must be deterministic. Our scheme is constructed on the MW scheme's embedding method, using Cox's algorithm for collusion resistance, but solves these problems.
We define the properties that watermarking and fingerprinting schemes must satisfy as follows:
Non-repudiation: The buyer accused of reselling an unauthorized copy should not be able to claim that the copy was created by the seller or resulted from a security breach of the seller's system.
No-framing: An honest buyer should not be falsely accused by a malicious seller or by other buyers.
Anonymity: A buyer should be able to purchase contents anonymously.
Unlinkability: Given two contents, no one should be able to decide whether they were purchased by the same buyer.
1.1 Our Result
Anonymity becomes even more important when one purchases contents that reveal a lot about one's lifestyle, habits, etc. Various research on anonymity has been done for anonymous payment systems and communication systems [13, 7, 3]. It is a great pity that a buyer's anonymity is destroyed for the purpose of obtaining watermarked contents. In this paper we propose an anonymous buyer-seller watermarking protocol, where a buyer can purchase cheap contents such as audio, image, video and text anonymously; nevertheless, the seller is able to identify copyright violators later. This possibility of identification exists only for dishonest buyers, whereas honest buyers always remain anonymous. The proposed scheme also satisfies unlinkability of the contents purchased by a buyer.
Our scheme is constructed on the MW scheme. To add buyer anonymity and unlinkability to the MW scheme, a pair of one-time anonymous private and public keys is used. Because the seller cannot identify a buyer from the buyer's anonymous public keys, the anonymous key pair offers buyer anonymity; unlinkability is provided since the key pair is generated independently for each purchase of contents. Another distinct feature of our scheme is an identification protocol in which copyright-violator detection and dispute resolution are integrated. A dispute may be raised by the accused buyer for the purpose of denying his illegal act. Our scheme does not need to carry out any dispute resolution protocol: the identification of an illegal buyer allows the seller to directly convince a judge that the buyer has redistributed contents. This feature is implemented by exploiting verifiable encryption schemes [2, 22, 19, 1, 6], where the ownership of the anonymous public key is certified by the trusted third party without disclosing the identity of the owner. In the MW scheme, the accused buyer must reveal his/her private key or the watermark issued by the watermark certificate center to prove his/her innocence, even if the buyer is innocent. The exposure of the private key is undesirable for the honest buyer, and in case the accused buyer refuses to reveal his/her secret information and insists on innocence, the dispute cannot be resolved. All these restrictions are removed in our scheme. Since identification in our scheme is carried out without any help of the accused buyer, our scheme removes the possibility that a (malicious) seller accuses any buyer of misconduct, causing the buyer the inconvenience of going to court and disclosing his/her secret to prove innocence. We compare our result with the previous schemes mentioned above in the following table:
                           Classic Fing [4, 5]   PS96 [16]   PW97 [17]   PS99 [18]   QN98 [20]   MW01 [15]   Our Scheme
Non-Repudiation                    ×                 ◦           ◦           ◦           ×           ◦           ◦
No-framing                         ×                 ◦           ◦           ◦           ×           ◦           ◦
Anonymity                          ×                 ×           ◦           ◦           ×           ×           ◦
Unlinkability                      ×                 ×           ◦           ◦           ×           ×           ◦
No two-party computation           ◦                 ×           ×           ◦           ◦           ◦           ◦
This paper is organized as follows. In Section 2 we briefly describe the Memon and Wong scheme and the verifiable encryption schemes needed to construct the identification protocol with the judge's verification. In Section 3 we construct our scheme. The security of our scheme is analysed in Section 4. Finally, we conclude in Section 5.
2 Preliminaries
2.1 Memon-Wong’s Scheme
Memon and Wong's scheme is based on a private watermarking scheme and a public key encryption scheme with a homomorphic property, defined as follows.
Definition 1. A private watermarking scheme is a triple (WG, Emb, Det) of algorithms satisfying the following conditions.
(1) Watermark generation algorithm: an algorithm WG which, on input 1^k (for a security parameter k), produces a watermark W.
(2) Embedding algorithm: an algorithm Emb which takes as inputs a watermark W from the range of WG(1^k) and the original content X, and produces as output a watermarked content Y.
(3) Detecting algorithm: an algorithm Det which takes as inputs a distorted content Y and the original content X and produces as output a watermark W.
Definition 2. A public key encryption function E : G → R defined on a group (G, ·) is said to be homomorphic if E forms a (group) homomorphism; that is, given E(x) and E(y) for some unknown x, y ∈ G, anyone can compute E(x · y) without any need for the private key.
Somewhat surprisingly, the homomorphic property has a wide range of applications, including secure voting protocols, multi-party computation and signature schemes. For instance, the RSA cryptosystem has the property that E(x) · E(y) = E(x · y). Besides RSA, several other homomorphic cryptosystems, such as ElGamal and Paillier, are currently known. Memon and Wong were the first to apply a homomorphic encryption system to a watermarking scheme. The MW scheme works as follows.
A buyer registers at the watermark certificate center (W) and obtains a valid watermark W = {w1, w2, . . . , wm} encrypted with the buyer's public key pkB, E_{pkB}(W) = {E_{pkB}(w1), E_{pkB}(w2), . . . , E_{pkB}(wm)}, and the center's signature sign_{skW}(E_{pkB}(W)), which ensures that the issued watermark is valid. Here the encryption algorithm E is homomorphic. Next the buyer sends pkB, E_{pkB}(W) and sign_{skW}(E_{pkB}(W)) to the seller to obtain watermarked content. By verifying the signature with the center's public key, the seller is convinced of the watermark's validity. If the verification holds, the seller generates and embeds a unique watermark V into the multimedia content X. Let X′ be the content watermarked with V. When an unauthorized copy is generated from X′, this unique
watermark V is used to identify the original buyer of X′. To embed the second watermark W into X′ without decrypting E_{pkB}(W), the seller encrypts the watermarked content X′ with pkB and picks a permutation σ satisfying σ(E_{pkB}(W)) = E_{pkB}(σ(W)) = {E_{pkB}(w_{σ(1)}), . . . , E_{pkB}(w_{σ(m)})}. Because of the homomorphic property of the encryption algorithm E used by the watermark certificate center, the seller can compute the doubly watermarked content E_{pkB}(X′′) as follows:
E_{pkB}(X′′) = E_{pkB}(X′) ⊕ σ(E_{pkB}(W))
= E_{pkB}(X′) ⊕ E_{pkB}(σ(W))
= {E_{pkB}(x′1), . . . , E_{pkB}(x′n)} ⊕ {E_{pkB}(w_{σ(1)}), . . . , E_{pkB}(w_{σ(m)})}
= {E_{pkB}(x′1 ⊕ w_{σ(1)}), . . . , E_{pkB}(x′m ⊕ w_{σ(m)}), E_{pkB}(x′m+1), . . . , E_{pkB}(x′n)}
= E_{pkB}(X′ ⊕ σ(W)), n ≥ m,
where ⊕ denotes the embedding operation. The seller transmits the computed E_{pkB}(X′′) to the buyer. The seller cannot obtain the watermarked content X′′ without the buyer's decryption key skB; only the buyer can obtain it. When the seller discovers an illegal copy Y, he extracts the unique watermark U from Y using the detection algorithm Det. He then finds the buyer's ID stored with V by examining the correlations between the extracted watermark U and all the V's in the seller's table. If the identified buyer denies that the illegal copy Y originated from his content, the seller reveals σ, E_{pkB}(W), and sign_{skW}(E_{pkB}(W)) to the judge. The judge verifies sign_{skW}(E_{pkB}(W)) and then asks the identified buyer for his private key or W. The judge verifies W by encrypting it with the buyer's public key pkB and checking whether it equals the E_{pkB}(W) revealed by the seller. After verifying W, the judge checks for the presence of σ(W) in Y using the detection algorithm Det. The buyer is guilty if it is present; otherwise he is innocent.
The MW scheme does not provide buyer anonymity, since the public key pkB and the identity of the buyer must be available in the process of generating the watermarked content. In addition, unlinkability of the contents purchased by a buyer is not satisfied, because the seller knows all contents purchased with the same public key. Furthermore, the seller cannot insist that an illegal copy Y originated from the identified buyer using only the result of the copyright-violator identification protocol, because the seller could accuse an honest buyer with a copy Y generated by himself from X and the V stored in his table. This framing attack can only be prevented by the dispute resolution protocol; owing to this possibility of false accusation, the dispute resolution protocol must be executed in the MW scheme. It also inconveniences the buyer to participate in the dispute resolution protocol to prove his innocence whenever he is accused.
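The embedding-under-encryption step can be mimicked with Paillier's scheme, one of the homomorphic cryptosystems mentioned above, where E(x) · E(w) = E(x + w), so '+' plays the role of the embedding operation ⊕. The following is a toy sketch with insecure fixed small primes, not the exact instantiation used by Memon and Wong.

```python
# Toy Paillier embedding: the seller multiplies ciphertexts and thereby
# adds the (permuted) watermark into the content without decrypting it.
import math
import random

p, q = 2_147_483_647, 2_147_483_629      # two primes near 2^31 (insecure)
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)

def enc(m):
    r = random.randrange(2, n)
    return pow(g, m, n2) * pow(r, n, n2) % n2

def dec(c):
    return (pow(c, lam, n2) - 1) // n * pow(lam, -1, n) % n

content   = [120, 55, 201]               # coefficients of X' (toy values)
watermark = [3, 1, 4]                    # sigma(W) (toy values)

enc_marked = [enc(x) * enc(w) % n2 for x, w in zip(content, watermark)]
assert [dec(c) for c in enc_marked] == [x + w
                                        for x, w in zip(content, watermark)]
```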
2.2  Verifiable Encryption
In this subsection we introduce a cryptographic primitive, the verifiable encryption scheme, and explain briefly how an anonymous private key sk*_B of a buyer is escrowed under the judge's public key pk_J. We denote the buyer by B, the seller by S, and the judge by J. A verifiable encryption scheme can be informally described as follows. First, B encrypts a message sk*_B with J's public key and generates a certificate proving that sk*_B is a secret (e.g., a discrete logarithm or an e-th root) of a public value pk*_B, without disclosing sk*_B. After B transmits the ciphertext c of the message sk*_B and the certificate to S, S verifies the certificate. If the verification holds, S is convinced that the ciphertext is indeed an encryption of sk*_B. In other words, if J later decrypts the ciphertext, the output must be the secret of the given value. Verifiable encryption schemes are employed in many cryptographic protocols; examples are fair exchange of digital signatures [1], escrow schemes [19], and verifiable secret sharing schemes [2, 22].

Note: if the ElGamal encryption scheme is used in a buyer-seller watermarking protocol [14], we can apply the verifiable encryption schemes of [2, 22] to our scheme, whereas if the RSA algorithm is used [15], we can use the generalized verifiable encryption schemes of [19, 1, 6]. In this paper we assume that there exists a secure verifiable encryption scheme, and we denote the ciphertext as C = E_{pk_J}(sk*_B) and the certificate as cert.
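To fix ideas, here is a minimal sketch of the escrow half of this primitive, assuming ElGamal over Z*_p with toy parameters of our own. The zero-knowledge certificate cert, which is what lets S check the ciphertext without decrypting it, is deliberately omitted; only the escrow and the judge's decrypt-and-check step are shown:

```python
import secrets

p = 2**127 - 1                     # toy prime modulus (illustration only)
g = 3

sk_J = secrets.randbelow(p - 2) + 1          # judge's key pair
pk_J = pow(g, sk_J, p)

sk_B = secrets.randbelow(p - 2) + 1          # buyer's anonymous key pair
pk_B = pow(g, sk_B, p)                       # pk*_B = g^{sk*_B} mod p

# Escrow: ElGamal-encrypt sk*_B (encoded as an element of Z_p*) under pk_J.
r = secrets.randbelow(p - 2) + 1
C = (pow(g, r, p), sk_B * pow(pk_J, r, p) % p)

# Judge-side recovery, used later in the identification protocol:
c1, c2 = C
recovered = c2 * pow(c1, p - 1 - sk_J, p) % p   # c2 / c1^{sk_J} mod p
assert recovered == sk_B and pow(g, recovered, p) == pk_B
```

What cert adds on top of this sketch is a proof, checkable by S without sk_J, that the plaintext hidden in C is exactly the discrete logarithm of pk*_B; constructions are given in [2, 22] and, for RSA-type relations, in [19, 1, 6].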
3  The Proposed Scheme
In this section we describe the proposed anonymous buyer-seller watermarking protocol. Our protocol has three subprotocols: watermark generation, watermark insertion, and identification. Throughout the scheme we assume that every participant has a pair of a private key and a public key (sk, pk) certified by a CA (Certificate Authority). We also assume that there is a trusted watermark certificate center that generates random watermarks and issues them to buyers upon request.
3.1  Watermark Generation Protocol
This protocol, shown in Fig. 1, is performed between the watermark certificate center W and a buyer B, who is supposed to register at the watermark certificate center, as follows.

1. B generates an anonymous key pair of a private key and a public key (sk*_B, pk*_B). Using a verifiable encryption scheme E, B generates C = E_{pk_J}(sk*_B) and cert proving that sk*_B is a discrete logarithm or an e-th root of the given pk*_B, without disclosing sk*_B. This step can be precomputed.
2. B sends pk*_B, sign_{sk_B}(pk*_B), C, and cert to W and requests a valid watermark.
B: generate (sk*_B, pk*_B); sign = sign_{sk_B}(pk*_B); generate (C, cert), where C = E_{pk_J}(sk*_B)
B → W: pk*_B, sign, C, cert
W: verify sign using pk_B; verify cert; generate W; w = E_{pk*_B}(W); s = sign_{sk_W}(w || pk*_B); Table_W ← (B, w, s, pk*_B, sign, C, cert)
W → B: w, s
B: verify s using pk_W

Fig. 1. Watermark Generation Protocol

3. W verifies the buyer's anonymous public key pk*_B against the buyer's public key pk_B certified by the CA, and then verifies cert. If both verifications hold, W generates a random watermark W = {w_1, w_2, ..., w_m}.
4. W sends to the buyer the anonymous public key pk*_B and the watermark encrypted with the buyer's anonymous public key, w = E_{pk*_B}(W), along with s = sign_{sk_W}(E_{pk*_B}(W) || pk*_B), which certifies the validity of the watermark and also ensures that pk*_B was used as the public key to encrypt W. W stores B, w, s, pk*_B, sign_{sk_B}(pk*_B), and (C, cert) in Table_W. Here || denotes concatenation and E_{pk*_B}(W) = {E_{pk*_B}(w_1), E_{pk*_B}(w_2), ..., E_{pk*_B}(w_m)}, as in the MW scheme.
5. B verifies s with W's public key pk_W.
3.2  Watermark Insertion Protocol
This is an interactive protocol, shown in Fig. 2, between the seller S and a buyer B who wants to purchase a watermarked content. It depends on the underlying watermarking and homomorphic encryption techniques and proceeds as follows.

1. B sends pk*_B, w = E_{pk*_B}(W), and s = sign_{sk_W}(E_{pk*_B}(W) || pk*_B) to S.
2. S verifies s with W's public key pk_W. If the verification holds, the protocol proceeds to the next step; otherwise it halts.
3. Let X denote the original content that the buyer wants to purchase. S generates a unique watermark V for identifying this specific buyer if an illegal copy is found later, and inserts it into the original content X to get the watermarked content X′.
B → S: pk*_B, w, s
S: verify s using pk_W; X: original content; generate V; X′ = X ⊕ V; choose σ; E_{pk*_B}(X″) = E_{pk*_B}(X′ ⊕ σ(W)); Table_S ← (pk*_B, w, σ, V, s)
S → B: E_{pk*_B}(X″)

Fig. 2. Watermark Insertion Protocol

4. S generates a random permutation σ, which is used to permute the elements of the encrypted watermark E_{pk*_B}(W). S computes σ(E_{pk*_B}(W)) = E_{pk*_B}(σ(W)).
5. S embeds into the watermarked content X′ the second watermark, namely the permuted watermark σ(W) obtained in the previous step. Although the watermark received from B is encrypted with B's anonymous public key pk*_B, the seller can embed this second watermark without decrypting E_{pk*_B}(σ(W)), as follows:

\[
\begin{aligned}
E_{pk^*_B}(X'') &= E_{pk^*_B}(X') \oplus \sigma(E_{pk^*_B}(W)) = E_{pk^*_B}(X') \oplus E_{pk^*_B}(\sigma(W)) \\
&= \{E_{pk^*_B}(x'_1), \ldots, E_{pk^*_B}(x'_n)\} \oplus \{E_{pk^*_B}(w_{\sigma(1)}), \ldots, E_{pk^*_B}(w_{\sigma(m)})\} \\
&= \{E_{pk^*_B}(x'_1 \oplus w_{\sigma(1)}), \ldots, E_{pk^*_B}(x'_m \oplus w_{\sigma(m)}), E_{pk^*_B}(x'_{m+1}), \ldots, E_{pk^*_B}(x'_n)\} \\
&= E_{pk^*_B}(X' \oplus \sigma(W)), \qquad n \ge m.
\end{aligned}
\]

6. S transmits E_{pk*_B}(X″) to B and stores pk*_B, w, s, σ, and V in Table_S.
7. B decrypts the encrypted content E_{pk*_B}(X″) and obtains the watermarked content X″.

The above insertion method is asymmetric, as in [15, 16, 17]. The seller does not learn the watermarked content X″, because the decryption key sk*_B is known only to the buyer; only the buyer can obtain the watermarked content. And because the buyer does not know σ, he cannot remove σ(W) from X″ even though he knows W.
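A sketch of the seller's side of steps 4-5, assuming (as in the toy check of Section 2.1) that the embedding operation ⊕ is realized as componentwise modular multiplication so that it commutes with a multiplicatively homomorphic E such as textbook RSA; keys, sizes, and the content encoding are illustrative values of our own:

```python
import random

p, q = 1009, 1013                       # toy RSA parameters (illustration only)
N = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))
E = lambda m: pow(m, e, N)              # operation under the buyer's anonymous key
D = lambda c: pow(c, d, N)              # known to the buyer only

m, n = 4, 8
W  = [random.randrange(2, N) for _ in range(m)]   # watermark issued by the center
X1 = [random.randrange(2, N) for _ in range(n)]   # X' = content already carrying V

enc_W  = [E(w) for w in W]              # what the seller actually receives
enc_X1 = [E(x) for x in X1]             # seller encrypts X' under pk*_B

sigma = list(range(m)); random.shuffle(sigma)     # seller's secret permutation
perm  = [enc_W[s] for s in sigma]                 # sigma(E(W)) = E(sigma(W))

# Blind embedding: E(x_i) * E(w_sigma(i)) mod N = E(x_i * w_sigma(i) mod N)
enc_X2 = [c * w % N for c, w in zip(enc_X1, perm)] + enc_X1[m:]

# Only the buyer can open the result, X'' = X' (+) sigma(W):
assert [D(c) for c in enc_X2[:m]] == [x * W[s] % N for x, s in zip(X1, sigma)]
```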
3.3  Copyright Violator Identification
This is a three-party protocol executed by the seller S, the judge J, and the watermark certificate center W, to identify an illegal buyer from an unauthorized copy and then prove the buyer's crime to a third party. The protocol is depicted in Fig. 3 and proceeds as follows.

S: found Y; U ← Det(X, Y)
S → J: X, Y, pk*_B, w, s, σ, V
J: verify s using pk_W
J → W: pk*_B
W → J: B, sign, C, cert
J: verify sign using pk_B; verify cert; sk*_B = D_{sk_J}(C); W = D_{sk*_B}(E_{pk*_B}(W)); σ(W)′ ← Det(X, Y); check σ(W)′ =? σ(W)
J → S: B

Fig. 3. Copyright Violator Identification

1. When an illegal copy Y of an original content X is discovered, S extracts the watermark U from Y. This is done by the detection algorithm Det, which extracts a watermark and depends on the underlying watermarking algorithm.
2. For robust watermarks, S computes the correlations of the extracted watermark U with every watermark stored in Table_S, finds the V with the highest correlation, and obtains the transaction information involving V from the table. The information consists of pk*_B, w = E_{pk*_B}(W), s = sign_{sk_W}(E_{pk*_B}(W) || pk*_B), σ, and V. S sends pk*_B, w, s, σ, V, X, and Y to the judge J.
3. J verifies s with the center's public key pk_W. If the verification holds, J performs the next step; otherwise the protocol halts.
4. J sends pk*_B to W. W sends B, (C, cert), and sign_{sk_B}(pk*_B) back to J. Using sign_{sk_B}(pk*_B) and B's certified pk_B, J can verify that pk*_B belongs to the accused buyer B.
5. J verifies sign_{sk_B}(pk*_B) and cert. If the verification holds, J performs the next step; otherwise the protocol halts.
6. J recovers B's anonymous private key sk*_B from C with his private key sk_J. The key sk*_B enables J to decrypt E_{pk*_B}(W). If the verification fails, or the decrypted data W is not equal to the watermark issued by W, the protocol halts. Otherwise, J computes σ(W) and checks for the presence of σ(W) in Y by extracting the watermark from Y and estimating its correlation with σ(W). If σ(W) is present, B is guilty and B's ID is revealed to the seller; otherwise B is innocent and the protocol halts.
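The correlation test that Det is assumed to perform in steps 1-2 and 6 can be sketched as follows for an additive spread-spectrum watermark; the signal model, watermark strength, and decision rule are illustrative choices of our own, not part of the protocol:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4096
X = rng.normal(size=n)                          # original content features
table = {b: rng.choice([-1.0, 1.0], size=n)     # seller's table of V's
         for b in ("tx1", "tx2", "tx3")}

# An illegal copy derived from the content sold in transaction tx2:
Y = X + 0.1 * table["tx2"] + rng.normal(scale=0.05, size=n)

U = Y - X                                       # Det(X, Y): extract the residual

def corr(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

scores = {b: corr(U, V) for b, V in table.items()}
print(max(scores, key=scores.get), scores)      # tx2 has the highest correlation
```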
4  Security Analysis
The security of the proposed scheme is analyzed in this section. We assume that all of the underlying primitives are secure; the seller's and the buyer's security rely on the security of the underlying watermarking algorithm and homomorphic encryption system.

Anonymity and Unlinkability. Both anonymity and unlinkability are provided by using a pair of anonymous private and public keys for a buyer. Nothing about the purchase behavior of honest buyers becomes known to the seller, except in the case that the watermark certificate center cooperates with the seller. We assume that the watermark certificate center is a TTP and does not collude with a seller or a buyer. Our scheme executes a one-time watermark generation protocol, using a fresh anonymous key pair, whenever the buyer purchases a content. This implies that the buyer's purchases are unlinkable.

Security for the Seller. Due to the properties of the underlying encryption and digital signature techniques, we can assume that a malicious buyer cannot change or substitute a watermark generated by the trusted watermark certificate center. Furthermore, the use of a time stamp, along with information about the transaction, in the watermark generation protocol, as in the MW scheme, would prevent a malicious buyer from replacing the watermark with an older one he may previously have obtained from the watermark certificate center. The security of the seller in the watermark insertion protocol is the same as in the MW scheme. The seller should insert the watermarks V and σ(W) correctly in his own interest: if he does not, he will not be able to identify the original buyer of an illegal copy. Furthermore, the detecting function of the watermarking scheme must guarantee that the seller can extract the unique watermark V that belongs to a copyright violator; this relies on the security of the underlying watermarking algorithm. Our identification protocol correctly identifies the copyright violator by decrypting the accused buyer's anonymous secret key, which depends on the secure verifiable encryption scheme, and by detecting σ(W) in the illegal copy Y, which relies on the watermarking algorithm.

Security for the Buyer. Honest buyers are never found guilty in the identification protocol. The real ID of an honest buyer can be revealed to the seller in the identification protocol only if the seller can produce a copy Y into which the specific watermark W is embedded. To forge such a Y, the seller would have to know the buyer's anonymous private key sk*_B in order to decrypt E_{pk*_B}(W). Thus an honest buyer cannot be falsely incriminated by the seller, even though the seller knows all of the buyer's other information (except sk*_B).
5  Conclusion
To protect both the seller's and the buyer's rights, Memon and Wong proposed a "buyer-seller watermarking protocol" in [15]. In their scheme, however, to obtain watermarked
contents, a buyer must disclose to the seller important private information, such as his/her public key, identity, and purchase information. To solve this problem we proposed an anonymous buyer-seller watermarking protocol in which the buyer can purchase contents anonymously while the seller can still identify a copyright violator. The proposed scheme also provides unlinkability of the contents purchased by one buyer. We note that the watermark certificate center and the judge may be integrated into one party. In Memon and Wong's scheme, the buyer must participate in the dispute resolution protocol and reveal his secret information to prove his innocence whenever he is accused by the seller. We have solved this problem by applying a key-escrow method, based on a verifiable encryption scheme, to Memon and Wong's scheme.
References

[1] N. Asokan, V. Shoup, and M. Waidner, Optimistic fair exchange of digital signatures, IEEE Journal on Selected Areas in Communications, 18(4), pp. 591–610, Apr. 2000.
[2] F. Bao, An efficient verifiable encryption scheme for encryption of discrete logarithms, CARDIS'98, LNCS 1820, 2000.
[3] S. Brands, Untraceable off-line cash in wallets with observers, Advances in Cryptology: Proc. of CRYPTO'93, pp. 302–318, 1994.
[4] G. R. Blakley, C. Meadows, and G. B. Purdy, Fingerprinting long forgiving messages, Crypto'85, LNCS 218, Springer-Verlag, Berlin, 1986, pp. 180–189.
[5] D. Boneh and J. Shaw, Collusion-secure fingerprinting for digital data, Crypto'95, LNCS 963, pp. 452–465, Springer-Verlag, Berlin, 1995.
[6] J. Camenisch and I. Damgard, Verifiable encryption and applications to group signatures and signature sharing, Technical Report RS 98-32, BRICS, Department of Computer Science, University of Aarhus, Dec. 1998.
[7] D. Chaum, A. Fiat, and M. Naor, Untraceable electronic cash, Advances in Cryptology - CRYPTO'88, pp. 319–327, 1990.
[8] I. J. Cox, J. Kilian, T. Leighton, and T. Shamoon, Secure spread spectrum watermarking for images, audio and video, IEEE Trans. on Image Processing, vol. 6, no. 12, pp. 1673–1687, 1997.
[9] S. Craver, N. Memon, B. L. Yeo, and M. M. Yeung, Can invisible watermarks resolve rightful ownership?, IBM Research Report RC 20509, July 25, 1996.
[10] S. Craver, N. Memon, B. L. Yeo, and M. M. Yeung, Resolving rightful ownership with invisible watermarking techniques: limitations, attacks and implications, IBM Research Report RC 20755, March 1997.
[11] S. Craver, Zero knowledge watermark detection, Proceedings of the Third International Workshop on Information Hiding, LNCS 1768, pp. 101–116, 2000.
[12] J. Kilian, F. T. Leighton, L. R. Matheson, T. G. Shamoon, R. E. Tarjan, and F. Zane, Resistance of digital watermarks to collusive attacks, Proceedings of the IEEE International Symposium on Information Theory, 1998.
[13] A. Lysyanskaya, R. L. Rivest, A. Sahai, and S. Wolf, Pseudonym systems, http://theory.lcs.mit.edu/~anna/lrsw99.ps, 1999.
[14] N. Memon and P. W. Wong, A buyer-seller watermarking protocol based on amplitude modulation and the ElGamal public key cryptosystem, Proceedings of SPIE '99, vol. 3657.
[15] N. Memon and P. W. Wong, A buyer-seller watermarking protocol, IEEE Transactions on Image Processing, vol. 10, no. 4, April 2001.
[16] B. Pfitzmann and M. Schunter, Asymmetric fingerprinting, Eurocrypt'96, LNCS 1070, pp. 84–95, Springer-Verlag, Berlin, 1996.
[17] B. Pfitzmann and M. Waidner, Anonymous fingerprinting, Eurocrypt'97, LNCS 1233, Springer-Verlag, Berlin, 1997, pp. 88–102.
[18] B. Pfitzmann and A.-R. Sadeghi, Coin-based anonymous fingerprinting, Eurocrypt'99, LNCS 1592, Springer-Verlag, Berlin, 1999, pp. 150–164.
[19] G. Poupard and J. Stern, Fair encryption of RSA keys, Eurocrypt 2000, LNCS, pp. 173–190, Springer-Verlag, 2000.
[20] L. Qian and K. Nahrstedt, Watermarking schemes and protocols for protecting rightful ownership and customer's rights, Academic Press Journal of Visual Communication and Image Representation, 1998.
[21] M. Ramkumar and A. N. Akansu, Image watermarks and counterfeit attacks: some problems and solutions, Content Security and Data Hiding in Digital Media, Newark, NJ, May 14, 1999.
[22] M. Stadler, Publicly verifiable secret sharing, Eurocrypt'96, LNCS 1070, pp. 191–199, Springer-Verlag, 1996.
Speeding Up Secure Sessions Establishment on the Internet

Yaron Sella

The Hebrew University of Jerusalem
School of Computer Science and Engineering
Givat Ram, Jerusalem, Israel
[email protected]
Abstract. We propose a method for speeding up the establishment of secure sessions between clients and servers on the Internet, applicable to both RSA and DH. In the case of RSA, the method effectively offloads computational work from a heavily loaded server to its clients. In the case of DH, the improved performance is obtained at the price of extended certificates. Our method is built upon a scheme called simultaneous multiple exponentiation, and essentially splits the work of simultaneous multiple exponentiation between two entities. The challenge is to do so without leaking bits of the secret exponent, while still improving performance. We prove that these two goals can be achieved simultaneously.
1  Introduction
The scenario of a client and a server that need to agree on secret keys is very common on the Internet; for example, a web server establishes a secure session with a browser using the SSL protocol [8]. Protocols that standardize this activity (e.g., IPSEC [13], SSL [8]) often use RSA [24] encryption/decryption or Diffie-Hellman (DH) key agreement [9], which require modular exponentiation, a computationally expensive operation. In this paper we explore a method that can speed this process up. The idea is very simple: we use a time/space (i.e., computation/communication) trade-off offered by a scheme known as simultaneous multiple exponentiation ([18], §14.6) in order to save time.

When RSA is used, the client encrypts a random key, sends it to the server, and the server decrypts. We propose to off-load computational work from the server to the clients, i.e., the clients perform extra computations on the server's behalf and send some of their results to the server. Sending these values requires more communication, but the server can use them to compute its part more efficiently. This approach is very sensible for heavily loaded servers for the following reasons.

1. The computational resources available to clients are constantly increasing.
2. The growth in the number of connected PCs means that the number of clients that can address a single server simultaneously is huge.
When DH is used, we assume that the DH public key of at least one party, the client or the server, resides in a certificate that was signed by a trusted third party, e.g., a certificate authority (CA); otherwise, DH is vulnerable to a man-in-the-middle attack. The communication/computation trade-off manifests itself in the form of somewhat inflated certificates. That is, the certificate includes, in addition to the DH public key, some pre-computed values. The CA verifies their correctness and signs them as part of the certificate. The recipient of the certificate uses these values in order to compute its part more efficiently. Obviously, the speedup is greater when both parties have extended certificates.

Simultaneous multiple exponentiation was first introduced by Lim and Lee in [14]. It allowed them to achieve a more flexible time/space trade-off for fixed-base exponentiation than a scheme previously proposed in [6]. Lim and Lee also showed how to use it in order to speed up the verification of several signature schemes (e.g., Schnorr [25], DSS [21]). However, they did not address the security implications that might arise when two parties that do not trust each other perform different parts of a simultaneous multiple exponentiation. Previous work also did not consider further optimizations that can be applied to simultaneous multiple exponentiation, and the preferred setting of its parameters, which in turn can lead to improved performance. Our contributions in the current paper are the following.

1. We propose simultaneous multiple exponentiation as a tool for speeding up key agreement between servers and clients on the Internet.
2. We identify three attacks that apply when simultaneous multiple exponentiation is used by two different parties, Alice and Bob. The goal of the attacks is to expose the secret exponent of one of the parties, say Alice. Our attacks are quite powerful, in the sense that Alice cannot prevent them even if she uses means of protection against chosen ciphertext attacks, like OAEP [3] as specified in the PKCS #1 standard [12] for RSA. Furthermore, in the case of RSA, Alice can implicitly verify that the cleartext is correct by checking that its encryption gives the ciphertext, but that too does not prevent our attacks.
3. We conclude that safe usage of simultaneous multiple exponentiation between two parties requires explicit verification of the values that one party sends to the other, and show how this can be done efficiently and securely for both RSA and DH.
4. We examine a specific phase within simultaneous multiple exponentiation, and prove that finding the optimal computation for it is an NP-hard problem. We develop an approximated bounding method, and use it to estimate the average cost of this phase. This allows us to estimate the overall computational savings of the scheme, to compare different settings of the scheme's parameters, and to choose the best one.

The rest of this paper is organized as follows. Section 2 briefly reviews RSA, DH, and simultaneous multiple exponentiation. The proposed scheme and its security are discussed in Section 3. Section 4 motivates the safe and careful usage of simultaneous multiple exponentiation in our scheme by presenting several
attacks. The goal of Sections 5 and 6 is to evaluate the speedup that can be obtained by using our proposed method. Section 5 analyzes the cost of a specific phase within simultaneous multiple exponentiation. Section 6 discusses the selection of parameters that leads to optimized performance of the proposed scheme. Finally, Section 7 contains some concluding remarks.
1.1  Related Work
The idea of off-loading time-consuming RSA computation from a weak device to a powerful server, without exposing information about the secret exponent, has been studied extensively under the title server-aided RSA. The first protocols, RSA-S1 and RSA-S2, were invented by Matsumoto et al. [17]. Their basic idea was to hide the necessary computations by requesting additional unnecessary ones. These protocols were attacked in [23]; an attempt to improve them was made in [16], but the improved versions, RSA-S1M and RSA-S2M, were attacked again in [15]. Protocols proposed more recently [1, 11] were discovered to be vulnerable to lattice reduction attacks [22, 19]. The attacks are classified as active or passive, depending on whether or not the server deviates from the protocol. The challenge seems to be to design server-aided RSA protocols that can resist both types of attacks; it is currently unknown whether such protocols exist. In this paper we take a more cautious approach. Our method does not transfer as much workload as previous server-aided RSA protocols, but it is provably secure. Another reason why we decided to take a different approach is that all the existing server-aided RSA protocols require adding at least one round of communication between the device and the server. This property is unacceptable in an Internet scenario. Our proposed method does not require additional communication rounds, although it does extend the messages being sent.
2  Preliminaries
This section includes definitions and notation that are used throughout the paper, as well as a brief review of RSA, DH, and simultaneous multiple exponentiation.

We start by recalling the RSA cryptosystem. Let N = pq be the product of two large primes. We use the notation |N| = n bits to denote that there are n bits in the number N. Let |p| = |q| = n/2 bits. Let e, d be two integers satisfying ed = 1 mod φ(N), where φ(N) = (p − 1)(q − 1). We call N the RSA modulus, e the encryption exponent, and d the decryption exponent. The pair ⟨N, e⟩ is the public key; the pair ⟨N, d⟩ is the private key. A message is an integer M ∈ Z*_N. To encrypt M one computes C = M^e mod N. To decrypt, the receiver computes C^d mod N, and indeed C^d = M^{ed} = M (mod N). The security relies on the RSA assumption and on the presumed intractability of factoring. Standard practice in RSA decryption is to speed up the computation by exponentiating separately modulo p and modulo q. One first computes M_p = C^{d mod (p−1)} mod p and M_q = C^{d mod (q−1)} mod q. Then one uses the Chinese
Remainder Theorem (CRT) to find the unique value M ∈ Z_N satisfying M = M_p mod p and M = M_q mod q. Exponentiation with CRT is approximately four times faster than regular exponentiation.

DH can be applied in various finite cyclic groups. Our presentation relates to Z*_p in order to keep it simpler and more concrete. At the setup phase, an appropriate prime p and a generator g ∈ Z*_p are fixed as global parameters. Then every participant chooses a random secret x, 0 < x < p − 1, and publishes y = g^x mod p as its public key. Typically, users keep their public keys in the form of a signed certificate. Suppose the public keys of Alice and Bob are y_a = g^{x_a} mod p and y_b = g^{x_b} mod p, respectively. When Alice and Bob meet, each one first obtains the public key of her/his partner. The common key, which both of them can compute, is g^{x_a x_b} mod p = y_a^{x_b} mod p = y_b^{x_a} mod p. The security relies on the Decision Diffie-Hellman assumption (DDH) and on the presumed intractability of computing discrete logarithms (DL). (Currently, there is no proof that the Computational Diffie-Hellman assumption (CDH) implies that the result of the DH protocol contains a large enough number of secret bits [4]; hence, we rely on DDH and not on CDH.)

Simultaneous multiple exponentiation is a method for calculating a^u b^v mod N without first evaluating a^u mod N and b^v mod N. We use it here for performing a single modular exponentiation more efficiently. We describe it as a procedure called SME(B, E, N, k, ℓ) that receives as input parameters a base B, an exponent E, a modulus N, and k, ℓ, the dimensions of an array (see more details below). Letting m = |E|, i.e., the number of bits in E, we require that k be in the range [1, m], and that ℓ be equal to ⌈m/k⌉. SME(B, E, N, k, ℓ) computes B^E mod N in three phases as follows.

1. SME1(B, N, k, ℓ) (Successive squaring): Compute and output

\[
B_i = B^{2^{(i-1)\ell}} \bmod N, \qquad i = 1..k. \tag{1}
\]
2. SME2(B_1, ..., B_k, E, N, k, ℓ) (Preparation): Arrange the exponent E in a bit array of k × ℓ (called the exponent-array). The array's elements are denoted E_{ij}, and its rows E_1, E_2, ..., E_k:

\[
\begin{bmatrix} E_1 \\ E_2 \\ \vdots \\ E_k \end{bmatrix}
=
\begin{bmatrix}
E_{1\ell} & \ldots & E_{12} & E_{11} \\
E_{2\ell} & \ldots & E_{22} & E_{21} \\
\vdots & \ddots & \vdots & \vdots \\
E_{k\ell} & \ldots & E_{k2} & E_{k1}
\end{bmatrix} \tag{2}
\]

The first row of the array, E_1, contains the least significant bits of E, with the LSB in E_{11} and the MSB in E_{1ℓ}. The second row, E_2, contains the next bits of E, and so on. If m mod k ≠ 0, then the last row, E_k, is padded with leading 0's. Compute and output, using the B_i's, which are the output of SME1(),

\[
P[j] = \Big(\prod_{E_{ij}=1} B_i\Big) \bmod N, \qquad j = 1..\ell. \tag{3}
\]
That is, the modular products P[ℓ], ..., P[1] are computed, one per column of the exponent-array, where a product P[j] is a modular multiplication of all the B_i's for which the corresponding bit E_{ij} = 1 (if E_{ij} = 0 for all i, then P[j] = 1). Later, in Section 5, we discuss how to compute all the P[j]'s efficiently, and prove that finding the minimal-cost computation is NP-hard.

3. SME3(P[1], ..., P[ℓ], N, ℓ) (Actual exponentiation): Compute and output R (= B^E mod N) using the following square-and-multiply algorithm (the P[j]'s are the output of SME2()):

R = P[ℓ]
for i = ℓ − 1 downto 1
Begin
    R = R * R mod N
    R = R * P[i] mod N
End

The reason why SME(B, E, N, k, ℓ) correctly computes B^E mod N is formally captured by the following equalities:

\[
B^E \bmod N = \prod_{i=1}^{k} B_i^{E_i} \bmod N
= \prod_{i=1}^{k} B_i^{(E_{i\ell} 2^{\ell-1} + \cdots + E_{i2} 2^{1} + E_{i1} 2^{0})} \bmod N
= \prod_{j=1}^{\ell} \Big(\prod_{E_{ij}=1} B_i\Big)^{2^{j-1}} \bmod N
= \prod_{j=1}^{\ell} P[j]^{2^{j-1}} \bmod N. \tag{4}
\]
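The three phases translate directly into code. The sketch below is our own transcription (plain Python, 0-indexed, so row i holds the bits of E at positions iℓ..iℓ+ℓ−1); it favors clarity over the ModMul-saving product orderings discussed in Section 5:

```python
def sme1(B, N, k, ell):
    """Successive squaring: B_i = B^(2^(i*ell)) mod N for i = 0..k-1."""
    Bs = [B % N]
    for _ in range(k - 1):
        Bs.append(pow(Bs[-1], 1 << ell, N))
    return Bs

def sme2(Bs, E, N, k, ell):
    """Preparation: one product P[j] per column of the exponent-array."""
    P = []
    for j in range(ell):
        prod = 1
        for i in range(k):             # bit j of row i sits at position i*ell + j
            if (E >> (i * ell + j)) & 1:
                prod = prod * Bs[i] % N
        P.append(prod)
    return P

def sme3(P, N, ell):
    """Square-and-multiply over the column products."""
    R = P[ell - 1]
    for j in range(ell - 2, -1, -1):
        R = R * R % N
        R = R * P[j] % N
    return R

def sme(B, E, N, k):
    ell = -(-E.bit_length() // k)      # ceil(m / k)
    return sme3(sme2(sme1(B, N, k, ell), E, N, k, ell), N, ell)

# Sanity check against Python's built-in modular exponentiation:
assert sme(0xC0FFEE, 0xDEADBEEF12345, 2**61 - 1, 4) == pow(0xC0FFEE, 0xDEADBEEF12345, 2**61 - 1)
```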
3  The Proposed Scheme
The proposed method uses SME() to calculate a modular exponentiation such that one party performs SME1() and another performs SME2() and SME3(). We start with a detailed description of the scheme for RSA and DH, and continue with a security analysis.
3.1  RSA
We assume that the server has an RSA public key ⟨N, e⟩ known to the client, and that the factorization of N (= pq) is known to the server. We assume that key agreement with RSA is done in the standard way, i.e., the client chooses a random key, embeds it in a cleartext message M, uses RSA encryption C = M^e mod N, and sends the ciphertext C to the server. The server decrypts M = C^d mod N. The common key is derived from M. It is also assumed that standard means of protection against chosen ciphertext attacks on RSA (i.e., OAEP [3]) are being used. The proposed method transfers workload from the server to its clients as follows. After computing C, the client executes C_1, ..., C_k = SME1(C, N, k, ℓ).
The client sends C_1, ..., C_k to the server (note that C_1 = C). The server verifies the C_i's and executes SME2() and SME3() as described below.

1. Reduction: compute CP_i = C_i mod p for i = 1..k.
2. Verification: if CP_i ≠ CP_{i−1}^{2^ℓ} mod p for any i = 2..k, then abort.
3. Preparation: execute P[1], ..., P[ℓ] = SME2(C_1, ..., C_k, d, N, k, ℓ).
4. Exponentiation: execute and output M = SME3(P[1], ..., P[ℓ], N, ℓ) (the common key is derived from M as usual).
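Steps 1-2 are cheap to express in code; a sketch of the server-side check, assuming the list Cs holds C_1..C_k (the helper name and interface are ours):

```python
def verify_chain(Cs, p, ell):
    """Reduce the C_i's mod p and check CP_i == CP_{i-1}^(2^ell) mod p."""
    CPs = [C % p for C in Cs]
    step = 1 << ell
    return all(pow(CPs[i - 1], step, p) == CPs[i] for i in range(1, len(CPs)))
```

Each link of the chain costs ℓ squarings modulo p, which matches the ℓ(k − 1) ModMuls (mod p) counted for the verification phase in Section 6.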
Figure 1 illustrates how the method works for RSA with the specific parameters m = |d| = 1024, k = 4 (⇒ ℓ = 256). The method can easily be adapted to work with other techniques that reduce the amount of computation done by the server, including decryption with the CRT, rebalanced RSA, multi-prime RSA, and multi-factor RSA (see [5] for a survey of these techniques). For example, when the CRT is used, SME2() and SME3() are executed twice (for d_p = d mod p − 1 and for d_q = d mod q − 1), and the two results are combined using the CRT. Note that the values CQ_i = C_i mod q for i = 1..k need to be computed too.
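For reference, the CRT recombination mentioned here looks as follows in a minimal sketch (toy parameters of our own; in the adapted scheme the two pow calls would be replaced by SME2()/SME3() runs over the CP_i's and CQ_i's):

```python
def crt_decrypt(C, p, q, d):
    """Standard RSA-CRT: exponentiate separately mod p and q, then recombine."""
    dp, dq = d % (p - 1), d % (q - 1)
    Mp, Mq = pow(C % p, dp, p), pow(C % q, dq, q)
    q_inv = pow(q, -1, p)                  # Garner's recombination
    return Mq + q * ((Mp - Mq) * q_inv % p)

p, q = 1009, 1013                          # toy primes (illustration only)
N = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))
M = 424242
assert crt_decrypt(pow(M, e, N), p, q, d) == M
```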
Client:
a. Pick and format M
b. Encryption: C = M^e mod N
c. Successive squaring (SME1): C_1 = C, C_2 = C^{2^256} mod N, C_3 = C^{2^512} mod N, C_4 = C^{2^768} mod N
Client → Server: C_1, C_2, C_3, C_4
Server:
d. Reduction: CP_1 = C_1 mod p, CP_2 = C_2 mod p, CP_3 = C_3 mod p, CP_4 = C_4 mod p
e. Verification: verify that CP_1^{2^256} mod p = CP_2, CP_2^{2^256} mod p = CP_3, and CP_3^{2^256} mod p = CP_4
f. Preparation (SME2): compute C_1C_2 mod N, C_1C_3 mod N, C_1C_4 mod N, C_2C_3 mod N, C_2C_4 mod N, C_3C_4 mod N, C_1C_2C_3 mod N, C_1C_2C_4 mod N, C_1C_3C_4 mod N, C_2C_3C_4 mod N, C_1C_2C_3C_4 mod N (with high probability, d's 4 × 256 exponent-array contains all possible 4-bit vectors in its columns)
g. Exponentiation (SME3): compute M = C^d mod N = C_1^{d_1} C_2^{d_2} C_3^{d_3} C_4^{d_4} mod N using the square-and-multiply algorithm of Section 2 and the results from step f

Fig. 1. RSA example with m = |d| = 1024, k = 4
3.2  DH
In this case, our method does not transfer workload from server to clients, but rather speeds up the calculations of both. We simplify the presentation by describing it only from the server's point of view. Let g, p be the common DH parameters known to all, let the client's public key be y, and let the server's private key be s. We assume that the client has a certificate containing y, signed by a certificate authority (CA). During DH key agreement, the client sends its certificate to the server; the server verifies the CA's signature and computes M = y^s mod p. The common key is derived from M. The proposed method speeds up the calculations as follows. When the client's certificate is generated, the CA computes y_1, ..., y_k = SME1(y, p, k, ℓ), stores y_1, ..., y_k in the certificate (note that y_1 = y), and signs. When the server receives the client's certificate, it verifies the y_i's and executes SME2() and SME3() as described below.

1. Verification: verify the CA's signature on the client's certificate.
2. Preparation: execute P[1], ..., P[ℓ] = SME2(y_1, ..., y_k, s, p, k, ℓ).
3. Exponentiation: execute and output M = SME3(P[1], ..., P[ℓ], p, ℓ) (the common key is derived from M as usual).
3.3  Security Analysis
We argue that our proposed scheme is as secure as the original schemes with respect to two security goals: the secrecy of the agreed-upon session keys, and the secrecy of the private exponents. Our security assumptions for DH are DDH, the intractability of the DL problem, and the existence of a trusted CA that signs DH certificates with unforgeable signatures. Our security assumptions for RSA are the RSA assumption, the intractability of factoring, and that OAEP-RSA is being used (or any other method that prevents chosen ciphertext attacks on RSA). First, it is clear that the values C_1, ..., C_k for RSA, and y_1, ..., y_k for DH, do not expose any new information that an adversary could not obtain by himself. Second, in both cases, the result computed by the server is the same as in the original protocol. Therefore, with regard to the security of the agreed-upon session keys, the modified protocols provide the same level of security as the original ones under the same intractability assumptions.

It remains to verify that the additional input that the server accepts cannot be abused in order to attack the server's private key. In the case of DH, according to the protocol, the CA itself calculated y_1, ..., y_k, and signed them as part of the client's certificate. Since the CA can be trusted, and since its signatures are unforgeable, after the server has verified the CA's signature on the certificate (step 1), it can be certain that y_1, ..., y_k are correct. It follows that abuse of y_1, ..., y_k is not possible in this case. In the case of RSA, the array C_1, ..., C_k that the client sends can either be a valid one, i.e., satisfying C_i = C^{2^{(i−1)ℓ}} mod N, i = 1..k, or an invalid one. If
the array C_1, ..., C_k is valid, it corresponds to some cleartext message M, and the server obtains M. Therefore, in this case, M is as secret as it is with the original scheme, and the use of RSA-OAEP provides immunity against chosen ciphertext attacks. The other option is that the array C_1, ..., C_k is invalid. If it fails the server's verification (step 2), the server aborts, and the attacker learns nothing. We conclude by proving a lemma claiming that if an adversary managed to produce an invalid array that passes the server's verification and RSA-OAEP, then he has factored the RSA modulus.

Lemma 1. Consider the proposed scheme for RSA. Suppose an adversary found an array Q_1, ..., Q_k that violates Q_i = Q^{2^{(i−1)ℓ}} mod N, i = 1..k, but passes the verification phase and RSA-OAEP. Then this adversary has factored the RSA modulus.

Proof. Since RSA-OAEP is used, the adversary must start from a valid cleartext M. It follows that the adversary knows C_1, ..., C_k, the valid array of M. Since the adversary managed to produce Q_1, ..., Q_k, an invalid array that passed the server's verification and decrypts to M, the adversary has at least one pair C_j, Q_j such that C_j ≠ Q_j mod N and C_j = Q_j mod p. The two congruences together imply that C_j ≠ Q_j mod q, so we have that gcd(N, C_j − Q_j) = p.
4  Attacks
This section motivates the extra effort that was taken in order to verify that the values sent by the client are consistent. As we demonstrate below, several attacks can be launched if the consistency of these values is not validated. The attacks use a divide-and-conquer strategy on the rows of the exponent-array. We describe the attacks for RSA, but they work equally well for DH. In fact, RSA presents a more difficult challenge for an adversary, because here the server can protect itself against chosen ciphertext attacks (e.g., by using OAEP), but our attacks are designed to overcome this measure. The only assumption we make is that the adversary can tell whether a secure session has been successfully established. We believe that this assumption is quite realistic for a general adversary, and very realistic if the adversary is the client.
4.1  Attack #1
The first attack is performed by a malicious client, say Bob. According to the protocol, Bob is supposed to pick the cleartext M and compute C = M^e mod N. Then Bob is supposed to compute and send C_i = C^{2^{(i−1)ℓ}} mod N, i = 1..k, to the server. Now, suppose Bob wants to find d_j, the j-th row of the exponent d arranged as an exponent-array. Bob tries to find d_j systematically by guessing all its possible assignments and verifying whether the guess was correct. Suppose Bob guesses d_j = d̃. The attack proceeds as follows. Bob generates a cleartext message M, with OAEP formatting as necessary. Bob computes M̃ = M/α^{d̃} mod N for any α ∈ Z*_N of his choice. Then Bob computes all the C_i's based on M̃ (instead of M). Finally, Bob sends all the C_i's to the server, but instead of C_j he sends αC_j mod N. It is easy to see that if d_j = d̃ (i.e., Bob's guess was correct), then the server's computation gives M̃ · α^{d_j} mod N = M̃ · α^{d̃} mod N = M.

Conclusion: Based on the success/failure of the key establishment protocol, Bob knows when the server obtained M, and can therefore confirm a correct guess of d_j. The complexity of the attack is O(2^ℓ).
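The algebraic core of this attack (not the full message flow) can be checked numerically; the toy parameters and the row-extraction helper below are ours:

```python
# Numeric check of the Attack #1 algebra with toy RSA parameters.
p, q = 1009, 1013
N = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))

k = 4
ell = -(-d.bit_length() // k)
rows = [(d >> (i * ell)) & ((1 << ell) - 1) for i in range(k)]  # rows of d's exponent-array

j, alpha, M = 2, 5, 123456                 # Bob attacks row j with correction factor alpha
for guess in (rows[j], rows[j] + 1):       # a correct and an incorrect guess
    M_t = M * pow(pow(alpha, guess, N), -1, N) % N   # M~ = M / alpha^guess mod N
    # Replacing C_j by alpha*C_j multiplies the server's result by alpha^{d_j}:
    result = M_t * pow(alpha, rows[j], N) % N
    print(guess == rows[j], result == M)   # prints: True True, then False False
```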
4.2  Attack #2
The second attack shows that the server must verify its input even if the clients can be trusted. It is performed by an active eavesdropper, say Eve, who can intercept and modify messages in transit. The attack is useful when k ≥ 3; if k < 3, its complexity is no better than that of exhaustive search. Suppose Eve wants to find d_i and d_j, the i-th and j-th rows of the exponent d arranged as an exponent-array. Eve tries to find d_i and d_j systematically by guessing some of their possible assignments and verifying whether the guess was correct. To keep the presentation simple, let i = 1 and j = 2. Eve attacks by intercepting the C_i's of many clients, modifying them, and waiting until the key establishment protocol succeeds despite the modification. Eve modifies the C_i's of a specific client by changing C_1 to C_1^α and C_2 to C_1^β, for 1 < α, β < 2^ℓ. As we will shortly see, there is another constraint on the pair α, β, which reduces the number of pairs that must be tried. Now suppose that the key establishment protocol succeeded for a specific α, β pair. This means that the server obtained the correct plaintext M. It follows that

\[
C_1^{d_1} C_2^{d_2} = (C_1^{\alpha})^{d_1} (C_1^{\beta})^{d_2} \bmod N
\;\Longrightarrow\;
C_1^{d_1} C_1^{2^{\ell} d_2} = C_1^{\alpha d_1} C_1^{\beta d_2} \bmod N. \tag{5}
\]

This implies the non-modular equation d_1(α − 1) − d_2(2^ℓ − β) = 0. (If the multiplicative order of C_1 is small, the resulting equation might be modular in that order; however, if safe primes are used, the chance that this will happen is negligible, and in any case the attacker can verify its findings using several different M's.) By constraining the search to α, β pairs that satisfy gcd(α − 1, 2^ℓ − β) = 1, Eve knows that d_1 = (2^ℓ − β)g_{12} and d_2 = (α − 1)g_{12}, where g_{12} = gcd(d_1, d_2).

Conclusion: Based on the success/failure of the key establishment protocol, Eve knows when the server obtained M, and can therefore confirm a correct guess of d_i, d_j up to their gcd. The complexity of this attack is O(2^{2ℓ}).
4.3  Attack #3
The third attack exploits the internals of simultaneous multiple exponentiation. It is based on finding relations between rows of the exponent-array, and can be performed by an active eavesdropper, say Eve. As in the previous attacks, Eve multiplies a subset of the C_i's by some correction factors and waits until her modifications do not affect the final result, in which case she learns information about the secret exponent.

We start by considering the simple (but very unlikely) case in which row i of the exponent-array of d contains only 0's. In this case, Eve can modify C_i to αC_i mod N without affecting the success of the key agreement protocol. The probability that some row in a random exponent-array contains only 0's is k(1/2)^ℓ. But there are relations which are more likely to occur. Consider the case in which two rows of the exponent-array of d, i and j, are equal, i.e., d_i = d_j. In this case, Eve can modify C_i to αC_i mod N and C_j to α^{−1}C_j mod N without affecting the success of the key agreement protocol. The probability that any two rows in a random exponent-array are equal is (k choose 2)(1/2)^ℓ.

The attack can be extended to relations other than equality. Consider the case in which row i equals row j shifted by one bit to the right, i.e., d_i = ShiftRight(d_j, 1). In this case, Eve can modify C_i to α²C_i mod N and C_j to α^{−1}C_j mod N without affecting the success of the key agreement protocol. An example that involves three rows is slightly more complicated. Suppose rows h, i, j of the exponent-array of d satisfy the condition ∀r ∈ {1..ℓ}: d_{hr} = d_{ir} + d_{jr} (that is, either d_{hr} = d_{ir} = d_{jr} = 0, or d_{hr} = d_{ir} ⊕ d_{jr} with d_{ir}, d_{jr} not both 1). In this case, Eve can modify C_h to αC_h mod N, C_i to α^{−1}C_i mod N, and C_j to α^{−1}C_j mod N, and despite all these modifications the key agreement protocol succeeds. The probability that any three rows in a random exponent-array satisfy the above condition is (k choose 3)(3/8)^ℓ.

More generally, Eve can choose how many rows to correlate and in what manner. The framework of the general attack is as follows. Eve picks a non-empty subset of the rows of the exponent-array, i.e., rows i_1, ..., i_n (0 < n < k). Eve assigns correction factors α_1, ..., α_n (different from 0 and 1) to C_{i_1}, ..., C_{i_n}. Finally, Eve applies the correction factors using modular multiplication, i.e., for every j ∈ {i_1, ..., i_n} she replaces C_j by α_j C_j mod N. Eve's goal is to find a subset of the rows and an assignment of correction factors such that the final result of the exponentiation is not affected.
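A quick numeric check of the three-row relation (toy values of our own): when d_h = d_i + d_j holds column-wise without carries, the correction factors α, α^{−1}, α^{−1} cancel exactly:

```python
# If d_h = d_i + d_j (bitwise disjoint sum, so no carries), then multiplying
# C_h by alpha and C_i, C_j by alpha^{-1} leaves the server's product unchanged.
N = 1009 * 1013                      # toy modulus
alpha = 5
inv = pow(alpha, -1, N)

d_i, d_j = 0b101000, 0b010011        # disjoint bit patterns
d_h = d_i + d_j                      # equals d_i ^ d_j here
C_h, C_i, C_j = 1234, 5678, 91011    # arbitrary bases

lhs = pow(C_h, d_h, N) * pow(C_i, d_i, N) * pow(C_j, d_j, N) % N
rhs = (pow(alpha * C_h % N, d_h, N) *
       pow(inv * C_i % N, d_i, N) *
       pow(inv * C_j % N, d_j, N)) % N
assert lhs == rhs                    # alpha^{d_h - d_i - d_j} = alpha^0 = 1
```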
5  Preparation Phase Cost Analysis
Clearly, the proposed scheme offers a trade-off between communication and computation. Our goal in the next two sections is to evaluate the speedup that can be gained from using it, both theoretically and practically. We start by analyzing the performance of a specific phase within simultaneous multiple exponentiation. This analysis serves as a basis for optimizing the performance of the entire scheme, which is done in the next section.
In this section, we focus on the computational cost of the preparation phase within simultaneous multiple exponentiation (SME2()). Our cost measurement unit is the required number of modular multiplications (henceforth ModMuls). The number of ModMuls required for the preparation phase is bounded from above by 2^k − k − 1, because this is the number of k-bit vectors with more than one 1; we call these vectors the relevant vectors. However, when ℓ < 2^k log 2^k, it is likely that not all the relevant vectors will be 'hit' by a random exponent-array, so it may be possible to save some work. (If m mod k ≠ 0, the exponent-array cannot be considered truly random, because of the padding of its k-th row with leading 0's. However, the non-randomness that is introduced actually works in favor of optimizing the performance of the preparation phase; hence, in this case, randomness of the entire exponent-array is a worst-case assumption.) Furthermore, it may be possible to reduce the cost by careful ordering of the ModMuls. We use RSA notation to illustrate this point. Suppose the preparation phase requires the computation of C_1C_2C_3 mod N and C_1C_3C_6 mod N. Each product separately requires 2 ModMuls, but if the intermediate result C_1C_3 mod N is computed first (and stored), the two products can be computed with only 3 ModMuls. Informally, one wants to minimize the work required for computing all the products. We start by formalizing the problem of finding the optimal computation for the preparation phase, and continue by showing that this problem is NP-hard. We then propose an approximated bounding method that allows us to bound its expected performance. The bounding is approximated because the NP-hardness implies that accurate bounding is likely to be intractable.
5.1  Introducing MCCG
This section formally defines some terms that are needed in order to reason about the cost of the preparation phase.

Definition 1. A computation graph is a digraph CG(V, E) in which V is a union of two disjoint sets, V = V_in ∪ V_comp, that satisfy:
1. ∀v ∈ V_in, indegree(v) = 0
2. ∀v ∈ V_comp, indegree(v) = 2
The input of the computation graph is V_in. The output of the computation graph is a subset of V_comp.

Definition 2. The cost of a computation graph CG(V, E) is |V_comp|.

The optimization problem associated with the preparation phase is finding a minimal-cost computation graph. We call this problem MCCG.

Definition 3. MCCG - given a k × ℓ exponent-array, find a minimal-cost computation graph with k inputs and ℓ outputs, such that it computes all the products specified by the array.
Note - we defined MCCG in the context of modular multiplications, but in fact the operation can be any commutative operation. However, we do not allow our computation graph to use inverse operations, because these translate in our domain to expensive modular inverses. Hence, we allow our computation graphs to compute only 'forward'.
5.2  MCCG is NP-Hard
Unfortunately, the MCCG problem is NP-hard. We prove this by reduction from a known NP-complete problem called Exact-3-Cover (X3C) [10]. Let us first recall the definition of X3C.

Definition 4. Exact-3-Cover (X3C) - Let n be a natural number divisible by 3. Let A = {a_1, a_2, ..., a_n} be a set of n elements. Let A3 be a set of triplets over A. Is there a subset of the triplets in A3 that covers all the elements of A, each element exactly once?

We continue by proving a theorem on the NP-hardness of MCCG.

Theorem 1. The problem MCCG is NP-hard.

Proof. By reduction from X3C. The input is an instance of X3C, namely A = {a_1, a_2, ..., a_n} and A3 = {A_1, A_2, ..., A_m}. The goal is to solve the X3C instance correctly (i.e., provide a yes/no answer) using an MCCG oracle. The first step is to transform the X3C instance, A and A3, into an array that will be the input to the MCCG oracle. This is done as follows. The number of rows in the array is |A| = n; each row corresponds to an element of A. The number of columns in the array is n(n−1)/2 + m + 1. The first n(n−1)/2 columns match all the combinations of unordered pairs over A. The next m columns match all the triplets as specified in A3. The last column contains n 1's. The transformation is polynomial in n, and linear in m, so it is polynomial in the input size. The next step is to let the MCCG oracle operate on the array produced by the transformation above. The MCCG oracle outputs CG, a minimal-cost computation graph. Let C be the cost of CG. Finally, the answer is determined based on C's value: if C = n(3n−1)/6 + m − 1, reply yes; otherwise reply no.

Analysis: the first n(n−1)/2 columns of the array contain all the unordered pairs over n elements. Here there is no room for saving: any computation graph must include n(n−1)/2 matching vertices in its V_comp. The next m columns of the array contain m triplets. Every triplet can be obtained from some pair using a single operation. Since all pairs were included, again there is no room for saving: any computation graph must include m matching vertices in its V_comp. The last column contains only 1's. It represents a product of all the elements of A, Π-all for short. Considering the products that must have been computed by the computation graph thus far, the minimal possible cost for Π-all is n/3 − 1. This cost can be achieved iff Π-all is calculated from triplets only. Finally, observe that Π-all can be calculated from triplets only iff the answer to the original X3C instance was yes. Note that

\[
\frac{n(n-1)}{2} + m + \frac{n}{3} - 1 = \frac{3n(n-1) + 2n}{6} + m - 1 = \frac{n(3n-1)}{6} + m - 1. \tag{6}
\]
5.3  Solving MCCG
A general solution to MCCG is probably intractable. In order to get a feel for the size of MCCG instances that can be solved in reasonable time, we implemented an MCCG-solver. The program receives an exponent-array as input, produces all possible computation graphs, and records the one with minimal cost. Traversing the computation graphs is done by an exhaustive, recursive search over all the possible decompositions of products from the array. After some experimentation with the MCCG-solver, we discovered that its performance can be significantly improved by implementing the following optimizations:

1. Pruning partially developed solutions as soon as it is clear that they cannot improve on the current best solution.
2. Decomposing (at every stage) any product in the to-do list that can be expressed as a multiplication of two factors, such that each factor has already been computed or will be computed in the future.

We then tested the MCCG-solver on random 160-bit exponent-arrays. The experiments were executed on 1.1 GHz Pentium 4 PCs with a Linux (Redhat 7.2) operating system. The MCCG-solver was able to solve random 8 × 20 cases in less than twenty-four hours, but could not cross the 9 × 18 barrier. Beyond this point we had to use heuristics that no longer guarantee optimal solutions. The most effective heuristic that we found is based on reducing several target products by their GCD. In this heuristic each GCD gets a score: suppose a GCD contains i elements and divides j products; then its score is 2ij. The score evaluates how much work will be saved if this GCD is reduced. The GCD heuristic works as follows:

1. Find G - the GCD with the highest score.
2. Decompose every product that contains G using G.
3. Proceed to search for solutions with an exhaustive search.

Using the GCD heuristic we were able to find solutions to 9 × 18 cases after reducing a single GCD. In order to solve 10 × 16 and 11 × 15 cases, two GCDs had to be reduced before proceeding to the exhaustive search. All the solutions were found in less than one minute, but recall that these solutions are no longer optimal. It was clear that activating the MCCG-solver (even with the GCD heuristic) on larger exponent-arrays is hopeless. Another heuristic that we considered was to generalize optimization (2) above to three factors. Note that this heuristic guarantees finding 1.5-competitive solutions. In practice, the performance with this heuristic was disappointing, but we found a good use for it as part of the approximated bounding method presented in the next subsection.
5.4  Approximated Bounding Method
We now present a simple method that calculates an upper bound on the cost of the preparation phase for a specific exponent-array, which is tighter than the 2^k − k − 1 upper bound. The technique bounds the exact cost from above by approximation, since it was shown that finding the exact cost is an NP-hard problem. The method is then used to estimate the average cost of the preparation phase for large exponent-arrays. Each column in the exponent-array represents a product that needs to be computed during the preparation phase. As an initial step, we can discard products of irrelevant vectors and leave a single representative of each set of duplicate products. It remains to bound the number of ModMuls required in order to compute the remaining products. The bounding method is given L unique (and relevant) products to compute, {P[1], ..., P[L]}, and returns an upper bound on the number of ModMuls that it would cost to compute them. It is based on the observation that given L unique products to compute, the most efficient computation must have at least L ModMuls: at least one ModMul is required per product. It works as follows.

Input: {P[1], ..., P[L]}.
Execution:
1. Discard every product that can be computed with a single ModMul, and count it as costing one unit. More formally, here we discard every product P[i] that can be expressed as P[i] = A·B, where each of A, B is either one of C_1, ..., C_k or included in {P[1], ..., P[L]} (for DH, substitute C_1, ..., C_k with y_1, ..., y_k).
2. Discard every product that can be computed with two ModMuls, and count it as costing two units. A product removed at this step must have the structure P[i] = A·B·D; the conditions on A, B, D are as in step (1).
3. Count every product that survived steps (1)-(2) as costing k − 1 units (note that k − 1 is an upper bound on the cost of any product).
Output: the sum of the units counted in steps (1)-(3).

We implemented our approximated bounding method and conducted extensive experiments with it. The experiments were done on exponent-arrays of sizes 512, 786, and 1024 bits. In each experiment 100000 random exponent-arrays were generated, and for each array the cost of the preparation phase was bounded. The results, presented in Table 1, report the average bound. The second column in the table shows the 2^k − k − 1 upper bound for reference. For k = 5, the results of our bounding method and the 2^k − k − 1 upper bound are essentially the same, but for all other entries in the table it is clear that our method produces a bound tighter than 2^k − k − 1. The improvement becomes more significant for larger values of k. Experiments on larger exponents (up to 2048 bits), although not reported here, gave similar results.
Table 1. Results of the approximated bounding method

k    2^k − k − 1    |E| = 512    |E| = 786    |E| = 1024
5    26             25           26           26
6    57             42           49           53
7    120            56           71           83
8    247            68           88           105
9    502            80           105          126
10   1013           89           119          145
11   2036           102          128          158
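The three-step bound is mechanical enough to state as code. The sketch below models products as sets of base indices (the modeling and helper names are ours; it favors clarity over the efficiency of the authors' implementation):

```python
from itertools import combinations

def bound_preparation_cost(products, k):
    """Upper-bound the ModMuls of the preparation phase for the given
    distinct relevant products, each a frozenset of >= 2 base indices."""
    singles = {frozenset([i]) for i in range(k)}
    avail = set(products) | singles            # factors allowed in steps 1-2
    todo, cost = set(products), 0
    for P in list(todo):                       # step 1: P = A * B
        if any(P - A in avail for A in avail if A < P):
            todo.discard(P)
            cost += 1
    for P in list(todo):                       # step 2: P = A * B * D
        two = (A | B for A, B in combinations(avail, 2)
               if not (A & B) and A | B < P)
        if any(P - AB in avail for AB in two):
            todo.discard(P)
            cost += 2
    return cost + (k - 1) * len(todo)          # step 3: k - 1 units each

# Example: C1*C2*C3 and C1*C3*C6 from the text (0-indexed), with k = 6.
print(bound_preparation_cost({frozenset({0, 1, 2}), frozenset({0, 2, 5})}, 6))
# -> 4, an upper bound; the optimum that shares C1*C3 is 3
```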
6  Optimizing Performance
This section considers performance optimization of the proposed scheme. For simplicity, we discuss performance only from the server's point of view. In the case of RSA, we assume that the server is loaded enough that the additional computation done by the clients has no effect on the total delay; if this is not the case, the scheme should be applied selectively. We do not investigate this issue further here, and leave it for future research. We also assume a standard N = pq modulus. In the case of DH, we note that an extended certificate requires some additional hashing as part of its signature validation. We assume that the signature validation in general, and the extra hashing effort in particular, are negligible in comparison to the cost of the modular exponentiation, and ignore them.

In order to optimize the performance, we need to set the parameter k such that the combined cost of all the phases is minimized. For RSA, the reduction phase costs k reductions modulo p (we use a worst-case approach and count them as k/4 full-size ModMuls), the verification phase requires ℓ(k − 1) ModMuls modulo p (equivalent to ℓ(k − 1)/4 full-size ModMuls), and the exponentiation phase costs on average (2 − 2^{−k})(ℓ − 1) ModMuls. The cost of the preparation phase can be bounded using the technique presented in Section 5. DH is the same as RSA but without reduction and verification. (A similar counting of ModMuls can be done for RSA with the CRT.)

We continue by evaluating the above formulas for 1024-bit exponents and several values of k. The results are summarized in Table 2. The line k = 1 is included for reference; it shows the cost when our proposed scheme is not used, and was calculated assuming dynamic window optimization with w = 6 (w is the window size). The cost of the preparation phase was bounded by 2^k − k − 1 for k = 2..5, and by the approximated bounding method for k = 6..12.

Table 2. Expected performance for k = 1..12, 1024-bit exponents

k    ℓ      Reduc.   Verific.   Prepar.   Exponen.   Tot. RSA   Tot. DH
1    1024   0        0          32        1194       1226       1226
2    512    0.5      128        1         894.25     1023.75    895.25
3    342    0.75     171        4         639.38     815.13     643.38
4    256    1        192        11        494.06     698.06     505.06
5    205    1.25     205        26        401.63     633.88     427.63
6    171    1.5      213.75     53        337.34     605.59     390.34
7    147    1.75     220.5      83        290.86     596.11     373.86
8    128    2        224        105       253.5      584.50     358.50
9    114    2.25     228        126       225.78     582.03     351.78
10   103    2.50     231.75     145       203.9      583.15     348.9
11   94     2.75     235        158       185.95     581.70     343.95
12   86     3        236.5      167       169.98     576.48     336.98

The conclusion from Table 2 is that for k ∈ [1, 12] the best choice is k = 12, and the expected speedups for RSA and DH are 2.13 and 3.64, respectively.
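The per-row totals of Table 2 follow directly from these formulas; a small helper of our own reproduces them (pass the Table 1 bound as prep where 2^k − k − 1 is loose):

```python
from math import ceil

def expected_cost(k, m=1024, prep=None):
    """Expected per-exponentiation cost in full-size ModMuls, for k >= 2."""
    ell = ceil(m / k)
    reduction = k / 4                        # k reductions modulo p
    verification = ell * (k - 1) / 4         # ell*(k-1) ModMuls modulo p
    if prep is None:
        prep = 2**k - k - 1                  # preparation-phase bound
    exponentiation = (2 - 2**-k) * (ell - 1)
    rsa = reduction + verification + prep + exponentiation
    dh = prep + exponentiation
    return round(rsa, 2), round(dh, 2)

print(expected_cost(4))                # (698.06, 505.06), as in Table 2
print(expected_cost(12, prep=167))     # (576.48, 336.98)
```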
6.1  Experiments
Our proposed scheme actually offers a computation-communication trade-off. We wanted to verify that this trade-off can improve the throughput of a server
in practice. We therefore designed and implemented a client/server application. The server generates an RSA modulus N = pq, and derives the private and public exponents d, e. The server then sends the RSA public key to the client. The client encrypts a single, random message, M, and bombards the server with the ciphertext(s) of that single message, requesting its decryption. This is meant to simulate a server under heavy load from many clients. The whole experiment was repeated several times with different parameters (N, M, d, etc.), in order to make the results independent of the unique properties of any specific parameter selection. Communication was implemented using standard TCP/IP connections, and big integers using the GMP library. We set our client/server application with the following parameters: |N| = 1024 bits, bombardment size = 1000 messages, repetition size = 1000 repetitions. The server and client programs were then executed on two PCs (1.1 GHz Pentium 4, OS Linux Redhat 7.2) connected via a LAN. The application reported the (net) time that it took the server to process all 1000 decryption requests, averaged over the 1000 executions. We used these measurements to compute the speedup gained in practice, and compared it with the speedup predicted theoretically. As expected, the speedup in practice was smaller than the theoretical speedup, but the difference was at most 5%.
7
Concluding Remarks
The method proposed in this paper can speed up the establishment of secure sessions between servers and clients in the Internet. Currently, it is incompatible with standard Internet protocols that are used for this purpose, but these protocols are updated and improved from time to time. Finally, we would like to mention two directions for further improvements. First, an implementation of our scheme for RSA can reduce the cost of the verification phase by using batch verification techniques as described in [2]. Second, for very small values of k
Speeding Up Secure Sessions Establishment on the Internet
449
(the exact number depends on the size of the exponent), it is possible to obtain improved performance by using an alternative multi-exponentiation algorithm called interleaving exponentiation [20].
Acknowledgments I am grateful to Dahlia Malkhi, Noam Nisan, and Victor Halperin for many invaluable suggestions. I would also like to thank the anonymous referees for their insightful comments, which helped me to improve the presentation of this paper.
References [1] [2]
[3] [4] [5] [6] [7] [8] [9] [10] [11]
[12] [13] [14] [15] [16]
P. Beguin, and J-J. Quisquater. Fast server-aided RSA signatures secure against active attacks. In Proceedings of Crypto 95, pages 57–69, 1995. 435 M. Bellare, J. Garay, and T. Rabin. Fast batch verification for modular exponentiation and digital signatures. In Proceedings of Eurocrypt 98, pages 236–250, 1998. 448 M. Bellare, and P. Rogaway. Optimal Assymetric Encryption - How to Encrypt with RSA. In Advances in Cryptology Eurocrypt 94, pages 92–111, 1994. 434, 437 D. Boneh. The decision Diffie-Hellman problem. In Proceedings of the Third Algorithmic Number Theory Symp., LNCS Vol. 1423, pages 48–63, 1998. 436 D. Boneh, and H. Shacham. Fast variants of RSA. In RSA Laboratories Cryptobytes, Volume 5 No. 1, pages 1–8, Winter/Spring 2002. 438 E. F. Brickell, D. M. Gordon, K. S. McCurley, and D. Wilson. Fast exponentiation with precomputation. In Proceedings of Eurocrypt 92, pages 200–207, 1992. 434 C. Coup’e, P. Nguyen, and J. Stern. The Effectiveness of Lattice Attacks Against Low-Exponent RSA. In Proceedings of PKC’99, pages 204–218, 1999. T. Dierks, and C. Allen. RFC 2246: The TLS Protocol Version 1. January 1999. http://www.ietf.org/rfc/rfc2246.txt 433 W. Diffie, and M. Hellman. New directions in Cryptography. IEEE Transactions on Information Theory, Volume 22, No. 6, pages 644–654, 1976. 433 M. R. Garey, and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, New York, 1979. 444 S. Hong, J. Shin, H. Lee-Kwang, and H. Yoon. A new approach to server-aided secret computation. In Proceedings of the 1st International Conference on Information Security and Cryptology - ICISC’98, pages 33–45, 1998. 435 B. Kaliski, and J. Staddon. RFC 2437: PKCS #1 - RSA Cryptography Specifications Version 2.0. October 1998. http://www.ietf.org/rfc/rfc2437.txt 434 S. Kent, and R. Atkinson. Security Architecture for the Internet Protocol. RFC2401, http://www.ietf.org/rfc/rfc2401.txt 433 C. H. Lim, and P. J. Lee. More flexible exponentiation with precomputation. In Proceedings of Crypto 94, pages 95–107, 1994. 434 C. H. Lim, and P. J. Lee. Security and Performance of server-aided RSA computation protocols. In Proceedings of Crypto 95, pages 70–83, 1995. 435 T. Matsumoto, H. Imai, C. S. Laih, and S. M. Yen. On verifiable implicit asking protocol for RSA computation. In Proceedings of Auscrypt 92, pages 296–307, 1993. 435
450
Yaron Sella
[17] T. Matsumoto, K. Kato, and H. Imai. Speeding up Secret Computations with Insecure Auxiliary Devices. In Proceedings of Crypto 88, pages 497–506, 1990. 435 [18] A. J. Menezes, P. C. Van Oorschot, and S. A. Vanstone. Handbook of Applied Cryptography, CRC Press, 1997. 433 [19] J. Merkle. Multi-Round Passive Attacks on Server-Aided RSA Protocols. In Proceedings of CCS ’00, pages 102–107, 2000. 435 [20] B. M¨ oller. Algorithms for Multi-Exponentiation. In Selected Areas in Cryptography (SAC) 2001, LNCS Vol. 2259, pages 165–180, 2001. 449 [21] National Institute for Standards and Technology. Digital Signature Standard (DSS). Technical Report 169, 1991. 434 [22] P. Nguyen, and J. Stern. The B’eguin-Quisquater server-aided RSA protocol from Crypto ’95 is not secure. In Proceedings of Asiacrypt ’98, pages 372–379, 1998. 435 [23] B. Pfitzmann, and M. Waidner. Attacks on protocols for server-aided RSA computation. In Proceedings of Eurocrypt 92, pages 153–162, 1992. 435 [24] R. L. Rivest, A. Shamir, and L. Adleman. A method for obtaining digital signatures and public key cryptosystems. Communication of the ACM, 21:120–126, 1978. 433 [25] C. P. Schnorr. Efficient signature generation by smart cards. J. Cryptology 4 (3), pages 161–174, 1991. 434
On Fairness in Exchange Protocols Olivier Markowitch1, Dieter Gollmann2 , and Steve Kremer1 1 Universit´e Libre de Bruxelles Bd du Triomphe - CP212, 1050 Brussels, Belgium 2 Microsoft Research Ltd 7 J J Thomson Avenue, Cambridge CB3 0FB, United Kingdom
Abstract. The aim of this paper is to give an overview of the most classical definitions of fairness in exchange protocols. We show the evolution of the definition, while putting forward that certain definitions are rather vague or too specialized. We propose a structured and generalized definition of fairness and of the security of exchange protocols. Keywords: security protocols, fairness, exchange protocols, fair exchange, security properties.
1
Introduction
With the growth of open networks in general and the Internet in particular, many security related problems have been identified and solutions have been proposed. Applications in which the fair exchange of electronic items between users is required are becoming more frequent. Payment systems, electronic commerce, certified e-mail and contract signing are classical examples in which fairness is a relevant security property. Informally, an exchange protocol is said to be fair if it ensures that during the exchange of the items, no party involved in the protocol can gain a significant advantage over the other party, even if the protocol is halted for any reason. This paper addresses the problem of defining fairness. In the literature, one finds many different definitions of fairness. Some are too vague (such as for example the informal definition above), and some are too constraining. One important problem is due to the fact that the word “fairness” has several usual interpretations in the current language. These interpretations are often distinct from the definition needed in the context of exchange protocols. Another problem is the excessive use of the fairness property in computer security. If fairness may be sound in the theoretical study of exchange protocols, its practical necessity must be argued depending on the actual situation. To carry out a protocol respecting fairness requires the set up of security mechanisms that may sometimes be rather heavy. It results in an increase of computations and/or communications. It can on occasion be more realistic, in practice, to envisage mechanisms that manage the problems potentially occurring during an exchange separately from the exchange protocol itself. For example, in an exchange of low value electronic information against payments, in case of an unfair situation the P.J. Lee and C.H. Lim (Eds.): ICISC 2002, LNCS 2587, pp. 451–465, 2003. c Springer-Verlag Berlin Heidelberg 2003
452
Olivier Markowitch et al.
seller could accept the loss or he can lodge a complaint against the buyer. In this paper we will focus on theoretical aspects of fairness in exchange protocols. The majority of publications on fair exchange assume the existence of a trusted third party (TTP) in the protocol. Independently of how the TTP is involved in the protocol, its main role is to resolve the problems that may occur between the involved parties. Some protocols [11, 17, 32] use a TTP to store the details of the transaction in order to help to successfully complete an exchange. As the TTP is actively involved in the protocol, this approach considerably reduces the efficiency of the exchange. To remedy this shortcoming, independently Micali [22] and Asokan et al. [1, 4] proposed a solution that avoids the presence of the TTP between the parties. They proposed not to use the TTP during the transaction when the parties behave correctly and the network functions, but to invoke the TTP to complete the protocol in case of problems with one of the parties or the network. Such protocols are said to be optimistic. Fair exchange protocols without TTP have also been proposed [9, 20, 27, 28]. However the last two are based on an unpractical definition of fairness (as we will see below) and the others adopt a probabilistic approach towards fairness. Independently of the way a TTP is (or is not) used, several categories of exchange protocols exist, depending on the information to be exchanged: – electronic purchase of digital goods: exchange of an electronic item against an electronic payment (issued by the client) – digital contract signing: exchange of signatures on a given electronic document – non-repudiation protocol: exchange of an electronic item and its proof of origin against a proof of receipt – certified e-mail: exchange of an electronic message against a proof of receipt1 – barter: an electronic item of value is exchanged against another electronic item of (similar) value – ... The existence of different categories is responsible for some exotic definitions of fairness. We also note that two-party fair exchange protocols and multi-party fair exchange protocols have often different fairness definitions (partially due to different topologies used in the multi-party case) [2, 7, 13]. In this paper we defend the point of view of a unified definition of fairness, whatever the underlying exchange protocol or formalism may be. We put ourselves in a theoretical context where fairness is always needed and propose a generalized definition of fairness and of the security of an exchange protocol. Our definitions aim to capture what a given property provides, and not how it is provided. To quote Roscoe [24], we avoid “intensional definitions”, which are 1
A difference between non-repudiation protocols and certified e-mail protocols is that in the latter the recipient of the message should not know the sender’s identity when deciding to accept the message or not [19]. Moreover, non repudiation of origin may not be required in certified e-mail.
On Fairness in Exchange Protocols
453
related to a sequence of actions that must (or must not) happen in a given order, for a property to hold. Intensional specifications are useful in formal verification as they capture the way the protocol designers have foreseen the protocol execution (and therefore, we do not criticize them) but are not general enough to give a general definition of a concept such as fairness. In this vein, when examining the statement that “no party has an advantage” we will distinguish between the aspects of an exchange the advantage could apply to, but we will not examine how this advantage could be measured. We warn the readers who expect to find formalized definitions that they will be disappointed, but disappointed for a reason. There are different views of what constitutes an advantage and we have to be able to differentiate these aspects and define them informally but clearly before choosing a particular formalism. The formalism would explain what is meant by gaining information (implying some interesting views, as those proposed in [18, 25]). There can again be different formal definitions of “gaining information”. We would like to stress that an informal analysis of security properties is essential before properties are being formalized. The goal is to promote a clear informal understanding of fairness to be able to compare different formalisms. In the next section, we survey and discuss some of the more classic fairness definitions found in the literature. In the third section, we propose a consistent and modular definition of the security of an exchange protocol, where fairness is one of the properties needed. We point out the necessity of other important properties like timeliness, viability and non-repudiability in this security definition. We conclude in the last and fourth section.
2 2.1
Evolution of the Fairness Definition Historical Definitions
Although the relevance of fairness has been well appreciated since the early 1980s, the first propositions of a fairness definition corresponding to practical solutions were expressed in terms of computing power. The protocols exchanged information piece by piece and it was required that the computational effort required from the parties to obtain each others remaining information should be approximately equal at any stage during the execution of the protocol. Even et al. [12] proposed a classical definition of fairness in the framework of contract signing, called “concurrency”: if one party X executes the protocol properly, then his counterpart Y cannot obtain X’s signature to the contract without yielding his own signature to it. Unfortunately, they did not propose a solution respecting this definition. In order to solve the exchange problem they introduced a weaker definition, called “approximate-concurrency”: if one party X executes the protocol properly then with very high probability, at each stage during the execution, X can compute his counterpart’s signature to the contract using approximately the same amount of work used by Y to compute X’s signature to the contract. This is what we call the computational approach towards fairness.
454
Olivier Markowitch et al.
It was rapidly accepted that requiring an equal, equivalent or even related2 computing power between the communicating parties is not reasonable. An important evolution was the probabilistic approach not requiring equivalent computing power, first proposed by Ben Or et al. [8] in the contract signing framework. They defined probabilistic fairness in the following way: a party is privileged when s/he is capable of causing the judge to rule that the contract is binding on both parties; a contract signing protocol is (v, ε)−f air for a party A if the following holds for any contract C when A follows the protocol properly: at any step of the protocol, in which the probability that another party B is privileged is greater than v, the conditional probability that A is not privileged given that B is privileged is at most ε. ε denotes an upper bound on the probability that one party is not privileged given that the other is privileged. Protocols based on this last definition are traditionally implemented by successive rounds during which, in turn, a party is privileged whereas the other is not. This yields a situation that could be considered unfair (in the common sense). In our eyes, the fact that the entities are privileged in turn is not unfair. It would be unfair if one party were able to prove that the other party is linked alone to the contract. Putting aside these historical definitions, the actually most widely accepted definition of fairness [3, 4, 6, 12, 16, 26, 29, 31] describes fairness in relation to the end of the exchange protocol run: at the end of the exchange protocol run, either all involved parties obtain their expected information or none of them receives anything. We consider this definition to be, almost, the most suitable one3 . 2.2
Definitions with Ballast
There are many recent definitions of fairness that include additional properties in the basic definition. Often, fairness definitions describe the mechanisms necessary to realize fairness in particular cases. The definition is not only based on what is fairness but rather on how to obtain it. Digital Contract Signing In the context of digital contract signing, Asokan et al. [5] specify by the means of a game that an exchange protocol for signatures is not fair if a malicious entity can exchange an invalid signature against a valid one. Although the definition is appropriate, a general definition of fairness should not be based on such specific concerns (even if the proposed definition is always true in the framework of digital contract signing). We believe that a general definition of fairness can be expressed such that all exchange types are covered. The way followed to 2 3
Where the computing power ratio between two communicating parties is known and fixed. Moreover, this definition can easily be adapted in a probabilistic context: at the end of the exchange protocol run, there has to be an overwhelming probability that either all involved parties obtain their expected information or none of them receives anything.
On Fairness in Exchange Protocols
455
obtain fairness depends on the context: for digital contract signing protocols, the security of the signature is an important part but this aspect has to be developed in the security proof of the protocol and not in the fairness definition. Garay et al. [15] specify that an optimistic contract signing protocol is fair if: 1. it is impossible for a corrupted participant to obtain a valid contract without allowing the remaining participant to also obtain a valid contract 2. once a correct participant obtains a cancellation message from the TTP, it is impossible for any other participant to obtain a valid contract 3. every correct participant is guaranteed to complete the protocol. The restriction on the cancellation message (the second rule) seems too restrictive. Indicating that a party, having carried out a cancellation, should not take the risk to continue the protocol is a part of the protocol’s description: the specification of the behavior of the parties implied in the protocol belongs to its description and should not belong to the fairness definition. Moreover, as cancellation is specific to optimistic protocols, we would have different fairness definitions depending on the TTP’s involvement. Note that the cancellation (or abort) token is produced during an optimistic protocol to inform the party asking for a cancellation that the TTP will no longer accept recovery requests during this protocol run. This cancellation token, issued during an abort protocol, is necessary to ensure timeliness (which will be clearly defined in the third section). The timeliness property is respected if the parties always have the ability to reach, in a finite amount of time, a point in the protocol where they can stop the protocol while preserving fairness. The third rule talks about the ability to complete the protocol. If the guarantee to complete a protocol is related to the fact that a way to securely end the protocol (with or without a completed exchange) must exist, then this corresponds to the timeliness property we just discussed. Otherwise, if completing the protocol is related to the fact that the exchange succeeds, this is the viability property. A protocol is viable if the exchange always succeeds when the involved parties behave honestly (i.e. follow the protocol). Viability differs from fairness and is more difficult to obtain in practice, because the success of the exchange does not depend only on the honesty of the parties but also on the quality of the underlying network. In [23], Pfitzmann et al. say that a contract signing scheme is called fair if it fulfills the following requirement: 1. 2. 3. 4.
correct execution unforgeability of contracts verifiability of valid contracts (a signed contract cannot be invalidated) no surprise with invalid contracts (a rejected contract — no party had signed it — cannot be declared signed) 5. termination on synchronous network (the protocol ends after a finite amount of rounds) 6. termination on asynchronous network (after a time-out or a user’s manual input, the protocol ends after a fixed time)
456
Olivier Markowitch et al.
The first rule is related, if no time-out is used, to the viability property, which is distinct from fairness. The second, third and fourth rules are specific to digital contract signing. As these rules apply to any contract signing protocol, these definitions could be considered as extensional specifications of a contract signing protocol. In [23], these statements are defined in terms of precise inputs and outputs of the protocol and in terms of execution of subprotocols show and sign4 . Such definitions refer to the machinery of contract signing. The fifth and sixth rules are related to the timeliness property, which is also distinct from fairness. Of course, it may be the case that in certain formalisms it is quite difficult to state extensional definitions. Fair Exchange Vogt et al. [29] proposed to split the definition of fairness into two aspects: the participation of a “faulty” entity is or is not needed when the TTP is requested to help finishing the protocol. Again, this approach described how fairness is obtained. We emphasize once more that when defining fairness it is necessary to focus on what is fairness and not on how to obtain it. Franklin et al. [14] said that at the end of the fair exchange the following must be true: 1. if A, B, and the TTP are honest, A learns B’s information and B learns A’s information; 2. if A and the TTP are honest then B does not learn anything about A’s information unless if A learns B’s information; 3. if B and the TTP are honest A does not learn anything about B’s information unless if B learns A’s information; 4. if A and B are honest then the TTP does not learn anything about A’s and B’s information. Again, the first property is the viability property. The fourth property is about confidentiality with regard to the TTP. This is not needed in fairness but is due to the context of key exchanges of their paper. 2.3
Vague Definitions
Zhou et al., in the context of non-repudiation protocols [30, 32, 33, 34], define fairness as follows: a non-repudiation protocol is fair if it provides the originator and the recipient with valid irrefutable evidence after completion of the protocol, without giving a party an advantage over the other at any stage of the protocol run. 4
For example, in the fairness definition [23], the statement “Verifiability of valid contracts” is defined by “If a correct signatory, say A, outputs (signed , C, tid) and later executes “show” on input (show , tid ) then any correct verifier will output (signed , C, tid) for any C”.
On Fairness in Exchange Protocols
457
Boyd et al. [10] propose a similar definition: an exchange protocol is fair if at no point during the execution of the protocol either of the entities participating in the exchange can gain any (significant) advantage over the other if the protocol is suddenly halted. In both definitions the notion of advantage is not defined. These definitions seem practically to exclude any protocol not offering a perfect symmetry (in terms of knowledge and possibility of action). However such a definition is obviously not formalizable and seems primarily related to the common acceptance of the word fairness. Moreover in the definition by Zhou et al., the first part of the definition imposes viability, which is, in practice, rather unrealistic. 2.4
Weak Fairness and Transparent TTP
Asokan introduced [1] the notion of weak fairness in relation to protocols where fairness can be broken in certain circumstances. In a weakly fair protocol, a well behaving despoiled party is able, thanks to the help of the TTP to proof his honesty to an external adjudicator. More precisely, if a party A does not receive its expected item, it will be able to prove to an external adjudicator that the other party received the item sent by A or is able to retrieve this item without any further intervention from A. If the misbehaving party (who has not provided his item) refuses to cooperate, the TTP will transmit to A an affidavit in replacement of the missing information. In practice, this property is interesting in protocols with a low weight TTP, when it is more relevant to obtain affidavits produced by the TTP than the expected low cost items. Note, that one also has to define dispute resolution protocols, defining the way an adjudicator has to evaluate these affidavits, as it is the case in non-repudiation protocols. Weak fairness shows an interesting way of linking fairness and non-repudiation. Participants do not get a guarantee that they will obtain the intended item, but at least they get a non-repudiation evidence that the other party was involved in a particular run of the exchange protocol. Weak fairness may also be interesting, in some circumstances, to allow to achieve simultaneously some kind of fairness and timeliness. Therefore weak fairness may be of practical interest. Recent evolutions [5, 10, 15, 21] in optimistic exchange protocols with transparent TTP, based on verifiable encryption and recoverable signatures, offer solutions where it is possible to maintain “strong” fairness. In such optimistic protocols the TTP is always able to retrieve the original expected information in case of a problem, without needing the cooperation of the parties to enforce fairness. Moreover, with such a transparent TTP, at the end of a protocol where the exchange is realized, it is impossible to decide whether the TTP did intervene in the protocol execution or not. As it is difficult to determine whether the TTP was required during the protocol because of a dishonest party or because of a network problem, a transparent TTP may be particularly relevant, for example, in an electronic commerce environment.
458
2.5
Olivier Markowitch et al.
Abuse-Free Digital Contract Signing Protocols
Recently Garay et al. [15] introduced the notion of abuse-free digital contract signing protocols. An optimistic contract signing protocol is abuse-free if it is impossible for a single entity at any point in the protocol to be able to prove to an outside party that he has the power to either terminate (abort) or successfully complete the protocol. The main protocol they propose consists of four steps. During the first part of the main protocol (the first two steps) the parties exchange verifiable commitments to signatures (called “private contract signatures”). The specificity of these signatures is that only the intended recipient is able to verify whether the verifiable committed signature he received can be transformed into universally verifiable signatures by a TTP. Moreover, this recipient is not able to prove the committed signature’s validity to any external parties. The second part of the main protocol consists of the exchange of the universally verifiable signatures on the contract. In case of problems, the entities can run a recovery protocol with the TTP. The TTP will extract the universally verifiable signature from the committed ones. Hence, as only Alice, Bob and the TTP can verify the commitments, it is not possible to prove to an external party that a protocol run has been engaged in. Proving to an external party that a contract is going to be signed may be useful, for instance, in a sale protocol, in order to make this external party increase his offer. Abuse-freeness is an interesting property. In our view, its most important feature is that committed signatures are not universally verifiable. Note that it is not sufficient to use non-universally verifiable committed signatures to obtain abuse-free contract signing protocols. With a non-universally verifiable signature only the expected recipient is able to verify the signature. But this recipient should not be able to prove the validity of this signature to an external party (for example thanks to an interactive proof of knowledge of the secret he used to verify the committed signature or even by completely divulging its secret). However, if using a resilient network when communicating with the TTP (i.e. a network where messages are delivered before a finite although unknown amount of time), we are sceptic about the ability of a party to either terminate or successfully complete the protocol. It is, in fact, rather easy for Bob to force the termination of the protocol proposed in [15]. If he stops the protocol, the only thing Alice can do is to launch an abort protocol. Forcing the successful completion of the protocol is harder: Bob needs to send a recovery request before Alice’s abort request arrives at the TTP. We believe that it is rather difficult to block messages on a resilient network. Hence, a race condition decides whether the abort request or the recovery request first arrives at the TTP. This means that, when using resilient channels, most of the optimistic contract signing protocols, are actually abuse-free, as a race condition decides of the outcome of the protocol. Although the protocols may be abuse-free, with respect to the definition given in [15], the fact that a party can prove to an outsider, that the
On Fairness in Exchange Protocols
459
protocol has been engaged with a given party, before the final outcome of the protocol is known, may be considered as a problem. This motivates the use of private contract signatures, which overcome that problem. Also note that when all the communication channels are synchronous (i.e. messages are delivered before a finite and constant amount of time), which is not realistic in practice as the transmissions are not controlled by race conditions anymore, private contract signatures are indeed needed to ensure that the protocol is abuse-free.
3
A General and Modular Definition of the Security of an Exchange Protocol
In this section, we require the definition of an exchange protocol to explicitly refer to the items the protocol participants want to exchange. The properties following below are defined with respect to the items exchanged as referred to in the exchange protocol definition. As suggested in [15, 16] we propose to speak about the security of exchange protocols. However, we do not consider that viability or abuse-freeness must imperatively be in the mandatory part of a definition of the security of an exchange protocol. We say that an exchange protocol is secure when it respects these three mandatory properties : – viability: independently of the communication channels quality, there exists an execution of the protocol, where the exchange succeeds. – fairness: the communication channels quality being fixed, at the end of the exchange protocol run5 , either all involved parties obtain their expected items or none (even a part) of the information to be exchanged with respect to the missing items is received. – timeliness: the communication channels quality being fixed, the parties always have the ability to reach, in a finite amount of time, a point in the protocol where they can stop the protocol while preserving fairness. Moreover, a secure exchange protocol can respect some optional properties. For example : – non repudiability: it is impossible for a single entity, after the execution of the protocol, to deny having participated6 in a part or the whole of the communication. – abuse-freeness: it is impossible for a single entity at any point in the protocol to be able to prove to an outside party that he has the power to terminate (abort) or successfully complete the protocol [15]. 5 6
It should be noted that the end of the exchange protocol run is not necessarily related to the fact that the exchange succeeded. Classical non-repudiation needs are non-repudiation of origin of a message and nonrepudiation of receipt of a message.
460
Olivier Markowitch et al.
If we accept the common meaning of “fairness”, we can consider that this paper deals with three different aspects of fairness: – fairness, as defined, relating to the items exchanged during the protocol; – fairness relating to the ability to determine the progress of the protocol, called timeliness; – fairness relating to the ability to make statements about the possibility to determine the progress of a protocol, called abuse-freeness. Although these three properties can be bound to the common meaning of the word “fairness”, they are related to fairness at different levels and should not be merged in one single definition. To avoid confusion, we prefer not to use the term fairness when talking about timeliness or abuse-freeness and put the emphasis on a modular and general definition. To illustrate our concepts, consider the following classical protocol, described in a very general way, where Alice exchanges an electronic item against Bob’s digital signature on the publicly known description of the item: Main protocol: 1. B → A: committed signature (the TTP can open it without the help of B) 2. A → B: item 3. B → A: signature Recovery protocol: 1. A → TTP : B’s committed signature and the item 2. TTP → A: B’s signature 3. TTP → B: A’s item In the main protocol, Bob begins by sending to Alice his committed signature on the item’s description. Alice cannot retrieve Bob’s final signature on the description from the committed one, but we assume that she can verify the correctness of the commitment (she is able to verify that the TTP will be able to retrieve Bob’s final signature from the committed one). Then Alice sends to Bob the expected electronic item. If the item corresponds to the description expected by Bob, he sends to Alice his final signature on the item’s description. This final signature can be considered as the confirmation that Bob has received (or paid) Alice’s electronic item. If Bob does not send his final signature at the third step of the main protocol, Alice can initiate a recovery protocol with the TTP. She sends to the TTP Bob’s committed signature and the item. The TTP verifies whether the committed signature is valid and whether the signed description corresponds to the item Alice sent. If all the checks hold, the TTP computes Bob’s final signature on the description from the committed one and sends to Alice Bob’s final signature and to Bob the item. According to our definitions this protocol is fair. When Alice receives Bob’s committed signature, either she sends to Bob the item and receives Bob’s item
On Fairness in Exchange Protocols
461
during the main protocol, or she runs the recovery protocol and both of them obtain their item by the mean of the TTP. However, Alice can block the protocol after having received Bob’s committed signature. If Alice, after having received Bob’s committed signature, suspends her participation in the protocol, Bob cannot decide when to leave the protocol in a fair way. As Bob cannot run the recovery protocol, Alice has always the possibility, after Bob left the protocol, to run the recovery protocol. Therefore Bob can never leave the protocol before receiving Alice’s item. Although fairness is never broken, the timeliness property is not fulfilled. The protocol is viable (the three steps of the main protocol make the exchange happen) and fair, but does not respect timeliness. Hence, the protocol is not secure7 . We thus modify the protocol as follows (obtaining a protocol which is an abstract version of the one proposed in [4]). Main protocol: 1. 2. 3. 4.
A → B: B → A: A → B: B → A:
committed item committed signature (the TTP can open it without the help of B) item signature
Recovery protocol: 1. A or B → TTP : B’s committed signature and committed item 2. TTP → A: B’s signature 3. TTP → B: A’s item With these modifications Bob has an advantage, as he can initiate the recovery protocol right after having received the first message of Alice in the main protocol. Alice has to receive the second message of Bob to be able to do so. As described, the protocol does not respect the timeliness property, because Bob can temporarily suspend his participation in the protocol before deciding whether to continue the main protocol (by sending his committed signature) or to initiate the recovery protocol. So traditionally an abort protocol can be run by Alice. Abort protocol: 1. A → TTP : abort request 2. TTP → A: abort confirmation 3. TTP → B: abort confirmation The recovery protocol and the abort protocol are mutually exclusive and this mutual exclusion is assured by the TTP. The abort protocol can be used to prevent Bob to realize a recovery after having received the first message of the main protocol. But if the main protocol 7
With regard to the definitions proposed here, optimal efficiency of an optimistic contract signing protocol [23] refers to secure protocols and not only fair protocols.
462
Olivier Markowitch et al.
is run until its end, the exchange is achieved. If Alice executes the abort protocol after a completed exchange, the exchange still holds. The goal of the abort protocol is to ensure the timeliness property. An abort confirmation cannot cancel a successful exchange. Only Alice has the power to abort the protocol. If Bob wants the protocol to be aborted he has to wait long enough after having received the first message of the main protocol in order to force Alice to initiate the abort protocol. On the other hand, although Bob cannot run the abort protocol, he can, in practice, abort the main protocol (by stopping his participation) without informing Alice. Whereas when Alice makes an abort, Bob is informed by the TTP. These are not a security problem of the protocol but could be considered as unfair in the common sense. The definitions we propose are valid in a two party case or in a multi-party case. We do not specify in our definitions from whom the information must come and to whom it should be sent. The topology, which differentiates the various multi-party exchange protocols, does not interfere here. Similarly to the framework of digital contract signing protocols, a probabilistic definition of fairness [20] exists in the context of non-repudiation protocols. An exchange protocol is probabilistically fair if the communication channels quality being fixed, at the end of the exchange protocol run, the probability that one party obtains an expected information without providing his counterpart information is negligible and can be parametrized. With such a definition, we are able to design exchange protocols without TTP, which do not require equivalent computing power among the parties.
4
Conclusion
We observed, in the literature on fair exchange protocols, that some definitions of fairness include aspects specific to the application the exchange protocol is intended for. Moreover, some definitions of fairness actually stipulate the way fairness should be achieved. Such definitions provide an insufficient separation between the specification of a protocol and the specification of the protocol goals. We have proposed a unified definition of fairness, independent of the the underlying exchange protocol, and a structured and generalized definition of the security of exchange protocols, distinguishing between the aspects of viability, fairness, and timeliness. Finally, we have suggested how fairness, timeliness, and abuse-freeness could be considered as capturing different aspects of the informal idea of “having no advantage”.
References [1] N. Asokan. Fairness in Electronic Commerce. PhD thesis, University of Waterloo, May 1998. 452, 457
On Fairness in Exchange Protocols
463
[2] N. Asokan, M. Schunter, and M. Waidner. Optimistic protocols for multi-party fair exchange. Research Report RZ 2892 (# 90840), IBM Research, Dec. 1996. 452 [3] N. Asokan, M. Schunter, and M. Waidner. Optimistic protocols for fair exchange. In Proceedings of the fourh ACM Conference on Computer and Communications Security, pages 8–17. ACM Press, Apr. 1997. 454 [4] N. Asokan, V. Shoup, and M. Waidner. Asynchronous protocols for optimistic fair exchange. In Proceedings of the IEEE Symposium on Research in Security and Privacy, Research in Security and Privacy, pages 86–99. IEEE Computer Society,Technical Committee on Security and Privacy, IEEE Computer Security Press, May 1998. 452, 454, 461 [5] N. Asokan, V. Shoup, and M. Waidner. Optimistic fair exchange of digital signatures. IEEE Journal on Selected Areas in Communications, 18(4):593–610, Apr. 2000. 454, 457 [6] G. Ateniese. Efficient verificable encryption (and fair exchange) of digital signatures. In 6th ACM Conference on Computer and Communications Security, pages 138–146, Singapore, Nov. 1999. ACM Press. 454 [7] F. Bao, R. Deng, K. Q. Nguyen, and V. Vardharajan. Multi-party fair exchange with an off-line trusted neutral party. In DEXA’99 Workshop on Electronic Commerce and Security, Florence, Italy, Sept. 1999. 452 [8] M. Ben-Or, O. Goldreich, S. Micali, and R. Rivest. A fair protocol for signing contracts. IEEE Transaction on Information Theory, 36(1):40–46, Jan. 1990. 454 [9] D. Boneh and M. Naor. Timed commitments. In Advances in Cryptology: Proceedings of Crypto 2000, volume 1880 of Lecture Notes in Computer Science, pages 236–254. Springer-Verlag, 2000. 452 [10] C. Boyd and E. Foo. Off-line fair payment protocols using convertible signatures. In Advances in Cryptology: Proceedings of Asiacrypt’98, volume 1514 of Lecture Notes in Computer Science, pages 271–285. Springer-Verlag, 1998. 457 [11] T. Coffey and P. Saidha. Non-repudiation with mandatory proof of receipt. ACMCCR: Computer Communication Review, 26, 1996. 452 [12] S. Even, O. Goldreich, and A. Lempel. A randomized protocol for signing contracts. Communications of the ACM, 28(6):637–647, June 1985. 453, 454 [13] M. Franklin and G. Tsudik. Secure group barter: Multi-party fair exchange with semi-trusted neutral parties. Lecture Notes in Computer Science, 1465, 1998. 452 [14] M. K. Franklin and M. K. Reiter. Fair exchange with a semi-trusted third party. In 4th ACM Conference on Computer and Communications Security, pages 1–5. ACM Press, Apr. 1997. 456 [15] J. A. Garay, M. Jakobsson, and P. MacKenzie. Abuse-free optimistic contract signing. In Advances in Cryptology: Proceedings of Crypto’99, volume 1666 of Lecture Notes in Computer Science, pages 449–466. Springer-Verlag, 1999. 455, 457, 458, 459 [16] F. C. G¨ artner, H. Pagnia, and H. Vogt. Approaching a formal definition of fairness in electronic commerce. In Proceedings of the International Workshop on Electronic Commerce (WELCOM’99), pages 354–359, Lausanne, Switzerland, Oct. 1999. IEEE Computer Society Press. 454, 459 [17] Y. Han. Investigation of non-repudiation protocols. In ACISP: Information Security and Privacy: Australasian Conference, volume 1172 of Lecture Notes in Computer Science, pages 38–47. Springer-Verlag, 1996. 452 [18] M. Jakobsson. Ripping coins for fair exchange. In L. C. Guillou and J.-J. Quisquater, editors, Advances in Cryptology: Proceedings of Eurocrypt’95, vol-
464
[19]
[20]
[21]
[22] [23]
[24]
[25]
[26]
[27]
[28]
[29]
[30]
[31]
[32]
[33]
Olivier Markowitch et al. ume 921 of Lecture Notes in Computer Science, pages 220–230. Springer-Verlag, 21–25 May 1995. 453 S. Kremer and O. Markowitch. Selective receipt in certified e-mail. In Advances in Cryptology: Proceedings of Indocrypt 2001, Lecture Notes in Computer Science. Springer-Verlag, Dec. 2001. 452 O. Markowitch and Y. Roggeman. Probabilistic non-repudiation without trusted third party. In Second Conference on Security in Communication Networks’99, Amalfi, Italy, Sept. 1999. 452, 462 O. Markowitch and S. Saeednia. Optimistic fair-exchange with transparent signature recovery. In 5th International Conference, Financial Cryptography 2001, Lecture Notes in Computer Science. Springer-Verlag, 2001. 457 S. Micali. Certified E-mail with invisible post offices. Available from author; an invited presentation at the RSA ’97 conference, 1997. 452 B. Pfitzmann, M. Schunter, and M. Waidner. Optimal efficiency of optimistic contract signing. In Proceedings of the Seventeenth Annual ACM Symposium on Principles of Distributed Computing, pages 113–122, New York, May 1998. ACM. 455, 456, 461 A. W. Roscoe. Intensional specifications of security protocols. In Proceedings of the 9th IEEE Computer Security Foundations Workshop, pages 28–38. IEEE Computer Security Press, 1996. 452 P. Syverson. Weakly secret bit commitment: Applications to lotteries and fair exchange. In Proceedings of the 1998 IEEE Computer Security Foundations Workshop (CSFW11), June 1998. 453 P. Syverson. Weakly secret bit commitment: Applications to lotteries and fair exchange. In Proceedings of the 1998 IEEE Computer Security Foundations Workshop (CSFW11), pages 2–13, June 1998. 454 T. Tedrick. How to exchange half a bit. In D. Chaum, editor, Advances in Cryptology: Proceedings of Crypto’83, pages 147–151, New York, 1984. Plenum Press. 452 T. Tedrick. Fair exchange of secrets. In G. R. Blakley and D. C. Chaum, editors, Advances in Cryptology: Proceedings of Crypto’84, volume 196 of Lecture Notes in Computer Science, pages 434–438. Springer-Verlag, 1985. 452 H. Vogt, H. Pagnia, and F. C. G¨ artner. Modular fair exchange protocols for electronic commerce. In Proceedings of the 15th Annual Computer Security Applications Conference, pages 3–11, Phoenix, Arizona, Dec. 1999. IEEE Computer Society Press. 454, 456 J. Zhou, R. Deng, and F. Bao. Evolution of fair non-repudiation with TTP. In ACISP: Information Security and Privacy: Australasian Conference, volume 1587 of Lecture Notes in Computer Science, pages 258–269. Springer-Verlag, 1999. 456 J. Zhou, R. Deng, and F. Bao. Some remarks on a fair exchange protocol. In Proceedings of 2000 International Workshop on Practice and Theory in Public Key Cryptography, volume 1751 of Lecture Notes in Computer Science, pages 46–57. Springer-Verlag, Jan. 2000. 454 J. Zhou and D. Gollmann. A fair non-repudiation protocol. In Proceedings of the IEEE Symposium on Research in Security and Privacy, Research in Security and Privacy, pages 55–61. IEEE Computer Society,Technical Committee on Security and Privacy, IEEE Computer Security Press, May 1996. 452, 456 J. Zhou and D. Gollmann. An efficient non-repudiation protocol. In Proceedings of The 10th Computer Security Foundations Workshop, pages 126–132. IEEE Computer Society Press, June 1997. 456
On Fairness in Exchange Protocols
465
[34] J. Zhou and D. Gollmann. Evidence and non-repudiation. Journal of Network and Computer Applications, 20:267–281, 1997. 456
A Model for Embedding and Authorizing Digital Signatures in Printed Documents Jae-il Lee1 , Taekyoung Kwon2 , Sanghoon Song2 , and Jooseok Song3 1
Korea Information Security Agency, Seoul 138-803, Korea 2 Sejong University, Seoul 143-747, Korea 3 Yonsei University, Seoul 120-749, Korea
Abstract. It is a desirable feature in a public key infrastructure (PKI) to include the signature information in a printed document for authenticity and integrity checks, in a way to bind an electronic document to the printed document. However, it is not easy to preserve the digital signature in the printed document because the digital signature is for the text code (or the whole document file), not for the text image (which can be scanned optically) in printed form. So, we propose a practical and secure method for preserving the authorized digital signatures for printed documents. We will derive a printable digital signature scheme from the Korean Certificate-based Digital Signature Algorithm (KCDSA) for secure transaction and utilize the dense two-dimensional barcode, QRcode, for printing out the signature and data in a small area within a printed document. Keywords: Computer Security, Distribute Systems Security, Practical Aspects.
1
Introduction
A digital signature is a bit string which associates a message in electronic form with an original signer. It has various kinds of applications in information security, including authentication, data integrity, and non-repudiation. In order to be accepted widely, the digital signature must be verifiable in an authentic manner. A public key infrastructure (PKI) plays an important role in that sense[2]. However, once a digitally signed electronic document is printed out in tangible form, it is not easy to assert the validity of signatures in the printed document. This is because the digital signature is for the text code (or the whole document file), not for the text image (which can be scanned optically) in printed documents. When we consider the wide acceptance of digital signatures in real world application, it seems a desirable feature in the PKI to print out a digitally signed document and verify the authorized signature in the form of either an electronic file or a tangible paper. For this purpose we may consider a simple method of embedding the digital signature into the printed document by using a ubiquitous bar code scheme[8]. However, this could not easily work because the electronic document may involve a specific format in a computer system that could never be expressed in printed form, and optical scanning is not a panacea. P.J. Lee and C.H. Lim (Eds.): ICISC 2002, LNCS 2587, pp. 465–477, 2003. c Springer-Verlag Berlin Heidelberg 2003
466
Jae-il Lee et al.
In this paper, we propose a practical and secure model for preserving digital signatures of electronic and printed documents. Firstly we will design a security model and build a digital signature protocol according to this model. We will derive the printable digital signature scheme from the Korean Certificatebased Digital Signature Algorithm (KCDSA) for secure transaction and utilize the dense two-dimensional barcode, QRcode, for printing out the signature and data in a small area within a printed document[3, 12]. This paper is organized as follows: In Section 2, we will describe a basic model for preserving the authorized signatures of electronic and printed documents. According to this model, a modified version of the KC-DSA will be provided in Section 3, in order to assert the authenticity and integrity of documents in printed form. Subsequently a practical matrix code will be scrutinized in Section 4, so as to embed the printable digital signature in the printed document. Finally we will conclude this paper in Section 5.
2 2.1
Basic Model Overview
The goal of our study is to build a fundamental model for preserving the authorized signatures of electronic documents even in printed document form. In other words, the authenticity imposed on an electronic document should be validated in a similar manner even if the document is transformed into printed form, i.e., a tangible and readable paper. A digital signature can simply be embedded into a printed document in that sense. However, care must be taken when we devise such a scheme because an adversarial attempt can easily be made to transform the electronic document into the printed document. We will define the basic components of our model for embedding and authorizing digital signatures even in printed documents. Our strong assumption is that the PKI is provided, so that every participant is able to acquire an authentic public key of another participant in our model. Definition 1. Our model is a 5-tuple of < E, P, S1 , S2 , M >. – – – – –
E: Entities P: Protocol S1 : Digital signature scheme for electronic documents S2 : Digital signature scheme for printed documents M: Matrix code scheme
The principal entities run a well-defined protocol for generation and verification of digital signatures. As we see, digital signature schemes must be provided for the printed documents as well as the electronic documents, and there must be a distinct relationship between them. The same kind of digital signature schemes may be used for S1 and S2 , with only slight modification. We will discuss further details in Section 3. Finally a matrix code, a kind of bar code, is necessary for embedding the printable digital signature into the printed document. Further details will be provided in Section 4.
A Model for Embedding and Authorizing Digital Signatures
467
B
a
f Server e
b A
C Verifier
Signer c
d D :
trusted
:
untrusted
Prover
Fig. 1. Relation among Principals 2.2
Principal and Adversarial Entities
First we define the following entities who principally participate and run a protocol, called a Digital but Printable Signature Protocol (DPSP), in our model. In addition, two kinds of adversarial entities must be defined in our model. Definition 2. E includes a 4-tuple of < A, B, C, D > as principal entities and a 2-tuple of < E, E > as adversarial entities. – – – – – –
A: Signer who may generate digital signatures. B: Server who may control the verification process. C: Verifier who may verify digital signatures. D: Prover who may attempt to prove digital signatures. E: Passive adversary who may eavesdrop the network. E : Active adversary if E defines a principal entity.
Figure 1 shows the DPSP relation among the principals. As depicted in Figure 1, A first agrees on signature generation with B (see flows a and b) and then sends D a digitally signed document (see flow c). Then, D may print out the document and try to prove its validity to C (see flow d). Finally C verifies the printed signature with B (see flows e and f ). Here are the clear requirements for the relationship of the principal entities. – A trusts B in that B may control the verification process. – A does not need to reveal the primary data1 to B for the trust. 1
In this paper, the primary data means the data that must be approved in terms of authenticity and integrity. Also the primary field means a field in which the primary data is located. Such a field could be the printable text or binary image.
468
Jae-il Lee et al.
– A lets D print out the signed document. – A does not trust C and D in terms of integrity. (A may suspect that C and D could compromise the signed document.) – B trusts A in that A may generate a valid signature. – B does not trust C and D in terms of integrity. (B may suspect that C and D could compromise the signed document.) – C trusts B in that B may control the verification process. – C does not need to reveal the primary data to B for the trust. – C does not trust D. (C may suspect that D could modify or reuse the Asigned document.) – D trusts A in that A may generate a valid signature. – D lets C verify the A-signed document. As we can see, we need to define two kinds of clear paths in our model. They are path A − B − C for controlling a printable signature, and path A − D − C for performing signature transaction. In a sense of concatenating the principal entities in our model, the path A − B − C must be provided with secrecy on a document and the path A − D − C with integrity of the document. In other words, A and C should not reveal the document to B on path A−B −C, while A and C should not allow D to modify the document on path A − D − C. All entities, A, B, C, and D, are considered to be separated in the above requirements. In order to assert flexibility, however, we can consider the following cases in our model. – Each entity is separated. In this case, the above requirements are completely applied. – A and B are the same entity. In this case, C must trust A as it did B in the above. – A and D are the same entity. – A, B, and C are the same entity. 2.3
Basic Protocol
A protocol P is called the DPSP, and can be defined in general as shown in Figure 1 above. Definition 3. P is composed of signature generation, signature transaction, and signature verification. 1. Signature generation (a) Make an electronic document. (b) Select primary fields. (c) Generate a data matrix code for the primary fields. (d) Generate a signature matrix code for the data matrix code. (e) Insert the matrix codes into the electronic document. (f) Generate a digital signature for the complete document. (g) Attach the digital signature to the document.
A Model for Embedding and Authorizing Digital Signatures
469
Signature Matrix Data Matrix
Digital Signature Primary Original
Printable
Fields
Document
(a)
Signature
Signature
Generation
Generation
Signature Generation
Scan
Signature Verification
Scan Printed Document (b)
Signature Verification (Printed)
Fig. 2. Signature Generation and Verification
2. Signature transaction (a) Verify the digital signature of the electronic document. (b) Print out the digitally signed document. 3. Signature verification (a) Scan the matrix codes of the printed document. (b) Verify the primary fields of the data matrix code. (c) Verify the digital signature of the primary fields. The signature generation corresponds to a and b in Figure 1 and the signature verification to e and f . Similarly the signature transaction corresponds to c and d in Figure 1. The primary fields mean chosen data that must be signed in an authentic manner. The whole data may not need to be signed in general, so that we propose to generate a printable signature on the chosen data only, as depicted in Figure 2. The primary fields must be clearly marked in the original document, so that a verifier can easily detect the primary fields in the printed document on verifying them after scanning the data matrix. Note that the data matrix should be scanned for showing the signed primary field data on verifier’s display. The data could be displayed, so long as the signature is correctly verified. In that sense, the data verification might be performed manually while the signature verification should be performed automatically. The automatic comparison of the printed and displayed primary fields is not considered in this paper. As for
470
Jae-il Lee et al.
choosing primary fields, it would be advantageous to a verifier to make location information involved in a data matrix. That means one encodes the primary fields with the location information of each, for example, a 3-tuple of {data, x − location, y − location}. We omit the details in this paper. Note that it is optional to choose the primary fields. One can generate a signature on the whole data of the document. The data matrix code must encode the primary fields along with the document identity information. Finally, the signature matrix code must encode the printable signature on the data matrix code. Both matrix codes should be made printable on the original document. Figure 2 summarizes these concepts. 2.4
Signature Schemes
Any kinds of digital signature schemes can be used for S1 and S2 . However, the chosen scheme must carefully be considered and modified so as to satisfy the following requirements on S2 . Note that this is the reason why we clearly separated S1 and S2 in our model definition. 1. A should not reveal the primary data to B on signature generation. 2. D should not be able to modify the data matrix on signature transaction. 3. D should not be able to modify the signature matrix on signature transaction. 4. D should not be able to reuse the signed document unless it is permitted. 5. C should not reveal the primary data to B on signature verification. 6. C should not be able to modify the data matrix after signature verification. 7. C should not be able to modify the signature matrix after signature verification. 8. C should not be able to reuse the signed document unless it is permitted. We have given the critical requirements for the printable signature scheme, S2 , in our model. We will introduce a carefully derived scheme, P-KCDSA (Printable KCDSA), in Section 3.2. 2.5
Matrix Codes
A well-chosen encoding and code representation method must be used in our model. As we mentioned already, the matrix code, M, is defined as follows. Definition 4. M is a 2-tuple of < MD , MS >. – MD : Data matrix that encodes the primary data fields and the printed document specific information, for example, a document serial number and an expiration date. – MS : Signature matrix that encodes the printable digital signature of the data matrix. We will introduce the chosen schemes in Section 4.
A Model for Embedding and Authorizing Digital Signatures
3
471
Digital Signatures
In this paper, we consider the KCDSA as a possible instance of the DPSP for practical use. The other established signature schemes could be considered for the same purposes. 3.1
KCDSA
The KCDSA (Korean Certificate-based Digital Signature Algorithm) is one of the ElGamal-type signature schemes in which security is based on the hard problem of finding discrete logarithms over finite fields[3]. Two famous variants of the ElGamal signature scheme include the DSS (Digital Signature Standard) and GOST 34.10[6]. Readers are referred to [3] for the details of the KCDSA. In this paper we utilize the KCDSA as a base signature scheme for S1 and S2 . For S1 we can utilize the KCDSA without any modification. However, we have to reconsider and modify it carefully in order to run S2 . Remember that the S2 specific requirements were necessary in our model (See Section 2.4). 3.2
Printable KCDSA
We call the modified version of the KCDSA the Printable KCDSA, and abbreviate it to P-KCDSA. That is, the P-KCDSA is a slightly modified version of KCDSA satisfying the S2 specific requirements described in Section 2.4. Figure 3 depicts the P-KCDSA message flows. In the figure, a message parenthesized by { and } is assumed to be a message encrypted under the recipient’s public key for confidentiality, while a message parenthesized by [ and ] is assumed to be a message with a MAC (message authentication code) or a digital signature for integrity. In that sense, the primary feature of our protocol in P-KCDSA must be guaranteeing the integrity of r and s in printed form because they are not encrypted nor digitally signed as shown in Figure 3. The P-KCDSA is as follows.
Parameter Setup. Client entity A should do the following things for choosing user parameters: 1. Select a large prime p such that |p| = 512 + 256i where i = 0, · · ·, 6. We denote the bit-wise length by | |. 2. Select a prime q such that |q| = 128 + 32j where j = 0, · · ·, 4 with the property that q|(p − 1). 3. Select a generator g of the unique cyclic subgroup of order q in Zp∗ such that g = α(p−1)/q mod p for an element α ∈ Zp∗ . 4. Select an integer x at random from Zq∗ . Note that x is a private key in electronic KCDSA. −1 5. Compute a public key y such that y = g x mod p. 6. Acquire a certificate from a CA (Certificate Authority). 7. Compute a hash value of the certificate such that z = h(CertData). Here we denote by CertData the signer’s certificate data.
472
Jae-il Lee et al.
B
{r,s}
{pi} Server [r]
[s’] A
C Verifier
Signer r,s’
[r,s’] (in electronic form)
D
(in printed form)
:
trusted
:
untrusted
Prover
Fig. 3. Printable KCDSA Message Flows Signature Generation. We assume A signs a binary message m of arbitrary length. Here m means the information encoded in the data matrix, i.e., the primary fields and the document specific data. Entity A should do the following: 1. 2. 3. 4.
Select random secret integer k such that 0 < k < q. Compute r = h(g k mod p). Compute s = x(k − r ⊕ h(z, m)) mod q. Submit (r, s) and the certificate to a trusted server B in a confidential manner.
Then entity B should do the following: 1. 2. 3. 4.
Select random secret integer t such that 0 < t < q. Compute s = s + t mod q. Send s to A in an authentic manner. Compute and store a verification permit π such that π = y −t mod p along with r.
A’s printable signature for m is (id, r, s ). We can say s is an encrypted form of s while the transient key t may be shared by A and B. Note that t must be a private key while the corresponding public key g −t (= π) will be provided to C in encrypted form in the next transaction. Also note that A and B could exchange a count C for restricting the number of verifiers (see below). Finally A should put it into the document by encoding it in a signature matrix code. Signature Verification. To verify A’s signature (id, r, s ) on m, entity C should do the following:
A Model for Embedding and Authorizing Digital Signatures
1. 2. 3. 4.
473
Obtain the user’s certificate. Verify its authenticity; if not, then reject the signature. Verify that 0 < r < q and 0 < s < q; if not, then reject the signature. Request a verification permit to B by sending r in an authentic manner.
Then entity B should send the permit π to C in a confidential manner if it is allowed. Note that if B exchanged C with A above, B should decrease it when (s)he gives the permit to a verifier. If C is equal to zero, B should deny sending the permit. Though we utilized r as an index for maintaining π, one can devise more concrete scheme for the purpose. Finally C should do the following: 1. 2. 3. 4. 5. 3.3
Compute z = h(CertData) Compute u = r ⊕ h(z, m) mod q. Compute w = y s mod p. Compute v = h(wπg u mod p). Accept the signature if and only if v = r. Analysis
We assume the KCDSA is secure in the random oracle model[3]. In that sense, a passive adversary, E, is not given any verifiable information on signature transaction. On the basis of the security of KCDSA, we will examine how the PKCDSA satisfies the S2 specific requirements described in Section 2.4, as a way of removing the threats of an active adversary, E . Note that an active adversary who impersonates C is denoted by C . Assuming the respective private keys of A and B are safe, we consider C and D only in the following examination. 1. A should not reveal the primary data to B on signature generation. – A sends B the initial signature r and s only in the P-KCDSA. – B is not able to acquire any information on m from r(= h(g k mod p)) and s(= x(k − r ⊕ h(z, m)) mod q) only. 2. D should not be able to modify the data matrix on signature transaction. – The information in the data matrix is all signed by A and encoded in the signature matrix. – D and D are not able to generate a new signature of A on signature transaction without having x. 3. D should not be able to modify the signature matrix on signature transaction. – The signature matrix contains r and s rather than s, so that D cannot even verify the signature on signature transaction. – D and D cannot remove t from s (= s + t mod q) without having previous knowledge of s or t. In fact, D cannot acquire t from s because D cannot decrypt out s from {r, s}. 4. D should not be able to reuse the signed document unless it is permitted. – In order to use the printable signature, an intervention of B is always required in the protocol. That means a verifier C should ask B for a permit on signature verification.
474
Jae-il Lee et al.
5. C should not reveal the primary data to B on signature verification. – C gives B the information r only on signature verification. – B is not able to derive m from r(= h(g k mod p)). 6. C should not be able to modify the data matrix after signature verification. – For the same reasons as in 2 above. 7. C should not be able to modify the signature matrix after signature verification. – D cannot derive t from π(= g −t mod p) because of the hardness of solving the discrete logarithm problem. – D cannot decrypt out t from s because of encryption on s. – D and D cannot remove t from s . 8. C should not be able to reuse the signed document unless it is permitted. – In order to use the printable signature, an intervention of B is always required. That means another verifier C ∗ should ask B for a permit on signature verification.
4 4.1
Embedding Digital Signatures with 2D Bar Code 2D Bar Code
Bar code and human-readable text are often printed together, so little additional cost is associated with the inclusion of a bar code symbol. A bar code scanner can extract all of the information by scanning through a conventional bar code symbol. Because of the simplicity in data entry, bar code has become the dominant automatic identification technology[8]. Two-dimensional codes provide much higher information density than conventional bar codes. Due to the low information density, conventional bar codes usually function as keys to databases. However, the increased information density of 2D bar codes enables the applications that require encoding of explicit information rather than a database key. A 2D bar code symbol can hold up to about 4,300 alphanumeric characters or 3,000 bytes of binary data in a small area [12]. With the immense storage capacity, the development of 2D bar codes enables the data exchange under off-line condition[8]. The 2D bar code may work as a portable data file because the information can be received without access to a database. The 2D bar codes also have an excellent data restoration capability for a damaged symbol. EDI(Electronic Data Interchange) has been proposed as a solution for quick exchange of large amounts of data in business. However, EDI faces serious practical limitations in an area with unreliable communication network or without network connection. A potable data file containing detailed information can be an alternative to EDI. Many business models have been developed using 2D bar codes in the fields of logistics, construction, automobile, semiconductors and chemicals. There are four widely used 2D bar codes that are ISO standard: PDF417, DataMatrix, QRcode and Maxicode. QRcode(Quick Response code) is particularly developed for high data capacity, reduced printing space, and high speed reading[12].
A
-.. .
Model for Embedding and Authorizing Digital Signatures
4,
"---:Ti ImII.wd;!
Position qetection Pattern dl
I .I.
I
.
- . 1
I
.
.
I.
Cell
Data Area
Fig. 4. QRcode Structure
4.2
QRcode
QRcode is a 2D matrix symbol which consists of square cells arranged in a square pattern. It allows three models - Alodel 1, Alodel 2, and MicroQR. Model 1 aiid Alodel 2 each have a position detection pattern in three corners while the AlicroQR has it in only one corner. The position detection pattern allows code readers to quickly obtain the symbol size, position and tilt. Model 2 is developed for enhanced specification with improved position correction and large volume of data capacity. MicroQR inodel is suitable for small amounts of data. A QRcode syinbol can encode up t o 7,089 characters (numeric data), 4,296 alphaiiuineric characters, aiid 2,953 8-bit bytes[l2]. The symbol size is determined by the number of characters t o encode. It can grow by 4 cells/side from 21x21 cells t o 177x177 cells. The physical size of a symbol is determined by the cell pitch. The minimum cell pitch is the width of the smallest printed element that can also be resolved by the reader. With current printing and reader technology, the iniiiiinum cell pitch can be as low as 0.lmm and the signature data of 1024-bit(128 bytes) can be printed in an area less than 10 mm sq[l4]. The Qrcode employs a Reed-Solomon algorithm to detect and correct data errors due t o a dirtied or damaged area. There are four levels of error-correction capability that users can select. The error-correction level determines the inaxiinum recoverable rate that is from 7 Embedding the digital signature in printed docuineiits will simplify the workflows that require verifying the authenticity and integrity of documents. One example of a promising recent application is an electronic notice system usiiig a 2D bar code in mobile phones. When we buy a ticket for a movie in advance, we can get the receipt in 2D bar code with the mobile phone. The primary data of the 2D bar code in the receipt contains information about the viewer's name, the name of the movie, the data aiid time, seat number, etc. The theater can verify the 2D bar code receipt in a mobile phone using a system with 2D bar
476
Jae-il Lee et al.
Sample QRcode
i
Cell Pitch
Code Contents
=
@.25mm, ECC
=
M(l5%)
Fig. 5. QRcode Samples code scanner or may be equipped with a kiosk that issues a paper ticket for the 2D bar code receipt. If the mobile phone is not available, we can print the 2D bar code receipt on plain paper and bring it with a personal ID card t o the theater. As another example, we can apply t o a transcript service in a university. When a university graduate needs his official transcript for job applications, he inay request the university to send his official transcript to him on line. He then prints the received electronic transcript on plain paper instead of the paper with the university official seal. The university embeds the primary data and signature data in printable 2D bar codes so that any third party can verify the authenticity and integrity of the printed transcript. The primary data has all the grade inforination including his personal information, such as name, student id, and birthday. If all the primary data cannot fit into one small 2D bar code, we can either make several additional 2D bar codes on one page or increase the size of the 2D bar code to hold more data.
5
Conclusion
In this paper, we proposed a practical and secure model for embedding and authorizing digital signatures in printed documents. For this purpose, we carefully derived a printable signature scheme froin the KCDSA, and selected an appropriate matrix code scheme. In a future study, we will implement the proposed model and analyze its applicability in more detail. When a digitally signed document is printed out in a human-readable text image, it is useful to include the signature information in the text image for authenticity and integrity checks. With the development of dense 2D bar codes, we
A Model for Embedding and Authorizing Digital Signatures
477
can put the digital signature in 2D bar code form into a small area of the printed document. Also, we can include several hundreds or thousands of alphanumeric characters in a small 2D bar code. We have proposed a practical and secure method to preserve authorized digital signatures in printed documents. The proposed model utilized KCDSA for secure transaction and the dense QRcode for printing out the signature and primary data in a small area.
Acknowledgement We would like to thank anonymous referees for their invaluable comments on this work. Also we would like to express our deep appreciation to the committee members of ICISC 2002.
References [1] H. E. Burke, “Handbook of bar Coding Systems,” Van Nostrand Reinhold, New York, N. Y., 1984. [2] T. Kwon, “Digital signature algorithm for securing digital identities,” Information Processing Letters, Vol. 82, Iss. 5, pp.247-252, May 2002. 465 [3] C. Lim, “A study on the proposed Korean digital signature algorithm,” Advances in Cryptology-ASIACRYPT’98, LNCS 1514, Spinger-Verlag, pp.175-186, 1998. 466, 471, 473 [4] S. Lin and D. J. Costello Jr., “Error Control Coding, Fundamentals and Applications,” Prentice Hall, Englewoo Cliffs, N. J., 1983. [5] A. Longacre, Jr., “Stacked Bar Code Symbologies,” Identification J., Vol. 11, No. 1, Jan./Feb., pp. 12-14, 1989. [6] M. Michels, D. Naccache and H. Pertersen, “GOST 34.10 - A brief overview of Russia’s DSA,” Computer Security, Vol. 15, No. 8, pp.725-732, 1996. 471 [7] NIST, “Digital signature standard,” Federal Information Processing Standards Publication 186, 1994. [8] Roger. C. Palmer, “The Bar Code Book,” Helmers Publishing, Peterborough, N. H., 3rd Ed., 1995. 465, 474 [9] Theo Pavlidis, Jerome Swartz, and Ynjiun P. Wang, “Fundamentals of Bar Code Information Theory,” IEEE Computer, Vol. 23, No. 4, pp.74-86, April 1990. [10] Y. P. Wang and T. Pavlidis, “Optimal Correspondence of String Subsequences,” IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. PAMI-12, No. 11, pp. 1080-1087, Nov. 1990. [11] Y. P. Wang, “PDF417 Specification, ” Symbol Technologies, Boemia,N. Y., 1991, [12] “QRmaker: User’s Manual,” Denso Corporation, Aichi, Japan, 1998. 466, 474, 475 [13] “A Business Case Study QRcode,” Denso Wave Inc., Kariya, Japan, 2001. [14] “2D Code Solution,” Sunwoo Information Inc., Seoul, Korea, 2002. 475
A Dynamic Group Key Distribution Scheme with Flexible User Join Hartono Kurnio1 , Luke McAven1 , Rei Safavi-Naini1 , and Huaxiong Wang2 1
Centre for Computer Security Research School of Information Technology and Computer Science, University of Wollongong Wollongong, NSW 2522, AUSTRALIA {hk22,lukemc,rei}@uow.edu.au 2 Department of Computing, Macquarie University Sydney, NSW 2109, AUSTRALIA [email protected]
Abstract. Group key distribution systems (GKDS) provide access control for group applications by maintaining a group key for a dynamic group of users U. During the system lifetime subgroups of U be formed and group keys can be established for each such subgroup. The group U can also be expanded by admitting new users. Dynamic GKDS allow the group management to be decentralised. That is, any group member can form a subgroup and collaboration of several group members may admit new members to the system. We introduce a novel property for dynamic GKDS: allowing specified subsets of users, called access sets, to admit new members to the group. We provide an access structure made up of access sets, where cooperation of a full access set is required to exact admission. This provides a mechanism for self–sufficient, size dynamic and hierachical groups, in the sense of only allowing specified sets of users to admit new members. We give a model and two dynamic GKDS with this property. The first is a threshold scheme, any user subset of a specified size is an access set. The second scheme limits the access structure to chosen sets of up to some size. We also give a variant of the second scheme to have better efficiency. We show that our proposed GKDS are consistent and secure. We evaluate the efficiency of the schemes.
1
Introduction
The importance of group key distribution schemes (GKDS) has grown considerably with the wide spread use of group applications such as pay-TV, pay per view and teleconferencing. GKDS provide access control for the applications by maintaining a group key for an authorised group of users. Group management in traditional group key distribution systems is centralised. There is a fixed group controller (group manager), GC, and an initial set of users, U. GC initialises the system by generating and securely delivering individual keys to the users. After initialisation, GC establishes authorised groups by broadcasting a message allowing users of the target subgroup to compute P.J. Lee and C.H. Lim (Eds.): ICISC 2002, LNCS 2587, pp. 478–496, 2003. c Springer-Verlag Berlin Heidelberg 2003
A Dynamic Group Key Distribution Scheme with Flexible User Join
479
a common group key. Most literature on GKDS focuses on efficient methods of forming authorised groups [1, 10, 17, 21, 22, 26]. It is usually assumed that U is static, or admission requires a new user to go through an initialisation process similar to other users of U to obtain unique key information. Dynamic GKDS allow the group manager to be dynamic and any group member to assume the role of the GC. That is, after system initialisation each group member can form a subgroup by sending a single broadcast message. Moreover, collaborations of group members may admit new members to the system. This is achieved by securely sending data to a new user such that the new user can compute key information. Systems with the decentralised property have higher reliability since they function without a GC. Allowing a group member to independently form a subgroup has many applications. For example in dynamic teleconferences, where users may wish to transmit data to a subgroup of users. If the fixed group controller model is used, then either all communication from users would be through the group controller and from there broadcast to the designated group, or numerous group keys must be established. These solutions have several drawbacks such as single point of failure, communication overhead for the group controller and communication delay. These problems do not exist in dynamic GKDS. Allowing users to join without the GC is essential for flexible and reliable group operations. For example, new recruits may urgently require to join a group when the director (GC) is absent. Having a system where several managers can cooperate to grant membership is then helpful. Nevertheless this mechanism must guard against abuse by corrupt group members. One solution requires several specified trusted managers to cooperate to grant membership. The main security requirement of subgroup establishment is that revoked users in a session cannot learn anything about the group key. If users are to be permanently removed from the group (they are not exist in the rest of the system lifetime), they should not be able to find any future group keys. One possible way to satisfy this requirement is to include them in the set of revoked users in all future sessions. In adding users, the new member should not learn group keys of previous sessions. These two security requirements are often referred to as forward secrecy and backward secrecy respectively. 1.1
Our Contribution
We introduce a new model which allows any group member to form a subgroup of existing users, and specified subsets of group members, called access sets, to admit new users to the group. Cooperation of users in an access set can admit a new user but arbitrary subsets of users cannot do so. This provides flexibility in admitting new users and security in the sense of ensuring that the new members have the approval of specified subsets of users. A group key is only computable by members of an authorised subgroup. A collusion of users outside the subgroup will have no information about the
480
Hartono Kurnio et al.
key, even if all messages broadcasted through the lifetime of the system are available. We propose a construction that uses symmetric polynomials over finite fields, and cumulative arrays. We present a dynamic GKDS with a threshold admission structure where the minimal access structure consists of all sets of t+1 users. We give a second scheme with an arbitrary admission structure, where the collection of access sets also includes some defined access sets of size at most t. We also give a variant of the second scheme with better efficiency. The schemes allow subgroup establishment of size at least |U| − t users and are secure against a collusion of at most t users, assuming any access set is not a subset of the collusion. Efficiency of the schemes is a function of t. We show our proposed dynamic GKDS satisfy the flexibility and security requirements, and we evaluate their efficiency. 1.2
Related Work
The problem of establishing group keys has been studied in several contexts. In key predistribution schemes (KPS) [18], a trusted authority distributes private information to a set of users so, later, each user of an authorised subset can compute a common group key. In a (t, n) KPS any group of t out of n users can compute a common key. A (t, n) KPS is called k-secure if keys are secure against collusions of up to k users. Blom [2] constructed a KPS using symmetric polynomials over finite fields that are information-theoretically optimal [3]. Constructions of KPS using key distribution patterns [20] were proposed in [26]. Fiat and Naor [10] introduced broadcast encryption schemes (BES) which enable a single source to securely broadcast to an arbitrary and dynamically changing subset of users from a user universe U. This mechanism can be used to establish a group key for an authorised subset. There is a set P ⊆ 2U of subsets of authorised users and a set F ⊆ 2U of subsets of unauthorised users. The scheme enables a center to securely broadcast data to A ∈ P while preventing collusions B ∈ F, where A ∩ B = ∅, from finding the data. In a (t, n) BES, P consists of all t-subsets of users, while F is the set of subsets of size at most t− 1. Efficiency improvements to BES were considered in [17, 26], and generalisations can be found in [14, 15, 22]. Some schemes [15, 22] consider user revocation by a dynamic controller, using key distribution patterns and perfect hash families. Wallner et al [27] proposed logical key tree schemes for multicast groups. The scheme uses a logical tree to allocate keys to users and the tree root corresponds to the group key. The tree structure enables a GC to form short broadcast messages to update the group key (called re-keying) for a group of authorised users. The scheme is primarily designed to revoke a single user but can be repeatedly used to remove multiple users and form arbitrary groups. The system is secure against any collusion size. Efficient schemes, with low communication and user storage costs, have been considered in [5, 19] using one-way functions and pseudo-random generators. User addition is performed by the GC giving keys with respect to the tree structure and using the re-keying protocol to establish a group key for the enlarged group. The schemes in [13, 15] use logical key tree and Diffie-Hellman key exchange to provide decentralised user revocation
A Dynamic Group Key Distribution Scheme with Flexible User Join
481
and user join operations. Schemes based on generalisation of Diffie-Hellman key exchange were proposed in [25]. Anzai et al [1] and Naor et al [21] independently used Shamir’s secret sharing [23] and Diffie-Hellman key exchange [9] to revoke up to t of n users and establish a group key for the remaining users. In this system the system secret is divided into n shares and each user holds a share. The GC keeps the polynomial of degree t + 1 used for generating the shares. Revocation involves broadcasting shares belonging to revoked users. Each authorised user combines his share with the broadcast to obtain the group key, while collusion of all revoked users cannot do so. The scheme in [21] requires a GC to perform user revocation, while the scheme in [1] has the decentralised property where any authorised user can establish a group key with the existance of public keys. In the schemes user addition is possible but requires a GC to give shares to new users. The rest of this paper is organised as follows. Section 2 presents our dynamic GKDS model. Section 3 describes some cryptographic tools employed in our construction. Section 4 gives the basic construction of the proposed dynamic GKDS using symmetric polynomials. Section 5 extends the basic dynamic GKDS using cumulative arrays to allow access sets to be defined. Section 6 gives a variant of the extended scheme to have better efficiency. Section 7 proves security of the proposed schemes. Additional discussions are given in section 8.
2
Model
Let the set N denote the universe of all users, and let K denote the set of all possible secret information in the system. The system lifetime consists of consecutive sessions, S = {S1 , S2 , . . . , SM }, with the corresponding membership events, E = {E1 , E2 , . . . , EM }. One membership event occurs in each session. In session Sl , there is a group of users Ul = {U1 , U2 , . . . , Unl } ⊆ N , and each user Ui ∈ Ul holds secret information K(Ui ) ⊂ K. The index i is a unique identifier for user Ui . There is also a collection of subsets on Ul , Γl = {A1 , . . . , Aml }. The collection is called the access structure and the subsets are called access sets. There are two types of membership events in the system: SUBGROUP and JOIN. In session Sl , a membership event El ∈ {SUBGROUP, JOIN} is invoked, resulting in a new session Sl+1 . The operations involved in the membership events are shown in tables 1 and 2, for SUBGROUP and JOIN respectively. In the SUBGROUP event, a subset Gl in Ul wants to have secure communication within Gl . This event is invoked by a member of Gl , Uzl ∈ Gl , called the group initiator and results in a group key GKl shared by Gl . The group initiator broadcasts a subgroup message MGl , and each user Ui ∈ Gl calculates the group key GKl using the broadcast message MGl and their secret information K(Ui ). In the JOIN event, a set of new users Jl joins the group Ul resulting in group Ul+1 in session Sl+1 . This event is invoked by an access set, Azl ∈ Γl , called sponsors in this context and results in secret information K(Ui ) being given to all new users Ui ∈ Jl . Each sponsor Ukl ∈ Azl transmits a join message
482
Hartono Kurnio et al.
Table 1. Operation in SUBGROUP event If El = SUBGROUP Input: Ul , Gl ⊆ Ul , Uzl ∈ Gl MG
l
Process: Uzl − − − − − − −−→ Gl Ui ∈ Gl does computation on MGl and K(Ui ) Output: Gl share a group key GKl Ul+1 = Ul
Table 2. Operation in JOIN event If El = JOIN Input: Ul , Jl ⊆ N \ Ul , Azl ∈ Γl MJ (Ui ) l
Process: Ukl ∈ Azl − − − − − − −−→ Ui ∈ Jl Ui ∈ Jl does computation on MJl (Ui ) Output: Ui ∈ Jl obtains secret information K(Ui ) Ul+1 = Ul ∪ Jl
MJl (Ui ) containing the secret information over a secure unicast channel to each new user Ui ∈ Jl . This allows the new user to obtain their secret information K(Ui ) and become a part of any subgroup in future sessions. A subsection of possible lifetime behaviour is given in figure 1, with events (..., JOIN, SUBGROUP, SUBGROUP, JOIN, ...). We assume there is a trusted authority, called the group controller GC, that initialises the system, and an initial group of users U0 ⊆ N . The GC generates system parameters and publishes necessary information. Each user Ui ∈ U0 gets secret information K(Ui ) from the GC through a secure unicast channel. The GC is responsible only for these tasks, and after system initialisation there is no further role for the GC. All membership events during the system lifetime are performed by group initiators and sponsors as described above. Security Model We consider a collusion C of t passive and computationally bounded adversaries that have access to secret information K(C) = U∈C K(U ), and all broadcast messages in the system. We assume the collusion does not contain any access Sl . . . ✲ Ul
Sl+1
Sl+2
Sl+3
✲ Ul+1 ✲ Ul+2 ✲ Ul+3 ✲ . . . ✻ ✻ ✻ ✻ ❣El ❣El+3 El+1 El+2 ✻ ✞ ✞ ❄ ✞ ❄ ✞ ✻ Gl+1 ✆ ✝ Gl+2 ✆ ✝ Jl+3 ✆ Jl ✆ ✝ ✝
Fig. 1. A sequence of events in the system
A Dynamic Group Key Distribution Scheme with Flexible User Join
483
set, that is C ⊇ A for all A ∈ Γl . The collusion tries to obtain the group key GK of sessions that the adversaries are not members of. There are security requirements for SUBGROUP and JOIN operations. 1. Subgroup Secrecy – For any session Sl : S1 ≤ Sl ≤ SM , El = SUBGROUP, a collusion C ⊆ Ul \ Gl cannot find the group key GKl . 2. Join Secrecy – For any session Sl , where S1 ≤ Sl ≤ SM , a collusion C ⊆ Ja : Sl ≤ Sa ≤ SM , Ea = JOIN cannot find the group key GKb : S1 ≤ Sb ≤ Sl , Eb = SUBGROUP. Observe that the collusion considered in (i) Subgroup Secrecy consists of users of Ul that are not in the subgroup Gl and (ii) Join Secrecy consists of new users of subsequent sessions after Sl . The security definitions implicitly subsume attacks from outsiders who only have access to the broadcast messages. We do not consider authentication issues in this paper and assume all communication channels are authentic. This means all messages are digitally signed by the sender using some sufficiently strong cryptographic tools, and all receivers are required to verify the authenticity of all received messages.
3
Cryptographic Tools
CUMULATIVE SCHEME. Cumulative schemes, first defined in [24], have been used [6, 11] to construct secret sharing schemes for arbitrary access structures. Let Γ be a monotone access structure on a set U. A cumulative scheme for Γ is a map α : U → 2F , where F is a finite set, such that for any A ⊆ U, αi = F ⇐⇒ A ∈ Γ, Ui ∈A
where αi = α(Ui ). We can represent the scheme as a |U| × |F | cumulative array C(Γ ) = [cij ], where row i is indexed by Ui ∈ U and column j is indexed by an element fj ∈ F, and where each entry cij is either 0 or 1 such that cij = 1 ⇐⇒ fj is given to Ui . The cumulative scheme of [24] is as follows. Let Γ − = A1 + . . . + Am be the minimal form of the monotone access structure Γ . That is, ∀Ak1 , Ak2 ∈ Γ − and = Ak2 , Ak1 ⊂ Ak2 . The dual access structure Γ ∗ = B1 + . . . + Bv is obtained Ak1 by interchanging sum and product in the boolean expression for Γ − . Let the finite set be F = {f1 , . . . , fv }. Then α : U → 2F determines αi = {fj |Ui ∈ Bj } . An Example: Let the minimal access structure be Γ − = U1 U2 + U2 U3 + U3 U4 . The dual access structure is Γ ∗ = U1 U3 + U2 U3 + U2 U4 . Since |Γ ∗ | = 3, let F = {f1 , f2 , f3 }. The cumulative array C(Γ − ) is in table 3, where α1 = {f1 }, α2 = {f2 , f3 }, α3 = {f1 , f2 } and α4 = {f3 }. Observe Ui ∈A αi = F , ∀A ∈ Γ − . SHAMIR (v, n)–THRESHOLD SCHEME. Shamir’s secret sharing scheme was proposed in [23]. Let q be a prime number, S ∈ GF (q) be the secret to be shared, n
484
Hartono Kurnio et al.
Table 3. Cumulative array for Γ − = U1 U2 + U2 U3 + U3 U4 U1 U2 U3 U4
U1 U3 U2 U3 U2 U4 1 0 0 0 1 1 1 1 0 0 0 1
be the number of users (q ≥ n + 1), and v be the threshold for reconstructing the secret.1 Each user Ui corresponds to a unique public value yi and Ui holds a share si = F (yi ), where F (x) is a random polynomial of degree v − 1 over GF (q) such that F (0) = S. The secret can be reconstructed by any v users pooling their shares and using polynomial interpolation to calculate S as follows. Let a set H we assume yi = i, of v users want to find the secret. For simplicity b mod then S = i∈H si × L(H, i) mod q, where L(H, i) = b∈H,b =i b−i q. Also the polynomial F (x) can be reconstructed by any v users as F (x) = i∈H si × Ψ (H, i) mod q, where Ψ (H, i) = b∈H,b =i x−b i−b mod q. (v, v)–THRESHOLD SCHEME. This scheme [12] is basically a simplified version of Shamir’s threshold scheme for the case n = v (it is not necessary that q is prime and q ≥ n + 1). For each user Ui , 1 ≤ i ≤ v − 1, the share si is a randomly v−1 chosen element of GF (q). The share for user Uv is sv =S − i=1 si mod q. v Observe that the v users can calculate the secret as S = i=1 si mod q.
4
The Basic Scheme
We consider a dynamic GKDS that provides protocols for SUBGROUP and JOIN membership events. We consider two cases. The first case, the basic scheme, has JOIN performed by any subgroup of size at least t + 1, as in [16]. In the second case, the extended scheme, an arbitrary access structure over the user set specifies subsets of users who can perform the join operation. In section 4 we recall the basic scheme, upon which we build the extended scheme, presented in section 5. The access structure in the basic construction is the set of all subsets of cardinality at least t + 1. That is, Γl = {A : A ⊆ Ul , |A| ≥ t + 1}. The minimal access structure consists of all access sets with size t+1, Γl− = {A : A ⊆ Ul , |A| = t + 1}. In this scheme any user can establish a group key for a subgroup, and any t + 1 users can grant group membership to a new user. 4.1
System Initialisation by the GC
1. Chooses a value for the system parameter t, generates two large primes p and q, where q | p − 1, and chooses a generator g of the multiplicative group of GF (p). The values p, q, and g are made public. 1
This scheme can be applied for any finite field of cardinality greater than n.
A Dynamic Group Key Distribution Scheme with Flexible User Join
485
2. Constructs a symmetric polynomial, F (x, y) =
t t
sa,b xa y b mod q,
a=0 b=0
where sa,b ∈ GF (q) (0 ≤ a ≤ t, 0 ≤ b ≤ t) are randomly chosen, and sa,b = sb,a for all a, b. The polynomial F (x, y) is kept secret. 3. Calculates a polynomial Fi (x) = F (x, i) and gives Fi (x) to user Ui ∈ U0 over t a secure unicast channel. Note that Fi (x) = a=0 Aai xa mod q, where Aai = t b b=0 sa,b i mod q. User Ui ∈ U0 keeps the polynomial Fi (x) of degree t as secret information, i.e., K(Ui ) = {Fi (x)}. The secret information of GC is the symmetric polynomial F (x, y) of degree t. Proposition 1 ([2]). The basic scheme requires (i) GC to store bits and (ii) each user to store (t + 1) log q bits of information. 4.2
(t+2)(t+1) 2
log q
Subgroup Event: El = SUBGROUP in Session Sl
Input: Ul is the group of users in session Sl . Some users in Ul want to form a subgroup Gl . Let a user Uzl ∈ Gl be the group initiator. Also, let the set U˘l = {i : Ui ∈ Ul } and the set G˘l = {i : Ui ∈ Gl }. Process: Uzl prepares and broadcasts the subgroup message MGl as follows. 1. Randomly generates an integer r ∈ GF (q) and computes gˆ = g r mod p. 2. Randomly chooses a set Il of integers from GF (q) such that |Il | + |Ul | = t + |Gl |, a =b = 0, ∀a, b ∈ Il , and Il ∩ U˘l = ∅. Let θl = Il ∪ (U˘l \ G˘l ). g )Fzl (h) mod p, for all h ∈ θl . Note Fzl (x) is the secret 3. Calculates gˆh = (ˆ information belonging to the group initiator. 4. Broadcasts MGl = {ˆ g, zl , gˆh $h : h ∈ θl }, using $ to denote concatenation. Upon receiving the broadcast message, each user Ui ∈ Gl uses secret information K(Ui ) = {Fi (x)} to compute the group key GKl as follows, recalling that L(H, i) represents Lagrange interpolation at i (see section 3). g )Fi (zl )×L(θl ∪{i},i) × (ˆ gh )L(θl ∪{i},h) mod p. GKl = (ˆ h∈θl
Output: All users in Gl share the same group key GKl . Proof. By the symmetry of F (x, y), Ui ∈ Gl can find a point of the polynomial belonging to Uzl (group initiator), i.e. Fi (zl ) = Fzl (i). Using the t points of Fzl (x) (in the exponent) in MGl , along with Fi (zl ), Ui can calculate the group key GKl using Lagrange interpolation (in the exponent). Note (ˆ g )Fi (zl ) (= (ˆ g )Fzl (i) ) is not
486
Hartono Kurnio et al.
included in MGl so Ui can reach the threshold t + 1 of Fzl (x). Each user in Gl computes the group key as (see also section 3) g )Fi (zl )×L(θl ∪{i},i) × (ˆ gh )L(θl ∪{i},h) mod p, GKl = (ˆ h∈θl Fzl (i)×L(θl ∪{i},i)
= (ˆ g) =
×
(ˆ g)Fzl (h)×L(θl ∪{i},h) mod p,
h∈θl Fzl (h)×L(θl ∪{i},h)
(ˆ g)
h∈θl ∪{i}
mod p,
F (h)×L(θl ∪{i},h)
= (ˆ g ) h∈θl ∪{i} zl = (ˆ g )Fzl (0) mod p,
mod p,
= (ˆ g )F (0,zl ) mod p.
✷
Theorem 1. In the basic scheme, a user may invoke the SUBGROUP event for a subgroup Gl , |Gl | ≥ |Ul | − t with a broadcast of (t + 1)(log p + log q) bits. Refer to [16] for a security proof of this protocol. The formation of a subgroup Gl does not change Ul . Non–subgroup users Ul \ Gl are still valid in Ul . Thus, the users in the next session, Ul+1 , are those in Ul . A non–subgroup user in session Sl might be a subgroup user in another session. 4.3
Join Event: El = JOIN in Session Sl
Input: Recall that Ul is the user group for session Sl . A set of new users Jl ⊆ N \ Ul wants to join the group. Let users in a minimal access set Azl ∈ Γl be the (t + 1) sponsors. Let the set U˘l = {i : Ui ∈ Ul } and the set A˘zl = {i : Ui ∈ Azl }, noting that Azl ⊆ Ul . Process: The sponsors prepare and send the join message MJl as follows. 1. A sponsor Ukl ∈ Azl establishes a group key for group Ul by invoking the subgroup protocol in section 4.2 with Gl = Ul . Let the group key be GKl . 2. All existing users Ui ∈ Ul update their secret information K(Ui ) = {Fi (x) = GKl + Fi (x) mod q}. 3. Each new user Ui ∈ Jl is given a unique identifier i by the sponsors satisfying i ∈ U˘l and i = 0. 4. Each sponsor Ukl ∈ Azl computes fkl ,i = Fkl (i) and sends the message MJl (Ui ) = {fkl ,i $kl } to each new user Ui ∈ Jl over a secure unicast channel. Upon receiving the message MJl (Ui ) from all sponsors, a new user Ui ∈ Jl calculates secret information K(Ui ) as follows. K(Ui ) = fk ,i Ψ (A˘z , kl ). l
l
˘z kl ∈A l
Output: All new users in Jl obtain correct secret information.
A Dynamic Group Key Distribution Scheme with Flexible User Join
487
Proof. After step 2 the system secret F (x, y) has been updated to F (x, y) = GKl + F (x, y) mod q. To be a member of Ul , a new user Ui ∈ Jl has to have secret information F (x, y)|y=i . Since Ui receives t + 1 pieces of information that is fkl ,i = Fkl (i) = F (i, kl ), for all kl ∈ A˘zl (in general |A˘zl | ≥ t + 1), he can ✷ interpolate to obtain K(Ui ) = {F (x, i) = Fi (x)}. Theorem 2. In the basic scheme, any t + 1 users may perform JOIN events, while any t or less users cannot do so. The set Jl has any number of new users and requires transmission of 2(t + 1) log q bits over secure unicast channels. Thus, the users in the next session, Ul+1 , are those in Ul ∪ Jl , and all membership events apply to Ul+1 .
5
The Extended Scheme
The basic scheme has the property that any t + 1 or more users can generate the new user’s secret information. An increase in t makes JOIN events less accessible, as approval of many users is required. In some cases it is desirable that less than t+1 users be able to join new users. That is, users in some specified subsets might need higher privileges. We extend the basic scheme to provide this property. In the extended scheme the access structure consists of all subsets of at least t + 1 users, together with a collection of access sets Γ , called the privileged set. That is, Γl = Γ ∪ {A : A ⊆ Ul , |A| ≥ t + 1}, where Γ is a collection of specified sets A ⊆ U0 , |A| ≤ t, for all A ∈ Γ . In this scheme (i) any user can establish a group key for a subgroup, and (ii) not only t + 1 or more users, but also access sets in the privileged sets can grant group membership to a new user. 5.1
System Initialisation by the GC
1. Same as steps 1, 2, and 3 of the scheme in section 4.1. 2. Defines a minimal access structure Γ over U0 and let U = A∈Γ A where U ⊆ U0 . 3. Constructs and publishes a u×v cumulative array C(Γ ) = [cij ] where u and v are the cardinalities of the sets U and F , respectively. Let each user Ui ∈ U correspond to a set βi consisting of all columns indexed by j where cij = 1, i.e., βi = {j : cij = 1}. 4. Randomly chooses v − 1 symmetric polynomials of degree t in x and y over GF (q), Y1 (x, y), Y2 (x, y), . . . , Yv−1 (x, y), and calculates Yv (x, y) = F (x, y) −
v−1
Yj (x, y) mod q.
(1)
j=1
All polynomials in this step are kept secret. 5. Associates column j of the cumulative array with the symmetric polynomial Yj (x, y), for 1 ≤ j ≤ v. Observe that the set F of the cumulative array C(Γ ) is F = {Y1 (x, y), Y2 (x, y), . . . , Yv (x, y)}.
488
Hartono Kurnio et al.
6. Gives elements of F to each user Ui ∈ Γ if and only if cij = 1 thru a secure unicast channel. Thus, the set αi = {Yj (x, y) : j ∈ βi }. The secret information of each user Ui ∈ U is K(Ui ) = {Fi (x), αi } and that of each user Ui ∈ U0 \ U is K(Ui ) = {Fi (x)}. Observe that the set αi consists of at most v symmetric polynomials of degree t. Proposition 2. In the extended scheme, (i) storage of the GC is (t+2)(t+1) log q 2 bits (ii) storage of a user not in U is (t + 1) log q bits and (iii) storage of a user log q bits. in U is at most (t + 1) log q + v (t+2)(t+1) 2 Subgroup Event protocol for the extended scheme is similar to that for the basic scheme. The difference is in the Input phase, the formed subgroup Gl also has to satisfy A ⊆ Ul \ Gl , for all A ∈ Γ . Note that in the extended scheme, a user Ui only uses Fi (x) to process subgroup event protocol. Thus, the results in section 4.2 guarantee security and efficiency of the scheme. 5.2
Join Event: El = JOIN in Session Sl
The join event protocol for the extended scheme has the property that sponsorship of at least t + 1 users can join a new user. The protocol follows the join event protocol for the basic scheme. In this section, we show in particular how sponsors in an access set Azl ∈ Γ can grant membership to new users. Input: Recall that in session Sl , there is a group Ul . A set of new users Jl ⊆ N \ Ul would like to join the group. Let sponsors in an access set Azl ∈ Γ be authorised to join the new users. Let the set U˘l = {i : Ui ∈ Ul } and the set A˘zl = {i : Ui ∈ Azl }. Note that Azl ⊆ U ⊆ Ul . Process: Assuming sponsors in Azl approve the new users’ admission to the group Ul , the sponsors construct and send the join message MJl as follows. 1. A sponsor Ukl ∈ Azl executes subgroup event protocol in section 4.2 with Gl = Ul . The result is the group key GKl shared by users in Ul . 2. All existing users Ui ∈ Ul update their secret information. (a) All users Ui ∈ Ul calculate Fi (x) = GKl + Fi (x) mod q. l (b) All users Ui ∈ U calculate u = GK mod q (assuming q > v) and v αi = {Yj (x, y) = u + Yj (x, y) mod q : j ∈ βi }. After this step, the secret information of users Ui ∈ U is K(Ui ) = {Fi (x), αi } and that of users Ui ∈ Ul \ U is K(Ui ) = {Fi (x)}. 3. Same as step 3 of the scheme in section 4.3. 4. Each sponsor Ukl ∈ Azl does the following for each new user Ui ∈ Jl . (a) Computes Yj,i (x) = Yj (x, i), for all j ∈ βkl . (b) Individually sends MJl (Ui ) = {Yj,i (x)$j : j ∈ βkl } to Ui over a secure unicast channel. It is possible βkl1 ∩ βkl2 = ∅, for some Ukl1 , Ukl2 ∈ Azl , Ukl1 = Ukl2 . It is then enough for one sponsor to send Yj,i (x), by convention the sponsor with the lowest kl .
A Dynamic Group Key Distribution Scheme with Flexible User Join
489
Using the secret information sent by all sponsors, a new user Ui ∈ Jl calculates secret information K(Ui ) as follows. Let βzl = Uk ∈Az βkl . l l K(Ui ) = Yj,i (x) mod q. j∈βzl
Output: All new users in Jl obtain correct secret information.
Proof. After step 2 the system secret F (x, y) is F (x, y) = GKl + F (x, y) mod q. To be in Ul , a new user Ui ∈ Jl has to have secret information F (x, y)|y=i . By the property of cumulative array C(Γ ), βzl = {1, . . . , v}. Thus Ui obtains Yj,i (x), for 1 ≤ j ≤ v, from the messages. From those, he can compute K(Ui ) as follows, referring to equation (1), Yj,i (x) mod q = Yj (x, i) mod q, j∈βzl
j∈βzl
GKl + Yj (x, i) mod q, = v j∈βzl = GKl + Yj (x, i) mod q, j∈βzl
= GKl + F (x, i) mod q,
= F (x, i) .
✷
Theorem 3. In the extended scheme, users in a defined access set may perform JOIN membership events, while any t or less users that do not contain all users in a defined access set cannot do so. The set Jl has any number of new users and requires transmission of v(t + 2) log q bits over secure unicast channels. Following a join membership event, session Sl+1 has a user group Ul+1 = Ul ∪ Jl . Since users in Jl only obtain secret information Fi (x), none of the new users are in U. In other words, Jl ⊆ Ul+1 \ U.
6
A Variant of the Extended Scheme
We propose a variant of the extended scheme with less user storage and lower transmission costs in user join. The costs are significantly reduced for large w, say w ≈ t, where w = min {|A| : A ∈ Γ }. 6.1
System Initialisation by the GC
In the scheme in section 4, collaborations of t + 1 users admit new users using their Fi (x). If the collaboration has less than t+1 users, no join can be performed. In this variant we give extra information to members of the privileged structure, to reduce the number of collaborators they need for join events. Let U˘0 = {i : Ui ∈ U0 }. The GC distributes t + 1 − w polynomials F (x, d), d ∈ U˘0 , among members of Γ , such that each access set can construct the F (x, d). In detail
490
Hartono Kurnio et al.
1. Same as steps 1, 2, and 3 of the scheme in section 5.1. 2. Chooses and publishes a set D of t + 1 − w distinct integers from GF (q) such ∈ D. that D ∩ U˘0 = ∅ and 0 3. For each d ∈ D, randomly chooses v − 1 polynomials of degree t over (d) (d) GF (q), Y1 (x), . . . , Yv−1 (x), and calculates Yv(d) (x) = F (x, d) −
v−1
(d)
Yj (x) mod q.
(2)
j=1 (d)
Let Yj = {Yj (x) : d ∈ D}, ∀ 1 ≤ j ≤ v. All polynomials here are secret. 4. Associates column j of the cumulative array with the set Yj , for 1 ≤ j ≤ v. Observe that the set F of the cumulative array C(Γ ) is F = {Y1 , . . . , Yv }. 5. Gives elements of F to each user Ui ∈ U if and only if cij = 1 thru a secure unicast channel. Thus, the set αi = {Yj : j ∈ βi }. Note K(Ui ) = {Fi (x), αi }, for all Ui ∈ U and K(Ui ) = {Fi (x)}, for all Ui ∈ U0 \U. Observe the set αi consists of at most v(t + 1 − w) polynomials of degree t. Proposition 3. In this scheme, (i) storage of the GC is (t+2)(t+1) log q bits (ii) 2 storage of a user not in U is (t + 1) log q bits and (iii) storage of a user in U is at most (t + 1) log q + v(t + 1 − w)(t + 1) log q bits. The subgroup event protocol here differs from the extended scheme only in the Process phase, the chosen set Il also has to satisfy Il ∩ D = ∅. 6.2
Join Event: El = JOIN in Session Sl
Input: Same as that of the scheme in section 5.2. Process: Let the sponsor set be Azl ∈ Γ and they construct and send the join message MJl to new users in Jl as follows. 1. Same as step 1 of the scheme in section 5.2. 2. All existing users Ui ∈ Ul update their secret information. (a) All users Ui ∈ Ul calculate Fi (x) = GKl + Fi (x) mod q. l (b) All users Ui ∈ U calculate u = GK v mod q (assuming q > v) and αi =
(d)
(d)
{Yj : j ∈ βi }, where Yj = {Yj (x) = u + Yj (x) mod q : d ∈ D}. 3. Same as step 3 of the scheme in section 5.2. 4. Each sponsor Ukl ∈ Azl does the following for each new user Ui ∈ Jl . (a) Computes fkl ,i = Fkl (i). (b) Chooses a set Dl of t + 1 − |Azl | elements from the set D, Dl ⊆ D, and for (d) (d) (d) all j ∈ βkl , computes yj,i = Yj (i), for all d ∈ Dl . Note Yj (x) ∈ Yj .
(d)
(c) Individually sends MJl (Ui ) = {fkl ,i $kl } ∪ {yj,i $j$d : j ∈ βkl , d ∈ Dl } to Ui , over a secure unicast channel. All Ukl ∈ Azl choose the same Dl . When βkl1 ∩ βkl2 = ∅, Ukl1 , Ukl2 ∈ Azl , Ukl1 = Ukl2 , only one sponsor (d) sends yj,i .
A Dynamic Group Key Distribution Scheme with Flexible User Join
491
Using the secret information sent by all sponsors, a new user Ui ∈ Jl calculates secret information K(Ui ) as follows, for βzl = Uk ∈Az βkl and φ = A˘zl ∪ Dl
K(Ui ) =
k ∈A˘ l
l
l
fkl ,i Ψ (φ, kl ) +
(d)
yj,i Ψ (φ, d) mod q
d∈Dl j∈βzl
zl
.
Output: All new users in Jl obtain correct secret information. Proof. To be in Ul , a new user Ui ∈ Jl has to obtain secret information that is the evaluation of the system secret F (x, y) at y = i. From the messages, Ui gets |A˘zl | pieces of information that is fkl ,i = Fkl (i) = F (i, kl ), for all kl ∈ A˘zl (in general |A˘zl | ≤ t). From the property of cumulative array C(Γ ), βzl = {1, . . . , v}, (d) so Ui obtains yj,i , for 1 ≤ j ≤ v, for all d ∈ Dl . From those, Ui may form |Dl | = t + 1 − |Azl | pieces of information that is F (i, d), for all d ∈ Dl , as follows, referring to equation (2).
(d)
yj,i mod q =
j∈βzl
(d)
Yj
(i) mod q,
j∈βzl
GKl (d) + Yj (i) mod q, v j∈βzl (d) = GKl + Yj (i) mod q,
=
j∈βzl
= GKl + F (i, d) mod q,
= F (i, d). Interpolation of all |A˘zl |+ |Dl | = |A˘zl |+ t+ 1 − |Azl | = t+ 1 pieces of information ✷ (|A˘zl | = |Azl |) gives secret information K(Ui ) = {F (x, i) = Fi (x)}. Theorem 4. In this scheme, users in a defined access set may perform JOIN events, while any t or less users that do not contain all users in a defined access set cannot do so. The set Jl has any number of new users and requires transmission of at most (2w + 3v(t + 1 − w)) log q bits over secure unicast channels.
7
Security Proofs
We assume the collusion C, |C| ≤ t, does not contain an access set, C ⊇ A, ∀A ∈ Γl . We give security proofs for the extended scheme, which explicitly include security proofs for the basic scheme. We show the collusion only knows K(C) = Ui ∈C K(Ui ) = {Fi (x), αi : Ui ∈ C} and cannot gain additional information about F (x, y) from K(C), then we show that the protocols for SUBGROUP and JOIN membership events are secure.
492
Hartono Kurnio et al.
Theorem 5. Any collusion C, |C| ≤ t, can obtain the session secret F (0, 0) for session Sl if and only if some access set Azl of Sl is contained in C. Proof. (sketch) If there is an access set of Sl in C then the only unknown in equation (1) is F (x, y), which can thus be obtained. Without loss of generality we take the strongest C not containing an access set to be of size t, and to hold all Yj (x, y), 1 ≤ j ≤ v − 1. Using the symmetry in F (x, y) and their key information the colluders can calculate t points, F (i, k), i ∈ C, in F (x, k), ∀Uk ∈ / C. Equation (1) gives them one equation for F (k , k), k ∈ /C but with another unknown Yv (k , k) also. The colluders cannot solve this, so cannot find F (0, 0) letting k = k = 0 ∈ / N. ✷ This proof applies to the variant scheme also, using equation (2) not (1). Theorem 6. For any session Sl : S1 ≤ Sl ≤ SM , El = SUBGROUP, a collusion C ⊆ Ul \ Gl , |C| ≤ t, cannot find the group key GKl . Proof. (sketch) We note that from the theorem 5 the collusion of non-subgroup users cannot obtain F (0, zl ) from their secret information K(C) and so cannot calculate the group key GKl = (ˆ g)F (0,zl ) (ˆ g is public). Although each adverg )Fzl (i) , they do not have sary Ui ∈ C can find Fzl (i) = Fi (zl ) and calculate (ˆ enough information since the t points (in the exponent) are released in MGl . Consider the case where a collusion of size t tries to find the group key GKl from broadcast messages. The proof uses the Decisional Diffie-Hellman (DDH) assumption, informally stated as: “given a cyclic group P and a generator g, there is no efficient algorithm that can distinguish between the two distributions (g a , g b , g ab ) and (g a , g b , g c ) where a, b, c are randomly chosen in [1, |P |]” [4]. We use a “reduction argument”: if there exists an algorithm (probabilistic polynomial-time) V using K(C), the broadcast MG , and all public information, to distinguish GKl = g rFzl (0) from a random value, then V contradicts DDH. Without loss of generality, we assume that the group initiator is Uzl and performs polynomially many SUBGROUP membership events. We assume C = {Ud } (t = 1). Let V be the algorithm that on input of values Fzl (d), polynomially many tuples (g rj , g rj Fzl (d) , g rj Fzl (0) ) generated with randomly chosen rj ’s, and a pair g r , g rFzl (d) , distinguishes between g rFzl (0) and a random value. Let V be the algorithm using V to break DDH. V takes g a , g b , and C and has to decide whether C is g ab or a random value. V generates inputs to V . Let Fzl (0) = b and r = a. V generates a randomly chosen set C ∗ = {Ud } and values Fzl (d ), random rj ’s and tuples (g rj , g rj Fzl (d ) , g rj b ), and gives them to V . a Then V gives the pair g , C to V , takes the output of V and outputs the same value. In this way V can distinguish between g ab and C, contradicting DDH.✷ Theorem 7. For any session Sl , where S1 ≤ Sl ≤ SM , a collusion C ⊆ Ja : Sl ≤ Sa ≤ SM , Ea = JOIN, |C| ≤ t, cannot find GKb : S1 ≤ Sb ≤ Sl , Eb = SUBGROUP. Proof. (sketch) Although in adding new users the system secret F (x, y) is updated in every join operation, we may assume secret information of the col lusion K(C) = {Fi (x), αi : Ui ∈ C} is determined from an updated system
A Dynamic Group Key Distribution Scheme with Flexible User Join
493
secret F (x, y) = GKl + F (x, y) mod q, as when the collusion tries to find group keys of previous sessions. Key GKl is established by Ukl ∈ Azl before joining the new users (collusion). Since the collusion does not know GKl , they cannot find group keys of sessions before Sl . The security proof is similar to that of theorem 6. The proof also shows a new user Ui , even a collusion of size t, cannot use messages MJl (Ui ) received from sponsors Ukl ∈ Azl to find old group keys. ✷
8 Additional Discussions

8.1 Extension of Defined Access Sets
Observe that the access sets in Γ are subsets of U_0 and are defined at system initialisation. In practice it is desirable to have a flexible and dynamic Γ, in the sense that newly defined access sets can be appended to Γ during the system lifetime. Let Γ' be a collection of new access sets, of cardinality at most t, defined over new users J_l and existing users U_l. After the extension, the new collection will be Γ ∪ Γ', and it is assumed to be minimal. Let U' = ∪_{A∈Γ'} A and recall U = ∪_{A∈Γ} A. The extension gives secret information α_i to all users U_i ∈ U' \ U. The users in J_l (= U' \ U_l ⊆ U' \ U) are new users for the group U_l, so they also need the secret information F_i(x) as well as α_i. The users in (U' \ U) \ J_l are existing users in U_l, so they only require the secret information α_i. To give F_i(x) to users U_i ∈ J_l we may use the protocol in Section 4.3, 5.2 or 6.2. We briefly describe two techniques for sponsors in an access set A_{z_l} ∈ Γ to give the secret information α_i to users U_i ∈ U' \ U (without the GC's assistance). Without loss of generality, let F = {ℓ_1, ..., ℓ_v}.

Pre-defined New Access Sets. The idea is to define, during system initialisation, some dummy access sets Γ̈, defined over U_0 and some dummy users Ü, in addition to the access sets Γ. The cumulative array will be C(Γ ∪ Γ̈), and it is assumed to be minimal. We can use these reserved access sets during the system lifetime. With this technique, appended new access sets are constrained to Γ' ⊆ Γ̈, existing users to (U' \ U) \ J_l ⊆ U_0, and new users to J_l ⊆ Ü. Observe that each user U_i ∈ U' \ U can obtain α_i = {ℓ_j : j ∈ β_i} from sponsors U_{k_l} ∈ A_{z_l} who securely send {ℓ_j : ℓ_j ∈ α_{k_l}, j ∈ β_{k_l} ∩ β_i}. User U_i will get precisely all elements of α_i, since β_{z_l} = ∪_{U_{k_l} ∈ A_{z_l}} β_{k_l} = {1, ..., v} and β_i ⊆ {1, ..., v}.

Arbitrary New Access Sets. We use the idea of redistribution schemes [8] to append arbitrary new access sets. There will be a new ǔ × v̌ cumulative array C(Γ ∪ Γ') = [c_{ie}] with F̌ = {ω_1, ..., ω_v̌}, a redistribution of F = {ℓ_1, ..., ℓ_v}. For each ℓ_j ∈ F, we apply the (v̌, v̌)-threshold scheme to ℓ_j, giving shares ℓ_j^{(1)}, ..., ℓ_j^{(v̌)}. Each ω_e ∈ F̌ is computed as ω_e = Σ_{j=1}^{v} ℓ_j^{(e)} mod q. Note that Σ_{j=1}^{v} ℓ_j mod q = Σ_{e=1}^{v̌} ω_e mod q.
Table 4. Performance Comparison

            Subgroup                         Join                        Collusion   User
            Initiator   Size     Trans.      Sponsor   Size    Trans.*   Size        Storage
[1]         1           ≥ n−t    t           GC        any     1         ≤ t         1
[22]        1           ≥ n−t    O(log n)    —         —       —         ≤ t         O(log n)
Basic       1           ≥ n−t    t           t+1       any     t+1       ≤ t         t+1

* to join a new user; n = |U|.
Redistribution is performed by the sponsors U_{k_l} ∈ A_{z_l}, since ∪_{U_{k_l} ∈ A_{z_l}} α_{k_l} = F. Each U_i ∈ U ∪ U' obtains α̌_i = {ω_e : e ∈ β̌_i}, where β̌_i = {e : c_{ie} = 1}, from sponsors who securely send the shares ℓ_j^{(e)}, 1 ≤ j ≤ v, e ∈ β̌_i. U_i computes α̌_i = {ω_e = Σ_{j=1}^{v} ℓ_j^{(e)} mod q : e ∈ β̌_i}.
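As an illustration, the following is a minimal sketch of this additive redistribution, assuming all shares are integers modulo a public prime q; the function names and the concrete modulus are illustrative choices of ours, not part of the scheme.

```python
import secrets

Q_MODULUS = 2**127 - 1  # hypothetical public modulus q (a Mersenne prime)

def additive_shares(secret: int, n: int, q: int = Q_MODULUS) -> list[int]:
    """Split `secret` into n additive shares whose sum mod q is the secret."""
    shares = [secrets.randbelow(q) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % q)
    return shares

def redistribute(ells: list[int], v_check: int, q: int = Q_MODULUS) -> list[int]:
    """Redistribute shares ell_1..ell_v into omega_1..omega_{v_check}.

    Each ell_j is split into v_check additive shares ell_j^(1..v_check), and
    omega_e = sum_j ell_j^(e) mod q, so sum_e omega_e = sum_j ell_j (mod q).
    """
    per_ell = [additive_shares(ell, v_check, q) for ell in ells]
    return [sum(row[e] for row in per_ell) % q for e in range(v_check)]

# Sanity check: redistribution preserves the combined secret sum.
ells = [secrets.randbelow(Q_MODULUS) for _ in range(5)]
omegas = redistribute(ells, v_check=7)
assert sum(ells) % Q_MODULUS == sum(omegas) % Q_MODULUS
```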
8.2 Performance Comparison
In Table 4 we compare the performance of our schemes with the decentralised schemes of [1, 22]. Parameters are measured in the number of users, except transmission and user storage, which use the key length as the unit of measurement. The scheme of [1] and the basic scheme are based on public key systems, while the scheme of [22] is based on a symmetric key system. With the exception of [1], which uses public keys, the performance for subgrouping is similar. The user group in [22] is assumed static, and new user admission in [1] is by the GC only. Only the basic scheme has a decentralised join, where a threshold of t + 1 users is required to admit new users to the group. The extended scheme and its variant, not shown above, allow specified access sets of size at most t to perform joins. Their efficiency (transmission and storage) depends on the structure of the access sets. The search for more efficient constructions for the specified access sets model continues.
References
[1] J. Anzai, N. Matsuzaki and T. Matsumoto. A Quick Group Key Distribution Scheme with "Entity Revocation". Advances in Cryptology – ASIACRYPT '99, LNCS 1716, 333-347, 1999. 479, 481, 494
[2] R. Blom. An Optimal Class of Symmetric Key Generation Systems. Advances in Cryptology – EUROCRYPT '84, LNCS 209, 335-338, 1985. 480, 485
[3] C. Blundo, A. De Santis, A. Herzberg, S. Kutten, U. Vaccaro and M. Yung. Perfectly Secure Key Distribution for Dynamic Conferences. Advances in Cryptology – CRYPTO '92, LNCS 740, 471-486, 1993. 480
[4] D. Boneh. The Decision Diffie-Hellman Problem. 3rd Algorithmic Number Theory Symposium, LNCS 1423, 48-63, 1998. 492
[5] R. Canetti, T. Malkin and K. Nissim. Efficient Communication-Storage Tradeoffs for Multicast Encryption. Advances in Cryptology – EUROCRYPT '99, LNCS 1592, 459-474, 1999. 480
[6] C. Charnes and J. Pieprzyk. Cumulative Arrays and Generalised Shamir Secret Sharing Schemes. 17th Annual Computer Science Conference (ACSC-17), 519-528, 1994. 483
[7] Y. Desmedt and Y. Frankel. Threshold Cryptosystems. Advances in Cryptology – CRYPTO '89, LNCS 435, 307-315, 1989.
[8] Y. Desmedt and S. Jajodia. Redistributing secret shares to new access structures and its applications. Preprint, 1997. 493
[9] W. Diffie and M. Hellman. New Directions in Cryptography. IEEE Trans. Inform. Theory 22, 644-654, 1976. 481
[10] A. Fiat and M. Naor. Broadcast Encryption. Advances in Cryptology – CRYPTO '93, LNCS 773, 480-491, 1994. 479, 480
[11] W.-A. Jackson and K. Martin. Cumulative Arrays and Geometric Secret Sharing Schemes. Advances in Cryptology – AUSCRYPT '92, LNCS 718, 48-55, 1993. 483
[12] E. Karnin, J. Greene and M. Hellman. On Secret Sharing Systems. IEEE Transactions on Information Theory, vol. 29, 35-41, 1983. 484
[13] Y. Kim, A. Perrig and G. Tsudik. Simple and Fault-Tolerant Key Agreement for Dynamic Collaborative Groups. 7th ACM Conference on Computer and Communications Security, ACM Press, 235-244, 2000. 480
[14] R. Kumar, S. Rajagopalan and A. Sahai. Coding Constructions for Blacklisting Problems Without Computational Assumptions. Advances in Cryptology – CRYPTO '99, LNCS 1666, 609-623, 1999. 480
[15] H. Kurnio, R. Safavi-Naini, W. Susilo and H. Wang. Key Management for Secure Multicast with Dynamic Controller. Information Security and Privacy – ACISP 2000, LNCS 1841, 178-190, 2000. 480
[16] H. Kurnio, R. Safavi-Naini and H. Wang. A Group Key Distribution Scheme with Decentralised User Join. Pre-Proceedings of 3rd Conference on Security in Communication Networks – SCN 2002, 2002. 484, 486
[17] M. Luby and J. Staddon. Combinatorial Bounds for Broadcast Encryption. Advances in Cryptology – EUROCRYPT '98, LNCS 1403, 512-526, 1998. 479, 480
[18] T. Matsumoto and H. Imai. On a Key Predistribution System – A Practical Solution to the Key Distribution Problem. Advances in Cryptology – CRYPTO '87, LNCS 293, 185-193, 1988. 480
[19] D. A. McGrew and A. T. Sherman. Key Establishment in Large Dynamic Groups Using One-Way Function Trees. Manuscript, 1998. 480
[20] C. J. Mitchell and F. C. Piper. Key Storage in Secure Networks. Discrete Applied Mathematics 21, 215-228, 1988. 480
[21] M. Naor and B. Pinkas. Efficient Trace and Revoke Schemes. Financial Cryptography 2000, LNCS 1962, 1-20, 2001. 479, 481
[22] R. Safavi-Naini and H. Wang. New Constructions for Multicast Re-keying Schemes using Perfect Hash Families. 7th ACM Conference on Computer and Communications Security, ACM Press, 228-234, 2000. 479, 480, 494
[23] A. Shamir. How to Share a Secret. Communications of the ACM 22, 612-613, 1979. 481, 483
[24] G. Simmons, W.-A. Jackson and K. Martin. The Geometry of Shared Secret Schemes. Bulletin of the Institute of Combinatorics and its Applications (ICA), vol. 1, 71-88, 1991. 483
[25] M. Steiner, G. Tsudik and M. Waidner. Key Agreement in Dynamic Peer Groups. IEEE Transactions on Parallel and Distributed Systems 11, no. 8, 769-780, 2000. 481
[26] D. R. Stinson and T. van Trung. Some New Results on Key Distribution Patterns and Broadcast Encryption. Designs, Codes and Cryptography 15, 261-279, 1998. 479, 480
[27] D. M. Wallner, E. J. Harder and R. C. Agee. Key Management for Multicast: Issues and Architectures. Internet Draft (draft-wallner-key-arch-01.txt), ftp://ftp.ietf.org/internet-drafts/draft-wallner-key-arch-01.txt. 480
Efficient Multicast Key Management for Stateless Receivers

Ju Hee Ki 1, Hyun Jeong Kim 1, Dong Hoon Lee 1, and Chang Seop Park 2

1 CIST, Korea University, Seoul, Korea
{eye,khj}@cist.korea.ac.kr, [email protected]
2 Information Security Lab, Dankook University, Seoul, Korea
[email protected]
Abstract. In a multicast communication system, group members possess a common group key and communicate using that key. For communication between group members, the group key must be managed securely and efficiently. Especially in a dynamic and large group, the efficiency of group key management is critical, since the number of deleted or added members may be quite large. Most schemes proposed in the literature implicitly assume that members are constantly on-line, which is not realizable for many receiving devices. In this paper, we propose a hierarchical binary tree-based key management scheme for a dynamic large group with one group controller (GC), especially well suited to stateless receivers, who do not update their state from session to session. In our scheme, all re-keying messages except the unicast of an individual key are transmitted without any encryption, all computation needed for re-keying is O(log2 n) applications of a one-way hash function and XOR operations, and all information needed for re-keying is in the current blinded factors and the initial information. The proposed scheme provides both backward and forward secrecy. If a bulletin board is used, each member can compute all needed keys without any re-keying messages.
Keywords: key management, multicast, group communication
1 Introduction
In recent years, with the explosive development of communication technologies, a wide range of services and applications on group-based communication systems are supported by enabling sources to send a single copy of a message to multiple recipients who explicitly want to receive the information. Examples are pay TV, teleconferencing, real-time information services, multi-player games, collaboration over networks, etc. One of the main security issues in multicast communication is access control, for making sure that only legitimate members can access the multicast
This work was supported by grant No. R01-2001-000-00537-0(2002) from the Korea Science & Engineering Foundation.
group communication. The security mechanism can be provided by encrypting the group communication messages using a common group key which is not known to non-members. However, the difficulty of managing the cryptographic keys used for the group communication arises from the problem of dynamic membership changes. Every time a member is deleted from or added to a multicast group, the group controller must change the group key. Since a large number of members must be considered in the design of a key management scheme for multicast communication, the efficiency of both communication and storage is an important factor, in terms of minimizing the multicast control messages for re-keying and reducing the storage for key management. Among others, key management schemes based on a hierarchical binary tree (HBT) have been proposed and have turned out to be successful with respect to both security and efficiency.

1.1 Related Works
Wallner et al. [11] and Caronni et al. [2, 10] proposed schemes with an HBT. In these schemes, a group controller (GC) maintains a logical tree of keys. Each node of the tree holds a key encryption key (KEK). The leaves of the tree correspond to group members. Each member secretly maintains the KEKs of the nodes on the path from its leaf to the root. Hence, in a balanced binary tree, each member stores log2 n + 1 keys, where n is the number of members. When a member is added to (or deleted from) a group, the size of the rekeying messages is 2K · log2 n, where K is the size of a key. McGrew and Sherman proposed a scheme using a so-called one-way function tree (OFT) [5]. Using a one-way function and a mixing function, the size of the rekeying messages is reduced from 2K · log2 n to K · log2 n. In the scheme proposed by Canetti et al. [1], using a pseudo-random generator tree, the size of the rekeying messages is also K · log2 n. Chang and Engel [3] presented a scheme using boolean functions, where the Karnaugh map [4] is applied to find the smallest set of KEKs needed to rekey. Although the size of the rekeying messages and the storage of the GC are reduced, their scheme is not secure against attacks by colluding or compromised members, who can cooperate to determine all the keys in the system. Perrig et al. proposed an efficient large-group key (ELK) protocol [7] using pseudo-random functions. For deleting a member, the size of the rekeying messages is K · log2 n. For adding a new member, other group members can update their keys independently without any rekeying broadcast message. However, in the case of joining l members (l ≥ 1), it needs l unicast messages in the worst case, and the cost of computation for the GC is very high, since all KEKs in the tree must be changed. Rafaeli et al. proposed an efficient scheme based on an HBT (EHBT) using key indices [9]. The EHBT protocol achieves I · log2 n message size for adding a member and K · log2 n message size for deleting a member, where I is the size of a key index and I < K. Recently, Park and Lee [8] proposed a scheme greatly reducing the group manager's storage. But their scheme does not provide
forward secrecy, i.e., a new member can read previous messages. We note that all the schemes above need the assumption that each member updates its state from session to session. Naor et al. proposed an efficient scheme in which a group member decrypts a re-keying message once to gain the group key [6]. In their scheme, the size of the multicast messages is (2r − 1)K and the storage of a member is (1/2)K log2 n, where r is the number of changed members in the group. But their scheme does not clearly provide backward secrecy. To satisfy backward secrecy, the scheme must broadcast K · O(log2 n) encrypted messages to each member whenever one new member joins the group.

1.2 Our Contributions
In many practical environments, a group member may not be able to record all the past history of transmissions or to change its state from session to session. A stateless receiver may be a device in which the operation must be accomplished only with the current transmission and its initial configuration [6]. Since a receiver such as a multimedia player or a satellite receiver (GPS) may not always be on-line, it is important in practical terms to provide a key management scheme for stateless receivers. The straightforward method of converting an ordinary scheme into one suitable for stateless receivers is to include all the history in every re-keying message, which is extremely inefficient. In our scheme, even though a member has not participated in communication for a long time, he/she can easily compute new keys with only the initial individual key and the latest re-keying messages related to the needed nodes, without keeping track of a history of all regenerated keys. In our scheme, re-keying messages are hidden using a one-way hash function and XOR operations, and hence there is no encryption/decryption process except for the unicast of an added member's individual key. All computation needed for each member to re-key is O(log2 n) applications of a one-way hash function and XOR operations. If our scheme is allowed to use a public access location, each member can update keys without any re-keying message. This feature makes our scheme well suited to settings where the users have low-end devices with severe memory restrictions. Although the length of the re-keying messages is longer than in OFT and EHBT (as analyzed in Section 4), our scheme is more efficient with respect to computation and works for stateless receivers. In the scheme proposed by Naor et al. [6], the size of the multicast messages is smaller than in OFT, EHBT and our scheme. That scheme is efficient for stateless receivers, but does not provide efficient backward secrecy, while our scheme does. Furthermore, their scheme is less efficient than ours with respect to the storage and computation of the GC and a member. The rest of this paper is organized in four sections. In Section 2 we briefly explain the idea of our scheme. Section 3 describes our scheme in detail, and Section 4 compares the proposed scheme with other schemes. Finally, Section 5 summarizes our results.
Fig. 1. An interior node of a binary tree and its two children
2 Our Idea
Our scheme has one centralized group controller (GC) that is responsible for n authenticated members and manages binary tree-based node keys which are used to update the group membership. We define the following terminology and explain our idea before we construct our scheme. In a binary tree, an interior node n_i has exactly two children, as shown in Figure 1. We define the two children n_{L(i)} and n_{R(i)} as the left child node and the right child node respectively, and the interior node n_i as the parent node of the two children. The most important idea in our scheme is the usage of a function H : X × Y → Y which has the following properties:

Property 1. For all x_1, x_2 ∈ X, y ∈ Y, H(x_1, H(x_2, y)) = H(x_2, H(x_1, y)).

Property 2. Given y and z (= H(x, y)) for x ∈ X, y ∈ Y, it is computationally infeasible to find x' ∈ X such that H(x', y) = z.

Each node in the tree is assigned a node key. For a node key k_i assigned to a node n_i, let k_{L(i)} be the node key of its left child node, and k_{R(i)} be the node key of its right child node. The above properties are applied to our scheme as follows:

– Given H(k_{L(i)}, r) and H(k_{R(i)}, r) for some value r ∈ Y, any member with k_{L(i)} or k_{R(i)} can compute the node key k_i, where k_i is H(k_{L(i)}, H(k_{R(i)}, r)) = H(k_{R(i)}, H(k_{L(i)}, r)).
– With the information r and H(k_{L(i)}, r), it is infeasible to find the node key k_{L(i)} in polynomial time.

Consider a node n_i. Suppose that a member U_1 knows the node key k_{L(i)} of the left child node of n_i and a member U_2 knows the node key k_{R(i)} of the right child node of n_i. Then Property 1 means that U_1 and U_2 can easily compute the node key k_i of n_i using H(k_{R(i)}, r) and H(k_{L(i)}, r), respectively. Property 2 means that U_1 cannot easily find k_{R(i)} from H(k_{R(i)}, r) and U_2 cannot easily find k_{L(i)} from H(k_{L(i)}, r). The following is an example of a function satisfying the two properties above.

Example. H(x, y) = h(x) ⊕ y, where h is a one-way hash function and ⊕ denotes bitwise exclusive-or.
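As a concrete illustration, here is a minimal sketch of this example function, with SHA-256 standing in for the one-way hash h; the paper does not fix a concrete hash, so this instantiation is an assumption of ours.

```python
import hashlib

def h(x: bytes) -> int:
    """One-way hash h, here SHA-256, with its 256-bit output read as an integer."""
    return int.from_bytes(hashlib.sha256(x).digest(), "big")

def H(x: bytes, y: int) -> int:
    """H(x, y) = h(x) XOR y."""
    return h(x) ^ y

# Property 1: H(x1, H(x2, y)) == H(x2, H(x1, y)), since XOR commutes.
x1, x2, y = b"left child key", b"right child key", 0x1234_5678
assert H(x1, H(x2, y)) == H(x2, H(x1, y))

# A member holding x1 computes the parent key from the blinded value H(x2, y)
# without ever learning x2 (Property 2 rests on the one-wayness of h).
parent_key = H(x1, H(x2, y))
```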
It is easy to prove that the function in the example satisfies Properties 1 and 2. Our scheme uses this function, and the security of our scheme relies on the following theorem.

Theorem 1. Let H : {0,1}^l × {0,1}^m → {0,1}^m be such that H(x, y) = h(x) ⊕ y, where m (≤ l) is a security parameter and h is a one-way hash function. Given H(x_1, y) and H(x_2, y) for x_1, x_2 ∈ {0,1}^l and y ∈ {0,1}^m, without x_1, x_2 and y it is infeasible to find H(x_1, H(x_2, y)) = H(x_2, H(x_1, y)) with probability more than 1/2^m.

Proof. Let H(x_1, H(x_2, y)) = H(x_2, H(x_1, y)) be K. Suppose that, given H(x_1, y) and H(x_2, y), an attacker A without x_1, x_2 and y wants to compute K (= h(x_1) ⊕ h(x_2) ⊕ y). A can only obtain the fact that H(x_1, y) ⊕ H(x_2, y) = h(x_1) ⊕ h(x_2) from H(x_1, y) and H(x_2, y). To compute K, the only thing A can do is to randomly choose y' ∈ {0,1}^m. y' satisfies the following equations:

H(x_1, y) ⊕ y' = h(x_1) ⊕ y ⊕ y'
H(x_2, y) ⊕ y' = h(x_2) ⊕ y ⊕ y'.

Consequently, it is always true that (H(x_1, y) ⊕ y') ⊕ (H(x_2, y) ⊕ y') = h(x_1) ⊕ h(x_2). Therefore, the probability that A finds the exact value of K is the probability that A chooses y' = y in the domain {0,1}^m. So A cannot find the value of K with probability more than 1/2^m.

The theorem implies that even though the GC sends the multicast message [H(k_{L(i)}, r), H(k_{R(i)}, r)] that is needed to compute the node key k_i, members other than those with k_{L(i)} or k_{R(i)} cannot extract any information from this multicast message. Using this function means that our scheme needs no encryption/decryption for multicast messages, and makes it more efficient.
3 Our Basic Scheme

3.1 Structure of Our Scheme
In our scheme, the GC maintains a logical tree of keys, as in other schemes based on a hierarchical binary tree. Each node has one node key, called a key generation key (KGK), which is used for generating the parent key, whereas in other schemes the keys are used for encryption/decryption of ancestors' keys. Each leaf is associated with a group member. Each interior node of the tree has exactly two children. The root key is considered the group key. The GC first assigns a randomly chosen key k_i, called the individual key, to each member U_i, securely unicasts this key to the member, and stores this individual key at the member's leaf. In order to generate the interior node keys, the GC additionally chooses 2n − 2 auxiliary keys (a_i) and n − 1 random values (r_i). An auxiliary key a_i is associated with each node n_i except the root node, and a random value r_i is associated with each node n_i except the leaf nodes. An individual key is a long-term secret key of each group member, and an auxiliary key is a short-term
Fig. 2. An example of group initialization
key (i.e., whenever the parent node key of a node is changed, the auxiliary key should be regenerated). The individual key is secret, but the auxiliary key is not necessarily secret. The GC then computes an interior node key k_j as follows:

k_j = H(k_{L(j)} ⊕ a_{L(j)}, H(k_{R(j)} ⊕ a_{R(j)}, r_j)) = H(k_{R(j)} ⊕ a_{R(j)}, H(k_{L(j)} ⊕ a_{L(j)}, r_j))

where H : {0,1}^l × {0,1}^m → {0,1}^m is defined by H(x, y) = h(x) ⊕ y with security parameter m, and L(j) and R(j) denote the left and right child of the node n_j, respectively. We define H(k_{L(j)} ⊕ a_{L(j)}, r_j) and H(k_{R(j)} ⊕ a_{R(j)}, r_j) as the pair of blinded factors of k_j. If k_j and k_{R(j)} (k_{L(j)}) are on the path from a member U's leaf to the root, we define H(k_{L(j)} ⊕ a_{L(j)}, r_j) (H(k_{R(j)} ⊕ a_{R(j)}, r_j)) as the blinded sibling factor of k_{R(j)} (k_{L(j)}). After unicasting the individual keys, the GC multicasts the 2n − 2 auxiliary keys and the n − 1 pairs of blinded factors [H(k_{L(j)} ⊕ a_{L(j)}, r_j), H(k_{R(j)} ⊕ a_{R(j)}, r_j)] with which all node keys except the root key are associated. Each group member can compute the KGKs on the path from the corresponding leaf to the root by using his/her individual key and the blinded sibling factor of each KGK. For example, as shown in Figure 2, the GC unicasts an individual key k_i to U_i, and multicasts 14 auxiliary keys and seven pairs of blinded factors: [a1, ..., a58, H(k1 ⊕ a1, r12), H(k2 ⊕ a2, r12), H(k3 ⊕ a3, r34), H(k4 ⊕ a4, r34), H(k5 ⊕ a5, r56), H(k6 ⊕ a6, r56), H(k7 ⊕ a7, r78), H(k8 ⊕ a8, r78), H(k12 ⊕ a12, r14), H(k34 ⊕ a34, r14), H(k56 ⊕ a56, r58), H(k78 ⊕ a78, r58), H(k14 ⊕ a14, r18), H(k58 ⊕ a58, r18)]. All members in the subtree rooted at n_{L(j)} can compute k_{L(j)} by using their individual keys, so they can compute k_j by using k_{L(j)} and the blinded sibling factor H(k_{R(j)} ⊕ a_{R(j)}, r_j). For example, using the individual key k3, U3 can compute k34, k14, and k18 as follows:
k34 = H(k3 ⊕ a3, H(k4 ⊕ a4, r34)) = h(k3 ⊕ a3) ⊕ h(k4 ⊕ a4) ⊕ r34
k14 = H(k34 ⊕ a34, H(k12 ⊕ a12, r14)) = h(k34 ⊕ a34) ⊕ h(k12 ⊕ a12) ⊕ r14
k18 = H(k14 ⊕ a14, H(k58 ⊕ a58, r18)) = h(k14 ⊕ a14) ⊕ h(k58 ⊕ a58) ⊕ r18

Each member has to keep the individual key k_i secret, but need not keep the auxiliary key a_i secret. No member can compute any partial bits of a KGK k_j not on the path from his/her leaf to the root, even though all blinded factors in the tree are known. By Theorem 1, one who knows only H(k_{L(j)} ⊕ a_{L(j)}, r_j) and H(k_{R(j)} ⊕ a_{R(j)}, r_j) cannot compute k_j = H(k_{L(j)} ⊕ a_{L(j)}, H(k_{R(j)} ⊕ a_{R(j)}, r_j)) = H(k_{R(j)} ⊕ a_{R(j)}, H(k_{L(j)} ⊕ a_{L(j)}, r_j)). This means that no encryption/decryption process is needed in our scheme. Note that the GC is able to set all auxiliary keys to a single random value, since each individual key is a secret value and h is a one-way hash function.
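The member-side computation above is a simple loop up the tree: at each level the member XORs its current key with that node's public auxiliary key and applies H to the broadcast blinded sibling factor. The following is a minimal sketch under the SHA-256 instantiation of h used in the earlier sketch; all names are illustrative.

```python
import hashlib
import secrets

def h(x: bytes) -> int:
    return int.from_bytes(hashlib.sha256(x).digest(), "big")

def H(x: int, y: int) -> int:
    # keys are handled as 256-bit integers; x is hashed via its byte encoding
    return h(x.to_bytes(32, "big")) ^ y

def path_keys(k_leaf: int, aux_keys: list[int], blinded_siblings: list[int]) -> list[int]:
    """KGKs on the path from a member's leaf to the root.

    For U3 in Fig. 2, aux_keys = [a3, a34, a14] (the member's own node at each
    level) and blinded_siblings = [H(k4^a4, r34), H(k12^a12, r14), H(k58^a58, r18)].
    """
    keys, k = [], k_leaf
    for a, blinded in zip(aux_keys, blinded_siblings):
        k = H(k ^ a, blinded)  # e.g. k34 = H(k3 xor a3, H(k4 xor a4, r34))
        keys.append(k)
    return keys

# Consistency check against the GC-side computation for one level:
k3, a3, k4, a4, r34 = (secrets.randbelow(2**256) for _ in range(5))
k34_gc = H(k3 ^ a3, H(k4 ^ a4, r34))                # GC knows both child keys
k34_u3 = path_keys(k3, [a3], [H(k4 ^ a4, r34)])[0]  # U3 uses the blinded factor
assert k34_gc == k34_u3
```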
3.2 Adding to Group
Single Member Addition to the Group When the GC receives a joining request from a new member Ui with which leaf node ni is to be associated, the GC assigns a randomly chosen individual key ki to Ui . To insert ni into the tree, the GC searches the nearest leaf node nj from the root to keep the height of the tree as low as possible. The GC replaces node nj with a new node np , and ni and nj are attached as np ’s children. An example is depicted in Figure 3. Let U6 be a new member and node n5 be the shallowest node. Node n56 becomes the new parent of leaves n5 and n6 . U6 is
Fig. 3. U6 is added to the group
placed in leaf node n6, which is the right child node of node n56. Key k6 is assigned to leaf n6. In order to guarantee backward secrecy, the GC has to update all KGKs on the path from n56 to the root. The GC chooses random values (R, r56, r'58, r'18), sets R = a'5 = a'6 = a56 = a'78 = a'14 = a'58, and generates new KGKs as follows:

k56 = H(k5 ⊕ a'5, H(k6 ⊕ a'6, r56)) = H(k6 ⊕ a'6, H(k5 ⊕ a'5, r56))
k'58 = H(k56 ⊕ a56, H(k78 ⊕ a'78, r'58)) = H(k78 ⊕ a'78, H(k56 ⊕ a56, r'58))
k'18 = H(k14 ⊕ a'14, H(k'58 ⊕ a'58, r'18)) = H(k'58 ⊕ a'58, H(k14 ⊕ a'14, r'18))
The GC generates a unicast message [k6] for member U6 and a multicast message [R, H(k5 ⊕ a'5, r56), H(k6 ⊕ a'6, r56), H(k56 ⊕ a56, r'58), H(k78 ⊕ a'78, r'58), H(k14 ⊕ a'14, r'18), H(k'58 ⊕ a'58, r'18)].

Multiple Member Addition to the Group. Several new members may be added to the group simultaneously. They are positioned as right child nodes of the nodes which are the shallowest leaves in the current tree. See Figure 4 for an example. If members U2 and U3 are added to the group, they are attached to nodes n12 and n34, and receive individual keys k2 and k3, respectively. The GC chooses random values (R, r12, r34, r'14, r'18), sets R = a'1 = a2 = a3 = a'4 = a12 = a34 = a'14 = a'58, and generates new KGKs as follows:

k12 = H(k1 ⊕ a'1, H(k2 ⊕ a2, r12)) = H(k2 ⊕ a2, H(k1 ⊕ a'1, r12))
k34 = H(k3 ⊕ a3, H(k4 ⊕ a'4, r34)) = H(k4 ⊕ a'4, H(k3 ⊕ a3, r34))
k'14 = H(k12 ⊕ a12, H(k34 ⊕ a34, r'14)) = H(k34 ⊕ a34, H(k12 ⊕ a12, r'14))
k'18 = H(k'14 ⊕ a'14, H(k58 ⊕ a'58, r'18)) = H(k58 ⊕ a'58, H(k'14 ⊕ a'14, r'18))
Fig. 4. U2 and U3 are added to the group
The GC generates a unicast message [k2] for member U2, [k3] for member U3, and a multicast message [R, H(k1 ⊕ a'1, r12), H(k2 ⊕ a2, r12), H(k3 ⊕ a3, r34), H(k4 ⊕ a'4, r34), H(k12 ⊕ a12, r'14), H(k34 ⊕ a34, r'14), H(k'14 ⊕ a'14, r'18), H(k58 ⊕ a'58, r'18)].
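To make the join re-keying concrete, here is a minimal GC-side sketch of the single-join case of Fig. 3, under the same illustrative SHA-256 instantiation of h as in the earlier sketches: one fresh R replaces all affected auxiliary keys, fresh r' values are drawn for the updated nodes, and only R and blinded factors are broadcast, with nothing encrypted.

```python
import hashlib
import secrets

def h(x: bytes) -> int:
    return int.from_bytes(hashlib.sha256(x).digest(), "big")

def H(x: int, y: int) -> int:
    return h(x.to_bytes(32, "big")) ^ y

rand = lambda: secrets.randbelow(2**256)

k5, k78, k14 = rand(), rand(), rand()   # keys already in the tree (Fig. 2/3)
k6 = rand()                             # new member U6's individual key (unicast)

R = rand()                              # R = a5' = a6' = a56 = a78' = a14' = a58'
r56, r58, r18 = rand(), rand(), rand()  # fresh r values for the updated nodes

k56 = H(k5 ^ R, H(k6 ^ R, r56))         # new node key above U5 and U6
k58 = H(k56 ^ R, H(k78 ^ R, r58))       # updated k58'
k18 = H(k14 ^ R, H(k58 ^ R, r18))       # updated group key k18'

multicast = [R,
             H(k5 ^ R, r56), H(k6 ^ R, r56),
             H(k56 ^ R, r58), H(k78 ^ R, r58),
             H(k14 ^ R, r18), H(k58 ^ R, r18)]

# U5, for example, recovers the same group key from k5 and the broadcast alone:
u5_k56 = H(k5 ^ R, multicast[2])
u5_k58 = H(u5_k56 ^ R, multicast[4])
assert H(u5_k58 ^ R, multicast[6]) == k18
```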
3.3 Deleting from Group
Single Member Deletion from the Group. When the member U_i associated with the leaf node n_i is deleted from the group, the member assigned to the sibling of n_i is reassigned to the parent node n_p of n_i. Moreover, all keys known to U_i and the auxiliary key of the sibling member should be updated to guarantee forward secrecy. For example, as shown in Figure 5, when U6 is deleted from the group, nodes n6 and n56 are removed and the leaf node n5 is promoted to its parent's place. To update k58 and k18, the GC chooses random values (R, r'58, r'18), sets R = a'5 = a'78 = a'14 = a'58, and computes the KGKs as follows:

k'58 = H(k5 ⊕ a'5, H(k78 ⊕ a'78, r'58)) = H(k78 ⊕ a'78, H(k5 ⊕ a'5, r'58))
k'18 = H(k14 ⊕ a'14, H(k'58 ⊕ a'58, r'18)) = H(k'58 ⊕ a'58, H(k14 ⊕ a'14, r'18))

The GC generates a multicast message [R, H(k5 ⊕ a'5, r'58), H(k78 ⊕ a'78, r'58), H(k14 ⊕ a'14, r'18), H(k'58 ⊕ a'58, r'18)]. Note that in deletion operations, other previous schemes need to encrypt/decrypt a node key. However, there is no encryption/decryption process for re-keying in our method.
Multiple Member Deletion from the Group. This case is handled similarly to single member deletion. The deleted nodes are removed and the tree is adjusted accordingly. For example, as shown in Figure 6, when members U2 and U3 are deleted from the group, nodes n2, n3, n12, and n34 are removed and leaf nodes n1 and n4 are promoted to their respective parents' places n12
Fig. 5. U6 is deleted from the group
and n34. The GC chooses (R, r'14, r'18), sets R = a'1 = a'4 = a'14 = a'58, and updates the KGKs as follows:
Fig. 6. U2 and U3 are deleted from the group
k'14 = H(k1 ⊕ a'1, H(k4 ⊕ a'4, r'14)) = H(k4 ⊕ a'4, H(k1 ⊕ a'1, r'14))
k'18 = H(k'14 ⊕ a'14, H(k58 ⊕ a'58, r'18)) = H(k58 ⊕ a'58, H(k'14 ⊕ a'14, r'18))

The GC generates a multicast message [R, H(k1 ⊕ a'1, r'14), H(k4 ⊕ a'4, r'14), H(k'14 ⊕ a'14, r'18), H(k58 ⊕ a'58, r'18)].
3.4 A Variant Using a Bulletin Board
If our scheme is allowed to use a public access location where the GC can publish the blinded factors corresponding to all KGKs and all auxiliary keys a_i, we can gain several advantages. We refer to this location as the bulletin board. The amount of public data on the bulletin board is (4n − 4)K, where K is the length of a key. For the example of Figure 2, the public data is [a1, ..., a58, H(k1 ⊕ a1, r12), H(k2 ⊕ a2, r12), H(k3 ⊕ a3, r34), H(k4 ⊕ a4, r34), H(k5 ⊕ a5, r56), H(k6 ⊕ a6, r56), H(k7 ⊕ a7, r78), H(k8 ⊕ a8, r78), H(k12 ⊕ a12, r14), H(k34 ⊕ a34, r14), H(k56 ⊕ a56, r58), H(k78 ⊕ a78, r58), H(k14 ⊕ a14, r18), H(k58 ⊕ a58, r18)]. Using the bulletin board, each member can compute all node keys with only his/her individual key. This feature makes our scheme well suited to settings where the users have low-end devices with severe memory restrictions. Examples are environments where the multicast group consists of specific service users with their mobile phones, and the user module in the USIM card.
Furthermore, our scheme with the bulletin board is efficient and simple for stateless members. Consider a member who has not been able to update the group key for a long time. Whenever this member again wants to obtain the group key of the current session, he/she can easily compute the group key with only the individual key, the current blinded factors and his/her auxiliary key on the bulletin board, without keeping track of a history of regenerated keys. Even when the GC cannot use a bulletin board, for one stateless member only a broadcast message consisting of log2 n blinded factors and his/her current auxiliary key a_i is needed, without any encryption/decryption process. But in other schemes, such as EHBT and OFT, when a member is deleted, the individual key of the sibling is always changed. Since the GC cannot know how long each member has been off-line, the GC must keep track of a history of regenerated keys. If the GC used a bulletin board in the other schemes, the amount of public data on the bulletin board would be almost n · log2 n · K: since in those schemes all multicast messages are encrypted, the amount of data is the height of the tree times the number of users.
4 Evaluation
The performance of schemes based on tree construction mainly depends on the height of the tree. In our scheme, to keep the height as low as possible, new members are added at the leaves closest to the root. We compare our algorithm with the other algorithms introduced in Section 1: HBT [2], OFT [5], and EHBT [9]. We assume that the trees used by HBT, OFT, and EHBT are all binary. Because the broadcast size increases with the branching ratio of the trees, a binary tree minimizes the communication costs. Tables 1, 2, 3 and 4 summarize our comparisons, focusing on multicast size, unicast size and the computation of the GC and a member. We use the following notation for the analysis.
number of member in the gruop height of a tree size of a key in bits size of a key index in bits (I < K) key generation one-way hash function execution xor operation encryption/decryption operation in symmetric encryption scheme number of node keys that must be recomputed and multicasted when l members are added to (deleted from) the group
In our scheme the GC and all members do not need any encryption/decryption process of a re-keying message to update KGKs. All computation needed for re-keying is O(log2 n) applications of one-way hash function and XOR operation, and all information needed for re-keying is in the current transmission and the initial information. Though the size of re-keying messages in our scheme is larger
508
Ju Hee Ki et al.
Table 1. Single member is added to the group Computation
HBT OFT
Size of re-keying message
GC
Join member
Sibling
(d + 1)R + (2d + 1)E
(d + 1)E
dE
Join Unicast
Multicast
K
2dK
(d + 1)K
dK
R + d(H + X) + (2d + 1)E (d + 1)E + d(H + X) E + d(H + X)
EHBT
R + (d + 1)(X + H + E)
Our Scheme (d + 1)R + d(3X + 2H) + E
Table 2.
(d + 1)E
(d + 1)(X + H)
(d + 1)K
dI
E + d(H + 2X)
d(H + 2X)
K
(2d + 1)K
l members are added to the group Computation
Size of re-keying message
GC
Join member
Sibling
Join Unicast
Multicast
HBT
Nl R + 2(Nl − l)E
(d + 1)E
dE
lK
2Nl K
OFT
lR + (Nl − 1)H + (Nl − l)X l(d + 1)K
Nl K
EHBT
lR + l(d + 1)E + Nl (X + H)
(d + 1)E
(d + 1)(X + H)
l(d + 1)K
Nl I
Our Scheme
Nl R + (Nl − l)(3X + 2H) + lE
E + d(H + 2X)
d(H + 2X)
lK
(2Nl + 1)K
+(Nl + l − 1)E
(d + 1)E + dH + dX E + d(H + X)
Table 3. Single member is deleted to the group Computation GC
Sibling
Size of re-keying message Max. other member Sibling (except sibling)
Multicast
Unicast
HBT
dR + 2dE
dE
(d − 1)E
K
2dK
OFT
R + d(H + X + E)
E + d(H + X)
(d − 1)(E + H + X)
K
dK
EHBT
d(X + H) + (d − 1)E
d(H + X)
(d − 1)E
I
dK
Our Scheme
dR + (d − 1)(3X + 2H)
d(H + 2X)
(d − 1)(H + 2X)
K
(2d + 1)K
Table 4. l members are deleted to the group Computation GC
Sibling
Size of re-keying message Max. other member Sibling (except sibling)
HBT OFT
Nl (R + E)
dE
(d − 1)E
lR + (Nl − 1)(H + X + E) dE + (d − 1)(H + X) (d − 1)(E + H + X)
Multicast
Unicast K
2Nl K
K
Nl K
EHBT
Nl (X + H) + (Nl − l)E
d(H + X)
(d − 1)E
I
Nl K
Our Scheme
(Nl − l)(3X + 2H)
d(H + 2X)
(d − 1)(H + 2X)
K
(2Nl + 1)K
than in HBT, OFT and EHBT, our scheme is a very efficient method with respect to the computation of the GC and the group members, and a stateless receiver U_i needs only the broadcast message consisting of the current d blinded factors and auxiliary keys a_i.
5 Conclusion
We have presented and analyzed a new and efficient algorithm for managing keys in dynamic large groups, especially well suited to stateless receivers. The group controller computes blinded factors with which the group members change their KGKs distributively. The only computation a member has to do is one-way hash function executions and XOR operations applying his/her secret to the re-keying message. Also, there is no risk of disclosure of keys by any collusion of deleted members, since our scheme is based on the security of Theorem 1. To the best of our knowledge, our scheme is the first one that works for stateless receivers and provides both backward and forward secrecy. If our scheme is allowed to use a public access location, each member can compute all needed node keys with only one individual key and without any multicast message.
References
[1] R. Canetti, J. Garay, G. Itkis, D. Micciancio, M. Naor, and B. Pinkas. Multicast Security: A Taxonomy and Some Efficient Constructions. In Proc. of INFOCOM '99, 1999. 498
[2] G. Caronni, M. Waldvogel, D. Sun, and B. Plattner. Efficient Security for Large and Dynamic Multicast Groups. In Workshop on Enabling Technologies (WETICE '98), IEEE Computer Society Press, 1998. 498, 507
[3] I. Chang, R. Engel, D. Kandlur, D. Pendarakis, and K. Saha. Key Management for Secure Internet Multicast Using Boolean Function Minimization Techniques. In IEEE INFOCOM, March 1999. 498
[4] M. Karnaugh. The Map Method for Synthesis of Combinational Logic Circuits. Transactions AIEE, Communications and Electronics, Vol. 72, pp. 593-599, November 1953. 498
[5] D. A. McGrew and A. T. Sherman. Key Establishment in Large Dynamic Groups Using One-Way Function Trees. Technical Report No. 0755, TIS Labs at Network Associates, Inc., Glenwood, MD, May 1998. 498, 507
[6] D. Naor, M. Naor and J. Lotspiech. Revocation and Tracing Schemes for Stateless Receivers. Advances in Cryptology – CRYPTO 2001, Lecture Notes in Computer Science, vol. 2139, pp. 41-62, 2001. 499
[7] A. Perrig, D. Song, and J. D. Tygar. ELK, a New Protocol for Efficient Large-Group Key Distribution. In 2001 IEEE Symposium on Security and Privacy, Oakland, CA, USA, May 2001. 498
[8] C.-S. Park and D. H. Lee. Secure and efficient key management for dynamic multicast groups. Operating Systems Review, ACM, Vol. 35, No. 4, pp. 32-38, Oct. 2001. 498
[9] S. Rafaeli, L. Mathy, and D. Hutchison. EHBT: An efficient protocol for group key management. Proc. of 3rd Intl. COST264 Workshop on Networked Group Communication (NGC 2001), Lecture Notes in Computer Science 2233, Springer, pp. 159-171, 2001. 498, 507
[10] M. Steiner, G. Tsudik, and M. Waidner. Key Agreement in Dynamic Peer Groups. IEEE Transactions on Parallel and Distributed Systems, March 2000. 498
[11] D. Wallner, E. Harder, and R. Agee. Key Management for Multicast: Issues and Architectures. RFC 2627, June 1999. 498
Fingerprint Verification System Involving Smart Card

Younhee Gil, Daesung Moon, Sungbum Pan, and Yongwha Chung

Information Security Research Division
Electronics and Telecommunications Research Institute
161 Kajong-dong, Yusong-gu, Taejon City, The Republic of Korea
{yhgil,daesung,sbpan,ywchung}@etri.re.kr
http://www.etri.re.kr
Abstract. The fingerprint has been used as a biometric for user identification and authentication because of characteristics such as its uniqueness, in that no two fingerprints from different fingers are identical, and its permanence. Traditional computer-aided fingerprint identification or verification systems have involved a central database of fingerprints. However, retaining a central database can cause critical fingerprint leakage. Therefore, it has been proposed to decentralize the fingerprints into individual storage devices such as smart cards. Also, to gain maximum security in the system, the fingerprint verification computation, as well as the storage of the fingerprint, has to take place in the smart card, in order to keep the fingerprint from being streamed outside the card. We call a system with this mechanism a Match-on-Card system. In this paper, we introduce the Match-on-Card system and explain a fingerprint verification algorithm using a multi-resolution accumulator array that can be executed in an environment as restricted as a smart card.
1 Introduction
Biometrics has been used to identify or verify a person for several reasons. It relies on personal biological or behavioral characteristics [1,3]. Therefore, there is no concern about their being lost, stolen, or forgotten, in contrast to the traditional methods, i.e., PINs, passwords, and ID cards. Also, using biometrics to identify a person can inherently differentiate between a verified person and a fraudulent imposter. Examples of biometric signs are fingerprint, face [2], iris [10], hand vascular pattern [11] and speech. Biometrics used for person identification has to satisfy four requirements: universality, uniqueness, permanence, and collectability [2]. Fingerprints are the oldest biometric sign of identity. The flow pattern of ridges in a fingerprint is unique to the person, in that no two fingerprints from different fingers are identical. In addition, fingerprints are invariant with time. For these reasons, the fingerprint has become one of the most widely used biometrics [4-9].
In typical biometric verification systems, the biometric patterns are often stored in a central database. With central storage of the biometric patterns, there are open issues of misuse of the biometric pattern, such as the "Big Brother" problem. To solve these open issues, the database can be decentralized into millions of smart cards [12-22]. However, most of the current implementations of this solution have the common characteristic that the biometric verification process is accomplished entirely outside the smart card. Such a system is called Store-on-Card [17], because the smart card is used only as a storage device for the biometric pattern. For example, in a fingerprint-based Store-on-Card, the fingerprint pattern stored in the smart card needs to be insecurely released into an external card reader to be compared with an input fingerprint pattern. To heighten the security level, the verification operation needs to be performed by the in-card processor, not the external card reader. This system is called Match-on-Card [17], because the verification operation is executed on the smart card. Note that the standard PCs on which typical biometric verification systems have been executed have a 1 GHz CPU and 128 Mbytes of memory. By contrast, a state-of-the-art smart card can employ at most a 50 MHz CPU, 64 Kbytes of ROM, 32 Kbytes of EEPROM, and 8 Kbytes of RAM. Therefore, typical biometric verification algorithms may not execute successfully on the smart card. In this paper, we present a minutiae-based Match-on-Card system that can be executed in real time in resource-constrained environments such as the smart card. To meet the processing power and memory space specifications of the smart card, we develop a data structure, called a multi-resolution accumulator array, with which the same amount of memory space is required at each resolution. Based on the experimental results, we confirmed that the memory requirement of the proposed algorithm is about 6.8 Kbytes, and the Equal Error Rate (EER) is 6.0%. Also, as far as we know, this is the first experimental study of the relationship among execution time, memory requirement, and accuracy of a fingerprint verification algorithm. The rest of the paper is structured as follows. Section 2 explains user verification using fingerprints, including minutiae extraction, the step preceding minutiae-based fingerprint verification, and the minutiae matching algorithm. Section 3 describes the smart card based fingerprint verification system and the proposed minutiae matching algorithm suited to resource-constrained environments. The experimental results are given in Section 4, and we conclude in Section 5.
2 User Verification Using Fingerprint
It is widely known that professional fingerprint examiners rely on details of ridge structures to make fingerprint identifications [23]. The topological structure of a fingerprint is unique, and invariant with aging and impression deformations [23]. This implies that fingerprint authentication can be based on the matching of structural patterns. Generally, the structural features used in fingerprint identification are the points where a ridge ends and where a ridge bifurcates, which are called minutiae. Fig. 1 shows fingerprint images and the two types of minutiae, pointed at by white arrows. The minutia in Fig. 1(a) is a bifurcation, and that in Fig. 1(b) is an ending point.
Fig. 1. Examples of fingerprint images and their features, pointed at by arrows. The features are called minutiae, and consist of bifurcations and ending points. (a) bifurcation (b) ending point
Fig. 2. Fingerprint Verification System
As automatic fingerprint identification and authentication systems rely on representing the two most prominent minutiae types, i.e., bifurcations and ridge endings, a reliable minutiae extraction algorithm is critical to the performance of the system. The performance of the minutiae extraction algorithm depends heavily on the quality of the input fingerprint images. Fig. 2 shows a fingerprint verification system, which consists of two phases: enrollment and verification. In the off-line enrollment phase, an enrolled fingerprint image is preprocessed, and the minutiae are extracted and stored. In the on-line verification phase, the similarity between the enrolled minutiae and the input minutiae is examined. In general, there are three steps involved in the verification process: image preprocessing, minutiae extraction, and minutiae matching. Image preprocessing refers to the refinement of the fingerprint image against the image distortion obtained from a fingerprint sensor. Minutiae extraction refers to the extraction of features from the fingerprint image. After this step, the detected minutiae are stored in a pattern file, which includes the position, direction, and type of each minutia.
Based on the minutiae, the input fingerprint is compared with the enrolled fingerprint. Minutiae matching is composed of an alignment stage and a matching stage. In order to match two fingerprints captured with unknown direction and position, the differences of direction and position between the two fingerprints must be detected, and alignment between them needs to be accomplished. Therefore, in the alignment stage, transformations such as translation and rotation between the two fingerprints are estimated, and the two sets of minutiae are aligned according to the estimated parameters. If the alignment is performed accurately, the matching stage reduces simply to point matching. In the matching stage, two minutiae are compared based on their position, direction, and type, and a matching score is computed. Our representation is minutiae-based, and each minutia is described by its position (x, y coordinates), the direction in which it flows, and its type (ridge ending or bifurcation).
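A minimal sketch of this per-minutia representation follows; the field names and the angle convention (radians) are illustrative choices of ours, not fixed by the paper.

```python
from dataclasses import dataclass
from enum import Enum

class MinutiaType(Enum):
    RIDGE_ENDING = 0
    BIFURCATION = 1

@dataclass
class Minutia:
    x: int                 # horizontal position in pixels
    y: int                 # vertical position in pixels
    angle: float           # ridge flow direction, in radians
    kind: MinutiaType

# The two sets fed to the matcher are then simply lists of minutiae:
enrolled: list[Minutia] = []
captured: list[Minutia] = []
```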
2.1 Minutiae Extraction
The minutiae extraction algorithm in our system mainly consists of three components, as listed in Fig. 3: 1) generation of the direction map, 2) binarization of the fingerprint image, 3) detection of minutiae. Besides these steps, it is critical to be able to analyze the image and determine areas that are degraded and likely to cause problems, because the image quality of a fingerprint may vary. Several characteristics can be measured that convey information regarding the quality of localized regions in the image. These include detecting regions of low contrast, low ridge flow, and high curvature, and determining the directional flow of ridges. Using this information, the unstable areas of the image where minutiae detection is unreliable can be distinguished.
Fig. 3. Minutiae Extraction Process
2.1.1 Generation of Direction Map. One of the fundamental steps in the minutiae extraction process is deriving a directional ridge flow map. The purpose of this map is to represent the areas of the image with sufficient ridge structure. Well-formed and clearly visible ridges are essential to reliably detecting minutiae. In addition, the direction map records the general direction of the ridges as they flow across the image.
2.1.2 Binarization of Image. As our minutiae detection algorithm works on a bi-level image, every pixel in the grayscale input image must be binarized. A pixel is assigned a binary value based on the ridge flow direction associated with the block the pixel is within. In order to determine whether the current pixel should be set to black or white, the pixel intensities of a 7×9 pixel grid surrounding the current pixel, rotated according to the pixel's ridge orientation, are analyzed. Grayscale pixel intensities are accumulated along each rotated row in the grid, forming a vector of row sums. The binary value to be assigned to the center pixel is determined by multiplying the center row sum by the number of rows in the grid and comparing this value with the accumulated grayscale intensities of the entire grid. If the multiplied center row sum is less than the grid's total intensity, then the center pixel is set to black; otherwise, it is set to white (see the sketch following Sect. 2.1.3).

2.1.3 Detection of Minutiae. Before minutiae are detected, the binarized image is thinned, and the detection step scans the thinned image with kernels that detect minutiae, yielding candidate minutiae points. Usually, many false minutiae are included in the candidate list; therefore, their removal is necessary to increase the performance of the fingerprint verification system. This step includes removing islands, lakes, holes, minutiae in regions of poor image quality, side minutiae, hooks, overlaps, minutiae that are too wide, and minutiae that are too narrow.
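The following is a minimal sketch of the binarization rule of Sect. 2.1.2, assuming the 7×9 grid has already been sampled along the local ridge orientation; the rotation and sampling themselves are omitted, and the grid orientation (9 rows × 7 columns) is our assumption.

```python
import numpy as np

ROWS, COLS = 9, 7  # the 7x9 grid around the centre pixel

def binarize_pixel(grid: np.ndarray) -> int:
    """grid: ROWS x COLS grey values, already rotated to the ridge direction.

    Returns 0 (black) if the scaled centre row sum is below the grid's total
    intensity, else 255 (white), as described in Sect. 2.1.2.
    """
    row_sums = grid.sum(axis=1)           # accumulate intensities per rotated row
    center = row_sums[ROWS // 2] * ROWS   # centre row sum times the row count
    return 0 if center < grid.sum() else 255
```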
Fig. 4. Minutiae Extraction Result. (a),(d) Direction map superimposed on the original image. (b),(e) Binarized image. (c),(f) Extracted minutiae superimposed on the original image
The results of our minutiae extraction algorithm are shown in Fig. 4. The upper row shows the results for Fig. 1(a), and the lower row those for Fig. 1(b). Each row includes the direction map superimposed on the original image, the binarized image, and the extracted minutiae, represented as small black dots superimposed on the original image.
2.2 Minutiae Matching Algorithm
As mentioned before, minutiae matching is composed of an alignment stage and a matching stage. The input to the alignment stage consists of two sets of minutiae points, P and Q, extracted from the input and enrolled fingerprint images, respectively. For the purpose of explanation, we define the following notation:

P = {(p_x^1, p_y^1, α^1), ..., (p_x^{|P|}, p_y^{|P|}, α^{|P|})},  Q = {(q_x^1, q_y^1, β^1), ..., (q_x^{|Q|}, q_y^{|Q|}, β^{|Q|})}    (1)
where (p_x^i, p_y^i, α^i) and (q_x^i, q_y^i, β^i) are the three features (spatial position and direction) associated with the ith minutia in the sets P and Q, respectively. We assume that the second fingerprint image can be obtained by applying a similarity transformation (rotation and translation) to the first image. We discretize the set of all possible transformations, and the matching score is computed for each transformation. The transformation having the maximal matching score may be the correct one. Let us consider a transformation
F_{θ,Δx,Δy}, where

( x' )   (  cos θ   sin θ ) ( x )   ( Δx )
( y' ) = ( −sin θ   cos θ ) ( y ) + ( Δy )        (2)
and θ and (Δx, Δy) are the rotation and translation parameters, respectively. The space of transformations consists of triples (θ, Δx, Δy), where each parameter is discretized into a finite set of values:
θ ∈ {θ_1, ..., θ_L}, Δx ∈ {Δx_1, ..., Δx_M}, and Δy ∈ {Δy_1, ..., Δy_N}, where L, M and N are the numbers of discretized parameters, i.e., the numbers of bins along the rotation and translation parameter axes. The alignment parameters for the transformations are collected in the accumulator array A, where the entry A(l,m,n) counts the evidence for the transformation F_{θ_l,Δx_m,Δy_n}. For each pair (p, q), where p is a point in the set P and q is a point in the set Q, we find all possible transformations that can map p to q; the evidence for these transformations in the array A is then incremented. In this straightforward implementation of the alignment stage, the required memory space for the accumulator array A is O(LMN). If L, M and N are 64, 128 and 128, respectively, the memory space of the accumulator array A is 1,048,576 bytes. This cannot be accommodated on the smart card. Therefore, a different implementation of the accumulator array is needed to reduce the required memory space.
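A minimal sketch of this straightforward voting follows. The bin ranges, and the choice to derive the rotation directly from the minutiae direction difference, are illustrative assumptions of ours; the point is the O(LMN) array that the next section replaces.

```python
import math
import numpy as np

L, M, N = 64, 128, 128  # bins along theta, dx, dy (as in the text)

def vote(P, Q, x_range=256, y_range=256):
    """P, Q: lists of (x, y, angle) minutiae. Returns the accumulator array A."""
    A = np.zeros((L, M, N), dtype=np.uint16)  # O(LMN) memory -- the bottleneck
    for (px, py, pa) in P:
        for (qx, qy, qa) in Q:
            # rotation mapping p's direction onto q's direction
            t = (qa - pa) % (2 * math.pi)
            l = int(t * L / (2 * math.pi)) % L
            # translation fixed by equation (2): q = R(t) p + (dx, dy)
            dx = qx - (px * math.cos(t) + py * math.sin(t))
            dy = qy - (-px * math.sin(t) + py * math.cos(t))
            m = int((dx + x_range) * M / (2 * x_range))
            n = int((dy + y_range) * N / (2 * y_range))
            if 0 <= m < M and 0 <= n < N:
                A[l, m, n] += 1
    return A

# The maximum bin approximates the alignment parameters:
# l, m, n = np.unravel_index(np.argmax(A), A.shape)
```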
3 Smart Card-Based Fingerprint Verification System
Fig. 5 shows the smart card emulator [13] we are developing for the fingerprint-based Match-on-Card. Table 1 shows the system specification of the smart card that we are developing for the fingerprint-based Match-on-Card. The smart card can employ a 50 MHz CPU, 64 Kbytes of ROM, 32 Kbytes of EEPROM, and 8 Kbytes of RAM. To assign the verification steps to the smart card and the card reader, we first evaluated the resource requirements of each step. Gil et al. [20] reported that the preprocessing and extraction steps cannot be executed in resource-constrained environments such as a smart card. Thus, we determined that the minutiae matching step should be executed on the smart card. Note that the minutiae matching step (the alignment and matching stages), which computes the similarity between the enrolled minutiae and the input minutiae, is executed on the Match-on-Card, whereas the image preprocessing and minutiae extraction steps are executed on the card reader. Fig. 6 shows the fingerprint-based Match-on-Card system. In the off-line enrollment phase, an enrolled fingerprint image is preprocessed, and the minutiae are extracted and stored. In the on-line verification phase, the minutiae extracted from an input fingerprint are transferred to the smart card. Then, the similarity between the enrolled minutiae and the input minutiae is examined in the smart card. In the following section, we focus on the minutiae matching algorithm for the resource-constrained environment.
Fig. 5. Next generation smart card emulator we are developing
Table 1. System Specification of the Smart Card
CPU      32-bit RISC (ARM7TDMI)
ROM      64 Kbytes
RAM      8 Kbytes
EEPROM   32 Kbytes
Fig. 6. Fingerprint-based Match-on-Card system
3.1 Memory-Efficient Fingerprint Matching Algorithm
Fig. 7 shows the proposed memory-efficient algorithm using a multi-resolution accumulator array. The following terminology is defined to describe the computation flow of our algorithm using an accumulator array:

– depth (d): the number of levels
– unit angle (ua): the reference angle of the accumulator array at each level
– unit x (ux): the reference distance along the horizontal axis of the accumulator array at each level
– unit y (uy): the reference distance along the vertical axis of the accumulator array at each level
– maximum bin: the most accumulated position in the accumulator array (denoted as shown in Fig. 7)
For simplicity, it is assumed in the description of the proposed algorithm that the depth d is set to 3 and the accumulator array only considers the spatial parameters (ux, uy). In the 3rd level (with the coarsest resolution, considering the range Y and the unit size (uy) y), the maximum bin of the accumulator array, which approximates the alignment parameters, is found. To obtain more exact alignment parameters using the same memory space, the proposed algorithm iterates the same process at a finer resolution, with range Y/2 and unit size y/2, around the position found in the 3rd level. Finally, in the 1st level (with the finest resolution, range Y/4 and unit size y/4), the exact alignment parameters are found.
Fig. 7. Computation flow of the proposed algorithm
More details about the algorithm appear in [22]. Note that the memory requirement of the proposed algorithm is the same at each level. For example, if L, M and N are 64, 128 and 128, respectively, the memory space of the accumulator array A is (64/2^{3−1}) × (128/2^{3−1}) × (128/2^{3−1}) (= 16,384) bytes. Likewise, in levels 2 and 1, our algorithm uses (32/2^{2−1}) × (64/2^{2−1}) × (64/2^{2−1}) (= 16,384) bytes and (16/2^{1−1}) × (32/2^{1−1}) × (32/2^{1−1}) (= 16,384) bytes. Therefore, the required memory space of the proposed algorithm is O(LMN/2^{3(d−1)}). Since d can be set such that 2^{3(d−1)} = L, the required memory space of the proposed algorithm is O(MN), whereas the straightforward implementation requires memory space O(LMN). Note that the number of instructions of the typical algorithm is O(|P||Q|), whereas that of the proposed algorithm is about O(d|P||Q|).
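The refinement loop itself is short. Below is a minimal one-dimensional sketch of the idea of Fig. 7, with our own names: a fixed-size array of bins is reused at every level, halving both the search range and the unit size around the previous maximum bin; the real algorithm applies the same loop jointly over the (ux, uy) axes.

```python
def refine(score, lo, hi, bins=16, depth=3):
    """score(v): votes for a candidate parameter value v (a stand-in for the
    accumulator count at that bin). Returns the refined parameter estimate."""
    best = (lo + hi) / 2
    for _ in range(depth):
        width = (hi - lo) / bins                       # current unit size
        centers = [lo + (i + 0.5) * width for i in range(bins)]
        best = max(centers, key=score)                 # maximum bin, this level
        half = (hi - lo) / 4                           # next level: half range,
        lo, hi = best - half, best + half              # half unit, same bins
    return best

# e.g. locating a peak at 37.25 on [0, 256) with a 16-bin array per level
estimate = refine(lambda v: -abs(v - 37.25), 0.0, 256.0)
assert abs(estimate - 37.25) < (256 / 16) / 2 ** 2    # within the final unit size
```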
4 Experimental Results
We present our experimental results on the performance of the proposed fingerprint verification algorithm in terms of the number of instructions, the memory requirement, and the EER. We measured the number of instructions and the memory requirement
using a simulator that models the behavior of a 32-bit RISC processor (ARM7TDMI), provided by Dynalith [26]. This simulator, named iSAVE, can simulate a given algorithm, calculating the total number of instructions and the heap and stack sizes allocated during execution.

4.1 Test Environment
We tested our fingerprint verification algorithm on fingerprint images captured with an optical scanner manufactured by SecuGen [24], which has a resolution of 500 dpi. The size of the captured fingerprint images was 248×292. The fingerprint test images were provided by NitGen [25], our project partner, for the performance analysis of our fingerprint verification algorithm. The image set is composed of four fingerprint images per finger from 100 individuals, for a total of 400 fingerprint images. When these images were captured, no restrictions were imposed on the position and direction of the fingers, and the captured images vary in quality, as shown in Fig. 8. Each fingerprint in the test set was matched against the others. A matching was labeled GENUINE if the matched fingerprint image was from the same finger as the template fingerprint image, and IMPOSTER otherwise. 600 GENUINE matchings were performed, i.e., 6 matchings per 4-image subset from the same finger. For the IMPOSTER matchings, one reference fingerprint image was chosen per 4-image subset, and only the reference images were used; thus a total of 9,900 IMPOSTER matchings was performed, i.e., each reference fingerprint image was tested against the other 99 reference fingerprint images (the counts are worked out below). We performed the GENUINE and IMPOSTER matchings with various parameters and evaluated not only the matching scores but also the required memory and the total number of instructions. Because the smart card has very limited resources (memory and processing power), the memory requirement as well as the execution time must be evaluated; from this information, the feasibility of Match-on-Card can be determined.
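For reference, the trial counts follow directly from the set sizes (this is our reading of the protocol just described):

\[
N_{\mathrm{GENUINE}} = 100 \times \binom{4}{2} = 100 \times 6 = 600,
\qquad
N_{\mathrm{IMPOSTER}} = 100 \times 99 = 9900 .
\]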
Fig. 8. Samples of the test image set
4.2 Effect of Depth
Table 2 shows the parameter sets used to examine how much memory space can be saved, and how the number of instructions changes, as the depth d increases; Table 3 shows the experimental results. As mentioned before, the simulator we
used is iSAVE [26], which models the behavior of a 32-bit RISC processor (ARM7TDMI). According to Table 3, the higher the depth d, the more instructions are needed but the less memory is required. Since limited memory is the critical resource for Match-on-Card, d should be set to at least 3 to realize the Match-on-Card system. Fig. 9 shows the distributions of the matching scores of the GENUINE and IMPOSTER matchings; the vertical axis represents the relative frequency of each score, and the horizontal axis represents the score, ranging from 0 to 100. As shown in Fig. 9, the three graphs have similar distributions, which means that the matching score is hardly affected by d.

Table 2. Parameter Sets Used for Matching
         d   ua   ux   uy
Test_1   1   2    4    4
Test_2   2   2    4    4
Test_3   3   2    4    4
Table 3. Number of Instructions and Memory Requirement under Various Depths

         Number of Instructions (Estimated Execution Time)   Memory Requirement (bytes)
Test_1   35,377,719                                          436,916
Test_2   40,891,096                                           20,486
Test_3   48,129,074                                            6,836
Fig. 9. Distributions of matching scores for (a) Test_1, (b) Test_2, and (c) Test_3; in each plot the vertical axis is the percentage of matchings and the horizontal axis the matching score (0 to 100), the thin black line is the IMPOSTER score distribution, and the thick gray line the GENUINE score distribution
4.3 Effects of Unit Size
Table 4 shows the number of instructions and the required memory space under various conditions when d is set to 3. It indicates that the memory requirement is reduced significantly as the unit size gets larger, although the error rate degrades slightly, as mentioned before. Therefore, (2,4,4) is a reasonable bin unit, satisfying the memory requirement of Match-on-Card with an acceptable error rate.
Table 4. Number of Instructions and Memory Requirement under Various Conditions of Unit Size

(ua, ux, uy)   Number of Instructions (Estimated Execution Time)   Memory Requirement (bytes) (Stack + Heap)
(1,1,1)        332,685,880                                         223,924 (2,460 + 221,464)
(1,2,2)        112,348,976                                          77,828 (2,460 + 75,368)
(2,2,2)         76,263,937                                          40,144 (2,460 + 37,684)
(2,4,4)         46,129,074                                           6,836 (2,460 + 4,376)
(4,2,2)         58,183,205                                          24,338 (2,460 + 21,878)
(4,4,4)         43,873,083                                           5,420 (2,460 + 2,960)
Table 5. EER under Various Conditions of Unit Size

(ua, ux, uy)   (1,1,1)   (1,2,2)   (2,2,2)   (2,4,4)   (4,2,2)   (4,4,4)   (4,8,8)
EER (%)        7.5       6.10      5.85      6.00      6.11      6.67      7.17
Table 5 shows the error rate (EER) under various conditions of the angle and coordinate units when d is set to 3. According to Table 5, the error rate does not worsen even if the bin unit is set to (2,4,4). This is because units of 2 degrees and 4 pixels are more appropriate than 1 degree and 1 pixel: using 1 pixel as the unit of the accumulator array is unreasonably fine for an image with a resolution of 500 dpi, since one unit then corresponds to only about 0.05 mm, a negligible distance.
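Indeed, at 500 dpi one pixel corresponds to

\[
\frac{25.4\ \text{mm/inch}}{500\ \text{pixels/inch}} \approx 0.05\ \text{mm/pixel},
\]

so even a 4-pixel unit still resolves the alignment to roughly 0.2 mm.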
5 Conclusion
The smart card is a model of a very secure device, and biometrics is a promising technology for user verification; the two can be combined in many applications to enhance both security and convenience. However, typical biometric verification algorithms that run on standard PCs may not execute in real time in a resource-constrained environment. In this paper, we have presented a memory-efficient fingerprint verification algorithm that can be executed in real time on a smart card. To meet the processing-power and memory-space specifications of the smart card, we first evaluated the processing-power and memory requirements of each fingerprint verification step. We found that the memory requirement of the alignment stage was the most critical factor in implementing Match-on-Card. To reduce the memory requirement, we employ a small accumulator array; then, to compute the alignment parameters more accurately, we repeat the computation from a coarse-grain to a fine-grain resolution on that array.
The experimental evaluations were performed on the iSAVE with the NitGen fingerprint database, and the experimental results were very encouraging. Given 248×292 fingerprint images, the matching was completed in 0.9 seconds using 6.8 Kbytes of memory, whereas the same step was performed in 0.3 seconds using 400 Kbytes with the typical algorithm. Also, the accuracy (EER) of the proposed algorithm was comparable to that of the typical algorithm (6.0% vs. 3.8%). We are currently improving the accuracy further while maintaining this small memory requirement. We believe that the memory-efficient technique developed in this paper can offer a general framework for developing other biometric algorithms in resource-constrained environments such as smart cards.
References

1. Jain, A., Bolle, R., Pankanti, S.: Biometrics: Personal Identification in Networked Society. Kluwer Academic Publishers (1999)
2. Jain, L. et al.: Intelligent Biometric Techniques in Fingerprint and Face Recognition. CRC Press (1999)
3. Seto, Y.: Personal Authentication Technology using Biometrics. SICE, Vol. 37, No. 6 (1998) 395–401
4. Gamble, F., Frye, L., Grieser, D.: Real-time Fingerprint Verification System. Applied Optics, Vol. 31, No. 5 (1992) 652–655
5. Wilson, C., Watson, C., Paek, E.: Effect of Resolution and Image Quality on Combined Optical and Neural Network Fingerprint Matching. Pattern Recognition, Vol. 33, No. 2 (2000) 317–331
6. Lee, C., Wang, S.: Fingerprint Feature Extraction using Gabor Filters. Electronics Letters, Vol. 35, No. 4 (1999) 288–290
7. Jain, A., Hong, L., Bolle, R.: On-line Fingerprint Verification. IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 19, No. 4 (1997) 302–313
8. Isenor, D., Zaky, S.: Fingerprint Identification using Graph Matching. Pattern Recognition, Vol. 19, No. 2 (1986) 113–122
9. Fan, K., Liu, C., Wang, Y.: A Fuzzy Bipartite Weighted Graph Matching Approach to Fingerprint Verification. Proc. of the IEEE Int. Conf. on Systems, Man and Cybernetics (1998) 729–733
10. Lim, S., Lee, K., Byeon, O., Kim, T.: Efficient Iris Recognition through Improvement of Feature Vector and Classifier. ETRI Journal, Vol. 23, No. 2 (2001) 61–70
11. Lim, U.: A Direction-based Vascular Pattern Extraction Algorithm for Hand Vascular Pattern Verification. ETRI Journal
12. Dreifus, H., Monk, T.: Smart Cards. John Wiley & Sons (1997)
13. Kim, H. et al.: Specification for the Next-Generation IC Card System (in Korean). Technical Report, ETRI (2000)
14. Hachez, G., Koeune, F., Quisquater, J.: Biometrics, Access Control, Smart Cards: A Not So Simple Combination. Proc. 4th Working Conf. on Smart Card Research and Advanced Applications (2000) 273–288
15. Struif, B.: Use of Biometrics for User Verification in Electronic Signature Smartcards. Proc. of E-smart 2001, LNCS 2140 (2001) 220–227
16. Moon, Y., Ho, H., Ng, K., Wan, S., Wong, S.: Collaborative Fingerprint Authentication by Smart Card and a Trusted Host. Electrical and Computer Engineering, Vol. 1 (2000) 108–112
17. Janke, M.: FingerCard Project Presentation. http://www.finger-card.org (2001)
18. Kaku, N., Murayama, T., Yamamoto, S.: Fingerprint Authentication System for Smart Cards. Proc. of IFIP on E-commerce, E-business, E-government (2001) 97–112
19. Sanchez-Reillo, R., Sanchez-Avila, C.: Fingerprint Verification using Smart Cards for Access Control Systems. Proc. of Int. Carnahan Conf. (2001) 250–253
20. Gil, Y., Chung, Y., Ahn, D., Moon, J., Kim, H.: Performance Analysis of Smart Card-based Fingerprint Recognition for Secure User Authentication. Proc. of IFIP on E-commerce, E-business, E-government (2001) 87–96
21. Moon, D., Gil, Y., Pan, S., Chung, Y.: Performance Analysis of the Match-on-Card System for the Fingerprint Authentication. Proc. of the Second International Workshop on Information Security Applications, Vol. 2 (2001) 449–459
22. Pan, S., Gil, Y., Moon, D., Chung, Y., Park, C.: A Memory-Efficient Fingerprint Verification Algorithm using a Multi-Resolution Accumulator Array for Match-on-Card. ETRI Journal
23. Federal Bureau of Investigation: The Science of Fingerprints: Classification and Uses. U.S. Government Printing Office, Washington, D.C. (1984)
24. SecuGen, http://www.secugen.com
25. NitGen, http://www.nitgen.com
26. iSAVE, http://www.dynalith.com
A Fast Fingerprint Matching Algorithm Using Parzen Density Estimation

Choonwoo Ryu and Hakil Kim

Dept. of Automation Engineering, College of Engineering, Inha University
#253 Yonghyun-Dong, Nam-Ku, Incheon, Korea
[email protected]
[email protected]
Abstract. Minutiae-based fingerprint matching algorithms generally consist of two steps: alignment of the minutiae and search for the corresponding minutiae. This paper presents a triangular matching algorithm for fast alignment, in which the overall processing time can be cut down significantly by making a quick decision on the amounts of rotation and translation between a pair of fingerprint images. The alignment algorithm proposes a novel triangular data structure and utilizes Parzen density estimation. The proposed algorithm has been tested under a well-formed testing scenario on an Atmel fingerprint database and demonstrated promising improvements both in processing time and in recognition accuracy.
1 Introduction
The main task of fingerprint recognition is deciding whether two fingerprint images are from the same finger or not. The most widely adopted structural features of fingerprint ridges are endings and bifurcations, together called minutiae. Traditionally, it has been believed that the minutiae of fingerprints are the most discriminating features and the most easily extractable by digital image processing and computer vision techniques [1, 2, 3]. The matching algorithm in [4] uses a different triangular data structure to overcome geometric deformation, but it produces large feature templates because of dynamic time warping. An AFIS (Automated Fingerprint Identification System) implemented as a distributed system proposed a triangle-based matching algorithm for non-criminal applications [5]; however, this matching algorithm requires database management schemes and stores a larger amount of data in the database than other minutiae-based AFIS systems, and repeated evidence updating and pose transformation are requisite processes for it. The relaxation point pattern matching algorithm in [6] requires excessive matching time due to the iterative process of searching for corresponding pairs of minutiae between the two fingerprints. In this paper, an enhanced triangular matching algorithm and a Parzen density estimation method are proposed for fast triangular matching of a set of fingerprint minutiae in one-to-one and one-to-few applications. Compared to other
matching algorithms that use different local structures, the suggested triangular matching method shows better performance in retrieving pose information and less sensitivity to nonlinear deformations between an enrolled fingerprint and an attempted fingerprint. Furthermore, the Parzen density estimation method significantly reduces the number of pose transformations, which would otherwise be performed numerous times with different parameters for the same fingerprint matching pair.
2 Proposed Algorithm
The overall process of the algorithm is depicted in Fig. 1. It produces triangular minutia structures, named cliques, from the minutiae of a search and a file fingerprint, where the file fingerprint is the fingerprint registered in the enrollment process and the search fingerprint is the fingerprint presented for authentication. The geometry of a clique is shown in Fig. 2. A pair of cliques from the file and the search fingerprints is decided to be identical if all the elements of the cliques are within allowable ranges, as described in the next section. The amounts of translation and rotation are calculated from all the paired cliques. Conventional matching algorithms carry out a large number of minutiae matching processes, one for each possible translation and rotation candidate, and then select the highest score as the final matching score.
Fig. 1. Overview of the proposed algorithm
Fig. 2. Geometry of cliques over thinned fingerprint image
Table 1. Performance of the algorithm with various k's

k    Matching time (sec)   EER (%)
4    0.002426              3.04
6    0.008316              1.53
10   0.040981              1.53
20   0.204968              1.54
30   0.582429              1.53
However, a significant portion of the translation and rotation candidates are redundant, because they have similar values in genuine matching and meaningless values in impostor matching. On the contrary, the proposed algorithm effectively finds one or very few translation and rotation candidates using Parzen density estimation, which estimates the true parameters from the similarity of the candidates in genuine matching. Hence, it significantly reduces the number of minutiae matching processes while keeping the error rate unchanged.

2.1 Clique Data Structure
A clique [7] consists of three minutiae and is depicted by the circle passing through them, as shown in Fig. 2. There are numerous ways of combining three minutiae in a fingerprint. In this study, each minutia chooses its k nearest minutiae from the minutiae list and produces C(k,2) cliques, so the maximum number of cliques is n × C(k,2), where n is the number of minutiae in a fingerprint. Obviously, a larger k needs more memory space and computational power in the alignment stage while producing a more accurate estimate. Table 1 compares the matching time and the equal error rate (EER) under different k's (on a 100-fingerprint database). Our experiments confirm that a larger k causes a large increase in computational time with unnoticeable improvement in error rates; therefore, an appropriate k is to be chosen, trading off the error rate against the computational time. In our experiments, k is set to 6. As shown in Fig. 2, each clique contains nine geometric elements: the radius of the circle r, the biggest inner angle α, the next angle β in the clockwise sense, the ridge directions at the minutiae θa, θb, θc, and the types of the minutiae ζa, ζb, ζc. The elements of a clique in the file fingerprint are compared with those from the search fingerprint to find clique pairs in the next step (a sketch of such a clique record follows).
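A minimal C sketch of the nine-element clique record and of the clique count; the field names and the concrete encoding of angles and types are our own illustrative assumptions.

typedef struct {
    float r;                         /* radius of the circumscribing circle     */
    float alpha, beta;               /* biggest inner angle; next one clockwise */
    float theta_a, theta_b, theta_c; /* ridge directions at the three minutiae  */
    unsigned char zeta_a, zeta_b, zeta_c; /* minutia types (ending/bifurcation) */
} Clique;

/* Maximum number of cliques: n x C(k,2); for k = 6 each minutia yields
 * C(6,2) = 15 cliques. */
unsigned max_cliques(unsigned n, unsigned k)
{
    return n * (k * (k - 1) / 2);
}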
2.2 Clique Matching
Two cliques are decided to be identical if the following similarity conditions are satisfied. In equations (1) through (4), the superscript F denotes the file fingerprint, while the superscript S denotes the search fingerprint.

a) The radius r must be similar:

   min(r^F, r^S) / max(r^F, r^S) ≥ r_th    (1)
b) The angles α and β must be similar:

   |α^F − α^S| ≤ α_th,   |β^F − β^S| ≤ β_th    (2)
c) The minutiae angles θa, θb, θc must be similar:

   |θ_a^F − θ_a^S| ≤ θ_th,   |θ_b^F − θ_b^S| ≤ θ_th,   |θ_c^F − θ_c^S| ≤ θ_th    (3)
d) The minutiae types ζa, ζb, ζc must be the same. Defining T(ζ^F, ζ^S) = 1 if ζ^F ≡ ζ^S and 0 otherwise, we require

   T(ζ_a^F, ζ_a^S) = T(ζ_b^F, ζ_b^S) = T(ζ_c^F, ζ_c^S) = 1    (4)
Figure 3 shows example distributions of the ratio in (1) and of the difference values in (2) and (3). In this study, the thresholds in (1)–(3) are chosen from these distributions.

2.3 Parzen Density Estimation
Parzen density estimation is a nonparametric density estimation method; it does not require any assumption on the form of the probability density function, which is usually unknown in a real-world problem [8]. Let X = [∆x, ∆y, ∆θ]^T denote a vector representing the amount of translation (∆x, ∆y) and rotation (∆θ) between a pair of cliques. Its density function p_n(X) is defined in (5) below.
Fig. 3. Distributions of the radius ratio and angle difference values in genuine matching
Fig. 4. Implementation of Parzen density estimation
Here ϕ((X − X_i)/h_n) equals unity if the sample X_i falls within the hypercube of volume V_n centered at X, and is zero otherwise:

   p_n(X) = (1/n) Σ_{i=1}^{n} (1/V_n) ϕ((X − X_i)/h_n)    (5)

In this study, as shown in Fig. 4, overlapped Parzen windows are designed to search efficiently and robustly for the largest cluster, which represents the pose information for the best match between two fingerprints. Here w_x, w_y, and w_θ are the sizes of a Parzen window, and ∆w_x, ∆w_y, and ∆w_θ are the increments for the next window. Because the windows overlap, the proposed method is assured of finding the largest cluster; however, it may detect the same cluster in several overlapped windows. To prevent this multiple detection, the estimation process is followed by a merging process in which duplicated clusters are merged into a single cluster. The Parzen density estimation method is applied to search for the largest cluster in the space of X, which implies the most probable alignment information for matching the two fingerprints. As shown in the last row of Table 2, genuine matching generally yields a much larger number of X's than impostor matching. Most of the X's in genuine matching gather into a dense cluster representing the optimal alignment information for matching, whereas in impostor matching they rarely build a cluster because pairs of corresponding cliques occur randomly. Figure 5 compares the cluster distributions of X's in genuine matching and in impostor matching. The proposed method first searches for as many reasonably dense clusters as possible. For each cluster, the amounts of rotation and translation are determined from its centroid, and the corresponding alignment process is carried out.
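The following C sketch illustrates the overlapped-window search of Fig. 4, keeping only the densest window as the pose estimate. The window sizes, increments, and search ranges are our illustrative assumptions, and the merging pass over duplicated clusters is omitted.

#include <stddef.h>

typedef struct { float dx, dy, dt; } Pose;   /* X = (delta-x, delta-y, delta-theta) */

Pose densest_cluster(const Pose *X, size_t n)
{
    const float wx = 16, wy = 16, wt = 8;    /* Parzen window sizes         */
    const float sx = 8,  sy = 8,  st = 4;    /* increments; windows overlap */
    Pose best = { 0, 0, 0 };
    size_t best_count = 0;

    for (float cx = -128; cx <= 128; cx += sx)
        for (float cy = -128; cy <= 128; cy += sy)
            for (float ct = -180; ct <= 180; ct += st) {
                size_t count = 0;            /* phi(.) = 1 inside the hypercube */
                for (size_t i = 0; i < n; ++i)
                    if (X[i].dx >= cx && X[i].dx < cx + wx &&
                        X[i].dy >= cy && X[i].dy < cy + wy &&
                        X[i].dt >= ct && X[i].dt < ct + wt)
                        ++count;
                if (count > best_count) {    /* densest window = pose estimate */
                    best_count = count;
                    best.dx = cx + wx / 2;
                    best.dy = cy + wy / 2;
                    best.dt = ct + wt / 2;
                }
            }
    return best;
}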
Fig. 5. Example of cluster distributions of X: (a) genuine matching, (b) impostor matching

Table 2. Average number of aligned point pattern matchings

                       Genuine   Impostor
Parzen Estimation      1.10      2.84
No Parzen Estimation   69.83     11.53
In genuine matching, the number of clusters found is very small, mostly a single one, and the pose information produces the correct alignment. In contrast, the number of clusters found in impostor matching is relatively large, and the corresponding alignment processes produce inconsistent matching results. Based on experimental observation, 93.6% of genuine matchings produce a single dominant cluster, and 95.9% of impostor matchings produce at most 9 clusters, as shown in Fig. 6. Hence, the average number of aligned point pattern matchings is 1.10 for genuine matching and 2.84 for impostor matching, so the alignment process is hardly repeated compared with no Parzen estimation, as shown in Table 2.
3 Experimental Results

3.1 Database and Evaluation Method
The fingerprint database used in the experiment was provided by Atmel-Grenoble. The aim of the database is to discriminate algorithm performance on a large scale; Atmel uses this database to qualify FingerChip partners from a technical point of view. The database contains 2,702 fingerprint images from 201 different fingers. For each finger, the file fingerprint is randomly chosen and the rest are considered search fingerprints for matching. The minutiae extraction algorithm used in this experiment is basically the one that participated in FVC2000 [9], but enhanced by adding pre-processing
filters [10], a quality-check routine, and some parameters for tuning the UINH algorithm. 77 (2.85%) of the fingerprints were rejected in feature extraction, but there was no failure-to-acquire (FTA) in matching. Table 3 summarizes the total number of trials in this experiment.

Fig. 6. Cluster distribution

Table 3. Summary of trials

Number of failures:    Extraction 77 (2.85%)   Matching 0
Number of matchings:   Genuine 2,426           Impostor 500,049

3.2 Results of Performance Evaluation
The experiments were carried out under MS Windows XP on an Intel Pentium III 933 MHz processor. Table 4 compares the average matching speeds with and without Parzen density estimation. The difference is not as large as Table 2 would suggest because of the overhead of the Parzen density estimation itself.
Table 4. Average matching speed

Parzen Estimation      215.31 (times/sec)
No Parzen Estimation   120.56 (times/sec)
Fig. 7. FMR and FNMR curves, with and without Parzen density estimation
Figure 7 shows the resulting FMR (False Match Rate) and FNMR (False Non-Match Rate) curves with and without Parzen density estimation over the Atmel database. The results show a negligible difference in FMR and FNMR except for a shift of the equal error rate, which is caused by the fact that matching without Parzen estimation selects the highest value over all pose transformations.
4 Conclusions and Future Works
This paper proposed a fast fingerprint matching algorithm and presented experimental results. A novel triangular minutia structure called the clique was proposed, and the Parzen density estimation method was utilized. The algorithm improves the processing time while keeping the error rate unchanged; however, its large memory requirement remains a problem for future work.
Acknowledgement

This research has been supported by the Korea Research Foundation, 1999 (KRF99-041-E00269).
References

[1] A. Jain, L. Hong and R. Bolle, "On-Line Fingerprint Verification," IEEE Trans. on PAMI, vol. 19, no. 4, pp. 302-314, 1997.
[2] X. Jiang and W. Yau, "Fingerprint Minutiae Matching Based on the Local and Global Structures," IEEE 15th ICPR, pp. 1038-1041, 2000.
[3] L. C. Jain, U. Halici, I. Hayashi, S. B. Lee and S. Tsutsui, Intelligent Biometric Techniques in Fingerprint and Face Recognition, CRC Press, pp. 3-28, 1999.
[4] Z. M. Kovacs-Vajna, "A Fingerprint Verification System Based on Triangular Matching and Dynamic Time Warping," IEEE Trans. on PAMI, vol. 22, no. 11, pp. 1266-1276, Nov. 2000.
[5] R. S. Germain, A. Califano, and S. Colville, "Fingerprint Matching Using Transformation Parameter Clustering," IEEE Computational Science and Engineering, vol. 4, no. 4, pp. 42-49, Oct.-Dec. 1997.
[6] A. Ranade and A. Rosenfeld, "Point pattern matching by relaxation," Pattern Recognition, vol. 12, no. 2, pp. 269-275, 1993.
[7] D. Ahn and H. Kim, "Fingerprint Recognition Algorithm using Clique," Journal of the Institute of Electronics Engineers of Korea, vol. 36-S, no. 5, pp. 69-80, 1999.
[8] R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, 2nd Ed., Wiley-Interscience, pp. 164-167, 2001.
[9] D. Maio, D. Maltoni, R. Cappelli, J. L. Wayman and A. K. Jain, "FVC2000: Fingerprint Verification Competition," IEEE Trans. on PAMI, vol. 24, no. 3, pp. 402-412, March 2002.
[10] S. Greenberg, M. Aladjem and D. Kogan, "Fingerprint Image Enhancement using Filtering Techniques," Real-Time Imaging, vol. 8, issue 3, pp. 227-236, June 2002.
Author Index

Bevan, Régis  327
Boyd, Colin  407
Chen, Hao  107
Choi, Se Ah  213
Chung, Yongwha  510
Courtois, Nicolas T.  182
Dawson, Ed  407
Furuya, Soichi  138
Gil, Younhee  510
Gollmann, Dieter  451
Ha, Jae-Cheol  374
Han, Zongfen  107
Heys, Howard M.  164
Iwata, Tetsu  226
Izu, Tetsuya  283
Jeong, Hee Yun  16
Jeong, Ik Rae  16
Jin, Hai  107
Ju, Hak Soo  421
Ki, Ju Hee  497
Kim, Hakil  525
Kim, Hyun Jeong  421, 497
Kim, Hyun-Gyu  313
Kim, HyungJong  90
Kim, Kwangjo  389
Knudsen, Erik  327
Kremer, Steve  451
Kurnio, Hartono  478
Kurosawa, Kaoru  226
Kwon, Taekyoung  465
Lee, Byoungcheon  389
Lee, Dong Hoon  16, 421, 497
Lee, Jae-il  465
Lee, Kwangsu  35
Lee, Mun-Kyu  264
Lefranc, Serge  1
Lim, Jong In  16, 421
Lyuu, Yuh-Dauh  48
Maltesson, Nils  118
Mangard, Stefan  343
Markowitch, Olivier  451
Martin, Keith M.  237
McAven, Luke  478
Möller, Bodo  298
Moon, Daesung  510
Moon, Sangjae  374
Naccache, David  1, 118
Oh, Hyeong-Cheol  313
Pan, Sungbum  510
Park, Chang Seop  497
Park, Kunsoo  264
Park, Tae-Jun  264
Peng, Kun  407
Phan, Raphael Chung-Wei  138
Pieprzyk, Josef  237, 253
Rhee, Hyun Sook  16
Ryu, Choonwoo  525
Safavi-Naini, Rei  62, 237, 478
Sakurai, Kouichi  359
Seberry, Jennifer  149
Sella, Yaron  433
Shim, Kyungah  35
Shin, Jun-Bum  35
Song, Beomsik  149
Song, Jooseok  465
Song, Sanghoon  465
Sun, Jianhua  107
Susilo, Willy  62
Takagi, Tsuyoshi  283, 359
Trichina, Elena  118
Tymen, Christophe  118
Viswanathan, Kapali  407
Wang, Guilin  75
Wang, Huaxiong  237, 478
Wild, Peter R.  237
Wu, Ming-Luen  48
Xiao, Lu  164
Yang, Kyeongcheol  213
Yen, Sung-Ming  374
Zenner, Erik  200
Zhang, Xian-Mo  253