Lecture Notes in Computer Science 2887
Edited by G. Goos, J. Hartmanis, and J. van Leeuwen

Berlin Heidelberg New York Hong Kong London Milan Paris Tokyo
Thomas Johansson (Ed.)
Fast Software Encryption
10th International Workshop, FSE 2003
Lund, Sweden, February 24–26, 2003
Revised Papers
Series Editors

Gerhard Goos, Karlsruhe University, Germany
Juris Hartmanis, Cornell University, NY, USA
Jan van Leeuwen, Utrecht University, The Netherlands

Volume Editor

Thomas Johansson
Lund University, Department of Information Technology
Box 118, SE-221 00 Lund, Sweden
E-mail: [email protected]

Cataloging-in-Publication Data applied for

A catalog record for this book is available from the Library of Congress.

Bibliographic information published by Die Deutsche Bibliothek: Die Deutsche Bibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data is available on the Internet.
CR Subject Classification (1998): E.3, F.2.1, E.4, G.4

ISSN 0302-9743
ISBN 3-540-20449-0 Springer-Verlag Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are liable for prosecution under the German Copyright Law.

Springer-Verlag Berlin Heidelberg New York, a member of BertelsmannSpringer Science+Business Media GmbH
http://www.springeronline.com

© International Association for Cryptologic Research 2003
Printed in Germany

Typesetting: Camera-ready by author, data conversion by PTP Berlin GmbH
Printed on acid-free paper SPIN: 10966228 06/3142 5 4 3 2 1 0
Preface
Fast Software Encryption is now a 10-year-old workshop on symmetric cryptography, including the design and cryptanalysis of block and stream ciphers, as well as hash functions. The first FSE workshop was held in Cambridge in 1993, followed by Leuven in 1994, Cambridge in 1996, Haifa in 1997, Paris in 1998, Rome in 1999, New York in 2000, Yokohama in 2001, and Leuven in 2002.

This Fast Software Encryption workshop, FSE 2003, was held February 24–26, 2003 in Lund, Sweden. The workshop was sponsored by IACR (International Association for Cryptologic Research) and organized by the General Chair, Ben Smeets, in cooperation with the Department of Information Technology, Lund University.

This year a total of 71 papers were submitted to FSE 2003. After a two-month reviewing process, 27 papers were accepted for presentation at the workshop. In addition, we were fortunate to have in the program an invited talk by James L. Massey.

The selection of papers was difficult and challenging work. Each submission was refereed by at least three reviewers. I would like to thank the program committee members, who all did an excellent job. In addition, I gratefully acknowledge the help of a number of colleagues who provided reviews for the program committee. They are: Kazumaro Aoki, Alex Biryukov, Christophe De Cannière, Nicolas Courtois, Jean-Charles Faugère, Rob Johnson, Pascal Junod, Joseph Lano, Marine Minier, Elisabeth Oswald, Håvard Raddum, and Markku-Juhani O. Saarinen.

The local arrangements for the workshop were managed by a committee consisting of Patrik Ekdahl, Lena Månsson and Laila Lembke. I would like to thank them all for their hard work. Finally, we are grateful for the financial support for the workshop provided by Business Security, Ericsson Mobile Platforms, and RSA Security.
August 2003
Thomas Johansson
FSE 2003
February 24–26, 2003, Lund, Sweden

Sponsored by the International Association for Cryptologic Research in cooperation with the Department of Information Technology, Lund University, Sweden

Program Chair: Thomas Johansson (Lund University, Sweden)
General Chair: Ben Smeets (Ericsson, Sweden)
Program Committee

Ross Anderson (Cambridge University, UK)
Anne Canteaut (Inria, France)
Joan Daemen (Protonworld, Belgium)
Cunsheng Ding (Hong Kong University of Science and Technology)
Hans Dobbertin (University of Bochum, Germany)
Henri Gilbert (France Telecom, France)
Jovan Golic (Gemplus, Italy)
Lars Knudsen (Technical University of Denmark)
Helger Lipmaa (Helsinki University of Technology, Finland)
Mitsuru Matsui (Mitsubishi Electric, Japan)
Willi Meier (Fachhochschule Aargau, Switzerland)
Kaisa Nyberg (Nokia, Finland)
Bart Preneel (K.U. Leuven, Belgium)
Vincent Rijmen (Cryptomathic, Belgium)
Matt Robshaw (Royal Holloway, University of London, UK)
Serge Vaudenay (EPFL, Switzerland)
David Wagner (U.C. Berkeley, USA)
Table of Contents
Block Cipher Cryptanalysis

Cryptanalysis of IDEA-X/2 . . . . . . . . . . . . . . . . 1
Håvard Raddum (University of Bergen)

Differential-Linear Cryptanalysis of Serpent . . . . . . . . . . . . . . . . 9
Eli Biham, Orr Dunkelman, and Nathan Keller (Technion)
Rectangle Attacks on 49-Round SHACAL-1 . . . . . . . . . . . . . . . . 22
Eli Biham, Orr Dunkelman, and Nathan Keller (Technion)

Cryptanalysis of Block Ciphers Based on SHA-1 and MD5 . . . . . . . . . . . . . . . . 36
Markku-Juhani O. Saarinen (Helsinki University of Technology)

Analysis of Involutional Ciphers: Khazad and Anubis . . . . . . . . . . . . . . . . 45
Alex Biryukov (Katholieke Universiteit Leuven)
Boolean Functions and S-Boxes

On Plateaued Functions and Their Constructions . . . . . . . . . . . . . . . . 54
Claude Carlet and Emmanuel Prouff (INRIA)

Linear Redundancy in S-Boxes . . . . . . . . . . . . . . . . 74
Joanne Fuller and William Millan (Queensland University of Technology)
Stream Cipher Cryptanalysis

Loosening the KNOT . . . . . . . . . . . . . . . . 87
Antoine Joux and Frédéric Muller (DCSSI Crypto Lab)

On the Resynchronization Attack . . . . . . . . . . . . . . . . 100
Jovan Dj. Golić (Telecom Italia Lab) and Guglielmo Morgari (Telsy Elettronica e Telecomunicazioni)

Cryptanalysis of Sober-t32 . . . . . . . . . . . . . . . . 111
Steve Babbage (Vodafone Group Research & Development), Christophe De Cannière, Joseph Lano, Bart Preneel, and Joos Vandewalle (Katholieke Universiteit Leuven)
MACs

OMAC: One-Key CBC MAC . . . . . . . . . . . . . . . . 129
Tetsu Iwata and Kaoru Kurosawa (Ibaraki University)
A Concrete Security Analysis for 3GPP-MAC . . . . . . . . . . . . . . . . 154
Dowon Hong, Ju-Sung Kang (ETRI), Bart Preneel (Katholieke Universiteit Leuven), and Heuisu Ryu (ETRI)

New Attacks against Standardized MACs . . . . . . . . . . . . . . . . 170
Antoine Joux, Guillaume Poupard (DCSSI), and Jacques Stern (École normale supérieure)

Analysis of RMAC . . . . . . . . . . . . . . . . 182
Lars R. Knudsen (Technical University of Denmark) and Tadayoshi Kohno (UCSD)
Side Channel Attacks

A Generic Protection against High-Order Differential Power Analysis . . . . . . . . . . . . . . . . 192
Mehdi-Laurent Akkar and Louis Goubin (Schlumberger Smart Cards)

A New Class of Collision Attacks and Its Application to DES . . . . . . . . . . . . . . . . 206
Kai Schramm, Thomas Wollinger, and Christof Paar (Ruhr-Universität Bochum)
Block Cipher Theory

Further Observations on the Structure of the AES Algorithm . . . . . . . . . . . . . . . . 223
Beomsik Song and Jennifer Seberry (University of Wollongong)

Optimal Key Ranking Procedures in a Statistical Cryptanalysis . . . . . . . . . . . . . . . . 235
Pascal Junod and Serge Vaudenay (Swiss Federal Institute of Technology, Lausanne)

Improving the Upper Bound on the Maximum Differential and the Maximum Linear Hull Probability for SPN Structures and AES . . . . . . . . . . . . . . . . 247
Sangwoo Park (National Security Research Institute), Soo Hak Sung (Pai Chai University), Sangjin Lee, and Jongin Lim (CIST)

Linear Approximations of Addition Modulo 2^n . . . . . . . . . . . . . . . . 261
Johan Wallén (Helsinki University of Technology)

Block Ciphers and Systems of Quadratic Equations . . . . . . . . . . . . . . . . 274
Alex Biryukov and Christophe De Cannière (Katholieke Universiteit Leuven)
New Designs

Turing: A Fast Stream Cipher . . . . . . . . . . . . . . . . 290
Gregory G. Rose and Philip Hawkes (Qualcomm Australia)
Rabbit: A New High-Performance Stream Cipher . . . . . . . . . . . . . . . . 307
Martin Boesgaard, Mette Vesterager, Thomas Pedersen, Jesper Christiansen, and Ove Scavenius (CRYPTICO)

Helix: Fast Encryption and Authentication in a Single Cryptographic Primitive . . . . . . . . . . . . . . . . 330
Niels Ferguson (MacFergus), Doug Whiting (HiFn), Bruce Schneier (Counterpane Internet Security), John Kelsey, Stefan Lucks (Universität Mannheim), and Tadayoshi Kohno (UCSD)

PARSHA-256 – A New Parallelizable Hash Function and a Multithreaded Implementation . . . . . . . . . . . . . . . . 347
Pinakpani Pal and Palash Sarkar (Indian Statistical Institute)
Modes of Operation

Practical Symmetric On-Line Encryption . . . . . . . . . . . . . . . . 362
Pierre-Alain Fouque, Gwenaëlle Martinet, and Guillaume Poupard (DCSSI Crypto Lab)

The Security of "One-Block-to-Many" Modes of Operation . . . . . . . . . . . . . . . . 376
Henri Gilbert (France Télécom)
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 397
Cryptanalysis of IDEA-X/2

Håvard Raddum
Department of Informatics, University of Bergen, Norway

Abstract. IDEA is a 64-bit block cipher with a 128-bit key designed by J. Massey and X. Lai. At FSE 2002 a slightly modified version called IDEA-X was attacked using multiplicative differentials. In this paper we present a less modified version of IDEA we call IDEA-X/2, and an attack on this cipher. This attack also works on IDEA-X, and improves on the attack presented at FSE 2002.

Keywords: Cryptography, block ciphers, differential cryptanalysis, IDEA.
1 Introduction
The block cipher PES (Proposed Encryption Standard) was introduced at EUROCRYPT 1990 [1]. When differential cryptanalysis [2] became known in 1991, the algorithm was changed and renamed IPES (Improved PES). The cipher later became known as IDEA (International Data Encryption Algorithm), and is today used in many cryptographic components. IDEA has been extensively cryptanalysed, but remains unbroken. We briefly mention some of this work. In 1993, 2.5 rounds of IDEA were attacked with differential cryptanalysis [3]. At CRYPTO the same year, large classes of weak keys due to the simple key schedule were presented [4]. At EUROCRYPT 1997, 3- and 3.5-round versions of IDEA were broken using a differential-linear attack and a truncated differential attack [5]. Larger classes of weak keys were demonstrated at EUROCRYPT 1998 [6]. At FSE 1999 impossible differentials were used to attack 4.5 rounds of IDEA [7], and at SAC 2002 attacks on IDEA for up to four rounds were improved [8]. At FSE 2002 multiplicative differentials were used to attack a slightly modified version of IDEA called IDEA-X [9]. We show in this paper that there exists a better attack on IDEA-X, and that this attack also works on a less modified version of IDEA we have chosen to call IDEA-X/2 (read as "idea x half").

The paper is organised as follows. In Section 2 we give a brief description of IDEA and its variants, in Section 3 we build the differential characteristic used to attack IDEA-X/2, in Section 4 we show how to find the subkeys used in the output transformation, and we conclude in Section 5.
2 Description of IDEA
IDEA operates on blocks of 64 bits, using a 128-bit key. The cipher consists of several applications of three group operations: ⊕, ⊞ and ⊙. Each operation joins together two words of 16 bits. The operation ⊕ is bitwise XOR, ⊞ is addition modulo 2^16, and ⊙ is multiplication modulo 2^16 + 1, where the all-zero word is treated as the element 2^16. IDEA has eight rounds, followed by an output transformation. One round of IDEA and the output transformation are shown in Fig. 1.

T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 1–8, 2003. © International Association for Cryptologic Research 2003
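As a concrete reference, the three group operations can be sketched in a few lines of Python (a sketch of our own, not code from the paper; the function names are ours):

```python
MOD = 1 << 16  # all three operations act on 16-bit words

def xor(a, b):
    """The XOR operation (the paper's circled plus): bitwise XOR."""
    return a ^ b

def add(a, b):
    """The boxed-plus operation: addition modulo 2^16."""
    return (a + b) % MOD

def mul(a, b):
    """The circled-dot operation: multiplication modulo 2^16 + 1,
    where the all-zero word stands for the element 2^16."""
    a = MOD if a == 0 else a
    b = MOD if b == 0 else b
    r = (a * b) % (MOD + 1)
    return 0 if r == MOD else r
```

Note that 2^16 + 1 = 65537 is prime, so every 16-bit word (with 0 interpreted as 2^16) is invertible under multiplication.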
[Fig. 1. Structure of IDEA: the first round (subkeys Z1^(1), ..., Z4^(1) in the key-mixing layer and Z5^(1), Z6^(1) in the MA-structure), seven additional rounds, and the output transformation (subkeys Z1^(9), ..., Z4^(9)).]
The security of IDEA lies in the fact that no two of the three group operations are compatible, in the sense that the distributive law does not hold. The designers have also made sure that any two contiguous group operations in IDEA are never the same.

Zi^(r) denotes subkey i used in round r, where the output transformation counts as the ninth round. Each subkey is a 16-bit word, and a total of 52 subkeys are needed. They are generated as follows. The user selects a 128-bit master key, viewed as eight 16-bit words. The first 8 subkeys are taken as these 8 words, from left to right. Then the master key is cyclically rotated 25 positions to the left, and the resulting eight 16-bit words are taken as the next subkeys, and so on. The order the subkeys are taken in is Z1^(1), Z2^(1), ..., Z6^(1), Z1^(2), ..., Z6^(2), ..., Z4^(9).
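A sketch of this subkey generation in Python (the helper name is ours; the master key is taken as a 128-bit integer, most significant word first):

```python
def idea_subkeys(key):
    """Expand a 128-bit master key (an int) into the 52 16-bit subkeys."""
    subkeys = []
    while len(subkeys) < 52:
        # take the current key as eight 16-bit words, from left to right
        for i in range(8):
            if len(subkeys) == 52:
                break
            subkeys.append((key >> (112 - 16 * i)) & 0xFFFF)
        # cyclically rotate the 128-bit key 25 positions to the left
        key = ((key << 25) | (key >> 103)) & ((1 << 128) - 1)
    return subkeys
```

With the master key whose eight words are 1, 2, ..., 8, the first eight subkeys are those words themselves, and the ninth is determined by the 25-bit rotation.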
2.1 IDEA-X and IDEA-X/2
In [9], a variant called IDEA-X is attacked. In IDEA-X, each ⊙ except for the two in the output transformation is changed to an ⊕. The authors then show that for 2^112 of the keys there exists a multiplicative differential characteristic over eight rounds that holds with probability 2^-32.

In this paper we consider IDEA-X/2, where we only change half of the ⊙'s in one round to ⊕'s. In IDEA-X/2 only the ⊙'s where Z2^(r) and Z3^(r) are inserted are changed to ⊕'s; the MA-structure is left unchanged.
3 Building a Differential Characteristic

3.1 The Groups Z_{2^16} and GF(2^16 + 1)*
The basis of our analysis comes from the fact that both Z_{2^16} and GF(2^16 + 1)* are cyclic groups, and therefore isomorphic (see [10]). Here we establish this isomorphism as follows.

Let g0 be a primitive element of GF(2^16 + 1)*, and define gi = (g_{i-1})^2 for i = 1, ..., 15. Then each element a in GF(2^16 + 1)* can be written uniquely as

  a = g15^(x15) g14^(x14) ··· g0^(x0),

where each xi ∈ {0, 1}. For simpler notation we will write this as a = g^x. Let φ be the map from GF(2^16 + 1)* to Z_{2^16} defined by φ(a) = x, where a = g^x. We show that φ is an isomorphism.

The identity element of GF(2^16 + 1)* is 1, and the identity element of Z_{2^16} is 0. Since 1 = g^0 we have φ(1) = 0. Clearly, φ is one-to-one. Let a = g^x and b = g^y be two elements of GF(2^16 + 1)*. Then

  a ⊙ b = g15^(x15) g15^(y15) ··· g0^(x0) g0^(y0).

If at least one of xi, yi is 0 then gi^(xi) gi^(yi) = gi^(xi + yi), with xi + yi ∈ {0, 1}. If xi = yi = 1 we get gi^1 gi^1 = g_{i+1}^1 gi^0, that is, we get a "carry". Note that g15 = −1, so if x15 = y15 = 1 we have g15^1 g15^1 = g15^0, which means the carry is shifted out of the computation. From this we see that a ⊙ b = g^(x ⊞ y), showing that φ(a ⊙ b) = φ(a) ⊞ φ(b), and that φ respects the group operations. This shows that φ is an isomorphism.
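Since gi = g0^(2^i), the bit vector x is just the discrete logarithm of a to the base g0, so φ can be tabulated by enumerating powers. A small sketch (using g0 = 3, as the paper does below; variable names are ours) that also spot-checks the homomorphism property numerically:

```python
import random

P = (1 << 16) + 1   # 65537 is prime, so GF(2^16+1)* is cyclic of order 2^16
g0 = 3              # a primitive element of GF(2^16+1)*

phi = {}            # phi[a] = x with a = g0^x, i.e. the paper's bit vector x
a = 1
for x in range(1 << 16):
    phi[a] = x
    a = (a * g0) % P

assert phi[1] == 0                    # identity maps to identity
for _ in range(1000):                 # spot-check phi(a*b) = phi(a) + phi(b)
    u = random.randrange(1, P)
    v = random.randrange(1, P)
    assert phi[(u * v) % P] == (phi[u] + phi[v]) % (1 << 16)
```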
[Fig. 2. Isomorphic diagrams: the operation a ⊙ b may equivalently be computed as φ^{-1}(φ(a) ⊞ φ(b)), i.e. by sending a and b through φ, combining with ⊞, and applying φ^{-1}.]
3.2 Differential Properties of φ
In a cryptographic setting, we may regard φ as a 16-bit S-box. The above analysis shows that a ⊙ b = φ^{-1}(φ(a) ⊞ φ(b)). In other words, the two diagrams of Fig. 2 may be used interchangeably. We have computed the S-box φ explicitly using g0 = 3 as a primitive element, and checked its differential properties.

In the first key-mixing layer in each round, Z1^(r) and Z4^(r) are mixed with two of the words using ⊙. Using the isomorphic diagram above, we may first send the keys and the two words through φ, and then combine using ⊞. In the analysis of the differential properties we should therefore let the output differences of φ be with respect to ⊟, subtraction modulo 2^16. We found that if we let the input differences to φ be differences with respect to ⊕, then the following differential holds with probability 1/2:

  δ⊕ = FFFD_x --φ--> δ⊟ = 2^15.

The difference δ⊟ is preserved through the key addition. Through φ^{-1} we get the reversed differential δ⊟ --φ^{-1}--> δ⊕, again with probability 1/2. These may be combined into the differential δ⊕ --⊙ Zj^(r)--> δ⊕ that, on average over all keys Zj^(r), holds with probability 1/4 (j ∈ {1, 4}). For each key Zj^(r) we have checked the exact probability of this differential. For the keys 1 and −1, which are known to be weak under ⊙, the differential holds with probability 1 and 0.5, respectively. The smallest probability that occurs (for the keys 3 and −3 with g0 = 3) is greater than 0.166..., and the probability lies in the range 0.23–0.27 for 2^16 − 22 of the possible values of Zj^(r).
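The probability-1/2 differential through φ can be checked by exhaustive enumeration. A self-contained sketch (again with g0 = 3; recall that the 16-bit word 0 represents the group element 2^16):

```python
P = (1 << 16) + 1
MOD = 1 << 16

# tabulate phi as the discrete log to the base g0 = 3
phi = [0] * P
a = 1
for x in range(MOD):
    phi[a] = x
    a = (a * 3) % P

def elem(w):
    """Map a 16-bit IDEA word to its group element (0 stands for 2^16)."""
    return MOD if w == 0 else w

# count words w whose phi-images differ by 2^15 (mod 2^16) when the
# input difference with respect to XOR is FFFD
count = sum(
    (phi[elem(w)] - phi[elem(w ^ 0xFFFD)]) % MOD == 1 << 15
    for w in range(MOD)
)
assert count == 1 << 15   # exactly half of the 2^16 inputs follow it
```

The count matches the paper's statement that 2^15 of the 2^16 pairs with input difference δ produce output difference 2^15.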
3.3 Differential Characteristic of IDEA-X/2
Let the 64-bit cipher block be denoted by (w1, w2, w3, w4), where each wi is a 16-bit word referred to as word i.
All differences in the characteristic are with respect to ⊕, and we denote δ = FFFD_x. Let a pair of texts at the beginning of one round have difference (δ, δ, δ, δ). Words 2 and 3 will have difference δ after XOR with Z2^(r) and Z3^(r). Each of the words 1 and 4 will have difference δ after multiplication with Z1^(r) and Z4^(r) with probability 1/4. Thus the difference after the key-mixing layer in the beginning of the round is (δ, δ, δ, δ) with probability 2^-4. Since the differences in words 1 and 3 are the same and the differences in words 2 and 4 are the same, the two input differences to the MA-structure are both 0. Then the output differences of the MA-structure will be 0, so the difference of the blocks after the XOR with the outputs from the MA-structure will be (δ, δ, δ, δ). Since words 2 and 3 have equal differences, the difference of the blocks after the swap at the end of the round will also be (δ, δ, δ, δ).

This one-round characteristic may be concatenated with itself 8 times to form the 8-round differential characteristic

  (δ, δ, δ, δ) --8 rounds--> (δ, δ, δ, δ)

that holds with probability (2^-4)^8 = 2^-32.

The probability of this characteristic may be increased by a factor of four as follows. In the first round, Z1^(1) and Z4^(1) are inserted using ⊙. We look at the alternative diagram for this operation, containing the S-boxes φ. Then we see that the first application of φ is done to words 1 and 4 of the plaintext block, before any key material has been inserted. This means we can select the plaintext pairs such that words 1 and 4 have difference δ⊟ = 2^15 before φ(Z1^(1)) and φ(Z4^(1)) are inserted, with probability 1. Then the probability of the characteristic of the first round will be 2^-2 instead of 2^-4, and the overall probability of the 8-round characteristic will be 2^-30.
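One way to realize this pair selection concretely (an illustration of ours, not a construction spelled out in the paper): since −1 = g15 = g0^(2^15), negating a group element adds 2^15 to its discrete log, so pairing each word with the word representing its negative guarantees φ-difference 2^15 regardless of the choice of g0.

```python
P = (1 << 16) + 1
MOD = 1 << 16

def neg_word(w):
    """Word whose group element is the negative of w's element
    (the word 0 represents the element 2^16)."""
    e = MOD if w == 0 else w
    n = P - e                  # -e in GF(2^16 + 1)*
    return 0 if n == MOD else n

# check: phi(w) - phi(neg_word(w)) = 2^15 for every sampled w (g0 = 3)
phi = {}
a = 1
for x in range(MOD):
    phi[a] = x
    a = (a * 3) % P

elem = lambda w: MOD if w == 0 else w
for w in (0, 1, 2, 0x1234, 0xFFFF):
    assert (phi[elem(w)] - phi[elem(neg_word(w))]) % MOD == 1 << 15
```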
4 Key Recovery
We select 2^32 pairs of plaintext with difference (δ, δ, δ, δ), and ask for the corresponding ciphertexts. A pair of plaintexts that has followed the characteristic is called a right pair, and a pair that has not followed the characteristic is called a wrong pair. We expect to have 4 right pairs among the 2^32 pairs.
4.1 Filtering out Wrong Pairs
Let ci and ci' be the i'th words of the ciphertexts in one pair. We compute what values (if any) Z2^(9) and Z3^(9) may have to make this pair a right pair. If this pair is a right pair we have (c2 ⊞ Z2^(9)) ⊕ (c2' ⊞ Z2^(9)) = δ. Two cases arise.

Case 1: The second least significant bits of (c2 ⊞ Z2^(9)) and (c2' ⊞ Z2^(9)) are both 0. Since (c2 ⊞ Z2^(9)) and (c2' ⊞ Z2^(9)) are otherwise bitwise complementary to each other, we have (c2 ⊞ Z2^(9)) ⊞ (c2' ⊞ Z2^(9)) = 2^16 − 3. This yields 2 Z2^(9) = 2^16 − 3 ⊟ c2 ⊟ c2', which is possible only if exactly one of c2 and c2' is odd. In that case we get Z2^(9) = (2^16 − 3 ⊟ c2 ⊟ c2') >> 1 or Z2^(9) = ((2^16 − 3 ⊟ c2 ⊟ c2') >> 1) ⊞ 2^15.

Case 2: The second least significant bits of (c2 ⊞ Z2^(9)) and (c2' ⊞ Z2^(9)) are both 1. In this case we have (c2 ⊞ Z2^(9)) ⊞ (c2' ⊞ Z2^(9)) = 1. This gives 2 Z2^(9) = 1 ⊟ c2 ⊟ c2', again only possible when exactly one of c2 and c2' is odd. In that case we get Z2^(9) = ((1 ⊟ c2 ⊟ c2') >> 1) or Z2^(9) = (((1 ⊟ c2 ⊟ c2') >> 1)) ⊞ 2^15.

When exactly one of c2 and c2' is odd, we don't know whether we are in case 1 or case 2, so four values of Z2^(9) will be suggested. The reasoning above also applies to c3 and c3', so when exactly one of c3 and c3' is odd, we will get four suggested values of Z3^(9). The probability that, in a random pair, exactly one of c2 and c2' is odd and exactly one of c3 and c3' is odd is 1/4. When we filter on this condition, about 2^30 of the pairs will remain.

Next we focus on the words c1 and c1' in a pair. For the multiplication with Z1^(9) we use the alternative diagram containing the S-boxes φ and φ^{-1}. We have examined how the 2^16 pairs with input difference δ behave through φ. It turns out that 2^15 pairs get output difference 2^15 (with respect to ⊟), and that there are 2^15 other possible output differences, each with a unique pair producing it. Now we go backwards through the last φ^{-1} and look at the difference φ(c1) ⊟ φ(c1'). If this difference is not one of the possible output differences of φ receiving input difference δ, we can throw away this pair as a wrong pair. When φ receives input difference δ there are 2^15 + 1 possible output differences, so this happens with probability 1/2. The same reasoning applies to c4 and c4', so the probability of both words 1 and 4 surviving this test is 1/4. After performing this test we expect to be left with 2^28 pairs, each one with the possibility of being a right pair.
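The two-case analysis for the addition subkey translates directly into a candidate-extraction routine (a sketch with our own names; the sums 2^16 − 3 and 1 are the two case targets, reduced modulo 2^16):

```python
MOD = 1 << 16
DELTA = 0xFFFD

def z2_candidates(c2, c2p):
    """Values of Z2^(9) suggested by the ciphertext words (c2, c2'),
    covering both case 1 (modular sum 2^16 - 3) and case 2 (sum 1)."""
    if (c2 & 1) == (c2p & 1):
        return []                      # need exactly one odd word
    cands = set()
    for s in (MOD - 3, 1):             # target values of the modular sum
        two_z = (s - c2 - c2p) % MOD   # 2 * Z2 modulo 2^16
        half = two_z >> 1              # two_z is even here
        cands.update((half, (half + (1 << 15)) % MOD))
    return sorted(cands)
```

Since a halving modulo 2^16 has two preimages and the case is unknown, a pair passing the parity filter suggests four candidate values, matching the count used in the text.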
4.2 Finding the Subkey (Z1^(9), Z2^(9), Z3^(9), Z4^(9))
Each of the remaining pairs has at least one subkey that would make it a possible right pair. For each pair, these subkeys are suggested as the right subkeys. The correct subkey is suggested for each right pair, and all wrong keys are suggested more or less at random. We proceed to count how many keys each pair suggests.

Each pair suggests 4 values of Z2^(9) and 4 values of Z3^(9). These values can be combined in 16 different ways to produce a possible (Z2^(9), Z3^(9))-value for the subkey. By examining the key schedule, we find that Z2^(9) and Z3^(9) completely determine Z4^(1). Letting p4 and p4' be the fourth words of the plaintexts in one pair, we check for each of the 16 values of Z4^(1) whether (p4 ⊙ Z4^(1)) ⊕ (p4' ⊙ Z4^(1)) = δ. If this doesn't hold, and the pair we are examining is a right pair, then the value of Z4^(1) (and hence (Z2^(9), Z3^(9))) must be wrong and can be discarded. Because of the special way we have chosen p4 and p4' (we have φ(p4) ⊟ φ(p4') = 2^15 with probability 1), the probability of passing this test is 1/2, so we expect that 8 of the initial 16 possible (Z2^(9), Z3^(9))-values remain.
The number of (Z1^(9), Z4^(9))-values suggested for one pair depends on whether φ(c1) ⊟ φ(c1') or φ(c4) ⊟ φ(c4') is 2^15. Whenever φ(c1) ⊟ φ(c1') = 2^15, the pair will suggest 2^15 values of Z1^(9). When φ(c1) ⊟ φ(c1') ≠ 2^15 we will get exactly one suggested value of Z1^(9), and likewise for Z4^(9).

We expect to have four right pairs, each with difference δ in words 1 and 4 just before φ in the output transformation. The probability of getting difference 2^15 after φ is 1/2 for each word, so we expect that one of the right pairs will suggest 2^15 values for both Z1^(9) and Z4^(9), a total of 2^30 values for (Z1^(9), Z4^(9)). The probability that a random pair after filtering has φ(c1) ⊟ φ(c1') = φ(c4) ⊟ φ(c4') = 2^15 is 2^-30, so we don't expect any other pairs to have this property, since we are left with only 2^28 pairs.

The probability that a random pair after filtering has φ(c1) ⊟ φ(c1') = 2^15 is 2^-15, so we expect to find 2^13 pairs with this property. These pairs will suggest 2^15 values for Z1^(9) and one value for Z4^(9) each. The same goes for the fourth word: we expect 2^13 pairs suggesting one value for Z1^(9) and 2^15 values for Z4^(9). All other pairs will suggest exactly one value for (Z1^(9), Z4^(9)).

Each of the values suggested from one pair for (Z1^(9), Z4^(9)) must be coupled with the eight values for (Z2^(9), Z3^(9)), so the total number of subkeys suggested is expected to be 8(1 · 2^30 + 2^13 · 2^15 + 2^13 · 2^15 + (2^28 − 2^14) · 1) ≈ 2^34. The correct subkey is expected to be suggested 4 times, and the other keys are expected to be distributed more or less at random over the other 2^64 possible values. It is highly unlikely that a wrong key should be suggested four times, so we take the most suggested key as the correct subkey.
4.3 Finding the Rest of the Key
By keeping track of which pairs suggest which keys, the right pairs will be revealed. The remaining 64 bits of the master key may be found by further analysis using the right pairs. Since we know the differences in these pairs at any stage of the encryption, we may start at the plaintext or ciphertext side and let these pairs suggest values for the (partially) unknown subkeys. We will not go into details here, but this strategy should work faster than searching exhaustively for the remaining 64 bits.
5 Conclusion
We have shown how to use the isomorphism between the groups Z_{2^16} and GF(2^16 + 1)* as a basis for a differential attack on IDEA-X/2 that works without any conditions on the subkeys. This attack also works on IDEA-X, and gives an improvement over the attack found in [9]. This shows that the security of IDEA depends on the fact that ⊞, and not ⊕, is used when inserting the subkeys Z2^(r) and Z3^(r).

A 4-round characteristic has been implemented, to check that theory and practice are consistent when the round keys are not independent but generated by the key schedule. The implementation also incorporated the first-round trick, bringing the probability of the differential to 2^-14. One thousand keys were generated at random, and for each key 2^20 pairs of plaintext were encrypted, and the number of right pairs recorded. The expected number of right pairs is 64; the actual number of right pairs produced by the keys ranged from 33 to 131. Thus the analysis (assuming independent round keys) seems to be consistent with the key schedule of IDEA.
References

1. X. Lai and J. Massey. A Proposal for a New Block Encryption Standard. Advances in Cryptology - EUROCRYPT '90, LNCS 0473, pp. 389–404, Springer-Verlag, 1991.
2. E. Biham and A. Shamir. Differential Cryptanalysis of the Data Encryption Standard. Springer-Verlag, 1993.
3. W. Meier. On the Security of the IDEA Block Cipher. Advances in Cryptology - EUROCRYPT '93, LNCS 0765, pp. 371–385, Springer-Verlag, 1994.
4. J. Daemen, R. Govaerts and J. Vandewalle. Weak Keys for IDEA. Advances in Cryptology - CRYPTO '93, LNCS 0773, pp. 224–231, Springer-Verlag, 1994.
5. J. Borst, L. Knudsen and V. Rijmen. Two Attacks on Reduced IDEA. Advances in Cryptology - EUROCRYPT '97, LNCS 1233, pp. 1–13, Springer-Verlag, 1997.
6. P. Hawkes. Differential-Linear Weak Key Classes of IDEA. Advances in Cryptology - EUROCRYPT '98, LNCS 1403, pp. 112–126, Springer-Verlag, 1998.
7. E. Biham, A. Biryukov and A. Shamir. Miss in the Middle Attacks on IDEA and Khufu. Fast Software Encryption '99, LNCS 1636, pp. 124–138, Springer-Verlag, 1999.
8. H. Demirci. Cryptanalysis of IDEA Using Exact Distributions. Selected Areas in Cryptography 2002, preproceedings.
9. N. Borisov, M. Chew, R. Johnson and D. Wagner. Multiplicative Differentials. Fast Software Encryption 2002, LNCS 2365, pp. 17–33, Springer-Verlag, 2002.
10. D. R. Stinson. Cryptography: Theory and Practice. CRC Press, 1995, p. 179.
Differential-Linear Cryptanalysis of Serpent

Eli Biham¹, Orr Dunkelman¹, and Nathan Keller²

¹ Computer Science Department, Technion, Haifa 32000, Israel
{biham,orrd}@cs.technion.ac.il
² Mathematics Department, Technion, Haifa 32000, Israel
[email protected]
Abstract. Serpent is a 128-bit SP-network block cipher consisting of 32 rounds with variable key length (up to 256 bits long). It was selected as one of the 5 AES finalists. The best known attack so far is a linear attack on an 11-round reduced variant. In this paper we apply the enhanced differential-linear cryptanalysis to Serpent. The resulting attack is the best known attack on 11-round Serpent. It requires 2^125.3 chosen plaintexts and has time complexity of 2^139.2. We also present the first known attack on 10-round 128-bit key Serpent. These attacks demonstrate the strength of the enhanced differential-linear cryptanalysis technique.
1 Introduction
Serpent [1] is one of the 5 AES [13] finalists. It has a 128-bit block size and accepts key sizes of any length between 0 and 256 bits. Serpent is an SP-network with 32 rounds and 4-bit to 4-bit S-boxes.

Since its introduction in 1997, Serpent has withstood a great deal of cryptanalytic effort. In [8] a modified variant of Serpent, in which the linear transformation of the round function was replaced by a permutation, was analyzed. The change weakens Serpent, as it allows one active S-box to activate only one S-box in the consecutive round. In Serpent this is impossible, as one active S-box leads to at least two active S-boxes in the following round. The analysis of the modified variant presents an attack against up to 35 rounds of the cipher.

In [9] a 256-bit key variant of 9-round Serpent¹ is attacked using the amplified boomerang attack. The attack uses two short differentials: one for rounds 1–4 and one for rounds 5–7. These two differentials are combined to construct a 7-round amplified boomerang distinguisher, which is then used to mount a key recovery attack on 9-round Serpent. The attack requires 2^110 chosen plaintexts and its time complexity is 2^252 9-round Serpent encryptions.

In [4] the rectangle attack is applied to attack 256-bit key 10-round Serpent. The attack is based on an 8-round distinguisher. The distinguisher treats those 8 rounds as composed of two sub-ciphers: rounds 1–4 and rounds 5–8.

⋆ The work described in this paper has been supported by the European Commission through the IST Programme under Contract IST-1999-12324.
¹ We use n-round Serpent when we mean a reduced version of Serpent with n rounds.

T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 9–21, 2003. © International Association for Cryptologic Research 2003

In each
sub-cipher the attack exploits many differentials. These 4-round differentials are combined to create an 8-round rectangle distinguisher. The attack requires 2^126.8 chosen plaintexts and 2^217 memory accesses², which are equivalent to 2^208.8 10-round Serpent encryptions³. The 10-round rectangle attack was improved in [6]; the improved attack requires 2^126.3 chosen plaintexts, with time complexity of 2^173.8 memory accesses (2^165 10-round Serpent encryptions). Thus, using the rectangle attack, it is also possible to attack 192-bit key 10-round Serpent. A similar boomerang attack, which requires almost the entire code book, is also presented in [6].

The best known attack so far against Serpent can attack up to 11 rounds. The attack [5] is based on linear cryptanalysis [11]. It requires data complexity of 2^118 known plaintexts and time complexity of 2^214 memory accesses (2^205.7 11-round Serpent encryptions). In this paper we combine the differential and the linear results on Serpent to present an attack on 11-round Serpent which has a significantly lower time complexity. The attack is based on the differential-linear technique [10]. The technique was later enhanced and improved in [7]. This technique combines a differential characteristic (or several differential characteristics) together with a linear approximation to construct a chosen plaintext distinguisher. This result sheds more light on the applicability and the power of the enhanced differential-linear technique.

The data complexity of our attack is 2^125.3 chosen plaintexts and the time complexity is about 2^139.2 11-round Serpent encryptions. Therefore, the attack is faster than exhaustive search even for 192-bit key 11-round Serpent. We use the same techniques to present a 10-round attack on Serpent that requires 2^107.2 chosen plaintexts and 2^125.2 10-round Serpent encryptions. This is the first known attack on a 128-bit key 10-round Serpent faster than exhaustive search.
We organize this paper as follows: In Section 2 we give the basic description of Serpent. In Section 3 we briefly describe the differential-linear technique. In Section 4 we present the differential-linear attack on 11-round Serpent and on 10-round Serpent. We summarize our results and compare them with previous results on Serpent in Section 5. In the appendices we describe the differential characteristic and the linear approximation which are used in the attacks.
2 A Description of Serpent
In [1] Anderson, Biham and Knudsen presented the block cipher Serpent. It has a block size of 128 bits and accepts key sizes of 0–256 bits. Serpent is an SP-network block cipher with 32 rounds. Each round is composed of key mixing, a layer of S-boxes, and a linear transformation. There is an equivalent bitsliced description which makes the cipher more efficient and easier to describe.
Note 2: In [4] a different number is quoted, but this mistake was noted in [6], where the true time complexity of the algorithm was computed. Note 3: The conversion was done according to the best performance figures, presented in [12], assuming one memory access is equivalent to 3 cycles.
Differential-Linear Cryptanalysis of Serpent
In our description we adopt the notations of [1] in the bitsliced version. The intermediate value before round i is denoted by B̂_i (a 128-bit value). The rounds are numbered from 0 to 31. Each B̂_i is composed of four 32-bit words X0, X1, X2, X3. Serpent has 32 rounds, and a set of eight 4-bit to 4-bit S-boxes. Each round function R_i (i ∈ {0, ..., 31}) uses a single S-box 32 times in parallel. For example, R_0 uses S0, 32 copies of which are applied in parallel. Thus, the first copy of S0 takes the least significant bits of X0, X1, X2, X3 and returns the output to the same bits. This can be implemented as a boolean expression of the 4 words. The set of eight S-boxes is used four times: S0 is used in round 0, S1 is used in round 1, etc. After using S7 in round 7, S0 is used again in round 8, then S1 in round 9, and so on. In the last round (round 31) the linear transformation is omitted and another key is XORed. The cipher may be formally described by the following equations:

  B̂_0 := P
  B̂_{i+1} := R_i(B̂_i),   i = 0, ..., 31
  C := B̂_32

where

  R_i(X) = LT(Ŝ_i(X ⊕ K̂_i)),        i = 0, ..., 30
  R_i(X) = Ŝ_i(X ⊕ K̂_i) ⊕ K̂_32,    i = 31
where Ŝ_i is the application of the S-box S_(i mod 8) thirty-two times in parallel, and LT is the linear transformation. Given the four 32-bit words X0, X1, X2, X3 := Ŝ_i(B̂_i ⊕ K̂_i), they are mixed by the following linear transformation:

  X0 := X0 <<< 13
  X2 := X2 <<< 3
  X1 := X1 ⊕ X0 ⊕ X2
  X3 := X3 ⊕ X2 ⊕ (X0 << 3)
  X1 := X1 <<< 1
  X3 := X3 <<< 7
  X0 := X0 ⊕ X1 ⊕ X3
  X2 := X2 ⊕ X3 ⊕ (X1 << 7)
  X0 := X0 <<< 5
  X2 := X2 <<< 22
  B̂_{i+1} := X0, X1, X2, X3

where <<< denotes bit rotation to the left, and << denotes bit shift to the left. The key scheduling algorithm of Serpent is defined for 256-bit keys. Shorter keys are padded by a single 1 bit followed by as many 0 bits as required to reach a total length of 256 bits. This value is loaded into a linear feedback shift
register, which outputs blocks of 128 bits. Each block passes through a layer of S-boxes (a different layer for each block). This process is repeated until 33 subkeys of 128 bits each are derived. The subkeys are linearly independent, but given the subkey bits that enter an S-box (for any round and any S-box), we can invert the relevant S-box used in the key schedule and obtain 4 linear equations on the key.
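The linear transformation above is easy to express in the bitsliced form. The following Python sketch is our own illustration, not the reference code; function and variable names are ours:

```python
MASK = 0xFFFFFFFF


def rotl(x, n):
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK


def serpent_lt(x0, x1, x2, x3):
    """Serpent's linear transformation on four 32-bit words (bitsliced form),
    transcribed from the equations in the text."""
    x0 = rotl(x0, 13)
    x2 = rotl(x2, 3)
    x1 = x1 ^ x0 ^ x2
    x3 = x3 ^ x2 ^ ((x0 << 3) & MASK)
    x1 = rotl(x1, 1)
    x3 = rotl(x3, 7)
    x0 = x0 ^ x1 ^ x3
    x2 = x2 ^ x3 ^ ((x1 << 7) & MASK)
    x0 = rotl(x0, 5)
    x2 = rotl(x2, 22)
    return x0, x1, x2, x3
```

Since rotations, shifts, and XOR are all linear over GF(2), the map satisfies LT(a ⊕ b) = LT(a) ⊕ LT(b), which is what makes differences and linear masks propagate through it deterministically.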
3 Differential-Linear Cryptanalysis
Differential cryptanalysis [3] analyzes ciphers by studying the development of differences during encryption. The attack is a chosen plaintext attack based on a differential distinguisher which uses pairs of plaintexts. The distinguisher exploits the fact that, for the attacked cipher, the probability that an input difference ΩP (i.e., a difference between two inputs) results in an output difference ΩT is higher than for a random permutation. Linear cryptanalysis [11] analyzes the cipher by studying approximate linear relations. The attack is based on building a distinguisher which exploits the fact that, for the attacked cipher, the probability that a given input mask λP and a given output mask λT are related differs from 1/2 (the probability for a random permutation). In 1994, Langford and Hellman [10] showed that the two kinds of analysis can be combined in a technique called differential-linear cryptanalysis. The attack uses a differential characteristic that induces a linear relation between two intermediate encryption values with probability one. In [7] Biham, Dunkelman and Keller extended this technique to the case where the probability of the differential part is smaller than 1. We use notations based on [2, 3] for differential and linear cryptanalysis, respectively. In our notations ΩP, ΩT are the input and output differences of the differential characteristic, and λP, λT are the input and output subsets (denoted by bit masks) of the linear characteristic. Let E be a block cipher which can be described as a cascade of two sub-ciphers E0 and E1, i.e., E = E1 ∘ E0. Langford and Hellman suggested using a truncated differential ΩP → ΩT for E0 with probability 1. To this differential they concatenate a linear approximation λP → λT with probability 1/2 + q, i.e., with bias q. Their attack requires that the bits masked in λP have a constant and known difference in ΩT.
If we take a pair of plaintexts P1 and P2 that satisfy P1 ⊕ P2 = ΩP, then after E0, λP · E0(P1) = λP · E0(P2) (or the opposite, if the scalar product ΩT · λP is 1). This follows from the fact that E0(P1) and E0(P2) have a fixed difference in the masked bits, according to the output difference. Recall that the linear approximation predicts that λP · P = λT · E1(P) with probability 1/2 + q. Hence, λP · E0(P1) = λT · E1(E0(P1)) with probability 1/2 + q, and λP · E0(P2) = λT · E1(E0(P2)) with probability 1/2 + q. As the differential predicts that λP · E0(P1) = λP · E0(P2), we get that with probability 1/2 + 2q^2, λT · C1 = λT · C2, where C1 and C2 are the corresponding ciphertexts of P1 and P2, respectively (i.e., Ci = E1(E0(Pi))).
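The 1/2 + 2q^2 agreement probability is just the piling-up of two applications of the approximation. A quick exact check (our own sketch, using rational arithmetic):

```python
from fractions import Fraction as F


def agreement_prob(q):
    """Probability that two independent events, each occurring with
    probability 1/2 + q, agree (both occur or both do not)."""
    return (F(1, 2) + q) ** 2 + (F(1, 2) - q) ** 2


# For any bias q, the agreement probability equals 1/2 + 2q^2 exactly:
for q in (F(1, 4), F(1, 8), F(1, 2 ** 27)):
    assert agreement_prob(q) == F(1, 2) + 2 * q ** 2
```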
Differential-Linear Cryptanalysis of Serpent
13
This fact allows constructing differential-linear distinguishers based on encrypting many plaintext pairs and checking whether the ciphertexts agree on the parity of the output subset. The data complexity of the distinguishers is O(q^-4) chosen plaintexts, where the exact number of plaintexts is a function of the desired success rate and of the number of possible subkeys. In [7] Biham, Dunkelman and Keller proposed a way to deal with differentials with probability p < 1. In case the differential is satisfied (probability p), the above analysis remains valid. In case the differential is not satisfied (probability 1 − p), we assume a random behavior of the input subset parities (and therefore, also of the output subset parities). The probability that a pair with input difference ΩP satisfies λT · C1 = λT · C2 is in that case p(1/2 + 2q^2) + (1 − p) · 1/2 = 1/2 + 2pq^2. Furthermore, in [7] it was shown that the attack is still applicable if the product ΩT · λP = 1. In this case the analysis remains valid, but instead of looking for the instances for which λT · C1 = λT · C2, we look for the cases in which λT · C1 ≠ λT · C2. As the analysis remains the same, given a pair of plaintexts with the input difference ΩP, the probability that the pair disagrees on the output subset parity is 1/2 + 2pq^2. Another interesting result is that the attack can be used even when ΩT · λP is unknown, as long as it remains constant (this is the case when the difference passes through an unknown constant linear function). The data complexity of the enhanced differential-linear attack is O(p^-2 q^-4).
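The enhanced formula can be checked the same way. Here we plug in p = 2^-7 and q = 2^-27, the parameters reported later for our 11-round attack (our own sketch; exact arithmetic via fractions):

```python
from fractions import Fraction as F


def enhanced_agreement(p, q):
    """Pairs satisfying the differential (probability p) agree with
    probability 1/2 + 2q^2; the remaining pairs are assumed to behave
    randomly, contributing probability 1/2."""
    return p * (F(1, 2) + 2 * q ** 2) + (1 - p) * F(1, 2)


p, q = F(1, 2 ** 7), F(1, 2 ** 27)
bias = enhanced_agreement(p, q) - F(1, 2)
assert bias == 2 * p * q ** 2       # the 2pq^2 bias of the text
assert bias == F(1, 2 ** 60)        # = 2^-60 for these parameters
```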
4 Attacking 11-Round Serpent
We now present our differential-linear attack on 11-round Serpent. The 11 rounds that we attack are rounds 1–11 of Serpent (i.e., starting at the second round of Serpent). The attack is based on building a 9-round differential-linear distinguisher for rounds 2–10, and retrieving 48 bits of the subkeys of rounds 1 and 11. The 9-round differential-linear characteristic is based on a 3-round differential with probability 2^-6 and a 6-round linear approximation with a bias of 2^-27. There are 5 active S-boxes before the 3-round differential, and 7 active S-boxes after the 6-round linear approximation. In the key recovery attack we are only interested in the input difference of the differential and in the output mask of the linear approximation. To present these values we adopt the notations from [4, 5, 9]. The figures describe data blocks by rectangles of 4 rows and 32 columns. The rows are the bitsliced 32-bit words, and each column forms the input of an S-box. The upper row represents X0, the lower row represents X3, and the rightmost column represents the least significant bits of the words. In the description of differentials, a thin arrow represents a probability of 1/8 for the specific S-box (given the input difference, the output difference is achieved with probability 1/8), and a fat arrow stands for probability 1/4. If there is a difference in a bit, its entry is filled. An example of our notation can be found in Figure 1. The input difference 1 in the first S-box (S-box 0; related
Fig. 1. Differential and Linear Representation Example
Fig. 2. The Input Difference of the 9-Round Distinguisher
to the least significant bits of the 4 words X0, X1, X2, and X3) causes an output difference 3 with probability 1/4, and input difference 3 in S-box 30 causes an output difference 1 with probability 1/8. In the description of linear approximations, a filled entry represents the subset of bits participating in the approximation, a thin arrow represents a bias of 1/8, and a fat arrow represents a bias of 1/4. Thus, we can treat Figure 1 as an example of a linear approximation: bit 0 of the input of S-box 0 has the same parity as the subset {0, 1} of the S-box's output with probability 1/2 ± 1/4 (i.e., a bias of 1/4), and the parity of bits 0 and 1 of the input of S-box 30 has the same value as bit 1 of the output of the S-box with probability 1/2 ± 1/8 (a bias of 1/8). The 3-round differential is based on the best known 4-round differential characteristic of Serpent, taken from [4]. It starts in round 2 with two active S-boxes, and ends after round 4. Note that we could have taken a differential with a higher probability of 2^-5, but in exchange we would get 9 active S-boxes in the round before the distinguisher instead of only 5. When we experimentally verified this result, we found that there are other differentials which also predict the difference in the bits of λP. Summing over all these differentials, we get that the probability that λP · ΩT = 0 is 1/2 + 2^-7. Hence, we use p = 2^-7 in our attack and analysis. We present the input difference in Figure 2. The 6-round linear approximation is based on one of the two best known 6-round linear approximations of Serpent, taken from [5]. It starts in round 5 with 2 active S-boxes, ends in round 10 with 5 active S-boxes, and has a bias of 2^-27. The output masks can be computed once 7 S-boxes in round 11 are partially decrypted. We present the output mask of the linear approximation in Figure 3.
In the basic 11-round attack, the attacker encrypts many plaintext pairs with a plaintext difference that might lead to the desired input difference, and
Fig. 3. The Output Mask of the 9-Round Distinguisher
partially decrypts the ciphertext pairs. Then the attacker checks whether the partially decrypted values agree on the parity of the output mask. As stated before, the probability that the partially decrypted ciphertexts agree on the output subset mask is 1/2 + 2pq^2. Therefore, if the attack uses N plaintext pairs, it is expected that about N(1/2 + 2pq^2) of them agree on the parity of the output subset. For the analysis stage of the attack, we define a random variable for each subkey candidate. The variable counts how many pairs agree on the parity of the output mask after the partial decryption. As there are 2^48 possible subkeys in the 12 S-boxes, there are 2^48 − 1 wrong subkeys. We assume that all random variables related to wrong subkeys behave like normal random variables with a mean value of N/2 and a variance of N/4. The right subkey's random variable is also a normal random variable, but with a mean of N(1/2 + 2pq^2) and a variance of about N/4. The attack succeeds if the random variable related to the right subkey has the highest value (or is among the top fraction). Our analysis shows that for 2^124.3 pairs, the random variable related to the right subkey has probability 72.1% to be the highest of all the random variables. To optimize the data and time complexity of the attack, we use structures of chosen plaintexts. Each structure contains 2^20 chosen plaintexts, which vary on the input of the 5 active S-boxes in round 1. We use the following algorithm for the attack:
1. Select N = 2^125.3 plaintexts, consisting of 2^105.3 structures, each chosen by selecting:
(a) Any plaintext P0.
(b) The plaintexts P1, ..., P_{2^20 − 1} which differ from P0 by all the 2^20 − 1 possible (non-empty) subsets of the twenty bits which enter the 5 active S-boxes in round 1.
2. Request the ciphertexts of these plaintext structures (encrypted under the unknown key K).
3. For each value of the 20 bits of K1 entering these 5 S-boxes:
(a) Initialize an array of 2^28 counters to zeroes.
(b) Partially encrypt each plaintext through the 5 active S-boxes in round 1 and find the pairs which satisfy the difference ΩP before round 2.
(c) Given those 2^124.3 pairs, perform for each ciphertext pair:
i. Try all 2^28 possible values of the 28 bits of subkey K11 that enter the 7 active S-boxes in round 11.
ii. For each value of the subkey, partially decrypt the ciphertexts through the 7 active S-boxes in round 11, and compute the parity of the subset of bits in λT after round 10.
iii. If the parities in both members of the pair are equal, increment the counter in the array related to the 28 bits of the subkey.
(d) The highest entry in the array should correspond to the 28 bits of K11 entering the 7 active S-boxes in round 11.
4. Each trial of the key gives us 20 + 28 = 48 bits of the subkeys (20 bits in round 1 and 28 bits in round 11), along with a measure of correctness. The correct value of the 48 bits is expected to be the most frequently suggested value (with a success rate of more than 72.1%).
5. The rest of the key bits are then recovered by auxiliary techniques.
The time complexity of a naive implementation is 2^125.3 · 2^48 · 12/352 = 2^172.4 11-round Serpent encryptions. The memory requirements of this algorithm are mostly for keeping the plaintexts, but we can handle each structure independently of the other structures; thus, 2^28 counters (of 20 bits each) suffice. Hence, the memory complexity of the attack is 2^30 bytes. The time complexity of the attack can be improved to 2^125.3 · 2^20 · 5/352 = 2^139.2 11-round Serpent encryptions. We note that for each guess of the subkey of round 11 we perform 2^125.3 decryptions. However, there are only 2^28 possible values of the 28 bits which we actually decrypt. The improvement is based on keeping a precomputed table that holds, for any possible value of the 28 ciphertext bits (which enter the 7 active S-boxes) and any possible value of the corresponding 28-bit subkey, the parity of the partially decrypted value.
The optimized Step 3(b) of the attack counts, over all pairs, how many times each of the 2^56 possibilities of the 56 bits (28 bits from each of the two ciphertexts) entering the 7 active S-boxes in round 11 occurs. After counting the occurrences, we find for this subkey guess how many pairs agree on the output subset parity and how many disagree. This is the first theoretical attack on 11-round Serpent with 192-bit keys. It requires 2^125.3 chosen plaintexts, and has time complexity of 2^139.2 11-round Serpent encryptions. The memory requirement of the attack is 2^60 bytes of RAM. We can attack 10-round Serpent by reducing the distinguisher to 8 rounds. We remove the last round of the linear approximation to get a 5-round linear approximation with bias q = 2^-22. The data complexity of the attack drops to 2^107.2 chosen plaintexts. The time complexity of the attack in this case is 2^107.2 · 2^20 · 5/320 = 2^125.2 10-round Serpent encryptions. This is the first known attack against 10-round Serpent with 128-bit keys. Note that in this attack we retrieve only 40 subkey bits, as there are only 5 active S-boxes in the last round (round 10). The rest of the bits can be found by exhaustive search with time complexity of 2^(128−40) = 2^88 10-round Serpent encryptions.
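As a numeric sanity check on the figures above (our own arithmetic, under the stated normal-approximation model):

```python
import math

# Improved time complexity: 2^125.3 plaintexts, 2^20 guesses of the round-1
# subkey, and 5 active S-boxes out of the 352 S-box applications
# (32 S-boxes x 11 rounds) of one 11-round Serpent encryption.
log2_cost = 125.3 + 20 + math.log2(5 / 352)
assert abs(log2_cost - 139.2) < 0.05

# Separation between the right-subkey counter and a wrong-subkey counter,
# in standard deviations: mean gap N * 2pq^2 over sigma = sqrt(N)/2.
N = 2 ** 124.3                          # analyzed pairs
bias = 2 * 2 ** -7 * (2 ** -27) ** 2    # 2*p*q^2 = 2^-60
gap_in_sigmas = (N * bias) / (math.sqrt(N) / 2)
assert 8 < gap_in_sigmas < 10           # roughly 9 sigma of separation
```

A gap of roughly 9 standard deviations is what lets the right subkey rank first among 2^48 candidates with the stated 72.1% probability.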
5 Summary
In this paper we present the best published attack on 11-round Serpent. The attack is applicable against up to 11-round Serpent with keys of sizes 140–256 bits. It is faster than exhaustive search, and has a success rate of 72.1%. The attack requires 2^125.3 chosen plaintexts. Its best time complexity is 2^139.2 11-round Serpent encryptions, using 2^60 bytes of RAM for the analysis. We also present an attack on 10-round Serpent requiring 2^107.2 chosen plaintexts, whose time complexity is 2^125.2 10-round Serpent encryptions. We summarize our new attacks, and previously published attacks against Serpent, in Table 1.

Table 1. Summary of Attacks on Serpent with Reduced Number of Rounds

Rounds  Type of Attack                       Key Sizes   Data          Time      Memory
6       Differential [9]                     all         2^83 CP       2^90      2^44
        Differential [9]                     all         2^71 CP       2^103     2^79
        Differential [9]                     192 & 256   2^41 CP       2^163     2^49
7       Differential [9]                     256         2^122 CP      2^248     2^130
        Differential [4]                     all         2^84 CP       2^78.9    2^56
8       Amp. Boomerang [9]                   192 & 256   2^128 CP      2^163     2^137
        Amp. Boomerang [9]                   192 & 256   2^110 CP      2^175     2^119
        Differential [4]                     256         2^84 CP       2^206.7   2^89
9       Amp. Boomerang [9]                   256         2^110 CP      2^252     2^212
10      Rectangle [4]                        256         2^126.8 CP    2^207.4   2^131.8
        Rectangle [4]                        256         2^126.8 CP    2^205     2^196
        Rectangle [6]                        192 & 256   2^126.3 CP    2^165     2^131.8
        Boomerang [6]                        192 & 256   2^126.3 ACPC  2^165     2^89
        Enhanced Diff.-Lin. (this paper)     all         2^107.2 CP    2^125.2   2^40
11      Linear [5]                           256         2^118 KP      2^205.7   2^183
        Enhanced Diff.-Lin. (this paper)     192 & 256   2^125.3 CP    2^172.4   2^30
        Enhanced Diff.-Lin. (this paper)     192 & 256   2^125.3 CP    2^139.2   2^60

Complexity is measured in encryption units. Memory is measured in bytes.
CP - Chosen Plaintexts, KP - Known Plaintexts, ACPC - Adaptive Chosen Plaintexts and Ciphertexts.
References

1. Ross Anderson, Eli Biham, Lars R. Knudsen, Serpent: A Proposal for the Advanced Encryption Standard, NIST AES Proposal, 1998.
2. Eli Biham, On Matsui's Linear Cryptanalysis, Advances in Cryptology, proceedings of EUROCRYPT 1994, Lecture Notes in Computer Science 950, pp. 341–355, Springer-Verlag, 1994.
3. Eli Biham, Adi Shamir, Differential Cryptanalysis of the Data Encryption Standard, Springer-Verlag, 1993.
4. Eli Biham, Orr Dunkelman, Nathan Keller, The Rectangle Attack – Rectangling the Serpent, Advances in Cryptology, proceedings of EUROCRYPT 2001, Lecture Notes in Computer Science 2045, pp. 340–357, Springer-Verlag, 2001.
5. Eli Biham, Orr Dunkelman, Nathan Keller, Linear Cryptanalysis of Reduced Round Serpent, proceedings of Fast Software Encryption 8, Lecture Notes in Computer Science 2355, pp. 16–27, Springer-Verlag, 2002.
6. Eli Biham, Orr Dunkelman, Nathan Keller, New Results on Boomerang and Rectangle Attacks, proceedings of Fast Software Encryption 9, Lecture Notes in Computer Science 2365, pp. 1–16, Springer-Verlag, 2002.
7. Eli Biham, Orr Dunkelman, Nathan Keller, Enhancing Differential-Linear Cryptanalysis, Advances in Cryptology, proceedings of ASIACRYPT 2002, Lecture Notes in Computer Science 2501, pp. 254–266, Springer-Verlag, 2002.
8. Orr Dunkelman, An Analysis of Serpent-p and Serpent-p-ns, presented at the rump session of the Second AES Candidate Conference, 1999. Available on-line at http://vipe.technion.ac.il/~orrd/crypt/.
9. John Kelsey, Tadayoshi Kohno, Bruce Schneier, Amplified Boomerang Attacks Against Reduced-Round MARS and Serpent, proceedings of Fast Software Encryption 7, Lecture Notes in Computer Science 1978, pp. 75–93, Springer-Verlag, 2001.
10. Suzan K. Langford, Martin E. Hellman, Differential-Linear Cryptanalysis, Advances in Cryptology, proceedings of CRYPTO '94, Lecture Notes in Computer Science 839, pp. 17–25, Springer-Verlag, 1994.
11. Mitsuru Matsui, Linear Cryptanalysis Method for DES Cipher, Advances in Cryptology, proceedings of EUROCRYPT '93, Lecture Notes in Computer Science 765, pp. 386–397, Springer-Verlag, 1994.
12. NESSIE, Performance of Optimized Implementations of the NESSIE Primitives, NES/DOC/TEC/WP6/D21/a, available on-line at http://www.nessie.eu.org/nessie.
13. NIST, A Request for Candidate Algorithm Nominations for the AES, available on-line at http://www.nist.gov/aes/.
14. David Wagner, The Boomerang Attack, proceedings of Fast Software Encryption 6, Lecture Notes in Computer Science 1636, pp. 156–170, Springer-Verlag, 1999.
A The Differential Characteristic
The 3-round truncated differential used in the attack is based on the first 3 rounds of the best 4-round differential of Serpent. The first round of the differential is round 2 (or any other round that uses S2), with probability 2^-5:

[Figure: the input and output differences of the first round of the differential, through S2, p = 2^-5]
After the linear transformation and the application of S3 we get the following truncated differential with probability 2^-1:

[Figure: the truncated differential through S3, p = 2^-1]
Here the '?' means that we do not care what the value of the difference is, and the bold, thick arrow means that this transition happens with probability 1/2. After the linear transformation, we get the following truncated differential in the input to S4:
[Figure: the truncated differential through S4; all input and output entries are undetermined ('?'), p = 1]
We checked all the possible outputs of this S4 and found that none of the possible output differences affects the linear approximation used in the attack. The empty arrow means that this holds with probability 1. As mentioned before, we have experimentally verified that with probability 1/2 + 2^-7 this input difference causes an even difference (no difference, or a difference in an even number of bits) in the masked bits of λP.
B The Linear Approximation
The 6-round linear approximation used in the attack is part of the best 9-round linear approximation of Serpent. It starts in a round with S5 as the S-box:
[Figure: the first round of the linear approximation, through S5, P = 1/2 − 2^-5]
After the linear transformation, and the application of S6, we get the following linear approximation with a bias of 2^-3:

[Figure: the linear approximation through S6, P = 1/2 − 2^-3]
After the linear transformation, and the application of S7, we get the following linear approximation with a bias of 2^-5:

[Figure: the linear approximation through S7, P = 1/2 − 2^-5]
After the linear transformation, and the application of S0, we get the following linear approximation with a bias of 2^-6:

[Figure: the linear approximation through S0, P = 1/2 + 2^-6]
After the linear transformation, and the application of S1, we get the following linear approximation with a bias of 2^-7:

[Figure: the linear approximation through S1, P = 1/2 − 2^-7]
After the linear transformation, and the application of S2, we get the following linear approximation with a bias of 2^-6:

[Figure: the linear approximation through S2, P = 1/2 − 2^-6]
After this round, there are 7 active S-boxes in the following round: S-boxes 1, 8, 11, 13, 18, 23 and 28.
Rectangle Attacks on 49-Round SHACAL-1

Eli Biham¹, Orr Dunkelman¹, and Nathan Keller²

¹ Computer Science Department, Technion, Haifa 32000, Israel, {biham,orrd}@cs.technion.ac.il
² Mathematics Department, Technion, Haifa 32000, Israel, [email protected]
Abstract. SHACAL-1 is a 160-bit block cipher with a variable key length of up to 512 bits, based on the hash function SHA-1. It was submitted to the NESSIE project and was accepted as a finalist for the second phase of the evaluation. In this paper we present rectangle attacks on 49 rounds out of the 80 rounds of SHACAL-1. The attacks require 2^151.9 chosen plaintexts or ciphertexts and have time complexity of 2^508.5 49-round SHACAL-1 encryptions. These are the best known attacks against SHACAL-1. In this paper we also identify and fix some flaws in previous attacks on SHACAL-1.
1 Introduction
In 1993 NIST issued a standard hash function called the Secure Hash Algorithm [13]. This version was later named SHA-0, as in 1995 NIST published a small tweak to the standard called SHA-1. Both SHA-0 and SHA-1 are based on padding the message, dividing it into blocks of 512 bits, and then iteratively compressing those blocks into a 160-bit digest. Recently, NIST has published (besides SHA-1) 3 more standard hash functions as part of FIPS-180: SHA-256, SHA-384 and SHA-512. Each of the new hash functions has a digest size corresponding to its number, i.e., SHA-256 has a 256-bit digest, etc. Both SHA-0 and SHA-1 were subjected to a great deal of analysis. In [4] an attack producing pseudo-collisions in SHA-0 was suggested. A pseudo-collision is not a true collision, as it assumes that the attacker can control the input to the compression function, which is a constant value in both SHA-0 and SHA-1. The attack requires 2^61 computations of SHA-0 to produce a pseudo-collision. This result does not apply to SHA-1. As hash functions can also be attacked using differential cryptanalysis [1], there is a continuous search for differentials in SHA-1. In [5] several of these differentials were presented. Recently, it was shown how to generate slid pairs in SHA-1, requiring about 2^32 computations of SHA-1 under some conditions [9]. As SHA-1 was thoroughly examined and analyzed, it was suggested to use its compression function as a block cipher [5]. Later this suggestion was named
The work described in this paper has been supported by the European Commission through the IST Programme under Contract IST-1999-12324.
T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 22–35, 2003. c International Association for Cryptologic Research 2003
SHACAL-1 [6]. It is a 160-bit block cipher with a variable key length (0–512 bits) and 80 rounds, based on the compression function of SHA-1. The cipher, which was submitted as a candidate to the NESSIE project [10], was selected as a NESSIE finalist, but was not selected for the NESSIE portfolio [12]. As mentioned before, it is possible to use the results of differential cryptanalysis obtained on SHA-1 and apply them to SHACAL-1. In [5, 6] a 5-round differential characteristic with probability 1 is presented. A few 10-round differentials are also presented in [5, 6]. Recently, differentials with higher probabilities than claimed in [5, 6] were presented in [14, 8]. In [8] these new differentials were combined to mount an amplified boomerang attack on 47-round SHACAL-1. The attack requires 2^158.5 chosen plaintexts and has time complexity equivalent to 2^508.4 47-round SHACAL-1 encryptions. In this paper we present some flaws in the analysis done in [8], and show how to correct them without affecting the data complexity or the time complexity of the original attack. We further improve the corrected results of [8] by applying the successor of the amplified boomerang attack, the rectangle attack, to SHACAL-1. This allows us to reduce the data complexity to 2^151.9 chosen plaintexts, and the time complexity of attacking 47 rounds (rounds 0–46) of SHACAL-1 from 2^508.4 to 2^482.6 encryptions. By moving the rectangle attack to other rounds (rounds 22–70 or rounds 29–77) we also get an attack on 49-round SHACAL-1. The attack requires 2^151.9 chosen plaintexts or chosen ciphertexts (depending on the rounds which we attack) and 2^508.5 49-round SHACAL-1 encryptions. This is the best known attack against SHACAL-1. This paper is organized as follows: In Section 2 we describe the block cipher SHACAL-1. In Section 3 we present the previously best known results on SHACAL-1 (we also describe the flaws, and present a modification to the attack that corrects them).
In Section 4 we give a short description of the amplified boomerang and rectangle attacks, and we present a rectangle attack on 47-round SHACAL-1. In Section 5 we add two more rounds to the attack, presenting a rectangle attack on 49-round SHACAL-1. Finally, Section 6 summarizes the paper. The appendix contains some of the differentials used in the attack.
2 Description of SHACAL-1
SHACAL-1 [6] is a 160-bit block cipher supporting variable key lengths (0–512 bits). It is based on the compression function of the hash function SHA-1. The cipher has 80 rounds (also referred to as steps) grouped into 4 types of 20 rounds each (note 1). The 160-bit plaintext is divided into five 32-bit words: A, B, C, D and E. We denote by Xi the value of word X before the ith round, i.e., the plaintext
Note 1: To avoid confusion, we adopt the standard and common notation for rounds. In [6] the term step means round, while round is used for a group of 20 steps.
P is divided into A0, B0, C0, D0 and E0, and the ciphertext is composed of A80, B80, C80, D80 and E80. In each round the words are updated according to the following rule:

  A_{i+1} = W_i + ROTL5(A_i) + f_i(B_i, C_i, D_i) + E_i + K_i
  B_{i+1} = A_i
  C_{i+1} = ROTL30(B_i)
  D_{i+1} = C_i
  E_{i+1} = D_i

where + denotes addition modulo 2^32, ROTLj(X) represents rotation to the left by j bits, W_i is the round subkey, and K_i is the round constant (note 2). There are three different functions f_i, selected according to the round number:

  f_i(X, Y, Z) = f_if  = (X & Y) | (¬X & Z)             0 ≤ i ≤ 19
  f_i(X, Y, Z) = f_xor = X ⊕ Y ⊕ Z                      20 ≤ i ≤ 39, 60 ≤ i ≤ 79
  f_i(X, Y, Z) = f_maj = (X & Y) | (X & Z) | (Y & Z)    40 ≤ i ≤ 59

In [6] it is strongly advised to use keys of at least 128 bits, even though shorter keys are supported. The first step in the key schedule algorithm is to pad the supplied key into a 512-bit key. Then the 512-bit key is expanded into eighty 32-bit subkeys (a total of 2560 bits of subkey material). The expansion is done in a linear manner using a linear feedback shift register (over GF(2^32)). We omit the way the round subkeys are computed and the values of the round constants, as these details do not affect our analysis. However, we do use the fact that the subkeys are linearly dependent on the key.
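One round of SHACAL-1 can be sketched directly from these equations. The following Python illustration is our own transcription (the subkey and constant passed in are arbitrary placeholders, not the real schedule values):

```python
MASK = 0xFFFFFFFF


def rotl(x, n):
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK


def f(i, x, y, z):
    """The round-dependent boolean function of SHACAL-1."""
    if i <= 19:
        return (x & y) | (~x & MASK & z)        # f_if
    if 40 <= i <= 59:
        return (x & y) | (x & z) | (y & z)      # f_maj
    return x ^ y ^ z                            # f_xor (rounds 20-39, 60-79)


def shacal1_round(i, state, w_i, k_i):
    """One SHACAL-1 round on the state (A, B, C, D, E)."""
    a, b, c, d, e = state
    a_new = (w_i + rotl(a, 5) + f(i, b, c, d) + e + k_i) & MASK
    return (a_new, a, rotl(b, 30), c, d)
```

Note that only word A is computed anew each round; B through E are (rotated) copies of the previous words, which is why a difference injected into the state takes several rounds to affect the whole register.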
3 Previous Results
SHACAL-1 is based on SHA-1, and it is widely presumed that any attack on one of them would lead to an attack on the other (as demonstrated lately in [9]). Moreover, a great deal of the analysis done on SHA-1 can be applied to SHACAL-1 as well. In [5] the properties of the compression function of SHA-1 as a block cipher were studied. Differential and linear properties of SHACAL-1 were studied in [5, 6]: there is a 4-round linear approximation with bias 1/2 (the maximal bias), and for rounds with f_xor this approximation can be extended into a 7-round approximation with the same bias. There is also a 5-round differential with probability 1. These papers also contain results on 10-round linear approximations and differentials. We summarize these results in Table 1. In [14] the differentials presented in [5, 6] were improved, and 20-round differentials with probability 2^-41 are presented. In [8] another set of differentials of SHACAL-1 is presented. We summarize these results in Table 1.
Note 2: This time we adopt the notations of [6], and alert the reader to the somewhat confusing notation.
Table 1. Previously Known Differentials and Linear Approximations of SHACAL-1

Type               f Type in Use        Rounds  Probability/Bias
Linear [6]         any                  4       1/2
Linear [6]         f_xor                7       1/2
Linear [6]         f_xor                10      2^-6
Linear [6]         f_if                 10      2^-7.2
Linear [6]         f_maj                10      2^-6.4
Differential [6]   any                  5       1
Differential [6]   f_if, f_maj          10      2^-13
Differential [6]   f_xor                10      2^-26
Differential [14]  f_if                 10      2^-12
Differential [14]  f_xor                10      2^-12
Differential [14]  f_if, f_maj          20      2^-41
Differential [8]   20 f_if then f_xor   21      2^-45
Differential [8]   20 f_if then f_xor   28      2^-107
Differential [8]   20 f_if then f_xor   30      2^-130
Differential [8]   f_xor                15      2^-31
In [9] an algorithm for identifying whether two SHACAL-1 encryptions use related keys is presented. The attack is based on finding slid pairs; once a slid pair is encountered, the attacker can determine whether the two encryptions have related keys. The attack requires about 2^96 encryptions under each of the two keys to find a slid pair. In [8] the 21-round differential for rounds 0–20 and the 15-round differential for rounds 21–35 were combined to build an amplified boomerang [7] distinguisher for 36-round SHACAL-1. This distinguisher is used to attack 39-round SHACAL-1 using 2^158.5 chosen plaintexts and about 2^250.8 39-round SHACAL-1 encryptions. The attack is based on guessing (or trying) the subkeys of the 3 additional rounds, and then checking whether the distinguisher succeeds. This approach is further extended to attack up to 47 rounds of SHACAL-1, the point at which the attack stops being faster than exhaustive key search. Another attack presented in [8] is a differential attack on 41-round SHACAL-1. The attack uses the 28-round differential characteristic with probability 2^-107 for 128-bit keys, and the 30-round differential characteristic with probability 2^-130 for longer keys. We summarize the data and time complexities of the attacks presented in [8] in Table 2.

3.1 Problems in the 47-Round Amplified Boomerang Attack and How to Fix Them
As mentioned before, in [8] an amplified boomerang attack on 47-round SHACAL-1 is presented. The attack is based on a 36-round amplified boomerang distinguisher and on guessing the subkeys of the remaining 11 rounds. The basic idea is to try each and every subkey for those 11 rounds, partially decrypt all the ciphertexts, and run the distinguisher. If the distinguisher succeeds (i.e., it distinguishes the remaining 36 rounds from a random permutation), then the subkey for those 11 rounds is considered to be the right subkey.
Eli Biham, Orr Dunkelman, and Nathan Keller

Table 2. Complexities of Previous Attacks on SHACAL-1 ([8]).

Key Size | Type of Attack      | Number of Rounds | Data       | Time
128-bit  | Amplified Boomerang | 28               | 2^127.5 CP | 2^127.2
128-bit  | Differential        | 30               | 2^110 CP   | 2^75.1
160-bit  | Amplified Boomerang | 37               | 2^158.5 CP | 2^87.8
160-bit  | Differential        | 32               | 2^141 CP   | 2^105
256-bit  | Amplified Boomerang | 39               | 2^158.5 CP | 2^250.8
256-bit  | Differential        | 34               | 2^141 CP   | 2^234
512-bit  | Amplified Boomerang | 47               | 2^158.5 CP | 2^508.4
512-bit  | Differential        | 41               | 2^141 CP   | 2^491
CP - Chosen Plaintexts
We shall concentrate on the 39-round attack, which is based on guessing the subkeys of the last 3 rounds. The 39-round attack is actually used as a procedure in the 47-round attack; hence, if this attack fails, so does the 47-round attack presented in [8]. The 36-round distinguisher needs 2^158.5 chosen plaintexts, which compose 2^157.5 pairs. According to the analysis in [8], it is expected that for the right subkey the number of right amplified boomerang quartets³ is 8. However, the number of right quartets has a Poisson distribution. Hence, if the expected number of quartets is 8, and denoting by a random variable X the number of quartets, we get that X ~ Poi(8). This means that Pr[X ≥ 8] = 0.505, which is much lower than expected by the authors of [8]. Usually this confusion between the expected value and the true distribution of the value has no implication for the correctness of the attack. However, the authors of [8] claim that "But, for a wrong subkey, the expected value of counter is equal to 0 or 1, since the expected number of quartets passed through Step 4 is 2^-5 (= 2^187 · (2^-96)^2)." We agree that for most subkeys, the value of the counter (how many quartets suggest this subkey) is 0 or 1. The values of these counters also behave like Poisson random variables. Let us examine a Poisson random variable Y ~ Poi(1/32); the probability Pr[Y ≥ 8] ≈ e^{-1/32} · (1/32)^8 / 8! = 2^-55.3 is truly very small. If we take into consideration the fact that we have 2^96 such variables (each corresponds to a wrong subkey guess), each with probability 2^-55.3 of passing the filtering, then about 2^40.7 out of the possible 2^96 subkeys also have at least 8 "right" quartets. Therefore the 39-round attack fails, as there are 2^40.7 subkeys suggested by the attack, and only in half of the cases is the right subkey among them!

³ We present in Section 4 a detailed description of the amplified boomerang attack. Meanwhile, we note that for a random permutation with a 160-bit block, the probability that 2^157.5 pairs create an amplified boomerang quartet is 2^-5.
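The Poisson-tail arithmetic above is easy to check numerically (our own sketch in Python, not code from the paper; the tail is summed term by term over the upper tail to avoid the catastrophic cancellation of computing one minus the lower tail):

```python
from math import exp, factorial, log2

def poisson_tail(lam, k, terms=80):
    """Pr[X >= k] for X ~ Poi(lam), summed directly over the upper tail."""
    return sum(exp(-lam) * lam**j / factorial(j) for j in range(k, k + terms))

p8 = poisson_tail(1 / 32, 8)       # wrong-subkey counter reaches 8: ~2^-55.3
p7 = poisson_tail(1 / 32, 7)       # lowered threshold of 7:          ~2^-47.3
wrong_subkeys = 2**96 * p8         # ~2^40.7 wrong subkeys survive the filter
success_at_7 = poisson_tail(8, 7)  # right subkey (Poi(8)) passes:    ~0.69
```

Note that p7/p8 ≈ 2^8, which quantifies the trade-off incurred by lowering the counting threshold from 8 quartets to 7.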
At first glance it appears that these observations indicate that the attack is incorrect. We correct the attack by exploiting the fact that all the subkeys depend linearly on the user key. Given a subkey with 8 (or more) candidate quartets, we can check all keys which generate this subkey. Listing those keys is easy and can be done efficiently, as the subkeys are linear in the key. Thus, we can reduce the time complexity of exhaustive key search by a factor of 2^55.3 and find the right key with probability 50.5%. By reducing the threshold (examining subkeys with 7 or more quartets) we increase the success rate of the attack to 69%. In exchange, we examine 1 out of every 2^47.3 keys, i.e., 2^8 times more keys than when 8 quartets are the requirement. We can further correct and improve the above results. Right quartets are composed of two right pairs with respect to the first differential used in the amplified boomerang distinguisher⁴; thus, if a subkey gets 7 or more quartets, we have at least 14 pairs with respect to the first differential. We use those pairs to mount a regular differential attack. Our analysis shows that 14 pairs are sufficient to determine the first round subkey uniquely. Moreover, for a wrong subkey guess, the probability that these 14 pairs suggest a subkey for the first round is about 2^-24; hence, this can be used for eliminating wrong subkey guesses. This variant of the attack retrieves 128 bits of subkey material: 32 bits in the first round (these bits are actually key material), and the remaining 96 bits that are linearly derived from the key. We can retrieve further subkey material by continuing the differential attack and using auxiliary techniques. Combining these corrections and improvements, we get a valid attack with the same time and data complexity as the one mentioned in [8].
The most time-consuming part of the attack is still the basic 39-round attack, while the remaining steps have a negligible time complexity.
4 Rectangling the Attack – Attacking 47-Round SHACAL-1
In this section, we improve the corrected results of [8] to attack 47-round SHACAL-1 more efficiently. Our improvements are based on transforming the amplified boomerang attack into a rectangle attack. This allows us to reduce the data complexity of the attack to 2^151.9 chosen plaintexts, and the time complexity to 2^482.6 47-round SHACAL-1 encryptions. We first upgrade the distinguisher from an amplified boomerang distinguisher [7] into a rectangle distinguisher [2]. These attacks are closely related to the boomerang attack [15]. Both the amplified boomerang and the rectangle attacks are based on treating the distinguished part of the cipher as composed of two sub-ciphers. Formally, we treat the block cipher E as a cascade of 4 sub-ciphers: E = Ef ∘ E1 ∘ E0 ∘ Eb, where Eb are the attacked rounds before the distinguisher,⁴

⁴ Again, we present a detailed description of the amplified boomerang attack in the next section.
Table 3. Number of Differentials for Rounds 0–20 of SHACAL-1.

Probability (p)              | 2^-45 | 2^-46   | 2^-47   | 2^-48   | 2^-49   | 2^-50   | 2^-51
Number of Differentials (l)  | 1     | 7       | 24      | 73      | 182     | 351     | 677
Contribution to p̂^2 (= lp^2) | 2^-90 | 2^-89.2 | 2^-89.4 | 2^-89.8 | 2^-90.5 | 2^-91.5 | 2^-92.6
Ef are the attacked rounds after the distinguisher, and E1 ∘ E0 is the distinguished part. In the amplified boomerang attack, we take a differential α → β through E0 with probability p and a differential γ → δ through E1 with probability q. If we take N pairs of plaintexts with input difference α, about Np of them have difference β after E0. Those Np pairs can be combined into (Np)^2/2 quartets. Denoting such a quartet by ((P1, P2), (P3, P4)), we know that P1 ⊕ P2 = P3 ⊕ P4 = α and that E0(P1) ⊕ E0(P2) = E0(P3) ⊕ E0(P4) = β. Assuming that the values E0(Pi) are distributed uniformly, then with probability 2^-160 it is true that E0(P1) ⊕ E0(P3) = E0(P2) ⊕ E0(P4) = γ. When this happens, with probability q^2 we get that E1(E0(P1)) ⊕ E1(E0(P3)) = E1(E0(P2)) ⊕ E1(E0(P4)) = δ. Hence, starting with N plaintext pairs with input difference α, we expect about N^2 · (pq)^2 · 2^-161 quartets which satisfy the condition C1 ⊕ C3 = C2 ⊕ C4 = δ, where Ci is the ciphertext corresponding to the plaintext Pi. The rectangle attack is based on the same basic idea. However, the attack allows using any differential α → β in E0 and any differential γ → δ in E1 (as long as β ≠ γ). Besides these improvements, the attack gains a factor of 2 in the number of expected quartets by checking the two different quartets ((P1, P2), (P3, P4)) and ((P1, P2), (P4, P3)). Hence, starting with N plaintext pairs with input difference α, we expect to get N^2 · (p̂q̂)^2 · 2^-160 right quartets, where:

p̂ = sqrt( Σ_β Pr^2[α → β] ),   q̂ = sqrt( Σ_γ Pr^2[γ → δ] ).
Moreover, for a rectangle distinguisher there is a better key recovery algorithm, presented in [3]. Therefore, using the rectangle attack instead of the amplified boomerang attack is much more efficient and requires less data. In order to compute p̂ we need to sum the squares of the probabilities of all the differentials with input difference α through E0. This task is computationally infeasible, and thus we try to count over as many differentials as we can. We settle for counting over all differentials which have the same first 19 or 20 rounds as the 21-round differential used in the original amplified boomerang attack. In Table 3 we gather the number of counted differentials according to their probabilities, and in Appendix A we present some of these differentials. Given these results, we are able to compute a lower bound p̂ = 2^-43.64. For the same reasons, computing the exact value of q̂ is computationally infeasible. Hence, we count only differentials which have the same last 13 or 14 rounds as the 15-round differential used in the original amplified boomerang attack. In Table 4 we list the number of differentials with their respective probabilities, and in Appendix A we present some of these differentials. We take all these differentials into account and get that q̂ = 2^-30.28.

Table 4. Number of Differentials for Rounds 21–35 of SHACAL-1.

Probability (p)              | 2^-31 | 2^-32   | 2^-33 | 2^-34   | 2^-35 | 2^-36   | 2^-37
Number of Differentials (l)  | 1     | 3       | 8     | 18      | 32    | 48      | 56
Contribution to q̂^2 (= lp^2) | 2^-62 | 2^-62.4 | 2^-63 | 2^-63.8 | 2^-65 | 2^-66.4 | 2^-68.2

These improvements reduce the data complexity of the attack from 2^158.5 chosen plaintexts to 2^155.9 chosen plaintexts. Due to the nature of the amplified boomerang and the rectangle attacks, this reduces the time complexity by a factor of 2^5.2 ≈ 35.5. Our third improvement to the data and time complexity of the attack is based on reducing the number of rounds in the distinguisher itself. In the attack presented in [8], there are 0 rounds in Eb, 21 rounds in E0, 15 rounds in E1, and between 3 and 11 rounds in Ef. We can move one round from E0 to the rounds before the distinguisher, Eb. This changes the division into sub-ciphers slightly: 1 round in Eb and 20 rounds in E0. This increases the probability of the differentials of E0 by a factor of 2^4, in exchange for a more complex attack algorithm. In these new settings we use the results of [3], where a generic key recovery algorithm based on the rectangle distinguisher is presented. We shortly describe the notations used in that paper. Let E = Ef ∘ E1 ∘ E0 ∘ Eb be an n-bit block cipher. Assume that for E0 we have an input difference α and a related p̂, and for E1 an output difference δ and a related q̂. We denote by rb the number of bits which are active or can be active in the plaintext, given that there is an α difference after Eb (and before E0). We denote by 2^tb the number of possibilities for these bits. For example, if an α difference after Eb requires that some plaintext bit always differs in the pair, then this bit is counted in rb but has no effect on tb. We denote by mb the number of subkey bits in Eb that we attack (i.e., the number of subkey bits in Eb that affect the α difference after Eb). For Ef, we denote by rf the number of ciphertext bits whose difference is unknown after Ef if the input difference of Ef was δ.
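The lower bounds p̂ = 2^-43.64 and q̂ = 2^-30.28 can be reproduced directly from Tables 3 and 4 (our own arithmetic in Python, summing l·p² over the counted differentials and taking the square root):

```python
from math import log2, sqrt

# (probability exponent, number of differentials), taken from Tables 3 and 4
table3 = [(-45, 1), (-46, 7), (-47, 24), (-48, 73), (-49, 182), (-50, 351), (-51, 677)]
table4 = [(-31, 1), (-32, 3), (-33, 8), (-34, 18), (-35, 32), (-36, 48), (-37, 56)]

def hat(table):
    """sqrt of the sum of l * p^2 over the counted differentials."""
    return sqrt(sum(l * (2.0**e) ** 2 for e, l in table))

p_hat = hat(table3)            # ~2^-43.64
q_hat = hat(table4)            # ~2^-30.28
N = 2.0**82 / (p_hat * q_hat)  # 2^(n/2+2)/(p_hat*q_hat), n = 160: ~2^155.9
```

The last line reproduces the 2^155.9 chosen plaintexts quoted in the text for the 36-round distinguisher.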
Similarly, 2^tf is the number of possible differences in these rf bits. We denote by mf the number of subkey bits in Ef that affect the δ difference. See [3] for more details on these notations and for the detailed attack. The figures for this decomposition of 39-round SHACAL-1 are as follows: n = 160 (as SHACAL-1 is a 160-bit block cipher). If we truncate the first round of the 21-round characteristic (this round becomes Eb), there are 25 bits that might have a difference in the plaintext: the 22 most significant bits of register E, bit 5 of register A, bit 20 of register C and bit 15 of register D; thus rb = 25. If we look at these 25 bits we observe that not all 2^25 differences are possible: 3 out of these 25 bits always differ, i.e., for a pair with difference α after Eb these bits are always active, and for some other bits not all the differences are possible. For example, bits 10–14 of register E can have only one of the following
five patterns – {01x, 03x, 07x, 0Fx, 1Fx}. Similarly, our analysis reveals that the 22 bits of register E can have at most 5 · 2 · 4 · 2 · 2 · 12 = 960 ≈ 2^9.9 differences before round 0, and therefore tb = 9.9. If the output difference of a pair after E1 is δ, then after Ef we get that out of the 160 ciphertext bits, 68 bits have no difference or always have a difference (bits 0, 1 of register B, bits 30, 31 of register C and the entire D and E registers). Therefore, the number of bits which may differ in a right pair is rf = 160 − 68 = 92. Still, for a right pair, not all 2^92 possible differences in these 92 bits can be achieved. Our analysis reveals that at most 7 · 2^22 · 7 · 2^22 · 2^32 = 2^81.6 of these differences can be achieved if the input difference to Ef is δ; thus, tf = 81.6. Finally, for this attack mb = 32 (as we attack one round in Eb) and mf = 96 (as we attack 3 rounds in Ef). Assigning these figures to the complexity analysis from [3], we obtain that the data complexity of the attack is N = 2^(n/2+2)/(p̂q̂) = 2^151.9 chosen plaintexts, and the time complexity of the attack is N^2 (2^(rf−n−1) + 2^(tf−n) + 2^(2tf+2rb−2n−2) + 2^(mb+2tf+tb−2n−1) + 2^(mf+2tb+tf−2n−1)) + N memory accesses, which is 2^235.4 memory accesses. These 2^235.4 memory accesses are equivalent to 2^227.5 39-round SHACAL-1 encryptions.⁵ We can extend this attack to 47-round SHACAL-1 using the following algorithm:

1. Try all 2^256 possible values of the eight 32-bit subkeys (of rounds 39–46). For each guess, partially decrypt all the ciphertexts through these 8 rounds, and:
   (a) Apply the 39-round rectangle attack.
   (b) In case the 39-round attack suggests a subkey with 3 or more quartets, check all the 2^128 keys that generate this 128-bit subkey value (the one suggested by the 39-round attack) and the 256-bit subkey values for rounds 39–46.

The loop is repeated 2^256 times, and it is expected that no more than 2^66 subkeys have 3 or more quartets for each 256-bit subkey guess. Thus, the time complexity of Step (b) is 2^256 · 2^66 · 2^128 = 2^450 47-round SHACAL-1 encryptions. The time complexity of Step 1 is 2^256 · 2^151.9 · 8/47 = 2^405.3 47-round SHACAL-1 encryptions, and the time complexity of Step (a) is 2^490.8 memory accesses (which are equivalent to 2^482.6 47-round SHACAL-1 encryptions). Thus, the total time complexity of the attack is 2^482.6 47-round SHACAL-1 encryptions.
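The step costs just quoted reduce to simple log2 arithmetic (our own sanity check; the roughly 2^8.2 ratio between memory accesses and encryptions is the one implied by the quoted 2^490.8 → 2^482.6 conversion via the figures of [11], not something we derived independently):

```python
from math import log2

# Step 1: 2^256 guesses, 2^151.9 texts, 8 of 47 rounds processed
step1 = 256 + 151.9 + log2(8 / 47)   # ~405.3, i.e. 2^405.3 encryptions
# Step (b): 2^256 guesses * 2^66 surviving subkeys * 2^128 keys each
step_b = 256 + 66 + 128              # = 450, i.e. 2^450 encryptions
# Step (a): memory accesses converted to encryptions (assumed 2^8.2 ratio)
step_a = 490.8 - 8.2                 # = 482.6, the dominant term
```

Step (a) dominates, which is why the total is quoted as 2^482.6 encryptions.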
5 Attacking 49-Round SHACAL-1
In this section we present a rectangle attack on 49-round SHACAL-1. The attack is quite similar to the one presented in the previous section. In order to improve the attack, we remove one round of the basic 39-round attack (obtaining a 38-round rectangle attack), and perform a chosen ciphertext attack on rounds 29–77 (or a chosen plaintext attack on rounds 22–70) of SHACAL-1. This results

⁵ The conversion was done according to the best performance figures presented in [11].
in an attack on these 49 rounds that requires 2^508.5 encryptions, or an attack on 47 rounds that requires 2^444.5 encryptions. The first observation is that the inclusion of the third round of Ef (the rounds after the distinguisher) in the 39-round attack does not contribute to the attack. After adding this third round to Ef, the number of subkey bits checked (mf) is increased by 32. The number of possible differences that a δ difference causes after Ef is increased by a factor of 2^32; hence, tf and rf are increased by 32 each. Therefore, adding this third round is equivalent to guessing the last round subkey and decrypting all ciphertexts for another round. This observation points out that we can use, in the 47-round attack, a 38-round attack which contains one round before the rectangle (Eb), 20 rounds in E0, 15 in E1 and 2 rounds after the rectangle (Ef), with the following parameters: rb = 25, tb = 9.9, mb = 32, rf = 60, tf = 49.6, mf = 64 and p̂ = 2^-39.87, q̂ = 2^-30.32. Using these figures in the rectangle attack presented in [3] yields an attack which requires 2^151.9 chosen plaintexts and 2^203.4 memory accesses. Using this observation about the distinguisher, we change the attack algorithm from the previous section accordingly to:

1. Try all 2^288 possible values of the nine 32-bit subkeys (of rounds 38–46). For each guess, partially decrypt all the ciphertexts through these 9 rounds, and:
   (a) Apply the 38-round rectangle attack.
   (b) In case the 38-round attack suggests a subkey with 3 or more quartets, check all the 2^128 keys that generate this 96-bit subkey value (the one suggested by the 38-round attack) and the 288-bit subkey values for rounds 38–46.

This new attack algorithm has the same data complexity as before (2^151.9 chosen plaintexts), with time complexity of 2^151.9 · 2^288 · 9/47 = 2^437.5 encryptions for Step 1, 2^490.8 memory accesses for Step (a), and 2^450 encryptions for Step (b). As the reader surely observed, this change does not improve the attack.
First, we transform the 38-round attack into a chosen ciphertext attack. This might pose a problem, as the distinguisher starts with the first round, and we cannot decrypt the round before it. To solve this problem, we use our second observation: fif and fmaj behave almost in the same manner with respect to differential cryptanalysis. In particular, the differentials that we use can be transferred to rounds 41–60 without affecting their probability. Therefore, we can move the 38-round attack to rounds 40–77. The attack is a chosen ciphertext attack, thus we treat the cipher in reversed order. The cipher which we attack starts just after round 77 and ends just before round 40: it has two rounds in the new Eb (rounds 76–77), 35 rounds in the distinguisher itself (rounds 41–75), and one round afterwards in the new Ef (round 40). For these settings the values for the rectangle attack are: mb = 64, mf = 32, tb = 49.6, tf = 9.9, rb = 68 and rf = 22. As the same differentials are used, the values of p̂ and q̂ remain the same. Thus, our attack requires 2^151.9 chosen ciphertexts but only 2^165.4 memory accesses. We can now attack 11 more rounds after the distinguisher (actually before the distinguisher), thus attacking rounds 29–77 using the following algorithm:
Table 5. Summary of Our Results and Previously Known Results.

Attack                  | Number of Rounds | Rounds | Data       | Time
Differential [8]        | 41               | 0–40   | 2^141 CP   | 2^491
Amplified Boomerang [8] | 47               | 0–46   | 2^158.5 CP | 2^508.4
Rectangle – this paper  | 47               | 0–46   | 2^151.9 CP | 2^482.6
Rectangle – this paper  | 47               | 22–68  | 2^151.9 CP | 2^444.5
Rectangle – this paper  | 47               | 31–77  | 2^151.9 CC | 2^444.5
Rectangle – this paper  | 49               | 22–70  | 2^151.9 CP | 2^508.5
Rectangle – this paper  | 49               | 29–77  | 2^151.9 CC | 2^508.5
Complexity is measured in encryption units. CP - Chosen Plaintexts, CC - Chosen Ciphertexts
1. Try all 2^352 possible values of the eleven 32-bit subkeys (of rounds 29–39). For each guess, partially encrypt all the plaintexts through these 11 rounds, and:
   (a) Apply the 38-round rectangle attack on rounds 40–77.
   (b) In case the 38-round attack suggests a subkey with 3 or more quartets, check all the 2^64 keys which generate this 96-bit subkey value (the one suggested by the 38-round attack) and the 352-bit subkey values for rounds 29–39.

The time complexity of this 49-round attack is: 2^151.9 · 2^352 · 11/47 = 2^501.8 49-round SHACAL-1 encryptions for Step 1, 2^516.8 memory accesses for Step (a), and 2^478 encryptions for Step (b). Translating the time complexity into units of 49-round SHACAL-1 encryptions, we get that the total time complexity of the attack is 2^508.5 49-round SHACAL-1 encryptions. We can mount a chosen plaintext attack on rounds 22–70 using the same attack algorithm with chosen plaintexts, guessing the subkeys of the 11 rounds after the end of the distinguisher (i.e., rounds 60–70), with the same data and time complexity. We can also apply this attack to 47-round SHACAL-1, with the same data complexity, but with a time complexity of only 2^444.5 47-round SHACAL-1 encryptions.
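The same log2 arithmetic applies to the 49-round attack (again our own check; the 2^62 surviving subkeys in Step (b) and the ~2^8.2 access-to-encryption ratio are the values implied by the quoted 2^478 and 2^508.5 figures, not independently derived):

```python
from math import log2

# Step 1: 2^352 guesses, 2^151.9 texts, 11 rounds out of 47 (as quoted)
step1 = 151.9 + 352 + log2(11 / 47)  # ~501.8, i.e. 2^501.8 encryptions
# Step (b): 2^352 guesses * 2^62 surviving subkeys * 2^64 keys each
step_b = 352 + 62 + 64               # = 478, i.e. 2^478 encryptions
# Step (a): 2^516.8 memory accesses at the assumed 2^8.2 ratio
total = 516.8 - 8.2                  # ~508.6, matching the quoted ~2^508.5
```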
6 Summary and Conclusions
In this paper we improve the cryptanalytic results on SHACAL-1. The improvements allow us to attack 47-round SHACAL-1 (rounds 0–46) using the rectangle attack with a data complexity of 2^151.9 chosen plaintexts and a time complexity of 2^482.6 encryptions. We can also attack rounds 22–68 or rounds 31–77 with a time complexity of 2^444.5 encryptions. Another attack presented in this paper is on 49 rounds; it requires the same amount of data and 2^508.5 encryptions. This is the best currently known result on SHACAL-1. We summarize our results and compare them with the previously known results in Table 5.
References

1. Eli Biham, Adi Shamir, Differential Cryptanalysis of the Data Encryption Standard, Springer-Verlag, 1993.
2. Eli Biham, Orr Dunkelman, Nathan Keller, The Rectangle Attack – Rectangling the Serpent, Advances in Cryptology, proceedings of EUROCRYPT 2001, Lecture Notes in Computer Science 2045, pp. 340–357, Springer-Verlag, 2001.
3. Eli Biham, Orr Dunkelman, Nathan Keller, New Results on Boomerang and Rectangle Attacks, proceedings of Fast Software Encryption 9, Lecture Notes in Computer Science 2365, pp. 1–16, Springer-Verlag, 2002.
4. Florent Chabaud, Antoine Joux, Differential Collisions in SHA-0, Advances in Cryptology, proceedings of CRYPTO 1998, Lecture Notes in Computer Science 1462, pp. 56–71, Springer-Verlag, 1998.
5. Helena Handschuh, Lars R. Knudsen, Matthew J. Robshaw, Analysis of SHA-1 in Encryption Mode, proceedings of CT-RSA 2001, Lecture Notes in Computer Science 2020, pp. 70–83, Springer-Verlag, 2001.
6. Helena Handschuh, David Naccache, SHACAL, preproceedings of the first NESSIE workshop, Leuven, 2000.
7. John Kelsey, Tadayoshi Kohno, Bruce Schneier, Amplified Boomerang Attacks Against Reduced-Round MARS and Serpent, proceedings of Fast Software Encryption 7, Lecture Notes in Computer Science 1978, pp. 75–93, Springer-Verlag, 2000.
8. Jongsung Kim, Dukjae Moon, Wonil Lee, Seokhie Hong, Sangjin Lee, Seokwon Jung, Amplified Boomerang Attack against Reduced-Round SHACAL, Advances in Cryptology, proceedings of ASIACRYPT 2002, to appear.
9. Markku-Juhani O. Saarinen, Cryptanalysis of Block Ciphers Based on SHA-1 and MD5, these proceedings.
10. NESSIE – New European Schemes for Signatures, Integrity and Encryption. http://www.nessie.eu.org/nessie
11. NESSIE, Performance of Optimized Implementations of the NESSIE Primitives, NES/DOC/TEC/WP6/D21/2.
12. NESSIE, Portfolio of recommended cryptographic primitives.
13. US National Institute of Standards and Technology, Secure Hash Standard, Federal Information Processing Standards Publication No. 180-2, 2002.
14. Etienne Van Den Bogeart, Vincent Rijmen, Differential Analysis of SHACAL, NESSIE internal report NES/DOC/KUL/WP3/009/a, 2001.
15. David Wagner, The Boomerang Attack, proceedings of Fast Software Encryption 6, Lecture Notes in Computer Science 1636, pp. 156–170, Springer-Verlag, 1999.
A Differentials of SHACAL-1
In this appendix we describe the differentials used for the rectangle attacks on SHACAL-1. The basic differentials were previously presented in [8]. The first differential (before truncating its first round) covers rounds 0–20 (or rounds 40–60), and is presented in Table 6. We use the notation ei to represent the 32-bit word composed of 31 0's and a 1 in the ith place. We use ei,j to denote ei ⊕ ej, ei,j,k = ei,j ⊕ ek, etc. Recall that for the rectangle attacks presented in this paper, we changed this differential by removing its first round. Due to the nature of the rectangle attack, we count over several differentials. We have counted over differentials which have the same 19 rounds as the one
Table 6. Differential Characteristic for Rounds 0–20 (or 40–60) of SHACAL-1.

Round (i)       | ΔAi           | ΔBi | ΔCi | ΔDi | ΔEi | Probability
Input (i = 0)   | 0             | e22 | e15 | e10 | e5  | 2^-4
1               | e5            | 0   | e20 | e15 | e10 | 2^-3
2               | 0             | e5  | 0   | e20 | e15 | 2^-3
3               | e15           | 0   | e3  | 0   | e20 | 2^-2
4               | 0             | e15 | 0   | e3  | 0   | 2^-2
5               | 0             | 0   | e13 | 0   | e3  | 2^-2
6               | e3            | 0   | 0   | e13 | 0   | 2^-2
7               | e8            | e3  | 0   | 0   | e13 | 2^-2
8               | 0             | e8  | e1  | 0   | 0   | 2^-2
9               | 0             | 0   | e6  | e1  | 0   | 2^-2
10              | 0             | 0   | 0   | e6  | e1  | 2^-2
11              | e1            | 0   | 0   | 0   | e6  | 2^-1
12              | 0             | e1  | 0   | 0   | 0   | 2^-1
13              | 0             | 0   | e31 | 0   | 0   | 2^-1
14              | 0             | 0   | 0   | e31 | 0   | 2^-1
15              | 0             | 0   | 0   | 0   | e31 | 1
16              | e31           | 0   | 0   | 0   | 0   | 2^-1
17              | e4            | e31 | 0   | 0   | 0   | 2^-2
18              | e9            | e4  | e29 | 0   | 0   | 2^-3
19              | e14           | e9  | e2  | e29 | 0   | 2^-4
20              | e19           | e14 | e7  | e2  | e29 | 2^-5
Output (i = 21) | e2,7,14,24,29 | e19 | e12 | e7  | e2  |
Differences are presented before the round, i.e., ΔA0 is the input difference.
Table 7. Possible ΔA21 Values for the First Characteristic with the Respective Probabilities.

ΔA21                    | Prob.
e2,7,14,24,29           | 2^-45
e2,3,7,14,24,29         | 2^-46
e2,7,8,14,24,29         | 2^-46
e2,7,14,15,24,29        | 2^-46
e2,7,14,24,25,29        | 2^-46
e2,3,7,14,24,29,30      | 2^-46
e2,7,8,14,24,29,30,31   | 2^-46
e2,7,8,14,15,24,29      | 2^-47
e2,7,8,14,24,25,29      | 2^-47
e2,3,4,7,14,15,24,29    | 2^-47
e2,3,7,14,24,25,29      | 2^-47
e2,3,7,14,15,24,29,30   | 2^-47
e2,3,7,8,14,24,29       | 2^-47
e2,7,8,14,24,29,30      | 2^-47
e2,7,8,14,24,29,30,31   | 2^-47
e2,7,14,15,16,24,29     | 2^-47
e2,7,14,15,24,25,29     | 2^-47
e2,3,7,8,14,24,29,30,31 | 2^-47
e2,7,14,15,24,29,30     | 2^-47
e2,3,7,14,15,24,29      | 2^-47
e2,7,8,9,14,24,29       | 2^-47
e2,7,14,15,24,29,30,31  | 2^-47
e2,7,14,24,25,26,29     | 2^-47
e2,7,14,24,25,29,30     | 2^-47
e2,7,14,24,25,29,30,31  | 2^-47
presented in [8] after it was truncated to a 20-round differential. In Table 7 we list the differentials with the highest probabilities which have the same first 19 rounds as the original one. As in the last round the only affected register is A, the list contains only the various differences in register A (the remaining registers have the same differences as in the original characteristic of Table 6). We can also alter the round before last of this differential, gaining more characteristics with probability at most 2^-46. These changes are based on activating one bit in the output of the non-linear function fif (or fmaj). We added their numbers to the table presented in Section 4, as they affect the probability of the distinguisher, but we omit their description here.
Table 8. Differential Characteristic for Rounds 21–35 (or 61–75) of SHACAL-1.

Round (i)       | ΔAi         | ΔBi    | ΔCi     | ΔDi        | ΔEi         | Probability
Input (i = 21)  | e1,5,8      | e1,3,5 | e3,13   | e1,5,13,31 | e6,10,13,31 | 2^-3
22              | 0           | e1,5,8 | e1,3,31 | e3,13      | e1,5,13,31  | 2^-4
23              | e1,8        | 0      | e3,6,31 | e1,3,31    | e3,13       | 2^-4
24              | e1,3        | e1,8   | 0       | e3,6,31    | e1,3,31     | 2^-4
25              | 0           | e1,3   | e6,31   | 0          | e3,6,31     | 2^-3
26              | e1          | 0      | e1,31   | e6,31      | 0           | 2^-2
27              | e1          | e1     | 0       | e1,31      | e6,31       | 2^-1
28              | 0           | e1     | e31     | 0          | e1,31       | 2^-1
29              | 0           | 0      | e31     | e31        | 0           | 1
30              | 0           | 0      | 0       | e31        | e31         | 1
31              | 0           | 0      | 0       | 0          | e31         | 1
32              | e31         | 0      | 0       | 0          | 0           | 2^-1
33              | e4          | e31    | 0       | 0          | 0           | 2^-1
34              | e9,31       | e4     | e29     | 0          | 0           | 2^-3
35              | e14,29      | e9,31  | e2      | e29        | 0           | 2^-4
Output (i = 36) | e9,19,29,31 | e14,29 | e7,29   | e2         | e29         |
Differences are presented before the round, i.e., ΔA21 is the input difference.
Table 9. Possible ΔE21 Values for the Second Characteristic with the Respective Probabilities.

ΔE21                    | Prob.
e6,10,13,31             | 2^-31
e6,7,10,13,31           | 2^-32
e6,10,11,13,31          | 2^-32
e6,10,13,14,31          | 2^-32
e6,7,8,10,13,31         | 2^-33
e6,7,10,11,13,31        | 2^-33
e6,7,10,13,14,31        | 2^-33
e6,10,11,12,31          | 2^-33
e6,10,11,12,13,31       | 2^-33
e6,10,11,13,14,31       | 2^-33
e6,10,13,14,15,31       | 2^-33
e6,7,8,9,13,31          | 2^-34
e6,7,10,11,12,31        | 2^-34
e6,7,8,9,10,13,31       | 2^-34
e6,7,8,10,11,13,14,31   | 2^-34
e6,7,8,10,11,13,31      | 2^-34
e6,7,10,11,12,13,31     | 2^-34
e6,7,10,11,13,14,31     | 2^-34
e6,7,10,13,14,15,31     | 2^-34
e6,10,11,12,13,14,31    | 2^-34
e6,10,11,12,13,14,15,31 | 2^-34
e6,10,11,13,14,15,31    | 2^-34
e6,10,13,14,15,16,31    | 2^-34

The second differential, for rounds 21–35 (or rounds 61–75), is presented in Table 8. It has also previously appeared in [8]. Again, due to the nature of the rectangle attack, we count the probabilities of several differentials. We count over various similar characteristics, obtained by changing the first one or two rounds of this differential. In Table 9 we present some of the differentials which agree with the original characteristic in all but the first round (round 21). As in this round the only change is in the difference of register E, only the various differences in that register are presented.
Cryptanalysis of Block Ciphers Based on SHA-1 and MD5

Markku-Juhani O. Saarinen
Helsinki University of Technology
Laboratory for Theoretical Computer Science
P.O. Box 5400, FIN-02015 HUT, Finland
[email protected]
Abstract. We cryptanalyse some block cipher proposals that are based on the dedicated hash functions SHA-1 and MD5. We discuss a related-key attack against SHACAL-1 and present a method for finding "slid pairs" for it. We also present simple attacks against MDC-MD5 and the Kaliski-Robshaw block cipher.

Keywords: SHA-1, SHACAL, MD5, MDC, Slide attacks, Dedicated hash functions.
1 Introduction
One of the most widely used ways of creating cryptographic hash functions is the so-called Davies-Meyer mode. Let Y = E(X, M) be a compression function that takes in a message block M together with an input variable X and produces a partial digest Y of size equal to X. To compute a message digest of a message M1 | M2 | · · · | Mn, we simply set X0 to some predefined initialization vector and iterate for i = 1, 2, . . . , n:

Xi = E(Xi−1, Mi) ⊕ Xi−1

The resulting message digest is Xn. See [7, 14, 16, 17] for further discussion of hash function modes of operation. Notably, in 2002 Black, Rogaway, and Shrimpton proved that this mode is secure if E is secure [3]. It has been observed that the compression functions of the most widely used dedicated hash functions, MD5 [18] and SHA [19, 20], are in fact based on this construction. In these hash functions the exclusive-or operation has been replaced by wordwise addition modulo 2^32 of the chaining variables. It has also been observed that the compression function E of MD5 and SHA can be efficiently computed in both directions; given Y = E(X, M) and M, the original chaining value X can be recovered using an inverse transform X = E^-1(Y, M). However, it is believed to be more difficult to recover Xi−1 given Xi and Mi.
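The iteration can be sketched as follows (a toy Python illustration; the 64-bit Feistel core E below is made up for demonstration and is NOT the MD5/SHA compression function):

```python
M32 = 0xFFFFFFFF

def _f(half, key):
    # Toy round function (arbitrary constants, for illustration only)
    return (half * 0x9E3779B1 + key) & M32

def E(x, m, rounds=8):
    """Toy keyed permutation of a 64-bit chaining value x under message word m."""
    l, r = x >> 32, x & M32
    for i in range(rounds):
        l, r = r, l ^ _f(r, (m + i) & M32)
    return (l << 32) | r

def E_inv(y, m, rounds=8):
    """Inverse of E: recovers x from y when m is known."""
    l, r = y >> 32, y & M32
    for i in reversed(range(rounds)):
        l, r = r ^ _f(l, (m + i) & M32), l
    return (l << 32) | r

def davies_meyer(blocks, iv=0x0123456789ABCDEF):
    """Davies-Meyer chaining: X_i = E(X_{i-1}, M_i) XOR X_{i-1}."""
    x = iv
    for m in blocks:
        x = E(x, m) ^ x  # the feed-forward XOR is what makes chaining one-way
    return x
```

`E_inv` illustrates the point made above: E itself is invertible once M is known; it is the feed-forward XOR of the chaining value that destroys this invertibility for the hash iteration.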
This work has been supported by the Finnish Defence Forces.
T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 36–44, 2003. c International Association for Cryptologic Research 2003
2
37
The SHACAL Block Ciphers
One of the proposals for the NESSIE project was the block cipher SHACAL, which is essentially the SHA-1 compression function with the Davies-Meyer chaining peeled off [11]. In this proposal the message block M is viewed as the "key", the chaining variable X acts as the plaintext block, and Y = E(X, M) is the corresponding ciphertext block. Later, a "tweak" was submitted in which the original SHACAL was renamed SHACAL-1 and a new block cipher, SHACAL-2 (based on SHA-256), was also proposed [12]. We refer the reader to [11, 12, 20] for detailed specifications of the SHACAL block ciphers. The basic structure is also used as a part of the HORNET stream cipher [15]. A detailed analysis of differential and linear properties of SHA-1 in encryption mode can be found in [10], where it is conjectured that a linear cryptanalytic attack would require at least 2^80 known plaintexts and a differential attack would require at least 2^116 chosen plaintexts.

2.1 Sliding SHA-1 and SHACAL
Slide attacks against block ciphers were introduced by Biryukov and Wagner in 1999 [4, 5], although similar techniques had previously been used by others. To our knowledge, slide attacks against hash functions have not previously been considered in the literature. Indeed, it is difficult to see if and how "slid pairs" in the compression function can be exploited to find collisions for the hash function. This remains an open question. However, it is interesting to consider whether or not slid pairs (which are essentially linear relations between two inputs and outputs) can be easily found for SHA-1. This is also related to Anderson's classification of hash functions [1]. David Wagner has considered a slide attack on 40 iterations of SHA-1 in unpublished work [21]. SHA-1 exhibits some properties which are useful when mounting slide attacks.

a) The SHA-1 compression function consists of four different "rounds". For the 20 iterations of each round the nonlinear function Fi and the constant Ki are unchanged. There are only three transitions between different iteration types (see Figure 1).

b) The key schedule (i.e. message expansion) can be slid. We simply choose W′i = Wi+1 for 0 ≤ i ≤ 14 and W′15 = (W1 ⊕ W7 ⊕ W12 ⊕ W15) ≪ 1. It is easy to see that after the key expansion W′i = Wi+1 for 0 ≤ i ≤ 78.

We note that these properties are not exhibited by SHA-256 (or SHA-512), thus making SHACAL-2 more resistant to slide attacks.
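The slid key schedule in property (b) can be checked with a short script. The sketch below uses the standard FIPS 180-1 recurrence and simply sets W′15 to the expanded word W16, which is the value the slid schedule needs; variable names are ours:

```python
# Checks that the SHA-1 message expansion can be slid by one position.
# Standard FIPS 180-1 recurrence: W[t] = (W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16]) <<< 1.

def rotl32(x, r):
    return ((x << r) | (x >> (32 - r))) & 0xFFFFFFFF

def expand(w16):
    w = list(w16)
    for t in range(16, 80):
        w.append(rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w

w = expand([(0x01234567 * (i + 1)) & 0xFFFFFFFF for i in range(16)])
# Slid key: W'_i = W_{i+1} for 0 <= i <= 14, and W'_15 chosen to equal W_16.
w_slid = expand(w[1:17])
# After expansion the slide persists through the whole schedule:
assert all(w_slid[i] == w[i + 1] for i in range(79))
```

The slide persists automatically once the first 16 words are aligned, because both schedules obey the same linear recurrence.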
2.2 Related-Key Attacks
We shall consider the difficulty of distinguishing related keys. We assume that we are given chosen-plaintext access to two SHACAL-1 encryption oracles (“black
boxes") whose keys are related in the way described in the previous section. The main question becomes how many chosen plaintexts are needed. For the transition iterations between different types of functions we wish to find inputs that produce the same output word for both types ("round collisions"). Experiments have confirmed that the round functions behave sufficiently randomly for us to use 2^−32 as the probability of a round collision. Since there are three transitions, a simple distinguisher will require approximately 2^128 chosen plaintext pairs. As pointed out by an anonymous program committee member, this can be improved to 2^96 by using "structures": first perform 2^32 encryptions of (A, B, C, D, x) on the first oracle, where x = 0, 1, 2, . . . , 2^32 − 1 and A, B, C, D are some constants. Then do another 2^32 encryptions of (y, A, B ≫ 2, C, D) on the second "slid" oracle for y = 0, 1, 2, . . . , 2^32 − 1. Since each entry in the first set corresponds to some slid entry in the second set, the first collision is effectively obtained for free, and only 2^96 pairs are required to distinguish the related keys. A version of SHACAL-1 reduced to three rounds (60 iterations) will require 2^64 pairs (only two transitions).

2.3 An Algorithm for Finding Slid Pairs
A method exists for finding slid pairs with roughly 2^32 effort. The method is rather technical, so we can only give an overview of the key ideas used. The general strategy is as follows. The algorithm doesn't start by choosing the plaintext or the ciphertext, but from the "middle", iterations 20 and 40. We find collisions in these positions with O(1) effort and then work towards iterations 25 – 28, where we perform a partial meet-in-the-middle match.

Round Collisions. We note that in iteration i, not all input words affect the possibility of a round collision; only B, C, and D are relevant, since A and E only affect the output word linearly. Furthermore, the key word Wi has no effect on the probability of collision in iterations i or i + 1. For iteration pair 19/20 (select-parity transition) we use¹

(B, C, D) = (⊟K20, ⊟K0, ⊟K0).

It is easy to see that

(B ∧ C) ∨ (¬B ∧ D) = ⊟K0
B ⊕ C ⊕ D = ⊟K20

Thus the constant (K0/K20) is canceled out in both cases and a round collision occurs.

¹ We use the boxed plus and minus symbols ⊞ and ⊟ to denote twos complement addition and subtraction operations mod 2^32. ⊟x can be read as 0 ⊟ x. Other symbols denote word-wise binary operators as follows: ¬ not, ∧ and, ∨ or, and ⊕ exclusive-or.
Similarly, for iteration pair 39/40 (parity-majority transition) we use:

(B, C, D) = (⊟K20, ⊟K40, ⊟K40).

Again we see that a collision occurs:

B ⊕ C ⊕ D = ⊟K20
(B ∧ C) ∨ (C ∧ D) ∨ (B ∧ D) = ⊟K40.

Keying. The key-expansion LFSR is sparse. This helps us to stretch the 16-word span of the key schedule window to cover two collisions at iterations 20 and 40. We note that all 80 key words can easily be computed from any 16 consecutive words of the expanded key. In our attack we choose the keys W21...36. We start by forcing a collision at iteration 20 and then running the cipher forward to iteration 25. We then pick (A, B, C, D, E) after iteration 38 so that a collision occurs at iteration 40, and then run the cipher backwards to iteration 28. Key words W21 . . . W24 and W29 . . . W36 are set to zero. Therefore

W37 = (W34 ⊕ W29 ⊕ W23 ⊕ W21) ≪ 1 = 0
W38 = (W35 ⊕ W30 ⊕ W24 ⊕ W22) ≪ 1 = 0.

Note that W39 and W40 do not affect the collision at iteration 40. We can choose the four keying words W25...28 without disturbing the two round collisions! Unfortunately we cannot control all five words, so we are forced to use a large lookup table to find a match for the fifth word of the running state. This works because SHA-1 requires five iterations before all words are non-linearly changed. After we have collisions in iterations 20 and 40, and sixteen keying words W21...36, we simply run the compression function forward to iteration 59/60 and see if a collision occurs there also. Since two of the three necessary collisions can be found with essentially O(1) effort and the third requires O(2^32) operations, the overall complexity of finding slid pairs is O(2^32). The method has been implemented; see the Appendix for an example of a slid pair for full SHACAL-1. This is a surprising property, but we have not discovered a direct way to transform it into a practical attack against SHA-1 or SHACAL.
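Assuming the ⊟-negated constants above (our reading of the construction), both round-collision cancellations can be verified directly with the standard SHA-1 round constants:

```python
# Numerical check of the two round-collision cancellations. K0, K20, K40
# are the standard SHA-1 round constants; neg() is the boxed-minus
# (two's complement negation mod 2^32) from the footnote.

M32 = 0xFFFFFFFF
K0, K20, K40 = 0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC

def neg(x):
    return (-x) & M32

def f_ch(b, c, d):    # iterations 0-19 ("select")
    return (b & c) | ((b ^ M32) & d)

def f_par(b, c, d):   # iterations 20-39 ("parity")
    return b ^ c ^ d

def f_maj(b, c, d):   # iterations 40-59 ("majority")
    return (b & c) | (c & d) | (b & d)

# Pair 19/20: with C = D, Ch equals C regardless of B, and Parity equals B.
B, C, D = neg(K20), neg(K0), neg(K0)
assert (f_ch(B, C, D) + K0) & M32 == 0    # Ch contributes -K0, cancels K0
assert (f_par(B, C, D) + K20) & M32 == 0  # Parity contributes -K20, cancels K20

# Pair 39/40: with C = D, Maj equals C; Parity again equals B.
B, C, D = neg(K20), neg(K40), neg(K40)
assert (f_par(B, C, D) + K20) & M32 == 0
assert (f_maj(B, C, D) + K40) & M32 == 0
```

The key point is that setting C = D makes both the select and majority functions output C independently of B, which lets one constant cancel while B absorbs the other.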
3 Block Ciphers Based on MD5
We may also consider block ciphers derived from other dedicated hash functions. One obvious candidate is MD5 [18]. The MD5 compression function consists of four “rounds”, each of which has 16 iterations. Because each of the 64 iterations has a different constant, the MD5 compression function doesn’t seem to be subject to sliding attacks. Figure 2 shows the structure of a single MD5 iteration.
Fig. 2. The MD5 iteration.
The MD5 compression function is known not to be collision-resistant [8]. This indicates the existence of "fixed points" in the corresponding block cipher, but doesn't really tell us much about its security. Some differential cryptanalysis of the hashing mode has been attempted [2], but we are not aware of any cryptanalysis of MD5 in encryption mode. However, there exists at least one high-probability differential characteristic:

P ⊕ P′ = 80000000 80000000 80000000 80000000
↓
E(P, K) ⊕ E(P′, K) = 80000000 80000000 80000000 80000000

With probability 2^−16 this characteristic will penetrate the 16 iterations of each of rounds 1, 2, and 4. The characteristic holds with probability 1 for round 3, yielding a total product probability of 2^−48. Note that if the chaining variable is added, the output xor becomes zero. This attack is closely related to the collision attacks discussed in [6].
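The characteristic passes the modular additions for free because flipping the most significant bit is the same as adding 2^31 mod 2^32, so no carry can propagate; a quick self-contained check (ours, not from the paper):

```python
# Flipping the MSB commutes with addition mod 2^32:
#   ((x ^ 0x80000000) + k) mod 2^32 == ((x + k) mod 2^32) ^ 0x80000000,
# since x ^ 2^31 == x + 2^31 (mod 2^32).  This is why the all-MSB input
# difference costs nothing at the additions; only the nonlinear round
# functions can destroy it.

M32 = 0xFFFFFFFF
MSB = 0x80000000

for x, k in [(0, 0), (0x7FFFFFFF, 1), (0xDEADBEEF, 0x12345678), (MSB, MSB)]:
    assert ((x ^ MSB) + k) & M32 == (((x + k) & M32) ^ MSB)

# Through the round-3 function H(x, y, z) = x ^ y ^ z the difference
# propagates with probability 1: flipping the MSB of all three inputs
# flips the MSB of H an odd number of times, i.e. once overall.
H = lambda x, y, z: x ^ y ^ z
x, y, z = 0x01234567, 0x89ABCDEF, 0xFEDCBA98
assert H(x ^ MSB, y ^ MSB, z ^ MSB) == H(x, y, z) ^ MSB
```

For the other rounds the nonlinear functions F, G and I only pass the MSB difference probabilistically, which is where the 2^−16 per round comes from.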
3.1 Message Digest Cipher
The Message Digest Cipher (MDC) encryption mode for iterated hash functions was proposed by Gutmann in 1993 and is used in his Secure FileSystem software (in conjunction with the SHA-1 compression function) [9]. MDC can be defined as

C0 = IV
Ci = Pi ⊕ (E(Ci−1, K) ⊞ Ci−1) for i = 1, 2, . . . , n.

Here the boxed plus symbol (⊞) denotes wordwise addition modulo 2^32, IV is an initialization vector, the Pi are the plaintext blocks, and the Ci are the corresponding ciphertext blocks. If we ignore the addition operation, MDC is equivalent to running the compression function in CFB (cipher feedback) mode.
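A minimal sketch of MDC as defined above, with a toy stand-in for the keyed compression function E (the real mode uses a hash compression function; everything named here is illustrative only):

```python
# Sketch of the MDC mode: Ci = Pi ^ (E(Ci-1, K) + Ci-1) mod 2^32,
# with a toy deterministic E standing in for a real compression function.

MASK = (1 << 32) - 1

def E(x, k):
    # Toy keyed function on 32-bit words; any deterministic E works here.
    for r in range(4):
        x = ((x * 0x9E3779B1) + k + r) & MASK
        x ^= x >> 15
    return x

def mdc_encrypt(blocks, key, iv):
    c_prev, out = iv, []
    for p in blocks:
        c = p ^ ((E(c_prev, key) + c_prev) & MASK)
        out.append(c)
        c_prev = c
    return out

def mdc_decrypt(blocks, key, iv):
    c_prev, out = iv, []
    for c in blocks:
        # Same keystream word as encryption: E is never run backwards.
        out.append(c ^ ((E(c_prev, key) + c_prev) & MASK))
        c_prev = c
    return out

pt = [0x11111111, 0x22222222, 0x33333333]
ct = mdc_encrypt(pt, 0xCAFEBABE, 0x01234567)
assert mdc_decrypt(ct, 0xCAFEBABE, 0x01234567) == pt
```

Note that decryption also only runs E forward, which is the CFB-like structure the attack in the next paragraph exploits: the attacker fully controls the input Ci−1 to the compression function in a chosen-ciphertext setting.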
The decryption operation can be written as:

Pi = Ci ⊕ (E(Ci−1, K) ⊞ Ci−1)

This allows us to select the input to the compression function. Using the differential characteristic described in the previous section, we can distinguish MDC-MD5 from a "perfect" 128-bit block cipher in CFB mode with about 2^48 blocks (2^55 bits) in a chosen ciphertext attack. The computational cost of key recovery can be substantially reduced using this, and other, differential properties of the MD5 compression function. Such attacks depend on the details of the key scheduling mechanism used.

3.2 The Kaliski-Robshaw Cipher
Another proposal based on MD5 is the Kaliski-Robshaw cipher [13]. The main purpose of the proposal apparently was to stimulate discussion of very large block ciphers. However, the paper gave enough details about the proposal for us to mount a cryptanalytic attack. This cipher has an 8192-bit block size and its basic iteration is closely related to that of MD5. However, the overall structure is radically different. It turns out that flipping bit 26 (0x04000000) of one of the 256 plaintext words will result in equivalence of at least 64 ciphertext words with experimental probability 0.096 ≈ 2^−3.4. This is due to the fact that this particular bit often affects only three words in the first round. In each of the subsequent rounds the number of affected words can at most quadruple, resulting in 3 · 4 · 4 · 4 = 192 affected words in the end, and leaving 64 words untouched. This immediately leads to a distinguisher requiring no more than a dozen chosen plaintext blocks. Analysis of key recovery attacks is made a little more difficult by the sketchy nature of the description of the key schedule. If we assume that the key can be effectively recovered from the permutation P, we believe that a key recovery attack will not require more than 2^16 chosen plaintext blocks and negligible computational effort.
4 Conclusions
We have presented attacks against block ciphers that have been directly derived from dedicated hash functions. Section 2 discusses slide attacks against SHA-1 and SHACAL-1, and Section 3 describes simple attacks against MDC-MD5 and the Kaliski-Robshaw cipher. Compression functions are meant to be run in only one direction. The security properties of a compression function can be different when it is run in the opposite direction ("decryption"). Furthermore, a key-scheduling mechanism suitable for a dedicated hash function may be insufficient for a block cipher. Based on the evidence at hand, we assert that since the design criteria of compression functions and block ciphers are radically different, adopting even a secure compression function as a block cipher is often not a wise thing to do.
Acknowledgements. The author wishes to thank Kaisa Nyberg, Helger Lipmaa, Matt Robshaw, and several anonymous reviewers for useful comments.
References

1. R. Anderson. The Classification of Hash Functions. Proc. Codes and Cyphers: Cryptography and Coding IV, Institute of Mathematics & Its Applications, pp. 83 – 93, 1995.
2. T. A. Berson. Differential Cryptanalysis Mod 2^32 with Applications to MD5. Advances in Cryptology – Proc. Eurocrypt '92, LNCS 0658, pp. 71 – 80. Springer-Verlag, 1993.
3. J. Black, P. Rogaway, and T. Shrimpton. Black-Box Analysis of the Block-Cipher-Based Hash-Function Constructions from PGV. Advances in Cryptology – Proc. Crypto '02, LNCS 2442, pp. 320 – 335. Springer-Verlag, 2002.
4. A. Biryukov and D. Wagner. Slide Attacks. Proc. Fast Software Encryption '99, LNCS 1636, pp. 245 – 259. Springer-Verlag, 1999.
5. A. Biryukov and D. Wagner. Advanced Slide Attacks. Advances in Cryptology – Proc. Eurocrypt '00, LNCS 1807, pp. 589 – 606. Springer-Verlag, 2000.
6. B. den Boer and A. Bosselaers. Collisions for the Compression Function of MD5. Advances in Cryptology – Proc. Eurocrypt '93, LNCS 0765, pp. 293 – 304. Springer-Verlag, 1994.
7. I. Damgård. A Design Principle for Hash Functions. Advances in Cryptology – Proc. Crypto '89, LNCS 0435, pp. 399 – 416. Springer-Verlag, 1990.
8. H. Dobbertin. Cryptanalysis of MD5 Compress. Presented at the Eurocrypt '96 rump session, May 14, 1996.
9. P. C. Gutmann. SFS Version 1.0 Documentation. Available from http://www.cs.auckland.ac.nz/~pgut001/sfs/
10. H. Handschuh, L. R. Knudsen, and D. Naccache. Analysis of SHA-1 in Encryption Mode. Proc. RSA Cryptographers' Track 2001, LNCS 2020, pp. 70 – 83. Springer-Verlag, 2001.
11. H. Handschuh and D. Naccache. SHACAL. Submission to the NESSIE project, 2000. Available from http://www.cryptonessie.org.
12. H. Handschuh and D. Naccache. SHACAL: A Family of Block Ciphers. Submission to the NESSIE project, 2002. Available from http://www.cryptonessie.org.
13. B. S. Kaliski and M. J. B. Robshaw. Fast Block Cipher Proposal. Proc. Fast Software Encryption 1993, LNCS 0809, pp. 33 – 40. Springer-Verlag, 1994.
14. R. Merkle. One Way Hash Functions and DES. Advances in Cryptology – Proc. Crypto '89, LNCS 0435, pp. 428 – 446. Springer-Verlag, 1990.
15. R. K. Nichols and P. C. Lekkas. Wireless Security – Models, Threats, and Solutions. McGraw-Hill, 2002.
16. B. Preneel, R. Govaerts, and J. Vandewalle. Hash Functions Based on Block Ciphers: A Synthetic Approach. Proc. Crypto '93, LNCS 0773, pp. 368 – 378. Springer-Verlag, 1993.
17. B. Preneel. Cryptographic Primitives for Information Authentication – State of the Art. State of the Art in Applied Cryptography, Course on Computer Security and Industrial Cryptography, Leuven, Belgium, June 1997, Revised Lectures. LNCS 1528, pp. 49 – 130. Springer-Verlag, 1998.
18. R. Rivest. The MD5 Message-Digest Algorithm. Network Working Group RFC 1321, 1992.
19. U.S. Department of Commerce. FIPS PUB 180-1: Secure Hash Standard. Federal Information Processing Standards Publication, April 1995.
20. U.S. Department of Commerce. FIPS PUB 180-2: Secure Hash Standard, Draft. Federal Information Processing Standards Publication, 2001.
21. D. Wagner. A Slide Attack on SHA-1. Unpublished manuscript and personal communication, June 4, 2001.
A A Slid Pair for SHA-1
The algorithm for finding slid pairs was implemented in the C programming language (588 lines). A test run required roughly 2 hours of CPU time on a 1 GHz Pentium III computer (GCC/Linux).

Triplet A.
(A, B, C, D, E) = 02AAD5C2 DC766713 19C66B2F 7CEAE5B1 CC08CC0B
W0...15 = 8DA3F8F6 BBA5050C 99D3C3DC BBA5050C 99D3C3DC E42BAFB3 37DF640F 1ABABEEA 8DA3F8F6 E42BAFB3 37DF640F B57DEBB5 5AA5AB1F 44ED8DA0 1B63271F EAE12A73
(A′, B′, C′, D′, E′) = FC56BE44 03A42CDA F68056F0 960F5286 32985CD9

Triplet B.
(A, B, C, D, E) = 4258DA7D 02AAD5C2 F71D99C4 19C66B2F 7CEAE5B1
W0...15 = BBA5050C 99D3C3DC BBA5050C 99D3C3DC E42BAFB3 37DF640F 1ABABEEA 8DA3F8F6 E42BAFB3 37DF640F B57DEBB5 5AA5AB1F 44ED8DA0 1B63271F EAE12A73 BA7C9CF9
(A′, B′, C′, D′, E′) = 58BB28F0 FC56BE44 C0E90B35 F68056F0 960F5286

It is easy to see that Triplet B has been slid "right" by one position compared to Triplet A. The message block has been slid "left" correspondingly. The output words (A′, B′, C′, D′, E′) include the final addition of the chaining variable. If this final "chaining" is removed, this is also a slid pair for SHACAL-1.
Analysis of Involutional Ciphers: Khazad and Anubis Alex Biryukov Katholieke Universiteit Leuven, Dept. ESAT/COSIC, Leuven, Belgium [email protected]
Abstract. In this paper we study structural properties of SPN ciphers in which both the S-boxes and the affine layers are involutions. We apply our observations to the recently designed Rijndael-like ciphers Khazad and Anubis, and show several interesting properties of these ciphers. We also show that 5-round Khazad has 2^64 weak keys under a "slide-with-a-twist" attack distinguisher. This is the first cryptanalytic result which is better than exhaustive search for 5-round Khazad. The analysis presented in this paper is generic and applies to a large class of ciphers built from involutional components.
1 Introduction
Ciphers constructed from a product of involutions are not new in cryptography. In fact one of the most popular constructions – the Feistel construction – is a product of two involutions: the XOR of the two halves of the block, and the swap of the two halves of the block. However, SPNs resulting from a product of involutions (i.e. in which both the non-linear S-box layer and the affine layer are involutions) have not been intensively studied. Recently two ciphers with this property were designed by Barreto and Rijmen and submitted to the European prestandardization project NESSIE [6]. These ciphers are the 64-bit 8-round SPN cipher Khazad [1] and the 128-bit 12-18-round cipher Anubis [2]. Both ciphers have a Rijndael-like structure [4]. Khazad uses an MDS diffusion layer which provides complete diffusion after one round (the branch number is 9), while Anubis has a slower, Rijndael-like diffusion. In both cases the linear transformations of the two ciphers are chosen to be involutions. The same is true for the S-boxes: Anubis and Khazad share the same 8x8-bit involutional S-box, which is constructed from three layers of smaller 4x4-bit involutional S-boxes. The motivation behind such extensive use of involutional components is twofold: efficient implementation, and equal security of encryption and decryption. The only difference between encryption and decryption of these ciphers is in the inverse key schedule.

In this paper we study the structure of a generic involutional SPN on the example of Khazad. We show several interesting properties of Khazad and Anubis which follow from their elegant involutional structure. The analysis that we provide holds for arbitrary S-boxes or affine mixing layers, assuming only that these are involutions and that the S-boxes don't have fixed points (which is true both for Khazad and Anubis, since it was one of the design criteria). Whether these properties can be exploited in attacks on round-reduced or full Khazad is a matter of further research. Finally, we use the involutional structure of Khazad in order to mount a slide-with-a-twist attack [3] on 5 rounds of this cipher, which works for 2^64 out of the 2^128 keys. This attack might be of independent interest, since it may be applied to any cipher with the structure P ◦ F ◦ Q, where F is an arbitrary involution (for all the keys or for some of the keys), and P, Q are arbitrary keyed bijections.

The work described in this paper has been supported in part by the Commission of the European Communities through the IST Programme under Contract IST-1999-12324 and by the Concerted Research Action (GOA) Mefisto.

T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 45–53, 2003. © International Association for Cryptologic Research 2003
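The Feistel observation can be made concrete in a few lines; the function F, the half sizes, and the constants below are arbitrary choices of ours:

```python
# The Feistel round as a product of two involutions:
#   t(L, R) = (L ^ F(R), R)  is an involution for any F, and
#   s(L, R) = (R, L)         is an involution; one Feistel round is s after t.

def F(x):                      # an arbitrary round function on a half-block
    return (x * 31 + 7) & 0xFF

def t(state):                  # XOR half: applying it twice cancels F(R)
    L, R = state
    return (L ^ F(R), R)

def s(state):                  # swap half: obviously its own inverse
    L, R = state
    return (R, L)

x = (0x12, 0x34)
assert t(t(x)) == x and s(s(x)) == x       # both pieces are involutions
feistel_round = lambda st: s(t(st))        # their product is one round
assert feistel_round(x) == (0x34, 0x12 ^ F(0x34))
```

Neither piece is secure on its own; it is the product of the two involutions, iterated with different round keys, that builds a strong permutation.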
2 Previous Results for Khazad and Anubis
The best attack so far on Khazad is the very efficient square attack on 3 rounds (2^9 chosen plaintexts and 2^16 S-box lookups). Extended by 64-bit subkey guessing, one gets an attack on 4-round Khazad (2^80 S-box lookups). The Gilbert-Minier collision attack [5], which worked better than the square attack for Rijndael, will not work for Khazad since it would require full 64-bit block collisions, which may happen only for equal inputs (in Rijndael one could produce partial 4-byte collisions, due to slower mixing). The designers claim that truncated differential attacks can't be mounted for 4 or more rounds. No weak keys are known. Many attacks that were devised against Rijndael can be applied to Anubis with similar results. The best attack is the Gilbert-Minier attack, which breaks 7-round Anubis with 2^32 chosen plaintexts and 2^140 complexity of analysis. The square attack against 7 rounds has data complexity 2^128 − 2^119 chosen plaintexts and analysis complexity 2^120 steps. Anubis has more rounds than Rijndael in order to increase its security margin.
3 Some Properties of Khazad
As mentioned before, we will study the involutional properties on the example of Khazad; however, we will not use any specific features of its S-boxes or mixing layers, and thus all the following discussion will apply to a generic involutional SPN cipher (with involutional S-boxes without fixed points), and in particular will be applicable to the Anubis block cipher. We refer the reader to [1, 2] for detailed descriptions of Khazad and Anubis. Throughout this paper we will use the following notation: S-box layers will be denoted by S, linear diffusion layers by M (multiplication by an MDS matrix in Khazad) and key addition by k (sometimes with a subscript to enumerate different round subkeys). Since the linear layer and the key addition layer commute we can write:

M(x) ⊕ k = M(x) ⊕ M(k′) = M(x ⊕ k′),

where k′ is a version of the key k transformed by the linear layer M. This gives us an equivalent representation of Khazad in which the unknown subkeys are grouped around the odd S-box layers and the M operations are grouped around the even S-box layers. Thus, for the 5-round Khazad, which initially can be presented like this (from left to right):

k0SM k1SM k2SM k3SM k4Sk5,

we will arrive at the following structure:

k0Sk1[MSM]k2Sk3[MSM]k4Sk5.

In this equivalent representation the odd round subkeys are the same as in the original scheme and the even-round subkeys must be transformed by the linear diffusion layer: k′ = M^−1(k) = M(k) (recall that M is an involution). This new representation has several interesting properties.

Definition 1. A cycle type of a permutation A is a multiset containing the cycle lengths of the permutation together with their multiplicities – the number of times each cycle length occurs in the permutation. We will denote the cycle type of A by C(A).

For example, the permutation (1, 0, 3, 2, 5, 6, 7, 4), which can be written as a collection of three cycles (1, 0), (3, 2) and (5, 6, 7, 4), has the cycle type (2, 2), (4, 1) (i.e. two cycles of length two, and one cycle of length 4).

First of all, the MSM structure is a fixed permutation that covers 1.5 rounds of the cipher and is known to the attacker. Moreover, since M is an involution we have MSM = MSM^−1 = A, which means that A is a permutation isomorphic to S, and since S in Khazad is an involution, so is the permutation A (i.e. C(A) = C(MSM) = C(S))¹. In the following subsections we will study the properties of the kSk and MSM permutations.
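Both the commuting trick and the cycle-type definition are easy to check numerically. The sketch below uses byte reversal of a 64-bit word as a stand-in involutory linear layer (it is GF(2)-linear and an involution, but of course not Khazad's MDS matrix):

```python
# Stand-in linear involution M (byte reversal of a 64-bit word), used to
# illustrate M(x) ^ k == M(x ^ k') with k' = M(k), plus the cycle type
# of the example permutation (1, 0, 3, 2, 5, 6, 7, 4).

from collections import Counter

def M(x):
    # Byte reversal: a bit permutation, hence GF(2)-linear; an involution.
    return int.from_bytes(x.to_bytes(8, "big"), "little")

x, k = 0x0123456789ABCDEF, 0xDEADBEEFCAFEF00D
assert M(M(x)) == x                      # involution
assert M(x) ^ k == M(x ^ M(k))           # key addition commutes with M

def cycle_type(perm):
    seen, lengths = set(), Counter()
    for start in range(len(perm)):
        if start in seen:
            continue
        n, j = 0, start
        while j not in seen:
            seen.add(j); j = perm[j]; n += 1
        lengths[n] += 1
    return lengths

# Two cycles of length 2, one cycle of length 4, as in the example above.
assert cycle_type([1, 0, 3, 2, 5, 6, 7, 4]) == Counter({2: 2, 4: 1})
```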
3.1 Properties of the k1Sk2 Structure
In this subsection we study the function k1Sk2(x) = S(k1 ⊕ x) ⊕ k2, for arbitrary 64-bit keys k1, k2 and a layer S of eight S-boxes.

Theorem 1. The cycle structure of k1Sk2 depends only on k1 ⊕ k2. Moreover, each cycle length appears in the permutation k1Sk2 an even number of times.

Proof. Let us denote the permutation caused by XOR with k1 ⊕ k2 by k1k2. Then we can write k1Sk2(x) = S(k1 ⊕ x) ⊕ k2 both as

(S(k1 ⊕ x) ⊕ k1) ⊕ (k1 ⊕ k2), i.e. k1Sk2 = k1k2 ◦ k1Sk1,

and as

S((k1 ⊕ k2) ⊕ k2 ⊕ x) ⊕ k2, i.e. k1Sk2 = k2Sk2 ◦ k1k2.
¹ The tweaked version of Khazad has a very structured S-box; thus A is isomorphic to the [QP] structure inside the S-box. This doesn't add information in terms of the cycle type of A, but may be helpful if we find interesting properties of the isomorphism itself.
Both k1Sk1 and k2Sk2 are involutions. Since k1Sk2 is a product of two involutions without fixed points (provided S has no fixed points and k1 ≠ k2), it consists of cycles of the same length in even numbers (this fact was central to the cryptanalysis of the Enigma cipher [7]).

The k1Sk2 layer is a parallel application of eight 8-bit permutations, each satisfying the theorem above. Let us denote the number of cycles of the i-th permutation by 2ni; then we can state the following corollary:

Corollary 1. The permutation k1Sk2 consists of at least 2^8 ∏i ni cycles. The largest cycle size is smaller than 2^56 elements. Cycle lengths are 128-smooth numbers.

Proof. Writing the initial point as an 8-component vector x = (x1, x2, . . . , x8), the cycle produced by iterations of k1Sk2 has a length which is the LCM of the lengths of the individual component cycles starting at the xi. Since the maximal cycle length for each component is 128, the corollary follows (the upper bound is reached if all cycle lengths are relatively prime)².

Moreover, let us pick two points x = (x1, x2, . . . , x8) and y = (y1, y2, . . . , y8) at random. The GCD of the cycle sizes defined by the two initial points x and y has good chances to be quite large (which is unlikely for a random permutation). The probability that the components xi and yi will belong to the same cycle or to two different cycles of the same size is more than 2li/256, where li is the size of the cycle starting at xi. By collecting cycle lengths and studying their factors we obtain initial information about k1k2, since each k1k2 defines (though not uniquely) its cycle pattern.
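Theorem 1 can be demonstrated on a single 8-bit S-box; the sketch below builds a random fixed-point-free involution S (a stand-in, not the Khazad S-box) and checks both claims:

```python
# Demonstration of Theorem 1 on one 8-bit S-box: for a fixed-point-free
# involution S and k1 != k2, every cycle length of x -> S(x ^ k1) ^ k2
# occurs an even number of times, and the cycle type depends only on k1 ^ k2.

import random
from collections import Counter

rng = random.Random(2887)
pts = list(range(256))
rng.shuffle(pts)
S = [0] * 256
for a, b in zip(pts[0::2], pts[1::2]):   # pair points up: an involution
    S[a], S[b] = b, a                    # with no fixed points

def cycle_type(perm):
    seen, lengths = set(), Counter()
    for start in range(len(perm)):
        if start in seen:
            continue
        n, j = 0, start
        while j not in seen:
            seen.add(j); j = perm[j]; n += 1
        lengths[n] += 1
    return lengths

def k1sk2(k1, k2):
    return [S[x ^ k1] ^ k2 for x in range(256)]

ct = cycle_type(k1sk2(0x3A, 0xC5))
assert all(mult % 2 == 0 for mult in ct.values())   # lengths come in pairs
assert ct == cycle_type(k1sk2(0x00, 0x3A ^ 0xC5))   # depends only on k1 ^ k2
```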
3.2 Properties of the [MSM]k1Sk2[MSM] Structure
Now let us look at the following piece of Khazad's structure:

[MSM]k1Sk2[MSM] = Ak1Sk2A = B.

This piece covers 3.5 rounds of Khazad. Since A is an involution, k1Sk2 and B are isomorphic permutations. Thus B has all the nice cycle structure described in Corollary 1 in the previous subsection.

Claim 1. For a randomly chosen pair of keys k1, k2 the structure Ak1Sk2A has no fixed points³ with probability larger than 1 − 2^−8. However, if it has at least a single fixed point, then at once it must have more than 2^8 fixed points⁴.

Proof. Due to the isomorphism of Ak1Sk2A and k1Sk2 under an involution A, there is a 1-1 correspondence between the fixed points of the two functions. Similarly
² If a point belongs to cycles of lengths (l1, . . . , l8), then in the "product" permutation it belongs to a cycle of length lcm(l1, . . . , l8), and (∏ li)/lcm(l1, . . . , l8) different cycles of the same length are generated.
³ A random permutation has no fixed points with probability ≈ 1/e.
⁴ For a random permutation the average number of fixed points is e^−1 Σ_{i=1}^{N} 1/(i−1)! ≈ 1.
it is enough to study fixed points of the function S(x) ⊕ k′, where k′ = k1 ⊕ k2. The question is: how many solutions are there to the equation S(x) ⊕ x = k′? Since S is an involution, running through all possible x there are at most 128 different solutions for the difference of the keys (for the actual S-box there are 102 different solutions, some suggested by 6 different inputs). Thus, if k′ is picked at random, the chance of being in a "fixed-point"-permitting combination is (102/256)^8 ≈ 2^−10.6. However, if the key difference is in a proper combination, each S-box has at least two inputs that will preserve the fixed point. Thus the total number of fixed points will be more than 2^8.
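The pairing argument is easy to see in code: for an involution S, x and S(x) suggest the same difference S(x) ⊕ x, so the admissible differences number at most 128 and solutions come in pairs (again with a random stand-in involution, not the actual S-box):

```python
# For an involution S, S(x) ^ x == S(S(x)) ^ S(x), so x and S(x) suggest
# the same key difference k'.  Hence at most 128 of the 256 differences
# admit a fixed point of S(x) ^ k', and each admitted difference has an
# even number (>= 2) of solutions per S-box.

import random
from collections import Counter

rng = random.Random(45)                 # any seed; S is a random involution
pts = list(range(256))
rng.shuffle(pts)
S = [0] * 256
for a, b in zip(pts[0::2], pts[1::2]):  # fixed-point-free involution
    S[a], S[b] = b, a

diffs = Counter(S[x] ^ x for x in range(256))
assert len(diffs) <= 128                        # at most 128 admissible k'
assert all(c % 2 == 0 for c in diffs.values())  # solutions come in pairs
assert diffs.get(0, 0) == 0                     # S itself has no fixed point
```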
3.3 Properties of the kSk[MSM]kSk[MSM]kSk Structure
The structure KH5 = kSk[MSM]kSk[MSM]kSk = k0Sk1 A k2Sk3 A k4Sk5 is actually the 5-round Khazad, so any non-randomness in it would be a very interesting property. For example, if k1 = k4 (a 64-bit restriction on the key schedule), then writing k5 = k0 ⊕ (k0 ⊕ k5) we obtain a permutation isomorphic to just the intermediate k2Sk3, further permuted by a XOR with (k0 ⊕ k5). Another interesting property of 5-round Khazad is the following: suppose that the attacker guesses k′ = k0 ⊕ k5 and iterates the 5-round encryption function together with a XOR of the key difference after each iteration: (KH5(x) ⊕ k′)^n. It is easy to see that, due to the cancellations that happen, this is almost equivalent to iterating a 3-round Khazad. If we denote 3-round Khazad by KH3, we may write

(KH5(x) ⊕ k′)^n = k0SM (KH3)^n M Sk0.

For example, if one can detect any peculiarities in the cycle structure of 3-round Khazad in less than 2^64 steps, this property will provide a distinguishing attack on 5-round Khazad faster than exhaustive key search.
4 Sliding Encryption against Decryption for 5-Round Khazad
In this section we will apply an advanced sliding technique called slide-with-a-twist [3] to 5-round Khazad. The idea behind this attack method is to slide encryption against decryption in order to gain an additional degree of self-similarity due to increased symmetry. Consider 5-round Khazad written in the equivalent representation and aligned as shown below:

P1 = k0Sk′1[Ak2Sk′3A]k4Sk5 = C1
C2 = k5Sk4[Ak′3Sk2A]k′1Sk0 = P2.

Here, as in the previous sections of this paper, we have k′1 = M(k1) and k′3 = M(k3), where k1, k3 are the original subkeys of Khazad. The other subkeys k0, k2, k4, k5 are not changed.
If k2 = k′3, then the piece in the brackets [Ak2Sk′3A] is an involution and sliding with a twist applies. We thus can write two very simple equations:

S(P1 ⊕ k0) ⊕ k′1 = S(C2 ⊕ k5) ⊕ k4,
S(P2 ⊕ k0) ⊕ k′1 = S(C1 ⊕ k5) ⊕ k4.

Although there are many unknowns in these equations (k0, k′1, k4, k5), there is no diffusion. If we encrypt a pool of texts Pi, i = 0, . . . , 2^32 − 1 with 32 bits fixed, by setting the inputs of four out of eight S-boxes to an arbitrary constant, a proper slid pair will have 32 bits in common between the plaintexts and, as a consequence of the slid equations, it will have 32 bits in the same positions in common between the ciphertexts. Thus instead of checking all the 2^63 possible pairs we can check only the 2^31 pairs which satisfy the 32-bit filtration condition. These pairs can be easily identified in a pool of ciphertexts sorted by the positions corresponding to the fixed input S-boxes. We will call the four S-boxes with a fixed input "non-active" S-boxes and the four S-boxes with varying input "active" S-boxes. There are several possible ways to exploit the set of remaining non-filtered pairs, which contains at least one slid pair with non-negligible probability. One method would be to look at the possibilities for the 4 · 8 · 3 = 96 key bits of the keys k0, k′1 ⊕ k4, k5 suggested by the slid pairs. Notice that using the slid pair we can't split the keys k′1 and k4. Since each equation provides us with an 8-bit test condition per S-box, the total filtration power of the two equations with four active S-boxes is 8 · 2 · 4 = 64 bits. That means that each pair will suggest about 2^32 possible variants for the 96 key bits. For each analyzed pair all these key variants can be written in a compact cross-product form and stored using 4 · 2^8 = 2^10 bytes of memory. Using lookup tables it is also possible to perform this key enumeration efficiently in about 2^10 steps.
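The twist symmetry that the attack exploits can be illustrated on a toy cipher of the form Q ◦ F ◦ P (apply P, then an involution F, then Q); all primitives below are our own miniature stand-ins, not Khazad:

```python
# Toy slide-with-a-twist demo: E(x) = Q(F(P(x))) with F an involution.
# If a pair satisfies Qinv(C2) = P(P1), then it slides:
# P(P2) = Qinv(C1), where C1 = E(P1) and P2 = Einv(C2).

M32 = 0xFFFFFFFF

def P(x):    return (x + 0x12345678) & M32          # toy keyed bijection
def Pinv(y): return (y - 0x12345678) & M32
def Q(x):    return x ^ 0xA5A5A5A5                  # toy keyed bijection
Qinv = Q                                            # XOR is self-inverse
def F(x):    return (0xCAFEBABE - x) & M32          # involution: F(F(x)) = x

def enc(x):  return Q(F(P(x)))
def dec(y):  return Pinv(F(Qinv(y)))

P1 = 0x00C0FFEE
C1 = enc(P1)
C2 = Q(P(P1))              # chosen so that Qinv(C2) = P(P1): a slid pair
P2 = dec(C2)
assert P(P2) == Qinv(C1)   # the twisted slid relation holds
```

Because F is its own inverse, decryption has the same outer shape as encryption with P and Q swapped, which is exactly what the two slid equations above express for Khazad.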
Thus each pool suggests 2^63 choices for the 96 key bits, and these choices can be stored compactly in 2^41 bytes of memory. The idea would be to request several pools in order to be sure that the correct key is listed several times, and then try to find the frequently suggested correct variant of the key among the masses of incorrect guesses. While this approach might be marginally faster than exhaustive search, we would prefer to find a better technique.

4.1 Generating More Slid Pairs
Another method is based on the observation that, given a slid pair, it is relatively easy to produce many more slid pairs, due to the absence of diffusion in the slid equations. Indeed, if one makes a small change to one of the S-boxes in the plaintext P1 and another change in the corresponding S-box of the ciphertext C2, then with probability higher than 2^−8 the pair will remain a slid pair (this idea is due to Gustaf Dellkrantz). Moreover, we apply the "variation" to one of the four S-boxes that were active in the pool-generation step. Since the plaintexts already run through all possible values for these S-boxes, we only have to apply a single change on the ciphertext side. We then decrypt the new ciphertext into a new plaintext. If the original pair
was a properly slid pair, we are guaranteed that a new slid pair will be formed by the new query and one of the pairs from the old pool. The attack proceeds as follows (suppose we vary S-box i):

1. For each probable slid pair, find about 2^8 choices for the 24 bits of the three key bytes k0, k1 ⊕ k4, k5 at the i-th S-box location. This step can be done in about 2^8 lookups, given a 32-bit lookup table and the byte values of plaintext and ciphertext from the i-th byte location of the analyzed pair.
2. Change the i-th byte of the ciphertext C2 to an arbitrary value, to get C2′. Decrypt C2′ to get the new plaintext P2′.
3. For each 24-bit key guess from step 1, pick the corresponding P1, C1 pair from the pool. This is done via the first slid equation (and can be precomputed and stored in a table for each key candidate, if the variation value is fixed).
4. Check that the pair is a properly slid pair, i.e. that the second equation holds as well. This is an 8-bit filtering condition for the wrong key guesses.
5. After steps 1–4, on average one candidate for the 24 bits of k0, k1 ⊕ k4, k5 at the i-th location survives. We can thus apply a second variation at the same i-th location to check this single key candidate. The change in the ciphertext byte and the candidate key define, on average, a single value of the plaintext byte via the first slid equation. This plaintext byte defines which plaintext-ciphertext pair we need to pick from the pool. The second slid equation thus provides an 8-bit filtration for the wrong pairs. After this step only about 2^31/2^8 = 2^23 pairs remain. For these we can repeat step 5 again, until only the proper slid pairs remain.
6. Given that only a few wrong pairs (if any) may survive steps 1–5, we apply the variation to the other S-box locations and recover the 96 bits of the keys k0, k1 ⊕ k4, k5 corresponding to the four active S-boxes.
To summarize: we need two pools of 2^32 texts each (in order to increase the probability of finding a slid pair), and for each pool we request an additional 2 · 2^31 adaptively chosen ciphertext queries. We could then repeat this attack with another set of pools, now fixing another 32 bits of the plaintexts to a constant, to completely recover the subkeys k0, k1 ⊕ k4, k5. However, this approach would double the amount of data for the attack. Instead, we use the already obtained slid pairs and apply "variations" to the S-box locations that were non-active during the first phase of the attack. This way we completely recover the keys in just a few more adaptively chosen ciphertext/chosen plaintext queries. Using the two outer subkeys k0 and k5, we partially decrypt one round at the top and one round at the bottom. We are left with 3-round Khazad, which can now be attacked using auxiliary techniques, for example the Square attack, with a complexity of 2^9 additional chosen plaintexts and 2^16 S-box lookups (we may also use the assumption that k2 = k3, which is the condition of the weak-key class). The total data complexity of this attack is 2^34 blocks, and the analysis complexity is 2^40 table lookups and O(2^32) memory. The attack works for one in 2^64 keys.
Alex Biryukov

4.2 A Generalization for an Arbitrary Involutional Cipher
This attack works whenever a cipher can be written as E_k = Q ∘ F ∘ P, where F is an involution for at least some of the keys, and P, Q are arbitrary keyed bijections. Then one can slide encryptions against decryptions with a twist:

E_k(x1) = Q ∘ F ∘ P(x1) = y1,  D_k(y2) = P^{−1} ∘ F ∘ Q^{−1}(y2) = x2.

This provides two slid equations: P(x1) = Q^{−1}(y2), P(x2) = Q^{−1}(y1). If these equations are easy to solve, the attack will work.
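As a sanity check of this generalization, the following toy sketch (an assumed model: a random involution F and random permutations P, Q on an 8-element domain, with the composition convention "apply P first, then F, then Q" fixed explicitly in the code; none of this is a real cipher) constructs an encryption E and its inverse D and verifies both slid equations for every pair produced by the twist.

```python
import random

random.seed(7)
N = 8  # tiny domain {0, ..., 7}

def rand_perm():
    t = list(range(N))
    random.shuffle(t)
    return t

def rand_involution():
    # Pair up elements at random; each pair is swapped, so F∘F = identity.
    elems = list(range(N))
    random.shuffle(elems)
    f = [0] * N
    for i in range(0, N, 2):
        a, b = elems[i], elems[i + 1]
        f[a], f[b] = b, a
    return f

def inverse(perm):
    inv = [0] * N
    for i, v in enumerate(perm):
        inv[v] = i
    return inv

P, Q, F = rand_perm(), rand_perm(), rand_involution()
Pinv, Qinv = inverse(P), inverse(Q)

E = lambda x: Q[F[P[x]]]        # encryption: apply P, then F, then Q
D = lambda y: Pinv[F[Qinv[y]]]  # decryption, using the fact F = F^{-1}

# Slide E against D with a twist: choose y2 so that P(x1) = Q^{-1}(y2).
for x1 in range(N):
    y1 = E(x1)
    y2 = Q[P[x1]]               # forces the first slid equation
    x2 = D(y2)
    assert P[x1] == Qinv[y2]    # first slid equation
    assert P[x2] == Qinv[y1]    # second slid equation follows for free
print("slid equations verified for all", N, "choices of x1")
```
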
5 Properties of the Key Schedule
Khazad's key schedule iterates a 128-bit Feistel scheme whose internal F-function is a round of Khazad, with constants (taken from the S-box) used as the 64-bit key. The user-specified 128-bit key is used as the plaintext, and the intermediate 64-bit values after each round become the actual subkeys of Khazad. Such a key schedule would possess nice "sliding" or symmetry properties, but the randomized constants spoil them. Still, the key schedule is a Feistel block cipher with a bijective F-function. Thus, given any pair of consecutive subkeys k_i, k_{i+1}, it is possible to reconstruct all the subkeys backward and forward (including the master key). Moreover, this can be done for k_i, k_{i+2}, and even for k_i, k_{i+3} in some cases.
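This reconstruction property is easy to demonstrate on a scaled-down model. The sketch below uses a hypothetical 8-bit mixing function and round constants (stand-ins for a Khazad round and its S-box constants, not the real ones) with a Feistel-type recurrence k[i+1] = F(k[i], c[i]) ⊕ k[i−1]; solving the recurrence for the XORed term walks the subkey sequence backward from any two consecutive subkeys.

```python
# Toy stand-in for the round function: a hypothetical bijective 8-bit S-box
# (x -> 167*x + 13 mod 256 is a permutation since 167 is odd).
SBOX = [(x * 167 + 13) % 256 for x in range(256)]

def F(k, c):
    return SBOX[k ^ c]

CONSTS = [SBOX[i] for i in range(10)]  # round constants taken from the S-box

def expand(k0, k1):
    """Forward key schedule: k[i+2] = F(k[i+1], c[i]) XOR k[i]."""
    ks = [k0, k1]
    for c in CONSTS:
        ks.append(F(ks[-1], c) ^ ks[-2])
    return ks

def reconstruct_backwards(ki, ki1, i):
    """Given consecutive subkeys (k[i], k[i+1]), recover (k[0], k[1])
    by solving the recurrence in reverse: k[j-1] = F(k[j], c[j-1]) ^ k[j+1]."""
    hi, lo = ki1, ki
    for j in range(i, 0, -1):
        hi, lo = lo, F(lo, CONSTS[j - 1]) ^ hi
    return lo, hi

ks = expand(0x3A, 0xC5)
assert reconstruct_backwards(ks[6], ks[7], 6) == (0x3A, 0xC5)
```

Note that the backward step never needs F to be inverted: the recurrence is solved for the term that enters by XOR, which is exactly why any two consecutive subkeys determine the whole sequence.
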
6 Conclusions
In this paper we have shown structural properties of SPN ciphers built from involutional components. Using these, we have shown interesting features of the recently designed ciphers Khazad and Anubis, which were submitted to the NESSIE European pre-standardization project. The main observation is that there might be a distinguisher for 5-round Khazad if 3-round Khazad has any peculiarities in its cycle structure. Using the equivalent representation of Khazad, we show a class of 2^64 weak keys out of the total of 2^128 keys. For a weak key, 5-round Khazad can be broken using 2^34 chosen plaintext/ciphertext queries and 2^40 table lookups for the analysis. Full-round Khazad is not threatened by this result.
Acknowledgments

The author would like to thank Christophe De Cannière, Gustaf Dellkrantz and the members of the NESSIE project team for fruitful discussions. We would also like to thank the anonymous referees, whose comments helped to improve this paper.
Analysis of Involutional Ciphers: Khazad and Anubis
References

1. P. Barreto, V. Rijmen, The Khazad Legacy-Level Block Cipher, Submission to the NESSIE Project.
2. P. Barreto, V. Rijmen, The Anubis Block Cipher, Submission to the NESSIE Project.
3. A. Biryukov, D. Wagner, Advanced Slide Attacks, Proceedings of Eurocrypt 2000, LNCS 1807, pp. 589–606, Springer-Verlag, 2000.
4. J. Daemen, V. Rijmen, The Design of Rijndael, Springer-Verlag, 2001.
5. H. Gilbert, M. Minier, A collision attack on seven rounds of Rijndael, Proceedings of the Third AES Conference, pp. 230–241, NIST, 2000.
6. NESSIE, New European Schemes for Signatures, Integrity, and Encryption, IST-1999-12324, http://www.cryptonessie.org
7. M. Rejewski, Mathematical Solution of the Enigma Cipher, Cryptologia, Vol. 6, No. 1, pp. 1–18, 1982.
On Plateaued Functions and Their Constructions

Claude Carlet¹ and Emmanuel Prouff²

¹ INRIA, projet CODES, BP 105, 78153 Le Chesnay Cedex, France; also member of GREYC-Caen and of the University of Paris 8. [email protected]
² INRIA, projet CODES, and University of Paris 11, Laboratoire de Recherche en Informatique, 15 rue Georges Clemenceau, 91405 Orsay Cedex, France. [email protected]
Abstract. We use the notion of covering sequence, introduced by C. Carlet and Y. Tarannikov, to give a simple characterization of bent functions. We extend it into a characterization of plateaued functions (that is, bent and three-valued functions). After recalling why the class of plateaued functions provides good candidates for use in cryptosystems, we study the known families of plateaued functions and their drawbacks. We show in particular that the class given as new by Zhang and Zheng is in fact a subclass of Maiorana-McFarland's class. We introduce a new class of plateaued functions and prove its good cryptographic properties.

Keywords: Boolean functions, Bent, Three-valued crosscorrelation, Nonlinearity, Resiliency, Stream Ciphers, Combinatorial Cryptography.
1 Introduction
In the design of cryptographic functions, one needs to consider various characteristics simultaneously (balancedness, high nonlinearity, high algebraic degree, good propagation characteristics, high-order correlation immunity, non-existence of non-zero linear structures, ...). The importance of each characteristic depends on the choice of the cryptosystem. Balancedness and nonlinearity are the most important criteria in all situations. By achieving optimum nonlinearity, bent functions resist linear attacks in the best possible way. But, not being balanced, they are unsuitable for direct cryptographic use. Moreover, they exist only in even dimensions. This led cryptographers to search for new classes of Boolean functions whose elements still have good nonlinearities and can be balanced (and moreover resilient) for both odd and even dimensions. The class of partially-bent functions was investigated first [9]. These functions are built by identifying in the space F_2^n two subspaces E and F whose direct sum equals F_2^n and by defining the functions as the sums of linear functions defined on E and of bent functions defined on F. But in spite of their potentially good properties (good nonlinearity, resiliency and propagation characteristics), partially-bent functions, when they are not bent, have by definition non-zero linear structures and so do not give full satisfaction. The class
of plateaued functions¹ is a natural extension of the notion of partially-bent function. It provides examples of good trade-offs between all the properties needed in a cryptosystem. For instance, it has been shown by Sarkar and Maitra [34] that the order of resiliency and the nonlinearity of Boolean functions are strongly bounded with respect to each other (this result was partly also obtained by Tarannikov [36] and Zheng and Zhang [39]); the best compromise between those two properties is achieved by plateaued functions only. Tarannikov gave examples of functions achieving this best possible compromise. These examples [33] and almost all the other existing examples were obtained through iterative constructions. The functions obtained this way often have cryptographic weaknesses such as linear structures (see [4]). The only known general class of non-iteratively defined plateaued functions is obtained through Maiorana-McFarland's construction (or its generalization by Carlet [12]). It contains functions which reach Sarkar et al.'s bound, but only for very high resiliency orders (cf. [12]). By extending the notion of covering sequence of balanced functions introduced by Carlet and Tarannikov [14], we give in this paper a characterization of bent functions and extend it to a characterization of plateaued functions. We recall some basic properties of plateaued functions and give a list of all the known constructions of such functions. For each of them, we recall the main drawbacks. In the last part of this paper, we introduce and study a new class of plateaued functions which do not have the weaknesses of the Maiorana-McFarland functions.
2 Preliminaries
We shall have to distinguish, throughout the paper, between the additions of integers in R, denoted by + and Σ, and the additions mod 2, denoted by ⊕ and ⨁. So all the sums computed in characteristic 0 will be denoted by Σ, and all the sums computed modulo 2 will be denoted by ⨁. For simplicity, and because there will be no ambiguity, we shall denote by + the addition of vectors of F_2^n (words). We first recall basic facts about Boolean functions. A Boolean function f in n variables is an F_2-valued function on the space F_2^n of n-tuples over F_2. We call support of f the set {x ∈ F_2^n / f(x) = 1} and denote it by Supp(f). Its size is by definition the weight of f and is denoted by W(f). A Boolean function f is balanced if W(f) = 2^{n−1}. Every Boolean function f on F_2^n admits a unique representation as a polynomial over F_2 in n binary variables, of the form:

f(x_1, ..., x_n) = ⨁_{I⊆{1,...,n}} a_I ∏_{i∈I} x_i.   (1)
¹ The term "plateaued" was proposed by Zhang and Zheng [40] and denotes the functions which either are bent or have a Walsh spectrum with the three values 0 and ±λ. Some of these functions had already been studied in sequence design [2, 16, 19–21] and in coding theory [31].
This representation is called the algebraic normal form (A.N.F.) of f. We call (algebraic) degree of f, and denote by deg f, the degree of its A.N.F., and we denote by R(d, n) the set of all Boolean functions on F_2^n whose degrees are upper bounded by d (the so-called Reed-Muller code of order d on F_2^n). To make the study of the properties of f easier, we classically introduce the "sign" function χ_f of f, defined as χ_f(x) = (−1)^{f(x)}. The Fourier transform χ̂_f of χ_f will be called the Walsh transform of f. By definition,

χ̂_f(b) = Σ_{x∈F_2^n} (−1)^{f(x)+x·b}.

It satisfies Parseval's relation:

Σ_{b∈F_2^n} χ̂_f(b)² = 2^{2n},   (2)

and the inverse Fourier formula,

(χ̂_f)^ = 2^n χ_f   (3)

(which is more generally valid for every real-valued function). We recall now the definition of the convolutional product between two numerical functions ϕ and ψ on F_2^n. It is denoted by ϕ ⊗ ψ and defined on F_2^n by:

(ϕ ⊗ ψ)(x) = Σ_{a∈F_2^n} ϕ(a) ψ(a + x).   (4)

A well-known fact is that the Fourier transform of ϕ ⊗ ψ equals the product of ϕ̂ and ψ̂, that is:

(ϕ ⊗ ψ)^ = ϕ̂ ψ̂.   (5)

A useful tool to study a Boolean function f is the notion of derivative. The derivative of f with respect to a vector a ∈ F_2^n is the function D_a f : x ↦ f(x) ⊕ f(x + a). The derivatives play an important role in cryptography, related to the differential attack [1]. They are also naturally involved in the definitions of the Strict Avalanche Criterion (SAC) and of the Propagation Criterion (PC) [32]. These criteria evaluate some kind of diffusion of the function. The Hamming distance between two Boolean functions f1 and f2 on F_2^n equals by definition the weight of f1 ⊕ f2. We call nonlinearity of f, and denote by N_f, the minimum distance between f and all affine functions. The nonlinearity of a function quantifies the level of confusion put in the system by the Boolean function. Cryptographic functions used in stream or block ciphers must have high nonlinearities to protect these systems against linear attacks (see [37, 27]). For every Boolean function f, the nonlinearity N_f and the Walsh transform χ̂_f satisfy the relation:

N_f = 2^{n−1} − (1/2) max_{b∈F_2^n} |χ̂_f(b)|.   (6)

Because of Parseval's relation (2), N_f is upper bounded by 2^{n−1} − 2^{n/2−1}. This bound is tight for every even n. The functions achieving it are called bent. We recall now the different known characterizations of bent functions:
Proposition 1. A Boolean function f on F_2^n is bent if and only if one of the two following statements is satisfied:
1. ∀b ∈ F_2^n, χ̂_f(b) = ±2^{n/2};
2. ∀a ∈ F_2^n \ {0}, W(D_a f) = 2^{n−1}.

Thus, derivatives also play a role with respect to the notion of confusion, since they permit a characterization of bent functions; recent results [5] show that they play more generally a role with respect to nonlinearity, even for non-bent functions.
Remark: the notion of bent function being invariant under addition of affine functions, it would be more natural to characterize bent functions by means of their second-order derivatives D_a D_b f rather than by means of their first-order derivatives D_a f (indeed, the affine functions are those Boolean functions whose second-order derivatives all vanish). This will be done in Proposition 3.
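Proposition 1 (and Parseval's relation) can be verified numerically on the standard example f(x) = x1x2 ⊕ x3x4. The brute-force sketch below (a naive enumeration, not a fast Walsh-Hadamard transform) computes the Walsh transform and checks both characterizations.

```python
from itertools import product

n = 4
V = list(product([0, 1], repeat=n))  # the space F_2^n

def f(x):
    # the classic bent function x1 x2 + x3 x4
    return (x[0] & x[1]) ^ (x[2] & x[3])

def dot(a, b):
    return sum(ai & bi for ai, bi in zip(a, b)) % 2

def walsh(g, b):
    # hat(chi_g)(b) = sum over x of (-1)^{g(x) + x.b}
    return sum((-1) ** (g(x) ^ dot(x, b)) for x in V)

spectrum = [walsh(f, b) for b in V]
assert sum(w * w for w in spectrum) == 2 ** (2 * n)    # Parseval (2)
assert all(abs(w) == 2 ** (n // 2) for w in spectrum)  # Proposition 1, item 1

def xor_vec(a, b):
    return tuple(ai ^ bi for ai, bi in zip(a, b))

for a in V:
    if any(a):
        w = sum(f(x) ^ f(xor_vec(x, a)) for x in V)    # weight of D_a f
        assert w == 2 ** (n - 1)                       # Proposition 1, item 2
print("f is bent: spectrum is ±4 and all derivatives are balanced")
```
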
3 A Characterization of Bent Functions through Their Second-Order Derivatives
In [14], Carlet and Tarannikov introduced the notion of covering sequence of a Boolean function.

Definition 1. A covering sequence of a function f : F_2^n → F_2 is any sequence of reals λ = (λ_a)_{a∈F_2^n} such that the real summation Σ_{a∈F_2^n} λ_a D_a f is equal to a constant function ρ. The value of ρ is called the level of this sequence. If ρ ≠ 0, then we say that the covering sequence is non-trivial.

The next proposition gives a complete characterization of the balancedness of Boolean functions by means of their covering sequences.

Proposition 2. [14] A Boolean function f on F_2^n is balanced if and only if it admits a non-trivial covering sequence. The same covering sequence – the constant sequence 1 – can be taken for all balanced functions. The level of this sequence with respect to any balanced function equals 2^{n−1}.

We denote the second-order derivatives of f by D_a D_b f; we have D_a D_b f(x) = f(x) ⊕ f(x + a) ⊕ f(x + b) ⊕ f(x + a + b). As seen in Proposition 1, all the derivatives D_a f, a ≠ 0, of a bent function f are balanced and so, according to Proposition 2, satisfy: ∀x ∈ F_2^n, Σ_{b∈F_2^n} D_b D_a f(x) = 2^{n−1}. Thus, all bent functions are such that: ∀x ∈ F_2^n, Σ_{a,b∈F_2^n} D_b D_a f(x) = 2^{2n−1} − 2^{n−1}, that is, ∀x ∈ F_2^n, Σ_{a,b∈F_2^n} (−1)^{D_b D_a f(x)} = 2^n, since (−1)^{D_b D_a f(x)} = 1 − 2 D_b D_a f(x). Let us now prove that this necessary condition is in fact necessary and sufficient.
Proposition 3. A Boolean function f defined on F_2^n is bent if and only if:

∀x ∈ F_2^n, Σ_{a,b∈F_2^n} (−1)^{D_a D_b f(x)} = 2^n.   (7)

Proof. Set θ = 2^n. A Boolean function f satisfies ∀x ∈ F_2^n, Σ_{a,b∈F_2^n} (−1)^{D_a D_b f(x)} = θ if and only if

∀x ∈ F_2^n, Σ_{a,b∈F_2^n} (−1)^{f(x+a)+f(x+b)+f(x+a+b)} = θ (−1)^{f(x)},

or equivalently (after the change of variables a ↦ x + a, b ↦ x + b):

∀x ∈ F_2^n, Σ_{a,b∈F_2^n} (−1)^{f(a)+f(b)+f(x+a+b)} = θ (−1)^{f(x)},

which can be rewritten using the convolutional product in the form χ_f ⊗ χ_f ⊗ χ_f = θ χ_f. According to the bijectivity of the Fourier transform and according to Relation (5), this is equivalent to:

∀u ∈ F_2^n, χ̂_f(u)³ = θ χ̂_f(u).

Thus, we have Σ_{a,b∈F_2^n} (−1)^{D_a D_b f(x)} = θ if and only if, for every u ∈ F_2^n, χ̂_f(u) equals ±√θ or 0. Since θ = 2^n, and according to Parseval's relation (2), the value 0 is then never achieved by χ̂_f.

Since we could characterize the Boolean functions satisfying Relation (7) as the bent functions, a natural idea is to try to characterize similarly the Boolean functions such that

∀x ∈ F_2^n, Σ_{a,b∈F_2^n} (−1)^{D_a D_b f(x)} = θ,   (8)

where θ is no longer necessarily equal to 2^n. We shall see that this relation characterizes the class of plateaued functions.
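Relation (7), and the relaxed Relation (8), can be checked by brute force on small examples. The sketch below (plain enumeration; the function x1x2 on F_2^4 is a hypothetical non-bent example chosen because it is quadratic, hence plateaued) verifies that the double sum is constant in x, equal to 2^n for the bent function and to a larger power of 2 otherwise.

```python
from itertools import product

n = 4
V = list(product([0, 1], repeat=n))  # F_2^4

def xor(a, b):
    return tuple(ai ^ bi for ai, bi in zip(a, b))

def second_order_sum(f, x):
    # sum over a, b of (-1)^{D_a D_b f(x)}
    s = 0
    for a in V:
        for b in V:
            d = f(x) ^ f(xor(x, a)) ^ f(xor(x, b)) ^ f(xor(x, xor(a, b)))
            s += (-1) ** d
    return s

bent = lambda x: (x[0] & x[1]) ^ (x[2] & x[3])
quad = lambda x: x[0] & x[1]  # plateaued (quadratic) but not bent

# Relation (7): the sum equals 2^n at every x exactly for bent functions.
assert all(second_order_sum(bent, x) == 2 ** n for x in V)
# Relation (8): for the non-bent quadratic, the sum is still constant in x.
theta = second_order_sum(quad, V[0])
assert all(second_order_sum(quad, x) == theta for x in V)
assert theta == 2 ** 6  # a power of 2 with even exponent >= n
print("theta for x1x2 is", theta)
```
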
4 Characterization of Plateaued Functions through Their Second-Order Derivatives

4.1 Plateaued Functions
A Boolean function f : F_2^n → F_2 is said to be plateaued if its Walsh transform χ̂_f takes only the three values 0 and ±λ, where λ is some positive integer. We shall call λ the amplitude of the plateaued function. Because of Parseval's relation, λ cannot be null and must be a power of 2, say λ = 2^r with r ≥ n/2.
Clearly, the nonlinearity N_f of a plateaued function of amplitude λ equals 2^{n−1} − λ/2. These functions have been studied by many researchers in sequence design, when studying the cross-correlation between m-sequences and their decimations by an integer d (cf. Annex A.2), and by Canteaut, Carlet, Charpin and Fontaine [5, 6]. In the standard model of stream ciphers, a Boolean function is used to combine the outputs of n linear feedback shift registers. To resist divide-and-conquer attacks, called correlation attacks, this function, called the combining function, has to be balanced and to stay balanced if any m of the inputs are fixed. This property, called correlation immunity, can be characterized by means of the Fourier transform:

Proposition 4 ([38]). A Boolean function f on F_2^n is m-th order correlation immune if and only if its Walsh transform χ̂_f satisfies χ̂_f(u) = 0 for every vector u in F_2^n such that 1 ≤ w_H(u) ≤ m, where w_H denotes the Hamming weight.

Balanced m-th order correlation-immune functions are called m-resilient. As proved in [34], [36] and [41], the resiliency order m of a Boolean function defined on F_2^n and its nonlinearity satisfy the relation N_f ≤ 2^{n−1} − 2^{m+1}. Thus, the resiliency order m is upper bounded by log_2(2^{n−1} − N_f) − 1. Moreover, Sarkar and Maitra have shown in [34] that the values of the Walsh transform of an n-variable, m-resilient (resp. m-th order correlation-immune) function are divisible by 2^{m+2} (resp. 2^{m+1}) if m ≤ n − 2. Thus, if an m-resilient function achieves nonlinearity 2^{n−1} − 2^{m+1}, then the function is plateaued. Indeed, according to Relation (6), the value of max_{b∈F_2^n} |χ̂_f(b)| then equals 2^{m+2}. By applying Relation (6) again, we see that the nonlinearity of plateaued functions with amplitude λ = 2^r being equal to 2^{n−1} − 2^{r−1}, their resiliency order is upper bounded by r − 2.

A function whose nonlinearity and resiliency order are respectively 2^{n−1} − 2^{r−1} and r − 2 is a good candidate for use in stream ciphers, since it gives the best possible tradeoff between resiliency and nonlinearity (and it then also has maximum possible algebraic degree, cf. [36]). Tarannikov and other authors exhibited functions with nonlinearity N_f = 2^{n−1} − 2^{r−1} and resiliency order achieving the bound r − 2.

4.2 The Characterization
The proof of Proposition 3 (except its last sentence) extends straightforwardly to any value of θ:

Theorem 1. A Boolean function f is plateaued on F_2^n if and only if there exists θ such that, for every x ∈ F_2^n, Σ_{a,b∈F_2^n} (−1)^{D_a D_b f(x)} = θ. If this condition is satisfied, then the amplitude of the plateaued function f equals √θ, and θ is a power of 2 whose exponent is even and greater than or equal to n.

Remark:
1. The fact that quadratic functions are plateaued is a direct consequence of Theorem 1, since their second-order derivatives are constant. And Theorem 1
gives more insight into the relationship between the nonlinearity of a quadratic function and the number of its nonzero second-order derivatives.
2. The second-order derivatives of Boolean functions of degree 3 being affine, Theorem 1 shows a relationship between the ability to construct plateaued functions of degree 3 and the ability to produce sets of affine hyperplanes which are multi-coverings of F_2^n.

In the next section, we present the different existing ways to construct plateaued functions. By identifying their drawbacks, we motivate the search for new classes, and exhibit one.
5 Known Constructions of Plateaued Functions
In this section, we investigate the known constructions of Boolean functions and we determine precisely which ones allow us to obtain plateaued functions (for completeness, we list in Appendix A.2 the plateaued functions constructed in sequence designs).

5.1 Maiorana-McFarland's Functions as Well as Others
We recall the primary constructions, in which one does not suppose the existence of previously defined functions to define new ones (see Appendix A.1 for secondary constructions). All affine functions are plateaued, but they have null nonlinearity. All quadratic functions are also plateaued, but they are improper for cryptographic use because of their low degree (see [25, 26]). The two main primary constructions of bent functions are given by Dillon [17] and McFarland [29] and lead to the classes called respectively PS_ap and Maiorana-McFarland. For any even integer n, a function in PS_ap takes the form g(x/y) (with x/y = 0 if x = 0 or y = 0), where x and y belong to F_{2^{n/2}}, the Galois field of order 2^{n/2}, and g is a balanced Boolean function on F_{2^{n/2}}. We have checked that one cannot derive from PS_ap a construction of plateaued functions by choosing g non-balanced instead of balanced.

Camion et al. generalized in [3] the class of Maiorana-McFarland's functions. We shall call M the generalized class, defined as follows:

Definition 2. Class M is the set of functions f_{φ,h} which can be written in the form:

f_{φ,h}(x, y) = x · φ(y) ⊕ h(y),   (9)

where r and s are any positive integers, n = r + s, φ is any function from F_2^s into F_2^r and h is any Boolean function on F_2^s.

Notice that for r = 1, we obtain all Boolean functions on F_2^n. Let f_{φ,h} be a function in M. For any pair (a, b) ∈ F_2^r × F_2^s, the value at (a, b) of the Walsh transform χ̂_{f_{φ,h}} of f_{φ,h} equals 2^r Σ_{y∈φ^{−1}(a)} (−1)^{b·y+h(y)}. Then f_{φ,h} is plateaued if and only if Σ_{y∈φ^{−1}(a)} (−1)^{b·y+h(y)} takes the three values 0 and ±λ when (a, b) ranges over F_2^r × F_2^s. The following proposition gives two sufficient conditions ensuring that a given function f_{φ,h} in M is plateaued.

Proposition 5. Let f_{φ,h} be a function defined on F_2^r × F_2^s and belonging to M. If φ is injective (resp. takes each value of Im(φ) exactly twice), then f_{φ,h} is plateaued of amplitude 2^r (resp. 2^{r+1}).

Proof. If φ is injective, then every pre-image by φ has cardinality 1 or 0. This implies that χ̂_{f_{φ,h}}(a, b) is null if φ^{−1}(a) = ∅ and equals 2^r (−1)^{b·φ^{−1}(a)+h∘φ^{−1}(a)} otherwise. We deduce that f_{φ,h} is plateaued of amplitude 2^r. If φ is two-to-one, i.e. takes each value of Im(φ) exactly twice, then φ^{−1}(a) has cardinality 2 or 0. Let a be an element of F_2^r such that #φ^{−1}(a) = 2, and denote by {y_1, y_2} the set φ^{−1}(a). Then, for any b ∈ F_2^s, 2^r Σ_{y∈φ^{−1}(a)} (−1)^{b·y+h(y)} equals 2^r [(−1)^{b·y_1+h(y_1)} + (−1)^{b·y_2+h(y_2)}], which is either 0 or ±2^{r+1}.

We shall denote by M_i the class of plateaued functions of amplitude 2^i obtained by applying Proposition 5 (i = r in the first case and i = r + 1 in the second). We study now the conditions under which we can achieve the best tradeoff between resiliency and nonlinearity.

Proposition 6. Let f_{φ,g} be a Maiorana-McFarland function on F_2^r × F_2^s. Let k denote the minimum Hamming weight of the elements of φ(F_2^s). If φ is injective, the resiliency order m of f_{φ,g} equals k − 1, and k is upper bounded by max{t ∈ N; Σ_{i=0}^{t} C(r, i) ≤ 2^r − 2^s} + 1, where C(r, i) denotes the binomial coefficient. If φ is two-to-one, then the resiliency order m of f_{φ,g} is either equal to k or to k − 1, and k is upper bounded by max{t ∈ N; Σ_{i=0}^{t} C(r, i) ≤ 2^r − 2^{s−1}} + 1.

A drawback of Maiorana-McFarland functions is that their restrictions obtained by keeping y constant in their input are affine. Affine functions being cryptographically weak, there is a risk that this property be used in attacks.
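The injective case of Proposition 5 is easy to instantiate numerically. The sketch below uses a hypothetical injective φ from F_2^2 into F_2^3 (r = 3, s = 2; this particular φ and h are arbitrary choices, not examples from the paper) and checks by brute force that the Walsh spectrum of f_{φ,h}(x, y) = x·φ(y) ⊕ h(y) takes only the values 0 and ±2^r.

```python
from itertools import product

r, s = 3, 2
X = list(product([0, 1], repeat=r))  # F_2^r
Y = list(product([0, 1], repeat=s))  # F_2^s

def dot(a, b):
    return sum(u & v for u, v in zip(a, b)) % 2

# Hypothetical injective phi: F_2^s -> F_2^r, and an arbitrary h.
phi = {y: (y[0], y[1], y[0] & y[1]) for y in Y}  # 4 distinct images
h = lambda y: y[0] & y[1]

def f(x, y):
    return dot(x, phi[y]) ^ h(y)

def walsh(a, b):
    return sum((-1) ** (f(x, y) ^ dot(x, a) ^ dot(y, b))
               for x in X for y in Y)

spectrum = [walsh(a, b) for a in X for b in Y]
# Plateaued of amplitude 2^r, with genuine zeros since Im(phi) != F_2^r.
assert set(map(abs, spectrum)) <= {0, 2 ** r}
assert 2 ** r in map(abs, spectrum)
print("spectrum values:", sorted(set(spectrum)))
```
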
Moreover, Maiorana-McFarland functions having high divisibilities of their Fourier and Walsh spectra, there is also a risk that this property be used in attacks, as it is used in [15] to attack block ciphers.

5.2 On Zhang and Zheng's Class of Plateaued Functions
In [40], Zhang and Zheng introduce a subclass of the class of plateaued functions whose elements are not partially bent. We show that these functions belong to class M and satisfy the hypothesis of Proposition 5. Before presenting the construction of Zhang and Zheng, we recall that the truth table of any Boolean function f on F_2^n is the binary word of length 2^n defined by (f(α_0), f(α_1), ..., f(α_{2^n−1})), where α_0 = (0, 0, ..., 0), ..., α_{2^n−1} = (1, 1, ..., 1). If ξ_i and ξ_j are two binary words of length 2^n, we denote by ξ_i ξ_j the word of length 2^{n+1} resulting from their concatenation. We recall now the proposition of Zheng and Zhang leading to their "new" class of plateaued functions.
Proposition 7. [40] Let t and k be two integers such that k < 2t < 2k, and let E ⊆ F_2^k be a subset of 2^t elements such that no non-null linear function on F_2^k is constant on E. For every element e_i of E, let ξ_i denote the truth table of the linear function x ↦ x · e_i on F_2^k. Then, the Boolean function f on F_2^{k+t} having ξ_0 ξ_1 ⋯ ξ_{2^t−1} for truth table is plateaued on F_2^{k+t} and its amplitude equals 2^k.

Viewed as binary vectors of length 2^{k+t}, the plateaued functions constructed in this way are concatenations of distinct linear functions. We deduce that the functions constructed in Proposition 7 belong to M and satisfy the first hypothesis of Proposition 5. Indeed, for any subset E ⊆ F_2^k of 2^t elements, one can define an injective function φ from F_2^t into F_2^k such that φ(F_2^t) = E: the function f associated to E in Proposition 7 can be rewritten on the product space F_2^k × F_2^t in the form f(x, y) = x · φ(y). Moreover, we notice that the condition k < 2t and the condition on E are not necessary to ensure that f is plateaued, since one only uses the fact that the cardinality of E is a power of 2 to rewrite f as an element of M_k.

5.3 A Recent Class
A construction generalizing construction M and avoiding the drawback that its functions are the concatenations of affine functions was proposed in [12]. We shall denote it by M . The functions it produces are concatenations of quadratic functions (i.e. functions of degree at most 2) instead of affine functions.

Definition 3. Let n and r be positive integers such that r < n. Denote the integer part of r/2 by t, and n − r by s. Let ψ be a mapping from F_2^s to F_2^t, and let ψ_1, ..., ψ_t be its coordinate functions. Let φ be a mapping from F_2^s to F_2^r, and let φ_1, ..., φ_r be its coordinate functions. Let g be a Boolean function on F_2^s. The function f_{ψ,φ,g} is defined on F_2^n = F_2^r × F_2^s as

f_{ψ,φ,g}(x, y) = ⨁_{i=1}^{t} x_{2i−1} x_{2i} ψ_i(y) ⊕ x · φ(y) ⊕ g(y);  x ∈ F_2^r, y ∈ F_2^s.

Maiorana-McFarland's functions correspond to the case where ψ is the null mapping. The following theorem is proved in [12].

Theorem 2. Let f_{ψ,φ,g} be defined as in Definition 3. Then for every a ∈ F_2^r and every b ∈ F_2^s we have

χ̂_{f_{ψ,φ,g}}(a, b) = Σ_{y∈E_a} 2^{r−w_H(ψ(y))} (−1)^{Σ_{i=1}^{t} (φ_{2i−1}(y)+a_{2i−1})(φ_{2i}(y)+a_{2i}) + g(y) + y·b},

where E_a is the superset of φ^{−1}(a) equal, if r is even, to

{y ∈ F_2^s / ∀i ≤ t, ψ_i(y) = 0 ⇒ (φ_{2i−1}(y) = a_{2i−1} and φ_{2i}(y) = a_{2i})},

and, if r is odd, to

{y ∈ F_2^s / φ_r(y) = a_r and ∀i ≤ t, ψ_i(y) = 0 ⇒ (φ_{2i−1}(y) = a_{2i−1} and φ_{2i}(y) = a_{2i})}.

We deduce straightforwardly:

Proposition 8. Let f_{ψ,φ,g} be defined as in Definition 3. If ψ has constant weight and if E_a has size 0 or 1 for every a (respectively 0 or 2 for every a), then f_{ψ,φ,g} is plateaued.

Thus, this construction easily allows us to obtain plateaued functions. A sufficient condition for f_{ψ,φ,g} being m-resilient is given in [12], as well as examples of functions achieving good tradeoffs between resiliency and nonlinearity. It seems difficult to give numerous such examples, all the more so if we add the condition that E_a has size 0 or 1 (or 0 or 2) for every a. So, searching for other constructions still seems necessary. Notice that the examples given in [12] of functions f_{ψ,φ,g} having nonlinearities of the form 2^{n−1} − 2^i − 2^j with i ≠ j (the m-resilient functions achieving the best possible nonlinearities, with m ≤ n/2 − 2, must have such nonlinearities, at least if n is even) cannot be plateaued.
6 A New Construction of Boolean Functions Leading to Two Classes of Plateaued Functions
Functions f_{ψ,φ,g} in the class M are built as the concatenations of quadratic functions chosen in such a way that we can efficiently compute their Walsh spectra. The aim of this section is to present another way of concatenating quadratic functions whose Walsh spectra can also be efficiently computed. As in the cases of the functions f_{φ,g} and f_{ψ,φ,g}, the definition of the new class leads to two sufficient conditions implying two constructions of plateaued functions. The quadratic functions we shall concatenate are those whose associated symplectic forms (cf. [35]) have rank at most 2:

Lemma 1. Let r be a positive integer and let f be any Boolean function on F_2^r of the form (u·x)(v·x) ⊕ w·x, where u, v and w are three vectors in F_2^r. Assume first that u and v are linearly independent (i.e. u ≠ 0, v ≠ 0 and u ≠ v). Then f is balanced if and only if w does not belong to the vector space <u, v> = {0, u, v, u+v} spanned by u and v. If w ∈ {0, u, v}, then Σ_{x∈F_2^r} (−1)^{f(x)} equals 2^{r−1}, and if w = u + v, then it equals −2^{r−1}. Assume now that u and v are linearly dependent. If ((u = 0 or v = 0) and w = 0) or if u = v = w, then Σ_{x∈F_2^r} (−1)^{f(x)} equals 2^r. Otherwise, Σ_{x∈F_2^r} (−1)^{f(x)} is null.

Proof. Assume first that u and v are linearly independent. We have

Σ_{x∈F_2^r} (−1)^{f(x)} = Σ_{x∈u^⊥} (−1)^{w·x} + Σ_{x∈(u^⊥)^c} (−1)^{(w+v)·x}.
64
Claude Carlet and Emmanuel Prouff
The sum Σ_{x∈u^⊥} (−1)^{w·x} is null if w ≠ 0 and w ≠ u. The sum Σ_{x∈(u^⊥)^c} (−1)^{(w+v)·x} is null if w ≠ v and w ≠ u+v. Hence if w does not belong to the vector space {0, u, v, u+v}, then Σ_{x∈F_2^r} (−1)^{f(x)} is null, and thus f is balanced. It is a simple matter to see that if w ∈ {0, u, v}, then Σ_{x∈F_2^r} (−1)^{f(x)} equals 2^{r−1} and that if w = u+v then it equals −2^{r−1}. Assume now that u and v are linearly dependent. Then f is linear. If u = 0 or v = 0, then f is null if and only if w = 0. If u = v ≠ 0, then f is null if and only if w equals u. If f is not null, then it is balanced.

Remark: If v ≠ 0 and if u, v are linearly dependent (i.e. u = 0 or u = v), then the sum Σ_{x∈F_2^r} (−1)^{f(x)} equals 2^r if and only if w = u. This observation will be useful for Proposition 9 and Corollary 2 below.

We now concatenate the functions studied in Lemma 1 and their complements:

Definition 4. We call Q the class of all Boolean functions f of the form

∀(x, y) ∈ F_2^r × F_2^s, f_{φ1,φ2,φ3,g}(x, y) = (x·φ1(y))(x·φ2(y)) ⊕ x·φ3(y) ⊕ g(y),

where φ1, φ2 and φ3 are three functions from F_2^s into F_2^r and g is any Boolean function on F_2^s.

Remark:
1. Class Q has a simpler definition than the class M recalled in Subsection 5.3, and we shall see that its Walsh spectrum is also simpler to compute. Notice that its size ((2^r)^{2^s})^3 × 2^{2^s} = 2^{(3r+1)2^s} is larger than the size (2^t)^{2^s} × (2^r)^{2^s} × 2^{2^s} = 2^{(t+r+1)2^s} of M (where t < 2r).
2. The bent functions constructed in [11] (cf. Proposition 4) belong to class Q.

The restrictions of f_{φ1,φ2,φ3,g} obtained by fixing y in its input being quadratic functions of the form (u·x)(v·x) ⊕ w·x, as a direct consequence of Lemma 1 we have:

Proposition 9. Let f_{φ1,φ2,φ3,g} be a function in Q such that φ2(y) ≠ 0 for every y ∈ F_2^s. Let E be the set of all y ∈ F_2^s such that the vectors φ1(y) and φ2(y) are linearly independent. Then, for every a ∈ F_2^r and every b ∈ F_2^s, χ̂_{f_{φ1,φ2,φ3,g}}(a, b) equals

2^{r−1} Σ_{y∈E; φ3(y)+a ∈ {0, φ1(y), φ2(y)}} (−1)^{g(y)+b·y} − 2^{r−1} Σ_{y∈E; φ3(y)+a = φ1(y)+φ2(y)} (−1)^{g(y)+b·y} + 2^r Σ_{y∈F_2^s\E; φ3(y)+a = φ1(y)} (−1)^{g(y)+b·y}.
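Proposition 9 can be checked by brute force for small parameters. The sketch below (our own Python; all names are illustrative) draws random φ1, φ2, φ3 and g with φ2(y) ≠ 0 and compares the three-sum formula against the defining double sum:

```python
import random

r, s = 3, 2
R, S = 1 << r, 1 << s

def dot(u, v):                        # inner product on F_2 vectors as integers
    return bin(u & v).count("1") & 1

random.seed(2003)
phi1 = [random.randrange(R) for _ in range(S)]
phi2 = [random.randrange(1, R) for _ in range(S)]    # phi2(y) != 0, as required
phi3 = [random.randrange(R) for _ in range(S)]
g = [random.randrange(2) for _ in range(S)]

def f(x, y):
    return (dot(x, phi1[y]) & dot(x, phi2[y])) ^ dot(x, phi3[y]) ^ g[y]

def walsh_direct(a, b):               # the defining double sum
    return sum((-1) ** (f(x, y) ^ dot(a, x) ^ dot(b, y))
               for x in range(R) for y in range(S))

def walsh_prop9(a, b):                # the three-sum formula of Proposition 9
    acc = 0
    for y in range(S):
        w = phi3[y] ^ a
        sign = (-1) ** (g[y] ^ dot(b, y))
        indep = phi1[y] != 0 and phi1[y] != phi2[y]  # phi2(y) is never 0 here
        if indep:
            if w in (0, phi1[y], phi2[y]):
                acc += 2 ** (r - 1) * sign
            elif w == phi1[y] ^ phi2[y]:
                acc -= 2 ** (r - 1) * sign
        elif w == phi1[y]:
            acc += 2 ** r * sign
    return acc

assert all(walsh_direct(a, b) == walsh_prop9(a, b)
           for a in range(R) for b in range(S))
```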
6.1 Two Constructions of Plateaued Functions in Q

We deduce from Proposition 9 two sufficient conditions ensuring that an element of Q is plateaued. These conditions are used to construct two new classes Q1 and Q2 of plateaued functions.
On Plateaued Functions and Their Constructions
65
Corollary 1. Let f_{φ1,φ2,φ3,g} be defined as in Definition 4. Assume that, for every y ∈ F_2^s, the vectors φ1(y) and φ2(y) are linearly independent. If the 2-dimensional flats φ3(y) + <φ1(y), φ2(y)> (where y ranges over F_2^s) are pairwise disjoint, then f_{φ1,φ2,φ3,g} is plateaued of amplitude 2^{r−1}.

Proof. According to the hypothesis, for every a ∈ F_2^r there exists at most one vector y ∈ F_2^s such that a is included in φ3(y) + <φ1(y), φ2(y)>. According to Proposition 9, χ̂_{f_{φ1,φ2,φ3,g}}(a, b) then equals 0 or ±2^{r−1} for every a ∈ F_2^r and every b ∈ F_2^s.

Thus, from every family of 2^s pairwise disjoint 2-dimensional flats of F_2^r, Corollary 1 allows us to derive a plateaued function on F_2^r × F_2^s belonging to Q: for every y ∈ F_2^s, we choose one of these flats and we choose two distinct nonzero elements in its direction. Denote these two elements by φ1(y) and φ2(y), and denote by φ3(y) any element of the same flat. The function f_{φ1,φ2,φ3,g} then satisfies the hypothesis of Corollary 1. We denote by Q1 the class of those plateaued Boolean functions in Q constructed in this way.

Similarly, if for every y ∈ F_2^s the vectors φ1(y) and φ2(y) are linearly independent and if, denoting by F_a the set {y ∈ F_2^s / a ∈ φ3(y) + <φ1(y), φ2(y)>}, the cardinality of F_a equals 0 or 2 for every a ∈ F_2^r, then f_{φ1,φ2,φ3,g} is plateaued of amplitude 2^r. This condition seems more difficult to satisfy than the condition obtained in Corollary 1. But since it leads to an amplitude of 2^r, we can relax the condition that the vectors φ1(y) and φ2(y) be linearly independent. We obtain:

Corollary 2. Let f_{φ1,φ2,φ3,g} be defined as in Definition 4. Assume that φ2(y) is nonzero for every y ∈ F_2^s. For every a ∈ F_2^r, let F_a be the set of all y ∈ F_2^s such that φ1(y) and φ2(y) are linearly independent and such that a belongs to the flat φ3(y) + <φ1(y), φ2(y)>. Let F′_a be the set of all y ∈ F_2^s such that φ1(y) and φ2(y) are linearly dependent and a = φ3(y) + φ1(y). If, for every a ∈ F_2^r, the number #F_a + 2#F′_a equals 0 or 2, then f_{φ1,φ2,φ3,g} is plateaued of amplitude 2^r.

Proof. According to the hypothesis, for every a ∈ F_2^r, either F′_a = ∅ and the size of F_a equals 0 or 2, or F_a = ∅ and F′_a has size at most 1. Thus, f_{φ1,φ2,φ3,g} is plateaued of amplitude 2^r, according to Proposition 9.

Corollary 2 leads to a second construction of plateaued functions belonging to Q. Let B and A be two disjoint subsets of F_2^r such that #B + 2#A = 2^s. Let F be a family of 2-dimensional flats included in A such that, for every a ∈ A, there exist two 2-dimensional flats of F which contain a. We derive a plateaued function in the following way:
1. To every a ∈ B, we associate injectively y ∈ F_2^s; we choose φ2(y) in F_2^r \ {0} and φ1(y) in {0, φ2(y)}, and we set φ3(y) = a + φ1(y).
2. For every a ∈ A, we consider the two flats F1 and F2 in F which contain a and we choose injectively two vectors y1 and y2 among those of F_2^s which have not been chosen at step 1. We choose two distinct nonzero elements in the direction of F1 (resp. F2). We denote these elements by φ1(y1) (resp. φ1(y2)) and φ2(y1) (resp. φ2(y2)). We denote by φ3(y1) (resp. φ3(y2)) any element of the same flat.
At the end of step 2, every y ∈ F_2^s has been assigned values φ1(y), φ2(y) and φ3(y), and the function f_{φ1,φ2,φ3,g} satisfies the hypothesis of Corollary 2. We call Q2 the class of those plateaued Boolean functions in Q constructed in this way.
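As a toy instance of the Q1 construction (the parameters r = 4, s = 2 are our own choice), the four cosets of <e1, e2> partition F_2^4 into pairwise disjoint 2-dimensional flats; the resulting function is plateaued of amplitude 2^{r−1} = 8, in accordance with Corollary 1:

```python
def dot(u, v):                      # inner product on F_2 vectors as integers
    return bin(u & v).count("1") & 1

r, s = 4, 2
n = r + s
e1, e2 = 0b0001, 0b0010
flats = [0b0000, 0b0100, 0b1000, 0b1100]  # coset reps of <e1,e2>: a partition
phi3 = flats                        # any element of the flat chosen for each y
g = [0, 1, 1, 0]                    # an arbitrary Boolean function on F_2^s

def f(x, y):                        # phi1 = e1 and phi2 = e2 for every y
    return (dot(x, e1) & dot(x, e2)) ^ dot(x, phi3[y]) ^ g[y]

W = [sum((-1) ** (f(x, y) ^ dot(a, x) ^ dot(b, y))
         for x in range(1 << r) for y in range(1 << s))
     for a in range(1 << r) for b in range(1 << s)]

assert {abs(w) for w in W} - {0} == {2 ** (r - 1)}   # plateaued of amplitude 8
nl = 2 ** (n - 1) - max(abs(w) for w in W) // 2
assert nl == 28                     # nonlinearity 2^{n-1} - 2^{r-2}
```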
7 Study of the New Classes Q1 and Q2

7.1 Algebraic Degree and Nonlinearity
In the next proposition, we denote by φ^j (1 ≤ j ≤ r) the j-th component function of any function φ from F_2^s into F_2^r.

Proposition 10. The algebraic degree of the function f_{φ1,φ2,φ3,g} of Q equals

max( max_{i,j≤r} deg(φ1^j φ2^i ⊕ φ1^i φ2^j) + 2 ; max_{i≤r} deg(φ3^i ⊕ φ1^i φ2^i) + 1 ; deg g )   (10)

and is upper bounded by s + 2. The degree of f_{φ1,φ2,φ3,g} equals s + 2 if and only if there exists a pair of distinct indices {i, j} such that deg(φ1^j φ2^i ⊕ φ1^i φ2^j) = s.

The nonlinearity of any Boolean function of Q1 (resp. Q2) is 2^{n−1} − 2^{r−2} (resp. 2^{n−1} − 2^{r−1}), according to Equality (6) and to Corollaries 1 and 2.

7.2 Resiliency
Proposition 11. Let f_{φ1,φ2,φ3,g} be a Boolean function on F_2^r × F_2^s belonging to Q1 and let D1 denote the set ∪_{y∈F_2^s} (φ3(y) + <φ1(y), φ2(y)>). Let k be the minimum weight of the elements of D1. Then we have k − 1 ≤ max{t ∈ N; Σ_{i=0}^t C(r, i) ≤ 2^r − 2^{s+2}} and f_{φ1,φ2,φ3,g} is exactly (k−1)-resilient.

Proof. According to Proposition 9 and to Corollary 1, χ̂_{f_{φ1,φ2,φ3,g}}(a, b) equals ±2^{r−1} if and only if a ∈ D1. If (a, b) has weight smaller than or equal to k − 1, then a has weight smaller than or equal to k − 1 and does not belong to D1: this implies χ̂_{f_{φ1,φ2,φ3,g}}(a, b) = 0. Thus, f_{φ1,φ2,φ3,g} is at least (k−1)-resilient. Moreover, suppose that f_{φ1,φ2,φ3,g} has a resiliency order m larger than or equal to k; then χ̂_{f_{φ1,φ2,φ3,g}}(a, 0) = 0 for any a ∈ F_2^r having weight k, which contradicts the hypothesis on k and D1. Since, by hypothesis, every word of weight smaller than or equal to k − 1 belongs to D1^c, and since D1 is the union of 2^s pairwise disjoint 2-dimensional flats, we deduce that 2^r − 2^{s+2} ≥ Σ_{i=0}^{k−1} C(r, i) and then k − 1 ≤ max{t ∈ N; Σ_{i=0}^t C(r, i) ≤ 2^r − 2^{s+2}}.

As recalled in Section 4, for every Boolean function f on F_2^n, the nonlinearity N_f and the resiliency order m satisfy the relation N_f ≤ 2^{n−1} − 2^{m+1}. When f is plateaued of amplitude 2^{r−1}, this relation implies that m is upper bounded by r − 3. The functions whose nonlinearity and resiliency order equal 2^{n−1} − 2^{r−2} and r − 3 respectively are good candidates to be used in stream ciphers and are
proved to be necessarily plateaued by Sarkar and Maitra [34]. For this reason, it would be interesting if Q1 could contain such functions. The aim of the following corollary is to give a necessary condition which must be satisfied by any element of Q1 having amplitude 2^{r−1} and resiliency order r − 3.

Corollary 3. If a Boolean function f_{φ1,φ2,φ3,g} on F_2^r × F_2^s belonging to Q1 achieves the maximum possible resiliency order r − 3, then s is upper bounded by log2(r^2 + r + 2) − 3, and f_{φ1,φ2,φ3,g} thus has degree upper bounded by log2(r^2 + r + 2) − 1 ≤ log2(n^2 + n + 2) − 1.

Proof. According to Proposition 11, if f_{φ1,φ2,φ3,g} is (r−3)-resilient then r − 3 is upper bounded by max{t ∈ N; Σ_{i=0}^t C(r, i) ≤ 2^r − 2^{s+2}}. We deduce that r satisfies Σ_{i=0}^{r−3} C(r, i) ≤ 2^r − 2^{s+2}. Since Σ_{i=0}^{r−3} C(r, i) equals 2^r − C(r, 0) − C(r, 1) − C(r, 2), we obtain 2^{s+2} ≤ 1 + r + r(r−1)/2, that is, s ≤ log2(r^2 + r + 2) − 3.

The Walsh spectra of the elements of Q2 can be more complex than the Walsh spectra of the elements of Q1. However, we shall see that their resiliency behaves approximately the same way with respect to their amplitude as that of the elements of Q1.

Proposition 12. Let f_{φ1,φ2,φ3,g} be a function in Q2. For every a ∈ F_2^r, let F_a and F′_a be the sets defined in Corollary 2. Let A denote the set {a ∈ F_2^r ; #F_a = 2} and let B denote the set {a ∈ F_2^r ; #F′_a = 1}. Let k and k′ denote the minimum weights of the elements of A ∪ B and of B respectively. Then f_{φ1,φ2,φ3,g} is m-resilient with k − 1 ≤ m ≤ min(k′ − 1, k).

Proof. Let D2 denote the set B ∪ A. According to Proposition 9 and to Corollary 2, if a is in D2^c, then χ̂_{f_{φ1,φ2,φ3,g}}(a, b) equals zero. Thus, if (a, b) has weight smaller than or equal to k − 1, then a has weight smaller than or equal to k − 1 and belongs to D2^c. This implies that χ̂_{f_{φ1,φ2,φ3,g}}(a, b) = 0: we deduce that f_{φ1,φ2,φ3,g} is at least (k−1)-resilient. We notice that, by definition, k and k′ satisfy k′ ≥ k. Suppose that a is an element of B admitting k′ for weight. Due to Proposition 9, for every b ∈ F_2^s, χ̂_{f_{φ1,φ2,φ3,g}}(a, b) = ±2^r. This implies that the resiliency order of f_{φ1,φ2,φ3,g} is upper bounded by k′ − 1. Suppose that a is an element of A admitting k for Hamming weight and denote by y1 and y2 the two elements of F_2^s such that a ∈ φ3(y1) + <φ1(y1), φ2(y1)> and a ∈ φ3(y2) + <φ1(y2), φ2(y2)>. Due to Proposition 9, the restriction of χ̂_{f_{φ1,φ2,φ3,g}} to {a} × F_2^s is defined by one of the two following relations:

∀b ∈ F_2^s, (1/2^{r−1}) χ̂_{f_{φ1,φ2,φ3,g}}(a, b) = ±[(−1)^{g(y1)+b·y1} + (−1)^{g(y2)+b·y2}] = ±2[b·(y1+y2) ⊕ g(y1) ⊕ g(y2) ⊕ 1],
∀b ∈ F_2^s, (1/2^{r−1}) χ̂_{f_{φ1,φ2,φ3,g}}(a, b) = ±[(−1)^{g(y1)+b·y1} − (−1)^{g(y2)+b·y2}] = ±2[b·(y1+y2) ⊕ g(y1) ⊕ g(y2)].

Since the linear function b ↦ b·(y1+y2) is not constant on the set {b ∈ F_2^s ; ωH(b) ≤ 1} when y1 and y2 are distinct, there always exists an element b ∈ F_2^s of weight ωH(b) ≤ 1 such that χ̂_{f_{φ1,φ2,φ3,g}}(a, b) is not null. This implies that the resiliency order of f_{φ1,φ2,φ3,g} is strictly upper bounded by k + 1.
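The combinatorial bound max{t ∈ N; Σ_{i=0}^t C(r, i) ≤ 2^r − 2^{s+2}} of Proposition 11 (also used in Corollary 3) is easy to evaluate numerically; a small helper of our own:

```python
from math import comb

def prop11_bound(r, s):
    """Largest t with sum_{i=0..t} C(r, i) <= 2^r - 2^(s+2) (Proposition 11);
    the resiliency order k - 1 of a function in Q1 cannot exceed this value."""
    limit = 2 ** r - 2 ** (s + 2)
    t, acc = -1, 0
    while t + 1 <= r and acc + comb(r, t + 1) <= limit:
        t += 1
        acc += comb(r, t)
    return t

assert prop11_bound(8, 2) == 5   # 1+8+28+56+70+56 = 219 <= 240; adding 28 exceeds
assert prop11_bound(5, 2) == 2   # 1+5+10 = 16 <= 16; adding 10 exceeds
```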
7.3 Constructions of Highly Nonlinear Resilient Functions from the Class Q
We have seen in the previous section that the elements of Q1 and Q2 which achieve the optimum tradeoff between nonlinearity and resiliency must have a low degree. We show now that it is possible to construct functions in Q with very good (but not optimal) characteristics. Let r be any integer; we denote by 1 the vector in F_2^r having all its coordinates equal to 1. We construct functions in Q as follows: we choose two distinct elements e1 and e2 in the canonical basis of F_2^r (we recall that for every i ≤ r, ωH(ei) = 1) and we denote by F the flat 1 + <e1, e2>. We choose an integer t lower than or equal to r − 2 and we denote by Ut the set {u ∈ <e1, e2>^⊥ ; ωH(u) ≤ t}. By construction of the set Ut, (u + F)_{u∈Ut} is a family of pairwise disjoint 2-dimensional flats whose elements have weights greater than or equal to r − t − 2. This family leads to constructions of Boolean functions in Q1 and Q2 whose nonlinearity and resiliency order can easily be computed.

Construction 1. Let s denote ⌊log2 Σ_{i=0}^t C(r−2, i)⌋. For every y ∈ F_2^s choose injectively an element u ∈ Ut and choose two distinct nonzero elements in {e1, e2, e1+e2} (i.e. nonzero elements in the direction of F). Denote these elements by φ1(y) and φ2(y), and denote by φ3(y) any element of the flat u + F. For any choice of the Boolean function g on F_2^s, the function f_{φ1,φ2,φ3,g} belongs to Q1. Denoting r + s by n, its nonlinearity and resiliency order equal 2^{n−1} − 2^{r−2} and r − t − 3 respectively (indeed, the minimum weight of the elements of Ut + F equals r − t − 2, and Proposition 11 permits us to conclude).

In the following construction of highly nonlinear resilient functions from Q2, we choose the set B defined in Proposition 12 to be empty.

Construction 2. Let s − 1 denote ⌊log2 Σ_{i=0}^t C(r−2, i)⌋. For every u ∈ Ut such that ωH(u) = t, we choose injectively two vectors y1 and y2 in F_2^s; we define φ1(y1) = φ2(y2) = e1 and φ2(y1) = φ1(y2) = e2, and we set φ3(y1) = φ3(y2) = u + 1. For every element u of Ut such that ωH(u) < t, we choose injectively two vectors y1 and y2 among those of F_2^s which have not been chosen at the previous step, and we choose two pairs of distinct nonzero elements in <e1, e2>; we denote these pairs by (φ1(y1), φ2(y1)) and (φ1(y2), φ2(y2)), and we denote by φ3(y1) and φ3(y2) two elements of the same flat u + F. After defining a Boolean function g on F_2^s such that g(y1) = g(y2) ⊕ 1 when ωH[φ1(y1) + φ2(y1) + φ3(y1)] = ωH[φ1(y2) + φ2(y2) + φ3(y2)] = r − t − 2, we obtain a function f_{φ1,φ2,φ3,g} in Q2 having nonlinearity 2^{n−1} − 2^{r−1} and resiliency order r − t − 2.
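Construction 1 can be instantiated and verified exhaustively for small parameters. Below is our own sketch with r = 5 and t = 1 (so s = 2 and n = 7); the assertions check the plateaued amplitude 2^{r−1}, the nonlinearity 2^{n−1} − 2^{r−2} = 56, and a resiliency order of at least r − t − 3 = 1:

```python
def dot(u, v):                       # inner product on F_2 vectors as integers
    return bin(u & v).count("1") & 1

r, t = 5, 1
s = 2                                # floor(log2 sum_{i<=t} C(r-2, i)) = log2 4
n = r + s
e1, e2, one = 0b00001, 0b00010, 0b11111
# U_t: vectors with support outside {e1, e2} and Hamming weight <= t
Ut = [u << 2 for u in range(1 << (r - 2)) if bin(u).count("1") <= t]
assert len(Ut) == 1 << s             # one flat u + F for each y in F_2^s
phi3 = [u ^ one for u in Ut]         # u + 1 is an element of the flat u + F
g = [0] * (1 << s)                   # any g works; take g = 0

def f(x, y):                         # phi1 = e1 and phi2 = e2 for every y
    return (dot(x, e1) & dot(x, e2)) ^ dot(x, phi3[y]) ^ g[y]

W = {(a, b): sum((-1) ** (f(x, y) ^ dot(a, x) ^ dot(b, y))
                 for x in range(1 << r) for y in range(1 << s))
     for a in range(1 << r) for b in range(1 << s)}

assert {abs(w) for w in W.values()} - {0} == {2 ** (r - 1)}  # plateaued, amp 16
nl = 2 ** (n - 1) - max(abs(w) for w in W.values()) // 2
assert nl == 2 ** (n - 1) - 2 ** (r - 2)                     # nonlinearity 56
# at least 1-resilient: zero Walsh value on every input of weight <= 1
assert all(w == 0 for (a, b), w in W.items()
           if bin(a).count("1") + bin(b).count("1") <= 1)
```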
References

1. E. Biham and A. Shamir, "Differential cryptanalysis of DES-like cryptosystems", Proceedings of CRYPTO'90, and Journal of Cryptology, vol. 4, no. 1, 1991.
2. S. Boztas and P.V. Kumar, "Binary sequences with Gold-like correlation but larger linear span", IEEE Transactions on Information Theory, vol. 40, no. 2, pp. 532-537, 1994.
3. P. Camion, C. Carlet, P. Charpin and N. Sendrier, "On correlation-immune functions", Advances in Cryptology - CRYPTO'91, Lecture Notes in Computer Science, Springer-Verlag, vol. 576, pp. 86-100, 1992.
4. P. Charpin and E. Pasalic, "On propagation characteristics of resilient functions", Selected Areas in Cryptography - SAC 2002, Lecture Notes in Computer Science, to appear, 2002.
5. A. Canteaut, C. Carlet, P. Charpin and C. Fontaine, "Propagation characteristics and correlation-immunity of highly nonlinear Boolean functions", Advances in Cryptology - EUROCRYPT 2000, Lecture Notes in Computer Science, Springer-Verlag, vol. 1807, pp. 507-522, 2000.
6. A. Canteaut, C. Carlet, P. Charpin and C. Fontaine, "On cryptographic properties of the cosets of R(1, m)", IEEE Transactions on Information Theory, vol. 47, no. 4, pp. 1494-1513, 2001.
7. A. Canteaut, P. Charpin and H. Dobbertin, "Binary m-sequences with three-valued crosscorrelation: a proof of Welch's conjecture", IEEE Transactions on Information Theory, vol. 46, pp. 4-8, 2000.
8. A. Canteaut, P. Charpin and H. Dobbertin, "Weight divisibility of cyclic codes, highly nonlinear functions on F_{2^m}, and crosscorrelation of maximum-length sequences", SIAM Journal on Discrete Mathematics, vol. 13, no. 1, pp. 105-138, 2000.
9. C. Carlet, "Partially-bent functions", Advances in Cryptology - CRYPTO'92, Lecture Notes in Computer Science, Springer-Verlag, vol. 740, pp. 280-291, 1993.
10. C. Carlet, "Two new classes of bent functions", Advances in Cryptology - EUROCRYPT'93, Lecture Notes in Computer Science, ed. T. Helleseth, Springer-Verlag, vol. 765, pp. 77-101, 1994.
11. C. Carlet, "Generalized partial spreads", IEEE Transactions on Information Theory, vol. 41, no. 5, pp. 1482-1487, 1995.
12. C. Carlet, "A larger class of cryptographic Boolean functions via a study of the Maiorana-McFarland construction", Advances in Cryptology - CRYPTO 2002, Lecture Notes in Computer Science, vol. 2442, pp. 549-564, 2002.
13. C. Carlet and P. Sarkar, "Spectral domain analysis of correlation immune and resilient Boolean functions", Finite Fields and Their Applications, vol. 8, pp. 120-130, 2002.
14. C. Carlet and Y. Tarannikov, "Covering sequences of Boolean functions and their cryptographic significance", Designs, Codes and Cryptography, vol. 25, pp. 263-279, 2002.
15. A. Canteaut and M. Videau, "Degree of composition of highly nonlinear functions and applications to higher order differential cryptanalysis", Advances in Cryptology - EUROCRYPT 2002, Lecture Notes in Computer Science, Springer-Verlag, vol. 2332, pp. 518-533, 2002.
16. T.W. Cusick and H. Dobbertin, "Some new three-valued crosscorrelation functions for binary m-sequences", IEEE Transactions on Information Theory, vol. 42, pp. 1238-1240, 1996.
17. J.F. Dillon, "Elementary Hadamard difference sets", PhD thesis, University of Maryland, 1974.
18. H. Dobbertin, "Construction of bent functions and balanced Boolean functions with high nonlinearity", Fast Software Encryption, Lecture Notes in Computer Science, ed. B. Preneel, Springer-Verlag, vol. 1008, pp. 61-74, 1995.
19. R. Gold, "Maximal recursive sequences with 3-valued recursive cross-correlation functions", IEEE Transactions on Information Theory, vol. 14, pp. 154-156, 1968.
20. T. Helleseth, "Some results about the cross-correlation function between two maximal linear sequences", Discrete Mathematics, vol. 16, pp. 209-232, 1976.
21. T. Helleseth, "Correlation of m-sequences and related topics", Sequences and their Applications - SETA '98, pp. 49-66, 1999.
22. T. Helleseth and P. Vijay Kumar, "Sequences with low correlation", in Handbook of Coding Theory, ed. V. Pless and W.C. Huffman, pp. 1765-1855, Elsevier, Amsterdam, 1998.
23. T. Helleseth and H. Martinsen, "Sequences with ideal autocorrelation and difference sets", Proceedings of the International Meeting on Coding Theory and Cryptography, September 1999.
24. H.D.L. Hollmann and Q. Xiang, "A proof of the Welch and Niho conjectures on crosscorrelations of binary m-sequences", Finite Fields and Their Applications, vol. 7, 2001.
25. L.R. Knudsen, "Truncated and higher order differentials", Fast Software Encryption, Second International Workshop, Lecture Notes in Computer Science, vol. 1008, pp. 196-211, Springer-Verlag, 1995.
26. X. Lai, "Higher order derivatives and differential cryptanalysis", Proc. Symposium on Communication, Coding and Cryptography, in honor of J.L. Massey on the occasion of his 60th birthday, 1994.
27. M. Matsui, "Linear cryptanalysis method for DES cipher", Advances in Cryptology - EUROCRYPT'93, Lecture Notes in Computer Science, vol. 765, pp. 386-397, Springer-Verlag, 1994.
28. P. Sarkar and S. Maitra, "Nonlinearity bounds and constructions of resilient Boolean functions", Advances in Cryptology - CRYPTO 2000, Lecture Notes in Computer Science, Springer-Verlag, vol. 1880, pp. 512-532, 2000.
29. R.L. McFarland, "A family of noncyclic difference sets", Journal of Combinatorial Theory, vol. 15, pp. 1-10, 1973.
30. E. Pasalic, S. Maitra, T. Johansson and P. Sarkar, "New constructions of resilient and correlation immune Boolean functions achieving upper bound on nonlinearity", Workshop on Coding and Cryptography, Electronic Notes in Discrete Mathematics, Elsevier, 2001.
31. V.S. Pless and W.C. Huffman, Handbook of Coding Theory, Elsevier, Amsterdam, 1998.
32. B. Preneel, W. Van Leekwijck, L. Van Linden, R. Govaerts and J. Vandewalle, "Propagation characteristics of Boolean functions", Advances in Cryptology - EUROCRYPT'90, Lecture Notes in Computer Science, Springer-Verlag, vol. 473, pp. 161-173, 1991.
33. P. Sarkar and S. Maitra, "Constructions of nonlinear Boolean functions with important cryptographic properties", Advances in Cryptology - EUROCRYPT 2000, Lecture Notes in Computer Science, Springer-Verlag, vol. 1807, pp. 485-506, 2000.
34. P. Sarkar and S. Maitra, "Nonlinearity bounds and constructions of resilient Boolean functions", Advances in Cryptology - CRYPTO 2000, Lecture Notes in Computer Science, vol. 1880, pp. 515-532, 2000.
35. F.J. MacWilliams and N.J.A. Sloane, The Theory of Error-Correcting Codes, North-Holland, Amsterdam, 1977.
36. Y. Tarannikov, "On resilient Boolean functions with maximum nonlinearity", Progress in Cryptology - INDOCRYPT 2000, Lecture Notes in Computer Science, ed. B.K. Roy and E. Okamoto, Springer-Verlag, vol. 1977, pp. 19-30, 2000.
37. G.-Z. Xiao, C. Ding and W. Shan, The Stability Theory of Stream Ciphers, Lecture Notes in Computer Science, vol. 561, Springer-Verlag, 1991.
38. G.-Z. Xiao and J.L. Massey, "A spectral characterization of correlation-immune combining functions", IEEE Transactions on Information Theory, vol. 34, no. 3, pp. 569-571, 1988.
39. Y. Zheng and X.M. Zhang, "Improved upper bound on the nonlinearity of high order correlation immune functions", Selected Areas in Cryptography - SAC 2000, Lecture Notes in Computer Science, Springer-Verlag, vol. 2012, pp. 262-274, 2001.
40. Y. Zheng and X.M. Zhang, "Plateaued functions", Information and Communication Security - ICICS'99, Lecture Notes in Computer Science, Springer-Verlag, vol. 1726, pp. 284-300, 1999.
41. Y. Zheng and X.M. Zhang, "Improved upper bound on the nonlinearity of high order correlation immune functions", Selected Areas in Cryptography, 7th Annual International Workshop, SAC 2000, Lecture Notes in Computer Science, Springer-Verlag, vol. 2012, pp. 264-274, 2001.
A Other Constructions of Plateaued Functions

A.1 Secondary Constructions
A first construction is given in [17]. Let g: F_2^r → F_2 and h: F_2^s → F_2 be two plateaued functions. The function f defined on F_2^{r+s} by f(x, y) = g(x) ⊕ h(y) is plateaued on F_2^{r+s}. Indeed, we have χ̂_f(a, b) = χ̂_g(a) × χ̂_h(b). But such a function f does not have good cryptographic properties. For instance, the degree of f is upper bounded by max(deg g, deg h). And J. Dillon himself says that the "decomposable" functions this construction produces are not satisfactory.

Two other secondary constructions of plateaued functions can be adapted from classical secondary constructions of resilient functions (cf. [3, 30, 36]).

Proposition 13.
1. Let g and h be two plateaued Boolean functions on F_2^n of the same amplitude 2^r. The functions defined by f(x1, ..., xn, xn+1) = g(x1, ..., xn−1, xn ⊕ xn+1) and f′(x1, ..., xn, xn+1) = g(x1, ..., xn−1, xn ⊕ xn+1) ⊕ xn are plateaued of amplitude 2^{r+1} on F_2^{n+1}. If g is mth-order correlation immune (resp. m-resilient), then f is mth-order correlation immune (resp. m-resilient) and f′ is m-resilient. If χ̂_g(a1, ..., an−1, 1) = 0 for every (a1, ..., an−1), then f′ is (m+1)-resilient.
2. If for all a ∈ F_2^n the numbers χ̂_g(a) and χ̂_h(a) are either both null or both nonzero, then the function f(x1, ..., xn, xn+1) = (xn+1 ⊕ 1)g(x1, ..., xn) ⊕ xn+1 h(x1, ..., xn) is plateaued of amplitude 2^{r+1} on F_2^{n+1}.
If for all a ∈ F_2^n at least one of the numbers χ̂_g(a) and χ̂_h(a) is null, then the function f(x1, ..., xn, xn+1) is plateaued of amplitude 2^r on F_2^{n+1}. If g and h are m-resilient, then f is m-resilient.

Proof. 1. For every (a1, ..., an+1) ∈ F_2^{n+1}, we have

χ̂_f(a1, ..., an+1) = Σ_{(x1,...,xn+1)∈F_2^{n+1}} (−1)^{Σ_{i=1}^{n+1} ai xi ⊕ g(x1,...,xn−1, xn⊕xn+1)}
= Σ_{(x1,...,xn+1)∈F_2^{n+1}} (−1)^{Σ_{i=1}^{n−1} ai xi ⊕ an(xn⊕xn+1) ⊕ an+1 xn+1 ⊕ g(x1,...,xn)}
= Σ_{(x1,...,xn+1)∈F_2^{n+1}} (−1)^{Σ_{i=1}^{n} ai xi ⊕ (an⊕an+1) xn+1 ⊕ g(x1,...,xn)}.

Thus, χ̂_f(a1, ..., an+1) equals 2χ̂_g(a1, ..., an) if an = an+1 and 0 otherwise. Consequently, χ̂_{f′}(a1, ..., an+1) equals 2χ̂_g(a1, ..., an) if an = an+1 ⊕ 1 and 0 otherwise. The consequences are then straightforward.
2. For every (a1, ..., an+1) ∈ F_2^{n+1}, we have χ̂_f(a1, ..., an+1) = χ̂_g(a1, ..., an) + (−1)^{an+1} χ̂_h(a1, ..., an). Thus, χ̂_f(a1, ..., an+1) equals 0 or ±2^{r+1} (resp. ±2^r) thanks to the condition on χ̂_g and χ̂_h.

The classes of Boolean functions these constructions permit us to build are small. Moreover, f and f′ have the nonzero linear structure (0, ..., 0, 1, 1), which can be used to attack a system in which they are implemented.
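Part 1 of Proposition 13 is easy to confirm numerically on a toy example (our own choice: g = x1x2, plateaued of amplitude 4 on F_2^3):

```python
from itertools import product

def walsh(fun, m):
    """Walsh spectrum of a Boolean function given as a callable on m-bit tuples."""
    pts = list(product((0, 1), repeat=m))
    return [sum(1 - 2 * (fun(x) ^ (sum(ai & xi for ai, xi in zip(a, x)) & 1))
                for x in pts)
            for a in pts]

n = 3
def g(x):                   # x1*x2: plateaued of amplitude 2^2 = 4 on F_2^3
    return x[0] & x[1]

def f(x):                   # f(x1,...,x_{n+1}) = g(x1,...,x_{n-1}, x_n ^ x_{n+1})
    return g(x[:n - 1] + (x[n - 1] ^ x[n],))

assert {abs(w) for w in walsh(g, n)} - {0} == {4}
assert {abs(w) for w in walsh(f, n + 1)} - {0} == {8}   # amplitude doubles
```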
A.2 Plateaued Functions from Sequence Designs
An important problem in sequence design is to study the cross-correlation between a binary maximum-length sequence (called an m-sequence) and its decimations. Let s[1] = (s0, s1, s2, ..., sj, ...) denote a binary m-sequence of length 2^n − 1 and let s[d] = (s0, sd, s2d, ..., sjd, ...) denote its decimation by an integer d co-prime with 2^n − 1. We denote by Cd(t) the cross-correlation function between the m-sequences s[1] and s[d], defined by

Cd(t) = Σ_{j=0}^{2^n−2} (−1)^{s_{jd} + s_{j+t}}   for t = 0, 1, ..., 2^n − 2.

These cross-correlations are known to take at least three different values. The special case when exactly three values occur was the subject of many works [7, 16, 19, 20]. The study of binary sequences can be related to Boolean functions through their trace representation. Let Tr: F_{2^n} → F_2 denote the usual trace function on the Galois field F_{2^n} and let α be a primitive element of F_{2^n}. Since the cross-correlation spectrum depends only on d and not on the choice of the m-sequence s[1], we may assume with no loss of generality that s[1] is given by sj = Tr(α^j). Then to each sequence obtained by decimating s[1] by an integer d, one can associate its trace representation, i.e. the function fd: x → Tr(x^d). When t varies in {0, ..., 2^n − 2}, the functions associated to the sequences (s_{j+t})
are the functions of the form x → Tr(βx) where β = α^t, that is, all the nonzero linear functions on F_{2^n}. The Walsh transform of the function fd and the cross-correlation Cd of s[1] are connected through the relation

χ̂_{fd}(u) = Cd(t) + 1,   (11)

where u = α^t is in F*_{2^n}. The Boolean function fd is plateaued on F_{2^n} if and only if the associated sequence has a three-valued cross-correlation function. In [22], T. Helleseth and P.V. Kumar give the list of all the values d known at that time for which Cd is three-valued. We recall here this list and add the two constructions presented in [7] and [24]:

1. d = 2^k + 1, with n/gcd(n, k) odd,
2. d = 2^{2k} − 2^k + 1, with n/gcd(n, k) odd,
3. d = 2^{n/2} + 2^{(n+2)/4} + 1, with n ≡ 2 (mod 4),
4. d = 2^{(n+2)/2} + 3, with n ≡ 2 (mod 4),
5. d = 2^{(n−1)/2} + 3, with n odd,
6. d = 2^{(n−1)/2} + 2^{(n−1)/4} − 1 if n ≡ 1 (mod 4), and d = 2^{(n−1)/2} + 2^{(3n−1)/4} − 1 if n ≡ 3 (mod 4).
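Case 1 can be observed numerically. The sketch below (ours) generates an m-sequence of period 31 from the primitive polynomial x^5 + x^2 + 1 (irreducible, and 31 is prime) and checks that the cross-correlation with its decimation by d = 2^1 + 1 = 3 takes exactly the three values −1 and −1 ± 2^{(n+1)/2}:

```python
# m-sequence from the LFSR with recurrence s_{j+5} = s_j ^ s_{j+2}
n = 5
N = 2 ** n - 1                       # period 31
state = [0, 0, 0, 0, 1]
seq = []
for _ in range(N):
    seq.append(state[0])
    state = state[1:] + [state[0] ^ state[2]]
assert sum(seq) == 2 ** (n - 1)      # an m-sequence is almost balanced: 16 ones

d = 2 ** 1 + 1                       # case 1 (Gold), k = 1, n/gcd(n, k) = 5 odd
def C(t):                            # cross-correlation of s[1] and s[d]
    return sum((-1) ** (seq[j * d % N] ^ seq[(j + t) % N]) for j in range(N))

# three-valued spectrum: -1 and -1 +/- 2^((n+1)/2), i.e. {-1, 7, -9}
assert {C(t) for t in range(N)} == {-1, 7, -9}
```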
From a cryptographic viewpoint, these classes are too small to provide satisfaction. In case 1, introduced by Gold [19], the algebraic normal form has degree 2 and the constructed trace functions belong to R(2, n), which was already known to be a set of plateaued functions. In the other cases, respectively found by Kasami, by Cusick and Dobbertin [16], by Canteaut, Charpin and Dobbertin [7] and by Hollmann and Xiang [24], the algebraic degree of the functions is greater than 2, but it equals 3 in cases 4 and 5. It is a hard problem to test cryptographic properties such as resiliency on these functions and, more generally, on Boolean functions defined on the field F_{2^n}.
Linear Redundancy in S-Boxes Joanne Fuller and William Millan Information Security Research Centre, Queensland University of Technology, GPO Box 2434, Brisbane, Queensland 4001, Australia {fuller,millan}@isrc.qut.edu.au
Abstract. This paper reports the discovery of linear redundancy in the S-boxes of many ciphers recently proposed for standardisation (including Rijndael, the new AES). We introduce a new method to efficiently detect affine equivalence of Boolean functions, and hence we study the variety of equivalence classes existing in random and published S-boxes. This leads us to propose a new randomness criterion for these components. We present experimental data supporting the notion that linear redundancy is very rare in S-boxes with more than 6 inputs. Finally we discuss the impact this property may have on implementations, review the potential for new cryptanalytic attacks, and propose a new tweak for block ciphers that removes the redundancy. We also provide details of a highly nonlinear 8*8 non-redundant bijective S-box, which is suitable as a plug in replacement where required.
1 Introduction
The properties of substitution boxes (called S-boxes) form the basis for arguments regarding the security of symmetric encryption algorithms, and their importance is undoubted. Following Shannon’s theory of secrecy systems proposing confusion and diffusion in SP-networks [16], and the popularity of the subsequent Data Encryption Standard [12], S-boxes have been the working heart of many efficient and secure encryption algorithms. A look up table can be implemented in a single software instruction, so S-boxes are attractive for fast software encryption. In fact, the vast majority of high quality proposals for symmetric encryption algorithms include the specification of one (or more) S-boxes, together with a list of security criteria these S-boxes were selected to meet. Clearly a lot of attention has been given to S-boxes, yet still many open problems remain, and some important properties, such as the one presented in this paper, have previously gone unnoticed. Many papers have investigated the linear approximation and differential (autocorrelation) properties of S-boxes. It has been clearly demonstrated that powerful generic statistical attacks such as differential and linear cryptanalysis can be resisted by the selection of nearly optimal Boolean functions as components for the S-boxes. However, it is known that tradeoffs exist with respect to optimising Boolean functions for several security criteria simultaneously. Several methods T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 74–86, 2003. c International Association for Cryptologic Research 2003
to generate cryptographically useful S-boxes exist, such as random generation, finite field operations and heuristic algorithms. Of these, the finite field operation of inversion with respect to a polynomial basis achieves the best known combination of high nonlinearity, low autocorrelation and high algebraic degree. For these reasons finite field operations have become popular in symmetric cryptography. Finite field operations have been used in many ciphers proposed since 1996, including Shark [14], Square [3] and Rijndael [4], as well as several of the NESSIE [18] proposals: Camellia, Hierocrypt and SC2000. More recently, the Japanese Government's CRYPTREC [19] standardisation process has had several proposals that use finite fields (including many of the above-mentioned ciphers plus Cipherunicorn A (and E)). In CRYPTREC, the only block cipher proposal not based on finite field S-boxes is RC6! The South Korean Government is also undertaking an encryption standardisation process [20], and their block cipher submissions that use finite fields include Seed and Zodiac. We also note that the stream ciphers BMGL (a NESSIE submission) and MUGI (a CRYPTREC submission) use finite field based S-boxes. Typically, the designers of these algorithms have chosen to wrap the finite field inversion inside a bitwise affine transformation, claiming that this would prevent algebraic attacks over GF(256). However, in this paper we report the discovery of a property of algebraic linear redundancy that is inherent in the finite field exponentiation operations, including inversion, and which is not removed by any surrounding affine transformation. Apart from finite field operations, S-boxes can possess linear redundancy that stems from other sources, in particular a small number of inputs (as in Serpent and Q) [11] and also low order functions (as in Misty, Kasumi and CAST).
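For concreteness, the Rijndael S-box discussed throughout is inversion in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1, wrapped in a fixed bitwise affine transformation; a standard construction sketch (our own code, not the authors'):

```python
def gf_mul(a, b):
    """Multiplication in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1B               # reduction by 0x11B
        b >>= 1
    return p

def gf_inv(a):
    """a^254 = a^(-1) in GF(2^8) for a != 0; 0 maps to 0 in the S-box."""
    r = a
    for _ in range(253):
        r = gf_mul(r, a)
    return r if a else 0

def affine(x):
    """AES affine map: b_i = x_i ^ x_{i+4} ^ x_{i+5} ^ x_{i+6} ^ x_{i+7} ^ c_i."""
    y = 0
    for i in range(8):
        bit = ((x >> i) ^ (x >> (i + 4) % 8) ^ (x >> (i + 5) % 8)
               ^ (x >> (i + 6) % 8) ^ (x >> (i + 7) % 8)) & 1
        y |= (bit ^ (0x63 >> i & 1)) << i
    return y

sbox = [affine(gf_inv(x)) for x in range(256)]
assert sbox[0x00] == 0x63 and sbox[0x01] == 0x7C   # well-known entries
assert len(set(sbox)) == 256                       # a bijection on GF(2^8)
```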
In Section 2 of this paper we introduce the concepts relating to equivalence classes of Boolean functions and present an efficient algorithm to detect affine equivalence. Our initial discovery of redundancy in the AES S-box is presented in Section 3. In Section 4 we consider the special case of finite field power permutation based S-boxes, and give a simple and direct proof, due to Wagner, that all the linear combinations of the output bits are given by Boolean functions that are equivalent under affine transform! We first demonstrated this fact by a computer experiment on Rijndael's S-box (which revealed eight transforms between each pair of functions), but after an early draft of this paper [7] was circulated many people offered algebraic proofs for this case. The first proof we received was from David Wagner [17] (and longer but somewhat similar proofs were found by Eric Garrido [8], Don Coppersmith [2] and very probably many others). It is amazing that a property with this simple a proof went so long unnoticed. In Section 5 we present some experimental results on the equivalence class variety possessed by random S-boxes, and hence propose a new randomness criterion: that all output functions should have distinct equivalence classes. In Section 6 we offer some discussion about the consequences for encryption algorithms. We consider improved implementation tradeoffs that result from the redundancy, and discuss some possible avenues towards new cryptanalysis. Finally we suggest altering (or tweaking [9]) S-boxes affected by these results, as a barrier against any future cryptanalysis that may result from this kind of non-randomness. We show that this tweak removes the class redundancy without greatly reducing the nonlinearity and differential security properties of the S-box. In Appendix A we give examples of the matrix transforms that map between the output bits of the Rijndael S-box, and their inverses appear in Appendix B. Appendix C contains the look-up table for the best non-redundant 8×8 S-box we have so far generated by tweaking the AES S-box. It has nonlinearity 106 and algebraic order 7, which is the best combination known for a non-redundant S-box. We propose this S-box as a suitable plug-in replacement for ciphers such as AES.
2 Equivalence Classes of Boolean Functions
Boolean functions are represented by their truth tables. When there exists an affine transformation that maps between two Boolean functions, those functions are said to be affine equivalent and are grouped together in the same equivalence class. Two n-input Boolean functions f and g are considered equivalent if there exists a non-singular binary matrix D, two n-element binary vectors a, b and a binary constant c such that g(x) = f(Dx^T ⊕ a^T) ⊕ b·x^T ⊕ c, where b·x^T = b_1 x_1 ⊕ b_2 x_2 ⊕ ... ⊕ b_n x_n denotes a linear function of x selected by b. The study of Boolean functions can be greatly enhanced by considering equivalence classes. Many properties of cryptographic interest are unchanged by affine transform, such as algebraic degree and nonlinearity. More generally, the absolute values of the Walsh transform and the autocorrelation function are both re-arranged by affine transform. It seems little has been written on equivalence classes since the 1972 Berlekamp and Welch paper [1], which described all 48 classes for n = 5 in terms of their Algebraic Normal Form (ANF). It seems to be well known that the number of equivalence classes increases exponentially with n; for example see [5]. Concretely, the 1991 Maiorana [10] paper states that there exist 150,357 classes for n = 6, including 2082 different WHT distributions, but there is no analysis of structure for cryptology. More recently, equivalence classes have been used to provide restricted inputs to random and heuristic searches seeking better Boolean functions [13]. However, it has remained an open problem to easily distinguish between equivalent functions, and indeed to determine such mappings, for functions of any n. In seeking other methods to approach the class distinguishing problem, we investigated the local structure by considering the set of functions at Hamming distance one from a given Boolean function. Definition 1.
The 1-local neighbourhood of a Boolean function f consists of all 2^n Boolean functions f_i, i ∈ Z_2^n, constructed such that dist(f, f_i) = 1. Furthermore the connected functions are given by

f_i(x) = { f(x),       x ≠ i
         { f(x) ⊕ 1,   x = i

We now prove that if f and g are equivalent, then there exists a function g_j at distance 1 from g that is equivalent to a corresponding function f_i at distance 1 from f, under the same affine transform that relates f and g. This result also provides a useful property for consideration when trying to determine whether two functions are equivalent.

Theorem 1. If f_i is a connecting function of f(x), defined as above, then there exists a connecting function g_j of g(x) = f(Dx^T ⊕ a^T) ⊕ b·x^T ⊕ c such that g_j(x) = f_i(Dx^T ⊕ a^T) ⊕ b·x^T ⊕ c and j = (D^{-1}(i^T ⊕ a^T))^T.

Proof. Let g(x) = f(Dx^T ⊕ a^T) ⊕ b·x^T ⊕ c and

f_i(x) = { f(x),       x ≠ i
         { f(x) ⊕ 1,   x = i

Therefore,

f_i(Dx^T ⊕ a^T) ⊕ b·x^T ⊕ c = { f(Dx^T ⊕ a^T) ⊕ b·x^T ⊕ c,       (Dx^T ⊕ a^T)^T ≠ i
                              { f(Dx^T ⊕ a^T) ⊕ b·x^T ⊕ c ⊕ 1,   (Dx^T ⊕ a^T)^T = i

which is

f_i(Dx^T ⊕ a^T) ⊕ b·x^T ⊕ c = { g(x),       x ≠ (D^{-1}(i^T ⊕ a^T))^T = j
                              { g(x) ⊕ 1,   x = (D^{-1}(i^T ⊕ a^T))^T = j

And hence, f_i(Dx^T ⊕ a^T) ⊕ b·x^T ⊕ c is equivalent with g_j, a connecting function of g such that j = (D^{-1}(i^T ⊕ a^T))^T.

Corollary 1. Let f and g be affine equivalent. Then the 1-local neighbourhood of f and the 1-local neighbourhood of g are related by a permutation.

Proof. This follows from the non-singularity of D.
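The construction in Definition 1 and the statement of Theorem 1 can be checked directly on small truth tables. The following Python sketch is our illustration, not the authors' code; the transform parameters D, a, b, c are arbitrary choices for n = 3. It builds an affine image g of a random function f and verifies that each connecting function f_i is carried onto the connecting function g_j with j = D^{-1}(i ⊕ a):

```python
import random

n = 3
N = 1 << n

def dot(u, x):
    # Inner product of bit-vectors u and x over GF(2).
    return bin(u & x).count("1") & 1

def mat_vec(D, x):
    # Multiply the binary matrix D (a list of n row bitmasks) by the vector x.
    return sum((bin(row & x).count("1") & 1) << k for k, row in enumerate(D))

def affine_image(f, D, a, b, c):
    # g(x) = f(Dx ^ a) ^ b.x ^ c, with truth tables as lists of bits.
    return [f[mat_vec(D, x) ^ a] ^ dot(b, x) ^ c for x in range(N)]

def flip(f, i):
    # Connecting function f_i: complement the output at the single input i.
    return [f[x] ^ (1 if x == i else 0) for x in range(N)]

random.seed(1)
f = [random.randint(0, 1) for _ in range(N)]
D = [0b011, 0b010, 0b100]   # invertible over GF(2); this one is its own inverse
Dinv = D
a, b, c = 0b101, 0b010, 1
g = affine_image(f, D, a, b, c)

# Theorem 1: the same transform maps each connecting function f_i onto g_j,
# where j = D^{-1}(i ^ a).
for i in range(N):
    j = mat_vec(Dinv, i ^ a)
    assert affine_image(flip(f, i), D, a, b, c) == flip(g, j)
```

Since D is non-singular, the map i → j is a permutation of the inputs, which is exactly the statement of Corollary 1.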
3 Redundancy in Rijndael S-Box Functions
It was already known from [4] that the functions in the Rijndael S-box exhibit identical excellent properties of algebraic degree and nonlinearity. However, it was not well known that the Boolean functions formed by all 255 linear combinations share the same frequency distributions for absolute Walsh transform and absolute autocorrelation values. Given that these properties are known to be conserved under affine transform, this suggests the existence of an affine relationship, but does not prove it. It turns out that the general problem of determining equivalence between functions of six inputs and greater has been difficult, and to date the only known solution was exhaustive search. However, the results of the previous section provide the theoretical basis for a new technique that we have implemented. This approach reduces the search space significantly.
Theorem 1 indicates that the connecting functions of f(x) and those of g(x) = f(Dx^T ⊕ a^T) ⊕ b·x^T ⊕ c share the same equivalence mapping as f and g. Hence, rather than only two equivalent functions, we in fact have 2^n + 1 pairs of equivalent functions under the same affine transform. After implementing the following algorithm, we believe that this provides sufficient data to uniquely determine the n×n invertible matrix D, the vectors a, b ∈ Z_2^n and the constant c ∈ Z_2 in an efficient manner.

Test for Affine Equivalence
Input: f(x), g(x) and n.
Output: Return the n×n invertible matrix D, a, b ∈ Z_2^n and c ∈ Z_2 when the test is positive; else return "not equivalent".

1. Finding a
   (a) From Theorem 1, we know that connecting function i of f will be equivalent to connecting function j = D^{-1}(i ⊕ a) of g, and therefore i = Dj ⊕ a. When j = 0 we know that i = a.
   (b) Thus, determine which connecting functions f_i could be equivalent to connecting function g_0, using the algebraic degree and the absolute frequency distributions of the WHT and autocorrelation function of both f and g; a must come from the set of valid i's.
   (c) Let the set of possible values for a be denoted {a_0, a_1, ...}.
2. Finding D
   (a) From 1, we know that connecting function i of f will be equivalent to connecting function j = D^{-1}(i ⊕ a) of g, and therefore i = Dj ⊕ a for a ∈ {a_0, a_1, ...}.
   (b) When j = e_k, where e_k is the unit vector with a 1 in position k and 0 elsewhere, we see that i^T will be the k-th column of D, XORed with a.
   (c) Thus, determine which f_i could be equivalent to g_j when j = e_k (for all k ≤ n), using the algebraic degree and the absolute frequency distributions of the WHT and autocorrelation function of both f and g, to find the possible columns of D ⊕ a.
   (d) Let the set of possible values for D_k ⊕ a be denoted {x(k)_0, x(k)_1, ...}.
3. Finding b and c
   (a) From 2, the only two remaining unknown variables of the transform relating f and g are b ∈ Z_2^n and c ∈ Z_2.
   (b) Test each combination of the potential values for a, D_k ⊕ a, b and c to establish whether a valid affine equivalence mapping exists.
   (c) If a valid mapping is found, return D, a, b and c. Otherwise, return "no mapping found".
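Step 1 of the test can be sketched as follows. This is an illustrative Python fragment, not the authors' implementation; it uses only the absolute Walsh spectrum as the class invariant (the paper additionally uses the algebraic degree and the autocorrelation distribution), and the transform parameters are arbitrary small-n choices:

```python
import random

n = 3
N = 1 << n

def dot(u, x):
    return bin(u & x).count("1") & 1

def mat_vec(D, x):
    return sum((bin(row & x).count("1") & 1) << k for k, row in enumerate(D))

def flip(f, i):
    # Connecting function: complement the output at the single input i.
    return [f[x] ^ (1 if x == i else 0) for x in range(N)]

def walsh_signature(f):
    # Sorted absolute Walsh spectrum: invariant under affine transformation.
    return tuple(sorted(abs(sum((-1) ** (f[x] ^ dot(w, x)) for x in range(N)))
                        for w in range(N)))

random.seed(7)
f = [random.randint(0, 1) for _ in range(N)]
D, a, b, c = [0b011, 0b010, 0b100], 0b110, 0b001, 0   # arbitrary transform
g = [f[mat_vec(D, x) ^ a] ^ dot(b, x) ^ c for x in range(N)]

# Step 1: when j = 0 we have i = a, so the true a must be among those i
# whose connecting function f_i has the same invariants as g_0.
sig_g0 = walsh_signature(flip(g, 0))
candidates = [i for i in range(N) if walsh_signature(flip(f, i)) == sig_g0]
assert a in candidates
```

Steps 2 and 3 proceed the same way, filtering candidate columns of D with j = e_k and finally exhausting the surviving combinations of a, D, b and c.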
This algorithm provides a general procedure that can be applied to determine an affine equivalence relationship between two functions, or to prove that one does not exist. We should stress that the complexity varies greatly according to the actual pair of functions. We simply note that it is very efficient for testing most pairs of functions, including the functions in the Rijndael S-box. This algorithm was applied to both the individual outputs and the linear combinations of the Rijndael S-box. In this case, for all pairs of functions considered, it was found that the search space was reduced to 2^15 possible affine mappings. It was therefore a feasible task to automate this procedure and find the matrices. Exactly eight distinct linear transforms were identified relating any pair of functions from the S-box. One example mapping from each set relating the individual output functions is listed in Appendix A. The corresponding inverse matrices are in Appendix B.
4 Finite Fields
The polynomial basis representation of finite fields with characteristic 2 is discussed in the AES submission of Rijndael [4]. That document describes how to generate the S-box but does not examine its cryptographic properties in any great depth. We now present the first known proof of the linear redundancy in finite field inversion, due to David Wagner after he saw our initial posting to the IACR e-print archive [7].

Theorem 3. [17] The component output functions of finite field inversion are related by linear transform.

Proof. Let Tr : GF(2^8) → GF(2) denote the trace function, and let S : GF(2^8) → GF(2^8) be the AES S-box, i.e., inversion: S(x) = x^{-1}. The basic fact required is that the linear function f_i : GF(2^8) → GF(2) extracting the i-th bit of its input can be expressed in the form f_i(x) = Tr(c_i x) for some constant c_i in GF(2^8) that depends only on i. Now we can see that

f_j(S(x)) = Tr(c_j x^{-1})
          = Tr(c_i d_{i,j}^{-1} x^{-1})
          = f_i((d_{i,j} x)^{-1})
          = f_i(S(d_{i,j} x))

where the constant d_{i,j} in GF(2^8) is given by d_{i,j} = c_i c_j^{-1}. Of course, multiplying (in GF(2^8)) by any constant in GF(2^8) is a GF(2)-linear map, hence for each i, j there is an 8×8 matrix M_{i,j} over GF(2) so that d_{i,j} x = M_{i,j} x. We find that f_j(S(x)) = f_i(S(M_{i,j} x)), which is the result claimed.

We briefly note that Wagner's proof works for any linear combination of the outputs, and moreover it can be adjusted to apply to any single-term exponentiation. Inversion is the specific power mapping given by x^{-1} = x^{2^n - 2}.
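Wagner's identity is easy to confirm by computer. The sketch below is our illustration, not part of the paper; it works in GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1 (an assumed choice of basis), recovers the trace constants c_i by brute force, and checks f_j(S(x)) = f_i(S(d_{i,j} x)) for all bit positions i, j and all 256 inputs:

```python
POLY = 0x11B  # AES field polynomial x^8 + x^4 + x^3 + x + 1

def gf_mul(a, b):
    # Polynomial multiplication modulo POLY in GF(2^8).
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= POLY
        b >>= 1
    return r

def trace(x):
    # Tr(x) = x + x^2 + x^4 + ... + x^128; the result is always 0 or 1.
    t = 0
    for _ in range(8):
        t ^= x
        x = gf_mul(x, x)
    return t

# Inversion table: x^{-1} = x^254, with 0 mapped to 0 as in the AES S-box.
INV = [0] * 256
for v in range(1, 256):
    r = v
    for _ in range(6):              # after the loop, r = v^127 ...
        r = gf_mul(gf_mul(r, r), v)
    INV[v] = gf_mul(r, r)           # ... so this is v^254 = v^{-1}

TR = [trace(y) for y in range(256)]

# Bit extraction is linear, so bit_i(x) = Tr(c_i * x) for a unique constant c_i.
c = [next(d for d in range(1, 256)
          if all(TR[gf_mul(d, x)] == (x >> i) & 1 for x in range(256)))
     for i in range(8)]

# Wagner's identity: f_j(S(x)) = f_i(S(d_{i,j} x)) with d_{i,j} = c_i * c_j^{-1}.
for i in range(8):
    for j in range(8):
        d = gf_mul(c[i], INV[c[j]])
        assert all(((INV[x] >> j) & 1) == ((INV[gf_mul(d, x)] >> i) & 1)
                   for x in range(256))
```

Each multiplication by d_{i,j} is GF(2)-linear, so the eight output bits of pure inversion are all affine images of one another, exactly as the theorem states.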
5 Proposing a New Criterion for S-Boxes
In this section we discuss the variety of different equivalence classes and consider it as an indicator of non-randomness for S-boxes. These results show how unusual it is for an S-box on 6 or more inputs to have any linear redundancy at all. For the case n ≤ 4 there are so few equivalence classes that we expect there to be
Table 1. Class redundancy of random bijective S-boxes (n = 5, 1000 trials)

# classes   frequency
    5           4
    6          20
    7          86
    8         220
    9         246
   10         228
   11         146
   12          39
   13           8
   14           3
some redundancy. Consider that there are only 8 classes for n = 4, so every 4×4 S-box must have some linear redundancy, as there are 15 linear combinations. When n = 5 there are 48 equivalence classes. For a 5×5 S-box to have all different classes, 31 are required, so it is difficult to avoid some redundancy. The distribution of the number of classes for 5×5 bijective S-boxes is shown in Table 1. The average number of different classes is 9. For n = 6 and more there are so many available classes that it is difficult to find linearly redundant S-boxes at random. Our experiments show that 3.3% of random 6×6 bijections have 62 different classes, with one class used twice. The rest had no redundancy, with 63 classes. Our experiments at n = 7, 8 found that all S-boxes had no redundancy (in trials of 1000 random bijections). From these results we see that linear redundancy is very rare for 6 or more inputs; hence we propose a new randomness criterion for S-boxes.

Proposition 1. Let B[.] be an S-box with 6 or more variables. Then B[.] fails the equivalence class variety test iff the S-box has any linear redundancy.

In the previous section we showed that finite field power mappings inherently possess saturated linear redundancy, and so they clearly fail our new test for non-randomness.
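The class-variety experiment can be approximated without a full equivalence test: distinct invariant signatures (here, sorted absolute Walsh spectra) imply distinct classes, so counting signatures gives a lower bound on the class variety. The following sketch is our illustration, not the authors' code; the trial count and seed are arbitrary, and it reproduces the flavour of Table 1 for n = 5:

```python
import random
from collections import Counter

n = 5
N = 1 << n

def dot(u, x):
    return bin(u & x).count("1") & 1

def signature(f):
    # Sorted absolute Walsh spectrum: a multiset invariant of the affine class,
    # so distinct signatures prove distinct classes (but not conversely).
    return tuple(sorted(abs(sum((-1) ** (f[x] ^ dot(w, x)) for x in range(N)))
                        for w in range(N)))

random.seed(0)
counts = Counter()
for _ in range(100):
    sbox = list(range(N))
    random.shuffle(sbox)
    # Signatures of all 31 nonzero linear combinations of the output bits.
    sigs = {signature([dot(m, sbox[x]) for x in range(N)]) for m in range(1, N)}
    counts[len(sigs)] += 1
print(sorted(counts.items()))   # distribution of (lower bounds on) class variety
```

Every trial falls far short of the 31 distinct values needed to avoid redundancy, in line with the table above.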
6 Discussion
In this section we discuss some of the potential impact of these results on implementation, security and design.

6.1 Impact on Implementation
The most obvious consequence of these observations is the impact on the minimum size of hardware implementations. Clearly any implementation that used combinatorial logic to implement one S-box (with the time/space tradeoff) could achieve a much smaller size now by implementing only one Boolean function instead of eight. The hardware cost for some additional XORs is much less than the hardware savings; however, the speed of the reduced-size implementation will
be reduced by a factor of 8. We note that very small hardware implementations may be economically suitable for future ubiquitous computing. We note that simplified hardware for the finite field inverse is already known from [15]. Here we point out that that method implements GF(2^n) inversion using nonlinear Boolean logic surrounding a single instance of inversion in GF(2^{n/2}). Our redundancy result holds for that finite field operation also, so our improvement can be applied to that method to achieve further space reductions. We further note that the combined construction can be used to implement inversion over GF(2^16) using only a single Boolean function of 8 inputs, thus making designs using these larger S-boxes easier to implement.

6.2 Impact on Security
The impact on security is more difficult to assess. It takes time for the cryptographic community to consider the many attacks that may be possible. To begin the discussion, we suggest these avenues for cryptanalysts to investigate:

– A distinguishing attack may be possible on reduced-round ciphers using linearly redundant S-boxes. More research is needed to discover just how the surrounding structures influence the equivalence property over multiple rounds. Whenever redundancy persists over several rounds, the cipher does not display random behaviour and could be easily distinguished from random.
– The linear redundancy could be exploited to reduce the plaintext requirements of some existing attack techniques, open the door for new kinds of related-key attacks, or improve the efficiency of other cryptanalyses such as those using (perhaps multiple) non-linear approximations, higher order derivatives, interpolation, the square/integral attack, and algebraic attacks.
– The single formula to represent Rijndael, presented at SAC 2001 [6], is simplified by this result, since the division operation in the continued fraction is really inversion in the finite field. We invite cryptanalysts to investigate how this redundancy affects the complexity of solving the equation from [6].
– Some ciphers, including Rijndael, use an inversion based S-box in the key schedule. We wonder what security consequences this might have! What is the effect of linearly redundant round keys on the effectiveness of linear and differential cryptanalysis?

It seems clear that there are many potential ways to investigate the application of linearly redundant S-boxes to cryptanalysis. We challenge the cryptographic community to find these new attacks or prove they do not exist. Until such proof appears, there must remain some doubt about the security of ciphers using redundant S-boxes, given the extreme non-randomness of these structures.

6.3 Proposal for Tweaking Redundant S-Boxes
In case new attacks are found that exploit linear redundancy in S-boxes, we make the following suggestion for tweaking [9] the S-box of Rijndael, or any
Table 2. Properties of tweaked AES S-box (10,000 trials)

nonlinearity  order  DDT max  frequency
     96         7       8          1
     98         6       8          4
     98         7       6          1
     98         7       8          1
    100         6       6         21
    100         6       8         48
    100         6      10          3
    100         7       6          9
    100         7       8         13
    100         7      10          1
    102         6       6        382
    102         6       8        723
    102         6      10         20
    102         6      12          1
    102         7       6        146
    102         7       8        307
    102         7      10          8
    104         6       6       2037
    104         6       8       2776
    104         6      10         56
    104         6      12          1
    104         7       6        895
    104         7       8       1246
    104         7      10         19
    106         6       6        506
    106         6       8        414
    106         6      10          7
    106         7       6        181
    106         7       8        172
    106         7      10          1
other cipher with a redundant 8×8 S-box. Divide a 128-bit public tweak value into 8 pairs of bytes. Then for each pair, swap the entries in the S-box indexed by the two bytes. This process can be done quickly in software, and our experimental results show that only a few swaps are required to remove the linear equivalence property, and that at worst the tweaked S-box is as good as random with regard to linear and differential probabilities. The designers of Rijndael noted that, given the effectiveness of the wide-trail strategy, a random S-box should be enough to provide security against differential and linear cryptanalysis. The tweaked S-boxes we propose typically have higher nonlinearity than random 8×8 S-boxes. Using this tweak, an 8×8 S-box is altered in (up to) 16 places, which is 1/16 of the S-box. Each round of Rijndael, for example, uses 16 S-boxes, so there is a good probability that these changes have some effect in each round of the encryption. The equivalence class property is removed by this tweak, and if the
key schedule is allowed to use the original S-box, then the per-block encryption speed is not affected by this tweak. Alternatively (if tweaking is not allowed), an extra 128-bit sub-key can be generated from the key schedule and used to tweak the S-box before encryption. We have performed some experiments to discover the distribution of properties in S-boxes generated by this tweak. An iteration of the experiment consists of swapping the outputs for a pair of randomly chosen inputs, and analysing the linearity and class variety of the resulting S-box. In 1000 trials we found that an average of 7.62 iterations was sufficient to eliminate any linear redundancy. The average S-box nonlinearity after this process was found to be 103.90. Table 2 shows the frequency distribution of the final S-box nonlinearity, starting with the Rijndael S-box in all cases. In Appendix C we present the best S-box we have found so far by this method. It is the best from those found to have nonlinearity 106, order 7 and maximum XOR difference distribution table (DDT) value 6. We believe this S-box is suitable as a drop-in replacement for any (currently redundant) 8×8 S-box.
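The tweaking procedure described above can be sketched as follows. This is an illustrative implementation, not the authors' code; the nonlinearity helper exhausts all 255 output masks with a fast Walsh-Hadamard transform, and the starting S-box is a random permutation rather than the AES S-box (whose table we omit here for brevity):

```python
import random

def nonlinearity(sbox, n=8):
    # Minimum distance of all nonzero output combinations to affine functions:
    # nl = 2^{n-1} - max|W| / 2, computed with a fast Walsh-Hadamard transform.
    N = 1 << n
    best = N
    for m in range(1, N):
        t = [1 - 2 * (bin(m & sbox[x]).count("1") & 1) for x in range(N)]
        step = 1
        while step < N:
            for i in range(0, N, 2 * step):
                for k in range(i, i + step):
                    u, v = t[k], t[k + step]
                    t[k], t[k + step] = u + v, u - v
            step *= 2
        best = min(best, (N - max(abs(v) for v in t)) // 2)
    return best

def tweak_sbox(sbox, tweak16):
    # Split a 16-byte tweak into 8 pairs; swap the S-box entries each pair indexes.
    s = list(sbox)
    for k in range(0, 16, 2):
        i, j = tweak16[k], tweak16[k + 1]
        s[i], s[j] = s[j], s[i]
    return s

random.seed(3)
sbox = list(range(256))
random.shuffle(sbox)
tw = [random.randrange(256) for _ in range(16)]
tweaked = tweak_sbox(sbox, tw)
assert sorted(tweaked) == list(range(256))   # the tweak preserves bijectivity
nl0, nl1 = nonlinearity(sbox), nonlinearity(tweaked)
print(nl0, nl1)
```

Because each swap only moves two table entries, the tweak preserves bijectivity, and applying the byte pairs in reverse order undoes it exactly.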
7 Conclusion
The property of linear redundancy in S-boxes has been introduced, following the discovery of that property in the S-box of Rijndael. Moreover, it is now proven that finite field representations over a polynomial basis have exponentiation operations (including inversion) that display linear redundancy in all their component Boolean functions and all of their linear combinations. This kind of S-box is used by many recently proposed ciphers. The immediate impact of these discoveries on the design, implementation and cryptanalysis of ciphers using redundant mappings has been discussed, but clearly much more needs to be done on this topic. A non-randomness property is defined, and we propose a way to remove the redundancy from affected S-boxes using some key material or a tweak value. A highly nonlinear and non-redundant 8×8 S-box has been provided as a possible replacement.
Acknowledgements

The authors would like to thank Ed Dawson, Lauren May and Matt Henricksen for encouragement, assistance and helpful discussions. We are also grateful to the anonymous reviewers whose comments helped improve the presentation of this paper. We also give thanks to David Wagner for permission to present his original proof of redundancy for finite field inversion.
References 1. E.R. Berlekamp and L.R. Welch. Weight Distributions of the Cosets of the (32, 6) Reed-Muller Code. IEEE Transactions on Information Theory, 18(1):203–207, January 1972. 2. D. Coppersmith. Personal communication, September 2002.
3. J. Daemen, L. Knudsen, and V. Rijmen. The Block Cipher SQUARE. In Fast Software Encryption, 1997 Haifa Workshop, LNCS, pages 149–165, 1997.
4. J. Daemen and V. Rijmen. AES Proposal: Rijndael.
5. J.D. Denev and V.D. Tonchev. On the Number of Equivalence Classes of Boolean Functions under a Transformation Group. IEEE Transactions on Information Theory, 26(5):625–626, September 1980.
6. N. Ferguson, R. Schroeppel, and D. Whiting. A Simple Algebraic Representation of Rijndael. In Proceedings of Selected Areas in Cryptography, SAC '01, Lecture Notes in Computer Science, page 103, 2001.
7. J. Fuller and W. Millan. Linear Redundancy in the AES S-box, August 2002. Manuscript 2002/111 on the IACR ePrint Archive.
8. E. Garrido. Personal communication, August 2002.
9. M. Liskov, R. Rivest, and D. Wagner. Tweakable Block Ciphers. In Advances in Cryptology - Crypto '02, Proceedings, Lecture Notes in Computer Science, page 31, 2002.
10. J.A. Maiorana. A Classification of the Cosets of the Reed-Muller Code R(1, 6). Mathematics of Computation, 57(195):403–414, July 1991.
11. S. Mister. Analysis of the Building Blocks of Serpent, 2000.
12. National Bureau of Standards (U.S.). Data Encryption Standard (DES). Federal Information Processing Standards, 1977.
13. E. Pasalic, T. Johansson, S. Maitra, and P. Sarkar. New Constructions of Resilient and Correlation Immune Boolean Functions Achieving Upper Bounds on Nonlinearity, 2001.
14. V. Rijmen, J. Daemen, B. Preneel, A. Bosselaers, and E. De Win. The Cipher SHARK. In Fast Software Encryption, 1996 Cambridge Workshop, LNCS, pages 99–111, 1996.
15. V. Rijmen. Efficient Implementation of the Rijndael S-box. Presented at an AES conference.
16. C.E. Shannon. Communication Theory of Secrecy Systems. Bell Systems Technical Journal, 28:656–715, 1949.
17. D. Wagner. Personal communication, August 2002.
18. The New European Schemes for Signatures, Integrity and Encryption (NESSIE) process maintains a web-site via http://www.cryptonessie.org.
19. The CRYPTREC process has a web-site at http://www.ipa.go.jp/security/enc/CRYPTREC.
20. The South Korean standards process has a web-site with downloads at http://www.kisa.or.kr/seed/algorithm.htm.
Appendix A – Rijndael Equivalence Relationships

The AES S-box functions are b_i(x) = 1 & (AES[x] >> (i - 1)), for 1 ≤ i ≤ 8.

b2(x) = b1(D12 x)
b3(x) = b1(D13 x) ⊕ 1
b4(x) = b1(D14 x) ⊕ 1
b5(x) = b1(D15 x) ⊕ 1
b6(x) = b1(D16 x)
b7(x) = b1(D17 x)
b8(x) = b1(D18 x) ⊕ 1

[The entries of the 8×8 binary matrices D12 through D18 were typeset here; they are not recoverable from this text extraction.]
Appendix B – Inverse Equivalence Relationships

[The entries of the corresponding 8×8 inverse matrices D21, D31, ..., D81 were typeset here; they are not recoverable from this text extraction.]
Appendix C – A Replacement S-Box

The following bijective S-box has nonlinearity 106 and algebraic order 7. It contains no fixed points and no linear redundancy. The S-box has a DDT maximum of 6. The distribution of properties over all 255 XOR combinations of the S-box output functions is shown in Tables 3 and 4.

SBox[256] = {
63, 7C, 77, DD, F2, 6B, 6F, C5, 30, 01, 67, 2B, FE, D7, AB, 76,
CA, 82, C9, 7D, FA, 59, 47, F0, AD, D4, A2, AF, 9C, A4, 72, C0,
B7, FD, 93, 26, 36, 3F, F7, CC, 34, A5, E5, F1, 71, D8, 31, 17,
04, C7, 23, C3, 18, 96, 05, 9A, 07, 12, 80, E2, EB, 27, B2, 75,
09, 83, 2C, 1A, 1B, 6E, 10, A0, 52, 3B, D6, B3, 29, 74, 2F, 84,
53, D1, 00, ED, 20, FC, B1, 5B, 6A, CB, BE, 39, 4A, 4C, 58, CF,
D0, EF, AA, FB, 43, 4D, 56, 85, 45, F9, 02, 7F, 50, 3C, 9F, A8,
51, A3, 40, 8F, 92, 9D, 38, F5, BC, B6, DA, 21, 15, FF, F3, D2,
CD, 0C, 13, EC, 5F, 97, 44, 5A, C4, A7, 7E, 3D, 64, 5D, 19, 73,
60, 81, 4F, DC, 22, 2A, 90, 88, 46, EE, B8, 14, DE, 5E, 0B, DB,
E0, 32, 3A, 0A, 49, 06, 24, 5C, C2, D3, AC, 62, 91, 95, E4, 79,
E7, C8, 16, 6D, 8D, D5, 4E, A9, 6C, 33, F4, EA, 65, 7A, AE, 08,
BA, 78, 25, 2E, 1C, A6, B4, C6, E8, 7B, E3, 1F, 4B, BD, 8B, 8A,
70, 3E, B5, 66, 48, 03, F6, 0E, 61, 35, 57, B9, 86, C1, 1D, 9E,
E1, F8, 98, 11, 69, D9, 8E, 94, 9B, 1E, 87, E9, CE, 55, 28, DF,
8C, A1, 89, 0D, BF, E6, 42, 68, 41, 99, 2D, 0F, B0, 54, BB, 37 }
Table 3. Frequency distribution of S-box nonlinearity

nonlinearity  frequency
    106           8
    108          76
    110         147
    112          24

Table 4. Frequency distribution of S-box maximum autocorrelation

maximum autocorrelation  frequency
          32                 1
          40                93
          48               134
          56                27
Loosening the KNOT

Antoine Joux and Frédéric Muller

DCSSI Crypto Lab, 18 rue du Docteur Zamenhof, F-92131 Issy-les-Moulineaux Cedex, France
{Antoine.Joux,Frederic.Muller}@m4x.org

Abstract. In this paper, we present differential attacks on the self-synchronizing stream cipher KNOT. Our best attack recovers 96 bits of the secret key with time complexity of 2^62 and requires 2^40 chosen ciphertext bits.
1 Introduction
KNOT is a self-synchronizing (also called asynchronous) stream cipher (SSSC) proposed by Joan Daemen, René Govaerts and Joos Vandewalle [4] in 1992. In that paper, the authors discussed the design of hardware and software optimized asynchronous stream ciphers and proposed KNOT as an example of such designs. KNOT uses a 96-bit secret key and has a 128-bit internal state. This paper presents differential attacks on KNOT. Differential attacks are cryptanalytic tools that can be applied against many kinds of secret key cryptosystems. In particular, they have been successfully applied to many block ciphers including DES [2], RC5 [6, 7] and other DES-like cryptosystems [1]. The principle of a differential attack is to introduce controlled perturbations in the input of an encryption or decryption function and to observe the corresponding modifications induced on the output. This might allow an attacker to retrieve secret information (for instance subkeys or internal state values). In this paper, we will modify the ciphertext fed into the decryption function of KNOT and observe the corresponding plaintext. Hence our attack is a chosen ciphertext attack. In Section 2 we present general concepts about self-synchronizing stream ciphers. Section 3 describes the cipher KNOT. Then, Section 4 presents a basic attack on KNOT with time complexity 2^69 that requires 2^39 chosen ciphertext bits. This differential attack is improved in Section 5 in order to obtain time complexity 2^62 and data complexity 2^40.
2 Self-synchronizing Stream Ciphers
A self-synchronizing stream cipher is one in which the keystream bit is a function of the key and a fixed number m of previous ciphertext bits. This parameter m is called the memory of the cipher. Let x_t denote plaintext bit number t, y_t the corresponding ciphertext bit and w_t the corresponding keystream bit. The encryption can be described as

y_t = x_t ⊕ w_t

T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 87–99, 2003.
© International Association for Cryptologic Research 2003
Fig. 1. Canonical representation of a self-synchronizing stream cipher
where the keystream bit is computed as

w_t = F(y_{t-1}, ..., y_{t-m}, K)

Here F denotes the keystream function and K the secret key. Without loss of generality, the encryption and decryption functions of a self-synchronizing stream cipher can be represented as in Fig. 1. The basic idea behind SSSCs is to encrypt each plaintext bit with an encryption function depending only on the secret key and the previous m ciphertext bits. Therefore each ciphertext bit can be correctly deciphered as long as the previous m ciphertext bits have been successfully received. This self-synchronization mechanism has many advantages from an engineering point of view. For instance, it may be helpful in contexts where no lower layer of protocol is present to assure error-correction. In particular, it prevents long bursts of errors when a bit insertion or deletion occurs during the transmission of the ciphertext, which may be a problem, for instance, when using a block cipher. From a security point of view, such ciphers also have some advantages compared to other kinds of secret key cryptosystems. First, since each plaintext bit influences the entire subsequent ciphertext, SSSCs are more likely to be resistant against attacks based on plaintext statistical properties or plaintext redundancies. In the case of synchronous additive stream ciphers, some ciphertext-only attacks have been proposed (see [5]) based only on partial information about the plaintext. In various cases, the entropy of the plaintext is greatly decreased, for instance when a 7-bit ASCII representation is used or when it contains English text. Furthermore, self-synchronizing stream ciphers are more likely than synchronous ones to detect single digit modifications in the ciphertext. While such a modification only implies a single digit error in the deciphered plaintext for a synchronous additive stream cipher, up to m ciphertext bits may be incorrectly decrypted in the case of a SSSC.
This mechanism provides additional security against active attacks. However, insertion, deletion or replay of ciphertext is still
possible against a SSSC and is very difficult to detect. This shows that, in spite of their nice properties, SSSCs cannot guarantee data integrity and authentication without the use of additional mechanisms. The most commonly used SSSCs are based on block ciphers used in 1-bit Cipher FeedBack (CFB) mode [9]. Such modes are usually quite inefficient in terms of encryption speed, since one block cipher operation is needed in order to encrypt one bit. Moreover, it has been shown that reducing the number of rounds in order to increase encryption speed is not always a good idea [10]. The literature about SSSCs is very limited compared to the literature about synchronous additive stream ciphers. However, general properties and design criteria for SSSCs have been studied by Maurer [8]. Besides, in [4], the authors discuss the design of SSSCs from an engineering point of view and propose KNOT as an example of such efficient ciphers. An improved version of KNOT, called ΓΥ, was also proposed by Daemen in [3]. Unfortunately, the attacks we propose in this paper are based on specific properties of KNOT and hence do not apply to ΓΥ. Therefore, we will focus only on KNOT in the following sections.
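The canonical SSSC construction of Fig. 1 can be illustrated with a toy cipher. In the sketch below (our illustration), the keystream function is an arbitrary stand-in built from SHA-256, not KNOT's F; the point is only the self-synchronization behaviour: a single flipped ciphertext bit corrupts at most m + 1 consecutive plaintext bits, after which decryption resynchronizes:

```python
import hashlib
import random

M = 16  # the cipher's memory: w_t depends on the previous M ciphertext bits

def keystream_bit(prev_bits, key):
    # Toy keystream function F(y_{t-1}, ..., y_{t-M}, K); an arbitrary
    # stand-in for a real SSSC keystream function, built from a hash.
    return hashlib.sha256(bytes([key]) + bytes(prev_bits)).digest()[0] & 1

def encrypt(bits, key):
    hist, out = [0] * M, []
    for x in bits:
        y = x ^ keystream_bit(hist, key)
        out.append(y)
        hist = hist[1:] + [y]   # the cipher state is just the last M ciphertext bits
    return out

def decrypt(bits, key):
    hist, out = [0] * M, []
    for y in bits:
        out.append(y ^ keystream_bit(hist, key))
        hist = hist[1:] + [y]
    return out

random.seed(5)
pt = [random.randint(0, 1) for _ in range(200)]
ct = encrypt(pt, 0x42)
ct[100] ^= 1                       # a single-bit transmission error
rec = decrypt(ct, 0x42)
errs = [t for t in range(200) if rec[t] != pt[t]]
# The error burst starts at the flipped position and dies out after M bits.
assert errs and min(errs) == 100 and max(errs) <= 100 + M
```

This also shows why an SSSC cannot provide integrity by itself: the receiver recovers correct plaintext after the burst with no indication that the ciphertext was tampered with.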
3 Description of KNOT

3.1 Overview of the Cipher
In [4], the authors propose to build efficient SSSCs in analogy with a block cipher function. In order to limit the gate delay during the computation of the keystream function, they suggest using a construction with several simple rounds, similar to what is widely done in block cipher design. Such a structure should also allow efficient pipelining. For example, in KNOT, they propose a keystream function with 8 rounds that use only basic Boolean operations. After each round, the size of the internal state decreases, in order to eventually produce 1 output bit from the m input bits of the keystream function. This multi-stage construction is chosen in order to protect the cipher against differential attacks. Hence, these stages must guarantee that no difference in the input of the keystream function propagates to the output with non-negligible probability. These difference-propagation properties have been analyzed by the designers of KNOT. They claim that no difference pattern in the input of the keystream function implies a difference pattern in the output with probability greater than 2^{-16}. To improve confusion and diffusion of ciphertext and key bits, they propose to use a finite state machine in the upper stage instead of a simple shift register. In KNOT, this machine is based on the structure of a nonlinear register with a key-dependent evolution. This evolution mechanism has some special properties so that the self-synchronization mechanism works correctly. Furthermore, the ciphertext bit is introduced at different positions in this machine, in order to modify the internal state very quickly when a difference in the ciphertext is introduced. This machine is described more precisely in Section 3.2. The general structure of the KNOT keystream function is represented in Fig. 2.

Antoine Joux and Frédéric Muller

Fig. 2. The KNOT keystream function

3.2 The Finite State Machine
Let Q denote the finite state machine used in the keystream function of the cipher KNOT. Q has 128 memory cells, whose values depend on the 96-bit secret key K and on the last 96 ciphertext bits introduced into the machine. Thus, the content of Q at time t can be expressed as

Q(t) = G(y_t, ..., y_{t-95} | k_0, ..., k_95)

where y_t denotes the ciphertext bit at time t and k_i denotes the i-th key bit. The structure of Q is based on a nonlinear shift register. Accordingly, Q may be seen as a 96-bit register in which some of the memory cells have been duplicated. Hence, the memory cells of Q are sorted into 96 sets denoted Q_1, ..., Q_96, where each set Q_i may be seen as an extension of the i-th cell of the analogous nonlinear shift register (see Fig. 3). Each set Q_i contains n_i cells; let Q_{i,j} denote its j-th cell. Thus

Q_i = {Q_{i,j}, 0 ≤ j < n_i}.

If a standard shift register were used, a single-digit difference in the input introduced at time t - 95 would imply only one differing cell at time t, namely the rightmost cell of the register. This might be a problem when considering differential attacks. It was avoided in KNOT by duplicating the rightmost cells of the register. Therefore Q can be seen as a nonlinear shift register in which the rightmost cells have been expanded. In analogy with a shift register, the value of each memory cell in Q is updated using only memory cells located to the left of this cell. More precisely, Q_i is
Fig. 3. The finite state machine Q with expansion of the rightmost cells
updated at each instant using only the values of the sets Q_j with j < i. This updating is actually key-dependent, which means that the new value of Q_i is also computed using several secret key bits. Hence

Q_i(t) = H_i(Q_1(t-1), ..., Q_{i-1}(t-1) | k_0, ..., k_{i-1}).

Let z_0^t, ..., z_95^t denote the input bits of the keystream function at time t, where z_95^t is the oldest input and z_0^t is the latest input. These input bits are related to the ciphertext bits by
z_i^t = y_{t-i}.

When no confusion is possible, we drop the dependence on t and denote these inputs by z_0, ..., z_95. Thus, the content of the finite state machine Q can also be expressed as

Q(t) = G(z_0^t, ..., z_95^t | k_0, ..., k_95).

In order to progressively eliminate the oldest input bits, each Q_i does not actually depend on all input bits. In fact, the value of the cells in Q_i depends only on

– the first i secret key bits, k_0, ..., k_{i-1};
– the last i ciphertext bits introduced, z_0, ..., z_{i-1}.

This can be summarized by

Q_i(t) = G_i(z_0^t, ..., z_{i-1}^t | k_0, ..., k_{i-1}).
To guarantee the propagation of differences in KNOT, these functions G_i were chosen of a particular form:

G_i(z_0^t, ..., z_{i-1}^t | k_0, ..., k_{i-1}) = z_{i-1}^t ⊕ G_i(z_0^t, ..., z_{i-2}^t | k_0, ..., k_{i-1}).

Thus, if z_{i-1}^t is flipped, all cells in Q_i are also flipped. Actually, this property holds only for i < 96. The behavior of the update function G_96 is different. Indeed, flipping z_95 does not always cause the cells of Q_96 to flip as well. Thus different ciphertexts may yield the same internal state, which is the essential point of the key recovery attack we propose in this paper. A precise description of the update function of Q is given in Table 1. The explicit value of each G_i function can be directly derived from this table. For each Q_{i,j}, the update function is denoted by
Q_{i,j}^t = f_i(a_i, b_i, c_i, d_i)

where f_i is a 4-input Boolean function (with arithmetic modulo 2) chosen among

g(a, b, c, d) = a + b + c(d + 1) + 1
h(a, b, c, d) = a(b + 1) + c(d + 1)
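The two Boolean functions above can be checked directly; the code below enumerates their truth tables, reading + as XOR and multiplication as AND (arithmetic over GF(2)).

```python
# Truth tables of the two 4-input Boolean functions used in the update of Q:
#   g(a,b,c,d) = a + b + c(d+1) + 1   and   h(a,b,c,d) = a(b+1) + c(d+1),
# with + and * taken modulo 2 (XOR / AND).
from itertools import product

def g(a, b, c, d):
    return a ^ b ^ (c & (d ^ 1)) ^ 1

def h(a, b, c, d):
    return (a & (b ^ 1)) ^ (c & (d ^ 1))

tg = [g(*bits) for bits in product((0, 1), repeat=4)]
th = [h(*bits) for bits in product((0, 1), repeat=4)]
```

From the tables one sees that g is balanced (8 ones out of 16 inputs) while h is not (6 ones out of 16).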
3.3 The Pipelined Stages
When computing the new keystream bit using the keystream function, the content of Q is updated first. This is followed by 8 rounds of transformations that decrease the size of the internal state until one output bit is produced. In KNOT, these round transformations are not key-dependent. The intermediate values can be represented by 7 registers R_1, ..., R_7, whose values are computed from Q using the functions described in Table 2. The update functions ψ and τ are defined by

ψ(X) = Y : y_i = g(x_{6i}, x_{6i+3}, x_{6i+1}, x_{6i+2})
τ(X) = Y : y_i = g(x_{5i}, x_{5i+3}, x_{5i+1}, x_{5i+2})

where the indices of x are taken modulo the length of X and the Boolean function g is the same as in the update function of Q. The 8-th round consists of the computation of the keystream bit

w_t = R_7(0) + R_7(1)(R_7(2) + 1) + 1

where R_7(0), R_7(1), R_7(2) are the first 3 bits of register R_7. These registers are used only as intermediate values in the computation of the keystream bit; they cannot be considered part of the internal state of the cipher, since they are not reused afterwards. Therefore the effective size of the internal state of KNOT (excluding the secret key bits) is only 128 bits.
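The pipeline can be sketched as follows. The output lengths of ψ and τ are inferred from Table 2 (ψ halves the register, τ preserves its length), and the bit-list representation is ours; everything else follows the formulas above.

```python
# The 7 pipelined rounds of the KNOT keystream function, plus the output round.
# Output lengths of psi/tau are INFERRED from Table 2 (psi halves the length,
# tau preserves it); registers are modeled as Python lists of bits.

def g(a, b, c, d):
    return a ^ b ^ (c & (d ^ 1)) ^ 1   # g(a,b,c,d) = a + b + c(d+1) + 1 mod 2

def psi(x):
    n = len(x)
    return [g(x[6*i % n], x[(6*i + 3) % n], x[(6*i + 1) % n], x[(6*i + 2) % n])
            for i in range(n // 2)]

def tau(x):
    n = len(x)
    return [g(x[5*i % n], x[(5*i + 3) % n], x[(5*i + 1) % n], x[(5*i + 2) % n])
            for i in range(n)]

def keystream_bit(q):
    # q: the 128 memory cells of Q; apply the rounds R1..R7, then the 8th round.
    r = q
    for stage in (psi, tau, psi, tau, psi, tau, psi):
        r = stage(r)
    return r[0] ^ (r[1] & (r[2] ^ 1)) ^ 1   # w = R7(0) + R7(1)(R7(2)+1) + 1
```

Running the stages on a 128-bit input reproduces the register lengths 64, 64, 32, 32, 16, 16, 8 of Table 2.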
Table 1. The update function of Q
3.4 Known Weakness of KNOT
In his PhD thesis [3], Joan Daemen reported that "the output of KNOT has a detectable imbalance". We checked this bias experimentally and observed that the keystream produced for a uniformly distributed input is biased by ε ≈ 2^{-9}: the probability of observing 1 as output is (1/2)(1 + ε). This bias results from an imbalance in the last stages of the keystream function. More precisely, when expressing the output as a function of the intermediate register R_5 as

w_t = θ(R_5^t(0), ..., R_5^t(15)),

where R_5^t(i) is the i-th bit of register R_5 at time t, we observe that θ can be written as the exclusive-or of 10 terms. The first term is the constant 1; the other 9 terms are polynomials of degree 2 or higher in the R_5^t(i). Therefore each term has probability 3/4 of having value 1, and a bias of about 2^{-9} in the output is expected. By exhaustive search, we observed that the output is
Table 2. State-transition functions

Stage  Length  Name  State transition
  1      64     R1       ψ(Q)
  2      64     R2       τ(R1)
  3      32     R3       ψ(R2)
  4      32     R4       τ(R3)
  5      16     R5       ψ(R4)
  6      16     R6       τ(R5)
  7       8     R7       ψ(R6)
1 for 32832 out of the 65536 possible inputs of the function θ. Then, assuming that the content of register R_5 is balanced, we expect a bias

ε = 2 × 32832/65536 − 1 = 2^{-9}.
This observation gives a distinguisher on the cipher requiring about (2^9)^2 = 2^18 pairs of plaintext-ciphertext bits.
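The bias computation above is a one-liner to verify:

```python
# Bias of the output function theta and the resulting distinguisher cost,
# using the counts stated in the text.
ones = 32832          # inputs of theta mapping to 1 (found by exhaustive search)
total = 2 ** 16       # 65536 possible inputs
eps = 2 * ones / total - 1          # = 2^-9
samples = round((1 / eps) ** 2)     # ~(1/eps)^2 pairs needed to detect the bias
```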
4 A Key Recovery Attack
In this section, we propose a chosen-ciphertext attack. The basic idea behind our attack is that introducing two different ciphertext sequences into the finite state machine Q may induce a collision on its whole internal state with non-negligible probability when the difference between these ciphertexts has a particular form. Therefore, the keystream bits produced by these ciphertexts are equal with probability slightly larger than 1/2. Thus, by observing the keystream produced for chosen ciphertexts, we are able to recover some secret information concerning the internal values of Q. This information then allows us to recover the secret key with less computation than an exhaustive search.

4.1 Obtaining Collisions on Q
In order to obtain collisions on the internal state of the finite state machine Q, we compute the value of the keystream function for two different inputs Z and Ẑ of the form Z = (z_0, ..., z_94, 0) and Ẑ = (z_0, ..., z_94, 1). Concretely, this is done by introducing into the finite state machine two sequences of 96 ciphertext bits that differ only in the first bit introduced. After the 96-th bit is introduced, we observe the internal state of the system. These inputs yield two internal states, which we denote by Q and Q̂, and the two corresponding keystream bits are denoted by w and ŵ. Every set Q_i can be computed by a function G_i of the form

Q_i = G_i(z_0, ..., z_{i-1} | k_0, ..., k_{i-1}).
It appears that no set Q_i with i ≤ 95 depends on z_95. Since all other input bits are the same in both executions, the only memory cells that might differ between Q and Q̂ are those in Q_96. Thus, to obtain a collision between Q and Q̂, it suffices to find a collision between Q_96 and Q̂_96. The authors of [4] claimed that each set Q_i can generally be computed at time t as

Q_i^t = z_{i-1}^t ⊕ G_i(z_0^t, ..., z_{i-2}^t | k_0, ..., k_{i-1}).   (1)
Actually, we observed that this property does not hold for i = 96. More precisely, unlike all the other update functions, the update of Q_96 does not depend linearly on Q_95. At time t, all cells of Q_96 can be computed as

Q_{96,i}^t = Q_{95,j}^{t-1} · X ⊕ Y,  0 ≤ i ≤ 15,

for some j, where X and Y are two binary values that depend only on the secret key and on the contents of Q_1^{t-1}, ..., Q_94^{t-1}. From (1), it follows that Q_95^{t-1} depends linearly on the flipped input bit z_95, while X and Y depend only on the other input bits z_0, ..., z_94. Hence, in spite of the flipped input z_95, the memory cells of Q_96 may still have the same value, depending on the value of X and therefore on the value of the 95 other input bits. We assume each such event is balanced and occurs with probability 1/2. Thus, each cell in Q_96 is equal in both executions with probability 1/2. Since Q_96 contains 16 cells, Q and Q̂ are equal with probability

(1/2)^16 = 2^{-16}.

The existence of such collisions implies that w and ŵ are equal with a bias ε = 2^{-16}. This observation gives a distinguisher on the cipher. However, collisions can be used more efficiently to recover some secret information about the internal state of Q.

4.2 Recovering Internal State Values
In the previous section, we showed how to obtain internal state collisions on Q for two different introduced ciphertexts. Such collisions occur at time t when the internal state of the finite state machine at time t - 1 has certain particular properties. Thus, using these collisions, we are able to recover some secret information about the internal state of Q. Observing the relevant properties of the memory cells in Q_96, we divide them into two parts.

– First part: Q_{96,0}, ..., Q_{96,7}. The update of these cells depends only on the sets Q_i with 90 ≤ i ≤ 95, which are located in the rightmost part of Q. Thus, the equality of these 8 cells in both executions at time t depends on all the other input bits z_0, ..., z_94. We assume that each of these events occurs with probability 1/2.
– Second part: Q_{96,8}, ..., Q_{96,15}. The update of these cells depends on Q_95 and Q_94, but also on several Q_i located further to the left in Q. It can be shown that the equality of each of these 8 cells at time t depends respectively on the value of Q_69, ..., Q_76 at time t - 1. More precisely, equalities occur when Q_{69,0}^{t-1} = 1, ..., Q_{76,0}^{t-1} = 1. These events depend only on the input bits z_1, ..., z_76, and we assume each equality occurs with probability 1/2.

According to these observations, we fix the values of z_1, ..., z_76. With probability 2^{-8}, this induces a collision on the second part of Q_96. Then, we try all possible values of the 19 remaining input bits (z_0 and z_77, ..., z_94). This does not change anything concerning the second part but, in addition, we also get a collision on the first part of Q_96 with probability 2^{-8}. By observing the number of equalities in the keystream bits produced, we should be able to observe the expected bias and thus to detect when a collision on Q occurs. Obtaining such a collision means that the values chosen for z_1, ..., z_76 yield value 1 for the cells Q_{69,0}, ..., Q_{76,0}. Besides, relation (1) shows that these internal values depend only on the first 76 secret key bits. Hence, by observing some internal collisions on Q, we recover the internal values of 8 cells of the finite state machine for known inputs. These values depend only on the first 76 key bits, which allows us to mount a key recovery attack.

4.3 Practical Realization of this Attack
In order to recover some secret information concerning the internal state of Q, we want to observe a bias ε = 2^{-8} in the number of equal keystream bits produced. Since 19 ciphertext bits are randomized, we get Ω = 2^19 experiments. Since

Ω > 1/ε^2,

this bias can generally be detected efficiently. Unfortunately, these Ω experiments are not independent. It appears that among the 19 ciphertext bits we randomize, 2 bits have no influence on the occurrence of internal state collisions:

– the latest input to the keystream function, z_0. This ciphertext bit is not immediately introduced into Q_96; thus, collisions between Q and Q̂ occur or not independently of the value of z_0;
– the second oldest input bit, z_94. Collisions on the first part of Q_96 at time t depend on the values of several internal cells at time t - 1 that do not actually depend on z_94; thus this input bit has no effect on the occurrence of collisions.

Therefore, only 2^17 independent experiments can be obtained; randomizing z_0 and z_94 only causes some repeated experiments with respect to collisions. With only 2^17 experiments, the statistical bound is very tight, and it is likely that false alarms will be raised when trying to detect collisions.
However, the initial 2^19 experiments can be grouped into 2^17 sets of 4 repeated experiments, according to the remaining 17 randomizable bits. If the 76 fixed ciphertext bits yield a collision on the second part of Q_96, a collision on the first part of Q_96, and hence on Q, should occur with probability 2^{-8}. In case of a full collision, 4 keystream equalities should be observed in the corresponding group of experiments. Otherwise, no internal state collision is possible and we expect balanced results concerning keystream equalities. Denoting by 0 the case of keystream equality and by 1 the other case, the following distribution is expected in each group:

Table 3. Expected distributions

Events        (0,0,0,0)                  (0,0,0,1)          ...   (1,1,1,1)
Collision     1/2^8 + (1 − 1/2^8)·1/16   (1 − 1/2^8)·1/16   ...   (1 − 1/2^8)·1/16
No collision  1/16                       1/16               ...   1/16
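The distribution in Table 3 can be written out and sanity-checked; the encoding of the 16 outcome patterns as integers 0–15 (with 0 standing for (0,0,0,0)) is ours.

```python
# Expected distributions of Table 3 over the 16 outcomes of a group of
# 4 repeated experiments; pattern 0 encodes (0,0,0,0).
p = 2 ** -8   # probability of a full internal-state collision for the group

dist_collision = {pattern: (1 - p) / 16 for pattern in range(16)}
dist_collision[0] += p                 # all collision mass lands on (0,0,0,0)
dist_no_collision = {pattern: 1 / 16 for pattern in range(16)}
```

Both rows sum to 1, and the over-representation of (0,0,0,0) in the collision case is (15/16)·2^{-8}, which is the bias the attack must detect.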
The event (0,0,0,0) should be largely over-represented in case of a collision on the second part of Q_96. The bias is still about ε = 2^{-8} and the number of possible tests is Ω = 2^17. However, such a bias is easier to detect on a 16-valued distribution than on a two-valued one. Using this technique, internal state collisions on Q can be detected efficiently, and some secret information about the internal values of Q is recovered as described in the previous section.

4.4 Recovering the Key
The recovered internal state values yield verifiable equations on the key bits k_0, ..., k_75. Therefore, when enough equations of this type are obtained, an exhaustive search on these 76 key bits is possible, using the equations as a stopping condition for the search. Generally, at least 76 equations are necessary for such an exhaustive search to work. However, each collision found as described in the previous section gives only 8 conditions on key bits; thus at least 10 collisions are needed. Moreover, finding such a collision requires running the keystream function with M chosen ciphertext bits, where

M = 2^19 × 2 × 96 × (1/p_succ)

and the probability p_succ of a collision on the second part is roughly equal to 2^{-8}. Thus M = 2^{34.6}, and hence the data complexity of our attack is 10 × M = 2^39. Basically, the exhaustive search on the 76 secret key bits has a complexity of 2^76. In fact, it is obviously better to verify the conditions on Q_69 first, in order to guess only 69
key bits. After the size of the key space has been reduced using the conditions on Q_69, the other key bits can be guessed successively. Using this basic optimization, our attack has a time complexity of 2^69.
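The complexity figures above follow from straightforward arithmetic:

```python
# Data complexity of the basic key recovery attack, using the figures from
# the text: 2^19 experiments, two 96-bit chosen ciphertexts per experiment,
# repeated 1/p_succ times until a second-part collision occurs.
import math

p_succ = 2 ** -8                            # collision on the second part of Q96
M = 2 ** 19 * 2 * 96 * round(1 / p_succ)    # chosen ciphertext bits per collision
collisions_needed = 10                      # 8 key-bit conditions each, 76 needed
total_bits = collisions_needed * M
```

log2(M) ≈ 34.58, matching the 2^{34.6} stated above; the time complexity 2^69 comes from guessing the first 69 key bits and checking the conditions on Q_69 first.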
5 An Improved Key Recovery Attack
In this section, we present a way to make our attack more efficient. We describe how to obtain more useful information about the internal state of Q, in order to decrease the complexity of the exhaustive search on the secret key bits. The basic idea is to recover information about cells located further to the left in Q, using the internal state collisions found in the previous section. This information depends on fewer key bits, which makes the exhaustive search faster. Basically, we are able to recover some internal values of Q_{62,0}, similarly to what we previously obtained for Q_{69,0}, ..., Q_{76,0}. In the previous section, we showed how to detect when these 8 cells simultaneously have value 1. In fact, it is easy to use this attack to observe when Q_{69,0} has value 1, by randomizing the 7 additional input bits z_70, ..., z_76 as well. Flipping these input bits does not change the value of Q_{69,0}. If Q_{69,0} = 1, the octet (Q_{69,0}, ..., Q_{76,0}) equals (1, ..., 1) once among the 2^7 experiments; otherwise, this event never happens. Since it can be detected as before, we finally learn the value of

Q_{69,0} = G_{69,0}(z_0, ..., z_68 | k_0, ..., k_68)

for any ciphertext. Each query to this function requires the same amount of data as the search for a collision, i.e., 2^{34.6} chosen input bits introduced into the keystream function. To obtain some information about the internal state, we observe

a_0 = G_{69,0}(z_0, 0, z_2, ..., z_68 | k_0, ..., k_68)
a_1 = G_{69,0}(z_0, 1, z_2, ..., z_68 | k_0, ..., k_68).

Using various properties of the propagation of ciphertext differences in the finite state machine, similarly to what we did in the previous section, it can be shown that a_0 = a_1 at time t if and only if Q_{63,0}^{t-1} = Q_{62,0}^{t-2} = 1. This information can be used as a stopping condition in the exhaustive search of the first 62 key bits. Only a few conditions of this type are needed in order to reduce the key space.
Additional conditions on Q_{69,0}, ..., Q_{76,0} can then be used to recover the correct key bits. The resulting attack has a data complexity of 2^40 and a time complexity of 2^62.
6 Conclusion
In this paper, we presented differential attacks on the stream cipher KNOT. The best attack has time complexity 2^62 and requires the production of 2^40 keystream bits. An open question is to analyze the asynchronous stream cipher ΓΥ, proposed by Daemen in [3] as an improved version of KNOT.
References

1. E. Biham and A. Shamir. Differential cryptanalysis of DES-like cryptosystems. In A.J. Menezes and S.A. Vanstone, editors, Advances in Cryptology – Crypto'90, volume 537 of Lecture Notes in Computer Science, pages 2–21. Springer-Verlag, 1990. Extended abstract.
2. E. Biham and A. Shamir. Differential cryptanalysis of the full 16-round DES. In E.F. Brickell, editor, Advances in Cryptology – Crypto'92, volume 740 of Lecture Notes in Computer Science, pages 487–496. Springer-Verlag, 1992.
3. J. Daemen. Cipher and hash function design. Strategies based on linear and differential cryptanalysis. PhD thesis, March 1995. Chapter 9.
4. J. Daemen, R. Govaerts, and J. Vandewalle. A practical approach to the design of high speed self-synchronizing stream ciphers. In Singapore ICCS/ISITA '92, pages 279–283. IEEE, 1992.
5. S. Fluhrer, I. Mantin, and A. Shamir. Weaknesses in the key scheduling algorithm of RC4. In S. Vaudenay and A.M. Youssef, editors, Selected Areas in Cryptography – 2001, volume 2259 of Lecture Notes in Computer Science, pages 1–24. Springer-Verlag, 2001.
6. B. Kaliski and Y.L. Yin. On differential and linear cryptanalysis of the RC5 encryption algorithm. In D. Coppersmith, editor, Advances in Cryptology – Crypto'95, volume 963 of Lecture Notes in Computer Science, pages 171–184. Springer-Verlag, 1995.
7. L. Knudsen and W. Meier. Improved differential attacks on RC5. In N. Koblitz, editor, Advances in Cryptology – Crypto'96, volume 1109 of Lecture Notes in Computer Science. Springer-Verlag, 1996.
8. U.M. Maurer. New approaches to the design of self-synchronizing stream ciphers. In D.W. Davies, editor, Advances in Cryptology – Eurocrypt'91, volume 547 of Lecture Notes in Computer Science, pages 458–471. Springer-Verlag, 1991.
9. National Bureau of Standards, U.S. DES modes of operation, 1980.
10. B. Preneel, M. Nuttin, V. Rijmen, and J. Buelens. Cryptanalysis of the CFB mode of the DES with a reduced number of rounds. In D.R. Stinson, editor, Advances in Cryptology – Crypto'93, volume 773 of Lecture Notes in Computer Science. Springer-Verlag, 1993.
On the Resynchronization Attack

Jovan Dj. Golić¹ and Guglielmo Morgari²

¹ System on Chip, Telecom Italia Lab, Via Guglielmo Reiss Romoli 274, I-10148 Turin, Italy. [email protected]
² Telsy Elettronica e Telecomunicazioni, Corso Svizzera 185, I-10149 Turin, Italy. [email protected]
Abstract. The resynchronization attack on stream ciphers with a linear next-state function and a nonlinear output function is further investigated. The number of initialization vectors required for the secret key reconstruction when the output function is known is studied in more detail and a connection with the so-called 0-order linear structures of the output function is established. A more difficult problem when the output function is unknown is also considered. An efficient branching algorithm for reconstructing this function along with the secret key is proposed and analyzed. The number of initialization vectors required is larger in this case than when the output function is known, and the larger the number, the lower the complexity. Keywords: Stream ciphers, Boolean functions, Resynchronization, Reconstruction algorithms
1 Introduction
A typical stream cipher is based on a keystream generator, an autonomous finite-state automaton whose output sequence is reversibly combined with a plaintext sequence to yield a ciphertext sequence. A practical stream cipher also uses a reinitialization algorithm, which combines the secret key and a known parameter called the initialization vector (IV) into an initial state of the keystream generator. Reinitialization enables reuse of the same secret key with different IV's for encrypting relatively short messages by different keystreams. This is important for resynchronization purposes as well as for late entry in (multiparty) communication links. Reinitialization can increase the security, due to the shorter keystreams available for cryptanalysis, but can also decrease the security, due to multiple keystreams being derived from the same secret key. It is known that reasonably secure keystream generators can be constructed from a linear next-state function and a nonlinear output function, e.g., nonlinear filter generators and memoryless combiners, both
Most of this work was done while the authors were with Rome CryptoDesign Center, Gemplus, Italy.
T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 100–110, 2003. © International Association for Cryptologic Research 2003
based on linear feedback shift registers. However, it is shown in [2] that such a keystream generator, when used together with a linear reinitialization algorithm, is totally insecure if the output function depends on a relatively small number of input variables. More precisely, for an n-bit output Boolean function and a k-bit secret key, the complexity of the attack is about k·2^n evaluations of the output function to obtain a system of linear equations for the secret key, which is then reconstructed by solving the system. A common and more secure reinitialization technique is to produce the initial state of the keystream generator from the output of the keystream generator itself when loaded with a linear combination of the secret key and the IV (e.g., see [1]). The resynchronization attack [2] may then be useful for recovering the secret key from a set of previously reconstructed initial states of the keystream generator. A keystream generator can be rendered more secure by letting the secret key control its structure, that is, the next-state and/or output functions. In particular, only the output function may be chosen by the secret key. As the resynchronization attack [2] is then no longer applicable, it is interesting to investigate whether other, more sophisticated attacks can be developed. Sections 2, 3, and 4 are devoted to the first objective of this paper, which is to conduct a more in-depth analysis of the resynchronization attack and thus obtain more precise estimates of the number of IV's required for the secret key reconstruction given a general output Boolean function. The main properties of the so-called 0-order linear structures of Boolean functions are pointed out and their impact on the attack is determined. A characterization of Boolean functions in terms of 0-order and 1-order linear structures is also established.
The second objective, which is to investigate the more difficult case when the output function is not known, is treated in Sections 5 and 6. An efficient algorithm for reconstructing this function along with the secret key is developed and its complexity is analyzed in terms of the number of IV ’s available. The main results and open problems are summarized in Section 7.
2 Problem Statement
According to [2], consider a general binary keystream generator with a linear next-state function S_{t+1} = L_state(S_t), where S_t is the internal state at time t; with a linear initialization function S_0 = L_init(K, IV), where S_0 is the initial state, K is the secret key, and IV is the initialization vector; and with an output function z_t = f(L_out(S_t)), where z_t is the output (keystream) bit at time t, f is a nonlinear Boolean function, and L_out is a linear function. Let IV_i, 1 ≤ i ≤ Q, be given IV's, and let the corresponding output bits be known at times t ∈ T, in the known-keystream scenario. They define a system of nonlinear equations in K of the form

z_t^i = f(L_t(K) ⊕ L'_t(IV_i)),  1 ≤ i ≤ Q,  t ∈ T,   (1)

where L_t and L'_t are linear functions derived from

L_out(L_state^t(L_init(K, IV_i))) = L_t(K) ⊕ L'_t(IV_i)   (2)
and ⊕ denotes bitwise addition. One problem, considered in [2], is to find a solution for K when f is known. Another, more difficult problem is to find a solution for K and f when f is unknown. Note that for a k-bit secret key K and an n-bit function f, exhaustive search would require 2^k steps for the first problem and 2^{2^n + k} steps for the second problem.
3 Zero-Order and First-Order Linear Structures
Solving the system (1) depends on whether the output function has linear structures or not. Recall that an n-bit vector γ is called a linear structure of an n-bit Boolean function f if f(X) ⊕ f(X ⊕ γ) ≡ const. It is known that the set of all linear structures of f is a vector space. It is shown in [4] that f has nonzero linear structures iff it can be expressed as g(A(X)), where A is a linear function and g is a function that is partially linear or that depends on fewer than n variables. According to [3], we can divide the linear structures into so-called 0-order and 1-order linear structures. A vector γ is said to be a 0-order linear structure of f if f(X) ⊕ f(X ⊕ γ) ≡ 0. The all-zero vector is called the trivial (0-order) linear structure. Similarly, a vector γ is said to be a 1-order linear structure of f if f(X) ⊕ f(X ⊕ γ) ≡ 1. The linear structures of f are directly related to the autocorrelation function of f and can be determined with complexity O(n·2^n) by using the Walsh–Hadamard transform of f (e.g., see [3]).

Here we give, without proof, a number of novel properties of Boolean functions related to 0-order linear structures which are interesting for the resynchronization attack. For the sake of completeness, we also give some properties of Boolean functions related to 1-order linear structures. Note that the distinction between 0-order and 1-order linear structures enables us to obtain novel characterizations of Boolean functions, by Propositions 4 and 8, which are more precise than the characterizations in terms of linear structures given in [4] and [3]. In particular, 0-order linear structures account for degeneracy, whereas 1-order linear structures account for partial linearity. Let L, L_0, and L_1 denote the sets of all linear structures, all 0-order linear structures, and all 1-order linear structures of a given Boolean function f, respectively.

Proposition 1. The set L_0 is a vector space.

Proposition 2. The cardinality |L_0| = 2^m divides gcd(|f^{-1}(0)|, |f^{-1}(1)|), where f^{-1}(i) = {X | f(X) = i}, i = 0, 1, and for nonconstant f, m attains its maximum n − 1 iff f is affine.

Proposition 3. The binary relation X_1 ≅ X_2 iff X_1 ⊕ X_2 ∈ L_0 (i.e., iff f(X ⊕ X_1) ≡ f(X ⊕ X_2)) is an equivalence relation. The corresponding equivalence classes {X ⊕ γ | γ ∈ L_0} all have cardinality |L_0|.

Proposition 4. Let Λ_0 be an m-dimensional subspace of {0,1}^n and let Λ_0^⊥ be the dual (orthogonal) space of Λ_0. Then L_0 = Λ_0 iff f(X) ≡ g(A(X)), where g is an (n − m)-bit Boolean function without nontrivial 0-order linear structures and A is a linear function represented by a matrix, acting on one-column vectors, whose rows generate Λ_0^⊥.
Proposition 5. The dimension m of L_0 is the maximal nonnegative integer j such that f(X) ≡ g(A(X)), where g is an (n − j)-bit Boolean function and A is linear.

Proposition 6. Let S(f) denote the set of all n-bit Boolean functions h such that h(X) ≡ f(X ⊕ C) for some C. Then |S(f)| = 2^{n−m} if |L_0| = 2^m.

Proposition 7. Either |L_1| = 0 or |L_1| = |L_0|. If |L_1| > 0, then |f^{-1}(0)| = |f^{-1}(1)|.

Proposition 8. Let Λ be an (m + 1)-dimensional subspace of {0,1}^n, let Λ_0 be an m-dimensional subspace of Λ, and let Λ_1 = Λ \ Λ_0. Then L_0 = Λ_0 and L_1 = Λ_1 iff f(X) ≡ g(A(X)), where g is an (n − m + 1)-bit Boolean function without nontrivial 0-order linear structures that is linear in the first variable (g has exactly one 1-order linear structure, namely the vector (1, 0, ..., 0)) and A is a linear function represented by a matrix whose rows generate Λ_0^⊥ and whose rows without the first row generate Λ^⊥. Also, L_0 = Λ_0 and L_1 is empty iff f(X) ≡ g(A(X)), where g is an (n − m)-bit Boolean function without nontrivial linear structures and A is a linear function represented by a matrix whose rows generate Λ_0^⊥.
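The 0-order and 1-order linear structures of a small function can be found by brute force; the example function f(x_0, x_1, x_2) = x_0·x_1 ⊕ x_2 and the integer bit-encoding are ours, not from the text.

```python
# Brute-force computation of the 0-order (L0) and 1-order (L1) linear
# structures of an n-bit Boolean function f given on integer-encoded inputs.

def linear_structures(f, n):
    L0, L1 = set(), set()
    for gamma in range(2 ** n):
        diffs = {f(x) ^ f(x ^ gamma) for x in range(2 ** n)}
        if diffs == {0}:
            L0.add(gamma)       # f(X) + f(X + gamma) identically 0
        elif diffs == {1}:
            L1.add(gamma)       # f(X) + f(X + gamma) identically 1
    return L0, L1

# Example function (ours): f(x0,x1,x2) = x0*x1 + x2, bits packed LSB-first.
def f(x):
    return ((x & 1) & ((x >> 1) & 1)) ^ ((x >> 2) & 1)

L0, L1 = linear_structures(f, 3)
```

Here L0 is the trivial space {0} and L1 = {(0,0,1)} (encoded as the integer 4), so |L1| = |L0| and f is balanced, consistent with Proposition 7.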
4 Known Output Function
In the system (1), let qt denote the number of different L0t(IVi) for a given t. If we assume that the rank of L0t is maximal, n, then for moderately large 2^n, qt can be approximated by the classical occupancy probabilistic model as

qt ≈ q = 2^n (1 − e^(−Q/2^n)).  (3)

Consequently, the system (1) can be put in the form

zti = f(Xt ⊕ Cti),  1 ≤ i ≤ qt,  t ∈ T,  (4)
where Xt = Lt(K) and the Cti are all different for each t ∈ T. For simplicity, in view of (3), we can assume that qt = q for every t ∈ T. Let k be the bit length of K and let τ = |T|. When f is a known n-bit function and n is relatively small, the system (4) can be solved by the method proposed in [2]. Namely, for each chosen t, find Xt by exhaustive search, where the required number of different Cti, q, is estimated to be about n, because the required number of different equations in Xt should roughly be equal to the number of binary variables in Xt. This then takes about n·2^n evaluations of f. As each found Xt defines n linear equations in K, T has to be sufficiently large so as to obtain k linearly independent equations in K. In particular, if T is such that the equations determined by Xt corresponding to different t ∈ T are all linearly independent, then the required cardinality of T is τ = k/n. Altogether, this takes about k·2^n evaluations of f. At the final stage, K is obtained by solving the resulting system of linear equations. Our objective here is to study in more detail the required q and τ for a general f.
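The per-time-step exhaustive search of [2] can be sketched directly; in the sketch below the output function, its size, and the number of observed equations are hypothetical stand-ins chosen only to illustrate the method:

```python
import random

random.seed(1)
n = 8
f = [random.randrange(2) for _ in range(2 ** n)]  # known output function

X_t = random.randrange(2 ** n)                    # unknown linear image of K
C = random.sample(range(2 ** n), 2 * n)           # q = 2n known offsets C_ti
z = [f[X_t ^ c] for c in C]                       # observed output bits z_ti

# Exhaustive search over X: keep every candidate consistent with all q
# equations z_ti = f(X ^ C_ti); this costs about q * 2^n evaluations of f.
candidates = [X for X in range(2 ** n)
              if all(f[X ^ c] == b for c, b in zip(C, z))]
assert X_t in candidates
```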
Jovan Dj. Golić and Guglielmo Morgari
The first note is about the number of solutions for Xt. If f has nontrivial 0-order linear structures, i.e., if the number of 0-order linear structures is 2^m, m > 0, then Proposition 3 implies that the number of solutions is a multiple of 2^m, and for large enough q it is exactly 2^m. More precisely, since f can then be put in the form specified by Propositions 4 and 5, where the function g has no nontrivial 0-order linear structures, it follows that for large enough q there is a unique solution for the linear function of Xt, A(Xt). As A(Xt) defines n − m linear equations, the required cardinality of T then increases to τ = k/(n − m). However, as the number of variables is then effectively reduced from n to n − m, the total complexity of obtaining the system of linear equations in K reduces to about k·2^(n−m) evaluations of g.

The second note is about how large q has to be in order to reach the minimal number of solutions for Xt, under the assumption that the vectors Cti and Xt are all chosen at random. The minimal value of q needed, qmin, depends on the choice of these vectors and on f. As nontrivial 0-order linear structures effectively reduce the number of variables, it is appropriate to investigate randomly chosen f without nontrivial 0-order linear structures. In the experiments, for a chosen f, Xt is chosen at random, and the q output bits zti are then produced by (4) from randomly chosen different vectors Cti. It turns out that an important parameter affecting qmin is the relative number, p, of 1's in the truth table of f, p = |f^(-1)(1)|/2^n. Let q̄min be the average of qmin over random f and over random choices of the vectors Cti and Xt. A simple information-theoretic argument then yields a necessary condition q̄min ≥ n/H(p), where the value of the binary entropy function H(p) is the average number of bits of information about Xt provided by each equation in the system.
In particular, for p = 1/2 we get q̄min ≥ n, whereas for p = 0 or p = 1 we naturally get that q̄min = ∞, which means that the secret key cannot be recovered at all. However, computer simulations show that q̄min is larger than n/H(p), and that the probability of qmin being considerably larger than this lower bound is not small. As a consequence, both the number of IV's required and the attack complexity increase. An explanation for this is that the information contents of individual (nonlinear) equations are generally not mutually independent. Fig. 1 displays the estimates of the probability distribution of qmin obtained from 10,000 randomly chosen 8-bit f, for p = 1/2 and p = 1/4. Fig. 2 displays the average values of qmin as a function of p obtained from 100,000 randomly chosen 8-bit f, along with the lower bound n/H(p). Similar curves were also obtained for n = 6, 10. Accordingly, more IV's are required for nonbalanced f than for balanced f for the attack to be successful. In other words, the cryptographically most interesting case p = 1/2 requires the minimal number of IV's on average.
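The lower bound n/H(p) is easy to tabulate; a minimal sketch:

```python
from math import log2

def H(p):
    """Binary entropy function H(p), in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

n = 8
# Lower bound n/H(p) on the average q_min, for a few values of p.
bounds = {p: n / H(p) for p in (0.5, 0.25, 0.125)}
# A balanced f (p = 1/2) needs at least n equations; skewed f needs more.
assert bounds[0.5] == 8.0
assert bounds[0.125] > bounds[0.25] > bounds[0.5]
```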
5 Unknown Output Function – Complete IV Set
When f is not known, e.g., when it is defined by a secret key, the attack from the preceding section is not applicable. We first consider the case when there
Fig. 1. Probability distributions of qmin, for n = 8 (p = 1/2 and p = 1/4).
Fig. 2. Average values of qmin and the lower bound n/H(p), for n = 8.
exists at least one value of t such that qt = 2^n. In accordance with (3), this on average requires that Q = 2^n ln 2^n, provided that τ = 1. For τ > 1, this average value is somewhat reduced, depending on τ. For any such t, the set of all n-bit functions h consistent with (4) is exactly the set S(f) of cardinality 2^(n−m) from Proposition 6. So, the most one can get from the system (4) is the set S(f), to which f belongs. Once the set S(f) is recovered, the method then consists of running the attack from the preceding section for each candidate function h ∈ S(f) and of testing the obtained secret key K on additional output bits, n − m on average. Both f and K are thus recovered with a complexity only 2^(n−m) times larger than when f is known.
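The complete-IV-set situation is easy to illustrate for a small hypothetical f: when every offset C is observed, the data determine f exactly up to a shift, i.e., the consistent functions are exactly the set S(f), and |S(f)|·|L0| = 2^n (Proposition 6). A sketch under those assumptions:

```python
import random

random.seed(2)
n = 6
f = [random.randrange(2) for _ in range(2 ** n)]  # secret output function
X_t = random.randrange(2 ** n)

# Complete IV set: h(C) = f(X_t ^ C) is observed for every C in {0,1}^n,
# so h is a fully defined shift of f.
h = [f[X_t ^ C] for C in range(2 ** n)]

# The set of functions consistent with h is exactly S(f) = {f(. ^ Y)}.
S_f = {tuple(f[Y ^ C] for C in range(2 ** n)) for Y in range(2 ** n)}
assert tuple(h) in S_f

# |S(f)| = 2^(n-m), where 2^m is the number of 0-order linear structures.
L0_size = sum(1 for g in range(2 ** n)
              if all(f[x] == f[x ^ g] for x in range(2 ** n)))
assert len(S_f) * L0_size == 2 ** n
```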
6 Unknown Output Function – Incomplete IV Set
In this section, we consider the more difficult case when qt < 2^n for every t ∈ T, so that the set S(f) cannot be recovered from any single value of t. In order to obtain this set, we have to combine the information from different observation times t, and this can be achieved by the following algorithm. The times t are first arranged in order of descending values of qt and denoted as 1, 2, ..., τ. Then initially, for each 1 ≤ t ≤ τ, compute a partially defined function h0t representing S(h0t) that is consistent with (4) at time t. More precisely, qt binary values of h0t are defined by using (4) with the all-zero vector instead of Xt, that is,

h0t(Cti) = zti,  1 ≤ i ≤ qt,  1 ≤ t ≤ τ,  (5)
while the remaining values of h0t remain undefined. If we denote the undefined values by the symbol b, then each h0t effectively takes three output values: 0, 1, and b. The algorithm essentially consists of searching through a tree of candidate functions h representing all S(h) that are consistent with the current and previous observations combined. More precisely, a three-valued function, with a generic notation h, is assigned to each node in the tree in the following way. Initially, at level 1, start from a single node with the associated candidate function h = h01, and then proceed iteratively. Consider a node at level j with the associated function h. The successor nodes to this node are derived from all different Y ∈ {0,1}^n such that

h(X) = h0,j+1(X ⊕ Y)  (6)
for every X ∈ {0,1}^n such that neither h(X) nor h0,j+1(X ⊕ Y) is equal to b. For each such Y, then use (6) to modify h for every X such that h(X) = b ≠ h0,j+1(X ⊕ Y), by setting h(X) = h0,j+1(X ⊕ Y). The modified h is the candidate function associated with the successor node corresponding to Y. If there does not exist such a Y, then there are no successors to the considered node. This means that the candidate function h assigned to this node is inconsistent with the observations and as such is incorrect, although it may be fully defined. Such an end node is called a failure end node. On the other hand, if there exists such a Y, then the candidate function assigned to any successor node has at most as many undefined values as the candidate function assigned to the considered node. Any node with a binary, fully defined candidate function is called a success end node if it has a (unique) successor and every subsequent node, if generated, would have a (unique) successor. In practice, if the tree is examined by depth-first search with backtracking, with a negligible space complexity, this can be checked on a small number of additional nodes. Alternatively, one can apply breadth-first search by storing and examining one level of the tree at a time. In this case, at each level, one does not have to examine different nodes with the same candidate function more than once, which means
Fig. 3. The vectors Xt and Cti and the output bits zti, in Example 1.
that only the nodes with different candidate functions assigned are further processed. The algorithm then stops when a level with a single node with a fully defined candidate function is reached. In theory, even if τ = ∞, it is possible, but extremely unlikely, that a success end node is never reached. In this case, after a certain point, the new observation times contain no additional information about f and the obtained candidate functions are not updated at all. After a success end node is found, the set S(f) is recovered and the rest is the same as when the IV set is complete. The effectiveness of the algorithm can be measured by the number of different observation times required to reach a solution and by the total number of nodes examined. Both depend on the number of IV's available. The worst-case time complexity is reflected by the total number of nodes in the whole tree. The complexity per node is at most 2^(2n) elementary operations with the values 0, 1, and b.

Example 1. Let f(x1, x2, x3) = 1 ⊕ x1 ⊕ x2 ⊕ x1x2 ⊕ x1x3, τ = 6, and qt = q = 4, 1 ≤ t ≤ 6. Further, let the output bits zti be produced by (4) from the vectors Xt and Cti as displayed in Fig. 3. Then all the different candidate functions obtained are shown in Fig. 4, where ht stands for a generic candidate function at time t. For each 2 ≤ t ≤ 6, the numbers of the form l1 → l2 are also shown, where l1 stands for the total number of candidate functions obtained and
Fig. 4. Candidate functions ht, in Example 1.
l2 stands for the total number of (displayed) different functions among them, because different candidate functions at time t − 1 can give rise to the same candidate function at time t. At time t = 3 there appear 3 fully defined candidate functions, whereas at time t = 4 there are also 3 such functions, but only 2 of them are from time t = 3. So, at time t = 4, one fully defined candidate function disappears as inconsistent, while a new one appears. At time t = 5 the set of candidate functions is not updated, and at time t = 6 a unique consistent solution, h6, is found, which is different from all the previously obtained fully defined candidate functions. Note that f ≠ h6, but f ∈ S(h6). More precisely, f(X) = h6(X ⊕ X1), where X1 = 011 (see Fig. 3).

In the experiments, we used (4) with randomly generated distinct vectors Cti, for variable values of q. It turns out that a typical tree first grows and then gradually shrinks to a single node with a partially defined candidate function. Finally, it takes a number of additional levels for this function to be updated into a fully defined function, at which point the algorithm stops. Fig. 5 shows the dependence of the base-2 logarithm of the number of nodes with different candidate functions upon the tree level, for the trees obtained from the same randomly chosen balanced 8-bit function f and a number of different q. In this particular experiment (not shown in the figure), the success end nodes were reached after t = 68, 45, 47, and 57 levels for q = 24, 28, 32, and 36, respectively. Let τmin denote the minimal number of levels in a tree until a level with a single success end node is reached and let N denote the total number of nodes with different candidate functions at each level. Further, let τ̂min = 2^n ln 2^n / q denote an approximation for τmin according to the classical occupancy model. Fig. 6 displays the base-2 logarithms of the average values of N, τmin, and τ̂min, for variable values of q, where the averages were obtained over 1000 randomly generated 8-bit functions f. It is interesting that for values of q larger than a critical point q2 (q2 ≈ 55, for n = 8), τmin ≈ N, meaning that for almost all f each node in the tree has a unique successor. Branching occurs for q < q2, and for values of q smaller than another critical point, q1 (q1 ≈ 35, for n = 8), N grows rapidly as q decreases. In particular, if q ≈ n, then the complexity appears to be prohibitively high.
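The node-expansion step based on (6) can be sketched compactly. In the sketch below (the helper name is ours, not from the paper), Python dictionaries play the role of the three-valued partial functions, with missing keys acting as the undefined symbol b:

```python
def successors(h, h_next, n):
    """All candidate functions obtained by merging partial function h with
    the next observation h_next, shifted by every consistent offset Y."""
    out = []
    for Y in range(2 ** n):
        merged = dict(h)
        ok = True
        for X in range(2 ** n):
            v = h_next.get(X ^ Y)
            if v is None:
                continue                   # h_next undefined here (symbol b)
            if merged.get(X) is None:
                merged[X] = v              # fill in an undefined value of h
            elif merged[X] != v:
                ok = False                 # inconsistent: no successor for Y
                break
        if ok:
            out.append(merged)
    return out

# Toy check (n = 2): h defined on inputs {0, 1}, h_next on inputs {2, 3}.
n = 2
h = {0: 0, 1: 1}
h_next = {2: 1, 3: 0}
for s in successors(h, h_next, n):
    assert all(s[X] == h[X] for X in h)    # previously defined values persist
```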
Fig. 5. Log2 of the number of nodes as a function of the tree level, for q = 20, 24, 28, 32, and 36 (n = 8).
Recall that on average, q has to be larger than n for the rest of the attack to be applicable. For q < q2, τ̂min appears to be a very good approximation. Similar behavior is expected for any value of n.
7 Conclusions
It is shown that the number of initialization vectors required for a successful resynchronization attack can be larger than the number of binary inputs to the output function. The main properties of the so-called 0-order and 1-order linear structures of Boolean functions are established and it is pointed out that the nonzero 0-order linear structures of the output function can simplify the resynchronization attack. More importantly, a new algorithm is proposed which shows that the attack can also work when the output function is not known provided that the number
Fig. 6. Average values of τmin, τ̂min, and N (log2 scale), for n = 8.
of initialization vectors is sufficiently large. This algorithm is able to reconstruct both the output function and the secret key, and the larger the number of initialization vectors, the lower the complexity. If the number of initialization vectors is relatively small, then the complexity becomes prohibitively high. In this case, analyzing the complexity of this algorithm theoretically, as well as finding other, possibly more effective algorithms, are interesting problems for future investigation.
References

1. A. Clark, E. Dawson, J. Fuller, J. Dj. Golić, H.-J. Lee, W. Millan, S.-J. Moon, and L. Simpson, "The LILI-II keystream generator," Information Security and Privacy – ACISP 2002, Lecture Notes in Computer Science, vol. 2384, pp. 25–39, 2002.
2. J. Daemen, R. Govaerts, and J. Vandewalle, "Resynchronization weakness in synchronous stream ciphers," Advances in Cryptology – EUROCRYPT '93, Lecture Notes in Computer Science, vol. 765, pp. 159–167, 1994.
3. S. Dubuc, "Characterization of linear structures," Designs, Codes and Cryptography, vol. 22, pp. 33–45, 2001.
4. X. Lai, "Additive and linear structures of cryptographic functions," Fast Software Encryption – FSE '94, Lecture Notes in Computer Science, vol. 1008, pp. 75–85, 1995.
Cryptanalysis of SOBER-t32

Steve Babbage¹, Christophe De Cannière², Joseph Lano², Bart Preneel², and Joos Vandewalle²

¹ Vodafone Group Research & Development, The Courtyard, 2-4 London Road, Newbury, Berkshire RG14 1JX, UK
[email protected]
² Katholieke Universiteit Leuven, Dept. ESAT/SCD-COSIC, Kasteelpark Arenberg 10, B-3001 Heverlee, Belgium
{cdecanni,jlano,preneel,vdwalle}@esat.kuleuven.ac.be
Abstract. Sober-t32 is a candidate stream cipher in the NESSIE competition. Some new attacks are presented in this paper. A Guess and Determine attack is mounted against Sober-t32 without the decimation of the key stream by the so-called stuttering phase. Also, two distinguishing attacks are mounted against full Sober-t32. These attacks are not practically feasible, but they are theoretically more efficient than exhaustive key search. Keywords: NESSIE, Cryptanalysis, Security Evaluation, Sober-t32, Guess and Determine Attack, Distinguishing Attack.
1 Introduction
The European NESSIE [2] competition evaluates a variety of cryptographic primitives (both asymmetric and symmetric) for standardization. Sober-t32, a software-oriented synchronous stream cipher designed by G. Rose and P. Hawkes [1], is one of the candidates. The cipher uses a Linear Feedback Shift Register (LFSR), a Non-Linear Function (NLF) and a so-called stuttering unit for the generation of the pseudo-random key stream. The NESSIE competition demands that a stream cipher offers full security, i.e., that there is no known attack faster than exhaustive key search. In cryptanalysis, one considers that the pseudo-random key stream is known, and the aim is to recover the key (a so-called key-recovery attack). Another possible attack is a distinguishing attack, in which one tries to distinguish the key stream of the stream cipher from a truly random sequence.
F.W.O. Research Assistant, sponsored by the Fund for Scientific Research – Flanders (Belgium).
Research financed by a Ph.D. grant of the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen).
Research Council KUL: Concerted Research Action (GOA) Mefisto 666.
T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 111–128, 2003. c International Association for Cryptologic Research 2003
Steve Babbage et al.
In this paper, we will present new attacks on Sober-t32. The first attack is a Guess and Determine (GD) attack against unstuttered Sober-t32. This attack exploits a probabilistic factor in the design and appears to be faster than exhaustive search. This was not expected by the designers, as they state in [1]: "Our analysis indicates that the combination of the LFSR and NLF appears to be sufficient to resist GD-attacks." The second type of attack is the distinguishing attack. In [5], Ekdahl and Johansson mount distinguishing attacks on Sober-t16 – a cipher very similar to Sober-t32 – and on Sober-t32 without the stuttering unit. In this paper, the attacks from [5] will be improved and distinguishing attacks on full Sober-t32 will be presented. The outline of this paper is as follows: In Sect. 2, a description of Sober-t32 is given. In Sect. 3, the GD attack is elaborated, based on a probabilistic factor in the design of Sober-t32. Finally, Sect. 4 presents two distinguishing attacks on Sober-t32.
2 Description of the SOBER-t32 Stream Cipher
Sober-t32 is a word-oriented synchronous stream cipher. It uses 32-bit words and has a secret key of 256 bits. Sober-t16 is a very similar stream cipher that uses 16-bit words and has a 128-bit key. In a synchronous stream cipher such as Sober-t32, the key stream is generated independently from the plaintext. The sender encrypts the plaintext by performing an XOR (exclusive or, ⊕) operation between plaintext and key stream. The recipient can, if he knows the secret key, reconstruct the key stream and recover the plaintext by performing an XOR operation between the ciphertext and the key stream. Sober-t32 is based on 32-bit operations within the Galois field GF(2^32). Every word a = (a31, a30, ..., a1, a0) is represented by a polynomial of degree less than 32:

A = a31 x^31 + ... + a0 x^0.  (1)

When adding two words in GF(2^32), the polynomials are added and their coefficients are reduced modulo 2. This is the same as a bitwise XOR. For a multiplication, the polynomials are multiplied, the coefficients reduced modulo 2, and the resulting polynomial is reduced modulo a polynomial of degree 32. For Sober-t32 the polynomial is:

x^32 + (x^24 + x^16 + x^8 + 1)(x^6 + x^5 + x^2 + 1).  (2)
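The reduction in (2) can be illustrated with a straightforward (non-optimized) carry-less multiply-and-reduce; the constant 0x65656565 encodes the low 32 bits of the polynomial above, and the routine is a sketch rather than the reference implementation:

```python
POLY = 0x65656565  # low 32 bits of x^32 + (x^24+x^16+x^8+1)(x^6+x^5+x^2+1)

def gf_mul(a, b):
    """Multiply two 32-bit words as polynomials over GF(2), reduced modulo
    the SOBER-t32 polynomial (2)."""
    r = 0
    for i in range(32):            # carry-less (XOR) schoolbook multiply
        if (b >> i) & 1:
            r ^= a << i
    for i in range(62, 31, -1):    # reduce the degree-<=62 product
        if (r >> i) & 1:
            r ^= (1 << i) ^ (POLY << (i - 32))
    return r

# Sanity checks: 1 is the multiplicative identity, and x^16 * x^16 = x^32,
# which reduces to the low part of the polynomial.
assert gf_mul(1, 0xDEADBEEF) == 0xDEADBEEF
assert gf_mul(1 << 16, 1 << 16) == POLY
```

The LFSR recurrence (4) below then amounts to `s[t+15] ^ s[t+4] ^ gf_mul(alpha, s[t])`, with the constant α given in Sect. 2.1.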
Sober-t32 consists of three main building blocks. First there is a Linear Feedback Shift Register (LFSR), which uses a recursion formula to produce a state sequence sn . Next a Non-Linear Function (NLF) combines these words in a non-linear way to produce the NLF-stream vn . Finally, the so-called stuttering produces the key stream zj by decimating the NLF-stream in an irregular fashion. All three parts are explained in detail below. An overview of the general structure of Sober-t32 is shown in Fig. 1.
Fig. 1. Overall structure of Sober-t32.
2.1 The Linear Feedback Shift Register (LFSR)
The LFSR is a shift register of length 17, where every register contains one word. The internal memory is thus 544 bits. The state of the LFSR at a certain time t is represented by the vector

St = (st, st+1, st+2, ..., st+16) = (r0, r1, r2, ..., r16).  (3)

The next state of the LFSR is calculated by shifting the previous state one step, and calculating a new word st+17 as a linear combination of the words in the LFSR. The word st+17 is calculated as follows:

st+17 = st+15 ⊕ st+4 ⊕ α · st,  (4)
with α = C2DB2AA3 (hexadecimal).

2.2 The Non-linear Function (NLF)
At any time t, the NLF takes five words from the LFSR state and calculates one output, called vt. This output can be written as:

vt = ((f(st ⊞ st+16) ⊞ st+1 ⊞ st+6) ⊕ K) ⊞ st+13.  (5)

In this equation, K is a word that is determined during the initialization of the LFSR, ⊞ denotes addition modulo 2^32, and f is a non-linear function. The
structure of the function f is shown in Fig. 2. First, the word is partitioned into the Most Significant Byte (MSB) and the three remaining bytes. The MSB is used as an input to a substitution box (S-box), which outputs 32 bits. Of these, the MSB becomes the MSB of the output, and the three remaining bytes are XORed with the three remaining bytes of the input word to form the rest of the output word.
Fig. 2. Structure of the function f .
It is important to notice that the S-box only uses the MSB as an input. This implies that most of the non-linearity is caused by the MSB. One can see that any differential in the less significant bits goes straight through the f -function, and that the MSB of the output is solely determined by the MSB of the input (and vice-versa). The f -function can be written as: f (a) = SBOX(aM SB ) ⊕ (0aR ) .
(6)
In this equation, aR represents the three remaining bytes of a 32-bit word a, thus without the MSB.
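The two observations above (low-bit XOR differentials pass straight through f, and the output MSB depends only on the input MSB) can be checked directly with a toy S-box; the table below is a hypothetical stand-in, since the real Sober-t32 S-box is not reproduced in this description:

```python
MASK24 = (1 << 24) - 1

def f(a, sbox):
    """f(a) = SBOX(a_MSB) XOR (0 || a_R): the S-box acts only on the top
    byte; the low 24 bits of the input are XORed straight into the output."""
    return sbox[a >> 24] ^ (a & MASK24)

# Hypothetical stand-in S-box: any table of 256 32-bit words works here.
sbox = [((17 * i + 3) * 0x01010101) & 0xFFFFFFFF for i in range(256)]

a = 0x5A123456
for delta in (0x000001, 0x00FF00, 0xABCDEF):  # differences in the low 24 bits
    assert f(a, sbox) ^ f(a ^ delta, sbox) == delta  # pass straight through
# The MSB of the output depends only on the MSB of the input:
assert (f(a, sbox) >> 24) == (sbox[a >> 24] >> 24)
```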
2.3 The Stuttering
The output of the NLF is used as the input for the stuttering phase. The stuttering decimates the stream in an irregular fashion. The first output of the NLF is taken as the first Stutter Control Word (SCW). This SCW is divided into pairs of bits, called dibits. The value of these dibits determines what happens with the following words, as explained in Table 1. The constant C is 6996C53A (hexadecimal), and C̄ is the bitwise complement of C. When all dibits have been used, the next word from the NLF-stream is taken as the next SCW. In this way, the stuttering allows only about 48% of the NLF-stream words to go to the key stream.
Table 1. The possible actions of the stuttering unit as a function of the dibit.

dibit  action
00     The next word is excluded from the key stream.
01     The next word is XORed with C and goes to the key stream; the word after that is excluded from the key stream.
10     The next word is excluded from the key stream; the word after that goes to the key stream.
11     The next word is XORed with C̄ and goes to the key stream.
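Table 1 translates into a short decimation routine. The sketch below is an illustration under stated assumptions: in particular, the order in which dibits are taken from the SCW (least significant dibit first) is our choice, not specified by this description:

```python
C = 0x6996C53A
C_BAR = C ^ 0xFFFFFFFF  # bitwise complement of C

def stutter(nlf_words):
    """Decimate a list of NLF-stream words into the key stream per Table 1."""
    out = []
    it = iter(nlf_words)
    try:
        while True:
            scw = next(it)                   # next stutter control word
            for k in range(16):              # 16 dibits per 32-bit SCW
                dibit = (scw >> (2 * k)) & 3
                if dibit == 0:
                    next(it)                 # word excluded
                elif dibit == 1:
                    out.append(next(it) ^ C)
                    next(it)                 # following word excluded
                elif dibit == 2:
                    next(it)                 # word excluded ...
                    out.append(next(it))     # ... next word kept
                else:
                    out.append(next(it) ^ C_BAR)
    except StopIteration:
        return out
```

On average each dibit emits 0.75 words while consuming 1.5, so counting the SCW itself, 12 of every 25 NLF words (48%) reach the key stream, matching the figure quoted above.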
3 A Guess and Determine Attack on Unstuttered SOBER-t32
In a standard Guess and Determine attack, some words of the LFSR are guessed and the remaining words are determined by exploiting the LFSR and NLF equations. In [3], Bleichenbacher et al. describe a GD attack on Sober-II which also applies to Sober-t32 and has a complexity of 2^320. De Cannière [4] improves this attack to 2^304. In this section, the weakness of the NLF, as explained in Sect. 2.2, will be exploited to elaborate a better GD attack. First a simplified attack will be presented, where the carry bits are not taken into account. Next the real attack will be presented. Finally, the attack will be improved in different ways.

3.1 Attack without Carry Bits
First, we will rewrite the equation of the NLF (5) by separating the MSB and the three other bytes and by using (6). This yields the following equation:

vt = (((SBOX(st,MSB ⊞ st+16,MSB ⊞ o1) ⊕ (0 ‖ (st,R ⊞ st+16,R))) ⊞ st+1 ⊞ st+6) ⊕ K) ⊞ st+13.  (7)

In this equation, o1 represents the carry bit towards the MSB. Under the assumption that the MSB of K is zero, this equation can be split up into two separate parts:

vt,MSB = (SBOX1(st,MSB ⊞ st+16,MSB ⊞ o1) ⊞ st+1,MSB ⊞ st+6,MSB) ⊞ st+13,MSB ⊞ o2,
vt,R = (((SBOX2(st,MSB ⊞ st+16,MSB ⊞ o1) ⊕ (st,R ⊞ st+16,R)) ⊞ st+1,R ⊞ st+6,R) ⊕ KR) ⊞ st+13,R.  (8)

In these equations, SBOX1 represents the MSB of the output of the S-box, and SBOX2 the three remaining bytes of the S-box output; o2 represents the carry bits from the additions. Especially the first equation is interesting. Given the value of the MSB of st, st+1, st+6 and st+13, the value of the MSB of the key stream vt, and the value of the carry bits o1 and o2, it is possible to calculate the MSB of st+16.
For the moment, the carry bits are not taken into account. Now the attack starts. First, the MSBs of the first 16 words of the LFSR (st, st+1, st+2, ..., st+15) are guessed, a total of 128 bits. Knowing these, together with the key stream vt, it is possible to calculate the MSB of st+16, st+17, st+18, st+19, ... The LFSR will now be clocked a few times. We get a system of linear equations in bits, where the unknowns are the 24 least significant bits of every word that appears in the LFSR. Meanwhile, at every iteration a number of linear equations is obtained. These equations are the following:

– At every iteration, we get 32 new linear equations from the derivation of the new LFSR word. In fact, the linear recurrence over GF(2^32) is equivalent to 32 bit-wise LFSRs (see [6]). These LFSRs all have the same linear recurrence, which is given in [7].
– We also get an extra linear equation per iteration in the least significant bit. As the input to the S-box is known, one can easily see that the following equation holds for the least significant bit:

vt^0 = SBOX^0(st,MSB ⊞ st+16,MSB ⊞ o1) ⊕ st^0 ⊕ st+16^0 ⊕ st+1^0 ⊕ st+6^0 ⊕ K^0 ⊕ st+13^0.  (9)

In this equation, the superscript 0 stands for the least significant bit of the word. This equation is perfectly linear. We also get one such equation before iterating: before the LFSR is clocked, we already get such an equation from the initial state.

After clocking the LFSR k times, 32·k + k + 1 = 33·k + 1 linear equations are obtained. The remaining unknowns in these equations are:

– The value of the 24 least significant bits of st, st+1, st+2, ..., st+16.
– The value of the least significant bit of K.
– The value of the 24 least significant bits of the new word at each iteration.

After k iterations, the total is 24·17 + 1 + 24·k = 24·(17 + k) + 1 unknowns.
In order to obtain a solvable set of equations, the number of equations must be larger than the number of unknowns:

33·k + 1 ≥ 24·(17 + k) + 1 ⟺ k ≥ 45.33.  (10)
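The threshold in (10) can be checked mechanically; a trivial sketch of the count:

```python
def equations(k):
    # 32 recurrence equations plus 1 LSB equation per clock, plus 1 LSB
    # equation from the initial state
    return 33 * k + 1

def unknowns(k):
    # 24 low bits of the 17 initial words, 1 bit of K, 24 bits per new word
    return 24 * (17 + k) + 1

# First integer k with 33k + 1 >= 24(17 + k) + 1, i.e. k >= 45.33.
k = next(k for k in range(1, 100) if equations(k) >= unknowns(k))
assert k == 46
```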
Remark that the attack recovers the whole state of the cipher, except for the 23 remaining bits of K. However, once the attack is finished, these can be recovered easily from the second equation of (8). In order to evaluate the complexity of the attack, one should consider the number of bits that have to be guessed. The following bits should be guessed:

– The MSB of st, st+1, st+2, ..., st+15, a total of 128 bits.
– The MSB of K. In fact, this MSB will always be assumed to be zero. This assumption can be seen to be equivalent to guessing the MSB of K: it is possible to mount the attack on a number of different key streams until we get a key stream in which the MSB of K is zero.
This means guessing 136 bits in total, which implies a complexity of 2^136. It should be noted that solving the set of equations does not increase the complexity of the attack: we know in advance the linear equations relating the deduced bits to the initial state bits, so we can precompute the inverse matrix, and all we have to do is a matrix multiplication. Of course, we have not taken into account the carry bits that are unknown to us. In the next section, we will take the carry bits into account and show the complexity of the full attack.

3.2 Taking Account of the Carry Bits
In the previous section, we have assumed that the carry bits are known. In reality we will also have to guess these carry bits. Carry bits are not purely random; the distribution of their values is non-uniform. One can take advantage of this by first trying to guess the more probable values. The number of guesses required on average is well approximated by the entropy, as this equals the amount of information that is present in the carry bits.

The Entropy of the Carry Bits. The entropy of the carry bits o1 and o2 can be derived theoretically. This is done in Appendix A. It is shown there that the entropy of o1 is 1, and that the entropy of o2 is 1.65.

Complexity of the Full Attack. The full attack requires guessing the following bits:

– The 136 bits that had to be guessed in the attack without carry bits.
– The carry bits, a total of 47 · (1 + 1.65) = 124.55 bits. (That is, the number of iterations plus one. This extra guess comes from determining st+16: there is no iteration here, but these carry bits also have to be guessed.)

This means a total of 260.55 bits. The complexity of the attack is thus 2^260.55. This is only a little above the complexity of doing an exhaustive search for the 256-bit key. In the following section, a number of improvements will be presented in order to get the complexity below that of exhaustive key search.

3.3 Further Improvements
Instead of assuming that the 8 most significant bits of K are zero, one could also assume that the nine most significant bits of K are zero. This will lower the entropy of the carry bit o2. It can be calculated that the new entropy for o2 is 1.48. Now we can recalculate the total complexity for the attack with the procedure described above. We get a complexity of 2^253.56. We can also assume that the 10 most significant bits of K are zero. The entropy of o2 is then 1.43, and the complexity of the attack is 2^252.21. Another improvement would be to guess the carry bits of several rounds together. The entropy of the carry bit in (s0 + s16) is 1, and the entropy of the
carry bit in (s16 + s32) is also 1. However, the entropy of the two together is 1.92. If we take (s0 + s16), (s16 + s32), and (s32 + s48) together, the entropy is 2.83. There is scope for more improvements of this sort. Another possible improvement would be to use all relations in the NLF, not only the linear ones. This approach is described in App. B. The complexity of this approach is not well understood and requires further research.
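The entropy figure for o1 used above can be reproduced by exhaustive enumeration on reduced word sizes (8-bit "remaining parts" here instead of 24-bit, an assumption made purely for speed); a sketch:

```python
from itertools import product
from math import log2

def carry_entropy(bits):
    """Entropy of the carry out of the sum of two uniform `bits`-bit values."""
    total = 2 ** (2 * bits)
    ones = sum(1 for a, b in product(range(2 ** bits), repeat=2)
               if a + b >= 2 ** bits)
    p = ones / total
    return -p * log2(p) - (1 - p) * log2(1 - p)

# The carry o1 out of the low part of the sum is almost perfectly balanced,
# so its entropy is essentially 1 bit, as used in the complexity estimates.
assert abs(carry_entropy(8) - 1.0) < 1e-4
```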
4 Distinguishing Attacks

At FSE 2002, P. Ekdahl and T. Johansson presented distinguishing attacks on full Sober-t16 and on unstuttered Sober-t32 [5]. In this section, both attacks are adapted to obtain two distinguishing attacks on full Sober-t32.

4.1 Extending the Attack on Unstuttered SOBER-t32 to Full SOBER-t32
In this section, the attack on unstuttered Sober-t32, described by P. Ekdahl and T. Johansson in [5], is adapted so that it also works on full Sober-t32. First an overview of the attack on unstuttered Sober-t32 is given; for a complete description we refer to [5]. Then the attack on full Sober-t32 is described.

The Attack on Unstuttered SOBER-t32. The attack starts by linearizing the equation of the NLF (5):

    v_t = s_t ⊕ s_{t+1} ⊕ s_{t+6} ⊕ s_{t+13} ⊕ s_{t+16} ⊕ w_t = Ω_t ⊕ w_t .    (11)
Then it will be argued that the noise w_t, introduced by this approximation, has a biased distribution. In the next step, a new linear recurrence is obtained by repeated squaring of the LFSR equation (4):

    s_{t+τ5} ⊕ s_{t+τ4} ⊕ s_{t+τ3} ⊕ s_{t+τ2} ⊕ s_{t+τ1} ⊕ s_t = 0 ,    (12)

with τ1 = 11, τ2 = 13, τ3 = 4·2^32 − 4, τ4 = 15·2^32 − 4 and τ5 = 17·2^32 − 4. This linear recurrence is valid for each bit position individually. Then the XOR between two adjacent bits of the stream v_t is considered:

    v_t[i] ⊕ v_t[i−1] = Ω_t[i] ⊕ Ω_t[i−1] ⊕ w_t[i] ⊕ w_t[i−1] .    (13)

The distribution F[i] of w_t[i] ⊕ w_t[i−1] is then calculated. Simulation indicates that this distribution is quite biased. The largest bias was found for the XOR of bits 29 and 30. The bias depends on the corresponding bits of K; for bits 29 and 30 it is at least ε_30 = 0.0052.
Cryptanalysis of Sober-t32
Now, given the NLF-stream v_0, v_1, ..., v_{N−1}, the linear recurrence (12) can be used to calculate

    v_{t+τ5} ⊕ v_{t+τ4} ⊕ v_{t+τ3} ⊕ v_{t+τ2} ⊕ v_{t+τ1} ⊕ v_t
        = Ω_{t+τ5} ⊕ w_{t+τ5} ⊕ Ω_{t+τ4} ⊕ w_{t+τ4} ⊕ Ω_{t+τ3} ⊕ w_{t+τ3} ⊕ Ω_{t+τ2} ⊕ w_{t+τ2} ⊕ Ω_{t+τ1} ⊕ w_{t+τ1} ⊕ Ω_t ⊕ w_t ,    (14)

where the sum of all Ω terms is zero because of (12). This equation can be rewritten as:

    v_{t+τ5} ⊕ v_{t+τ4} ⊕ v_{t+τ3} ⊕ v_{t+τ2} ⊕ v_{t+τ1} ⊕ v_t = ⊕_{j=0..5} w_{t+τj} ,    (15)

with τ0 = 0. The left-hand side of this equation is denoted V_t, the right-hand side W_t. It is now possible to calculate the following probability:

    P(V_t[i] ⊕ V_t[i−1] = 0) = P(W_t[i] ⊕ W_t[i−1] = 0) = 1/2 + 2^5 · ε_i^6 .    (16)

The final correlation probability for the six independent key stream positions can then be obtained for i = 30:

    p_0 = P(V_t[30] ⊕ V_t[29] = 0) = 1/2 + 2^5 · (0.0052)^6 ≈ 1/2 + 2^{−40.5} .    (17)
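The bias in (17), and the sample count it leads to, can be reproduced directly. The approximation of the Chernoff information by δ²/(2 ln 2) for a single bit of bias δ is my own shortcut (the standard small-bias estimate), not a formula from the paper:

```python
from math import log, log2

eps = 0.0052                 # bias of w_t[30] ^ w_t[29]
delta = 2**5 * eps**6        # bias of V_t[30] ^ V_t[29], eq. (17)
print(round(log2(delta), 1))     # -40.5

C = delta**2 / (2 * log(2))  # small-bias Chernoff estimate (assumption)
print(round(log2(C), 1))         # -81.5

N = 32 / C                   # samples for an error probability of 2^-32
print(round(log2(N), 1))         # 86.5
```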
In order to distinguish this non-uniform distribution P_0 from a uniform source P_U, the Chernoff information between the two distributions is calculated:

    C(P_0, P_U) = − min_{0≤λ≤1} log2 Σ_x P_0^λ(x) · P_U^{1−λ}(x) ≈ 2^{−81.5} .    (18)
In order to obtain an error probability of P_e = 2^{−32}, N = 2^{86.5} samples from the key stream are needed. Each sample spans τ5 = 17·2^32 ≈ 2^36 words, so in total N + τ5 ≤ 2^87 words from the NLF-stream are needed to distinguish unstuttered Sober-t32 from a uniform source.

The Attack on Full SOBER-t32. This distinguishing attack requires the words v_t, v_{t+11}, v_{t+13}, v_{t+4·2^32−4}, v_{t+15·2^32−4} and v_{t+17·2^32−4} from the NLF-stream. Our aim is to find these words in the key stream z_j, i.e., after the stuttering. One might think that the probability of guessing the right positions for the large offsets 4·2^32−4, 15·2^32−4 and 17·2^32−4 is so small that the distinguishing attack can no longer succeed. In this section we show that this is not the case. First, an expression is derived for the probability that a particular word is at its most probable position in the key stream. Take a key stream word z_i, coming from the word v_t in the NLF-stream. Then the probability is calculated that the following words appear at their most probable positions in the key stream.
The most probable position of v_{t+11} in the key stream is z_{i+6}. Simulations indicate that the probability that this guess is correct is 21.7%. The most probable position of v_{t+13} in the key stream is z_{i+7}; here the probability of a correct guess is 19.8%. For the remaining three words, the situation is more complex, and the probabilities are calculated through a theoretical deduction. By the time the n-th word enters the stuttering unit, n/25 stutter control words (SCW) can be expected to have been used. Of all the remaining (non-SCW) words, 50% are expected to go to the key stream. The most probable position in the key stream of the word v_n is thus:

    E[position(v_n)] = (n − n/25) / 2 .    (19)

In order to calculate the probability that the word v_n is at this most probable position, we first calculate the probability that the n-th SCW appears at its most probable position in the NLF-stream. This probability is easier to calculate theoretically, and it is easy to see that the asymptotic behaviour of both values is the same. Two dibits, 00 and 11, determine what happens with the next word; the two other dibits, 01 and 10, determine the stuttering of the two following words. As every dibit appears with the same probability, a dibit is expected to use 1.5 words of the NLF-stream on average. An SCW provides 16 dibits and thus uses an average of 24 words. Because the SCW itself also comes from the NLF-stream, the n-th SCW is expected to be at position 25n of the NLF-stream. The probability that the n-th SCW is indeed at this position equals the probability that n SCWs determine the stuttering of exactly 24n words of the NLF-stream. This means that half of the 16n dibits should determine the stuttering of one word, and the other half the stuttering of two words. This probability can be expressed as follows:

    P(position(SCW[n]) = 25n) = (16n choose 8n) · (1/2)^{8n+8n} = (16n)! / ((8n)!² · 2^{16n}) .    (20)

This equation can be approximated by using Stirling's formula for large factorials:

    n! ≈ √(2πn) · n^n · e^{−n} .    (21)

This gives the following equation:

    P(position(SCW[n]) = 25n) ≈ √(2π·16n) · (16n)^{16n} · e^{−16n} / ( (√(2π·8n) · (8n)^{8n} · e^{−8n})² · 2^{16n} ) = 1/√(8πn) .    (22)

Simulation shows that this equation is a very good approximation. When this line of reasoning is reversed, we see that the probability that the 25n-th word of the NLF-stream is the n-th SCW is also proportional to 1/√(8πn).
The probability for the following words to be at their most probable position in the key stream is similar. It can be written as:

    P( position(v_n) = (n − n/25)/2 ) = λ/√(8πn) ,    (23)

where λ is a constant. Simulation shows that this hypothesis is indeed correct; from the simulations it follows that λ ≈ 0.84. The probability that v_{4·2^32−4}, v_{15·2^32−4} and v_{17·2^32−4} appear in the key stream at their most probable positions can now be calculated with (23). These probabilities – given that the preceding words are present in the key stream at their most probable positions – are 2^{−17.3}, 2^{−18.0} and 2^{−16.8} respectively. The total probability p_0 can now be calculated:

    p_0 = 0.217 · 0.198 · 2^{−17.3} · 2^{−18.0} · 2^{−16.8} = 2^{−56.6} .    (24)
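The factors of (24) can be reproduced numerically. The choice of n in (23) for each conditional probability is an assumption on my part — taking n as the expected number of stutter control words in the gap g to the previously guessed word, g/25 — but it matches the three quoted values well:

```python
from math import log2, pi, sqrt

lam = 0.84
gaps = [4 * 2**32, 11 * 2**32, 2 * 2**32]   # gaps between the guessed offsets
probs = [lam / sqrt(8 * pi * (g / 25)) for g in gaps]  # hypothesis: n = g/25
print([round(log2(q), 1) for q in probs])   # near -17.3, -18.0, -16.8

p0 = 0.217 * 0.198 * probs[0] * probs[1] * probs[2]
print(round(log2(p0), 1))                   # near -56.6
```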
With probability p_0, the words v_{t+17·2^32−4}, v_{t+15·2^32−4}, v_{t+4·2^32−4}, v_{t+13} and v_{t+11} are present in the key stream at their most probable positions. They appear in the key stream XORed with 0, C or C′. This does not, however, affect the distribution: we consider the XOR of bits 29 and 30, and for C both of these bits are 1. Thus, for C, for C′ and for 0, the XOR of bits 29 and 30 is always zero, so the constant values are always eliminated. This yields the following for the calculation of the Chernoff information between the distribution of the key stream P_Y and the uniform distribution P_U:

    C(P_Y, P_U) ≈ p_0² · C(P_W, P_U) = 2^{2·(−56.6)} · 2^{−81.5} = 2^{−194.7} .    (25)
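Equation (25), and the sample count derived from it, follow from the earlier constants:

```python
from math import log2

p0 = 2**-56.6        # position-guessing probability, eq. (24)
C_W = 2**-81.5       # Chernoff information of the NLF-stream distribution

C_Y = p0**2 * C_W    # eq. (25)
print(round(log2(C_Y), 1))       # -194.7

samples = 32 / C_Y               # error probability 2^-32
print(round(log2(samples), 1))   # 199.7
```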
P_W is the distribution of the NLF-stream. For an error probability of P_e = 2^{−32} this means we need 32 · 2^{194.7} = 2^{199.7} samples. The attack requires 2^{199.7} sequences of 17·2^32 words from the key stream, i.e., a total of L = 2^{200} ≥ 2^{199.7} + 17·2^32 words.

4.2 Extending the Attack on SOBER-t16 to SOBER-t32
In [5], Ekdahl and Johansson present a distinguishing attack on full Sober-t16 (with stuttering). As the paper mentions, the proposed methods are expected to be applicable to Sober-t32 as well, but no complexity expression is given due to computational limitations. In this section, we derive the expected complexity by making a number of (realistic) probabilistic assumptions.

Distributions. The distinguishing attack in [5] starts by approximating the equation of the NLF (5) of the cipher with a linear function and analyzing the distribution of the noise w_t for different values of K:

    v_t = s_t ⊕ s_{t+1} ⊕ s_{t+6} ⊕ s_{t+13} ⊕ s_{t+16} ⊕ w_t = Ω_t ⊕ w_t .    (26)

In the case of Sober-t16, the noise w_t can take on 2^16 values and the probability of each of these values can easily be estimated by simulations (in [5],
an accurate estimation is obtained by taking 2^38 samples). A similar simulation for Sober-t32, however, would require an impractical amount of memory and a huge processing time. This motivates us to derive a complexity expression based on the average non-uniformity of w_t instead of on its full distribution. In the following, P_u(w) = p = 1/N = 2^{−32} stands for the uniform distribution, P_w(w) = p·(1 + ε_w(w)) for the noise distribution, and σ²_{εw} for the non-uniformity. To estimate the non-uniformity of the noise distribution, we simulate ε_w(w) for a limited number of values w and assume that these samples are representative of the full distribution. A first, straightforward way to estimate ε_w(w) for a given value of w would be to randomly choose n sets (s_t, s_{t+1}, s_{t+6}, s_{t+13}, s_{t+16}), compute w_t for each of them, and analyze the frequency with which w_t equals w. However, in order to speed up the convergence, we follow a somewhat different approach. We first uniformly choose n sets (s_t, s_{t+1}, s_{t+6}, s_{t+16}) and compute
    a_t = f((s_t ⊞ s_{t+16}) ⊞ s_{t+1} ⊞ s_{t+6}) ⊕ K    (27)
    b_t = s_t ⊕ s_{t+1} ⊕ s_{t+6} ⊕ s_{t+16} ,    (28)

where ⊞ denotes addition modulo 2^32. Then for each set we count the number of s_{t+13} for which

    w_t = (a_t ⊞ s_{t+13}) ⊕ (b_t ⊕ s_{t+13}) = w .    (29)

This number can be derived directly from the bits of a_t and b_t, i.e., there is no need to run through all possible values of s_{t+13}. The estimate for ε_w(w) obtained this way is expected to converge more rapidly, as each step takes into account 2^32 possible values for (s_t, s_{t+1}, s_{t+6}, s_{t+13}, s_{t+16}). If a_t and b_t were uncorrelated and uniformly distributed, one would find

    error = √( (3/4)^32 / (p·n) ) ≈ √( 1 / (2^13 · p·n) ) .    (30)

For both Sober-t16 and Sober-t32, we performed the simulations for different values of K and different sets of 256 consecutive w. From the estimated values of ε_w(w), we calculated the non-uniformity σ²_{εw}. Eventually, the minimal and average non-uniformity were found to be 2^{−10} and 2^{−8} for Sober-t16, and 2^{−9} and 2^{−7} for Sober-t32.

Combining Distributions. In the next step of the attack described in [5], different shifted versions of w_t are combined in order to eliminate the unknown LFSR words s_t:

    W_t = w_{t+17} ⊕ w_{t+15} ⊕ w_{t+4} ⊕ α·w_t .    (31)

Next, we need to calculate the non-uniformity of the full noise W_t. In Appendix C, a general expression for the non-uniformity of z = x ⊕ y given σ²_{εx} and σ²_{εy} is derived.
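The remark that the count in (29) can be read off from the bits of a_t and b_t deserves a concrete illustration. The paper does not spell out the method, so the bitwise argument below is one possible realization (shown on 8-bit words): the equation forces the addition carry into bit i to equal a_i ⊕ b_i ⊕ w_i, after which each bit position leaves 0, 1 or 2 admissible choices for the corresponding bit of s_{t+13}.

```python
import random

def count_s(a, b, w, k=8):
    """Number of s with ((a + s) mod 2^k) ^ (b ^ s) == w, via bit carries."""
    forced = [(a >> i ^ b >> i ^ w >> i) & 1 for i in range(k)]
    if forced[0] != 0:              # the carry into bit 0 is always 0
        return 0
    forced.append(None)             # carry out of the top bit: unconstrained
    total = 1
    for i in range(k):
        ai, ci, nxt = (a >> i) & 1, forced[i], forced[i + 1]
        choices = 0
        for si in (0, 1):
            carry_out = (ai & si) | (ci & (ai ^ si))   # majority(ai, si, ci)
            if nxt is None or carry_out == nxt:
                choices += 1
        total *= choices
    return total

def brute(a, b, w, k=8):
    return sum(((a + s) % (1 << k)) ^ (b ^ s) == w for s in range(1 << k))

random.seed(1)
for _ in range(50):
    a, b, w = (random.randrange(256) for _ in range(3))
    assert count_s(a, b, w) == brute(a, b, w)
print("carry counting matches brute force")
```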
Distinguishing Distributions: Chernoff Information. The number of samples required to distinguish a distribution P_x(x) from the uniform distribution is determined by the Chernoff information [5]. This quantity can easily be derived from the non-uniformity σ²_{εx}:

    − log2 Σ_x √(P_x(x) · p) = − log2 Σ_x (1/N) · √(1 + ε_x(x))    (32)
        ≈ − log2 Σ_x (1/N) · (1 + (1/2)·ε_x(x) − (1/8)·ε_x(x)²)    (33)
        = − log2 (1 − (1/8)·σ²_{εx}) ≈ σ²_{εx} / (8·ln 2) .    (34)
Complexity of the Attack on SOBER-t32. Using the formulae in the previous sections we are now able to estimate the complexity of a distinguishing attack on Sober-t32.

– From the simulated approximation of the minimal non-uniformity of the noise w_t, 2^{−9}, we can find the expected non-uniformity of the full noise W_t by using (51) and (52):

    σ²_{εW} ≈ (3² / N³) · (σ²_{εw})⁴ ≈ 2^{−130} .    (35)

To derive this result, the two halves of the expression W_t = (w_{t+17} ⊕ w_{t+15}) ⊕ (w_{t+4} ⊕ α·w_t) are considered to be sums of independent distributions.

– The stuttering adds an additional unknown constant C_t to the noise W_t. This constant can be written as C_t = c_{t+17} ⊕ c_{t+15} ⊕ c_{t+4} ⊕ α·c_t with c_t ∈ {0, C, C′}. Assuming that all 12 possible values of C_t are equiprobable (which is the worst case), we find σ²_{εC} ≈ N/12 and

    σ²_{ε(W⊕C)} ≈ (1/12) · σ²_{εW} ≈ 2^{−134} .    (36)

– As explained in [5], we only get this non-uniform distribution if the positions of the w_t in the key stream are guessed correctly. Simulations show that this happens with probability p_0 = 2^{−5.5}. The final non-uniformity is therefore reduced to:

    σ²_ε ≈ p_0² · σ²_{ε(W⊕C)} ≈ 2^{−145} .    (37)

– In order to obtain a sufficiently small error probability (say 2^{−16}), even after applying the distinguisher for all 2^32 possible values of K, we need a stream of at least 48 times the inverse of the Chernoff information. This yields the final complexity:

    48 · (8·ln 2) / σ²_ε ≈ 2^{153} .    (38)
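The chain of estimates (36)-(38) can be reproduced starting from the value ≈ 2^{−130} given in (35):

```python
from math import log, log2

sigma2_W = 2**-130                 # non-uniformity of W_t, eq. (35)
sigma2_WC = sigma2_W / 12          # after the stuttering constants, eq. (36)
print(round(log2(sigma2_WC)))      # -134

p0 = 2**-5.5                       # position-guessing probability
sigma2 = p0**2 * sigma2_WC         # eq. (37)
print(round(log2(sigma2)))         # -145

complexity = 48 * 8 * log(2) / sigma2    # eq. (38)
print(round(log2(complexity)))     # 153
```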
5 Conclusion
In this paper, some new attacks on Sober-t32 have been presented. The first attack is a guess-and-determine attack on unstuttered Sober-t32 with complexity 2^252.21. This attack is due to a probabilistic property of the t-class of stream ciphers, found in their S-box construction: the relationship between 8 bits in and 8 bits out is not diffused to other positions in the word. Even a cyclic shift at the end of the S-box would have destroyed the attack. In order to prevent similar attacks, we suggest that in word-based LFSRs the NLF should involve the whole word, and not just a part of the word as in Sober-t. Then the attacker gains nothing by guessing some bits of the words. Stuttering prevents the attack, not so much by the uncertainty it introduces as by the fact that consecutive words do not appear in the key stream. In fact, a timing attack [7] on the stuttering can reveal a long sequence of consecutive words that are not eliminated, thus enabling the GD attack described above (see [8]). Next, two ways of mounting distinguishing attacks on full Sober-t32 have been elaborated. Both attacks are based on the attacks described in [5]. The first attack is an adaptation of the attack on unstuttered Sober-t32 so that it also works on full Sober-t32; it can distinguish the Sober-t32 key stream from a uniform source with about 2^200 output words. The second attack extends the attack on full Sober-t16 to full Sober-t32; it can distinguish the Sober-t32 key stream from a uniform source with about 2^153 output words. Furthermore, these distinguishing attacks show that the stuttering cannot frustrate all attacks requiring vast amounts of key stream. The stuttering unit is, however, very expensive, as it lowers the performance of the cipher by 52%. We would thus not recommend the use of such components in stream ciphers. The attacks described are only possible in theory, but they are more efficient than exhaustive key search. This implies that Sober-t32 does not offer the security required by the NESSIE competition.
References

1. P. Hawkes and G. Rose, Primitive Specification and Supporting Documentation for Sober-t32 Submission to NESSIE, Proceedings of the First Open NESSIE Workshop, 2000.
2. New European Schemes for Signature, Integrity and Encryption, http://www.cryptonessie.org
3. D. Bleichenbacher, S. Patel and W. Meier, Analysis of the SOBER stream cipher, TIA contribution TR45.AHAG/99.08.30.12, 1999.
4. C. De Cannière, Guess and Determine Attack on SOBER, NESSIE report NES/DOC/KUL/WP5/010/a, 2001.
5. P. Ekdahl and T. Johansson, Distinguishing Attacks on Sober-t16 and t32, Fast Software Encryption 2002, LNCS 2365, J. Daemen, V. Rijmen, Eds., Springer-Verlag, pp. 210-224, 2002.
6. T. Herlestam, On Functions of Linear Shift Register Sequences, Eurocrypt '85, LNCS 219, F. Pichler, Ed., Springer-Verlag, pp. 119-129, 1985.
7. M. Schafheutle, A First Report on the Stream Ciphers Sober-t16 and Sober-t32, NESSIE document NES/DOC/SAG/WP3/025/02, NESSIE, 2001. 8. J. Lano and G. Peeters, Cryptanalyse van NESSIE kandidaten (Dutch), Master’s Thesis, K.U. Leuven, May 2002. 9. N. Courtois, A. Klimov, J. Patarin, A. Shamir, Efficient Algorithms for Solving Overdefined Systems of Multivariate Polynomial Equations, Eurocrypt 2000, LNCS 1807, B. Preneel, Ed., pp. 392-407, Springer-Verlag, 2000. 10. N. Courtois and J. Pieprzyk, Cryptanalysis of Block Ciphers with Overdefined Systems of Equations, Cryptology ePrint Archive, Report 2002/044, http://eprint.iacr.org, 2002.
A Calculating the Entropy of the Carry Bits
In the following, p_i^j (and similarly q_i^j and r_i^j) stands for the probability that the carry into the i-th bit equals j. The bits are numbered from least to most significant: bit 0 is the least significant bit, bit 31 the most significant. The notations p_i^j, q_i^j and r_i^j distinguish the three scenarios discussed below.

– Two random 32-bit words are added. One can see that p_1^0 equals 3/4 and that p_1^1 equals 1/4. The values of the subsequent carry bits can be obtained with the following recursion:

    p_{i+1}^0 = (3/4)·p_i^0 + (1/4)·p_i^1
    p_{i+1}^1 = (1/4)·p_i^0 + (3/4)·p_i^1    (39)

The values of p_i^0 and p_i^1 can now be calculated for all i. Both values converge rapidly to 1/2. For i = 24, the carry bit of interest here, this is a very good approximation. The entropy H of this carry value is thus:

    H = − Σ_j p^j · log2(p^j) = − (1/2)·log2(1/2) − (1/2)·log2(1/2) = 1 .    (40)
The carry bit o1 corresponds to this case. The entropy of the carry bit o1 is 1.

– Now three words are added. The carry value can now be 0, 1 or 2. The probabilities for the first carry value are q_1^0 = 1/2, q_1^1 = 1/2 and q_1^2 = 0, and the recursion formula is:

    q_{i+1}^0 = (1/2)·q_i^0 + (1/8)·q_i^1
    q_{i+1}^1 = (1/2)·q_i^0 + (3/4)·q_i^1 + (1/2)·q_i^2    (41)
    q_{i+1}^2 = (1/8)·q_i^1 + (1/2)·q_i^2

For increasing i, the values converge rapidly towards q^0 = 1/6, q^1 = 2/3 and q^2 = 1/6. The entropy H thus converges towards the value:

    H = − Σ_j q^j · log2(q^j) = − (1/6)·log2(1/6) − (2/3)·log2(2/3) − (1/6)·log2(1/6) = 1.25 .    (42)
– In the third scenario, we consider the carry value of the sum ((x+y) ⊕ C) + z + u. As an extra constraint, all bits of C that are more significant (i.e., more to the left) than the carry position considered must be zero. This case is a combination of the two previous ones: the total carry value is the sum of the carry value from the addition of two words (x and y) and the carry value from the addition of three words ((x+y) ⊕ C, z and u). It is then easy to see that the probabilities converge towards the following values:

    r^0 = p^0·q^0 = (1/2)·(1/6) = 1/12
    r^1 = p^0·q^1 + p^1·q^0 = (1/2)·(2/3) + (1/2)·(1/6) = 5/12
    r^2 = p^0·q^2 + p^1·q^1 = (1/2)·(1/6) + (1/2)·(2/3) = 5/12    (43)
    r^3 = p^1·q^2 = (1/2)·(1/6) = 1/12

The entropy H thus converges towards the following value:

    H = − Σ_j r^j · log2(r^j)
      = − (1/12)·log2(1/12) − (5/12)·log2(5/12) − (5/12)·log2(5/12) − (1/12)·log2(1/12) = 1.65 .    (44)

The carry value o2 corresponds to this case. The entropy of the carry value o2 is 1.65.
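All three entropies can be verified by iterating the recursions (39) and (41) and applying (43):

```python
from math import log2

def entropy(dist):
    return -sum(q * log2(q) for q in dist if q > 0)

# two-word addition, recursion (39), starting at bit 1
p = [3 / 4, 1 / 4]
for _ in range(23):                       # up to the carry into bit 24
    p = [3 / 4 * p[0] + 1 / 4 * p[1],
         1 / 4 * p[0] + 3 / 4 * p[1]]

# three-word addition, recursion (41)
q = [1 / 2, 1 / 2, 0.0]
for _ in range(23):
    q = [q[0] / 2 + q[1] / 8,
         q[0] / 2 + 3 * q[1] / 4 + q[2] / 2,
         q[1] / 8 + q[2] / 2]

# combined case, eq. (43)
r = [p[0] * q[0], p[0] * q[1] + p[1] * q[0],
     p[0] * q[2] + p[1] * q[1], p[1] * q[2]]

print(round(entropy(p), 2), round(entropy(q), 2), round(entropy(r), 2))
# 1.0 1.25 1.65
```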
B Using Multivariate Quadratic Equations

By knowing the MSBs of the words of the LFSR, we have eliminated the main non-linearity in the algorithm. A consequence is that equation (9) in the least significant bit is completely linear, a fact that has been exploited above. For the 23 other bits, the equations are similar, but some carry bits appear in them. This is the only (small) non-linearity in the system. These carry bits do not represent new unknowns, as they can be written as products of bits. We can write the following bitwise equations for each v_t^i, for i from 1 to 23 (and similarly for v_{t+1}, ..., v_{t+k}):

    v_t^i = SBOX^i + s_t^i + s_{t+16}^i + ca_t^i + s_{t+1}^i + s_{t+6}^i + cb_t^i + K^i + s_{t+13}^i + cc_t^i
    ca_t^i = s_t^{i−1} · s_{t+16}^{i−1} + s_t^{i−1} · ca_t^{i−1} + s_{t+16}^{i−1} · ca_t^{i−1}    (45)
    cb_t^i = ...
    cc_t^i = ...
We have introduced 3·23·(k+1) new unknowns and 4·23·(k+1) new equations. In order to have enough equations we need:

    33·k + 1 + 4·23·(k+1) ≥ 24·(17+k) + 24 + 3·23·(k+1)  ⟺  k ≥ 12.75 .    (46)

So we can now consider any number of iterations starting from 13. The more iterations we use, the more overdefined our system becomes. The so-called XL [9] and XSL [10] techniques can be used to solve this system, which may lead to a more efficient attack. However, the complexity of these algorithms is not well understood and may be clarified in future research.
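The threshold in (46) can be checked by counting equations and unknowns directly:

```python
def equations(k):
    return 33 * k + 1 + 4 * 23 * (k + 1)

def unknowns(k):
    return 24 * (17 + k) + 24 + 3 * 23 * (k + 1)

k_min = next(k for k in range(1, 100) if equations(k) >= unknowns(k))
print(k_min)   # 13, the smallest integer above 12.75
```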
C Calculating the Non-uniformity of the Sum of Distributions

C.1 The Sum of Two Independent Distributions
Let x and y be drawn from two independent distributions with non-uniformity σ²_{εx} and σ²_{εy}:

    P_x(x) = p · (1 + ε_x(x))    (47)
    P_y(y) = p · (1 + ε_y(y)) .    (48)

The distribution of z = x ⊕ y can be written as:

    P_z(z) = Σ_{x⊕y=z} P_x(x) · P_y(y)    (49)
           = p · (1 + ε_z(z)) .    (50)

Exploiting the fact that the sum of all ε equals zero and that the distributions of x and y are independent, we obtain:

    E[σ²_{εz}] = (1/N) · E[ Σ_z ε_z(z)² ]
               = (1/N) · E[ Σ_z ( p⁻¹ · Σ_{x⊕y=z} P_x(x)·P_y(y) − 1 )² ]
               = (1/N) · E[ Σ_z ( (1/N)·Σ_x ε_x(x) + (1/N)·Σ_y ε_y(y) + (1/N)·Σ_{x⊕y=z} ε_x(x)·ε_y(y) )² ]
               = (1/N) · E[ Σ_z ( (1/N)·Σ_{x⊕y=z} ε_x(x)·ε_y(y) )² ]
               = (1/N) · Σ_d E[ (1/N)·Σ_x ε_x(x)·ε_x(x⊕d) ] · E[ (1/N)·Σ_y ε_y(y)·ε_y(y⊕d) ] .
To calculate the expression between the square brackets we distinguish the cases d = 0 and d ≠ 0. When d = 0, we have E[(1/N)·Σ_x ε_x(x)·ε_x(x⊕0)] = σ²_{εx}. In all other cases we may assume that the expected value of ε_x(x)·ε_x(x⊕d), over all possible distributions with non-uniformity σ²_{εx}, is independent of d and equal to −σ²_{εx}/(N−1) (because ε_x sums to zero). The same applies for y, and hence:

    E[σ²_{εz}] = (1/N) · [ σ²_{εx}·σ²_{εy} + (N−1) · (−σ²_{εx}/(N−1)) · (−σ²_{εy}/(N−1)) ]
               = σ²_{εx}·σ²_{εy} / (N−1) .    (51)
C.2 The Sum of Two Identical Distributions

A similar expression can be derived for the non-uniformity of z = x ⊕ x′ with x and x′ drawn from a single distribution:

    E[σ²_{εz}] = (1/N) · E[ Σ_z ε_z(z)² ]
               = (1/N) · E[ Σ_z ( p⁻¹ · Σ_{x⊕x′=z} P_x(x)·P_x(x′) − 1 )² ]
               = (1/N) · E[ Σ_z ( (2/N)·Σ_x ε_x(x) + (1/N)·Σ_{x⊕x′=z} ε_x(x)·ε_x(x′) )² ]
               = (1/N) · E[ Σ_z ( (1/N)·Σ_{x⊕x′=z} ε_x(x)·ε_x(x′) )² ]
               = (1/N) · E[ (1/N²) · Σ_z Σ_{x⊕x′=z} Σ_{x″⊕x‴=z} ε_x(x)·ε_x(x′)·ε_x(x″)·ε_x(x‴) ]
               = (1/N) · [ τ⁴_{εx}/N + 3·(N·σ⁴_{εx} − τ⁴_{εx})/N + (3·N·σ⁴_{εx} − 6·τ⁴_{εx})/(N·(N−3)) ]
               = (1/(N·(N−3))) · ( 3·(N−2)·σ⁴_{εx} − 2·τ⁴_{εx} ) ,    (52)

with

    τ⁴_{εx} = (1/N) · Σ_x ε_x(x)⁴    (53)
            = O(σ⁴_{εx}) .    (54)
OMAC: One-Key CBC MAC

Tetsu Iwata and Kaoru Kurosawa

Department of Computer and Information Sciences, Ibaraki University
4-12-1 Nakanarusawa, Hitachi, Ibaraki 316-8511, Japan
{iwata,kurosawa}@cis.ibaraki.ac.jp
Abstract. In this paper, we present One-key CBC MAC (OMAC) and prove its security for arbitrary length messages. OMAC takes only one key, K (k bits), of a block cipher E. Previously, XCBC requires three keys, (k + 2n) bits in total, and TMAC requires two keys, (k + n) bits in total, where n denotes the block length of E. The saving of the key length makes the security proof of OMAC substantially harder than those of XCBC and TMAC.
Keywords: CBC MAC, block cipher, provable security
1 Introduction

1.1 Background
The CBC MAC [6, 7] is a well-known method to generate a message authentication code (MAC) based on a block cipher. Bellare, Kilian, and Rogaway proved the security of the CBC MAC for fixed message length mn bits, where n is the block length of the underlying block cipher E [1]. However, it is well known that the CBC MAC is not secure unless the message length is fixed. Therefore, several variants of the CBC MAC have been proposed for variable length messages. First, Encrypted MAC (EMAC) was proposed. It is obtained by encrypting the CBC MAC value by E again with a new key K2. That is,

    EMAC_{K1,K2}(M) = E_{K2}(CBC_{K1}(M)) ,

where M is a message, K1 is the key of the CBC MAC and CBC_{K1}(M) is the CBC MAC value of M [2]. Petrank and Rackoff then proved that EMAC is secure if the message length is a positive multiple of n [11] (Vaudenay showed another proof by using decorrelation theory [14]). Note, however, that EMAC requires two key schedulings of the underlying block cipher E. Next, Black and Rogaway proposed XCBC, which requires only one key scheduling of the underlying block cipher E [3]. XCBC takes three keys: one block cipher key K1, and two n-bit keys K2 and K3. XCBC is described as follows (see Fig. 1).

T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 129–153, 2003.
© International Association for Cryptologic Research 2003
Fig. 1. Illustration of XCBC.
– If |M| = mn for some m > 0, then XCBC computes exactly the same as the CBC MAC, except for XORing an n-bit key K2 before encrypting the last block.
– Otherwise, 10^i padding (i = n − 1 − |M| mod n) is appended to M and XCBC computes exactly the same as the CBC MAC for the padded message, except for XORing another n-bit key K3 before encrypting the last block.

A drawback of XCBC, however, is that it requires three keys, (k + 2n) bits in total. Finally, Kurosawa and Iwata proposed Two-key CBC MAC (TMAC) [9]. TMAC takes two keys, (k + n) bits in total: a block cipher key K1 and an n-bit key K2. TMAC is obtained from XCBC by replacing (K2, K3) with (K2·u, K2), where u is some non-zero constant and "·" denotes multiplication in GF(2^n).

1.2 Our Contribution
In this paper, we present One-key CBC MAC (OMAC) and prove its security for arbitrary length messages. OMAC takes only one key, K, of a block cipher E. The key length, k bits, is the minimum, because the underlying block cipher must have a k-bit key K anyway. See Table 1 for a comparison with XCBC and TMAC (see Appendix A for a detailed comparison).

Table 1. Comparison of key length.

                  XCBC [3]        TMAC [9]       OMAC (this paper)
    key length    (k + 2n) bits   (k + n) bits   k bits

OMAC is a generic name for OMAC1 and OMAC2. OMAC1 is obtained from XCBC by replacing (K2, K3) with (L·u, L·u²) for some non-zero constant u in GF(2^n), where L is given by L = E_K(0^n). OMAC2 is similarly obtained by using (L·u, L·u⁻¹). We can compute L·u, L·u⁻¹ and L·u² = (L·u)·u efficiently by one shift and one conditional XOR from L, L and L·u, respectively. OMAC1 (resp. OMAC2) is described as follows (see Fig. 2).
Fig. 2. Illustration of OMAC1. Note that L = E_K(0^n). OMAC2 is obtained by replacing L·u² with L·u⁻¹ in the right figure.
– If |M| = mn for some m > 0, then OMAC computes exactly the same as the CBC MAC, except for XORing L·u before encrypting the last block.
– Otherwise, 10^i padding (i = n − 1 − |M| mod n) is appended to M and OMAC computes exactly the same as the CBC MAC for the padded message, except for XORing L·u² (resp. L·u⁻¹) before encrypting the last block.

Note that in TMAC, K2 is a part of the key, while in OMAC, L is not a part of the key and is generated from K. This saving of the key length makes the security proof of OMAC substantially harder than that of TMAC, as shown below. In Fig. 2, suppose that M[1] = 0^n. Then the output of the first E_K is L. The same L always appears again at the last block. In general, such reuse of L would get one into trouble in the security proof. (In OCB mode [13] and PMAC [5], L = E_K(0^n) is also used as a key of a universal hash function. However, there L appears as an output of some internal block cipher only with negligible probability.) Nevertheless, we prove that OMAC is as secure as XCBC, where the security analysis is in the concrete-security paradigm [1]. Further, OMAC has all the other nice properties which XCBC (and TMAC) has: the domain of OMAC is {0, 1}*, it requires one key scheduling of the underlying block cipher E, and it uses max{1, ⌈|M|/n⌉} block cipher invocations.

1.3 Other Related Work
Jaulmes, Joux and Valette proposed RMAC [8], which is an extension of EMAC. RMAC encrypts the CBC MAC value with K2 ⊕ R, where R is an n-bit random string which is a part of the tag. That is,

    RMAC_{K1,K2}(M) = (E_{K2⊕R}(CBC_{K1}(M)), R) .

They showed that the security of RMAC is beyond the birthday paradox limit. (XCBC, TMAC and OMAC are secure up to the birthday paradox limit.)
2 Preliminaries

2.1 Notation

We use similar notation as in [13, 5]. For a set A, x ←R A means that x is chosen from A uniformly at random. If a, b ∈ {0, 1}* are equal-length strings
then a ⊕ b is their bitwise XOR. If a, b ∈ {0, 1}* are strings then a ◦ b denotes their concatenation. For simplicity, we sometimes write ab for a ◦ b if there is no confusion. For an n-bit string a = a_{n−1}···a_1a_0 ∈ {0, 1}^n, let a<<1 = a_{n−2}···a_1a_0 0 denote the n-bit string which is a left shift of a by 1 bit, while a>>1 = 0 a_{n−1}···a_2a_1 denotes the n-bit string which is a right shift of a by 1 bit. If a ∈ {0, 1}* is a string then |a| denotes its length in bits. For any bit string a ∈ {0, 1}* such that |a| ≤ n, we let

    pad_n(a) = a10^{n−|a|−1}  if |a| < n,  and  pad_n(a) = a  if |a| = n.    (1)

Define ||a||_n = max{1, ⌈|a|/n⌉}, where the empty string counts as one block. In pseudocode, we write "Partition M into M[1]···M[m]" as shorthand for "Let m = ||M||_n, and let M[1], ..., M[m] be bit strings such that M[1]···M[m] = M and |M[i]| = n for 1 ≤ i < m."
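The padding and block-partition conventions can be sketched in a few lines (bit strings modelled as Python strings of '0'/'1'):

```python
def pad_n(a, n):
    assert len(a) <= n
    return a if len(a) == n else a + "1" + "0" * (n - len(a) - 1)

def partition(M, n):
    m = max(1, -(-len(M) // n))     # ||M||_n; the empty string is one block
    return [M[i * n:(i + 1) * n] for i in range(m)]

print(pad_n("101", 8))              # 10110000
print(partition("1" * 20, 8))       # blocks of lengths 8, 8 and 4
```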
2.2 CBC MAC
The block cipher E is a function E : K_E × {0,1}^n → {0,1}^n, where each E(K, ·) = E_K(·) is a permutation on {0,1}^n, K_E is the set of possible keys, and n is the block length. The CBC MAC [6, 7] is the simplest and most well-known algorithm to make a MAC from a block cipher E. Let M = M[1] ◦ M[2] ◦ ··· ◦ M[m] be a message string, where |M[1]| = |M[2]| = ··· = |M[m]| = n. Then CBC_K(M), the CBC MAC of M under key K, is defined as Y[m], where Y[i] = E_K(M[i] ⊕ Y[i−1]) for i = 1, ..., m and Y[0] = 0^n. Bellare, Kilian and Rogaway proved the security of the CBC MAC for fixed message length mn bits [1].
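The CBC MAC definition is short enough to sketch directly. The block cipher E_K below is only a stand-in (an affine permutation on n-bit integers, with no cryptographic strength); in practice E would be a real block cipher:

```python
N_BITS = 128
MASK = (1 << N_BITS) - 1

def toy_E(K, x):                   # stand-in permutation, NOT a real cipher
    return ((2 * K + 1) * x + K) & MASK

def cbc_mac(K, blocks):            # blocks: list of n-bit integers
    Y = 0                          # Y[0] = 0^n
    for M_i in blocks:
        Y = toy_E(K, M_i ^ Y)      # Y[i] = E_K(M[i] xor Y[i-1])
    return Y

tag = cbc_mac(0x1234, [1, 2, 3])
print(hex(tag))
```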
2.3 The Field with 2^n Points
We interchangeably think of a point a in GF(2^n) in any of the following ways: (1) as an abstract point in a field; (2) as an n-bit string a_{n−1}···a_1a_0 ∈ {0,1}^n; (3) as a formal polynomial a(u) = a_{n−1}u^{n−1} + ··· + a_1u + a_0 with binary coefficients. To add two points in GF(2^n), take their bitwise XOR. We denote this operation by a ⊕ b. To multiply two points, fix some irreducible polynomial f(u) having binary coefficients and degree n. To be concrete, choose the lexicographically first polynomial among the irreducible degree-n polynomials having a minimum number of coefficients. We list the indicated polynomials (see [10, Chapter 10] for other polynomials):

    f(u) = u^64 + u^4 + u^3 + u + 1       for n = 64,
    f(u) = u^128 + u^7 + u^2 + u + 1      for n = 128, and
    f(u) = u^256 + u^10 + u^5 + u^2 + 1   for n = 256.
To multiply two points a ∈ GF(2^n) and b ∈ GF(2^n), regard a and b as polynomials a(u) = a_{n−1}u^{n−1} + ··· + a_1u + a_0 and b(u) = b_{n−1}u^{n−1} + ··· + b_1u + b_0, form their product c(u), where one adds and multiplies coefficients in GF(2), and take the remainder when dividing c(u) by f(u). Note that it is particularly easy to multiply a point a ∈ {0,1}^n by u. For example, if n = 128,

    a·u = a<<1                           if a_{127} = 0,
    a·u = (a<<1) ⊕ 0^{120}10000111       otherwise.    (2)

Also, note that it is easy to divide a point a ∈ {0,1}^n by u, meaning that one multiplies a by the multiplicative inverse of u in the field: a·u⁻¹. For example, if n = 128,

    a·u⁻¹ = a>>1                         if a_0 = 0,
    a·u⁻¹ = (a>>1) ⊕ 10^{120}1000011     otherwise.    (3)
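Equations (2) and (3) translate directly into integer arithmetic for n = 128, and multiplying by u then dividing by u round-trips:

```python
import random

N = 128
MASK = (1 << N) - 1
R = 0x87                        # 0^120 10000111, from eq. (2)
RINV = (1 << 127) | 0x43        # 1 0^120 1000011, from eq. (3)

def mul_u(a):
    doubled = (a << 1) & MASK
    return doubled ^ R if a >> (N - 1) else doubled

def div_u(a):
    halved = a >> 1
    return halved ^ RINV if a & 1 else halved

random.seed(0)
for _ in range(100):
    a = random.getrandbits(N)
    assert div_u(mul_u(a)) == a and mul_u(div_u(a)) == a
print("multiplication and division by u are inverse")
```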
3 Basic Construction
In this section, we show a basic construction of OMAC-family. OMAC-family is defined by a block cipher E : K_E × {0,1}^n → {0,1}^n, an n-bit constant Cst, a universal hash function H : {0,1}^n × X → {0,1}^n, and two distinct constants Cst1, Cst2 ∈ X, where X is the finite domain of H. H, Cst1 and Cst2 must satisfy the following conditions, while Cst is arbitrary. We write H_L(·) for H(L, ·).

1. For any y ∈ {0,1}^n, the number of L ∈ {0,1}^n such that H_L(Cst1) = y is at most ε_1·2^n for some sufficiently small ε_1.
2. For any y ∈ {0,1}^n, the number of L ∈ {0,1}^n such that H_L(Cst2) = y is at most ε_2·2^n for some sufficiently small ε_2.
3. For any y ∈ {0,1}^n, the number of L ∈ {0,1}^n such that H_L(Cst1) ⊕ H_L(Cst2) = y is at most ε_3·2^n for some sufficiently small ε_3.
4. For any y ∈ {0,1}^n, the number of L ∈ {0,1}^n such that H_L(Cst1) ⊕ L = y is at most ε_4·2^n for some sufficiently small ε_4.
5. For any y ∈ {0,1}^n, the number of L ∈ {0,1}^n such that H_L(Cst2) ⊕ L = y is at most ε_5·2^n for some sufficiently small ε_5.
6. For any y ∈ {0,1}^n, the number of L ∈ {0,1}^n such that H_L(Cst1) ⊕ H_L(Cst2) ⊕ L = y is at most ε_6·2^n for some sufficiently small ε_6.

Remark 1. Properties 1 and 2 say that H_L(Cst1) and H_L(Cst2) are almost uniformly distributed. Property 3 is satisfied by AXU (almost XOR universal) hash functions [12]. Properties 4, 5 and 6 are new requirements introduced here.

The algorithm of OMAC-family is described in Fig. 3 and illustrated in Fig. 4, where pad_n(·) is defined in (1). The key space K of OMAC-family is K = K_E. It takes a key K ∈ K_E and a message M ∈ {0,1}*, and returns a string in {0,1}^n.
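Conditions 1-6 can be illustrated concretely for H_L(x) = L·x with Cst1 = u and Cst2 = u² (the OMAC1 choice from Sect. 4): each of the six maps is multiplication of L by a fixed non-zero field element, hence a bijection, so every y has exactly one preimage. The check below uses GF(2^8) with the polynomial u^8 + u^4 + u^3 + u + 1 instead of GF(2^128), purely to keep the exhaustive loop small (an assumption for illustration; the argument is identical for n = 128):

```python
def mul(a, b):                 # carry-less multiply mod u^8+u^4+u^3+u+1
    r = 0
    for i in range(8):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(15, 7, -1): # reduce the degrees 15..8
        if (r >> i) & 1:
            r ^= 0b100011011 << (i - 8)
    return r

u = 0b10
u2 = mul(u, u)
maps = [
    lambda L: mul(L, u),                    # condition 1: H_L(Cst1)
    lambda L: mul(L, u2),                   # condition 2: H_L(Cst2)
    lambda L: mul(L, u) ^ mul(L, u2),       # condition 3
    lambda L: mul(L, u) ^ L,                # condition 4
    lambda L: mul(L, u2) ^ L,               # condition 5
    lambda L: mul(L, u) ^ mul(L, u2) ^ L,   # condition 6
]
for f in maps:
    assert len({f(L) for L in range(256)}) == 256    # a bijection
print("each condition holds with epsilon = 2^-8")
```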
Tetsu Iwata and Kaoru Kurosawa

Algorithm OMAC-family_K(M)
    L ← EK(Cst)
    Y[0] ← 0^n
    Partition M into M[1] ··· M[m]
    for i ← 1 to m − 1 do
        X[i] ← M[i] ⊕ Y[i − 1]
        Y[i] ← EK(X[i])
    X[m] ← padn(M[m]) ⊕ Y[m − 1]
    if |M[m]| = n then X[m] ← X[m] ⊕ HL(Cst1)
    else X[m] ← X[m] ⊕ HL(Cst2)
    T ← EK(X[m])
    return T

Fig. 3. Definition of OMAC-family.
[Fig. 4. Illustration of OMAC-family: the CBC chain M[1], M[2], M[3] through EK, with HL(Cst1) XORed before the last encryption when |M[m]| = n, and HL(Cst2) together with 10^i padding otherwise, producing the tag T.]
4 Proposed Specification
In this section, we present two specifications of the OMAC-family: OMAC1 and OMAC2. We use OMAC as a generic name for OMAC1 and OMAC2. In OMAC1 we let Cst = 0^n, HL(x) = L · x, Cst1 = u and Cst2 = u^2, where "·" denotes multiplication over GF(2^n). Equivalently, L = EK(0^n), HL(Cst1) = L · u and HL(Cst2) = L · u^2. OMAC2 is the same as OMAC1 except that Cst2 = u^{-1} instead of Cst2 = u^2. Equivalently, L = EK(0^n), HL(Cst1) = L · u and HL(Cst2) = L · u^{-1}. Note that L · u, L · u^{-1} and L · u^2 = (L · u) · u can each be computed efficiently by one shift and one conditional XOR from L, L and L · u, respectively, as shown in (2) and (3). It is easy to see that the conditions in Sec. 3 are satisfied for ε1 = ··· = ε6 = 2^{-n} in OMAC1 and OMAC2. OMAC1 and OMAC2 are described in Fig. 5 and illustrated in Fig. 2.
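As an illustration of the control flow of Fig. 5, the following runnable sketch implements OMAC1 under one loud assumption: a toy invertible permutation stands in for the block cipher EK (a real instantiation would use AES). The helper names toy_E, mul_u and omac1 are ours, not the paper's.

```python
# Sketch of OMAC1 tag generation (Fig. 5), n = 128.
# ASSUMPTION: toy_E is NOT a secure cipher -- it is an invertible toy
# permutation on 128-bit integers, used only to exercise the structure.

MASK = (1 << 128) - 1

def toy_E(key: int, x: int) -> int:
    """Stand-in for E_K: each round is an invertible add/rotate/xor."""
    for r in range(4):
        x = (x + key + r) & MASK                # invertible addition
        x = ((x << 13) | (x >> 115)) & MASK     # 128-bit rotate left
        x ^= key                                # invertible xor
    return x

def mul_u(a: int) -> int:
    """Multiply by u in GF(2^128), as in equation (2)."""
    out = (a << 1) & MASK
    return out ^ 0b10000111 if (a >> 127) & 1 else out

def omac1(key: int, msg: bytes) -> int:
    L = toy_E(key, 0)                           # L <- E_K(0^n)
    blocks = [msg[i:i + 16] for i in range(0, len(msg), 16)] or [b""]
    y = 0                                       # Y[0] <- 0^n
    for b in blocks[:-1]:                       # CBC chain over M[1..m-1]
        y = toy_E(key, int.from_bytes(b, "big") ^ y)
    last = blocks[-1]
    if len(last) == 16:                         # |M[m]| = n: mask with L*u
        x = int.from_bytes(last, "big") ^ y ^ mul_u(L)
    else:                                       # pad 10^i, mask with L*u^2
        padded = last + b"\x80" + b"\x00" * (15 - len(last))
        x = int.from_bytes(padded, "big") ^ y ^ mul_u(mul_u(L))
    return toy_E(key, x)                        # T <- E_K(X[m])
```

OMAC2 would differ only in the padded branch, XORing L · u^{-1} (computed by one right shift from L, as in (3)) instead of L · u^2.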
5 Security of OMAC-Family

5.1 Security Definitions
Let Perm(n) denote the set of all permutations on {0,1}^n. We say that P is a random permutation if P is randomly chosen from Perm(n).

The security of a block cipher E can be quantified as Adv^prp_E(t, q), the maximum advantage that an adversary A can obtain when trying to distinguish EK(·) (with a randomly chosen key K) from a random permutation P(·), when allowed computation time t and q queries to an oracle (which is either EK(·) or P(·)). This advantage is defined as follows.

    Adv^prp_E(A)    def=  | Pr(K ←R K_E : A^{EK(·)} = 1) − Pr(P ←R Perm(n) : A^{P(·)} = 1) |
    Adv^prp_E(t, q) def=  max_A { Adv^prp_E(A) }

OMAC: One-Key CBC MAC

Algorithm OMAC1_K(M)                        Algorithm OMAC2_K(M)
    L ← EK(0^n)                                 L ← EK(0^n)
    Y[0] ← 0^n                                  Y[0] ← 0^n
    Partition M into M[1] ··· M[m]              Partition M into M[1] ··· M[m]
    for i ← 1 to m − 1 do                       for i ← 1 to m − 1 do
        X[i] ← M[i] ⊕ Y[i − 1]                      X[i] ← M[i] ⊕ Y[i − 1]
        Y[i] ← EK(X[i])                             Y[i] ← EK(X[i])
    X[m] ← padn(M[m]) ⊕ Y[m − 1]                X[m] ← padn(M[m]) ⊕ Y[m − 1]
    if |M[m]| = n then X[m] ← X[m] ⊕ L · u      if |M[m]| = n then X[m] ← X[m] ⊕ L · u
    else X[m] ← X[m] ⊕ L · u^2                  else X[m] ← X[m] ⊕ L · u^{-1}
    T ← EK(X[m])                                T ← EK(X[m])
    return T                                    return T

Fig. 5. Description of OMAC1 and OMAC2.
We say that a block cipher E is secure if Adv^prp_E(t, q) is sufficiently small.

Similarly, a MAC algorithm is a map F : K_F × {0,1}^* → {0,1}^n, where K_F is a set of keys and we write FK(·) for F(K, ·). We say that an adversary A^{FK(·)} forges if A outputs (M, FK(M)) where A never queried M to its oracle FK(·). Then we define the advantage as

    Adv^mac_F(A)       def=  Pr(K ←R K_F : A^{FK(·)} forges)
    Adv^mac_F(t, q, µ) def=  max_A { Adv^mac_F(A) }

where the maximum is over all adversaries who run in time at most t, make at most q queries, and each query is at most µ bits. We say that a MAC algorithm is secure if Adv^mac_F(t, q, µ) is sufficiently small.

Let Rand(∗, n) denote the set of all functions from {0,1}^* to {0,1}^n. This set is given a probability measure by asserting that a random element R of Rand(∗, n) associates to each string M ∈ {0,1}^* a random string R(M) ∈ {0,1}^n. Then we define the advantage as

    Adv^viprf_F(A)       def=  | Pr(K ←R K_F : A^{FK(·)} = 1) − Pr(R ←R Rand(∗, n) : A^{R(·)} = 1) |
    Adv^viprf_F(t, q, µ) def=  max_A { Adv^viprf_F(A) }

where the maximum is over all adversaries who run in time at most t, make at most q queries, and each query is at most µ bits. We say that a MAC algorithm
is pseudorandom if Adv^viprf_F(t, q, µ) is sufficiently small (viprf stands for Variable-length Input PseudoRandom Function). Without loss of generality, adversaries are assumed to never ask a query outside the domain of the oracle, and to never repeat a query.

5.2 Theorem Statements
We first prove that the OMAC-family is pseudorandom if the underlying block cipher is a random permutation P (an information-theoretic result). This proof is much harder than in previous works because of the reuse of L, as explained in Sec. 1.2.

Lemma 1 (Main Lemma for OMAC-Family). Suppose that H, Cst1 and Cst2 satisfy the conditions in Sec. 3 for some sufficiently small ε1, ..., ε6, and let Cst be an arbitrary n-bit constant. Suppose that a random permutation P ∈ Perm(n) is used in the OMAC-family as the underlying block cipher. Let A be an adversary which asks at most q queries, and each query is at most nm bits (m is the maximum number of blocks in each query). Assume m ≤ 2^n/4. Then

    | Pr(P ←R Perm(n) : A^{OMAC-family_P(·)} = 1)
      − Pr(R ←R Rand(∗, n) : A^{R(·)} = 1) |  ≤  (q²/2) · (3m²·ε + (7m² + 2)/2^n),    (4)

where ε = max{ε1, ..., ε6}. A proof is given in the next section.

The following results hold for both OMAC1 and OMAC2. First, we obtain the following lemma by substituting ε = 2^{-n} in Lemma 1.

Lemma 2 (Main Lemma for OMAC). Suppose that a random permutation P ∈ Perm(n) is used in OMAC as the underlying block cipher. Let A be an adversary which asks at most q queries, and each query is at most nm bits. Assume m ≤ 2^n/4. Then

    | Pr(P ←R Perm(n) : A^{OMAC_P(·)} = 1) − Pr(R ←R Rand(∗, n) : A^{R(·)} = 1) |  ≤  (5m² + 1)q²/2^n.

We next show that OMAC is pseudorandom if the underlying block cipher E is secure. It is standard to pass to this complexity-theoretic result from Lemma 2. (For example, see [1, Section 3.2] for the proof technique. There it is shown that a complexity-theoretic advantage of the CBC MAC is obtained from its information-theoretic advantage.)

Corollary 1. Let E : K_E × {0,1}^n → {0,1}^n be the underlying block cipher used in OMAC. Then

    Adv^viprf_OMAC(t, q, nm)  ≤  (5m² + 1)q²/2^n + Adv^prp_E(t′, q′),

where t′ = t + O(mq) and q′ = mq + 1.
[Fig. 6. Illustrations of Q1, Q2, Q3, Q4, Q5 and Q6. Note that L = P(Cst).]
Finally, we show that OMAC is secure as a MAC algorithm, using Corollary 1 in the usual way. (For example, see [1, Proposition 2.7] for the proof technique. There it is shown that pseudorandom functions are secure MACs.)

Theorem 1. Let E : K_E × {0,1}^n → {0,1}^n be the underlying block cipher used in OMAC. Then

    Adv^mac_OMAC(t, q, nm)  ≤  ((5m² + 1)q² + 1)/2^n + Adv^prp_E(t′, q′),

where t′ = t + O(mq) and q′ = mq + 1.

5.3 Proof of Main Lemma for OMAC-Family
Let H, Cst1 and Cst2 satisfy the conditions in Sec. 3 for some sufficiently small ε1, ..., ε6, and let Cst be an arbitrary n-bit constant. For a random permutation P ∈ Perm(n) and a random n-bit string Rnd ∈ {0,1}^n, define

    Q1(x) def= P(x) ⊕ Rnd,                   Q2(x) def= P(x ⊕ Rnd) ⊕ Rnd,
    Q3(x) def= P(x ⊕ Rnd ⊕ HL(Cst1)),        Q4(x) def= P(x ⊕ Rnd ⊕ HL(Cst2)),    (5)
    Q5(x) def= P(x ⊕ HL(Cst1))   and         Q6(x) def= P(x ⊕ HL(Cst2)),

where L = P(Cst). See Fig. 6 for illustrations. We first show that Q1(·), ..., Q6(·) are indistinguishable from six independent random permutations P1(·), ..., P6(·).

Lemma 3. Let A be an adversary which asks at most q queries in total. Then

    | Pr(P ←R Perm(n); Rnd ←R {0,1}^n : A^{Q1(·),...,Q6(·)} = 1)
      − Pr(P1, ..., P6 ←R Perm(n) : A^{P1(·),...,P6(·)} = 1) |  ≤  (3q²/2) · (ε + 1/2^n),

where ε = max{ε1, ..., ε6}. A proof is given in Appendix B.
Algorithm MOMAC_{P1,...,P6}(M)
    Partition M into M[1] ··· M[m]
    if m ≥ 2 then
        X[1] ← M[1]
        Y[1] ← P1(X[1])
        for i ← 2 to m − 1 do
            X[i] ← M[i] ⊕ Y[i − 1]
            Y[i] ← P2(X[i])
        X[m] ← padn(M[m]) ⊕ Y[m − 1]
        if |M[m]| = n then T ← P3(X[m])
        else T ← P4(X[m])
    if m = 1 then
        X[m] ← padn(M[m])
        if |M[m]| = n then T ← P5(X[m])
        else T ← P6(X[m])
    return T

Fig. 7. Definition of MOMAC.
[Fig. 8. Illustration of MOMAC for |M| > n.]
[Fig. 9. Illustration of MOMAC for |M| ≤ n.]
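The MOMAC algorithm of Fig. 7 can be sketched as follows, with toy invertible permutations standing in for the six independent random permutations P1, ..., P6 (an assumption made purely so the structure can be run; the helper names make_perm, pad_n and momac are ours).

```python
# Sketch of MOMAC (Fig. 7), n = 128.  ASSUMPTION: make_perm builds toy
# invertible permutations, NOT random permutations; structure only.

MASK = (1 << 128) - 1

def make_perm(seed: int):
    """Return an invertible toy permutation on 128-bit integers."""
    def P(x: int) -> int:
        for r in range(4):
            x = (x + seed + r) & MASK
            x = ((x << 11) | (x >> 117)) & MASK   # 128-bit rotate left
            x ^= seed
        return x
    return P

P1, P2, P3, P4, P5, P6 = (make_perm(s) for s in range(1, 7))

def pad_n(block: bytes) -> int:
    """10^i padding for a final block shorter than n bits; identity
    on a full 16-byte block."""
    if len(block) == 16:
        return int.from_bytes(block, "big")
    padded = block + b"\x80" + b"\x00" * (15 - len(block))
    return int.from_bytes(padded, "big")

def momac(msg: bytes) -> int:
    blocks = [msg[i:i + 16] for i in range(0, len(msg), 16)] or [b""]
    m = len(blocks)
    if m >= 2:
        y = P1(int.from_bytes(blocks[0], "big"))   # first block uses P1
        for b in blocks[1:-1]:                     # inner blocks use P2
            y = P2(int.from_bytes(b, "big") ^ y)
        x = pad_n(blocks[-1]) ^ y
        return P3(x) if len(blocks[-1]) == 16 else P4(x)
    x = pad_n(blocks[0])                           # one-block message
    return P5(x) if len(blocks[0]) == 16 else P6(x)
```

Note how the six permutations partition the cases exactly as in Figs. 8 and 9: P1/P2/P3 for long messages ending in a full block, P1/P2/P4 with padding otherwise, and P5/P6 for one-block messages.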
Next we define MOMAC (Modified OMAC). It uses six independent random permutations P1, P2, P3, P4, P5, P6 ∈ Perm(n). The algorithm MOMAC_{P1,...,P6}(·) is described in Fig. 7 and illustrated in Fig. 8 and Fig. 9. We prove that MOMAC is pseudorandom.

Lemma 4. Let A be an adversary which asks at most q queries, and each query is at most nm bits. Assume m ≤ 2^n/4. Then

    | Pr(P1, ..., P6 ←R Perm(n) : A^{MOMAC_{P1,...,P6}(·)} = 1) − Pr(R ←R Rand(∗, n) : A^{R(·)} = 1) |  ≤  (2m² + 1)q²/2^n.

A proof is given in Appendix C.
Algorithm B_A^{O1,...,O6}
    1: When A asks its r-th query M^(r):
    2:     T^(r) ← MOMAC_{O1,...,O6}(M^(r))
    3:     return T^(r)
    4: When A halts and outputs b:
    5:     output b

Fig. 10. Algorithm B_A. Note that for 1 ≤ i ≤ 6, Oi is either Pi or Qi.
The next lemma shows that OMAC-family_P(·) and MOMAC_{P1,...,P6}(·) are indistinguishable.

Lemma 5. Let A be an adversary which asks at most q queries, and each query is at most nm bits. Assume m ≤ 2^n/4. Then

    | Pr(P ←R Perm(n) : A^{OMAC-family_P(·)} = 1)
      − Pr(P1, ..., P6 ←R Perm(n) : A^{MOMAC_{P1,...,P6}(·)} = 1) |  ≤  (3m²q²/2) · (ε + 1/2^n).

Proof. Suppose that there exists an adversary A such that

    | Pr(P ←R Perm(n) : A^{OMAC-family_P(·)} = 1)
      − Pr(P1, ..., P6 ←R Perm(n) : A^{MOMAC_{P1,...,P6}(·)} = 1) |  >  (3m²q²/2) · (ε + 1/2^n).

By using A, we show a construction of an adversary B_A such that:

– B_A asks at most mq queries, and
– | Pr(P ←R Perm(n) : B_A^{Q1(·),...,Q6(·)} = 1) − Pr(P1, ..., P6 ←R Perm(n) : B_A^{P1(·),...,P6(·)} = 1) |  >  (3m²q²/2) · (ε + 1/2^n),

which contradicts Lemma 3.

Let O1(·), ..., O6(·) be B_A's oracles. The construction of B_A is given in Fig. 10. When A asks M^(r), B_A computes T^(r) = MOMAC_{O1,...,O6}(M^(r)) as if the underlying random permutations are O1, ..., O6, and returns T^(r). When A halts and outputs b, B_A outputs b. Now we see that:

– B_A asks at most mq queries to its oracles, since A asks at most q queries, and each query is at most nm bits.
– Pr(P1, ..., P6 ←R Perm(n) : B_A^{P1(·),...,P6(·)} = 1) = Pr(P1, ..., P6 ←R Perm(n) : A^{MOMAC_{P1,...,P6}(·)} = 1), since B_A gives A a perfect simulation of MOMAC_{P1,...,P6}(·) if Oi(·) = Pi(·) for 1 ≤ i ≤ 6.
[Fig. 11. Computation of B_A when Oi = Qi for 1 ≤ i ≤ 6, and |M| > n.]
[Fig. 12. Computation of B_A when Oi = Qi for 1 ≤ i ≤ 6, and |M| ≤ n.]

– Pr(P ←R Perm(n) : B_A^{Q1(·),...,Q6(·)} = 1) = Pr(P ←R Perm(n) : A^{OMAC-family_P(·)} = 1), since B_A gives A a perfect simulation of OMAC-family_P(·) if Oi(·) = Qi(·) for 1 ≤ i ≤ 6. See Fig. 11 and Fig. 12. Note that Rnd is canceled in Fig. 11.

This concludes the proof of the lemma.
We finally give a proof of the Main Lemma for the OMAC-family.

Proof (of Lemma 1). By the triangle inequality, the left hand side of (4) is at most

    | Pr(P1, ..., P6 ←R Perm(n) : A^{MOMAC_{P1,...,P6}(·)} = 1)
      − Pr(R ←R Rand(∗, n) : A^{R(·)} = 1) |                                    (6)
    + | Pr(P ←R Perm(n) : A^{OMAC-family_P(·)} = 1)
      − Pr(P1, ..., P6 ←R Perm(n) : A^{MOMAC_{P1,...,P6}(·)} = 1) |.            (7)

Lemma 4 gives us an upper bound on (6) and Lemma 5 gives us an upper bound on (7). Therefore the bound follows since

    (2m² + 1)q²/2^n + (3m²q²/2) · (ε + 1/2^n)  =  (q²/2) · (3m²·ε + (7m² + 2)/2^n).

This concludes the proof of the lemma.
Acknowledgement

The authors would like to thank Phillip Rogaway of UC Davis for useful comments.
References

1. M. Bellare, J. Kilian, and P. Rogaway. The security of the cipher block chaining message authentication code. JCSS, vol. 61, no. 3, 2000. Earlier version in Advances in Cryptology — CRYPTO '94, LNCS 839, pp. 341–358, Springer-Verlag, 1994.
2. A. Berendschot, B. den Boer, J. P. Boly, A. Bosselaers, J. Brandt, D. Chaum, I. Damgård, M. Dichtl, W. Fumy, M. van der Ham, C. J. A. Jansen, P. Landrock, B. Preneel, G. Roelofsen, P. de Rooij, and J. Vandewalle. Final Report of RACE Integrity Primitives. LNCS 1007, Springer-Verlag, 1995.
3. J. Black and P. Rogaway. CBC MACs for arbitrary-length messages: The three-key constructions. Advances in Cryptology — CRYPTO 2000, LNCS 1880, pp. 197–215, Springer-Verlag, 2000.
4. J. Black and P. Rogaway. Comments to NIST concerning AES modes of operations: A suggestion for handling arbitrary-length messages with the CBC MAC. Second Modes of Operation Workshop. Available at http://www.cs.ucdavis.edu/~rogaway/.
5. J. Black and P. Rogaway. A block-cipher mode of operation for parallelizable message authentication. Advances in Cryptology — EUROCRYPT 2002, LNCS 2332, pp. 384–397, Springer-Verlag, 2002.
6. FIPS 113. Computer data authentication. Federal Information Processing Standards Publication 113, U.S. Department of Commerce / National Bureau of Standards, National Technical Information Service, Springfield, Virginia, 1994.
7. ISO/IEC 9797-1. Information technology — Security techniques — Data integrity mechanism using a cryptographic check function employing a block cipher algorithm. International Organization for Standardization, Geneva, Switzerland, 1999. Second edition.
8. É. Jaulmes, A. Joux, and F. Valette. On the security of randomized CBC-MAC beyond the birthday paradox limit: A new construction. Fast Software Encryption, FSE 2002, LNCS 2365, pp. 237–251, Springer-Verlag, 2002. Full version available at Cryptology ePrint Archive, Report 2001/074, http://eprint.iacr.org/.
9. K. Kurosawa and T. Iwata. TMAC: Two-Key CBC MAC. Topics in Cryptology — CT-RSA 2003, LNCS 2612, pp. 33–49, Springer-Verlag, 2003. See also Cryptology ePrint Archive, Report 2002/092, http://eprint.iacr.org/.
10. R. Lidl and H. Niederreiter. Introduction to Finite Fields and Their Applications, revised edition. Cambridge University Press, 1994.
11. E. Petrank and C. Rackoff. CBC MAC for real-time data sources. J. Cryptology, vol. 13, no. 3, pp. 315–338, 2000.
12. P. Rogaway. Bucket hashing and its application to fast message authentication. Advances in Cryptology — CRYPTO '95, LNCS 963, pp. 29–42, Springer-Verlag, 1995.
13. P. Rogaway, M. Bellare, J. Black, and T. Krovetz. OCB: A block-cipher mode of operation for efficient authenticated encryption. Proceedings of the ACM Conference on Computer and Communications Security, ACM CCS 2001, ACM, 2001.
14. S. Vaudenay. Decorrelation over infinite domains: The encrypted CBC-MAC case. Communications in Information and Systems (CIS), vol. 1, pp. 75–85, 2001. Earlier version in Selected Areas in Cryptography, SAC 2000, LNCS 2012, pp. 57–71, Springer-Verlag, 2001.
A Discussions

A.1 Design Rationale
Our choice for OMAC1 is Cst = 0^n, HL(x) = L · x, Cst1 = u and Cst2 = u^2, where "·" denotes multiplication over GF(2^n). Similarly, our choice for OMAC2 is Cst = 0^n, HL(x) = L · x, Cst1 = u and Cst2 = u^{-1}. Below, we list the reasons for this choice.

– One might try to use Cst1 = 1 instead of Cst1 = u. In this case, the fourth condition in Sec. 3 is not satisfied, and in fact, the scheme can be easily attacked. Similarly, if one uses Cst2 = 1 instead of Cst2 = u^2 or Cst2 = u^{-1}, the fifth condition in Sec. 3 is not satisfied, and the scheme can be easily attacked. Therefore, we cannot use "1" as a constant.
– For OMAC1, we adopted u and u^2 as Cst1 and Cst2, since L · u and L · u^2 = (L · u) · u can be computed efficiently by one left shift and one conditional XOR from L and L · u, respectively, as shown in (2). Note that this choice requires only a left shift. This would ease the implementation of OMAC1, especially in hardware.
– For OMAC2, we adopted u^{-1} instead of u^2 as Cst2. It requires one right shift to compute L · u^{-1} instead of one left shift to compute (L · u) · u. This would allow one to compute both L · u and L · u^{-1} from L simultaneously if both left shift and right shift are available (for example, when the underlying block cipher uses both shifts).

A.2 On the Standard Key Separation Technique
For XCBC, assume that we want to use a single key K of E, where E is the AES. Then the following key separation technique is suggested in [4]. Let K be a k-bit AES key. Then K1 = the first k bits of AES_K(C1a) ◦ AES_K(C1b), K2 = AES_K(C2), and K3 = AES_K(C3) for some distinct constants C1a, C1b, C2 and C3. We call this XCBC+kst (key separation technique). XCBC+kst uses one k-bit key. However, it requires one additional key scheduling of AES and 3 or 4 additional AES invocations during the pre-processing time.

A similar discussion applies to TMAC. For example, we can let K1 = the first k bits of AES_K(C1a) ◦ AES_K(C1b), and K2 = AES_K(C2) for some distinct constants C1a, C1b and C2. We call this TMAC+kst.

We note that OMAC does not need such a key separation technique since its key length is k bits in its own form (without using any key separation technique). This saves storage space and pre-processing time compared to XCBC+kst and TMAC+kst.
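The XCBC+kst derivation described above can be sketched as follows. Assumptions: a toy permutation stands in for AES_K, and the concrete values of C1a, C1b, C2, C3 are arbitrary placeholders, since the text only requires them to be distinct.

```python
# Sketch of the XCBC+kst key separation from [4].
# ASSUMPTION: toy_E is an invertible toy permutation standing in for
# AES_K; the constants below are placeholders, not specified values.

MASK = (1 << 128) - 1

def toy_E(key: int, x: int) -> int:
    """Stand-in for AES_K: an invertible toy permutation only."""
    for r in range(4):
        x = (x + key + r) & MASK
        x = ((x << 17) | (x >> 111)) & MASK   # 128-bit rotate left
        x ^= key
    return x

def xcbc_kst(K: int, k_bits: int = 128):
    """Derive (K1, K2, K3) for XCBC from a single k-bit key K:
    K1 = first k bits of E_K(C1a) || E_K(C1b), K2 = E_K(C2),
    K3 = E_K(C3), for distinct constants."""
    C1a, C1b, C2, C3 = 1, 2, 3, 4             # placeholder constants
    k1_full = (toy_E(K, C1a) << 128) | toy_E(K, C1b)   # 256-bit concat
    K1 = k1_full >> (256 - k_bits)            # take the first k bits
    K2 = toy_E(K, C2)
    K3 = toy_E(K, C3)
    return K1, K2, K3
```

The point of the comparison in the text is that this derivation costs one extra key scheduling plus several block-cipher calls, whereas OMAC needs none of it: its single k-bit key is used directly.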
Table 2. Efficiency comparison of CBC MAC and its variants.

Name       | Domain        | K len.  | #K sche.  | #E invo.          | #E pre.
CBC MAC    | ({0,1}^n)^m   | k       | 1         | |M|/n             | 0
EMAC       | ({0,1}^n)^+   | 2k      | 2         | 1 + |M|/n         | 0
RMAC       | {0,1}^*       | 2k      | 1 + #M    | 1 + ⌈(|M|+1)/n⌉   | 0
XCBC       | {0,1}^*       | k + 2n  | 1         | ⌈|M|/n⌉           | 0
TMAC       | {0,1}^*       | k + n   | 1         | ⌈|M|/n⌉           | 0
XCBC+kst   | {0,1}^*       | k       | 2         | ⌈|M|/n⌉           | 3 or 4
TMAC+kst   | {0,1}^*       | k       | 2         | ⌈|M|/n⌉           | 2 or 3
OMAC       | {0,1}^*       | k       | 1         | ⌈|M|/n⌉           | 1

A.3 Comparison

Let E : {0,1}^k × {0,1}^n → {0,1}^n be a block cipher, and let M ∈ {0,1}^* be a message. We show an efficiency comparison of CBC MAC and its variants in Table 2, where:

– ({0,1}^n)^+ denotes the set of bit strings whose lengths are positive multiples of n.
– "K len." denotes the key length.
– "#K sche." denotes the number of block cipher key schedulings. RMAC requires one block cipher key scheduling each time a tag is generated.
– #M denotes the number of messages which the sender has MACed.
– "#E invo." denotes the number of block cipher invocations to generate a tag for a message M, assuming |M| > 0.
– "#E pre." denotes the number of block cipher invocations during the pre-processing time. These block cipher invocations can be done without the message. For XCBC+kst and TMAC+kst, the block cipher is assumed to be the AES.

Next, let E : {0,1}^k × {0,1}^n → {0,1}^n be the underlying block cipher used in XCBC, TMAC and OMAC. In Table 3, we show a security comparison of XCBC, TMAC and OMAC. We see that there is no significant difference among them. They are equally secure up to the birthday paradox limit.
B Proof of Lemma 3
If A is a finite multiset then #A denotes the number of elements in A. Let {a, b, c, ...} be a finite multiset of bit strings; that is, a ∈ {0,1}^*, b ∈ {0,1}^*, c ∈ {0,1}^*, and so on. We say "{a, b, c, ...} are distinct" if no element occurs twice or more. Equivalently, {a, b, c, ...} are distinct if any two elements in {a, b, c, ...} are distinct.

Before proving Lemma 3, we need the following lemma.

Lemma 6. Let q1, q2, q3, q4, q5, q6 be six non-negative integers. For 1 ≤ i ≤ 6, let x_i^(1), ..., x_i^(q_i) be fixed n-bit strings such that {x_i^(1), ..., x_i^(q_i)} are distinct. Similarly, for 1 ≤ i ≤ 6, let y_i^(1), ..., y_i^(q_i) be fixed n-bit strings such that
Table 3. Security comparison of XCBC, TMAC and OMAC.

XCBC [3, Corollary 2]:  Adv^mac_XCBC(t, q, nm) ≤ ((4m² + 1)q² + 1)/2^n + 3 · Adv^prp_E(t′, q′), where t′ = t + O(mq) and q′ = mq.
TMAC [9, Theorem 5.1]:  Adv^mac_TMAC(t, q, nm) ≤ ((3m² + 1)q² + 1)/2^n + Adv^prp_E(t′, q′), where t′ = t + O(mq) and q′ = mq.
OMAC [Theorem 1]:       Adv^mac_OMAC(t, q, nm) ≤ ((5m² + 1)q² + 1)/2^n + Adv^prp_E(t′, q′), where t′ = t + O(mq) and q′ = mq + 1.

– {y_1^(1), ..., y_1^(q_1)} ∪ {y_2^(1), ..., y_2^(q_2)} are distinct, and
– {y_3^(1), ..., y_3^(q_3)} ∪ {y_4^(1), ..., y_4^(q_4)} ∪ {y_5^(1), ..., y_5^(q_5)} ∪ {y_6^(1), ..., y_6^(q_6)} are distinct.

Let P ∈ Perm(n) and Rnd ∈ {0,1}^n. Then the number of (P, Rnd) which satisfies

    Q1(x_1^(i)) = y_1^(i) for 1 ≤ ∀i ≤ q1,
    Q2(x_2^(i)) = y_2^(i) for 1 ≤ ∀i ≤ q2,
    Q3(x_3^(i)) = y_3^(i) for 1 ≤ ∀i ≤ q3,                    (8)
    Q4(x_4^(i)) = y_4^(i) for 1 ≤ ∀i ≤ q4,
    Q5(x_5^(i)) = y_5^(i) for 1 ≤ ∀i ≤ q5, and
    Q6(x_6^(i)) = y_6^(i) for 1 ≤ ∀i ≤ q6

is at least (2^n − (q + q²/2) · (1 + ε · 2^n)) · (2^n − q)!, where q = q1 + ··· + q6 and ε = max{ε1, ..., ε6}.
Proof. At the top level, we consider two cases: Cst ∈ {x_1^(1), ..., x_1^(q_1)} and Cst ∉ {x_1^(1), ..., x_1^(q_1)}.

Case 1: Cst ∈ {x_1^(1), ..., x_1^(q_1)}. Let c be a unique integer such that 1 ≤ c ≤ q1 and Cst = x_1^(c). Let l be an n-bit variable. First, observe that:

    #{l | 1 ≤ ∃i ≤ q1, 1 ≤ ∃j ≤ q2, x_1^(i) = x_2^(j) ⊕ y_1^(c) ⊕ l} ≤ q1q2,
    #{l | 1 ≤ ∃i ≤ q1, 1 ≤ ∃j ≤ q3, x_1^(i) = x_3^(j) ⊕ y_1^(c) ⊕ l ⊕ H_l(Cst1)} ≤ q1q3 · ε4 · 2^n,
    #{l | 1 ≤ ∃i ≤ q1, 1 ≤ ∃j ≤ q4, x_1^(i) = x_4^(j) ⊕ y_1^(c) ⊕ l ⊕ H_l(Cst2)} ≤ q1q4 · ε5 · 2^n,
    #{l | 1 ≤ ∃i ≤ q1, 1 ≤ ∃j ≤ q5, x_1^(i) = x_5^(j) ⊕ H_l(Cst1)} ≤ q1q5 · ε1 · 2^n,
    #{l | 1 ≤ ∃i ≤ q1, 1 ≤ ∃j ≤ q6, x_1^(i) = x_6^(j) ⊕ H_l(Cst2)} ≤ q1q6 · ε2 · 2^n,
    #{l | 1 ≤ ∃i ≤ q2, 1 ≤ ∃j ≤ q3, x_2^(i) = x_3^(j) ⊕ H_l(Cst1)} ≤ q2q3 · ε1 · 2^n,
    #{l | 1 ≤ ∃i ≤ q2, 1 ≤ ∃j ≤ q4, x_2^(i) = x_4^(j) ⊕ H_l(Cst2)} ≤ q2q4 · ε2 · 2^n,
    #{l | 1 ≤ ∃i ≤ q2, 1 ≤ ∃j ≤ q5, x_2^(i) ⊕ y_1^(c) ⊕ l = x_5^(j) ⊕ H_l(Cst1)} ≤ q2q5 · ε4 · 2^n,
    #{l | 1 ≤ ∃i ≤ q2, 1 ≤ ∃j ≤ q6, x_2^(i) ⊕ y_1^(c) ⊕ l = x_6^(j) ⊕ H_l(Cst2)} ≤ q2q6 · ε5 · 2^n,
    #{l | 1 ≤ ∃i ≤ q3, 1 ≤ ∃j ≤ q4, x_3^(i) ⊕ H_l(Cst1) = x_4^(j) ⊕ H_l(Cst2)} ≤ q3q4 · ε3 · 2^n,
    #{l | 1 ≤ ∃i ≤ q3, 1 ≤ ∃j ≤ q5, x_3^(i) ⊕ y_1^(c) ⊕ l = x_5^(j)} ≤ q3q5,
    #{l | 1 ≤ ∃i ≤ q3, 1 ≤ ∃j ≤ q6, x_3^(i) ⊕ y_1^(c) ⊕ l ⊕ H_l(Cst1) = x_6^(j) ⊕ H_l(Cst2)} ≤ q3q6 · ε6 · 2^n,
    #{l | 1 ≤ ∃i ≤ q4, 1 ≤ ∃j ≤ q5, x_4^(i) ⊕ y_1^(c) ⊕ l ⊕ H_l(Cst2) = x_5^(j) ⊕ H_l(Cst1)} ≤ q4q5 · ε6 · 2^n,
    #{l | 1 ≤ ∃i ≤ q4, 1 ≤ ∃j ≤ q6, x_4^(i) ⊕ y_1^(c) ⊕ l = x_6^(j)} ≤ q4q6,
    #{l | 1 ≤ ∃i ≤ q5, 1 ≤ ∃j ≤ q6, x_5^(i) ⊕ H_l(Cst1) = x_6^(j) ⊕ H_l(Cst2)} ≤ q5q6 · ε3 · 2^n,
    #{l | 1 ≤ ∃i ≤ q1, 1 ≤ ∃j ≤ q3, y_1^(i) ⊕ y_1^(c) ⊕ l = y_3^(j)} ≤ q1q3,
    #{l | 1 ≤ ∃i ≤ q1, 1 ≤ ∃j ≤ q4, y_1^(i) ⊕ y_1^(c) ⊕ l = y_4^(j)} ≤ q1q4,
    #{l | 1 ≤ ∃i ≤ q1, 1 ≤ ∃j ≤ q5, y_1^(i) ⊕ y_1^(c) ⊕ l = y_5^(j)} ≤ q1q5,
    #{l | 1 ≤ ∃i ≤ q1, 1 ≤ ∃j ≤ q6, y_1^(i) ⊕ y_1^(c) ⊕ l = y_6^(j)} ≤ q1q6,
    #{l | 1 ≤ ∃i ≤ q2, 1 ≤ ∃j ≤ q3, y_2^(i) ⊕ y_1^(c) ⊕ l = y_3^(j)} ≤ q2q3,
    #{l | 1 ≤ ∃i ≤ q2, 1 ≤ ∃j ≤ q4, y_2^(i) ⊕ y_1^(c) ⊕ l = y_4^(j)} ≤ q2q4,
    #{l | 1 ≤ ∃i ≤ q2, 1 ≤ ∃j ≤ q5, y_2^(i) ⊕ y_1^(c) ⊕ l = y_5^(j)} ≤ q2q5, and
    #{l | 1 ≤ ∃i ≤ q2, 1 ≤ ∃j ≤ q6, y_2^(i) ⊕ y_1^(c) ⊕ l = y_6^(j)} ≤ q2q6,

from the conditions in Sec. 3.

We now fix any l which is not included in any of the above twenty-three sets. We have at least (2^n − (q1q2 + q1q3·ε4·2^n + q1q4·ε5·2^n + q1q5·ε1·2^n + q1q6·ε2·2^n + q2q3·ε1·2^n + q2q4·ε2·2^n + q2q5·ε4·2^n + q2q6·ε5·2^n + q3q4·ε3·2^n + q3q5 + q3q6·ε6·2^n + q4q5·ε6·2^n + q4q6 + q5q6·ε3·2^n + q1q3 + q1q4 + q1q5 + q1q6 + q2q3 + q2q4 + q2q5 + q2q6)) ≥ (2^n − q²·ε·2^n/2 − q²/2) choice of such l.

Now we let L ← l and Rnd ← l ⊕ y_1^(c). Then we have:

– the inputs to P, {x_1^(1), ..., x_1^(q_1), x_2^(1) ⊕ Rnd, ..., x_2^(q_2) ⊕ Rnd, x_3^(1) ⊕ Rnd ⊕ HL(Cst1), ..., x_3^(q_3) ⊕ Rnd ⊕ HL(Cst1), x_4^(1) ⊕ Rnd ⊕ HL(Cst2), ..., x_4^(q_4) ⊕ Rnd ⊕ HL(Cst2), x_5^(1) ⊕ HL(Cst1), ..., x_5^(q_5) ⊕ HL(Cst1), x_6^(1) ⊕ HL(Cst2), ..., x_6^(q_6) ⊕ HL(Cst2)}, are distinct, and
– the corresponding outputs, {y_1^(1) ⊕ Rnd, ..., y_1^(q_1) ⊕ Rnd, y_2^(1) ⊕ Rnd, ..., y_2^(q_2) ⊕ Rnd, y_3^(1), ..., y_3^(q_3), y_4^(1), ..., y_4^(q_4), y_5^(1), ..., y_5^(q_5), y_6^(1), ..., y_6^(q_6)}, are distinct.

In other words, for P, the above q1 + q2 + q3 + q4 + q5 + q6 input-output pairs are determined. The remaining 2^n − (q1 + q2 + q3 + q4 + q5 + q6) input-output pairs are undetermined. Therefore we have (2^n − (q1 + q2 + q3 + q4 + q5 + q6))! = (2^n − q)! possible choice of P for any such fixed (L, Rnd).
(q )
Case 2: Cst ∈ {x1 , . . . , x1 1 }. In this case, we count the number of Rnd and L independently. Then similar to Case 1, observe that: #{Rnd | 1 ≤ ∃ i ≤ q2 , Cst = x2 ⊕ Rnd} ≤ q2 , (i) (j) #{Rnd | 1 ≤ ∃ i ≤ q1 , 1 ≤ ∃ j ≤ q2 , x1 = x2 ⊕ Rnd} ≤ q1 q2 , (i) (j) #{Rnd | 1 ≤ ∃ i ≤ q3 , 1 ≤ ∃ j ≤ q5 , x3 ⊕ Rnd = x5 } ≤ q3 q5 , (i) (j) #{Rnd | 1 ≤ ∃ i ≤ q4 , 1 ≤ ∃ j ≤ q6 , x4 ⊕ Rnd = x6 } ≤ q4 q6 , (i)
146
Tetsu Iwata and Kaoru Kurosawa
#{Rnd | 1 ≤ ∃ i ≤ q1 , 1 ≤ ∃ j #{Rnd | 1 ≤ ∃ i ≤ q1 , 1 ≤ ∃ j #{Rnd | 1 ≤ ∃ i ≤ q1 , 1 ≤ ∃ j #{Rnd | 1 ≤ ∃ i ≤ q1 , 1 ≤ ∃ j #{Rnd | 1 ≤ ∃ i ≤ q2 , 1 ≤ ∃ j #{Rnd | 1 ≤ ∃ i ≤ q2 , 1 ≤ ∃ j #{Rnd | 1 ≤ ∃ i ≤ q2 , 1 ≤ ∃ j #{Rnd | 1 ≤ ∃ i ≤ q2 , 1 ≤ ∃ j
(i)
(j)
≤ q3 , y1 ⊕ Rnd = y3 } ≤ q1 q3 , (i) (j) ≤ q4 , y1 ⊕ Rnd = y4 } ≤ q1 q4 , (i) (j) ≤ q5 , y1 ⊕ Rnd = y5 } ≤ q1 q5 , (i) (j) ≤ q6 , y1 ⊕ Rnd = y6 } ≤ q1 q6 , (i) (j) ≤ q3 , y2 ⊕ Rnd = y3 } ≤ q2 q3 , (i) (j) ≤ q4 , y2 ⊕ Rnd = y4 } ≤ q2 q4 , (i) (j) ≤ q5 , y2 ⊕ Rnd = y5 } ≤ q2 q5 , and (i) (j) ≤ q6 , y2 ⊕ Rnd = y6 } ≤ q2 q6 .
We fix any Rnd which is not included in any of the above twelve sets. We have at least (2n − (q2 + q1 q2 + q3 q5 + q4 q6 + q1 q3 + q1 q4 + q1 q5 + q1 q6 + q2 q3 + q2 q4 + q2 q5 + q2 q6 )) ≥ (2n − q − q 2 /2) choice of such Rnd. Next we see that: #{L | 1 ≤ ∃ i ≤ q3 , Cst = x3 ⊕ Rnd ⊕ HL (Cst1 )} ≤ q3 · 1 · 2n , (i) #{L | 1 ≤ ∃ i ≤ q4 , Cst = x4 ⊕ Rnd ⊕ HL (Cst2 )} ≤ q4 · 2 · 2n , (i) #{L | 1 ≤ ∃ i ≤ q5 , Cst = x5 ⊕ HL (Cst1 )} ≤ q5 · 1 · 2n , (i) #{L | 1 ≤ ∃ i ≤ q6 , Cst = x6 ⊕ HL (Cst2 )} ≤ q6 · 2 · 2n , (i) (j) #{L | 1 ≤ ∃ i ≤ q1 , 1 ≤ ∃ j ≤ q3 , x1 = x3 ⊕ Rnd ⊕ HL (Cst1 )} ≤ q1 q3 · 1 · 2n , (i) (j) #{L | 1 ≤ ∃ i ≤ q1 , 1 ≤ ∃ j ≤ q4 , x1 = x4 ⊕ Rnd ⊕ HL (Cst2 )} ≤ q1 q4 · 2 · 2n , (i) (j) #{L | 1 ≤ ∃ i ≤ q1 , 1 ≤ ∃ j ≤ q5 , x1 = x5 ⊕ HL (Cst1 )} ≤ q1 q5 · 1 · 2n , (i) (j) #{L | 1 ≤ ∃ i ≤ q1 , 1 ≤ ∃ j ≤ q6 , x1 = x6 ⊕ HL (Cst2 )} ≤ q1 q6 · 2 · 2n , (i) (j) #{L | 1 ≤ ∃ i ≤ q2 , 1 ≤ ∃ j ≤ q3 , x2 = x3 ⊕ HL (Cst1 )} ≤ q2 q3 · 1 · 2n , (i) (j) #{L | 1 ≤ ∃ i ≤ q2 , 1 ≤ ∃ j ≤ q4 , x2 = x4 ⊕ HL (Cst2 )} ≤ q2 q4 · 2 · 2n , (i) (j) #{L | 1 ≤ ∃ i ≤ q2 , 1 ≤ ∃ j ≤ q5 , x2 ⊕ Rnd = x5 ⊕ HL (Cst1 )} ≤ q2 q5 · 1 · 2n , (i) (j) #{L | 1 ≤ ∃ i ≤ q2 , 1 ≤ ∃ j ≤ q6 , x2 ⊕ Rnd = x6 ⊕ HL (Cst2 )} ≤ q2 q6 · 2 · 2n , (i) (j) #{L | 1 ≤ ∃ i ≤ q3 , 1 ≤ ∃ j ≤ q4 , x3 ⊕ HL (Cst1 ) = x4 ⊕ HL (Cst2 )} n ≤ q3 q4 · 3 · 2 , (i) (j) #{L | 1 ≤ ∃ i ≤ q3 , 1 ≤ ∃ j ≤ q6 , x3 ⊕ Rnd ⊕ HL (Cst1 ) = x6 ⊕ HL (Cst2 )} ≤ q3 q6 · 3 · 2n , (i) (j) #{L | 1 ≤ ∃ i ≤ q4 , 1 ≤ ∃ j ≤ q5 , x4 ⊕ Rnd ⊕ HL (Cst2 ) = x5 ⊕ HL (Cst1 )} n ≤ q4 q5 · 3 · 2 , (i) (j) #{L | 1 ≤ ∃ i ≤ q5 , 1 ≤ ∃ j ≤ q6 , x5 ⊕ HL (Cst1 ) = x6 ⊕ HL (Cst2 )} n ≤ q5 q6 · 3 · 2 , (i) #{L | 1 ≤ ∃ i ≤ q1 , L = y1 ⊕ Rnd} ≤ q1 , (i) #{L | 1 ≤ ∃ i ≤ q2 , L = y2 ⊕ Rnd} ≤ q2 , (i) #{L | 1 ≤ ∃ i ≤ q3 , L = y3 } ≤ q3 , (i) #{L | 1 ≤ ∃ i ≤ q4 , L = y4 } ≤ q4 , (i) #{L | 1 ≤ ∃ i ≤ q5 , L = y5 } ≤ q5 , and (i) #{L | 1 ≤ ∃ i ≤ q6 , L = y6 } ≤ q6 , (i)
from the conditions in Sec. 3.
We now fix any L which is not included in any of the above twenty-two sets. We have at least (2^n − (q3·ε1·2^n + q4·ε2·2^n + q5·ε1·2^n + q6·ε2·2^n + q1q3·ε1·2^n + q1q4·ε2·2^n + q1q5·ε1·2^n + q1q6·ε2·2^n + q2q3·ε1·2^n + q2q4·ε2·2^n + q2q5·ε1·2^n + q2q6·ε2·2^n + q3q4·ε3·2^n + q3q6·ε3·2^n + q4q5·ε3·2^n + q5q6·ε3·2^n + q1 + q2 + q3 + q4 + q5 + q6)) ≥ (2^n − q·ε·2^n − q²·ε·2^n/2 − q) choice of such L. Then we have:
– the inputs to P, {Cst, x_1^(1), ..., x_1^(q_1), x_2^(1) ⊕ Rnd, ..., x_2^(q_2) ⊕ Rnd, x_3^(1) ⊕ Rnd ⊕ HL(Cst1), ..., x_3^(q_3) ⊕ Rnd ⊕ HL(Cst1), x_4^(1) ⊕ Rnd ⊕ HL(Cst2), ..., x_4^(q_4) ⊕ Rnd ⊕ HL(Cst2), x_5^(1) ⊕ HL(Cst1), ..., x_5^(q_5) ⊕ HL(Cst1), x_6^(1) ⊕ HL(Cst2), ..., x_6^(q_6) ⊕ HL(Cst2)}, are distinct, and
– the corresponding outputs, {L, y_1^(1) ⊕ Rnd, ..., y_1^(q_1) ⊕ Rnd, y_2^(1) ⊕ Rnd, ..., y_2^(q_2) ⊕ Rnd, y_3^(1), ..., y_3^(q_3), y_4^(1), ..., y_4^(q_4), y_5^(1), ..., y_5^(q_5), y_6^(1), ..., y_6^(q_6)}, are distinct.

In other words, for P, the above 1 + q1 + q2 + q3 + q4 + q5 + q6 input-output pairs are determined. The remaining 2^n − (1 + q1 + q2 + q3 + q4 + q5 + q6) input-output pairs are undetermined. Therefore we have (2^n − (1 + q1 + q2 + q3 + q4 + q5 + q6))! = (2^n − (1 + q))! possible choice of P for any such fixed (L, Rnd).

Completing the Proof. In Case 1, we have at least (2^n − (q²/2)·(1 + ε·2^n))·(2^n − q)! choice of (P, Rnd) which satisfies (8). In Case 2, we have at least (2^n − q − q²/2)·(2^n − q·ε·2^n − q²·ε·2^n/2 − q)·(2^n − (1 + q))! choice of (P, Rnd) which satisfies (8). This bound is at least (2^n − (q + q²/2)·(1 + ε·2^n))·(2^n − q)!. This concludes the proof of the lemma.

We now prove Lemma 3.

Proof (of Lemma 3). For 1 ≤ i ≤ 6, let Oi be either Qi or Pi. The adversary A has oracle access to O1, ..., O6. Since A is computationally unbounded, there is no loss of generality in assuming that A is deterministic.

There are six types of queries A can make: (Oj, x), which denotes the query "what is Oj(x)?". For the i-th query A makes to Oj, define the query-answer pair (x_j^(i), y_j^(i)) ∈ {0,1}^n × {0,1}^n, where A's query was (Oj, x_j^(i)) and the answer it got was y_j^(i).

Suppose that we run A with oracles O1, ..., O6. For this run, assume that A made qj queries to Oj(·), where q1 + ··· + q6 = q. For this run, we define the view v of A as

    v  def=  (y_1^(1), ..., y_1^(q_1)), (y_2^(1), ..., y_2^(q_2)), (y_3^(1), ..., y_3^(q_3)),
            (y_4^(1), ..., y_4^(q_4)), (y_5^(1), ..., y_5^(q_5)), (y_6^(1), ..., y_6^(q_6)).

For this view, we always have:

    For 1 ≤ j ≤ 6, {y_j^(1), ..., y_j^(q_j)} are distinct.    (9)
Tetsu Iwata and Kaoru Kurosawa
We note that since A never repeats a query, for the corresponding queries, we have: (q ) (1) For 1 ≤ j ≤ 6, {xj , . . . , xj j } are distinct. Since A is deterministic, the i-th query A makes is fully determined by the first i − 1 query-answer pairs. This implies that if we fix some qn-bit string V and return the i-th n-bit block as the answer for the i-th query A makes (instead of the oracles), then – – – –
A’s queries are uniquely determined, q1 , . . . , q6 are uniquely determined, the parsing of V into the format defined in (9) is uniquely determined, and the final output of A (0 or 1) is uniquely determined.
Let Vone be a set of all qn-bit strings V such that A outputs 1. We let def None = #Vone . Also, let Vgood be a set of all qn-bit strings V such that: For 1 ≤ ∀ i < ∀ j ≤ q, the i-th n-bit block of V = the j-th n-bit block of V . Note that if V ∈ Vgood then the corresponding parsing v satisfies: (1)
(q )
(1)
(q )
– {y1 , . . . , y1 1 } ∪ {y2 , . . . , y2 2 } are distinct, and (1) (q ) (1) (q ) (1) (q ) (1) (q ) – {y3 , . . . , y3 3 }∪{y4 , . . . , y4 4 }∪{y5 , . . . , y5 5 }∪{y6 , . . . , y6 6 } are distinct. observe that the number of V which is not in the set Vgood is at most Now q 2qn 2 2n . Therefore, we have qn q 2 #{V | V ∈ (Vone ∩ Vgood )} ≥ None − . (10) 2 2n Evaluation of prand . We first evaluate R
    prand  def=  Pr(P1, ..., P6 ←R Perm(n) : A^{P1(·),...,P6(·)} = 1)
            =  #{(P1, ..., P6) | A^{P1(·),...,P6(·)} = 1} / {(2^n)!}^6.

For each V ∈ Vone, the number of (P1, ..., P6) such that

    For 1 ≤ j ≤ 6, Pj(x_j^(i)) = y_j^(i) for 1 ≤ ∀i ≤ qj    (11)

is exactly ∏_{1≤j≤6} (2^n − qj)!, which is at most (2^n − q)! · {(2^n)!}^5. Therefore, we have

    prand  =  Σ_{V ∈ Vone}  #{(P1, ..., P6) | (P1, ..., P6) satisfying (11)} / {(2^n)!}^6
           ≤  Σ_{V ∈ Vone}  (2^n − q)!/(2^n)!
           =  None · (2^n − q)!/(2^n)!.
149
Evaluation of preal . We next evaluate R
R
preal = Pr(P ← Perm(n); Rnd ← {0, 1}n : AQ1 (·),...,Q6 (·) = 1) def
#{(P, Rnd) | AQ1 (·),...,Q6 (·) = 1} . (2n )! · 2n
=
Then from Lemma 6, we have
preal ≥
V ∈(Vone ∩Vgood )
≥
V ∈(Vone ∩Vgood )
# {(P, Rnd) | (P, Rnd) satisfying (8)} (2n )! · 2n
(q + q 2 /2) · (1 + · 2n ) (2n − q)! · 1 − . (2n )! 2n
Completing the Proof. From (10) we have

    p_real ≥ (N_one − (q(q−1)/2) · 2^{qn}/2^n) · (2^n − q)!/(2^n)! · (1 − (q + q²/2) · (1 + ε · 2^n)/2^n)
           ≥ (p_rand − (q(q−1)/2) · (1/2^n) · 2^{qn} · (2^n − q)!/(2^n)!) · (1 − (q + q²/2) · (1 + ε · 2^n)/2^n).

Since 2^{qn} · (2^n − q)!/(2^n)! ≥ 1, we have

    p_real ≥ (p_rand − q(q−1)/(2 · 2^n)) · (1 − (q + q²/2) · (1 + ε · 2^n)/2^n)
           ≥ p_rand − ((2q² + q) + (q² + 2q) · ε · 2^n) / (2 · 2^n)
           ≥ p_rand − (3q²/2) · (1/2^n + ε).    (12)
Applying the same argument to 1 − p_real and 1 − p_rand yields

    1 − p_real ≥ 1 − p_rand − (3q²/2) · (1/2^n + ε).    (13)

Finally, (12) and (13) give |p_real − p_rand| ≤ (3q²/2) · (1/2^n + ε).

C Proof of Lemma 4
Let S and S′ be distinct bit strings such that |S| = sn for some s ≥ 1, and |S′| = s′n for some s′ ≥ 1. Define

    V_n(S, S′) = Pr(P_2 ←R Perm(n) : CBC_{P_2}(S) = CBC_{P_2}(S′)).

Then the following proposition is known [3].
Proposition 1 (Black and Rogaway [3]). Let S and S′ be distinct bit strings such that |S| = sn for some s ≥ 1, and |S′| = s′n for some s′ ≥ 1. Assume that s, s′ ≤ 2^n/4. Then

    V_n(S, S′) ≤ (s + s′)²/2^n.

Now let M and M′ be distinct bit strings such that |M| = mn for some m ≥ 2, and |M′| = m′n for some m′ ≥ 2. Define

    W_n(M, M′) = Pr(P_1, ..., P_6 ←R Perm(n) : MOMAC_{P_1,...,P_6}(M) = MOMAC_{P_1,...,P_6}(M′)).

We note that P_5 and P_6 are irrelevant in the event MOMAC_{P_1,...,P_6}(M) = MOMAC_{P_1,...,P_6}(M′) since M and M′ are both longer than n bits. Also, P_4 is irrelevant in the above event since |M| and |M′| are both multiples of n. Further, P_3 is irrelevant in the above event since it is invertible, and thus there is a collision if and only if there is a collision at the input to the last encryption. We show the following lemma.

Lemma 7 (MOMAC Collision Bound). Let M and M′ be distinct bit strings such that |M| = mn for some m ≥ 2, and |M′| = m′n for some m′ ≥ 2. Assume that m, m′ ≤ 2^n/4. Then

    W_n(M, M′) ≤ (m + m′)²/2^n.

Proof. Let M[1] ··· M[m] and M′[1] ··· M′[m′] be the partitions of M and M′, respectively. We consider two cases: M[1] = M′[1] and M[1] ≠ M′[1].

Case 1: M[1] = M′[1]. In this case, let P_1 be any permutation in Perm(n), and let S ← (P_1(M[1]) ⊕ M[2]) ◦ M[3] ◦ ··· ◦ M[m] and S′ ← (P_1(M′[1]) ⊕ M′[2]) ◦ M′[3] ◦ ··· ◦ M′[m′]. Observe that MOMAC_{P_1,...,P_6}(M) = MOMAC_{P_1,...,P_6}(M′) if and only if CBC_{P_2}(S) = CBC_{P_2}(S′), since we may ignore the last encryptions. Therefore

    W_n(M, M′) ≤ V_n(S, S′) ≤ (m + m′ − 2)²/2^n.

Case 2: M[1] ≠ M′[1]. In this case, we split into two sub-cases: P_1(M[1]) ⊕ M[2] ≠ P_1(M′[1]) ⊕ M′[2], and P_1(M[1]) ⊕ M[2] = P_1(M′[1]) ⊕ M′[2]. The former event occurs with probability at most 1; the latter occurs with probability at most 1/(2^n − 1), which is at most 2/2^n. Then, by applying a similar argument as in Case 1, it is not hard to see that

    W_n(M, M′) ≤ 1 · V_n(S, S′) + 2/2^n ≤ (m + m′ − 2)²/2^n + 2/2^n ≤ (m + m′)²/2^n.
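The bound of Proposition 1 can be checked empirically for a toy block size. The sketch below (illustrative only; names and the choice n = 8 are ours, not from the text) samples P_2 uniformly from Perm(n) and estimates V_n(S, S′) for two fixed, distinct messages:

```python
# Empirical sanity check of V_n(S, S') <= (s + s')^2 / 2^n for toy n = 8 bits.
import random

n = 8
DOMAIN = list(range(2 ** n))

def cbc(perm, blocks):
    # CBC_P: C_i = P(M[i] xor C_{i-1}), C_0 = 0; returns C_r
    c = 0
    for b in blocks:
        c = perm[c ^ b]
    return c

random.seed(1)
S = [1, 2, 3]          # s  = 3 blocks
Sp = [1, 2, 4, 5]      # s' = 4 blocks, distinct from S
trials, collisions = 20000, 0
for _ in range(trials):
    perm = DOMAIN[:]
    random.shuffle(perm)            # P_2 drawn uniformly from Perm(n)
    if cbc(perm, S) == cbc(perm, Sp):
        collisions += 1
bound = (len(S) + len(Sp)) ** 2 / 2 ** n   # (s + s')^2 / 2^n
print(collisions / trials, "<=", bound)
```

The observed collision rate stays well below the (quite loose) bound, as expected.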
Let m be an integer such that m ≤ 2^n/4. We consider the following four sets:

    D_1 = {M | M ∈ {0,1}*, n < |M| ≤ mn and |M| is a multiple of n},
    D_2 = {M | M ∈ {0,1}*, n < |M| ≤ mn and |M| is not a multiple of n},
    D_3 = {M | M ∈ {0,1}* and |M| = n},
    D_4 = {M | M ∈ {0,1}* and |M| < n}.
We next show the following lemma.

Lemma 8. Let q_1, q_2, q_3, q_4 be four non-negative integers. For 1 ≤ i ≤ 4, let M_i^{(1)}, ..., M_i^{(q_i)} be fixed bit strings such that M_i^{(j)} ∈ D_i for 1 ≤ j ≤ q_i and {M_i^{(1)}, ..., M_i^{(q_i)}} are distinct. Similarly, for 1 ≤ i ≤ 4, let T_i^{(1)}, ..., T_i^{(q_i)} be fixed n-bit strings such that {T_i^{(1)}, ..., T_i^{(q_i)}} are distinct. Then the number of P_1, ..., P_6 ∈ Perm(n) such that

    MOMAC_{P_1,...,P_6}(M_1^{(i)}) = T_1^{(i)} for all 1 ≤ i ≤ q_1,
    MOMAC_{P_1,...,P_6}(M_2^{(i)}) = T_2^{(i)} for all 1 ≤ i ≤ q_2,
    MOMAC_{P_1,...,P_6}(M_3^{(i)}) = T_3^{(i)} for all 1 ≤ i ≤ q_3, and    (14)
    MOMAC_{P_1,...,P_6}(M_4^{(i)}) = T_4^{(i)} for all 1 ≤ i ≤ q_4

is at least {(2^n)!}^6 · (1 − 2q²m²/2^n) · 1/2^{qn}, where q = q_1 + ··· + q_4.

Proof. We first consider M_1^{(1)}, ..., M_1^{(q_1)}. The number of (P_1, P_2) such that

    MOMAC_{P_1,...,P_6}(M_1^{(i)}) = MOMAC_{P_1,...,P_6}(M_1^{(j)}) for some 1 ≤ i < j ≤ q_1

is at most {(2^n)!}² · (q_1(q_1−1)/2) · 4m²/2^n from Lemma 7. Note that P_3, ..., P_6 are irrelevant in the above event. We next consider M_2^{(1)}, ..., M_2^{(q_2)}. The number of (P_1, P_2) such that

    MOMAC_{P_1,...,P_6}(M_2^{(i)}) = MOMAC_{P_1,...,P_6}(M_2^{(j)}) for some 1 ≤ i < j ≤ q_2

is at most {(2^n)!}² · (q_2(q_2−1)/2) · 4m²/2^n from Lemma 7. Now we fix any (P_1, P_2) which avoids both events; we have at least

    {(2^n)!}² · (1 − (q_1(q_1−1)/2) · 4m²/2^n − (q_2(q_2−1)/2) · 4m²/2^n)

choices. Now P_1 and P_2 are fixed in such a way that the inputs to P_3 are distinct and the inputs to P_4 are distinct, while the corresponding outputs {T_1^{(1)}, ..., T_1^{(q_1)}} are distinct and {T_2^{(1)}, ..., T_2^{(q_2)}} are distinct. We know that the inputs to P_5 are distinct, and the corresponding outputs {T_3^{(1)}, ..., T_3^{(q_3)}} are distinct. Also, the inputs to P_6 are distinct, and the corresponding outputs {T_4^{(1)}, ..., T_4^{(q_4)}} are distinct. Therefore, we have at least

    {(2^n)!}² · (1 − (q_1(q_1−1)/2) · 4m²/2^n − (q_2(q_2−1)/2) · 4m²/2^n) · (2^n − q_1)! · (2^n − q_2)! · (2^n − q_3)! · (2^n − q_4)!

choices of P_1, ..., P_6 which satisfy (14). This bound is at least {(2^n)!}^6 · (1 − 2q²m²/2^n) · 1/2^{qn}, since (2^n − q_i)! ≥ (2^n)!/2^{q_i n}. This concludes the proof of the lemma.
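The basic counting fact used repeatedly in these proofs is that the number of permutations on a 2^n-element set hitting q fixed input/output pairs is (2^n − q)!. For a toy n this can be verified by exhaustive enumeration (the sketch below, with our own illustrative values, is not from the text):

```python
# The number of permutations of {0,1}^n with q fixed input/output pairs
# is (2^n - q)!.  Toy n = 3, so Perm(n) can be enumerated exhaustively.
from itertools import permutations
from math import factorial

n, q = 3, 2
pairs = [(0, 5), (1, 2)]      # P(0) = 5, P(1) = 2 (distinct inputs and outputs)
count = sum(
    all(p[x] == y for x, y in pairs)
    for p in permutations(range(2 ** n))
)
assert count == factorial(2 ** n - q)   # (8 - 2)! = 720
```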
We now prove Lemma 4.

Proof (of Lemma 4). Let O be either MOMAC_{P_1,...,P_6} or R. Since A is computationally unbounded, there is no loss of generality in assuming that A is deterministic.
Similarly to the proof of Lemma 3, for the queries A makes to the oracle O, define the query-answer pairs (M_j^{(i)}, T_j^{(i)}) ∈ D_j × {0,1}^n, where A's i-th query in D_j was M_j^{(i)} ∈ D_j and the answer it got was T_j^{(i)} ∈ {0,1}^n. Suppose that we run A with the oracle. For this run, assume that A made q_j queries in D_j, where 1 ≤ j ≤ 4 and q_1 + ··· + q_4 = q. For this run, we define the view v of A as

    v = ((T_1^{(1)}, ..., T_1^{(q_1)}), (T_2^{(1)}, ..., T_2^{(q_2)}), (T_3^{(1)}, ..., T_3^{(q_3)}), (T_4^{(1)}, ..., T_4^{(q_4)})).    (15)

Since A is deterministic, the i-th query A makes is fully determined by the first i − 1 query-answer pairs. This implies that if we fix some qn-bit string V and return the i-th n-bit block as the answer for the i-th query A makes (instead of the oracle), then

– A's queries are uniquely determined,
– q_1, ..., q_4 are uniquely determined,
– the parsing of V into the format defined in (15) is uniquely determined, and
– the final output of A (0 or 1) is uniquely determined.
Let V_one be the set of all qn-bit strings V such that A outputs 1. We let N_one = #V_one. Also, let V_good be the set of all qn-bit strings V such that: for all 1 ≤ i < j ≤ q, the i-th n-bit block of V ≠ the j-th n-bit block of V. Note that if V ∈ V_good, then the corresponding parsing v of V satisfies: {T_1^{(1)}, ..., T_1^{(q_1)}} are distinct, {T_2^{(1)}, ..., T_2^{(q_2)}} are distinct, {T_3^{(1)}, ..., T_3^{(q_3)}} are distinct, and {T_4^{(1)}, ..., T_4^{(q_4)}} are distinct. Now observe that the number of V which are not in the set V_good is at most (q(q−1)/2) · 2^{qn}/2^n. Therefore, we have

    #{V | V ∈ (V_one ∩ V_good)} ≥ N_one − (q(q−1)/2) · 2^{qn}/2^n.    (16)

Evaluation of p_rand. We first evaluate

    p_rand = Pr(R ←R Rand(*, n) : A^{R(·)} = 1).

Then it is not hard to see that

    p_rand = Σ_{V ∈ V_one} 1/2^{qn} = N_one/2^{qn}.

Evaluation of p_real. We next evaluate

    p_real = Pr(P_1, ..., P_6 ←R Perm(n) : A^{MOMAC_{P_1,...,P_6}(·)} = 1)
           = #{(P_1, ..., P_6) | A^{MOMAC_{P_1,...,P_6}(·)} = 1} / {(2^n)!}^6.
Then from Lemma 8, we have

    p_real ≥ Σ_{V ∈ (V_one ∩ V_good)} #{(P_1, ..., P_6) | (P_1, ..., P_6) satisfying (14)} / {(2^n)!}^6
           ≥ Σ_{V ∈ (V_one ∩ V_good)} (1 − 2q²m²/2^n) · 1/2^{qn}.

Completing the Proof. From (16) we have

    p_real ≥ (N_one − (q(q−1)/2) · 2^{qn}/2^n) · (1 − 2q²m²/2^n) · 1/2^{qn}
           = (p_rand − (q(q−1)/2) · 1/2^n) · (1 − 2q²m²/2^n)
           ≥ p_rand − (q(q−1)/2) · 1/2^n − 2q²m²/2^n
           ≥ p_rand − (2q²m² + q²)/2^n.    (17)

Applying the same argument to 1 − p_real and 1 − p_rand yields

    1 − p_real ≥ 1 − p_rand − (2q²m² + q²)/2^n.    (18)

Finally, (17) and (18) give |p_real − p_rand| ≤ (2q²m² + q²)/2^n.
A Concrete Security Analysis for 3GPP-MAC

Dowon Hong¹, Ju-Sung Kang¹, Bart Preneel², and Heuisu Ryu¹

¹ Information Security Technology Division, ETRI, 161 Kajong-Dong, Yusong-Gu, Taejon, 305-350, Korea
{dwhong,jskang,hsryu}@etri.re.kr
² Katholieke Universiteit Leuven, ESAT/COSIC, Kasteelpark Arenberg 10, B-3001 Leuven-Heverlee, Belgium
[email protected]
Abstract. The standardized integrity algorithm f 9 of the 3GPP algorithm computes a MAC (Message Authentication Code) to establish the integrity and the data origin of the signalling data over a radio access link of W-CDMA IMT-2000. The function f 9 is based on the block cipher KASUMI and it can be considered as a variant of CBC-MAC. In this paper we examine the provable security of f 9. We prove that f 9 is a secure pseudorandom function by giving a concrete bound on an adversary’s inability to forge a MAC value in terms of her inability to distinguish the underlying block cipher from a random permutation. Keywords: Message authentication code, 3GPP-MAC, Provable security, Pseudo-randomness.
1 Introduction
Within the security architecture of 3GPP (the 3rd Generation Partnership Project) a standardized data authentication algorithm f 9 has been defined; this MAC (Message Authentication Code) algorithm is a variant of the standard CBC-MAC (Cipher Block Chaining) based on the block cipher KASUMI [22]. We refer to this MAC algorithm as "3GPP-MAC." The purpose of this work is to provide a proof of security for the 3GPP-MAC algorithm. Providing a security proof in the sense of reduction-based cryptography intuitively means that one proves the following statement: if there exists an adversary A that breaks a given MAC built from a block cipher E, then there exists a corresponding adversary A′ that breaks the block cipher E.
The provable security treatment of MACs based on a block cipher was started by Bellare et al. [1], who provided such a security proof for CBC-MAC. However, their proof is restricted to the case where the input messages are of fixed length. It is well known that CBC-MAC is not secure when the message length is variable [1]. A matching birthday attack has been described by Preneel and van Oorschot in [17]. Petrank and Rackoff [16] were the first to rigorously address the issue of message length variability. They provided a security proof for EMAC (Encrypted CBC-MAC), which handles messages of variable, unknown lengths.

T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 154-169, 2003. © International Association for Cryptologic Research 2003

Black and Rogaway [3] introduced three refinements to EMAC that improve its efficiency. They also provided a new security proof using new techniques which treat EMAC as an instance of the Carter-Wegman paradigm [5, 20]. Jaulmes, Joux, and Valette [7] proposed RMAC (Randomized MAC), which is an extension of EMAC. They showed that the security of RMAC improves over the birthday bound of [17] in the ideal-cipher model. This is not a reduction-based provable security result. Note that RMAC is currently being considered for standardization by NIST. However, recently it has been demonstrated that RMAC is vulnerable to related-key attacks [12-14]. Furthermore, it has been shown that it is not possible to provide a proof of security for the salted variant of RMAC [19]. Black and Rogaway [4, 18] have proposed a parallelizable block cipher mode of operation for message authentication (PMAC) together with a security proof. Several other new modes, such as XECB-MAC [6] and TMAC [8], have been submitted to NIST for consideration, but they will probably not be included in the standard [24].
The security evaluation of 3GPP-MAC has primarily been performed by the 3GPP SAGE group (Security Algorithms Group of Experts) [21]. Based on some ad hoc analysis, the general conclusion of [21] is that 3GPP-MAC does not exhibit any security weaknesses. Recently, Knudsen and Mitchell [11] analyzed 3GPP-MAC from the viewpoint of a birthday attack. They have described several types of forgery and key recovery attacks for 3GPP-MAC; they have also shown that key recovery attacks are infeasible: the most efficient attack requires around 3 × 2^48 chosen messages. We believe that it is important to provide a security proof for a MAC algorithm based on an information-theoretic and a complexity-theoretic analysis. Such a security proof can be considered as theoretical evidence of the soundness of the overall structure of a MAC algorithm.
However, so far no security proof has been provided in the literature for 3GPP-MAC. This observation motivates this paper. In this paper we prove that 3GPP-MAC is secure in the sense of reduction-based cryptography. More specifically, we prove that 3GPP-MAC is a pseudorandom function, which means that no attacker with polynomially many queries can distinguish 3GPP-MAC from a perfect random function; using this fact, we show that 3GPP-MAC is a secure MAC algorithm under the assumption that the underlying block cipher is a pseudorandom permutation. This assumption is a reasonable one, since the pseudorandomness of the 3GPP block cipher KASUMI has recently been investigated by Kang et al. [9, 10]. We do not address the question of whether the distinguishing bound we have obtained is sufficiently tight. We leave this as an open problem.
2 Preliminaries

2.1 Notation
Let {0,1}^n denote the set of all n-bit strings, and {0,1}^{n*} the set of all binary strings whose bit lengths are positive multiples of n. Let R_{n*→l} be the set of all functions λ : {0,1}^{n*} → {0,1}^l, P_n be the set of all permutations π : {0,1}^n → {0,1}^n, and K be the key space, i.e., the set of all possible key values K. For any given key space K, message space {0,1}^{n*}, and codomain {0,1}^l, a MAC is a map F : K × {0,1}^{n*} → {0,1}^l. A MAC F can be regarded as a family of functions from {0,1}^{n*} to {0,1}^l indexed by a key K ∈ K. In fact, F is a multiset since two or more different keys may define the same function. Let E : K × {0,1}^n → {0,1}^n be a block cipher; then E_K(X) = Y denotes that E uses a key K ∈ K to encrypt an n-bit string X to an n-bit ciphertext Y.

Fig. 1. The 3GPP-MAC algorithm.

2.2 The 3GPP-MAC Algorithm
The 3GPP-MAC algorithm operates as follows. Suppose the underlying block cipher E has n-bit input and output blocks. Every message M in 3GPP-MAC is first padded such that its length is a multiple of n. The padding string in 3GPP-MAC is appended even if the size of the message is already a multiple of n; it is of the following form: Count || Fresh || Message || Direction || 1 || 00···0, where Count, Fresh, and Direction are system-dependent parameters. Throughout this paper we assume that the lengths of all messages are multiples of n, since the details of the padding scheme are not relevant for our proof of security.

The 3GPP-MAC algorithm uses a pair of 128-bit keys K and K′, where K′ = K ⊕ Const and Const = 0xAA···A. For any r-block message M = M[1] ··· M[r], 3GPP-MAC is computed as follows:

    O[0] ← 0
    for i = 1, ···, r do
        I[i] ← O[i − 1] ⊕ M[i]
        O[i] ← E_K(I[i])
    O_K(M) ← O[1] ⊕ O[2] ⊕ ··· ⊕ O[r]
    M_K(M) ← the leftmost l bits of E_{K′}(O_K(M))
    return M_K(M)

Here M_K(M) is the 3GPP-MAC value of the message M. The 3GPP-MAC algorithm is also depicted in Fig. 1. The 3GPP integrity algorithm f 9 in the 3GPP technical specification [22] states that the underlying block cipher is KASUMI: KASUMI is a 64-bit block cipher with a 128-bit key. The 3GPP-MAC value consists of the leftmost 32 bits of the final encryption, i.e., l = 32.

Note that in the 3GPP-MAC algorithm, K and K′ should be distinct in order to handle variable-length messages. In fact, it is easy to break the 3GPP-MAC algorithm if K = K′. For example, if an adversary requests M_K(X) for a 1-block message X, obtaining T, and requests M_K(0) for the 1-block message 0, obtaining S, then she can compute the MAC M_K(X || 0 || (T ⊕ X) || 0 || T) = S. In other words, from the MACs of X and 0, one can forge the MAC of X || 0 || (M_K(X) ⊕ X) || 0 || M_K(X) without knowing the key K.
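The K = K′ forgery above can be checked with a toy sketch. This is NOT the real f 9: the stand-in "block cipher" E is hash-based (not KASUMI, not even a permutation), the block size is 8 bytes, no truncation is applied (l = n), and all names are illustrative.

```python
# Toy sketch of the 3GPP-MAC structure and the K' = K forgery from the text.
import hashlib

N = 8  # toy block size in bytes

def E(key: bytes, block: bytes) -> bytes:
    # hash-based stand-in keyed function (only the forward direction is needed)
    return hashlib.sha256(key + block).digest()[:N]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def mac(k: bytes, k2: bytes, msg: bytes) -> bytes:
    # CBC chain under k, XOR of all O[i], final encryption under k2
    assert len(msg) % N == 0
    o, acc = bytes(N), bytes(N)
    for i in range(0, len(msg), N):
        o = E(k, xor(o, msg[i:i + N]))   # O[i] = E_K(O[i-1] xor M[i])
        acc = xor(acc, o)                # O_K(M) = O[1] xor ... xor O[r]
    return E(k2, acc)

K = b"\x01" * 16
K2 = xor(K, b"\xAA" * 16)   # K' = K xor Const, as the text prescribes
X = b"block--X"

# Broken variant K' = K: the forgery works without knowing the key.
T = mac(K, K, X)                          # MAC of the 1-block message X
S = mac(K, K, bytes(N))                   # MAC of the 1-block message 0
forged = X + bytes(N) + xor(T, X) + bytes(N) + T
assert mac(K, K, forged) == S             # MAC(X || 0 || (T xor X) || 0 || T) = S

# With distinct keys the same forgery attempt fails, because the observed
# tag is no longer an internal chain value of the CBC iteration.
T2 = mac(K, K2, X)
forged2 = X + bytes(N) + xor(T2, X) + bytes(N) + T2
assert mac(K, K2, forged2) != mac(K, K2, bytes(N))
```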
2.3 Comparison between CBC-MAC, EMAC, and 3GPP-MAC
The basic CBC-MAC algorithm [23] works as follows: for any r-block message M = M[1] ··· M[r], the CBC-MAC of M under the key K is defined as CBC_{E_K}(M) = C_r, where C_i = E_K(M[i] ⊕ C_{i−1}) for i = 1, ···, r and C_0 = 0. The CBC-MAC is illustrated in Fig. 2.

Fig. 2. The CBC-MAC algorithm.
158
Dowon Hong et al.
M [1]
M [2]
M [r]
...
EK1
EK1
C1
C2
EK1 ...
Cr
EK2 EMAC EK ,EK( M ) 1
2
Fig. 3. The EMAC algorithm.
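A toy sketch of EMAC with the Petrank-Rackoff key derivation (again with a hash-based stand-in E rather than a real block cipher; names and values are illustrative):

```python
# EMAC sketch: EMAC_{K1,K2}(M) = E_{K2}(CBC_{E_{K1}}(M)),
# with the derived keys K1 = E_K(0) and K2 = E_K(1).
import hashlib

N = 8  # toy block size in bytes

def E(key: bytes, block: bytes) -> bytes:
    return hashlib.sha256(key + block).digest()[:N]

def cbc(key: bytes, msg: bytes) -> bytes:
    c = bytes(N)
    for i in range(0, len(msg), N):
        c = E(key, bytes(a ^ b for a, b in zip(c, msg[i:i + N])))
    return c

def emac(k: bytes, msg: bytes) -> bytes:
    k1 = E(k, bytes(N))                    # K1 = E_K(0)
    k2 = E(k, (1).to_bytes(N, "big"))      # K2 = E_K(1)
    return E(k2, cbc(k1, msg))             # encrypt the last CBC value

K = b"master-key"
tag = emac(K, b"block--1" + b"block--2")
assert len(tag) == N
```

Unlike 3GPP-MAC, the input to the final encryption here is the plain CBC value, with no XOR of the intermediate chaining values.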
In order to optimize efficiency for constructions that accept arbitrary bit strings, Black and Rogaway [3] refined EMAC in three methods, which they called ECBC, FCBC, and XCBC, respectively. On the other hand, 3GPP-MAC can be seen as a variant of EMAC. There are two differences between 3GPP-MAC and EMAC. First, 3GPP-MAC uses a pair of keys K and K′ such that K′ is straightforwardly derived from K by XORing a fixed constant, while in EMAC the two keys K1 and K2 are obtained by encrypting the two plaintexts 0 and 1 with the same key K. Thus we cannot regard E_K and E_{K′} as two independently chosen random functions f and f′ for the proof of security. This situation of 3GPP-MAC is different from that of EMAC. Second, while 3GPP-MAC uses CBC_{E_K}(M) ⊕ (C_1 ⊕ C_2 ⊕ ··· ⊕ C_{r−1}) as the input of the final block computation E_{K′}, EMAC uses CBC_{E_{K1}}(M), without XORing the C_i's, as the input of the final computation E_{K2}. These two distinct points give rise to a different security proof for EMAC and 3GPP-MAC.

2.4 Security Model
We consider the following security model. Let A be an adversary, and let A^O denote that A can access an oracle O. Without loss of generality, adversaries are assumed to never ask a query outside the domain of the oracle, and to never repeat a query. For any g ∈ F, we say that A forges g if A outputs g(x) for some x ∈ {0,1}^{n*} where A^g never queried x to its oracle g. Define

    Adv_F^{mac}(A) = Pr(A forges g | g ←R F),

where g ←R F denotes the experiment of choosing a random element from F. Assume that for any random function λ ∈ R_{n*→l}, the value λ(x) is a uniformly chosen l-bit string from {0,1}^l, for each x ∈ {0,1}^{n*}. That is, for any λ ∈ R_{n*→l}, x ∈ {0,1}^{n*}, and y ∈ {0,1}^l, Pr(λ(x) = y) = 2^{−l}. This is a reasonable assumption since for any uniformly chosen function g : {0,1}^m → {0,1}^l, Pr(g(x) = y) = 2^{−l} for each x ∈ {0,1}^m and y ∈ {0,1}^l, regardless of the input length m. We define the advantage of an adversary A in distinguishing a MAC F from the family of random functions R_{n*→l} as

    Adv_F^{R_{n*→l}}(A) = Pr(A^g outputs 1 | g ←R F) − Pr(A^λ outputs 1 | λ ←R R_{n*→l}).
We overload the notation defined above and write

    Adv_F^{mac}(t, q, σ) = max_A {Adv_F^{mac}(A)}

and

    Adv_F^{R_{n*→l}}(t, q, σ) = max_A {Adv_F^{R_{n*→l}}(A)},

where the maximum is over all adversaries A who run in time at most t and ask their oracle q queries having aggregate length of σ blocks. On the other hand, we regard the block cipher Λ_n as a family of permutations from {0,1}^n to itself indexed by a secret key K ∈ K. Define

    Adv_{Λ_n}^{P_n}(A) = Pr(A^f outputs 1 | f ←R Λ_n) − Pr(A^π outputs 1 | π ←R P_n)

and

    Adv_{Λ_n}^{P_n}(t, q) = max_A {Adv_{Λ_n}^{P_n}(A)},

where the maximum is over all distinguishers A that run in time at most t and make at most q queries.

In what follows it will be convenient to think of 3GPP-MAC as using two functions f and f′ instead of E_K and E_{K′}, respectively. We do this by denoting f to be E_K for a randomly chosen key K and f′ to be E_{K′} for the second key K′. Note that f′ is derived from f. Now we may write

    M_f(M) = the leftmost l bits of f′(Ō_f(M)),

where Ō_f(M) = O[1] ⊕ O[2] ⊕ ··· ⊕ O[r], O[i] = f(I[i]), I[i] = O[i−1] ⊕ M[i] for 1 ≤ i ≤ r, and O[0] = 0. We consider two function families related to 3GPP-MAC. A family M_{Λ_n} for a block cipher Λ_n is the set of all functions M_f for all f ∈ Λ_n, and a family M_{P_n} is the set of all functions M_π for all π ∈ P_n. M_π is defined similarly to M_f by considering π and π′ instead of f and f′; that is, for any message M,

    M_π(M) = the leftmost l bits of π′(Ō_π(M)),

where π′ ∈ P_n − {π} is automatically determined by π. Note that our results in the next section have nothing to do with the method of determining π′ from π.
3 The Security of 3GPP-MAC

3.1 Main Results
In this section we prove that the security of MΛn is implied by the security of the underlying block cipher Λn . We call a block cipher secure if it is a pseudorandom permutation: this means that no attacker with polynomially many encryption queries can distinguish the block cipher from a perfect random permutation.
This approach to modeling the security of a block cipher was introduced by Luby and Rackoff [15]. We first give the following information-theoretic bound on the security of 3GPP-MAC. We start by checking the possibility of distinguishing a random function in R_{n*→l} from a random function in M_{P_n}. We show that even a computationally unbounded adversary cannot obtain a too large advantage.

Theorem 1. Let A be an adversary that makes queries to a random function chosen either from M_{P_n} or from R_{n*→l}. Suppose that A asks its oracle q queries, these queries having aggregate length of σ blocks. Then

    Adv_{M_{P_n}}^{R_{n*→l}}(A) ≤ (σ² + 2q²)/2^n.

The proof of Theorem 1 will be given in Sect. 3.2. It is a well-known result that if a MAC algorithm preserves pseudorandomness, it resists existential forgery under adaptive chosen message attacks [1]. By using this fact and Theorem 1, we can obtain the main result:

Theorem 2. Let Λ_n : K × {0,1}^n → {0,1}^n be a family of permutations obtained from a block cipher. Then

    Adv_{M_{Λ_n}}^{R_{n*→l}}(t, q, σ) ≤ Adv_{Λ_n}^{P_n}(t′, σ) + (σ² + 2q²)/2^n    (3.1)

and

    Adv_{M_{Λ_n}}^{mac}(t, q, σ) ≤ Adv_{Λ_n}^{P_n}(t′, σ) + (σ² + 2q²)/2^n + 1/2^l,    (3.2)

where t′ = t + O(σn).

Proof. Let A be an adversary distinguishing M_{Λ_n} from R_{n*→l} which makes at most q oracle queries having aggregate length of σ blocks and runs in time at most t. In order to prove equation (3.1), we first show that there exists an adversary B_A which distinguishes Λ_n from P_n such that

    Adv_{Λ_n}^{P_n}(B_A) = Adv_{M_{Λ_n}}^{R_{n*→l}}(A) − Adv_{M_{P_n}}^{R_{n*→l}}(A),

where B_A makes at most σ queries and runs in time at most t′ = t + O(σn). The adversary B_A gets an oracle f : {0,1}^n → {0,1}^n, a permutation chosen from Λ_n or P_n. It runs A as a subroutine, using f to simulate the oracle h : {0,1}^{n*} → {0,1}^l that A expects.

Adversary B_A^f
    for i = 1, ···, q do
        when A asks its oracle a query M_i, answer with M_f(M_i)
    end for
    A outputs a bit b
    return b
The oracle supplied to A by B_A is M_f, where f is B_A's oracle, and hence

    Adv_{Λ_n}^{P_n}(B_A) = Pr(B_A^f = 1 | f ←R Λ_n) − Pr(B_A^f = 1 | f ←R P_n)
                         = Pr(A^h = 1 | h ←R M_{Λ_n}) − Pr(A^h = 1 | h ←R M_{P_n}).

However,

    Adv_{M_{P_n}}^{R_{n*→l}}(A) = Pr(A^h = 1 | h ←R M_{P_n}) − Pr(A^h = 1 | h ←R R_{n*→l}).

Therefore, by taking the sum of the two equations above, we obtain

    Adv_{Λ_n}^{P_n}(B_A) + Adv_{M_{P_n}}^{R_{n*→l}}(A) = Pr(A^h = 1 | h ←R M_{Λ_n}) − Pr(A^h = 1 | h ←R R_{n*→l}) = Adv_{M_{Λ_n}}^{R_{n*→l}}(A).

From this equation and the result of Theorem 1, we get

    Adv_{Λ_n}^{P_n}(B_A) ≥ Adv_{M_{Λ_n}}^{R_{n*→l}}(A) − (σ² + 2q²)/2^n,

and equation (3.1) follows, since

    Adv_{M_{Λ_n}}^{R_{n*→l}}(t, q, σ) = max_A {Adv_{M_{Λ_n}}^{R_{n*→l}}(A)}
                                      ≤ max_A {Adv_{Λ_n}^{P_n}(B_A) + (σ² + 2q²)/2^n}
                                      ≤ Adv_{Λ_n}^{P_n}(t′, σ) + (σ² + 2q²)/2^n.

Using Proposition 2.7 of [1], we can easily show that

    Adv_{M_{Λ_n}}^{mac}(t, q, σ) ≤ Adv_{M_{Λ_n}}^{R_{n*→l}}(t′, q, σ) + 1/2^l,    (3.3)

where t′ = t + O(σn). Combining (3.1) and (3.3), we obtain equation (3.2), which completes the proof.

3.2 Proof of Theorem 1
Remember that the second permutation π′ in M_π(·) is derived from π. In order to prove Theorem 1, we first prove the result under the condition that the second permutation π′ is not related to the first permutation π in 3GPP-MAC. Assume that π and π′ are chosen independently from P_n. For any r-block message M = M[1] ··· M[r], we set

    M_{π,π′}(M) = the leftmost l bits of π′(Ō_π(M)),

where Ō_π(M) = O[1] ⊕ ··· ⊕ O[r], O[i] = π(I[i]), and I[i] = O[i−1] ⊕ M[i] for 1 ≤ i ≤ r. Let M_{P_n×P_n} be the set of all functions M_{π,π′}, where π and π′ are chosen independently from P_n. Lemma 1 below provides an information-theoretic bound on the security of M_{P_n×P_n}.

Lemma 1. Let A be an adversary that makes queries to a random function chosen either from M_{P_n×P_n} or from R_{n*→l}. Suppose that A asks its oracle q queries, these queries having aggregate length of σ blocks. Then

    Adv_{M_{P_n×P_n}}^{R_{n*→l}}(A) ≤ (σ² + 2q²)/2^{n+1}.
Proof. To prove Lemma 1 we apply the idea from the proof of PMAC's security in [18]. Let A be an adversary distinguishing M_{P_n×P_n} from R_{n*→l}. Since the adversary A is not limited in computational power, we may assume it is deterministic. One can imagine A interacting with an M_{P_n×P_n} oracle as A playing the following game, denoted Game 1.

Game 1: Simulation of M_{P_n×P_n}
 1  unusual ← false; for all x ∈ {0,1}^n do π(x) ← undefined, π′(x) ← undefined
 2  When A makes its t-th query, M_t = M_t[1] ··· M_t[r_t], where t ∈ {1, ···, q}:
 3    I_t[1] ← M_t[1]
 4    for i = 1, ···, r_t do
 5      A ← {I_t[j] | 1 ≤ j ≤ i−1} ∪ {I_s[j] | 1 ≤ s ≤ t−1, 1 ≤ j ≤ r_s}
 6      if I_t[i] ∈ A then O_t[i] ← π(I_t[i])
 7      else O_t[i] ←R {0,1}^n
 8      A_π ← {π(I_t[j]) | 1 ≤ j ≤ i−1} ∪ {π(I_s[j]) | 1 ≤ s ≤ t−1, 1 ≤ j ≤ r_s}
 9      if O_t[i] ∈ A_π then [ unusual ← true; O_t[i] ←R A_π^C ]
10      π(I_t[i]) ← O_t[i]
11      if i < r_t then I_t[i+1] ← O_t[i] ⊕ M_t[i+1]
12    Ō_t(M_t) ← O_t[1] ⊕ ··· ⊕ O_t[r_t]
13    B ← {Ō_s(M_s) | 1 ≤ s ≤ t−1}
14    if Ō_t(M_t) ∈ B then [ unusual ← true; MAC_t ← π′(Ō_t(M_t)) ]
15    else MAC_t ←R {0,1}^n
16    B_π ← {π′(Ō_s(M_s)) | 1 ≤ s ≤ t−1}
17    if MAC_t ∈ B_π then [ unusual ← true; MAC_t ←R B_π^C ]
18    π′(Ō_t(M_t)) ← MAC_t
19    M_{π,π′}(M_t) ← the leftmost l bits of MAC_t
20    return M_{π,π′}(M_t)

Here we use A_π^C and B_π^C to denote the complements of A_π and B_π, respectively. Two particular permutations π and π′ are equally likely among all permutations from {0,1}^n to {0,1}^n. In our simulation, we view the selection of π and π′ as an incremental procedure; this is equivalent to selecting π and π′ uniformly at random. This game perfectly simulates the behavior of M_{P_n×P_n}.
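The "incremental procedure" above is the standard lazy sampling of a random permutation: π(x) stays undefined until it is queried, and is then drawn uniformly from the unused range values. A minimal Python sketch (our own illustrative code, which resamples on range collisions instead of flagging them unusual as the game does):

```python
# Lazy (incremental) sampling of a random permutation on {0,1}^n.
import random

class LazyPerm:
    def __init__(self, n):
        self.n = n
        self.table = {}        # defined points of pi
        self.used = set()      # range values already assigned

    def __call__(self, x):
        if x not in self.table:                    # pi(x) still undefined
            y = random.randrange(2 ** self.n)
            while y in self.used:                  # resample on collision
                y = random.randrange(2 ** self.n)
            self.table[x] = y
            self.used.add(y)
        return self.table[x]

random.seed(0)
pi = LazyPerm(8)
assert pi(3) == pi(3)                             # consistent on repeats
assert len({pi(x) for x in range(256)}) == 256    # injective: a permutation
```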
Let UNUSUAL be the event that the flag unusual is set to true in Game 1. In the absence of the event UNUSUAL, the returned value M_{π,π′}(M_t) at line 20 is random, since it consists of the leftmost l bits of the string randomly selected at line 15. That is, the adversary sees random values returned on distinct points. Therefore we get

    Adv_{M_{P_n×P_n}}^{R_{n*→l}}(A) ≤ Pr(UNUSUAL).    (3.4)
First we consider the probability that the flag unusual is set to true in line 9 or 17. In both cases, we have just chosen a random n-bit string and then we check whether it is an element of a set. We have

    Pr(unusual = true in lines 9 or 17 in Game 1) ≤ (1 + 2 + ··· + (σ − 1) + 1 + ··· + (q − 1))/2^n ≤ (σ² + q²)/2^{n+1}.    (3.5)
Now we can modify Game 1 by changing the behavior when unusual = true, adding as a compensating factor the bound given by equation (3.5). We omit lines 8, 9, 16 and 17, and the last statement in line 14. The modified game is as follows.

Game 2: Simplification of Game 1
 1  unusual ← false; for all x ∈ {0,1}^n do π(x) ← undefined, π′(x) ← undefined
 2  When A makes its t-th query, M_t = M_t[1] ··· M_t[r_t], where t ∈ {1, ···, q}:
 3    I_t[1] ← M_t[1]
 4    for i = 1, ···, r_t do
 5      A ← {I_t[j] | 1 ≤ j ≤ i−1} ∪ {I_s[j] | 1 ≤ s ≤ t−1, 1 ≤ j ≤ r_s}
 6      if I_t[i] ∈ A then O_t[i] ← π(I_t[i])
 7      else [ O_t[i] ←R {0,1}^n; π(I_t[i]) ← O_t[i] ]
 8      if i < r_t then I_t[i+1] ← O_t[i] ⊕ M_t[i+1]
 9    Ō_t(M_t) ← O_t[1] ⊕ ··· ⊕ O_t[r_t]
10    B ← {Ō_s(M_s) | 1 ≤ s ≤ t−1}
11    if Ō_t(M_t) ∈ B then unusual ← true
12    MAC_t ←R {0,1}^n
13    π′(Ō_t(M_t)) ← MAC_t
14    M_{π,π′}(M_t) ← the leftmost l bits of MAC_t
15    return M_{π,π′}(M_t)

By equation (3.5) we have

    Pr(UNUSUAL) ≤ Pr(unusual = true in Game 2) + (σ² + q²)/2^{n+1}.    (3.6)
In Game 2 the value M_{π,π′}(M_t) returned in response to a query M_t is a random l-bit string. Thus we can first select these MAC_t values in Game 2. This does not change the view of the adversary that interacts with the game, nor the probability that unusual is set to true. This modified game is called Game 3, and it is depicted as follows.

Game 3: Modification of Game 2
 1  unusual ← false; for all x ∈ {0,1}^n do π(x) ← undefined, π′(x) ← undefined
 2  When A makes its t-th query, M_t = M_t[1] ··· M_t[r_t], where t ∈ {1, ···, q}:
 3    MAC_t ←R {0,1}^n
 4    M_{π,π′}(M_t) ← the leftmost l bits of MAC_t
 5    return M_{π,π′}(M_t)
 6  When A is done making its q queries:
 7    for t = 1, ···, q do
 8      I_t[1] ← M_t[1]
 9      for i = 1, ···, r_t do
10        A ← {I_t[j] | 1 ≤ j ≤ i−1} ∪ {I_s[j] | 1 ≤ s ≤ t−1, 1 ≤ j ≤ r_s}
11        if I_t[i] ∈ A then O_t[i] ← π(I_t[i])
12        else [ O_t[i] ←R {0,1}^n; π(I_t[i]) ← O_t[i] ]
13        if i < r_t then I_t[i+1] ← O_t[i] ⊕ M_t[i+1]
14      Ō_t(M_t) ← O_t[1] ⊕ ··· ⊕ O_t[r_t]
15      B ← {Ō_s(M_s) | 1 ≤ s ≤ t−1}
16      if Ō_t(M_t) ∈ B then unusual ← true
17      π′(Ō_t(M_t)) ← MAC_t

We note that

    Pr(unusual = true in Game 3) = Pr(unusual = true in Game 2).    (3.7)
Now we want to show that the probability of unusual = true in Game 3, over the random MAC_t values selected at line 3 and the random O_t[i] values selected at line 12, is small. In fact, we will show something stronger: even if one arbitrarily fixes the values of MAC_1, ···, MAC_q ∈ {0,1}^n, the probability that unusual will be set to true is still small. Since the oracle answers have now been fixed and the adversary is deterministic, the queries M_1, ···, M_q that the adversary will make have likewise been fixed. The new game is called Game 4(C). It depends on the constants C = (q, MAC_1, ···, MAC_q, M_1, ···, M_q).

Game 4(C)
 1  unusual ← false; for all x ∈ {0,1}^n do π(x) ← undefined, π′(x) ← undefined
 2  for t = 1, ···, q do
 3    I_t[1] ← M_t[1]
 4    for i = 1, ···, r_t do
 5      A ← {I_t[j] | 1 ≤ j ≤ i−1} ∪ {I_s[j] | 1 ≤ s ≤ t−1, 1 ≤ j ≤ r_s}
 6      if I_t[i] ∈ A then O_t[i] ← π(I_t[i])
 7      else [ O_t[i] ←R {0,1}^n; π(I_t[i]) ← O_t[i] ]
 8      if i < r_t then I_t[i+1] ← O_t[i] ⊕ M_t[i+1]
 9    Ō_t(M_t) ← O_t[1] ⊕ ··· ⊕ O_t[r_t]
10    B ← {Ō_s(M_s) | 1 ≤ s ≤ t−1}
11    if Ō_t(M_t) ∈ B then unusual ← true
12    π′(Ō_t(M_t)) ← MAC_t
A Concrete Security Analysis for 3GPP-MAC
We know that

Pr(unusual = true in Game 3) ≤ max_C {Pr(unusual = true in Game 4(C))}.    (3.8)
Thus, by (3.4) and (3.6)–(3.8) we have that

Adv_{M_{Pn×Pn}}^{Rn∗→l}(A) ≤ max_C {Pr(unusual = true in Game 4(C))} + (σ² + q²)/2^{n+1},    (3.9)
where, if A is limited to q queries of aggregate length σ, then C specifies q, message strings M_1, ..., M_q of aggregate block length σ, and MAC_1, ..., MAC_q ∈ {0,1}^n.

Finally, we modify Game 4(C) by changing the order of choosing a random O_t[i] in line 7. This game is called Game 5(C).

Game 5(C)
1   unusual ← false; for all x ∈ {0,1}^n do π(x) ← undefined, π′(x) ← undefined
2   For t = 1, ..., q do
3       For i = 1, ..., r_t do
4           I_t[1] ← M_t[1] ; O_t[i] ←R {0,1}^n
5           A ← {I_t[j] | 1 ≤ j ≤ i−1} ∪ {I_s[j] | 1 ≤ s ≤ t−1, 1 ≤ j ≤ r_s}
6           if I_t[i] ∈ A then O_t[i] ← π(I_t[i])
7           else π(I_t[i]) ← O_t[i]
8           if i < r_t then I_t[i+1] ← O_t[i] ⊕ M_t[i+1]
9       Ō_t(M_t) ← O_t[1] ⊕ ··· ⊕ O_t[r_t]
10      B ← {Ō_s(M_s) | 1 ≤ s ≤ t−1}
11      If Ō_t(M_t) ∈ B then unusual ← true
12      π′(Ō_t(M_t)) ← 0^n

Notice that in Game 5 we choose a random O_t[i] value in line 4. To avoid having the game depend on the MAC_t values, we also set π′(Ō_t(M_t)) to a particular value, 0^n, instead of to MAC_t in the last line. The particular value associated to this point is not used unless unusual has already been set to true. Thus we obtain that

Pr(unusual = true in Game 4(C)) = Pr(unusual = true in Game 5(C)).    (3.10)
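As an illustrative aside (not part of the proof), the core of Game 5 — a lazily sampled π, the running XOR Ō_t, and the unusual flag — can be simulated at small parameters to see that the flag is almost never raised. The sketch below is our own; the toy parameters n = 16, q = 4, r = 3, and the use of a random function in place of a random permutation, are assumptions for illustration only.

```python
import random

def game5_unusual(q, r, n, rng):
    # One run of the core of Game 5: q messages of r blocks each.
    # pi is lazily sampled as a random function on n-bit strings,
    # a close stand-in for a random permutation when q*r << 2^n.
    pi = {}
    obar_seen = set()
    for _ in range(q):
        msg = [rng.getrandbits(n) for _ in range(r)]
        inp = msg[0]
        obar = 0
        for i in range(r):
            if inp not in pi:
                pi[inp] = rng.getrandbits(n)
            obar ^= pi[inp]                     # XOR the chain outputs
            if i + 1 < r:
                inp = pi[inp] ^ msg[i + 1]
        if obar in obar_seen:
            return True                         # `unusual` would be set
        obar_seen.add(obar)
    return False

rng = random.Random(1)
trials = 2000
hits = sum(game5_unusual(q=4, r=3, n=16, rng=rng) for _ in range(trials))
# The proof bounds Pr(unusual) by roughly q^2 / 2^(n+1) = 16 / 2^17,
# so almost no trial should raise the flag.
```

With these parameters the observed fraction of flagged trials stays far below one percent, consistent with the bound derived next.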
The coins used in Game 5 are O_1(M_1) = O_1[1] ··· O_1[r_1], ..., O_q(M_q) = O_q[1] ··· O_q[r_q], where each O_s[i] is either a fresh random coin or a synonym for some earlier O_u[j]. Here we set O_t[0] = 0 and I_t[k] ← O_t[k−1] ⊕ M_t[k] for 1 ≤ t ≤ q and 1 ≤ k ≤ r_t; if there exists a smallest number u < s such that I_s[i] = I_u[j], then O_s[i] = O_u[j]; else if there exists a smallest number j < i such that I_s[i] = I_s[j], then O_s[i] = O_s[j]; else O_s[i] is a fresh random coin. Run Game 5 on M_1, ..., M_q and the indicated vector of coins. Suppose that unusual gets set to true on this execution. Let s ∈ {1, ..., q} be the particular value of t when unusual first gets set to true. Then
Dowon Hong et al.
Ō_s(M_s) = Ō_u(M_u) for some u ∈ {1, ..., s−1}.

In this case, if we had run Game 5 using coins O_u and O_s and restricting the execution of line 2 to t ∈ {u, s}, then unusual still would have been set to true. In this restricted Game 5, we get

Pr(Ō_s(M_s) = Ō_u(M_u)) = Pr(O_s[1] ⊕ ··· ⊕ O_s[r_s] = O_u[1] ⊕ ··· ⊕ O_u[r_u]) = 2^{−n}

because O_u[1] in Ō_u(M_u) is a random string in {0,1}^n. Thus we obtain that

max_C {Pr(unusual ← true in Game 5(C))} ≤ max_{r_1,...,r_q, σ=Σ r_i} Σ_{1≤u<s≤q} 2^{−n} ≤ (q(q−1)/2) · (1/2^n) ≤ q²/2^{n+1}.    (3.11)
Combining (3.9)–(3.11), we get that

Adv_{M_{Pn×Pn}}^{Rn∗→l}(A) ≤ (σ² + q²)/2^{n+1} + q²/2^{n+1}.
This completes the proof of Lemma 1.

Now we check the possibility of distinguishing a random function in the original M_{Pn} from a random function in M_{Pn×Pn}. To obtain the result, we first need to define what inner collisions are in M_{Pn} and M_{Pn×Pn}.

Definition 1. Let M_1, ..., M_q be q strings in {0,1}^{n∗}, and let π, π′ be two random permutations in P_n. We say that there occurs an inner collision of M_π on the queries M_1, ..., M_q if the collision occurs before invoking the second permutation, which is derived from π. Namely, if there exists a pair of indices 1 ≤ i < j ≤ q for which Ō_π(M_i) = Ō_π(M_j). Similarly, we say that there exists an inner collision of M_{π,π′} on the queries M_1, ..., M_q if the collision occurs before invoking the second permutation π′.

Lemma 2. Let A be an adversary that makes queries to a random function chosen either from M_{Pn} or from M_{Pn×Pn}. Suppose that A asks its oracle q queries, these queries having an aggregate length of σ blocks. Then

Adv_{M_{Pn×Pn}}^{M_{Pn}}(A) ≤ (σ² + 2q²)/2^{n+1}.
Proof. Let ICol(M_{Pn}) be the event that there is an inner collision among the messages in M_{Pn}, and let ICol(M_{Pn×Pn}) be the event that there is an inner
collision among the messages in M_{Pn×Pn}. Observe that since both algorithms are the same before invoking the second permutation, the inner collision probabilities in both algorithms are the same. Thus the following equation holds:

Pr[ICol(M_{Pn}) | π ←R P_n] = Pr[ICol(M_{Pn×Pn}) | π, π′ ←R P_n].    (3.12)

For the same reason, if no inner collisions occur, the adversary outputs 1 with the same probability for M_{Pn} and M_{Pn×Pn}, because she sees outputs of permutations on distinct points, and the second permutations of M_{Pn} and M_{Pn×Pn} are independent. Let Pr_1(·) denote the probability that A^{M_π} outputs 1 under the experiment π ←R P_n, and Pr_2(·) the probability that A^{M_{π,π′}} outputs 1 under the experiment π, π′ ←R P_n. Then the following holds:

Pr_1[A^{M_π} = 1 | ¬ICol(M_{Pn})] = Pr_2[A^{M_{π,π′}} = 1 | ¬ICol(M_{Pn×Pn})],    (3.13)

where ¬ICol(M_{Pn}) and ¬ICol(M_{Pn×Pn}) denote the complements of ICol(M_{Pn}) and ICol(M_{Pn×Pn}), respectively. Therefore, by using equations (3.12) and (3.13), we can bound the adversary's advantage as follows:
Adv_{M_{Pn×Pn}}^{M_{Pn}}(A) ≤ |Pr_1(A^{M_π} = 1) − Pr_2(A^{M_{π,π′}} = 1)|
  = |Pr_1(A^{M_π} = 1 | ICol(M_{Pn})) · Pr_1(ICol(M_{Pn}))
     + Pr_1(A^{M_π} = 1 | ¬ICol(M_{Pn})) · Pr_1(¬ICol(M_{Pn}))
     − Pr_2(A^{M_{π,π′}} = 1 | ICol(M_{Pn×Pn})) · Pr_2(ICol(M_{Pn×Pn}))
     − Pr_2(A^{M_{π,π′}} = 1 | ¬ICol(M_{Pn×Pn})) · Pr_2(¬ICol(M_{Pn×Pn}))|
  = Pr_2(ICol(M_{Pn×Pn})) · |Pr_1(A^{M_π} = 1 | ICol(M_{Pn})) − Pr_2(A^{M_{π,π′}} = 1 | ICol(M_{Pn×Pn}))|
  ≤ Pr_2(ICol(M_{Pn×Pn})).

To bound this quantity, we consider again the proof of Lemma 1. In the proof of Lemma 1, Game 1 perfectly simulates the behavior of M_{Pn×Pn}. Observe that when an inner collision occurs in M_{Pn×Pn}, the flag unusual is set to true in Game 1. Thus by Lemma 1, we obtain that

Pr_2(ICol(M_{Pn×Pn})) ≤ Pr(unusual = true in Game 1) ≤ (σ² + 2q²)/2^{n+1},

which completes the proof of Lemma 2.

Proof of Theorem 1: From Lemmas 1 and 2, the proof of Theorem 1 follows immediately.
4 Conclusion
In this work we have examined the provable security of the 3GPP-MAC algorithm f9. We have provided a proof of security for 3GPP-MAC in the sense of reduction-based cryptography. More specifically, we have shown that if there is an existential forgery attack on 3GPP-MAC, then the underlying block cipher can be attacked with comparable parameters. It is considered highly unlikely that a realistic attack exists on the 3GPP block cipher KASUMI. If that is indeed the case, our results establish the soundness of the 3GPP-MAC algorithm.
New Attacks against Standardized MACs

Antoine Joux¹, Guillaume Poupard¹, and Jacques Stern²

¹ DCSSI Crypto Lab, 51 Boulevard de La Tour-Maubourg, 75700 Paris 07 SP, France
{Antoine.Joux,Guillaume.Poupard}@m4x.org
² Département d'Informatique, École normale supérieure, 45 rue d'Ulm, 75230 Paris Cedex 05, France
[email protected]
Abstract. In this paper, we revisit the security of several message authentication code (MAC) algorithms based on block ciphers, when instantiated with 64-bit block ciphers such as DES. We essentially focus on algorithms that were proposed in the norm ISO/IEC 9797–1. We consider both forgery attacks and key recovery attacks. Our results improve upon the previously known attacks and show that all algorithms but one proposed in this norm can be broken by obtaining at most about 2^33 MACs of chosen messages and performing an exhaustive search of the same order as the search for a single DES key.
1 Introduction
T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 170–181, 2003.
© International Association for Cryptologic Research 2003

Message authentication codes (MACs) are secret-key cryptographic primitives designed to ensure the integrity of messages sent between users sharing common keys. MAC algorithms are often based on block ciphers or hash functions. Among the MAC algorithms based on block ciphers, the CBC-MAC construction is probably the best known and studied. Initially proposed in [2], it has been studied in many papers, both from the cryptanalytic point of view [12] and from the security point of view [3]. It is well known that this algorithm suffers from birthday paradox based weaknesses, and this fact is reflected both in the known attacks and in the security proofs for this mode of operation of block ciphers. Of course, with 64-bit block ciphers, the birthday paradox hits as soon as 2^32 messages have been authenticated with the same key. With high speed networks and high bandwidth applications, this is clearly not enough. In order to reach higher security, it is possible to use block ciphers with larger blocks, such as the AES, or more complicated MAC mechanisms secure beyond the birthday paradox, such as, for example, RMAC [11]. However, in many real life applications, developers still use variants of the CBC-MAC, such as the algorithms described in ISO/IEC 9797–1 [7]. Of course, none of these algorithms has a known security proof that holds beyond the
birthday paradox barrier. Moreover, most of them have known forgery attacks with 2^32 known or chosen messages. Yet, these forgery attacks are often seen as academic and impractical, and most developers only care about key recovery attacks. As long as finding the keys requires an exhaustive search for two or more DES keys simultaneously, i.e., as long as the 56-bit DES keys cannot be found one by one, the algorithm is deemed secure. According to this point of view, some algorithms in the norm are still insecure, but others are considered secure against key recovery attacks. More precisely, according to the informative annex B of ISO/IEC 9797–1, among the 6 MAC algorithms proposed in this standard, the first 3 have known efficient key recovery attacks, while the other 3 are considered to be secure in that sense. However, Coppersmith, Knudsen and Mitchell [6, 5] have proved that, whatever the padding method may be, the fourth algorithm is also insecure. In this paper, we show that the fifth algorithm can also be cryptanalyzed with efficient key recovery attacks. This algorithm consists of two parallel computations of CBC-MAC. As a consequence, both for efficiency and for security reasons, it is much preferable to use a classical CBC-MAC with retail using a 128-bit block cipher such as AES and, if needed, to truncate the MAC value to 64 bits. We also describe generic key recovery attacks against any MAC based on a single internal CBC chain.
2 Description of CBC-MAC and Some Variants

In this section, we give a general description of MAC algorithms based on a single CBC chain with a key K and quite general and arbitrary initial and final keyed transformations. For the sake of simplicity, we assume that the initial computation I is applied to the first block of the message and yields the initial value of the CBC chain, which starts at the second block. We also assume that the final computation F is applied to the final value in the CBC chain and results in the MAC tag m. Since all MAC algorithms based on a single CBC chain we are aware of are of this type, we do not lose much generality. Furthermore, this formalism is also used in ISO/IEC 9797–1. More precisely, for a message M_1, M_2, ..., M_ℓ the computation works as follows (see also figure 1):

– Let C_1 = I(M_1).
– For i from 2 to ℓ, let C_i = E_K(C_{i−1} ⊕ M_i).
– Let the MAC tag m be F(C_ℓ).

In order to fully specify a MAC algorithm of this type, it suffices to give explicit descriptions of I and F. The simplest example of a plain CBC-MAC occurs when I is defined as E_K and F is the identity function. The ISO/IEC 9797–1 standard defines 6 MAC algorithms. The first 4 are defined in the following table:
  Algorithm of ISO/IEC 9797–1   initial transformation   final transformation   reference
  1                             E_K                      ∅                      [7, 10, 2]
  2                             E_K                      E_{K1}                 [7]
  3                             E_K                      E_K ∘ D_{K1}           [7, 1]
  4                             E_{K2} ∘ E_K             E_{K1}                 [7, 9]
The MAC algorithm 5 is defined as the exclusive-or of two MAC values computed using algorithm 1 with different keys. The MAC algorithm 6 is similar but uses the exclusive-or of two MACs computed with algorithm 4. Even if the algorithms of [7] are defined with up to 6 secret keys, it is advised to derive them from only one or two keys. In the following, we propose attacks that do not try to take advantage of such key derivation techniques. It should also be noticed that ISO/IEC 9797–1 defines a complete MAC algorithm by specifying which padding should be used and whether a final truncation is applied to the result. Our attacks immediately apply to standard paddings, but we do not consider messages that include the bit length as a first block (padding 3 of [7]).
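The generic construction above is easy to sketch in code. The fragment below is our own toy model, not an implementation of the standard: a shuffled lookup table stands in for the block cipher E_K, the block size is an arbitrary 12 bits, and the choices of I and F show how algorithms 1 and 2 arise as instances.

```python
import random

N = 12                                  # toy block size in bits (our choice)

def toy_perm(key):
    # Toy stand-in for the block cipher E_K: a keyed pseudorandom
    # permutation table over N-bit blocks (not a real cipher).
    table = list(range(1 << N))
    random.Random(key).shuffle(table)
    return table

def cbc_mac(blocks, I, E, F):
    # Generic single-chain CBC MAC as described in the text:
    #   C_1 = I(M_1);  C_i = E_K(C_{i-1} xor M_i);  tag = F(C_l).
    c = I(blocks[0])
    for m in blocks[1:]:
        c = E[c ^ m]
    return F(c)

E = toy_perm(1)
E1 = toy_perm(2)
# Algorithm 1 (plain CBC-MAC): I = E_K, F = identity.
tag1 = cbc_mac([3, 5, 7], I=lambda m: E[m], E=E, F=lambda c: c)
# Algorithm 2: same chain, but F = E_{K1}.
tag2 = cbc_mac([3, 5, 7], I=lambda m: E[m], E=E, F=lambda c: E1[c])
```

Parameterizing over I and F mirrors the formalism of the standard: each of the first four algorithms differs only in those two transformations.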
Fig. 1. Generic CBC MAC algorithm. [Figure: the first block M_1 passes through I to give C_1; each subsequent block M_i is XORed into the chain and encrypted under E_K; the final value C_ℓ passes through F to give MAC(M).]
3 Overview of Classical Attacks

3.1 Birthday Paradox Forgery of Any Generic CBC MAC Algorithm
Let us assume that the final transformation F of a generic CBC MAC algorithm is a permutation. Then, we observe MAC tags of known messages. If we denote
  Algorithm                            complexity(*)                       reference
  algorithm 1                          [2^56, 1, 0, 0]
  algorithms 2 and 3                   [3×2^56, 2^32, 0, 0]
  algorithm 4 with paddings 1 and 2    [4×2^56, 2^32, 2, 0]                [6]
                                       or [4×2^56, 1, 1, 2^56]
  algorithm 4 with padding 3           [4×2^56, 0, 2^64, 2^64]             [5]
                                       or [8×2^56, 2×2^32, 3×2^48, 0]

(*) An [a, b, c, d] attack requires:
  – a off-line block cipher encipherments,
  – b known data string/MAC pairs,
  – c chosen data string/MAC pairs,
  – d on-line MAC verifications.

Fig. 2. Complexity of key recovery attacks against ISO/IEC 9797–1 MAC algorithms.
by n the block size (which is also the size of the MAC tag), then, according to the birthday paradox, the observation of O(2^{n/2}) MAC tags allows one to find a collision, i.e., two different messages M and M′ with the same MAC. For a 64-bit block cipher such as DES or triple-DES, this means that a collision occurs after the computation of only about 2^32 MAC tags. More precisely, we write the blocks of messages M and M′ in the following way:

M = (M_1, M_2, ..., M_{ℓ1}, N_1, ..., N_{ℓ2}),   M′ = (M′_1, M′_2, ..., M′_{ℓ′1}, N_1, ..., N_{ℓ2}),

with M_{ℓ1} ≠ M′_{ℓ′1}. Then, it is easy to check that, for any blocks N′_1, ..., N′_{ℓ′2},

MAC(M_1, M_2, ..., M_{ℓ1}, N′_1, ..., N′_{ℓ′2}) = MAC(M′_1, M′_2, ..., M′_{ℓ′1}, N′_1, ..., N′_{ℓ′2}).

Consequently, we obtain a forgery attack in which querying the MAC tag of one message enables us to compute the (same) MAC tag of a different message. Furthermore, using the same notations, we can also notice the following identity, which will be used in the sequel:

MAC(M_1, M_2, ..., M_{ℓ1−1}, X, N′_1, ..., N′_{ℓ′2}) = MAC(M′_1, M′_2, ..., M′_{ℓ′1−1}, X ⊕ M_{ℓ1} ⊕ M′_{ℓ′1}, N′_1, ..., N′_{ℓ′2}).
In conclusion, independently of the key size and of the complexity of the initial and final transformations I and F, CBC MAC algorithms are vulnerable to forgery attacks if the block size is too small. However, such attacks are often seen as academic and impractical by developers. That is why, in the following, we mainly focus on key recovery attacks.

3.2 Attacking Algorithm 1 from ISO/IEC 9797–1
The first and simplest MAC algorithm, which has also been standardized by NIST [10], can be attacked in many different ways. Used with the simple DES
block cipher, the knowledge of one MAC tag allows, through an exhaustive search on the key K, to recover this key. Furthermore, the algorithm does not have any final "retail", so it is easy to obtain valid MAC tags of concatenated messages using a so-called "xor-forgery".

3.3 Attacking Algorithms 2 and 3 from ISO/IEC 9797–1
If the initial transformation is a single application of the block cipher, as in algorithms 2 and 3, and if the final transformation is a permutation, the observation of a collision allows one to recover the key K as in the case of algorithm 1. The attack [12] goes as follows. Observe MAC tags of known messages of at least two blocks until a collision occurs. By the birthday paradox, such an event should appear after the observation of O(2^{n/2}) messages, where n is the block size. Since F is a permutation, a collision is also present at the end of the CBC chain, so we obtain a test for the exhaustive search on K. When using DES, we need the observation of 2^32 MAC values for known messages, followed by an exhaustive search for a single DES key. Then, the key K1 used in F can be recovered by another exhaustive search.
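On a scaled-down instance, the whole attack (collect tags until two messages collide, then test candidate keys against the implied chain collision) fits in a few lines. The sketch below is our own toy model of algorithm 2: 8-bit shuffled-table permutations stand in for the cipher, the key space is artificially 2^8, and the secret key values are arbitrary.

```python
import random

N = 8                                   # toy block size; toy key space 2^N

def toy_perm(key):
    t = list(range(1 << N))
    random.Random(key).shuffle(t)
    return t

K, K1 = 5, 9                            # secret keys (simulation only)
E, E1 = toy_perm(K), toy_perm(K1)

def mac2(m):                            # toy algorithm 2 on 2-block messages
    return E1[E[E[m[0]] ^ m[1]]]

# Step 1: observe tags of random messages until two distinct messages collide.
rng = random.Random(0)
seen, coll = {}, None
while coll is None:
    m = (rng.getrandbits(N), rng.getrandbits(N))
    t = mac2(m)
    if t in seen and seen[t] != m:
        coll = (seen[t], m)
    seen[t] = m
mA, mB = coll

# Step 2: since F is a permutation, the MAC collision implies a collision
# at the end of the CBC chain, giving a test for exhaustive search on K.
cands = []
for k in range(1 << N):
    Ek = toy_perm(k)
    if Ek[Ek[mA[0]] ^ mA[1]] == Ek[Ek[mB[0]] ^ mB[1]]:
        cands.append(k)
assert K in cands                       # K survives; a few false alarms may too
```

The few surviving wrong candidates would be eliminated with one or two additional known tags, exactly as in the full-size attack.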
4 Devising Some Tools
Throughout this section, we assume that n denotes the block size used in the MAC algorithms, or equivalently in the basic block cipher E we rely on. We are mostly interested in the case where n is 64 bits. The attacks presented in this paper mostly rely on techniques that allow us to learn the exclusive-or of two intermediate values present in two of the core CBC chains.

4.1 Exclusive-Or of Intermediate Values in CBC Chains
We first recall a technique of Coppersmith and Mitchell [6]. We assume that we are given any generic message authentication code algorithm based on a single CBC computation chain with block cipher E and key K, as defined in section 2. Clearly, by fixing the last blocks of the message, we turn the final computation into a function of the final output of the CBC chain. In many cases, such as algorithms 1 to 4 of ISO/IEC 9797–1 without final truncation, this function is in fact a permutation, and we are able to learn whether the outputs of two different chains are equal or not. Note that this hypothesis is not essential. Indeed, if the output function is not a permutation, it suffices to observe the output of several computation pairs done with different final blocks. When the outputs of the two chains are equal, all pairs will contain two identical values, whatever the final transformation may be. As a consequence, we may ignore the final computation and assume that we can learn whether the outputs of two chains are equal or not. Let M and N be two messages of respective lengths ℓ_M and ℓ_N. Let C_{ℓM} denote the final value of the CBC chain computed for message M, and D_{ℓN} the final value of the chain computed for message N. Now form a message
M^{(T)} by adding a single block T, right at the end of the CBC chain for message M. Likewise, add a single block U at the end of N and form N^{(U)}. Writing down the equations of the two CBC chains, we find that the final values for messages M^{(T)} and N^{(U)} are respectively:

C^{(T)}_{ℓM+1} = E_K(C_{ℓM} ⊕ T),
D^{(U)}_{ℓN+1} = E_K(D_{ℓN} ⊕ U).

Given 2^{n/2} different messages M^{(T)} and 2^{n/2} different N^{(U)} along with their MACs, we find a MAC collision with high probability. Since E_K is a permutation, such a collision implies that:

C_{ℓM} ⊕ T = D_{ℓN} ⊕ U.

As a consequence, we learn the value of C_{ℓM} ⊕ D_{ℓN}, namely T ⊕ U. Note that by structuring our choices for T and U, we can find a collision with probability 1. One possible choice is to fix the n/2 high order bits of T and let the n/2 remaining bits of T cover the 2^{n/2} possible choices. Similarly, we fix the low order bits of U and let the high order bits cover the possible choices. Using the technique presented in this section requires the MAC computation of 2^{1+n/2} chosen messages, i.e., 2^33 chosen messages for 64-bit blocks.
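The structured choice of T and U can be checked exhaustively at toy size: with the high half of T fixed and the low half of U fixed, T ⊕ U ranges over every n-bit value, so exactly one pair collides and reveals C_M ⊕ D_N. A sketch with n = 8; the fixed half-values 0xA and 0x3, the shuffled table standing in for E_K, and the "secret" chain values are all arbitrary choices of ours.

```python
import random

n = 8
half = n // 2
T_list = [(0xA << half) | lo for lo in range(1 << half)]    # high bits fixed
U_list = [(hi << half) | 0x3 for hi in range(1 << half)]    # low bits fixed
# Every n-bit value occurs exactly once as some T xor U:
assert len({t ^ u for t in T_list for u in U_list}) == 1 << n

# Toy demonstration of learning C_M xor D_N from the guaranteed collision.
E = list(range(1 << n))
random.Random(1).shuffle(E)             # stand-in for E_K
c, d = 0x5C, 0x21                       # secret final chain values C_M, D_N
tags_T = {E[c ^ t]: t for t in T_list}  # tags of the messages M^(T)
learned = None
for u in U_list:
    if E[d ^ u] in tags_T:              # MAC collision: E_K(c^t) == E_K(d^u)
        learned = tags_T[E[d ^ u]] ^ u  # then c xor d == t xor u
        break
assert learned == c ^ d
```

Because E is a permutation, the tag collision occurs exactly when c ⊕ t = d ⊕ u, which is why the recovered value t ⊕ u equals the secret c ⊕ d.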
4.2 Multiple Exclusive-Or of Intermediate Values in CBC Chains
We now explain how to efficiently obtain a large number of exclusive-ors of intermediate values in CBC chains, following ideas initially proposed in [5]. This useful tool will be applied in section 6.2 to attack general MAC algorithms based on CBC chains. We first introduce a notation; for any message M formed with blocks M_1, M_2, ..., M_ℓ, we denote by Internal(M, i) the intermediate value of the CBC chain at position i:

Internal(M, i) = E_K(E_K(E_K(I(M_1) ⊕ M_2) ⊕ M_3) ... ⊕ M_i).

We consider a set of 2^{αn} unknown intermediate values X_j. For each such value, we build a set S_j of 2^{βn} MAC tags where X_j is the penultimate intermediate value. Formally, this means that X_j = Internal(M[j], i[j]) for a fixed message M[j] and an index i[j] smaller than the block length of M[j]. Then we choose 2^{βn} blocks T_k and query the MAC tags of the messages (M[j]_1, ..., M[j]_{i[j]}, T_k), for the 2^{βn} values of k, in order to build the set S_j. Next, we compare the values of the sets S_j. If the same MAC tag appears in both S_a and S_b, we learn the exclusive-or of X_a and X_b, exactly as in the previous section. The probability of obtaining X_a ⊕ X_b for fixed indexes a and b is about 2^{βn} × 2^{βn}/2^n. When both X_a ⊕ X_b and X_b ⊕ X_c are known, X_a ⊕ X_c can easily be deduced. In order to construct many exclusive-ors, we build a graph whose vertices are the 2^{αn} unknown values X_j and where an edge links two vertices with known
Fig. 3. Algorithm 4 from ISO/IEC 9797–1. [Figure: the first message block is doubly encrypted to initialize the chain, the CBC chain then runs under E_K, and the final chain value is encrypted under an additional key to produce MAC(M).]
exclusive-or obtained by collision of related sets S_j. When a path exists in this graph from X_a to X_b, X_a ⊕ X_b can be computed. We claim that this graph behaves like a random graph, and well known results [4, 8] on such graphs say that as soon as the number of edges is larger than the number of vertices, a "giant component" (with size linear in the total number of vertices) appears. More precisely, with s vertices and cs/2 randomly placed edges, with c > 1, there is a single giant component whose size is almost exactly (1 − t(c))s, where [4]

t(c) = (1/c) Σ_{k=1}^{∞} (k^{k−1}/k!) (ce^{−c})^k.

Since the number of edges is about 2^{(2α+2β−1)n}, we obtain that, if α + 2β is larger than 1, the exclusive-or of all pairs of intermediate values in the giant component can be learned. Moreover, with a number of edges larger than the number of vertices, the giant component covers with probability more than 79% of all vertices. Furthermore, the smallest path between most pairs of vertices in the giant component is logarithmic (i.e., linear in n). As a consequence, we can efficiently learn the exclusive-or of a fixed proportion, say one half, of the vertices in time O(n × 2^{αn}). In conclusion, we obtain the exclusive-or of O(2^{αn}) intermediate values asking O(2^{(α+β)n}) MAC tags, with α + 2β ≥ 1.
5 Advanced Attacks against Algorithm 4 from ISO/IEC 9797–1
Let us recall that in this MAC algorithm, the final value of the CBC chain is encrypted by a final application of the block cipher, with a specific key K2
(see figure 3). Coppersmith and Mitchell [6] have proposed the following attack against this algorithm when padding methods 1 or 2 are used. Notice that even if padding method 3 is preferred, variants based on multiple exclusive-or computations can be applied [5]. The attack goes as follows. Observe MAC tags of messages of at least two blocks until a collision occurs. Let us denote by M and N such messages, of respective lengths ℓ_M and ℓ_N, with common MAC tag m_coll. Since the block cipher E_{K2} is a permutation, using the notation of section 4.1, we obtain C_{ℓM} = D_{ℓN}. Consequently,

E_K(C_{ℓM−1} ⊕ M_{ℓM}) = E_K(D_{ℓN−1} ⊕ N_{ℓN})

and C_{ℓM−1} ⊕ M_{ℓM} = D_{ℓN−1} ⊕ N_{ℓN}, so

C_{ℓM−1} ⊕ D_{ℓN−1} = M_{ℓM} ⊕ N_{ℓN}.

Finally, query the MAC tags m_M and m_N of the two truncated messages M_1...M_{ℓM−1} and N_1...N_{ℓN−1}. Since m_M = E_{K2}(C_{ℓM−1}) and m_N = E_{K2}(D_{ℓN−1}), we obtain

E^{−1}_{K2}(m_M) ⊕ E^{−1}_{K2}(m_N) = M_{ℓM} ⊕ N_{ℓN}.
Then K2 is found by an exhaustive search. We expect that a single value will remain for K2 when the key size is no larger than the block size. Once K2 is known, we can recover K through a second exhaustive search using the test

E_K(E^{−1}_{K2}(m_M) ⊕ M_{ℓM}) = E^{−1}_{K2}(m_coll).
Finally, recovering K1 can be done with a final exhaustive search. This attack requires the observation of 2^{n/2} MAC values for known messages, the computation of two MAC values for chosen messages, and finally independent exhaustive searches on K2, K and K1. When using DES, we need 2^32 known messages and 2 chosen messages, followed by an exhaustive search about four times as expensive as a simple exhaustive search on a single DES key.
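At toy scale, the Coppersmith–Mitchell sequence (collision, truncated queries, exhaustive search on the final key) can be replayed end to end. The sketch below is our own model of algorithm 4 with 8-bit shuffled-table permutations and arbitrary key labels; it recovers only the final key, the first of the three searches.

```python
import random

N = 8

def toy_perm(key):
    t = list(range(1 << N))
    random.Random(key).shuffle(t)
    return t

def inverse(p):
    q = [0] * len(p)
    for x, y in enumerate(p):
        q[y] = x
    return q

K, K1, K2 = 5, 9, 13                    # secret keys (simulation only)
E, E1, E2 = toy_perm(K), toy_perm(K1), toy_perm(K2)

def mac4(blocks):                       # toy model of algorithm 4
    c = E1[E[blocks[0]]]                # initial transformation
    for m in blocks[1:]:
        c = E[c ^ m]
    return E2[c]                        # final encryption

# Step 1: collect tags of two-block messages until a collision occurs.
rng = random.Random(0)
seen, coll = {}, None
while coll is None:
    m = (rng.getrandbits(N), rng.getrandbits(N))
    t = mac4(list(m))
    if t in seen and seen[t] != m:
        coll = (seen[t], m)
    seen[t] = m
MA, MB = coll

# Step 2: query the two truncated messages, then search the final key using
#   E2^{-1}(m_M) xor E2^{-1}(m_N) == M_l xor N_l.
mM, mN = mac4([MA[0]]), mac4([MB[0]])
target = MA[1] ^ MB[1]
cands = []
for k in range(1 << N):
    inv = inverse(toy_perm(k))
    if inv[mM] ^ inv[mN] == target:
        cands.append(k)
assert K2 in cands
```

Because E2 and E are permutations, a tag collision between distinct messages forces distinct first blocks, so the target value M_ℓ ⊕ N_ℓ is nonzero and the test genuinely filters keys.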
6 New Attacks

6.1 Attacking Algorithm 5 from ISO/IEC 9797–1
Let us recall that in this MAC algorithm, each message goes through two independent plain CBC chains with keys K1 and K2 (see figure 4).¹ The MAC tag is the exclusive-or of the final values of the two chains. We use a variation on the technique from subsection 4.1 to attack this algorithm. The first step of the attack is to find a collision between the MAC tags of two (short, i.e., one block) messages M and N. Let m denote the common MAC tag of M and N. Moreover, let E_{K1}(M), E_{K2}(M), E_{K1}(N) and E_{K2}(N) denote the final values of the 4 CBC chains involved. We have

m = E_{K1}(M) ⊕ E_{K2}(M) = E_{K1}(N) ⊕ E_{K2}(N).

¹ In algorithm 5 from ISO/IEC 9797–1, the keys K1 and K2 are derived from a single key K.
Fig. 4. Algorithm 5 from ISO/IEC 9797–1. [Figure: the same message blocks M_1, ..., M_ℓ are processed by two parallel plain CBC chains, one under E_{K1} and one under E_{K2}; the two final chain values are XORed to give MAC(M).]
We would like to learn the value

δ = E_{K1}(M) ⊕ E_{K1}(N) = E_{K2}(M) ⊕ E_{K2}(N).

This can be done by computing the MAC values of two long lists of messages M^{(T)}, formed by adding a single block T to M, and N^{(U)}, formed by adding a block U to N. It is easy to check that whenever T ⊕ U = δ, we get a collision on both of the CBC chains and, of course, a collision on the MAC values of the extended messages. Moreover, this kind of double collision can be distinguished from "ordinary" collisions. Indeed, if we add any block, say the zero block, at the end of both M^{(T)} and N^{(U)}, the resulting messages still collide. Once δ, M^{(T)} and N^{(U)} are known, we can proceed either with a forgery attack or a key recovery attack. It should be noted that for this particular algorithm, no efficient forgery attack was previously known.

Forgery attack. With a double collision between M^{(T)} and N^{(U)} in hand, making a forgery is easy. Indeed, for any message completion L, the MAC value of M^{(T)} concatenated with L and the MAC value of N^{(U)} concatenated with L are necessarily equal: the double collision propagates along L. Thus, it is easy to ask for the MAC tag of one of the two extended messages and to guess that the MAC tag of the other extended message has the same value. The cost of the forgery attack is independent of the size of the keys of E; it is only a function of the block size n. The attack requires the computation of 2^{1+n/2} MAC values.

Key recovery attack. Since we know the value of δ, we know the two values E_{K1}(M) ⊕ E_{K1}(N) and E_{K2}(M) ⊕ E_{K2}(N) (both are equal to δ). Thus, we get two independent conditions respectively on K1 and K2, and the keys can be
recovered with a simple exhaustive search. Indeed, assume that M and N are different one-block messages; then search for a key K that satisfies:

E_K(M_1) ⊕ E_K(N_1) = δ.

When the key size is no larger than the block size, we expect to find two different solutions during the exhaustive search, K1 and K2. Furthermore, if there are more than two candidate solutions, we simply form all possible candidate pairs and keep the pair that is compatible with all the MAC values we already know. The key recovery attack requires the observation of 2^{n/2} MAC tags followed by the computation of 2^{1+n/2} MAC values. When using DES, we need 2^33 messages followed by an exhaustive search roughly equivalent to a simple exhaustive search on a single DES key.
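The attack of this subsection can be replayed at toy scale: find a one-block collision, recover δ via a double collision confirmed by extra extension blocks, then search for the key pair. The sketch below is our own 8-bit model with shuffled-table permutations; all key values and block constants are arbitrary.

```python
import random

N = 8

def toy_perm(key):
    t = list(range(1 << N))
    random.Random(key).shuffle(t)
    return t

K1, K2 = 5, 9                           # secret keys (simulation only)
E1, E2 = toy_perm(K1), toy_perm(K2)

def mac5(blocks):                       # toy model of algorithm 5
    c, d = E1[blocks[0]], E2[blocks[0]]
    for m in blocks[1:]:
        c, d = E1[c ^ m], E2[d ^ m]
    return c ^ d

# Step 1: a colliding pair of one-block messages.
seen, coll = {}, None
for m in range(1 << N):
    t = mac5([m])
    if t in seen:
        coll = (seen[t], m)
        break
    seen[t] = m
M, Nm = coll

# Step 2: recover delta. A double collision also survives appending further
# blocks, which filters out "ordinary" single-collision matches.
probe = {(mac5([Nm, u]), mac5([Nm, u, 0]), mac5([Nm, u, 1])): u
         for u in range(1 << N)}
delta = None
for t in range(1 << N):
    key = (mac5([M, t]), mac5([M, t, 0]), mac5([M, t, 1]))
    if key in probe:
        delta = t ^ probe[key]
        break
assert delta == E1[M] ^ E1[Nm]          # == E2[M] ^ E2[Nm] as well

# Step 3: exhaustive search for keys k with E_k(M) xor E_k(N) == delta.
cands = [k for k in range(1 << N)
         if toy_perm(k)[M] ^ toy_perm(k)[Nm] == delta]
assert K1 in cands and K2 in cands
```

The triple of tags per extension block plays the role of the "add a zero block" distinguisher from the text; at full size, one extra block already makes a false double collision overwhelmingly unlikely.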
6.2 General MAC Algorithms with a Single CBC Chain
In this subsection, we consider key recovery attacks against general MAC algorithms based on a single CBC chain with a key K. We let I and F denote the initial and final transforms, as in section 2. Our goal here is to recover the key K of the main CBC chain as efficiently as possible. We assume that I and F are both keyed transformations which cannot be computed by the attacker, since it (at first) does not know any key material. We first address the special case where I and F are closely related transformations (with identical keys) before considering the general case.

The special case I = F ∘ E_K. For example, a natural MAC scheme could apply the same triple-DES transformation both at the beginning and at the end of the MAC computation. With our notations, this means that I = E_K ∘ D_{K1} ∘ E_K and F = E_K ∘ D_{K1}. None of the previously described attacks apply to such a scheme. Let us consider 2^{αn} blocks M_i and the associated internal values X_i = Internal(M_i, 1) = I(M_i). In order to learn I(M_i), we first remark that ∆_I(M_i) = I(M_i) ⊕ I(M_i ⊕ 1) can be seen as an identifier for M_i. Of course, this identifier is somewhat ambiguous, since ∆_I(M_i) and ∆_I(M_i ⊕ 1) are identical. However, any given value of the identifier has only a few associated values (unless I is almost linear, in which case simpler attacks are available). With this in mind, we can apply the technique of subsection 4.2 to compute ∆_I(M_i) for O(2^{αn}) blocks M_i, asking for 2^{(α+β)n} MAC tags (with α + 2β larger than 1). Then, we ask for the MAC tags of pairs of messages whose only difference is that the last blocks are respectively T_j and T_j ⊕ 1. We denote by Y_j the intermediate value after the exclusive-or with T_j, i.e., the input of the final transformation F ∘ E_K = I. Consequently, when the last block is T_j ⊕ 1, the input of I is Y_j ⊕ 1. So, the exclusive-or of the queried MAC tags reveals ∆_I(Y_j).
Antoine Joux, Guillaume Poupard, and Jacques Stern
Finally, if we compute 2^((1−α)n) values ∆I(Yj), we obtain, with high probability, a collision with one of the ∆I(Mi). As we already explained, ∆I is a good identifier, so we probably have Mi = Yj. However, there might be false alarms, i.e., apparent collisions not resulting from real ones. False alarms are easy to detect by computing a few additional MAC values. Given a real collision, we know that I(Mi) is equal to the MAC tag of the message whose last intermediate value is Yj. Thus, since we have learned the initial value of the CBC chain, we can compute K through exhaustive search, as we previously explained. This attack requires a total of approximately 2^((α+β)n) + 2^((1−α)n) MAC computations for chosen messages, with α + 2β ≥ 1. The best compromise is obtained with α = β = 1/3, and the number of queried MAC tags is then about 2^(2n/3). When using DES, we need 2^43 messages followed by an exhaustive search roughly equivalent to a simple exhaustive search on a single DES key.

The general case with arbitrary I and F. We finally explain that, even if I and F are arbitrary transformations, the internal key K can still be attacked, although the complexity is less practical than in the previous cases. Again using the technique of section 4.2, we can consider 2^(αn) intermediate values Xi which are unknown but whose pairwise exclusive-ors are known. This requires the query of 2^((α+1)n/2) MAC tags of chosen messages. Then, for each intermediate value Xi, the technique of section 4.1 allows us to compute ∆i = EK(Xi) ⊕ EK(Xi ⊕ 1), asking 2^(n/2) MAC computations for each Xi. With this list of ∆i in mind, we now guess a key K′ and a block X, and we compute ∆ = EK′(X) ⊕ EK′(X ⊕ 1). If K′ = K and X is one of the Xi's, then ∆ is in the list of the ∆i's. We do not know the related Xi value, but we know δ = Xi ⊕ Xj for any other j. Consequently, we obtain the following test: EK′(X ⊕ δ) ⊕ EK′(X ⊕ δ ⊕ 1) = ∆j. This allows us to know whether we have really guessed the correct key K or whether it is only a false alarm.
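The guess-and-filter step can be illustrated at toy scale. The 16-bit Feistel cipher below is an invented stand-in for DES (so n = k = 16), and the attacker's search space for X is artificially restricted to a handful of values to keep the run fast; in the real attack both K′ and X range over their full domains.

```python
# Toy sketch of the guess-and-filter step (hypothetical 16-bit Feistel
# cipher standing in for DES; not the paper's actual parameters).
def F(x, k):
    v = (x ^ k) & 0xFF
    v = (v * 167 + 13) & 0xFF
    return v ^ (v >> 4)

def enc(key, block):  # 4-round Feistel on a 16-bit block, 16-bit key
    l, r = block >> 8, block & 0xFF
    for i in range(4):
        rk = ((key >> (4 * i)) ^ (key * (i + 1))) & 0xFF
        l, r = r, l ^ F(r, rk)
    return (l << 8) | r

K = 0x1337                                   # the secret CBC key
Xs = [0x0123, 0x4567, 0x89AB, 0xCDEF]        # unknown internal values X_i
deltas = [enc(K, x) ^ enc(K, x ^ 1) for x in Xs]   # learned from MAC queries

# Guess (K', X); keep pairs whose identifier Delta hits the list of Delta_i.
trial_Xs = [0x0123, 0x1111, 0x2222]          # artificially small X-space
cands = [(Kp, X) for Kp in range(2 ** 16) for X in trial_Xs
         if enc(Kp, X) ^ enc(Kp, X ^ 1) in deltas]

# Filter false alarms with the shifted test, using delta = X_0 xor X_1.
d = Xs[0] ^ Xs[1]
good = [(Kp, X) for (Kp, X) in cands
        if enc(Kp, X ^ d) ^ enc(Kp, X ^ d ^ 1) == deltas[1]]
assert (K, Xs[0]) in good                    # the true guess always survives
```

The correct pair always passes both tests, while a wrong pair that survives the first membership test by accident is almost always eliminated by the shifted test.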
The probability of correctly guessing K′ = K with X one of the Xi's is about 1/(2^k × 2^((1−α)n)), where k is the key size of K. The total number of MAC queries is 2^((α+1)n/2) + 2^(n/2), and the complexity of the search on K′ and X is O(2^(k+(1−α)n)). Depending on the choice of α, we obtain different compromises. The main ones are:

α parameter    number of MAC queries    search complexity
α = 0          O(2^(n/2+1))             O(2^(k+n))
α = 1/2        O(2^(3n/4))              O(2^(k+n/2))
α = 1          O(2^n)                   O(2^k)
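The three compromises listed above can be checked numerically from the query and search formulas (DES-like sizes n = 64, k = 56 are assumed for illustration):

```python
from math import log2

n, k = 64, 56  # block and key size of a DES-like cipher (illustrative)

def tradeoff(alpha):
    """Return (log2 of query count, log2 of search complexity) where
    queries = 2^((alpha+1)n/2) + 2^(n/2) and search = 2^(k + (1-alpha)n)."""
    queries = 2 ** ((alpha + 1) * n / 2) + 2 ** (n / 2)
    return log2(queries), k + (1 - alpha) * n

assert round(tradeoff(0.0)[0]) == n / 2 + 1      # O(2^(n/2+1)) queries
assert tradeoff(0.0)[1] == k + n                 # O(2^(k+n)) search
assert round(tradeoff(0.5)[0]) == 3 * n / 4      # O(2^(3n/4)) queries
assert tradeoff(0.5)[1] == k + n / 2             # O(2^(k+n/2)) search
assert round(tradeoff(1.0)[0]) == n              # O(2^n) queries
assert tradeoff(1.0)[1] == k                     # O(2^k) search
```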
7
Conclusion
The main conclusion of this paper is that the use of MAC algorithms based on an internal CBC chain using a weak block cipher such as DES must be
carefully reconsidered, whatever the initial and final transformations may be. A much more secure approach is to use a strong block cipher such as AES with a provably secure MAC algorithm.
Acknowledgments We would like to thank the anonymous referees for pointing out important references.
Analysis of RMAC

Lars R. Knudsen (1) and Tadayoshi Kohno (2)

(1) Department of Mathematics, Technical University of Denmark
[email protected]
(2) Department of Computer Science and Engineering, University of California at San Diego
[email protected]
Abstract. In this paper the newly proposed RMAC system is analysed. The scheme allows an attacker (mounting a traditional MAC attack) some control over one of the two keys of the underlying block cipher, which makes it possible to mount several related-key attacks on RMAC. First, an efficient attack on RMAC when used with triple-DES is presented, which also relies on other details of the proposed draft standard. Second, a generic attack on RMAC is presented which can be used to find one of the two keys in the system faster than by exhaustive search. Third, related-key attacks on RMAC in a multi-user setting are presented. In addition to beating the claimed security bounds in NIST's RMAC proposal, this work suggests that, as a general principle, one may wish to avoid designing modes of operation that use related keys.
1
Introduction
RMAC [6, 2] is an authentication system based on a block cipher. The block cipher algorithms currently approved for use in RMAC are AES and triple-DES. RMAC is based on a block cipher with b-bit blocks and k-bit keys. It takes as input a message D of an arbitrary number of bits, two keys K1, K2 of k bits each, and a salt R of r bits, where r ≤ k. It produces an m-bit MAC value, where m ≤ b. The method is as follows (see also Figure 1). First, pad D with a 1 bit followed by enough 0 bits to ensure that the length of the resulting string is a multiple of b. Encrypt the padded string using the block cipher in CBC mode under the key K1. The last ciphertext block is then encrypted with the key K3 = K2 + R, where '+' is addition modulo 2. The resulting ciphertext is truncated to m bits to form the MAC. The two keys K1, K2 may be generated from one k-bit key in a standard way [6]. There are five parameter sets in [6] for each of the two block sizes:

Parameter set    b = 128 (r, m)    b = 64 (r, m)
I                (0, 32)           (0, 32)
II               (0, 64)           (64, 64)
III              (16, 80)          n/a
IV               (64, 96)          n/a
V                (128, 128)        n/a

T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 182–191, 2003.
© International Association for Cryptologic Research 2003
Fig. 1. The RMAC algorithm with keys K1, K2 on input a padded string D1 D2 · · · Dn and salt R. E is the underlying block cipher and ‘+’ denotes addition modulo 2. The resulting MAC is M . We assume for illustrative purposes m = b and k = r.
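The construction can be summarized with a small sketch over a toy 16-bit block cipher (an invented Feistel stand-in, with b = k = r = m = 16 and no truncation); only the shape of the computation, not its security, is meaningful here.

```python
# Minimal RMAC sketch over a hypothetical 16-bit cipher (not AES/triple-DES).
def F(x, k):  # toy round function (invented)
    v = (x ^ k) & 0xFF
    v = (v * 167 + 13) & 0xFF
    return v ^ (v >> 4)

def enc(key, block):  # 4-round Feistel on a 16-bit block
    l, r = block >> 8, block & 0xFF
    for i in range(4):
        rk = ((key >> (4 * i)) ^ (key * (i + 1))) & 0xFF
        l, r = r, l ^ F(r, rk)
    return (l << 8) | r

def pad(bits):
    """Append a 1 bit, then 0 bits, up to a multiple of b = 16 bits."""
    s = bits + '1'
    s += '0' * (-len(s) % 16)
    return [int(s[i:i + 16], 2) for i in range(0, len(s), 16)]

def rmac(k1, k2, bits, salt):
    c = 0
    for d in pad(bits):              # CBC chain under K1
        c = enc(k1, c ^ d)
    return enc(k2 ^ salt, c)         # final encryption under K3 = K2 + R

tag = rmac(0x1A2B, 0x3C4D, '1100111', 0x0042)
assert tag == rmac(0x1A2B, 0x3C4D, '1100111', 0x0042)   # deterministic
assert 0 <= tag < 2 ** 16
```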
In Appendix A of [6] it is noted that for RMAC with two independent keys K1 and K2, an exhaustive search for the keys is expected to require the generation of 2^(2k−1) MACs, where k is the size of one key. However, for the cases with m = b this can be done much faster under a chosen-message attack with just one known message and one chosen message. Independently of how the two keys are generated, an exhaustive search for the key K2 requires only an expected number of 2^k decryptions of the block cipher [5]. Given a message D and its MAC under the salt R, request the MAC of D again. With high probability this MAC is computed with a salt R′ such that R′ ≠ R. For these two MACs, the values just before the final encryption will be equal, and K2 can be found after about 2^k decryption operations. Subsequently, K1 can be found in roughly the same time. The rest of this paper is organized as follows. In §2 an attack on RMAC used with three-key triple-DES is presented. The attack finds all three DES keys in time roughly three times that of an exhaustive search for a DES key, using only a few MACs. §3 presents an attack on RMAC used with any block cipher. The attack finds one of the two keys in the system faster than exhaustive search. In §4 we present a construction-level related-key attack against RMAC, and in §5 we describe some ways to exploit the related-key attack of §4 when attacking multiple users.
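The 2^k search for K2 sketched above can be simulated at toy scale. The 16-bit Feistel cipher below is an invented stand-in (k = b = m = 16, no truncation), and y stands for the CBC output of the fixed message, which is the same in both MAC computations.

```python
# Toy simulation of the K2 recovery from two MACs of the same message.
def F(x, k):  # invented toy round function
    v = (x ^ k) & 0xFF
    v = (v * 167 + 13) & 0xFF
    return v ^ (v >> 4)

def enc(key, block):  # 4-round Feistel on a 16-bit block
    l, r = block >> 8, block & 0xFF
    for i in range(4):
        rk = ((key >> (4 * i)) ^ (key * (i + 1))) & 0xFF
        l, r = r, l ^ F(r, rk)
    return (l << 8) | r

def dec(key, block):  # inverse of enc
    l, r = block >> 8, block & 0xFF
    for i in reversed(range(4)):
        rk = ((key >> (4 * i)) ^ (key * (i + 1))) & 0xFF
        l, r = r ^ F(l, rk), l
    return (l << 8) | r

k2 = 0xCAFE                # the secret second key
y = 0x7777                 # CBC output of the fixed message (same both times)
R1, R2 = 0x0011, 0x0A0A    # two different salts seen by the attacker
M1, M2 = enc(k2 ^ R1, y), enc(k2 ^ R2, y)

# For the correct guess g = K2 both tags decrypt to the same value y.
cands = [g for g in range(2 ** 16)
         if dec(g ^ R1, M1) == dec(g ^ R2, M2)]
assert k2 in cands         # the real key always passes; ~1 false alarm expected
```

Any surviving false candidates can be filtered with a third MAC of the same message under yet another salt.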
2
Attack on RMAC with Three-Key Triple DES
One of the block cipher algorithms approved for use in RMAC is triple-DES with 168-bit keys. Consider RMAC with parameter set II, that is, with 64-bit MACs and a 64-bit salt. The key for the final encryption is then K3 = K2 + (R‖0^104). However, it is not specified in [6] how the three DES keys are
derived from K3. Assume that the first DES key is taken as the rightmost 56 bits of K2 + (R‖0^104), the second DES key as the middle 56 bits, and the third DES key as the leftmost 56 bits. Assume an attacker is given two MACs of the same message D, computed using two different salt values R and R′. Assume further that the rightmost eight bits of R and R′ are equal. Then the final encryption of the (identical) last block for the two MACs is done using triple-DES, where for one MAC the key used is (a, b, c) and for the other (a, b, c ⊕ d). Since the attacker knows d, he can decrypt through a single DES operation, find c in 2^56 operations, and thereby derive one of the three DES keys [3]. This attack has a success probability of 2^(−8). If the attack fails, it is repeated for other values of D, R, and/or R′. After the third DES key has been found, it is possible to find the second DES key with similar complexity. Note that eight bits of the salt affect the second DES key. Request the MAC of a message D2 using two different values of the salt, and decrypt through the final DES component with the third DES key. With probability 1 − 2^(−8), the two second DES keys in the final encryption will be different as a result of the different salt values. Since the salts are known to the attacker, one finds the second DES key in about 2^56 operations. Subsequently, the final DES key can be found using 2^56 MAC verifications [4], as follows. Assume one is given the MACs M1 and M2 of two different messages D1 and D2, each consisting of an arbitrary number of bits. Let P1 and P2 be the padding bits used in the respective MAC computations. Request the MAC, M3, of the message D1‖P1‖E, where E is a one-block message. Let x1, x2 and x3 be the values just before the final triple-DES encryptions in the computations of M1, M2 and M3. Given the value of the final single-DES key of K2, one can also compute the MAC of the message D2‖P2‖(E ⊕ x1 ⊕ x2).
Note that the value just before the final triple-DES encryption in this case is x3. Also note that the attacker has full control over the key bits which are modified using the (random) salts. Therefore this last part of the attack works regardless of how the salts are chosen, as long as the attacker knows them. In total, with two known and one chosen MAC, one finds the third DES key of K2 using 2^56 MAC verifications, or alternatively using 2^56 chosen messages.
3
A Generic Attack
In this section we present an attack on the RMAC system with parameter set II for b = 64 and on RMAC with parameter set V for b = 128. The attack finds the value of K2, after which RMAC reduces to a simple CBC-MAC, for which it is well known that simple forgeries can be found. In the following, let dK(x) denote the decryption of x under the key K for the underlying block cipher. The attack is based on multiple collisions.

Definition 1. A t-collision for a MAC is a set of t messages all producing the same MAC value.

We shall make use of the following lemma, which is easily proved.

Lemma 1. Let A, B, and C be boolean variables. Then

A ⇒ B ⇔ not(B) ⇒ not(A), and
A ⇒ (B AND C) ⇔ (not(B) OR not(C)) ⇒ not(A).

Let D be some message (with an arbitrary number of blocks). Then the MAC of D, MACK1,K2(D, R), is the last block of the CBC encryption using K1, encrypted once again under the key K2 + R, where R is the salt. The attack goes as follows. Request the MACs of D for s different values of the salt R. Assume that the attacker finds a t-collision, where the salts are R0, . . . , Rt−1, and denote the common MAC value by M. For simplicity, denote K2 + R0 by K, and K2 + Ri by K + ai−1 for i = 1, . . . , t − 1. The attacker guesses a key value L and computes the decryptions of the MAC value M using the keys L, L + a0, . . . , L + at−2. It then holds for i = 0, . . . , t − 2 that if L = K or L = K + ai, then dL(M) = dL+ai(M). Using Lemma 1, one gets that if dL(M) ≠ dL+ai(M), then L ≠ K and L ≠ K + ai, for 0 ≤ i ≤ t − 2. Similarly, if dL+ai(M) ≠ dL+aj(M), then L ≠ K + ai + aj, for 0 ≤ i ≠ j ≤ t − 2. In this way an exhaustive search for K2 can be made faster than brute force. In some rare cases one gets equal values in the inequality tests. As an example, if dL(M) = dL+ai(M) for some i, then one needs to check whether dL(M) = dL+a0(M) = dL+a1(M) = . . ., after which all false alarms are expected to be detected. The expected number of false alarms is t + (t−1 choose 2). Let us show the case of a 3-collision in more detail. Assume that the random numbers, i.e. the salts used, are R0, R1, and R2 (which are known to the attacker). Since the message is the same for all the MACs and since the MACs are equal, say M, one knows that the keys K2 + R0, K2 + R1, and K2 + R2 all decrypt M to the same (unknown) value z; thus dK(M) = dK+a0(M) = dK+a1(M), where K = K2 + R0, a0 = R0 + R1 and a1 = R0 + R2. The following implications are immediate.
L = K
⇒ dL(M) = dL+a0(M) AND dL+a0(M) = dL+a1(M)
L = K + a0 ⇒ dL+a0(M) = dL(M) AND dL(M) = dL+a0+a1(M)
L = K + a1 ⇒ dL+a1(M) = dL+a0+a1(M) AND dL+a1(M) = dL(M)
L = K + a0 + a1 ⇒ dL+a0+a1(M) = dL+a1(M) AND dL+a1(M) = dL+a0(M)

Lemma 1 enables us to rewrite the above implications as follows.

dL(M) ≠ dL+a0(M) ⇒ L ≠ K
dL+a0(M) ≠ dL(M) ⇒ L ≠ K + a0
dL+a1(M) ≠ dL(M) ⇒ L ≠ K + a1
dL+a1(M) ≠ dL+a0(M) ⇒ L ≠ K + a0 + a1
Table 1.

t     u = t + (t−1 choose 2)    u/t
3       4                       1.3
4       7                       1.8
5      11                       2.2
6      16                       2.7
7      22                       3.1
8      29                       3.6
9      37                       4.1
10     46                       4.6
17    136                       8.0
Take (guess) a key value L and compute dL(M), dL+a0(M), and dL+a1(M). If dL(M) ≠ dL+a0(M), then L ≠ K and L ≠ K + a0; if dL+a0(M) ≠ dL+a1(M), then L ≠ K + a0 + a1; and if dL(M) ≠ dL+a1(M), then L ≠ K + a1. Summing up, with a 3-collision (provided a0 and a1 are different) one can check the values of four keys from three decryption operations. Let us next assume that there is a 4-collision. Let the four keys in the 4-collision be K, K + a0, K + a1, K + a2. Then from the results of dL(M), dL+a0(M), dL+a1(M), and dL+a2(M), one can check the validity of four keys. Moreover, by arguments similar to the case of a 3-collision, from the four decryptions one can check the values of all keys of the form K + ai + aj, where 0 ≤ i ≠ j ≤ 2. Thus from four decryption operations one can check 4 + (3 choose 2) = 7 keys. This generalizes to the following result. With a t-collision one can check the values of u = t + (t−1 choose 2) keys from t decryption operations. Table 1 lists values of t, u and u/t. It should be clear that t-collisions can be used to reduce a search for the key K2; one question is by how much. How many values of L need to be tested before the sets of keys {L, L + a0, . . . , L + at−2, L + a0 + a1, . . . , L + at−3 + at−2} cover the entire key space? Consider the case t = 3. One can assume a0 ≠ a1 (otherwise there is no collision), and with high probability there are two bit positions where a0 and a1 differ. Without loss of generality, assume that these are the two most significant bits and that these bits are "01" for a0 and "10" for a1. Then a strategy is the following: let L run through all keys whose two most significant bits are "00". Then clearly the sets {L, L + a0, L + a1, L + a0 + a1} cover the entire key space, and an exhaustive search for K2 is reduced by a factor of 4/3, since in the attack one can check the value of four keys at the cost of three decryptions.
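The 3-collision bookkeeping can be exercised end to end at toy scale: find a 3-collision in the final encryption over the salt space, then check that three decryptions per guess rule out four candidate keys while the true key always survives. The 16-bit Feistel cipher is an invented stand-in, and y plays the role of the fixed CBC output entering the final encryption.

```python
from collections import defaultdict

def F(x, k):  # invented toy round function
    v = (x ^ k) & 0xFF
    v = (v * 167 + 13) & 0xFF
    return v ^ (v >> 4)

def enc(key, block):  # 4-round Feistel, 16-bit block
    l, r = block >> 8, block & 0xFF
    for i in range(4):
        rk = ((key >> (4 * i)) ^ (key * (i + 1))) & 0xFF
        l, r = r, l ^ F(r, rk)
    return (l << 8) | r

def dec(key, block):  # inverse of enc
    l, r = block >> 8, block & 0xFF
    for i in reversed(range(4)):
        rk = ((key >> (4 * i)) ^ (key * (i + 1))) & 0xFF
        l, r = r ^ F(l, rk), l
    return (l << 8) | r

k2, y = 0x5A5A, 0x1234
mac = lambda R: enc(k2 ^ R, y)       # RMAC's final step for a fixed message

buckets, three = defaultdict(list), None
for R in range(2 ** 16):             # query salts until a 3-collision appears
    buckets[mac(R)].append(R)
    if len(buckets[mac(R)]) == 3:
        three = buckets[mac(R)]
        break
assert three is not None
R0, R1, R2 = three
M = mac(R0)
K, a0, a1 = k2 ^ R0, R0 ^ R1, R0 ^ R2

def surviving(L):
    """Candidates for K among {L, L+a0, L+a1, L+a0+a1} left after the
    three decryptions dL(M), dL+a0(M), dL+a1(M)."""
    dL, d0, d1 = dec(L, M), dec(L ^ a0, M), dec(L ^ a1, M)
    keep = {L, L ^ a0, L ^ a1, L ^ a0 ^ a1}
    if dL != d0:
        keep -= {L, L ^ a0}          # L != K and L != K + a0
    if dL != d1:
        keep -= {L ^ a1}             # L != K + a1
    if d0 != d1:
        keep -= {L ^ a0 ^ a1}        # L != K + a0 + a1
    return keep

for L in (K, K ^ a0, K ^ a1, K ^ a0 ^ a1):
    assert K in surviving(L)         # the true key is never eliminated
```

For a typical wrong guess L, all three inequality tests fire and all four candidates are eliminated, matching the four-keys-per-three-decryptions accounting in the text.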
Consider the case t = 4. With high probability the b-bit vectors a0, a1, and a2 are pairwise different. Also, with high probability there are three bit positions in which a0, a1, and a2 are linearly independent (viewed as three-bit vectors). Without loss of generality, assume that these are the three most significant bits and that they are "001" for a0, "010" for a1 and "100" for a2. Then a strategy is the following: let L run through all keys whose three most significant bits are "000". Then clearly the sets {L, L + a0, L + a1, L + a2, L + a0 + a1, L + a0 + a2, L + a1 + a2} cover 7/8 of the key space. Next fix the most significant three bits of L to "111", find other bit positions where a0, a1, and a2 differ, and repeat the strategy. Thus, in the first phase of the attack one chooses 2^(b−3) values of L, does 4 × 2^(b−3) = 2^(b−1) decryptions, and can check 7 × 2^(b−3) keys. In the next phase of the attack one chooses 2^(b−6) values of L, does 4 × 2^(b−6) = 2^(b−4) decryptions, and can check 7 × 2^(b−6) keys. At this point, a total of 7 × 2^(b−3) + 7 × 2^(b−6) = 2^b − 2^(b−6) keys have been checked at the cost of about 2^(b−1) + 2^(b−4) decryptions. In total, an exhaustive search for K2 is reduced by a factor of almost two. For higher values of t the attacker's strategy becomes more complex. We claim that with high probability (for "good" values of the ai) the factor saved in an exhaustive search for the key is close to the value of u/t (see Table 1). The following result shows the complexity of finding t-collisions [7]. Lemma 2. Consider a set of s randomly chosen b-bit values. With s = c · 2^((t−1)b/t) one expects to get one t-collision, where c ≈ (t!)^(1/t). If it is assumed, for a fixed message D and a (randomly chosen) salt R, that the resulting MAC is a random m-bit value, one can apply the lemma to estimate the number of texts needed to find a t-collision. Consider a few examples. With s = 2^((b+1)/2) one expects to get one pair of colliding MACs, that is, one (2-)collision.
With s = 1.8 · 2^(2b/3) one expects to get a 3-collision, that is, three MACs with equal values (6^(1/3) ≈ 1.8). With s = 2.2 · 2^(3b/4) one expects to get one 4-collision (24^(1/4) ≈ 2.2). From Stirling's formula n! = √(2πn) (n/e)^n (1 + Θ(1/n)), one gets that (t!)^(1/t) ≈ t/e for large t. Thus, with s = (t/e) · 2^((t−1)b/t) one expects to get a t-collision. Table 2 lists the complexities of finding t-collisions depending on the block size b. There are many variants of this attack, depending on how many chosen texts the attacker has access to. Table 3 lists the complexities of some instantiations of the attack, where for triple-DES the number of chosen texts has been kept below 2^64 (since the salt can be at most 64 bits) and for AES the time complexity and the number of chosen texts needed have been made comparable. In both cases an exhaustive search for the key has been reduced by a factor of eight, so the correct value of the key can be expected after trying half of that number of values. As a final remark, note that the message D in the attack need not be chosen nor known by the attacker. Therefore one can argue that this attack is stronger than a traditional "chosen-text" attack.
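The constant in Lemma 2 and the entries of Table 2 can be checked numerically (a small sanity script, not part of the paper):

```python
from math import e, factorial, log2

def c(t):                         # c = (t!)^(1/t) from Lemma 2
    return factorial(t) ** (1 / t)

assert abs(c(3) - 1.8) < 0.02     # 6^(1/3)  ~ 1.8
assert abs(c(4) - 2.2) < 0.02     # 24^(1/4) ~ 2.2
assert abs(c(50) / (50 / e) - 1) < 0.08   # Stirling: (t!)^(1/t) ~ t/e

def log2_texts(t, b):             # log2 of s = c(t) * 2^((t-1)b/t)
    return log2(c(t)) + (t - 1) * b / t

# Reproduces Table 2 up to rounding:
assert round(log2_texts(3, 64)) == 44
assert round(log2_texts(4, 64)) == 49
assert round(log2_texts(17, 64)) == 63
assert round(log2_texts(3, 128)) == 86
```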
Table 2. The estimated number of texts needed to find a t-collision.

t     b = 64    b = 128
3     2^44      2^86
4     2^49      2^97
5     2^53      2^104
6     2^55      2^108
7     2^57      2^112
8     2^58      2^114
9     2^59      2^116
10    2^60      2^118
17    2^63      2^123
Table 3. Expected running times and chosen texts of attacks finding K2 of RMAC.

Algorithm    k      b      Parameter set    t     Expected running time    # chosen texts
3-DES        112    64     II               12    2^108                    2^63
AES          128    128    V                20    2^124                    2^123

4
Construction-Level Related-Key Attacks
Another consequence of adding the salt to K2 is that it exposes the RMAC system to a construction-level related-key attack. Consider the RMAC system with parameter set II for b = 64 and RMAC with parameter set III, IV, or V for b = 128. Let K1, K2 and K1, K2′ be two pairs of RMAC keys that are related by the difference K2 + K2′ = X‖0^(k−r) for some r-bit string X. If D is some message, then MACK1,K2(D, R) = MACK1,K2′(D, R + X) with probability 1. An attacker can use this property to, for example, take a message MACed by one user (with keys K1, K2), change the salt by adding X, and then trick the second user (with the related keys K1, K2′) into accepting the new MAC–salt pair as an authenticator for D.
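Since the salt enters K2 by XOR, the property is exact, as the following toy check shows (same invented 16-bit RMAC sketch as used earlier; k = r, so X is a full-width difference):

```python
# Construction-level related-key property, demonstrated on a toy RMAC
# (invented 16-bit Feistel cipher; b = k = r = m = 16).
def F(x, k):
    v = (x ^ k) & 0xFF
    v = (v * 167 + 13) & 0xFF
    return v ^ (v >> 4)

def enc(key, block):
    l, r = block >> 8, block & 0xFF
    for i in range(4):
        rk = ((key >> (4 * i)) ^ (key * (i + 1))) & 0xFF
        l, r = r, l ^ F(r, rk)
    return (l << 8) | r

def pad(bits):
    s = bits + '1'
    s += '0' * (-len(s) % 16)
    return [int(s[i:i + 16], 2) for i in range(0, len(s), 16)]

def rmac(k1, k2, bits, salt):
    c = 0
    for d in pad(bits):
        c = enc(k1, c ^ d)
    return enc(k2 ^ salt, c)

k1, k2, X = 0x1111, 0xD00D, 0x00F0
k2p = k2 ^ X                     # related second key: K2' = K2 + X
for salt in (0x0000, 0x1234, 0xBEEF):
    # shifting the salt by X exactly compensates the key difference
    assert rmac(k1, k2, '10110', salt) == rmac(k1, k2p, '10110', salt ^ X)
```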
5
Key-Collision Attacks
Even if an attacker cannot control or does not (a priori) know the difference between multiple users' keys, an attacker can still exploit the related-key attack in §4. Consider RMAC with parameter set II for b = 64 and parameter set V for b = 128. Assume k = r (if r < k, then treat the bits of K2 not affected by the salt as part of K1). Let us start by assuming that we have two users who share the first key K1 but whose second keys K2 and K2′ have some unknown relationship. To mount the construction-level related-key attack from §4, the attacker must first learn the relationship between K2 and K2′. One way to learn this difference would be to first force each user to MAC some fixed message 2^(k/2) times. Let Ri be the i-th salt used by the first user and let Mi be the i-th MAC. Let Ri′ be the i-th salt used by the second user and let Mi′ be the i-th MAC.
If K2 + Ri = K2′ + Rj′ for any indices i, j, then we have a key collision for the key of the last block cipher application, and Mi = Mj′ with probability 1. The attacker cannot observe the values K2 + Ri directly, but if he sees a collision Mi = Mj′, then he guesses that the difference between K2 and K2′ is Ri + Rj′. Once this difference is known, the attacker can modify the MACs generated with K1, K2 into valid MACs for K1, K2′. We expect to observe one collision Mi = Mj′ due to the key collision K2 + Ri = K2′ + Rj′, and we expect 2^(k−m) collisions Mi = Mj′ at random (but recall that we are assuming that k = m). Note that if Mi = Mj′ occurs at random but K2 + Ri ≠ K2′ + Rj′, then with very high probability the attacker's subsequent forgery attempt will fail, and this is how we filter the signal from the noise. Now consider a group of 2^(k/2) users, each with independently selected random keys, and assume that the adversary forces each user to MAC some fixed message 2^(k/2) times. Note that, given a group of users of this size, we expect two users to share the same first key K1 and, by the above discussion, we expect one collision K2 + Ri = K2′ + Rj′ for this pair of users. By looking for collisions Mi = Mj′ across different users, an attacker can guess the relationship between two users' keys, and thereby force a user to accept a message that wasn't MACed with its keys. Unfortunately, this attack against 2^(k/2) users has a much lower signal-to-noise ratio than the attack against two users who are known to share the first key K1. In particular, we expect approximately 2^(2k−m) collisions Mi = Mj′ at random. We filter the signal from the noise as before. The filtering step does not significantly slow down the attack since the attacker must already force 2^(k/2) users to each MAC 2^(k/2) messages and since we are assuming that k = m. As a concrete example, for AES with 128-bit keys, this attack works by forcing 2^64 users to each MAC some message 2^64 times.
We expect 2^128 collisions in the MAC outputs, and one of those collisions will allow an adversary to take the messages MACed by one user and submit them as MACs to another user. Another way to exploit the related-key property of §4 is based on the key-collision technique of [1]. For this attack, let n denote the number of users an attacker is attacking, and let n′, q, q′ be additional parameters. The attack begins with the attacker picking keys L1u, L2u for u ∈ {1, . . . , n′} (these keys correspond to "fake" users; they do not have to be random, but we assume that each L1u is distinct). Then, for each u, the attacker MACs some fixed message D q′ times; let M̄u,i be the i-th MAC produced using the keys L1u, L2u, and let R̄i be the i-th salt value. We assume that each R̄i is distinct, but not necessarily random. Now assume that the attacker has each real user, indexed from 1 to n, MAC the message D q times; let Mv,i be the i-th MAC produced by the v-th user, and let Rv,i be the i-th salt value of the v-th user (here we assume that all the salt values are chosen uniformly at random). Let K1v, K2v denote the keys of the v-th real user. If nn′ ≥ 2^k and qq′ ≥ 2^k, we expect at least one collision of the form L1u = K1v and L2u + R̄i = K2v + Rv,j to occur and, when this occurs, M̄u,i = Mv,j. If an adversary sees a collision of this form, it will learn both K1v and K2v. We do, however, expect approximately nn′qq′2^(−m) collisions
M̄u,i = Mv,j at random. This time, since we are guessing both RMAC keys, we can filter by recomputing the MAC of different messages using the key guess. (As an aside, note that the basic (total) key-collision attack approach of [1] would require nn′ ≥ 2^(2k).) We can instantiate this attack in different ways. If n = n′ = q = q′ = 2^(k/2), then we get a key recovery attack (against one of the 2^(k/2) users) with resources similar to our previous attack against 2^(k/2) users. If n = 1, n′ = 2^k, and q = q′ = 2^(k/2), then we get an attack against a single user that uses 2^(k/2) chosen plaintexts and approximately 2^(3k/2) steps. As a concrete example, if we consider AES with 128-bit keys, then the first instantiation attacks one of 2^64 users using 2^64 chosen plaintexts per user and 2^128 offline RMAC computations (broken down into 2^64 standard CBC-MAC computations and 2^128 final RMAC block cipher applications). The attack also requires approximately 2^128 additional block cipher applications as part of the filtering step. The latter instantiation attacks one user using 2^64 chosen plaintexts and approximately 2^192 offline RMAC computations (broken down into 2^128 standard CBC-MAC computations and 2^192 final RMAC block cipher applications). The filtering phase requires an additional 2^128 block cipher computations. As an additional note, we point out that the cost of the offline computations can be amortized across multiple attacks, thereby reducing the cost per attack.
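The core of the two-user attack from the beginning of this section — spot a MAC collision, guess the key difference Ri + Rj′, then confirm with a forgery — can be simulated end to end at toy scale (same invented 16-bit RMAC sketch; user B's verification step is simulated with his real key):

```python
# Toy key-collision attack recovering the difference K2A + K2B of two
# users who share K1 (invented 16-bit Feistel cipher; k = r = m = 16).
def F(x, k):
    v = (x ^ k) & 0xFF
    v = (v * 167 + 13) & 0xFF
    return v ^ (v >> 4)

def enc(key, block):
    l, r = block >> 8, block & 0xFF
    for i in range(4):
        rk = ((key >> (4 * i)) ^ (key * (i + 1))) & 0xFF
        l, r = r, l ^ F(r, rk)
    return (l << 8) | r

def pad(bits):
    s = bits + '1'
    s += '0' * (-len(s) % 16)
    return [int(s[i:i + 16], 2) for i in range(0, len(s), 16)]

def rmac(k1, k2, bits, salt):
    c = 0
    for d in pad(bits):
        c = enc(k1, c ^ d)
    return enc(k2 ^ salt, c)

k1 = 0x1111                       # shared first key
k2A, k2B = 0xAAAA, 0x0F0F         # second keys with an unknown difference
msg = '110011'

tagsA = {R: rmac(k1, k2A, msg, R) for R in range(2 ** 16)}  # user A, all salts
RB = 0x0042
tagB = rmac(k1, k2B, msg, RB)                               # one tag of user B

# Every MAC collision suggests a candidate difference K2A + K2B ...
cands = [R ^ RB for R, t in tagsA.items() if t == tagB]
# ... which is then filtered by attempting a forgery against user B.
good = [d for d in cands
        if rmac(k1, k2B, '0101', 7 ^ d) == rmac(k1, k2A, '0101', 7)]
assert k2A ^ k2B in good          # the true difference always survives
```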
6
Conclusions
There are several conclusions to draw from this work. The first and most obvious is that RMAC fails to satisfy the security claims in [6]. In particular, although NIST [6] claims that a key-recovery attack should require generating 2^(2k−1) MACs, we have presented a number of ways to extract RMAC keys using much less work. We believe, however, that there are more important lessons to be learned from this research. First, our results suggest that one needs to be extremely careful when using and interpreting "provable security" results. What is being proven? And what assumptions are being made? In the case of RMAC we note that the proof of security is in the ideal cipher model. This is an extremely strong model and, unfortunately, not a good model for some popular block ciphers. For example, consider the attack against RMAC with triple-DES in §2. That attack worked because triple-DES is vulnerable to related-key attacks, whereas in the ideal cipher model there is no relationship between the permutations associated with different keys (each key corresponds to an independently selected random permutation). This suggests that the ideal cipher model is not a good model to use when designing a mode of operation. Our results also show that, even when the underlying block cipher is secure against related-key attacks, interesting interactions can occur if a mode of operation uses related keys. For example, the attack in §3 reduces the search space of an exhaustive search by exploiting the fact that RMAC uses related keys. The construction-level related-key property in §4 also exists because RMAC uses related keys. And §5 shows that key-collision attacks become more serious when
a mode of operation uses a large number of related keys. These attacks further support our recommendation that modes of operation should not use related keys.
Acknowledgments We thank David Wagner for pointing out the relationship between the attacks in §5 and Biham’s paper [1]. Tadayoshi Kohno was supported by a National Defense Science and Engineering Graduate Fellowship.
References
1. E. Biham. How to decrypt or even substitute DES-encrypted messages in 2^28 steps. Information Processing Letters, 84, 2002.
2. E. Jaulmes, A. Joux, and F. Valette. On the security of randomized CBC-MAC beyond the birthday paradox limit: A new construction. In J. Daemen and V. Rijmen, editors, Fast Software Encryption 2002. Springer-Verlag, 2002.
3. J. Kelsey, B. Schneier, and D. Wagner. Key-schedule cryptanalysis of IDEA, G-DES, GOST, SAFER, and triple-DES. In N. Koblitz, editor, Advances in Cryptology: CRYPTO '96, LNCS 1109, pages 237–251. Springer-Verlag, 1996.
4. L.R. Knudsen and B. Preneel. MacDES: a new MAC algorithm based on DES. Electronics Letters, 34(9):871–873, April 1998.
5. C. Mitchell. Private communication.
6. NIST. DRAFT Recommendation for Block Cipher Modes of Operation: The RMAC Authentication Mode. NIST Special Publication 800-38B, October 18, 2002.
7. R. Rivest and A. Shamir. PayWord and MicroMint: Two simple micropayment schemes. CryptoBytes, 2(1):7–11, 1996.
A Generic Protection against High-Order Differential Power Analysis

Mehdi-Laurent Akkar and Louis Goubin

Cryptography Research, Schlumberger Smart Cards
36-38 rue de la Princesse, BP 45, F-78430 Louveciennes Cedex, France
{makkar,lgoubin}@slb.com
Abstract. Differential Power Analysis (DPA) on smart cards was introduced by Paul Kocher [11] in 1998. Since then, many countermeasures have been introduced to protect cryptographic algorithms from DPA attacks. Unfortunately, these countermeasures are known not to be effective against high-order DPA (even of second order). In this paper we first describe a new specialized first-order attack and recall how high-order DPA attacks work. We then show how these attacks can be applied to two countermeasures in common use. Finally, we present a method of protection (and apply it to DES) which seems to be secure against DPA-type attacks of any order. Figures from a real implementation of this method are also given. Keywords: Smart cards, DES, Power analysis, High-order DPA.
1
Introduction
The framework of Differential Power Analysis (also known as DPA) was introduced by P. Kocher, B. Jun and J. Jaffe in 1998 ([11]) and subsequently published in 1999 ([12]). The initial focus was on symmetric cryptosystems such as DES (see [11, 14, 1]) and the AES candidates (see [3, 4, 7]), but public-key cryptosystems have since been shown to be vulnerable to DPA attacks as well (see [15, 6, 9, 10, 16]). Two main families of countermeasures against DPA are known: – In [9, 10], L. Goubin and J. Patarin described a generic countermeasure consisting of splitting all the intermediate variables, using the secret-sharing principle. This duplication method was also proposed shortly after by S. Chari et al. in [4] and [5]. – In [2], M.-L. Akkar and C. Giraud introduced the transformed masking method, an alternative countermeasure to DPA. The basic idea is to perform all the computation such that all the data are XORed with a random mask. Moreover, the tables (e.g. the DES S-boxes) are modified such that the output of a round is masked by the same mask as the input. Both these methods have been proven secure against the initial DPA attacks, and are now widely used in real-life implementations of many algorithms. T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 192–205, 2003. © International Association for Cryptologic Research 2003
A Generic Protection against High-Order Differential Power Analysis
However, they do not take into consideration more elaborate attacks called “high-order DPA”. These attacks, described in [11] by P. Kocher and in [13] by T. Messerges, consist in studying correlations between the secret data and several points of the power consumption curves (instead of single points for the basic DPA attack). In what follows, we study the impact of high-order DPA attacks on both countermeasures mentioned above. Moreover, we describe new secure ways of implementing a whole class of algorithms (including DES) against these new attacks. The paper is organized as follows:
– In Section 2, we recall three basic notions: (high-order) differential power analysis, the duplication method, and the transformed masking method.
– In Section 3, we study the “duplication method” and show that an implementation of DES (or AES) which splits all the variables into n sub-variables is still vulnerable to an n-th order DPA attack. Section 3.1 gives the general principle of the attack and Section 3.2 discusses practical aspects.
– Section 4 is devoted to the analysis of the “transformed masking” method (see [2]). For such an implementation of DES, Section 4.1 describes how a “second-order” DPA can work. A new variant we call the “superposition attack” is also presented. In Section 4.2, we show that an AES (= Rijndael) implementation protected with the “transformed masking” method can also be attacked, either by second-order DPA or by the “zero problem” attack.
– Section 5 presents our new generic countermeasure: the “unique masking method”. We illustrate it on the particular case of DES. In Section 5.1, we explain the main idea of the “unique mask”. In Section 5.2, we apply it to the full protection of a DES implementation. The security of this implementation against n-th order DPA attacks is investigated in Sections 5.3 and 5.4.
– Section 6 focuses on the problem of securely constructing the modified S-Boxes used in our new countermeasure. The details of the algorithm are presented, together with the practical impact on the amount of time and memory needed.
– In Section 7, we give our conclusions.
2 Background

2.1 The (High-Order) Differential Power Analysis
In the basic DPA attack (see [11, 12] or [8]), also known as first-order DPA (or simply DPA when there is no risk of confusion), the attacker records the power consumption signals and computes statistical properties of the signal at each individual instant of the computation. This attack does not require any knowledge about the individual power consumption of each instruction, nor about the position in time of each of these instructions. It only relies on the following fundamental hypothesis (quoted from [10]):
Mehdi-Laurent Akkar and Louis Goubin
Fundamental Hypothesis (Order 1): There exists an intermediate variable, appearing during the computation of the algorithm, such that knowing a few key bits (in practice fewer than 32 bits) allows us to decide whether two inputs (respectively two outputs) give or not the same value for this variable.
In this paper, we consider the so-called high-order differential power analysis attacks (HODPA), which generalize first-order DPA: the attacker now computes statistical correlations between the power consumptions measured at several instants. More precisely, an “n-th order” DPA attack takes into account n values of the consumption signal, corresponding to n intermediate values occurring during the computation. These attacks rely on the following fundamental hypothesis (in the spirit of [10]):
Fundamental Hypothesis (Order n): There exists a set of n intermediate variables, appearing during the computation of the algorithm, such that knowing a few key bits (in practice fewer than 32 bits) allows us to decide whether two inputs (respectively two outputs) give or not the same value for a known function of these n variables.
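The first-order hypothesis can be illustrated with a minimal difference-of-means sketch. This is our own toy model, not the paper's: a noiseless Hamming-weight leakage on a 4-bit S-box (the first row of DES S1), with all names chosen by us.

```python
# A toy first-order DPA (difference of means) against a 4-bit S-box,
# assuming a noiseless Hamming-weight leakage model. The S-box is the
# first row of DES S1; all names here are ours, for illustration only.

SBOX = [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7]

def hw(x):
    return bin(x).count("1")

def dpa_score(inputs, traces, key_guess, bit=0):
    # Partition the traces on one predicted S-box output bit and
    # return the absolute difference of the two mean consumptions.
    g0, g1 = [], []
    for x, t in zip(inputs, traces):
        if (SBOX[x ^ key_guess] >> bit) & 1:
            g1.append(t)
        else:
            g0.append(t)
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

secret_key = 0b1011
inputs = list(range(16))                              # exhaustive inputs
traces = [hw(SBOX[x ^ secret_key]) for x in inputs]   # idealized leakage

scores = [dpa_score(inputs, traces, k) for k in range(16)]
# The correct key reaches the maximal bias (ghost peaks may tie with it;
# a real attack disambiguates them with more target bits or more samples).
```

With real, noisy traces the same statistic is computed at each time sample, and the location of the peak localizes the targeted instruction, exactly as the hypothesis above requires.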
2.2 The “Duplication” Method
The “duplication method” was initially suggested by L. Goubin and J. Patarin in [9], and studied further in [4, 10, 5]. It basically consists in splitting the data manipulated during the computation into several parts, using a secret sharing scheme, computing a modified algorithm on each part, and recombining the final result at the end. For example, a way of splitting X into two parts consists in choosing a random R and splitting X into (X ⊕ R) and R.
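The splitting step can be sketched as follows (a Python sketch of the generic idea; the function names and the n-share generalization are ours):

```python
import secrets
from functools import reduce

# Sketch of the splitting step of the duplication method: X is never
# handled directly, only shares whose XOR equals X.

def split(x, n=2, bits=32):
    """Split x into n shares whose XOR recombines to x."""
    shares = [secrets.randbits(bits) for _ in range(n - 1)]
    shares.append(reduce(lambda a, b: a ^ b, shares, x))
    return shares

def recombine(shares):
    return reduce(lambda a, b: a ^ b, shares)

x = 0xDEADBEEF
two = split(x)          # the (X ^ R, R) splitting described in the text
three = split(x, n=3)   # an n = 3 variant, for higher-order protection
```

Each share alone is uniformly random; only the XOR of all shares reveals X, which is exactly what forces an attacker to combine n points of the curve.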
2.3 The “Transformed Masking” Method
The “transformed masking” method was introduced in [2] by M.-L. Akkar and C. Giraud. The basic idea is to perform all the computation such that all the data are XORed with a random mask. By using suitably modified tables (for instance the S-Boxes in the case of DES), it is possible to have the output of a round masked by exactly the same mask as the input. The computation is thus divided into two main steps: the first one consists in generating the modified tables, and the second one consists in performing the usual computation using these modified tables (the initial input being masked before the computation starts and the final output being unmasked after the computation ends).
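The table precomputation step can be sketched on a toy 4-bit S-box (our own example values, with the DES permutations omitted for brevity): the recomputed table S2 satisfies S2(a ⊕ r) = S(a) ⊕ r, so the data stays masked through the lookup.

```python
import secrets

# Sketch of "transformed masking" table precomputation on a toy 4-bit
# S-box (our example; the real method also applies EP and P^-1).

S = [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7]

def transformed_table(sbox, r):
    # S2[x] = S[x ^ r] ^ r, computed once per fresh mask r.
    return [sbox[x ^ r] ^ r for x in range(len(sbox))]

r = secrets.randbelow(16)
S2 = transformed_table(S, r)
for a in range(16):
    assert S2[a ^ r] == S[a] ^ r   # the mask propagates unchanged
```

Because the same r masks input and output of every round, the unmasked value never appears in memory, which is what defeats first-order DPA.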
3 Attack on the Duplication Method

3.1 Example: Second Order DPA on DES
In what follows, we suppose that two bits b1 and b2, appearing during the computation, are such that b1 ⊕ b2 equals the value b of the first output bit of the first S-Box in the first DES round. The attacker performs the following steps:
1. Record the consumption curves Ci corresponding to N different inputs Ei (1 ≤ i ≤ N), for instance N = 1000.
2. Guess the interval δ between the instant corresponding to the treatment of b1 and the instant corresponding to the treatment of b2. Each curve Ci is then replaced by Ci,δ, the difference between Ci and (Ci translated by δ). Compute the mean curve CMδ of the N curves Ci,δ.
3. Guess the 6 bits of the key on which the value of b depends. From these 6 key bits, compute for each Ei the expected value of b. Then compute the mean curve CM′δ of all the Ci,δ such that the expected b equals 0, and CM″δ the mean curve of all the other Ci,δ.
4. If CM′δ and CM″δ do not show any appreciable difference, go back to step 3 with another choice for the 6 key bits.
5. If no choice of the 6 key bits was satisfactory, go back to step 2 with another choice for δ.
6. Iterate steps 2–5 with two bits whose exclusive-or comes from the second S-Box, the third S-Box, ..., up to the eighth S-Box.
7. Find the 8 remaining key bits by exhaustive search.
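The combining step of these attack stages can be sketched on simulated traces. This toy model is ours: each trace leaks the shares b1 = b ⊕ m and b2 = m at two instants, and we take the absolute difference of the two instants so that the XOR of the shares survives averaging (one common way to realize the combining step; the paper's description uses the plain difference).

```python
import random

# Toy second-order combining: each trace leaks the shares b1 = b ^ m and
# b2 = m at instants t1 and t1 + delta. Averaging |C(t) - C(t + delta)|
# per partition makes the unmasked bit b reappear at t = t1.
# Sizes and noise levels are ours, chosen for illustration.

random.seed(1)
T, t1, delta, N = 50, 12, 9, 400

def make_trace(b):
    m = random.randrange(2)                      # fresh mask per encryption
    trace = [random.gauss(0.0, 0.05) for _ in range(T)]
    trace[t1] += b ^ m                           # first share leaks here
    trace[t1 + delta] += m                       # second share leaks here
    return trace

bits = [random.randrange(2) for _ in range(N)]   # predicted target bit b
traces = [make_trace(b) for b in bits]

def combined_mean(want):
    sel = [tr for tr, b in zip(traces, bits) if b == want]
    return [sum(abs(tr[t] - tr[t + delta]) for tr in sel) / len(sel)
            for t in range(T - delta)]

m1, m0 = combined_mean(1), combined_mean(0)
peak = max(range(T - delta), key=lambda t: abs(m1[t] - m0[t]))
```

The peak of |CM′ − CM″| appears exactly at t1, because only there do the two partitions (expected b = 0 versus b = 1) behave differently.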
3.2 The Attack in Practice
As specified in the original paper [10], it is clear that the n-th order duplication is vulnerable to an n-th order DPA attack. An important point to notice is that if the method is not carefully implemented, it will be easy to detect on the consumption curve, simply by identifying n repetitive parts in the computation. In this case, it would be easy for the attacker to superpose the different parts of the curves (in time constant, or proportional to log(n), but not exponential in n). Moreover, in certain scenarios the attacker has full access to the details of the implementation. In particular, for high-level security certifications (ITSEC, Common Criteria), it is assumed that the attacker knows the contents of the smart-card ROM.
4 Attack on the Transformed Masking Method
4.1 DES: Second Order DPA

4.1.1 Usual Second Order DPA. For the DES algorithm, the input of a round is masked with a 64-bit value R = R0−31 || R32−63, divided into two independent masks of 32 bits each. The modified S-boxes S′ are the following (where S are the original ones):

S′(X) = S(X ⊕ EP(R32−63)) ⊕ P−1(R0−31 ⊕ R32−63)

where EP denotes the expansion permutation, and P−1 the inverse of the permutation P applied after the S-Boxes. Using this formula, the mask on the value at the end of a DES round is nearly R. To get exactly the R-masked value, the left part of the value has to be remasked with R0−31 ⊕ R32−63.
As noted in the original article, it is clear that this countermeasure is subject to a second-order DPA attack. Indeed, the real output of the S-boxes is correlated to the masked value and to the value R; so by obtaining the electrical traces of these two values, one can combine them and get a trace on which a classical DPA attack will work. To perform such an attack efficiently, without needing n² points as in the general attack, the attacker should obtain precise information about the implementation of the algorithm: he should know precisely where the interesting values are manipulated.

4.1.2 Superposition Attack. In this section we present a new kind of DPA attack. In theory it is a second-order DPA attack, but in practice it is nearly as simple as a usual DPA attack. The idea is the following: in a second-order DPA, the most difficult task is to localize the instants at which the needed values are manipulated. On the contrary, localizing a whole DES round is often quite easy. So instead of correlating precise parts of the consumption traces, we simply correlate the whole traces of the first and last rounds. With this method, one notices that at some moment we obtain the consumption trace T of the following value, which is the XOR of the S-Box output values:

T = (S′(E(R15) ⊕ K16) ⊕ R′) ⊕ (S′(E(R1) ⊕ K1) ⊕ R′) = S′(E(R15) ⊕ K16) ⊕ S′(E(R1) ⊕ K1)

where R′ is the right part of the mask permuted by the expansion permutation. One can notice that the value T does not depend on the random masking value, and that R1 and R15 (R15 can be deduced from the output by applying the inverse of the final permutation) are often known. Considering this, it is easy to see that by guessing the 2 × 6 bits of the subkeys of the first and last rounds, it is possible to predict the XOR of the outputs of the S-Boxes of the first and last rounds. One can then perform a usual DPA-type attack and find the values of the different subkeys of K1 and K16.
Due to the redundancy of the key bits, one can moreover check the coherency of the results: indeed, with such an attack one finds 2 × 6 × 8 = 96 ≫ 56 bits of the key. The detailed algorithm is the following:
– Correlate (usually by addition or subtraction of the curves) the first and last round traces.
– For all the messages M, for each S-box j = 1..8:
– For k = 0 to 63, for l = 0 to 63:
– Separate the messages according to one bit of the XOR of the outputs of the j-th S-box (rounds 1 and 16) for the message M, assuming that the subkey of S-box j of the first round is k and the subkey of S-box j of the last round is l.
– Average and subtract the separated curves.
– Choose the values k, l for which the greatest peak appears.
– Check the coherency of the key bits found.
A careful look at the attack should convince the reader that any one-bit error in the guess of K1 or K16 eliminates all the correlation. Compared to a usual second-order DPA attack, even if this attack requires the analysis of 2¹² = 4096 possibilities, it has the advantage of not requiring precise knowledge of the code. And from a complexity point of view, it increases the amount of time and memory needed by the attacker by a constant factor (2⁶ = 64), not by a linear factor.

4.1.3 Conclusion. The superposition attack, even though it is theoretically a second-order attack, is very efficient in practice. Therefore, to use the transformed masking method, one must use different masks at each step of the algorithm. This idea has been developed and adapted to produce the protection described in this article.

4.2 AES
For AES, the countermeasure is nearly the same as for DES. The only difference is that no transformed tables are used for the non-linear part of AES (the inversion in the field GF(256)); instead, the same table is used with a multiplicative mask. The distributivity of multiplication over XOR (addition in the field) is used: from an additive mask it is easy, without unmasking the value, to switch to a multiplicative mask, go through the S-boxes, and come back to an additive mask.

4.2.1 Usual Second Order DPA. For AES, the situation is exactly the same as for the DES transformed masking method. Correlating the masked value and the mask allows an effective attack against this method.

4.2.2 The “Zero” Problem. Because a multiplicative mask is used during the inversion, one can see that if the inverted value is zero (and this value depends on only 8 bits of the key in the first and last rounds), then whatever the masking value, the inverted value will be unmasked. Therefore, if someone is able to detect in the consumption trace that the value is zero instead of a randomly masked value, he will be able to break such an implementation. Of course, probabilistic tools such as variance analysis are well suited to such an analysis.

4.2.3 Superposition Method. As for DES, one could use the same superposition method to find the key 16 bits by 16 bits, superposing the first and last rounds of AES, because they use the same mask. Unfortunately, after the last round a final subkey is added to the output of the round, so the attacker needs to guess at least 8 more bits of the key. This increases the attacker's work to 24 bits for each S-box. In theory the attack is not quadratic in the number of samples, but in practice it is not easy to perform more than 16 billion manipulations of the curves for each table and each message.
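The “zero problem” above can be seen concretely in GF(256) arithmetic (our own minimal implementation, using the AES reduction polynomial): whatever the nonzero mask, multiplying 0 by it still gives 0, so the value 0 is never actually masked.

```python
# The "zero problem" with multiplicative masking in GF(256): for every
# nonzero mask w, the value 0 is "masked" to 0, i.e. not masked at all,
# while any nonzero value really is randomized over all 255 nonzero
# field elements. Minimal GF arithmetic of our own, polynomial 0x11B.

def gf_mul(a, b):
    # Carry-less multiply modulo the AES polynomial x^8 + x^4 + x^3 + x + 1.
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        b >>= 1
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
    return p

for w in range(1, 256):           # every possible nonzero mask
    assert gf_mul(0, w) == 0      # zero is never hidden by the mask

masked = {gf_mul(0x53, w) for w in range(1, 256)}
assert len(masked) == 255 and 0 not in masked   # nonzero values randomize
```

This asymmetry between 0 and all other values is exactly what a variance test on the consumption trace can detect.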
4.2.4 Conclusion. Judging by these attacks, we can consider that the adaptive mask countermeasure on AES is not effective, even against some attacks simpler than second-order ones.
5 Unique Masking Method: Principle
We have seen that the current countermeasures against DPA are intrinsically vulnerable to high-order DPA. Often the order of vulnerability is two; and even when it is theoretically higher, in practice it is one or two. In this section we present a method to protect DES which seems to be efficient against DPA attacks of any order. We first describe the elementary steps of the method, then see how to construct a completely secure DES and why it seems to be secure.

5.1 Masked Rounds
Given any 32-bit value α, we define two new functions S̃1 and S̃2 based on the S-box function S:

∀x ∈ {0,1}^48: S̃1(x) = S(x ⊕ E(α))
∀x ∈ {0,1}^48: S̃2(x) = S(x) ⊕ P−1(α)

where E is the expansion permutation and P−1 is the inverse of the permutation applied after the S-boxes. We define fKi to be the composition of E, the XOR with the i-th round subkey Ki, the S-boxes, and the permutation P. We then define f̃1,Ki and f̃2,Ki by replacing S by S̃1 and S̃2 in f.
Remark: f̃1 produces an unmasked value from an α-masked value, and f̃2 produces an α-masked result from an unmasked one.
Using the functions f, f̃1 and f̃2, one can obtain 5 different rounds using masked/unmasked values. Figure 1 represents these five different rounds; the plain fill represents unmasked values and the dashed fill represents masked values. The automaton of Figure 2 shows how these rounds are compatible with each other. The input states are the rounds where the input is unmasked (A and B), and the output states are those where the output of the round is unmasked (A and E).
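The remark can be verified on a toy 4-bit round function, taking E and P as the identity for simplicity (our simplification; the real construction applies the DES permutations): f̃1 turns an α-masked input into an unmasked output, and f̃2 turns an unmasked input into an α-masked output.

```python
# Toy 4-bit check of the two modified S-boxes, with E and P taken as
# the identity (our simplification). f1 uses S~1(x) = S(x ^ E(alpha));
# f2 uses S~2(x) = S(x) ^ P^-1(alpha).

S = [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7]
alpha = 0b1010
K = 0b0111

def f(x):  return S[x ^ K]            # plain round function f_K
def f1(x): return S[x ^ K ^ alpha]    # round built from S~1
def f2(x): return S[x ^ K] ^ alpha    # round built from S~2

for a in range(16):
    assert f1(a ^ alpha) == f(a)      # masked in   -> unmasked out
    assert f2(a) == f(a) ^ alpha      # unmasked in -> masked out
```

Chaining f2 then f1 therefore keeps the intermediate value masked while the round outputs remain those of an ordinary DES, which is what the five round types of Figure 1 exploit.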
5.2 Complete DES with Masked Rounds
It is easy to see that one can obtain a complete 16-round DES under these requirements. IP − BCDCDCE BCDCDCDCE − FP is a correct example (IP denotes the initial permutation of DES and FP the final one).
Fig. 1. Masked rounds of DES
Fig. 2. Combination of the rounds
5.3 Security Requirements

Throughout this section, we consider that the modified S-boxes are already constructed and that the mask α changes at each DES computation. The first step is to analyze, for each round of DES, on how many key bits each bit of the data depends. This simple analysis is summarized in Figure 3. We have also considered that the plaintext and the ciphertext are known, which explains the symmetry of the figure. To get adequate security, we consider critical the data whose bits depend on fewer than 36 bits of the key (if a curve contains 128 eight-bit samples, 36 bits represents an amount of 2 Tb of memory needed). We can thus see that only two parts have to be protected: the one connecting R2 and L3, and the one connecting R15 and L16. We define as usual Li (respectively Ri) as the left part (respectively the right part) of the message at the end of the i-th round. Of course, the values depending on no key bits do not have to be protected.
Fig. 3. Number of key bits / bits of data
Therefore these values must be masked, which obliges the first three rounds to be of the form BCD or BCE, and the last three rounds to be of the form BCE or DCE. Taking these constraints into account, IP − BCDCDCE BCDCDCDCE − FP is, for example, a good combination.

5.4 Resistance to DPA
5.4.1 Classical DPA. This countermeasure clearly protects DES against first-order DPA. Indeed, all the values depending on fewer than 36 bits of the key are masked by a random mask which is used only once.
5.4.2 Enhanced Attacks. First, we have to notice that this countermeasure is vulnerable to the superposition method by guessing 12 bits of the key.
Indeed, the same mask is used in the first and last rounds of DES. To counteract this attack, we from now on consider two different masks α1 and α2, used respectively in the first and last rounds of DES. It is easy to see that the proposed combination of rounds permits switching from α1 to α2 at the 7th and 8th rounds, thanks to the E-round/B-round structure, which leaves the output/input unmasked. With obvious notations, we obtain the following example of DES:

IP − Bα1 Cα1 Dα1 Cα1 Dα1 Cα1 Eα1 Bα2 Cα2 Dα2 Cα2 Dα2 Cα2 Dα2 Cα2 Eα2 − FP

Let us now consider an n-th order DPA attack. The idea is to correlate several values in order to obtain the consumption of an important value, where an important value is one that could be guessed with fewer than 36 bits of the key. But we have seen that all these values are masked. Moreover, the mask appears only once in the whole computation (recall that we consider the tables as already constructed; their construction is analyzed in Section 6), so even with high-order correlation it is impossible to get any information about the masked values.

5.5 Variation
– If we want the mask never to appear several times (even on values depending on more than 36 bits of the key), one can use the following combination instead of the proposed one: IP − Bα1 Cα1 Eα1 AAAAAAAAAA Bα2 Cα2 Eα2 − FP
– For paranoid people, it is even possible to add two new masks and to mask every value depending on fewer than 56 bits of the key.
– This method is modular: if one uses a protocol where the input or the output is not known, one can eliminate the associated mask.
6 Effective Construction of the Modified S-Boxes

In this section, algorithms are described using pseudo C code.

6.1 Principle
It is easy to see that the following operations must be performed securely in order to construct the S-boxes S̃1:
– Generate a random α.
– Apply the permutation P−1 to α.
– XOR the value P−1(α) into a table.
For the construction of S̃2, we need to:
– Retrieve α, since it is the same as in S̃1.
– Permute it (E(α)).
– XOR it into a table containing (1..63).
Of course, “securely” means that all these operations must be performed without leaking, through the power consumption, any information about α at any order (1, 2, ...).

6.2 Generation of a Random Number: For Example 64 Bits
We assume access to a 64-byte array t and to a random generator (for example a generator of bytes). We can proceed as follows:
– for(i=0..63) { t[i] = rand() % 2 }
– for(i=0..63) { swap(t[i], t[rand() % 64]) }
With this method, we obtain in memory a 64-bit random value, and an attacker learns at most the Hamming weight of α (if he can perform an SPA attack). Here we have assumed that the attacker cannot determine in a single shot which array entry is addressed when we swap the entries, a hypothesis which looks quite reasonable.
Variant 1: To save time and memory, we can imagine the following method, which is much faster and does not look too weak. We obtain 16 4-bit values in a 16-byte array:
– for(i=0..16) { t[i] = rand() }
– for(i=0..16) { swap(t[i] AND 7, t[rand() % 16] AND 7) }
Indeed, we can consider that the 4 high-order bits will strongly influence the consumption.
Variant 2: This other method produces an 8-byte random array. It is faster but less secure:
– for(i=0..8) { t[i] = rand() }
– for(i=0..16) { t[rand() % 8] XOR= rand() }
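The basic method above can be transcribed into runnable form (a Python transcription of the pseudo C, for illustration; a smart-card implementation would of course use the card's hardware random generator rather than a software one):

```python
import random

# Runnable transcription of the bit-per-bit generation sketched above:
# fill a 64-entry array with random bits, then shuffle it with random
# transpositions, so that an observer sees individual bit manipulations
# and learns at most the Hamming weight of the final value.

def generate_masked_bits(n=64, rng=random):
    t = [rng.randrange(2) for _ in range(n)]    # t[i] = rand() % 2
    for i in range(n):                          # swap(t[i], t[rand() % n])
        j = rng.randrange(n)
        t[i], t[j] = t[j], t[i]
    return t

bits = generate_masked_bits()
alpha = 0
for b in reversed(bits):
    alpha = (alpha << 1) | b    # assemble the 64-bit mask from its bits
```

The shuffle step is what decorrelates the position of each bit from the instant at which it was drawn.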
6.3 Permutation
Classically, the permutation can be applied bit per bit in a random order. Again, this only allows the attacker to learn the Hamming weight of the permuted value. To gain speed and memory, one could perform the permutation randomly quartet per quartet, or even byte per byte. Another idea is to add some dummy values and perform the permutation on the whole; the dummy values are simply discarded after the permutation.
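The bit-per-bit variant can be sketched as follows (our illustration, with a toy 4-bit permutation; the randomized visiting order changes what an observer sees but never the result):

```python
import random

# Applying a bit permutation "bit per bit in a random order": the bit
# positions are visited in a shuffled order, so the sequence of
# manipulations leaks at most the Hamming weight, not which source bit
# went to which destination. Toy example of our own.

def permute_bits(value, perm, nbits, rng=random):
    out = 0
    order = list(range(nbits))
    rng.shuffle(order)                  # visit positions randomly
    for i in order:
        bit = (value >> i) & 1
        out |= bit << perm[i]           # perm maps source -> destination
    return out

perm = [1, 3, 0, 2]                     # toy 4-bit permutation
y = permute_bits(0b1010, perm, 4)       # result independent of the order
```

The same randomized-order idea applies directly to the XOR step of the next subsection.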
6.4 XOR
Here, a general method is to XOR the value into the table bit per bit, in a random order. Once again, many compromises are possible: do it byte per byte, add dummy values, and so on.
6.5 Practical Considerations

The usual S-boxes use 256 bytes. We need them, but they can be stored in ROM. The additional tables must be stored in RAM. In the normal security method (two masks α1 and α2), we need to store 4 new tables, so the total RAM requirement is 1024 bytes. We have seen that the construction of the S-boxes can be performed quite securely. Of course, the most secure method is very slow: it will really slow down the DES execution and use a lot of memory. The point was to show that it is theoretically possible to build the tables without leaking any information (other than the Hamming weight of the value) under a reasonable security model (the attacker is not able to read the exact memory access in a single shot). But we have also seen that it is possible to increase the speed and decrease the memory without losing too much security. Let us now look at how our countermeasure could be applied to the AES algorithm. Due to the higher number of tables (more than 16 instead of 8), and because they are bigger (8→8 bits instead of 6→4) compared to DES, our countermeasure would require about 8 KB (or 16 KB for a high security level) of RAM, a size which is too big for usual smart-cards. Some simplifications, which would unfortunately decrease the level of security, are therefore necessary to apply our countermeasure to an AES implementation.
7 Real Implementation on the DES Algorithm
A real implementation of this method has been completed on an ST19 component. It includes the following features described in the previous sections:
– SPA protections: randomization and masking of the permutations and of the key manipulation (permutations, S-box accesses, ...).
– DPA protection: HO-DPA protection of the first and last three rounds of DES.
– S-box construction done bit per bit, with bit-per-bit randomization while computing the masking value.
– DFA protection: multiple computation, coherence checking, ...
With all these features, we obtain an implementation with:
– 3 KB of ROM code;
– 81 bytes of RAM and 668 bytes of extended RAM;
– an execution time of 38 ms at 10 MHz.
This implementation has been submitted to our internal SPA/DPA/DFA laboratory, which tried to attack it without success.
8 Conclusion
Compared to other proposed countermeasures, the unique masking method presents the following advantages:
– It is currently the only known protection against high-order DPA.
– The core of the DES is exactly the same as an ordinary one; an existing implementation can be reused with very light modifications, just adding the S-box generation routine.
– The important values are masked with a unique mask which never appears in the DES computation itself. With the transformed masking method, by contrast, the mask appeared often (for a first mask, at the very beginning and at each round). Here one does not even have to mask the input or unmask the output.
– The only part where the mask appears (and even there it can be manipulated randomly and bit per bit) depends neither on the key nor on the message. The security effort can therefore be totally focused on this point.
– The method is very flexible and modular, without important changes in the code: the desired level of security could even be a compilation parameter.
– A real implementation has been carried out, proving the feasibility of this countermeasure in reasonable time (less than 40 ms with full protections).
References
1. M.-L. Akkar, R. Bevan, P. Dischamp, D. Moyart, Power Analysis: What Is Now Possible. In Proceedings of ASIACRYPT'2000, LNCS 1976, pp. 489-502, Springer-Verlag, 2000.
2. M.-L. Akkar, C. Giraud, An Implementation of DES and AES Secure against Some Attacks. In Proceedings of CHES'2001, LNCS 2162, pp. 309-318, Springer-Verlag, 2001.
3. E. Biham, A. Shamir, Power Analysis of the Key Scheduling of the AES Candidates. In Proceedings of the Second Advanced Encryption Standard (AES) Candidate Conference, March 1999. Available from http://csrc.nist.gov/encryption/aes/round1/Conf2/aes2conf.htm
4. S. Chari, C.S. Jutla, J.R. Rao, P. Rohatgi, A Cautionary Note Regarding Evaluation of AES Candidates on Smart-Cards. In Proceedings of the Second Advanced Encryption Standard (AES) Candidate Conference, March 1999. Available from http://csrc.nist.gov/encryption/aes/round1/Conf2/aes2conf.htm
5. S. Chari, C.S. Jutla, J.R. Rao, P. Rohatgi, Towards Sound Approaches to Counteract Power-Analysis Attacks. In Proceedings of CRYPTO'99, LNCS 1666, pp. 398-412, Springer-Verlag, 1999.
6. J.-S. Coron, Resistance Against Differential Power Analysis for Elliptic Curve Cryptosystems. In Proceedings of CHES'99, LNCS 1717, pp. 292-302, Springer-Verlag, 1999.
7. J. Daemen, V. Rijmen, Resistance Against Implementation Attacks: A Comparative Study of the AES Proposals. In Proceedings of the Second Advanced Encryption Standard (AES) Candidate Conference, March 1999. Available from http://csrc.nist.gov/encryption/aes/round1/Conf2/aes2conf.htm
8. J. Daemen, M. Peters, G. Van Assche, Bitslice Ciphers and Power Analysis Attacks. In Proceedings of FSE'2000, LNCS 1978, Springer-Verlag, 2000.
9. L. Goubin, J. Patarin, Procédé de sécurisation d'un ensemble électronique de cryptographie à clé secrète contre les attaques par analyse physique. European Patent, SchlumbergerSema, February 4th, 1999, Publication Number: 2789535.
10. L. Goubin, J. Patarin, DES and Differential Power Analysis: The Duplication Method. In Proceedings of CHES'99, LNCS 1717, pp. 158-172, Springer-Verlag, 1999.
11. P. Kocher, J. Jaffe, B. Jun, Introduction to Differential Power Analysis and Related Attacks. Technical Report, Cryptography Research Inc., 1998. Available from http://www.cryptography.com/dpa/technical/index.html
12. P. Kocher, J. Jaffe, B. Jun, Differential Power Analysis. In Proceedings of CRYPTO'99, LNCS 1666, pp. 388-397, Springer-Verlag, 1999.
13. T.S. Messerges, Using Second-Order Power Analysis to Attack DPA Resistant Software. In Proceedings of CHES'2000, LNCS 1965, pp. 238-251, Springer-Verlag, 2000.
14. T.S. Messerges, E.A. Dabbish, R.H. Sloan, Investigations of Power Analysis Attacks on Smartcards. In Proceedings of the USENIX Workshop on Smartcard Technology, pp. 151-161, May 1999. Available from http://www.eecs.uic.edu/~tmesserg/papers.html
15. T.S. Messerges, E.A. Dabbish, R.H. Sloan, Power Analysis Attacks of Modular Exponentiation in Smartcards. In Proceedings of CHES'99, LNCS 1717, pp. 144-157, Springer-Verlag, 1999.
16. K. Okeya, K. Sakurai, Power Analysis Breaks Elliptic Curve Cryptosystems even Secure against the Timing Attack. In Proceedings of INDOCRYPT'2000, LNCS 1977, pp. 178-190, Springer-Verlag, 2000.
A New Class of Collision Attacks and Its Application to DES
Kai Schramm, Thomas Wollinger, and Christof Paar
Department of Electrical Engineering and Information Sciences, Communication Security Group (COSY), Ruhr-Universität Bochum, Universitaetsstrasse 150, 44780 Bochum, Germany
{schramm,wollinger,cpaar}@crypto.rub.de
http://www.crypto.rub.de
Abstract. Until now, in cryptography the term collision was mainly associated with the surjective mapping of different inputs to an equal output of a hash function. Previous collision attacks were only able to detect collisions at the output of a particular function. In this publication we introduce a new class of attacks, which originates from Hans Dobbertin and is based on the fact that side channel analysis can be used to detect internal collisions. We applied our attack against the widely used Data Encryption Standard (DES). We exploit the fact that internal collisions can be caused in three adjacent S-boxes of DES [DDQ84] in order to gain information about the secret key bits. As a result, we were able to exploit an internal collision with a minimum of 140 encryptions (depending on the applied measurement hardware and sampling frequency, a multiple of 140 plaintexts may have to be sent to the target device in order to average the corresponding power traces, which effectively decreases noise), yielding 10.2 key bits. Moreover, we successfully applied the attack to a smart card processor. Keywords: DES, S-Boxes, collision attack, internal collisions, power analysis, side channel attacks.
1 Introduction
Cryptanalysts have used collisions to attack hash functions for years [Dob98, BGW98b] (in the remainder of this publication, we do not require an internal collision to be detectable at the output of the cryptographic algorithm). Most of the previous attacks against hash functions only attacked a few rounds, e.g., three rounds of RIPEMD [Dob97, NIS95]. In [Dob98], Dobbertin revolutionized the field of collision attacks against hash functions by introducing an attack against the full-round MD4 hash function [Riv92]. It was shown that MD4 is not collision free and that collisions in MD4 can be found in a few seconds on a PC. Another historic example of breaking an entire hash function is
T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 206–222, 2003. c International Association for Cryptologic Research 2003
the COMP128 algorithm [BGW98a]. COMP128 is widely used to authenticate mobile stations to base stations in GSM (Global System for Mobile Communication) networks [GSM98]. COMP128's core building block is a hash function based on a butterfly structure with five stages. In [BGW98b], it was shown that it is possible to cause a collision in the second stage of the hash function, which fully propagates to the output of the algorithm. Hence, a collision can easily be detected, revealing information about the secret key. Cryptographers have traditionally designed new cipher systems assuming that the system would be realized in a closed, reliable computing environment which does not leak any information about the internal state of the system. However, any physical implementation of a cryptographic system will generally provide a side channel leaking unwanted information. In [KJJ99], two practical attacks, Simple Power Analysis (SPA) and Differential Power Analysis (DPA), were introduced. The power consumption was analyzed in order to find the secret keys of a tamper-resistant device. The main idea of DPA is to detect regions in the power consumption of a device which are correlated with the secret key; moreover, little or no information about the target implementation is required. In recent years there have been several publications dealing with side channel attacks: side channel analysis of several algorithms, improvements of the original attacks (e.g., higher-order DPA and sliding-window DPA), and hardware and software countermeasures were published [CCD00a, CJR+99b, CJR+99a, Cor99, FR99, GP99, CCD00b, CC00, Sha00, Mes00, MS00]. Recently, attacks based on the analysis of electromagnetic emission have also been published [AK96, AARR02]. The main idea of this contribution is to combine ‘traditional’ collision attacks with side channel analysis. Traditional collision attacks implied that an internal collision fully propagates to the output of the function.
Using side channel analysis it is possible to detect a collision at any stage of the algorithm, even if it does not propagate to the output.

Our Main Contributions

A New Class of Collision Attack: The work at hand presents a collision attack against cryptographic functions embedded in symmetric ciphers, e.g., the f-function in DES. The idea, which originally comes from Hans Dobbertin, is to detect collisions within the function by analysis of side channel information, e.g., power consumption. Contrary to previous collision attacks, we exploit internal collisions, which are not necessarily detectable at the output. Modified versions of this attack can potentially be applied to any symmetric cipher in which internal collisions are possible. Furthermore, we believe that our attack is resistant against certain side channel countermeasures, which we will show in future publications.

Collisions within the DES f-Function: In [DDQ84], it was first shown that the f-function of DES is not one-to-one for a fixed round key, because collisions can be caused in three adjacent S-Boxes. We discovered that such internal collisions reveal information about the secret key. On average3 140 3
averaged over 10,000 random keys.
Kai Schramm, Thomas Wollinger, and Christof Paar
different encryptions are required to find the first collision; a significantly lower number of encryptions is required to find further collisions. This result is a breakthrough for future attacks against DES and other cryptographic algorithms vulnerable to internal collisions.

Realization of the Attack: Smart cards play an increasingly important role in providing security functions. We applied our attack against an 8051 compatible smart card processor running DES in software. We focused on the S-Box triple 2,3,4 and were able to gain 10.2 key-bits with 140 encryptions on average, including key reduction. We would like to mention that there exists another attack against DES based on internal collisions which requires fewer measurements. This attack was developed by Andreas Wiemers and exploits collisions within the Feistel cipher [Wie03].

The remainder of this publication is organized as follows. Section 2 summarizes previous work on collision attacks, side channel attacks, and DES attacks. In Section 3 we explain the principle of our new attack. In Section 4 we apply our attack to the f-function of DES. In Section 5 further optimizations of our collision attack against DES are given. In Section 6 we compromise an 8051 compatible smartcard processor running DES. Finally, we end this contribution with a discussion of our results and some conclusions.
2 Previous Work
Collision Attacks. The hashing algorithm COMP128 was a suggested implementation of the algorithms A3 and A8 for GSM [GSM98]. Technical details of COMP128 were strictly confidential; however, in 1998 the algorithm was completely reverse engineered [BGW98a]. COMP128 consists of nine rounds, and its core building block is a hash function. This hash function itself is based on the butterfly structure and consists of five stages. The output bytes contain a response used for the authentication of the mobile station with the base station and the session key used for the stream cipher A5. In [BGW98b], the COMP128 algorithm was cracked by exploiting a weakness in the butterfly structure. Only the COMP128 input bits corresponding to the random number can be varied. A collision can occur in stage 2 of the hash function. It will fully propagate to the output of the algorithm and, as a result, it will be detectable at the output. To launch the attack, one has to vary bytes i + 16 and i + 24 of the COMP128 input and fix the remaining input bytes. The birthday paradox guarantees that a collision will occur rapidly, and the colliding bytes are i, i + 8, i + 16, and i + 24. The attack requires 2^17.5 queries to recover the whole 128-bit key. Most of the presented attacks against hash functions only attacked a few rounds, e.g., three rounds of RIPEMD [Dob97, NIS95]. Also, MD4 was first attacked partially. There were approaches to attack the two-round MD4 [dBB94, Vau94] (also an unpublished attack by Merkle). In [Dob98], Dobbertin introduced an attack against the whole MD4 hash function [Riv92]. It was shown that an earlier attack against RIPEMD [Dob97] can be applied to MD4 very
efficiently. An algorithm was developed that allows one to compute a collision in a few seconds on a PC with a Pentium processor. Finally, it was demonstrated that a further development of the attack could find collisions for meaningful messages. The main result of that contribution was that MD4 is not collision-free: finding a collision requires the same computational effort as 2^20 computations of the MD4 compression function. The basic idea of the attack is that a difference of the input variables can be controlled in such a way that the differences occurring in the computation of the two associated hash values are compensated at the end.

Side Channel Attacks. A cryptographic system embedded in a microchip generally consists of many thousands of logic gates and storage elements. The power consumption of the system can be analyzed with a shunt resistor put in series between the ground pad of the microchip and the external ground of the voltage source. A digital oscilloscope is used to digitize the voltage across the shunt resistor, which is proportional to the power consumption of the system. Power analysis can be classified into Simple Power Analysis (SPA) and Differential Power Analysis (DPA) [KJJ99, KJJ98]. SPA directly interprets the power consumption during cryptographic operations. Hence, an attacker must have detailed information about the target hardware and the implemented algorithm. Two types of information leakage have been observed in SPA: Hamming weight and transition count leakage of internal registers and accumulators [MDS99]. The Hamming weight is often directly proportional to the amount of current that is being discharged from the gate driving the data and address bus4 [MDS99, Mui01]. Transition count information leaks during a gate transition from high to low or low to high when bits of internal registers flip [MDS99].
The main idea of DPA is to detect regions in the power consumption of a cryptographic device correlated with particular bits of the secret key [KJJ99]. The adversary guesses a key (hypothesis) and encrypts random plaintexts. Depending on a particular observed bit within the algorithm, whose state can be computed from the hypothesis, measured power traces are added or subtracted, yielding a differential trace. A correct hypothesis will provide a high correlation of the differential trace with the observed bit, which is indicated by distinct peaks. Contrary to SPA, no information about the target implementation is required. In [KJJ99], it was shown that DES [NIS77] and RSA [RSA78] can be broken by DPA.
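The partitioning idea can be illustrated with a small noise-free simulation. The 4-bit S-box below is PRESENT's, used purely as a stand-in (it is not part of DES or of the cited DPA work), and the "power sample" is simply the value of the target bit, an idealization of a Hamming-weight leak:

```python
# Idealized (noise-free) sketch of DPA partitioning. Only the correct key
# hypothesis separates the samples perfectly by the target bit.

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]  # PRESENT S-box, illustrative

def target_bit(p, key):
    return (SBOX[p ^ key] >> 3) & 1              # observed bit (here: MSB)

key = 11
# (plaintext, idealized power sample) pairs; the sample IS the leaked bit
samples = [(p, target_bit(p, key)) for p in range(16)]

def differential(hyp):
    # partition the samples by the bit predicted under the hypothesis,
    # then take the difference of the group means (the "differential trace")
    g0 = [s for p, s in samples if target_bit(p, hyp) == 0]
    g1 = [s for p, s in samples if target_bit(p, hyp) == 1]
    return abs(sum(g1) / len(g1) - sum(g0) / len(g0))

assert differential(key) == 1.0                  # perfect separation
assert max(range(16), key=differential) == key   # hypothesis with the peak
```

With real traces the samples carry noise and many time points, so the peak emerges only after averaging many measurements, but the partition-and-subtract structure is the same.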
3 Principle of the Internal Collision Attack
An internal collision occurs if a function of a cryptographic algorithm is applied to two different input arguments but returns the same output. We propose the term 'internal' collision because, in general, the collision will not propagate to the output of the algorithm. Since we are not able to detect it at the output, we correlate side channel information of the cryptographic device, e.g., 4
if a precharged bus design is used.
power traces, under the assumption that an internal collision will cause a high correlation of different encryptions (decryptions) at one point in time. Moreover, we assume that internal collisions which occur for particular plaintext (ciphertext) encryptions (decryptions) are somehow correlated with the secret key. A typical example of a function vulnerable to internal collisions is a surjective S-Box. However, many other functions, e.g., those based on finite field arithmetic, can cause collisions, too. In this publication, we exploit the fact that it is possible to cause a collision in the non-linear f-function of DES in order to gain secret key-bits. In Figure 1 the propagation path of a collision occurring in the f-function of round n is shown. The f-function in round n + 1 processes the same input data, but any further rounds will not be affected by the collision.
Fig. 1. Propagation path of an internal collision in DES.
An adversary encrypts (decrypts) particular plaintexts (ciphertexts) in order to cause an internal collision at one point of the algorithm. Detection of these collisions is possible by correlation of side channel information corresponding to different encryptions (decryptions), e.g., power traces of round n + 1.
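This propagation behavior can be sketched on a toy 8-bit Feistel network. The round function below is a made-up non-injective map standing in for the DES f-function, and all constants are arbitrary:

```python
# Toy 8-bit Feistel network: two right halves differing only in the high
# nibble collide in f, so round n+1 processes identical f-inputs.

def f(r, k):
    return (((r ^ k) & 0x0F) * 5) & 0xFF   # toy f: only the low nibble matters

def feistel_round(l, r, k):
    return r, l ^ f(r, k)

k, l = 0x3C, 0xA5
r1, r2 = 0x12, 0x12 ^ 0x30                 # inputs differing by ∆ = 0x30

assert f(r1, k) == f(r2, k)                # internal collision in round n

s1, s2 = feistel_round(l, r1, k), feistel_round(l, r2, k)
assert s1[1] == s2[1]                      # round n+1's f-input is identical
assert s1[0] != s2[0]                      # but the full states already differ

t1, t2 = feistel_round(*s1, k), feistel_round(*s2, k)
assert t1[1] != t2[1]                      # from round n+2 on, f-inputs diverge
```

This is exactly why the side channel of round n + 1 is the place to look for correlation: it is the only later round guaranteed to process identical data.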
4 Collisions within the DES f-Function

4.1 Collisions in Single S-Boxes

In this section we briefly recall that it is possible to cause collisions in isolated S-Boxes. However, as stated in [DDQ84], overall collisions in the f-function can only be caused within three S-Boxes simultaneously. For a detailed description of DES the reader is referred to, e.g., [NIS77, MvOV97]. The eight S-Box mappings {0,1}^6 → {0,1}^4 are surjective. Moreover, the mappings are uniformly distributed, which means that for each input z ∈ {0, . . . , 2^6 − 1} of S-Box S_i, i ∈ {1, . . . , 8}, there exist exactly three x-or differentials δ1, δ2, δ3 ∈ {1, . . . , 2^6 − 1} which will cause a collision within the single S-Box:

S_i(z) = S_i(z ⊕ δ1) = S_i(z ⊕ δ2) = S_i(z ⊕ δ3),  δ1 ≠ δ2 ≠ δ3 ≠ 0,  i ∈ {1, . . . , 8}
If, for example, the first S-Box is examined and z = 000000, then there exist three differentials δ1, δ2 and δ3 causing a collision:

S1(000000) = S1(000000 ⊕ 001001 = 001001) = S1(000000 ⊕ 100100 = 100100) = S1(000000 ⊕ 110111 = 110111) = 14

However, it is not possible to directly set the six-bit input z of an S-Box. The input z corresponds to a particular six-bit input x entering the f-function. This input x is diffused5 in the expansion permutation and x-ored with six key-bits k of the round key:

z = x ⊕ k ⇔ k = x ⊕ z,  k, x, z ∈ {0, . . . , 2^6 − 1}
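The uniformity claim — exactly three colliding differentials for every input — can be checked directly from the S-Box tables. The sketch below uses S-Box 1 as specified in FIPS 46-3 and reproduces the example above for z = 000000:

```python
# DES S-Box 1 (FIPS 46-3); each row is a permutation of 0..15, so every
# output value has exactly four preimages, i.e., three nonzero collision
# differentials per input.

S1 = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]

def s1(z):
    # DES convention: outer bits (z5, z0) select the row, middle four the column
    row = ((z >> 4) & 2) | (z & 1)
    col = (z >> 1) & 0xF
    return S1[row][col]

for z in range(64):
    deltas = [d for d in range(1, 64) if s1(z) == s1(z ^ d)]
    assert len(deltas) == 3                # three collisions for every input

# the example from the text: z = 000000 collides under 001001, 100100, 110111
deltas0 = sorted(d for d in range(1, 64) if s1(0) == s1(d))
assert deltas0 == [0b001001, 0b100100, 0b110111]
```

Running the same loop for all eight S-Boxes is exactly how the δ-tables of the following paragraph are built.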
A table can be generated for each S-Box which lists the three differentials δ1, δ2 and δ3 ∈ {1, . . . , 2^6 − 1} corresponding to all 64 S-Box inputs z ∈ {0, . . . , 2^6 − 1}. These eight tables can be re-sorted in order to list the inputs z ∈ {0, . . . , 2^6 − 1} corresponding to all occurring differentials δi ∈ {1, . . . , 2^6 − 1}. In the remainder of this publication these latter tables will be referred to as the δ-tables (as an example we included the δ-table of S-Box 1 in the appendix). In order to exploit the six key-bits k, an adversary chooses a particular δ and varies the input x until he/she detects a collision S(x ⊕ k) = S(x ⊕ k ⊕ δ). The two most and least significant bits of the inputs x and x ⊕ δ will also enter the adjacent S-Boxes due to the bit spreading of the expansion box. As shown in Figure 2, the inputs of the adjacent S-Boxes only remain unchanged if the two most and least significant bits of the differential δ are zero. However, such a differential δ does not exist, which is a known S-Box design criterion [Cop94]. Therefore a collision attack targeting a single S-Box while preserving the inputs of the two adjacent S-Boxes is not possible. 5
i.e. the two most and least significant bits of x will be x-ored with particular bits of the round key and then enter the adjacent S-Boxes.
Fig. 2. Required Bit Mask of δ for a Single S-Box Collision.
4.2 Collisions in Three S-Boxes
As stated in [DDQ84], it is possible to cause collisions within three adjacent S-Boxes simultaneously. In this case the inputs x and x ⊕ ∆ have a length of 18 bits6. The differential ∆ = δ1|δ2|δ3 denotes the concatenation of three S-Box differentials δ1, δ2, δ3 corresponding to each S-Box of the triple. In order not to alter the inputs of the two neighboring S-Boxes to the left and right of the S-Box triple, the two most and least significant bits of ∆ must be zero:

∆[0] = ∆[1] = ∆[16] = ∆[17] = 0

Moreover, in order to propagate through the expansion box, ∆ must fulfil the conditions:

∆[4] = ∆[6], ∆[5] = ∆[7], ∆[10] = ∆[12], ∆[11] = ∆[13]

Thus ∆ = δ1|δ2|δ3 must comply with the bit mask ∆ = 00 x1 x2 v w v w x3 x4 y z y z x5 x6 00 with xi, v, w, y, z ∈ {0, 1}, which is shown in Figure 3.

Fig. 3. Required S-Box triple ∆ Bit Mask.
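The two conditions above amount to a simple predicate on an 18-bit value; a sketch (the helper names and the example value are ours, not from the paper):

```python
# ∆[0] denotes the leftmost (most significant) of the 18 bits, matching
# the indexing used in the text.

def bit(delta, i):
    return (delta >> (17 - i)) & 1

def is_valid_triple_differential(delta):
    # the two most and least significant bits must be zero (neighbors untouched)
    if any(bit(delta, i) for i in (0, 1, 16, 17)):
        return False
    # bits shared through the expansion box must agree
    return (bit(delta, 4) == bit(delta, 6) and bit(delta, 5) == bit(delta, 7)
            and bit(delta, 10) == bit(delta, 12) and bit(delta, 11) == bit(delta, 13))

# one instance of the mask 00 x1 x2 v w v w x3 x4 y z y z x5 x6 0 0
example = int("001011110010101100", 2)
assert is_valid_triple_differential(example)
assert not is_valid_triple_differential(1 << 17)  # ∆[0] = 1 alters a neighbor
```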
Analysis of the δ-tables reveals that there exist many differentials ∆ which comply with the properties stated above. As a result, it is possible to cause collisions in an S-Box triple while preserving the inputs of the two neighboring S-Boxes. This means that there exist inputs x and x ⊕ ∆ which cause a collision f(x) = f(x ⊕ ∆) in the f-function. As an example, we assume that an adversary randomly varies exactly those 14 input bits of function f in the first round which enter the targeted S-Box triple. All 50 remaining bits of the plaintext are left unchanged. Within function f these bits are expanded to the 18-bit input x and x-ored with the 18 corresponding key-bits k of the 48-bit round key. The result z = x ⊕ k enters the targeted S-Box 6
We refer to x and x ⊕ ∆ as the inputs of function f after having propagated through the expansion box, i.e., they have a length of 18 bits, but x, x ⊕ ∆ ∈ {0, . . . , 2^14 − 1}.
triple. The adversary uses power analysis to record the power consumption of the cryptographic device during round two. Next, he sets the input to x ⊕ ∆ and again records the power consumption during round two. A high correlation of the two recorded power traces reveals that the same data was processed in function f in round two, i.e., a collision occurred. Once he detects a collision, analysis of the three corresponding δ-tables will reveal possible key candidates k = z ⊕ x.

Let Z_∆ denote the set of all possible 18-bit inputs z_i causing a collision in a particular S-Box triple for a particular differential ∆. For a fixed x, K is the set of all possible key candidates k_i:

K = {x ⊕ z_i} = {k_i},  z_i ∈ Z_∆

Therefore, the number of key candidates k_i is equal to the number of possible S-Box triple inputs z_i: |K| = |Z_∆|. However, for a particular 18-bit key k only those values of z_i can cause collisions for which x = z_i ⊕ k can propagate through the expansion box. Hence, we have to check whether all possible keys k ∈ {0, . . . , 2^18 − 1} can cause collisions for a particular z ∈ Z_∆. In particular, the eight bits k[4], k[5], k[6], k[7] and k[10], k[11], k[12], k[13] of the key k determine whether z_i ⊕ k yields a valid value of x. In general, we only use those differentials ∆ of an S-Box triple for which there exist inputs z_i which will yield a valid x = z_i ⊕ k for any key k ∈ {0, . . . , 2^18 − 1}. Thus any 18-bit key k can be classified into one of 2^8 possible key sets K_j, j ∈ {0, . . . , 2^8 − 1}. The set Z_{∆,K_j} of valid S-Box triple inputs z_i causing a collision for a given key k ∈ K_j is generally a subset of the set Z_∆:

Z_{∆,K_j} ⊆ Z_∆,  j ∈ {0, . . . , 2^8 − 1}

For a fixed key k ∈ K_j and a random x ∈ {0, . . . , 2^14 − 1} the probability of a collision is

P(f(x) = f(x ⊕ ∆) | k ∈ K_j) = |Z_{∆,K_j}| / 2^14

In general, two plaintexts x and x ⊕ ∆ have to be encrypted to check for a collision f(x) = f(x ⊕ ∆). The average number of encryptions #M until a collision occurs for a fixed key k is

#M = 2 / P(f(x) = f(x ⊕ ∆) | k ∈ K_j) = 2 · 2^14 / |Z_{∆,K_j}| = 2^15 / |Z_{∆,K_j}|

The total probability of a collision for an arbitrary key k ∈ K_j is

P(f(x) = f(x ⊕ ∆)) = Σ_{j=0}^{255} P(f(x) = f(x ⊕ ∆) | k ∈ K_j) · P(k ∈ K_j) = 2^−22 · Σ_{j=0}^{255} |Z_{∆,K_j}|

The average number of encryptions #M until a collision occurs for an arbitrary key k ∈ K_j is

#M = 2 · (1/256) · Σ_{j=0}^{255} 1 / P(f(x) = f(x ⊕ ∆) | k ∈ K_j) = 2^7 · Σ_{j=0}^{255} 1 / |Z_{∆,K_j}|
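The algebra behind the last two expressions can be sanity-checked numerically; the |Z_{∆,K_j}| values below are arbitrary placeholders, since the real sets come from the δ-tables:

```python
# Exact-arithmetic check that 2/256 * sum(1/P_j) == 2^7 * sum(1/|Z_j|)
# and that the total collision probability equals 2^-22 * sum(|Z_j|).
from fractions import Fraction

sizes = [Fraction(8 + (j % 5) * 4) for j in range(256)]   # hypothetical |Z_{∆,K_j}|

# P(f(x) = f(x ⊕ ∆) | k ∈ K_j) = |Z_{∆,K_j}| / 2^14
p_given_kj = [z / 2**14 for z in sizes]

avg_m = Fraction(2, 256) * sum(1 / p for p in p_given_kj)
assert avg_m == 2**7 * sum(1 / z for z in sizes)

# total probability, with P(k ∈ K_j) = 2^-8
total_p = sum(p * Fraction(1, 256) for p in p_given_kj)
assert total_p == Fraction(1, 2**22) * sum(sizes)
```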
5 Optimization of the Collision Attack

5.1 Multiple Differentials
In order to decrease the number of encryptions until a collision occurs, the attack can be extended to n differentials ∆_1, . . . , ∆_n, yielding a set of 2^n possible encryptions f(x), f(x ⊕ ∆_1), f(x ⊕ ∆_2), f(x ⊕ ∆_2 ⊕ ∆_1), . . . , f(x ⊕ ∆_n ⊕ . . . ⊕ ∆_1) for a fixed x. We are now looking for collisions between any two encryptions, which has the potential to dramatically increase the likelihood of a collision due to the birthday paradox. A collision f(x′) = f(x″) can only occur if x′ ⊕ x″ equals a differential ∆_j, with j ∈ {1, . . . , n}. In Table 1 the costs of the attacks using a single differential ∆ and using n differentials ∆_1, . . . , ∆_n are compared.

Table 1. Comparison of the collision attacks using a single and multiple differentials.

                  single ∆   multiple ∆'s
#x                   m            m
#∆                   1            n
#M                  2·m         m·2^n
#collision tests     m        m·n·2^(n−1)
For example, using a single ∆, the random generation of m = 64 inputs x will result in #M = 128 encryptions and will only yield m = 64 collision tests f(x) = f(x ⊕ ∆). Using n = 4 differentials ∆_1, . . . , ∆_4, the random generation of m = 8 inputs x will also result in #M = 8 · 2^4 = 128 encryptions, but will yield 8 · 4 · 2^3 = 256 collision tests. In this example, with the same number of encryptions we are able to perform four times as many collision tests, which results in a higher probability of a collision. As an example, Figure 4 shows a set of 2^n = 2^3 = 8 encryptions for n = 3 differentials ∆_1, ∆_2 and ∆_3. In this case n · 2^(n−1) = 3 · 2^2 = 12 possible collisions A1, A2, . . . , C4 can occur with the following probabilities:

P_1 = P(A1) = P(A2) = P(A3) = P(A4) = P(f(x) = f(x ⊕ ∆_1))
P_2 = P(B1) = P(B2) = P(B3) = P(B4) = P(f(x) = f(x ⊕ ∆_2))
P_3 = P(C1) = P(C2) = P(C3) = P(C4) = P(f(x) = f(x ⊕ ∆_3))
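The counts in Table 1 and in this example follow from simple counting, which can be sketched as follows (note that n = 1 recovers the single-∆ column):

```python
# Cost model from Table 1: m base inputs, 2^n encryptions per input, and
# n * 2^(n-1) collision-test pairs per set of 2^n encryptions.

def costs(m, n):
    encryptions = m * 2**n
    collision_tests = m * n * 2**(n - 1)
    return encryptions, collision_tests

assert costs(64, 1) == (128, 64)   # single-∆ example from the text
assert costs(8, 4) == (128, 256)   # n = 4: same encryptions, 4x the tests
```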
Fig. 4. Possible collision tests for n = 3 differentials.
If collision tests A1, A2, . . . , C4 are stochastically independent7, the overall probability can also be expressed as:

P((A1 ∪ A2 ∪ A3 ∪ A4) ∪ (B1 ∪ B2 ∪ B3 ∪ B4) ∪ (C1 ∪ C2 ∪ C3 ∪ C4))
= 1 − [(1 − P(A1)) · (1 − P(A2)) · . . . · (1 − P(C3)) · (1 − P(C4))]
≈ P(A1) + P(A2) + . . . + P(C4)

In general, if n differentials are being used and there exist no stochastic dependencies among collision tests, the overall probability that at least one collision will occur within a set of 2^n encryptions is

P(collision) = 1 − (∏_{i=1}^{n} (1 − P_i))^(2^(n−1)) ≈ 2^(n−1) · Σ_{i=1}^{n} P_i

with P_i = P(f(x) = f(x ⊕ ∆_i))
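For small P_i the approximation above is tight, as a quick numeric check shows (the probability values are illustrative only):

```python
# Compare the exact overall collision probability with its first-order
# approximation 2^(n-1) * sum(P_i) for small per-test probabilities.
from math import prod

def p_collision_exact(ps):
    n = len(ps)
    return 1 - prod(1 - p for p in ps) ** (2 ** (n - 1))

def p_collision_approx(ps):
    n = len(ps)
    return 2 ** (n - 1) * sum(ps)

ps = [0.001, 0.002, 0.0015]
exact, approx = p_collision_exact(ps), p_collision_approx(ps)
assert exact < approx                        # union bound: exact never exceeds it
assert abs(exact - approx) / approx < 0.01   # within 1% for these magnitudes
```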
So far we have assumed that collision tests are stochastically independent, i.e., the occurrence of a particular collision does not condition any other collision within a set of encryptions. Surprisingly, analysis of the collision sets Z_∆ revealed that stochastic dependencies among collision tests do exist for certain differentials. In general, stochastically dependent collision tests are not desired, because they decrease the overall probability of a collision within a set of encryptions.

5.2 Linear Dependencies

By analysis we discovered that there exist many linear combinations among the differentials ∆ of all eight S-Box triples. In an attack based on multiple differentials ∆_1, . . . , ∆_n, linear combinations of these will eventually yield additional
i.e. the occurrence of a collision does not depend on any other collision test within a set of 2n encryptions.
Fig. 5. Further collisions in single S-Boxes.
differentials ∆_j. As a result, further collision tests can be performed without increasing the number of encryptions. Thus the probability of a collision within a set of 2^n encryptions is increased:

∆_j = a_1 · ∆_1 ⊕ . . . ⊕ a_n · ∆_n,  a_i ∈ {0, 1},  ∆_j ≠ ∆_1, . . . , ∆_n
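Finding such "free" differentials is a mechanical search over XOR combinations. The sketch below uses a hypothetical toy table of 4-bit differentials rather than the real δ-tables:

```python
# Search for valid differentials that are XOR combinations of the chosen
# base differentials; each hit yields extra collision tests at no cost.
from itertools import combinations

valid = {0b0110, 0b1010, 0b1100, 0b0011, 0b0101}   # toy differential table

base = [0b0110, 0b1010, 0b0101]                    # chosen ∆_1, ∆_2, ∆_3
extra = set()
for r in range(2, len(base) + 1):
    for combo in combinations(base, r):
        d = 0
        for x in combo:
            d ^= x                                 # XOR-combine the subset
        if d in valid and d not in base:
            extra.add(d)

# 0110 ^ 1010 = 1100 is itself a valid differential -> one free differential
assert 0b1100 in extra
```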
The improvement achieved by exploiting linear combinations among differentials is shown in the next example. An adversary tries to cause a collision in S-Boxes 2,3,4 using n = 5 differentials ∆_3, ∆_13, ∆_15, ∆_16 and ∆_21. Analysis of the δ-tables of S-Boxes 2, 3 and 4 reveals that the following linear combinations exist:

∆_1 = ∆_3 ⊕ ∆_13 ⊕ ∆_15
∆_2 = ∆_3 ⊕ ∆_13 ⊕ ∆_16
∆_4 = ∆_3 ⊕ ∆_15 ⊕ ∆_16
∆_14 = ∆_13 ⊕ ∆_15 ⊕ ∆_16
∆_22 = ∆_15 ⊕ ∆_16 ⊕ ∆_21
∆_23 = ∆_13 ⊕ ∆_15 ⊕ ∆_21
∆_24 = ∆_13 ⊕ ∆_16 ⊕ ∆_21

These seven linear combinations allow the adversary to check 7 · 2^(n−1) = 112 additional collision tests for each set of 2^n = 32 encryptions. The total number of collision tests for a set of 32 encryptions is thus (n + 7) · 2^(n−1) = 192.

5.3 Key Candidate Reduction
Once a first collision has occurred, further collisions will provide additional key sets K_i. The intersection K_int of these sets delimits the number of key candidates:

K_int = K_1 ∩ K_2 ∩ . . . ∩ K_j

Additional collisions can be found efficiently by fixing the input of two S-Boxes and only varying the input of the third S-Box. Due to the bit spreading in the expansion box, not all input bits of the third S-Box can be varied. Only bits 2-5 of the S-Box to the left, bits 2 and 3 of the middle S-Box, and bits 0-3 of the S-Box to the right can be varied without altering the inputs of the other two S-Boxes. Analysis of the collision set Z_∆ provides all existing x-or differences ε = z′ ⊕ z″ with z′, z″ ∈ Z_∆. The theoretical8 maximum number of differentials ε which only alter 8
disregarding the S-Box design criteria.
Fig. 6. Measurement setup for power analysis of a microcontroller.
the input of a single S-Box is 15 + 3 + 15 = 33. For any existing ε, further collisions f(x ⊕ ε) = f(x ⊕ ε ⊕ ∆) might be detected. For example, an adversary tries to cause collisions in S-Boxes 1,2,3 using differential ∆_3. A first collision f(x) = f(x ⊕ ∆_3) yields |Z_∆3| = 1120 possible key candidates. Analysis of the collision set Z_∆3 reveals that there exist 18 out of 33 differentials ε_i which comply with the conditions stated above. The adversary tries to find further collisions f(x ⊕ ε_i) = f(x ⊕ ε_i ⊕ ∆_3) and is able to detect eight additional collisions, delimiting the number of key candidates from 1120 down to 16.
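The key-reduction step itself is a plain set intersection; a minimal sketch with made-up candidate sets:

```python
# Every detected collision contributes a candidate set K_i; the true
# 18-bit key must lie in all of them. Values here are illustrative only.
k_sets = [
    {0x00A1, 0x1F00, 0x2C3D, 0x0042},   # from the first collision
    {0x2C3D, 0x0042, 0x3FFF},           # from a second collision
    {0x0042, 0x2C3D, 0x0001},           # from a third collision
]

k_int = set.intersection(*k_sets)
assert k_int == {0x0042, 0x2C3D}        # candidates reduced to two
```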
6 Practical Attack
In order to verify the DES collision attack, we simulated it on a PC. In addition, an 8051 compatible microcontroller running a software implementation of DES was successfully compromised using the proposed collision attack. The measurement setup used in this practical attack is shown in Figure 6. In this setup a PC sends chosen plaintexts to the microcontroller and triggers new encryptions. In order to measure the power consumption of the microcontroller, a small shunt resistor (here Rs = 10 Ω) is put in series between the ground pad and the ground of the power supply. We also replaced the original voltage source of the microcontroller with a low-noise voltage source to minimize noise superimposed by the source. The digital oscilloscope HP1662AS was used to sample the voltage across the shunt resistor at 1 GHz. Collisions were caused in the first round of DES. Power traces of round two were transferred to the PC using the GPIB interface. The PC was used to correlate power traces of different encryptions in order to detect collisions. In our experiments we discovered that a correlation coefficient greater than 95% generally indicated a collision. If no collision occurred, the correlation coefficient was always well below 95%, typically ranging from 50% to 80%. In general, uncorrelated noise such as voltage source noise, quantization noise of the oscilloscope or intrinsic noise within the microcontroller can be decreased by averaging power traces of equal encryptions9 . In
we assume that no countermeasures such as random dummy cycles are present.
Fig. 7. Power consumption of the microcontroller encrypting x.
Fig. 8. Power consumption of the microcontroller encrypting x ⊕ ∆.
our experiments we found that averaging N = 10 power traces was clearly sufficient to achieve the significant correlation results stated above. Averaging may not even be necessary at all if additional measurement circuitry is used in order to decouple the external voltage source from the target hardware, or if data is acquired at higher sampling rates. For example, Figures 7 and 8 show the averaged power traces of two different plaintext encryptions x and x ⊕ ∆ during the S-Box look-up in round two. The two power traces clearly differ in their peaks. This indicates a low correlation, i.e., no collision occurred.
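Collision detection reduces to the sample correlation coefficient of two averaged traces, thresholded at roughly 0.95 as described above. A minimal sketch with synthetic traces:

```python
# Pearson correlation coefficient of two (averaged) power traces.
import math

def correlation(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = math.sqrt(sum((x - ma) ** 2 for x in a))
    vb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb)

trace1 = [1.0, 3.0, 2.0, 5.0, 4.0]
trace2 = [1.1, 3.2, 1.9, 5.1, 4.0]          # nearly identical processing
trace3 = [5.0, 1.0, 4.0, 2.0, 3.0]          # unrelated data
assert correlation(trace1, trace2) > 0.95   # collision indicated
assert correlation(trace1, trace3) < 0.95   # no collision
```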
7 Results and Conclusions

We proposed a new kind of attack which uses side channel analysis to detect internal collisions. In this paper the well-known block cipher DES is attacked. However, the attack can be applied to any cryptographic function in which internal collisions are possible. We showed that internal collisions can be caused within three adjacent S-Boxes of DES, yielding secret key information. Furthermore, we presented different methods to minimize the cost of finding such collisions. In our computer simulations we heuristically searched for the optimum combination of differentials ∆_i for all eight S-Box triples in order to minimize the number of required encryptions until a collision occurred. The results of this exhaustive search are listed in Table 2, where #M denotes the average10 number of encryptions until a collision occurs and #K denotes the average number of key candidates corresponding to 18 key-bits found after applying the key reduction method. As a result, we were able to cause a collision in S-Box triple 2,3,4 with a minimum average of 140 encryptions. Using the key reduction method we were able to delimit the 18 key-bits to an average of 220 key candidates, which is equivalent to log2(220) ≈ 7.8 key-bits, i.e., 10.2 key-bits were broken. Moreover, we were able to cause collisions in S-Box triple 7,8,1 with an average of 165 encryptions, yielding on average 19 key candidates and thus breaking 18 − log2(19) ≈ 13.8 key-bits. Finally, we successfully validated our attack by compromising an 8051 compatible microcontroller running DES in software.

Table 2. Results of the exhaustive search for the S-Box triple/∆ set optimum.

S-Boxes  #∆  ∆_1, ∆_2, . . .                 #M   #K
1,2,3    3   ∆_3, ∆_15, ∆_18                227   20
2,3,4    5   ∆_3, ∆_13, ∆_15, ∆_16, ∆_21    140  220
3,4,5    3   ∆_3, ∆_10, ∆_12                190  110
4,5,6    3   ∆_2, ∆_10, ∆_11                690   71
5,6,7    5   ∆_2, ∆_5, ∆_8, ∆_23, ∆_29      290   24
6,7,8    5   ∆_7, ∆_10, ∆_19, ∆_20, ∆_32    186   52
7,8,1    5   ∆_1, ∆_2, ∆_7, ∆_17, ∆_19      165   19
8,1,2    4   ∆_1, ∆_2, ∆_8, ∆_38            208  158
References

[AARR02] D. Agrawal, B. Archambeault, J. R. Rao, and P. Rohatgi. The EM Side-Channel(s). In Ç. K. Koç and C. Paar, editors, Cryptographic Hardware and Embedded Systems – CHES 2002. Springer-Verlag, 2002.
[AK96] R. Anderson and M. Kuhn. Tamper Resistance – a Cautionary Note. In Second USENIX Workshop on Electronic Commerce, pages 1–11, November 1996.
[BGW98a] M. Briceno, I. Goldberg, and D. Wagner. An Implementation of the GSM A3A8 algorithm, 1998. http://www.scard.org/gsm/a3a8.txt.
[BGW98b] M. Briceno, I. Goldberg, and D. Wagner. GSM cloning, 1998. http://www.isaac.cs.berkely.edu/isaac/gsm–faq.html.
[CC00] C. Clavier and J.-S. Coron. On Boolean and Arithmetic Masking against Differential Power Analysis. In Ç. K. Koç and C. Paar, editors, Cryptographic Hardware and Embedded Systems – CHES 2000, volume LNCS 1965, pages 231–237. Springer-Verlag, 2000.
10
averaged over 10,000 random keys.
[CCD00a] C. Clavier, J.-S. Coron, and N. Dabbous. Differential Power Analysis in the Presence of Hardware Countermeasures. In Ç. K. Koç and C. Paar, editors, Cryptographic Hardware and Embedded Systems – CHES 2000, volume LNCS 1965, pages 252–263. Springer-Verlag, 2000.
[CCD00b] C. Clavier, J.-S. Coron, and N. Dabbous. Differential Power Analysis in the Presence of Hardware Countermeasures. In Ç. K. Koç and C. Paar, editors, Cryptographic Hardware and Embedded Systems – CHES 2000, volume LNCS 1965, pages 252–263. Springer-Verlag, 2000.
[CJR+99a] S. Chari, C. S. Jutla, J. R. Rao, and P. Rohatgi. A Cautionary Note Regarding the Evaluation of AES Candidates on Smart Cards. In Proceedings: Second AES Candidate Conference (AES2), Rome, Italy, March 1999.
[CJR+99b] S. Chari, C. S. Jutla, J. R. Rao, and P. Rohatgi. Towards Sound Approaches to Counteract Power-Analysis Attacks. In Advances in Cryptology – CRYPTO '99, volume LNCS 1666, pages 398–412. Springer-Verlag, August 1999.
[Cop94] D. Coppersmith. The Data Encryption Standard (DES) and its Strength Against Attacks. Technical Report RC 18613, IBM Thomas J. Watson Research Center, December 1994.
[Cor99] J.-S. Coron. Resistance against Differential Power Analysis for Elliptic Curve Cryptosystems. In Ç. K. Koç and C. Paar, editors, Cryptographic Hardware and Embedded Systems – CHES 1999, volume LNCS 1717, pages 292–302. Springer-Verlag, 1999.
[dBB94] B. den Boer and A. Bosselaers. Collisions for the Compression Function of MD5. In T. Helleseth, editor, Advances in Cryptology – EUROCRYPT '93, volume LNCS 0765, pages 293–304, Berlin, Germany, 1994. Springer-Verlag.
[DDQ84] M. Davio, Y. Desmedt, and J.-J. Quisquater. Propagation Characteristics of the DES. In Advances in Cryptology – CRYPTO '84, pages 62–74. Springer-Verlag, 1984.
[Dob97] H. Dobbertin. RIPEMD with two-round compress function is not collision-free. Journal of Cryptology, 10:51–68, 1997.
[Dob98] H. Dobbertin. Cryptanalysis of MD4. Journal of Cryptology, 11:253–271, 1998.
[FR99] P. N. Fahn and P. K. Pearson. IPA: A New Class of Power Attacks. In Ç. K. Koç and C. Paar, editors, Cryptographic Hardware and Embedded Systems – CHES 1999, volume LNCS 1717, pages 173–186. Springer-Verlag, 1999.
[GP99] L. Goubin and J. Patarin. DES and Differential Power Analysis. In Ç. K. Koç and C. Paar, editors, Cryptographic Hardware and Embedded Systems – CHES 1999, volume LNCS 1717, pages 158–172. Springer-Verlag, 1999.
[GSM98] Technical Information – GSM System Security Study, 1998. http://jya.com/gsm061088.htm.
[KJJ98] P. Kocher, J. Jaffe, and B. Jun. Introduction to Differential Power Analysis and Related Attacks. http://www.cryptography.com/dpa/technical, 1998. Manuscript, Cryptography Research, Inc.
[KJJ99] P. Kocher, J. Jaffe, and B. Jun. Differential Power Analysis. In Advances in Cryptology – CRYPTO '99, volume LNCS 1666, pages 388–397. Springer-Verlag, 1999.
[MDS99]
T. S. Messerges, E. A. Dabbish, and R. H. Sloan. Investigations of Power Analysis Attacks on Smartcards. In USENIX Workshop on Smartcard Technology, pages 151–162, 1999.
[Mes00] T. S. Messerges. Using Second-Order Power Analysis to Attack DPA Resistant Software. In Ç. K. Koç and C. Paar, editors, Cryptographic Hardware and Embedded Systems – CHES 2000, volume LNCS 1965, pages 238–251. Springer-Verlag, 2000.
[MS00] R. Mayer-Sommer. Smartly Analyzing the Simplicity and the Power of Simple Power Analysis on Smart Cards. In Ç. K. Koç and C. Paar, editors, Cryptographic Hardware and Embedded Systems – CHES 2000, volume LNCS 1965, pages 78–92. Springer-Verlag, 2000.
[Mui01] J. A. Muir. Techniques of Side Channel Cryptanalysis. Master's thesis, University of Waterloo, Canada, 2001.
[MvOV97] A. J. Menezes, P. C. van Oorschot, and S. A. Vanstone. Handbook of Applied Cryptography. CRC Press, Boca Raton, Florida, USA, 1997.
[NIS77] NIST FIPS PUB 46-3. Data Encryption Standard. Federal Information Processing Standards, National Bureau of Standards, U.S. Department of Commerce, Washington D.C., 1977.
[NIS95] NIST FIPS PUB 180-1. Secure Hash Standard. Federal Information Processing Standards, National Bureau of Standards, U.S. Department of Commerce, Washington D.C., April 1995.
[Riv92] R. Rivest. RFC 1320: The MD4 Message-Digest Algorithm. Corporation for National Research Initiatives, Internet Engineering Task Force, Network Working Group, Reston, Virginia, USA, April 1992.
[RSA78] R. L. Rivest, A. Shamir, and L. Adleman. A Method for Obtaining Digital Signatures and Public-Key Cryptosystems. Communications of the ACM, 21(2):120–126, February 1978.
[Sha00] A. Shamir. Protecting Smart Cards from Power Analysis with Detached Power Supplies. In Ç. K. Koç and C. Paar, editors, Cryptographic Hardware and Embedded Systems – CHES 2000, volume LNCS 1965, pages 71–77. Springer-Verlag, 2000.
[Vau94] S. Vaudenay. On the Need of Multipermutations: Cryptanalysis of MD4 and SAFER. In Fast Software Encryption – FSE '94, volume LNCS 1008, pages 286–297, Berlin, Germany, 1994. Springer-Verlag.
[Wie03] A. Wiemers. Partial Collision Search by Side Channel Analysis. Presentation at the workshop Smartcards and Side Channel Attacks, Horst Görtz Institute, Bochum, Germany, January 2003.
A
S-Box 1 δ-Table
As an example, the δ-table of S-Box 1 lists all inputs z corresponding to the differentials δ that occur, i.e. those fulfilling the condition S1(z) = S1(z ⊕ δ). The inputs z are listed in pairs (z, z ⊕ δ), because both values fulfil the condition: Si(z) = Si(z ⊕ δ) ⇔ Si(z ⊕ δ) = Si((z ⊕ δ) ⊕ δ). For convenience, the column and row position of each input z within the S-Box matrix is also given in parentheses.
Kai Schramm, Thomas Wollinger, and Christof Paar

Table 3. S-Box 1: S1(z) = S1(z ⊕ δ).
δ       #z   (z1, z1 ⊕ δ), (z2, z2 ⊕ δ), ...
000011  14   (001000(04,0), 001011(05,1)), (010001(08,1), 010010(09,0)), (010101(10,1), 010110(11,0)), (011000(12,0), 011011(13,1)), (011001(12,1), 011010(13,0)), (100101(02,3), 100110(03,2)), (111001(12,3), 111010(13,2))
000101   4   (000010(01,0), 000111(03,1)), (111011(13,3), 111110(15,2))
000111   2   (010011(09,1), 010100(10,0))
001001  10   (000000(00,0), 001001(04,1)), (000011(01,1), 001010(05,0)), (000100(02,0), 001101(06,1)), (000110(03,0), 001111(07,1)), (100000(00,2), 101001(04,3))
001011   2   (100111(03,3), 101100(06,2))
001101   6   (010000(08,0), 011101(14,1)), (110001(08,3), 111100(14,2)), (110101(10,3), 111000(12,2))
001111   2   (100010(01,2), 101101(06,3))
010001   6   (001110(07,0), 011111(15,1)), (100001(00,3), 110000(08,2)), (100011(01,3), 110010(09,2))
010011   2   (100100(02,2), 110111(11,3))
010111   4   (101000(04,2), 111111(15,3)), (101010(05,2), 111101(14,3))
011001   2   (101111(07,3), 110110(11,2))
011011   4   (000101(02,1), 011110(15,0)), (001100(06,0), 010111(11,1))
011101   4   (000001(00,1), 011100(14,0)), (101110(07,2), 110011(09,3))
011111   2   (101011(05,3), 110100(10,2))
100010  10   (000010(01,0), 100000(00,2)), (000011(01,1), 100001(00,3)), (001100(06,0), 101110(07,2)), (001111(07,1), 101101(06,3)), (011100(14,0), 111110(15,2))
100100  12   (000000(00,0), 100100(02,2)), (000110(03,0), 100010(01,2)), (001000(04,0), 101100(06,2)), (010110(11,0), 110010(09,2)), (010111(11,1), 110011(09,3)), (011000(12,0), 111100(14,2))
100101   6   (001101(06,1), 101000(04,2)), (010000(08,0), 110101(10,3)), (011101(14,1), 111000(12,2))
100111  10   (000111(03,1), 100000(00,2)), (001011(05,1), 101100(06,2)), (010101(10,1), 110010(09,2)), (011011(13,1), 111100(14,2)), (011100(14,0), 111011(13,3))
101000  12   (001110(07,0), 100110(03,2)), (010000(08,0), 111000(12,2)), (010001(08,1), 111001(12,3)), (010010(09,0), 111010(13,2)), (011101(14,1), 110101(10,3)), (011110(15,0), 110110(11,2))
101001   4   (010100(10,0), 111101(14,3)), (011000(12,0), 110001(08,3))
101010   4   (000101(02,1), 101111(07,3)), (011011(13,1), 110001(08,3))
101011  12   (000010(01,0), 101001(04,3)), (000110(03,0), 101101(06,3)), (001010(05,0), 100001(00,3)), (001110(07,0), 100101(02,3)), (010001(08,1), 111010(13,2)), (010010(09,0), 111001(12,3))
101100   4   (000100(02,0), 101000(04,2)), (001011(05,1), 100111(03,3))
101101   6   (001001(04,1), 100100(02,2)), (001111(07,1), 100010(01,2)), (011001(12,1), 110100(10,2))
101110   6   (000111(03,1), 101001(04,3)), (010011(09,1), 111101(14,3)), (011010(13,0), 110100(10,2))
101111   2   (001000(04,0), 100111(03,3))
110001   4   (011010(13,0), 101011(05,3)), (011110(15,0), 101111(07,3))
110010   4   (001101(06,1), 111111(15,3)), (011001(12,1), 101011(05,3))
110011   4   (000011(01,1), 110000(08,2)), (000101(02,1), 110110(11,2))
110101   2   (010110(11,0), 100011(01,3))
110110   2   (010101(10,1), 100011(01,3))
110111   2   (000000(00,0), 110111(11,3))
111001   6   (010011(09,1), 101010(05,2)), (010111(11,1), 101110(07,2)), (011111(15,1), 100110(03,2))
111010   6   (000001(00,1), 111011(13,3)), (001010(05,0), 110000(08,2)), (011111(15,1), 100101(02,3))
111011   2   (000100(02,0), 111111(15,3))
111110   4   (001001(04,1), 110111(11,3)), (010100(10,0), 101010(05,2))
111111   4   (000001(00,1), 111110(15,2)), (001100(06,0), 110011(09,3))
Further Observations on the Structure of the AES Algorithm Beomsik Song and Jennifer Seberry Centre for Computer Security Research School of Information Technology and Computer Science University of Wollongong Wollongong 2522, Australia {bs81,jennifer seberry}@uow.edu.au
Abstract. We present further observations on the structure of the AES algorithm relating to the cyclic properties of the functions used in this cipher. We note that the maximal period of the linear layer of the AES algorithm is short, as previously observed by S. Murphy and M.J.B. Robshaw. However, we also note that when the non-linear and the linear layer are combined, the maximal period increases dramatically, denying algebraic clues of this kind for cryptanalysis. At the end of this paper we describe the impact of our observations on the security of the AES algorithm. We conclude that although the AES algorithm consists of simple functions, this cipher is much more complicated than might have been expected.
Keywords: Cyclic Properties, SubBytes transformation, ShiftRows transformation, MixColumns transformation, Maximal period.
1
Introduction
Rijndael [4], a well-designed block cipher with an SPN (Substitution Permutation Network) structure, was recently (26 November 2001) selected as the AES (Advanced Encryption Standard) algorithm [11]. This cipher is reputed to be secure against conventional cryptanalytic methods [4, 8], such as DC (Differential Cryptanalysis) [1] and LC (Linear Cryptanalysis) [7], and throughout the AES process the security of the AES algorithm was examined with a considerable range of cryptanalytic methods [2-4, 13, 14]. But despite the novelty of the AES algorithm [5], the fact that it uses mathematically simple functions [6, 12, 15, 16] has led some commentators to express concern about the security of this cipher. In particular, S. Murphy and M.J.B. Robshaw [15, 16] have modified the original structure of the AES algorithm so that the affine transformation used for generating the S-box (non-linear layer) is included in the linear layer, and have shown that any input to the modified linear layer of the AES algorithm is mapped to itself after 16 iterations of the linear transformation (the maximal period of the modified linear layer is 16) [15, 16]. Based on this observation, they have remarked that the linear layer of the AES algorithm may not be so effective at mixing data.

T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 223-234, 2003.
© International Association for Cryptologic Research 2003

At this stage, to make the concept of "mixing data" clear, we briefly define the
effect of mixing data, which Murphy and Robshaw considered. We define that, for a set K consisting of n elements, if an input of a function F is mapped to itself after p iterations of the function, then the effect of mixing data is e = p/n.

In this paper, we present further observations on the AES algorithm in terms of its cyclic properties. We examine the cyclic properties of the AES algorithm via each function in the original structure. We note that the maximal period of each function used in the AES algorithm is short, and that the maximal period of the composition of the functions used in the linear layer is also short. We however note that composing the non-linear layer with the linear layer dramatically increases the maximal period of the basic structure, guaranteeing a strong data-mixing effect. Specifically, we have found that:
• any input data block of the SubBytes transformation (non-linear layer) returns to the initial state after 277182 (≈ 2^18) repeated applications (the maximal period of the SubBytes transformation is 277182);
• any input data block of the ShiftRows transformation (in the linear layer) returns to the initial state after 4 repeated applications (the maximal period of the ShiftRows transformation is 4);
• any input data block of the MixColumns transformation (in the linear layer) likewise returns to the initial state after 4 repeated applications (the maximal period of the MixColumns transformation is 4);
• when the ShiftRows transformation and the MixColumns transformation in the linear layer are considered together, the maximal period is 8;
• when the SubBytes transformation (non-linear layer) and the ShiftRows transformation (in the linear layer) are considered together, the maximal period is 554364 (≈ 2^19).
More importantly, we have found that the maximal period of the composition of the SubBytes transformation (non-linear layer) and the MixColumns transformation (in the linear layer) is 1,440,607,416,177,321,097,705,832,170,004,940 (≈ 2^110). Our observations indicate that the structure of the AES algorithm produces a remarkable synergy effect in mixing data when the linear and the non-linear layers are combined. In the last part of this paper we discuss the relevance of our observations to the security of the AES algorithm.
This paper is organised as follows: the description of the AES algorithm is presented in Section 2; the cyclic properties of the functions are described in Section 3; the impact of our observations on the security of the AES algorithm is discussed in Section 4; and the conclusion is given in Section 5.
2
Description of the AES Algorithm
The AES algorithm is an SPN-structure block cipher, which processes variable-length blocks with variable-length keys (128, 192, and 256 bits). In the standard case, it processes data blocks of 128 bits with a 128-bit Cipher Key [4, 11]. In this paper we discuss the standard case, because the results of our observations are similar in the other cases.
[Figure 1: a 4 × 4 byte input block (i00 ... i33) passes through the SubBytes transformation (non-linear layer), then the ShiftRows and MixColumns transformations (linear layer), and is finally XORed with the Round Key to give the output block (O00 ... O33).]

Fig. 1. Basic structure of the AES algorithm.
As Figure 1 shows, the AES algorithm consists of a non-linear layer (SubBytes transformation) and a linear layer (ShiftRows transformation and MixColumns transformation). Each byte in the block is substituted bytewise by the SubBytes transformation using a 256-byte S-box, and then every byte in each row is cyclically shifted by a certain value (row #0: 0, row #1: 1, row #2: 2, row #3: 3) by the ShiftRows transformation. After this, all four bytes in each column are mixed through the MixColumns transformation by the matrix formula in Figure 2. Here, each column is considered as a polynomial over GF(2^8) and multiplied with the fixed polynomial 03·x^3 + 01·x^2 + 01·x + 02 (modulo x^4 + 1). After these operations, a 128-bit round key extended from the Cipher Key is XORed in the last part of the round. The MixColumns transformation is omitted in the last round (10th round), but before the first round a 128-bit initial round key is XORed through the initial round key addition routine.
The round keys are derived from the Cipher Key in the following manner. Let us denote the columns of the Cipher Key by CK0, CK1, CK2, CK3, the columns of the round keys by K0, K1, K2, ..., K43, and the round constants by Rcon. Then the columns of the round keys are
K0 = CK0, K1 = CK1, K2 = CK2, K3 = CK3,
Kn = Kn-4 ⊕ SubBytes(RotBytes(Kn-1)) ⊕ Rcon   if 4 | n,
Kn = Kn-4 ⊕ Kn-1   otherwise.
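The recurrence above can be checked directly against the standard. The sketch below is a minimal AES-128 key expansion of our own (function names are ours); rather than hard-coding the S-box, it rebuilds it from its FIPS-197 definition (multiplicative inversion in GF(2^8) followed by the affine map).

```python
def gmul(a, b):
    """Multiplication in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11b
    return r

def build_sbox():
    """AES S-box: multiplicative inverse in GF(2^8), then the affine map."""
    inv = [0] * 256
    for x in range(1, 256):
        inv[x] = next(y for y in range(1, 256) if gmul(x, y) == 1)
    def affine(b):
        r = 0x63
        for i in range(5):  # b XOR its four left rotations, XOR 0x63
            r ^= ((b << i) | (b >> (8 - i))) & 0xff
        return r
    return [affine(inv[x]) for x in range(256)]

SBOX = build_sbox()

def expand_key(key16):
    """AES-128 key schedule: 16-byte Cipher Key -> 44 four-byte columns K0..K43."""
    w = [list(key16[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 0x01
    for n in range(4, 44):
        t = list(w[n - 1])
        if n % 4 == 0:
            t = t[1:] + t[:1]            # RotBytes: cyclic shift of the column
            t = [SBOX[b] for b in t]     # SubBytes on each byte
            t[0] ^= rcon                 # Rcon acts on the first byte only
            rcon = gmul(rcon, 0x02)      # next round constant
        w.append([a ^ b for a, b in zip(w[n - 4], t)])
    return w

# Worked example key from FIPS-197, Appendix A.1
w = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
# w[4] == [0xa0, 0xfa, 0xfe, 0x17]
```

The test key and its expansion (K4 = a0fafe17, round-10 key starting d014f9a8) are the worked example of FIPS-197, Appendix A.1.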
3
Cyclic Properties of the Functions
In this section, we examine the cyclic properties of the functions used in the AES algorithm. The cyclic property of each function is examined first, and then the cyclic properties of the combined functions are obtained. For future reference, we define f^n(I) = (f ◦ f ◦ ⋯ ◦ f)(I), the n-fold application of f.
    | O0c |   | 02 03 01 01 |   | i0c |
    | O1c | = | 01 02 03 01 | · | i1c |
    | O2c |   | 01 01 02 03 |   | i2c |
    | O3c |   | 03 01 01 02 |   | i3c |

Fig. 2. Mixing of four bytes in a column.
3.1
Cyclic Property of Each Function
Cyclic Property of the SubBytes Transformation. From the analysis of the 256 substitution values in the S-box, we have found the maximal period of the SubBytes transformation (non-linear layer).

Property 1 Every input byte of the S-box returns to its initial value after some t repeated applications of the substitution. In other words, for any input i of the S-box S, S^t(i) = i. The 256 values of the input byte can be classified into five small groups, as in Table 1, according to the values of t. The number of values in each group (equal to the period of that group) is 87, 81, 59, 27, and 2, respectively.

In Table 1, each value in each group is mapped to the value next to it. For example, 'f2' → '89' → 'a7' → ⋯ → '04' → 'f2', and '73' → '8f' → '73'. From Property 1, we can see that although the S-box is a non-linear function, every input block of the SubBytes transformation is mapped to itself after some repeated applications of the SubBytes transformation. Indeed, we see that if each byte in an input block (16 bytes) is '8f' or '73' (in group 5), then this block returns to the initial state after just two applications of the SubBytes transformation. From Property 1, if we take the L.C.M. (Least Common Multiple) of 87, 81, 59, 27, and 2, we find the following cyclic property of the SubBytes transformation.

Property 2 For any input block I of the SubBytes transformation, SubBytes^277182(I) = I. That is, the maximal period of the SubBytes transformation is 277182. The minimal period of the SubBytes transformation is 2, attained when each byte in the input block I is '8f' or '73'.

Cyclic Property of the ShiftRows Transformation. The cyclic property of the ShiftRows transformation follows immediately from the shift values (row #0: 0, row #1: 1, row #2: 2, row #3: 3) in each row.
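Properties 1 and 2 can be verified mechanically: decompose the permutation x ↦ S(x) into cycles and take the L.C.M. of the cycle lengths. A sketch (our code; the S-box is rebuilt from its algebraic definition rather than hard-coded):

```python
from math import gcd

def gmul(a, b):  # GF(2^8) multiplication, AES modulus 0x11b
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11b
    return r

def build_sbox():
    inv = [0] * 256
    for x in range(1, 256):
        inv[x] = next(y for y in range(1, 256) if gmul(x, y) == 1)
    def affine(b):
        r = 0x63
        for i in range(5):
            r ^= ((b << i) | (b >> (8 - i))) & 0xff
        return r
    return [affine(inv[x]) for x in range(256)]

def cycle_lengths(perm):
    """Lengths of the cycles of a permutation given as a list."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start not in seen:
            n, x = 0, start
            while x not in seen:
                seen.add(x)
                x = perm[x]
                n += 1
            lengths.append(n)
    return lengths

sbox = build_sbox()
lengths = sorted(cycle_lengths(sbox), reverse=True)   # [87, 81, 59, 27, 2]

period = 1   # L.C.M. of the cycle lengths
for n in lengths:
    period = period * n // gcd(period, n)
# period == 277182, the maximal period of the SubBytes transformation
```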
Table 1. Classifying the substitution values in the S-box. Group #1 (maximal period: 87) f2, 89, a7, 5c, 4a, d6, f6, 42, 2c, 71, a3, 0a, 67, 85, 97, 88, c4, 1c, 9c, de, 1d, a4, 49, 3b, e2, 98, 46, 5a, be, ae, e4, 69, f9, 99, ee, 28, 34, 18, ad, 95, 2a, e5, d9, 35, 96, 90, 60, d0, 70, 51, d1, 3e, b2, 37, 9a, b8, 6c, 50, 53, ed, 55, fc, b0, e7, 94, 22, 93, dc, 86, 44, 1b, af, 79, b6, 4e, 2f, 15, 59, cb, 1f, c0, ba, f4, bf, 08, 30, 04 Group #2 (maximal period: 81) 7c, 10, ca, 74, 92, 4f, 84, 5f, cf, 8a, 7e, f3, 0d, d7, 0e, ab, 62, aa, ac, 91, 81, 0c, fe, bb, ea, 87, 17, f0, 8c, 64, 43, 1a, a2, 3a, 80, cd, bd, 7a, da, 57, 5b, 39, 12, c9, dd, c1, 78, bc, 65, 4d, e3, 11, 82, 13, 7d, ff, 16, 47, a0, e0, e1, f8, 41, 83, ec, ce, 8b, 3d, 27, cc, 4b, b3, 6d, 3c, eb, e9, 1e, 72, 40, 09, 01 Group #3 (maximal period: 59) 00, 63, fb, 0f, 76, 38, 07, c5, a6, 24, 36, 05, 6b, 7f, d2, b5, d5, 03, 7b, 21, fd, 54, 20, b7, a9, d3, 66, 33, c3, 2e, 31, c7, c6, b4, 8d, 5d, 4c, 29, a5, 06, 6f, a8, c2, 25, 3f, 75, 9d, 5e, 58, 6a, 02, 77, f5, e6, 8e, 19, d4, 48, 52 Group #4 (maximal period: 27) ef, df, 9e, 0b, 2b, f1, a1, 32, 23, 26, f7, 68, 45, 6e, 9f, db, b9, 56, b1, c8, e8, 9b, 14, fa, 2d, d8, 61 Group #5 (maximal period: 2) 73, 8f
* Each value in each group is followed by its substitution value.

Property 3 For any input block I of the ShiftRows transformation, ShiftRows^4(I) = I. In other words, the maximal period of the ShiftRows transformation is 4. The minimal period of the ShiftRows transformation is 1, attained when all bytes in the input block I are the same.

Cyclic Property of the MixColumns Transformation. For the MixColumns transformation, we have found that the maximal period of this function is also 4. Let us look carefully once again at the algebraic structure of the MixColumns transformation described in Section 2. As noted, each input column (four bytes) is considered as a polynomial over GF(2^8) and multiplied modulo x^4 + 1 with the fixed polynomial b(x) = 03·x^3 + 01·x^2 + 01·x + 02. This can be written as a matrix multiplication, as in Figure 2, and from this matrix formula we can obtain the relation between an input column (Ic) and
the corresponding output column (Oc). Hence, we find that for any input column Ic (four bytes), M(M(M(M(Ic)))) = Ic, where M is the matrix multiplication described in Figure 2. When all four bytes of Ic are the same, M(Ic) = Ic. If we now consider one input block (four columns) of the MixColumns transformation described in Figure 1, then we find the following property.

Property 4 For any input block I (16 bytes) of the MixColumns transformation, MixColumns^4(I) = I. In other words, the maximal period of the MixColumns transformation is 4. The minimal period of the MixColumns transformation is 1, attained when the bytes within each column are the same.

3.2
Cyclic Properties of Combined Functions
We now turn to the cyclic properties that arise when the above functions are combined. We first consider the maximal period of the linear layer (the composition of the ShiftRows transformation and the MixColumns transformation).

Property 5 Any input block I of the linear layer is mapped to itself after 8 repeated applications of the linear layer. In other words, the maximal period of the linear layer is 8.

From the two minimal periods given in Property 3 and Property 4 we obtain the following property.

Property 6 Any input block I of the linear layer in which all bytes are the same is mapped to itself after one application of the linear layer. That is, the minimal period of the linear layer is 1.

When the SubBytes transformation (non-linear layer) and the ShiftRows transformation (in the linear layer) are combined, we obtain the following cyclic property from the L.C.M. of the two maximal periods given in Property 2 and Property 3.

Property 7 Any input block I of the composition of the SubBytes transformation and the ShiftRows transformation is mapped to itself after 554364 repeated applications of the composition. In other words, the maximal period of the composition of the SubBytes transformation and the ShiftRows transformation is 554364.
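Properties 3-5 can be confirmed by iterating the maps on a test state. A sketch, using the row/column layout of Figure 1 (function names are ours):

```python
import random

def gmul(a, b):  # GF(2^8) multiplication, AES modulus 0x11b
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11b
    return r

MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]

def shift_rows(s):
    # row r is cyclically rotated left by r positions
    return [s[r][r:] + s[r][:r] for r in range(4)]

def mix_columns(s):
    # multiply every column by the circulant matrix of Figure 2
    return [[gmul(MIX[r][0], s[0][c]) ^ gmul(MIX[r][1], s[1][c])
             ^ gmul(MIX[r][2], s[2][c]) ^ gmul(MIX[r][3], s[3][c])
             for c in range(4)] for r in range(4)]

def linear_layer(s):
    return mix_columns(shift_rows(s))

def iterate(f, s, n):
    for _ in range(n):
        s = f(s)
    return s

random.seed(2003)
state = [[random.randrange(256) for _ in range(4)] for _ in range(4)]
# ShiftRows and MixColumns each return the state after 4 iterations,
# and the full linear layer returns it after 8.
```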
Property 8 In Property 7, if all bytes in the input block I are the same and are either '73' or '8f', then this block is mapped to itself after two repeated applications of the composition. That is, the minimal period of the composition of the SubBytes transformation and the ShiftRows transformation is 2.

More importantly, we show that although the maximal periods of both the non-linear layer and the linear layer are short, the maximal period increases enormously in the composition of the non-linear layer and the MixColumns transformation. We first swap the order of the SubBytes transformation and the ShiftRows transformation, as shown in Figure 3 (b) (the order of these two functions is interchangeable).

[Figure 3: three equivalent orderings of the round functions: (a) S-box, ShiftRows, MixColumns, round-key ⊕; (b) ShiftRows, S-box, MixColumns, round-key ⊕; (c) ShiftRows, ES-box, round-key ⊕.]

Fig. 3. Re-ordering of SubBytes and ShiftRows.
We then consider the S-box and the MixColumns transformation together. As a result, we obtain an extended S-box, the ES-box, which consists of 2^32 non-linear substitution paths, as shown in Figure 3 (c) and Table 2. Now, using the same idea used to obtain Property 1, we classify the 2^32 four-byte input values of the ES-box into 52 small groups according to their periods. The number of values in each group (equal to the period of that group) is 1,088,297,796 (≈ 2^30), 637,481,159 (≈ 2^29), 129,021,490 (≈ 2^27), 64,376,666 (≈ 2^26), and so on. Table 3 shows the classification of all substitution values in the ES-box, obtained from our analysis (see the appendix for more details). From these periods we finally find that the maximal period of the composition of the SubBytes transformation (non-linear layer) and
Table 2. ES-box.

I:     0x00000000  0x00000001  ...  0xabcdef12  ...  0xffffffff
ES(I): 0x63636363  0x7c7c425d  ...  0x0eb03a4d  ...  0x16161616
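A sketch of the ES-box as defined here (our code): apply the S-box to each of the four bytes of a column and then one MixColumns step. The first two entries of Table 2 fall out directly.

```python
def gmul(a, b):  # GF(2^8) multiplication, AES modulus 0x11b
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11b
    return r

def build_sbox():
    inv = [0] * 256
    for x in range(1, 256):
        inv[x] = next(y for y in range(1, 256) if gmul(x, y) == 1)
    def affine(b):
        r = 0x63
        for i in range(5):
            r ^= ((b << i) | (b >> (8 - i))) & 0xff
        return r
    return [affine(inv[x]) for x in range(256)]

SBOX = build_sbox()
MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]

def es_box(word):
    """ES-box on a 32-bit column: bytewise S-box, then one MixColumns step."""
    col = [SBOX[(word >> (8 * (3 - r))) & 0xff] for r in range(4)]
    out = [gmul(MIX[r][0], col[0]) ^ gmul(MIX[r][1], col[1])
           ^ gmul(MIX[r][2], col[2]) ^ gmul(MIX[r][3], col[3]) for r in range(4)]
    return int.from_bytes(bytes(out), "big")

# Table 2: es_box(0x00000000) == 0x63636363 and es_box(0x00000001) == 0x7c7c425d.
# The appendix's period-2 value: es_box(es_box(0x73737373)) == 0x73737373.
```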
Table 3. Classifying the substitution values in the ES-box.

Periods of the 52 groups:
1088297796, 637481159, 637481159, 637481159, 637481159, 129021490, 129021490, 129021490, 129021490, 64376666, 64376666, 11782972, 39488, 16934, 13548, 13548, 10756, 7582, 5640, 5640, 3560, 1902, 1902, 548, 548, 136, 90, 90, 87, 81, 59, 47, 47, 47, 47, 40, 36, 36, 27, 24, 21, 21, 15, 15, 12, 8, 4, 4, 4, 2, 2, 2

e.g. period of group #1: 1088297796, period of group #2: 637481159, period of group #6: 129021490, period of group #12: 11782972.
the MixColumns transformation (in the linear layer) is 1,440,607,416,177,321,097,705,832,170,004,940 (≈ 2^110). Here, we note that the maximal period of this composition is the largest L.C.M. of any four of the values above. This is because one input block consists of four columns.
We now discuss the shorter periods of the composition of the SubBytes transformation and the MixColumns transformation, which cryptanalysts may be concerned about. We first consider the minimal period. In the very rare cases where each column in an input block I is '73737373', '8f8f8f8f', '5da35da3', 'c086c086', 'a35da35d' or '86c086c0' (each of these values is mapped to itself after 2 iterations of the ES-box; see the appendix), for example I = 8f8f8f8f c086c086 73737373 5da35da3, the period of the composition of the SubBytes transformation and the MixColumns transformation is 2 (this is the minimal period of the composition).
We next consider the periods of this composition for input blocks in which all bytes are the same. If all bytes in an input block I of the composition of the SubBytes transformation and the MixColumns transformation are the same, then this block leads to an output block in which all bytes are the same. In this case, the period of the composition is the same as the period of the S-box given in Table 1. For example, if the
bytes in an input block I of the combined function of the SubBytes transformation and the MixColumns transformation are all 'f2', then this block is mapped to itself after 87 iterations of this combined function (see Group #1 in Table 1 and Period 87 in the appendix).
In the next section, we show that input blocks having short periods could provide some algebraic clues for cryptanalysis, as previous works have suggested [15, 16]. We show that input blocks having short periods, when compared with others, could have relatively simple hidden algebraic relations with the corresponding output blocks. However, we also note that although in some cases the composition of the non-linear layer and the linear layer has short periods which could provide some algebraic clues for cryptanalysis, the key schedule of the AES algorithm does not allow the short periods to persist.
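The 2^110 figure can be reproduced from the period list of Table 3 alone: a block holds four columns, so its period is the L.C.M. of the four group periods involved, and the maximum is taken over all choices. A sketch (duplicate periods are omitted, since repeating a value never increases an L.C.M.):

```python
from itertools import combinations
from math import gcd

# Distinct group periods from Table 3
PERIODS = [1088297796, 637481159, 129021490, 64376666, 11782972, 39488, 16934,
           13548, 10756, 7582, 5640, 3560, 1902, 548, 136, 90, 87, 81, 59, 47,
           40, 36, 27, 24, 21, 15, 12, 8, 4, 2]

def lcm_of(xs):
    r = 1
    for x in xs:
        r = r * x // gcd(r, x)
    return r

# Maximal block period: the largest L.C.M. of any four group periods
best = max(lcm_of(c) for c in combinations(PERIODS, 4))
# best == 1440607416177321097705832170004940, between 2**110 and 2**111
```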
4
Impact on the Security of the AES Algorithm
In this section, we discuss the impact of our observations on the security of the AES algorithm. We show that input blocks having short periods (for which the effect of mixing data e = p/n is very small) are apt to give hidden algebraic clues for cryptanalysis when compared with others. To do this, we first find some input blocks having the shortest periods in the composition of the non-linear layer and the linear layer (the SubBytes transformation + the ShiftRows transformation + the MixColumns transformation).

Property 9 For any input block I of the composition of the non-linear layer and the linear layer (the SubBytes transformation, the ShiftRows transformation, and the MixColumns transformation), if all bytes in I are the same, then all bytes in the output block are also the same. In this case, the composition of the non-linear layer and the linear layer is equivalent to the S-box, because the ShiftRows transformation and the MixColumns transformation do not affect the data transformation.

Property 10 For any input block I of the composition of the non-linear layer and the linear layer, if all bytes in I are equal to some value i, then the period of the composition of the non-linear layer and the linear layer for this input block is the same as the period of the S-box for i.

For example, if the bytes in an input block I of the composition of the non-linear layer and the linear layer are all 'ef', then this input block is mapped to itself after 27 iterations (the period of the S-box for 'ef' is 27, as given in Table 1). This means that the effect of mixing data of the composition of the non-linear layer and the linear layer is e = 27/2^128 for this input block (2^128 is the number of all possible blocks representable in 128 bits).

Property 11 In Property 10, if all bytes in I are the same and are either '73' or '8f', then I is mapped to itself after 2 iterations of the composition of the non-linear layer and the linear layer. In other words, the minimal period of the composition of the non-linear layer and the linear layer is 2 (the minimal effect of mixing data of the non-linear layer and the linear layer is e = 2/2^128).
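Properties 9 and 10 can be checked by iterating the keyless round function on a uniform block: with all bytes equal to 'ef', the block must return after 27 iterations, the S-box period of 'ef' in Table 1. A sketch (names ours):

```python
def gmul(a, b):  # GF(2^8) multiplication, AES modulus 0x11b
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11b
    return r

def build_sbox():
    inv = [0] * 256
    for x in range(1, 256):
        inv[x] = next(y for y in range(1, 256) if gmul(x, y) == 1)
    def affine(b):
        r = 0x63
        for i in range(5):
            r ^= ((b << i) | (b >> (8 - i))) & 0xff
        return r
    return [affine(inv[x]) for x in range(256)]

SBOX = build_sbox()
MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]

def round_no_key(s):
    """SubBytes, ShiftRows and MixColumns, with the round-key XOR left out."""
    s = [[SBOX[b] for b in row] for row in s]              # SubBytes
    s = [s[r][r:] + s[r][:r] for r in range(4)]            # ShiftRows
    return [[gmul(MIX[r][0], s[0][c]) ^ gmul(MIX[r][1], s[1][c])
             ^ gmul(MIX[r][2], s[2][c]) ^ gmul(MIX[r][3], s[3][c])
             for c in range(4)] for r in range(4)]         # MixColumns

uniform = [[0xef] * 4 for _ in range(4)]
state = uniform
for _ in range(27):
    state = round_no_key(state)
# state == uniform again: a uniform block cycles with its byte's S-box period
```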
We now show that input blocks having short periods could provide some algebraic clues for cryptanalysis if the key schedule of the AES algorithm were not well designed. Let us assume that, contrary to the original key schedule of the AES algorithm, for any Cipher Key in which all bytes are the same, a certain key schedule generates round keys in which each round key has all its bytes the same. (This does not actually happen with the original key schedule.) For example, suppose that the initial round key consists of all '78', that the first round key consists of all '6f', ..., and that the tenth round key consists of all '63'. Then, considering the encryption procedure, we see from Property 9 that any plaintext in which all bytes are the same leads to a ciphertext in which all bytes are the same. This means that if anyone uses, for encryption, a Cipher Key in which all bytes are the same, then attackers will easily become aware of this fact with a chosen plaintext in which all bytes are the same. As soon as the attackers realise this fact, it becomes easy to find the Cipher Key: at most 256 key candidates need to be searched.
However, we note that this scenario does not occur with the original key schedule of the AES algorithm, because plaintexts having short periods cannot maintain those short periods under the original key schedule. For example, consider the simplest case, where a plaintext in which all bytes are '73' is encrypted with a Cipher Key in which all bytes are '00'. In this case, by Property 11, the period of the composition of the non-linear layer and the linear layer is 2 for the intermediate text I0 = 73737373 73737373 73737373 73737373 after the initial round key addition.
However, we have found that the period of the composition of the SubBytes transformation (non-linear layer) and the MixColumns transformation (in the linear layer) becomes 1,088,297,796 (≈ 2^30) for the intermediate text I1 = edececec edececec edececec edececec after the first round key addition. We emphasise once again that although the combined function of the non-linear layer and the linear layer of the AES algorithm has some short periods in rare cases, the key schedule does not allow these short periods to persist, thus denying algebraic clues for its cryptanalysis.
5
Conclusions
We have summarised our further observations on the AES algorithm relating to the cyclic properties of this cipher. Specifically, we have shown that the maximal period of each function used in the AES algorithm is short, and that the maximal period of the composition of the functions used in the linear layer is short as well. More importantly, however, we have also shown that the well-designed structure brings remarkable synergy effects in the cyclic properties of this cipher when the linear layer and the non-linear layer are combined. We note that the structure of the AES algorithm is good enough to guarantee high data-mixing effects.
We also note that although the composition of the non-linear layer and the linear layer of the AES algorithm has, in some cases, short periods which could
provide some algebraic clues for its cryptanalysis, the well-designed key schedule does not allow these short periods to persist. We believe that the combination of simple functions in a well-designed structure is one of the advantages of the AES algorithm, although some recent research has been making considerable progress [9, 10] in the cryptanalysis of AES-like block ciphers.
References
1. E. Biham and A. Shamir, "Differential cryptanalysis of DES-like cryptosystems", J. Cryptology, Vol. 4, pp. 3-72, 1991.
2. E. Biham and N. Keller, "Cryptanalysis of Reduced Variants of Rijndael", http://csrc.nist.gov/encryption/aes/round2/conf3/aes3papers.html, 2000.
3. H. Gilbert and M. Minier, "A Collision Attack on 7 Rounds of Rijndael", Proceedings of the Third Advanced Encryption Standard Candidate Conference, NIST, pp. 230-241, 2000.
4. J. Daemen and V. Rijmen, "AES Proposal: Rijndael", http://csrc.nist.gov/encryption/aes/rijndael/Rijndael.pdf, 1999.
5. J. Daemen and V. Rijmen, "Answer to New Observations on Rijndael", AES Forum comment, August 2000, http://www.esat.kuleuven.ac.be/~rijmen/rijndael/.
6. L. Knudsen and H. Raddum, "Recommendation to NIST for the AES", Second round comments to NIST, May 2000, http://csrc.nist.gov/encryption/aes/round2/comments/.
7. M. Matsui, "Linear cryptanalysis method for DES cipher", Advances in Cryptology - Eurocrypt '93, Lecture Notes in Computer Science, Springer-Verlag, pp. 386-397, 1993.
8. M. Sugita, K. Kobara, K. Uehara, S. Kubota, and H. Imai, "Relationships among Differential, Truncated Differential, Impossible Differential Cryptanalyses against Word-oriented Block Ciphers like Rijndael, E2", Proceedings of the Third AES Candidate Conference, 2000.
9. N. Courtois and J. Pieprzyk, "Cryptanalysis of Block Ciphers with Overdefined Systems of Equations", IACR eprint, April 2002, http://www.iacr.org/complete/.
10. N. Courtois and J. Pieprzyk, "Cryptanalysis of Block Ciphers with Overdefined Systems of Equations", Proceedings of ASIACRYPT 2002, Lecture Notes in Computer Science Vol. 2501, pp. 267-287, 2002.
11. National Institute of Standards and Technology, "Advanced Encryption Standard (AES)", FIPS 197, 2001.
12. N. Ferguson, R. Schroeppel, and D. Whiting, "A simple algebraic representation of Rijndael", Proceedings of SAC 2001, Lecture Notes in Computer Science Vol. 2259, pp. 103-111, 2001.
13. N. Ferguson, J. Kelsey, S. Lucks, B. Schneier, M. Stay, D. Wagner, and D. Whiting, "Improved Cryptanalysis of Rijndael", Fast Software Encryption Workshop 2000, preproceedings, 2000.
14. S. Lucks, "Attacking Seven Rounds of Rijndael under 192-Bit and 256-Bit Keys", Proceedings of the Third Advanced Encryption Standard Candidate Conference, NIST, pp. 215-229, 2000.
15. S. Murphy and M.J.B. Robshaw, "New Observations on Rijndael", AES Forum comment, August 2000, http://www.isg.rhul.ac.uk/~sean/.
16. S. Murphy and M.J.B. Robshaw, "Further Comments on the Structure of Rijndael", AES Forum comment, August 2000, http://www.isg.rhul.ac.uk/~sean/.
234
Beomsik Song and Jennifer Seberry
Appendix: Grouping in the ES-Box

Periods      Elements in each group
1088297796   00000003, 7b7b4b53, …, 4487de39
637481159    00000002, 77775f4b, …, 3943ffc4
637481159    00000004, f2f2cb5a, …, a6284276
637481159    00000006, 6f6f777b, …, 24c3a2a6
637481159    00000008, 303096c5, …, d4f75ed0
129021490    00000001, 7c7c425d, …, 40f39ed7
129021490    00000007, c5c59234, …, 25322e95
129021490    00000009, 0101c5a7, …, f8bc508a
129021490    00000010, caca832a, …, 9660fca0
64376666     00000016, 47470f2b, …, c50ccf88
64376666     00000142, 330d8ce2, …, e401999a
11782972     000000ea, 878754b0, …, 638a2857
39488        00020002, 4b5f4b5f, …, 30a530a5
16934        00010001, 5d425d42, …, 6ad56ad5
13548        00023af9, 468fbf7b, …, 6b5493f6
13548        0005fde6, a1c7299d, …, 8bf1558a
10756        001004ad, e474f2ac, …, 245557ee
7582         00070007, 34923492, …, d740d740
5640         00022db0, 60198ddf, …, feb74bd1
5640         0015e186, 91861d8c, …, 5d50a4a6
3560         00094090, ac1ad06d, …, f6110e3e
1902         0000c22b, b73b421a, …, 07a9ec2e
1902         0021e4f9, 2aa0fc18, …, 76a21d37
548          00b800b8, 7d727d72, …, 05a905a9
548          00c600c6, d601d601, …, 85708570
136          01d266c5, a9fe5e55, …, f554d80d
90           02338d7f, 3fdf63b8, …, 3c0c694e
90           0304c1ca, f778e5ef, …, 8683dfa2
87           f2f2f2f2, 89898989, …, 04040404
81           7c7c7c7c, 10101010, …, 01010101
59           00000000, 63636363, …, 52525252
47           0112dc34, 267c8afb, …, c406421d
47           018b9ded, b4b1024d, …, 32926cc7
47           024db4b1, 95eed67c, …, 9ded018b
47           03c975a2, 2d5cc9b9, …, c0c8d6db
40           0aff4adf, bcb47f4e, …, 1864fa71
36           03d603d6, 7af77af7, …, 3e0a3e0a
36           07f107f1, 0d690d69, …, 17a517a5
27           efefefef, dfdfdfdf, …, 61616161
24           03d503d5, 8bf38bf3, …, c6abc6ab
21           050f050f, 514c514c, …, e344e344
21           0f050f05, 4c514c51, …, 44e344e3
15           0e6e0e6e, c3f7c3f7, …, ecbeecbe
15           6e0e6e0e, f7c3f7c3, …, beecbeec
12           0327266c, 1eaab216, …, 837b2f79
8            cac4cac4, a4cca4cc, …, 4a2d4a2d
4            01828fc8, 5627aa2f, 8fc80182, aa2f5627
4            27aa2f56, c801828f, 2f5627aa, 828fc801
4            a37dadf5, 7dadf5a3, adf5a37d, f5a37dad
2            73737373, 8f8f8f8f
2            5da35da3, c086c086
2            a35da35d, 86c086c0
Optimal Key Ranking Procedures in a Statistical Cryptanalysis

Pascal Junod and Serge Vaudenay

Security and Cryptography Laboratory, Swiss Federal Institute of Technology, CH-1015 Lausanne, Switzerland
{pascal.junod,serge.vaudenay}@epfl.ch
Abstract. Hypothesis tests have been used in the past as a tool in a cryptanalytic context. In this paper, we build on this paradigm and define a precise and sound statistical framework in order to optimally mix information on independently attacked subkey bits obtained from any kind of statistical cryptanalysis. In the context of linear cryptanalysis, we prove that the best mixing paradigm consists of sorting key candidates by decreasing weighted Euclidean norm of the bias vector.

Keywords: Key ranking, statistical cryptanalysis, Neyman-Pearson lemma, linear cryptanalysis.
1 Introduction
Historically, statistical hypothesis tests, although well known in many engineering fields, have not been an explicitly and widely used tool in the cryptanalysis of block ciphers. Often, distinguishing procedures between two statistical distributions are proposed, but without much attention being paid to their optimality. To the best of our knowledge, an unpublished report of Murphy, Piper, Walker and Wild [MPWW95] is the first work where the concept of statistical hypothesis tests is discussed in the context of “modern” cryptanalysis. More recently, a paper of Fluhrer and McGrew [FM01] discussed the performance of an optimal statistical distinguisher in the cryptanalysis of a stream cipher. These tools were again used in the same context by Mironov [Mir02], by Coppersmith et al. [CHJ02], and by Golić and Menicocci [GM], for instance, while Junod [Jun03] makes use of them for deriving the asymptotic behaviour of some optimal distinguishers.
1.1 Contributions of This Paper
In this paper, we propose a sound and precise statistical cryptanalytic framework which extends that of Vaudenay [Vau96]; furthermore, we describe an optimal distinguishing procedure that can be employed in any statistical cryptanalysis involving subkey candidate ranking. As an illustration, we apply this distinguishing procedure to the linear cryptanalysis of DES [DES77] as proposed by Matsui in [Mat94].

T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 235–246, 2003.
© International Association for Cryptologic Research 2003

In the first version of
linear cryptanalysis of DES [Mat93], Matsui's attack returns a subkey which is the correct one with high probability, while a refined version of the attack [Mat94] returns a list of subkeys sorted by maximum likelihood. This approach, which is very similar to the list-decoding paradigm in coding theory, makes it possible to decrease the number of known plaintext-ciphertext pairs needed. Although very simple to implement, Matsui's key ranking heuristic is however not optimal. We show that by sorting the subkey candidates by decreasing sum of squares of the experimental biases, we obtain a ranking procedure which minimizes the cost of the attack's exhaustive search part. At first sight, optimising the exhaustive search complexity of a cryptanalysis does not seem very interesting, since exhaustive search is a "cheap" operation for a cryptanalyst, compared to the cost, or the difficulty, of obtaining the required amount of known plaintext-ciphertext pairs. However, we show in this paper that by optimising the exhaustive search part of a linear cryptanalysis of DES, it is possible to decrease the number of pairs needed appreciably while keeping the computational complexity within a reasonable range. In [Jun01], Junod carried out a complexity analysis and proved that Matsui's attack against DES performs better than expected, which had already been conjectured. He further confirmed this fact experimentally with 21 linear cryptanalyses: given 2^43 known plaintext-ciphertext pairs, and a success probability equal to 85%, the computational complexity had an upper bound of 2^40.75 DES evaluations. In this paper, the power of this technique is illustrated by experimentally demonstrating that one can decrease the computational complexity of Matsui's attack against DES by an average factor of two, or, equivalently, decrease the number of known plaintext-ciphertext pairs needed by a non-trivial factor (i.e., 31%) without an explosion of the computational complexity (i.e., less than 2^45 DES evaluations); one can also divide the number of known pairs by two (i.e., to 2^42) while keeping the computational complexity within 2^47 DES evaluations. Other examples of potential direct applications of our optimal ranking rule are Shimoyama and Kaneko's attack [SK98] on DES, which uses quadratic boolean relations, and Knudsen and Mathiassen's chosen-plaintext version [KM01] of linear cryptanalysis against DES. However, the ideas behind our ranking method are not restricted to those attacks and may be applied in any statistical cryptanalysis. The rest of this paper is organized as follows: in §2, we recall Vaudenay's statistical cryptanalysis model and Matsui's ranking procedures; in §3, we introduce the necessary statistical tools and propose the Neyman-Pearson ranking procedure. In §4, we apply it to a linear cryptanalysis of DES, present some experimental results on the improvement, and discuss potential applications to other known attacks. Finally, we give some concluding remarks in §5.
1.2 Notation
The following notation will be used throughout this paper. Random variables X, Y, . . . are denoted by capital letters, while realizations x ∈ X , y ∈ Y, . . . of random variables are denoted by small letters. The fact for a random variable
1. Counting Phase: Collect n random samples s_j = f_2(P_j, C_j), for j = 1, . . . , n, and count all occurrences of all the possible values of the s_j's in |S| counters.
2. Analysis Phase: For each of the subkey candidates ℓ_i, 1 ≤ i ≤ |L|, count all the occurrences of all x_i = f_3(ℓ_i, s_j) and give it a mark µ_i using the statistic Σ(x_1, . . . , x_n).
3. Sorting Phase: Sort all the candidates ℓ_i using their marks µ_i. The list of sorted candidates is denoted U.
4. Searching Phase: Exhaustively try all keys following the sorted list of all the subkey candidates.

Fig. 1. Structure of a statistical cryptanalysis.
X to follow a distribution D is denoted X ← D, while its probability density and distribution functions are denoted by f_D(x) and F_D(x) = Pr_{X←D}[X ≤ x] = ∫_{−∞}^{x} f_D(t) dt, respectively. When the context is clear, we will simply write Pr[X ≤ x]. Finally, as usual, "iid" means "independent and identically distributed".
2 Statistical Cryptanalysis and Key Ranking Procedures
In this paper, we will assume that a given cryptanalysis can be seen as a statistical cryptanalysis, in the sense of Vaudenay's model [Vau96], and that it uses a key ranking procedure.
2.1 Statistical Cryptanalysis
We now briefly recall the principles of a statistical cryptanalysis. Let P, C and K be the plaintext, ciphertext and key space, respectively. A statistical cryptanalysis uses three functions, denoted f_1, f_2 and f_3, which have the following roles:
- f_1 : K → L is a function which eliminates information about the key unrelated to the cryptanalysis.
- f_2 : P × C → S, where S is called the sample space, eliminates information about the plaintext and ciphertext spaces unrelated to the attack.
- f_3 : L × S → Q, where Q is a space summarizing information depending on intermediate results in the encryption.
In order to be efficient, a statistical cryptanalysis should fulfil the following conditions: the information x = f_3(ℓ, s), where ℓ ∈ L, s ∈ S and x ∈ Q, should be computable from small pieces of information on (p, c) ∈ P × C and k ∈ K (namely, s and ℓ); furthermore, the information x = f_3(ℓ_r, s) should be statistically distinguishable from x = f_3(ℓ_w, s), where ℓ_r and ℓ_w denote the information given by the right key and a wrong key, respectively. The main idea of the attack consists in assuming that we can distinguish the right key from a wrong key with the help of a statistical measurement Σ on the observed distribution of the x_i's. The attack is described in Fig. 1. The data complexity is then defined to be the number n of known plaintext-ciphertext pairs needed in step 1, while the computational complexity is defined to be the number of operations in the last phase
of the attack. We note that usually the complexity of steps 2 and 3 is negligible, but this may not be the case in all situations. Key ranking is a technique introduced by Matsui in [Mat94] in order to increase the success probability of a linear cryptanalysis against DES; it corresponds to step 4 in Fig. 1: instead of returning the single subkey ℓ_max possessing the highest mark µ_max = max_i µ_i out of |L| subkey candidates, the idea is to return a sorted list U containing key candidates ranked by likelihood and to search for the remaining unattacked bits in this order. Obviously, two central points in a statistical cryptanalysis are the definition of the statistic Σ and of the mark µ which has to be assigned to a subkey candidate. The first issue is the essence of the attack: the cryptanalyst must find a "statistical weakness" in the cipher. In §3, we will address the second issue in a general way by using concepts of statistical hypothesis testing, and we consider known techniques in this light; beforehand, we recall some generic facts about linear cryptanalysis and the related ranking procedures proposed by Matsui.
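The four phases of Fig. 1 translate into a short generic loop. The sketch below is purely illustrative: the functions f2, f3 and the statistic are attack-specific and the toy choices used here are assumptions, not anything from the paper.

```python
from collections import Counter

def statistical_attack(pairs, candidates, f2, f3, mark):
    # 1. Counting phase: reduce each known (P, C) pair to a sample
    #    s = f2(P, C) and keep only the |S| counters.
    counters = Counter(f2(p, c) for p, c in pairs)
    # 2. Analysis phase: for each subkey candidate l, derive the
    #    distribution of x = f3(l, s) and grade it with the statistic.
    marks = {}
    for l in candidates:
        x_counts = Counter()
        for s, count in counters.items():
            x_counts[f3(l, s)] += count
        marks[l] = mark(x_counts)
    # 3. Sorting phase: rank the candidates by decreasing mark; the
    #    searching phase (step 4) would then exhaust the remaining key
    #    bits following this list.
    return sorted(candidates, key=lambda l: marks[l], reverse=True)
```

With a toy 2-bit sample space, f3(ℓ, s) = parity(ℓ AND s) and the absolute-bias mark |#{x = 0} − n/2|, the candidate whose parity mask is most unbalanced on the observed samples comes out first.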
2.2 Linear Cryptanalysis and Related Ranking Procedures
We briefly recall the principles of a linear cryptanalysis. The attack's core is unbalanced linear expressions, i.e. equations involving a modulo-two sum of plaintext and ciphertext bits on the left and a modulo-two sum of key bits on the right. Such an expression is unbalanced if it is satisfied with probability¹

p = 1/2 + ε, with 0 < |ε| ≤ 1/2,   (1)

when the plaintexts and the key are independent and chosen uniformly at random. Given some plaintext bits P_{i1}, . . . , P_{ir}, ciphertext bits C_{j1}, . . . , C_{js} and key bits K_{k1}, . . . , K_{kt}, and using the notation X_{[l1,...,lu]} ≜ X_{l1} ⊕ X_{l2} ⊕ . . . ⊕ X_{lu}, we can write a linear expression as

P_{[i1,...,ir]} ⊕ C_{[j1,...,js]} = K_{[k1,...,kt]}.   (2)

As this equation only yields one bit of information about the key, one usually uses a linear expression spanning all the rounds but one; it is then possible to identify the subkey involved in the last round. One can rewrite (2) as

P_{[i1,...,ir]} ⊕ C_{[j1,...,js]} ⊕ F^{(r)}_{[m1,...,mv]}(C, K^{(r)}) = K_{[k1,...,kt]}.   (3)

Now, one can easily identify the abstract spaces defined in the generic model of Fig. 1: the (sub)key space L is the set of all possible values of the involved subkey

¹ In the literature, this non-linearity measure is often called the linear probability and expressed as LP^f(a, b) ≜ (2 Pr[a · x = b · f(x)] − 1)² = 4ε², where a and b are the masks selecting the plaintext and ciphertext bits, respectively. In this paper, we will refer to the bias ε for simplicity reasons.
(i.e. the "interesting" bits of K^{(r)} and those of K_{[k1,...,kt]}); the sample space S is the set of all possible P_{i1}, . . . , P_{ir} and C_{j1}, . . . , C_{js}; and finally, Q consists in the binary set {0, 1} (i.e. the two possible hyperplanes). The first phase of a linear cryptanalysis consists in evaluating the bias, or more precisely the absolute bias (as the cryptanalyst ignores the right part of (3)), of the linear expression for all possible subkey candidates and for all known plaintext-ciphertext pairs:

Σ_ℓ ≜ |Ψ_ℓ − n/2|   (4)

where Ψ_ℓ is the number of times (3) is equal to 0 (for a given subkey candidate ℓ) and n is the number of known plaintext-ciphertext pairs. In a second phase, the list of subkey candidates is sorted, and the missing key bits are finally searched exhaustively for each subkey candidate until the correct full key is found. The computational complexity of the attack is then related to the number of encryptions needed in the exhaustive search part. The (implicit) mark used to sort the subkey candidates is the following:

Definition 1 (Single-List Ranking Procedure). The mark µ_ℓ given to a subkey candidate ℓ is defined to be equal to the bias

µ_ℓ ≜ Σ_ℓ = |Ψ_ℓ − n/2|   (5)

produced by this subkey ℓ.

Interestingly, the refined version of linear cryptanalysis described in [Mat94] uses two biased linear expressions involving different² key bit subsets. The heuristic proposed by Matsui (which was based on intuition³) is the following:

Definition 2 (Double-List Ranking Procedure). Let U_1 and U_2 be two lists of subkey candidates involving disjoint key bit subsets. Sort them independently using the Single-List Ranking Procedure described in Def. 1. Let ρ_U(ℓ) be a function returning the rank of the candidate ℓ in the list U. The Double-List Ranking Procedure is then defined as follows:

1. To each candidate ℓ = (ℓ_1, ℓ_2) ∈ U_1 × U_2, assign the mark

µ_{(ℓ1,ℓ2)} ≜ ρ_{U1}(ℓ_1) · ρ_{U2}(ℓ_2)   (6)

2. Sort the "composed" candidates by increasing marks µ_{(ℓ1,ℓ2)}.
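Definitions 1 and 2 translate directly into code. The sketch below uses hypothetical helper names (not from the paper) and makes the tie-breaking problem of the rank-product heuristic easy to see:

```python
def single_list_rank(psi, n):
    # Def. 1: sort candidates by decreasing experimental bias |Psi - n/2|,
    # where psi[l] counts how often expression (3) evaluated to 0 for l.
    return sorted(psi, key=lambda l: abs(psi[l] - n / 2), reverse=True)

def double_list_rank(u1, u2):
    # Def. 2 (Matsui's heuristic): combine two independently sorted lists
    # by increasing product of ranks.  Note that rank pairs (1, 4) and
    # (2, 2) both receive mark 4, so the heuristic is not a total order.
    rank1 = {l: r for r, l in enumerate(u1, start=1)}
    rank2 = {l: r for r, l in enumerate(u2, start=1)}
    combined = [(l1, l2) for l1 in u1 for l2 in u2]
    return sorted(combined, key=lambda c: rank1[c[0]] * rank2[c[1]])
```

For instance, with counts {0: 520, 1: 495, 2: 470} over n = 1000 pairs, the experimental biases are 20, 5 and 30, so candidate 2 is ranked first.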
3 An Alternative View on Ranking Procedures
In this section, we recall some well-known statistical hypothesis testing concepts, and we discuss the optimality of the two ranking procedures described above.

² The different problem of dealing with multiple linear approximations has been studied by Kaliski and Robshaw in [KR94]. However, their setting is different from ours: they handle the case where one has several linear approximations acting on the same key bits, and they compute the cumulated (resulting) bias.
³ Private communication.
3.1 Hypothesis Tests
Let D_0 and D_1 be two different probability distributions defined on the same finite set X. In a binary hypothesis testing problem, one is given an element x ∈ X which was drawn according either to D_0 or to D_1, and one has to decide which is the case. For this purpose, one defines a so-called decision rule, which is a function δ : X → {0, 1} taking a sample of X as input and defining what the guess should be for each possible x ∈ X. Associated with this decision rule are two types of error probabilities: α ≜ Pr_{X←D0}[δ(X) = 1] and β ≜ Pr_{X←D1}[δ(X) = 0]. The decision rule δ defines a partition of X into two subsets which we denote by A and Ā, i.e. A ∪ Ā = X; A is called the acceptance region of δ. We now recall the Neyman-Pearson lemma, which derives the shape of the optimum statistical test δ between two simple hypotheses, i.e. which gives the optimal decision region A.

Lemma 1 (Neyman-Pearson). Let X be a random variable drawn according to a probability distribution D, and consider the decision problem corresponding to the hypotheses X ← D_0 and X ← D_1. For τ ≥ 0, let A be defined by

A ≜ { x ∈ X : Pr_{X←D0}[x] / Pr_{X←D1}[x] ≥ τ }   (7)

Let α* ≜ Pr_{X←D0}[Ā] and β* ≜ Pr_{X←D1}[A]. Let B be any other decision region with associated error probabilities α and β. If α ≤ α*, then β ≥ β*.

Hence, the Neyman-Pearson lemma indicates that the optimum test (regarding error probabilities) in the case of a binary decision problem is the likelihood-ratio test. All these considerations are summarized in Def. 3.

Definition 3 (Optimal Binary Hypothesis Test). To test X ← D_0 against X ← D_1, choose a constant τ > 0 depending on α and β and define the likelihood ratio

lr(x) ≜ Pr_{X←D0}[x] / Pr_{X←D1}[x]   (8)

The optimal decision function is then defined by

δ_opt = 0 (i.e. accept X ← D_0) if lr(x) ≥ τ;  δ_opt = 1 (i.e. accept X ← D_1) if lr(x) < τ.   (9)
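As a concrete, purely illustrative instance of Def. 3, one can test between two normal densities. The parameters and the threshold τ below are arbitrary choices for demonstration, not values from the paper:

```python
import math

def gauss_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2).
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def lr(x, mu0, mu1, sigma):
    # Likelihood ratio of eq. (8) for D0 = N(mu0, sigma^2), D1 = N(mu1, sigma^2).
    return gauss_pdf(x, mu0, sigma) / gauss_pdf(x, mu1, sigma)

def decide(x, mu0, mu1, sigma, tau=1.0):
    # Optimal decision rule of eq. (9): accept D0 iff lr(x) >= tau.
    return 0 if lr(x, mu0, mu1, sigma) >= tau else 1
```

With τ = 1 this reduces to maximum likelihood: a sample closer to µ_0 is attributed to D_0, and shifting τ trades α against β exactly as the lemma describes.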
3.2 The Neyman-Pearson Ranking Procedure
We now apply the Neyman-Pearson paradigm to the ranking procedure. One defines the two hypotheses as follows: H_0 is the hypothesis that the random variable modeling the statistic Σ_ℓ (we make here a slight abuse of notation by giving the same name to both entities) produced by a given subkey candidate ℓ is distributed according to D_R, i.e. it is distributed as for the right subkey candidate, while H_1 is the hypothesis that Σ_ℓ follows the distribution D_W, i.e. it is distributed as for a wrong subkey candidate (note that we assume here that
the "wrong-key randomization hypothesis" [HKM95] holds, i.e. that all wrong keys follow the same distribution):

H_0 : Σ_ℓ ← D_R
H_1 : Σ_ℓ ← D_W

In this scenario, a type I error (occurring with probability α) means that the correct subkey candidate ℓ_R, with Σ_{ℓR} ← D_R, is decided to be a wrong one; a type II error (occurring with probability β) means that one accepts a wrong candidate ℓ_W as being the right one. When performing binary hypothesis tests, one usually proceeds as follows: one chooses a fixed α that one is willing to accept, one computes the threshold τ corresponding to α, and one defines the following decision rule when given the statistic Σ_ℓ produced by the candidate ℓ:

H_0 is accepted if f_{DR}(Σ_ℓ) / f_{DW}(Σ_ℓ) ≥ τ
H_1 is accepted if f_{DR}(Σ_ℓ) / f_{DW}(Σ_ℓ) < τ

where f_{DR} and f_{DW} denote the density functions of the distributions D_R and D_W, respectively. In our scenario, this means that, given a candidate ℓ, the cryptanalyst will define the threshold τ in such a manner that α is negligible, so that the test will virtually always accept H_0: indeed, accepting a subkey candidate as being the right one and taking a wrong decision costs an encryption, while deciding that the right candidate is a wrong one causes the failure of the whole attack. Then, the cryptanalyst can rank the candidates by decreasing likelihood-ratio values: the greater the value, the more likely the candidate is to be the looked-for one. We call this ranking procedure the Neyman-Pearson Ranking Procedure:

Definition 4 (Neyman-Pearson Ranking Procedure). To each candidate ℓ, assign the mark

µ_ℓ ≜ f_{DR}(Σ_ℓ) / f_{DW}(Σ_ℓ)   (10)

where Σ_ℓ is the statistic produced by the candidate ℓ, and f_{DR} and f_{DW} are the density functions of Σ_ℓ in the case of the right and a wrong key, respectively. Then, sort the candidates by decreasing values of µ_ℓ.

Multiple lists giving information on disjoint subsets of the key bits (note that these considerations are valid for more than two lists, too) can thus be optimally and easily combined if the joint distribution of the underlying statistics is available. Usually, reasonable heuristic statistical independence assumptions can be made. We show now that, in the case of a linear cryptanalysis, Matsui's single-list ranking procedure is equivalent to a Neyman-Pearson Ranking Procedure. Without loss of generality, we will assume that the linear expression (3) has a bias equal to ε (see (1) for a definition of the bias), with ε > 0. Approximations of the Σ_ℓ distributions are known (we refer to [Jun01] for more details about the derivation of these expressions):
f_{DW}(x) = √(8/(nπ)) · e^{−2x²/n},   for x ≥ 0   (11)

and

f_{DR}(x) = √(2/(nπ)) · ( e^{−2(x−nε)²/n} + e^{−2(x+nε)²/n} ),   for x ≥ 0   (12)

The likelihood ratio is then given by a straightforward calculation.

Lemma 2. In the case of a linear cryptanalysis, the likelihood ratio is given by

lr(Σ_ℓ) = e^{−2nε²} · cosh(4εΣ_ℓ),   Σ_ℓ ≥ 0   (13)

We can now state the following result.

Theorem 1. Matsui's single-list ranking procedure (as defined in Def. 1) is equivalent to a Neyman-Pearson Ranking Procedure and is furthermore optimal in terms of the number of key tests.

Proof: This follows easily from the fact that (13) is a monotone increasing function of Σ_ℓ ≥ 0 and that the type II error probability is monotonically increasing as the likelihood ratio is decreasing. ♦

Furthermore, one can easily observe that Matsui's double-list ranking procedure, although very simple, is not a Neyman-Pearson Ranking Procedure, since it is not a total ordering procedure and it does not make use of the whole information given by each subkey candidate (i.e. it does not use the experimental bias associated with each candidate, but only their ranks). The first observation leads to some ambiguity in the implementation of Def. 2. For instance, should the combination of two candidates having respective ranks equal to 1 and 4 be searched for the unknown key bits before or after the combination consisting of two candidates both having rank 2? In the next section, we illustrate the use of a Neyman-Pearson ranking procedure in the case of a linear cryptanalysis of DES.
4 A Practical Application
Matsui's refined attack against DES [Mat94] makes use of two linear expressions involving disjoint subsets of key bits; one is the best linear expression on 14 rounds of DES, and it is used for deriving the second one via a "reversing trick". Each of them gives information about 13 key bits, the remaining 30 unknown key bits having to be searched exhaustively. We refer to [Mat94] for the detailed description of both linear approximations. In order to derive a Neyman-Pearson ranking procedure, one has to compute the joint probability distribution of the statistics Σ_{ℓ1} and Σ_{ℓ2} furnished by the two linear expressions. As these statistics depend on disjoint subsets of the key bits, one can reasonably make the following assumption.
Assumption 1. For each ℓ_1 and ℓ_2, Σ_{ℓ1} and Σ_{ℓ2} are statistically independent, where ℓ_1 and ℓ_2 denote subkey candidates involving disjoint key subsets.

A second assumption neglects the effects of semi-wrong keys, i.e. keys which behave as the right one according to one list only. This is motivated by the fact that, in the case of a linear cryptanalysis of DES, the number of such keys is small, and thus their effect on the joint probability distribution is negligible.

Assumption 2. For each ℓ_1 and ℓ_2, Σ ≜ (Σ_{ℓ1}, Σ_{ℓ2}) is distributed either according to D_R = D_R^{(1)} × D_R^{(2)} or to D_W = D_W^{(1)} × D_W^{(2)}, where D_R^{(1)} and D_R^{(2)} are the distributions of the right subkey for both key subsets, and D_W^{(1)} and D_W^{(2)} are the distributions of a wrong subkey for both key subsets, respectively.

Using these two assumptions, the probability density functions defined in (11) and (12), and the fact that the bias of both linear expressions is the same and equal to ε, one can derive the likelihood ratio:

µ_{(ℓ1,ℓ2)} = e^{−4nε²} · cosh(4εΣ_{ℓ1}) · cosh(4εΣ_{ℓ2})   (14)

As (14) is not "numerically" convenient to use, we may approximate it using a Taylor expansion in terms of ε, which gives a very intuitive definition of the Neyman-Pearson ranking procedure:

µ_{(ℓ1,ℓ2)} ≈ 1 + (8Σ_{ℓ1}² + 8Σ_{ℓ2}² − 4n)ε² + O(ε⁴)   (15)

Hence, we note that it is sufficient to rank the subkey candidates by decreasing values of Σ_{ℓ1}² + Σ_{ℓ2}², i.e. the final mark is just the Euclidean distance between an unbiased result and a given sample. We may generalize this result to the case where the biases, which we denote ε_1 and ε_2, are different in the two equations; in this case, the likelihood ratio is given by

µ_{(ℓ1,ℓ2)} = e^{−2n(ε_1²+ε_2²)} · cosh(4ε_1 Σ_{ℓ1}) · cosh(4ε_2 Σ_{ℓ2})   (16)

A first-order approximation is then given by

µ_{(ℓ1,ℓ2)} ≈ 1 + 8Σ_{ℓ1}² ε_1² + 8Σ_{ℓ2}² ε_2² − 2n(ε_1² + ε_2²)   (17)

which is equivalent to assigning the grade µ_{(ℓ1,ℓ2)} = Σ_{ℓ1}² ε_1² + Σ_{ℓ2}² ε_2². We summarize these facts in the following theorem.

Theorem 2. Under Assumptions 1 and 2, in a linear cryptanalysis using t approximations on disjoint key bit subsets, each having a bias equal to ε_i, 1 ≤ i ≤ t, a procedure ranking the subkey candidates by decreasing

µ_{(ℓ1,...,ℓt)} = ∑_{i=1}^{t} (Σ_{ℓi} ε_i)²   (18)

is a Neyman-Pearson ranking procedure; furthermore, it is optimal in terms of the number of key tests.

Sketch of the proof: The proof is similar to that of Theorem 1 and follows from the fact that β is a monotone increasing function as µ_{(ℓ1,...,ℓt)} decreases. ♦
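The mark of Theorem 2 is a one-liner. The sketch below (illustrative, with made-up numbers) contrasts it with the rank-product heuristic of Def. 2:

```python
def np_mark(biases, eps):
    # Eq. (18): weighted squared Euclidean norm of the bias vector,
    # sum over i of (Sigma_i * eps_i)^2.
    return sum((s * e) ** 2 for s, e in zip(biases, eps))

def rank_by_np_mark(candidates, eps):
    # candidates maps a composed candidate to its tuple of experimental
    # biases (Sigma_1, ..., Sigma_t); a larger mark means more likely.
    return sorted(candidates, key=lambda c: np_mark(candidates[c], eps),
                  reverse=True)
```

Unlike the rank product, this mark uses the biases themselves: with equal weights, a candidate pair with biases (7, 8) outranks one with biases (10, 2), even if the latter tops one of the two individual lists.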
4.1 Experimental Results
The Neyman-Pearson ranking procedure described in the previous section has been simulated in the context of 21 linear cryptanalyses of DES, using the data of [Jun01]. The following table summarises our experimental results on the complexity of the exhaustive search part of the attack given 2^43 known plaintext-ciphertext pairs; we use the following notation: µ_C denotes the average experimental complexity, C_85% the maximal complexity given a success probability of 85% (which is the success probability defined by Matsui in [Mat94]), C_med the median, and C_min and C_max the extremal values.

             Matsui's Ranking   Optimal Ranking      ∆
log2 µC          41.4144            40.8723      −31.32 %
log2 C85%        40.7503            40.6022       −9.75 %
log2 Cmed        38.1267            36.7748      −60.71 %
log2 Cmin        32.1699            31.3219      −40.00 %
log2 Cmax        45.4059            44.6236      −41.86 %
These results lead to the following observations:
– The average complexity is decreased by about 30%. Actually, the average complexity is not a good indicator of the average behavior of the linear cryptanalysis, because most cases have a far lower complexity and only 3 cases have a complexity greater than the average. Those three cases thus have a considerable influence on the average complexity, and it is worth examining the median behavior.
– A perhaps more significant result is that the median complexity is decreased by about 60%. Although one has to be careful with this result because of the small number of statistical samples, this value seems to reflect the real impact of the improved rule more accurately than the average does.
– Although the optimal rule decreases the exhaustive search complexity on average, "pathological" cases where Matsui's heuristic is better than the Neyman-Pearson ranking procedure can occur. One can explain this by the fact that the Σ_ℓ densities are sometimes bad approximations of the real ones, several heuristic assumptions being involved.
As the data complexity and the computational complexity of a linear cryptanalysis are closely related, it is possible (and desirable in the context of a known-plaintext attack) to convert a gain in the first category into a gain in the second one: even if we appreciably decrease the number of known plaintext-ciphertext pairs, the complexity remains within reasonable bounds: for instance, given 2^42.46 known plaintext-ciphertext pairs, Ĉ_85% = 2^44.46 DES evaluations, and with only 2^42 pairs, Ĉ_85% = 2^46.86. These experimental values are summarized in the following table:
Data complexity       2^42.00   2^42.46   2^43.00
Time complexity       2^46.86   2^44.46   2^40.60
Success probability     85 %      85 %      85 %

4.2 Other Attacks
Several published attacks (to the best of our knowledge, all derived from Matsui's paper) use key ranking procedures or suggest them as a potential improvement. In [SK98], Shimoyama and Kaneko use quadratic boolean approximations of the DES S-boxes possessing a larger bias. The first part of their attack consists in a traditional linear cryptanalysis, and thus we can apply our optimal ranking procedure; furthermore, another part of their attack also consists in a sorting procedure using Matsui's heuristic. In [KM01], Knudsen and Mathiassen show how to modify Matsui's attack into a chosen-plaintext attack in order to reduce the number of pairs needed. Their attack can also use the "reversing trick", i.e. one can apply the same linear characteristic to both the encryption and the decryption function in order to derive twice as many key bits. Once again, one could use a key ranking procedure and our optimal rule to define the order of the subkey candidates during the exhaustive search part.
5 Conclusion
In this paper, we show that considering a statistical cryptanalysis in a hypothesis testing framework makes it possible to define the shape of an optimal distinguisher. We note that one can apply such a distinguisher to various published attacks, all of them being more or less related to Matsui's linear cryptanalysis as applied against DES. We demonstrate experimentally that our distinguisher, in the case of a classical linear cryptanalysis of DES, allows a non-trivial decrease of the computational complexity. Simulations of 21 real attacks suggest an average complexity of 2^40.87 DES evaluations instead of 2^41.41, as stated in [Jun01]. If one accepts a 15% failure probability, which is the usual setting, the complexity has upper bound 2^40.61. Equivalently, as exhaustive search operations are typically less costly than the collection of known plaintext-ciphertext pairs, this technique makes it possible to decrease the number of needed pairs while keeping the computational complexity of the attack in cryptanalyst-friendly areas. Our experiments led, with a success probability of 85%, to 2^44.85 DES evaluations given 2^42.46 pairs, or to 2^46.86 DES evaluations given only 2^42 pairs. Finally, we would like to point out that statistical hypothesis testing concepts seem to be very useful when considering distinguishing procedures in both theoretical and experimental settings. This seems to be confirmed by the increasing interest of the cryptology community in this kind of mathematical tool.
Acknowledgments

We would like to thank Thomas Baignères and the anonymous reviewers for useful and interesting comments.
References

[CHJ02] D. Coppersmith, S. Halevi, and C. Jutla. Cryptanalysis of stream ciphers with linear masking. In Advances in Cryptology – CRYPTO'02, volume 2442 of LNCS, pages 515–532. Springer-Verlag, 2002.
[DES77] National Bureau of Standards. Data Encryption Standard. U.S. Department of Commerce, 1977.
[FM01] S.R. Fluhrer and D.A. McGrew. Statistical analysis of the alleged RC4 keystream generator. In FSE'00, volume 1978 of LNCS, pages 19–30. Springer-Verlag, 2001.
[GM] J.D. Golić and R. Menicocci. Edit probability correlation attacks on stop/go clocked keystream generators. To appear in the Journal of Cryptology.
[HKM95] C. Harpes, G. Kramer, and J.L. Massey. A generalization of linear cryptanalysis and the applicability of Matsui's piling-up lemma. In Advances in Cryptology – EUROCRYPT'95, volume 921 of LNCS, pages 24–38. Springer-Verlag, 1995.
[Jun01] P. Junod. On the complexity of Matsui's attack. In Selected Areas in Cryptography, SAC'01, volume 2259 of LNCS, pages 199–211. Springer-Verlag, 2001.
[Jun03] P. Junod. On the optimality of linear, differential and sequential distinguishers. To appear in Advances in Cryptology – EUROCRYPT'03, LNCS. Springer-Verlag, 2003.
[KM01] L.R. Knudsen and J.E. Mathiassen. A chosen-plaintext linear attack on DES. In FSE'00, volume 1978 of LNCS, pages 262–272. Springer-Verlag, 2001.
[KR94] B.S. Kaliski and M.J.B. Robshaw. Linear cryptanalysis using multiple approximations. In Advances in Cryptology – CRYPTO'94, volume 839 of LNCS, pages 26–39. Springer-Verlag, 1994.
[Mat93] M. Matsui. Linear cryptanalysis method for DES cipher. In Advances in Cryptology – EUROCRYPT'93, volume 765 of LNCS, pages 386–397. Springer-Verlag, 1993.
[Mat94] M. Matsui. The first experimental cryptanalysis of the Data Encryption Standard. In Advances in Cryptology – CRYPTO'94, volume 839 of LNCS, pages 1–11. Springer-Verlag, 1994.
[Mir02] I. Mironov. (Not so) random shuffles of RC4. In Advances in Cryptology – CRYPTO'02, volume 2442 of LNCS, pages 304–319. Springer-Verlag, 2002.
[MPWW95] S. Murphy, F. Piper, M. Walker, and P. Wild. Likelihood estimation for block cipher keys. Technical report, Information Security Group, University of London, England, 1995.
[SK98] T. Shimoyama and T. Kaneko. Quadratic relation of S-box and its application to the linear attack of full round DES. In Advances in Cryptology – CRYPTO'98, volume 1462 of LNCS, pages 200–211. Springer-Verlag, 1998.
[Vau96] S. Vaudenay. An experiment on DES statistical cryptanalysis. In 3rd ACM Conference on Computer and Communications Security, pages 139–147. ACM Press, 1996.
Improving the Upper Bound on the Maximum Differential and the Maximum Linear Hull Probability for SPN Structures and AES

Sangwoo Park^1, Soo Hak Sung^2, Sangjin Lee^3, and Jongin Lim^3

^1 National Security Research Institute, Korea
[email protected]
^2 Department of Applied Mathematics, Pai Chai University, Korea
[email protected]
^3 Center for Information Security Technologies (CIST), Korea University, Korea
{sangjin,jilim}@cist.korea.ac.kr
Abstract. We present a new method for upper bounding the maximum differential probability and the maximum linear hull probability for 2 rounds of SPN structures. Our upper bound can be computed for any value of the branch number of the linear transformation, and it incorporates the distribution of the differential probability values and linear probability values of the S-boxes. Applying our method to AES, we obtain that the maximum differential probability and the maximum linear hull probability for 4 rounds of AES are bounded by 1.144 × 2^{−111} and 1.075 × 2^{−106}, respectively.
1
Introduction
Differential cryptanalysis [2] and linear cryptanalysis [12] are the most well-known methods for analysing the security of block ciphers. Accordingly, the designer of a block cipher should evaluate its security against differential and linear cryptanalysis and prove that it is sufficiently resistant against them. The SPN (Substitution-Permutation Network) structure is one of the most commonly used structures in block ciphers. It is based on Shannon's principles of confusion and diffusion [3], and these principles are implemented through the use of substitution and linear transformation, respectively. AES [6, 14], Crypton [11], and Square [5] are block ciphers composed of SPN structures. The security of SPN structures against differential and linear cryptanalysis depends on the maximum differential probability and the maximum linear hull probability. Hong et al. proved an upper bound on the maximum differential and the maximum linear hull probability for 2 rounds of SPN structures with a highly diffusive linear transformation [7]. Kang et al. generalized their result to any value of the branch number of the linear transformation [8]. In [10], Keliher et al. proposed a method for finding an upper bound on the maximum average linear hull probability for SPN structures. Application of
T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 247–260, 2003. © International Association for Cryptologic Research 2003
their method to AES yields an upper bound of 2^{−75} when 7 or more rounds are approximated. In [9], it was shown that the upper bound on the maximum average linear hull probability for AES can be improved to 2^{−92} when 9 or more rounds are approximated. In [15], Park et al. proposed a method for upper bounding the maximum differential probability and the maximum linear hull probability for Rijndael-like structures, a special case of SPN structures. By applying their method to AES, they obtained that the maximum differential probability and the maximum linear hull probability for 4 rounds of AES are both bounded by 1.06 × 2^{−96}. In this paper, we present a new method for upper bounding the maximum differential probability and the maximum linear hull probability for 2 rounds of SPN structures. Our upper bound can be computed for any value of the branch number of the linear transformation, and it incorporates the distribution of the differential probability values and linear probability values of the S-boxes. Applying our method to AES, we obtain that the maximum differential probability and the maximum linear hull probability for 4 rounds of AES are bounded by 1.144 × 2^{−111} and 1.075 × 2^{−106}, respectively.
2
Backgrounds
One round of an SPN structure generally consists of three layers: key addition, substitution, and linear transformation. In the key addition layer, the round subkey and the round input are exclusive-ored. The substitution layer is made up of n small nonlinear substitutions called S-boxes, and the linear transformation layer is a linear transformation used to diffuse the cryptographic characteristics of the substitution layer. A typical example of one round of an SPN structure is given in Figure 1.
Fig. 1. One round of SPN structure.
In an r-round SPN structure, the linear transformation of the last round is generally omitted, because it has no cryptographic significance. Two rounds of an SPN structure are therefore as given in Figure 2. The S-boxes and linear transformations must be invertible so that deciphering is possible; we therefore assume that all S-boxes are bijections from Z_2^m to itself. Moreover, throughout this paper, we assume that the round subkeys are independent and uniformly distributed.
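The three layers described above can be sketched concretely. The following code is ours, not from the paper; the parameters are arbitrary toy choices (four 4-bit S-boxes sharing one table, and a bit permutation as the linear transformation layer).

```python
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]  # an arbitrary bijective 4-bit S-box
PERM = [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]  # bit i of the
# substitution output goes to bit PERM[i] of the round output.

def round_fn(state, subkey):
    """One SPN round on a 16-bit state: key addition, substitution, linear layer."""
    state ^= subkey                                         # key addition layer
    nibbles = [(state >> (4 * i)) & 0xF for i in range(4)]
    state = sum(SBOX[v] << (4 * i) for i, v in enumerate(nibbles))  # S-box layer
    out = 0
    for i in range(16):                                     # bit-permutation layer
        out |= ((state >> i) & 1) << PERM[i]
    return out

print(hex(round_fn(0x0000, 0x0000)))  # -> 0xfff0 (all nibbles become SBOX[0] = 0xE)
```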
Fig. 2. 2 rounds of SPN structure.
Let S be an S-box with m input and output bits. The differential and linear probability of S are defined as follows.

Definition 1. For any given a, b, Γ_a, Γ_b ∈ Z_2^m, define the differential probability DP^S(a, b) and the linear probability LP^S(Γ_a, Γ_b) of S by

DP^S(a, b) = #{x ∈ Z_2^m | S(x) ⊕ S(x ⊕ a) = b} / 2^m

and

LP^S(Γ_a, Γ_b) = ( #{x ∈ Z_2^m | Γ_a · x = Γ_b · S(x)} / 2^{m−1} − 1 )^2,

respectively, where x · y denotes the parity (0 or 1) of the bitwise product of x and y. Here a and b are called the input and output differences, respectively, and Γ_a and Γ_b are called the input and output mask values, respectively.

The strength of an S-box S against differential cryptanalysis is determined by the maximum differential probability max_{a≠0, b} DP^S(a, b). The strength of S against linear cryptanalysis depends on the maximum linear probability max_{Γ_a, Γ_b≠0} LP^S(Γ_a, Γ_b).

Definition 2. The maximum differential probability p and the maximum linear probability q of S are defined by

p = max_{a≠0, b} DP^S(a, b)   and   q = max_{Γ_a, Γ_b≠0} LP^S(Γ_a, Γ_b),

respectively.

For a strong S-box S, both the maximum differential probability p and the maximum linear probability q should be small.
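To make Definitions 1 and 2 concrete, the following sketch (ours; the bijective 3-bit S-box is an arbitrary example, not from the paper) computes DP^S, LP^S, p, and q by exhaustive search.

```python
def parity(v):
    return bin(v).count("1") & 1

# An arbitrary bijective 3-bit S-box (m = 3), chosen for illustration only.
S = [0, 1, 3, 6, 7, 4, 5, 2]
m = 3

def dp(a, b):
    """DP^S(a, b) = #{x | S(x) XOR S(x XOR a) = b} / 2^m."""
    return sum(1 for x in range(2 ** m) if S[x] ^ S[x ^ a] == b) / 2 ** m

def lp(ga, gb):
    """LP^S(Ga, Gb) = (#{x | Ga.x = Gb.S(x)} / 2^(m-1) - 1)^2."""
    match = sum(1 for x in range(2 ** m) if parity(ga & x) == parity(gb & S[x]))
    return (match / 2 ** (m - 1) - 1) ** 2

p = max(dp(a, b) for a in range(1, 2 ** m) for b in range(2 ** m))
q = max(lp(ga, gb) for ga in range(2 ** m) for gb in range(1, 2 ** m))
print(p, q)
```

For any bijective S-box, each row of the difference distribution table sums to 1, so `sum(dp(a, b) for b in range(2 ** m)) == 1` for every fixed a.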
Definition 3. A differentially active S-box is an S-box given a nonzero input difference, and a linearly active S-box is an S-box given a nonzero output mask value.

Since all S-boxes in the substitution layer are bijective, a differentially (resp. linearly) active S-box also has a nonzero output difference (resp. input mask value). For SPN structures, there is a close relationship between the differential probability and the number of differentially active S-boxes: when the number of differentially active S-boxes is large, the differential probability becomes small, and when it is small, the differential probability becomes large. This observation led to the concept of the branch number [5]. The branch number from the viewpoint of differential cryptanalysis is the minimum number of differentially active S-boxes over 2 rounds of the SPN structure; the branch number from the viewpoint of linear cryptanalysis is the minimum number of linearly active S-boxes over 2 rounds.

The linear transformation L : (Z_2^m)^n → (Z_2^m)^n can be represented by an n × n matrix M = (m_ij), so that L(x) = Mx for x ∈ (Z_2^m)^n, where the addition is bitwise exclusive-or. For the block ciphers E2 [13] and Camellia [1], m_ij ∈ Z_2 and the multiplication is trivial. For the block cipher Crypton [11], m_ij ∈ Z_2^m and the multiplication is the bitwise logical-and operation. For the block cipher Rijndael [6], m_ij ∈ GF(2^m) and the multiplication is multiplication over GF(2^m). It is easy to show that L(x) ⊕ L(x*) = L(x ⊕ x*) and DP^L(a, L(a)) = 1 [4].

Definition 4. Let L be a linear transformation over (Z_2^m)^n. The branch number of L from the viewpoint of differential cryptanalysis, β_d, is defined by

β_d = min_{x ≠ 0} { wt(x) + wt(L(x)) }.

Throughout this paper, wt(x) = wt(x_1, x_2, ..., x_n) = #{1 ≤ i ≤ n | x_i ≠ 0} for x = (x_1, x_2, ..., x_n); if x ∈ Z_2^m, then wt(x) is the Hamming weight of x.

If m_ij ∈ Z_2, it can be proved that LP^L(M^t Γ_b, Γ_b) = 1, and hence LP^L(Γ_a, (M^{−1})^t Γ_a) = 1. Likewise, if m_ij ∈ GF(2^m), it can be proved that LP^L(Γ_a, CΓ_a) = 1 for some n × n matrix C over GF(2^m) [8]. Therefore, we can define the branch number β_l from the viewpoint of linear cryptanalysis as follows:

β_l = min_{Γ_a ≠ 0} { wt(Γ_a) + wt((M^{−1})^t Γ_a) }   if m_ij ∈ Z_2, 1 ≤ i, j ≤ n,
β_l = min_{Γ_a ≠ 0} { wt(Γ_a) + wt(CΓ_a) }            if m_ij ∈ GF(2^m), 1 ≤ i, j ≤ n.
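For small parameters, Definition 4 can be evaluated by brute force. The sketch below is ours, not from the paper; the 4 × 4 binary matrix is an arbitrary example over Z_2 (so m = 1 and each component x_i is a single bit).

```python
from itertools import product

def wt(v):
    """Number of nonzero components of the vector v."""
    return sum(1 for c in v if c != 0)

def branch_number(M):
    """beta_d = min over nonzero x of wt(x) + wt(Mx), with arithmetic over Z_2."""
    n = len(M)
    best = None
    for x in product([0, 1], repeat=n):
        if all(c == 0 for c in x):
            continue  # the minimum is taken over nonzero x only
        Lx = tuple(sum(M[i][j] & x[j] for j in range(n)) & 1 for i in range(n))
        s = wt(x) + wt(Lx)
        best = s if best is None else min(best, s)
    return best

# An arbitrary invertible 4x4 binary matrix (all-ones minus identity).
M = [[0, 1, 1, 1],
     [1, 0, 1, 1],
     [1, 1, 0, 1],
     [1, 1, 1, 0]]
print(branch_number(M))  # -> 4 (the maximum possible for n = 4 would be 5)
```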
3
Security of 2 Rounds of SPN Structures
In this section, we give an upper bound on the maximum differential probability for 2 rounds of SPN structure. We also give an upper bound on the maximum linear hull probability.
The following lemma can be considered as a generalized Cauchy-Schwarz inequality.

Lemma 1. Let {x_i^{(j)}}_{i=1}^n, 1 ≤ j ≤ m, be sequences of real numbers. Then

Σ_{i=1}^{n} |x_i^{(1)} x_i^{(2)} ··· x_i^{(m)}| ≤ ( Σ_{i=1}^{n} |x_i^{(1)}|^m )^{1/m} ( Σ_{i=1}^{n} |x_i^{(2)}|^m )^{1/m} ··· ( Σ_{i=1}^{n} |x_i^{(m)}|^m )^{1/m}.

Proof. We prove the result by induction on m. For m = 2, the result is the Cauchy-Schwarz inequality. Assume that the result holds for m − 1. By Hölder's inequality,

Σ_{i=1}^{n} |x_i^{(1)} ··· x_i^{(m−1)} x_i^{(m)}| ≤ ( Σ_{i=1}^{n} |x_i^{(1)} ··· x_i^{(m−1)}|^{m/(m−1)} )^{(m−1)/m} ( Σ_{i=1}^{n} |x_i^{(m)}|^m )^{1/m}.

By the induction hypothesis, applied to the sequences |x_i^{(j)}|^{m/(m−1)} for 1 ≤ j ≤ m − 1, the right-hand side is bounded by

( Σ_{i=1}^{n} |x_i^{(1)}|^m )^{1/m} ··· ( Σ_{i=1}^{n} |x_i^{(m−1)}|^m )^{1/m} ( Σ_{i=1}^{n} |x_i^{(m)}|^m )^{1/m}.

Thus the result is proved.

Since a geometric mean of nonnegative numbers is at most their maximum, Lemma 1 yields the following lemma.

Lemma 2. Let {x_i^{(j)}}_{i=1}^n, 1 ≤ j ≤ m, be sequences of real numbers. Then

Σ_{i=1}^{n} |x_i^{(1)} ··· x_i^{(m)}| ≤ max{ Σ_{i=1}^{n} |x_i^{(1)}|^m, ..., Σ_{i=1}^{n} |x_i^{(m)}|^m }.
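Lemma 2 is easy to sanity-check numerically. The sketch below is ours (the random sequences are arbitrary); it verifies the inequality on many random instances.

```python
import math
import random

def lemma2_holds(seqs):
    """Check sum_i |prod_j x_i^(j)|  <=  max_j sum_i |x_i^(j)|^m  (Lemma 2)."""
    m, n = len(seqs), len(seqs[0])
    lhs = sum(abs(math.prod(seqs[j][i] for j in range(m))) for i in range(n))
    rhs = max(sum(abs(x) ** m for x in seq) for seq in seqs)
    return lhs <= rhs + 1e-12  # tiny tolerance for floating-point rounding

random.seed(0)
for _ in range(1000):
    m, n = random.randint(2, 5), random.randint(1, 8)
    seqs = [[random.uniform(-2, 2) for _ in range(n)] for _ in range(m)]
    assert lemma2_holds(seqs)
print("Lemma 2 verified on 1000 random instances")
```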
Theorem 1. Let β_d be the branch number of the linear transformation L from the viewpoint of differential cryptanalysis. Then the maximum differential probability for 2 rounds of the SPN structure is bounded by

max{ max_{1≤i≤n} max_{1≤u≤2^m−1} Σ_{j=1}^{2^m−1} {DP^{S_i}(u, j)}^{β_d}, max_{1≤i≤n} max_{1≤u≤2^m−1} Σ_{j=1}^{2^m−1} {DP^{S_i}(j, u)}^{β_d} }.

Proof. Let a = (a_1, ..., a_n) and b = (b_1, ..., b_n) be the input and output differences, respectively, for 2 rounds of the SPN structure. Since DP^L(α, L(α)) = 1, the differential probability DP_2(a, b) is given by

DP_2(a, b) = Σ_x ( Π_{i=1}^{n} DP^{S_i}(a_i, x_i) ) ( Π_{j=1}^{n} DP^{S_j}(y_j, b_j) ),
where y = L(x), x = (x_1, ..., x_n), and y = (y_1, ..., y_n). Without loss of generality, we assume that

a_1 ≠ 0, ..., a_k ≠ 0, a_{k+1} = 0, ..., a_n = 0,   b_1 ≠ 0, ..., b_l ≠ 0, b_{l+1} = 0, ..., b_n = 0.

Note that if α = 0, β ≠ 0 or α ≠ 0, β = 0, then DP^{S_i}(α, β) = 0. Hence it is enough to consider, in the above summation, only those x (and y = L(x)) with

x_1 ≠ 0, ..., x_k ≠ 0, x_{k+1} = 0, ..., x_n = 0,   y_1 ≠ 0, ..., y_l ≠ 0, y_{l+1} = 0, ..., y_n = 0.

We index the solutions of this system by t = 1, ..., δ:

 t | x_1 ··· x_k                  | y_1 ··· y_l
 1 | x_1^{(1)} ··· x_1^{(k)}      | y_1^{(1)} ··· y_1^{(l)}
 2 | x_2^{(1)} ··· x_2^{(k)}      | y_2^{(1)} ··· y_2^{(l)}
 ⋮ |        ⋮                     |        ⋮
 δ | x_δ^{(1)} ··· x_δ^{(k)}      | y_δ^{(1)} ··· y_δ^{(l)}

Then DP_2(a, b) can be written as

DP_2(a, b) = Σ_{t=1}^{δ} ( Π_{i=1}^{k} DP^{S_i}(a_i, x_t^{(i)}) ) ( Π_{j=1}^{l} DP^{S_j}(y_t^{(j)}, b_j) ).

By the definition of the branch number, k + l ≥ β_d. We divide the proof into two cases: k + l = β_d and k + l > β_d.

(Case 1: k + l = β_d). In this case, for each i (1 ≤ i ≤ k), the values x_1^{(i)}, ..., x_δ^{(i)} are distinct, because L is linear and k + l = β_d. Indeed, if for some i the values x_1^{(i)}, ..., x_δ^{(i)} were not distinct, there would exist two distinct solutions x and x′ whose i-th components agree, so the i-th component of x ⊕ x′ would be zero. Since L(x) ⊕ L(x′) = L(x ⊕ x′), the nonzero vector x ⊕ x′ would then satisfy wt(x ⊕ x′) + wt(L(x ⊕ x′)) < β_d, contradicting the definition of the branch number. Similarly, for each j (1 ≤ j ≤ l), the values y_1^{(j)}, ..., y_δ^{(j)} are distinct. From Lemma 2 (applied with m = β_d = k + l factors), DP_2(a, b) is bounded by

max{ Σ_{t=1}^{δ} {DP^{S_1}(a_1, x_t^{(1)})}^{β_d}, ..., Σ_{t=1}^{δ} {DP^{S_k}(a_k, x_t^{(k)})}^{β_d}, Σ_{t=1}^{δ} {DP^{S_1}(y_t^{(1)}, b_1)}^{β_d}, ..., Σ_{t=1}^{δ} {DP^{S_l}(y_t^{(l)}, b_l)}^{β_d} }

≤ max{ max_{1≤i≤n} max_{1≤u≤2^m−1} Σ_{j=1}^{2^m−1} {DP^{S_i}(u, j)}^{β_d}, max_{1≤i≤n} max_{1≤u≤2^m−1} Σ_{j=1}^{2^m−1} {DP^{S_i}(j, u)}^{β_d} },

where the second inequality uses the distinctness of the x_t^{(i)} and the y_t^{(j)}.
(Case 2: k + l > β_d). In this case, the values x_1^{(i)}, ..., x_δ^{(i)} or y_1^{(j)}, ..., y_δ^{(j)} are not necessarily distinct. However, when we consider the subset of solutions in which k + l − β_d components are fixed (x_1 = i_1, ..., x_p = i_p, y_1 = j_1, ..., y_q = j_q), each of the other β_d components takes distinct values, where 0 ≤ p ≤ k − 1, 0 ≤ q ≤ l − 1, and p + q = k + l − β_d. We denote this subset of solutions by A_{i_1,...,i_p,j_1,...,j_q}; note that it may be empty. As in Case 1 (or by Lemma 2), we obtain

Σ_{(x,y) ∈ A_{i_1,...,i_p,j_1,...,j_q}} ( Π_{i=1}^{k} DP^{S_i}(a_i, x_i) ) ( Π_{j=1}^{l} DP^{S_j}(y_j, b_j) )

= DP^{S_1}(a_1, i_1) ··· DP^{S_p}(a_p, i_p) DP^{S_1}(j_1, b_1) ··· DP^{S_q}(j_q, b_q) × Σ_{(x,y) ∈ A_{i_1,...,i_p,j_1,...,j_q}} ( Π_{i=p+1}^{k} DP^{S_i}(a_i, x_i) ) ( Π_{j=q+1}^{l} DP^{S_j}(y_j, b_j) )

≤ DP^{S_1}(a_1, i_1) ··· DP^{S_p}(a_p, i_p) DP^{S_1}(j_1, b_1) ··· DP^{S_q}(j_q, b_q) × max{ max_{1≤i≤n} max_{1≤u≤2^m−1} Σ_{j=1}^{2^m−1} {DP^{S_i}(u, j)}^{β_d}, max_{1≤i≤n} max_{1≤u≤2^m−1} Σ_{j=1}^{2^m−1} {DP^{S_i}(j, u)}^{β_d} }

=: p_{i_1,...,i_p,j_1,...,j_q}.

Thus DP_2(a, b) is bounded by

Σ_{i_1=1}^{2^m−1} ··· Σ_{i_p=1}^{2^m−1} Σ_{j_1=1}^{2^m−1} ··· Σ_{j_q=1}^{2^m−1} p_{i_1,...,i_p,j_1,...,j_q} = max{ max_{1≤i≤n} max_{1≤u≤2^m−1} Σ_{j=1}^{2^m−1} {DP^{S_i}(u, j)}^{β_d}, max_{1≤i≤n} max_{1≤u≤2^m−1} Σ_{j=1}^{2^m−1} {DP^{S_i}(j, u)}^{β_d} },

since Σ_{i_1=1}^{2^m−1} DP^{S_1}(a_1, i_1) = 1, and likewise for the other fixed components.
From Cases 1 and 2, the result is proved.

Corollary 1. Let β_d be the branch number of the linear transformation L from the viewpoint of differential cryptanalysis. Then the maximum differential probability for 2 rounds of the SPN structure is bounded by p^{β_d − 1}, where p is the maximum differential probability of the S-boxes.
Proof. By Theorem 1, the maximum differential probability for 2 rounds of the SPN structure is bounded by

p^{β_d − 1} × max{ max_{1≤i≤n} max_{1≤u≤2^m−1} Σ_{j=1}^{2^m−1} DP^{S_i}(u, j), max_{1≤i≤n} max_{1≤u≤2^m−1} Σ_{j=1}^{2^m−1} DP^{S_i}(j, u) } = p^{β_d − 1},

since each of the inner sums equals 1.

Theorem 2. Let β_l be the branch number of the linear transformation L from the viewpoint of linear cryptanalysis. Then the maximum linear hull probability for 2 rounds of the SPN structure is bounded by

max{ max_{1≤i≤n} max_{1≤u≤2^m−1} Σ_{j=1}^{2^m−1} {LP^{S_i}(u, j)}^{β_l}, max_{1≤i≤n} max_{1≤u≤2^m−1} Σ_{j=1}^{2^m−1} {LP^{S_i}(j, u)}^{β_l} }.

Corollary 2. Let β_l be the branch number of the linear transformation L from the viewpoint of linear cryptanalysis. Then the maximum linear hull probability for 2 rounds of the SPN structure is bounded by q^{β_l − 1}, where q is the maximum linear probability of the S-boxes.

Hong et al. proved Corollaries 1 and 2 for β_d, β_l equal to n or n + 1 [7]. Kang et al. proved them for any value of the branch number of the linear transformation [8].
4
Security of AES
AES is a block cipher composed of SPN structures, and its linear transformation consists of the ShiftRows transformation and the MixColumns transformation. Let π : (Z_2^8)^16 → (Z_2^8)^16 be the ShiftRows transformation of AES, and let x = (x_1, x_2, x_3, x_4) = (x_11, x_12, x_13, x_14, x_21, ..., x_34, x_41, x_42, x_43, x_44) be the input of π. Figure 3 illustrates the ShiftRows transformation π of AES.
Fig. 3. ShiftRows transformation of AES.
Let y = (y_1, y_2, y_3, y_4) = (y_11, y_12, y_13, y_14, y_21, ..., y_34, y_41, y_42, y_43, y_44) be the output of π. It is easy to check that, for each i (i = 1, 2, 3, 4), each byte of y_i comes
from a different x_j. For example, for y_1 = (y_11, y_12, y_13, y_14) = (x_11, x_22, x_33, x_44), the byte x_11 comes from x_1, while x_22, x_33, and x_44 are bytes of x_2, x_3, and x_4, respectively.

The MixColumns transformation of AES operates on the state column by column, treating each column as a four-term polynomial. Let θ = (θ_1, θ_2, θ_3, θ_4) be the MixColumns transformation of AES, let y = (y_1, y_2, y_3, y_4) = (y_11, y_12, y_13, y_14, y_21, ..., y_34, y_41, y_42, y_43, y_44) be the input of θ, and let z = (z_1, z_2, z_3, z_4) = (z_11, z_12, z_13, z_14, z_21, ..., z_34, z_41, z_42, z_43, z_44) be the output of θ. Each θ_i can be written as a matrix multiplication:

( z_{i1} )   ( 02 03 01 01 )   ( y_{i1} )
( z_{i2} ) = ( 01 02 03 01 ) · ( y_{i2} )
( z_{i3} )   ( 01 01 02 03 )   ( y_{i3} )
( z_{i4} )   ( 03 01 01 02 )   ( y_{i4} )

In this matrix multiplication, the addition is bitwise exclusive-or and the multiplication is multiplication over GF(2^8). Each θ_i can be considered as a linear transformation, and the branch number of each θ_i is 5.

In [15], the upper bound on the maximum differential probability for 2 rounds of Rijndael-like structures is obtained as follows.

Definition 5. Rijndael-like structures are block ciphers composed of SPN structures satisfying the following:
(i) The linear transformation has the form (θ_1, θ_2, θ_3, θ_4) ∘ π.
(ii) (Condition on π) Each byte of y_i comes from a different x_j, where x = (x_1, x_2, x_3, x_4) is the input of π and y = (y_1, y_2, y_3, y_4) is the output of π.
(iii) (Condition on θ = (θ_1, θ_2, θ_3, θ_4)) When each θ_i is considered as a linear transformation, β_d^{θ_1} = β_d^{θ_2} = β_d^{θ_3} = β_d^{θ_4} and β_l^{θ_1} = β_l^{θ_2} = β_l^{θ_3} = β_l^{θ_4}.

Definition 6. For x = (x_1, ..., x_n), the pattern of x, γ_x, is defined by γ_x = (γ_1, ..., γ_n) ∈ Z_2^n, where γ_i = 0 if x_i = 0, and γ_i = 1 if x_i ≠ 0.

Theorem 3 ([15]).

DP_2(a, b) ≤ p^{wt(γ_{π(a)})(β_d − 1)}   if γ_{π(a)} = γ_b,   and   DP_2(a, b) = 0   otherwise.
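The claim that each θ_i has branch number 5 can be checked mechanically: a 4 × 4 matrix over GF(2^8) yields a map of branch number 5 (the maximum possible) exactly when the matrix is MDS, i.e. when every square submatrix is nonsingular. The sketch below is ours, not from the paper; it verifies this property for the MixColumns matrix.

```python
from itertools import combinations

def gf_mul(a, b):
    """Multiplication in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def det(M):
    """Determinant over GF(2^8) by cofactor expansion (signs vanish in char 2)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    d = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        d ^= gf_mul(M[0][j], det(minor))
    return d

MC = [[0x02, 0x03, 0x01, 0x01],
      [0x01, 0x02, 0x03, 0x01],
      [0x01, 0x01, 0x02, 0x03],
      [0x03, 0x01, 0x01, 0x02]]

def is_mds(M):
    """True iff every square submatrix of M is nonsingular over GF(2^8)."""
    n = len(M)
    for k in range(1, n + 1):
        for rows in combinations(range(n), k):
            for cols in combinations(range(n), k):
                sub = [[M[r][c] for c in cols] for r in rows]
                if det(sub) == 0:
                    return False
    return True

print(is_mds(MC))  # -> True, hence branch number 5 for each theta_i
```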
By Theorem 3, the upper bound on the maximum differential probability for 2 rounds of Rijndael-like structures is p^{β_d − 1}. Applying Theorem 3 to AES, one obtains that the maximum differential probability for 2 rounds of AES is bounded by 2^{−24}, because β_d = 5 and p = 2^{−6}. Note that this bound depends only on the maximum differential probability of the S-box.
By combining our result with Theorem 3, a new upper bound on the maximum differential probability for 2 rounds of AES can be obtained. We apply Theorem 1 to 2 rounds of AES. Let S be the S-box of AES. If a nonzero a ∈ Z_2^8 is fixed and b varies over Z_2^8, the distribution of the differential probability values DP^S(a, b) is independent of a; it is given in Table 1, where ρ_i is a differential probability value and π_i is the number of occurrences of ρ_i. If a nonzero b ∈ Z_2^8 is fixed and a varies over Z_2^8, the same distribution is obtained.

Table 1. The distribution of differential probability values for the AES S-box.

 i  |   1    |   2    |  3
ρ_i | 2^{−6} | 2^{−7} |  0
π_i |   1    |  126   | 129

From Theorem 1 and Table 1, we have

DP_2^{θ_i}(a, b) ≤ Σ_{j=1}^{255} {DP^S(1, j)}^5 = (2^{−6})^5 + 126 · (2^{−7})^5 ≈ 1.23 × 2^{−28}.
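Table 1 and the 1.23 × 2^{−28} bound are easy to reproduce. The sketch below is ours, for illustration: it rebuilds the AES S-box from the GF(2^8) inverse plus the affine map (as specified in FIPS 197), tabulates DP^S(1, ·), and evaluates the fifth-power sum.

```python
from collections import Counter

def gf_mul(a, b):
    """Multiplication in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def aes_sbox(x):
    inv = 1
    for _ in range(254):          # x^254 = x^(-1) in GF(2^8); maps 0 to 0
        inv = gf_mul(inv, x)
    s = inv
    for k in range(1, 5):         # affine layer: XOR of four left-rotations
        s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
    return s ^ 0x63

S = [aes_sbox(x) for x in range(256)]
assert S[0x00] == 0x63 and S[0x53] == 0xED  # sanity checks against FIPS 197

# Row of the difference distribution table for input difference a = 1.
counts = Counter(S[x] ^ S[x ^ 1] for x in range(256))
dp = {b: counts.get(b, 0) / 256 for b in range(256)}
dist = Counter(dp.values())
print(sorted(dist.items(), reverse=True))
# -> [(0.015625, 1), (0.0078125, 126), (0.0, 129)], i.e. Table 1

bound = sum(v ** 5 for v in dp.values())   # sum_j DP^S(1, j)^5 = 158 * 2^-35
print(bound / 2 ** -28)                    # -> 1.234375, i.e. about 1.23
```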
Theorem 4. When γ_{π(a)} = γ_b, the maximum differential probability for 2 rounds of AES satisfies

DP_2(a, b) ≤ (1.23 × 2^{−28})^{wt(γ_{π(a)})}.

In particular, the maximum differential probability for 2 rounds of AES is bounded by 1.23 × 2^{−28}.

To compute the upper bound on the maximum differential probability for 4 rounds of AES, we need the following notation:

– x^{(i)} = (x_1^{(i)}, ..., x_4^{(i)}) = (x_{11}^{(i)}, x_{12}^{(i)}, x_{13}^{(i)}, x_{14}^{(i)}, ..., x_{41}^{(i)}, x_{42}^{(i)}, x_{43}^{(i)}, x_{44}^{(i)}): the input of π at round i;
– y^{(i)} = (y_1^{(i)}, ..., y_4^{(i)}) = (y_{11}^{(i)}, ..., y_{44}^{(i)}): the output of π at round i, i.e. the input of θ at round i;
– z^{(i)} = (z_1^{(i)}, ..., z_4^{(i)}) = (z_{11}^{(i)}, ..., z_{44}^{(i)}): the output of θ at round i.

Theorem 5. The maximum differential probability for 4 rounds of AES is bounded by 1.144 × 2^{−111}.

Proof. We bound DP_4(a, b) according to the values of wt(γ_{π(a)}) and wt(b). Since β_d = 5, if wt(γ_{π(a)}) + wt(b) ≤ 4, then DP_4(a, b) = 0. Therefore it suffices to bound DP_4(a, b) when wt(γ_{π(a)}) + wt(b) ≥ 5.

(Case 1: wt(γ_{π(a)}) = 4). By Theorem 4,

DP_4(a, b) = Σ_{x^{(2)}} DP_2(a, x^{(2)}) DP_2(z^{(2)}, b) ≤ max_{x^{(2)}} DP_2(a, x^{(2)}) ≤ (1.23 × 2^{−28})^4 ≈ 1.144 × 2^{−111}.
(Case 2: wt(b) = 4). By Theorem 4,

DP_4(a, b) = Σ_{x^{(2)}} DP_2(a, x^{(2)}) DP_2(z^{(2)}, b) ≤ max_{z^{(2)}} DP_2(z^{(2)}, b) ≤ (1.23 × 2^{−28})^4 ≈ 1.144 × 2^{−111}.

(Case 3: wt(γ_{π(a)}) = 2 and wt(b) = 3). We assume that γ_{π(a)} = (1, 1, 0, 0) and γ_b = (1, 1, 1, 0). We can represent DP_4(a, b) as follows:

DP_4(a, b) = Σ_{x^{(2)}} DP_2(a, x^{(2)}) DP_2(z^{(2)}, b) = Σ_{i=1}^{4} Σ_{x^{(2)}, wt(z^{(2)}) = i} DP_2(a, x^{(2)}) DP_2(z^{(2)}, b) =: I + II + III + IV.

We know that wt(y_i^{(2)}) ≤ wt(x^{(2)}) = wt(γ_{π(a)}) = 2 and wt(z_i^{(2)}) = wt(x_i^{(3)}) ≤ wt(b) = 3. Since β_d^{θ_i} = 5, we obtain that wt(y_i^{(2)}) = 2 and wt(z_i^{(2)}) = 3, where y_i^{(2)} and z_i^{(2)} are the nonzero components of y^{(2)} and z^{(2)}, respectively. Note that y_i^{(2)} is the input difference of θ_i and z_i^{(2)} is the output difference of θ_i. Now we compute the value of I, which we can represent as follows:
I = Σ_{x^{(2)}, γ_{y^{(2)}} = (1,0,0,0)} DP_2(a, x^{(2)}) DP_2(z^{(2)}, b)
  + Σ_{x^{(2)}, γ_{y^{(2)}} = (0,1,0,0)} DP_2(a, x^{(2)}) DP_2(z^{(2)}, b)
  + Σ_{x^{(2)}, γ_{y^{(2)}} = (0,0,1,0)} DP_2(a, x^{(2)}) DP_2(z^{(2)}, b)
  + Σ_{x^{(2)}, γ_{y^{(2)}} = (0,0,0,1)} DP_2(a, x^{(2)}) DP_2(z^{(2)}, b)
  =: I_1 + I_2 + I_3 + I_4.

First, we compute the value of I_1. Since γ_{x^{(2)}} = γ_{π(a)} = (1, 1, 0, 0), γ_{y^{(2)}} = (1, 0, 0, 0), and wt(y_1^{(2)}) = 2, the definition of π gives x^{(2)} = (x_{11}^{(2)}, 0, 0, 0, 0, 0, 0, x_{24}^{(2)}, 0, 0, 0, 0, 0, 0, 0, 0). Furthermore, since γ_{z^{(2)}} = γ_{x^{(3)}}, γ_{z^{(3)}} = γ_{y^{(3)}} = γ_b = (1, 1, 1, 0), and wt(z_1^{(2)}) = 3, we obtain z^{(2)} = (z_{11}^{(2)}, z_{12}^{(2)}, z_{13}^{(2)}, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0). Here (x_{11}^{(2)}, 0, 0, x_{24}^{(2)}) and (z_{11}^{(2)}, z_{12}^{(2)}, z_{13}^{(2)}, 0) are the nonzero input and output differences of θ_1, respectively. Since β_d^{θ_1} = 5, each of x_{11}^{(2)}, x_{24}^{(2)}, z_{11}^{(2)}, z_{12}^{(2)}, z_{13}^{(2)} takes distinct values as the solution varies. Therefore, we can establish the following:
I_1 = Σ_{x^{(2)}, γ_{y^{(2)}} = (1,0,0,0)} DP_2^{θ_1}(a_1^*, (x_{11}^{(2)}, 0, 0, 0)) DP_2^{θ_2}(a_2^*, (0, 0, 0, x_{24}^{(2)})) DP_2(z^{(2)}, b)
    ≤ P^4 Σ_{x_{11}^{(2)}} DP_2^{θ_1}(a_1^*, (x_{11}^{(2)}, 0, 0, 0)),

where P = 1.23 × 2^{−28} is the upper bound on DP_2^{θ_i}(a, b). The upper bounds on I_2, I_3, and I_4 are obtained by the same method, so that

I ≤ P^4 ( Σ_{x_{11}^{(2)}} DP_2^{θ_1}(a_1^*, (x_{11}^{(2)}, 0, 0, 0)) + Σ_{x_{12}^{(2)}} DP_2^{θ_1}(a_1^*, (0, x_{12}^{(2)}, 0, 0)) + Σ_{x_{13}^{(2)}} DP_2^{θ_1}(a_1^*, (0, 0, x_{13}^{(2)}, 0)) + Σ_{x_{14}^{(2)}} DP_2^{θ_1}(a_1^*, (0, 0, 0, x_{14}^{(2)})) ).

Using the same method, we arrive at the following:

II ≤ P^4 Σ_{wt(x_1^{(2)}) = 2} DP_2^{θ_1}(a_1^*, x_1^{(2)}),
III ≤ P^4 Σ_{wt(x_1^{(2)}) = 3} DP_2^{θ_1}(a_1^*, x_1^{(2)}),
IV ≤ P^4 Σ_{wt(x_1^{(2)}) = 4} DP_2^{θ_1}(a_1^*, x_1^{(2)}).

Therefore,

DP_4(a, b) ≤ I + II + III + IV ≤ P^4 Σ_{x_1^{(2)} ≠ 0} DP_2^{θ_1}(a_1^*, x_1^{(2)}) = P^4 ≤ (1.23 × 2^{−28})^4 ≈ 1.144 × 2^{−111}.
(Case 4: wt(γ_{π(a)}) = 3 and wt(b) = 2). The proof is similar to that of Case 3, and we arrive at DP_4(a, b) ≤ (1.23 × 2^{−28})^4 ≈ 1.144 × 2^{−111}.

(Case 5: wt(γ_{π(a)}) = 3 and wt(b) = 3). Again the proof is similar to that of Case 3, and DP_4(a, b) ≤ (1.23 × 2^{−28})^4 ≈ 1.144 × 2^{−111}.

The distribution of the linear probability values LP^S(a, b) for the AES S-box is given in Table 2, where ρ_i is a linear probability value and φ_i is the number of occurrences of ρ_i. From Theorem 2 and Table 2, we have

LP_2^{θ_i}(a, b) ≤ Σ_{j=1}^{255} {LP^S(1, j)}^5 ≈ 1.44 × 2^{−27}.

Using the same method as in Theorem 5, we can compute the upper bound on the linear hull probability for 4 rounds of AES.
Table 2. The distribution of linear probability values for the AES S-box.

 i  |    1     |    2     |    3     |    4     |    5     |    6     |    7     |    8     |  9
ρ_i | (8/64)^2 | (7/64)^2 | (6/64)^2 | (5/64)^2 | (4/64)^2 | (3/64)^2 | (2/64)^2 | (1/64)^2 |  0
φ_i |    5     |    16    |    36    |    24    |    34    |    40    |    36    |    48    | 17
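Table 2 and the 1.44 × 2^{−27} bound can likewise be reproduced numerically. The sketch below is ours, for illustration; it rebuilds the AES S-box (as in FIPS 197), computes LP^S(1, Γ_b) for all Γ_b directly from Definition 1, and evaluates the fifth-power sum.

```python
def gf_mul(a, b):
    """Multiplication in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def aes_sbox(x):
    inv = 1
    for _ in range(254):          # x^254 = x^(-1) in GF(2^8); maps 0 to 0
        inv = gf_mul(inv, x)
    s = inv
    for k in range(1, 5):         # affine layer: XOR of four left-rotations
        s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
    return s ^ 0x63

S = [aes_sbox(x) for x in range(256)]

def parity(v):
    return bin(v).count("1") & 1

def lp(ga, gb):
    """LP^S(Ga, Gb) = (#{x | Ga.x = Gb.S(x)} / 2^7 - 1)^2."""
    match = sum(1 for x in range(256) if parity(ga & x) == parity(gb & S[x]))
    return (match / 128 - 1) ** 2

row = [lp(1, gb) for gb in range(256)]
print(max(row) == (8 / 64) ** 2)           # -> True: the largest LP value is (8/64)^2

bound = sum(v ** 5 for v in row)           # sum_j LP^S(1, j)^5
print(bound / 2 ** -27)                    # about 1.44
```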
Theorem 6. The maximum linear hull probability for 4 rounds of AES is bounded by (1.44 × 2^{−27})^4 ≈ 1.075 × 2^{−106}.

The differential probabilities for r (r ≥ 5) rounds of AES are smaller than or equal to the maximum differential probability for 4 rounds of AES, since

DP_5(a, b) = Σ_{x^{(4)}} DP_4(a, x^{(4)}) DP_1(z^{(4)}, b) ≤ max_{x^{(4)}} DP_4(a, x^{(4)}).

Therefore, the upper bound on the maximum differential probability in Theorem 5 also holds for r (r ≥ 5) rounds of AES. Similarly, the upper bound on the maximum linear hull probability for 4 rounds of AES in Theorem 6 also holds for r (r ≥ 5) rounds.
5
Conclusion
In this paper, we have obtained a new upper bound on the maximum differential probability and the maximum linear hull probability for 2 rounds of SPN structures. Our upper bound can be computed for any value of the branch number of the linear transformation. By applying this result to AES, we have proved that the maximum differential probability for 4 rounds of AES is bounded by 1.144 × 2^{−111}, and that the maximum linear hull probability for 4 rounds of AES is bounded by 1.075 × 2^{−106}.
References

1. Kazumaro Aoki, Tetsuya Ichikawa, Masayuki Kanda, Mitsuru Matsui, Shiho Moriai, Junko Nakajima, and Toshio Tokita. Camellia: A 128-bit block cipher suitable for multiple platforms - design and analysis. In Douglas R. Stinson and Stafford Tavares, editors, Selected Areas in Cryptography, volume 2012 of Lecture Notes in Computer Science, pages 39–56. Springer, 2000.
2. Eli Biham and Adi Shamir. Differential cryptanalysis of DES-like cryptosystems. Journal of Cryptology, 4(1):3–72, 1991.
3. C. E. Shannon. Communication theory of secrecy systems. Bell System Technical Journal, 28:656–715, October 1949.
4. Joan Daemen, René Govaerts, and Joos Vandewalle. Correlation matrices. In Bart Preneel, editor, Fast Software Encryption, Second International Workshop, volume 1008 of Lecture Notes in Computer Science, pages 275–285. Springer, 1994.
5. Joan Daemen, Lars R. Knudsen, and Vincent Rijmen. The block cipher Square. In Eli Biham, editor, Fast Software Encryption, 4th International Workshop, volume 1267 of Lecture Notes in Computer Science, pages 149–165. Springer, 1997.
6. Joan Daemen and Vincent Rijmen. Rijndael, AES proposal. http://www.nist.gov/aes, 1998.
7. Seokhie Hong, Sangjin Lee, Jongin Lim, Jaechul Sung, Donghyeon Cheon, and Inho Cho. Provable security against differential and linear cryptanalysis for the SPN structure. In Bruce Schneier, editor, Fast Software Encryption, 7th International Workshop, volume 1978 of Lecture Notes in Computer Science, pages 273–283. Springer, 2000.
8. Ju-Sung Kang, Seokhie Hong, Sangjin Lee, Okyeon Yi, Choonsik Park, and Jongin Lim. Practical and provable security against differential and linear cryptanalysis for substitution-permutation networks. ETRI Journal, 23(4):158–167, 2001.
9. Liam Keliher, Henk Meijer, and Stafford Tavares. Improving the upper bound on the maximum average linear hull probability for Rijndael. In Serge Vaudenay and Amr M. Youssef, editors, Selected Areas in Cryptography, 8th Annual International Workshop, volume 2259 of Lecture Notes in Computer Science, pages 112–128. Springer, 2001.
10. Liam Keliher, Henk Meijer, and Stafford Tavares. New method for upper bounding the maximum average linear hull probability for SPNs. In Birgit Pfitzmann, editor, Advances in Cryptology - Eurocrypt 2001, volume 2045 of Lecture Notes in Computer Science, pages 420–436. Springer-Verlag, Berlin, 2001.
11. Chae Hoon Lim. CRYPTON, AES proposal. http://www.nist.gov/aes, 1998.
12. Mitsuru Matsui. Linear cryptanalysis method for DES cipher. In Tor Helleseth, editor, Advances in Cryptology - Eurocrypt '93, volume 765 of Lecture Notes in Computer Science, pages 386–397. Springer-Verlag, Berlin, 1994.
13. NTT - Nippon Telegraph and Telephone Corporation. E2: Efficient Encryption algorithm, AES proposal. http://www.nist.gov/aes, 1998.
14. National Institute of Standards and Technology. FIPS PUB 197: Advanced Encryption Standard (AES), November 2001.
15. Sangwoo Park, Soo Hak Sung, Seongtaek Chee, E-Joong Yoon, and Jongin Lim. On the security of Rijndael-like structures against differential and linear cryptanalysis. In Yuliang Zheng, editor, Advances in Cryptology - Asiacrypt 2002, volume 2501 of Lecture Notes in Computer Science, pages 176–191. Springer, 2002.
Linear Approximations of Addition Modulo 2^n

Johan Wallén

Laboratory for Theoretical Computer Science, Helsinki University of Technology
P.O. Box 5400, FIN-02015 HUT, Espoo, Finland
[email protected]
Abstract. We present an in-depth algorithmic study of the linear approximations of addition modulo 2^n. Our results are based on a fairly simple classification of the linear approximations of the carry function. Using this classification, we derive a Θ(log n)-time algorithm for computing the correlation of any linear approximation of addition modulo 2^n and an optimal algorithm for generating all linear approximations with a given non-zero correlation coefficient, and we determine the distribution of the correlation coefficients. In the generation algorithms, one or two of the selection vectors can optionally be fixed. The algorithms are practical and easy to implement.
Keywords: Linear approximations, correlation, modular addition, linear cryptanalysis.
1
Introduction
Linear cryptanalysis [8] is one of the most powerful general cryptanalytic methods for block ciphers proposed to date. Since its introduction, resistance against this attack has been a standard design goal for block ciphers. Although some design methodologies for achieving this goal have been proposed (for example [12, 10, 4, 13]), many block ciphers are still designed in a rather ad hoc manner or are dictated by other primary design goals. For these ciphers, it is important to have efficient methods for evaluating their resistance against linear cryptanalysis. At the heart of linear cryptanalysis lies the study of the correlation of linear approximate relations between the input and output of functions. Good linear approximations of ciphers are usually found heuristically by forming trails consisting of linear approximations of the components of the cipher. In order to search the space of linear trails, e.g. using a branch-and-bound algorithm (see e.g. [5, 9, 1]), we need efficient methods for computing the correlation of linear approximations of the simplest components of the cipher, as well as methods for generating the relevant approximations of the components. Towards this goal, we study a few basic functions often used in block ciphers. Currently, block ciphers are usually built from local nonlinear mappings, global linear mappings, and arithmetic operations. The mixture of linear mappings and arithmetic operations seems fruitful, since these operations are suitable for software implementation, and their mixture is difficult to analyse mathematically.
T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 261–273, 2003. © International Association for Cryptologic Research 2003
While the latter property intuitively should make standard cryptanalysis intractable, it also makes it difficult to say something concrete about the security of the cipher. Perhaps the simplest arithmetic operations in wide use are addition and subtraction modulo 2^n. Interestingly, good tools for studying linear approximations of even these simple mappings have not appeared in the literature to date. In this paper, we consider algorithms for two important problems concerning linear approximations of these operations: computing the correlation of any given linear approximation, and generating all approximations with a correlation coefficient of a given absolute value. Our results are based on a fairly simple classification of the linear approximations of the carry function. Using this classification, we derive Θ(log n)-time algorithms for computing the correlation of linear approximations of addition and subtraction modulo 2^n in a standard RAM model of computation. The classification also gives optimal (that is, linear in the size of the output) algorithms for generating all linear approximations of addition or subtraction with a given non-zero correlation. In the generation algorithms, one or two of the selection vectors may optionally be fixed. As a simple corollary, we determine closed-form expressions for the distribution of the correlation coefficients. We hope that our results will facilitate advanced linear cryptanalysis of ciphers using modular arithmetic. Similar results with respect to differential cryptanalysis [2] are discussed in [7, 6]. The simpler case with one addend fixed is considered in [11] with respect to both linear and differential cryptanalysis. In the next section, we discuss linear approximations and some preliminary results. In Sect. 3, we derive our classification of the linear approximations of the carry function, and the corresponding results for addition and subtraction. Using this classification, we then present the Θ(log n)-time algorithm for computing the correlation of linear approximations in Sect. 4, and the generation algorithms in Sect. 5.
2 Preliminaries

2.1 Linear Approximations
Linear cryptanalysis [8] views (a part of) the cipher as a relation between the plaintext, the ciphertext and the key, and tries to approximate this relation using linear relations. The following standard terminology is convenient for discussing these linear approximations. Let f, g : F_2^n → F_2 be Boolean functions. The correlation between f and g is defined by

c(f, g) = 2^{1−n} |{x ∈ F_2^n | f(x) = g(x)}| − 1 .

This is simply the probability taken over x that f(x) = g(x), scaled to a value in [−1, 1]. Let u = (u_{m−1}, . . . , u_0)^t ∈ F_2^m and w = (w_{n−1}, . . . , w_0)^t ∈ F_2^n be binary column vectors, and let h : F_2^n → F_2^m. Let w · x = w_{n−1}x_{n−1} + · · · + w_1x_1 + w_0x_0 denote the standard dot product. Define the linear function l_w : F_2^n → F_2 by l_w(x) = w · x for all w ∈ F_2^n. A linear approximation of h is an approximate
Linear Approximations of Addition Modulo 2^n
relation of the form u · h(x) = w · x. Such a linear approximation will be denoted by the formal expression u ←h− w, or simply u ← w when h is clear from context. Its efficiency is measured by its correlation C(u ←h− w), defined by

C(u ←h− w) = c(l_u ∘ h, l_w) .

Here, u and w are the output and input selection vectors, respectively.

2.2 Fourier Analysis
There is a well-known Fourier-based framework for studying linear approximations [3]. Let f : F_2^n → F_2 be a Boolean function. The corresponding real-valued function f̂ : F_2^n → R is defined by f̂(x) = (−1)^{f(x)}. With this notation, c(f, g) = 2^{−n} Σ_{x∈F_2^n} f̂(x)ĝ(x). Note also that f + g ↔ f̂ĝ. Recall that an algebra A over a field F is a ring, such that A is a vector space over F, and a(xy) = (ax)y = x(ay) for all a ∈ F and x, y ∈ A.

Definition 1. Let B_n = ⟨f̂ | f : F_2^n → F_2⟩ be the real algebra generated by the n-variable Boolean functions. As usual, the addition, multiplication, and multiplication by scalars are given by (ξ + η)(x) = ξ(x) + η(x), (ξη)(x) = ξ(x)η(x) and (aξ)(x) = a(ξ(x)) for all ξ, η ∈ B_n and a ∈ R.

The algebra B_n is of course unital and commutative. The vector space B_n is turned into an inner product space by adopting the standard inner product for real-valued discrete functions. This inner product is defined by

⟨ξ, η⟩ = 2^{−n} Σ_{x∈F_2^n} (ξη)(x) ,  ∀ξ, η ∈ B_n .
For Boolean functions f, g : F_2^n → F_2, ⟨f̂, ĝ⟩ = c(f, g). Since the set of linear functions {l̂_w | w ∈ F_2^n} forms an orthonormal basis for B_n, every ξ ∈ B_n has a unique representation

ξ = Σ_{w∈F_2^n} α_w l̂_w ,  where α_w = ⟨ξ, l̂_w⟩ ∈ R .
The corresponding Fourier transform F : B_n → B_n is given by F(ξ) = Ξ, where Ξ is the mapping w ↦ ⟨ξ, l̂_w⟩. This is usually called the Walsh–Hadamard transform of ξ. For a Boolean function f : F_2^n → F_2, the Fourier transform F̂ = F(f̂) simply gives the correlations between f and the linear functions: F̂(w) = c(f, l_w). For ξ, η ∈ B_n, their convolution ξ ⊗ η ∈ B_n is given by

(ξ ⊗ η)(x) = Σ_{t∈F_2^n} ξ(x + t)η(t) .
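As a quick sanity check of this framework, the following Python sketch (ours, not from the paper) computes the Walsh–Hadamard transform of f̂ by brute force and confirms that it yields the correlations c(f, l_w) for a small example:

```python
def correlations(f, n):
    """Walsh-Hadamard transform of f-hat: returns c(f, l_w) for all w."""
    def parity(x):
        return bin(x).count("1") & 1
    size = 1 << n
    f_hat = [(-1) ** f(x) for x in range(size)]
    # c(f, l_w) = 2^-n * sum_x f_hat(x) * (-1)^(w.x)
    return [sum(f_hat[x] * (-1) ** parity(w & x) for x in range(size)) / size
            for w in range(size)]

# Example: f(x) = x0 AND x1 on F_2^2; the transform gives all four
# correlations with the linear functions l_w at once.
and2 = lambda x: (x & 1) & ((x >> 1) & 1)
print(correlations(and2, 2))  # -> [0.5, 0.5, 0.5, -0.5]
```

The output illustrates a fact used repeatedly below: the two-bit AND (which is exactly the first non-trivial carry bit) has correlation of magnitude 1/2 with every linear function.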
Clearly, B_n is a commutative, unital real algebra also under convolution as multiplication. The unity is the function δ such that δ(0) = 1 and δ(x) = 0 for x ≠ 0. As usual, the Fourier transform is an algebra isomorphism between the commutative, unital real algebras ⟨B_n, +, ·⟩ and ⟨B_n, +, ⊗⟩. Let f : F_2^n → F_2^m be a Boolean function. Since the correlation of a linear approximation of f is given by C(u ←f− w) = F(ĝ)(w), where g = l_u ∘ f, the correlation of linear approximations can conveniently be studied using the Fourier transform. Since l_u ∘ f can be expressed as Σ_{i:u_i=1} f_i, where f_i denotes the ith component of f, we have the convolutional representation

C(u ←f− w) = ( ⊗_{i:u_i=1} F̂_i )(w) ,

where F̂_i = F(f̂_i). Especially when using the convolutional representation, it will be convenient to consider C(u ←f− w) as a function of w with u fixed.
3 Linear Approximations of Addition Modulo 2^n

3.1 k-Independent Recurrences
We will take a slightly abstract approach to deriving algorithms for studying linear approximations of addition modulo 2^n, since this approach might turn out to be useful also for some related mappings. The key to the algorithms is a certain class of k-independent recurrences. The name comes from the fact that they will be used to express the correlation of linear approximations of functions whose ith output bit is independent of the (i + k)th input bit and higher. We let e_i ∈ F_2^n denote the vector whose ith component is 1 and whose other components are 0. If x ∈ F_2^n, x̄ denotes the component-wise complement of x: x̄_i = x_i + 1. Let eq : F_2^n × F_2^n → F_2^n be defined by eq(x, y)_i = 1 if and only if x_i = y_i. That is, eq(x, y) is the complement of x + y. For x, y ∈ F_2^n, we let xy = (x_{n−1}y_{n−1}, . . . , x_1y_1, x_0y_0)^t denote their component-wise product.

Definition 2. A function f : F_2^n × F_2^n → R is k-independent if f(x, y) = 0 whenever x_j ≠ 0 or y_j ≠ 0 for some j ≥ k. Let r_0, r : F_2^n × F_2^n → R be k-independent functions. A recurrence R_i = R_i^{r_0,r} is k-independent if it has the form R_0(x, y) = r_0(x, y), and

R_{i+1}(x, y) = (1/2) ( r(x^{i+k}, y) + r(x, y^{i+k}) + R_i(x, y) − R_i(x^{i+k}, y^{i+k}) )

for i ≥ 0, where we for compactness have denoted z^{i+k} = z + e_{i+k}. Note that R_j is a (k + j)-independent function for all j.

Note that k-independent recurrences can be efficiently computed, provided that we can efficiently compute the base cases r and r_0. The crucial observation is
that at most one of the terms in the expression for R_{i+1} is non-zero, and that we can determine which of the four terms might be non-zero by looking only at x_{i+k} and y_{i+k}. The four terms consider the cases (x_{i+k}, y_{i+k}) = (1, 0), (0, 1), (0, 0), and (1, 1), respectively. This observation yields the following lemma.

Lemma 1. Let R_i = R_i^{r_0,r} be a k-independent recurrence. Then R_0(x, y) = r_0(x, y), and

R_{i+1}(x, y) = (1/2) r(x ē_{i+k}, y ē_{i+k})                    if x_{i+k} ≠ y_{i+k}, and
R_{i+1}(x, y) = (1/2) (−1)^{x_{i+k}} R_i(x ē_{i+k}, y ē_{i+k})   if x_{i+k} = y_{i+k}.
It turns out that the k-independent recurrences of interest can be solved by finding a certain type of common prefix of the arguments. Towards this end, we define the common prefix mask of a vector.

Definition 3. The common prefix mask cpm_i^k : F_2^n → F_2^n is defined by cpm_i^k(x)_j = 1 if and only if k ≤ j < k + i and x_ℓ = 1 for all ℓ with j < ℓ < k + i.

Let w_H(x) = |{i | x_i ≠ 0}| denote the Hamming weight of x ∈ F_2^n.

Lemma 2. Let R_i = R_i^{r_0,r} be a k-independent recurrence. Denote r_1 = r, and let z = cpm_i^k(eq(x, y)), ℓ = w_H(z) and s = (−1)^{w_H(zxy)}. Let b = 0 if xz = yz, and let b = 1 otherwise. Then R_i(x, y) = s · 2^{−ℓ} r_b(x z̄, y z̄).

Proof. For i = 0, cpm_0^k(eq(x, y)) = 0, ℓ = 0, s = 1, and b = 0. Thus, the lemma holds for i = 0, so consider i + 1. Let x′ = x ē_{i+k}, y′ = y ē_{i+k}, z′ = cpm_i^k(eq(x′, y′)), ℓ′ = w_H(z′), s′ = (−1)^{w_H(z′x′y′)}, and b′ = 0 if x′z′ = y′z′ and b′ = 1 otherwise. By Lemma 1, there are two cases to consider. If x_{i+k} ≠ y_{i+k}, then z = e_{i+k}, ℓ = 1, s = 1, and b = 1. In this case s · 2^{−ℓ} r_b(x z̄, y z̄) = (1/2) r(x ē_{i+k}, y ē_{i+k}) = R_{i+1}(x, y). If x_{i+k} = y_{i+k}, then z = e_{i+k} + z′, ℓ = ℓ′ + 1, s = s′(−1)^{x_{i+k}}, and b = b′. In this case, s · 2^{−ℓ} r_b(x z̄, y z̄) = (1/2)(−1)^{x_{i+k}} · s′ 2^{−ℓ′} r_{b′}(x′ z̄′, y′ z̄′) = (1/2)(−1)^{x_{i+k}} R_i(x′, y′) = R_{i+1}(x, y).

We will next consider the convolution of k-independent recurrences.

Lemma 3. Let R_i = R_i^{δ,δ} be a 0-independent recurrence, and let f : F_2^n × F_2^n → R be k-independent. Define S_i = R_{i+k} ⊗ f, s = f, and s_0 = R_k ⊗ f. Then S_i = S_i^{s_0,s} is a k-independent recurrence.

Proof. Clearly, s_0 and s are k-independent. Furthermore, S_0 = R_k ⊗ f = s_0 by definition. Finally, 2S_{i+1}(x, y) = 2(R_{(i+k)+1} ⊗ f)(x, y) = ((δ(x^{i+k}, y) + δ(x, y^{i+k}) + R_{i+k}(x, y) − R_{i+k}(x^{i+k}, y^{i+k})) ⊗ f)(x, y) = f(x^{i+k}, y) + f(x, y^{i+k}) + (R_{i+k} ⊗ f)(x, y) − (R_{i+k} ⊗ f)(x^{i+k}, y^{i+k}), where we have used the notation z^{i+k} = z + e_{i+k}.
3.2 Linear Approximations of the Carry Function
In this subsection, we derive a classification of the linear approximations of the carry function modulo 2^n. It will turn out that the correlation of arbitrary linear approximations of the carry function can be expressed as a recurrence of the type studied in the previous subsection. We will identify the vectors in F_2^n and the elements of Z_{2^n} using the natural correspondence (x_{n−1}, . . . , x_1, x_0)^t ∈ F_2^n ↔ x_{n−1}2^{n−1} + · · · + x_1 2^1 + x_0 2^0 ∈ Z_{2^n}. To avoid confusion, we sometimes use ⊕ and ⊞ to denote addition in F_2^n and Z_{2^n}, respectively.

Definition 4. Let carry : F_2^n × F_2^n → F_2^n be the carry function for addition modulo 2^n, defined by carry(x, y) = x ⊕ y ⊕ (x ⊞ y), and let c_i = carry_i denote the ith component of the carry function for i = 0, . . . , n − 1.

Note that the ith component of the carry function can be recursively computed as c_0(x, y) = 0, and c_{i+1}(x, y) = 1 if and only if at least two of x_i, y_i and c_i(x, y) are 1. By considering the 8 possible values of x_i, y_i and c_i(x, y), we see that
ĉ_0(x, y) = 1 and ĉ_{i+1}(x, y) = (1/2) ( (−1)^{x_i} + (−1)^{y_i} + ĉ_i(x, y) − (−1)^{x_i+y_i} ĉ_i(x, y) ) .

Thus we have

Lemma 4. The Fourier transform Ĉ_i = F(ĉ_i) of the carry function is given by the recurrence Ĉ_0(v, w) = δ(v, w), and

Ĉ_{i+1}(v, w) = (1/2) ( δ(v + e_i, w) + δ(v, w + e_i) + Ĉ_i(v, w) − Ĉ_i(v + e_i, w + e_i) ) ,

for i = 0, . . . , n − 1. Note that this indeed is a 0-independent recurrence.

In the sequel, we will need a convenient notation for stripping off ones from the high end of vectors.

Definition 5. Let x ∈ F_2^n and ℓ ∈ {0, . . . , n}. Define strip(x) to be the vector in F_2^n that results when the highest component that is 1 in x (if any) is set to 0. By convention, strip(0) = 0. Similarly, let strip(ℓ, x) denote the vector that results when all but the ℓ lowest ones in x have been set to zero. For example, strip(2, 1011101) = 0000101.

Let u ∈ F_2^n and let {i | u_i = 1} = {k_1, . . . , k_m} with k_ℓ < k_{ℓ+1}. Define j_0 = 0 and j_{ℓ+1} = k_{ℓ+1} − k_ℓ for ℓ = 0, . . . , m − 1. Then

C(u ←carry− v, w) = ( ⊗_{i:u_i=1} Ĉ_i )(v, w) = ( ⊗_{i=1}^{m} Ĉ_{k_i} )(v, w) .

Define a sequence of recurrences S_{0,i}, . . . , S_{m,i} by S_{0,i} = δ, and

S_{ℓ+1,i} = Ĉ_{i+k_ℓ} ⊗ S_{ℓ,j_ℓ} ,
for ℓ = 0, . . . , m − 1. The crucial observation is that

S_{ℓ,j_ℓ}(v, w) = C(strip(ℓ, u) ←carry− v, w)

for all ℓ. Thus, C(u ←carry− v, w) = S_{m,j_m}(v, w).

Lemma 5. Let S_{ℓ,i}, j_ℓ, and k_ℓ be as above, where k_0 = 0. Define s_ℓ and s′_ℓ by s_1 = s′_1 = δ, and for ℓ > 0 by s_{ℓ+1} = S_{ℓ,j_ℓ} and s′_{ℓ+1} = s_ℓ. Then S_{ℓ,i} = S_{ℓ,i}^{s′_ℓ, s_ℓ} is a k_{ℓ−1}-independent recurrence for all ℓ > 0.

Proof. For ℓ = 1, the result is clear. If S_{ℓ,i} = S_{ℓ,i}^{s′_ℓ, s_ℓ} is a k_{ℓ−1}-independent recurrence for some ℓ ≥ 1, then S_{ℓ,j_ℓ} is a (j_ℓ + k_{ℓ−1}) = k_ℓ-independent function. By Lemma 3, S_{ℓ+1,i} = S_{ℓ+1,i}^{f_0,f} is a k_ℓ-independent recurrence with f = S_{ℓ,j_ℓ} = s_{ℓ+1} and f_0 = Ĉ_{k_ℓ} ⊗ S_{ℓ,j_ℓ} = Ĉ_{k_ℓ} ⊗ (Ĉ_{j_ℓ+k_{ℓ−1}} ⊗ S_{ℓ−1,j_{ℓ−1}}) = Ĉ_{k_ℓ} ⊗ Ĉ_{k_ℓ} ⊗ S_{ℓ−1,j_{ℓ−1}} = S_{ℓ−1,j_{ℓ−1}} = s′_{ℓ+1}.

For any function f, we let f^0 denote the identity function and f^{i+1} = f ∘ f^i. Lemmas 2 and 5 now give

Lemma 6. The correlation of any linear approximation of the carry function is given recursively as follows. First, C(0 ←carry− v, w) = δ(v, w). Second, if u ≠ 0, let j ∈ {0, . . . , n − 1} be maximal such that u_j = 1. If strip(u) ≠ 0, let k be maximal such that strip(u)_k = 1. Otherwise, let k = 0. Denote i = j − k. Let z = cpm_i^k(eq(v, w)), ℓ = w_H(z), and s = (−1)^{w_H(zvw)}. If vz = wz, set b = 2. Set b = 1 otherwise. Then

C(u ←carry− v, w) = s · 2^{−ℓ} C(strip^b(u) ←carry− v z̄, w z̄) .

Our next goal is to extract all the common prefix masks computed in the previous lemma, and combine them into a single common prefix mask depending on u. This gives a more convenient formulation of the previous lemma.

Definition 6. The common prefix mask cpm : F_2^n × F_2^n → F_2^n is defined recursively as follows. First, cpm(0, y) = 0. Second, if x ≠ 0, let j be maximal such that x_j = 1. If strip(x) ≠ 0, let k be maximal such that strip(x)_k = 1. Otherwise, let k = 0. Denote i = j − k and z = cpm_i^k(y). If zy = z, set b = 2. Set b = 1 otherwise. Then

cpm(x, y) = cpm_i^k(y) + cpm(strip^b(x), y) .

Theorem 1. Let u, v, w ∈ F_2^n, and let z = cpm(u, eq(v, w)). Then

C(u ←carry− v, w) = 0 if v z̄ ≠ 0 or w z̄ ≠ 0, and
C(u ←carry− v, w) = (−1)^{w_H(vw)} · 2^{−w_H(z)} otherwise.

Since the only nonlinear part of addition modulo 2^n is the carry function, it should be no surprise that the linear properties of addition completely reduce to those of the carry function. Subtraction is also straightforward. When we are approximating the relation x ⊟ y = z by u ←⊟− v, w, we are actually approximating the relation z ⊞ y = x by v ←⊞− u, w. With this observation, it is trivial to prove
Lemma 7. Let u, v, w ∈ F_2^n. The correlations of linear approximations of addition and subtraction modulo 2^n are given by

C(u ←⊞− v, w) = C(u ←carry− v + u, w + u) , and
C(u ←⊟− v, w) = C(v ←carry− u + v, w + v) .

Moreover, the mappings (u, v, w) ↦ (u, v + u, w + u) and (u, v, w) ↦ (v, u + v, w + v) are permutations of (F_2^n)^3.
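Both Lemma 7 and the form of the correlations in Theorem 1 are easy to check exhaustively for small n. The following Python sketch (our illustration, not from the paper) brute-forces the correlations of carry and of addition modulo 2^n over all selection vectors:

```python
def parity(x):
    return bin(x).count("1") & 1

def corr(outfun, u, v, w, n):
    """Correlation of the approximation u.outfun(x, y) = v.x + w.y."""
    total = 0
    for x in range(1 << n):
        for y in range(1 << n):
            bit = parity(u & outfun(x, y)) ^ parity(v & x) ^ parity(w & y)
            total += 1 - 2 * bit                     # (-1)^bit
    return total / float(1 << (2 * n))

n = 3
mask = (1 << n) - 1
carry = lambda x, y: x ^ y ^ ((x + y) & mask)        # Definition 4
add = lambda x, y: (x + y) & mask

for u in range(1 << n):
    for v in range(1 << n):
        for w in range(1 << n):
            # Lemma 7: addition reduces to the carry function
            assert corr(add, u, v, w, n) == corr(carry, u, v ^ u, w ^ u, n)
            # Theorem 1: every non-zero correlation of carry is a signed
            # power of 1/2 with sign (-1)^{wH(vw)}
            c = corr(carry, u, v, w, n)
            if c != 0:
                assert abs(c) in (1.0, 0.5, 0.25, 0.125)
                assert (c > 0) == (parity(v & w) == 0)
print("Lemma 7 and Theorem 1 verified exhaustively for n =", n)
```

All values involved are dyadic rationals, so the floating-point comparisons above are exact.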
4 The Common Prefix Mask

4.1 RAM Model

We will use a standard RAM model of computation consisting of n-bit memory cells, logical and arithmetic operations, and conditional branches. Specifically, we will use bitwise and (∧), or (∨), exclusive or (⊕) and negation (¬), logical shifts (≪ and ≫), and addition and subtraction modulo 2^n (⊞ and ⊟). As a notational convenience, we will allow our algorithms to return values of the form s2^{−k}, where s ∈ {0, 1, −1}. In our RAM model, this can be handled by returning s and k in two registers.

4.2 Computing cpm
To make the domain of cpm clear, we write cpm_n = cpm : F_2^n × F_2^n → F_2^n. We will extend the definition of cpm to a 3-parameter version.

Definition 7. Let cpm_n : {0, 1} × F_2^n × F_2^n → F_2^n be defined by cpm_n(b, x, y) = (z_{n−1}, . . . , z_0)^t, where z = cpm_{n+1}((b, x)^t, (0, y)^t).

Lemma 8 (Splitting lemma). Let n = k + ℓ with k, ℓ > 0. For any vector x ∈ F_2^n, let x^L ∈ F_2^k and x^R ∈ F_2^ℓ be such that x = (x^L, x^R)^t. Then

cpm_n(x, y) = (cpm_k(x^L, y^L), cpm_ℓ(b, x^R, y^R))^t ,

where b = x̄_0^L if (y_0^L, cpm_k(x^L, y^L)_0) = (1, 1), and b = x_0^L otherwise.

Proof. Let w = w_H(x^L) and z^L = cpm_k(x^L, y^L). If w = 0, the result is trivial. If w = 1 and x_0^L = 1, then b = 1 and the result holds. If w = 1 and x_0^L = 0, then b = 1 if and only if z_0^L = 1 and y_0^L = 1. If w = 2 and x_0^L = 1, then b = 0 if and only if z_0^L = 1 and y_0^L = 1. Finally, if w = 2 and x_0^L = 0, or w > 2, the result follows by induction.

Using this lemma, we can easily come up with a Θ(log n)-time algorithm for computing cpm_n(x, y). For simplicity, we assume that n is a power of two (if not, the arguments can be padded with zeros). The basic idea is to compute both cpm_n(0, x, y) and cpm_n(1, x, y) by splitting the arguments in halves, recursively computing the masks for the halves in parallel in a bit-sliced manner, and then combining the correct upper halves with the correct lower halves using the splitting lemma. Applying this idea bottom-up gives the following algorithm.
Theorem 2. Let n be a power of 2, let α^{(i)} ∈ F_2^n consist of blocks of 2^i ones and 2^i zeros starting from the least significant end (e.g. α^{(1)} = 0011···0011), and let x, y ∈ F_2^n. The following algorithm computes cpm(x, y) using Θ(log n) time and constant space in addition to the Θ(log n) space used for the constants α^{(i)}.

1. Initialise β = 1010···1010, z_0 = 00···00, and z_1 = 11···11.
2. For i = 0, . . . , log_2 n − 1, do
   (a) Let γ_b = (((y ∧ z_b) ∧ x̄) ∨ (¬(y ∧ z_b) ∧ x)) ∧ β for b ∈ {0, 1}.
   (b) Set γ_b ← γ_b ⊟ (γ_b ≫ 2^i) for b ∈ {0, 1}.
   (c) Let t_b = (z_b ∧ ᾱ^{(i)}) ∨ (z_0 ∧ γ̄_b ∧ α^{(i)}) ∨ (z_1 ∧ γ_b) for b ∈ {0, 1}.
   (d) Set z_b ← t_b for b ∈ {0, 1}.
   (e) Set β ← (β ≪ 2^i) ∧ ᾱ^{(i+1)}.
3. Return z_0.

Note that α^{(i)} and the values of β used in the algorithm only depend on n. For convenience, we introduce the following notation. Let β^{(i)} ∈ F_2^n be such that β^{(i)}_ℓ = 1 iff ℓ − 2^i is a non-negative multiple of 2^{i+1} (e.g. β^{(1)} = 0100···01000100). For b ∈ {0, 1}, let

z^{(i)}(b, x, y) = (cpm_{2^i}(b, x^{(n/2^i−1)}, y^{(n/2^i−1)}), . . . , cpm_{2^i}(b, x^{(0)}, y^{(0)}))^t ,

where x = (x^{(n/2^i−1)}, . . . , x^{(0)})^t and y = (y^{(n/2^i−1)}, . . . , y^{(0)})^t are split into 2^i-bit blocks. We also let x → y, z denote the function "if x then y else z". That is, x → y, z = (x ∧ y) ∨ (x̄ ∧ z).

Proof (of Theorem 2). The algorithm clearly terminates in time Θ(log n) and uses constant space in addition to the masks α^{(i)}. The initial value of β can also be constructed in logarithmic time. We show by induction on i that β = β^{(i)} and z_b = z^{(i)}(b, x, y) at the start of the ith iteration of the for-loop. For i = 0, this clearly holds, so let i ≥ 0. Consider the vectors x, y and z_b split into 2^{i+1}-bit blocks, and let x′, y′, and z′_b denote one of these blocks. After step 2a, γ_{b,ℓ} = (y_ℓ ∧ z_{b,ℓ}) → x̄_ℓ, x_ℓ when ℓ − 2^i is a multiple of 2^{i+1}, and γ_{b,ℓ} = 0 otherwise. Let ξ denote the bit of γ_b corresponding to the middle bit of the block under consideration. By induction and the splitting lemma, cpm(b, x′, y′) = (cpm(b, x′^L, y′^L), cpm(ξ, x′^R, y′^R))^t. After step 2b, a block of the form ξ00···0 in γ_b has been transformed to a block of the form 0ξξ···ξ. In step 2c, the upper half of each block z′_b is combined with the corresponding lower half of the block z′_ξ to give t_b = cpm(b, x′, y′). That is, t_b = z^{(i+1)}(b, x, y). Finally, β = β^{(i+1)} after step 2e. Since the Hamming weight can be computed in time O(log n), we have the following corollary.
Corollary 1. Let u, v, w ∈ F_2^n. The correlation coefficients C(u ←⊞− v, w) and C(u ←⊟− v, w) can be computed in time Θ(log n) (using the algorithm in Theorem 2 and the expressions in Theorem 1 and Lemma 7).
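The mask constants used by the algorithm of Theorem 2 are straightforward to generate. The following Python sketch (ours, for illustration) builds α^(i) and the initial β for a given word size n:

```python
def alpha(i, n):
    """alpha^(i): blocks of 2^i ones and 2^i zeros, least significant block ones."""
    block = (1 << (1 << i)) - 1        # 2^i consecutive ones
    period = 1 << (i + 1)              # the pattern repeats every 2^(i+1) bits
    m = 0
    for pos in range(0, n, period):
        m |= block << pos
    return m & ((1 << n) - 1)

def beta_init(n):
    """Initial beta = 1010...1010: ones at the odd bit positions."""
    return alpha(0, n) << 1

n = 8
# alpha^(0) = 0x55, alpha^(1) = 0x33, alpha^(2) = 0x0F, beta = 0xAA for n = 8
print([hex(alpha(i, n)) for i in range(3)], hex(beta_init(n)))
```

These are the familiar "magic masks" of bit-sliced programming; precomputing the log n of them is exactly the Θ(log n) extra space mentioned in Theorem 2.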
5 Generating Approximations
In this section, we derive a recursive description of the linear approximations u ←carry− v, w with a given non-zero correlation coefficient. For simplicity, we only consider the absolute values of the correlation coefficients. The recursive description immediately gives optimal generation algorithms for the linear approximations. By Theorem 1, the magnitude of C(u ←carry− v, w) is either zero or a power of 1/2. Thus, we start by considering the set of vectors (u, v, w) ∈ (F_2^n)^3 such that C(u ←carry− v, w) = ±2^{−k}.

We will use the splitting lemma to determine the approximations u ←carry− v, w with non-zero correlation and w_H(cpm_n(u, eq(v, w))) = k. Note that

cpm_n(x, y) = (cpm_{n−1}(x^L, y^L), cpm_1(b, x_0, y_0))^t ,

where b = x̄_0^L if (y_0^L, cpm_{n−1}(x^L, y^L)_0) = (1, 1), and b = x_0^L otherwise. Now, cpm_1(b, x_0, y_0) = 1 iff b = 1 iff either x_0^L = 1 and (y_0^L, cpm_{n−1}(x^L, y^L)_0) ≠ (1, 1), or x_0^L = 0 and (y_0^L, cpm_{n−1}(x^L, y^L)_0) = (1, 1). Let the {0, 1}-valued function b_n(x, y) = 1 iff either x_0 = 1 and (y_0, cpm_n(x, y)_0) ≠ (1, 1), or x_0 = 0 and (y_0, cpm_n(x, y)_0) = (1, 1). Let

F(n, k) = {(u, v, w) ∈ (F_2^n)^3 | C(u ←carry− v, w) = ±2^{−k}, b_n(u, eq(v, w)) = 1} , and
G(n, k) = {(u, v, w) ∈ (F_2^n)^3 | C(u ←carry− v, w) = ±2^{−k}, b_n(u, eq(v, w)) = 0} .

Let A(n, k) = {(u, v, w) ∈ (F_2^n)^3 | C(u ←carry− v, w) = ±2^{−k}}. Then A(n, k) is formed from F(n − 1, k − 1) and G(n − 1, k) by appending any three bits to the approximations in F(n − 1, k − 1) (since u_0 and eq(v, w)_0 are arbitrary, and cpm_n(u, eq(v, w))_0 = 1) and by appending {(0, 0, 0), (1, 0, 0)} to the approximations in G(n − 1, k) (since u_0 is arbitrary and cpm_n(u, eq(v, w))_0 = 0). Let S = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}, T = {(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)}, and denote y = eq(v, w). We denote concatenation simply by juxtaposition. The set F(n, k) can be divided into two cases.

1. The vectors with w_H(cpm_{n−1}(u^L, y^L)) = k, b_{n−1}(u^L, y^L) = 0, and b_n(u, y) = 1. Since (u_0, y_0) ∈ {(1, 0), (1, 1)} and cpm_n(u, y)_0 = 0, this set equals G(n − 1, k)(1, 0, 0).
2. The vectors with w_H(cpm_{n−1}(u^L, y^L)) = k − 1, b_{n−1}(u^L, y^L) = 1, and b_n(u, y) = 1. Since (u_0, y_0) ∈ {(0, 1), (1, 0)} and cpm_n(u, y)_0 = 1, this set equals F(n − 1, k − 1)S.

That is, F(n, k) = G(n − 1, k)(1, 0, 0) ∪ F(n − 1, k − 1)S. Clearly, F(1, 0) = {(1, 0, 0)} and F(n, k) = ∅ when k < 0 or k ≥ n. Similarly, G(n, k) can be divided into two cases:

1. The vectors with w_H(cpm_{n−1}(u^L, y^L)) = k, b_{n−1}(u^L, y^L) = 0, and b_n(u, y) = 0. Since (u_0, y_0) ∈ {(0, 0), (0, 1)} and cpm_n(u, y)_0 = 0, this set equals G(n − 1, k)(0, 0, 0).
2. The vectors with w_H(cpm_{n−1}(u^L, y^L)) = k − 1, b_{n−1}(u^L, y^L) = 1, and b_n(u, y) = 0. Since (u_0, y_0) ∈ {(0, 0), (1, 1)} and cpm_n(u, y)_0 = 1, this set equals F(n − 1, k − 1)T.
That is, G(n, k) = G(n − 1, k)(0, 0, 0) ∪ F(n − 1, k − 1)T. Clearly, G(1, 0) = {(0, 0, 0)} and G(n, k) = ∅ when k < 0 or k ≥ n.

Theorem 3. Let A(n, k) = {(u, v, w) ∈ (F_2^n)^3 | C(u ←carry− v, w) = ±2^{−k}}. Then

A(n, k) = F(n − 1, k − 1)(F_2 × F_2 × F_2) ∪ G(n − 1, k){(0, 0, 0), (1, 0, 0)} ,

where F and G are as follows. Let S = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)} and T = {(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)}. First, F(1, 0) = {(1, 0, 0)}, G(1, 0) = {(0, 0, 0)}, and F(n, k) = G(n, k) = ∅ when k < 0 or k ≥ n. Second, when 0 ≤ k < n,

F(n, k) = G(n − 1, k)(1, 0, 0) ∪ F(n − 1, k − 1)S , and
G(n, k) = G(n − 1, k)(0, 0, 0) ∪ F(n − 1, k − 1)T .

Here, juxtaposition denotes concatenation.

From this theorem, it can be seen that there are 8(n − 1) linear approximations u ←carry− v, w with correlation ±1/2. In the notation of formal languages, these are the 8 approximations of the form

0^{n−2}1a ←carry− 0^{n−2}0b, 0^{n−2}0c

for arbitrary a, b, c ∈ {0, 1}, and the 8(n − 2) approximations of the form

0^{n−i−3}1d0^i g ←carry− 0^{n−i−3}0e0^i 0, 0^{n−i−3}0f0^i 0

for (d, e, f) ∈ {(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)}, g ∈ {0, 1} and i ∈ {0, . . . , n − 3}.

The recursive description in Theorem 3 can easily be used to generate all linear approximations with a given correlation. The straightforward algorithm uses O(n) space and is linear-time in the number of generated approximations. Clearly, this immediately generalises to the case where one or two of the selection vectors are fixed. By Lemma 7, this also generalises to addition and subtraction modulo 2^n.

Corollary 2. The set of linear approximations with correlation ±2^{−k} of the carry function, addition, or subtraction modulo 2^n can be generated in optimal time (that is, linear in the size of the output) and O(n) space in the RAM model (by straightforward application of the recurrence in Theorem 3 and the expressions in Lemma 7). Moreover, one or two of the selection vectors can optionally be fixed.

Theorem 3 can also be used to determine the distribution of the correlation coefficients.
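The recursion of Theorem 3 is straightforward to implement. The following Python sketch (ours, not the author's reference implementation) materialises F, G and A for small n, encoding each (u, v, w) as a triple of integers with bits appended at the least significant end, and cross-checks the generated sets against an exhaustive correlation computation:

```python
from functools import lru_cache
from itertools import product

S = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
T = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]

def ext(triples, bits):
    # append a low bit triple (a, b, c) to every (u, v, w)
    return {(2 * u + a, 2 * v + b, 2 * w + c)
            for (u, v, w) in triples for (a, b, c) in bits}

@lru_cache(maxsize=None)
def F(n, k):
    if k < 0 or k >= n:
        return frozenset()
    if n == 1:
        return frozenset({(1, 0, 0)})
    return frozenset(ext(G(n - 1, k), [(1, 0, 0)]) | ext(F(n - 1, k - 1), S))

@lru_cache(maxsize=None)
def G(n, k):
    if k < 0 or k >= n:
        return frozenset()
    if n == 1:
        return frozenset({(0, 0, 0)})
    return frozenset(ext(G(n - 1, k), [(0, 0, 0)]) | ext(F(n - 1, k - 1), T))

def A(n, k):
    # Theorem 3: A(n, k) from F(n-1, k-1) and G(n-1, k)
    return (ext(F(n - 1, k - 1), list(product((0, 1), repeat=3)))
            | ext(G(n - 1, k), [(0, 0, 0), (1, 0, 0)]))

def parity(x):
    return bin(x).count("1") & 1

def brute_A(n, k):
    # exhaustive search: all (u, v, w) with |C(u <-carry- v, w)| = 2^-k
    mask, size, out = (1 << n) - 1, 1 << n, set()
    for u in range(size):
        for v in range(size):
            for w in range(size):
                t = 0
                for x in range(size):
                    for y in range(size):
                        bit = (parity(u & (x ^ y ^ ((x + y) & mask)))
                               ^ parity(v & x) ^ parity(w & y))
                        t += 1 - 2 * bit
                if abs(t) << k == 1 << (2 * n):
                    out.add((u, v, w))
    return out

n = 3
for k in range(n):
    assert A(n, k) == brute_A(n, k)
print("Theorem 3 generation matches exhaustive search for n =", n)
```

Fixing one or two selection vectors, as in Corollary 2, would simply filter the appended bit triples at each level of the recursion instead of enumerating all of them.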
Corollary 3. Let N(n, k) = |{(u, v, w) ∈ (F_2^n)^3 | C(u ←carry− v, w) = ±2^{−k}}|. Then

N(n, k) = 2^{2k+1} binom(n−1, k)

for all 0 ≤ k < n, and N(n, k) = 0 otherwise. Thus, the number of linear approximations with non-zero correlation is 2 · 5^{n−1}.

Proof. Based on Theorem 3, it is easy to see that

N(n, k) = 0                                    if k < 0 or k ≥ n,
N(n, k) = 2                                    if n = 1 and k = 0, and
N(n, k) = 4N(n − 1, k − 1) + N(n − 1, k)       otherwise.

The claim clearly holds for n = 1. By induction, N(n, k) = 4N(n − 1, k − 1) + N(n − 1, k) = 4 · 2^{2(k−1)+1} binom(n−2, k−1) + 2^{2k+1} binom(n−2, k) = 2^{2k+1} binom(n−1, k). Finally, Σ_{k=0}^{n−1} N(n, k) = 2 Σ_{k=0}^{n−1} binom(n−1, k) 4^k = 2 · 5^{n−1}.

If we let X be a random variable with the distribution

Pr[X = k] = Pr_{u,v,w} [ −log_2 |C(u ←carry− v, w)| = k | C(u ←carry− v, w) ≠ 0 ] ,

we see that

Pr[X = k] = binom(n−1, k) (4/5)^k (1/5)^{n−1−k}

for all 0 ≤ k < n, since 2 · 5^{n−1} binom(n−1, k) (4/5)^k (1/5)^{n−1−k} = 2^{2k+1} binom(n−1, k). Thus, X is binomially distributed with mean (4/5)(n − 1) and variance (4/25)(n − 1).
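As a quick numerical check (ours, not part of the paper), the recurrence from the proof of Corollary 3 and the closed form can be compared directly:

```python
from math import comb

def N_rec(n, k):
    """Recurrence for N(n, k) from the proof of Corollary 3."""
    if k < 0 or k >= n:
        return 0
    if n == 1:
        return 2
    return 4 * N_rec(n - 1, k - 1) + N_rec(n - 1, k)

def N_closed(n, k):
    """Closed form: N(n, k) = 2^(2k+1) * binom(n-1, k)."""
    return 2 ** (2 * k + 1) * comb(n - 1, k) if 0 <= k < n else 0

for n in range(1, 9):
    for k in range(-1, n + 1):
        assert N_rec(n, k) == N_closed(n, k)
    # total number of approximations with non-zero correlation
    assert sum(N_rec(n, k) for k in range(n)) == 2 * 5 ** (n - 1)
print("Corollary 3 verified up to n = 8")
```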
6 Conclusions
In this paper, we have considered improved algorithms for several combinatorial problems related to linear approximations of addition modulo 2^n. Our approach might seem unnecessarily complicated considering the surprising simplicity of the results (especially Theorem 3), but it should lead to natural generalisations to other recursively defined functions. These generalisations and applications to block ciphers are, however, left to later papers. A reference implementation of the algorithms is available from the author.
Acknowledgements

This work was supported by the Finnish Defence Forces Research Institute of Technology.
References 1. Kazumaro Aoki, Kunio Kobayashi, and Shiho Moriai. Best differential characteristic search for FEAL. In Fast Software Encryption 1997, volume 1267 of LNCS, pages 41–53. Springer-Verlag, 1997. 2. Eli Biham and Adi Shamir. Differential Cryptanalysis of the Data Encryption Standard. Springer-Verlag, 1993. 3. Florent Chabaud and Serge Vaudenay. Links between differential and linear cryptanalysis. In Advances in Cryptology–Eurocrypt 1994, volume 950 of LNCS, pages 356–365. Springer-Verlag, 1995. 4. Joan Daemen. Cipher and Hash Function Design: Methods Based on Linear and Differential Cryptanalysis. PhD thesis, Katholieke Universiteit Leuven, March 1995. 5. E.L. Lawler and D.E. Wood. Branch-and-bound methods: a survey. Operations Research, 14(4):699–719, 1966. 6. Helger Lipmaa. On differential properties of Pseudo-Hadamard transform and related mappings. In Progress in Cryptology–Indocrypt 2002, volume 2551 of LNCS, pages 48–61. Springer-Verlag, 2002. 7. Helger Lipmaa and Shiho Moriai. Efficient algorithms for computing differential properties of addition. In Fast Software Encryption 2001, volume 2355 of LNCS, pages 336–350. Springer-Verlag, 2002. 8. Mitsuru Matsui. Linear cryptanalysis method for DES cipher. In Advances in Cryptology–Eurocrypt 1993, volume 765 of LNCS, pages 386–397. Springer-Verlag, 1993. 9. Mitsuru Matsui. On correlation between the order of S-boxes and the strength of DES. In Advances in Cryptology–Eurocrypt 1994, volume 950 of LNCS, pages 366–375. Springer-Verlag, 1995. 10. Mitsuru Matsui. New structure of block ciphers with provable security against differential and linear cryptanalysis. In Fast Software Encryption 1996, volume 1039 of LNCS, pages 205–218. Springer-Verlag, 1996. 11. Hiroshi Miyano. Addend dependency of differential/linear probability of addition. IEICE Trans. Fundamentals, E81-A(1):106–109, 1998. 12. Kaisa Nyberg. Linear approximations of block ciphers. 
In Advances in Cryptology–Eurocrypt 1994, volume 950 of LNCS, pages 439–444. Springer-Verlag, 1995. 13. Serge Vaudenay. Provable security for block ciphers by decorrelation. In STACS 1998, volume 1373 of LNCS, pages 249–275. Springer-Verlag, 1998.
Block Ciphers and Systems of Quadratic Equations

Alex Biryukov and Christophe De Cannière

Katholieke Universiteit Leuven, Dept. ESAT/SCD-COSIC, Kasteelpark Arenberg 10, B–3001 Leuven-Heverlee, Belgium
{alex.biryukov,christophe.decanniere}@esat.kuleuven.ac.be

Abstract. In this paper we compare systems of multivariate polynomials, which completely define the block ciphers Khazad, Misty1, Kasumi, Camellia, Rijndael and Serpent, in view of the potential danger of an algebraic re-linearization attack.

Keywords: Block ciphers, multivariate quadratic equations, linearization, Khazad, Misty, Camellia, Rijndael, Serpent.
1 Introduction
Cryptanalysis of block ciphers has received much attention from the cryptographic community in the last decade, and as a result several powerful methods of analysis (for example, differential and linear attacks) have emerged. What most of these methods have in common is an attempt to push statistical patterns through as many iterations (rounds) of the cipher as possible, in order to measure non-random behavior at the output, and thus to distinguish a cipher from a truly random permutation. A new generation of block ciphers (among them the Advanced Encryption Standard (AES) Rijndael) was constructed with these techniques in mind and is thus not vulnerable to (at least a straightforward application of) these attacks. The task of designing ciphers immune to these statistical attacks is made easier by the fact that the complexity of the attacks grows exponentially with the number of rounds of a cipher. This ensures that the data and the time requirements of the attacks quickly become impractical. A totally different generic approach is studied in a number of recent papers [5, 7], which attempt to exploit the simple algebraic structure of Rijndael. These papers present two related ways of constructing simple algebraic equations that completely describe Rijndael. The starting point is the fact that the only non-linear element of the AES cryptosystem, the S-box, is based on an inverse
The work described in this paper has been supported in part by the Commission of the European Communities through the IST Programme under Contract IST-199912324 and by the Concerted Research Action (GOA) Mefisto. F.W.O. Research Assistant, sponsored by the Fund for Scientific Research – Flanders (Belgium)
T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 274–289, 2003. © International Association for Cryptologic Research 2003
function (chosen for its optimal differential and linear properties). This allows one to find a small set of quadratic multivariate polynomials in input and output bits that completely define the S-box. Combining these equations, an attacker can easily write a small set of sparse quadratic equations (in terms of intermediate variables) that completely define the whole block cipher. Building on recent progress in re-linearization techniques [4, 8], which provide sub-exponential algorithms for solving over-defined systems of quadratic (or just low-degree) equations, Courtois and Pieprzyk [5] argue that a method called XSL might provide a way to effectively solve this type of equations and recover the key from a few plaintext-ciphertext pairs. The claimed attack method differs in several respects from the standard statistical approaches to cryptanalysis: (a) it requires only a few known-plaintext queries; (b) its complexity does not seem to grow exponentially with the number of rounds of a cipher. However, no practical attack of this type has been demonstrated so far, even on a small-scale example. Research on such attacks is still at a very early stage, the exact complexity of this method is not completely understood, and many questions concerning its applicability remain to be answered. In this paper we will not try to derive a full attack or calculate complexities. Our intention is merely to compare the expected susceptibility of different block ciphers to a hypothetical algebraic attack over GF(2) and GF(2^8). For this purpose we will construct systems of equations for the 128-bit key ciphers¹ Khazad [3], Misty1 [9], Kasumi [10], Camellia-128 [2], Rijndael-128 [6] and Serpent-128 [1] and compute some properties that might influence the complexity of solving them.
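To illustrate why an inverse-based S-box admits such simple quadratic equations (a sketch of ours, not taken from [5] or [7]): if y is the multiplicative inverse of x in GF(2^8), then x·y = 1 for x ≠ 0, and since field multiplication distributes over ⊕, each bit of x·y is a bilinear form in the bits of x and y. The following Python snippet checks this numerically using the AES field polynomial x^8 + x^4 + x^3 + x + 1:

```python
def gf_mul(a, b):
    """Multiplication in GF(2^8) modulo the AES polynomial 0x11B."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1B        # reduce by x^8 + x^4 + x^3 + x + 1
        b >>= 1
    return p

# Multiplication distributes over XOR, so each output bit of x*y is
# bilinear in the input bits: (x1 ^ x2)*y == x1*y ^ x2*y.
assert all(gf_mul(x1 ^ x2, y) == gf_mul(x1, y) ^ gf_mul(x2, y)
           for x1 in (0x03, 0x57) for x2 in (0x11, 0xF0) for y in (0x02, 0xCA))

# Every non-zero x has an inverse y with x*y = 1; the relation x*y = 1
# then yields 8 quadratic equations in the 16 bits of x and y.
inv = {x: next(y for y in range(1, 256) if gf_mul(x, y) == 1)
       for x in range(1, 256)}
assert all(gf_mul(x, inv[x]) == 1 for x in inv)
print("GF(2^8) inverse relation verified")
```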
2 Constructing Systems of Equations
The problem we are faced with is to build a system of multivariate polynomials which relates the key bits with one or two (in the case of the 64-bit block ciphers) plaintext-ciphertext pairs and which is as simple as possible. The main issue here is that we have to define what we understand by simple. Since we do not know the most efficient way of solving such systems of equations, our simplicity criterion will be based on some intuitive assumptions:

1. Minimize the total number of free terms (free monomials). This is the number of terms that remain linearly independent when considering the system as a linear system in monomials. For example, adding two linearly independent equations which introduce only one new monomial will reduce the number of free terms by one. In order to achieve this, we will try to:
   (a) Minimize the degree of the equations. This reduces the total number of possible monomials.
   (b) Minimize the difference between the total number of terms and the total number of (linearly independent) equations. This is motivated by the fact that each equation can be used to eliminate a term.

¹ For ciphers which allow different key sizes, we will denote the 128-bit key version by appending "128" to the name of the cipher.
2. Minimize the size of individual equations. This criterion arises from the observation that sparse systems are usually easier to solve. Note that point 1 already assures the "global sparseness" of the system and that point 2 adds some local sparseness where possible.

Another criterion, which is used in [8] and [4], is to minimize the ratio between the total number of terms and the number of equations. This is equivalent to the criterion above when the system involves all terms up to a certain degree (as would be the case for a random quadratic system, for example). We believe, however, that this criterion is less natural in cases where the number of terms can be reduced, which is the case for the systems considered in this paper.

The most straightforward way of constructing a system of equations for a block cipher is to derive equations for each individual component and to insert them into a single system. In the next subsections we will briefly discuss the contribution of each component.
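As a toy illustration of criterion 1 (our own example, not taken from the paper): treating each monomial as an independent unknown, the number of free terms is the number of distinct monomials minus the number of linearly independent equations.

```python
# Hypothetical toy GF(2) system in the monomials {x, y, z, xy}:
#   x + y + xy = 0
#   y + z + xy = 0
monomials = ["x", "y", "z", "xy"]
system = [{"x", "y", "xy"}, {"y", "z", "xy"}]

def gf2_rank(rows, monomials):
    """Rank of the system viewed as linear in its monomials."""
    pivots = []
    for row in rows:
        vec = [1 if m in row else 0 for m in monomials]
        for p in pivots:
            if vec[p.index(1)]:          # cancel the pivot's leading column
                vec = [a ^ b for a, b in zip(vec, p)]
        if any(vec):
            pivots.append(vec)
    return len(pivots)

free_terms = len(monomials) - gf2_rank(system, monomials)
print(free_terms)  # 4 monomials - 2 independent equations = 2 free terms
```

Adding a new independent equation that introduces only one new monomial leaves this difference one smaller, exactly as the criterion above describes.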
2.1 S-Boxes
In most block ciphers, the S-boxes are the only source of nonlinearity and the equations describing them will be the main obstacle that prevents the system from being easily solved. For any S-box of practical size, one can easily generate a basis of linearly independent multivariate polynomials that spans the space of all possible equations between the input and the output bits. This is illustrated for a small example in Appendix A.1. In this space we would like to find a set of equations that is as simple as possible (according to our criterion), but still completely defines the S-box. In some cases, this optimal set of equations might be an over-defined system2 . Performing an exhaustive search over all possible sets of equations is infeasible, even for small S-boxes. In this paper, we will therefore restrict our search to systems consisting only of equations from the basis. It appears that this restriction still produces sufficiently simple systems for small S-boxes, although the results rapidly deteriorate when the size of the S-boxes increases. Fortunately, many large S-boxes used in practice are derived from simple algebraic functions, and this usually directly leads to simple polynomial systems (see Sect. 3.2, for example). Nothing guarantees however that these systems are optimal and the results derived in this paper should therefore be considered as an approximation only. An efficient way of finding optimal systems for arbitrary S-boxes is still an interesting open problem. 2
In this paper, we do not consider "over-definedness" to be a criterion in itself. The reason is that it is not clear whether an over-defined system with many free terms should be preferred over a smaller, defined system with fewer free terms. We note, however, that the systems of all S-boxes studied below can easily be made over-defined, should the solving algorithm require it.
Block Ciphers and Systems of Quadratic Equations
2.2 FL-Blocks
Both Misty1 and Camellia-128 contain an additional nonlinear component called the FL-block. It is a function of an input word X and a key word K, defined as

  YR = XR ⊕ [(XL ∩ KL) ≪ s]     (1)
  YL = XL ⊕ (YR ∪ KR)           (2)

with X, Y and K 2w-bit words. The constant s is 0 for Misty1 and 1 for Camellia-128, and the word size w is 16 for Misty1 and 32 for Camellia-128. The definition above translates directly into a system of quadratic equations over GF(2):

  yR,i = xR,i + xL,j · kL,j                      (3)
  yL,i = xL,i + yR,i + kR,i + yR,i · kR,i        (4)

for

  0 ≤ i < w                                     (5)
  j = i − s mod w                               (6)
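The bit-level system can be cross-checked mechanically against the word-level definition (1)-(2). A sketch (our own test code, using the Camellia parameters w = 32, s = 1; note that bit i of the rotated AND involves bit j = i − s mod w of XL and KL):

```python
import random

W, S = 32, 1                    # Camellia-128 parameters; Misty1 uses W=16, S=0
MASK = (1 << W) - 1

def rotl(v, s):
    """Rotate a W-bit word left by s positions."""
    return ((v << s) | (v >> (W - s))) & MASK if s else v

def fl(xl, xr, kl, kr):
    yr = xr ^ rotl(xl & kl, S)  # equation (1)
    yl = xl ^ (yr | kr)         # equation (2)
    return yl, yr

def bit(v, i):
    return (v >> i) & 1

rng = random.Random(2003)
for _ in range(200):
    xl, xr, kl, kr = (rng.getrandbits(W) for _ in range(4))
    yl, yr = fl(xl, xr, kl, kr)
    for i in range(W):
        j = (i - S) % W
        # Over GF(2), AND is a product and OR expands as a + b + a*b.
        assert bit(yr, i) == bit(xr, i) ^ (bit(xl, j) & bit(kl, j))
        assert bit(yl, i) == (bit(xl, i) ^ bit(yr, i) ^ bit(kr, i)
                              ^ (bit(yr, i) & bit(kr, i)))
print("FL bit equations verified")
```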
What should be remembered is that the 2w nonlinear equations of this system contain 6w linear and 2w quadratic terms.

2.3 Linear Layers
The linear layers of a block cipher consist of both linear diffusion layers and key additions. Writing a system of equations for these layers is very straightforward. The number of equations and the number of terms in such a linear system are fixed and the only property we can somewhat control is the density of the individual equations, for example by combining equations that have many terms in common. In cases where a linear layer only consists of bit permutations, we will not insert separate equations, but just rename the variables accordingly.
3 Comparison of Block Ciphers
The block ciphers we intend to compare are Khazad, Misty1, Kasumi, Camellia-128, Rijndael-128 and Serpent-128. The characteristics of these ciphers are summarized in Table 1 and Table 2. For each of them we will construct a system of equations as described in the previous section, and compute the total number of terms and the number of equations. The final results are listed in Table 3.
Table 1. Characteristics of the ciphers (without key schedule).

                   Khazad   Misty1       Camellia-128  Rijndael-128  Serpent-128
Block size         64 bit   64 bit       128 bit       128 bit       128 bit
Key size           128 bit  128 bit      128 bit       128 bit       128 bit
Rounds             8        8            18            10            32
S-boxes            384/64   24 + 48      144           160           1024
S-box width        4/8 bit  7 and 9 bit  8 bit         8 bit         4 bit
Nonlinear order a  3/7      3 and 2      7             7             2
FL-blocks          -        10           4             -             -
FL-block width     -        32 bit       64 bit        -             -

a The nonlinear order of an S-box is the minimal degree required to express a linear combination of output bits as a polynomial function of input bits.

Table 2. Characteristics of the key schedules.

              Khazad   Misty1       Camellia-128  Rijndael-128  Serpent-128
Key size      128 bit  128 bit      128 bit       128 bit       128 bit
Expanded key  576 bit  256 bit      256 bit       1408 bit      4224 bit
S-boxes       432/72   8 + 16       32            40            1056
S-box width   4/8 bit  7 and 9 bit  8 bit         8 bit         4 bit
3.1 KHAZAD

Khazad [3] is a 64-bit block cipher designed by P. Barreto and V. Rijmen, which accepts a 128-bit key. It consists of an 8-round SP-network that makes use of an 8-bit S-box and a 64-bit linear transformation based on an MDS code. The 8-bit S-box is in turn composed of three layers of two 4-bit S-boxes, called P and Q, which are connected by bit-permutations. The key schedule is a 9-round Feistel iteration based on the same round function as the cipher itself. For both P and Q we can generate 21 linearly independent quadratic equations in 36 terms (excluding '1'). As discussed before, we want to find a subset of these equations which forms a system that is as simple as possible. A possible choice can be found in Appendix A. The system describing P consists of 4 equations in 16 terms. For Q, the best set we found is an over-defined system of 6 equations in 18 terms.

Constructing the System

Cipher Variables. The variables of the system are the inputs and outputs of the 4-bit S-boxes. After combining six 4-bit S-boxes into an 8-bit S-box and renaming the variables according to the bit-permutations, we obtain a system in 32 variables. This is repeated for each of the 64 8-bit S-boxes.

Linear Equations. The S-box layers are connected to each other and to the (known) plaintext and ciphertext by 9 64-bit linear layers, each including a key addition. Their contribution to the system is 9 times 64 linear equations.
Nonlinear Equations. The only nonlinear equations are the S-box equations. For each 8-bit S-box we obtain a system of 30 equations in 32 linear and 54 quadratic terms. Note that a better system might be obtained by optimizing the sets of equations used in the neighboring 4-bit S-boxes so as to maximize the number of common terms between the two S-boxes.

Key Schedule Variables. In order to take the key schedule into account, we need to include extra variables for the first 64 bits of the key (called K−2) and for the inputs and outputs of the 4-bit S-boxes in the key schedule (which will include the 64 variables for K−1).

Linear Equations. The linear layers between the 9 S-box layers in the key schedule define 8 times 64 linear equations.

Nonlinear Equations. Again, the nonlinear equations are just the S-box equations.

The numbers of equations and terms in the final system are listed in Table 3. Note that we need to build a system for two different 64-bit plaintext-ciphertext pairs in order to be able to solve the system for the 128-bit key.
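The Khazad counts quoted above (and the Khazad entries of Table 3) can be reproduced by simple arithmetic. A sketch, with our own tallies of linear/quadratic terms read off the P and Q systems in Appendix A:

```python
# 4-bit S-box systems from Appendix A; both involve all 8 in/out bits
# as linear terms (our reading of the listed equations).
P_EQS, P_QUAD = 4, 8        # 4 equations, 16 terms = 8 linear + 8 quadratic
Q_EQS, Q_QUAD = 6, 10       # 6 equations, 18 terms = 8 linear + 10 quadratic

# One 8-bit S-box = three layers of two 4-bit S-boxes (three P's, three Q's).
sbox_eqs = 3 * P_EQS + 3 * Q_EQS      # 30 equations
sbox_quad = 3 * P_QUAD + 3 * Q_QUAD   # 54 quadratic terms
sbox_vars = 32                        # after renaming across bit-permutations

# Cipher part: 8 rounds x 8 S-boxes = 64 8-bit S-boxes, 9 keyed linear layers.
cipher = dict(
    variables=64 * sbox_vars,         # 2048
    linear_eqs=9 * 64,                # 576
    nonlinear_eqs=64 * sbox_eqs,      # 1920
    quadratic_terms=64 * sbox_quad,   # 3456
)

# Key schedule: 9 Feistel rounds x 8 S-boxes = 72 8-bit S-boxes,
# 64 extra variables for K^-2, and 8 x 64 linear equations.
key_schedule = dict(
    variables=72 * sbox_vars + 64,    # 2368
    linear_eqs=8 * 64,                # 512
    nonlinear_eqs=72 * sbox_eqs,      # 2160
    quadratic_terms=72 * sbox_quad,   # 3888
)
print(cipher, key_schedule)
```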
3.2 MISTY1 and KASUMI
Misty1 [9] is a 64-bit block, 128-bit key block cipher designed by M. Matsui. It is an 8-round Feistel network based on a 32-bit nonlinear function. Additionally, each pair of rounds is separated by a layer of two 32-bit FL-blocks. The nonlinear function itself has a structure that shows some resemblance to a Feistel network and contains three 7-bit S-boxes S7 and six 9-bit S-boxes S9. The key schedule derives an additional 128-bit key K′ from the original key K by applying a circular function which contains 8 instances of S7 and 16 of S9. The 9-bit S-box S9 is designed as a system of 9 quadratic equations in 54 terms (derived from the function y = x^5 in GF(2^9)), and this is probably the best system according to our criterion. The 7-bit S-box S7 is defined by 7 cubic equations in 65 terms (see Appendix A), but it appears that its input and output bits can also be related by a set of 21 quadratic equations in 105 terms. In this set, our search program is able to find a subset of 11 equations in 93 terms which completely defines the S-box. The substitution S7 was not selected at random, however: it was designed to be linearly equivalent to the function y = x^81 in GF(2^7). This information allows us to derive a much simpler system of quadratic equations: raising the previous expression to the fifth power, we obtain y^5 = x^405 = x^24, which can be written as y · y^4 = x^8 · x^16. This last equation is quadratic, since y^4, x^8 and x^16 are linear functions in GF(2^7). Next, we express x and y as linear functions of the input and output bits of S7 and obtain a system of 7 quadratic equations in
56 terms over GF(2)3. This is a considerable improvement compared to what our naive search program found.

Kasumi [10] is a 64-bit block, 128-bit key block cipher derived from Misty1. It is built from the same components and uses S-boxes which are based on the same power functions and are equivalent up to an affine transform. Its structure is slightly different: relevant for the derivation of the equations is the fact that it uses 24 more instances of the 7-bit S-boxes, that the placement of its 8 FL-blocks is different, and that its key schedule is completely linear.

Constructing the System for MISTY1

Cipher Variables. The variables of the system are chosen to be the inputs and outputs of the S-boxes, the output bits of the first pair of FL-blocks, both the input and the output bits of the 6 inner FL-blocks, and the input bits of the last pair of FL-blocks. We expect, however, that the variables for some of the output bits of the first FL-blocks can be replaced by constants, as about 3/8 of them will only depend on the (known) plaintext bits. The same is true for the output bits of the last FL-blocks.

Linear Equations. These include all linear relations between different S-boxes and between FL-blocks and S-boxes. Moreover, 3/8 of the equations of the outer FL-blocks will turn out to be linear. The total number of linear equations can be found in Table 3.

Nonlinear Equations. Apart from the nonlinear equations of the S-boxes, we should also include the equations of the inner FL-blocks and the equations of the outer FL-blocks that remain nonlinear (about 1/4).

Key Schedule Variables. The additional variables in the key schedule are the input and output bits of the S-boxes (which include K) and the 128 bits of K′.

Linear Equations. The linear equations are formed by the additions after each S-box.

Nonlinear Equations. These are just the S-box equations.

In the same manner, we can derive a system for Kasumi.
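The power-function identity behind the S7 system can be checked numerically. A sketch in GF(2^7) (our own test code; the reduction polynomial x^7 + x + 1 is an assumption on our part, but the identity y · y^4 = x^8 · x^16 for y = x^81 is independent of the representation):

```python
def gf_mul(a, b):
    """Multiply in GF(2^7) modulo x^7 + x + 1 (0x83, an assumed polynomial)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x80:
            a ^= 0x83
    return r

def gf_pow(a, e):
    """Square-and-multiply exponentiation in GF(2^7)."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

for x in range(128):
    y = gf_pow(x, 81)                      # S7's underlying power function
    # y^5 = x^405 = x^24, i.e. y * y^4 = x^8 * x^16:
    assert gf_mul(y, gf_pow(y, 4)) == gf_mul(gf_pow(x, 8), gf_pow(x, 16))
print("y * y^4 == x^8 * x^16 for all 128 field elements")
```

Since 405 ≡ 24 (mod 127), the exponent reduction holds for every nonzero x, and both sides vanish at x = 0.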
The final results are listed in Table 3, and here again we need two plaintext-ciphertext pairs if we want to be able to extract the 128-bit key.
3.3 CAMELLIA-128
The 128-bit block cipher Camellia-128 [2] was designed by K. Aoki et al. It is based on an 18-round Feistel network with two layers of two 64-bit FL-blocks
3
Fourteen more quadratic equations can be derived from the relations x^64 · y = x^2 · x^16 and x · y^64 = y^2 · y^4, but these would introduce many new terms.
after the 6th and the 12th rounds. The 64-bit nonlinear function used in the Feistel iteration is composed of a key addition, a nonlinear layer of eight 8-bit S-boxes and a linear transform. An additional 128-bit key KA is derived from the original key KL by applying four Feistel iterations during the key scheduling. Camellia-128 uses four different S-boxes, which are all equivalent to an inversion over GF(2^8) up to an affine transform. As pointed out in [5], the input and output bits of such S-boxes are related by 39 quadratic polynomials in 136 terms. By taking the subset of equations that does not contain any product of two input or two output bits, one can build a system of 23 quadratic equations in 80 terms (excluding the term '1'), which is probably the optimal system.

Constructing the System

Cipher Variables. The variables of the system will be the inputs and outputs of all S-boxes and FL-blocks.

Linear Equations. All linear layers between different S-boxes and between FL-blocks and S-boxes insert linear equations into the system. Their exact number is given in Table 3.

Nonlinear Equations. The only nonlinear equations are the equations of the S-boxes and the FL-blocks.

Key Schedule Variables. The key schedule adds variables for the inputs and outputs of its 32 S-boxes and for KL and KA.

Linear Equations. These are all linear layers between different S-boxes or between an S-box and the bits of KL and KA.

Nonlinear Equations. Again, the only nonlinear equations included in the system are the S-box equations.
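Several Camellia-128 "Cipher" entries of Table 3 follow from this structure by straightforward counting. A sketch (our own tally):

```python
# 8-bit S-box system: 23 equations in 80 terms = 16 linear + 64 quadratic.
SBOX_EQS, SBOX_QUAD, SBOX_BITS = 23, 64, 16

n_sboxes = 18 * 8          # 18 Feistel rounds x 8 S-boxes = 144
n_fl, w = 4, 32            # four 64-bit FL-blocks, word size w = 32

# Each FL-block contributes 4w variable bits (2w in + 2w out),
# 2w nonlinear equations and 2w quadratic terms (Sect. 2.2).
variables = n_sboxes * SBOX_BITS + n_fl * 4 * w          # 2304 + 512 = 2816
nonlinear_eqs = n_sboxes * SBOX_EQS + n_fl * 2 * w       # 3312 + 256 = 3568
quadratic_terms = n_sboxes * SBOX_QUAD + n_fl * 2 * w    # 9216 + 256 = 9472
terms = variables + quadratic_terms                      # 12288
print(variables, nonlinear_eqs, quadratic_terms, terms)
```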
3.4 RIJNDAEL-128
Rijndael-128 [6] is a 128-bit block cipher designed by J. Daemen and V. Rijmen and has been adopted as the Advanced Encryption Standard (AES). It is a 10-round SP-network containing a nonlinear layer based on an 8-bit S-box. The linear diffusion layer is composed of a byte permutation followed by 4 parallel 32-bit linear transforms. The 128-bit key is expanded recursively by a key scheduling algorithm which contains 40 S-boxes. The 8-bit S-box is an affine transformation of the inversion in GF(2^8) and, as mentioned previously, it is completely defined by a system of 23 quadratic equations in 80 terms.
Constructing the System

Cipher Variables. As variables we choose the input and output bits of each of the 160 S-boxes.

Linear Equations. Each of the 11 linear layers (including the initial and the final key addition) corresponds to a system of 128 linear equations.

Nonlinear Equations. These are just the S-box equations.

Key Schedule Variables. The variables in the key schedule are the inputs and outputs of the S-boxes (which include the bits of W3) and the bits of W0, W1 and W2.

Linear Equations. The linear equations are all linear relations that relate the inputs of the S-boxes to the outputs of other S-boxes or to W0, W1 and W2.

Nonlinear Equations. The S-box equations.
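The Rijndael-128 entries of Table 3 can be reproduced from this description by simple counting. A sketch (our own tally):

```python
# 8-bit S-box system: 23 equations in 80 terms = 16 linear + 64 quadratic.
SBOX_EQS, SBOX_QUAD, SBOX_BITS = 23, 64, 16

n_sboxes = 10 * 16                       # 10 rounds x 16 S-boxes = 160
variables = n_sboxes * SBOX_BITS         # 2560
linear_eqs = 11 * 128                    # 11 keyed linear layers -> 1408
nonlinear_eqs = n_sboxes * SBOX_EQS      # 3680
quadratic_terms = n_sboxes * SBOX_QUAD   # 10240
terms = variables + quadratic_terms      # 12800

ks_sboxes = 40
ks_variables = ks_sboxes * SBOX_BITS + 3 * 32  # S-box i/o + W0, W1, W2 = 736
ks_nonlinear_eqs = ks_sboxes * SBOX_EQS        # 920
ks_quadratic_terms = ks_sboxes * SBOX_QUAD     # 2560
print(variables, linear_eqs, nonlinear_eqs, terms, ks_variables)
```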
3.5 SERPENT-128
Serpent-128 [1] is a 128-bit block cipher designed by R. Anderson, E. Biham and L. Knudsen. It is a 32-round SP-network with nonlinear layers consisting of 32 parallel 4-bit S-boxes and a linear diffusion layer composed of 32-bit rotations and XORs. In the key schedule, the key is first linearly expanded to 33 128-bit words, and then each of them is pushed through the same nonlinear layers as those used in the SP-network. Serpent-128 contains 8 different S-boxes, and for each of them we can generate 21 linearly independent quadratic equations in 36 terms. The best subsets of equations we found allow us to build systems of 4 equations in 13 terms for S0, S1, S2 and S6. S4 and S5 can be described by a system of 5 equations in 15 terms, and S3 and S7 by a system of 5 equations in 16 terms. As an example, we have included the systems for S0 and S1 in Appendix A.

Constructing the System

Cipher Variables and Equations. The system for the SP-network of Serpent-128 is derived in exactly the same way as for Rijndael-128. The results are shown in Table 3.

Key Schedule Variables. The variables of the system are the inputs and outputs of the S-boxes, i.e. the bits of the linearly expanded key and the bits of the final round keys.

Linear Equations. These are the linear equations relating the bits of the linearly expanded key.

Nonlinear Equations. These are the S-box equations.
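Several Serpent-128 entries of Table 3 can be reproduced the same way. A sketch (our own tally; we assume each 4-bit S-box system involves all 8 input/output bits as linear terms, so a system "in 13 terms" has 5 quadratic terms, "in 15 terms" has 7, and "in 16 terms" has 8):

```python
# Per-type counts read off Sect. 3.5: equations and quadratic terms
# per 4-bit S-box system (quadratic counts are our assumption, see above).
eqs  = {"S0": 4, "S1": 4, "S2": 4, "S3": 5, "S4": 5, "S5": 5, "S6": 4, "S7": 5}
quad = {"S0": 5, "S1": 5, "S2": 5, "S3": 8, "S4": 7, "S5": 7, "S6": 5, "S7": 8}

boxes_per_type = (32 // 8) * 32          # each type used in 4 of 32 rounds -> 128

variables = 32 * 32 * 8                  # 32 rounds x 32 boxes x 8 bits = 8192
linear_eqs = 33 * 128                    # 33 128-bit linear layers -> 4224
nonlinear_eqs = boxes_per_type * sum(eqs.values())     # 128 * 36 = 4608
quadratic_terms = boxes_per_type * sum(quad.values())  # 128 * 50 = 6400
ks_variables = 33 * 32 * 8               # 1056 key-schedule S-boxes -> 8448
print(variables, linear_eqs, nonlinear_eqs, quadratic_terms, ks_variables)
```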
4 Equations in GF(2^8) for RIJNDAEL and CAMELLIA
In [7], S. Murphy and M. Robshaw point out that the special algebraic structure of Rijndael allows the cipher to be described by a very sparse quadratic system over GF(2^8). The main idea is to embed the original cipher within an extended cipher, called BES (Big Encryption System), by replacing each byte a by the 8-byte vector of its conjugates in GF(2^8):

  (a^(2^0), a^(2^1), a^(2^2), a^(2^3), a^(2^4), a^(2^5), a^(2^6), a^(2^7))     (7)

The advantage of this extension is that it reduces all transformations in Rijndael to very simple GF(2^8) operations on 8-byte vectors, regardless of whether these transformations were originally described in GF(2^8) or in GF(2)^8. The parameters of the resulting system in GF(2^8) are listed in Table 4. Note that the results are in exact agreement with those given in [7], though the equations and terms are counted in a slightly different way.

The fact that Camellia uses the same S-box as Rijndael (up to a linear equivalence) suggests that a similar system can be derived for this cipher as well. The only complication is the construction of a system in GF(2^8) for the FL-blocks. This can be solved as follows: first we multiply each 8-byte vector at the input of the FL-blocks by the inverse of an 8 × 8-byte matrix [b_i,j], where the b_i,j = x^(i·2^(j−1)) are elements of GF(2^8) written as polynomials. The effect of this multiplication is that vectors constructed according to (7) are mapped to 8-byte vectors consisting only of 0's and 1's, corresponding to the binary representation of the original byte a. This implies that we can reuse the quadratic equations given in Section 2.2, but this time interpreted as equations in GF(2^8). Finally, we return to vector conjugates by multiplying the results by the matrix [b_i,j]. Note that these additional linear transformations before and after the FL-blocks do not introduce extra terms or equations, as they can be combined with the existing linear layers of Camellia.
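The linearity that makes this conversion work can be checked directly: squaring is additive in characteristic 2, so each conjugate a^(2^j) is a GF(2)-linear function of the bits of a, i.e. a^(2^j) = Σ_i a_i x^(i·2^j). A sketch in Rijndael's field GF(2^8) = GF(2)[x]/(x^8 + x^4 + x^3 + x + 1) (our own test code):

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r

def gf_pow(a, e):
    """Square-and-multiply exponentiation in GF(2^8)."""
    r = 1
    while e:
        if e & 1:
            r = gf_mul(r, a)
        a = gf_mul(a, a)
        e >>= 1
    return r

def conjugates(a):
    """The 8-byte vector conjugate of equation (7)."""
    return [gf_pow(a, 2 ** j) for j in range(8)]

# a^(2^j) equals the XOR over the set bits i of a of x^(i * 2^j), x = 0x02:
for a in (0x00, 0x01, 0x53, 0xCA, 0xFF):
    for j in range(8):
        rhs = 0
        for i in range(8):
            if (a >> i) & 1:
                rhs ^= gf_pow(0x02, i * 2 ** j)
        assert conjugates(a)[j] == rhs
```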
Table 4 compares the resulting system with the one obtained for Rijndael, and it appears that both systems have very similar sizes. We should note, however, that certain special properties of BES, for example the preservation of algebraic curves, do not hold for the extended version of Camellia, because of the FL-blocks.
5 Interpretation of the Results, Conclusions
In this section we analyze the results in Tables 3 and 4, which compare the systems of equations generated by the different ciphers. Each table is divided into three parts: the first describes the structure of the cipher itself, the second describes the structure of the key-schedule and the third part provides the total count. Each part shows the number of variables, the number of equations (we provide separate counts for non-linear and linear equations) and the number of terms (with separate counts for quadratic and linear terms). The bottom line
Table 3. A comparison of the complexities of the systems of equations in GF(2).

                  Khazad  Misty1  Kasumi  Camellia-128  Rijndael-128  Serpent-128
Cipher            (×2)    (×2)    (×2)
Variables         2048    1664    2004    2816          2560          8192
Linear eqs.       576     904     1068    1536          1408          4224
Nonlinear eqs.    1920    824     1000    3568          3680          4608
Equations         2496    1728    2068    5104          5088          8832
Linear terms      2048    1664    2004    2816          2560          8192
Quadratic terms   3456    2960    3976    9472          10240         6400
Terms             5504    4624    5980    12288         12800         14592

Key schedule
Variables         2368    528     256     768           736           8448
Linear eqs.       512     200     128     384           288           4096
Nonlinear eqs.    2160    200     0       736           920           4752
Equations         2672    400     128     1120          1208          8848
Linear terms      2368    528     256     768           736           8448
Quadratic terms   3888    912     0       2048          2560          6600
Terms             6256    1440    256     2816          3296          15048

Total
Variables         6464    3856    4264    3584          3296          16640
Linear eqs.       1664    2008    2264    1920          1696          8320
Nonlinear eqs.    6000    1848    2000    4304          4600          9360
Equations         7664    3856    4264    6224          6296          17680
Linear terms      6464    3856    4264    3584          3296          16640
Quadratic terms   10800   6832    7952    11520         12800         13000
Terms             17264   10688   12216   15104         16096         29640

Free terms        9600    6832    7952    8880          9800          11960
provides the number of free terms, which is the difference between the total number of terms and the total number of equations. Note that for the 64-bit block ciphers we have to take two plaintext-ciphertext queries in order to be able to find a solution for the 128-bit key. For such ciphers the numbers provided in the "Cipher" part of the table have to be doubled in order to get the complete number of equations, terms and variables. Note, however, that one does not have to double the number of equations for the key schedule, which are the same for each encrypted block. The total for 64-bit block ciphers shows the proper total count: twice the numbers of the cipher part plus once the numbers from the key-schedule part.

After comparing the systems of equations for the different ciphers, we notice that the three ciphers in the 128-bit category (Camellia-128, Rijndael-128, Serpent-128) do not differ drastically in the number of free terms. Camellia-128 and Rijndael-128 in particular have very similar counts of variables, equations and terms, both in GF(2) and GF(2^8), which is in part explained by the fact that they use the same S-box and have an approximately equivalent number of rounds (Rijndael-128 has 10 SPN rounds and Camellia-128 has 18 Feistel rounds, which is "equivalent" to 9 SPN rounds). On the other hand, Serpent-128 has 3–4 times more SPN rounds than Camellia-128 and
Table 4. A comparison of the complexities of the systems of equations in GF(2^8).

                  Camellia-128  Rijndael-128
Cipher
Variables         2816          2560
Linear eqs.       1536          1408
Nonlinear eqs.    1408          1280
Equations         2944          2688
Linear terms      2816          2560
Quadratic terms   1408          1280
Terms             4224          3840

Key schedule
Variables         768           736
Linear eqs.       384           288
Nonlinear eqs.    256           320
Equations         640           608
Linear terms      768           736
Quadratic terms   256           320
Terms             1024          1056

Total
Variables         3584          3296
Linear eqs.       1920          1696
Nonlinear eqs.    1664          1600
Equations         3584          3296
Linear terms      3584          3296
Quadratic terms   1664          1600
Terms             5248          4896

Free terms        1664          1600
Rijndael-128, and this explains why its system of equations has approximately 3–4 times more variables (at least in the cipher part). The number of quadratic terms in Serpent-128 is much smaller due to the 4-bit S-boxes, and this keeps the number of free terms within the range of the other two ciphers. When one switches to GF(2^8) for Camellia-128 and Rijndael-128, the equations for the S-boxes are much sparser and the number of free terms is reduced. It is not clear, however, how to make a fair comparison between systems in GF(2) and GF(2^8). The number of free terms in Table 4 is several times smaller, but then again, working in a larger field might increase the complexity of the solving algorithm. Comparing Khazad and Misty1, both of which consist of 8 rounds, one may notice that Khazad has approximately twice as many variables and equations (due to the fact that we write equations for the intermediate layers of the S-box4). On the other hand, the number of quadratic terms per nonlinear equation is considerably higher for Misty1, due to its larger S-boxes.
4
There are no quadratic equations for the full 8-bit S-box of Khazad and for the original S-box (before the tweak). There are many cubic equations, which is true for any 8-bit S-box.
Finally, note that the results presented in this paper are very sensitive to the criteria used for the choice of the equations. Our main criterion was to minimize the number of free terms; for example, we did not aim to find the most over-defined systems of equations. At the time of writing, one can only speculate about what the criteria for a possible algebraic attack would be (if such an attack is found to exist at all).
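The "Total" and "Free terms" rows of Table 3 follow mechanically from the per-part counts: 64-bit block ciphers are counted twice (two plaintext-ciphertext pairs), 128-bit ones once, and the key schedule is counted once for both. A quick sketch of that arithmetic (numbers taken from Table 3):

```python
# Each entry: multiplicity of the cipher part, then (equations, terms)
# for the cipher part and for the key-schedule part.
parts = {
    "Khazad":       (2, (2496, 5504),  (2672, 6256)),
    "Misty1":       (2, (1728, 4624),  (400, 1440)),
    "Kasumi":       (2, (2068, 5980),  (128, 256)),
    "Camellia-128": (1, (5104, 12288), (1120, 2816)),
    "Rijndael-128": (1, (5088, 12800), (1208, 3296)),
    "Serpent-128":  (1, (8832, 14592), (8848, 15048)),
}
free = {}
for name, (mult, (c_eqs, c_terms), (k_eqs, k_terms)) in parts.items():
    total_eqs = mult * c_eqs + k_eqs
    total_terms = mult * c_terms + k_terms
    free[name] = total_terms - total_eqs   # free terms = terms - equations
print(free)
```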
Acknowledgements

We wish to thank the anonymous referees, whose comments helped to improve this paper. We would also like to thank Jin Hong for pointing out an error in a previous version of this paper.
References

1. R. Anderson, E. Biham, and L. Knudsen, "Serpent: A proposal for the advanced encryption standard." Available from http://www.cl.cam.ac.uk/~rja14/serpent.html.
2. K. Aoki, T. Ichikawa, M. Kanda, M. Matsui, S. Moriai, J. Nakajima, and T. Tokita, "Camellia: A 128-bit block cipher suitable for multiple platforms." Submission to NESSIE, Sept. 2000. Available from http://www.cryptonessie.org/workshop/submissions.html.
3. P. Barreto and V. Rijmen, "The KHAZAD legacy-level block cipher." Submission to NESSIE, Sept. 2000. Available from http://www.cryptonessie.org/workshop/submissions.html.
4. N. Courtois, A. Klimov, J. Patarin, and A. Shamir, "Efficient algorithms for solving overdefined systems of multivariate polynomial equations," in Proceedings of Eurocrypt'00 (B. Preneel, ed.), no. 1807 in Lecture Notes in Computer Science, pp. 392–407, Springer-Verlag, 2000.
5. N. Courtois and J. Pieprzyk, "Cryptanalysis of block ciphers with overdefined systems of equations," in Proceedings of Asiacrypt'02 (Y. Zheng, ed.), no. 2501 in Lecture Notes in Computer Science, Springer-Verlag, 2002. Earlier version available from http://www.iacr.org.
6. J. Daemen and V. Rijmen, "AES proposal: Rijndael." Selected as the Advanced Encryption Standard. Available from http://www.nist.gov/aes.
7. S. Murphy and M. Robshaw, "Essential algebraic structure within the AES," in Proceedings of Crypto'02 (M. Yung, ed.), no. 2442 in Lecture Notes in Computer Science, pp. 17–38, Springer-Verlag, 2002. NES/DOC/RHU/WP5/022/1.
8. A. Shamir and A. Kipnis, "Cryptanalysis of the HFE public key cryptosystem," in Proceedings of Crypto'99 (M. Wiener, ed.), no. 1666 in Lecture Notes in Computer Science, pp. 19–30, Springer-Verlag, 1999.
9. E. Takeda, "Misty1." Submission to NESSIE, Sept. 2000. Available from http://www.cryptonessie.org/workshop/submissions.html.
10. Third Generation Partnership Project, "3GPP KASUMI evaluation report," tech. rep., Security Algorithms Group of Experts (SAGE), 2001. Available from http://www.3gpp.org/TB/other/algorithms/KASUMI_Eval_rep_v20.pdf.
A Equations
A.1 Constructing a System – Example
This appendix illustrates how a set of linearly independent S-box equations can be derived for a small example. The S-box considered here is a 3-bit substitution defined by the following lookup table: [7, 6, 0, 4, 2, 5, 1, 3]. In order to find all linearly independent equations involving a particular set of terms (in this example all input and output bits xi and yj together with their products xi yj), we first construct a matrix containing a separate row for each term. Each row consists of 2^n entries, corresponding to the values of that term for all possible input values (in this case from 0 to 7). The next step is to perform a Gaussian elimination on the rows of the matrix; all row operations required by this elimination are applied to the corresponding terms as well (see below). This way, a number of zero rows may appear, and it is easy to see that the expressions corresponding to these rows are exactly the equations we are looking for. Note also that this set of equations forms a linearly independent basis and that any linear combination of them is also a valid equation.
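The procedure just described can be sketched in a few lines (our own implementation, for the 3-bit S-box [7, 6, 0, 4, 2, 5, 1, 3] and the terms {1, xi, yj, xi·yj}):

```python
SBOX = [7, 6, 0, 4, 2, 5, 1, 3]
N = 3

def bits(v):
    return [(v >> i) & 1 for i in range(N)]

# One row per term: the term's value at every input 0..7, together with the
# set of terms XORed into that row (updated alongside the row operations).
rows = [([1] * (1 << N), {"1"})]
rows += [([bits(v)[i] for v in range(1 << N)], {f"x{i}"}) for i in range(N)]
rows += [([bits(SBOX[v])[j] for v in range(1 << N)], {f"y{j}"}) for j in range(N)]
rows += [([bits(v)[i] & bits(SBOX[v])[j] for v in range(1 << N)], {f"x{i}y{j}"})
         for i in range(N) for j in range(N)]

# Gaussian elimination over GF(2); rows reduced to zero yield the equations.
pivots, equations = [], []
for vec, expr in rows:
    for pvec, pexpr in pivots:
        if vec[pvec.index(1)]:
            vec = [a ^ b for a, b in zip(vec, pvec)]
            expr = expr ^ pexpr          # symmetric difference of term sets
    if any(vec):
        pivots.append((vec, expr))
    else:
        equations.append(expr)

print(len(equations))                    # 8 independent equations
for eq in sorted(equations, key=len):
    print(" + ".join(sorted(eq)), "= 0")
```

The elimination order differs from the worked example below, so the printed basis may be a different (but equivalent) set of 8 equations.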
Before elimination (one row per term; entries are the term's values at inputs 0, ..., 7):

  1     1 1 1 1 1 1 1 1
  x0    0 1 0 1 0 1 0 1
  x1    0 0 1 1 0 0 1 1
  x2    0 0 0 0 1 1 1 1
  y0    1 0 0 0 0 1 1 1
  y1    1 1 0 0 1 0 0 1
  y2    1 1 0 1 0 1 0 0
  x0y0  0 0 0 0 0 1 0 1
  x0y1  0 1 0 0 0 0 0 1
  x0y2  0 1 0 1 0 1 0 0
  x1y0  0 0 0 0 0 0 1 1
  x1y1  0 0 0 0 0 0 0 1
  x1y2  0 0 0 1 0 0 0 0
  x2y0  0 0 0 0 0 1 1 1
  x2y1  0 0 0 0 1 0 0 1
  x2y2  0 0 0 0 0 1 0 0

After elimination (the eight zero rows give the equations):

  1                                   1 1 1 1 1 1 1 1
  x0                                  0 1 0 1 0 1 0 1
  x1                                  0 0 1 1 0 0 1 1
  1 + x0 + x1 + y0                    0 0 0 1 1 1 1 0
  x2                                  0 0 0 0 1 1 1 1
  1 + x1 + y1                         0 0 0 0 0 1 0 1
  1 + x0 + x1 + y0 + y1 + y2          0 0 0 0 0 0 1 1
  x0 + x0y2                           0 0 0 0 0 0 0 1
  x2 + y0 + y1 + x0y1                 0 0 0 0 0 0 0 0
  1 + x1 + y1 + x0y0                  0 0 0 0 0 0 0 0
  1 + x0 + x1 + y0 + y1 + y2 + x1y0   0 0 0 0 0 0 0 0
  x0 + x0y2 + x1y1                    0 0 0 0 0 0 0 0
  1 + x1 + x2 + y0 + x0y2 + x1y2      0 0 0 0 0 0 0 0
  y0 + y2 + x0y2 + x2y0               0 0 0 0 0 0 0 0
  x0 + x2 + y0 + y2 + x2y1            0 0 0 0 0 0 0 0
  1 + x0 + x1 + y1 + x0y2 + x2y2      0 0 0 0 0 0 0 0

A.2 KHAZAD

P: 4 quadratic equations in 16 terms:

x0 y1 + y1 + x0 y2 + x0 y3 + x2 y3 = x0 + x2 + x3 + 1
x3 y1 + x0 y2 + y2 + x0 y3 = x0 + x1 + x2 + x3
x0 y1 + x3 y2 + y2 + x0 y3 + y3 = x2 + x3
y0 + x0 y2 + x0 y3 + y2 y3 + y3 = x0 x3 + x1 + 1
Q: 6 quadratic equations in 18 terms:

y0 + x0 y1 + y1 + x0 y2 = x0 x2 + x1 x2 + x3 + 1
x1 y0 + y1 + x0 y2 + y3 = x0 x1 + x0 x2 + x2 + x3 + 1
x1 y0 + y0 + x0 y1 + y2 + x3 y3 = x0 x1 + x0 + x2
y0 y1 + y0 + x0 y1 + x0 y2 + y3 = x0 x2 + x0 + x1 + x2 + x3
y0 y2 + y1 + y2 + y3 = x0 x2 + x1 + x2 + x3 + 1
y0 + y1 y2 + y1 + y2 = x0 x1 + x1 + x3 + 1
A.3 MISTY1

Table 5 shows the original systems of equations which completely define the S-boxes S7 and S9 of Misty1. We omit the quadratic system of 7 equations in 56 terms used for S7 in Section 3.2 because of its complexity.

A.4 SERPENT-128
S0: 4 quadratic equations in 13 terms:

y3 = x0 x3 + x0 + x1 + x2 + x3
y0 + y1 = x0 x1 + x1 x3 + x2 + x3
y0 y3 + y1 y3 + y1 = x0 + x2 + x3 + 1
y0 y3 + y0 + y2 + y3 = x0 x1 + x1 + x3 + 1

S1: 4 quadratic equations in 13 terms:

x3 = y0 + y1 y3 + y2 + y3
x0 + x1 = y0 y1 + y0 y3 + y0 + y2 + y3 + 1
x0 x3 + x1 x3 + x0 + x3 = y1 + y2 + y3 + 1
x0 x3 + x2 + x3 = y0 y3 + y1 + y3 + 1
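The S0 system can be checked exhaustively against the published Serpent S0 table (an assumption on our part: the table below is the standard one from the Serpent specification, and x0/y0 denote least-significant bits):

```python
S0 = [3, 8, 15, 1, 10, 6, 5, 11, 14, 13, 4, 2, 7, 0, 9, 12]

def b(v, i):
    return (v >> i) & 1

for v in range(16):
    x = [b(v, i) for i in range(4)]
    y = [b(S0[v], i) for i in range(4)]
    # In Python, & binds tighter than ^, matching GF(2) product and sum.
    assert y[3] == x[0] & x[3] ^ x[0] ^ x[1] ^ x[2] ^ x[3]
    assert y[0] ^ y[1] == x[0] & x[1] ^ x[1] & x[3] ^ x[2] ^ x[3]
    assert y[0] & y[3] ^ y[1] & y[3] ^ y[1] == x[0] ^ x[2] ^ x[3] ^ 1
    assert y[0] & y[3] ^ y[0] ^ y[2] ^ y[3] == x[0] & x[1] ^ x[1] ^ x[3] ^ 1
print("S0 equations hold for all 16 inputs")
```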
Table 5. Misty1: S7 and S9.

S7: 7 cubic equations in 65 terms:

y0 = x0 + x1 x3 + x0 x3 x4 + x1 x5 + x0 x2 x5 + x4 x5 + x0 x1 x6 + x2 x6 + x0 x5 x6 + x3 x5 x6 + 1
y1 = x0 x2 + x0 x4 + x3 x4 + x1 x5 + x2 x4 x5 + x6 + x0 x6 + x3 x6 + x2 x3 x6 + x1 x4 x6 + x0 x5 x6 + 1
y2 = x1 x2 + x0 x2 x3 + x4 + x1 x4 + x0 x1 x4 + x0 x5 + x0 x4 x5 + x3 x4 x5 + x1 x6 + x3 x6 + x0 x3 x6 + x4 x6 + x2 x4 x6
y3 = x0 + x1 + x0 x1 x2 + x0 x3 + x2 x4 + x1 x4 x5 + x2 x6 + x1 x3 x6 + x0 x4 x6 + x5 x6 + 1
y4 = x2 x3 + x0 x4 + x1 x3 x4 + x5 + x2 x5 + x1 x2 x5 + x0 x3 x5 + x1 x6 + x1 x5 x6 + x4 x5 x6 + 1
y5 = x0 + x1 + x2 + x0 x1 x2 + x0 x3 + x1 x2 x3 + x1 x4 + x0 x2 x4 + x0 x5 + x0 x1 x5 + x3 x5 + x0 x6 + x2 x5 x6
y6 = x0 x1 + x3 + x0 x3 + x2 x3 x4 + x0 x5 + x2 x5 + x3 x5 + x1 x3 x5 + x1 x6 + x1 x2 x6 + x0 x3 x6 + x4 x6 + x2 x5 x6

S9: 9 quadratic equations in 54 terms:

y0 = x0 x4 + x0 x5 + x1 x5 + x1 x6 + x2 x6 + x2 x7 + x3 x7 + x3 x8 + x4 x8 + 1
y1 = x0 x2 + x3 + x1 x3 + x2 x3 + x3 x4 + x4 x5 + x0 x6 + x2 x6 + x7 + x0 x8 + x3 x8 + x5 x8 + 1
y2 = x0 x1 + x1 x3 + x4 + x0 x4 + x2 x4 + x3 x4 + x4 x5 + x0 x6 + x5 x6 + x1 x7 + x3 x7 + x8
y3 = x0 + x1 x2 + x2 x4 + x5 + x1 x5 + x3 x5 + x4 x5 + x5 x6 + x1 x7 + x6 x7 + x2 x8 + x4 x8
y4 = x1 + x0 x3 + x2 x3 + x0 x5 + x3 x5 + x6 + x2 x6 + x4 x6 + x5 x6 + x6 x7 + x2 x8 + x7 x8
y5 = x2 + x0 x3 + x1 x4 + x3 x4 + x1 x6 + x4 x6 + x7 + x3 x7 + x5 x7 + x6 x7 + x0 x8 + x7 x8
y6 = x0 x1 + x3 + x1 x4 + x2 x5 + x4 x5 + x2 x7 + x5 x7 + x8 + x0 x8 + x4 x8 + x6 x8 + x7 x8 + 1
y7 = x1 + x0 x1 + x1 x2 + x2 x3 + x0 x4 + x5 + x1 x6 + x3 x6 + x0 x7 + x4 x7 + x6 x7 + x1 x8 + 1
y8 = x0 + x0 x1 + x1 x2 + x4 + x0 x5 + x2 x5 + x3 x6 + x5 x6 + x0 x7 + x0 x8 + x3 x8 + x6 x8 + 1
Turing: A Fast Stream Cipher

Gregory G. Rose and Philip Hawkes

Qualcomm Australia, Level 3, 230 Victoria Rd, Gladesville, NSW 2111, Australia
{ggr,phawkes}@qualcomm.com
Abstract. This paper proposes the Turing stream cipher. Turing offers up to 256-bit key strength and is designed for extremely efficient software implementation. It combines an LFSR generator based on that of SOBER [21] with a keyed mixing function reminiscent of a block cipher round. Aspects of the mixing round have been derived from Rijndael [6], Twofish [23], tc24 [24] and SAFER++ [17].
1 Introduction
Turing (named after Alan Turing) is a stream cipher designed to simultaneously be:
– Extremely fast in software on commodity PCs,
– Usable in very little RAM on embedded processors, and
– Able to exploit parallelism to enable fast hardware implementation.
The major component of the Turing stream cipher is the word-oriented Linear Feedback Shift Register (LFSR), which originated with the design of the SOBER family of ciphers [13, 14, 21]. Analyses of the SOBER family are found in [1–3, 11]. The efficient LFSR updating method is modelled after that of SNOW [9]. Turing combines the LFSR generator with a keyed mixing function reminiscent of a block cipher round. The S-box used in this mixing round is partially derived from the SOBER-t32 S-box [14]. Further aspects of this mixing function have been derived from Rijndael [6], Twofish [23], tc24 [24] and SAFER++ [17].
Turing is designed to meet the needs of embedded applications that place severe constraints on the amount of processing power, program space and memory available for software encryption algorithms. Since most mobile telephones in use incorporate a microprocessor and memory, a software stream cipher that is fast and uses little memory is ideal for this application. Turing overcomes the inefficiency of binary LFSRs in a manner similar to SOBER and SNOW, and uses a number of techniques to greatly increase the generation speed of the pseudo-random stream in software on a general-purpose processor. Turing allows an implementation tradeoff between small memory use and very high speed using pre-computed tables. Reference source code showing small-memory, key-agile, and speed-optimized implementations is available at [22], along with a test harness with test vectors. The reference implementation (TuringRef.c) should be viewed as the definitive description of Turing.
T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 290–306, 2003. © International Association for Cryptologic Research 2003
Turing has four components: key loading, initialization vector (IV) loading, an LFSR, and a keyed non-linear filter (NLF). The key loading initializes the keyed S-boxes, and the IV loading initializes the LFSR. The LFSR and NLF then generate key stream in 160-bit blocks (see Figure 1). Five 32-bit words selected from the LFSR are first mixed, then passed through a highly nonlinear, key-dependent S-box transformation, and mixed again. The resulting 5-word nonlinear block is combined with 5 new LFSR words to create 160 bits of keystream. The final addition of 5 LFSR words (called whitening) provides the output with good statistical properties, while the nonlinear block hides the linear properties of the LFSR. For each 160-bit block of key stream, the LFSR state is updated 5 times.
[Figure: the LFSR register feeds the non-linear filter. Five words A, B, C, D, E are mixed between words by a 5-PHT, giving TA, TB, TC, TD, TE; keyed 8x32 S-boxes (with byte rotations) mix within words, giving XA, ..., XE; a second 5-PHT mixes between words again, giving YA, ..., YE; finally new LFSR words WA, ..., WE are added as whitening to produce the outputs ZA, ..., ZE.]
Fig. 1. Block diagram for Turing.
The paper is set out as follows. First, the LFSR is defined in Section 2. Section 3 describes the NLF and explains how the overall structure of Turing operates. The key and initialization vector loading are described in Section 4. Section 5 discusses performance, and Section 6 analyses security and possible attacks. Turing uses “big-endian” byte ordering, in which the most significant byte of a multi-byte quantity appears first in memory. For example, a 32-bit word A has bytes indexed as (A0, A1, A2, A3), where A0 is the most significant byte.
2 LFSR of Turing
Binary Linear Feedback Shift Registers can be extremely inefficient in software on general-purpose microprocessors. LFSRs can operate over any finite field, so an LFSR can be made more efficient in software by utilizing a finite field more suited to the processor. Particularly good choices for such a field are the Galois fields with 2^w elements (GF(2^w)), where w is related to the size of items in the underlying processor, usually bytes or 32-bit words. The elements of this field and the coefficients of the recurrence relation occupy exactly one unit of storage and can be efficiently manipulated in software.
The standard representation of an element A in the field GF(2^w) is a w-bit word with bits (a_{w-1}, a_{w-2}, . . . , a_1, a_0), which represents the polynomial a_{w-1} z^{w-1} + . . . + a_1 z + a_0. Elements can be added and multiplied: addition of elements in the field is equivalent to XOR. To multiply two elements of the field we multiply the corresponding polynomials modulo 2, and then reduce the resulting polynomial modulo a chosen irreducible polynomial of degree w.
It is also possible to represent GF(2^w) using a subfield. For example, rather than representing elements of GF(2^32) as degree-31 polynomials over GF(2), Turing uses 8-bit bytes to represent elements of the subfield GF(2^8), and 32-bit words to represent degree-3 polynomials over GF(2^8). This is isomorphic to the standard representation, but not identical. The subfield B = GF(2^8) of bytes is represented in Turing modulo the irreducible polynomial z^8 + z^6 + z^3 + z^2 + 1. Bytes represent degree-7 polynomials over GF(2); for example, the constant β_0 = 0x67 below represents the polynomial z^6 + z^5 + z^2 + z + 1. The Galois field W = B^4 = GF((2^8)^4) of words can then be represented using degree-3 polynomials whose coefficients are bytes (subfield elements of B). For example, the word 0xD02B4367 represents the polynomial 0xD0·y^3 + 0x2B·y^2 + 0x43·y + 0x67.
The field W can be represented by an irreducible polynomial y^4 + β_3·y^3 + β_2·y^2 + β_1·y + β_0. The specific coefficients β_i used in Turing are best given after describing Turing’s LFSR.
An LFSR of order k over the field GF(2^w) generates a stream of w-bit LFSR words {S[i]} using a register of k memory elements (R[0], R[1], ..., R[k−1]). The register stores the values of k successive LFSR words, so after i clocks the register stores the values of (S[i], S[i+1], ..., S[i+k−1]). At each clock, the LFSR computes the next LFSR word S[i+k] in the sequence using a GF(2^w) recurrence relation

S[i+k] = α_0 S[i] + α_1 S[i+1] + · · · + α_{k−1} S[i+k−1] ,   (1)

and updates the register (here new contains the value of S[i+k]):
R[0] = R[1]; R[1] = R[2]; ...; R[15] = R[16]; R[16] = new;
The register now contains (S[i+1], S[(i+1)+1], ..., S[(i+1)+k−1]). The linear recurrence (1) is commonly represented by the characteristic polynomial p(X) = X^k − Σ_{j=0}^{k−1} α_j X^j. In the case of Turing, the LFSR consists of k = 17 words of state information with w = 32-bit words. The LFSR was developed in three steps.
First, the characteristic polynomial of the Turing LFSR was chosen to be of the form p(X) = X^17 + X^15 + X^4 + α over GF(2^32). The exponents {17, 15, 4, 0} were chosen because they provide good security; the use of these exponents dates back to the design of SOBER-t16 and SOBER-t32 [13]. Next, the coefficient α = 0x00000100 ≡ 0x00·y^3 + 0x00·y^2 + 0x01·y + 0x00 = y was chosen because it allows an efficient software implementation: multiplication by α consists of shifting the word left by 8 bits, and adding (XOR) a pre-computed constant from a table indexed by the most significant 8 bits, as in SNOW. A portion of this table Multab for Turing is shown in Appendix A. In C code, the new word to be inserted in the LFSR is calculated:
new = R[15] ^ R[4] ^ (R[0] << 8) ^ Multab[R[0] >> 24];
where ^ is the XOR operation, << is the left shift operation, and >> is the right shift operation. Finally, the irreducible polynomial representing the Galois field W was chosen to be y^4 + 0xD0·y^3 + 0x2B·y^2 + 0x43·y + 0x67, since it satisfies the following constraints:
– The LFSR must have a maximum-length period. The period has the maximum length (2^544 − 1) when the field representations make p(X) a primitive polynomial of degree 17 over the field W.
– Half of the coefficients of the bit-wise recurrence must be 1. The Turing LFSR is mathematically equivalent to 32 parallel bit-wide LFSRs over GF(2): each of length 544, equal to the total state of 17 × 32 = 544 bits; each with the same recurrence relation; but different initial state [15]. Appendix D shows the polynomial p1(x) corresponding to the binary recurrence for the Turing LFSR. Requiring half of the coefficients to be 1 is ideal for maximum diffusion and strength against cryptanalysis.
The key stream is generated as follows (see Figure 1). First, the LFSR is clocked. Then the 5 values in R[16], R[13], R[6], R[1], R[0] are selected as the inputs (A, B, C, D, E) (respectively) to the nonlinear filter (NLF).
The NLF produces the nonlinear block (YA, YB, YC, YD, YE) from (A, B, C, D, E). The LFSR is clocked an additional three times, and the values in R[14], R[12], R[8], R[1], R[0] of this new state (referred to as WA, WB, WC, WD, WE) are selected for the whitening. These five words are added (modulo 2^32) to the corresponding nonlinear-block words to form a 160-bit key stream block (ZA, ZB, ZC, ZD, ZE). Finally, the LFSR is clocked once more before generating the next key stream block (a total of five clocks between producing outputs). The key stream is output in the order ZA, . . . , ZE, most significant byte of each word first. Issues of buffering bytes to encrypt data that is not aligned as multiples of 20 bytes are considered outside the scope of this document.
3 The Nonlinear Filter
The only component of Turing that is explicitly nonlinear is its S-boxes. Additional nonlinearity comes from the combination of the operations of addition modulo 2^32 and XOR; while each of these operations is linear in its respective mathematical group, each is slightly nonlinear in the other’s group. As shown in Figure 1, the nonlinear filter in Turing consists of:
– Selecting the 5 input words A, B, C, D, E;
– Mixing the words using a 5-word Pseudo-Hadamard Transform (5-PHT), resulting in 5 new words TA, TB, TC, TD, TE;
– Applying a 32 × 32 S-box construction to each of the words to form XA, XB, XC, XD, XE. Prior to applying the S-box construction, the words TB, TC and TD are rotated left by 8, 16 and 24 bits respectively, to address a potential attack described below. The S-box construction mixes the bytes within each word using four key-dependent, 8 → 32 nonlinear S-boxes;
– Again mixing using the 5-PHT to form the words YA, YB, YC, YD, YE of the nonlinear block.
Note that the use of variables XA, XB and so forth is only to make the explanation simple. In practice, the same variable A would be overwritten for each of TA, XA, YA, ZA, and similarly for B, C, D, E.

3.1 The “Pseudo-Hadamard Transform” (PHT)
In the cipher family of SAFER [16], Massey uses this very simple construct (called a Pseudo-Hadamard Transform) to mix the values of two bytes: (a, b) = (2a + b, a + b), where the addition operation is addition modulo 2^8, the size of the bytes. The operation can be further extended to mix an arbitrary number of words (often called an n-PHT). Such operations are used in the SAFER++ block cipher [17] and the tc24 block cipher [24]. The Turing NLF uses addition modulo 2^32 to perform a 5-PHT:

    TA     2 1 1 1 1     A
    TB     1 2 1 1 1     B
    TC  =  1 1 2 1 1  ·  C
    TD     1 1 1 2 1     D
    TE     1 1 1 1 1     E

Note that all diagonal entries are 2 except the last diagonal entry, which is 1, not 2. In C code, this is easily implemented and highly efficient:
E = A + B + C + D + E;
A = A + E; B = B + E; C = C + E; D = D + E;
3.2 The S-Box Construction
The Turing S-box construction transforms each word using four logically independent 8 → 32 S-boxes S0, S1, S2, S3. These 8 → 32 S-boxes are applied to the corresponding bytes of the input word and XORed, in a manner similar to that used in Rijndael [6]. However, unlike Rijndael, this transformation is unlikely to be invertible, as the expansion from 8 bits to 32 bits is nonlinear. These four 8 → 32 S-boxes are based in turn on a fixed 8 → 8 bit permutation denoted Sbox and a fixed nonlinear 8 → 32 bit function denoted Qbox, iterated with the data modified by variables derived during key setup.
The Sbox. The fixed 8 → 8 S-box is referred to in the rest of this document as Sbox[.]. It is a permutation of the input byte, has a minimum nonlinearity of 104, and is shown in Appendix B. The Sbox is derived by the following procedure, based on the well-known stream cipher RC4. RC4 was keyed with the 11-character ASCII string “Alan Turing”, and then 256 generated bytes were discarded. Then the current permutation used in RC4 was tested for nonlinearity, another byte generated, and so on, until a total of 10000 bytes had been generated. The best observed minimum nonlinearity was 104, which first occurred after 736 bytes had been generated. The corresponding state table, that is, the internal permutation after keying and generating 736 bytes, forms Sbox. By happy coincidence, this permutation also has no fixed points (i.e. ∀x, Sbox[x] ≠ x).
The Qbox. The Qbox is a fixed nonlinear 8 → 32-bit table, shown in Appendix C. It was developed by the Queensland University of Technology at our request [8]. It is best viewed as 32 independent Boolean functions of the 8 input bits. The criteria for its development were: the functions should be highly nonlinear (each has nonlinearity of 114); the functions should be balanced (same number of zeroes and ones); and the functions should be pairwise uncorrelated.
Computing the Keyed 8 → 32 S-boxes. Turing uses four keyed 8 → 32 S-boxes S0, S1, S2, S3. The original key is first transformed into the mixed key during key loading (see Section 4.1). The mixed key is accessed as bytes Ki[j]; the j index (0 ≤ j < N, where N is the number of words of the key) locates the word of the stored mixed key, while the i index (0 ≤ i ≤ 3) is the byte of the word, with the byte numbered 0 being the most significant byte.
Each S-box Si (0 ≤ i ≤ 3) uses bytes from the corresponding byte positions of the scheduled key. The process is best presented in algorithmic form. The following code implements the entire S-box construction, including the XOR of the four outputs of the individual S-boxes. The value w is the input word, and the integer r is the amount of rotation (recall that TB, TC, TD have their inputs rotated before being input to the S-box construction).
static WORD S(WORD w, int r)
{
    register int i;
    BYTE b[4];
    WORD ws[4];

    w = ROTL(w, r);   /* cyclic rotate w to left by r bits */
    WORD2BYTE(w, b);  /* divide w into bytes b[0]...b[3] */
    ws[0] = ws[1] = ws[2] = ws[3] = 0;
    for (i = 0; i < keylen; ++i) {
        /* compute b[i]=t_i and ws[i]=w_i */
        /* B(A,i) extracts the i-th byte of A */
        b[0] = Sbox[B(K[i],0) ^ b[0]]; ws[0] ^= ROTL(Qbox[b[0]], i+0);
        b[1] = Sbox[B(K[i],1) ^ b[1]]; ws[1] ^= ROTL(Qbox[b[1]], i+8);
        b[2] = Sbox[B(K[i],2) ^ b[2]]; ws[2] ^= ROTL(Qbox[b[2]], i+16);
        b[3] = Sbox[B(K[i],3) ^ b[3]]; ws[3] ^= ROTL(Qbox[b[3]], i+24);
    }
    /* now xor the individual S-box outputs together */
    w  = (ws[0] & 0x00FFFFFFUL) | (b[0] << 24); /* S_0 */
    w ^= (ws[1] & 0xFF00FFFFUL) | (b[1] << 16); /* xor S_1 */
    w ^= (ws[2] & 0xFFFF00FFUL) | (b[2] << 8);  /* xor S_2 */
    w ^= (ws[3] & 0xFFFFFF00UL) | b[3];         /* xor S_3 */
    return w;
}
We shall briefly explain the process for an individual 8 × 32 S-box. The input byte b is combined with a key byte and passed through the fixed Sbox, the result is combined with another key byte, and so on, to form a temporary result:

t_i(x) = Sbox[K_i[N−1] ⊕ Sbox[K_i[N−2] ⊕ · · · Sbox[K_i[0] ⊕ x] · · ·]],

where ⊕ is the XOR operator, and N is the number of words in the key. Note that the byte function t_i : x → t_i(x) forms a permutation. This process can be visualised as the input byte “bouncing around” under the control of the key. At each bounce, a rotated word from the Qbox is accumulated into another temporary word w_i(x); the rotation depends on the byte position in question and the stage of progress, ensuring that no entries of the Qbox can cancel each other out. Finally, the byte in position i of w_i(x) is replaced with t_i(x) to form the output S_i(x), to ensure that the byte in position i is balanced with respect to the input.
4 Keying the Stream Cipher
For Turing, the key and Initialization Vector (IV) are presented as byte streams, and converted to 32-bit words in most-significant-byte-first (big-endian) order. The key and IV must therefore each be a multiple of 4 bytes in length. The minimum size of the key is 32 bits; although this is clearly useless cryptographically, Turing with a 32-bit key makes a good, seedable pseudorandom number generator for statistical and simulation purposes. We hope for security equal to key enumeration for keys up to 256 bits. The largest key size supported is 256 bits. The minimum size of the IV is zero; however, an IV loading stage is mandatory even in this case, because the LFSR is initialized when the IV is loaded. The maximum size of the IV is determined by the key length; the sum of the key length and IV length must be no more than 12 words (384 bits). There is no requirement that the size of the IV be constant. The structure of Turing guarantees that different key/IV length combinations will generate distinct output streams. No more than 2^160 (160-bit) blocks of output should be generated using any one key/IV combination.
4.1 Key Loading
The original key undergoes two steps of transformation during key loading, a byte-mixing step and a word-mixing step, resulting in the mixed key.
Byte-Mixing Step. The key bytes are mixed through the fixed Sbox and the Qbox, to ensure that all bytes of the key affect all four of the keyed S-boxes. For each word of the key, the bytes are transformed serially through the Sbox, using the Qbox in an unbalanced Feistel structure for each byte to alter the other three bytes of the word.
static WORD fixedS(WORD w)
{
    WORD b;

    b = Sbox[B(w, 0)]; w = ((w ^ Qbox[b])          & 0x00FFFFFF) | (b << 24);
    b = Sbox[B(w, 1)]; w = ((w ^ ROTL(Qbox[b],8))  & 0xFF00FFFF) | (b << 16);
    b = Sbox[B(w, 2)]; w = ((w ^ ROTL(Qbox[b],16)) & 0xFFFF00FF) | (b << 8);
    b = Sbox[B(w, 3)]; w = ((w ^ ROTL(Qbox[b],24)) & 0xFFFFFF00) | b;
    return w;
}
Word-Mixing Step. An n-PHT transform forms the mixed key words. The transformation from the original key to the mixed key is reversible, ensuring that no keys are equivalent. The resulting words are stored for subsequent use; they occupy the same amount of space as the original key, which is no longer needed. These words will be used in the key-dependent S-boxes, and also during the IV loading process to initialize the LFSR.
If the fastest implementation of Turing is desired, at this point each of the four keyed S-boxes can be “pre-computed” and stored in four tables, each with 256 32-bit entries. For each S-box, the 256 outputs of the keyed S-box are computed and stored in a table, indexed by the corresponding input values. The combined S-box construction then consists of four byte-indexed table lookups and four word XOR operations for each input word. A similar optimization is used in fast implementations of Rijndael.
Note. The role of the mixing steps in the key loading is to prevent related-key attacks. For embedded applications with a key fixed in read-only memory, it is permissible to skip the key-loading stage and use a high-quality cryptographic key directly as the mixed key (i.e., as if it were the output of the key-loading stage). Another alternative is to provision the mixed key (rather than the original key) directly into the hardware.

4.2 IV Loading
The Initialization Vector (IV) loading process initializes the LFSR with values derived from a non-linear mixing of the key and the IV. Let L be the length in words of the key, and I be the length in words of the IV. The LFSR register is initialized in the following manner:
– The IV words are copied into place and processed using the byte-mixing step fixedS() described above.
– The mixed key words are appended, without further processing.
– A single word, 0x010203LI, is appended, where L and I are single hexadecimal digits. This ensures that different-length keys and IVs cannot create the same initial LFSR state.
– The remaining words of the register are filled by adding the immediately previous word to the word (L + I) before that, then processing the resulting word with the keyed 32 × 32 S-box construction (here denoted S()). That is, the k-th word R[k] (L + I + 1 ≤ k < 17) is set to S(R[k−1] + R[k−L−I−1]).
– Finally, once the LFSR has been filled with data, the contents are mixed with a 17-PHT.
Keystream generation can now begin.
5 Performance
If sufficient random-access memory (RAM) is available, the operations of the four keyed S-boxes can be precalculated at the time of key setup, resulting in four tables, one for each byte of the input word. Many current high-end microprocessor CPUs allow multiple instructions to execute at once, if the instructions are sufficiently independent. Note that the operations mentioned above are all highly parallel, allowing very good performance on such processors. Similarly, hardware or FPGA implementations can achieve high throughput using parallel paths. In the cases where the key is provisioned into hardware, it is possible for the entire key-scheduling process, including the calculation of these tables, to be done at the time of provisioning. Thus, instead of 4K bytes of RAM and 1280 bytes of ROM, 4K bytes of ROM is sufficient and yields a very fast implementation. (A further 1024 bytes of ROM is still required for the multiplication table.)
Lastly, note that there is no accumulation of nonlinear data, nor is the clocking irregular. Therefore, if it is desirable to generate a small amount of keystream offset in a much larger block, this can be done by “fast forwarding” the LFSR using polynomial or matrix exponentiation in logarithmic time, rather than the linear time that would be required to generate and discard the intermediate output.
Turing provides flexibility of efficient implementation. There are 4 separate implementations in the source code archive [22]:
– TuringRef.c, an unoptimized reference implementation, which also uses little RAM. It does not precompute any tables.
– TuringTab.c precomputes the keyed S-boxes when the key is set. It uses 4K bytes of RAM in addition to the 1280-byte Multab for the LFSR.
– TuringLazy.c is a key-agile implementation, which fills in entries of the S-box tables only as they are required (lazy evaluation). Thus key and IV setup are relatively fast, and encryption speed is adequate.
– TuringFast.c uses S-box tables computed at key setup time, and performs as much computation inline as possible.
Table 1 shows various performance figures. These are measured times on an IBM laptop with a 900 MHz mobile Pentium III processor, using Microsoft Visual C++ V6.0, with the optimization options for a “Release” build. Comparison times for Brian Gladman’s (highly optimized) implementation of AES and our implementation of an RC4-compatible cipher with a bulk encryption interface are also shown. All figures are for 128-bit keys. We consider RC4’s keying operation to actually be “IV setup”, and this does not include time to either discard generated bytes or to hash the key and IV, which would be necessary for security.

Table 1. Performance figures comparing the speed of various implementations of Turing against AES and RC4 (arrsyfor).

Cipher       MByte/s  cycles/B  Key setup  IV setup  Tables   Additional
                                 (cycles)  (cycles)  (bytes)  RAM (bytes)
TuringRef       6.04    149.01     477.00   4272.31     2304           68
TuringLazy     26.92     33.43    1802.70    991.80     2304         4164
TuringTab      29.94     30.06   72457.93    900.90     2304         4164
TuringFast    146.95      6.12   72417.12    882.90     2304         4164
arrsyfor       24.00     37.49       0.00  10347.42        0          258
AES enc.       33.53     26.85     239.00      0.00    20480          176

6 Security Analysis
In this section, we attempt to justify Turing’s security by reference to the mechanisms by which it defeats a variety of known attacks. In this analysis we assume the attacker has direct access to the stream generator output.
Summary. A keystream generator that exhibits basic statistical biases or detectable characteristics is weak. The LFSR used has well-studied statistical properties that translate directly to the output. Additionally, the highly nonlinear, key-dependent transformation at the core of Turing serves to disguise the inherent linearity of the LFSR output. We have extensively tested output from Turing using the Crypt-X package [7] and have detected no statistical weaknesses.

6.1 Period
The LFSR is clocked five times for each output block. Since five is a factor of the LFSR period (2^544 − 1), the period of any cycle is (2^544 − 1)/5 blocks. This corresponds to the expected period of (2^544 − 1) words.
6.2 Guess and Determine Attacks
The choice of feedback positions for the LFSR and output positions to the NLF is copied to Turing from the SOBER t-class ciphers [13] (the LFSR taps and NLF taps, respectively). The taps were chosen starting with the criterion that the NLF taps form a “full positive difference set”, so that as words move through the register and are selected as input to the nonlinear filter function, no pair of words is used more than once [10]. The combination of taps for the LFSR and NLF was then mechanically optimized against guess and determine attacks [1, 2, 12]. In addition, such attacks rely upon the fact that the nonlinear function can be rewritten so that given its output, and (n − 1) of its n inputs, the remaining input can be determined. Turing’s nonlinear filter function design frustrates this by (a) being key-dependent, (b) being non-invertible, and (c) requiring a large amount of output to build a large inversion table. Moreover, the choice of output positions has proven to frustrate other attacks. It is worth noting that SOBER-t32 (and hence the underlying structure of Turing) has been extensively analyzed for the NESSIE project [19], and it seems that the structure should provide a minimum attack complexity exceeding the enumeration of 256-bit keys.

6.3 Analysis of the Non-linear Filter
Coppersmith et al. have defined a fairly general model [4] for distinguishing attacks against nonlinear filter generators. While there could exist other attacks on the cipher, it seems that most attacks are likely to reduce to some variation on this model, so we describe our analysis in terms of this model. The model assumes that some significant correlation can be identified in the filter function, and that this correlation will remain usable after outputs have been combined in such a way as to eliminate the linear part from consideration. The attack relies on finding a highly-correlated linear relationship between the LFSR state and some function of the outputs. Courtois [5] recently described an algebraic attack on LFSR-based stream ciphers exploiting highly-correlated Boolean functions of bits of the LFSR state and bits of the key stream. These functions are called approximations to the NLF. The approximations do not need to be linear; however, the algebraic normal forms of these approximations do need to be of low order. This analysis of the NLF explains why we believe all low-order approximations to the NLF will have low correlation, thus resisting algebraic attacks.
The S-Boxes. The XORing of the four outputs of the 8 × 32 S-boxes makes it likely that approximations require approximating the four individual 8 × 32 S-boxes rather than just one 8 × 32 S-box. The S-boxes in Turing are further designed to limit the correlation of low-order approximations. We are still performing detailed analysis of typical S-boxes used by Turing, but generally speaking the nonlinear functions are complex and of high degree, and each involves at least 8 intermediate binary variables. The keying of the S-boxes provides significant protection, since an attacker must consider either (1) average-key correlations, which are expected to be negligible, or (2) key-specific approximations, which are difficult to find without access to the key.
A final comment on the S-boxes. Recall that the accumulated word w_i(x) is highly nonlinear with respect to the input, and highly dependent on the key material; however, the bit positions in it are not likely to be balanced. Each byte function t_i, being a permutation, is by definition balanced. Replacing byte i of w_i(x) with t_i(x) forms an output of S_i that is balanced in byte i. Thus, when all the S-box outputs are XORed, the output is balanced in each output bit, and the distribution of values is uniform for each byte position. However, for a given key, the distribution of outputs from the whole 32 × 32 S-box construction is not uniform. When the key is fixed, the S-box construction appears to be a 32 × 32 pseudorandom function.
The 5-PHT. The main advantage of using the 5-PHT is its speed and parallelism; however, it is not as good a mixing function as could be desired. There are linear approximations between the LSBs of the inputs and output that hold with probability one, and many quadratic approximations that hold with probability one. A further undesirable characteristic is that if two of the input words A, B, C, or D are equal, they remain equal after the transformation (e.g. A = B ⇒ TA = TB). The S-boxes operate only on individual words, and so also preserve this equality. This equality is preserved in the next 5-PHT, so the two words in the nonlinear block are also equal. This undesirable differential characteristic is addressed by rotating the inputs to the S-boxes corresponding to TB, TC, and TD.

6.4 Analysis of the Whitening
While the S-boxes do a good job of masking the linearity of the underlying LFSR, the distribution of outputs from the 32 × 32 S-box construction (for a given secret key) is likely to be far from uniform. Thus, over the lifetime of a key, the distribution of 160-bit outputs from the NLF will be far from uniform. Our analysis indicates that, for a given key, the NLF behaves as a 160 × 160 pseudorandom function. The whitening has three effects: first, it makes the outputs uniform; second, these operations “lock in” the mixing of the last 5-PHT stage, since an attacker needs to remove the effects of these words before being able to reverse these mixing rounds; finally, by adding five new words, more than half of the register state is involved in the filter function.
Unfortunately, the whitening is linear in the LSBs of each word. The linear nature of the LFSR means that the whitening blocks satisfy bit-wise linear recurrence relations. The corresponding key stream blocks can be combined to cancel the LSBs of the whitening and yield a linear relationship between the LSBs of the key stream blocks and the LSBs of the nonlinear blocks. The choice of LFSR values used in the whitening was based on three criteria: no LFSR word should be used in the final addition stage of more than one output block; the taps should be a full positive difference set; and no input to the final addition stage should be used in the NLF of more than one output block. The first criterion is the most important. If it is not satisfied, then an attacker obtains additional linear relationships between the LSBs of the key stream blocks and nonlinear blocks. When combined with the linear relationships discussed in the previous paragraph, the attacker could obtain a solvable system of equations, and Turing would be broken.
6.5 Analysis of the Key Loading and IV Loading
Key Loading. Related Key Attacks: Related-key attacks assume that the attacker can somehow obtain key stream from keys that are closely related to the key being attacked. Turing’s key loading mechanism exists solely to address this attack, by ensuring that a change to any single byte of the key will significantly (and nonlinearly) alter the behaviour of all the S-boxes and also the initial loading of the LFSR. The transformation performed is bijective and publicly known, so it is easy to create pairs of input keys which are very similar after transformation. However, finding a key whose transformation is similar to that of an unknown key appears difficult.
IV Loading. It is well known that key stream generated by a synchronous stream cipher should not be re-used, irrespective of the security of the cipher. Turing has an integrated mechanism to support Initialization Vectors (IVs), which allows many key streams to be generated from the same shared key.
Related / Chosen Initialization Vector Attacks: Initialization Vectors are often related (e.g. counters are often used) and might even be chosen by the attacker. The IV is used to initialize the LFSR, so we have been careful to fill the LFSR in a highly key-dependent and nonlinear manner. Any change in the IV first makes a large change in the corresponding word loaded. That word will cause an unpredictable change in at least one of the fill words, and those changes will then be propagated through the LFSR by the 17-PHT transform. LFSR states derived from different IVs are less obviously related than states drawn from different segments of the same output stream.
Acknowledgements. The authors would like to thank Thomas St. Denis, Scott Fluhrer, David McGrew and David Wagner for useful insights and feedback on the design of Turing.
References

1. S. Blackburn, S. Murphy, F. Piper, and P. Wild. A SOBERing remark. Unpublished technical report, Information Security Group, Royal Holloway University of London, Egham, Surrey TW20 0EX, U.K., 1998.
2. D. Bleichenbacher and S. Patel. SOBER cryptanalysis. Fast Software Encryption, FSE'99, Lecture Notes in Computer Science, vol. 1636, L. Knudsen ed., Springer-Verlag, pages 305–316, 1999.
3. D. Bleichenbacher, S. Patel and W. Meier. Analysis of the SOBER stream cipher. TIA Contribution TR45.AHAG/99.08.30.12.
4. D. Coppersmith, S. Halevi and C. Jutla. Cryptanalysis of Stream Ciphers using Linear Masking. Advances in Cryptology - CRYPTO 2002, Lecture Notes in Computer Science, vol. 2442, M. Yung ed., Springer-Verlag, pages 515–532, 2002.
5. N. T. Courtois. Higher Order Correlation Attacks, XL Algorithm and Cryptanalysis of Toyocrypt. Cryptology ePrint Archive, International Association for Cryptologic Research (IACR), document 2002/087, 2002. See http://eprint.iacr.org.
Turing: A Fast Stream Cipher
6. J. Daemen, V. Rijmen. AES Proposal: Rijndael, 2000. See http://www.esat.kuleuven.ac.be/˜rijmen/rijndael.
7. E. Dawson, A. Clark, H. Gustafson and L. May. CRYPT-X'98 (Java Version) User Manual. Queensland University of Technology, 1999.
8. E. Dawson, W. Millan, L. Burnett and G. Carter. On the Design of 8*32 S-boxes. Unpublished report, Information Systems Research Centre, Queensland University of Technology, 1999.
9. P. Ekdahl and T. Johansson. SNOW - a new stream cipher, 2000. Available from the NESSIE webpages: http://www.cosic.esat.kuleuven.ac.be/nessie/workshop/submissions/snow.zip.
10. J. Dj. Golic. On Security of Nonlinear Filter Generators. Fast Software Encryption, FSE'96, Lecture Notes in Computer Science, vol. 1039, D. Gollmann ed., Springer-Verlag, pages 27–32, 1996.
11. C. Hall and B. Schneier. An Analysis of SOBER. Unpublished report, 1999.
12. P. Hawkes and G. Rose. Exploiting multiples of the connection polynomial in word-oriented stream ciphers. Advances in Cryptology - ASIACRYPT 2000, Lecture Notes in Computer Science, vol. 1976, T. Okamoto ed., Springer-Verlag, pages 302–316, 2000.
13. P. Hawkes and G. Rose. The t-class of SOBER stream ciphers, 2000. See: http://people.qualcomm.com/ggr/QC/tclass.pdf.
14. P. Hawkes and G. Rose. Primitive specification and supporting documentation for SOBER-t32 submission to NESSIE, 2000. See: http://www.cosic.esat.kuleuven.ac.be/nessie/workshop/submissions/sobert32.zip.
15. T. Herlestam. On Functions of Linear Shift Register Sequences. Advances in Cryptology - EUROCRYPT '85, Lecture Notes in Computer Science, vol. 219, F. Pichler ed., Springer-Verlag, pages 119–129, 1986.
16. J. L. Massey. SAFER K-64: A Byte-oriented Block-Ciphering Algorithm. Fast Software Encryption, FSE'93, Lecture Notes in Computer Science, vol. 809, Springer-Verlag, 1993.
17. J. Massey, G. Khachatrian and M. Kuregian. Nomination of SAFER++ as Candidate Algorithm for the New European Schemes for Signatures, Integrity, and Encryption (NESSIE), September 2000. See: http://www.cosic.esat.kuleuven.ac.be/nessie/workshop/submissions/safer++.zip.
18. A. Menezes, P. van Oorschot and S. Vanstone. Handbook of Applied Cryptography. CRC Press, 1997, Chapter 6.
19. The NESSIE Project - New European Schemes for Signatures, Integrity, and Encryption, 2000–2003. See: http://www.cryptonessie.org.
20. C. Paar. Efficient VLSI Architectures for Bit-Parallel Computation in Galois Fields. Ph.D. Thesis, Institute for Experimental Mathematics, University of Essen, 1994, ISBN 3-18-332810-0.
21. G. Rose. A stream cipher based on linear feedback over GF(2^8). Information Security and Privacy, Third Australasian Conference, ACISP'98, Lecture Notes in Computer Science, vol. 1438, C. Boyd, E. Dawson eds., Springer-Verlag, pages 135–146, 1998.
22. G. Rose. Reference Source Code for Turing. QUALCOMM Australia, 2002. See: http://people.qualcomm.com/ggr/QC/Turing.tgz.
23. B. Schneier, J. Kelsey, D. Whiting, D. Wagner, C. Hall and N. Ferguson. Twofish: A 128-Bit Block Cipher. See: http://www.counterpane.com/twofish.html.
24. T. St. Denis. Weekend Cipher. sci.crypt news article: [email protected].
Appendix A: Portion of Multiplication Table for Turing

/* Multiplication table for Turing */
unsigned long Multab[256] = {
    0x00000000, 0xD02B4367, 0xED5686CE, 0x3D7DC5A9,
    0x97AC41D1, 0x478702B6, 0x7AFAC71F, 0xAAD18478,
    ...
    0x78DEE220, 0xA8F5A147, 0x958864EE, 0x45A32789,
    0xEF72A3F1, 0x3F59E096, 0x0224253F, 0xD20F6658,
};
Appendix B: The Sbox

unsigned char Sbox[256] = {
    0x61, 0x51, 0xeb, 0x19, 0xb9, 0x5d, 0x60, 0x38,
    0x7c, 0xb2, 0x06, 0x12, 0xc4, 0x5b, 0x16, 0x3b,
    0x2b, 0x18, 0x83, 0xb0, 0x7f, 0x75, 0xfa, 0xa0,
    0xe9, 0xdd, 0x6d, 0x7a, 0x6b, 0x68, 0x2d, 0x49,
    0xb5, 0x1c, 0x90, 0xf7, 0xed, 0x9f, 0xe8, 0xce,
    0xae, 0x77, 0xc2, 0x13, 0xfd, 0xcd, 0x3e, 0xcf,
    0x37, 0x6a, 0xd4, 0xdb, 0x8e, 0x65, 0x1f, 0x1a,
    0x87, 0xcb, 0x40, 0x15, 0x88, 0x0d, 0x35, 0xb3,
    0x11, 0x0f, 0xd0, 0x30, 0x48, 0xf9, 0xa8, 0xac,
    0x85, 0x27, 0x0e, 0x8a, 0xe0, 0x50, 0x64, 0xa7,
    0xcc, 0xe4, 0xf1, 0x98, 0xff, 0xa1, 0x04, 0xda,
    0xd5, 0xbc, 0x1b, 0xbb, 0xd1, 0xfe, 0x31, 0xca,
    0xba, 0xd9, 0x2e, 0xf3, 0x1d, 0x47, 0x4a, 0x3d,
    0x71, 0x4c, 0xab, 0x7d, 0x8d, 0xc7, 0x59, 0xb8,
    0xc1, 0x96, 0x1e, 0xfc, 0x44, 0xc8, 0x7b, 0xdc,
    0x5c, 0x78, 0x2a, 0x9d, 0xa5, 0xf0, 0x73, 0x22,
    0x89, 0x05, 0xf4, 0x07, 0x21, 0x52, 0xa6, 0x28,
    0x9a, 0x92, 0x69, 0x8f, 0xc5, 0xc3, 0xf5, 0xe1,
    0xde, 0xec, 0x09, 0xf2, 0xd3, 0xaf, 0x34, 0x23,
    0xaa, 0xdf, 0x7e, 0x82, 0x29, 0xc0, 0x24, 0x14,
    0x03, 0x32, 0x4e, 0x39, 0x6f, 0xc6, 0xb1, 0x9b,
    0xea, 0x72, 0x79, 0x41, 0xd8, 0x26, 0x6c, 0x5e,
    0x2c, 0xb4, 0xa2, 0x53, 0x57, 0xe2, 0x9c, 0x86,
    0x54, 0x95, 0xb6, 0x80, 0x8c, 0x36, 0x67, 0xbd,
    0x08, 0x93, 0x2f, 0x99, 0x5a, 0xf8, 0x3a, 0xd7,
    0x56, 0x84, 0xd2, 0x01, 0xf6, 0x66, 0x4d, 0x55,
    0x8b, 0x0c, 0x0b, 0x46, 0xb7, 0x3c, 0x45, 0x91,
    0xa4, 0xe3, 0x70, 0xd6, 0xfb, 0xe6, 0x10, 0xa9,
    0xc9, 0x00, 0x9e, 0xe7, 0x4f, 0x76, 0x25, 0x3f,
    0x5f, 0xa3, 0x33, 0x20, 0x02, 0xef, 0x62, 0x74,
    0xee, 0x17, 0x81, 0x42, 0x58, 0x0a, 0x4b, 0x63,
    0xe5, 0xbe, 0x6e, 0xad, 0xbf, 0x43, 0x94, 0x97,
};
Appendix C: The Qbox

WORD Qbox[256] = {
    0x1faa1887, 0x4e5e435c, 0x9165c042, 0x250e6ef4,
    0x5957ee20, 0xd484fed3, 0xa666c502, 0x7e54e8ae,
    0xd12ee9d9, 0xfc1f38d4, 0x49829b5d, 0x1b5cdf3c,
    0x74864249, 0xda2e3963, 0x28f4429f, 0xc8432c35,
    0x4af40325, 0x9fc0dd70, 0xd8973ded, 0x1a02dc5e,
    0xcd175b42, 0xf10012bf, 0x6694d78c, 0xacaab26b,
    0x4ec11b9a, 0x3f168146, 0xc0ea8ec5, 0xb38ac28f,
    0x1fed5c0f, 0xaab4101c, 0xea2db082, 0x470929e1,
    0xe71843de, 0x508299fc, 0xe72fbc4b, 0x2e3915dd,
    0x9fa803fa, 0x9546b2de, 0x3c233342, 0x0fcee7c3,
    0x24d607ef, 0x8f97ebab, 0xf37f859b, 0xcd1f2e2f,
    0xc25b71da, 0x75e2269a, 0x1e39c3d1, 0xeda56b36,
    0xf8c9def2, 0x46c9fc5f, 0x1827b3a3, 0x70a56ddf,
    0x0d25b510, 0x000f85a7, 0xb2e82e71, 0x68cb8816,
    0x8f951e2a, 0x72f5f6af, 0xe4cbc2b3, 0xd34ff55d,
    0x2e6b6214, 0x220b83e3, 0xd39ea6f5, 0x6fe041af,
    0x6b2f1f17, 0xad3b99ee, 0x16a65ec0, 0x757016c6,
    0xba7709a4, 0xb0326e01, 0xf4b280d9, 0x4bfb1418,
    0xd6aff227, 0xfd548203, 0xf56b9d96, 0x6717a8c0,
    0x00d5bf6e, 0x10ee7888, 0xedfcfe64, 0x1ba193cd,
    0x4b0d0184, 0x89ae4930, 0x1c014f36, 0x82a87088,
    0x5ead6c2a, 0xef22c678, 0x31204de7, 0xc9c2e759,
    0xd200248e, 0x303b446b, 0xb00d9fc2, 0x9914a895,
    0x906cc3a1, 0x54fef170, 0x34c19155, 0xe27b8a66,
    0x131b5e69, 0xc3a8623e, 0x27bdfa35, 0x97f068cc,
    0xca3a6acd, 0x4b55e936, 0x86602db9, 0x51df13c1,
    0x390bb16d, 0x5a80b83c, 0x22b23763, 0x39d8a911,
    0x2cb6bc13, 0xbf5579d7, 0x6c5c2fa8, 0xa8f4196e,
    0xbcdb5476, 0x6864a866, 0x416e16ad, 0x897fc515,
    0x956feb3c, 0xf6c8a306, 0x216799d9, 0x171a9133,
    0x6c2466dd, 0x75eb5dcd, 0xdf118f50, 0xe4afb226,
    0x26b9cef3, 0xadb36189, 0x8a7a19b1, 0xe2c73084,
    0xf77ded5c, 0x8b8bc58f, 0x06dde421, 0xb41e47fb,
    0xb1cc715e, 0x68c0ff99, 0x5d122f0f, 0xa4d25184,
    0x097a5e6c, 0x0cbf18bc, 0xc2d7c6e0, 0x8bb7e420,
    0xa11f523f, 0x35d9b8a2, 0x03da1a6b, 0x06888c02,
    0x7dd1e354, 0x6bba7d79, 0x32cc7753, 0xe52d9655,
    0xa9829da1, 0x301590a7, 0x9bc1c149, 0x13537f1c,
    0xd3779b69, 0x2d71f2b7, 0x183c58fa, 0xacdc4418,
    0x8d8c8c76, 0x2620d9f0, 0x71a80d4d, 0x7a74c473,
    0x449410e9, 0xa20e4211, 0xf9c8082b, 0x0a6b334a,
    0xb5f68ed2, 0x8243cc1b, 0x453c0ff3, 0x9be564a0,
    0x4ff55a4f, 0x8740f8e7, 0xcca7f15f, 0xe300fe21,
    0x786d37d6, 0xdfd506f1, 0x8ee00973, 0x17bbde36,
    0x7a670fa8, 0x5c31ab9e, 0xd4dab618, 0xcc1f52f5,
    0xe358eb4f, 0x19b9e343, 0x3a8d77dd, 0xcdb93da6,
    0x140fd52d, 0x395412f8, 0x2ba63360, 0x37e53ad0,
    0x80700f1c, 0x7624ed0b, 0x703dc1ec, 0xb7366795,
    0xd6549d15, 0x66ce46d7, 0xd17abe76, 0xa448e0a0,
    0x28f07c02, 0xc31249b7, 0x6e9ed6ba, 0xeaa47f78,
    0xbbcfffbd, 0xc507ca84, 0xe965f4da, 0x8e9f35da,
    0x6ad2aa44, 0x577452ac, 0xb5d674a7, 0x5461a46a,
    0x6763152a, 0x9c12b7aa, 0x12615927, 0x7b4fb118,
    0xc351758d, 0x7e81687b, 0x5f52f0b3, 0x2d4254ed,
    0xd4c77271, 0x0431acab, 0xbef94aec, 0xfee994cd,
    0x9c4d9e81, 0xed623730, 0xcf8a21e8, 0x51917f0b,
    0xa7a9b5d6, 0xb297adf8, 0xeed30431, 0x68cac921,
    0xf1b35d46, 0x7a430a36, 0x51194022, 0x9abca65e,
    0x85ec70ba, 0x39aea8cc, 0x737bae8b, 0x582924d5,
    0x03098a5a, 0x92396b81, 0x18de2522, 0x745c1cb8,
    0xa1b8fe1d, 0x5db3c697, 0x29164f83, 0x97c16376,
    0x8419224c, 0x21203b35, 0x833ac0fe, 0xd966a19a,
    0xaaf0b24f, 0x40fda998, 0xe7d52d71, 0x390896a8,
    0xcee6053f, 0xd0b0d300, 0xff99cbcc, 0x065e3d40,
};
Appendix D: The Binary Equivalent Polynomial p1

The Turing LFSR is equivalent to 32 parallel binary LFSRs with a characteristic polynomial (shown in binary, with the first bit being the constant term and increasing exponent):

1000000000000000010101010101011101110111011000110110001111100011
0110011010110011100100011110010111110011110110011110011001001100
0011110001101011011111010101110010001001111001110111101001110111
0101111001000000000101100001001011101101111110100000111010000111
1000111101011100001110000001000000101111110100011100000000101011
0111011100100011110000101111111010110011101100011111000110010100
1010111101011111110000011110111101110010001100000010110111000010
1000101111000111111100001010100111100001110011110111100010100000
001000011100000000001000100000001

That is, p1(x) = 1 + x^17 + x^19 + x^21 + x^23 + x^25 + x^27 + x^29 + x^30 + ... + x^520 + x^521 + x^532 + x^536 + x^544. This polynomial has 273 nonzero terms.
Rabbit: A New High-Performance Stream Cipher

Martin Boesgaard, Mette Vesterager, Thomas Pedersen, Jesper Christiansen, and Ove Scavenius

CRYPTICO A/S, Fruebjergvej 3, 2100 Copenhagen, Denmark
[email protected]
Abstract. We present a new stream cipher, Rabbit, based on iterating a set of coupled non-linear functions. Rabbit is characterized by high performance in software, with a measured encryption/decryption speed of 3.7 clock cycles per byte on a Pentium III processor. We have performed a detailed security analysis, in particular correlation analysis and algebraic investigations. The cryptanalysis of Rabbit did not reveal an attack better than exhaustive key search.

Keywords: Stream cipher, fast, non-linear, coupled, counter, chaos
1 Introduction

Stream ciphers are an important class of symmetric encryption algorithms. Their basic design philosophy is inspired by the One-Time-Pad cipher, which encrypts by XOR'ing the plaintext with a random key. However, the need for a key of the same size as the plaintext makes the One-Time-Pad impractical for most applications. Instead, stream ciphers expand a given short random key into a pseudo-random keystream, which is then XOR'ed with the plaintext to generate the ciphertext. Consequently, the design goal for a stream cipher is to efficiently generate pseudo-random bits which are indistinguishable from truly random bits. The aim of the present work is to design a secure stream cipher which is highly efficient in software.

1.1 The History behind Rabbit
The design of Rabbit was inspired by the complex behavior of real-valued chaotic maps. Chaotic maps are primarily characterized by an exponential sensitivity to small perturbations, which causes iterates of such maps to appear random and to be unpredictable over long times. These properties have previously led to suggestions that chaotic systems can be used for cryptographic purposes, see [1], [2] and references therein. However, even though chaotic systems exhibit random-like behavior, they are not necessarily cryptographically secure in their discretized form, see e.g. [3, 4]. The reason is partly that discretized chaotic functions do not automatically yield sufficiently complex behavior of the corresponding binary functions, which is a prerequisite for cryptographic security. It is therefore essential that the complexity of the binary functions is considered in the design phase so that necessary modifications can be made. Moreover, many suggested ciphers based on chaos suffer from reproducibility problems of the keystream due to the different handling of floating-point numbers on various processors, see e.g. [5].

T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 307–329, 2003.
© International Association for Cryptologic Research 2003

The design goal of Rabbit was to take advantage of the random-like properties of real-valued chaotic maps and, at the same time, to secure optimal cryptographic properties when discretizing them. More precisely, the design was initiated by constructing a chaotic system of coupled non-linear maps. This system was then restricted to be fixed-point valued¹. This ensured reproducibility, and made the system analyzable from a binary point of view using well-known cryptographic techniques (see e.g. [7]). The analysis motivated some systematic improvements of the equation system, some of which were strictly binary in nature, e.g. the adoption of rotations and the XOR operator. Those changes were advantageous for both the complexity of the binary functions and the performance.

1.2 Rabbit in General
The Rabbit algorithm can briefly be described as follows. It takes a 128-bit secret key as input and, for each iteration, generates an output block of 128 pseudo-random bits from a combination of the internal state bits. Encryption/decryption is carried out by XOR'ing the pseudo-random data with the plaintext/ciphertext. The size of the internal state is 513 bits, divided between eight 32-bit state variables, eight 32-bit counters and one counter carry bit. The eight state variables are updated by eight coupled non-linear integer-valued functions. The counters secure a lower bound on the period length of the state variables. The specific design goals of Rabbit were as follows:

– Security: The cipher should justify a key size of 128 bits for encrypting up to 2^64 bytes of plaintext.
– Speed: It should be faster than commonly used ciphers.

1.3 Summary of Results
The cryptanalysis of Rabbit resulted in the following. To investigate the possibilities for Divide-and-Conquer and Guess-and-Determine types of attacks, an algebraic analysis was performed with special attention to the non-linear parts of the next-state function, as they are the main sources for mixing input bits. No such attacks better than exhaustive key search were found. To verify the resistance against correlation and distinguishing types of attacks, a correlation analysis was performed by calculating the Walsh-Hadamard spectra of the non-linear parts. Based on the correlation analysis, we do not believe there exists a correlation-type attack which requires less work than exhaustive key search for an output sequence shorter than 2^64 bytes. We measured an encryption/decryption speed of Rabbit of 3.7 clock cycles per byte on a Pentium III processor. For an ARM7 processor the measured performance was 10.5 clock cycles per byte.

¹ This means that each variable is represented by an integer-type number, where a virtual decimal point is introduced manually; see [6] for details.

1.4 Organization and Notation
In section two we describe the design of Rabbit in detail. We discuss the cryptanalysis of Rabbit in section three, and in section four the performance results are presented. We conclude and summarize in section five. Appendix A contains the ANSI C code for Rabbit. Note that the description below and the source code are specified for little-endian processors (e.g. most Intel processors). Appendix B contains test vectors. Appendix C discusses important properties of the counter system in detail.

We use the following notation: ⊕ denotes logical XOR, & denotes logical AND, << and >> denote left and right logical bit-wise shift, ≪ and ≫ denote left and right bit-wise rotation, and ‖ denotes concatenation of two bit sequences. A^[g..h] means bit number g through h of variable A. When numbering bits of variables, the least significant bit is denoted by 0. Hexadecimal numbers are prefixed by "0x". Finally, we use integer notation for all variables and constants.

2 The Design of Rabbit

In this section we provide a detailed description of the algorithm design.

2.1 The Cipher Algorithm
The internal state of the stream cipher consists of 513 bits. 512 bits are divided between eight 32-bit state variables x_{j,i} and eight 32-bit counter variables c_{j,i}, where x_{j,i} is the state variable of subsystem j at iteration i, and c_{j,i} denotes the corresponding counter variable. There is one counter carry bit, φ_{7,i}, which needs to be stored between iterations. This counter carry bit is initialized to zero. The eight state variables and the eight counters are derived from the key at initialization.

Key Setup Scheme. The algorithm is initialized by expanding the 128-bit key into both the eight state variables and the eight counters such that there is a one-to-one correspondence between the key and the initial state variables, x_{j,0}, and the initial counters, c_{j,0}. The key, K^[127..0], is divided into eight subkeys: k_0 = K^[15..0], k_1 = K^[31..16], ..., k_7 = K^[127..112]. The state and counter variables are initialized from the subkeys as follows:

    x_{j,0} = k_{(j+1 mod 8)} ‖ k_j                   for j even
    x_{j,0} = k_{(j+5 mod 8)} ‖ k_{(j+4 mod 8)}       for j odd       (1)

    c_{j,0} = k_{(j+4 mod 8)} ‖ k_{(j+5 mod 8)}       for j even
    c_{j,0} = k_j ‖ k_{(j+1 mod 8)}                   for j odd.      (2)
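The key expansion of Eqs. (1) and (2) can be sketched in C as follows. This is our illustration (names ours, not those of the reference implementation in Appendix A); it assumes that k_a ‖ k_b places k_a in the upper 16 bits:

```c
#include <stdint.h>

/* Sketch of the Rabbit key expansion, Eqs. (1)-(2).  k[0..7] are the eight
 * 16-bit subkeys (k[0] = least significant 16 bits of the 128-bit key). */
static void rabbit_key_expand(const uint16_t k[8], uint32_t x[8], uint32_t c[8])
{
    for (int j = 0; j < 8; j++) {
        if ((j & 1) == 0) {                       /* j even */
            x[j] = ((uint32_t)k[(j + 1) & 7] << 16) | k[j];
            c[j] = ((uint32_t)k[(j + 4) & 7] << 16) | k[(j + 5) & 7];
        } else {                                  /* j odd */
            x[j] = ((uint32_t)k[(j + 5) & 7] << 16) | k[(j + 4) & 7];
            c[j] = ((uint32_t)k[j] << 16) | k[(j + 1) & 7];
        }
    }
}
```

Since 8 is a power of two, the `mod 8` reductions are written as `& 7`.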
The system is iterated four times, according to the next-state function defined below, to diminish correlations between bits in the key and bits in the internal state variables. Finally, the counter values are re-initialized according to:

    c_{j,4} = c_{j,4} ⊕ x_{(j+4 mod 8),4}        (3)
to prevent recovery of the key by inversion of the counter system.

Next-State Function. The core of the Rabbit algorithm is the iteration of the system defined by the following equations:

    x_{0,i+1} = g_{0,i} + (g_{7,i} ≪ 16) + (g_{6,i} ≪ 16)
    x_{1,i+1} = g_{1,i} + (g_{0,i} ≪ 8) + g_{7,i}
    x_{2,i+1} = g_{2,i} + (g_{1,i} ≪ 16) + (g_{0,i} ≪ 16)
    x_{3,i+1} = g_{3,i} + (g_{2,i} ≪ 8) + g_{1,i}
    x_{4,i+1} = g_{4,i} + (g_{3,i} ≪ 16) + (g_{2,i} ≪ 16)
    x_{5,i+1} = g_{5,i} + (g_{4,i} ≪ 8) + g_{3,i}
    x_{6,i+1} = g_{6,i} + (g_{5,i} ≪ 16) + (g_{4,i} ≪ 16)
    x_{7,i+1} = g_{7,i} + (g_{6,i} ≪ 8) + g_{5,i}                          (4)

    g_{j,i} = ((x_{j,i} + c_{j,i})^2 ⊕ ((x_{j,i} + c_{j,i})^2 >> 32)) mod 2^32    (5)

where all additions are modulo 2^32. This coupled system is schematically illustrated in Fig. 1. Before an iteration the counters are incremented as described below.

Counter System. The dynamics of the counters is defined as follows:

    c_{0,i+1} = c_{0,i} + a_0 + φ_{7,i}   mod 2^32
    c_{1,i+1} = c_{1,i} + a_1 + φ_{0,i+1} mod 2^32
    c_{2,i+1} = c_{2,i} + a_2 + φ_{1,i+1} mod 2^32
    c_{3,i+1} = c_{3,i} + a_3 + φ_{2,i+1} mod 2^32
    c_{4,i+1} = c_{4,i} + a_4 + φ_{3,i+1} mod 2^32
    c_{5,i+1} = c_{5,i} + a_5 + φ_{4,i+1} mod 2^32
    c_{6,i+1} = c_{6,i} + a_6 + φ_{5,i+1} mod 2^32
    c_{7,i+1} = c_{7,i} + a_7 + φ_{6,i+1} mod 2^32                         (6)
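A minimal C sketch of Eqs. (4) and (5) follows (names are ours; the reference ANSI C code is in Appendix A). Note that the square in the g-function must be computed on 64 bits before the high half is folded back:

```c
#include <stdint.h>

/* Sketch of the g-function, Eq. (5): square the 32-bit sum on 64 bits
 * and XOR the upper half into the lower half. */
static uint32_t g_func(uint32_t x, uint32_t c)
{
    uint64_t s  = (uint64_t)(uint32_t)(x + c);  /* (x + c) mod 2^32 */
    uint64_t sq = s * s;                        /* full 64-bit square */
    return (uint32_t)(sq ^ (sq >> 32));
}

static uint32_t rotl32(uint32_t v, int r) { return (v << r) | (v >> (32 - r)); }

/* One application of the next-state function, Eq. (4).  The counters c[]
 * are assumed to have been incremented beforehand, per Eq. (6). */
static void next_state(uint32_t x[8], const uint32_t c[8])
{
    uint32_t g[8];
    for (int j = 0; j < 8; j++)
        g[j] = g_func(x[j], c[j]);
    for (int j = 0; j < 8; j += 2) {
        x[j]     = g[j] + rotl32(g[(j + 7) & 7], 16) + rotl32(g[(j + 6) & 7], 16);
        x[j + 1] = g[j + 1] + rotl32(g[j], 8) + g[(j + 7) & 7];
    }
}
```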
Fig. 1. Graphical illustration of the system.
where the counter carry bit, φ_{j,i+1}, is given by

    φ_{j,i+1} = 1 if c_{0,i} + a_0 + φ_{7,i} ≥ 2^32 and j = 0
    φ_{j,i+1} = 1 if c_{j,i} + a_j + φ_{j−1,i+1} ≥ 2^32 and j > 0
    φ_{j,i+1} = 0 otherwise.                                       (7)

Furthermore, the a_j constants are defined as:

    a_0 = 0x4D34D34D    a_1 = 0xD34D34D3
    a_2 = 0x34D34D34    a_3 = 0x4D34D34D
    a_4 = 0xD34D34D3    a_5 = 0x34D34D34
    a_6 = 0x4D34D34D    a_7 = 0xD34D34D3.                          (8)
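The counter dynamics of Eqs. (6)–(8) amount to adding a 256-bit constant to a 256-bit counter, with the final carry φ_7 wrapping around into c_0 on the next iteration. A sketch (ours, not the reference code):

```c
#include <stdint.h>

/* The a_j constants of Eq. (8). */
static const uint32_t A[8] = {
    0x4D34D34Du, 0xD34D34D3u, 0x34D34D34u, 0x4D34D34Du,
    0xD34D34D3u, 0x34D34D34u, 0x4D34D34Du, 0xD34D34D3u
};

/* One counter increment, Eqs. (6)-(7).  *carry holds phi_7 between calls. */
static void counter_update(uint32_t c[8], uint32_t *carry)
{
    uint32_t phi = *carry;
    for (int j = 0; j < 8; j++) {
        uint64_t t = (uint64_t)c[j] + A[j] + phi;  /* 33-bit intermediate */
        phi  = (uint32_t)(t >> 32);                /* phi_j for next stage */
        c[j] = (uint32_t)t;                        /* mod 2^32 */
    }
    *carry = phi;                                  /* phi_7, stored */
}
```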
Extraction Scheme. After each iteration, 128 bits of output are generated as follows:

    s_i^[15..0]   = x_{0,i}^[15..0]  ⊕ x_{5,i}^[31..16]    s_i^[31..16]   = x_{0,i}^[31..16] ⊕ x_{3,i}^[15..0]
    s_i^[47..32]  = x_{2,i}^[15..0]  ⊕ x_{7,i}^[31..16]    s_i^[63..48]   = x_{2,i}^[31..16] ⊕ x_{5,i}^[15..0]
    s_i^[79..64]  = x_{4,i}^[15..0]  ⊕ x_{1,i}^[31..16]    s_i^[95..80]   = x_{4,i}^[31..16] ⊕ x_{7,i}^[15..0]
    s_i^[111..96] = x_{6,i}^[15..0]  ⊕ x_{3,i}^[31..16]    s_i^[127..112] = x_{6,i}^[31..16] ⊕ x_{1,i}^[15..0]    (9)

where s_i is the 128-bit keystream block at iteration i.
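Eq. (9) can be sketched as follows (our helper, not the reference implementation), writing the 128-bit keystream block as four 32-bit words, with s32[0] holding s^[31..0]:

```c
#include <stdint.h>

/* Sketch of the extraction scheme, Eq. (9): each 16-bit half of the output
 * XORs halves of state words from distant subsystems. */
static void extract(const uint32_t x[8], uint32_t s32[4])
{
    for (int k = 0; k < 4; k++) {
        uint32_t lo = (x[2 * k] & 0xFFFF) ^ (x[(2 * k + 5) & 7] >> 16);
        uint32_t hi = (x[2 * k] >> 16)    ^ (x[(2 * k + 3) & 7] & 0xFFFF);
        s32[k] = (hi << 16) | lo;
    }
}
```

For k = 0 this reproduces s^[15..0] = x0^[15..0] ⊕ x5^[31..16] and s^[31..16] = x0^[31..16] ⊕ x3^[15..0]; the index offsets +5 and +3 (mod 8) generate the remaining rows of Eq. (9).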
Encryption/Decryption Scheme. The extracted bits are XOR'ed with the plaintext/ciphertext to encrypt/decrypt:

    c_i = p_i ⊕ s_i,        (10)
    p_i = c_i ⊕ s_i,        (11)
where ci and pi denote the ith ciphertext and plaintext blocks, respectively.
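Since Eqs. (10) and (11) are the same XOR operation, encryption and decryption share one routine; a trivial sketch (ours):

```c
#include <stdint.h>
#include <stddef.h>

/* XOR a keystream block s into a plaintext or ciphertext block, Eqs. (10)-(11).
 * Applying it twice with the same keystream recovers the original data. */
static void xor_block(uint8_t *out, const uint8_t *in, const uint8_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] ^ s[i];
}
```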
3 Security Analysis

The security analysis is divided into six parts. First we discuss the key setup function and counter properties. We then perform an algebraic analysis of the next-state function, a correlation analysis of the binary functions, and discuss statistical properties of Rabbit. In the last part the results of the investigations are used in specific types of attacks such as Guess-and-Determine, Divide-and-Conquer, distinguishing and correlation attacks.

3.1 Key Setup Properties
In this section we describe specific properties of the key setup scheme. The setup can be divided into three stages: key expansion, system iteration and counter modification.

Key Expansion. In the key expansion stage we ensure two properties. The first is a one-to-one correspondence between the key, the state and the counter, which prevents key redundancy. The second is that after one iteration of the next-state function, each key bit has affected all eight state variables. More precisely, for a given key bit there exists a j such that this key bit affects the output of g_{j,0}, g_{(j+1 mod 8),0}, g_{(j+4 mod 8),0} and g_{(j+5 mod 8),0}. At least one of those g-functions enters each of the eight next-state subfunctions.

System Iteration. The key expansion scheme ensures that after two iterations of the next-state function, all state bits are affected by all key bits with a measured probability of 0.5. A safety margin is provided by iterating the system four times.

Counter Modification. Even though the counters should be assumed known to an attacker, the counter modification makes it hard to recover the key by inverting the counter system, as this would require additional knowledge of the state variables. Due to the counter modification we cannot guarantee that every key results in unique counter values. However, we do not believe this to cause a problem, as will be discussed later on.

3.2 Counter Properties
In this section we describe the dynamics of the counters, i.e. the period length and bit-flip probabilities of individual bit values.
Period Length. The most important feature of counter-assisted stream ciphers [8] is that strict lower bounds on the period lengths can be provided. The adopted counter system in Rabbit has a period length of 2^256 − 1. Since it can be shown that the input to the g-functions has at least the same period, a highly conservative lower bound on the period of the state variables, N_x > 2^158, can be secured (see Appendix C).

Probabilities for Bit-Flips in the Counters. For a 256-bit counter incremented by one, the period length for bit position i is 2^{i+1}. This implies that the least significant bit has a bit-flip probability of 1 and the most significant bit has a bit-flip probability of 2^{−255}. Consequently, the value of the most significant bit will remain constant for 2^255 iterations, thereby making it very predictable. In contrast, all bits in the counter defined by Eqs. (6) and (7) have equal period length, as each bit is indirectly influenced by all other bits due to the feedback of the carry, φ_{7,i}, into the counter c_{0,i}. This implies that all bits have the same period length as the full system. In Appendix C we calculate the bit-flip probabilities in the counter system. The most important findings are as follows. For the chosen a_j constants, Eq. (8), the bit-flip probabilities for the individual bit positions all lie in the interval [0.17; 0.91]. Furthermore, the probabilities are unique for each bit position. Since all the counter bits have full period and unique bit-flip probabilities, it seems difficult to predict bit patterns of the counter variables.

3.3 Algebraic Analysis
In this section we analyze a given output byte's dependence on its input bytes. The counter is ignored in the following. We first analyze the g-function defined by

    g(y) = (y^2 ⊕ (y^2 >> 32)) mod 2^32.                            (12)

By dividing y into four bytes, y = (a << 24) + (b << 16) + (c << 8) + d, we can write y^2 as:

    ((a << 24) + (b << 16) + (c << 8) + d)^2
        = (a^2 << 48) + (ab << 41) + (ac << 33) + (b^2 << 32)
        + (ad << 25) + (bc << 25) + (bd << 17) + (c^2 << 16)
        + (cd << 9) + d^2.                                          (13)

The similar form of g(y) follows directly from the above. By collecting the terms corresponding to each of the four g(y) output bytes, their dependencies on the four input bytes can be obtained. These dependencies are summarized in Table 1. To quantitatively examine a given output byte's dependence on its input bytes, we define an input mask function, M_I(y) = y & m_I, and a similar output mask function, M_O(y) = y & m_O, where m_I and m_O are masks selecting specific byte patterns. For all input values, y, we calculate

    z = M_O( g(M_I(y)) ⊕ g(y) ).                                    (14)

This function characterizes the error in the output byte based on the input bytes defined by the mask m_I. We can define a measure for this error by calculating its corresponding entropy.
Table 1. The influences of the input bytes on the output bytes of the g-function. The subscripts H and L denote the eight most and least significant bits of the 16-bit result of the multiplication of the two 8-bit numbers. For simplicity, carries from additions are ignored and the shifts are rounded to the nearest multiple of eight.

                  g(y)[31..24]     g(y)[23..16]     g(y)[15..8]      g(y)[7..0]
a = y[31..24]     (ad)L + (a²)H    (a²)L + (ab)H    (ab)L + (ac)H    (ac)L + (ad)H
b = y[23..16]     (bc)L + (bd)H    (bd)L + (ab)H    (ab)L + (b²)H    (b²)L + (bc)H
c = y[15..8]      (bc)L + (c²)H    (c²)L + (cd)H    (cd)L + (ac)H    (ac)L + (bc)H
d = y[7..0]       (ad)L + (bd)H    (bd)L + (cd)H    (cd)L + (d²)H    (d²)L + (ad)H
Table 2. The entropy of the error, maximally 8 bits, for an estimated output byte when removing a given input byte.

                  g(y)[31..24]    g(y)[23..16]    g(y)[15..8]    g(y)[7..0]
a = y[31..24]     7.99            7.99            7.99           7.99
b = y[23..16]     7.99            7.99            7.99           7.99
c = y[15..8]      7.99            7.99            7.99           7.99
d = y[7..0]       7.99            7.99            7.99           7.98
The specific investigation consisted of calculating the 16 entropies obtained by using all combinations of four 8-bit rotations of m_I = 0x00FFFFFF and four 8-bit rotations of m_O = 0x000000FF. The results are shown in Table 2. The table shows the entropy of z for the 16 different byte-wise combinations. We clearly observe the expected behavior from Table 1. Hence, we conclude that all four output bytes of the g-function each depend on four input bytes. Removing any of those input bytes will result in nearly maximal entropy of the error of the output bytes. We also performed a similar analysis based on individual bits instead of individual bytes, leading to similar conclusions. Using the above results, we analyze the next-state subfunctions given by

    f_even(y1, y2, y3) = g(y1) + (g(y2) ≪ 16) + (g(y3) ≪ 16)        (15)

and

    f_odd(y1, y2, y3) = g(y1) + (g(y2) ≪ 8) + g(y3).                (16)

Each function depends on three independent g-functions, of which one or two have been rotated. Therefore, we can easily construct a table similar to Table 1 and use the results shown in Table 2 to obtain the corresponding entropies of the errors for the next-state function. Clearly, all output bytes of the next-state function depend on the maximal 12 input bytes. Consequently, removing any of those input bytes will result in nearly maximal entropy of the error of the output bytes.

3.4 Linear Correlation Analysis
The aim of the correlation analysis is to find the best linear approximations between bits in the input to the next-state function and the extracted output.
Each of the eight next-state subfunctions takes three 32-bit state variables and three 32-bit counter values as input and returns the corresponding updated 32-bit state variable. Each bit position in x_{j,i+1} defines a binary function from {0,1}^192 to {0,1}. Thus, assuming that all 192 input bits are independently and uniformly distributed random variables, all correlations from output bits to linear combinations of input bits can be found via the Walsh-Hadamard Transform (WHT) [10, 11]. Clearly, we cannot numerically perform such a complete WHT of a 192-bit binary function. However, by analyzing the basic building block of the next-state function, i.e. the g-function, we obtain linear approximations of the cipher and their corresponding correlation coefficients. Note that all correlation coefficients are given as absolute values.

The g-Function. In the following we ignore the counter system and focus only on the correlation between the output of the g-function and its 32-bit input, y ≡ x + c. The WHTs of all single output bits of the g-function revealed that the largest correlation coefficients for all output bits of g(y) lie in the interval [2^−9.74; 2^−9.00]. Among those, the best linear approximation is:

    g^[6] ≈ y^[0] ⊕ y^[3] ⊕ (y^[5] ⊕ y^[6] ⊕ ... ⊕ y^[16] ⊕ y^[17])
          ⊕ (y^[19] ⊕ y^[20] ⊕ ... ⊕ y^[30] ⊕ y^[31]).              (17)
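The Walsh-Hadamard transforms referred to above can be computed with the standard in-place butterfly; a generic sketch (ours), operating on a ±1-encoded truth table of an m-bit Boolean function:

```c
#include <stdint.h>

/* In-place fast Walsh-Hadamard transform.  On entry t[x] = (-1)^f(x) for an
 * m-bit function f (n = 2^m entries); on exit t[w] is the correlation sum
 * between f and the linear function w.x, i.e. 2^m times the correlation
 * coefficient up to sign. */
static void fwht(int32_t *t, int n)
{
    for (int len = 1; len < n; len <<= 1)
        for (int i = 0; i < n; i += len << 1)
            for (int k = i; k < i + len; k++) {
                int32_t u = t[k], v = t[k + len];
                t[k]       = u + v;
                t[k + len] = u - v;
            }
}
```

For example, for f(x0, x1) = x0 the spectrum is zero everywhere except at the mask selecting x0, where it reaches the maximal value 4.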
In general, linear approximations for linear combinations of binary functions can be obtained by a convolution of the involved WHT spectra [11]. An exhaustive investigation of all 2^32 possible convolutions of WHT spectra from the individual output bits of the g-function is not feasible. However, investigations of all convolutions of 16-, 18- and 20-bit g-functions show that the largest resulting correlation coefficients are of similar magnitude as those of the non-combined output bits, and we expect the 32-bit g-function to behave similarly.

Non-combined Output Bits. To determine linear approximations between the input to the next-state subfunctions and single output bits of the next-state subfunctions, we applied the following strategy. The next-state function includes the addition of three g-functions. To take those additions into account, we determined the best linear approximations for the function f(a, b, c) = a + b + c for each bit position. For instance, for each bit position j ≥ 3 the best linear approximations are:

    f^[j] ≈ a^[j] ⊕ b^[j] ⊕ c^[j]                                   (18)
    f^[j] ≈ a^[j] ⊕ b^[j] ⊕ c^[j] ⊕ a^[j−1] ⊕ b^[j−1]               (19)
    f^[j] ≈ a^[j] ⊕ b^[j] ⊕ c^[j] ⊕ b^[j−1] ⊕ c^[j−1]               (20)
    f^[j] ≈ a^[j] ⊕ b^[j] ⊕ c^[j] ⊕ a^[j−1] ⊕ c^[j−1].              (21)

Their corresponding correlation coefficients are given by Σ_{n=1}^{j−1} 2^{−2n}. The next step is to substitute a, b and c by the corresponding binary functions of the g-function, i.e. a^[j] ⊕ a^[j−1] → g^[j] ⊕ g^[j−1]. By determining the WHT spectra of the independent parts, we obtain linear approximations for the output bits
of the next-state function. Each corresponding correlation coefficient is found by multiplying the product of the correlation coefficients for the independent parts with each correlation coefficient for the addition approximations. This results in a largest correlation coefficient of 2^−28.6 for f_even[1]. In the extraction function, two bits from independent subsystems are XORed, and the largest correlation coefficient can therefore be determined as the product of their largest individual correlation coefficients, yielding a largest correlation coefficient of 2^−57.8 for f_even[1] ⊕ f_odd[17].

Linearly Combined Output-Bits. We assume that the best linear approximations for combined output bits are those that depend on the smallest number of g-functions. At the same time, we assume that for a given number of g-function dependencies, the best approximations are those which include the fewest combinations of extracted output bits. To find these g-function dependencies, additions were replaced by XORs and the output from each g-function was divided into 8-bit blocks. Then an exhaustive search among all combinations of extracted output bytes was performed to find those with the fewest g-function dependencies. It was found that all combinations of output bytes depend on at least four different g-functions, which can only be obtained by combining at least five extracted output bits. On the other hand, it was found that by combining two extracted output bits, the smallest number of g-function dependencies is five. For instance, combining the extracted outputs s[7..0] and s[127..120] yields:

   s[7..0] ⊕ s[127..120] = (g0 + (g7 ≪ 16) + (g6 ≪ 16))[7..0] ⊕ (g5 + (g4 ≪ 8) + g3)[23..16]
                           ⊕ (g1 + (g0 ≪ 8) + g7)[15..8] ⊕ (g6 + (g4 ≪ 16) + (g5 ≪ 16))[31..24]   (22)
and using Eq. (18) for each parenthesis, i.e. replacing addition with XOR, we obtain:

   s[7..0] ⊕ s[127..120] ≈ g1[15..8] ⊕ g3[23..16] ⊕ g5[15..8] ⊕ g5[23..16]
                           ⊕ g6[23..16] ⊕ g6[31..24] ⊕ g7[15..8] ⊕ g7[23..16],   (23)
which depends on five different g-functions. The largest corresponding correlation coefficient is 2^−59.8. All other combinations of two output bits depending on five g-functions have smaller correlation coefficients. An example of a linear approximation that depends on only four g-functions is:

   s[7..0] ⊕ s[23..16] ⊕ s[79..72] ⊕ s[55..48] ⊕ s[111..104] ≈ g3[23..16] ⊕ g5[7..0] ⊕ g5[23..16] ⊕ g5[31..24]
                           ⊕ g6[7..0] ⊕ g6[15..8] ⊕ g6[23..16] ⊕ g7[7..0] ⊕ g7[23..16] ⊕ g7[31..24],   (24)
Rabbit: A New High-Performance Stream Cipher
317
with a largest correlation coefficient of 2^−59.2. All other byte-wise combinations of five output bits depending on four g-functions have smaller correlation coefficients.

3.5 Statistical Tests
The statistical tests on Rabbit were performed using the NIST Test Suite [12], the DIEHARD battery of tests [13] and the ENT test [14]. Tests were performed on the internal state as well as on the extracted output. Furthermore, we also conducted various statistical tests on the key setup function. Finally, we performed the same tests on a version of Rabbit where each state variable and counter variable was only 8 bits wide. No weaknesses were found in any case.

3.6 Resulting Attacks
This subsection discusses relevant attacks based on the above analysis.

Attacks on the Key Setup Function. Due to the four iterations after key expansion and the final counter modification, both the counter bits and the state bits depend strongly and highly non-linearly on the key bits. This makes attacks based on guessing parts of the key difficult. Furthermore, even if the counter bits were known after the counter modification, it would still be hard to recover the key. Of course, knowing the counters makes other types of attacks easier.

As the non-linear map in Rabbit is many-to-one, different keys could potentially result in the same keystream. This concern can basically be reduced to the question whether different keys result in the same counter values, since different counter values must necessarily lead to different keystreams. The reason is that when the periodic solution has been reached, the next-state function, including the counter system, is one-to-one on the set of points in the period. The key expansion scheme was designed such that each key leads to unique counter values. However, the counter modification might result in equal counter values for two different keys. Assuming that the output after the four initial iterations is essentially random and not correlated with the counter system, the probability of counter collisions is essentially given by the birthday paradox, i.e. among all 2^128 keys one collision is expected in the 256-bit counter state. Thus, we do not believe counter collisions to cause a problem.

Another possibility for related-key attacks is to exploit the symmetries of the next-state and key setup functions. For instance, consider two keys, K and K̃, related by K[i] = K̃[i+32] for all i. This leads to the relations x_{j,0} = x̃_{j+2,0} and c_{j,0} = c̃_{j+2,0}. If the a_j constants were related in the same way, the next-state function would preserve this property. In the same way, this symmetry could lead to a set of bad keys, i.e. if K[i] = K[i+32] for all i, then x_{j,0} = x_{j+2,0} and c_{j,0} = c_{j+2,0}. However, the next-state function does not preserve this property due to the counter system, as a_j ≠ a_{j+2}.

Divide-and-Conquer Attack. This type of attack is feasible if only a part of the state needs to be known in order to predict a significant fraction of the
output bits. An attacker will guess a part of the state, predict the output bits, and compare them with the actually observed output bits. Our strategy is to accurately predict one extracted output byte based on guessing as few input bytes as possible. According to Section 3.3, the attacker must guess 2 · 12 input bytes for the different g-functions. Thus, 192 bits in total must be guessed. Furthermore, we have verified that calculating fewer extracted bits than a byte still results in more work than exhaustive key search. Finally, when replacing all additions by XORs, all byte-wise combinations of the extracted output still depend on at least four different g-functions, see Section 3.4. To conclude, it is not possible to verify a guess on fewer bits than the key size.

Guess-and-Determine Attack. The strategy for this attack is to guess a few of the unknown variables of the cipher and from those deduce the remaining unknowns. For simplicity, we assume that the counters are static. A simple attack of this type consists of guessing the remaining 128 bits of the internal state from the extracted 128 bits for each of two consecutive iterations. This amounts to guessing the remaining 128 + 128 bits and deriving the counter values. Each of the resulting systems must then be iterated a couple of times to verify the output.

However, the above attack assumes that no advantage is gained by dividing the counters and state variables into smaller blocks. An attack exploiting this possibility can be formulated as follows. Divide the 32-bit state variables and counters into 8-bit variables. Construct an equation system consisting of the 8 · 4 8-bit subsystems for N iterations, together with the corresponding (N + 1) · 8 extraction functions, which are split into (N + 1) · 16 8-bit functions. In order to obtain a closed system of equations, output from 4 · 8 extraction functions is needed, i.e. N = 3. Thus, the equation system consists of 160 coupled equations with 8 · 4 unknown counter bytes and (3 + 1) · 8 · 4 unknown state bytes, i.e. a total of 160 unknowns.

A strategy for solving this equation system must be found by guessing as few input bytes as possible and determining the remaining unknown bytes. The efficiency of such a strategy depends on the number of variables that must be guessed before the determining process can begin. This number is given by the 8-bit subsystem with the fewest input variables. Neglecting the counters, the results of Section 3.3 show that each byte of the next-state function depends on 12 input bytes. When the counters are included, each output byte of a subsystem depends on 24 input bytes. Consequently, the attacker must guess more than 128 bits before the determining process can begin, thus making the attack infeasible. Dividing the system into smaller blocks than bytes leads to the same conclusion.

Distinguishing and Correlation Attacks. In the case of a distinguishing attack, the attacker tries to distinguish a sequence generated by the cipher from a sequence of truly random numbers. A distinguishing attack using less than 2^64 bytes of output cannot be mounted using only the best linear approximation
found in Section 3.4, because the corresponding correlation coefficient is 2^−57.8. This implies that in order to observe this particular correlation, output from 2^114 iterations must be generated [9]. The independent counters have very simple and almost linear dynamics. Therefore, large correlations to the counter bits might open the possibility of a correlation attack (see e.g. [15]) for recovering the counters. It is not feasible to exploit only the best linear approximation in order to recover a counter value. However, more correlations to the counters could be exploited. As this requires that there exist many such large and usable correlations, we do not believe such an attack to be feasible. Knowing the values of the counters may significantly improve the Guess-and-Determine attack and the Divide-and-Conquer attack, as well as a Distinguishing attack, even though obtaining the key from the counter values is prevented by the counter modification in the setup function.
4 Performance
In this section we provide performance results from implementations of Rabbit on 32-bit processors and discuss 8-bit implementation aspects.

4.1 32-Bit Processors
Encryption speeds for the specific processors were obtained by encrypting 200 kilobytes of data stored in RAM and measuring the number of clock cycles passed. Memory requirements and performance results are listed in the tables below. For convenience, all 513 bits of the internal state are stored in an instance structure, occupying a total of 68 bytes. The presented memory requirements show the amount of memory allocated on the stack for calling conventions (function arguments, return address and preserved registers) and temporary data. No memory requirements for storing the key, the instance, the ciphertext and the plaintext have been included.

Intel Pentium III Architecture. The performance was measured on a 1000 MHz Pentium III processor. The speed-optimized version of Rabbit was programmed in assembly language inlined in C using MMX instructions and compiled using the Intel C++ 7.0 compiler. The results are listed in Table 3 below. A memory-optimized version (where calling conventions are ignored) eliminates the need for memory, including the instance structure, since the entire instance structure and temporary data can fit into the CPU registers.

Table 3. Code size, memory requirements and performance for Pentium III.

Function               Code size   Memory     Performance
Key Setup              617 bytes   32 bytes   350 cycles
Encryption/Decryption  440 bytes   36 bytes   3.7 cycles/byte
ARM7 Architecture. A speed-optimized ARM implementation was compiled and tested using ARM Developer Suite version 1.2 for the ARM7TDMI. Performance was measured using the integrated ARMulator. The performance results, code size and memory requirements are listed in Table 4 below:

Table 4. Code size, memory requirements and performance for ARM7.

Function               Code size   Memory     Performance
Key Setup              516 bytes   44 bytes   679 cycles
Encryption/Decryption  424 bytes   52 bytes   10.5 cycles/byte
4.2 8-Bit Processors
The simplicity and small size of Rabbit make it suitable for implementation on processors with limited resources such as 8-bit microcontrollers. Multiplying 32-bit integers is rather resource-demanding using plain 32-bit arithmetic. However, as seen in Eq. (13) in Section 3.3, squaring involves only ten 8-bit multiplications, which reduces the workload by approximately a factor of two. Finally, the rotations in the algorithm have been chosen to correspond to simple byte swapping.
5 Conclusion
In this paper we presented a new stream cipher called Rabbit. A complete description of the algorithm and an evaluation of its security properties, performance and implementation aspects were given. Our most important findings include the following: In terms of security, Guess-and-Determine attacks, Divide-and-Conquer attacks, as well as Distinguishing and Correlation attacks were considered, but no attack better than exhaustive key search was found. The measured encryption/decryption performance was 3.7 clock cycles per byte on a Pentium III processor and 10.5 clock cycles per byte on an ARM7 processor.
Acknowledgements The authors would like to thank Vincent Rijmen for several ideas and suggestions. Furthermore, we thank Ivan Damgaard and Tomas Bohr for many helpful inputs. Finally, we would like to thank Anne Canteaut and the anonymous reviewers for helpful feedback.
References

1. S. Wolfram: Cryptography with Cellular Automata, Proceedings of CRYPTO '85, pp. 429-432 (1985)
2. G. Jakimoski and L. Kocarev: Chaos and Cryptography: Block Encryption Ciphers Based on Chaotic Maps, IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, Vol. 48, No. 2, pp. 163-169 (2001)
3. T. Habutsu, Y. Nishio, I. Sasase and S. Mori: A Secret Key Cryptosystem by Iterating a Chaotic Map, Proceedings of EUROCRYPT '91, Springer, Berlin, pp. 127-140 (1991)
4. E. Biham: Cryptanalysis of the Chaotic-Map Cryptosystem Suggested at EUROCRYPT '91, Proceedings of EUROCRYPT '91, Springer, Berlin, pp. 532-534 (1991)
5. R. Matthews: On the Derivation of a "Chaotic" Encryption Algorithm, Cryptologia, Vol. XIII, No. 1, pp. 29-41 (1989)
6. R. Yates: Fixed-Point Arithmetic: An Introduction, http://personal.mia.bellsouth.net/lig/y/a/yatesc/fp.pdf
7. A. Menezes, P. van Oorschot and S. Vanstone: Handbook of Applied Cryptography, CRC Press LLC (1997)
8. A. Shamir and B. Tsaban: Guaranteeing the Diversity of Number Generators, Information and Computation, Vol. 171, No. 2, pp. 350-363 (2001)
9. M. Matsui: Linear Cryptanalysis Method for DES Cipher, Advances in Cryptology - EUROCRYPT '93, pp. 386-397 (1993)
10. R. A. Rueppel: Analysis and Design of Stream Ciphers, Springer, Berlin (1986)
11. J. Daemen: Chapter 5: Propagation and Correlation, Annex to AES Proposal (1998)
12. A Statistical Test Suite for the Validation of Random Number Generators and Pseudo Random Number Generators for Cryptographic Applications, NIST Special Publication 800-22, National Institute of Standards and Technology (2001), http://csrc.nist.gov/rng
13. G. Marsaglia: A Battery of Tests for Random Number Generators, Florida State University, http://stat.fsu.edu/~geo/diehard.html
14. J. Walker: A Pseudorandom Number Sequence Test Program, http://www.fourmilab.ch/random
15. W. Meier and O. Staffelbach: Fast Correlation Attacks on Stream Ciphers, Advances in Cryptology - EUROCRYPT '88 (LNCS 330), pp. 301-314 (1988)
A ANSI C Source Code
This appendix presents the ANSI C source code for Rabbit.

rabbit.h. Below the rabbit.h header file is listed:

#ifndef _RABBIT_H
#define _RABBIT_H

#include <stddef.h>

// Type declarations of 32-bit and 8-bit unsigned integers
typedef unsigned int uint32;
typedef unsigned char byte;

// Structure to store the instance data (internal state)
typedef struct {
   uint32 x[8];
   uint32 c[8];
   uint32 carry;
} t_instance;

void key_setup(t_instance *p_instance, const byte *p_key);
void cipher(t_instance *p_instance, const byte *p_src,
            byte *p_dest, size_t data_size);

#endif

rabbit.c. In the C file, rabbit.c, the logical rotation function, _rotl, is used; however, for some compilers it may not be defined. In that case, the rotation function can be defined as:

uint32 _rotl(uint32 x, int rot) {return (x<<rot) | (x>>(32-rot));}

Below the rabbit.c file is listed:

#include <stdlib.h>
#include "rabbit.h"

// Square a 32-bit number to obtain the 64-bit result and return
// the upper 32 bits XOR the lower 32 bits
uint32 g_func(uint32 x)
{
   // Construct high and low argument for squaring
   uint32 a = x&0xFFFF;
   uint32 b = x>>16;
   // Calculate high and low result of squaring
   uint32 h = ((((a*a)>>17) + (a*b))>>15) + b*b;
   uint32 l = x*x;
   // Return high XOR low
   return h^l;
}
// Calculate the next internal state
void next_state(t_instance *p_instance)
{
   // Temporary data
   uint32 g[8], c_old[8], i;

   // Save old counter values
   for (i=0; i<8; i++)
      c_old[i] = p_instance->c[i];

   // Calculate new counter values
   p_instance->c[0] += 0x4D34D34D + p_instance->carry;
   p_instance->c[1] += 0xD34D34D3 + (p_instance->c[0] < c_old[0]);
   p_instance->c[2] += 0x34D34D34 + (p_instance->c[1] < c_old[1]);
   p_instance->c[3] += 0x4D34D34D + (p_instance->c[2] < c_old[2]);
   p_instance->c[4] += 0xD34D34D3 + (p_instance->c[3] < c_old[3]);
   p_instance->c[5] += 0x34D34D34 + (p_instance->c[4] < c_old[4]);
   p_instance->c[6] += 0x4D34D34D + (p_instance->c[5] < c_old[5]);
   p_instance->c[7] += 0xD34D34D3 + (p_instance->c[6] < c_old[6]);
   p_instance->carry = (p_instance->c[7] < c_old[7]);

   // Calculate the g-functions
   for (i=0; i<8; i++)
      g[i] = g_func(p_instance->x[i] + p_instance->c[i]);

   // Calculate new state values
   p_instance->x[0] = g[0] + _rotl(g[7],16) + _rotl(g[6],16);
   p_instance->x[1] = g[1] + _rotl(g[0], 8) + g[7];
   p_instance->x[2] = g[2] + _rotl(g[1],16) + _rotl(g[0],16);
   p_instance->x[3] = g[3] + _rotl(g[2], 8) + g[1];
   p_instance->x[4] = g[4] + _rotl(g[3],16) + _rotl(g[2],16);
   p_instance->x[5] = g[5] + _rotl(g[4], 8) + g[3];
   p_instance->x[6] = g[6] + _rotl(g[5],16) + _rotl(g[4],16);
   p_instance->x[7] = g[7] + _rotl(g[6], 8) + g[5];
}
// key_setup
void key_setup(t_instance *p_instance, const byte *p_key)
{
   // Temporary data
   uint32 k0, k1, k2, k3, i;

   // Generate four subkeys
   k0 = *(uint32*)(p_key+ 0);
   k1 = *(uint32*)(p_key+ 4);
   k2 = *(uint32*)(p_key+ 8);
   k3 = *(uint32*)(p_key+12);

   // Generate initial state variables
   p_instance->x[0] = k0;
   p_instance->x[2] = k1;
   p_instance->x[4] = k2;
   p_instance->x[6] = k3;
   p_instance->x[1] = (k3<<16) | (k2>>16);
   p_instance->x[3] = (k0<<16) | (k3>>16);
   p_instance->x[5] = (k1<<16) | (k0>>16);
   p_instance->x[7] = (k2<<16) | (k1>>16);

   // Generate initial counter values
   p_instance->c[0] = _rotl(k2,16);
   p_instance->c[2] = _rotl(k3,16);
   p_instance->c[4] = _rotl(k0,16);
   p_instance->c[6] = _rotl(k1,16);
   p_instance->c[1] = (k0&0xFFFF0000) | (k1&0xFFFF);
   p_instance->c[3] = (k1&0xFFFF0000) | (k2&0xFFFF);
   p_instance->c[5] = (k2&0xFFFF0000) | (k3&0xFFFF);
   p_instance->c[7] = (k3&0xFFFF0000) | (k0&0xFFFF);

   // Reset carry flag
   p_instance->carry = 0;

   // Iterate the system four times
   for (i=0; i<4; i++)
      next_state(p_instance);

   // Modify the counters
   for (i=0; i<8; i++)
      p_instance->c[(i+4)&0x7] ^= p_instance->x[i];
}
// Encrypt or decrypt a block of data
void cipher(t_instance *p_instance, const byte *p_src,
            byte *p_dest, size_t data_size)
{
   uint32 i;

   for (i=0; i<data_size; i+=16)
   {
      // Iterate the system
      next_state(p_instance);

      // Encrypt or decrypt 16 bytes of data
      *(uint32*)(p_dest+ 0) = *(uint32*)(p_src+ 0) ^ p_instance->x[0] ^
                (p_instance->x[5]>>16) ^ (p_instance->x[3]<<16);
      *(uint32*)(p_dest+ 4) = *(uint32*)(p_src+ 4) ^ p_instance->x[2] ^
                (p_instance->x[7]>>16) ^ (p_instance->x[5]<<16);
      *(uint32*)(p_dest+ 8) = *(uint32*)(p_src+ 8) ^ p_instance->x[4] ^
                (p_instance->x[1]>>16) ^ (p_instance->x[7]<<16);
      *(uint32*)(p_dest+12) = *(uint32*)(p_src+12) ^ p_instance->x[6] ^
                (p_instance->x[3]>>16) ^ (p_instance->x[1]<<16);

      // Increment pointers to source and destination data
      p_src += 16;
      p_dest += 16;
   }
}

B Test Vectors
The keys and outputs are presented byte-wise. The leftmost byte of the key is K[7..0].

key   = [00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00]
s[0]  = [02 F7 4A 1C 26 45 6B F5 EC D6 A5 36 F0 54 57 B1]
s[1]  = [A7 8A C6 89 47 6C 69 7B 39 0C 9C C5 15 D8 E8 88]
s[31] = [EF 9A 69 71 8B 82 49 A1 A7 3C 5A 6E 5B 90 45 95]

key   = [C2 1F CF 38 81 CD 5E E8 62 8A CC B0 A9 89 0D F8]
s[0]  = [3D 02 E0 C7 30 55 91 12 B4 73 B7 90 DE E0 18 DF]
s[1]  = [CD 6D 73 0C E5 4E 19 F0 C3 5E C4 79 0E B6 C7 4A]
s[31] = [9F B4 92 E1 B5 40 36 3A E3 83 C0 1F 9F A2 26 1A]

key   = [1D 27 2C 6A 2D 8E 3D FC AC 14 05 6B 78 D6 33 A0]
s[0]  = [A3 A9 7A BB 80 39 38 20 B7 E5 0C 4A BB 53 82 3D]
s[1]  = [C4 42 37 99 C2 EF C9 FF B3 A4 12 5F 1F 4C 99 A8]
s[31] = [97 C0 73 3F F1 F1 8D 25 6A 59 E2 BA AB C1 F4 F1]
C Counter Properties
In this appendix we discuss important properties of the counter system.

Counter Period. The period of the counter system is Nc = 2^256 − 1. This can be shown as follows. The counter system defined in Eqs. (6) and (7) can equivalently be described by the following recurrence relation:

   C_{i+1} = (C_i + A + Φ_i) mod 2^256,   (25)

where Φ_{i+1} is defined as:

   Φ_{i+1} = 1 if C_i + A + Φ_i ≥ 2^256, and 0 otherwise.   (26)

C_i is a 256-bit integer obtained by concatenating all eight individual counters, i.e. C_i = c_{7,i} c_{6,i} c_{5,i} c_{4,i} c_{3,i} c_{2,i} c_{1,i} c_{0,i}, and A is likewise obtained by concatenating the eight a_j constants. The above recurrence relation is equivalent to the following linear congruential generator:

   Z_{i+1} = (Z_i + A) mod (2^256 − 1),   (27)

which has a period length of Nc = 2^256 − 1, since A has been chosen such that gcd(A, 2^256 − 1) = 1. To show that Z is equivalent to C, we consider an initial value C_0 = Z_0 for Z_0 > A. The recurrence relation for C_i can be defined in terms of Z_i:

   C_i = Z_i          if (Z_{i−1} + A) < 2^256 − 1 and Z_{i−1} ≠ 0,
   C_i = 2^256 − 1    if (Z_{i−1} + A) = 2^256 − 1,                    (28)
   C_i = Z_i − 1      if (Z_{i−1} + A) > 2^256 − 1 or Z_{i−1} = 0.

Therefore, C_i will run through the same set of numbers as Z_i, except that C_i will attain the value 2^256 − 1 but not the value A. Thus, the period of the recurrence relation C is the same as for the linear congruential generator Z. In particular, C_i = C_j if (i − j) mod Nc = 0.
Internal State Period. For convenience, we write the next-state function in the following way:

   x_{i+1} = F(y_i) mod 2^32,   (29)

where

   y_i = (c_i + x_i) mod 2^32,   (30)

such that x_i is the internal state variable and c_i is the counter state. According to a generalized version of Lemma 4.1 in [8], y_i will have at least the period of the counter system, Nc:

Proof. Given that y_i = y_j for (i − j) mod Nc = 0, then y_{i+1} = F(y_i) + c_{i+1} and y_{j+1} = F(y_j) + c_{j+1}. Moreover, we have c_{i+1} = c_{j+1}; therefore, y_{i+1} = y_{j+1}. Finally, if y_{i−1} ≠ y_{j−1}, this would imply that y_i ≠ y_j, which is a contradiction. Thus, also y_{i−1} = y_{j−1}.

However, a combination of the internal state, x_i, is extracted as output. It is not evident that x_i will have the same period as the counter system, but a lower bound for that period is obtained in the following. First, we note that there are relations between the counter period Nc, the internal state period Nx, and the period of the y variables, Ny:

   Ny = aNx = bNc,   (31)

where a and b are integers greater than zero with gcd(a, b) = 1.

Proof. Since x_{i+1} = F(y_i), we have Nx ≤ Ny. In particular, Nx divides Ny because, if we assume that this is not the case, then there would exist an i such that F(y_i) = x_{i+1} ≠ x_{i+1+⌊Ny/Nx⌋Nx} = F(y_{i+⌊Ny/Nx⌋Nx}), which contradicts the periodicity. Thus, there exists an integer a > 0 such that Ny = aNx. We also have that Nc divides Ny because, if this were not the case, then c_i ≠ c_{i+Ny}. We just showed that x_i = x_{i+Ny} for all i, but then y_i = x_i + c_i ≠ x_{i+Ny} + c_{i+Ny} = y_{i+Ny}, which again contradicts the Ny periodicity. Therefore, there exists an integer b > 0 such that Ny = bNc, and consequently Ny = aNx = bNc.

We have the relation Nx = (b/a)Nc. Thus, we want to find an upper bound on the ratio a/b. This can be done as follows. Define the degeneracy d to be the maximal number of pre-images x_{i+1} can have, i.e. d is the maximal number of different y_i which give the same x_{i+1}, and similarly define dg to be the analogue for each g-function. Then we can obtain the following rather conservative lower bound for the period: Let (x_0, x_1, x_2, ..., x_{Nx−1}) be a periodic sequence with period Nx; then the upper bound on a/b is the degeneracy d, i.e.:

   Nx ≥ Nc/d,   (32)

where Nc is the counter period.
Proof. We want to show that k ≡ a/b = Nc/Nx ≤ d. The periodicity gives x_i = x_{i+Nx} = x_{i+2Nx} = ... = x_{i+(k−1)Nx}. On the other hand, the corresponding counter values are non-equal: c_i ≠ c_{i+Nx} ≠ c_{i+2Nx} ≠ ... ≠ c_{i+(k−1)Nx}. Therefore, it follows that x_i + c_i ≠ x_{i+Nx} + c_{i+Nx} ≠ ... ≠ x_{i+(k−1)Nx} + c_{i+(k−1)Nx}, or equivalently, y_i ≠ y_{i+Nx} ≠ ... ≠ y_{i+(k−1)Nx}. Because of the periodicity we have F(y_i) = F(y_{i+Nx}) = F(y_{i+2Nx}) = ... = F(y_{i+(k−1)Nx}). Since each x_{i+1} can maximally have d pre-images, we see that k = a/b = Nc/Nx ≤ d.

To illustrate that the period length is sufficiently large, consider the equation system x_{i+1} = F_I(x_i) arising by replacing all the g-functions by identity functions, but keeping the rotations. Fixing any two of the 32-bit input variables, the resulting equation system has a unique output for the remaining six input variables. Therefore, F_I(x) is maximally 2^64-to-one. This bound can be combined with the measured degeneracy of the g-function, dg = 18, to obtain d < 2^64 · 18^8 < 2^98 (since 64 + 8 · log2(18) ≈ 97.4 < 98), which shows that the period length of the state variables is sufficiently large, i.e. Nx ≥ (2^256 − 1)/d > 2^158.

This bound is, of course, highly conservative. For instance, the F_I map will probably have degeneracy close to one. Furthermore, all points in the periodic solution would have to attain the maximal degeneracy d, and they would have to appear in exact synchronization with the counter. So if the output of F is not strongly correlated with the counter sequence, the probability of actually realizing this lower bound is vanishing. Furthermore, for the specific g-function, only one point has the maximal degeneracy of 18, and about half of the points have degeneracy one. It also follows from the above that if a point with degeneracy one belongs to the periodic solution, then the period cannot be shorter than the counter period.

Bit-Flip Probabilities. Below we calculate the bit-flip probabilities for the counter bits. Let the bit-wise carry Φ[j⊞1] from bit position j to bit position j⊞1 be defined as:

   Φ[j⊞1] = 1 if C[j] + A[j] + Φ[j] ≥ 2, and 0 otherwise,   (33)

where x ⊞ y ≡ (x + y) mod 256 and C and A are defined above. The value of C[j] only changes when either Φ[j] = 1 and A[j] = 0, or Φ[j] = 0 and A[j] = 1. The probability of the carry can be found by solving a system of recursive equations for the carry probability, as shown in the following. The probability of the carry into bit position j is given by:

   P(Φ[j] = 1) = (A[j⊟1] + P(Φ[j⊟1] = 1)) / 2,   (34)

where x ⊟ y ≡ (x − y) mod 256.
Inserting the same expression for P(Φ[j⊟1] = 1) into this equation, we obtain:

   P(Φ[j] = 1) = A[j⊟1]/2^1 + (A[j⊟2] + P(Φ[j⊟2] = 1))/2^2.   (35)

Continuing like this, we get:

   P(Φ[j] = 1) = A[j⊟1]/2^1 + A[j⊟2]/2^2 + ... + A[j⊟255]/2^255 + (A[j] + P(Φ[j] = 1))/2^256,   (36)

which can be rearranged into:

   (2^256 − 1) P(Φ[j] = 1) = 2^255 A[j⊟1] + 2^254 A[j⊟2] + ... + 2^1 A[j⊟255] + 2^0 A[j].   (37)

This can equivalently be written as:

   P(Φ[j] = 1) = (A ⋙ j) / (2^256 − 1),   (38)

where ⋙ denotes right rotation of the 256-bit integer A. Inserting this expression into

   P(Φ[j] ≠ A[j]) = 1 − P(Φ[j] = 1) if A[j] = 1, and P(Φ[j] = 1) if A[j] = 0,
                  = |A[j] − P(Φ[j] = 1)|,   (39)

leads to the following equation describing the probability of a bit-flip at position j:

   P(Φ[j] ≠ A[j]) = |A[j] − (A ⋙ j)/(2^256 − 1)|.   (40)

The probabilities will be unique for each bit position, as A is formed by repeating the 6-bit block 110100, which fits unevenly into a 256-bit integer. Consequently, A ⋙ i ≠ A for all i mod 256 ≠ 0, thereby making P(Φ[j] ≠ A[j]) unique for each j.
Helix: Fast Encryption and Authentication in a Single Cryptographic Primitive

Niels Ferguson(1), Doug Whiting(2), Bruce Schneier(3), John Kelsey(4), Stefan Lucks(5), and Tadayoshi Kohno(6)

(1) MacFergus, [email protected]
(2) HiFn, [email protected]
(3) Counterpane Internet Security, [email protected]
(4) [email protected]
(5) Universität Mannheim, [email protected]
(6) UCSD, [email protected]
Abstract. Helix is a high-speed stream cipher with a built-in MAC functionality. On a Pentium II CPU it is about twice as fast as Rijndael or Twofish, and comparable in speed to RC4. The overhead per encrypted/authenticated message is low, making it suitable for small messages. It is efficient in both hardware and software, and with some pre-computation can effectively switch keys on a per-message basis without additional overhead. Keywords: Stream cipher, MAC, authentication, encryption.
1 Introduction
Securing data in transmission is the most common real-life cryptographic problem. Basic security services require both encryption and authentication. This is (almost) always done using a symmetric cipher—public-key systems are only used to set up symmetric keys—and a Message Authentication Code (MAC).

The AES process provided a number of very good block cipher designs, as well as a new block cipher standard. The cryptographic community learned a lot during the selection process about the engineering criteria for a good cipher. AES candidates were compared in performance and cost in many different implementation settings. We learned more about the importance of fast re-keying and tiny-memory implementations, the cost of S-boxes and circuit depth for hardware implementations, the slowness of multiplication on some platforms, and other performance considerations.

The community also learned about the difference between cryptanalysis in theory and cryptanalysis in practice. Many block cipher modes restrict the types of attack that can be performed on the underlying block cipher. Yet the generally accepted attack model for block ciphers is very liberal. Any method that distinguishes the block cipher from a random permutation is considered an attack. Each block cipher operation must protect against all types of attack. The resulting over-engineering leads to inefficiencies.

Computer network properties like synchronization and error correction have eliminated the traditional synchronization problems of stream-cipher modes like OFB. Furthermore, stream ciphers have different implementation properties that restrict the cryptanalyst. They only receive their inputs once (a key and a nonce) and then produce a long stream of pseudo-random data. A stream cipher can start with a strong cryptographic operation to thoroughly mix the key and nonce into a state, and then use that state and a simpler mixing operation to produce the key stream. If the attacker tries to manipulate the inputs to the cipher he encounters the strong cryptographic operation. Alternatively he can analyse the key stream, but this is a static analysis only. As far as we know, static attacks are much less powerful than dynamic attacks. As there are fewer cryptographic requirements to fulfill, we believe that the key stream generation function can be made significantly faster, per message byte, than a block cipher can be. Given the suitability of stream ciphers for many practical tasks and the potential for faster implementations, we believe that stream ciphers are a fruitful area of research.

Additionally, a stream cipher is often implemented—and from a cryptographic point of view, should always be implemented—together with a MAC. Encryption and authentication go hand in hand, and significant vulnerabilities can result if encryption is implemented without authentication. Outside the cryptographic literature, not using a proper MAC is one of the commonly encountered errors in stream cipher systems. A stream cipher with a built-in MAC is much more likely to be used correctly, because it provides a MAC without the associated performance penalties. Helix is an attempt to combine all these lessons.

T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 330-346, 2003. © Springer-Verlag Berlin Heidelberg 2003
2 An Overview of Helix
Helix is a combined stream cipher and MAC function, and directly provides the authenticated encryption functionality. By incorporating the plaintext into the stream cipher state, Helix can provide the authentication functionality without extra costs [Gol00]. Helix's design strength is 128 bits, which means that we expect that no attack on the cipher exists that requires fewer than 2^128 Helix block function evaluations to be carried out. Helix can process data in less than 7 clock cycles per byte on a Pentium II CPU, more than twice as fast as AES.

Helix uses a 256-bit key and a 128-bit nonce. The key is secret, and the nonce is typically public knowledge. Helix is optimised for 32-bit platforms; all operations are on 32-bit words. The only operations used are addition modulo 2^32, exclusive or, and rotation by fixed numbers of bits. The design philosophy of Helix can be summarized as "many simple rounds."
Helix has a state that consists of 5 words of 32 bits each. (This is the maximum state that can fit in the registers of the current Intel CPUs.) A single round of Helix consists of adding (or xoring) one state word into the next, and rotating the first word. This is shown in Figure 1 where the state words are shown as vertical lines.
Fig. 1. A single round of Helix.
Multiple rounds are applied in a cyclical pattern to the state. The horizontal lines of the rounds wind themselves in helical fashion through the five state words. Twenty rounds make up one block (see Figure 2). Helix actually uses two intertwined helices; a single block contains two full turns of each of the helices. During each block several other activities occur. During block i one word of key stream is generated (Si ), two words of key material are added (Xi,0 and Xi,1 ), and one word of plaintext is added (Pi ). The output state of one block is used as input to the next, so the computations shown in figure 2 are all that is required to process 4 bytes of the message. As with any stream cipher, the ciphertext is created by xoring the plaintext with the key stream (not shown in the figure). At the start of an encryption a starting state is derived from the key and nonce. The key words Xi,j depend on the key, the length of the input key, the nonce, and the block number i. State guessing attacks are made more difficult by adding key material at double the rate at which key stream material is extracted. At the end of the message some extra processing is done, after which a 128-bit MAC tag is produced to authenticate the message.
3 Definition of Helix
The Helix encryption function takes as input a variable-length key U of up to 32 bytes, a 16-byte nonce N, and a plaintext P. It produces a ciphertext message and a tag that provides authentication. The decryption function takes the key, nonce, ciphertext, and tag, and produces either the plaintext message or an error if the authentication failed.

3.1 Preliminaries
Helix operates on 32-bit words, while the inputs and outputs are sequences of bytes. In all situations Helix uses the least-significant-byte-first convention. A sequence of bytes x_i is identified with a sequence of words X_j by the relations
    X_j := Σ_{k=0}^{3} x_{4j+k} · 2^{8k}

    x_i := ⌊X_{⌊i/4⌋} / 2^{8(i mod 4)}⌋ mod 2^8
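The two conversions can be sketched in Python (a non-normative sketch; the function names are ours):

```python
def bytes_to_words(xs):
    """Pack bytes into 32-bit words, least significant byte first."""
    assert len(xs) % 4 == 0
    return [sum(xs[4 * j + k] << (8 * k) for k in range(4))
            for j in range(len(xs) // 4)]

def words_to_bytes(ws):
    """Unpack 32-bit words into bytes, least significant byte first."""
    return [(ws[i // 4] >> (8 * (i % 4))) & 0xFF for i in range(4 * len(ws))]
```

For any byte sequence whose length is a multiple of four, the two functions are mutually inverse, mirroring the complementary pair of equations above.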
These two equations are complementary and show the conversion both ways. Let ℓ(x) denote the length of a string of bytes x. The input key U consists of a sequence of bytes u_0, u_1, ..., u_{ℓ(U)−1} with 0 ≤ ℓ(U) ≤ 32. The key is processed through the key mixing function, defined in section 3.7, to produce the working key, which consists of 8 words K_0, ..., K_7. The nonce N consists of 16 bytes, interpreted as 4 words N_0, ..., N_3. The plaintext P and ciphertext C are both sequences of bytes of the same length, with the restriction that 0 ≤ ℓ(P) < 2^64. Both are manipulated as a sequence of words, P_i and C_i respectively. The last word of the plaintext and ciphertext might be only partially used. The 'extra' plaintext bytes in the last word are taken to be zero. The 'extra' ciphertext bytes are irrelevant and never used. Note that the cipher is specified for zero-length plaintexts; in this case, only a MAC is generated.

3.2 A Block
Helix consists of a sequence of blocks. The blocks are numbered sequentially, which assigns each block a unique number i. At the start of block i the state consists of 5 words: Z_0^{(i)}, ..., Z_4^{(i)}; at the end of the block the state consists of Z_0^{(i+1)}, ..., Z_4^{(i+1)}, which form the input to the next block, with number i + 1. Block i also uses as input two key words X_{i,0} and X_{i,1}, and the plaintext word P_i. It produces one word of key stream S_i := Z_0^{(i)}; the ciphertext words are defined by C_i := P_i ⊕ S_i. Instead of repeating the block definition in formulas, we define the block function using Figure 2. All values are 32-bit words; exclusive or is denoted by ⊕, addition modulo 2^32 by ⊞, and rotation by ≪. In the remainder of this paper, the terms "block" and "block function" are used interchangeably.

3.3 Key Words for Each Block
The expanded key words are derived from the working key K_0, ..., K_7, the nonce N_0, ..., N_3, the input key length ℓ(U), and the block number i. We first extend the nonce to 8 words by defining

    N_k := (k mod 4) − N_{k−4}  (mod 2^32)    for k = 4, ..., 7.

The key words for block i are then defined by

    X_{i,0} := K_{i mod 8}
    X_{i,1} := K_{(i+4) mod 8} + N_{i mod 8} + X'_i + i + 8

    X'_i := ⌊(i + 8)/2^31⌋    if i mod 4 = 3
            4 · ℓ(U)          if i mod 4 = 1
            0                 otherwise
Helix: Fast Encryption and Authentication
335
where all additions are taken modulo 2^32. Note that X'_i encodes bits 31 to 62 of the value i + 8; this is not the same as the upper 32 bits of i + 8.
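Transcribed directly into Python, the key-word computation looks as follows (a sketch; the helper names are ours, and the working key and nonce are assumed to be given as lists of 32-bit words):

```python
MASK = 0xFFFFFFFF  # all arithmetic is modulo 2**32

def extend_nonce(N):
    """Extend the 4-word nonce to 8 words: N_k := (k mod 4) - N_{k-4}."""
    return N + [((k % 4) - N[k - 4]) & MASK for k in range(4, 8)]

def x_prime(i, ell_U):
    """X'_i: bits 31..62 of i+8, or 4*ell(U), or 0, depending on i mod 4.

    Python's % yields non-negative results, so the negative block numbers
    used during initialisation work as well."""
    if i % 4 == 3:
        return ((i + 8) >> 31) & MASK
    if i % 4 == 1:
        return (4 * ell_U) & MASK
    return 0

def key_words(i, K, N8, ell_U):
    """Return (X_{i,0}, X_{i,1}) for block number i."""
    x0 = K[i % 8]
    x1 = (K[(i + 4) % 8] + N8[i % 8] + x_prime(i, ell_U) + i + 8) & MASK
    return x0, x1
```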
3.4 Initialisation
A Helix encryption is started by setting

    Z_i^{(−8)} = K_{i+3} ⊕ N_i    for i = 0, ..., 3
    Z_4^{(−8)} = K_7
Eight blocks are then applied, using block numbers −8 to −1. For these blocks the plaintext word P_i is defined to be zero, and the generated key stream words are discarded.
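A sketch of the initialisation, with the block function passed in as a parameter since its definition (Figure 2) is not reproduced here (all names are ours):

```python
MASK = 0xFFFFFFFF  # all arithmetic is modulo 2**32

def initial_state(K, N):
    """Z^(-8): derived from the working key K[0..7] and nonce N[0..3]."""
    return [K[i + 3] ^ N[i] for i in range(4)] + [K[7]]

def initialise(K, N, block, ell_U=32):
    """Run the eight initialisation blocks, numbered -8 .. -1.

    `block(Z, x0, x1, p)` stands for the Helix block function of Figure 2;
    it must return the new 5-word state and the key stream word.  The
    key-word computation follows section 3.3.
    """
    N8 = N + [((k % 4) - N[k - 4]) & MASK for k in range(4, 8)]
    Z = initial_state(K, N)
    for i in range(-8, 0):
        if i % 4 == 3:
            xp = ((i + 8) >> 31) & MASK
        elif i % 4 == 1:
            xp = (4 * ell_U) & MASK
        else:
            xp = 0
        x0 = K[i % 8]
        x1 = (K[(i + 4) % 8] + N8[i % 8] + xp + i + 8) & MASK
        Z, _ = block(Z, x0, x1, 0)  # plaintext word is zero; output discarded
    return Z
```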
3.5 Encryption
After the initialisation the plaintext is encrypted. Let k := ⌊(ℓ(P) + 3)/4⌋ be the number of words in the plaintext. The encryption consists of k blocks, numbered 0 to k − 1. Each block generates one word of key stream, which is used to encrypt one word of the plaintext. Depending on ℓ(P) mod 4, between 1 and 4 of the bytes of the last key stream word are used.

3.6 Computing the MAC
Just after the block that encrypted the last plaintext byte, one of the state words is modified: the state word Z_0^{(k)} is xored with the value 0x912d94f1.^1 Using this modified state, eight blocks, numbered k, ..., k + 7, are applied for post-mixing. For these blocks the generated key stream is discarded and the plaintext word P_i is defined as ℓ(P) mod 4. After the post-mixing, four more blocks, numbered k + 8, ..., k + 11, are applied. The key stream generated by these four blocks forms the tag. The plaintext input remains the same as in the previous eight blocks.

3.7 Key Mixing
The key mixing converts a variable-length input key U to the fixed-length working key K. First, the Helix block function is used to create a round function F that maps 128 bits to 128 bits. The four input words to F are extended with a single word with value ℓ(U) + 64 to form a 5-word state. The block function is then applied with zero key inputs and zero plaintext input. The first four state words of the resulting state form the result of F.
^1 This constant is constructed by taking the 6 least significant bits of each of the ASCII characters of the string "Helix", and putting a single one bit both before and after it.
The input key U is first extended with 32 − ℓ(U) zero bytes. The 32 key bytes are converted to 8 words K_32, ..., K_39. Further key words are defined by the equation

    (K_{4i}, ..., K_{4i+3}) := F((K_{4i+4}, ..., K_{4i+7})) ⊕ (K_{4i+8}, ..., K_{4i+11})    for i = 7, ..., 0.

The words K_0, ..., K_7 form the working key of the cipher. (This recursion defines a Feistel-type cipher on 256-bit blocks.)
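The backwards recursion can be sketched as follows; `F` is the 128-bit round function described above and is passed in as a parameter, since it is built from the block function of Figure 2 (function names are ours):

```python
def mix_key(key_bytes, F):
    """Derive the working key K_0..K_7 from an input key of up to 32 bytes.

    `F` maps a 4-tuple of 32-bit words to a 4-tuple of 32-bit words (the
    Helix-based round function described above, supplied by the caller).
    """
    assert len(key_bytes) <= 32
    padded = list(key_bytes) + [0] * (32 - len(key_bytes))
    # words K_32 .. K_39, least significant byte first
    K = {32 + j: sum(padded[4 * j + k] << (8 * k) for k in range(4))
         for j in range(8)}
    for i in range(7, -1, -1):
        f = F(tuple(K[4 * i + 4 + t] for t in range(4)))
        for t in range(4):
            # (K_{4i},..,K_{4i+3}) := F(K_{4i+4},..) xor (K_{4i+8},..)
            K[4 * i + t] = f[t] ^ K[4 * i + 8 + t]
    return [K[t] for t in range(8)]
```

Note that each step only reads key words with higher indices than the ones it writes, which is why the mixing can be done in place, as observed in section 4.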
3.8 Decryption
Decryption is almost identical to encryption. The only differences are: – The key stream generated at the start of each block is used to decrypt the ciphertext and produce the plaintext word that is required half a block later. Care has to be taken with the last plaintext word to ensure that unused plaintext bytes are taken to be zero and not filled with the extra key stream bytes. – Once the tag has been generated it is compared to the tag provided. If the two values are not identical, all generated data (i.e. the key stream, plaintext, and tag) is destroyed.
4 Implementation
Compared to other ciphers, Helix is relatively easy to implement in software. If 32-bit addition, exclusive or, and rotation functions are available, all the functions are easily implemented. Helix is also fast. A single round takes only a single clock cycle to compute on a Pentium II CPU, because the super-scalar architecture can perform an addition or xor simultaneously with a 32-bit rotation. A block of Helix takes 20 cycles plus some overhead for the handling of the plaintext, key stream, and ciphertext. Our un-optimised assembly implementation requires less than 7 clock cycles per byte. This compares to about 16 clock cycles per byte for the best AES implementation on the same platform.^2 Most implementation flexibility is in the way the key schedule is computed. The key mixing only needs to be done once for each key value. The recurrence relation used in the key mixing implements a Feistel cipher, so the key mixing can be done in-place. The X_{i,1} key words can mostly be pre-computed, with only the block number being added every block. Implementations that limit the plaintext size to 2^32 bytes can ignore the upper bits of the block number in the definition of X'_i, because these bits will always be zero.
^2 This is a somewhat unfair comparison. The AES implementation does not actually read the data from memory, encrypt it, and write it back, which would slow it down further. What is more, most block cipher modes only provide encryption or authentication, so two passes over the message are required. The alternative is to use one of the new authenticated encryption modes, such as [Jut01], but they are all patented and require a license.
Helix is also fast in hardware. The rotations cost no time, although they do consume routing resources in chip layouts. The critical path through the block function consists of 6 additions and 5 xors. As the critical path contains no rotations, a certain amount of ripple of the adders can be overlapped, with the lower bits being produced and used before the upper bits are available. A more detailed analysis of this overlapping is required for any high-speed implementation. A conservative estimate for a relatively low-cost ASIC layout is 2.5 ns per 32-bit adder and 0.5 ns per xor, which adds up to 17.5 ns/block. This translates to more than 200 MByte per second, or just under 2 Gbit per second.
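The timing estimate can be checked with a few lines of arithmetic:

```python
# Critical path: 6 adders at 2.5 ns plus 5 xors at 0.5 ns per block,
# where one block processes 4 bytes of message.
ns_per_block = 6 * 2.5 + 5 * 0.5           # 17.5 ns per block
bytes_per_sec = 4 / (ns_per_block * 1e-9)  # more than 200 MByte/s
gbit_per_sec = bytes_per_sec * 8 / 1e9     # just under 2 Gbit/s
```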
5 Use
One of the dangers of a stream cipher is that the key stream will be re-used. To avoid this problem Helix imposes a few restrictions on the sender and receiver:

– The sender must ensure that each (K, N) pair is used at most once to encrypt a message. A single sender must use a new, unique nonce for each message. Multiple senders that want to use the same key have to ensure that they never choose the same nonce, for example by dividing the nonce space between them. If two different messages are ever encrypted with the same (K, N) pair, Helix loses its security properties.
– The receiver may not release the plaintext P, or the key stream, until it has verified the tag successfully. In most situations this requires the receiver to buffer the entire plaintext before it is released.

These requirements seem restrictive, but they are in fact implicitly required by all stream ciphers (e.g. RC4) and many block cipher modes (e.g. OCB [RBBK01b,RBBK01a] and CCM [WHF]). Although Helix allows the use of short keys, we strongly recommend the use of keys of at least 128 bits, preferably 256 bits.
6 Other Modes of Use
So far we have described Helix as providing both encryption and authentication. Helix can be used in other modes as well. For any particular key, Helix should be used in only one of these modes. Using several modes with a single key can lead to a loss of security.

6.1 Unencrypted Headers
In packet environments it is often desirable to authenticate the packet header without encrypting it. From the encryption/authentication layer this looks like an additional string of data that is to be authenticated but not encrypted. We define a standard method of handling such additional data without modifying the basic Helix computations. First, a length field is formed, which is eight bytes long and encodes the length of the additional data in least-significant-byte-first format. The additional data
is padded with 0–3 zero bytes until the length is a multiple of four. The concatenation of the length field, the padded additional data, and the message data is then processed as a normal message through Helix. The ciphertext bytes corresponding to the length field and the padded additional data are discarded, leaving only the ciphertext of the message data and the tag.
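A sketch of this framing, under our own naming (the Helix computation itself is unchanged; only the plaintext fed to it differs):

```python
def frame_with_header(header, message):
    """Build the plaintext that is fed to Helix when `header` must be
    authenticated but not encrypted.  (Sketch; the function name is ours.)
    The ciphertext bytes covering the length field and the padded header
    are discarded by the caller."""
    length_field = len(header).to_bytes(8, 'little')  # 8-byte LSB-first length
    pad = (-len(header)) % 4                          # 0-3 zero bytes
    return length_field + bytes(header) + bytes(pad) + bytes(message)
```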
6.2 Pure Stream Cipher & PRNG
Helix can be used as a pure stream cipher by ignoring the MAC computations at the end. And like any stream cipher, Helix is a cryptographically strong pseudorandom number generator: for every (key, nonce) input it produces a stream of pseudo-random data. This makes Helix suitable for use as a PRNG.

6.3 MAC with Nonce
Helix can also be used as a pure MAC function. The data to be authenticated is encrypted, but the ciphertext is discarded. The receiver similarly discards the key stream and just feeds the plaintext to the Helix rounds. In this mode Helix is significantly faster than, for example, HMAC-SHA1, but it does require a unique nonce for each message. Unfortunately, it is insecure to use Helix with a fixed nonce value, due to collisions on the 160-bit state.
7 Design Rationale
Although the design strength of Helix is 128 bits, we use 256-bit keys. This avoids a very general class of attacks that exploits collisions on the key value. For flexibility Helix also allows shorter keys to be used, as there are many practical situations in which fewer than 256 bits of key material are available. The small set of elementary operations that Helix uses makes it efficient on a large number of platforms. The absence of tables makes Helix efficient in hardware as well. Most ciphers use lookup tables to provide the necessary nonlinearity. In Helix the nonlinearity comes from the mixing of xors with additions. Neither of these operations can be approximated well within the group of the other. There are some good approximations, but on average the approximations are quite bad [LM01]. The diffusion in Helix is not terribly fast, but it is unstoppable. As the attacker has very little control over the state, it is not possible to limit the diffusion of differences. In those areas where dynamic attacks are possible we use a sequence of 8 blocks to ensure thorough mixing of the state words. The key mixing is an un-keyed bijective function. The purpose is to spread the available entropy over all key words. If, for example, the key is provided by a SHA-1 computation then only 5 words of key material are provided. The key mixing ensures that all 8 key words depend on the key material. Using a bijective mixing function ensures that no two 256-bit input keys lead to the same working
key values. The use of the input key length in X'_i guarantees that even keys that lead to the same working key (each short key leads to a working key that is also produced by a 256-bit key) do not lead to equivalent Helix encryptions.

7.1 Key Schedule
The X_{i,0} values simply cycle through the key words. The X_{i,1} values depend on the same key words in anti-phase, the extended nonce words, the block number, and the input key length. This key schedule has a number of properties. All 8 key words and all 4 nonce words affect the state every 4 blocks. The key schedule also ensures that different (K, N) pairs produce different block key sequences. Even stronger: no sequence of 17 key words ever occurs twice across all keys, all nonce values, and all positions in the encryption computation. To demonstrate this we look at the sequence Y_j := X_{⌊j/2⌋, j mod 2}. This is the sequence of key words in the order they are used. Given just part of the sequence Y_j, without the proper index values j, we can recover the key, nonce, and block number. (When the plaintext word is zero the first half of the block function is identical to the second half of the block function, so it makes sense to look at the sequence Y_j and allow half-block offsets.) If Y_j = Y_{j+16} then j is even; otherwise j is odd. This allows us to split the Y values back into an X_{i,0} and X_{i,1} sequence. Now consider

    R_i := X_{i,1} − X_{i,0} + X_{i+4,1} − X_{i+4,0}
         = N_{i mod 8} + N_{(i+4) mod 8} + X'_i + X'_{i+4} + 2i + 20
         = (i mod 4) + 2i + 20 + X'_i + X'_{i+4}
all modulo 2^32. We first look at R_i mod 4. The X' terms can only have a nonzero contribution if i mod 4 = 3, so 3 out of 4 consecutive times we get just ((i mod 4) + 2i) mod 4 = 3i mod 4, which gives us i mod 4. Looking at the full R_i for an i with i mod 4 = 0 gives us i mod 2^31. The sum X'_i + X'_{i+4} from the case i mod 4 = 3 gives us the upper bits of i. This recovers the block number, i.^3 Given i mod 8 we can recover the working key from the X_{i,0}'s. Knowledge of i and the key words allows us to compute the key length and the nonce from the X_{i,1}'s, as well as check the redundancy introduced by the nonce expansion to 8 words. We have not investigated whether it is possible to recover the key, nonce, and block number from fewer than 17 consecutive key words. A simple counting argument shows that at least 14 are required. This remains an open problem.
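The congruences used in this argument can be checked numerically. The sketch below picks an arbitrary working key, nonce, and key length, and verifies that R_i ≡ 3i (mod 4) whenever i mod 4 ≠ 3 (all names are ours):

```python
MASK = 0xFFFFFFFF  # all arithmetic is modulo 2**32

def x_prime(i, ell_U):
    if i % 4 == 3:
        return ((i + 8) >> 31) & MASK
    if i % 4 == 1:
        return (4 * ell_U) & MASK
    return 0

def key_word_0(i, K):
    return K[i % 8]

def key_word_1(i, K, N8, ell_U):
    return (K[(i + 4) % 8] + N8[i % 8] + x_prime(i, ell_U) + i + 8) & MASK

def R(i, K, N8, ell_U):
    """R_i := X_{i,1} - X_{i,0} + X_{i+4,1} - X_{i+4,0}  (mod 2^32)."""
    return (key_word_1(i, K, N8, ell_U) - key_word_0(i, K)
            + key_word_1(i + 4, K, N8, ell_U) - key_word_0(i + 4, K)) & MASK

# arbitrary working key, nonce, and key length for the check
K = [0x0BADF00D * (j + 1) & MASK for j in range(8)]
N = [0xDEADBEEF, 0x01234567, 0x89ABCDEF, 0x42424242]
N8 = N + [((k % 4) - N[k - 4]) & MASK for k in range(4, 8)]
ell_U = 17
```

The key words cancel in R_i, and the extended-nonce definition makes N_{i mod 8} + N_{(i+4) mod 8} collapse to i mod 4, which is exactly what the check exercises.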
^3 This isn't absolutely perfect. We don't recover the 62nd bit of i + 8, but this bit will only be set during the very last few blocks of a message very close to 2^64 bytes long. This does not lead to a weakness.

7.2 Choice of Rotation Counts

The strength of Helix depends on the rotation counts chosen for the Helix block function. The rotations provide the diffusion between the various bit positions
in the state words. During the design process we examined the impact of various choices of rotation counts, both in terms of attempts to cryptanalyze the cipher and in terms of their impact on statistical tests of the block function.

To analyse the diffusion properties of a set of rotation counts, consider a variant of the block function with all the additions changed to xors. (This is equivalent to ignoring the carries in the additions.) In this variant we can track which output bits are affected by which input bits. For this analysis we consider an output bit affected if its computational path has a dependency on the input bit at any point, even if the output bit in our linearised block function is not changed because several dependencies cancel out. This seems to be the most suitable way to analyse diffusion and is related to the independence assumption in differential and linear cryptanalysis.

A set of rotation counts can, at best, ensure that changing a single state input bit affects at least 21 bits of the output. There are a large number (over 6 000) of such rotation count sets. We discarded all rotation count sets that contained a rotation count of 0, 1, 8, 16, 24, or 31. Rotation by a multiple of 8 has a relatively low order, and rotation by 1 or 31 bit positions provides diffusion between adjacent bits, something the carry bits already do. This reduced the set of candidate rotation counts to 86.

Using the full block function we ran statistical tests on many candidate rotation count sets to see how these values would affect the ability of the block function to diffuse changes and mix together separate information within the 160-bit internal state. Among our tests, we considered:

1. The number of rounds required before all output bits passed binomial tests, given a fixed input difference in the state.
2. The number of rounds required before the output states' Hamming weight distribution passed a χ² test, given low- and high-Hamming-weight input states.
3. The number of rounds required before the output state differences' Hamming weight distribution passed a χ² test, given low- and high-Hamming-weight differences in the input state [KRRR98].
4. The number of rounds required before the resulting output differences' Hamming weights passed a χ² test, given low- and high-Hamming-weight higher-order differences.

The surprising result was that most rotation counts did pretty well. Our carefully selected rotation count sets were slightly better than random ones, but only by a small margin. Degenerate rotation counts (all rotation counts equal, or most rotation counts zero) led to much worse test results. At the end of our analysis, we selected more or less at random from the remaining candidates. Based on our limited analysis, the specific choice of rotation counts does not have a strong impact on the security of Helix, with the only caveat that we had to avoid some obvious degenerate cases.
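The linearised dependency tracking described above can be sketched as follows. The round structure and the rotation count used here are illustrative placeholders, not the actual Helix parameters:

```python
# Linearised diffusion tracking: additions are modelled as xors (carries
# ignored), and every bit of the state carries the set of input bits that
# influence it.

def rotl_masks(masks, r):
    """Rotate a 32-entry list of dependency sets left by r bit positions."""
    r %= 32
    return masks[-r:] + masks[:-r] if r else masks[:]

def xor_masks(a, b):
    """Each bit of a xor depends on both operands' corresponding bits."""
    return [x | y for x, y in zip(a, b)]

def run_rounds(n_rounds, rot=11):
    """5 words x 32 bits; each bit initially depends only on itself."""
    state = [[{(w, b)} for b in range(32)] for w in range(5)]
    for r in range(n_rounds):
        w, nxt = r % 5, (r + 1) % 5
        state[nxt] = xor_masks(state[nxt], state[w])  # add one word into the next
        state[w] = rotl_masks(state[w], rot)          # then rotate the first word
    return state

def affected_by(state, word, bit):
    """Count output bits whose value depends on input bit (word, bit)."""
    return sum((word, bit) in deps for wrd in state for deps in wrd)
```

This kind of bookkeeping makes it cheap to screen large numbers of candidate rotation count sets for the "at least 21 affected output bits" criterion before running the heavier statistical tests.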
8 Conclusions and Intellectual Property Statement
Most applications that require symmetric cryptography actually require both encryption and authentication. We believe that the most efficient way to achieve this combined goal is to design cryptographic primitives specifically for the task. Towards this end, we present such a new cryptographic primitive, called Helix. We hope that Helix and this paper will spur additional research into authenticated encryption stream ciphers. As with any experimental design, we remark that Helix should not be used until it has received additional cryptanalysis. Finally, we hereby explicitly release any intellectual property rights to Helix into the public domain. Furthermore, we are not aware of any patent or patent application anywhere in the world that covers Helix.
Acknowledgements

We would like to thank David Wagner, Rich Schroeppel, and the anonymous referees for their helpful comments and encouragement. Felix Schleer helped us by creating one of the reference implementations.
References

[Arm02] Frederik Armknecht. A linearization attack on the Bluetooth key stream generator. Cryptology ePrint Archive, Report 2002/191, 2002. http://eprint.iacr.org/2002/191.
[Cou02] Nicolas Courtois. Higher order correlation attacks, XL algorithm, and cryptanalysis of Toyocrypt. In Information Security and Cryptology—ICISC 2002, volume 2587 of Lecture Notes in Computer Science. Springer-Verlag, 2002. To appear.
[CP02] Nicolas Courtois and Josef Pieprzyk. Cryptanalysis of block ciphers with overdefined systems of equations. In Yuliang Zheng, editor, Advances in Cryptology—ASIACRYPT 2002, volume 2501 of Lecture Notes in Computer Science, pages 267–287. Springer-Verlag, 2002.
[DGV93] Joan Daemen, René Govaerts, and Joos Vandewalle. Resynchronisation weaknesses in synchronous stream ciphers. In Tor Helleseth, editor, Advances in Cryptology—EUROCRYPT '93, volume 765 of Lecture Notes in Computer Science, pages 159–167. Springer-Verlag, 1993.
[Gol00] Jovan Dj. Golić. Modes of operation of stream ciphers. In Douglas R. Stinson and Stafford Tavares, editors, Selected Areas in Cryptography, 7th Annual International Workshop, SAC 2000, volume 2012 of Lecture Notes in Computer Science, pages 233–247. Springer-Verlag, 2000.
[Jut01] Charanjit S. Jutla. Encryption modes with almost free message integrity. In Birgit Pfitzmann, editor, Advances in Cryptology—EUROCRYPT 2001, volume 2045 of Lecture Notes in Computer Science, pages 529–544, 2001.
[KRRR98] Lars R. Knudsen, Vincent Rijmen, Ronald L. Rivest, and M.J.B. Robshaw. On the design and security of RC2. In Serge Vaudenay, editor, Fast Software Encryption, 5th International Workshop, FSE '98, volume 1372 of Lecture Notes in Computer Science, pages 206–221. Springer-Verlag, 1998.
[LM01] Helger Lipmaa and Shiho Moriai. Efficient algorithms for computing differential properties of addition. In Mitsuru Matsui, editor, Fast Software Encryption 2001, Lecture Notes in Computer Science. Springer-Verlag, 2001. Available from http://www.tcs.hut.fi/~helger/papers/lm01/.
[RBBK01a] Phillip Rogaway, Mihir Bellare, John Black, and Ted Krovetz. OCB: A block-cipher mode of operation for efficient authenticated encryption, September 2001. Available from http://www.cs.ucdavis.edu/~rogaway.
[RBBK01b] Phillip Rogaway, Mihir Bellare, John Black, and Ted Krovetz. OCB: A block-cipher mode of operation for efficient authenticated encryption. In Eighth ACM Conference on Computer and Communications Security (CCS-8), pages 196–205. ACM Press, 2001.
[WHF] Doug Whiting, Russ Housley, and Niels Ferguson. Counter with CBC-MAC (CCM). Available from csrc.nist.gov/encryption/modes/proposedmodes/ccm/ccm.pdf.
A Test Vectors
The authors will maintain a web site at http://www.macfergus.com/helix with news, example code, and test vectors. We give some simple test vectors here. (The 8-word working key is given as a sequence of 32 bytes, least significant byte first.) Each test vector lists the initial key, the nonce, the derived working key, the plaintext, the ciphertext, and the MAC as byte sequences.

[The test-vector byte tables are not reproduced here; see the web site above.]
B Cryptanalysis
Helix is intended to provide everything needed for an encrypted and authenticated communications session. A successful attack on Helix will have occurred when an attacker can either predict a key stream bit he hasn't seen with a probability slightly higher than 50%, or when he can create a forged or altered message that is accepted by the recipient with a probability substantially higher than 2^−128. To be meaningful given the 128-bit security bound of Helix, any such attack must require fewer than 2^128 block function evaluations for all participants combined. Also, any such attack must obey the security requirements placed on Helix's operations, e.g., no reuse of nonces, MACs checked before decrypted messages are released, etc. In this section, we consider a number of possible ways to attack Helix. Although our time and resources have been limited, we have not yet discovered any workable method of attacking Helix.

B.1 Static Analysis
A static analysis just takes the key stream and tries to reconstruct the state and key. Several properties make this type of attack difficult. Even if the whole state is known, any four consecutive key stream words are fully random. This is because each X_{i,1} key value affects S_{i+1} in a bijective manner, so for any given state and any sequence of X_{i,0} words there is a bijective mapping from K_{(i+4) mod 8}, ..., K_{(i+7) mod 8} to S_{i+1}, ..., S_{i+4}. A similar argument applies when the block function is computed backwards. Any attempt to recover the key, even if the state is known at a single point, must therefore span at least 4 blocks and 5 key stream words. Of course, there is no reasonable way of finding the state. At the beginning of each block there are 128 bits of unknown state. (The 32 bits of the key stream word are known to the attacker.) As the design strength is 128 bits, an attacker cannot afford to guess the entire state. A partially guessed state does not help much, as key material is added at twice the rate that key stream is produced.

B.2 Period Length
Helix's internal state is updated continuously by the plaintext it is encrypting. So long as the plaintext is not repeating, the key stream should have an arbitrarily long period. With a fixed or repeating plaintext, the Helix state does not cycle either. In section 7.1 we showed that any 17 consecutive key words used as inputs to the block function are unique. The nonrepeating key word values prevent the state from ever falling into a cycle.

B.3 State Collisions
The 160-bit state of Helix can be expected to collide for some (key,nonce) pairs. However, this doesn’t lead to a weakness, because the state collision is guaranteed
not to survive long enough to yield an attack, or even allow reliable detection by the attacker. To detect a collision on 160-bit values requires 160 bits of information about each state. But in the four block computations required to generate 160 bits of key stream, the whole key, nonce, and block number get added to the state. Starting at the same state, these inputs will introduce a difference in the key stream, and make it impossible to detect the state collision.^4

B.4 Weak Keys
Helix makes constant use of the words of the working key. An all-zero working key intuitively seems like a bad thing (it effectively omits a few operations from the block function), but we have not discovered any possible attack based on it. The all-zero working key is only generated by a single key of 32 bytes length. Shorter keys cannot generate the all-zero working key. The all-zero working key does not seem to have any practical security relevance, and there is no reason to treat this key differently from any other key.

B.5 Adaptive Chosen Plaintext Attacks
Because the plaintext affects the state, Helix allows an attack model that traditional stream ciphers prevent: an attacker can request the encryption of a plaintext block under an already established (key, nonce) pair, and can use the resulting ciphertext to determine what plaintext to request next. We have found no way to use such an attack against Helix. As with the discussion of static analysis above, the large unknown and untouchable state, and the continual mixing of key material into that state, appear to defeat attempts to use control over one input of the block function to control other parts of its state. Additionally, the usage restrictions on Helix do not allow reuse of nonces, which ensures that the state is always a "moving target."

B.6 Chosen Input Differential Attacks
One powerful mode of attack is for the attacker to make small changes in the input values and look at how the changes propagate through the cipher. In Helix, this can be done only with the key or the nonce. In each case, the block function is applied multiple times to the input. At all the places in Helix where such attacks are possible, there are eight consecutive blocks without any output. A change to the nonce, such as is considered in [DGV93], will be thoroughly mixed into the state by the time the first key stream word is generated. Similarly, a change to the last plaintext byte is thoroughly mixed into the state before the first MAC tag word is generated. A differential attack would have to use a differential through 8 blocks, or 160 rounds of Helix. A search found no useful differentials for 8 blocks of Helix, nor useful higher-order differentials.
^4 State collisions where the key and nonce are the same and the block number differs only in the upper 30 bits also do not lead to an attack.
Fig. 3. A round of Single-Helix.
B.7 Algebraic Attacks Over GF(2)
The only reasonable line of attack we have found so far is to apply equation-solving techniques. In 2002, XSL was used to analyse block ciphers [CP02]; an attack on Serpent seems to be marginally better than brute force, while another attack on the AES is slower than brute force. Similar techniques have been used to successfully analyse stream ciphers [Cou02,Arm02]. We have tried to analyse Helix by algebraic techniques. Under an optimistic assumption (from the attacker's point of view) on the number of linearly independent equations, the best attack we could think of requires solving an
(overdefined) system of ≈ 2^49.7 linear equations in N = 2^49.1 binary variables. Gaussian elimination needs N^3 ≈ 2^147.3 steps, and falls well outside our security bound. [CP02] suggests using another algorithm, which takes O(N^2.376) steps, but with an apparently huge proportional constant. In our case N^2.376 ≈ 2^116.7, so even a relatively small proportional constant pushes this beyond our security bound.^5 Our analysis has not resulted in an attack that requires less work than 2^128 block function evaluations, and we conjecture that no such attack exists.
C Single Helix
Most ciphers are analysed by first creating simplified versions and attacking those. Apart from the obvious methods of simplifying Helix, we present Single Helix as an object for study. Single Helix uses only one helix instead of two interleaved ones, and has significantly slower diffusion in the backwards direction. A block of Single Helix is shown in Figure 3. This uses an alternative configuration where the key and plaintext inputs are added directly to the state words.
^5 Due to space constraints, we left out a more detailed description of the attack.
PARSHA-256 – A New Parallelizable Hash Function and a Multithreaded Implementation

Pinakpani Pal and Palash Sarkar

Cryptology Research Group, ECSU and ASU, Indian Statistical Institute, 203 B.T. Road, Kolkata, India 700108
{pinak,palash}@isical.ac.in
Abstract. In this paper, we design a new hash function PARSHA-256. PARSHA-256 uses the compression function of SHA-256 along with the Sarkar-Schellenberg composition principle. As a consequence, PARSHA-256 is collision resistant if the compression function of SHA-256 is collision resistant. On the other hand, PARSHA-256 can be implemented using a binary tree of processors, resulting in a significant speed-up over SHA-256. We also show that PARSHA-256 can be efficiently implemented through concurrent programming on a single processor machine using a multithreaded approach. Experimental results on a P4 running Linux show that for long messages the multithreaded implementation is faster than SHA-256.

Keywords: hash function, SHA-256, parallel algorithm, binary tree.
1 Introduction
A collision resistant hash function is a basic primitive of modern cryptography. One important application of such functions is in “hash-then-sign” digital signature protocols. In such a protocol a long message is hashed to produce a short message digest, which is then signed. Since hash functions are typically invoked on long messages, it is very important for the digest computation algorithm to be very fast. Design of hash functions has two goals – collision resistance and speed. For the first goal, it is virtually impossible to describe a hash function and prove it to be collision resistant. Thus we have to assume some function to be collision resistant. It seems more natural to make this assumption when the input is a short string rather than a long string. On the other hand, the input to a practical hash function can be arbitrarily long. Thus one has to look for a method of extending the domain of a hash function in a secure manner, i.e., the hash function on the larger domain is collision resistant if the hash function on the smaller domain is collision resistant. In the literature, the fixed domain hash function which
This work has been supported partially by the ReX program, a joint activity of USENIX Association and Stichting NLnet.
T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 347–361, 2003. c International Association for Cryptologic Research 2003
is assumed to be collision resistant is called the compression function, and the method to extend the domain is called the composition principle. A widely used composition principle is the Merkle-Damgård (MD) principle introduced in [3, 7]. To the best of our knowledge, most known practical algorithms like MD5, the SHA family, RIPEMD-160 [4], etc. are built using the MD composition principle. These functions vary in the design of the compression function. In fact, most of the work on practical hash function design has concentrated on the design of the compression function. This is due to the fact that the known attacks on practical hash functions are actually based on attacks on the compression function. See [9] for a survey and history of hash functions. As a result of the intense research on the design of compression functions, today there are a number of compression functions which are widely believed to be collision resistant. Some examples are the compression functions of RIPEMD-160, SHA-256, etc. As mentioned before, the other aspect of practical hash functions is the speed of the algorithm to compute the digest. One way to improve the speed is to use parallelism. Parallelism in the design of hash functions has been studied earlier. The compression function of RIPEMD-160 has a built-in parallel path [4]. In [2] the parallelism present in the compression function of the SHA family is studied. In a recent work, [8] studies the efficiency of parallel implementation of some dedicated hash functions. In [6], Knudsen and Preneel describe a parallelizable construction of secure hash functions based on error-correcting codes. Another relevant paper is a hash function based on the FFT principle [11]. Also, [1] describes an incremental hash function, which is parallelizable. However, most of the work seems to be concentrated on exploiting parallelism in the compression function. The other way to achieve parallelism is to incorporate it in the composition principle.
One such work based on binary trees is by Damgård [3]. However, the algorithm in [3] is not practical since the size of the binary tree grows with the length of the message. One recent paper which describes a practical parallel composition principle is the work by Sarkar and Schellenberg [10]. In this paper we design a collision resistant hash function based on the Sarkar-Schellenberg (SS) composition principle. To actually design a hash function, it is not enough to have a secure composition principle; we must have a “good” compression function. As mentioned before, research in the design of compression functions has given us a number of such “good” functions. Thus one way to design a new hash function is to take the SS composition principle and an already known “good” compression function and combine them to obtain the new hash function. This new function will inherit the collision resistance of the compression function and the parallelism of the composition principle. Note that the parallelism in the composition principle is in addition to any parallelism which may be present in the compression function. Thus the studies carried out in [8, 2] on the parallel implementation of the standard hash functions are also relevant to the current work. In this paper we use this idea to design a new hash function – PARSHA-256 – which uses the SS composition principle along with the compression function of SHA-256.
PARSHA-256 can be implemented in both a sequential and a parallel manner. A fully parallel implementation of PARSHA-256 will provide a significant speed-up over SHA-256. However, for widespread software use, a full parallel implementation might not always be possible. We still want our hash function to be usable – without significantly sacrificing efficiency. One approach is to simulate the binary tree of processors with a binary tree of smaller height. Details of this simulation algorithm can be found in [10]. Another approach is to use concurrent programming with threads to simulate the parallelism. We provide a multithreaded implementation of PARSHA-256. The SS composition principle is based on a binary tree of processors. In each round some or all of the processors work in parallel and invoke the compression function. The entire algorithm goes through several such parallel rounds. Our strategy is to simulate the processors using threads. The simulation is round by round, i.e., for each parallel round a number of threads (corresponding to the number of processors for that round) are started. All the threads execute the compression function in a concurrent manner. Also, the inputs to the threads are different. The simulation of a round ends when all the threads have completed their tasks. This is repeated for all the parallel rounds. Experimental results on a P4 running Linux show that for long messages the above strategy of concurrent execution leads to a speed-up over SHA-256. This speed-up varies with the length of the message and the size of the binary tree. Thus we obtain a new hash function which is collision resistant if the compression function of SHA-256 is collision resistant; is significantly faster than SHA-256 if implemented in a fully parallel manner; and on certain single processor platforms, for long messages, is still faster than SHA-256 if implemented as a concurrent program using threads.
2 Compression Function and Processor Tree
We describe our choice of the compression function and the processor tree used for the composition principle.

2.1 Choice of Compression Function

Let h() be the compression function of SHA-256. The input to h() consists of the following two quantities: (1) A: sixteen 32-bit words and (2) B: eight 32-bit words. In the intermediate stages, A is obtained from the message and B is the intermediate hash value. The output of h() consists of eight 32-bit words. Thus the input to h() is 768 bits and the output of h() is 256 bits. In the rest of the paper we will use n = 768 and m = 256. In our algorithm, the inputs to h() will be formed differently. However, we do not change the definition of h() and hence the assumption that h() is collision resistant remains unchanged.

2.2 Processor Tree
We will use a binary tree of processors. For t > 0, we define the processor tree T_t of height t in the following manner: there are 2^t processors, numbered P_0, ..., P_{2^t − 1}. For 0 ≤ i ≤ 2^{t−1} − 1, the children of processor P_i are
P_{2i} and P_{2i+1}. The arcs point towards parents, i.e., the arc set of T_t is A_t = {(P_{2i}, P_i), (P_{2i+1}, P_i) : 0 ≤ i ≤ 2^{t−1} − 1}. Thus the arcs coming into P_0 are from P_1 and P_0 itself. We define I = {0, ..., 2^{t−1} − 1}, L = {2^{t−1}, ..., 2^t − 1} and P = {0, ..., 2^t − 1}. Figure 1 shows T_3.
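The index sets and arc set above can be sketched directly (helper name `tree_sets` is ours, purely for illustration):

```python
# Sketch of the processor tree T_t: 2^t processors, the children of
# P_i being P_{2i} and P_{2i+1}, with arcs pointing towards parents.
def tree_sets(t):
    I = set(range(2 ** (t - 1)))              # internal processors
    L = set(range(2 ** (t - 1), 2 ** t))      # leaf processors
    P = I | L                                 # all processors
    arcs = [(2 * i, i) for i in sorted(I)] + [(2 * i + 1, i) for i in sorted(I)]
    return I, L, P, arcs

I, L, P, arcs = tree_sets(3)
print(sorted(L))                       # [4, 5, 6, 7]
print((0, 0) in arcs, (1, 0) in arcs)  # True True
```

For t = 3 this reproduces the sets used below, and shows the arcs into P_0 coming from P_0 itself and P_1.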
Fig. 1. Processor Tree with t = 3.
The inputs to the processors are binary strings, and the behaviour of any processor P_i is described as follows:

P_i(y) = h(y) if |y| = n;  P_i(y) = y otherwise.   (1)

Thus P_i invokes the hash function h() on the string y if the length of y is n; otherwise it simply returns the string y. We note that in the digest computation algorithm the length of y will always be n, m or 0.
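Equation (1) can be sketched as follows. Note this is illustrative only: Python's hashlib does not expose the raw SHA-256 compression function, so the full SHA-256 hash stands in for h() here.

```python
import hashlib

n, m = 768, 256  # input and output sizes of h(), in bits

def h(y: bytes) -> bytes:
    # stand-in for the SHA-256 compression function: we use full
    # SHA-256 on the n-bit input purely for illustration
    assert len(y) * 8 == n
    return hashlib.sha256(y).digest()   # m = 256 bits

def processor(y: bytes) -> bytes:
    # equation (1): hash full n-bit inputs, pass anything else through
    return h(y) if len(y) * 8 == n else y

print(len(processor(b"\x00" * (n // 8))) * 8)  # 256
print(processor(b"") == b"")                   # True
```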
3 A Special Case
In this section, we describe a special case with a suitable message length and a processor tree with t = 3, without the use of an initialization vector. The purpose of this description is to highlight the basic idea behind the design. In Section 4, we provide the complete specification of PARSHA-256. The special case described here is intended to help the reader better appreciate the different parameters of the general specification. Let x be the message to be hashed, with length L = 2^t(p + 2)(n − m) − (n − 2m) for some integer p ≥ 0. Consider the processor tree T_3 having processors P_0, ..., P_7. During the hash function computation, the message x will be broken up into disjoint substrings of lengths n or n − 2m. These substrings will be provided as input to the processors in the different rounds. Let us denote by u_0, ..., u_7 the substrings of x which are provided to the processors P_0, ..., P_7 in a particular round. The computation will be done in (p + 4) parallel rounds. In each round some or all of the processors work in parallel and apply the compression function to their inputs to obtain their outputs. Let us denote by z_0, ..., z_7 respectively the outputs of the processors P_0, ..., P_7 in a particular round. The description of the rounds is as follows.
1. In round 1, each processor P_j, with 0 ≤ j ≤ 7, gets as input an n-bit substring u_j of the message x and produces an m-bit output z_j.
2. In rounds 2 to (p + 1) the computation proceeds as follows.
   (a) Processors P_0, ..., P_3 each get an (n − 2m)-bit substring of the message x. These substrings are u_0, ..., u_3. Processor P_j (0 ≤ j ≤ 3) concatenates the m-bit strings z_{2j} and z_{2j+1} of the previous round to u_j to form an n-bit input. For example, P_0 concatenates z_0, z_1 to u_0; P_1 concatenates z_2, z_3 to u_1; and so on. Note that all the intermediate hash values z_0, ..., z_7 of the previous round are used up.
   (b) Processors P_4, ..., P_7 each get an n-bit substring of the message as input, i.e., the strings u_4, ..., u_7 are all n-bit strings.
   (c) Each of the processors invokes the compression function on its n-bit input to produce an m-bit output.
3. In round (p + 2), processors P_0, ..., P_3 each get an (n − 2m)-bit string, i.e., the strings u_0, ..., u_3 are each (n − 2m)-bit strings. None of the processors P_4, ..., P_7 gets any input. Each processor P_j (0 ≤ j ≤ 3) then forms an n-bit string as described in item 2a above. These strings are hashed to obtain m-bit outputs z_0, ..., z_3.
4. In round (p + 3), processors P_0 and P_1 each get an (n − 2m)-bit string. (The other processors do not get any input.) These processors then form n-bit inputs using the strings z_0, ..., z_3 as before. The n-bit strings are hashed to produce two m-bit strings z_0 and z_1.
5. In round (p + 4), only processor P_0 gets an (n − 2m)-bit string. The m-bit outputs z_0, z_1 of round (p + 3) are concatenated to this (n − 2m)-bit string to form an n-bit input. This input is hashed to obtain the final message digest.

Figure 2 shows the working of the algorithm. Note that the total number of bits that is hashed is equal to 2^t n + p(2^{t−1}(n − 2m) + 2^{t−1} n) + (n − 2m)(2^{t−1} + · · · + 1). A routine simplification shows that this is equal to the length L of the message x.
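The bit-count identity just stated can be checked numerically (a quick sketch; the helper names are ours):

```python
n, m = 768, 256  # bits

def total_hashed(t, p):
    # bits consumed: round 1, rounds 2..(p+1), and the final rounds
    first = 2 ** t * n
    middle = p * (2 ** (t - 1) * (n - 2 * m) + 2 ** (t - 1) * n)
    tail = (n - 2 * m) * (2 ** t - 1)      # 2^(t-1) + ... + 2 + 1 = 2^t - 1
    return first + middle + tail

def msg_len(t, p):
    # L = 2^t (p + 2)(n - m) - (n - 2m)
    return 2 ** t * (p + 2) * (n - m) - (n - 2 * m)

print(all(total_hashed(3, p) == msg_len(3, p) for p in range(6)))  # True
```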
Hence the entire message is hashed to produce the m-bit message digest. Now we consider the modifications required to handle the general situation.

3.1 Arbitrary Lengths
The message length that we have chosen is of a particular form. In general we have to tackle messages of arbitrary length. This requires that the original message be padded with 0's to bring the length into the desired form.

3.2 Processor Tree
The special case described above is for t = 3. Depending upon the availability of resources, one might wish to use a larger tree. We have provided the specification
Fig. 2. Example for the special case (first round; rounds 2 to (p + 1); round (p + 2); round (p + 3); last round).
of PARSHA-256 using the tree height as a parameter. Suppose the height of the available processor tree is T. However, the length of the message might not be large enough to utilize the entire processor tree. In this case, one has to utilize a subtree of height t ≤ T, which we call the effective height of the processor tree.

3.3 Initialization Vector
The description of the special case does not use an initialization vector (IV). As a result there are invocations of the compression function where the input is formed entirely from the message bits. This implies that any collision for the compression function immediately provides a collision for the hash function. To avoid this situation, one can use an initialization vector as part of the input to the invocations of the compression function. This ensures that to find a collision for the hash function, one has to find a collision for the compression function where a portion of the input is fixed. Using an IV is relatively simple in the Merkle-Damgård composition scheme: the IV has to be used only for the first invocation of the compression function. For the tree based algorithm, the IV has to be used at several points. It has to be used for all invocations of the compression function in the first round and all invocations of the compression function by leaf level processors in the subsequent rounds. The disadvantage of using an IV is the fact that the number of invocations of the compression function increases. Further, this value increases
as the length of the IV increases. To allow more flexibility to the user we provide for three different possible lengths for the IV. The effect of the length of IV on the number of parallel rounds and the number of invocations of the compression function is discussed in Section 5.
4 PARSHA-256 Specification
In this section we provide the detailed technical specification of the new hash function PARSHA-256. This includes the padding, formatting of the message and the digest computation algorithm. The choice of the compression function and the SS composition principle is also a part of these specifications.

4.1 Parameters and Notation
1. n = 768 and m = 256.
2. Compression function h : {0, 1}^n → {0, 1}^m.
3. Message x having length |x| = L bits.
4. Height of the available processor tree is T.
5. Effective height of the processor tree is t.
6. Initialization vector IV having length |IV| = l ∈ {0, 128, 256}.
7. Functions δ(i) and λ(i): δ(i) = 2^i(2n − 2m − l) − (n − 2m); λ(i) = 2^{i−1}(2n − 2m − l).
8. q, r and b are defined from L and t as follows: If L > δ(t), then write L − δ(t) = qλ(t) + r, where r is the unique integer from the set {1, ..., λ(t)}. If L = δ(t), then q = r = 0.
9. b = ⌈r/(2n − 2m − l)⌉.
10. Number of parallel rounds: R = q + t + 2.
11. The empty string will be denoted by NULL.
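Items 7–10 are simple integer arithmetic and can be sketched as follows (helper names are ours; the ceiling in the definition of b is our reading of the original):

```python
import math

n, m = 768, 256  # bits

def delta(i, l):
    # item 7: delta(i) = 2^i (2n - 2m - l) - (n - 2m)
    return 2 ** i * (2 * n - 2 * m - l) - (n - 2 * m)

def lam(i, l):
    # item 7: lambda(i) = 2^(i-1) (2n - 2m - l)
    return 2 ** (i - 1) * (2 * n - 2 * m - l)

def parameters(L, t, l):
    # items 8-10; requires L >= delta(t)
    if L == delta(t, l):
        q = r = 0
    else:
        q, r = divmod(L - delta(t, l), lam(t, l))
        if r == 0:            # r ranges over {1, ..., lambda(t)}, not {0, ...}
            q, r = q - 1, lam(t, l)
    b = math.ceil(r / (2 * n - 2 * m - l))
    R = q + t + 2             # number of parallel rounds
    return q, r, b, R

print(parameters(delta(3, 0) + lam(3, 0), 3, 0))  # (0, 4096, 4, 5)
```

Note the small adjustment after `divmod`: because r is taken from {1, ..., λ(t)} rather than {0, ..., λ(t) − 1}, an exact multiple of λ(t) maps to r = λ(t) with q reduced by one.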
The initialization vector IV of length l is specified as follows. The specification of SHA-256 describes a 256-bit initialization vector, which we denote IV*. If l = 256, then IV = IV*; if l = 128, then IV is the first 128 bits of IV*; and if l = 0, then IV = NULL.

4.2 Formatting the Message
The message x undergoes two kinds of padding. In the first kind, called end-padding, zeros are appended to the end of x to bring the length of the padded message into a certain form. This padding is defined in Step 5 of PARSHA-256 in Section 4.4. The other kind of padding is what we call IV-padding. If l > 0, then IV-padding is done to ensure that no invocation of h() gets only message bits as input. We now describe the formatting of the message into substrings. Let the end-padded message be written as U_1||U_2|| ... ||U_R, where for 1 ≤ i ≤ R − 1, U_i = u_{i,0}|| ... ||u_{i,2^t−1}, and the u_{i,j} and U_R are strings of length 0, n − 2m or n − l as defined in Equation (2).
|u_{i,j}| = n − l      if (i = 1) or (2 ≤ i ≤ q + 1 and j ∈ L);
           n − l      if i = q + 2 and 2^{t−1} ≤ j ≤ 2^{t−1} + b − 1;
           0          if i = q + 2 and 2^{t−1} + b ≤ j ≤ 2^t − 1;
           0          if q + 2 < i < R and j ∈ L;
           n − 2m     if 2 ≤ i ≤ q + 2 and j ∈ I;
           n − 2m     if q + 2 < i < R and 0 ≤ j ≤ K_i − 1;
           0          if q + 2 < i < R and K_i ≤ j ≤ 2^{t−1} − 1.     (2)

|U_R| = n − 2m if b > 0; |U_R| = 0 otherwise.
Here

K_i = 2^{s−1} + k_s,   (3)

where s = R − i and k_s = ⌊(2^{t−s−1} + b − 1)/2^{t−s}⌋. For 1 ≤ i < R and 0 ≤ j ≤ 2^t − 1, the input to processor P_j in round i is a string v_{i,j} and the output is z_{i,j}. These strings are defined as follows:

z_{i,j} = P_j(v_{i,j});
v_{i,j} = u_{i,j}||IV                          if i = 1 or j ∈ L;     (4)
v_{i,j} = z_{i−1,2j}||z_{i−1,2j+1}||u_{i,j}    if 1 < i < R and j ∈ I.

For l = 0, the correctness of the above formatting algorithm can be found in [10]. The same proof also holds for the case l > 0 and hence we do not repeat it here.

4.3 Computation of Digest
The digest computation algorithm is described as follows.

ComputeDigest(x, t)
Inputs: message x and effective tree height t.
Output: m-bit message digest.
1. for 1 ≤ i ≤ R − 1
2.   for j ∈ P do in parallel
3.     z_{i,j} = P_j(v_{i,j});
4.   enddo;
5. enddo;
6. if b > 0 then w = P_0(z_{R−1,0}||z_{R−1,1}||U_R); else w = z_{R−1,0};
7. z = h(w||bin_{n−m}(L));
8. return z.
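The fixed-width length encoding bin_{n−m}(L) used in step 7 (and defined in the text) can be sketched in one line:

```python
# bin_k(i): the k-bit binary representation of i, for 0 <= i <= 2^k - 1
def bin_k(i: int, k: int) -> str:
    assert 0 <= i < 2 ** k
    return format(i, "0{}b".format(k))

print(bin_k(5, 8))  # 00000101
```

In the algorithm, k = n − m = 512 bits, so message lengths up to 2^512 − 1 can be encoded.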
The function bin_k(i) is defined in the following manner: for 0 ≤ i ≤ 2^k − 1, bin_k(i) denotes the k-bit binary representation of i.

Remark 1. In rounds 1 to (q + 1) all the processors invoke the compression function on their n-bit inputs. However, in rounds (q + 2) to (q + t + 1) only some of the processors actually invoke the compression function. The compression function is invoked only if the input to the processor is an n-bit string. Otherwise
the processor simply outputs its input (see Equation (1)). This behaviour of the processors is controlled by the formatting of the message. The precise details are as follows. Let i be the round number and s = R − i. In round i = q + 2, processors P_0, ..., P_{2^{t−1}+b−1} invoke the compression function; in rounds q + 2 < i < q + t + 2, processors P_0, ..., P_{K_i−1} invoke the compression function, and the processor P_j (if any) with 2^{s−1} + k_s − 1 < j < 2^{s−1} + l_s − 1, where l_s = ⌊(b + 2^{t−s} − 1)/2^{t−s}⌋, simply outputs its m-bit input. All other processors in these rounds are inactive.

4.4 Digest Generation and Verification
We are now in a position to define the digest of a message x. Suppose that we have at our disposal a processor tree of height T. Then the digest z of x is defined in the following manner.

PARSHA-256(x, T)
Inputs: message x and height T of the available binary tree.
1. if L ≤ δ(0) = n − l, then return h(h(x||0^{n−l−L}||IV)||bin_{n−m}(L));
2. if δ(0) < L < δ(1), then x = x||0^{δ(1)−L}; L = δ(1);
3. Determine t as follows: t = T if L ≥ δ(T); t = i if δ(i) ≤ L < δ(i + 1), 1 ≤ i < T;
4. Determine q, r and b from L and t (see Section 4.1);
5. x = x||0^{b(2n−2m−l)−r};
6. z = ComputeDigest(x, t);
7. output (t, z).
Clearly the digest z depends upon the height t of the tree. Hence along with z, the quantity t is also provided as output. Note that the height t of the tree used to produce the digest may be less than the height T of the tree that is available. The reason for this is that the message length L may not be long enough to utilize the entire tree. Thus t is the effective height of the tree used to compute the digest. During verification, Step 3 of PARSHA-256 is not executed, since the effective height of the tree is already known. This raises the following question: What happens if the verifier does not have access to a tree of height t? In [10], it is shown that any digest produced using a tree of height t can also be produced using a tree of height t′ with 0 ≤ t′ < t. The same algorithm will also work in the present case and hence we do not repeat it here. Moreover, in this paper we provide a multithreaded implementation of the algorithm ComputeDigest() in which the processors are implemented using threads. This also shows that access to a physical processor tree is not necessary for digest computation.
5 Theoretical Analysis
In this section we perform a theoretical analysis of the collision resistance and speed-up of PARSHA-256. The speed-up is with respect to SHA-256, which is built using the Merkle-Damgård composition principle.
5.1 Collision Resistance
We first note that the composition scheme used in the design of PARSHA-256 is the parallel Sarkar-Schellenberg scheme described in [10]. Hence we have the following result.

Theorem 1 (Sarkar-Schellenberg [10]). If the compression function h() of SHA-256 is collision resistant then so is PARSHA-256.

If no initialization vector is used, i.e., if l = 0, then the ability to obtain a collision for h() immediately implies the ability to obtain a collision for PARSHA-256. Hence we can state the following result.

Theorem 2. If l = 0, then h() is collision resistant if and only if PARSHA-256 is collision resistant.

What happens if l > 0? In this situation the initialization vector IV is non-trivial. The intuitive idea is to increase the collision resistance of the hash function beyond that of the compression function. If there is no IV, then a collision for the compression function immediately leads to a collision for the hash function. However, if an IV is used, then the adversary has to find a collision for the compression function under the condition that a certain portion of the input is fixed. Intuitively, this could be a more difficult task for the adversary. On the other hand, Dobbertin [5] has shown that for MD4 the use of an IV does not lead to any additional protection. Still, the use of an IV is quite common in hash function specifications and hence we also include it in the specification of PARSHA-256.

5.2 Speed-Up over SHA-256
Let the end-padded length L of the message x be such that L = γ(n − m) for some positive integer γ. To hash a message x of length L, SHA-256 requires γ invocations of h() and hence the time required to hash x is γT_h, where T_h is the time required by one invocation of h(). (We are ignoring the last invocation of h() where the length of the message is hashed; this step is required by both SHA-256 and PARSHA-256.) We now compare this to the number of invocations of h() and the number of parallel rounds required by PARSHA-256. Recall from Section 4.1 that the number of parallel rounds required by PARSHA-256 is R. Thus the time required for parallel execution of PARSHA-256 is RT_h. The number of invocations of h() by PARSHA-256 is the same as the number of invocations of h() by PHA in [10].

Proposition 1. The number of invocations of h() by PARSHA-256 on a message of length L is equal to (q + 2)2^t + 2b − 2.

The parameters q and b depend on L, t, l, n and m. We have the following result.

Proposition 2. (L/λ(t)) − 1 < q + 2 < (L/λ(t)) + 1.
Table 1. Comparison of RF and IF for L = 2^ρ(n − m).

l     RF                            IF
0     ≈ 2^{−t} + t/2^ρ              ≈ 1
128   ≈ 1.14 × 2^{−t} + t/2^ρ       ≈ 1.14
256   ≈ 1.33 × 2^{−t} + t/2^ρ       ≈ 1.33
Note that λ(t) depends on t, l, n and m. For convenience of comparison we assume γ = 2^ρ for some positive integer ρ. Also λ(t) = 2^{t−1}(2n − 2m − l). Hence we have

2^{ρ−t}/(1 − l/(2(n−m))) − 1 < q + 2 < 2^{ρ−t}/(1 − l/(2(n−m))) + 1.   (5)
We define two ratios to compare PARSHA-256 and SHA-256. The first ratio is the round factor RF, which compares the number of parallel rounds required by PARSHA-256 to the number of rounds required by SHA-256. The smaller the value of RF, the greater the speed-up attained by PARSHA-256 over SHA-256. The second ratio is the invocation factor IF, which compares the number of invocations of the compression function h() made by PARSHA-256 to that made by SHA-256. These two ratios are defined as follows:

RF = (q + t + 2)/2^ρ;  IF = ((q + 2)2^t + 2b − 2)/2^ρ.   (6)

Note that both RF and IF depend on l. Table 1 provides the values of RF and IF under different values of l. For practical implementations the value of t will be small (typically between 3 and 8). The number of processors used is 2^t and hence ideally we should have the round factor RF equal to 2^{−t}. For moderately long messages (around 1 Mbyte) this is true for l = 0. For l > 0 the RF is larger. However, for all l, the round factor RF decreases as the message length increases. Also, for a fixed length L (i.e., a fixed ρ), the factor 2^{−t}/RF decreases as t increases, which implies that for a fixed length the efficiency of the speed-up decreases with increasing t. However, the actual speed-up increases as t increases. For l = 0, the number of invocations of h() made by PARSHA-256 is equal to the number of invocations of h() made by SHA-256. For l > 0, the number of invocations made by PARSHA-256 is more than that made by SHA-256. This is due to the use of IV. Thus a strictly sequential simulation of PARSHA-256 will require more time than SHA-256. In the next section, we show that for long messages a multithreaded implementation of PARSHA-256 on a single processor machine can lead to a speed-up over SHA-256.
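The entries of Table 1 follow from equation (6) together with the approximation for q + 2 from equation (5); a quick numerical sketch (helper name ours, and the small 2b − 2 term in IF is ignored):

```python
n, m = 768, 256

def factors(rho, t, l):
    # RF and IF from equation (6), using the approximation
    # q + 2 ~ 2^(rho - t) / (1 - l / (2(n - m)))
    q_plus_2 = 2 ** (rho - t) / (1 - l / (2 * (n - m)))
    RF = (q_plus_2 + t) / 2 ** rho        # (q + t + 2) / 2^rho
    IF = q_plus_2 * 2 ** t / 2 ** rho
    return RF, IF

# ~1 Mbyte message: L = 2^14 (n - m) bits, tree of height t = 3
RF, IF = factors(14, 3, 0)
print(round(IF, 2))                        # 1.0
print(round(factors(14, 3, 256)[1], 2))    # 1.33
```

The constants 1.14 and 1.33 in Table 1 are simply 1/(1 − 128/1024) and 1/(1 − 256/1024).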
6 Multithreaded Implementation
We implement the algorithm to compute PARSHA-256 using threads. The processors are implemented using threads and the simultaneous operation of the
358
Pinakpani Pal and Palash Sarkar
processors is simulated by concurrent execution of the respective threads. There are R parallel rounds in the algorithm. Each round consists of two phases – a formatting phase and a hashing phase. In the formatting phase, the inputs to the processors are formed using the message and the outputs of the previous invocations of h(). Once this phase is completed, the hashing phase starts. In the hashing phase all the processors operate in parallel to produce the output. We use two buffer sets – the input and the output buffer sets. The input buffer set consists of 2^t strings of length n each. Similarly, the output buffer set consists of 2^t strings of length m each. Thus each processor has its own input buffer and output buffer. In the formatting phase, the input buffers are updated using the message and the output buffers. In the hashing phase, the input buffers are read and the output buffers are updated. During implementation we declare the buffers to be global variables. This avoids unnecessary overhead during thread creation. The formatting phase prepares the inputs to all the processors. This phase is executed in a sequential manner. That is, first the input to processor P_0 is prepared, then the input to processor P_1 is prepared, and so on for the required number of processors. After the formatting phase is complete, the hashing phase is started. The exact details of processor invocation are as follows. Rounds 1 to q + 1: P_0, ..., P_{2^t − 1} each invoke the compression function. Round q + 2: P_0, ..., P_{2^{t−1}+b−1} each invoke the compression function. Round i with q + 2 < i < R: P_0, ..., P_{K_i − 1} each invoke the compression function. Round R: if b > 0, then P_0 invokes the compression function. Here K_i is as defined in Equation (3). Note that in rounds q + 2 to R at most one processor may additionally output the m-bit input that it received in the previous round. (See Remark 1 for further explanation.) Each processor is simulated using a thread.
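The per-round thread strategy can be sketched as follows. This is illustrative only: the buffer and helper names are ours, and full SHA-256 stands in for the thread-safe compression function h(), since hashlib does not expose the raw compression function.

```python
import hashlib
import threading

# One parallel round of the simulation: one thread per active processor.
T = 3
inputs = [b"message block %d" % j for j in range(2 ** T)]   # input buffers
outputs = [b""] * (2 ** T)                                  # output buffers

def worker(j):
    # each thread reads only its own input buffer and writes only its
    # own output buffer, so concurrent execution is conflict-free
    outputs[j] = hashlib.sha256(inputs[j]).digest()

threads = [threading.Thread(target=worker, args=(j,)) for j in range(2 ** T)]
for th in threads:
    th.start()
for th in threads:
    th.join()   # the hashing phase ends when every thread has terminated

print(all(len(z) == 32 for z in outputs))  # True
```

The join barrier at the end of each round mirrors the requirement in the text that a round completes only when all started threads have terminated.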
In the hashing phase of each round, the required threads are started. Each thread is given an integer j, which identifies the processor number and hence the input and the output buffers. Also, each thread gets the address of the start location of the subroutine h(). The subroutine h() is implemented in a thread-safe manner, so that conflict-free concurrent execution of the same code is possible. The management strategy for the input and output buffers described above ensures that there is no read/write conflict for the buffers even during concurrent execution. The hashing phase is completed only when all the started threads successfully terminate. This also ends one parallel round of the algorithm. Finally, the algorithm ends when all the parallel rounds are completed. There is another way in which concurrent execution can be further utilized. As described before, there are two phases in each round – the reading/formatting phase and the hashing phase. It is possible to introduce concurrency in these two phases in the following manner. Suppose the system is in the hashing phase of
Table 2. Details of different test platforms.

                  Silicon Graphics O2    P4
Number of CPUs    1                      1
Processor         MIPS R12000A           Intel Pentium 4
Processor Speed   400 MHz                1.40 GHz
Main Memory       512 MB                 256 MB
OS                IRIX 6.5               RedHat Linux 8.0
a particular round. At this point it is possible to concurrently execute the reading/formatting phase of the next round. The advantage is that in the next round the hashing phase can be started immediately, since the reading/formatting phase of this round has been completed concurrently with the hashing phase of the previous round. In situations where memory access is slow, this method will provide speed improvements. On the other hand, to avoid read/write conflict, we have to use two sets of buffers, leading to a more complicated buffer management strategy. For our work, we have chosen not to implement this idea.
7 Experimental Results
First we note that if PARSHA-256 is simulated sequentially then the time taken is proportional to the number of invocations of h(). From Table 1, we know that for a strictly sequential execution PARSHA-256 will be roughly as fast as SHA-256 when l = 0, and will be IF times slower than SHA-256 when l > 0. Also, for a full parallel implementation, the speed-up of PARSHA-256 over SHA-256 is determined by the factor RF in Table 1. In this section we compare the performance of the multithreaded implementation of PARSHA-256 with SHA-256. The experiments have been carried out on two platforms (see Table 2). The algorithms have been implemented in C and the same code was executed on both platforms. The order of bytes in a long on the two platforms is different; this was the only factor taken into account while running the program. Remark 2. The compression function h() of SHA-256 is also the compression function of PARSHA-256. We implemented h() as a subroutine and this subroutine was invoked by both SHA-256 and PARSHA-256. Thus the comparison of the two implementations is really a comparison of the two composition principles. Any improvement in the implementation of the compression function h() will improve the speed of both SHA-256 and PARSHA-256, but the comparative performance ratio would remain roughly the same. To provide a common platform for comparison, the same background machine load was maintained for the execution of both SHA-256 and PARSHA-256. For comparison purposes we have calculated the difference in clock() between the start and end of the program for both SHA-256 and PARSHA-256. Extensive experiments were carried out and a summary of the main points is as follows.
– On a P4 running Linux, the following was observed. For long messages of around 1 Mbyte or more, the multithreaded implementation of PARSHA-256 was faster by a factor of 2 to 3 for all values of l.
– On the SG, the speed of both algorithms was roughly the same for l = 0 and l = 128. For l = 256, the speed of PARSHA-256 was roughly 0.85 times the speed of SHA-256.
– For short messages, the multithreaded implementation was slower. This is possibly due to the higher thread management overhead.
– The gain in speed decreases as l increases. This is due to the increase in the number of invocations of the compression function, as shown in Table 1.
– The gain in speed increases with the message length. However, the rate of increase is slow.
As an outcome of our experiments, we can conclude that on a P4 running Linux and for long messages, the multithreaded implementation of PARSHA-256 is roughly 2 to 3 times faster than SHA-256.
8 Conclusion
In this paper, we have presented a new hash function, PARSHA-256. The hash function is built using the SS composition principle and the compression function of SHA-256. Since the SS composition principle is parallelizable, our hash function is also parallelizable, and a fully parallel implementation of PARSHA-256 will show a significant speed-up over SHA-256. We have also described a concurrent implementation of PARSHA-256 on a single-processor machine. Experimental results show that for long messages the concurrent implementation is still faster than SHA-256. The basic idea explored in the paper is that it is possible to obtain secure and parallelizable hash functions by combining the SS composition principle with a "good" compression function. We have done this using the compression function of SHA-256. Using other "good" compression functions, such as those of RIPEMD-160 or the other SHA variants, will also yield new and fast parallel hash functions. We believe this task will be a good research/industrial project with many practical applications.
Acknowledgement

We would like to thank the reviewers of the paper for their detailed comments, which helped to considerably improve the description of the hash function.
References

1. M. Bellare and D. Micciancio. A New Paradigm for Collision-Free Hashing: Incrementality at Reduced Cost. Lecture Notes in Computer Science (Advances in Cryptology – EUROCRYPT 1997), pages 163–192.
PARSHA-256 – A New Parallelizable Hash Function
2. A. Bosselaers, R. Govaerts and J. Vandewalle. SHA: A Design for Parallel Architectures? Lecture Notes in Computer Science (Advances in Cryptology – EUROCRYPT 1997), pages 348–362.
3. I. B. Damgård. A design principle for hash functions. Lecture Notes in Computer Science, 435 (1990), 416–427 (Advances in Cryptology – CRYPTO '89).
4. H. Dobbertin, A. Bosselaers and B. Preneel. RIPEMD-160: A strengthened version of RIPEMD. Cambridge Workshop on Cryptographic Algorithms, 1996, LNCS, vol. 1039, Springer-Verlag, Berlin, 1996, pages 71–82.
5. H. Dobbertin. Cryptanalysis of MD4. Journal of Cryptology, 11(4): 253–271 (1998).
6. L. Knudsen and B. Preneel. Construction of Secure and Fast Hash Functions Using Nonbinary Error-Correcting Codes. IEEE Transactions on Information Theory, vol. 48, no. 9, September 2002, pages 2524–2539.
7. R. C. Merkle. One way hash functions and DES. Lecture Notes in Computer Science, 435 (1990), 428–446 (Advances in Cryptology – CRYPTO '89).
8. J. Nakajima and M. Matsui. Performance Analysis and Parallel Implementation of Dedicated Hash Functions. Lecture Notes in Computer Science (Advances in Cryptology – EUROCRYPT 2002), pages 165–180.
9. B. Preneel. The state of cryptographic hash functions. Lecture Notes in Computer Science, 1561 (1999), 158–182 (Lectures on Data Security: Modern Cryptology in Theory and Practice).
10. P. Sarkar and P. J. Schellenberg. A Parallelizable Design Principle for Cryptographic Hash Functions. IACR e-print server, 2002/031, http://eprint.iacr.org.
11. C. Schnorr and S. Vaudenay. Parallel FFT-Hashing. Fast Software Encryption, LNCS 809, pages 149–156, 1994.
A Test Vector
Our implementation of PARSHA-256 is available at http://www.isical.ac.in/~crg/software/parsha256.html. The test vector that we use is the string (abcdefgh)^128. (Note that the corresponding files for little- and big-endian architectures are going to be different.) Each of the characters represents a byte, and the entire string is of length 1 Kbyte. We run PARSHA-256 for t = 3 and for l = 0, 128 and 256, and denote the resulting message digests by d1, d2 and d3. Each di is a 256-bit value; the hex representations are given below.

d1: 4d4c2b13 3e516dc1 35065779 536fd4bf 74f98189 bc6b2a92 10803d38 77e3b656
d2: e554c47b 1538c9db 5cbff219 2d620fd3 ae21d04a 5ae6fa50 150888cc da6cf783
d3: 459142c5 fcd6eff6 839d6740 177b54d5 2e8bc987 a7438438 a588441a 7113e8d3
Practical Symmetric On-Line Encryption

Pierre-Alain Fouque, Gwenaëlle Martinet, and Guillaume Poupard

DCSSI Crypto Lab, 51 Boulevard de La Tour-Maubourg, 75700 Paris 07 SP, France
{Pierre-Alain.Fouque,Gwenaelle.Martinet}@ens.fr
[email protected]
Abstract. This paper addresses the security of symmetric cryptosystems in the blockwise adversarial model. At Crypto 2002, Joux, Martinet and Valette proposed a new kind of attacker against several symmetric encryption schemes. In this paper, we first show a generic technique to thwart blockwise adversaries for a specific class of encryption schemes: it consists of delaying the output of the ciphertext block. Then we provide the first security proof for the CFB encryption scheme, which is naturally immune against such attackers.

Keywords: Symmetric encryption, blockwise adversary, chosen plaintext attacks.
1 Introduction
Modes of operation are well-known techniques for encrypting messages longer than the output length of a block cipher. The message is first cut into blocks, and the mode of operation specifies how to securely encrypt them. The resulting construction is called an encryption scheme. Specific properties are achieved by some of these modes: self-synchronization, ensured by chained modes such as CBC and CFB [11], or efficient encryption throughput, ensured by parallelized modes such as ECB and OFB [11]. Two different techniques are mainly used to build these schemes. The first one directly outputs the blocks produced by the block cipher (ECB, CBC). The second uses the block cipher to generate random strings which are then XORed with the message blocks (CTR [1], OFB, CFB). In this paper we investigate the security of the classical modes of operation in a more realistic and practical scenario than previous studies. In cryptography, security is usually defined by the combination of a security goal and an adversarial model. The security goal of an encryption scheme is privacy. Informally speaking, privacy of an encryption scheme guarantees that, given a ciphertext, an adversary is not able to learn any information about the corresponding plaintext. Goldwasser and Micali formalized this notion in [5], where it is called semantic security. An equivalent definition, called indistinguishability of encryptions (IND), has been more extensively studied in [1] for the symmetric encryption setting: given two equal-length messages M0

T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 362–375, 2003.
© International Association for Cryptologic Research 2003
and M1 chosen by the adversary and the encryption C of one of them, it is difficult for the adversary to distinguish whether C is the encryption of M0 or M1. In practical scenarios, adversary goals can differ from this theoretical notion of privacy. For example, the adversary can try to recover the secret key, or to recover the plaintext underlying a given ciphertext. However, from a security point of view, if the scheme is secure under the IND security notion, key recovery or plaintext recovery cannot be achieved by the adversary. It is worth noticing that a security proof for an encryption mode is not an absolute proof of security. As often in cryptography, proofs are made by reduction, in the complexity-theoretic sense, between the security of the scheme and the security of the block cipher used in the encryption scheme. In practice, such a proof shows that the mode achieves the security goal assuming the security of the underlying block cipher. Orthogonally to the security goal, the adversarial model defines the adversary's abilities. The considered adversarial models are known plaintext attacks, chosen plaintext attacks (CPA) and chosen ciphertext attacks (CCA). In these scenarios, the adversaries have access to an encryption oracle, queried with known or chosen messages, and/or a decryption oracle, queried with ciphertexts that may be chosen according to the previous pairs of plaintexts and ciphertexts. In the sequel we consider schemes secure against chosen plaintext attacks, such as CBC or CFB. We do not take into account schemes secure against chosen ciphertext attacks, such as OCB [12], IACBC, IAPM [9] or XCBC [4]. Usually, it is implicitly assumed that messages sent to the encryption oracle are atomic entities. However, in the real world, the encryption module can be a cryptographic hardware accelerator or a smart card with limited memory. Thus, ciphertext blocks are output by the module before the whole message has been received.
Practical applications are thus far from the theoretical security model. Recently, Joux, Martinet and Valette [8] have proposed to change the adversary's interactions with the encryption oracle to better model on-line symmetric encryption schemes. Such a scheme can output the ciphertext block C[i] just after the introduction of the block M[i], without knowledge of the whole message. Many modes of operation have this nice property. Therefore, from the attacker's side, adversaries in the IND security game can adapt the message blocks according to the previously received ciphertext blocks. The same notion, concerning integrity of real-time applications, has been used by Gennaro and Rohatgi [3]. The blockwise adversarial model, presented in [8], is used to break the IND-CPA security of some encryption schemes that are provably secure in the standard model. For example, in order to encrypt a message M = M[1]M[2]...M[l] with the CBC encryption mode, a random initial vector C[0] = IV is chosen and, for all 1 ≤ i ≤ l, C[i] = E_K(M[i] ⊕ C[i−1]). In [1], Bellare et al. have shown that, in the standard model, the CBC encryption scheme is IND-CPA secure up to the encryption of 2^{n/2} blocks, where n denotes the block length of the block cipher E_K. However, in [8], Joux et al. have shown that the CBC encryption mode cannot be IND secure in the blockwise adversarial model: two-block messages M0 and M1 suffice for the adversary to win the semantic security game. Indeed, if
the same input is given twice to the block cipher, the same result is output. Consequently, in the IND security game, if the adversary knows the initial vector C[0] = IV and the first ciphertext block C[1], he can adaptively choose M0[2] as C[1] ⊕ C[0] ⊕ M0[1] and a random value for M1[2]. Then, if the second ciphertext block C[2] is such that C[2] = C[1], the ciphertext C = C[0]C[1]C[2] is the encryption of M0; otherwise it is the encryption of M1. This attack works because the adversary can adapt his message blocks according to the output blocks. In the standard model, as the messages are chosen before the ciphertext is returned by the oracle, the probability that such a collision occurs in the inputs of the block cipher is upper-bounded by µ²/2^n, where µ denotes the number of blocks encrypted with the same key. As long as µ remains small enough, this probability is negligible and the mode of encryption is secure. From a practical point of view, the blockwise attack on the CBC encryption scheme is as efficient as an attack on the ECB encryption scheme in the standard model. Indeed, for both the ECB mode in the standard model and the CBC mode in the blockwise model, the adversary knows inputs and outputs of the block cipher. For the ECB mode, he can then adapt his messages to force a collision; for the CBC mode, he adapts the message blocks. It is worth noticing that in both cases a key recovery attack on the block cipher is possible. Such an attack only requires the encryption of some chosen plaintext blocks. For example, a dictionary attack on the block cipher can be mounted (see for example [10]). In this kind of attack, the adversary precomputes the encryption of a plaintext block P under all the keys and stores the results in a table. Then, if he knows the encryption of P under the key used in the block cipher, he just looks in his table to recover the secret key.
Moreover, the time/memory tradeoff of Hellman [7] can be adapted to reduce the memory required for this attack. Therefore, blockwise attacks need to be taken into account in practical use, since they are not only theoretical but pave the way to more practical and serious attacks.

Our results. In this paper we study the security of some well-known encryption modes against blockwise adversaries. In a first part we show how to secure the CBC encryption mode. The countermeasure we propose simply consists of delaying the output blocks. This modified scheme, called delayed CBC (DCBC), is proved secure against blockwise adaptive adversaries mounting chosen plaintext attacks. Furthermore, this modification can be applied to secure several modes of operation. In a second part, we show that the CFB (Ciphertext FeedBack) encryption mode is secure in this new model without any change. We also give in appendices a rigorous proof of the security of the DCBC and CFB modes.
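The two-block blockwise attack on CBC recalled above can be demonstrated with a short sketch. The oracle and the keyed function E below are illustrative stand-ins (a hash-based function replaces the real block cipher, which suffices because the attack only needs E to be deterministic); none of this code comes from the paper.

```python
import hashlib
import os

BLOCK = 16

def E(key, block):
    # Stand-in for the block cipher E_K: any deterministic keyed
    # function is enough to demonstrate the distinguishing attack.
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

class BlockwiseCBCOracle:
    """Left-or-right CBC oracle that releases C[i] immediately
    after receiving M[i] -- the blockwise setting of [8]."""
    def __init__(self, key, b):
        self.key, self.b = key, b
        self.iv = os.urandom(BLOCK)   # C[0] = IV, revealed at once
        self.prev = self.iv

    def query(self, m0, m1):
        m = m0 if self.b == 0 else m1
        self.prev = E(self.key, xor(m, self.prev))
        return self.prev

def blockwise_attack(oracle):
    c0 = oracle.iv                        # C[0] = IV is public
    m0_1 = m1_1 = os.urandom(BLOCK)       # identical first blocks
    c1 = oracle.query(m0_1, m1_1)         # C[1] revealed early
    # Adaptive choice: if b = 0, this repeats E's first input,
    # forcing C[2] = C[1].
    m0_2 = xor(xor(c1, c0), m0_1)
    m1_2 = os.urandom(BLOCK)
    c2 = oracle.query(m0_2, m1_2)
    return 0 if c2 == c1 else 1           # guess of the hidden bit

oracle = BlockwiseCBCOracle(os.urandom(16), b=0)
print(blockwise_attack(oracle))   # → 0: the attack recovers b
```

When b = 0 the second block-cipher input is m0_2 ⊕ C[1] = C[0] ⊕ M0[1], the same as the first, so C[2] = C[1] with certainty; when b = 1 the collision happens only by chance.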
2 Preliminaries

2.1 Notations
In the sequel, standard notations are used to denote probabilistic algorithms and experiments. If A is a probabilistic algorithm, then the result of running A on inputs x1 , x2 , . . . and coins r will be denoted by A(x1 , x2 , . . . ; r). We let
y ← A(x1, x2, ...; r) denote the experiment of picking r at random and letting y be A(x1, x2, ...; r). If S is a finite set then x ← S is the operation of picking an element uniformly from S. If α is neither an algorithm nor a set then x ← α is a simple assignment statement. We say that y can be output by A if there is some r such that A(x1, x2, ...; r) = y. If p(x1, x2, ...) is a predicate, the notation Pr[x1 ← S; x2 ← A(x1, y2, ...); ... : p(x1, x2, ...)] denotes the probability that p(x1, x2, ...) is true after ordered execution of the listed experiments. Recall that a function ε : N → R is negligible if for every constant c ≥ 0 there exists an integer kc such that ε(k) ≤ k^{−c} for all k ≥ kc. The set of all functions from {0,1}^m to {0,1}^n is denoted by R_{m→n}. The set of all permutations of {0,1}^n is denoted by Perm_n.

2.2 Security Model
Security of a symmetric encryption scheme is viewed as indistinguishability of ciphertexts under chosen plaintext attacks. However, the recent attacks on some schemes proved secure in the standard model show that a new adversarial model has to be defined. The new kind of adversary, introduced in [8], is adaptive during a query, according to the previous blocks of ciphertext. The security model has to take these adversaries into account, since they are realistic from an implementation point of view. The difference with the standard model is that here the queries are made on the fly: for each plaintext block received, the oracle outputs a ciphertext block. This better models on-line encryption. Since the adversary does not send the whole plaintext in a single query, and can adapt the next plaintext block according to the ciphertext he receives, it is natural to also suppose that the adversary may interleave the queries. In this case, the attacker is able to query the oracle for the encryption of a new message even if the previous encryption is not finished. This introduces concurrent queries. The security model is thus modified in depth, and the security of known schemes has to be carefully re-evaluated in this new model. Formally, in this model the adversary, denoted by A in the sequel, is given access to a blockwise concurrent left-or-right encryption oracle: this oracle is queried with inputs of the form (M_0^i[j], M_1^i[j]), where M_0^i[j] and M_1^i[j] are two plaintext blocks. At the beginning of the game, this oracle flips a bit b at random. Then, if b = 0 it always encrypts M_0^i[j], and otherwise, if b = 1, it encrypts M_1^i[j]. The corresponding ciphertext block C_b^i[j] is returned to the adversary, whose goal is to guess which message has been encrypted.
Here the queries are made on the fly (for each plaintext block received, the oracle outputs a ciphertext block) and also concurrently (the adversary may interleave the queries); A is able to query the oracle for the encryption of messages even if the previous encryption is not finished. Thus, we define the encryption left-or-right oracle, denoted by E_K^{bl,c}(LR(·,·,b,i)), to take as input two plaintext blocks M_0^i[j] and M_1^i[j] along with the number i of the query, and to encrypt M_b^i[j]. We now give the formal description of the attack scenario:
Expt^{lorc-bcpa(b)}_{SE,A}(k):
  K ← K(k)
  d ← A^{E_K^{bl,c}(LR(·,·,b,·))}
  Return d

The adversary advantage in winning the LORC-BCPA game is defined as:

Adv^{lorc-bcpa}_{SE,A}(k) = 2 · Pr[Expt^{lorc-bcpa(b)}_{SE,A}(k) = 1] − 1

We define Adv^{lorc-bcpa}_{SE}(k, t, q, µ) = max_A {Adv^{lorc-bcpa}_{SE,A}(k)}, where the maximum is over all legitimate A having time-complexity t and making at most q encryption queries, totaling µ blocks, to the concurrent oracles. A secret-key encryption scheme SE is said to be lor-secure against concurrent blockwise adaptive chosen plaintext attacks (LORC-BCPA) if, for all polynomial-time probabilistic adversaries, the advantage in this guessing game is negligible as a function of the security parameter k. In this case, SE is said to be LORC-BCPA secure. The security of a block cipher is viewed as indistinguishability from random permutations, as defined for example in [1]. The attack scenario for the adversary is to distinguish the outputs of a permutation randomly chosen in Perm_n from the outputs of a permutation randomly chosen in the family P of all permutations induced by a given block cipher. The adversary's advantage in winning this game is denoted by Adv^{prp}_P(k, t, q). Following the same idea, the security of a pseudorandom function f, randomly chosen in a given family F of functions of input length m and output length n, is its indistinguishability from a random function of R_{m→n}. The attacker's game is the same as above, except that permutations are replaced by functions. The adversary's advantage in winning the game is denoted by Adv^{prf}_F(k, t, q).
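The experiment and advantage definitions can be made concrete with a small Monte-Carlo sketch: the advantage 2 · Pr[Expt = 1] − 1 is estimated by repeated runs of the guessing game. The ideal oracle and the blind adversary below are hypothetical stand-ins used only to illustrate the definition, not part of the paper.

```python
import random

def lor_experiment(adversary):
    """One run of the left-or-right game against a perfectly hiding
    oracle; returns 1 iff the adversary guesses the hidden bit b."""
    b = random.randrange(2)
    guess = adversary()   # this toy oracle reveals nothing about b
    return int(guess == b)

def estimate_advantage(adversary, trials=100_000):
    """Empirical estimate of |2 * Pr[Expt = 1] - 1|."""
    wins = sum(lor_experiment(adversary) for _ in range(trials))
    return abs(2 * wins / trials - 1)

# A blind adversary has advantage ~0 against an ideal oracle;
# a successful blockwise attacker would push this toward 1.
adv = estimate_advantage(lambda: random.randrange(2))
assert adv < 0.05
```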
3 Blockwise Secure Encryption Schemes
In this section, we propose two modes of encryption that withstand blockwise adversaries. These modes are well known and simple: the CFB encryption scheme and a variant of the CBC are secure against the powerful adversaries we consider. The complete security proofs are given in the appendices; in this section we only summarize the security results and their implications for the use of these modes of encryption.

3.1 A Blockwise Secure Variant of the CBC: The Delayed CBC
Description. The CBC mode of encryption, probably the most widely used in practical applications, suffers from strong weaknesses in the blockwise adversarial model, as shown in [8]. The main reason is that the security of modes of operation is closely related to the probability of collision in the inputs of the underlying block cipher. As shown by the attacks presented in [8], blockwise adversaries can choose the message blocks according to the previously revealed ciphertext blocks, so that they can force such a collision. This kind of adversary is realistic if the output blocks are gradually released outside the cryptographic component. A simple countermeasure to prevent an adversary from having access to the previously ciphered block is to delay the output by one single block. Consequently, an attacker can no longer adapt the message blocks. More precisely, we slightly modify the encryption algorithm so that the encryption module delays the output by one block: instead of outputting C[i] just after the introduction of M[i], C[i] is output after the introduction of M[i+1]. This modification of the encryption process is efficient and does not require any modification of the scheme; ciphertexts produced by a device implementing the delayed CBC mode are compatible with those produced by standard ones. A detailed description of this scheme, called Delayed CBC or simply DCBC, is given below and depicted in figure 1.

Fig. 1. The Delayed CBC encryption mode.

We assume that each block is numbered from 1 to l and that the end of the encryption is indicated by sending a special block M[l+1] = stop. If the decryption algorithm does not have to output a block, it sends, as an acknowledgment, a special block "Ack". Of course, the index i is only given to simplify the description of the algorithm; in practice this counter should be handled by the encryption module. In other words, we do not consider attacks based on false values of i since they do not have any practical significance. In the following, E_K(.) will be denoted by E(K, .).

Function E-DCBC_E(K, M[i], i)
  If i = 1,
    IV ← {0,1}^n, C[0] = IV
    C[1] = E(K, C[0] ⊕ M[1])
    Return C[0]
  Else If M[i] = stop
    Return C[i−1]
  Else
    C[i] = E(K, C[i−1] ⊕ M[i])
    Return C[i−1]

Function D-DCBC_E(K, C[i], i)
  If i = 0, Return Ack
  Else Return C[i−1] ⊕ E^{−1}(K, C[i])
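A minimal Python sketch of the E-DCBC loop, showing the one-block delay. The keyed function E is a hash-based stand-in (so this sketch is encrypt-only; a real implementation would use an invertible block cipher), and the generator interface is our own illustrative choice, not the paper's.

```python
import hashlib
import os

BLOCK = 16

def E(key, block):
    # Stand-in keyed function for E_K (illustration only; a real
    # deployment needs a block cipher so that decryption works).
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def dcbc_encrypt(key, blocks):
    """Generator mirroring E-DCBC: each yield releases one
    ciphertext block, always one block *behind* the input."""
    iv = os.urandom(BLOCK)
    yield iv                      # C[0] = IV is released at once
    prev = iv
    pending = None                # block withheld from the adversary
    for m in blocks:
        cur = E(key, xor(prev, m))
        if pending is not None:
            yield pending         # release C[i-1] on receiving M[i]
        pending, prev = cur, cur
    yield pending                 # "stop" received: flush C[l]

msg = [os.urandom(BLOCK) for _ in range(4)]
cts = list(dcbc_encrypt(os.urandom(16), msg))
assert len(cts) == 5              # C[0..4]
```

The key point is visible in the loop: when M[i] arrives, the module computes C[i] but only releases C[i−1], so a blockwise adversary never sees the block it would need to adapt against.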
Note that the decryption process is unchanged compared to the standard CBC encryption mode. Indeed, there is no need to delay the output block in the decryption phase, since the adversary is not given any access to a decryption oracle for chosen plaintext attacks. Furthermore, since the DCBC does not provide chosen-ciphertext security, in either the standard or the blockwise model, the decryption process does not need to be modified.

Blockwise Security of the DCBC Encryption Mode. In appendix A, we analyze the security of the DCBC against blockwise concurrent adversaries mounting chosen plaintext attacks. Intuitively, it is easy to see that a blockwise adversary cannot adapt the plaintext blocks according to the previously returned ciphertext blocks, since it does not know C[i−1] when submitting M[i]. Furthermore, the knowledge of the previous blocks C[0], ..., C[i−2] does not help him predict the i-th input C[i−1] ⊕ M[i] of the block cipher, as long as the total number µ of blocks encrypted with the same key K is not too large. The security proof shows that the advantage of an adversary increases by at most a term µ²/2^n. In other words, DCBC is provably secure in the blockwise model, assuming the security of the underlying block cipher, as long as the total number of blocks encrypted with the same key is much smaller than 2^{n/2}. The security of the DCBC encryption mode is given in the following theorem:

Theorem 1. Let P be a family of pseudorandom permutations of input and output length n, where each permutation is indexed with a k-bit key. If E is drawn at random in the family P, then the DCBC encryption scheme is LORC-BCPA secure. Furthermore, for any t, q and µ ≥ 0, we have:

Adv^{lorc-bcpa}_{DCBC}(k, t, q, µ) ≤ 2 · Adv^{prp}_P(k, t, µ) + µ²/2^{n−1}
It is important to notice that this security bound is similar to the one obtained in the standard model for the CBC mode [1]. This means that the delayed CBC is as secure in the blockwise model as the classical CBC encryption scheme in the standard model.

3.2 CFB Encryption Scheme
A review of the most classical modes of operation shows that one of them, the CFB mode [11], is naturally immune against blockwise attacks. Description. The CFB encryption mode is based on a function F , indexed by a key K, taking n-bit blocks as input and outputting n-bit blocks. This function F does not need to be a permutation, i.e., does not need to be implemented using a block cipher. For example the construction of Hall et al. [6], proved by Bellare and Impagliazzo in [2], can be used. In the following, FK (.) will be denoted by f (K, .). A detailed description for this scheme is given below and also depicted in figure 2, using the same conventions as for DCBC.
Fig. 2. The CFB encryption mode.

Function E-CFB_f(K, M[i], i)
  If i = 1,
    IV ← {0,1}^n, C[0] = IV
    C[1] = f(K, C[0]) ⊕ M[1]
    Return C[0] and C[1]
  Else
    C[i] = f(K, C[i−1]) ⊕ M[i]
    Return C[i]

Function D-CFB_f(K, C[i], i)
  If i = 0, Return Ack
  Else Return C[i] ⊕ f(K, C[i−1])
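A minimal Python sketch of the CFB equations above, with a hash-based stand-in for F_K; since CFB never inverts f, the round trip works even though the stand-in is not a permutation. The code is illustrative, not taken from the paper.

```python
import hashlib
import os

BLOCK = 16

def f(key, block):
    # Keyed function F_K; CFB only applies f forward, so a
    # PRF-style stand-in (hash of key||block) suffices here.
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cfb_encrypt(key, blocks):
    c = [os.urandom(BLOCK)]              # C[0] = IV
    for m in blocks:
        c.append(xor(f(key, c[-1]), m))  # C[i] = f(K, C[i-1]) xor M[i]
    return c

def cfb_decrypt(key, c):
    # Decryption also applies f forward: M[i] = C[i] xor f(K, C[i-1]).
    return [xor(c[i], f(key, c[i - 1])) for i in range(1, len(c))]

key = os.urandom(16)
msg = [os.urandom(BLOCK) for _ in range(3)]
assert cfb_decrypt(key, cfb_encrypt(key, msg)) == msg
```

Note that C[i] can be released immediately: the adversary learns f(K, C[i−1]) ⊕ M[i], but the next input to f is C[i] itself, which already incorporates the unpredictable output of f.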
We insist on the fact that we have not modified the original CFB mode; we only recall it here for completeness.

Blockwise Security of the CFB Encryption Mode. In appendix B, we analyze the security of the CFB against blockwise concurrent adversaries mounting chosen plaintext attacks. Intuitively, a blockwise adversary cannot adapt the plaintext blocks in order to force the input of the function f as long as the ciphertext blocks are all pairwise distinct. If no adaptive strategy is efficient, the inputs of f behave like random values and the system is secure until a collision occurs at the output of this function. If the total number µ of blocks encrypted with the same key K is not too large, i.e., much smaller than the square root of 2^n, this event only happens with negligible probability. The security proof formalizes these ideas and shows that the advantage of an adversary increases by at most a term µ²/2^n, as for DCBC. In other words, the CFB mode is provably secure in the blockwise model, assuming the security of the underlying block cipher (or function), as long as the total number of blocks encrypted with the same key is much smaller than 2^{n/2}.

Theorem 2 (Security of the CFB mode of operation). Let F be a family of pseudorandom functions with input and output length n, where each function is indexed with a k-bit key. If the CFB encryption scheme is used with a function f chosen at random in the family F, then, for all integers t, q, µ ≥ 0, we have:

Adv^{lorc-bcpa}_{CFB}(k, t, q, µ) ≤ 2 · Adv^{prf}_F(k, t, µ) + µ²/2^{n−1}
Such a bound is tight, since practical attacks against the indistinguishability of the mode can be mounted if more than 2^{n/2} blocks are encrypted. In practice, notice that with 64-bit block ciphers such as DES or triple-DES, this bound of
2^32 blocks could be quickly reached in some applications based on high-speed networks. A block cipher, rather than a pseudorandom function, can be used in the CFB mode, as specified in [11]. Indeed, a secure block cipher behaves like a pseudorandom function up to the encryption of 2^{n/2} blocks.
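The µ²/2^{n−1} term of the bound can be evaluated numerically to see how quickly the guarantee erodes; the figures below simply instantiate the bound for 64-bit and 128-bit block sizes.

```python
def collision_term(mu, n):
    """The mu^2 / 2^(n-1) term of the LORC-BCPA security bound."""
    return mu ** 2 / 2 ** (n - 1)

# 64-bit blocks (DES / triple-DES): after 2^32 blocks the term is
# already a constant, i.e. the bound gives no guarantee at all.
assert collision_term(2 ** 32, 64) == 2.0

# 128-bit blocks: the same 2^32 blocks leave the term negligible.
print(collision_term(2 ** 32, 128))   # 2^-63, about 1.08e-19
```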
References

1. M. Bellare, A. Desai, E. Jokipii, and P. Rogaway. A Concrete Security Treatment of Symmetric Encryption. In Proceedings of the 38th Symposium on Foundations of Computer Science. IEEE, 1997.
2. M. Bellare and R. Impagliazzo. A tool for obtaining tighter security analysis of pseudorandom function based constructions, with applications to PRP → PRF conversion. Manuscript available at http://www-cse.ucsd.edu/users/russell, February 1999.
3. R. Gennaro and P. Rohatgi. How to Sign Digital Streams. In B. Kaliski, editor, Advances in Cryptology – Crypto '97, volume 1294 of LNCS, pages 180–197. Springer-Verlag, 1997.
4. V. D. Gligor and P. Donescu. Fast Encryption and Authentication: XCBC and XECB Authentication Modes. In M. Matsui, editor, Fast Software Encryption 2001, volume 2355 of LNCS, pages 92–108. Springer-Verlag, 2001.
5. S. Goldwasser and S. Micali. Probabilistic Encryption. Journal of Computer and System Sciences, 28:270–299, 1984.
6. C. Hall, D. Wagner, J. Kelsey, and B. Schneier. Building PRFs from PRPs. In H. Krawczyk, editor, Advances in Cryptology – Crypto '98, volume 1462 of LNCS, pages 370–389. Springer-Verlag, 1998.
7. M. E. Hellman. A Cryptanalytic Time-Memory Trade-Off. IEEE Transactions on Information Theory, IT-26(4):401–406, 1980.
8. A. Joux, G. Martinet, and F. Valette. Blockwise-Adaptive Attackers. Revisiting the (In)Security of Some Provably Secure Encryption Modes: CBC, GEM, IACBC. In M. Yung, editor, Advances in Cryptology – Crypto 2002, volume 2442 of LNCS, pages 17–30. Springer-Verlag, Berlin, 2002.
9. C. Jutla. Encryption Modes with Almost Free Message Integrity. In B. Pfitzmann, editor, Advances in Cryptology – Eurocrypt 2001, volume 2045 of LNCS, pages 529–544. Springer-Verlag, 2001.
10. A. Menezes, P. van Oorschot, and S. Vanstone. Handbook of Applied Cryptography. CRC Press, 1996.
11. NIST. FIPS PUB 81 – DES Modes of Operation, December 1980.
12. P. Rogaway, M. Bellare, J. Black, and T. Krovetz. OCB: A Block-Cipher Mode of Operation for Efficient Authenticated Encryption. In Eighth ACM Conference on Computer and Communications Security. ACM Press, 2001.
A Security Proof for the DCBC Encryption Scheme
We recall the following theorem giving the security bound for the DCBC encryption scheme, in the security model defined in section 3.1.
Theorem 3. Let P be a family of pseudorandom permutations of input and output length n, where each permutation is indexed with a k-bit key. If E is drawn at random in the family P, then the DCBC encryption scheme is LORC-BCPA secure. Furthermore, for any t, q and µ ≥ 0, we have:

Adv^{lorc-bcpa}_{DCBC}(k, t, q, µ) ≤ 2 · Adv^{prp}_P(k, t, µ) + µ²/2^{n−1}

Proof. The proof goes by contradiction. Assume that there exists an adversary A against the DCBC encryption scheme with non-negligible advantage. From this adversary, we construct an attacker B that distinguishes the block cipher E_K used in the DCBC, randomly chosen in the family P, from a random permutation with non-negligible advantage. More precisely, the attacker B interacts with a permutation oracle that chooses a bit b̂; if b̂ = 1, it chooses f as a permutation picked in the set Perm_n of all permutations, and if b̂ = 0, it runs the key generation algorithm K(1^k), obtains a key K and sets f = E_K. The goal of B is to guess the bit b̂ with non-negligible advantage. To this end, B uses the adversary A, and consequently B has to simulate the environment of A. First, B chooses a bit b at random and runs A; B has to concurrently answer the block encryption queries of the LORC game. When A submits a pair of input blocks (M_0^i[j], M_1^i[j]), B encrypts the block M_b^i[j] ⊕ C_b^i[j−1] under the DCBC encryption mode thanks to the permutation oracle, yielding C_b^i[j], and returns C_b^i[j−1] to A. Finally, A returns a bit b′; if b = b′, then B returns b̂* = 0, and otherwise B returns b̂* = 1 to the oracle. The advantage of A in winning the LORC game is defined as:

Adv^{lorc-bcpa}_{DCBC,A}(k) = 2 · Pr[Expt^{lorc-bcpa(b)}_{DCBC,A}(k) = 1] − 1 = 2 · Pr[b = b′ | K ← K(1^k), f = E_K] − 1

It is easy to verify that the attacker B can simulate the concurrent lor-encryption oracle for the adversary A, since B has access to a permutation f and can simulate the encryption mode of DCBC. The advantage of B in winning his game is defined as:

Adv^{prp}_{P,B}(k) = |Pr[b̂* = 0 | b̂ = 0] − Pr[b̂* = 0 | b̂ = 1]|
= |Pr[b = b′ | f = E_K, K ← K(1^k)] − Pr[b = b′ | f ← Perm_n]|
= Pr[b = b′ | K ← K(1^k), f = E_K] − Pr[b = b′ | f ← Perm_n]
≥ (1 + Adv^{lorc-bcpa}_{DCBC,A}(k))/2 − Pr[b = b′ | f ← Perm_n]

Let us now analyze Pr[b = b′ | f ← Perm_n]. We denote by D the event that all the inputs to the permutation f are distinct, and by D̄ its complement. Thus we have:

Pr[b = b′ | f ← Perm_n] = Pr[b = b′ | f ← Perm_n ∧ D] · Pr[D] + Pr[b = b′ | f ← Perm_n ∧ D̄] · Pr[D̄]
≤ 1/2 · (1 − Pr[D̄]) + (1 − 1/2^n) · Pr[D̄]
This last equation comes from the fact that if f is a permutation chosen at random from the set of all permutations and no collision occurs, outputs of f are independent of the input blocks M0i [j] and M1i [j] and the adversary A has no advantage in winning the LORC game. Therefore, Pr[b = b |f ← Permn ∧D] = 12 . Otherwise, if a collision occurs, there exists i, i , j, j such that (i, j) = (i , j ) and Cbi [j] = Cbi [j ], and then since A knows all the plaintexts blocks (M0i , M1i ) and the corresponding ciphertext blocks Cbi , he can decide whether M0i [j]⊕M0i [j ] = Cbi [j −1]⊕Cbi [j −1] or whether M1i [j]⊕M1i [j ] = Cbi [j −1]⊕Cbi [j −1]. However, with probability 1/2n , we have M0i [j] ⊕ M0i [j ] = M1i [j] ⊕ M1i [j ] if (M0i , M1i ) are chosen at random. Thus in any wins his game in this case and we have way A ¯ ≤ 1 − 1n . So, we get: Pr[b = b |f ← Permn ∧ D] 2 1 1 1 ¯ − n · Pr[D] Pr[b = b |f ← Permn ] ≤ + 2 2 2 Now, let us bound the probability that a collision occurs. The following ¯ ≤ µ(µ−1) lemma shows that if µ is the number of encrypted blocks, then Pr[D] 2n−1 . Consequently, the advantage of the attacker B is related to the advantage of the adversary A: 1 + Advlorc−bcpa 1 1 1 DCBC,A (k) prp ¯ − + − · Pr[D] AdvP,B (k) ≥ 2 2 2 2n Advlorc−bcpa 1 1 DCBC,A (k) ¯ − − · Pr[D] ≥ 2 2 2n Consequently, we obtain
Adv^{lorc-bcpa}_{DCBC,A}(k) ≤ 2 · Adv^{prp}_{P,B}(k) + (1 − 1/2^{n−1}) · Pr[¬D]
                            ≤ 2 · Adv^{prp}_{P,B}(k) + (1 − 1/2^{n−1}) · µ(µ−1)/2^{n−1}
and the theorem follows. To conclude the proof, we have to prove the following lemma.

Lemma 1. Pr[¬D] ≤ µ(µ−1)/2^{n−1}.
Proof. We note that Pr[¬D] = Pr[Coll_µ], where Coll_µ denotes the event that a collision occurs on the inputs of the function f during the encryption of the µ blocks. Consequently,

Pr[Coll_µ] = Pr[Coll_µ ∧ ¬Coll_{µ−1}] + Pr[Coll_µ ∧ Coll_{µ−1}]
           = Pr[Coll_µ | ¬Coll_{µ−1}] · Pr[¬Coll_{µ−1}] + Pr[Coll_{µ−1}]
           ≤ Pr[Coll_µ | ¬Coll_{µ−1}] + Pr[Coll_{µ−1}]
           ≤ Σ_{k=1}^{µ} Pr[Coll_k | ¬Coll_{k−1}]
Practical Symmetric On-Line Encryption
We now prove that Pr[Coll_k | ¬Coll_{k−1}] = 2(k−1)/(2^n − (k−1)). This represents the probability that a collision occurs in the input of the function f at the k-th block, given that no collision appeared before. We have Pr[Coll_k ∧ ¬Coll_{k−1}] = 2(k−1)/2^n, since there are 2(k−1) previous different values of M_{b'}^i[j] ⊕ C^i[j−1] that can be hit (as no collision occurred before the (k−1)-th step). The factor 2 comes from the fact that there are two messages M_0 and M_1: if a collision occurs for either of them, the adversary wins the game. The adversary cannot force a collision in the k-th block: indeed, he does not know the output of the (k−1)-th block, and this output of the function f is independent of the (k−1)-th input known by the adversary. Furthermore, there are 2^n different values of M^i[j] ⊕ C^i[j−1]. We also have Pr[¬Coll_{k−1}] = (2^n − (k−1))/2^n, since there are 2^n − (k−1) different values for M^i[j] ⊕ C^i[j−1] out of the 2^n choices (f is a permutation). Consequently, for k = 1, ..., µ, we get:

Pr[Coll_k | ¬Coll_{k−1}] = [2(k−1)/2^n] / [(2^n − (k−1))/2^n] = 2(k−1)/(2^n − (k−1))
Thus, if µ ≤ 2^{n−1},

Pr[Coll_µ] ≤ Σ_{k=1}^{µ} Pr[Coll_k | ¬Coll_{k−1}] = Σ_{k=1}^{µ} 2(k−1)/(2^n − (k−1))
           = Σ_{k=0}^{µ−1} 2k/(2^n − k)
           ≤ Σ_{k=0}^{µ−1} 2k/(2^n − 2^{n−1}) = Σ_{k=0}^{µ−1} 2k/2^{n−1} = µ(µ−1)/2^{n−1}
and the lemma is proved.
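The inequality chain above can be checked numerically. The sketch below (hypothetical helper names, using exact rational arithmetic) computes the sum Σ_{k=0}^{µ−1} 2k/(2^n − k) and compares it with the closed-form bound µ(µ−1)/2^{n−1} for a few toy parameters with µ ≤ 2^{n−1}:

```python
from fractions import Fraction

def collision_bound_exact(n: int, mu: int) -> Fraction:
    """Exact value of the sum from the lemma: sum_{k=0}^{mu-1} 2k / (2^n - k)."""
    N = 2 ** n
    return sum((Fraction(2 * k, N - k) for k in range(mu)), Fraction(0))

def lemma_bound(n: int, mu: int) -> Fraction:
    """Closed-form bound claimed by Lemma 1: mu(mu - 1) / 2^(n-1)."""
    return Fraction(mu * (mu - 1), 2 ** (n - 1))

# The bound holds whenever mu <= 2^(n-1): each term 2k/(2^n - k) is then
# at most 2k/2^(n-1), and the terms 2k sum to mu(mu - 1).
for n, mu in [(8, 16), (10, 32), (12, 64)]:
    assert mu <= 2 ** (n - 1)
    assert collision_bound_exact(n, mu) <= lemma_bound(n, mu)
```
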
B Security Proof for the CFB Encryption Mode
The following theorem gives the security bound for the CFB encryption scheme against concurrent blockwise adaptive adversaries.

Theorem 4 (Security of the CFB mode of operation). Let F be a family of pseudorandom functions with input and output length n, where each function is indexed by a k-bit key. If the CFB encryption scheme is used with a function f chosen at random in the family F, then, for all integers t, q, µ ≥ 0, we have:

Adv^{lorc-bcpa}_{CFB}(k, t, q, µ) ≤ 2 · Adv^{prf}_F(k, t, µ) + µ²/2^{n−1}
Proof. We consider an adversary A against the CFB mode, trying to win the LORC-BCPA security game. We show that this adversary can be turned into an adversary B trying to distinguish the function F_K from a random function chosen in R_{n→n}. The attack scenario for A is as defined in Section 2.2. B has to simulate the environment of A by using his own oracle. Indeed, B has access
to an oracle O_f, defined as follows: at the beginning of the game, O_f picks a bit b at random. If b = 0, then it chooses at random a key K for the function F ∈ F and lets f = F_K. Otherwise, if b = 1, then f is a random function chosen in the set R_{n→n} of all the functions from {0,1}^n into {0,1}^n. B has to guess the bit b with non-negligible advantage. We now describe precisely how the adversary B answers the encryption queries made by A. First, B picks a bit b' at random. A feeds his encryption oracle with queries of the form (M_0^i[j], M_1^i[j]), where M_b^i[j] is the j-th block of the i-th query. Note that queries can be interleaved, so that some of the previous queries are not necessarily finished at this step. When B receives such a query, if j = 1, then B picks a random value R^i, sends it to O_f and receives f(R^i). If j ≠ 1, then B transmits C_{b'}^i[j−1] to O_f and receives f(C_{b'}^i[j−1]). Finally, according to the value of j, B returns to A either R^i along with C_{b'}^i[1] = M_{b'}^i[1] ⊕ f(R^i), or C_{b'}^i[j] = M_{b'}^i[j] ⊕ f(C_{b'}^i[j−1]). At the end of the game, A returns a bit b'' representing its guess for the bit b'. Then, B outputs a bit b* representing his guess for the bit b chosen by O_f, such that b* = 0 if b'' = b', and b* = 1 otherwise. We have to evaluate Adv^{prf}_F(k). We have:

Adv^{prf}_F(k) = |Pr[b* = 0 | b = 0] − Pr[b* = 0 | b = 1]|
               = |Pr[b'' = b' | f ← F] − Pr[b'' = b' | f ← R_{n→n}]|
               ≥ (1 + Adv^{lorc-bcpa}_{CFB,A}(k))/2 − Pr[b'' = b' | f ← R_{n→n}]    (1)
Thus, Adv^{lorc-bcpa}_{CFB,A}(k) ≤ 2 · Adv^{prf}_F(k) + 2 · Pr[b'' = b' | f ← R_{n→n}] − 1, and it remains to upper-bound Pr[b'' = b' | f ← R_{n→n}]. As in the previous proof for the security of the DCBC encryption scheme, we look at the collisions that can occur in the inputs of the function f. Indeed, if no such collision appears, then the advantage of the adversary A in winning his game equals 0. However, if such a collision occurs, then the adversary can easily detect it and consequently adapt the following plaintext block so as to distinguish which of the messages is encrypted. Thus, in this case, the adversary wins the game. We denote by Coll the event that some collision appears on the inputs of the function f. So we have:
Pr[b'' = b' | f ← R_{n→n}] = Pr[b'' = b' | f ← R_{n→n} ∧ Coll] · Pr[Coll]
                           + Pr[b'' = b' | f ← R_{n→n} ∧ ¬Coll] · Pr[¬Coll]
                           ≤ Pr[Coll] + Pr[b'' = b' | f ← R_{n→n} ∧ ¬Coll]
                           ≤ Pr[Coll] + 1/2    (2)
The last inequality comes from the fact that if no collision occurs on the inputs of the function f, where f is a function chosen at random in R_{n→n}, then the outputs of this function are random values, uniformly distributed in {0,1}^n and independent of the previous values. Thus, the adversary cannot adapt the following message block according to the previous ciphertext blocks, and guessing at random is his only strategy for predicting the bit b'.
We now have to evaluate Pr[Coll]. As before, we denote by Coll_k the event that a collision occurs on the input of the function f at the k-th block. We have Pr[Coll_k] = Pr[∃ ℓ, 0 ≤ ℓ < k, s.t. C_{b'}^i[ℓ] = C_{b'}^i[k]], where C_{b'}^i[0] = R^i. Thus, we have:

Pr[Coll] ≤ Σ_{k=1}^{µ} Pr[Coll_k | ¬Coll_{k−1}]
For the sake of clarity, in the following we omit the bit b' and the index i representing the number of the query. We remark that C[ℓ] = C[k] if and only if M[ℓ] ⊕ f(C[ℓ−1]) = M[k] ⊕ f(C[k−1]). This last equality can hold either at random, or if the adversary can choose M[k] so that M[k] = M[ℓ] ⊕ f(C[ℓ−1]) ⊕ f(C[k−1]). However, since by assumption C[k−1] does not collide with any of the previous ciphertext blocks, f(C[k−1]) has never been computed and is thus a random value, uniformly distributed in {0,1}^n and independent of the previously computed values. Thus, the adversary cannot guess it to adapt M[k] accordingly, except with negligible probability. Finally, we can write that for all 1 ≤ k ≤ µ:

Pr[∃ ℓ, 0 ≤ ℓ < k, s.t. C[ℓ] = C[k] | ¬Coll_{k−1}] ≤ 2(k−1)/2^n

Indeed, there are at most k−1 choices for the value ℓ, and two messages are queried. Thus, by summing over all values of k, we have:

Pr[Coll] ≤ µ²/2^{n−1}
Finally, by replacing all the probabilities involved in equations (1) and (2), we obtain:

Adv^{prf}_F(k, t, µ) ≥ Adv^{lorc-bcpa}_{CFB,A}(k, t, q, µ)/2 − µ²/2^{n−1}

and the theorem follows.
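For reference, the n-bit full-block CFB mode analyzed by Theorem 4 can be sketched as follows. This is a toy illustration only: the keyed PRF f is replaced by truncated SHA-256 (an assumption for demonstration, not a recommendation), with C[0] = R a random IV and C[j] = M[j] ⊕ f(C[j−1]):

```python
import os
import hashlib

BLOCK = 16  # n = 128 bits; illustrative block size

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def f(key, block):
    # Stand-in n-bit to n-bit keyed PRF (not a real block cipher).
    return hashlib.sha256(key + block).digest()[:BLOCK]

def cfb_encrypt(key, blocks):
    # C[0] = R (random IV); C[j] = M[j] XOR f(C[j-1]). Returns [R, C1, ..., Cm].
    c = [os.urandom(BLOCK)]
    for m in blocks:
        c.append(xor(m, f(key, c[-1])))
    return c

def cfb_decrypt(key, c):
    # M[j] = C[j] XOR f(C[j-1]); f is only ever evaluated forward.
    return [xor(c[j], f(key, c[j - 1])) for j in range(1, len(c))]
```

Note that decryption recomputes f in the forward direction only, which is why the proof can model f as a pseudorandom function rather than a permutation.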
The Security of “One-Block-to-Many” Modes of Operation

Henri Gilbert
France Télécom R&D
[email protected]
Abstract. In this paper, we investigate the security, in the Luby-Rackoff security paradigm, of blockcipher modes of operation that expand a one-block input into a longer t-block output under the control of a secret key K. Such “one-block-to-many” modes of operation are of frequent use in cryptology. They can be used for stream cipher encryption purposes, and for authentication and key distribution purposes in contexts such as mobile communications. We show that although the expansion functions resulting from modes of operation of blockciphers such as the counter mode or the output feedback mode are not pseudorandom, slight modifications of these two modes provide pseudorandom expansion functions. The main result of this paper is a detailed proof, in the Luby-Rackoff security model, that the expansion function used in the construction of the third generation mobile (UMTS) example authentication and key agreement algorithm MILENAGE is pseudorandom.
1 Introduction
In this paper, we investigate the security of modes of operation of blockciphers that construct a length increasing function, i.e. that expand a 1-block input value x into a longer t-block output (z_1, z_2, ..., z_t) (where t ≥ 2), under the control of a secret key K. Such length increasing modes of operation of blockciphers, associated with a one block to t blocks expansion function, are of extremely frequent use in cryptology, mainly for pseudorandom generation purposes. They can be considered as a kind of dual of length decreasing modes of operation associated with a t blocks to one block compression function, used for message authentication purposes (e.g. CBC MAC). In both cases, the essential security requirement is that the resulting one block to t blocks (respectively t blocks to one block) function be pseudorandom, i.e. (informally speaking) indistinguishable, by any reasonable adversary, from a perfect random function with the same input and output sizes. Thus the Luby and Rackoff security paradigm [LR88], which relates the pseudorandomness of a function resulting from a cryptographic construction to the pseudorandomness of the elementary function(s) encountered at the lower level of the same construction, represents a suitable tool for analysing the security of both kinds of modes of operation. However, the security and the efficiency of length increasing modes of operation have so far been much less investigated than those of length decreasing modes of operation such as CBC MAC [BKR94,PR00], R-MAC [JJV02], etc., or than constructions of length-preserving functions or permutations such as the Feistel scheme [LR88,Pa91].

T. Johansson (Ed.): FSE 2003, LNCS 2887, pp. 376–395, 2003.
© International Association for Cryptologic Research 2003

The practical significance of length increasing modes of operation of blockciphers comes from the fact that they provide the two following kinds of pseudorandom generation functions, which both represent essential ingredients for applications such as mobile communications security.

Example 1: Stream cipher modes of operation of blockciphers. It has become usual for stream ciphers (whether or not they are derived from a mode of operation of a blockcipher) to require that the generated pseudorandom sequences used to encrypt data be dependent not only upon a secret key, but also upon an additional (non secret) input value x, sometimes referred to as an initialization vector or an initial value (IV). This holds for most recently proposed stream ciphers, e.g. SEAL [RC98], SCREAM [HCCJ02], SNOW [EJ02], BMGL [HN00], and for the stream cipher mode of operation of the KASUMI blockcipher used in the third generation mobile system UMTS [Ka00]. As a consequence, stream ciphers are more conveniently modelled as a length increasing pseudorandom function F_K: {0,1}^n → {0,1}^{nt}; x → F_K(x) = (z_1, z_2, ..., z_t) than as a mere pseudorandom number generator deriving a pseudorandom sequence (z_1, z_2, ..., z_t) of nt bits from a secret seed K. The advantage of modelling a stream cipher as a length increasing function generator rather than as a number generator is that it reflects the security conditions on the dependence of the pseudorandom sequence upon the input value, by requiring that F_K be a pseudorandom function, indistinguishable from a perfect random function with the same input and output sizes by any reasonable adversary.

Example 2: Combined authentication and key distribution. In mobile communication systems (GSM, UMTS, etc.)
and more generally in most secret key security architectures where authentication and encryption are provided, protected communications are initiated with a kind of “handshake” in which authentication or mutual authentication between the user’s device and the network, and session key(s) distribution, are performed. Such an initial handshake is followed by a protected communication, where the session key(s) resulting from the handshake phase are used to encrypt and/or authenticate the data exchanges. In order for the handshake protocol not to delay the actual protected communication phase, it is essential to restrict it to two passes and to minimize the amount of data exchanged. For that purpose, one of the parties (typically the network in the case of mobile communications) sends a random challenge (accompanied by additional data such as a message authenticated counter value if mutual authentication is needed), and this random challenge serves as an input to a secret key function used to derive an authentication response and one or several session key(s). In recent mobile communication systems such as UMTS, the length of the outputs to be produced (measured in 128-bit blocks) far exceeds the 1-block length of the random challenge. Thus, one single operation of a blockcipher does not suffice to produce the various outputs needed. In order to base the security of the cryptologic computations performed during the handshake upon the security
of a trusted blockcipher, a suitable one-block-to-many mode of operation of the underlying blockcipher has to be defined. The security requirements are not only that each of the output blocks be unpredictable by an adversary. In addition, the information on one subset of the outputs (say for instance an authentication response) should not help an adversary to derive any information about the rest of the outputs (say for instance the session key used to encrypt the subsequent exchanges). These various security requirements can be again reflected, as in the example of stream cipher modes of operation, in saying that the one to t blocks function FK : {0, 1}n → {0, 1}n.t ; x → FK (x) = (z1 , z2 , · · · , zt ) used to derive the various output values must be indistiguishable from a perfect random function with the same input and output sizes. In this paper, we show that although the one block to t blocks functions associated with well known modes of operation of blockciphers such as the Output Feedback mode (OFB) or the so-called Counter mode are not pseudorandom, slightly modified modes of operation in which the one-block input is first “prewhitened” before being subject to an expansion process are pseudorandom in a formally provable manner. The main result of this paper is a detailed pseudorandomness proof, in the Luby and Rackoff security model, for the one to t blocks mode of operation of a blockcipher used in the UMTS example authentication and key distribution algorithm MILENAGE [Mi00], which can be considered as a modified counter mode. We also provide pseudorandomness proofs for a modified version of the OFB mode. Related work. The study of pseudorandomness properties of cryptographic constructions initiated Luby and Rackoff’s seminal paper [LR88] has represented a very active research area for the last decade. 
In particular, Patarin clarified the link between the best advantage of a q-query distinguisher and the q-ary transition probabilities associated with f, and proved indistinguishability bounds for numerous r-round Feistel constructions [Pa91]; Maurer showed how to generalise indistinguishability results related to perfect random functions to indistinguishability results related to nearly perfect random functions [Ma92]; Bellare, Kilian and Rogaway [BKR94], and later on several other authors [PR00,JJV02,BR00], investigated the application of similar techniques to various message authentication modes of operation; and Vaudenay embedded techniques for deriving indistinguishability bounds into a broader framework named decorrelation theory [Va98,Va99]. In this paper, we apply general indistinguishability proof techniques due to Patarin [Pa91] in an essential manner. Our approach to expansion function constructions based on blockcipher modes of operation has some connections with, but also significant differences from, the following recently proposed blockcipher based expansion function constructions: – in [DHY02], Desai, Hevia and Yin provide security proofs, in the Luby-Rackoff paradigm, for the ANSI X9.17 pseudorandom sequence generation mode of operation of a blockcipher, and for an improved version of this mode which is essentially the same as the modified OFB mode considered in this paper. However, the security model considered in [DHY02] is quite distinct (and somewhat
complementary): we consider the pseudorandomness properties of the one to t blocks expansion function resulting from the considered mode of operation, whereas [DHY02] models a PRG mode of operation as the iteration of a “smaller” keyed state transition and keystream output function, and considers the pseudorandomness properties of such state transition functions. – in [HN00], Håstad and Näslund propose a pseudorandom number generator named BMGL. BMGL is based on a “key feedback” mode of operation of a blockcipher. The security paradigm underlying BMGL (namely the indistinguishability of pseudorandom number sequences from truly random sequences, based upon a combination of the Blum-Micali PRG construction [BM84] and a variant of the Goldreich-Levin hard-core bits construction [GL89], in which the conjectured one-wayness of the key dependence of the blockcipher is used to construct PR sequences of numbers) is quite different from the one considered here (namely the indistinguishability of the constructed expansion function from a perfect random function, assuming that the underlying blockcipher is indistinguishable from a perfect random one-block permutation). The advantage of the BMGL approach is that it relies upon less demanding security assumptions for the underlying blockcipher than our approach, but the disadvantage is that it leads to less efficient constructions in terms of the number of blockcipher invocations per output block. – in [BDJR97], Bellare, Desai, Jokipii and Rogaway provide security proofs for stream cipher modes of operation, namely the XOR scheme and a stateful variant named the CTR scheme. These two modes have some connections with the insecure one block to t blocks mode of operation referred to as the counter mode in this paper.
However, a major difference between these modes is that in the XOR and CTR schemes, an adversary has no control at all over the inputs to the underlying blockcipher f (she can only control the plaintext), whereas in all the one to many blocks modes we consider in this paper, an adversary can control the one-block input value. Thus, there is no contradiction between the facts that the XOR and CTR encryption schemes are shown to be secure in [BDJR97] and that the counter mode of operation can easily be shown to be totally insecure. This paper is organized as follows: Section 2 introduces basic definitions and results on random functions and security proof techniques in the Luby-Rackoff security model. Section 3 describes various “one-block-to-many” modes of operation of blockciphers, and introduces a modified variant of the counter mode used in MILENAGE and an improved variant of the OFB mode. Sections 4 and 5 present pseudorandomness proofs for the two latter modes.
2 Security Framework

2.1 The Luby-Rackoff Security Paradigm

A key dependent cryptographic function such as a blockcipher or a mode of operation of a blockcipher can be viewed as a random function associated with a randomly selected key value. It is generally defined using a recursive construction
process. Each step of the recursion consists of deriving a random function (or permutation) F from r previously defined random functions (or permutations) f_1, ..., f_r, and can be represented by a relation of the form F = Φ(f_1, ..., f_r). One of the strongest security requirements one can put on such a random function or permutation F is that F be impossible to distinguish, with a non-negligible success probability, from a perfect random function or permutation F* uniformly drawn from the set of all functions (or permutations) with the same input and output sizes, even if a probabilistic testing algorithm A of unlimited power is used for that purpose and if the number q of adaptively chosen queries of A to the random instance of F or F* to be tested is large. It is generally not possible to prove indistinguishability properties for “real life” cryptologic random functions and large numbers of queries, because this would require a far too long key length. However, it is often possible to prove or disprove the following: if a random function F encountered at a given level of a cryptologic function construction is related to the random functions encountered at the lower recursion level by a relation of the form F = Φ(f_1, ..., f_r), and if we replace the actual random functions f_1 to f_r by independent perfect random functions or permutations f_1* to f_r* (or, in a more sophisticated version of the same approach, by f_1 to f_r functions which are sufficiently indistinguishable from f_1* to f_r*), then the resulting modified random function F is indistinguishable from a perfect random function (or permutation). This provides a useful method for assessing the soundness of blockcipher constructions.
For instance, in the case of a three-round Feistel construction, a well known theorem first proved by Luby and Rackoff [LR88] provides upper bounds on the advantage |p − p*| of any testing algorithm A in distinguishing the 2n-bit random permutation F = Ψ(f_1*, f_2*, f_3*), deduced from three independent perfect random functions f_1*, f_2* and f_3*, from a perfect random 2n-bit permutation F*, with q adaptively chosen queries to the tested instance of F or F*. This advantage is less than q²/2^n. Another example is the CBC-MAC construction F = Φ_CBCMAC(f), which derives a tn-bit to n-bit message authentication function from chained invocations of an n-bit to n-bit function f. It was shown by Bellare, Kilian and Rogaway in [BKR94] that if q²t² ≤ 2^{n+1}, then the advantage of any testing algorithm A in distinguishing the random function F = Φ_CBCMAC(f*) derived from a perfect random function, using q adaptively chosen queries, is less than 3q²t²/2^{n+1}. In this paper, we will consider constructions of the form F = Φ(f), which derive an n-bit to nt-bit function from several invocations of the same instance of an n-bit permutation f, representing a blockcipher of blocksize n. We will show that for suitable modes of operation Φ, the random function F = Φ(f*) derived from a perfect n-bit random permutation is indistinguishable from a perfect n-bit to nt-bit random function F*.
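As an illustration of the three-round Feistel construction Ψ(f_1, f_2, f_3) mentioned above, here is a toy sketch; the round functions are stand-ins built from SHA-256 (an assumption for illustration only, not part of the theorem, which requires perfect random round functions):

```python
import hashlib

def _round_fn(k, x):
    # Toy n-bit to n-bit round function keyed by k (illustrative stand-in).
    return hashlib.sha256(k + x).digest()[:len(x)]

def feistel3(round_keys, left, right):
    # Psi(f1, f2, f3): one Feistel round maps (L, R) -> (R, L XOR f(R));
    # three rounds turn n-bit round functions into a 2n-bit permutation.
    for k in round_keys:  # exactly three keys -> three rounds
        left, right = right, bytes(a ^ b for a, b in zip(left, _round_fn(k, right)))
    return left, right

def feistel3_inverse(round_keys, left, right):
    # Undo the rounds in reverse order: (L', R') -> (R' XOR f(L'), L').
    for k in reversed(round_keys):
        right, left = left, bytes(a ^ b for a, b in zip(right, _round_fn(k, left)))
    return left, right
```

The inverse never inverts the round functions themselves, which is why Feistel networks yield permutations even from non-invertible round functions.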
2.2 Random Functions
Throughout the rest of this paper we use the following notation:
– I_n denotes the set {0,1}^n.
– F_{n,m} denotes the set I_m^{I_n} of functions from I_n into I_m. Thus |F_{n,m}| = 2^{m·2^n}.
– P_n denotes the set of permutations on I_n. Thus |P_n| = (2^n)!.

A random function of F_{n,m} is defined as a random variable F of F_{n,m}, and can be viewed as a probability distribution (Pr[F = ϕ])_{ϕ∈F_{n,m}} over F_{n,m}, or equivalently as a family (F_ω)_{ω∈Ω} of F_{n,m} elements. In particular:
– An n-bit to m-bit key dependent cryptographic function is determined by a randomly selected key value K ∈ K, and can thus be represented by the random function F = (f_K)_{K∈K} of F_{n,m}.
– A cryptographic construction of the form F = Φ(f_1, f_2, ..., f_r) can be viewed as a random function of F_{n,m} determined by r random functions f_i ∈ F_{n_i,m_i}, i = 1 ... r.

Definition 1. We define a perfect random function F* of F_{n,m} as a uniformly drawn element of F_{n,m}. In other words, F* is associated with the uniform probability distribution over F_{n,m}. We define a perfect random permutation f* on I_n as a uniformly drawn element of P_n. In other words, f* is associated with the uniform probability distribution over P_n.

Definition 2 (q-ary transition probabilities associated with F). Given a random function F of F_{n,m}, we define the transition probability Pr[x →^F y] associated with a q-tuple x of I_n inputs and a q-tuple y of I_m outputs as

Pr[x →^F y] = Pr[F(x^1) = y^1 ∧ F(x^2) = y^2 ∧ ... ∧ F(x^q) = y^q]
            = Pr_{ω∈Ω}[F_ω(x^1) = y^1 ∧ F_ω(x^2) = y^2 ∧ ... ∧ F_ω(x^q) = y^q]

In the sequel we will use the following simple properties:

Property 1. Let f* be a perfect random permutation on I_n. If x = (x^1, ..., x^q) is a q-tuple of pairwise distinct I_n values and y = (y^1, ..., y^q) is a q-tuple of
pairwise distinct I_n values, then Pr[x →^{f*} y] = (|I_n| − q)!/|I_n|! = (2^n − q)!/(2^n)!.
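For very small n, Property 1 can be verified exhaustively. The sketch below enumerates all permutations of I_3 and checks the transition probability for one arbitrary (illustrative) choice of distinct inputs and outputs:

```python
from itertools import permutations
from math import factorial
from fractions import Fraction

n, q = 3, 2
N = 2 ** n                            # |I_n| = 8
perms = list(permutations(range(N)))  # all of P_n: 8! = 40320 permutations

x = (0, 5)   # an illustrative pair of pairwise distinct inputs
y = (3, 7)   # an illustrative pair of pairwise distinct outputs
count = sum(1 for p in perms if p[x[0]] == y[0] and p[x[1]] == y[1])

# Property 1: Pr[x -> y] = (2^n - q)! / (2^n)!
assert Fraction(count, len(perms)) == Fraction(factorial(N - q), factorial(N))
```
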
Property 2. Let f* be a perfect random permutation on I_n. If x and x' are two distinct elements of I_n and δ is any fixed value of I_n, then Pr[f*(x) ⊕ f*(x') = δ] ≤ 2/2^n.

Proof: Pr[f*(x) ⊕ f*(x') = 0] = 0 since x ≠ x'. If δ ≠ 0, Pr[f*(x) ⊕ f*(x') = δ] = (2^n · (2^n − 2)! · 1)/(2^n)! = 1/(2^n − 1) ≤ 2/2^n. So, Pr[f*(x) ⊕ f*(x') = δ] ≤ 2/2^n.

2.3 Distinguishing Two Random Functions
In security proofs such as the one presented in this paper, we want to upper-bound the probability that any algorithm can distinguish whether a given fixed ϕ
function is an instance of a random function F = Φ(f_1*, f_2*, ..., f_r*) of F_{n,m} or an instance of the perfect random function F*, using fewer than q queries to ϕ. Let A be any distinguishing algorithm of unlimited power that, when input with a function ϕ of F_{n,m} (which can be modelled as an “oracle tape” in the probabilistic Turing machine associated with A), selects a fixed number q of distinct chosen or adaptively chosen input values x^i (the queries), obtains the q corresponding output values y^i = ϕ(x^i), and based on these results outputs 0 or 1. Denote by p (resp. by p*) the probability that A answers 1 when applied to a random instance of F (resp. of F*). We want to find upper bounds on the advantage Adv_A(F, F*) = |p − p*| of A in distinguishing F from F* with q queries. As first noticed by Patarin [Pa91], the best advantage Adv_A(F, F*) of any distinguishing algorithm A in distinguishing F from F* is entirely determined by the q-ary transition probabilities Pr[x →^F y] associated with each q-tuple x = (x^1, ..., x^q) of pairwise distinct I_n values and each q-tuple y = (y^1, ..., y^q) of I_m values. The following theorem, which was first proved in [Pa91] and an equivalent version of which is stated in [Va99], is a very useful tool for deriving upper bounds on Adv_A(F, F*) based on properties of the Pr[x →^F y] q-ary transition probabilities.

Theorem 1. Let F be a random function of F_{n,m} and F* be a perfect random function representing a uniformly drawn random element of F_{n,m}. Let q be an integer. Denote by X the subset of I_n^q containing all the q-tuples x = (x^1, ..., x^q) of pairwise distinct elements. If there exist a subset Y of I_m^q and two positive real numbers ε_1 and ε_2 such that

(i) |Y| ≥ (1 − ε_1) · |I_m|^q
(ii) ∀x ∈ X, ∀y ∈ Y: Pr[x →^F y] ≥ (1 − ε_2) · 1/|I_m|^q

then for any distinguishing algorithm A using q queries, Adv_A(F, F*) ≤ ε_1 + ε_2.
To improve the self-readability of this paper, a short proof of Theorem 1, whose structure is close to that of the proof given in [Pa91], is provided in the appendix at the end of this paper.
3 Description of Length Increasing Modes of Operation of Blockciphers
We now describe a few natural length increasing modes of operation of a blockcipher. Let us denote the blocksize (in bits) by n, and let us denote by t a fixed integer such that t ≥ 2. The purpose of one to t blocks modes of operation is to derive an n-bit to tn-bit random function F from an n-bit to n-bit random function f (representing a blockcipher associated with a random key value K) in such a way that F is indistinguishable from a perfect n-bit to tn-bit random function if f is indistinguishable from a perfect random permutation f*. We
show that the functions associated with the well known OFB mode and with the so-called counter mode of operation are not pseudorandom, and introduce enhanced modes of operation, in particular the variant of the counter mode encountered in the UMTS example authentication and key distribution algorithm MILENAGE.

3.1 The Expansion Functions Associated with the Counter and OFB Modes of Operation Are Not Pseudorandom
Definition 3. Given any t fixed distinct one-block values c_1, ..., c_t ∈ {0,1}^n and any random permutation f over {0,1}^n, the one block to t blocks function F_CNT associated with the counter mode of operation of f is defined as follows:

F_CNT(f): {0,1}^n → {0,1}^{nt}
x → (z_1, ..., z_t) = (f(x ⊕ c_1), ..., f(x ⊕ c_t))

Given any random permutation f over {0,1}^n, the one block to t blocks function F_OFB associated with the output feedback mode of operation of f is defined as follows:

F_OFB(f): {0,1}^n → {0,1}^{nt}
x → (z_1, ..., z_t)

where the z_i are recursively given by z_1 = f(x); z_2 = f(z_1); ...; z_t = f(z_{t−1}).
Fig. 1. The counter and OFB modes of operation.
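The two expansion functions of Definition 3 can be sketched directly. The code below uses toy parameters (n = 8, t = 4, illustrative constants c_k, a seeded random permutation standing in for f*) and also checks the structural property that makes F_CNT distinguishable with only two queries: replacing x by x ⊕ c_i ⊕ c_j swaps the i-th and j-th output blocks.

```python
import random

n, t = 8, 4
N = 2 ** n
C = [1, 2, 3, 4]  # fixed distinct one-block constants c_1..c_t (illustrative)

def random_permutation(seed):
    # A uniformly shuffled lookup table standing in for f* (toy model).
    rng = random.Random(seed)
    table = list(range(N))
    rng.shuffle(table)
    return table.__getitem__

def F_CNT(f, x):
    # (z_1, ..., z_t) = (f(x ^ c_1), ..., f(x ^ c_t))
    return tuple(f(x ^ c) for c in C)

def F_OFB(f, x):
    # z_1 = f(x); z_i = f(z_{i-1})
    zs, z = [], x
    for _ in range(t):
        z = f(z)
        zs.append(z)
    return tuple(zs)

# Two-query distinguisher against F_CNT: outputs i and j are swapped,
# which essentially never happens for a perfect random function.
f = random_permutation(2003)
i, j = 0, 1
z = F_CNT(f, 10)
z2 = F_CNT(f, 10 ^ C[i] ^ C[j])
assert z2[i] == z[j] and z2[j] == z[i]
```
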
It is straightforward that F_CNT and F_OFB are not pseudorandom. As a matter of fact, let us consider the case where F_CNT and F_OFB are derived
from a perfect random permutation f*. Let x denote any arbitrary value of {0,1}^n, and (z_1, ..., z_t) denote the F_CNT(x) value. For any fixed pair (i, j) of distinct elements of {1, 2, ..., t}, let us denote by (z'_1, ..., z'_t) the F_CNT output value corresponding to the modified input value x' = x ⊕ c_i ⊕ c_j. The obvious property that z'_i = z_j and z'_j = z_i provides a distinguisher of F_CNT from a perfect one block to t blocks random function F* which requires only two oracle queries. Similarly, to prove that F_OFB is not pseudorandom, let us denote by x and (z_1, ..., z_t) any arbitrary value of {0,1}^n and the F_OFB(x) value. With overwhelming probability, f*(x) ≠ x, so that z_1 ≠ x. Let us denote by x' the modified input value given by x' = z_1, and by (z'_1, ..., z'_t) the corresponding F_OFB output value. It directly follows from the definition of F_OFB that for i = 1, ..., t−1, z'_i = z_{i+1}. This provides a distinguisher of F_OFB from a perfect one block to t blocks random function F* which requires only two oracle queries. The above distinguishers indeed represent serious weaknesses in operational contexts where the input value of F_CNT or F_OFB can be controlled by an adversary. For instance, if F_CNT or F_OFB is used for authentication and key distribution purposes, these distinguishers result in a lack of cryptographic separation between the output values z_i. For certain pairs (i, j) of distinct values of {1, ..., t}, an adversary knows how to modify the input x to the data expansion function in order for the i-th output corresponding to the modified input value x' (which may for instance represent a publicly available authentication response) to provide her with the j-th output corresponding to the input value x (which may for instance represent an encryption key).

3.2 Modified Counter Mode: The MILENAGE Construction
Figure 2 represents the example UMTS authentication and key distribution algorithm MILENAGE [Mi00]. Its overall structure consists of 6 invocations of a 128-bit blockcipher EK , e.g. AES associated with a 128-bit subscriber key K. In Figure 2, c0 to c4 represent constant 128-bit values, and r0 to r5 represent rotation amounts (comprised between 0 and 127) of left circular shifts applied to intermediate 128-bit words. OPC represents a 128-bit auxiliary (operator customisation) key. MILENAGE allows to derive four output blocks z1 to z4 (which respectively provide an authentication response, an encryption key, a message authentication key, and a one-time key used for masking plaintext data contained in the authentication exchange) from an input block x representing a random authentication challenge. It also allows to derive a message authentication tag z0 from the x challenge and a 64-bit input word y (which contains an authentication sequence number and some additional authentication management data) using a close variant of the CBC MAC mode of EK . The security of the MAC function providing z0 , the independence between z0 and the other output values are outside of the scope of this paper. Some analysis of these features can be found in the MILENAGE design and evaluation report [Mi00]. Let us also ignore the involvement of the OPc constant, and let us focus on the structure of the one block to t block construction allowing to derive the output blocks z1 to z4 from
The Security of “One-Block-to-Many” Modes of Operation
Fig. 2. Milenage.
the input block x. This construction consists of a prewhitening computation, using E_K, of an intermediate block y, followed by applying to y a slight variant (involving some circular rotations) of the counter mode construction. More formally, given any random permutation f over {0,1}^n, the 1 block to t blocks function F_MIL(f) associated with the MILENAGE construction is defined as follows (cf. Figure 3):

F_MIL(f) : {0,1}^n → {0,1}^{nt}
x → (z_1, ···, z_t)
where z_k = f(rot(f(x), r_k) ⊕ c_k) for k = 1 to t

A detailed statement and proof of the pseudorandomness of the MILENAGE construction are given in Theorem 2 in the next Section. Theorem 2 confirms, with slightly tighter indistinguishability bounds, the claim concerning the pseudorandomness of this construction stated (without the underlying proof) in the MILENAGE design and evaluation report [Mi00].
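As an illustration, the F_MIL expansion can be sketched directly from this definition. The block size, constants, and rotation amounts below are toy assumptions, not MILENAGE's actual parameters.

```python
import random

N = 16
MASK = (1 << N) - 1

def rot(w, r):
    """Left circular rotation of an N-bit word w by r bits."""
    r %= N
    return ((w << r) | (w >> (N - r))) & MASK

def f_mil(f, x, consts, rots):
    """F_MIL(f): prewhiten y = f(x), then z_k = f(rot(y, r_k) XOR c_k)."""
    y = f(x)
    return [f(rot(y, r) ^ c) for c, r in zip(consts, rots)]

# Toy random permutation standing in for the keyed blockcipher E_K.
table = list(range(1 << N))
random.Random(7).shuffle(table)
f = lambda v: table[v]

# All r_k = 0 with pairwise-distinct c_k satisfies condition (C) below.
z = f_mil(f, 0x1234, consts=[1, 2, 4, 8], rots=[0, 0, 0, 0])
# Distinct f-inputs and an injective f guarantee distinct output blocks here.
assert len(set(z)) == 4
```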
3.3  Modified OFB Construction
Figure 4 represents a one block to t blocks mode of operation of an n-bit permutation f whose structure consists of a prewhitening computation of f providing an intermediate value y, followed by an OFB expansion of y. More formally, the F_MOFB(f) expansion function associated with the modified OFB construction of Figure 4 is defined as follows:

F_MOFB(f) : {0,1}^n → {0,1}^{nt}
x → (z_1, ···, z_t)
Henri Gilbert
Fig. 3. The MILENAGE modified counter mode construction.
where z_1 = f(f(x)) and z_k = f(f(x) ⊕ z_{k−1}) for k = 2 to t

A short proof of the pseudorandomness of this modified OFB construction is given in Section 5 hereafter. It is worth noticing that the construction of the above modified OFB mode of operation is identical to the one of the ANSI X9.17 PRG mode of operation studied by Desai et al. in [DHY02], so that the pseudorandomness proof (related to the associated expansion function) provided in Section 5 is to some extent complementary to the pseudorandomness proof (related to the associated state transition function) established in [DHY02]. The modified OFB mode of operation is also similar to the keystream generation mode of operation of the KASUMI blockcipher used in the UMTS encryption function f8 [Ka00], up to the fact that in the f8 mode, two additional precautions are taken: the key used in the prewhitening computation differs from the one used in the rest of the computations, and in order to prevent collisions between two output blocks from resulting in short cycles in the produced keystream sequence, a mixture of the OFB and counter techniques is applied.
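A minimal sketch of F_MOFB, under the same toy assumptions as before (N = 16, table-based random permutation standing in for the blockcipher):

```python
import random

N = 16
table = list(range(1 << N))
random.Random(42).shuffle(table)
f = lambda v: table[v]

def f_mofb(f, x, t):
    """F_MOFB(f): prewhiten y = f(x); then z_1 = f(y) and
    z_k = f(y XOR z_{k-1}) for k = 2..t (OFB expansion of y)."""
    y = f(x)
    z = [f(y)]
    for _ in range(t - 1):
        z.append(f(y ^ z[-1]))
    return z

z = f_mofb(f, 0xBEEF, 4)
assert len(z) == 4 and z[0] == f(f(0xBEEF))
```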
4
Analysis of the Modified Counter Mode Used in MILENAGE
In this Section we prove that if some conditions on the constants c_k, k ∈ {1···t} and r_k, k ∈ {1···t} encountered in the MILENAGE construction of Section 3 are satisfied, then the one block to t blocks expansion function F_MIL(f*) resulting from applying this construction to the perfect random one-block permutation f*
Fig. 4. The modified OFB mode of operation.
is indistinguishable from a perfect random function of F_{n,tn}, even if the product of t and the number of queries q is large. In order to formulate conditions on the constants c_k and r_k, we need to introduce some notation:
– the left circular rotation of an n-bit word w by r bits is denoted by rot(w, r). Rotation amounts (parameter r) are implicitly taken modulo n.
– for any GF(2)-linear function L : {0,1}^n → {0,1}^n, Ker(L) and Im(L) respectively denote the kernel and image vector spaces of L.
With the above notation, these conditions can be expressed as follows:

∀k, l ∈ {1···t}, k ≠ l ⇒ (c_k ⊕ c_l) ∉ Im(L), where L = rot(·, r_k) ⊕ rot(·, r_l)    (C)
The purpose of the above condition is to ensure that for any y ∈ {0,1}^n and any two distinct integers k and l ∈ {1···t}, the values rot(y, r_k) ⊕ c_k and rot(y, r_l) ⊕ c_l are distinct. If t is less than 2^n, it is easy to find constants c_k and r_k satisfying condition (C) above. In particular, if one takes all r_k equal to zero, condition (C) boils down to requiring that the c_i constants be pairwise distinct.

Theorem 2. Let n be a fixed integer. Denote by f* a perfect random permutation of I_n. Let F = F_MIL(f*) denote the random function of F_{n,tn} obtained by applying the MILENAGE construction of Figure 3 to f*, and let F* denote a perfect random function of F_{n,tn}. If the constants c_k and r_k (k = 1···t) of the construction satisfy condition (C) above, then for any distinguishing algorithm A using any fixed number q of queries such that t^2 q^2 / 2^n ≤ 1/6, we have

Adv_A(F, F*) ≤ t^2 q^2 / 2^{n+1}
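At toy size, condition (C) can be checked by brute force. The sketch below (N = 8, illustrative constants and rotation amounts) tests membership of c_k ⊕ c_l in Im(rot(·, r_k) ⊕ rot(·, r_l)) exhaustively, which is equivalent to requiring that rot(y, r_k) ⊕ c_k and rot(y, r_l) ⊕ c_l never collide.

```python
N = 8
MASK = (1 << N) - 1

def rot(w, r):
    """Left circular rotation of an N-bit word w by r bits."""
    r %= N
    return ((w << r) | (w >> (N - r))) & MASK

def condition_c_holds(consts, rots):
    """Brute-force check (toy N) that for every k != l the value
    c_k XOR c_l lies outside Im(rot(., r_k) XOR rot(., r_l))."""
    t = len(consts)
    for k in range(t):
        for l in range(k + 1, t):
            image = {rot(y, rots[k]) ^ rot(y, rots[l]) for y in range(1 << N)}
            if (consts[k] ^ consts[l]) in image:
                return False
    return True

# With all r_k = 0, (C) reduces to the constants being pairwise distinct.
assert condition_c_holds([1, 2, 3, 4], [0, 0, 0, 0])
assert not condition_c_holds([1, 1, 3, 4], [0, 0, 0, 0])   # repeated constant
```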
Proof. Let X denote the set of q-tuples x = (x^1, ···, x^q) of pairwise distinct I_n values, and let Z denote the set of q-tuples z = (z^1 = (z^1_1, ···, z^1_t), z^2 = (z^2_1, ···, z^2_t), ···, z^q = (z^q_1, ···, z^q_t)) of I_nt values such that the tq values z^1_1, ···, z^1_t, ···, z^q_1, ···, z^q_t are pairwise distinct. We want to show that there exist positive real numbers ε1 and ε2 such that

|Z| > (1 − ε1) |I_nt|^q    (i)

and

∀x ∈ X ∀z ∈ Z   Pr[x →F z] ≥ (1 − ε2) · 1/|I_nt|^q    (ii)

so that Theorem 1 can be applied. We have

|Z| / |I_nt|^q = 2^n · (2^n − 1) ··· (2^n − tq + 1) / 2^{nqt}
             = 1 · (1 − 1/2^n) ··· (1 − (qt−1)/2^n)
             ≥ 1 − (1/2^n) · (1 + 2 + ··· + (qt−1))
             = 1 − (qt−1)qt / 2^{n+1}

Since (qt−1)qt / 2^{n+1} ≤ q^2 t^2 / 2^{n+1}, we have |Z| > (1 − ε1) |I_nt|^q, with ε1 = q^2 t^2 / 2^{n+1}.

Let us now show that for any fixed q-tuple of I_n values x ∈ X and any q-tuple of I_nt values z ∈ Z, we have Pr[x →F z] ≥ 1/2^{ntq}. For that purpose, let us consider from now on any two fixed q-tuples x ∈ X and z ∈ Z. Let us denote by Y the set of q-tuples y = (y^1, ···, y^q) of pairwise distinct I_n values. We can partition all the possible computations x →F z according to the intermediate value y = (f*(x^1), ···, f*(x^q)) in the F computation:

Pr[x →F z] = Σ_{y ∈ Y} Pr[x →f* y ∧ ∀i ∈ {1..q} ∀k ∈ {1..t} (rot(y^i, r_k) ⊕ c_k) →f* z^i_k]
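The lower bound on |Z| / |I_nt|^q derived above can be verified exactly for tiny parameters:

```python
from math import prod

def z_ratio(n, q, t):
    """Exact |Z| / |I_nt|^q: q-tuples whose qt n-bit blocks are pairwise distinct."""
    m = 1 << n
    return prod(m - i for i in range(q * t)) / m ** (q * t)

def lower_bound(n, q, t):
    """The bound 1 - (qt-1)qt / 2^{n+1} established in the proof."""
    return 1 - (q * t - 1) * q * t / 2 ** (n + 1)

for n in (6, 8, 10):
    for q in (1, 2, 3):
        for t in (2, 3, 4):
            assert z_ratio(n, q, t) >= lower_bound(n, q, t)
```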
Let us denote by Y′ the subset of Y of those values y satisfying the three following additional conditions, which respectively express the requirement that all the f* input values encountered in the q F computations be pairwise distinct (first and second conditions) and that all the f* outputs encountered in the same computations be also pairwise distinct (third condition):

(I) ∀i ∈ {1..q} ∀j ∈ {1..q} ∀k ∈ {1..t}   x^i ≠ rot(y^j, r_k) ⊕ c_k
(II) ∀i ∈ {1..q} ∀j ∈ {1..q} ∀k ∈ {1..t} ∀l ∈ {1..t}   (i, k) ≠ (j, l) ⇒ rot(y^i, r_k) ⊕ c_k ≠ rot(y^j, r_l) ⊕ c_l
(III) ∀i ∈ {1..q} ∀j ∈ {1..q} ∀k ∈ {1..t}   y^i ≠ z^j_k

We have

Pr[x →F z] ≥ Σ_{y ∈ Y′} Pr[x →f* y ∧ ∀i ∈ {1..q} ∀k ∈ {1..t} (rot(y^i, r_k) ⊕ c_k) →f* z^i_k]
However, if y ∈ Y′, Property 1 of Section 2 can be applied to the (t+1)q pairwise distinct f* input values x^i, i ∈ {1..q}, and rot(y^i, r_k) ⊕ c_k, i ∈ {1..q}, k ∈ {1..t}, and to the (t+1)q distinct output values y^i, i ∈ {1..q}, and z^i_k, i ∈ {1..q}, k ∈ {1..t}, so that

Pr[x →f* y ∧ ∀i ∈ {1..q} ∀k ∈ {1..t} (rot(y^i, r_k) ⊕ c_k) →f* z^i_k] = (|I_n| − (t+1)q)! / |I_n|! = (2^n − (t+1)q)! / 2^n!

Therefore,

Pr[x →F z] ≥ |Y′| · (2^n − (t+1)q)! / 2^n!    (1)

A lower bound on |Y′| can be established, based on the fact that

|Y| = 2^n! / (2^n − q)!    (2)
and on the following properties:
– The fraction of y vectors of Y such that condition (I) is not satisfied is less than q^2 t / 2^n, since for any fixed i ∈ {1..q}, j ∈ {1..q} and k ∈ {1..t} the number of y ∈ Y q-tuples such that x^i = rot(y^j, r_k) ⊕ c_k is (2^n − 1) ··· (2^n − q + 1) = |Y| / 2^n, and the set of the y vectors of Y such that condition (I) is not satisfied is the union of these q^2 t sets.
– The fraction of y vectors of Y such that condition (III) is not satisfied is less than q^2 t / 2^n, by a similar argument.
– The fraction of y vectors of Y such that condition (II) is not satisfied is upper bounded by (q(q−1)/2) · (t(t−1)/2) · 1/(2^n − 1). As a matter of fact, given any two distinct pairs (i, k) ≠ (j, l) of {1···q} × {1···t}, we can upper bound the number of y vectors of Y such that rot(y^i, r_k) ⊕ c_k = rot(y^j, r_l) ⊕ c_l by distinguishing the three following cases:
case 1: i = j and k ≠ l. Since condition (C) on the constants involved in F is satisfied, there exists no y vector of Y such that rot(y^i, r_k) ⊕ c_k = rot(y^i, r_l) ⊕ c_l. So case 1 never occurs.
case 2: i ≠ j and k = l. For any y vector of Y, y^i ≠ y^j. But the rot(·, r_k) ⊕ c_k GF(2)-affine mapping of I_n is one to one. Thus rot(y^i, r_k) ⊕ c_k ≠ rot(y^j, r_k) ⊕ c_k. In other words, case 2 never occurs.
case 3: i ≠ j and k ≠ l. The number of Y q-tuples such that rot(y^i, r_k) ⊕ c_k = rot(y^j, r_l) ⊕ c_l is 2^n · (2^n − 2) · (2^n − 3) ··· (2^n − q + 1) = |Y| / (2^n − 1).
Consequently, the set of y vectors of Y such that condition (II) is not satisfied is the union of the (q(q−1)/2) · (t(t−1)/2) sets of cardinality |Y| / (2^n − 1) considered in case 3, so that the fraction of y vectors of Y such that condition (II) is not satisfied is upper bounded by (q(q−1)/2) · (t(t−1)/2) · 1/(2^n − 1), as claimed before.
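The case 3 counting argument can be checked exhaustively at toy size (N = 3 bits, q = 2, illustrative constants and rotation amounts): the fraction of pairwise-distinct tuples y realizing the collision is indeed at most 1/(2^N − 1).

```python
from itertools import permutations

N, q = 3, 2                 # toy size: 3-bit words, q = 2
m = 1 << N
MASK = m - 1

def rot(w, r):
    """Left circular rotation of an N-bit word w by r bits."""
    r %= N
    return ((w << r) | (w >> (N - r))) & MASK

# Case 3 (i != j, k != l): count pairwise-distinct tuples (y_0, y_1) with
# rot(y_0, r_k) ^ c_k == rot(y_1, r_l) ^ c_l for one fixed pair of indices.
ck, cl, rk, rl = 1, 2, 1, 2                 # illustrative constants/rotations
bad = sum(1 for y in permutations(range(m), q)
          if rot(y[0], rk) ^ ck == rot(y[1], rl) ^ cl)

# For each y_0 the colliding y_1 is unique (rot is bijective), hence the
# fraction is at most m / (m (m - 1)) = 1 / (2^N - 1).
assert bad / (m * (m - 1)) <= 1 / (m - 1)
```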
As a consequence of the above properties, the overall fraction of the Y vectors which do not belong to Y′ is less than 2q^2 t / 2^n + (q(q−1)/2) · (t(t−1)/2) · 1/(2^n − 1), i.e.

|Y′| ≥ (1 − (2q^2 t / 2^n + (q(q−1)/2) · (t(t−1)/2) · 1/(2^n − 1))) · |Y|    (3)
Now (1), (2) and (3) result in the following inequality:

Pr[x →F z] ≥ (1 − (2q^2 t / 2^n + (q(q−1)/2) · (t(t−1)/2) · 1/(2^n − 1))) · (2^n − (t+1)q)! / (2^n − q)!

The (2^n − (t+1)q)! / (2^n − q)! term of the above expression can be lower bounded as follows:

(2^n − (t+1)q)! / (2^n − q)! = 1 / ((2^n − q)(2^n − q − 1) ··· (2^n − ((t+1)q − 1)))
= (1/2^{ntq}) · 1 / ((1 − q/2^n) · (1 − (q+1)/2^n) ··· (1 − ((t+1)q−1)/2^n))
≥ (1/2^{ntq}) · (1 + q/2^n) · (1 + (q+1)/2^n) ··· (1 + ((t+1)q−1)/2^n)
(due to the fact that if u < 1, 1/(1−u) ≥ 1 + u)
≥ (1/2^{ntq}) · (1 + q/2^n + (q+1)/2^n + ··· + ((t+1)q−1)/2^n)
= (1/2^{ntq}) · (1 + tq((t+2)q − 1)/2^{n+1})
Thus we have

Pr[x →F z] ≥ (1/2^{ntq}) · (1 + tq((t+2)q − 1)/2^{n+1}) · (1 − (2q^2 t / 2^n + (q(q−1)/2) · (t(t−1)/2) · 1/(2^n − 1)))
= (1/2^{ntq}) · (1 + ε)(1 − ε′)

where ε ∆= tq((t+2)q − 1)/2^{n+1} and ε′ ∆= 2q^2 t / 2^n + (q(q−1)/2) · (t(t−1)/2) · 1/(2^n − 1).

Let us show that ε ≥ (4/3) ε′. Due to the inequality 1/(2^n − 1) ≤ 2/2^n, we have

ε′ ≤ qt (qt + 3q − t + 1) / 2^{n+1}

On the other hand, ε can be rewritten

ε = qt (2qt + 4q − 2) / 2^{n+1}
Therefore

ε − (4/3) ε′ ≥ (qt / 2^{n+1}) · ((2/3) qt + (4/3) t − 10/3) ≥ 0

since t ≥ 2 and q ≥ 1 imply (2/3) qt + (4/3) t − 10/3 ≥ 0. Moreover, it is easy to see (by going back to the definition of ε and using the fact that t ≥ 2) that ε ≤ 2 t^2 q^2 / 2^n, so that the condition t^2 q^2 / 2^n ≤ 1/6 implies ε ≤ 1/3. The relations ε ≥ (4/3) ε′ and ε ≤ 1/3 imply (1 + ε)(1 − ε′) ≥ 1. As a matter of fact,

(1 + ε)(1 − ε′) = 1 + ε − ε′ − εε′
≥ 1 + ε − ε′ − ε′/3
= 1 + ε − (4/3) ε′
≥ 1
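The combination step used here — ε ≤ 1/3 together with ε ≥ (4/3)ε′ implies (1 + ε)(1 − ε′) ≥ 1 — can be sanity-checked with exact rationals over sampled admissible pairs (the sampled values are illustrative only):

```python
from fractions import Fraction

# Whenever eps <= 1/3 and eps >= (4/3) * eps_prime, the product
# (1 + eps)(1 - eps_prime) is at least 1, exactly as used in the proof.
for eps in (Fraction(1, 3), Fraction(1, 5), Fraction(1, 100)):
    for eps_prime in (Fraction(0), eps / 2, 3 * eps / 4):
        assert eps >= Fraction(4, 3) * eps_prime and eps <= Fraction(1, 3)
        assert (1 + eps) * (1 - eps_prime) >= 1
```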
Thus we have shown that Pr[x →F z] ≥ 1/2^{ntq}. We can now apply Theorem 1 with ε1 = q^2 t^2 / 2^{n+1} and ε2 = 0, so that we obtain the upper bound

Adv_A(F, F*) ≤ q^2 t^2 / 2^{n+1}

QED
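For a concrete reading of the bound, here is a quick computation with MILENAGE-like parameters n = 128 and t = 4; the query budget q = 2^40 is an arbitrary illustrative choice:

```python
n, t, q = 128, 4, 2 ** 40

# Hypothesis of Theorem 2: t^2 q^2 / 2^n = 2^84 / 2^128 = 2^-44 <= 1/6.
assert t * t * q * q / 2 ** n <= 1 / 6

# Resulting advantage bound: t^2 q^2 / 2^{n+1} = 2^-45, i.e. negligible
# even after 2^40 adaptively chosen queries.
adv_bound = t * t * q * q / 2 ** (n + 1)
assert adv_bound == 2 ** -45
```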
The unconditional security result of Theorem 2 is easy to convert (using a standard argument) to a computational security analogue.

Theorem 3. Let f denote any random permutation of I_n. Let F = F_MIL(f) denote the random function of F_{n,tn} obtained by applying to f the MILENAGE construction of Figure 3 (where the constants c_k and r_k (k = 1···t) are assumed to satisfy condition (C)). Let F* denote a perfect random function of F_{n,tn}. For any number q of queries such that t^2 q^2 / 2^n ≤ 1/6, if there exists ε > 0 such that for any testing algorithm T with q(t+1) queries and less computational resources (e.g. time, memory, etc.) than any fixed finite or infinite bound R the advantage Adv_T(f, f*) of T in distinguishing f from a perfect n-bit random permutation f* is such that Adv_T(f, f*) < ε, then for any distinguishing algorithm A using q queries and less computational resources than R,

Adv_A(F, F*) < ε + t^2 q^2 / 2^{n+1}

Proof. Let us show that if there existed a testing algorithm A capable of distinguishing F_MIL(f) from a perfect random function F* of F_{n,tn} with an advantage |p − p*| better than ε + q^2 t^2 / 2^{n+1} using less computational resources than R, then there would exist a testing algorithm T allowing to distinguish f from a perfect random permutation with q(t+1) queries and less computational resources than R with a distinguishing advantage better than ε. The test T of a permutation ϕ would just
consist in performing the test A on F_MIL(ϕ). The success probability p′ of the algorithm A applied to F(f*) would be such that |p′ − p*| ≤ q^2 t^2 / 2^{n+1} (due to Theorem 2), and therefore, due to the triangular inequality |p − p′| + |p′ − p*| ≥ |p − p*|, one would have |p − p′| ≥ ε, so that the advantage of T in distinguishing f from f* would be at least ε. QED

The following heuristic estimate of the success probability of some simple distinguishing attacks against the MILENAGE mode of operation indicates that the q^2 t^2 / 2^{n+1} bound obtained in Theorem 2 is very tight, at least in the case where the r_i rotation amounts are equal to zero. Let us restrict ourselves to this case. Let us consider a q-tuple z = (z^1, ···, z^q) of F_MIL output values, where each z^i represents a t-tuple of distinct I_n values z^i_1, ···, z^i_t. Given any two distinct indexes i and j, the occurrence probability of a collision of the form z^i_k = z^j_l can be approximated (under heuristic assumptions) by t^2 / 2^n, so that the overall collision probability among the qt output blocks of F_MIL is about (q(q−1)/2) · t^2 / 2^n. Moreover, each collision represents a distinguishing event with an overwhelming probability, due to the fact that z^i_k = z^j_l implies z^j_k = z^i_l. Thus the distinguishing probability given by this "attack" is less than (but close to) q^2 t^2 / 2^{n+1}. This does not hold in the particular case where q = 1, but in this case another statistical bias, namely the fact that no collision ever occurs among the t output blocks, provides a distinguishing property of probability about t(t−1)/2^{n+1}, which is again close to q^2 t^2 / 2^{n+1}.

5
Analysis of the Modified OFB Mode of Operation
The following analogue of Theorem 2 above can be established for the modified OFB mode of operation (cf. Figure 4) introduced in Section 3.

Theorem 4. Let n be a fixed integer. Denote by f* a perfect random permutation of I_n. Let F = F_MOFB(f*) denote the random function of F_{n,tn} obtained by applying the modified OFB construction of Figure 4 to f*, and let F* denote a perfect random function of F_{n,tn}. For any distinguishing algorithm A using any fixed number q of queries such that t^2 q^2 / 2^n ≤ 1, we have

Adv_A(F, F*) ≤ 7 t^2 q^2 / 2^{n+1}

Proof sketch: the structure of the proof is the same as for the MILENAGE construction. We consider the same X and Z sets of q-tuples as in Section 4. As established in Section 4, |Z| ≥ (1 − ε1) |I_nt|^q, where ε1 = q^2 t^2 / 2^{n+1}. For any fixed q-tuples x ∈ X and z ∈ Z of input and output values, it can be shown that

Pr[x →F_MOFB(f*) z] ≥ (1/2^{ntq}) · (1 − ε2), with ε2 = 3 q^2 t^2 / 2^n.

We can now apply Theorem 1 with ε1 = q^2 t^2 / 2^{n+1} and ε2 = 3 q^2 t^2 / 2^n, so that we obtain the upper bound

Adv_A(F, F*) ≤ 7 q^2 t^2 / 2^{n+1}

QED
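The application of Theorem 1 in the proof sketch — ε1 + ε2 = q^2 t^2 / 2^{n+1} + 3 q^2 t^2 / 2^n = 7 q^2 t^2 / 2^{n+1} — can be checked with exact rationals (the parameter values below are illustrative):

```python
from fractions import Fraction

def theorem1_bound(n, q, t):
    """epsilon_1 + epsilon_2 for the modified OFB proof sketch."""
    eps1 = Fraction(q * q * t * t, 2 ** (n + 1))
    eps2 = Fraction(3 * q * q * t * t, 2 ** n)
    return eps1 + eps2

n, q, t = 128, 2 ** 30, 4
assert theorem1_bound(n, q, t) == Fraction(7 * q * q * t * t, 2 ** (n + 1))
```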
6
Conclusion
We have given some evidence that although "one-block-to-many" modes of operation of blockciphers have so far been less well known and less systematically studied than "many-blocks-to-one" MAC modes, both kinds of modes are of equal significance for applications such as mobile communications security. We have given security proofs, in the Luby-Rackoff security paradigm, of two simple one to many blocks modes, in which all invocations of the underlying blockcipher involve the same key. We believe that the following topics would deserve some further research:
– systematic investigation of alternative one to many blocks modes, e.g. modes involving more than one key, or modes providing security "beyond the birthday paradox";
– formal proofs of security for hybrid modes of operation including an expansion function, for instance for the combination of the expansion function x → (z1, z2, z3, z4) and the message authentication function (x, y) → z0 provided by the complete MILENAGE construction.
Acknowledgements I would like to thank Steve Babbage, Diane Godsave and Kaisa Nyberg for helpful comments on a preliminary version of the proof of Theorem 2. I would also like to thank Marine Minier for useful discussions at the beginning of this work.
References

[BDJR97] M. Bellare, A. Desai, E. Jokipii, P. Rogaway, "A Concrete Security Treatment of Symmetric Encryption: Analysis of the DES Modes of Operation", Proceedings of the 38th Annual Symposium on Foundations of Computer Science, IEEE, 1997.
[BKR94] M. Bellare, J. Kilian, P. Rogaway, "The Security of Cipher Block Chaining", Advances in Cryptology - CRYPTO'94, LNCS 839, p. 341, Springer-Verlag, Santa Barbara, USA, 1994.
[BM84] M. Blum, S. Micali, "How to Generate Cryptographically Strong Sequences of Pseudo-Random Bits", SIAM J. Comput. 13(4), p. 850-864, 1984.
[BR00] J. Black, P. Rogaway, "A Block-Cipher Mode of Operation for Parallelizable Message Authentication", Advances in Cryptology - Eurocrypt 2002, LNCS 2332, p. 384-397, Springer-Verlag, 2002.
[DHY02] A. Desai, A. Hevia, Y. Yin, "A Practice-Oriented Treatment of Pseudorandom Number Generators", Advances in Cryptology - Eurocrypt 2002, LNCS 2332, Springer-Verlag, 2002.
[EJ02] P. Ekdahl, T. Johansson, "A new version of the stream cipher SNOW", Proceedings of SAC'02.
[GL89] O. Goldreich, L. Levin, "A hard-core predicate for all one-way functions", Proc. ACM Symp. on Theory of Computing, p. 25-32, 1989.
[HCCJ02] S. Halevi, D. Coppersmith, C.S. Jutla, "Scream: A Software-Efficient Stream Cipher", Advances in Cryptology - FSE 2002, p. 195-209, Springer-Verlag, 2002.
[HN00] J. Hastad, M. Näslund, "BMGL: Synchronous Key-stream Generator with Provable Security" (Revision 1, March 6, 2001) and "A Generalized Interface for the NESSIE Submission BMGL", March 15, 2002, available at http://www.cosic.esat.kuleuven.ac.be/nessie/
[JJV02] E. Jaulmes, A. Joux, F. Valette, "On the Security of Randomized CBC-MAC Beyond the Birthday Paradox Limit: A New Construction", Advances in Cryptology - FSE 2002, p. 237-251, Springer-Verlag, 2002, and IACR ePrint archive 2001/074.
[Ka00] 3rd Generation Partnership Project - Specification of the 3GPP confidentiality and integrity algorithms; Document 2 (TS 35.202): KASUMI algorithm specification; Document 1: TS 35.201 f8 and f9 specifications; Document TR 33.904: Report on the Evaluation of 3GPP Standard Confidentiality and Integrity Algorithms, available at http://www.3gpp.org
[LR88] M. Luby, C. Rackoff, "How to Construct Pseudorandom Permutations from Pseudorandom Functions", SIAM Journal on Computing, vol. 17, p. 373, 1988.
[Ma92] U. Maurer, "A Simplified and Generalized Treatment of Luby-Rackoff Pseudo-random Permutation Generators", Advances in Cryptology - Eurocrypt'92, LNCS 658, p. 239, Springer-Verlag, 1992.
[Mi00] 3rd Generation Partnership Project - Specification of the MILENAGE algorithm set: An example algorithm set for the 3GPP Authentication and Key Generation functions f1, f1*, f2, f3, f4, f5 and f5* - Document 2 (TS 35.206): Algorithm specification; Document 5 (TR 35.909): Summary and results of design and evaluation, available at http://www.3gpp.org
[Pa91] J. Patarin, "Etude de Générateurs de Permutation Basés sur le Schéma du D.E.S.", PhD thesis, University of Paris VI, 1991.
[Pa92] J. Patarin, "How to Construct Pseudorandom and Super Pseudorandom Permutations from One Single Pseudorandom Function", Advances in Cryptology - Eurocrypt'92, LNCS 658, p. 256, Springer-Verlag, 1992.
[PR00] E. Petrank, C. Rackoff, "CBC MAC for Real-Time Data Sources", Journal of Cryptology 13(3), p. 315-338, 2000.
[RC98] P. Rogaway, D. Coppersmith, "A Software-Optimized Encryption Algorithm", Journal of Cryptology 11(4), p. 273-287, 1998.
[Va98] S. Vaudenay, "Provable Security for Block Ciphers by Decorrelation", STACS'98, Paris, France, LNCS 1373, p. 249-275, Springer-Verlag, 1998.
[Va99] S. Vaudenay, "On Provable Security for Conventional Cryptography", Proc. ICISC'99, invited lecture.
Appendix: A Short Proof of Theorem 1

Let us restrict ourselves to the case of any fixed deterministic algorithm A which uses q adaptively chosen queries (the generalization to the case of a probabilistic algorithm is easy). A has the property that if the q-tuple of outputs encountered during an A computation is y = (y^1, ···, y^q), then the q-tuple x = (x^1, ···, x^q) of query inputs encountered during this computation is entirely determined. This is easy to prove by induction: the initial query input x^1 is fixed; if for a given A computation the first query output is y^1, then x^2 is determined, etc. We denote by x(y) the single q-tuple of query inputs corresponding to any possible q-tuple y of query outputs, and we denote by S_A the subset of those y ∈ Im^q values such that if the q-tuples x(y) and y of query inputs and outputs are encountered in an A computation, then A outputs the answer 1.

The probabilities p and p* can be expressed using S_A as

p = Σ_{y ∈ S_A} Pr[x(y) →F y]  and  p* = Σ_{y ∈ S_A} Pr[x(y) →F* y]

We can now lower bound p using the following inequalities:

p ≥ Σ_{y ∈ S_A ∩ Y} (1 − ε2) · Pr[x(y) →F* y]   (due to inequality (ii))
  ≥ Σ_{y ∈ S_A} (1 − ε2) · Pr[x(y) →F* y] − Σ_{y ∈ Im^q − Y} (1 − ε2) · Pr[x(y) →F* y]

But Σ_{y ∈ S_A} (1 − ε2) · Pr[x(y) →F* y] = (1 − ε2) · p* and Σ_{y ∈ Im^q − Y} (1 − ε2) · Pr[x(y) →F* y] = (1 − ε2) · (|Im|^q − |Y|)/|Im|^q ≤ (1 − ε2) · ε1, due to inequality (i). Therefore

p ≥ (1 − ε2)(p* − ε1) = p* − ε1 − ε2 · p* + ε1 · ε2

thus finally (using p* ≤ 1 and ε1 · ε2 ≥ 0)

p ≥ p* − ε1 − ε2    (a)

If we now consider the distinguisher A′ whose outputs are the inverse of those of A (i.e. A′ answers 0 iff A answers 1), we obtain an inequality involving this time 1 − p and 1 − p*:

(1 − p) ≥ (1 − p*) − ε1 − ε2    (b)

Combining inequalities (a) and (b), we obtain |p − p*| ≤ ε1 + ε2. QED.
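The induction argument can be illustrated by a small simulation: for a deterministic adaptive adversary, replaying a given sequence of oracle outputs reproduces exactly the same sequence of query inputs, i.e. the inputs are a function x(y) of the outputs alone. The adversary and oracles below are hypothetical toy functions.

```python
def run(adversary, oracle, q):
    """Run a deterministic adaptive q-query adversary against an oracle,
    recording the query inputs and outputs."""
    xs, ys = [], []
    for _ in range(q):
        x = adversary(ys)            # next input depends only on past outputs
        xs.append(x)
        ys.append(oracle(x))
    return xs, ys

# Hypothetical deterministic adversary and oracle (toy, mod-256 arithmetic).
adv = lambda ys: (7 * sum(ys) + len(ys)) % 256
xs1, ys1 = run(adv, lambda x: (13 * x + 5) % 256, 4)

# Feeding the recorded outputs back through any oracle that reproduces them
# yields exactly the same query inputs: x is determined by y.
replay = iter(ys1)
xs2, _ = run(adv, lambda x: next(replay), 4)
assert xs1 == xs2
```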
Author Index
Akkar, Mehdi-Laurent 192
Babbage, Steve 111
Biham, Eli 9, 22
Biryukov, Alex 45, 274
Boesgaard, Martin 307
Cannière, Christophe De 111, 274
Carlet, Claude 54
Christiansen, Jesper 307
Dunkelman, Orr 9, 22
Ferguson, Niels 330
Fouque, Pierre-Alain 362
Fuller, Joanne 74
Gilbert, Henri 376
Golić, Jovan Dj. 100
Goubin, Louis 192
Hawkes, Philip 290
Hong, Dowon 154
Iwata, Tetsu 129
Joux, Antoine 87, 170
Junod, Pascal 235
Kang, Ju-Sung 154
Keller, Nathan 9, 22
Kelsey, John 330
Knudsen, Lars R. 182
Kohno, Tadayoshi 182, 330
Kurosawa, Kaoru 129
Lano, Joseph 111
Lee, Sangjin 247
Lim, Jongin 247
Lucks, Stefan 330
Martinet, Gwenaëlle 362
Millan, William 74
Morgari, Guglielmo 100
Muller, Frédéric 87
Paar, Christof 206
Pal, Pinakpani 347
Park, Sangwoo 247
Pedersen, Thomas 307
Poupard, Guillaume 170, 362
Preneel, Bart 111, 154
Prouff, Emmanuel 54
Raddum, Håvard 1
Rose, Gregory G. 290
Ryu, Heuisu 154
Saarinen, Markku-Juhani O. 36
Sarkar, Palash 347
Scavenius, Ove 307
Schneier, Bruce 330
Schramm, Kai 206
Seberry, Jennifer 223
Song, Beomsik 223
Stern, Jacques 170
Sung, Soo Hak 247
Vandewalle, Joos 111
Vaudenay, Serge 235
Vesterager, Mette 307
Wallén, Johan 261
Whiting, Doug 330
Wollinger, Thomas 206